Explainable Machine Learning in Deployment



Umang Bhatt (1,2,3,4), Alice Xiang (2), Shubham Sharma (5), Adrian Weller (3,4,6), Ankur Taly (7), Yunhan Jia (8), Joydeep Ghosh (5,9), Ruchir Puri (10), José M. F. Moura (1), Peter Eckersley (2)

(1) Carnegie Mellon University, (2) Partnership on AI, (3) University of Cambridge, (4) Leverhulme CFI, (5) University of Texas at Austin, (6) The Alan Turing Institute, (7) Fiddler Labs, (8) Baidu, (9) CognitiveScale, (10) IBM Research

ABSTRACT

Explainable machine learning offers the potential to provide stakeholders with insights into model behavior by using various methods such as feature importance scores, counterfactual explanations, or influential training data. Yet there is little understanding of how organizations use these methods in practice. This study explores how organizations view and use explainability for stakeholder consumption. We find that, currently, the majority of deployments are not for end users affected by the model but rather for machine learning engineers, who use explainability to debug the model itself. There is thus a gap between explainability in practice and the goal of transparency, since explanations primarily serve internal stakeholders rather than external ones. Our study synthesizes the limitations of current explainability techniques that hamper their use for end users. To facilitate end user interaction, we develop a framework for establishing clear goals for explainability. We end by discussing concerns raised regarding explainability.

CCS CONCEPTS

• Human-centered computing; • Social and professional topics → Socio-technical systems; • Computing methodologies → Machine learning

KEYWORDS

machine learning, explainability, transparency, deployed systems, qualitative study

ACM Reference Format:
Umang Bhatt, Alice Xiang, Shubham Sharma, Adrian Weller, Ankur Taly, Yunhan Jia, Joydeep Ghosh, Ruchir Puri, José M. F. Moura, and Peter Eckersley. 2020. Explainable Machine Learning in Deployment. In Conference on Fairness, Accountability, and Transparency (FAT* '20), January 27–30, 2020, Barcelona, Spain. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3351095.3375624

© 2020 Copyright held by the owner/author(s). ACM ISBN 978-1-4503-6936-7/20/02.

1 INTRODUCTION

Machine learning (ML) models are being increasingly embedded into many aspects of daily life, such as healthcare [16], finance [26], and social media [5]. To build ML models worthy of human trust, researchers have proposed a variety of techniques for explaining ML models to stakeholders. Deemed "explainability," this body of previous work attempts to illuminate the reasoning used by ML models. "Explainability" loosely refers to any technique that helps the user or developer of ML models understand why models behave the way they do.
Explanations can come in many forms: from telling patients which symptoms were indicative of a particular diagnosis [35] to helping factory workers analyze inefficiencies in a production pipeline [17].

Explainability has been touted as a way to enhance the transparency of ML models [33]. Transparency includes a wide variety of efforts to provide stakeholders, particularly end users, with relevant information about how a model works [67]. One form of this would be to publish an algorithm's code, though this type of transparency would not provide an intelligible explanation to most users. Another form would be to disclose properties of the training procedure and datasets used [39]. Users, however, are generally not equipped to understand how raw data and code translate into benefits or harms that might affect them individually. By providing an explanation for how the model made a decision, explainability techniques seek to provide transparency directly targeted to human users, often aiming to increase trustworthiness [44]. The importance of explainability as a concept has been reflected in legal and ethical guidelines for data and ML [53]. In cases of automated decision-making, Articles 13-15 of the European General Data Protection Regulation (GDPR) require that data subjects have access to "meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing for the data subject" [45]. In addition, technology companies have released artificial intelligence (AI) principles that include transparency as a core value, including notions of explainability, interpretability, or intelligibility [1, 2].

With growing interest in "peering under the hood" of ML models and in providing explanations to human users, explainability has become an important subfield of ML. Despite a burgeoning literature, there has been little work characterizing how explanations have been deployed by organizations in the real world.

In this paper, we explore how organizations have deployed local explainability techniques so that we can observe which techniques work best in practice, report on the shortcomings of existing techniques, and recommend paths for future research. We focus specifically on local explainability techniques since these techniques explain individual predictions, making them typically the most relevant form of model transparency for end users. Our study synthesizes interviews with roughly fifty people from approximately thirty organizations to understand which explainability techniques are used and how. We report trends from two sets of interviews and provide recommendations to organizations deploying explainability. To the best of our knowledge, we are the first to conduct a study of how explainability techniques are used by organizations that deploy ML models in their workflows. Our main contributions are threefold:

• We interview around twenty data scientists, who are not currently using explainability tools, to understand their organization's needs for explainability.
• We interview around thirty different individuals on how their organizations have deployed explainability techniques, reporting case studies and takeaways for each technique.
• We suggest a framework for organizations to clarify their goals for deploying explainability.

The rest of this paper is organized as follows:
(1) We discuss the methodology of our survey in Section 2.
(2) We summarize our overall findings in Section 3.
(3) We detail how local explainability techniques are used at various organizations and discuss technique-specific takeaways in Section 4.
(4) We develop a framework for establishing clear goals when deploying local explainability in Section 5.1 and discuss concerns of explainability in Section 5.2.

2 METHODOLOGY

In the spirit of Holstein et al. [28], we study how industry practitioners look at and deploy explainable ML. Specifically, we study how particular organizations deploy explainability algorithms, including who consumes the explanation and how it is evaluated for the intended stakeholder. We conduct two sets of interviews: Group 1 examined how data scientists who are not currently using explainable machine learning hope to leverage various explainability tools, while Group 2, the crux of this paper, examined how explainable machine learning has been deployed in practice.

For Group 1, Fiddler Labs led a set of around twenty interviews to assess explainability needs across various organizations in the technology and financial services sectors. We specifically focused on teams that do not currently employ explainability tools. These semi-structured, hour-long interviews included, but were not limited to, the following questions:

• What are your ML use cases?
• What is your current model development workflow?
• What are your pain points in deploying ML models?
• Would explainability help address those pain points?

Group 2 spanned roughly thirty people across approximately twenty different organizations, both for-profit and non-profit. Most of these organizations are members of the Partnership on AI, which is a global multistakeholder non-profit established to study and formulate best practices for AI to benefit society. With each individual, we held a thirty-minute to two-hour semi-structured interview to understand the state of explainability in their organization, their motivation for using explanations, and the benefits and shortcomings of the methods used. Some organizations asked to stay anonymous, not to be referred to explicitly in the prose, or not to be included in the acknowledgements.

Of the people we spoke with in Group 2, around one-third represented non-profit organizations (academics, civil societies, and think tanks), while the rest worked for for-profit organizations (corporations, industrial research labs, and start-ups). Broken down by organization, around half were for-profit and half were academic or non-profit. Around one-third of the interviewees were executives at their organization, around half were research scientists or engineers, and the remainder were professors at academic institutions, who commented on the consulting they had done with industry leaders to commercialize their research. The questions we asked Group 2 included, but were not limited to, the following:

• Does your organization use ML model explanations?
• What type of explanations have you used (e.g., feature-based, sample-based, counterfactual, or natural language)?
• Who is the audience for the model explanation (e.g., research scientists, product managers, domain experts, or users)?
• In what context have you deployed the explanations (e.g., informing the development process, informing human decision-makers about the model, or informing the end user on how actions were taken based on the model's output)?
• How does your organization decide when and where to use model explanations?

3 SUMMARY OF FINDINGS

Here we synthesize the results from both interview groups. For the sake of clarity, we define various terms based on the context in which they appear in the forthcoming prose.

• Trustworthiness refers to the extent to which stakeholders can reasonably trust a model's outputs.
• Transparency refers to attempts to provide stakeholders (particularly external stakeholders) with relevant information about how the model works: this includes documentation of the training procedure, analysis of the training data distribution, code releases, feature-level explanations, etc.
• Explainability refers to attempts to provide insights into a model's behavior.
• Stakeholders are the people who either want a model to be "explainable," will consume the model explanation, or are affected by decisions made based on model output.
• Practice refers to the real-world context in which the model has been deployed.
• Local Explainability aims to explain the model's behavior for a specific input.
• Global Explainability attempts to understand the high-level concepts and reasoning used by a model.

3.1 Explainability Needs

This subsection provides an overview of the explainability needs that were uncovered with Group 1, data scientists from organizations that do not currently deploy explainability techniques. These data scientists were asked to describe their "pain points" in building and deploying ML models, and how they hope to use explainability.

• Model debugging: Most data scientists struggle with debugging poor model performance. They wish to identify why the model performs poorly on certain inputs, and also to identify regions of the input space with below-average performance. In addition, they seek guidance on how to engineer new features, drop redundant features, and gather more data to improve model performance. For instance, one data scientist said: "If I have 60 features, maybe it's equally effective if I just have 5 features." Dealing with feature interactions was also a concern, as the data scientist continued, "Feature A will impact feature B, [since] feature A might negatively affect feature B—how do I attribute [importance in the presence of] correlations?" Others mentioned explainability as a debugging solution, helping to "narrow down where things are broken."
• Model monitoring: Several individuals worry about drift in the feature and prediction distributions after deployment. Ideally, they would like to be alerted when there is a significant drift relative to the training distribution [6, 47]. One organization would like explanations for how drift in feature distributions would impact model outcomes and feature contributions to the model: "We can compute how much each feature is drifting, but we want to cross-reference [this] with which features are impacting the model a lot."
• Model transparency: Organizations that deploy models to make decisions that directly affect end users seek explanations for model predictions.
The explanations are meant to increase model transparency and comply with current or forthcoming regulations. In general, data scientists believe that explanations can also help communicate predictions to a broader external audience of other business teams and customers. One company stressed the need to "show your work" to provide reasons for underwriting decisions to customers, and another company needed explanations to respond to customer complaints.
• Model audit: In financial organizations, due to regulatory requirements, all deployed ML models must go through an internal audit. Data scientists building these models need to have them reviewed by internal risk and legal teams. One of the goals of the model audit is to conduct various kinds of tests provided by regulations like SR 11-7 [43]. An effective model validation framework should include: (1) evaluation of the conceptual soundness of the model, (2) ongoing monitoring, including benchmarking, and (3) outcomes analysis, including back-testing. Explainability is viewed as a tool for evaluating the soundness of the model on various data points. Financial institutions would like to conduct sensitivity analyses, checking the impact of small changes to inputs on model outputs. Unexpectedly large changes in outputs can indicate an unstable model.

3.2 Explainability Usage

In Table 1, we aggregate some of the explainability use cases that we received from different organizations in Group 2. For each use case, we define the domain of use (i.e., the industry in which the model is deployed), the purpose of the model, the explainability technique used, the stakeholder consuming the explanation, and how the explanation is evaluated. Evaluation criteria denote how the organization compares the success of various explanation functions for the chosen technique (e.g., after selecting feature importance as the technique, an organization can compare LIME [50] and SHAP [34] explanations via the faithfulness criterion [69]).

In our study, feature importance was the most common explainability technique, and Shapley values were the most common type of feature importance explanation. The most common stakeholders were ML engineers (or research scientists), followed by domain experts (e.g., loan officers and content moderators). Section 4 provides definitions for each technique and further details on how these techniques were used at Group 2 organizations.

3.3 Stakeholders

Most organizations in Group 2 deploy explainability atop their existing ML workflow for one of the following stakeholders:

(1) Executives: These individuals deem explainability necessary to achieve an organization's AI principles. One research scientist felt that "explainability was strongly advised and marketed by higher-ups," though sometimes explainability simply became a checkbox.
(2) ML Engineers: These individuals (including data scientists and researchers) train ML models at their organization and use explainability techniques to understand how the trained model works: do the most important features, most similar samples, and nearest training point(s) in the opposite class make sense? Using explainability to debug what the model has learned, this group of individuals were the most common explanation consumers in our study.
(3) End Users: This is the most intuitive consumer of an explanation.
The end user is the person consuming the output of a model or making a decision based on model output. Explainability shows the end user why the model behaved the way it did, which is important for showing that the model is trustworthy and also for providing greater transparency.
(4) Other Stakeholders: There are many other possible stakeholders for explainability. One such group is regulators, who may mandate that algorithmic decision-making systems provide explanations to affected populations or to the regulators themselves. It is important that this group understands how explanations are deployed based on existing research, what techniques are feasible, and how the techniques can align with the desired explanation from a model. Another group is domain experts, who are often tasked with auditing the model's behavior and ensuring it aligns with expert intuition. For many organizations, minimizing the divergence between an expert's intuition and the model's explanation is key to successfully implementing explainability.

Overwhelmingly, we found that local explainability techniques are mostly consumed by ML engineers and data scientists to audit models before deployment rather than to provide explanations to end users. Our interviews reveal factors that prevent organizations from showing explanations to end users or those affected by decisions made from ML model outputs.

Domain | Model Purpose | Explainability Technique | Stakeholders | Evaluation Criteria
Finance | Loan Repayment | Feature Importance | Loan Officers | Completeness [34]
Insurance | Risk Assessment | Feature Importance | Risk Analysts | Completeness [34]
Content Moderation | Malicious Reviews | Feature Importance | Content Moderators | Completeness [34]
Finance | Cash Distribution | Feature Importance | ML Engineers | Sensitivity [69]
Facial Recognition | Smile Detection | Feature Importance | ML Engineers | Faithfulness [7]
Content Moderation | Sentiment Analysis | Feature Importance | QA ML Engineers | ℓ2 norm
Healthcare | Medicare Access | Counterfactual Explanations | ML Engineers | Normalized ℓ1 norm
Content Moderation | Object Detection | Adversarial Perturbation | QA ML Engineers | ℓ2 norm

Table 1: Summary of select deployed local explainability use cases

3.4 Key Takeaways

This subsection summarizes some key takeaways from Group 2 that shed light on the reasons for the limited deployment of explainability techniques and their use primarily as sanity checks for ML engineers. Organizations generally still consider the judgments of domain experts to be the implicit ground truth for explanations. Since explanations produced by current techniques often deviate from the understanding of domain experts, some organizations still use human experts to evaluate the explanation before it is presented to users. Part of this deviation stems from the potential for ML explanations to reflect spurious correlations, which result from models detecting patterns in the data that lack causal underpinnings. As a result, organizations find explainability techniques useful for helping their ML engineers identify and reconcile inconsistencies between the model's explanations and their intuition or that of domain experts, rather than for directly providing explanations to end users.

In addition, there are technical limitations that make it difficult for organizations to show end users explanations in real-time.
The non-convexity of certain models makes certain explanations (e.g., providing the most influential datapoints) hard to compute quickly. Moreover, finding plausible counterfactual datapoints (that are feasible in the real world and on the input data manifold) is nontrivial, and many existing techniques currently make crude approximations or return the closest datapoint of the other class in the training set. Moreover, providing certain explanations can raise privacy concerns due to the risk of model inversion.

More broadly, organizations lack frameworks for deciding why they want an explanation, and current research fails to capture the objective of an explanation. For example, large gradients, representing the direction of maximal variation with respect to the output manifold, do not necessarily "explain" anything to stakeholders. At best, gradient-based explanations provide an interpretation of how the model behaves upon an infinitesimal perturbation (not necessarily a feasible one [29]), but they do not "explain" whether the model captures the underlying causal mechanism from the data.

4 DEPLOYING LOCAL EXPLAINABILITY

In this section, we dive into how local explainability techniques are used at various organizations (Group 2). After reviewing technical notation, we define local explainability techniques, discuss organizations' use cases, and then report takeaways for each technique.

4.1 Preliminaries

A black box model f maps an input x ∈ X ⊆ R^d to an output f(x) ∈ Y, i.e., f : R^d → Y. When we assume f has a parametric form, we write f_θ. L(f(x), y) denotes the loss function used to train f on a dataset D of input-output pairs (x^(i), y^(i)). Each organization we spoke with has deployed an ML model f. They hope to explain a data point x using an explanation function g. Local explainability refers to an explanation for why f predicted f(x) for a fixed point x. The local explanation methods we discuss come in one of the following forms: Which feature x_i of x was most important for the prediction f(x)? Which training datapoint z ∈ D was most important to f(x)? What is the minimal change to the input x required to change the output f(x)?

In this paper, we deliberately decide to focus on the more popularly deployed local explainability techniques instead of global explainability techniques. Global explainability refers to techniques that attempt to explain the model as a whole. These techniques attempt to characterize the concepts learned by the model [31], simpler models learned from the representation of complex models [17], prototypical samples from a particular model output [10], or the topology of the data itself [20]. None of our interviewees reported deploying global explainability techniques, though some studied these techniques in research settings.

4.2 Feature Importance

Feature importance was by far the most popular technique we found across our study. It is used across many different domains (finance, healthcare, facial recognition, and content moderation). Also known as feature-level interpretations, feature attributions, or saliency maps, this method is by far the most widely used and most well-studied explainability technique [9, 24].

4.2.1 Formulation
Feature importance methods define an explanation function g : f × R^d → R^d that takes in a model f and a point of interest x and returns importance scores g(f, x) ∈ R^d for all features; g(f, x)_i is the importance of (or attribution for) feature x_i of x. These explanation functions roughly fall into two categories: perturbation-based techniques [8, 14, 22, 34, 50, 61] and gradient-based techniques [7, 41, 57, 59, 60, 62]. Note that gradient-based techniques can be seen as a special case of a perturbation-based technique with an infinitesimal perturbation size. Heatmaps are also a type of feature-level explanation that denote the importance of a region or collection of features [4, 22].

A prominent class of perturbation-based methods is based on Shapley values from cooperative game theory [54]. Shapley values are a way to distribute the gains from a cooperative game to its players. In applying the method to explaining a model prediction, a cooperative game is defined between the features, with the model prediction as the gain. The highlight of Shapley values is that they enjoy axiomatic uniqueness guarantees. Unfortunately, calculating the exact Shapley value is exponential in d, the input dimensionality; however, the literature has proposed approximate methods using weighted linear regression [34], Monte Carlo approximation [61], centroid aggregation [11], and graph-structured factorization [14]. When we refer to Shapley-related methods hereafter, we mean such approximate methods.

4.2.2 Shapley Values in Practice

Organization A works with financial institutions and helps explain models for credit risk analysis. To integrate into the existing ML workflow of these institutions, Organization A proceeds as follows. They let data scientists train a model to the desired accuracy. Note that Organization A focuses mostly on models trained on tabular data, though they are beginning to venture into unstructured data (i.e., language and images). During model validation, risk analysts conduct stress tests before deploying the model to loan officers and other decision-makers. After decision-makers vet the model outputs as a sanity check and decide whether or not to override the model output, Organization A generates Shapley value explanations.

Before launching the model, risk analysts are asked to review the Shapley value explanations to ensure that the model exhibits expected behavior (i.e., the model uses the same features that a human would for the same task). Notably, the customer support team at these institutions can also use these explanations to provide individuals information about what went into the decision-making process for their loan approval or cash distribution decision. They are shown the percentage contribution to the model output (the positive ℓ1 norm of the Shapley value explanation along with the sign of the contribution). This means that the explanation would be along the lines of, "55% of the decision was decided by your age, which positively correlated with the predicted outcome."
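To make the weighted-linear-regression approximation and the percentage-contribution reporting concrete, the sketch below uses the open-source shap package (KernelSHAP [34]) on a synthetic tabular classifier; the model, data, sample sizes, and printed percentages are illustrative assumptions, not any interviewed organization's pipeline. It also checks the completeness criterion from Table 1: the attributions should sum to the difference between the prediction and the baseline value.

```python
# Minimal KernelSHAP sketch: approximate Shapley values via weighted linear
# regression [34], then report percentage contributions and check completeness.
import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

background = X[:50]  # reference data defining the "absent feature" baseline
explainer = shap.KernelExplainer(lambda z: model.predict_proba(z)[:, 1], background)

x = X[:1]                                      # point of interest
phi = explainer.shap_values(x, nsamples=200)   # g(f, x): one attribution per feature

# Completeness check: attributions sum to f(x) minus the expected (baseline) output.
print(phi.sum(), model.predict_proba(x)[0, 1] - explainer.expected_value)

# Percentage contributions of each feature, in the spirit of the "55% of the
# decision" style of explanation described above.
contrib = 100 * np.abs(phi) / np.abs(phi).sum()
print(dict(enumerate(np.round(contrib.ravel(), 1))))
```

A usage note: the nsamples argument trades off the fidelity of the approximation against latency, which is one reason these approximate methods are practical for analyst review but harder to serve to end users in real time.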
When comparing Shapley value explanations to other popular feature importance techniques, Organization A found that in practice LIME explanations [50] give unexpected explanations that do not align with human intuition. Recent work [71] shows that the fragility of LIME explanations can be traced to the sampling variance when explaining a single data point and to the explanation's sensitivity to sample size and sampling proximity. Though decision-makers have access to the feature-importance explanations, end users are still not shown these explanations as reasoning for model output. Organization A aspires to eventually provide this "explanation" to end users.

For gradient-based language models, Organization A uses Integrated Gradients (related to Shapley values by Sundararajan et al. [62]) to flag malicious reviews and moderate content at the aforementioned institutions. This information can be highlighted to ensure the trustworthiness and transparency of the model to the decision maker (the hired content moderator here), since they can now see which word was most important in flagging the content.

Going forward, Organization A intends to use a global variant of the Shapley value explanations by exposing how Shapley value explanations work on average for datapoints of a particular predicted class (e.g., on average, someone who was denied a loan had their age matter most for the prediction). This global explanation would help risk analysts get a bird's-eye view of how a model behaves and whether it aligns with their expectations.

4.2.3 Heatmaps in Transportation

Organization B looks to detect facial expressions from video feeds of users driving. They hope to use explainability to identify the actions a user is performing while the user drives. Organization B has tried feature visualization and activation visualization techniques that get attributions by backpropagating gradients to regions of interest [4, 70]. Specifically, they use these probabilistic Winner-Take-All techniques (variants of existing gradient-based feature importance techniques [57, 62]) to localize the region of importance in the input space for a particular classification task. For example, when detecting a smile, they expect the mouth of the driver to be important.

Though none of these desired techniques have been deployed for the end user (the driver in this case), ML engineers at Organization B found these techniques useful for qualitative review. On tiny datasets, engineers can figure out which scenarios have false positives (videos falsely detected to contain smiles) and why. They can also identify if true positives are paying attention to the right place or if there is a problem with spurious artifacts.
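For reference, an individual heatmap of the kind reviewed in this workflow can be produced with a single backward pass. The sketch below uses plain input gradients in PyTorch on a placeholder convolutional "smile detector"; the architecture, input size, and preprocessing are assumptions for illustration, not Organization B's Winner-Take-All variant.

```python
# Minimal input-gradient saliency sketch (PyTorch). The tiny CNN is a placeholder
# classifier; any differentiable image model can be substituted.
import torch
import torch.nn as nn

model = nn.Sequential(                      # placeholder smile detector
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 2),                        # classes: {no smile, smile}
)
model.eval()

frame = torch.rand(1, 3, 64, 64, requires_grad=True)  # one video frame
score = model(frame)[0, 1]                  # logit for the "smile" class
score.backward()

# Heatmap: magnitude of the gradient of the class score w.r.t. each pixel,
# collapsed over color channels. Large values mark regions (e.g., the mouth)
# that most affect the prediction under an infinitesimal perturbation.
heatmap = frame.grad.abs().sum(dim=1).squeeze(0)
print(heatmap.shape)                        # torch.Size([64, 64])
```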
However, while trying to understand why the model erred by analyzing similarities in false positives, they have struggled to scale this local technique across heatmaps in aggregate across multiple videos. They are able to qualitatively evaluate a sequence of heatmaps for one video, but doing so across 100M frames simultaneously is far more difficult. Paraphrasing the VP of AI at Organization B, aggregating saliency maps across videos is moot and contains little information. Note that an individual heatmap is an example of a local explainability technique, but an aggregate heatmap for 100M frames would be a global technique. Unlike aggregating Shapley values for tabular data as done at Organization A, taking an expectation over heatmaps (in the statistical sense) does not work, since aggregating pixel attributions is meaningless. One option Organization B discussed would be to cluster low-dimensional representations of the heatmaps and then tag each cluster based on what the model is focusing on; unfortunately, humans would still have to manually label the clusters of important regions.

4.2.4 Spurious Correlations

Related to the model monitoring for feature drift detection discussed in Section 3.1, Organization B has encountered issues with spurious correlations in their smile detection models. Their Vice President of AI noted that "[ML engineers] must know to what extent you want ML to leverage highly correlated data to make classifications." Explainability can help identify models that focus on such a correlation and can find ways to have models ignore it. For example, there may be a side effect of a correlated facial expression or co-occurrence: cheek raising, for example, co-occurs with smiling. In a cheek-raise detector trained on the same dataset as a smile detector but with different labels, the model still focused on the mouth instead of the cheeks. Both models were fixated on a prevalent co-occurrence. Attending to the mouth was undesirable in the cheek-raise detector but allowed in the smile detector.

One way Organization B combats this is by using simpler models on top of complex feature engineering (a minimal sketch of this two-stage design follows the takeaways below). For example, they use black box deep learning models for building good descriptors that are robust across camera viewpoints and will detect different features that subject matter experts deem important for drowsiness. There is one model per important descriptor (i.e., one model for eyes closed, one for yawns, etc.). Then, they fit a simple model on the extracted descriptors such that the important descriptors are obvious for the final prediction of drowsiness. Ideally, if Organization B had guarantees about the disentanglement of data generating factors [3], they would be able to understand which factors (descriptors) play a role in downstream classification.

4.2.5 Feature Importance - Takeaways

(1) Shapley values are rigorously motivated, and approximate methods are simple to deploy for decision makers to sanity check the models they have built.
(2) Feature importance is not shown to end users, but is used by machine learning engineers as a sanity check. Looping other stakeholders (who make decisions based on model outputs) into the model development process is essential to understanding what type of explanations the model delivers.
(3) Heatmaps (and feature importance scores, in general) are hard to aggregate, which makes it hard to do false positive detection at scale.
(4) Spurious correlations can be detected with simple gradient-based techniques.
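As a rough illustration of the two-stage design described in Section 4.2.4, the sketch below assumes a set of pre-trained descriptor models whose scores feed an interpretable final model; the descriptor names, synthetic data, and choice of logistic regression are placeholder assumptions, not Organization B's production system.

```python
# Sketch: black-box descriptor models feed a simple, inspectable model whose
# coefficients make each descriptor's contribution to "drowsiness" obvious.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def descriptor_scores(frames):
    """Stand-in for per-descriptor deep models (eyes closed, yawn, head nod).
    In practice each column would come from a separate trained network."""
    return rng.random((len(frames), 3))

frames = np.zeros(1000)                      # placeholder for 1000 video frames
X = descriptor_scores(frames)                # shape (1000, 3)
y = (0.8 * X[:, 0] + 0.5 * X[:, 1] > 0.7).astype(int)  # synthetic drowsiness labels

clf = LogisticRegression().fit(X, y)         # simple final model on descriptors
for name, w in zip(["eyes_closed", "yawn", "head_nod"], clf.coef_[0]):
    print(f"{name}: weight {w:+.2f}")        # descriptor-level importance
```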
4.3 Counterfactual Explanations

Counterfactual explanations are techniques that explain individual predictions by providing a means for recourse. Contrastive explanations that highlight contextually relevant information to the model output are most similar to human explanations [40]; however, in their current form, it is not clear how to find the relevant set of plausible counterfactual points. Moreover, while some open source implementations for counterfactual explanations exist [65, 68], they either work for specific model types or are not black-box in nature. In this section, we discuss the formulation for counterfactual explanations and describe one solution for each deployed technique.

4.3.1 Formulation

Counterfactual explanations are points close to the input for which the decision of the classifier changes. For example, for a person who was rejected for a loan by an ML model, a counterfactual explanation might suggest: "Had your income been greater by $5000, the loan would have been granted." Given an input x, a classifier f, and a distance metric d, we find a counterfactual explanation c by solving the optimization problem:

$$\min_{c} \; d(x, c) \quad \text{s.t.} \quad f(x) \neq f(c) \qquad (1)$$

The method can be tailored to allow only certain relevant features to be changed. Note that the term counterfactual has a different meaning in the causality literature [27, 46]. Counterfactual explanations for ML were introduced by Wachter et al. [66]. Sharma et al. [55] provide details on existing techniques.

4.3.2 Counterfactual Explanations in Healthcare

Organization C uses a faster version of the formulation in Sharma et al. [55] to find counterfactual explanations for projects in healthcare. When people apply for Medicare, Organization C hopes to flag if a user's application has errors and to provide explanations on how to correct the errors. Moreover, ML engineers can use the robustness score to compare different models trained using this data: this robustness score is effectively a suitably normalized and averaged distance between the counterfactual and the original point in Euclidean space. The original formulation makes use of a slower genetic algorithm, so they optimized the counterfactual explanation generation process. They are currently developing a first-of-its-kind application that can directly take in any black-box model and data and return a robustness score, fairness measure, and counterfactual explanations, all from a single underlying algorithm.

The use of this approach has several advantages: it can be applied to black-box models, works for any input data type, and generates multiple explanations in a single run of the algorithm. However, there are some shortcomings that Organization C is addressing. One challenge of counterfactual models is that the counterfactual might not be feasible. Organization C addresses this by using the training data to guide the counterfactual generation process and by providing a user interface that allows domain experts to specify constraints. The flexibility of the counterfactual approach comes with a drawback common among explanations for black-box models: there is no guarantee of the optimality of the explanation, since black-box techniques cannot guarantee optimality.

Through the creation of a deployed solution for this method, the organization realized that clients would ideally want an explainability score, along with a measure of fairness and robustness; as such, they have developed an explainability score that can be used to compare the explainability of different models.

4.3.3 Counterfactual Explanations - Takeaways

(1) Organizations are interested in counterfactual explanation solutions since the underlying method is flexible and such explanations are easy for end users to understand.
(2) It is not clear exactly what should be optimized for when generating a counterfactual or how to do it efficiently. Still, approximate solutions may suffice in practical applications (a minimal black-box search sketch follows below).
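As a rough illustration of Eq. (1), the sketch below implements the crude approximation mentioned in Section 3.4: among training points that the classifier assigns to the other class, return the one closest to x under Euclidean distance. The classifier and data are placeholders, and this is not Organization C's algorithm, which additionally handles feasibility constraints, fairness, and robustness scoring.

```python
# Crude black-box counterfactual: the nearest candidate whose predicted class
# differs from that of x, an approximation of the optimization in Eq. (1).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=5, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)   # black box f

def counterfactual(x, model, candidates):
    """Return the candidate closest to x (l2) for which the prediction flips."""
    pred = model.predict(x.reshape(1, -1))[0]
    flipped = candidates[model.predict(candidates) != pred]
    dists = np.linalg.norm(flipped - x, axis=1)
    return flipped[np.argmin(dists)], dists.min()

x = X[0]
c, dist = counterfactual(x, model, X)
print("prediction:", model.predict(x.reshape(1, -1))[0],
      "counterfactual prediction:", model.predict(c.reshape(1, -1))[0])
print("distance d(x, c):", round(float(dist), 3))
print("feature changes:", np.round(c - x, 2))   # the suggested recourse
```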
4.4 Adversarial Training

In order to ensure that the model being deployed is robust to adversaries and behaves as intended, many organizations we interviewed use adversarial training to improve performance. It has recently been shown that, in fact, this can also lead to more human-interpretable features [30].

4.4.1 Formulation

Other works have also explored the intersection between adversarial robustness and model interpretations [18, 21, 23, 59, 69]. The claim of one of these works is that the closest adversarial example should perturb "fragile" features, enabling the model to fit to robust features (indicative of a particular class) [30]. The setup of feature importance in the adversarial training setting from Singla et al. [59] is as follows:

$$g(f, x) = \max_{\tilde{x}} \; L(f_{\theta^*}(\tilde{x}), y) \quad \text{s.t.} \quad \|\tilde{x} - x\|_0 \le k, \quad \|\tilde{x} - x\|_2 \le \rho$$

We let |x̃ − x| be the top-k feature importance scores of the input x. This is similar to the adversarial example setup, which is usually written in the same manner as the above (without the ℓ0 norm to limit the number of features that are changed). It is also interesting to note that the formulation for finding counterfactual explanations above matches the formulation for finding adversarial examples. Sharma et al. [55] use this connection to generate adversarial examples and define a black-box model robustness score.

4.4.2 Image Content Moderation

Organization D moderates user-generated content (UGC) on several public platforms. Specifically, the R&D team at Organization D developed several models to detect adult and violent content in users' uploaded images. Their quality assurance (QA) team measures model robustness to improve content detection accuracy under the threat of adversarial examples. The robustness of a content moderation model is measured by the minimum perturbation required for an image to evade detection. Given a gradient-based image classification model f : R^d → {1, ..., K}, we assume f(x) = argmax_i Z(x)_i, where Z(x) ∈ R^K is the final (logit) layer output and Z(x)_i is the prediction score for the i-th class. The objective can be formulated as the following optimization problem to find the minimum perturbation:

$$\arg\min_{x} \; \{\, d(x, x_0) + c \cdot L(f(x), y) \,\} \qquad (2)$$

Here d(·, ·) is some distance measure that Organization D chooses to be the ℓ2 distance in Euclidean space, L(·) is the cross-entropy loss function, and c is a balancing factor. As is common in the adversarial literature, Organization D applies Projected Gradient Descent (PGD) to search for the minimum perturbation from the set of allowable perturbations S ⊆ R^d [36]. The search process can be formulated as

$$x^{t+1} = \Pi_{x+S}\left( x^{t} + \alpha \, \mathrm{sgn}\left( \nabla_x L(f_{\theta^*}(x), y) \right) \right)$$

until x^t is misclassified by the detection model. ML engineers on the QA team are shown an ℓ2-norm perturbation distance averaged over n test images randomly sampled from the test dataset. The larger the average perturbation, the more robust the model is, as it takes greater effort for an attacker to evade detection. The average perturbation required is widely used as a metric when comparing different candidate models and different versions of a given model.
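A minimal PGD-style sketch of this robustness measurement, assuming a differentiable PyTorch classifier, is given below; the toy model, step size, ℓ∞ projection radius, and iteration budget are placeholder choices, not Organization D's settings.

```python
# PGD search for a small perturbation that flips the model's prediction,
# reporting the resulting l2 distance as a per-image robustness measure.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 2))  # toy detector
model.eval()

def min_perturbation_l2(x0, label, eps=0.1, alpha=0.01, steps=100):
    """Run PGD within an l-infinity ball of radius eps; stop when the label flips."""
    x = x0.clone()
    for _ in range(steps):
        x.requires_grad_(True)
        loss = F.cross_entropy(model(x), label)
        grad = torch.autograd.grad(loss, x)[0]
        with torch.no_grad():
            x = x + alpha * grad.sign()                    # gradient ascent step
            x = x0 + torch.clamp(x - x0, -eps, eps)        # project onto x0 + S
            x = x.clamp(0, 1)                              # keep a valid image
        if model(x).argmax(dim=1).item() != label.item():  # evaded detection
            break
    return torch.norm(x - x0).item()

x0 = torch.rand(1, 3, 32, 32)
label = model(x0).argmax(dim=1)
print("l2 perturbation needed:", round(min_perturbation_l2(x0, label), 4))
```

In a deployment-style comparison, this per-image distance would be averaged over a random sample of test images, with a larger average indicating a more robust candidate model.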
Organization D finds that more robust models have more convincing gradient-based explanations, i.e., the gradient of the output with respect to the input shows that the model is focusing on relevant portions of the images, confirming recent research [21, 30, 64].

4.4.3 Text Content Moderation

Organization E uses text content moderation algorithms on its UGC platforms, such as forums. Its QA team is responsible for the reliability and robustness of a sentiment analysis model, which labels posts as positive or negative, trained on UGC. The QA team seeks to find the minimum perturbation required to change the classification of a post. In particular, they want to know how to take misclassified posts (e.g., negative ones classified as positive) and change them to the correct class. Given a sentiment analysis model f : X → Y, which maps from feature space X to a set of classes Y, an adversary aims to generate an adversarial post x_adv from the original post x ∈ X, whose ground truth label is f(x) = y ∈ Y, so that f(x_adv) ≠ y. The QA team tries to minimize d(x, x_adv) for a domain-specific distance function. Organization E uses the ℓ2 distance in the embedding space, but it is equally valid to use the editing distance [42]; note that the perturbation technique changes accordingly.

In practice, to find the minimum distance in embedding space, Organization E chooses to iteratively modify the words in the original post, starting from the words with the highest importance. Here importance is defined as the gradient of the model output with respect to a particular word. ML engineers compute the Jacobian matrix of the given post x = (x_1, x_2, ..., x_N), where x_i is the i-th word. The Jacobian matrix is as follows:

$$J_f(x) = \frac{\partial f(x)}{\partial x} = \left[ \frac{\partial f_j(x)}{\partial x_i} \right]_{i \in 1 \ldots N,\; j \in 1 \ldots K} \qquad (3)$$

where K represents the number of classes (in this case K = 2), and f_j(·) represents the confidence value of the j-th class. The importance of word x_i is defined as

$$C_{x_i} = J_{f, i, y} = \frac{\partial f_y(x)}{\partial x_i} \qquad (4)$$

i.e., the partial derivative of the confidence value for the predicted class y with respect to the input word x_i. This procedure ranks the words by their impact on the sentiment analysis results. The QA team then applies a set of transformations/perturbations to the most important words to find the minimum number of important words that must be perturbed in order to flip a sentiment analysis API result.

4.4.4 Adversarial Training - Takeaways

(1) There is a relation between model robustness and explainability. Model robustness improves the quality of feature importances (specifically saliency maps), confirming recent research findings [21].
(2) Feature importance helps find minimal adversarial perturbations for language models in practice.

4.5 Influential Samples

This technique asks the question: Which data point in the training dataset x ∈ D_x is most influential to the model's output f(x_test) for a test point x_test? Statisticians have used measures like Cook's distance [15], which measure the effect of deleting a data point on the model output. However, such measures require an exhaustive search and hence do not scale well for larger datasets.
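This deletion-based notion of influence can be made concrete with a brute-force leave-one-out loop, sketched below on a small synthetic logistic regression; as noted above, this exhaustive retraining is exactly what does not scale, which motivates the influence-function approximation described next. The data and model are illustrative placeholders.

```python
# Brute-force leave-one-out influence: retrain without each training point and
# measure how much the loss at a test point changes (Cook's-distance style).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, y_train, x_test, y_test = X[:150], y[:150], X[150:151], y[150:151]

def test_loss(model):
    return log_loss(y_test, model.predict_proba(x_test), labels=[0, 1])

base = test_loss(LogisticRegression().fit(X_train, y_train))

influence = []
for i in range(len(X_train)):               # O(N) retrainings: does not scale
    keep = np.arange(len(X_train)) != i
    m = LogisticRegression().fit(X_train[keep], y_train[keep])
    influence.append(test_loss(m) - base)   # positive: removing i hurts the test point

most = int(np.argmax(np.abs(influence)))
print("most influential training point:", most, "delta loss:", round(influence[most], 4))
```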
4.5.1 Formulation

For over half of the organizations, influence functions have been the tool of choice for explaining which training points are influential to the model's output for a point x [32] (though only one organization actually deployed the technique). We let L(f_θ, x) be the model's loss for point x. The empirical risk minimizer is given by

$$\hat{f}_\theta = \arg\min_{\theta \in \Theta} \frac{1}{N} \sum_{i=1}^{N} L(f_\theta, x^{(i)})$$

Note that y_x = f̂_θ(x) is the predicted output at x with the trained risk minimizer. Koh and Liang [32] define the most influential data point z for a fixed point x as that which maximizes the following:

$$I_{\mathrm{up,loss}}(z, x) = -\nabla_\theta L\big(\hat{f}_\theta(x), y_x\big)^{\top} \, H_{\hat{f}_\theta}^{-1} \, \nabla_\theta L\big(\hat{f}_\theta(z), y_x\big)$$

This quantity measures the effect of upweighting datapoint z on the loss at x. The goal of sample importance is to uncover which training examples, when perturbed, would have the largest effect (positive or negative) on the loss of a test point.

4.5.2 Influence Functions in Insurance

Organization F uses influence functions to explain risk models in the insurance industry. They hope to identify which customers might see an increase in their premiums based on their driving history in the past. The organization hopes to divulge to the end user how the premiums for drivers similar to them are priced. In other words, they hope to identify the influential training data points [32] to understand which past drivers had the greatest influence on the prediction for the observed driver. Unfortunately, Organization F has struggled to provide this information to end users, since the Hessian computation makes doing so impractical: the latency is too high.

More pressingly, even when Organization F lets the influence function procedure run, they find that many influential data points are simply outliers that are important for all drivers, since those anomalous drivers are far out of distribution. As a result, instead of identifying which drivers are most similar to a given driver, the influential sample explanation identifies drivers that are very different from any driver (i.e., outliers). While this could in theory be useful for outlier detection, it prevents the explanations from being used in practice.

4.5.3 Influential Samples - Takeaways

(1) Influence functions can be intractable for large datasets; as such, significant effort is needed to improve these methods to make them easy to deploy in practice.
(2) Influence functions can be sensitive to outliers in the data, such that they might be more useful for outlier detection than for providing end users explanations.

5 RECOMMENDATIONS

This section provides recommendations for organizations based on the key takeaways in Section 3.4 and the technique-specific takeaways in Section 4. In order to address the challenges organizations face when striving to provide explanations to end users, we recommend a framework for establishing clear desiderata in explainability and then discuss concerns associated with explainability.

5.1 Establish Clear Desiderata

Most organizations we spoke to solely deploy explainability techniques for internal engineers and scientists, as a debugging mechanism or as a sanity check. At the same time, these organizations affirmed the importance of understanding the stakeholder, and hope to be able to explain a model prediction to the end user.
Once the target population of the explanation is understood, organizations can devise and deploy explainability techniques accordingly. We propose the following three steps for establishing clear desiderata and improving decision making around explainability: clearly identifying the target population, understanding their needs, and clarifying the intention of the explanation.

(1) Identify stakeholders. Who are your desired explanation consumers? Typically this will be those affected by or shown model outputs. Preece et al. [49] describe how stakeholders have different needs for explainability. Distinctions between these groups can help design better explanation techniques.
(2) Engage with each stakeholder. Ask the stakeholder some variant of "What would you need the model to explain to you in order to understand, trust, or contest the model prediction?" and "What type of explanation do you want from your model?" Doshi-Velez and Kim [19] highlight how the task being modeled dictates what type of explanation the human will need from the model.
(3) Understand the purpose of the explanation. Once the context and utility of the explanation are stated, understand what the stakeholder wants to do with the explanation [24].
   • Static Consumption: Will the explanation be used as a one-off sanity check for some stakeholders or shown to other stakeholders as reasoning for a particular prediction [48]?
   • Dynamic Model Updates: Will the explanation be used to garner feedback from the stakeholder as to how the model ought to be updated to better align with their intuition? That is, how does the stakeholder interact with the model after viewing the explanation? Ross et al. [51] attempt to develop a technique for dynamic explanations, wherein the human can guide the model towards learning the correct explanation.

Once desiderata are clarified, stakeholders should be consulted again.

5.2 Concerns of Explainability

While there are positive reasons to encourage explainability of ML models, we note some concerns raised in our interviews.

5.2.1 On Causality

One chief scientist told us that "Figuring out causal factors is the holy grail of explainability." However, causal explanations are largely lacking in the literature, with a few exceptions [13]. Though non-causal explanations can still provide valid and useful interpretations of how the model works [37], many organizations said that they would be keen to use causal explanations if they were available.

5.2.2 On Privacy

Three organizations mentioned data privacy in the context of explainability, since in some cases explanations can be used to learn about the model [38, 63] or the training data [56]. Methods to counter these concerns have been developed. For example, Harder et al. [25] develop a methodology for training a differentially private model that generates local and global explanations using locally linear maps.

5.2.3 On Improving Performance

One purpose of explanations is to improve ML engineers' understanding of their models, in order to help them refine and improve performance. Since machine learning models are "dual use" [12], we should be aware that in some settings, explanations or other tools could enable malicious users to increase the capabilities and performance of undesirable systems.
For example, several organizations we talked with use explanation methods to improve their natural language processing and image recognition models for content moderation in ways that may concern some stakeholders.

5.2.4 Beyond Deep Learning

Though deep learning has gained popularity in recent years, many organizations still use classical ML techniques (e.g., logistic regression, support vector machines), likely due to a need for simpler, more interpretable models [52]. Many in the explainability community have focused on interpreting black-box deep learning models, even though practitioners feel that there is a dearth of model-specific techniques to understand traditional ML models. For example, one research scientist noted that, "Many [financial institutions] use kernel-based methods on tabular data." As a result, there is a desire to translate explainability techniques for kernel support vector machines in genomics [58] to models trained on tabular data. Model-agnostic techniques like Lundberg and Lee [34] can be used for traditional models, but are "likely overkill" for explaining kernel-based ML models, according to one research scientist, since model-agnostic methods can be computationally expensive and lead to poorly approximated explanations.

6 CONCLUSION

In this study, we critically examine how explanation techniques are used in practice. We are the first, to our knowledge, to interview various organizations on how they deploy explainability in their ML workflows, concluding with salient directions for future research. We found that while ML engineers are increasingly using explainability techniques as sanity checks during the development process, there are still significant limitations to current techniques that prevent their use to directly inform end users. These limitations include the need for domain experts to evaluate explanations, the risk of spurious correlations reflected in model explanations, the lack of causal intuition, and the latency in computing and showing explanations in real-time. Future research should seek to address these limitations. We also highlighted the need for organizations to establish clear desiderata for their explanation techniques and to be cognizant of the concerns associated with explainability. Through this analysis, we take a step towards describing explainability deployment and hope that future research builds trustworthy explainability solutions.

7 ACKNOWLEDGMENTS

The authors would like to thank the following individuals for their advice, contributions, and/or support: Karina Alexanyan (Partnership on AI), Gagan Bansal (University of Washington), Rich Caruana (Microsoft), Amit Dhurandhar (IBM), Krishna Gade (Fiddler Labs), Konstantinos Georgatzis (QuantumBlack), Jette Henderson (CognitiveScale), Bahador Kaleghi (Element AI), Hima Lakkaraju (Harvard University), Katherine Lewis (Partnership on AI), Peter Lo (Partnership on AI), Terah Lyons (Partnership on AI), Saayeli Mukherji (Partnership on AI), Erik Pazos (QuantumBlack), Inioluwa Deborah Raji (Partnership on AI + AI Now), Nicole Rigillo (Element AI), Francesca Rossi (IBM), Jay Turcot (Affectiva), Kush Varshney (IBM), Dennis Wei (IBM), Edward Zhong (Baidu), Gabi Zijderveld (Affectiva), and ten other anonymous individuals.
UB acknowledges support from DeepMind via the Leverhulme Centre for the Future of Intelligence (CFI) and the Partnership on AI research fellowship. AW acknowledges support from the David MacKay Newton research fellowship at Darwin College, The Alan Turing Institute under EPSRC grant EP/N510129/1 & TU/B/000074, and the Leverhulme Trust via CFI.

REFERENCES

[1] 2019. IBM's Principles for Data Trust and Transparency. https://www.ibm.com/blogs/policy/trust-principles/
[2] 2019. Our approach: Microsoft AI principles. https://www.microsoft.com/en-us/ai/our-approach-to-ai
[3] Tameem Adel, Zoubin Ghahramani, and Adrian Weller. 2018. Discovering interpretable representations for both deep generative and discriminative models. In International Conference on Machine Learning. 50–59.
[4] Sarah Adel Bargal, Andrea Zunino, Donghyun Kim, Jianming Zhang, Vittorio Murino, and Stan Sclaroff. 2018. Excitation backprop for RNNs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1440–1449.
[5] Oscar Alvarado and Annika Waern. 2018. Towards algorithmic experience: Initial efforts for social media contexts. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. ACM, 286.
[6] Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. 2016. Concrete problems in AI safety. arXiv preprint arXiv:1606.06565 (2016).
[7] Marco Ancona, Enea Ceolini, Cengiz Oztireli, and Markus Gross. 2018. Towards better understanding of gradient-based attribution methods for Deep Neural Networks. In 6th International Conference on Learning Representations (ICLR 2018).
[8] Marco Ancona, Cengiz Oztireli, and Markus Gross. 2019. Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Value Approximation. In Proceedings of the 36th International Conference on Machine Learning (Proceedings of Machine Learning Research), Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.), Vol. 97. PMLR, Long Beach, California, USA, 272–281.
[9] David Baehrens, Timon Schroeter, Stefan Harmeling, Motoaki Kawanabe, Katja Hansen, and Klaus-Robert Müller. 2010. How to explain individual classification decisions. Journal of Machine Learning Research 11, Jun (2010), 1803–1831.
[10] Been Kim, Rajiv Khanna, and Sanmi Koyejo. 2016. Examples are not Enough, Learn to Criticize! Criticism for Interpretability. In Advances in Neural Information Processing Systems.
[11] Umang Bhatt, Pradeep Ravikumar, and José MF Moura. 2019. Towards aggregating weighted feature attributions. arXiv preprint arXiv:1901.10040 (2019).
[12] Miles Brundage, Shahar Avin, Jack Clark, Helen Toner, Peter Eckersley, Ben Garfinkel, Allan Dafoe, Paul Scharre, Thomas Zeitzoff, Bobby Filar, et al. 2018. The malicious use of artificial intelligence: Forecasting, prevention, and mitigation. arXiv preprint arXiv:1802.07228 (2018).
[13] Aditya Chattopadhyay, Piyushi Manupriya, Anirban Sarkar, and Vineeth N Balasubramanian. 2019. Neural Network Attributions: A Causal Perspective. In Proceedings of the 36th International Conference on Machine Learning (Proceedings of Machine Learning Research), Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.), Vol. 97. PMLR, Long Beach, California, USA, 981–990.
[14] Jianbo Chen, Le Song, Martin J Wainwright, and Michael I Jordan. [n. d.]. L-Shapley and C-Shapley: Efficient model interpretation for structured data. 7th International Conference on Learning Representations (ICLR 2019) ([n. d.]).
[15] R Dennis Cook. 1977. Detection of influential observation in linear regression. Technometrics 19, 1 (1977), 15–18.
[16] Jeffrey De Fauw, Joseph R Ledsam, Bernardino Romera-Paredes, Stanislav Nikolov, Nenad Tomasev, Sam Blackwell, Harry Askham, Xavier Glorot, Brendan O'Donoghue, Daniel Visentin, et al. 2018. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nature Medicine 24, 9 (2018), 1342.
[17] Amit Dhurandhar, Karthikeyan Shanmugam, Ronny Luss, and Peder A Olsen. 2018. Improving simple models with confidence profiles. In Advances in Neural Information Processing Systems. 10296–10306.
[18] Ann-Kathrin Dombrowski, Maximilian Alber, Christopher J Anders, Marcel Ackermann, Klaus-Robert Müller, and Pan Kessel. 2019. Explanations can be manipulated and geometry is to blame. arXiv preprint arXiv:1906.07983 (2019).
[19] Finale Doshi-Velez and Been Kim. 2017. Towards A Rigorous Science of Interpretable Machine Learning. (2017).
[20] William Du Mouchel. 2002. Data squashing: constructing summary data sets. In Handbook of Massive Data Sets. Springer, 579–591.
[21] Christian Etmann, Sebastian Lunz, Peter Maass, and Carola Schoenlieb. 2019. On the Connection Between Adversarial Robustness and Saliency Map Interpretability. In Proceedings of the 36th International Conference on Machine Learning (Proceedings of Machine Learning Research), Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.), Vol. 97. PMLR, Long Beach, California, USA, 1823–1832.
[22] Ruth Fong and Andrea Vedaldi. 2017. Interpretable Explanations of Black Boxes by Meaningful Perturbation. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV). https://doi.org/10.1109/ICCV.2017.371 arXiv:1704.03296
[23] Amirata Ghorbani, Abubakar Abid, and James Zou. 2019. Interpretation of neural networks is fragile. AAAI (2019).
[24] Leilani H Gilpin, David Bau, Ben Z Yuan, Ayesha Bajwa, Michael Specter, and Lalana Kagal. 2018. Explaining explanations: An overview of interpretability of machine learning. In 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA). IEEE, 80–89.
[25] Frederik Harder, Matthias Bauer, and Mijung Park. 2019. Interpretable and Differentially Private Predictions. arXiv preprint arXiv:1906.02004 (2019).
[26] JB Heaton, Nicholas G Polson, and Jan Hendrik Witte. 2016. Deep learning in finance. arXiv preprint arXiv:1602.06561 (2016).
[27] Paul W. Holland. 1986. Statistics and Causal Inference. J. Amer. Statist. Assoc. 81, 396 (1986), 945–960.
[28] Kenneth Holstein, Jennifer Wortman Vaughan, Hal Daumé III, Miro Dudik, and Hanna Wallach. 2019. Improving fairness in machine learning systems: What do industry practitioners need?. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. ACM, 600.
[29] Giles Hooker and Lucas Mentch. 2019. Please Stop Permuting Features: An Explanation and Alternatives. arXiv preprint arXiv:1905.03151 (2019).
[30] Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, and Aleksander Madry. 2019. Adversarial Examples Are Not Bugs, They Are Features. arXiv:1905.02175. http://arxiv.org/abs/1905.02175
[31] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, and Rory Sayres. 2017. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). arXiv preprint arXiv:1711.11279 (2017).
[32] Pang Wei Koh and Percy Liang. 2017. Understanding black-box predictions via influence functions. In Proceedings of the 34th International Conference on Machine Learning - Volume 70 (ICML 2017). Journal of Machine Learning Research, 1885–1894.
[33] Bruno Lepri, Nuria Oliver, Emmanuel Letouzé, Alex Pentland, and Patrick Vinck. 2018. Fair, transparent, and accountable algorithmic decision-making processes. Philosophy & Technology 31, 4 (2018), 611–627.
[34] Scott M Lundberg and Su-In Lee. 2017. A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems 30 (NeurIPS 2017), I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., 4765–4774.
[35] Scott M Lundberg, Bala Nair, Monica S Vavilala, Mayumi Horibe, Michael J Eisses, Trevor Adams, David E Liston, Daniel King-Wai Low, Shu-Fang Newman, Jerry Kim, et al. 2018. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nature Biomedical Engineering 2, 10 (2018), 749.
[36] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2017. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017).
[37] Tim Miller. 2018. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence (2018).
[38] Smitha Milli, Ludwig Schmidt, Anca Dragan, and Moritz Hardt. 2019. Model Reconstruction from Model Explanations. In Proceedings of ACM FAT* 2019.
[39] Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. 2019. Model cards for model reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM, 220–229.
[40] Brent Mittelstadt, Chris Russell, and Sandra Wachter. 2019. Explaining explanations in AI. In Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM, 279–288.
[41] Grégoire Montavon, Sebastian Lapuschkin, Alexander Binder, Wojciech Samek, and Klaus-Robert Müller. 2017. Explaining nonlinear classification decisions with deep Taylor decomposition. Pattern Recognition 65 (2017), 211–222.
[42] Yilin Niu, Chao Qiao, Hang Li, and Minlie Huang. 2018. Word Embedding based Edit Distance. arXiv preprint arXiv:1810.10752 (2018).
[43] Board of Governors of the Federal Reserve System. 2011. Supervisory Guidance on Model Risk Management. https://www.federalreserve.gov/supervisionreg/srletters/sr1107a1.pdf (2011).
[44] Onora O'Neill. 2018. Linking trust to trustworthiness. International Journal of Philosophical Studies 26, 2 (2018), 293–300.
[45] European Parliament and Council of European Union. 2018. European Union General Data Protection Regulation, Articles 13-15. http://www.privacy-regulation.eu/en/13.htm (2018).
[46] Judea Pearl. 2000. Causality: models, reasoning and inference. Vol. 29. Springer.
[47] Fábio Pinto, Marco OP Sampaio, and Pedro Bizarro. 2019. Automatic Model Monitoring for Data Streams. arXiv preprint arXiv:1908.04240 (2019).
[48] Forough Poursabzi-Sangdeh, Daniel G Goldstein, Jake M Hofman, Jennifer Wortman Vaughan, and Hanna Wallach. 2018. Manipulating and measuring model interpretability. arXiv preprint arXiv:1802.07810 (2018).
[49] Alun Preece, Dan Harborne, Dave Braines, Richard Tomsett, and Supriyo Chakraborty. 2018. Stakeholders in explainable AI. arXiv preprint arXiv:1810.00184 (2018).
[50] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1135–1144.
[51] Andrew Slavin Ross, Michael C Hughes, and Finale Doshi-Velez. 2017. Right for the right reasons: training differentiable models by constraining their explanations. In Proceedings of the 26th International Joint Conference on Artificial Intelligence. AAAI Press, 2662–2670.
[52] Cynthia Rudin. 2019. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence 1, 5 (2019), 206.
[53] Andrew D Selbst and Solon Barocas. 2018. The intuitive appeal of explainable machines. Fordham L. Rev. 87 (2018), 1085.
[54] Lloyd S Shapley. 1953. A Value for n-Person Games. In Contributions to the Theory of Games II. 307–317.
[55] Shubham Sharma, Jette Henderson, and Joydeep Ghosh. 2019. CERTIFAI: Counterfactual Explanations for Robustness, Transparency, Interpretability, and Fairness of Artificial Intelligence models. arXiv preprint arXiv:1905.07857 (2019).
[56] Reza Shokri, Martin Strobel, and Yair Zick. 2019. Privacy Risks of Explaining Machine Learning Models. arXiv preprint arXiv:1907.00164 (2019).
[57] Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. 2017. Learning important features through propagating activation differences. In Proceedings of the 34th International Conference on Machine Learning - Volume 70 (ICML 2017). Journal of Machine Learning Research, 3145–3153.
[58] Avanti Shrikumar, Eva Prakash, and Anshul Kundaje. 2018. GkmExplain: Fast and Accurate Interpretation of Nonlinear Gapped k-mer Support Vector Machines Using Integrated Gradients. BioRxiv (2018), 457606.
[59] Sahil Singla, Eric Wallace, Shi Feng, and Soheil Feizi. 2019. Understanding Impacts of High-Order Loss Approximations and Features in Deep Learning Interpretation. In Proceedings of the 36th International Conference on Machine Learning (Proceedings of Machine Learning Research), Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.), Vol. 97. PMLR, Long Beach, California, USA, 5848–5856.
[60] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. 2017. SmoothGrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825 (2017).
[61] Erik Štrumbelj and Igor Kononenko. 2014. Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems 41, 3 (2014), 647–665.
[62] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. 2017. Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning - Volume 70 (ICML 2017). Journal of Machine Learning Research, 3319–3328.
[63] Florian Tramèr, Fan Zhang, Ari Juels, Michael K Reiter, and Thomas Ristenpart. 2016. Stealing machine learning models via prediction APIs. In 25th USENIX Security Symposium (USENIX Security 16). 601–618.
[64] Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, and Aleksander Madry. 2019. Robustness May Be at Odds with Accuracy. In International Conference on Learning Representations. https://openreview.net/forum?id=SyxAb30cY7
[65] Berk Ustun, Alexander Spangher, and Yang Liu. 2019. Actionable recourse in linear classification. In Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM, 10–19.
[66] Sandra Wachter, Brent Mittelstadt, and Chris Russell. 2017. Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR. Harv. JL & Tech. 31 (2017), 841.
[67] Adrian Weller. 2019. Transparency: motivations and challenges. In Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. Springer, 23–40.
[68] James Wexler, Mahima Pushkarna, Tolga Bolukbasi, Martin Wattenberg, Fernanda Viegas, and Jimbo Wilson. 2019. The What-If Tool: Interactive Probing of Machine Learning Models. arXiv preprint arXiv:1907.04135 (2019).
[69] Chih-Kuan Yeh, Cheng-Yu Hsieh, Arun Sai Suggala, David Inouye, and Pradeep Ravikumar. 2019. How Sensitive are Sensitivity-Based Explanations? arXiv preprint arXiv:1901.09392 (2019).
[70] Jianming Zhang, Sarah Adel Bargal, Zhe Lin, Jonathan Brandt, Xiaohui Shen, and Stan Sclaroff. 2018. Top-down neural attention by excitation backprop. International Journal of Computer Vision 126, 10 (2018), 1084–1102.
[71] Yujia Zhang, Kuangyan Song, Yiming Sun, Sarah Tan, and Madeleine Udell. 2019. "Why Should You Trust My Explanation?" Understanding Uncertainty in LIME Explanations.
