Explainable Machine Learning in Deployment



Umang Bhatt (1,2,3,4), Alice Xiang (2), Shubham Sharma (5), Adrian Weller (3,4,6), Ankur Taly (7), Yunhan Jia (8), Joydeep Ghosh (5,9), Ruchir Puri (10), José M. F. Moura (1), Peter Eckersley (2)

(1) Carnegie Mellon University, (2) Partnership on AI, (3) University of Cambridge, (4) Leverhulme CFI, (5) University of Texas at Austin, (6) The Alan Turing Institute, (7) Fiddler Labs, (8) Baidu, (9) CognitiveScale, (10) IBM Research

ABSTRACT

Explainable machine learning offers the potential to provide stakeholders with insights into model behavior by using various methods such as feature importance scores, counterfactual explanations, or influential training data. Yet there is little understanding of how organizations use these methods in practice. This study explores how organizations view and use explainability for stakeholder consumption. We find that, currently, the majority of deployments are not for end users affected by the model but rather for machine learning engineers, who use explainability to debug the model itself. There is thus a gap between explainability in practice and the goal of transparency, since explanations primarily serve internal stakeholders rather than external ones. Our study synthesizes the limitations of current explainability techniques that hamper their use for end users. To facilitate end user interaction, we develop a framework for establishing clear goals for explainability. We end by discussing concerns raised regarding explainability.

CCS CONCEPTS

• Human-centered computing; • Social and professional topics → Socio-technical systems; • Computing methodologies → Machine learning

KEYWORDS

machine learning, explainability, transparency, deployed systems, qualitative study

ACM Reference Format:
Umang Bhatt, Alice Xiang, Shubham Sharma, Adrian Weller, Ankur Taly, Yunhan Jia, Joydeep Ghosh, Ruchir Puri, José M. F. Moura, and Peter Eckersley. 2020. Explainable Machine Learning in Deployment. In Conference on Fairness, Accountability, and Transparency (FAT* '20), January 27–30, 2020, Barcelona, Spain. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3351095.3375624

© 2020 Copyright held by the owner/author(s). ACM ISBN 978-1-4503-6936-7/20/02.

1 INTRODUCTION

Machine learning (ML) models are being increasingly embedded into many aspects of daily life, such as healthcare [16], finance [26], and social media [5]. To build ML models worthy of human trust, researchers have proposed a variety of techniques for explaining ML models to stakeholders. Deemed "explainability," this body of previous work attempts to illuminate the reasoning used by ML models. "Explainability" loosely refers to any technique that helps the user or developer of ML models understand why models behave the way they do.
Explanations can come in many forms: from telling patients which symptoms were indicative of a particular diagnosis [35] to helping factory workers analyze inefficiencies in a production pipeline [17].

Explainability has been touted as a way to enhance the transparency of ML models [33]. Transparency includes a wide variety of efforts to provide stakeholders, particularly end users, with relevant information about how a model works [67]. One form of this would be to publish an algorithm's code, though this type of transparency would not provide an intelligible explanation to most users. Another form would be to disclose properties of the training procedure and datasets used [39]. Users, however, are generally not equipped to understand how raw data and code translate into benefits or harms that might affect them individually. By providing an explanation for how the model made a decision, explainability techniques seek to provide transparency directly targeted to human users, often aiming to increase trustworthiness [44]. The importance of explainability as a concept has been reflected in legal and ethical guidelines for data and ML [53]. In cases of automated decision-making, Articles 13-15 of the European General Data Protection Regulation (GDPR) require that data subjects have access to "meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing for the data subject" [45]. In addition, technology companies have released artificial intelligence (AI) principles that include transparency as a core value, including notions of explainability, interpretability, or intelligibility [1, 2].

With growing interest in "peering under the hood" of ML models and in providing explanations to human users, explainability has become an important subfield of ML. Despite a burgeoning literature, there has been little work characterizing how explanations have been deployed by organizations in the real world.

In this paper, we explore how organizations have deployed local explainability techniques so that we can observe which techniques work best in practice, report on the shortcomings of existing techniques, and recommend paths for future research. We focus specifically on local explainability techniques since these techniques explain individual predictions, making them typically the most relevant form of model transparency for end users. Our study synthesizes interviews with roughly fifty people from approximately thirty organizations to understand which explainability techniques are used and how. We report trends from two sets of interviews and provide recommendations to organizations deploying explainability. To the best of our knowledge, we are the first to conduct a study of how explainability techniques are used by organizations that deploy ML models in their workflows. Our main contributions are threefold:

• We interview around twenty data scientists, who are not currently using explainability tools, to understand their organization's needs for explainability.
• We interview around thirty different individuals on how their organizations have deployed explainability techniques, reporting case studies and takeaways for each technique.
• We suggest a framework for organizations to clarify their goals for deploying explainability.

The rest of this paper is organized as follows:
(1) We discuss the methodology of our survey in Section 2.
(2) We summarize our overall findings in Section 3.
(3) We detail how local explainability techniques are used at various organizations and discuss technique-specific takeaways in Section 4.
(4) We develop a framework for establishing clear goals when deploying local explainability in Section 5.1 and discuss concerns of explainability in Section 5.2.

2 METHODOLOGY

In the spirit of Holstein et al. [28], we study how industry practitioners look at and deploy explainable ML. Specifically, we study how particular organizations deploy explainability algorithms, including who consumes the explanation and how it is evaluated for the intended stakeholder. We conduct two sets of interviews: Group 1 examined how data scientists who are not currently using explainable machine learning hope to leverage various explainability tools, while Group 2, the crux of this paper, examined how explainable machine learning has been deployed in practice.

For Group 1, Fiddler Labs led a set of around twenty interviews to assess explainability needs across various organizations in the technology and financial services sectors. We specifically focused on teams that do not currently employ explainability tools. These semi-structured, hour-long interviews included, but were not limited to, the following questions:

• What are your ML use cases?
• What is your current model development workflow?
• What are your pain points in deploying ML models?
• Would explainability help address those pain points?

Group 2 spanned roughly thirty people across approximately twenty different organizations, both for-profit and non-profit. Most of these organizations are members of the Partnership on AI, which is a global multistakeholder non-profit established to study and formulate best practices for AI to benefit society. With each individual, we held a thirty-minute to two-hour semi-structured interview to understand the state of explainability in their organization, their motivation for using explanations, and the benefits and shortcomings of the methods used. Some organizations asked to stay anonymous, not to be referred to explicitly in the prose, or not to be included in the acknowledgements.

Of the people we spoke with in Group 2, around one-third represented non-profit organizations (academics, civil societies, and think tanks), while the rest worked for for-profit organizations (corporations, industrial research labs, and start-ups). Broken down by organization, around half were for-profit and half were academic or non-profit. Around one-third of the interviewees were executives at their organization, around half were research scientists or engineers, and the remainder were professors at academic institutions, who commented on the consulting they had done with industry leaders to commercialize their research. The questions we asked Group 2 included, but were not limited to, the following:

• Does your organization use ML model explanations?
• What type of explanations have you used (e.g., feature-based, sample-based, counterfactual, or natural language)?
• Who is the audience for the model explanation (e.g., research scientists, product managers, domain experts, or users)?
• In what context have you deployed the explanations (e.g., informing the development process, informing human decision-makers about the model, or informing the end user on how actions were taken based on the model's output)?
• How does your organization decide when and where to use model explanations?

3 SUMMARY OF FINDINGS

Here we synthesize the results from both interview groups. For the sake of clarity, we define various terms based on the context in which they appear in the forthcoming prose.

• Trustworthiness refers to the extent to which stakeholders can reasonably trust a model's outputs.
• Transparency refers to attempts to provide stakeholders (particularly external stakeholders) with relevant information about how the model works: this includes documentation of the training procedure, analysis of the training data distribution, code releases, feature-level explanations, etc.
• Explainability refers to attempts to provide insights into a model's behavior.
• Stakeholders are the people who either want a model to be "explainable," will consume the model explanation, or are affected by decisions made based on model output.
• Practice refers to the real-world context in which the model has been deployed.
• Local Explainability aims to explain the model's behavior for a specific input.
• Global Explainability attempts to understand the high-level concepts and reasoning used by a model.

3.1 Explainability Needs

This subsection provides an overview of the explainability needs that were uncovered with Group 1, data scientists from organizations that do not currently deploy explainability techniques. These data scientists were asked to describe their "pain points" in building and deploying ML models, and how they hope to use explainability.

• Model debugging: Most data scientists struggle with debugging poor model performance. They wish to identify why the model performs poorly on certain inputs, and also to identify regions of the input space with below-average performance. In addition, they seek guidance on how to engineer new features, drop redundant features, and gather more data to improve model performance. For instance, one data scientist said: "If I have 60 features, maybe it's equally effective if I just have 5 features." Dealing with feature interactions was also a concern, as the data scientist continued, "Feature A will impact feature B, [since] feature A might negatively affect feature B—how do I attribute [importance in the presence of] correlations?" Others mentioned explainability as a debugging solution, helping to "narrow down where things are broken."
• Model monitoring: Several individuals worry about drift in the feature and prediction distributions after deployment. Ideally, they would like to be alerted when there is a significant drift relative to the training distribution [6, 47]. One organization would like explanations for how drift in feature distributions would impact model outcomes and feature contributions to the model: "We can compute how much each feature is drifting, but we want to cross-reference [this] with which features are impacting the model a lot."
• Model transparency: Organizations that deploy models to make decisions that directly affect end users seek explanations for model predictions.
The explanations are meant to increase model transparency and comply with current or forthcoming regulations. In general, data scientists believe that explanations can also help communicate predictions to a broader external audience of other business teams and customers. One company stressed the need to "show your work" to provide reasons for underwriting decisions to customers, and another company needed explanations to respond to customer complaints.
• Model audit: In financial organizations, due to regulatory requirements, all deployed ML models must go through an internal audit. Data scientists building these models need to have them reviewed by internal risk and legal teams. One of the goals of the model audit is to conduct various kinds of tests provided by regulations like SR 11-7 [43]. An effective model validation framework should include: (1) evaluation of the conceptual soundness of the model, (2) ongoing monitoring, including benchmarking, and (3) outcomes analysis, including back-testing. Explainability is viewed as a tool for evaluating the soundness of the model on various data points. Financial institutions would like to conduct sensitivity analyses, checking the impact of small changes to inputs on model outputs. Unexpectedly large changes in outputs can indicate an unstable model.

3.2 Explainability Usage

In Table 1, we aggregate some of the explainability use cases that we received from different organizations in Group 2. For each use case, we define the domain of use (i.e., the industry in which the model is deployed), the purpose of the model, the explainability technique used, the stakeholder consuming the explanation, and how the explanation is evaluated. Evaluation criteria denote how the organization compares the success of various explanation functions for the chosen technique (e.g., after selecting feature importance as the technique, an organization can compare LIME [50] and SHAP [34] explanations via the faithfulness criterion [69]).

In our study, feature importance was the most common explainability technique, and Shapley values were the most common type of feature importance explanation. The most common stakeholders were ML engineers (or research scientists), followed by domain experts (e.g., loan officers and content moderators). Section 4 provides definitions for each technique and further details on how these techniques were used at Group 2 organizations.

3.3 Stakeholders

Most organizations in Group 2 deploy explainability atop their existing ML workflow for one of the following stakeholders:

(1) Executives: These individuals deem explainability necessary to achieve an organization's AI principles. One research scientist felt that "explainability was strongly advised and marketed by higher-ups," though sometimes explainability simply became a checkbox.
(2) ML Engineers: These individuals (including data scientists and researchers) train ML models at their organization and use explainability techniques to understand how the trained model works: do the most important features, most similar samples, and nearest training point(s) in the opposite class make sense? Using explainability to debug what the model has learned, this group of individuals were the most common explanation consumers in our study.
(3) End Users: This is the most intuitive consumer of an explanation.
The end user is the person consuming the output of a model or making a decision based on model output. Explainability shows the end user why the model behaved the way it did, which is important for showing that the model is trustworthy and also for providing greater transparency.
(4) Other Stakeholders: There are many other possible stakeholders for explainability. One such group is regulators, who may mandate that algorithmic decision-making systems provide explanations to affected populations or to the regulators themselves. It is important that this group understands how explanations are deployed based on existing research, what techniques are feasible, and how the techniques can align with the desired explanation from a model. Another group is domain experts, who are often tasked with auditing the model's behavior and ensuring it aligns with expert intuition. For many organizations, minimizing the divergence between an expert's intuition and the model's explanation is key to successfully implementing explainability.

Overwhelmingly, we found that local explainability techniques are mostly consumed by ML engineers and data scientists to audit models before deployment rather than to provide explanations to end users. Our interviews reveal factors that prevent organizations from showing explanations to end users or those affected by decisions made from ML model outputs.

Domain | Model Purpose | Explainability Technique | Stakeholders | Evaluation Criteria
Finance | Loan Repayment | Feature Importance | Loan Officers | Completeness [34]
Insurance | Risk Assessment | Feature Importance | Risk Analysts | Completeness [34]
Content Moderation | Malicious Reviews | Feature Importance | Content Moderators | Completeness [34]
Finance | Cash Distribution | Feature Importance | ML Engineers | Sensitivity [69]
Facial Recognition | Smile Detection | Feature Importance | ML Engineers | Faithfulness [7]
Content Moderation | Sentiment Analysis | Feature Importance | QA ML Engineers | ℓ2 norm
Healthcare | Medicare Access | Counterfactual Explanations | ML Engineers | Normalized ℓ1 norm
Content Moderation | Object Detection | Adversarial Perturbation | QA ML Engineers | ℓ2 norm

Table 1: Summary of select deployed local explainability use cases

3.4 Key Takeaways

This subsection summarizes some key takeaways from Group 2 that shed light on the reasons for the limited deployment of explainability techniques and their use primarily as sanity checks for ML engineers. Organizations generally still consider the judgments of domain experts to be the implicit ground truth for explanations. Since explanations produced by current techniques often deviate from the understanding of domain experts, some organizations still use human experts to evaluate the explanation before it is presented to users. Part of this deviation stems from the potential for ML explanations to reflect spurious correlations, which result from models detecting patterns in the data that lack causal underpinnings. As a result, organizations find explainability techniques useful for helping their ML engineers identify and reconcile inconsistencies between the model's explanations and their intuition or that of domain experts, rather than for directly providing explanations to end users.

In addition, there are technical limitations that make it difficult for organizations to show end users explanations in real-time.
The non-convexity of certain models makes certain explanations (e.g., providing the most influential datapoints) hard to compute quickly. Moreover, finding plausible counterfactual datapoints (that are feasible in the real world and on the input data manifold) is nontrivial, and many existing techniques currently make crude approximations or return the closest datapoint of the other class in the training set. Moreover, providing certain explanations can raise privacy concerns due to the risk of model inversion.

More broadly, organizations lack frameworks for deciding why they want an explanation, and current research fails to capture the objective of an explanation. For example, large gradients, representing the direction of maximal variation with respect to the output manifold, do not necessarily "explain" anything to stakeholders. At best, gradient-based explanations provide an interpretation of how the model behaves upon an infinitesimal perturbation (not necessarily a feasible one [29]), but they do not "explain" whether the model captures the underlying causal mechanism from the data.

4 DEPLOYING LOCAL EXPLAINABILITY

In this section, we dive into how local explainability techniques are used at various organizations (Group 2). After reviewing technical notation, we define local explainability techniques, discuss organizations' use cases, and then report takeaways for each technique.

4.1 Preliminaries

A black box model f maps an input x ∈ X ⊆ R^d to an output f(x) ∈ Y, i.e., f : R^d → Y. When we assume f has a parametric form, we write f_θ. L(f(x), y) denotes the loss function used to train f on a dataset D of input-output pairs (x^(i), y^(i)). Each organization we spoke with has deployed an ML model f. They hope to explain a data point x using an explanation function g. Local explainability refers to an explanation for why f predicted f(x) for a fixed point x. The local explanation methods we discuss come in one of the following forms: Which feature x_i of x was most important for the prediction f(x)? Which training datapoint z ∈ D was most important to f(x)? What is the minimal change to the input x required to change the output f(x)?

In this paper, we deliberately decide to focus on the more popularly deployed local explainability techniques instead of global explainability techniques. Global explainability refers to techniques that attempt to explain the model as a whole. These techniques attempt to characterize the concepts learned by the model [31], simpler models learned from the representation of complex models [17], prototypical samples from a particular model output [10], or the topology of the data itself [20]. None of our interviewees reported deploying global explainability techniques, though some studied these techniques in research settings.

4.2 Feature Importance

Feature importance was by far the most popular technique we found across our study. It is used across many different domains (finance, healthcare, facial recognition, and content moderation). Also known as feature-level interpretations, feature attributions, or saliency maps, this method is by far the most widely used and most well-studied explainability technique [9, 24].

4.2.1 Formulation
Feature importance methods define an explanation function g : f × R^d → R^d that takes in a model f and a point of interest x and returns importance scores g(f, x) ∈ R^d for all features; g(f, x)_i is the importance of (or attribution for) feature x_i of x. These explanation functions roughly fall into two categories: perturbation-based techniques [8, 14, 22, 34, 50, 61] and gradient-based techniques [7, 41, 57, 59, 60, 62]. Note that gradient-based techniques can be seen as a special case of a perturbation-based technique with an infinitesimal perturbation size. Heatmaps are also a type of feature-level explanation that denote the importance of a region or collection of features [4, 22].

A prominent class of perturbation-based methods is based on Shapley values from cooperative game theory [54]. Shapley values are a way to distribute the gains from a cooperative game to its players. In applying the method to explaining a model prediction, a cooperative game is defined between the features, with the model prediction as the gain. The highlight of Shapley values is that they enjoy axiomatic uniqueness guarantees. Unfortunately, calculating the exact Shapley value is exponential in d, the input dimensionality; however, the literature has proposed approximate methods using weighted linear regression [34], Monte Carlo approximation [61], centroid aggregation [11], and graph-structured factorization [14]. When we refer to Shapley-related methods hereafter, we mean such approximate methods.

4.2.2 Shapley Values in Practice

Organization A works with financial institutions and helps explain models for credit risk analysis. To integrate into the existing ML workflow of these institutions, Organization A proceeds as follows. They let data scientists train a model to the desired accuracy. Note that Organization A focuses mostly on models trained on tabular data, though they are beginning to venture into unstructured data (i.e., language and images). During model validation, risk analysts conduct stress tests before deploying the model to loan officers and other decision-makers. After decision-makers vet the model outputs as a sanity check and decide whether or not to override the model output, Organization A generates Shapley value explanations.

Before launching the model, risk analysts are asked to review the Shapley value explanations to ensure that the model exhibits expected behavior (i.e., the model uses the same features that a human would for the same task). Notably, the customer support team at these institutions can also use these explanations to provide individuals information about what went into the decision-making process for their loan approval or cash distribution decision. They are shown the percentage contribution to the model output (the positive ℓ1 norm of the Shapley value explanation along with the sign of the contribution). This means that the explanation would be along the lines of, "55% of the decision was decided by your age, which positively correlated with the predicted outcome."
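To make the weighted-linear-regression approximation and the percentage-contribution reporting concrete, the sketch below uses the open-source shap package (KernelSHAP [34]) on a synthetic tabular classifier; the model, data, sample sizes, and printed percentages are illustrative assumptions, not any interviewed organization's pipeline. It also checks the completeness criterion from Table 1: the attributions should sum to the difference between the prediction and the baseline value.

```python
# Minimal KernelSHAP sketch: approximate Shapley values via weighted linear
# regression [34], then report percentage contributions and check completeness.
import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

background = X[:50]  # reference data defining the "absent feature" baseline
explainer = shap.KernelExplainer(lambda z: model.predict_proba(z)[:, 1], background)

x = X[:1]                                      # point of interest
phi = explainer.shap_values(x, nsamples=200)   # g(f, x): one attribution per feature

# Completeness check: attributions sum to f(x) minus the expected (baseline) output.
print(phi.sum(), model.predict_proba(x)[0, 1] - explainer.expected_value)

# Percentage contributions of each feature, in the spirit of the "55% of the
# decision" style of explanation described above.
contrib = 100 * np.abs(phi) / np.abs(phi).sum()
print(dict(enumerate(np.round(contrib.ravel(), 1))))
```

A usage note: the nsamples argument trades off the fidelity of the approximation against latency, which is one reason these approximate methods are practical for analyst review but harder to serve to end users in real time.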
When comparing Shapley value explanations to other popular feature importance techniques, Organization A found that in practice LIME explanations [50] give unexpected explanations that do not align with human intuition. Recent work [71] shows that the fragility of LIME explanations can be traced to the sampling variance when explaining a single data point and to the explanation's sensitivity to sample size and sampling proximity. Though decision-makers have access to the feature-importance explanations, end users are still not shown these explanations as reasoning for model output. Organization A aspires to eventually provide this "explanation" to end users.

For gradient-based language models, Organization A uses Integrated Gradients (related to Shapley values by Sundararajan et al. [62]) to flag malicious reviews and moderate content at the aforementioned institutions. This information can be highlighted to ensure the trustworthiness and transparency of the model to the decision maker (the hired content moderator here), since they can now see which word was most important in flagging the content.

Going forward, Organization A intends to use a global variant of the Shapley value explanations by exposing how Shapley value explanations work on average for datapoints of a particular predicted class (e.g., on average, someone who was denied a loan had their age matter most for the prediction). This global explanation would help risk analysts get a bird's-eye view of how a model behaves and whether it aligns with their expectations.

4.2.3 Heatmaps in Transportation

Organization B looks to detect facial expressions from video feeds of users driving. They hope to use explainability to identify the actions a user is performing while the user drives. Organization B has tried feature visualization and activation visualization techniques that get attributions by backpropagating gradients to regions of interest [4, 70]. Specifically, they use these probabilistic Winner-Take-All techniques (variants of existing gradient-based feature importance techniques [57, 62]) to localize the region of importance in the input space for a particular classification task. For example, when detecting a smile, they expect the mouth of the driver to be important.

Though none of these desired techniques have been deployed for the end user (the driver in this case), ML engineers at Organization B found these techniques useful for qualitative review. On tiny datasets, engineers can figure out which scenarios have false positives (videos falsely detected to contain smiles) and why. They can also identify if true positives are paying attention to the right place or if there is a problem with spurious artifacts.
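For reference, an individual heatmap of the kind reviewed in this workflow can be produced with a single backward pass. The sketch below uses plain input gradients in PyTorch on a placeholder convolutional "smile detector"; the architecture, input size, and preprocessing are assumptions for illustration, not Organization B's Winner-Take-All variant.

```python
# Minimal input-gradient saliency sketch (PyTorch). The tiny CNN is a placeholder
# classifier; any differentiable image model can be substituted.
import torch
import torch.nn as nn

model = nn.Sequential(                      # placeholder smile detector
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 2),                        # classes: {no smile, smile}
)
model.eval()

frame = torch.rand(1, 3, 64, 64, requires_grad=True)  # one video frame
score = model(frame)[0, 1]                  # logit for the "smile" class
score.backward()

# Heatmap: magnitude of the gradient of the class score w.r.t. each pixel,
# collapsed over color channels. Large values mark regions (e.g., the mouth)
# that most affect the prediction under an infinitesimal perturbation.
heatmap = frame.grad.abs().sum(dim=1).squeeze(0)
print(heatmap.shape)                        # torch.Size([64, 64])
```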
However, while trying to understand why the model erred by analyzing similarities in false positives, they have struggled to scale this local technique across heatmaps in aggregate across multiple videos. They are able to qualitatively evaluate a sequence of heatmaps for one video, but doing so across 100M frames simultaneously is far more difficult. Paraphrasing the VP of AI at Organization B, aggregating saliency maps across videos is moot and contains little information. Note that an individual heatmap is an example of a local explainability technique, but an aggregate heatmap for 100M frames would be a global technique. Unlike aggregating Shapley values for tabular data as done at Organization A, taking an expectation over heatmaps (in the statistical sense) does not work, since aggregating pixel attributions is meaningless. One option Organization B discussed would be to cluster low-dimensional representations of the heatmaps and then tag each cluster based on what the model is focusing on; unfortunately, humans would still have to manually label the clusters of important regions.

4.2.4 Spurious Correlations

Related to the model monitoring for feature drift detection discussed in Section 3.1, Organization B has encountered issues with spurious correlations in their smile detection models. Their Vice President of AI noted that "[ML engineers] must know to what extent you want ML to leverage highly correlated data to make classifications." Explainability can help identify models that focus on such a correlation and can find ways to have models ignore it. For example, there may be a side effect of a correlated facial expression or co-occurrence: cheek raising, for example, co-occurs with smiling. In a cheek-raise detector trained on the same dataset as a smile detector but with different labels, the model still focused on the mouth instead of the cheeks. Both models were fixated on a prevalent co-occurrence. Attending to the mouth was undesirable in the cheek-raise detector but allowed in the smile detector.

One way Organization B combats this is by using simpler models on top of complex feature engineering (a minimal sketch of this two-stage design follows the takeaways below). For example, they use black box deep learning models for building good descriptors that are robust across camera viewpoints and will detect different features that subject matter experts deem important for drowsiness. There is one model per important descriptor (i.e., one model for eyes closed, one for yawns, etc.). Then, they fit a simple model on the extracted descriptors such that the important descriptors are obvious for the final prediction of drowsiness. Ideally, if Organization B had guarantees about the disentanglement of data generating factors [3], they would be able to understand which factors (descriptors) play a role in downstream classification.

4.2.5 Feature Importance - Takeaways

(1) Shapley values are rigorously motivated, and approximate methods are simple to deploy for decision makers to sanity check the models they have built.
(2) Feature importance is not shown to end users, but is used by machine learning engineers as a sanity check. Looping other stakeholders (who make decisions based on model outputs) into the model development process is essential to understanding what type of explanations the model delivers.
(3) Heatmaps (and feature importance scores, in general) are hard to aggregate, which makes it hard to do false positive detection at scale.
(4) Spurious correlations can be detected with simple gradient-based techniques.
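As a rough illustration of the two-stage design described in Section 4.2.4, the sketch below assumes a set of pre-trained descriptor models whose scores feed an interpretable final model; the descriptor names, synthetic data, and choice of logistic regression are placeholder assumptions, not Organization B's production system.

```python
# Sketch: black-box descriptor models feed a simple, inspectable model whose
# coefficients make each descriptor's contribution to "drowsiness" obvious.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def descriptor_scores(frames):
    """Stand-in for per-descriptor deep models (eyes closed, yawn, head nod).
    In practice each column would come from a separate trained network."""
    return rng.random((len(frames), 3))

frames = np.zeros(1000)                      # placeholder for 1000 video frames
X = descriptor_scores(frames)                # shape (1000, 3)
y = (0.8 * X[:, 0] + 0.5 * X[:, 1] > 0.7).astype(int)  # synthetic drowsiness labels

clf = LogisticRegression().fit(X, y)         # simple final model on descriptors
for name, w in zip(["eyes_closed", "yawn", "head_nod"], clf.coef_[0]):
    print(f"{name}: weight {w:+.2f}")        # descriptor-level importance
```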
4.3 Counterfactual Explanations

Counterfactual explanations are techniques that explain individual predictions by providing a means for recourse. Contrastive explanations that highlight contextually relevant information to the model output are most similar to human explanations [40]; however, in their current form, it is not clear how to find the relevant set of plausible counterfactual points. Moreover, while some open source implementations for counterfactual explanations exist [65, 68], they either work for specific model types or are not black-box in nature. In this section, we discuss the formulation for counterfactual explanations and describe one solution for each deployed technique.

4.3.1 Formulation

Counterfactual explanations are points close to the input for which the decision of the classifier changes. For example, for a person who was rejected for a loan by an ML model, a counterfactual explanation might suggest: "Had your income been greater by $5000, the loan would have been granted." Given an input x, a classifier f, and a distance metric d, we find a counterfactual explanation c by solving the optimization problem:

$$\min_{c} \; d(x, c) \quad \text{s.t.} \quad f(x) \neq f(c) \qquad (1)$$

The method can be tailored to allow only certain relevant features to be changed. Note that the term counterfactual has a different meaning in the causality literature [27, 46]. Counterfactual explanations for ML were introduced by Wachter et al. [66]. Sharma et al. [55] provide details on existing techniques.

4.3.2 Counterfactual Explanations in Healthcare

Organization C uses a faster version of the formulation in Sharma et al. [55] to find counterfactual explanations for projects in healthcare. When people apply for Medicare, Organization C hopes to flag if a user's application has errors and to provide explanations on how to correct the errors. Moreover, ML engineers can use the robustness score to compare different models trained using this data: this robustness score is effectively a suitably normalized and averaged distance between the counterfactual and the original point in Euclidean space. The original formulation makes use of a slower genetic algorithm, so they optimized the counterfactual explanation generation process. They are currently developing a first-of-its-kind application that can directly take in any black-box model and data and return a robustness score, fairness measure, and counterfactual explanations, all from a single underlying algorithm.

The use of this approach has several advantages: it can be applied to black-box models, works for any input data type, and generates multiple explanations in a single run of the algorithm. However, there are some shortcomings that Organization C is addressing. One challenge of counterfactual models is that the counterfactual might not be feasible. Organization C addresses this by using the training data to guide the counterfactual generation process and by providing a user interface that allows domain experts to specify constraints. The flexibility of the counterfactual approach comes with a drawback common among explanations for black-box models: there is no guarantee of the optimality of the explanation, since black-box techniques cannot guarantee optimality.

Through the creation of a deployed solution for this method, the organization realized that clients would ideally want an explainability score, along with a measure of fairness and robustness; as such, they have developed an explainability score that can be used to compare the explainability of different models.

4.3.3 Counterfactual Explanations - Takeaways

(1) Organizations are interested in counterfactual explanation solutions since the underlying method is flexible and such explanations are easy for end users to understand.
(2) It is not clear exactly what should be optimized for when generating a counterfactual or how to do it efficiently. Still, approximate solutions may suffice in practical applications (a minimal black-box search sketch follows below).
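As a rough illustration of Eq. (1), the sketch below implements the crude approximation mentioned in Section 3.4: among training points that the classifier assigns to the other class, return the one closest to x under Euclidean distance. The classifier and data are placeholders, and this is not Organization C's algorithm, which additionally handles feasibility constraints, fairness, and robustness scoring.

```python
# Crude black-box counterfactual: the nearest candidate whose predicted class
# differs from that of x, an approximation of the optimization in Eq. (1).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=5, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)   # black box f

def counterfactual(x, model, candidates):
    """Return the candidate closest to x (l2) for which the prediction flips."""
    pred = model.predict(x.reshape(1, -1))[0]
    flipped = candidates[model.predict(candidates) != pred]
    dists = np.linalg.norm(flipped - x, axis=1)
    return flipped[np.argmin(dists)], dists.min()

x = X[0]
c, dist = counterfactual(x, model, X)
print("prediction:", model.predict(x.reshape(1, -1))[0],
      "counterfactual prediction:", model.predict(c.reshape(1, -1))[0])
print("distance d(x, c):", round(float(dist), 3))
print("feature changes:", np.round(c - x, 2))   # the suggested recourse
```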
4.4 Adversarial Training

In order to ensure that the model being deployed is robust to adversaries and behaves as intended, many organizations we interviewed use adversarial training to improve performance. It has recently been shown that, in fact, this can also lead to more human-interpretable features [30].

4.4.1 Formulation

Other works have also explored the intersection between adversarial robustness and model interpretations [18, 21, 23, 59, 69]. The claim of one of these works is that the closest adversarial example should perturb "fragile" features, enabling the model to fit to robust features (indicative of a particular class) [30]. The setup of feature importance in the adversarial training setting from Singla et al. [59] is as follows:

$$g(f, x) = \max_{\tilde{x}} \; L(f_{\theta^*}(\tilde{x}), y) \quad \text{s.t.} \quad \|\tilde{x} - x\|_0 \le k, \quad \|\tilde{x} - x\|_2 \le \rho$$

We let |x̃ − x| be the top-k feature importance scores of the input x. This is similar to the adversarial example setup, which is usually written in the same manner as the above (without the ℓ0 norm to limit the number of features that are changed). It is also interesting to note that the formulation for finding counterfactual explanations above matches the formulation for finding adversarial examples. Sharma et al. [55] use this connection to generate adversarial examples and define a black-box model robustness score.

4.4.2 Image Content Moderation

Organization D moderates user-generated content (UGC) on several public platforms. Specifically, the R&D team at Organization D developed several models to detect adult and violent content in users' uploaded images. Their quality assurance (QA) team measures model robustness to improve content detection accuracy under the threat of adversarial examples. The robustness of a content moderation model is measured by the minimum perturbation required for an image to evade detection. Given a gradient-based image classification model f : R^d → {1, ..., K}, we assume f(x) = argmax_i Z(x)_i, where Z(x) ∈ R^K is the final (logit) layer output and Z(x)_i is the prediction score for the i-th class. The objective can be formulated as the following optimization problem to find the minimum perturbation:

$$\arg\min_{x} \; \{\, d(x, x_0) + c \cdot L(f(x), y) \,\} \qquad (2)$$

Here d(·, ·) is some distance measure that Organization D chooses to be the ℓ2 distance in Euclidean space, L(·) is the cross-entropy loss function, and c is a balancing factor. As is common in the adversarial literature, Organization D applies Projected Gradient Descent (PGD) to search for the minimum perturbation from the set of allowable perturbations S ⊆ R^d [36]. The search process can be formulated as

$$x^{t+1} = \Pi_{x+S}\left( x^{t} + \alpha \, \mathrm{sgn}\left( \nabla_x L(f_{\theta^*}(x), y) \right) \right)$$

until x^t is misclassified by the detection model. ML engineers on the QA team are shown an ℓ2-norm perturbation distance averaged over n test images randomly sampled from the test dataset. The larger the average perturbation, the more robust the model is, as it takes greater effort for an attacker to evade detection. The average perturbation required is widely used as a metric when comparing different candidate models and different versions of a given model.
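A minimal PGD-style sketch of this robustness measurement, assuming a differentiable PyTorch classifier, is given below; the toy model, step size, ℓ∞ projection radius, and iteration budget are placeholder choices, not Organization D's settings.

```python
# PGD search for a small perturbation that flips the model's prediction,
# reporting the resulting l2 distance as a per-image robustness measure.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 2))  # toy detector
model.eval()

def min_perturbation_l2(x0, label, eps=0.1, alpha=0.01, steps=100):
    """Run PGD within an l-infinity ball of radius eps; stop when the label flips."""
    x = x0.clone()
    for _ in range(steps):
        x.requires_grad_(True)
        loss = F.cross_entropy(model(x), label)
        grad = torch.autograd.grad(loss, x)[0]
        with torch.no_grad():
            x = x + alpha * grad.sign()                    # gradient ascent step
            x = x0 + torch.clamp(x - x0, -eps, eps)        # project onto x0 + S
            x = x.clamp(0, 1)                              # keep a valid image
        if model(x).argmax(dim=1).item() != label.item():  # evaded detection
            break
    return torch.norm(x - x0).item()

x0 = torch.rand(1, 3, 32, 32)
label = model(x0).argmax(dim=1)
print("l2 perturbation needed:", round(min_perturbation_l2(x0, label), 4))
```

In a deployment-style comparison, this per-image distance would be averaged over a random sample of test images, with a larger average indicating a more robust candidate model.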
Organization D finds that more robust models have more convincing gradient-based explanations, i.e., the gradient of the output with respect to the input shows that the model is focusing on relevant portions of the images, confirming recent research [21, 30, 64].

4.4.3 Text Content Moderation

Organization E uses text content moderation algorithms on its UGC platforms, such as forums. Its QA team is responsible for the reliability and robustness of a sentiment analysis model, which labels posts as positive or negative, trained on UGC. The QA team seeks to find the minimum perturbation required to change the classification of a post. In particular, they want to know how to take misclassified posts (e.g., negative ones classified as positive) and change them to the correct class. Given a sentiment analysis model f : X → Y, which maps from feature space X to a set of classes Y, an adversary aims to generate an adversarial post x_adv from the original post x ∈ X, whose ground truth label is f(x) = y ∈ Y, so that f(x_adv) ≠ y. The QA team tries to minimize d(x, x_adv) for a domain-specific distance function. Organization E uses the ℓ2 distance in the embedding space, but it is equally valid to use the editing distance [42]; note that the perturbation technique changes accordingly.

In practice, to find the minimum distance in embedding space, Organization E chooses to iteratively modify the words in the original post, starting from the words with the highest importance. Here importance is defined as the gradient of the model output with respect to a particular word. ML engineers compute the Jacobian matrix of the given post x = (x_1, x_2, ..., x_N), where x_i is the i-th word. The Jacobian matrix is as follows:

$$J_f(x) = \frac{\partial f(x)}{\partial x} = \left[ \frac{\partial f_j(x)}{\partial x_i} \right]_{i \in 1 \ldots N,\; j \in 1 \ldots K} \qquad (3)$$

where K represents the number of classes (in this case K = 2), and f_j(·) represents the confidence value of the j-th class. The importance of word x_i is defined as

$$C_{x_i} = J_{f, i, y} = \frac{\partial f_y(x)}{\partial x_i} \qquad (4)$$

i.e., the partial derivative of the confidence value for the predicted class y with respect to the input word x_i. This procedure ranks the words by their impact on the sentiment analysis results. The QA team then applies a set of transformations/perturbations to the most important words to find the minimum number of important words that must be perturbed in order to flip a sentiment analysis API result.

4.4.4 Adversarial Training - Takeaways

(1) There is a relation between model robustness and explainability. Model robustness improves the quality of feature importances (specifically saliency maps), confirming recent research findings [21].
(2) Feature importance helps find minimal adversarial perturbations for language models in practice.

4.5 Influential Samples

This technique asks the question: Which data point in the training dataset x ∈ D_x is most influential to the model's output f(x_test) for a test point x_test? Statisticians have used measures like Cook's distance [15], which measure the effect of deleting a data point on the model output. However, such measures require an exhaustive search and hence do not scale well for larger datasets.
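This deletion-based notion of influence can be made concrete with a brute-force leave-one-out loop, sketched below on a small synthetic logistic regression; as noted above, this exhaustive retraining is exactly what does not scale, which motivates the influence-function approximation described next. The data and model are illustrative placeholders.

```python
# Brute-force leave-one-out influence: retrain without each training point and
# measure how much the loss at a test point changes (Cook's-distance style).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, y_train, x_test, y_test = X[:150], y[:150], X[150:151], y[150:151]

def test_loss(model):
    return log_loss(y_test, model.predict_proba(x_test), labels=[0, 1])

base = test_loss(LogisticRegression().fit(X_train, y_train))

influence = []
for i in range(len(X_train)):               # O(N) retrainings: does not scale
    keep = np.arange(len(X_train)) != i
    m = LogisticRegression().fit(X_train[keep], y_train[keep])
    influence.append(test_loss(m) - base)   # positive: removing i hurts the test point

most = int(np.argmax(np.abs(influence)))
print("most influential training point:", most, "delta loss:", round(influence[most], 4))
```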
4.5.1 Formulation

For over half of the organizations, influence functions have been the tool of choice for explaining which training points are influential to the model's output for a point x [32] (though only one organization actually deployed the technique). We let L(f_θ, x) be the model's loss for point x. The empirical risk minimizer is given by

$$\hat{f}_\theta = \arg\min_{\theta \in \Theta} \frac{1}{N} \sum_{i=1}^{N} L(f_\theta, x^{(i)})$$

Note that y_x = f̂_θ(x) is the predicted output at x with the trained risk minimizer. Koh and Liang [32] define the most influential data point z for a fixed point x as that which maximizes the following:

$$I_{\mathrm{up,loss}}(z, x) = -\nabla_\theta L\big(\hat{f}_\theta(x), y_x\big)^{\top} \, H_{\hat{f}_\theta}^{-1} \, \nabla_\theta L\big(\hat{f}_\theta(z), y_x\big)$$

This quantity measures the effect of upweighting datapoint z on the loss at x. The goal of sample importance is to uncover which training examples, when perturbed, would have the largest effect (positive or negative) on the loss of a test point.

4.5.2 Influence Functions in Insurance

Organization F uses influence functions to explain risk models in the insurance industry. They hope to identify which customers might see an increase in their premiums based on their driving history in the past. The organization hopes to divulge to the end user how the premiums for drivers similar to them are priced. In other words, they hope to identify the influential training data points [32] to understand which past drivers had the greatest influence on the prediction for the observed driver. Unfortunately, Organization F has struggled to provide this information to end users, since the Hessian computation makes doing so impractical: the latency is too high.

More pressingly, even when Organization F lets the influence function procedure run, they find that many influential data points are simply outliers that are important for all drivers, since those anomalous drivers are far out of distribution. As a result, instead of identifying which drivers are most similar to a given driver, the influential sample explanation identifies drivers that are very different from any driver (i.e., outliers). While this could in theory be useful for outlier detection, it prevents the explanations from being used in practice.

4.5.3 Influential Samples - Takeaways

(1) Influence functions can be intractable for large datasets; as such, significant effort is needed to improve these methods to make them easy to deploy in practice.
(2) Influence functions can be sensitive to outliers in the data, such that they might be more useful for outlier detection than for providing end users explanations.

5 RECOMMENDATIONS

This section provides recommendations for organizations based on the key takeaways in Section 3.4 and the technique-specific takeaways in Section 4. In order to address the challenges organizations face when striving to provide explanations to end users, we recommend a framework for establishing clear desiderata in explainability and then discuss concerns associated with explainability.

5.1 Establish Clear Desiderata

Most organizations we spoke to solely deploy explainability techniques for internal engineers and scientists, as a debugging mechanism or as a sanity check. At the same time, these organizations affirmed the importance of understanding the stakeholder, and hope to be able to explain a model prediction to the end user.
Once the target population of the explanation is understood, organizations can devise and deploy explainability techniques accordingly. We propose the following three steps for establishing clear desiderata and improving decision making around explainability: clearly identifying the target population, understanding their needs, and clarifying the intention of the explanation.

(1) Identify stakeholders. Who are your desired explanation consumers? Typically this will be those affected by or shown model outputs. Preece et al. [49] describe how stakeholders have different needs for explainability. Distinctions between these groups can help design better explanation techniques.
(2) Engage with each stakeholder. Ask the stakeholder some variant of "What would you need the model to explain to you in order to understand, trust, or contest the model prediction?" and "What type of explanation do you want from your model?" Doshi-Velez and Kim [19] highlight how the task being modeled dictates what type of explanation the human will need from the model.
(3) Understand the purpose of the explanation. Once the context and utility of the explanation are stated, understand what the stakeholder wants to do with the explanation [24].
   • Static Consumption: Will the explanation be used as a one-off sanity check for some stakeholders or shown to other stakeholders as reasoning for a particular prediction [48]?
   • Dynamic Model Updates: Will the explanation be used to garner feedback from the stakeholder as to how the model ought to be updated to better align with their intuition? That is, how does the stakeholder interact with the model after viewing the explanation? Ross et al. [51] attempt to develop a technique for dynamic explanations, wherein the human can guide the model towards learning the correct explanation.

Once desiderata are clarified, stakeholders should be consulted again.

5.2 Concerns of Explainability

While there are positive reasons to encourage explainability of ML models, we note some concerns raised in our interviews.

5.2.1 On Causality

One chief scientist told us that "Figuring out causal factors is the holy grail of explainability." However, causal explanations are largely lacking in the literature, with a few exceptions [13]. Though non-causal explanations can still provide valid and useful interpretations of how the model works [37], many organizations said that they would be keen to use causal explanations if they were available.

5.2.2 On Privacy

Three organizations mentioned data privacy in the context of explainability, since in some cases explanations can be used to learn about the model [38, 63] or the training data [56]. Methods to counter these concerns have been developed. For example, Harder et al. [25] develop a methodology for training a differentially private model that generates local and global explanations using locally linear maps.

5.2.3 On Improving Performance

One purpose of explanations is to improve ML engineers' understanding of their models, in order to help them refine and improve performance. Since machine learning models are "dual use" [12], we should be aware that in some settings, explanations or other tools could enable malicious users to increase the capabilities and performance of undesirable systems.
For example, several organizations we talked with use explanation methods to improve their natural language processing and image recognition models for content moderation in ways that may concern some stakeholders.

5.2.4 Beyond Deep Learning

Though deep learning has gained popularity in recent years, many organizations still use classical ML techniques (e.g., logistic regression, support vector machines), likely due to a need for simpler, more interpretable models [52]. Many in the explainability community have focused on interpreting black-box deep learning models, even though practitioners feel that there is a dearth of model-specific techniques to understand traditional ML models. For example, one research scientist noted that, "Many [financial institutions] use kernel-based methods on tabular data." As a result, there is a desire to translate explainability techniques for kernel support vector machines in genomics [58] to models trained on tabular data. Model-agnostic techniques like Lundberg and Lee [34] can be used for traditional models, but are "likely overkill" for explaining kernel-based ML models, according to one research scientist, since model-agnostic methods can be computationally expensive and lead to poorly approximated explanations.

6 CONCLUSION

In this study, we critically examine how explanation techniques are used in practice. We are the first, to our knowledge, to interview various organizations on how they deploy explainability in their ML workflows, concluding with salient directions for future research. We found that while ML engineers are increasingly using explainability techniques as sanity checks during the development process, there are still significant limitations to current techniques that prevent their use to directly inform end users. These limitations include the need for domain experts to evaluate explanations, the risk of spurious correlations reflected in model explanations, the lack of causal intuition, and the latency in computing and showing explanations in real-time. Future research should seek to address these limitations. We also highlighted the need for organizations to establish clear desiderata for their explanation techniques and to be cognizant of the concerns associated with explainability. Through this analysis, we take a step towards describing explainability deployment and hope that future research builds trustworthy explainability solutions.

7 ACKNOWLEDGMENTS

The authors would like to thank the following individuals for their advice, contributions, and/or support: Karina Alexanyan (Partnership on AI), Gagan Bansal (University of Washington), Rich Caruana (Microsoft), Amit Dhurandhar (IBM), Krishna Gade (Fiddler Labs), Konstantinos Georgatzis (QuantumBlack), Jette Henderson (CognitiveScale), Bahador Kaleghi (Element AI), Hima Lakkaraju (Harvard University), Katherine Lewis (Partnership on AI), Peter Lo (Partnership on AI), Terah Lyons (Partnership on AI), Saayeli Mukherji (Partnership on AI), Erik Pazos (QuantumBlack), Inioluwa Deborah Raji (Partnership on AI + AI Now), Nicole Rigillo (Element AI), Francesca Rossi (IBM), Jay Turcot (Affectiva), Kush Varshney (IBM), Dennis Wei (IBM), Edward Zhong (Baidu), Gabi Zijderveld (Affectiva), and ten other anonymous individuals.
UB acknowledges support from DeepMind via the Leverhulme Centre for the Future of Intelligence (CFI) and the Partnership on AI research fellowship. AW acknowledges support from the David MacKay Newton research fellowship at Darwin College, The Alan Turing Institute under EPSRC grant EP/N510129/1 & TU/B/000074, and the Leverhulme Trust via CFI.

REFERENCES

[1] 2019. IBM's Principles for Data Trust and Transparency. https://www.ibm.com/blogs/policy/trust-principles/
[2] 2019. Our approach: Microsoft AI principles. https://www.microsoft.com/en-us/ai/our-approach-to-ai
[3] Tameem Adel, Zoubin Ghahramani, and Adrian Weller. 2018. Discovering interpretable representations for both deep generative and discriminative models. In International Conference on Machine Learning. 50–59.
[4] Sarah Adel Bargal, Andrea Zunino, Donghyun Kim, Jianming Zhang, Vittorio Murino, and Stan Sclaroff. 2018. Excitation backprop for RNNs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1440–1449.
[5] Oscar Alvarado and Annika Waern. 2018. Towards algorithmic experience: Initial efforts for social media contexts. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. ACM, 286.
[6] Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. 2016. Concrete problems in AI safety. arXiv preprint arXiv:1606.06565 (2016).
[7] Marco Ancona, Enea Ceolini, Cengiz Oztireli, and Markus Gross. 2018. Towards better understanding of gradient-based attribution methods for Deep Neural Networks. In 6th International Conference on Learning Representations (ICLR 2018).
[8] Marco Ancona, Cengiz Oztireli, and Markus Gross. 2019. Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Value Approximation. In Proceedings of the 36th International Conference on Machine Learning (Proceedings of Machine Learning Research), Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.), Vol. 97. PMLR, Long Beach, California, USA, 272–281.
[9] David Baehrens, Timon Schroeter, Stefan Harmeling, Motoaki Kawanabe, Katja Hansen, and Klaus-Robert Müller. 2010. How to explain individual classification decisions. Journal of Machine Learning Research 11, Jun (2010), 1803–1831.
[10] Been Kim, Rajiv Khanna, and Sanmi Koyejo. 2016. Examples are not Enough, Learn to Criticize! Criticism for Interpretability. In Advances in Neural Information Processing Systems.
[11] Umang Bhatt, Pradeep Ravikumar, and José MF Moura. 2019. Towards aggregating weighted feature attributions. arXiv preprint arXiv:1901.10040 (2019).
[12] Miles Brundage, Shahar Avin, Jack Clark, Helen Toner, Peter Eckersley, Ben Garfinkel, Allan Dafoe, Paul Scharre, Thomas Zeitzoff, Bobby Filar, et al. 2018. The malicious use of artificial intelligence: Forecasting, prevention, and mitigation. arXiv preprint arXiv:1802.07228 (2018).
[13] Aditya Chattopadhyay, Piyushi Manupriya, Anirban Sarkar, and Vineeth N Balasubramanian. 2019. Neural Network Attributions: A Causal Perspective. In Proceedings of the 36th International Conference on Machine Learning (Proceedings of Machine Learning Research), Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.), Vol. 97. PMLR, Long Beach, California, USA, 981–990.
[14] Jianbo Chen, Le Song, Martin J Wainwright, and Michael I Jordan. [n. d.]. L-Shapley and C-Shapley: Efficient model interpretation for structured data. 7th International Conference on Learning Representations (ICLR 2019) ([n. d.]).
[15] R Dennis Cook. 1977. Detection of influential observation in linear regression. Technometrics 19, 1 (1977), 15–18.
[16] Jeffrey De Fauw, Joseph R Ledsam, Bernardino Romera-Paredes, Stanislav Nikolov, Nenad Tomasev, Sam Blackwell, Harry Askham, Xavier Glorot, Brendan O'Donoghue, Daniel Visentin, et al. 2018. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nature Medicine 24, 9 (2018), 1342.
[17] Amit Dhurandhar, Karthikeyan Shanmugam, Ronny Luss, and Peder A Olsen. 2018. Improving simple models with confidence profiles. In Advances in Neural Information Processing Systems. 10296–10306.
[18] Ann-Kathrin Dombrowski, Maximilian Alber, Christopher J Anders, Marcel Ackermann, Klaus-Robert Müller, and Pan Kessel. 2019. Explanations can be manipulated and geometry is to blame. arXiv preprint arXiv:1906.07983 (2019).
[19] Finale Doshi-Velez and Been Kim. 2017. Towards A Rigorous Science of Interpretable Machine Learning. (2017).
[20] William Du Mouchel. 2002. Data squashing: constructing summary data sets. In Handbook of Massive Data Sets. Springer, 579–591.
[21] Christian Etmann, Sebastian Lunz, Peter Maass, and Carola Schoenlieb. 2019. On the Connection Between Adversarial Robustness and Saliency Map Interpretability. In Proceedings of the 36th International Conference on Machine Learning (Proceedings of Machine Learning Research), Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.), Vol. 97. PMLR, Long Beach, California, USA, 1823–1832.
[22] Ruth Fong and Andrea Vedaldi. 2017. Interpretable Explanations of Black Boxes by Meaningful Perturbation. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV). https://doi.org/10.1109/ICCV.2017.371 arXiv:1704.03296
[23] Amirata Ghorbani, Abubakar Abid, and James Zou. 2019. Interpretation of neural networks is fragile. AAAI (2019).
[24] Leilani H Gilpin, David Bau, Ben Z Yuan, Ayesha Bajwa, Michael Specter, and Lalana Kagal. 2018. Explaining explanations: An overview of interpretability of machine learning. In 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA). IEEE, 80–89.
[25] Frederik Harder, Matthias Bauer, and Mijung Park. 2019. Interpretable and Differentially Private Predictions. arXiv preprint arXiv:1906.02004 (2019).
[26] JB Heaton, Nicholas G Polson, and Jan Hendrik Witte. 2016. Deep learning in finance. arXiv preprint arXiv:1602.06561 (2016).
[27] Paul W. Holland. 1986. Statistics and Causal Inference. J. Amer. Statist. Assoc. 81, 396 (1986), 945–960.
[28] Kenneth Holstein, Jennifer Wortman Vaughan, Hal Daumé III, Miro Dudik, and Hanna Wallach. 2019. Improving fairness in machine learning systems: What do industry practitioners need?. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. ACM, 600.
[29] Giles Hooker and Lucas Mentch. 2019. Please Stop Permuting Features: An Explanation and Alternatives. arXiv preprint arXiv:1905.03151 (2019).
[30] Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, and Aleksander Madry. 2019. Adversarial Examples Are Not Bugs, They Are Features. arXiv:1905.02175. http://arxiv.org/abs/1905.02175
[31] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, and Rory Sayres. 2017. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). arXiv preprint arXiv:1711.11279 (2017).
[32] Pang Wei Koh and Percy Liang. 2017. Understanding black-box predictions via influence functions. In Proceedings of the 34th International Conference on Machine Learning - Volume 70 (ICML 2017). Journal of Machine Learning Research, 1885–1894.
[33] Bruno Lepri, Nuria Oliver, Emmanuel Letouzé, Alex Pentland, and Patrick Vinck. 2018. Fair, transparent, and accountable algorithmic decision-making processes. Philosophy & Technology 31, 4 (2018), 611–627.
[34] Scott M Lundberg and Su-In Lee. 2017. A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems 30 (NeurIPS 2017), I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., 4765–4774.
[35] Scott M Lundberg, Bala Nair, Monica S Vavilala, Mayumi Horibe, Michael J Eisses, Trevor Adams, David E Liston, Daniel King-Wai Low, Shu-Fang Newman, Jerry Kim, et al. 2018. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nature Biomedical Engineering 2, 10 (2018), 749.
[36] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2017. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017).
[37] Tim Miller. 2018. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence (2018).
[38] Smitha Milli, Ludwig Schmidt, Anca Dragan, and Moritz Hardt. 2019. Model Reconstruction from Model Explanations. In Proceedings of ACM FAT* 2019.
[39] Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. 2019. Model cards for model reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM, 220–229.
[40] Brent Mittelstadt, Chris Russell, and Sandra Wachter. 2019. Explaining explanations in AI. In Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM, 279–288.
[41] Grégoire Montavon, Sebastian Lapuschkin, Alexander Binder, Wojciech Samek, and Klaus-Robert Müller. 2017. Explaining nonlinear classification decisions with deep Taylor decomposition. Pattern Recognition 65 (2017), 211–222.
[42] Yilin Niu, Chao Qiao, Hang Li, and Minlie Huang. 2018. Word Embedding based Edit Distance. arXiv preprint arXiv:1810.10752 (2018).
[43] Board of Governors of the Federal Reserve System. 2011. Supervisory Guidance on Model Risk Management. https://www.federalreserve.gov/supervisionreg/srletters/sr1107a1.pdf (2011).
[44] Onora O'Neill. 2018. Linking trust to trustworthiness. International Journal of Philosophical Studies 26, 2 (2018), 293–300.
[45] European Parliament and Council of European Union. 2018. European Union General Data Protection Regulation, Articles 13-15. http://www.privacy-regulation.eu/en/13.htm (2018).
[46] Judea Pearl. 2000. Causality: models, reasoning and inference. Vol. 29. Springer.
[47] Fábio Pinto, Marco OP Sampaio, and Pedro Bizarro. 2019. Automatic Model Monitoring for Data Streams. arXiv preprint arXiv:1908.04240 (2019).
[48] Forough Poursabzi-Sangdeh, Daniel G Goldstein, Jake M Hofman, Jennifer Wortman Vaughan, and Hanna Wallach. 2018. Manipulating and measuring model interpretability. arXiv preprint arXiv:1802.07810 (2018).
[49] Alun Preece, Dan Harborne, Dave Braines, Richard Tomsett, and Supriyo Chakraborty. 2018. Stakeholders in explainable AI. arXiv preprint arXiv:1810.00184 (2018).
[50] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1135–1144.
[51] Andrew Slavin Ross, Michael C Hughes, and Finale Doshi-Velez. 2017. Right for the right reasons: training differentiable models by constraining their explanations. In Proceedings of the 26th International Joint Conference on Artificial Intelligence. AAAI Press, 2662–2670.
[52] Cynthia Rudin. 2019. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence 1, 5 (2019), 206.
[53] Andrew D Selbst and Solon Barocas. 2018. The intuitive appeal of explainable machines. Fordham L. Rev. 87 (2018), 1085.
[54] Lloyd S Shapley. 1953. A Value for n-Person Games. In Contributions to the Theory of Games II. 307–317.
[55] Shubham Sharma, Jette Henderson, and Joydeep Ghosh. 2019. CERTIFAI: Counterfactual Explanations for Robustness, Transparency, Interpretability, and Fairness of Artificial Intelligence models. arXiv preprint arXiv:1905.07857 (2019).
[56] Reza Shokri, Martin Strobel, and Yair Zick. 2019. Privacy Risks of Explaining Machine Learning Models. arXiv preprint arXiv:1907.00164 (2019).
[57] Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. 2017. Learning important features through propagating activation differences. In Proceedings of the 34th International Conference on Machine Learning - Volume 70 (ICML 2017). Journal of Machine Learning Research, 3145–3153.
[58] Avanti Shrikumar, Eva Prakash, and Anshul Kundaje. 2018. GkmExplain: Fast and Accurate Interpretation of Nonlinear Gapped k-mer Support Vector Machines Using Integrated Gradients. BioRxiv (2018), 457606.
[59] Sahil Singla, Eric Wallace, Shi Feng, and Soheil Feizi. 2019. Understanding Impacts of High-Order Loss Approximations and Features in Deep Learning Interpretation. In Proceedings of the 36th International Conference on Machine Learning (Proceedings of Machine Learning Research), Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.), Vol. 97. PMLR, Long Beach, California, USA, 5848–5856.
[60] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. 2017. SmoothGrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825 (2017).
[61] Erik Štrumbelj and Igor Kononenko. 2014. Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems 41, 3 (2014), 647–665.
[62] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. 2017. Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning - Volume 70 (ICML 2017). Journal of Machine Learning Research, 3319–3328.
[63] Florian Tramèr, Fan Zhang, Ari Juels, Michael K Reiter, and Thomas Ristenpart. 2016. Stealing machine learning models via prediction APIs. In 25th USENIX Security Symposium (USENIX Security 16). 601–618.
[64] Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, and Aleksander Madry. 2019. Robustness May Be at Odds with Accuracy. In International Conference on Learning Representations. https://openreview.net/forum?id=SyxAb30cY7
[65] Berk Ustun, Alexander Spangher, and Yang Liu. 2019. Actionable recourse in linear classification. In Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM, 10–19.
[66] Sandra Wachter, Brent Mittelstadt, and Chris Russell. 2017. Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR. Harv. JL & Tech. 31 (2017), 841.
[67] Adrian Weller. 2019. Transparency: motivations and challenges. In Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. Springer, 23–40.
[68] James Wexler, Mahima Pushkarna, Tolga Bolukbasi, Martin Wattenberg, Fernanda Viegas, and Jimbo Wilson. 2019. The What-If Tool: Interactive Probing of Machine Learning Models. arXiv preprint arXiv:1907.04135 (2019).
[69] Chih-Kuan Yeh, Cheng-Yu Hsieh, Arun Sai Suggala, David Inouye, and Pradeep Ravikumar. 2019. How Sensitive are Sensitivity-Based Explanations? arXiv preprint arXiv:1901.09392 (2019).
[70] Jianming Zhang, Sarah Adel Bargal, Zhe Lin, Jonathan Brandt, Xiaohui Shen, and Stan Sclaroff. 2018. Top-down neural attention by excitation backprop. International Journal of Computer Vision 126, 10 (2018), 1084–1102.
[71] Yujia Zhang, Kuangyan Song, Yiming Sun, Sarah Tan, and Madeleine Udell. 2019. "Why Should You Trust My Explanation?" Understanding Uncertainty in LIME Explanations.
