What Would You Ask the Machine Learning Model? Identification of User Needs for Model Explanations Based on Human-Model Conversations
Authors: Michał Kuźba, Przemysław Biecek
Michał Kuźba 1,2 [0000-0002-9181-0126] and Przemysław Biecek 1,2 [0000-0001-8423-1823]

1 Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Poland
2 Faculty of Mathematics and Information Science, Warsaw University of Technology, Poland
kuzba.michal@gmail.com

Abstract. Recently we have seen a rising number of methods in the field of eXplainable Artificial Intelligence. To our surprise, their development is driven by model developers rather than by a study of the needs of human end users. The analysis of needs, if done at all, takes the form of an A/B test rather than a study of open questions. To answer the question "What would a human operator like to ask the ML model?" we propose a conversational system explaining decisions of the predictive model. In this experiment, we developed a chatbot called dr_ant to talk about a machine learning model trained to predict survival odds on the Titanic. People can talk with dr_ant about different aspects of the model to understand the rationale behind its predictions. Having collected a corpus of 1000+ dialogues, we analyse the most common types of questions that users would like to ask. To our knowledge, this is the first study which uses a conversational system to collect the needs of human operators from interactive and iterative dialogue explorations of a predictive model.

Keywords: eXplainable Artificial Intelligence · Iterative dialogue explanations · Human-centred Machine Learning

1 Introduction

Machine Learning models are widely adopted in all areas of human life. As they often become critical parts of automated systems, there is an increasing need for understanding their decisions and the ability to interact with such systems.
Hence, we are currently seeing the growth of the area of eXplainable Artificial Intelligence (XAI). For instance, Scantamburlo et al. [28] raise the issue of understanding machine decisions and their consequences using the example of computer-made decisions in criminal justice. This example touches upon such features as fairness, equality, transparency and accountability. Ribera & Lapedriza [26] identify the following motivations for designing and using explanations: system verification, including bias detection; improvement of the system (debugging); learning from the system's distilled knowledge; compliance with legislation, e.g. the "Right to explanation" set by the EU; and informing people affected by AI decisions.

We see a rising number of explanation methods, such as LIME [25] and SHAP [15], and XAI frameworks such as AIX360 [2], InterpretML [22], DALEX [4], modelStudio [3], exBERT [10] and many others. These systems require a systematic quality evaluation [8,21,13]. For instance, Tan et al. [32] describe the uncertainty of explanations and Molnar et al. [20] describe a way to quantify the interpretability of a model.

These methods and toolboxes are focused on the model developer's perspective. The most popular methods, like Partial Dependence Plots, LIME or SHAP, are tools for post-hoc model diagnostics rather than tools linked with the needs of end users. But it is important to design an explanation system for its addressee (the explainee). Both the form and the content of the system should be adjusted to the end user. And while explainees might not have AI expertise, explanations are often constructed by engineers and researchers for themselves [19], limiting their usefulness for other audiences [17].

Also, both the form and the content of the explanations should differ depending on the explainee's background and role in the model lifecycle.
Ribera & Lapedriza [26] describe three types of explainees: AI researchers and developers, domain experts and the lay audience. Tomsett et al. [33] introduce six groups: creators, operators, executors, decision-subjects, data-subjects and examiners. These roles are positioned differently in the pipeline. Users differ in their background and in their goal of using the explanation system. They vary in their technical skills and the language they use. Finally, explanations should have a comprehensible form – textual, visual or multimodal. Explanation is a cognitive process and a social interaction [7]. Moreover, interactive exploration of the model allows personalization of the explanations presented to the explainee [31].

Arya et al. identify a space for interactive explanations in a tree-shaped taxonomy of XAI techniques [2]. However, the AIX360 framework presented in that paper implements only static explanations. Similarly, most of the other toolkits and methods focus entirely on the static branch of the explanations taxonomy. Sokol & Flach [29] propose a conversation using class-contrastive counterfactual statements. This idea is implemented as a conversational system for the lay audience of credit scoring systems [30]. Pecune et al. describe a conversational movie recommendation agent explaining its recommendations [23]. A rule-based, interactive and conversational agent for explainable AI is also proposed by Werner [35]. Madumal et al. propose an interaction protocol and identify components of an explanation dialogue [16]. Finally, Miller [18] claims that truly explainable agents will use interactivity and communication.

To address these problems we create an open-ended, dialogue-based explanation system. We develop a chatbot allowing the explainee to interact with a predictive model and its explanations. We implement this particular system for a random forest model trained on the Titanic dataset [1,5].
However, any model trained on this dataset can be plugged into this system. Also, this approach can be applied successfully to other datasets, and many of the components can be reused.

Our goal is twofold. Firstly, we create a working prototype of a conversational system for XAI. Secondly, we want to discover what questions people ask to understand the model. This exploration is enabled by the open-ended nature of the chatbot: the user might ask any question, even if the system is unable to give a satisfying answer to each of them.

There are engineering challenges in building a dialogue agent, and the "Wizard of Oz" proxy approach might be used as an alternative [31,11]. In this work, however, we decide to build such a system. With this approach we obtain a working prototype and a scalable dialogue collection process. As a result, we gain a better understanding of how to answer the explanatory needs of a human operator. With this knowledge, we will be able to create explanation systems tailored to the explainee's needs by addressing their questions. This is in contrast to developing new methods blindly or according to the judgement of their developers.

We outline the scope and capabilities of the dialogue agent (Section 2). In Section 3, we illustrate the architecture of the entire system and describe each of its components. We also demonstrate the agent's work on examples. Finally, in Section 4, we describe the experiment and analyze the collected dialogues.

2 Dialogue system

The dialogue system is a multi-turn chatbot with user initiative. It offers a conversation about the underlying random forest model trained on the well-known Titanic dataset. We deliberately select a black box model with no direct interpretation, together with a dataset and a problem that can be easily imagined by a wider audience.
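As a rough illustration of this setup, a survival classifier can be trained on Titanic-like data in a few lines. This is a sketch in Python with scikit-learn; the authors' actual model is a random forest trained in R, and the toy table below is an illustrative stand-in, not the real dataset.

```python
# Illustrative analogue of the paper's setup: a black-box random forest
# predicting survival odds. All rows and column encodings are toy stand-ins.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

data = pd.DataFrame({
    "age":      [22, 38, 26, 35, 54, 2, 27, 14],
    "fare":     [7.25, 71.3, 7.9, 53.1, 51.9, 21.1, 11.1, 30.1],
    "gender":   [0, 1, 1, 1, 0, 0, 1, 1],   # 0 = male, 1 = female
    "class":    [3, 1, 3, 1, 1, 2, 3, 2],
    "survived": [0, 1, 1, 1, 0, 1, 1, 1],
})
X, y = data.drop(columns="survived"), data["survived"]

# A black-box model with default hyperparameters, as in the paper.
model = RandomForestClassifier(random_state=0).fit(X, y)

# The chatbot's "inference" intent boils down to a call like this:
passenger = pd.DataFrame([{"age": 20, "fare": 30.0, "gender": 1, "class": 2}])
survival_odds = model.predict_proba(passenger)[0, 1]
print(f"Chance of survival: {survival_odds:.0%}")
```

The chatbot wraps exactly this kind of prediction call behind a natural-language interface.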
The dialogue system was built to understand and respond to several groups of queries:

– Supplying data about the passenger, e.g. specifying age or gender. This step might be omitted by impersonating one of two predefined passengers with different model predictions.
– Inference – telling users what their chances of survival are. The model imputes missing variables.
– Visual explanations from the Explanatory Model Analysis toolbox [5]: Ceteris Paribus profiles [12] (addressing "what-if" questions) and Break Down plots [9] (presenting feature contributions). Note that these serve to offer a warm start into the system by answering some of the anticipated queries. However, the principal purpose is to explore what other types of questions might be asked.
– Dialogue support queries, such as listing and describing available variables or restarting the conversation.

The system was first trained with an initial set of training sentences and intents. After the deployment of the chatbot, it was iteratively retrained based on the collected conversations. Those were used in two ways: 1) to add new intents, 2) to extend the training set with actual user queries, especially those which were misclassified. The final version of the dialogue agent, which is used in the experiment in Section 4, consists of 40 intents and 874 training sentences.

3 Implementation

Fig. 1. Overview of the system architecture. The explainee uses the system to talk about the black box model. They interact with the system using one of the interfaces. The conversation is managed by the dialogue agent, which is created and trained by the chatbot admin.
To create a response, the system queries the black box model for its predictions and the explainers for visual explanations.

A top-level chatbot architecture is depicted in Figure 1. The system consists of several components:

1. Explainee
Human operator – the addressee of the system. They chat about the black box model and its predictions.

2. Interface
The dialogue agent might be deployed to various conversational platforms, independently of the backend and of each other. The only exception to that is rendering some of the graphical, rich messages. We used a custom web integration as the major surface. It communicates with the dialogue agent's engine, sending requests with user queries and receiving text and graphical content. The frontend of the chatbot uses Vue.js and is based on the dialogflow-web-v2 repository (https://github.com/mishushakov/dialogflow-web-v2). It provides a chat interface and renders rich messages, such as plots and suggestion buttons. This integration allows for a voice conversation using the browser's speech recognition and speech synthesis capabilities.

3. Dialogue agent
The chatbot's engine, implemented using the Dialogflow framework and Node.js fulfilment code run on Google Cloud Functions.
– Natural Language Understanding (NLU): The NLU component classifies query intent and extracts entities. This classifier uses the framework's built-in rule-based and Machine Learning algorithms. The NLU module recognizes 40 intents, such as posing a what-if question, asking about a variable or specifying its value. It was trained on 874 training sentences. Some of these sentences come from the initial subset of the collected conversations. Additionally, the NLU module comes with 4 entities – one for capturing the name of the variable and 3 to extract values of the categorical variables: gender, class and place of embarkment.
For numerical features, a built-in numerical entity is utilized. See examples in Section 3.1.
– Dialogue management: It implements the state and context. The former is used to store the passenger's data and the latter to condition the response on more than the last query. For example, when the user sends a query with a number, it might be classified as an age or fare specification depending on the current context.
– Natural Language Generation (NLG): The response generation system. To build the chatbot's utterance, the dialogue agent might need to use the explanations or the predictions. For this, the NLG component queries the explainers or the model correspondingly. Plots, images and suggestion buttons which are part of the chatbot response are rendered as rich messages on the frontend.

4. Black box model
A random forest model was trained to predict the chance of survival on the Titanic. The model was trained in R [24] and converted into a REST API with the plumber package [34]. The random forest model was trained with default hyperparameters. Data preprocessing includes imputation of missing values. The performance of the model on the test dataset was AUC 0.84 and F1 score 0.73. You can download the model from the archivist [6] database with the following hook: archivist::aread("pbiecek/models/42d51").

5. Explainers
A REST API exposing visual and textual model explanations from the iBreakDown [9] and CeterisParibus [12] libraries. They explore the black box model to create an explanation. See the xai2cloud package [27] for more details.

6. Chatbot admin
Human operator – the developer of the system. They can manually retrain the system based on misclassified intents and misextracted entities. For instance, this dialogue agent was iteratively retrained based on the initial subset of the collected dialogues.

This architecture works for any predictive model and tabular data.
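The "what-if" explanations served by the explainers follow the Ceteris Paribus idea: hold one observation fixed, vary a single feature over a grid, and record the model's prediction at each value. A minimal Python sketch of that idea follows; the toy linear scoring function stands in for the paper's random forest, and the actual system uses the CeterisParibus library rather than this code.

```python
# Minimal sketch of a Ceteris Paribus ("what-if") profile: vary one feature
# of a fixed passenger, keeping all other features unchanged.
def ceteris_paribus(predict, observation, feature, grid):
    """Return (value, prediction) pairs with all other features held fixed."""
    profile = []
    for value in grid:
        modified = dict(observation, **{feature: value})
        profile.append((value, predict(modified)))
    return profile

# Hypothetical black-box stand-in: odds fall with age and with lower class.
def toy_predict(p):
    return max(0.0, min(1.0, 0.9 - 0.01 * p["age"] - 0.15 * (p["class"] - 1)))

passenger = {"age": 20, "gender": "female", "class": 2}
profile = ceteris_paribus(toy_predict, passenger, "age", range(0, 81, 20))
for age, pred in profile:
    print(f"age={age:2d} -> survival odds {pred:.2f}")
```

Answering "what if I had been older?" then amounts to reading this profile at a larger age value.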
Its components differ in how they can be transferred to other tasks and datasets. The user interface is independent of the rest of the system. When a dataset is fixed, the model is interchangeable. However, the dialogue agent is handcrafted and depends on the dataset as well as the explainers. A change in the dataset needs to be at least reflected in an update of the data-specific entities and intents. For instance, a new set of variables needs to be covered. This is also followed by modifying the training sentences for the NLU module and perhaps some changes in the generated utterances. Adding a new explainer might require adding a new intent. Usually, we want to capture the user queries that can be addressed with a new explanation method. The source code is available at https://github.com/ModelOriented/xaibot.

3.1 NLU examples

The Natural Language Understanding module is designed to guess an intent and extract relevant parameters/entities from a user query. Queries can be specified in an open format. Here are examples of NLU for three intents:

Query: What if I had been older?
Intent: ceteris paribus
Entities: [variable: age]

Query: I'm a 20 year old woman
Intent: multi slot filling
Entities: [age: 20, gender: female]

Query: Which feature is the most important?
Intent: break down
Entities: []

Fig. 2. An example conversation. Explainee's queries in the grey boxes.

3.2 Example dialogue

An excerpt from an example conversation is presented in Figure 2. The corresponding intent classification flow is highlighted in Figure 3.

Fig. 3. Screenshot from the Dialogflow Analytics. This flow chart demonstrates the results of the NLU module on a sample of collected dialogues. The example conversation from Figure 2 contributes to the topmost (green) path. Each box corresponds to a classified intention of the query, e.g. telling age or ceteris paribus.
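The intent-plus-entities structure of the NLU examples above can be mimicked with a few keyword rules. This is a toy stand-in, not the system's actual classifier: the real agent uses Dialogflow's built-in rule-based and ML algorithms trained on 874 sentences, and the rules and hint tables below are hypothetical.

```python
import re

# Toy NLU stand-in covering three of the agent's 40 intents and a few
# entities (a variable name, a gender value, and a numeric age).
INTENT_RULES = [
    ("ceteris paribus",    re.compile(r"\bwhat if\b", re.I)),
    ("break down",         re.compile(r"\bimportan(t|ce)\b", re.I)),
    ("multi slot filling", re.compile(r"\b\d+\s*year", re.I)),
]
VARIABLE_HINTS = {"older": "age", "age": "age", "fare": "fare", "class": "class"}
GENDER_HINTS = {"woman": "female", "female": "female", "man": "male", "male": "male"}

def parse(query):
    intent = next((name for name, rx in INTENT_RULES if rx.search(query)), "fallback")
    entities = {}
    for word, variable in VARIABLE_HINTS.items():
        if re.search(rf"\b{word}\b", query, re.I):
            entities["variable"] = variable
    for word, gender in GENDER_HINTS.items():
        if re.search(rf"\b{word}\b", query, re.I):
            entities["gender"] = gender
            break
    if m := re.search(r"\b(\d+)\s*year", query, re.I):
        entities["age"] = int(m.group(1))
    return intent, entities

print(parse("What if I had been older?"))        # ('ceteris paribus', {'variable': 'age'})
print(parse("I'm a 20 year old woman"))          # ('multi slot filling', {'gender': 'female', 'age': 20})
print(parse("Which feature is the most important?"))  # ('break down', {})
```

Dialogflow replaces these brittle regexes with trainable intent matching, but the output contract (one intent, a dictionary of entities) is the same.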
4 Results

The initial subset of the collected dialogues is used to improve the NLU module of the dialogue agent. As a next step, we conduct an experiment by sharing the chatbot in the Data Science community and analyzing the collected dialogues.

4.1 Experiment setup

For this experiment, we work on data collected over 2 weeks. This is a subset of all collected dialogues, separate from the data used to train the NLU module. Narrowing the time scope of the experiment allows us to describe the audience and ensure the coherence of the data. As a next step, we filter out conversations with totally irrelevant content and those with fewer than 3 user queries. Finally, we obtain 621 dialogues consisting of 5675 user queries in total. The average length equals 9.14 queries, the maximum 83 and the median 7. We show the histogram of conversation lengths in Figure 4. Note that by conversation length we mean the number of user queries, which is equal to the number of turns in the dialogue (user query, chatbot response).

The audience acquisition comes mostly from the R and Data Science community. Users are instructed to explore the model and its explanations individually. However, they might come across a demonstration of the chatbot's capabilities, potentially introducing a source of bias. We describe the results of the study in Section 4.2 and share the statistical details about the experiment audience in Section 4.3.

Fig. 4. Histogram of conversation lengths (number of user queries), after filtering out conversations shorter than 3 queries. As expected, most conversations were short. However, there were also dialogues of over 20 queries.

4.2 Query types

We analyze the content of the dialogues. Similar user queries, differing only in formulation, are manually grouped together.
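This grouping assigns each user query a type, and a dialogue is then counted at most once per type it contains. A toy sketch of that counting rule follows; the labelled dialogues are hypothetical stand-ins for the real corpus.

```python
from collections import Counter

# Count, per query type, the number of dialogues containing at least one
# query of that type. Toy data: each dialogue is a list of query-type labels.
dialogues = [
    ["why", "what-if", "why"],   # "why" appears twice, counted once here
    ["EDA", "what-if"],
    ["why"],
]

counts = Counter()
for dialogue in dialogues:
    for query_type in set(dialogue):   # set(): at most one count per dialogue
        counts[query_type] += 1

print(dict(counts))
```

Applied to the 621 experiment dialogues, this rule produces the per-type dialogue counts reported below.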
For each category, we calculate the number of conversations with at least one query of this type. The numbers of occurrences are presented in Table 1. Note that users were not prompted or hinted to ask any of these, with the exception of the "what do you know about me" question. Moreover, the taxonomy defined here is independent of the intents recognized by the NLU module and is defined based on the collected dialogues. Here is the list of query types, ordered decreasingly by the number of conversations they occur in:

1. why – general explanation queries; typical examples are:
– "why?"
– "explain it to me"
– "how was this calculated?"
– "why is my chance so low?"

2. what-if – alternative scenario queries. Frequent examples: what if I'm older?, what if I travelled in the 1st class?. Rarely, we see multi-variable questions such as: What if I'm older and travel in a different class?.

3. what do you know about me – this is the only query hinted to the user, using a suggestion button. When the user inputs their data manually, it usually serves to understand what is yet missing. However, in the scenario when the explainee impersonates a movie character, it also aids understanding which information about the user is possessed by the system.

4. EDA – a general category on Exploratory Data Analysis. All questions related to the data rather than the model fall into this category. For instance: feature distribution, maximum values, plot histogram for the variable v, describe/summarize the data, is the dataset imbalanced, how many women survived, dataset size, etc.

5. feature importance – here we group all questions about the relevance, influence, importance or effect of a feature on the prediction. We see several subtypes of this query:
– Which are the most important variable(s)?
– Does gender influence the survival chance?
– local importance – How does age influence my survival?, What makes me more likely to survive?
– global importance – How does age influence survival across all passengers?

6. how to improve – actionable queries for maximizing the prediction, e.g. what should I do to survive, how can I increase my chances.

7. class comparison – comparison of the predictions across different values of a categorical variable. It might be seen as a variant of the what-if question. Examples: which class has the highest survival chance, are men more likely to die than women.

8. who has the best score – here, we ask about the observations that maximize/minimize the prediction. Examples: who survived/died, who is most likely to survive. It is similar to the how to improve question, but on a per-example basis.

9. model-related – queries related directly to the model rather than its predictions. We see questions about the algorithm and the code. We also see users asking about metrics (accuracy, AUC), the confusion matrix and confidence. However, these are observed just a few times.

10. contrastive – questions about why the predictions for two observations differ. We see these very rarely. However, more often we observe an implicit comparison as a follow-up question – for instance, what about other passengers, what about Jack.

11. plot interaction – follow-up queries to interact with the displayed visual content. Not observed.

12. similar observations – queries regarding "neighbouring" observations. For instance, what about people similar to me. Not observed.

Table 1. Results of the analysis for 621 conversations in the experiment. The second column presents the number of conversations with at least one query of a given type. A single dialogue might contain multiple or none of these queries.
Query type                          Dialogues count
why                                              73
what-if                                          72
what do you know about me                        57
EDA                                              54
feature importance                               31
how to improve                                   24
class comparison                                 22
who has the best score                           20
model-related                                    14
contrastive                                       1
plot interaction                                  0
similar observations                              0
Number of all analyzed dialogues                621

We also see users creating alternative scenarios and comparing predictions for different observations manually, i.e. asking for a prediction multiple times with different passenger information. Additionally, we observe explainees asking about other sensitive features that are not included in the model, e.g. nationality, race or income. However, some of these, e.g. income, are strongly correlated with class and fare.

4.3 Statistics of surveyed sample

We use Google Analytics to get insights into the audience of the experiment. Users are distributed across 59 countries, with the top five (Poland, United States, United Kingdom, Germany and India, in this order) accounting for 63% of the users. Figure 5 presents demographic data on the subset of the audience (53%) for which this information is available.

5 Conclusions and Future Work

Depending on the area of application, different needs are linked with the concept of interpretability [14,33]. And even for a single area of application, different actors may have different needs related to model interpretability [2]. In this paper, we presented a novel application of a dialogue system for conversational explanations of a predictive model. The detailed contributions are the following: (1) we presented a process based on a dialogue system allowing for effective collection of user expectations related to model interpretation, (2) we presented a xai-bot implementation for a binary classification model for the Titanic data, (3) we conducted an analysis of the collected dialogues.

Fig. 5.
Demographic statistics for age (left) and gender (right) of the studied group, registered by Google Analytics.

We conduct this experiment on a survival model for the Titanic. However, the primary goal of this work is to understand user needs related to model explanation, rather than to improve this specific implementation. The knowledge we gain from this experiment will aid in designing explanations for various models trained on tabular data. One example might be survival models for COVID-19, which are currently of large interest.

The conversational agent proved to work as a tool to explore and extract user needs related to the use of Machine Learning models. This method allowed us to validate hypotheses and gather requirements for the XAI system on the example from the experiment. In this analysis, we identified several frequent patterns among user queries.

The conversational agent is also a promising, novel approach to XAI as a model-human interface. Users were given a tool for the interactive explanation of the model's predictions. In the future, such systems might be useful in bridging the gap between automated systems and their end users. An interesting and natural extension of this work would be to compare user queries for different explainee groups in the system, e.g. model creators, operators, examiners and decision-subjects. In particular, it would be interesting to collect needs from explainees with no domain knowledge in Machine Learning. Similarly, it is interesting to take advantage of the process introduced in this work to compare user needs across various areas of application, e.g. legal, medical and financial. Additionally, based on the analysis of the collected dialogues, we see two related areas that would benefit from conversational human-model interaction – Exploratory Data Analysis and model fairness, based on the queries about sensitive and bias-prone features.
Acknowledgments

We would like to thank 3 anonymous reviewers for their insightful comments and suggestions. Michał Kuźba was financially supported by the NCN Opus grant 2016/21/B/ST6/0217.

References

1. Titanic dataset, https://www.kaggle.com/c/titanic/data
2. Arya, V., Bellamy, R.K.E., Chen, P.Y., Dhurandhar, A., Hind, M., Hoffman, S.C., Houde, S., Liao, Q.V., Luss, R., Mojsilović, A., Mourad, S., Pedemonte, P., Raghavendra, R., Richards, J., Sattigeri, P., Shanmugam, K., Singh, M., Varshney, K.R., Wei, D., Zhang, Y.: One explanation does not fit all: A toolkit and taxonomy of AI explainability techniques (2019)
3. Baniecki, H., Biecek, P.: modelStudio: Interactive Studio with Explanations for ML Predictive Models. The Journal of Open Source Software (Nov 2019), https://doi.org/10.21105/joss.01798
4. Biecek, P.: DALEX: Explainers for Complex Predictive Models in R. Journal of Machine Learning Research 19, 1–5 (2018)
5. Biecek, P., Burzykowski, T.: Explanatory Model Analysis. Explore, Explain and Examine Predictive Models (2020), https://pbiecek.github.io/ema/
6. Biecek, P., Kosinski, M.: archivist: An R package for managing, recording and restoring data analysis results. Journal of Statistical Software 82(11), 1–28 (2017)
7. El-Assady, M., Jentner, W., Kehlbeck, R., Schlegel, U., Sevastjanova, R., Sperrle, F., Spinner, T., Keim, D.: Towards XAI: Structuring the processes of explanations (2019)
8. Gilpin, L.H., Bau, D., Yuan, B.Z., Bajwa, A., Specter, M., Kagal, L.: Explaining explanations: An approach to evaluating interpretability of machine learning (2018)
9. Gosiewska, A., Biecek, P.: Do Not Trust Additive Explanations. arXiv e-prints (2019)
10. Hoover, B., Strobelt, H., Gehrmann, S.: exBERT: A Visual Analysis Tool to Explore Learned Representations in Transformers Models (2019)
11. Jentzsch, S., Höhn, S., Hochgeschwender, N.: Conversational interfaces for explainable AI: A human-centred approach (2019)
12. Kuzba, M., Baranowska, E., Biecek, P.: pyCeterisParibus: explaining Machine Learning models with Ceteris Paribus Profiles in Python. JOSS 4(37), 1389 (2019), http://joss.theoj.org/papers/10.21105/joss.01389
13. Lage, I., Chen, E., He, J., Narayanan, M., Kim, B., Gershman, S., Doshi-Velez, F.: An evaluation of the human-interpretability of explanation (2019), http://arxiv.org/abs/1902.00006
14. Lipton, Z.C.: The mythos of model interpretability (2016), arXiv:1606.03490
15. Lundberg, S.M., Lee, S.I.: A Unified Approach to Interpreting Model Predictions. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems 30, pp. 4765–4774. Curran Associates, Inc. (2017), http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf
16. Madumal, P., Miller, T., Sonenberg, L., Vetere, F.: A grounded interaction protocol for explainable artificial intelligence. In: AAMAS (2019)
17. Madumal, P., Miller, T., Vetere, F., Sonenberg, L.: Towards a grounded dialog model for explainable artificial intelligence (2018), arXiv:1806.08055
18. Miller, T.: Explanation in artificial intelligence: Insights from the social sciences (2017)
19. Miller, T., Howe, P., Sonenberg, L.: Explainable AI: beware of inmates running the asylum or: How I learnt to stop worrying and love the social and behavioural sciences (2017)
20. Molnar, C., Casalicchio, G., Bischl, B.: Quantifying Interpretability of Arbitrary Machine Learning Models Through Functional Decomposition. arXiv e-prints (2019)
21. Mueller, S.T., Hoffman, R.R., Clancey, W.J., Emrey, A., Klein, G.: Explanation in Human-AI Systems: A Literature Meta-Review, Synopsis of Key Ideas and Publications, and Bibliography for Explainable AI (2019), arXiv:1902.01876
22. Nori, H., Jenkins, S., Koch, P., Caruana, R.: InterpretML: A Unified Framework for Machine Learning Interpretability (2019)
23. Pecune, F., Murali, S., Tsai, V., Matsuyama, Y., Cassell, J.: A model of social explanations for a conversational movie recommendation system (2019)
24. R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2019), https://www.R-project.org/
25. Ribeiro, M.T., Singh, S., Guestrin, C.: Why Should I Trust You?: Explaining the Predictions of Any Classifier (2016). https://doi.org/10.1145/2939672.2939778
26. Ribera, M., Lapedriza, À.: Can we do better explanations? A proposal of user-centered explainable AI. In: IUI Workshops (2019)
27. Rydelek, A.: xai2cloud: Deploys An Explainer To The Cloud (2020), https://modeloriented.github.io/xai2cloud
28. Scantamburlo, T., Charlesworth, A., Cristianini, N.: Machine decisions and human consequences (2018)
29. Sokol, K., Flach, P.: Conversational Explanations of Machine Learning Predictions Through Class-contrastive Counterfactual Statements. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18. pp. 5785–5786. International Joint Conferences on Artificial Intelligence Organization (7 2018). https://doi.org/10.24963/ijcai.2018/836
30. Sokol, K., Flach, P.: Glass-box: Explaining AI decisions with counterfactual statements through conversation with a voice-enabled virtual assistant. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18. pp. 5868–5870.
International Joint Conferences on Artificial Intelligence Organization (2018), https://doi.org/10.24963/ijcai.2018/865
31. Sokol, K., Flach, P.: One explanation does not fit all. KI - Künstliche Intelligenz (2020). https://doi.org/10.1007/s13218-020-00637-y
32. Tan, H.F., Song, K., Udell, M., Sun, Y., Zhang, Y.: Why should you trust my interpretation? Understanding uncertainty in LIME predictions (2019)
33. Tomsett, R., Braines, D., Harborne, D., Preece, A., Chakraborty, S.: Interpretable to whom? A role-based model for analyzing interpretable machine learning systems (2018)
34. Trestle Technology, LLC: plumber: An API Generator for R (2018)
35. Werner, C.: Explainable AI through rule-based interactive conversation. In: EDBT/ICDT Workshops (2020)