Position: Explainable AI is Causality in Disguise
Amir-Hossein Karimi¹

¹ Department of Electrical & Computer Engineering, University of Waterloo & Vector Institute, Canada. Correspondence to: Amir-Hossein Karimi.

Preprint. March 31, 2026.

Abstract

The demand for Explainable AI (XAI) has triggered an explosion of methods, producing a landscape so fragmented that we now rely on surveys of surveys. Yet, fundamental challenges persist: conflicting metrics, failed sanity checks, and unresolved debates over robustness and fairness. The only consensus on how to achieve explainability is a lack of one. This has led many to point to the absence of a ground truth for defining "the" correct explanation as the main culprit. This position paper posits that the persistent discord in XAI arises not from an absent ground truth but from a ground truth that exists, albeit as an elusive and challenging target: the causal model that governs the relevant system. By reframing XAI queries about data, models, or decisions as causal inquiries, we prove the necessity and sufficiency of causal models for XAI. We contend that without this causal grounding, XAI remains unmoored. Ultimately, we encourage the community to converge around advanced concept and causal discovery to escape this entrenched uncertainty.

1. Introduction

As early as the 1980s, the challenge of explainable AI (XAI) has been recognized as both critical and ambiguously defined (Kodratoff, 1994). Numerous attempts to tackle this issue have led to a diverse array of methods, which are organized and categorized across various surveys. Notable works include those focusing specifically on neural networks (Yosinski et al., 2015; Montavon et al., 2018; Samek et al., 2021), as well as broader surveys addressing explainable AI in general (Doshi-Velez & Kim, 2017; Došilović et al., 2018; Hoffman et al., 2018; Guidotti et al., 2018; Lipton, 2018; Adadi & Berrada, 2018; Gilpin et al., 2018; Miller, 2019; Gunning et al., 2019; Du et al., 2019; Tjoa & Guan, 2020; Arrieta et al., 2020; Carvalho et al., 2019; Murdoch et al., 2019). With each of these survey papers exceeding 1,000 citations, it is perhaps enough to warrant a survey of surveys (Speith, 2022).

Figure 1. Core methods in XAI for explaining an ML model (f : X → Y) are categorized by purpose into data-based, model-based, and decision-based questions. By mapping these directly onto Pearl's Ladder of Causation, we reveal that solving XAI fundamentally requires answering causal inquiries:
• Data-Based ("What?", X), rung 1, Observational, P(X, Y). Q1: What explains the distribution of the data? Q2: What underlying factors generate the data?
• Model-Based ("How?", f), rung 2, Interventional, P(Y | do(X)). Q3: How does the model transform inputs into outputs? Q4: How do the model's internal mechanisms function?
• Decision-Based ("Why (not)?", Y), rung 3, Counterfactual, Y_{X=x′}(u). Q5: Why does the model make a specific decision for a given input? Q6: Why would the decision differ if the input had been different?

Despite the very many attempts, the field continues to grapple with fundamental questions. The definitions of explainability and interpretability may not always be agreed upon (Preece et al., 2018; Ehsan & Riedl, 2024; Leblanc & Germain, 2024; Namatevs et al., 2022; Marcinkevičs & Vogt, 2020),
and debates over accuracy-explainability tradeoffs have split the community into proponents of inherent vs. post-hoc explainability approaches (Rudin, 2019; Gunning & Aha, 2019; Laugel et al., 2019). The lack of consensus over definitions and methodologies is further compounded by concerns over fairness (Von Kügelgen et al., 2022), robustness (Yeh et al., 2019; Ghorbani et al., 2019; Kindermans et al., 2019; Hamon et al., 2020), privacy violations (Shavit & Moses, 2019b), and the susceptibility of explanations to being manipulated or fooled (Dombrowski et al., 2019; Shavit & Moses, 2019a; Heo et al., 2019; Slack et al., 2020; Sullivan & Verreault-Julien, 2022; Wickstrøm et al.). Due to the lack of ground-truth explanations, the community has been compelled to pursue an axiomatic framework for defining explainability (Sundararajan et al., 2017; Janizek et al., 2021; Amgoud & Ben-Naim, 2022); yet, despite their axiomatic appeal, later work has shown failures in essential sanity checks (Adebayo et al., 2018; Tomsett et al., 2020; Karimi et al., 2023).

This position paper argues that XAI is fundamentally a supervised problem where the target is reality itself—in short, XAI is causality in disguise. While acknowledging the difficulty of obtaining this world model, we argue that the real barrier to consensus in XAI lies in the field's near-total disregard for actively seeking it. Causal assumptions, we contend, are essential to bring coherence to XAI by addressing core questions through a principled lens; without such assumptions, XAI methods risk providing explanations that lack rigor, reliability, or generalizability. Several studies have highlighted the importance of causality in XAI, identifying specific areas where a causal foundation could improve existing methods. Karimi et al. (2020; 2021) advocate for incorporating causal relationships into counterfactual explanations to enable actionable outcomes, while Chou et al. (2021) and Baron (2023) critique existing counterfactual methods for lacking causal grounding, which they argue leads to spurious correlations and incomplete explanations. Similarly, Carloni et al. (2023) highlight the absence of causality in current XAI as a critical limitation, emphasizing its necessity for building trust in AI systems. Finally, Beckers (2022) highlights causality's potential for action-guiding explanations in XAI, and Chen et al. (2023) propose integrating causal discovery into XAI methods to enhance interpretability, leading to more actionable explanations. We argue that grounding XAI in causal reasoning offers a principled lens to unify and clarify the field's fragmented objectives.

2. Unifying XAI through Causal Semantics

To understand the role of causality in explainable AI (XAI), we first categorize existing XAI methods based on the primary purpose of their explanations: the data X, the model f,¹ or the decisions Y. As in Figure 1, these questions can be organized into three categories based on purpose:

• Data-based ("What?"): Uncovering the structure and significance of the data X.
• Model-based ("How?"): Exploring how the model f transforms input X into output Y.
• Decision²-based ("Why (not)?"): Interpreting a specific output Y for a given input X and model f.

¹ Here, f represents the predictive model to be explained, distinct from the causal model of the world, M.
² Like Miller (2019), we use "decision" broadly to mean AI system outputs, such as categorizations or action choices.
By structuring XAI methods within this framework, we highlight gaps due to a lack of causal grounding, setting a foundation for our argument that causality is essential for rigorous, valid XAI.

2.1. Data-Based Interpretability ("What?")

Data-based interpretability focuses on understanding the structure of the input data X, answering questions such as:

Q1: "What explains the distribution of the data?"
Q2: "What underlying factors generate the data?"

Data-based interpretability methods are particularly useful for exploratory data analysis and in contexts where understanding biases or clusters within the data is crucial, e.g.,

• Low-Dimensional Representation methods such as Dimensionality Reduction (e.g., PCA, t-SNE (Van der Maaten & Hinton, 2008)) and Manifold Learning (e.g., UMAP (McInnes et al., 2018), Isomap (Tenenbaum et al., 2000)) map high-dimensional data X to lower-dimensional spaces, revealing latent structure, clusters, or trends that explain the distribution of the data and impact predictions.
• Clustering & Group Discovery methods such as k-means, DBSCAN (Schubert et al., 2017), and hierarchical clustering (Murtagh & Contreras, 2012) identify coherent subgroups within X, often reflecting hidden regimes, user subtypes, or covariate shifts, whose boundaries can influence fairness, recourse, or generalization.
• Density & Generative Modeling techniques like Kernel Density Estimation (Węglarczyk, 2018), Normalizing Flows (Rezende & Mohamed, 2015), or Variational Autoencoders (VAEs) (Kingma et al., 2013) aim to estimate p(X) or model the generative process behind the data. These probabilistic models capture the underlying distributional characteristics of X, enabling the detection of anomalies, rare events, and representative modes.

These method families align with the broader goal of understanding the intrinsic structure of data, and many share conceptual ties with causal discovery—the identification of generative mechanisms and dependencies within X itself (Pearl, 2009; Spirtes et al., 2001). Low-dimensional embeddings and manifold learning reveal smooth, interpretable representations of high-dimensional data; clustering highlights natural groupings; and density estimators and generative models characterize the full data distribution. Collectively, these tools expose latent patterns, confounders, and biases in the data that may shape model behavior. They provide an essential foundation for explainability in AI systems, offering model-agnostic insights that complement model-based explanations like attention or attribution.
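As a concrete illustration of this data-based toolbox, the following is a minimal sketch, assuming scikit-learn; the synthetic dataset, cluster count, and bandwidth are all illustrative choices, not part of the original text. It touches each family above: a PCA embedding, k-means grouping, and a kernel density estimate used to flag atypical points.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
# Illustrative data: two latent regimes in a 10-dimensional feature space.
X = np.vstack([rng.normal(0.0, 1.0, (200, 10)),
               rng.normal(3.0, 1.0, (200, 10))])

# Low-dimensional representation: project X to 2 components (toward Q1).
X_2d = PCA(n_components=2).fit_transform(X)

# Group discovery: heuristically recover the two regimes as clusters.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_2d)

# Density modeling: estimate log p(X) and flag low-density points as anomalies.
kde = KernelDensity(bandwidth=1.0).fit(X_2d)
log_density = kde.score_samples(X_2d)
anomalies = np.argsort(log_density)[:5]  # five least typical points

print("cluster sizes:", np.bincount(labels))
print("most anomalous indices:", anomalies)
```

Consistent with the discussion above, these summaries characterize P(X) only; they suggest, but do not identify, the generative mechanisms behind the data.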
2.2. Model-Based Interpretability ("How?")

Model-based interpretability seeks to explain the function f, specifically how the model processes input X to produce output Y. This category addresses questions such as:

Q3: "How does the model transform inputs into outputs?"
Q4: "How do the model's internal mechanisms function?"

Model-based interpretability is essential in regulatory and high-stakes environments where transparency into f's workings is required. These methods include:

• Feature Interaction methods (e.g., Partial Dependence Plots (Friedman, 2001), Accumulated Local Effects (Apley & Zhu, 2020)) explore interactions within f by showing how different features affect Y. Partial Dependence Plots, for instance, illustrate the effect of one or two features on Y while other features are kept constant, revealing interactions in f.
• Feature Attribution methods (e.g., LIME (Ribeiro et al., 2016), SHAP (Lundberg & Lee, 2017)) decompose f(X) to assign an importance score to each feature in X, indicating its contribution to the output Y. Some works interpret these attributions as estimates of local (individual) causal effects (Chattopadhyay et al., 2019), suggesting that LIME can be approximated via input gradients in sufficiently smooth regions.
• Saliency and Visualization methods (e.g., Saliency Maps (Simonyan et al., 2013), Grad-CAM (Selvaraju et al., 2016)) visualize gradients to identify important regions in X that affect Y, such as which image pixels are influential in a prediction. Grad-CAM, for example, generates a heatmap highlighting image regions that impact the model's output.
• Surrogate and Simplified Models aim to approximate complex models f in specific regions using inherently interpretable models (e.g., decision trees, linear models). Towell & Shavlik (1993) extract rules to enhance interpretability in neural networks, and LIME provides local explanations through linear models (Ribeiro et al., 2016). MASALA adapts locality for improved fidelity (Anwar et al., 2024), MaLESCaMo introduces causal surrogate models (Termine et al., 2023), and Laugel et al. (2018) focus on locality for surrogates in post-hoc interpretability.
• Model-Intrinsic Interpretability approaches use interpretable models like linear models, decision trees, and rule-based systems, allowing direct inspection of f's parameters to understand how X maps to Y without post-hoc explanations. For instance, Generalized Additive Models (GAMs) model responses as sums of functions of predictors (Hastie & Tibshirani, 1987). The Bayesian Case Model uses representative cases for interpretability (Kim et al., 2014), while the Bayesian Rule Set framework learns interpretable rule sets (Wang et al., 2017). Interpretable Decision Sets provide a joint framework for description and prediction, facilitating comprehensible decision-making processes (Lakkaraju et al., 2016).

Mechanistic interpretability represents a prime example of successful, exact causal discovery on a model's computation graph. However, because tracking billions of low-level nodes does not scale to human comprehension, it highlights the urgent need for causal abstraction models (Geiger et al., 2023) that group these low-level mechanisms into high-level, human-understandable causal variables.

These approaches are closely related to understanding causal mechanisms (Pearl, 2000; Peters et al., 2017)—the specific processes through which changes in input features X influence the output Y. By attributing importance to features, analyzing interactions, and approximating internal model logic, these methods help uncover the pathways within f that drive model predictions. For example, feature attribution methods quantify each feature's contribution to Y, aligning with causal mechanisms by revealing how particular inputs influence the model's output. Similarly, saliency maps and feature interaction methods highlight key regions and feature dependencies within f, providing an interpretative view of how the model operates. This mechanistic understanding is essential in domains where stakeholders care to see not only which features matter but also how they interact to produce predictions.
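To make the surrogate idea concrete, here is a minimal LIME-style sketch, not the reference implementation: it perturbs an input around x, queries a black-box model, and fits a proximity-weighted linear model whose coefficients serve as local importance scores. The black-box model, kernel width, and sampling scheme are illustrative assumptions; scikit-learn is assumed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)

# Illustrative black box f: a random forest trained on synthetic data.
X_train = rng.normal(size=(500, 4))
y_train = X_train[:, 0] * X_train[:, 1] + np.sin(X_train[:, 2])
f = RandomForestRegressor(random_state=0).fit(X_train, y_train)

def local_linear_surrogate(f, x, n_samples=1000, scale=0.3):
    """Fit a weighted linear model to f in a neighborhood of x."""
    Z = x + rng.normal(scale=scale, size=(n_samples, x.size))  # perturbations
    fz = f.predict(Z)
    # Proximity weights: closer perturbations count more (Gaussian kernel).
    w = np.exp(-np.sum((Z - x) ** 2, axis=1) / (2 * scale ** 2))
    surrogate = Ridge(alpha=1.0).fit(Z - x, fz, sample_weight=w)
    return surrogate.coef_  # local importance of each feature

x = np.array([1.0, -0.5, 0.2, 0.0])
print("local feature importances at x:", local_linear_surrogate(f, x))
```

As the surrounding discussion cautions, these coefficients describe the model's local input-output behavior, not causal effects in the world.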
2.3. Decision-Based Interpretability ("Why (not)?")

Decision-based interpretability focuses on explaining specific outputs Y for given inputs X and model f, addressing questions such as:

Q5: "Why does the model make a specific decision for a given input?"
Q6: "Why would the decision differ if the input had been different?"

Decision-based interpretability is valuable in applications where understanding the rationale behind individual decisions and possible alternatives is crucial, such as in personalized recommendations or legal judgments. Example methods include:

• Counterfactual and Example-based Methods (Wachter et al., 2017) illustrate what minimal changes to X would be necessary to alter the output Y, providing insight into decision boundaries by showing hypothetical scenarios in which the decision would differ.
• Post-hoc Concept-based Explanation Methods (e.g., TCAV (Kim et al., 2018)) explain Y in terms of high-level, human-defined concepts, rather than individual features of X. TCAV, for example, assesses the relevance of specific concepts (like "striped" or "curved") to a prediction, offering an interpretable, concept-level explanation.

These methods draw on concepts from actual causality (Halpern, 2016) by using counterfactual reasoning to explore why a particular outcome was reached. Halpern and Pearl's causal model formalizes this approach, defining causes through counterfactual dependencies that clarify necessary and sufficient conditions for an outcome (Halpern & Pearl, 2005). In practical terms, answering "why" questions involves identifying the minimal changes in X that would alter Y, thereby uncovering the causal factors influencing the decision (Lipton, 1990). Counterfactual reasoning provides actionable insights, as it clarifies the conditions under which an alternative outcome could occur. This concept of causality has also been extended by Woodward (2005), who argues that interventions and counterfactuals provide a foundation for understanding causal explanations and model behavior. By leveraging such causal insights, decision-based interpretability approaches not only highlight decision boundaries but also enhance understanding of model outcomes and potential user actions.
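The following sketch illustrates the counterfactual idea above. It is a toy random-search stand-in for the Wachter et al. (2017) optimization, not their method: it looks for the smallest perturbation of x that flips a classifier's decision. The "loan approval" classifier, search radius, and trial count are illustrative assumptions; scikit-learn is assumed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# Illustrative classifier: "loan approval" from two standardized features.
X_train = rng.normal(size=(400, 2))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
clf = LogisticRegression().fit(X_train, y_train)

def counterfactual(clf, x, target=1, n_trials=5000, max_radius=3.0):
    """Smallest random perturbation of x whose prediction equals `target`."""
    deltas = rng.uniform(-max_radius, max_radius, size=(n_trials, x.size))
    candidates = x + deltas
    flipped = clf.predict(candidates) == target
    dists = np.linalg.norm(deltas, axis=1)
    dists[~flipped] = np.inf          # ignore perturbations that do not flip
    best = np.argmin(dists)
    return candidates[best], dists[best]

x = np.array([-1.0, -0.5])            # a rejected applicant
print("original decision:", clf.predict(x[None, :])[0])
x_cf, dist = counterfactual(clf, x)
print("counterfactual input:", x_cf.round(2), "at distance", round(float(dist), 2))
```

Note that this answers Q6 only with respect to the model f; acting on such a suggestion in the world requires the causal model M, as argued for algorithmic recourse (Karimi et al., 2021) and formalized next.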
To operationalize this taxonomy for natural language queries, one must identify the target of the inquiry. Questions regarding biases in the environment (e.g., "Why are most successful applicants male?") map to Q1/Q2 (the world's SCM). Questions regarding internal architecture or feature weights map to Q3/Q4 (the model's SCM). Questions regarding a specific user outcome (e.g., "How can I change my loan rejection?") map to Q5/Q6 (Decision).

This purpose-driven categorization of data-based, model-based, and decision-based XAI methods structures the response to the XAI questions posed in Figure 1. However, lacking causal assumptions limits robustness and generalizability across contexts. Below, we introduce causal foundations and explore how causal models address these XAI gaps. We will also see how existing lines of circuit-based interpretability and causal abstraction further strengthen the claim that explanation is causal discovery in disguise.

3. Background on Causality

Causality aims to model the relationships between variables where one variable causes changes in another, thereby going beyond mere statistical correlations to capture the underlying mechanisms of the data-generating process. Unlike correlations, causal relationships entail directional influence, allowing one to predict the effect of interventions and counterfactuals in the system (Pearl, 2009). Multiple frameworks formalize causality, including the Potential Outcomes framework (Rubin, 2005), Graphical Models (Spirtes et al., 2001), and Structural Causal Models (SCMs) (Pearl, 2009), each offering unique perspectives on understanding causation. For the purposes of this work, we adopt Pearl's SCM framework, as it provides a rigorous formalism for reasoning about causal mechanisms, interventions, and counterfactuals—critical components for constructing XAI systems. We formalize the claim that access to the true causal model, represented as an SCM, is both sufficient and necessary for addressing purpose-driven methods on the "What?", "How?", and "Why (not)?" of explanations. To ground our claims, we define key concepts and notations employed throughout this section (more in Section B).

Definition 3.1 (Structural Causal Model (SCM)). An SCM M is a tuple ⟨U, V, F, P(U)⟩, where:
• U = {U_1, U_2, ..., U_m} is a set of exogenous variables.
• V = {V_1, V_2, ..., V_n} is a set of endogenous variables.
• F is a set of structural equations {f_V : V ∈ V}, where each f_V maps the parents of V and relevant exogenous variables to V, i.e., V = f_V(pa(V), U_V). F specifies the causal mechanisms underlying the data-generating process, providing a mechanistic description of causal relationships.
• P(U) is a joint probability distribution over the exogenous variables U.

In the context of a deterministic neural network, the endogenous variables (V) are the layers and outputs, while the exogenous variables (U) represent the specific properties of the input instance X = x that are fixed for that forward pass, alongside any external stochasticity (e.g., dropout masks).

Definition 3.2 (Causal Graph). The causal graph G associated with an SCM M is a directed acyclic graph (DAG) whose nodes represent the variables in V, and whose edges represent direct causal relationships as specified by the structural equations in F. G provides a visual representation of causal dependencies and is a fundamental tool for identifying causal pathways and potential confounders (Pearl, 2009; Spirtes et al., 2001).

Definition 3.3 (Observational, Interventional, and Counterfactual Queries). Access to an SCM M enables analysis of three primary types of queries, each offering unique insights into the relationships captured by M:
• Observational Queries: These involve probabilities computed from the observed data distribution P(V). They describe associations between variables as observed without external manipulation and are limited to capturing correlations rather than causation.
• Interventional Queries: Interventions modify the underlying structural equations in F to estimate causal effects. Such interventions are denoted by the do-operator, do(·), representing an exogenous alteration that severs the usual dependence of a variable on its causal parents, allowing for predictions under manipulated conditions. For example, the query P(Y = y | do(X = x)) estimates the probability of Y = y when X is set to x by intervention (Pearl, 2009).
• Counterfactual Queries: Counterfactual queries explore hypothetical scenarios that diverge from observed reality, posing "what if" questions about alternative outcomes. For a given observed outcome, counterfactual reasoning considers what the outcome would have been had certain variables taken different values. This requires conditioning on observed data to infer the values of the exogenous variables, U = u, and then modifying variables, X = x′, to predict the counterfactual Y_{X=x′}(u) (Pearl, 2009; Rubin, 2005)—a process otherwise known as the abduction–action–prediction paradigm.
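To make Definitions 3.1-3.3 concrete, the following toy sketch hand-codes an SCM for the chain X → Y with additive noise and evaluates one query from each rung; the structural equations, coefficients, and noise scales are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy SCM M: X = U_x,  Y = 2*X + U_y,  with U_x, U_y ~ N(0, 1).
def sample_observational(n):
    u_x, u_y = rng.normal(size=n), rng.normal(size=n)
    x = u_x
    y = 2 * x + u_y
    return x, y

def sample_do_x(x_val, n):
    """Interventional: do(X = x_val) severs X from U_x; U_y still varies."""
    u_y = rng.normal(size=n)
    return 2 * x_val + u_y

def counterfactual_y(x_obs, y_obs, x_prime):
    """Abduction-action-prediction for one observed unit (x_obs, y_obs)."""
    u_y = y_obs - 2 * x_obs      # abduction: infer this unit's exogenous U_y
    return 2 * x_prime + u_y     # action do(X = x') and prediction of Y

x, y = sample_observational(5)
print("observational samples:", list(zip(x.round(2), y.round(2))))
print("E[Y | do(X=1)] ~", sample_do_x(1.0, 100000).mean().round(2))  # ~ 2.0
print("Y_{X=1}(u) for unit 0:", counterfactual_y(x[0], y[0], 1.0).round(2))
```

The counterfactual routine is exactly abduction (recover u for the observed unit), action (set X = x′), and prediction (re-evaluate Y), as in Definition 3.3.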
Definition 3.4 (Causal Discovery). Unlike the queries above, which presuppose a causal model, causal discovery (Eberhardt, 2017; Glymour et al., 2019; Malinsky & Danks, 2018; Nogueira et al., 2022; Spirtes & Zhang, 2016; Vowels et al., 2022) aims to infer the causal graph G from observational or experimental data, an essential step for constructing accurate causal models. This process faces challenges, including latent confounders, data scarcity, and reliance on assumptions like causal sufficiency. Methods for causal discovery include constraint-based approaches (e.g., the PC algorithm) (Spirtes et al., 2001), score-based methods (Huang et al., 2018), and functional causal models (e.g., additive noise models) (Peters et al., 2017). The ability to uncover causal relationships is crucial for XAI, as it directly affects the fidelity of the explanations generated.
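As a heavily simplified illustration of constraint-based discovery in the style of the PC algorithm (Gaussian linear data, a fixed threshold standing in for a proper independence test, and no edge-orientation phase), the sketch below recovers the skeleton of a chain X → Y → Z from data alone: X and Z are dependent marginally but independent given Y, so the X-Z edge is removed. Only numpy is assumed; the data-generating process and threshold are illustrative.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)

# Ground-truth chain X -> Y -> Z (unknown to the algorithm).
n = 5000
X = rng.normal(size=n)
Y = 1.5 * X + rng.normal(size=n)
Z = -2.0 * Y + rng.normal(size=n)
data = {"X": X, "Y": Y, "Z": Z}

def partial_corr(a, b, given):
    """Correlation of a and b after regressing out the conditioning set."""
    if given:
        G = np.column_stack(given + [np.ones(len(a))])
        a = a - G @ np.linalg.lstsq(G, a, rcond=None)[0]
        b = b - G @ np.linalg.lstsq(G, b, rcond=None)[0]
    return np.corrcoef(a, b)[0, 1]

# PC-style skeleton: start fully connected, drop an edge whenever its
# endpoints look independent given the empty set or some third variable.
edges = set(combinations(data, 2))
for (u, v) in sorted(edges):
    others = [w for w in data if w not in (u, v)]
    for cond in [[]] + [[w] for w in others]:
        r = partial_corr(data[u], data[v], [data[w] for w in cond])
        if abs(r) < 0.05:
            edges.discard((u, v))
            print(f"removed {u}-{v}: independent given {cond or 'nothing'}")
            break
print("recovered skeleton:", sorted(edges))
```

Real constraint-based methods replace the threshold with statistical tests, search larger conditioning sets, and then orient edges; the faithfulness and sufficiency caveats discussed in Section 4.1 apply throughout.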
4. Sufficiency and Necessity of Causality for Explainable AI

In the following theorems, we first formalize the sufficiency claim, followed by the necessity claim.

Definition 4.1 (Accurate and Complete Answers to Q1-Q6). Following Pearl (2009), we say an answer to any of the six core XAI questions (Q1-Q6 in Figure 1) is accurate and complete if it coincides exactly with what the true Structural Causal Model (SCM) M predicts for that query. Concretely:
• Observational correctness (Q1, Q2): The distribution of observed variables and the underlying generating factors match those in M.
• Interventional correctness (Q3, Q4): The effect of manipulating inputs or tracing internal mechanisms reflects the causal structure of M.
• Counterfactual correctness (Q5, Q6): The counterfactual outcome Y_{X=x′}(u) for a specific exogenous state u matches the counterfactual computed under M.

Theorem 4.2 (Sufficiency of the True SCM for XAI). Let M = ⟨U, V, F, P(U)⟩ be the unique true Structural Causal Model of the data-generating process. Under standard assumptions (acyclicity, no unmeasured confounders, well-defined exogenous variables), having full access to M is sufficient to provide accurate and complete answers to the six core XAI questions (Q1-Q6) depicted in Figure 1.

Proof Sketch (Full proof in App. A). Since M specifies:
1. the causal graph G over the endogenous variables V,
2. a set of structural equations F indicating how each V_i ∈ V depends on its parents pa(V_i) and possibly exogenous U_{V_i}, and
3. the distribution P(U) over the exogenous variables,
it uniquely determines the joint distribution of all variables, any interventional distribution via the do-operator do(·), and any counterfactual query via abduction–action–prediction (Pearl, 2009). Mapping these distributions to Q1-Q6:
• Q1 (Distribution of data) & Q2 (Underlying factors). The law of structural models allows us to derive P(V) exactly from M, and we see how the exogenous variables U and functions f_{V_i} generate the observed data.
• Q3 (How does the model process inputs?) & Q4 (How do internal mechanisms operate?). By tracing causal pathways in G (and applying F iteratively), we reveal how input X propagates to output Y through intermediate variables (hidden layers or sub-modules).
• Q5 (Why a specific decision?) & Q6 (Why would the decision differ?). Given (X = x, Y = y), we infer the exogenous u (abduction), modify X ← x′ (action), and compute Y_{X=x′}(u) (prediction), explaining both why the model made its decision and how it would change under a different input.
Because M yields precise observational, interventional, and counterfactual results, it provides complete and accurate explanations for all six questions. Thus, knowing the true SCM is sufficient for XAI.

Assumption 4.3 (Q1-Q6 as a Separating Family). Let Q = {Q1, ..., Q6} denote the query functionals implied by Figure 1. We assume Q is separating for the model class under consideration. That is, for any two SCMs M ≠ M̂ in the class, there exists a query Q ∈ Q such that Q(M) ≠ Q(M̂).

Theorem 4.4 (Necessity of the True SCM for XAI). Suppose a dataset V is generated by a true but unknown SCM M. If an alternative model M̂ does not match M in at least one structural equation or in its exogenous distribution P(U), then there exists at least one of the six XAI questions (Q1-Q6) for which M̂ cannot provide an accurate and complete answer.

Proof Sketch (Full proof in App. A). Recall that accurate and complete answers require reproducing exactly the observational, interventional, or counterfactual results from M. We prove by contradiction:
1. Assume M̂ is a different SCM than M but still claims to yield correct answers for all Q1-Q6.
2. There are three broad query types:
• Observational (Q1-Q2): If M̂ differs in F or P(U), it may induce a different joint distribution over V, misidentifying underlying data factors (Q2).
• Interventional (Q3-Q4): Even if M̂ matches observationally, the do-operator do(X = x) can produce different outcomes in M̂ vs. M due to differences in causal structure or confounding assumptions (Pearl, 2009).
• Counterfactual (Q5-Q6): Counterfactual questions rely on abduction–action–prediction with the true exogenous state. A mismatch in structural equations leads to different counterfactual results.
3. Hence, there must be at least one question among Q1-Q6 where M̂'s answer diverges from M's. This contradicts the assumption that M̂ is correct for all XAI questions.
Thus, to ensure accuracy and completeness across all six questions, access to the true M is necessary for XAI.

4.1. Discussion on Robustness and Limitations

The Normative Ideal. We emphasize that Theorems 4.2 and 4.4 define a normative ideal. Defining a theoretical upper bound relative to an oracle SCM (M = ⟨U, V, F, P(U)⟩) is not a tautology, but a necessary formalization to establish the target semantics of XAI.
Just as the Bayes Optimal Classifier defines the theoretical limit of predictive modeling—an oracle we rarely access but continually approximate—the oracle SCM defines the theoretical limit of explainability. This frames all practical XAI methods as approximate causal discovery.

Trading One Hard Problem for Another. However, in real-world applications, such oracle-level causal knowledge is rarely accessible. We recognize that by establishing this ideal, we are seemingly reducing the difficult problem of explainability to the equally difficult problem of high-dimensional causal discovery. However, we argue it is better for the community to tackle this hard problem explicitly than to continue developing yet another XAI method without a rigorous way to gauge its effectiveness. By making the causal ground truth explicit, we establish a principled metric for progress, discouraging the proliferation of heuristic methods that lack formal semantics.

Challenges in Causal Discovery. Inferring the true M from data presents several well-known obstacles:

• Faithfulness and Causal Sufficiency. Causal discovery typically assumes faithfulness (i.e., observed independencies reflect true causal structure) and causal sufficiency (i.e., no hidden common causes). If these assumptions fail, the inferred causal structure may be incorrect.³
• Sample Complexity and Computational Constraints. Even when causal sufficiency holds, reliable causal discovery requires a large sample size, especially in high-dimensional settings. The number of samples required grows exponentially with the number of variables, making exhaustive search computationally infeasible (Kalisch & Bühlmann, 2007).
• Identifiability and Equivalence Classes. Even with unlimited data and valid assumptions, causal discovery methods often recover only a Markov equivalence class of DAGs—multiple causal graphs that imply the same observational dependencies (Spirtes et al., 2001). This ambiguity means that without interventional data, key causal relationships may remain unresolved.

Partial Causal Knowledge and Robustness. Because exhaustive causal discovery is often intractable, practitioners must rely on estimated, partial models (M̂) that incorporate known domain relationships. While this does not guarantee absolute correctness, it provides immense value over purely statistical methods. Specifically, the principle of Independent Causal Mechanisms (ICM) provides a powerful theoretical justification for causal XAI. When an environment undergoes a distribution shift, typically only one causal mechanism changes while others remain invariant. By anchoring explanations in causal abstractions rather than purely associational features, XAI methods inherit this invariance, providing explanations that remain robust across varying contexts. To further assess robustness, sensitivity analysis (Saltelli et al., 2004) can quantify the stability of these explanations under small perturbations to M̂.

Takeaway. The ideal of fully accurate and complete XAI is difficult to achieve due to the limitations above. In light of these challenges, correlation-based explanations (e.g., feature importances, saliency maps) may suffice when the goal is merely to detect patterns, biases, or anomalies rather than to enable interventions. Nonetheless, a more nuanced view is that the required level of causal grounding depends on the stakeholder's objective. When reliability matters—such as in high-stakes decision-making—approximate causal models, even if imperfect, yield explanations that are fundamentally more actionable and robust when addressing the diverse interpretability questions in Figure 1.

³ Consider three variables (X, Y, Z) where Z is an unmeasured confounder influencing both X and Y (i.e., Z → X, Z → Y). Without observing Z, the learned M̂ may wrongly suggest a direct causal link between X and Y, leading to incorrect explanations.
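The confounded triple in footnote 3 is easy to verify numerically. In the sketch below (numpy only; the coefficients and sample size are illustrative assumptions), X has no causal effect on Y, yet the associational "importance" of X is large until the hidden confounder Z is adjusted for.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hidden confounder Z -> X and Z -> Y; X has NO direct effect on Y.
n = 20000
Z = rng.normal(size=n)
X = 1.0 * Z + 0.5 * rng.normal(size=n)
Y = 2.0 * Z + 0.5 * rng.normal(size=n)

def ols_coef(features, target):
    """Least-squares coefficients (with intercept) of target on features."""
    A = np.column_stack(features + [np.ones(n)])
    return np.linalg.lstsq(A, target, rcond=None)[0]

# Associational "explanation": regress Y on X alone (Z unobserved).
print("importance of X without Z:", ols_coef([X], Y)[0].round(2))    # ~1.6, spurious
# With the confounder observed and adjusted, X's coefficient vanishes.
print("importance of X given Z:  ", ols_coef([X, Z], Y)[0].round(2)) # ~0.0
```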
5. A Way Forward

Recognizing these challenges, we propose strategic directions to address them, focusing on two interrelated tasks: Concept Discovery and Relation Discovery. By advancing methods in these areas, we can approximate causal models more effectively and enhance explainable AI. Despite the limitations, we encourage the community to embrace these challenges, as they are essential steps toward realizing that explainable AI is, in essence, causality in disguise.

5.1. Dual Challenges in Causal XAI: Concept Discovery and Relation Discovery

Concept Discovery. Effective explanations require a shared language of interpretable concepts {Z_i} that align with the stakeholder's understanding. Explanations should be constructed using well-defined, semantically clear variables to ensure meaningful communication. Current XAI methods vary along a Concept-Alignment Spectrum:

• Fully Specified Concepts: At one end, methods like SHAP (Lundberg & Lee, 2017) and causal recourse (Karimi et al., 2021) provide explanations using features X_i with direct semantic meaning, such as age or income. These methods produce mappings φ : X → R that quantify feature contributions and support actionable interventions.
• Low-Level Features: At the other end, methods like saliency maps (Simonyan et al., 2013) highlight groups of pixels in images, which lack inherent semantic meaning and require abstraction to align with human concepts.
• Concept-Based Methods: In the middle, methods like TCAV (Kim et al., 2018) attempt to align explanations with predefined concepts by measuring alignment with existing embeddings. However, TCAV is limited to known concepts and cannot discover new, relevant concepts—the "unknown unknowns"—that may be crucial for understanding the model's behavior (see the sketch following this discussion).

To enhance concept discovery, we advocate for methods that can uncover new concepts; causal approaches such as Concept Bottleneck Models (Koh et al., 2020), the Causal Concept Effect (Goyal et al., 2019), and Neuro-Symbolic Concept Learners (Marconato et al., 2023a; Ellis et al., 2023) offer promising directions by treating concepts as entities that facilitate action and interpretability. These methods enable both structured learning and deeper understanding by integrating causal reasoning into concept discovery. Concepts should also be identified at a granularity that is useful to a given stakeholder (or audience). Even if we had a perfect SCM of low-level features (e.g., pixels), explanations would remain unhelpful unless translated to higher-level abstractions that align with human mental models (Beckers & Halpern, 2019; Rubenstein et al., 2017). Future research should thus emphasize learning and serving these causal concepts at the right level of detail, possibly via user interaction or iterative refinement.
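As referenced in the concept-based bullet above, here is a heavily simplified TCAV-style sketch, assuming scikit-learn and synthetic "activations"; statistical testing is omitted and a finite-difference stand-in replaces TCAV's directional derivative. A linear probe separating concept examples from random ones yields a concept activation vector (CAV), and the score is the fraction of inputs whose output increases when nudged along it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
d = 16  # dimensionality of the (synthetic) activation space

# Illustrative "activations": concept examples share a direction c_true.
c_true = rng.normal(size=d)
c_true /= np.linalg.norm(c_true)
acts_concept = rng.normal(size=(100, d)) + 2.0 * c_true
acts_random = rng.normal(size=(100, d))

# 1) Fit a linear probe; its weight vector defines the CAV.
probe_X = np.vstack([acts_concept, acts_random])
probe_y = np.array([1] * 100 + [0] * 100)
cav = LogisticRegression().fit(probe_X, probe_y).coef_[0]
cav /= np.linalg.norm(cav)

# Illustrative downstream "logit" as a function of activations.
def logit(a):
    return a @ c_true + 0.1 * a.sum(axis=-1)

# 2) TCAV-style score: fraction of inputs whose logit increases when the
#    activation is nudged along the CAV (finite-difference sensitivity).
inputs = rng.normal(size=(500, d))
eps = 1e-3
sensitivity = (logit(inputs + eps * cav) - logit(inputs)) / eps
print("CAV alignment with true concept:", round(float(cav @ c_true), 2))
print("TCAV-style score:", float((sensitivity > 0).mean()))
```

This measures alignment with a predefined concept only; discovering the concept vocabulary itself is exactly the open problem this subsection highlights.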
Relation Discovery. Discovering causal relationships among identified variables {V_i} remains a foundational challenge in interpretability. Traditional causal discovery and structure learning aim to infer a directed acyclic graph G = (V, E), where V represents variables and E captures causal dependencies. Established algorithms like PC (Spirtes et al., 2001) and score-based methods (Huang et al., 2018) provide structure but are often computationally demanding in high-dimensional settings. We propose leveraging advances in causal representation learning (Bengio et al., 2019; Schölkopf et al., 2021), which strives to capture both the concept space and the causal structure. These approaches can deepen interpretability by jointly learning representations that are both semantically meaningful and causally informative. However, the scalability challenge becomes especially pressing for large-scale or high-dimensional models (e.g., LLMs). Here, purely symbolic or conditional-independence-based causal discovery can be prohibitively slow. Exploring approximations such as sparse regressions, online structure learning, or domain-guided heuristics (Granger, 1969) may be necessary to handle real-world data at scale.

5.2. Leveraging Approximate Models and Interactive Approaches

In practice, obtaining a fully accurate causal model is often infeasible due to data and computational limitations. To address this, we advocate for approximate causal models supplemented by interactive, user-driven methods. By iteratively refining causal structures through user feedback and interventions, approximate models can better align with real-world needs, enabling users to validate and adjust causal assumptions as needed.

In scenarios where full causal structure discovery is impractical, interactive approaches enable iterative refinement of causal models based on user interactions and counterfactual queries. This user-in-the-loop methodology aligns with recent advances in chain-of-thought reasoning (Wei et al., 2022) and large language models (e.g., GPT-4), allowing explanations to evolve with stakeholder feedback, enhancing their relevance and causal grounding. Moreover, an interactive process can reveal the "right" level of abstraction for each user's goals (Teso et al., 2023), acknowledging that an exhaustive model of the world is neither feasible nor desirable for most tasks. Instead, explanations should focus on those causal factors that the user can understand and act upon, effectively capturing a subset of the world's SCM aligned with the user's mental model (Gerstenberg, 2024; Gerstenberg et al., 2021).

5.3. Summary of Recommendations

In spite of these limitations, our core thesis remains: XAI is causality in disguise. Advances in concept and relation discovery will enable the construction of (approximate) causal models that enhance the rigor, reliability, and applicability of explanations. We encourage the community to invest in:

1. Developing Robust Causal Discovery Algorithms: Improving methods to better handle high-dimensional data, hidden confounders, and model misspecification. Future work should also explore multi-level abstractions (Rubenstein et al., 2017; Beckers & Halpern, 2019) to balance expressivity with user interpretability.
2. Advancing Causal Representation Learning: Jointly learning concepts and causal relations that are interpretable, stable, and scalable.
Concept discovery should align with users' internal models, recognizing that no single "correct" variable decomposition exists universally (Teso et al., 2023). This calls for methods that bridge machine-learned representations with human-understandable structures.
3. Promoting Interactive Explanations: Engaging stakeholders in refining causal models through iterative feedback. This aligns with "explanatory interactive learning" (Teso & Kersting, 2019), where users refine models by correcting explanations, steering causal learning to ensure relevance and actionability.

By pursuing these directions, we can mitigate the practical challenges of causal XAI while moving toward a principled foundation for explainability.

5.4. Alternative Views: Possible Limitations of SCMs for Representing Human Intuition

While we argue that explainable AI is fundamentally a causal problem, an opposing perspective questions whether Structural Causal Models (SCMs) are the right framework for capturing the human reasoning used for explanation. Specifically, SCMs are often criticized for their limited expressiveness in representing rich, structured mental models of the world. Human reasoning frequently operates through intuitive theories (Gerstenberg & Tenenbaum, 2017), which go beyond the propositional nature of SCMs. For example, in physics, people intuitively understand the world in terms of objects, forces, and attributes (e.g., mass, elasticity, friction), rather than abstract causal graphs. When reasoning counterfactually, humans naturally ask questions such as "What if this object hadn't been there?" or "What if a reasonable person had acted differently?"—queries that are difficult to formalize in an SCM, where variables typically represent discrete events or predefined states. Unlike SCMs, which encode causal mechanisms as structured equations over variables, human cognition often blends causal reasoning with spatial, temporal, and qualitative constraints, making it unclear whether SCMs are the best mathematical framework for modeling how people construct and interpret explanations. For further discussion of SCMs and their comparison to Potential Outcomes, see Appendix B.2.

One response to this challenge is to extend SCMs with hierarchical abstractions that align with how humans structure knowledge. Recent work on causal abstraction models and neuro-symbolic reasoning offers promising directions by introducing layers of representation that move beyond traditional SCM constraints. However, these approaches remain an open area of research, and critics argue that a truly human-aligned XAI framework may require fundamentally different tools—potentially drawing from cognitive science, probabilistic programs, or physics-inspired models—to bridge the gap between mechanistic causality and intuitive human understanding.

5.5. Conclusion

The vast landscape of explainable AI is marked by an overwhelming number of methods, surveys, and perspectives, all of which underscore the field's current lack of consensus. This paper argues that achieving such consensus hinges on viewing XAI through a causal lens, demonstrating through formal necessity and sufficiency results that causal assumptions are both essential and adequate to address purpose-driven questions around the "What?", "How?", and "Why (not)?" of explanations.
By positioning explanations within a causal model, researchers and practitioners can align on clearer, more robust foundations for XAI, effectively viewing it as causality in disguise.

Building on this viewpoint, we advocate for advancing concept discovery and relation discovery to identify variables and causal links at a level of abstraction that matches stakeholders' mental models. In practice, approximate causal modeling and interactive refinement are key. By iteratively engaging users (e.g., through counterfactual queries or explanatory interactive learning), we can converge on a causal representation that offers actionable insights while accommodating the complexities of real-world systems. Ultimately, we encourage the community to see beyond fragmented XAI methods and move toward a unified causal framework—one that embraces multi-level abstractions, interactive approaches, and real-world constraints. Although challenges like scalability, incomplete domain knowledge, and unmeasured confounders remain, they should be viewed not as barriers but as opportunities to refine and extend causal discovery methodologies for explainable AI. By doing so, we believe the field can progress toward a shared, actionable approach to XAI that balances rigor, utility, and adaptability for diverse stakeholders.

References

Adadi, A. and Berrada, M. Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access, 6:52138–52160, 2018.
Adebayo, J., Gilmer, J., Muelly, M., Goodfellow, I., Hardt, M., and Kim, B. Sanity checks for saliency maps. Advances in Neural Information Processing Systems, 31, 2018.
Adolfi, F., Vilas, M. G., and Wareham, T. The computational complexity of circuit discovery for inner interpretability. arXiv preprint arXiv:2410.08025, 2024.
Amgoud, L. and Ben-Naim, J. Axiomatic foundations of explainability. In IJCAI, pp. 636–642. Vienna, 2022.
Anwar, S., Griffiths, N., Bhalerao, A., and Popham, T. MASALA: Model-agnostic surrogate explanations by locality adaptation. arXiv preprint arXiv:2408.10085, 2024.
Apley, D. W. and Zhu, J. Visualizing the effects of predictor variables in black box supervised learning models. Journal of the Royal Statistical Society Series B: Statistical Methodology, 82(4):1059–1086, 2020.
Arrieta, A. B., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., García, S., Gil-López, S., Molina, D., Benjamins, R., et al. Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58:82–115, 2020.
Barceló, P., Monet, M., Pérez, J., and Subercaseaux, B. Model interpretability through the lens of computational complexity. Advances in Neural Information Processing Systems, 33:15487–15498, 2020.
Baron, S. Explainable AI and causal understanding: Counterfactual approaches considered. SpringerLink, 2023.
Bassan, S., Amir, G., and Katz, G. Local vs. global interpretability: A computational complexity perspective. arXiv preprint arXiv:2406.02981, 2024.
Beckers, S. Causal explanations and XAI. In Conference on Causal Learning and Reasoning, pp. 90–109. PMLR, 2022.
Beckers, S. and Halpern, J. Y. Abstracting causal models. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pp. 2678–2685, 2019. URL https://www.cs.cornell.edu/home/halpern/papers/abstraction.pdf.
Bengio, Y., Deleu, T., Rahaman, N., Ke, N., Lachapelle, S., Bilaniuk, O., Goyal, A., and Pal, C. J. A meta-transfer objective for learning to disentangle causal mechanisms. arXiv preprint arXiv:1901.10912, 2019.
Carloni, G., Berti, A., and Colantonio, S. The role of causality in explainable artificial intelligence. arXiv preprint arXiv:2309.09901, 2023.
Carvalho, D. V., Pereira, E. M., and Cardoso, J. S. Machine learning interpretability: A survey on methods and metrics. Electronics, 8(8):832, 2019.
Chattopadhyay, A., Manupriya, P., Sarkar, A., and Balasubramanian, V. N. Neural network attributions: A causal perspective. In International Conference on Machine Learning, pp. 981–990. PMLR, 2019.
Chen, Z. et al. Causal explainable AI. In Proceedings of the 2023 IEEE Conference on AI, 2023.
Chou, Y.-L. et al. Counterfactuals and causability in explainable artificial intelligence: Theory, algorithms, and applications. arXiv preprint arXiv:2103.04244, 2021.
Dombrowski, A.-K., Alber, M., Anders, C., Ackermann, M., Müller, K.-R., and Kessel, P. Explanations can be manipulated and geometry is to blame. Advances in Neural Information Processing Systems, 32, 2019.
Doshi-Velez, F. and Kim, B. Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608, 2017.
Došilović, F. K., Brčić, M., and Hlupić, N. Explainable artificial intelligence: A survey. In 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), pp. 0210–0215. IEEE, 2018.
Du, M., Liu, N., and Hu, X. Techniques for interpretable machine learning. Communications of the ACM, 63(1):68–77, 2019.
Eberhardt, F. Introduction to the foundations of causal discovery. International Journal of Data Science and Analytics, 3:81–91, 2017.
Ehsan, U. and Riedl, M. O. Social construction of XAI: Do we need one definition to rule them all? Patterns, 5(2), 2024.
Ellis, K., Wong, L., Nye, M., Sable-Meyer, M., Cary, L., Anaya Pozo, L., Hewitt, L., Solar-Lezama, A., and Tenenbaum, J. B. DreamCoder: growing generalizable, interpretable knowledge with wake–sleep Bayesian program learning. Philosophical Transactions of the Royal Society A, 381(2251):20220050, 2023.
Friedman, J. H. Greedy function approximation: a gradient boosting machine. Annals of Statistics, pp. 1189–1232, 2001.
Geiger, A., Ibeling, D., Zur, A., Chaudhary, M., Chauhan, S., Huang, J., Arora, A., Wu, Z., Goodman, N., Potts, C., et al. Causal abstraction: A theoretical foundation for mechanistic interpretability. arXiv preprint arXiv:2301.04709, 2023.
Gerstenberg, T. Counterfactual simulation in causal cognition. Trends in Cognitive Sciences, 2024.
Gerstenberg, T. and Tenenbaum, J. B. Intuitive theories. 2017.
Gerstenberg, T., Goodman, N. D., Lagnado, D. A., and Tenenbaum, J. B. A counterfactual simulation model of causal judgments for physical events. Psychological Review, 128(5):936, 2021.
Ghorbani, A., Abid, A., and Zou, J. Interpretation of neural networks is fragile. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pp. 3681–3688, 2019.
Gilpin, L. H., Bau, D., Yuan, B. Z., Bajwa, A., Specter, M., and Kagal, L. Explaining explanations: An overview of interpretability of machine learning. In 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), pp. 80–89. IEEE, 2018.
Glymour, C., Zhang, K., and Spirtes, P. Review of causal discovery methods based on graphical models. Frontiers in Genetics, 10:524, 2019.
Goyal, Y., Feder, A., Shalit, U., and Kim, B. Explaining classifiers with causal concept effect (CaCE). arXiv preprint arXiv:1907.07165, 2019.
Granger, C. W. Investigating causal relations by econometric models and cross-spectral methods. Econometrica: Journal of the Econometric Society, pp. 424–438, 1969.
Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., and Pedreschi, D. A survey of methods for explaining black box models. ACM Computing Surveys (CSUR), 51(5):1–42, 2018.
Gunning, D. and Aha, D. DARPA's explainable artificial intelligence (XAI) program. AI Magazine, 40(2):44–58, 2019.
Gunning, D., Stefik, M., Choi, J., Miller, T., Stumpf, S., and Yang, G.-Z. XAI—explainable artificial intelligence. Science Robotics, 4(37):eaay7120, 2019.
Halpern, J. Y. Actual Causality. MIT Press, 2016.
Halpern, J. Y. and Pearl, J. Causes and explanations: A structural-model approach. Part I: Causes. The British Journal for the Philosophy of Science, 56(4):843–887, 2005.
Hamon, R., Junklewitz, H., Sanchez, I., et al. Robustness and explainability of artificial intelligence. Publications Office of the European Union, 207:2020, 2020.
Hastie, T. and Tibshirani, R. Generalized additive models: some applications. Journal of the American Statistical Association, 82(398):371–386, 1987.
Heo, J., Joo, S., and Moon, T. Fooling neural network interpretations via adversarial model manipulation. Advances in Neural Information Processing Systems, 32, 2019.
Hoffman, R. R., Mueller, S. T., Klein, G., and Litman, J. Metrics for explainable AI: Challenges and prospects. arXiv preprint arXiv:1812.04608, 2018.
Huang, B., Zhang, K., Lin, Y., Schölkopf, B., and Glymour, C. Generalized score functions for causal discovery. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1551–1560, 2018.
Imbens, G. W. Potential outcome and directed acyclic graph approaches to causality: Relevance for empirical practice in economics. Journal of Economic Literature, 58(4):1129–1179, 2020.
Janizek, J. D., Sturmfels, P., and Lee, S.-I. Explaining explanations: Axiomatic feature interactions for deep networks. Journal of Machine Learning Research, 22(104):1–54, 2021.
Janzing, D., Minorics, L., and Blöbaum, P. Feature relevance quantification in explainable AI: A causal problem. In International Conference on Artificial Intelligence and Statistics, pp. 2907–2916. PMLR, 2020.
Kalisch, M. and Bühlmann, P. Estimating high-dimensional directed acyclic graphs with the PC-algorithm. Journal of Machine Learning Research, 8(3), 2007.
Karimi, A.-H., von Kügelgen, J., Schölkopf, B., and Valera, I. Algorithmic recourse under imperfect causal knowledge: A probabilistic approach. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
Karimi, A.-H., Schölkopf, B., and Valera, I. Algorithmic recourse: from counterfactual explanations to interventions. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 353–362, 2021.
Karimi, A.-H., Muandet, K., Kornblith, S., Schölkopf, B., and Kim, B. On the relationship between explanation and prediction: a causal view. In Proceedings of the 40th International Conference on Machine Learning, pp. 15861–15883, 2023.
Kim, B., Rudin, C., and Shah, J. A. The Bayesian case model: A generative approach for case-based reasoning and prototype classification. Advances in Neural Information Processing Systems, 27, 2014.
Kim, B., Wattenberg, M., Gilmer, J., Cai, C., Wexler, J., and Viegas, F. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In Proceedings of the 35th International Conference on Machine Learning (ICML), pp. 2668–2677. PMLR, 2018.
Kindermans, P.-J., Hooker, S., Adebayo, J., Alber, M., Schütt, K. T., Dähne, S., Erhan, D., and Kim, B. The (un)reliability of saliency methods. Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, pp. 267–280, 2019.
Kingma, D. P., Welling, M., et al. Auto-encoding variational Bayes, 2013.
Kodratoff, Y. The comprehensibility manifesto. KDD Nugget Newsletter, 94(9), 1994.
Koh, P. W., Nguyen, T., Tang, Y. S., Mussmann, S., Pierson, E., Kim, B., and Liang, P. Concept bottleneck models. In International Conference on Machine Learning, pp. 5338–5348. PMLR, 2020.
Lakkaraju, H., Bach, S. H., and Leskovec, J. Interpretable decision sets: A joint framework for description and prediction. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1675–1684, 2016.
Laugel, T., Renard, X., Lesot, M.-J., Marsala, C., and Detyniecki, M. Defining locality for surrogates in post-hoc interpretability. arXiv preprint arXiv:1806.07498, 2018.
Laugel, T., Lesot, M.-J., Marsala, C., Renard, X., and Detyniecki, M. The dangers of post-hoc interpretability: Unjustified counterfactual explanations. arXiv preprint arXiv:1907.09294, 2019.
Leblanc, B. and Germain, P. Seeking interpretability and explainability in binary activated neural networks. In World Conference on Explainable Artificial Intelligence, pp. 3–20. Springer, 2024.
Lipton, P. Contrastive explanation. Royal Institute of Philosophy Supplements, 27:247–266, 1990.
Lipton, Z. C. The mythos of model interpretability: In machine learning, the concept of interpretability is both important and slippery. Queue, 16(3):31–57, 2018.
Lundberg, S. M. and Lee, S.-I. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30:4765–4774, 2017.
Malinsky, D. and Danks, D. Causal discovery algorithms: A practical guide. Philosophy Compass, 13(1):e12470, 2018.
Marcinkevičs, R. and Vogt, J. E. Interpretability and explainability: A machine learning zoo mini-tour. arXiv preprint arXiv:2012.01805, 2020.
Marconato, E., Passerini, A., and Teso, S. Interpretability is in the mind of the beholder: A causal framework for human-interpretable representation learning. Entropy, 25(12):1574, 2023a.
Marconato, E., Teso, S., Vergari, A., and Passerini, A. Not all neuro-symbolic concepts are created equal: Analysis and mitigation of reasoning shortcuts. Advances in Neural Information Processing Systems, 36:72507–72539, 2023b.
McInnes, L., Healy, J., and Melville, J. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426, 2018.
Miller, T. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 267:1–38, 2019.
Mittelstadt, B., Russell, C., and Wachter, S. Explaining explanations in AI. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pp. 279–288, 2019.
Montagna, F., Faller, P. M., Bloebaum, P., Kirschbaum, E., and Locatello, F. Score matching through the roof: linear, nonlinear, and latent variables causal discovery. arXiv preprint arXiv:2407.18755, 2024.
Montavon, G., Samek, W., and Müller, K.-R. Methods for interpreting and understanding deep neural networks. Digital Signal Processing, 73:1–15, 2018.
Murdoch, W. J., Singh, C., Kumbier, K., Abbasi-Asl, R., and Yu, B. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019.
Murtagh, F. and Contreras, P. Algorithms for hierarchical clustering: an overview. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2(1):86–97, 2012.
Namatevs, I., Sudars, K., and Dobrajs, A. Interpretability versus explainability: Classification for understanding deep learning systems and models. Computer Assisted Methods in Engineering and Science, 29(4):297–356, 2022.
Nogueira, A. R., Pugnana, A., Ruggieri, S., Pedreschi, D., and Gama, J. Methods and tools for causal discovery and causal inference. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 12(2):e1449, 2022.
Pearl, J. Models, Reasoning and Inference. Cambridge University Press, 2000.
Pearl, J. Causality. Cambridge University Press, 2009.
Pearl, J. The seven tools of causal inference, with reflections on machine learning. Communications of the ACM, 62(3):54–60, 2019.
Peters, J., Janzing, D., and Schölkopf, B. Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press, 2017.
Preece, A., Harborne, D., Braines, D., Tomsett, R., and Chakraborty, S. Stakeholders in explainable AI. arXiv preprint arXiv:1810.00184, 2018.
Rezende, D. and Mohamed, S. Variational inference with normalizing flows. In International Conference on Machine Learning, pp. 1530–1538. PMLR, 2015.
Ribeiro, M. T., Singh, S., and Guestrin, C. "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144, 2016.
Rubenstein, P. K., Weichwald, S., Bongers, S., Mooij, J. M., Janzing, D., Grosse-Wentrup, M., and Schölkopf, B. Causal consistency of structural equation models. arXiv preprint arXiv:1707.00819, 2017.
Rubin, D. B. Causal inference using potential outcomes: Design, modeling, decisions. Journal of the American Statistical Association, 100(469):322–331, 2005.
Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
Saltelli, A., Tarantola, S., Campolongo, F., Ratto, M., et al. Sensitivity Analysis in Practice: A Guide to Assessing Scientific Models, volume 1. Wiley Online Library, 2004.
Samek, W., Montavon, G., Lapuschkin, S., Anders, C. J., and Müller, K.-R. Explaining deep neural networks and beyond: A review of methods and applications. Proceedings of the IEEE, 109(3):247–278, 2021.
Schölkopf, B., Locatello, F., Bauer, S., Ke, N., Kalchbrenner, N., Goyal, A., and Bengio, Y. Toward causal representation learning. Proceedings of the IEEE, 109(5):612–634, 2021.
Schubert, E., Sander, J., Ester, M., Kriegel, H. P., and Xu, X. DBSCAN revisited, revisited: Why and how you should (still) use DBSCAN. ACM Transactions on Database Systems (TODS), 42(3):1–21, 2017.
Selvaraju, R. R., Das, A., Vedantam, R., Cogswell, M., Parikh, D., and Batra, D. Grad-CAM: Why did you say that? arXiv preprint, 2016.
Shavit, Y. and Moses, W. S. Extracting incentives from black-box decisions. arXiv preprint, 2019a.
Shavit, Y. and Moses, W. S. Extracting incentives from black-box decisions. arXiv preprint, 2019b.
Simonyan, K., Vedaldi, A., and Zisserman, A. Deep inside convolutional networks: Visualising image classification models and saliency maps. In Workshop at International Conference on Learning Representations (ICLR), 2013.
Slack, D., Hilgard, S., Jia, E., Singh, S., and Lakkaraju, H. Fooling LIME and SHAP: Adversarial attacks on post hoc explanation methods. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, pp. 180–186, 2020.
Speith, T. A review of taxonomies of explainable artificial intelligence (XAI) methods. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, pp. 2239–2250, 2022.
Spirtes, P. and Zhang, K. Causal discovery and inference: Concepts and recent methodological advances. In Applied Informatics, volume 3, pp. 1–28. Springer, 2016.
Spirtes, P., Glymour, C., and Scheines, R. Causation, Prediction, and Search. MIT Press, 2001.
Sullivan, E. and Verreault-Julien, P. From explanation to recommendation: Ethical standards for algorithmic recourse. In Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society, pp. 712–722, 2022.
Sundararajan, M., Taly, A., and Yan, Q. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pp. 3319–3328. PMLR, 2017.
Tenenbaum, J. B., Silva, V. d., and Langford, J. C. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.
Termine, A., Antonucci, A., and Facchini, A. Machine learning explanations by surrogate causal models (MaLESCaMo). In xAI (Late-breaking Work, Demos, Doctoral Consortium), pp. 59–64, 2023.
Teso, S. and Kersting, K. Explanatory interactive machine learning. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI), pp. 1–11. ACM, 2019. doi: 10.1145/3306618.3314293. URL https://dl.acm.org/doi/abs/10.1145/3306618.3314293.
Teso, S. et al. Leveraging explanations in interactive machine learning: An overview. Entropy, 25(3):441, 2023. doi: 10.3390/e25030441. URL https://www.mdpi.com/1099-4300/25/3/441.
Tjoa, E. and Guan, C. A survey on explainable artificial intelligence (XAI): Toward medical XAI. IEEE Transactions on Neural Networks and Learning Systems, 32(11):4793–4813, 2020.
Tomsett, R., Harborne, D., Chakraborty, S., Gurram, P., and Preece, A. Sanity checks for saliency metrics. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pp. 6021–6029, 2020.
Towell, G. G. and Shavlik, J. W. Extracting refined rules from knowledge-based neural networks. Machine Learning, 13:71–101, 1993.
Van der Maaten, L. and Hinton, G. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(11), 2008.
Von Kügelgen, J., Karimi, A.-H., Bhatt, U., Valera, I., Weller, A., and Schölkopf, B. On the fairness of causal algorithmic recourse. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pp. 9584–9594, 2022.
Vowels, M. J., Camgoz, N. C., and Bowden, R. D'ya like DAGs? A survey on structure learning and causal discovery. ACM Computing Surveys, 55(4):1–36, 2022.
Wachter, S., Mittelstadt, B., and Russell, C. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harvard Journal of Law & Technology, 31:841, 2017.
Wang, T., Rudin, C., Doshi-Velez, F., Liu, Y., Klampfl, E., and MacNeille, P. A Bayesian framework for learning rule sets for interpretable classification. Journal of Machine Learning Research, 18(70):1–37, 2017.
Węglarczyk, S. Kernel density estimation and its application. In ITM Web of Conferences, volume 23, pp. 00037. EDP Sciences, 2018.
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D., et al. Chain-of-thought prompting elicits reasoning in large language models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
Wickstrøm, K., Höhne, M., and Hedström, A. From flexibility to manipulation: The slippery slope of XAI evaluation.
Woodward, J. Making Things Happen: A Theory of Causal Explanation. Oxford University Press, 2005.
Yeh, C.-K., Hsieh, C.-Y., Suggala, A., Inouye, D. I., and Ravikumar, P. K. On the (in)fidelity and sensitivity of explanations. Advances in Neural Information Processing Systems, 32, 2019.
Yosinski, J., Clune, J., Nguyen, A., Fuchs, T., and Lipson, H. Understanding neural networks through deep visualization. arXiv preprint arXiv:1506.06579, 2015.
Zhang, Y. Causal abstraction in model interpretability: A compact survey. arXiv preprint arXiv:2410.20161, 2024.

A. Sufficiency and Necessity of Causality for Explainable AI

Theorem 4.2 (Sufficiency of the True SCM for XAI). Let M = ⟨U, V, F, P(U)⟩ be the unique true Structural Causal Model of the data-generating process. Under standard assumptions (acyclicity, no unmeasured confounders, well-defined exogenous variables), having full access to M is sufficient to provide accurate and complete answers to the six core XAI questions (Q1–Q6) depicted in Figure 1.

Proof. We proceed by examining each question individually, using the predefined variables and formal language.

Q1: What explains the distribution of the data? The SCM M specifies the structural equations F and the distribution P(U). The joint distribution of the endogenous variables V can be derived from M using the law of structural models:

$$P(\mathbf{V}) = \int_{\mathbf{U}} \prod_{V_i \in \mathbf{V}} \delta\big(V_i - f_{V_i}(\mathrm{pa}(V_i), U_{V_i})\big)\, P(\mathbf{U})\, d\mathbf{U}$$

where δ(·) is the Dirac delta function ensuring that V_i satisfies its structural equation, and pa(V_i) are the parents of V_i in the causal graph G associated with M. Since we can derive P(V) from M, we can fully explain the distribution of the data, accounting for all dependencies and relationships specified by the structural equations and exogenous distributions.

Q2: What underlying factors generate the data? In the SCM M, the exogenous variables U represent the underlying factors that are not determined within the model but affect the endogenous variables through the structural equations. Each endogenous variable V_i is generated by V_i = f_{V_i}(pa(V_i), U_{V_i}). Access to M gives both the exogenous variables U and the structural equations F, allowing us to identify and understand the underlying factors generating the observed data.
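To make Q1 concrete, here is a minimal Monte Carlo sketch of deriving P(V) from an SCM. The two-variable linear-Gaussian SCM (X := U_X, Y := 2X + U_Y), its coefficients, and all variable names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# A toy SCM M = <U, V, F, P(U)>: X := U_X, Y := 2X + U_Y,
# with independent U_X, U_Y ~ N(0, 1). Sampling U and pushing it
# through the structural equations F in topological order realizes
# the law of structural models: each endogenous value is pinned
# down (Dirac delta) by its equation.
rng = np.random.default_rng(0)
n = 100_000

u_x = rng.normal(0.0, 1.0, n)   # P(U): exogenous noise
u_y = rng.normal(0.0, 1.0, n)

x = u_x                          # X := f_X(U_X)
y = 2.0 * x + u_y                # Y := f_Y(pa(Y) = X, U_Y)

# The empirical joint P(X, Y) implied by M, summarized by its moments
print("E[X], E[Y] =", x.mean().round(3), y.mean().round(3))
print("Cov(X, Y)  =", np.cov(x, y)[0, 1].round(3))  # ~2.0, as M implies
```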
Q3: How does the model transform inputs into outputs? Suppose the AI model takes inputs X ⊆ V and produces outputs Y ⊆ V. The causal pathways from X to Y are specified in the causal graph G associated with M. The structural equations define how each variable depends on its parents: V_i = f_{V_i}(pa(V_i), U_{V_i}). By following these equations along the paths from X to Y, we can trace how inputs are transformed into outputs through the model. Specifically, we can compute the effect of X on Y by recursively evaluating the structural equations.

Q4: How do the model's internal mechanisms function? Internal mechanisms (e.g., hidden layers, intermediate computations) are represented by intermediate endogenous variables H ⊆ V in the SCM. The structural equations for the internal variables are H_j = f_{H_j}(pa(H_j), U_{H_j}). By analyzing these equations and their dependencies, we can understand how the internal variables operate and contribute to the processing of inputs X into outputs Y. The causal graph G of the model M shows the connections between X, H, and Y, allowing us to trace the flow of information and causation through the model's internal structure.

Q5: Why does the model make a specific decision for a given input? Given a specific input X = x and the observed output Y = y, we can perform abduction to infer the values of the exogenous variables U = u consistent with these observations. Using the inferred u and the structural equations F, we can then trace the causal pathways from X = x to Y = y, identifying the causal mechanisms and intermediate variables that led to the decision.

Q6: Why would the decision differ if the input had been different? To answer this counterfactual question, we consider an alternative input X = x′ while keeping the exogenous variables fixed at the values U = u inferred during abduction, and recompute the counterfactual output Y* = Y_{X=x′}(u) by evaluating the modified structural equations. Finally, we compare the counterfactual output Y* with the original output Y = y to understand how and why the decision would differ under the alternative input.

Overall Conclusion. In each case, access to the true SCM M provides sufficient information—whether through computing distributions, tracing causal pathways, or performing counterfactual reasoning—to accurately and completely answer each of the six XAI questions.
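The abduction-action-prediction pipeline behind Q5 and Q6 can be spelled out in a few lines. This sketch assumes the same hypothetical linear SCM as above (X := U_X, Y := 2X + U_Y); the observed values and the `forward` helper are illustrative.

```python
# Pearl's three counterfactual steps on the toy SCM X := U_X, Y := 2X + U_Y.

def forward(x, u_y):
    """Structural equation for Y given its parent X and noise U_Y."""
    return 2.0 * x + u_y

# Observed factual instance
x_obs, y_obs = 1.0, 2.5

# 1. Abduction: invert Y's structural equation to recover U_Y = y - 2x
u_y = y_obs - 2.0 * x_obs            # u_y = 0.5

# 2. Action: intervene do(X = x'), overriding X's own equation
x_cf = 3.0

# 3. Prediction: recompute Y under the intervention, holding U fixed
y_cf = forward(x_cf, u_y)

print(f"Factual:        Y = {y_obs} at X = {x_obs}")
print(f"Counterfactual: Y_(X=x')(u) = {y_cf} at X = {x_cf}")  # 6.5
```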
Theorem 4.4 (Necessity of the True SCM for XAI). Suppose a dataset V is generated by a true but unknown SCM M. If an alternative model M̂ does not match M in at least one structural equation or in its exogenous distribution P(U), then there exists at least one of the six XAI questions (Q1–Q6) for which M̂ cannot provide an accurate and complete answer.

Proof. We will demonstrate that without causal information—specifically, without access to the true SCM M—it is impossible to answer the six core XAI questions. We proceed by addressing each question individually, using the predefined variables and formal language established earlier.

Q1: What explains the distribution of the data? Without causal information, we only have access to the observational distribution P(V) of the endogenous variables V. However, P(V) encodes statistical associations but not causal relationships. Statistical dependencies in P(V) can arise from various causal structures, such as direct causation, confounding, or even collider effects.

Illustrative Example: Consider three variables X, Y, and Z with the following causal structures:

1. Confounding: Z is a common cause of X and Y, i.e., Z → X, Z → Y.
2. Causal Chain: X causes Z, which in turn causes Y, i.e., X → Z → Y.
3. Collider: X and Y both cause Z, i.e., X → Z ← Y.

All these structures can produce similar statistical associations between X and Y in P(V) (in the collider case, once Z is conditioned on). Without causal assumptions or knowledge of the underlying SCM, we cannot distinguish among these possibilities.

Q2: What underlying factors generate the data? In an SCM M = ⟨U, V, F, P(U)⟩, the exogenous variables U and structural equations F define how the observed data V are generated: V_i = f_{V_i}(pa(V_i), U_{V_i}) for all V_i ∈ V. Without access to M, we lack knowledge of both U (the unobserved factors) and F (the causal mechanisms). Consequently, we cannot accurately model the data-generating process.

Q3: How does the model transform inputs into outputs? Suppose the AI model is represented as a function f : X → Y. Without causal information, we can estimate the conditional distribution P(Y | X) from observational data. However, this distribution reflects statistical associations, not necessarily causal effects. Potential issues include:

• Confounding: A hidden variable Z ∈ V (or Z ∈ U) affects both X and Y, inducing spurious associations.
• Reverse Causation: The true causal direction might be Y → X.
• Feedback Loops: Cyclic dependencies complicate the interpretation of P(Y | X).

Without the causal graph G, we cannot compute the interventional distribution P(Y | do(X = x)), which reflects the causal effect of setting X to x.

Q4: How do the model's internal mechanisms function? Internal mechanisms involve the causal interactions among hidden or intermediate variables within the model. Let H ⊆ V represent internal variables (e.g., hidden layers in a neural network). The structural equations for H and their causal relationships with X and Y are given by H_j = f_{H_j}(pa(H_j), U_{H_j}). Without knowledge of M, we cannot specify these equations or the causal graph G, preventing us from understanding how H mediates between X and Y.

Q5: Why does the model make a specific decision for a given input? Explaining a specific decision requires identifying the causal factors that led from the input X = x to the output Y = y. To perform this explanation, we need to:

1. Abduction: Infer the exogenous variables U = u consistent with X = x and Y = y.
2. Trace Causal Pathways: Use the structural equations to identify how changing X affects Y.

Without M, we cannot perform abduction because U and F are unknown. Additionally, we cannot trace causal pathways without the causal graph G.

Q6: Why would the decision differ if the input had been different? Answering this question requires counterfactual reasoning, which involves considering a hypothetical scenario where the input is X = x′ (different from the observed X = x) and determining the corresponding output Y_{X=x′}(u). As per Pearl (2009), computing counterfactuals involves:

1. Abduction: Infer U = u from the observed data (X = x, Y = y).
2. Action: Modify the structural equations to reflect the counterfactual intervention do(X = x′).
3. Prediction: Compute the counterfactual outcome Y_{X=x′}(u) using the modified model.

Without the SCM M, none of these steps can be performed accurately.
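The observational indistinguishability invoked in Q1 above, and in the note on Markov equivalence classes below, can be demonstrated numerically. The following sketch uses two hypothetical linear-Gaussian SCMs whose weights are hand-tuned so that a chain and a confounded structure induce the same joint distribution over (X, Y); the structures and weights are illustrative assumptions.

```python
import numpy as np

# Two hypothetical SCMs inducing the SAME bivariate Gaussian over (X, Y):
#   Structure 1 (chain):      X -> Z -> Y
#   Structure 2 (confounder): X <- Z -> Y
rng = np.random.default_rng(1)
n = 200_000

# Chain: X := U_X, Z := X + U_Z, Y := Z + U_Y
x1 = rng.normal(size=n)
z1 = x1 + rng.normal(size=n)
y1 = z1 + rng.normal(size=n)

# Confounder: weights chosen so Var(X)=1, Var(Y)=3, Cov(X,Y)=1, as above
z2 = rng.normal(size=n)
x2 = z2 / np.sqrt(2) + rng.normal(scale=np.sqrt(0.5), size=n)
y2 = np.sqrt(2) * z2 + rng.normal(scale=1.0, size=n)

for name, x, y in [("chain", x1, y1), ("confounder", x2, y2)]:
    print(name, np.cov(x, y).round(2))
# Both print ~[[1, 1], [1, 3]]: P(X, Y) cannot separate the two models,
# yet do(X = x) changes Y only under the chain.
```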
Note on Markov Equivalence Classes (MECs): If M̂ and M belong to the same MEC, they share the exact same observational distribution, meaning queries Q1 and Q2 will not separate them. However, models within an MEC must have at least one differing edge orientation. Consequently, an interventional query do(X = x) on that specific edge, or a counterfactual query (Q5, Q6), will yield different distributions. Because our separating family Q includes these interventional and counterfactual queries, Assumption 4.3 holds, successfully separating models even within the same MEC.

Overall Conclusion. The absence of causal information—specifically, the structural causal model M—restricts us to the observational distribution P(V), preventing us from identifying underlying data-generating mechanisms, understanding causal pathways within the model, and performing counterfactual reasoning. Consequently, causal information is essential for providing accurate and reliable explanations in XAI. Without it, explanations may be incomplete, incorrect, or misleading.

B. A Guided Illustration of Causal Modeling Concepts

We expand on foundational causal modeling ideas through a concrete example as follows:

1. Section B.1 illustrates exogenous vs. endogenous variables,
2. Section B.2 contrasts the SCM and Potential Outcomes (PO) frameworks,
3. Section B.3 highlights computational limits of exact discovery,
4. Sections B.4 and B.5 revisit LIME and algorithmic recourse through a causal lens, and
5. Section B.6 motivates open research questions (see Section C).

B.1. SCM Basics via a Simple Example

We consider a three-variable SCM:

SunriseTime → MoodToday ← SleepHours
MoodToday = f(SunriseTime, SleepHours, U)

SunriseTime is an exogenous variable—fixed by environment or external conditions. MoodToday and SleepHours are endogenous variables, defined through structural relationships. Notably, both variable types may or may not be observed, and U is unobserved noise.

B.2. SCM vs. Potential Outcomes (PO)

Our main position is not that structural causal models (SCMs) are the only viable framework for explainable AI (XAI), but rather that XAI questions are inherently causal, and SCMs provide one principled framework to address them. Specifically, SCMs support observational, interventional, and counterfactual reasoning in a unified formalism that makes both conceptual assumptions and computational tradeoffs explicit (Halpern & Pearl, 2005; Pearl, 2009).

The Potential Outcomes (PO) framework, on the other hand, models causal effects as contrasts between unit-level responses under hypothetical interventions (e.g., Y_i(1) and Y_i(0)) (Rubin, 2005). While PO is well-suited for estimating average treatment effects (ATEs) in randomized trials or observational studies, it is less expressive for answering individualized, mechanistic, or counterfactual XAI questions (Imbens, 2020). For example:

• PO-style query: What is the average effect of 8 vs. 4 hours of sleep on mood?
• SCM-style query: What would my mood have been today, had I slept 8 hours instead of 4?

Moreover, SCMs naturally represent causal mechanisms, allow reasoning over latent confounders, and support interventions on learned internal representations—capabilities that are vital in modern machine learning systems. That said, we acknowledge both frameworks can be reconciled under certain assumptions and offer complementary perspectives (Imbens, 2020; Pearl, 2019). The sketch below contrasts the two query styles on the toy SCM.
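This sketch instantiates the toy SCM with an illustrative parameterization and answers both queries: the PO-style ATE and the SCM-style unit-level counterfactual. The coefficients, noise scales, and observed values are all assumptions made for the example.

```python
import numpy as np

# Hypothetical parameterization of the toy SCM:
#   SleepHours := U_S,  MoodToday := 0.5*SleepHours + 0.3*SunriseTime + U_M
rng = np.random.default_rng(2)
n = 100_000

sunrise = rng.normal(6.0, 0.5, n)   # exogenous SunriseTime (hour of day)
u_m = rng.normal(0.0, 1.0, n)       # unobserved mood noise U

def mood(sleep, sunrise, u):
    return 0.5 * sleep + 0.3 * sunrise + u

# PO-style query: average effect of do(Sleep = 8) vs do(Sleep = 4) on mood
ate = (mood(8.0, sunrise, u_m) - mood(4.0, sunrise, u_m)).mean()
print(f"ATE of 8h vs 4h sleep: {ate:.2f}")          # 2.0 by construction

# SCM-style query: *my* counterfactual mood, given I slept 4h, sunrise
# was at 6:00, and my mood was 1.5 (abduction -> action -> prediction)
u_me = 1.5 - mood(4.0, 6.0, 0.0)    # abduce my personal noise term
print(f"My mood had I slept 8h: {mood(8.0, 6.0, u_me):.2f}")   # 3.5
```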
B.3. Approximate vs. Exact Causal Discovery

While SCMs offer a powerful lens for interpreting machine learning models, discovering the correct SCM from data remains fraught with theoretical and practical challenges:

• Non-identifiability. Multiple SCMs can induce the same observational distribution. Without assumptions such as faithfulness, acyclicity, or no hidden confounding, it is impossible to infer the true causal structure solely from data (Spirtes et al., 2001; Pearl, 2009).
• Computational Hardness. Even under identifiability, learning the DAG structure is NP-hard. Score-based approaches (e.g., GES) and constraint-based algorithms (e.g., PC, FCI) scale poorly as the number of variables increases. Recent work shows that many interpretability queries—such as counterfactuals or recourse—inherit this complexity (Adolfi et al., 2024; Barceló et al., 2020; Bassan et al., 2024).

Example. In our toy SCM:

SunriseTime → MoodToday ← SleepHours

if SunriseTime is latent, then even observing a statistical correlation between SleepHours and MoodToday does not distinguish:

• SleepHours → MoodToday,
• MoodToday → SleepHours, or
• a hidden common cause.

Thus, causal discovery is often ill-posed and computationally expensive. As such, we advocate for approximate or partial discovery: identifying abstractions or fragments of the true causal structure that are "good enough" to answer XAI queries of interest. This motivates research into ϵ-approximate causal models and robustness analysis under model uncertainty—directions explored further in Section C.

B.4. LIME as Approximate Causal Inference

We can view local surrogate models like LIME (Ribeiro et al., 2016) as estimating a localized structural equation Y ≈ Σ_i w_i X_i around a specific input. However, because LIME relies on perturbing inputs independently, it ignores the true causal graph generating X. Consequently, it generates off-manifold, out-of-distribution samples. Under covariate shift, LIME's attributions often fail because the surrogate captures the model's arbitrary behavior on these off-manifold points rather than the true causal mechanism. Viewing LIME through an SCM lens immediately diagnoses this failure mode (see the sketch below), demonstrating how making causal assumptions explicit provides a rigorous framework to gauge effectiveness, diagnose failures predictably, and ultimately improve community convergence on robust explanation standards.
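The failure mode is easy to reproduce. This sketch builds a deliberately contrived black box (the model f, the perfectly coupled features, and the perturbation scale are all illustrative assumptions) and fits a LIME-style local linear surrogate via least squares rather than the actual LIME library.

```python
import numpy as np

# Toy setup: data lie on the manifold X2 = X1, and the black box is
#   f(x1, x2) = x1 + 10*(x2 - x1),
# which equals x1 everywhere on the manifold; the (x2 - x1) term is
# arbitrary behavior that training data never constrain.
rng = np.random.default_rng(3)

def f(x1, x2):
    return x1 + 10.0 * (x2 - x1)

x = np.array([1.0, 1.0])  # on-manifold point to explain

# LIME-style perturbation: features perturbed INDEPENDENTLY, ignoring
# the coupling X2 = X1, so most samples fall off the data manifold
pert = x + rng.normal(0.0, 0.1, size=(5000, 2))
y = f(pert[:, 0], pert[:, 1])

# Local linear surrogate via least squares (the essence of LIME)
A = np.c_[pert, np.ones(len(pert))]
w1, w2, _ = np.linalg.lstsq(A, y, rcond=None)[0]
print(f"surrogate weights: w1 = {w1:.1f}, w2 = {w2:.1f}")  # -9.0 and +10.0

# On the manifold, f behaves exactly like x1, yet independent
# perturbations attribute large opposing weights to X1 and X2.
```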
B.5. Algorithmic Recourse and Counterfactual Explanations

Traditional counterfactual explanations seek to provide recourse by finding the minimal feature perturbation required to flip a model's prediction (e.g., "If your education level were higher, your loan would be approved"). However, by treating features as independently manipulable, these methods ignore the underlying causal dependencies of the environment. In reality, features are causally intertwined; for instance, obtaining a higher education degree naturally affects both a person's age and future salary. If a stakeholder acts on a non-causal recommendation, the downstream effects on other variables are unaccounted for, frequently rendering the suggested action physically impossible or entirely ineffective at changing the actual outcome. As demonstrated in Algorithmic Recourse: from Counterfactual Explanations to Interventions (Karimi et al., 2021), meaningful recourse must be formalized not as a geometric nearest-neighbor search, but as a causal intervention. Viewing counterfactual explanations through the lens of an SCM ensures that recommended actions respect the structural equations of the real world, transforming them into valid, actionable guidance.

B.6. Where to Go Next

The above conceptual challenges motivate practical and theoretical open research questions outlined in Section C.

C. Research Questions for Advancing Causal Explainability

To help operationalize our position and inspire actionable follow-up, we present a set of concrete research questions suitable for graduate-level exploration. These questions span causal abstraction, approximate modeling, and stakeholder-aligned causal queries, and are supported by relevant literature to guide early-stage researchers.

Causal Abstraction & Representation Learning

RQ1: How can we systematically identify and extract human-interpretable concepts from deep neural networks (DNNs) to construct meaningful causal abstractions?
Approach: Develop unsupervised or semi-supervised algorithms leveraging neuro-symbolic methods to identify concept-level abstractions from latent spaces.
References: Geiger et al. (2023); Marconato et al. (2023b); Zhang (2024).

RQ2: Under what conditions can we guarantee that causal abstractions discovered from low-level neural representations remain faithful to the original neural network's causal structure?
Approach: Study theoretical guarantees for causal abstraction under approximate causal modeling frameworks. Quantify trade-offs between abstraction fidelity and interpretability.
References: Beckers & Halpern (2019); Karimi et al. (2023); Rubenstein et al. (2017).

Approximate and Partial Causal Models

RQ3: How can we formulate and efficiently compute approximate causal explanations that balance computational complexity with explanation accuracy?
Approach: Formalize a notion of ϵ-approximate causal explanations and develop scalable algorithms to derive such explanations from large-scale models.
References: Glymour et al. (2019); Montagna et al. (2024).

RQ4: Can we establish rigorous bounds on how approximations in causal discovery methods affect downstream XAI tasks such as model debugging or user trust?
Approach: Empirically and theoretically analyze sensitivity to approximation errors; develop robustness metrics for causal XAI methods.
References: Janzing et al. (2020).

RQ5: What computational trade-offs arise when using approximate causal models in realistic XAI scenarios?
Approach: Study whether relaxing accuracy requirements yields tangible computational benefits or if computational complexity remains a core barrier.
References: Adolfi et al. (2024); Barceló et al. (2020); Bassan et al. (2024).

Human-Aligned Causal Queries & Interactive XAI

RQ6: How should interactive XAI systems leverage causal interventions to enhance users' mental models about AI systems?
Approach: Design and validate interactive interfaces (e.g., visual tools or agents) enabling causal interventions, and assess user understanding.
References: Miller (2019).

RQ7: Which classes of causal queries (observational, interventional, counterfactual) are most effective in various stakeholder contexts?
Approach: Conduct domain-specific studies to evaluate query impact on trust, decision-making, and transparency.
References: Mittelstadt et al. (2019); Wachter et al. (2017).