Explainable AI needs formalization
The field of “explainable artificial intelligence” (XAI) seemingly addresses the desire that decisions of machine learning systems should be human-understandable. However, in its current state, XAI itself needs scrutiny. Popular methods cannot reliably answer relevant questions about ML models, their training data, or test inputs, because they systematically attribute importance to input features that are independent of the prediction target. This limits the utility of XAI for diagnosing and correcting data and models, for scientific discovery, and for identifying intervention targets. The fundamental reason for this is that current XAI methods do not address well-defined problems and are not evaluated against targeted criteria of explanation correctness. Researchers should formally define the problems they intend to solve and design methods accordingly. This will lead to diverse use-case-dependent notions of explanation correctness and objective metrics of explanation performance that can be used to validate XAI algorithms.
💡 Research Summary
The paper critically examines the current state of Explainable Artificial Intelligence (XAI) and argues that most popular XAI methods fail to provide reliable answers to the questions they are often employed to address. The authors begin by noting that regulatory frameworks such as the European AI Act demand “human‑understandable” explanations for high‑risk AI systems, yet the prevailing XAI paradigm—primarily feature attribution—does not guarantee that the attributed importance scores reflect any genuine statistical or causal relationship between input features and the prediction target.
To formalize what a correct explanation should satisfy, the authors introduce the Statistical Association Property (SAP). SAP requires that a method assign non‑zero importance to a single feature only if that feature is statistically associated with the target variable. In other words, a method that respects SAP can never assign importance to a feature that is independent of the outcome. The paper demonstrates that a wide range of widely used attribution techniques—gradient‑based methods, Layer‑wise Relevance Propagation (LRP), Deep Taylor Decomposition (DTD), SHAP, LIME, Integrated Gradients, counterfactual explanations, and even permutation‑based feature importance—systematically violate SAP.
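As an illustration only (this diagnostic is our own sketch, not a procedure from the paper), a candidate attribution can be checked against a crude version of SAP by testing every feature that received non‑zero importance for a marginal association with the target, here via a permutation test on the absolute Pearson correlation:

```python
import numpy as np

def sap_violations(X, y, importance, n_perm=1000, alpha=0.05, seed=0):
    """Flag features that receive non-zero importance despite showing no
    detectable (linear, marginal) association with the target.

    Illustrative only: SAP concerns statistical association in general,
    while this sketch tests a simple Pearson correlation.
    """
    rng = np.random.default_rng(seed)
    flagged = []
    for j in range(X.shape[1]):
        if importance[j] == 0:
            continue  # SAP says nothing about features with zero importance
        r_obs = abs(np.corrcoef(X[:, j], y)[0, 1])
        # Null distribution of |correlation| under shuffled targets.
        r_null = np.array([
            abs(np.corrcoef(X[:, j], rng.permutation(y))[0, 1])
            for _ in range(n_perm)
        ])
        p = (1 + np.sum(r_null >= r_obs)) / (1 + n_perm)
        if p > alpha:  # no evidence of association, yet importance != 0
            flagged.append(j)
    return flagged
```

A linear marginal test is of course only one of many possible association tests; a feature could be nonlinearly or jointly associated with the target and still be missed here, which is exactly why the paper argues for formally defined, use‑case‑specific correctness criteria rather than a single ad hoc check.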
The violation is illustrated through two minimal synthetic classification problems (Examples A and B). In Example A, a “suppressor” variable X₂ is statistically independent of the target Y but shares additive noise with the predictive feature X₁. A Bayes‑optimal linear classifier improves its accuracy by assigning a non‑zero weight to X₂, using it to cancel the noise in X₁. In Example B, the feature X₁ is constructed as X₁ = Y − X₂, where the suppressor X₂ is drawn independently of Y; nevertheless, a linear model with equal weights on X₁ and X₂ recovers Y exactly, since X₁ + X₂ = Y. In both cases, the model relies on a suppressor variable that carries no direct information about Y, yet attribution methods assign it substantial importance. This shows that the notion of a model “using” a feature is ambiguous when suppressor variables are present.
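Example B is easy to reproduce numerically. The following sketch (our own code, not from the paper) builds the data‑generating process described above and fits an ordinary least‑squares model:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Example B: Y and the suppressor X2 are drawn independently,
# and the observed feature X1 is constructed as X1 = Y - X2.
y = rng.normal(size=n)
x2 = rng.normal(size=n)   # suppressor: independent of Y by construction
x1 = y - x2
X = np.column_stack([x1, x2])

# Least squares recovers Y exactly with equal weights, since X1 + X2 = Y.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

print("weights:", w)                              # ≈ [1.0, 1.0]
print("corr(X2, Y):", np.corrcoef(x2, y)[0, 1])   # ≈ 0
```

The model places as much weight on X₂ as on X₁ even though the empirical correlation between X₂ and Y is essentially zero, which is the ambiguity of “using” a feature that the paper highlights.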
The authors discuss three major downstream purposes for which XAI is commonly invoked: (1) model and data diagnostics, (2) scientific discovery, and (3) identification of intervention or recourse targets. For each purpose, they argue that SAP is a necessary prerequisite. Diagnostic use assumes that highlighted features correspond to domain‑expert expectations; if a suppressor is highlighted, the expert may incorrectly deem the model flawed. In fairness assessments, a high importance score for a protected attribute does not necessarily mean the model exploits that attribute; it may merely be adjusting for variance introduced by other features. Scientific discovery relies on the belief that important features reflect genuine associations or causal mechanisms, which is invalid when suppressors dominate the attribution. Finally, algorithmic recourse or counterfactual explanations that suggest changing a suppressor feature may alter the model’s output but will not affect the real‑world outcome, leading to potentially harmful or meaningless interventions.
Recognizing these limitations, the paper calls for a shift from an “algorithm‑first” to a “problem‑first” research agenda. The authors propose a two‑step framework: (i) formally define the explanation problem and the corresponding correctness criteria (such as SAP or other task‑specific properties), and (ii) evaluate XAI methods against these criteria using synthetic data with known ground‑truth explanations and theoretical analysis of the data‑generating process. This approach would yield objective performance metrics, enable fair comparison of methods, and guide the development of use‑case‑specific XAI tools.
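The proposed evaluation step can be made concrete with synthetic benchmarks: when the data‑generating process is known, the set of genuinely target‑associated features is known as well, and any attribution can be scored against it. One minimal score of this kind (our illustrative sketch, not a metric defined in the paper) is the fraction of attribution mass that lands on truly informative features:

```python
import numpy as np

def attribution_precision(importance, informative_mask):
    """Fraction of total absolute attribution mass assigned to features
    that are genuinely associated with the target (ground truth known
    from the synthetic data-generating process).

    1.0 means no importance leaked onto uninformative features such as
    suppressors; lower values quantify the leakage.
    """
    importance = np.abs(np.asarray(importance, dtype=float))
    informative_mask = np.asarray(informative_mask, dtype=bool)
    total = importance.sum()
    if total == 0:
        return 1.0  # an empty attribution cannot mis-attribute
    return float(importance[informative_mask].sum() / total)
```

On Example B above, a method that splits importance equally between X₁ and X₂ would score 0.5, since only X₁ is statistically associated with Y.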
In conclusion, the paper asserts that XAI cannot be treated as a monolithic solution; instead, diverse notions of explanation correctness must be articulated for each application domain. Only by grounding XAI methods in formally defined problems and rigorously validated criteria can the field deliver explanations that are truly useful for model validation, scientific insight, and responsible intervention.