Causal Explanations for Image Classifiers
Existing algorithms for explaining the output of image classifiers use different definitions of explanations and a variety of techniques to find them. However, none of the existing tools use a principled approach based on formal definitions of cause and explanation. In this paper we present a novel black-box approach to computing explanations grounded in the theory of actual causality. We prove relevant theoretical results and present an algorithm for computing approximate explanations based on these definitions. We prove termination of our algorithm and discuss its complexity and the degree of approximation relative to the precise definition. We implemented the framework in a tool, ReX, and present experimental results and a comparison with state-of-the-art tools. We demonstrate that ReX is the most efficient black-box tool and produces the smallest explanations, in addition to outperforming other black-box tools on standard quality measures.
💡 Research Summary
The paper tackles the fundamental problem of defining and computing explanations for image classifiers without any access to the internal structure of the model. Existing XAI methods either produce heat‑maps, rely on gradients, or use heuristic perturbations, but they lack a rigorous, formal notion of what an explanation actually is. To fill this gap, the authors adopt the theory of actual causality (Halpern 2000) and adapt it to the setting of image classification.
They model a classifier N and a specific input image x as a binary causal model M_N,x. The endogenous variables consist of a mask V = {V₁,…,Vₙ}, one Boolean variable per pixel, and an output variable O that indicates whether the classification of the masked image matches the original classification. Setting Vᵢ = 1 keeps the original pixel value, while Vᵢ = 0 replaces it with a predefined background value (e.g., mean color). An explanation is then defined as a minimal subset of mask variables that, when set to 1, guarantees O = 1; in causal terms this is a minimal actual cause of the classification outcome.
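The semantics of this causal model can be sketched in a few lines. The following is a minimal illustration, assuming a single-channel (grayscale) image and a `classify` function that returns a label; the helper names `apply_mask` and `output_O` are ours, not taken from the paper:

```python
import numpy as np

def apply_mask(image, mask, baseline):
    """Keep the original pixel where V_i = 1; substitute the predefined
    baseline value (e.g., the mean color) where V_i = 0."""
    return np.where(mask.astype(bool), image, baseline)

def output_O(classify, image, mask, baseline):
    """The output variable O: 1 iff the classification of the masked
    image matches the classification of the original image x."""
    return int(classify(apply_mask(image, mask, baseline)) == classify(image))
```

Under this encoding, the all-ones mask trivially yields O = 1, and the search for an explanation amounts to finding a minimal subset of mask variables that can be left at 1 (with all others set to 0) while O = 1 still holds.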
The authors present a concrete algorithm to compute an approximate minimal cause. Starting from the full mask (all Vᵢ = 1), they iteratively test the effect of turning individual pixels off. If the classification remains unchanged, the pixel is permanently removed from the candidate set. To avoid exhaustive search (which is NP‑hard), they embed a binary‑search style refinement that yields an ε‑approximation of the minimal size. The algorithm is proven to terminate, and its worst‑case time complexity is O(|P|·T), where |P| is the number of pixels and T is the time for a single forward pass through N. Empirically, on real images the algorithm converges after removing 95–97 % of the pixels, leaving explanations that occupy only 3–5 % of the image area.
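A single greedy pass of this idea can be sketched as follows. This is a simplification for illustration only: it omits the paper's binary-search-style refinement and ε-approximation machinery, but it matches the stated O(|P|·T) bound, since each pixel triggers at most one forward pass through the classifier:

```python
import numpy as np

def greedy_explanation(classify, image, baseline):
    """Greedy approximation of a minimal cause: start from the full mask
    (all pixels kept) and tentatively switch each pixel off; leave it off
    permanently if the classification is unchanged."""
    original = classify(image)
    mask = np.ones(image.shape, dtype=bool)
    for idx in np.ndindex(image.shape):
        mask[idx] = False                         # tentatively drop pixel idx
        masked = np.where(mask, image, baseline)
        if classify(masked) != original:          # pixel is needed: restore it
            mask[idx] = True
    return mask
```

Note that a greedy pass of this kind yields a subset-minimal explanation (no retained pixel can be dropped individually), which is in general larger than the globally minimal cause; recovering exact minimality is what makes the problem intractable.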
The implementation, called ReX, is evaluated on four benchmark suites: ImageNet, VOC2012, ECSSD, and a collection of partially occluded images. It is compared against a broad spectrum of state‑of‑the‑art XAI tools, both black‑box (LIME, RISE, Noise Tunnel) and white‑box (Grad‑CAM, Guided Back‑Propagation, LRP, Integrated Gradients, SHAP). The evaluation uses four standard quality measures: (1) runtime, (2) explanation size (percentage of pixels retained), (3) insertion curves (how quickly the original class is recovered as pixels are added back), and (4) deletion curves (how rapidly the class score drops as explanatory pixels are removed).
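The insertion and deletion metrics above can be sketched generically. In this illustration, `score` is assumed to return the model's confidence in the original class, and `ranking` is a hypothetical list of pixel indices ordered from most to least important according to the explanation:

```python
import numpy as np

def insertion_curve(score, image, ranking, baseline):
    """Start from the baseline image and add pixels back in order of
    importance, recording the class score after each insertion."""
    canvas = np.full_like(image, baseline)
    curve = [score(canvas)]
    for idx in ranking:
        canvas[idx] = image[idx]
        curve.append(score(canvas))
    return curve

def deletion_curve(score, image, ranking, baseline):
    """Start from the original image and remove pixels in order of
    importance, recording the class score after each deletion."""
    canvas = image.copy()
    curve = [score(canvas)]
    for idx in ranking:
        canvas[idx] = baseline
        curve.append(score(canvas))
    return curve
```

In practice each curve is usually summarized by its area under the curve: a good explanation is expected to push the insertion curve up quickly (high AUC) and, on the standard reading, to pull the deletion curve down quickly (low AUC).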
Results show that ReX matches the fastest black‑box methods in runtime while producing the smallest explanations—often less than 1 % of the image, far below competing approaches. Insertion curves, which are widely accepted as a primary quality indicator, are consistently higher for ReX, indicating that its explanations are more informative. Deletion curves are lower, but the authors argue that low deletion scores reflect model robustness rather than explanation quality. They also report lower overlap with irrelevant background regions, confirming that ReX explanations are tightly focused on truly causal pixels.
Beyond empirical validation, the paper contributes several theoretical insights: (i) a formal, causally grounded definition of explanation for image classifiers, (ii) a proof of termination and a complexity bound for the approximation algorithm, and (iii) an analysis of the approximation ratio relative to the true minimal cause. The tool is released as open-source software (https://github.com/ReX-XAI/ReX) together with all models and datasets used, ensuring reproducibility.
In summary, the work bridges the gap between philosophical notions of causation and practical XAI for deep vision systems. By treating the classifier as a black‑box causal model and extracting minimal actual causes, ReX delivers concise, verifiable explanations that outperform existing methods on both efficiency and quality metrics. The approach opens avenues for extending causally based explanations to more complex architectures, multimodal data, and other domains where interpretability is critical.