Axiomatic Foundations of Counterfactual Explanations

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

Explaining autonomous and intelligent systems is critical in order to improve trust in their decisions. Counterfactuals have emerged as one of the most compelling forms of explanation. They address “why not” questions by revealing how decisions could be altered. Despite the growing literature, most existing explainers focus on a single type of counterfactual and are restricted to local explanations of individual instances. There has been no systematic study of alternative counterfactual types, nor of global counterfactuals that shed light on a system’s overall reasoning process. This paper addresses both gaps by introducing an axiomatic framework built on a set of desirable properties for counterfactual explainers. It proves impossibility theorems showing that no single explainer can satisfy certain axiom combinations simultaneously, and fully characterizes all compatible sets. Representation theorems then establish five one-to-one correspondences between specific subsets of axioms and the families of explainers that satisfy them. Each family gives rise to a distinct type of counterfactual explanation, uncovering five fundamentally different types of counterfactuals. Some of these correspond to local explanations, while others capture global explanations. Finally, the framework situates existing explainers within this taxonomy, formally characterizes their behavior, and analyzes the computational complexity of generating such explanations.


💡 Research Summary

The paper tackles a fundamental gap in the literature on counterfactual explanations for AI systems: while many methods generate local, instance‑specific counterfactuals of a single type, systematic study of alternative counterfactual forms and of global (model‑wide) explanations has been lacking. To fill this void, the authors propose an axiomatic framework that formalizes desirable properties of counterfactual explainers as nine explicit axioms:

    • Success – always provide at least one explanation.
    • Non‑Triviality – avoid empty explanations.
    • Equivalence – treat instances with the same prediction identically.
    • Feasibility – restrict explanations to parts of the original instance.
    • Coreness – limit explanations to core literals of the predicted class.
    • Sceptical Validity – guarantee that removing the explanation changes the prediction.
    • Novelty – ensure the explanation introduces new literals.
    • Strong Validity and Weak Validity – two notions of validity that differ in how broadly the explanation forces a class change.
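To make these properties concrete, here is a minimal sketch in Python. It is not the paper's formalism: the encoding of instances as sets of feature–value literals, the toy `classify` rule, and the helper names are all illustrative assumptions. The sketch checks two of the summarized axioms, Weak Validity and Non‑Triviality, for a hand‑built candidate explanation.

```python
def classify(instance):
    """A toy rule-based classifier over sets of (feature, value) literals."""
    if ("income", "high") in instance and ("debt", "low") in instance:
        return "approve"
    return "reject"

def apply_literals(instance, literals):
    """Overwrite the instance's values for the features mentioned in the
    explanation, keeping all other features untouched."""
    touched = {f for f, _ in literals}
    kept = {(f, v) for f, v in instance if f not in touched}
    return frozenset(kept | set(literals))

def weak_validity(instance, explanation, classifier):
    """Weak Validity (credulous): applying the explanation to THIS
    instance changes its predicted class."""
    changed = apply_literals(instance, explanation)
    return classifier(changed) != classifier(instance)

def non_trivial(explanation):
    """Non-Triviality: the explanation must not be empty."""
    return len(explanation) > 0

x = frozenset({("income", "low"), ("debt", "low")})
e = {("income", "high")}
print(classify(x))                    # reject
print(weak_validity(x, e, classify))  # True: raising income flips to approve
print(non_trivial(e))                 # True
```

The same pattern extends to the other axioms: each one is a predicate over (instance, explanation, classifier) triples, which is what makes the impossibility and representation results below statable.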

The authors first explore logical relationships among the axioms, showing several implication chains (e.g., Coreness ⇒ Feasibility). They then prove a series of impossibility theorems (Theorem 1) demonstrating that certain triples of axioms cannot be satisfied simultaneously—most notably, Success, Non‑Triviality, and Coreness cannot coexist. This reveals inherent trade‑offs between global versus local explanations and between necessary and sufficient forms of counterfactuals.
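One plausible reading of why that trio conflicts can be sketched in a few lines. The definition of "core literals" used here (literals shared by every instance of the class) is an assumption for illustration, not necessarily the paper's: if some class has no shared literals, Coreness admits only the empty explanation, which Non‑Triviality forbids, so Success cannot hold on that class.

```python
# Assumed toy definition: core literals of a class = literals shared by
# every instance of that class.
instances_of_class = [
    frozenset({("a", 0), ("b", 1)}),
    frozenset({("a", 1), ("b", 0)}),  # shares no literal with the first
]
core = frozenset.intersection(*instances_of_class)
print(core)  # frozenset() -> no core literals, so no non-empty
             # Coreness-compliant explanation can exist here
```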

Crucially, the paper proves that any combination of axioms not ruled out by the implication and impossibility results is realizable: there exists at least one explainer satisfying that exact set (Theorem 2). This establishes the completeness of the axiom system.

Building on the axioms, the authors distinguish two fundamental families of counterfactuals:

  1. Necessary Reasons – statements of the form “If E were not true, the decision would be different.” These capture features that are mandatory for the current prediction. Within this family they identify:

    • Global Necessary Reasons (GNR) – feature‑value sets that, for a given class, guarantee that any instance containing them will be classified into that class. The explainer gNec enumerates all GNRs. It satisfies Coreness and Non‑Triviality, but cannot satisfy Success (by the impossibility result). It also satisfies Equivalence because explanations are class‑wide.
    • Local Necessary Reasons – explanations specific to a single instance, derived from the class’s core literals and the Sceptical Validity axiom. These explanations are stricter than global ones and focus on the minimal changes that would flip the prediction.
  2. Sufficient Reasons – statements of the form “Applying E makes the decision change.” Here the emphasis is on sufficiency rather than necessity. The authors further split this family into:

    • Strong‑Validity Sufficient Reasons (Sceptical) – explanations that, when added to any instance, guarantee a class change. They satisfy Strong Validity together with Novelty, yielding a “sceptical” style that is robust but computationally demanding (co‑NP‑complete).
    • Weak‑Validity Sufficient Reasons (Credulous) – explanations that only need to change the specific instance at hand (Weak Validity). Combined with Novelty, they produce a “credulous” style, which is the most common in existing literature. Generation reduces to a minimal‑change problem and is NP‑hard.
    • Global Sufficient Reasons – a less explored variant where a set of literals, when present in any instance, forces a different class globally. This aligns with the Strong‑Validity axiom but without the locality constraint.

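The credulous family above is the easiest to sketch concretely. The following brute‑force search, a minimal illustration rather than any algorithm from the paper, finds a smallest set of feature changes that flips the prediction for one particular instance (Weak Validity), considering only changed values (Novelty). The feature space, classifier, and function names are all assumptions; the exponential enumeration mirrors the NP‑hardness noted above.

```python
from itertools import combinations, product

# Toy categorical feature space (assumed for illustration).
FEATURES = {"income": ["low", "high"], "debt": ["low", "high"],
            "history": ["bad", "good"]}

def classify(instance):
    """Toy classifier over dicts mapping feature -> value."""
    if instance["income"] == "high" and instance["debt"] == "low":
        return "approve"
    return "reject"

def credulous_counterfactual(instance, classifier):
    """Return a minimal set of feature changes that flips the prediction
    for this instance (Weak Validity + Novelty), or None if impossible."""
    base = classifier(instance)
    for k in range(1, len(FEATURES) + 1):      # try smaller changes first
        for feats in combinations(FEATURES, k):
            for values in product(*(FEATURES[f] for f in feats)):
                change = {f: v for f, v in zip(feats, values)
                          if v != instance[f]}  # Novelty: new literals only
                if len(change) != k:
                    continue                    # some chosen value unchanged
                if classifier({**instance, **change}) != base:
                    return change
    return None

x = {"income": "low", "debt": "low", "history": "good"}
print(credulous_counterfactual(x, classify))  # {'income': 'high'}
```

A sceptical (Strong Validity) variant would additionally have to verify the class change for every instance containing the explanation, which is what pushes that family into co‑NP.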
Through representation theorems, the paper establishes one‑to‑one correspondences between each subset of axioms and a distinct family of explainers, thereby proving that the five identified types are exhaustive and mutually exclusive under the axiom system.

The authors then map several well‑known counterfactual methods (e.g., Wachter et al., 2017; Mothilal et al., 2020) onto this taxonomy, showing that almost all of them belong to the local credulous sufficient family. The sole existing global method (referenced as

