CounterFlowNet: From Minimal Changes to Meaningful Counterfactual Explanations
Counterfactual explanations (CFs) provide human-interpretable insights into a model's predictions by identifying minimal changes to input features that would alter the model's output. However, existing methods struggle to generate multiple high-quality explanations that (1) affect only a small portion of the features, (2) can be applied to tabular data with heterogeneous features, and (3) are consistent with user-defined constraints. We propose CounterFlowNet, a generative approach that formulates CF generation as sequential feature modification using conditional Generative Flow Networks (GFlowNets). CounterFlowNet is trained to sample CFs proportionally to a user-specified reward function that can encode key CF desiderata: validity, sparsity, proximity, and plausibility, encouraging high-quality explanations. The sequential formulation yields highly sparse edits, while a unified action space seamlessly supports continuous and categorical features. Moreover, actionability constraints, such as immutability and monotonicity of features, can be enforced at inference time via action masking, without retraining. Experiments on eight datasets under two evaluation protocols demonstrate that CounterFlowNet achieves superior trade-offs between validity, sparsity, plausibility, and diversity with full satisfaction of the given constraints.
💡 Research Summary
CounterFlowNet introduces a novel generative framework for producing counterfactual explanations (CFs) that simultaneously addresses three major challenges: sparsity of modifications, handling heterogeneous tabular data (continuous and categorical features), and enforcing user-defined actionability constraints without retraining. The method casts CF generation as a sequential decision process within the conditional Generative Flow Network (GFlowNet) paradigm. Each step consists of two sub-actions: (i) selecting a feature index d to modify, and (ii) assigning a new value v to that feature. Continuous attributes are discretized into a finite set of bins, while categorical attributes retain their one-hot representation, yielding a unified discrete action space that simplifies flow matching and ensures that the GFlowNet's probability mass is defined over a countable state space.
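The two-sub-action edit step above can be sketched in a few lines. This is an illustrative sketch, not the paper's implementation: the binning scheme (equal-frequency quantile bins via `make_bins`) and the helper names are assumptions.

```python
import numpy as np

def make_bins(values, num_bins=10):
    """Discretize a continuous feature into equal-frequency bins.

    Quantile binning is an assumed scheme; the paper only states that
    continuous attributes are discretized into a finite set of bins.
    """
    edges = np.quantile(values, np.linspace(0.0, 1.0, num_bins + 1))
    return np.unique(edges)  # drop duplicate edges for heavily skewed features

def apply_action(x, d, v, bin_centers):
    """One sequential edit: set feature d of instance x to the v-th bin value.

    For a categorical feature, bin_centers[d] would simply enumerate its
    category codes, giving the unified discrete action space (d, v).
    """
    x = x.copy()  # keep the original instance x0 intact
    x[d] = bin_centers[d][v]
    return x
```

Because every action is a discrete pair (d, v), continuous and categorical features are handled identically by the policy, which is what lets the flow be defined over a countable state space.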
The core of CounterFlowNet is a reward function R(x′|x₀, y′) that aggregates four desiderata—validity, proximity, plausibility, and sparsity—through a multiplicative composition: R = R_v^λv · R_d^λd · R_p^λp · R_s^λs. Validity (R_v) is a continuous, margin‑based clipping of the target‑class probability, providing graded feedback during exploration. Proximity (R_d) penalizes the distance between the original instance x₀ and the candidate counterfactual x′, typically using an exponential decay of an L₁/L₂ norm. Plausibility (R_p) measures how likely x′ is under the data distribution, e.g., via a pretrained density estimator or k‑nearest‑neighbor likelihood. Sparsity (R_s) rewards fewer modified features, encouraging minimal edits. The exponents λ allow practitioners to weight or disable any component, offering flexibility for domain‑specific priorities.
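A minimal sketch of the multiplicative reward R = R_v^λv · R_d^λd · R_p^λp · R_s^λs follows. The component functional forms (the margin value, the exponential decays, the sigmoid squashing of a log-density) are plausible assumptions consistent with the description above, not the paper's exact definitions.

```python
import numpy as np

def reward(x0, x_cf, p_target, log_density,
           lambdas=(1.0, 1.0, 1.0, 1.0), margin=0.5):
    """R = R_v^lv * R_d^ld * R_p^lp * R_s^ls (assumed component forms)."""
    lv, ld, lp, ls = lambdas
    # Validity: margin-based clipping of the target-class probability,
    # giving graded (non-binary) feedback during exploration.
    r_v = np.clip(p_target / margin, 0.0, 1.0)
    # Proximity: exponential decay of the L1 distance to the original instance.
    r_d = np.exp(-np.abs(x_cf - x0).sum())
    # Plausibility: squash a log-density estimate into (0, 1] (assumed form).
    r_p = 1.0 / (1.0 + np.exp(-log_density))
    # Sparsity: reward fewer modified features.
    n_changed = int((x_cf != x0).sum())
    r_s = np.exp(-n_changed)
    # An exponent of 0 disables a component; larger exponents emphasize it.
    return r_v**lv * r_d**ld * r_p**lp * r_s**ls
```

Setting any λ to zero turns its factor into 1, which is how a practitioner disables a desideratum without touching the rest of the pipeline.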
Actionability constraints such as immutability (features that must not change) or monotonicity (features that can only increase) are incorporated through action masking. At inference time, the logits of prohibited actions are set to −∞ before the softmax, so their sampling probabilities are forced to zero, guaranteeing that the sampled CFs respect the constraints without any post-hoc filtering or model retraining.
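The masking mechanism amounts to a masked softmax over the policy's logits. A minimal sketch, assuming a boolean mask where `True` marks an allowed action:

```python
import numpy as np

def masked_softmax(logits, mask):
    """Zero out prohibited actions by sending their logits to -inf.

    mask[i] = True  -> action i allowed
    mask[i] = False -> action i gets probability exactly 0
    """
    z = np.where(mask, logits, -np.inf)
    z = z - z.max()          # numerical stability; at least one action allowed
    e = np.exp(z)            # exp(-inf) == 0, so masked actions vanish
    return e / e.sum()
```

For an immutable feature, the mask simply disallows selecting its index d; for a monotonically increasing feature, it disallows every value bin below the current one. Because the mask is applied at sampling time, no retraining is needed when constraints change.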
Training follows the standard GFlowNet trajectory‑balance objective. The forward policy P_F generates a sequence of (d, v) actions until a STOP token is emitted, while a backward policy P_B learns the reverse dynamics. Flow conservation is enforced so that the total incoming flow to any state equals the outgoing flow, and the flow reaching a terminal state equals its reward. Because the reward need not be differentiable, complex, non‑smooth constraints can be directly encoded.
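The trajectory-balance objective described above can be written down directly. In practice log Z is a learned scalar parameter and the log-probabilities come from the P_F and P_B networks; the sketch below treats them as plain numbers to show the loss itself.

```python
import numpy as np

def trajectory_balance_loss(log_Z, log_pf, log_pb, reward):
    """Squared trajectory-balance residual for one trajectory tau:

        L(tau) = ( log Z + sum_t log P_F(s_{t+1}|s_t)
                   - log R(x') - sum_t log P_B(s_t|s_{t+1}) )^2

    At the optimum the residual is zero for every trajectory, which
    enforces flow conservation and makes terminal flow equal the reward.
    Note that R only enters through its value, so it need not be
    differentiable with respect to the sampled counterfactual.
    """
    residual = log_Z + np.sum(log_pf) - np.log(reward) - np.sum(log_pb)
    return residual ** 2
```

Minimizing this loss over sampled trajectories trains P_F to sample terminal states with probability proportional to their reward, which is exactly what lets the reward encode non-smooth constraints.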
Empirical evaluation spans eight public tabular datasets (e.g., Adult, Credit, HELOC, COMPAS) and two benchmark protocols. Protocol 1 measures the trade‑off between validity and sparsity; CounterFlowNet achieves comparable or higher validity while modifying on average only 1–2 features, whereas optimization‑based baselines such as DiCE or MCCE typically alter 3–5 features. Protocol 2 assesses diversity (pairwise Jaccard distance) and plausibility (likelihood under the data distribution). CounterFlowNet consistently outperforms both optimization‑based and other generative baselines (CeFlow, DiCoFlex, MCCE) on these metrics, demonstrating that sampling proportionally to the defined reward yields a richer set of high‑quality CFs. Importantly, when action masks enforce immutability or monotonicity, performance degradation is negligible, confirming that constraints are seamlessly integrated into the generation process.
Sampling speed is also favorable: after a single offline training phase, the forward policy can generate multiple CFs for a new instance in tens of milliseconds, making the approach suitable for real‑time decision‑support scenarios.
In summary, CounterFlowNet advances the state of the art in counterfactual explanation generation by (1) leveraging sequential feature edits to naturally promote sparsity, (2) providing a unified discrete action space that handles mixed‑type tabular data, (3) directly optimizing a user‑configurable reward that captures validity, proximity, plausibility, and sparsity, and (4) embedding actionability constraints via masking without extra training overhead. These innovations make CounterFlowNet a practical XAI tool for regulated domains such as finance, healthcare, and law, where transparent, actionable, and constraint‑aware explanations are essential.