Counterfactual Maps: What They Are and How to Find Them
Counterfactual explanations are a central tool in interpretable machine learning, yet computing them exactly for complex models remains challenging. For tree ensembles, predictions are piecewise constant over a large collection of axis-aligned hyperrectangles, implying that an optimal counterfactual for a point corresponds to its projection onto the nearest rectangle with an alternative label under a chosen metric. Existing methods largely overlook this geometric structure, relying either on heuristics with no optimality guarantees or on mixed-integer programming formulations that do not scale to interactive use. In this work, we revisit counterfactual generation through the lens of nearest-region search and introduce counterfactual maps, a global representation of recourse for tree ensembles. Leveraging the fact that any tree ensemble can be compressed into an equivalent partition of labeled hyperrectangles, we cast counterfactual search as the problem of identifying the generalized Voronoi cell associated with the nearest rectangle of an alternative label. This leads to an exact, amortized algorithm based on volumetric k-dimensional (KD) trees, which performs branch-and-bound nearest-region queries with explicit optimality certificates and sublinear average query time after a one-time preprocessing phase. Our experimental analyses on several real datasets drawn from high-stakes application domains show that this approach delivers globally optimal counterfactual explanations with millisecond-level latency, achieving query times that are orders of magnitude faster than existing exact, cold-start optimization methods.
💡 Research Summary
The paper addresses the problem of generating optimal counterfactual explanations for tree‑ensemble classifiers such as random forests and gradient‑boosted trees. The authors observe that any tree ensemble partitions the input space into a finite set of axis‑aligned hyperrectangles, each associated with a constant class label. Consequently, for a query point x with predicted label y and a desired target label y′≠y, the globally optimal counterfactual is obtained by projecting x onto the hyperrectangle of label y′ that lies closest to x under a chosen Lp distance (including weighted variants to model feature‑wise actionability costs).
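Because the regions are axis-aligned, this projection has a closed form: clipping each coordinate of x to the rectangle's bounds is Lp-optimal for any p ≥ 1, since each coordinate's contribution to the distance is minimized independently. A minimal sketch (illustrative code, not the authors' implementation):

```python
import numpy as np

def project_onto_box(x, lo, hi):
    """Project x onto the axis-aligned hyperrectangle [lo, hi].

    For an axis-aligned box, the componentwise clip is the Lp-optimal
    projection for every p >= 1, because coordinates decouple.
    """
    return np.clip(x, lo, hi)

def box_distance(x, lo, hi, p=2):
    """Lp distance from x to the box [lo, hi] (zero if x is inside)."""
    return np.linalg.norm(x - project_onto_box(x, lo, hi), ord=p)

# Toy 2-D example: the counterfactual is the clipped point.
x = np.array([0.2, 0.9])
lo, hi = np.array([0.5, 0.0]), np.array([1.0, 0.5])
cf = project_onto_box(x, lo, hi)  # -> [0.5, 0.5]
```

The projected point `cf` is exactly the counterfactual Π_{H}(x) for this rectangle; the global optimum is then the closest such projection over all rectangles of label y′.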
To exploit this geometric structure, the authors introduce the concept of a “counterfactual map”. A counterfactual map is a function that, for every point in the feature space, returns the nearest hyperrectangle of the target class. The map implicitly defines a generalized Voronoi diagram whose cells are determined by the distance to each hyperrectangle. Computing a counterfactual then reduces to a nearest‑region search followed by a simple projection.
The methodology consists of two stages. In the preprocessing stage, the ensemble is transformed into an equivalent set of disjoint hyperrectangles using the “born‑again tree” technique of Vidal & Schiffer (2020). While an exact minimal partition can be obtained via dynamic programming, the authors adopt the faster heuristic version to keep preprocessing tractable. For each target class y′, the corresponding hyperrectangles H_y′ are indexed in a volumetric KD‑tree. Each KD‑tree node stores a bounding box that encloses all rectangles in its subtree, enabling the computation of admissible lower bounds on the distance from any query point to that subtree.
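The bounding-box idea can be sketched as follows. The class name, leaf size, and median-split rule below are illustrative assumptions, not details taken from the paper; what matters is that every node's bounding box encloses all rectangles in its subtree, so the distance from a query to that box is an admissible (never overestimating) lower bound on the distance to any rectangle beneath it:

```python
import numpy as np

class VolKDNode:
    """Node of a volumetric KD-tree over axis-aligned rectangles.

    `rects` is a list of (lo, hi) coordinate-array pairs. Internal
    nodes split the rectangles by their centers along a cycling axis;
    every node stores the bounding box of its entire subtree.
    (Illustrative sketch; names and split rule are assumptions.)
    """
    def __init__(self, rects, leaf_size=4, depth=0):
        los = np.array([lo for lo, _ in rects])
        his = np.array([hi for _, hi in rects])
        self.bbox_lo, self.bbox_hi = los.min(axis=0), his.max(axis=0)
        if len(rects) <= leaf_size:
            self.rects, self.children = rects, None
        else:
            axis = depth % los.shape[1]
            order = np.argsort([(lo[axis] + hi[axis]) / 2 for lo, hi in rects])
            mid = len(rects) // 2
            self.rects = None
            self.children = [
                VolKDNode([rects[i] for i in order[:mid]], leaf_size, depth + 1),
                VolKDNode([rects[i] for i in order[mid:]], leaf_size, depth + 1),
            ]

    def lower_bound(self, x, p=2):
        """Admissible lower bound: Lp distance from x to the node's bounding box."""
        return np.linalg.norm(x - np.clip(x, self.bbox_lo, self.bbox_hi), ord=p)
```

Admissibility follows directly from containment: any rectangle in the subtree lies inside the bounding box, so no rectangle can be closer to x than the box itself.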
During the query stage, a branch‑and‑bound algorithm traverses the KD‑tree. A priority queue orders nodes by their lower‑bound distance to the query point. Nodes whose lower bound exceeds the best distance found so far are pruned, guaranteeing that no potentially better rectangle is discarded. When a leaf node is reached, the exact distances to its rectangles are evaluated, and the best candidate is updated. The algorithm terminates when the smallest lower bound among unexplored nodes is no longer better than the current best distance. The authors prove (Theorem 3.1) that this procedure always returns a hyperrectangle H⋆∈arg min_{H∈H_y′} d_p(x,H), and that the projected point Π_{H⋆}(x) is a globally optimal counterfactual under the chosen metric.
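The query loop above can be sketched in a few lines. The dict-based node layout here is a hypothetical stand-in for the paper's KD-tree structure; the termination condition is the optimality certificate: once the smallest unexplored lower bound is no better than the incumbent, no remaining subtree can contain a closer rectangle:

```python
import heapq
import numpy as np

def box_dist(x, lo, hi, p=2):
    """Lp distance from x to the box [lo, hi]."""
    return np.linalg.norm(x - np.clip(x, lo, hi), ord=p)

def nearest_rectangle(root, x, p=2):
    """Best-first branch-and-bound over a tree of nodes.

    Each node is a dict with key 'bbox' = (lo, hi) and either
    'children' (internal node) or 'rects' (leaf: list of (lo, hi)).
    (Hypothetical layout, not the paper's implementation.)
    """
    best_d, best_rect = np.inf, None
    counter = 0  # tie-breaker so heapq never compares node dicts
    heap = [(box_dist(x, *root['bbox'], p), counter, root)]
    while heap:
        lb, _, node = heapq.heappop(heap)
        if lb >= best_d:      # certificate: all remaining bounds are worse
            break
        if 'rects' in node:   # leaf: evaluate exact distances
            for lo, hi in node['rects']:
                d = box_dist(x, lo, hi, p)
                if d < best_d:
                    best_d, best_rect = d, (lo, hi)
        else:                 # internal: enqueue children by lower bound
            for child in node['children']:
                counter += 1
                heapq.heappush(heap, (box_dist(x, *child['bbox'], p), counter, child))
    return best_d, best_rect
```

The final counterfactual is then `np.clip(x, *best_rect)`, the projection of x onto the returned rectangle.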
The approach supports any Lp norm with 1 ≤ p ≤ ∞, including weighted L1/L2 norms that capture heterogeneous feature costs. The authors also discuss geometric properties of the induced Voronoi cells: for p = 1 or ∞ the bisectors are unions of polyhedra, while for p = 2 they involve quadratic surfaces, making explicit construction of the full diagram impractical. Hence the implicit nearest‑region search is preferred.
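With strictly positive per-feature weights, the clip projection remains optimal because each coordinate's weighted cost is still minimized independently; only the distance computation changes. A small sketch of a weighted Lp distance to a box (the weight vector `w` is an illustrative assumption modeling feature-wise actionability costs):

```python
import numpy as np

def weighted_box_distance(x, lo, hi, w, p=1):
    """Weighted Lp distance from x to the box [lo, hi].

    w holds strictly positive per-feature costs; p may be any value
    in [1, inf]. With positive weights the componentwise clip is
    still the optimal projection, so only the norm is reweighted.
    """
    r = np.abs(x - np.clip(x, lo, hi))  # per-coordinate residuals
    if np.isinf(p):
        return np.max(w * r)
    return np.sum((w * r) ** p) ** (1.0 / p)
```

For example, doubling the weight on an immutable-leaning feature doubles its contribution to the distance, steering the nearest-region search toward counterfactuals that change cheaper features instead.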
Empirical evaluation is performed on four high‑stakes tabular datasets: COMPAS (recidivism), FICO (credit scoring), Breast Cancer, and Pima Diabetes. Random forests of varying depth (3–10) and number of trees (3–100) are trained. The proposed method (CF‑Maps) is compared against exact mixed‑integer programming approaches and several state‑of‑the‑art heuristics. Results show that after a one‑time preprocessing cost, CF‑Maps answers targeted counterfactual queries in 1–5 ms on average, achieving speedups of one to two orders of magnitude over exact baselines while delivering identical minimal Lp distances. The quality of explanations (measured by distance and feasibility) matches that of exact methods, and the method scales gracefully with the number of trees and depth because the KD‑tree query time remains sublinear in the number of hyperrectangles. Memory consumption stays modest because the heuristic partition typically yields far fewer rectangles than the theoretical exponential worst case.
The paper’s contributions are threefold: (1) a formal reduction of counterfactual generation for tree ensembles to a nearest‑region search problem; (2) an exact, branch‑and‑bound KD‑tree algorithm that provides optimality certificates and sublinear average query time; (3) a demonstration that preprocessing can amortize the cost of exact counterfactual computation, making interactive recourse feasible for real‑world, high‑stakes applications. Limitations include the reliance on axis‑aligned partitions (thus not directly applicable to non‑tree models) and the preprocessing overhead for extremely large ensembles, which may require additional compression techniques. Overall, the work presents a practical, theoretically sound framework that bridges the gap between optimal counterfactual explanations and real‑time usability.