LxCIM: a new rank-based binary classifier performance metric invariant to local exchange of classes


Binary classification is one of the oldest, most prevalent, and most studied problems in machine learning. However, the metrics used to evaluate model performance have received comparatively little attention. The area under the receiver operating characteristic curve (AUROC) has long been a standard choice for model comparison. Despite its advantages, AUROC is not always ideal, particularly for problems that are invariant to local exchange of classes (LxC), a new form of metric invariance introduced in this work. To address this limitation, we propose LxCIM (LxC-invariant metric), which is not only rank-based and invariant under local exchange of classes, but also intuitive, logically consistent, and always computable, while enabling more detailed analysis through the cumulative accuracy-decision rate curve. Moreover, LxCIM exhibits clear theoretical connections to AUROC, accuracy, and the area under the accuracy-decision rate curve (AUDRC). These relationships allow for multiple complementary interpretations: as a symmetric form of AUROC, a rank-based analogue of accuracy, or a more representative and more interpretable variant of AUDRC. Finally, we demonstrate the direct applicability of LxCIM to the bivariate causal discovery problem (which exhibits invariance to local exchange of classes) and show how it addresses the acknowledged limitations of existing metrics used in this field. All code and implementation details are publicly available at github.com/tiagobrogueira/Causal-Discovery-In-Exchangeable-Data.


💡 Research Summary

The paper addresses a subtle but important gap in the evaluation of binary classifiers: most commonly used performance metrics either ignore the confidence information contained in model scores (categorical metrics) or, while exploiting this information (rank‑based metrics such as AUROC), are not invariant to a specific type of symmetry that can arise in certain problem settings. The authors introduce the notion of “local exchange of classes” (LxC), a stronger invariance than the classical I₁ invariance (which only requires a metric to be unchanged when the positive and negative class definitions are swapped globally). LxC invariance demands that a metric remain unchanged under any local perturbation that simultaneously adds a quantity δ₁ to the true‑positive count and subtracts the same quantity from the true‑negative count, while also adding a possibly different quantity δ₂ to the false‑positive count and subtracting it from the false‑negative count. In other words, the metric must depend only on the total number of correct predictions C = TP + TN and the total number of errors I = FP + FN, regardless of how those correct or incorrect predictions are distributed across the two classes.
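The constraint above can be illustrated with a small sketch (the counts, the `lxc_perturb` helper, and the choice of F1 as a contrast are illustrative, not from the paper): an LxC perturbation preserves C = TP + TN and I = FP + FN, so accuracy is unchanged, while a class-sensitive metric such as F1 shifts.

```python
# Illustrative sketch of LxC invariance (hypothetical counts, not from the paper).
# An LxC perturbation moves delta1 correct predictions between TP and TN, and
# delta2 errors between FP and FN, so C = TP + TN and I = FP + FN are preserved.

def accuracy(tp, tn, fp, fn):
    # Depends only on C = tp + tn and C + I, hence LxC-invariant.
    return (tp + tn) / (tp + tn + fp + fn)

def f1(tp, tn, fp, fn):
    # Depends on how correct/incorrect predictions split across classes.
    return 2 * tp / (2 * tp + fp + fn)

def lxc_perturb(tp, tn, fp, fn, delta1, delta2):
    """Apply a local exchange of classes; C and I are left unchanged."""
    return tp + delta1, tn - delta1, fp + delta2, fn - delta2

base = (40, 30, 20, 10)                      # C = 70, I = 30
pert = lxc_perturb(*base, delta1=5, delta2=-3)  # still C = 70, I = 30

print(accuracy(*base), accuracy(*pert))  # identical: accuracy is LxC-invariant
print(f1(*base), f1(*pert))              # differs: F1 is not LxC-invariant
```

The same experiment works for any (δ₁, δ₂) that keeps all four counts non-negative, which is exactly the family of perturbations the LxC definition quantifies over.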

The authors prove that, among categorical metrics, only accuracy satisfies this LxC invariance, because accuracy can be expressed solely as C / (C + I). Consequently, any rank‑based metric that respects LxC invariance must be a rank‑based analogue of accuracy. Building on this insight, they propose LxCIM (LxC‑Invariant Metric), a novel rank‑based performance measure that fulfills the LxC invariance while retaining the interpretability of a fixed decision threshold.

LxCIM is defined as follows. Given a scoring function s(x) and a bi‑monotonic decision function g(s) (strictly decreasing for scores below a pivot s* and strictly increasing above it), the samples are ordered by decreasing g(s) values, yielding a permutation π. For each position i in this ordering, the indicator I{y_{π(i)} = u(s_{π(i)})} records whether the true label matches the binary prediction u(s) derived from the threshold s*. The metric aggregates these indicators with a weighting proportional to the decision rate i / N, where N is the total number of examples:

LxCIM = (1/N) ∑_{i=1}^{N} (i/N) · I{y_{π(i)} = u(s_{π(i)})}.
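The formula above admits a direct transcription (a sketch, not the paper's released implementation: the choice g(s) = |s − s*| for the bi-monotonic decision function, the prediction rule u(s) = 1{s ≥ s*}, and the sample data are all assumptions for illustration):

```python
# Sketch of LxCIM as defined above; g, u, and s_star follow the summary's
# notation, with g(s) = |s - s_star| as one simple bi-monotonic choice.
import numpy as np

def lxcim(y, s, s_star=0.5):
    """LxCIM = (1/N) * sum_i (i/N) * 1{y_pi(i) == u(s_pi(i))},
    with the permutation pi ordering samples by decreasing g(s)."""
    y = np.asarray(y)
    s = np.asarray(s)
    N = len(y)
    g = np.abs(s - s_star)             # bi-monotonic decision function
    u = (s >= s_star).astype(int)      # binary prediction from threshold s*
    order = np.argsort(-g)             # permutation pi: decreasing g(s)
    correct = (y[order] == u[order]).astype(float)
    weights = np.arange(1, N + 1) / N  # decision rate i/N at position i
    return float(np.sum(weights * correct) / N)

y = [1, 0, 1, 1, 0, 0]
s = [0.9, 0.1, 0.8, 0.4, 0.3, 0.6]
print(lxcim(y, s))
```

Note that under this weighting a classifier that is correct on every sample scores (1/N) ∑ᵢ i/N = (N+1)/(2N), so values from this sketch are naturally compared against that ceiling rather than against 1.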

This formulation is mathematically equivalent to the area under the accuracy-decision-rate curve (AUDRC), also known as the accuracy-coverage curve. The authors further establish theoretical connections between LxCIM, AUROC, and accuracy, which allow LxCIM to be read as a symmetric form of AUROC, a rank-based analogue of accuracy, and a more representative, more interpretable variant of AUDRC.

