Interval-Based AUC (iAUC): Extending ROC Analysis to Uncertainty-Aware Classification

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

In high-stakes risk prediction, quantifying uncertainty through interval-valued predictions is essential for reliable decision-making. However, standard evaluation tools like the receiver operating characteristic (ROC) curve and the area under the curve (AUC) are designed for point scores and fail to capture the impact of predictive uncertainty on ranking performance. We propose an uncertainty-aware ROC framework specifically for interval-valued predictions, introducing two new measures: $AUC_L$ and $AUC_U$. This framework enables an informative three-region decomposition of the ROC plane, partitioning pairwise rankings into correct, incorrect, and uncertain orderings. This approach naturally supports selective prediction by allowing models to abstain from ranking cases with overlapping intervals, thereby optimizing the trade-off between abstention rate and discriminative reliability. We prove that under valid class-conditional coverage, $AUC_L$ and $AUC_U$ provide formal lower and upper bounds on the theoretical optimal AUC ($AUC^*$), characterizing the physical limit of achievable discrimination. The proposed framework applies broadly to interval-valued prediction models, regardless of the interval construction method. Experiments on real-world benchmark datasets, using bootstrap-based intervals as one instantiation, validate the framework’s correctness and demonstrate its practical utility for uncertainty-aware evaluation and decision-making.


💡 Research Summary

The paper addresses a critical gap in the evaluation of binary classifiers that output interval‑valued predictions rather than single point scores. Traditional ROC curves and the associated Area Under the Curve (AUC) assume a total ordering of instances based on scalar scores; they cannot capture the explicit ambiguity introduced when prediction intervals overlap. To remedy this, the authors propose an uncertainty‑aware ROC framework that defines two new performance measures, AUC_L (lower) and AUC_U (upper), together with a three‑region decomposition of the ROC plane.

The setup considers a classifier that, for each input x, returns an interval I(x) = (L(x), U(x)) with L ≤ U. For a randomly drawn positive instance I₁ = (L₁, U₁) and a negative instance I₀ = (L₀, U₀), three mutually exclusive outcomes are defined: (i) confidently correct if L₁ > U₀, (ii) confidently incorrect if U₁ < L₀, and (iii) uncertain if the intervals overlap. These outcomes induce a partial order rather than a total order.
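The three pairwise outcomes are simple to express in code. Below is a minimal sketch; the function name and the (lo, hi) tuple convention are our own, not from the paper:

```python
# Illustrative sketch: classify the ranking outcome of one positive/negative
# pair of prediction intervals, following the three mutually exclusive cases.
# Intervals are (lo, hi) tuples with lo <= hi.

def pair_outcome(pos, neg):
    l1, u1 = pos  # positive instance interval (L1, U1)
    l0, u0 = neg  # negative instance interval (L0, U0)
    if l1 > u0:
        return "confidently correct"    # entire positive interval above the negative one
    if u1 < l0:
        return "confidently incorrect"  # entire positive interval below the negative one
    return "uncertain"                  # overlapping intervals: only a partial order
```

Because overlapping pairs return "uncertain" rather than being forced into an order, the induced relation on instances is a partial order, exactly as the text states.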

Two ROC‑style curves are constructed by pairing different rate functions:

  • Strict curve: TPR_L(t) = P(L₁ > t) versus FPR_U(t) = P(U₀ > t). This curve is conservative, counting a true positive only when the entire positive interval lies above the threshold and counting a false positive whenever any part of a negative interval exceeds the threshold. Its area is defined as AUC_L. Theorem 1 shows that AUC_L = P(L₁ > U₀) = P(I₁ > I₀), i.e., the probability of a definitively correct ranking.
  • Permissive curve: TPR_U(t) = P(U₁ > t) versus FPR_L(t) = P(L₀ > t). This curve is liberal, granting a true positive when any part of the positive interval exceeds the threshold and penalizing a false positive only when the entire negative interval exceeds it. Its area is denoted AUC_U, and Theorem 2 proves AUC_U = 1 – P(L₀ > U₁) = 1 – P(I₀ > I₁).
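Because Theorems 1 and 2 reduce both areas to pairwise probabilities, AUC_L = P(L₁ > U₀) and AUC_U = 1 − P(L₀ > U₁) can be estimated directly from a sample of intervals without tracing the curves. A minimal sketch (variable names are ours):

```python
# Illustrative sketch: empirical AUC_L and AUC_U via the pairwise identities
# AUC_L = P(L1 > U0) and AUC_U = 1 - P(L0 > U1). Intervals are (lo, hi) tuples.

def interval_aucs(pos_intervals, neg_intervals):
    n_pairs = len(pos_intervals) * len(neg_intervals)
    # Strict: the whole positive interval must beat the whole negative interval.
    strict = sum(l1 > u0 for l1, _ in pos_intervals for _, u0 in neg_intervals)
    # Confident error: the whole negative interval beats the whole positive interval.
    wrong = sum(l0 > u1 for _, u1 in pos_intervals for l0, _ in neg_intervals)
    return strict / n_pairs, 1.0 - wrong / n_pairs  # (AUC_L, AUC_U)
```

When every interval is degenerate (lo == hi), both quantities collapse to the ordinary empirical AUC, recovering the classical point-score setting.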

Geometrically, the ROC plane is split into three disjoint regions: the blue region (area = AUC_L) corresponds to confident correct orderings, the red region (area = 1 – AUC_U) to confident errors, and the white region to overlapping intervals (uncertainty). This three‑region decomposition provides a complete probabilistic characterization: P(I₁ > I₀) + P(I₁ < I₀) + P(overlap) = 1.
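On a finite sample the decomposition is easy to verify: classify every positive/negative pair into one of the three regions and check that the fractions sum to one. A sketch under the same (lo, hi) tuple convention as above:

```python
# Illustrative sketch: empirical fractions of the blue (confident correct),
# red (confident error), and white (overlap) regions over all pairs.

def region_fractions(pos_intervals, neg_intervals):
    blue = red = white = 0
    for l1, u1 in pos_intervals:
        for l0, u0 in neg_intervals:
            if l1 > u0:
                blue += 1   # definitively correct ordering
            elif u1 < l0:
                red += 1    # definitively incorrect ordering
            else:
                white += 1  # overlapping intervals (uncertain)
    total = blue + red + white
    return blue / total, red / total, white / total
```

The blue fraction is the empirical AUC_L, the red fraction is 1 − AUC_U, and the white fraction is the overlap probability, so the three returned values always sum to 1.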

A major theoretical contribution is the connection to the optimal AUC, denoted AUC*. Under a mild class‑conditional coverage assumption (the intervals contain the true conditional probability with the prescribed confidence), the authors prove the bounds AUC_L ≤ AUC* ≤ AUC_U. Thus, observable interval predictions yield provable lower and upper bounds on the best achievable discrimination, revealing how predictive uncertainty fundamentally limits performance.
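The bound can be seen in a toy simulation where the true conditional scores are known and the intervals contain them by construction, so coverage holds trivially. All distributions and the width w below are illustrative choices, not from the paper:

```python
import random

random.seed(0)

# Toy check of AUC_L <= AUC* <= AUC_U. True scores p are known, and the
# intervals [p - w, p + w] contain p by construction, so class-conditional
# coverage holds. Distributions and width w are illustrative.
pos = [random.uniform(0.4, 0.9) for _ in range(200)]  # positives score higher on average
neg = [random.uniform(0.1, 0.6) for _ in range(200)]
w = 0.05
n_pairs = len(pos) * len(neg)

auc_star = sum(p > q for p in pos for q in neg) / n_pairs           # optimal AUC
auc_l = sum(p - w > q + w for p in pos for q in neg) / n_pairs      # strict area
auc_u = 1 - sum(q - w > p + w for p in pos for q in neg) / n_pairs  # permissive area

# p - w > q + w implies p > q, and q - w > p + w implies q > p,
# so the sandwich holds for every sample, not just in expectation.
assert auc_l <= auc_star <= auc_u
```

Widening w shrinks AUC_L and grows AUC_U, so the sandwich around AUC* loosens exactly as the uncertainty grows.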

Empirical validation uses the Pima Indians Diabetes dataset with bootstrap-derived 90% confidence intervals. By varying the bootstrap sample size, the authors manipulate interval width and observe the resulting changes in overlap probability, AUC_L, and AUC_U. Wider intervals increase overlap, reduce AUC_L, and leave AUC_U relatively stable, illustrating the trade-off between certainty and discriminative power. The framework also supports selective prediction: the model can abstain on instances whose intervals overlap a decision threshold, allowing practitioners to control the abstention rate while monitoring how AUC_L and AUC_U evolve.
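As one possible instantiation of the bootstrap procedure (not the paper's exact pipeline), each resample refits a scorer and the per-instance interval is taken from empirical percentiles of the resampled scores. The 1-D centroid-distance scorer below is a deliberately simple stand-in for the paper's classifier:

```python
import random

random.seed(1)

# Illustrative sketch: bootstrap 90% score intervals on a 1-D toy problem.
# Each resample refits a trivial scorer (distance to class centroids); per
# test point, the 5th/95th percentiles of the B scores form the interval.

def bootstrap_interval(train, x, B=200, alpha=0.10):
    """train: list of (feature, label) pairs; returns (lo, hi) for point x."""
    scores = []
    for _ in range(B):
        sample = [random.choice(train) for _ in train]
        pos = [v for v, y in sample if y == 1]
        neg = [v for v, y in sample if y == 0]
        if not pos or not neg:
            continue  # skip degenerate resamples missing a class
        mu1, mu0 = sum(pos) / len(pos), sum(neg) / len(neg)
        scores.append(abs(x - mu0) - abs(x - mu1))  # higher = more positive-like
    scores.sort()
    lo = scores[int(len(scores) * alpha / 2)]
    hi = scores[int(len(scores) * (1 - alpha / 2)) - 1]
    return lo, hi

# Deterministic toy training set: negatives near 0, positives near 1.
train = [(i / 10, 0) for i in range(-5, 5)] + [(1 + i / 10, 1) for i in range(-5, 5)]
lo, hi = bootstrap_interval(train, x=2.0)  # clearly positive point: lo stays above 0
```

Selective prediction then abstains exactly when (lo, hi) straddles the decision threshold; smaller training samples yield wider intervals and hence more abstentions, reproducing the overlap trade-off discussed above.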

Importantly, the proposed methodology is agnostic to how intervals are generated; it applies equally to bootstrap, Bayesian credible intervals, conformal prediction sets, or ensemble quantiles. By integrating uncertainty directly into ROC analysis, the paper offers a principled, interpretable, and practically useful tool for evaluating and deploying interval‑aware classifiers in high‑stakes domains.

