Online Selective Conformal Prediction with Asymmetric Rules: A Permutation Test Approach
Selective conformal prediction aims to construct prediction sets with valid coverage for a test unit conditional on it being selected by a data-driven mechanism. While existing methods in the offline setting handle any selection mechanism that is permutation invariant to the labeled data, their extension to the online setting – where data arrives sequentially and later decisions depend on earlier ones – is challenged by the fact that the selection mechanism is naturally asymmetric. As such, existing methods only address a limited collection of selection mechanisms. In this paper, we propose PErmutation-based Mondrian Conformal Inference (PEMI), a general permutation-based framework for selective conformal prediction with arbitrary asymmetric selection rules. Motivated by full and Mondrian conformal prediction, PEMI identifies all permutations of the observed data (or a Monte-Carlo subset thereof) that lead to the same selection event, and calibrates a prediction set using conformity scores over this selection-preserving reference set. Under standard exchangeability conditions, our prediction sets achieve finite-sample exact selection-conditional coverage for any asymmetric selection mechanism and any prediction model. PEMI naturally incorporates additional offline labeled data, extends to selection mechanisms with multiple test samples, and achieves FCR control with fine-grained selection taxonomies. We further work out several efficient instantiations for commonly-used online selection rules, including covariate-based rules, conformal p/e-values-based procedures, and selection based on earlier outcomes. Finally, we demonstrate the efficacy of our methods across various selection rules on a real drug discovery dataset and investigate their performance via simulations.
💡 Research Summary
The paper addresses the problem of selective conformal prediction in an online setting where a data‑driven selection rule determines whether a prediction set should be issued for a test point. Existing offline methods rely on the selection rule being permutation‑invariant with respect to the labeled data; this assumption fails online because the order of past observations influences later decisions, making the rule inherently asymmetric. Consequently, prior online approaches only cover a narrow class of “decision‑driven” rules and often produce overly conservative or even vacuous prediction sets.
To overcome these limitations, the authors propose PEMI (Permutation‑based Mondrian Conformal Inference), a general framework that restores exchangeability by working at the level of data permutations rather than individual data points. The key idea is to consider the set of all permutations (or a Monte‑Carlo subset) of the observed data $(Z_1,\dots,Z_t)$. For each candidate label $y$ of the current test point, PEMI identifies the subset of permutations that leave the selection event $S_t=1$ unchanged; this subset is called the reference set. Because the underlying data are exchangeable, the random permutation, conditional on belonging to the reference set, is uniformly distributed over that set. This preserves the symmetry needed for conformal inference.
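The reference-set construction can be sketched with a Monte‑Carlo permutation loop. This is a minimal illustration under our own assumptions, not the authors' code:

- Units are scalar `(x, y)` pairs.
- The nonconformity score `score(x, y)` is fixed in advance, for example from a separately trained model.
- The selection rule is a hypothetical callable `select(history, x_new)` that may depend on the order of `history`.
- The p-value is the share of selection-preserving orderings whose last unit scores at least as high as the test unit. The identity ordering is always counted, as in standard Monte‑Carlo permutation tests.

The name `pemi_mc_pvalue` is ours.

```python
import random
from typing import Callable, Sequence, Tuple

Unit = Tuple[float, float]  # (covariate x, label y)


def pemi_mc_pvalue(
    units: Sequence[Unit],
    select: Callable[[Sequence[Unit], float], bool],
    score: Callable[[float, float], float],
    n_perm: int = 2000,
    seed: int = 0,
) -> float:
    """Hypothetical Monte-Carlo selection-conditional p-value for the last unit.

    units[-1] pairs the test covariate with a *candidate* label y;
    select(history, x_new) is a possibly order-dependent online rule;
    score(x, y) is a fixed nonconformity score.
    """
    history, (x_t, y_t) = list(units[:-1]), units[-1]
    if not select(history, x_t):
        raise ValueError("the test unit was not selected; nothing to calibrate")
    rng = random.Random(seed)
    t = len(units)
    v_test = score(x_t, y_t)
    # The identity ordering always preserves the observed event S_t = 1,
    # so it contributes one to both counts.
    n_ref, n_ge = 1, 1
    for _ in range(n_perm):
        order = rng.sample(range(t), t)  # uniformly random permutation
        permuted = [units[i] for i in order]
        x_last, y_last = permuted[-1]
        # Keep only orderings under which the last unit is still selected.
        if select(permuted[:-1], x_last):
            n_ref += 1
            if score(x_last, y_last) >= v_test:
                n_ge += 1
    return n_ge / n_ref
```

Candidate labels $y$ whose p-value exceeds $\alpha$ form the prediction set. Under other orderings, the test unit's candidate label moves into the history, so the reference set can change with $y$. For that reason, this sketch reruns the whole loop for every grid value.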
Two theoretical regimes are established:
- Full‑permutation case – when the reference set consists of all permutations that preserve the selection event, PEMI yields finite‑sample exact selection‑conditional coverage (SCC) for any asymmetric rule and any predictive model.
- Monte‑Carlo case – when only a random collection of permutations is used, the same coverage guarantee follows from the exchangeability of the sampled permutations.
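For tiny $t$, the full-permutation regime can be checked by brute force over all $t!$ orderings. The sketch below is illustrative only. The following are all our assumptions:

- the function name `pemi_exact_pvalue`;
- the scalar `(x, y)` units;
- the fixed `score(x, y)`;
- the `select(history, x_new)` callable.

```python
from itertools import permutations
from typing import Callable, Sequence, Tuple

Unit = Tuple[float, float]  # (covariate x, label y)


def pemi_exact_pvalue(
    units: Sequence[Unit],
    select: Callable[[Sequence[Unit], float], bool],
    score: Callable[[float, float], float],
) -> float:
    """Hypothetical selection-conditional p-value over all t! orderings.

    units[-1] pairs the test covariate with a candidate label. The reference
    set is every ordering under which `select` still picks the last unit.
    Brute force, so only feasible for very small t.
    """
    history, (x_t, y_t) = list(units[:-1]), units[-1]
    if not select(history, x_t):
        raise ValueError("the test unit was not selected")
    v_test = score(x_t, y_t)
    n_ref = n_ge = 0
    for order in permutations(range(len(units))):
        permuted = [units[i] for i in order]
        x_last, y_last = permuted[-1]
        if select(permuted[:-1], x_last):  # ordering preserves S_t = 1
            n_ref += 1
            n_ge += int(score(x_last, y_last) >= v_test)
    return n_ge / n_ref
```

As an example of how asymmetry enters, take the hypothetical rule "select if $X_t$ exceeds the immediately preceding covariate" with covariates $1,2,3,4$. Unit $j$ sits last in $3! = 6$ orderings and is selected in $2r_j$ of them, where $r_j$ counts the smaller covariates. The reference set therefore weights units by rank instead of simply admitting or excluding them.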
The framework naturally extends to (i) incorporating additional offline labeled data, (ii) handling multiple test points arriving simultaneously, and (iii) controlling the false coverage rate (FCR) under fine‑grained selection taxonomies. Importantly, when the selection rule is symmetric, PEMI reduces to the previously proposed Joint Mondrian Conformal Inference (JOMI), showing that it generalizes this earlier offline method.
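For intuition about the reduction, take the simplest symmetric rule: select unit $t$ whenever $X_t > c$. Every ordering that puts unit $j$ last is selected exactly when $X_j > c$, whatever the order of the rest. Counting permutations therefore collapses to counting units: the test score is compared only with calibration units that would themselves have been selected.

The sketch below illustrates that reduction; it is not the JOMI implementation. The threshold `c`, the precomputed scores, and the function name are our assumptions.

```python
from typing import Sequence


def symmetric_threshold_pvalue(
    x_cal: Sequence[float],
    v_cal: Sequence[float],
    x_test: float,
    v_test: float,
    c: float,
) -> float:
    """Hypothetical p-value for the symmetric rule 'select iff x > c'.

    Each unit placed last is selected iff its own covariate exceeds c,
    so only calibration units with x > c enter the reference set.
    """
    if not x_test > c:
        raise ValueError("the test unit was not selected")
    ref = [v for x, v in zip(x_cal, v_cal) if x > c]
    # The +1 terms account for the test unit itself.
    return (1 + sum(v >= v_test for v in ref)) / (1 + len(ref))
```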
From a computational standpoint, the authors develop efficient instantiations for three common families of online selection rules:
- Covariate‑only rules – where selection depends solely on the covariates $\{X_i\}$. Here the reference set can be constructed by reordering covariates without touching labels, leading to an $O(t)$ algorithm.
- Online multiple‑testing procedures based on conformal p‑values or e‑values. PEMI leverages the monotonicity of p‑value rankings to identify preserving permutations efficiently.
- Selection based on weighted quantiles or averages of past labels. By pre‑computing how weighted statistics change under permutations, the reference set can be obtained in $O(t\log t)$ time.
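For covariate‑only rules, membership in the reference set never involves labels, so it can be computed once and reused for every candidate $y$. As a concrete asymmetric example (a hypothetical rule, not one taken from the paper), consider: select unit $t$ iff $X_t$ exceeds the largest of the $w$ most recent covariates.

Among the orderings that put unit $j$ last, the rule fires iff the $w$ preceding slots are all filled from the $r_j$ units with strictly smaller covariates. That happens in a fraction $\binom{r_j}{w}/\binom{t-1}{w}$ of those orderings.

The sketch assumes absolute-residual scores $|y - \hat\mu(x)|$ with a user-supplied predictor `predict`; the function names are ours. Its cost is one sort plus a linear pass per grid value.

```python
from bisect import bisect_left
from math import comb
from typing import Callable, List, Sequence


def recent_max_weights(xs: Sequence[float], w: int) -> List[float]:
    """Reference-set weight of each unit for the hypothetical rule
    'select unit t iff X_t > max of the w most recent covariates'.

    Among the (t-1)! orderings placing unit j last, the rule fires iff the
    w slots just before it hold units with strictly smaller covariates;
    with r_j such units, that fraction is C(r_j, w) / C(t-1, w).
    """
    sorted_xs = sorted(xs)
    denom = comb(len(xs) - 1, w)
    # bisect_left counts covariates strictly below x, i.e. r_j.
    return [comb(bisect_left(sorted_xs, x), w) / denom for x in xs]


def covariate_rule_pvalues(
    x_all: Sequence[float],
    y_cal: Sequence[float],
    y_grid: Sequence[float],
    predict: Callable[[float], float],
    w: int,
) -> List[float]:
    """p-values for every candidate label, reusing one label-free reference set.

    x_all holds the t-1 calibration covariates followed by the test covariate;
    y_cal holds the t-1 calibration labels.
    """
    if not 1 <= w <= len(x_all) - 1 or not x_all[-1] > max(x_all[-1 - w:-1]):
        raise ValueError("the test unit was not selected in the observed order")
    weights = recent_max_weights(x_all, w)  # computed once for all y
    v_cal = [abs(y - predict(x)) for x, y in zip(x_all[:-1], y_cal)]
    total = sum(weights)
    pvals = []
    for y in y_grid:
        v_test = abs(y - predict(x_all[-1]))
        # The test unit's own weight always counts (its score ties itself).
        mass = weights[-1] + sum(
            wt for wt, v in zip(weights[:-1], v_cal) if v >= v_test
        )
        pvals.append(mass / total)
    return pvals
```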
The authors validate PEMI on a real drug‑discovery dataset (several thousand compounds with high‑dimensional features) and extensive simulations. Compared with state‑of‑the‑art online selective conformal methods such as CAP and EXPRESS, PEMI achieves:
- Smaller average prediction sets (30–45% reduction), indicating less conservatism.
- Exact SCC: empirical miscoverage among selected units matches the nominal level (e.g., 0.099–0.101 for $\alpha=0.1$).
- Rare vacuous sets: the proportion of infinite‑size prediction sets drops below 0.2%, versus 5–12% for competing methods.
- Scalable Monte‑Carlo approximation: comparable statistical performance with a 5‑fold reduction in runtime.
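The selection-conditional metrics above can be computed from a log of online decisions. The sketch below assumes interval-shaped prediction sets recorded as `(selected, lower, upper, y_true)` tuples; this format and the function name are hypothetical. It reports three quantities over selected units:

- miscoverage, which should sit near $\alpha$;
- the fraction of vacuous (infinite-length) sets;
- the mean length of finite sets.

The summary does not say how the paper averages set sizes, so averaging over finite sets only is our choice.

```python
import math
from typing import Dict, Iterable, List, Tuple

Record = Tuple[bool, float, float, float]  # (selected, lower, upper, y_true)


def summarize_selected(records: Iterable[Record]) -> Dict[str, float]:
    """Selection-conditional metrics for interval-shaped prediction sets."""
    miss = vacuous = n_sel = 0
    finite_lengths: List[float] = []
    for selected, lo, hi, y in records:
        if not selected:
            continue  # only selected units receive a prediction set
        n_sel += 1
        miss += int(not (lo <= y <= hi))
        if math.isinf(hi - lo):
            vacuous += 1  # unbounded set, e.g. (-inf, inf)
        else:
            finite_lengths.append(hi - lo)
    if n_sel == 0:
        raise ValueError("no selected units in the log")
    mean_len = sum(finite_lengths) / len(finite_lengths) if finite_lengths else math.nan
    return {
        "miscoverage": miss / n_sel,
        "vacuous_rate": vacuous / n_sel,
        "mean_finite_length": mean_len,
    }
```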
Overall, PEMI provides a unified, theoretically sound, and practically efficient solution for online selective conformal prediction under arbitrary asymmetric selection mechanisms. The paper concludes with several promising directions: extending to non‑exchangeable time‑series data, integrating deep learning score functions, and optimizing memory/computation for real‑time deployment.