Offline changepoint localization using a matrix of conformal p-values

Changepoint localization is the problem of estimating the index at which a change occurred in the data generating distribution of an ordered list of data, or declaring that no change occurred. We present the broadly applicable MCP algorithm, which uses a matrix of conformal p-values to produce a confidence interval for a (single) changepoint under the mild assumption that the pre-change and post-change distributions are each exchangeable. We prove a novel conformal Neyman-Pearson lemma, motivating practical classifier-based choices for our conformal score function. Finally, we exemplify the MCP algorithm on a variety of synthetic and real-world datasets, including using black-box pre-trained classifiers to detect changes in sequences of images, text, and accelerometer data.


💡 Research Summary

The paper tackles the offline changepoint localization problem—identifying the index at which the data‑generating distribution switches, or declaring that no change occurred—under a remarkably weak statistical assumption: both the pre‑change and post‑change distributions are exchangeable. Traditional changepoint methods typically require parametric knowledge of the two distributions, focus only on mean or variance shifts, or provide only point estimates without finite‑sample confidence guarantees. In contrast, the authors introduce the MCP (Matrix of Conformal p‑values) algorithm, which leverages conformal prediction to construct valid, distribution‑free confidence sets for a single changepoint.

Core methodology
For each candidate changepoint \(t\), the algorithm builds two families of conformal scores: a "left" family \(s^{(0)}_r\) that treats the first \(t\) observations as pre-change and the remaining \(n-t\) as post-change, and a "right" family \(s^{(1)}_r\) that swaps the roles. For every subsample size \(r\) and every observation \(j\) on the relevant side, a score \(\kappa^{(0)}_{t,r,j}\) (or \(\kappa^{(1)}_{t,r,j}\)) is computed. Using the standard conformal p-value construction—counting how many scores exceed the held-out score and adding a uniform random tie-breaker—the algorithm obtains a matrix of left-side p-values \(p^{(0)}_{t,r}\) and right-side p-values \(p^{(1)}_{t,r}\). Exchangeability guarantees that each entry is exactly Uniform(0,1) in finite samples, regardless of the underlying distributions.
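The conformal p-value step described above can be sketched as follows. This is a minimal illustration, not the paper's code: `conformal_p_value` is a hypothetical helper implementing the standard smoothed (randomized tie-break) construction, applied to whatever score function is chosen.

```python
import numpy as np

def conformal_p_value(cal_scores, test_score, rng):
    """Smoothed conformal p-value: rank the held-out score among the
    calibration scores, breaking ties with an independent Uniform(0,1) draw.
    If all n+1 scores are exchangeable, the result is exactly Uniform(0,1)
    in finite samples."""
    cal_scores = np.asarray(cal_scores)
    n_greater = np.sum(cal_scores > test_score)   # strictly larger scores
    n_ties = np.sum(cal_scores == test_score)     # ties, broken at random
    u = rng.uniform()
    return (n_greater + u * (n_ties + 1)) / (cal_scores.size + 1)
```

Applying a helper like this to each held-out observation on the left and right of a candidate split \(t\) would yield the matrix of p-values \(p^{(0)}_{t,r}\) and \(p^{(1)}_{t,r}\).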

These p-values are then aggregated into empirical distribution functions \(\hat F_0\) and \(\hat F_1\). The algorithm computes Kolmogorov-Smirnov distances between each empirical CDF and the uniform CDF, scaling them by \(\sqrt{t}\) and \(\sqrt{n-t}\) respectively to obtain statistics \(W^{(0)}_t\) and \(W^{(1)}_t\). A secondary conformal test (Algorithm 2) converts each statistic into a left- and right-p-value. Finally, a chosen combination rule (e.g., Fisher's method, Stouffer's Z-score, or the minimum p-value) merges the two into a single candidate p-value \(p_t\) for changepoint \(t\).
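A sketch of this aggregation step, assuming SciPy is available: `ks_uniform` computes the Kolmogorov-Smirnov distance between an empirical CDF of p-values and the uniform CDF, and `combine` implements the three combination rules mentioned. The secondary conformal test (Algorithm 2) is omitted here, and function names are illustrative.

```python
import numpy as np
from scipy import stats

def ks_uniform(pvals):
    """Two-sided KS distance between the empirical CDF of p-values
    and the Uniform(0,1) CDF."""
    pvals = np.sort(np.asarray(pvals))
    n = pvals.size
    ecdf_hi = np.arange(1, n + 1) / n   # ECDF just after each point
    ecdf_lo = np.arange(0, n) / n       # ECDF just before each point
    return max(np.max(ecdf_hi - pvals), np.max(pvals - ecdf_lo))

def combine(p_left, p_right, method="fisher"):
    """Merge the left- and right-side p-values into one candidate p-value."""
    if method == "fisher":
        stat = -2 * (np.log(p_left) + np.log(p_right))
        return stats.chi2.sf(stat, df=4)          # 2 * (number of p-values)
    if method == "stouffer":
        z = (stats.norm.isf(p_left) + stats.norm.isf(p_right)) / np.sqrt(2)
        return stats.norm.sf(z)
    if method == "min":
        return min(1.0, 2 * min(p_left, p_right))  # Bonferroni-corrected minimum
    raise ValueError(f"unknown method: {method}")
```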

Confidence set construction
Given a significance level \(\alpha\), the confidence set is defined by inverting the per-candidate tests:
\[
C_\alpha = \{\, t : p_t > \alpha \,\},
\]
the set of candidate changepoints not rejected at level \(\alpha\). Under the exchangeability assumption, this set contains the true changepoint with probability at least \(1-\alpha\).
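The inversion step itself is simple: keep every candidate whose combined p-value exceeds \(\alpha\). A minimal sketch, where the mapping from candidates \(t\) to p-values \(p_t\) is assumed to come from the earlier steps, and `point_estimate` is one natural (illustrative) choice rather than something prescribed by the paper:

```python
def confidence_set(p_values, alpha=0.1):
    """Keep every candidate changepoint t whose combined p-value p_t
    exceeds alpha; p_values maps t -> p_t (hypothetical interface)."""
    return {t for t, p in p_values.items() if p > alpha}

def point_estimate(p_values):
    """One natural point estimate: the candidate with the largest p-value."""
    return max(p_values, key=p_values.get)
```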

