OPTICS: Order-Preserved Test-Inverse Confidence Set for Number of Change-Points
Authors: Ao Sun (Data Sciences and Operations Department, University of Southern California) and Jingyuan Liu (Department of Statistics, Xiamen University)

March 31, 2026

Abstract

Determining the number of change-points is a first-step and fundamental task in change-point detection problems, as it lays the groundwork for subsequent change-point position estimation. While the existing literature offers various methods for consistently estimating the number of change-points, these methods typically yield a single point estimate without any assurance that it recovers the true number of changes in a specific dataset. Moreover, achieving consistency often hinges on very stringent conditions that can be challenging to verify in practice. To address these issues, we introduce a unified test-inverse procedure to construct a confidence set for the number of change-points. The proposed confidence set provides a set of possible values within which the true number of change-points is guaranteed to lie with a specified level of confidence. We further prove that the confidence set is sufficiently narrow to be powerful and informative by deriving the order of its cardinality. Remarkably, this confidence set can be established under more relaxed conditions than those required by most point estimation techniques. We also advocate multiple-splitting procedures to enhance stability, and we extend the proposed method to heavy-tailed and dependent settings. As a byproduct, we may also leverage the constructed confidence set to assess the effectiveness of point-estimation algorithms. Through extensive simulation studies, we demonstrate the superior performance of our confidence set approach. Additionally, we apply this method to analyze a bladder tumor microarray dataset.
Supplementary Material, including proofs of all theoretical results, computer code, the R package, and extended simulation studies, is available online.

Keywords: Change-point detection, Cross-validation, Order-preserved data splitting, Hypothesis testing, Confidence level

1 Introduction

1.1 Motivation and intuition

Estimating the number of change-points is a fundamental task in change-point detection problems, as a consistent estimation of the number usually leads to consistent estimation of change locations (Harchaoui and Lévy-Leduc, 2010; Wang et al., 2021). Therefore, consistency in change-point detection problems can typically be formulated as
$$\liminf_{n \to \infty} \Pr\{\hat{K} = K^*\} = 1, \tag{1.1}$$
where $\hat{K}$ is the estimated number of change-points and $K^*$ is the true number of change-points. Classical methods for obtaining a consistent $\hat{K}$ are mainly based on the Bayesian information criterion (BIC, Schwarz (1978)); see, for instance, Yao (1988); Bai (1998); Braun et al. (2000); Zhang and Siegmund (2012); Fryzlewicz (2014) and Cho and Fryzlewicz (2015). Additionally, some recent approaches, such as Padilla et al. (2021a); Wang et al. (2021); Padilla et al. (2021b), have embraced a hard-threshold approach. Both the BIC-based methods and the hard-threshold techniques require parameter tuning, and the consistency of their estimates relies heavily on these tuning parameters. To mitigate this dependence on tuning, Zou et al. (2020) introduced an Order-Preserved Sample-Splitting Procedure (COPSS) that selects the number of change-points by optimizing out-of-sample prediction and is thus tuning-free.

Nevertheless, these methods only provide point estimates for the number of change-points, without any guarantee that the true number $K^*$ can be recovered on a finite-sample basis. As a result, the consistency result may not be sufficient in practice.
To illustrate intuitively, we conduct a simple simulation for COPSS, using Binary Segmentation (BS, Fryzlewicz (2014)) as the change-position detector; the detailed model settings are provided in Section 5. According to COPSS, the true number $K^*$ can be correctly identified only when $K^*$ is the minimizer of the out-of-sample prediction error; that is, in ascending order of prediction error, $K^*$ should rank first among all the candidate numbers of change-points. The left panel of Figure 1 depicts the pie chart of such ranks for $K^*$ over 500 simulation runs. We observe that in only 55.8% of the runs does $K^*$ rank first and hence get successfully recovered. In brief, a single point estimate can be unreliable, and a misjudgment in the number of change-points could further result in erroneous estimation of change positions.

To this end, inspired by the concept of a confidence interval, a more prudent approach is to construct a confidence set, denoted as $\mathcal{A}$, for $K^*$ with a specified confidence level $1-\alpha$, such that
$$\liminf_{n \to \infty} \Pr\{K^* \in \mathcal{A}\} \geq 1 - \alpha.$$
This task appears challenging, since the construction of confidence sets for the number of change-points demands an investigation into the asymptotic behavior of a discrete random variable. However, the right panel of Figure 1 offers an insight into this problem. The figure depicts a histogram of the differences in prediction errors between the models fitted with the true number of change-points $K^*$ and those fitted with the minimizer $\hat{K}$. We observe that while the models with $K^*$ may not always lead to optimal out-of-sample prediction performance, they typically exhibit relatively small deviations from the optimum. In light of this, and motivated by Lei (2020), we address the challenge of constructing a confidence set by framing it as a testing problem.
In this testing framework, the null hypothesis posits that the selected number is indeed optimal from the perspective of out-of-sample prediction error. Note that in hypothesis testing, the null hypothesis is not rejected unless there is substantial evidence in favor of the alternative. Therefore, although the true number need not be the minimizer, it will likely not be rejected unless the observed difference is statistically significant. We then collect those numbers that are not rejected to form the confidence set $\mathcal{A}$. When the significance level is set as $\alpha$, we will show that the true number of change-points lies in the confidence set with probability at least $1-\alpha$.

Figure 1: Left panel: pie chart for the ranks of $K^*$, in ascending order of prediction error. Right panel: histogram of the difference in out-of-sample prediction errors between the models fitted with the true number of change-points $K^*$ and those fitted with the minimizer $\hat{K}$.

Apart from the desirable coverage probability, we also delve into the cardinality of the proposed confidence set, which reflects the power of the tests from the perspective of statistical inference, or the rate of false negatives in the context of model selection. Therefore, the theory of cardinality holds its own significance. We establish that, with overwhelming probability, the selected confidence set possesses nontrivial power with a bounded cardinality. Furthermore, the cardinality of the confidence set can serve as a metric for assessing the efficacy and stability of a change-point detection algorithm in the detection stage. When an inefficient algorithm is employed, it becomes challenging to distinguish the true number of change-points $K^*$ from other potential choices, resulting in the confidence set containing many candidates to achieve the desired coverage rate.
Conversely, with an efficient and stable detection algorithm, the true number $K^*$ often exhibits significantly superior performance compared to other choices. Consequently, the null hypothesis will only be retained when $K$ is closely aligned with $K^*$, leading to a small cardinality for the confidence set.

1.2 Our contributions

Meanwhile, some works in the change-point literature could potentially facilitate the construction of confidence intervals or sets for change locations. For instance, Yao and Au (1989) derived the asymptotic distribution of change locations in the context of one-dimensional mean change problems. Similarly, Bai and Perron (1998) investigated the asymptotic behavior of change locations in structural break models. However, the asymptotic distributions in these studies rely on unknown population-level parameters, such as the means and variances of the noises, as well as the true number of change-points. To the best of our knowledge, no research has delved into the asymptotic distribution of the number of change-points.

In light of this, we propose a test-based framework to circumvent the asymptotic distribution of the number of change-points. This framework was inspired by Lei (2020), which focused on linear models and constructed a confidence interval for the number of active predictors via cross-validation. However, it is crucial to recognize the substantial differences between linear models and change-point detection problems in both methodology and the entire theoretical foundation; even conventional cross-validation techniques are not applicable in the context of change-point detection. For instance, in change-point problems, both the mean values and variances of individual data points might be non-stationary. Hence, obtaining a consistent estimator of the covariance matrix becomes challenging.
This, in turn, causes failure of the Gaussian multiplier bootstrap procedure, which is crucial for constructing confidence intervals. Furthermore, Lei (2020) did not take into account the length of the proposed confidence intervals, which in turn impacts the power of the tests. Consequently, if we include an excessive number of candidate values, the resulting interval could become uninformative even if the coverage rate is guaranteed.

In this paper, we establish the confidence set for the number of change-points by incorporating order-preserved splitting and the multiplier bootstrap, and we systematically establish the theoretical properties of the proposed set in the change-point framework. Interestingly, the theory of the proposed confidence set can be built upon weaker conditions than those for the corresponding point estimates of the number of change-points. Additionally, beyond the scope of the coverage rate, we derive a sharp upper bound for the cardinality of the confidence set, which plays a crucial role in controlling the false negative rate and ensuring that the associated test has nontrivial power. The proposed OPTICS method is then extended to handle heavy-tailed and $m$-dependent settings. The effectiveness of these extensions is verified through simulations. As a byproduct, we provide an easy-to-use R package, OPTICS, available on GitHub¹.

1.3 Notations

We introduce the following notations used throughout this paper. For two sequences $a_n$ and $b_n$, $a_n \lesssim b_n$ ($a_n \gtrsim b_n$) means that, with probability approaching one, $a_n \leq c b_n$ ($b_n \leq c a_n$) for some $c > 0$ and sufficiently large $n$; $a_n \gg b_n$ stands for $a_n/b_n \to \infty$. $a := b$ represents that $a$ is defined as $b$. Given a vector $x = [x_1, \ldots, x_d]^\top \in \mathbb{R}^d$, its $\ell_2$-norm is defined as $\|x\|_2 = \big(\sum_{j=1}^d |x_j|^2\big)^{1/2}$, and its $\ell_\infty$-norm is $\|x\|_\infty = \max_{j=1,\ldots,d} |x_j|$.
For a random variable $X$, its Orlicz norm is $\|X\|_{\psi_\beta} = \inf\{C > 0 : \mathrm{E}[\psi_\beta(|X|/C)] \leq 1\}$, where $\psi_\beta(x) := \exp(x^\beta) - 1$ for $\beta = 1, 2$. Let $\mathcal{T}_K = \{\tau_1, \tau_2, \ldots, \tau_K\}$ be a set of $K$ change-points for a size-$m$ sequence $\{x_i, i = 1, 2, \ldots, m\}$, where $\tau_0 < \tau_1 < \ldots < \tau_K$. Denote $\bar{x}_{\tau_k, \tau_{k+1}}$ as the average of $\{x_i, i = \tau_k + 1, \tau_k + 2, \ldots, \tau_{k+1}\}$, and set $S_x^2(\mathcal{T}_K) = \sum_{k=0}^{K} \sum_{i=\tau_k+1}^{\tau_{k+1}} \|x_i - \bar{x}_{\tau_k, \tau_{k+1}}\|_2^2$. Moreover, let $\tilde{\mathcal{T}}_{\tilde{K}}$ be another set of $\tilde{K}$ change-points. Denote $S_x^2(\mathcal{T}_K \cup \tilde{\mathcal{T}}_{\tilde{K}}) = S_x^2(\mathrm{sort}(\mathcal{T}_K \cup \tilde{\mathcal{T}}_{\tilde{K}}))$, where $\mathrm{sort}(A)$ represents the set of elements of $A$ sorted in ascending order. For a matrix $X$, let $\mathrm{vech}(X)$ be the vectorization of the lower half of $X$; otherwise, if $x$ is a vector, we define $\mathrm{vech}(x) = x$. We denote $\lambda_{\min}(X)$ and $\lambda_{\max}(X)$ as the smallest and largest eigenvalues of the matrix $X$, respectively. For a set $\mathcal{A}$, $|\mathcal{A}|$ denotes its cardinality. $\mathbb{1}_A$ is the indicator function that takes value 1 when $A$ is true and 0 otherwise.

¹ https://github.com/suntiansheng/OPTICS

1.4 Organization of the paper

The rest of the paper is organized as follows. In Section 2, we propose an Order-Preserved Test-Inverse Confidence Set (OPTICS) for the number of change-points; the methodology, intuition, algorithm, and practical guidelines are provided. In Section 3, we systematically study the theoretical properties of OPTICS, including but not limited to the coverage rate and the asymptotic bound for the cardinality of OPTICS. The finite-sample performance of OPTICS is empirically verified through several simulation studies in Section 5. In Section 6, we construct the OPTICS for a bladder tumor microarray dataset. Section 7 concludes the paper. The Supplementary Material contains the proofs of the theoretical results, an additional literature review, and additional simulation and real-data results.
2 Methodology: OPTICS for the number of change-points

2.1 A general change-point model and its score transformation

Suppose a sequence of independent data observations $\mathcal{Z} = \{z_1, \ldots, z_{2n}\}$ is collected from the following multiple change-point model:
$$z_i \sim m(\cdot \mid \beta_k^*), \quad \tau_{k-1}^* < i \leq \tau_k^*, \quad k = 1, \ldots, K^* + 1; \quad i = 1, \ldots, 2n. \tag{2.1}$$
In model (2.1), the sample size is set to be $2n$ for later notational convenience. $K^*$ is the true number of change-points, which is allowed to vary with $n$. The $\tau_k^*$'s are the locations of the true change-points; by convention, set $\tau_0^* = 0$ and $\tau_{K^*+1}^* = 2n$, so that the change-points in $\mathcal{T}^* = \{\tau_1^*, \ldots, \tau_{K^*}^*\}$ partition the $2n$ sample points into $K^* + 1$ segments. $m(\cdot \mid \beta_k^*)$ represents a certain model structure for the $k$th segment, with a $d$-dimensional parameter vector $\beta_k^*$, where $\beta_k^* \neq \beta_{k+1}^*$. We assume $d$ is fixed throughout the manuscript. This model setting is quite general, covering mean changes (Hao et al., 2013), variance changes (Chen and Gupta, 1997), structural changes in regression models (Bai and Perron, 1998), covariance change-point models (Aue et al., 2009), and network change-point models (Wang et al., 2021; Fan et al., 2025); see more discussion in Section S.4 of the Supplementary Material.

Our primary goal in this paper is to construct a confidence set for $K^*$. Inspired by Zou et al. (2020), we embed the generic model (2.1) into a multiple mean change-point detection problem via a score-type transformation. Specifically, let $\ell(\beta; z_i)$ be a plausible loss function for data point $z_i$; the score function can then be defined as its derivative $s_\beta(z_i) = \mathrm{vech}(\partial \ell(\beta; z_i)/\partial \beta)$. Intuitively, if the score is identifiable, then for $i \in (\tau_{k-1}^*, \tau_k^*]$ and $i' \in (\tau_k^*, \tau_{k+1}^*]$, we should have $\mathrm{E}\{s_\gamma(z_i)\} = 0$ if and only if $\gamma = \beta_k^*$, and $\mathrm{E}\{s_\gamma(z_{i'})\} = 0$ if and only if $\gamma = \beta_{k+1}^*$.
Hence, given a fixed $d$-dimensional vector $\gamma$, typically $\mathrm{E}\{s_\gamma(z_i)\} \neq \mathrm{E}\{s_\gamma(z_{i'})\}$. This motivates us to decompose the score into
$$s_i := s_\gamma(z_i) = \mu_i + \epsilon_i, \quad i = 1, \ldots, 2n, \tag{2.2}$$
where $\mu_i = \mathrm{E}[s_\gamma(z_i)]$ and $\epsilon_i = s_\gamma(z_i) - \mathrm{E}[s_\gamma(z_i)]$. Denote the covariance matrix $\mathrm{Cov}(\epsilon_i) = \Sigma^{(k)}$ when $i \in (\tau_k^*, \tau_{k+1}^*]$. Note that the score-type transformation generally remains invariant regardless of the choice of $\gamma$. Hence, we can choose any $\gamma$, such as $\gamma = 0$ or $\gamma := \arg\min_\beta \sum_{z_i \in \mathcal{Z}} \ell(\beta; z_i)$.

2.2 Construction of OPTICS

In this subsection, we propose an order-preserved test-inverse confidence set (OPTICS) for the number of change-points. Let $\mathcal{M} = \{1, 2, \ldots, K_{\max}\}$ be a tentative candidate set for the number of change-points in model (2.1), where $K_{\max} > K^*$ is allowed to increase with the sample size $2n$; a commonly adopted convention is $K_{\max} = \log(n)$ (Zou et al., 2020). Our objective is to identify a small subset of $\mathcal{M}$ that covers the true number $K^*$ with a specified level of confidence.

To begin with, we introduce a criterion for evaluating the model-fitting capability of each $K \in \mathcal{M}$, utilizing an order-preserved data-splitting technique (Zou et al., 2020). Considering the intrinsic order structure of change-point problems, we divide the data into the following "odd sample" $\mathcal{Z}^O$ and "even sample" $\mathcal{Z}^E$ based on the parity of temporal order: $\mathcal{Z}^O = \{z_{2i-1}, i = 1, \ldots, n\}$ and $\mathcal{Z}^E = \{z_{2i}, i = 1, \ldots, n\}$. Given each $K \in \mathcal{M}$, a certain base change-point detection algorithm can be adopted to estimate the change-position set $\mathcal{T}_K = \{\tau_1^K, \ldots, \tau_K^K\}$ using the odd sample $\mathcal{Z}^O$.
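To fix ideas, the score transformation and the order-preserved split above can be sketched as follows. This is a minimal illustrative snippet in Python (the paper's own implementation is the R package OPTICS), specialized to the simplest case of a univariate mean-change model with squared-error loss $\ell(\beta; z) = (z - \beta)^2/2$, whose score is $s_\beta(z) = \beta - z$; the helper names `score_transform` and `order_preserved_split` are our own, not part of the package.

```python
import numpy as np

def score_transform(z, gamma=0.0):
    """Score for the univariate mean-change model with squared-error loss
    l(beta; z) = (z - beta)^2 / 2, whose derivative is s_beta(z) = beta - z.
    Any fixed gamma works; gamma = 0 is one choice suggested in Section 2.1."""
    return gamma - np.asarray(z, dtype=float)

def order_preserved_split(z):
    """Split z_1, ..., z_{2n} into the 'odd sample' Z^O = {z_1, z_3, ...}
    and the 'even sample' Z^E = {z_2, z_4, ...}, preserving temporal order
    within each half."""
    z = np.asarray(z)
    return z[0::2], z[1::2]

# Toy example: a mean shift halfway through the sequence.
rng = np.random.default_rng(0)
z = np.concatenate([rng.normal(0, 1, 50), rng.normal(3, 1, 50)])
s = score_transform(z)                 # scores s_1, ..., s_{2n}
s_odd, s_even = order_preserved_split(s)
```

A base detector (e.g., Binary Segmentation) would then be run on `s_odd` for each candidate $K$, with `s_even` reserved for validation.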
Then a natural criterion can be defined, based on the even sample $\mathcal{Z}^E$, as
$$C'(\mathcal{T}_K; \mathcal{Z}^E) := \frac{1}{n} S_{s^E}^2(\mathcal{T}_K) = \frac{1}{n} \sum_{k=0}^{K} \sum_{i=\tau_k^K+1}^{\tau_{k+1}^K} \|s_i^E - \bar{s}^E_{\tau_k^K, \tau_{k+1}^K}\|_2^2,$$
where $s_i^E$ is the score calculated using $\mathcal{Z}^E$ and $\bar{s}^E_{\tau_k^K, \tau_{k+1}^K}$ is the average of $\{s_i^E, i \in [\tau_k^K + 1, \tau_{k+1}^K]\}$. However, $C'(\mathcal{T}_K; \mathcal{Z}^E)$ gauges the predictive performance within the even sample $\mathcal{Z}^E$ itself, wherein over-fitting always appears advantageous. In addition, the term $\bar{s}^E_{\tau_k^K, \tau_{k+1}^K}$ in $C'(\mathcal{T}_K; \mathcal{Z}^E)$ renders $\|s_i^E - \bar{s}^E_{\tau_k^K, \tau_{k+1}^K}\|_2^2$ and $\|s_j^E - \bar{s}^E_{\tau_k^K, \tau_{k+1}^K}\|_2^2$ dependent for those $i$ and $j$ in the same sub-interval. Hence, we replace $\bar{s}^E_{\tau_k^K, \tau_{k+1}^K}$ with its counterpart computed from $\mathcal{Z}^O$, denoted by $\bar{s}^O_{\tau_k^K, \tau_{k+1}^K}$, and refine the above criterion as
$$C(\mathcal{T}_K; \mathcal{Z}^E) := \frac{1}{n} \sum_{k=0}^{K} \sum_{i=\tau_k^K+1}^{\tau_{k+1}^K} \|s_i^E - \bar{s}^O_{\tau_k^K, \tau_{k+1}^K}\|_2^2. \tag{2.3}$$
Further, let $\bar{s}^O_{K,i} = \sum_{k=0}^{K} \mathbb{1}\{\tau_k^K + 1 \leq i \leq \tau_{k+1}^K\}\, \bar{s}^O_{\tau_k^K, \tau_{k+1}^K}$; then (2.3) can be rewritten as
$$C(\mathcal{T}_K; \mathcal{Z}^E) = n^{-1} \sum_{i=1}^{n} \|s_i^E - \bar{s}^O_{K,i}\|_2^2.$$
As discussed in Section 1, for the true number of change-points $K^*$, while $C(\mathcal{T}_{K^*}; \mathcal{Z}^E)$ might not be the minimum among all candidate models in $\mathcal{M}$, it is often reasonably close to the minimum. Formally, this motivates us to establish the following hypotheses for each candidate $K \in \mathcal{M}$:
$$H_{0,K}: \mathrm{E}[C(\mathcal{T}_K; \mathcal{Z}^E)] \text{ is the minimum in } \mathcal{M} \quad \text{v.s.} \quad H_{1,K}: \mathrm{E}[C(\mathcal{T}_K; \mathcal{Z}^E)] \text{ is not the minimum in } \mathcal{M}. \tag{2.4}$$
Following the philosophy of hypothesis testing, we do not reject $H_{0,K}$ unless $C(\mathcal{T}_K; \mathcal{Z}^E)$ significantly departs from the minimum. Then $H_{0,K^*}$, the null hypothesis corresponding to the true number $K^*$, is expected not to be rejected with overwhelming probability. Therefore, the confidence set $\mathcal{A}$ can be naturally defined as the collection of $K \in \mathcal{M}$ whose $H_{0,K}$'s are not rejected.
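The refined criterion (2.3) can be sketched as below, again as an illustrative Python snippet rather than the authors' implementation: segment averages are computed from the odd scores and the squared prediction error is accumulated over the even scores. The helper name `criterion` is hypothetical; its inputs are the two length-$n$ score sequences and the estimated change-positions $\tau^K_1 < \ldots < \tau^K_K$, with the boundary conventions $\tau^K_0 = 0$ and $\tau^K_{K+1} = n$ added internally.

```python
import numpy as np

def criterion(s_odd, s_even, taus):
    """Order-preserved cross-validation criterion C(T_K; Z^E) of (2.3):
    segment means come from the odd scores, prediction error is evaluated
    on the even scores.  `taus` holds the K estimated change-positions."""
    s_odd = np.asarray(s_odd, dtype=float)
    s_even = np.asarray(s_even, dtype=float)
    if s_odd.ndim == 1:                       # allow univariate scores
        s_odd, s_even = s_odd[:, None], s_even[:, None]
    n = s_odd.shape[0]
    bounds = [0, *sorted(taus), n]            # tau_0 = 0, tau_{K+1} = n
    total = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        seg_mean = s_odd[lo:hi].mean(axis=0)  # \bar s^O over (tau_k, tau_{k+1}]
        total += ((s_even[lo:hi] - seg_mean) ** 2).sum()
    return total / n
```

For instance, with odd and even scores both equal to `[0, 0, 2, 2]`, the single change-position `taus=[2]` yields a criterion of 0, while the no-change model `taus=[]` pays for the unmodeled mean shift.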
In other words, let $p_K$ be the associated $p$-value for the testing problem (2.4); then, for a predetermined significance level $\alpha$,
$$\mathcal{A} := \{K \in \mathcal{M} : p_K > \alpha\}.$$
By this means, the members of $\mathcal{A}$ are statistically equivalent with respect to the criterion $C(\cdot; \mathcal{Z}^E)$; that is, all the numbers of change-points in this set are highly competitive. Indeed, as will be seen in Section 3, the selected set $\mathcal{A}$ covers the true number $K^*$ with probability at least $1 - \alpha$:
$$\Pr\{K^* \in \mathcal{A}\} = \Pr\{p_{K^*} > \alpha\} \geq 1 - \alpha.$$
The remaining task is to obtain the $p$-value $p_K$ associated with the testing problem (2.4) for each $K \in \mathcal{M}$. To accomplish this, for any $J, K \in \mathcal{M}$, we further define
$$\delta_{K,J} = \mathrm{E}[C(\mathcal{T}_K; \mathcal{Z}^E) - C(\mathcal{T}_J; \mathcal{Z}^E)].$$
Hence, the hypotheses in (2.4) are equivalent to
$$H_{0,K}: \max_{J \in \mathcal{M}, J \neq K} \delta_{K,J} \leq 0 \quad \text{v.s.} \quad H_{1,K}: \max_{J \in \mathcal{M}, J \neq K} \delta_{K,J} > 0. \tag{2.5}$$
One possible point estimate of $\delta_{K,J}$ in (2.5) is
$$\hat{\delta}_{K,J} = \frac{1}{n} \sum_{i=1}^{n} \left( \|s_i^E - \bar{s}^O_{K,i}\|_2^2 - \|s_i^E - \bar{s}^O_{J,i}\|_2^2 \right) := \frac{1}{n} \sum_{i=1}^{n} \xi^{(i)}_{K,J}, \tag{2.6}$$
where $\xi^{(i)}_{K,J} := \|s_i^E - \bar{s}^O_{K,i}\|_2^2 - \|s_i^E - \bar{s}^O_{J,i}\|_2^2$, $i = 1, \ldots, n$, are independent random variables given the odd sample $\mathcal{Z}^O$. Naturally, we can then take
$$\max_{J \in \mathcal{M}, J \neq K} \hat{\delta}_{K,J} = \max_{J \in \mathcal{M}, J \neq K} \frac{1}{n} \sum_{i=1}^{n} \xi^{(i)}_{K,J},$$
and the test statistic can be set as its studentized version:
$$T_K = \max_{J \neq K} \frac{\sqrt{n}\, \hat{\delta}_{K,J}}{\hat{\sigma}_{K,J}} = \max_{J \neq K} \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{\xi^{(i)}_{K,J}}{\hat{\sigma}_{K,J}}, \tag{2.7}$$
where $\hat{\sigma}^2_{K,J} = n^{-1} \sum_{i=1}^{n} (\xi^{(i)}_{K,J})^2$ is the estimated second moment. Next, we calculate the $p$-values corresponding to $T_K$ using a Gaussian comparison and bootstrap method analogous to Chernozhukov et al. (2013, 2017). To be specific, we first generate independent standard Gaussian random variables $\zeta_i$, $i = 1, \ldots, n$. Then, for $b = 1, \ldots, B$, where $B$ is the total number of bootstrap runs, we define the $b$th bootstrap statistic as
$$T^\sharp_{K,b} = \max_{J \neq K} \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{\xi^{(i)}_{K,J}}{\hat{\sigma}_{K,J}} \zeta_i. \tag{2.8}$$
Then the $p$-value is naturally set to be $\hat{p}_K = B^{-1} \sum_{b=1}^{B} \mathbb{1}(T^\sharp_{K,b} > T_K)$. In (2.8), a non-centered bootstrap statistic is used. The reason is that, as the random variables $\{\xi^{(i)}_{K,J}, i = 1, \ldots, n\}$ may have varying means and variances, it is intricate to verify that the sample variance converges to the population variance under such a fluctuating scenario. However, the usual law of large numbers still implies that the sample second moment approximates its population counterpart. Notably, since the second moment serves as an upper bound on the variance, the non-centered bootstrap statistic tends to be slightly conservative, while still controlling the type-I error.

In sum, the confidence set $\mathcal{A}$ for the true number of change-points $K^*$ is constructed upon a test-based method with the order-preserved sample-splitting technique. Thus, we name the confidence set the Order-Preserved Test-Inverse Confidence Set (OPTICS). The entire procedure for obtaining OPTICS is summarized in the following steps.

1. (Initialization). Given a proper $\gamma$ in (2.2), calculate the score functions $s_i$ for $i = 1, \ldots, 2n$.

2. For each given candidate number of change-points $K \in \mathcal{M}$:

   2.1 (Training). Obtain the estimated change-position set $\mathcal{T}_K$ based on the odd sample $\mathcal{Z}^O$. Compute the piecewise averages $\bar{s}^O_{\tau_k^K, \tau_{k+1}^K}$ in (2.3), or equivalently, $\bar{s}^O_{K,i}$.

   2.2 (Validation). For $i = 1, \ldots, n$ and $J \neq K$, compute $\xi^{(i)}_{K,J}$ in (2.6) using the even sample $\mathcal{Z}^E$. Further obtain the test statistic $T_K$ in (2.7).

   2.3 (Bootstrapping). For $b = 1, \ldots, B$, calculate the Gaussian multiplier bootstrap statistic $T^\sharp_{K,b}$ in (2.8), as well as the associated $p$-value $\hat{p}_K$.

3. (OPTICS). Given a significance level $\alpha$, the OPTICS is taken to be
$$\mathcal{A} = \{K \in \mathcal{M} : \hat{p}_K > \alpha\}. \tag{2.9}$$
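Steps 2.2 and 2.3, i.e., the studentized statistic (2.7) and the Gaussian multiplier bootstrap (2.8), can be sketched as follows. This is an illustrative Python translation, not the R package's API: it assumes the matrix `xi` already holds the quantities $\xi^{(i)}_{K,J}$ for all competitors $J \neq K$, and the name `optics_pvalue` is hypothetical.

```python
import numpy as np

def optics_pvalue(xi, B=500, rng=None):
    """Studentized statistic (2.7) and Gaussian multiplier bootstrap (2.8).
    `xi` is an (n, q) array whose q columns hold xi^{(i)}_{K,J} for the
    competitors J != K; returns (T_K, p-hat_K)."""
    rng = np.random.default_rng(rng)
    n = xi.shape[0]
    sigma = np.sqrt((xi ** 2).mean(axis=0))               # second-moment scale per J
    t_k = (xi.sum(axis=0) / (np.sqrt(n) * sigma)).max()   # T_K in (2.7)
    zeta = rng.standard_normal((B, n))                    # Gaussian multipliers
    boot = (zeta @ (xi / sigma)).max(axis=1) / np.sqrt(n) # T^#_{K,b} in (2.8)
    return t_k, float((boot > t_k).mean())                # p-hat_K
```

Under the approximate null, the returned $\hat{p}_K$ behaves like an ordinary $p$-value, while a candidate $K$ with clearly inferior prediction error (a strongly positive column mean in `xi`) yields a $p$-value near zero and is excluded from $\mathcal{A}$.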
2.3 Practical guidelines for OPTICS

2.3.1 Choice of loss function and computational complexity

The choice of the loss function and the change-point detection method is crucial in change-point analysis. Table S.1 in the Supplementary Material provides a comprehensive overview of the settings for mean, variance, regression-coefficient, nonparametric, covariance, and network change-point models (Fan et al., 2025).

The overall computational complexity of OPTICS depends on the change-point detection method and the bootstrap testing procedure. The complexity of change-point detection varies with the model and the specific detection method used (see Truong et al. (2020) for details). For the bootstrap testing, if we treat basic mathematical operations such as addition, subtraction, and multiplication as $O(1)$, each computation of (2.6) has a complexity of $O(n d_p)$, where $d_p$ is the dimension of $s_1$. Completing the entire testing procedure requires $O(B K_{\max} n d_p)$ operations, where $B$ is the number of bootstrap samples and $K_{\max}$ is the maximum number of change-points. However, by parallelizing the bootstrap procedure, the computational burden can be significantly reduced.

2.3.2 Reducing the set to a single number of change-points

OPTICS produces a set $\mathcal{A}$ of numbers of change-points. It is able to guarantee the predetermined confidence level and is thus typically more informative in practice. However, in some circumstances, a single estimated number is still desirable, especially when the ultimate goal is to estimate the change-positions. One suggestion is to adopt the rightmost number $\bar{K}$ in $\mathcal{A}$, as it is guaranteed to be at least the true number $K^*$ with probability $1 - \alpha$:
$$\Pr\{\bar{K} \geq K^*\} \geq \Pr\{K^* \in \mathcal{A}\} \geq 1 - \alpha.$$
$\bar{K}$ tends to be a slight overestimate of $K^*$.
Additionally, the leftmost number $\underline{K}$ could also be chosen if the primary goal is to control the family-wise error rate (FWER), as
$$\mathrm{FWER} = \Pr\{\underline{K} > K^*\} \leq \Pr\{K^* \notin \mathcal{A}\} \leq \alpha.$$
Another suggestion is a post-hoc strategy, which combines the data-driven OPTICS with domain knowledge; that is, pick the member of $\mathcal{A}$ with the most compelling scientific or industrial interpretation. For instance, in time-series change-point detection, if certain specific positions are known to be true changes, it is advised to adopt the member of $\mathcal{A}$ with which these change-positions can be successfully detected. This strategy allows us to make informative and interpretable decisions, as well as to precisely estimate change-positions.

We emphasize that OPTICS returns a set of change-point numbers that are no worse than any other number of change-points under the criterion $C$ defined in (2.3). If OPTICS were to return an empty set, it would imply that for any given number of change-points, there exists another number of change-points that significantly outperforms the candidate, leading to a contradiction. Therefore, OPTICS always returns at least one change-point number.

2.3.3 Multiple-splitting OPTICS for finite-sample stability

In practice, the resulting confidence set may be sensitive to the particular split used in the OPTICS procedure, especially when the sample size is moderate or the signal is weak. To enhance finite-sample stability, we can apply OPTICS with multiple order-preserving splits and then combine the resulting evidence.

To be specific, let $L \geq 2$ be a fixed integer, and suppose for simplicity that $n = Lm$ for some integer $m$. For each $r = 1, \ldots, L$, define the $r$th order-preserving subsample by $\mathcal{Z}^{(r)} = \{z_{r + Lj} : j = 0, \ldots, m - 1\}$. Under independent observations, each $\mathcal{Z}^{(r)}$ remains an independent sample and preserves the original temporal order.
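The extraction of the $L$ order-preserving subsamples can be sketched as below (illustrative Python; `order_preserving_subsamples` is a hypothetical helper name, and the snippet assumes, as in the text, that the sample size is an exact multiple of $L$).

```python
import numpy as np

def order_preserving_subsamples(z, L):
    """Form the L order-preserving subsamples of Subsection 2.3.3:
    Z^(r) = {z_{r + Lj} : j = 0, ..., m-1} for r = 1, ..., L (1-based),
    each of which keeps the original temporal order and, under
    independence, is itself an independent sample."""
    z = np.asarray(z)
    m = len(z) // L                      # assumes len(z) is a multiple of L
    return [z[(r - 1): L * m: L] for r in range(1, L + 1)]
```

Each subsample can then be split into its own odd and even halves and fed to the ordinary OPTICS procedure, with the per-split $p$-values combined afterwards.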
We further split each subsample $\mathcal{Z}^{(r)}$ into an "odd sample" $\mathcal{Z}^{(r)}_O$ and an "even sample" $\mathcal{Z}^{(r)}_E$. For each split $r$, we apply the ordinary OPTICS to $(\mathcal{Z}^{(r)}_O, \mathcal{Z}^{(r)}_E)$. Specifically, for each candidate $K \in \mathcal{M}$, we use $\mathcal{Z}^{(r)}_O$ to fit the $K$-change-point model and $\mathcal{Z}^{(r)}_E$ to evaluate its out-of-sample performance. Let $\bar{s}^{(r,O)}_{K,i}$ denote the analogue of $\bar{s}^O_{K,i}$ constructed from $\mathcal{Z}^{(r)}_O$, and let $s^{(r,E)}_i$ denote the score vector from $\mathcal{Z}^{(r)}_E$. For $J \in \mathcal{M} \setminus \{K\}$, define
$$\hat{\delta}^{(r)}_{K,J} = \frac{1}{n_r} \sum_{i=1}^{n_r} \left( \|s^{(r,E)}_i - \bar{s}^{(r,O)}_{K,i}\|_2^2 - \|s^{(r,E)}_i - \bar{s}^{(r,O)}_{J,i}\|_2^2 \right),$$
where $n_r := |\mathcal{Z}^{(r)}_E|$. Based on $\hat{\delta}^{(r)}_{K,J}$, we compute the split-specific OPTICS test statistic and the corresponding $p$-value $\hat{p}^{(r)}_K$ exactly as in the original procedure.

To aggregate the evidence across the different order-preserving splits, we adopt the Cauchy combination method (Liu and Xie, 2020). For each $K \in \mathcal{M}$, define
$$T_K = \sum_{r=1}^{L} \omega_r \tan\left\{ \left(0.5 - \hat{p}^{(r)}_K\right) \pi \right\},$$
where the weights $\omega_r$ are nonnegative and satisfy $\sum_{r=1}^{L} \omega_r = 1$. The corresponding combined $p$-value is
$$\hat{p}^{\mathrm{MS}}_K = \frac{1}{2} - \frac{\arctan T_K}{\pi}.$$
We then define the multiple-splitting OPTICS (MS-OPTICS) confidence set by
$$\mathcal{A}^{\mathrm{MS}} = \{K \in \mathcal{M} : \hat{p}^{\mathrm{MS}}_K > \alpha\}.$$
The multiple-splitting version has the same interpretation as the original OPTICS but is typically less sensitive to the choice of a particular sample split (see Subsection S.5.7 in the Supplementary Material for a comparison). It therefore provides a simple and practical refinement when greater numerical stability is desired in moderate samples. In our view, the original single-splitting OPTICS remains the default choice because of its simpler presentation and lower computational cost, whereas the multiple-splitting version is best viewed as a practical enhancement.

3 Theory of OPTICS

In this section, we systematically study the theoretical properties of OPTICS.
We first prove that the Gaussian multiplier bootstrap procedure indeed produces valid $p$-values under the desired null space. The second subsection discusses the coverage rate of OPTICS. The theoretical rate of the cardinality of OPTICS, which corresponds to the power of the test, is provided in the third subsection.

3.1 Validity of the bootstrap procedure

To study the theoretical guarantee of the Gaussian multiplier bootstrap procedure in our problem setting, we first impose the following technical assumptions.

Condition 3.1 (Tails and moments). For $i = 1, \ldots, 2n$, suppose there exist some positive constants $M_1$ and $M_2$ such that
(i) for any constant vector $b$, $\|b^\top s_i\|_{\psi_1} \leq M_1 \|b\|_2$;
(ii) for any $j = 1, \ldots, d$, we have $\frac{1}{n}\sum_{i=1}^{n} \mathrm{E}[|s_{ij}|^3] \leq M_1$ and $\frac{1}{n}\sum_{i=1}^{n} \mathrm{E}[|s_{ij}|^4] \leq M_1^2$, where $s_{ij}$ is the $j$th element of $s_i$;
(iii) $\lambda_{\min}(\mathrm{Var}[s_i]) \geq M_2$ and $\lambda_{\max}(\mathrm{E}[s_i s_i^\top]) \leq M_1$, where $\mathrm{Var}(s_i)$ is the covariance matrix of $s_i$.

Condition 3.2 (Distance between change-positions). For any two distinct change-positions $\tau_k$ and $\tau_{k'}$ in $\mathcal{T}_K$, $|\tau_k - \tau_{k'}| \gtrsim n/\log(n)$.

Condition 3.1 provides a sub-exponential tail bound for the data, together with moment conditions. Similar conditions can be found elsewhere in the change-point literature; see Liu et al. (2020) and Yu and Chen (2021) for instances. Condition 3.2 requires a sufficient distance between any two change-points in $\mathcal{T}_K$, since distinguishing two change-points becomes challenging if they are too close. This condition is fairly mild when the candidate model has $K \ll n$ (Chen et al., 2023). This condition is also imposed in several prominent works, such as Wild Binary Segmentation (WBS), see Assumption 3.2 in Fryzlewicz (2014), and, more recently, the Narrowest-Over-Threshold (NOT) method, see Theorem 1 in Baranowski et al. (2019).

Theorem 3.1. Suppose Model (2.1) holds.
Let $\mathcal{F}^O$ denote the $\sigma$-field generated by the observed sample used to construct $\{\bar{s}^O_{K,i} : K \in \mathcal{M}, 1 \leq i \leq n\}$. For $J \in \mathcal{M} \setminus \{K\}$, define $\delta_{K,J} = \frac{1}{n}\sum_{i=1}^{n} \mathrm{E}\big[\xi^{(i)}_{K,J} \mid \mathcal{F}^O\big]$ and $\sigma^2_{K,J} = \frac{1}{n}\sum_{i=1}^{n} \mathrm{E}\big[(\xi^{(i)}_{K,J})^2 \mid \mathcal{F}^O\big]$. Assume $K_{\max} \asymp \log(n)$ and Conditions 3.1 and 3.2 hold. In addition, assume there exists a constant $c_0 > 0$ such that $\min_{J \in \mathcal{M} \setminus \{K\}} \sigma^2_{K,J} \geq c_0$ with probability tending to one. Then:

(1) If $\max_{J \neq K} \delta_{K,J}/\sigma_{K,J} \leq x_n (n \log(n))^{-1/2}$ for some $x_n = o(1)$, then as $n \to \infty$,
$$\Pr\{H_{0,K} \text{ is not rejected at level } \alpha\} \geq 1 - \alpha + o(1).$$

(2) If $\alpha \geq n^{-1}$ and $\max_{J \neq K} \delta_{K,J}/\sigma_{K,J} \geq c\, n^{-1/2} \log(n)$ for a sufficiently large constant $c > 0$, then as $n \to \infty$,
$$\Pr\{H_{0,K} \text{ is not rejected at level } \alpha\} = o(1).$$

Theorem 3.1 (1) depicts an "approximate null" space, i.e., $\max_{J \neq K} \delta_{K,J}/\sigma_{K,J} \leq x_n (n \log(n))^{-1/2}$, in which we do not reject $H_{0,K}$; it implies that the $p$-value obtained from the bootstrap preserves the type-I error under this approximate null. Meanwhile, (2) asserts that OPTICS successfully rules out those candidate models with inferior predictive power.

A direct implication of Theorem 3.1 is that when COPSS in Zou et al. (2020) is consistent, OPTICS also covers the true number of change-points $K^*$ with confidence level $1 - \alpha$ asymptotically. To see this intuitively, consider the special scenario in which $K^*$ is indeed the minimizer among all the candidate models in $\mathcal{M}$. In this case, COPSS is consistent, since it selects the minimizer as the estimated number of change-points. On the other hand, we have $\max_{J \neq K^*} \delta_{K^*,J}/\sigma_{K^*,J} \leq 0$; thus Theorem 3.1 (1) indicates that $\Pr\{K^* \in \mathcal{A}\} \geq 1 - \alpha + o(1)$. Nevertheless, as demonstrated in the next section, OPTICS preserves the confidence (coverage rate) under conditions that are less stringent than those for COPSS.
3.2 Coverage rate of OPTICS

In this section, we systematically study the coverage rate of OPTICS. Let $\underline{\lambda} = \min_{1 \leq k \leq K^*}(\tau^*_{k+1} - \tau^*_k)$ and $\bar{\lambda} = \max_{1 \leq k \leq K^*}(\tau^*_{k+1} - \tau^*_k)$ be the minimum and maximum distances between adjacent true change-positions, respectively. For $k = 1, \ldots, K^*$, denote $\Delta_k := \|\mu_{k+1} - \mu_k\|_2$ as the jump size of the $k$th change in Model (2.2), and $\Delta_{(k)}$ as the corresponding $k$th order statistic. Without loss of generality, assume the true number of change-points $K^*$ belongs to the candidate set $\mathcal{M}$, i.e., $K_{\max} \geq K^*$. Note that $\mathcal{M}$ can always be chosen conservatively to include $K^*$. Let $\mathcal{M}_l = \{K \in \mathcal{M} : K < K^*\}$ be the lack-of-fit set and $\mathcal{M}_o = \{K \in \mathcal{M} : K > K^*\}$ be the over-fit set. Hence, the candidate set $\mathcal{M}$ is naturally partitioned into $\mathcal{M} = \mathcal{M}_o \cup \mathcal{M}_l \cup \{K^*\}$. We impose the following conditions before introducing the theoretical results for the coverage rate of OPTICS.

Condition 3.3 (Number of change-points). (i) $K^* = o(\underline{\lambda})$ and $K^*(\log(K^* \vee e))^2 = o(\log\log(\bar{\lambda}))$ for Euler's number $e$; (ii) $(K_{\max} \log K_{\max})^{1/2} = o(\log\log \bar{\lambda})$.

Condition 3.4 (Accuracy of estimation). (i) (Over-fit) For any $K \in \mathcal{M}_o$, denote the corresponding estimated change-position set as $\mathcal{T}^o_K = \{\tau^{Ko}_1, \ldots, \tau^{Ko}_K\}$. There exists a subset $\{\tau^{Ko}_{k_s}, s = 1, \ldots, K^*\} \subset \mathcal{T}^o_K$ such that
$$\Pr\left\{\forall \tau^*_k \in \mathcal{T}^*, \; |\tau^{Ko}_{k_s} - \tau^*_k| \leq b_n\right\} \to 1,$$
where $b_n > 0$ satisfies $K^* \log\log(b_n \vee e) = o(\log\log(\bar{\lambda}))$ and $\sum_{k=1}^{K^*} b_n \Delta_k^2 = o(M_1 \log\log(\bar{\lambda}))$.
(ii) (Lack-of-fit) For any $K \in \mathcal{M}_l$, denote the corresponding estimated change-position set as $\mathcal{T}^l_K = \{\tau^{Kl}_1, \ldots, \tau^{Kl}_K\}$. There exists a subset of true change-positions $\mathcal{I}^*_{lK} \subset \mathcal{T}^*$ such that for any $\tau^*_k \in \mathcal{I}^*_{lK}$, no estimated change-point lies within the interval $\{\tau^*_k - \underline{\lambda}/2 + 1, \ldots, \tau^*_k + \underline{\lambda}/2\}$.
Furthermore, denote $\bar{\mu}_{\tau^{Kl}_k, \tau^{Kl}_{k+1}}$ as the average of $\{\mu_i, i = \tau^{Kl}_k + 1, \tau^{Kl}_k + 2, \ldots, \tau^{Kl}_{k+1}\}$. Then for some constant $M_3 > 0$,
$$\Pr\left\{\forall \tau^*_k \in \mathcal{I}^*_{lK}, \; \sum_{i = \tau^*_k - \underline{\lambda}/4 + 1}^{\tau^*_k + \underline{\lambda}/4} \|\mu_i - \bar{\mu}_{K,i}\|_2^2 \geq M_3 \underline{\lambda}\, \Delta_k^2 \right\} \to 1, \quad (3.1)$$
with $\bar{\mu}_{K,i} := \sum_{k=0}^K \mathbb{1}\{\tau^{Kl}_k + 1 \leq i \leq \tau^{Kl}_{k+1}\}\, \bar{\mu}_{\tau^{Kl}_k, \tau^{Kl}_{k+1}}$.

Condition 3.5 (Minimum signal). Assume the minimum jump size $\Delta_{(1)}$ satisfies
$$\frac{\underline{\lambda}\, \Delta_{(1)}^2}{K^* M_1 (\log \bar{\lambda})^2} \to \infty,$$
where $M_1$ is defined in Condition 3.1.

Condition 3.3 imposes standard assumptions on the number of change-points. Condition 3.4 (i) states that under the over-fitting setting where $K > K^*$, for each true change $\tau^*_k \in \mathcal{T}^*$, there must exist an estimated change-point lying within the $b_n$-neighborhood of $\tau^*_k$ asymptotically. This estimation accuracy ensures the reliability of the order-preserved cross-validation criterion. Condition 3.4 (ii) claims that for an under-fitted model with $K < K^*$, some undetected true change-positions have to be isolated from all the estimated ones by length $\underline{\lambda}/2$, and the estimation errors of mean scores due to the undetected change-positions are non-negligible. Condition 3.5 bounds the minimum jump size of mean scores across true adjacent intervals from below. Conditions 3.4 and 3.5 are imposed to guarantee the consistent selection of change-points; see the discussion in Pein and Shah (2025).

Theorem 3.2. Suppose Conditions 3.1–3.5 hold. Assume that the maximum discrepancy satisfies $\min_{K \neq K^*} \max_{i=1,\ldots,n} \|\bar{s}^O_{K^*,i} - \bar{s}^O_{K,i}\|_2^2 \gtrsim \Delta^2_{(K^*)}$. If the maximum jump size $\Delta_{(K^*)}$ satisfies
$$\Delta^2_{(K^*)} \gtrsim \frac{x_n^{-2} (\log\log \bar{\lambda})^2 \log^2(n)}{n}, \quad (3.2)$$
then $\Pr\{K^* \in \mathcal{A}\} \geq 1 - \alpha + o(1)$.

The inequality (3.2) in Theorem 3.2 states that the true number of change-points $K^*$ indeed lies within the "approximate null" space. Therefore, incorporating Theorem 3.1, the coverage rate of $\mathcal{A}$ can be guaranteed.
The conditions for Theorem 3.2 relax those in Zou et al. (2020) and Pein and Shah (2025). The latter two papers both additionally impose a divergent lower bound for the "over-fit" effect; to be specific, they require, for some $c_n \to \infty$,
$$\mathcal{S}_{\epsilon^O}(\mathcal{T}^*) - \max_{K \in \mathcal{M}_o} \mathcal{S}_{\epsilon^O}(\mathcal{T}^* \cup \mathcal{T}_K) \geq c_n, \quad (3.3)$$
where $\epsilon^O = \{\epsilon_{2i-1}, i = 1, \ldots, n\}$ represents the collection of individual errors from the given odd sample, with $\epsilon_i$ defined in Model (2.2). We will explore the explicit rate of $c_n$ in Section 3.3 to enhance the power of the method. As shown in Section 5, the divergence of $c_n$ in (3.3) can be fairly stringent in many practical scenarios, potentially leading to the failure of consistency for the respective estimations.

3.3 Cardinality of OPTICS

Theorem 3.2 guarantees that OPTICS $\mathcal{A}$ covers the true number of change-points $K^*$ at the nominal confidence level asymptotically. However, it is always possible to select a sufficiently conservative $\mathcal{A}$ that encompasses $K^*$, such as the trivial set $\mathcal{M}$. In this instance, OPTICS is non-informative and has no power. Therefore, in this subsection, we examine the power of OPTICS by analyzing its cardinality. We first define the following two sets to depict the under-fit and over-fit signal-to-noise ratios, respectively.

Definition 3.1 (Signal-to-noise ratio). (i) In a lack-of-fit model $K \in \mathcal{M}_l$, consider the undetected change-position set $\mathcal{I}^*_{lK}$ in Condition 3.4. For a sufficiently large $n$, define
$$\mathcal{B}_{1n} := \left\{K \in \mathcal{M}_l : \sum_{\tau^*_k \in \mathcal{I}^*_{lK}} \Delta_k^2 \gtrsim M_1 \Delta^2_{(K^*)} \sqrt{n\log(n)}/\underline{\lambda} \right\},$$
where $\Delta_{(K^*)}$ is the maximum jump size between adjacent changes, as defined in Section 3.2.
(ii) In an over-fit model $K \in \mathcal{M}_o$, for a sufficiently large $n$, define
$$\mathcal{B}_{2n} := \left\{K \in \mathcal{M}_o : E\{\mathcal{S}_{s^E}(\mathcal{T}_K) - \mathcal{S}_{s^E}(\mathcal{T}_K \cup \mathcal{T}^*)\} + \{\mathcal{S}_{\epsilon^O}(\mathcal{T}^*) - \mathcal{S}_{\epsilon^O}(\mathcal{T}_K \cup \mathcal{T}^*)\} \gtrsim M_1 \Delta^2_{(K^*)} \sqrt{n\log(n)} \right\}.$$
Note that set $\mathcal{B}_{1n}$ in Definition 3.1 (i) collects a subset of lack-of-fit models whose jump sizes of undetected changes are bounded below. Set $\mathcal{B}_{2n}$ consists of over-fitted models whose in-sample over-fitting effects are sufficiently large. Based on $\mathcal{B}_{1n}$ and $\mathcal{B}_{2n}$, the following theorem provides the asymptotic upper bounds for the cardinality of OPTICS.

Theorem 3.3. Suppose Model (2.1) holds. Under Conditions 3.1–3.5, and assuming the maximum discrepancy satisfies $\max_{i=1,\ldots,n} \|\bar{s}^O_{K,i} - \bar{s}^O_{J,i}\|_2^2 \lesssim \Delta^2_{(K^*)}$ for all pairs $K \neq J$ with asymptotic probability 1, we have as $n \to \infty$:
$$\Pr\{|\mathcal{A}| \leq K_{\max} - |\mathcal{B}_{1n}| - |\mathcal{B}_{2n}|\} \geq 1 - o(1), \quad (3.4)$$
and
$$\Pr\{K^* \in \mathcal{A}, \; |\mathcal{A}| \leq K_{\max} - |\mathcal{B}_{1n}| - |\mathcal{B}_{2n}|\} \geq 1 - \alpha + o(1). \quad (3.5)$$

Theorem 3.3 confines the cardinality of the confidence set $\mathcal{A}$. If for sufficiently large $n$ the minimum jump size on its own satisfies $\Delta^2_{(1)} \geq M_1 \Delta^2_{(K^*)} \sqrt{n\log(n)}/\underline{\lambda}$, then all lack-of-fit models belong to $\mathcal{B}_{1n}$, hence $|\mathcal{B}_{1n}| = |\mathcal{M}_l|$. Based on (3.4) in Theorem 3.3, the confidence set $\mathcal{A}$ then excludes all lack-of-fit models with $K \in \mathcal{M}_l$ with overwhelming probability. If, in addition, all over-fit models belong to $\mathcal{B}_{2n}$, the cardinality of OPTICS will be 1 and $\Pr\{\mathcal{A} = \{K^*\}\} \to 1$. On the other hand, if unfortunately both under-fitting and over-fitting signals are not sufficiently strong, so that neither $\mathcal{B}_{1n}$ nor $\mathcal{B}_{2n}$ contains any element, then (3.5) in Theorem 3.3 degenerates to the coverage rate result in Theorem 3.2.

4 Two extensions of OPTICS

In this section, we extend OPTICS to heavy-tailed data and $m$-dependent data.

4.1 Extension of OPTICS to heavy-tailed data

We explore the potential extension of the OPTICS method to heavy-tailed data. In this scenario, the tail behavior of the scores $s_i$ defined in (2.2) may exhibit heavy-tailed characteristics, causing the sub-exponential tail condition in Condition 3.1 to no longer hold.
This issue could result in the failure of the OPTICS method. A possible solution is to generalize the criterion in (2.3) from the $\ell_2$ loss to a robust loss function, such as the Huber loss (Huber, 1992), defined as
$$\ell_\kappa(u) = \begin{cases} u^2/2, & \text{if } |u| \leq \kappa, \\ \kappa|u| - \kappa^2/2, & \text{if } |u| > \kappa, \end{cases}$$
where $\kappa > 0$ is a tuning parameter that balances bias and robustness. Then, we define a robust version of (2.3) as
$$\mathcal{C}_\kappa(\mathcal{T}_K; Z^E) := \frac{1}{n} \sum_{k=0}^K \sum_{i=\tau^K_k+1}^{\tau^K_{k+1}} \ell_\kappa\left(s^E_i - \bar{s}^O_{\tau^K_k, \tau^K_{k+1}}\right), \quad (4.1)$$
where $\ell_\kappa(x) = \sum_{j=1}^d \ell_\kappa(x_j)$ for $x \in \mathbb{R}^d$. Alternatively, the $\ell_1$ loss function can also be used. Intuitively, the difference $s^E_i - \bar{s}^O_{\tau^K_k, \tau^K_{k+1}} \approx s^E_i - E[s^E_i] = \epsilon^E_i$. When $\epsilon^E_i$ exhibits heavy-tailed behavior, the $\ell_2$ loss function becomes unsuitable, as it can be distorted by extreme values. In contrast, the Huber loss is much more robust, as it transitions from the $\ell_2$ loss to the $\ell_1$ loss when $\epsilon^E_i$ deviates significantly from zero. Then, we can define a robust version of (2.6) as
$$\hat{\delta}_{K,J,\kappa} = \frac{1}{n} \sum_{i=1}^n \left\{\ell_\kappa\left(s^E_i - \bar{s}^O_{K,i}\right) - \ell_\kappa\left(s^E_i - \bar{s}^O_{J,i}\right)\right\} := \frac{1}{n} \sum_{i=1}^n \xi^{(i)}_{K,J,\kappa}. \quad (4.2)$$
The corresponding test statistic is
$$T_{K,\kappa} = \max_{J \neq K} \frac{\sqrt{n}\, \hat{\delta}_{K,J,\kappa}}{\hat{\sigma}_{K,J,\kappa}} = \max_{J \neq K} \frac{1}{\sqrt{n}} \sum_{i=1}^n \frac{\xi^{(i)}_{K,J,\kappa}}{\hat{\sigma}_{K,J,\kappa}}, \quad (4.3)$$
where $\hat{\sigma}^2_{K,J,\kappa} = n^{-1} \sum_{i=1}^n (\xi^{(i)}_{K,J,\kappa})^2$ is the estimated second moment. For this robust version of the statistic, we can also use a Gaussian multiplier bootstrap statistic similar to (2.8), namely
$$T^\sharp_{K,\kappa,b} = \max_{J \neq K} \frac{1}{\sqrt{n}} \sum_{i=1}^n \frac{\xi^{(i)}_{K,J,\kappa}}{\hat{\sigma}_{K,J,\kappa}} \zeta_i. \quad (4.4)$$
Then the p-value is naturally set to be $\hat{p}_{K,\kappa} = B^{-1} \sum_{b=1}^B \mathbb{1}(T^\sharp_{K,\kappa,b} > T_{K,\kappa})$. We refer to Chen and Zhou (2020) for adaptively selecting the hyperparameter $\kappa$. We name this extension Huber-OPTICS (H-OPTICS).
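A minimal sketch of the Huber loss and the robustified contrast $\xi^{(i)}_{K,J,\kappa}$ in (4.2) follows; the function names are illustrative, and the default $\kappa = 1.345$ is a conventional choice for Huber regression rather than one prescribed by the text.

```python
import numpy as np

def huber(u, kappa):
    """Elementwise Huber loss: u^2/2 for |u| <= kappa, kappa*|u| - kappa^2/2 otherwise."""
    a = np.abs(u)
    return np.where(a <= kappa, 0.5 * u ** 2, kappa * a - 0.5 * kappa ** 2)

def huberized_xi(s_E, sbar_K, sbar_J, kappa=1.345):
    """Sketch of xi^{(i)}_{K,J,kappa}: coordinatewise Huber losses summed within
    each observation, then differenced between candidate models K and J."""
    loss_K = huber(s_E - sbar_K, kappa).sum(axis=1)
    loss_J = huber(s_E - sbar_J, kappa).sum(axis=1)
    return loss_K - loss_J
```

Plugging these contrasts into the studentized maximum (4.3) and the multiplier bootstrap (4.4) leaves the rest of the procedure unchanged.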
Subsection S.5.4 of the Supplementary Material demonstrates the superior performance of our proposed method through simulation studies.

4.2 Extension of OPTICS to $m$-dependent data

We extend the OPTICS method to handle dependent data, specifically focusing on $m$-dependent data, which is commonly used in econometrics applications (Moon and Velasco, 2013; Kokoszka et al., 2018). We restrict our discussion to multiple mean change-point detection, where
$$z_i = \beta^*_k + \epsilon_i, \quad \tau^*_{k-1} < i \leq \tau^*_k, \quad k = 1, \ldots, K^*+1; \; i = 1, \ldots, (m+1)n,$$
where $\epsilon_i \sim (\mathbf{0}, \Sigma)$ for covariance matrix $\Sigma \in \mathbb{R}^{d \times d}$. This model is a special case of (2.1). In our work, a stochastic process $Z = \{z_1, \ldots, z_{(m+1)n}\}$ is called $m$-dependent for $m \geq 0$ if $\{z_1, \ldots, z_{i-1}, z_i\}$ and $\{z_{i+m+1}, z_{i+m+2}, \ldots, z_{(m+1)n}\}$ are independent. Here, we assume the sample size is $(m+1)n$ to simplify the presentation. When $m = 0$, $m$-dependence reduces to standard independence. For $m$-dependent data, we apply the order-preserving $(m+1)$-splitting
$$Z^{(r)} = \{z_{r + (m+1)i} : i = 0, \ldots, n-1\}, \quad r = 1, \ldots, m+1.$$
Then each subsequence $Z^{(r)}$ consists of independent observations. We may therefore apply the multiple-splitting OPTICS procedure as in Section 2.3, yielding a split-specific p-value for each $K \in \mathcal{M}$. These p-values are then combined to obtain a single combined p-value. We refer to this extension as $m$-dependent OPTICS (m-OPTICS). Subsections S.5.5 and S.5.6 of the Supplementary Material provide thorough simulations to investigate the effects of $m$-dependence. We find that m-OPTICS is robust to various $m$-dependent structures and performs especially well when $m$ is moderate. However, we should emphasize that $m$ cannot be excessively large.
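The order-preserving $(m+1)$-splitting above admits a one-line sketch: subsequence $r$ collects every $(m+1)$th observation starting from position $r$, so consecutive elements within a split are more than $m$ apart and hence independent under $m$-dependence.

```python
def m_dependent_splits(z, m):
    """Order-preserving (m+1)-splitting of an m-dependent sequence.

    `z` is a sequence of length (m+1)*n; split r keeps observations
    r, r+(m+1), r+2(m+1), ..., preserving the original temporal order.
    """
    return [z[r::m + 1] for r in range(m + 1)]
```

Each split is then fed to the independent-data OPTICS procedure, and the resulting split-specific p-values are combined.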
An excessively large $m$ leads to a loss of selection power, as each subsample contains too few observations to yield a reliable change-point estimation.

5 Simulation studies

In this section, we conduct simulation studies under various change-point model settings to assess the empirical performance of OPTICS. Two base change-point detection algorithms are used in the training stage for constructing OPTICS: Binary Segmentation (BS, Fryzlewicz (2014)) and Segment Neighborhood (SN, Auger and Lawrence (1989)). For comparison, we consider the consistent estimation method COPSS (Zou et al., 2020), the FDR-control method FDRseg (Li et al., 2016), and the FWER-control method SMUCE (Frick et al., 2014). We also adopt BS and SN as base algorithms for COPSS. Since no state-of-the-art confidence set construction methods for the number of change-points are available in the literature, we artificially create a quasi-confidence set based on each point estimate from COPSS, FDRseg, or SMUCE, respectively, to match the cardinality of OPTICS. This also naturally enhances the coverage rate of these existing methods. Specifically, given the point estimate $\hat{K}$, the associated quasi-confidence set is defined as $\mathcal{A}^q := \{\hat{K}-q, \hat{K}-q+1, \ldots, \hat{K}, \ldots, \hat{K}+q\}$ with cardinality $2q+1$. We remark that these sets only facilitate a fair numerical comparison with cardinality comparable to OPTICS; they lack theoretical justification of the coverage rate. Throughout this section, the candidate set is taken as $\mathcal{M} = \{1, 2, \ldots, \log(n)\}$ following Zou et al. (2020), and the significance level is $\alpha = 0.1$. We conduct 100 simulation runs for each setting, and report the coverage rate $\sum_{i=1}^{100} \mathbb{1}(K^* \in \mathcal{A}_i)/100$ and the average cardinality of the confidence set $\sum_{i=1}^{100} |\mathcal{A}_i|/100$, where $\mathcal{A}_i$ denotes the set obtained in the $i$th simulation run.
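The two reported summaries can be computed from the collection of confidence sets as follows (a small sketch with our own function name):

```python
def summarize_confidence_sets(sets, K_true):
    """Empirical coverage rate and average cardinality over R simulation runs:
    sum_i 1(K* in A_i)/R and sum_i |A_i|/R, as reported in the tables below."""
    R = len(sets)
    coverage = sum(K_true in A for A in sets) / R
    avg_card = sum(len(A) for A in sets) / R
    return coverage, avg_card
```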
5.1 Multiple mean-change model

Consider the multiple mean-change model
$$y_i = \mu^*_k + \epsilon_i, \quad \tau^*_{k-1} < i \leq \tau^*_k, \quad k = 1, \ldots, K^*+1, \; i = 1, \ldots, 2n,$$
where $\tau^*_k$, $k = 1, \ldots, K^*$ are the true mean change-points, $\mu^*_k$ is the $d$-dimensional mean vector for subject $i$ when $\tau^*_{k-1} < i \leq \tau^*_k$, and the $\epsilon_i$ are independent and identically distributed random errors. In this example, both the univariate mean with $d = 1$ and the multivariate mean vector with $d = 5$ are studied. However, as FDRseg and SMUCE only work for univariate mean change-point detection, we only compare OPTICS with COPSS when $d = 5$. The sample size is taken to be $2n = 1000$, and the set of true change-points is $\mathcal{T}^* = \{\tau^*_k = 200k, k = 1, \ldots, 4\}$, hence $K^* = 4$. The $k$th mean vector is $\mu^*_k = (-1)^{k-1} A \mathbf{1}_d$, $k = 1, \ldots, 4$, where $A$ is a scalar representing the amplitude and $\mathbf{1}_d$ is the $d$-dimensional vector of all ones. The amplitude $A$ varies among $\{0.50, 0.625, 0.75, 0.875, 1\}$. Two error distributions are under consideration: the standard normal distribution $N(0,1)$ and a t-distribution $t(10)$. As we observe similar phenomena for these two types of error distributions, we only exhibit results for the more challenging $t(10)$ case. Refer to Section S.5 of the Supplementary Material for the simulation results under $N(0,1)$.

Tables 1 and 2 report the coverage rates when $d = 1$ and $d = 5$, respectively, under t-distributed errors. The quantities in parentheses are average cardinalities of the estimated sets, from which we observe that the average cardinality of OPTICS is around 3. Therefore, to match this cardinality, we take $q = 1$ for the quasi-confidence sets for COPSS, FDRseg, and SMUCE, i.e., $\mathcal{A}^1 = \{\hat{K}-1, \hat{K}, \hat{K}+1\}$ with cardinality 3, and denote the generated sets by $\mathcal{A}^1_{\mathrm{COPSS}}$, $\mathcal{A}^1_{\mathrm{FDRseg}}$, and $\mathcal{A}^1_{\mathrm{SMUCE}}$, respectively.
We also report the coverage rates achieved solely by the point estimates in the last four rows. From both tables, the coverage rate of OPTICS gradually meets the nominal level with decreasing average cardinality as the amplitude $A$ increases, especially with the SN detection algorithm. This empirically illustrates the validity of Theorems 3.2 and 3.3. The BS algorithm is less favorable due to its nature as an approximation algorithm rather than an exact one; thus, the accuracy of the detected change positions falls short of meeting the requirements in Condition 3.4. Nevertheless, OPTICS with both SN and BS is superior to the quasi-confidence sets generated from COPSS, FDRseg, and SMUCE in terms of coverage probabilities. Recall that these quasi-confidence sets do not possess any theoretical guarantee. On the other hand, as depicted in the last four rows of Table 1, the point estimates exhibit low probabilities of correctly encompassing the true number of change-points. Furthermore, OPTICS demonstrates its superiority compared to SMUCE and FDRseg especially under non-Gaussian errors. This advantage stems from SMUCE and FDRseg utilizing the supremum of Brownian motion to establish cutoffs, which become misaligned under non-Gaussian assumptions. Notably, SMUCE lacks power when dealing with small amplitudes, elucidating the stringency of the FWER-type criterion. Additionally, it is worth noting that OPTICS serves as a unified method applicable to multivariate change-point models, while both SMUCE and FDRseg are limited to scenarios where $d = 1$.

Table 1: The coverage rates in the mean-change model: $d = 1$; t-distributed error.
Amplitude A       0.50        0.625       0.75        0.875       1.00
OPTICS(BS)        0.72(4.02)  0.82(4.11)  0.89(3.48)  0.86(2.86)  0.80(2.75)
OPTICS(SN)        0.89(4.40)  0.92(4.22)  0.97(3.88)  0.98(2.82)  0.99(2.80)
A^1_COPSS(BS)     0.46(3.00)  0.58(3.00)  0.69(3.00)  0.80(3.00)  0.88(3.00)
A^1_COPSS(SN)     0.38(3.00)  0.45(3.00)  0.80(3.00)  0.88(3.00)  0.97(3.00)
A^1_FDRseg        0.69(3.00)  0.64(3.00)  0.59(3.00)  0.58(3.00)  0.61(3.00)
A^1_SMUCE         0.28(3.00)  0.62(3.00)  0.94(3.00)  0.97(3.00)  0.96(3.00)
COPSS(BS)         0.19        0.25        0.46        0.47        0.48
COPSS(SN)         0.18        0.33        0.57        0.77        0.85
FDRseg            0.39        0.45        0.44        0.43        0.38
SMUCE             0.07        0.30        0.58        0.81        0.89

Table 2: The coverage rates in the mean-change model: $d = 5$; t-distributed error.

Amplitude A       0.50        0.625       0.75        0.875       1.00
OPTICS(BS)        0.60(3.37)  0.54(2.84)  0.61(2.17)  0.58(2.29)  0.70(2.06)
OPTICS(SN)        0.73(4.09)  0.82(3.40)  0.78(2.33)  0.91(2.29)  0.97(2.37)
A^1_COPSS(BS)     0.51(3.00)  0.51(3.00)  0.65(3.00)  0.77(3.00)  0.84(3.00)
A^1_COPSS(SN)     0.58(3.00)  0.66(3.00)  0.85(3.00)  0.92(3.00)  0.93(3.00)
COPSS(BS)         0.20        0.23        0.38        0.40        0.54
COPSS(SN)         0.24        0.39        0.68        0.84        0.89

5.2 Linear regression model with coefficient structural breaks

The change-point detection problem can be naturally extended to coefficient structural-change detection in linear regression models. In this subsection, we consider the following linear regression model with $K^*$ potential coefficient structural breaks:
$$y_i = x_i^\top \beta^*_k + \epsilon_i, \quad \tau^*_{k-1} < i \leq \tau^*_k, \quad k = 1, \ldots, K^*+1, \; i = 1, \ldots, 2n,$$
where $\beta^*_k = (-1)^{k-1} A \mathbf{1}_d$, $k = 1, \ldots, K^*+1$. We generate the covariate $x_i$ from the multivariate normal distribution $N(\mathbf{0}, I_d)$ with $d = 5$, and the error term $\epsilon_i$ from $N(0,1)$ and $t(10)$, respectively. The sample size is $n = 1000$, the true number of change-points is $K^* = 4$, and the true change-point set is $\mathcal{T}^* = \{200k, k = 1, \ldots, 4\}$. The coverage rates under the $t(10)$ distribution are presented in Table 3.
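For reference, the Section 5.2 design can be simulated as follows. This is a sketch: the alternating coefficients and $t(10)$ errors follow the description above, while the function name, defaults, and the convention of laying out the $2n$ observations are our own.

```python
import numpy as np

def gen_structural_break_data(n2=2000, d=5, A=0.15, taus=(200, 400, 600, 800), seed=0):
    """Sketch of the structural-break design: y_i = x_i' beta*_k + eps_i with
    beta*_k = (-1)^{k-1} A 1_d alternating across segments and t(10) errors."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n2, d))        # covariates ~ N(0, I_d)
    eps = rng.standard_t(df=10, size=n2)    # heavy-ish t(10) errors
    bounds = [0, *taus, n2]
    y = np.empty(n2)
    for k in range(len(bounds) - 1):
        beta = ((-1) ** k) * A * np.ones(d)  # coefficient of segment k+1
        lo, hi = bounds[k], bounds[k + 1]
        y[lo:hi] = x[lo:hi] @ beta + eps[lo:hi]
    return x, y
```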
For the coverage rates under $N(0,1)$, please refer to Table S.4 in Section S.5 of the Supplementary Material.

Table 3: The coverage rates in the linear model with coefficient structural breaks; $t(10)$ errors.

Amplitude A       0.10        0.125       0.15        0.175       0.20
OPTICS(BS)        0.76(3.94)  0.89(3.67)  0.79(3.26)  0.94(2.80)  0.85(2.39)
OPTICS(SN)        0.85(3.22)  0.95(2.65)  0.94(2.19)  0.99(1.95)  0.98(1.62)
A^1_COPSS(BS)     0.44(3.00)  0.46(3.00)  0.52(3.00)  0.52(3.00)  0.53(3.00)
A^1_COPSS(SN)     0.67(3.00)  0.72(3.00)  0.78(3.00)  0.75(3.00)  0.78(3.00)
COPSS(BS)         0.14        0.24        0.21        0.19        0.28
COPSS(SN)         0.46        0.46        0.58        0.59        0.66

Since FDRseg and SMUCE are inapplicable for structural-break detection, our comparison focuses solely on OPTICS against COPSS and its related quasi-confidence sets. In Table 3, a notably superior performance of OPTICS is observed in terms of coverage probability, particularly when SN is used as the base algorithm. Importantly, COPSS fails to achieve consistency under these weak-signal settings, and the associated quasi-confidence sets that match the cardinalities of OPTICS are not adequately expanded to ensure the coverage rate. In contrast, OPTICS adaptively adjusts its cardinality to achieve the desired confidence level, owing to its underlying testing framework. We also include expanded simulation studies covering variance, network, and covariance change-points, as well as multiple mean changes with heavy-tailed and $m$-dependent distributions, in Section S.5 of the Supplementary Material.

6 Real Data Analysis

In this section, we apply OPTICS to analyze the bladder tumor microarray dataset sourced from the ecp R package (James and Matteson, 2015). We extract data from 10 individuals diagnosed with bladder tumors. For each individual, log-intensity-ratio measurements for 2215 distinct genetic loci were collected.
Our primary objective is to identify the number of change-points within the genetic loci, allowing us to pinpoint potentially influential genes associated with bladder tumors.

We first demonstrate that the dataset may exhibit sub-Gaussian behavior, thereby satisfying the main assumption of OPTICS. An equivalent condition for sub-Gaussianity is that the $k$th moment is bounded by $c k^{k/2}$ for some universal constant $c > 0$. The left panel in Figure 2 presents the values of up to the 100th moments for the 10 individuals, which are bounded by $k^{k/2}$ (black curve). This confirms that, at least for each individual, the data exhibit sub-Gaussian properties.

Figure 2: Left panel: the $k$th moment plots of the 10 individuals, where the black curve represents $y = k^{k/2}$; the y-axis is rescaled using $\log_{10}$. Right panel: boxplots of Hausdorff distances between the confidence sets obtained by OPTICS and those generated randomly.

We apply our OPTICS procedure for detecting multiple mean changes to the bladder tumor microarray dataset. This dataset is analyzed from two perspectives. First, we apply OPTICS to each individual and construct the corresponding confidence sets. Since all ten individuals share the same disease, their change-point patterns may exhibit similar structures. We set $K_{\max} = 3\log(10)$ and apply OPTICS to obtain $\mathcal{A}_i$ for $i \in [10]$, where $\mathcal{A}_i$ represents the confidence set for the number of change-points for individual $i$. The concrete values within each set can be found in Section S.6 of the Supplementary Material. To verify whether the $\mathcal{A}_i$ for $i \in [10]$ share similar patterns, we compute the pairwise Hausdorff distances between these sets and compare them to randomly generated sets. Specifically, we construct $\mathcal{B}_i$ for $i \in [10]$, where $\mathcal{B}_i$ is sampled from $\{1, \ldots, K_{\max}\}$ without replacement and has the same cardinality as $\mathcal{A}_i$.
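The two diagnostics just described, the moment-based sub-Gaussianity check and the pairwise Hausdorff comparison, can be sketched as follows. The function names and the `k_max` cutoff are our own illustrative choices; the paper's plot goes up to the 100th moment.

```python
import numpy as np

def subgaussian_moment_check(x, k_max=20):
    """Compare empirical k-th absolute moments of a sample against the
    sub-Gaussian envelope k^{k/2}, on the log10 scale (as in Figure 2, left)."""
    x = np.asarray(x, dtype=float)
    ks = np.arange(1, k_max + 1)
    log_moments = np.array([np.log10(np.mean(np.abs(x) ** k)) for k in ks])
    log_envelope = (ks / 2) * np.log10(ks)
    return bool(np.all(log_moments <= log_envelope))

def hausdorff(A, B):
    """Hausdorff distance between two finite sets of candidate counts,
    used to compare the per-individual confidence sets (Figure 2, right)."""
    d_AB = max(min(abs(a - b) for b in B) for a in A)
    d_BA = max(min(abs(a - b) for a in A) for b in B)
    return max(d_AB, d_BA)
```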
The right panel of Figure 2 presents boxplots of the distributions of Hausdorff distances for the OPTICS confidence sets and the randomly generated sets. It is evident that the OPTICS confidence sets have smaller Hausdorff distances, indicating greater consistency in detecting shared patterns across individuals. This shows that OPTICS has power in covering the underlying true number of change-points. We further detect the change-points using the Binary Segmentation algorithm. To control the family-wise error rate (FWER) (see the discussion in Section 2.3), we select the smallest value in each $\mathcal{A}_i$ as the optimal number of change-points. The detected change-points, based on the corresponding numbers of change-points, are presented in Figure 3. From the figure, it is evident that the number of change-points chosen by OPTICS provides a good fit for each individual.

Figure 3: Detected change-points for each individual separately.

We now perform a joint analysis of the 10 individuals. Using OPTICS for the multi-dimensional mean change-point model, we construct the confidence set for the number of change-points, $\mathcal{A}_{\mathrm{joint}} = \{10, 13, 16, 19, 22\}$. To control the FWER, we select the minimum value within $\mathcal{A}_{\mathrm{joint}}$. The change-points are then detected using the Wild Binary Segmentation (WBS) algorithm, which gives $\mathcal{S}_{\mathrm{joint}} = \{154, 358, 1140, 1268, 1534, 1724, 1906, 1966, 2052, 2142\}$, as shown in Figure 4. The plot illustrates that each dark red bold vertical line, identified via the OPTICS method, corresponds to a location where a change occurs in at least one coordinate. Conversely, the light red vertical lines, representing the default WBS detections not selected by OPTICS, contain numerous false positives. This confirms that OPTICS-based selection effectively controls the false discovery rate while preserving detection accuracy.
Figure 4: Detected change-points for the 10 individuals analyzed jointly. The bold dashed lines (dark red) represent change-points selected by the OPTICS FWER-control criterion, while the solid light lines (light red) indicate additional change-points identified by the default stopping criterion in the wbs R package.

7 Conclusion

Determining the number of change-points is a fundamental problem in the literature. Rather than offering a single point estimate, we propose a testing framework designed to construct a confidence set for the true number of change-points. The proposed method, named OPTICS, rigorously covers the true number with the predetermined confidence level under mild conditions. Additionally, we study the cardinality of OPTICS to ensure the obtained set is nontrivial and informative. The cardinality and coverage rate of OPTICS can also be utilized to assess the efficacy of base change-point detection algorithms. Furthermore, we implement a multiple-splitting approach to stabilize OPTICS. We also extend this framework to accommodate high-dimensional datasets and develop a robust version capable of handling $m$-dependent and heavy-tailed distributions.

8 Supplementary Materials

The Supplementary Material provides proofs for Theorems 3.1, 3.2, and 3.3, along with an extended literature review on statistical inference for change-point detection. We also include guidelines for selecting loss functions across various models, and expanded simulation studies covering variance, network, and covariance change-points, as well as multiple mean changes with heavy-tailed and $m$-dependent distributions. Finally, additional real-data results are provided; the dataset used in the real data analysis is available in the ecp R package.
Supplement to "OPTICS: Order-Preserved Test-Inverse Confidence Set for Number of Change-Points"

This supplementary material contains all technical proofs, an additional literature review, and additional simulation and real-data results for the main paper.

S.1 Proofs for Theorems in Section 3.1

In the following proofs, we denote by $c$ and $C$ generic constants that may differ from line to line.

S.1.1 Auxiliary lemmas

The first lemma gives a uniform concentration of the sample means $\bar{s}^O_{K,i}$ for all $i = 1, \ldots, n$ and $K \in \mathcal{M}$. Let $\|x\|_1 = \sum_{j=1}^d |x_j|$ for a vector $x \in \mathbb{R}^d$.

Lemma S.1. Under Conditions 3.1 and 3.2, there exists a positive constant $c$ such that
$$\Pr\left\{\max_{K \in \mathcal{M}} \max_{1 \leq i \leq n} \left\|\bar{s}^O_{K,i} - E(\bar{s}^O_{K,i})\right\|_1 \gtrsim n^{-1/2}\log(n)\right\} \lesssim n^{-c}.$$

Proof. Recall that $s_i = E(s_i) + \epsilon_i$, where $\epsilon_i := s_i - E(s_i) = (\epsilon_{i1}, \ldots, \epsilon_{id})^\top$ for $i = 1, \ldots, n$. By Condition 3.1(i), for each $j = 1, \ldots, d$, $\|\epsilon_{ij}\|_{\psi_1} \leq M_1$, $i = 1, \ldots, n$. Since we do not consider the high-dimensional setting here, $d$ is fixed. For each candidate segmentation $\mathcal{T}_K = \{\tau^K_1, \ldots, \tau^K_K\}$, the quantity $\bar{s}^O_{K,i}$ is the sample average of $\{s_l\}$ over the segment containing $i$. Hence,
$$\bar{s}^O_{K,i} - E(\bar{s}^O_{K,i}) = \frac{1}{b-a+1} \sum_{l=a}^b \epsilon_l$$
for some subset $\{a, \ldots, b\} \subset \{1, \ldots, n\}$ determined by $K$ and $i$. By Condition 3.2, every such interval has length at least of order $n/\log(n)$. Therefore, if we define
$$\Theta_n := \left\{(a,b) : 1 \leq a \leq b \leq n, \; b-a+1 \gtrsim \frac{n}{\log(n)}\right\},$$
then
$$\Pr\left\{\max_{K \in \mathcal{M}} \max_{1 \leq i \leq n} \|\bar{s}^O_{K,i} - E(\bar{s}^O_{K,i})\|_1 \geq x_n\right\} \leq \Pr\left\{\max_{(a,b) \in \Theta_n} \left\|\frac{1}{b-a+1}\sum_{l=a}^b \epsilon_l\right\|_1 \geq x_n\right\}. \quad (S.1)$$
Now fix $(a,b) \in \Theta_n$ and let $m = b-a+1$. Since
$$\left\|\frac{1}{m}\sum_{l=a}^b \epsilon_l\right\|_1 \leq \sum_{j=1}^d \left|\frac{1}{m}\sum_{l=a}^b \epsilon_{lj}\right|,$$
we have, by the union bound,
$$\Pr\left\{\max_{(a,b)\in\Theta_n} \left\|\frac{1}{b-a+1}\sum_{l=a}^b \epsilon_l\right\|_1 \geq x_n\right\} \leq \sum_{(a,b)\in\Theta_n} \Pr\left\{\left\|\frac{1}{b-a+1}\sum_{l=a}^b \epsilon_l\right\|_1 \geq x_n\right\} \leq \sum_{(a,b)\in\Theta_n} \sum_{j=1}^d \Pr\left\{\left|\frac{1}{b-a+1}\sum_{l=a}^b \epsilon_{lj}\right| \geq \frac{x_n}{d}\right\}.$$
(S.2)

For each fixed $(a,b) \in \Theta_n$ and $j \in \{1, \ldots, d\}$, the random variables $\epsilon_{aj}, \ldots, \epsilon_{bj}$ are independent, mean zero, and uniformly sub-exponential. Hence, by Bernstein's inequality for sub-exponential random variables, there exist positive constants $c_1$ and $c_2$, depending only on $M_1$, such that
$$\Pr\left\{\left|\frac{1}{m}\sum_{l=a}^b \epsilon_{lj}\right| \geq t\right\} \leq 2\exp\left(-c_1 m \min\{t^2, t\}\right), \quad t > 0.$$
Applying this bound with $t = x_n/d$ and using $m \gtrsim n/\log(n)$ for $(a,b) \in \Theta_n$, we obtain
$$\Pr\left\{\left|\frac{1}{b-a+1}\sum_{l=a}^b \epsilon_{lj}\right| \geq \frac{x_n}{d}\right\} \lesssim \exp\left(-\frac{c_2 n}{\log(n)} \min\{x_n^2, x_n\}\right). \quad (S.3)$$
Since $|\Theta_n| \leq n^2$ and $d$ is fixed, combining (S.1)–(S.3) yields
$$\Pr\left\{\max_{K\in\mathcal{M}} \max_{1\leq i\leq n} \|\bar{s}^O_{K,i} - E(\bar{s}^O_{K,i})\|_1 \geq x_n\right\} \lesssim n^2 \exp\left(-\frac{c_2 n}{\log(n)}\min\{x_n^2, x_n\}\right).$$
Now choose $x_n = A n^{-1/2}\log(n)$ for a sufficiently large constant $A > 0$. Then $x_n \to 0$, so for all sufficiently large $n$, $\min\{x_n^2, x_n\} = x_n^2 = A^2 n^{-1}\log^2(n)$, and therefore
$$\frac{n}{\log(n)}\min\{x_n^2, x_n\} = A^2 \log(n).$$
It follows that
$$\Pr\left\{\max_{K\in\mathcal{M}} \max_{1\leq i\leq n} \|\bar{s}^O_{K,i} - E(\bar{s}^O_{K,i})\|_1 \geq A n^{-1/2}\log(n)\right\} \lesssim n^{2 - c_2 A^2}.$$
By taking $A$ sufficiently large, we obtain
$$\Pr\left\{\max_{K\in\mathcal{M}} \max_{1\leq i\leq n} \|\bar{s}^O_{K,i} - E(\bar{s}^O_{K,i})\|_1 \gtrsim n^{-1/2}\log(n)\right\} \lesssim n^{-c}$$
for some positive constant $c$. This completes the proof.

The next lemma shows that $(\xi^{(i)}_{K,J} - E[\xi^{(i)}_{K,J}])/\sigma_{K,J}$ has a sub-exponential tail under $H_{0,K}$, as needed for our main theorem.

Lemma S.2. Let $\xi^{(i)}_{K,J} := \|s^E_i - \bar{s}^O_{K,i}\|_2^2 - \|s^E_i - \bar{s}^O_{J,i}\|_2^2$, and let $\mathcal{F}^O$ denote the $\sigma$-field generated by the observed sample used to construct $\{\bar{s}^O_{K,i} : K \in \mathcal{M}, 1 \leq i \leq n\}$. Define the conditional variance scale $\sigma^2_{K,J} := \frac{1}{n}\sum_{i=1}^n \mathrm{Var}(\xi^{(i)}_{K,J} \mid \mathcal{F}^O)$. Assume Conditions 3.1 and 3.2 hold. In addition, assume that for the pair $(K, J)$ under consideration, the detected change-point sets $\mathcal{T}_K$ and $\mathcal{T}_J$ are nested.
Then, under $H_{0,K}$, on the event
$$\mathcal{E}_n := \left\{\max_{K\in\mathcal{M}} \max_{1\leq i\leq n} \|\bar{s}^O_{K,i} - E(\bar{s}^O_{K,i})\|_1 \leq c n^{-1/2}\log(n)\right\}, \quad (S.4)$$
there exists a constant $C > 0$ such that
$$\max_{1\leq i\leq n} \left\|\frac{\xi^{(i)}_{K,J} - E(\xi^{(i)}_{K,J} \mid \mathcal{F}^O)}{\sigma_{K,J}}\right\|_{\psi_1 \mid \mathcal{F}^O} \leq C\sqrt{\log(n)}.$$
Consequently,
$$\Pr\left\{\max_{1\leq i\leq n} \left\|\frac{\xi^{(i)}_{K,J} - E(\xi^{(i)}_{K,J} \mid \mathcal{F}^O)}{\sigma_{K,J}}\right\|_{\psi_1 \mid \mathcal{F}^O} \lesssim \sqrt{\log(n)}\right\} \to 1.$$

Proof. By Lemma S.1, $\Pr(\mathcal{E}_n) \to 1$. We work on the event $\mathcal{E}_n$ throughout the proof. For fixed $K, J \in \mathcal{M}$ and $1 \leq i \leq n$, expand
$$\xi^{(i)}_{K,J} = \|s^E_i - \bar{s}^O_{K,i}\|_2^2 - \|s^E_i - \bar{s}^O_{J,i}\|_2^2 = \|\bar{s}^O_{K,i}\|_2^2 - \|\bar{s}^O_{J,i}\|_2^2 - 2(s^E_i)^\top(\bar{s}^O_{K,i} - \bar{s}^O_{J,i}) = (\bar{s}^O_{K,i} + \bar{s}^O_{J,i})^\top \Delta_{K,J,i} - 2(s^E_i)^\top \Delta_{K,J,i},$$
where $\Delta_{K,J,i} := \bar{s}^O_{K,i} - \bar{s}^O_{J,i}$. Conditioning on $\mathcal{F}^O$, the vectors $\bar{s}^O_{K,i}$, $\bar{s}^O_{J,i}$, and hence $\Delta_{K,J,i}$, are deterministic. Therefore,
$$\xi^{(i)}_{K,J} - E(\xi^{(i)}_{K,J} \mid \mathcal{F}^O) = -2\left\{(s^E_i)^\top \Delta_{K,J,i} - E\left[(s^E_i)^\top \Delta_{K,J,i} \mid \mathcal{F}^O\right]\right\}.$$
Since $\bar{s}^O_{K,i}$ and $\bar{s}^O_{J,i}$ are $\mathcal{F}^O$-measurable, it follows from Condition 3.1(i) and basic properties of the Orlicz norm that
$$\left\|\xi^{(i)}_{K,J} - E(\xi^{(i)}_{K,J} \mid \mathcal{F}^O)\right\|_{\psi_1 \mid \mathcal{F}^O} \lesssim \|\Delta_{K,J,i}\|_2. \quad (S.5)$$
Indeed, writing $\Delta_{K,J,i} = (\delta_{i1}, \ldots, \delta_{id})^\top$ and using that $d$ is fixed,
$$\left\|(s^E_i)^\top \Delta_{K,J,i}\right\|_{\psi_1 \mid \mathcal{F}^O} \leq \sum_{j=1}^d |\delta_{ij}| \|s^E_{ij}\|_{\psi_1} \lesssim \|\Delta_{K,J,i}\|_1 \lesssim \|\Delta_{K,J,i}\|_2.$$
Next, we study the denominator. Since $\xi^{(i)}_{K,J} - E(\xi^{(i)}_{K,J} \mid \mathcal{F}^O) = -2\{(s^E_i)^\top \Delta_{K,J,i} - E[(s^E_i)^\top \Delta_{K,J,i} \mid \mathcal{F}^O]\}$, we have
$$\mathrm{Var}(\xi^{(i)}_{K,J} \mid \mathcal{F}^O) = 4\, \Delta_{K,J,i}^\top \mathrm{Var}(s^E_i)\, \Delta_{K,J,i}.$$
By Condition 3.1(iii), there exists a constant $c_0 > 0$ such that
$$\mathrm{Var}(\xi^{(i)}_{K,J} \mid \mathcal{F}^O) \geq c_0 \|\Delta_{K,J,i}\|_2^2. \quad (S.6)$$
Hence,
$$\sigma^2_{K,J} = \frac{1}{n}\sum_{i=1}^n \mathrm{Var}(\xi^{(i)}_{K,J} \mid \mathcal{F}^O) \gtrsim \frac{1}{n}\sum_{i=1}^n \|\Delta_{K,J,i}\|_2^2.$$
Now we use the additional nesting assumption on $\mathcal{T}_K$ and $\mathcal{T}_J$.
Since the two segmentations are nested, the vector sequence $\{ \Delta_{K,J,i} \}_{i=1}^{n}$ is piecewise constant on the finer partition, and each constant block has length at least of order $n/\log(n)$ by Condition 3.2. Therefore, if $M_{K,J} := \max_{1 \leq i \leq n} \| \Delta_{K,J,i} \|_2$, then there are at least $c_1 n / \log(n)$ indices $i$ such that $\| \Delta_{K,J,i} \|_2 = M_{K,J}$, for some constant $c_1 > 0$. It follows that
$$\frac{1}{n} \sum_{i=1}^{n} \| \Delta_{K,J,i} \|_2^2 \gtrsim \frac{1}{\log(n)} M_{K,J}^2. \tag{S.7}$$
Combining this with (S.6), we obtain
$$\sigma_{K,J} \gtrsim \frac{1}{\sqrt{\log(n)}} M_{K,J}. \tag{S.8}$$
Finally, combining (S.5) and (S.8) yields, for every $1 \leq i \leq n$,
$$\left\| \frac{ \xi^{(i)}_{K,J} - \mathrm{E}( \xi^{(i)}_{K,J} \mid \mathcal{F}^O ) }{ \sigma_{K,J} } \right\|_{\psi_1 \mid \mathcal{F}^O} \lesssim \frac{ \| \Delta_{K,J,i} \|_2 }{ M_{K,J} / \sqrt{\log(n)} } \leq C \sqrt{\log(n)}$$
for some constant $C > 0$. Taking the maximum over $1 \leq i \leq n$ gives
$$\max_{1 \leq i \leq n} \left\| \frac{ \xi^{(i)}_{K,J} - \mathrm{E}( \xi^{(i)}_{K,J} \mid \mathcal{F}^O ) }{ \sigma_{K,J} } \right\|_{\psi_1 \mid \mathcal{F}^O} \leq C \sqrt{\log(n)}$$
on $\mathcal{E}_n$. Since $\Pr(\mathcal{E}_n) \to 1$, the conclusion follows.

Lemma S.3. Let $\hat{\delta}_{K,J} := \frac{1}{n} \sum_{i=1}^{n} \xi^{(i)}_{K,J}$ and $\delta_{K,J} := \frac{1}{n} \sum_{i=1}^{n} \mathrm{E}( \xi^{(i)}_{K,J} \mid \mathcal{F}^O )$, where $\mathcal{F}^O$ denotes the $\sigma$-field generated by the observed sample used to construct $\{ \bar{s}^O_{K,i} : K \in \mathcal{M}, 1 \leq i \leq n \}$. Under Conditions 3.1 and 3.2, and assuming that for each pair $(K, J)$ under consideration the detected change-point sets $\mathcal{T}_K$ and $\mathcal{T}_J$ are nested, there exists a positive constant $c$ such that
$$\Pr\left( \max_{J \neq K} \left| \frac{ \hat{\delta}_{K,J} - \delta_{K,J} }{ \sigma_{K,J} } \right| \gtrsim n^{-1/2} \log(n) \right) \lesssim n^{-c},$$
where $\sigma^2_{K,J} := \frac{1}{n} \sum_{i=1}^{n} \mathrm{Var}( \xi^{(i)}_{K,J} \mid \mathcal{F}^O )$ for $J \in \mathcal{M} \setminus \{K\}$.
Proof. Recall that by Lemma S.1, $\Pr(\mathcal{E}_n^c) \lesssim n^{-c_1}$ for some positive constant $c_1$, where $\mathcal{E}_n$ is defined in (S.4). Fix $J \neq K$, and define
$$Z_{i,J} := \frac{ \xi^{(i)}_{K,J} - \mathrm{E}( \xi^{(i)}_{K,J} \mid \mathcal{F}^O ) }{ \sigma_{K,J} }, \quad i = 1, \dots, n.$$
Then
$$\frac{ \hat{\delta}_{K,J} - \delta_{K,J} }{ \sigma_{K,J} } = \frac{1}{n} \sum_{i=1}^{n} Z_{i,J}.$$
Conditional on $\mathcal{F}^O$, the random variables $Z_{1,J}, \dots, Z_{n,J}$ are independent and satisfy $\mathrm{E}( Z_{i,J} \mid \mathcal{F}^O ) = 0$ for $i = 1, \dots, n$. Moreover, by Lemma S.2, on the event $\mathcal{E}_n$, $\max_{1 \leq i \leq n} \| Z_{i,J} \|_{\psi_1 \mid \mathcal{F}^O} \lesssim \sqrt{\log(n)}$.
Therefore, conditional on $\mathcal{F}^O$ and on the event $\mathcal{E}_n$, Bernstein's inequality for independent centered sub-exponential random variables yields
$$\Pr\left( \left| \frac{1}{n} \sum_{i=1}^{n} Z_{i,J} \right| \geq x_n \,\Big|\, \mathcal{F}^O \right) \leq 2 \exp\left[ -c_2 \, n \min\left\{ \frac{x_n^2}{\log(n)}, \frac{x_n}{\sqrt{\log(n)}} \right\} \right]$$
for some positive constant $c_2$. Hence, again on $\mathcal{E}_n$,
$$\Pr\left( \left| \frac{ \hat{\delta}_{K,J} - \delta_{K,J} }{ \sigma_{K,J} } \right| \geq x_n \,\Big|\, \mathcal{F}^O \right) \leq 2 \exp\left[ -c_2 \, n \min\left\{ \frac{x_n^2}{\log(n)}, \frac{x_n}{\sqrt{\log(n)}} \right\} \right].$$
Applying the union bound over $J \in \mathcal{M} \setminus \{K\}$, we obtain on $\mathcal{E}_n$,
$$\Pr\left( \max_{J \neq K} \left| \frac{ \hat{\delta}_{K,J} - \delta_{K,J} }{ \sigma_{K,J} } \right| \geq x_n \,\Big|\, \mathcal{F}^O \right) \leq 2 |\mathcal{M}| \exp\left[ -c_2 \, n \min\left\{ \frac{x_n^2}{\log(n)}, \frac{x_n}{\sqrt{\log(n)}} \right\} \right].$$
Since $|\mathcal{M}| \leq K_{\max}$, it follows that
$$\Pr\left( \max_{J \neq K} \left| \frac{ \hat{\delta}_{K,J} - \delta_{K,J} }{ \sigma_{K,J} } \right| \geq x_n, \; \mathcal{E}_n \right) \leq 2 K_{\max} \exp\left[ -c_2 \, n \min\left\{ \frac{x_n^2}{\log(n)}, \frac{x_n}{\sqrt{\log(n)}} \right\} \right].$$
Now choose $x_n = A n^{-1/2} \log(n)$ for a sufficiently large constant $A > 0$. Then
$$\frac{n x_n^2}{\log(n)} = A^2 \log(n), \qquad \frac{n x_n}{\sqrt{\log(n)}} = A \sqrt{n \log(n)}.$$
Hence, for all sufficiently large $n$,
$$n \min\left\{ \frac{x_n^2}{\log(n)}, \frac{x_n}{\sqrt{\log(n)}} \right\} = A^2 \log(n),$$
and therefore
$$\Pr\left( \max_{J \neq K} \left| \frac{ \hat{\delta}_{K,J} - \delta_{K,J} }{ \sigma_{K,J} } \right| \geq A n^{-1/2} \log(n), \; \mathcal{E}_n \right) \leq 2 K_{\max} \, n^{-c_2 A^2}.$$
If $K_{\max}$ is fixed or grows at most polynomially in $n$, then by taking $A$ sufficiently large, $2 K_{\max} n^{-c_2 A^2} \lesssim n^{-c_3}$ for some positive constant $c_3$. Finally,
$$\Pr\left( \max_{J \neq K} \left| \frac{ \hat{\delta}_{K,J} - \delta_{K,J} }{ \sigma_{K,J} } \right| \geq A n^{-1/2} \log(n) \right) \leq \Pr(\mathcal{E}_n^c) + \Pr\left( \max_{J \neq K} \left| \frac{ \hat{\delta}_{K,J} - \delta_{K,J} }{ \sigma_{K,J} } \right| \geq A n^{-1/2} \log(n), \; \mathcal{E}_n \right) \lesssim n^{-c}$$
for some positive constant $c$. This completes the proof.

Let $\sigma^2_{K,J} = \sum_{i=1}^{n} \mathrm{E}[ ( \xi^{(i)}_{K,J} )^2 ] / n$ and $\hat{\delta}_{K,J} = \sum_{i=1}^{n} \xi^{(i)}_{K,J} / n$ for $J \in \mathcal{M} \setminus \{K\}$. The sample second-moment matrix is defined as
$$\widehat{\Gamma}^{(K)} = \left( \frac{1}{n} \sum_{i=1}^{n} \xi^{(i)}_{K,J} \xi^{(i)}_{K,J'} \right)_{J, J' \in \mathcal{M} \setminus \{K\}} \in \mathbb{R}^{(|\mathcal{M}|-1) \times (|\mathcal{M}|-1)},$$
with its population analog
$$\Gamma^{(K)} = \left( \frac{1}{n} \sum_{i=1}^{n} \mathrm{E}( \xi^{(i)}_{K,J} \xi^{(i)}_{K,J'} \mid \mathcal{F}^O ) \right)_{J, J'},$$
where $\mathcal{F}^O$ denotes the $\sigma$-field generated by the observed sample used to construct $\{ \bar{s}^O_{K,i} : K \in \mathcal{M}, 1 \leq i \leq n \}$.
Besides, let $D^{(K)} = \mathrm{diag}( \Gamma^{(K)} )^{1/2}$ and $\widehat{D}^{(K)} = \mathrm{diag}( \widehat{\Gamma}^{(K)} )^{1/2}$. Furthermore, let $\widehat{H}^{(K)} = ( \widehat{D}^{(K)} )^{-1} \widehat{\Gamma}^{(K)} ( \widehat{D}^{(K)} )^{-1}$ and $H^{(K)} = ( D^{(K)} )^{-1} \Gamma^{(K)} ( D^{(K)} )^{-1}$.

Lemma S.4. Assume Conditions 3.1 and 3.2 hold. In addition, assume the nesting condition in Lemma S.2, and that there exists a constant $c_0 > 0$ such that $\min_{J \in \mathcal{M} \setminus \{K\}} D^{(K)}_{JJ} \geq c_0$ with probability tending to one. If $K_{\max} \asymp \log(n)$, then there exists a positive constant $c$ such that
$$\Pr\left( \max_{J, J' \in \mathcal{M} \setminus \{K\}} \left| \left( \widehat{H}^{(K)} - H^{(K)} \right)_{J,J'} \right| \gtrsim n^{-1/2} \log^{3/2}(n) \right) \lesssim n^{-c}.$$
Proof. Recall that by Lemma S.1, $\Pr(\mathcal{E}_n^c) \lesssim n^{-c_1}$ for some $c_1 > 0$, where $\mathcal{E}_n$ is defined in (S.4). We work on the event $\mathcal{E}_n$ throughout the proof. For $J \in \mathcal{M} \setminus \{K\}$, define
$$Z_{i,J} := \frac{ \xi^{(i)}_{K,J} - \mathrm{E}( \xi^{(i)}_{K,J} \mid \mathcal{F}^O ) }{ D^{(K)}_{JJ} }, \quad i = 1, \dots, n.$$
By Lemma S.2, on $\mathcal{E}_n$, $\max_{J \in \mathcal{M} \setminus \{K\}} \max_{1 \leq i \leq n} \| Z_{i,J} \|_{\psi_1 \mid \mathcal{F}^O} \lesssim \sqrt{\log(n)}$. Hence, for each pair $(J, J')$,
$$U_{i,JJ'} := Z_{i,J} Z_{i,J'} - \mathrm{E}( Z_{i,J} Z_{i,J'} \mid \mathcal{F}^O )$$
is conditionally centered and satisfies $\max_{J,J'} \max_{1 \leq i \leq n} \| U_{i,JJ'} \|_{\psi_{1/2} \mid \mathcal{F}^O} \lesssim \log(n)$, because the product of two sub-exponential random variables is sub-Weibull of order $1/2$. Now note that
$$\frac{ \widehat{\Gamma}^{(K)}_{JJ'} - \Gamma^{(K)}_{JJ'} }{ D^{(K)}_{JJ} D^{(K)}_{J'J'} } = \frac{1}{n} \sum_{i=1}^{n} \left[ \frac{ \xi^{(i)}_{K,J} \xi^{(i)}_{K,J'} }{ D^{(K)}_{JJ} D^{(K)}_{J'J'} } - \mathrm{E}\left( \frac{ \xi^{(i)}_{K,J} \xi^{(i)}_{K,J'} }{ D^{(K)}_{JJ} D^{(K)}_{J'J'} } \,\Big|\, \mathcal{F}^O \right) \right] = \frac{1}{n} \sum_{i=1}^{n} U_{i,JJ'} + R_{JJ'},$$
where
$$R_{JJ'} = \frac{1}{n} \sum_{i=1}^{n} \frac{ \mathrm{E}( \xi^{(i)}_{K,J} \mid \mathcal{F}^O ) \, \mathrm{E}( \xi^{(i)}_{K,J'} \mid \mathcal{F}^O ) }{ D^{(K)}_{JJ} D^{(K)}_{J'J'} }.$$
By the proof of Lemma S.2, on $\mathcal{E}_n$,
$$\max_{J} \max_{1 \leq i \leq n} \left| \frac{ \mathrm{E}( \xi^{(i)}_{K,J} \mid \mathcal{F}^O ) }{ D^{(K)}_{JJ} } \right| \lesssim \sqrt{\log(n)},$$
so $| R_{JJ'} | \lesssim \log(n)$ uniformly in $(J, J')$. Since $R_{JJ'}$ appears in both $\widehat{\Gamma}^{(K)}_{JJ'}$ and $\Gamma^{(K)}_{JJ'}$, it cancels in the centered difference above. Therefore it suffices to control the average of $\{ U_{i,JJ'} \}_{i=1}^{n}$.
Conditional on $\mathcal{F}^O$, the variables $U_{1,JJ'}, \dots, U_{n,JJ'}$ are independent, centered, and uniformly sub-Weibull of order $1/2$ with parameter of order $\log(n)$. Hence, Bernstein's inequality for sub-Weibull random variables yields
$$\Pr\left( \left| \frac{1}{n} \sum_{i=1}^{n} U_{i,JJ'} \right| \geq t \,\Big|\, \mathcal{F}^O \right) \leq 2 \exp\left[ -c_2 \min\left\{ \frac{n t^2}{\log^2(n)}, \left( \frac{n t}{\log(n)} \right)^{1/2} \right\} \right]$$
for some $c_2 > 0$. Applying the union bound over all $(J, J') \in ( \mathcal{M} \setminus \{K\} )^2$, we obtain on $\mathcal{E}_n$,
$$\Pr\left( \max_{J,J'} \left| \frac{ \widehat{\Gamma}^{(K)}_{JJ'} - \Gamma^{(K)}_{JJ'} }{ D^{(K)}_{JJ} D^{(K)}_{J'J'} } \right| \geq t \,\Big|\, \mathcal{F}^O \right) \leq 2 ( K_{\max} - 1 )^2 \exp\left[ -c_2 \min\left\{ \frac{n t^2}{\log^2(n)}, \left( \frac{n t}{\log(n)} \right)^{1/2} \right\} \right]. \tag{S.9}$$
Choose $t = A n^{-1/2} \log^{3/2}(n)$ for a sufficiently large constant $A > 0$. Then
$$\frac{n t^2}{\log^2(n)} = A^2 \log(n), \qquad \left( \frac{n t}{\log(n)} \right)^{1/2} = A^{1/2} n^{1/4} \log^{1/4}(n),$$
so the first term determines the rate for all sufficiently large $n$. Since $K_{\max}$ is fixed or grows at most polynomially in $n$, (S.9) implies that
$$\Pr\left( \max_{J,J'} \left| \frac{ \widehat{\Gamma}^{(K)}_{JJ'} - \Gamma^{(K)}_{JJ'} }{ D^{(K)}_{JJ} D^{(K)}_{J'J'} } \right| \gtrsim n^{-1/2} \log^{3/2}(n), \; \mathcal{E}_n \right) \lesssim n^{-c_3} \tag{S.10}$$
for some $c_3 > 0$. Next, for each $J$, $\widehat{D}^{(K)}_{JJ} = ( \widehat{\Gamma}^{(K)}_{JJ} )^{1/2}$ and $D^{(K)}_{JJ} = ( \Gamma^{(K)}_{JJ} )^{1/2}$. By (S.10) with $J = J'$,
$$\max_{J} \left| \widehat{\Gamma}^{(K)}_{JJ} - \Gamma^{(K)}_{JJ} \right| \lesssim n^{-1/2} \log^{3/2}(n)$$
with probability at least $1 - O(n^{-c_3})$ on $\mathcal{E}_n$. Since $\min_J D^{(K)}_{JJ} \geq c_0 > 0$ with probability tending to one, the map $x \mapsto x^{1/2}$ is Lipschitz on a neighborhood of $\{ \Gamma^{(K)}_{JJ} \}_J$, and therefore
$$\max_{J} \left| \widehat{D}^{(K)}_{JJ} - D^{(K)}_{JJ} \right| \lesssim n^{-1/2} \log^{3/2}(n)$$
with probability at least $1 - O(n^{-c_4})$ on $\mathcal{E}_n$. Consequently,
$$\max_{J} \left| ( \widehat{D}^{(K)} )^{-1}_{JJ} - ( D^{(K)} )^{-1}_{JJ} \right| \lesssim n^{-1/2} \log^{3/2}(n)$$
with probability at least $1 - O(n^{-c_4})$ on $\mathcal{E}_n$.
Finally, write
$$\widehat{H}^{(K)} - H^{(K)} = ( \widehat{D}^{(K)} )^{-1} \widehat{\Gamma}^{(K)} ( \widehat{D}^{(K)} )^{-1} - ( D^{(K)} )^{-1} \Gamma^{(K)} ( D^{(K)} )^{-1} = \left\{ ( \widehat{D}^{(K)} )^{-1} - ( D^{(K)} )^{-1} \right\} \widehat{\Gamma}^{(K)} ( \widehat{D}^{(K)} )^{-1} + ( D^{(K)} )^{-1} \left\{ \widehat{\Gamma}^{(K)} - \Gamma^{(K)} \right\} ( \widehat{D}^{(K)} )^{-1} + ( D^{(K)} )^{-1} \Gamma^{(K)} \left\{ ( \widehat{D}^{(K)} )^{-1} - ( D^{(K)} )^{-1} \right\}.$$
Because $K_{\max} - 1$ is finite or polynomially growing, and all diagonal entries of $D^{(K)}$ are bounded away from zero with probability tending to one, each term on the right-hand side is of order $O_P( n^{-1/2} \log^{3/2}(n) )$ in the elementwise maximum norm. Therefore,
$$\Pr\left( \max_{J,J'} \left| \left( \widehat{H}^{(K)} - H^{(K)} \right)_{J,J'} \right| \gtrsim n^{-1/2} \log^{3/2}(n) \right) \lesssim \Pr(\mathcal{E}_n^c) + n^{-c_5} \lesssim n^{-c}$$
for some positive constant $c$. This completes the proof.

S.1.2 Proof of Theorem 3.1

We ignore the Monte Carlo variability from bootstrap resampling and regard $\hat{p}_K$ as the limiting bootstrap $p$-value when the bootstrap sample size $B \to \infty$. Let $\gamma_n = n^{-1/2} \log^{3/2}(n)$ and $\eta_n = n^{-1/2} \log(n)$. Define the event
$$\mathcal{F}_n = \left\{ \| \widehat{H}^{(K)} - H^{(K)} \|_\infty \leq C_1 \gamma_n, \;\; \max_{J \in \mathcal{M} \setminus \{K\}} \left| \frac{ \hat{\delta}_{K,J} - \delta_{K,J} }{ D^{(K)}_{JJ} } \right| \leq C_1 \eta_n \right\}.$$
By Lemmas S.3 and S.4, there exists a constant $c_1 > 0$ such that $\Pr(\mathcal{F}_n^c) \lesssim n^{-c_1}$. Moreover, by the diagonal part of Lemma S.4 and the assumption $\min_{J \in \mathcal{M} \setminus \{K\}} D^{(K)}_{JJ} \geq c_0 > 0$ with probability tending to one, we have on $\mathcal{F}_n$,
$$\max_{J \in \mathcal{M} \setminus \{K\}} \left| \frac{ \widehat{D}^{(K)}_{JJ} }{ D^{(K)}_{JJ} } - 1 \right| \lesssim \gamma_n, \qquad \max_{J \in \mathcal{M} \setminus \{K\}} \left| \frac{ D^{(K)}_{JJ} }{ \widehat{D}^{(K)}_{JJ} } - 1 \right| \lesssim \gamma_n.$$
Let $G^{(K)} = \left( \frac{1}{n} \sum_{i=1}^{n} \mathrm{Cov}( \xi^{(i)}_{K,J}, \xi^{(i)}_{K,J'} \mid \mathcal{F}^O ) \right)_{J,J' \in \mathcal{M} \setminus \{K\}}$ denote the conditional covariance matrix, and define its normalized counterpart $F^{(K)} = ( D^{(K)} )^{-1} G^{(K)} ( D^{(K)} )^{-1}$. Let $T^*(A)$ denote a centered Gaussian vector with covariance matrix $A$, and let $z_{\max}(\alpha, A)$ be the upper $\alpha$-quantile of $\| T^*(A) \|_\infty$. We first prove part (1).
On $\mathcal{F}_n$,
$$\sqrt{n} \max_{J \in \mathcal{M} \setminus \{K\}} \frac{ \hat{\delta}_{K,J} }{ \widehat{D}^{(K)}_{JJ} } \leq \sqrt{n} \max_{J \in \mathcal{M} \setminus \{K\}} \left( \left| \frac{ \hat{\delta}_{K,J} - \delta_{K,J} }{ D^{(K)}_{JJ} } \right| \frac{ D^{(K)}_{JJ} }{ \widehat{D}^{(K)}_{JJ} } \right) + \sqrt{n} \max_{J \in \mathcal{M} \setminus \{K\}} \frac{ \delta_{K,J} }{ \widehat{D}^{(K)}_{JJ} } \leq \sqrt{n} \max_{J \in \mathcal{M} \setminus \{K\}} \left| \frac{ \hat{\delta}_{K,J} - \delta_{K,J} }{ D^{(K)}_{JJ} } \right| + C_2 \sqrt{n} \, \eta_n \gamma_n + \sqrt{n} \max_{J \in \mathcal{M} \setminus \{K\}} \frac{ \delta_{K,J} }{ D^{(K)}_{JJ} } ( 1 + C_2 \gamma_n )$$
for some constant $C_2 > 0$. Under the approximate null assumption,
$$\max_{J \in \mathcal{M} \setminus \{K\}} \frac{ \delta_{K,J} }{ D^{(K)}_{JJ} } \leq x_n ( n \log n )^{-1/2},$$
where $x_n = o(1)$, and therefore
$$\sqrt{n} \max_{J \in \mathcal{M} \setminus \{K\}} \frac{ \delta_{K,J} }{ D^{(K)}_{JJ} } = o\left( (\log n)^{-1/2} \right) = o(1).$$
Also,
$$\sqrt{n} \, \eta_n \gamma_n = \sqrt{n} \cdot n^{-1/2} \log(n) \cdot n^{-1/2} \log^{3/2}(n) = n^{-1/2} \log^{5/2}(n) = o(1).$$
Hence the previous display implies
$$\sqrt{n} \max_{J \in \mathcal{M} \setminus \{K\}} \frac{ \hat{\delta}_{K,J} }{ \widehat{D}^{(K)}_{JJ} } \leq \sqrt{n} \max_{J \in \mathcal{M} \setminus \{K\}} \left| \frac{ \hat{\delta}_{K,J} - \delta_{K,J} }{ D^{(K)}_{JJ} } \right| + o(1)$$
uniformly on $\mathcal{F}_n$. Now define the centered normalized sum vector
$$S_K = n^{-1/2} \sum_{i=1}^{n} \left( \frac{ \xi^{(i)}_{K,J} - \mathrm{E}( \xi^{(i)}_{K,J} \mid \mathcal{F}^O ) }{ D^{(K)}_{JJ} } \right)_{J \in \mathcal{M} \setminus \{K\}}.$$
Conditional on $\mathcal{F}^O$, Lemma S.2 provides the required tail control, and a Gaussian approximation for maxima of sums of independent random vectors yields
$$\sup_{t \in \mathbb{R}} \left| \Pr\left( \| S_K \|_\infty \leq t \mid \mathcal{F}^O \right) - \Pr\left( \| T^*( F^{(K)} ) \|_\infty \leq t \mid \mathcal{F}^O \right) \right| = o(1).$$
Consequently, for some sequence $\varpi_n \to 0$,
$$\Pr\left( \sqrt{n} \max_{J \in \mathcal{M} \setminus \{K\}} \left| \frac{ \hat{\delta}_{K,J} - \delta_{K,J} }{ D^{(K)}_{JJ} } \right| \geq z_{\max}( \alpha + \varpi_n, F^{(K)} ) \right) \leq \alpha + \varpi_n + o(1).$$
Next, note that $\Gamma^{(K)}$ and $G^{(K)}$ differ only by the outer product of the conditional means, so
$$H^{(K)} - F^{(K)} = \mu^{(K)} ( \mu^{(K)} )^\top, \quad \text{where} \quad \mu^{(K)} = \left( \frac{ \delta_{K,J} }{ D^{(K)}_{JJ} } \right)_{J \in \mathcal{M} \setminus \{K\}}.$$
Under the approximate null assumption, $\| \mu^{(K)} \|_\infty \leq x_n ( n \log n )^{-1/2} = o(1)$. Therefore, $\| H^{(K)} - F^{(K)} \|_\infty = o(1)$. By the continuity of the Gaussian max-quantiles with respect to the covariance matrix entries, this implies
$$z_{\max}( \alpha + \varpi_n, H^{(K)} ) \geq z_{\max}( \alpha + \varpi_n, F^{(K)} ) - o(1).$$
Furthermore, on $\mathcal{F}_n$, we have $\| \widehat{H}^{(K)} - H^{(K)} \|_\infty \leq C_1 \gamma_n = o(1)$, and therefore
$$z_{\max}( \alpha, \widehat{H}^{(K)} ) \geq z_{\max}( \alpha + \varpi_n, H^{(K)} ) - o(1)$$
for a possibly different sequence $\varpi_n \to 0$.
Combining the previous displays, we obtain
$$\Pr\{ \hat{p}_K \leq \alpha \} = \Pr\left( \sqrt{n} \max_{J \in \mathcal{M} \setminus \{K\}} \frac{ \hat{\delta}_{K,J} }{ \widehat{D}^{(K)}_{JJ} } \geq z_{\max}( \alpha, \widehat{H}^{(K)} ) \right) \leq \alpha + o(1).$$
Equivalently,
$$\Pr\{ H_{0,K} \text{ is not rejected at level } \alpha \} \geq 1 - \alpha + o(1).$$
We now prove part (2). Suppose there exists some $J^* \in \mathcal{M} \setminus \{K\}$ such that
$$\frac{ \delta_{K,J^*} }{ D^{(K)}_{J^*J^*} } \geq c \, n^{-1/2} \log(n)$$
for a sufficiently large constant $c > 0$. On $\mathcal{F}_n$,
$$\frac{ \hat{\delta}_{K,J^*} }{ \widehat{D}^{(K)}_{J^*J^*} } = \left( \frac{ \delta_{K,J^*} }{ D^{(K)}_{J^*J^*} } + \frac{ \hat{\delta}_{K,J^*} - \delta_{K,J^*} }{ D^{(K)}_{J^*J^*} } \right) \frac{ D^{(K)}_{J^*J^*} }{ \widehat{D}^{(K)}_{J^*J^*} }.$$
Hence, on $\mathcal{F}_n$,
$$\frac{ \hat{\delta}_{K,J^*} }{ \widehat{D}^{(K)}_{J^*J^*} } \geq \left( c \, n^{-1/2} \log(n) - C_1 n^{-1/2} \log(n) \right) ( 1 - C_2 \gamma_n ).$$
By choosing $c$ sufficiently large, we obtain for all sufficiently large $n$,
$$\frac{ \hat{\delta}_{K,J^*} }{ \widehat{D}^{(K)}_{J^*J^*} } \geq c_2 \, n^{-1/2} \log(n)$$
for some constant $c_2 > 0$. Therefore,
$$\sqrt{n} \max_{J \in \mathcal{M} \setminus \{K\}} \frac{ \hat{\delta}_{K,J} }{ \widehat{D}^{(K)}_{JJ} } \geq c_2 \log(n)$$
on $\mathcal{F}_n$. On the other hand, since $\alpha \geq n^{-1}$ and $K_{\max} \asymp \log(n)$, a union bound together with Mills' inequality implies
$$z_{\max}( \alpha, \widehat{H}^{(K)} ) \lesssim \sqrt{ \log( K_{\max} ) + \log( \alpha^{-1} ) } \lesssim \sqrt{\log(n)}$$
with probability tending to one. Consequently, on $\mathcal{F}_n$ and for all sufficiently large $n$,
$$\sqrt{n} \max_{J \in \mathcal{M} \setminus \{K\}} \frac{ \hat{\delta}_{K,J} }{ \widehat{D}^{(K)}_{JJ} } > z_{\max}( \alpha, \widehat{H}^{(K)} ).$$
Thus,
$$\Pr\{ \hat{p}_K \leq \alpha \} \geq \Pr( \mathcal{F}_n ) - o(1) = 1 - o(1),$$
which is equivalent to
$$\Pr\{ H_{0,K} \text{ is not rejected at level } \alpha \} = o(1).$$
This completes the proof.

S.2 Proofs for Theorems in Section 3.2

Lemma S.1. Suppose that Conditions 3.1–3.5 hold. Then,
(i) $\max_{K \in \mathcal{M}} \mathrm{E}\{ \mathcal{S}_{\epsilon^E}( \mathcal{T}_K ) - \mathcal{S}_{\epsilon^E}( \mathcal{T}_K \cup \mathcal{T}^* ) \} = o( M_1 \log\log \bar{\lambda} )$.
(ii) $\max_{K \in \mathcal{M}} \mathrm{E}\{ \mathcal{S}_{\epsilon^E}( \mathcal{T}^* ) - \mathcal{S}_{\epsilon^E}( \mathcal{T}_K \cup \mathcal{T}^* ) \} = o( M_1 \log\log \bar{\lambda} )$.
Proof. For a fixed $K$, let $m_k$ be the number of true change-points strictly between $\tau^K_k$ and $\tau^K_{k+1}$. We denote the merged partition points within this interval as $\tau^K_{k,0} := \tau^K_k < \tau^K_{k,1} < \dots < \tau^K_{k,m_k} < \tau^K_{k,m_k+1} := \tau^K_{k+1}$. Given $Z^O$, $\mathcal{T}_K$ is a fixed change-point set.
According to Lemma 14 in Pein and Shah (2025), we have
$$\mathrm{E}\{ \mathcal{S}_{\epsilon^E}( \mathcal{T}_K ) - \mathcal{S}_{\epsilon^E}( \mathcal{T}_K \cup \mathcal{T}^* ) \} \leq 2 \sum_{k=0}^{K} \sum_{l=0}^{m_k} \left( \tau^K_{k,l+1} - \tau^K_{k,l} \right) \mathrm{E}\left( \bar{\epsilon}_{\tau^K_{k,l} : \tau^K_{k,l+1}} \right)^2.$$
Based on the independence of the errors $\epsilon_i$, for each term we have
$$\left( \tau^K_{k,l+1} - \tau^K_{k,l} \right) \mathrm{E}\left( \bar{\epsilon}_{\tau^K_{k,l} : \tau^K_{k,l+1}} \right)^2 = \left( \tau^K_{k,l+1} - \tau^K_{k,l} \right)^{-1} \sum_{i = \tau^K_{k,l}+1}^{\tau^K_{k,l+1}} \mathrm{E}[ \epsilon_i^2 ] \leq M_1.$$
Note that the total number of segments in the merged partition $\mathcal{T}_K \cup \mathcal{T}^*$ is exactly $\sum_{k=0}^{K} ( m_k + 1 ) \leq K + K^* + 1 \leq 2 K_{\max} + 1$. Hence, we have
$$\mathrm{E}\{ \mathcal{S}_{\epsilon^E}( \mathcal{T}_K ) - \mathcal{S}_{\epsilon^E}( \mathcal{T}_K \cup \mathcal{T}^* ) \} \leq 2 M_1 ( 2 K_{\max} + 1 ) \lesssim M_1 K_{\max}.$$
Therefore,
$$\max_{K \in \mathcal{M}} \mathrm{E}\{ \mathcal{S}_{\epsilon^E}( \mathcal{T}_K ) - \mathcal{S}_{\epsilon^E}( \mathcal{T}_K \cup \mathcal{T}^* ) \} \lesssim M_1 K_{\max}.$$
By assuming $K_{\max} = o( \log\log \bar{\lambda} )$ in Condition 3.3(iii), we obtain $M_1 K_{\max} = o( M_1 \log\log \bar{\lambda} )$, which implies statement (i). The exact same deduction applies to the true change-point set $\mathcal{T}^*$, yielding statement (ii), which we omit here.

Let $\mathcal{T}^*$ be the true change positions, and let $\mathcal{T}_{K^*}$ be the estimated change positions when the number of change-points is correctly specified.

Lemma S.2. Suppose that Conditions 3.1–3.5 hold. Then,
(i) $\min_{K \in \mathcal{M}_l} \mathrm{E}\{ \mathcal{S}_{s^E}( \mathcal{T}_K ) - \mathcal{S}_{s^E}( \mathcal{T}^* ) \} \geq \lambda M_3 \min_{K \in \mathcal{M}_l} \sum_{\tau^*_k \in \mathcal{I}^*_{lK}} \Delta_k^2$.
(ii) $\max_{K \in \mathcal{M}} \mathrm{E}\{ \mathcal{S}_{s^E}( \mathcal{T}^* ) - \mathcal{S}_{s^E}( \mathcal{T}_K \cup \mathcal{T}^* ) \} = o( M_1 \log\log \bar{\lambda} )$.
(iii) $\mathrm{E}\{ \mathcal{S}_{s^E}( \mathcal{T}_{K^*} ) - \mathcal{S}_{s^E}( \mathcal{T}^* ) \} = o( M_1 \log\log \bar{\lambda} )$.
Proof. The proof of this lemma is inspired by Lemma 17 in Pein and Shah (2025). For statement (ii), we observe that $\mathcal{S}_{s^E}( \mathcal{T}^* ) - \mathcal{S}_{s^E}( \mathcal{T}_K \cup \mathcal{T}^* ) = \mathcal{S}_{\epsilon^E}( \mathcal{T}^* ) - \mathcal{S}_{\epsilon^E}( \mathcal{T}_K \cup \mathcal{T}^* )$, since the true signal $\mu_i$ is constant on both partitions and cancels out. Hence, statement (ii) follows directly from Lemma S.1(ii). Now, we show (iii).
By definition,
$$\mathcal{S}_{s^E}( \mathcal{T}_{K^*} ) = \sum_{k=0}^{K^*} \sum_{i = \tau^{K^*}_k + 1}^{\tau^{K^*}_{k+1}} \left( s^E_i - \bar{s}^E_{\tau^{K^*}_k : \tau^{K^*}_{k+1}} \right)^2 = \sum_{k=0}^{K^*} \sum_{i = \tau^{K^*}_k + 1}^{\tau^{K^*}_{k+1}} \left( \mu_i + \epsilon^E_i - \bar{\mu}_{\tau^{K^*}_k : \tau^{K^*}_{k+1}} - \bar{\epsilon}^E_{\tau^{K^*}_k : \tau^{K^*}_{k+1}} \right)^2 = \sum_{k=0}^{K^*} \sum_{i = \tau^{K^*}_k + 1}^{\tau^{K^*}_{k+1}} \left( \mu_i - \bar{\mu}_{\tau^{K^*}_k : \tau^{K^*}_{k+1}} \right)^2 + 2 \sum_{k=0}^{K^*} \sum_{i = \tau^{K^*}_k + 1}^{\tau^{K^*}_{k+1}} \epsilon^E_i \left( \mu_i - \bar{\mu}_{\tau^{K^*}_k : \tau^{K^*}_{k+1}} \right) + \sum_{i=1}^{n} ( \epsilon^E_i )^2 - \sum_{k=0}^{K^*} \left( \tau^{K^*}_{k+1} - \tau^{K^*}_k \right) \left( \bar{\epsilon}^E_{\tau^{K^*}_k : \tau^{K^*}_{k+1}} \right)^2. \tag{S.1}$$
Similarly, for $\mathcal{S}_{s^E}( \mathcal{T}^* )$,
$$\mathcal{S}_{s^E}( \mathcal{T}^* ) = \sum_{k=0}^{K^*} \sum_{i = \tau^*_k + 1}^{\tau^*_{k+1}} \left( s^E_i - \bar{s}^E_{\tau^*_k : \tau^*_{k+1}} \right)^2 = \sum_{k=0}^{K^*} \sum_{i = \tau^*_k + 1}^{\tau^*_{k+1}} \left( \epsilon^E_i - \bar{\epsilon}^E_{\tau^*_k : \tau^*_{k+1}} \right)^2 = \sum_{i=1}^{n} ( \epsilon^E_i )^2 - \sum_{k=0}^{K^*} \left( \tau^*_{k+1} - \tau^*_k \right) \left( \bar{\epsilon}^E_{\tau^*_k : \tau^*_{k+1}} \right)^2.$$
The expectation of their difference is
$$\mathrm{E}\{ \mathcal{S}_{s^E}( \mathcal{T}_{K^*} ) - \mathcal{S}_{s^E}( \mathcal{T}^* ) \} = \sum_{k=0}^{K^*} \sum_{i = \tau^{K^*}_k + 1}^{\tau^{K^*}_{k+1}} \left( \mu_i - \bar{\mu}_{\tau^{K^*}_k : \tau^{K^*}_{k+1}} \right)^2 - \sum_{k=0}^{K^*} \left( \tau^{K^*}_{k+1} - \tau^{K^*}_k \right) \mathrm{E}\left( \bar{\epsilon}^E_{\tau^{K^*}_k : \tau^{K^*}_{k+1}} \right)^2 + \sum_{k=0}^{K^*} \left( \tau^*_{k+1} - \tau^*_k \right) \mathrm{E}\left( \bar{\epsilon}^E_{\tau^*_k : \tau^*_{k+1}} \right)^2 := A_1 - A_2 + A_3.$$
For $A_1$, according to the deduction in Lemma 17 in Pein and Shah (2025), the first term is $A_1 = O\left( \sum_{k=1}^{K^*} b_n \Delta_k^2 \right)$. For $A_2$, we notice that by Condition 3.1,
$$A_2 = \sum_{k=0}^{K^*} \left( \tau^{K^*}_{k+1} - \tau^{K^*}_k \right)^{-1} \sum_{i = \tau^{K^*}_k + 1}^{\tau^{K^*}_{k+1}} \mathrm{E}( \epsilon_i^2 ) \lesssim M_1 K^*.$$
The exact same reasoning gives $A_3 \lesssim M_1 K^*$. Hence,
$$\mathrm{E}\{ \mathcal{S}_{s^E}( \mathcal{T}_{K^*} ) - \mathcal{S}_{s^E}( \mathcal{T}^* ) \} \lesssim \sum_{k=1}^{K^*} b_n \Delta_k^2 + M_1 K^*.$$
Statement (iii) then follows from Condition 3.4(i), which guarantees $\sum_{k=1}^{K^*} b_n \Delta_k^2 = o( M_1 \log\log \bar{\lambda} )$, and Condition 3.3(i), which ensures $K^* = o( \log\log \bar{\lambda} )$. We now show statement (i). Based on a deduction similar to (S.1), we can decompose the difference directly as:
$$\mathrm{E}\{ \mathcal{S}_{s^E}( \mathcal{T}_K ) - \mathcal{S}_{s^E}( \mathcal{T}^* ) \} = \sum_{k=0}^{K} \sum_{i = \tau^K_k + 1}^{\tau^K_{k+1}} \left( \mu_i - \bar{\mu}_{\tau^K_k : \tau^K_{k+1}} \right)^2 + \mathrm{E}\{ \mathcal{S}_{\epsilon^E}( \mathcal{T}_K ) - \mathcal{S}_{\epsilon^E}( \mathcal{T}^* ) \}.$$
For the noise component, Lemma S.1 yields
$$\max_{K \in \mathcal{M}_l} \left| \mathrm{E}\{ \mathcal{S}_{\epsilon^E}( \mathcal{T}_K ) - \mathcal{S}_{\epsilon^E}( \mathcal{T}^* ) \} \right| \leq \max_{K \in \mathcal{M}_l} \mathrm{E}\{ \mathcal{S}_{\epsilon^E}( \mathcal{T}_K ) - \mathcal{S}_{\epsilon^E}( \mathcal{T}_K \cup \mathcal{T}^* ) \} + \max_{K \in \mathcal{M}_l} \mathrm{E}\{ \mathcal{S}_{\epsilon^E}( \mathcal{T}^* ) - \mathcal{S}_{\epsilon^E}( \mathcal{T}_K \cup \mathcal{T}^* ) \} = o( M_1 \log\log \bar{\lambda} ). \tag{S.2}$$
For the signal component, since $K \in \mathcal{M}_l$ is a lack-of-fit model, Condition 3.4(ii) guarantees that there exists a subset of change-points $\tau^*_k \in \mathcal{I}^*_{lK}$ such that no estimated change-point lies within $[ \tau^*_k - \lambda/4, \, \tau^*_k + \lambda/4 ]$. Hence, using Condition 3.1,
$$\min_{K \in \mathcal{M}_l} \sum_{k=0}^{K} \sum_{i = \tau^K_k + 1}^{\tau^K_{k+1}} \left( \mu_i - \bar{\mu}_{\tau^K_k : \tau^K_{k+1}} \right)^2 \geq \min_{K \in \mathcal{M}_l} \sum_{\tau^*_k \in \mathcal{I}^*_{lK}} \sum_{i = \tau^*_k - \lambda/4 + 1}^{\tau^*_k + \lambda/4} \left( \mu_i - \bar{\mu}_{\tau^K_k : \tau^K_{k+1}} \right)^2 \geq \min_{K \in \mathcal{M}_l} M_3 \lambda \sum_{\tau^*_k \in \mathcal{I}^*_{lK}} \Delta_k^2.$$
Summarizing the above results, we have
$$\min_{K \in \mathcal{M}_l} \mathrm{E}\{ \mathcal{S}_{s^E}( \mathcal{T}_K ) - \mathcal{S}_{s^E}( \mathcal{T}^* ) \} \geq \min_{K \in \mathcal{M}_l} M_3 \lambda \sum_{\tau^*_k \in \mathcal{I}^*_{lK}} \Delta_k^2 - o( M_1 \log\log \bar{\lambda} ).$$
By Condition 3.5, the right-hand side is strictly dominated by the positive first term. Hence, statement (i) follows.

S.2.1 Proof of Theorem 3.2

We now turn to the main theorem. Our strategy is to apply Theorem 3.1, which requires showing that
$$\max_{K \in \mathcal{M} \setminus \{K^*\}} \frac{ \delta_{K^*,K} }{ \sigma_{K^*,K} } \leq x_n \sqrt{ \frac{1}{n \log(n)} }$$
for sufficiently large $n$. Recall that $\sigma^2_{K^*,K}$ can be lower bounded as follows:
$$\sigma^2_{K^*,K} \geq \frac{1}{n} \sum_{i=1}^{n} \mathrm{Var}\left( \| s^E_i - \bar{s}^O_{K^*,i} \|_2^2 - \| s^E_i - \bar{s}^O_{K,i} \|_2^2 \right) = \frac{1}{n} \sum_{i=1}^{n} \mathrm{Var}\left( \| \bar{s}^O_{K^*,i} \|_2^2 - \| \bar{s}^O_{K,i} \|_2^2 - 2 (s^E_i)^\top ( \bar{s}^O_{K^*,i} - \bar{s}^O_{K,i} ) \right) = \frac{4}{n} \sum_{i=1}^{n} ( \bar{s}^O_{K^*,i} - \bar{s}^O_{K,i} )^\top \mathrm{Var}( s^E_i ) \, ( \bar{s}^O_{K^*,i} - \bar{s}^O_{K,i} ).$$
Hence, for any $K \neq K^*$, we have $\sigma^2_{K^*,K} \geq \frac{4 M_2}{n} \sum_{i=1}^{n} \| \bar{s}^O_{K^*,i} - \bar{s}^O_{K,i} \|_2^2$. By the assumptions stated in the theorem, $\min_{K \neq K^*} \max_{i = 1, \dots, n} \| \bar{s}^O_{K^*,i} - \bar{s}^O_{K,i} \|_2^2 \gtrsim \Delta^2_{(K^*)}$ with asymptotic probability 1.
Furthermore, because the estimated change-point sets are nested and the distance between adjoining change-points is at least $c n / \log(n)$, this maximum discrepancy is attained on at least order $n/\log(n)$ indices; summing over these indices and dividing by the total $n$ yields a factor of $1/\log(n)$. Thus,
$$\sigma^2_{K^*,K} \gtrsim M_2 \log^{-1}(n) \, \Delta^2_{(K^*)},$$
which implies
$$\sigma_{K^*,K} \gtrsim \sqrt{M_2} \, \Delta_{(K^*)} \frac{1}{\sqrt{\log(n)}}$$
for sufficiently large $n$. Because $\delta_{K^*,K} = - \delta_{K,K^*}$, obtaining an upper bound for the ratio $\delta_{K^*,K} / \sigma_{K^*,K}$ is equivalent to bounding $\delta_{K,K^*}$ from below. Therefore, using our bound on $\sigma_{K^*,K}$, it suffices to show that
$$\min_{K \in \mathcal{M} \setminus \{K^*\}} \delta_{K,K^*} \gtrsim - \sqrt{M_2} \, x_n \Delta_{(K^*)} \frac{1}{\sqrt{n \log^2(n)}} \tag{S.3}$$
with asymptotic probability 1. Let $\mathcal{S}_{x,y}( \mathcal{T}_K ) = \sum_{k=0}^{K} \sum_{i = \tau_k + 1}^{\tau_{k+1}} ( x_i - \bar{x}_{\tau_k, \tau_{k+1}} )( y_i - \bar{y}_{\tau_k, \tau_{k+1}} )$. Taking the expectation $\mathrm{E}^E[\cdot]$ with respect to the even sample (conditional on the odd sample, so that $\mathcal{T}_K$ is fixed), algebraic expansion yields:
$$n \delta_{K,K^*} = \mathrm{E}^E\left[ \mathcal{S}_{s^E}( \mathcal{T}_K ) - \mathcal{S}_{s^E}( \mathcal{T}_{K^*} ) \right] - \left\{ \mathcal{S}_{\epsilon^O}( \mathcal{T}_K ) - \mathcal{S}_{\epsilon^O}( \mathcal{T}_{K^*} ) \right\} - \mathrm{E}^E\left[ \mathcal{S}_{\epsilon^E}( \mathcal{T}_K ) - \mathcal{S}_{\epsilon^E}( \mathcal{T}_{K^*} ) \right] + 2 \, \mathrm{E}^E\left[ \mathcal{S}_{\epsilon^O, \epsilon^E}( \mathcal{T}_K ) - \mathcal{S}_{\epsilon^O, \epsilon^E}( \mathcal{T}_{K^*} ) \right] := A_K - B_K - C_K + 2 D_K.$$
In the following, we bound $A_K$, $B_K$, $C_K$, and $D_K$ so that (S.3) holds. We consider two cases: lack-of-fit $K \in \mathcal{M}_l$ (controlled primarily via the minimum jump $\Delta_{(1)}$) and over-fit $K \in \mathcal{M}_o$ (controlled via the maximum jump $\Delta_{(K^*)}$).

Step 1 (Lack-of-fit scenario). First, we focus on $K \in \mathcal{M}_l$. For the signal term $A_K$, we decompose it as:
$$\min_{K \in \mathcal{M}_l} A_K = \min_{K \in \mathcal{M}_l} \mathrm{E}^E\{ \mathcal{S}_{s^E}( \mathcal{T}_K ) - \mathcal{S}_{s^E}( \mathcal{T}^* ) \} - \mathrm{E}^E\{ \mathcal{S}_{s^E}( \mathcal{T}_{K^*} ) - \mathcal{S}_{s^E}( \mathcal{T}^* ) \}.$$
Based on Lemma S.2(i) and Condition 3.5 (which ensures sufficient signal strength via the minimum jump size $\Delta_{(1)}$),
$$\min_{K \in \mathcal{M}_l} \mathrm{E}^E\{ \mathcal{S}_{s^E}( \mathcal{T}_K ) - \mathcal{S}_{s^E}( \mathcal{T}^* ) \} \geq \lambda M_3 \min_{K \in \mathcal{M}_l} \sum_{k \in \mathcal{I}^*_{lK}} \Delta_k^2.$$
The second term is bounded by $o( M_1 \log\log \bar{\lambda} )$ according to Lemma S.2(iii).
Hence,
$$\min_{K \in \mathcal{M}_l} A_K \geq \lambda M_3 \min_{K \in \mathcal{M}_l} \sum_{k \in \mathcal{I}^*_{lK}} \Delta_k^2 - o( M_1 \log\log \bar{\lambda} ).$$
To lower bound $-B_K$, we bound the maximum absolute deviation of $B_K$ using the triangle inequality over the merged partition $\mathcal{T}_K \cup \mathcal{T}^*$:
$$\max_{K \in \mathcal{M}_l} | B_K | \leq \max_{K \in \mathcal{M}_l} \left| \mathcal{S}_{\epsilon^O}( \mathcal{T}_K ) - \mathcal{S}_{\epsilon^O}( \mathcal{T}_K \cup \mathcal{T}^* ) \right| + \max_{K \in \mathcal{M}_l} \left| \mathcal{S}_{\epsilon^O}( \mathcal{T}^* ) - \mathcal{S}_{\epsilon^O}( \mathcal{T}_K \cup \mathcal{T}^* ) \right| + \left| \mathcal{S}_{\epsilon^O}( \mathcal{T}^* ) - \mathcal{S}_{\epsilon^O}( \mathcal{T}_{K^*} \cup \mathcal{T}^* ) \right| + \left| \mathcal{S}_{\epsilon^O}( \mathcal{T}_{K^*} ) - \mathcal{S}_{\epsilon^O}( \mathcal{T}_{K^*} \cup \mathcal{T}^* ) \right|.$$
It follows from Lemma 18 in Pein and Shah (2025) that $\max_{K \in \mathcal{M}_l} | B_K | = O_P( M_1 K^* ( \log \bar{\lambda} )^2 )$. For the third term, Lemma S.2 yields $\max_{K \in \mathcal{M}_l} | C_K | = o( M_1 \log\log \bar{\lambda} )$. For the cross term $D_K$, applying a similar triangle-inequality expansion and Lemma 15 in Pein and Shah (2025), we have $\max_{K \in \mathcal{M}_l} | D_K | \leq o( M_1 \log\log \bar{\lambda} ) + O_P( K^* M_1 ( \log \bar{\lambda} )^2 )$. Noting that $M_1 \log\log \bar{\lambda} \lesssim M_1 K^* ( \log \bar{\lambda} )^2$, we combine the above bounds to obtain:
$$n \delta_{K,K^*} \gtrsim \lambda M_3 \min_{K \in \mathcal{M}_l} \sum_{k \in \mathcal{I}^*_{lK}} \Delta_k^2 - M_1 K^* ( \log \bar{\lambda} )^2 \tag{S.4}$$
with asymptotic probability 1 as $n \to \infty$. According to Condition 3.5, the signal term strictly dominates the noise term, guaranteeing that $n \delta_{K,K^*} > 0$. Because $\delta_{K,K^*} > 0$ implies $\delta_{K^*,K} < 0$, the ratio $\delta_{K^*,K} / \sigma_{K^*,K}$ is strictly negative. Thus, under-fit models trivially satisfy the approximate-null upper-bound requirement of Theorem 3.1.

Step 2 (Over-fit scenario). Next, we consider $K \in \mathcal{M}_o$. For the signal term $A_K$,
$$\min_{K \in \mathcal{M}_o} A_K = \min_{K \in \mathcal{M}_o} \mathrm{E}^E\{ \mathcal{S}_{s^E}( \mathcal{T}_K ) - \mathcal{S}_{s^E}( \mathcal{T}_{K^*} ) \} \geq \min_{K \in \mathcal{M}_o} \mathrm{E}^E\{ \mathcal{S}_{s^E}( \mathcal{T}_K ) - \mathcal{S}_{s^E}( \mathcal{T}_K \cup \mathcal{T}^* ) \} - \max_{K \in \mathcal{M}_o} \mathrm{E}^E\{ \mathcal{S}_{s^E}( \mathcal{T}^* ) - \mathcal{S}_{s^E}( \mathcal{T}_K \cup \mathcal{T}^* ) \} - \mathrm{E}^E\{ \mathcal{S}_{s^E}( \mathcal{T}_{K^*} ) - \mathcal{S}_{s^E}( \mathcal{T}^* ) \}.$$
Based on Lemma S.2(ii) and (iii), the last two terms are both $o( M_1 \log\log \bar{\lambda} )$. Hence,
$$\min_{K \in \mathcal{M}_o} A_K \geq \min_{K \in \mathcal{M}_o} \mathrm{E}^E\{ \mathcal{S}_{s^E}( \mathcal{T}_K ) - \mathcal{S}_{s^E}( \mathcal{T}_K \cup \mathcal{T}^* ) \} - o( M_1 \log\log \bar{\lambda} ).$$
For the second term $B_K$, applying the decomposition through $\mathcal{T}_K \cup \mathcal{T}^*$ and using Lemma 18 in Pein and Shah (2025), we obtain:
$$\min_{K \in \mathcal{M}_o} B_K \geq \min_{K \in \mathcal{M}_o} \left\{ \mathcal{S}_{\epsilon^O}( \mathcal{T}^* ) - \mathcal{S}_{\epsilon^O}( \mathcal{T}_K \cup \mathcal{T}^* ) \right\} - o_P( M_1 \log\log \bar{\lambda} ).$$
For the third term $C_K$, a similar decomposition using Lemma S.1 gives $\max_{K \in \mathcal{M}_o} | C_K | = O( M_1 \log\log \bar{\lambda} )$. For the last term $D_K$, according to Lemma 19 in Pein and Shah (2025), we have
$$\min_{K \in \mathcal{M}_o} D_K \geq - o\left( \min_{K \in \mathcal{M}_o} \left\{ \mathcal{S}_{\epsilon^O}( \mathcal{T}^* ) - \mathcal{S}_{\epsilon^O}( \mathcal{T}_K \cup \mathcal{T}^* ) \right\} \right) - o_P( M_1 \log\log \bar{\lambda} ).$$
Combining these elements, for sufficiently large $n$ we have:
$$\min_{K \in \mathcal{M}_o} n \delta_{K,K^*} \gtrsim \min_{K \in \mathcal{M}_o} \mathrm{E}^E\{ \mathcal{S}_{s^E}( \mathcal{T}_K ) - \mathcal{S}_{s^E}( \mathcal{T}_K \cup \mathcal{T}^* ) \} + \min_{K \in \mathcal{M}_o} \left\{ \mathcal{S}_{\epsilon^O}( \mathcal{T}^* ) - \mathcal{S}_{\epsilon^O}( \mathcal{T}_K \cup \mathcal{T}^* ) \right\} - M_1 \log\log \bar{\lambda} \gtrsim - M_1 \log\log \bar{\lambda}. \tag{S.5}$$
The final inequality holds with asymptotic probability 1 because the terms $\mathrm{E}^E\{ \mathcal{S}_{s^E}( \mathcal{T}_K ) - \mathcal{S}_{s^E}( \mathcal{T}_K \cup \mathcal{T}^* ) \}$ and $\{ \mathcal{S}_{\epsilon^O}( \mathcal{T}^* ) - \mathcal{S}_{\epsilon^O}( \mathcal{T}_K \cup \mathcal{T}^* ) \}$ represent, respectively, the deterministic reduction in the signal sum of squares and the realized improvement in the noise sum of squares from partition refinement, both of which are bounded below by zero. Dividing by $n$, we obtain $\min_{K \in \mathcal{M}_o} \delta_{K,K^*} \gtrsim - M_1 \log\log \bar{\lambda} / n$. To satisfy condition (S.3), we require:
$$\frac{ M_1 \log\log \bar{\lambda} }{ n } \leq \sqrt{M_2} \, x_n \Delta_{(K^*)} \frac{1}{\sqrt{n \log^2(n)}},$$
which simplifies to the requirement that
$$\Delta_{(K^*)} \gtrsim x_n^{-1} \log\log( \bar{\lambda} ) \frac{ \log(n) }{ \sqrt{n} }.$$
Squaring this yields the necessary lower bound on the maximum jump size given in the theorem statement, implying $\Pr\{ K^* \in \mathcal{A} \} \geq 1 - \alpha + o(1)$.

S.2.2 Proof of Theorem 3.3

Let $\mathcal{B}^1_n$ and $\mathcal{B}^2_n$ be defined as in Definition 3.1. First, we establish an upper bound for the standard deviation $\sigma_{K,K^*}$. Recalling the variance expansion, we have:
$$\sigma^2_{K,K^*} \lesssim \frac{1}{n} \sum_{i=1}^{n} \| \bar{s}^O_{K,i} - \bar{s}^O_{K^*,i} \|_2^2.$$
Under the theorem's assumption that $\max_{i = 1, \dots, n} \| \bar{s}^O_{K,i} - \bar{s}^O_{K^*,i} \|_2^2 \lesssim \Delta^2_{(K^*)}$ for all $K$, this yields $\sigma^2_{K,K^*} \lesssim \Delta^2_{(K^*)}$.
Therefore, taking the square root gives $\sigma_{K,K^*} \lesssim \Delta_{(K^*)}$ with asymptotic probability 1. Next, we establish the behavior of $\delta_{K,K^*}$ for models in $\mathcal{B}^1_n$ and $\mathcal{B}^2_n$ separately.

Case 1: $K \in \mathcal{B}^1_n \subset \mathcal{M}_l$. From the lack-of-fit bound established in equation (S.4), we have:
$$n \delta_{K,K^*} \gtrsim \lambda M_3 \sum_{k \in \mathcal{I}^*_{lK}} \Delta_k^2 - M_1 K^* ( \log \bar{\lambda} )^2.$$
For $K \in \mathcal{B}^1_n$, the definition of the set implies that the signal term $\lambda \sum_{k \in \mathcal{I}^*_{lK}} \Delta_k^2$ dominates the noise remainder $M_1 K^* ( \log \bar{\lambda} )^2$, yielding $n \delta_{K,K^*} \gtrsim M_1 \Delta^2_{(K^*)} \sqrt{n \log(n)}$. Dividing by $n$, we obtain
$$\delta_{K,K^*} \gtrsim M_1 \Delta^2_{(K^*)} \sqrt{ \frac{ \log(n) }{ n } }.$$

Case 2: $K \in \mathcal{B}^2_n \subset \mathcal{M}_o$. From the over-fit bound established in equation (S.5), we have:
$$n \delta_{K,K^*} \gtrsim \mathrm{E}^E\{ \mathcal{S}_{s^E}( \mathcal{T}_K ) - \mathcal{S}_{s^E}( \mathcal{T}_K \cup \mathcal{T}^* ) \} + \left\{ \mathcal{S}_{\epsilon^O}( \mathcal{T}^* ) - \mathcal{S}_{\epsilon^O}( \mathcal{T}_K \cup \mathcal{T}^* ) \right\} - M_1 \log\log \bar{\lambda}.$$
For $K \in \mathcal{B}^2_n$, the definition of the set ensures that the first two terms exceed the stochastic remainder by a sufficient margin, yielding $n \delta_{K,K^*} \gtrsim M_1 \Delta^2_{(K^*)} \sqrt{n \log(n)}$. Dividing by $n$, we again obtain
$$\delta_{K,K^*} \gtrsim M_1 \Delta^2_{(K^*)} \sqrt{ \frac{ \log(n) }{ n } }.$$
Therefore, for any $K \in \mathcal{B}^1_n \cup \mathcal{B}^2_n$, combining the lower bound on $\delta_{K,K^*}$ with the upper bound on $\sigma_{K,K^*}$ yields:
$$\max_{J \neq K} \frac{ \delta_{K,J} }{ \sigma_{K,J} } \geq \frac{ \delta_{K,K^*} }{ \sigma_{K,K^*} } \gtrsim \frac{ M_1 \Delta^2_{(K^*)} \sqrt{ \log(n)/n } }{ \Delta_{(K^*)} } = M_1 \Delta_{(K^*)} \sqrt{ \frac{ \log(n) }{ n } }.$$
Because $\Delta_{(K^*)} \sqrt{ \log(n)/n } \gg x_n / \sqrt{ n \log(n) }$ for any $x_n = o(1)$, the ratio strictly violates the threshold required for inclusion in $\mathcal{A}$. Consequently, every model $K \in \mathcal{B}^1_n \cup \mathcal{B}^2_n$ is excluded from $\mathcal{A}$ with probability tending to 1. It immediately follows that at least $| \mathcal{B}^1_n | + | \mathcal{B}^2_n |$ candidate models are absent from the selected set, establishing $\Pr\{ | \mathcal{A} | \leq K_{\max} - | \mathcal{B}^1_n | - | \mathcal{B}^2_n | \} \geq 1 - o(1)$.
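The quantile $z_{\max}(\alpha, A)$, the upper $\alpha$-quantile of $\| T^*(A) \|_\infty$ for a centered Gaussian vector $T^*(A)$, appears throughout the proofs above. A minimal Monte Carlo sketch of this quantity (the function name, jitter term, and draw count are our own choices, not from the paper):

```python
import numpy as np

def z_max(alpha, H, n_draws=100_000, seed=0):
    """Approximate the upper-alpha quantile of ||T*||_inf, where
    T* ~ N(0, H), by direct Monte Carlo simulation (the limit the
    proofs take as the bootstrap size B -> infinity)."""
    rng = np.random.default_rng(seed)
    # Small jitter keeps the Cholesky factorization stable if H is
    # only positive semi-definite.
    L = np.linalg.cholesky(H + 1e-10 * np.eye(H.shape[0]))
    # Rows of T are i.i.d. draws of T* ~ N(0, H).
    T = rng.standard_normal((n_draws, H.shape[0])) @ L.T
    return np.quantile(np.abs(T).max(axis=1), 1 - alpha)

# With H = I_5, ||T*||_inf is the max of 5 independent |N(0,1)|'s,
# so z_max(0.05, I_5) should be close to 2.57.
H = np.eye(5)
q = z_max(0.05, H)
```

This also illustrates the $z_{\max}(\alpha, \widehat{H}^{(K)}) \lesssim \sqrt{\log(K_{\max}) + \log(\alpha^{-1})}$ bound used in part (2) of Theorem 3.1: the quantile grows only logarithmically in the dimension and in $\alpha^{-1}$.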
S.3 Additional related literature

In the change-point detection literature, while there is no shortage of research on statistical inference related to the number of change-points, the majority, if not all, of these works have focused on controlling either the familywise error rate (FWER) or the false discovery rate (FDR). Frick et al. (2014) first introduced a novel method called SMUCE to control the FWER for data from a one-dimensional exponential family. Pein et al. (2017) extended this approach to the heterogeneous change-point detection problem. The FWER in the change-point detection context is typically defined as the probability that the estimated number of change-points $\hat{K}$ strictly exceeds the true number $K^*$, i.e., $\Pr\{ \hat{K} > K^* \}$. By specifying a small predetermined level $\alpha$ for the FWER, one can have certain confidence that $\hat{K}$ does not overestimate the true number of change-points.

On the flip side, however, SMUCE-type procedures can be overly stringent, resulting in low statistical power and a tendency to significantly underestimate the true number of change-points in practical applications (Li et al., 2016; Chen et al., 2023). Instead of the FWER, Li et al. (2016) suggested controlling the less stringent FDR criterion. The FDR is defined as the proportion of false discoveries among the selected change-points, i.e., $\mathrm{E}[ \hat{K}_F / \hat{K} ]$, where $\hat{K}_F$ represents the number of false discoveries. See Li et al. (2016) for a more comprehensive definition of the FDR in change-point problems. In the realm of FDR-based methods, Hao et al. (2013) introduced a screening and ranking algorithm (SaRa) for detecting one-dimensional normal mean change-points. Li et al. (2016) proposed a multiscale change-point segmentation approach (FDRseg) based on the same model setup. Cheng et al.
(2020) advocated a differential smoothing and testing of maxima/minima algorithm (dSTEM) for continuous time series. Chen et al. (2023) developed a mirror with order-preserved splitting procedure, called MOPS, to address a broader range of change-point models, including structural changes and variance changes. Liu et al. (2024) extended the knockoff framework to control the FDR in structural change detection, which in turn can be modified for change-point detection. Sun et al. (2025) provided a data-splitting approach for FDR control. While FDR control may provide greater power than FWER control, it guarantees neither consistent estimation of the true number of change-points nor finite-sample confidence of recovering the true number. At its core, the FDR only concerns the expectation of false discoveries rather than their probability. For instance, consider the scenario of overfitting. Note that $\hat{K} = K^* + \hat{K}_F$, and FDR control aims to ensure that $\mathrm{E}[ \hat{K}_F / \hat{K} ] \leq \alpha$. Under some mild conditions, this implies that $\mathrm{E}[ \hat{K}_F ] \leq ( \alpha / (1 - \alpha) ) K^*$ approximately, which further means that $\mathrm{E}[ \hat{K}_F ]$ can increase as the true number $K^*$ increases. Consequently, if $K^*$ is sufficiently large, $\mathrm{E}[ \hat{K}_F ]$ may also become substantial. As a result, the point estimate $\hat{K}$ may deviate significantly from the true value in such cases.

The concept of the confidence set studied in this paper is fundamentally distinct from FWER and FDR, as it revolves around the probability of recovering the true number of change-points, and the proposed confidence level does not rely on the true number $K^*$. Furthermore, letting $\alpha$ approach zero, a well-designed detection algorithm should lead to a constructed confidence set with cardinality equal to one, which further results in consistency, highlighting the robustness and reliability of the algorithm in accurately identifying the true number of change-points.
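The overfitting arithmetic above can be checked directly: treating the FDR constraint $\hat{K}_F / (K^* + \hat{K}_F) = \alpha$ as holding with equality and solving for $\hat{K}_F$ gives $\hat{K}_F = \alpha K^* / (1 - \alpha)$. A deterministic toy sketch (the function name is ours, and this caricature ignores all randomness):

```python
def overfit_at_fdr(k_star, alpha):
    """Solve K_F / (k_star + K_F) = alpha for K_F: the number of false
    discoveries implied when the FDR bound is met with equality."""
    return alpha * k_star / (1 - alpha)

# The implied number of false discoveries grows linearly in K*:
# at level alpha = 0.10, K* = 9 already admits one spurious
# change-point, and K* = 90 admits ten.
vals = [overfit_at_fdr(k, 0.10) for k in (9, 18, 90)]
```

This is exactly why FDR control alone does not pin down $\hat{K}$: the tolerated overestimation scales with the unknown $K^*$ itself.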
S.4 Choice of loss function and discussion of related conditions

Table 1 in Zou et al. (2020) discusses several important settings. Here, we discuss two additional important settings, as detailed below.

• Covariance change-points model. Consider the setting $z_i \in \mathbb{R}^p \sim ( \mathbf{0}, \Sigma^*_k )$, where $\tau^*_{k-1} < i \leq \tau^*_k$ for $k = 1, \dots, K^* + 1$ and $i = 1, \dots, 2n$. For this setup, we can choose the loss function $l( \beta; z_i ) = \| z_i z_i^\top - \beta \|_F^2$ for $\beta \in \mathbb{R}^{p \times p}$. The gradient of the loss function is $\partial l( \beta; z_i ) / \partial \beta = z_i z_i^\top$, and with $\beta = \gamma = \mathbf{0}$, we define the score $s_i = \mathrm{vech}( \partial l( \beta; z_i ) / \partial \beta ) = \mathrm{vech}( z_i z_i^\top )$. The method from Aue et al. (2009) can then be applied to detect change-points in this setting.

• Network change-points model. In the network change-point setting, we assume $z_i \in \mathbb{R}^{p \times p} \sim \mathrm{Bern}( \Theta^*_k )$, where $\tau^*_{k-1} < i \leq \tau^*_k$ for $k = 1, \dots, K^* + 1$ and $i = 1, \dots, 2n$. Here, $z \sim \mathrm{Bern}( \Theta )$ means that $z_{i,j} \overset{\text{i.i.d.}}{\sim} \mathrm{Bern}( \theta_{i,j} )$. For this model, we may choose the loss function $l( \beta; z_i ) = \| z_i - \beta \|_F^2$ for $\beta \in \mathbb{R}^{p \times p}$. The gradient is $\partial l( \beta; z_i ) / \partial \beta = z_i$, and with $\beta = \gamma = \mathbf{0}$, the score becomes $s_i = \mathrm{vech}( \partial l( \beta; z_i ) / \partial \beta ) = \mathrm{vech}( z_i )$. The method from Wang et al. (2021) can be used to detect the change-points in this context.

Let $\eta_i \in \mathbb{R}^p \overset{\text{i.i.d.}}{\sim} ( \mathbf{0}, \mathbf{I} )$, and denote it as $\eta_i$ also when it reduces to the scalar case (i.e., $p = 1$). Model (2.1) can be specialized into the following different models. Let $\Sigma$ be a fixed covariance matrix. For the network model, $Z_i = ( z_{i,j,l} )_{1 \leq j,l \leq n} \sim \mathrm{Bern}( \Theta_i )$ represents $z_{i,j,l} \overset{\text{i.i.d.}}{\sim} \mathrm{Bern}( \beta_{i,j,l} )$, and let $W_i = Z_i - \Theta_i$. The detailed choices are summarized in Table S.1.

Table S.1: Detailed choice of score functions for OPTICS under different models.

| Name | Formula | $l(\beta; z_i)$ | $s_i$ |
|---|---|---|---|
| Mean | $z_i = \beta^*_k + \Sigma^{1/2} \eta_i$ | $\| z_i - \beta \|_2^2$ | $s_i = z_i$ |
| Variance | $z_i = \beta^*_k \eta_i$ | $\| \log(z_i^2) - \beta \|_2^2$ | $s_i = 2 \log(z_i)$ |
| Regression | $z_i = x_i^\top \beta^*_k + \eta_i$ | $\| z_i - \beta^\top x_i \|_2^2$ | $s_i = z_i (1, x_i^\top)^\top$ |
| Covariance | $z_i = (\Theta^*_k)^{1/2} \eta_i$, $\beta^*_k \in \mathbb{R}^{p \times p}$ | $\| z_i z_i^\top - \Theta \|_F^2$ | $s_i = \mathrm{vech}( z_i z_i^\top )$ |
| Network | $Z_i = \Theta^*_k + W_i$, $\Theta^*_k \in \mathbb{R}^{p \times p}$ | $\| Z_i - \Theta \|_F^2$ | $s_i = \mathrm{vech}( Z_i )$ |

Based on this table, we can clearly identify the sufficient conditions for $z_i$ to satisfy Condition 3.1 in the main content. For instance, in the case of the multiple mean change-point model, the original data $z_i$ should be a sub-Gaussian vector. Similarly, for multiple variance change-point models, it is required that a transformation of $z_i$, specifically $\log(z_i)$, is sub-Gaussian.

S.5 More simulation results

This section is devoted to additional simulation results. Tables S.2 and S.3 report the coverage rates when $d = 1$ and $d = 5$, respectively, when the error term follows the standard normal distribution in Section 5.1. Table S.4 provides the coverage rates in Section 5.2, with standard normal errors.

Table S.2: The coverage rates in the mean-change model: $d = 1$; normal error.

| Amplitude $A$ | 0.50 | 0.625 | 0.75 | 0.875 | 1.00 |
|---|---|---|---|---|---|
| OPTICS(BS) | 0.77(4.10) | 0.84(3.75) | 0.87(3.10) | 0.84(2.66) | 0.84(2.68) |
| OPTICS(SN) | 0.82(4.00) | 0.95(3.42) | 0.98(2.56) | 0.99(2.39) | 1.00(2.43) |
| A1-COPSS(BS) | 0.45(3.00) | 0.74(3.00) | 0.85(3.00) | 0.89(3.00) | 0.97(3.00) |
| A1-COPSS(SN) | 0.57(3.00) | 0.91(3.00) | 0.92(3.00) | 0.97(3.00) | 0.98(3.00) |
| A1-FDRseg | 0.83(3.00) | 0.99(3.00) | 0.98(3.00) | 0.96(3.00) | 0.96(3.00) |
| A1-SMUCE | 0.24(3.00) | 0.86(3.00) | 0.98(3.00) | 1.00(3.00) | 1.00(3.00) |
| COPSS(BS) | 0.23 | 0.38 | 0.52 | 0.60 | 0.52 |
| COPSS(SN) | 0.30 | 0.61 | 0.84 | 0.89 | 0.89 |
| FDRseg | 0.53 | 0.85 | 0.89 | 0.82 | 0.82 |
| SMUCE | 0.01 | 0.36 | 0.92 | 1 | 1 |

Table S.3: The coverage rates in the mean-change model: $d = 5$; normal error.
| Amplitude A | 0.50 | 0.625 | 0.75 | 0.875 | 1.00 |
|---|---|---|---|---|---|
| OPTICS(BS) | 0.73(3.27) | 0.55(2.34) | 0.60(2.19) | 0.56(2.04) | 0.67(2.20) |
| OPTICS(SN) | 0.83(3.17) | 0.83(2.04) | 0.94(1.91) | 0.92(1.85) | 0.93(2.04) |
| A^1_COPSS(BS) | 0.65(3.00) | 0.70(3.00) | 0.76(3.00) | 0.80(3.00) | 0.88(3.00) |
| A^1_COPSS(SN) | 0.77(3.00) | 0.83(3.00) | 0.96(3.00) | 0.93(3.00) | 0.95(3.00) |
| COPSS(BS) | 0.35 | 0.33 | 0.36 | 0.42 | 0.43 |
| COPSS(SN) | 0.48 | 0.75 | 0.80 | 0.80 | 0.69 |

Table S.4: The coverage rates in the linear model with coefficient structural breaks; N(0, 1) errors.

| Amplitude A | 0.10 | 0.125 | 0.15 | 0.175 | 0.20 |
|---|---|---|---|---|---|
| OPTICS(BS) | 0.83(3.52) | 0.90(3.13) | 0.89(2.68) | 0.80(2.23) | 0.78(1.97) |
| OPTICS(SN) | 0.92(2.76) | 0.97(2.28) | 0.98(1.90) | 0.97(1.61) | 1.00(1.51) |
| A^1_COPSS(BS) | 0.50(3.00) | 0.59(3.00) | 0.60(3.00) | 0.59(3.00) | 0.61(3.00) |
| A^1_COPSS(SN) | 0.72(3.00) | 0.76(3.00) | 0.72(3.00) | 0.74(3.00) | 0.77(3.00) |
| COPSS(BS) | 0.22 | 0.31 | 0.26 | 0.33 | 0.32 |
| COPSS(SN) | 0.51 | 0.57 | 0.56 | 0.62 | 0.58 |

S.5.1 Variance change-point model

In this subsection, we consider the variance change-point model

y_i = σ*_k ε_i, τ*_{k−1} < i ≤ τ*_k, k = 1, ..., K* + 1, i = 1, ..., 2n,

where σ*_{k+1}/σ*_k = A^{(−1)^{k−1}} for k = 1, ..., K*, and σ*_1 = 1. The error terms ε_i ~ N(0, 0.25), following Chen and Gupta (1997). In this model, we set the sample size n = 1000, the true change-point set T* = {200k, k = 1, ..., 4}, and the change amplitude A ∈ {2, 3, 4, 5, 6}.

Table S.5 compares the coverage rates of OPTICS with the state-of-the-art methods. Similar to the multiple mean-change model, OPTICS with SN and BS achieves the nominal confidence level. Their coverage rates consistently surpass those of the quasi-confidence sets created from COPSS. This implies OPTICS is desirable from both empirical and theoretical standpoints.
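The variance change-point data-generating process above can be sketched in a few lines (a minimal NumPy illustration of the stated setting, not the authors' implementation; the helper name `gen_variance_cp` is ours, and we pass standard deviation 0.5 since ε_i ~ N(0, 0.25)):

```python
import numpy as np

def gen_variance_cp(n_total=1000, A=3.0, tau=(200, 400, 600, 800), seed=0):
    """Generate y_i = sigma*_k * eps_i with sigma*_{k+1}/sigma*_k = A^{(-1)^{k-1}},
    sigma*_1 = 1, and eps_i ~ N(0, 0.25) (i.e., standard deviation 0.5)."""
    rng = np.random.default_rng(seed)
    bounds = (0,) + tuple(tau) + (n_total,)
    sigmas, sigma = [], 1.0
    for k in range(len(bounds) - 1):
        sigmas.append(sigma)
        sigma *= A ** ((-1) ** k)  # the ratio alternates between A and 1/A
    y = np.empty(n_total)
    for k in range(len(bounds) - 1):
        length = bounds[k + 1] - bounds[k]
        y[bounds[k]:bounds[k + 1]] = sigmas[k] * rng.normal(0.0, 0.5, length)
    return y, sigmas

y, sigmas = gen_variance_cp()
print(sigmas)  # segment scales follow the pattern 1, A, 1, A, 1
```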
Additionally, as expected, FDRseg and SMUCE, along with their respective quasi-confidence sets, lack power, as they are tailored for detecting mean changes, rendering them insensitive to variance changes.

Table S.5: The coverage rates in the variance change-point model.

| Amplitude A | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|
| OPTICS(BS) | 0.86(3.99) | 0.85(2.73) | 0.93(2.63) | 0.88(2.71) | 0.88(2.71) |
| OPTICS(SN) | 0.83(4.55) | 0.98(2.84) | 0.99(2.66) | 1.00(2.83) | 0.99(3.02) |
| A^1_COPSS(BS) | 0.51(3.00) | 0.77(3.00) | 0.88(3.00) | 0.87(3.00) | 0.94(3.00) |
| A^1_COPSS(SN) | 0.53(3.00) | 0.83(3.00) | 0.94(3.00) | 0.96(3.00) | 0.97(3.00) |
| A^1_FDRseg | 0(3.00) | 0(3.00) | 0(3.00) | 0(3.00) | 0.01(3.00) |
| A^1_SMUCE | 0.24(3.00) | 0.13(3.00) | 0.13(3.00) | 0.15(3.00) | 0.19(3.00) |
| COPSS(BS) | 0.28 | 0.53 | 0.63 | 0.60 | 0.55 |
| COPSS(SN) | 0.18 | 0.74 | 0.87 | 0.89 | 0.97 |
| FDRseg | 0 | 0 | 0 | 0 | 0 |
| SMUCE | 0.11 | 0.06 | 0.06 | 0.04 | 0.04 |

S.5.2 Network change-points model

Consider the network change-points model

Z_i = Θ*_k + W_i, τ*_{k−1} < i ≤ τ*_k, k = 1, ..., K* + 1, i = 1, ..., 2n,

where τ*_k, k = 1, ..., K*, are the true network change-points, Θ*_k is the d × d-dimensional network mean matrix for subject i when τ*_{k−1} < i ≤ τ*_k, and W_i is the independently distributed Bernoulli error. The sample size is taken to be 2n = 1000, and the set of true change-points is T* = {τ*_k = 200k, k = 1, ..., 4}; hence K* = 4. The kth mean matrix Θ*_k is generated from a stochastic block model (Wang et al., 2021) with connectivity matrices Q_k = Q_l, where l = mod(k, 2) + 1, and

Q_1 = A × (0.6, 1, 0.6; 1, 0.6, 0.5; 0.6, 0.5, 0.6), Q_2 = A × (0.6, 0.5, 0.6; 0.5, 0.6, 1; 0.6, 1, 0.6),

where each triple gives one row of the 3 × 3 matrix, and A is a scalar varying among {0.50, 0.60, 0.70, 0.80, 0.90}. Each network is generated from a 3-community stochastic block model with node size d = 5. At the change points, memberships are reshuffled randomly. This simulation setting mimics the situation in Wang et al.
(2021), and we choose the Network Binary Segmentation (NBS) proposed therein as our change-point detection procedure.

We compare OPTICS with COPSS and its naive confidence set A^1_COPSS. Table S.6 reports the coverage rates. The quantities in parentheses are average cardinalities of the estimated sets. We take q = 1 for the quasi-confidence sets for COPSS, i.e., A^1 = {K̂ − 1, K̂, K̂ + 1} with cardinality 3, and denote the generated sets by A^1_COPSS. We can see that OPTICS is the only method that closely reaches the specified confidence level.

Table S.6: The coverage rates in the network change-points model.

| Amplitude A | 0.50 | 0.60 | 0.70 | 0.80 | 0.90 | 1.00 |
|---|---|---|---|---|---|---|
| OPTICS(BS) | 0.79(2.88) | 0.75(2.57) | 0.79(2.59) | 0.70(1.83) | 0.65(1.79) | 0.75(1.85) |
| A^1_COPSS(BS) | 0.75(3.00) | 0.84(3.00) | 0.89(3.00) | 0.91(3.00) | 0.86(3.00) | 0.87(3.00) |
| COPSS(BS) | 0.37 | 0.39 | 0.47 | 0.55 | 0.55 | 0.62 |

S.5.3 Covariance change-point model

Consider the covariance change-points model

z_i = (Θ*_k)^{1/2} η_i, τ*_{k−1} < i ≤ τ*_k, k = 1, ..., K* + 1, i = 1, ..., 2n,

where Θ*_k is the d × d-dimensional covariance matrix for subject i when τ*_{k−1} < i ≤ τ*_k, and η_i is the independently and identically distributed standard Gaussian vector. The sample size is taken to be 2n = 1000, and the set of true change-points is T* = {τ*_k = 200k, k = 1, ..., 4}; hence K* = 4. The kth covariance matrix Θ*_k = Θ*_l with l = mod(k, 2) + 1, where Θ*_1 = I_d and Θ*_2 = {A^{|i−j|}}_{1≤i,j≤d}, and A is a scalar varying among {0.10, 0.20, 0.30, 0.40, 0.50}. We choose the Wild Binary Segmentation (WBS) proposed in Fryzlewicz (2014) as our change-point detection procedure. We compare OPTICS with COPSS and its naive confidence set A^1_COPSS. Table S.7 reports the coverage rates, with the quantities in parentheses representing the average cardinalities of the estimated sets.
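The covariance change-point setting above, with Θ*_1 = I_d and Θ*_2 = {A^{|i−j|}}, can be sketched as follows (an illustrative NumPy snippet under the stated setting, not the authors' implementation; the helper names are ours). It also forms the score s_i = vech(z_i z_i^⊤) from Table S.1:

```python
import numpy as np

def ar1_cov(d, A):
    """Theta*_2 = {A^{|i-j|}}: an AR(1)-type covariance matrix."""
    idx = np.arange(d)
    return A ** np.abs(idx[:, None] - idx[None, :])

def gen_cov_cp(n_total=1000, d=5, A=0.4, tau=(200, 400, 600, 800), seed=0):
    """Generate z_i = (Theta*_k)^{1/2} eta_i, with Theta*_k alternating
    between I_d and {A^{|i-j|}} across the five segments."""
    rng = np.random.default_rng(seed)
    thetas = [np.eye(d), ar1_cov(d, A)]
    bounds = (0,) + tuple(tau) + (n_total,)
    z = np.empty((n_total, d))
    for k in range(len(bounds) - 1):
        # symmetric square root via eigendecomposition (Theta is PSD)
        w, v = np.linalg.eigh(thetas[k % 2])
        root = v @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ v.T
        length = bounds[k + 1] - bounds[k]
        z[bounds[k]:bounds[k + 1]] = rng.standard_normal((length, d)) @ root
    return z

def vech(M):
    """Half-vectorization: stack the lower-triangular part (incl. diagonal)."""
    return M[np.tril_indices_from(M)]

z = gen_cov_cp()
s1 = vech(np.outer(z[0], z[0]))  # score of the first observation, length d(d+1)/2
print(z.shape, s1.shape)
```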
From the results, we observe that OPTICS performs best among all methods, although it still does not achieve the specified confidence level. This may be due to the Frobenius measure not being an ideal choice. We leave this issue for future research.

Table S.7: The coverage rates in the covariance change-point model.

| Amplitude A | 0.30 | 0.35 | 0.40 | 0.45 | 0.50 |
|---|---|---|---|---|---|
| OPTICS(BS) | 0.57(4.52) | 0.63(5.17) | 0.70(5.33) | 0.68(5.18) | 0.64(4.29) |
| A^1_COPSS(BS) | 0.42(3.00) | 0.44(3.00) | 0.22(3.00) | 0.17(3.00) | 0.29(3.00) |
| COPSS(BS) | 0.09 | 0.05 | 0.06 | 0.03 | 0.08 |

S.5.4 Multiple mean-change with heavy-tailed errors

The data-generating process for the heavy-tailed case is nearly identical to the multiple mean-change model with a t-distribution described in Section 5.1, with the only difference being that we set the degrees of freedom to df = 1 in the t-distribution. From Table S.8, we observe that Huber-OPTICS (H-OPTICS) with κ = 1.5 and OPTICS consistently achieve higher coverage rates than existing methods. Furthermore, H-OPTICS produces narrower confidence sets than OPTICS, demonstrating greater power under heavy-tailed data.

Table S.8: Coverage rates in the mean-change model (d = 1) with heavy-tailed errors.
| Amplitude A | 0.50 | 0.625 | 0.75 | 0.875 | 1.00 |
|---|---|---|---|---|---|
| H-OPTICS(BS) | 0.58(3.01) | 0.57(3.07) | 0.54(3.13) | 0.58(3.27) | 0.63(3.00) |
| H-OPTICS(SN) | 0.97(2.99) | 0.92(2.95) | 0.93(3.03) | 0.97(3.28) | 0.95(3.05) |
| OPTICS(BS) | 0.82(4.21) | 0.73(4.23) | 0.74(4.15) | 0.79(4.25) | 0.80(4.09) |
| OPTICS(SN) | 1.00(4.32) | 1.00(4.32) | 1.00(4.54) | 0.99(4.31) | 1.00(4.32) |
| A^1_COPSS(BS) | 0.29(3.00) | 0.34(3.00) | 0.34(3.00) | 0.31(3.00) | 0.36(3.00) |
| A^1_COPSS(SN) | 0.60(3.00) | 0.51(3.00) | 0.52(3.00) | 0.45(3.00) | 0.45(3.00) |
| COPSS(BS) | 0.02 | 0.10 | 0.04 | 0.05 | 0.05 |
| COPSS(SN) | 0.04 | 0.01 | 0.03 | 0.08 | 0.00 |

S.5.5 Multiple mean-change with m-dependent errors

The data-generating process for this setting is nearly identical to the multiple mean-change model with a normal distribution described in Section 5.1, except that the error terms ε_i are m-dependent. Specifically, they are defined as

ε_i = Σ_{l=1}^{m} φ_l η_{i+l}, i = 1, ..., 2n, with η_t ~ N(0, I_d), t = 1, ..., 2n + m, and φ_l = √(1/m).

It follows that the ε_i's exhibit an m-dependent structure. We evaluate the performance of the methods under both a single change-point setting (d = 1) and a multi-dimensional mean-change setting (d = 5).

From Table S.9 (d = 1), we observe that m-dependent OPTICS (M-OPTICS) achieves significantly higher coverage rates than standard OPTICS and the existing competing methods, which struggle heavily with the dependent errors. M-OPTICS(SN), in particular, maintains near-perfect coverage. This demonstrates that M-OPTICS can effectively adapt to m-dependent data while maintaining the pre-specified coverage rate, albeit with slightly wider confidence sets.

Table S.10 extends this analysis to the more complex multi-dimensional case (d = 5). As the number of true change-points increases, the estimation problem becomes substantially harder. Consequently, standard OPTICS and all competing baseline methods fail almost completely, yielding coverage rates near zero.
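The m-dependent error construction ε_i = Σ_{l=1}^{m} φ_l η_{i+l} with φ_l = √(1/m) can be sketched as follows (an illustrative NumPy snippet, not the authors' implementation; the helper name `m_dependent_errors` is ours):

```python
import numpy as np

def m_dependent_errors(n_total=1000, d=1, m=3, seed=0):
    """eps_i = sum_{l=1}^m phi_l * eta_{i+l} with phi_l = sqrt(1/m) and
    eta_t ~ N(0, I_d) for t = 1, ..., n_total + m."""
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal((n_total + m, d))
    phi = np.sqrt(1.0 / m)
    # eps_i aggregates the m innovations eta_{i+1}, ..., eta_{i+m}, so errors
    # up to m apart share innovations, giving an m-dependent structure
    eps = phi * np.stack([eta[i + 1:i + 1 + m].sum(axis=0) for i in range(n_total)])
    return eps

eps = m_dependent_errors()
print(eps.shape)  # (1000, 1); each eps_i has variance m * phi^2 = 1
```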
While the coverage of M-OPTICS(BS) decreases in this challenging setting, M-OPTICS(SN) exhibits remarkable robustness. By combining the multiple-splitting procedure with self-normalization, M-OPTICS(SN) successfully preserves high coverage rates (ranging from 0.88 to 0.98) across all tested amplitudes, highlighting its distinct advantage in handling multi-dimensional, dependent data structures.

Table S.9: Coverage rates in the mean-change model (d = 1) with m-dependent errors.

| Amplitude A | 0.50 | 0.625 | 0.75 | 0.875 | 1.00 |
|---|---|---|---|---|---|
| M-OPTICS(BS) | 0.87(4.11) | 0.79(2.58) | 0.85(2.51) | 0.81(2.75) | 0.78(2.55) |
| M-OPTICS(SN) | 0.93(3.23) | 0.96(2.27) | 1.00(1.99) | 1.00(2.23) | 1.00(2.27) |
| OPTICS(BS) | 0.22(2.61) | 0.25(2.72) | 0.29(2.71) | 0.23(2.57) | 0.21(2.59) |
| OPTICS(SN) | 0.41(2.92) | 0.46(3.08) | 0.39(2.91) | 0.31(2.74) | 0.43(3.00) |
| A^1_COPSS(BS) | 0.05(3.00) | 0.08(3.00) | 0.06(3.00) | 0.06(3.00) | 0.02(3.00) |
| A^1_COPSS(SN) | 0.04(3.00) | 0.06(3.00) | 0.05(3.00) | 0.05(3.00) | 0.06(3.00) |
| A^1_FDRseg | 0.00(3.00) | 0.00(3.00) | 0.00(3.00) | 0.00(3.00) | 0.00(3.00) |
| A^1_SMUCE | 0.00(3.00) | 0.00(3.00) | 0.00(3.00) | 0.00(3.00) | 0.00(3.00) |
| COPSS(BS) | 0.00 | 0.05 | 0.03 | 0.02 | 0.01 |
| COPSS(SN) | 0.01 | 0.01 | 0.01 | 0.03 | 0.04 |
| FDRseg | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| SMUCE | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |

Table S.10: Coverage rates in the mean-change model (d = 5) with m-dependent errors.
| Amplitude A | 0.50 | 0.625 | 0.75 | 0.875 | 1.00 |
|---|---|---|---|---|---|
| M-OPTICS(BS) | 0.48(2.16) | 0.48(1.97) | 0.49(1.93) | 0.53(2.06) | 0.66(2.12) |
| M-OPTICS(SN) | 0.88(2.17) | 0.94(1.93) | 0.94(1.92) | 0.97(1.94) | 0.98(1.93) |
| OPTICS(BS) | 0.05(1.88) | 0.04(1.89) | 0.11(2.03) | 0.10(2.03) | 0.07(1.90) |
| OPTICS(SN) | 0.16(2.32) | 0.15(2.27) | 0.20(2.32) | 0.15(2.22) | 0.11(2.22) |
| A^1_COPSS(BS) | 0.00 | 0.00 | 0.01 | 0.01 | 0.00 |
| A^1_COPSS(SN) | 0.01 | 0.01 | 0.03 | 0.02 | 0.00 |
| A^1_FDRseg | 0.00(3.00) | 0.00(3.00) | 0.00(3.00) | 0.00(3.00) | 0.00(3.00) |
| A^1_SMUCE | 0.00(3.00) | 0.00(3.00) | 0.00(3.00) | 0.00(3.00) | 0.00(3.00) |
| COPSS(BS) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| COPSS(SN) | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 |

S.5.6 Varying dependence structure

In this subsection, we use simulations to investigate the effect of the dependence parameter m. The simulation setting is the same as in Subsection S.5.5, except that we fix the amplitude at A = 0.75 and vary the dependence parameter m ∈ {1, 2, 3, 5, 8} to observe its impact.

Table S.11: Coverage rates in the mean-change model (d = 1) with m-dependent errors.

| m | 1 | 2 | 3 | 5 | 8 |
|---|---|---|---|---|---|
| M-OPTICS(BS) | 0.84(2.97) | 0.86(2.52) | 0.79(2.47) | 0.81(2.59) | 0.78(2.86) |
| M-OPTICS(SN) | 0.99(2.53) | 1.00(2.16) | 1.00(2.10) | 1.00(2.18) | 0.94(2.18) |
| OPTICS(BS) | 0.83(2.96) | 0.08(2.87) | 0.13(3.04) | 0.01(1.51) | 0.00(1.06) |
| OPTICS(SN) | 0.99(2.47) | 0.25(3.59) | 0.27(3.37) | 0.01(1.80) | 0.00(1.40) |
| A^1_COPSS(BS) | 0.83 | 0.00 | 0.02 | 0.00 | 0.00 |
| A^1_COPSS(SN) | 0.99 | 0.03 | 0.01 | 0.00 | 0.00 |
| A^1_FDRseg | 0.94 | 0.00 | 0.00 | 0.00 | 0.00 |
| A^1_SMUCE | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| COPSS(BS) | 0.50 | 0.00 | 0.00 | 0.00 | 0.00 |
| COPSS(SN) | 0.94 | 0.00 | 0.00 | 0.00 | 0.00 |
| FDRseg | 0.78 | 0.00 | 0.00 | 0.00 | 0.00 |
| SMUCE | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 |

From Table S.11, we observe that while most methods perform adequately when m = 1, their coverage rates collapse to near zero as the dependence increases (m ≥ 2). In contrast, M-OPTICS consistently maintains high coverage rates across all tested values of m, demonstrating strong robustness to dependent errors.
Furthermore, M-OPTICS(SN) achieves superior coverage (remaining near 1.00) while generally producing narrower confidence sets than M-OPTICS(BS).

S.5.7 Comparison of OPTICS and multiple-splitting OPTICS

In this subsection, we evaluate the performance of two variants of our proposed methodology: the base OPTICS method and the multiple-splitting OPTICS (MS-OPTICS). The simulation setup mirrors the data-generating process described in Subsection 5.1, focusing on a single change-point setting (d = 1). To investigate the impact of sample size and error distribution on the confidence sets, we fix the signal amplitude at A = 0.75 and vary the total sample size N ∈ {400, 800, 1000, 1600}. Furthermore, to assess the robustness of the methods, the error terms are generated from both a standard normal distribution and a heavy-tailed t_10 distribution.

Table S.12: Coverage rates (lengths) for amplitude A = 0.75 across varying sample sizes N.

| Distribution | Method | N = 400 | N = 800 | N = 1000 | N = 1600 |
|---|---|---|---|---|---|
| Normal | OPTICS(BS) | 0.88(2.41) | 0.80(2.12) | 0.89(2.59) | 0.74(2.33) |
| | MS-OPTICS(BS) | 0.88(3.59) | 0.92(2.47) | 0.90(2.67) | 0.87(2.56) |
| | OPTICS(SN) | 1.00(2.20) | 1.00(2.00) | 1.00(2.41) | 0.99(2.18) |
| | MS-OPTICS(SN) | 0.98(3.45) | 0.99(2.09) | 1.00(2.36) | 1.00(1.99) |
| t-distribution | OPTICS(BS) | 0.74(2.43) | 0.84(2.13) | 0.85(2.54) | 0.77(2.52) |
| | MS-OPTICS(BS) | 0.88(4.10) | 0.83(2.53) | 0.91(2.67) | 0.90(2.62) |
| | OPTICS(SN) | 1.00(2.45) | 0.98(2.34) | 1.00(2.68) | 1.00(2.74) |
| | MS-OPTICS(SN) | 0.93(4.08) | 0.99(2.42) | 0.99(2.53) | 0.99(2.40) |

Table S.12 summarizes the empirical coverage rates and average lengths of the estimated confidence sets. We observe that for moderate sample sizes (e.g., N ≤ 1000), both OPTICS and MS-OPTICS deliver comparable and highly satisfactory coverage rates across both distributions.
As the sample size increases to N = 1600, MS-OPTICS exhibits an added layer of stability, maintaining its coverage under both the normal distribution and the heavy-tailed t_10 distribution. This suggests that while the base OPTICS is highly effective and efficient for typical sample sizes, the multiple-splitting procedure can serve as a robust alternative when the sample size grows exceptionally large.

S.5.8 Effectiveness of the change-point detection algorithm

As mentioned in Section 1, the cardinality of OPTICS can evaluate the efficacy of various change-point detection algorithms when they are utilized as the base algorithms for constructing OPTICS. An efficient detection algorithm should demonstrate superior capability in distinguishing the true number of change-points from others. Therefore, OPTICS coupled with a powerful detection algorithm is expected to achieve the coverage rate with a smaller cardinality. In this subsection, we conduct a comparison between the effectiveness of SN and BS under the univariate mean change-point model setting in Section 5.1.

The left panel of Figure S.1 depicts the boxplot of cardinalities of OPTICS using BS and SN as the base algorithm across 100 simulation runs, where the error term follows N(0, 1). As the amplitude A increases, the average cardinalities decrease, with OPTICS with SN exhibiting a notably faster decline. Additionally, the right panel of Figure S.1 shows the coverage rates of OPTICS with the two base algorithms, where OPTICS with SN demonstrates a higher coverage rate than that with BS. Hence, under the univariate mean change-point model setting, the SN method proves to be more effective than the BS method. Figure S.2 is devoted to the parallel results with t(10) errors, where similar phenomena are observed.

Figure S.1: The boxplot of cardinalities of OPTICS and the line plot of coverage rates in the univariate mean change-point model: N(0, 1) error.
Figure S.2: The boxplot of cardinalities of OPTICS and the line plot of coverage rates in the univariate mean change-point model: t(10) error.

S.6 More real data analysis results

We provide the estimated change-position sets A_i, i = 1, ..., 10:

A_1 = {13, 16}, A_2 = {7, 10, 19, 22}, A_3 = {4, 16, 19, 22}, A_4 = {4, 7, 10, 13, 16, 19, 22}, A_5 = {13, 22}, A_6 = {4, 7, 10, 13, 16}, A_7 = {4, 13, 16, 19, 22}, A_8 = {22}, A_9 = {19, 22}, A_10 = {22}.

The detected change-points for each individual are as follows:

S_1 = {263, 341, 363, 388, 428, 449, 469, 1319, 1724, 1906, 2044, 2143, 2195},
S_2 = {1642, 1663, 1771, 1795, 1816, 1965, 2195},
S_3 = {540, 601, 2143, 2195},
S_4 = {220, 2041, 2062, 2143},
S_5 = {73, 174, 269, 1141, 1225, 1276, 1641, 1915, 1965, 1991, 2031, 2143, 2195},
S_6 = {73, 105, 134, 2195},
S_7 = {74, 134, 1572, 2195},
S_8 = {177, 393, 521, 541, 601, 788, 811, 895, 932, 960, 1051, 1141, 1285, 1319, 1386, 1724, 1906, 1973, 1997, 2041, 2137, 2195},
S_9 = {60, 221, 454, 521, 544, 581, 905, 925, 1029, 1054, 1141, 1225, 1249, 1378, 1522, 2047, 2071, 2141, 2195},
S_10 = {72, 134, 756, 1119, 1141, 1167, 1225, 1321, 1366, 1386, 1455, 1534, 1560, 1642, 1685, 1726, 1818, 2044, 2091, 2143, 2166, 2195}.

References

Aue, A., S. Hörmann, L. Horváth, and M. Reimherr (2009). Break detection in the covariance structure of multivariate time series models. Annals of Statistics 37(6B), 4046–4087.

Auger, I. E. and C. E. Lawrence (1989). Algorithms for the optimal identification of segment neighborhoods. Bulletin of Mathematical Biology 51(1), 39–54.

Bai, J. (1998). Estimation of multiple-regime regressions with least absolutes deviation. Journal of Statistical Planning and Inference 74(1), 103–134.

Bai, J. and P. Perron (1998).
Estimating and testing linear models with multiple structural changes. Econometrica 66(1), 47–78.

Baranowski, R., Y. Chen, and P. Fryzlewicz (2019). Narrowest-over-threshold detection of multiple change points and change-point-like features. Journal of the Royal Statistical Society Series B: Statistical Methodology 81(3), 649–672.

Braun, J. V., R. Braun, and H.-G. Müller (2000). Multiple changepoint fitting via quasi-likelihood, with application to DNA sequence segmentation. Biometrika 87(2), 301–314.

Chen, H., H. Ren, F. Yao, and C. Zou (2023). Data-driven selection of the number of change-points via error rate control. Journal of the American Statistical Association 118(542), 1415–1428.

Chen, J. and A. K. Gupta (1997). Testing and locating variance changepoints with application to stock prices. Journal of the American Statistical Association 92(438), 739–747.

Chen, X. and W.-X. Zhou (2020). Robust inference via multiplier bootstrap. Annals of Statistics 48(3), 1665–1691.

Cheng, D., Z. He, and A. Schwartzman (2020). Multiple testing of local extrema for detection of change points. Electronic Journal of Statistics 14(2), 3705–3729.

Chernozhukov, V., D. Chetverikov, and K. Kato (2013). Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. Annals of Statistics 41(6), 2786–2819.

Chernozhukov, V., D. Chetverikov, and K. Kato (2017). Central limit theorems and bootstrap in high dimensions. Annals of Probability 45(4), 2309–2352.

Cho, H. and P. Fryzlewicz (2015). Multiple-change-point detection for high dimensional time series via sparsified binary segmentation. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 77(2), 475–507.

Fan, Y., J. Liu, J. Lv, and A. Sun (2025). MOSAIC: Minimax-optimal sparsity-adaptive inference for change points in dynamic networks.
arXiv preprint arXiv:2509.06303.

Frick, K., A. Munk, and H. Sieling (2014). Multiscale change point inference. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76(3), 495–580.

Fryzlewicz, P. (2014). Wild binary segmentation for multiple change-point detection. Annals of Statistics 42(6), 2243–2281.

Hao, N., Y. S. Niu, and H. Zhang (2013). Multiple change-point detection via a screening and ranking algorithm. Statistica Sinica 23(4), 1553–1572.

Harchaoui, Z. and C. Lévy-Leduc (2010). Multiple change-point estimation with a total variation penalty. Journal of the American Statistical Association 105(492), 1480–1493.

Huber, P. J. (1992). Robust estimation of a location parameter. In Breakthroughs in Statistics: Methodology and Distribution, pp. 492–518. Springer.

James, N. A. and D. S. Matteson (2015). ecp: An R package for nonparametric multiple change point analysis of multivariate data. Journal of Statistical Software 62, 1–25.

Kokoszka, P., H. Miao, M. Reimherr, and B. Taoufik (2018). Dynamic functional regression with application to the cross-section of returns. Journal of Financial Econometrics 16(3), 461–485.

Lei, J. (2020). Cross-validation with confidence. Journal of the American Statistical Association 115(532), 1978–1997.

Li, H., A. Munk, and H. Sieling (2016). FDR-control in multiscale change-point segmentation. Electronic Journal of Statistics 10(1), 918–959.

Liu, B., C. Zhou, X. Zhang, and Y. Liu (2020). A unified data-adaptive framework for high dimensional change point detection. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 82(4), 933–963.

Liu, J., A. Sun, and Y. Ke (2024). A generalized knockoff procedure for FDR control in structural change detection. Journal of Econometrics 239(2), 105331.

Liu, Y. and J. Xie (2020).
Cauchy combination test: a powerful test with analytic p-value calculation under arbitrary dependency structures. Journal of the American Statistical Association 115(529), 393–402.

Moon, S. and C. Velasco (2013). Tests for m-dependence based on sample splitting methods. Journal of Econometrics 173(2), 143–159.

Padilla, O. H. M., Y. Yu, D. Wang, and A. Rinaldo (2021a). Optimal nonparametric change point detection and localization. Electronic Journal of Statistics 15(1), 1154–1201.

Padilla, O. H. M., Y. Yu, D. Wang, and A. Rinaldo (2021b). Optimal nonparametric multivariate change point detection and localization. IEEE Transactions on Information Theory 68(3), 1922–1944.

Pein, F. and R. D. Shah (2025). Cross-validation for change-point regression: Pitfalls and solutions. Bernoulli 31(1), 388–411.

Pein, F., H. Sieling, and A. Munk (2017). Heterogeneous change point inference. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79(4), 1207–1227.

Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics 6, 461–464.

Sun, A., J. Bi, and J. Liu (2025). A synthetic data approach for FDR control in change-point detection. Statistics Innovation 3(1).

Truong, C., L. Oudre, and N. Vayatis (2020). Selective review of offline change point detection methods. Signal Processing 167, 107299.

Wang, D., Y. Yu, and A. Rinaldo (2021). Optimal change point detection and localization in sparse dynamic networks. Annals of Statistics 49(1), 203–232.

Wang, D., Z. Zhao, K. Z. Lin, and R. Willett (2021). Statistically and computationally efficient change point localization in regression settings. Journal of Machine Learning Research 22(248).

Yao, Y.-C. (1988). Estimating the number of change-points via Schwarz' criterion. Statistics & Probability Letters 6(3), 181–189.

Yao, Y.-C. and S.-T. Au (1989). Least-squares estimation of a step function.
Sankhyā: The Indian Journal of Statistics, Series A 51(3), 370–381.

Yu, M. and X. Chen (2021). Finite sample change point inference and identification for high-dimensional mean vectors. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 83(2), 247–270.

Zhang, N. R. and D. O. Siegmund (2012). Model selection for high-dimensional, multi-sequence change-point problems. Statistica Sinica 22(4), 1507–1538.

Zou, C., G. Wang, and R. Li (2020). Consistent selection of the number of change-points via sample-splitting. Annals of Statistics 48(1), 413–439.