OPTICS: Order-Preserved Test-Inverse Confidence Set for Number of Change-Points
Authors: Ao Sun (Data Sciences and Operations Department, University of Southern California) and Jingyuan Liu (Department of Statistics, Xiamen University)

March 31, 2026

Abstract

Determining the number of change-points is a first-step and fundamental task in change-point detection problems, as it lays the groundwork for subsequent change-point position estimation. While the existing literature offers various methods for consistently estimating the number of change-points, these methods typically yield a single point estimate without any assurance that it recovers the true number of changes in a specific dataset. Moreover, achieving consistency often hinges on very stringent conditions that can be challenging to verify in practice. To address these issues, we introduce a unified test-inverse procedure to construct a confidence set for the number of change-points. The proposed confidence set provides a set of possible values within which the true number of change-points is guaranteed to lie with a specified level of confidence. We further prove that the confidence set is sufficiently narrow to be powerful and informative by deriving the order of its cardinality. Remarkably, this confidence set can be established under more relaxed conditions than those required by most point estimation techniques. We also advocate multiple-splitting procedures to enhance stability, and we extend the proposed method to heavy-tailed and dependent settings. As a byproduct, we may also leverage the constructed confidence set to assess the effectiveness of point-estimation algorithms. Through extensive simulation studies, we demonstrate the superior performance of our confidence set approach. Additionally, we apply this method to analyze a bladder tumor microarray dataset.
Supplementary Material, including proofs of all theoretical results, computer code, the R package, and extended simulation studies, is available online.

Keywords: Change-point detection, Cross-validation, Order-preserved data splitting, Hypothesis testing, Confidence level

1 Introduction

1.1 Motivation and intuition

Estimating the number of change-points is a fundamental task in change-point detection problems, as a consistent estimation of the number usually leads to consistent estimation of change locations (Harchaoui and Lévy-Leduc, 2010; Wang et al., 2021). Therefore, consistency in change-point detection problems can typically be formulated as
$$\liminf_{n \to \infty} \Pr\{\hat{K} = K^*\} = 1, \tag{1.1}$$
where $\hat{K}$ is the estimated number of change-points and $K^*$ is the true number of change-points. Classical methods for obtaining a consistent $\hat{K}$ are mainly based on the Bayesian information criterion (BIC, Schwarz (1978)); see, for instance, Yao (1988); Bai (1998); Braun et al. (2000); Zhang and Siegmund (2012); Fryzlewicz (2014) and Cho and Fryzlewicz (2015). Additionally, some recent approaches, such as Padilla et al. (2021a); Wang et al. (2021); Padilla et al. (2021b), have embraced a hard-threshold approach. Both the BIC-based methods and the hard-threshold techniques require parameter tuning, and the consistency of their estimates relies heavily on these tuning parameters. To mitigate this dependence on tuning, Zou et al. (2020) introduced an Order-Preserved Sample-Splitting Procedure (COPSS) that selects the number of change-points by optimizing out-of-sample prediction and is thus tuning-free.

Nevertheless, these methods only provide point estimates for the number of change-points, without any guarantee that the true number $K^*$ can be recovered on a finite-sample basis. As a result, the consistency result may not be sufficient in practice.
To illustrate intuitively, we conduct a simple simulation for COPSS, using Binary Segmentation (BS, Fryzlewicz (2014)) as the change-position detector; the detailed model settings are provided in Section 5. According to COPSS, the true number $K^*$ can be correctly identified only when $K^*$ is the minimizer of the out-of-sample prediction error; that is, in ascending order of prediction error, $K^*$ should rank first among all the candidate numbers of change-points. The left panel of Figure 1 depicts the pie chart of such ranks for $K^*$ over 500 simulation runs. We observe that in only 55.8% of the runs does $K^*$ rank first and hence get successfully recovered. In brief, a single point estimate can be unreliable, and a misjudgment in the number of change-points could further result in erroneous estimation of change positions.

To this end, inspired by the concept of a confidence interval, a more prudent approach is to construct a confidence set, denoted as $\mathcal{A}$, for $K^*$ with a specified confidence level $1-\alpha$, such that
$$\liminf_{n \to \infty} \Pr\{K^* \in \mathcal{A}\} \geq 1 - \alpha.$$
This task appears challenging, since the construction of confidence sets for the number of change-points demands an investigation into the asymptotic behavior of a discrete random variable. However, the right panel of Figure 1 offers an insight into this problem. The figure depicts a histogram of the differences in prediction errors between the models fitted with the true number of change-points $K^*$ and those fitted with the minimizer $\hat{K}$. We observe that while the models with $K^*$ may not always lead to optimal out-of-sample prediction performance, they typically exhibit relatively small deviations from the optimum. In light of this, and motivated by Lei (2020), we address the challenge of constructing a confidence set by framing it as a testing problem.
In this testing framework, the null hypothesis posits that the selected number is indeed optimal from the perspective of out-of-sample prediction error. Note that in hypothesis testing, the null hypothesis is not rejected unless there is substantial evidence in favor of the alternative. Therefore, although the true number need not be the minimizer, it will likely not be rejected unless the observed difference is statistically significant. We then collect those numbers that are not rejected to form the confidence set $\mathcal{A}$. When the significance level is set as $\alpha$, we will show that the true number of change-points lies in the confidence set with probability at least $1-\alpha$.

Figure 1: Left panel: pie chart for the ranks of $K^*$, in ascending order of prediction error. Right panel: histogram of the difference in out-of-sample prediction errors between the models fitted with the true number of change-points $K^*$ and those fitted with the minimizer $\hat{K}$.

Apart from the desirable coverage probability, we also delve into the cardinality of the proposed confidence set, which reflects the power of the tests from the perspective of statistical inference, or the rate of false negatives in the context of model selection. Therefore, the theory of cardinality holds its own significance. We establish that, with overwhelming probability, the selected confidence set possesses nontrivial power with a bounded cardinality. Furthermore, the cardinality of the confidence set can serve as a metric for assessing the efficacy and stability of a change-point detection algorithm in the detection stage. When an inefficient algorithm is employed, it becomes challenging to distinguish the true number of change-points $K^*$ from other potential choices, resulting in the confidence set containing many candidates to achieve the desired coverage rate.
Conversely, with an efficient and stable detection algorithm, the true number $K^*$ often exhibits significantly superior performance compared to other choices. Consequently, the null hypothesis will only be retained when $K$ is closely aligned with $K^*$, leading to a small cardinality for the confidence set.

1.2 Our contributions

Meanwhile, some works in the change-point literature could potentially facilitate the construction of confidence intervals or sets for change locations. For instance, Yao and Au (1989) derived the asymptotic distribution of change locations in the context of one-dimensional mean change problems. Similarly, Bai and Perron (1998) investigated the asymptotic behavior of change locations in structural break models. However, the asymptotic distributions in these studies rely on unknown population-level parameters, such as the means and variances of the noises, as well as the true number of change-points. To the best of our knowledge, no research has delved into the asymptotic distribution of the number of change-points.

In light of this, we propose a test-based framework to circumvent the asymptotic distribution of the number of change-points. This framework was inspired by Lei (2020), which focused on linear models and constructed a confidence interval for the number of active predictors via cross-validation. However, it is crucial to recognize the substantial differences between linear models and change-point detection problems in both methodology and the entire theoretical foundation; even conventional cross-validation techniques are not applicable in the context of change-point detection. For instance, in change-point problems, both the mean values and variances of individual data points might be non-stationary. Hence, obtaining a consistent estimator of the covariance matrix becomes challenging.
This, in turn, causes failure of the Gaussian multiplier bootstrap procedure, which is crucial for constructing confidence intervals. Furthermore, Lei (2020) did not take into account the length of the proposed confidence intervals, which in turn impacts the power of the tests. Consequently, if we include an excessive number of candidate values, the resulting interval could become uninformative even if the coverage rate is guaranteed.

In this paper, we establish the confidence set for the number of change-points by incorporating order-preserved splitting and the multiplier bootstrap, and we systematically establish the theoretical properties of the proposed set in the change-point framework. Interestingly, the theory of the proposed confidence set can be built upon weaker conditions than those for the corresponding point estimates of the number of change-points. Additionally, beyond the scope of the coverage rate, we derive a sharp upper bound for the cardinality of the confidence set, which plays a crucial role in controlling the false negative rate and ensuring that the associated test has nontrivial power. The proposed OPTICS method is then extended to handle heavy-tailed and $m$-dependent settings. The effectiveness of these extensions is verified through simulations. As a byproduct, we provide an easy-to-use R package, OPTICS, available on GitHub¹.

1.3 Notations

We introduce the following notations used throughout this paper. For two sequences $a_n$ and $b_n$, $a_n \lesssim b_n$ ($a_n \gtrsim b_n$) means that, with probability approaching one, $a_n \leq c b_n$ ($b_n \leq c a_n$) for some $c > 0$ and sufficiently large $n$; $a_n \gg b_n$ stands for $a_n/b_n \to \infty$. $a := b$ represents that $a$ is defined as $b$. Given a vector $x = [x_1, \ldots, x_d]^\top \in \mathbb{R}^d$, its $\ell_2$-norm is defined as $\|x\|_2 = \big(\sum_{j=1}^d |x_j|^2\big)^{1/2}$, and its $\ell_\infty$-norm is $\|x\|_\infty = \max_{j=1,\ldots,d} |x_j|$.
For a random variable $X$, its Orlicz norm is $\|X\|_{\psi_\beta} = \inf\{C > 0 : \mathrm{E}[\psi_\beta(|X|/C)] \leq 1\}$, where $\psi_\beta(x) := \exp(x^\beta) - 1$ for $\beta = 1, 2$. Let $\mathcal{T}_K = \{\tau_1, \tau_2, \ldots, \tau_K\}$ be a set of $K$ change-points for a size-$m$ sequence $\{x_i, i = 1, 2, \ldots, m\}$, where $\tau_0 < \tau_1 < \ldots < \tau_K$. Denote $\bar{x}_{\tau_k, \tau_{k+1}}$ as the average of $\{x_i, i = \tau_k + 1, \tau_k + 2, \ldots, \tau_{k+1}\}$, and set $S_x^2(\mathcal{T}_K) = \sum_{k=0}^{K} \sum_{i=\tau_k+1}^{\tau_{k+1}} \|x_i - \bar{x}_{\tau_k, \tau_{k+1}}\|_2^2$. Moreover, let $\tilde{\mathcal{T}}_{\tilde{K}}$ be another set of $\tilde{K}$ change-points. Denote $S_x^2(\mathcal{T}_K \cup \tilde{\mathcal{T}}_{\tilde{K}}) = S_x^2(\mathrm{sort}(\mathcal{T}_K \cup \tilde{\mathcal{T}}_{\tilde{K}}))$, where $\mathrm{sort}(A)$ represents the set of elements of $A$ sorted in ascending order. For a matrix $X$, let $\mathrm{vech}(X)$ be the vectorization of the lower half of $X$; otherwise, if $x$ is a vector, we define $\mathrm{vech}(x) = x$. We denote $\lambda_{\min}(X)$ and $\lambda_{\max}(X)$ as the smallest and largest eigenvalues of the matrix $X$, respectively. For a set $\mathcal{A}$, $|\mathcal{A}|$ denotes its cardinality. $\mathbb{1}_A$ is the indicator function that takes value 1 when $A$ is true and 0 otherwise.

¹ https://github.com/suntiansheng/OPTICS

1.4 Organization of the paper

The rest of the paper is organized as follows. In Section 2, we propose an Order-Preserved Test-Inverse Confidence Set (OPTICS) for the number of change-points; the methodology, intuition, algorithm, and practical guidelines are provided. In Section 3, we systematically study the theoretical properties of OPTICS, including but not limited to the coverage rate and the asymptotic bound for the cardinality of OPTICS. The finite-sample performance of OPTICS is empirically verified through several simulation studies in Section 5. In Section 6, we construct the OPTICS for a bladder tumor microarray dataset. Section 7 concludes the paper. The Supplementary Material contains the proofs of the theoretical results, an additional literature review, and additional simulation and real-data results.
2 Methodology: OPTICS for the number of change-points

2.1 A general change-point model and its score transformation

Suppose a sequence of independent data observations $\mathcal{Z} = \{z_1, \ldots, z_{2n}\}$ is collected from the following multiple change-point model:
$$z_i \sim m(\cdot \mid \beta_k^*), \quad \tau_{k-1}^* < i \leq \tau_k^*, \quad k = 1, \ldots, K^* + 1; \quad i = 1, \ldots, 2n. \tag{2.1}$$
In model (2.1), the sample size is set to be $2n$ for later notational convenience. $K^*$ is the true number of change-points, which is allowed to vary with $n$. The $\tau_k^*$'s are the locations of the true change-points; by convention, set $\tau_0^* = 0$ and $\tau_{K^*+1}^* = 2n$, so that the change-points in $\mathcal{T}^* = \{\tau_1^*, \ldots, \tau_{K^*}^*\}$ partition the $2n$ sample points into $K^* + 1$ segments. $m(\cdot \mid \beta_k^*)$ represents a certain model structure for the $k$th segment, with a $d$-dimensional parameter vector $\beta_k^*$, where $\beta_k^* \neq \beta_{k+1}^*$. We assume $d$ is fixed throughout the manuscript. This model setting is quite general, covering mean changes (Hao et al., 2013), variance changes (Chen and Gupta, 1997), structural changes in regression models (Bai and Perron, 1998), covariance change-point models (Aue et al., 2009), and network change-point models (Wang et al., 2021; Fan et al., 2025); see more discussion in Section S.4 of the Supplementary Material.

Our primary goal in this paper is to construct a confidence set for $K^*$. Inspired by Zou et al. (2020), we embed the generic model (2.1) into a multiple mean change-point detection problem via a score-type transformation. Specifically, let $\ell(\beta; z_i)$ be a plausible loss function for data point $z_i$; the score function can then be defined as its derivative $s_\beta(z_i) = \mathrm{vech}(\partial \ell(\beta; z_i)/\partial \beta)$. Intuitively, if the score is identifiable, then for $i \in (\tau_{k-1}^*, \tau_k^*]$ and $i' \in (\tau_k^*, \tau_{k+1}^*]$, we should have $\mathrm{E}\{s_\gamma(z_i)\} = 0$ if and only if $\gamma = \beta_k^*$, and $\mathrm{E}\{s_\gamma(z_{i'})\} = 0$ if and only if $\gamma = \beta_{k+1}^*$.
Hence, given a fixed $d$-dimensional vector $\gamma$, typically $\mathrm{E}\{s_\gamma(z_i)\} \neq \mathrm{E}\{s_\gamma(z_{i'})\}$. This motivates us to decompose the score into
$$s_i := s_\gamma(z_i) = \mu_i + \epsilon_i, \quad i = 1, \ldots, 2n, \tag{2.2}$$
where $\mu_i = \mathrm{E}[s_\gamma(z_i)]$ and $\epsilon_i = s_\gamma(z_i) - \mathrm{E}[s_\gamma(z_i)]$. Denote the covariance matrix $\mathrm{Cov}(\epsilon_i) = \Sigma^{(k)}$ when $i \in (\tau_k^*, \tau_{k+1}^*]$. Note that the score-type transformation generally remains invariant regardless of the choice of $\gamma$. Hence, we can choose any $\gamma$, such as $\gamma = 0$ or $\gamma := \arg\min_\beta \sum_{z_i \in \mathcal{Z}} \ell(\beta; z_i)$.

2.2 Construction of OPTICS

In this subsection, we propose an order-preserved test-inverse confidence set (OPTICS) for the number of change-points. Let $\mathcal{M} = \{1, 2, \ldots, K_{\max}\}$ be a tentative candidate set for the number of change-points in model (2.1), where $K_{\max} > K^*$ is allowed to increase with the sample size $2n$; a commonly adopted convention is $K_{\max} = \log(n)$ (Zou et al., 2020). Our objective is to identify a small subset of $\mathcal{M}$ that covers the true number $K^*$ with a specified level of confidence.

To begin with, we introduce a criterion for evaluating the model-fitting capability of each $K \in \mathcal{M}$, utilizing an order-preserved data-splitting technique (Zou et al., 2020). Considering the intrinsic order structure of change-point problems, we divide the data into the following "odd sample" $\mathcal{Z}^O$ and "even sample" $\mathcal{Z}^E$ based on the parity of temporal order: $\mathcal{Z}^O = \{z_{2i-1}, i = 1, \ldots, n\}$ and $\mathcal{Z}^E = \{z_{2i}, i = 1, \ldots, n\}$. Given each $K \in \mathcal{M}$, a certain base change-point detection algorithm can be adopted to estimate the change-position set $\mathcal{T}_K = \{\tau_1^K, \ldots, \tau_K^K\}$ using the odd sample $\mathcal{Z}^O$.
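To fix ideas, the score transformation and the order-preserved split above can be sketched as follows. This is a minimal illustrative snippet in Python (the paper's own implementation is the R package OPTICS), specialized to the simplest case of a univariate mean-change model with squared-error loss $\ell(\beta; z) = (z - \beta)^2/2$, whose score is $s_\beta(z) = \beta - z$; the helper names `score_transform` and `order_preserved_split` are our own, not part of the package.

```python
import numpy as np

def score_transform(z, gamma=0.0):
    """Score for the univariate mean-change model with squared-error loss
    l(beta; z) = (z - beta)^2 / 2, whose derivative is s_beta(z) = beta - z.
    Any fixed gamma works; gamma = 0 is one choice suggested in Section 2.1."""
    return gamma - np.asarray(z, dtype=float)

def order_preserved_split(z):
    """Split z_1, ..., z_{2n} into the 'odd sample' Z^O = {z_1, z_3, ...}
    and the 'even sample' Z^E = {z_2, z_4, ...}, preserving temporal order
    within each half."""
    z = np.asarray(z)
    return z[0::2], z[1::2]

# Toy example: a mean shift halfway through the sequence.
rng = np.random.default_rng(0)
z = np.concatenate([rng.normal(0, 1, 50), rng.normal(3, 1, 50)])
s = score_transform(z)                 # scores s_1, ..., s_{2n}
s_odd, s_even = order_preserved_split(s)
```

A base detector (e.g., Binary Segmentation) would then be run on `s_odd` for each candidate $K$, with `s_even` reserved for validation.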
Then a natural criterion can be defined, based on the even sample $\mathcal{Z}^E$, as
$$C'(\mathcal{T}_K; \mathcal{Z}^E) := \frac{1}{n} S_{s^E}^2(\mathcal{T}_K) = \frac{1}{n} \sum_{k=0}^{K} \sum_{i=\tau_k^K+1}^{\tau_{k+1}^K} \|s_i^E - \bar{s}^E_{\tau_k^K, \tau_{k+1}^K}\|_2^2,$$
where $s_i^E$ is the score calculated using $\mathcal{Z}^E$ and $\bar{s}^E_{\tau_k^K, \tau_{k+1}^K}$ is the average of $\{s_i^E, i \in [\tau_k^K + 1, \tau_{k+1}^K]\}$. However, $C'(\mathcal{T}_K; \mathcal{Z}^E)$ gauges the predictive performance within the even sample $\mathcal{Z}^E$ itself, wherein over-fitting always appears advantageous. In addition, the term $\bar{s}^E_{\tau_k^K, \tau_{k+1}^K}$ in $C'(\mathcal{T}_K; \mathcal{Z}^E)$ renders $\|s_i^E - \bar{s}^E_{\tau_k^K, \tau_{k+1}^K}\|_2^2$ and $\|s_j^E - \bar{s}^E_{\tau_k^K, \tau_{k+1}^K}\|_2^2$ dependent for those $i$ and $j$ in the same sub-interval. Hence, we replace $\bar{s}^E_{\tau_k^K, \tau_{k+1}^K}$ with its counterpart computed from $\mathcal{Z}^O$, denoted by $\bar{s}^O_{\tau_k^K, \tau_{k+1}^K}$, and refine the above criterion as
$$C(\mathcal{T}_K; \mathcal{Z}^E) := \frac{1}{n} \sum_{k=0}^{K} \sum_{i=\tau_k^K+1}^{\tau_{k+1}^K} \|s_i^E - \bar{s}^O_{\tau_k^K, \tau_{k+1}^K}\|_2^2. \tag{2.3}$$
Further, let $\bar{s}^O_{K,i} = \sum_{k=0}^{K} \mathbb{1}\{\tau_k^K + 1 \leq i \leq \tau_{k+1}^K\}\, \bar{s}^O_{\tau_k^K, \tau_{k+1}^K}$; then (2.3) can be rewritten as
$$C(\mathcal{T}_K; \mathcal{Z}^E) = n^{-1} \sum_{i=1}^{n} \|s_i^E - \bar{s}^O_{K,i}\|_2^2.$$
As discussed in Section 1, for the true number of change-points $K^*$, while $C(\mathcal{T}_{K^*}; \mathcal{Z}^E)$ might not be the minimum among all candidate models in $\mathcal{M}$, it is often reasonably close to the minimum. Formally, this motivates us to establish the following hypotheses for each candidate $K \in \mathcal{M}$:
$$H_{0,K}: \mathrm{E}[C(\mathcal{T}_K; \mathcal{Z}^E)] \text{ is the minimum in } \mathcal{M} \quad \text{v.s.} \quad H_{1,K}: \mathrm{E}[C(\mathcal{T}_K; \mathcal{Z}^E)] \text{ is not the minimum in } \mathcal{M}. \tag{2.4}$$
Following the philosophy of hypothesis testing, we do not reject $H_{0,K}$ unless $C(\mathcal{T}_K; \mathcal{Z}^E)$ significantly departs from the minimum. Then $H_{0,K^*}$, the null hypothesis corresponding to the true number $K^*$, is expected not to be rejected with overwhelming probability. Therefore, the confidence set $\mathcal{A}$ can be naturally defined as the collection of $K \in \mathcal{M}$ whose $H_{0,K}$'s are not rejected.
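The refined criterion (2.3) can be sketched as below, again as an illustrative Python snippet rather than the authors' implementation: segment averages are computed from the odd scores and the squared prediction error is accumulated over the even scores. The helper name `criterion` is hypothetical; its inputs are the two length-$n$ score sequences and the estimated change-positions $\tau^K_1 < \ldots < \tau^K_K$, with the boundary conventions $\tau^K_0 = 0$ and $\tau^K_{K+1} = n$ added internally.

```python
import numpy as np

def criterion(s_odd, s_even, taus):
    """Order-preserved cross-validation criterion C(T_K; Z^E) of (2.3):
    segment means come from the odd scores, prediction error is evaluated
    on the even scores.  `taus` holds the K estimated change-positions."""
    s_odd = np.asarray(s_odd, dtype=float)
    s_even = np.asarray(s_even, dtype=float)
    if s_odd.ndim == 1:                       # allow univariate scores
        s_odd, s_even = s_odd[:, None], s_even[:, None]
    n = s_odd.shape[0]
    bounds = [0, *sorted(taus), n]            # tau_0 = 0, tau_{K+1} = n
    total = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        seg_mean = s_odd[lo:hi].mean(axis=0)  # \bar s^O over (tau_k, tau_{k+1}]
        total += ((s_even[lo:hi] - seg_mean) ** 2).sum()
    return total / n
```

For instance, with odd and even scores both equal to `[0, 0, 2, 2]`, the single change-position `taus=[2]` yields a criterion of 0, while the no-change model `taus=[]` pays for the unmodeled mean shift.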
In other words, let $p_K$ be the associated $p$-value for the testing problem (2.4); then, for a predetermined significance level $\alpha$,
$$\mathcal{A} := \{K \in \mathcal{M} : p_K > \alpha\}.$$
By this means, the members of $\mathcal{A}$ are statistically equivalent with respect to the criterion $C(\cdot; \mathcal{Z}^E)$; that is, all the numbers of change-points in this set are highly competitive. Indeed, as will be seen in Section 3, the selected set $\mathcal{A}$ covers the true number $K^*$ with probability at least $1 - \alpha$:
$$\Pr\{K^* \in \mathcal{A}\} = \Pr\{p_{K^*} > \alpha\} \geq 1 - \alpha.$$
The remaining task is to obtain the $p$-value $p_K$ associated with the testing problem (2.4) for each $K \in \mathcal{M}$. To accomplish this, for any $J, K \in \mathcal{M}$, we further define
$$\delta_{K,J} = \mathrm{E}[C(\mathcal{T}_K; \mathcal{Z}^E) - C(\mathcal{T}_J; \mathcal{Z}^E)].$$
Hence, the hypotheses in (2.4) are equivalent to
$$H_{0,K}: \max_{J \in \mathcal{M}, J \neq K} \delta_{K,J} \leq 0 \quad \text{v.s.} \quad H_{1,K}: \max_{J \in \mathcal{M}, J \neq K} \delta_{K,J} > 0. \tag{2.5}$$
One possible point estimate of $\delta_{K,J}$ in (2.5) is
$$\hat{\delta}_{K,J} = \frac{1}{n} \sum_{i=1}^{n} \left( \|s_i^E - \bar{s}^O_{K,i}\|_2^2 - \|s_i^E - \bar{s}^O_{J,i}\|_2^2 \right) := \frac{1}{n} \sum_{i=1}^{n} \xi^{(i)}_{K,J}, \tag{2.6}$$
where $\xi^{(i)}_{K,J} := \|s_i^E - \bar{s}^O_{K,i}\|_2^2 - \|s_i^E - \bar{s}^O_{J,i}\|_2^2$, $i = 1, \ldots, n$, are independent random variables given the odd sample $\mathcal{Z}^O$. Naturally, we can then take
$$\max_{J \in \mathcal{M}, J \neq K} \hat{\delta}_{K,J} = \max_{J \in \mathcal{M}, J \neq K} \frac{1}{n} \sum_{i=1}^{n} \xi^{(i)}_{K,J},$$
and the test statistic can be set as its studentized version:
$$T_K = \max_{J \neq K} \frac{\sqrt{n}\, \hat{\delta}_{K,J}}{\hat{\sigma}_{K,J}} = \max_{J \neq K} \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{\xi^{(i)}_{K,J}}{\hat{\sigma}_{K,J}}, \tag{2.7}$$
where $\hat{\sigma}^2_{K,J} = n^{-1} \sum_{i=1}^{n} (\xi^{(i)}_{K,J})^2$ is the estimated second moment. Next, we calculate the $p$-values corresponding to $T_K$ using a Gaussian comparison and bootstrap method analogous to Chernozhukov et al. (2013, 2017). To be specific, we first generate independent standard Gaussian random variables $\zeta_i$, $i = 1, \ldots, n$. Then, for $b = 1, \ldots, B$, where $B$ is the total number of bootstrap runs, we define the $b$th bootstrap statistic as
$$T^\sharp_{K,b} = \max_{J \neq K} \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{\xi^{(i)}_{K,J}}{\hat{\sigma}_{K,J}} \zeta_i. \tag{2.8}$$
Then the $p$-value is naturally set to be $\hat{p}_K = B^{-1} \sum_{b=1}^{B} \mathbb{1}(T^\sharp_{K,b} > T_K)$. In (2.8), a non-centered bootstrap statistic is used. The reason is that, as the random variables $\{\xi^{(i)}_{K,J}, i = 1, \ldots, n\}$ may have varying means and variances, it is intricate to verify that the sample variance converges to the population variance under such a fluctuating scenario. However, the usual law of large numbers still implies that the sample second moment approximates its population counterpart. Notably, since the second moment serves as an upper bound on the variance, the non-centered bootstrap statistic tends to be slightly conservative, while still controlling the type-I error.

In sum, the confidence set $\mathcal{A}$ for the true number of change-points $K^*$ is constructed upon a test-based method with the order-preserved sample-splitting technique. Thus, we name the confidence set the Order-Preserved Test-Inverse Confidence Set (OPTICS). The entire procedure for obtaining OPTICS is summarized in the following steps.

1. (Initialization). Given a proper $\gamma$ in (2.2), calculate the score functions $s_i$ for $i = 1, \ldots, 2n$.

2. For each given candidate number of change-points $K \in \mathcal{M}$:

   2.1 (Training). Obtain the estimated change-position set $\mathcal{T}_K$ based on the odd sample $\mathcal{Z}^O$. Compute the piecewise averages $\bar{s}^O_{\tau_k^K, \tau_{k+1}^K}$ in (2.3), or equivalently, $\bar{s}^O_{K,i}$.

   2.2 (Validation). For $i = 1, \ldots, n$ and $J \neq K$, compute $\xi^{(i)}_{K,J}$ in (2.6) using the even sample $\mathcal{Z}^E$. Further obtain the test statistic $T_K$ in (2.7).

   2.3 (Bootstrapping). For $b = 1, \ldots, B$, calculate the Gaussian multiplier bootstrap statistic $T^\sharp_{K,b}$ in (2.8), as well as the associated $p$-value $\hat{p}_K$.

3. (OPTICS). Given a significance level $\alpha$, the OPTICS is taken to be
$$\mathcal{A} = \{K \in \mathcal{M} : \hat{p}_K > \alpha\}. \tag{2.9}$$
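Steps 2.2 and 2.3, i.e., the studentized statistic (2.7) and the Gaussian multiplier bootstrap (2.8), can be sketched as follows. This is an illustrative Python translation, not the R package's API: it assumes the matrix `xi` already holds the quantities $\xi^{(i)}_{K,J}$ for all competitors $J \neq K$, and the name `optics_pvalue` is hypothetical.

```python
import numpy as np

def optics_pvalue(xi, B=500, rng=None):
    """Studentized statistic (2.7) and Gaussian multiplier bootstrap (2.8).
    `xi` is an (n, q) array whose q columns hold xi^{(i)}_{K,J} for the
    competitors J != K; returns (T_K, p-hat_K)."""
    rng = np.random.default_rng(rng)
    n = xi.shape[0]
    sigma = np.sqrt((xi ** 2).mean(axis=0))               # second-moment scale per J
    t_k = (xi.sum(axis=0) / (np.sqrt(n) * sigma)).max()   # T_K in (2.7)
    zeta = rng.standard_normal((B, n))                    # Gaussian multipliers
    boot = (zeta @ (xi / sigma)).max(axis=1) / np.sqrt(n) # T^#_{K,b} in (2.8)
    return t_k, float((boot > t_k).mean())                # p-hat_K
```

Under the approximate null, the returned $\hat{p}_K$ behaves like an ordinary $p$-value, while a candidate $K$ with clearly inferior prediction error (a strongly positive column mean in `xi`) yields a $p$-value near zero and is excluded from $\mathcal{A}$.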
2.3 Practical guidelines for OPTICS

2.3.1 Choice of loss function and computational complexity

The choice of the loss function and the change-point detection method is crucial in change-point analysis. Table S.1 in the Supplementary Material provides a comprehensive overview of the settings for mean, variance, regression-coefficient, nonparametric, covariance, and network change-point models (Fan et al., 2025).

The overall computational complexity of OPTICS depends on the change-point detection method and the bootstrap testing procedure. The complexity of change-point detection varies with the model and the specific detection method used (see Truong et al. (2020) for details). For the bootstrap testing, if we treat basic mathematical operations such as addition, subtraction, and multiplication as $O(1)$, each computation of (2.6) has a complexity of $O(n d_p)$, where $d_p$ is the dimension of $s_1$. Completing the entire testing procedure requires $O(B K_{\max} n d_p)$ operations, where $B$ is the number of bootstrap samples and $K_{\max}$ is the maximum number of change-points. However, by parallelizing the bootstrap procedure, the computational burden can be significantly reduced.

2.3.2 Reducing the set to a single number of change-points

OPTICS produces a set $\mathcal{A}$ of numbers of change-points. It is able to guarantee the predetermined confidence level and is thus typically more informative in practice. However, in some circumstances, a single estimated number is still desirable, especially when the ultimate goal is to estimate the change-positions. One suggestion is to adopt the rightmost number $\bar{K}$ in $\mathcal{A}$, as it is guaranteed to be at least the true number $K^*$ with probability $1 - \alpha$:
$$\Pr\{\bar{K} \geq K^*\} \geq \Pr\{K^* \in \mathcal{A}\} \geq 1 - \alpha.$$
$\bar{K}$ tends to be a slight overestimate of $K^*$.
Additionally, the leftmost number $\underline{K}$ could also be chosen if the primary goal is to control the family-wise error rate (FWER), as
$$\mathrm{FWER} = \Pr\{\underline{K} > K^*\} \leq \Pr\{K^* \notin \mathcal{A}\} \leq \alpha.$$
Another suggestion is a post-hoc strategy, which combines the data-driven OPTICS with domain knowledge; that is, pick the member of $\mathcal{A}$ with the most compelling scientific or industrial interpretation. For instance, in time-series change-point detection, if certain specific positions are known to be true changes, it is advised to adopt the member of $\mathcal{A}$ with which these change-positions can be successfully detected. This strategy allows us to make informative and interpretable decisions, as well as to precisely estimate change-positions.

We emphasize that OPTICS returns a set of change-point numbers that are no worse than any other number of change-points under the criterion $C$ defined in (2.3). If OPTICS were to return an empty set, it would imply that for any given number of change-points, there exists another number of change-points that significantly outperforms the candidate, leading to a contradiction. Therefore, OPTICS always returns at least one change-point number.

2.3.3 Multiple-splitting OPTICS for finite-sample stability

In practice, the resulting confidence set may be sensitive to the particular split used in the OPTICS procedure, especially when the sample size is moderate or the signal is weak. To enhance finite-sample stability, we can apply OPTICS with multiple order-preserving splits and then combine the resulting evidence.

To be specific, let $L \geq 2$ be a fixed integer, and suppose for simplicity that $n = Lm$ for some integer $m$. For each $r = 1, \ldots, L$, define the $r$th order-preserving subsample by $\mathcal{Z}^{(r)} = \{z_{r + Lj} : j = 0, \ldots, m - 1\}$. Under independent observations, each $\mathcal{Z}^{(r)}$ remains an independent sample and preserves the original temporal order.
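The extraction of the $L$ order-preserving subsamples can be sketched as below (illustrative Python; `order_preserving_subsamples` is a hypothetical helper name, and the snippet assumes, as in the text, that the sample size is an exact multiple of $L$).

```python
import numpy as np

def order_preserving_subsamples(z, L):
    """Form the L order-preserving subsamples of Subsection 2.3.3:
    Z^(r) = {z_{r + Lj} : j = 0, ..., m-1} for r = 1, ..., L (1-based),
    each of which keeps the original temporal order and, under
    independence, is itself an independent sample."""
    z = np.asarray(z)
    m = len(z) // L                      # assumes len(z) is a multiple of L
    return [z[(r - 1): L * m: L] for r in range(1, L + 1)]
```

Each subsample can then be split into its own odd and even halves and fed to the ordinary OPTICS procedure, with the per-split $p$-values combined afterwards.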
We further split each subsample $\mathcal{Z}^{(r)}$ into an "odd sample" $\mathcal{Z}^{(r)}_O$ and an "even sample" $\mathcal{Z}^{(r)}_E$. For each split $r$, we apply the ordinary OPTICS to $(\mathcal{Z}^{(r)}_O, \mathcal{Z}^{(r)}_E)$. Specifically, for each candidate $K \in \mathcal{M}$, we use $\mathcal{Z}^{(r)}_O$ to fit the $K$-change-point model and $\mathcal{Z}^{(r)}_E$ to evaluate its out-of-sample performance. Let $\bar{s}^{(r,O)}_{K,i}$ denote the analogue of $\bar{s}^O_{K,i}$ constructed from $\mathcal{Z}^{(r)}_O$, and let $s^{(r,E)}_i$ denote the score vector from $\mathcal{Z}^{(r)}_E$. For $J \in \mathcal{M} \setminus \{K\}$, define
$$\hat{\delta}^{(r)}_{K,J} = \frac{1}{n_r} \sum_{i=1}^{n_r} \left( \|s^{(r,E)}_i - \bar{s}^{(r,O)}_{K,i}\|_2^2 - \|s^{(r,E)}_i - \bar{s}^{(r,O)}_{J,i}\|_2^2 \right),$$
where $n_r := |\mathcal{Z}^{(r)}_E|$. Based on $\hat{\delta}^{(r)}_{K,J}$, we compute the split-specific OPTICS test statistic and the corresponding $p$-value $\hat{p}^{(r)}_K$ exactly as in the original procedure.

To aggregate the evidence across the different order-preserving splits, we adopt the Cauchy combination method (Liu and Xie, 2020). For each $K \in \mathcal{M}$, define
$$T_K = \sum_{r=1}^{L} \omega_r \tan\left\{ \left(0.5 - \hat{p}^{(r)}_K\right) \pi \right\},$$
where the weights $\omega_r$ are nonnegative and satisfy $\sum_{r=1}^{L} \omega_r = 1$. The corresponding combined $p$-value is
$$\hat{p}^{\mathrm{MS}}_K = \frac{1}{2} - \frac{\arctan T_K}{\pi}.$$
We then define the multiple-splitting OPTICS (MS-OPTICS) confidence set by
$$\mathcal{A}^{\mathrm{MS}} = \{K \in \mathcal{M} : \hat{p}^{\mathrm{MS}}_K > \alpha\}.$$
The multiple-splitting version has the same interpretation as the original OPTICS but is typically less sensitive to the choice of a particular sample split (see Subsection S.5.7 in the Supplementary Material for a comparison). It therefore provides a simple and practical refinement when greater numerical stability is desired in moderate samples. In our view, the original single-splitting OPTICS remains the default choice because of its simpler presentation and lower computational cost, whereas the multiple-splitting version is best viewed as a practical enhancement.

3 Theory of OPTICS

In this section, we systematically study the theoretical properties of OPTICS.
We first prove that the Gaussian multiplier bootstrap procedure indeed produces valid $p$-values under the desired null space. The second subsection discusses the coverage rate of OPTICS. The theoretical rate of the cardinality of OPTICS, which corresponds to the power of the test, is provided in the third subsection.

3.1 Validity of the bootstrap procedure

To study the theoretical guarantee of the Gaussian multiplier bootstrap procedure in our problem setting, we first impose the following technical assumptions.

Condition 3.1 (Tails and moments). For $i = 1, \ldots, 2n$, suppose there exist some positive constants $M_1$ and $M_2$ such that
(i) for any constant vector $b$, $\|b^\top s_i\|_{\psi_1} \leq M_1 \|b\|_2$;
(ii) for any $j = 1, \ldots, d$, we have $\frac{1}{n}\sum_{i=1}^{n} \mathrm{E}[|s_{ij}|^3] \leq M_1$ and $\frac{1}{n}\sum_{i=1}^{n} \mathrm{E}[|s_{ij}|^4] \leq M_1^2$, where $s_{ij}$ is the $j$th element of $s_i$;
(iii) $\lambda_{\min}(\mathrm{Var}[s_i]) \geq M_2$ and $\lambda_{\max}(\mathrm{E}[s_i s_i^\top]) \leq M_1$, where $\mathrm{Var}(s_i)$ is the covariance matrix of $s_i$.

Condition 3.2 (Distance between change-positions). For any two distinct change-positions $\tau_k$ and $\tau_{k'}$ in $\mathcal{T}_K$, $|\tau_k - \tau_{k'}| \gtrsim n/\log(n)$.

Condition 3.1 provides a sub-exponential tail bound for the data, together with moment conditions. Similar conditions can be found elsewhere in the change-point literature; see Liu et al. (2020) and Yu and Chen (2021) for instances. Condition 3.2 requires a sufficient distance between any two change-points in $\mathcal{T}_K$, since distinguishing two change-points becomes challenging if they are too close. This condition is fairly mild when the candidate model has $K \ll n$ (Chen et al., 2023). This condition is also imposed in several prominent works, such as Wild Binary Segmentation (WBS), see Assumption 3.2 in Fryzlewicz (2014), and, more recently, the Narrowest-Over-Threshold (NOT) method, see Theorem 1 in Baranowski et al. (2019).

Theorem 3.1. Suppose Model (2.1) holds.
Let $\mathcal{F}^O$ denote the $\sigma$-field generated by the observed sample used to construct $\{\bar{s}^O_{K,i} : K \in \mathcal{M}, 1 \leq i \leq n\}$. For $J \in \mathcal{M} \setminus \{K\}$, define $\delta_{K,J} = \frac{1}{n}\sum_{i=1}^{n} \mathrm{E}\big[\xi^{(i)}_{K,J} \mid \mathcal{F}^O\big]$ and $\sigma^2_{K,J} = \frac{1}{n}\sum_{i=1}^{n} \mathrm{E}\big[(\xi^{(i)}_{K,J})^2 \mid \mathcal{F}^O\big]$. Assume $K_{\max} \asymp \log(n)$ and Conditions 3.1 and 3.2 hold. In addition, assume there exists a constant $c_0 > 0$ such that $\min_{J \in \mathcal{M} \setminus \{K\}} \sigma^2_{K,J} \geq c_0$ with probability tending to one. Then:

(1) If $\max_{J \neq K} \delta_{K,J}/\sigma_{K,J} \leq x_n (n \log(n))^{-1/2}$ for some $x_n = o(1)$, then as $n \to \infty$,
$$\Pr\{H_{0,K} \text{ is not rejected at level } \alpha\} \geq 1 - \alpha + o(1).$$

(2) If $\alpha \geq n^{-1}$ and $\max_{J \neq K} \delta_{K,J}/\sigma_{K,J} \geq c\, n^{-1/2} \log(n)$ for a sufficiently large constant $c > 0$, then as $n \to \infty$,
$$\Pr\{H_{0,K} \text{ is not rejected at level } \alpha\} = o(1).$$

Theorem 3.1 (1) depicts an "approximate null" space, i.e., $\max_{J \neq K} \delta_{K,J}/\sigma_{K,J} \leq x_n (n \log(n))^{-1/2}$, in which we do not reject $H_{0,K}$; it implies that the $p$-value obtained from the bootstrap preserves the type-I error under this approximate null. Meanwhile, (2) asserts that OPTICS successfully rules out those candidate models with inferior predictive power.

A direct implication of Theorem 3.1 is that when COPSS in Zou et al. (2020) is consistent, OPTICS also covers the true number of change-points $K^*$ with confidence level $1 - \alpha$ asymptotically. To see this intuitively, consider the special scenario in which $K^*$ is indeed the minimizer among all the candidate models in $\mathcal{M}$. In this case, COPSS is consistent, since it selects the minimizer as the estimated number of change-points. On the other hand, we have $\max_{J \neq K^*} \delta_{K^*,J}/\sigma_{K^*,J} \leq 0$; thus Theorem 3.1 (1) indicates that $\Pr\{K^* \in \mathcal{A}\} \geq 1 - \alpha + o(1)$. Nevertheless, as demonstrated in the next section, OPTICS preserves the confidence (coverage rate) under conditions that are less stringent than those for COPSS.
3.2 Coverage rate of OPTICS

In this section, we systematically study the coverage rate of OPTICS. Let $\underline{\lambda} = \min_{1 \leq k \leq K^*}(\tau^*_{k+1} - \tau^*_k)$ and $\bar{\lambda} = \max_{1 \leq k \leq K^*}(\tau^*_{k+1} - \tau^*_k)$ be the minimum and maximum distances between adjacent true change-positions, respectively. For $k = 1, \ldots, K^*$, denote $\Delta_k := \|\mu_{k+1} - \mu_k\|_2$ as the jump size of the $k$th change in Model (2.2), and $\Delta_{(k)}$ as the corresponding $k$th order statistic. Without loss of generality, assume the true number of change-points $K^*$ belongs to the candidate set $\mathcal{M}$, i.e., $K_{\max} \geq K^*$. Note that $\mathcal{M}$ can always be chosen conservatively to include $K^*$. Let $\mathcal{M}_l = \{K \in \mathcal{M} : K < K^*\}$ be the lack-of-fit set and $\mathcal{M}_o = \{K \in \mathcal{M} : K > K^*\}$ be the over-fit set. Hence, the candidate set $\mathcal{M}$ is naturally partitioned into $\mathcal{M} = \mathcal{M}_o \cup \mathcal{M}_l \cup \{K^*\}$. We impose the following conditions before introducing the theoretical results for the coverage rate of OPTICS.

Condition 3.3 (Number of change-points). (i) $K^* = o(\underline{\lambda})$ and $K^*(\log(K^* \vee e))^2 = o(\log\log(\bar{\lambda}))$ for Euler's number $e$; (ii) $(K_{\max} \log K_{\max})^{1/2} = o(\log\log \bar{\lambda})$.

Condition 3.4 (Accuracy of estimation). (i) (Over-fit) For any $K \in \mathcal{M}_o$, denote the corresponding estimated change-position set as $\mathcal{T}^o_K = \{\tau^{Ko}_1, \ldots, \tau^{Ko}_K\}$. There exists a subset $\{\tau^{Ko}_{k_s}, s = 1, \ldots, K^*\} \subset \mathcal{T}^o_K$ such that
$$\Pr\left\{\forall \tau^*_k \in \mathcal{T}^*, \; |\tau^{Ko}_{k_s} - \tau^*_k| \leq b_n\right\} \to 1,$$
where $b_n > 0$ satisfies $K^* \log\log(b_n \vee e) = o(\log\log(\bar{\lambda}))$ and $\sum_{k=1}^{K^*} b_n \Delta_k^2 = o(M_1 \log\log(\bar{\lambda}))$.
(ii) (Lack-of-fit) For any $K \in \mathcal{M}_l$, denote the corresponding estimated change-position set as $\mathcal{T}^l_K = \{\tau^{Kl}_1, \ldots, \tau^{Kl}_K\}$. There exists a subset of true change-positions $\mathcal{I}^*_{lK} \subset \mathcal{T}^*$ such that for any $\tau^*_k \in \mathcal{I}^*_{lK}$, no estimated change-point lies within the interval $\{\tau^*_k - \underline{\lambda}/2 + 1, \ldots, \tau^*_k + \underline{\lambda}/2\}$.
Furthermore, denote $\bar{\mu}_{\tau^{Kl}_k, \tau^{Kl}_{k+1}}$ as the average of $\{\mu_i, i = \tau^{Kl}_k + 1, \tau^{Kl}_k + 2, \ldots, \tau^{Kl}_{k+1}\}$. Then for some constant $M_3 > 0$,
$$\Pr\left\{\forall \tau^*_k \in \mathcal{I}^*_{lK}, \; \sum_{i = \tau^*_k - \underline{\lambda}/4 + 1}^{\tau^*_k + \underline{\lambda}/4} \|\mu_i - \bar{\mu}_{K,i}\|_2^2 \geq M_3 \underline{\lambda}\, \Delta_k^2 \right\} \to 1, \quad (3.1)$$
with $\bar{\mu}_{K,i} := \sum_{k=0}^K \mathbb{1}\{\tau^{Kl}_k + 1 \leq i \leq \tau^{Kl}_{k+1}\}\, \bar{\mu}_{\tau^{Kl}_k, \tau^{Kl}_{k+1}}$.

Condition 3.5 (Minimum signal). Assume the minimum jump size $\Delta_{(1)}$ satisfies
$$\frac{\underline{\lambda}\, \Delta_{(1)}^2}{K^* M_1 (\log \bar{\lambda})^2} \to \infty,$$
where $M_1$ is defined in Condition 3.1.

Condition 3.3 imposes standard assumptions on the number of change-points. Condition 3.4 (i) states that under the over-fitting setting where $K > K^*$, for each true change $\tau^*_k \in \mathcal{T}^*$, there must exist an estimated change-point lying within the $b_n$-neighborhood of $\tau^*_k$ asymptotically. This estimation accuracy ensures the reliability of the order-preserved cross-validation criterion. Condition 3.4 (ii) claims that for an under-fitted model with $K < K^*$, some undetected true change-positions have to be isolated from all the estimated ones by length $\underline{\lambda}/2$, and the estimation errors of mean scores due to the undetected change-positions are non-negligible. Condition 3.5 bounds the minimum jump size of mean scores across true adjacent intervals from below. Conditions 3.4 and 3.5 are imposed to guarantee the consistent selection of change-points; see the discussion in Pein and Shah (2025).

Theorem 3.2. Suppose Conditions 3.1–3.5 hold. Assume that the maximum discrepancy satisfies $\min_{K \neq K^*} \max_{i=1,\ldots,n} \|\bar{s}^O_{K^*,i} - \bar{s}^O_{K,i}\|_2^2 \gtrsim \Delta^2_{(K^*)}$. If the maximum jump size $\Delta_{(K^*)}$ satisfies
$$\Delta^2_{(K^*)} \gtrsim \frac{x_n^{-2} (\log\log \bar{\lambda})^2 \log^2(n)}{n}, \quad (3.2)$$
then $\Pr\{K^* \in \mathcal{A}\} \geq 1 - \alpha + o(1)$.

The inequality (3.2) in Theorem 3.2 states that the true number of change-points $K^*$ indeed lies within the "approximate null" space. Therefore, incorporating Theorem 3.1, the coverage rate of $\mathcal{A}$ can be guaranteed.
The conditions for Theorem 3.2 relax those in Zou et al. (2020) and Pein and Shah (2025). The latter two papers both additionally impose a divergent lower bound for the "over-fit" effect; to be specific, they require, for some $c_n \to \infty$,
$$\mathcal{S}_{\epsilon^O}(\mathcal{T}^*) - \max_{K \in \mathcal{M}_o} \mathcal{S}_{\epsilon^O}(\mathcal{T}^* \cup \mathcal{T}_K) \geq c_n, \quad (3.3)$$
where $\epsilon^O = \{\epsilon_{2i-1}, i = 1, \ldots, n\}$ represents the collection of individual errors from the given odd sample, with $\epsilon_i$ defined in Model (2.2). We will explore the explicit rate of $c_n$ in Section 3.3 to enhance the power of the method. As shown in Section 5, the divergence of $c_n$ in (3.3) can be fairly stringent in many practical scenarios, potentially leading to the failure of consistency for the respective estimations.

3.3 Cardinality of OPTICS

Theorem 3.2 guarantees that OPTICS $\mathcal{A}$ covers the true number of change-points $K^*$ at the nominal confidence level asymptotically. However, it is always possible to select a sufficiently conservative $\mathcal{A}$ that encompasses $K^*$, such as the trivial set $\mathcal{M}$. In this instance, OPTICS is non-informative and has no power. Therefore, in this subsection, we examine the power of OPTICS by analyzing its cardinality. We first define the following two sets to depict the under-fit and over-fit signal-to-noise ratios, respectively.

Definition 3.1 (Signal-to-noise ratio). (i) In a lack-of-fit model $K \in \mathcal{M}_l$, consider the undetected change-position set $\mathcal{I}^*_{lK}$ in Condition 3.4. For a sufficiently large $n$, define
$$\mathcal{B}_{1n} := \left\{K \in \mathcal{M}_l : \sum_{\tau^*_k \in \mathcal{I}^*_{lK}} \Delta_k^2 \gtrsim M_1 \Delta^2_{(K^*)} \sqrt{n\log(n)}/\underline{\lambda} \right\},$$
where $\Delta_{(K^*)}$ is the maximum jump size between adjacent changes, as defined in Section 3.2.
(ii) In an over-fit model $K \in \mathcal{M}_o$, for a sufficiently large $n$, define
$$\mathcal{B}_{2n} := \left\{K \in \mathcal{M}_o : E\{\mathcal{S}_{s^E}(\mathcal{T}_K) - \mathcal{S}_{s^E}(\mathcal{T}_K \cup \mathcal{T}^*)\} + \{\mathcal{S}_{\epsilon^O}(\mathcal{T}^*) - \mathcal{S}_{\epsilon^O}(\mathcal{T}_K \cup \mathcal{T}^*)\} \gtrsim M_1 \Delta^2_{(K^*)} \sqrt{n\log(n)} \right\}.$$
Note that set $\mathcal{B}_{1n}$ in Definition 3.1 (i) collects a subset of lack-of-fit models whose jump sizes of undetected changes are bounded below. Set $\mathcal{B}_{2n}$ consists of over-fitted models whose in-sample over-fitting effects are sufficiently large. Based on $\mathcal{B}_{1n}$ and $\mathcal{B}_{2n}$, the following theorem provides the asymptotic upper bounds for the cardinality of OPTICS.

Theorem 3.3. Suppose Model (2.1) holds. Under Conditions 3.1–3.5, and assuming the maximum discrepancy satisfies $\max_{i=1,\ldots,n} \|\bar{s}^O_{K,i} - \bar{s}^O_{J,i}\|_2^2 \lesssim \Delta^2_{(K^*)}$ for all pairs $K \neq J$ with asymptotic probability 1, we have as $n \to \infty$:
$$\Pr\{|\mathcal{A}| \leq K_{\max} - |\mathcal{B}_{1n}| - |\mathcal{B}_{2n}|\} \geq 1 - o(1), \quad (3.4)$$
and
$$\Pr\{K^* \in \mathcal{A}, \; |\mathcal{A}| \leq K_{\max} - |\mathcal{B}_{1n}| - |\mathcal{B}_{2n}|\} \geq 1 - \alpha + o(1). \quad (3.5)$$

Theorem 3.3 confines the cardinality of the confidence set $\mathcal{A}$. If for sufficiently large $n$ the minimum jump size on its own satisfies $\Delta^2_{(1)} \geq M_1 \Delta^2_{(K^*)} \sqrt{n\log(n)}/\underline{\lambda}$, then all lack-of-fit models belong to $\mathcal{B}_{1n}$, hence $|\mathcal{B}_{1n}| = |\mathcal{M}_l|$. Based on (3.4) in Theorem 3.3, the confidence set $\mathcal{A}$ then excludes all lack-of-fit models with $K \in \mathcal{M}_l$ with overwhelming probability. If, in addition, all over-fit models belong to $\mathcal{B}_{2n}$, the cardinality of OPTICS will be 1 and $\Pr\{\mathcal{A} = \{K^*\}\} \to 1$. On the other hand, if unfortunately both under-fitting and over-fitting signals are not sufficiently strong, so that neither $\mathcal{B}_{1n}$ nor $\mathcal{B}_{2n}$ contains any element, then (3.5) in Theorem 3.3 degenerates to the coverage rate result in Theorem 3.2.

4 Two extensions of OPTICS

In this section, we extend OPTICS to heavy-tailed data and $m$-dependent data.

4.1 Extension of OPTICS to heavy-tailed data

We explore the potential extension of the OPTICS method to heavy-tailed data. In this scenario, the tail behavior of the scores $s_i$ defined in (2.2) may exhibit heavy-tailed characteristics, causing the sub-exponential tail condition in Condition 3.1 to no longer hold.
This issue could result in the failure of the OPTICS method. A possible solution is to generalize the criterion in (2.3) from the $\ell_2$ loss to a robust loss function, such as the Huber loss (Huber, 1992), defined as
$$\ell_\kappa(u) = \begin{cases} u^2/2, & \text{if } |u| \leq \kappa, \\ \kappa|u| - \kappa^2/2, & \text{if } |u| > \kappa, \end{cases}$$
where $\kappa > 0$ is a tuning parameter that balances bias and robustness. Then, we define a robust version of (2.3) as
$$\mathcal{C}_\kappa(\mathcal{T}_K; Z^E) := \frac{1}{n} \sum_{k=0}^K \sum_{i=\tau^K_k+1}^{\tau^K_{k+1}} \ell_\kappa\left(s^E_i - \bar{s}^O_{\tau^K_k, \tau^K_{k+1}}\right), \quad (4.1)$$
where $\ell_\kappa(x) = \sum_{j=1}^d \ell_\kappa(x_j)$ for $x \in \mathbb{R}^d$. Alternatively, the $\ell_1$ loss function can also be used. Intuitively, the difference $s^E_i - \bar{s}^O_{\tau^K_k, \tau^K_{k+1}} \approx s^E_i - E[s^E_i] = \epsilon^E_i$. When $\epsilon^E_i$ exhibits heavy-tailed behavior, the $\ell_2$ loss function becomes unsuitable, as it can be distorted by extreme values. In contrast, the Huber loss is much more robust, as it transitions from the $\ell_2$ loss to the $\ell_1$ loss when $\epsilon^E_i$ deviates significantly from zero. Then, we can define a robust version of (2.6) as
$$\hat{\delta}_{K,J,\kappa} = \frac{1}{n} \sum_{i=1}^n \left\{\ell_\kappa\left(s^E_i - \bar{s}^O_{K,i}\right) - \ell_\kappa\left(s^E_i - \bar{s}^O_{J,i}\right)\right\} := \frac{1}{n} \sum_{i=1}^n \xi^{(i)}_{K,J,\kappa}. \quad (4.2)$$
The corresponding test statistic is
$$T_{K,\kappa} = \max_{J \neq K} \frac{\sqrt{n}\, \hat{\delta}_{K,J,\kappa}}{\hat{\sigma}_{K,J,\kappa}} = \max_{J \neq K} \frac{1}{\sqrt{n}} \sum_{i=1}^n \frac{\xi^{(i)}_{K,J,\kappa}}{\hat{\sigma}_{K,J,\kappa}}, \quad (4.3)$$
where $\hat{\sigma}^2_{K,J,\kappa} = n^{-1} \sum_{i=1}^n (\xi^{(i)}_{K,J,\kappa})^2$ is the estimated second moment. For this robust version of the statistic, we can also use a Gaussian multiplier bootstrap statistic similar to (2.8), namely
$$T^\sharp_{K,\kappa,b} = \max_{J \neq K} \frac{1}{\sqrt{n}} \sum_{i=1}^n \frac{\xi^{(i)}_{K,J,\kappa}}{\hat{\sigma}_{K,J,\kappa}} \zeta_i. \quad (4.4)$$
Then the p-value is naturally set to be $\hat{p}_{K,\kappa} = B^{-1} \sum_{b=1}^B \mathbb{1}(T^\sharp_{K,\kappa,b} > T_{K,\kappa})$. We refer to Chen and Zhou (2020) for adaptively selecting the hyperparameter $\kappa$. We name this extension Huber-OPTICS (H-OPTICS).
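A minimal sketch of the Huber loss and the robustified contrast $\xi^{(i)}_{K,J,\kappa}$ in (4.2) follows; the function names are illustrative, and the default $\kappa = 1.345$ is a conventional choice for Huber regression rather than one prescribed by the text.

```python
import numpy as np

def huber(u, kappa):
    """Elementwise Huber loss: u^2/2 for |u| <= kappa, kappa*|u| - kappa^2/2 otherwise."""
    a = np.abs(u)
    return np.where(a <= kappa, 0.5 * u ** 2, kappa * a - 0.5 * kappa ** 2)

def huberized_xi(s_E, sbar_K, sbar_J, kappa=1.345):
    """Sketch of xi^{(i)}_{K,J,kappa}: coordinatewise Huber losses summed within
    each observation, then differenced between candidate models K and J."""
    loss_K = huber(s_E - sbar_K, kappa).sum(axis=1)
    loss_J = huber(s_E - sbar_J, kappa).sum(axis=1)
    return loss_K - loss_J
```

Plugging these contrasts into the studentized maximum (4.3) and the multiplier bootstrap (4.4) leaves the rest of the procedure unchanged.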
Subsection S.5.4 of the Supplementary Material demonstrates the superior performance of our proposed method through simulation studies.

4.2 Extension of OPTICS to $m$-dependent data

We extend the OPTICS method to handle dependent data, specifically focusing on $m$-dependent data, which is commonly used in econometrics applications (Moon and Velasco, 2013; Kokoszka et al., 2018). We restrict our discussion to multiple mean change-point detection, where
$$z_i = \beta^*_k + \epsilon_i, \quad \tau^*_{k-1} < i \leq \tau^*_k, \quad k = 1, \ldots, K^*+1; \; i = 1, \ldots, (m+1)n,$$
where $\epsilon_i \sim (\mathbf{0}, \Sigma)$ for covariance matrix $\Sigma \in \mathbb{R}^{d \times d}$. This model is a special case of (2.1). In our work, a stochastic process $Z = \{z_1, \ldots, z_{(m+1)n}\}$ is called $m$-dependent for $m \geq 0$ if $\{z_1, \ldots, z_{i-1}, z_i\}$ and $\{z_{i+m+1}, z_{i+m+2}, \ldots, z_{(m+1)n}\}$ are independent. Here, we assume the sample size is $(m+1)n$ to simplify the presentation. When $m = 0$, $m$-dependence reduces to standard independence. For $m$-dependent data, we apply the order-preserving $(m+1)$-splitting
$$Z^{(r)} = \{z_{r + (m+1)i} : i = 0, \ldots, n-1\}, \quad r = 1, \ldots, m+1.$$
Then each subsequence $Z^{(r)}$ consists of independent observations. We may therefore apply the multiple-splitting OPTICS procedure as in Section 2.3, yielding a split-specific p-value for each $K \in \mathcal{M}$. These p-values are then combined to obtain a single combined p-value. We refer to this extension as $m$-dependent OPTICS (m-OPTICS). Subsections S.5.5 and S.5.6 of the Supplementary Material provide thorough simulations to investigate the effects of $m$-dependence. We find that m-OPTICS is robust to various $m$-dependent structures and performs especially well when $m$ is moderate. However, we should emphasize that $m$ cannot be excessively large.
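The order-preserving $(m+1)$-splitting above admits a one-line sketch: subsequence $r$ collects every $(m+1)$th observation starting from position $r$, so consecutive elements within a split are more than $m$ apart and hence independent under $m$-dependence.

```python
def m_dependent_splits(z, m):
    """Order-preserving (m+1)-splitting of an m-dependent sequence.

    `z` is a sequence of length (m+1)*n; split r keeps observations
    r, r+(m+1), r+2(m+1), ..., preserving the original temporal order.
    """
    return [z[r::m + 1] for r in range(m + 1)]
```

Each split is then fed to the independent-data OPTICS procedure, and the resulting split-specific p-values are combined.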
An excessively large $m$ leads to a loss of selection power, as each subsample contains too few observations to yield a reliable change-point estimation.

5 Simulation studies

In this section, we conduct simulation studies under various change-point model settings to assess the empirical performance of OPTICS. Two base change-point detection algorithms are used in the training stage for constructing OPTICS: Binary Segmentation (BS, Fryzlewicz (2014)) and Segment Neighborhood (SN, Auger and Lawrence (1989)). For comparison, we consider the consistent estimation method COPSS (Zou et al., 2020), the FDR-control method FDRseg (Li et al., 2016), and the FWER-control method SMUCE (Frick et al., 2014). We also adopt BS and SN as base algorithms for COPSS. Since no state-of-the-art confidence set construction methods for the number of change-points are available in the literature, we artificially create a quasi-confidence set based on each point estimate from COPSS, FDRseg, or SMUCE, respectively, to match the cardinality of OPTICS. This also naturally enhances the coverage rate of these existing methods. Specifically, given the point estimate $\hat{K}$, the associated quasi-confidence set is defined as $\mathcal{A}^q := \{\hat{K}-q, \hat{K}-q+1, \ldots, \hat{K}, \ldots, \hat{K}+q\}$ with cardinality $2q+1$. We remark that these sets only facilitate a fair numerical comparison with cardinality comparable to OPTICS; they lack theoretical justification of the coverage rate. Throughout this section, the candidate set is taken as $\mathcal{M} = \{1, 2, \ldots, \log(n)\}$ following Zou et al. (2020), and the significance level is $\alpha = 0.1$. We conduct 100 simulation runs for each setting, and report the coverage rate $\sum_{i=1}^{100} \mathbb{1}(K^* \in \mathcal{A}_i)/100$ and the average cardinality of the confidence set $\sum_{i=1}^{100} |\mathcal{A}_i|/100$, where $\mathcal{A}_i$ denotes the set obtained in the $i$th simulation run.
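The two reported summaries can be computed from the collection of confidence sets as follows (a small sketch with our own function name):

```python
def summarize_confidence_sets(sets, K_true):
    """Empirical coverage rate and average cardinality over R simulation runs:
    sum_i 1(K* in A_i)/R and sum_i |A_i|/R, as reported in the tables below."""
    R = len(sets)
    coverage = sum(K_true in A for A in sets) / R
    avg_card = sum(len(A) for A in sets) / R
    return coverage, avg_card
```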
5.1 Multiple mean-change model

Consider the multiple mean-change model
$$y_i = \mu^*_k + \epsilon_i, \quad \tau^*_{k-1} < i \leq \tau^*_k, \quad k = 1, \ldots, K^*+1, \; i = 1, \ldots, 2n,$$
where $\tau^*_k$, $k = 1, \ldots, K^*$ are the true mean change-points, $\mu^*_k$ is the $d$-dimensional mean vector for subject $i$ when $\tau^*_{k-1} < i \leq \tau^*_k$, and the $\epsilon_i$ are independent and identically distributed random errors. In this example, both the univariate mean with $d = 1$ and the multivariate mean vector with $d = 5$ are studied. However, as FDRseg and SMUCE only work for univariate mean change-point detection, we only compare OPTICS with COPSS when $d = 5$. The sample size is taken to be $2n = 1000$, and the set of true change-points is $\mathcal{T}^* = \{\tau^*_k = 200k, k = 1, \ldots, 4\}$, hence $K^* = 4$. The $k$th mean vector is $\mu^*_k = (-1)^{k-1} A \mathbf{1}_d$, $k = 1, \ldots, 4$, where $A$ is a scalar representing the amplitude and $\mathbf{1}_d$ is the $d$-dimensional vector of all ones. The amplitude $A$ varies among $\{0.50, 0.625, 0.75, 0.875, 1\}$. Two error distributions are under consideration: the standard normal distribution $N(0,1)$ and a t-distribution $t(10)$. As we observe similar phenomena for these two types of error distributions, we only exhibit results for the more challenging $t(10)$ case. Refer to Section S.5 of the Supplementary Material for the simulation results under $N(0,1)$.

Tables 1 and 2 report the coverage rates when $d = 1$ and $d = 5$, respectively, under t-distributed errors. The quantities in parentheses are average cardinalities of the estimated sets, from which we observe that the average cardinality of OPTICS is around 3. Therefore, to match this cardinality, we take $q = 1$ for the quasi-confidence sets for COPSS, FDRseg, and SMUCE, i.e., $\mathcal{A}^1 = \{\hat{K}-1, \hat{K}, \hat{K}+1\}$ with cardinality 3, and denote the generated sets by $\mathcal{A}^1_{\mathrm{COPSS}}$, $\mathcal{A}^1_{\mathrm{FDRseg}}$, and $\mathcal{A}^1_{\mathrm{SMUCE}}$, respectively.
We also report the coverage rates achieved solely by the point estimates in the last four rows. From both tables, the coverage rate of OPTICS gradually meets the nominal level with decreasing average cardinality as the amplitude $A$ increases, especially with the SN detection algorithm. This empirically illustrates the validity of Theorems 3.2 and 3.3. The BS algorithm is less favorable due to its nature as an approximation algorithm rather than an exact one; thus, the accuracy of the detected change positions falls short of meeting the requirements in Condition 3.4. Nevertheless, OPTICS with both SN and BS is superior to the quasi-confidence sets generated from COPSS, FDRseg, and SMUCE in terms of coverage probabilities. Recall that these quasi-confidence sets do not possess any theoretical guarantee. On the other hand, as depicted in the last four rows of Table 1, the point estimates exhibit low probabilities of correctly encompassing the true number of change-points. Furthermore, OPTICS demonstrates its superiority compared to SMUCE and FDRseg especially under non-Gaussian errors. This advantage stems from SMUCE and FDRseg utilizing the supremum of Brownian motion to establish cutoffs, which become misaligned under non-Gaussian assumptions. Notably, SMUCE lacks power when dealing with small amplitudes, elucidating the stringency of the FWER-type criterion. Additionally, it is worth noting that OPTICS serves as a unified method applicable to multivariate change-point models, while both SMUCE and FDRseg are limited to scenarios where $d = 1$.

Table 1: The coverage rates in the mean-change model: $d = 1$; t-distributed error.
Amplitude A       0.50        0.625       0.75        0.875       1.00
OPTICS(BS)        0.72(4.02)  0.82(4.11)  0.89(3.48)  0.86(2.86)  0.80(2.75)
OPTICS(SN)        0.89(4.40)  0.92(4.22)  0.97(3.88)  0.98(2.82)  0.99(2.80)
A^1_COPSS(BS)     0.46(3.00)  0.58(3.00)  0.69(3.00)  0.80(3.00)  0.88(3.00)
A^1_COPSS(SN)     0.38(3.00)  0.45(3.00)  0.80(3.00)  0.88(3.00)  0.97(3.00)
A^1_FDRseg        0.69(3.00)  0.64(3.00)  0.59(3.00)  0.58(3.00)  0.61(3.00)
A^1_SMUCE         0.28(3.00)  0.62(3.00)  0.94(3.00)  0.97(3.00)  0.96(3.00)
COPSS(BS)         0.19        0.25        0.46        0.47        0.48
COPSS(SN)         0.18        0.33        0.57        0.77        0.85
FDRseg            0.39        0.45        0.44        0.43        0.38
SMUCE             0.07        0.30        0.58        0.81        0.89

Table 2: The coverage rates in the mean-change model: $d = 5$; t-distributed error.

Amplitude A       0.50        0.625       0.75        0.875       1.00
OPTICS(BS)        0.60(3.37)  0.54(2.84)  0.61(2.17)  0.58(2.29)  0.70(2.06)
OPTICS(SN)        0.73(4.09)  0.82(3.40)  0.78(2.33)  0.91(2.29)  0.97(2.37)
A^1_COPSS(BS)     0.51(3.00)  0.51(3.00)  0.65(3.00)  0.77(3.00)  0.84(3.00)
A^1_COPSS(SN)     0.58(3.00)  0.66(3.00)  0.85(3.00)  0.92(3.00)  0.93(3.00)
COPSS(BS)         0.20        0.23        0.38        0.40        0.54
COPSS(SN)         0.24        0.39        0.68        0.84        0.89

5.2 Linear regression model with coefficient structural breaks

The change-point detection problem can be naturally extended to coefficient structural-change detection in linear regression models. In this subsection, we consider the following linear regression model with $K^*$ potential coefficient structural breaks:
$$y_i = x_i^\top \beta^*_k + \epsilon_i, \quad \tau^*_{k-1} < i \leq \tau^*_k, \quad k = 1, \ldots, K^*+1, \; i = 1, \ldots, 2n,$$
where $\beta^*_k = (-1)^{k-1} A \mathbf{1}_d$, $k = 1, \ldots, K^*+1$. We generate the covariate $x_i$ from the multivariate normal distribution $N(\mathbf{0}, I_d)$ with $d = 5$, and the error term $\epsilon_i$ from $N(0,1)$ and $t(10)$, respectively. The sample size is $n = 1000$, the true number of change-points is $K^* = 4$, and the true change-point set is $\mathcal{T}^* = \{200k, k = 1, \ldots, 4\}$. The coverage rates under the $t(10)$ distribution are presented in Table 3.
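For reference, the Section 5.2 design can be simulated as follows. This is a sketch: the alternating coefficients and $t(10)$ errors follow the description above, while the function name, defaults, and the convention of laying out the $2n$ observations are our own.

```python
import numpy as np

def gen_structural_break_data(n2=2000, d=5, A=0.15, taus=(200, 400, 600, 800), seed=0):
    """Sketch of the structural-break design: y_i = x_i' beta*_k + eps_i with
    beta*_k = (-1)^{k-1} A 1_d alternating across segments and t(10) errors."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n2, d))        # covariates ~ N(0, I_d)
    eps = rng.standard_t(df=10, size=n2)    # heavy-ish t(10) errors
    bounds = [0, *taus, n2]
    y = np.empty(n2)
    for k in range(len(bounds) - 1):
        beta = ((-1) ** k) * A * np.ones(d)  # coefficient of segment k+1
        lo, hi = bounds[k], bounds[k + 1]
        y[lo:hi] = x[lo:hi] @ beta + eps[lo:hi]
    return x, y
```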
For the coverage rates under $N(0,1)$, please refer to Table S.4 in Section S.5 of the Supplementary Material.

Table 3: The coverage rates in the linear model with coefficient structural breaks; $t(10)$ errors.

Amplitude A       0.10        0.125       0.15        0.175       0.20
OPTICS(BS)        0.76(3.94)  0.89(3.67)  0.79(3.26)  0.94(2.80)  0.85(2.39)
OPTICS(SN)        0.85(3.22)  0.95(2.65)  0.94(2.19)  0.99(1.95)  0.98(1.62)
A^1_COPSS(BS)     0.44(3.00)  0.46(3.00)  0.52(3.00)  0.52(3.00)  0.53(3.00)
A^1_COPSS(SN)     0.67(3.00)  0.72(3.00)  0.78(3.00)  0.75(3.00)  0.78(3.00)
COPSS(BS)         0.14        0.24        0.21        0.19        0.28
COPSS(SN)         0.46        0.46        0.58        0.59        0.66

Since FDRseg and SMUCE are inapplicable for structural-break detection, our comparison focuses solely on OPTICS against COPSS and its related quasi-confidence sets. In Table 3, a notably superior performance of OPTICS is observed in terms of coverage probability, particularly when SN is used as the base algorithm. Importantly, COPSS fails to achieve consistency under these weak-signal settings, and the associated quasi-confidence sets that match the cardinalities of OPTICS are not adequately expanded to ensure the coverage rate. In contrast, OPTICS adaptively adjusts its cardinality to achieve the desired confidence level, owing to its underlying testing framework. We also include expanded simulation studies covering variance, network, and covariance change-points, as well as multiple mean changes with heavy-tailed and $m$-dependent distributions, in Section S.5 of the Supplementary Material.

6 Real Data Analysis

In this section, we apply OPTICS to analyze the bladder tumor microarray dataset sourced from the ecp R package (James and Matteson, 2015). We extract data from 10 individuals diagnosed with bladder tumors. For each individual, log-intensity-ratio measurements for 2215 distinct genetic loci were collected.
Our primary objective is to identify the number of change-points within the genetic loci, allowing us to pinpoint potentially influential genes associated with bladder tumors.

We first demonstrate that the dataset may exhibit sub-Gaussian behavior, thereby satisfying the main assumption of OPTICS. An equivalent condition for sub-Gaussianity is that the $k$th moment is bounded by $c k^{k/2}$ for some universal constant $c > 0$. The left panel in Figure 2 presents the values of up to the 100th moments for the 10 individuals, which are bounded by $k^{k/2}$ (black curve). This confirms that, at least for each individual, the data exhibit sub-Gaussian properties.

Figure 2: Left panel: the $k$th moment plots of the 10 individuals, where the black curve represents $y = k^{k/2}$; the y-axis is rescaled using $\log_{10}$. Right panel: boxplots of Hausdorff distances between the confidence sets obtained by OPTICS and those generated randomly.

We apply our OPTICS procedure for detecting multiple mean changes to the bladder tumor microarray dataset. This dataset is analyzed from two perspectives. First, we apply OPTICS to each individual and construct the corresponding confidence sets. Since all ten individuals share the same disease, their change-point patterns may exhibit similar structures. We set $K_{\max} = 3\log(10)$ and apply OPTICS to obtain $\mathcal{A}_i$ for $i \in [10]$, where $\mathcal{A}_i$ represents the confidence set for the number of change-points for individual $i$. The concrete values within each set can be found in Section S.6 of the Supplementary Material. To verify whether the $\mathcal{A}_i$ for $i \in [10]$ share similar patterns, we compute the pairwise Hausdorff distances between these sets and compare them to randomly generated sets. Specifically, we construct $\mathcal{B}_i$ for $i \in [10]$, where $\mathcal{B}_i$ is sampled from $\{1, \ldots, K_{\max}\}$ without replacement and has the same cardinality as $\mathcal{A}_i$.
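The two diagnostics just described, the moment-based sub-Gaussianity check and the pairwise Hausdorff comparison, can be sketched as follows. The function names and the `k_max` cutoff are our own illustrative choices; the paper's plot goes up to the 100th moment.

```python
import numpy as np

def subgaussian_moment_check(x, k_max=20):
    """Compare empirical k-th absolute moments of a sample against the
    sub-Gaussian envelope k^{k/2}, on the log10 scale (as in Figure 2, left)."""
    x = np.asarray(x, dtype=float)
    ks = np.arange(1, k_max + 1)
    log_moments = np.array([np.log10(np.mean(np.abs(x) ** k)) for k in ks])
    log_envelope = (ks / 2) * np.log10(ks)
    return bool(np.all(log_moments <= log_envelope))

def hausdorff(A, B):
    """Hausdorff distance between two finite sets of candidate counts,
    used to compare the per-individual confidence sets (Figure 2, right)."""
    d_AB = max(min(abs(a - b) for b in B) for a in A)
    d_BA = max(min(abs(a - b) for a in A) for b in B)
    return max(d_AB, d_BA)
```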
The right panel of Figure 2 presents boxplots of the distributions of Hausdorff distances for the OPTICS confidence sets and the randomly generated sets. It is evident that the OPTICS confidence sets have smaller Hausdorff distances, indicating greater consistency in detecting shared patterns across individuals. This shows that OPTICS has power in covering the underlying true number of change-points. We further detect the change-points using the Binary Segmentation algorithm. To control the family-wise error rate (FWER) (see the discussion in Section 2.3), we select the smallest value in each $\mathcal{A}_i$ as the optimal number of change-points. The detected change-points, based on the corresponding numbers of change-points, are presented in Figure 3. From the figure, it is evident that the number of change-points chosen by OPTICS provides a good fit for each individual.

Figure 3: Detected change-points for each individual separately.

We now perform a joint analysis of the 10 individuals. Using OPTICS for the multi-dimensional mean change-point model, we construct the confidence set for the number of change-points, $\mathcal{A}_{\mathrm{joint}} = \{10, 13, 16, 19, 22\}$. To control the FWER, we select the minimum value within $\mathcal{A}_{\mathrm{joint}}$. The change-points are then detected using the Wild Binary Segmentation (WBS) algorithm, which gives $\mathcal{S}_{\mathrm{joint}} = \{154, 358, 1140, 1268, 1534, 1724, 1906, 1966, 2052, 2142\}$, as shown in Figure 4. The plot illustrates that each dark red bold vertical line, identified via the OPTICS method, corresponds to a location where a change occurs in at least one coordinate. Conversely, the light red vertical lines, representing the default WBS detections not selected by OPTICS, contain numerous false positives. This confirms that OPTICS-based selection effectively controls the false discovery rate while preserving detection accuracy.
Figure 4: Detected change-points for the 10 individuals analyzed jointly. The bold dashed lines (dark red) represent change-points selected by the OPTICS FWER-control criterion, while the solid light lines (light red) indicate additional change-points identified by the default stopping criterion in the wbs R package.

7 Conclusion

Determining the number of change-points is a fundamental problem in the literature. Rather than offering a single point estimate, we propose a testing framework designed to construct a confidence set for the true number of change-points. The proposed method, named OPTICS, rigorously covers the true number with the predetermined confidence level under mild conditions. Additionally, we study the cardinality of OPTICS to ensure the obtained set is nontrivial and informative. The cardinality and coverage rate of OPTICS can also be utilized to assess the efficacy of base change-point detection algorithms. Furthermore, we implement a multiple-splitting approach to stabilize OPTICS. We also extend this framework to accommodate high-dimensional datasets and develop a robust version capable of handling $m$-dependent and heavy-tailed distributions.

8 Supplementary Materials

The Supplementary Material provides proofs for Theorems 3.1, 3.2, and 3.3, along with an extended literature review on statistical inference for change-point detection. We also include guidelines for selecting loss functions across various models, and expanded simulation studies covering variance, network, and covariance change-points, as well as multiple mean changes with heavy-tailed and $m$-dependent distributions. Finally, additional real-data results are provided; the dataset used in the real data analysis is available in the ecp R package.
Supplement to "OPTICS: Order-Preserved Test-Inverse Confidence Set for Number of Change-Points"

This supplementary material contains all technical proofs, an additional literature review, and additional simulation and real-data results for the main paper.

S.1 Proofs for Theorems in Section 3.1

In the following proofs, we denote by $c$ and $C$ generic constants that may differ from line to line.

S.1.1 Auxiliary lemmas

The first lemma gives a uniform concentration of the sample means $\bar{s}^O_{K,i}$ for all $i = 1, \ldots, n$ and $K \in \mathcal{M}$. Let $\|x\|_1 = \sum_{j=1}^d |x_j|$ for a vector $x \in \mathbb{R}^d$.

Lemma S.1. Under Conditions 3.1 and 3.2, there exists a positive constant $c$ such that
$$\Pr\left\{\max_{K \in \mathcal{M}} \max_{1 \leq i \leq n} \left\|\bar{s}^O_{K,i} - E(\bar{s}^O_{K,i})\right\|_1 \gtrsim n^{-1/2}\log(n)\right\} \lesssim n^{-c}.$$

Proof. Recall that $s_i = E(s_i) + \epsilon_i$, where $\epsilon_i := s_i - E(s_i) = (\epsilon_{i1}, \ldots, \epsilon_{id})^\top$ for $i = 1, \ldots, n$. By Condition 3.1(i), for each $j = 1, \ldots, d$, $\|\epsilon_{ij}\|_{\psi_1} \leq M_1$, $i = 1, \ldots, n$. Since we do not consider the high-dimensional setting here, $d$ is fixed. For each candidate segmentation $\mathcal{T}_K = \{\tau^K_1, \ldots, \tau^K_K\}$, the quantity $\bar{s}^O_{K,i}$ is the sample average of $\{s_l\}$ over the segment containing $i$. Hence,
$$\bar{s}^O_{K,i} - E(\bar{s}^O_{K,i}) = \frac{1}{b-a+1} \sum_{l=a}^b \epsilon_l$$
for some subset $\{a, \ldots, b\} \subset \{1, \ldots, n\}$ determined by $K$ and $i$. By Condition 3.2, every such interval has length at least of order $n/\log(n)$. Therefore, if we define
$$\Theta_n := \left\{(a,b) : 1 \leq a \leq b \leq n, \; b-a+1 \gtrsim \frac{n}{\log(n)}\right\},$$
then
$$\Pr\left\{\max_{K \in \mathcal{M}} \max_{1 \leq i \leq n} \|\bar{s}^O_{K,i} - E(\bar{s}^O_{K,i})\|_1 \geq x_n\right\} \leq \Pr\left\{\max_{(a,b) \in \Theta_n} \left\|\frac{1}{b-a+1}\sum_{l=a}^b \epsilon_l\right\|_1 \geq x_n\right\}. \quad (S.1)$$
Now fix $(a,b) \in \Theta_n$ and let $m = b-a+1$. Since
$$\left\|\frac{1}{m}\sum_{l=a}^b \epsilon_l\right\|_1 \leq \sum_{j=1}^d \left|\frac{1}{m}\sum_{l=a}^b \epsilon_{lj}\right|,$$
we have, by the union bound,
$$\Pr\left\{\max_{(a,b)\in\Theta_n} \left\|\frac{1}{b-a+1}\sum_{l=a}^b \epsilon_l\right\|_1 \geq x_n\right\} \leq \sum_{(a,b)\in\Theta_n} \Pr\left\{\left\|\frac{1}{b-a+1}\sum_{l=a}^b \epsilon_l\right\|_1 \geq x_n\right\} \leq \sum_{(a,b)\in\Theta_n} \sum_{j=1}^d \Pr\left\{\left|\frac{1}{b-a+1}\sum_{l=a}^b \epsilon_{lj}\right| \geq \frac{x_n}{d}\right\}.$$
(S.2)

For each fixed $(a,b) \in \Theta_n$ and $j \in \{1, \ldots, d\}$, the random variables $\epsilon_{aj}, \ldots, \epsilon_{bj}$ are independent, mean zero, and uniformly sub-exponential. Hence, by Bernstein's inequality for sub-exponential random variables, there exist positive constants $c_1$ and $c_2$, depending only on $M_1$, such that
$$\Pr\left\{\left|\frac{1}{m}\sum_{l=a}^b \epsilon_{lj}\right| \geq t\right\} \leq 2\exp\left(-c_1 m \min\{t^2, t\}\right), \quad t > 0.$$
Applying this bound with $t = x_n/d$ and using $m \gtrsim n/\log(n)$ for $(a,b) \in \Theta_n$, we obtain
$$\Pr\left\{\left|\frac{1}{b-a+1}\sum_{l=a}^b \epsilon_{lj}\right| \geq \frac{x_n}{d}\right\} \lesssim \exp\left(-\frac{c_2 n}{\log(n)} \min\{x_n^2, x_n\}\right). \quad (S.3)$$
Since $|\Theta_n| \leq n^2$ and $d$ is fixed, combining (S.1)–(S.3) yields
$$\Pr\left\{\max_{K\in\mathcal{M}} \max_{1\leq i\leq n} \|\bar{s}^O_{K,i} - E(\bar{s}^O_{K,i})\|_1 \geq x_n\right\} \lesssim n^2 \exp\left(-\frac{c_2 n}{\log(n)}\min\{x_n^2, x_n\}\right).$$
Now choose $x_n = A n^{-1/2}\log(n)$ for a sufficiently large constant $A > 0$. Then $x_n \to 0$, so for all sufficiently large $n$, $\min\{x_n^2, x_n\} = x_n^2 = A^2 n^{-1}\log^2(n)$, and therefore
$$\frac{n}{\log(n)}\min\{x_n^2, x_n\} = A^2 \log(n).$$
It follows that
$$\Pr\left\{\max_{K\in\mathcal{M}} \max_{1\leq i\leq n} \|\bar{s}^O_{K,i} - E(\bar{s}^O_{K,i})\|_1 \geq A n^{-1/2}\log(n)\right\} \lesssim n^{2 - c_2 A^2}.$$
By taking $A$ sufficiently large, we obtain
$$\Pr\left\{\max_{K\in\mathcal{M}} \max_{1\leq i\leq n} \|\bar{s}^O_{K,i} - E(\bar{s}^O_{K,i})\|_1 \gtrsim n^{-1/2}\log(n)\right\} \lesssim n^{-c}$$
for some positive constant $c$. This completes the proof.

The next lemma shows that $(\xi^{(i)}_{K,J} - E[\xi^{(i)}_{K,J}])/\sigma_{K,J}$ has a sub-exponential tail under $H_{0,K}$, as needed for our main theorem.

Lemma S.2. Let $\xi^{(i)}_{K,J} := \|s^E_i - \bar{s}^O_{K,i}\|_2^2 - \|s^E_i - \bar{s}^O_{J,i}\|_2^2$, and let $\mathcal{F}^O$ denote the $\sigma$-field generated by the observed sample used to construct $\{\bar{s}^O_{K,i} : K \in \mathcal{M}, 1 \leq i \leq n\}$. Define the conditional variance scale $\sigma^2_{K,J} := \frac{1}{n}\sum_{i=1}^n \mathrm{Var}(\xi^{(i)}_{K,J} \mid \mathcal{F}^O)$. Assume Conditions 3.1 and 3.2 hold. In addition, assume that for the pair $(K, J)$ under consideration, the detected change-point sets $\mathcal{T}_K$ and $\mathcal{T}_J$ are nested.
Then, under $H_{0,K}$, on the event
$$\mathcal{E}_n := \left\{\max_{K\in\mathcal{M}} \max_{1\leq i\leq n} \|\bar{s}^O_{K,i} - E(\bar{s}^O_{K,i})\|_1 \leq c n^{-1/2}\log(n)\right\}, \quad (S.4)$$
there exists a constant $C > 0$ such that
$$\max_{1\leq i\leq n} \left\|\frac{\xi^{(i)}_{K,J} - E(\xi^{(i)}_{K,J} \mid \mathcal{F}^O)}{\sigma_{K,J}}\right\|_{\psi_1 \mid \mathcal{F}^O} \leq C\sqrt{\log(n)}.$$
Consequently,
$$\Pr\left\{\max_{1\leq i\leq n} \left\|\frac{\xi^{(i)}_{K,J} - E(\xi^{(i)}_{K,J} \mid \mathcal{F}^O)}{\sigma_{K,J}}\right\|_{\psi_1 \mid \mathcal{F}^O} \lesssim \sqrt{\log(n)}\right\} \to 1.$$

Proof. By Lemma S.1, $\Pr(\mathcal{E}_n) \to 1$. We work on the event $\mathcal{E}_n$ throughout the proof. For fixed $K, J \in \mathcal{M}$ and $1 \leq i \leq n$, expand
$$\xi^{(i)}_{K,J} = \|s^E_i - \bar{s}^O_{K,i}\|_2^2 - \|s^E_i - \bar{s}^O_{J,i}\|_2^2 = \|\bar{s}^O_{K,i}\|_2^2 - \|\bar{s}^O_{J,i}\|_2^2 - 2(s^E_i)^\top(\bar{s}^O_{K,i} - \bar{s}^O_{J,i}) = (\bar{s}^O_{K,i} + \bar{s}^O_{J,i})^\top \Delta_{K,J,i} - 2(s^E_i)^\top \Delta_{K,J,i},$$
where $\Delta_{K,J,i} := \bar{s}^O_{K,i} - \bar{s}^O_{J,i}$. Conditioning on $\mathcal{F}^O$, the vectors $\bar{s}^O_{K,i}$, $\bar{s}^O_{J,i}$, and hence $\Delta_{K,J,i}$, are deterministic. Therefore,
$$\xi^{(i)}_{K,J} - E(\xi^{(i)}_{K,J} \mid \mathcal{F}^O) = -2\left\{(s^E_i)^\top \Delta_{K,J,i} - E\left[(s^E_i)^\top \Delta_{K,J,i} \mid \mathcal{F}^O\right]\right\}.$$
Since $\bar{s}^O_{K,i}$ and $\bar{s}^O_{J,i}$ are $\mathcal{F}^O$-measurable, it follows from Condition 3.1(i) and basic properties of the Orlicz norm that
$$\left\|\xi^{(i)}_{K,J} - E(\xi^{(i)}_{K,J} \mid \mathcal{F}^O)\right\|_{\psi_1 \mid \mathcal{F}^O} \lesssim \|\Delta_{K,J,i}\|_2. \quad (S.5)$$
Indeed, writing $\Delta_{K,J,i} = (\delta_{i1}, \ldots, \delta_{id})^\top$ and using that $d$ is fixed,
$$\left\|(s^E_i)^\top \Delta_{K,J,i}\right\|_{\psi_1 \mid \mathcal{F}^O} \leq \sum_{j=1}^d |\delta_{ij}| \|s^E_{ij}\|_{\psi_1} \lesssim \|\Delta_{K,J,i}\|_1 \lesssim \|\Delta_{K,J,i}\|_2.$$
Next, we study the denominator. Since $\xi^{(i)}_{K,J} - E(\xi^{(i)}_{K,J} \mid \mathcal{F}^O) = -2\{(s^E_i)^\top \Delta_{K,J,i} - E[(s^E_i)^\top \Delta_{K,J,i} \mid \mathcal{F}^O]\}$, we have
$$\mathrm{Var}(\xi^{(i)}_{K,J} \mid \mathcal{F}^O) = 4\, \Delta_{K,J,i}^\top \mathrm{Var}(s^E_i)\, \Delta_{K,J,i}.$$
By Condition 3.1(iii), there exists a constant $c_0 > 0$ such that
$$\mathrm{Var}(\xi^{(i)}_{K,J} \mid \mathcal{F}^O) \geq c_0 \|\Delta_{K,J,i}\|_2^2. \quad (S.6)$$
Hence,
$$\sigma^2_{K,J} = \frac{1}{n}\sum_{i=1}^n \mathrm{Var}(\xi^{(i)}_{K,J} \mid \mathcal{F}^O) \gtrsim \frac{1}{n}\sum_{i=1}^n \|\Delta_{K,J,i}\|_2^2.$$
Now we use the additional nesting assumption on $\mathcal{T}_K$ and $\mathcal{T}_J$.
Since the two segmentations are nested, the vector sequence $\{ \Delta_{K,J,i} \}_{i=1}^{n}$ is piecewise constant on the finer partition, and each constant block has length at least of order $n/\log(n)$ by Condition 3.2. Therefore, if $M_{K,J} := \max_{1 \leq i \leq n} \| \Delta_{K,J,i} \|_2$, then there are at least $c_1 n / \log(n)$ indices $i$ such that $\| \Delta_{K,J,i} \|_2 = M_{K,J}$, for some constant $c_1 > 0$. It follows that
$$\frac{1}{n} \sum_{i=1}^{n} \| \Delta_{K,J,i} \|_2^2 \gtrsim \frac{1}{\log(n)} M_{K,J}^2. \tag{S.7}$$
Combining this with (S.6), we obtain
$$\sigma_{K,J} \gtrsim \frac{1}{\sqrt{\log(n)}} M_{K,J}. \tag{S.8}$$
Finally, combining (S.5) and (S.8) yields, for every $1 \leq i \leq n$,
$$\left\| \frac{ \xi^{(i)}_{K,J} - \mathrm{E}( \xi^{(i)}_{K,J} \mid \mathcal{F}^O ) }{ \sigma_{K,J} } \right\|_{\psi_1 \mid \mathcal{F}^O} \lesssim \frac{ \| \Delta_{K,J,i} \|_2 }{ M_{K,J} / \sqrt{\log(n)} } \leq C \sqrt{\log(n)}$$
for some constant $C > 0$. Taking the maximum over $1 \leq i \leq n$ gives
$$\max_{1 \leq i \leq n} \left\| \frac{ \xi^{(i)}_{K,J} - \mathrm{E}( \xi^{(i)}_{K,J} \mid \mathcal{F}^O ) }{ \sigma_{K,J} } \right\|_{\psi_1 \mid \mathcal{F}^O} \leq C \sqrt{\log(n)}$$
on $\mathcal{E}_n$. Since $\Pr(\mathcal{E}_n) \to 1$, the conclusion follows.

Lemma S.3. Let $\hat{\delta}_{K,J} := \frac{1}{n} \sum_{i=1}^{n} \xi^{(i)}_{K,J}$ and $\delta_{K,J} := \frac{1}{n} \sum_{i=1}^{n} \mathrm{E}( \xi^{(i)}_{K,J} \mid \mathcal{F}^O )$, where $\mathcal{F}^O$ denotes the $\sigma$-field generated by the observed sample used to construct $\{ \bar{s}^O_{K,i} : K \in \mathcal{M}, 1 \leq i \leq n \}$. Under Conditions 3.1 and 3.2, and assuming that for each pair $(K, J)$ under consideration the detected change-point sets $\mathcal{T}_K$ and $\mathcal{T}_J$ are nested, there exists a positive constant $c$ such that
$$\Pr\left( \max_{J \neq K} \left| \frac{ \hat{\delta}_{K,J} - \delta_{K,J} }{ \sigma_{K,J} } \right| \gtrsim n^{-1/2} \log(n) \right) \lesssim n^{-c},$$
where $\sigma^2_{K,J} := \frac{1}{n} \sum_{i=1}^{n} \mathrm{Var}( \xi^{(i)}_{K,J} \mid \mathcal{F}^O )$ for $J \in \mathcal{M} \setminus \{K\}$.
Proof. Recall that by Lemma S.1, $\Pr(\mathcal{E}_n^c) \lesssim n^{-c_1}$ for some positive constant $c_1$, where $\mathcal{E}_n$ is defined in (S.4). Fix $J \neq K$, and define
$$Z_{i,J} := \frac{ \xi^{(i)}_{K,J} - \mathrm{E}( \xi^{(i)}_{K,J} \mid \mathcal{F}^O ) }{ \sigma_{K,J} }, \quad i = 1, \dots, n.$$
Then
$$\frac{ \hat{\delta}_{K,J} - \delta_{K,J} }{ \sigma_{K,J} } = \frac{1}{n} \sum_{i=1}^{n} Z_{i,J}.$$
Conditional on $\mathcal{F}^O$, the random variables $Z_{1,J}, \dots, Z_{n,J}$ are independent and satisfy $\mathrm{E}( Z_{i,J} \mid \mathcal{F}^O ) = 0$ for $i = 1, \dots, n$. Moreover, by Lemma S.2, on the event $\mathcal{E}_n$, $\max_{1 \leq i \leq n} \| Z_{i,J} \|_{\psi_1 \mid \mathcal{F}^O} \lesssim \sqrt{\log(n)}$.
Therefore, conditional on $\mathcal{F}^O$ and on the event $\mathcal{E}_n$, Bernstein's inequality for independent centered sub-exponential random variables yields
$$\Pr\left( \left| \frac{1}{n} \sum_{i=1}^{n} Z_{i,J} \right| \geq x_n \,\Big|\, \mathcal{F}^O \right) \leq 2 \exp\left[ -c_2 \, n \min\left\{ \frac{x_n^2}{\log(n)}, \frac{x_n}{\sqrt{\log(n)}} \right\} \right]$$
for some positive constant $c_2$. Hence, again on $\mathcal{E}_n$,
$$\Pr\left( \left| \frac{ \hat{\delta}_{K,J} - \delta_{K,J} }{ \sigma_{K,J} } \right| \geq x_n \,\Big|\, \mathcal{F}^O \right) \leq 2 \exp\left[ -c_2 \, n \min\left\{ \frac{x_n^2}{\log(n)}, \frac{x_n}{\sqrt{\log(n)}} \right\} \right].$$
Applying the union bound over $J \in \mathcal{M} \setminus \{K\}$, we obtain on $\mathcal{E}_n$,
$$\Pr\left( \max_{J \neq K} \left| \frac{ \hat{\delta}_{K,J} - \delta_{K,J} }{ \sigma_{K,J} } \right| \geq x_n \,\Big|\, \mathcal{F}^O \right) \leq 2 |\mathcal{M}| \exp\left[ -c_2 \, n \min\left\{ \frac{x_n^2}{\log(n)}, \frac{x_n}{\sqrt{\log(n)}} \right\} \right].$$
Since $|\mathcal{M}| \leq K_{\max}$, it follows that
$$\Pr\left( \max_{J \neq K} \left| \frac{ \hat{\delta}_{K,J} - \delta_{K,J} }{ \sigma_{K,J} } \right| \geq x_n, \; \mathcal{E}_n \right) \leq 2 K_{\max} \exp\left[ -c_2 \, n \min\left\{ \frac{x_n^2}{\log(n)}, \frac{x_n}{\sqrt{\log(n)}} \right\} \right].$$
Now choose $x_n = A n^{-1/2} \log(n)$ for a sufficiently large constant $A > 0$. Then
$$\frac{n x_n^2}{\log(n)} = A^2 \log(n), \qquad \frac{n x_n}{\sqrt{\log(n)}} = A \sqrt{n \log(n)}.$$
Hence, for all sufficiently large $n$,
$$n \min\left\{ \frac{x_n^2}{\log(n)}, \frac{x_n}{\sqrt{\log(n)}} \right\} = A^2 \log(n),$$
and therefore
$$\Pr\left( \max_{J \neq K} \left| \frac{ \hat{\delta}_{K,J} - \delta_{K,J} }{ \sigma_{K,J} } \right| \geq A n^{-1/2} \log(n), \; \mathcal{E}_n \right) \leq 2 K_{\max} \, n^{-c_2 A^2}.$$
If $K_{\max}$ is fixed or grows at most polynomially in $n$, then by taking $A$ sufficiently large, $2 K_{\max} n^{-c_2 A^2} \lesssim n^{-c_3}$ for some positive constant $c_3$. Finally,
$$\Pr\left( \max_{J \neq K} \left| \frac{ \hat{\delta}_{K,J} - \delta_{K,J} }{ \sigma_{K,J} } \right| \geq A n^{-1/2} \log(n) \right) \leq \Pr(\mathcal{E}_n^c) + \Pr\left( \max_{J \neq K} \left| \frac{ \hat{\delta}_{K,J} - \delta_{K,J} }{ \sigma_{K,J} } \right| \geq A n^{-1/2} \log(n), \; \mathcal{E}_n \right) \lesssim n^{-c}$$
for some positive constant $c$. This completes the proof.

Let $\sigma^2_{K,J} = \sum_{i=1}^{n} \mathrm{E}[ ( \xi^{(i)}_{K,J} )^2 ] / n$ and $\hat{\delta}_{K,J} = \sum_{i=1}^{n} \xi^{(i)}_{K,J} / n$ for $J \in \mathcal{M} \setminus \{K\}$. The sample second-moment matrix is defined as
$$\widehat{\Gamma}^{(K)} = \left( \frac{1}{n} \sum_{i=1}^{n} \xi^{(i)}_{K,J} \xi^{(i)}_{K,J'} \right)_{J, J' \in \mathcal{M} \setminus \{K\}} \in \mathbb{R}^{(|\mathcal{M}|-1) \times (|\mathcal{M}|-1)},$$
with its population analog
$$\Gamma^{(K)} = \left( \frac{1}{n} \sum_{i=1}^{n} \mathrm{E}( \xi^{(i)}_{K,J} \xi^{(i)}_{K,J'} \mid \mathcal{F}^O ) \right)_{J, J'},$$
where $\mathcal{F}^O$ denotes the $\sigma$-field generated by the observed sample used to construct $\{ \bar{s}^O_{K,i} : K \in \mathcal{M}, 1 \leq i \leq n \}$.
Besides, let $D^{(K)} = \mathrm{diag}( \Gamma^{(K)} )^{1/2}$ and $\widehat{D}^{(K)} = \mathrm{diag}( \widehat{\Gamma}^{(K)} )^{1/2}$. Furthermore, let $\widehat{H}^{(K)} = ( \widehat{D}^{(K)} )^{-1} \widehat{\Gamma}^{(K)} ( \widehat{D}^{(K)} )^{-1}$ and $H^{(K)} = ( D^{(K)} )^{-1} \Gamma^{(K)} ( D^{(K)} )^{-1}$.

Lemma S.4. Assume Conditions 3.1 and 3.2 hold. In addition, assume the nesting condition in Lemma S.2, and that there exists a constant $c_0 > 0$ such that $\min_{J \in \mathcal{M} \setminus \{K\}} D^{(K)}_{JJ} \geq c_0$ with probability tending to one. If $K_{\max} \asymp \log(n)$, then there exists a positive constant $c$ such that
$$\Pr\left( \max_{J, J' \in \mathcal{M} \setminus \{K\}} \left| \left( \widehat{H}^{(K)} - H^{(K)} \right)_{J,J'} \right| \gtrsim n^{-1/2} \log^{3/2}(n) \right) \lesssim n^{-c}.$$
Proof. Recall that by Lemma S.1, $\Pr(\mathcal{E}_n^c) \lesssim n^{-c_1}$ for some $c_1 > 0$, where $\mathcal{E}_n$ is defined in (S.4). We work on the event $\mathcal{E}_n$ throughout the proof. For $J \in \mathcal{M} \setminus \{K\}$, define
$$Z_{i,J} := \frac{ \xi^{(i)}_{K,J} - \mathrm{E}( \xi^{(i)}_{K,J} \mid \mathcal{F}^O ) }{ D^{(K)}_{JJ} }, \quad i = 1, \dots, n.$$
By Lemma S.2, on $\mathcal{E}_n$, $\max_{J \in \mathcal{M} \setminus \{K\}} \max_{1 \leq i \leq n} \| Z_{i,J} \|_{\psi_1 \mid \mathcal{F}^O} \lesssim \sqrt{\log(n)}$. Hence, for each pair $(J, J')$,
$$U_{i,JJ'} := Z_{i,J} Z_{i,J'} - \mathrm{E}( Z_{i,J} Z_{i,J'} \mid \mathcal{F}^O )$$
is conditionally centered and satisfies $\max_{J,J'} \max_{1 \leq i \leq n} \| U_{i,JJ'} \|_{\psi_{1/2} \mid \mathcal{F}^O} \lesssim \log(n)$, because the product of two sub-exponential random variables is sub-Weibull of order $1/2$. Now note that
$$\frac{ \widehat{\Gamma}^{(K)}_{JJ'} - \Gamma^{(K)}_{JJ'} }{ D^{(K)}_{JJ} D^{(K)}_{J'J'} } = \frac{1}{n} \sum_{i=1}^{n} \left[ \frac{ \xi^{(i)}_{K,J} \xi^{(i)}_{K,J'} }{ D^{(K)}_{JJ} D^{(K)}_{J'J'} } - \mathrm{E}\left( \frac{ \xi^{(i)}_{K,J} \xi^{(i)}_{K,J'} }{ D^{(K)}_{JJ} D^{(K)}_{J'J'} } \,\Big|\, \mathcal{F}^O \right) \right] = \frac{1}{n} \sum_{i=1}^{n} U_{i,JJ'} + R_{JJ'},$$
where
$$R_{JJ'} = \frac{1}{n} \sum_{i=1}^{n} \frac{ \mathrm{E}( \xi^{(i)}_{K,J} \mid \mathcal{F}^O ) \, \mathrm{E}( \xi^{(i)}_{K,J'} \mid \mathcal{F}^O ) }{ D^{(K)}_{JJ} D^{(K)}_{J'J'} }.$$
By the proof of Lemma S.2, on $\mathcal{E}_n$,
$$\max_{J} \max_{1 \leq i \leq n} \left| \frac{ \mathrm{E}( \xi^{(i)}_{K,J} \mid \mathcal{F}^O ) }{ D^{(K)}_{JJ} } \right| \lesssim \sqrt{\log(n)},$$
so $| R_{JJ'} | \lesssim \log(n)$ uniformly in $(J, J')$. Since $R_{JJ'}$ appears in both $\widehat{\Gamma}^{(K)}_{JJ'}$ and $\Gamma^{(K)}_{JJ'}$, it cancels in the centered difference above. Therefore it suffices to control the average of $\{ U_{i,JJ'} \}_{i=1}^{n}$.
Conditional on $\mathcal{F}^O$, the variables $U_{1,JJ'}, \dots, U_{n,JJ'}$ are independent, centered, and uniformly sub-Weibull of order $1/2$ with parameter of order $\log(n)$. Hence, Bernstein's inequality for sub-Weibull random variables yields
$$\Pr\left( \left| \frac{1}{n} \sum_{i=1}^{n} U_{i,JJ'} \right| \geq t \,\Big|\, \mathcal{F}^O \right) \leq 2 \exp\left[ -c_2 \min\left\{ \frac{n t^2}{\log^2(n)}, \left( \frac{n t}{\log(n)} \right)^{1/2} \right\} \right]$$
for some $c_2 > 0$. Applying the union bound over all $(J, J') \in ( \mathcal{M} \setminus \{K\} )^2$, we obtain on $\mathcal{E}_n$,
$$\Pr\left( \max_{J,J'} \left| \frac{ \widehat{\Gamma}^{(K)}_{JJ'} - \Gamma^{(K)}_{JJ'} }{ D^{(K)}_{JJ} D^{(K)}_{J'J'} } \right| \geq t \,\Big|\, \mathcal{F}^O \right) \leq 2 ( K_{\max} - 1 )^2 \exp\left[ -c_2 \min\left\{ \frac{n t^2}{\log^2(n)}, \left( \frac{n t}{\log(n)} \right)^{1/2} \right\} \right]. \tag{S.9}$$
Choose $t = A n^{-1/2} \log^{3/2}(n)$ for a sufficiently large constant $A > 0$. Then
$$\frac{n t^2}{\log^2(n)} = A^2 \log(n), \qquad \left( \frac{n t}{\log(n)} \right)^{1/2} = A^{1/2} n^{1/4} \log^{1/4}(n),$$
so the first term determines the rate for all sufficiently large $n$. Since $K_{\max}$ is fixed or grows at most polynomially in $n$, (S.9) implies that
$$\Pr\left( \max_{J,J'} \left| \frac{ \widehat{\Gamma}^{(K)}_{JJ'} - \Gamma^{(K)}_{JJ'} }{ D^{(K)}_{JJ} D^{(K)}_{J'J'} } \right| \gtrsim n^{-1/2} \log^{3/2}(n), \; \mathcal{E}_n \right) \lesssim n^{-c_3} \tag{S.10}$$
for some $c_3 > 0$. Next, for each $J$, $\widehat{D}^{(K)}_{JJ} = ( \widehat{\Gamma}^{(K)}_{JJ} )^{1/2}$ and $D^{(K)}_{JJ} = ( \Gamma^{(K)}_{JJ} )^{1/2}$. By (S.10) with $J = J'$,
$$\max_{J} \left| \widehat{\Gamma}^{(K)}_{JJ} - \Gamma^{(K)}_{JJ} \right| \lesssim n^{-1/2} \log^{3/2}(n)$$
with probability at least $1 - O(n^{-c_3})$ on $\mathcal{E}_n$. Since $\min_J D^{(K)}_{JJ} \geq c_0 > 0$ with probability tending to one, the map $x \mapsto x^{1/2}$ is Lipschitz on a neighborhood of $\{ \Gamma^{(K)}_{JJ} \}_J$, and therefore
$$\max_{J} \left| \widehat{D}^{(K)}_{JJ} - D^{(K)}_{JJ} \right| \lesssim n^{-1/2} \log^{3/2}(n)$$
with probability at least $1 - O(n^{-c_4})$ on $\mathcal{E}_n$. Consequently,
$$\max_{J} \left| ( \widehat{D}^{(K)} )^{-1}_{JJ} - ( D^{(K)} )^{-1}_{JJ} \right| \lesssim n^{-1/2} \log^{3/2}(n)$$
with probability at least $1 - O(n^{-c_4})$ on $\mathcal{E}_n$.
Finally, write
$$\widehat{H}^{(K)} - H^{(K)} = ( \widehat{D}^{(K)} )^{-1} \widehat{\Gamma}^{(K)} ( \widehat{D}^{(K)} )^{-1} - ( D^{(K)} )^{-1} \Gamma^{(K)} ( D^{(K)} )^{-1} = \left\{ ( \widehat{D}^{(K)} )^{-1} - ( D^{(K)} )^{-1} \right\} \widehat{\Gamma}^{(K)} ( \widehat{D}^{(K)} )^{-1} + ( D^{(K)} )^{-1} \left\{ \widehat{\Gamma}^{(K)} - \Gamma^{(K)} \right\} ( \widehat{D}^{(K)} )^{-1} + ( D^{(K)} )^{-1} \Gamma^{(K)} \left\{ ( \widehat{D}^{(K)} )^{-1} - ( D^{(K)} )^{-1} \right\}.$$
Because $K_{\max} - 1$ is finite or polynomially growing, and all diagonal entries of $D^{(K)}$ are bounded away from zero with probability tending to one, each term on the right-hand side is of order $O_P( n^{-1/2} \log^{3/2}(n) )$ in the elementwise maximum norm. Therefore,
$$\Pr\left( \max_{J,J'} \left| \left( \widehat{H}^{(K)} - H^{(K)} \right)_{J,J'} \right| \gtrsim n^{-1/2} \log^{3/2}(n) \right) \lesssim \Pr(\mathcal{E}_n^c) + n^{-c_5} \lesssim n^{-c}$$
for some positive constant $c$. This completes the proof.

S.1.2 Proof of Theorem 3.1

We ignore the Monte Carlo variability from bootstrap resampling and regard $\hat{p}_K$ as the limiting bootstrap $p$-value when the bootstrap sample size $B \to \infty$. Let $\gamma_n = n^{-1/2} \log^{3/2}(n)$ and $\eta_n = n^{-1/2} \log(n)$. Define the event
$$\mathcal{F}_n = \left\{ \| \widehat{H}^{(K)} - H^{(K)} \|_\infty \leq C_1 \gamma_n, \;\; \max_{J \in \mathcal{M} \setminus \{K\}} \left| \frac{ \hat{\delta}_{K,J} - \delta_{K,J} }{ D^{(K)}_{JJ} } \right| \leq C_1 \eta_n \right\}.$$
By Lemmas S.3 and S.4, there exists a constant $c_1 > 0$ such that $\Pr(\mathcal{F}_n^c) \lesssim n^{-c_1}$. Moreover, by the diagonal part of Lemma S.4 and the assumption $\min_{J \in \mathcal{M} \setminus \{K\}} D^{(K)}_{JJ} \geq c_0 > 0$ with probability tending to one, we have on $\mathcal{F}_n$,
$$\max_{J \in \mathcal{M} \setminus \{K\}} \left| \frac{ \widehat{D}^{(K)}_{JJ} }{ D^{(K)}_{JJ} } - 1 \right| \lesssim \gamma_n, \qquad \max_{J \in \mathcal{M} \setminus \{K\}} \left| \frac{ D^{(K)}_{JJ} }{ \widehat{D}^{(K)}_{JJ} } - 1 \right| \lesssim \gamma_n.$$
Let $G^{(K)} = \left( \frac{1}{n} \sum_{i=1}^{n} \mathrm{Cov}( \xi^{(i)}_{K,J}, \xi^{(i)}_{K,J'} \mid \mathcal{F}^O ) \right)_{J,J' \in \mathcal{M} \setminus \{K\}}$ denote the conditional covariance matrix, and define its normalized counterpart $F^{(K)} = ( D^{(K)} )^{-1} G^{(K)} ( D^{(K)} )^{-1}$. Let $T^*(A)$ denote a centered Gaussian vector with covariance matrix $A$, and let $z_{\max}(\alpha, A)$ be the upper $\alpha$-quantile of $\| T^*(A) \|_\infty$. We first prove part (1).
On $\mathcal{F}_n$,
$$\sqrt{n} \max_{J \in \mathcal{M} \setminus \{K\}} \frac{ \hat{\delta}_{K,J} }{ \widehat{D}^{(K)}_{JJ} } \leq \sqrt{n} \max_{J \in \mathcal{M} \setminus \{K\}} \left( \left| \frac{ \hat{\delta}_{K,J} - \delta_{K,J} }{ D^{(K)}_{JJ} } \right| \frac{ D^{(K)}_{JJ} }{ \widehat{D}^{(K)}_{JJ} } \right) + \sqrt{n} \max_{J \in \mathcal{M} \setminus \{K\}} \frac{ \delta_{K,J} }{ \widehat{D}^{(K)}_{JJ} } \leq \sqrt{n} \max_{J \in \mathcal{M} \setminus \{K\}} \left| \frac{ \hat{\delta}_{K,J} - \delta_{K,J} }{ D^{(K)}_{JJ} } \right| + C_2 \sqrt{n} \, \eta_n \gamma_n + \sqrt{n} \max_{J \in \mathcal{M} \setminus \{K\}} \frac{ \delta_{K,J} }{ D^{(K)}_{JJ} } ( 1 + C_2 \gamma_n )$$
for some constant $C_2 > 0$. Under the approximate null assumption,
$$\max_{J \in \mathcal{M} \setminus \{K\}} \frac{ \delta_{K,J} }{ D^{(K)}_{JJ} } \leq x_n ( n \log n )^{-1/2},$$
where $x_n = o(1)$, and therefore
$$\sqrt{n} \max_{J \in \mathcal{M} \setminus \{K\}} \frac{ \delta_{K,J} }{ D^{(K)}_{JJ} } = o\left( (\log n)^{-1/2} \right) = o(1).$$
Also,
$$\sqrt{n} \, \eta_n \gamma_n = \sqrt{n} \cdot n^{-1/2} \log(n) \cdot n^{-1/2} \log^{3/2}(n) = n^{-1/2} \log^{5/2}(n) = o(1).$$
Hence the previous display implies
$$\sqrt{n} \max_{J \in \mathcal{M} \setminus \{K\}} \frac{ \hat{\delta}_{K,J} }{ \widehat{D}^{(K)}_{JJ} } \leq \sqrt{n} \max_{J \in \mathcal{M} \setminus \{K\}} \left| \frac{ \hat{\delta}_{K,J} - \delta_{K,J} }{ D^{(K)}_{JJ} } \right| + o(1)$$
uniformly on $\mathcal{F}_n$. Now define the centered normalized sum vector
$$S_K = n^{-1/2} \sum_{i=1}^{n} \left( \frac{ \xi^{(i)}_{K,J} - \mathrm{E}( \xi^{(i)}_{K,J} \mid \mathcal{F}^O ) }{ D^{(K)}_{JJ} } \right)_{J \in \mathcal{M} \setminus \{K\}}.$$
Conditional on $\mathcal{F}^O$, Lemma S.2 provides the required tail control, and a Gaussian approximation for maxima of sums of independent random vectors yields
$$\sup_{t \in \mathbb{R}} \left| \Pr\left( \| S_K \|_\infty \leq t \mid \mathcal{F}^O \right) - \Pr\left( \| T^*( F^{(K)} ) \|_\infty \leq t \mid \mathcal{F}^O \right) \right| = o(1).$$
Consequently, for some sequence $\varpi_n \to 0$,
$$\Pr\left( \sqrt{n} \max_{J \in \mathcal{M} \setminus \{K\}} \left| \frac{ \hat{\delta}_{K,J} - \delta_{K,J} }{ D^{(K)}_{JJ} } \right| \geq z_{\max}( \alpha + \varpi_n, F^{(K)} ) \right) \leq \alpha + \varpi_n + o(1).$$
Next, note that $\Gamma^{(K)}$ and $G^{(K)}$ differ only by the outer product of the conditional means, so
$$H^{(K)} - F^{(K)} = \mu^{(K)} ( \mu^{(K)} )^\top, \quad \text{where} \quad \mu^{(K)} = \left( \frac{ \delta_{K,J} }{ D^{(K)}_{JJ} } \right)_{J \in \mathcal{M} \setminus \{K\}}.$$
Under the approximate null assumption, $\| \mu^{(K)} \|_\infty \leq x_n ( n \log n )^{-1/2} = o(1)$. Therefore, $\| H^{(K)} - F^{(K)} \|_\infty = o(1)$. By the continuity of the Gaussian max-quantiles with respect to the covariance matrix entries, this implies
$$z_{\max}( \alpha + \varpi_n, H^{(K)} ) \geq z_{\max}( \alpha + \varpi_n, F^{(K)} ) - o(1).$$
Furthermore, on $\mathcal{F}_n$, we have $\| \widehat{H}^{(K)} - H^{(K)} \|_\infty \leq C_1 \gamma_n = o(1)$, and therefore
$$z_{\max}( \alpha, \widehat{H}^{(K)} ) \geq z_{\max}( \alpha + \varpi_n, H^{(K)} ) - o(1)$$
for a possibly different sequence $\varpi_n \to 0$.
Combining the previous displays, we obtain
$$\Pr\{ \hat{p}_K \leq \alpha \} = \Pr\left( \sqrt{n} \max_{J \in \mathcal{M} \setminus \{K\}} \frac{ \hat{\delta}_{K,J} }{ \widehat{D}^{(K)}_{JJ} } \geq z_{\max}( \alpha, \widehat{H}^{(K)} ) \right) \leq \alpha + o(1).$$
Equivalently,
$$\Pr\{ H_{0,K} \text{ is not rejected at level } \alpha \} \geq 1 - \alpha + o(1).$$
We now prove part (2). Suppose there exists some $J^* \in \mathcal{M} \setminus \{K\}$ such that
$$\frac{ \delta_{K,J^*} }{ D^{(K)}_{J^*J^*} } \geq c \, n^{-1/2} \log(n)$$
for a sufficiently large constant $c > 0$. On $\mathcal{F}_n$,
$$\frac{ \hat{\delta}_{K,J^*} }{ \widehat{D}^{(K)}_{J^*J^*} } = \left( \frac{ \delta_{K,J^*} }{ D^{(K)}_{J^*J^*} } + \frac{ \hat{\delta}_{K,J^*} - \delta_{K,J^*} }{ D^{(K)}_{J^*J^*} } \right) \frac{ D^{(K)}_{J^*J^*} }{ \widehat{D}^{(K)}_{J^*J^*} }.$$
Hence, on $\mathcal{F}_n$,
$$\frac{ \hat{\delta}_{K,J^*} }{ \widehat{D}^{(K)}_{J^*J^*} } \geq \left( c \, n^{-1/2} \log(n) - C_1 n^{-1/2} \log(n) \right) ( 1 - C_2 \gamma_n ).$$
By choosing $c$ sufficiently large, we obtain for all sufficiently large $n$,
$$\frac{ \hat{\delta}_{K,J^*} }{ \widehat{D}^{(K)}_{J^*J^*} } \geq c_2 \, n^{-1/2} \log(n)$$
for some constant $c_2 > 0$. Therefore,
$$\sqrt{n} \max_{J \in \mathcal{M} \setminus \{K\}} \frac{ \hat{\delta}_{K,J} }{ \widehat{D}^{(K)}_{JJ} } \geq c_2 \log(n)$$
on $\mathcal{F}_n$. On the other hand, since $\alpha \geq n^{-1}$ and $K_{\max} \asymp \log(n)$, a union bound together with Mills' inequality implies
$$z_{\max}( \alpha, \widehat{H}^{(K)} ) \lesssim \sqrt{ \log( K_{\max} ) + \log( \alpha^{-1} ) } \lesssim \sqrt{\log(n)}$$
with probability tending to one. Consequently, on $\mathcal{F}_n$ and for all sufficiently large $n$,
$$\sqrt{n} \max_{J \in \mathcal{M} \setminus \{K\}} \frac{ \hat{\delta}_{K,J} }{ \widehat{D}^{(K)}_{JJ} } > z_{\max}( \alpha, \widehat{H}^{(K)} ).$$
Thus,
$$\Pr\{ \hat{p}_K \leq \alpha \} \geq \Pr( \mathcal{F}_n ) - o(1) = 1 - o(1),$$
which is equivalent to
$$\Pr\{ H_{0,K} \text{ is not rejected at level } \alpha \} = o(1).$$
This completes the proof.

S.2 Proofs for Theorems in Section 3.2

Lemma S.1. Suppose that Conditions 3.1–3.5 hold. Then,
(i) $\max_{K \in \mathcal{M}} \mathrm{E}\{ \mathcal{S}_{\epsilon^E}( \mathcal{T}_K ) - \mathcal{S}_{\epsilon^E}( \mathcal{T}_K \cup \mathcal{T}^* ) \} = o( M_1 \log\log \bar{\lambda} )$.
(ii) $\max_{K \in \mathcal{M}} \mathrm{E}\{ \mathcal{S}_{\epsilon^E}( \mathcal{T}^* ) - \mathcal{S}_{\epsilon^E}( \mathcal{T}_K \cup \mathcal{T}^* ) \} = o( M_1 \log\log \bar{\lambda} )$.
Proof. For a fixed $K$, let $m_k$ be the number of true change-points strictly between $\tau^K_k$ and $\tau^K_{k+1}$. We denote the merged partition points within this interval as $\tau^K_{k,0} := \tau^K_k < \tau^K_{k,1} < \dots < \tau^K_{k,m_k} < \tau^K_{k,m_k+1} := \tau^K_{k+1}$. Given $Z^O$, $\mathcal{T}_K$ is a fixed change-point set.
According to Lemma 14 in Pein and Shah (2025), we have
$$\mathrm{E}\{ \mathcal{S}_{\epsilon^E}( \mathcal{T}_K ) - \mathcal{S}_{\epsilon^E}( \mathcal{T}_K \cup \mathcal{T}^* ) \} \leq 2 \sum_{k=0}^{K} \sum_{l=0}^{m_k} \left( \tau^K_{k,l+1} - \tau^K_{k,l} \right) \mathrm{E}\left( \bar{\epsilon}_{\tau^K_{k,l} : \tau^K_{k,l+1}} \right)^2.$$
Based on the independence of the errors $\epsilon_i$, for each term we have
$$\left( \tau^K_{k,l+1} - \tau^K_{k,l} \right) \mathrm{E}\left( \bar{\epsilon}_{\tau^K_{k,l} : \tau^K_{k,l+1}} \right)^2 = \left( \tau^K_{k,l+1} - \tau^K_{k,l} \right)^{-1} \sum_{i = \tau^K_{k,l}+1}^{\tau^K_{k,l+1}} \mathrm{E}[ \epsilon_i^2 ] \leq M_1.$$
Note that the total number of segments in the merged partition $\mathcal{T}_K \cup \mathcal{T}^*$ is exactly $\sum_{k=0}^{K} ( m_k + 1 ) \leq K + K^* + 1 \leq 2 K_{\max} + 1$. Hence, we have
$$\mathrm{E}\{ \mathcal{S}_{\epsilon^E}( \mathcal{T}_K ) - \mathcal{S}_{\epsilon^E}( \mathcal{T}_K \cup \mathcal{T}^* ) \} \leq 2 M_1 ( 2 K_{\max} + 1 ) \lesssim M_1 K_{\max}.$$
Therefore,
$$\max_{K \in \mathcal{M}} \mathrm{E}\{ \mathcal{S}_{\epsilon^E}( \mathcal{T}_K ) - \mathcal{S}_{\epsilon^E}( \mathcal{T}_K \cup \mathcal{T}^* ) \} \lesssim M_1 K_{\max}.$$
By assuming $K_{\max} = o( \log\log \bar{\lambda} )$ in Condition 3.3(iii), we obtain $M_1 K_{\max} = o( M_1 \log\log \bar{\lambda} )$, which implies statement (i). The exact same deduction applies to the true change-point set $\mathcal{T}^*$, yielding statement (ii), which we omit here.

Let $\mathcal{T}^*$ be the true change positions, and let $\mathcal{T}_{K^*}$ be the estimated change positions when the number of change-points is correctly specified.

Lemma S.2. Suppose that Conditions 3.1–3.5 hold. Then,
(i) $\min_{K \in \mathcal{M}_l} \mathrm{E}\{ \mathcal{S}_{s^E}( \mathcal{T}_K ) - \mathcal{S}_{s^E}( \mathcal{T}^* ) \} \geq \lambda M_3 \min_{K \in \mathcal{M}_l} \sum_{\tau^*_k \in \mathcal{I}^*_{lK}} \Delta_k^2$.
(ii) $\max_{K \in \mathcal{M}} \mathrm{E}\{ \mathcal{S}_{s^E}( \mathcal{T}^* ) - \mathcal{S}_{s^E}( \mathcal{T}_K \cup \mathcal{T}^* ) \} = o( M_1 \log\log \bar{\lambda} )$.
(iii) $\mathrm{E}\{ \mathcal{S}_{s^E}( \mathcal{T}_{K^*} ) - \mathcal{S}_{s^E}( \mathcal{T}^* ) \} = o( M_1 \log\log \bar{\lambda} )$.
Proof. The proof of this lemma is inspired by Lemma 17 in Pein and Shah (2025). For statement (ii), we observe that $\mathcal{S}_{s^E}( \mathcal{T}^* ) - \mathcal{S}_{s^E}( \mathcal{T}_K \cup \mathcal{T}^* ) = \mathcal{S}_{\epsilon^E}( \mathcal{T}^* ) - \mathcal{S}_{\epsilon^E}( \mathcal{T}_K \cup \mathcal{T}^* )$, since the true signal $\mu_i$ is constant on both partitions and cancels out. Hence, statement (ii) follows directly from Lemma S.1(ii). Now, we show (iii).
By definition,
$$\mathcal{S}_{s^E}( \mathcal{T}_{K^*} ) = \sum_{k=0}^{K^*} \sum_{i = \tau^{K^*}_k + 1}^{\tau^{K^*}_{k+1}} \left( s^E_i - \bar{s}^E_{\tau^{K^*}_k : \tau^{K^*}_{k+1}} \right)^2 = \sum_{k=0}^{K^*} \sum_{i = \tau^{K^*}_k + 1}^{\tau^{K^*}_{k+1}} \left( \mu_i + \epsilon^E_i - \bar{\mu}_{\tau^{K^*}_k : \tau^{K^*}_{k+1}} - \bar{\epsilon}^E_{\tau^{K^*}_k : \tau^{K^*}_{k+1}} \right)^2 = \sum_{k=0}^{K^*} \sum_{i = \tau^{K^*}_k + 1}^{\tau^{K^*}_{k+1}} \left( \mu_i - \bar{\mu}_{\tau^{K^*}_k : \tau^{K^*}_{k+1}} \right)^2 + 2 \sum_{k=0}^{K^*} \sum_{i = \tau^{K^*}_k + 1}^{\tau^{K^*}_{k+1}} \epsilon^E_i \left( \mu_i - \bar{\mu}_{\tau^{K^*}_k : \tau^{K^*}_{k+1}} \right) + \sum_{i=1}^{n} ( \epsilon^E_i )^2 - \sum_{k=0}^{K^*} \left( \tau^{K^*}_{k+1} - \tau^{K^*}_k \right) \left( \bar{\epsilon}^E_{\tau^{K^*}_k : \tau^{K^*}_{k+1}} \right)^2. \tag{S.1}$$
Similarly, for $\mathcal{S}_{s^E}( \mathcal{T}^* )$,
$$\mathcal{S}_{s^E}( \mathcal{T}^* ) = \sum_{k=0}^{K^*} \sum_{i = \tau^*_k + 1}^{\tau^*_{k+1}} \left( s^E_i - \bar{s}^E_{\tau^*_k : \tau^*_{k+1}} \right)^2 = \sum_{k=0}^{K^*} \sum_{i = \tau^*_k + 1}^{\tau^*_{k+1}} \left( \epsilon^E_i - \bar{\epsilon}^E_{\tau^*_k : \tau^*_{k+1}} \right)^2 = \sum_{i=1}^{n} ( \epsilon^E_i )^2 - \sum_{k=0}^{K^*} \left( \tau^*_{k+1} - \tau^*_k \right) \left( \bar{\epsilon}^E_{\tau^*_k : \tau^*_{k+1}} \right)^2.$$
The expectation of their difference is
$$\mathrm{E}\{ \mathcal{S}_{s^E}( \mathcal{T}_{K^*} ) - \mathcal{S}_{s^E}( \mathcal{T}^* ) \} = \sum_{k=0}^{K^*} \sum_{i = \tau^{K^*}_k + 1}^{\tau^{K^*}_{k+1}} \left( \mu_i - \bar{\mu}_{\tau^{K^*}_k : \tau^{K^*}_{k+1}} \right)^2 - \sum_{k=0}^{K^*} \left( \tau^{K^*}_{k+1} - \tau^{K^*}_k \right) \mathrm{E}\left( \bar{\epsilon}^E_{\tau^{K^*}_k : \tau^{K^*}_{k+1}} \right)^2 + \sum_{k=0}^{K^*} \left( \tau^*_{k+1} - \tau^*_k \right) \mathrm{E}\left( \bar{\epsilon}^E_{\tau^*_k : \tau^*_{k+1}} \right)^2 := A_1 - A_2 + A_3.$$
For $A_1$, according to the deduction in Lemma 17 in Pein and Shah (2025), the first term is $A_1 = O\left( \sum_{k=1}^{K^*} b_n \Delta_k^2 \right)$. For $A_2$, we notice that by Condition 3.1,
$$A_2 = \sum_{k=0}^{K^*} \left( \tau^{K^*}_{k+1} - \tau^{K^*}_k \right)^{-1} \sum_{i = \tau^{K^*}_k + 1}^{\tau^{K^*}_{k+1}} \mathrm{E}( \epsilon_i^2 ) \lesssim M_1 K^*.$$
The exact same reasoning gives $A_3 \lesssim M_1 K^*$. Hence,
$$\mathrm{E}\{ \mathcal{S}_{s^E}( \mathcal{T}_{K^*} ) - \mathcal{S}_{s^E}( \mathcal{T}^* ) \} \lesssim \sum_{k=1}^{K^*} b_n \Delta_k^2 + M_1 K^*.$$
Statement (iii) then follows from Condition 3.4(i), which guarantees $\sum_{k=1}^{K^*} b_n \Delta_k^2 = o( M_1 \log\log \bar{\lambda} )$, and Condition 3.3(i), which ensures $K^* = o( \log\log \bar{\lambda} )$. We now show statement (i). Based on a deduction similar to (S.1), we can decompose the difference directly as:
$$\mathrm{E}\{ \mathcal{S}_{s^E}( \mathcal{T}_K ) - \mathcal{S}_{s^E}( \mathcal{T}^* ) \} = \sum_{k=0}^{K} \sum_{i = \tau^K_k + 1}^{\tau^K_{k+1}} \left( \mu_i - \bar{\mu}_{\tau^K_k : \tau^K_{k+1}} \right)^2 + \mathrm{E}\{ \mathcal{S}_{\epsilon^E}( \mathcal{T}_K ) - \mathcal{S}_{\epsilon^E}( \mathcal{T}^* ) \}.$$
For the noise component, Lemma S.1 yields
$$\max_{K \in \mathcal{M}_l} \left| \mathrm{E}\{ \mathcal{S}_{\epsilon^E}( \mathcal{T}_K ) - \mathcal{S}_{\epsilon^E}( \mathcal{T}^* ) \} \right| \leq \max_{K \in \mathcal{M}_l} \mathrm{E}\{ \mathcal{S}_{\epsilon^E}( \mathcal{T}_K ) - \mathcal{S}_{\epsilon^E}( \mathcal{T}_K \cup \mathcal{T}^* ) \} + \max_{K \in \mathcal{M}_l} \mathrm{E}\{ \mathcal{S}_{\epsilon^E}( \mathcal{T}^* ) - \mathcal{S}_{\epsilon^E}( \mathcal{T}_K \cup \mathcal{T}^* ) \} = o( M_1 \log\log \bar{\lambda} ). \tag{S.2}$$
For the signal component, since $K \in \mathcal{M}_l$ is a lack-of-fit model, Condition 3.4(ii) guarantees that there exists a subset of change-points $\tau^*_k \in \mathcal{I}^*_{lK}$ such that no estimated change-point lies within $[ \tau^*_k - \lambda/4, \, \tau^*_k + \lambda/4 ]$. Hence, using Condition 3.1,
$$\min_{K \in \mathcal{M}_l} \sum_{k=0}^{K} \sum_{i = \tau^K_k + 1}^{\tau^K_{k+1}} \left( \mu_i - \bar{\mu}_{\tau^K_k : \tau^K_{k+1}} \right)^2 \geq \min_{K \in \mathcal{M}_l} \sum_{\tau^*_k \in \mathcal{I}^*_{lK}} \sum_{i = \tau^*_k - \lambda/4 + 1}^{\tau^*_k + \lambda/4} \left( \mu_i - \bar{\mu}_{\tau^K_k : \tau^K_{k+1}} \right)^2 \geq \min_{K \in \mathcal{M}_l} M_3 \lambda \sum_{\tau^*_k \in \mathcal{I}^*_{lK}} \Delta_k^2.$$
Summarizing the above results, we have
$$\min_{K \in \mathcal{M}_l} \mathrm{E}\{ \mathcal{S}_{s^E}( \mathcal{T}_K ) - \mathcal{S}_{s^E}( \mathcal{T}^* ) \} \geq \min_{K \in \mathcal{M}_l} M_3 \lambda \sum_{\tau^*_k \in \mathcal{I}^*_{lK}} \Delta_k^2 - o( M_1 \log\log \bar{\lambda} ).$$
By Condition 3.5, the right-hand side is strictly dominated by the positive first term. Hence, statement (i) follows.

S.2.1 Proof of Theorem 3.2

We now turn to the main theorem. Our strategy is to apply Theorem 3.1, which requires showing that
$$\max_{K \in \mathcal{M} \setminus \{K^*\}} \frac{ \delta_{K^*,K} }{ \sigma_{K^*,K} } \leq x_n \sqrt{ \frac{1}{n \log(n)} }$$
for sufficiently large $n$. Recall that $\sigma^2_{K^*,K}$ can be lower bounded as follows:
$$\sigma^2_{K^*,K} \geq \frac{1}{n} \sum_{i=1}^{n} \mathrm{Var}\left( \| s^E_i - \bar{s}^O_{K^*,i} \|_2^2 - \| s^E_i - \bar{s}^O_{K,i} \|_2^2 \right) = \frac{1}{n} \sum_{i=1}^{n} \mathrm{Var}\left( \| \bar{s}^O_{K^*,i} \|_2^2 - \| \bar{s}^O_{K,i} \|_2^2 - 2 (s^E_i)^\top ( \bar{s}^O_{K^*,i} - \bar{s}^O_{K,i} ) \right) = \frac{4}{n} \sum_{i=1}^{n} ( \bar{s}^O_{K^*,i} - \bar{s}^O_{K,i} )^\top \mathrm{Var}( s^E_i ) \, ( \bar{s}^O_{K^*,i} - \bar{s}^O_{K,i} ).$$
Hence, for any $K \neq K^*$, we have $\sigma^2_{K^*,K} \geq \frac{4 M_2}{n} \sum_{i=1}^{n} \| \bar{s}^O_{K^*,i} - \bar{s}^O_{K,i} \|_2^2$. By the assumptions stated in the theorem, $\min_{K \neq K^*} \max_{i = 1, \dots, n} \| \bar{s}^O_{K^*,i} - \bar{s}^O_{K,i} \|_2^2 \gtrsim \Delta^2_{(K^*)}$ with asymptotic probability 1.
Furthermore, because the estimated change-point sets are nested and the distance between adjoining change-points is at least $c n / \log(n)$, this maximum discrepancy is attained on at least order $n/\log(n)$ indices; summing over these indices and dividing by the total $n$ yields a factor of $1/\log(n)$. Thus,
$$\sigma^2_{K^*,K} \gtrsim M_2 \log^{-1}(n) \, \Delta^2_{(K^*)},$$
which implies
$$\sigma_{K^*,K} \gtrsim \sqrt{M_2} \, \Delta_{(K^*)} \frac{1}{\sqrt{\log(n)}}$$
for sufficiently large $n$. Because $\delta_{K^*,K} = - \delta_{K,K^*}$, obtaining an upper bound for the ratio $\delta_{K^*,K} / \sigma_{K^*,K}$ is equivalent to bounding $\delta_{K,K^*}$ from below. Therefore, using our bound on $\sigma_{K^*,K}$, it suffices to show that
$$\min_{K \in \mathcal{M} \setminus \{K^*\}} \delta_{K,K^*} \gtrsim - \sqrt{M_2} \, x_n \Delta_{(K^*)} \frac{1}{\sqrt{n \log^2(n)}} \tag{S.3}$$
with asymptotic probability 1. Let $\mathcal{S}_{x,y}( \mathcal{T}_K ) = \sum_{k=0}^{K} \sum_{i = \tau_k + 1}^{\tau_{k+1}} ( x_i - \bar{x}_{\tau_k, \tau_{k+1}} )( y_i - \bar{y}_{\tau_k, \tau_{k+1}} )$. Taking the expectation $\mathrm{E}^E[\cdot]$ with respect to the even sample (conditional on the odd sample, so that $\mathcal{T}_K$ is fixed), algebraic expansion yields:
$$n \delta_{K,K^*} = \mathrm{E}^E\left[ \mathcal{S}_{s^E}( \mathcal{T}_K ) - \mathcal{S}_{s^E}( \mathcal{T}_{K^*} ) \right] - \left\{ \mathcal{S}_{\epsilon^O}( \mathcal{T}_K ) - \mathcal{S}_{\epsilon^O}( \mathcal{T}_{K^*} ) \right\} - \mathrm{E}^E\left[ \mathcal{S}_{\epsilon^E}( \mathcal{T}_K ) - \mathcal{S}_{\epsilon^E}( \mathcal{T}_{K^*} ) \right] + 2 \, \mathrm{E}^E\left[ \mathcal{S}_{\epsilon^O, \epsilon^E}( \mathcal{T}_K ) - \mathcal{S}_{\epsilon^O, \epsilon^E}( \mathcal{T}_{K^*} ) \right] := A_K - B_K - C_K + 2 D_K.$$
In the following, we bound $A_K$, $B_K$, $C_K$, and $D_K$ so that (S.3) holds. We consider two cases: lack-of-fit $K \in \mathcal{M}_l$ (controlled primarily via the minimum jump $\Delta_{(1)}$) and over-fit $K \in \mathcal{M}_o$ (controlled via the maximum jump $\Delta_{(K^*)}$).

Step 1 (Lack-of-fit scenario). First, we focus on $K \in \mathcal{M}_l$. For the signal term $A_K$, we decompose it as:
$$\min_{K \in \mathcal{M}_l} A_K = \min_{K \in \mathcal{M}_l} \mathrm{E}^E\{ \mathcal{S}_{s^E}( \mathcal{T}_K ) - \mathcal{S}_{s^E}( \mathcal{T}^* ) \} - \mathrm{E}^E\{ \mathcal{S}_{s^E}( \mathcal{T}_{K^*} ) - \mathcal{S}_{s^E}( \mathcal{T}^* ) \}.$$
Based on Lemma S.2(i) and Condition 3.5 (which ensures sufficient signal strength via the minimum jump size $\Delta_{(1)}$),
$$\min_{K \in \mathcal{M}_l} \mathrm{E}^E\{ \mathcal{S}_{s^E}( \mathcal{T}_K ) - \mathcal{S}_{s^E}( \mathcal{T}^* ) \} \geq \lambda M_3 \min_{K \in \mathcal{M}_l} \sum_{k \in \mathcal{I}^*_{lK}} \Delta_k^2.$$
The second term is bounded by $o( M_1 \log\log \bar{\lambda} )$ according to Lemma S.2(iii).
Hence,
$$\min_{K \in \mathcal{M}_l} A_K \geq \lambda M_3 \min_{K \in \mathcal{M}_l} \sum_{k \in \mathcal{I}^*_{lK}} \Delta_k^2 - o( M_1 \log\log \bar{\lambda} ).$$
To lower bound $-B_K$, we bound the maximum absolute deviation of $B_K$ using the triangle inequality over the merged partition $\mathcal{T}_K \cup \mathcal{T}^*$:
$$\max_{K \in \mathcal{M}_l} | B_K | \leq \max_{K \in \mathcal{M}_l} \left| \mathcal{S}_{\epsilon^O}( \mathcal{T}_K ) - \mathcal{S}_{\epsilon^O}( \mathcal{T}_K \cup \mathcal{T}^* ) \right| + \max_{K \in \mathcal{M}_l} \left| \mathcal{S}_{\epsilon^O}( \mathcal{T}^* ) - \mathcal{S}_{\epsilon^O}( \mathcal{T}_K \cup \mathcal{T}^* ) \right| + \left| \mathcal{S}_{\epsilon^O}( \mathcal{T}^* ) - \mathcal{S}_{\epsilon^O}( \mathcal{T}_{K^*} \cup \mathcal{T}^* ) \right| + \left| \mathcal{S}_{\epsilon^O}( \mathcal{T}_{K^*} ) - \mathcal{S}_{\epsilon^O}( \mathcal{T}_{K^*} \cup \mathcal{T}^* ) \right|.$$
It follows from Lemma 18 in Pein and Shah (2025) that $\max_{K \in \mathcal{M}_l} | B_K | = O_P( M_1 K^* ( \log \bar{\lambda} )^2 )$. For the third term, Lemma S.2 yields $\max_{K \in \mathcal{M}_l} | C_K | = o( M_1 \log\log \bar{\lambda} )$. For the cross term $D_K$, applying a similar triangle-inequality expansion and Lemma 15 in Pein and Shah (2025), we have $\max_{K \in \mathcal{M}_l} | D_K | \leq o( M_1 \log\log \bar{\lambda} ) + O_P( K^* M_1 ( \log \bar{\lambda} )^2 )$. Noting that $M_1 \log\log \bar{\lambda} \lesssim M_1 K^* ( \log \bar{\lambda} )^2$, we combine the above bounds to obtain:
$$n \delta_{K,K^*} \gtrsim \lambda M_3 \min_{K \in \mathcal{M}_l} \sum_{k \in \mathcal{I}^*_{lK}} \Delta_k^2 - M_1 K^* ( \log \bar{\lambda} )^2 \tag{S.4}$$
with asymptotic probability 1 as $n \to \infty$. According to Condition 3.5, the signal term strictly dominates the noise term, guaranteeing that $n \delta_{K,K^*} > 0$. Because $\delta_{K,K^*} > 0$ implies $\delta_{K^*,K} < 0$, the ratio $\delta_{K^*,K} / \sigma_{K^*,K}$ is strictly negative. Thus, under-fit models trivially satisfy the approximate-null upper-bound requirement of Theorem 3.1.

Step 2 (Over-fit scenario). Next, we consider $K \in \mathcal{M}_o$. For the signal term $A_K$,
$$\min_{K \in \mathcal{M}_o} A_K = \min_{K \in \mathcal{M}_o} \mathrm{E}^E\{ \mathcal{S}_{s^E}( \mathcal{T}_K ) - \mathcal{S}_{s^E}( \mathcal{T}_{K^*} ) \} \geq \min_{K \in \mathcal{M}_o} \mathrm{E}^E\{ \mathcal{S}_{s^E}( \mathcal{T}_K ) - \mathcal{S}_{s^E}( \mathcal{T}_K \cup \mathcal{T}^* ) \} - \max_{K \in \mathcal{M}_o} \mathrm{E}^E\{ \mathcal{S}_{s^E}( \mathcal{T}^* ) - \mathcal{S}_{s^E}( \mathcal{T}_K \cup \mathcal{T}^* ) \} - \mathrm{E}^E\{ \mathcal{S}_{s^E}( \mathcal{T}_{K^*} ) - \mathcal{S}_{s^E}( \mathcal{T}^* ) \}.$$
Based on Lemma S.2(ii) and (iii), the last two terms are both $o( M_1 \log\log \bar{\lambda} )$. Hence,
$$\min_{K \in \mathcal{M}_o} A_K \geq \min_{K \in \mathcal{M}_o} \mathrm{E}^E\{ \mathcal{S}_{s^E}( \mathcal{T}_K ) - \mathcal{S}_{s^E}( \mathcal{T}_K \cup \mathcal{T}^* ) \} - o( M_1 \log\log \bar{\lambda} ).$$
For the second term $B_K$, applying the decomposition through $\mathcal{T}_K \cup \mathcal{T}^*$ and using Lemma 18 in Pein and Shah (2025), we obtain:
$$\min_{K \in \mathcal{M}_o} B_K \geq \min_{K \in \mathcal{M}_o} \left\{ \mathcal{S}_{\epsilon^O}( \mathcal{T}^* ) - \mathcal{S}_{\epsilon^O}( \mathcal{T}_K \cup \mathcal{T}^* ) \right\} - o_P( M_1 \log\log \bar{\lambda} ).$$
For the third term $C_K$, a similar decomposition using Lemma S.1 gives $\max_{K \in \mathcal{M}_o} | C_K | = O( M_1 \log\log \bar{\lambda} )$. For the last term $D_K$, according to Lemma 19 in Pein and Shah (2025), we have
$$\min_{K \in \mathcal{M}_o} D_K \geq - o\left( \min_{K \in \mathcal{M}_o} \left\{ \mathcal{S}_{\epsilon^O}( \mathcal{T}^* ) - \mathcal{S}_{\epsilon^O}( \mathcal{T}_K \cup \mathcal{T}^* ) \right\} \right) - o_P( M_1 \log\log \bar{\lambda} ).$$
Combining these elements, for sufficiently large $n$ we have:
$$\min_{K \in \mathcal{M}_o} n \delta_{K,K^*} \gtrsim \min_{K \in \mathcal{M}_o} \mathrm{E}^E\{ \mathcal{S}_{s^E}( \mathcal{T}_K ) - \mathcal{S}_{s^E}( \mathcal{T}_K \cup \mathcal{T}^* ) \} + \min_{K \in \mathcal{M}_o} \left\{ \mathcal{S}_{\epsilon^O}( \mathcal{T}^* ) - \mathcal{S}_{\epsilon^O}( \mathcal{T}_K \cup \mathcal{T}^* ) \right\} - M_1 \log\log \bar{\lambda} \gtrsim - M_1 \log\log \bar{\lambda}. \tag{S.5}$$
The final inequality holds with asymptotic probability 1 because the terms $\mathrm{E}^E\{ \mathcal{S}_{s^E}( \mathcal{T}_K ) - \mathcal{S}_{s^E}( \mathcal{T}_K \cup \mathcal{T}^* ) \}$ and $\{ \mathcal{S}_{\epsilon^O}( \mathcal{T}^* ) - \mathcal{S}_{\epsilon^O}( \mathcal{T}_K \cup \mathcal{T}^* ) \}$ represent, respectively, the deterministic reduction in the signal sum of squares and the realized improvement in the noise sum of squares from partition refinement, both of which are bounded below by zero. Dividing by $n$, we obtain $\min_{K \in \mathcal{M}_o} \delta_{K,K^*} \gtrsim - M_1 \log\log \bar{\lambda} / n$. To satisfy condition (S.3), we require:
$$\frac{ M_1 \log\log \bar{\lambda} }{ n } \leq \sqrt{M_2} \, x_n \Delta_{(K^*)} \frac{1}{\sqrt{n \log^2(n)}},$$
which simplifies to the requirement that
$$\Delta_{(K^*)} \gtrsim x_n^{-1} \log\log( \bar{\lambda} ) \frac{ \log(n) }{ \sqrt{n} }.$$
Squaring this yields the necessary lower bound on the maximum jump size given in the theorem statement, implying $\Pr\{ K^* \in \mathcal{A} \} \geq 1 - \alpha + o(1)$.

S.2.2 Proof of Theorem 3.3

Let $\mathcal{B}^1_n$ and $\mathcal{B}^2_n$ be defined as in Definition 3.1. First, we establish an upper bound for the standard deviation $\sigma_{K,K^*}$. Recalling the variance expansion, we have:
$$\sigma^2_{K,K^*} \lesssim \frac{1}{n} \sum_{i=1}^{n} \| \bar{s}^O_{K,i} - \bar{s}^O_{K^*,i} \|_2^2.$$
Under the theorem's assumption that $\max_{i = 1, \dots, n} \| \bar{s}^O_{K,i} - \bar{s}^O_{K^*,i} \|_2^2 \lesssim \Delta^2_{(K^*)}$ for all $K$, this yields $\sigma^2_{K,K^*} \lesssim \Delta^2_{(K^*)}$.
Therefore, taking the square root gives $\sigma_{K,K^*} \lesssim \Delta_{(K^*)}$ with asymptotic probability 1. Next, we establish the behavior of $\delta_{K,K^*}$ for models in $\mathcal{B}^1_n$ and $\mathcal{B}^2_n$ separately.

Case 1: $K \in \mathcal{B}^1_n \subset \mathcal{M}_l$. From the lack-of-fit bound established in equation (S.4), we have:
$$n \delta_{K,K^*} \gtrsim \lambda M_3 \sum_{k \in \mathcal{I}^*_{lK}} \Delta_k^2 - M_1 K^* ( \log \bar{\lambda} )^2.$$
For $K \in \mathcal{B}^1_n$, the definition of the set implies that the signal term $\lambda \sum_{k \in \mathcal{I}^*_{lK}} \Delta_k^2$ dominates the noise remainder $M_1 K^* ( \log \bar{\lambda} )^2$, yielding $n \delta_{K,K^*} \gtrsim M_1 \Delta^2_{(K^*)} \sqrt{n \log(n)}$. Dividing by $n$, we obtain
$$\delta_{K,K^*} \gtrsim M_1 \Delta^2_{(K^*)} \sqrt{ \frac{ \log(n) }{ n } }.$$

Case 2: $K \in \mathcal{B}^2_n \subset \mathcal{M}_o$. From the over-fit bound established in equation (S.5), we have:
$$n \delta_{K,K^*} \gtrsim \mathrm{E}^E\{ \mathcal{S}_{s^E}( \mathcal{T}_K ) - \mathcal{S}_{s^E}( \mathcal{T}_K \cup \mathcal{T}^* ) \} + \left\{ \mathcal{S}_{\epsilon^O}( \mathcal{T}^* ) - \mathcal{S}_{\epsilon^O}( \mathcal{T}_K \cup \mathcal{T}^* ) \right\} - M_1 \log\log \bar{\lambda}.$$
For $K \in \mathcal{B}^2_n$, the definition of the set ensures that the first two terms exceed the stochastic remainder by a sufficient margin, yielding $n \delta_{K,K^*} \gtrsim M_1 \Delta^2_{(K^*)} \sqrt{n \log(n)}$. Dividing by $n$, we again obtain
$$\delta_{K,K^*} \gtrsim M_1 \Delta^2_{(K^*)} \sqrt{ \frac{ \log(n) }{ n } }.$$
Therefore, for any $K \in \mathcal{B}^1_n \cup \mathcal{B}^2_n$, combining the lower bound on $\delta_{K,K^*}$ with the upper bound on $\sigma_{K,K^*}$ yields:
$$\max_{J \neq K} \frac{ \delta_{K,J} }{ \sigma_{K,J} } \geq \frac{ \delta_{K,K^*} }{ \sigma_{K,K^*} } \gtrsim \frac{ M_1 \Delta^2_{(K^*)} \sqrt{ \log(n)/n } }{ \Delta_{(K^*)} } = M_1 \Delta_{(K^*)} \sqrt{ \frac{ \log(n) }{ n } }.$$
Because $\Delta_{(K^*)} \sqrt{ \log(n)/n } \gg x_n / \sqrt{ n \log(n) }$ for any $x_n = o(1)$, the ratio strictly violates the threshold required for inclusion in $\mathcal{A}$. Consequently, every model $K \in \mathcal{B}^1_n \cup \mathcal{B}^2_n$ is excluded from $\mathcal{A}$ with probability tending to 1. It immediately follows that at least $| \mathcal{B}^1_n | + | \mathcal{B}^2_n |$ candidate models are absent from the selected set, establishing $\Pr\{ | \mathcal{A} | \leq K_{\max} - | \mathcal{B}^1_n | - | \mathcal{B}^2_n | \} \geq 1 - o(1)$.
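The quantile $z_{\max}(\alpha, A)$, the upper $\alpha$-quantile of $\| T^*(A) \|_\infty$ for a centered Gaussian vector $T^*(A)$, appears throughout the proofs above. A minimal Monte Carlo sketch of this quantity (the function name, jitter term, and draw count are our own choices, not from the paper):

```python
import numpy as np

def z_max(alpha, H, n_draws=100_000, seed=0):
    """Approximate the upper-alpha quantile of ||T*||_inf, where
    T* ~ N(0, H), by direct Monte Carlo simulation (the limit the
    proofs take as the bootstrap size B -> infinity)."""
    rng = np.random.default_rng(seed)
    # Small jitter keeps the Cholesky factorization stable if H is
    # only positive semi-definite.
    L = np.linalg.cholesky(H + 1e-10 * np.eye(H.shape[0]))
    # Rows of T are i.i.d. draws of T* ~ N(0, H).
    T = rng.standard_normal((n_draws, H.shape[0])) @ L.T
    return np.quantile(np.abs(T).max(axis=1), 1 - alpha)

# With H = I_5, ||T*||_inf is the max of 5 independent |N(0,1)|'s,
# so z_max(0.05, I_5) should be close to 2.57.
H = np.eye(5)
q = z_max(0.05, H)
```

This also illustrates the $z_{\max}(\alpha, \widehat{H}^{(K)}) \lesssim \sqrt{\log(K_{\max}) + \log(\alpha^{-1})}$ bound used in part (2) of Theorem 3.1: the quantile grows only logarithmically in the dimension and in $\alpha^{-1}$.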
S.3 Additional related literature

In the change-point detection literature, while there is no shortage of research on statistical inference related to the number of change-points, the majority, if not all, of these works have focused on controlling either the familywise error rate (FWER) or the false discovery rate (FDR). Frick et al. (2014) first introduced a novel method called SMUCE to control the FWER for data from a one-dimensional exponential family. Pein et al. (2017) extended this approach to the heterogeneous change-point detection problem. The FWER in the change-point detection context is typically defined as the probability that the estimated number of change-points $\hat{K}$ strictly exceeds the true number $K^*$, i.e., $\Pr\{ \hat{K} > K^* \}$. By specifying a small predetermined level $\alpha$ for the FWER, one can have certain confidence that $\hat{K}$ does not overestimate the true number of change-points.

On the flip side, however, SMUCE-type procedures can be overly stringent, resulting in low statistical power and a tendency to significantly underestimate the true number of change-points in practical applications (Li et al., 2016; Chen et al., 2023). Instead of the FWER, Li et al. (2016) suggested controlling the less stringent FDR criterion. The FDR is defined as the proportion of false discoveries among the selected change-points, i.e., $\mathrm{E}[ \hat{K}_F / \hat{K} ]$, where $\hat{K}_F$ represents the number of false discoveries. See Li et al. (2016) for a more comprehensive definition of the FDR in change-point problems. In the realm of FDR-based methods, Hao et al. (2013) introduced a screening and ranking algorithm (SaRa) for detecting one-dimensional normal mean change-points. Li et al. (2016) proposed a multiscale change-point segmentation approach (FDRseg) based on the same model setup. Cheng et al.
(2020) advocated a differential smoothing and testing of maxima/minima algorithm (dSTEM) for continuous time series. Chen et al. (2023) developed a mirror with order-preserved splitting procedure, called MOPS, to address a broader range of change-point models, including structural changes and variance changes. Liu et al. (2024) extended the knockoff framework to control the FDR in structural change detection, which in turn can be modified for change-point detection. Sun et al. (2025) provided a data-splitting approach for FDR control. While FDR control may provide greater power than FWER control, it guarantees neither consistent estimation of the true number of change-points nor finite-sample confidence of recovering the true number. At its core, the FDR only concerns the expectation of false discoveries rather than their probability. For instance, consider the scenario of overfitting. Note that $\hat{K} = K^* + \hat{K}_F$, and FDR control aims to ensure that $\mathrm{E}[ \hat{K}_F / \hat{K} ] \leq \alpha$. Under some mild conditions, this implies that $\mathrm{E}[ \hat{K}_F ] \leq ( \alpha / (1 - \alpha) ) K^*$ approximately, which further means that $\mathrm{E}[ \hat{K}_F ]$ can increase as the true number $K^*$ increases. Consequently, if $K^*$ is sufficiently large, $\mathrm{E}[ \hat{K}_F ]$ may also become substantial. As a result, the point estimate $\hat{K}$ may deviate significantly from the true value in such cases.

The concept of the confidence set studied in this paper is fundamentally distinct from FWER and FDR, as it revolves around the probability of recovering the true number of change-points, and the proposed confidence level does not rely on the true number $K^*$. Furthermore, letting $\alpha$ approach zero, a well-designed detection algorithm should lead to a constructed confidence set with cardinality equal to one, which further results in consistency, highlighting the robustness and reliability of the algorithm in accurately identifying the true number of change-points.
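The overfitting arithmetic above can be checked directly: treating the FDR constraint $\hat{K}_F / (K^* + \hat{K}_F) = \alpha$ as holding with equality and solving for $\hat{K}_F$ gives $\hat{K}_F = \alpha K^* / (1 - \alpha)$. A deterministic toy sketch (the function name is ours, and this caricature ignores all randomness):

```python
def overfit_at_fdr(k_star, alpha):
    """Solve K_F / (k_star + K_F) = alpha for K_F: the number of false
    discoveries implied when the FDR bound is met with equality."""
    return alpha * k_star / (1 - alpha)

# The implied number of false discoveries grows linearly in K*:
# at level alpha = 0.10, K* = 9 already admits one spurious
# change-point, and K* = 90 admits ten.
vals = [overfit_at_fdr(k, 0.10) for k in (9, 18, 90)]
```

This is exactly why FDR control alone does not pin down $\hat{K}$: the tolerated overestimation scales with the unknown $K^*$ itself.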
S.4 Choice of loss function and discussion of related conditions

Table 1 in Zou et al. (2020) discusses several important settings. Here, we discuss two additional important settings, as detailed below.

• Covariance change-points model. Consider the setting $z_i \in \mathbb{R}^p \sim ( \mathbf{0}, \Sigma^*_k )$, where $\tau^*_{k-1} < i \leq \tau^*_k$ for $k = 1, \dots, K^* + 1$ and $i = 1, \dots, 2n$. For this setup, we can choose the loss function $l( \beta; z_i ) = \| z_i z_i^\top - \beta \|_F^2$ for $\beta \in \mathbb{R}^{p \times p}$. The gradient of the loss function is $\partial l( \beta; z_i ) / \partial \beta = z_i z_i^\top$, and with $\beta = \gamma = \mathbf{0}$, we define the score $s_i = \mathrm{vech}( \partial l( \beta; z_i ) / \partial \beta ) = \mathrm{vech}( z_i z_i^\top )$. The method from Aue et al. (2009) can then be applied to detect change-points in this setting.

• Network change-points model. In the network change-point setting, we assume $z_i \in \mathbb{R}^{p \times p} \sim \mathrm{Bern}( \Theta^*_k )$, where $\tau^*_{k-1} < i \leq \tau^*_k$ for $k = 1, \dots, K^* + 1$ and $i = 1, \dots, 2n$. Here, $z \sim \mathrm{Bern}( \Theta )$ means that $z_{i,j} \overset{\text{i.i.d.}}{\sim} \mathrm{Bern}( \theta_{i,j} )$. For this model, we may choose the loss function $l( \beta; z_i ) = \| z_i - \beta \|_F^2$ for $\beta \in \mathbb{R}^{p \times p}$. The gradient is $\partial l( \beta; z_i ) / \partial \beta = z_i$, and with $\beta = \gamma = \mathbf{0}$, the score becomes $s_i = \mathrm{vech}( \partial l( \beta; z_i ) / \partial \beta ) = \mathrm{vech}( z_i )$. The method from Wang et al. (2021) can be used to detect the change-points in this context.

Let $\eta_i \in \mathbb{R}^p \overset{\text{i.i.d.}}{\sim} ( \mathbf{0}, \mathbf{I} )$, and denote it as $\eta_i$ also when it reduces to the scalar case (i.e., $p = 1$). Model (2.1) can be specialized into the following different models. Let $\Sigma$ be a fixed covariance matrix. For the network model, $Z_i = ( z_{i,j,l} )_{1 \leq j,l \leq n} \sim \mathrm{Bern}( \Theta_i )$ represents $z_{i,j,l} \overset{\text{i.i.d.}}{\sim} \mathrm{Bern}( \beta_{i,j,l} )$, and let $W_i = Z_i - \Theta_i$. The detailed choices are summarized in Table S.1.

Table S.1: Detailed choice of score functions for OPTICS under different models.

| Name | Formula | $l(\beta; z_i)$ | $s_i$ |
|---|---|---|---|
| Mean | $z_i = \beta^*_k + \Sigma^{1/2} \eta_i$ | $\| z_i - \beta \|_2^2$ | $s_i = z_i$ |
| Variance | $z_i = \beta^*_k \eta_i$ | $\| \log(z_i^2) - \beta \|_2^2$ | $s_i = 2 \log(z_i)$ |
| Regression | $z_i = x_i^\top \beta^*_k + \eta_i$ | $\| z_i - \beta^\top x_i \|_2^2$ | $s_i = z_i (1, x_i^\top)^\top$ |
| Covariance | $z_i = (\Theta^*_k)^{1/2} \eta_i$, $\beta^*_k \in \mathbb{R}^{p \times p}$ | $\| z_i z_i^\top - \Theta \|_F^2$ | $s_i = \mathrm{vech}( z_i z_i^\top )$ |
| Network | $Z_i = \Theta^*_k + W_i$, $\Theta^*_k \in \mathbb{R}^{p \times p}$ | $\| Z_i - \Theta \|_F^2$ | $s_i = \mathrm{vech}( Z_i )$ |

Based on this table, we can clearly identify the sufficient conditions for $z_i$ to satisfy Condition 3.1 in the main content. For instance, in the case of the multiple mean change-point model, the original data $z_i$ should be a sub-Gaussian vector. Similarly, for multiple variance change-point models, it is required that a transformation of $z_i$, specifically $\log(z_i)$, is sub-Gaussian.

S.5 More simulation results

This section is devoted to additional simulation results. Tables S.2 and S.3 report the coverage rates when $d = 1$ and $d = 5$, respectively, when the error term follows the standard normal distribution in Section 5.1. Table S.4 provides the coverage rates in Section 5.2, with standard normal errors.

Table S.2: The coverage rates in the mean-change model: $d = 1$; normal error.

| Amplitude $A$ | 0.50 | 0.625 | 0.75 | 0.875 | 1.00 |
|---|---|---|---|---|---|
| OPTICS(BS) | 0.77(4.10) | 0.84(3.75) | 0.87(3.10) | 0.84(2.66) | 0.84(2.68) |
| OPTICS(SN) | 0.82(4.00) | 0.95(3.42) | 0.98(2.56) | 0.99(2.39) | 1.00(2.43) |
| A1-COPSS(BS) | 0.45(3.00) | 0.74(3.00) | 0.85(3.00) | 0.89(3.00) | 0.97(3.00) |
| A1-COPSS(SN) | 0.57(3.00) | 0.91(3.00) | 0.92(3.00) | 0.97(3.00) | 0.98(3.00) |
| A1-FDRseg | 0.83(3.00) | 0.99(3.00) | 0.98(3.00) | 0.96(3.00) | 0.96(3.00) |
| A1-SMUCE | 0.24(3.00) | 0.86(3.00) | 0.98(3.00) | 1.00(3.00) | 1.00(3.00) |
| COPSS(BS) | 0.23 | 0.38 | 0.52 | 0.60 | 0.52 |
| COPSS(SN) | 0.30 | 0.61 | 0.84 | 0.89 | 0.89 |
| FDRseg | 0.53 | 0.85 | 0.89 | 0.82 | 0.82 |
| SMUCE | 0.01 | 0.36 | 0.92 | 1 | 1 |

Table S.3: The coverage rates in the mean-change model: $d = 5$; normal error.
| Amplitude A | 0.50 | 0.625 | 0.75 | 0.875 | 1.00 |
|---|---|---|---|---|---|
| OPTICS(BS) | 0.73(3.27) | 0.55(2.34) | 0.60(2.19) | 0.56(2.04) | 0.67(2.20) |
| OPTICS(SN) | 0.83(3.17) | 0.83(2.04) | 0.94(1.91) | 0.92(1.85) | 0.93(2.04) |
| A^1_COPSS(BS) | 0.65(3.00) | 0.70(3.00) | 0.76(3.00) | 0.80(3.00) | 0.88(3.00) |
| A^1_COPSS(SN) | 0.77(3.00) | 0.83(3.00) | 0.96(3.00) | 0.93(3.00) | 0.95(3.00) |
| COPSS(BS) | 0.35 | 0.33 | 0.36 | 0.42 | 0.43 |
| COPSS(SN) | 0.48 | 0.75 | 0.80 | 0.80 | 0.69 |

Table S.4: The coverage rates in the linear model with coefficient structural breaks; N(0, 1) errors.

| Amplitude A | 0.10 | 0.125 | 0.15 | 0.175 | 0.20 |
|---|---|---|---|---|---|
| OPTICS(BS) | 0.83(3.52) | 0.90(3.13) | 0.89(2.68) | 0.80(2.23) | 0.78(1.97) |
| OPTICS(SN) | 0.92(2.76) | 0.97(2.28) | 0.98(1.90) | 0.97(1.61) | 1.00(1.51) |
| A^1_COPSS(BS) | 0.50(3.00) | 0.59(3.00) | 0.60(3.00) | 0.59(3.00) | 0.61(3.00) |
| A^1_COPSS(SN) | 0.72(3.00) | 0.76(3.00) | 0.72(3.00) | 0.74(3.00) | 0.77(3.00) |
| COPSS(BS) | 0.22 | 0.31 | 0.26 | 0.33 | 0.32 |
| COPSS(SN) | 0.51 | 0.57 | 0.56 | 0.62 | 0.58 |

S.5.1 Variance change-point model

In this subsection, we consider the variance change-point model

y_i = σ*_k ε_i, τ*_{k−1} < i ≤ τ*_k, k = 1, ..., K* + 1, i = 1, ..., 2n,

where σ*_{k+1}/σ*_k = A^{(−1)^{k−1}} for k = 1, ..., K*, and σ*_1 = 1. The error terms ε_i ~ N(0, 0.25), following Chen and Gupta (1997). In this model, we set the sample size n = 1000, the true change-point set T* = {200k, k = 1, ..., 4}, and the change amplitude A ∈ {2, 3, 4, 5, 6}.

Table S.5 compares the coverage rates of OPTICS with the state-of-the-art methods. Similar to the multiple mean-change model, OPTICS with SN and BS achieves the nominal confidence level. Their coverage rates consistently surpass those of the quasi-confidence sets created from COPSS. This implies OPTICS is desirable from both empirical and theoretical standpoints.
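The variance change-point data-generating process above can be sketched in a few lines (a minimal NumPy illustration of the stated setting, not the authors' implementation; the helper name `gen_variance_cp` is ours, and we pass standard deviation 0.5 since ε_i ~ N(0, 0.25)):

```python
import numpy as np

def gen_variance_cp(n_total=1000, A=3.0, tau=(200, 400, 600, 800), seed=0):
    """Generate y_i = sigma*_k * eps_i with sigma*_{k+1}/sigma*_k = A^{(-1)^{k-1}},
    sigma*_1 = 1, and eps_i ~ N(0, 0.25) (i.e., standard deviation 0.5)."""
    rng = np.random.default_rng(seed)
    bounds = (0,) + tuple(tau) + (n_total,)
    sigmas, sigma = [], 1.0
    for k in range(len(bounds) - 1):
        sigmas.append(sigma)
        sigma *= A ** ((-1) ** k)  # the ratio alternates between A and 1/A
    y = np.empty(n_total)
    for k in range(len(bounds) - 1):
        length = bounds[k + 1] - bounds[k]
        y[bounds[k]:bounds[k + 1]] = sigmas[k] * rng.normal(0.0, 0.5, length)
    return y, sigmas

y, sigmas = gen_variance_cp()
print(sigmas)  # segment scales follow the pattern 1, A, 1, A, 1
```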
Additionally, as expected, FDRseg and SMUCE, along with their respective quasi-confidence sets, lack power, as they are tailored for detecting mean changes, rendering them insensitive to variance changes.

Table S.5: The coverage rates in the variance change-point model.

| Amplitude A | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|
| OPTICS(BS) | 0.86(3.99) | 0.85(2.73) | 0.93(2.63) | 0.88(2.71) | 0.88(2.71) |
| OPTICS(SN) | 0.83(4.55) | 0.98(2.84) | 0.99(2.66) | 1.00(2.83) | 0.99(3.02) |
| A^1_COPSS(BS) | 0.51(3.00) | 0.77(3.00) | 0.88(3.00) | 0.87(3.00) | 0.94(3.00) |
| A^1_COPSS(SN) | 0.53(3.00) | 0.83(3.00) | 0.94(3.00) | 0.96(3.00) | 0.97(3.00) |
| A^1_FDRseg | 0(3.00) | 0(3.00) | 0(3.00) | 0(3.00) | 0.01(3.00) |
| A^1_SMUCE | 0.24(3.00) | 0.13(3.00) | 0.13(3.00) | 0.15(3.00) | 0.19(3.00) |
| COPSS(BS) | 0.28 | 0.53 | 0.63 | 0.60 | 0.55 |
| COPSS(SN) | 0.18 | 0.74 | 0.87 | 0.89 | 0.97 |
| FDRseg | 0 | 0 | 0 | 0 | 0 |
| SMUCE | 0.11 | 0.06 | 0.06 | 0.04 | 0.04 |

S.5.2 Network change-points model

Consider the network change-points model

Z_i = Θ*_k + W_i, τ*_{k−1} < i ≤ τ*_k, k = 1, ..., K* + 1, i = 1, ..., 2n,

where τ*_k, k = 1, ..., K*, are the true network change-points, Θ*_k is the d × d-dimensional network mean matrix for subject i when τ*_{k−1} < i ≤ τ*_k, and W_i is the independently distributed Bernoulli error. The sample size is taken to be 2n = 1000, and the set of true change-points is T* = {τ*_k = 200k, k = 1, ..., 4}; hence K* = 4. The kth mean matrix Θ*_k is generated from a stochastic block model (Wang et al., 2021) with connectivity matrices Q_k = Q_l, where l = mod(k, 2) + 1, and

Q_1 = A × (0.6, 1, 0.6; 1, 0.6, 0.5; 0.6, 0.5, 0.6), Q_2 = A × (0.6, 0.5, 0.6; 0.5, 0.6, 1; 0.6, 1, 0.6),

where each triple gives one row of the 3 × 3 matrix, and A is a scalar varying among {0.50, 0.60, 0.70, 0.80, 0.90}. Each network is generated from a 3-community stochastic block model with node size d = 5. At the change points, memberships are reshuffled randomly. This simulation setting mimics the situation in Wang et al.
(2021), and we choose the Network Binary Segmentation (NBS) proposed therein as our change-point detection procedure.

We compare OPTICS with COPSS and its naive confidence set A^1_COPSS. Table S.6 reports the coverage rates. The quantities in parentheses are average cardinalities of the estimated sets. We take q = 1 for the quasi-confidence sets for COPSS, i.e., A^1 = {K̂ − 1, K̂, K̂ + 1} with cardinality 3, and denote the generated sets by A^1_COPSS. We can see that OPTICS is the only method that closely reaches the specified confidence level.

Table S.6: The coverage rates in the network change-points model.

| Amplitude A | 0.50 | 0.60 | 0.70 | 0.80 | 0.90 | 1.00 |
|---|---|---|---|---|---|---|
| OPTICS(BS) | 0.79(2.88) | 0.75(2.57) | 0.79(2.59) | 0.70(1.83) | 0.65(1.79) | 0.75(1.85) |
| A^1_COPSS(BS) | 0.75(3.00) | 0.84(3.00) | 0.89(3.00) | 0.91(3.00) | 0.86(3.00) | 0.87(3.00) |
| COPSS(BS) | 0.37 | 0.39 | 0.47 | 0.55 | 0.55 | 0.62 |

S.5.3 Covariance change-point model

Consider the covariance change-points model

z_i = (Θ*_k)^{1/2} η_i, τ*_{k−1} < i ≤ τ*_k, k = 1, ..., K* + 1, i = 1, ..., 2n,

where Θ*_k is the d × d-dimensional covariance matrix for subject i when τ*_{k−1} < i ≤ τ*_k, and η_i is the independently and identically distributed standard Gaussian vector. The sample size is taken to be 2n = 1000, and the set of true change-points is T* = {τ*_k = 200k, k = 1, ..., 4}; hence K* = 4. The kth covariance matrix Θ*_k = Θ*_l with l = mod(k, 2) + 1, where Θ*_1 = I_d and Θ*_2 = {A^{|i−j|}}_{1≤i,j≤d}, and A is a scalar varying among {0.10, 0.20, 0.30, 0.40, 0.50}. We choose the Wild Binary Segmentation (WBS) proposed in Fryzlewicz (2014) as our change-point detection procedure. We compare OPTICS with COPSS and its naive confidence set A^1_COPSS. Table S.7 reports the coverage rates, with the quantities in parentheses representing the average cardinalities of the estimated sets.
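The covariance change-point setting above, with Θ*_1 = I_d and Θ*_2 = {A^{|i−j|}}, can be sketched as follows (an illustrative NumPy snippet under the stated setting, not the authors' implementation; the helper names are ours). It also forms the score s_i = vech(z_i z_i^⊤) from Table S.1:

```python
import numpy as np

def ar1_cov(d, A):
    """Theta*_2 = {A^{|i-j|}}: an AR(1)-type covariance matrix."""
    idx = np.arange(d)
    return A ** np.abs(idx[:, None] - idx[None, :])

def gen_cov_cp(n_total=1000, d=5, A=0.4, tau=(200, 400, 600, 800), seed=0):
    """Generate z_i = (Theta*_k)^{1/2} eta_i, with Theta*_k alternating
    between I_d and {A^{|i-j|}} across the five segments."""
    rng = np.random.default_rng(seed)
    thetas = [np.eye(d), ar1_cov(d, A)]
    bounds = (0,) + tuple(tau) + (n_total,)
    z = np.empty((n_total, d))
    for k in range(len(bounds) - 1):
        # symmetric square root via eigendecomposition (Theta is PSD)
        w, v = np.linalg.eigh(thetas[k % 2])
        root = v @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ v.T
        length = bounds[k + 1] - bounds[k]
        z[bounds[k]:bounds[k + 1]] = rng.standard_normal((length, d)) @ root
    return z

def vech(M):
    """Half-vectorization: stack the lower-triangular part (incl. diagonal)."""
    return M[np.tril_indices_from(M)]

z = gen_cov_cp()
s1 = vech(np.outer(z[0], z[0]))  # score of the first observation, length d(d+1)/2
print(z.shape, s1.shape)
```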
From the results, we observe that OPTICS performs best among all methods, although it still does not achieve the specified confidence level. This may be due to the Frobenius measure not being an ideal choice. We leave this issue for future research.

Table S.7: The coverage rates in the covariance change-point model.

| Amplitude A | 0.30 | 0.35 | 0.40 | 0.45 | 0.50 |
|---|---|---|---|---|---|
| OPTICS(BS) | 0.57(4.52) | 0.63(5.17) | 0.70(5.33) | 0.68(5.18) | 0.64(4.29) |
| A^1_COPSS(BS) | 0.42(3.00) | 0.44(3.00) | 0.22(3.00) | 0.17(3.00) | 0.29(3.00) |
| COPSS(BS) | 0.09 | 0.05 | 0.06 | 0.03 | 0.08 |

S.5.4 Multiple mean-change with heavy-tailed errors

The data-generating process for the heavy-tailed case is nearly identical to the multiple mean-change model with a t-distribution described in Section 5.1, with the only difference being that we set the degrees of freedom to df = 1 in the t-distribution. From Table S.8, we observe that Huber-OPTICS (H-OPTICS) with κ = 1.5 and OPTICS consistently achieve higher coverage rates than existing methods. Furthermore, H-OPTICS produces narrower confidence sets than OPTICS, demonstrating greater power under heavy-tailed data.

Table S.8: Coverage rates in the mean-change model (d = 1) with heavy-tailed errors.
| Amplitude A | 0.50 | 0.625 | 0.75 | 0.875 | 1.00 |
|---|---|---|---|---|---|
| H-OPTICS(BS) | 0.58(3.01) | 0.57(3.07) | 0.54(3.13) | 0.58(3.27) | 0.63(3.00) |
| H-OPTICS(SN) | 0.97(2.99) | 0.92(2.95) | 0.93(3.03) | 0.97(3.28) | 0.95(3.05) |
| OPTICS(BS) | 0.82(4.21) | 0.73(4.23) | 0.74(4.15) | 0.79(4.25) | 0.80(4.09) |
| OPTICS(SN) | 1.00(4.32) | 1.00(4.32) | 1.00(4.54) | 0.99(4.31) | 1.00(4.32) |
| A^1_COPSS(BS) | 0.29(3.00) | 0.34(3.00) | 0.34(3.00) | 0.31(3.00) | 0.36(3.00) |
| A^1_COPSS(SN) | 0.60(3.00) | 0.51(3.00) | 0.52(3.00) | 0.45(3.00) | 0.45(3.00) |
| COPSS(BS) | 0.02 | 0.10 | 0.04 | 0.05 | 0.05 |
| COPSS(SN) | 0.04 | 0.01 | 0.03 | 0.08 | 0.00 |

S.5.5 Multiple mean-change with m-dependent errors

The data-generating process for this setting is nearly identical to the multiple mean-change model with a normal distribution described in Section 5.1, except that the error terms ε_i are m-dependent. Specifically, they are defined as

ε_i = Σ_{l=1}^{m} φ_l η_{i+l}, i = 1, ..., 2n, with η_t ~ N(0, I_d), t = 1, ..., 2n + m, and φ_l = √(1/m).

It follows that the ε_i's exhibit an m-dependent structure. We evaluate the performance of the methods under both a single change-point setting (d = 1) and a multi-dimensional mean-change setting (d = 5).

From Table S.9 (d = 1), we observe that m-dependent OPTICS (M-OPTICS) achieves significantly higher coverage rates than standard OPTICS and the existing competing methods, which struggle heavily with the dependent errors. M-OPTICS(SN), in particular, maintains near-perfect coverage. This demonstrates that M-OPTICS can effectively adapt to m-dependent data while maintaining the pre-specified coverage rate, albeit with slightly wider confidence sets.

Table S.10 extends this analysis to the more complex multi-dimensional case (d = 5). As the number of true change-points increases, the estimation problem becomes substantially harder. Consequently, standard OPTICS and all competing baseline methods fail almost completely, yielding coverage rates near zero.
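The m-dependent error construction ε_i = Σ_{l=1}^{m} φ_l η_{i+l} with φ_l = √(1/m) can be sketched as follows (an illustrative NumPy snippet, not the authors' implementation; the helper name `m_dependent_errors` is ours):

```python
import numpy as np

def m_dependent_errors(n_total=1000, d=1, m=3, seed=0):
    """eps_i = sum_{l=1}^m phi_l * eta_{i+l} with phi_l = sqrt(1/m) and
    eta_t ~ N(0, I_d) for t = 1, ..., n_total + m."""
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal((n_total + m, d))
    phi = np.sqrt(1.0 / m)
    # eps_i aggregates the m innovations eta_{i+1}, ..., eta_{i+m}, so errors
    # up to m apart share innovations, giving an m-dependent structure
    eps = phi * np.stack([eta[i + 1:i + 1 + m].sum(axis=0) for i in range(n_total)])
    return eps

eps = m_dependent_errors()
print(eps.shape)  # (1000, 1); each eps_i has variance m * phi^2 = 1
```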
While the coverage of M-OPTICS(BS) decreases in this challenging setting, M-OPTICS(SN) exhibits remarkable robustness. By combining the multiple-splitting procedure with self-normalization, M-OPTICS(SN) successfully preserves high coverage rates (ranging from 0.88 to 0.98) across all tested amplitudes, highlighting its distinct advantage in handling multi-dimensional, dependent data structures.

Table S.9: Coverage rates in the mean-change model (d = 1) with m-dependent errors.

| Amplitude A | 0.50 | 0.625 | 0.75 | 0.875 | 1.00 |
|---|---|---|---|---|---|
| M-OPTICS(BS) | 0.87(4.11) | 0.79(2.58) | 0.85(2.51) | 0.81(2.75) | 0.78(2.55) |
| M-OPTICS(SN) | 0.93(3.23) | 0.96(2.27) | 1.00(1.99) | 1.00(2.23) | 1.00(2.27) |
| OPTICS(BS) | 0.22(2.61) | 0.25(2.72) | 0.29(2.71) | 0.23(2.57) | 0.21(2.59) |
| OPTICS(SN) | 0.41(2.92) | 0.46(3.08) | 0.39(2.91) | 0.31(2.74) | 0.43(3.00) |
| A^1_COPSS(BS) | 0.05(3.00) | 0.08(3.00) | 0.06(3.00) | 0.06(3.00) | 0.02(3.00) |
| A^1_COPSS(SN) | 0.04(3.00) | 0.06(3.00) | 0.05(3.00) | 0.05(3.00) | 0.06(3.00) |
| A^1_FDRseg | 0.00(3.00) | 0.00(3.00) | 0.00(3.00) | 0.00(3.00) | 0.00(3.00) |
| A^1_SMUCE | 0.00(3.00) | 0.00(3.00) | 0.00(3.00) | 0.00(3.00) | 0.00(3.00) |
| COPSS(BS) | 0.00 | 0.05 | 0.03 | 0.02 | 0.01 |
| COPSS(SN) | 0.01 | 0.01 | 0.01 | 0.03 | 0.04 |
| FDRseg | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| SMUCE | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |

Table S.10: Coverage rates in the mean-change model (d = 5) with m-dependent errors.
| Amplitude A | 0.50 | 0.625 | 0.75 | 0.875 | 1.00 |
|---|---|---|---|---|---|
| M-OPTICS(BS) | 0.48(2.16) | 0.48(1.97) | 0.49(1.93) | 0.53(2.06) | 0.66(2.12) |
| M-OPTICS(SN) | 0.88(2.17) | 0.94(1.93) | 0.94(1.92) | 0.97(1.94) | 0.98(1.93) |
| OPTICS(BS) | 0.05(1.88) | 0.04(1.89) | 0.11(2.03) | 0.10(2.03) | 0.07(1.90) |
| OPTICS(SN) | 0.16(2.32) | 0.15(2.27) | 0.20(2.32) | 0.15(2.22) | 0.11(2.22) |
| A^1_COPSS(BS) | 0.00 | 0.00 | 0.01 | 0.01 | 0.00 |
| A^1_COPSS(SN) | 0.01 | 0.01 | 0.03 | 0.02 | 0.00 |
| A^1_FDRseg | 0.00(3.00) | 0.00(3.00) | 0.00(3.00) | 0.00(3.00) | 0.00(3.00) |
| A^1_SMUCE | 0.00(3.00) | 0.00(3.00) | 0.00(3.00) | 0.00(3.00) | 0.00(3.00) |
| COPSS(BS) | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| COPSS(SN) | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 |

S.5.6 Varying dependence structure

In this subsection, we use simulations to investigate the effect of the dependence parameter m. The simulation setting is the same as in Subsection S.5.5, except that we fix the amplitude at A = 0.75 and vary the dependence parameter m ∈ {1, 2, 3, 5, 8} to observe its impact.

Table S.11: Coverage rates in the mean-change model (d = 1) with m-dependent errors.

| m | 1 | 2 | 3 | 5 | 8 |
|---|---|---|---|---|---|
| M-OPTICS(BS) | 0.84(2.97) | 0.86(2.52) | 0.79(2.47) | 0.81(2.59) | 0.78(2.86) |
| M-OPTICS(SN) | 0.99(2.53) | 1.00(2.16) | 1.00(2.10) | 1.00(2.18) | 0.94(2.18) |
| OPTICS(BS) | 0.83(2.96) | 0.08(2.87) | 0.13(3.04) | 0.01(1.51) | 0.00(1.06) |
| OPTICS(SN) | 0.99(2.47) | 0.25(3.59) | 0.27(3.37) | 0.01(1.80) | 0.00(1.40) |
| A^1_COPSS(BS) | 0.83 | 0.00 | 0.02 | 0.00 | 0.00 |
| A^1_COPSS(SN) | 0.99 | 0.03 | 0.01 | 0.00 | 0.00 |
| A^1_FDRseg | 0.94 | 0.00 | 0.00 | 0.00 | 0.00 |
| A^1_SMUCE | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| COPSS(BS) | 0.50 | 0.00 | 0.00 | 0.00 | 0.00 |
| COPSS(SN) | 0.94 | 0.00 | 0.00 | 0.00 | 0.00 |
| FDRseg | 0.78 | 0.00 | 0.00 | 0.00 | 0.00 |
| SMUCE | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 |

From Table S.11, we observe that while most methods perform adequately when m = 1, their coverage rates collapse to near zero as the dependence increases (m ≥ 2). In contrast, M-OPTICS consistently maintains high coverage rates across all tested values of m, demonstrating strong robustness to dependent errors.
Furthermore, M-OPTICS(SN) achieves superior coverage (remaining near 1.00) while generally producing narrower confidence sets than M-OPTICS(BS).

S.5.7 Comparison of OPTICS and multiple-splitting OPTICS

In this subsection, we evaluate the performance of two variants of our proposed methodology: the base OPTICS method and the multiple-splitting OPTICS (MS-OPTICS). The simulation setup mirrors the data-generating process described in Subsection 5.1, focusing on a single change-point setting (d = 1). To investigate the impact of sample size and error distribution on the confidence sets, we fix the signal amplitude at A = 0.75 and vary the total sample size N ∈ {400, 800, 1000, 1600}. Furthermore, to assess the robustness of the methods, the error terms are generated from both a standard normal distribution and a heavy-tailed t_10 distribution.

Table S.12: Coverage rates (lengths) for amplitude A = 0.75 across varying sample sizes N.

| Distribution | Method | N = 400 | N = 800 | N = 1000 | N = 1600 |
|---|---|---|---|---|---|
| Normal | OPTICS(BS) | 0.88(2.41) | 0.80(2.12) | 0.89(2.59) | 0.74(2.33) |
| | MS-OPTICS(BS) | 0.88(3.59) | 0.92(2.47) | 0.90(2.67) | 0.87(2.56) |
| | OPTICS(SN) | 1.00(2.20) | 1.00(2.00) | 1.00(2.41) | 0.99(2.18) |
| | MS-OPTICS(SN) | 0.98(3.45) | 0.99(2.09) | 1.00(2.36) | 1.00(1.99) |
| t-distribution | OPTICS(BS) | 0.74(2.43) | 0.84(2.13) | 0.85(2.54) | 0.77(2.52) |
| | MS-OPTICS(BS) | 0.88(4.10) | 0.83(2.53) | 0.91(2.67) | 0.90(2.62) |
| | OPTICS(SN) | 1.00(2.45) | 0.98(2.34) | 1.00(2.68) | 1.00(2.74) |
| | MS-OPTICS(SN) | 0.93(4.08) | 0.99(2.42) | 0.99(2.53) | 0.99(2.40) |

Table S.12 summarizes the empirical coverage rates and average lengths of the estimated confidence sets. We observe that for moderate sample sizes (e.g., N ≤ 1000), both OPTICS and MS-OPTICS deliver comparable and highly satisfactory coverage rates across both distributions.
As the sample size increases to N = 1600, MS-OPTICS exhibits an added layer of stability, maintaining its coverage under both the normal distribution and the heavy-tailed t_10 distribution. This suggests that while the base OPTICS is highly effective and efficient for typical sample sizes, the multiple-splitting procedure can serve as a robust alternative when the sample size grows exceptionally large.

S.5.8 Effectiveness of the change-point detection algorithm

As mentioned in Section 1, the cardinality of OPTICS can evaluate the efficacy of various change-point detection algorithms when they are utilized as the base algorithms for constructing OPTICS. An efficient detection algorithm should demonstrate superior capability in distinguishing the true number of change-points from others. Therefore, OPTICS coupled with a powerful detection algorithm is expected to achieve the coverage rate with a smaller cardinality. In this subsection, we conduct a comparison between the effectiveness of SN and BS under the univariate mean change-point model setting in Section 5.1.

The left panel of Figure S.1 depicts the boxplot of cardinalities of OPTICS using BS and SN as the base algorithm across 100 simulation runs, where the error term follows N(0, 1). As the amplitude A increases, the average cardinalities decrease, with OPTICS with SN exhibiting a notably faster decline. Additionally, the right panel of Figure S.1 shows the coverage rates of OPTICS with the two base algorithms, where OPTICS with SN demonstrates a higher coverage rate than that with BS. Hence, under the univariate mean change-point model setting, the SN method proves to be more effective than the BS method. Figure S.2 is devoted to the parallel results with t(10) errors, where similar phenomena are observed.

Figure S.1: The boxplot of cardinalities of OPTICS and the line plot of coverage rates in the univariate mean change-point model: N(0, 1) error.
Figure S.2: The boxplot of cardinalities of OPTICS and the line plot of coverage rates in the univariate mean change-point model: t(10) error.

S.6 More real data analysis results

We provide the estimated change-position sets A_i, i = 1, ..., 10:

A_1 = {13, 16}, A_2 = {7, 10, 19, 22}, A_3 = {4, 16, 19, 22}, A_4 = {4, 7, 10, 13, 16, 19, 22}, A_5 = {13, 22}, A_6 = {4, 7, 10, 13, 16}, A_7 = {4, 13, 16, 19, 22}, A_8 = {22}, A_9 = {19, 22}, A_10 = {22}.

The detected change-points for each individual are as follows:

S_1 = {263, 341, 363, 388, 428, 449, 469, 1319, 1724, 1906, 2044, 2143, 2195},
S_2 = {1642, 1663, 1771, 1795, 1816, 1965, 2195},
S_3 = {540, 601, 2143, 2195},
S_4 = {220, 2041, 2062, 2143},
S_5 = {73, 174, 269, 1141, 1225, 1276, 1641, 1915, 1965, 1991, 2031, 2143, 2195},
S_6 = {73, 105, 134, 2195},
S_7 = {74, 134, 1572, 2195},
S_8 = {177, 393, 521, 541, 601, 788, 811, 895, 932, 960, 1051, 1141, 1285, 1319, 1386, 1724, 1906, 1973, 1997, 2041, 2137, 2195},
S_9 = {60, 221, 454, 521, 544, 581, 905, 925, 1029, 1054, 1141, 1225, 1249, 1378, 1522, 2047, 2071, 2141, 2195},
S_10 = {72, 134, 756, 1119, 1141, 1167, 1225, 1321, 1366, 1386, 1455, 1534, 1560, 1642, 1685, 1726, 1818, 2044, 2091, 2143, 2166, 2195}.

References

Aue, A., S. Hörmann, L. Horváth, and M. Reimherr (2009). Break detection in the covariance structure of multivariate time series models. Annals of Statistics 37(6B), 4046–4087.

Auger, I. E. and C. E. Lawrence (1989). Algorithms for the optimal identification of segment neighborhoods. Bulletin of Mathematical Biology 51(1), 39–54.

Bai, J. (1998). Estimation of multiple-regime regressions with least absolutes deviation. Journal of Statistical Planning and Inference 74(1), 103–134.

Bai, J. and P. Perron (1998).
Estimating and testing linear models with multiple structural changes. Econometrica 66(1), 47–78.

Baranowski, R., Y. Chen, and P. Fryzlewicz (2019). Narrowest-over-threshold detection of multiple change points and change-point-like features. Journal of the Royal Statistical Society Series B: Statistical Methodology 81(3), 649–672.

Braun, J. V., R. Braun, and H.-G. Müller (2000). Multiple changepoint fitting via quasi-likelihood, with application to DNA sequence segmentation. Biometrika 87(2), 301–314.

Chen, H., H. Ren, F. Yao, and C. Zou (2023). Data-driven selection of the number of change-points via error rate control. Journal of the American Statistical Association 118(542), 1415–1428.

Chen, J. and A. K. Gupta (1997). Testing and locating variance changepoints with application to stock prices. Journal of the American Statistical Association 92(438), 739–747.

Chen, X. and W.-X. Zhou (2020). Robust inference via multiplier bootstrap. Annals of Statistics 48(3), 1665–1691.

Cheng, D., Z. He, and A. Schwartzman (2020). Multiple testing of local extrema for detection of change points. Electronic Journal of Statistics 14(2), 3705–3729.

Chernozhukov, V., D. Chetverikov, and K. Kato (2013). Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. Annals of Statistics 41(6), 2786–2819.

Chernozhukov, V., D. Chetverikov, and K. Kato (2017). Central limit theorems and bootstrap in high dimensions. Annals of Probability 45(4), 2309–2352.

Cho, H. and P. Fryzlewicz (2015). Multiple-change-point detection for high dimensional time series via sparsified binary segmentation. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 77(2), 475–507.

Fan, Y., J. Liu, J. Lv, and A. Sun (2025). MOSAIC: Minimax-optimal sparsity-adaptive inference for change points in dynamic networks.
arXiv preprint arXiv:2509.06303.

Frick, K., A. Munk, and H. Sieling (2014). Multiscale change point inference. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76(3), 495–580.

Fryzlewicz, P. (2014). Wild binary segmentation for multiple change-point detection. Annals of Statistics 42(6), 2243–2281.

Hao, N., Y. S. Niu, and H. Zhang (2013). Multiple change-point detection via a screening and ranking algorithm. Statistica Sinica 23(4), 1553–1572.

Harchaoui, Z. and C. Lévy-Leduc (2010). Multiple change-point estimation with a total variation penalty. Journal of the American Statistical Association 105(492), 1480–1493.

Huber, P. J. (1992). Robust estimation of a location parameter. In Breakthroughs in Statistics: Methodology and Distribution, pp. 492–518. Springer.

James, N. A. and D. S. Matteson (2015). ecp: An R package for nonparametric multiple change point analysis of multivariate data. Journal of Statistical Software 62, 1–25.

Kokoszka, P., H. Miao, M. Reimherr, and B. Taoufik (2018). Dynamic functional regression with application to the cross-section of returns. Journal of Financial Econometrics 16(3), 461–485.

Lei, J. (2020). Cross-validation with confidence. Journal of the American Statistical Association 115(532), 1978–1997.

Li, H., A. Munk, and H. Sieling (2016). FDR-control in multiscale change-point segmentation. Electronic Journal of Statistics 10(1), 918–959.

Liu, B., C. Zhou, X. Zhang, and Y. Liu (2020). A unified data-adaptive framework for high dimensional change point detection. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 82(4), 933–963.

Liu, J., A. Sun, and Y. Ke (2024). A generalized knockoff procedure for FDR control in structural change detection. Journal of Econometrics 239(2), 105331.

Liu, Y. and J. Xie (2020).
Cauchy combination test: a powerful test with analytic p-value calculation under arbitrary dependency structures. Journal of the American Statistical Association 115(529), 393–402.

Moon, S. and C. Velasco (2013). Tests for m-dependence based on sample splitting methods. Journal of Econometrics 173(2), 143–159.

Padilla, O. H. M., Y. Yu, D. Wang, and A. Rinaldo (2021a). Optimal nonparametric change point detection and localization. Electronic Journal of Statistics 15(1), 1154–1201.

Padilla, O. H. M., Y. Yu, D. Wang, and A. Rinaldo (2021b). Optimal nonparametric multivariate change point detection and localization. IEEE Transactions on Information Theory 68(3), 1922–1944.

Pein, F. and R. D. Shah (2025). Cross-validation for change-point regression: Pitfalls and solutions. Bernoulli 31(1), 388–411.

Pein, F., H. Sieling, and A. Munk (2017). Heterogeneous change point inference. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79(4), 1207–1227.

Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics 6, 461–464.

Sun, A., J. Bi, and J. Liu (2025). A synthetic data approach for FDR control in change-point detection. Statistics Innovation 3(1).

Truong, C., L. Oudre, and N. Vayatis (2020). Selective review of offline change point detection methods. Signal Processing 167, 107299.

Wang, D., Y. Yu, and A. Rinaldo (2021). Optimal change point detection and localization in sparse dynamic networks. Annals of Statistics 49(1), 203–232.

Wang, D., Z. Zhao, K. Z. Lin, and R. Willett (2021). Statistically and computationally efficient change point localization in regression settings. Journal of Machine Learning Research 22(248).

Yao, Y.-C. (1988). Estimating the number of change-points via Schwarz' criterion. Statistics & Probability Letters 6(3), 181–189.

Yao, Y.-C. and S.-T. Au (1989). Least-squares estimation of a step function.
Sankhyā: The Indian Journal of Statistics, Series A 51(3), 370–381.

Yu, M. and X. Chen (2021). Finite sample change point inference and identification for high-dimensional mean vectors. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 83(2), 247–270.

Zhang, N. R. and D. O. Siegmund (2012). Model selection for high-dimensional, multi-sequence change-point problems. Statistica Sinica 22(4), 1507–1538.

Zou, C., G. Wang, and R. Li (2020). Consistent selection of the number of change-points via sample-splitting. Annals of Statistics 48(1), 413–439.