On the Bootstrap for Persistence Diagrams and Landscapes

Frédéric Chazal (1), Brittany Terese Fasy (2), Fabrizio Lecci (3), Alessandro Rinaldo (3), Aarti Singh (4), and Larry Wasserman (3)

(1) INRIA Saclay
(2) Computer Science Department, Tulane University
(3) Department of Statistics, Carnegie Mellon University
(4) Machine Learning Department, Carnegie Mellon University

topstat@stat.cmu.edu

January 22, 2014

Abstract

Persistent homology probes topological properties from point clouds and functions. By looking at multiple scales simultaneously, one can record the births and deaths of topological features as the scale varies. In this paper we use a statistical technique, the empirical bootstrap, to separate topological signal from topological noise. In particular, we derive confidence sets for persistence diagrams and confidence bands for persistence landscapes.

1 Introduction

Persistent homology is a method for studying the homology at multiple scales simultaneously. Given a manifold $\mathbb{X}$ embedded in a metric space $\mathbb{Y}$, we consider a probability density function $p : \mathbb{Y} \to \mathbb{R}$, defined over $\mathbb{Y}$ but concentrated around $\mathbb{X}$; that is, the density is positive for a small neighborhood around $\mathbb{X}$ and very small over $\mathbb{Y} \setminus \mathbb{X}$. For the right scale parameter $t$, the superlevel set $p^{-1}([t, \infty))$ captures the homology of $\mathbb{X}$. The problem, however, is that $t$ is not known a priori. Persistent homology quantifies the topological changes of the superlevel sets with a multiset of points in the extended plane; we call this multiset the persistence diagram, and denote it by $\mathcal{P}$. Another way to represent the information contained in a persistence diagram is with the landscape function $L : \mathbb{R} \to \mathbb{R}$, which can be thought of as a functional summary of $\mathcal{P}$; we define these concepts in Section 1.1. Computationally, it may be difficult to compute $\mathcal{P}$ or $L$ directly.
Instead, we assume that $p$ corresponds to a probability distribution $P$, from which we can sample. Given a sample of size $n$, we create an estimate $p_n$ of the probability density function using a kernel density estimate. As $n$ increases, $p_n$ approaches the true probability density. Given $n$ large enough, we compute the persistence diagram $\mathcal{P}_n$ and the landscape $L_n$ corresponding to $p_n$.

Sometimes knowing the estimate of a persistence diagram or landscape is not enough. The bigger question is: How close is the estimated persistence diagram or landscape to the true one? We answer this question by constructing a confidence set for persistence diagrams and a confidence band for persistence landscapes.

A $(1-\alpha)$-confidence interval for a parameter $\theta$ is an interval $[a, b]$ such that the probability $P(\theta \in [a, b])$ is at least $1 - \alpha$. In our setting, we desire to find a confidence set for a persistence diagram $\mathcal{P}$. To do so, we compute an estimated diagram $\widehat{\mathcal{P}}$ and an interval $[0, c]$ such that the bottleneck distance between $\mathcal{P}$ and $\widehat{\mathcal{P}}$ is contained in $[0, c]$ with probability $1 - \alpha$. That is, we find a metric ball containing $\mathcal{P}$ with high probability.

In this paper, we present the bootstrap, a method for computing confidence intervals, and we apply it to persistence diagrams and landscapes. After briefly reviewing the necessary concepts from computational topology, we give the general technique of bootstrapping in statistics in Section 1.2. In Section 2, we apply the bootstrap to persistence diagrams and landscapes, providing a few examples of these confidence intervals. We conclude in Section 2.3 with a discussion of our ongoing research and open questions.

1 Background

Before presenting our results, we review the necessary definitions and theorems from persistent homology. Then, we present the bootstrap.
Due to space constraints, we cover the basics and provide references for a more detailed description.

1.1 Persistence Diagrams and Landscapes

Let $\mathbb{Y}$ be a metric space; for example, let $\mathbb{Y}$ be a compact subspace of $\mathbb{R}^D$. Suppose we have a probability density function $p : \mathbb{Y} \to \mathbb{R}$ concentrated in a neighborhood of a manifold $\mathbb{X} \subseteq \mathbb{Y}$. Persistent homology monitors the evolution of the generators of the homology groups of $p^{-1}([t, \infty))$, the superlevel sets of $p$, and assigns to each generator of these groups a birth time (or scale) $b$ and a death time $d$. The persistence diagram $\mathcal{P}$ records each pair $(b, d)$ as the point $\left(\frac{b+d}{2}, \frac{b-d}{2}\right)$; that is, the $x$-coordinate is the mid-life of the homological feature and the $y$-coordinate is the half-life, or half of the persistence, of the feature (see footnote 1). We refer the reader to Edelsbrunner and Harer [2010] for a more complete introduction to persistent homology.

Let $\mathcal{D}_T$ be the space of positive, countable, $T$-bounded persistence diagrams; that is, for each point $(x, y) = \left(\frac{b+d}{2}, \frac{b-d}{2}\right) \in \mathcal{P}$, we have $0 \le d \le b \le T$ and there are a countable number of points for which $y > 0$. We note here that each point on the line $y = 0$ (the image of the diagonal $b = d$) is included in the persistence diagram $\mathcal{P}$ with infinite multiplicity. Letting $W_\infty(\mathcal{P}_1, \mathcal{P}_2)$ denote the bottleneck distance between diagrams $\mathcal{P}_1$ and $\mathcal{P}_2$, the space $(\mathcal{D}_T, W_\infty)$ is a metric space. We then have the following stability result from Cohen-Steiner et al. [2007] and generalized in Chazal et al. [2012]:

Theorem 1.1 (Stability Theorem). Let $\mathbb{M}$ be finitely triangulable. Let $f, g : \mathbb{M} \to \mathbb{R}$ be two continuous functions. Then, the corresponding persistence diagrams $\mathcal{P}_f$ and $\mathcal{P}_g$ are well defined, and $W_\infty(\mathcal{P}_f, \mathcal{P}_g) \le \|f - g\|_\infty$.

Bubenik [2012] introduced another representation called the persistence landscape, which is in one-to-one correspondence with persistence diagrams.
A persistence landscape is a continuous, piecewise linear function $L : \mathbb{Z}^+ \times \mathbb{R} \to \mathbb{R}$. To define the persistence landscape function, we replace each persistence point $p = (x, y) = \left(\frac{b+d}{2}, \frac{b-d}{2}\right)$ with the triangle function

$$
t_p(z) =
\begin{cases}
z - x + y & z \in [x - y, x] \\
x + y - z & z \in (x, x + y] \\
0 & \text{otherwise}
\end{cases}
\;=\;
\begin{cases}
z - d & z \in [d, \frac{b+d}{2}] \\
b - z & z \in (\frac{b+d}{2}, b] \\
0 & \text{otherwise.}
\end{cases}
$$

Notice that $p$ is itself on the graph of $t_p(z)$. We obtain an arrangement of curves by overlaying the graphs of the functions $\{t_p(z)\}_{p \in \mathcal{P}}$; see Figure 1. The persistence landscape is defined formally as a walk through this arrangement:

$$
L_{\mathcal{P}}(k, z) = \underset{p \in \mathcal{P}}{\operatorname{kmax}} \; t_p(z), \qquad (1)
$$

where kmax is the $k$th maximum value in the set; in particular, 1max is the usual maximum function. Observe that $L_{\mathcal{P}}(k, z)$ is 1-Lipschitz. For ease of exposition, we will focus on $k = 1$ in this paper, using $L(z) = L_{\mathcal{P}}(1, z)$. However, the ideas we present in Section 2.2 hold for $k > 1$. Our definition of the persistence landscape is equivalent to the original definition given in Bubenik [2012].

Figure 1: The pink circles are the points in a persistence diagram. The cyan curve is the landscape $L(1, \cdot)$. (Vertical axis: (birth − death)/2.)

Footnote 1: In this paper, we focus on the persistent homology of the superlevel set filtration of a density function. Thus, the birth time $b$ is greater than the death time $d$.

1.2 The Standard Bootstrap

Introduced in Efron [1979], the bootstrap is a general method for estimating standard errors and for computing confidence intervals. We focus on the latter in this paper, but refer the interested reader to Efron et al. [2001], Davison and Hinkley [1997], and Van der Vaart [2000] for more details on the versatility of the bootstrap.

Let $X_1, \dots, X_n$ be independent and identically distributed random variables taking values in the measure space $(\mathbb{X}, \mathcal{X}, P)$.
Suppose we are interested in estimating the real-valued parameter $\theta$ corresponding to the distribution $P$ of the observations. We estimate $\theta$ using the statistic $\hat\theta = g(X_1, \dots, X_n)$, which is some function of the data. For example, $\theta$ and $\hat\theta$ could be the population mean and the sample mean, respectively. The distribution of the difference $\hat\theta - \theta$ contains all the information that we need to construct a confidence interval of level $1 - \alpha$ for $\theta$; that is, an interval $[a, b]$ depending on the data $X_1, \dots, X_n$ such that

$$
P(\theta \in [a, b]) \ge 1 - \alpha.
$$

If we knew the cumulative distribution $F$ of $\hat\theta - \theta$, then the quantiles $F^{-1}(1 - \alpha/2)$ and $F^{-1}(\alpha/2)$ could be computed. Furthermore, setting $a = \hat\theta - F^{-1}(1 - \alpha/2)$ and $b = \hat\theta - F^{-1}(\alpha/2)$, we immediately obtain a $(1-\alpha)$-confidence interval for $\theta$:

$$
P(\theta \in [a, b]) = P\left( F^{-1}\left(\tfrac{\alpha}{2}\right) \le \hat\theta - \theta \le F^{-1}\left(1 - \tfrac{\alpha}{2}\right) \right) = 1 - \alpha.
$$

Unfortunately, the distribution of $\hat\theta - \theta$ depends on the unknown distribution $P$. In the first step of the bootstrap procedure, we approximate the unknown $P$ with the empirical measure $P_n$ that puts mass $1/n$ at each $X_i$ in the sample. Let $X_1^*, \dots, X_n^*$ be a sample of size $n$ from $P_n$. Equivalently, we can think of drawing $X_1^*, \dots, X_n^*$ from $X_1, \dots, X_n$ with replacement. We estimate the distribution $F(r)$ with the distribution $\widehat{F}(r) = P_n(\hat\theta^* - \hat\theta \le r)$, where $\hat\theta^* = g(X_1^*, \dots, X_n^*)$.

The distribution $\widehat{F}$ is still not analytically computable, but can be approximated by simulation: for large $B$, obtain $B$ different values of $\hat\theta^*$ and approximate $\widehat{F}(r)$, and hence $F(r)$, with

$$
\widetilde{F}(r) = \frac{1}{B} \sum_{j=1}^{B} I(\hat\theta_j^* - \hat\theta \le r).
$$

Since the quantiles of $\widetilde{F}$ approximate the quantiles of $F$, we define the estimated confidence interval as

$$
C_n = \left[ \hat\theta - \widetilde{F}_n^{-1}(1 - \alpha/2), \; \hat\theta - \widetilde{F}_n^{-1}(\alpha/2) \right]. \qquad (2)
$$

In summary, the standard bootstrap procedure is:

1. Compute the estimate $\hat\theta = g(X_1, \dots, X_n)$.
2. Draw $X_1^*, \dots, X_n^*$ from $P_n$ and compute $\hat\theta^* = g(X_1^*, \dots, X_n^*)$.
3. Repeat the previous step $B$ times to obtain $\hat\theta_1^*, \dots, \hat\theta_B^*$.
4. Compute the quantiles of $\widetilde{F}$ and construct the confidence interval $C_n$.

There are two sources of error in the bootstrap procedure. We first approximate $F$ with $\widehat{F}$ and then we estimate $\widehat{F}$ by simulation. The second error can be made arbitrarily small by choosing $B$ large enough. Therefore, this error is usually ignored in the theory of the bootstrap. Formally, one has to show that $\sup_r |\widetilde{F}(r) - F(r)| \xrightarrow{P} 0$, which implies that the confidence interval $C_n$, defined in (2), is asymptotically consistent at level $1 - \alpha$; that is, $\liminf_{n \to \infty} P(\theta \in C_n) \ge 1 - \alpha$.

1.3 The Bootstrap Empirical Process

When a random variable is a function rather than a real value, the bootstrap procedure described above can be used to construct a confidence interval for the function evaluated at a particular element of the domain. Instead, we use the bootstrap empirical process, which can be used to find a confidence band for a function $h(t)$; that is, we find a pair of functions $a(t)$ and $b(t)$ such that the probability that $h(t) \in [a(t), b(t)]$ for all $t$ is at least $1 - \alpha$. We describe this technique below, but refer the reader to Van der Vaart and Wellner [1996] and Kosorok [2008] for more details.

An empirical process is a stochastic process based on a random sample. Let $X_1, \dots, X_n$ be independent and identically distributed random variables taking values in the measure space $(\mathbb{X}, \mathcal{X}, P)$. For a measurable function $f : \mathbb{X} \to \mathbb{R}$, we denote $Pf = \int f \, dP$ and $P_n f = \int f \, dP_n = n^{-1} \sum_{i=1}^n f(X_i)$. By the law of large numbers, $P_n f$ converges almost surely to $Pf$.
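The empirical measure $P_n$ used here is the same one that drives the standard bootstrap of Section 1.2. As a minimal sketch of that four-step procedure (an illustration, not code from the paper), the following Python function builds the interval $C_n$ of equation (2) for a generic statistic. The function name `bootstrap_ci`, the default `B = 2000`, and the use of NumPy's interpolated sample quantiles in place of the exact inverse of $\widetilde{F}$ are assumptions made for illustration.

```python
import numpy as np


def bootstrap_ci(data, statistic, alpha=0.05, B=2000, seed=0):
    """Bootstrap interval C_n of equation (2) for a real-valued statistic.

    `data` is a 1-D sample and `statistic` maps a 1-D array to a float.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)

    # Step 1: estimate theta from the original sample.
    theta_hat = statistic(data)

    # Steps 2-3: resample with replacement B times and record theta* - theta_hat.
    diffs = np.empty(B)
    for j in range(B):
        resample = rng.choice(data, size=n, replace=True)
        diffs[j] = statistic(resample) - theta_hat

    # Step 4: quantiles of the simulated differences give the interval endpoints.
    q_lo, q_hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return theta_hat - q_hi, theta_hat - q_lo


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    sample = rng.normal(loc=2.0, scale=1.0, size=500)
    a, b = bootstrap_ci(sample, np.mean)
    print(f"95% bootstrap interval for the mean: [{a:.3f}, {b:.3f}]")
```

Note that the upper quantile of the differences is subtracted to obtain the lower endpoint, and vice versa, exactly as in equation (2).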
Given a class $\mathcal{F}$ of measurable functions, we define the empirical process $G_n$ indexed by $\mathcal{F}$ as

$$
\{G_n f\}_{f \in \mathcal{F}} = \left\{ \sqrt{n}\,(P_n f - P f) \right\}_{f \in \mathcal{F}}.
$$

Example 1.2. If $\mathcal{F} = \{I(x \le t)\}_{t \in \mathbb{R}}$, then $\{P_n f\}_{f \in \mathcal{F}} = \{n^{-1} \sum_{i=1}^n I(X_i \le t)\}_{t \in \mathbb{R}}$, which is the empirical distribution function seen as a stochastic process indexed by $t$. Furthermore, we have $\{G_n f\}_{f \in \mathcal{F}} = \{n^{-1/2} \sum_{i=1}^n \left( I(X_i \le t) - P(X_i \le t) \right)\}_{t \in \mathbb{R}}$.

Definition 1.3. A class $\mathcal{F}$ of measurable functions $f : \mathbb{X} \to \mathbb{R}$ is called $P$-Donsker if the process $\{G_n f\}_{f \in \mathcal{F}}$ converges in distribution to a limit process in the space $\ell^\infty(\mathcal{F})$, where $\ell^\infty(\mathcal{F})$ is the collection of all bounded real-valued functions on $\mathcal{F}$. The limit process is a Gaussian process $G$ with zero mean and covariance function $E\, G f \, G g := P f g - P f \, P g$; this process is known as a Brownian bridge.

Let $P_n^* f = n^{-1} \sum_{i=1}^n f(X_i^*)$, where $\{X_1^*, \dots, X_n^*\}$ is a bootstrap sample from $P_n$, the measure that puts mass $1/n$ on each element of the sample $\{X_1, \dots, X_n\}$. The bootstrap empirical process $G_n^*$ indexed by $\mathcal{F}$ is defined as

$$
\{G_n^* f\}_{f \in \mathcal{F}} = \left\{ \sqrt{n}\,(P_n^* f - P_n f) \right\}_{f \in \mathcal{F}}.
$$

Theorem 1.4 (Theorem 2.4 in Giné and Zinn [1990]). $\mathcal{F}$ is $P$-Donsker if and only if $G_n^*$ converges in distribution to $G$ in $\ell^\infty(\mathcal{F})$.

In words, Theorem 1.4 states that $\mathcal{F}$ is $P$-Donsker if and only if the bootstrap empirical process converges in distribution to the limit process $G$ given in Definition 1.3.

Suppose we are interested in constructing a confidence band of level $1 - \alpha$ for $\{P f\}_{f \in \mathcal{F}}$, where $\mathcal{F}$ is $P$-Donsker. Let $\hat\theta = \sup_{f \in \mathcal{F}} |G_n f|$. We proceed as follows:

1. Draw $X_1^*, \dots, X_n^*$ from $P_n$ and compute $\hat\theta^* = \sup_{f \in \mathcal{F}} |G_n^* f|$.
2. Repeat the previous step $B$ times to obtain $\hat\theta_1^*, \dots, \hat\theta_B^*$.
3. Compute $q_\alpha = \inf \left\{ q : \frac{1}{B} \sum_{j=1}^B I(\hat\theta_j^* \ge q) \le \alpha \right\}$.
4. For $f \in \mathcal{F}$, define the confidence band $C_n(f) = \left[ P_n f - \frac{q_\alpha}{\sqrt{n}}, \; P_n f + \frac{q_\alpha}{\sqrt{n}} \right]$.

A consequence of Theorem 1.4 is that, for large $n$ and $B$, the interval $[0, q_\alpha]$ has coverage $1 - \alpha$ for $\hat\theta$ and the band $\{C_n(f)\}_{f \in \mathcal{F}}$ has coverage $1 - \alpha$ for $\{P f\}_{f \in \mathcal{F}}$.

2 Applications of the Bootstrap

In this section, we apply the bootstrap from the previous section to persistence diagrams, as well as to persistence landscapes.

2.1 Persistence Diagrams

Let $X_1, \dots, X_n$ be a sample from the distribution $P$, supported on a smooth manifold $\mathbb{X} \subset \mathbb{R}^D$. Let

$$
p_h(x) = \int_{\mathbb{X}} \frac{1}{h^D} K\left( \frac{\|x - u\|}{h} \right) dP(u),
$$

where $K : \mathbb{R} \to \mathbb{R}$ is an integrable function satisfying $\int K(u)\,du = 1$ and $K(u)$ is nonnegative for all $u$; thus $p_h$ is a probability density function. The function $K$ is called a kernel and the parameter $h > 0$ is its bandwidth. Then $p_h$ is the density of the probability measure $P_h$, which is the convolution $P_h = P \star \mathbb{K}_h$, where $\mathbb{K}_h(A) = h^{-D}\, \mathbb{K}(h^{-1} A)$ and $\mathbb{K}(A) = \int_A K(t)\,dt$. $P_h$ is a smoothed version of $P$.

Our target of inference in this section is $\mathcal{P}_h$, the persistence diagram of the superlevel sets of $p_h$. The standard estimator for $p_h$ is the kernel density estimator

$$
\hat p_h(x) = \frac{1}{n} \sum_{i=1}^n \frac{1}{h^D} K\left( \frac{\|x - X_i\|}{h} \right);
$$

notice that if the $X_i$ are fixed, then $\hat p_h$ is a probability density function. Let $\widehat{\mathcal{P}}_h$ be the corresponding persistence diagram. We wish to find a confidence set for $\mathcal{P}_h$, i.e., an interval $[0, c_n]$ such that

$$
\liminf_{n \to \infty} P\left( W_\infty(\widehat{\mathcal{P}}_h, \mathcal{P}_h) \in [0, c_n] \right) \ge 1 - \alpha.
$$

From Theorem 1.1 (Stability), it suffices to find $c_n$ such that

$$
\limsup_{n \to \infty} P\left( \|\hat p_h - p_h\|_\infty > c_n \right) \le \alpha.
$$

To find $c_n$, we use the bootstrap. Let $\mathcal{F} = \left\{ f_x(u) = \frac{1}{h^D} K\left( \frac{\|x - u\|}{h} \right) \right\}_{x \in \mathbb{X}}$. Using the notation of Section 1.3, it follows that $P f_x = p_h(x)$, $P_n f_x = \hat p_h(x)$, and $\hat\theta = \sup_{f_x \in \mathcal{F}} |G_n f_x| = \sqrt{n}\, \|\hat p_h - p_h\|_\infty$.
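To make these objects concrete, the sketch below evaluates $\hat p_h$ on a grid and bootstraps the statistic $\sqrt{n}\, \|\hat p_h^{*} - \hat p_h\|_\infty$, the bootstrap analogue of $\hat\theta$, to obtain a quantile $q_\alpha$ and the radius $c_n = q_\alpha / \sqrt{n}$. It is an illustration under simplifying assumptions rather than the authors' implementation: the data are one-dimensional ($D = 1$), the kernel is Gaussian, the supremum is approximated by a maximum over a finite grid, NumPy's interpolated quantile stands in for the infimum defining $q_\alpha$, and the helper names are hypothetical.

```python
import numpy as np


def gaussian_kde_on_grid(points, grid, h):
    """Evaluate the 1-D Gaussian kernel density estimate with bandwidth h on `grid`."""
    z = (grid[:, None] - points[None, :]) / h
    kernel_values = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernel_values.sum(axis=1) / (len(points) * h)


def bootstrap_sup_quantile(points, grid, h, alpha=0.05, B=500, seed=0):
    """Return (q_alpha, c_n), where c_n = q_alpha / sqrt(n).

    q_alpha approximates the (1 - alpha) quantile of
    sqrt(n) * max_grid |p*_h - p_hat_h| over B bootstrap resamples.
    """
    rng = np.random.default_rng(seed)
    n = len(points)
    p_hat = gaussian_kde_on_grid(points, grid, h)

    theta_star = np.empty(B)
    for j in range(B):
        resample = rng.choice(points, size=n, replace=True)
        p_star = gaussian_kde_on_grid(resample, grid, h)
        theta_star[j] = np.sqrt(n) * np.max(np.abs(p_star - p_hat))

    q_alpha = np.quantile(theta_star, 1.0 - alpha)
    return q_alpha, q_alpha / np.sqrt(n)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=300)
    grid = np.linspace(-4.0, 4.0, 200)
    q, c = bootstrap_sup_quantile(X, grid, h=0.3)
    print(f"q_alpha = {q:.3f}, c_n = {c:.4f}")
```

By the Stability Theorem, a radius $c_n$ obtained this way for the density also bounds the bottleneck distance between the corresponding persistence diagrams; computing the diagrams themselves is outside the scope of this sketch.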
The approximate $1 - \alpha$ quantile $q_\alpha$ can be obtained through simulation, i.e.,

$$
q_\alpha = \inf \left\{ q : \frac{1}{B} \sum_{j=1}^B I\left( \sqrt{n}\, \|\hat p_h^{\,j} - \hat p_h\|_\infty \ge q \right) \le \alpha \right\},
$$

where $\hat p_h^{\,j}$ denotes the kernel density estimate computed from the $j$th bootstrap sample. The following result holds under suitable regularity conditions on the kernel $K$ for which $\mathcal{F}$ is Donsker; see Giné and Guillou [2002].

Theorem 2.1 (Lemma 15 in Balakrishnan et al. [2013]). We have that

$$
\limsup_{n \to \infty} P\left( \sqrt{n}\, \|\hat p_h - p_h\|_\infty > q_\alpha \right) \le \alpha.
$$

By the Stability Theorem, we conclude:

$$
\limsup_{n \to \infty} P\left( W_\infty(\widehat{\mathcal{P}}_h, \mathcal{P}_h) > \frac{q_\alpha}{\sqrt{n}} \right) \le \alpha.
$$

Example 2.2 (Torus). We embed the torus $S^1 \times S^1$ in $\mathbb{R}^3$ and we use the rejection sampling algorithm of Diaconis et al. [2012] ($R = 1.5$, $r = 0.8$) to sample 10,000 points uniformly from the torus. Then, we compute the persistence diagram $\widehat{\mathcal{P}}_h$ using the Gaussian kernel with bandwidth $h = 0.25$ and use the bootstrap to construct the 95% confidence interval $[0, 0.01]$ for $W_\infty(\widehat{\mathcal{P}}_h, \mathcal{P}_h)$; see Figure 2. Notice that the confidence set correctly captures the topology of the torus. That is, only the points representing real features of the torus are significantly far from the horizontal axis.

Figure 2: Left: Persistence diagram of the superlevel sets of a kernel density estimator on the 3D torus described in Example 2.2. The boxes of side $2 \times 0.01$ around the points represent the 95% confidence set for $\mathcal{P}_h$. Middle: 2D projection of the superlevel set $\{x : \hat p_h(x) > 0.034\}$. Right: 2D projection of the superlevel set $\{x : \hat p_h(x) > 0.027\}$.

2.2 Landscapes

Let the diagrams $\mathcal{P}_1, \dots, \mathcal{P}_n$ be a sample from the distribution $P$ over the space of persistence diagrams $\mathcal{D}_T$. Thus, by definition, we have $x + y \le T < \infty$ and $0 \le y \le T/2$ for all $(x, y) \in \cup_i \mathcal{P}_i$. Let $L_1, \dots, L_n$ be the landscape functions corresponding to $\mathcal{P}_1, \dots, \mathcal{P}_n$.
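The landscape functions referred to here are those of equation (1), built from the triangle functions $t_p$ of Section 1.1. A minimal sketch of how they might be evaluated numerically is given below. It assumes each diagram is stored as a finite array of $(x, y)$ = (mid-life, half-life) pairs, uses the identity $t_p(z) = \max(0, y - |z - x|)$, which agrees with the piecewise definition of $t_p$, and returns 0 when a diagram has fewer than $k$ off-diagonal points (the points on the line $y = 0$ contribute zero). The function names `landscape` and `mean_landscape` are illustrative and not taken from any library.

```python
import numpy as np


def landscape(diagram, k, z):
    """Evaluate L_P(k, z) of equation (1) at the locations in z.

    `diagram` has shape (m, 2) with rows (x, y) = (mid-life, half-life).
    """
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    z = np.atleast_1d(np.asarray(z, dtype=float))
    x = diagram[:, 0][:, None]
    y = diagram[:, 1][:, None]

    # Triangle function t_p(z) for every point p (rows) and location z (columns).
    tents = np.maximum(0.0, y - np.abs(z[None, :] - x))

    if k > tents.shape[0]:
        return np.zeros_like(z)

    # Sort each column in decreasing order and take the k-th largest value.
    return -np.sort(-tents, axis=0)[k - 1]


def mean_landscape(diagrams, z, k=1):
    """Pointwise average of the k-th landscapes of several diagrams."""
    return np.mean([landscape(d, k, z) for d in diagrams], axis=0)


if __name__ == "__main__":
    P = np.array([[2.0, 1.0], [2.5, 0.5]])
    z = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
    print(landscape(P, 1, z))
    print(landscape(P, 2, z))
```

The pointwise average computed by `mean_landscape` corresponds to the empirical mean landscape used in the remainder of this section.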
Here $L_i(t) = L_{\mathcal{P}_i}(1, t)$, with $L_{\mathcal{P}}$ as defined in (1). We define the mean landscape

$$
\mu(t) = E_P[L_i(t)],
$$

and the empirical mean landscape

$$
\overline{L}_n(t) = \frac{1}{n} \sum_{i=1}^n L_i(t).
$$

In this section, we show that the process $\sqrt{n}\,(\overline{L}_n(t) - \mu(t))$ converges to a Gaussian process, so that we may use the procedure given in Section 1.3.

Let $\mathcal{F} = \{f_t : 0 \le t \le T\}$, where $f_t : \mathcal{D}_T \to \mathbb{R}$ is defined by $f_t(\mathcal{P}) = L_{\mathcal{P}}(1, t)$. We note here that $f_t(\mathcal{P}) = 0$ if $t \notin (0, T)$. We can now write $\sqrt{n}\,(\overline{L}_n(t) - \mu(t))$ as an empirical process indexed by $t \in [0, T]$:

$$
\sqrt{n}\,(\overline{L}_n(t) - \mu(t)) = \sqrt{n} \left( \frac{1}{n} \sum_{i=1}^n L_i(t) - \mu(t) \right) = \sqrt{n}\,(P_n f_t - P f_t) \equiv G_n f_t.
$$

We note that the constant function $F(\mathcal{P}) = T/2$ is a measurable envelope for $\mathcal{F}$. Given a probability measure $Q$ over $\mathcal{D}_T$, let $\|f - g\|_{Q,2} = \sqrt{\int |f - g|^2 \, dQ}$ and let $N(\mathcal{F}, L_2(Q), \varepsilon)$ be the covering number of $\mathcal{F}$, that is, the size of the smallest $\varepsilon$-net in this metric.

Lemma 2.3 (Theorem 2.5 in Kosorok [2008]). Let $\mathcal{F}$ be a class of measurable functions satisfying

$$
\int_0^1 \sqrt{\log \sup_Q N(\mathcal{F}, L_2(Q), \varepsilon \|F\|_{Q,2})} \; d\varepsilon < \infty,
$$

where $F$ is a measurable envelope of $\mathcal{F}$ and the supremum is taken over all finitely discrete probability measures $Q$ with $\|F\|_{Q,2} > 0$. If $P F^2 < \infty$, then $\mathcal{F}$ is $P$-Donsker.

Theorem 2.4 (Weak Convergence of Landscapes). Let $G$ be a Brownian bridge with covariance function $\kappa(t, u) = \int f_t(\mathcal{P}) f_u(\mathcal{P}) \, dP(\mathcal{P}) - \int f_t(\mathcal{P}) \, dP(\mathcal{P}) \int f_u(\mathcal{P}) \, dP(\mathcal{P})$. Then, $G_n$ converges in distribution to $G$.

Proof. Since persistence landscapes are 1-Lipschitz, we have $\|f_t - f_u\|_{Q,2} \le |t - u|$. Construct a regular grid $0 \equiv t_0 < t_1 < \cdots < t_N \equiv T$, where $t_{j+1} - t_j = \varepsilon \|F\|_{Q,2} = \varepsilon T/2$. We claim that $\{f_{t_j} : 1 \le j \le N\}$ is an $(\varepsilon T/2)$-net for $\mathcal{F}$: choose $f_t \in \mathcal{F}$; then there is a $j$ so that $t_j \le t \le t_{j+1}$ and

$$
\|f_{t_{j+1}} - f_t\|_{Q,2} \le |t_{j+1} - t| \le |t_{j+1} - t_j| = \varepsilon T/2.
$$
The fact that $\{f_{t_j} : 1 \le j \le N\}$ is an $(\varepsilon T/2)$-net implies $\sup_Q N(\mathcal{F}, L_2(Q), \varepsilon \|F\|_{Q,2}) \le 2/\varepsilon$. Hence, $\int_0^1 \sqrt{\log \sup_Q N(\mathcal{F}, L_2(Q), \varepsilon \|F\|_{Q,2})} \; d\varepsilon < \infty$. The envelope $F = T/2$ is trivially square-integrable. By Lemma 2.3, $\mathcal{F}$ is $P$-Donsker, so $G_n$ converges in distribution to $G$.

Now that we have shown that $G_n$ converges to a Gaussian process, we can follow the procedure outlined in Section 1.3. Let $P_n$ be the empirical measure that puts mass $1/n$ at each diagram $\mathcal{P}_i$. We draw $\mathcal{P}_1^*, \dots, \mathcal{P}_n^*$ from $P_n$ and construct the corresponding landscapes $L_1^*, \dots, L_n^*$. Let $\overline{L}_n^*$ be the empirical mean of these landscapes and $\hat\theta^* = \sup_{t \in \mathbb{R}} |\sqrt{n}\,(\overline{L}_n^*(t) - \overline{L}_n(t))|$. Repeating this $B$ times, we obtain $\hat\theta_1^*, \dots, \hat\theta_B^*$, and we compute the quantile $q_\alpha$.

Theorem 2.5 (Confidence Band for Persistence Landscapes). The interval $C_n(t)$ indexed by $t \in \mathbb{R}$, defined by

$$
C_n(t) = \left[ \overline{L}_n(t) - \frac{q_\alpha}{\sqrt{n}}, \; \overline{L}_n(t) + \frac{q_\alpha}{\sqrt{n}} \right],
$$

is a confidence band for $\mu(t)$:

$$
\lim_{n \to \infty} P\left( \mu(t) \in C_n(t) \text{ for all } t \right) \ge 1 - \alpha.
$$

Example 2.6 (Circles). Given the nine circles of radii 0.4 and 0.3, shown in Figure 3, we obtain a sample $X_1, \dots, X_{100}$ as follows: first, choose a circle $C_i$ uniformly at random, then sample a point iid from $C_i$. Let $\mathcal{P}$ be the (Betti 1) persistence diagram corresponding to the Rips filtration for the sample, and $L$ be the landscape corresponding to $\mathcal{P}$ (see footnote 2). We repeat this 50 times to obtain diagrams $\mathcal{P}_1, \dots, \mathcal{P}_{50}$ and landscapes $L_1, \dots, L_{50}$. Then, we use the bootstrap procedure to obtain the quantile $q_\alpha = 0.234$. Together with $\overline{L}_{50}$, this gives us an approximate 95% confidence band for $\mu(t) = E_P(L_i(t))$. On the right of Figure 3 we show the empirical mean landscape $\overline{L}_{50}$ with the 95% confidence band for $\mu(t)$.

Footnote 2: Note that, since in this example we are using sublevel sets, the role of birth and death in the definitions of Section 1.1 is inverted.
The death time $d$ is greater than the birth time $b$.

[Figure 3 panels: "Space" (left); "Mean Landscape with Bootstrapped 95% band", vertical axis "Landscape" (right).]

Figure 3: Left: The set of circles from which samples are taken. Right: The confidence band for the persistence landscape corresponding to the distance to the point set.

2.3 Discussion

In this paper, we have described the bootstrap as it applies to persistence diagrams and landscapes. The purpose of this paper was to introduce the bootstrap and the bootstrap empirical process to topologists. In a related paper (Balakrishnan et al.
[2013]), aimed towards a statistical audience, we derive the convergence rates for the technique presented in Section 2.1, as well as present three other methods for computing confidence sets for persistence diagrams.

The persistence landscape can be thought of as a summary function of a persistence diagram. The bootstrap method that we presented in Section 2.2 trivially generalizes to handle all landscapes $L(k, t)$. Furthermore, we need not limit the scope of this method to landscape functions. In a future paper, we plan to investigate other meaningful summary functions as well as the convergence rates for the techniques presented in Section 2.2.

We have demonstrated how the bootstrap works for two examples, given in Figure 2 and Figure 3. Part of our ongoing research is investigating applications for these confidence intervals; in particular, we are applying them to real (rather than simulated) data sets. One can use the confidence intervals for hypothesis testing, but an open question is how to determine the power of such a test.

Acknowledgement

The authors would like to thank Sivaraman Balakrishnan for his insightful discussions.

References

Sivaraman Balakrishnan, Brittany Terese Fasy, Fabrizio Lecci, Alessandro Rinaldo, Aarti Singh, and Larry Wasserman. Statistical inference for persistent homology, 2013.

Peter Bubenik. Statistical topology using persistence landscapes, 2012.

Frédéric Chazal, Vin de Silva, Marc Glisse, and Steve Oudot. The structure and stability of persistence modules, July 2012.

David Cohen-Steiner, Herbert Edelsbrunner, and John Harer. Stability of persistence diagrams. Discrete Comput. Geom., 37(1):103–120, 2007.

Anthony Christopher Davison and D. V. Hinkley. Bootstrap Methods and Their Application, volume 1. Cambridge UP, 1997.

Persi Diaconis, Susan Holmes, and Mehrdad Shahshahani. Sampling from a manifold, 2012.
Herbert Edelsbrunner and John Harer. Computational Topology. An Introduction. Amer. Math. Soc., Providence, RI, 2010.

Bradley Efron. Bootstrap methods: Another look at the jackknife. Ann. Statist., pages 1–26, 1979.

Bradley Efron, Robert Tibshirani, John D. Storey, and Virginia Tusher. Empirical Bayes analysis of a microarray experiment. J. Amer. Statist. Assoc., 96(456):1151–1160, 2001.

Evarist Giné and Armelle Guillou. Rates of strong uniform consistency for multivariate kernel density estimators. In Annales de l'Institut Henri Poincaré (B) Probability and Statistics, volume 38, pages 907–921. Elsevier, 2002.

Evarist Giné and Joel Zinn. Bootstrapping general empirical measures. The Annals of Probability, pages 851–869, 1990.

Michael R. Kosorok. Introduction to Empirical Processes and Semiparametric Inference. Springer, 2008.

Aad Van der Vaart. Asymptotic Statistics, volume 3. Cambridge University Press, 2000.

Aad Van der Vaart and Jon Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, 1996.
