On the Bootstrap for Persistence Diagrams and Landscapes

Persistent homology probes topological properties from point clouds and functions. By looking at multiple scales simultaneously, one can record the births and deaths of topological features as the scale varies. In this paper we use a statistical tech…

Authors: Frederic Chazal, Brittany Terese Fasy, Fabrizio Lecci

Frédéric Chazal (1), Brittany Terese Fasy (2), Fabrizio Lecci (3), Alessandro Rinaldo (3), Aarti Singh (4), and Larry Wasserman (3)

(1) INRIA Saclay
(2) Computer Science Department, Tulane University
(3) Department of Statistics, Carnegie Mellon University
(4) Machine Learning Department, Carnegie Mellon University

topstat@stat.cmu.edu

January 22, 2014

Abstract

Persistent homology probes topological properties from point clouds and functions. By looking at multiple scales simultaneously, one can record the births and deaths of topological features as the scale varies. In this paper we use a statistical technique, the empirical bootstrap, to separate topological signal from topological noise. In particular, we derive confidence sets for persistence diagrams and confidence bands for persistence landscapes.

1 Introduction

Persistent homology is a method for studying the homology at multiple scales simultaneously. Given a manifold $\mathbb{X}$ embedded in a metric space $\mathbb{Y}$, we consider a probability density function $p : \mathbb{Y} \to \mathbb{R}$, defined over $\mathbb{Y}$ but concentrated around $\mathbb{X}$; that is, the density is positive in a small neighborhood of $\mathbb{X}$ and very small over $\mathbb{Y} \setminus \mathbb{X}$. For the right scale parameter $t$, the superlevel set $p^{-1}([t, \infty))$ captures the homology of $\mathbb{X}$. The problem, however, is that $t$ is not known a priori. Persistent homology quantifies the topological changes of the superlevel sets with a multiset of points in the extended plane; we call this multiset the persistence diagram, and denote it by $\mathcal{P}$. Another way to represent the information contained in a persistence diagram is with the landscape function $L : \mathbb{R} \to \mathbb{R}$, which can be thought of as a functional summary of $\mathcal{P}$; we define these concepts in Section 1.1.

Computationally, it may be difficult to compute $\mathcal{P}$ or $L$ directly.
Instead, we assume that $p$ corresponds to a probability distribution $P$, from which we can sample. Given a sample of size $n$, we create an estimate $p_n$ of the probability density function using a kernel density estimate. As $n$ increases, $p_n$ approaches the true probability density. Given $n$ large enough, we compute the persistence diagram $\mathcal{P}_n$ and the landscape $L_n$ corresponding to $p_n$.

Sometimes knowing the estimate of a persistence diagram or landscape is not enough. The bigger question is: How close is the estimated persistence diagram or landscape to the true one? We answer this question by constructing a confidence set for persistence diagrams and a confidence band for persistence landscapes.

A $(1-\alpha)$-confidence interval for a parameter $\theta$ is an interval $[a, b]$ such that the probability $P(\theta \in [a, b])$ is at least $1 - \alpha$. In our setting, we desire to find a confidence set for a persistence diagram $\mathcal{P}$. To do so, we compute an estimated diagram $\widehat{\mathcal{P}}$ and an interval $[0, c]$ such that the bottleneck distance between $\mathcal{P}$ and $\widehat{\mathcal{P}}$ is contained in $[0, c]$ with probability at least $1 - \alpha$. That is, we find a metric ball containing $\mathcal{P}$ with high probability.

In this paper, we present the bootstrap, a method for computing confidence intervals, and we apply it to persistence diagrams and landscapes. After briefly reviewing the necessary concepts from computational topology, we give the general technique of bootstrapping in statistics in Section 1.2. In Section 2, we apply the bootstrap to persistence diagrams and landscapes, providing a few examples of these confidence intervals. We conclude in Section 2.3 with a discussion of our ongoing research and open questions.

1 Background

Before presenting our results, we review the necessary definitions and theorems from persistent homology. Then, we present the bootstrap.
Due to space constraints, we cover the basics and provide references for a more detailed description.

1.1 Persistence Diagrams and Landscapes

Let $\mathbb{Y}$ be a metric space; for example, let $\mathbb{Y}$ be a compact subspace of $\mathbb{R}^D$. Suppose we have a probability density function $p : \mathbb{Y} \to \mathbb{R}$ concentrated in a neighborhood of a manifold $\mathbb{X} \subseteq \mathbb{Y}$. Persistent homology monitors the evolution of the generators of the homology groups of $p^{-1}([t, \infty))$, the superlevel sets of $p$, and assigns to each generator of these groups a birth time (or scale) $b$ and a death time $d$. The persistence diagram $\mathcal{P}$ records each pair $(b, d)$ as the point $\left(\frac{b+d}{2}, \frac{b-d}{2}\right)$; that is, the $x$-coordinate is the mid-life of the homological feature and the $y$-coordinate is the half-life, or half of the persistence, of the feature (see footnote 1). We refer the reader to Edelsbrunner and Harer [2010] for a more complete introduction to persistent homology.

Let $\mathcal{D}_T$ be the space of positive, countable, $T$-bounded persistence diagrams; that is, for each point $(x, y) = \left(\frac{b+d}{2}, \frac{b-d}{2}\right) \in \mathcal{P}$, we have $0 \le d \le b \le T$, and there are countably many points for which $y > 0$. We note here that each point on the line $y = 0$ is included in the persistence diagram $\mathcal{P}$ with infinite multiplicity. Letting $W_\infty(\mathcal{P}_1, \mathcal{P}_2)$ denote the bottleneck distance between diagrams $\mathcal{P}_1$ and $\mathcal{P}_2$, the space $(\mathcal{D}_T, W_\infty)$ is a metric space. We then have the following stability result from Cohen-Steiner et al. [2007], generalized in Chazal et al. [2012]:

Theorem 1.1 (Stability Theorem). Let $M$ be finitely triangulable. Let $f, g : M \to \mathbb{R}$ be two continuous functions. Then, the corresponding persistence diagrams $\mathcal{P}_f$ and $\mathcal{P}_g$ are well defined, and
$$W_\infty(\mathcal{P}_f, \mathcal{P}_g) \le \|f - g\|_\infty.$$

Bubenik [2012] introduced another representation called the persistence landscape, which is in one-to-one correspondence with persistence diagrams.
A persistence landscape is a continuous, piecewise linear function $L : \mathbb{Z}^+ \times \mathbb{R} \to \mathbb{R}$. To define the persistence landscape function, we replace each persistence point $p = (x, y) = \left(\frac{b+d}{2}, \frac{b-d}{2}\right)$ with the triangle function
$$t_p(z) = \begin{cases} z - x + y & z \in [x - y, x] \\ x + y - z & z \in (x, x + y] \\ 0 & \text{otherwise} \end{cases} \;=\; \begin{cases} z - d & z \in [d, \frac{b+d}{2}] \\ b - z & z \in (\frac{b+d}{2}, b] \\ 0 & \text{otherwise.} \end{cases}$$
Notice that $p$ is itself on the graph of $t_p(z)$. We obtain an arrangement of curves by overlaying the graphs of the functions $\{t_p(z)\}_{p \in \mathcal{P}}$; see Figure 1. The persistence landscape is defined formally as a walk through this arrangement:
$$L_{\mathcal{P}}(k, z) = \operatorname*{kmax}_{p \in \mathcal{P}} t_p(z), \qquad (1)$$
where kmax is the $k$th maximum value in the set; in particular, 1max is the usual maximum function. Observe that $L_{\mathcal{P}}(k, z)$ is 1-Lipschitz. For ease of exposition, we will focus on $k = 1$ in this paper, using $L(z) = L_{\mathcal{P}}(1, z)$. However, the ideas we present in Section 2.2 hold for $k > 1$. Our definition of the persistence landscape is equivalent to the original definition given in Bubenik [2012].

Footnote 1: In this paper, we focus on the persistent homology of the superlevel set filtration of a density function. Thus, the birth time $b$ is greater than the death time $d$.

[Figure 1: The pink circles are the points in a persistence diagram. The cyan curve is the landscape $L(1, \cdot)$. Vertical axis: (birth − death)/2.]

1.2 The Standard Bootstrap

Introduced in Efron [1979], the bootstrap is a general method for estimating standard errors and for computing confidence intervals. We focus on the latter in this paper, but refer the interested reader to Efron et al. [2001], Davison and Hinkley [1997], and Van der Vaart [2000] for more details on the versatility of the bootstrap.

Let $X_1, \dots, X_n$ be independent and identically distributed random variables taking values in the measure space $(\mathbb{X}, \mathcal{X}, P)$.
Suppose we are interested in estimating the real-valued parameter $\theta$ corresponding to the distribution $P$ of the observations. We estimate $\theta$ using the statistic $\hat\theta = g(X_1, \dots, X_n)$, which is some function of the data. For example, $\theta$ and $\hat\theta$ could be the population mean and the sample mean, respectively.

The distribution of the difference $\hat\theta - \theta$ contains all the information that we need to construct a confidence interval of level $1 - \alpha$ for $\theta$; that is, an interval $[a, b]$ depending on the data $X_1, \dots, X_n$ such that
$$P(\theta \in [a, b]) \ge 1 - \alpha.$$
If we knew the cumulative distribution function $F$ of $\hat\theta - \theta$, then the quantiles $F^{-1}(1 - \alpha/2)$ and $F^{-1}(\alpha/2)$ could be computed. Furthermore, setting $a = \hat\theta - F^{-1}(1 - \alpha/2)$ and $b = \hat\theta - F^{-1}(\alpha/2)$, we would immediately obtain a $(1 - \alpha)$-confidence interval for $\theta$:
$$P(\theta \in [a, b]) = P\left(F^{-1}\left(\tfrac{\alpha}{2}\right) \le \hat\theta - \theta \le F^{-1}\left(1 - \tfrac{\alpha}{2}\right)\right) = 1 - \alpha.$$

Unfortunately, the distribution of $\hat\theta - \theta$ depends on the unknown distribution $P$. In the first step of the bootstrap procedure, we approximate the unknown $P$ with the empirical measure $P_n$ that puts mass $1/n$ at each $X_i$ in the sample. Let $X_1^*, \dots, X_n^*$ be a sample of size $n$ from $P_n$. Equivalently, we can think of drawing $X_1^*, \dots, X_n^*$ from $X_1, \dots, X_n$ with replacement. We estimate the distribution $F(r)$ with the distribution $\widehat{F}(r) = P_n(\hat\theta^* - \hat\theta \le r)$, where $\hat\theta^* = g(X_1^*, \dots, X_n^*)$.

The distribution $\widehat{F}$ is still not analytically computable, but can be approximated by simulation: for large $B$, obtain $B$ different values of $\hat\theta^*$ and approximate $\widehat{F}(r)$, and hence $F(r)$, with
$$\widetilde{F}(r) = \frac{1}{B} \sum_{j=1}^{B} I(\hat\theta_j^* - \hat\theta \le r).$$
Since the quantiles of $\widetilde{F}$ approximate the quantiles of $F$, we define the estimated confidence interval as
$$C_n = \left[\hat\theta - \widetilde{F}^{-1}(1 - \alpha/2),\; \hat\theta - \widetilde{F}^{-1}(\alpha/2)\right]. \qquad (2)$$

In summary, the standard bootstrap procedure is:

1. Compute the estimate $\hat\theta = g(X_1, \dots, X_n)$.
2. Draw $X_1^*, \dots, X_n^*$ from $P_n$ and compute $\hat\theta^* = g(X_1^*, \dots, X_n^*)$.
3. Repeat the previous step $B$ times to obtain $\hat\theta_1^*, \dots, \hat\theta_B^*$.
4. Compute the quantiles of $\widetilde{F}$ and construct the confidence interval $C_n$.

There are two sources of error in the bootstrap procedure. We first approximate $F$ with $\widehat{F}$ and then we estimate $\widehat{F}$ by simulation. The second error can be made arbitrarily small by choosing $B$ large enough. Therefore, this error is usually ignored in the theory of the bootstrap. Formally, one has to show that
$$\sup_r \left|\widetilde{F}(r) - F(r)\right| \xrightarrow{P} 0,$$
which implies that the confidence interval $C_n$, defined in (2), is asymptotically consistent at level $1 - \alpha$; that is, $\liminf_{n \to \infty} P(\theta \in C_n) \ge 1 - \alpha$.

1.3 The Bootstrap Empirical Process

When a random variable is a function rather than a real value, the bootstrap procedure described above can only be used to construct a confidence interval for the function evaluated at a particular element of the domain. Instead, we use the bootstrap empirical process, which can be used to find a confidence band for a function $h(t)$; that is, we find a pair of functions $a(t)$ and $b(t)$ such that the probability that $h(t) \in [a(t), b(t)]$ for all $t$ is at least $1 - \alpha$. We describe this technique below, but refer the reader to Van der Vaart and Wellner [1996] and Kosorok [2008] for more details.

An empirical process is a stochastic process based on a random sample. Let $X_1, \dots, X_n$ be independent and identically distributed random variables taking values in the measure space $(\mathbb{X}, \mathcal{X}, P)$. For a measurable function $f : \mathbb{X} \to \mathbb{R}$, we denote $Pf = \int f \, dP$ and $P_n f = \int f \, dP_n = n^{-1} \sum_{i=1}^{n} f(X_i)$. By the law of large numbers, $P_n f$ converges almost surely to $Pf$.
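
Before turning to function classes, the four-step procedure of Section 1.2 can be made concrete with a short Python sketch that computes the interval $C_n$ of (2) for a real-valued statistic. It is an illustration under stated assumptions rather than code from the paper: the function name `bootstrap_ci`, its parameters, and the default $B = 2000$ are hypothetical choices, and NumPy's interpolated `np.quantile` stands in for the exact inverse of $\widetilde{F}$.

```python
import numpy as np

def bootstrap_ci(x, g=np.mean, alpha=0.05, B=2000, seed=0):
    """Standard bootstrap interval [a, b] for theta, estimated by g(x).

    Hypothetical helper: the name and defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)

    # Step 1: estimate on the original sample.
    theta_hat = g(x)

    # Steps 2-3: resample with replacement B times (i.e. draw from P_n)
    # and record theta*_j - theta_hat.
    diffs = np.empty(B)
    for j in range(B):
        idx = rng.integers(0, n, size=n)
        diffs[j] = g(x[idx]) - theta_hat

    # Step 4: quantiles of the simulated distribution (interpolated,
    # an approximation of the inverse of F-tilde).
    q_lo, q_hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return theta_hat - q_hi, theta_hat - q_lo
```

For a one-dimensional array `x`, `bootstrap_ci(x)` returns the pair $(a, b)$ for the sample mean; passing `g=np.median` gives the analogous interval for the median.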
Given a class $\mathcal{F}$ of measurable functions, we define the empirical process $\mathbb{G}_n$ indexed by $\mathcal{F}$ as
$$\{\mathbb{G}_n f\}_{f \in \mathcal{F}} = \left\{\sqrt{n}\,(P_n f - Pf)\right\}_{f \in \mathcal{F}}.$$

Example 1.2. If $\mathcal{F} = \{I(x \le t)\}_{t \in \mathbb{R}}$, then $\{P_n f\}_{f \in \mathcal{F}} = \{n^{-1} \sum_{i=1}^{n} I(X_i \le t)\}_{t \in \mathbb{R}}$, which is the empirical distribution function seen as a stochastic process indexed by $t$. Furthermore, we have $\{\mathbb{G}_n f\}_{f \in \mathcal{F}} = \{n^{-1/2} \sum_{i=1}^{n} \left(I(X_i \le t) - P(X_i \le t)\right)\}_{t \in \mathbb{R}}$.

Definition 1.3. A class $\mathcal{F}$ of measurable functions $f : \mathbb{X} \to \mathbb{R}$ is called $P$-Donsker if the process $\{\mathbb{G}_n f\}_{f \in \mathcal{F}}$ converges in distribution to a limit process in the space $\ell^\infty(\mathcal{F})$, where $\ell^\infty(\mathcal{F})$ is the collection of all bounded functions from $\mathcal{F}$ to $\mathbb{R}$. The limit process is a Gaussian process $\mathbb{G}$ with zero mean and covariance function $E[\mathbb{G}f \, \mathbb{G}g] := Pfg - Pf \, Pg$; this process is known as a Brownian bridge.

Let $P_n^* f = n^{-1} \sum_{i=1}^{n} f(X_i^*)$, where $\{X_1^*, \dots, X_n^*\}$ is a bootstrap sample from $P_n$, the measure that puts mass $1/n$ on each element of the sample $\{X_1, \dots, X_n\}$. The bootstrap empirical process $\mathbb{G}_n^*$ indexed by $\mathcal{F}$ is defined as
$$\{\mathbb{G}_n^* f\}_{f \in \mathcal{F}} = \left\{\sqrt{n}\,(P_n^* f - P_n f)\right\}_{f \in \mathcal{F}}.$$

Theorem 1.4 (Theorem 2.4 in Giné and Zinn [1990]). $\mathcal{F}$ is $P$-Donsker if and only if $\mathbb{G}_n^*$ converges in distribution to $\mathbb{G}$ in $\ell^\infty(\mathcal{F})$.

In words, Theorem 1.4 states that $\mathcal{F}$ is $P$-Donsker if and only if the bootstrap empirical process converges in distribution to the limit process $\mathbb{G}$ given in Definition 1.3.

Suppose we are interested in constructing a confidence band of level $1 - \alpha$ for $\{Pf\}_{f \in \mathcal{F}}$, where $\mathcal{F}$ is $P$-Donsker. Let $\hat\theta = \sup_{f \in \mathcal{F}} |\mathbb{G}_n f|$. We proceed as follows:

1. Draw $X_1^*, \dots, X_n^*$ from $P_n$ and compute $\hat\theta^* = \sup_{f \in \mathcal{F}} |\mathbb{G}_n^* f|$.
2. Repeat the previous step $B$ times to obtain $\hat\theta_1^*, \dots, \hat\theta_B^*$.
3. Compute $q_\alpha = \inf\left\{q : \frac{1}{B} \sum_{j=1}^{B} I(\hat\theta_j^* \ge q) \le \alpha\right\}$.
4. For $f \in \mathcal{F}$ define the confidence band $C_n(f) = \left[P_n f - \frac{q_\alpha}{\sqrt{n}},\; P_n f + \frac{q_\alpha}{\sqrt{n}}\right]$.

A consequence of Theorem 1.4 is that, for large $n$ and $B$, the interval $[0, q_\alpha]$ has coverage $1 - \alpha$ for $\hat\theta$ and the band $\{C_n(f)\}_{f \in \mathcal{F}}$ has coverage $1 - \alpha$ for $\{Pf\}_{f \in \mathcal{F}}$.

2 Applications of the Bootstrap

In this section, we apply the bootstrap from the previous section to persistence diagrams, as well as to persistence landscapes.

2.1 Persistence Diagrams

Let $X_1, \dots, X_n$ be a sample from the distribution $P$, supported on a smooth manifold $\mathbb{X} \subset \mathbb{R}^D$. Let
$$p_h(x) = \int_{\mathbb{X}} \frac{1}{h^D} K\left(\frac{\|x - u\|}{h}\right) dP(u),$$
where $K : \mathbb{R} \to \mathbb{R}$ is an integrable function satisfying $\int K(u) \, du = 1$ and $K(u)$ is nonnegative for all $u$; thus $p_h$ is a probability density. The function $K$ is called a kernel and the parameter $h > 0$ is its bandwidth. Then $p_h$ is the density of the probability measure $P_h$, which is the convolution $P_h = P \star \mathbb{K}_h$, where $\mathbb{K}_h(A) = h^{-D} \mathbb{K}(h^{-1} A)$ and $\mathbb{K}(A) = \int_A K(t) \, dt$. $P_h$ is a smoothed version of $P$.

Our target of inference in this section is $\mathcal{P}_h$, the persistence diagram of the superlevel sets of $p_h$. The standard estimator for $p_h$ is the kernel density estimator
$$\hat{p}_h(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h^D} K\left(\frac{\|x - X_i\|}{h}\right);$$
notice that if the $X_i$ are fixed, then $\hat{p}_h$ is a probability density. Let $\widehat{\mathcal{P}}_h$ be the corresponding persistence diagram. We wish to find a confidence set for $\mathcal{P}_h$, i.e., an interval $[0, c_n]$ such that
$$\limsup_{n \to \infty} P\left(W_\infty(\widehat{\mathcal{P}}_h, \mathcal{P}_h) \in [0, c_n]\right) \ge 1 - \alpha.$$
From Theorem 1.1 (Stability), it suffices to find $c_n$ such that
$$\limsup_{n \to \infty} P\left(\|\hat{p}_h - p_h\|_\infty > c_n\right) \le \alpha.$$

To find $c_n$, we use the bootstrap. Let $\mathcal{F} = \left\{f_x(u) = \frac{1}{h^D} K\left(\frac{\|x - u\|}{h}\right)\right\}_{x \in \mathbb{X}}$. Using the notation of Section 1.3, it follows that $P f_x = p_h(x)$, $P_n f_x = \hat{p}_h(x)$ and $\hat\theta = \sup_{f_x \in \mathcal{F}} |\mathbb{G}_n f_x| = \sqrt{n}\,\|\hat{p}_h - p_h\|_\infty$.
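
Since $p_h$ is unknown, $\hat\theta$ itself cannot be evaluated; following Section 1.3, the bootstrap replaces it with the analogous quantity computed between the estimator on a resample and $\hat{p}_h$. A minimal Python sketch of this step for one-dimensional data ($D = 1$) with a Gaussian kernel is given below. It rests on assumptions that are not in the paper: the names `kde` and `kde_bootstrap_radius` are hypothetical, the supremum over $x$ is approximated by a maximum over a finite grid, and `np.quantile` (with interpolation) stands in for the infimum that defines $q_\alpha$ in Section 1.3.

```python
import numpy as np

def kde(sample, grid, h):
    """Gaussian kernel density estimate in one dimension, evaluated on grid."""
    u = (grid[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(sample) * h * np.sqrt(2 * np.pi))

def kde_bootstrap_radius(sample, grid, h, alpha=0.05, B=500, seed=0):
    """Return q_alpha / sqrt(n), a bootstrap bound on the sup-norm error of the KDE.

    Hypothetical helper; the sup norm is approximated on the finite grid.
    """
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = len(sample)
    p_hat = kde(sample, grid, h)

    theta_star = np.empty(B)
    for j in range(B):
        # Draw a bootstrap sample from P_n and recompute the estimator.
        star = sample[rng.integers(0, n, size=n)]
        theta_star[j] = np.sqrt(n) * np.abs(kde(star, grid, h) - p_hat).max()

    # Interpolated (1 - alpha) quantile, an approximation of q_alpha.
    q_alpha = np.quantile(theta_star, 1 - alpha)
    return q_alpha / np.sqrt(n)
```

By Theorem 1.1, the returned value can be read as a radius in bottleneck distance around the estimated diagram, subject to the grid approximation above.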
The approximate $1 - \alpha$ quantile $q_\alpha$ can be obtained through simulation, i.e.,
$$q_\alpha = \inf\left\{q : \frac{1}{B} \sum_{j=1}^{B} I\left(\sqrt{n}\,\|\hat{p}_h^{\,j} - \hat{p}_h\|_\infty \ge q\right) \le \alpha\right\},$$
where $\hat{p}_h^{\,j}$ denotes the kernel density estimator corresponding to the $j$th bootstrap sample. The following result holds under suitable regularity conditions on the kernel $K$ for which $\mathcal{F}$ is Donsker; see Giné and Guillou [2002].

Theorem 2.1 (Lemma 15 in Balakrishnan et al. [2013]). We have that
$$\limsup_{n \to \infty} P\left(\sqrt{n}\,\|\hat{p}_h - p_h\|_\infty > q_\alpha\right) \le \alpha.$$

By the Stability Theorem, we conclude:
$$\limsup_{n \to \infty} P\left(W_\infty(\widehat{\mathcal{P}}_h, \mathcal{P}_h) > \frac{q_\alpha}{\sqrt{n}}\right) \le \alpha.$$

Example 2.2 (Torus). We embed the torus $S^1 \times S^1$ in $\mathbb{R}^3$ and we use the rejection sampling algorithm of Diaconis et al. [2012] ($R = 1.5$, $r = 0.8$) to sample 10,000 points uniformly from the torus. Then, we compute the persistence diagram $\widehat{\mathcal{P}}_h$ using the Gaussian kernel with bandwidth $h = 0.25$ and use the bootstrap to construct the 95% confidence interval $[0, 0.01]$ for $W_\infty(\widehat{\mathcal{P}}_h, \mathcal{P}_h)$; see Figure 2. Notice that the confidence set correctly captures the topology of the torus. That is, only the points representing real features of the torus are significantly far from the horizontal axis.

[Figure 2: Left: Persistence diagram of the superlevel sets of a kernel density estimator on the 3D torus described in Example 2.2. The boxes of side $2 \times 0.01$ around the points represent the 95% confidence set for $\mathcal{P}_h$. Middle: 2D projection of the superlevel set $\{x : \hat{p}_h(x) > 0.034\}$. Right: 2D projection of the superlevel set $\{x : \hat{p}_h(x) > 0.027\}$.]

2.2 Landscapes

Let the diagrams $\mathcal{P}_1, \dots, \mathcal{P}_n$ be a sample from the distribution $P$ over the space of persistence diagrams $\mathcal{D}_T$. Thus, by definition, we have $x + y \le T < \infty$ and $0 \le y \le T/2$ for all $(x, y) \in \bigcup_i \mathcal{P}_i$. Let $L_1, \dots, L_n$ be the landscape functions corresponding to $\mathcal{P}_1, \dots, \mathcal{P}_n$.
That is, $L_i(t) = L_{\mathcal{P}_i}(1, t)$, as defined in (1). We define the mean landscape
$$\mu(t) = E_P[L_i(t)],$$
and the empirical mean landscape
$$\overline{L}_n(t) = \frac{1}{n} \sum_{i=1}^{n} L_i(t).$$
In this section, we show that the process $\sqrt{n}\,(\overline{L}_n(t) - \mu(t))$ converges to a Gaussian process, so that we may use the procedure given in Section 1.3.

Let $\mathcal{F} = \{f_t : 0 \le t \le T\}$, where $f_t : \mathcal{D}_T \to \mathbb{R}$ is defined by $f_t(\mathcal{P}) = L_{\mathcal{P}}(1, t)$. We note here that $f_t(\mathcal{P}) = 0$ if $t \notin (0, T)$. We can now write $\sqrt{n}\,(\overline{L}_n(t) - \mu(t))$ as an empirical process indexed by $t \in [0, T]$:
$$\sqrt{n}\,(\overline{L}_n(t) - \mu(t)) = \sqrt{n}\left(\frac{1}{n} \sum_{i=1}^{n} L_i(t) - \mu(t)\right) = \sqrt{n}\,(P_n f_t - P f_t) \equiv \mathbb{G}_n f_t.$$
We note that the constant function $F(\mathcal{P}) = T/2$ is a measurable envelope for $\mathcal{F}$. Given a probability measure $Q$ on $\mathcal{D}_T$, let $\|f - g\|_{Q,2} = \sqrt{\int |f - g|^2 \, dQ}$ and let $N(\mathcal{F}, L_2(Q), \varepsilon)$ be the covering number of $\mathcal{F}$, that is, the size of the smallest $\varepsilon$-net in this metric.

Lemma 2.3 (Theorem 2.5 in Kosorok [2008]). Let $\mathcal{F}$ be a class of measurable functions satisfying
$$\int_0^1 \sqrt{\log \sup_Q N(\mathcal{F}, L_2(Q), \varepsilon \|F\|_{Q,2})} \, d\varepsilon < \infty,$$
where $F$ is a measurable envelope of $\mathcal{F}$ and the supremum is taken over all finitely discrete probability measures $Q$ with $\|F\|_{Q,2} > 0$. If $P F^2 < \infty$, then $\mathcal{F}$ is $P$-Donsker.

Theorem 2.4 (Weak Convergence of Landscapes). Let $\mathbb{G}$ be a Brownian bridge with covariance function
$$\kappa(t, u) = \int f_t(\mathcal{P}) f_u(\mathcal{P}) \, dP(\mathcal{P}) - \int f_t(\mathcal{P}) \, dP(\mathcal{P}) \int f_u(\mathcal{P}) \, dP(\mathcal{P}).$$
Then, $\mathbb{G}_n$ converges in distribution to $\mathbb{G}$.

Proof. Since persistence landscapes are 1-Lipschitz, we have $\|f_t - f_u\|_{Q,2} \le |t - u|$. Construct a regular grid $0 \equiv t_0 < t_1 < \cdots < t_N \equiv T$, where $t_{j+1} - t_j = \varepsilon \|F\|_{Q,2} = \varepsilon T/2$. We claim that $\{f_{t_j} : 1 \le j \le N\}$ is an $(\varepsilon T/2)$-net for $\mathcal{F}$: choose $f_t \in \mathcal{F}$; then there is a $j$ so that $t_j \le t \le t_{j+1}$ and
$$\|f_{t_{j+1}} - f_t\|_{Q,2} \le |t_{j+1} - t| \le |t_{j+1} - t_j| = \varepsilon T/2.$$
The fact that $\{f_{t_j} : 1 \le j \le N\}$ is an $(\varepsilon T/2)$-net implies
$$\sup_Q N(\mathcal{F}, L_2(Q), \varepsilon \|F\|_{Q,2}) \le 2/\varepsilon.$$
Hence, $\int_0^1 \sqrt{\log \sup_Q N(\mathcal{F}, L_2(Q), \varepsilon \|F\|_{Q,2})} \, d\varepsilon < \infty$. The envelope $F = T/2$ is trivially square-integrable. By Lemma 2.3, $\mathcal{F}$ is $P$-Donsker, and hence $\mathbb{G}_n$ converges in distribution to $\mathbb{G}$.

Now that we have shown that $\mathbb{G}_n$ converges to a Gaussian process, we can follow the procedure outlined in Section 1.3. Let $P_n$ be the empirical measure that puts mass $1/n$ at each diagram $\mathcal{P}_i$. We draw $\mathcal{P}_1^*, \dots, \mathcal{P}_n^*$ from $P_n$ and construct the corresponding landscapes $L_1^*, \dots, L_n^*$. Let $\overline{L}_n^*$ be the empirical mean and $\hat\theta^* = \sup_{t \in \mathbb{R}} \left|\sqrt{n}\,(\overline{L}_n^*(t) - \overline{L}_n(t))\right|$. Repeating this $B$ times, we obtain $\hat\theta_1^*, \dots, \hat\theta_B^*$, and we compute the quantile $q_\alpha$.

Theorem 2.5 (Confidence Band for Persistence Landscapes). The interval $C_n(t)$ indexed by $t \in \mathbb{R}$, defined by
$$C_n(t) = \left[\overline{L}_n(t) - \frac{q_\alpha}{\sqrt{n}},\; \overline{L}_n(t) + \frac{q_\alpha}{\sqrt{n}}\right],$$
is a confidence band for $\mu(t)$:
$$\lim_{n \to \infty} P\left(\mu(t) \in C_n(t) \text{ for all } t\right) \ge 1 - \alpha.$$

Example 2.6 (Circles). Given the nine circles of radii 0.4 and 0.3, shown in Figure 3, we obtain a sample $X_1, \dots, X_{100}$ as follows: first, choose a circle $C_i$ uniformly at random, then sample a point from $C_i$ (the draws are iid). Let $\mathcal{P}$ be the (Betti 1) persistence diagram corresponding to the Rips filtration for the sample, and $L$ be the landscape corresponding to $\mathcal{P}$ (see footnote 2). We repeat this 50 times to obtain diagrams $\mathcal{P}_1, \dots, \mathcal{P}_{50}$ and landscapes $L_1, \dots, L_{50}$. Then, we use the bootstrap procedure to obtain the quantile $q_\alpha = 0.234$. Together with $\overline{L}_{50}$, this gives us an approximate 95% confidence band for $\mu(t) = E_P(L_i(t))$. On the right of Figure 3 we show the empirical mean landscape $\overline{L}_{50}$ with the 95% confidence band for $\mu(t)$.
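
The procedure of this section, from a sample of diagrams to the band of Theorem 2.5, can be sketched in Python as follows. The sketch rests on assumptions that are not in the paper: each diagram is an array of points $(x, y)$ in the mid-life/half-life coordinates of Section 1.1, landscapes are evaluated on a finite grid (so the supremum over $t$ becomes a maximum over grid points), only $k = 1$ is handled, and the names `landscape` and `landscape_band` are hypothetical. `np.quantile` is used as an interpolated stand-in for $q_\alpha$.

```python
import numpy as np

def landscape(diagram, grid):
    """First landscape L(1, t) on a grid.

    diagram: array of rows (x, y) = (mid-life, half-life); assumed input format.
    Each point contributes the triangle t_p(z) = max(0, y - |z - x|).
    """
    grid = np.asarray(grid, dtype=float)
    d = np.asarray(diagram, dtype=float).reshape(-1, 2)
    if d.shape[0] == 0:
        return np.zeros_like(grid)
    tents = np.maximum(0.0, d[:, 1:2] - np.abs(grid[None, :] - d[:, 0:1]))
    return tents.max(axis=0)  # 1max over the points of the diagram

def landscape_band(diagrams, grid, alpha=0.05, B=1000, seed=0):
    """Empirical mean landscape and a bootstrap confidence band on the grid."""
    rng = np.random.default_rng(seed)
    L = np.stack([landscape(d, grid) for d in diagrams])  # shape (n, len(grid))
    n = L.shape[0]
    mean = L.mean(axis=0)

    theta_star = np.empty(B)
    for j in range(B):
        # Resample diagrams with replacement, i.e. draw from P_n.
        idx = rng.integers(0, n, size=n)
        theta_star[j] = np.sqrt(n) * np.abs(L[idx].mean(axis=0) - mean).max()

    # Interpolated (1 - alpha) quantile, an approximation of q_alpha.
    q_alpha = np.quantile(theta_star, 1 - alpha)
    half_width = q_alpha / np.sqrt(n)
    return mean, mean - half_width, mean + half_width, q_alpha
```

The band has constant half-width $q_\alpha / \sqrt{n}$ around the empirical mean landscape, as in Theorem 2.5; the diagrams themselves would come from a persistent homology computation that is outside the scope of this sketch.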

Footnote 2: Note that, since in this example we are using sublevel sets, the roles of birth and death in the definitions of Section 1.1 are inverted: the death time $d$ is greater than the birth time $b$.

[Figure 3: Left: The set of circles from which samples are taken. Right: The confidence band for the persistence landscape corresponding to the distance to the point set. Plot labels: Space; Mean Landscape with Bootstrapped 95% band; Landscape.]

2.3 Discussion

In this paper, we have described the bootstrap as it applies to persistence diagrams and landscapes. The purpose of this paper was to introduce the bootstrap and the bootstrap empirical process to topologists. In a related paper (Balakrishnan et al.
[2013]), aimed towards a statistical audience, we derive the convergence rates for the technique presented in Section 2.1, as well as present three other methods for computing confidence sets for persistence diagrams.

The persistence landscape can be thought of as a summary function of a persistence diagram. The bootstrap method that we presented in Section 2.2 trivially generalizes to handle all landscapes $L(k, t)$. Furthermore, we need not limit the scope of this method to landscape functions. In a future paper, we plan to investigate other meaningful summary functions as well as the convergence rates for the techniques presented in Section 2.2.

We have demonstrated how the bootstrap works for two examples, given in Figure 2 and Figure 3. Part of our ongoing research is investigating applications for these confidence intervals; in particular, we are applying it to real (rather than simulated) data sets. One can use the confidence intervals for hypothesis testing, but an open question is how to determine the power of such a test.

Acknowledgement

The authors would like to thank Sivaraman Balakrishnan for his insightful discussions.

References

Sivaraman Balakrishnan, Brittany Terese Fasy, Fabrizio Lecci, Alessandro Rinaldo, Aarti Singh, and Larry Wasserman. Statistical inference for persistent homology, 2013.

Peter Bubenik. Statistical topology using persistence landscapes, 2012.

Frédéric Chazal, Vin de Silva, Marc Glisse, and Steve Oudot. The structure and stability of persistence modules, July 2012.

David Cohen-Steiner, Herbert Edelsbrunner, and John Harer. Stability of persistence diagrams. Discrete Comput. Geom., 37(1):103–120, 2007.

Anthony Christopher Davison and D. V. Hinkley. Bootstrap Methods and Their Application, volume 1. Cambridge UP, 1997.

Persi Diaconis, Susan Holmes, and Mehrdad Shahshahani. Sampling from a manifold, 2012.

Herbert Edelsbrunner and John Harer. Computational Topology. An Introduction. Amer. Math. Soc., Providence, RI, 2010.

Bradley Efron. Bootstrap methods: Another look at the jackknife. Ann. Statist., pages 1–26, 1979.

Bradley Efron, Robert Tibshirani, John D. Storey, and Virginia Tusher. Empirical Bayes analysis of a microarray experiment. J. Amer. Statist. Assoc., 96(456):1151–1160, 2001.

Evarist Giné and Armelle Guillou. Rates of strong uniform consistency for multivariate kernel density estimators. In Annales de l'Institut Henri Poincaré (B) Probability and Statistics, volume 38, pages 907–921. Elsevier, 2002.

Evarist Giné and Joel Zinn. Bootstrapping general empirical measures. The Annals of Probability, pages 851–869, 1990.

Michael R. Kosorok. Introduction to Empirical Processes and Semiparametric Inference. Springer, 2008.

Aad Van der Vaart. Asymptotic Statistics, volume 3. Cambridge University Press, 2000.

Aad Van der Vaart and Jon Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, 1996.
