Statistical topological data analysis using persistence landscapes

Peter Bubenik

Abstract. We define a new topological summary for data that we call the persistence landscape. Since this summary lies in a vector space, it is easy to combine with tools from statistics and machine learning, in contrast to the standard topological summaries. Viewed as a random variable with values in a Banach space, this summary obeys a strong law of large numbers and a central limit theorem. We show how a number of standard statistical tests can be used for statistical inference using this summary. We also prove that this summary is stable and that it can be used to provide lower bounds for the bottleneck and Wasserstein distances.

1. Introduction

Topological data analysis (TDA) consists of a growing set of methods that provide insight into the “shape” of data (see the surveys Ghrist, 2008; Carlsson, 2009). These tools may be of particular use in understanding global features of high dimensional data that are not readily accessible using other techniques. The use of TDA has been limited by the difficulty of combining the main tool of the subject, the barcode or persistence diagram, with statistics and machine learning. Here we present an alternative approach, using a new summary that we call the persistence landscape. The main technical advantage of this descriptor is that it is a function, and so we can use the vector space structure of its underlying function space. In fact, this function space is a separable Banach space and we apply the theory of random variables with values in such spaces. Furthermore, since the persistence landscapes are sequences of piecewise-linear functions, calculations with them are much faster than the corresponding calculations with barcodes or persistence diagrams, removing a second serious obstruction to the wider use of topological methods in data analysis.
Notable successes of TDA include the discovery of a subgroup of breast cancers by Nicolau et al. (2011), an understanding of the topology of the space of natural images by Carlsson et al. (2008) and the topology of orthodontic data by Heo et al. (2012), and the detection of genes with a periodic profile by Dequéant et al. (2008). De Silva and Ghrist (2007b,a) used topology to prove coverage in sensor networks.

In the standard paradigm for TDA, one starts with data that one encodes as a finite set of points in $\mathbb{R}^n$ or more generally in some metric space. One then applies some geometric construction, followed by tools from algebraic topology. The end result is a topological summary of the data. The standard topological descriptors are the barcode and the persistence diagram (Edelsbrunner et al., 2002; Zomorodian and Carlsson, 2005; Cohen-Steiner et al., 2007), which give a multiscale representation of the homology (Hatcher, 2002) of the geometric construction. Roughly, homology in degree 0 describes the connectedness of the data; homology in degree 1 detects holes or tunnels; homology in degree 2 captures voids; and so on. Of particular interest are the homological features that persist as the resolution changes. We will give precise definitions and an illustrative example of this method, called persistent homology or topological persistence, in Section 2.

Now let us take a statistical view of this paradigm. We consider the data to be sampled from some underlying abstract probability space. Composing the constructions above, we consider our topological summary to be a random variable with values in some summary space $\mathcal{S}$. In detail, the probability space $(\Omega, \mathcal{F}, P)$ consists of a sample space $\Omega$, a $\sigma$-algebra $\mathcal{F}$ of events, and a probability measure $P$.
Composing our constructions gives a function $X : (\Omega, \mathcal{F}, P) \to (\mathcal{S}, \mathcal{A}, P_*)$, where $\mathcal{S}$ is the summary space, which we assume has some metric, $\mathcal{A}$ is the corresponding Borel $\sigma$-algebra, and $P_*$ is the probability measure on $\mathcal{S}$ obtained by pushing forward $P$ along $X$. We assume that $X$ is measurable and thus $X$ is a random variable with values in $\mathcal{S}$.

Here is a list of what we would like to be able to do with our topological summary. Let $X_1, \ldots, X_n$ be a sample of independent random variables with the same distribution as $X$. We would like to have a good notion of the mean $\mu$ of $X$ and the mean $\overline{X}_n$ of the sample; know that $\overline{X}_n$ converges to $\mu$; and be able to calculate $\overline{X}_n(\omega)$, for $\omega \in \Omega$, efficiently. We would like to have information about the difference $\overline{X}_n - \mu$, and be able to calculate approximate confidence intervals related to $\mu$. Given two such samples for random variables $X$ and $Y$ with values in our summary space, we would like to be able to test the hypothesis that $\mu_X = \mu_Y$. In order to answer these questions we also need an efficient algorithm for calculating distances between elements of our summary space.

In this article, we construct a topological summary that we call the persistence landscape which meets these requirements. Our basic idea is to convert the barcode into a function in a somewhat additive manner. There are many possible variations of this construction that may result in more suitable summary statistics for certain applications. Hopefully, the theory presented here will also be helpful in those situations.

We remark that while the persistence landscape has a corresponding barcode and persistence diagram, the mean persistence landscape does not. This is analogous to the situation in which an integer-valued random variable having a Poisson distribution has a summary statistic, the rate parameter, that is not an integer.
We also remark that the reader may restrict our Banach space results to the perhaps more familiar Hilbert space setting. However, we will need this generality to prove stability of the persistence landscape for, say, functions on the $n$-dimensional sphere where $n > 2$.

There has been progress towards combining the persistence diagram and statistics (Mileyko et al., 2011; Turner et al., 2014; Munch et al., 2013; Chazal et al., 2013; Fasy et al., 2014). Blumberg et al. (2014) give a related statistical approach to TDA. Kovacev-Nikolic et al. (2014) use the persistence landscape defined here to study the maltose binding complex and Chazal et al. (2014) apply the bootstrap to the persistence landscape. The persistence landscape is related to the well group defined by Edelsbrunner et al. (2011).

In Section 2 we provide the necessary background, define the persistence landscape, and give some of its properties. In Section 3 we introduce the statistical theory of persistence landscapes, which we apply to a few examples in Section 4. In Section 5 we prove that the persistence landscape is stable and that it provides lower bounds for the previously defined bottleneck and Wasserstein distances.

2. Topological summaries

The two standard topological summaries of data are the barcode and the persistence diagram. We will define a new closely-related summary, the persistence landscape, and then compare it to these two previous summaries. All of these summaries are derived from the persistence module, which we now define.

2.1. Persistence Modules.

The main algebraic object of study in topological data analysis is the persistence module. A persistence module $M$ consists of a vector space $M_a$ for all $a \in \mathbb{R}$ and linear maps $M(a \le b) : M_a \to M_b$ for all $a \le b$ such that $M(a \le a)$ is the identity map and for all $a \le b \le c$, $M(b \le c) \circ M(a \le b) = M(a \le c)$.
There are many ways of constructing a persistence module. One example starts with a set of points $X = \{x_1, \ldots, x_n\}$ in the plane $M = \mathbb{R}^2$ as shown in the top left of Figure 1. To help understand this configuration, we “thicken” each point $x$ by replacing it with $B_x(r) = \{y \in M \mid d(x, y) \le r\}$, a disk of fixed radius $r$ centered at $x$. The resulting union, $X_r = \bigcup_{i=1}^n B_{x_i}(r)$, is shown in Figure 1 for various values of $r$. For each $r$, we can calculate $H(X_r)$, the homology of the resulting union of disks. To be precise, $H(-)$ denotes $H_k(-, \mathbb{F})$, the singular homology functor in degree $k$ with coefficients in a field $\mathbb{F}$. So $H(X_r)$ is a vector space that is the quotient of the $k$-cycles modulo those that are boundaries. As $r$ increases, the union of disks grows, and the resulting inclusions induce maps between the corresponding homology groups. More precisely, if $r \le s$, the inclusion $\iota_r^s : X_r \hookrightarrow X_s$ induces a map $H(\iota_r^s) : H(X_r) \to H(X_s)$. The images of these maps are the persistent homology groups. The collection of vector spaces $H(X_r)$ and linear maps $H(\iota_r^s)$ is a persistence module. Note that this construction works for any set of points in $\mathbb{R}^n$ or more generally in a metric space.

The union of balls $X_r$ has a nice combinatorial description. The Čech complex, $\check{C}_r(X)$, of the set of balls $\{B_{x_i}(r)\}$ is the simplicial complex whose vertices are the points $\{x_i\}$ and whose $k$-simplices correspond to $k+1$ balls with nonempty intersection (see Figure 1). This is also called the nerve. It is a basic result that if the ambient space is $\mathbb{R}^n$, then $X_r$ is homotopy equivalent to its Čech complex (Borsuk, 1948). So to obtain the singular homology of the union of balls, one can calculate the simplicial homology of the corresponding Čech complex.
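The Čech construction for points in the plane can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: it only builds simplices up to dimension 2, the function names `min_enclosing_radius` and `cech_complex` are hypothetical, and it relies on the elementary fact that closed disks of radius $r$ have a common point exactly when their centers fit inside some disk of radius $r$.

```python
import math
from itertools import combinations


def min_enclosing_radius(p, q, s):
    """Radius of the smallest circle containing three planar points."""
    a, b, c = math.dist(q, s), math.dist(p, s), math.dist(p, q)
    longest = max(a, b, c)
    # Right, obtuse, or degenerate triangle: the circle whose diameter is the
    # longest side already contains the third point.
    if 2 * longest ** 2 >= a ** 2 + b ** 2 + c ** 2:
        return longest / 2
    # Acute triangle: circumradius R = abc / (4 * area).
    area = abs((q[0] - p[0]) * (s[1] - p[1]) - (s[0] - p[0]) * (q[1] - p[1])) / 2
    return a * b * c / (4 * area)


def cech_complex(points, r):
    """Vertices, edges and triangles of the Cech complex of closed r-disks."""
    n = len(points)
    vertices = [(i,) for i in range(n)]
    # Two closed disks of radius r meet iff their centers are within 2r.
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if math.dist(points[i], points[j]) <= 2 * r]
    # Three disks share a point iff the centers fit in a disk of radius r.
    triangles = [(i, j, k) for i, j, k in combinations(range(n), 3)
                 if min_enclosing_radius(points[i], points[j], points[k]) <= r]
    return vertices, edges, triangles


# Illustrative data: an equilateral triangle with side length 1.
points = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
small = cech_complex(points, 0.55)   # three edges, but no filled triangle
large = cech_complex(points, 0.60)   # the triangle is now filled in
```

For the equilateral triangle, the three disks meet pairwise once $r \ge 1/2$ but share a common point only once $r \ge 1/\sqrt{3}$, so for radii in between the complex has a hole in degree one.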
The Čech complexes $\{\check{C}_r(X)\}$ together with the inclusions $\check{C}_r(X) \subseteq \check{C}_s(X)$ for $r \le s$ form a filtered simplicial complex. Applying simplicial homology we obtain a persistence module.

Figure 1. A growing union of balls and the 1-skeleton of the corresponding Čech complex. As the radius grows, features, such as connected components and holes, appear and disappear. Here, the complexes illustrate the births and deaths of three holes, homology classes in degree one. The corresponding birth-death pairs are plotted as part of the top left of Figure 2.

There exist efficient algorithms for calculating the persistent homology of filtered simplicial complexes (Edelsbrunner et al., 2002; Milosavljević et al., 2011; Chen and Kerber, 2013). The Čech complex is often computationally expensive, so many variants have been used in computational topology. A larger, but simpler complex called the Rips complex has as vertices the points $x_i$ and has $k$-simplices corresponding to $k+1$ balls with all pairwise intersections nonempty. Other possibilities include the witness complexes of de Silva and Carlsson (2004), graph induced complexes by Dey et al. (2013) and complexes built using kernel density estimators and triangulations of the ambient space (Bubenik et al., 2010). Some of these are used in the examples in Section 4.

Given any real-valued function $f : S \to \mathbb{R}$ on a topological space $S$, we can define the associated persistence module, $M(f)$, where $M(f)(a) = H(f^{-1}((-\infty, a]))$ and $M(f)(a \le b)$ is induced by inclusion. Taking $f$ to be the minimum distance to a finite set of points, $X$, we obtain the first example.

2.2. Persistence Landscapes.

In this section we define a number of functions derived from a persistence module. Examples of each of these are given in Figure 2. Let $M$ be a persistence module.
For $a \le b$, the corresponding Betti number of $M$ is given by the dimension of the image of the corresponding linear map. That is,

$$\beta^{a,b} = \dim(\operatorname{im}(M(a \le b))). \tag{2.1}$$

Lemma 2.1. If $a \le b \le c \le d$ then $\beta^{b,c} \ge \beta^{a,d}$.

Proof. Since $M(a \le d) = M(c \le d) \circ M(b \le c) \circ M(a \le b)$, this follows from (2.1). □

Our simplest function, which we call the rank function, is the function $\lambda : \mathbb{R}^2 \to \mathbb{R}$ given by

$$\lambda(b, d) = \begin{cases} \beta^{b,d} & \text{if } b \le d \\ 0 & \text{otherwise.} \end{cases}$$

Now let us change coordinates so that the resulting function is supported on the upper half plane. Let

$$m = \frac{b + d}{2}, \quad \text{and} \quad h = \frac{d - b}{2}. \tag{2.2}$$

The rescaled rank function is the function $\lambda : \mathbb{R}^2 \to \mathbb{R}$ given by

$$\lambda(m, h) = \begin{cases} \beta^{m-h,\,m+h} & \text{if } h \ge 0 \\ 0 & \text{otherwise.} \end{cases}$$

Much of our theory will apply to these simple functions. However, the following version, which we will call the persistence landscape, will have some advantages. First let us observe that for a fixed $t \in \mathbb{R}$, $\beta^{t-\bullet,\,t+\bullet}$ is a decreasing function. That is,

Lemma 2.2. For $0 \le h_1 \le h_2$, $\beta^{t-h_1,\,t+h_1} \ge \beta^{t-h_2,\,t+h_2}$.

Proof. Since $t - h_2 \le t - h_1 \le t + h_1 \le t + h_2$, by Lemma 2.1, $\beta^{t-h_2,\,t+h_2} \le \beta^{t-h_1,\,t+h_1}$. □

Definition 2.3. The persistence landscape is a function $\lambda : \mathbb{N} \times \mathbb{R} \to \overline{\mathbb{R}}$, where $\overline{\mathbb{R}}$ denotes the extended real numbers, $[-\infty, \infty]$. Alternatively, it may be thought of as a sequence of functions $\lambda_k : \mathbb{R} \to \overline{\mathbb{R}}$, where $\lambda_k(t) = \lambda(k, t)$. Define

$$\lambda_k(t) = \sup(m \ge 0 \mid \beta^{t-m,\,t+m} \ge k).$$

The persistence landscape has the following properties.

Lemma 2.4. (1) $\lambda_k(t) \ge 0$, (2) $\lambda_k(t) \ge \lambda_{k+1}(t)$, and (3) $\lambda_k$ is 1-Lipschitz.

The first two properties follow directly from the definition. We prove the third in the appendix.

To help visualize the graph of $\lambda : \mathbb{N} \times \mathbb{R} \to \overline{\mathbb{R}}$, we can extend it to a function $\overline{\lambda} : \mathbb{R}^2 \to \overline{\mathbb{R}}$ by setting

$$\overline{\lambda}(x, t) = \begin{cases} \lambda(\lceil x \rceil, t), & \text{if } x > 0, \\ 0, & \text{if } x \le 0. \end{cases} \tag{2.3}$$
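As a worked example of Definition 2.3, consider the simplest nonzero case. The module used here, supported on a single half-open interval $[b, d)$ with $b < d$, is chosen purely for illustration, and we assume the convention that the supremum is $0$ when no $m \ge 0$ satisfies the condition, which is consistent with Lemma 2.4(1).

```latex
% Illustrative interval module: M_a = F on [b, d), zero elsewhere (b < d).
M_a = \begin{cases} \mathbb{F} & \text{if } b \le a < d \\ 0 & \text{otherwise,} \end{cases}
\qquad
M(a \le a') = \begin{cases} \mathrm{id}_{\mathbb{F}} & \text{if } b \le a \le a' < d \\ 0 & \text{otherwise.} \end{cases}

% Betti numbers from (2.1):
\beta^{a,a'} = \begin{cases} 1 & \text{if } b \le a \le a' < d \\ 0 & \text{otherwise.} \end{cases}

% For t in [b, d): beta^{t-m,t+m} >= 1 iff m <= t - b and m < d - t, so
\lambda_1(t) = \sup(m \ge 0 \mid m \le t - b,\ m < d - t) = \min(t - b,\ d - t).

% For t outside [b, d) no m >= 0 qualifies. Writing c_+ = \max(c, 0):
\lambda_1(t) = \min(t - b,\ d - t)_+ , \qquad \lambda_k(t) = 0 \ \text{ for } k \ge 2.
```

The graph of $\lambda_1$ is a tent of height $\frac{d-b}{2}$ centered at $\frac{b+d}{2}$ with slopes $\pm 1$, which illustrates all three properties in Lemma 2.4.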
We remark that the non-persistent Betti numbers, $\{\dim(M(t))\}$, of a persistence module $M$ can be read off from the diagonal of the rank function, the $m$-axis of the rescaled rank function, and from the support of the persistence landscape.

Figure 2. Persistence landscapes for the homology in degree 1 of the example in Figure 1. For the rank function (top left) and rescaled rank function (top right) the values of the functions on the corresponding region are given. The top left graph also contains the three points of the corresponding persistence diagram. Below the top right graph is the corresponding barcode. We also have the corresponding persistence landscape (bottom left) and its 3d-version (bottom right). Notice that $\lambda_1$ gives a measure of the dominant homological feature at each point of the filtration.

2.3. Barcodes and Persistence Diagrams.

All of the information in a (tame) persistence module is completely contained in a multiset of intervals called a barcode (Zomorodian and Carlsson, 2005; Crawley-Boevey, 2012; Chazal et al., 2012). Mapping each interval to its endpoints we obtain the persistence diagram. There exist maps in both directions between these topological summaries and our functions. For an example of corresponding persistence diagrams, barcodes and persistence landscapes, see Figure 2. Informally, the persistence diagram consists of the “upper-left corners” in our rank function. In the other direction, $\lambda(b, d)$ counts the number of points in the persistence diagram in the upper left quadrant of $(b, d)$. Informally, the barcode consists of the “bases of the triangles” in the rescaled rank function, and the other direction is obtained by “stacking isosceles triangles” whose bases are the intervals in the barcode.
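The “stacking isosceles triangles” description can be sketched in Python. This is a minimal sketch under stated assumptions rather than a definitive implementation: the barcode is a finite list of finite intervals $(b, d)$, endpoint inclusion is ignored, and the function name `landscape` and the sample barcode are hypothetical. Each interval contributes the tent value $\max(\min(t - b, d - t), 0)$, and $\lambda_k(t)$ is taken to be the $k$-th largest of these values.

```python
def landscape(barcode, k, t):
    """Value lambda_k(t) of the persistence landscape of a finite barcode.

    barcode: list of (birth, death) pairs with birth <= death (finite values).
    k: landscape index, k = 1, 2, 3, ...
    t: filtration value at which to evaluate.
    """
    # Height at t of the isosceles triangle whose base is the interval (b, d).
    tents = sorted(
        (max(min(t - b, d - t), 0.0) for b, d in barcode),
        reverse=True,
    )
    # lambda_k(t) is the k-th largest tent height, or 0 if there are fewer
    # than k intervals.
    return tents[k - 1] if k <= len(tents) else 0.0


# Hypothetical barcode with three intervals, chosen only for illustration.
barcode = [(1.0, 5.0), (2.0, 8.0), (6.0, 10.0)]

# Sample the first three landscape functions on a grid, e.g. for plotting.
grid = [i * 0.5 for i in range(0, 23)]
layers = {k: [landscape(barcode, k, t) for t in grid] for k in (1, 2, 3)}
```

Sampling on a grid is enough for plotting. Because each $\lambda_k$ is piecewise linear, an exact representation would instead store its breakpoints.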
We invite the reader to make the mappings precise. For example, given a persistence diagram $\{(b_i, d_i)\}_{i=1}^n$, $\lambda_k(t)$ is the $k$th largest value of $\min(t - b_i, d_i - t)_+$, where $c_+$ denotes $\max(c, 0)$. The fact that barcodes are a complete invariant of persistence modules is central to these equivalences.

The geometry of the space of persistence diagrams makes it hard to work with. For example, sets of persistence diagrams need not have a unique (Fréchet) mean (Mileyko et al., 2011). In contrast, the space of persistence landscapes is very nice. So a set of persistence landscapes has a unique mean (3.1). See Figure 3.

Figure 3. Means of persistence diagrams and persistence landscapes. Top left: the rescaled persistence diagrams $\{(6, 6), (10, 6)\}$ and $\{(8, 4), (8, 8)\}$ have two (Fréchet) means: $\{(7, 5), (9, 7)\}$ and $\{(7, 7), (9, 5)\}$. In contrast their corresponding persistence landscapes (top right and bottom left) have a unique mean (bottom right).

Compared to the persistence diagram, the barcode has extra information on whether or not the endpoints of the intervals are included. This finer information is seen in the rank function and rescaled rank function, but not in the persistence landscape. However, when we pass to the corresponding $L^p$ space in Section 2.4, this information disappears.

2.4. Norms for Persistence Landscapes.

Recall that for a measure space $(S, \mathcal{A}, \mu)$, and a function $f : S \to \overline{\mathbb{R}}$ defined $\mu$-almost everywhere, for $1 \le p < \infty$, $\|f\|_p = \left( \int |f|^p \, d\mu \right)^{1/p}$, and $\|f\|_\infty = \operatorname{ess\,sup} f = \inf\{a \mid \mu\{s \in S \mid f(s) > a\} = 0\}$. For $1 \le p \le \infty$, $\mathcal{L}^p(S) = \{f : S \to \overline{\mathbb{R}} \mid \|f\|_p < \infty\}$ and define $L^p(S) = \mathcal{L}^p(S)/\sim$, where $f \sim g$ if $\|f - g\|_p = 0$. On $\mathbb{R}$ and $\mathbb{R}^2$ we will use the Lebesgue measure.
On $\mathbb{N} \times \mathbb{R}$, we use the product of the counting measure on $\mathbb{N}$ and the Lebesgue measure on $\mathbb{R}$. For $1 \le p < \infty$ and $\lambda : \mathbb{N} \times \mathbb{R} \to \overline{\mathbb{R}}$,

$$\|\lambda\|_p^p = \sum_{k=1}^{\infty} \|\lambda_k\|_p^p,$$

where $\lambda_k(t) = \lambda(k, t)$. By Lemma 2.4(2), $\|\lambda\|_\infty = \|\lambda_1\|_\infty$. If we extend $\lambda$ to $\overline{\lambda} : \mathbb{R}^2 \to \overline{\mathbb{R}}$, as in (2.3), we have $\|\overline{\lambda}\|_p = \|\lambda\|_p$, for $1 \le p \le \infty$.

If $\lambda$ is any of our functions corresponding to a barcode that is a finite collection of finite intervals, then $\lambda \in L^p(S)$ for $1 \le p \le \infty$, where $S$ equals $\mathbb{N} \times \mathbb{R}$ or $\mathbb{R}^2$.

Let $\lambda_{bd}$ and $\lambda_{mh}$ denote the rank function and the rescaled rank function corresponding to a persistence landscape $\lambda$, and let $D$ be the corresponding persistence diagram. Let $\operatorname{pers}_2(D)$ denote the sum of the squares of the lengths of the intervals in the corresponding barcode, and let $\operatorname{pers}_\infty(D)$ be the length of the longest interval.

Proposition 2.5. (1) $\|\lambda\|_1 = \|\lambda_{mh}\|_1 = \frac{1}{2} \|\lambda_{bd}\|_1 = \frac{1}{4} \operatorname{pers}_2(D)$, and (2) $\|\lambda\|_\infty = \|\lambda_1\|_\infty = \frac{1}{2} \operatorname{pers}_\infty(D)$.

Proof. (1) To see that $\|\lambda\|_1 = \|\lambda_{mh}\|_1$ we remark that both are the volume of the same solid. The change of coordinates (2.2) implies that $\|\lambda_{mh}\|_1 = \frac{1}{2} \|\lambda_{bd}\|_1$. If $D = \{(b_i, d_i)\}$, then each point $(b_i, d_i)$ contributes $h_i^2$ to the volume $\|\lambda_{mh}\|_1$, where $h_i = \frac{d_i - b_i}{2}$. So $\|\lambda_{mh}\|_1 = \sum_i h_i^2$. Finally, $\operatorname{pers}_2(D) = \sum_i (2 h_i)^2 = 4 \sum_i h_i^2$.

(2) Lemma 2.4(2) implies that $\|\lambda\|_\infty = \|\lambda_1\|_\infty$. If $D = \{(b_i, d_i)\}$, then $\|\lambda\|_\infty = \sup_i \frac{d_i - b_i}{2}$. □

We remark that the quantities in (1) and (2) also equal $W_2(D, \emptyset)^2$ and $W_\infty(D, \emptyset)$ respectively (see Section 5 for the corresponding definitions).

3. Statistics with landscapes

Now let us take a probabilistic viewpoint. First, we assume that our persistence landscapes lie in $L^p(S)$ for some $1 \le p < \infty$, where $S$ equals $\mathbb{N} \times \mathbb{R}$ or $\mathbb{R}^2$. In this case, $L^p(S)$ is a separable Banach space. When $p = 2$ we have a Hilbert space; however, we will not use this structure.
In some examples, the persistence landscapes will only be stable for some $p > 2$ (see Theorem 5.5).

3.1. Landscapes as Banach Space Valued Random Variables.

Let $X$ be a random variable on some underlying probability space $(\Omega, \mathcal{F}, P)$, with corresponding persistence landscape $\Lambda$, a Borel random variable with values in the separable Banach space $L^p(S)$. That is, for $\omega \in \Omega$, $X(\omega)$ is the data and $\Lambda(\omega) = \lambda(X(\omega)) =: \lambda$ is the corresponding topological summary statistic.

Now let $X_1, \ldots, X_n$ be independent and identically distributed copies of $X$, and let $\Lambda^1, \ldots, \Lambda^n$ be the corresponding persistence landscapes. Using the vector space structure of $L^p(S)$, the mean landscape $\overline{\Lambda}^n$ is given by the pointwise mean. That is, $\overline{\Lambda}^n(\omega) = \overline{\lambda}^n$, where

$$\overline{\lambda}^n(k, t) = \frac{1}{n} \sum_{i=1}^{n} \lambda^i(k, t). \tag{3.1}$$

Let us interpret the mean landscape. If $B_1, \ldots, B_n$ are the barcodes corresponding to the persistence landscapes $\lambda^1, \ldots, \lambda^n$, then for $k \in \mathbb{N}$ and $t \in \mathbb{R}$, $\overline{\lambda}^n(k, t)$ is the average value of the largest radius interval centered at $t$ that is contained in $k$ intervals in the barcodes $B_1, \ldots, B_n$.

For those used to working with persistence diagrams, it is tempting to try to find a persistence diagram whose persistence landscape is closest to a given mean landscape. While this is an interesting mathematical question, we would like to suggest that the more important practical issue is using the mean landscape to understand the data.

We would like to be able to say that the mean landscape converges to the expected persistence landscape. To say this precisely we need some notions from probability in Banach spaces.

3.2. Probability in Banach Spaces.

Here we present some results from probability in Banach spaces. For a more detailed exposition we refer the reader to Ledoux and Talagrand (2011).

Let $\mathcal{B}$ be a real separable Banach space with norm $\|\cdot\|$.
Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let $V : (\Omega, \mathcal{F}, P) \to \mathcal{B}$ be a Borel random variable with values in $\mathcal{B}$. The composite $\|V\| : \Omega \xrightarrow{V} \mathcal{B} \xrightarrow{\|\cdot\|} \mathbb{R}$ is a real-valued random variable. Let $\mathcal{B}^*$ denote the topological dual space of continuous linear real-valued functions on $\mathcal{B}$. For $f \in \mathcal{B}^*$, the composite $f(V) : \Omega \xrightarrow{V} \mathcal{B} \xrightarrow{f} \mathbb{R}$ is a real-valued random variable.

For a real-valued random variable $Y : (\Omega, \mathcal{F}, P) \to \mathbb{R}$, the mean or expected value is given by $E(Y) = \int Y \, dP = \int_\Omega Y(\omega) \, dP(\omega)$. We call an element $E(V) \in \mathcal{B}$ the Pettis integral of $V$ if $E(f(V)) = f(E(V))$ for all $f \in \mathcal{B}^*$.

Proposition 3.1. If $E\|V\| < \infty$, then $V$ has a Pettis integral and $\|E(V)\| \le E\|V\|$.

Now let $(V_n)_{n \in \mathbb{N}}$ be a sequence of independent copies of $V$. For each $n \ge 1$, let $S_n = V_1 + \cdots + V_n$. For a sequence $(Y_n)$ of $\mathcal{B}$-valued random variables, we say that $(Y_n)$ converges almost surely to a $\mathcal{B}$-valued random variable $Y$ if $P(\lim_{n \to \infty} Y_n = Y) = 1$.

Theorem 3.2 (Strong Law of Large Numbers). $(\frac{1}{n} S_n) \to E(V)$ almost surely if and only if $E\|V\| < \infty$.

For a sequence $(Y_n)$ of $\mathcal{B}$-valued random variables, we say that $(Y_n)$ converges weakly to a $\mathcal{B}$-valued random variable $Y$ if $\lim_{n \to \infty} E(\varphi(Y_n)) = E(\varphi(Y))$ for all bounded continuous functions $\varphi : \mathcal{B} \to \mathbb{R}$. A random variable $G$ with values in $\mathcal{B}$ is said to be Gaussian if for each $f \in \mathcal{B}^*$, $f(G)$ is a real-valued Gaussian random variable with mean zero. The covariance structure of a $\mathcal{B}$-valued random variable $V$ is given by the expectations $E[(f(V) - E(f(V)))(g(V) - E(g(V)))]$, where $f, g \in \mathcal{B}^*$. A Gaussian random variable is determined by its covariance structure. From Hoffmann-Jørgensen and Pisier (1976) we have the following.

Theorem 3.3 (Central Limit Theorem). Assume that $\mathcal{B}$ has type 2. (For example $\mathcal{B} = L^p(S)$, with $2 \le p < \infty$.)
If $E(V) = 0$ and $E(\|V\|^2) < \infty$ then $\frac{1}{\sqrt{n}} S_n$ converges weakly to a Gaussian random variable $G(V)$ with the same covariance structure as $V$.

3.3. Convergence of Persistence Landscapes.

Now we will apply the results of the previous section to persistence landscapes. Theorem 3.2 directly implies the following.

Theorem 3.4 (Strong Law of Large Numbers for persistence landscapes). $\overline{\Lambda}^n \to E(\Lambda)$ almost surely if and only if $E\|\Lambda\| < \infty$.

Theorem 3.5 (Central Limit Theorem for persistence landscapes). Assume $p \ge 2$. If $E\|\Lambda\| < \infty$ and $E(\|\Lambda\|^2) < \infty$ then $\sqrt{n}\,[\overline{\Lambda}^n - E(\Lambda)]$ converges weakly to a Gaussian random variable with the same covariance structure as $\Lambda$.

Proof. Apply Theorem 3.3 to $V = \lambda(X) - E(\lambda(X))$. □

Next we apply a functional to the persistence landscapes to obtain a real-valued random variable that satisfies the usual central limit theorem.

Corollary 3.6. Assume $p \ge 2$, $E\|\Lambda\| < \infty$ and $E(\|\Lambda\|^2) < \infty$. For any $f \in L^q(S)$ with $\frac{1}{p} + \frac{1}{q} = 1$, let

$$Y = \int_S f \Lambda = \|f \Lambda\|_1. \tag{3.2}$$

Then

$$\sqrt{n}\,[\overline{Y}_n - E(Y)] \xrightarrow{d} N(0, \operatorname{Var}(Y)), \tag{3.3}$$

where $d$ denotes convergence in distribution and $N(\mu, \sigma^2)$ is the normal distribution with mean $\mu$ and variance $\sigma^2$.

Proof. Since $V = \Lambda - E(\Lambda)$ satisfies the central limit theorem in $L^p(S)$, for any $g \in L^p(S)^*$, the real random variable $g(V)$ satisfies the central limit theorem in $\mathbb{R}$ with limiting Gaussian law with mean 0 and variance $E(g(V)^2)$. If we take $g(h) = \int_S f h$, where $f \in L^q(S)$, with $\frac{1}{p} + \frac{1}{q} = 1$, then $g(V) = Y - E(Y)$ and $E(g(V)^2) = \operatorname{Var}(Y)$. □

3.4. Confidence Intervals.

The results of Section 3.3 allow us to obtain approximate confidence intervals for the expected values of functionals on persistence landscapes. Assume that $\lambda(X)$ satisfies the conditions of Corollary 3.6 and that $Y$ is a corresponding real random variable as defined in (3.2).
By Corollary 3.6 and Slutsky’s theorem we may use the normal distribution to obtain the approximate $(1 - \alpha)$ confidence interval for $E(Y)$ using

$$\overline{Y}_n \pm z^* \frac{S_n}{\sqrt{n}},$$

where $S_n^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \overline{Y}_n)^2$, and $z^*$ is the upper $\frac{\alpha}{2}$ critical value for the normal distribution.

3.5. Statistical Inference using Landscapes I.

Here we apply the results of Section 3.3 to hypothesis testing using persistence landscapes. Let $X_1, \ldots, X_n$ be iid copies of the random variable $X$ and let $X'_1, \ldots, X'_{n'}$ be iid copies of the random variable $X'$. Assume that the corresponding persistence landscapes $\Lambda$, $\Lambda'$ lie in $L^p(S)$, where $p \ge 2$. Let $f \in L^q(S)$, where $\frac{1}{p} + \frac{1}{q} = 1$. Let $Y$ and $Y'$ be defined as in (3.2). Let $\mu = E(Y)$ and $\mu' = E(Y')$. We will test the null hypothesis that $\mu = \mu'$.

First we recall that the sample mean $\overline{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$ is an unbiased estimator of $\mu$ and the sample variance $S_Y^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \overline{Y})^2$ is an unbiased estimator of $\operatorname{Var}(Y)$, and similarly for $\overline{Y'}$ and $S_{Y'}^2$. By Corollary 3.6, $\overline{Y}$ and $\overline{Y'}$ are asymptotically normal. We use the two-sample z-test. Let

$$z = \frac{\overline{Y} - \overline{Y'}}{\sqrt{\frac{S_Y^2}{n} + \frac{S_{Y'}^2}{n'}}},$$

where the denominator is the standard error for the difference. From this standard score a p-value may be obtained from the normal distribution.

3.6. Choosing a Functional.

To apply the above results, one needs to choose a functional, $f \in L^q(S)$. This choice will need to be made with an understanding of the data at hand. Here we present a couple of options.

If each $\lambda = \Lambda(\omega)$ is supported by $\{1, \ldots, K\} \times [-B, B]$, take

$$f(k, t) = \begin{cases} 1 & \text{if } t \in [-B, B] \text{ and } k \le K \\ 0 & \text{otherwise.} \end{cases} \tag{3.4}$$

Then $\|f \Lambda\|_1 = \|\Lambda\|_1$.

If the parameter values for which the persistence landscape is nonzero are bounded by $\pm B$, then we have a nice choice of functional for the persistence landscape that is unavailable for the (rescaled) rank function.
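The confidence interval of Section 3.4, combined with the indicator functional (3.4), can be sketched in Python. This is only an illustration under stated assumptions: each barcode is a finite list of finite intervals lying in the support of $f$, so that by Proposition 2.5(1) the value $Y = \|\Lambda\|_1$ equals one quarter of the sum of squared interval lengths. The function names and the sample barcodes are hypothetical and do not come from the paper's experiments.

```python
import math
from statistics import NormalDist, mean, stdev


def landscape_l1_norm(barcode):
    """Y = ||lambda||_1 for a finite barcode, via Proposition 2.5(1)."""
    # (1/4) * pers_2(D): a quarter of the sum of squared interval lengths.
    return sum((d - b) ** 2 for b, d in barcode) / 4.0


def confidence_interval(ys, alpha=0.05):
    """Approximate (1 - alpha) normal confidence interval for E(Y)."""
    n = len(ys)
    y_bar = mean(ys)
    s_n = stdev(ys)  # sample standard deviation, denominator n - 1
    z_star = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 critical value
    half_width = z_star * s_n / math.sqrt(n)
    return y_bar - half_width, y_bar + half_width


# Hypothetical sample of barcodes, one per observed data set.
barcodes = [
    [(0.0, 2.0)],
    [(0.0, 2.0), (1.0, 3.0)],
    [(0.0, 4.0)],
]
ys = [landscape_l1_norm(bc) for bc in barcodes]
low, high = confidence_interval(ys)
```

The interval is justified only asymptotically, so a sample as small as this hypothetical one demonstrates the mechanics but not a trustworthy interval.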
We can choose a functional that is sensitive to the first $K$ dominant homological features. That is, using $f$ in (3.4), $\|f \Lambda\|_1 = \sum_{k=1}^{K} \|\Lambda_k\|_1$. Under this weaker assumption we can also take $f_k(t) = \frac{1}{k^r} \chi_{[-B, B]}(t)$, where $r > 1$. Then $\|f \Lambda\|_1 = \sum_{k=1}^{\infty} \frac{1}{k^r} \|\Lambda_k\|_1$.

The condition that $\lambda$ is supported by $\mathbb{N} \times [-B, B]$ can often be enforced by using reduced homology, by applying extended persistence (Cohen-Steiner et al., 2009; Bubenik and Scott, 2014), or by simply truncating the intervals in the corresponding barcode at some fixed values.

We remark that certain experimental data may have bounds on the number of intervals. For example, in the protein data considered using the ideas presented here in Kovacev-Nikolic et al. (2014), the simplicial complexes have a fixed number of vertices.

3.7. Statistical Inference using Landscapes II.

When used in the hypothesis test of Section 3.5, the functionals suggested in Section 3.6 may not have enough power to discriminate between two groups with different persistence in some examples. To increase the power, one can apply a vector of functionals and then apply Hotelling’s $T^2$ test. For example, consider $Y = \left( \int (\Lambda_1 - \Lambda'_1), \ldots, \int (\Lambda_K - \Lambda'_K) \right)$, where $K \ll n_1 + n_2 - 2$ and $n_1$, $n_2$ are the two sample sizes.

This alternative will not be sufficient if the persistence landscapes are translates of each other (see Figure 9). An additional approach is to compute the distance between the mean landscapes of the two groups and obtain a p-value using a permutation test. This is done in Section 4.5. This test has been applied to persistence diagrams and barcodes (Chung et al., 2009; Robinson and Turner, 2013).

Figure 4. 200 points were sampled from a pair of linked annuli. Here we show the points and a corresponding union of balls and 1-skeleton of the Čech complex. This was repeated 100 times. Next we show two of the degree one persistence landscapes and the mean degree one persistence landscape.

4. Examples

The persistent homologies in this section were calculated using javaPlex (Tausz et al., 2011) and Perseus by Nanda (2013). Another publicly available alternative is Dionysus by Morozov (2012). In Section 4.4 we use Matlab code courtesy of Eliran Subag that implements an algorithm from Wood and Chan (1994).

4.1. Linked Annuli.

We start with a simple example to illustrate the techniques. Following Munch et al. (2013), we sample 200 points from the uniform distribution on the union of two annuli. We then calculate the corresponding persistence landscape in degree one using the Vietoris-Rips complex. We repeat this 100 times and calculate the mean persistence landscape. See Figure 4. Note that in the degree one barcode of this example, it is very likely that there will be one large interval, one smaller interval born at around the same time, and all other intervals are smaller and die around the time the larger two intervals are born.

4.2. Random geometric complexes.

The (non-persistent) homology of random geometric complexes has been studied in Kahle (2011); Kahle and Meckes (2013); Bobrowski and Adler (2011).

We sample 100 points from the uniform distribution on the unit cube $[0, 1]^3$, and calculate the persistence landscapes in degrees 0, 1 and 2 of the corresponding Vietoris-Rips complex. In degree 0, we use reduced homology. We repeat this 1000 times and calculate the corresponding mean persistence landscapes. See Figure 5. Since the number of simplices is bounded and the filtration is bounded, these persistence landscapes have finite support.
As discussed in Section 3.6, we can choose the functional given by the indicator function on this support. We obtain the real random variable $Y = \|\lambda(X)\|_1 = \frac{1}{4} \operatorname{pers}_2(D(X))$, where $D(X)$ is the persistence diagram corresponding to $\lambda(X)$. Following Section 3.4 we calculate the approximate 95% confidence intervals of $E(Y)$ in degrees 0, 1 and 2 to be $[0.1534, 0.1545]$, $[0.0064, 0.0066]$ and $[0.0002, 0.0003]$.

Remark 4.1. The graphs in Figure 5 may be thought of as a persistent homology version of the graph in Figure 2 of Kahle and Meckes (2013).

4.3. Erdős–Rényi random clique complexes.

The (non-persistent) homology of the random complexes in this section has been studied by Kahle and Meckes (2013).

Let $G(n)$ be the following random filtered graph. There are $n$ vertices with filtration value 0. Each of the possible $\binom{n}{2}$ edges has a filtration value which is chosen independently from a uniform distribution on $[0, 1]$. Let $X(n)$ be the clique complex of $G(n)$.

In Figure 6 we show the mean persistence landscapes of a sample of 10 independent copies of $G(100)$ in degrees 0, 1, 2, and 3, where in degree 0 we use reduced homology. For computational reasons, we only considered the subcomplex of $G(100)$ with filtration values at most 0.55. The graphs in Figure 6 are a persistent homology version of Figure 1 in Kahle and Meckes (2013). In fact the latter graphs are given by the support of the graphs in Figure 6.

As in the example in Section 4.2, we let $Y = \|\lambda(X)\|_1 = \frac{1}{4} \operatorname{pers}_2(D(X))$. The approximate 95% confidence intervals of $E(Y)$ in degrees 0, 1, 2 and 3 are estimated to be $[0.0034, 0.0039]$, $[0.751, 0.777]$, $[1.971, 2.041]$ and $[2.591, 2.618]$.

Figure 5. The mean persistence landscapes of 1000 Vietoris-Rips complexes of 100 points sampled uniformly from the cube $[0, 1]^3$.
The top left, top right and bottom left are homology in degrees 0, 1 and 2. The bottom right is the superposition of all three. Note that the filtration parameter has been rescaled, so 100 in the width and height of the graphs equals 0.3.

Figure 6. The mean persistence landscapes in degrees 0–3 from 10 copies of the random clique complex $G(n)$. Note that we have rescaled the filtration by a factor of 100.

4.4. Gaussian Random Fields. The topology of Gaussian random fields is of interest in statistics. The Euler characteristic of superlevel sets of a Gaussian random field may be calculated using the Gaussian Kinematic Formula of Adler and Taylor (2007). The persistent homology of Gaussian random fields has been considered by Adler et al. (2010) and its expected Euler characteristic has been obtained by Bobrowski and Borman (2012).

Here we consider a stationary Gaussian random field on $[0,1]^2$ with autocovariance function $\gamma(x,y) = e^{-400(x^2+y^2)}$. See Figure 7. We sample this field on a 100 by 100 grid, and calculate the persistence landscape of the sublevel set. For homology in degree 0, we truncate the infinite interval at the maximum value of the field. We calculate the mean persistence landscapes in degrees 0 and 1 from 100 samples (see Figure 7, where we have rescaled the filtration by a factor of 100). In the Gaussian random field literature, it is more common to consider superlevel sets. However, by symmetry, the expected persistence landscape in this case is the same except for a change in the sign of the filtration.

We repeat this calculation for a similar Gaussian random field on $[0,1]^3$, this time using reduced homology. See Figure 7. This time we sample on a $25 \times 25 \times 25$ grid.

4.5. Torus and Sphere.
Here we combine persistence landscapes and statistical inference to discriminate between iid samples of 1000 points from a torus and a sphere in $\mathbb{R}^3$ with the same surface area, using the uniform surface area measure as described by Diaconis et al. (2012) (see Figure 8). To be precise, we use the torus given by $(r-2)^2 + z^2 = 1$ in cylindrical coordinates, and the sphere given by $r^2 = 2\pi$ in spherical coordinates.

For these points, we construct a filtered simplicial complex as follows. First we triangulate the underlying space using the Coxeter–Freudenthal–Kuhn triangulation, starting with a cubical grid with sides of length $\frac{1}{2}$. Next we smooth our data using a triangular kernel with bandwidth 0.9. We evaluate this kernel density estimator at the vertices of our simplicial complex. Finally, we filter our simplicial complex as follows. For filtration level $-r$, we include a simplex in our triangulation if and only if the kernel density estimator has values greater than or equal to $r$ at all of its vertices. Three stages in the filtration for one of the samples are shown in Figure 8. We then calculate the persistence landscape of this filtered simplicial complex for 100 samples and plot the mean landscapes (see Figure 8). We observe that the large peaks correspond to the Betti numbers of the torus and sphere.

Since the support of the persistence landscapes is bounded, we can use the integral of the landscapes to obtain a real valued random variable that satisfies (3.3). We use a two-sample z-test to test the null hypothesis that these random variables have equal mean. For the landscapes in dimensions 0 and 2 we cannot reject the null hypothesis. In dimension 1 we do reject the null hypothesis with a p-value of $3 \times 10^{-6}$.

We can also choose a functional that only integrates the persistence landscape $\lambda(k,t)$ for certain ranges of $k$.
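The integral functional and the two-sample z-test described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the author's implementation: each landscape is assumed to be sampled on a common grid as an array with one row per level $k$, the integral is approximated by the trapezoidal rule, and the names `landscape_integral` and `two_sample_z_test` are hypothetical. Passing a subset of rows through `ks` gives the variant that integrates only certain levels $k$.

```python
import math
import numpy as np

def landscape_integral(lam, grid, ks=None):
    """Approximate the integral of lambda(k, t) over t, summed over levels.

    lam:  array of shape (K, len(grid)); row k-1 holds lambda_k on the grid.
    ks:   optional iterable of 0-based row indices to restrict the levels.
    """
    lam = np.asarray(lam, dtype=float)
    grid = np.asarray(grid, dtype=float)
    rows = lam if ks is None else lam[list(ks)]
    # Trapezoidal rule along t for every selected level, then sum over levels.
    widths = np.diff(grid)
    return float(np.sum(0.5 * (rows[:, 1:] + rows[:, :-1]) * widths))

def two_sample_z_test(x, y):
    """Two-sided z-test of equal means, using sample variances."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    se = math.sqrt(x.var(ddof=1) / x.size + y.var(ddof=1) / y.size)
    z = (x.mean() - y.mean()) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Toy landscape of the diagram {(0, 2), (0.5, 1.5)} on a coarse grid.
grid = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
lam = np.array([[0.0, 0.5, 1.0, 0.5, 0.0],
                [0.0, 0.0, 0.5, 0.0, 0.0]])
total = landscape_integral(lam, grid)           # all levels
first_only = landscape_integral(lam, grid, [0])  # only k = 1
```

For this toy diagram the total integral is 1.25, which agrees with $\frac{1}{4}(2^2 + 1^2)$, the identity $\|\lambda\|_1 = \frac{1}{4}\operatorname{pers}_2(D)$ used in Sections 4.2 and 4.3. In practice one would compute the integral for each sample from the two populations and pass the two resulting vectors to the z-test.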
In dimension 1, with $k = 1$ or $k = 2$ there is a statistically significant difference (p-values of $10^{-8}$ and $3 \times 10^{-6}$), but not for $k > 2$. In dimension 2, there is not a significant difference for $k = 1$, but there is a significant difference for $k > 1$ (p-value $< 10^{-4}$).

Figure 7. Mean landscapes of Gaussian random fields. The graph of a Gaussian random field on $[0,1]^2$ (top left) and its corresponding mean landscapes (middle row) in degrees 0 and 1. The 0-isosurface of a Gaussian random field on $[0,1]^3$ (top right) and the corresponding mean landscapes in degrees 0, 1 and 2 (bottom row).

Now we increase the difficulty by adding a fair amount of Gaussian noise to the point samples (see Figure 9) and using only 10 samples for each surface. This time we calculate the $L^2$ distances between the mean landscapes. We use the permutation test with 10,000 repetitions to determine if this distance is statistically significant. There is a significant difference in dimension 0, with a p value of 0.0111. This is surprising, since the mean landscapes look very similar. However, on closer inspection, they are shifted slightly (see Figure 9). Note that we are detecting a geometric difference, not a topological one. This shows that this statistic is quite powerful. There is also a significant difference in dimensions 1 and 2, with p values of 0.0000 and 0.0000, respectively.

Figure 8. We sample 1000 points for a torus and sphere, 100 times each, construct the corresponding filtered simplicial complexes and calculate persistent homology. In columns 1, 2 and 3, we have the mean persistence landscape in dimension 0, 1 and 2 of the torus in row 3 and the sphere in row 4.

5. Landscape Distance and Stability

In this section we define the landscape distance and use it to show that the persistence landscape is a stable summary statistic.
We also show that the landscape distance gives lower bounds for the bottleneck and Wasserstein distances. We defer the proofs of the results of this section to the appendix.

Figure 9. We again sample 1000 points from a torus (top left) and sphere (top middle), this time with Gaussian noise. We show the torus from the perspective that makes it easiest to see the hole in the middle. We calculate persistent homology from 10 samples. In columns 1, 2 and 3, we have the mean persistence landscape in dimension 0, 1 and 2, respectively, with the torus in row 2 and the sphere in row 3. The top right is a graph of the difference between the mean landscapes in dimension 0.

Let $M$ and $M'$ be persistence modules as defined in Section 2.1 and let $\lambda$ and $\lambda'$ be their corresponding persistence landscapes as defined in Section 2.2. For $1 \le p \le \infty$, define the p-landscape distance between $M$ and $M'$ by
$$\Lambda_p(M, M') = \|\lambda - \lambda'\|_p.$$
Similarly, if $\lambda$ and $\lambda'$ are the persistence landscapes corresponding to persistence diagrams $D$ and $D'$ (Section 2.3), then we define
$$\Lambda_p(D, D') = \|\lambda - \lambda'\|_p.$$

Given a real valued function $f : X \to \mathbb{R}$ on a topological space $X$, let $M(f)$ denote the corresponding persistence module defined at the end of Section 2.1.

Theorem 5.1 ($\infty$-Landscape Stability Theorem). Let $f, g : X \to \mathbb{R}$. Then
$$\Lambda_\infty(M(f), M(g)) \le \|f - g\|_\infty.$$

Thus the persistence landscape is stable with respect to the supremum norm. We remark that there are no assumptions on $f$ and $g$, not even the q-tame condition of Chazal et al. (2012).

Let $D$ be a persistence diagram. For $x = (b, d) \in D$, let $\ell = d - b$ denote the persistence of $x$. If $D = \{x_j\}$, let $\operatorname{Pers}_k(D) = \sum_j \ell_j^k$ denote the degree-$k$ total persistence of $D$.

Now let us consider a persistence diagram to be an equivalence class of multisets of pairs $(b, d)$ with $b \le d$, where $D \sim D \sqcup \{(t, t)\}$ for any $t \in \mathbb{R}$.
That is, to any persistence diagram, we can freely adjoin points on the diagonal. This is reasonable, since points on the diagonal have zero persistence. Each persistence diagram has a unique representative $\hat{D}$ without any points on the diagonal. We set $|D| = |\hat{D}|$. We also remark that $\operatorname{Pers}_k(D)$ is well defined.

By allowing ourselves to add as many points on the diagonal as necessary, there exist bijections between any two persistence diagrams. Any bijection $\varphi : D \xrightarrow{\cong} D'$ can be represented by $\varphi : x_j \mapsto x'_j$, where $j \in J$ with $|J| = |D| + |D'|$. For a given $\varphi$, let $x_j = (b_j, d_j)$, $x'_j = (b'_j, d'_j)$ and $\varepsilon_j = \|x_j - x'_j\|_\infty = \max(|b_j - b'_j|, |d_j - d'_j|)$.

The bottleneck distance (Cohen-Steiner et al., 2007) between persistence diagrams $D$ and $D'$ is given by
$$W_\infty(D, D') = \inf_{\varphi : D \xrightarrow{\cong} D'} \sup_j \varepsilon_j,$$
where the infimum is taken over all bijections from $D$ to $D'$. It follows that for the empty persistence diagram $\emptyset$, $W_\infty(D, \emptyset) = \frac{1}{2} \sup_j \ell_j$.

The $\infty$-landscape distance is bounded by the bottleneck distance.

Theorem 5.2. For persistence diagrams $D$ and $D'$,
$$\Lambda_\infty(D, D') \le W_\infty(D, D').$$

For $p \ge 1$, the p-Wasserstein distance (Cohen-Steiner et al., 2010) between $D$ and $D'$ is given by
$$W_p(D, D') = \inf_{\varphi : D \xrightarrow{\cong} D'} \Big[ \sum_j \varepsilon_j^p \Big]^{\frac{1}{p}}.$$

We remark that the Wasserstein distance gives equal weighting to the $\varepsilon_j$ while the landscape distance gives a stronger weighting to $\varepsilon_j$ if $x_j$ has larger persistence. The landscape distance is most closely related to a weighted version of the Wasserstein distance that we now define. The persistence weighted p-Wasserstein distance between $D$ and $D'$ is given by
$$\overline{W}_p(D, D') = \inf_{\varphi : D \xrightarrow{\cong} D'} \Big[ \sum_j \ell_j \varepsilon_j^p \Big]^{\frac{1}{p}}.$$
Note that it is asymmetric.

For the remainder of the section we assume that $D$ and $D'$ are finite. The following result bounds the p-landscape distance.
Recall that $\ell_j$ is the persistence of $x_j \in D$ and when $\varphi : x_j \mapsto x'_j$, $\varepsilon_j = \|x_j - x'_j\|_\infty$.

Theorem 5.3. If $n = |D| + |D'|$ then
$$\Lambda_p(D, D')^p \le \min_{\varphi : D \xrightarrow{\cong} D'} \Big[ \sum_{j=1}^n \ell_j \varepsilon_j^p + \frac{2}{p+1} \sum_{j=1}^n \varepsilon_j^{p+1} \Big].$$

From this we can obtain a lower bound on the p-Wasserstein distance.

Corollary 5.4.
$$W_p(D, D')^p \ge \min\Big( 1, \frac{1}{2} \Big[ W_\infty(D, \emptyset) + \frac{1}{p+1} \Big]^{-1} \Lambda_p(D, D')^p \Big).$$

For our final stability theorem, we use ideas from Cohen-Steiner et al. (2010). Let $f : X \to \mathbb{R}$ be a function on a topological space. We say that $f$ is tame if for all but finitely many $a \in \mathbb{R}$, the associated persistence module $M(f)$ is constant and finite dimensional on some open interval containing $a$. For such an $f$, let $D(f)$ denote the corresponding persistence diagram. If $X$ is a metric space we say that $f$ is Lipschitz if there is some constant $c$ such that $|f(x) - f(y)| \le c\, d(x, y)$ for all $x, y \in X$. We let $\operatorname{Lip}(f)$ denote the infimum of all such $c$. We say that a metric space $X$ implies bounded degree-$k$ total persistence if there is a constant $C_{X,k}$ such that $\operatorname{Pers}_k(D(f)) \le C_{X,k}$ for all tame Lipschitz functions $f : X \to \mathbb{R}$ such that $\operatorname{Lip}(f) \le 1$. For example, as observed by Cohen-Steiner et al. (2010), if $X$ is the $n$-dimensional sphere, then $X = S^n$ has bounded $k$-persistence for $k = n + \delta$ for any $\delta > 0$, but does not have bounded $k$-persistence for $k < n$.

Theorem 5.5 (p-Landscape stability theorem). Let $X$ be a triangulable, compact metric space that implies bounded degree-$k$ total persistence for some real number $k \ge 1$, and let $f$ and $g$ be two tame Lipschitz functions. Then
$$\Lambda_p(D(f), D(g))^p \le C \|f - g\|_\infty^{p-k},$$
for all $p \ge k$, where
$$C = C_{X,k} \|f\|_\infty \big( \operatorname{Lip}(f)^k + \operatorname{Lip}(g)^k \big) + C_{X,k+1} \frac{1}{p+1} \big( \operatorname{Lip}(f)^{k+1} + \operatorname{Lip}(g)^{k+1} \big).$$
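The inequalities of Theorems 5.2 and 5.3 can be checked numerically on small examples. The sketch below is only an illustration under stated assumptions: the two toy diagrams are made up, landscapes are evaluated on a uniform grid, the $p$-norm is approximated by a Riemann sum, and the bijection $\varphi$ is simply the pairing of points by index (one admissible bijection, so its cost is an upper bound for the minimum in Theorem 5.3 and its largest $\varepsilon_j$ is an upper bound for $W_\infty$). All function names are hypothetical.

```python
import numpy as np

def landscape(diagram, k_max, grid):
    """lambda_k on a grid: k-th largest tent value max(0, min(t-b, d-t))."""
    pts = np.asarray(diagram, dtype=float).reshape(-1, 2)
    b, d = pts[:, :1], pts[:, 1:]
    tents = np.maximum(0.0, np.minimum(grid[None, :] - b, d - grid[None, :]))
    tents = -np.sort(-tents, axis=0)  # descending in k
    out = np.zeros((k_max, grid.size))
    m = min(k_max, tents.shape[0])
    out[:m] = tents[:m]
    return out

def landscape_distance(D1, D2, p, grid):
    """Approximate Lambda_p(D1, D2) on a uniform grid (p may be np.inf)."""
    k_max = max(len(D1), len(D2), 1)
    diff = np.abs(landscape(D1, k_max, grid) - landscape(D2, k_max, grid))
    if np.isinf(p):
        return float(diff.max())
    dt = grid[1] - grid[0]
    return float((np.sum(diff ** p) * dt) ** (1.0 / p))

def matching_cost(D1, D2, p):
    """Bracket in Theorem 5.3 for the index pairing x_j -> x'_j, and max eps_j."""
    x = np.asarray(D1, dtype=float)
    y = np.asarray(D2, dtype=float)
    ell = x[:, 1] - x[:, 0]                 # persistence of each x_j
    eps = np.max(np.abs(x - y), axis=1)     # sup-norm distance of each pair
    cost = np.sum(ell * eps ** p) + 2.0 / (p + 1) * np.sum(eps ** (p + 1))
    return float(cost), float(eps.max())

# Made-up diagrams with the same number of points.
D = [(0.0, 2.0), (1.0, 4.0)]
Dp = [(0.1, 2.2), (1.0, 3.5)]
grid = np.linspace(-1.0, 6.0, 7001)

lam2_pow = landscape_distance(D, Dp, 2, grid) ** 2   # Lambda_2^2
lam_inf = landscape_distance(D, Dp, np.inf, grid)    # Lambda_infinity
cost, eps_max = matching_cost(D, Dp, 2)
```

Here `lam2_pow <= cost` is consistent with Theorem 5.3 and `lam_inf <= eps_max` is consistent with Theorem 5.2; computing the exact minimum over all bijections would require an assignment solver, which this sketch deliberately omits.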
Thus the persistence diagram is stable with respect to the p-landscape distance if $p > k$, where $X$ has bounded degree-$k$ total persistence. This is the same condition as for the stability of the p-Wasserstein distance in Cohen-Steiner et al. (2010). Equivalently, the persistence landscape is stable with respect to the p-norm if $p > k$, where $X$ has bounded degree-$k$ total persistence.

Acknowledgments. The author would like to thank Robert Adler, Frederic Chazal, Herbert Edelsbrunner, Giseon Heo, Sayan Mukherjee and Stephen Rush for helpful discussions. Thanks to Junyong Park for suggesting Hotelling's $T^2$ test. Also thanks to the anonymous referees who made a number of helpful comments to improve the exposition. In addition, the author gratefully acknowledges the support of the Air Force Office of Scientific Research (AFOSR grant FA9550-13-1-0115).

Appendix A. Proofs

Proof of Lemma 2.4(3). We will prove that $\lambda_k$ is 1-Lipschitz. That is, $|\lambda_k(t) - \lambda_k(s)| \le |t - s|$, for all $s, t \in \mathbb{R}$.

Let $s, t \in \mathbb{R}$. Without loss of generality, assume that $\lambda_k(t) \ge \lambda_k(s) \ge 0$. If $\lambda_k(t) \le |t - s|$, then $\lambda_k(t) - \lambda_k(s) \le \lambda_k(t) \le |t - s|$ and we are done. So assume that $\lambda_k(t) > |t - s|$.

Let $0 < h < \lambda_k(t) - |t - s|$. Then $t - \lambda_k(t) < s - h < s + h < t + \lambda_k(t)$. Thus, by Lemma 2.1 and Definition 2.3, $\beta^{s-h,s+h} \ge k$. It follows that $\lambda_k(s) \ge \lambda_k(t) - |t - s|$. Thus $\lambda_k(t) - \lambda_k(s) \le |t - s|$. $\square$

Theorems 5.1 and 5.2 follow from the next result which is of independent interest. Following Chazal et al.
(2009), we say that two persistence modules $M$ and $M'$ are $\varepsilon$-interleaved if for all $a \in \mathbb{R}$ there exist linear maps $\varphi_a : M_a \to M'_{a+\varepsilon}$ and $\psi_a : M'_a \to M_{a+\varepsilon}$ such that for all $a \in \mathbb{R}$, $\psi_{a+\varepsilon} \circ \varphi_a = M(a \le a + 2\varepsilon)$ and $\varphi_{a+\varepsilon} \circ \psi_a = M'(a \le a + 2\varepsilon)$, and for all $a \le b$, $M'(a + \varepsilon \le b + \varepsilon) \circ \varphi_a = \varphi_b \circ M(a \le b)$ and $M(a + \varepsilon \le b + \varepsilon) \circ \psi_a = \psi_b \circ M'(a \le b)$. For persistence modules $M$ and $M'$ define the interleaving distance between $M$ and $M'$ by
$$d_I(M, M') = \inf\,(\varepsilon \mid M \text{ and } M' \text{ are } \varepsilon\text{-interleaved}).$$

Theorem A.1. $\Lambda_\infty(M, M') \le d_I(M, M')$.

Proof. Assume that $M$ and $M'$ are $\varepsilon$-interleaved. Then for $t \in \mathbb{R}$ and $m \ge \varepsilon$, the map $M(t - m \le t + m)$ factors through the map $M'(t - m + \varepsilon \le t + m - \varepsilon)$. So by Lemma 2.1, $\beta^{t-m+\varepsilon,\,t+m-\varepsilon}(M') \ge \beta^{t-m,\,t+m}(M)$. Thus by Definition 2.3, $\lambda'(k, t) \ge \lambda(k, t) - \varepsilon$ for all $k \ge 1$. It follows that $\|\lambda - \lambda'\|_\infty \le \varepsilon$. $\square$

Proof of Theorem 5.1. Combining Theorem A.1 with the stability theorem of Bubenik and Scott (2014), we have $\Lambda_\infty(M(f), M(g)) \le d_I(M(f), M(g)) \le \|f - g\|_\infty$. $\square$

Proof of Theorem 5.2. For a persistence diagram $D$, consider the persistence module given by the corresponding sum of interval modules (Chazal et al., 2012), $M(D) = \bigoplus_{(a,b) \in \hat{D}} I(a, b)$. Combining Theorem A.1 with Theorem 4.9 of Chazal et al. (2012) we have $\Lambda_\infty(M(D), M(D')) \le d_I(M(D), M(D')) \le W_\infty(D, D')$. $\square$

Proof of Theorem 5.3. Let $\varphi : D \xrightarrow{\cong} D'$ with $\varphi(x_j) = x'_j$. Let $\lambda = \lambda(D)$ and $\lambda' = \lambda(D')$. So $\Lambda_p(D, D')^p = \|\lambda - \lambda'\|_p^p$.
$$\|\lambda - \lambda'\|_p^p = \int |\lambda(k, t) - \lambda'(k, t)|^p = \sum_{k=1}^n \int |\lambda_k(t) - \lambda'_k(t)|^p \, dt = \int \sum_{k=1}^n |\lambda_k(t) - \lambda'_k(t)|^p \, dt$$

Fix $t$. Let $u_j(t) = \lambda(\{x_j\})(1, t)$ and $v_j(t) = \lambda(\{x'_j\})(1, t)$. For each $t$, let $u_{(1)}(t) \le \cdots \le u_{(n)}(t)$ denote an ordering of $u_1(t), \ldots$
, $u_n(t)$ and define $v_{(k)}(t)$ for $1 \le k \le n$ similarly. Then $u_{(k)}(t) = \lambda_k(t)$ and $v_{(k)}(t) = \lambda'_k(t)$ (see Figure 2). We obtain the result from the following where the two inequalities are proven in Lemmata A.2 and A.3.
$$\|\lambda - \lambda'\|_p^p = \int \sum_{k=1}^n |u_{(k)}(t) - v_{(k)}(t)|^p \, dt \le \int \sum_{k=1}^n |u_k(t) - v_k(t)|^p \, dt = \sum_{j=1}^n \int |u_j(t) - v_j(t)|^p \, dt \le \sum_{j=1}^n \ell_j \varepsilon_j^p + \frac{2}{p+1} \sum_{j=1}^n \varepsilon_j^{p+1}. \qquad \square$$

Lemma A.2. Let $u_1, \ldots, u_n \in \mathbb{R}$ and $v_1, \ldots, v_n \in \mathbb{R}$. Order them $u_{(1)} \le \cdots \le u_{(n)}$ and $v_{(1)} \le \cdots \le v_{(n)}$. Then
$$\sum_{j=1}^n |u_{(j)} - v_{(j)}|^p \le \sum_{j=1}^n |u_j - v_j|^p.$$

Proof. Assume $u_1 < \cdots < u_n$, $v_1 < \cdots < v_n$, and $p \ge 1$. Let $u$ and $v$ denote $(u_1, \ldots, u_n)$ and $(v_1, \ldots, v_n)$. Let $\Sigma_n$ denote the symmetric group on $n$ letters and let $f_n : \Sigma_n \to \mathbb{R}$ be given by $f_n(\sigma) = \sum_{j=1}^n |u_j - v_{\sigma(j)}|^p$. We will prove by induction that if $f_n(\sigma)$ is minimal then $\sigma$ is the identity, which we denote by 1.

For $n = 1$ this is trivial. For $n = 2$ assume without loss of generality that $u_1 = 0$, $u_2 = 1$ and $0 \le v_1 < v_2$. Let 1 and $\tau$ denote the elements of $\Sigma_2$. Then $f(1) = v_1^p + |1 - v_2|^p$ and $f(\tau) = v_2^p + |1 - v_1|^p$. Notice that $f(1) < f(\tau)$ if and only if $v_1^p - |1 - v_1|^p < v_2^p - |1 - v_2|^p$. The result follows from checking that $g(x) = x^p - |1 - x|^p$ is an increasing function for $x \ge 0$.

Now assume that the statement is true for some $n \ge 2$. Assume that $f_{n+1}(\sigma^*)$ is minimal. Fix $1 \le i \le n + 1$. Let $u' = (u_1, \ldots, \hat{u}_i, \ldots, u_{n+1})$ and $v' = (v_1, \ldots, \hat{v}_{\sigma^*(i)}, \ldots, v_{n+1})$, where $\hat{\cdot}$ denotes omission. Since $f_{n+1}(\sigma^*)$ is minimal for $u$ and $v$, it follows that $\sum_{j=1, j \ne i}^{n+1} |u_j - v_{\sigma^*(j)}|^p$ is minimal for $u'$ and $v'$. By the induction hypothesis, for $1 \le j < k \le n + 1$ and $j, k \ne i$, $\sigma^*(j) < \sigma^*(k)$. Therefore $\sigma^* = 1$.
Thus, by induction, the statement is true for all $n$.

Hence $\sum_{j=1}^n |u_{(j)} - v_{(j)}|^p \le \sum_{j=1}^n |u_j - v_j|^p$ if $u_{(1)} < \cdots < u_{(n)}$ and $v_{(1)} < \cdots < v_{(n)}$. The statement in the lemma follows by continuity. $\square$

Lemma A.3. Let $x = (b, d)$ and $x' = (b', d')$ where $b \le d$ and $b' \le d'$. Let $\ell = d - b$ and $\varepsilon = \|x - x'\|_\infty$. Then
$$\|\lambda(\{x\}) - \lambda(\{x'\})\|_p^p \le \ell \varepsilon^p + \frac{2}{p+1} \varepsilon^{p+1}.$$

Proof. Let $\lambda = \lambda(\{x\})$ and $\lambda' = \lambda(\{x'\})$. First $\lambda_k = \lambda'_k = 0$ for $k > 1$; so $\|\lambda - \lambda'\|_p = \|\lambda_1 - \lambda'_1\|_p$. Second $\lambda_1(t) = (h - |t - m|)_+$, where $h = \frac{d - b}{2}$, $m = \frac{b + d}{2}$, and $y_+ = \max(y, 0)$, and similarly for $\lambda'_1$ (see Figure 2).

Fix $x$ and $\varepsilon$. As $x'$ moves along the square $\|x - x'\|_\infty = \varepsilon$, $\|\lambda_1 - \lambda'_1\|_p^p$ has a maximum if $x' = (b - \varepsilon, d + \varepsilon)$. In this case
$$\|\lambda_1 - \lambda'_1\|_p^p = 2 \int_0^h \varepsilon^p \, dt + 2 \int_0^\varepsilon t^p \, dt = \ell \varepsilon^p + \frac{2}{p+1} \varepsilon^{p+1}. \qquad \square$$

Proof of Corollary 5.4. Let $\varphi : D \xrightarrow{\cong} D'$ be a minimizer for $W_p(D, D')$, with corresponding $\{\varepsilon_j\}$. Assume that $W_p(D, D') \le 1$. Then $W_p(D, D')^p = \sum_{j=1}^n \varepsilon_j^p \le 1$. So for $1 \le j \le n$, $\varepsilon_j \le 1$. Combining this with Theorem 5.3, we have that
$$\Lambda_p(D, D')^p \le \sum_{j=1}^n \Big( \ell_j + \frac{2}{p+1} \Big) \varepsilon_j^p. \tag{A.1}$$
Since $W_\infty(D, \emptyset) = \max \frac{1}{2} \ell_j$, $\ell_j \le 2 W_\infty(D, \emptyset)$. Hence
$$\Lambda_p(D, D')^p \le 2 \Big[ W_\infty(D, \emptyset) + \frac{1}{p+1} \Big] W_p(D, D')^p. \tag{A.2}$$
Therefore $W_p(D, D')^p \ge 1$ or $W_p(D, D')^p \ge \frac{1}{2} \big[ W_\infty(D, \emptyset) + \frac{1}{p+1} \big]^{-1} \Lambda_p(D, D')^p$. The statement of the corollary follows. $\square$

Theorem 5.5 follows from the following corollary to Theorem 5.3 which is of independent interest.

Corollary A.4. Let $p \ge k \ge 1$. Then
$$\Lambda_p(D, D')^p \le W_\infty(D, D')^{p-k} \Big[ W_\infty(D, \emptyset) \big( \operatorname{Pers}_k(D) + \operatorname{Pers}_k(D') \big) + \frac{1}{p+1} \big( \operatorname{Pers}_{k+1}(D) + \operatorname{Pers}_{k+1}(D') \big) \Big].$$

Proof. Let $\varphi$ be a minimizer for $W_\infty(D, D')$ with corresponding $\{\varepsilon_j\}$.
If $\varepsilon_j > \frac{\ell_j}{2} + \frac{\ell'_j}{2}$ then modify $\varphi$ to pair $x_j = (b_j, d_j)$ with $\bar{x}_j = \big( \frac{b_j + d_j}{2}, \frac{b_j + d_j}{2} \big)$ and similarly for $x'_j$. Note that $\|x_j - \bar{x}_j\|_\infty = \frac{\ell_j}{2}$ and $\|x'_j - \bar{x}'_j\|_\infty = \frac{\ell'_j}{2}$, so $\varphi$ is still a minimizer for $W_\infty(D, D')$.

Recall that for all $j$, $\ell_j \le 2 W_\infty(D, \emptyset)$. Since $\varphi$ is a minimizer for $W_\infty(D, D')$, for all $j$, $\varepsilon_j \le W_\infty(D, D')$. So applying our choice of $\varphi$ to Theorem 5.3 we have,
$$\Lambda_p(D, D')^p \le W_\infty(D, D')^{p-k} \Big[ 2 W_\infty(D, \emptyset) \sum_{j=1}^n \varepsilon_j^k + \frac{2}{p+1} \sum_{j=1}^n \varepsilon_j^{k+1} \Big].$$
Now $\varepsilon_j^q \le \big( \frac{\ell_j}{2} + \frac{\ell'_j}{2} \big)^q \le \frac{1}{2} \big( (\ell_j)^q + (\ell'_j)^q \big)$ for $q \ge 1$, where the right hand side follows by the convexity of $\alpha(x) = x^q$ for $q \ge 1$. Thus $\sum_{j=1}^n \varepsilon_j^q \le \frac{1}{2} \big( \operatorname{Pers}_q(D) + \operatorname{Pers}_q(D') \big)$ for $q \ge 1$. The result follows. $\square$

Proof of Theorem 5.5. Theorem 5.5 follows from Corollary A.4 by the following two observations. First, by the stability theorem of Cohen-Steiner et al. (2007), $W_\infty(D(f), D(g)) \le \|f - g\|_\infty$ and $W_\infty(D(f), \emptyset) \le \|f\|_\infty$. Second, if $\operatorname{Pers}_q(D(f)) \le C_{X,q}$ for all tame Lipschitz functions $f : X \to \mathbb{R}$ with $\operatorname{Lip}(f) \le 1$, then for general tame Lipschitz functions, $\operatorname{Pers}_q(D(f)) \le C_{X,q} \operatorname{Lip}(f)^q$. $\square$

References

Robert J. Adler and Jonathan E. Taylor. Random Fields and Geometry. Springer Monographs in Mathematics. Springer, New York, 2007. ISBN 978-0-387-48112-8.

Robert J. Adler, Omer Bobrowski, Matthew S. Borman, Eliran Subag, and Shmuel Weinberger. Persistent homology for random fields and complexes. In Borrowing Strength: Theory Powering Applications—a Festschrift for Lawrence D. Brown, volume 6 of Inst. Math. Stat. Collect., pages 124–143. Inst. Math. Statist., Beachwood, OH, 2010.

Andrew J. Blumberg, Itamar Gal, Michael A. Mandell, and Matthew Pancia. Robust statistics, hypothesis testing, and confidence intervals for persistent homology on metric measure spaces. Found. Comput. Math.
, 14(4):745–789, 2014. ISSN 1615-3375. doi: 10.1007/s10208-014-9201-4. URL http://dx.doi.org/10.1007/s10208-014-9201-4.

Omer Bobrowski and Robert J. Adler. Distance functions, critical points, and topology for some random complexes. arXiv:1107.4775 [math.AT], 2011.

Omer Bobrowski and Matthew Strom Borman. Euler integration of Gaussian random fields and persistent homology. J. Topol. Anal., 4(1):49–70, 2012. ISSN 1793-5253.

Karol Borsuk. On the imbedding of systems of compacta in simplicial complexes. Fund. Math., 35:217–234, 1948. ISSN 0016-2736.

Peter Bubenik and Jonathan A. Scott. Categorification of persistent homology. Discrete Comput. Geom., 51(3):600–627, 2014. ISSN 0179-5376.

Peter Bubenik, Gunnar Carlsson, Peter T. Kim, and Zhi-Ming Luo. Statistical topology via Morse theory persistence and nonparametric estimation. In Algebraic Methods in Statistics and Probability II, volume 516 of Contemp. Math., pages 75–92. Amer. Math. Soc., Providence, RI, 2010.

Gunnar Carlsson. Topology and data. Bull. Amer. Math. Soc. (N.S.), 46(2):255–308, 2009. ISSN 0273-0979.

Gunnar Carlsson, Tigran Ishkhanov, Vin de Silva, and Afra Zomorodian. On the local behavior of spaces of natural images. Int. J. Comput. Vision, 76(1):1–12, 2008. ISSN 0920-5691.

Frédéric Chazal, David Cohen-Steiner, Marc Glisse, Leonidas J. Guibas, and Steve Y. Oudot. Proximity of persistence modules and their diagrams. In Proceedings of the 25th Annual Symposium on Computational Geometry, SCG '09, pages 237–246, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-501-7.

Frederic Chazal, Vin de Silva, Marc Glisse, and Steve Oudot. The structure and stability of persistence modules. arXiv:1207.3674 [math.AT], 2012.

Frédéric Chazal, Marc Glisse, Catherine Labruère, and Bertrand Michel. Optimal rates of convergence for persistence diagrams in topological data analysis. 2013. [math.ST].
Frédéric Chazal, Brittany Terese Fasy, Fabrizio Lecci, Alessandro Rinaldo, and Larry Wasserman. Stochastic convergence of persistence landscapes and silhouettes. Symposium on Computational Geometry (SoCG), 2014.

Chao Chen and Michael Kerber. An output-sensitive algorithm for persistent homology. Comput. Geom., 46(4):435–447, 2013. ISSN 0925-7721.

Moo K. Chung, Peter Bubenik, and Peter T. Kim. Persistence diagrams in cortical surface data. In Information Processing in Medical Imaging (IPMI) 2009, volume 5636 of Lecture Notes in Computer Science, pages 386–397, 2009.

David Cohen-Steiner, Herbert Edelsbrunner, and John Harer. Stability of persistence diagrams. Discrete Comput. Geom., 37(1):103–120, 2007. ISSN 0179-5376.

David Cohen-Steiner, Herbert Edelsbrunner, and John Harer. Extending persistence using Poincaré and Lefschetz duality. Found. Comput. Math., 9(1):79–103, 2009. ISSN 1615-3375.

David Cohen-Steiner, Herbert Edelsbrunner, John Harer, and Yuriy Mileyko. Lipschitz functions have $L_p$-stable persistence. Found. Comput. Math., 10(2):127–139, 2010. ISSN 1615-3375.

William Crawley-Boevey. Decomposition of pointwise finite-dimensional persistence modules. arXiv:1210.0819 [math.RT], 2012.

Vin de Silva and Gunnar Carlsson. Topological estimation using witness complexes. Eurographics Symposium on Point-Based Graphics, 2004.

Vin De Silva and Robert Ghrist. Coverage in sensor networks via persistent homology. Algebr. Geom. Topol., 7:339–358, 2007a.

Vin De Silva and Robert Ghrist. Homological sensor networks. Notic. Amer. Math. Soc., 54(1):10–17, 2007b.

Mary-Lee Dequéant, Sebastian Ahnert, Herbert Edelsbrunner, Thomas M. A. Fink, Earl F. Glynn, Gaye Hattem, Andrzej Kudlicki, Yuriy Mileyko, Jason Morton, Arcady R. Mushegian, Lior Pachter, Maga Rowicka, Anne Shiu, Bernd Sturmfels, and Olivier Pourquié.
Comparison of pattern detection methods in microarray time series of the segmentation clock. PLoS ONE, 3(8):e2856, 08 2008.

Tamal Krishna Dey, Fengtao Fan, and Yusu Wang. Graph induced complex on point data. In Proceedings of the Twenty-ninth Annual Symposium on Computational Geometry, SoCG '13, pages 107–116, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-2031-3.

Persi Diaconis, Susan Holmes, and Mehrdad Shahshahani. Sampling from a manifold. arXiv:1206.6913 [math.ST], 2012.

Herbert Edelsbrunner, David Letscher, and Afra Zomorodian. Topological persistence and simplification. Discrete Comput. Geom., 28(4):511–533, 2002. ISSN 0179-5376. Discrete and computational geometry and graph drawing (Columbia, SC, 2001).

Herbert Edelsbrunner, Dmitriy Morozov, and Amit Patel. Quantifying transversality by measuring the robustness of intersections. Found. Comput. Math., 11(3):345–361, 2011. ISSN 1615-3375.

Brittany Terese Fasy, Fabrizio Lecci, Alessandro Rinaldo, Larry Wasserman, Sivaraman Balakrishnan, and Aarti Singh. Confidence sets for persistence diagrams. Ann. Statist., 42(6):2301–2339, 2014. ISSN 0090-5364. doi: 10.1214/14-AOS1252. URL http://dx.doi.org/10.1214/14-AOS1252.

Robert Ghrist. Barcodes: the persistent topology of data. Bull. Amer. Math. Soc. (N.S.), 45(1):61–75, 2008. ISSN 0273-0979.

Allen Hatcher. Algebraic Topology. Cambridge University Press, Cambridge, 2002. ISBN 0-521-79160-X; 0-521-79540-0.

Giseon Heo, Jennifer Gamble, and Peter T. Kim. Topological analysis of variance and the maxillary complex. J. Amer. Statist. Assoc., 107(498):477–492, 2012. ISSN 0162-1459.

J. Hoffmann-Jørgensen and G. Pisier. The law of large numbers and the central limit theorem in Banach spaces. Ann. Probability, 4(4):587–599, 1976.

Matthew Kahle. Random geometric complexes. Discrete Comput. Geom., 45:553–573, 2011.

Matthew Kahle and Elizabeth Meckes.
Limit theorems for Betti numbers of random simplicial complexes. Homology Homotopy Appl., 15(1):343–374, 2013. ISSN 1532-0073.

Violeta Kovacev-Nikolic, Giseon Heo, Dragan Nikolić, and Peter Bubenik. Using cycles in high dimensional data to analyze protein binding. 2014. arXiv:1412.1394 [stat.ME].

Michel Ledoux and Michel Talagrand. Probability in Banach Spaces. Classics in Mathematics. Springer-Verlag, Berlin, 2011. ISBN 978-3-642-20211-7. Isoperimetry and processes, Reprint of the 1991 edition.

Yuriy Mileyko, Sayan Mukherjee, and John Harer. Probability measures on the space of persistence diagrams. Inverse Problems, 27(12):124007, 22, 2011. ISSN 0266-5611.

Nikola Milosavljević, Dmitriy Morozov, and Primož Škraba. Zigzag persistent homology in matrix multiplication time. In Computational Geometry (SCG'11), pages 216–225. ACM, New York, 2011.

Dmitriy Morozov. Dionysus: a C++ library with various algorithms for computing persistent homology. Software available at http://www.mrzv.org/software/dionysus/, 2012.

Elizabeth Munch, Paul Bendich, Katharine Turner, Sayan Mukherjee, Jonathan Mattingly, and John Harer. Probabilistic Fréchet means and statistics on vineyards. 2013. arXiv:1307.6530 [math.PR].

Vidit Nanda. Perseus: the persistent homology software. Software available at http://www.math.rutgers.edu/~vidit/perseus/index.html, 2013.

Monica Nicolau, Arnold J. Levine, and Gunnar Carlsson. Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival. Proc. Nat. Acad. Sci., 108(17):7265–7270, 2011.

Andrew Robinson and Katharine Turner. Hypothesis testing for topological data analysis. 2013. arXiv:1310.7467 [stat.AP].

Andrew Tausz, Mikael Vejdemo-Johansson, and Henry Adams. Javaplex: a research software package for persistent (co)homology. Software available at http://code.google.com/javaplex, 2011.
Katharine Turner, Yuriy Mileyko, Sayan Mukherjee, and John Harer. Fréchet means for distributions of persistence diagrams. Discrete Comput. Geom., 52(1):44–70, 2014.

Andrew T. A. Wood and Grace Chan. Simulation of stationary Gaussian processes in $[0,1]^d$. J. Comput. Graph. Statist., 3(4):409–432, 1994. ISSN 1061-8600.

Afra Zomorodian and Gunnar Carlsson. Computing persistent homology. Discrete Comput. Geom., 33(2):249–274, 2005. ISSN 0179-5376.

Department of Mathematics, Cleveland State University, Cleveland, OH 44115-2214, USA
E-mail address: peter.bubenik@gmail.com
