Introduction to the R package TDA
Authors: Brittany Terese Fasy, Jisu Kim, Fabrizio Lecci, Clément Maria
Introduction to the R package TDA

Brittany T. Fasy
Computer Science Department, Tulane University

Jisu Kim
Department of Statistics, Carnegie Mellon University

Fabrizio Lecci
Department of Statistics, Carnegie Mellon University

Clément Maria
Geometrica Group, INRIA Sophia Antipolis-Méditerranée

Abstract

We present a short tutorial and introduction to using the R package TDA, which provides some tools for Topological Data Analysis. In particular, it includes implementations of functions that, given some data, provide topological information about the underlying space, such as the distance function, the distance to a measure, the kNN density estimator, the kernel density estimator, and the kernel distance. The salient topological features of the sublevel sets (or superlevel sets) of these functions can be quantified with persistent homology. We provide an R interface for the efficient algorithms of the C++ libraries GUDHI, Dionysus, and PHAT, including a function for the persistent homology of the Rips filtration, and one for the persistent homology of sublevel sets (or superlevel sets) of arbitrary functions evaluated over a grid of points. The significance of the features in the resulting persistence diagrams can be analyzed with functions that implement recently developed statistical methods. The R package TDA also includes the implementation of an algorithm for density clustering, which allows us to identify the spatial organization of the probability mass associated to a density function and visualize it by means of a dendrogram, the cluster tree.

Keywords: Topological Data Analysis, Persistent Homology, Density Clustering

Research partially supported by NSF CAREER Grant DMS 1149677.
Email addresses: brittany.fasy@alumni.duke.edu (Brittany T. Fasy), jisuk1@andrew.cmu.edu (Jisu Kim), lecci@cmu.edu (Fabrizio Lecci), clement.maria@inria.fr (Clément Maria).
Preprint submitted to Journal of Symbolic Computation, 30 January 2015.

1. Introduction

Topological Data Analysis (TDA) refers to a collection of methods for finding topological structure in data (Carlsson, 2009). Recent advances in computational topology have made it possible to actually compute topological invariants from data. The input of these procedures typically takes the form of a point cloud, regarded as possibly noisy observations from an unknown lower-dimensional set S whose interesting topological features were lost during sampling. The output is a collection of data summaries that are used to estimate the topological features of S.

One approach to TDA is persistent homology (Edelsbrunner and Harer, 2010), a method for studying homology at multiple scales simultaneously. More precisely, it provides a framework and efficient algorithms to quantify the evolution of the topology of a family of nested topological spaces. Given a real-valued function f, such as the ones described in Section 2, persistent homology describes how the topology of the lower level sets {x : f(x) ≤ t} (or superlevel sets {x : f(x) ≥ t}) changes as t increases from −∞ to ∞ (or decreases from ∞ to −∞). This information is encoded in the persistence diagram, a multiset of points in the plane, each corresponding to the birth time and death time of a homological feature that exists for some interval of t.

This paper is devoted to the presentation of the R package TDA, which provides a user-friendly interface for the efficient algorithms of the C++ libraries GUDHI (Maria, 2014), Dionysus (Morozov, 2007), and PHAT (Bauer et al., 2012). The package can be downloaded from http://cran.r-project.org/web/packages/TDA/index.html.
In Section 2, we describe how to compute some widely studied functions that, starting from a point cloud, provide topological information about the underlying space: the distance function (distFct), the distance-to-a-measure function (dtm), the k Nearest Neighbor density estimator (knnDE), the kernel density estimator (kde), and the kernel distance (kernelDist). Section 3 is devoted to the computation of persistence diagrams: the function gridDiag can be used to compute persistent homology of sublevel sets (or superlevel sets) of functions evaluated over a grid of points; the function ripsDiag returns the persistence diagram of the Rips filtration built on top of a point cloud.

One of the key challenges in persistent homology is to find a way to separate the points of the persistence diagram representing topological noise from the points representing topological signal. Statistical methods for persistent homology provide an alternative to its exact computation. Knowing with high confidence that an approximated persistence diagram is close to the true (computationally infeasible) diagram is often enough for practical purposes. Fasy et al. (2014), Chazal et al. (2014c), and Chazal et al. (2014a) propose several statistical methods to construct confidence sets for persistence diagrams and other summary functions that allow us to separate topological signal from topological noise. These methods are implemented in the TDA package and described in Section 3.

Finally, the TDA package provides the implementation of an algorithm for density clustering. This method allows us to identify and visualize the spatial organization of the data, without specific knowledge about the data-generating mechanism and, in particular, without any a priori information about the number of clusters.
In Section 4, we describe the function clusterTree, which, given a density estimator, encodes the hierarchy of the connected components of its superlevel sets into a dendrogram, the cluster tree (Kpotufe and von Luxburg, 2011; Kent, 2013).

2. Distance Functions and Density Estimators

As a first example illustrating how to use the TDA package, we show how to compute distance functions and density estimators over a grid. The setting is the typical one in TDA: a set of points X = {x_1, . . . , x_n} ⊂ R^d has been sampled from some distribution P, and we are interested in recovering the topological features of the underlying space by studying some functions of the data. The following code generates a sample of 400 points from the unit circle and constructs a grid over which we will evaluate the functions.

> library("TDA")
> X = circleUnif(400)
> Xlim=c(-1.6, 1.6); Ylim=c(-1.7, 1.7); by=0.065
> Xseq=seq(Xlim[1], Xlim[2], by=by)
> Yseq=seq(Ylim[1], Ylim[2], by=by)
> Grid=expand.grid(Xseq, Yseq)

The TDA package provides implementations of the following functions:

(1) The distance function is defined for each y ∈ R^d as Δ(y) = inf_{x ∈ X} ‖x − y‖_2 and is computed for each point of the Grid with the following code:

> distance = distFct(X=X, Grid=Grid)
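To make the definition concrete, the distance function can also be computed directly in base R. The sketch below is purely illustrative (the function name distFctSketch and the toy data are ours, and this is not the package's implementation; distFct is the function provided by TDA):

```r
# Illustrative base-R sketch of the distance function
# Delta(y) = min over x in X of ||x - y||_2.
# Not the TDA package's implementation; for exposition only.
distFctSketch <- function(X, Grid) {
  X <- as.matrix(X); Grid <- as.matrix(Grid)
  apply(Grid, 1, function(y) {
    # squared Euclidean distances from y to every sample point, then min
    sqrt(min(rowSums(sweep(X, 2, y)^2)))
  })
}

pts  <- rbind(c(0, 0), c(1, 0))   # two sample points
grd  <- rbind(c(0, 0), c(2, 0))   # two query points
distFctSketch(pts, grd)           # 0 at (0,0); 1 at (2,0), nearest to (1,0)
```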
(2) Given a probability measure P on R^d, the distance to a measure (DTM) is defined for each y ∈ R^d as

d_{m0}(y) = sqrt( (1/m0) ∫_0^{m0} (G_y^{-1}(u))^2 du ),

where G_y(t) = P(‖X − y‖ ≤ t) and 0 < m0 < 1 is a smoothing parameter. The DTM can be seen as a smoothed version of the distance function. For more details, see Chazal et al. (2011). Given X = {x_1, . . . , x_n} ⊂ R^d, the empirical version of the DTM is

d̂_{m0}(y) = sqrt( (1/k) Σ_{x_i ∈ N_k(y)} ‖x_i − y‖^2 ),

where k = ⌈m0 n⌉ and N_k(y) is the set containing the k nearest neighbors of y among x_1, . . . , x_n. The DTM is computed for each point of the Grid with the following code:

> m0=0.1
> DTM=dtm(X=X, Grid=Grid, m0=m0)

(3) The k Nearest Neighbor density estimator, for each y ∈ R^d, is defined as

δ̂_k(y) = k / (n v_d r_k^d(y)),

where v_d is the volume of the Euclidean d-dimensional unit ball and r_k(x) is the Euclidean distance from point x to its kth closest neighbor among the points of X. It is computed for each point of the Grid with the following code:

> k=60
> kNN=knnDE(X=X, Grid=Grid, k=k)
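Both estimators rest on the same k-nearest-neighbor distances, so a small base-R sketch of each can make the formulas above concrete. This is an illustrative sketch only (the names dtmSketch and knnDESketch are ours, not the package's dtm()/knnDE()):

```r
# Illustrative base-R sketches of the empirical DTM and the kNN density
# estimator. Not the TDA package's dtm()/knnDE(); for exposition only.
knnDistances <- function(X, y) sort(sqrt(rowSums(sweep(X, 2, y)^2)))

dtmSketch <- function(X, Grid, m0) {
  k <- ceiling(m0 * nrow(as.matrix(X)))
  apply(as.matrix(Grid), 1, function(y) {
    d <- knnDistances(as.matrix(X), y)
    sqrt(mean(d[seq_len(k)]^2))        # root mean squared distance to k-NN
  })
}

knnDESketch <- function(X, Grid, k) {
  X <- as.matrix(X); d <- ncol(X)
  vd <- pi^(d / 2) / gamma(d / 2 + 1)  # volume of the unit d-ball
  apply(as.matrix(Grid), 1, function(y) {
    rk <- knnDistances(X, y)[k]        # distance to the k-th nearest point
    k / (nrow(X) * vd * rk^d)
  })
}

pts <- rbind(c(0, 0), c(1, 0), c(2, 0), c(3, 0))
dtmSketch(pts, rbind(c(0, 0)), m0 = 0.5)   # k = 2: sqrt((0^2 + 1^2)/2)
knnDESketch(pts, rbind(c(0, 0)), k = 2)    # 2 / (4 * pi * 1^2)
```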
(4) The Gaussian kernel density estimator (KDE), for each y ∈ R^d, is defined as

p̂_h(y) = (1 / (n (√(2π) h)^d)) Σ_{i=1}^n exp( −‖y − x_i‖_2^2 / (2h^2) ),

where h is a smoothing parameter. It is computed for each point of the Grid with the following code:

> h=0.3
> KDE=kde(X=X, Grid=Grid, h=h)

(5) The kernel distance estimator, for each y ∈ R^d, is defined as

κ̂_h(y) = sqrt( (1/n^2) Σ_{i=1}^n Σ_{j=1}^n K_h(x_i, x_j) + K_h(y, y) − (2/n) Σ_{i=1}^n K_h(y, x_i) ),

where K_h(x, y) = exp( −‖x − y‖_2^2 / (2h^2) ) is the Gaussian kernel with smoothing parameter h. The kernel distance is computed for each point of the Grid with the following code:

> h=0.3
> Kdist=kernelDist(X=X, Grid=Grid, h=h)

For this two-dimensional example, we can visualize the functions using persp from the graphics package. For example, the following code produces the KDE plot in Figure 1:
> persp(Xseq, Yseq, matrix(KDE, ncol=length(Yseq), nrow=length(Xseq)),
+   xlab="", ylab="", zlab="", theta=-20, phi=35, ltheta=50, col=2, border=NA,
+   main="KDE", d=0.5, scale=FALSE, expand=3, shade=0.9)

2.1. Bootstrap Confidence Bands

We can construct a (1 − α) confidence band for a function using the bootstrap algorithm, which we briefly describe using the kernel density estimator:

(1) Given a sample X = {x_1, . . . , x_n}, compute the kernel density estimator p̂_h;
(2) Draw X* = {x*_1, . . . , x*_n} from X = {x_1, . . . , x_n} (with replacement), and compute θ* = √n ‖p̂*_h(x) − p̂_h(x)‖_∞, where p̂*_h is the density estimator computed using X*;
(3) Repeat the previous step B times to obtain θ*_1, . . . , θ*_B;
(4) Compute q_α = inf{ q : (1/B) Σ_{j=1}^B I(θ*_j ≥ q) ≤ α };
(5) The (1 − α) confidence band for E[p̂_h] is [ p̂_h − q_α/√n , p̂_h + q_α/√n ].

Fasy et al. (2014) and Chazal et al.
(2014a) prove the validity of the bootstrap algorithm for kernel density estimators, the distance-to-a-measure, and the kernel distance, and use it in the framework of persistent homology.

Fig. 1. Distance functions and density estimators evaluated over a grid of points (panels: Sample X, Distance Function, DTM, kNN, KDE, Kernel Distance).

The bootstrap algorithm is implemented in the function bootstrapBand, which provides the option of parallelizing the algorithm (parallel=TRUE) using the package parallel. The following code computes a 90% confidence band for E[p̂_h], shown in Figure 2.

> band=bootstrapBand(X=X, FUN=kde, Grid=Grid, B=100,
+   parallel=FALSE, alpha=0.1, h=h)

Fig. 2.
The 90% confidence band for E[p̂_h] has the form [ℓ, u] = [ p̂_h − q_α/√n , p̂_h + q_α/√n ]. The plot on the right shows a cross-section of the functions: the red surface is the KDE p̂_h; the pink surfaces are ℓ and u.

3. Persistent Homology

We provide an informal description of the implemented methods of persistent homology. We assume the reader is familiar with the basic concepts and, for a rigorous exposition, we refer to the textbook Edelsbrunner and Harer (2010).

3.1. Persistent Homology Over a Grid

In this section, we describe how to use the gridDiag function to compute the persistent homology of sublevel (and superlevel) sets of the functions described in Section 2. The function gridDiag evaluates a given real-valued function over a triangulated grid, constructs a filtration of simplices using the values of the function, and computes the persistent homology of the filtration. From version 1.2, gridDiag works in arbitrary dimension. The core of the function is written in C++, and the user can choose to compute persistence diagrams using either the Dionysus library or the PHAT library.

The following code computes the persistent homology of the superlevel sets (sublevel=FALSE) of the kernel density estimator (FUN=kde, h=0.3) using the point cloud stored in the matrix X from the previous example. The same code would work for the other functions defined in Section 2 (it is sufficient to replace kde and its smoothing parameter h with another function and the corresponding parameter). The function gridDiag returns an object of the class "diagram". The other inputs are the features of the grid over which the kde is evaluated (lim and by), the smoothing parameter h, and a logical variable that indicates whether a progress bar should be printed (printProgress).
> Diag = gridDiag( X=X, FUN=kde, h=0.3, lim=cbind(Xlim,Ylim), by=by,
+   sublevel=F, library="Dionysus", printProgress=FALSE )$diagram

We plot the data and the diagram, using the function plot, implemented as a standard S3 method for objects of the class "diagram". The following command produces the third plot in Figure 3.

> plot(Diag, band=2*band$width, main="KDE Diagram")

The option band=2*band$width produces a pink confidence band for the persistence diagram, using the confidence band constructed for the corresponding kernel density estimator in the previous section. The features above the band can be interpreted as representing significant homological features, while points in the band are not significantly different from noise. The validity of the bootstrap confidence band for persistence diagrams of KDE, DTM, and kernel distance derives from the Stability Theorem (Chazal et al., 2012) and is discussed in detail in Fasy et al. (2014) and Chazal et al. (2014a).

The function plot for the class "diagram" provides the options of rotating the diagram (rotated=TRUE), drawing the barcode in place of the diagram (barcode=TRUE), as well as other standard graphical options. See Figure 4.

3.2. Rips Diagrams

The Vietoris-Rips complex R(X, ε) consists of simplices with vertices in X = { x_1, . . .
, x_n } ⊂ R^d and diameter at most ε. In other words, a simplex σ is included in the complex if each pair of vertices in σ is at most ε apart. The sequence of Rips complexes obtained by gradually increasing the radius ε creates a filtration.

Fig. 3. The plot on the right shows the persistence diagram of the superlevel sets of the KDE (panels: Sample X, KDE, KDE Diagram). Black points represent connected components and red triangles represent loops. The features are born at high levels of the density and die at lower levels. The pink 90% confidence band separates significant features from noise.

> par(mfrow=c(1,2), mai=c(0.8,0.8,0.3,0.1))
> plot(Diag, rotated=T, band=band$width, main="Rotated Diagram")
> legend(0.13,0.25, c("component", "loop"), col=c(1,2), pch=c(20,2), cex=0.9, pt.lwd=2)
> plot(Diag, barcode=T, main="Barcode")
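The Rips membership condition (every pair of vertices at most ε apart) is easy to check directly. The following base-R sketch, which enumerates the edges and triangles of R(X, ε) for a small point set, is purely illustrative; the name ripsSimplices is ours, and this is not how GUDHI or Dionysus build the filtration:

```r
# Illustrative base-R sketch: enumerate the edges and triangles of the
# Vietoris-Rips complex R(X, eps). A simplex enters the complex when all
# pairwise distances among its vertices are at most eps.
# Assumes at least 3 points; not the TDA package's code.
ripsSimplices <- function(X, eps) {
  D <- as.matrix(dist(X))            # pairwise Euclidean distances
  n <- nrow(D)
  edges <- which(upper.tri(D) & D <= eps, arr.ind = TRUE)
  triangles <- t(combn(n, 3))
  keep <- apply(triangles, 1, function(s) max(D[s, s]) <= eps)
  list(edges = edges, triangles = triangles[keep, , drop = FALSE])
}

pts <- rbind(c(0, 0), c(1, 0), c(0, 1))  # right triangle; hypotenuse sqrt(2)
cpx <- ripsSimplices(pts, eps = 1.2)
nrow(cpx$edges)       # the two legs enter at eps = 1.2; the hypotenuse does not
nrow(cpx$triangles)   # no triangle yet: it needs eps >= sqrt(2)
```

Increasing eps past sqrt(2) adds the third edge and the filled triangle, which is exactly the nesting that makes the Rips complexes a filtration.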
Fig. 4. Rotated Persistence Diagram and Barcode.

The ripsDiag function computes the persistence diagram of the Rips filtration built on top of a point cloud. The user can choose to compute the Rips persistence diagram using either the C++ library GUDHI or Dionysus. The following code generates 60 points from each of two circles:

> Circle1 = circleUnif(60)
> Circle2 = circleUnif(60, r=2) + 3
> Circles = rbind(Circle1, Circle2)

We specify the limit of the Rips filtration and the maximum dimension of the homological features we are interested in (0 for components, 1 for loops, 2 for voids, etc.):

> maxscale=5 # limit of the filtration
> maxdimension=1 # components and loops

and we generate the persistence diagram:
> Diag=ripsDiag(X=Circles, maxdimension, maxscale,
+   library="GUDHI", printProgress=FALSE)$diagram

Alternatively, using the option dist="arbitrary" in ripsDiag(), the input X can be an n × n matrix of distances. This option is useful when the user wants to consider a Rips filtration constructed using an arbitrary distance, and it is currently only available for the option library="Dionysus". Finally, we plot the data and the diagram:

> par(mfrow=c(1,2), mai=c(0.8,0.8,0.3,0.1))
> plot(Circles, pch=16, xlab="", ylab="", main="Sample")
> plot(Diag, main="Rips Diagram")
> legend(2.5,2, c("component", "loop"), col=c(1,2), pch=c(20,2), cex=0.9, pt.lwd=2)

Fig. 5. Rips persistence diagram (panels: Sample, Rips Diagram). Black points represent connected components and red triangles represent loops.

3.3.
Bottleneck and Wasserstein Distances

Standard metrics for measuring the distance between two persistence diagrams are the bottleneck distance and the pth Wasserstein distance (Edelsbrunner and Harer, 2010). The TDA package includes the functions bottleneck and wasserstein, which are R wrappers of the Dionysus functions "bottleneck distance" and "wasserstein distance." We generate two persistence diagrams of the Rips filtrations built on top of the two (separate) circles of the previous example,

> Diag1=ripsDiag(Circle1, maxdimension=1, maxscale=5, printProgress=FALSE)$diagram
> Diag2=ripsDiag(Circle2, maxdimension=1, maxscale=5, printProgress=FALSE)$diagram

and we compute the bottleneck distance and the 2nd Wasserstein distance between the two diagrams. In the following code, the option dimension=1 specifies that the distances between diagrams are computed using only one-dimensional homological features (loops).

> print( bottleneck(Diag1, Diag2, dimension=1) )
[1] 1.15569
> print( wasserstein(Diag1, Diag2, p=2, dimension=1) )
[1] 1.680847

3.4.
Landscapes and Silhouettes

Persistence landscapes and silhouettes are real-valued functions that further summarize the information contained in a persistence diagram. They have been introduced and studied in Bubenik (2012), Chazal et al. (2014c), and Chazal et al. (2014b). We briefly introduce the two functions.

Landscape. The persistence landscape is a sequence of continuous, piecewise linear functions λ : Z+ × R → R that provide an encoding of a persistence diagram. To define the landscape, consider the set of functions created by tenting each point x = (x_1, x_2) = ( (b + d)/2, (d − b)/2 ) representing a birth-death pair (b, d) in the persistence diagram D as follows:

Λ_x(t) = t − x_1 + x_2   if t ∈ [x_1 − x_2, x_1];
         x_1 + x_2 − t   if t ∈ (x_1, x_1 + x_2];
         0               otherwise;
       = t − b   if t ∈ [b, (b + d)/2];
         d − t   if t ∈ ((b + d)/2, d];
         0       otherwise.    (1)

We obtain an arrangement of piecewise linear curves by overlaying the graphs of the functions {Λ_x}_x; see Figure 6 (left). The persistence landscape of D is a summary of this arrangement. Formally, the persistence landscape of D is the collection of functions

λ(k, t) = kmax_x Λ_x(t),  t ∈ [0, T], k ∈ N,    (2)

where kmax is the kth largest value in the set; in particular, 1max is the usual maximum function. See Figure 6 (middle).

Silhouette. Consider a persistence diagram with N off-diagonal points {(b_j, d_j)}_{j=1}^N. For every 0 < p < ∞ we define the power-weighted silhouette

φ^(p)(t) = ( Σ_{j=1}^N |d_j − b_j|^p Λ_j(t) ) / ( Σ_{j=1}^N |d_j − b_j|^p ).

The value p can be thought of as a trade-off parameter between uniformly treating all pairs in the persistence diagram and considering only the most persistent pairs. Specifically, when p is small, φ^(p)(t) is dominated by the effect of low persistence features. Conversely, when p is large, φ^(p)(t) is dominated by the most persistent features; see Figure 6 (right).
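Definitions (1) and (2) and the silhouette formula translate almost line for line into base R. The sketch below is illustrative only (the names tent, landscapeSketch, and silhouetteSketch are ours, not the package's landscape()/silhouette(), which evaluate a "diagram" object):

```r
# Illustrative base-R sketch of the tent functions, the k-th landscape,
# and the power-weighted silhouette. For exposition only.
# Tent over a birth-death pair: t - b on [b,(b+d)/2], d - t on ((b+d)/2,d],
# and 0 elsewhere, i.e. min(t - b, d - t) clipped at 0.
tent <- function(b, d, t) pmax(0, pmin(t - b, d - t))

landscapeSketch <- function(births, deaths, k, tseq) {
  vapply(tseq, function(t) {
    vals <- sort(tent(births, deaths, t), decreasing = TRUE)
    if (k <= length(vals)) vals[k] else 0   # k-th largest tent value
  }, numeric(1))
}

silhouetteSketch <- function(births, deaths, p, tseq) {
  w <- abs(deaths - births)^p               # persistence-based weights
  vapply(tseq, function(t) {
    sum(w * tent(births, deaths, t)) / sum(w)
  }, numeric(1))
}

bs <- c(0, 1); ds <- c(2, 5)                # two birth-death pairs
landscapeSketch(bs, ds, k = 1, tseq = 1)    # max tent value at t = 1 is 1
silhouetteSketch(bs, ds, p = 1, tseq = 3)   # (2*0 + 4*2)/6 = 4/3
```

Evaluating either sketch over a fine tseq grid reproduces the piecewise linear curves of Figure 6 up to the choice of diagram.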
The landscape and silhouette functions can be evaluated over a one-dimensional grid of points tseq using the functions landscape and silhouette. In the following code, we use the persistence diagram from Figure 5 to construct the corresponding landscape and silhouette for one-dimensional features (dimension=1). The option KK=1 specifies that we are interested in the 1st landscape function, and p=1 is the power of the weights in the definition of the silhouette function.

Fig. 6. Left: we use the rotated axes to represent a persistence diagram D. A feature (b, d) ∈ D is represented by the point ( (b+d)/2, (d−b)/2 ) (pink). In words, the x-coordinate is the average parameter value over which the feature exists, and the y-coordinate is the half-life of the feature. Middle: the blue curve is the landscape λ(1, ·). Right: the blue curve is the silhouette φ^(1)(·). (Panels: Arrangement of Λ_x, 1st Landscape, Silhouette (p=1).)

> tseq=seq(0,maxscale, length=1000) #domain
> Land=landscape(Diag, dimension=1, KK=1, tseq)
> Sil=silhouette(Diag, p=1, dimension=1, tseq)

The functions landscape and silhouette return real-valued vectors, which can be simply plotted with plot(tseq, Land, type="l"); plot(tseq, Sil, type="l"). See Figure 7.

Fig. 7.
Landscape and Silhouette of the one-dimensional features of the diagram of Figure 5.

3.5. Confidence Bands for Landscapes and Silhouettes

Recent results in Chazal et al. (2014c) and Chazal et al. (2014b) show how to construct confidence bands for landscapes and silhouettes, using a bootstrap algorithm (specifically, the multiplier bootstrap). This strategy is useful in the following scenario. We have a very large dataset with N points. There is a diagram D and landscape λ corresponding to some filtration built on the data. When N is large, computing D is prohibitive. Instead, we draw n subsamples, each of size m. We compute a diagram and a landscape for each subsample, yielding landscapes λ_1, . . . , λ_n. (Assuming m is much smaller than N, these subsamples are essentially independent and identically distributed.) Then we compute (1/n) Σ_i λ_i, an estimate of E(λ_i), which can be regarded as an approximation of λ. The function multipBootstrap uses the landscapes λ_1, . . . , λ_n to construct a confidence band for E(λ_i). The same strategy is valid for silhouette functions.

We illustrate the method with a simple example. First we sample N points from two circles:

> N=4000
> XX1 = circleUnif(N/2)
> XX2 = circleUnif(N/2, r=2) + 3
> X=rbind(XX1,XX2)

Then we specify the number of subsamples n, the subsample size m, and we create the objects that will store the n diagrams and landscapes:
> m = 80   # subsample size
> n = 10   # we will compute n landscapes using subsamples of size m
> tseq = seq(0, maxscale, length=500)   # domain of landscapes
> Diags = list()   # here we store n Rips diagrams
> Lands = matrix(0, nrow=n, ncol=length(tseq))   # here we store n landscapes

For n times, we subsample from the large point cloud and compute n Rips diagrams and the corresponding 1st landscape functions (KK=1), using one-dimensional features (dimension=1):

> for (i in 1:n){
+   subX = X[sample(1:N, m), ]
+   Diags[[i]] = ripsDiag(subX, maxdimension=1, maxscale=5)$diagram
+   Lands[i, ] = landscape(Diags[[i]], dimension=1, KK=1, tseq)
+ }

Finally we use the n landscapes to construct a 95% confidence band for the mean landscape,

> bootLand = multipBootstrap(Lands, B=100, alpha=0.05, parallel=FALSE)

which is plotted by the following code. See Figure 8.
> plot(tseq, bootLand$mean, main="Mean Landscape with 95% band")
> polygon(c(tseq, rev(tseq)),
+   c(bootLand$band[, 1], rev(bootLand$band[, 2])), col="pink")
> lines(tseq, bootLand$mean, lwd=2, col=2)

3.6. Selection of Smoothing Parameters

An unsolved problem in topological inference is how to choose the smoothing parameters, for example h for the KDE and m0 for the DTM. Chazal et al. (2014a) suggest the following method, which we describe here for the kernel density estimator, but which also works for the kernel distance and the distance-to-a-measure. Let ℓ_1(h), ℓ_2(h), ... be the lifetimes of the features of a persistence diagram at scale h. Let q_α(h)/√n be the width of the confidence band for the kernel density estimator at scale h, as described in Section 2.1. We define two quantities that measure the amount of significant information at level h:

• The number of significant features, N(h) = #{ i : ℓ_i(h) > 2 q_α(h)/√n };
Fig. 8. 95% confidence band for the mean landscape function (panels: "Large Sample from Circles"; "Mean Landscape, 95% band").

• The total significant persistence, S(h) = Σ_i [ ℓ_i(h) − 2 q_α(h)/√n ]_+ .

These measures are small when h is small, since q_α(h) is large. On the other hand, they are small when h is large, since all of the features of the KDE smooth out. Thus we have a kind of topological bias-variance tradeoff. We choose h to maximize N(h) or S(h).

The method is implemented in the function maxPersistence, as illustrated in the following example. First, we sample 1600 points from two circles (plus some clutter noise) and we specify the limits of the grid over which the KDE is evaluated:
> XX1 = circleUnif(600)
> XX2 = circleUnif(1000, r=1.5) + 2.5
> noise = cbind(runif(80, -2, 5), runif(80, -2, 5))
> X = rbind(XX1, XX2, noise)
> # Grid limits
> Xlim = c(-2, 5)
> Ylim = c(-2, 5)
> by = 0.2

Then we specify a sequence of smoothing parameters among which we will select the optimal one, the number of bootstrap iterations, and the level of the confidence bands to be computed:

> parametersKDE = seq(0.1, 0.6, by=0.05)
> B = 50   # number of bootstrap iterations. Should be large.
> alpha = 0.1   # level of the confidence bands

The function maxPersistence can be parallelized (parallel=TRUE) and a progress bar can be printed (printProgress=TRUE):

> maxKDE = maxPersistence(kde, parametersKDE, X, lim=cbind(Xlim, Ylim),
+   by=by, sublevel=F, B=B, alpha=alpha,
+   parallel=TRUE, bandFUN="bootstrapBand")

The S3 methods summary and plot are implemented for the class "maxPersistence".
We can display the values of the parameters that maximize the two criteria:

> print(summary(maxKDE))

Call:
maxPersistence(FUN = kde, parameters = parametersKDE, X = X,
    lim = cbind(Xlim, Ylim), by = by, sublevel = F, B = B,
    alpha = alpha, bandFUN = "bootstrapBand", parallel = TRUE)

The number of significant features is maximized by
[1] 0.20 0.25 0.30

The total significant persistence is maximized by
[1] 0.1

and produce the summary plot of Figure 9:

> par(mfrow=c(1,2), mai=c(0.8,0.8,0.35,0.3))
> plot(X, pch=16, cex=0.5, main="Two Circles + Noise", xlab="", ylab="")
> plot(maxKDE, main="Max Persistence (KDE)")
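For intuition, the two criteria can also be computed by hand from a diagram's lifetimes once the band width is known. The following sketch uses made-up numbers; the helper functions and the band-width value are illustrative, not part of TDA:

```r
# Sketch of the two selection criteria of Section 3.6 (hypothetical helpers).
# 'lifetimes' plays the role of the l_i(h); 'w' plays the role of q_alpha(h)/sqrt(n).
nSignificant <- function(lifetimes, w) sum(lifetimes > 2 * w)           # N(h)
sSignificant <- function(lifetimes, w) sum(pmax(lifetimes - 2 * w, 0))  # S(h)

lifetimes <- c(0.50, 0.30, 0.05, 0.02)  # made-up lifetimes at some scale h
nSignificant(lifetimes, 0.06)  # 2: two features exceed twice the band width
sSignificant(lifetimes, 0.06)  # 0.56: (0.50 - 0.12) + (0.30 - 0.12)
```

maxPersistence evaluates this kind of tradeoff over a whole grid of smoothing parameters, computing a bootstrap band for each one.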
Fig. 9. Max Persistence Method for the selection of smoothing parameters. For each value of the smoothing parameter we display the persistence of the corresponding homological features, along with a (pink) confidence band that separates the statistically significant features from the topological noise.

4. Density Clustering

The final example of this paper demonstrates the use of the function clusterTree, which is an implementation of Algorithm 1 in Kent et al. (2013). First, we briefly describe the task of density clustering; we refer the reader to Kent (2013) for a more rigorous and complete description.

Let f be the density of the probability distribution P generating the observed sample X = {x_1, ..., x_n} ⊂ R^d. For a threshold value λ > 0, the corresponding super level set of f is L_f(λ) := cl({x ∈ R^d : f(x) > λ}), and its d-dimensional subsets are called high-density regions. The high-density clusters of P are the maximal connected subsets of L_f(λ). By considering all the level sets simultaneously (from λ = 0 to λ = ∞), we can record the evolution and the hierarchy of the high-density clusters of P. This naturally leads to the notion of the cluster density tree of P (see, e.g., Hartigan (1981)), defined as the collection of sets T := {L_f(λ) : λ ≥ 0}, which satisfies the tree property: A, B ∈ T implies that A ⊂ B or B ⊂ A or A ∩ B = ∅. We will refer to this construction as the λ-tree. Alternatively, Kent et al. (2013) introduced the α-tree and the κ-tree, which facilitate the interpretation of the tree by precisely encoding the probability content of each tree branch rather than the density level.
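In practice f is unknown and must be estimated. As an aside, a kNN density estimate can be written in a few lines of base R using the standard formula f̂(x) = k / (n · v_d · r_k(x)^d), where r_k(x) is the distance from x to its k-th nearest neighbor and v_d is the volume of the unit d-ball; this sketch is for illustration only and is not TDA's implementation:

```r
# Sketch: kNN density estimate evaluated at the sample points (illustrative).
knnDensity <- function(X, k) {
  n <- nrow(X); d <- ncol(X)
  D <- as.matrix(dist(X))                            # pairwise distances
  rk <- apply(D, 1, function(row) sort(row)[k + 1])  # sort(row)[1] is the point itself
  vd <- pi^(d / 2) / gamma(d / 2 + 1)                # volume of the unit d-ball
  k / (n * vd * rk^d)
}

set.seed(1)
X <- rbind(cbind(rnorm(200), rnorm(200)), c(50, 50))  # one cluster plus an outlier
f <- knnDensity(X, k = 10)
# the outlier's estimated density is far below that of the cluster points
```

Thresholding such estimated densities at a level λ gives an empirical stand-in for the super level set L_f(λ) defined above.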
Cluster trees are particularly useful for high-dimensional data, whose spatial organization is difficult to represent. We illustrate the strategy with a simple example. The first step is to generate a 2D point cloud from three (not so well) separated clusters (see the top left plot of Figure 10):

> X1 = cbind(rnorm(300, 1, .8), rnorm(300, 5, 0.8))
> X2 = cbind(rnorm(300, 3.5, .8), rnorm(300, 5, 0.8))
> X3 = cbind(rnorm(300, 6, 1), rnorm(300, 1, 1))
> XX = rbind(X1, X2, X3)

Then we use the function clusterTree to compute cluster trees using the k Nearest Neighbors density estimator (density="knn", k=100 nearest neighbors):

> Tree = clusterTree(XX, k=100, density="knn")

Alternatively, we can estimate the density using a Gaussian kernel density estimator (density="kde", h=0.3 for the smoothing parameter). Even when the KDE is used to estimate the density, we have to provide the option k=100, so that the algorithm can compute the connected components at each level of the density using a k Nearest Neighbors graph. The "clusterTree" object Tree contains information about the λ-tree, the α-tree, and the κ-tree. The function plot for objects of the class "clusterTree" produces the plot in the middle of Figure 10.
> plot(Tree, type="lambda", main="lambda Tree (knn)")
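The connected-components step mentioned above (at a fixed density level, clusters are the components of a kNN graph restricted to the high-density points) can also be sketched in base R. Everything here is illustrative, assumes precomputed density values, and is not TDA's implementation:

```r
# Sketch: clusters at one level lambda as connected components of a kNN graph
# restricted to the points whose estimated density exceeds lambda.
levelComponents <- function(X, k, dens, lambda) {
  keep <- which(dens > lambda)               # points in the super level set
  D <- as.matrix(dist(X[keep, , drop = FALSE]))
  # adjacency: i ~ j if j is among i's k nearest neighbors (then symmetrized)
  adj <- apply(D, 1, function(row) rank(row, ties.method = "first") <= k + 1)
  adj <- adj | t(adj)
  comp <- integer(length(keep)); cur <- 0L
  for (s in seq_along(keep)) {               # breadth-first labeling
    if (comp[s] == 0L) {
      cur <- cur + 1L
      frontier <- s
      while (length(frontier) > 0) {
        comp[frontier] <- cur
        nbrs <- which(rowSums(adj[, frontier, drop = FALSE]) > 0)
        frontier <- nbrs[comp[nbrs] == 0L]
      }
    }
  }
  split(keep, comp)                          # clusters, in the original indexing
}

set.seed(2)
X <- rbind(cbind(rnorm(30), rnorm(30)), cbind(rnorm(30, 10), rnorm(30, 10)))
comps <- levelComponents(X, k = 8, dens = rep(1, 60), lambda = 0.5)
# two well-separated blobs give two components
```

Sweeping λ from 0 upward and tracking how these components split is what produces the cluster tree.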
Fig. 10. The λ-tree of the k Nearest Neighbor density estimator.

5. Conclusions

The TDA package makes topological data analysis accessible to a larger audience by giving R users access to the efficient algorithms for the computation of persistent homology in the C++ libraries GUDHI, Dionysus, and PHAT.
Moreover, the package includes the implementation of an algorithm for density clustering and a series of tools for the statistical analysis of persistent homology, including the methods described in Fasy et al. (2014), Chazal et al. (2014c), and Chazal et al. (2014a).

We would like TDA to become a comprehensive assistant for the analysis of data from a topological point of view. The package will keep providing an R interface for the most efficient libraries for TDA, and we plan to include new functionalities for all the steps of the data analysis process: new tools for summarizing the topological information contained in the data, methods for the statistical analysis of the summaries, and functions for displaying the results.

Acknowledgements

The authors would like to thank Frédéric Chazal, Jessi Cisewski, Bertrand Michel, Alessandro Rinaldo, and Larry Wasserman for their insightful discussions and advice. The authors also thank Dmitriy Morozov for developing Dionysus, and Ulrich Bauer, Michael Kerber, and Jan Reininghaus for developing PHAT, and for making these packages freely available.

References

Bauer, U., Kerber, M., Reininghaus, J., 2012. PHAT, a software library for persistent homology. https://code.google.com/p/phat/.

Bubenik, P., 2012. Statistical topological data analysis using persistence landscapes. arXiv preprint.

Carlsson, G., 2009. Topology and data. Bulletin of the American Mathematical Society 46 (2), 255–308.

Chazal, F., Cohen-Steiner, D., Mérigot, Q., 2011. Geometric inference for probability measures. Foundations of Computational Mathematics 11 (6), 733–751.

Chazal, F., de Silva, V., Glisse, M., Oudot, S., 2012. The structure and stability of persistence modules. arXiv preprint.

Chazal, F., Fasy, B. T., Lecci, F., Michel, B., Rinaldo, A., Wasserman, L., 2014a. Robust topological inference: Distance-to-a-measure and kernel distance. Technical report.

Chazal, F., Fasy, B. T., Lecci, F., Michel, B., Rinaldo, A., Wasserman, L., 2014b. Subsampling methods for persistent homology. arXiv preprint.

Chazal, F., Fasy, B. T., Lecci, F., Rinaldo, A., Wasserman, L., 2014c. Stochastic convergence of persistence landscapes and silhouettes. In: Annual Symposium on Computational Geometry. ACM, pp. 474–483.

Edelsbrunner, H., Harer, J., 2010. Computational Topology: An Introduction. American Mathematical Society.

Fasy, B. T., Lecci, F., Rinaldo, A., Wasserman, L., Balakrishnan, S., Singh, A., 2014. Confidence sets for persistence diagrams. To appear in The Annals of Statistics.

Hartigan, J. A., 1981. Consistency of single linkage for high-density clusters. Journal of the American Statistical Association 76 (374), 388–394.

Kent, B., 2013. Level set trees for applied statistics. Ph.D. thesis, Department of Statistics, Carnegie Mellon University.

Kent, B. P., Rinaldo, A., Verstynen, T., 2013. DeBaCl: A Python package for interactive density-based clustering. arXiv preprint.

Kpotufe, S., von Luxburg, U., 2011. Pruning nearest neighbor cluster trees. In: International Conference on Machine Learning (ICML).

Maria, C., 2014. GUDHI, simplicial complexes and persistent homology packages. https://project.inria.fr/gudhi/software/.

Morozov, D., 2007. Dionysus, a C++ library for computing persistent homology. http://www.mrzv.org/software/dionysus/.