Minimax Manifold Estimation

October 25, 2018

Christopher R. Genovese (genovese@stat.cmu.edu)
Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213, USA

Marco Perone-Pacifico (marco.peronepacifico@uniroma1.it)
Department of Statistical Sciences, Sapienza University of Rome, Rome, Italy

Isabella Verdinelli (isabella@stat.cmu.edu)
Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213, USA, and Department of Statistical Sciences, Sapienza University of Rome, Rome, Italy

Larry Wasserman (larry@stat.cmu.edu)
Department of Statistics and Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA

Abstract

We find the minimax rate of convergence in Hausdorff distance for estimating a manifold $M$ of dimension $d$ embedded in $\mathbb{R}^D$ given a noisy sample from the manifold. Under certain conditions, we show that the optimal rate of convergence is $n^{-2/(2+d)}$. Thus, the minimax rate depends only on the dimension of the manifold, not on the dimension of the space in which $M$ is embedded.

Keywords: Manifold learning, Minimax estimation.

1. Introduction

We consider the problem of estimating a manifold $M$ given noisy observations near the manifold. The observed data are a random sample $Y_1, \ldots, Y_n$ where $Y_i \in \mathbb{R}^D$. The model for the data is
$$Y_i = \xi_i + Z_i \tag{1}$$
where $\xi_1, \ldots, \xi_n$ are unobserved variables drawn from a distribution supported on a manifold $M$ with dimension $d < D$. The noise variables $Z_1, \ldots, Z_n$ are drawn from a distribution $F$. Our main assumption is that $M$ is a compact, $d$-dimensional, smooth Riemannian submanifold in $\mathbb{R}^D$; the precise conditions on $M$ are given in Section 2.1.

A manifold $M$ and a distribution for $(\xi, Z)$ induce a distribution $Q \equiv Q_M$ for $Y$.
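As an illustration of model (1), the following minimal sketch draws a sample near a circle in the plane ($d = 1$, $D = 2$). It assumes that $\xi_i$ is uniform on the circle and that $Z_i$ is uniform along the normal direction within distance $\sigma$, which is the noise model adopted in Section 2.2. The function name `sample_noisy_circle` and its parameters are hypothetical and are not part of the paper.

```python
import math
import random


def sample_noisy_circle(n, radius=1.0, sigma=0.1, seed=0):
    """Draw n points Y_i = xi_i + Z_i near a circle of the given radius in R^2.

    Assumptions (illustrative only): xi_i is uniform on the circle, and Z_i is
    uniform on the normal segment of half-length sigma through xi_i.
    Returns (ys, xis): the observed points and the unobserved points on the circle.
    """
    rng = random.Random(seed)
    ys, xis = [], []
    for _ in range(n):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        # Unit normal to the circle at angle theta (points radially outward).
        normal = (math.cos(theta), math.sin(theta))
        xi = (radius * normal[0], radius * normal[1])
        # Signed offset along the normal: perpendicular noise.
        t = rng.uniform(-sigma, sigma)
        ys.append((xi[0] + t * normal[0], xi[1] + t * normal[1]))
        xis.append(xi)
    return ys, xis


if __name__ == "__main__":
    ys, _ = sample_noisy_circle(5, radius=1.0, sigma=0.1, seed=42)
    for y in ys:
        print(y)
```

Every observed point lies within distance `sigma` of the circle, so the sample is supported on an annulus of width `2 * sigma` around it.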
In Section 2.2, we define a class of such distributions
$$\mathcal{Q} = \{ Q_M : M \in \mathcal{M} \} \tag{2}$$
where $\mathcal{M}$ is a set of manifolds. Given two sets $A$ and $B$, the Hausdorff distance between $A$ and $B$ is
$$H(A, B) = \inf\{ \epsilon : A \subset B \oplus \epsilon \ \text{and}\ B \subset A \oplus \epsilon \} \tag{3}$$
where
$$A \oplus \epsilon = \bigcup_{x \in A} B_D(x, \epsilon) \tag{4}$$
and $B_D(x, \epsilon)$ is an open ball in $\mathbb{R}^D$ centered at $x$ with radius $\epsilon$. We are interested in the minimax risk
$$R_n(\mathcal{Q}) = \inf_{\widehat{M}} \sup_{Q \in \mathcal{Q}} \mathbb{E}_Q[H(\widehat{M}, M)] \tag{5}$$
where the infimum is over all estimators $\widehat{M}$. By an estimator $\widehat{M}$ we mean a measurable function of $Y_1, \ldots, Y_n$ taking values in the set of all manifolds.

Our first main result is the following minimax lower bound, which is proved in Section 3.

Theorem 1. Under the conditions given in Section 2, there is a constant $C_1 > 0$ such that, for all large $n$,
$$\inf_{\widehat{M}} \sup_{Q \in \mathcal{Q}} \mathbb{E}_Q\left[ H(\widehat{M}, M) \right] \ge C_1 \left( \frac{1}{n} \right)^{\frac{2}{2+d}} \tag{6}$$
where the infimum is over all estimators $\widehat{M}$.

Thus, no method of estimating $M$ can have an expected Hausdorff distance smaller than the stated bound. Note that the rate depends on $d$ but not on $D$, even though the support of the distribution $Q$ for $Y$ has dimension $D$.

Our second result is the following upper bound, which is proved in Section 4.

Theorem 2. Under the conditions given in Section 2, there exists an estimator $\widehat{M}$ such that, for all large $n$,
$$\sup_{Q \in \mathcal{Q}} \mathbb{E}_Q\left[ H(\widehat{M}, M) \right] \le C_2 \left( \frac{\log n}{n} \right)^{\frac{2}{2+d}} \tag{7}$$
for some $C_2 > 0$.

Thus the rate is tight, up to logarithmic factors. The estimator in Theorem 2 is of theoretical interest because it establishes that the lower bound is tight. However, the estimator constructed in the proof of that theorem is not practical, and so in Section 5 we construct a very simple estimator $\widehat{M}$ such that
$$\sup_{Q \in \mathcal{Q}} \mathbb{E}_Q\left[ H(\widehat{M}, M) \right] \le \left( \frac{C \log n}{n} \right)^{1/D}. \tag{8}$$
This is slower than the minimax rate, but the estimator is computationally very simple and requires no knowledge of $d$ or the smoothness of $M$.

Related Work.
There is a vast literature on manifold estimation. Much of the literature deals with using manifolds for the purpose of dimension reduction. See, for example, Baraniuk and Wakin (2007) and references therein. We are interested instead in actually estimating the manifold itself. There is a large literature on this problem in the field of computational geometry; see, for example, Dey (2006), Dey and Goswami (2004), Chazal and Lieutier (2008), Cheng and Dey (2005) and Boissonnat and Ghosh (2010). However, very few papers allow for noise in the statistical sense, by which we mean observations drawn randomly from a distribution. In the literature on computational geometry, observations are called noisy if they depart from the underlying manifold in a very specific way: the observations have to be close to the manifold but not too close to each other. This notion of noise is quite different from random sampling from a distribution. An exception is Niyogi et al. (2008), who constructed the following estimator. Let $I = \{ i : \widehat{p}(Y_i) > \lambda \}$ where $\widehat{p}$ is a density estimator. They define $\widehat{M} = \bigcup_{i \in I} B_D(Y_i, \epsilon)$ and they show that if $\lambda$ and $\epsilon$ are chosen properly, then $\widehat{M}$ is homologous to $M$. (This means that $M$ and $\widehat{M}$ share certain topological properties.) However, the result does not guarantee closeness in Hausdorff distance. Note that $\bigcup_{i=1}^n B_D(Y_i, \epsilon)$ is precisely the Devroye-Wise estimator for the support of a distribution (Devroye and Wise (1980)).

Notation. Given a set $S$, we denote its boundary by $\partial S$. We let $B_D(x, r)$ denote a $D$-dimensional open ball centered at $x$ with radius $r$. If $A$ is a set and $x$ is a point then we write $d(x, A) = \inf_{y \in A} \|x - y\|$ where $\|\cdot\|$ is the Euclidean norm. Let
$$A \circ B = (A \cap B^c) \cup (A^c \cap B) \tag{9}$$
denote the symmetric set difference between sets $A$ and $B$. The uniform measure on a manifold $M$ is denoted by $\mu_M$. Lebesgue measure on $\mathbb{R}^k$ is denoted by $\nu_k$.
In case $k = D$, we sometimes write $V$ instead of $\nu_D$; in other words, $V(A)$ is simply the volume of $A$. Any integral of the form $\int f$ is understood to be the integral with respect to Lebesgue measure on $\mathbb{R}^D$. If $P$ and $Q$ are two probability measures on $\mathbb{R}^D$ with densities $p$ and $q$ then the Hellinger distance between $P$ and $Q$ is
$$h(P, Q) \equiv h(p, q) = \sqrt{\int (\sqrt{p} - \sqrt{q})^2} = \sqrt{2\left( 1 - \int \sqrt{pq} \right)} \tag{10}$$
where the integrals are with respect to $\nu_D$. Recall that
$$\tfrac{1}{2}\,\ell_1(p, q) \le h(p, q) \le \sqrt{\ell_1(p, q)} \tag{11}$$
where $\ell_1(p, q) = \int |p - q|$. Let $p(x) \wedge q(x) = \min\{ p(x), q(x) \}$. The affinity between $P$ and $Q$ is
$$\|P \wedge Q\| = \int p \wedge q = 1 - \frac{1}{2} \int |p - q|. \tag{12}$$
Let $P^n$ denote the $n$-fold product measure based on $n$ independent observations from $P$. In the appendix, Section 7.1, we show that
$$\|P^n \wedge Q^n\| \ge \frac{1}{8} \left( 1 - \frac{1}{2} \int |p - q| \right)^{2n}. \tag{13}$$
We write $X_n = O_P(a_n)$ to mean that, for every $\epsilon > 0$ there exists $C > 0$ such that $\mathbb{P}(\|X_n\| / a_n > C) \le \epsilon$ for all large $n$. Throughout, we use symbols like $C, C_0, C_1, c, c_0, c_1, \ldots$ to denote generic positive constants whose value may be different in different expressions.

2. Model Assumptions

2.1 Manifold Conditions

We shall be concerned with $d$-dimensional compact Riemannian submanifolds without boundary embedded in $\mathbb{R}^D$ with $d < D$. (Informally, this means that $M$ looks like $\mathbb{R}^d$ in a small neighborhood around any point in $M$.) We assume that $M$ is contained in some compact set $\mathcal{K} \subset \mathbb{R}^D$.

At each $u \in M$ let $T_u M$ denote the tangent space to $M$ and let $T_u^\perp M$ be the normal space. We can regard $T_u M$ as a $d$-dimensional hyperplane in $\mathbb{R}^D$ and we can regard $T_u^\perp M$ as the $(D - d)$-dimensional hyperplane perpendicular to $T_u M$. Define the fiber of size $a$ at $u$ to be $L_a(u) \equiv L_a(u, M) = T_u^\perp M \cap B_D(u, a)$.

Let $\Delta(M)$ be the largest $r$ such that each point in $M \oplus r$ has a unique projection onto $M$.
The quantity $\Delta(M)$ will be small if either $M$ is highly curved or $M$ is close to being self-intersecting. Let $\mathcal{M} \equiv \mathcal{M}(\kappa)$ denote all $d$-dimensional manifolds embedded in $\mathcal{K}$ such that $\Delta(M) \ge \kappa$. Throughout this paper, $\kappa$ is a fixed positive constant.

The quantity $\Delta(M)$ has been rediscovered many times. It is called the condition number in Niyogi et al. (2006), the thickness in Gonzalez and Maddocks (1999) and the reach in Federer (1959). An equivalent definition of $\Delta(M)$ is the following: $\Delta(M)$ is the largest number $r$ such that the fibers $L_r(u)$ never intersect. See Figure 1. Note that if $M$ is a sphere then $\Delta(M)$ is just the radius of the sphere and if $M$ is a linear space then $\Delta(M) = \infty$. Also, if $\sigma < \Delta(M)$ then $M \oplus \sigma$ is the disjoint union of its fibers:
$$M \oplus \sigma = \bigcup_{u \in M} L_\sigma(u). \tag{14}$$
Define $\mathrm{tube}(M, a) = \bigcup_{u \in M} L_a(u)$. Thus, if $\sigma < \Delta(M)$ then $M \oplus \sigma = \mathrm{tube}(M, \sigma)$.

Figure 1: The condition number $\Delta(M)$ of a manifold is the largest number $\kappa$ such that the normals to the manifold do not cross as long as they are not extended beyond $\kappa$. The plot on the left shows a one-dimensional manifold (a curve) and some normals of length $r < \kappa$. The plot on the right shows the same manifold and some normals of length $r > \kappa$.

Let $p, q \in M$. The angle between two tangent spaces $T_p$ and $T_q$ is defined to be
$$\mathrm{angle}(T_p, T_q) = \cos^{-1}\left( \min_{u \in T_p} \max_{v \in T_q} |\langle u - p, v - q \rangle| \right) \tag{15}$$
where $\langle u, v \rangle$ is the usual inner product in $\mathbb{R}^D$. Let $d_M(p, q)$ denote the geodesic distance between $p, q \in M$.

We now summarize some useful results from Niyogi et al. (2006).

Lemma 3. Let $M \subset \mathcal{K}$ be a manifold and suppose that $\Delta(M) = \kappa > 0$. Let $p, q \in M$.
1. Let $\gamma$ be a geodesic connecting $p$ and $q$ with unit speed parameterization. Then the curvature of $\gamma$ is bounded above by $1/\kappa$.
2. $\cos(\mathrm{angle}(T_p, T_q)) > 1 - d_M(p, q)/\kappa$. Thus, $\mathrm{angle}(T_p, T_q) \le \sqrt{2 d_M(p, q)/\kappa} + o\left(\sqrt{d_M(p, q)/\kappa}\right)$.
3. If $a = \|p - q\| \le \kappa/2$ then $d_M(p, q) \le \kappa - \kappa\sqrt{1 - (2a)/\kappa} = a + o(a)$.
4. If $a = \|p - q\| \le \kappa/2$ then $a \ge d_M(p, q) - (d_M(p, q))^2/(2\kappa)$.
5. If $\|q - p\| > \epsilon$ and $v \in B_D(q, \epsilon) \cap T_p^\perp M \cap B_D(p, \kappa)$ then $\|v - p\| < \epsilon^2/\kappa$.
6. Fix any $\delta > 0$. There exist points $x_1, \ldots, x_N \in M$ such that $M \subset \bigcup_{j=1}^N B_D(x_j, \delta)$ and such that $N \le (c/\delta)^d$.

For further information about manifolds, see Lee (2002).

2.2 Distributional Assumptions

The distribution of $Y$ is induced by the distribution of $\xi$ and $Z$. We will assume that $\xi$ is drawn uniformly on the manifold. Then we assume that $Z$ is drawn uniformly on the normal to $M$. More precisely, given $\xi$, we draw $Z$ uniformly on $L_\sigma(\xi)$. In other words, the noise is perpendicular to the manifold. The result is that, if $\sigma < \kappa$, then the distribution $Q = Q_M$ of $Y$ has support equal to $M \oplus \sigma$.

The distributional assumption on $\xi$ is not critical. Any smooth density bounded away from 0 on the manifold will lead to similar results. However, the assumption on the noise $Z$ is critical. We have chosen the simplest noise distribution here. (Perpendicular noise is also assumed in Niyogi et al. (2008).) In current work, we are deriving the rates for more complicated noise distributions. The rates are quite different and the proofs are more complex. Those results will be reported elsewhere.

The set of distributions we consider is as follows. Let $\kappa$ and $\sigma$ be fixed positive numbers such that $0 < \sigma < \kappa$. Let
$$\mathcal{Q} \equiv \mathcal{Q}(\kappa, \sigma) = \{ Q_M : M \in \mathcal{M}(\kappa) \}. \tag{16}$$
For any $M \in \mathcal{M}(\kappa)$ consider the corresponding distribution $Q_M$, supported on $S_M = M \oplus \sigma$. Let $q_M$ be the density of $Q_M$ with respect to Lebesgue measure. We now show that $q_M$ is bounded above and below by a uniform density.
Recall that the essential supremum and essential infimum of $q_M$ are defined by
$$\operatorname*{ess\,sup}_{y \in A} q_M = \inf\{ a \in \mathbb{R} : \nu_D(\{ y : q_M(y) > a \} \cap A) = 0 \}$$
and
$$\operatorname*{ess\,inf}_{y \in A} q_M = \sup\{ a \in \mathbb{R} : \nu_D(\{ y : q_M(y) < a \} \cap A) = 0 \}.$$
Also recall that, by the Lebesgue density theorem, $q_M(y) = \lim_{\epsilon \to 0} Q_M(B_D(y, \epsilon)) / V(B_D(y, \epsilon))$ for almost all $y$. Let $U_M$ be the uniform distribution on $M \oplus \sigma$ and let $u_M = 1/V(M \oplus \sigma)$ be the density of $U_M$. Note that, for $A \subset M \oplus \sigma$, $U_M(A) = V(A)/V(M \oplus \sigma)$.

Lemma 4. There exist constants $0 < C_* \le C^* < \infty$, depending only on $\kappa$ and $d$, such that
$$C_* \le \inf_{M \in \mathcal{M}} \operatorname*{ess\,inf}_{y \in S_M} \frac{q_M(y)}{u_M(y)} \le \sup_{M \in \mathcal{M}} \operatorname*{ess\,sup}_{y \in S_M} \frac{q_M(y)}{u_M(y)} \le C^*. \tag{17}$$

Proof. Choose any $M \in \mathcal{M}(\kappa)$. Let $x$ be any point in the interior of $S_M$. Let $B = B_D(x, \epsilon)$ where $\epsilon > 0$ is small enough so that $B \subset S_M = M \oplus \sigma$. Let $y$ be the projection of $x$ onto $M$. We want to upper and lower bound $Q(B)/V(B)$. Then we will take the limit as $\epsilon \to 0$. Consider the two spheres of radius $\kappa$ tangent to $M$ at $y$ in the direction of the line between $x$ and $y$. (See Figure 2.) Note that $Q(B)$ is maximized by taking $M$ to be equal to the upper sphere and $Q(B)$ is minimized by taking $M$ to be equal to the lower sphere.

Let us consider first the case where $M$ is equal to the upper sphere. Let
$$U = \{ u \in M : L_\sigma(u) \cap B \ne \emptyset \}$$
be the projection of $B$ onto $M$. By simple geometry, $U = M \cap B_D(y, r\epsilon)$ where
$$\left( 1 + \frac{\sigma}{\kappa} \right)^{-1} \le r \le \left( 1 + \frac{\sigma}{\kappa} \right).$$
Let $\mathrm{Vol}$ denote $d$-dimensional volume on $M$. Then $\mathrm{Vol}(B_D(y, r\epsilon) \cap M) \le c_1 r^d \epsilon^d \omega_d$ where $\omega_d$ is the volume of a unit $d$-ball and $c_1$ depends only on $\kappa$ and $d$. To see this, note that because $M$ is a manifold and $\Delta(M) \ge \kappa$, it follows that near $y$, $M$ may be locally parameterized as a smooth function $f = (f_1, \ldots, f_{D-d})$ over $B \cap T_y M$.
The surface area of the graph of $f$ over $B \cap T_y M$ is bounded by $\int_{B_D(y, r\epsilon) \cap T_y M} \sqrt{1 + \|\nabla f_i\|^2}$, which is bounded by a constant $c_1$ uniformly over $\mathcal{M}$. Hence,
$$\mathrm{Vol}(B_D(y, r\epsilon) \cap M) \le c_1 \mathrm{Vol}(B_D(y, r\epsilon) \cap T_y M) = c_1 r^d \epsilon^d \omega_d.$$
Let $\Lambda_M$ be the uniform distribution on $M$ and let $\Gamma_u$ denote the uniform measure on $L_\sigma(u)$. Note that, for $u \in U$, $L_\sigma(u) \cap B$ is a $(D - d)$-ball whose radius is at most $\epsilon$. Hence,
$$\Gamma_u(L_\sigma(u) \cap B) \le \frac{\epsilon^{D-d} \omega_{D-d}}{\sigma^{D-d} \omega_{D-d}} = \left( \frac{\epsilon}{\sigma} \right)^{D-d}.$$
Thus,
$$\begin{aligned}
Q_M(B) &= \int_M \Gamma_u(B \cap L_\sigma(u)) \, d\Lambda_M(u) = \int_U \Gamma_u(B \cap L_\sigma(u)) \, d\Lambda_M(u) \\
&\le \left( \frac{\epsilon}{\sigma} \right)^{D-d} \Lambda(U) = \left( \frac{\epsilon}{\sigma} \right)^{D-d} \frac{\mathrm{Vol}(B_D(y, r\epsilon) \cap M)}{\mathrm{Vol}(M)} \\
&\le \left( \frac{\epsilon}{\sigma} \right)^{D-d} \frac{\epsilon^d r^d \omega_d}{\mathrm{Vol}(M)} \le \left( \frac{\epsilon}{\sigma} \right)^{D-d} \frac{\epsilon^d (1 + \sigma/\kappa)^d \omega_d}{\mathrm{Vol}(M)}.
\end{aligned}$$
Now, $U_M(B) = V(B)/V(M \oplus \sigma) = \epsilon^D \omega_D / (\sigma^{D-d} \mathrm{Vol}(M))$. Hence,
$$\frac{Q_M(B)}{U_M(B)} \le \left( 1 + \frac{\sigma}{\kappa} \right)^d \omega_d.$$
Taking limits as $\epsilon \to 0$ we have that $q_M(y) \le C^* u_M(y)$ for almost all $y$.

The proof of the lower bound is similar to the upper bound except for the following changes: let $U_0$ denote all $u \in U$ such that the radius of $B \cap L_\sigma(u)$ is at least $\epsilon/2$. Then $\Lambda(U_0) \ge \Lambda(U)(1 - O(\epsilon))$ and the projection of $U_0$ onto $M$ is again of the form $B_D(y, r\epsilon) \cap M$. By Lemma 5.3 of Niyogi et al. (2006),
$$\mathrm{Vol}(B_D(y, r\epsilon) \cap M) \ge \left( 1 - \frac{r^2 \epsilon^2}{4\kappa^2} \right)^{d/2} r^d \epsilon^d \omega_d$$
and the latter is larger than $2^{-d/2} r^d \epsilon^d \omega_d$ for all small $\epsilon$. Also, $\Gamma_u(L_\sigma(u) \cap B) \ge (\epsilon/(2\sigma))^{D-d}$ for all $u \in U_0$.

Of course, an immediate consequence of the above lemma is that, for every $M \in \mathcal{M}(\kappa)$ and every measurable set $A$, $C_* U_M(A) \le Q_M(A) \le C^* U_M(A)$.

3. Minimax Lower Bound

In this section we derive a lower bound on the minimax rate of convergence for this problem. We will make use of the following result due to LeCam (1973). The following version is from Lemma 1 of Yu (1997).
Lemma 5 (Le Cam 1973). Let $\mathcal{Q}$ be a set of distributions. Let $\theta(Q)$ take values in a metric space with metric $\rho$. Let $Q_0, Q_1 \in \mathcal{Q}$ be any pair of distributions in $\mathcal{Q}$. Let $Y_1, \ldots, Y_n$ be drawn iid from some $Q \in \mathcal{Q}$ and denote the corresponding product measure by $Q^n$. Let $\widehat{\theta}(Y_1, \ldots, Y_n)$ be any estimator. Then
$$\sup_{Q \in \mathcal{Q}} \mathbb{E}_{Q^n}\left[ \rho(\widehat{\theta}(Y_1, \ldots, Y_n), \theta(Q)) \right] \ge \rho\left( \theta(Q_0), \theta(Q_1) \right) \|Q_0^n \wedge Q_1^n\|. \tag{18}$$

To get a useful bound from Le Cam's lemma, we need to construct an appropriate pair $Q_0$ and $Q_1$. This is the topic of the next subsection.

Figure 2: Figure for proof of Lemma 4. $x$ is a point in the support $M \oplus \sigma$. $y$ is the projection of $x$ onto $M$. The two spheres are tangent to $M$ at $y$ and have radius $\kappa$.

3.1 A Geometric Construction

In this section, we construct a pair of manifolds $M_0, M_1 \in \mathcal{M}(\kappa)$ and corresponding distributions $Q_0, Q_1$ for use in Le Cam's lemma. An informal description is as follows. Roughly speaking, $M_0$ and $M_1$ minimize the Hellinger distance $h(Q_0, Q_1)$ subject to their Hausdorff distance $H(M_0, M_1)$ being equal to a given value $\gamma$.

Let
$$M_0 = \{ (u_1, \ldots, u_d, 0, \ldots, 0) : -1 \le u_j \le 1, \ 1 \le j \le d \} \tag{19}$$
be a $d$-dimensional hyperplane in $\mathbb{R}^D$. Hence $\Delta(M_0) = \infty$. Place a hypersphere of radius $\kappa$ below $M_0$. Push the sphere upwards into $M_0$, causing a bump of height $\gamma$ at the origin. This creates a new manifold $M_0'$ such that $H(M_0, M_0') = \gamma$. However, $M_0'$ is not smooth. We will roll a sphere of radius $\kappa$ around $M_0'$ to get a smooth manifold $M_1$ as in Figure 3. The formal details of the construction are in Section 7.2.

Theorem 6. Let $\gamma$ be a small positive number. Let $M_0$ and $M_1$ be as defined in Section 7.2. Let $Q_i$ be the corresponding distributions on $M_i \oplus \sigma$ for $i = 0, 1$. Then:
1. $\Delta(M_i) \ge \kappa$, $i = 0, 1$.
2. $H(M_0, M_1) = \gamma$.
3. $\int |q_0 - q_1| = O(\gamma^{(d+2)/2})$.

Proof. See Section 7.2.

Figure 3: A sphere of radius $\kappa$ is pushed upwards into the plane $M_0$ (panel A). The resulting manifold $M_0'$ is not smooth (panel B). A sphere is then rolled around the manifold (panel C) to produce a smooth manifold $M_1$ (panel D).

3.2 Proof of the Lower Bound

Now we are in a position to prove the first theorem. Let us first restate the theorem.

Theorem 1. There is a constant $C > 0$ such that, for all large $n$,
$$\inf_{\widehat{M}} \sup_{Q \in \mathcal{Q}} \mathbb{E}_Q\left[ H(\widehat{M}, M) \right] \ge C n^{-\frac{2}{2+d}} \tag{20}$$
where the infimum is over all estimators $\widehat{M}$.

Proof of Theorem 1. Let $M_0$ and $M_1$ be as defined in Section 3.1. Let $Q_i$ be the uniform distribution on $M_i \oplus \sigma$, $i = 0, 1$. Let $q_i$ be the density of $Q_i$ with respect to Lebesgue measure $\nu_D$, $i = 0, 1$. Then, from Theorem 6, $H(M_0, M_1) = \gamma$ and $\int |q_0 - q_1| = O(\gamma^{(d+2)/2})$. Le Cam's lemma then gives, for any $\widehat{M}$,
$$\sup_{Q \in \mathcal{Q}} \mathbb{E}_{Q^n}[H(M, \widehat{M})] \ge H(M_0, M_1) \, \|Q_0^n \wedge Q_1^n\| \ge \frac{\gamma}{8} \left( 1 - c\gamma^{(d+2)/2} \right)^{2n}$$
where we used equation (13). Setting $\gamma = n^{-2/(d+2)}$ yields the result: with this choice $\gamma^{(d+2)/2} = 1/n$, so the factor $(1 - c/n)^{2n}$ converges to $e^{-2c} > 0$ and the bound is of order $n^{-2/(d+2)}$.

4. Upper bound

To establish the upper bound, we will construct an estimator that achieves the appropriate rate. The estimator is intended only for the theoretical purpose of establishing the rate. (A simpler but non-optimal method is discussed in Section 5.) Recall that $\mathcal{M} = \mathcal{M}(\kappa)$ is the set of all $d$-dimensional submanifolds $M$ contained in $\mathcal{K}$ such that $\Delta(M) \ge \kappa > 0$. Before proceeding, we need to discuss sieve maximum likelihood.

Sieve Maximum Likelihood. Let $\mathcal{P}$ be any set of distributions such that each $P \in \mathcal{P}$ has a density $p$ with respect to Lebesgue measure $\nu_D$. Recall that $h$ denotes Hellinger distance. A set of pairs of functions $\mathcal{B} = \{ (\ell_1, u_1), \ldots, (\ell_N, u_N) \}$ is an $\epsilon$-Hellinger bracketing for $\mathcal{P}$ if (i) for each $p \in \mathcal{P}$ there is a $(\ell, u) \in \mathcal{B}$ such that $\ell(y) \le p(y) \le u(y)$ for all $y$, and (ii) $h(\ell, u) \le \epsilon$. The logarithm of the size of the smallest $\epsilon$-bracketing is called the bracketing entropy and is denoted by $H_{[\,]}(\epsilon, \mathcal{P}, h)$.

We will make use of the following result, which is Example 4 of Shen and Wong (1995).

Theorem 7 (Shen and Wong (1995)). Let $\epsilon_n$ solve the equation $H_{[\,]}(\epsilon_n, \mathcal{P}, h) = n\epsilon_n^2$. Let $(\ell_1, u_1), \ldots, (\ell_N, u_N)$ be an $\epsilon_n$ bracketing where $\log N = H_{[\,]}(\epsilon_n, \mathcal{P}, h)$. Define the set of densities $S_n^* = \{ p_1^*, \ldots, p_N^* \}$ where $p_t^* = u_t / \int u_t$. Let $\widehat{p}^*$ maximize the likelihood $\prod_{i=1}^n p_t^*(Y_i)$ over the set $S_n^*$. Then
$$\sup_{P \in \mathcal{P}} P^n\left( \{ h(p, \widehat{p}^*) \ge \epsilon_n \} \right) \le c_1 e^{-c_2 n \epsilon_n^2}. \tag{21}$$

The sequence $\{ S_n^* \}$ in Theorem 7 is called a sieve and the estimator $\widehat{p}^*$ is called a sieve maximum likelihood estimator. The estimator $\widehat{p}^*$ need not be in $\mathcal{P}$. We will actually need an estimator that is contained in $\mathcal{P}$. We may construct one as follows. Let $\widehat{p}^*$ be the sieve mle corresponding to $S_n^*$. Then $\widehat{p}^* = p_t^*$ for some $t$. Let $(\widehat{\ell}, \widehat{u}) \equiv (\ell_t, u_t)$ be the corresponding bracket.

Lemma 8. Assume the conditions in Theorem 7. Let $\widehat{p}$ be any density in $\mathcal{P}$ such that $\widehat{\ell} \le \widehat{p} \le \widehat{u}$. If $\epsilon_n \le 1$ then
$$\sup_{P \in \mathcal{P}} P^n\left( \{ h(p, \widehat{p}) \ge c\epsilon_n \} \right) \le c_1 e^{-c_2 n \epsilon_n^2}. \tag{22}$$

Proof. By the triangle inequality, $h(p, \widehat{p}) \le h(p, \widehat{p}^*) + h(\widehat{p}, \widehat{p}^*) = h(p, \widehat{p}^*) + h(\widehat{p}, u_t / \int u_t)$ where $\widehat{p}^* = u_t / \int u_t$ for some $t$. From Theorem 7, $h(p, \widehat{p}^*) \le \epsilon_n$ with high probability. Thus we need to show that $h(\widehat{p}, u_t / \int u_t) \le C\epsilon_n$. It suffices to show that, in general, $h(p, u / \int u) \le C h(\ell, u)$ whenever $\ell \le p \le u$.

Let $(\ell, u)$ be a bracket and let $\delta^2 = h^2(\ell, u) \le 1$. Let $\ell \le p \le u$. We claim that $h^2(p, u / \int u) \le 4\delta^2$.
(Taking $\delta = \epsilon_n$ then proves the result.) Let $c^2 = \int u$. Then
$$1 \le c^2 = \int u = \int p + \int (u - p) = 1 + \int (u - p) = 1 + \ell_1(u, p) \le 1 + 2h(u, \ell) = 1 + 2\delta.$$
Now,
$$\begin{aligned}
h^2\left( p, \frac{u}{\int u} \right) &= \int \left( \sqrt{u}/c - \sqrt{p} \right)^2 = \frac{1}{c^2} \int \left( \sqrt{u} - c\sqrt{p} \right)^2 \le \int \left( \sqrt{u} - c\sqrt{p} \right)^2 \\
&= \int \left( (\sqrt{u} - \sqrt{p}) - (c - 1)\sqrt{p} \right)^2 \le 2 \int (\sqrt{u} - \sqrt{p})^2 + 2(c - 1)^2 \\
&\le 2\delta^2 + 2\left( \sqrt{1 + 2\delta} - 1 \right)^2 \le 2\delta^2 + 2\delta^2 = 4\delta^2
\end{aligned}$$
where the last inequality used the fact that $\delta \le 1$.

In light of the above result, we define the modified maximum likelihood sieve estimator $\widehat{p}$ to be any $p \in \mathcal{P}$ such that $\widehat{\ell} \le \widehat{p} \le \widehat{u}$. For simplicity, in the rest of the paper, we refer to the modified sieve estimator $\widehat{p}$ simply as the maximum likelihood estimator (mle).

Outline of proof. We are now ready to find an estimator $\widehat{M}$ that converges at the optimal rate (up to logarithmic terms). Our strategy for estimating $M$ has the following steps:

Step 1. We split the data into two halves.

Step 2. Let $\widetilde{Q}$ be the maximum likelihood estimator using the first half of the data. Define $\widetilde{M}$ to be the corresponding manifold. We call $\widetilde{M}$ the pilot estimator. We show that $\widetilde{M}$ is a consistent estimator of $M$ that converges at a sub-optimal rate $a_n = n^{-\frac{2}{D(d+2)}}$. To show this we:
a. Compute the Hellinger bracketing entropy of $\mathcal{Q}$. (Theorem 9, Lemmas 10 and 11).
b. Establish the rate of convergence of the mle in Hellinger distance, using the bracketing entropy and Theorem 7.
c. Relate the Hausdorff distance to the Hellinger distance and hence establish the rate of convergence $a_n$ of the mle in Hausdorff distance. (Lemma 13).
d. Conclude that the true manifold is contained, with high probability, in $\mathcal{M}_n = \{ M \in \mathcal{M}(\kappa) : H(M, \widetilde{M}) \le a_n \}$ (Lemma 14). Hence, we can now restrict attention to $\mathcal{M}_n$.

Step 3. To improve the pilot estimator, we need to control the relationship between Hellinger and Hausdorff distance and thus need to work over small sets on which the manifold cannot vary too greatly. Hence, we cover the pilot estimator with long, thin slabs $R_1, \ldots, R_N$. We do this by first covering $\widetilde{M}$ with spheres $\gimel_1, \ldots, \gimel_N$ of radius $\delta_n = O((\log n/n)^{1/(2+d)})$. We define a slab $R_j$ to be the union of fibers of size $b = \sigma + a_n$ within one of the spheres: $R_j = \bigcup_{x \in \gimel_j} L_b(x, \widetilde{M})$. We then show that:
a. The set of fibers on $\widetilde{M}$ cover each $M \in \mathcal{M}_n$ in a nice way. In particular, if $M \in \mathcal{M}_n$ then each fiber from $\widetilde{M}$ is nearly normal to $M$. (Lemma 15).
b. As $M$ cuts through a slab, it stays nearly parallel to $\widetilde{M}$. Roughly speaking, $M$ behaves like a smooth, nearly linear function within each slab. (Lemma 16).

Step 4. Using the second half of the data, we apply maximum likelihood within each slab. This defines estimators $\widehat{M}_j$, for $1 \le j \le N$. We show that:
a. The entropy of the set of distributions within a slab is very small. (Lemma 18).
b. Because the entropy is small, the maximum likelihood estimator within a slab converges fairly quickly in Hellinger distance. The rate is $\epsilon_n = (\log n/n)^{1/(2+d)}$. (Lemma 19).
c. Within a slab, there is a tight relationship between Hellinger distance and Hausdorff distance. Specifically, $H(M_1, M_2) \le c\, h^2(Q_1, Q_2)$. (Lemma 20).
d. Steps (4b) and (4c) imply that $H(M \cap R_j, \widehat{M}_j) = O_P(\epsilon_n^2) = O_P((\log n/n)^{2/(d+2)})$.

Step 5. Finally we define $\widehat{M} = \bigcup_{j=1}^N \widehat{M}_j$ and show that $\widehat{M}$ converges at the optimal rate because each $\widehat{M}_j$ does within its own slab.

The reason for getting a preliminary estimator and then covering the estimator with thin slabs is that, within a slab, there is a tight relationship between Hellinger distance and Hausdorff distance. This is not true globally but only in thin slabs.
Maximum likelihood is optimal with respect to Hellinger distance. Within a slab, this allows us to get optimal rates in Hausdorff distance.

Step 1: Data Splitting

For simplicity assume the sample size is even and denote it by $2n$. We split the data into two halves, which we denote by $X = (X_1, \ldots, X_n)$ and $Y = (Y_1, \ldots, Y_n)$.

Step 2: Pilot Estimator

Let $\widetilde{q}$ be the maximum likelihood estimator over $\mathcal{Q}$. Let $\widetilde{M}$ be the corresponding manifold. Studying the properties of $\widetilde{M}$ requires two steps: computing the bracketing entropy of $\mathcal{Q}$ and relating $H(M, \widetilde{M})$ to $h(q, \widetilde{q})$. The former allows us to apply Theorem 7 to bound $h(q, \widetilde{q})$, and the latter allows us to control the Hausdorff distance.

Step 2a: Computing the Entropy of $\mathcal{Q}$. To compute the entropy of $\mathcal{Q}$ we start by constructing a finite net of manifolds to cover $\mathcal{M}(\kappa)$. A finite set of $d$-manifolds $\mathcal{M}_\gamma = \{ M_1, \ldots, M_N \}$ is a $\gamma$-net (or a $\gamma$-cover) if, for each $M \in \mathcal{M}$ there exists $M_j \in \mathcal{M}_\gamma$ such that $H(M, M_j) \le \gamma$. Let $N(\gamma) = N(\gamma, \mathcal{M}, H)$ be the size of the smallest covering set, called the (Hausdorff) covering number of $\mathcal{M}$.

Theorem 9. The Hausdorff covering number of $\mathcal{M}$ satisfies the following:
$$N(\gamma) \equiv N(\gamma, \mathcal{M}, H) \le c_1 \kappa_2(\kappa, d, D) \exp\left( \kappa_3(\kappa, d, D)\, \gamma^{-d/2} \right) \equiv c \exp\left( c' \gamma^{-d/2} \right) \tag{23}$$
where $\kappa_2(\kappa, d, D) = \binom{D}{d} (c_2/\kappa)^D$ and $\kappa_3(\kappa, d, D) = 2^{d/2} (D - d)(c_2/\kappa)^D$, for a constant $c_2$ that depends only on $\kappa$ and $d$.

Proof. Recall that the manifolds in $\mathcal{M}$ all lie within $\mathcal{K}$. Consider any hypercube containing $\mathcal{K}$. Divide this cube into a grid of $J = (2c/\kappa)^D$ sub-cubes $\{ C_1, \ldots, C_J \}$ of side length $\kappa/c$, where $c \ge 4$ is a positive constant chosen to be sufficiently large. Our strategy is to show that within each of these cubes, the manifold is the graph of a smooth function. We then only need to count the number of such smooth functions.
In thinking about the manifold as (locally) the graph of a smooth function, it helps to be able to translate easily between the natural coordinates in $\mathcal{K}$ and the domain-range coordinates of the function. To that end, within each subcube $C_j$ for $j \in \{ 1, \ldots, J \}$, we define $K = \binom{D}{d}$ coordinate frames, $F_{jk}$ for $k \in \{ 1, \ldots, K \}$, in which $d$ out of $D$ coordinates are labeled as "domain" and the remaining $D - d$ coordinates are labeled as "range." Each frame is associated with a relabeling of the coordinates so that the $d$ "domain" coordinates are listed first and the $D - d$ "range" coordinates last. That is, $F_{jk}$ is defined by a one-to-one correspondence between $x \in C_j$ and $(u, v) = \pi_{jk}(x)$ where $u \in \mathbb{R}^d$ and $v \in \mathbb{R}^{D-d}$ and $\pi_{jk}(x_1, \ldots, x_D) = (x_{i_1}, \ldots, x_{i_d}, x_{j_1}, \ldots, x_{j_{D-d}})$ for domain coordinate indices $i_1 < \ldots < i_d$ and range coordinate indices $j_1 < \ldots < j_{D-d}$.

We define $\mathrm{domain}(F_{jk}) = \{ u \in \mathbb{R}^d : \exists\, v \in \mathbb{R}^{D-d} \text{ such that } (u, v) \in F_{jk} \}$, and let $\mathcal{G}_{jk}$ denote the class of functions defined on $\mathrm{domain}(F_{jk})$ whose second derivative (i.e., second fundamental form) is bounded above by a constant $C(\kappa)$ that depends only on $\kappa$. To say that a set $R \subset C_j$ is the graph of a function on a $d$-dimensional subset of the coordinates in $C_j$ is equivalent to saying that for some frame $F_{jk}$ and some set $A \subset \mathrm{domain}(F_{jk})$, $R = \pi_{jk}^{-1}\{ (u, f(u)) : u \in A \}$.

We will prove the theorem by establishing the following claims.

Claim 1. Let $M \in \mathcal{M}$ and $C_j$ be a subcube that intersects $M$. Then: (i) for at least one $k \in \{ 1, \ldots, K \}$, the set $M \cap C_j$ is the graph of a function (i.e., single-valued mapping) defined on a set $A \subset \mathrm{domain}(F_{jk})$, of the form $(u_1, \ldots, u_d) \mapsto \pi_{jk}^{-1}((u, f(u)))$ for some function $f$ on $A$, and (ii) this function lies in $\mathcal{G}_{jk}$.

Claim 2. $\mathcal{M}$ is in one-to-one correspondence with a subset of $\mathcal{G} = \prod_{j=1}^J \bigcup_{k=1}^K \mathcal{G}_{jk}$.

Claim 3. The $L^\infty$ covering number of $\mathcal{G}$ satisfies
$$N(\gamma, \mathcal{G}, L^\infty) \le c_1 \binom{D}{d} (2c/\kappa)^D \exp\left( (D - d)(2c/\kappa)^D \gamma^{-d/2} \right).$$

Claim 4. There is a one-to-one correspondence between a $\gamma/2$ $L^\infty$-cover of $\mathcal{G}$ and a $\gamma$ Hausdorff-cover of $\mathcal{M}$.

Taken together, the claims imply that
$$N(\gamma, \mathcal{M}, H) \le c_1 \binom{D}{d} (2c/\kappa)^D \exp\left( (D - d)(2c/\kappa)^D 2^{d/2} \gamma^{-d/2} \right).$$
Taking $c_2 = 2c$ proves the theorem.

Proof of Claim 1. We begin by showing that (i) implies (ii). By part 1 of Lemma 3, each $M \in \mathcal{M}$ has curvature (second fundamental form) bounded above by $1/\kappa$. This implies that the function identified in (i) has uniformly bounded second derivative and thus lies in the corresponding $\mathcal{G}_{jk}$.

We prove (i) by contradiction. Suppose that there is an $M \in \mathcal{M}$ such that for every $j$ with $M \cap C_j \ne \emptyset$, the set $M \cap C_j$ is not the graph of a single-valued mapping for any of the $K$ coordinate frames. Fix $j \in \{ 1, \ldots, J \}$. Then in each $\mathrm{domain}(F_{jk})$, there is a point $u$ such that $C_j \cap \pi_{jk}^{-1}(\{u\} \times \mathbb{R}^{D-d})$ intersects $M$ in at least two points; call them $a_k$ and $b_k$. By construction $\|a_k - b_k\| \le \sqrt{D - d} \cdot \kappa/c$, and hence by choosing $c$ large enough (making the cubes small), part 3 of Lemma 3 tells us that $d_M(a_k, b_k) \le 2\sqrt{D - d}\,\kappa/c$. Then we argue as follows:
1. By parts 2 and 3 of Lemma 3 and the fact that $C_j$ has diameter $\sqrt{D}\kappa/c$,
$$\min_{p, q \in C_j \cap M} \cos(\mathrm{angle}(T_p M, T_q M)) \ge 1 - \frac{2\sqrt{D}}{c}.$$
For large enough $c$, the maximum angle between tangent vectors can be made smaller than $\pi/3$.
2. By part 2 of Lemma 3, for any point $z$ along a geodesic between $a_k$ and $b_k$,
$$\cos(\mathrm{angle}(T_{a_k} M, T_z M)) \ge 1 - \frac{2\sqrt{D - d}}{c}.$$
It follows that there is a point in $C_j \cap M$ and a tangent vector $v_k$ at that point such that $\mathrm{angle}(v_k, b_k - a_k) = O(1/\sqrt{c})$.
3. The $K = \binom{D}{d}$ coordinate frames thus have associated tangent vectors $v_1, \ldots, v_K$, each nearly orthogonal to at least $d$ of the others. Consequently, there are at least $d + 1$ nearly orthogonal tangent vectors of $M$ within $C_j$. This contradicts point 1 and proves the claim.

Proof of Claim 2. We construct the correspondence as follows. For each cube $C_j$, let $k_j^*$ be the smallest $k$ such that $M \cap C_j$ is the graph of a function $\phi_{jk} \in \mathcal{G}_{jk}$ as in Claim 1. Map $M$ to $\varphi = (\phi_{1 k_1^*}, \ldots, \phi_{J k_J^*})$, and let $\mathcal{F} \subset \mathcal{G}$ be the image of this map. If $M \ne M' \in \mathcal{M}$, then the corresponding $\varphi$ and $\varphi'$ must be distinct. If not, then $M \cap C_j = M' \cap C_j$ for all $j$, contradicting $M \ne M'$. The correspondence from $\mathcal{M}$ to $\mathcal{F}$ is thus a one-to-one correspondence.

Proof of Claim 3. From the results in Birman and Solomjak (1967), the set of functions defined on a pre-compact $d$-dimensional set that take values in a fixed dimension space $\mathbb{R}^m$ with uniformly bounded second derivative has $L^\infty$ covering number bounded above by $c_1 e^{m (1/\gamma)^{d/2}}$ for some $c_1$. Part 1 of Lemma 3 shows that each $M \in \mathcal{M}$ has curvature (second fundamental form) bounded above by $1/\kappa$, so each $\mathcal{G}_{jk}$ satisfies Birman and Solomjak's conditions. Hence, $N(\gamma, \mathcal{G}_{jk}, L^\infty) \le c_1 e^{(D-d)(1/\gamma)^{d/2}}$. Because all the $\mathcal{G}_{jk}$'s are disjoint, simple counting arguments show that
$$N(\gamma, \mathcal{G}, L^\infty) = \left( \binom{D}{d} N(\gamma, \mathcal{G}_{jk}, L^\infty) \right)^J,$$
where $J$ is the number of cubes defined above. The claim follows. (Note that the functions in Claim 1 are defined on a subset of $\mathrm{domain}(F_{jk})$. But because all such functions have an extension in $\mathcal{G}_{jk}$, a covering of $\mathcal{G}_{jk}$ also covers these functions defined on restricted domains.)

Proof of Claim 4. First, note that if two functions are less than $\gamma$ distant in $L^\infty$, their graphs are less than $\gamma$ distant in Hausdorff distance, and vice versa.
This implies that a $\gamma$ $L^\infty$-cover of a set of functions corresponds directly to a $\gamma$ Hausdorff-cover of the set of the functions' graphs. Hence, in the argument that follows, we can work with functions or graphs interchangeably.

For $k \in \{1, \dots, K\}$, let $\mathcal{G}^{\gamma}_{jk}$ be a minimal $L^\infty$ cover of $\mathcal{G}_{jk}$ by $\gamma/2$ balls; specifically, we assume that $\mathcal{G}^{\gamma}_{jk}$ is the set of centers of these balls. For each $g_{jk} \in \mathcal{G}^{\gamma}_{jk}$, define $f_{jk}(u) = \pi_{jk}^{-1}(u, g_{jk}(u))$. For every $j$, choose one such $f_{jk}$, and define a set $M' = \bigcup_j (C_j \cap \mathrm{range}(f_{jk_j}))$, which is a union of manifolds with boundary that have curvature bounded by $1/\kappa$. That is, such an $M'$ is piecewise smooth (smooth within each cube) but may fail to satisfy $\Delta(M') \ge \kappa$ globally.

Let $\mathcal{A}$ be the collection of $M'$ constructed this way. There are $N(\gamma/2, \mathcal{G}, L^\infty)$ elements in this collection. By construction and Claim 2, for each $M \in \mathcal{M}$, there exists an $M' \in \mathcal{A}$ such that $H(M, M') \le \gamma/2$. In other words, the set of $\gamma/2$ Hausdorff balls around the manifolds in $\mathcal{A}$ covers $\mathcal{M}$, but the elements of $\mathcal{A}$ are not themselves necessarily in $\mathcal{M}$.

Let $B_H(A, \gamma/2)$ denote the set of all $d$-manifolds $M \in \mathcal{M}$ such that $H(A, M) \le \gamma/2$. Let
$$\mathcal{A}' = \left\{ A \in \mathcal{A} : B_H(A, \gamma/2) \cap \mathcal{M} \neq \emptyset \right\}. \tag{24}$$
For each $A \in \mathcal{A}'$, choose some $\widetilde{A} \in B_H(A, \gamma/2) \cap \mathcal{M}$. By the triangle inequality, the set $\{\widetilde{A} : A \in \mathcal{A}'\}$ forms a $\gamma$ Hausdorff-net for $\mathcal{M}$. This proves the claim.

We are almost ready to compute the entropy. We will need the following lemma.

Lemma 10. Let $0 < \gamma < \kappa - \sigma$. There exists a constant $K > 0$ (depending only on $K$, $\kappa$ and $\sigma$) such that, for any $M_1, M_2 \in \mathcal{M}(\kappa)$, $H(M_1, M_2) \le \gamma$ implies that $|V(M_1 \oplus \sigma) - V(M_2 \oplus \sigma)| \le K\gamma$. Also, for any $M \in \mathcal{M}(\kappa)$, $|V(M \oplus (\sigma + \gamma)) - V(M \oplus \sigma)| \le K\gamma$.

Proof. Let $S_j = M_j \oplus \sigma$, $j = 1, 2$.
Then, using (14),
$$S_2 \subset M_1 \oplus (\sigma + \gamma) = \bigcup_{u \in M_1} L_{\sigma+\gamma}(u). \tag{25}$$
Hence, uniformly over $\mathcal{M}$,
$$V(S_2) \le \int_{M_1} \nu_{D-d}(L_{\sigma+\gamma}(u))\, d\mu_{M_1} \le \int_{M_1} \nu_{D-d}(L_{\sigma}(u))\, d\mu_{M_1} + K\gamma = V(S_1) + K\gamma$$
since $\nu_{D-d}(B(u, \sigma+\gamma)) \le \nu_{D-d}(B(u, \sigma)) + K\gamma$ for some $K > 0$ not depending on $M_1$ or $M_2$. By a symmetric argument, $V(S_1) \le V(S_2) + K\gamma$. Hence, $|V(M_1 \oplus \sigma) - V(M_2 \oplus \sigma)| \le K\gamma$. The second statement is proved in a similar way.

Now we construct a Hellinger bracketing. Let $\gamma = \epsilon^2$. Let $\mathcal{M}_\gamma = \{M_1, \dots, M_N\}$ be a $\gamma$-Hausdorff net of manifolds. Thus, by Theorem 9, $N = N(\epsilon^2, \mathcal{M}, H) \le c_1 e^{c_2 (1/\epsilon)^d}$. Let $\omega$ denote the volume of a sphere of radius $\sigma$. Let $q_j$ be the density corresponding to $M_j$. Define
$$u_j(y) = \left( q_j(y) + \frac{2\epsilon^2}{V(M_j \oplus (\sigma + \epsilon^2))} \right) I\big(y \in M_j \oplus (\sigma + \epsilon^2)\big)$$
and
$$\ell_j(y) = \left( q_j(y) - \frac{2\epsilon^2}{V(M_j \oplus (\sigma - \epsilon^2))} \right) I\big(y \in M_j \oplus (\sigma - \epsilon^2)\big).$$
Let $\mathcal{B} = \{(\ell_1, u_1), \dots, (\ell_N, u_N)\}$.

Lemma 11. $\mathcal{B}$ is an $\epsilon$-Hellinger bracketing of $\mathcal{Q}$. Hence, $H_{[\,]}(\epsilon, \mathcal{Q}, h) \le C(1/\epsilon)^d$.

Proof. Let $M \in \mathcal{M}(\kappa)$ and let $Q = Q_M$ be the corresponding distribution. Let $q$ be the density of $Q$. $Q$ is supported on $S = M \oplus \sigma$. There exists $M_j \in \mathcal{M}_\gamma$ such that $H(M, M_j) \le \epsilon^2$. Let $y$ be in $S$. Then there is an $x \in M$ such that $\|y - x\| \le \sigma$. There is an $x' \in M_j$ such that $\|x - x'\| \le \epsilon^2$. Hence, $d(y, M_j) \le \sigma + \epsilon^2$ and thus $y$ is in the support of $u_j$. Now, for $y \in S$, $u_j(y) - q(y) = 2\epsilon^2 / V(M_j \oplus (\sigma + \epsilon^2)) \ge 0$. Hence, $q(y) \le u_j(y)$. By a similar argument, $\ell_j(y) \le q(y)$. Thus $\mathcal{B}$ is a bracketing. Now
$$\ell_1(\ell_j, u_j) = \int u_j - \int \ell_j = \left(1 + \frac{2K\epsilon^2}{\omega}\right) - \left(1 - \frac{2K\epsilon^2}{\omega}\right) = \frac{4K\epsilon^2}{\omega}.$$
Finally, by (11), $h(u_j, \ell_j) \le \sqrt{\ell_1(\ell_j, u_j)} = C\epsilon$. Thus $\mathcal{B}$ is a $C\epsilon$-Hellinger bracketing.

Step 2b. Hellinger Rate.

Lemma 12. Let $\widetilde{Q}$ be the mle. Then
$$\sup_{Q \in \mathcal{Q}} Q^n\left( \left\{ h(Q, \widetilde{Q}) > C_0 n^{-\frac{1}{d+2}} \right\} \right) \le \exp\left\{ -C n^{\frac{d}{2+d}} \right\}.$$
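The exponent in Lemma 12 comes from balancing the entropy bound of Lemma 11 against $n\epsilon^2$. As a numerical sanity check, the following minimal sketch solves $C\epsilon^{-d} = n\epsilon^2$ by bisection and compares the root with the closed form $(C/n)^{1/(d+2)}$. The function names and the choice $C = 1$ are illustrative assumptions and do not come from the paper.

```python
import math

def entropy_bound(eps, d, C=1.0):
    """Bracketing-entropy bound C * (1/eps)^d (C is a placeholder constant)."""
    return C * (1.0 / eps) ** d

def solve_balance(n, d, C=1.0, lo=1e-12, hi=1.0, iters=200):
    """Find eps in (lo, hi) with entropy_bound(eps) = n * eps^2 by bisection.

    The difference entropy_bound(eps) - n * eps^2 is decreasing in eps,
    positive near lo and non-positive at hi = 1 when n >= C.
    """
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint, since eps spans many scales
        if entropy_bound(mid, d, C) - n * mid * mid > 0:
            lo = mid              # root lies to the right of mid
        else:
            hi = mid              # root lies at or to the left of mid
    return math.sqrt(lo * hi)

def closed_form(n, d, C=1.0):
    """Closed-form solution eps_n = (C / n)^(1 / (d + 2))."""
    return (C / n) ** (1.0 / (d + 2))

if __name__ == "__main__":
    for n in (10**3, 10**6):
        for d in (1, 2, 3):
            eps_n = solve_balance(n, d)
            # n * eps_n^2 is the exponent scale n^(d/(d+2)) in the tail bound.
            print(n, d, eps_n, closed_form(n, d), n * eps_n ** 2)
```

With $C = 1$ the product $n\epsilon_n^2$ equals $n^{d/(d+2)}$, which is the quantity appearing in the exponential bound of the lemma.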
Proof. We have shown (Lemma 11) that $H_{[\,]}(\epsilon, \mathcal{Q}, h) \le C(1/\epsilon)^d$. Solving the equation $H_{[\,]}(\epsilon_n, \mathcal{Q}, h) = n\epsilon_n^2$ from Theorem 7 we get $\epsilon_n = (1/n)^{1/(d+2)}$. From Lemma 8, for all $Q$,
$$Q^n\left( \left\{ h(Q, \widetilde{Q}) > C_0 n^{-\frac{1}{d+2}} \right\} \right) \le c_1 e^{-c_2 n \epsilon_n^2} = \exp\left\{ -C n^{\frac{d}{2+d}} \right\}.$$

Step 2c. Relating Hellinger Distance and Hausdorff Distance.

Lemma 13. Let $c = (\kappa - \sigma)\sqrt{\pi}\, C_* / (2\,\Gamma(D/2 + 1))$. If $M_1, M_2 \in \mathcal{M}(\kappa)$ and $h(Q_1, Q_2) < c$ then
$$H(M_1, M_2) \le \left[ \frac{2}{\sqrt{\pi}} \left( \frac{\Gamma(D/2 + 1)}{C_*} \right)^{1/D} \right] h^{\frac{1}{D}}(Q_1, Q_2).$$

Proof. Let $b = H(M_1, M_2)$ and $\gamma = \min\{\kappa - \sigma, b\}$. Let $S_1, S_2$ be the supports of $Q_1$ and $Q_2$. Because $H(M_1, M_2) = b$, we can find points $x \in M_1$ and $y \in M_2$ such that $\|y - x\| = b$. Note that $T_x M_1$ and $T_y M_2$ are parallel, otherwise we could move $x$ or $y$ and increase $\|y - x\|$. It follows that the line segment $[x, y]$ is along a common normal vector of the two manifolds and we can write $y = x \pm bu$ for some $u \in L_\sigma(u, M)$. Without loss of generality, assume that $y = x + bu$. Let $x' = x + \sigma u$ and $y' = y + \sigma u$. Hence, $x' \in \partial S_1$, $y' \in \partial S_2$ and $\|x' - y'\| = b$. Note that $\partial S_1$ and $\partial S_2$ are themselves smooth $D$-manifolds with $\Delta(\partial S_i) \ge \kappa - \sigma > 0$.

We now make the following three claims:

1. $y' \in S_2 - S_1$.
2. $(x', y'] \subset S_2 - S_1$.
3. $\mathrm{interior}\, B\left( \frac{x' + y'}{2}, \frac{\gamma}{2} \right) \subset S_2 - S_1$.

First, note that $y'$ differs from $y$ along a fiber of $M_2$ by exactly $\sigma$, therefore $[x', y'] \subset S_2$. Second, because $x' \in \partial S_1$, there is a neighborhood of $x'$ in $[x', y']$ that is not contained in $S_1$. Hence, if there is a point in $S_1 \cap [x', y']$ there must be a point $z' \in \partial S_1 \cap [x', y']$, with $z' \neq x'$. This implies the existence of two distinct points whose fibers of length less than $\kappa - \sigma$ cross, which contradicts the fact that $\Delta(\partial S_1) \ge \kappa - \sigma$. Claims 1 and 2 follow.

Let $B = B\left( \frac{x' + y'}{2}, \frac{\gamma}{2} \right)$.
By construction, $B$ is tangent to $\partial S_1$ at $x'$ and tangent to $\partial S_2$ at $y'$, and $B$ contains $[x', y']$. The ball has radius $\gamma/2 = (1/2)\min\{\kappa - \sigma, b\} < \kappa - \sigma$. Because $B$ intersects $S_2 - S_1$, the interior of $B$ cannot intersect either $\partial S_1$ or $\partial S_2$. Claim 3 follows by a similar argument as in the proof of Claim 2. (In particular, if there were a point in the interior of $B$ that is either in $S_1$ or outside $S_2$, a line segment from $(x' + y')/2$ to that point would have to intersect the corresponding boundary, which cannot happen.)

Now $V(B) = (\gamma/2)^D \pi^{D/2} / \Gamma(D/2 + 1)$. So
$$h(Q_1, Q_2) \ge \ell_1(Q_1, Q_2) = \int |q_1 - q_2| \ge \int_{S_1 \cap S_2^c} |q_1 - q_2| = \int_{S_1 \cap S_2^c} q_1 = Q_1(S_1 \cap S_2^c) \ge C_* V(S_1 \cap S_2^c) = C_* \frac{(\gamma/2)^D \pi^{D/2}}{\Gamma(D/2 + 1)}.$$
Hence,
$$\gamma = \min\{\kappa - \sigma, b\} \le \left[ \frac{2}{\sqrt{\pi}} \left( \frac{\Gamma(D/2 + 1)}{C_*} \right)^{1/D} \right] h^{1/D}(Q_1, Q_2).$$
If $\kappa - \sigma \le b$ this implies that $h(Q_1, Q_2) > c$, which contradicts the assumption that $h(Q_1, Q_2) < c$. Therefore, $\gamma = b$ and the conclusion follows.

Step 2d. Computing the Hausdorff Rate of the Pilot.

Lemma 14. Let $a_n = \left( \frac{C_0}{n} \right)^{\frac{2}{D(d+2)}}$. For all large $n$,
$$\sup_{Q \in \mathcal{Q}} Q^n\left( \left\{ H(M, \widetilde{M}) > a_n \right\} \right) \le \exp\left\{ -C n^{\frac{d}{2+d}} \right\}. \tag{26}$$

Proof. Follows by combining Lemma 12 and Lemma 13.

We conclude that, with high probability, the true manifold $M$ is contained in the set
$$\mathcal{M}_n = \left\{ M \in \mathcal{M}(\kappa) : H(\widetilde{M}, M) \le a_n \right\}.$$

Step 3: Cover With Slabs

Now we cover the pilot estimator $\widetilde{M}$ with (possibly overlapping) slabs. Let $\delta_n = \left( \frac{C \log n}{n} \right)^{\frac{1}{2+d}}$. It follows from part 6 of Lemma 3 that there exists a collection of points $F = \{x_1, \dots, x_N\} \subset \widetilde{M}$, such that $N = (c\delta_n)^{-d} = (Cn/\log n)^{d/(2+d)}$ and such that $\widetilde{M} \subset \bigcup_{j=1}^{N} B_D(x_j, c\delta_n)$.

Step 3a. The Fibers of $\widetilde{M}$ Cover $M$ Nicely.

Lemma 15. Let $b = \sigma + a_n$.
For $\tilde{x} \in \widetilde{M}$, let $L_b(\tilde{x}) = T^{\perp}_{\tilde{x}} \widetilde{M} \cap B_D(\tilde{x}, b)$ be a fiber at $\tilde{x}$ of size $b$. Let $M \in \mathcal{M}_n$. Then:

1. If $\tilde{x} \in \widetilde{M}$ and $x \in M$ are such that $\|x - \tilde{x}\| \le a_n$, then $\mathrm{angle}(T_x M, T_{\tilde{x}} \widetilde{M}) < \pi/4$.
2. $L_b(\tilde{x}) \cap M \neq \emptyset$.
3. If $x \in L_b(\tilde{x}) \cap M$, then $\|x - \tilde{x}\| \le 2a_n$.
4. For any $\tilde{x} \in \widetilde{M}$, $\#\{L_b(\tilde{x}) \cap M\} = 1$.
5. We have $M \subset \bigcup_{\tilde{x} \in \widetilde{M}} L_b(\tilde{x})$.

Proof. 1. Let $x$ and $\tilde{x}$ be as given in the statement of the lemma and let $\theta = \mathrm{angle}(T_x M, T_{\tilde{x}} \widetilde{M})$. Suppose that $\theta \ge \pi/4$. There exist unit vectors $u \in T_{\tilde{x}} \widetilde{M}$ and $v \in T_x M$ such that $\mathrm{angle}(u, v) = \theta$.

[Figure 4: Figure for the proof of part 1 of Lemma 15.]

Without loss of generality, we can assume that $x = \tilde{x}$. (The extension to the case $x \neq \tilde{x}$ is straightforward.) Consider the plane defined by $u$ and $v$ as in Figure 4. We assume, without loss of generality, that $(u + v)/2$ generates the $x$-axis in this plane and that $v$ lies above the $x$-axis and $u$ lies below the $x$-axis. Let $\ell$ denote the horizontal line, parallel to the $x$-axis and lying $2a_n$ units above the horizontal axis. Hence, $u$ and $v$ each make an angle greater than $\pi/8$ with respect to the $x$-axis. Consider the two circles $C_1$ and $C_2$ tangent to $M$ at $x$ with radius $\kappa$, where $C_1$ lies below $v$ and $C_2$ lies above $v$. Let $w$ be the point at which $C_1$ intersects $\ell$. The arclength of $C_1$ from $x$ to $w$ is $Ca_n$ for some $C > 1$. Let $\gamma$ be the geodesic on $M$ through $x$ with gradient $v$. The projection $\hat{\gamma}$ of $\gamma$ into the plane must fall between $C_1$ and $C_2$. Let $y = \gamma(Ca_n)$ and $\hat{y}$ be the projection of $y$ into the plane. Now $\|y - \tilde{x}\| \ge \|\hat{y} - \tilde{x}\| \ge \|w - \tilde{x}\| \ge 2a_n > a_n$. There exists $\tilde{z} \in \widetilde{M}$ such that $\|\tilde{z} - y\| \le a_n$.
Hence, $\|\hat{z} - \hat{y}\| \le a_n$ where $\hat{z}$ is the projection of $\tilde{z}$ into the plane. Let $q$ be the point on the plane with coordinates $(a_n\sqrt{C^2 - 1}, a_n)$. Thus, $\|q - \tilde{x}\| = Ca_n$. Note that $\mathrm{angle}(\hat{z} - \tilde{x}, u)$ is larger than the angle between $q - \tilde{x}$ and the $x$-axis, which is $\arctan\left( \frac{1}{\sqrt{C^2 - 1}} \right) \equiv \alpha > 0$. Hence, $\mathrm{angle}(\tilde{z} - \tilde{x}, u) \ge \mathrm{angle}(\hat{z} - \tilde{x}, u) \ge \alpha$. Let $\tilde{\gamma}$ be a geodesic on $\widetilde{M}$, parameterized by arclength, connecting $\tilde{x}$ and $\tilde{z}$. Thus $\tilde{\gamma}(0) = \tilde{x}$ and $\tilde{\gamma}(T) = \tilde{z}$ for some $T$. There exists some $0 \le t \le T$ such that $\tilde{\gamma}'(t) \propto \tilde{z} - \tilde{x}$. So $\mathrm{angle}(\tilde{\gamma}'(t), \tilde{\gamma}'(0)) = \alpha > 0$. However, $\|\tilde{z} - \tilde{x}\| \le (C + 1)a_n$, which implies, by part 2 of Lemma 3, that $\mathrm{angle}(\tilde{\gamma}'(t), \tilde{\gamma}'(0)) = O(\sqrt{a_n}) < \alpha$, which is a contradiction.

2. For any $\tilde{x} \in \widetilde{M}$, the closest point $x \in M$ must satisfy $\|x - \tilde{x}\| \le a_n$. Let $y$ be the projection of $x$ onto $T_{\tilde{x}} \widetilde{M}$. Let $U = T_{\tilde{x}} \widetilde{M} \cap B_d(y, a_n)$. Let $\mathrm{Cyl} = \bigcup_{u \in U} B_D(u, 3a_n) \cap \left( T_{\tilde{x}} \widetilde{M} \right)^{\perp}$. $\mathrm{Cyl}$ is a small hyper-cylinder containing $y$ and $\tilde{x}$, with the former in the center. $M$ cannot intersect the top or bottom faces of the cylinder. Otherwise, we can find a point $p \in M$ such that $\mathrm{angle}(T_{\tilde{x}} \widetilde{M}, T_p M) > \arctan(1) = \pi/4$, contradicting 1. Thus, any path through $x$ on $M$ must intersect the sides of $\mathrm{Cyl}$. Hence, $L_b(\tilde{x}) \cap M \neq \emptyset$.

3. Let $x \in M \cap L_b(\tilde{x})$. Suppose that $\|x - \tilde{x}\| > 2a_n$. There exists $q \in \widetilde{M}$ such that $\|q - x\| \le a_n$. Note that $\|q - \tilde{x}\| > a_n$. Now we apply part 5 of Lemma 3 with $p = \tilde{x}$ and $v = x$. This implies that $\|v - p\| = \|x - \tilde{x}\| < a_n^2/\kappa$, which contradicts the assumption that $\|x - \tilde{x}\| > 2a_n$.

4. Suppose that more than one point of $M$ were in $L_b(\tilde{x})$. Pick two and call them $x_1$ and $x_2$. By 3, $\|x_i - \tilde{x}\| \le 2a_n$. It follows that $\|x_1 - x_2\| \le 4a_n$ and thus they are $O(a_n)$ close in geodesic distance by part 3 of Lemma 3.
Hence, there is a geodesic on $M$ connecting $x_1$ and $x_2$ that is contained strictly within the $Ca_n$ ball. Because $x_2 - x_1$ lies in $L_b(\tilde{x})$ and is consequently orthogonal to $T_{\tilde{x}} \widetilde{M}$, there must exist a point on the geodesic whose angle with $T_{\tilde{x}} \widetilde{M}$ equals $\pi/2$, contradicting part 1.

5. Because $H(\widetilde{M}, M) \le a_n$, we have that $M \subset \mathrm{tube}(\widetilde{M}, a_n)$. Because $a_n < \kappa$, the fibers $L_b(\tilde{x})$ partition $\mathrm{tube}(\widetilde{M}, a_n)$. Hence, each $x \in M$ must lie on one (and only one) $L_b(\tilde{x})$.

Step 3b. Construct slabs that cover $M$ nicely.

Let $\gimel_j = B_D(x_j, \delta_n) \cap \widetilde{M}$. Define the slab
$$R_j = \bigcup_{x \in \gimel_j} L_b(x, \widetilde{M}). \tag{27}$$

Lemma 16. The collection of slabs $R_1, \dots, R_N$ has the following properties. Let $M \in \mathcal{M}_n$.

1. $M \subset \bigcup_{j=1}^{N} R_j$.
2. $M \cap R_j$ is function-like over $R_j$. That is, there exists a function $g_j : \gimel_j \to \mathbb{R}^{D-d}$ such that $M \cap R_j = \{g_j(x) : x \in \gimel_j\}$.
3. For each $x \in \gimel_j$, $L_b(x) \cap M \neq \emptyset$.
4. There exists a linear function $\ell_j : \gimel_j \to \mathbb{R}^{D-d}$ such that $\sup_{x \in \gimel_j} \|g_j(x) - \ell_j(x)\| \le C\delta_n^2$.
5. $\sup_{M \in \mathcal{M}_n} \mathrm{diam}(M \cap R_j) \le C\delta_n$.

Thus the slabs cover $M$, and $M$ cuts across $R_j$ in a function-like way. Moreover, $M \cap R_j$ is nearly linear.

Proof. The first three claims follow immediately from Lemma 15. In particular, $g_j$ in claim 2 is defined by $g_j(x) = \{M \cap L_b(x)\}$. Now we show 4. We can write
$$g_j(x) = g_j(x_j) + (x - x_j)^T \nabla g + \frac{1}{2}(x - x_j)^T\, \mathrm{Hess}\, (x - x_j)$$
where $\mathrm{Hess}$ is the Hessian matrix of $g_j$ evaluated at some point between $x$ and $x_j$. By part 1 of Lemma 3, the largest eigenvalue of $\mathrm{Hess}$ is bounded above by $1/\kappa$. Since $\|x - x_j\| \le \delta_n$ on $\gimel_j$, the quadratic term is at most $C\delta_n^2$ and the claim follows. Part 5 follows easily.

Step 4: Local Conditional Likelihood

Recall that $\mathcal{M}_n = \{M \in \mathcal{M}(\kappa) : H(\widetilde{M}, M) \le a_n\}$. Let
$$\mathcal{Q}_n = \{Q_M : M \in \mathcal{M}_n\}. \tag{28}$$
Consider a slab $R_j$. For each $Q \in \mathcal{Q}_n$ define $Q_j \equiv Q(\cdot \mid R_j)$ by $Q_j(A) = Q(A \cap R_j)/Q(R_j)$.
Note that $Q_j$ is supported over $\mathrm{tube}(M, \sigma) \cap R_j$. Let $\mathcal{Q}_{n,j} = \{Q_j : Q \in \mathcal{Q}_n\}$. Before we proceed we need to establish the following.

Lemma 17. Let $I_j(M) = \mathrm{tube}(M, \sigma) \cap R_j$. Then there exists $c_0 > 0$ such that $\inf_{M \in \mathcal{M}_n} V(I_j(M)) \ge c_0 \delta_n^d$.

Proof. By Lemma 16, $M \cap R_j$ lies in a slab of size $a_n$ orthogonal to $\gimel_j$. Because the angle between the two manifolds on this set must be no more than $\pi/4$ and because $a_n > \delta_n$, the manifold $M$ cannot intersect both the "top" and "bottom" surfaces of the slab. Hence, for large enough $C > 0$, $J_j = \bigcup_{x \in \gimel_j} B_D(x, \sigma/C) \subset I_j$. By construction, $V(I_j) \ge V(J_j) \ge c\delta_n^d$.

Step 4a. The Entropy of $\mathcal{Q}_{n,j}$.

Lemma 18. $H_{[\,]}(\epsilon, \mathcal{Q}_{n,j}, h) \le c_1 \log(c_2/\epsilon)$.

Proof. We begin by creating a $\gamma$ Hausdorff net for $\mathcal{Q}_{n,j}$. To do this, we will parameterize the support of these distributions. Each $Q \in \mathcal{Q}_{n,j}$ has support in the collection $\mathcal{S}_{n,j} = \{(M \oplus \sigma) \cap R_j : M \in \mathcal{M}_n\}$. We will construct a $\gamma$-Hausdorff net for $\mathcal{S}_{n,j}$. Let $\tilde{x} \in \widetilde{M}$ be the center of $\gimel_j$. Let $y_1, \dots, y_r$ be a $c_1\gamma$-net of $L_b(\tilde{x})$, and let $\theta_1 < \theta_2 < \cdots < \theta_s < \pi/2 - \eta$ for a small, fixed $\eta > 0$, where $\theta_j - \theta_{j-1} \le c_2\gamma$. Note that $r = O(\gamma^{-(D-d)})$ and $s = O(1/\gamma)$. For every pair $y_i$ and $\theta_j$, let $M_{ij}$ be an $M \in \mathcal{M}_n$ that crosses through $y_i$ with $\mathrm{angle}(T_{y_i} M, T_{\tilde{x}} \widetilde{M}) = \theta_j$. These manifolds comprise a collection of size $O((1/\gamma)^{D-d-1})$, which we will denote by $\mathrm{Net}(\gamma)$.

Let $M \in \mathcal{M}_n$. Let $y$ be the point where $M$ crosses $L_b(\tilde{x})$. Let $y_i$ be the closest point in the net to $y$ and let $\theta_j$ be the closest angle in the net to $\mathrm{angle}(T_y M, T_{\tilde{x}} \widetilde{M})$. Because the angle between $M$ and $M_{ij}$ is strictly less than $\pi/4$ (part 1 of Lemma 15) and the slab $R_j$ has radius $\delta_n$, it follows that $H(M, M_{ij}) \le C_1\gamma + \delta_n C_2 \gamma \le C\gamma$. Hence, $\mathrm{Net}(\gamma)$ is a $\gamma$-Hausdorff net. Now consider $\mathrm{Net}(\gamma)$ with $\gamma = \epsilon^2$.
For each $M_{ij} \in \mathrm{Net}(\gamma)$ let $q_{ij}$ be the corresponding density and define $u_{ij}$ and $\ell_{ij}$ by
$$u_{ij}(y) = \left( q_{ij}(y) + \frac{C\epsilon^2}{V(M_{ij} \oplus (\sigma + \epsilon^2))} \right) I\big(y \in M_{ij} \oplus (\sigma + \epsilon^2)\big)$$
and
$$\ell_{ij}(y) = \left( q_{ij}(y) - \frac{C\epsilon^2}{V(M_{ij} \oplus (\sigma - \epsilon^2))} \right) I\big(y \in M_{ij} \oplus (\sigma - \epsilon^2)\big).$$
Let $\mathcal{B} = \{(\ell_{ij}, u_{ij})\}$. Let $M \in \mathcal{M}_n$ and let $M_{ij}$ be the element of the net closest to $M$. It follows easily that $u_{ij} \ge q_M \ge \ell_{ij}$. Thus $\mathcal{B}$ is a bracketing. Now,
$$\int u_{ij} - \ell_{ij} = 1 + C\epsilon^2 - (1 - C\epsilon^2) = 2C\epsilon^2.$$
Hence, $h(u_{ij}, \ell_{ij}) \le \sqrt{\int |u_{ij} - \ell_{ij}|} = \sqrt{2C}\,\epsilon$. Hence, $\mathcal{B}$ is a $\sqrt{2C}\,\epsilon$-bracketing. So,
$$H_{[\,]}(\epsilon, \mathcal{Q}_{n,j}, h) \le (D - d - 1)\log(c/\epsilon), \tag{29}$$
which proves the lemma.

Step 4b. Hellinger Rate of the Conditional MLE.

Let $\hat{q}$ be the mle over $\mathcal{Q}_{n,j}$ using the $Y_i$'s in $R_j$. Let $\widehat{M}$ be the manifold corresponding to $\hat{q}$ and let $\widehat{M}_j = \widehat{M} \cap R_j$.

Lemma 19. For all $Q$, all $A > 0$ and all large $n$,
$$Q^n\left( \left\{ h(Q, \widehat{Q}) > \left( \frac{C_0 \log n}{n} \right)^{\frac{1}{2+d}} \right\} \right) \le n^{-A}.$$

Proof. Let $N_j$ be the number of observations from the second half of the data that are in $R_j$. Let $\mu_j = E(N_j)$ and define $m_n = n^{\frac{2}{2+d}}$. First, we claim that $N_j \ge \mu_j/2 = O(m_n)$ for all $j$, except on a set of probability $e^{-cn^{2/(2+d)}}$. Let $\pi_j = Q(R_j)$. By Lemma 17 and Lemma 4, $\pi_j \ge c\delta_n^d$ for some $c > 0$. Hence, $\mu_j \ge m_n$. Note that $\sigma^2 \equiv \mathrm{Var}(N_j)/n = \pi_j(1 - \pi_j) \le \pi_j$. Let $t = \mu_j/2$. By Bernstein's inequality,
$$P(N_j \le \mu_j/2) = P(N_j - \mu_j \le -\mu_j/2) \le \exp\left\{ -\frac{t^2}{2n\sigma^2 + 2t/3} \right\} \le \exp\left\{ -cn^{2/(2+d)} \right\}.$$
Hence, by the union bound,
$$P(N_j \le \mu_j/2 \text{ for some } j) \le N \exp\left\{ -cn^{2/(2+d)} \right\} \le \exp\left\{ -c' n^{2/(2+d)} \right\}$$
since there are $N = O(\delta_n^{-d})$ slabs. Thus we can assume that there are at least order $m_n$ observations in each $R_j$. Since $H_{[\,]}(\epsilon, \mathcal{Q}_{n,j}, h) \le \log(C(1/\epsilon))$, solving the equation $H_{[\,]}(\epsilon, \mathcal{Q}_{n,j}, h) = m_n\epsilon^2$ we get $\epsilon_m \ge \sqrt{C \log m_n / m_n} = (\log n/n)^{2/(2(2+d))} = \delta_n$.
From Lemma 8, we have, for all $Q \in \mathcal{Q}_{n,j}$,
$$Q^n\left( \left\{ h(Q, \widehat{Q}) > \delta_n \right\} \right) = Q^n\left( \left\{ h(Q, \widehat{Q}) > \epsilon_m \right\} \right) \le c_1 e^{-c_2 m_n \epsilon_m^2} \le n^{-A}.$$

Step 4c. Relating Hausdorff Distance to Hellinger Distance Within a Slab.

Lemma 20. For each $M_1, M_2 \in \mathcal{M}_n$, $H(M_1 \cap R_j, M_2 \cap R_j) \le C\, h^2(Q_{j1}, Q_{j2})$.

Proof. Let $g_1$ and $g_2$ be defined as in Lemma 16. There exists $x \in \gimel_j$ such that $g_1(x) \in M_1$, $g_2(x) \in M_2$ and $\|g_1(x) - g_2(x)\| = \gamma$. We claim there exists $\gimel' \subset \gimel_j$ such that $\inf_{x \in \gimel'} \|g_1(x) - g_2(x)\| \ge \gamma/2$ and such that $V(\gimel') \ge c\delta_n^d$. This follows since $g_1$ and $g_2$ are smooth, they both lie in a slab of size $a_n$ around $\gimel_j$, and the angle between the tangent of $g_j(x)$ and $\gimel_j$ is bounded by $\pi/4$.

Create a modified manifold $M_2'$ such that $M_2'$ differs from $M_1$ over $\gimel'$ by a $\gamma/2$ shift orthogonal to $\gimel_j$ and such that $M_2'$ is otherwise equal to $M_1$. It follows that $\ell_1(M_1, M_2) \ge \ell_1(M_1, M_2')$ and $h(Q_1, Q_2) \ge h(Q_1, Q_2')$. Every point in the support of the conditioned distributions can be written as an ordered pair $(x, y)$ where $x \in \gimel_j$ and $y$ lies in a $d'$-ball of radius $\sigma$. $M_2'$ is shifted a distance of $\gamma/2$ in the direction orthogonal to $\gimel_j$. As a result, the $\ell_1$ distance between $M_1$ and $M_2'$ equals the integral over $\gimel'$ of the volume difference between two $d'$-balls of the same radius that are shifted by $\gamma/2$ relative to each other. This volume is of order $\delta_n^d \gamma$. Hence, $V\big( (M_1 \cap \gimel_j) \circ (M_2 \cap \gimel_j) \big) \ge \gamma\delta_n^d$.

Let $A = \{x \in \gimel_j : q_1 > 0, q_2 = 0\}$, $B = \{x \in \gimel_j : q_1 > 0, q_2 > 0\}$, $C = \{x \in \gimel_j : q_1 = 0, q_2 > 0\}$. At least one of $A$ or $C$ has volume at least $\gamma\delta_n^d/2$. Without loss of generality, assume that it is $A$. Then
$$h^2(q_1, q_2) = \int (\sqrt{q_1} - \sqrt{q_2})^2 \ge \int_A (\sqrt{q_1} - \sqrt{q_2})^2 = \int_A q_1 \ge C_* \frac{c\,\delta_n^d \gamma}{\delta_n^d} = cC_*\gamma = cC_* H(M_1, M_2).$$

Step 4d. The Hausdorff Rate.
Lemma 21. For any $A > 0$ there exists $C_0$ such that
$$Q^n\left( \left\{ H(M \cap R_j, \widehat{M}_j) > \left( \frac{C_0 \log n}{n} \right)^{\frac{2}{2+d}} \right\} \right) \le \frac{1}{n^A}.$$

Proof. This follows by combining Lemma 20 and Lemma 19.

Step 5: Final Estimator

Now we can combine the estimators from the different slabs. Let $\widehat{M} = \bigcup_{j=1}^{N} \widehat{M}_j$. Recall that the number of slabs is $N = (c\delta_n)^{-d} = (Cn/\log n)^{d/(2+d)}$.

Proof of Theorem 2. Choose an $A > 2/(2+d)$. We have:
$$Q^n\left( \left\{ H(\widehat{M}, M) > \left( \frac{C_0 \log n}{n} \right)^{\frac{2}{2+d}} \right\} \right) \le \sum_j Q^n\left( \left\{ H(\widehat{M}_j, M \cap R_j) > \left( \frac{C_0 \log n}{n} \right)^{\frac{2}{2+d}} \right\} \right) \le \frac{N}{n^A} = \left( \frac{n}{C \log n} \right)^{\frac{1}{2+d}} \times \frac{1}{n^A} \le \frac{c}{n^A}.$$
Let $r_n = \left( \frac{C_0 \log n}{n} \right)^{2/(2+d)}$. Since $M$ and $\widehat{M}$ are contained in a compact set, $H(\widehat{M}, M)$ is uniformly bounded above by a constant $K_0$. Hence,
$$E_Q H(\widehat{M}, M) = E_Q\big[ H(\widehat{M}, M) I(H(\widehat{M}, M) > r_n) \big] + E_Q\big[ H(\widehat{M}, M) I(H(\widehat{M}, M) \le r_n) \big] \le K_0\, Q^n(H(\widehat{M}, M) > r_n) + r_n \le \frac{c}{n^A} + r_n = O\left( \left( \frac{\log n}{n} \right)^{2/(2+d)} \right). \qquad ∎$$

5. A Simple, Consistent Estimator

Here we give a practical, consistent estimator, although one that does not converge at the optimal rate. It is a generalization of the estimator in Genovese et al. (2010) and is similar to the estimator in Niyogi et al. (2006). Let
$$\widehat{S} = \bigcup_{i=1}^{n} B_D(Y_i, \epsilon) \tag{30}$$
and define $\widehat{\partial S} = \partial(\widehat{S})$, $\hat{\sigma} = \max_{y \in \widehat{S}} d(y, \widehat{\partial S})$ and
$$\widehat{M} = \left\{ y \in \widehat{S} : d(y, \widehat{\partial S}) \ge \hat{\sigma} - 2\epsilon \right\}. \tag{31}$$

Lemma 22. Let $\epsilon_n = C(\log n/n)^{1/D}$ in the estimator $\widehat{M}$. Then
$$H(M, \widehat{M}) = O\left( \left( \frac{\log n}{n} \right)^{1/D} \right) \tag{32}$$
almost surely for all large $n$.

Before proving the lemma we need a few definitions. Following Cuevas and Rodríguez-Casal (2004), we say that a set $S$ is $(\chi, \lambda)$-standard if there exist positive numbers $\chi$ and $\lambda$ such that
$$\nu_D(B_D(y, \epsilon) \cap S) \ge \chi\, \nu_D(B(y, \epsilon)) \quad \text{for all } y \in S,\ 0 < \epsilon \le \lambda. \tag{33}$$
We say that $S$ is partly expandable if there exist $r > 0$ and $R \ge 1$ such that $H(\partial S, \partial(S \oplus \epsilon)) \le R\epsilon$ for all $0 \le \epsilon < r$. A standard set has no sharp peaks while a partly expandable set has no deep inlets.

Lemma 23. If $\sigma < \Delta(M)$ then $S = M \oplus \sigma$ is standard with $\chi = 2^{-D}$ and $\lambda = \sigma$, and partly expandable with $r = \Delta(M) - \sigma$ and $R = 1$.

Proof. Let $\chi = 2^{-D}$. Let $y$ be a point in $S$ and let $\Lambda(y) \le \sigma$ be its distance from the boundary $\partial S$. If $\Lambda(y) \ge \epsilon$ then $B_D(y, \epsilon) \cap S = B_D(y, \epsilon)$ so that $\nu_D(B_D(y, \epsilon) \cap S) = \nu_D(B_D(y, \epsilon)) \ge \chi\, \nu_D(B_D(y, \epsilon))$. Suppose that $\Lambda(y) < \epsilon$. Let $v$ be a point on the manifold closest to $y$ and let $y^*$ be the point on the segment joining $y$ to $v$ such that $\|y - y^*\| = \epsilon/2$. The ball $A = B_D(y^*, \epsilon/2)$ is contained in both $B_D(y, \epsilon)$ and $S$. Hence, $\nu_D(B_D(y, \epsilon) \cap S) \ge \nu_D(A) \ge \chi\, \nu_D(B_D(y, \epsilon))$. This is true for all $\epsilon \le \sigma$, hence $S$ is $(\chi, \lambda)$-standard for $\chi = 1/2^D$ and $\lambda = \sigma$.

Now we show that $S$ is partly expandable. By Proposition 1 in Cuevas and Rodríguez-Casal (2004) it suffices to show that a ball of radius $r$ rolls freely outside $S$ for some $r$, meaning that, for each $y \in \partial S$, there is an $a$ such that $y \in B(a, r) \subset S^c$, where $S^c$ is the complement of $S$. Let $O_y$ be the ball of radius $\Delta - \sigma$ tangent to $\partial S$ at $y$ such that $O_y \subset S^c$. Such a ball exists by virtue of the fact that $\sigma < \Delta(M)$.

Theorem 24 (Cuevas and Rodríguez-Casal (2004)). Let $Y_1, \dots, Y_n$ be a random sample from a distribution with support $S$. Let $S$ be compact, $(\chi, \lambda)$-standard and partly expandable. Let
$$\widehat{S} = \bigcup_{i=1}^{n} B(Y_i, \epsilon_n) \tag{34}$$
and let $\widehat{\partial S}$ be the boundary of $\widehat{S}$. Let $\epsilon_n = C(\log n/n)^{1/D}$ with $C > (2/(\chi\,\omega_D))^{1/D}$ where $\omega_D = V(B_D(0, 1))$. Then, with probability one,
$$H(S, \widehat{S}) \le C\left( \frac{\log n}{n} \right)^{1/D} \quad \text{and} \quad H(\partial S, \widehat{\partial S}) \le C\left( \frac{\log n}{n} \right)^{1/D} \tag{35}$$
for all large $n$. Also, $S \subset \widehat{S}$ almost surely for all large $n$.
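To make the construction in (30) and (31) concrete, here is a minimal sketch for the special case where $M$ is the unit circle in $\mathbb{R}^2$ ($d = 1$, $D = 2$) with noise uniform on the normal fibers. It is not the authors' implementation. The grid discretization, the use of outside grid cells adjacent to $\widehat{S}$ as a stand-in for $\widehat{\partial S}$, the function names, and all parameter values are assumptions made for illustration, so distances are accurate only up to the grid spacing `h`.

```python
import math
import random

def sample_noisy_circle(n, sigma, rng):
    """Draw n points Y = xi + Z with xi uniform on the unit circle and
    Z uniform on the normal (radial) fiber of half-length sigma."""
    pts = []
    for _ in range(n):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        r = 1.0 + rng.uniform(-sigma, sigma)
        pts.append((r * math.cos(theta), r * math.sin(theta)))
    return pts

def simple_estimator(samples, eps, h, half_width):
    """Grid approximation of (30)-(31); returns (grid points of M_hat, sigma_hat)."""
    # Hash samples into square cells of side eps so that each membership
    # query only inspects the 9 cells around the query point.
    buckets = {}
    for x, y in samples:
        buckets.setdefault((math.floor(x / eps), math.floor(y / eps)), []).append((x, y))

    def in_union(px, py):
        # True if (px, py) lies in some open ball B(Y_i, eps).
        cx, cy = math.floor(px / eps), math.floor(py / eps)
        for i in (cx - 1, cx, cx + 1):
            for j in (cy - 1, cy, cy + 1):
                for x, y in buckets.get((i, j), ()):
                    if (x - px) ** 2 + (y - py) ** 2 < eps * eps:
                        return True
        return False

    m = int(round(half_width / h))
    inside = {(i, j) for i in range(-m, m + 1) for j in range(-m, m + 1)
              if in_union(i * h, j * h)}
    # Discrete stand-in for the boundary of S_hat: cells outside S_hat
    # that have at least one of their 8 neighbours inside S_hat.
    boundary = {(i + di, j + dj) for (i, j) in inside
                for di in (-1, 0, 1) for dj in (-1, 0, 1)} - inside
    # Approximate d(y, boundary of S_hat) for every inside cell.
    dist = {c: h * min(math.hypot(c[0] - b[0], c[1] - b[1]) for b in boundary)
            for c in inside}
    sigma_hat = max(dist.values())
    m_hat = [(i * h, j * h) for (i, j), dv in dist.items()
             if dv >= sigma_hat - 2.0 * eps]
    return m_hat, sigma_hat

rng = random.Random(0)            # fixed seed so the run is reproducible
sigma, eps = 0.4, 0.1             # illustrative values only
samples = sample_noisy_circle(4000, sigma, rng)
m_hat, sigma_hat = simple_estimator(samples, eps, h=0.05, half_width=1.6)
# Largest distance from a point of M_hat to the true circle.
worst = max(abs(math.hypot(x, y) - 1.0) for x, y in m_hat)
print(f"sigma_hat={sigma_hat:.3f}  |M_hat|={len(m_hat)}  max dist to M={worst:.3f}")
```

The estimate is a band of grid points around the circle rather than a curve, which reflects the $2\epsilon$ slack in (31). The sketch needs no knowledge of $d$ or of the smoothness of $M$, only the radius $\epsilon$.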
Proof of Lemma 22. Theorem 24 and Lemma 23 imply that $H(S, \widehat{S}) \le C(\log n/n)^{1/D}$ and $H(\partial S, \widehat{\partial S}) \le C(\log n/n)^{1/D}$. It follows that $\hat{\sigma} \ge \sigma - \epsilon$. First we show that $y \in \widehat{M}$ implies that $d(y, M) \le 4\epsilon$. Let $y \in \widehat{M}$. Then $d(y, \partial S) \ge d(y, \widehat{\partial S}) - \epsilon \ge \hat{\sigma} - 2\epsilon - \epsilon \ge \sigma - \epsilon - 2\epsilon - \epsilon = \sigma - 4\epsilon$. So $d(y, M) = \sigma - d(y, \partial S) \le \sigma - \sigma + 4\epsilon = 4\epsilon$. Now we show that $M \subset \widehat{M}$. Suppose that $y \in M$. Then, $d(y, \widehat{\partial S}) \ge d(y, \partial S) - \epsilon = \sigma - \epsilon \ge \hat{\sigma} - 2\epsilon$ so that $y \in \widehat{M}$. ∎

6. Conclusion and Open Questions

We have established that the optimal rate for estimating a smooth manifold in Hausdorff distance is $n^{-\frac{2}{2+d}}$. We conclude with some comments and open questions.

1. We have assumed that the noise is perpendicular to the manifold. In current work we are deriving the minimax rate under the more general assumption that the noise is drawn from a general, spherically symmetric distribution. We also allow the distribution along the manifold to be any smooth density bounded away from 0. The rates are quite different and the methods for proving the rates are substantially more involved. Moreover, the rates depend on the behavior of the noise density near the boundary of its support. We will report on this elsewhere.

2. Perhaps the most important open question is to find a computationally tractable estimator that achieves the optimal rate. It is possible that combining the estimator in Section 5 with one of the estimators in the computational geometry literature (Dey (2006)) could work. However, it appears that some modification of such an estimator is needed. This is a difficult question which we hope to address in the future.

3. It is interesting to note that Niyogi et al. (2006) have a Gaussian noise distribution. While it is possible to infer the homology of $M$ with Gaussian noise, it is not possible to infer $M$ itself with any accuracy.
The reason is that manifold estimation is similar to (and in fact, more difficult than) nonparametric regression with measurement error. In that case, it is well known that the fastest possible rates under Gaussian noise are logarithmic. This highlights an important distinction between estimating the topological structure of $M$ versus estimating $M$ in Hausdorff distance.

4. The current results take $\Delta(M)$, $d$ and $\sigma$ as known (or at least bounded by known constants). In practice these must be estimated. We do not know whether there exist minimax estimators that are adaptive over $d$, $\Delta(M)$ and $\sigma$.

Acknowledgments

The authors thank Don Sheehy for helpful comments on an earlier draft of this paper. The authors also thank the reviewers for their comments and questions.

7. Appendix

7.1 Proof of Equation 13

We will use the following two results (see Section 2.4 of Tsybakov (2008)):
$$h^2(P^n, Q^n) = 2\left( 1 - \left( 1 - \frac{h^2(P, Q)}{2} \right)^n \right) \tag{36}$$
and
$$P \wedge Q \ge \frac{1}{2}\left( 1 - \frac{h^2(P, Q)}{2} \right)^2. \tag{37}$$
We have
$$P^n \wedge Q^n \ge \frac{1}{2}\left( 1 - \frac{h^2(P^n, Q^n)}{2} \right)^2 = \frac{1}{8}\left( 1 - \frac{h^2(P, Q)}{2} \right)^{2n} \ge \frac{1}{8}\left( 1 - \frac{\ell_1(P, Q)}{2} \right)^{2n}$$
since $h^2(P, Q) \le \ell_1(P, Q)$.

7.2 Proof of Theorem 6

We define two manifolds $M_1$ and $M_2$ with corresponding distributions $Q_1$ and $Q_2$ such that (i) $\Delta(M_i) \ge \kappa$, $i = 1, 2$, (ii) $H(M_1, M_2) = \gamma$ and (iii) the volume of $S_1 \circ S_2$ is of order $\gamma^{\frac{d}{2}+1}$, where $S_i$ is the support of $Q_i$.

We write a generic $D$-dimensional vector as $y = (u, v, z)$, with $u \in \mathbb{R}^d$, $v \in \mathbb{R}$, $z \in \mathbb{R}^{D-d-1}$. For each $u \in \mathbb{R}^d$ with $\|u\| \le 1$, define the disk in $\mathbb{R}^{d+1}$
$$D_0 = \left\{ (u, 0) \in \mathbb{R}^{d+1} : u \in B_d(0, 1) \right\}$$
and let
$$F_0 = \partial\left( \bigcup_{(u,v) \in D_0} B_{d+1}((u, v), \kappa) \right).$$
Now define the following $d$-dimensional manifold in $\mathbb{R}^D$:
$$M_0 = \left\{ (u, v, 0_{D-d-1}) : (u, v) \in F_0 \right\} = \left\{ (u, a(u), 0_{D-d-1}) : u \in B_d(0, 1+\kappa) \right\} \cup \left\{ (u, -a(u), 0_{D-d-1}) : u \in B_d(0, 1+\kappa) \right\}$$
where
$$a(u) = \begin{cases} \kappa & \text{if } \|u\| \le 1 \\ \sqrt{\kappa^2 - (\|u\| - 1)^2} & \text{if } 1 < \|u\| \le 1 + \kappa. \end{cases}$$
The manifold $M_0$ has no boundary and, by construction, $\Delta(M_0) \ge \kappa$. Now define a second manifold that coincides with $M_0$ but has a small perturbation:
$$M_1 = \left\{ (u, b(u), 0_{D-d-1}) : u \in B_d(0, 1+\kappa) \right\} \cup \left\{ (u, -a(u), 0_{D-d-1}) : u \in B_d(0, 1+\kappa) \right\}$$
where
$$b(u) = \begin{cases} \gamma + \sqrt{\kappa^2 - \|u\|^2} & \text{if } \|u\| \le \frac{1}{2}\sqrt{4\gamma\kappa - \gamma^2} \\ 2\kappa - \sqrt{\kappa^2 - \left( \|u\| - \sqrt{4\gamma\kappa - \gamma^2} \right)^2} & \text{if } \frac{1}{2}\sqrt{4\gamma\kappa - \gamma^2} < \|u\| \le \sqrt{4\gamma\kappa - \gamma^2} \\ a(u) & \text{if } \sqrt{4\gamma\kappa - \gamma^2} < \|u\| \le \sqrt{4\gamma\kappa - \gamma^2} + \kappa. \end{cases}$$
Note that $\Delta(M_1) \ge \kappa$ since the perturbation is obtained using portions of spheres of radius $\kappa$. In fact

• for $\|u\| \le \frac{1}{2}\sqrt{4\gamma\kappa - \gamma^2}$, $b(u)$ is the $(d+1)$-th coordinate of the "upper" portion of the $(d+1)$-dimensional sphere with radius $\kappa$ centered at $(0, \dots, 0, \gamma)$, hence $b(u)$ satisfies $\|u\|^2 + (b(u) - \gamma)^2 = \kappa^2$ with $b(u) \ge \gamma$;

• for $\frac{1}{2}\sqrt{4\gamma\kappa - \gamma^2} < \|u\| \le \sqrt{4\gamma\kappa - \gamma^2}$, $b(u)$ is the $(d+1)$-th coordinate of the "lower" portion of the $(d+1)$-dimensional sphere with radius $\kappa$ centered at $\left( u \cdot \sqrt{4\gamma\kappa - \gamma^2}/\|u\|,\ 2\kappa \right)$ (note that the center of the sphere differs according to the direction of $u$), hence $b(u)$ satisfies
$$\left\| u - \frac{u}{\|u\|}\sqrt{4\gamma\kappa - \gamma^2} \right\|^2 + (b(u) - 2\kappa)^2 = \kappa^2 \quad \text{with } b(u) \le 2\kappa.$$

To summarize, $M_0$ and $M_1$ are both manifolds with no boundary, $\Delta(M_0) \ge \kappa$ and $\Delta(M_1) \ge \kappa$. See Figure 5. Now
$$E_0 = M_0 - M_1 = \left\{ (u, a(u), 0_{D-d-1}) : u \in B_d\left(0, \sqrt{4\gamma\kappa - \gamma^2}\right) \right\}$$
$$E_1 = M_1 - M_0 = \left\{ (u, b(u), 0_{D-d-1}) : u \in B_d\left(0, \sqrt{4\gamma\kappa - \gamma^2}\right) \right\}.$$
Note that for each point $y \in E_0$ there exists $y' \in E_1$ such that $\|y - y'\| \le |a(u) - b(u)| \le \gamma$. Also, $y_0 = (0, a(0), 0) \in M_0$ has as its closest $M_1$ point $y_1 = (0, b(0), 0)$, so that $\|y_0 - y_1\| = \gamma$. Hence $H(M_0, M_1) = H(E_0, E_1) = \gamma$.

To find an upper bound for $V(S_0 \circ S_1)$, we show that each $y = (u, v, z) \in S_1 - S_0$ satisfies the following conditions:

(i) $u \in B_d\left(0, \sqrt{4\gamma\kappa - \gamma^2}\right)$;
(ii) $z \in B_{D-d-1}(0, \sigma)$;
(iii) $\kappa + \sigma - \|z\| < v \le \kappa + \gamma + \sigma - \|z\|$.

If $y = (u, v, z)$ belongs to $S_1$ and has $\|u\| > \sqrt{4\gamma\kappa - \gamma^2}$, then there is a point of $M_0 \cap M_1$ within distance $\sigma$, hence $y \notin S_1 - S_0$. This proves (i).

Before proving (ii) and (iii), note that if $u \in B_d\left(0, \sqrt{4\gamma\kappa - \gamma^2}\right)$ then $\kappa = a(u) \le b(u) \le \kappa + \gamma$. Now, let $y' = (u', b(u'), 0) \in E_1$ be the point in $S_1$ closest to $y$. We have
$$d(y, S_1) = \|y - y'\| = \|u - u'\| + |v - b(u')| + \|z\| \le \sigma.$$

[Figure 5: One section of manifolds $M_0$ and $M_1$. The common part is dashed, $E_0$ is dotted and $E_1$ solid. $R_1$ and $R_2$ denote the regions where the different definitions of the perturbation apply: $R_1$ is $\|u\| \le \frac{1}{2}\sqrt{4\gamma\kappa - \gamma^2}$ while $R_2$ denotes $\frac{1}{2}\sqrt{4\gamma\kappa - \gamma^2} < \|u\| \le \sqrt{4\gamma\kappa - \gamma^2}$.]

This gives condition (ii) above, $\|z\| \le \sigma$, and also
$$|v - b(u')| \le \sigma - \|z\|. \tag{38}$$
Since $b(u') \le \kappa + \gamma$, we obtain $v \le b(u') + \sigma - \|z\| \le \kappa + \gamma + \sigma - \|z\|$, which is the right inequality in (iii). Finally,
$$\sigma < d(y, M_0) \le \|y - (u, a(u), 0)\| = |v - a(u)| + \|z\|$$
which implies either $v < a(u) - (\sigma - \|z\|)$ or $v > a(u) + (\sigma - \|z\|)$. The former inequality would imply
$$v < a(u) - (\sigma - \|z\|) = \kappa - (\sigma - \|z\|) \le \inf_{u'} b(u') - (\sigma - \|z\|)$$
so that $|v - b(u')| > \sigma - \|z\|$ for all $u'$, which is in contradiction with (38).
Hence we have $v > a(u) + (\sigma - \|z\|) = \kappa + (\sigma - \|z\|)$, which is the left inequality in (iii). As a consequence,
$$S_1 - S_0 \subset B_d\left(0, \sqrt{4\gamma\kappa - \gamma^2}\right) \times \left\{ (v, z) \in \mathbb{R}^{D-d} : \kappa - \gamma + \sigma - \|z\| < v \le \kappa + \gamma + \sigma - \|z\|,\ z \in B_{D-d-1}(0, \sigma) \right\}$$
and
$$V(S_0 - S_1) \le C \cdot \left( \sqrt{4\gamma\kappa - \gamma^2} \right)^d \cdot \gamma \cdot \sigma^{D-d-1}.$$
Hence, $V(S_0 - S_1) = O(\gamma^{\frac{d}{2}+1})$. With similar arguments one can show that $V(S_1 - S_0) = O(\gamma^{\frac{d}{2}+1})$ so that
$$V(S_0 \circ S_1) = O(\gamma^{\frac{d}{2}+1}).$$
It then follows that $\int |q_0 - q_1| = O(\gamma^{(d+2)/2})$.

References

Richard G. Baraniuk and Michael B. Wakin. Random projections of smooth manifolds. Foundations of Computational Mathematics, 9:51–77, 2007.

M. Birman and M. Solomjak. Piecewise-polynomial approximation of functions of the classes $W_p$. Mathematics of USSR Sbornik, 73:295–317, 1967.

Jean-Daniel Boissonnat and Arijit Ghosh. Manifold reconstruction using tangential Delaunay complexes. In Proceedings of the 2010 annual symposium on computational geometry, pages 324–333. ACM, 2010.

Frederic Chazal and Andre Lieutier. Smooth manifold reconstruction from noisy and non-uniform approximation with guarantees. Computational Geometry, 40:156–170, 2008.

Siu-Wing Cheng and Tamal Dey. Manifold reconstruction from point samples. In Proceedings of the sixteenth annual ACM-SIAM symposium on discrete algorithms, pages 1018–1027. SIAM, 2005.

Antonio Cuevas and Alberto Rodríguez-Casal. On boundary estimation. Advances in Applied Probability, 36(2):340–354, 2004.

Luc Devroye and Gary L. Wise. Detection of abnormal behavior via nonparametric estimation of the support. SIAM Journal on Applied Mathematics, 38:480–488, 1980.

Tamal Dey. Curve and Surface Reconstruction: Algorithms with Mathematical Analysis. Cambridge University Press, 2006.

Tamal Dey and Samrat Goswami. Provable surface reconstruction from noisy samples. In Proceedings of the twentieth annual symposium on computational geometry, pages 330–339. ACM, 2004.

Herbert Federer. Curvature measures. Transactions of the American Mathematical Society, 93:418–491, 1959.

Christopher R. Genovese, Marco Perone-Pacifico, Isabella Verdinelli, and Larry Wasserman. Nonparametric filament estimation. 2010.

Oscar Gonzalez and John H. Maddocks. Global curvature, thickness, and the ideal shapes of knots. Proceedings of the National Academy of Sciences, 96(9):4769–4773, 1999.

L. LeCam. Convergence of estimates under dimensionality restrictions. The Annals of Statistics, pages 38–53, 1973.

J.M. Lee. Introduction to Smooth Manifolds. Springer, 2002.

Partha Niyogi, Steven Smale, and Shmuel Weinberger. Finding the homology of submanifolds with high confidence from random samples. Discrete and Computational Geometry, 39:419–441, 2006.

Partha Niyogi, Steven Smale, and Shmuel Weinberger. A topological view of unsupervised learning from noisy data. Unpublished technical report, University of Chicago, 2008.

Xiaotong Shen and Wing Wong. Probability inequalities for likelihood ratios and convergence rates of sieve mles. The Annals of Statistics, 23:339–362, 1995.

Alexandre Tsybakov. Introduction to Nonparametric Estimation. Springer, 2008.

Bin Yu. Assouad, Fano, and Le Cam. In Festschrift for Lucien Le Cam. Springer, 1997.
