Manifold estimation and singular deconvolution under Hausdorff loss

We find lower and upper bounds for the risk of estimating a manifold in Hausdorff distance under several models. We also show that there are close connections between manifold estimation and the problem of deconvolving a singular measure.

Authors: Christopher R. Genovese, Marco Perone-Pacifico, Isabella Verdinelli, Larry Wasserman

The Annals of Statistics, 2012, Vol. 40, No. 2, 941–963. DOI: 10.1214/12-AOS994. © Institute of Mathematical Statistics, 2012.

By Christopher R. Genovese,^1 Marco Perone-Pacifico,^2 Isabella Verdinelli^2 and Larry Wasserman^3

Carnegie Mellon University, Sapienza University of Rome, Carnegie Mellon University and Sapienza University of Rome, and Carnegie Mellon University

1. Introduction. Manifold learning is an area of intense research activity in machine learning and statistics. Yet a very basic question about manifold learning is still open, namely: how well can we estimate a manifold from $n$ noisy samples? In this paper we investigate this question under various assumptions.

Suppose we observe a random sample $Y_1, \dots, Y_n \in \mathbb{R}^D$ that lies on or near a $d$-manifold $M$, where $d < D$. The question we address is: what is the minimax risk under Hausdorff distance for estimating $M$? Our main assumption is that $M$ is a $d$-dimensional, smooth Riemannian submanifold of $\mathbb{R}^D$; the precise conditions on $M$ are given in Section 2.

Let $Q$ denote the distribution of $Y_i$. We shall see that $Q$ depends on several things, including the manifold $M$, a distribution $G$ supported on $M$ and a model for the noise. We consider three noise models. The first is the noiseless model, in which $Y_1, \dots, Y_n$ is a random sample from $G$. The second is the clutter noise model, in which
$$Y_1, \dots, Y_n \sim (1 - \pi)U + \pi G, \tag{1}$$

Received September 2011; revised January 2012.
^1 Supported by NSF Grant DMS-08-06009.
^2 Supported by Italian National Research Grant PRIN 2008.
^3 Supported by NSF Grant DMS-08-06009, Air Force Grant FA95500910373 and a Sapienza University of Rome grant for visiting professors 2009.

AMS 2000 subject classifications. Primary 62G05, 62G20; secondary 62H12.
Key words and phrases. Deconvolution, manifold learning, minimax.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2012, Vol. 40, No. 2, 941–963. This reprint differs from the original in pagination and typographic detail.

where $U$ is a uniform distribution on a compact set $K \subset \mathbb{R}^D$ with nonempty interior, and $G$ is supported on $M$. (When $\pi = 1$ we recover the noiseless case.) The third is the additive model,
$$Y_i = X_i + Z_i, \tag{2}$$
where $X_1, \dots, X_n \sim G$, $G$ is supported on $M$, and the noise variables $Z_1, \dots, Z_n$ are a sample from a distribution $\Phi$ on $\mathbb{R}^D$ which we take to be Gaussian. In this case, the distribution $Q$ of $Y$ is a convolution of $G$ and $\Phi$, written $Q = G \star \Phi$.

In a previous paper [Genovese et al. (2010)], we considered a noise model in which the noise is perpendicular to the manifold. This model is also considered in Niyogi, Smale and Weinberger (2011). Since we have already studied that model, we shall not consider it further here.

In the additive model, estimating $M$ is related to estimating the distribution $G$, a problem that is usually called deconvolution [Fan (1991)]. The problem of deconvolution is well studied in the statistical literature, but in the manifold case there is an interesting complication: the measure $G$ is singular because it puts all its mass on a subset of $\mathbb{R}^D$ that has zero Lebesgue measure (since the manifold has dimension $d < D$). Deconvolution of singular measures has not received as much attention as standard deconvolution problems and raises interesting challenges.
Each noise model gives rise to a class of distributions $\mathcal{Q}$ for $Y$, defined more precisely in Section 2. We are interested in the minimax risk
$$R_n \equiv R_n(\mathcal{Q}) = \inf_{\hat M}\sup_{Q \in \mathcal{Q}} \mathbb{E}_Q[H(\hat M, M)], \tag{3}$$
where the infimum is over all estimators $\hat M$, and $H$ is the Hausdorff distance [defined in equation (4)]. Note that finding the minimax risk is equivalent to finding the sample complexity $n(\varepsilon) = \inf\{n : R_n \le \varepsilon\}$. We emphasize that the goal of this paper is to find the minimax rates, not to find practical estimators.

We use the Hausdorff distance because it is one of the most commonly used metrics for assessing the accuracy of set-valued estimators. One could of course create other loss functions and study their properties, but this is beyond the scope of this paper. Finally, we remark that our upper bounds sometimes differ from our lower bounds by a logarithmic factor. This is a common phenomenon when dealing with Hausdorff distance (and sup norm in function estimation problems). Currently, we do not know how to eliminate the log factor.

1.1. Related work. In the additive noise case, estimating a manifold is related to deconvolution problems such as those in Fan (1991), Fan and Truong (1993) and Stefanski (1990). More closely related is the problem of estimating the support of a distribution in the presence of noise, as discussed, for example, in Meister (2006).

There is a vast literature on manifold estimation. Much of the literature deals with using manifolds for the purpose of dimension reduction; see, for example, Baraniuk and Wakin (2009) and references therein. We are interested instead in actually estimating the manifold itself. There is a literature on this problem in the field of computational geometry; see Dey (2007).
However, very few papers allow for noise in the statistical sense, by which we mean observations drawn randomly from a distribution. In the literature on computational geometry, observations are called noisy if they depart from the underlying manifold in a very specific way: the observations have to be close to the manifold but not too close to each other. This notion of noise is quite different from random sampling from a distribution. An exception is Niyogi, Smale and Weinberger (2008), who constructed the following estimator: let $I = \{i : \hat p(Y_i) > \lambda\}$, where $\hat p$ is a density estimator. They define $\hat M = \bigcup_{i \in I} B_D(Y_i, \varepsilon)$, where $B_D(Y_i, \varepsilon)$ is a ball in $\mathbb{R}^D$ of radius $\varepsilon$ centered at $Y_i$. Niyogi, Smale and Weinberger (2008) show that if $\lambda$ and $\varepsilon$ are chosen properly, then $\hat M$ is homologous to $M$. This means that $M$ and $\hat M$ share certain topological properties. However, the result does not guarantee closeness in Hausdorff distance.

A very relevant paper is Caillerie et al. (2011). These authors consider observations generated from a manifold and then contaminated by additive noise, as we do in Section 5. Also, they use deconvolution methods, as we do. However, their interest is in upper bounding the Wasserstein distance between an estimator $\hat G$ and the distribution $G$, as a prelude to estimating the homology of $M$. They do not establish Hausdorff bounds. Koltchinskii (2000) considers estimating the number of connected components of a set contaminated by additive noise. This corresponds to estimating the zeroth-order homology.

There is also a literature on estimating principal surfaces. A recent paper on this approach, with an excellent review, is Ozertem and Erdogmus (2011).
This is similar to estimating manifolds but, to the best of our knowledge, this literature does not establish minimax bounds for estimation in Hausdorff distance. Finally, we would like to mention the related problem of testing for a set of points on a surface in a field of uniform noise [Arias-Castro et al. (2005)], but, despite some similarity, this problem is quite different.

1.2. Notation. We let $B_D(x, r)$ denote a $D$-dimensional open ball centered at $x$ with radius $r$. If $A$ is a set and $x$ is a point, then we write $d(x, A) = \inf_{y \in A}\|x - y\|$, where $\|\cdot\|$ is the Euclidean norm. Given two sets $A$ and $B$, the Hausdorff distance between $A$ and $B$ is
$$H(A, B) = \inf\{\varepsilon : A \subset B \oplus \varepsilon \text{ and } B \subset A \oplus \varepsilon\}, \tag{4}$$
where
$$A \oplus \varepsilon = \bigcup_{x \in A} B_D(x, \varepsilon). \tag{5}$$

The $L_1$ distance between two distributions $P$ and $Q$ with densities $p$ and $q$ is $\ell_1(p, q) = \int|p - q|$, and the total variation distance between $P$ and $Q$ is
$$\mathrm{TV}(P, Q) = \sup_A |P(A) - Q(A)|, \tag{6}$$
where the supremum is over all measurable sets $A$. Recall that $\mathrm{TV}(P, Q) = (1/2)\ell_1(p, q)$. Let $p(x) \wedge q(x) = \min\{p(x), q(x)\}$. The affinity between $P$ and $Q$ is
$$\|P \wedge Q\| = \int p \wedge q = 1 - \frac{1}{2}\int|p - q|. \tag{7}$$
Let $P^n$ denote the $n$-fold product measure based on $n$ independent observations from $P$. It can be shown that
$$\|P^n \wedge Q^n\| \ge \frac{1}{8}\Big(1 - \frac{1}{2}\int|p - q|\Big)^{2n}. \tag{8}$$

The convolution between two measures $P$ and $\Phi$, denoted by $P \star \Phi$, is the measure defined by
$$(P \star \Phi)(A) = \int \Phi(A - x)\,dP(x). \tag{9}$$
If $\Phi$ has density $\phi$, then $P \star \Phi$ has density $\int\phi(y - u)\,dP(u)$. The Fourier transform of $P$ is denoted by
$$p^*(t) = \int e^{it^T u}\,dP(u) = \int e^{it\cdot u}\,dP(u), \tag{10}$$
where we use both $t^T u$ and $t\cdot u$ to denote the dot product. We write $X_n = O_P(a_n)$ to mean that for every $\varepsilon > 0$ there exists $C > 0$ such that $P(\|X_n\|/a_n > C) \le \varepsilon$ for all large $n$.
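For finite point sets, the Hausdorff distance in (4) reduces to the familiar max–min formula, which is how set-valued estimators are compared numerically in practice. The following minimal sketch (the helper name `hausdorff` and the use of numpy are our own illustrative choices, not part of the paper) computes it for two small point clouds in $\mathbb{R}^2$:

```python
import numpy as np

def hausdorff(A, B):
    """Hausdorff distance between finite point sets A (m x D) and B (n x D).

    For finite sets, H(A, B) = max( max_{x in A} d(x, B), max_{y in B} d(y, A) ),
    which matches definition (4): A is contained in B ⊕ ε exactly when every
    point of A lies within ε of some point of B.
    """
    # pairwise Euclidean distances: dist[i, j] = ||A[i] - B[j]||
    dist = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return max(dist.min(axis=1).max(), dist.min(axis=0).max())

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[0.0, 0.5], [1.0, 0.0], [2.0, 0.0]])
print(hausdorff(A, B))  # 1.0: the point (2, 0) is at distance 1 from A
```

Note that $H$ is sensitive to a single outlying point, which is one reason set estimators must handle clutter carefully before a Hausdorff guarantee is possible.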
Throughout, we use symbols like $C, C_0, C_1, c, c_0, c_1, \dots$ to denote generic positive constants whose value may be different in different expressions. We write $\mathrm{poly}(\varepsilon)$ to denote any expression of the form $a\varepsilon^b$ for some positive real numbers $a$ and $b$. We write $a_n \preceq b_n$ if there exists $c > 0$ such that $a_n \le c b_n$ for all large $n$. Similarly, write $a_n \succeq b_n$ if $b_n \preceq a_n$. Finally, write $a_n \asymp b_n$ if $a_n \preceq b_n$ and $b_n \preceq a_n$.

We will use Le Cam's lemma to derive lower bounds, which we now state. This version is from Yu (1997).

Lemma 1 (Le Cam 1973). Let $\mathcal{Q}$ be a set of distributions. Let $\theta(Q)$ take values in a metric space with metric $\rho$. Let $Q_0, Q_1 \in \mathcal{Q}$ be any pair of distributions in $\mathcal{Q}$. Let $Y_1, \dots, Y_n$ be drawn i.i.d. from some $Q \in \mathcal{Q}$ and denote the corresponding product measure by $Q^n$. Let $\hat\theta = \hat\theta(Y_1, \dots, Y_n)$ be any estimator. Then
$$\sup_{Q\in\mathcal{Q}} \mathbb{E}_{Q^n}[\rho(\hat\theta, \theta(Q))] \ge \rho(\theta(Q_0), \theta(Q_1))\,\|Q_0^n \wedge Q_1^n\| \ge \rho(\theta(Q_0), \theta(Q_1))\,\frac{1}{8}\big(1 - \mathrm{TV}(Q_0, Q_1)\big)^{2n}.$$

2. Assumptions. We shall be concerned with $d$-dimensional Riemannian submanifolds of $\mathbb{R}^D$, where $d < D$. Usually, we assume that $M$ is contained in some compact set $K \subset \mathbb{R}^D$. An exception is Section 5, where we allow noncompact manifolds.

Let $\Delta(M)$ be the largest $r$ such that each point in $M \oplus r$ has a unique projection onto $M$. The quantity $\Delta(M)$ will be small if either $M$ is not smooth or $M$ is close to being self-intersecting. The quantity $\Delta(M)$ has been rediscovered many times. It is called the condition number in Niyogi, Smale and Weinberger (2008) and the reach in Federer (1959). Let $\mathcal{M}(\kappa)$ denote all $d$-dimensional manifolds embedded in $\mathbb{R}^D$ such that $\Delta(M) \ge \kappa$. Throughout this paper, $\kappa$ is a fixed positive constant.

We consider three different distributional models:

(1) Noiseless. We observe $Y_1, \dots$
, $Y_n \sim G$, where $G$ is supported on a manifold $M$ with $M \in \mathcal{M} = \{M \in \mathcal{M}(\kappa) : M \subset K\}$. In this case, $Q = G$ and the observed data fall exactly on the manifold. We assume that $G$ has density $g$ with respect to the uniform distribution on $M$ and that
$$0 < b(\mathcal{M}) \le \inf_{y\in M} g(y) \le \sup_{y\in M} g(y) \le B(\mathcal{M}) < \infty, \tag{11}$$
where $b(\mathcal{M})$ and $B(\mathcal{M})$ are allowed to depend on the class $\mathcal{M}$, but not on the particular manifold $M$. Let $\mathcal{G}(M)$ denote all such distributions. In this case we define
$$\mathcal{Q} = \mathcal{G} = \bigcup_{M\in\mathcal{M}}\mathcal{G}(M). \tag{12}$$

(2) Clutter noise. Define $\mathcal{M}$ and $\mathcal{G}(M)$ as in the noiseless case. We observe
$$Y_1, \dots, Y_n \sim Q \equiv (1 - \pi)U + \pi G, \tag{13}$$
where $0 < \pi \le 1$, $U$ is uniform on the compact set $K \subset \mathbb{R}^D$ and $G \in \mathcal{G}(M)$. Define
$$\mathcal{Q} = \{Q = (1 - \pi)U + \pi G : G \in \mathcal{G}(M),\ M \in \mathcal{M}\}. \tag{14}$$

(3) Additive noise. In this case we allow the manifolds to be noncompact. However, we do require that each $G$ put nontrivial probability in some fixed compact set. Specifically, we again fix a compact set $K$. Let $\mathcal{M} = \mathcal{M}(\kappa)$. Fix positive constants $0 < b(\mathcal{M}) < B(\mathcal{M}) < \infty$. For any $M \in \mathcal{M}$, let $\mathcal{G}(M)$ be the set of distributions $G$ supported on $M$, such that $G$ has density $g$ with respect to Hausdorff measure on $M$, and such that
$$0 < b(\mathcal{M}) \le \inf_{y\in M\cap K} g(y) \le \sup_{y\in M\cap K} g(y) \le B(\mathcal{M}) < \infty. \tag{15}$$
Let $X_1, X_2, \dots, X_n \sim G \in \mathcal{G}(M)$, and define
$$Y_i = X_i + Z_i, \qquad i = 1, \dots, n, \tag{16}$$
where the $Z_i$ are i.i.d. draws from a distribution $\Phi$ on $\mathbb{R}^D$, and where $\Phi$ is a standard $D$-dimensional Gaussian. Let $Q = G \star \Phi$ be the distribution of each $Y_i$ and $Q^n$ be the corresponding product measure. Let $\mathcal{Q} = \{G \star \Phi : G \in \mathcal{G}(M),\ M \in \mathcal{M}\}$.

These three models are an attempt to capture the idea that we have data falling on or near a manifold. They appear to be the most commonly used models. No doubt one could create other models as well, which is a topic for future research.
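The three models can be illustrated concretely by taking $M$ to be the unit circle in $\mathbb{R}^2$ (so $d = 1 < D = 2$) and $G$ uniform on $M$. The minimal simulation sketch below is our own illustration: the choice $K = [-2, 2]^2$, the noise scale `sigma` (the paper takes $\Phi$ to be a standard Gaussian) and all variable names are assumptions made here, not part of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, D = 500, 2

def sample_circle(m):
    # G: the uniform distribution on the unit circle M in R^2 (d = 1 < D = 2)
    theta = rng.uniform(0, 2 * np.pi, m)
    return np.column_stack([np.cos(theta), np.sin(theta)])

# (1) Noiseless: Y_i ~ G, observations fall exactly on M
Y_noiseless = sample_circle(n)

# (2) Clutter: Y_i ~ (1 - pi) U + pi G, with U uniform on K = [-2, 2]^2
pi = 0.7
from_manifold = rng.random(n) < pi
Y_clutter = np.where(from_manifold[:, None],
                     sample_circle(n),
                     rng.uniform(-2, 2, (n, D)))

# (3) Additive: Y_i = X_i + Z_i with Z_i Gaussian noise in R^D
sigma = 0.1
Y_additive = sample_circle(n) + sigma * rng.normal(size=(n, D))

# Under the noiseless model every observation is at distance exactly 1 from the origin
assert np.allclose(np.linalg.norm(Y_noiseless, axis=1), 1.0)
```

In the clutter sample, roughly a fraction $\pi$ of the points lie on $M$ and the rest are uniform background; in the additive sample, no point lies exactly on $M$, which is what makes the estimation problem a deconvolution problem.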
As we mentioned earlier, a different noise model is considered in Niyogi, Smale and Weinberger (2011) and in Genovese et al. (2010). Those authors consider the case where the noise is perpendicular to the manifold. The former paper considers estimating the homology groups of $M$, while the latter paper shows that the minimax Hausdorff rate is $n^{-2/(2+d)}$ in that case.

3. Noiseless case. We now derive the minimax bounds in the noiseless case.

Theorem 2. Under the noiseless model, we have
$$\inf_{\hat M}\sup_{Q\in\mathcal{Q}}\mathbb{E}_{Q^n}[H(\hat M, M)] \ge C n^{-2/d}. \tag{17}$$

Proof. Fix $\gamma > 0$. By Theorem 6 of Genovese et al. (2010) there exist manifolds $M_0, M_1$ that satisfy the following conditions:

(1) $M_0, M_1 \in \mathcal{M}$.
(2) $H(M_0, M_1) = \gamma$.
(3) There is a set $B \subset M_1$ such that:
  (a) $\inf_{y\in M_0}\|x - y\| > \gamma/2$ for all $x \in B$;
  (b) $\mu_1(B) \ge \gamma^{d/2}$, where $\mu_1$ is the uniform measure on $M_1$;
  (c) there is a point $x \in B$ such that $\|x - y\| = \gamma$, where $y \in M_0$ is the closest point on $M_0$ to $x$; moreover, $T_x M_1$ and $T_y M_0$ are parallel, where $T_x M$ is the tangent plane to $M$ at $x$.
(4) If $A = \{y : y \in M_1,\ y \notin M_0\}$, then $\mu_1(A) \le C\gamma^{d/2}$ for some $C > 0$.

Let $Q_i = G_i$ be the uniform measure on $M_i$, for $i = 0, 1$, and let $A$ be the set defined in the last item. Then
$$\mathrm{TV}(G_0, G_1) = G_1(A) - G_0(A) = G_1(A) \le C\gamma^{d/2}.$$
From Le Cam's lemma,
$$\sup_{Q\in\mathcal{Q}}\mathbb{E}_{Q^n} H(\hat M, M) \ge \gamma(1 - \gamma^{d/2})^{2n}. \tag{18}$$
Setting $\gamma = (1/n)^{2/d}$ yields the stated lower bound. □

See Figure 1 for a heuristic explanation of the construction of the two manifolds, $M_0$ and $M_1$, used in the above proof. Now we derive an upper bound.

Fig. 1. The proof of Theorem 2 uses two manifolds, $M_0$ and $M_1$. A sphere of radius $\kappa$ is pushed upward into the plane $M_0$ (top left). The resulting manifold $M_0'$ is not smooth (top right).
A sphere is then rolled around the manifold (bottom left) to produce a smooth manifold $M_1$ (bottom right). The construction is made rigorous in Theorem 6 of Genovese et al. (2010).

Theorem 3. Under the noiseless model, we have
$$\inf_{\hat M}\sup_{Q\in\mathcal{Q}}\mathbb{E}_{Q^n}[H(\hat M, M)] \le C\Big(\frac{\log n}{n}\Big)^{2/d}. \tag{19}$$

Hence, the rate is tight, up to logarithmic factors. The proof is a special case of the proof of the upper bound in the next section and so is omitted.

Remark. The Associate Editor pointed out that the rate $(1/n)^{2/d}$ might seem counterintuitive. For example, when $d = 1$, this yields $(1/n)^2$, which would seem to contradict the usual $1/n$ rate for estimating the support of a uniform distribution. However, the slower $1/n$ rate is actually a boundary effect, much like the boundary effects that occur in density estimation and regression. If we embed the uniform into $\mathbb{R}^2$ and wrap it into a circle to eliminate the boundary, we do indeed get a rate of $1/n^2$. Our assumption of smooth manifolds without boundary removes the boundary effect.

4. Clutter noise. Recall that $Y_1, \dots, Y_n \sim Q = (1 - \pi)U + \pi G$, where $U$ is uniform on $K$, $0 < \pi \le 1$ and $G \in \mathcal{G}$.

Fig. 2. Given a manifold $M$ and a point $y \in M$, $S_M(y)$ is a slab, centered at $y$, with size $O(\sqrt{\varepsilon_n})$ in the $d$ directions corresponding to the tangent space $T_y M$ and size $O(\varepsilon_n)$ in the $D - d$ normal directions.

Theorem 4. Under the clutter model, we have
$$\inf_{\hat M}\sup_{Q\in\mathcal{Q}}\mathbb{E}_{Q^n}[H(\hat M, M)] \ge C\Big(\frac{1}{n\pi}\Big)^{2/d}. \tag{20}$$

Proof. We define $M_0$, $M_1$ and $A$ as in the proof of Theorem 2. Let $Q_0 = (1 - \pi)U + \pi G_0$ and $Q_1 = (1 - \pi)U + \pi G_1$. Then $\mathrm{TV}(Q_0, Q_1) = \pi\,\mathrm{TV}(G_0, G_1)$. Hence
$$\mathrm{TV}(Q_0, Q_1) \le \pi(G_1(A) - G_0(A)) = \pi G_1(A) \le C\pi\gamma^{d/2}.$$
From Le Cam's lemma,
$$\sup_{Q\in\mathcal{Q}}\mathbb{E}_{Q^n}[H(\hat M, M)] \ge \gamma(1 - \pi\gamma^{d/2})^{2n}.$$
(21)

Setting $\gamma = (1/(n\pi))^{2/d}$ yields the stated lower bound. □

Now we consider the upper bound. Let $\hat Q_n$ be the empirical measure. Let $\varepsilon_n = (K\log n/n)^{2/d}$, where $K > 0$ is a large positive constant. Given a manifold $M$ and a point $y \in M$, let $S_M(y)$ denote the slab, centered at $y$, with size $b_1\sqrt{\varepsilon_n}$ in the $d$ directions corresponding to the tangent space $T_y M$ and size $b_2\varepsilon_n$ in the $D - d$ normal directions to the tangent space. Here, $b_1$ and $b_2$ are small, positive constants. See Figure 2. Define
$$s(M) = \inf_{y\in M}\hat Q_n[S_M(y)] \qquad \text{and} \qquad \hat M_n = \arg\max_M s(M).$$
In case of ties we take any maximizer.

Theorem 5. Let $\xi > 1$ and let $\varepsilon_n = (K\log n/n)^{2/d}$, where $K$ is a large, positive constant. Then
$$\sup_{Q\in\mathcal{Q}} Q^n\big(H(M_0, \hat M_n) > \varepsilon_n\big) < n^{-\xi}$$
and hence
$$\sup_{Q\in\mathcal{Q}}\mathbb{E}_{Q^n}\big(H(M_0, \hat M_n)\big) \le C\varepsilon_n.$$

We will use the following result, which follows from Theorem 7 of Bousquet, Boucheron and Lugosi (2004). This version of the result is from Chaudhuri and Dasgupta (2010).

Lemma 6. Let $\mathcal{A}$ be a class of sets with VC dimension $V$. Let $0 < u < 1$ and
$$\beta_n = \sqrt{\frac{4}{n}\Big(V\log(2n) + \log\frac{8}{u}\Big)}.$$
Then for all $A \in \mathcal{A}$,
$$-\min\Big\{\beta_n\sqrt{\hat Q_n(A)},\ \beta_n^2 + \beta_n\sqrt{Q(A)}\Big\} \le Q(A) - \hat Q_n(A) \le \min\Big\{\beta_n^2 + \beta_n\sqrt{\hat Q_n(A)},\ \beta_n\sqrt{Q(A)}\Big\}$$
with probability at least $1 - u$.

The set of hyper-rectangles in $\mathbb{R}^D$ (which contains all the slabs) has finite VC dimension $V$, say. Hence, we have the following lemma, obtained by setting $u = (1/n)^\xi$.

Lemma 7. Let $\mathcal{A}$ denote all hyper-rectangles in $\mathbb{R}^D$. Let $C = 4[V + \max\{3, \xi\}]$. Then for all $A \in \mathcal{A}$,
$$\hat Q_n(A) \le Q(A) + \frac{C\log n}{n} + \sqrt{\frac{C\log n}{n}Q(A)} \tag{22}$$
and
$$\hat Q_n(A) \ge Q(A) - \sqrt{\frac{C\log n}{n}Q(A)} \tag{23}$$
with probability at least $1 - (1/n)^\xi$.

Now we can prove Theorem 5.

Proof of Theorem 5. Let $M_0$ denote the true manifold. Assume that (22) and (23) hold.
Let $y \in M_0$ and let $A = S_{M_0}(y)$. Note that $Q(A) = (1 - \pi)U(A) + \pi G(A)$. Since $y \in M_0$ and $G$ is singular, the term $U(A)$ is of lower order, and so there exist $0 < c_1 \le c_2 < \infty$ such that, for all large $n$,
$$c_1\frac{K\log n}{n} = c_1\varepsilon_n^{d/2} \le Q(A) \le c_2\varepsilon_n^{d/2} = c_2\frac{K\log n}{n}.$$
Hence
$$\hat Q_n(A) \ge Q(A) - \sqrt{\frac{C\log n}{n}Q(A)} \ge c_1\frac{K\log n}{n} - \sqrt{c_2' K}\,\frac{\log n}{n} > c_3\frac{K\log n}{n}.$$
Thus $s(M_0) > c_3 K\log n/n$ with high probability.

Now consider any $M$ for which $H(M_0, M) > \varepsilon_n$. There exists a point $y \in M$ such that $d(y, M_0) > \varepsilon_n$. It can be seen, since $M \in \mathcal{M}$, that $S_M(y) \cap M_0 = \varnothing$. [To see this, note that $\Delta(M) \ge \kappa > 0$ implies that the interior of any ball of radius $\kappa$ tangent to $M$ at $y$ has empty intersection with $M$, and the slab $S_M(y)$ is strictly contained in such a ball for $b_1$ and $b_2$ small enough relative to $\kappa$.] Hence
$$Q(S_M(y)) = (1 - \pi)U(S_M(y)) = c_4\varepsilon_n^{d/2}\varepsilon_n^{D-d} = \Big(\frac{K\log n}{n}\Big)c_4\Big(\frac{K\log n}{n}\Big)^{2(D-d)/d} = C\Big(\frac{\log n}{n}\Big)^{(2D-d)/d}.$$
So, from the previous lemma,
$$s(M) = \inf_{x\in M}\hat Q_n(S_M(x)) \le \hat Q_n(S_M(y)) \le Q(S_M(y)) + \frac{C\log n}{n} + \sqrt{\frac{C\log n}{n}Q(S_M(y))}$$
$$= \Big(\frac{K\log n}{n}\Big)^{(2D-d)/d} + \frac{C\log n}{n} + \Big(\frac{K\log n}{n}\Big)^{D/d} < c_3\frac{K\log n}{n} < s(M_0),$$
since $D > d$ and $K$ is large. Let $\mathcal{M}_n = \{M \in \mathcal{M} : H(M_0, M) > \varepsilon_n\}$. We conclude that
$$Q^n\big(s(M) > s(M_0) \text{ for some } M \in \mathcal{M}_n\big) < \Big(\frac{1}{n}\Big)^\xi. \qquad \square$$

5. Additive noise. Let us recall the model. Let $\mathcal{M} = \mathcal{M}(\kappa)$. We allow the manifolds to be noncompact. Fix positive constants $0 < b(\mathcal{M}) < B(\mathcal{M}) < \infty$. For any $M \in \mathcal{M}$, let $\mathcal{G}(M)$ be the set of distributions $G$ supported on $M$ such that $G$ has density $g$ with respect to Hausdorff measure on $M$ and such that
$$0 < b(\mathcal{M}) \le \inf_{y\in M\cap K} g(y) \le \sup_{y\in M\cap K} g(y) \le B(\mathcal{M}) < \infty, \tag{24}$$
where $K$ is a compact set. Let $X_1, X_2, \dots$
, $X_n \sim G \in \mathcal{G}(M)$, and define
$$Y_i = X_i + Z_i, \qquad i = 1, \dots, n, \tag{25}$$
where the $Z_i$ are i.i.d. draws from a distribution $\Phi$ on $\mathbb{R}^D$, and where $\Phi$ is a standard $D$-dimensional Gaussian. Let $Q = G \star \Phi$ be the distribution of each $Y_i$ and $Q^n$ be the corresponding product measure. Let $\mathcal{Q} = \{G \star \Phi : G \in \mathcal{G}(M),\ M \in \mathcal{M}\}$.

Since we allow the manifolds to be noncompact, the Hausdorff distance could be unbounded. Hence we define a truncated loss function,
$$L(M, \hat M) = H(M\cap K, \hat M\cap K). \tag{26}$$

Theorem 8. For all large enough $n$,
$$\inf_{\hat M}\sup_{Q\in\mathcal{Q}}\mathbb{E}_Q[L(M, \hat M)] \ge \frac{C}{\log n}. \tag{27}$$

Fig. 3. The two least favorable manifolds $M_0$ and $M_1$ in the proof of Theorem 8 in the special case where $D = 2$ and $d = 1$.

Proof. Define $\tilde c : \mathbb{R}\to\mathbb{R}$ and $c : \mathbb{R}^d\to\mathbb{R}^{D-d}$ as follows: $\tilde c(x) = \cos(x/(a\sqrt{\gamma}))$ and $c(u) = \big(\prod_{\ell=1}^d \tilde c(u_\ell), 0, \dots, 0\big)^T$. Let $M_0 = \{(u, \gamma c(u)) : u \in \mathbb{R}^d\}$ and $M_1 = \{(u, -\gamma c(u)) : u \in \mathbb{R}^d\}$. See Figure 3 for a picture of $M_0$ and $M_1$ when $D = 2$, $d = 1$. Later, we will show that $M_0, M_1 \in \mathcal{M}$.

Let $U$ be a $d$-dimensional random variable with density $\zeta$, where $\zeta$ is the $d$-dimensional standard Gaussian density. Let $\tilde\zeta$ be a one-dimensional $N(0, 1)$ density. Define $G_0$ and $G_1$ by $G_0(A) = \mathbb{P}((U, \gamma c(U)) \in A)$ and $G_1(A) = \mathbb{P}((U, -\gamma c(U)) \in A)$.

We begin by bounding $\int|q_1 - q_0|^2$. Define the $D$-cube $Z = [-1/(2a\sqrt{\gamma}),\ 1/(2a\sqrt{\gamma})]^D$. Then, by Parseval's identity, and the fact that $q_j^* = \phi^* g_j^*$,
$$(2\pi)^D\int|q_1 - q_0|^2 = \int|q_1^* - q_0^*|^2 = \int|\phi^*|^2|g_1^* - g_0^*|^2 = \int_Z|\phi^*|^2|g_1^* - g_0^*|^2 + \int_{Z^c}|\phi^*|^2|g_1^* - g_0^*|^2 \equiv I + II.$$
Then
$$II = \int_{Z^c}|g_1^*(t) - g_0^*(t)|^2|\phi^*(t)|^2 \le \int_{Z^c}|\phi^*(t)|^2 \le C\Big(\int_{1/(2a\sqrt{\gamma})}^\infty e^{-t^2}\,dt\Big)^D \le \mathrm{poly}(\gamma)\,e^{-D/(4a^2\gamma)}.$$

Now we bound $I$. Write $t \in \mathbb{R}^D$ as $(t_1, t_2)$, where $t_1 = (t_{11}, \dots$
, $t_{1d}) \in \mathbb{R}^d$ and $t_2 = (t_{21}, \dots, t_{2(D-d)}) \in \mathbb{R}^{D-d}$. Let $c_1(u) = \prod_{\ell=1}^d \tilde c(u_\ell)$ denote the first component of the vector-valued function $c$. We have
$$g_1^*(t) - g_0^*(t) = \int_{\mathbb{R}^d}\big(e^{it_1\cdot u + it_{21}\gamma c_1(u)} - e^{it_1\cdot u - it_{21}\gamma c_1(u)}\big)\zeta(u)\,du = 2i\int e^{it_1\cdot u}\sin(t_{21}\gamma c_1(u))\zeta(u)\,du$$
$$= 2i\int e^{it_1\cdot u}\sum_{k=0}^\infty\frac{(-1)^k t_{21}^{2k+1}\gamma^{2k+1}}{(2k+1)!}c_1^{2k+1}(u)\zeta(u)\,du = 2i\sum_{k=0}^\infty\frac{(-1)^k t_{21}^{2k+1}\gamma^{2k+1}}{(2k+1)!}\int e^{it_1\cdot u}c_1^{2k+1}(u)\zeta(u)\,du$$
$$= 2i\sum_{k=0}^\infty\frac{(-1)^k t_{21}^{2k+1}\gamma^{2k+1}}{(2k+1)!}\prod_{\ell=1}^d\int e^{it_{1\ell}u_\ell}\,\tilde c^{\,2k+1}(u_\ell)\tilde\zeta(u_\ell)\,du_\ell = 2i\sum_{k=0}^\infty\frac{(-1)^k t_{21}^{2k+1}\gamma^{2k+1}}{(2k+1)!}\prod_{\ell=1}^d(\tilde c^{\,2k+1}\tilde\zeta)^*(t_{1\ell})$$
$$= 2i\sum_{k=0}^\infty\frac{(-1)^k t_{21}^{2k+1}\gamma^{2k+1}}{(2k+1)!}\prod_{\ell=1}^d m_k(t_{1\ell}),$$
where
$$m_k(t_{1\ell}) = (\tilde c^{\,2k+1}\tilde\zeta)^*(t_{1\ell}) = (\underbrace{\tilde c^*\star\tilde c^*\star\cdots\star\tilde c^*}_{2k+1\text{ times}}\star\ \tilde\zeta^*)(t_{1\ell}). \tag{28}$$
Note that
$$\tilde c^* = \tfrac{1}{2}\delta_{-1/(a\sqrt{\gamma})} + \tfrac{1}{2}\delta_{1/(a\sqrt{\gamma})},$$
where $\delta_y$ is a Dirac delta function at $y$, that is, a generalized function corresponding to point evaluation at $y$. For any integer $r$, if we convolve $\tilde c^*$ with itself $r$ times, we have that
$$\underbrace{\tilde c^*\star\tilde c^*\star\cdots\star\tilde c^*}_{r\text{ times}} = \Big(\frac{1}{2}\Big)^r\sum_{j=0}^r\binom{r}{j}\delta_{a_j}, \tag{29}$$
where $a_j = (2j - r)/(a\sqrt{\gamma})$. Thus
$$m_k(t_{1\ell}) = \Big(\frac{1}{2}\Big)^{2k+1}\sum_{j=0}^{2k+1}\binom{2k+1}{j}\tilde\zeta^*(t_{1\ell} - a_j). \tag{30}$$
Now $\tilde\zeta^*(t_{1\ell}) = \exp(-t_{1\ell}^2/2)$ and $\tilde\zeta^*(s) \le 1$ for all $s \in \mathbb{R}$. For $t \in Z$, $\tilde\zeta^*(t_{1\ell} - a_j) \le e^{-1/(2a^2\gamma)}$, and thus $|m_k(t_{1\ell})| \le e^{-1/(2a^2\gamma)}$. Hence, $\prod_{\ell=1}^d|m_k(t_{1\ell})| \le e^{-d/(2a^2\gamma)}$. It follows that for $t \in Z$,
$$|g_1^*(t) - g_0^*(t)| \le 2\sum_{k=0}^\infty\frac{|t_{21}|^{2k+1}\gamma^{2k+1}}{(2k+1)!}\prod_{\ell=1}^d|m_k(t_{1\ell})| \le 2e^{-d/(2a^2\gamma)}\sum_{k=0}^\infty\frac{|t_{21}|^{2k+1}\gamma^{2k+1}}{(2k+1)!}$$
$$= 2e^{-d/(2a^2\gamma)}\sinh(|t_{21}|\gamma) \le Ce^{-d/(2a^2\gamma)},$$
since $\sinh(|t_{21}|\gamma)$ is bounded for $t \in Z$ and small $\gamma$. So,
$$I = \int_Z|g_1^*(t) - g_0^*(t)|^2|\phi^*(t)|^2\,dt \le \int_Z|g_1^*(t) - g_0^*(t)|^2\,dt \le \mathrm{Volume}(Z)\,Ce^{-d/(a^2\gamma)} = \mathrm{poly}(\gamma)\,e^{-d/(a^2\gamma)}.$$
Hence,
$$\int|q_1 - q_0|^2 \le I + II \le \mathrm{poly}(\gamma)e^{-d/(a^2\gamma)} + \mathrm{poly}(\gamma)e^{-D/(4a^2\gamma)} = \mathrm{poly}(\gamma)e^{-2w/\gamma}, \tag{31}$$
where $2w = \min\{d/a^2,\ D/(4a^2)\}$.

Next we bound $\int|q_1 - q_0|$ so that we can apply Le Cam's lemma. Let $T_\gamma$ be a ball centered at the origin with radius $1/\gamma$. Then, by Cauchy–Schwarz,
$$\int|q_1 - q_0| = \int_{T_\gamma}|q_1 - q_0| + \int_{T_\gamma^c}|q_1 - q_0| \le \sqrt{\mathrm{Volume}(T_\gamma)}\sqrt{\int|q_1 - q_0|^2} + \int_{T_\gamma^c}|q_1 - q_0| \le \mathrm{poly}(\gamma)e^{-w/\gamma} + \int_{T_\gamma^c}|q_1 - q_0|.$$
For all small $\gamma$ we have that $K \subset T_\gamma$. Hence,
$$\int_{T_\gamma^c}|q_1 - q_0| \le \int_{M_1}\int_{T_\gamma^c}\phi(\|y - u\|) + \int_{M_0}\int_{T_\gamma^c}\phi(\|y - u\|) \le \mathrm{poly}(\gamma)e^{-D/\gamma^2} \le \mathrm{poly}(\gamma)e^{-w/\gamma}.$$
Putting this all together, we have that $\int|q_1 - q_0| \le \mathrm{poly}(\gamma)e^{-w/\gamma}$. Now we apply Lemma 1 and conclude that, for every $\gamma > 0$,
$$\sup_Q\mathbb{E}(L(M, \hat M)) \ge \frac{\gamma}{8}\big(1 - \mathrm{poly}(\gamma)e^{-w/\gamma}\big)^{2n}.$$
Set $\gamma \asymp w/\log n$ and conclude that, for all large $n$,
$$\sup_Q\mathbb{E}(L(M, \hat M)) \ge \frac{w}{8e^2}\,\frac{1}{\log n}.$$
This concludes the proof of the lower bound, except that it remains to show that $M_0, M_1 \in \mathcal{M}(\kappa)$. Note that the second derivative of $u \mapsto \gamma\tilde c(u)$ satisfies $|\gamma\tilde c''(u)| = a^{-2}|\cos(u/(a\sqrt{\gamma}))| \le a^{-2}$. Hence, as long as $a > \sqrt{\kappa}$, $\sup_u|\gamma\tilde c''(u)| < 1/\kappa$. It now follows that $M_0, M_1 \in \mathcal{M}(\kappa)$. This completes the proof. □

Remark. Consider the special case where $D = 2$, $d = 1$ and the manifold has the special form $\{(u, m(u)) : u \in \mathbb{R}\}$ for some function $m : \mathbb{R}\to\mathbb{R}$. In this case, estimating the manifold is like estimating a regression function with errors in variables. (More on this in Section 6.)
The rate obtained for estimating a regression function with errors in variables under these conditions [Fan and Truong (1993)] is $1/\log n$, in agreement with our rate. However, the proof technique is not quite the same, as we explain in Section 6.

Remark. The proof of the lower bound is similar to other lower bounds in deconvolution problems. There is an interesting technical difference, however. In standard deconvolution, we can choose $G_0$ and $G_1$ so that $g_1^*(t) - g_0^*(t)$ is zero in a large neighborhood around the origin. This simplifies the proof considerably. It appears we cannot do this in the manifold case, since $G_0$ and $G_1$ have different supports.

Next we construct an upper bound. We use a standard deconvolution density estimator $\hat g$ (even though $G$ has no density), and then we threshold this estimator.

Theorem 9. Fix any $0 < \delta < 1/2$. Let $h = 1/\sqrt{\log n}$. Let $\lambda_n$ be such that
$$C''\Big(\frac{1}{L}\Big)^{2k}\Big(\frac{1}{h}\Big)^{D-d} < \lambda_n < C'\Big(\frac{1}{h}\Big)^{D-d},$$
where $k \ge d/(2\delta)$, $C'$ is defined in Lemma 11, and $C''$ and $L$ are defined in Lemma 12. Define $\hat M = \{y : \hat g(y) > \lambda_n\}$, where $\hat g$ is defined in (34). Then for all large $n$,
$$\inf_{\hat M}\sup_{Q\in\mathcal{Q}}\mathbb{E}_Q[L(M, \hat M)] \le C\Big(\frac{1}{\log n}\Big)^{(1-\delta)/2}. \tag{32}$$

Let us now define the estimator in more detail. Define $\psi_k(y) = \mathrm{sinc}^{2k}(y/(2k))$. By elementary calculations, it follows that
$$\psi_k^*(t) = 2k\,B_{2k}(2kt), \qquad \text{where } B_r = \underbrace{J\star\cdots\star J}_{r\text{ times}} \text{ and } J = \tfrac{1}{2}I_{[-1,1]}.$$
The following properties of $\psi_k$ and $\psi_k^*$ follow easily:

(1) The support of $\psi_k^*$ is $[-1, 1]$.
(2) $\psi_k \ge 0$ and $\psi_k^* \ge 0$.
(3) $\int\psi_k^*(t)\,dt = \psi_k(0) = 1$.
(4) $\psi_k^*$ and $\psi_k$ are spherically symmetric.
(5) $|\psi_k(y)| \le (2k/|y|)^{2k}$ for all $y \neq 0$.

Abusing notation somewhat, when $u$ is a vector, we take $\psi_k(u) \equiv \psi_k(\|u\|)$.
Define
$$\hat g^*(t) = \frac{\hat q^*(t)}{\phi^*(t)}\psi_k^*(ht), \tag{33}$$
where $\hat q^*(t) = n^{-1}\sum_{i=1}^n e^{it^T Y_i}$ is the empirical characteristic function. Now define
$$\hat g(y) = \Big(\frac{1}{2\pi}\Big)^D\int e^{-it^T y}\,\psi_k^*(ht)\frac{\hat q^*(t)}{\phi^*(t)}\,dt. \tag{34}$$
Let $g(y) = \mathbb{E}(\hat g(y))$.

Lemma 10. For all $y \in \mathbb{R}^D$,
$$g(y) = \Big(\frac{1}{2\pi h}\Big)^D\int\psi_k\Big(\frac{\|y - u\|}{h}\Big)\,dG(u).$$

Proof. Let $\psi_{k,h}(x) = h^{-D}\psi_k(x/h)$, so that $\psi_{k,h}^*(t) = \psi_k^*(th)$. Now,
$$g(y) = \Big(\frac{1}{2\pi}\Big)^D\int e^{-it^T y}\psi_k^*(th)\frac{q^*(t)}{\phi^*(t)}\,dt = \Big(\frac{1}{2\pi}\Big)^D\int e^{-it^T y}\psi_k^*(th)\frac{g^*(t)\phi^*(t)}{\phi^*(t)}\,dt = \Big(\frac{1}{2\pi}\Big)^D\int e^{-it^T y}\psi_k^*(th)g^*(t)\,dt$$
$$= \Big(\frac{1}{2\pi}\Big)^D\int e^{-it^T y}\psi_{k,h}^*(t)g^*(t)\,dt = \Big(\frac{1}{2\pi}\Big)^D\int e^{-it^T y}(g\star\psi_{k,h})^*(t)\,dt = \Big(\frac{1}{2\pi}\Big)^D(g\star\psi_{k,h})(y)$$
$$= \Big(\frac{1}{2\pi}\Big)^D\int\psi_{k,h}(y - u)\,dG(u) = \frac{1}{h^D}\Big(\frac{1}{2\pi}\Big)^D\int\psi_k\Big(\frac{y - u}{h}\Big)\,dG(u). \qquad \square$$

Lemma 11. We have that $\inf_{y\in M\cap K} g(y) \ge C'h^{d-D}$.

Proof. Choose any $x \in M\cap K$ and let $B = B(x, Ch)$. Note that $G(B) \ge b(\mathcal{M})\,ch^d$. Hence, since $\psi_k$ is bounded below by a positive constant on $B(0, C)$,
$$g(x) = (2\pi)^{-D}h^{-D}\int\psi_k\Big(\frac{x - u}{h}\Big)\,dG(u) \ge (2\pi)^{-D}h^{-D}\int_B\psi_k\Big(\frac{x - u}{h}\Big)\,dG(u) \ge c\,(2\pi)^{-D}h^{-D}G(B) = C'h^{d-D}. \qquad \square$$

Lemma 12. Fix $0 < \delta < 1/2$. Suppose that $k \ge d/(2\delta)$. Then,
$$\sup\{g(y) : y \in K,\ d(y, M) > Lh^{1-\delta}\} \le C''L^{-2k}\Big(\frac{1}{h}\Big)^{D-d}. \tag{35}$$

Proof. Let $y$ be such that $d(y, M) > Lh^{1-\delta}$. For integer $j \ge 1$, define
$$A_j = \big[B(y, (j+1)Lh^{1-\delta}) - B(y, jLh^{1-\delta})\big]\cap M\cap K.$$
Then
$$g(y) = \Big(\frac{1}{2\pi h}\Big)^D\int\psi_k\Big(\frac{\|u - y\|}{h}\Big)\,dG(u) \le \Big(\frac{1}{2\pi h}\Big)^D\sum_{j=1}^\infty\int_{A_j}\psi_k\Big(\frac{\|u - y\|}{h}\Big)\,dG(u)$$
$$\le \Big(\frac{1}{2\pi h}\Big)^D\sum_j\int_{A_j}\Big(\frac{2kh}{\|u - y\|}\Big)^{2k}\,dG(u) \le C\Big(\frac{1}{h}\Big)^D\sum_j\int_{A_j}\Big(\frac{h}{jLh^{1-\delta}}\Big)^{2k}\,dG(u)$$
$$\le C\Big(\frac{1}{h}\Big)^D L^{-2k}h^{2k\delta}\sum_j\Big(\frac{1}{j}\Big)^{2k}G(A_j) \overset{(*)}{\le} C\Big(\frac{1}{h}\Big)^D L^{-2k}h^{2k\delta} \overset{(**)}{\le} C\Big(\frac{1}{h}\Big)^D L^{-2k}h^d \le C''L^{-2k}\Big(\frac{1}{h}\Big)^{D-d},$$
where $(*)$ follows because $G$ is a probability measure and $\sum_j j^{-2k} < \infty$, and $(**)$ follows because $2k\delta \ge d$. □

Now define $\Gamma_n = \sup_y|\hat g(y) - g(y)|$.

Lemma 13. Let $h = 1/\sqrt{\log n}$, and let $\xi > 1$. Then, for large $n$,
$$\Gamma_n \le \Big(\frac{1}{\sqrt{\log n}}\Big)^{4k+4-D} \tag{36}$$
on an event $A_n$ of probability at least $1 - n^{-\xi}$.

Proof. We proceed as in Theorem 2.3 of Stefanski (1990). Note that
$$\hat g(y) - g(y) = \Big(\frac{1}{2\pi}\Big)^D\int e^{-it^T y}\frac{\psi_k^*(th)}{\phi^*(t)}\big(\hat q^*(t) - q^*(t)\big)\,dt, \tag{37}$$
and also note that the integrand is $0$ for $\|t\| > 1/h$. So
$$\sup_y|\hat g(y) - g(y)| \le \frac{\Delta_n}{(2\pi)^D}\bigg|\int_{\|t\|\le 1/h}\frac{\psi_k^*(th)}{\phi^*(t)}\,dt\bigg|, \tag{38}$$
where $\Delta_n = \sup_{\|t\| < 1/h}|\hat q^*(t) - q^*(t)|$. For $D = 1$, it follows from Theorem 4.3 of Yukich (1985) that
$$Q^n(\Delta_n > 4\varepsilon) \le 4N(\varepsilon)\exp\Big(-\frac{n\varepsilon^2}{8 + 4\varepsilon/3}\Big) + 8N(\varepsilon)\exp\Big(-\frac{n\varepsilon}{96}\Big), \tag{39}$$
where $N(\varepsilon)$ is the bracketing number of the set of complex exponentials (with frequencies $\|t\| \le T_n$; here $T_n = 1/h$), which is given by
$$N(\varepsilon) = 1 + \frac{24M_\varepsilon T_n}{\varepsilon},$$
and $M_\varepsilon$ is defined by $Q(\|Y\| > M_\varepsilon) \le \varepsilon/4$. By a similar argument, we have that in $D$ dimensions,
$$\sup_{Q\in\mathcal{Q}} Q^n(\Delta_n > 4\varepsilon) \le 4N(\varepsilon)\exp\Big(-\frac{n\varepsilon^2}{8 + 4\varepsilon/3}\Big) + 8N(\varepsilon)\exp\Big(-\frac{n\varepsilon}{96}\Big), \tag{40}$$
where now
$$N(\varepsilon) = C\Big(1 + \frac{24M_\varepsilon T_n}{\varepsilon}\Big)\varepsilon^{-(D-1)}, \tag{41}$$
and $M_\varepsilon$ is defined by $\sup_{Q\in\mathcal{Q}} Q(\|Y\| > M_\varepsilon) \le \varepsilon/4$. Note that $M_\varepsilon = O(1)$. It follows that
$$\Delta_n \le \sqrt{\frac{C\log n}{n}}$$
except on a set of probability $n^{-\xi}$, where $\xi$ can be made arbitrarily large by taking $C$ large.
Now, note that $\psi_k^*(ht)/\phi^*(t)$ is a spherically symmetric function $R(\|t\|)$. Hence,
\[
\int_{\|t\| \le 1/h} \frac{\psi_k^*(ht)}{\phi^*(t)}\,dt = C \int_{s=0}^{1/h} R(s)\,s^{D-1}\,ds \le C\, h^{4k+4-D} e^{1/(2h^2)},
\]
where the last result follows from Lemma 3.1 in Stefanski (1990) using parameters $\delta = 2$, $\gamma = 1/2$, $r = 2k+2$, $\beta = D-1$, with $\lambda = h$. The value of $r$ follows from the definition of $\psi_k^*$. The result now follows by combining this bound with (38). $\square$

Now we can complete the proof of the upper bound.

Proof of Theorem 9. On the event $A_n$ where $\Gamma_n \le (1/\sqrt{\log n})^{4k+4-D}$ (defined in the previous lemma), we have
\[
\inf_{y \in M \cap K} \hat g(y) \ge \inf_{y \in M \cap K} g(y) - \Gamma_n
\ge C\left(\frac{1}{h}\right)^{D-d} - \left(\frac{1}{\sqrt{\log n}}\right)^{4k+4-D}
\ge (C/2)\left(\frac{1}{h}\right)^{D-d} > \lambda_n.
\]
This implies that $M \cap K \subset \hat M \cap K$. Next, we have
\[
\sup_{\substack{y \in K\\ d(y,M) \ge Lh^{1-\delta}}} \hat g(y)
\le \sup_{\substack{y \in K\\ d(y,M) \ge Lh^{1-\delta}}} g(y) + \Gamma_n
\le C L^{-2k}\left(\frac{1}{h}\right)^{D-d} + \left(\frac{1}{\sqrt{\log n}}\right)^{4k+4-D}
\le 2C L^{-2k}\left(\frac{1}{h}\right)^{D-d} < \lambda_n
\]
for large enough $L$. This implies that $\{y : y \in K \text{ and } d(y, M) \ge Lh^{1-\delta}\} \cap \hat M = \varnothing$. Therefore, on $A_n$, $L(M, \hat M) \le C (1/\log n)^{(1-\delta)/2}$ and hence,
\[
E(L(M, \hat M)) = E(L(M, \hat M)1_{A_n}) + E(L(M, \hat M)1_{A_n^c})
\le C\left(\frac{1}{\log n}\right)^{(1-\delta)/2} + Q^n(A_n^c)
\le C\left(\frac{1}{\log n}\right)^{(1-\delta)/2} + n^{-\xi}
\le C\left(\frac{1}{\log n}\right)^{(1-\delta)/2},
\]
and the theorem is proved. $\square$

Remark. Again, the proof of the upper bound is similar to proofs used in other deconvolution problems. But once more, there are interesting differences. In particular, the density estimator $\hat g$ is not estimating any underlying density, since the measure $G$ is singular and thus does not have a density. Hence, the usual bias calculation is meaningless.

Remark.
Note that $\hat M$ is a set, not a manifold; if desired, we can replace $\hat M$ with any manifold in $\{M \in \mathcal{M} : M \subset \hat M\}$, and then the estimator is a manifold and the rate is the same.

Remark. The upper bound is slightly slower than the lower bound. The rate is consistent with the results in Caillerie et al. (2011), who show that $E(W_2(\hat g, G)) \le C/\sqrt{\log n}$, where $W_2$ is the Wasserstein distance. In the special case where the manifold has the form $\{(u, m(u)) : u \in \mathbb{R}\}$ for some function $m$, the problem can be viewed as nonparametric regression with measurement error; see Section 6. In this special case, we can use the deconvolution kernel regression estimator in Fan and Truong (1993), which achieves the rate $1/\log n$. We do not know of any estimator in the general case that achieves the rate $1/\log n$, although we conjecture that the following estimator might have a better rate: let $(\hat M, \hat G)$ minimize $\sup_{\|t\| \le T_n} |\hat q^*(t) - q^*_{M,G}(t)|$, where $T_n = O(\sqrt{\log n})$. In any case, as with all Gaussian deconvolution problems, the rate is very slow, and the difference between $1/\log n$ and $1/\sqrt{\log n}$ is not of practical consequence.

6. Singular deconvolution. Estimating a manifold under additive noise is related to deconvolution. It is also related to regression with errors in variables. The purpose of this section is to explain the connections between the problems.

6.1. Relationship to density deconvolution. Recall that the model is $Y = X + Z$, where $X \sim G$, $G$ is supported on a manifold $M$, and $Z \sim \Phi$; that is, $G$ is a singular measure supported on the $d$-dimensional manifold $M$. Now consider a somewhat simpler model: suppose again that $Y_i = X_i + Z_i$, but suppose that $X$ has a density $g$ on $\mathbb{R}^D$ (instead of being supported on a manifold). All three distributions $Q$, $G$ and $\Phi$ then have $D$-dimensional support and $Q = G \star \Phi$.
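To make the deconvolution estimator concrete, here is a minimal one-dimensional numerical sketch of an estimator of the form (33)-(34): invert the empirical characteristic function divided by the noise characteristic function, damped by a kernel whose Fourier transform vanishes outside $[-1/h, 1/h]$. The kernel choice $\psi^*(s) = (1 - s^2)^3$, the Gaussian noise level, the grid sizes, and the two-atom measure standing in for the singular $G$ are all illustrative assumptions (the paper's $\psi_k$ is a different, specific kernel); the sketch also uses the standard $(2\pi)^{-1}$ Fourier inversion normalization.

```python
import numpy as np

def deconvolution_density(y_obs, y_grid, h, sigma):
    """Sketch of a 1-D deconvolution kernel estimate in the spirit of (33)-(34):
    ghat(y) = (1/2pi) * int e^{-ity} psi*(th) qhat*(t) / phi*(t) dt,
    where qhat* is the empirical characteristic function, phi* is the Gaussian
    noise characteristic function, and psi*(s) = (1 - s^2)^3 (an assumed kernel)
    is supported on [-1, 1], so the integral runs only over |t| <= 1/h."""
    t = np.linspace(-1.0 / h, 1.0 / h, 2001)   # frequency grid for |t| <= 1/h
    dt = t[1] - t[0]
    # empirical characteristic function qhat*(t) = (1/n) sum_i e^{i t Y_i}
    qhat = np.mean(np.exp(1j * np.outer(t, y_obs)), axis=1)
    psi_star = (1.0 - (h * t) ** 2) ** 3       # Fourier kernel evaluated at th
    phi_star = np.exp(-0.5 * sigma ** 2 * t ** 2)  # Gaussian noise char. function
    integrand = psi_star * qhat / phi_star
    # numerical Fourier inversion on the evaluation grid
    est = np.real(np.exp(-1j * np.outer(y_grid, t)) @ integrand) * dt / (2.0 * np.pi)
    return est
```

With noisy draws from a two-atom measure at $\pm 1$, the estimate is large near the support and near zero away from it, which is exactly the separation the thresholding step in the proof of Theorem 9 exploits.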
The problem of recovering the density $g$ of $X$ from $Y_1, \ldots, Y_n$ is the usual density deconvolution problem. A key reference is Fan (1991). Most of the existing literature on deconvolution assumes that $X$ and $Y$ have the same support, or at least that the supports have the same dimension; an exception is Koltchinskii (2000). Manifold learning may be regarded as the problem of deconvolution for singular measures.

It is instructive to compare the least favorable pair used for proving the lower bounds in the ordinary case versus the singular case. Figure 4 shows a typical least favorable pair for proving a lower bound in ordinary deconvolution. The top left plot is a density $g_0$, and the top right plot is a density $g_1$ which is a perturbed version of $g_0$. The $L_1$ distance between the densities is $\varepsilon$. The bottom plots are $q_0 = \int \phi(y-x)g_0(x)\,dx$ and $q_1 = \int \phi(y-x)g_1(x)\,dx$. These densities are nearly indistinguishable and, in fact, their total variation distance is of order $e^{-1/\varepsilon}$. Of course, these distributions have the same support, and hence such a least favorable pair will not suffice for proving lower bounds in the manifold case, where we will need two densities with different support.

Figure 5 shows the type of least favorable pair we used for manifold learning. The top two plots do not show the densities; rather, they show the support of the densities. The distribution $g_0$ is uniform on the circle in the top left plot. The distribution $g_1$ is uniform on the perturbed circle in the top right plot. The Hausdorff distance between the supports of the densities is $\varepsilon$. The bottom plots are $q_0 = \int \phi(y-x)g_0(x)\,dx$ and $q_1 = \int \phi(y-x)g_1(x)\,dx$. Again, these densities are nearly indistinguishable and, in fact, their total variation distance is $e^{-1/\varepsilon}$.
In this case, however, $g_0$ and $g_1$ have different supports.

Fig. 4. A typical least favorable pair for proving lower bounds in ordinary deconvolution. The top left plot is a density $g_0$ and the top right plot is a density $g_1$ which is a perturbed version of $g_0$. The $L_1$ distance between the densities is $\varepsilon$. The bottom plots are $q_0 = \int \phi(y-x)g_0(x)\,dx$ and $q_1 = \int \phi(y-x)g_1(x)\,dx$. These densities are nearly indistinguishable and, in fact, their total variation distance is $e^{-1/\varepsilon}$.

Fig. 5. The type of least favorable pair needed for proving lower bounds in manifold learning. The distribution $g_0$ is uniform on the circle in the top left plot. The distribution $g_1$ is uniform on the perturbed circle in the top right plot. The Hausdorff distance between the supports of the densities is $\varepsilon$. The bottom plots are heat maps of $q_0 = \int \phi(y-x)g_0(x)\,dx$ and $q_1 = \int \phi(y-x)g_1(x)\,dx$. These densities are nearly indistinguishable and, in fact, their total variation distance is $e^{-1/\varepsilon}$.

6.2. Relationship to regression with measurement error. We can also relate the manifold estimation problem to nonparametric regression with measurement error. Suppose that
\[
U_i = X_i + Z_{2i}, \qquad Y_i = m(X_i) + Z_{1i}, \tag{42}
\]
and we want to estimate the regression function $m$. If we observe $(X_1, Y_1), \ldots, (X_n, Y_n)$, then this is a standard nonparametric regression problem. But if we only observe $(U_1, Y_1), \ldots, (U_n, Y_n)$, then this is the usual nonparametric regression with measurement error problem. The rates of convergence are similar to deconvolution. Indeed, Fan and Truong (1993) have an argument that converts nonparametric regression with measurement error into a density deconvolution problem. Let us see how this is related to manifold learning.
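Model (42) is easy to probe by simulation. The sketch below uses an assumed linear regression function $m(x) = \beta x$, chosen only because the effect of covariate noise then has a closed form: the naive least-squares slope fitted to $(U, Y)$ shrinks toward zero by the factor $\mathrm{Var}(X)/(\mathrm{Var}(X) + \mathrm{Var}(Z_2))$. All parameter values are illustrative.

```python
import numpy as np

# Errors-in-variables model (42): U = X + Z2, Y = m(X) + Z1.
# With linear m(x) = beta * x, the least-squares slope computed from the
# noisy pairs (U, Y) converges to beta * var(X) / (var(X) + var(Z2)),
# not to beta: measurement error in the covariate attenuates the fit.

def simulate_slopes(n=200_000, beta=2.0, sd_x=1.0, sd_z2=0.5, sd_z1=0.1, seed=1):
    rng = np.random.default_rng(seed)
    x = sd_x * rng.normal(size=n)           # latent covariate
    u = x + sd_z2 * rng.normal(size=n)      # observed, noisy covariate
    y = beta * x + sd_z1 * rng.normal(size=n)
    slope_oracle = np.polyfit(x, y, 1)[0]   # regression on the unobserved X
    slope_naive = np.polyfit(u, y, 1)[0]    # regression on U: attenuated
    return slope_oracle, slope_naive

oracle, naive = simulate_slopes()
```

The deconvolution kernel regression estimator of Fan and Truong corrects exactly this kind of bias for nonparametric $m$; the linear case merely makes the attenuation visible in two lines of algebra.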
Suppose that $D = 2$ and $d = 1$. Further, suppose that the manifold is function-like, meaning that the manifold is a curve of the form $M = \{(u, m(u)) : u \in \mathbb{R}\}$ for some function $m$. Then each $Y_i$ can be written in the form
\[
Y_i = \begin{pmatrix} Y_{i1} \\ Y_{i2} \end{pmatrix}
= \begin{pmatrix} U_i \\ m(U_i) \end{pmatrix}
+ \begin{pmatrix} Z_{1i} \\ Z_{2i} \end{pmatrix},
\]
which is exactly of the form (42). Let $\mathcal{Q}$ be all such distributions obtained this way with $|m''(u)| \le 1/\kappa$. However, this only holds when the manifold has the function-like form. Moreover, the lower bound argument in Fan and Truong (1993) cannot directly be transferred to the manifold setting, as we now explain.

In our lower bound proof, we defined a least favorable pair $q_0$ and $q_1$ for the distribution of $Y$ as follows. Take $M_0 = \{(u, 0) : u \in \mathbb{R}\}$ and $M_1 = \{(u, m(u)) : u \in \mathbb{R}\}$. [In fact, we used $(u, m(u))$ and $(u, -m(u))$, but the present discussion is clearer if we use $(u, 0)$ and $(u, m(u))$.] Let $Y = (Y_1, Y_2)$. For $M_0$, the distribution $q_0$ for $Y$ is based on
\[
\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}
= \begin{pmatrix} U \\ 0 \end{pmatrix}
+ \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix}.
\]
The density of $(U, Y_2)$ is $f_0(u, y_2) = \zeta(u)\phi(y_2)$, where $\zeta$ is some density for $U$. Then
\[
q_0(y_1, y_2) = f_0 \star \Phi = \int f_0(y_1 - z_1, y_2)\,d\Phi(z_1),
\]
where the convolution symbol, here and in what follows, refers to convolution only over $U + Z_1$. Now let $q_1(y_1, y_2)$ denote the distribution of $Y$ in the model
\[
\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}
= \begin{pmatrix} U \\ m(U) \end{pmatrix}
+ \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix}.
\]
This generates the least favorable pair $q_0, q_1$ used in our proof (restricted to this special case).

The least favorable pair used by Fan and Truong is different in a subtle way. The first distribution $q_0$ is the same. The second, which we will denote $w_1$, is constructed as follows.
Let
\[
w_1(y_1, y_2) = f_1 \star \Phi,
\]
where the convolution is only over $U$ and
\[
f_1(\xi, y_2) = f_0(\xi, y_2) + \gamma H(\xi/\sqrt{\gamma})\,h_0(y_2).
\]
Here the marginal of $f_1$ for $U$ is $f_1(\xi) = \zeta(\xi)$, the corresponding regression function is $m(\xi) = \gamma H(\xi/\sqrt{\gamma})/\zeta(\xi)$, $H$ is a perturbation function such as a cosine, and $h_0$ is chosen so that $\int h_0(y_2)\,dy_2 = 0$ and $\int y_2 h_0(y_2)\,dy_2 = 1$.

Now we show that $w_1(y_1, y_2) \ne q_1(y_1, y_2)$; in fact, $w_1$ is not in $\mathcal{Q}$. Note that
\[
w_1(y_1, y_2) = f_1 \star \Phi
= q_0(y_1, y_2) + \gamma h_0(y_2) \int H\left(\frac{y_1 - z_1}{\sqrt{\gamma}}\right) d\Phi(z_1).
\]
Now,
\[
q_1(y_2 \mid u) = \phi(y_2 - m(u)),
\]
but
\[
f_1(y_2 \mid u) = \frac{f_1(y_2, u)}{f_1(u)} = \phi(y_2) + m(u)h_0(y_2).
\]
These both have mean $m(u)$, but the distributions are different. Indeed, the marginals $w_1(y_2)$ and $q_1(y_2)$ are different. In fact, $w_1(y_2) = q_0(y_2) + c\,h_0(y_2)$ for some $c$. This is not in our class because it is not of the form $\phi(y_2 - m(u))$. Hence, $w_1$ is not in our class $\mathcal{Q}$: it does not correspond to drawing a point on a manifold and adding noise.

The point is that manifold learning reduces to nonparametric regression with errors in variables only in the special case that the manifold is function-like. And even in this case, the proofs of the bounds are somewhat different from the usual proofs.

7. Discussion. The purpose of this paper is to establish minimax bounds on estimating manifolds. The estimators used to prove the upper bounds are theoretical constructions for the purposes of the proofs. They are not practical estimators.

There is a large literature on methodology for estimating manifolds. However, these estimators are not likely to be optimal except under stringent conditions. In current work we are trying to bridge the gap between the theory and the methodology.

Probably the most realistic noise condition is the additive model.
In this case, we are dealing with a singular deconvolution problem. The upper bound used deconvolution techniques. Such methods require that the noise distribution is known (or is at least restricted to some narrow class of distributions). This seems unrealistic in real problems. A more realistic goal is to estimate some proxy manifold $M^*$ that, in some sense, approximates $M$. We are currently working on such techniques.

REFERENCES

Arias-Castro, E., Donoho, D. L., Huo, X. and Tovey, C. A. (2005). Connect the dots: How many random points can a regular curve pass through? Adv. in Appl. Probab. 37 571-603. MR2156550
Baraniuk, R. G. and Wakin, M. B. (2009). Random projections of smooth manifolds. Found. Comput. Math. 9 51-77. MR2472287
Bousquet, O., Boucheron, S. and Lugosi, G. (2004). Introduction to statistical learning theory. Machine Learning 3176 169-207.
Caillerie, C., Chazal, F., Dedecker, J. and Michel, B. (2011). Deconvolution for the Wasserstein metric and geometric inference. Electron. J. Stat. 5 1394-1423. MR2851684
Chaudhuri, K. and Dasgupta, S. (2010). Rates of convergence for the cluster tree. In Advances in Neural Information Processing Systems 23 (J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel and A. Culotta, eds.) 343-351.
Dey, T. K. (2007). Curve and Surface Reconstruction: Algorithms with Mathematical Analysis. Cambridge Monographs on Applied and Computational Mathematics 23. Cambridge Univ. Press, Cambridge. MR2267420
Fan, J. (1991). On the optimal rates of convergence for nonparametric deconvolution problems. Ann. Statist. 19 1257-1272. MR1126324
Fan, J. and Truong, Y. K. (1993). Nonparametric regression with errors in variables. Ann. Statist. 21 1900-1925. MR1245773
Federer, H. (1959). Curvature measures. Trans. Amer. Math. Soc. 93 418-491. MR0110078
Genovese, C. R., Perone-Pacifico, M., Verdinelli, I.
and Wasserman, L. (2010). Minimax manifold estimation. Available at arXiv:1007.0549.
Koltchinskii, V. I. (2000). Empirical geometry of multivariate data: A deconvolution approach. Ann. Statist. 28 591-629. MR1790011
Meister, A. (2006). Estimating the support of multivariate densities under measurement error. J. Multivariate Anal. 97 1702-1717. MR2298884
Niyogi, P., Smale, S. and Weinberger, S. (2008). Finding the homology of submanifolds with high confidence from random samples. Discrete Comput. Geom. 39 419-441. MR2383768
Niyogi, P., Smale, S. and Weinberger, S. (2011). A topological view of unsupervised learning from noisy data. SIAM J. Comput. 40 646-663.
Ozertem, U. and Erdogmus, D. (2011). Locally defined principal curves and surfaces. J. Mach. Learn. Res. 12 1249-1286. MR2804600
Stefanski, L. A. (1990). Rates of convergence of some estimators in a class of deconvolution problems. Statist. Probab. Lett. 9 229-235. MR1045189
Yu, B. (1997). Assouad, Fano, and Le Cam. In Festschrift for Lucien Le Cam 423-435. Springer, New York. MR1462963
Yukich, J. E. (1985). Laws of large numbers for classes of functions. J. Multivariate Anal. 17 245-260. MR0813235

C. R. Genovese, L. Wasserman
Department of Statistics
Carnegie Mellon University
Pittsburgh, Pennsylvania 15213, USA
E-mail: genovese@stat.cmu.edu, larry@stat.cmu.edu

M. Perone-Pacifico
Department of Statistical Sciences
Sapienza University of Rome
Rome, Italy
E-mail: marco.peronepacifico@uniroma1.it

I. Verdinelli
Department of Statistics
Carnegie Mellon University
Pittsburgh, Pennsylvania 15213, USA
and
Department of Statistical Sciences
Sapienza University of Rome
Rome, Italy
E-mail: isabella@stat.cmu.edu
