Dimensionality reduction with subgaussian matrices: a unified theory
Authors: Sjoerd Dirksen
Abstract. We present a theory for Euclidean dimensionality reduction with subgaussian matrices which unifies several restricted isometry property and Johnson-Lindenstrauss type results obtained earlier for specific data sets. In particular, we recover and, in several cases, improve results for sets of sparse and structured sparse vectors, low-rank matrices and tensors, and smooth manifolds. In addition, we establish a new Johnson-Lindenstrauss embedding for data sets taking the form of an infinite union of subspaces of a Hilbert space.

Key words and phrases: random dimensionality reduction, Johnson-Lindenstrauss embeddings, restricted isometry properties, union of subspaces. This research was supported by SFB grant 1060 of the Deutsche Forschungsgemeinschaft (DFG).

1. Introduction

The analysis of high-dimensional data leads to various computational issues which are gathered informally under the term 'curse of dimensionality'. To circumvent such issues, several methods have been proposed to reduce the dimensionality of the data, i.e., to map the data set into a lower-dimensional space while approximately preserving certain relevant properties of the set. This is often possible, as many high-dimensional data sets possess additional structure which ensures that they have a low 'intrinsic dimension' or 'complexity'.

This paper concerns the random dimensionality reduction method, which seeks to embed a data set using a random linear map. Undoubtedly the most famous result in this direction is the classical embedding of Johnson and Lindenstrauss [28]. They showed that if $\Phi$ is the orthogonal projection onto an $m$-dimensional subspace of $\mathbb{R}^n$ chosen uniformly at random, then with high probability $\Phi$ preserves the pairwise distances in a given finite subset $\mathcal{P}$ of $\mathbb{R}^n$ up to a multiplicative (or relative) error $\varepsilon$, provided that $m \geq C\varepsilon^{-2}\log|\mathcal{P}|$, where $|\mathcal{P}|$ is the cardinality of $\mathcal{P}$. Simpler proofs of this result later appeared in [14, 21]. Due to these historic origins, many authors refer to random dimensionality reduction as the 'random projection method'. However, it is well known that one can replace the random projection matrix $\Phi$ by a computationally more attractive subgaussian matrix. These matrices perform equally well, as stated in the following modernized version of the Johnson-Lindenstrauss embedding [26, 37].

Theorem 1.1 (Johnson-Lindenstrauss embedding). Let $\mathcal{P}$ be a set of $|\mathcal{P}|$ points in $\mathbb{R}^n$. Let $\tilde{\Phi}$ be an $m \times n$ matrix with entries $\tilde{\Phi}_{ij}$ which are independent, mean-zero, unit variance and $\sqrt{\alpha}$-subgaussian. Set $\Phi = \frac{1}{\sqrt{m}}\tilde{\Phi}$. There exists an absolute constant $C > 0$ such that for any given $0 < \varepsilon, \eta < 1$ we have

(1) $(1-\varepsilon)\|x-y\|_2^2 \leq \|\Phi(x) - \Phi(y)\|_2^2 \leq (1+\varepsilon)\|x-y\|_2^2$ for all $x, y \in \mathcal{P}$

with probability $1 - \eta$, provided that

(2) $m \geq C\alpha^2\varepsilon^{-2}\max\{\log|\mathcal{P}|, \log(\eta^{-1})\}$.

If $\varepsilon_{\mathcal{P},\Phi}$ denotes the smallest constant such that (1) holds, then we can formulate Theorem 1.1 compactly by saying that $\mathbb{P}(\varepsilon_{\mathcal{P},\Phi} \geq \varepsilon) \leq \eta$ if $m$ satisfies (2). All results in this paper will be phrased in this manner. In general, the dependence on the number of points $|\mathcal{P}|$ in (2) cannot be improved [3], and the dependence on $\varepsilon$ and $\eta$ is already optimal if $|\mathcal{P}| = 1$ [27].
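Theorem 1.1 is easy to test empirically. The following is a minimal numerical sketch (not from the paper; all parameter values are illustrative): it draws a Rademacher matrix, whose entries are mean-zero, unit-variance and subgaussian, and computes the smallest $\varepsilon$ for which (1) holds on a random finite point set.

```python
import numpy as np

# Minimal sketch of Theorem 1.1 (illustrative, not from the paper): embed a
# finite point set P in R^n with Phi = (1/sqrt(m)) * Phi_tilde and report the
# worst-case multiplicative distortion of squared pairwise distances.

rng = np.random.default_rng(0)

n, m, num_points = 1000, 200, 50
P = rng.standard_normal((num_points, n))          # data set P (rows)

Phi_tilde = rng.choice([-1.0, 1.0], size=(m, n))  # Rademacher (subgaussian) entries
Phi = Phi_tilde / np.sqrt(m)                      # normalization from Theorem 1.1

Q = P @ Phi.T                                     # embedded points in R^m

worst = 0.0
for i in range(num_points):
    for j in range(i + 1, num_points):
        d_orig = np.sum((P[i] - P[j]) ** 2)       # ||x - y||_2^2
        d_emb = np.sum((Q[i] - Q[j]) ** 2)        # ||Phi(x) - Phi(y)||_2^2
        worst = max(worst, abs(d_emb / d_orig - 1.0))

print(f"empirical eps_(P,Phi) = {worst:.3f}")     # smallest eps for which (1) holds
```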
Random dimensionality reduction is attractive for several reasons. It is easy to implement and computationally inexpensive in comparison with other dimensionality reduction methods such as principal component analysis; see e.g. [7] for an empirical comparison on image and text data. Moreover, the method is non-adaptive or oblivious to the data set, meaning that the method does not require any prior knowledge of the data set as input. It is therefore particularly suitable as a preprocessing step. Methods incorporating random dimensionality reduction have been proposed for a wide range of tasks, such as approximate nearest-neighbor search [25, 26], learning mixtures of Gaussians [13, 29], clustering [6, 46], manifold learning [24, 52], matched field processing [34, 36] and least squares regression [17, 33]. Various other applications can be found in [51]. Sometimes these methods are coined a 'compressive' version of the original method, e.g. compressive matched field processing [36].

Several of these applications rely on extensions of Theorem 1.1 to an infinite, but structured data set $\mathcal{P}$. In these results the factor $\log|\mathcal{P}|$ in (2) is replaced by a different quantity that represents the intrinsic dimension of the data set $\mathcal{P}$. For instance, if $\mathcal{P}$ is a $K$-dimensional subspace, then one can show that $\mathbb{P}(\varepsilon_{\mathcal{P},\Phi} \geq \varepsilon) \leq \eta$ if $m \geq C\alpha^2\varepsilon^{-2}\max\{K, \log(\eta^{-1})\}$. Such Johnson-Lindenstrauss embeddings for subspaces were introduced in [45] for use in numerical linear algebra. We also mention the embedding results for smooth manifolds [5, 12, 18], which are motivated by manifold learning. In a slightly different vein, some authors have investigated lower bounds on $m$ that guarantee that a subgaussian matrix preserves pairwise distances up to an additive rather than a multiplicative error [2, 26].

In the signal processing literature some closely related results appear in the form of restricted isometry properties. Recall that a map $\Phi : \mathbb{R}^n \to \mathbb{R}^m$ satisfies a restricted isometry property with constant $\delta$ on a set $\mathcal{P}$ if

$(1-\delta)\|x\|_2^2 \leq \|\Phi(x)\|_2^2 \leq (1+\delta)\|x\|_2^2$ for all $x \in \mathcal{P}$.

It is a well-known result from compressed sensing that a subgaussian matrix $\Phi$ satisfies the restricted isometry property on the set of $s$-sparse vectors in $\mathbb{R}^n$ with probability $1-\eta$ if $m \geq C\alpha^2\delta^{-2}\max\{s\log(n/s), \log(\eta^{-1})\}$ [4, 11, 16, 38, 39]. This property implies that with high probability one can recover any $s$-sparse signal in a stable, robust and algorithmically efficient manner from $m \sim s\log(n/s)$ subgaussian measurements; see e.g. [20, Chapter 6] and the references therein. Inspired by these developments, restricted isometry properties of subgaussian matrices have been established for various signal sets with structured sparsity [9, 19, 22]. Another example is the restricted isometry property of subgaussian matrices acting on low-rank matrices. This result is used as a substitute for the restricted isometry property on sparse vectors in the low-rank matrix recovery literature [10, 44].

The purpose of this paper is to give a unified treatment of the aforementioned collection of restricted isometry and Johnson-Lindenstrauss type properties for subgaussian matrices, as well as their 'additive error counterparts'.
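To make the restricted isometry property recalled above concrete, here is a hedged Monte Carlo sketch (not from the paper; parameters are illustrative). Since the restricted isometry constant is a supremum over all $s$-sparse vectors, sampling can only certify a lower bound on it.

```python
import numpy as np

# Hedged sketch (not from the paper): a Monte Carlo *lower bound* on the
# restricted isometry constant delta of Phi on the set of s-sparse unit
# vectors. The true constant is a supremum over the whole set, so sampling
# can only show that delta is at least the value reported here.

rng = np.random.default_rng(1)

n, m, s, trials = 500, 120, 5, 20000
Phi = rng.standard_normal((m, n)) / np.sqrt(m)     # subgaussian (Gaussian) map

delta_lower = 0.0
for _ in range(trials):
    support = rng.choice(n, size=s, replace=False)  # random support of size s
    x = np.zeros(n)
    x[support] = rng.standard_normal(s)
    x /= np.linalg.norm(x)                          # normalize: ||x||_2 = 1
    delta_lower = max(delta_lower, abs(np.sum((Phi @ x) ** 2) - 1.0))

print(f"Monte Carlo lower bound on delta_s: {delta_lower:.3f}")
```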
In Theorem 4.8 we formulate a 'master bound' from which one can deduce these properties for subgaussian maps (as in Definition 4.4) acting on any given data set in a possibly infinite-dimensional Hilbert space. This result is an extension of earlier work in [23, 30, 38]; see the discussion after Theorem 4.8 for details. We give a transparent proof using a new tail bound for suprema of empirical processes from [15]. The main focus of our work is to make Theorem 4.8 an accessible tool for non-specialists, by demonstrating extensively how to apply it to extract results for concrete data structures. As it turns out, in all considered applications we recover the best known results in the literature and often find an improved lower bound on the target dimension $m$. On several occasions we also extend earlier results for Gaussian matrices to general subgaussian matrices. This class contains computationally more efficient matrices than Gaussian matrices; see [1] and Example 4.6. Moreover, the extension to subgaussian matrices is of interest for certain signal processing applications; see [20, Section 1.2] and [42] for examples.

To conclude, we give a brief overview of the considered applications. In Section 5 we consider sets with low covering dimension. This class of sets includes finite unions of subspaces, with sets of sparse and cosparse vectors as particular examples, as well as low-rank matrices and tensors. In Section 6 we consider sets forming an infinite union of subspaces of a Hilbert space. A wide variety of models in signal processing can be expressed in this form. For instance, signals exhibiting structured sparsity, piecewise polynomials, certain finite rate of innovations models and overlapping echoes can be described in this fashion [8, 9, 19, 32]. The main result in this section, Theorem 6.3, establishes a new Johnson-Lindenstrauss type embedding for an infinite union of subspaces of a Hilbert space. The embedding dimension in this result depends on the maximal dimension of the subspaces and the complexity of the index set measured in terms of the Finsler distance, which is related to the largest principal angles between the subspaces. Our result significantly improves upon recent work in this direction [34, 35]; see the discussion after Theorem 6.3. By combining with the main result of [8] we deduce that one can robustly reconstruct a signal in an infinite union of subspaces from a small number of subgaussian measurements using a generalized iterative hard thresholding method; see Remark 6.4. Finally, in Section 7 we deduce three different dimensionality reduction results for smooth submanifolds of $\mathbb{R}^n$. We first deduce a guarantee under which lengths of curves in the manifold are preserved uniformly by a subgaussian map. Further on we give conditions under which pairwise ambient distances are preserved up to an additive error. Finally, we establish Johnson-Lindenstrauss embedding results for smooth manifolds. We first give an improvement of the embedding result for manifolds with low linearization dimension from [2]. In Theorems 7.7 and 7.9 we present embeddings in the spirit of [5, 12, 18]. In particular, we extend the recent result of [18] from Gaussian to subgaussian matrices and achieve optimal scaling in the error parameter $\varepsilon$.
2. Preliminaries and notation

Throughout the paper we use the following terminology. We use $(\Omega, \mathcal{F}, \mathbb{P})$ to denote a probability space and write $\mathbb{E}$ for the expected value. For a real-valued random variable $X$ we define its $\psi_2$ or subgaussian norm by

$\|X\|_{\psi_2} = \inf\{C > 0 : \mathbb{E}\exp(|X|^2/C^2) \leq 2\}.$

If $\|X\|_{\psi_2} < \infty$ then we call $X$ a subgaussian random variable. In particular, any centered Gaussian random variable $g$ with variance $\sigma^2$ is subgaussian and $\|g\|_{\psi_2} \lesssim \sigma$. Also, if $|X|$ is bounded by $K$ then $X$ is subgaussian and $\|X\|_{\psi_2} \lesssim K$. We call a random vector $X : \Omega \to \mathbb{R}^n$ subgaussian if $\sup_{\|x\|_2 \leq 1} \|\langle X, x\rangle\|_{\psi_2} < \infty$. We say that $X$ is isotropic if $\mathbb{E}\langle X, x\rangle^2 = \|x\|_2^2$ for all $x \in \mathbb{R}^n$.

If $T$ is a set, then $d : T \times T \to \mathbb{R}_+$ is called a semi-metric on $T$ if $d(x, y) = d(y, x)$ and $d(x, z) \leq d(x, y) + d(y, z)$ for all $x, y, z \in T$. If $S \subset T$, then we use $\Delta_d(S) = \sup_{s,t \in S} d(s, t)$ to denote its diameter.

We conclude by fixing some notation. We use $\|\cdot\|_2$ to denote the Euclidean norm on $\mathbb{R}^n$ and let $d_2$ denote the associated Euclidean metric. If $H$ is a Hilbert space then $\langle\cdot,\cdot\rangle$ denotes the inner product on $H$, $\|\cdot\|_H$ the induced norm and $d_H$ the induced metric on $H$. If $T \subset H$ we use $\Delta_H(T)$ to denote its diameter. If $A : H_1 \to H_2$ is a bounded linear operator between two Hilbert spaces then $\|A\|$ denotes its operator norm. If $S$ is a set then we let $|S|$ denote its cardinality. Given $0 < \alpha < \infty$ we write $\log^\alpha x := (\log(x))^\alpha$ and $\log_+(x) := \max\{\log(x), 0\}$ for brevity. Finally, we write $A \lesssim B$ if $A \leq CB$ for some universal constant $C > 0$ and write $A \simeq B$ if both $A \lesssim B$ and $A \gtrsim B$ hold.
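As a small numerical illustration of the $\psi_2$-norm just defined, the following sketch (my own, not from the paper) approximates $\|X\|_{\psi_2}$ for a bounded random variable by bisection on $C$, replacing the expectation with a sample mean; boundedness keeps the Monte Carlo average well behaved.

```python
import numpy as np

# Hedged sketch (not from the paper): approximate the psi_2-norm
# ||X||_{psi_2} = inf{ C > 0 : E exp(|X|^2 / C^2) <= 2 } of a *bounded*
# random variable by bisection, with the expectation replaced by a sample mean.

rng = np.random.default_rng(2)
sample = rng.uniform(-1.0, 1.0, size=1_000_000)   # X uniform on [-1, 1]

def mgf_condition(c: float) -> float:
    """Sample estimate of E exp(|X|^2 / c^2)."""
    return float(np.mean(np.exp(sample ** 2 / c ** 2)))

lo, hi = 0.3, 10.0                                # E exp(...) is decreasing in c
for _ in range(60):                               # bisect for the threshold 2
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if mgf_condition(mid) > 2.0 else (lo, mid)

print(f"estimated psi_2 norm of Unif[-1,1]: {hi:.3f}")
```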
3. $\gamma_2$-functional, Gaussian width and entropy

In this section we discuss the $\gamma_2$-functional of a semi-metric space $(T, d)$, which plays a central role in the formulation of Theorem 4.8. Intuitively, one should think of $\gamma_2(T, d)$ as measuring the complexity of $(T, d)$.

Definition 3.1. Let $(T, d)$ be a semi-metric space. A sequence $\mathcal{T} = (T_n)_{n \geq 0}$ of subsets of $T$ is called admissible if $|T_0| = 1$ and $|T_n| \leq 2^{2^n}$. The $\gamma_2$-functional of $(T, d)$ is defined by

$\gamma_2(T, d) = \inf_{(\mathcal{T}, \pi)} \sup_{t \in T} \sum_{n \geq 0} 2^{n/2}\, d(t, \pi_n(t)),$

where the infimum is taken over all admissible sequences $\mathcal{T} = (T_n)_{n \geq 0}$ in $T$ and all sequences $\pi = (\pi_n)_{n \geq 0}$ of maps $\pi_n : T \to T_n$.

In the literature it is common to take $\pi_n(t) := \operatorname{argmin}_{s \in T_n} d(t, s)$ in Definition 3.1, i.e., to define the $\gamma_2$-functional as

$\gamma_2(T, d) = \inf_{\mathcal{T}} \sup_{t \in T} \sum_{n \geq 0} 2^{n/2}\, d(t, T_n),$

where $d(t, T_n) = \inf_{s \in T_n} d(t, s)$. Our slightly relaxed definition will be convenient later on.

Let us recall the role of the $\gamma_2$-functional in the theory of generic chaining. We recall two results from this theory. Suppose that $(X_t)_{t \in T}$ is a real-valued stochastic process which has subgaussian increments with respect to a semi-metric $d$. That is, for all $s, t \in T$,

(3) $\mathbb{P}(|X_t - X_s| \geq u\, d(t, s)) \leq 2\exp(-u^2) \quad (u \geq 0).$

Talagrand's generic chaining method [50] yields

$\mathbb{E}\sup_{t \in T} |X_t| \lesssim \gamma_2(T, d).$

This bound is known to be sharp in the following interesting special case. Suppose that $(G_t)_{t \in T}$ is a centered Gaussian process and let $d_{\mathrm{can}}(s, t) = (\mathbb{E}|G_s - G_t|^2)^{1/2}$ be the induced canonical metric on $T$. Then Talagrand's celebrated majorizing measures theorem [48, 49] states that

(4) $\mathbb{E}\sup_{t \in T} |G_t| \simeq \gamma_2(T, d_{\mathrm{can}}).$

Let $g = (g_1, \ldots, g_n)$ be a vector consisting of independent standard Gaussian variables and for any $x \in \mathbb{R}^n$ define $G_x = \langle g, x\rangle$. Then $(G_x)_{x \in T}$ is a centered Gaussian process for any given subset $T$ of $\mathbb{R}^n$. Note that the canonical metric $d_{\mathrm{can}}$ coincides with the usual Euclidean metric $d_2$ in this case. Hence, (4) translates into

(5) $\gamma_2(T, d_2) \simeq \mathbb{E}\sup_{x \in T} |\langle g, x\rangle|.$

The quantity on the right-hand side is known as the Gaussian width of the set $T$.
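The Gaussian width on the right-hand side of (5) is straightforward to estimate by Monte Carlo for a finite set $T$, which by (5) also gives a proxy for $\gamma_2(T, d_2)$ up to universal constants. The sketch below (not from the paper; the choice of $T$ is illustrative) does this and compares the result with the $\sqrt{\log|T|}$ scaling that the entropy bound recalled below gives for finite sets of diameter $O(1)$.

```python
import numpy as np

# Hedged sketch (not from the paper): Monte Carlo estimate of the Gaussian
# width E sup_{x in T} |<g, x>| of a finite set T of s-sparse unit vectors.

rng = np.random.default_rng(3)

n, s, num_vectors, num_draws = 200, 3, 500, 2000
T = np.zeros((num_vectors, n))
for k in range(num_vectors):                      # random s-sparse unit vectors
    support = rng.choice(n, size=s, replace=False)
    T[k, support] = rng.standard_normal(s)
    T[k] /= np.linalg.norm(T[k])

g = rng.standard_normal((num_draws, n))           # standard Gaussian vectors
width = np.mean(np.max(np.abs(g @ T.T), axis=1))  # E sup_{x in T} |<g, x>|

print(f"Gaussian width estimate: {width:.2f}")
print(f"sqrt(log|T|) benchmark : {np.sqrt(np.log(num_vectors)):.2f}")
```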
The $\gamma_2$-functional can be estimated using covering numbers. For any given $u > 0$ let $N(T, d, u)$ denote the covering number of $T$, i.e., the smallest number of balls of radius $u$ in $(T, d)$ needed to cover $T$. Then $\log N(T, d, u)$ is called the $u$-entropy of $(T, d)$. Let

$I_2(T, d) = \int_0^{\Delta_d(T)} \log^{1/2} N(T, d, u)\, du$

be the associated entropy integral. It is shown in [50, Section 1.2] that

(6) $\gamma_2(T, d) \lesssim I_2(T, d),$

and in particular $\gamma_2(T, d) \lesssim \Delta_d(T)\log^{1/2}|T|$ if $T$ is finite. The reverse estimate of (6) fails [50, Section 2.1]. However, if $T \subset \mathbb{R}^n$ then one can show that $I_2(T, d_2) \lesssim (\log n)\,\gamma_2(T, d_2)$. This worst-case bound is attained by natural objects, such as ellipsoids [50, Section 2.2]. Even though (6) is not sharp, it is a very important tool to estimate the $\gamma_2$-functional in practical situations and we will use it several times below.

For our analysis we use the following tail bound for suprema of empirical processes from [15, Theorem 5.5]. This result improves and extends two earlier results in the same direction of Klartag and Mendelson [30] and Mendelson, Pajor and Tomczak-Jaegermann [38]; see [15] for a detailed comparison.

Theorem 3.2. Fix a probability space $(\Omega_i, \mathcal{F}_i, \mathbb{P}_i)$ for every $1 \leq i \leq m$. For every $t \in T$ and $1 \leq i \leq m$ let $X_{t,i} \in L^2(\Omega_i)$. Define the process of averages

(7) $A_t = \frac{1}{m}\sum_{i=1}^m \big(X_{t,i}^2 - \mathbb{E}X_{t,i}^2\big).$

Consider the semi-metric

$d_{\psi_2}(s, t) = \max_{1 \leq i \leq m} \|X_{s,i} - X_{t,i}\|_{\psi_2} \quad (s, t \in T)$

and define the radius $\bar{\Delta}_{\psi_2}(T) = \sup_{t \in T}\max_{1 \leq i \leq m} \|X_{t,i}\|_{\psi_2}$. There exist constants $c, C > 0$ such that for any $u \geq 1$,

$\mathbb{P}\Big(\sup_{t \in T}|A_t| \geq C\Big[\frac{1}{m}\gamma_2^2(T, d_{\psi_2}) + \frac{1}{\sqrt{m}}\bar{\Delta}_{\psi_2}(T)\,\gamma_2(T, d_{\psi_2})\Big] + c\Big[\sqrt{u}\,\frac{\bar{\Delta}_{\psi_2}^2(T)}{\sqrt{m}} + u\,\frac{\bar{\Delta}_{\psi_2}^2(T)}{m}\Big]\Big) \leq e^{-u}.$

4. Master bound

Throughout, let $H$ be a real Hilbert space. Let $\mathcal{P}$ be a set of points in $H$. We are interested in reducing the dimensionality of $\mathcal{P}$, meaning that we would like to construct a map $\Phi : H \to \mathbb{R}^m$ with the target dimension $m$ as small as possible. We call the dimension of $H$ the original dimension, which we think of as being very large or even infinite. All the results in this paper provide a lower bound on $m$ under which $\mathcal{P}$ can be mapped into $\mathbb{R}^m$ while preserving certain properties of the set $\mathcal{P}$. The following definition expresses that $\Phi$ approximately preserves the size of the original vectors.

Definition 4.1. Let $\mathcal{P}$ be a set in $H$ and let $\Phi : H \to \mathbb{R}^m$. The restricted isometry constant $\delta_{\mathcal{P},\Phi}$ of $\Phi$ on $\mathcal{P}$ is the least possible constant $0 \leq \delta \leq \infty$ such that

$(1-\delta)\|x\|_H^2 \leq \|\Phi(x)\|_2^2 \leq (1+\delta)\|x\|_H^2$ for all $x \in \mathcal{P}$.

It is common parlance to loosely say that a map $\Phi$ satisfies the restricted isometry property on $\mathcal{P}$ if $\delta_{\mathcal{P},\Phi} < \delta_*$, where $\delta_* < 1$ is some small value. The following definition expresses that $\Phi$ preserves pairwise distances between elements of $\mathcal{P}$. We distinguish between a multiplicative and an additive error.

Definition 4.2. Let $\mathcal{P}$ be a set in $H$ and let $\Phi : H \to \mathbb{R}^m$. We say that $\Phi$ preserves distances on $\mathcal{P}$ with multiplicative error $0 < \varepsilon < 1$ if

(8) $(1-\varepsilon)\|x-y\|_H^2 \leq \|\Phi(x) - \Phi(y)\|_2^2 \leq (1+\varepsilon)\|x-y\|_H^2$ for all $x, y \in \mathcal{P}$.

The least possible constant $\varepsilon_{\mathcal{P},\Phi}$ for which this holds is called the multiplicative precision of $\Phi$. We say that $\Phi$ preserves distances on $\mathcal{P}$ with additive error $0 \leq \zeta < 1$ if

(9) $\|x-y\|_H^2 - \zeta \leq \|\Phi(x) - \Phi(y)\|_2^2 \leq \|x-y\|_H^2 + \zeta$ for all $x, y \in \mathcal{P}$.

The least possible constant $\zeta_{\mathcal{P},\Phi}$ for which this holds is called the additive precision of $\Phi$.

The restricted isometry constant and the multiplicative error in Definition 4.2 are closely related. If

(10) $\mathcal{P}_c = \{x - y : x, y \in \mathcal{P}\}$

denotes the set of chords associated with $\mathcal{P}$, then $\varepsilon_{\mathcal{P},\Phi} = \delta_{\mathcal{P}_c,\Phi}$.

Maps $\Phi$ that preserve pairwise distances up to a multiplicative error $\varepsilon$ are the most interesting for applications, as they often preserve additional properties of $\mathcal{P}$. For instance, in applications it is often used that if $\Phi$ is in addition linear and $-\mathcal{P} = \mathcal{P}$, then inner products are preserved up to an additive error, i.e., $|\langle\Phi(x), \Phi(y)\rangle - \langle x, y\rangle| \leq \varepsilon$ for all unit vectors $x, y \in \mathcal{P}$. This can be readily shown using a polarization argument: by linearity $\langle\Phi(x), \Phi(y)\rangle = \frac{1}{4}\big(\|\Phi(x+y)\|_2^2 - \|\Phi(x-y)\|_2^2\big)$, and applying (8) to the chords $x - (-y)$ and $x - y$ gives an error of at most $\frac{\varepsilon}{4}(\|x+y\|_H^2 + \|x-y\|_H^2) = \varepsilon$ for unit vectors. More can be said if $\mathcal{P}$ is a manifold; see Section 7 below.

Remark 4.3. In parts of the literature it is customary to say that a map $\Phi$ approximately preserves distances on $\mathcal{P}$ with multiplicative error $0 < \hat{\varepsilon} < 1$ if

(11) $(1-\hat{\varepsilon})\|x-y\|_H \leq \|\Phi(x) - \Phi(y)\|_2 \leq (1+\hat{\varepsilon})\|x-y\|_H$ for all $x, y \in \mathcal{P}$.

That is, one leaves out the squares in (8). Let $\hat{\varepsilon}_{\mathcal{P},\Phi}$ be the smallest possible $\hat{\varepsilon}$ in (11). One readily checks that $\hat{\varepsilon}_{\mathcal{P},\Phi} \leq \hat{\varepsilon}$ if $\varepsilon_{\mathcal{P},\Phi} \leq 2\hat{\varepsilon} - \hat{\varepsilon}^2$.

Below we will derive various dimensionality reduction results for the following class of random maps.

Definition 4.4 (Subgaussian map). Let $S$ be any set of points in $H$. For every $1 \leq i \leq m$ let $(\Omega_i, \mathcal{F}_i, \mathbb{P}_i)$ be a probability space. Let $\Omega$ be the corresponding product probability space. We call $\Phi : \Omega \times H \to \mathbb{R}^m$ a linear, isotropic, subgaussian map, or briefly a subgaussian map on $S$, if the following conditions hold.

(a) (Linearity) For any $\omega \in \Omega$ the map $\Phi(\omega) : H \to \mathbb{R}^m$ is linear;
(b) (Independence) For all $x \in S$ and $1 \leq i \leq m$, $[\Phi(x)]_i \in L^2(\Omega_i)$;
(c) (Isotropy) For any $x \in S$ we have $\mathbb{E}\|\Phi(x)\|_2^2 = \|x\|_H^2$;
(d) (Subgaussianity) There is an $\alpha \geq 1$ such that for all $x, y \in S \cup \{0\}$,

$\max_{1 \leq i \leq m} \|[\Phi(x) - \Phi(y)]_i\|_{\psi_2} \leq \sqrt{\frac{\alpha}{m}}\,\|x - y\|_H.$

Note that the condition $\alpha \geq 1$ is forced by the assumption that $\Phi$ is isotropic.

Example 4.5 (Subgaussian matrices). Suppose that $H = \mathbb{R}^n$ for some $n \in \mathbb{N}$, equipped with the Euclidean norm. Let $\tilde{\Phi}$ be an $m \times n$ random matrix whose rows $\tilde{\Phi}_1, \ldots, \tilde{\Phi}_m$ are independent, mean-zero, isotropic, subgaussian random vectors in $\mathbb{R}^n$. By setting $\Phi = \frac{1}{\sqrt{m}}\tilde{\Phi}$ we obtain a subgaussian map on $\mathbb{R}^n$.
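Condition (c) of Definition 4.4 can be checked numerically for the matrices of Example 4.5. The following hedged sketch (not from the paper; parameters are illustrative) verifies isotropy for a Rademacher matrix map by averaging over independent draws of $\Phi$.

```python
import numpy as np

# Hedged sketch (not from the paper): check the isotropy condition (c) of
# Definition 4.4, E ||Phi(x)||_2^2 = ||x||_H^2, for the subgaussian matrix
# map of Example 4.5 with i.i.d. Rademacher rows.

rng = np.random.default_rng(4)

n, m, num_maps = 64, 16, 20_000
x = rng.standard_normal(n)                        # fixed test vector

total = 0.0
for _ in range(num_maps):                         # average over draws of Phi
    Phi = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)
    total += np.sum((Phi @ x) ** 2)

print(f"E ||Phi(x)||_2^2 ~= {total / num_maps:.4f}")
print(f"||x||_2^2        = {np.sum(x ** 2):.4f}")
```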
Example 4.6 (Database-friendly maps [1]). As a particular instance of the previous example, we can let $\tilde{\Phi}$ be any random matrix filled with independent, mean-zero, unit variance, subgaussian (in particular, bounded) entries $\tilde{\Phi}_{ij}$. In [1], Achlioptas proposed to take independent random variables satisfying

$\mathbb{P}(\tilde{\Phi}_{ij} = -\sqrt{3}) = \tfrac{1}{6}, \quad \mathbb{P}(\tilde{\Phi}_{ij} = 0) = \tfrac{2}{3}, \quad \mathbb{P}(\tilde{\Phi}_{ij} = \sqrt{3}) = \tfrac{1}{6}.$

Due to the (expected) large number of zeroes occurring in $\tilde{\Phi}$, this map requires less storage space (or, as it is phrased in [1], it is 'database-friendly') and allows for faster matrix-vector multiplication than a densely populated matrix [1, Section 7]. More generally, for any $q \geq 1$ one can take

$\mathbb{P}(\tilde{\Phi}_{ij} = -\sqrt{q}) = \tfrac{1}{2q}, \quad \mathbb{P}(\tilde{\Phi}_{ij} = 0) = \tfrac{q-1}{q}, \quad \mathbb{P}(\tilde{\Phi}_{ij} = \sqrt{q}) = \tfrac{1}{2q}.$

One can readily compute that the subgaussian parameter $\alpha$ in part (d) of Definition 4.4 is bounded by $q$ in this case.
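A sampler for this database-friendly map takes only a few lines. The following sketch (my own; the function name and parameters are illustrative, not from the paper) stores the result in sparse format to reflect the storage savings: on average only a fraction $1/q$ of the entries are nonzero.

```python
import numpy as np
from scipy import sparse

# Hedged sketch (not from the paper): sample the database-friendly map of
# Example 4.6 for a sparsity parameter q >= 1 and store it sparsely.

def database_friendly_map(m: int, n: int, q: float, seed: int = 0):
    """Return Phi = (1/sqrt(m)) * Phi_tilde with Achlioptas-type entries."""
    rng = np.random.default_rng(seed)
    u = rng.random((m, n))
    entries = np.zeros((m, n))
    entries[u < 1 / (2 * q)] = -np.sqrt(q)        # probability 1/(2q)
    entries[u > 1 - 1 / (2 * q)] = np.sqrt(q)     # probability 1/(2q)
    return sparse.csr_matrix(entries / np.sqrt(m))

Phi = database_friendly_map(m=100, n=2000, q=3.0)
print(f"fraction of nonzero entries: {Phi.nnz / (100 * 2000):.3f}")  # ~ 1/q
```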
Example 4.7 (Infinite-dimensional Hilbert spaces). Suppose that $H$ is a separable Hilbert space. Let $(x_j)_{j \geq 1}$ be any orthonormal basis of $H$ and, for $1 \leq i \leq m$, let $(g_j^{(i)})_{j \geq 1}$ be independent sequences of i.i.d. standard Gaussian random variables. For $1 \leq i \leq m$ we define $\tilde{\Phi}_i : H \to L^2(\Omega)$ by

$\tilde{\Phi}_i x = \sum_{j=1}^\infty g_j^{(i)}\langle x, x_j\rangle.$

The map $\Phi : \Omega \times H \to \mathbb{R}^m$ defined by $\Phi x = \frac{1}{\sqrt{m}}(\tilde{\Phi}_1 x, \ldots, \tilde{\Phi}_m x)$ is subgaussian on $H$.

To give a unified presentation of dimensionality reduction results for operators which are size and pairwise distance preserving, we define for a given set $S$ in $H$ the constant $0 \leq \kappa_{S,\Phi} \leq \infty$ by

$\kappa_{S,\Phi} = \sup_{x \in S} \big|\|\Phi(x)\|_2^2 - \|x\|_H^2\big|.$

The restricted isometry constant of $\Phi$ on $\mathcal{P}$ is exactly $\kappa_{\mathcal{P}_{nv},\Phi}$, where

(12) $\mathcal{P}_{nv} = \Big\{\frac{x}{\|x\|_H} : x \in \mathcal{P}\Big\}$

is the set of normalized vectors in $\mathcal{P}$. For a pairwise distance preserving operator $\Phi$ the multiplicative error on $\mathcal{P}$ is equal to $\kappa_{\mathcal{P}_{nc},\Phi}$, where

$\mathcal{P}_{nc} = \Big\{\frac{x-y}{\|x-y\|_H} : x, y \in \mathcal{P}\Big\}$

is the set of normalized chords corresponding to $\mathcal{P}$. The additive error on $\mathcal{P}$ is equal to $\kappa_{\mathcal{P}_c,\Phi}$, with $\mathcal{P}_c$ as in (10).

For our treatment in Section 6 it will be convenient to consider the situation where $S$ is described by a parameter set $\Xi$. We will say that $\xi : \Xi \to S$ is a parametrization of $S$ if $\xi$ is a surjective map. To any parametrization $\xi$ we associate a semi-metric $d_\xi$ on $\Xi$ defined by

(13) $d_\xi(x, y) := \|\xi(x) - \xi(y)\|_H \quad (x, y \in \Xi).$

We can now state a 'master bound'. Every dimensionality reduction result stated below is a corollary of this theorem.

Theorem 4.8. Let $S$ be a set of points in $H$ with radius $\bar{\Delta}_H(S) = \sup_{y \in S} \|y\|_H$ and let $\xi : \Xi \to S$ be a parametrization of $S$. Let $\Phi : \Omega \times H \to \mathbb{R}^m$ be a subgaussian map on $S$. There is a constant $C > 0$ such that for any $0 < \kappa, \eta < 1$ we have $\mathbb{P}(\kappa_{S,\Phi} \geq \kappa) \leq \eta$ provided that

(14) $m \geq C\alpha^2\kappa^{-2}\bar{\Delta}_H^2(S)\max\{\gamma_2^2(\Xi, d_\xi), \bar{\Delta}_H^2(S)\log(\eta^{-1})\}.$

Proof. For any $x \in \Xi$ we write $\Phi_i(x) := \sqrt{m}\,[\Phi(\xi(x))]_i$. By isotropy of $\Phi$,

$\|\Phi(\xi(x))\|_2^2 - \|\xi(x)\|_H^2 = \|\Phi(\xi(x))\|_2^2 - \mathbb{E}\|\Phi(\xi(x))\|_2^2 = \frac{1}{m}\sum_{i=1}^m \big(\Phi_i(x)^2 - \mathbb{E}\Phi_i(x)^2\big).$

We can now set $T = \Xi$ and $X_{x,i} = \Phi_i(x)$ in Theorem 3.2 to obtain for any $u \geq 1$

$\mathbb{P}\Big(\kappa_{S,\Phi} \geq C\Big[\frac{\gamma_2^2(\Xi, d_{\psi_2})}{m} + \frac{\gamma_2(\Xi, d_{\psi_2})\,\bar{\Delta}_{\psi_2}(\Xi)}{\sqrt{m}}\Big] + c\Big[\sqrt{u}\,\frac{\bar{\Delta}_{\psi_2}^2(\Xi)}{\sqrt{m}} + u\,\frac{\bar{\Delta}_{\psi_2}^2(\Xi)}{m}\Big]\Big) \leq e^{-u}.$

Since $\Phi$ is subgaussian, we have for any $x, y \in \Xi$,

$d_{\psi_2}(x, y) = \max_{1 \leq i \leq m} \|\Phi_i(x) - \Phi_i(y)\|_{\psi_2} = \sqrt{m}\max_{1 \leq i \leq m} \|[\Phi(\xi(x)) - \Phi(\xi(y))]_i\|_{\psi_2} \leq \sqrt{\alpha}\,\|\xi(x) - \xi(y)\|_H = \sqrt{\alpha}\, d_\xi(x, y)$

and similarly, for any $x \in \Xi$,

$\max_{1 \leq i \leq m} \|\Phi_i(x)\|_{\psi_2} \leq \sqrt{\alpha}\,\|\xi(x)\|_H \leq \sqrt{\alpha}\,\bar{\Delta}_H(S).$

In particular,

$\gamma_2(\Xi, d_{\psi_2}) \leq \sqrt{\alpha}\,\gamma_2(\Xi, d_\xi), \qquad \bar{\Delta}_{\psi_2}(\Xi) \leq \sqrt{\alpha}\,\bar{\Delta}_H(S).$

We conclude that $\mathbb{P}(\kappa_{S,\Phi} \geq \kappa) \leq \eta$ if (14) holds. □

In the special case that $S$ is a subset of the unit sphere of $H$ and $\xi$ is the trivial parametrization, Theorem 4.8 corresponds to a result of Mendelson, Pajor and Tomczak-Jaegermann [38, Corollary 2.7], in which the $\gamma_2$-functional in (14) is replaced by the Gaussian width of $S$ (this is equivalent by (5) in this case). They refined important earlier work of Klartag and Mendelson [30], who proved the same result with a suboptimal dependence on $\eta$. The result in [38] was obtained much earlier for Gaussian matrices by Gordon [23]. The proof given in [38] makes specific use of the assumption that $S$ is contained in the unit sphere and cannot be easily modified to cover the general case considered here.

Remark 4.9 (Anisotropy). One can relax the assumption that the subgaussian map $\Phi$ is isotropic on the set $S$. Suppose that $\Phi$ satisfies (a), (b) and (d) in Definition 4.4. Set $\Psi = \mathbb{E}\Phi^*\Phi$; then

$\mathbb{E}\|\Phi x\|_2^2 = x^*\mathbb{E}(\Phi^*\Phi)x = x^*\Psi x = \|\Psi^{1/2}x\|_H^2.$

The proof of Theorem 4.8 shows that for any $0 < \kappa, \eta < 1$ we have

$-\kappa + \|\Psi^{1/2}x\|_H^2 \leq \|\Phi x\|_2^2 \leq \|\Psi^{1/2}x\|_H^2 + \kappa$ for all $x \in S$

with probability at least $1 - \eta$, provided that (14) holds.

The following statements are immediate from Theorem 4.8 by setting $S = \mathcal{P}_{nv}$, $S = \mathcal{P}_{nc}$ and $S = \mathcal{P}_c$, respectively, and taking the trivial parametrization $\xi(x) = x$.

Corollary 4.10. Let $\mathcal{P}$ be a set of points and let $\Phi$ be as in Theorem 4.8. For any $0 < \delta, \eta < 1$ we have $\mathbb{P}(\delta_{\mathcal{P},\Phi} \geq \delta) \leq \eta$ provided that

$m \geq C\alpha^2\delta^{-2}\max\{\gamma_2^2(\mathcal{P}_{nv}, d_H), \log(\eta^{-1})\}.$

Moreover, for any $0 < \varepsilon, \eta < 1$ we have $\mathbb{P}(\varepsilon_{\mathcal{P},\Phi} \geq \varepsilon) \leq \eta$ whenever

$m \geq C\alpha^2\varepsilon^{-2}\max\{\gamma_2^2(\mathcal{P}_{nc}, d_H), \log(\eta^{-1})\}.$

Finally, for any $0 < \zeta, \eta < 1$ we have $\mathbb{P}(\zeta_{\mathcal{P},\Phi} \geq \zeta) \leq \eta$ if

$m \geq C\alpha^2\zeta^{-2}\Delta_H^2(\mathcal{P})\max\{\gamma_2^2(\mathcal{P}, d_H), \Delta_H^2(\mathcal{P})\log(\eta^{-1})\}.$

As a first illustration of Corollary 4.10, note that it implies an extension of Theorem 1.1 to general subgaussian maps, as $\gamma_2^2(\mathcal{P}_{nc}, d_H) \lesssim \log|\mathcal{P}_{nc}| \leq 2\log|\mathcal{P}|$. Since the dependence of $m$ on $\varepsilon$ and $\eta$ in (2) is optimal, we see that in general one cannot expect a better dependence of $m$ on $\kappa$ and $\eta$ in (14).

In the remainder of this paper we derive dimensionality reduction results for concrete data structures from Theorem 4.8. The technical work is to derive a good estimate for the complexity parameter $\gamma_2^2(\Xi, d_\xi)$ appearing in (14).

5. Sets with low covering dimension

In this section we consider dimensionality reduction for sets with low covering dimension, in particular finite unions of subspaces.

Definition 5.1. We say that a metric space $(X, d)$ has covering dimension $K > 0$ with parameter $c > 0$ and base covering $N_0 > 0$ if, for all $0 < u \leq 1$,

$N(X, d, u\Delta_d(X)) \leq N_0\Big(\frac{c}{u}\Big)^K.$

Often $c$ and $N_0$ are some small universal constants. In this situation we will loosely say that $(X, d)$ has covering dimension $K$.
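Covering numbers as in Definition 5.1 can be upper-bounded in practice by a greedy net construction. The sketch below (not from the paper; the set is illustrative) does this for a sample of the unit circle, whose covering dimension is 1: halving $u$ should roughly double the net size.

```python
import numpy as np

# Hedged sketch (not from the paper): a greedy u-net gives the upper bound
# N(X, d, u) <= |net| on the covering number in Definition 5.1. Here X is a
# sample from the unit circle in R^2, with the Euclidean metric.

rng = np.random.default_rng(5)

angles = rng.uniform(0.0, 2 * np.pi, size=5000)
X = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # points on the circle

def greedy_net_size(points: np.ndarray, u: float) -> int:
    """Greedily pick centers until every point is within distance u of one."""
    centers = 0
    uncovered = points
    while uncovered.shape[0] > 0:
        c = uncovered[0]
        centers += 1
        keep = np.linalg.norm(uncovered - c, axis=1) > u
        uncovered = uncovered[keep]
    return centers

for u in (0.5, 0.25, 0.125):
    print(f"u = {u:5.3f}: net size <= {greedy_net_size(X, u)}")
```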
Example 5.2 (Unit ball of a finite-dimensional space). A well-known example is the unit ball $B_X$ of a $K$-dimensional normed space $X$. Using a standard volumetric argument (see e.g. [20, Proposition C.3]) one shows that for any $0 < u \leq 1$,

$N(B_X, d_X, u) \leq \Big(1 + \frac{2}{u}\Big)^K \leq \Big(\frac{3}{u}\Big)^K.$

Example 5.3 (Doubling dimension). If $(X, d)$ is a metric space, then the doubling constant $\lambda_X$ of $X$ is the smallest integer $\lambda$ such that for any $x \in X$ and $u > 0$, the ball $B(x, u)$ can be covered by at most $\lambda$ balls of radius $u/2$. One can show [26] that for all $0 < u \leq 1$,

$N(X, d, u\Delta_d(X)) \leq \Big(\frac{2}{u}\Big)^{\log_2\lambda_X}.$

That is, $(X, d)$ has covering dimension $\log_2\lambda_X$. The latter number is also known as the doubling dimension of $(X, d)$. The notion of doubling dimension was considered in the context of dimensionality reduction in, for example, [2] and [26].

We now formulate a dimensionality reduction result for sets with low covering dimension. In the proof we use that for $c, u_* > 0$,

(15) $\int_0^{u_*} \log^{1/2}(c/u)\, du \leq u_*\log^{1/2}\Big(\frac{ec}{u_*}\Big).$

A short proof of this estimate can be found in [20, Lemma C.9]. The second statement in the following result was obtained by a different method in [8, Theorem 3] (note that this result was erroneously stated in [9]), but with a suboptimal dependence on $\delta$.

Corollary 5.4. Let $S_1, \ldots, S_k$ be subsets of a Hilbert space $H$ and let $S = \cup_{i=1}^k S_i$. Set $S_{i,nv} = \{x/\|x\|_H : x \in S_i\}$ and $S_{nv} = \cup_{i=1}^k S_{i,nv}$. Suppose that $S_{i,nv}$ has covering dimension $K_i$ with parameter $c_i$ and base covering $N_{0,i}$ with respect to $d_H$. Set $K = \max_i K_i$, $c = \max_i c_i$ and $N_0 = \max_i N_{0,i}$. Let $\Phi : \Omega \times H \to \mathbb{R}^m$ be a subgaussian map on $S_{nv}$. Then, for any $0 < \delta, \eta < 1$ we have $\mathbb{P}(\delta_{S,\Phi} \geq \delta) \leq \eta$ provided that

$m \geq C\alpha^2\delta^{-2}\max\{\log k + \log N_0 + K\log(c), \log(\eta^{-1})\}.$

In particular, if each $S_i$ is a $K_i$-dimensional subspace of $\mathbb{R}^n$, then $\mathbb{P}(\delta_{S,\Phi} \geq \delta) \leq \eta$ if

$m \geq C\alpha^2\delta^{-2}\max\{\log k + K, \log(\eta^{-1})\}.$

From Corollary 5.4 one can readily deduce that $\mathbb{P}(\varepsilon_{S,\Phi} \geq \varepsilon) \leq \eta$ if

$m \geq C\alpha^2\varepsilon^{-2}\max\{\log k + \log N_0 + K\log(c), \log(\eta^{-1})\};$

see the proof of Theorem 6.3 below.

Proof. We use (6) to estimate

$\gamma_2(S_{nv}, d_H) \lesssim \int_0^1 \log^{1/2} N(S_{nv}, d_H, u)\, du.$

Clearly, if $N_i$ is a $u$-net for $S_{i,nv}$, then $\cup_{i=1}^k N_i$ is a $u$-net for $S_{nv}$. Therefore, using our assumption on the $S_{i,nv}$, we obtain

$N(S_{nv}, d_H, u) \leq \sum_{i=1}^k N(S_{i,nv}, d_H, u) \leq \sum_{i=1}^k N_{0,i}\Big(\frac{c_i}{u}\Big)^{K_i} \leq kN_0\Big(\frac{c}{u}\Big)^K.$

Using (15) we arrive at

$\gamma_2(S_{nv}, d_H) \lesssim \int_0^1 \log^{1/2}\big(kN_0(c/u)^K\big)\, du \leq \log^{1/2}(kN_0) + K^{1/2}\int_0^1 \log^{1/2}(c/u)\, du \lesssim \log^{1/2}(kN_0) + K^{1/2}\log^{1/2}(c).$

The first part of the result now follows from the first statement in Corollary 4.10, and the second part follows by the observation in Example 5.2. □

To illustrate Corollary 5.4, we consider four examples.

Example 5.5 (Sparse vectors: the 'usual' RIP). We derive the restricted isometry property on $s$-sparse vectors for subgaussian maps $\Phi : \Omega \times \mathbb{R}^n \to \mathbb{R}^m$, a classical result from compressed sensing [4, 11, 16, 38, 39]. For $x \in \mathbb{R}^n$ we set $\|x\|_0 = |\{1 \leq i \leq n : x_i \neq 0\}|$. A vector $x$ is called $s$-sparse if $\|x\|_0 \leq s$. Let

$D_{s,n} = \{x \in \mathbb{R}^n : \|x\|_0 \leq s\}$

be the set of $s$-sparse vectors. The restricted isometry constant $\delta_s$ of $\Phi$ is defined as the smallest constant $\delta$ such that

$(1-\delta)\|x\|_2^2 \leq \|\Phi x\|_2^2 \leq (1+\delta)\|x\|_2^2$ for all $x \in D_{s,n}$.

In our notation, $\delta_s = \delta_{D_{s,n},\Phi}$.
Note that we can write

$D_{s,n} = \bigcup_{I \subset \{1,\ldots,n\},\, |I| = s} S_I,$

where $S_I$ is the $s$-dimensional subspace $S_I = \{x \in \mathbb{R}^n : x_i = 0 \text{ if } i \in I^c\}$. Since the number of $s$-element subsets of $\{1, \ldots, n\}$ is $\binom{n}{s} \leq \big(\frac{en}{s}\big)^s$ (a quick numerical check of this counting bound is sketched after Example 5.6 below), the second part of Corollary 5.4 implies that $\mathbb{P}(\delta_s \geq \delta) \leq \eta$ provided that

$m \geq C\alpha^2\delta^{-2}\max\{s\log(en/s), \log(\eta^{-1})\}.$

The scaling of this lower bound in $n$ and $s$ is optimal; see [20, Corollary 10.8].

Example 5.6 (Cosparse vectors: the $\Upsilon$-RIP). In many signal processing applications, signals of interest are not sparse themselves in the standard basis, but can rather be represented as a sparse vector. Let $\Upsilon : \mathbb{R}^n \to \mathbb{R}^p$ be a linear operator, usually called the 'analysis operator' in the literature. We are interested in elements $x \in \mathbb{R}^n$ such that $\Upsilon x$ is sparse. It has become customary to count the number of zero components of $\Upsilon x$, rather than the number of nonzero ones. Accordingly, a vector $x \in \mathbb{R}^n$ is called $l$-cosparse with respect to $\Upsilon$ if there is a set $\Lambda \subset \{1, \ldots, p\}$ with $|\Lambda| = l$ such that $\Upsilon_\Lambda x = 0$, where $\Upsilon_\Lambda : \mathbb{R}^n \to \mathbb{R}^p$ is the operator obtained by setting the rows of $\Upsilon$ indexed by $\Lambda^c$ equal to zero. Let $N_\Lambda$ be the null space of $\Upsilon_\Lambda$; then we can write the set of $l$-cosparse vectors as

$C_{\Upsilon,l,p} = \bigcup_{|\Lambda| = l} N_\Lambda.$

The $\Upsilon$-restricted isometry constant $\delta_l$ of $\Phi$ is defined as the smallest possible $\delta > 0$ such that

$(1-\delta)\|x\|_2^2 \leq \|\Phi x\|_2^2 \leq (1+\delta)\|x\|_2^2$ for all $x \in C_{\Upsilon,l,p}$.

In our notation, $\delta_l = \delta_{C_{\Upsilon,l,p},\Phi}$. Observe that $\dim(N_\Lambda) \leq n - l$ and the number of $l$-element subsets of $\{1, \ldots, p\}$ is $\binom{p}{l} \leq \big(\frac{ep}{l}\big)^l$. The second part of Corollary 5.4 now implies that $\mathbb{P}(\delta_l \geq \delta) \leq \eta$ if

$m \geq C\alpha^2\delta^{-2}\max\{l\log(ep/l) + (n - l), \log(\eta^{-1})\}.$

This result improves upon the RIP-result in [22, Theorem 3.8]. In fact, as is heuristically explained in [40, Section 6.1], one cannot expect a better lower bound for $m$. An upper bound on $\delta_l$ leads to performance guarantees for greedy-like recovery algorithms for cosparse vectors from subgaussian measurements; see [22] for some results in this direction.
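The only combinatorial ingredient in Examples 5.5 and 5.6 is the bound $\binom{n}{s} \leq (en/s)^s$ on the number of subspaces in the union. A quick numerical comparison (illustrative, not from the paper):

```python
from math import comb, log, e

# Hedged sketch (not from the paper): compare log binom(n, s), the log of the
# number of subspaces in the union of Example 5.5, with s*log(en/s).

n = 10_000
for s in (5, 50, 500):
    exact = log(comb(n, s))                       # log of the union size
    bound = s * log(e * n / s)                    # bound used in Example 5.5
    print(f"s = {s:4d}: log C(n,s) = {exact:9.1f} <= s log(en/s) = {bound:9.1f}")
```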
Example 5.7 (Matrix RIP). Another direct consequence of Corollary 5.4 is a new proof of the restricted isometry property for subgaussian matrix maps. This property plays the same role in low-rank matrix recovery as the 'usual' restricted isometry property discussed in Example 5.5 plays in compressed sensing; see e.g. [10, 44] for further information. We use the following notation. Given two matrices $X, Y \in \mathbb{R}^{n_1 \times n_2}$ we consider the Frobenius inner product

$\langle X, Y\rangle = \sum_{i=1}^{n_1}\sum_{j=1}^{n_2} X_{ij}Y_{ij}.$

Let $\|X\|_F = \langle X, X\rangle^{1/2}$ be the corresponding norm and $d_F(X, Y) = \|X - Y\|_F$ the induced metric. Also, we use $\mathrm{Rank}(X)$ to denote the rank of $X$. For $1 \leq r \leq \min\{n_1, n_2\}$ we define the restricted isometry constant $\delta_r$ of a map $\Phi : \mathbb{R}^{n_1 \times n_2} \to \mathbb{R}^m$ as the smallest constant $\delta > 0$ such that

$(1-\delta)\|X\|_F^2 \leq \|\Phi X\|_2^2 \leq (1+\delta)\|X\|_F^2$

for all $X \in \mathbb{R}^{n_1 \times n_2}$ with $\mathrm{Rank}(X) \leq r$. If we set

$D_r = \{X \in \mathbb{R}^{n_1 \times n_2} : \|X\|_F = 1,\ \mathrm{Rank}(X) \leq r\},$

then $\delta_r = \delta_{D_r,\Phi}$ in our notation. The covering number estimate [10, Lemma 3.1]

$N(D_r, d_F, u) \leq (9/u)^{r(n_1+n_2+1)} \quad (0 < u \leq 1)$

shows that $D_r$ has covering dimension $r(n_1 + n_2 + 1)$ in $(\mathbb{R}^{n_1 \times n_2}, d_F)$. The first part of Corollary 5.4 implies for any subgaussian map $\Phi$ that $\mathbb{P}(\delta_r \geq \delta) \leq \eta$ if

$m \geq C\alpha^2\delta^{-2}\max\{r(n_1 + n_2 + 1), \log(\eta^{-1})\}.$

This result was obtained in a different way in [10, Theorem 2.3]; see also [44, Theorem 4.2] for a slightly worse result.

Example 5.8 (Tensor RIP). The previous example can be extended to higher order tensors. Let $d \geq 2$ and set $n = (n_1, \ldots, n_d) \in \mathbb{N}^d$. Given two tensors $X, Y \in \mathbb{R}^{n_1 \times \cdots \times n_d}$ we consider their Frobenius inner product

$\langle X, Y\rangle = \sum_{i_1=1}^{n_1}\cdots\sum_{i_d=1}^{n_d} X(i_1, \ldots, i_d)\,Y(i_1, \ldots, i_d).$

Let $\|X\|_F = \langle X, X\rangle^{1/2}$ be the corresponding norm and $d_F(X, Y) = \|X - Y\|_F$ the induced metric. Let $\mathrm{Rank}(X)$ denote the rank of $X$ associated with its HOSVD decomposition; see [43] for more information. Given $r = (r_1, \ldots, r_d)$, $0 \leq r_i \leq n_i$, we define the restricted isometry constant $\delta_r$ of a map $\Phi : \mathbb{R}^{n_1 \times \cdots \times n_d} \to \mathbb{R}^m$ as the smallest constant $\delta > 0$ such that

$(1-\delta)\|X\|_F^2 \leq \|\Phi X\|_2^2 \leq (1+\delta)\|X\|_F^2$

for all $X \in \mathbb{R}^{n_1 \times \cdots \times n_d}$ with $\mathrm{Rank}(X) \leq r$. If we set

$D_r = \{X \in \mathbb{R}^{n_1 \times \cdots \times n_d} : \|X\|_F = 1,\ \mathrm{Rank}(X) \leq r\},$

then $\delta_r = \delta_{D_r,\Phi}$ in our notation. It is shown in [43] that for any $0 < u \leq 1$,

$N(D_r, d_F, u) \leq \big(3(d+1)/u\big)^{r_1\cdots r_d + \sum_{i=1}^d n_ir_i}.$

In other words, $D_r$ has covering dimension $r_1\cdots r_d + \sum_{i=1}^d n_ir_i$ with parameter $3(d+1)$. Corollary 5.4 implies that for any subgaussian map $\Phi$ and any $0 < \delta, \eta < 1$ we have $\mathbb{P}(\delta_r \geq \delta) \leq \eta$, provided that

$m \geq C\alpha^2\delta^{-2}\max\Big\{\Big(r_1\cdots r_d + \sum_{i=1}^d n_ir_i\Big)\log(d), \log(\eta^{-1})\Big\}.$

This result was obtained originally for a more restricted class of subgaussian maps in [43].

The results presented in the four examples above can be derived in a different, more elementary fashion using the $\varepsilon$-net technique; see [4, 39], [22], [10, 44], and [43], respectively. In fact, this is already true for the statement in Corollary 5.4. The results in the following two sections cannot be achieved using the $\varepsilon$-net technique, however, and therefore generic chaining methods, which are at the basis of Theorem 3.2, become necessary to achieve the best results.

6. Infinite union of subspaces

Many sets of signals relevant to signal processing can be expressed as a possibly infinite union of finite-dimensional subspaces of a Hilbert space. For example, signals with various forms of structured sparsity (e.g. sparse, cosparse, block sparse, simultaneously sparse data), piecewise polynomials, certain finite rate of innovations models and overlapping echoes can be described in this fashion [8, 9, 19, 32]. In this section we prove a Johnson-Lindenstrauss embedding result for an infinite union of subspaces.

We consider the following setup. Let $H$ be a Hilbert space and let $B_H$ denote its unit ball. Let $\Theta$ be a parameter set and suppose that for every $\theta \in \Theta$ we are given a finite-dimensional subspace $S_\theta$ of $H$. We use $P_\theta$ to denote the orthogonal projection onto $S_\theta$. It will be natural to consider the Finsler metric on $\Theta$, which is defined by

$d_{\mathrm{Fin}}(\theta, \theta') := \|P_\theta - P_{\theta'}\| \quad (\theta, \theta' \in \Theta).$
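The Finsler distance is easy to compute for concrete subspaces. The sketch below (not from the paper; parameters are illustrative) forms the orthogonal projections of two random subspaces of equal dimension and checks the computed operator norm against the sine of the largest principal angle, the relation via [47, Corollary 2.6] quoted just below.

```python
import numpy as np

# Hedged sketch (not from the paper): Finsler distance between two subspaces
# of R^n, d_Fin = ||P_theta - P_theta'||, versus sin(largest principal angle).

rng = np.random.default_rng(6)
n, k = 20, 4

def orth_basis(a: np.ndarray) -> np.ndarray:
    q, _ = np.linalg.qr(a)
    return q

U = orth_basis(rng.standard_normal((n, k)))       # basis of S_theta
V = orth_basis(rng.standard_normal((n, k)))       # basis of S_theta'
P_u, P_v = U @ U.T, V @ V.T                       # orthogonal projections

d_fin = np.linalg.norm(P_u - P_v, ord=2)          # operator (spectral) norm

cosines = np.linalg.svd(U.T @ V, compute_uv=False)  # cosines of principal angles
sin_largest = np.sqrt(1.0 - np.min(cosines) ** 2)

print(f"d_Fin                        = {d_fin:.6f}")
print(f"sin(largest principal angle) = {sin_largest:.6f}")
```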
If two subspaces $S_\theta, S_{\theta'}$ have the same dimension, then the Finsler distance satisfies

$d_{\mathrm{Fin}}(\theta, \theta') = \sin(\gamma(\theta, \theta')),$

where $\gamma(\theta, \theta')$ is the largest canonical angle (or largest principal angle) between the subspaces $S_\theta$ and $S_{\theta'}$ [47, Corollary 2.6]. Define the union

(16) $\mathcal{U} = \bigcup_{\theta \in \Theta} S_\theta.$

We are interested in reducing the dimensionality of $\mathcal{U}$ using a subgaussian map. To achieve this, we apply Theorem 4.8 with a suitable parametrization of $\mathcal{U} \cap B_H$. We estimate the relevant $\gamma_2$-functional in the following lemma.

Lemma 6.1. Set $K = \sup_{\theta \in \Theta} \dim(S_\theta)$. Let $\xi : \Theta \times B_H \to \mathcal{U} \cap B_H$ be the parametrization defined by $\xi(\theta, x) = P_\theta x$ and let $d_\xi$ be as in (13). Then,

$\gamma_2(\Theta \times B_H, d_\xi) \lesssim \sqrt{K} + \gamma_2(\Theta, d_{\mathrm{Fin}}).$

Proof. Let $(\Theta_n)_{n \geq 0}$ be any admissible sequence in $\Theta$ and let $\rho = (\rho_n)_{n \geq 0}$ be an associated sequence of maps $\rho_n : \Theta \to \Theta_n$. For any given $\theta \in \Theta$ and $n \geq 0$ we define a semi-metric on $B_H$ by

$d_{n,\theta}(x, y) = \|P_{\rho_n(\theta)}(x - y)\|_H.$

Next, for every $\theta \in \Theta$ we define an admissible sequence $\mathcal{H}_\theta = (H_{n,\theta})_{n \geq 0}$ of $B_H$ by

$H_{n,\theta} := \operatorname{argmin}_A \sup_{x \in B_H} d_{n,\theta}(x, A),$

where the minimization is over all subsets $A$ of $B_H$ with $|A| \leq 2^{2^n}$. We use

$e_{n,\theta} = \inf_A \sup_{x \in B_H} d_{n,\theta}(x, A) = \sup_{x \in B_H} d_{n,\theta}(x, H_{n,\theta})$

to denote the associated entropy numbers. Finally, we define $\sigma_{n,\theta}(x) = \operatorname{argmin}_{y \in H_{n,\theta}} d_{n,\theta}(x, y)$. For completeness we set $\Theta_{-1}$ equal to $\Theta_0$, $d_{-1,\theta}$ equal to $d_{0,\theta}$ and $H_{-1,\theta}$ equal to $H_{0,\theta}$. Now we define

$T_n = \{(\rho_{n-1}(\theta), \sigma_{n-1,\theta}(x)) \in \Theta \times B_H : \theta \in \Theta,\ x \in B_H\}.$

Note that $\sigma_{n,\theta}(x)$ depends on $\theta$ only through $\rho_n(\theta)$. It follows that

$|T_n| \leq |\Theta_{n-1}|\, 2^{2^{n-1}} \leq 2^{2^n}$

and therefore $\mathcal{T} = (T_n)_{n \geq 0}$ is an admissible sequence for $\Theta \times B_H$. For $(\theta, x) \in \Theta \times B_H$ we define $\pi_n(\theta, x) = (\rho_{n-1}(\theta), \sigma_{n-1,\theta}(x))$. By the triangle inequality,

$d_\xi((\theta, x), \pi_n(\theta, x)) = \|P_\theta x - P_{\rho_{n-1}(\theta)}\sigma_{n-1,\theta}(x)\|_H \leq \|(P_\theta - P_{\rho_{n-1}(\theta)})x\|_H + \|P_{\rho_{n-1}(\theta)}(x - \sigma_{n-1,\theta}(x))\|_H \leq \|P_\theta - P_{\rho_{n-1}(\theta)}\| + d_{n-1,\theta}(x, \sigma_{n-1,\theta}(x)) \leq \|P_\theta - P_{\rho_{n-1}(\theta)}\| + e_{n-1,\theta},$

where we used in the final estimate that

$d_{n,\theta}(x, \sigma_{n,\theta}(x)) = d_{n,\theta}(x, H_{n,\theta}) \leq \sup_{x \in B_H} d_{n,\theta}(x, H_{n,\theta}) = e_{n,\theta}.$

Using these observations we obtain

$\gamma_2(\Theta \times B_H, d_\xi) \leq \sup_{(\theta,x)}\sum_{n \geq 0} 2^{n/2}\, d_\xi((\theta, x), \pi_n(\theta, x)) \leq \sup_\theta \sum_{n \geq 0} \big(2^{n/2} d_{\mathrm{Fin}}(\theta, \rho_{n-1}(\theta)) + 2^{n/2} e_{n-1,\theta}\big) \leq (1 + \sqrt{2})\sup_\theta\Big(\sum_{n \geq 0} 2^{n/2} d_{\mathrm{Fin}}(\theta, \rho_n(\theta)) + \sum_{n \geq 0} 2^{n/2} e_{n,\theta}\Big).$

It remains to bound the second term on the right-hand side. Observe that

$e_{n,\theta} = \inf\{u : N(B_H, d_{n,\theta}, u) \leq 2^{2^n}\}.$

If $(a_\alpha)$ is a $u$-net for the unit ball in $S_{\rho_n(\theta)}$ with respect to $d_H$ and we pick $x_\alpha$ such that $a_\alpha = P_{\rho_n(\theta)}x_\alpha$, then $(x_\alpha)$ is a $u$-net for $B_H$ with respect to $d_{n,\theta}$. Since $S_{\rho_n(\theta)}$ is at most $K$-dimensional, we find for all $u > 0$,

$N(B_H, d_{n,\theta}, u) \leq N(B_{\mathbb{R}^K}, d_2, u).$

Thus we can conclude that $e_{n,\theta} \leq e_n$, where

$e_n = \inf\{u : N(B_{\mathbb{R}^K}, d_2, u) \leq 2^{2^n}\}.$

Now, if $u < e_n$ then $N(B_{\mathbb{R}^K}, d_2, u) \geq 2^{2^n} + 1$ and hence we can estimate

$\Big(1 - \frac{1}{\sqrt{2}}\Big)\sum_{n \geq 0} 2^{n/2}e_n \leq \sum_{n \geq 0} 2^{n/2}e_n - \sum_{n \geq 1} 2^{(n-1)/2}e_n = \sum_{n \geq 0} 2^{n/2}(e_n - e_{n+1}) \leq \frac{1}{\log^{1/2}(2)}\sum_{n \geq 0} \log^{1/2}(1 + 2^{2^n})(e_n - e_{n+1}) \leq \frac{1}{\log^{1/2}(2)}\sum_{n \geq 0}\int_{e_{n+1}}^{e_n} \log^{1/2} N(B_{\mathbb{R}^K}, d_2, u)\, du = \frac{1}{\log^{1/2}(2)}\int_0^1 \log^{1/2} N(B_{\mathbb{R}^K}, d_2, u)\, du.$
As observed in Example 5.2, $N(B_{\mathbb{R}^K}, d_2, u) \leq (1 + 2u^{-1})^K$. Putting these estimates together we conclude using (15) that

$\sum_{n \geq 0} 2^{n/2}e_{n,\theta} \leq \frac{\sqrt{K}}{(1 - 2^{-1/2})\log^{1/2}(2)}\int_0^1 \log^{1/2}(1 + 2u^{-1})\, du \leq \frac{\sqrt{K}}{(1 - 2^{-1/2})\log^{1/2}(2)}\log^{1/2}(3e).$

This completes the proof. □

Theorem 4.8 and Lemma 6.1 together imply the following result.

Theorem 6.2. Let $\mathcal{U}$ be the union of subspaces defined in (16) and let $K = \sup_{\theta \in \Theta} \dim(S_\theta)$. Let $\Phi : \Omega \times H \to \mathbb{R}^m$ be a subgaussian map on $\mathcal{U}$. Then there is a constant $C > 0$ such that for any $0 < \delta, \eta < 1$ we have $\mathbb{P}(\delta_{\mathcal{U},\Phi} \geq \delta) \leq \eta$ provided that

$m \geq C\alpha^2\delta^{-2}\max\{K + \gamma_2^2(\Theta, d_{\mathrm{Fin}}), \log(\eta^{-1})\}.$

Proof. Recall from (12) that $\delta_{\mathcal{U},\Phi} = \kappa_{\mathcal{U}_{nv},\Phi}$ and clearly $\kappa_{\mathcal{U}_{nv},\Phi} \leq \kappa_{\mathcal{U} \cap B_H,\Phi}$. Let $\xi$ be the parametrization of $\mathcal{U} \cap B_H$ defined in Lemma 6.1. By Theorem 4.8 we have $\mathbb{P}(\kappa_{\mathcal{U} \cap B_H,\Phi} \geq \delta) \leq \eta$ if

$m \geq C\alpha^2\delta^{-2}\max\{\gamma_2^2(\Theta \times B_H, d_\xi), \log(\eta^{-1})\}.$

The assertion now follows from Lemma 6.1. □

Theorem 6.2 improves upon the second part of Corollary 5.4 even if $\Theta$ is a finite set. Indeed, this follows from the bound $\gamma_2^2(\Theta, d_{\mathrm{Fin}}) \lesssim \log|\Theta|$.

Let us now derive a condition under which $\Phi$ preserves pairwise distances in $\mathcal{U}$. For $\theta, \theta' \in \Theta$ let $S_{(\theta,\theta')}$ be the subspace spanned by $S_\theta$ and $S_{\theta'}$ and let $P_{(\theta,\theta')}$ be the projection onto this subspace.

Theorem 6.3. Let $\Phi : \Omega \times H \to \mathbb{R}^m$ be a subgaussian map on $\mathcal{U}$. Set $K = \sup_{\theta,\theta'} \dim(S_{(\theta,\theta')})$. On the set $\Theta \times \Theta$ consider the metric

$d_{\mathrm{Fin}}((\theta, \theta'), (\tau, \tau')) = \|P_{(\theta,\theta')} - P_{(\tau,\tau')}\|.$

Then there is a constant $C > 0$ such that for any $0 < \varepsilon, \eta < 1$ we have $\mathbb{P}(\varepsilon_{\mathcal{U},\Phi} \geq \varepsilon) \leq \eta$ provided that

(17) $m \geq C\alpha^2\varepsilon^{-2}\max\{K + \gamma_2^2(\Theta \times \Theta, d_{\mathrm{Fin}}), \log(\eta^{-1})\}.$

Proof. Recall that $\varepsilon_{\mathcal{U},\Phi} = \delta_{\mathcal{U}-\mathcal{U},\Phi}$. Since $\mathcal{U} - \mathcal{U} \subset \mathcal{U}^* := \cup_{(\theta,\theta') \in \Theta \times \Theta} S_{(\theta,\theta')}$, we have $\delta_{\mathcal{U}-\mathcal{U},\Phi} \leq \delta_{\mathcal{U}^*,\Phi}$. The result follows by applying Theorem 6.2 to $\mathcal{U}^*$, noting that the subspaces $S_{(\theta,\theta')}$ have dimension at most $K$. □

Clearly, if there exists a one-to-one map from $\mathcal{U}$ into $\mathbb{R}^m$ then we must have $m \geq K$. In particular, the scaling of $m$ in $K$ in (17) cannot be improved.

Remark 6.4. Together with the main result of [8], Theorem 6.3 implies the following very general uniform signal recovery result. Suppose that we wish to recover a vector $x \in \mathcal{U}$ from $m$ noisy measurements $y \in \mathbb{R}^m$ given by

(18) $y = \Phi x + e,$

where $e \in \mathbb{R}^m$ represents the measurement error. If $m$ satisfies (17), then in the terminology of [8] the subgaussian map $\Phi$ is with probability $1 - \eta$ a bilipschitz map on $\mathcal{U}$ with constants $1 - \varepsilon$ and $1 + \varepsilon$. Therefore, if $(1 + \varepsilon)/(1 - \varepsilon) < 3/2$, then with probability $1 - \eta$ we can recover any $x \in \mathcal{U}$ robustly from the $m$ measurements $y$ in (18) using a projective Landweber algorithm. We refer to [8, Theorem 2] for details and a quantitative statement.

In [34, 35], Mantzel and Romberg proved a version of Theorem 6.2 for a matrix $\Phi$ populated with i.i.d. standard Gaussian entries. They assume that $\Theta$ has covering dimension $K_{\mathrm{Fin}}$ with respect to $d_{\mathrm{Fin}}$, with base covering $N_0$. Their result in [35] states (in our terminology) that $\mathbb{P}(\delta_{\mathcal{U},\Phi} \geq \delta) \leq \eta$ provided that

(19) $m \geq C\max\{\delta^{-1}\log(K), \delta^{-2}\}\max\{K(K_{\mathrm{Fin}} + \log K + \log N_0), K\log(\eta^{-1})\}.$
Note that in this setup Theorem 6.2 implies that (cf. the argument in the proof of Corollary 5.4)

(20) $m \geq C\delta^{-2}\max\{K + \log N_0 + K_{\mathrm{Fin}}, \log(\eta^{-1})\}$

is already sufficient. Moreover, this statement extends to any subgaussian map, in particular the database-friendly map discussed in Example 4.6. The approach in [34, 35] is very different from ours. The idea is to write

$\delta_{\mathcal{U},\Phi} = \sup_{\theta \in \Theta} \|P_\theta\Phi^*\Phi P_\theta - P_\theta\|$

and to estimate the expected value of the right-hand side using a (classical) chaining argument in the operator norm, based on the noncommutative Bernstein inequality. Note that this approach cannot yield the improved condition (20). For example, the factor $\log(K)$ in (19) is incurred through the use of the noncommutative Bernstein inequality, and is therefore an artefact of the method used.

7. Manifolds

Let $\mathcal{M}$ be a $K$-dimensional $C^1$-submanifold of $\mathbb{R}^n$, equipped with the Riemannian metric induced by the Euclidean inner product on $\mathbb{R}^n$. We use the following standard notation and terminology. For any $x \in \mathcal{M}$ we let $T_x\mathcal{M}$ denote the tangent space of $\mathcal{M}$ at $x$ and let $P_x : \mathbb{R}^n \to T_x\mathcal{M}$ be the associated projection onto $T_x\mathcal{M}$. We use

$T\mathcal{M} = \bigcup_{x \in \mathcal{M}} T_x\mathcal{M}$

to denote the tangent bundle of $\mathcal{M}$. If $\gamma : [a, b] \to \mathcal{M}$ is a piecewise $C^1$ curve in $\mathcal{M}$, then its length is defined as

$L(\gamma) = \int_a^b \|\gamma'(t)\|_2\, dt.$

For $x, y \in \mathcal{M}$, let $d_{\mathcal{M}}(x, y)$ be the geodesic distance between $x$ and $y$, which can be described as

$d_{\mathcal{M}}(x, y) = \inf\{L(\gamma) : \gamma : [a, b] \to \mathcal{M} \text{ piecewise } C^1,\ a, b \in \mathbb{R},\ \gamma(a) = x,\ \gamma(b) = y\}.$

For more information on Riemannian (sub)manifolds we refer to [31].

Below we prove three different types of dimensionality reduction results for a subgaussian map $\Phi$. We derive a sufficient condition under which $\Phi$ uniformly preserves the lengths of all curves in $\mathcal{M}$ up to a specified multiplicative error, and conditions under which ambient distances are preserved up to an additive error and a multiplicative error, respectively.

7.1. Preservation of curve lengths. We can immediately apply Theorem 6.2 to derive a condition under which $\Phi$ uniformly preserves the length of all curves in $\mathcal{M}$ up to a specified multiplicative error.

Theorem 7.1. Let $\mathcal{M}$ be a $K$-dimensional $C^1$-submanifold of $\mathbb{R}^n$. Let $\Phi : \Omega \times \mathbb{R}^n \to \mathbb{R}^m$ be a subgaussian map. There is a constant $C > 0$ such that for any $0 < \varepsilon, \eta < 1$ we have, with probability at least $1 - \eta$, for any piecewise $C^1$-curve $\gamma$ in $\mathcal{M}$,

(21) $(1-\varepsilon)L(\gamma) \leq L(\Phi\gamma) \leq (1+\varepsilon)L(\gamma),$

provided that

(22) $m \geq C\alpha^2(2\varepsilon - \varepsilon^2)^{-2}\max\{K + \gamma_2^2(\mathcal{M}, d_{\mathrm{Fin}}), \log(\eta^{-1})\}.$

Proof. Let $\gamma : [a, b] \to \mathcal{M}$ be any piecewise $C^1$-curve in $\mathcal{M}$; then $\Phi\gamma$ is a piecewise $C^1$-curve and $(\Phi\gamma)'(t) = \Phi\gamma'(t)$ whenever $\gamma$ is differentiable at $t$. Therefore,

$(1 - \delta_{T\mathcal{M},\Phi})\|\gamma'(t)\|_2^2 \leq \|(\Phi\gamma)'(t)\|_2^2 \leq (1 + \delta_{T\mathcal{M},\Phi})\|\gamma'(t)\|_2^2.$

Note that if $\delta_{T\mathcal{M},\Phi} \leq 2\varepsilon - \varepsilon^2$, then (see Remark 4.3 for a similar observation)

$(1-\varepsilon)\|\gamma'(t)\|_2 \leq \|(\Phi\gamma)'(t)\|_2 \leq (1+\varepsilon)\|\gamma'(t)\|_2.$

Integrating both sides over $[a, b]$ yields $(1-\varepsilon)L(\gamma) \leq L(\Phi\gamma) \leq (1+\varepsilon)L(\gamma)$. By Theorem 6.2, we have $\mathbb{P}(\delta_{T\mathcal{M},\Phi} \geq 2\varepsilon - \varepsilon^2) \leq \eta$ under condition (22), and this implies the result. □
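Theorem 7.1 can be observed numerically on a concrete curve. The following hedged sketch (not from the paper; the helix and all parameters are illustrative) compares polygonal approximations of $L(\gamma)$ and $L(\Phi\gamma)$ for a Gaussian subgaussian map.

```python
import numpy as np

# Hedged sketch (not from the paper): compare the length of a discretized
# curve gamma in R^n with the length of its image Phi(gamma), cf. (21).

rng = np.random.default_rng(7)

n, m = 400, 60
t = np.linspace(0.0, 4 * np.pi, 2000)
gamma = np.zeros((t.size, n))                      # helix in the first 3 coords
gamma[:, 0], gamma[:, 1], gamma[:, 2] = np.cos(t), np.sin(t), 0.1 * t

Phi = rng.standard_normal((m, n)) / np.sqrt(m)     # Gaussian subgaussian map
gamma_emb = gamma @ Phi.T                          # image curve Phi(gamma)

def poly_length(curve: np.ndarray) -> float:
    """Length of the polygonal approximation of a discretized curve."""
    return float(np.sum(np.linalg.norm(np.diff(curve, axis=0), axis=1)))

L, L_emb = poly_length(gamma), poly_length(gamma_emb)
print(f"L(gamma) = {L:.3f}, L(Phi gamma) = {L_emb:.3f}, ratio = {L_emb / L:.3f}")
```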
Remark 7.2. If the map $\Phi$ in Theorem 7.1 also happens to be a manifold embedding, i.e., an immersion that is homeomorphic onto its image, then it preserves geodesic distances. That is, if for a given $0 < \varepsilon < 1$, (21) holds for all piecewise $C^1$-curves in $\mathcal{M}$, then

$(1-\varepsilon)d_{\mathcal{M}}(x, y) \leq d_{\Phi\mathcal{M}}(\Phi x, \Phi y) \leq (1+\varepsilon)d_{\mathcal{M}}(x, y)$ for all $x, y \in \mathcal{M}$.

Indeed, for given $x, y \in \mathcal{M}$, let $\gamma_{g,\Phi}$ be a geodesic between $\Phi(x)$ and $\Phi(y)$, and let $\gamma$ be the preimage of $\gamma_{g,\Phi}$. By (21),

$(1-\varepsilon)d_{\mathcal{M}}(x, y) \leq (1-\varepsilon)L(\gamma) \leq L(\gamma_{g,\Phi}) = d_{\Phi\mathcal{M}}(\Phi(x), \Phi(y)).$

Similarly, if $\gamma_g$ is a geodesic between $x$ and $y$ in $\mathcal{M}$, then

$d_{\Phi\mathcal{M}}(\Phi(x), \Phi(y)) \leq L(\Phi\gamma_g) \leq (1+\varepsilon)L(\gamma_g) = (1+\varepsilon)d_{\mathcal{M}}(x, y).$

7.2. Preservation of ambient distances: additive error. We briefly consider maps that preserve pairwise ambient distances up to a specified additive error. The following result is similar to a result established for random projections in [2, Theorem 9].

Proposition 7.3. Let $\mathcal{M}$ be a $C^1$-manifold with doubling dimension $D_{\mathcal{M}}$ in the geodesic distance $d_{\mathcal{M}}$ and let $\Delta_{\mathcal{M}}$ be its diameter in $d_{\mathcal{M}}$. Let $\Phi : \Omega \times \mathbb{R}^n \to \mathbb{R}^m$ be a subgaussian map. Then there is a constant $C > 0$ such that $\mathbb{P}(\zeta_{\mathcal{M},\Phi} \geq \zeta) \leq \eta$, provided that

$m \geq C\alpha^2\zeta^{-2}\Delta_{\mathcal{M}}^4\max\{D_{\mathcal{M}}, \log(\eta^{-1})\}.$

Proof. Since $d_2 \leq d_{\mathcal{M}}$, we find using (6)

$\gamma_2(\mathcal{M}, d_2) \lesssim \int_0^{\Delta_{\mathcal{M}}} \log^{1/2} N(\mathcal{M}, d_{\mathcal{M}}, \varepsilon)\, d\varepsilon = \Delta_{\mathcal{M}}\int_0^1 \log^{1/2} N(\mathcal{M}, d_{\mathcal{M}}, \varepsilon\Delta_{\mathcal{M}})\, d\varepsilon \leq \Delta_{\mathcal{M}}D_{\mathcal{M}}^{1/2}\int_0^1 \log^{1/2}(c/\varepsilon)\, d\varepsilon \lesssim \Delta_{\mathcal{M}}D_{\mathcal{M}}^{1/2},$

where in the final step we used (15). The result is now immediate from the third statement in Corollary 4.10. □

If $\gamma$ is a $C^1$-curve in $\mathbb{R}^n$, then it has doubling dimension 2 with respect to the geodesic distance. Therefore, Proposition 7.3 implies in this case that with probability $1 - \eta$,

$\|x - y\|_2^2 - \zeta \leq \|\Phi(x - y)\|_2^2 \leq \|x - y\|_2^2 + \zeta$ for all $x, y \in \gamma$,

whenever $m \geq C\alpha^2\zeta^{-2}\Delta_\gamma^4\max\{2, \log(\eta^{-1})\}$.

7.3. Preservation of ambient distances: multiplicative error. We will now investigate under which conditions a subgaussian map $\Phi$ on $\mathcal{M}$ preserves pairwise ambient distances up to a small multiplicative error. These maps also approximately preserve several other properties of the manifold, such as its volume and the length and curvature of curves in the manifold (see [5, Section 4.2] for a discussion). Results in this direction were first obtained in [5] and improved upon in [12, 18].

Let us first observe an embedding result for manifolds with a low linearization dimension, which substantially improves [2, Theorem 8].

Corollary 7.4. For every $1 \leq i \leq k$ let $\mathcal{M}_i$ be a smooth submanifold of $\mathbb{R}^n$ with linearization dimension $K_i$. Set $\mathcal{M} = \cup_{i=1}^k \mathcal{M}_i$ and $K = \max_i K_i$. Let $\Phi : \Omega \times \mathbb{R}^n \to \mathbb{R}^m$ be a subgaussian map. Then there is a constant $C > 0$ such that for every $\varepsilon, \eta > 0$ we have $\mathbb{P}(\varepsilon_{\mathcal{M},\Phi} \geq \varepsilon) \leq \eta$ if

$m \geq C\alpha^2\varepsilon^{-2}\max\{\log k + K, \log(\eta^{-1})\}.$

Proof. By [2, Lemma 3], $\mathcal{M}_i$ is contained in an affine subspace of dimension $K_i$ and therefore in a linear subspace of dimension $K_i + 1$. The result is now immediate from the second statement in Corollary 5.4. □

To derive the main results of this section, Theorems 7.7 and 7.9, we apply Corollary 4.10 and estimate the $\gamma_2$-functional of the set $\mathcal{M}_{nc}$ of normalized chords. We use (6), i.e.,

$\gamma_2(\mathcal{M}_{nc}, d_2) \lesssim \int_0^1 \log^{1/2} N(\mathcal{M}_{nc}, d_2, u)\, du,$

and estimate the covering numbers of $\mathcal{M}_{nc}$. The idea behind the covering number estimates, which is already implicit in [12], is to divide the set of normalized chords into two categories.
Firstly, one considers normalized chords corresponding to $x, y \in \mathcal{M}$ which are 'close' in the Euclidean metric. In [12], these chords are called the 'short chords', which should be taken as shorthand for 'the normalized chords corresponding to short chords'. Let

$\mathrm{Ch}(x, y) = \frac{y - x}{\|y - x\|_2}$

denote the normalized chord from $x$ to $y$. Since $\mathrm{Ch}(x, y)$ converges to a unit tangent vector in $T_x\mathcal{M}$ as $y$ approaches $x$, it is clear that this part of the covering number estimate requires good control of the 'intrinsic dimension' of the tangent bundle of $\mathcal{M}$. Secondly, one considers normalized chords corresponding to $x$ and $y$ which are 'far apart' in Euclidean distance (the 'long chords' in the terminology of [12]). These chords can be approximated well by chords $\mathrm{Ch}(a, b)$, where $a$ and $b$ are taken from a covering of $\mathcal{M}$ itself; see Lemma 7.6 below.

To be able to decide whether two points are 'close' or 'far apart', we need to quantify how well we can approximate a normalized chord by a tangent vector. For this purpose we introduce the following parameter.

Definition 7.5. If $\mathcal{M}$ is a $C^1$-submanifold of $\mathbb{R}^n$, then we let $\iota(\mathcal{M})$ be the smallest constant $0 < \iota \leq \infty$ satisfying

$\|\mathrm{Ch}(x_1, x_2) - P_{x_1}\mathrm{Ch}(x_1, x_2)\|_2 \leq \iota\|x_1 - x_2\|_2$ for all $x_1, x_2 \in \mathcal{M}$.

In the proof of Theorem 7.7 we use the following observation, which is implicitly used in [12]. It is readily proven using the triangle and reverse triangle inequalities.

Lemma 7.6. If $x_1, x_2 \in \mathbb{R}^n$ satisfy $\|x_1 - x_2\|_2 \geq t > 0$, then

$\|\mathrm{Ch}(x_1, x_2) - \mathrm{Ch}(y_1, y_2)\|_2 \leq 2t^{-1}(\|x_1 - y_1\|_2 + \|x_2 - y_2\|_2).$
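Since the text leaves the proof of Lemma 7.6 to the reader, here is one way the short computation could go (a sketch, not taken from the paper): write $u = x_2 - x_1$ and $v = y_2 - y_1$, so that $\|u - v\|_2 \leq \|x_1 - y_1\|_2 + \|x_2 - y_2\|_2$, and estimate

```latex
\begin{aligned}
\Big\| \frac{u}{\|u\|_2} - \frac{v}{\|v\|_2} \Big\|_2
&\leq \frac{\|u - v\|_2}{\|u\|_2}
  + \|v\|_2 \Big| \frac{1}{\|u\|_2} - \frac{1}{\|v\|_2} \Big|
  && \text{(triangle inequality)} \\
&= \frac{\|u - v\|_2}{\|u\|_2}
  + \frac{\big|\,\|v\|_2 - \|u\|_2\,\big|}{\|u\|_2}
  \leq \frac{2\|u - v\|_2}{\|u\|_2}
  && \text{(reverse triangle inequality)} \\
&\leq 2t^{-1}\big(\|x_1 - y_1\|_2 + \|x_2 - y_2\|_2\big),
  && \text{since } \|u\|_2 \geq t.
\end{aligned}
```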
Theorem 7.7. Let $\mathcal{M}$ be a $K$-dimensional $C^1$-submanifold of $\mathbb{R}^n$. Let $\Phi : \Omega \times \mathbb{R}^n \to \mathbb{R}^m$ be a subgaussian map. Suppose that $\mathcal{M}$ has covering dimensions $K_2$ and $K_{\mathrm{Fin}}$ with respect to $d_2$ and $d_{\mathrm{Fin}}$, respectively. Then there is a constant $C > 0$ such that for any $0 < \varepsilon, \eta < 1$ we have $\mathbb{P}(\varepsilon_{\mathcal{M},\Phi} \geq \varepsilon) \leq \eta$ provided that

$m \geq C\alpha^2\varepsilon^{-2}\max\{K_2\log_+(\iota(\mathcal{M})\Delta_{d_2}(\mathcal{M})) + K_{\mathrm{Fin}} + K, \log(\eta^{-1})\}.$

Proof. Let $0 < a, b, c, t < \infty$ be parameters to be determined later. Let $N_2$ be an $a$-net of $\mathcal{M}$ with respect to the Euclidean distance $d_2$ and let $N_{\mathrm{Fin}}$ be a $b$-net for $\mathcal{M}$ with respect to $d_{\mathrm{Fin}}$. Finally, for any $y \in N_{\mathrm{Fin}}$ let $N_y$ be a $c$-net for the unit sphere $S_{\mathbb{R}^n}$ in $\mathbb{R}^n$ with respect to the induced semi-metric $d_y(z_1, z_2) := \|P_y(z_1 - z_2)\|_2$.

Suppose first that $\|x_1 - x_2\|_2 > t$. Let $y_1, y_2 \in N_2$ be such that $\|x_1 - y_1\|_2 < a$ and $\|x_2 - y_2\|_2 < a$. By Lemma 7.6,

$\|\mathrm{Ch}(x_1, x_2) - \mathrm{Ch}(y_1, y_2)\|_2 \leq 2t^{-1}(\|x_1 - y_1\|_2 + \|x_2 - y_2\|_2) \leq 4t^{-1}a.$

Suppose now that $\|x_1 - x_2\|_2 \leq t$. Pick $y \in N_{\mathrm{Fin}}$ such that $\|P_{x_1} - P_y\| < b$ and subsequently $z \in N_y$ such that $\|P_y(\mathrm{Ch}(x_1, x_2) - z)\|_2 < c$. Letting $\iota$ be as in Definition 7.5, we find

$\|\mathrm{Ch}(x_1, x_2) - P_yz\|_2 \leq \|\mathrm{Ch}(x_1, x_2) - P_{x_1}\mathrm{Ch}(x_1, x_2)\|_2 + \|P_{x_1}\mathrm{Ch}(x_1, x_2) - P_y\mathrm{Ch}(x_1, x_2)\|_2 + \|P_y(\mathrm{Ch}(x_1, x_2) - z)\|_2 \leq \iota t + b + c.$

Now let $0 < u \leq 1$. From our estimates we see that if we pick $t = u/(3\iota)$, $a = u^2/(12\iota)$, $b = u/3$ and $c = u/3$, then

$\{\mathrm{Ch}(y_1, y_2) : y_1, y_2 \in N_2\} \cup \{P_yz : y \in N_{\mathrm{Fin}},\ z \in N_y\}$

yields a $u$-net for $\mathcal{M}_{nc}$ with respect to $d_2$. Since for every $y \in N_{\mathrm{Fin}}$ and $v > 0$,

$N(S_{\mathbb{R}^n}, d_y, v) \leq N(B_{\mathbb{R}^K}, d_2, v) \leq (1 + 2v^{-1})^K,$

we obtain

(23) $N(\mathcal{M}_{nc}, d_2, u) \leq N^2\Big(\mathcal{M}, d_2, \frac{u^2}{12\iota}\Big) + N\Big(\mathcal{M}, d_{\mathrm{Fin}}, \frac{u}{3}\Big)\Big(1 + \frac{6}{u}\Big)^K.$

By (6),

$\gamma_2(\mathcal{M}_{nc}, d_2) \lesssim \int_0^1 \log^{1/2} N(\mathcal{M}_{nc}, d_2, u)\, du \leq 2\sqrt{2}\int_0^1 \log^{1/2} N\Big(\mathcal{M}, d_2, \frac{u^2}{12\iota}\Big) du + 2\int_0^1 \log^{1/2} N\Big(\mathcal{M}, d_{\mathrm{Fin}}, \frac{u}{3}\Big) du + 2\sqrt{K}\int_0^1 \log^{1/2}\Big(1 + \frac{6}{u}\Big) du$

$\leq 2\sqrt{2K_2}\int_0^1 \log_+^{1/2}\Big(\frac{12\iota\Delta_{d_2}(\mathcal{M})}{u^2}\Big) du + 2\sqrt{K_{\mathrm{Fin}}}\int_0^1 \log^{1/2}\Big(\frac{3}{u}\Big) du + 2\sqrt{K}\int_0^1 \log^{1/2}\Big(1 + \frac{6}{u}\Big) du$

(24) $\lesssim \sqrt{K_2}\log_+^{1/2}(\iota\Delta_{d_2}(\mathcal{M})) + \sqrt{K_{\mathrm{Fin}}} + \sqrt{K},$

where in the final step we used (15). The result now follows from the second statement in Corollary 4.10. □

We conclude by proving a result related to [5, 18], using some tools from [18] which can in turn be traced back to [41]. Recall that the reach $\tau(\mathcal{M})$ of a smooth submanifold $\mathcal{M}$ of $\mathbb{R}^n$ is the smallest $\tau > 0$ such that some point of $\mathbb{R}^n$ at distance $\tau$ from $\mathcal{M}$ has two distinct points of $\mathcal{M}$ as closest points in $\mathcal{M}$.

Lemma 7.8. If $\mathcal{M}$ has reach $\tau$, then $\iota(\mathcal{M}) \leq 2\tau^{-1}$. Moreover, for any $x_1, x_2 \in \mathcal{M}$,

$d_{\mathrm{Fin}}(x_1, x_2) \leq 2\sqrt{2}\,\tau^{-1/2}\|x_1 - x_2\|_2^{1/2}.$

Proof. Suppose first that $\|x_1 - x_2\|_2 \leq \tau/2$. Then,

$\|\mathrm{Ch}(x_1, x_2) - P_{x_1}\mathrm{Ch}(x_1, x_2)\|_2 = \sin(\angle(\mathrm{Ch}(x_1, x_2), P_{x_1}\mathrm{Ch}(x_1, x_2))) = \sin(\angle(x_2 - x_1, P_{x_1}(x_2 - x_1))) \leq \frac{\|x_1 - x_2\|_2}{2\tau},$

where the final inequality follows from [18, Lemma 2]. On the other hand, if $\|x_1 - x_2\|_2 > \tau/2$, then trivially,

$\|\mathrm{Ch}(x_1, x_2) - P_{x_1}\mathrm{Ch}(x_1, x_2)\|_2 \leq 1 \leq 2\tau^{-1}\|x_1 - x_2\|_2.$

The second statement for $x_1, x_2$ satisfying $\|x_1 - x_2\|_2 < \tau/2$ follows from [18, Lemma 9] and is trivial if $\|x_1 - x_2\|_2 \geq \tau/2$. □

Theorem 7.9. Let $\mathcal{M}$ be a $K$-dimensional $C^\infty$-submanifold of $\mathbb{R}^n$ with reach $\tau$ and covering dimension $K_2$ with respect to $d_2$. Let $\Phi : \Omega \times \mathbb{R}^n \to \mathbb{R}^m$ be a subgaussian map. Then there is a constant $C > 0$ such that for any $0 < \varepsilon, \eta < 1$ we have $\mathbb{P}(\varepsilon_{\mathcal{M},\Phi} \geq \varepsilon) \leq \eta$ provided that

$m \geq C\alpha^2\varepsilon^{-2}\max\{K_2\log_+(\tau^{-1}\Delta_{d_2}(\mathcal{M})) + K, \log(\eta^{-1})\}.$

If $\mathcal{M}$ has volume $V_{\mathcal{M}}$, then $\mathbb{P}(\varepsilon_{\mathcal{M},\Phi} \geq \varepsilon) \leq \eta$ if

$m \geq C\alpha^2\varepsilon^{-2}\max\{K\log_+(K\tau^{-1}) + K + \log_+(V_{\mathcal{M}}), \log(\eta^{-1})\}.$

Proof. By the second part of Lemma 7.8, if $N$ is a $v$-net of $\mathcal{M}$ with respect to $d_2$, then it is also a $2\sqrt{2}\,\tau^{-1/2}\sqrt{v}$-net with respect to $d_{\mathrm{Fin}}$. Hence,

$N(\mathcal{M}, d_{\mathrm{Fin}}, 2\sqrt{2}\,\tau^{-1/2}\sqrt{v}) \leq N(\mathcal{M}, d_2, v),$

which implies that for any $u > 0$,

$N(\mathcal{M}, d_{\mathrm{Fin}}, u) \leq N\Big(\mathcal{M}, d_2, \frac{\tau u^2}{8}\Big).$

Combining this with our estimate (23) in the proof of Theorem 7.7, we obtain

(25) $N(\mathcal{M}_{nc}, d_2, u) \leq N^2\Big(\mathcal{M}, d_2, \frac{u^2}{12\iota}\Big) + N\Big(\mathcal{M}, d_{\mathrm{Fin}}, \frac{u}{3}\Big)\Big(1 + \frac{6}{u}\Big)^K \leq N^2\Big(\mathcal{M}, d_2, \frac{\tau u^2}{24}\Big) + N\Big(\mathcal{M}, d_2, \frac{\tau u^2}{72}\Big)\Big(1 + \frac{6}{u}\Big)^K,$

where we applied Lemma 7.8. By a computation similar to (24) we find

$\gamma_2(\mathcal{M}_{nc}, d_2) \lesssim \sqrt{K_2}\log_+^{1/2}(\tau^{-1}\Delta_{d_2}(\mathcal{M})) + \sqrt{K}.$

The first statement now follows from Corollary 4.10. For the second result we use that for any $v \leq \tau/2$ (cf. [18, Lemma 11])

$N(\mathcal{M}, d_2, v) \leq \Big(\frac{v^2}{4} - \frac{v^4}{64\tau^2}\Big)^{-K/2}V_{\mathcal{M}}V_{B_{\mathbb{R}^K}}^{-1} \leq \Big(\frac{v^2}{4} - \frac{v^4}{64\tau^2}\Big)^{-K/2}\Big(\frac{K+2}{4\pi}\Big)^{K/2}V_{\mathcal{M}}.$

We apply this bound to the terms on the far right-hand side of (25) to find, for some absolute constants $c, \tilde{c} > 0$ and $0 < u \leq 1$,

$N(\mathcal{M}_{nc}, d_2, u) \leq V_{\mathcal{M}}\Big(\frac{K+2}{4\pi}\Big)^{K/2}\Big[\Big(\frac{c}{\tau u^2}\Big)^K + \Big(\frac{\tilde{c}}{\tau u^2}\Big)^K\Big(1 + \frac{6}{u}\Big)^K\Big].$

A computation similar to (24) shows that
The first statement in Theorem 7.9 improves upon [5, Theorem 3.1]. The second statement extends the result in [18, Theorem 2] from Gaussian matrices to general subgaussian maps and removes an additional $O(\log(\varepsilon^{-1}))$ dependence of $m$ on $\varepsilon$. This superfluous $O(\log(\varepsilon^{-1}))$ factor seems to be an artefact inherent to the proof in [18], as it occurs in several other papers which use essentially the same method [2, 12, 26].

Acknowledgement

The research for this paper was initiated after a discussion with Justin Romberg about compressive parameter estimation. I would like to thank him for providing me with William Mantzel's PhD thesis [34].

References

[1] D. Achlioptas. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. J. Comput. System Sci., 66(4):671-687, 2003.
[2] P. Agarwal, S. Har-Peled, and H. Yu. Embeddings of surfaces, curves, and moving points in Euclidean space. SIAM J. Comput., 42(2):442-458, 2013.
[3] N. Alon. Problems and results in extremal combinatorics. I. Discrete Math., 273(1-3):31-53, 2003.
[4] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin. A simple proof of the restricted isometry property for random matrices. Constr. Approx., 28(3):253-263, 2008.
[5] R. Baraniuk and M. Wakin. Random projections of smooth manifolds. Found. Comput. Math., 9(1):51-77, 2009.
[6] G. Biau, L. Devroye, and G. Lugosi. On the performance of clustering in Hilbert spaces. IEEE Trans. Inform. Theory, 54(2):781-790, 2008.
[7] E. Bingham and H. Mannila. Random projection in dimensionality reduction: applications to image and text data. In Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pages 245-250. ACM, 2001.
[8] T. Blumensath. Sampling and reconstructing signals from a union of linear subspaces. IEEE Trans. Inform. Theory, 57(7):4660-4671, 2011.
[9] T. Blumensath and M. Davies. Sampling theorems for signals from the union of finite-dimensional linear subspaces. IEEE Trans. Inform. Theory, 55(4):1872-1882, 2009.
[10] E. Candès and Y. Plan. Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements. IEEE Trans. Inform. Theory, 57(4):2342-2359, 2011.
[11] E. Candès and T. Tao. Near-optimal signal recovery from random projections: universal encoding strategies? IEEE Trans. Inform. Theory, 52(12):5406-5425, 2006.
[12] K. Clarkson. Tighter bounds for random projections of manifolds. In Proceedings of the twenty-fourth annual symposium on Computational geometry, pages 39-48. ACM, 2008.
[13] S. Dasgupta. Learning mixtures of Gaussians. In 40th Annual Symposium on Foundations of Computer Science (New York, 1999), pages 634-644. IEEE Computer Soc., Los Alamitos, CA, 1999.
[14] S. Dasgupta and A. Gupta. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures Algorithms, 22(1):60-65, 2003.
[15] S. Dirksen. Tail bounds via generic chaining. arXiv:1309.3522.
[16] D. Donoho. Compressed sensing. IEEE Trans. Inform. Theory, 52(4):1289-1306, 2006.
[17] P. Drineas, M. Mahoney, S. Muthukrishnan, and T. Sarlós. Faster least squares approximation. Numer. Math., 117(2):219-249, 2011.
[18] A. Eftekhari and M. Wakin. New analysis of manifold embeddings and signal recovery from compressive measurements. arXiv:1306.4748.
[19] Y. Eldar and M. Mishali. Robust recovery of signals from a structured union of subspaces. IEEE Trans. Inform. Theory, 55(11):5302-5316, 2009.
[20] S. Foucart and H. Rauhut. A Mathematical Introduction to Compressive Sensing. Birkhäuser, Boston, 2013.
[21] P. Frankl and H. Maehara. The Johnson-Lindenstrauss lemma and the sphericity of some graphs. J. Combin. Theory Ser. B, 44(3):355-362, 1988.
[22] R. Giryes, S. Nam, M. Elad, R. Gribonval, and M. Davies. Greedy-like algorithms for the cosparse analysis model. To appear in Linear Algebra and its Applications, 2013.
[23] Y. Gordon. On Milman's inequality and random subspaces which escape through a mesh in $\mathbb{R}^n$. In Geometric aspects of functional analysis (1986/87), volume 1317 of Lecture Notes in Math., pages 84-106. Springer, Berlin, 1988.
[24] C. Hegde, M. Wakin, and R. Baraniuk. Random projections for manifold learning. In Advances in neural information processing systems, pages 641-648, 2007.
[25] P. Indyk and R. Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, STOC '98, pages 604-613, New York, NY, USA, 1998. ACM.
[26] P. Indyk and A. Naor. Nearest-neighbor-preserving embeddings. ACM Trans. Algorithms, 3(3):Art. 31, 12, 2007.
[27] T. Jayram and D. Woodruff. Optimal bounds for Johnson-Lindenstrauss transforms and streaming problems with subconstant error. ACM Trans. Algorithms, 9(3):Art. 26, 17, 2013.
[28] W. Johnson and J. Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. In Conference in modern analysis and probability (New Haven, Conn., 1982), volume 26 of Contemp. Math., pages 189-206. Amer. Math. Soc., Providence, RI, 1984.
[29] A. Kalai, A. Moitra, and G. Valiant. Disentangling gaussians. Commun. ACM, 55(2):113-120, February 2012.
[30] B. Klartag and S. Mendelson. Empirical processes and random projections. J. Funct. Anal., 225(1):229-245, 2005.
[31] J. Lee. Riemannian manifolds. Springer-Verlag, New York, 1997.
[32] Y. Lu and M. Do. A theory for sampling signals from a union of subspaces. IEEE Trans. Signal Process., 56(6):2334-2345, 2008.
[33] O. Maillard and R. Munos. Linear regression with random projections. J. Mach. Learn. Res., 13:2735-2772, 2012.
[34] W. Mantzel. Parametric estimation of randomly compressed functions. PhD thesis, Georgia Institute of Technology, 2013.
[35] W. Mantzel and J. Romberg. Compressive parameter estimation. Preprint, 2013.
[36] W. Mantzel, J. Romberg, and K. Sabra. Compressive matched-field processing. The Journal of the Acoustical Society of America, 132(1), 2012.
[37] J. Matoušek. On variants of the Johnson-Lindenstrauss lemma. Random Structures Algorithms, 33(2):142-156, 2008.
[38] S. Mendelson, A. Pajor, and N. Tomczak-Jaegermann. Reconstruction and subgaussian operators in asymptotic geometric analysis. Geom. Funct. Anal., 17(4):1248-1282, 2007.
[39] S. Mendelson, A. Pajor, and N. Tomczak-Jaegermann. Uniform uncertainty principle for Bernoulli and subgaussian ensembles. Constr. Approx., 28(3):277-289, 2008.
[40] S. Nam, M. Davies, M. Elad, and R. Gribonval. The cosparse analysis model and algorithms. Appl. Comput. Harmon. Anal., 34(1):30-56, 2013.
[41] P. Niyogi, S. Smale, and S. Weinberger. Finding the homology of submanifolds with high confidence from random samples. Discrete Comput. Geom., 39(1-3):419-441, 2008.
[42] M. Raginsky, R. Willett, Z. Harmany, and R. Marcia. Compressed sensing performance bounds under Poisson noise. IEEE Trans. Signal Process., 58(8):3990-4002, 2010.
[43] H. Rauhut, R. Schneider, and Z. Stojanac. Low-rank tensor recovery via iterative hard thresholding. Preprint.
[44] B. Recht, M. Fazel, and P. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev., 52(3):471-501, 2010.
[45] T. Sarlós. Improved approximation algorithms for large matrices via random projections. In Foundations of Computer Science, 2006. FOCS '06. 47th Annual IEEE Symposium on, pages 143-152, Oct 2006.
[46] L. Schulman. Clustering for edge-cost minimization. In Proceedings of the Thirty-second Annual ACM Symposium on Theory of Computing, STOC '00, pages 547-555, New York, NY, USA, 2000. ACM.
[47] G. Stewart. Error and perturbation bounds for subspaces associated with certain eigenvalue problems. SIAM Rev., 15:727-764, 1973.
[48] M. Talagrand. Regularity of Gaussian processes. Acta Math., 159(1-2):99-149, 1987.
[49] M. Talagrand. Majorizing measures without measures. Ann. Probab., 29(1):411-417, 2001.
[50] M. Talagrand. The generic chaining. Springer-Verlag, Berlin, 2005.
[51] S. Vempala. The random projection method. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 65. American Mathematical Society, Providence, RI, 2004.
[52] N. Verma. Distance preserving embeddings for general $n$-dimensional manifolds. Preprint, 2013.

Universität Bonn, Hausdorff Center for Mathematics, Endenicher Allee 60, 53115 Bonn, Germany
E-mail address: sjoerd.dirksen@hcm.uni-bonn.de