Subexponential-Time Algorithms for Sparse PCA

We study the computational cost of recovering a unit-norm sparse principal component $x \in \mathbb{R}^n$ planted in a random matrix, in either the Wigner or Wishart spiked model (observing either $W + \lambda xx^\top$ with $W$ drawn from the Gaussian orthogonal ensemble, or $N$ independent samples from $\mathcal{N}(0, I_n + \beta xx^\top)$, respectively). Prior work has shown that when the signal-to-noise ratio ($\lambda$ or $\beta\sqrt{N/n}$, respectively) is a small constant and the fraction of nonzero entries in the planted vector is $\|x\|_0/n = \rho$, it is possible to recover $x$ in polynomial time if $\rho \lesssim 1/\sqrt{n}$. While it is possible to recover $x$ in exponential time under the weaker condition $\rho \ll 1$, it is believed that polynomial-time recovery is impossible unless $\rho \lesssim 1/\sqrt{n}$. We investigate the precise amount of time required for recovery in the "possible but hard" regime $1/\sqrt{n} \ll \rho \ll 1$ by exploring the power of subexponential-time algorithms, i.e., algorithms running in time $\exp(n^\delta)$ for some constant $\delta \in (0,1)$. For any $1/\sqrt{n} \ll \rho \ll 1$, we give a recovery algorithm with runtime roughly $\exp(\rho^2 n)$, demonstrating a smooth tradeoff between sparsity and runtime. Our family of algorithms interpolates smoothly between two existing algorithms: the polynomial-time diagonal thresholding algorithm and the $\exp(\rho n)$-time exhaustive search algorithm. Furthermore, by analyzing the low-degree likelihood ratio, we give rigorous evidence suggesting that the tradeoff achieved by our algorithms is optimal.

Authors: Yunzi Ding, Dmitriy Kunisky, Alexander S. Wein, and Afonso S. Bandeira

Yunzi Ding*¹, Dmitriy Kunisky‡¹, Alexander S. Wein§¹, and Afonso S. Bandeira¶²

¹ Department of Mathematics, Courant Institute of Mathematical Sciences, New York University, USA
² Department of Mathematics, ETH Zurich, Switzerland

\* Email: yding@nyu.edu. Partially supported by NSF grant DMS-1712730.
‡ Email: kunisky@cims.nyu.edu.
Partially supported by NSF grants DMS-1712730 and DMS-1719545.
§ Email: awein@cims.nyu.edu. Partially supported by NSF grant DMS-1712730 and by the Simons Collaboration on Algorithms and Geometry.
¶ Email: bandeira@math.ethz.ch. Most of this work was done while ASB was with the Department of Mathematics at the Courant Institute of Mathematical Sciences, and the Center for Data Science, at New York University; and partially supported by NSF grants DMS-1712730 and DMS-1719545, and by a grant from the Sloan Foundation.

Contents

1 Introduction
1.1 Spiked Matrix Models
1.2 Principal Component Analysis
1.3 Sparse PCA
1.4 Our Contributions
1.5 Background on the Low-Degree Likelihood Ratio
2 Main Results
2.1 The Wishart Model
2.2 The Wigner Model
3 Proofs for Subexponential-Time Algorithms
3.1 The Wishart Model
3.2 The Wigner Model
4 Proofs for Low-Degree Likelihood Ratio Bounds
4.1 Low-Degree Likelihood Ratio for Spiked Models
4.2 Introduction and Estimates of $A_d$
4.3 The Wishart Model
4.4 The Wigner Model
Acknowledgments
References
A Chernoff Bounds

1 Introduction

1.1 Spiked Matrix Models

Since the foundational work of Johnstone [Joh01], spiked random matrix ensembles have been widely studied throughout random matrix theory, statistics, and theoretical data science. These models describe a deformation of one of several canonical random matrix distributions by a rank-one perturbation or "spike," intended to capture a signal corrupted by noise. Spectral properties of these spiked models have received much attention in random matrix theory [BBP05, BS06, Pau04, Péc06, FP07, CDMF09, BGN11, PRS13, KY13], leading to a theoretical understanding of methods based on principal component analysis (PCA) for recovering the direction of the rank-one spike [Joh01, JL04, Pau07, Nad08, JL09]. Spiked matrix models have also found more specific applications to problems such as community detection in graphs (see, e.g., [McS01, Vu18, DAM16], or [Moo17, Abb17] for surveys) and synchronization over groups (see, e.g., [Sin11, SS11, JMRT16, PWBM16, PWBM18a]).

We will study two classical variants of the spiked matrix model: the Wigner and Wishart models. The models differ in how noise is applied to the signal vector. In either case, let $x \in \mathbb{R}^n$ be the signal vector (or "spike"). We will either have $x$ deterministic with $\|x\|_2 = 1$, or $x \in \mathbb{R}^n$ random for each $n$ with $\|x\|_2 \to 1$ in probability as $n \to \infty$.

• Spiked Wigner Model. Let $\lambda > 0$. Observe $Y = W + \lambda xx^\top$, where $W \in \mathbb{R}^{n \times n}$ is drawn from the Gaussian orthogonal ensemble $\mathrm{GOE}(n)$, i.e., $W$ is symmetric with entries distributed independently as $W_{ii} \sim \mathcal{N}(0, 2/n)$ for all $1 \le i \le n$, and $W_{ij} = W_{ji} \sim \mathcal{N}(0, 1/n)$ for all $1 \le i < j \le n$.

• Spiked Wishart Model. Let $\beta > 0$ and $N \in \mathbb{N}$. Observe $N$ samples $y^{(1)}, y^{(2)}, \dots, y^{(N)} \in \mathbb{R}^n$ drawn independently from $\mathcal{N}(0, I_n + \beta xx^\top)$.
The ratio of dimension to number of samples is denoted $\gamma := n/N$. We will focus on the high-dimensional regime where $\gamma$ converges to a constant as $n \to \infty$. We let $Y$ denote the sample covariance matrix $Y = \frac{1}{N} \sum_{i=1}^N y^{(i)} y^{(i)\top}$.

Each of these planted models has a corresponding null model, given by sampling from the planted model with either $\lambda = 0$ (Wigner) or $\beta = 0$ (Wishart). We are interested in the computational feasibility of the following two statistical tasks, to be performed given a realization of the data (either $Y$ or $\{y^{(1)}, \dots, y^{(N)}\}$) drawn from either the null or planted distribution.

• Detection. Perform a simple hypothesis test between the planted model and null model. We say that strong detection is achieved by a statistical test if both the type-I and type-II errors tend to 0 as $n \to \infty$.

• Recovery. Estimate the spike $x$ given data drawn from the planted model. We say that a unit-norm estimator $\hat{x} \in \mathbb{R}^n$ achieves weak recovery if $\langle \hat{x}, x \rangle^2$ remains bounded away from zero with probability tending to 1 as $n \to \infty$.¹ (Note that we cannot hope to distinguish between the planted models with signals $x$ and $-x$.)

For high-dimensional inference problems such as the spiked Wigner and Wishart models, these two tasks typically share the same computational profile: with a given computational time budget, strong detection and weak recovery are possible in the same regions of parameter space.

¹ We will also consider stronger notions of recovery: strong recovery is $\langle \hat{x}, x \rangle^2 \to 1$ as $n \to \infty$, and exact recovery is $\hat{x} = x$ with probability $1 - o(1)$.

1.2 Principal Component Analysis

Simple algorithms for both detection and recovery are given by principal component analysis (PCA) of the matrix $Y$.
For detection, one computes and thresholds the maximum eigenvalue $\lambda_{\max}(Y)$ of $Y$, while for recovery one estimates $x$ using the leading eigenvector $v_{\max}(Y)$. Both the spiked Wishart and Wigner models are known to exhibit a sharp transition in their top eigenvalue as the model parameters vary. For the Wishart model, the celebrated "BBP transition" of Baik, Ben Arous, and Péché [BBP05, BS06] states that the maximum eigenvalue of the sample covariance matrix $Y$ emerges from the Marchenko–Pastur-distributed bulk if and only if $\beta^2 > \gamma$. Similarly, in the Wigner model, the maximum eigenvalue of $Y$ emerges from the semicircular bulk if and only if $\lambda > 1$ [FP07]. More formally, the following statements hold.

Theorem 1.1 ([FP07, BGN11]). Consider the spiked Wigner model $Y = W + \lambda xx^\top$ with $\|x\| = 1$ and $\lambda > 0$ fixed. Then as $n \to \infty$,
• if $\lambda \le 1$, $\lambda_{\max}(Y) \to 2$ almost surely, and $\langle v_{\max}(Y), x \rangle^2 \to 0$ almost surely (where $v_{\max}$ denotes the leading eigenvector);
• if $\lambda > 1$, $\lambda_{\max}(Y) \to \lambda + \lambda^{-1} > 2$ almost surely, and $\langle v_{\max}(Y), x \rangle^2 \to 1 - \lambda^{-2} > 0$ almost surely.

Theorem 1.2 ([BBP05, BS06, Pau04]). Let $Y$ denote the sample covariance matrix in the spiked Wishart model with $\|x\| = 1$, $\beta > 0$ fixed, and $N = N(n)$ such that $\gamma := n/N$ converges to a constant $\bar{\gamma} > 0$ as $n \to \infty$. Then as $n \to \infty$,
• if $\beta^2 \le \bar{\gamma}$, $\lambda_{\max}(Y) \to (1 + \sqrt{\bar{\gamma}})^2$ almost surely, and $\langle v_{\max}(Y), x \rangle^2 \to 0$ almost surely (where $v_{\max}$ denotes the leading eigenvector);
• if $\beta^2 > \bar{\gamma}$, $\lambda_{\max}(Y) \to (1 + \beta)(1 + \bar{\gamma}/\beta) > (1 + \sqrt{\bar{\gamma}})^2$ almost surely, and $\langle v_{\max}(Y), x \rangle^2 \to \frac{1 - \bar{\gamma}/\beta^2}{1 + \bar{\gamma}/\beta} > 0$ almost surely.

We define the signal-to-noise ratio (SNR) $\hat{\lambda}$ by
$$\hat{\lambda} := \begin{cases} \lambda & \text{in the Wigner model,} \\ \beta/\sqrt{\gamma} & \text{in the Wishart model.} \end{cases} \tag{1}$$
Theorems 1.1 and 1.2 then characterize the performance of PCA in terms of $\hat{\lambda}$.
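As a quick numerical illustration of Theorem 1.1 (not part of the paper's analysis; numpy is assumed, and the problem size and the spherical choice of $x$ are our own illustrative assumptions), one can simulate the spiked Wigner model above the threshold and compare against the predicted limits:

```python
import numpy as np

# Illustrative simulation of Theorem 1.1 (spiked Wigner model).
rng = np.random.default_rng(0)
n, lam = 1500, 2.0  # lam > 1: above the spectral threshold

x = rng.standard_normal(n)
x /= np.linalg.norm(x)  # unit-norm spike (spherical choice, for illustration)

# W ~ GOE(n): symmetric, off-diagonal variance 1/n, diagonal variance 2/n.
G = rng.standard_normal((n, n)) / np.sqrt(n)
W = (G + G.T) / np.sqrt(2)
Y = W + lam * np.outer(x, x)

evals, evecs = np.linalg.eigh(Y)       # eigenvalues in ascending order
lam_max = evals[-1]
overlap2 = (evecs[:, -1] @ x) ** 2

# Theorem 1.1 predicts lam_max -> lam + 1/lam = 2.5
# and overlap2 -> 1 - lam^(-2) = 0.75 as n grows.
print(lam_max, overlap2)
```

At $n = 1500$ the empirical values land close to the predicted limits; below the threshold ($\lambda \le 1$) the same experiment shows $\lambda_{\max}(Y) \approx 2$ and a vanishing overlap.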
Namely, thresholding the largest eigenvalue of $Y$ succeeds at strong detection when $\hat{\lambda} > 1$ and fails when $\hat{\lambda} \le 1$; similarly, the top eigenvector succeeds at weak recovery when $\hat{\lambda} > 1$ and fails when $\hat{\lambda} \le 1$. For some distributions of $x$, including the spherical prior ($x$ drawn uniformly from the unit sphere) and the Rademacher prior (each entry $x_i$ drawn i.i.d. from $\mathrm{Unif}(\pm 1/\sqrt{n})$), it is known that the PCA threshold is optimal, in the sense that strong detection and weak recovery are statistically impossible (for any test or estimator, regardless of computational cost) when $\hat{\lambda} < 1$ [OMH13, MRZ15, DAM16, BMV+18, PWBM18b].

1.3 Sparse PCA

Sparse PCA, a direction initiated by Johnstone and Lu [JL04, JL09], seeks to improve performance when the planted vector is known to be sparse in a given basis. While various sparsity assumptions have been considered, a simple and illustrative one is to take $x$ drawn from the sparse Rademacher prior, denoted $\mathcal{X}_n^\rho$, in which each entry $x_i$ is distributed independently as
$$x_i = \begin{cases} \frac{1}{\sqrt{\rho n}} & \text{with probability } \frac{\rho}{2}, \\ -\frac{1}{\sqrt{\rho n}} & \text{with probability } \frac{\rho}{2}, \\ 0 & \text{with probability } 1 - \rho \end{cases} \tag{2}$$
for some known sparsity parameter $\rho \in (0, 1]$, which may depend on $n$.² The normalization ensures $\|x\| \to 1$ in probability as $n \to \infty$.

Consider the Wishart model (we will see that the Wigner model shares essentially the same behavior) in the regime $\hat{\lambda} = \Theta(1)$ with $\hat{\lambda} < 1$ (so that ordinary PCA fails at weak recovery). The simple diagonal thresholding algorithm proposed by [JL09] estimates the support of $x$ by identifying the largest diagonal entries of $Y$. Under the condition³ $\rho \lesssim 1/\sqrt{n \log n}$, this has been shown [AW08] to achieve exact support recovery, i.e., it exactly recovers the support of $x$ with probability tending to 1 as $n \to \infty$ (and once the support is known, it is straightforward to recover $x$).
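Diagonal thresholding is simple enough to sketch in a few lines. The following toy simulation is our own illustration (numpy assumed): all parameter values, and the sampling identity $y = g + \sqrt{\beta}\, z\, x$ for drawing from $\mathcal{N}(0, I_n + \beta xx^\top)$, are illustrative choices, with the SNR set far above threshold so that support recovery clearly succeeds.

```python
import numpy as np

# Toy sketch of diagonal thresholding [JL09]: the support shows up in the
# k largest diagonal entries of Y, since E[Y_jj] = 1 + beta * x_j^2.
rng = np.random.default_rng(1)
n, k, beta, N = 200, 5, 8.0, 4000  # k = rho*n nonzeros; illustrative parameters

support = rng.choice(n, size=k, replace=False)
x = np.zeros(n)
x[support] = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)

# Each row g + sqrt(beta)*z*x (g ~ N(0, I_n), z ~ N(0,1) independent)
# has covariance I_n + beta * x x^T.
G = rng.standard_normal((N, n))
z = rng.standard_normal((N, 1))
samples = G + np.sqrt(beta) * z * x
Y = samples.T @ samples / N            # sample covariance matrix

est_support = np.argsort(np.diag(Y))[-k:]  # indices of the k largest diagonal entries
print(sorted(est_support))
```

On the support, $\mathbb{E}[Y_{jj}] = 1 + \beta/k$, while off the support $\mathbb{E}[Y_{jj}] = 1$; recovery works precisely when this gap beats the $\sim N^{-1/2}$ fluctuations of the diagonal entries.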
The more sophisticated covariance thresholding algorithm proposed by [KNV15] has been shown [DM14b] to achieve exact support recovery when $\rho \lesssim 1/\sqrt{n}$. On the other hand, given unlimited computational power, an exhaustive search over all possible support sets of size $\rho n$ achieves exact support recovery under the much weaker assumption $\rho \lesssim 1/\log n$ [PJ12, VL12, CMW13]. Similarly, strong detection and weak recovery are statistically possible even when $\rho$ is a sufficiently small constant (depending on $\beta$, $\gamma$) [BMV+18, PWBM18b], and the precise critical constant $\rho^*(\beta, \gamma)$ is given by the replica formula from statistical physics (see, e.g., [LKZ15a, LKZ15b, KXZ16, DMK+16, LM19, Mio17, EKJ17, EK18, EKJ18], or [Mio18] for a survey). However, no polynomial-time algorithm is known to succeed (for any reasonable notion of success) when $\rho \gg 1/\sqrt{n}$, despite extensive work on algorithms for sparse PCA [dGJL05, ZHT06, MWA06, dBG08, AW08, WTH09, BR13b, DM14a, KNV15, BPP18].

In fact, a growing body of theoretical evidence suggests that no polynomial-time algorithm can succeed when $\rho \gg 1/\sqrt{n}$ [BR13a, CMW13, MW15, WBS16, HKP+17, BBH18, BB19]. Such evidence takes the form of reductions [BR13a, WBS16, BBH18, BB19] from the planted clique problem (which is widely conjectured to be hard in certain regimes [Jer92, DM15, MPW15, BHK+19]), as well as lower bounds against the sum-of-squares hierarchy of convex relaxations [MW15, HKP+17]. Thus, we expect sparse PCA to exhibit a large "possible but hard" regime when $1/\sqrt{n} \ll \rho \ll 1$. Statistical-to-computational gaps of this kind are believed to occur and have been studied extensively in many other statistical inference problems, such as community detection in the stochastic block model [DKMZ11b, DKMZ11a] and tensor PCA [RM14, HSS15, HKP+17, ZX18, BB20].
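The covariance thresholding idea can be rendered schematically as follows. This is our own simplified sketch, not the exact procedure analyzed in [DM14b]; numpy is assumed, and the threshold $\tau$ and all problem sizes are illustrative choices. The point is the scale separation: pure-noise entries of $Y$ have size $\sim 1/\sqrt{N}$, while signal entries have size $\beta/(\rho n)$.

```python
import numpy as np

# Schematic covariance-thresholding sketch (our simplification of [KNV15]):
# soft-threshold the entries of Y - I to suppress noise-scale entries,
# then read the support off the leading eigenvector.
rng = np.random.default_rng(2)
n, k, beta, N = 200, 10, 6.0, 2000  # k = rho*n; illustrative parameters

support = rng.choice(n, size=k, replace=False)
x = np.zeros(n)
x[support] = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)

z = rng.standard_normal((N, 1))
samples = rng.standard_normal((N, n)) + np.sqrt(beta) * z * x
Y = samples.T @ samples / N

tau = 4.0 / np.sqrt(N)                   # noise entries of Y have scale ~ 1/sqrt(N)
M = Y - np.eye(n)
M = np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)  # entrywise soft threshold

v = np.linalg.eigh(M)[1][:, -1]          # leading eigenvector of thresholded matrix
est_support = np.argsort(np.abs(v))[-k:]
print(sorted(est_support))
```

After thresholding, the surviving matrix is close to a rank-one block supported on $\mathrm{supp}(x)$, so its top eigenvector localizes on the support.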
1.4 Our Contributions

In this paper, we investigate precisely how hard the "hard" region ($1/\sqrt{n} \ll \rho \ll 1$) is in sparse PCA. We consider subexponential-time algorithms, i.e., algorithms with runtime $\exp(n^{\delta + o(1)})$ for fixed $\delta \in (0, 1)$. We show a smooth tradeoff between sparsity (governed by $\rho$) and runtime (governed by $\delta$). More specifically, our results (for both the Wishart and Wigner models) are as follows.

• Algorithms. For any $\delta \in (0, 1)$, we give an algorithm with runtime $\exp(n^{\delta + o(1)})$ that achieves exact support recovery, provided $\rho \ll n^{(\delta - 1)/2}$.

• Lower bounds. Through an analysis of the low-degree likelihood ratio (see Section 1.5), we give formal evidence suggesting that the above condition is essentially tight, in the sense that no algorithm of runtime $\exp(n^{\delta + o(1)})$ can succeed when $\rho \gg n^{(\delta - 1)/2}$. (Our results are sharper than the sum-of-squares lower bounds of [HKP+17] in that we pin down the precise constant $\delta$.)

² We analyze our algorithms for a more general set of assumptions on $x$; see Definition 2.1.
³ We use $A \lesssim B$ to denote $A \le CB$ for some constant $C$, and use $A \ll B$ to denote $A \le B/\mathrm{polylog}(n)$.

Our algorithm involves exhaustive search over subsets of $[n]$ of cardinality $\ell \approx n^\delta$. The case $\ell = 1$ is diagonal thresholding (which is polynomial-time and succeeds when $\rho \lesssim 1/\sqrt{n \log n}$) and the case $\ell = \rho n$ is exhaustive search over all possible spikes (which requires time $\exp(\rho n^{1 + o(1)})$ and succeeds when $\rho \lesssim 1/\log n$). As $\ell$ varies in the range $1 \le \ell \le \rho n$, our algorithm interpolates smoothly between these two extremes. For a given $\rho$ in the range $1/\sqrt{n} \ll \rho \ll 1$, the smallest admissible choice of $\ell$ is roughly $\rho^2 n$, yielding an algorithm of runtime $\exp(\rho^2 n^{1 + o(1)})$.

Our results extend to the case $\hat{\lambda} \ll 1$, e.g., $\hat{\lambda} = n^{-\alpha}$ for some constant $\alpha > 0$.
In this case, provided $\rho \ll \hat{\lambda}^2$ (which is information-theoretically necessary [PJ12, VL12, CMW13]), there is an $\exp(n^{\delta + o(1)})$-time algorithm if $\rho \ll \hat{\lambda} n^{(\delta - 1)/2}$, and the low-degree likelihood ratio again suggests that this is optimal. In other words, for a given $\rho$ in the range $\hat{\lambda}/\sqrt{n} \ll \rho \ll \hat{\lambda}^2$, we can solve sparse PCA in time $\exp(\hat{\lambda}^{-2} \rho^2 n^{1 + o(1)})$.

The analysis of our algorithm applies not just to the sparse Rademacher spike prior, but also to a weaker set of assumptions on the spike that do not require all of the nonzero entries to have the same magnitude. Our algorithm is guaranteed (with high probability) to exactly recover both the support of $x$ and the signs of the nonzero entries of $x$. Once the support is known, it is straightforward to estimate $x$ via the leading eigenvector of the appropriate submatrix.

In independent work [HSV19], a different algorithm for sparse PCA was proposed and shown to have essentially the same subexponential runtime as ours. Also, prior work [KZ14] gave a subexponential-time algorithm for certifying the restricted isometry property that is somewhat similar in spirit to our algorithm for sparse PCA.

Remark 1.3. Certain problems besides sparse PCA have a similar smooth tradeoff between subexponential runtime requirements and statistical power. These include refuting random constraint satisfaction problems [RRS17] and tensor PCA [BGG+16, BGL16, WEM19]. In contrast, other problems have a sharp threshold at which they transition from being solvable in polynomial time to (conjecturally) requiring essentially exponential time: $\exp(n^{1 - o(1)})$.
Examples of this behavior occur at the spectral transition at $\hat{\lambda} = 1$ in the spiked Wishart and Wigner matrix models (see [BKW19, KWB19]) as well as at the Kesten–Stigum threshold in the stochastic block model (see [DKMZ11b, DKMZ11a, HS17, Hop18]).

1.5 Background on the Low-Degree Likelihood Ratio

A sequence of recent work on the sum-of-squares hierarchy [BHK+19, HS17, HKP+17, Hop18] has led to the development of a remarkably simple method for predicting the amount of computation time required to solve statistical tasks. This method, which we will refer to as the low-degree method, is based on analyzing the so-called low-degree likelihood ratio, and is believed to be intimately connected to the power of sum-of-squares (although formal implications have not been established). We now give an overview of this method; see [Hop18, KWB19] for more details.

We will consider the problem of distinguishing two simple hypotheses $\mathbb{P}_n$ and $\mathbb{Q}_n$, which are probability distributions on some domain $\Omega_n = \mathbb{R}^{d(n)}$ with $d(n) = \mathrm{poly}(n)$. The idea of the low-degree method is to explore whether there is a low-degree polynomial $f_n : \Omega_n \to \mathbb{R}$ that can distinguish $\mathbb{P}_n$ from $\mathbb{Q}_n$.

We call $\mathbb{Q}_n$ the "null" distribution, which for us will always be i.i.d. Gaussian (see Definitions 2.4 and 2.17). $\mathbb{Q}_n$ induces an inner product on $L^2$ functions $f : \Omega_n \to \mathbb{R}$ given by $\langle f, g \rangle_{L^2(\mathbb{Q}_n)} = \mathbb{E}_{Y \sim \mathbb{Q}_n}[f(Y) g(Y)]$, and a norm $\|f\|_{L^2(\mathbb{Q}_n)}^2 = \langle f, f \rangle_{L^2(\mathbb{Q}_n)}$. For $D \in \mathbb{N}$, let $\mathbb{R}[Y]_{\le D}$ denote the multivariate polynomials $\Omega_n \to \mathbb{R}$ of degree at most $D$. For $f : \Omega_n \to \mathbb{R}$, let $f^{\le D}$ denote the orthogonal projection (with respect to $\langle \cdot, \cdot \rangle_{L^2(\mathbb{Q}_n)}$) of $f$ onto $\mathbb{R}[Y]_{\le D}$. The following result then relates the distinguishing power of low-degree polynomials (in a certain $L^2$ sense) to the low-degree likelihood ratio.
Theorem 1.4 ([HS17, HKP+17]). Let $\mathbb{P}$ and $\mathbb{Q}$ be probability distributions on $\Omega = \mathbb{R}^d$. Suppose $\mathbb{P}$ is absolutely continuous with respect to $\mathbb{Q}$, so that the likelihood ratio $L = \frac{d\mathbb{P}}{d\mathbb{Q}}$ is defined. Then
$$\max_{f \in \mathbb{R}[Y]_{\le D} \setminus \{0\}} \frac{\mathbb{E}_{Y \sim \mathbb{P}}[f(Y)]}{\sqrt{\mathbb{E}_{Y \sim \mathbb{Q}}[f(Y)^2]}} = \|L^{\le D}\|_{L^2(\mathbb{Q})}. \tag{3}$$

(The proof is straightforward: the fraction on the left can be written as $\langle f, L \rangle_{L^2(\mathbb{Q}_n)} / \|f\|_{L^2(\mathbb{Q}_n)}$, so the maximizer is $f = L^{\le D}$.) The left-hand side of (3) is a heuristic measure of how well degree-$D$ polynomials can distinguish $\mathbb{P}$ from $\mathbb{Q}$: if this quantity is $O(1)$ as $n \to \infty$, this suggests that no degree-$D$ polynomial can achieve strong detection (and indeed this is made formal by Theorem 4.3 of [KWB19]). The right-hand side of (3) is the norm of the low-degree likelihood ratio (LDLR), which can be computed or bounded in many cases, making this heuristic a practical tool for predicting computational feasibility of hypothesis testing.

The key assumption underlying the low-degree method is that, for many natural distributions $\mathbb{P}_n$ and $\mathbb{Q}_n$, degree-$D$ polynomials are as powerful as algorithms of runtime $n^{\tilde{\Theta}(D)}$, where $\tilde{\Theta}$ hides factors of $\log n$. This is captured by the following informal conjecture, which is based on [HS17, HKP+17, Hop18]; in particular, see Hypothesis 2.1.5 of [Hop18].

Conjecture 1.5 (Informal). Suppose $t : \mathbb{N} \to \mathbb{N}$. For "nice" sequences of distributions $\mathbb{P}_n$ and $\mathbb{Q}_n$, if $\|L_n^{\le D(n)}\|_{L^2(\mathbb{Q}_n)}$ remains bounded as $n \to \infty$ whenever $D(n) \le t(n) \cdot \mathrm{polylog}(n)$, then there exists no sequence of functions $f_n : \Omega_n \to \{\mathrm{p}, \mathrm{q}\}$ with $f_n$ computable in time $n^{t(n)}$ that strongly distinguishes $\mathbb{P}_n$ and $\mathbb{Q}_n$, i.e., that satisfies
$$\lim_{n \to \infty} \mathbb{Q}_n[f_n(Y) = \mathrm{q}] = \lim_{n \to \infty} \mathbb{P}_n[f_n(Y) = \mathrm{p}] = 1. \tag{4}$$

On a finer scale, it is conjectured [HS17, Hop18] that if for some $\varepsilon > 0$ we have $D(n) \ge \log^{1+\varepsilon}(n)$ and $\|L_n^{\le D(n)}\|_{L^2(\mathbb{Q}_n)} = O(1)$, then no polynomial-time algorithm can strongly distinguish $\mathbb{P}_n$ from $\mathbb{Q}_n$. In practice, it seems that the converse of Conjecture 1.5 often holds as well, in the sense that if $\|L_n^{\le D(n)}\|_{L^2(\mathbb{Q}_n)} = \omega(1)$ for some $D(n) = t(n)/\mathrm{polylog}(n)$, then there is an $n^{t(n)}$-time distinguishing algorithm (however, see Remark 2.16 for one caveat).

Calculations with the LDLR have been carried out for problems such as community detection [HS17, Hop18], planted clique [BHK+19, Hop18], the spiked Wishart model [BKW19], the spiked Wigner model [KWB19], and tensor PCA [HKP+17, Hop18, KWB19] (tensor PCA exhibits a subexponential-time tradeoff similar to sparse PCA; see [KWB19]). In all of the above cases, the low-degree predictions coincide with widely-conjectured statistical-versus-computational tradeoffs. Various leading algorithmic approaches can be approximated by low-degree polynomials and are thus ruled out by low-degree lower bounds of the form $\|L_n^{\le D(n)}\|_{L^2(\mathbb{Q}_n)} = O(1)$. These approaches include a general class of spectral methods (see Theorem 4.4 of [KWB19]) as well as the algorithms that we present in this paper (see Remark 2.20). The low-degree predictions are also conjectured to coincide with the power of the sum-of-squares hierarchy and are in particular connected to the pseudo-calibration approach [BHK+19]; see [HKP+17, RSS18, Hop18]. We refer the reader to Section 4 of [KWB19] for further discussion of the implications (both formal and conjectural) of low-degree lower bounds.

Conjecture 1.5 is informal in the sense that we have not specified the meaning of "nice" $\mathbb{P}_n$ and $\mathbb{Q}_n$.
Roughly speaking, highly-symmetric high-dimensional problems are considered "nice" so long as $\mathbb{P}_n$ and $\mathbb{Q}_n$ have at least a small amount of noise, in order to rule out brittle high-degree algorithms such as Gaussian elimination. (In particular, we consider spiked Wigner and Wishart to be "nice.") Conjecture 2.2.4 of [Hop18] is one formal variant of the low-degree conjecture, although it uses the more refined notion of coordinate degree and so does not apply to the calculations in this paper. We remark that if $\|L_n\|_{L^2(\mathbb{Q}_n)} = O(1)$ (the $D = \infty$ case), then it is statistically impossible to strongly distinguish $\mathbb{P}_n$ and $\mathbb{Q}_n$; this is a commonly-used second moment method (see, e.g., [MRZ15, BMV+18, PWBM18b]) of which Conjecture 1.5 is a computationally-bounded analogue.

In this paper we give tight computational lower bounds for sparse PCA, conditional on Conjecture 1.5. Alternatively, one can view the results of this paper as a "stress test" for Conjecture 1.5: we show that Conjecture 1.5 predicts a certain statistical-versus-computational tradeoff, and this indeed matches the best algorithms that we know.

Organization. The remainder of the paper is organized as follows. In Section 2, we present our subexponential-time algorithms and our lower bounds based on the low-degree likelihood ratio. In Section 3, we give proofs for the correctness of our algorithms. In Section 4, we give proofs for our analysis of the low-degree likelihood ratio.

Notation. We use standard asymptotic notation $O(\cdot)$, $\Omega(\cdot)$, $\Theta(\cdot)$, always pertaining to the limit $n \to \infty$. We also use $\tilde{O}(B)$ to mean $O(B \cdot \mathrm{polylog}(n))$ and $\tilde{\Omega}(B)$ to mean $\Omega(B/\mathrm{polylog}(n))$. Also recall that $f(n) = o(g(n))$ means $f(n)/g(n) \to 0$ as $n \to \infty$, and $f(n) = \omega(g(n))$ means $f(n)/g(n) \to \infty$ as $n \to \infty$. An event occurs with high probability if it occurs with probability $1 - o(1)$.
We sometimes use the shorthand $A \lesssim B$ to mean $A \le CB$ for an absolute constant $C$, and the shorthand $A \ll B$ to mean $A \le B/\mathrm{polylog}(n)$.

2 Main Results

In the analysis of our algorithms, we consider the spiked Wishart and Wigner models with signal $x$ satisfying the following properties.

Definition 2.1. For $\rho \in (0, 1]$ and $A \ge 1$, a vector $x \in \mathbb{R}^n$ is called $(\rho, A)$-sparse if
• $\|x\|_2 = 1$ and $\|x\|_0 = \rho n$, and
• for any $i \in \mathrm{supp}(x)$, $\frac{1}{A\sqrt{\rho n}} \le |x_i| \le \frac{A}{\sqrt{\rho n}}$.

Here we have used the standard notations $\mathrm{supp}(x) = \{i \in [n] : x_i \ne 0\}$ and $\|x\|_0 = |\mathrm{supp}(x)|$. We assume that $\rho$ (which may depend on $n$) is chosen so that $\rho n$ is an integer.

Remark 2.2. A lower bound on $|x_i|$ is essential for exact support recovery, since we cannot hope to distinguish tiny nonzero entries of $x$ from zero entries. The upper bound on $|x_i|$ is a technical condition that is likely not essential, and is only used for recovery in the Wishart model (Theorem 2.10).

In our calculations of the low-degree likelihood ratio, we instead assume the signal $x$ is drawn from the sparse Rademacher distribution, defined as follows.

Definition 2.3. The sparse Rademacher prior $\mathcal{X}_n^\rho$ with sparsity $\rho \in (0, 1]$ is the distribution on $\mathbb{R}^n$ whereby $x \sim \mathcal{X}_n^\rho$ has i.i.d. entries distributed as
$$x_i = \begin{cases} +1/\sqrt{\rho n} & \text{with probability } \rho/2, \\ -1/\sqrt{\rho n} & \text{with probability } \rho/2, \\ 0 & \text{with probability } 1 - \rho. \end{cases} \tag{5}$$
Note that $x \sim \mathcal{X}_n^\rho$ has $\|x\|_2 \to 1$ in probability as $n \to \infty$.

2.1 The Wishart Model

We first present our results for the Wishart model. Our algorithms and results for the Wigner model are essentially identical and can be found in Section 2.2.

Definition 2.4 (Spiked Wishart model). The spiked Wishart model with parameters $n, N \in \mathbb{N}_+$, $\beta \ge 0$, and planted signal $x \in \mathbb{R}^n$ is defined as follows.
• Under $\mathbb{P}_n = \mathbb{P}_{n,N,\beta}$, we observe $N$ independent samples $y^{(1)}, \dots, y^{(N)} \sim \mathcal{N}(0, I_n + \beta xx^\top)$.
• Under $\mathbb{Q}_n = \mathbb{Q}_{n,N}$, we observe $N$ independent samples $y^{(1)}, \dots, y^{(N)} \sim \mathcal{N}(0, I_n)$.

We will sometimes specify a prior $\mathcal{X}_n$ for $x$, in which case $\mathbb{P}_n$ first draws $x \sim \mathcal{X}_n$ and then draws $y^{(1)}, \dots, y^{(N)}$ as above.

Detection. We first consider the detection problem, where the goal is to determine whether the given data $\{y^{(i)}\}$ was drawn from $\mathbb{P}_n$ or $\mathbb{Q}_n$.

Algorithm 1: Detection in the spiked Wishart model
Input: Data $\{y^{(i)}\}_{1 \le i \le N}$, parameters $\rho \in (0, 1]$, $\beta \ge 0$, $A \ge 1$, $\ell \in \mathbb{N}_+$
1: Compute the sample covariance matrix: $Y \leftarrow \frac{1}{N} \sum_{i=1}^N y^{(i)} y^{(i)\top}$
2: Specify the search set: $\mathcal{I}_{n,\ell} \leftarrow \{v \in \{-1, 0, 1\}^n : \|v\|_0 = \ell\}$
3: Compute the test statistic: $T \leftarrow \max_{v \in \mathcal{I}_{n,\ell}} v^\top Y v$
4: Compute the threshold: $T^* \leftarrow \ell \left(1 + \frac{\beta \ell}{2 A^2 \rho n}\right)$
5: if $T \ge T^*$ then
6:   return p
7: else
8:   return q
9: end if

The detection algorithm is motivated by the fact that $v^\top Y v = \frac{1}{N} \sum_{i=1}^N \langle v, y^{(i)} \rangle^2$. Under the planted model $\mathbb{P}_n$, $y^{(i)} \sim \mathcal{N}(0, I_n + \beta xx^\top)$ and thus $\langle v, y^{(i)} \rangle \sim \mathcal{N}(0, \ell + \beta \langle v, x \rangle^2)$ for any fixed $v \in \mathcal{I}_{n,\ell}$; as a result, if $v$ correctly "guesses" $\ell$ entries of $x$ with correct signs (up to a global flip), then the contribution of $\langle v, x \rangle^2$ to the variance of $\langle v, y^{(i)} \rangle$ will cause $v^\top Y v$ to be large.

Remark 2.5 (Runtime). The runtime of Algorithm 1 is dominated by exhaustive search over $\mathcal{I}_{n,\ell}$ during Step 3, when we compute $T$. Since $|\mathcal{I}_{n,\ell}| = \binom{n}{\ell} 2^\ell \le (2n)^\ell$, the runtime is $n^{O(\ell)}$. If $\ell = \lceil n^\delta \rceil$ for a constant $\delta > 0$, then the runtime is $n^{O(n^\delta)} = \exp(n^{\delta + o(1)})$.

Theorem 2.6 (Wishart detection). Consider the spiked Wishart model with a $(\rho, A)$-sparse signal $x$, and let $\gamma = n/N$. Let $\{y^{(i)}\}_{i=1}^N$ be drawn from either $\mathbb{P}_n$ or $\mathbb{Q}_n$, and let $f_n$ be the output of Algorithm 1. Suppose
$$\rho \le \min\left\{1, \frac{\beta}{A^2}\right\} \cdot \frac{\beta}{25 A^2 \gamma} \cdot \frac{1}{\log n}. \tag{6}$$
Let $\ell$ be any integer in the interval
$$\ell \in \left[\frac{25 A^4 \gamma}{\beta^2} \rho^2 n \log n, \ \min\left\{1, \frac{\beta}{A^2}\right\} \frac{A^2}{\beta} \rho n\right], \tag{7}$$
which is nonempty due to (6). Then the total failure probability of Algorithm 1 satisfies
$$\mathbb{P}_n[f_n = \mathrm{q}] + \mathbb{Q}_n[f_n = \mathrm{p}] \le 2 \exp\left(-\frac{\beta^2}{48 A^4 \gamma} \cdot \frac{\ell^2}{\rho^2 n}\right) \le 2 n^{-25\ell/48},$$
where the last inequality follows from (7).

Remark 2.7. Since the runtime is $n^{O(\ell)}$, for the best possible runtime we should choose $\ell$ as small as possible, i.e., $\ell = \left\lceil \frac{25 A^4 \gamma}{\beta^2} \rho^2 n \log n \right\rceil$.

Remark 2.8. We are primarily interested in the regime $n \to \infty$ with $\gamma = \Theta(1)$, $A = \Theta(1)$, $\rho = n^{-\tau}$ for a constant $\tau \in (0, 1)$, and either $\beta = n^{-\alpha}$ for a constant $\alpha > 0$, or $\beta = \Theta(1)$ with $\hat{\lambda} := \beta/\sqrt{\gamma} < 1$ (in which case $\alpha := 0$). In this case, the requirement (6) reads $\rho \le O(\hat{\lambda}^2/\log n)$ (or, in other words, $\tau > 2\alpha$), which is information-theoretically necessary up to log factors [PJ12, VL12, CMW13]. Choosing $\ell$ as in Remark 2.7 yields an algorithm of runtime $n^{O(1 + \hat{\lambda}^{-2} \rho^2 n \log n)} = \mathrm{poly}(n) + \exp(n^{2\alpha - 2\tau + 1 + o(1)})$.

Remark 2.9. For $S \subseteq [n]$, let $Y_S$ denote the corresponding principal submatrix of $Y$ (i.e., restrict to the rows and columns whose indices lie in $S$). An alternative detection algorithm would be to threshold the test statistic $T' := \max_{S \in \binom{[n]}{\ell}} \lambda_{\max}(Y_S)$, i.e., the largest eigenvalue of any $\ell \times \ell$ principal submatrix. One can obtain similar guarantees for this algorithm as for Algorithm 1.

Recovery. We now turn to the problem of exactly recovering the support and signs of $x$, given data drawn from $\mathbb{P}_n$. The goal is to output a vector $\bar{x} \in \{-1, 0, 1\}^n$ such that $\mathrm{sign}(\bar{x}) = \pm \mathrm{sign}(x)$, where $\mathrm{sign}(x)_i = \mathrm{sign}(x_i)$ and
$$\mathrm{sign}(x_i) = \begin{cases} 1 & \text{if } x_i > 0, \\ -1 & \text{if } x_i < 0, \\ 0 & \text{if } x_i = 0. \end{cases}$$
Note that we can only hope to recover $\mathrm{sign}(x)$ up to a global sign flip, because $xx^\top = (-x)(-x)^\top$.
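At toy scale, the brute-force search behind the test statistic of Algorithm 1 is easy to demonstrate. The sketch below is illustrative only (numpy assumed; all parameter values are our own choices, set deep inside the success regime with $A = 1$ and $\rho n = k$), and contrasts the planted and null models against the threshold $T^*$:

```python
import itertools
import numpy as np

def detect_stat(samples, ell):
    """Test statistic T = max over ell-sparse sign vectors v of v^T Y v
    (brute force over the search set; runtime n^O(ell))."""
    N, n = samples.shape
    Y = samples.T @ samples / N
    best = -np.inf
    for S in itertools.combinations(range(n), ell):
        YS = Y[np.ix_(S, S)]
        for signs in itertools.product([-1.0, 1.0], repeat=ell):
            v = np.array(signs)
            best = max(best, v @ YS @ v)
    return best

rng = np.random.default_rng(3)
n, k, beta, N, ell = 10, 3, 20.0, 1000, 2  # illustrative toy parameters
x = np.zeros(n)
x[:k] = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)

z = rng.standard_normal((N, 1))
planted = rng.standard_normal((N, n)) + np.sqrt(beta) * z * x  # samples under P_n
null = rng.standard_normal((N, n))                             # samples under Q_n

T_star = ell * (1 + beta * ell / (2 * k))  # threshold T* with A = 1, rho*n = k
T_planted = detect_stat(planted, ell)
T_null = detect_stat(null, ell)
print(T_planted, T_null, T_star)
```

Under the planted model a $v$ aligned with $\ell$ support coordinates inflates $v^\top Y v$ by roughly $\beta \ell^2 / (\rho n)$, while under the null every candidate stays near $\ell$, so $T$ separates cleanly across $T^*$.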
Algorithm 2: Recovery of $\mathrm{supp}(x)$ and $\mathrm{sign}(x)$ in the spiked Wishart model
Input: Data $\{y^{(i)}\}_{1 \le i \le N}$, parameters $\rho \in (0, 1]$, $\beta \ge 0$, $A \ge 1$, $\ell \in \mathbb{N}_+$
1: $\bar{N} \leftarrow \lfloor N/2 \rfloor$
2: Compute sample covariance matrices: $Y' \leftarrow \frac{1}{\bar{N}} \sum_{i=1}^{\bar{N}} y^{(i)} y^{(i)\top}$, $Y'' \leftarrow \frac{1}{\bar{N}} \sum_{i=\bar{N}+1}^{2\bar{N}} y^{(i)} y^{(i)\top}$
3: Specify the search set: $\mathcal{I}_{n,\ell} \leftarrow \{v \in \{-1, 0, 1\}^n : \|v\|_0 = \ell\}$
4: Compute the initial estimate: $v^* \leftarrow \mathrm{argmax}_{v \in \mathcal{I}_{n,\ell}} v^\top Y' v$
5: Compute the refined estimate: $z \leftarrow (Y'' - I) v^*$
6: for $j = 1$ to $n$ do
7:   $\bar{x}_j \leftarrow \mathrm{sign}(z_j) \cdot \mathbb{1}\left\{|z_j| > \frac{\beta \ell}{2\sqrt{3} A^2 \rho n}\right\}$
8: end for
Output: $\bar{x}$

For technical reasons, we divide our $N$ samples into two subsamples of size $\bar{N} = \lfloor N/2 \rfloor$ (with one sample discarded if $N$ is odd) and produce two independent sample covariance matrices $Y'$ and $Y''$. The first step of the algorithm is similar to the detection algorithm: by exhaustive search, we find the vector $v^* \in \mathcal{I}_{n,\ell}$ maximizing $v^\top Y' v$. In the course of proving that the algorithm succeeds, we will show that $v^*$ has nontrivial correlation with $x$. The second step is to recover the support (and signs) of $x$ by thresholding $z = (Y'' - I) v^*$. Note that $z$ discards (i.e., does not depend on) the columns of $Y''$ that do not lie in $\mathrm{supp}(v^*)$; since $\mathrm{supp}(v^*)$ has substantial overlap with $\mathrm{supp}(x)$, this serves to amplify the signal.

Theorem 2.10 (Wishart support and sign recovery). Consider the planted spiked Wishart model $\mathbb{P}_n$ with an arbitrary $(\rho, A)$-sparse signal $x$, and let $\gamma = n/N$. Suppose
$$\rho \le \min\left\{1, \frac{\beta}{25 A^8}\right\} \cdot \frac{A^4 \beta}{400 \gamma} \cdot \frac{1}{\log n}. \tag{8}$$
Let $\ell$ be any integer in the interval
$$\ell \in \left[\frac{10000 A^4 \gamma}{\beta^2} \rho^2 n \log n, \ \min\left\{1, \frac{\beta}{25 A^8}\right\} \frac{25 A^8}{\beta} \rho n\right], \tag{9}$$
which is nonempty due to (8). Then the failure probability of Algorithm 2 satisfies
$$1 - \mathbb{P}_n[\mathrm{sign}(\bar{x}) = \pm \mathrm{sign}(x)] \le 6 \exp\left(-\frac{\beta^2}{6400 A^4 \gamma} \cdot \frac{\ell}{\rho^2 n}\right) \le 6 n^{-3/2},$$
where the last inequality follows from (9).
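Algorithm 2 also admits a short toy-scale sketch (numpy assumed; the high-SNR parameter choices are our own illustrations, chosen so that the sign pattern is recovered cleanly):

```python
import itertools
import numpy as np

def recover_signs(samples, ell, beta, k, A=1.0):
    """Toy sketch of Algorithm 2: split the samples, exhaustively search for
    v* on the first half, then threshold z = (Y'' - I) v* from the second."""
    N, n = samples.shape
    Nb = N // 2
    Y1 = samples[:Nb].T @ samples[:Nb] / Nb            # Y'
    Y2 = samples[Nb:2 * Nb].T @ samples[Nb:2 * Nb] / Nb  # Y'', independent of Y'
    best, vstar = -np.inf, None
    for S in itertools.combinations(range(n), ell):
        for signs in itertools.product([-1.0, 1.0], repeat=ell):
            v = np.zeros(n)
            v[list(S)] = signs
            val = v @ Y1 @ v
            if val > best:
                best, vstar = val, v
    zvec = (Y2 - np.eye(n)) @ vstar
    thresh = beta * ell / (2 * np.sqrt(3) * A**2 * k)  # k = rho * n
    return np.sign(zvec) * (np.abs(zvec) > thresh)

rng = np.random.default_rng(4)
n, k, beta, N, ell = 10, 3, 20.0, 4000, 3  # illustrative toy parameters
x = np.zeros(n)
x[:k] = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)
z = rng.standard_normal((N, 1))
samples = rng.standard_normal((N, n)) + np.sqrt(beta) * z * x

xbar = recover_signs(samples, ell, beta, k)
# Success means xbar equals sign(x) up to a global sign flip.
print(np.array_equal(xbar, np.sign(x)) or np.array_equal(xbar, -np.sign(x)))
```

On the support, $z_j \approx \beta x_j \langle x, v^* \rangle$ sits far above the threshold, while off-support coordinates of $z$ carry only sampling noise; the split into two halves keeps $v^*$ independent of $Y''$, which is what makes the second thresholding step clean.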
Remark 2.11. As for detection, the runtime of Algorithm 2 is $n^{O(\ell)}$, and we can minimize this by choosing $\ell = \left\lceil\frac{10000A^4\gamma}{\beta^2}\,\rho^2 n\log n\right\rceil$.

Once we obtain $\mathrm{supp}(x)$ using Algorithm 2, it is straightforward to estimate $x$ (up to global sign flip) using the leading eigenvector of the appropriate submatrix. This step of the algorithm requires only polynomial time.

Theorem 2.12 (Wishart recovery). Consider the planted spiked Wishart model $P_n$ with an arbitrary $(\rho, A)$-sparse signal $x$, and let $\gamma = n/N$. Suppose we have access (e.g., via Algorithm 2) to $\mathcal{I} = \mathrm{supp}(x) \subset [n]$. Write $P_\mathcal{I} = \sum_{i\in\mathcal{I}} e_ie_i^\top$, $y^{(i)}_\mathcal{I} = P_\mathcal{I} y^{(i)}$, and $Y_\mathcal{I} = P_\mathcal{I} Y P_\mathcal{I}^\top = \frac{1}{N}\sum_{i=1}^N y^{(i)}_\mathcal{I} y^{(i)\top}_\mathcal{I}$. Let $\tilde x$ denote the unit-norm eigenvector corresponding to the maximum eigenvalue of $Y_\mathcal{I}$. Then, there exists an absolute constant $C > 0$ such that, for any $\epsilon \in \left[\frac{2(1+\beta)\sqrt{\gamma\rho}}{C\beta}, 1\right)$,
$$P_n\left[\langle\tilde x, x\rangle^2 \le 1-\epsilon\right] \le 2\exp\left(-\frac{C^2\beta^2 n\epsilon^2}{4(1+\beta)^2\gamma\rho}\right) \le 2\exp(-n).$$

Remark 2.13. In the regime we are interested in, $n\to\infty$ with $A = O(1)$, $\beta = O(1)$, and (8) is satisfied. In this case, the conclusion of Theorem 2.12 gives $\langle\tilde x, x\rangle^2 > 1 - o(1)$ with high probability.

Low-degree likelihood. Now, we turn to controlling the low-degree likelihood ratio (LDLR) (see Section 1.5) to provide rigorous evidence that the above algorithms are optimal. In this section we take a fully Bayesian approach, and assume that the planted signal $x$ is drawn from the sparse Rademacher prior $\mathcal{X}^n_\rho$. Recall that the signal-to-noise ratio is defined as $\hat\lambda := \beta/\sqrt\gamma$. As discussed in Section 1.5, we will determine the behavior of $\|L^{\le D}_n\|$ in the limit $n\to\infty$: if $\|L^{\le D}_n\| = O(1)$, this suggests hardness for $n^{\tilde\Omega(D)}$-time algorithms. We allow the parameters $D, \rho, \beta, \gamma$ to depend on $n$, which we sometimes emphasize by writing, e.g., $\rho_n$.
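The polynomial-time estimation step described above (Theorem 2.12) is just a leading-eigenvector computation on the support submatrix. A minimal sketch, assuming $\mathrm{supp}(x)$ is already known (the helper name is our own):

```python
import numpy as np

def estimate_from_support(y, support):
    """Theorem 2.12 sketch: top unit eigenvector of the sample covariance
    restricted to the known support, embedded back into R^n."""
    N, n = y.shape
    ys = y[:, support]
    Y_sub = ys.T @ ys / N                      # support block of Y_I = (1/N) sum y_I y_I^T
    eigvals, eigvecs = np.linalg.eigh(Y_sub)   # eigenvalues in ascending order
    x_tilde = np.zeros(n)
    x_tilde[list(support)] = eigvecs[:, -1]    # leading eigenvector
    return x_tilde

rng = np.random.default_rng(1)
n, N, beta = 20, 2000, 5.0
x = np.zeros(n)
x[:4] = 0.5                                    # unit-norm signal, support {0,1,2,3}
y = rng.standard_normal((N, n)) + np.sqrt(beta) * rng.standard_normal((N, 1)) * x
x_tilde = estimate_from_support(y, [0, 1, 2, 3])
corr2 = (x_tilde @ x) ** 2                     # squared overlap, close to 1 here
```

Since the eigenvector is only determined up to a global sign flip, the natural accuracy measure is the squared overlap $\langle\tilde x, x\rangle^2$, as in the theorem.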
For $D_n = o(n)$, our results suggest hardness for $n^{\tilde\Omega(D)}$-time algorithms whenever $\hat\lambda < 1$ and $\rho \gg \hat\lambda\sqrt{D_n/n}$. This is essentially tight, matching PCA (which succeeds when $\hat\lambda > 1$) and our algorithm with $\ell = D_n$ (which succeeds when $\rho \ll \hat\lambda\sqrt{D_n/n}$) (however, see Remark 2.16 below for one caveat).

Theorem 2.14 (Boundedness of LDLR for large $\rho$). Under the spiked Wishart model with spike prior $\mathcal{X} = \mathcal{X}^n_\rho$, suppose $D_n = o(n)$. If one of the following holds for sufficiently large $n$:

(a) $\limsup_{n\to\infty}\hat\lambda_n < 1$ and
$$\rho_n \ge \max\left(1, \sqrt{\frac{1}{6\log(1/\hat\lambda_n)}}\right)\sqrt{\frac{D_n}{n}},\ \text{or} \tag{10}$$

(b) $\limsup_{n\to\infty}\hat\lambda_n < 1/\sqrt3$ and
$$\rho_n \ge \hat\lambda_n\sqrt{\frac{D_n}{n}}, \tag{11}$$

then, as $n\to\infty$, $\|L^{\le D}_{n,N,\beta,\mathcal{X}}\| = O(1)$.

The following result on divergence of the LDLR serves as a sanity check: we show that $\|L^{\le D}_n\|$ indeed diverges in the regime where we know that an $n^{\tilde\Omega(D)}$-time algorithm exists.

Theorem 2.15 (Divergence of LDLR for small $\rho$). Under the spiked Wishart model with spike prior $\mathcal{X} = \mathcal{X}^n_\rho$, suppose $D_n = \omega(1)$ and $D_n = o(n)$. If one of the following holds:

(a) $\liminf_{n\to\infty}\hat\lambda_n > 1$, or

(b) $\limsup_{n\to\infty}\hat\lambda_n < 1$, $|\log\hat\lambda_n| = o(\sqrt{D_n})$, and for sufficiently large $n$,
$$\rho_n < C\hat\lambda_n\log^{-2}(1/\hat\lambda_n)\sqrt{\frac{D_n}{n}}$$
where $C$ is an absolute constant,

then, as $n\to\infty$, $\|L^{\le D}_{n,N,\beta,\mathcal{X}}\| = \omega(1)$.

Remark 2.16. There is one regime where the above results give some unexpected behavior. Recall first that optimal Bayesian inference for sparse PCA can be performed in time $n^{O(\rho n)}$ by computing the likelihood ratio. Thus if $\|L^{\le D}_n\| = O(1)$ for some $D_n \gg \rho n$, this suggests that the problem is information-theoretically impossible; from our results above, there are regimes where this occurs (and indeed the problem is information-theoretically impossible), yet $\|L^{\le D}_n\| = \omega(1)$ for some larger $D_n$ (which incorrectly suggests that there should be an algorithm).
This is analogous to a phenomenon where the second moment of the (non-low-degree) likelihood ratio $\|L_n\|$ can sometimes diverge even when strong detection is impossible (see, e.g., [BMNN16, BMV+18, PWBM18b]). Luckily, this issue never occurs for us in the regime of interest $D_n \ll \rho n$, and therefore does not prevent our results from being tight. Note also that none of these observations contradict Conjecture 1.5.

2.2 The Wigner Model

We now state our algorithms and results for the Wigner model. These are very similar to the Wishart case, so we omit some of the discussion.

Definition 2.17 (Spiked Wigner model). The spiked Wigner model with parameters $n \in \mathbb{N}^+$, $\lambda \ge 0$, and planted signal $x \in \mathbb{R}^n$ is defined as follows.

• Under $P_n = P_{n,\lambda}$, we observe the matrix $Y = W + \lambda xx^\top$, where $W \sim \mathrm{GOE}(n)$.
• Under $Q_n$, we observe the matrix $Y \sim \mathrm{GOE}(n)$.

Algorithm 3: Detection in the spiked Wigner model
Input: Data $Y$, parameters $\rho \in (0,1]$, $\lambda > 0$, $A \ge 1$, $\ell \in \mathbb{N}^+$
1: Specify the search set: $\mathcal{I}_{n,\ell} \leftarrow \{v \in \{-1,0,1\}^n : \|v\|_0 = \ell\}$
2: Compute the test statistic: $T \leftarrow \max_{v\in\mathcal{I}_{n,\ell}} v^\top Y v$
3: Compute the threshold: $T^* \leftarrow \frac{\lambda\ell^2}{2A^2\rho n}$
4: if $T \ge T^*$ then
5:   return p
6: else
7:   return q
8: end if

Remark 2.18 (Runtime). As in the Wishart case (see Remark 2.5), the runtime is $n^{O(\ell)}$. The same holds for Algorithm 4 below.

Theorem 2.19 (Wigner detection). Consider the spiked Wigner model with an arbitrary $(\rho, A)$-sparse signal $x$. Let $Y$ be drawn from either $P_n$ or $Q_n$, and let $f_n$ be the output of Algorithm 3. Suppose
$$\rho \le \frac{\lambda^2}{36A^4}\frac{1}{\log n}. \tag{12}$$
Let $\ell$ be any integer in the interval
$$\ell \in \left[\frac{36A^4}{\lambda^2}\,\rho^2 n\log n,\ \rho n\right], \tag{13}$$
which is nonempty due to (12). Then the total failure probability of Algorithm 3 satisfies
$$P_n[f_n = \mathrm{q}] + Q_n[f_n = \mathrm{p}] \le 2\exp\left(-\frac{\lambda^2}{32A^4}\cdot\frac{\ell^2}{\rho^2 n}\right) \le 2n^{-9\ell/8},$$
where the last inequality follows from (13).

Remark 2.20.
Since our lower bounds are against the class of low-degree algorithms, it is natural to ask whether our algorithms fall into this class. While our test statistic $T$ is not a polynomial function of $Y$, we can instead take as a proxy the degree-$2k$ polynomial $P(Y) = \sum_{v\in\mathcal{I}_{n,\ell}}(v^\top Yv)^{2k}$ for some choice of $k$. Our analysis can be adapted to show that $P$ can be used to solve strong detection under essentially the same conditions as Theorem 2.19, provided $k \gtrsim \ell\log n$. Note that (up to log factors) this matches the correspondence between runtime and degree in Conjecture 1.5.

Algorithm 4: Recovery of $\mathrm{supp}(x)$ and $\mathrm{sign}(x)$ in the spiked Wigner model
Input: Data $Y$, parameters $\rho\in(0,1]$, $\lambda>0$, $A\ge1$, $\ell\in\mathbb{N}^+$
1: Sample $\tilde W \sim \mathrm{GOE}(n)$
2: Compute independent data matrices: $Y' \leftarrow (Y+\tilde W)/\sqrt2$ and $Y'' \leftarrow (Y-\tilde W)/\sqrt2$
3: Specify the search set: $\mathcal{I}_{n,\ell} \leftarrow \{v\in\{-1,0,1\}^n : \|v\|_0 = \ell\}$
4: Compute the initial estimate: $v^* \leftarrow \mathrm{argmax}_{v\in\mathcal{I}_{n,\ell}} v^\top Y' v$
5: Compute the refined estimate: $z \leftarrow Y'' v^*$
6: for $j=1$ to $n$ do
7:   $\bar x_j \leftarrow \mathrm{sign}(z_j)\cdot\mathbb{1}\{|z_j| > \frac{\lambda\ell}{4A^2\rho n}\}$
8: end for
Output: $\bar x$

For technical reasons, our first step is to fictitiously "split" the data into two independent copies $Y'$ and $Y''$. Note that
$$Y' = \frac{\lambda}{\sqrt2}xx^\top + \frac{W+\tilde W}{\sqrt2} \quad\text{and}\quad Y'' = \frac{\lambda}{\sqrt2}xx^\top + \frac{W-\tilde W}{\sqrt2}.$$
Since $W' := \frac{W+\tilde W}{\sqrt2}$ and $W'' := \frac{W-\tilde W}{\sqrt2}$ are independent $\mathrm{GOE}(n)$ matrices, $Y'$ and $Y''$ are distributed as independent observations drawn from $P_n$ with the same planted signal $x$ and with effective signal-to-noise ratio $\bar\lambda = \lambda/\sqrt2$.

Theorem 2.21 (Wigner support and sign recovery). Consider the planted spiked Wigner model $P_n$ with an arbitrary $(\rho, A)$-sparse signal $x$. Suppose
$$\rho \le \frac{\lambda^2}{338A^4}\frac{1}{\log n}. \tag{14}$$
Let $\ell$ be any integer in the interval
$$\ell \in \left[\frac{338A^4}{\lambda^2}\,\rho^2 n\log n,\ \rho n\right], \tag{15}$$
which is nonempty due to (14).
Then the failure probability of Algorithm 4 satisfies
$$1 - P_n[\mathrm{supp}(\bar x) = \mathrm{supp}(x),\ \mathrm{sign}(\bar x) = \pm\,\mathrm{sign}(x)] \le 4\exp\left(-\frac{\lambda^2}{288A^4}\cdot\frac{\ell}{\rho^2 n}\right) \le 4n^{-169/144},$$
where the last inequality follows from (15).

As in the Wishart case, once we have recovered the support, there is a standard polynomial-time spectral method to estimate $x$.

Theorem 2.22 (Wigner recovery). Consider the planted spiked Wigner model $P_n$ with an arbitrary $(\rho, A)$-sparse signal $x$. Suppose we have access (e.g., via Algorithm 4) to $\mathcal{I} = \mathrm{supp}(x) \subset [n]$. Write $P_\mathcal{I} = \sum_{i\in\mathcal{I}} e_ie_i^\top$ and $Y_\mathcal{I} = P_\mathcal{I} Y P_\mathcal{I}^\top$. Let $\tilde x$ denote the unit-norm eigenvector corresponding to the maximum eigenvalue of $Y_\mathcal{I}$. Then for any $\epsilon \in \left(\frac{4\sqrt{2\rho}}{\lambda}, 1\right)$,
$$P_n\left[\langle\tilde x, x\rangle^2 \le 1-\epsilon\right] \le 4\exp\left(-\frac{n}{16}\left(\lambda\epsilon - 4\sqrt{2\rho}\right)^2\right).$$

Remark 2.23. In the regime we are interested in, $n\to\infty$ with (14) satisfied, so that $\sqrt\rho/\lambda \to 0$. In this case, the conclusion of Theorem 2.22 gives $\langle\tilde x, x\rangle^2 > 1-o(1)$ with high probability, upon choosing for example $\epsilon = \frac{8\sqrt{2\rho}}{\lambda}$.

We also have the following results on the behavior of the low-degree likelihood ratio.

Theorem 2.24 (Boundedness of LDLR for large $\rho$). Under the spiked Wigner model with prior $\mathcal{X} = \mathcal{X}^n_\rho$, suppose $D_n = o(n)$. If one of the following holds for sufficiently large $n$:

(a) $\limsup_{n\to\infty}\lambda_n < 1$ and
$$\rho_n \ge \max\left(1, \sqrt{\frac{1}{6\log(1/\lambda_n)}}\right)\sqrt{\frac{D_n}{n}},\ \text{or} \tag{16}$$

(b) $\limsup_{n\to\infty}\lambda_n < 1/\sqrt3$ and
$$\rho_n \ge \lambda_n\sqrt{\frac{D_n}{n}}, \tag{17}$$

then, as $n\to\infty$, $\|L^{\le D}_{n,\lambda,\mathcal{X}}\| = O(1)$.

Theorem 2.25 (Divergence of LDLR for small $\rho$). Under the spiked Wigner model with prior $\mathcal{X} = \mathcal{X}^n_\rho$, suppose $D_n = \omega(1)$ and $D_n = o(n)$. If one of the following holds:

(a) $\liminf_{n\to\infty}\lambda_n > 1$, or

(b) $\limsup_{n\to\infty}\lambda_n < 1$, $|\log\lambda_n| = o(\sqrt{D_n})$, and for sufficiently large $n$,
$$\rho_n < C\lambda_n\log^{-2}(1/\lambda_n)\sqrt{\frac{D_n}{n}}$$
where $C$ is an absolute constant,

then, as $n\to\infty$, $\|L^{\le D}_{n,\lambda,\mathcal{X}}\| = \omega(1)$.
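As a concrete illustration of the Wigner-side results, the following Python sketch implements the detection test of Algorithm 3 on a toy instance. It is illustrative only: the search over $\mathcal{I}_{n,\ell}$ is exponential in $\ell$, the SNR is chosen well above the threshold so the test succeeds easily, and the function names are our own; the GOE normalization is chosen so that $v^\top Yv \sim \mathcal{N}(0, 2\ell^2/n)$ under $Q_n$, matching the proof of Theorem 2.19.

```python
import itertools
import numpy as np

def goe(n, rng):
    """W ~ GOE(n): symmetric, off-diagonal variance 1/n, diagonal variance 2/n."""
    G = rng.standard_normal((n, n))
    return (G + G.T) / np.sqrt(2 * n)

def algorithm3(Y, rho, lam, A, ell):
    """Wigner detection: T = max_{v in I_{n,ell}} v^T Y v,
    thresholded at T* = lam * ell^2 / (2 A^2 rho n)."""
    n = Y.shape[0]
    T = -np.inf
    for support in itertools.combinations(range(n), ell):
        for signs in itertools.product((-1.0, 1.0), repeat=ell):
            v = np.zeros(n)
            v[list(support)] = signs
            T = max(T, v @ Y @ v)
    T_star = lam * ell**2 / (2 * A**2 * rho * n)
    return "p" if T >= T_star else "q"

rng = np.random.default_rng(2)
n, lam = 8, 8.0
x = np.zeros(n)
x[:2] = 1 / np.sqrt(2)                      # rho*n = 2, A = 1
Y_planted = goe(n, rng) + lam * np.outer(x, x)
Y_null = goe(n, rng)
out_p = algorithm3(Y_planted, rho=2 / n, lam=lam, A=1.0, ell=2)
out_q = algorithm3(Y_null, rho=2 / n, lam=lam, A=1.0, ell=2)
```

Here the planted statistic concentrates near $\lambda\langle\bar v, x\rangle^2 = 2T^*$, while under the null the maximum over $|\mathcal{I}_{n,\ell}|$ Gaussians stays far below $T^*$, which is exactly the separation the proof exploits.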
3 Proofs for Subexponential-Time Algorithms

3.1 The Wishart Model

Proof of Theorem 2.6 (Detection). Under $Q_n$, for any fixed $v\in\mathcal{I}_{n,\ell}$ we have $v^\top y^{(i)} \sim \mathcal{N}(0,\ell)$ for $i\in[N]$ and
$$v^\top Yv = \frac1N\sum_{i=1}^N(v^\top y^{(i)})^2 \overset{(d)}{=} \frac{\ell}{N}\chi^2_N,$$
where $\chi^2_N$ is a chi-squared random variable with $N$ degrees of freedom, i.e., the sum of the squares of $N$ standard gaussians. Using Corollary A.3, we union bound over $v\in\mathcal{I}_{n,\ell}$ for any $t\in(0,\frac12)$:
$$Q_n[T\ge\ell(1+t)] \le |\mathcal{I}_{n,\ell}|\Pr\left[\chi^2_N \ge N(1+t)\right] \le \binom n\ell 2^\ell\cdot\exp\left(-\frac{Nt^2}{3}\right) \le \exp\left(\ell\log(2n) - \frac{Nt^2}{3}\right).$$
Under the condition
$$Nt^2 \ge 4\ell\log(2n) \tag{18}$$
we have $Q_n[T\ge\ell(1+t)] \le \exp\left(-\frac{1}{12}Nt^2\right)$.

Meanwhile, under $P_n$, when $v = \bar v$ correctly guesses $\ell$ entries and their signs in the support of $x$ (which requires $\ell\le\rho n$), for any $i\in[N]$ we have $\bar v^\top y^{(i)} \sim \mathcal{N}(0, \bar v^\top(I_n+\beta xx^\top)\bar v) = \mathcal{N}(0, \ell+\beta\langle\bar v,x\rangle^2)$. Therefore,
$$\bar v^\top Y\bar v = \frac1N\sum_{i=1}^N(\bar v^\top y^{(i)})^2 \overset{(d)}{=} \frac1N(\ell+\beta\langle\bar v,x\rangle^2)\chi^2_N,$$
where $\langle\bar v,x\rangle^2 \ge \frac{\ell^2}{A^2\rho n}$. As a result, by Corollary A.3,
$$P_n[T<\ell(1+t)] \le P_n\left[\bar v^\top Y\bar v < \ell(1+t)\right] = \Pr\left[\frac1N(\ell+\beta\langle\bar v,x\rangle^2)\chi^2_N < \ell(1+t)\right] \le \Pr\left[\chi^2_N < N\left(1 - \frac{\beta\ell - A^2\rho nt}{\beta\ell + A^2\rho n}\right)\right] \le \exp\left(-\frac N3\left(\frac{\beta\ell - A^2\rho nt}{\beta\ell + A^2\rho n}\right)^2\right),$$
the last inequality requiring
$$0 \le \frac{\beta\ell - A^2\rho nt}{\beta\ell + A^2\rho n} \le \frac12. \tag{19}$$
To satisfy $t\in(0,\frac12)$, (18) and (19) at the same time, we choose $t = \frac{\beta\ell}{2A^2\rho n}$. Under the condition
$$\frac{\beta\ell}{A^2 n} \le \rho \le \frac{\beta}{5A^2\sqrt\gamma}\sqrt{\frac{\ell}{n\log n}}, \tag{20}$$
which is equivalent to the interval for $\ell$ given in (7), thresholding the statistic $T$ at $\ell(1+t)$ succeeds at distinguishing $P_n$ and $Q_n$ with total error probability
$$Q_n[T\ge\ell(1+t)] + P_n[T<\ell(1+t)] \le 2\exp\left(-\frac{\beta^2}{48A^4\gamma}\cdot\frac{\ell^2}{\rho^2 n}\right),$$
which completes the proof.

Proof of Theorem 2.10 (Support and Sign Recovery). First, we give a high-probability lower bound on $\langle v^*, x\rangle$.
From the analysis of the detection algorithm, we know that under the condition (20),
$$1 - 2\exp\left(-\frac{\beta^2}{48A^4\gamma}\frac{\ell^2}{\rho^2 n}\right) \le P_n\left[\ell\left(1+\frac{\beta\ell}{2A^2\rho n}\right) \le v^{*\top}Y'v^*\right]$$
$$\le P_n\left[\ell\left(1+\frac{\beta\ell}{2A^2\rho n}\right) \le \frac1{\bar N}(\ell+\beta\langle v^*,x\rangle^2)\chi^2_{\bar N},\ \langle v^*,x\rangle^2 < \frac{\ell^2}{3A^2\rho n}\right] + P_n\left[\ell\left(1+\frac{\beta\ell}{2A^2\rho n}\right) \le \frac1{\bar N}(\ell+\beta\langle v^*,x\rangle^2)\chi^2_{\bar N},\ \langle v^*,x\rangle^2 \ge \frac{\ell^2}{3A^2\rho n}\right]$$
$$\le P_n\left[\ell\left(1+\frac{\beta\ell}{2A^2\rho n}\right) \le \frac1{\bar N}\left(\ell+\frac{\beta\ell^2}{3A^2\rho n}\right)\chi^2_{\bar N}\right] + P_n\left[\langle v^*,x\rangle^2 \ge \frac{\ell^2}{3A^2\rho n}\right],$$
where
$$P_n\left[\ell\left(1+\frac{\beta\ell}{2A^2\rho n}\right) \le \frac1{\bar N}\left(\ell+\frac{\beta\ell^2}{3A^2\rho n}\right)\chi^2_{\bar N}\right] = P_n\left[\chi^2_{\bar N} \ge \bar N\left(1+\frac{\beta\ell}{2\beta\ell+6A^2\rho n}\right)\right] \le \exp\left(-\frac{\bar N}{3}\left(\frac{\beta\ell}{2\beta\ell+6A^2\rho n}\right)^2\right) \le \exp\left(-\frac{\beta^2}{384A^4\gamma}\frac{\ell^2}{\rho^2 n}\right),$$
hence we have the lower bound
$$P_n\left[\langle v^*,x\rangle^2 \ge \frac{\ell^2}{3A^2\rho n}\right] \ge 1 - 2\exp\left(-\frac{\beta^2}{48A^4\gamma}\frac{\ell^2}{\rho^2 n}\right) - \exp\left(-\frac{\beta^2}{384A^4\gamma}\frac{\ell^2}{\rho^2 n}\right) \ge 1 - 3\exp\left(-\frac{\beta^2}{384A^4\gamma}\frac{\ell^2}{\rho^2 n}\right).$$

We now fix $v^*$ satisfying the above lower bound on $\langle v^*,x\rangle^2$. From this point onward, we will only use the second copy $Y''$ of our data; note that, crucially, $Y''$ is independent from $v^*$. To simplify the notation, we will write $y^{(1)},\dots,y^{(\bar N)}$ instead of $y^{(\bar N+1)},\dots,y^{(2\bar N)}$ for the samples used to form $Y''$. We now adopt an equivalent representation of the observations: $y^{(i)} = u^{(i)} + \sqrt\beta\, w^{(i)}x$, where $u^{(i)}\sim\mathcal{N}(0,I_n)$ and $w^{(i)}\sim\mathcal{N}(0,1)$ are independent random gaussian vectors and scalars, respectively. Substituting this into $z = (Y''-I)v^*$ yields
$$z_j = \frac1{\bar N}\sum_{i=1}^{\bar N}(a_{ij}+b_{ij}+c_{ij}+d_{ij}+e_{ij})$$
where, for $i\in[\bar N]$ and $j\in[n]$,
$$a_{ij} = (w^{(i)})^2\beta x_j\langle v^*,x\rangle,\quad b_{ij} = \left((u^{(i)}_j)^2-1\right)v^*_j,\quad c_{ij} = \sum_{k\ne j}u^{(i)}_j u^{(i)}_k v^*_k,\quad d_{ij} = u^{(i)}_j w^{(i)}\sqrt\beta\,\langle v^*,x\rangle,\quad e_{ij} = \langle u^{(i)},v^*\rangle w^{(i)}\sqrt\beta\, x_j,$$
with $\mathbb{E}(a_{ij}) = \beta x_j\langle v^*,x\rangle$ and $\mathbb{E}(b_{ij}) = \mathbb{E}(c_{ij}) = \mathbb{E}(d_{ij}) = \mathbb{E}(e_{ij}) = 0$.
We will show separate union bounds for these five contributions to $z_j$. In the following, we fix the constant $\mu = 1/20$.

Union bound for $a_{ij}$. For all $j\in\mathrm{supp}(x)$,
$$\frac{\beta\ell}{\sqrt3 A^2\rho n} \le \beta\cdot\frac{1}{A\sqrt{\rho n}}\cdot\frac{\ell}{\sqrt3 A\sqrt{\rho n}} \le |\beta x_j\langle v^*,x\rangle| \le \beta\cdot\frac{A}{\sqrt{\rho n}}\cdot\frac{A\ell}{\sqrt{\rho n}} = \frac{A^2\beta\ell}{\rho n},$$
so by Corollary A.3,
$$\log P_n\left[\left|\frac1{\bar N}\sum_{i=1}^{\bar N}a_{ij} - \beta x_j\langle v^*,x\rangle\right| > \mu\frac{\beta\ell}{A^2\rho n}\right] = \log\Pr\left[\left|\frac1{\bar N}\chi^2_{\bar N}-1\right| > \mu\frac{\beta\ell}{A^2\rho n}\cdot\frac{1}{|\beta x_j\langle v^*,x\rangle|}\right] \le \log\Pr\left[\left|\frac1{\bar N}\chi^2_{\bar N}-1\right| > \frac{\mu}{A^4}\right] \le -\frac{\mu^2\bar N}{3A^8}.$$
Therefore, we may union bound over $j\in\mathrm{supp}(x)$:
$$P_n\left[\left|\frac1{\bar N}\sum_{i=1}^{\bar N}a_{ij} - \beta x_j\langle v^*,x\rangle\right| \le \mu\frac{\beta\ell}{A^2\rho n}\ \text{for all } j\in[n]\right] \ge 1 - \sum_{j\in\mathrm{supp}(x)}P_n\left[\left|\frac1{\bar N}\sum_{i=1}^{\bar N}a_{ij} - \beta x_j\langle v^*,x\rangle\right| > \mu\frac{\beta\ell}{A^2\rho n}\right] \ge 1 - \exp\left(\log n - \frac{\mu^2\bar N}{3A^8}\right) \ge 1 - \exp\left(-\frac{\mu^2 n}{7A^8\gamma}\right). \tag{21}$$

Union bound for $b_{ij}$. We have $b_{ij}$ nonzero only when $j\in\mathrm{supp}(v^*)$. For such $j$, by Corollary A.3,
$$\log P_n\left[\left|\frac1{\bar N}\sum_{i=1}^{\bar N}b_{ij}\right| > \mu\frac{\beta\ell}{A^2\rho n}\right] = \log\Pr\left[\left|\frac1{\bar N}\chi^2_{\bar N}-1\right| > \mu\frac{\beta\ell}{A^2\rho n}\right] \le -\frac{\bar N}{3}\left(\mu\frac{\beta\ell}{A^2\rho n}\right)^2.$$
Therefore,
$$P_n\left[\left|\frac1{\bar N}\sum_{i=1}^{\bar N}b_{ij}\right| \le \mu\frac{\beta\ell}{A^2\rho n}\ \text{for all } j\in[n]\right] \ge 1 - \sum_{j\in\mathrm{supp}(v^*)}P_n\left[\left|\frac1{\bar N}\sum_{i=1}^{\bar N}b_{ij}\right| > \mu\frac{\beta\ell}{A^2\rho n}\right] \ge 1 - \exp\left(\log\ell - \frac{\bar N}{3}\left(\mu\frac{\beta\ell}{A^2\rho n}\right)^2\right) \ge 1 - \exp\left(-\frac{\mu^2\beta^2}{12A^4\gamma}\frac{\ell^2}{\rho^2 n}\right), \tag{22}$$
under the condition
$$\rho \le \frac{\mu\beta}{\sqrt{12}A^2\sqrt\gamma}\sqrt{\frac{\ell}{n\log\ell}}.$$

Union bound for $c_{ij}$ and $d_{ij}$. In the following, let $u, u'$ denote independent samples from $\mathcal{N}(0, I_{\bar N})$. Note that
$$\tilde u^{(i)}_j := \sum_{k\ne j}u^{(i)}_k v^*_k \sim \mathcal{N}(0, \tilde\ell_j),\quad\text{where}\quad \tilde\ell_j = \ell - \mathbb{1}\{j\in\mathrm{supp}(v^*)\},$$
and $\tilde u^{(i)}_j$ is independent from $u^{(i)}_j$. Therefore,
$$\sum_{i=1}^{\bar N}c_{ij} = \sum_{i=1}^{\bar N}u^{(i)}_j\tilde u^{(i)}_j \overset{(d)}{=} \sqrt{\tilde\ell_j}\,\langle u,u'\rangle.$$
Therefore, by Lemma A.1, for the $c_{ij}$ we have
$$P_n\left[\left|\frac1{\bar N}\sum_{i=1}^{\bar N}c_{ij}\right| \le \mu\frac{\beta\ell}{A^2\rho n}\ \text{for all } j\in[n]\right] \ge 1 - \sum_{j=1}^n\Pr\left[|\langle u,u'\rangle| > \frac{\mu\beta\sqrt\ell\,\bar N}{A^2\rho n}\right] \ge 1 - 2n\exp\left(-\frac{\bar N}{4}\left(\frac{\mu\beta\sqrt\ell}{A^2\rho n}\right)^2\right) \ge 1 - \exp\left(-\frac{\mu^2\beta^2}{16A^4\gamma}\frac{\ell}{\rho^2 n}\right). \tag{23}$$
The last inequality holds under the condition
$$\frac{\mu\beta\sqrt\ell}{A^2\rho n} \le \frac12,\qquad \frac{\mu^2\beta^2}{16A^4\gamma}\frac{\ell}{\rho^2 n} \ge \log(2n),$$
which follows from
$$\frac{2\mu\beta\sqrt\ell}{A^2 n} \le \rho \le \frac{\mu\beta}{5A^2\sqrt\gamma}\sqrt{\frac{\ell}{n\log n}}.$$
Meanwhile,
$$\sum_{i=1}^{\bar N}d_{ij} \overset{(d)}{=} \sqrt\beta\,\langle v^*,x\rangle\langle u,u'\rangle.$$
Therefore, as for the $c_{ij}$, for the $d_{ij}$ we have
$$P_n\left[\left|\frac1{\bar N}\sum_{i=1}^{\bar N}d_{ij}\right| \le \mu\frac{\beta\ell}{A^2\rho n}\ \text{for all } j\in[n]\right] \ge 1 - \sum_{j=1}^n\Pr\left[|\langle u,u'\rangle| > \frac{\mu\sqrt\beta\,\bar N}{A^3\sqrt{\rho n}}\right] \ge 1 - 2n\exp\left(-\frac{\bar N}{4}\left(\frac{\mu\sqrt\beta}{A^3\sqrt{\rho n}}\right)^2\right) \ge 1 - \exp\left(-\frac{\mu^2\beta}{16A^6\gamma}\frac1\rho\right). \tag{24}$$
The last inequality holds under the condition
$$\frac{\mu\sqrt\beta}{A^3\sqrt{\rho n}} \le \frac12,\qquad \frac{\mu^2\beta}{16A^6\gamma}\frac1\rho \ge \log(2n),$$
which follows from
$$\frac{4\mu^2\beta}{A^6 n} \le \rho \le \frac{\mu^2\beta}{17A^6\gamma\log n}.$$

Union bound for $e_{ij}$. We have
$$\sum_{i=1}^{\bar N}e_{ij} \overset{(d)}{=} x_j\sqrt{\beta\ell}\,\langle u,u'\rangle,$$
which is only nonzero for $j\in\mathrm{supp}(x)$. Therefore,
$$P_n\left[\left|\frac1{\bar N}\sum_{i=1}^{\bar N}e_{ij}\right| \le \mu\frac{\beta\ell}{A^2\rho n}\ \text{for all } j\in[n]\right] \ge 1 - \sum_{j\in\mathrm{supp}(x)}\Pr\left[|\langle u,u'\rangle| > \frac{\mu\sqrt\beta\,\ell\,\bar N}{A^2\rho n|x_j|}\right] \ge 1 - \sum_{j\in\mathrm{supp}(x)}\Pr\left[|\langle u,u'\rangle| > \frac{\mu\sqrt{\beta\ell}\,\bar N}{A^4\sqrt{\rho n}}\right] \ge 1 - 2n\exp\left(-\frac{\bar N}{4}\left(\frac{\mu\sqrt{\beta\ell}}{A^4\sqrt{\rho n}}\right)^2\right) \ge 1 - \exp\left(-\frac{\mu^2\beta}{16A^8\gamma}\frac{\ell}{\rho}\right). \tag{25}$$
The last inequality holds under the condition
$$\frac{\mu\sqrt{\beta\ell}}{A^4\sqrt{\rho n}} \le \frac12,\qquad \frac{\mu^2\beta}{16A^8\gamma}\frac{\ell}{\rho} \ge \log(2n),$$
which follows from
$$\frac{4\mu^2\beta\ell}{A^8 n} \le \rho \le \frac{\mu^2\beta\ell}{17A^8\gamma\log n}.$$

Final steps.
Now, combining all of the union bounds and conditions from (21), (22), (23), (24) and (25), assuming that $\beta, \gamma = \Theta(1)$ and that $\omega(1) \le \ell(n) \le o(n/\log n)$, under the condition
$$\max\left(1, \frac{\beta}{25A^8}\right)\frac{\ell}{n} \le \rho \le \frac{\beta}{100A^2\sqrt\gamma}\sqrt{\frac{\ell}{n\log n}} \tag{26}$$
which is equivalent to the regime for $\ell$ given in (9) that we are considering, we have
$$P_n\left[\text{for some } j,\ |z_j - \beta x_j\langle v^*,x\rangle| > \frac{\beta\ell}{4A^2\rho n}\right] + P_n\left[\langle v^*,x\rangle^2 \le \frac{\ell^2}{3A^2\rho n}\right]$$
$$\le P_n\left[\text{for some } j,\ \left|\frac1{\bar N}\sum_{i=1}^{\bar N}a_{ij} - \beta x_j\langle v^*,x\rangle\right| > \mu\frac{\beta\ell}{A^2\rho n}\right] + P_n\left[\text{for some } j,\ \left|\frac1{\bar N}\sum_{i=1}^{\bar N}b_{ij}\right| > \mu\frac{\beta\ell}{A^2\rho n}\right] + P_n\left[\text{for some } j,\ \left|\frac1{\bar N}\sum_{i=1}^{\bar N}c_{ij}\right| > \mu\frac{\beta\ell}{A^2\rho n}\right]$$
$$\quad + P_n\left[\text{for some } j,\ \left|\frac1{\bar N}\sum_{i=1}^{\bar N}d_{ij}\right| > \mu\frac{\beta\ell}{A^2\rho n}\right] + P_n\left[\text{for some } j,\ \left|\frac1{\bar N}\sum_{i=1}^{\bar N}e_{ij}\right| > \mu\frac{\beta\ell}{A^2\rho n}\right] + P_n\left[\langle v^*,x\rangle^2 \le \frac{\ell^2}{3A^2\rho n}\right]$$
$$\le 5\exp\left(-\frac{\beta^2}{6400A^4\gamma}\frac{\ell}{\rho^2 n}\right) + \exp\left(-\frac{\beta^2}{384A^4\gamma}\frac{\ell^2}{\rho^2 n}\right) \le 6\exp\left(-\frac{\beta^2}{6400A^4\gamma}\frac{\ell}{\rho^2 n}\right).$$
Since $|\beta x_j\langle v^*,x\rangle| \ge \frac{\beta\ell}{\sqrt3 A^2\rho n}$ for $j\in\mathrm{supp}(x)$ and $|\beta x_j\langle v^*,x\rangle| = 0$ for $j\notin\mathrm{supp}(x)$, we conclude that, with probability at least $1 - 6\exp\left(-\frac{\beta^2}{6400A^4\gamma}\frac{\ell}{\rho^2 n}\right)$, for every $j\in[n]$,
$$j\in\mathrm{supp}(x)\ \text{if and only if}\ |z_j| \ge \frac{\beta\ell}{2\sqrt3 A^2\rho n},\quad\text{and}\quad \mathrm{sign}(z_j) = \mathrm{sign}(x_j\langle v^*,x\rangle),$$
completing the proof.

Proof of Theorem 2.12 (Full Recovery). By a result in the analysis of covariance matrix estimation for subgaussian distributions ([Ver10], Remark 5.51), there exists an absolute constant $C > 0$ such that, for any $\delta \in [\sqrt{\rho\gamma}/C, 1)$, the following holds with probability at least $1 - 2\exp\left(-\frac{C^2\delta^2 N}{\rho}\right)$:
$$\|Y_\mathcal{I} - (P_\mathcal{I} + \beta xx^\top)\| \le \delta\|P_\mathcal{I} + \beta xx^\top\| = \delta(1+\beta).$$
Whenever this is true, by the definition of spectral norm we have
$$\delta(1+\beta) \ge \tilde x^\top\left[Y_\mathcal{I} - (P_\mathcal{I}+\beta xx^\top)\right]\tilde x = \|Y_\mathcal{I}\| - (1+\beta\langle\tilde x,x\rangle^2) \ge (1-\delta)(1+\beta) - (1+\beta\langle\tilde x,x\rangle^2),$$
which is equivalent to $\langle\tilde x,x\rangle^2 \ge 1-\epsilon$ upon taking $\delta = \frac{\beta\epsilon}{2(1+\beta)}$. Thus, for any $\epsilon\in\left(\frac{2(1+\beta)\sqrt{\rho\gamma}}{C\beta}, 1\right)$,
$$\Pr\left[\langle\tilde x,x\rangle^2 \le 1-\epsilon\right] \le 2\exp\left(-\frac{C^2\beta^2 n\epsilon^2}{4(1+\beta)^2\gamma\rho}\right),$$
which completes the proof.

3.2 The Wigner Model

Proof of Theorem 2.19 (Detection). For simplicity we denote $t = \frac{\lambda\ell^2}{2A^2\rho n}$. Under $P_n$, when $\bar v$ correctly guesses $\ell$ entries in the support of $x$ with correct signs (which requires $\ell\le\rho n$),
$$\bar v^\top Y\bar v = \bar v^\top W\bar v + \lambda\langle\bar v,x\rangle^2,$$
where $\bar v^\top W\bar v \sim \mathcal{N}(0,\ell^2/n)$. Note that
$$\lambda\langle\bar v,x\rangle^2 \ge \frac{\lambda\ell^2}{A^2\rho n} = 2t.$$
Therefore, a standard Gaussian tail bound gives
$$P_n[T<t] \le P_n\left[\bar v^\top Y\bar v < t\right] \le \Pr\left[\mathcal{N}(0,\ell^2/n) > t\right] \le \exp\left(-\frac{n}{2\ell^2}\left(\frac{\lambda\ell^2}{2A^2\rho n}\right)^2\right) = \exp\left(-\frac{\lambda^2}{8A^4}\frac{\ell^2}{\rho^2 n}\right).$$
Under $Q_n$, for each fixed $v\in\mathcal{I}_{n,\ell}$, we have $v^\top Yv \sim \mathcal{N}(0, 2\ell^2/n)$. By the same tail bound,
$$Q_n\left[v^\top Yv \ge t\right] \le \exp\left(-\frac{nt^2}{4\ell^2}\right).$$
Now, by a union bound over $v\in\mathcal{I}_{n,\ell}$,
$$Q_n[T\ge t] \le |\mathcal{I}_{n,\ell}|\exp\left(-\frac{nt^2}{4\ell^2}\right) = \binom n\ell 2^\ell\exp\left(-\frac{nt^2}{4\ell^2}\right) \le \exp\left(\ell\log(2n) - \frac{nt^2}{4\ell^2}\right).$$
Under the condition
$$\frac{nt^2}{8\ell^2} \ge \ell\log(2n) \impliedby \rho < \frac{\lambda}{6A^2}\sqrt{\frac{\ell}{n\log n}},$$
which is equivalent to the interval for $\ell$ given in (13), we have
$$Q_n[T\ge t] \le \exp\left(-\frac{nt^2}{8\ell^2}\right) = \exp\left(-\frac{\lambda^2}{32A^4}\frac{\ell^2}{\rho^2 n}\right).$$
Therefore, by thresholding $T$ at $t$, under the condition
$$\frac\ell n \le \rho \le \frac{\lambda}{6A^2}\sqrt{\frac{\ell}{n\log n}}, \tag{27}$$
we can distinguish $P_n$ and $Q_n$ with total failure probability at most
$$P_n[T<t] + Q_n[T\ge t] \le \exp\left(-\frac{\lambda^2}{8A^4}\frac{\ell^2}{\rho^2 n}\right) + \exp\left(-\frac{\lambda^2}{32A^4}\frac{\ell^2}{\rho^2 n}\right) \le 2\exp\left(-\frac{\lambda^2}{32A^4}\frac{\ell^2}{\rho^2 n}\right),$$
completing the proof.

Proof of Theorem 2.21 (Support and Sign Recovery). First, we show that $v^*$ has significant overlap with the support of $x$.
From the analysis of the detection algorithm, provided (27) holds, with probability at least $1 - 2\exp\left(-\frac{\bar\lambda^2}{32A^4}\frac{\ell^2}{\rho^2 n}\right)$ we have
$$\frac{\bar\lambda\ell^2}{2A^2\rho n} \le v^{*\top}Y'v^* = \bar\lambda\langle v^*,x\rangle^2 + v^{*\top}W'v^*,$$
where $v^{*\top}W'v^* \sim \mathcal{N}(0, 2\ell^2/n)$. Therefore, for $n$ sufficiently large,
$$P_n\left[\langle v^*,x\rangle^2 \ge \frac{\ell^2}{4A^2\rho n}\right] \ge \left(1-2\exp\left(-\frac{\bar\lambda^2}{32A^4}\frac{\ell^2}{\rho^2 n}\right)\right)\left(1-\Pr\left[\mathcal{N}(0,2\ell^2/n) \ge \frac{\bar\lambda\ell^2}{4A^2\rho n}\right]\right) \ge 1 - 2\exp\left(-\frac{\bar\lambda^2}{32A^4}\frac{\ell^2}{\rho^2 n}\right) - \exp\left(-\frac{\bar\lambda^2}{64A^4}\frac{\ell^2}{\rho^2 n}\right) \ge 1 - 3\exp\left(-\frac{\bar\lambda^2}{64A^4}\frac{\ell^2}{\rho^2 n}\right).$$

We now fix $v^*$ satisfying the above lower bound on $\langle v^*,x\rangle^2$. From this point onward, we will only use the second copy $Y''$ of our data; it is important here that $Y''$ is independent from $v^*$. We will show that $x$ is successfully recovered by thresholding the entries of $z = Y''v^*$. Entrywise, we have
$$z_i = \bar\lambda x_i\langle v^*,x\rangle + e_i^\top W''v^*.$$
For all $i\in\mathrm{supp}(x)$,
$$|\bar\lambda x_i\langle v^*,x\rangle| \ge \bar\lambda\cdot\frac{1}{A\sqrt{\rho n}}\cdot\frac{\ell}{2A\sqrt{\rho n}} = \frac{\bar\lambda\ell}{2A^2\rho n}.$$
For simplicity we denote $s = \frac{\bar\lambda\ell}{2A^2\rho n}$ and $\mu = \frac13$. Note that for all $i\in[n]$, $e_i^\top W''v^* \sim \mathcal{N}(0,\|v^*\|^2/n) = \mathcal{N}(0,\ell/n)$ and therefore
$$P_n\left[|e_i^\top W''v^*| \ge \mu s\right] \le 2\exp\left(-\frac{n\mu^2 s^2}{2\ell}\right). \tag{28}$$
By a union bound over all $i\in[n]$,
$$P_n\left[|e_i^\top W''v^*| \le \mu s\ \text{for all } i\right] \ge 1 - 2n\exp\left(-\frac{n\mu^2 s^2}{2\ell}\right) \ge 1 - \exp\left(\log(2n) - \frac{n\mu^2 s^2}{2\ell}\right) \ge 1 - \exp\left(-\frac{n\mu^2 s^2}{4\ell}\right) = 1 - \exp\left(-\frac{\bar\lambda^2}{144A^4}\frac{\ell}{\rho^2 n}\right)$$
under the condition
$$\frac{n\mu^2 s^2}{4\ell} \ge \log(2n) \impliedby \rho \le \frac{\bar\lambda}{13A^2}\sqrt{\frac{\ell}{n\log n}} = \frac{\lambda}{13\sqrt2 A^2}\sqrt{\frac{\ell}{n\log n}},$$
which, combined with (27), is equivalent to membership in the interval for $\ell$ that we are considering per (15). Therefore, with probability at least
$$1 - 3\exp\left(-\frac{\bar\lambda^2}{64A^4}\frac{\ell^2}{\rho^2 n}\right) - \exp\left(-\frac{\bar\lambda^2}{144A^4}\frac{\ell}{\rho^2 n}\right) \ge 1 - 4\exp\left(-\frac{\lambda^2}{288A^4}\frac{\ell}{\rho^2 n}\right),$$
for all $j\in[n]$,
$$j\in\mathrm{supp}(x)\ \text{if and only if}\ |z_j| \ge \frac s2,\quad\text{and}\quad \mathrm{sign}(z_j) = \mathrm{sign}(x_j\langle v^*,x\rangle).$$
Thus, we find that thresholding the entries of $z$ at $s/2$ successfully recovers the support and signs of $x$, completing the proof.

Proof of Theorem 2.22 (Full Recovery). Since $Y_\mathcal{I}\tilde x = \lambda_{\max}(Y_\mathcal{I})\tilde x$, we must have $\mathrm{supp}(\tilde x) \subset \mathcal{I}$. Denote $W_\mathcal{I} = P_\mathcal{I}WP_\mathcal{I}^\top$ and $\bar W_\mathcal{I}$ the $\rho n\times\rho n$ submatrix of $W_\mathcal{I}$ with rows and columns indexed by $\mathcal{I}$ (the only nonzero rows and columns). Now, the variational description of the leading eigenvector yields
$$\tilde x^\top W_\mathcal{I}\tilde x + \lambda\langle\tilde x,x\rangle^2 = \tilde x^\top Y_\mathcal{I}\tilde x \ge x^\top Y_\mathcal{I}x = x^\top W_\mathcal{I}x + \lambda.$$
Therefore,
$$\langle\tilde x,x\rangle^2 \ge 1 - \frac1\lambda\left(\tilde x^\top W_\mathcal{I}\tilde x - x^\top W_\mathcal{I}x\right) \ge 1 - \frac1\lambda\left(\lambda_{\max}(\bar W_\mathcal{I}) + \lambda_{\max}(-\bar W_\mathcal{I})\right).$$
Note that $\bar W_\mathcal{I}$ has the same law as $(\bar G+\bar G^\top)/\sqrt{2n}$, where $\bar G$ is a $\rho n\times\rho n$ matrix whose entries are independent standard normal random variables. Now, for any $\epsilon > \frac{4\sqrt{2\rho}}{\lambda}$, we have $\frac{\sqrt{2n}\lambda\epsilon}{4} > 2\sqrt{\rho n}$, so a standard singular value estimate for Gaussian matrices (see [Ver10], Corollary 5.35) gives
$$P_n\left[\langle\tilde x,x\rangle^2 \le 1-\epsilon\right] \le \Pr\left[\lambda_{\max}(\bar W_\mathcal{I})+\lambda_{\max}(-\bar W_\mathcal{I}) \ge \lambda\epsilon\right] \le 2\Pr\left[\lambda_{\max}(\bar W_\mathcal{I}) \ge \frac{\lambda\epsilon}{2}\right] \le 2\Pr\left[\sigma_{\max}(\bar G) \ge \frac{\sqrt{2n}\lambda\epsilon}{4}\right] \le 4\exp\left(-\frac{n}{16}\left(\lambda\epsilon - 4\sqrt{2\rho}\right)^2\right),$$
which concludes the proof.

4 Proofs for Low-Degree Likelihood Ratio Bounds

4.1 Low-Degree Likelihood Ratio for Spiked Models

We begin by giving expressions for the norm of the low-degree likelihood ratio (LDLR) for the spiked Wigner and Wishart models. These expressions are derived in [KWB19] and [BKW19], respectively.

Lemma 4.1 ($D$-LDLR for spiked Wigner model [KWB19]). Let $L^{\le D}_{n,\lambda,\mathcal{X}}$ denote the degree-$D$ likelihood ratio for the spiked Wigner model with parameters $n, \lambda$ and spike prior $\mathcal{X}$. Then,
$$\|L^{\le D}_{n,\lambda,\mathcal{X}}\|^2 = \mathbb{E}_{v^{(1)},v^{(2)}\sim\mathcal{X}^n}\left[\sum_{d=0}^D\frac{1}{d!}\left(\frac{n\lambda^2}{2}\langle v^{(1)},v^{(2)}\rangle^2\right)^d\right] \tag{29}$$
where $v^{(1)}, v^{(2)}$ are drawn independently from $\mathcal{X}^n$.

Lemma 4.2 ($D$-LDLR for spiked Wishart model [BKW19]).
Let $L^{\le D}_{n,N,\beta,\mathcal{X}}$ denote the degree-$D$ likelihood ratio for the spiked Wishart model with parameters $n, N, \beta$ and spike prior $\mathcal{X}$. Define
$$\varphi_N(x) := (1-4x)^{-N/2} \tag{30}$$
$$\varphi_{N,k}(x) := \sum_{d=0}^k x^d\sum_{\substack{d_1,\dots,d_N\\\sum d_i=d}}\prod_{i=1}^N\binom{2d_i}{d_i}, \tag{31}$$
so that $\varphi_{N,k}(x)$ is the Taylor series of $\varphi_N$ around $x=0$ truncated to degree $k$. Then,
$$\|L^{\le D}_{n,N,\beta,\mathcal{X}}\|^2 = \mathbb{E}_{v^{(1)},v^{(2)}\sim\mathcal{X}^n}\left[\varphi_{N,\lfloor D/2\rfloor}\left(\frac{\beta^2\langle v^{(1)},v^{(2)}\rangle^2}{4}\right)\right] = \mathbb{E}_{v^{(1)},v^{(2)}\sim\mathcal{X}^n}\left[\sum_{d=0}^{\lfloor D/2\rfloor}\left(\sum_{\substack{d_1,\dots,d_N\\\sum d_i=d}}\prod_{i=1}^N\binom{2d_i}{d_i}\right)\left(\frac{\beta^2\langle v^{(1)},v^{(2)}\rangle^2}{4}\right)^d\right], \tag{32}$$
where $v^{(1)},v^{(2)}$ are drawn independently from $\mathcal{X}^n$.

We consider a signal $x$ drawn from the sparse Rademacher prior, $\mathcal{X}^n = \mathcal{X}^n_\rho$. The goal of this section is to prove upper and lower bounds on the LDLR expressions in (29) and (32) as $n\to\infty$, for certain regimes of the parameters ($\lambda, \rho$ for the Wigner model and $\beta, \gamma, \rho$ for the Wishart model). These bounds are obtained in several steps. First, we treat the moment terms
$$A_d := (n\rho)^{2d}\,\mathbb{E}_{v^{(1)},v^{(2)}\sim\mathcal{X}^n_\rho}\langle v^{(1)},v^{(2)}\rangle^{2d} \tag{33}$$
from (29) and (32) in Section 4.2, with upper bounds given in Lemmas 4.4 and 4.5 and a lower bound given in Lemma 4.6. We then give a precise estimate in Lemma 4.7 of the coefficient
$$\sum_{\substack{d_1,\dots,d_N\\\sum d_i=d}}\prod_{i=1}^N\binom{2d_i}{d_i}$$
in the LDLR (32) of the Wishart model. Finally, by combining the above bounds, we show regimes of parameters under which the LDLR either remains bounded or diverges as $n\to\infty$. This yields the proofs of Theorems 2.14 and 2.15 for the Wishart model, and Theorems 2.24 and 2.25 for the Wigner model.

4.2 Introduction and Estimates of $A_d$

In this section, we carry out combinatorial estimates of the moments $A_d$ defined in (33), which appear in the LDLR expressions (29) and (32). We give upper bounds (Lemmas 4.4 and 4.5) and a lower bound (Lemma 4.6) on these moments.
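The moments $A_d$ of (33) admit a simple numerical sanity check, which we found useful while working with these formulas: $n\rho\,\langle v^{(1)},v^{(2)}\rangle$ is a sum of $n$ i.i.d. variables taking values in $\{-1,0,+1\}$ (each nonzero with probability $\rho^2/2$), so $A_d$ can be computed exactly by convolving this three-point distribution and compared against direct sampling from the prior. A small sketch with illustrative parameters (not part of the proofs):

```python
import numpy as np

def A_exact(n, rho, d):
    """A_d = (n*rho)^{2d} E<v1,v2>^{2d}, computed exactly: n*rho*<v1,v2> is a sum
    of n i.i.d. variables t_i in {-1,0,+1} with P(t_i = +1) = P(t_i = -1) = rho^2/2."""
    step = np.array([rho**2 / 2, 1 - rho**2, rho**2 / 2])
    p = np.array([1.0])
    for _ in range(n):
        p = np.convolve(p, step)        # distribution of t_1 + ... + t_i
    sums = np.arange(-n, n + 1)
    return float(np.sum(p * sums.astype(float) ** (2 * d)))

def A_monte_carlo(n, rho, d, m, rng):
    """Estimate A_d by sampling v1, v2 from the sparse Rademacher prior:
    each coordinate is +-1/sqrt(rho*n) with probability rho/2, else 0."""
    def sample(size):
        mask = rng.random((size, n)) < rho
        signs = rng.integers(0, 2, (size, n)) * 2 - 1
        return mask * signs / np.sqrt(rho * n)
    v1, v2 = sample(m), sample(m)
    inner = np.sum(v1 * v2, axis=1)
    return float(np.mean((n * rho * inner) ** (2 * d)))

rng = np.random.default_rng(3)
n, rho = 10, 0.5
exact_1 = A_exact(n, rho, 1)            # A_1 = n * rho^2
exact_2 = A_exact(n, rho, 2)
mc_2 = A_monte_carlo(n, rho, 2, 200_000, rng)
```

For $d=1$ the exact value reduces to $A_1 = n\rho^2$, and for larger $d$ the Monte Carlo estimate agrees with the convolution to within sampling error, which gives a quick check on the combinatorial expansion derived below.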
For independent $v^{(1)}, v^{(2)}$ drawn from the sparse Rademacher prior $\mathcal{X}^n_\rho$, $\langle v^{(1)},v^{(2)}\rangle$ has the same distribution as $S_{n,\rho} = \frac1n\sum_{i=1}^n s_{i,\rho}$ for i.i.d. $s_{i,\rho}$ with
$$s_{i,\rho} = \begin{cases} +1/\rho & \text{with probability } \rho^2/2,\\ -1/\rho & \text{with probability } \rho^2/2,\\ 0 & \text{with probability } 1-\rho^2, \end{cases} \tag{34}$$
and $k$th moment (for $k>0$) given by
$$\mathbb{E}\,s^k_{i,\rho} = \begin{cases} 0 & \text{for } k \text{ odd},\\ \rho^{2-k} & \text{for } k \text{ even}. \end{cases} \tag{35}$$
Therefore, the moments of $\langle v^{(1)},v^{(2)}\rangle$ have the combinatorial description
$$\mathbb{E}_{v^{(1)},v^{(2)}\sim\mathcal{X}^n_\rho}\langle v^{(1)},v^{(2)}\rangle^{2d} = \mathbb{E}\,S^{2d}_{n,\rho} = n^{-2d}\sum_{i_1,\dots,i_{2d}\in[n]}\mathbb{E}\,s_{i_1,\rho}s_{i_2,\rho}\cdots s_{i_{2d},\rho} = n^{-2d}\sum_{\substack{a_1,\dots,a_n\ge0\\\sum a_i=d}}\binom{2d}{2a_1\ \cdots\ 2a_n}\,\mathbb{E}\prod_{a_j>0}s^{2a_j}_{j,\rho} = n^{-2d}\sum_{\substack{a_1,\dots,a_n\ge0\\\sum a_i=d}}\binom{2d}{2a_1\ \cdots\ 2a_n}\prod_{a_j>0}\rho^{2-2a_j} = n^{-2d}\sum_{\substack{a_1,\dots,a_n\ge0\\\sum a_i=d}}\binom{2d}{2a_1\ \cdots\ 2a_n}\rho^{2|\{i:a_i>0\}|-2d}. \tag{36}$$
Recall, from (33), that
$$A_d = (n\rho)^{2d}\,\mathbb{E}_{v^{(1)},v^{(2)}\sim\mathcal{X}^n_\rho}\langle v^{(1)},v^{(2)}\rangle^{2d} = \sum_{\substack{a_1,\dots,a_n\ge0\\\sum a_i=d}}\binom{2d}{2a_1\ \cdots\ 2a_n}\rho^{2|\{i:a_i>0\}|}.$$
Fix $d\le D_n$, and let $1\le k\le d$ be the number of positive numbers among the $\{a_i\}$. Suppose $d = wk+r$, where $0\le r<k$. For positive integers $b_1, b_2, \dots, b_k$ such that $\sum b_i = d$, we claim that
$$\binom{2d}{2b_1\ \cdots\ 2b_k} \le \binom{2d}{2w\ \cdots\ 2w\ 2(w+1)\ \cdots\ 2(w+1)} =: M(k), \tag{37}$$
where the right-hand side has $k-r$ parts equal to $2w$ and $r$ parts equal to $2(w+1)$, and that equality holds if and only if $\{b_i\}$ consists of $r$ copies of $(w+1)$ and $k-r$ copies of $w$. This follows from the simple fact that, for any $1\le i,j\le k$ such that $b_i \ge b_j+2$, we have $(2b_i)!(2b_j)! > (2(b_i-1))!(2(b_j+1))!$, and therefore the left-hand side of the above inequality becomes strictly larger as we balance the $b_i$ by replacing $b_i$ and $b_j$ with $b_i-1$ and $b_j+1$. As a result, the left-hand side is maximized if and only if the maximum and minimum of $\{b_1, b_2, \dots, b_k\}$ differ by at most 1.
Now, since the total number of positive integer solutions to $\sum_{i=1}^k b_i = d$ is $\binom{d-1}{k-1}$, we have
$$A_d \le \sum_{k=1}^d\binom nk\binom{d-1}{k-1}\rho^{2k}M(k). \tag{38}$$
Before proceeding to bounds on the $A_d$, we introduce the following result, which will be useful in several estimates in this section.

Lemma 4.3. Suppose $D_n = o(n)$. Then, for sufficiently large $n$,
$$\frac{n(n-1)\cdots(n-D_n+1)}{n^{D_n}} > \frac12 e^{-D_n^2/n}.$$

Proof. By Stirling's formula, as $n\to\infty$ and $n-D_n\to\infty$ (which is ensured by $D_n = o(n)$),
$$\frac{n(n-1)\cdots(n-D_n+1)}{n^{D_n}} = \frac{n!}{(n-D_n)!\,n^{D_n}} \sim \frac{\sqrt{2\pi n}\,(n/e)^n}{\sqrt{2\pi(n-D_n)}\left(\frac{n-D_n}{e}\right)^{n-D_n}n^{D_n}} \sim \frac{1}{e^{D_n}}\left(1+\frac{D_n}{n-D_n}\right)^{n-D_n}.$$
Here the relation $\sim$ means that the quotient of the quantities on either side tends to 1 as $n\to\infty$. Since $D_n = o(n)$, for large enough $n$ such that $D_n < \frac n3$, we have
$$\log\left[\frac{1}{e^{D_n}}\left(1+\frac{D_n}{n-D_n}\right)^{n-D_n}\right] = -D_n + (n-D_n)\log\left(1+\frac{D_n}{n-D_n}\right) \ge -D_n + (n-D_n)\left(\frac{D_n}{n-D_n} - \frac{D_n^2}{2(n-D_n)^2}\right) \ge -\frac{D_n^2}{n},$$
and the lemma follows.

Lemma 4.4 (First upper bound on $A_d$). In the setting of the spiked Wishart or Wigner model with sparse Rademacher prior $\mathcal{X}^n_\rho$, suppose $D_n = o(n)$. If for some $\mu > 0$ we have
$$\rho \ge \max\left(1, \sqrt{\frac{1}{6\mu}}\right)\sqrt{\frac{D_n}{n}},$$
then for sufficiently large $n$ and for any $1\le d\le D_n$, $A_d$ defined by (33) satisfies
$$A_d \le \sum_{k=1}^dG(k) \le 2de^{\mu d+d^2/n}G(d) = 2de^{\mu d+d^2/n}\binom nd\frac{(2d)!}{2^d}\rho^{2d}, \tag{39}$$
where
$$G(k) := \binom nk\binom{d-1}{k-1}\rho^{2k}M(k).$$

Proof. Fix $1\le d\le D_n$. Recall that the first inequality in (39) is a restatement of (38). By a simple comparison argument, we observe that $M(k)$ is monotone increasing with respect to $k$. For any $1\le k<\frac d2$, we have
$$\frac{G(k+1)}{G(k)} = \frac{\binom n{k+1}\binom{d-1}k\rho^{2k+2}M(k+1)}{\binom nk\binom{d-1}{k-1}\rho^{2k}M(k)} \ge \frac{(n-k)(d-k)}{k(k+1)}\rho^2 \ge \frac{(n-d)(d/2)}{k(k+1)}\cdot\frac dn > 1$$
if $\rho \ge \sqrt{D_n/n}$.
Therefore,
$$\sum_{k<d/2}G(k) \le \left\lfloor\frac d2\right\rfloor\sum_{k\ge d/2}G(k).$$
Meanwhile, for $\frac d2\le k<d$, by (37),
$$M(k) = \max_{\substack{b_1,\dots,b_k>0\\\sum b_i=d}}\binom{2d}{2b_1\ \cdots\ 2b_k} = \binom{2d}{2\ \cdots\ 2\ 4\ \cdots\ 4} = \frac{(2d)!}{24^{d-k}\,2^{2k-d}},$$
since the maximum is attained when $\{b_i\}$ has $(d-k)$ occurrences of 2 and $(2k-d)$ occurrences of 1. As a result, with the help of Lemma 4.3,
$$\frac{G(k)}{G(d)} = \frac{\binom nk\binom{d-1}{k-1}\rho^{2k}M(k)}{\binom nd\rho^{2d}M(d)} \le \frac{1}{(6\rho^2)^{d-k}}\cdot\frac{\binom nk\binom dk}{\binom nd} \le \frac{1}{(6\rho^2)^{d-k}}\cdot\frac{d!}{(n-k)(n-k-1)\cdots(n-d+1)\,k!}\cdot\frac{d!}{k!(d-k)!} \le \frac{1}{(6\rho^2)^{d-k}}\cdot\frac{2e^{d^2/n}}{n^{d-k}}\cdot\left(\frac{d!}{k!}\right)^2\frac{1}{(d-k)!} \le 2e^{d^2/n}\left(\frac{d^2}{6\rho^2 n}\right)^{d-k}\cdot\frac{1}{(d-k)!}.$$
Thus for $\rho \ge \sqrt{\frac{D_n}{6\mu n}} \ge \sqrt{\frac{d}{6\mu n}}$ we have $\frac{d^2}{6\rho^2 n} \le \mu d$ and
$$\sum_{k=1}^dG(k) \le \left(\left\lfloor\frac d2\right\rfloor+1\right)\sum_{k\ge d/2}G(k) \le 2de^{d^2/n}\sum_{k\ge d/2}\frac{(\mu d)^{d-k}}{(d-k)!}G(d) \le 2de^{\mu d+d^2/n}G(d),$$
completing the proof.

Lemma 4.5 (Second upper bound on $A_d$). In the setting of the spiked Wishart or Wigner model with sparse Rademacher prior $\mathcal{X}^n_\rho$, suppose $D_n = o(n)$. If for some $\mu < 1/\sqrt3$ we have
$$\rho \ge \mu\sqrt{\frac{D_n}{n}},$$
then for sufficiently large $n$ and for any $11\le d\le D_n$,
$$A_d \le \sum_{k=1}^dG(k) \lesssim \sqrt d\,e^{d^2/n}\left(\frac{11e}{30}\right)^{d/2}\mu^{-2d}G(d) = \sqrt d\,e^{d^2/n}\left(\frac{11e}{30}\right)^{d/2}\mu^{-2d}\binom nd\frac{(2d)!}{2^d}\rho^{2d},$$
where $A_d$ is defined in (33) and $G(k)$ is defined as in Lemma 4.4.

Proof. As in the proof of Lemma 4.4, for sufficiently large $n$ and for any $1\le d\le D_n$ and any $1\le k<\frac d2$, we have
$$\frac{G(k+1)}{G(k)} \ge \frac{(n-k)(d-k)}{k(k+1)}\rho^2 \ge \frac{(n-k)(d-k)}{k(k+1)}\cdot\frac{\mu^2 d}{n} \ge \mu^2.$$
Therefore,
$$\sum_{k<d/2}G(k) \le \left(\mu^{-2}+\mu^{-4}+\cdots+\mu^{-2(\lceil d/2\rceil-1)}\right)G(\lceil d/2\rceil) \lesssim \mu^{-2\lceil d/2\rceil}\sum_{k\ge\lceil d/2\rceil}G(k).$$
Meanwhile, for $\frac d2\le k<d$,
$$\frac{G(k)}{G(d)} \le 2e^{d^2/n}\left(\frac{d^2}{6\rho^2 n}\right)^{d-k}\cdot\frac{1}{(d-k)!} \le 2e^{d^2/n}\left(\frac{d}{6\mu^2}\right)^{d-k}\cdot\frac{1}{(d-k)!}.$$
Summing these quantities, we find (the last inequality requiring $d \ge 11$)
\[ \frac{\sum_{k \ge d/2} G(k)}{G(d)} \le 2e^{d^2/n} \sum_{k \le d/2} \left(\frac{d}{6\mu^2}\right)^{k} \frac{1}{k!} \lesssim d\, e^{d^2/n} \left(\frac{d}{6\mu^2}\right)^{\lfloor d/2 \rfloor} \frac{1}{\lfloor d/2 \rfloor!} \lesssim \sqrt{d}\, e^{d^2/n} \left(\frac{ed}{6\mu^2 \lfloor d/2 \rfloor}\right)^{\lfloor d/2 \rfloor} \lesssim \sqrt{d}\, e^{d^2/n} \left(\frac{11e}{30}\right)^{\lfloor d/2 \rfloor} \mu^{-2\lfloor d/2 \rfloor}. \]
Here we have used the fact that $(d/6\mu^2)^k/k!$ is monotone increasing for $1 \le k \le d/2$, since $d/6\mu^2 > d/2$. Combining the two cases, we conclude that
\[ \frac{\sum_{k=1}^d G(k)}{G(d)} \lesssim \left(1 + \mu^{-2\lceil d/2 \rceil}\right) \frac{\sum_{k \ge d/2} G(k)}{G(d)} \lesssim \sqrt{d}\, e^{d^2/n} \left(\frac{11e}{30}\right)^{d/2} \mu^{-2d}, \]
completing the proof.

Lemma 4.6 (Lower bound on $A_d$). In the setting of the spiked Wishart or Wigner models with sparse Rademacher prior $\mathcal{X}_\rho^n$, consider a sequence $d = d_n = o(n)$ with integers $w = w_n$ satisfying $w_n \mid d_n$. Then, as $d/w \to \infty$, $A_d$ defined by (33) satisfies
\[ A_d \ge (1-o(1)) \binom{n}{d} \frac{(2d)!\,\sqrt{w}}{2^d} \left[2\left(\frac{d}{ne\rho^2}\right)^{1-\frac{1}{w}} \left(\frac{w}{(2w)!}\right)^{\frac{1}{w}}\right]^d \rho^{2d}. \]

Proof. To obtain a lower bound on $A_d$, we consider only the contribution to the sum from terms $\{a_i\}$ with $(d/w)$-many occurrences of $w$ and $(n - d/w)$-many occurrences of zero:
\[ \rho^{-2d} A_d = \sum_{\substack{a_1,\dots,a_n \ge 0 \\ \sum a_i = d}} \binom{2d}{2a_1, \dots, 2a_n} \rho^{2|\{i \,:\, a_i > 0\}| - 2d} \ge \binom{n}{d/w} \frac{(2d)!}{[(2w)!]^{d/w}}\, \rho^{2d/w - 2d} =: T_{n,d}(w). \]
Now, we calculate the ratio
\[ \frac{T_{n,d}(w)}{T_{n,d}(1)} = \frac{d!\,(n-d)!}{(d/w)!\,(n-d/w)!} \cdot \frac{2^d}{[(2w)!]^{d/w}}\, \rho^{2d/w - 2d} = \frac{d!/(d/w)!}{\left(n-\frac{d}{w}\right)\left(n-\frac{d}{w}-1\right)\cdots(n-d+1)} \left[\frac{2}{[(2w)!]^{1/w}\, \rho^{2(1-1/w)}}\right]^d. \tag{40} \]
By Stirling's formula, for $w$ fixed and $d$ sufficiently large,
\[ \frac{d!}{(d/w)!} = (1+o(1))\, \sqrt{w}\, \left(\frac{d}{e}\right)^{d - d/w} w^{d/w}. \]
Meanwhile,
\[ \left(n - \frac{d}{w}\right)\left(n-\frac{d}{w}-1\right)\cdots(n-d+1) \le n^{d - d/w}. \]
Plugging into (40) we get
\[ \frac{T_{n,d}(w)}{T_{n,d}(1)} \ge (1-o(1))\, \sqrt{w} \left[2\left(\frac{d}{ne\rho^2}\right)^{1-\frac{1}{w}} \left(\frac{w}{(2w)!}\right)^{\frac{1}{w}}\right]^d, \]
completing the proof.
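Since $T_{n,d}(w)$ in the proof of Lemma 4.6 collects only the terms $\{a_i\}$ of a single type, it must lower-bound the full sum $\rho^{-2d}A_d$; this containment can be verified by direct enumeration in tiny cases. An illustrative check only, with arbitrary small parameter values:

```python
from itertools import product
from math import comb, factorial, prod

# Full sum rho^{-2d} A_d over all a in Z_{>=0}^n with sum a_i = d.
def full_sum(n, d, rho):
    total = 0.0
    for a in product(range(d + 1), repeat=n):
        if sum(a) == d:
            m = factorial(2 * d) // prod(factorial(2 * ai) for ai in a)
            supp = sum(1 for ai in a if ai > 0)
            total += m * rho ** (2 * supp - 2 * d)
    return total

# Contribution T_{n,d}(w) of the a's with d/w occurrences of w, rest zero.
def T(n, d, w, rho):
    return comb(n, d // w) * factorial(2 * d) / factorial(2 * w) ** (d // w) * rho ** (2 * (d // w) - 2 * d)

for d, w in ((2, 1), (2, 2), (4, 2), (3, 3)):
    assert full_sum(5, d, 0.6) >= T(5, d, w, 0.6)
```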
4.3 The Wishart Model

In this section, we first carry out an estimate on the extra coefficient occurring in the Wishart LDLR (32) (Lemma 4.7), then use the bounds on $A_d$ (Lemmas 4.4, 4.5, and 4.6) to prove the upper bound (Theorem 2.14) and the lower bound (Theorem 2.15) on (32).

Lemma 4.7 (Bounds on coefficient in Wishart LDLR). Suppose $D_n = o(N)$. There exist absolute constants $c_1, c_2 > 0$ such that, for sufficiently large $N$ and for any $1 \le d \le D_n$,
\[ \frac{(2N)^d}{d!} \le \sum_{\substack{d_1,\dots,d_N \\ \sum d_i = d}} \prod_{i=1}^N \binom{2d_i}{d_i} \le c_1\, d^{3/2}\, e^{c_2 d^2/N}\, \frac{(2N)^d}{d!}. \tag{41} \]

Proof. For the lower bound, note that for any $d_i \ge 2$,
\[ \binom{2d_i}{d_i} = \frac{2d_i(2d_i-1)}{d_i^2} \binom{2(d_i-1)}{d_i-1} \ge \binom{2}{1} \binom{2(d_i-1)}{d_i-1}, \]
so for any $d_1, \dots, d_N \ge 0$ such that $\sum d_i = d$,
\[ \prod_{i=1}^N \binom{2d_i}{d_i} \ge 2^d. \]
Summing over all of the $\{d_i\}$, we find
\[ \sum_{\substack{d_1,\dots,d_N \\ \sum d_i = d}} \prod_{i=1}^N \binom{2d_i}{d_i} \ge \binom{N+d-1}{d} 2^d \ge \frac{(2N)^d}{d!}. \]
For the upper bound, we separately consider those $\{d_i\}$ with exactly $k$ positive entries for each $k = 1, 2, \dots, d$:
\[ \sum_{\substack{d_1,\dots,d_N \\ \sum d_i = d}} \prod_{i=1}^N \binom{2d_i}{d_i} = \sum_{k=1}^d \binom{N}{k} \sum_{\substack{c_1,\dots,c_k > 0 \\ \sum c_i = d}} \prod_{i=1}^k \binom{2c_i}{c_i} \le \sum_{k=1}^d \binom{N}{k}\binom{d-1}{k-1} \max_{\substack{c_1,\dots,c_k > 0 \\ \sum c_i = d}} \prod_{i=1}^k \binom{2c_i}{c_i}. \]
Given any positive integers $c_1, \dots, c_k$ such that $\sum c_i = d$, if there are two entries $c_j \ge c_\ell \ge 2$, consider $\tilde{c}_j = c_j + 1$, $\tilde{c}_\ell = c_\ell - 1$, and $\tilde{c}_i = c_i$ for all $i \ne j, \ell$. We have the comparison
\[ \frac{\prod_{i=1}^k \binom{2\tilde{c}_i}{\tilde{c}_i}}{\prod_{i=1}^k \binom{2c_i}{c_i}} = \frac{4 - \frac{2}{c_j+1}}{4 - \frac{2}{c_\ell}} > 1. \]
Therefore, for fixed $k$, the product $\prod_{i=1}^k \binom{2c_i}{c_i}$ is maximized when $\{c_i\}$ is composed of $(k-1)$ occurrences of 1 and one occurrence of $(d-k+1)$. As a result,
\[ \sum_{\substack{d_1,\dots,d_N \\ \sum d_i = d}} \prod_{i=1}^N \binom{2d_i}{d_i} \le \sum_{k=1}^d \binom{N}{k}\binom{d-1}{k-1} \cdot 2^{k-1} \binom{2(d-k+1)}{d-k+1} =: \sum_{k=1}^d S(k). \]
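The exchange argument above, which locates the maximizer of $\prod_i \binom{2c_i}{c_i}$ over positive compositions, can be confirmed by exhaustive search in small cases. This is an illustrative sanity check (with arbitrary small ranges), not part of the proof:

```python
from itertools import product
from math import comb, prod

# Brute-force max of prod C(2c_i, c_i) over positive c_1 + ... + c_k = d,
# compared against the claimed maximizer: (k-1) ones and one (d-k+1).
def max_prod(d, k):
    return max(
        prod(comb(2 * c, c) for c in cs)
        for cs in product(range(1, d + 1), repeat=k)
        if sum(cs) == d
    )

for d in range(1, 7):
    for k in range(1, d + 1):
        assert max_prod(d, k) == 2 ** (k - 1) * comb(2 * (d - k + 1), d - k + 1)
```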
Since
\[ S(k) = \binom{N}{k}\binom{d-1}{k-1} \cdot 2^{k-1}\binom{2(d-k+1)}{d-k+1} \lesssim \frac{N^k}{k!}\binom{d}{k} \cdot 2^k \binom{2(d-k)}{d-k} \tag{42} \]
and Stirling's formula gives
\[ k! \gtrsim \left(\frac{k}{e}\right)^k, \qquad \binom{d}{k} \lesssim \frac{d^d}{k^k (d-k)^{d-k}}, \qquad \binom{2(d-k)}{d-k} \lesssim 4^{d-k}, \qquad d! \lesssim \sqrt{d}\left(\frac{d}{e}\right)^d, \]
substituting into (42) and denoting $k = (1-\eta) d$, we find
\[ S(k)\left(\frac{(2N)^d}{d!}\right)^{-1} \lesssim \left(\frac{Ne}{k}\right)^k \cdot \frac{d^d}{k^k(d-k)^{d-k}} \cdot 2^{2d-k} \cdot \left[\frac{1}{\sqrt{d}}\left(\frac{2Ne}{d}\right)^d\right]^{-1} = \sqrt{d}\left(\frac{2d}{Ne\eta}\right)^{\eta d} \frac{1}{(1-\eta)^{2(1-\eta)d}}. \]
Now, for $\eta \in (0,1]$, denote
\[ h(\eta) := \left(\frac{2d}{Ne\eta}\right)^\eta \frac{1}{(1-\eta)^{2(1-\eta)}}. \]
Then,
\[ \sum_{\substack{d_1,\dots,d_N \\ \sum d_i = d}} \prod_{i=1}^N \binom{2d_i}{d_i} \le \sum_{k=1}^d S(k) \le d \max_{1 \le k \le d} S(k) \lesssim d^{3/2}\, \frac{(2N)^d}{d!} \left[\sup_{\eta \in (0,1]} h(\eta)\right]^d. \tag{43} \]
The last step is to evaluate $h(\eta)$. Note that
\[ \frac{d}{d\eta}\left[\log h(\eta)\right] = \log\left(\frac{2d}{N}\right) - \log(\eta) + 2\log(1-\eta). \]
Since $2d/N \le 2D_n/N = o(1)$ as $n \to \infty$, for large $N$ the unique maximizer of $h$ has the form $\eta^* = \eta^*(N,d) = (2-o(1))\frac{d}{N}$, and consequently
\[ \sup_{\eta \in (0,1]} h(\eta) = h(\eta^*) = \left(\frac{2}{e(2-o(1))}\right)^{\eta^*} (1-\eta^*)^{-2(1-\eta^*)} \le e^{4\eta^*(1-\eta^*)} \le e^{c_2 d/N}. \]
Substituting into (43) then completes the proof.

Proof of Theorem 2.14(a). Suppose (10) holds. Let $\mu = -\log \hat{\lambda}$. In the setting of Lemma 4.7, note that
\[ \lim_{n\to\infty} \frac{D_n}{N} = 0, \qquad \liminf_{n\to\infty}\left(-\frac{1}{2}\log \hat{\lambda}_n\right) = -\frac{1}{2}\log\left(\limsup_{n\to\infty} \hat{\lambda}_n\right) > 0. \]
For $n$ large enough that $\left(c_2 + \frac{1}{\gamma}\right)\frac{D_n}{N} < -\frac{1}{2}\log \hat{\lambda}_n$, applying Lemma 4.4 in the expression (32) yields
\[ \|L^{\le D}_{n,N,\beta,\mathcal{X}}\|_2^2 = \mathbb{E}_{v^{(1)},v^{(2)} \sim \mathcal{X}_\rho^n} \sum_{d=0}^{\lfloor D/2 \rfloor} \left[\sum_{\substack{d_1,\dots,d_N\\ \sum d_i = d}} \prod_{i=1}^N \binom{2d_i}{d_i}\right] \left[\frac{\beta^2 \langle v^{(1)}, v^{(2)}\rangle^2}{4}\right]^d \]
\[ \lesssim \sum_{d=1}^{\lfloor D/2\rfloor} \left[d^{3/2}\,\frac{(2N)^d}{d!}\, e^{c_2 d^2/N}\right]\left[\left(\frac{\beta^2}{4}\right)^d (n\rho)^{-2d} \cdot 2d\, e^{\mu d + d^2/n}\binom{n}{d}\frac{(2d)!}{2^d}\,\rho^{2d}\right] \le \sum_{d=1}^{\lfloor D/2\rfloor} 2 d^{5/2}\, e^{c_2 dD/N}\, \frac{(2d)!}{4^d (d!)^2}\left(e^{\mu + D/n}\, \frac{N}{n}\, \beta^2\right)^d \]
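The final step of Lemma 4.7, locating the maximizer of $h(\eta)$ near $\eta^* \approx 2d/N$ and bounding $\sup h \le e^{c_2 d/N}$, can be illustrated numerically. The values $d = 5$, $N = 10^4$, and the constant $8$ standing in for $c_2$ below are arbitrary choices for illustration, not quantities from the paper:

```python
import math

# h(eta) = (2d/(N*e*eta))^eta * (1-eta)^(-2(1-eta)); for 2d/N = o(1) its
# unique maximizer sits near eta* = 2d/N, with sup h <= exp(c2*d/N).
def h(eta, d, N):
    return (2 * d / (N * math.e * eta)) ** eta * (1 - eta) ** (-2 * (1 - eta))

d, N = 5, 10**4
grid = [i / 10**5 for i in range(1, 10**5)]
best = max(grid, key=lambda e: h(e, d, N))
assert abs(best - 2 * d / N) < 2 * d / N      # maximizer is near 2d/N
assert h(best, d, N) <= math.exp(8 * d / N)   # sup h <= e^{c2 d/N}, c2 = 8 here
```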
\[ \lesssim \sum_{d=1}^{\infty} d^2 \left(e^{(c_2+\frac{1}{\gamma})\frac{D}{N}}\, e^{\mu}\, \hat{\lambda}^2\right)^d \le \sum_{d=1}^\infty d^2 \left(\hat{\lambda}^{1/2}\right)^d = O(1), \]
where the last equality is by the assumption that $\limsup_{n\to\infty} \hat{\lambda}_n < 1$.

Proof of Theorem 2.14(b). If (11) holds, then $\hat{\lambda}_n \le 1/\sqrt{3}$ for sufficiently large $n$. In the setting of Lemma 4.7, suppose $\left(c_2 + \frac{1}{\gamma}\right)\frac{D}{N} < 0.001$. Then, substituting the estimates of Lemma 4.5 (taking $\mu = \hat{\lambda}$) into (32) gives
\[ \|L^{\le D}_{n,N,\beta,\mathcal{X}}\|_2^2 \lesssim \sum_{d=11}^{\lfloor D/2\rfloor} \left[d^{3/2}\,\frac{(2N)^d}{d!}\,e^{c_2 d^2/N}\right] \left[\left(\frac{\beta^2}{4}\right)^d (n\rho)^{-2d}\cdot \sqrt{d}\, e^{d^2/n}\left(\frac{11e}{30}\right)^{d/2} \hat{\lambda}^{-2d}\binom{n}{d}\frac{(2d)!}{2^d}\,\rho^{2d}\right] \]
\[ \lesssim \sum_{d=11}^{\lfloor D/2 \rfloor} d^2\, e^{(c_2+\frac{1}{\gamma})\frac{dD}{N}}\, \frac{(2d)!}{4^d(d!)^2}\left(\beta^2\, \frac{N}{n}\, \hat{\lambda}^{-2}\right)^d \left(\frac{11e}{30}\right)^{d/2} \lesssim \sum_{d=11}^\infty d^{3/2}\left(e^{0.001}\sqrt{\frac{11e}{30}}\right)^d = O(1), \]
completing the proof.

Proof of Theorem 2.15(a). In (36), counting only the terms $\{a_i\}$ with $d$ occurrences of 1 and $(n-d)$ occurrences of zero yields
\[ \mathbb{E}_{v^{(1)},v^{(2)}\sim\mathcal{X}_\rho^n} \langle v^{(1)}, v^{(2)}\rangle^{2d} \ge n^{-2d}\binom{n}{d}\frac{(2d)!}{2^d}. \tag{44} \]
From Lemma 4.3 we have that when $D_n = o(n)$, for sufficiently large $n$ and for any $1 \le d \le D_n$,
\[ \binom{n}{d} = \frac{n(n-1)\cdots(n-d+1)}{n^d}\cdot \frac{n^d}{d!} \ge \frac{1}{2}\,e^{-d^2/n}\,\frac{n^d}{d!} \ge \frac{1}{2}\,e^{-dD_n/n}\,\frac{n^d}{d!}. \tag{45} \]
Substituting Lemma 4.7, (44), and (45) into (32) yields, for sufficiently large $n$,
\[ \|L^{\le D}_{n,N,\beta,\mathcal{X}}\|_2^2 \ge \sum_{d=1}^{D_n} \frac{(2N)^d}{d!}\cdot \frac{\beta^{2d}}{4^d}\, n^{-2d}\binom{n}{d}\frac{(2d)!}{2^d} \gtrsim \sum_{d=1}^{D_n} \frac{(2d)!}{4^d(d!)^2}\left(\frac{\beta^2}{\gamma}\right)^d e^{-dD_n/n} \gtrsim \sum_{d=1}^{D_n}\frac{1}{\sqrt{d}}\left(\hat{\lambda}_n^2\, e^{-D_n/n}\right)^d \ge \sum_{d=1}^{D_n}\frac{1}{\sqrt{d}} = \omega(1), \]
since $D_n = \omega(1)$, $\liminf_{n\to\infty}\hat{\lambda}_n > 1$, and $e^{-D_n/n} \to 1$.

Lemma 4.8. Suppose $\omega(1) \le D_n \le o(n)$. If there exists a sequence of positive integers $w_n = o(\sqrt{D_n})$ such that
\[ \liminf_{n\to\infty}\ 2\hat{\lambda}_n^2 \left(\frac{D_n}{2ne\rho_n^2}\right)^{1-\frac{1}{w_n}}\left(\frac{w_n}{(2w_n)!}\right)^{\frac{1}{w_n}} > 1, \tag{46} \]
then $\|L^{\le D_n}_{n,N,\beta,\mathcal{X}}\|_2^2 \to \infty$ as $n \to \infty$.

Proof. If (46) holds, we can choose an $\epsilon > 0$ such that, for sufficiently large $n$,
\[ 2\hat{\lambda}_n^2\left(\frac{D_n}{2ne\rho_n^2}\right)^{1-\frac{1}{w_n}}\left(\frac{w_n}{(2w_n)!}\right)^{\frac{1}{w_n}} > 1 + \epsilon. \]
Let $n$ satisfy the above inequality. Pick $\mu \in (0,1)$ such that $\mu^{1-\frac{1}{w_n}}(1+\epsilon) > 1$, which implies
\[ 2\left(\frac{\mu D_n}{2ne\rho_n^2}\right)^{1-\frac{1}{w_n}}\left(\frac{w_n}{(2w_n)!}\right)^{\frac{1}{w_n}} > \frac{\gamma}{\beta^2}. \tag{47} \]
In the sum (32), we consider only those $d \in (\mu D_n/2, \lfloor D_n/2\rfloor]$ that are multiples of $w_n$. By Lemma 4.7, Lemma 4.6, and (47),
\[ \mathbb{E}_{v^{(1)},v^{(2)}\sim\mathcal{X}_\rho^n}\left[\sum_{\substack{d_1,\dots,d_N \\ \sum d_i = d}}\prod_{i=1}^N \binom{2d_i}{d_i}\right]\left(\frac{\beta^2\langle v^{(1)},v^{(2)}\rangle^2}{4}\right)^d \gtrsim \frac{(2N)^d}{d!}\left[\left(\frac{\beta^2}{4}\right)^d (n\rho)^{-2d}\binom{n}{d}\frac{(2d)!\,\sqrt{w_n}}{2^d}\,\rho^{2d}\cdot\left(\frac{\gamma}{\beta^2}\right)^d\right] \gtrsim \frac{(2d)!}{4^d(d!)^2}\left(\frac{N\gamma}{n}\right)^d \gtrsim \frac{1}{\sqrt{d}}. \]
Therefore,
\[ \|L^{\le D_n}_{n,N,\beta,\mathcal{X}}\|_2^2 \gtrsim \sum_{\substack{\mu D_n/2 < d \le \lfloor D_n/2\rfloor \\ w_n \mid d}} \frac{1}{\sqrt{d}} \gtrsim \frac{D_n/w_n}{\sqrt{D_n}} = \frac{\sqrt{D_n}}{w_n} = \omega(1), \]
since $w_n = o(\sqrt{D_n})$, completing the proof.

4.4 The Wigner Model

Proof of Theorem 2.24(a). Let $\mu = -\log\lambda$, and note that
\[ \liminf_{n\to\infty}\left(-\frac{1}{2}\log\lambda_n\right) = -\frac{1}{2}\log\left(\limsup_{n\to\infty}\lambda_n\right) > 0. \]
For $n$ large enough that $\frac{D_n}{n} < -\frac{1}{2}\log\lambda_n$, applying Lemma 4.4 in the expression (29) yields
\[ \|L^{\le D_n}_{n,\lambda,\mathcal{X}}\|_2^2 = \mathbb{E}_{v^{(1)},v^{(2)}\sim\mathcal{X}_\rho^n}\left[\sum_{d=0}^{D_n}\frac{1}{d!}\left(\frac{n\lambda^2}{2}\right)^d \langle v^{(1)},v^{(2)}\rangle^{2d}\right] \lesssim \sum_{d=1}^{D_n}\frac{1}{d!}\left(\frac{n\lambda^2}{2}\right)^d (n\rho)^{-2d}\cdot 2d\, e^{\mu d + d^2/n}\binom{n}{d}\frac{(2d)!}{2^d}\,\rho^{2d} \]
\[ \lesssim \sum_{d=1}^{D_n} d\,\frac{(2d)!}{4^d(d!)^2}\left(e^{\mu + D_n/n}\,\lambda^2\right)^d \lesssim \sum_{d=1}^{D_n}\sqrt{d}\left(\lambda^{1/2}\right)^d = O(1), \]
where the last equality is by the assumption $\limsup_{n\to\infty}\lambda_n < 1$.

Proof of Theorem 2.24(b). By the assumption (17), for sufficiently large $n$ we have $\lambda_n \le 1/\sqrt{3}$. Now, Theorem 2.24(b) immediately follows from Lemma 4.5 (taking $\mu = \lambda$), since for $n$ large enough that $\frac{D_n}{n} < 0.001$ we have
\[ \|L^{\le D_n}_{n,\lambda,\mathcal{X}}\|_2^2 \lesssim \sum_{d=11}^{D_n}\frac{1}{d!}\left(\frac{n\lambda^2}{2}\right)^d (n\rho)^{-2d}\cdot\sqrt{d}\, e^{d^2/n}\left(\frac{11e}{30}\right)^{d/2}\lambda^{-2d}\binom{n}{d}\frac{(2d)!}{2^d}\,\rho^{2d} \lesssim \sum_{d=11}^\infty \sqrt{d}\,\frac{(2d)!}{4^d(d!)^2}\, e^{d^2/n}\left(\frac{11e}{30}\right)^{d/2} \]
\[ \lesssim \sum_{d=11}^\infty\left(e^{D_n/n}\sqrt{\frac{11e}{30}}\right)^d \lesssim \sum_{d=11}^\infty\left(e^{0.001}\sqrt{\frac{11e}{30}}\right)^d = O(1), \]
completing the proof.

Proof of Theorem 2.25(a). Substituting (44) and (45) into (29) yields
\[ \|L^{\le D_n}_{n,\lambda,\mathcal{X}}\|_2^2 \ge \sum_{d=1}^{D_n}\frac{1}{d!}\left(\frac{n\lambda^2}{2}\right)^d n^{-2d}\binom{n}{d}\frac{(2d)!}{2^d} \gtrsim \sum_{d=1}^{D_n}\frac{(2d)!}{4^d(d!)^2}\,\lambda^{2d}\, e^{-dD_n/n} \gtrsim \sum_{d=1}^{D_n}\frac{1}{\sqrt{d}}\left(\lambda^2 e^{-D_n/n}\right)^d \ge \sum_{d=1}^{D_n}\frac{1}{\sqrt{d}} = \omega(1), \]
since $D_n = \omega(1)$, $\liminf_{n\to\infty}\lambda_n > 1$, and $e^{-D_n/n}\to 1$.
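The $O(1)$ conclusions in Theorems 2.14(b) and 2.24(b) rest on two pieces of arithmetic: the geometric ratio $e^{0.001}\sqrt{11e/30}$ is strictly below 1, and the central binomial factor $(2d)!/(4^d(d!)^2) = 4^{-d}\binom{2d}{d}$ is $O(1/\sqrt{d})$. Both are quickly confirmed numerically (an illustrative check only):

```python
import math

# Geometric ratio from the convergent series in Theorems 2.14(b)/2.24(b).
ratio = math.exp(0.001) * math.sqrt(11 * math.e / 30)
assert 0.999 < ratio < 1

# Central binomial factor: 4^{-d} C(2d, d) <= 1/sqrt(pi d).
for d in (1, 10, 100, 1000):
    central = math.comb(2 * d, d) / 4 ** d
    assert central <= 1 / math.sqrt(math.pi * d)
```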
Lemma 4.9. Suppose $\omega(1) \le D_n \le o(n)$. If there exists a sequence of positive integers $w_n = o(\sqrt{D_n})$ such that
\[ \liminf_{n\to\infty}\ 2\lambda_n^2\left(\frac{D_n}{ne\rho_n^2}\right)^{1-\frac{1}{w_n}}\left(\frac{w_n}{(2w_n)!}\right)^{\frac{1}{w_n}} > 1, \tag{49} \]
then $\|L^{\le D_n}_{n,\lambda,\mathcal{X}}\|_2^2 \to \infty$ as $n\to\infty$.

Proof. If (49) holds, we can choose an $\epsilon > 0$ such that, for sufficiently large $n$,
\[ 2\lambda_n^2\left(\frac{D_n}{ne\rho_n^2}\right)^{1-\frac{1}{w_n}}\left(\frac{w_n}{(2w_n)!}\right)^{\frac{1}{w_n}} > 1+\epsilon. \]
Let $n$ satisfy the above inequality. Pick $\mu\in(0,1)$ such that $\mu^{1-\frac{1}{w_n}}(1+\epsilon) > 1$. In the sum (29) we consider only those $d > \mu D_n$ that are multiples of $w_n$. For each of them, Lemma 4.6 gives
\[ \frac{1}{d!}\left(\frac{n\lambda_n^2}{2}\right)^d \mathbb{E}\langle v^{(1)},v^{(2)}\rangle^{2d} \gtrsim \frac{1}{d!}\left(\frac{n\lambda_n^2}{2}\right)^d n^{-2d}\binom{n}{d}\frac{(2d)!}{2^d}\left[2\left(\frac{d}{ne\rho^2}\right)^{1-\frac{1}{w_n}}\left(\frac{w_n}{(2w_n)!}\right)^{\frac{1}{w_n}}\right]^d \]
\[ \ge \frac{(2d)!}{4^d(d!)^2}\cdot\frac{n(n-1)\cdots(n-d+1)}{n^d}\cdot\left[2\lambda^2\left(\frac{\mu D_n}{ne\rho^2}\right)^{1-\frac{1}{w_n}}\left(\frac{w_n}{(2w_n)!}\right)^{\frac{1}{w_n}}\right]^d \gtrsim \frac{1}{\sqrt{d}}. \]
Therefore,
\[ \|L^{\le D_n}_{n,\lambda,\mathcal{X}}\|_2^2 \gtrsim \sum_{\substack{\mu D_n < d \le D_n \\ w_n \mid d}}\frac{1}{\sqrt{d}} \gtrsim \frac{D_n/w_n}{\sqrt{D_n}} = \frac{\sqrt{D_n}}{w_n} = \omega(1), \]
since $w_n = o(\sqrt{D_n})$, completing the proof.
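The divergence step closing Lemmas 4.8 and 4.9, namely that $\sum 1/\sqrt{d}$ over multiples of $w$ in $(\mu D, D]$ grows like $(1-\mu)\sqrt{D}/w$ and hence diverges when $w = o(\sqrt{D})$, can be illustrated numerically. The values of $\mu$, $D$, and $w$ below are arbitrary illustrative choices:

```python
import math

# Sum of 1/sqrt(d) over multiples of w in (mu*D, D], compared against the
# lower bound (1-mu)*sqrt(D)/(2w) used to conclude omega(1) growth.
def tail_sum(D, w, mu):
    return sum(1 / math.sqrt(d) for d in range(w, D + 1, w) if d > mu * D)

mu = 0.5
for D, w in ((10**4, 10), (10**6, 100)):
    assert tail_sum(D, w, mu) >= (1 - mu) * math.sqrt(D) / (2 * w)
```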
