Anisotropic local law for non-separable sample covariance matrices

We establish local laws for sample covariance matrices $K = N^{-1}\sum_{i=1}^N \g_i\g_i^*$ where the random vectors $\g_1, \ldots, \g_N \in \R^n$ are independent with common covariance $\Sigma$. Previous work has largely focused on the separable model $\g = \Sigma^{1/2} w$ with $w$ having independent entries, but this structure is rarely present in statistical applications involving dependent or nonlinearly transformed data. Under a concentration assumption for quadratic forms $\g^* A \g$, we prove an optimal averaged local law showing that the Stieltjes transform of $K$ converges to its deterministic limit uniformly down to the optimal scale $\eta \ge N^{-1+\varepsilon}$. Under an additional structural assumption on the cumulant tensors of $\g$, which interpolates between the highly structured case of independent entries and generic dependence, we establish the full anisotropic local law, providing entrywise control of the resolvent $(K - zI)^{-1}$ in arbitrary directions. We discuss several classes of non-separable examples satisfying our assumptions, including conditionally mean-zero distributions, the random features model $\g = \sigma(X w)$ arising in machine learning, and Gaussian measures with nonlinear tilting. The proofs introduce a tensor network framework for analyzing fluctuation averaging in the presence of higher-order cumulant structure.

Authors: Zhou Fan, Renyuan Ma, Elliot Paquette, Zhichao Wang

Contents

1. Introduction
   1.1. Sample covariance matrices and the deformed Marchenko-Pastur law
   1.2. Local laws: from global to optimal scale
   1.3. Beyond the separable model: the main question
   1.4. Main contributions
   1.5. Related work
   1.6. Notation
2. Main results
   2.1. Assumptions
   2.2. Deformed Marchenko-Pastur law and regular spectral domains
   2.3. Local law for the Stieltjes transform
   2.4. Anisotropic local law for the linearized resolvent
   2.5. Examples
   2.6. Negative examples
   2.7. Proof ideas
3. Fluctuation averaging lemmas
   3.1. Preliminaries
   3.2. Sherman-Morrison recursions
   3.3. Proof of Lemma 3.4
   3.4. Proof of Lemma 3.5
   3.5. Proof of Lemma 3.6
4. Proofs of the main results
   4.1. Resolvent bounds
   4.2. Proof of entrywise law (Theorem 2.5)
   4.3. Proof of anisotropic law (Theorem 2.8)
   4.4. Proof outside the spectrum (Theorem 2.10)
5. Analysis of examples
   5.1. Separable distributions
   5.2. Conditionally mean-zero distributions
   5.3. Random features model
   5.4. Random features tilt
Acknowledgments
References

Affiliations: Zhou Fan and Renyuan Ma, Department of Statistics and Data Science, Yale University; Elliot Paquette, Department of Mathematics and Statistics, McGill University; Zhichao Wang, Department of Statistics and ICSI, University of California, Berkeley.
E-mail addresses: zhou.fan@yale.edu, jack.ma.rm2545@yale.edu, elliot.paquette@mcgill.ca, zhichao.wang@berkeley.edu.

1. Introduction

1.1. Sample covariance matrices and the deformed Marchenko-Pastur law. Random matrices arise naturally across mathematics, statistics, and physics. In statistics, Wishart [Wis28] introduced random covariance matrices in 1928 for multivariate analysis, while in physics, Wigner [Wig55] employed random matrices in 1955 to model energy levels in heavy nuclei.

A fundamental random matrix in statistics involves the observation of $N$ independent random vectors $g_1, \ldots, g_N \in \R^n$ equal in distribution to a vector $g \in \R^n$ with mean zero and covariance matrix $\Sigma$. Understanding the spectral properties of the associated sample covariance matrix
$$K = \frac{1}{N}\sum_{i=1}^N g_i g_i^* = \frac{1}{N} GG^*, \qquad \text{where } G = [g_1, \ldots, g_N] \in \R^{n \times N},$$
is central to covariance estimation, principal component analysis, and numerous other statistical procedures [BS10]. The associated Gram matrix $\widetilde{K} = \frac{1}{N} G^* G \in \R^{N \times N}$ is central to kernel methods and has the same nonzero eigenvalues as $K$. Many problems in optimization theory involving random data, as well as related questions in machine learning, can also be reduced to understanding the singular value and singular vector structure of the data matrix $G$ [BHMM19; BLLT20; MM22; PPAP25]. Beyond classical statistics, the spectral theory of such matrices has found applications in wireless communications, where the channel capacity of MIMO systems is determined by the log-determinant $\frac{1}{n}\log\det(I + \mathrm{SNR} \cdot K)$ [TV04], as well as in kernel methods [El 10], random features regression [RR07], and the analysis of neural networks [PW17; JGH18]; see [CL22] for a comprehensive introduction to random matrix theory in machine learning.

In the high-dimensional regime where the dimension $n$ and sample size $N$ grow proportionally, $n/N \to \gamma \in (0, \infty)$, the classical theory initiated by Marchenko and Pastur [MP67] describes the limiting spectral distribution of $K$. When $\Sigma = I_n$ and $g$ has i.i.d. entries with mean zero and variance one, the empirical spectral distribution of $K$ converges weakly to the Marchenko-Pastur distribution with density
$$\rho_{\mathrm{MP}}(x) = (1 - 1/\gamma)_+ \,\delta_0(x) + \frac{1}{2\pi\gamma x}\sqrt{(x - \lambda_-)(\lambda_+ - x)}\,\mathbf{1}_{[\lambda_-, \lambda_+]}(x),$$
where $\lambda_\pm = (1 \pm \sqrt{\gamma})^2$ are the spectral edges.

For general covariance $\Sigma$, a deterministic approximation for the spectral distribution is the deformed Marchenko-Pastur law $\mu_0$, characterized through its Stieltjes transform $m_0(z) = \int (x-z)^{-1}\,d\mu_0(x)$, which satisfies the self-consistent equation
$$m_0(z) = \frac{1}{n}\sum_{\alpha=1}^n \frac{1}{\sigma_\alpha(1 - \gamma - \gamma z m_0(z)) - z}, \tag{1.1}$$
where $\sigma_1, \ldots, \sigma_n$ are the eigenvalues of $\Sigma$. This law can also be understood as the multiplicative free convolution of the standard Marchenko-Pastur law with the empirical eigenvalue distribution of $\Sigma$ [Voi87].

A sufficient condition for the convergence of the empirical spectral distribution of $K$ to the deformed Marchenko-Pastur law was established by Bai and Zhou [BZ08]: the weak convergence in probability of quadratic forms
$$\frac{1}{n}\left(g^* A g - \operatorname{Tr} \Sigma A\right) \xrightarrow{P} 0 \tag{1.2}$$
for all matrices $A$ with bounded operator norm. This condition is anticipated by the classical literature, as it is essentially all that is needed for the arguments of [Sil95]; see also [Yas16] for a necessary and sufficient formulation in the isotropic case. The condition is also readily verified in simple cases, most notably in the separable (or linear) model where $g = \Sigma^{1/2} w$ for a vector $w$ with i.i.d. standard entries, the setting of the foundational work of Silverstein and Bai [SB95].

1.2. Local laws: from global to optimal scale. While the convergence of the empirical spectral distribution provides a global description of the spectrum, many applications in random matrix theory, including universality of local eigenvalue statistics, eigenvalue rigidity, and eigenvector delocalization, require understanding the spectrum at local scales, down to the scale of individual eigenvalue spacings. An averaged local law asserts that the Stieltjes transform $m(z) = n^{-1}\operatorname{Tr}(K - zI)^{-1}$ remains close to its deterministic approximation $m_0(z)$ uniformly over spectral parameters $z = E + i\eta$ with $\eta$ as small as $N^{-1+\varepsilon}$ for any $\varepsilon > 0$. Specifically, in the setting where $g$ has identity covariance and satisfies concentration of quadratic forms, the celebrated work of Pillai and Yin [PY14] established
$$|m(z) - m_0(z)| \prec (N\eta)^{-1} \tag{1.3}$$
uniformly over $z$ in suitable regular spectral domains extending down to the optimal scale $\eta \ge N^{-1+\varepsilon}$.
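As a purely illustrative numerical sketch (not part of the paper's argument), the self-consistent equation (1.1) can be solved by a damped fixed-point iteration and compared against a simulated sample covariance matrix; the dimensions, the choice of $\Sigma$, and the spectral parameter below are arbitrary.

```python
import numpy as np

def m0(z, sigma, N, tol=1e-10, max_iter=5000):
    """Solve the self-consistent equation (1.1) for the Stieltjes transform
    of the deformed Marchenko-Pastur law by damped fixed-point iteration.
    `sigma` holds the eigenvalues of the population covariance Sigma."""
    n = len(sigma)
    gamma = n / N
    m = -1.0 / z  # Stieltjes transform of a point mass: a valid starting point
    for _ in range(max_iter):
        m_new = np.mean(1.0 / (sigma * (1.0 - gamma - gamma * z * m) - z))
        if abs(m_new - m) < tol:
            return m_new
        m = 0.5 * m + 0.5 * m_new  # damping stabilizes the iteration
    return m

# Compare with the empirical Stieltjes transform of a simulated K (a separable
# model is used here only to generate data with the prescribed covariance).
rng = np.random.default_rng(0)
n, N = 400, 1200
sigma = np.linspace(0.5, 2.0, n)                 # eigenvalues of Sigma
G = np.sqrt(sigma)[:, None] * rng.standard_normal((n, N))
evals = np.linalg.eigvalsh(G @ G.T / N)
z = 1.0 + 0.5j                                   # spectral parameter in C+
m_emp = np.mean(1.0 / (evals - z))
m_det = m0(z, sigma, N)
print(abs(m_emp - m_det))                        # small: the two transforms agree
```

At this (macroscopic) value of $\eta$ the discrepancy is of order $(N\eta)^{-1}$; the local law asserts the same quality of approximation all the way down to $\eta \sim N^{-1+\varepsilon}$.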
(The $\prec$ notation denotes stochastic domination: the inequality holds up to a multiplicative factor $N^\varepsilon$, outside an event of probability smaller than any fixed power of $N^{-1}$; see Section 3 for details.)

For many applications, including eigenvector delocalization, the distribution of eigenvectors, and the analysis of finite-rank deformations, the averaged local law is insufficient. This leads to the isotropic local law [BEK+14] and, more generally, the anisotropic local law. For the separable model $g = \Sigma^{1/2} w$, Knowles and Yin [KY17], building on [BEK+14] in the case $\Sigma = I_n$, established that
$$|u^* R(z) v - u^* \Pi(z) v| \prec \Psi(z) \qquad \text{for all unit vectors } u, v \in \C^n, \tag{1.4}$$
where $\Pi(z) = (-zI - z\widetilde{m}_0(z)\Sigma)^{-1}$, $\Psi(z) = \sqrt{\operatorname{Im} \widetilde{m}_0(z)/(N\eta)} + (N\eta)^{-1}$ is the optimal error parameter, and $\widetilde{m}_0(z)$ is the Stieltjes transform of the limiting spectral distribution of the Gram matrix $\widetilde{K} = N^{-1} G^* G$.

The anisotropic local law has profound implications: it yields eigenvector delocalization in any fixed basis (not just the standard basis) and eigenvalue rigidity at the optimal scale. Combined with a comparison argument, it leads to Tracy-Widom fluctuations for edge eigenvalues and bulk universality of local eigenvalue statistics [LSSY16; LS16]. These results have made the anisotropic local law a cornerstone of modern random matrix theory.

1.3. Beyond the separable model: the main question. The separable model $g = \Sigma^{1/2} w$ with independent entries in $w$ is restrictive in practice. Many statistical and machine learning applications naturally involve random vectors with more complex dependence structures. This brings us to the central question of the present work:

Do optimal local laws hold for distributions of $g$ beyond the separable setting?
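To make the objects in (1.4) concrete before turning to this question: in the separable Gaussian case, $\Pi(z)$ is diagonal in the eigenbasis of $\Sigma$, and one can compare bilinear forms of the resolvent against it directly. The following is an illustrative sketch only; the sizes, the test vectors $u, v$, and the choice of $\Sigma$ are arbitrary.

```python
import numpy as np

def tilde_m0(z, sigma, N, n_iter=2000):
    """Solve the Marchenko-Pastur equation (2.4) for the Stieltjes transform
    tilde-m_0 of the limiting Gram-matrix law by fixed-point iteration."""
    m = -1.0 / z
    for _ in range(n_iter):
        m = 1.0 / (-z + np.sum(sigma / (1.0 + sigma * m)) / N)
    return m

rng = np.random.default_rng(1)
n, N = 300, 900
sigma = np.linspace(0.5, 2.0, n)    # eigenvalues of Sigma (diagonal here)
z = 1.0 + 0.3j

# Separable Gaussian sample: K = GG^*/N with G = Sigma^{1/2} W
G = np.sqrt(sigma)[:, None] * rng.standard_normal((n, N))
R = np.linalg.inv(G @ G.T / N - z * np.eye(n))      # resolvent R(z)

# Deterministic equivalent Pi(z) = (-z I - z tilde-m_0(z) Sigma)^{-1}
mt = tilde_m0(z, sigma, N)
Pi = np.diag(1.0 / (-z * (1.0 + mt * sigma)))

u = rng.standard_normal(n); u /= np.linalg.norm(u)
v = rng.standard_normal(n); v /= np.linalg.norm(v)
err = abs(u @ R @ v - u @ Pi @ v)
print(err)   # of order Psi(z) ~ sqrt(Im tilde-m_0 / (N eta)): well below 0.1
```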
There are many possible answers to this question, depending on the class of distributions under consideration. One important class is the random features model
$$g = \sigma(Xw), \tag{1.5}$$
where $X \in \R^{n \times d}$ is a deterministic feature matrix, $w \in \R^d$ is a random vector with independent entries, and $\sigma: \R \to \R$ is a nonlinear activation function applied entrywise. This model arises naturally in the study of random features regression, which is a statistical approximation of kernel methods [HL22; LLC18; MM22; HMRT22], and of linear approximants for neural networks [PW17; FW20; BP21]. We discuss applications and related work on random features models in Section 1.5, with further details in Section 2.5.

The weak convergence condition (1.2) is not sufficient for the local law to hold at optimal scales: a quantitative strengthening, in terms of the rate of concentration in (1.2), can be seen to be necessary for an averaged local law to hold (see Section 2.5). A first contribution of our work is to show that such a strengthening is also sufficient for the averaged local law. In brief, we work in a standard high-dimensional regime where $n/N$ remains bounded and $\|\Sigma\|_{\mathrm{op}} \le C$, and we assume the strong concentration
$$g^* A g - \operatorname{Tr} \Sigma A \prec \|A\|_F$$
for all deterministic matrices $A$ (see Assumptions 1 and 2 in Section 2.1 for the precise formulation). This concentration is known to hold in the separable case under moment bounds on $w$, and also for vectors satisfying a log-Sobolev inequality [Ada15] or with log-concave distributions [BX25]. We show that the averaged local law holds under this assumption alone (Theorem 2.5).

To establish the full anisotropic local law, which provides control of the resolvent in arbitrary directions, our analysis requires additional structure on the higher-order cumulants of $g$.
We introduce Assumption 3 in Section 2.1, a condition on the cumulant tensors of $g$ that interpolates between the highly structured case of independent entries (where cumulant tensors are diagonal) and generic tensors (where no such structure exists). This assumption is significantly more general than requiring a separable model, yet remains verifiable for important examples including the random features model (1.5).

1.4. Main contributions. The main contributions of this paper are:

(i) Averaged local law under concentration of quadratic forms (Theorem 2.5): Under Assumptions 1 and 2, we establish the optimal averaged local law (1.3), together with an entrywise local law for the Gram matrix resolvent $\widetilde{R}(z)$. This extends the results of [PY14] to the case of non-identity covariance.

(ii) Anisotropic local law under cumulant structure (Theorem 2.8): Under the additional Assumption 3 on cumulant tensors, we establish the optimal anisotropic local law (1.4) for the linearized resolvent (2.15). This extends the results of [KY17] beyond the separable model, providing the first anisotropic local law down to optimal spectral scales $\eta \sim N^{-1+\varepsilon}$ for a general class of non-separable sample covariance matrices.

(iii) Verification for non-separable examples (Section 2.5): We verify Assumptions 2 and 3 for several classes of non-separable distributions, including conditionally mean-zero distributions (Proposition 2.13), the random features model $g = \sigma(Xw)$ (Proposition 2.17), and Gaussian measures with random features tilting (Proposition 2.18).

(iv) Tensor network methodology: Our proofs employ a novel tensor network framework for analyzing fluctuation averaging in the presence of higher-order cumulant structure.
This framework, which systematically reduces complex moment expansions to graph-theoretic bounds, may be of independent interest for other problems in random matrix theory. We give a brief introduction in Section 2.7.

As corollaries, we obtain eigenvalue rigidity at the optimal scale and delocalization of eigenvectors of both $K$ and $\widetilde{K}$ in any fixed basis (see Section 2.4 for more details).

1.5. Related work.

Random features and Gaussian equivalence. The random features model (1.5) was introduced by Rahimi and Recht [RR07] as a computationally efficient approximation to kernel methods, and is equivalent to a two-layer neural network with frozen first-layer weights and trainable readout. A remarkable phenomenon, observed empirically and established rigorously in several settings [HMRT22; MM22; MRSY19; GMKZ20; GLK+20; HL22], is that the random features model is asymptotically equivalent to a surrogate linear Gaussian model: in the proportional limit, the training and generalization errors of the nonlinear model match those of a Gaussian equivalent. The anisotropic local law established in this paper (Theorem 2.8) can be viewed as a strong form of Gaussian equivalence at the level of the resolvent: it asserts that $(K - zI)^{-1}$ is well-approximated at optimal scales by a deterministic matrix $\Pi(z)$ depending only on the first two moments of $g$. This extends naturally to spectral filters $\varphi(K)$ for functions $\varphi$ smooth on any scale larger than the mean eigenvalue spacing. The model also extends to deep architectures with frozen intermediate layers [SCDL23]; our cumulant assumption is designed to accommodate such compositional structure, though verifying it for deep models with many layers remains an open problem.

Covariance estimation and concentration.
The estimation of covariance matrices in high dimensions has a vast literature; see [BS10; Ver12] for comprehensive treatments. A central question is: what conditions on the distribution of $g$ ensure that the sample covariance $K = N^{-1}\sum_{i=1}^N g_i g_i^*$ concentrates around $\Sigma$ in operator norm? For log-concave distributions, optimal rates $\|K - \Sigma\|_{\mathrm{op}} = O(\sqrt{n/N})$ were established by Adamczak, Litvak, Pajor, and Tomczak-Jaegermann [ALPT10]. The work of Srivastava and Vershynin [SV13] showed that if all $k$-dimensional marginals of the whitened vector $\Sigma^{-1/2} g$ have $(2+\varepsilon)$-moment tails decaying like $t^{-(1+\eta)}$ for $t > Ck$, then $N = O(n)$ samples suffice for operator norm concentration. This was refined by Tikhomirov [Tik18], who obtained the optimal Bai-Yin rate under $L^p$-$L^2$ norm equivalence for $p > 4$; these hypotheses give moment growth estimates for linear forms, i.e. uniform control of moments of $\langle u, g\rangle$, in place of our quadratic forms $g^* A g$. Most recently, Abdalla and Zhivotovskiy [AZ24] proved a non-asymptotic, dimension-free Bai-Yin theorem: if $\|g\|_2 \le N^{1/2+\varepsilon}$ and one-dimensional marginals satisfy $L^p$-$L^2$ norm equivalence for some $p > 4$, then the sample covariance achieves the optimal rate depending on the effective rank rather than the ambient dimension.

Hence, linear form estimates tend to be sufficient for the covariance estimation problem. Importantly, boundedness of linear forms does not imply concentration of quadratic forms, even in the weak sense of (1.2). A simple counterexample is the mixture model: let $g = \xi \cdot z$ where $z \sim \mathcal{N}(0, I_n)$ is Gaussian and $\xi$ is an independent bounded scalar variable with unit second moment. Then $\mathbb{E} g = 0$ and $\mathbb{E} g g^* = I_n$, and linear forms $\langle u, g\rangle$ are sub-Gaussian.
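A quick simulation illustrates this gap between linear and quadratic forms. The specific choice $\xi^2 \in \{1/2, 3/2\}$ with equal probability (so that $\mathbb{E}\xi^2 = 1$) is ours, made only so the effect is visible numerically.

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 400, 2000

# Mixture model: g = xi * z with z ~ N(0, I_n), xi independent, E[xi^2] = 1
xi2 = rng.choice([0.5, 1.5], size=trials)        # xi^2: bounded, mean 1
Z = rng.standard_normal((trials, n))

# Quadratic form with A = I_n: g^* A g - Tr(Sigma A) = xi^2 ||z||^2 - n
quad_mix = xi2 * (Z**2).sum(axis=1) - n
quad_gauss = (Z**2).sum(axis=1) - n              # pure Gaussian comparison

# ||A||_F = sqrt(n): Gaussian fluctuations are O(sqrt(n)) as in Assumption 2,
# but the mixture fluctuations are O(n) because of Var(xi^2)
print(np.std(quad_gauss) / np.sqrt(n))           # ~ sqrt(2)
print(np.std(quad_mix) / np.sqrt(n))             # ~ 0.5 * sqrt(n): much larger

# Linear forms remain sub-Gaussian: <u, g> = xi <u, z> has unit variance
u = np.ones(n) / np.sqrt(n)
lin = np.sqrt(xi2) * (Z @ u)
print(np.std(lin))                               # ~ 1
```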
However, the variance of $\xi^2$ may induce fluctuations in the quadratic form $g^* A g = \xi^2 z^* A z$ that are large enough that the standard Marchenko-Pastur law fails to be a deterministic equivalent for the spectral distribution (see Example 2.16 for further discussion).

Local laws for Wigner matrices and generalizations. The development of local laws began with Wigner matrices (symmetric matrices with i.i.d. entries), where the semicircle law holds globally. Pioneering work of Erdős, Schlein, and Yau [ESY09a; ESY09b] established local semicircle laws down to scales $\eta \sim N^{-1+\varepsilon}$. This was subsequently extended to generalized Wigner matrices and models of mild sparsity [EYY12a; EKYY13b; EKY13; EKYY13a]. A powerful generalization to less homogeneous matrices is the matrix Dyson equation (MDE) formalism developed by Ajanki, Erdős, and Krüger [AEK17; AEK19]. For a random matrix $H$ with mean $A = \mathbb{E} H$ and covariance operator $\mathcal{S}[R] = \mathbb{E}(H - A) R (H - A)$, the resolvent $G = (H - zI)^{-1}$ is approximated by a deterministic matrix $M$ satisfying
$$I + (zI - A + \mathcal{S}[M]) M = 0.$$
This framework accommodates matrices with correlated entries (provided the correlations decay sufficiently fast), non-identical variances, and non-zero means.

The characteristic flow method. A complementary approach to local laws proceeds dynamically: one interpolates the given matrix toward a Gaussian ensemble by running a Dyson Brownian motion, and studies how the resolvent evolves along this flow.
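As a sketch of the MDE in its simplest instance, consider the Wigner case $A = 0$ with the toy covariance operator $\mathcal{S}[R] = (n^{-1}\operatorname{Tr} R)\, I$ (our choice for illustration, not an example from this paper); the MDE then reduces to the scalar semicircle equation $m^2 + zm + 1 = 0$, which the matrix fixed-point iteration recovers.

```python
import numpy as np

def solve_mde(z, n=50, n_iter=2000):
    """Solve I + (zI - A + S[M])M = 0 for the toy Wigner case A = 0,
    S[R] = (Tr R / n) I, by damped fixed-point iteration on M."""
    I = np.eye(n, dtype=complex)
    M = (-1.0 / z) * I                       # starting point in the upper half-plane
    for _ in range(n_iter):
        S_M = (np.trace(M) / n) * I          # covariance operator S[M]
        M_new = -np.linalg.inv(z * I + S_M)  # MDE rearranged as M = -(zI + S[M])^{-1}
        M = 0.5 * M + 0.5 * M_new            # damping
    return M

z = 0.5 + 0.5j
M = solve_mde(z)
m = np.trace(M) / M.shape[0]

# For this S, the MDE reduces to m^2 + z m + 1 = 0: the semicircle law
roots = np.roots([1, z, 1])
m_sc = roots[np.argmax(roots.imag)]          # the root in the upper half-plane
print(abs(m - m_sc))                         # ~ 0
```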
The key observation, going back to Pastur [Pas72], is that the Stieltjes transform of the empirical spectral measure satisfies a transport equation whose characteristics are curves in the upper half-plane contracting away from the spectral support as time increases; this allows one to propagate estimates that hold at larger scales down to the optimal shorter scales $\eta \sim N^{-1+\varepsilon}$. This method was pioneered for local laws by [HL19; AH20] in the context of $\beta$-ensembles, and [Bou22; Ben20] used related stochastic advection equations to study extreme gap statistics and eigenvector statistics, respectively. The method has also been extended beyond the self-adjoint setting: [BF22] employed it in the context of non-Hermitian dynamics, and [CES24] introduced the zig-zag method, which applies the characteristic flow iteratively to prove multi-resolvent local laws by alternating short flow steps with Green's function comparison estimates; this method has also been applied to generalized Wigner models [ER24], which are closely related to the linearized resolvents of sample covariance matrices. Although the present paper does not use the characteristic flow method, identifying the correct assumptions on non-separable random vectors that would allow one to run this dynamical argument remains interesting for future work, and we expect it would be similar in spirit to what we need here. See [SW19] for a short and elegant dynamical proof of a local semicircle law via the characteristic flow method.

Local laws for sample covariance and Gram matrices. For sample covariance matrices with identity covariance, the averaged local law program was initiated by Pillai and Yin [PY14] (see also [BPZ15]). The work of Knowles and Yin [KY17] established the anisotropic local law for separable covariance matrices (cf. Proposition 2.12) via a strategy closely related to the method here, though they use a sophisticated interpolation argument to prove the full anisotropic local law. Alt, Erdős, and Krüger extended these results to general Gram matrices and correlated sample covariance models using the Dyson equation approach [AEK18; AEK20; AEKS21]. An important statistical application of these local laws is to edge universality: Lee and Schnelli [LS16] used local laws as input to establish Tracy-Widom fluctuations for the largest eigenvalue of sample covariance matrices with general population covariance, via a Green function comparison argument. This was extended to Tracy-Widom fluctuations at each regular edge of the spectrum by Fan and Johnstone [FJ22].

Polynomial scaling regimes and spike eigenvalues. The proportional regime $n \asymp N$ considered in this paper is not the only scaling of interest. In the polynomial regime $N \asymp d^\ell$ for integer $\ell \ge 1$, limiting spectral distributions of random inner-product kernel matrices have been analyzed by [LY25; XHM+22]; universality for general distributions was established in [DLMY23; PWZ25]. Polynomial scalings of random features regression have also been studied [HLM24; DLM24], revealing subtleties such as the failure of Gaussian equivalence in the quadratic scaling regime [WHL+25]. Beyond the bulk spectrum, "spike" eigenvalues separated from the bulk are often the primary features of interest. The celebrated Baik-Ben Arous-Péché (BBP) phase transition [BBP05] describes when spike eigenvalues emerge from the bulk in spiked covariance models. For the separable model $g = \Sigma^{1/2} w$, these phenomena are well understood [BY12; BN12].
The work of Wang, Wu, and Fan [WWF24] extends this to nonlinear models, characterizing how spiked structure in input data propagates through neural network layers and establishing a Gaussian equivalence for the spike eigenvalue behavior of non-separable sample covariance matrices.

1.6. Notation. We use $C, c$ for positive constants that may change from line to line but depend only on fixed model parameters. The notation $a \asymp b$ means $c \le a/b \le C$ for constants $c, C > 0$. We write $\|\cdot\|_{\mathrm{op}}$ and $\|\cdot\|_F$ for the operator (spectral) norm and Frobenius norm of matrices, and $\|\cdot\|_2$ for the Euclidean norm of vectors. For a general tensor $T \in (\R^n)^{\otimes k}$, we write $\|T\|_\infty$ for the entrywise $\ell^\infty$ (maximum) norm, and $\|T\|_F$ for its Frobenius tensor norm, i.e. the $\|\cdot\|_2$-norm of its vectorization. We write $X \prec Y$ to denote stochastic domination: for any $\varepsilon, D > 0$, there exists $C \equiv C(\varepsilon, D) > 0$ such that $\mathbb{P}[|X| > N^\varepsilon Y] \le C N^{-D}$ for all $N \ge 1$. Properties of this notation are further reviewed in Section 3.

2. Main results

2.1. Assumptions. We work in the following standard high-dimensional setting.

Assumption 1 (Basic assumptions). There exist constants $C, c > 0$ such that $c \le n/N \le C$, $\|\Sigma\|_{\mathrm{op}} \le C$, and $\Sigma$ is positive semidefinite with at most $(1-c)n$ of its eigenvalues belonging to $[0, c]$.

The central assumption we make on quadratic forms is the following.

Assumption 2 (Concentration of quadratic forms). $g_1, \ldots, g_N$ are independent random vectors satisfying $\mathbb{E} g_i = 0$, $\mathbb{E} g_i g_i^* = \Sigma$, and $\mathbb{E}\|g_i\|_2^k \le n^{C_k}$ for each $k \ge 1$ and constants $C_k > 0$. Furthermore, for any $\varepsilon, D > 0$, there exists a constant $C \equiv C(\varepsilon, D) > 0$ such that for each $i = 1, \ldots, N$ and any $A \in \R^{n \times n}$,
$$\mathbb{P}\big[|g_i^* A g_i - \operatorname{Tr} \Sigma A| \ge n^\varepsilon \|A\|_F\big] \le C n^{-D}.$$

Assumption 2 can equivalently be formulated in terms of $L^p$ bounds on the re-centered quadratic form $g_i^* A g_i - \operatorname{Tr} \Sigma A$, by Markov's inequality.
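As an illustrative numerical check of Assumption 2 in the simplest Gaussian case (all sizes and the test matrix $A$ below are arbitrary choices), the fluctuations of $g^* A g - \operatorname{Tr}\Sigma A$ are indeed of order $\|A\|_F$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials = 300, 2000

# Diagonal population covariance Sigma (chosen diagonal for simplicity)
sigma = np.linspace(0.5, 2.0, n)
A = rng.standard_normal((n, n)) / n          # arbitrary test matrix

# Gaussian vectors g ~ N(0, Sigma); Assumption 2 predicts fluctuations of
# g^* A g around Tr(Sigma A) of size O(||A||_F), up to n^eps factors
G = np.sqrt(sigma) * rng.standard_normal((trials, n))
quad = np.sum((G @ A) * G, axis=1)           # g_t^* A g_t for each trial t
center = np.sum(sigma * np.diag(A))          # Tr(Sigma A) for diagonal Sigma
fro = np.linalg.norm(A)                      # ||A||_F

print(abs(np.mean(quad) - center))           # mean matches Tr(Sigma A)
print(np.std(quad) / fro)                    # O(1): concentration at scale ||A||_F
```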
It is known to hold in the separable case g i = Σ 1 / 2 w i under momen t bounds on the en tries of w i , and also for vectors satisfying a con vex concentration inequalit y with at most logarithmic dep endence on the dimension [ Ada15 ], including isotropic log- conca v e random vectors [ BX25 ]. Under Assumptions 1 and 2 , we will establish an optimal lo cal law for the Stieltjes transforms of K and e K and an en trywise lo cal la w for the resolven t of e K (c.f. Theorem 2.5 ) ov er sp ectral domains of C + satisfying suitable regularit y conditions. W e will then establish an optimal anisotropic lo cal law for the resolven ts of K and e K (c.f. Theorem 2.8 ) under an additional assumption for the higher-order cumulan t tensors of g 1 , . . . , g N . T o form ulate this assumption, we introduce the following definition. Definition 2.1 ( U -norm) . F or an y subset U ⊆ R n con taining all standard basis v ectors e 1 , . . . , e n , and for an y k ≥ 1, w e define a norm on tensors T ∈ ( R n ) ⊗ k b y ∥ T ∥ U = sup x 1 ,..., x k ∈U |⟨ x 1 ⊗ · · · ⊗ x k , T ⟩| . 8 ANISOTROPIC LOCAL LA W F OR NON-SEP ARABLE SAMPLE CO V ARIANCE MA TRICES Note that for U con taining the standard basis v ectors e 1 , . . . , e n and ha ving p olynomial cardi- nalit y |U | ≤ n C , the norm ∥ T ∥ U ma y b e understo o d as a certain strengthening of the entrywise ℓ ∞ norm. Our additional assumption for the cum ulan t tensors of g 1 , . . . , g N is the follo wing. Assumption 3. L et κ k ( g i ) ∈ ( R n ) ⊗ k denote the k -th or der cumulant tensor of g i , whose entries ar e given by ⟨ κ k ( g i ) , e α 1 ⊗ · · · ⊗ e α k ⟩ = κ k ( e ∗ α 1 g i , . . . , e ∗ α k g i ) wher e κ k ( · ) on the right side is the k th mixe d cumulant of entries of g i . Then for e ach k ≥ 3 , ther e exists a c onstant C k > 0 and a subset of deterministic ve ctors U k ⊂ R n satisfying { e 1 , . . . 
, e n } ⊆ U k , |U k | ≤ n C k , ∥ x ∥ 2 ≤ C k for al l x ∈ U k , such that the fol lowing holds: F or any ε > 0 , ther e exists a c onstant C ≡ C ( ε, k ) > 0 such that for e ach i = 1 , . . . , N and any m ∈ { 1 , . . . , k − 1 } , s 1 , . . . , s m ∈ R n , and T ∈ ( R n ) ⊗ k − m , |⟨ κ k ( g i ) , s 1 ⊗ · · · ⊗ s m ⊗ T ⟩| ≤ C n ε ( √ n ) k − m − 1 ∥ T ∥ U k m Y t =1 ∥ s t ∥ 2 . (2.1) Remark 2.2. T o provide some in tuition for Assumption 3 , w e note that a simple cumulan t gro wth condition that w ould imply the needed concentration of quadratic forms in Assumption 2 is |⟨ κ k ( g ) , S ⊗ T ⟩| ≺ ∥ S ∥ F ∥ T ∥ F (2.2) for all fixed k ≥ 3, m ∈ { 1 , . . . , k − 1 } , S ∈ ( R n ) ⊗ m , and T ∈ ( R n ) ⊗ k − m . (That ( 2.2 ) implies Assumption 2 follows from the high moment estimate of Lemma 5.7 and Mark ov’s inequality .) Sp ecializing this gro wth condition ( 2.2 ) to S = s 1 ⊗ . . . ⊗ s m and applying ∥ T ∥ F ≤ √ n k − m ∥ T ∥ ∞ w ould give |⟨ κ k ( g ) , s 1 ⊗ . . . ⊗ s m ⊗ T ⟩| ≺ ( √ n ) k − m ∥ T ∥ ∞ m Y t =1 ∥ s t ∥ 2 , (2.3) whic h is a v ersion of ( 2.1 ) with a weak er b ound √ n k − m in place of √ n k − m − 1 . Although the condition ( 2.1 ) is not directly comparable to ( 2.2 ), one may interpret ( 2.1 ) as a certain strengthening of ( 2.2 ) b y an extra factor of n − 1 / 2 when restricted to rank-1 tensors S and measured in a ℓ ∞ -t yp e norm ∥ T ∥ U k rather than ∥ T ∥ F . One may also c hec k that for a “generic” tensor κ k ∈ ( R n ) ⊗ k satisfying ( 2.2 ), Assumption 3 with the stronger b ound of √ n k − m − 1 cannot hold. Indeed, giv en any set U k ⊂ R n satisfying the conditions of Assumption 3 , there exists a unit vector s ∈ R n satisfying sup u 2 ,..., u k ∈U k |⟨ κ k , s ⊗ u 2 ⊗ . . . ⊗ u k ⟩| ≺ n − 1 / 2 sup u : ∥ u ∥ 2 =1 sup u 2 ,..., u k ∈U k |⟨ κ k , u ⊗ u 2 ⊗ . . . 
⊗ u k ⟩| ≺ n − 1 / 2 ∥ κ k ∥ inj where ∥ κ k ∥ inj is the injectiv e norm, and also ∥ s · κ k ∥ 2 F ≍ n − 1 ∥ κ k ∥ 2 F where s · κ k ∈ ( R n ) ⊗ k − 1 is the contraction of s with κ k in the first co ordinate. (It is easily c hec ked that b oth conditions hold with high probabilit y o v er a uniform random c hoice of s on the unit sphere.) Then for m = 1 and T = s · κ k , we hav e |⟨ κ k , s ⊗ T ⟩| = ∥ s · κ k ∥ 2 F ≍ n − 1 ∥ κ k ∥ 2 F , ∥ T ∥ U k ≺ n − 1 / 2 ∥ κ k ∥ inj . If κ k is generic and satisfies ( 2.2 ) — e.g., κ k = n − ( k − 1) / 2 Z k where Z k ∈ ( R n ) ⊗ k is a symmetric Gaussian tensor with i.i.d. N (0 , 1) entries up to symmetry — then we would exp ect ∥ κ k ∥ 2 F ≍ n − ( k − 1) ∥ Z k ∥ 2 F ≍ n and ∥ κ k ∥ inj ≍ n − ( k − 1) / 2 ∥ Z k ∥ inj ≍ n − k/ 2+1 (with high probability ov er Z k ). Thus ( 2.1 ) can only hold with the weak er factor √ n k − 1 = √ n k − m instead of √ n k − m − 1 , so Assumption 3 requires a certain non-genericity of eac h cumulan t tensor κ k . ANISOTROPIC LOCAL LA W F OR NON-SEP ARABLE SAMPLE CO V ARIANCE MA TRICES 9 In the simplest setting of a separable mo del where g i = Σ 1 / 2 w i and w i has indep endent entries with b ounded moments, it is easily chec k ed (c.f. Proposition 2.12 ) that Assumption 3 holds in a stronger form with √ n k − m − 1 in ( 2.1 ) replaced by √ n 1 { m =1 } . Imp ortantly , Assumptions 2 and 3 are significantly more general than requiring this separable form, and w e discuss sev eral illustrative examples in Section 2.5 . W e show also in Section 2.5 that neither Assumption 2 or 3 strictly implies the other: for a Gaussian mixture mo del with random v ariance, there is a regime where Assumption 3 holds while Assumption 2 fails; con v ersely , for the spherical 4-spin mo del at v ery high temp erature (e.g. β = n − 1 / 6 ), Assumption 2 holds while Assumption 3 fails, precisely through the large gap b etw een injective and F rob enius tensor norms discussed ab o v e. 2.2. 
Deformed Marchenk o-Pastur la w and regular sp ectral domains. Under general con- ditions encompassing those of Assumptions 1 and 2 , the empirical eigen v alue distributions of K, e K are well-appro ximated in large dimensions b y deterministic la ws µ 0 , ˜ µ 0 [ MP67 ; Sil95 ; SB95 ]. W e review here relev ant background regarding µ 0 , ˜ µ 0 and their regularit y properties. Denote the ordered eigenv alues of Σ b y σ 1 ≥ . . . ≥ σ n ≥ 0 . F or each z ∈ C + , there exists a unique solution e m 0 ( z ) ∈ C + [ MP67 ] to the Marc henko-P astur equation z = − 1 e m 0 ( z ) + 1 N n X α =1 σ α 1 + σ α e m 0 ( z ) . (2.4) Define also m 0 ( z ) ∈ C + b y e m 0 ( z ) = γ m 0 ( z ) + (1 − γ )( − 1 /z ) , γ = n/ N . (2.5) Then µ 0 and ˜ µ 0 are the (unique) probability distributions whose Stieltjes transforms are given by m 0 , e m 0 : C + → C + , m 0 ( z ) = Z 1 x − z dµ 0 ( x ) , e m 0 ( z ) = Z 1 x − z d e µ 0 ( x ) . (2.6) This coincides with ( 1.1 ), and moreo v er µ 0 is the deformed Marc henk o-P astur law. Note that the relation ( 2.5 ) implies that e µ 0 = γ µ 0 + (1 − γ ) δ 0 (denoting a mixture of µ 0 and an atom at 0) if γ ≤ 1, or µ 0 = (1 /γ ) e µ 0 + (1 − 1 /γ ) δ 0 if γ ≥ 1. In particular, supp( µ 0 ) ∩ (0 , ∞ ) = supp( e µ 0 ) ∩ (0 , ∞ ) . It is sho wn in [ SC95 ] that e µ 0 admits a con tinuous densit y at eac h point x > 0, given by ρ 0 ( x ) = lim z ∈ C + → x 1 π Im e m 0 ( z ) . F urthermore, supp( e µ 0 ) ∩ (0 , ∞ ) is a finite union of interv als c haracterized in the following wa y: Let ¯ n = rank(Σ) and consider the meromorphic function with p oles P = { 0 , − σ − 1 1 , . . . , − σ − 1 ¯ n } , z 0 ( m ) = − 1 m + 1 N n X α =1 σ α 1 + σ α m , m ∈ C \ P , (2.7) whic h locally inv erts e m 0 ( z ) defined by ( 2.4 ). Then under Assumption 1 (c.f. 
[KY17, Lemmas 2.4–2.6]), $z_0$ has an even number $2p$ of critical points counting multiplicity on the extended real line $\bar{\mathbb{R}} = \mathbb{R}\cup\{\infty\}$, with exactly one critical point $m_1 \in (-\sigma_1^{-1}, 0)$, either 0 or 2 critical points in each interval $(-\sigma_{k+1}^{-1}, -\sigma_k^{-1})$ where $\sigma_k \neq \sigma_{k+1}$, and exactly one critical point $m_{2p} \in (-\infty, -\sigma_{\bar n}^{-1})\cup(0,\infty]$, which is $m_{2p} = \infty$ if $\bar n = N$. Ordering these critical points as
$$0 > m_1 > m_2 \geq m_3 > m_4 \geq m_5 > \ldots > m_{2p-2} \geq m_{2p-1} > -\sigma_{\bar n}^{-1} \quad\text{and}\quad m_{2p} \in (-\infty, -\sigma_{\bar n}^{-1})\cup(0,\infty],$$
we have
$$C > x_1 > x_2 \geq x_3 > x_4 \geq x_5 > \ldots > x_{2p-2} \geq x_{2p-1} > x_{2p} \geq 0, \qquad\text{where } x_j = z_0(m_j), \tag{2.8}$$
for a constant $C > 0$, and
$$\mathrm{supp}(\widetilde\mu_0)\cap(0,\infty) = \Big(\bigcup_{k=1}^p [x_{2k}, x_{2k-1}]\Big)\cap(0,\infty) \subseteq [0, C].$$
If $m_j$ is a critical point of multiplicity 1, then $x_j$ is an edge of $\widetilde\mu_0$ (i.e. a boundary point of $\mathrm{supp}(\widetilde\mu_0)$), which is a right edge for $j$ odd and a left edge for $j$ even. If $m_{2k} = m_{2k+1}$ is a critical point of multiplicity 2, then $x_{2k} = x_{2k+1}$ is a cusp of $\widetilde\mu_0$ (i.e. an interior point $x \in \mathrm{supp}(\widetilde\mu_0)$ where the density $\rho_0(x) = 0$). We refer to $[x_{2k}, x_{2k-1}]$ for $k = 1, \ldots, p$ as the bulk components of $\widetilde\mu_0$.

Corresponding to these bulk components and edges/cusps, and denoting $z = E + i\eta \in \mathbb{C}^+$ where $E = \operatorname{Re} z$ and $\eta = \operatorname{Im} z$, we define for $\delta > 0$ and $\tau \in (0,1)$ the domains
$$\begin{aligned}
\mathcal{D}^b_k(\delta,\tau) &= \{z \in \mathbb{C}^+ : E \in [x_{2k}+\delta,\, x_{2k-1}-\delta],\ N^{-1+\tau} \leq \eta \leq 1\}\\
\mathcal{D}^e_j(\delta,\tau) &= \{z \in \mathbb{C}^+ : E \in [x_j-\delta,\, x_j+\delta],\ N^{-1+\tau} \leq \eta \leq 1\}\\
\mathcal{D}^o(\delta,\tau) &= \{z \in \mathbb{C}^+ : \mathrm{dist}(z, \mathrm{supp}(\widetilde\mu_0)) \geq \delta,\ |z| \geq \delta,\ |E| \leq \delta^{-1},\ N^{-1+\tau} \leq \eta \leq 1\}.
\end{aligned} \tag{2.9}$$
Here, $\mathrm{dist}(z, U) = \inf_{x\in U} |x - z|$. Our local law results will hold over those domains which satisfy appropriate regularity properties as discussed in [KY17], and which we summarize in the following definition.
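As a brief numerical aside (not part of the formal development; the choice $\Sigma = I$ and $\gamma = 1/2$ is ours), the characterization of the support via critical points of $z_0$ can be checked directly: for $\Sigma = I$ the two real critical points of (2.7) should map to the classical Marchenko-Pastur edges $(1\pm\sqrt\gamma)^2$.

```python
import numpy as np

n, N = 500, 1000                  # gamma = n/N = 1/2
sigma = np.ones(n)
gam = n / N

def z0(m):
    # the meromorphic function (2.7)
    return -1.0 / m + np.sum(sigma / (1.0 + sigma * m)) / N

def dz0(m):
    # its derivative; critical points solve dz0(m) = 0
    return 1.0 / m**2 - np.sum(sigma**2 / (1.0 + sigma * m) ** 2) / N

def bisect(f, a, b, iters=200):
    # plain bisection for a sign change of f on [a, b]
    fa = f(a)
    for _ in range(iters):
        c = 0.5 * (a + b)
        if (f(c) > 0) == (fa > 0):
            a, fa = c, f(c)
        else:
            b = c
    return 0.5 * (a + b)

# one critical point m_1 in (-1/sigma_1, 0), one m_2 in (-inf, -1/sigma_n)
m1 = bisect(dz0, -0.999, -1e-6)
m2 = bisect(dz0, -50.0, -1.001)
right_edge, left_edge = z0(m1), z0(m2)
```

For $\Sigma = I$ these edges admit the closed forms $(1+\sqrt\gamma)^2 \approx 2.914$ and $(1-\sqrt\gamma)^2 \approx 0.086$, which the critical-point computation recovers.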
Definition 2.3 (Regular spectral domain). We call $\mathbf{D} \subset \{z \in \mathbb{C}^+ : \eta \in (0,1]\}$ a spectral domain if $z = E + i\eta \in \mathbf{D}$ implies that $z' = E + i\eta' \in \mathbf{D}$ for each $\eta' \in [\eta, 1]$. Corresponding to $z \in \mathbf{D}$, denote $\kappa = \min_{j=1}^{2p} |x_j - E|$, and define
$$L(z) = \{z\} \cup \{w \in \mathbf{D} : \operatorname{Re} w = E,\ \operatorname{Im} w \in [\eta, 1]\cap\{N^{-5}, 2N^{-5}, 3N^{-5}, \ldots, 1\}\}.$$
Then the domain $\mathbf{D}$ is regular if there exist constants $C, c > 0$ such that the following hold:

(a) (Basic bounds for $\widetilde m_0(z)$) For all $z \in \mathbf{D}$, $c \leq |z| \leq C$ and
$$c \leq |\widetilde m_0(z)| \leq C, \qquad \min_{\alpha\in[n]} |1+\sigma_\alpha \widetilde m_0(z)| \geq c, \qquad c\,g(z) \leq \operatorname{Im}\widetilde m_0(z) \leq C\,g(z)$$
where
$$g(z) = \begin{cases} \sqrt{\kappa+\eta} & \text{if } E \in \mathrm{supp}(\widetilde\mu_0)\\[2pt] \dfrac{\eta}{\sqrt{\kappa+\eta}} & \text{if } E \notin \mathrm{supp}(\widetilde\mu_0) \end{cases} \tag{2.10}$$

(b) (Stability of the Marchenko-Pastur equation) Let $u: \mathbb{C}^+ \to \mathbb{C}^+$ be the Stieltjes transform of any probability measure, and let $\Delta: \mathbf{D} \to (0,\infty)$ be any deterministic function satisfying:
• (Boundedness) $N^{-2} \leq \Delta(z) \leq (\log N)^{-1}$ for all $z \in \mathbf{D}$,
• (Lipschitz continuity) $|\Delta(z) - \Delta(w)| \leq N^2|z - w|$ for all $z, w \in \mathbf{D}$, and
• (Monotonicity) For each fixed $E$, the function $\eta \mapsto \Delta(E+i\eta)$ is non-increasing over $\eta \in [N^{-1+\tau}, 1]$.
For each $z \in \mathbf{D}$, if $|z_0(u(w)) - w| \leq \Delta(w)$ for every $w \in L(z)$, then
$$|u(z) - \widetilde m_0(z)| \leq \frac{C\,\Delta(z)}{\sqrt{\kappa+\eta} + \sqrt{\Delta(z)}}. \tag{2.11}$$

The following result was shown in [KY17] for the regularity of the preceding domains $\mathcal{D}^b_k(\delta,\tau)$, $\mathcal{D}^e_j(\delta,\tau)$, and $\mathcal{D}^o(\delta,\tau)$.

Lemma 2.4 ([KY17]). Suppose Assumption 1 holds. Fix any constant $\tau \in (0,1)$.
(a) (Regular bulk component) Suppose, for a bulk component $[x_{2k}, x_{2k-1}]$, there exist constants $\delta, c_0 > 0$ such that $\rho_0(x) > c_0$ for all $x \in [x_{2k}+\delta, x_{2k-1}-\delta]$. Then $\mathcal{D}^b_k(\delta,\tau)$ is regular.
(b) (Regular edge) Suppose, for an edge $x_j = z_0(m_j)$, there exists a constant $\delta > 0$ such that
$$x_j > \delta, \qquad |x_j - x_k| > \delta \text{ for each } k \neq j, \qquad \min_\alpha |m_j + \sigma_\alpha^{-1}| > \delta.$$
Then there exists a constant $\delta' > 0$ such that $\mathcal{D}^e_j(\delta',\tau)$ is regular.
(c) (Outside the spectrum) Fix any constant $\delta > 0$. Then $\mathcal{D}^o(\delta,\tau)$ is regular.
(For each statement, Definition 2.3 holds with constants $C, c > 0$ depending on $\tau, \delta, \delta', c_0$.)

Proof. Parts (a), (b), and regularity of $\mathcal{D}^o(\delta,\tau)\cap\{z\in\mathbb{C}^+ : \mathrm{dist}(E, \mathrm{supp}(\widetilde\mu_0)\cup\{0\}) \geq \delta/2\}$ in (c) follow from [KY17, Lemmas A.4, A.5, A.6, A.8]. The remaining points $z \in \mathcal{D}^o(\delta,\tau)$ with $\mathrm{dist}(E, \mathrm{supp}(\widetilde\mu_0)\cup\{0\}) < \delta/2$ must satisfy $\eta \geq \delta/2$: For these points, (2.10) follows from the bounds $\operatorname{Im}\widetilde m_0 \geq c\eta$ and $|\widetilde m_0| \leq \eta^{-1}$ implied by (2.6), and (2.11) follows directly from the condition $|z_0(u(z)) - z| \leq \Delta(z)$, the uniqueness of the solution in $\mathbb{C}^+$ to (2.4), which implies $u(z) = \widetilde m_0(z+\Delta)$ for some $|\Delta| \leq \Delta(z) \leq (\log N)^{-1}$, and the Lipschitz continuity $|\widetilde m_0'(z)| \leq \eta^{-2}$. □

We remark that the regularity conditions of Definition 2.3 are satisfied in a wide range of examples; see [KY17, Section 3] for detailed verifications under various conditions on the population covariance $\Sigma$.

2.3. Local law for the Stieltjes transform. This and the following section state the main results of our paper. Proofs of Theorems 2.5, 2.8, 2.10 and their corollaries will be given in Section 4. For spectral parameters $z = E+i\eta \in \mathbb{C}^+$, define the resolvents and Stieltjes transforms of $K$ and $\widetilde K$ by
$$R(z) = (K - zI_n)^{-1}, \quad m(z) = n^{-1}\operatorname{Tr} R(z), \qquad \widetilde R(z) = (\widetilde K - zI_N)^{-1}, \quad \widetilde m(z) = N^{-1}\operatorname{Tr}\widetilde R(z).$$
We will denote the entries of these resolvents by $R_{\alpha\beta}(z) = e_\alpha^* R(z) e_\beta$ and $\widetilde R_{ij}(z) = e_i^* \widetilde R(z) e_j$. For $z = E+i\eta \in \mathbb{C}^+$, define also the error control parameter
$$\Psi(z) = \sqrt{\frac{\operatorname{Im}\widetilde m_0(z)}{N\eta}} + \frac{1}{N\eta}$$
where we recall the bounds for $\operatorname{Im}\widetilde m_0(z)$ from (2.10).
The following theorem establishes a local law for the Stieltjes transforms $m(z)$ and $\widetilde m(z)$, together with an entrywise local law for the resolvent $\widetilde R(z)$ of the Gram matrix $\widetilde K$, over a regular spectral domain.

Theorem 2.5 (Averaged and entrywise local laws). Suppose Assumptions 1 and 2 hold, and $\mathbf{D}$ is a regular spectral domain. Then for any $\varepsilon, D > 0$, there exists a constant $C \equiv C(\varepsilon, D) > 0$ such that with probability at least $1 - CN^{-D}$, for all $z = E + i\eta \in \mathbf{D}$ we have
$$|m(z) - m_0(z)| \leq \frac{N^\varepsilon\,\Psi(z)^2}{\sqrt{\kappa+\eta} + \Psi(z)}, \qquad |\widetilde m(z) - \widetilde m_0(z)| \leq \frac{N^\varepsilon\,\Psi(z)^2}{\sqrt{\kappa+\eta} + \Psi(z)}, \tag{2.12}$$
$$\max_{1\leq i,j\leq N} \big|\widetilde R_{ij}(z) - \widetilde m_0(z)\mathbf{1}\{i=j\}\big| \leq N^\varepsilon\,\Psi(z). \tag{2.13}$$

Remark 2.6. The bound (2.10) implies $\operatorname{Im}\widetilde m_0(z) \leq C\sqrt{\kappa+\eta}$ in all cases, so the right side of (2.12) may be further bounded as
$$\frac{\Psi(z)^2}{\sqrt{\kappa+\eta}+\Psi(z)} \leq \frac{2C\sqrt{\kappa+\eta}\,(N\eta)^{-1} + 2(N\eta)^{-2}}{\sqrt{\kappa+\eta} + (N\eta)^{-1}} \leq \frac{2(C+1)}{N\eta}.$$
Thus (2.12) is equivalent to a bound of $N^\varepsilon(N\eta)^{-1}$ for $E$ inside the spectral support $\mathrm{supp}(\widetilde\mu_0)$, where $\operatorname{Im}\widetilde m_0(z) \asymp \sqrt{\kappa+\eta}$. The stronger form of (2.12) for $E \notin \mathrm{supp}(\widetilde\mu_0)$ has proven useful for showing optimal concentration of the eigenvalues of $\widetilde K$ around $\mathrm{supp}(\widetilde\mu_0)$ (see e.g. [PY14, Eq. (8.8)]).

We state here a few known implications of Theorem 2.5 on eigenvalue rigidity and eigenvector delocalization [ESY09b; EYY12b]. Let $\lambda_1 \geq \ldots \geq \lambda_{\min(n,N)}$ be the leading eigenvalues of $\widetilde K$ (or equivalently of $K$). For any $i \in \{1, \ldots, \min(n,N)\}$, define the classical eigenvalue location $\theta(i) > 0$ through
$$N\int_{\theta(i)}^\infty \rho_0(x)\,dx = i - \tfrac{1}{2}$$
where we recall the density $\rho_0(x)$ of $\widetilde\mu_0$ on $(0,\infty)$. Corresponding to each edge/cusp $x_j$ defined in (2.8), denote
$$N_j = N\int_{x_j}^\infty \rho_0(x)\,dx. \tag{2.14}$$
(It is shown in [KY17, Lemma A.1] that $N_j$ is always an integer.)
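As a numerical sketch of the averaged local law (all concrete choices — $\Sigma = I$, $\gamma = 1/2$, a bulk point $E = 1$ with $\eta = 0.02$ — are our illustrative selections, not the paper's), one can solve (2.4) by a damped fixed-point iteration, which preserves the upper half-plane, and compare against the empirical Stieltjes transform via the relation (2.5):

```python
import numpy as np

def solve_mp(z, sigma, N, iters=3000, damp=0.5):
    """Solve the Marchenko-Pastur equation (2.4) for m~_0(z): iterate
    m -> 1/(S(m) - z) with S(m) = N^{-1} sum_a sigma_a/(1 + sigma_a*m).
    The map sends C^+ to C^+, so the damped iterates remain in C^+."""
    m = 1j
    for _ in range(iters):
        S = np.sum(sigma / (1.0 + sigma * m)) / N
        m = damp * m + (1.0 - damp) / (S - z)
    return m

rng = np.random.default_rng(1)
n, N = 1000, 2000
gam = n / N
sigma = np.ones(n)                 # Sigma = I
E, eta = 1.0, 0.02                 # bulk point, eta far below the global scale
z = E + 1j * eta

mt0 = solve_mp(z, sigma, N)
# residual of equation (2.4) at the computed solution
res = abs(-1.0 / mt0 + np.sum(sigma / (1.0 + sigma * mt0)) / N - z)

X = rng.standard_normal((n, N))
evals = np.linalg.eigvalsh(X @ X.T / N)        # eigenvalues of K
m_emp = np.mean(1.0 / (evals - z))             # m(z)
mt_emp = gam * m_emp + (1 - gam) * (-1.0 / z)  # m~(z) via relation (2.5)
err = abs(mt_emp - mt0)
```

In the bulk, the observed error should be of order $(N\eta)^{-1} = 0.025$, consistent with (2.12).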
Corollary 2.7. Suppose Assumptions 1 and 2 hold. Then for any $\delta, \varepsilon, D > 0$, there exist constants $\tau \equiv \tau(\varepsilon) \in (0,1)$ and $C \equiv C(\delta, \varepsilon, D) > 0$ such that with probability at least $1 - CN^{-D}$, the following hold:
(a) (Spectral support) $K$ has no eigenvalues outside $\{x \in \mathbb{R} : \mathrm{dist}(x, \mathrm{supp}(\mu_0)) \leq \delta\}$, and $\widetilde K$ has no eigenvalues outside $\{x \in \mathbb{R} : \mathrm{dist}(x, \mathrm{supp}(\widetilde\mu_0)) \leq \delta\}$.
(b) ($N^{-2/3}$-concentration at regular edges) If $x_{2k-1}$ is any right edge for which $\mathcal{D}^e_{2k-1}(\delta, 1/3)$ is regular, then $K$ and $\widetilde K$ have no eigenvalues in $[x_{2k-1} + N^{-2/3+\varepsilon},\, x_{2k-1} + \delta]$. The analogous statement holds for a left edge $x_{2k}$ and $[x_{2k} - \delta,\, x_{2k} - N^{-2/3+\varepsilon}]$.
(c) (Rigidity of eigenvalues) If $x_{2k-1}$ is any right edge for which $\mathcal{D}^e_{2k-1}(\delta, \tau)$ is regular, then $|\lambda_i - \theta(i)| \leq (i - N_{2k-1})^{-1/3} N^{-2/3+\varepsilon}$ for all $i$ such that $\theta(i) \in [x_{2k-1} - \delta, x_{2k-1}]$. If $x_{2k}$ is any left edge for which $\mathcal{D}^e_{2k}(\delta, \tau)$ is regular, then $|\lambda_i - \theta(i)| \leq (N_{2k} - i)^{-1/3} N^{-2/3+\varepsilon}$ for all $i$ such that $\theta(i) \in [x_{2k}, x_{2k} + \delta]$. If $[x_{2k}, x_{2k-1}]$ is any bulk component for which $\mathcal{D}^b_k(\delta, \tau)$ is regular, and at least one edge domain $\mathcal{D}^e_{2k}(\delta, \tau)$ or $\mathcal{D}^e_{2k-1}(\delta, \tau)$ is also regular, then $|\lambda_i - \theta(i)| \leq N^{-1+\varepsilon}$ for all $i$ such that $\theta(i) \in [x_{2k} + \delta, x_{2k-1} - \delta]$.
(d) ($\ell^\infty$-delocalization of eigenvectors of $\widetilde K$) If $x_j$ is any edge for which $\mathcal{D}^e_j(\delta, \varepsilon)$ is regular, then for each eigenvalue $\lambda \in [x_j - \delta, x_j + \delta]$ of $\widetilde K$ and each associated eigenvector $\widetilde x$,
$$\|\widetilde x\|_\infty \leq \frac{N^\varepsilon}{\sqrt N}.$$
If $[x_{2k}, x_{2k-1}]$ is any bulk component for which $\mathcal{D}^b_k(\delta, \tau)$ is regular, then the same holds for every eigenvector of $\widetilde K$ corresponding to an eigenvalue $\lambda \in [x_{2k} + \delta, x_{2k-1} - \delta]$.

2.4. Anisotropic local law for the linearized resolvent.
For all $z \in \mathbb{C}^+$, define the linearized resolvent
$$\Pi(z) = \begin{bmatrix} -zI_n & \frac{1}{\sqrt N}G\\[2pt] \frac{1}{\sqrt N}G^* & -I_N \end{bmatrix}^{-1} \in \mathbb{C}^{(n+N)\times(n+N)}. \tag{2.15}$$
Note that by Schur's complement, we have
$$\Pi(z) = \begin{bmatrix} R(z) & *\\ * & z\widetilde R(z) \end{bmatrix}$$
where the two diagonal blocks contain the resolvents of $K$ and $\widetilde K$. Under the additional condition of Assumption 3, the following theorem establishes an optimal anisotropic local law for $\Pi(z)$.

Theorem 2.8 (Anisotropic local law). Suppose Assumptions 1, 2, and 3 hold, and $\mathbf{D}$ is a regular spectral domain. Then for any $\varepsilon, D > 0$, there exists a constant $C \equiv C(\varepsilon, D) > 0$ such that the following holds: Fix any deterministic unit vectors $v_1, v_2 \in \mathbb{C}^N$ and $u_1, u_2 \in \mathbb{C}^n$. Then with probability at least $1 - CN^{-D}$, for all $z = E + i\eta \in \mathbf{D}$ we have
$$\left|\begin{bmatrix} u_1^* & v_1^*\end{bmatrix}\left(\Pi(z) - \begin{bmatrix} (-zI_n - z\widetilde m_0(z)\Sigma)^{-1} & 0\\ 0 & z\widetilde m_0(z) I_N\end{bmatrix}\right)\begin{bmatrix} u_2\\ v_2\end{bmatrix}\right| \leq N^\varepsilon\,\Psi(z).$$

This implies the delocalization of eigenvectors of both $K$ and $\widetilde K$ in any fixed basis, extending the entrywise delocalization for $\widetilde K$ in Corollary 2.7(d).

Corollary 2.9 (Delocalization of eigenvectors of $K$ and $\widetilde K$). Suppose Assumptions 1, 2, and 3 hold. Fix any deterministic unit vectors $u \in \mathbb{R}^n$ and $v \in \mathbb{R}^N$. Then for any $\delta, \varepsilon, D > 0$, there exists a constant $C \equiv C(\delta, \varepsilon, D) > 0$ such that with probability at least $1 - CN^{-D}$, the following holds: If $x_j$ is any edge of $\widetilde\mu_0$ for which $\mathcal{D}^e_j(\delta, \varepsilon)$ is regular, then for each eigenvalue $\lambda \in [x_j - \delta, x_j + \delta]$ and associated eigenvector $x$ or $\widetilde x$ of $K$ or $\widetilde K$,
$$|u^* x| \leq \frac{N^\varepsilon}{\sqrt N}, \qquad |v^* \widetilde x| \leq \frac{N^\varepsilon}{\sqrt N}.$$
If $[x_{2k}, x_{2k-1}]$ is any bulk component for which $\mathcal{D}^b_k(\delta, \tau)$ is regular, then the same holds for every eigenvector $x$ or $\widetilde x$ of $K$ or $\widetilde K$ corresponding to an eigenvalue $\lambda \in [x_{2k} + \delta, x_{2k-1} - \delta]$.
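The Schur-complement block structure of $\Pi(z)$ noted above is easy to verify numerically on a small instance (the dimensions and spectral parameter below are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 5, 8
G = rng.standard_normal((n, N))
z = 0.5 + 0.3j

# the linearized matrix inverted in (2.15)
M = np.block([[-z * np.eye(n),   G / np.sqrt(N)],
              [G.T / np.sqrt(N), -np.eye(N) + 0j]])
Pi = np.linalg.inv(M)

K, Kt = G @ G.T / N, G.T @ G / N
R  = np.linalg.inv(K  - z * np.eye(n))       # resolvent of K
Rt = np.linalg.inv(Kt - z * np.eye(N))       # resolvent of K~

err_top = np.max(np.abs(Pi[:n, :n] - R))       # upper-left block = R(z)
err_bot = np.max(np.abs(Pi[n:, n:] - z * Rt))  # lower-right block = z R~(z)
```

Both block identities follow from the Schur-complement formula applied to (2.15), and the residuals are at machine precision.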
Although the primary focus of our work is on Theorems 2.5 and 2.8 for local spectral parameters $z \in \mathbb{C}^+$, let us clarify that in the case of $z = E + i\eta$ where $E$ has constant separation from the spectral support, our arguments also establish the analogue of Theorem 2.8 under Assumptions 1 and 2 alone. We summarize this result here.

Theorem 2.10 (Outside the spectrum). Suppose Assumptions 1 and 2 hold. Fix any constant $\delta > 0$ and consider
$$\bar{\mathcal{D}}^o(\delta) = \{z \in \mathbb{C} : \mathrm{dist}(z, \mathrm{supp}(\widetilde\mu_0)) \geq \delta,\ |z| \geq \delta,\ |E| \leq \delta^{-1},\ \eta \in [-1, 1]\}.$$
Then for any $\varepsilon, D > 0$, there exists a constant $C \equiv C(\delta, \varepsilon, D) > 0$ such that the following hold: With probability $1 - CN^{-D}$, for all $z \in \bar{\mathcal{D}}^o(\delta)$ we have
$$|m(z) - m_0(z)| \leq \frac{N^\varepsilon}{N}, \qquad |\widetilde m(z) - \widetilde m_0(z)| \leq \frac{N^\varepsilon}{N}.$$
Furthermore, fix any deterministic unit vectors $v_1, v_2 \in \mathbb{C}^N$ and $u_1, u_2 \in \mathbb{C}^n$. With probability $1 - CN^{-D}$, for all $z \in \bar{\mathcal{D}}^o(\delta)$ we have
$$\left|\begin{bmatrix} u_1^* & v_1^*\end{bmatrix}\left(\Pi(z) - \begin{bmatrix} (-zI_n - z\widetilde m_0(z)\Sigma)^{-1} & 0\\ 0 & z\widetilde m_0(z) I_N\end{bmatrix}\right)\begin{bmatrix} u_2\\ v_2\end{bmatrix}\right| \leq \frac{N^\varepsilon}{\sqrt N}.$$

This approximation for $R(z)$ (the upper-left block of $\Pi(z)$) was established previously in [WWF24], which used this result to give a characterization of the outlier eigenvalues and eigenvectors of the sample covariance matrix $K$ under spiked models for $\Sigma$.

Remark 2.11. If $n < N$ and $\mathrm{dist}(0, \mathrm{supp}(\mu_0)) \geq 2\delta$ for some constant $\delta > 0$, then Corollary 2.7(a) implies that with high probability, $m(z)$, $m_0(z)$, $R(z)$, and $(-zI_n - z\widetilde m_0(z)\Sigma)^{-1}$ are all analytic in a neighborhood of 0. Then the bounds $|m - m_0| \leq N^\varepsilon/N$ and $|u_1^*(R - (-zI_n - z\widetilde m_0\Sigma)^{-1})u_2| \leq N^\varepsilon/\sqrt N$ in Theorem 2.10 extend to $\{z \in \mathbb{C} : |z| \leq \delta\}$ by the maximum modulus principle. Similarly, the bounds $|\widetilde m - \widetilde m_0| \leq N^\varepsilon/N$ and $|v_1^*(\widetilde R - \widetilde m_0 I_N)v_2| \leq N^\varepsilon/\sqrt N$ extend to $\{z \in \mathbb{C} : |z| \leq \delta\}$ when $n > N$ and $\mathrm{dist}(0, \mathrm{supp}(\widetilde\mu_0)) \geq 2\delta$.
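A numerical sketch of the outside-the-spectrum approximation for the upper-left block (again with $\Sigma = I$, $\gamma = 1/2$, and a real $E = 5$ beyond the right edge; all concrete values are illustrative choices of ours):

```python
import numpy as np

rng = np.random.default_rng(3)
n, N = 1000, 2000
gam = n / N
E = 5.0                     # outside supp(mu_0), which ends near (1 + sqrt(gam))^2
z = E + 1e-9j               # approach the real axis from the upper half-plane

# companion transform for Sigma = I: the root of z*m^2 + (z+1-gam)*m + 1 = 0 in C^+
disc = np.sqrt((z + 1 - gam) ** 2 - 4 * z + 0j)
mt = max([(-(z + 1 - gam) + s * disc) / (2 * z) for s in (1, -1)],
         key=lambda r: r.imag)

X = rng.standard_normal((n, N))
R = np.linalg.inv(X @ X.T / N - z * np.eye(n))
u = rng.standard_normal(n)
u /= np.linalg.norm(u)

# anisotropic approximation u* R(z) u ~ u* (-zI - z*mt*Sigma)^{-1} u with Sigma = I
approx = 1.0 / (-z - z * mt)
err = abs(u @ R @ u - approx)
```

The observed error is of order $N^{-1/2}$, consistent with the theorem's $N^\varepsilon/\sqrt N$ rate.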
2.5. Examples. In this section, assuming $g_1, \ldots, g_N$ are equal in law to $g \in \mathbb{R}^n$, we provide concrete examples of distributions for $g$ that satisfy Assumptions 2 and 3. All proofs of results in this section are deferred to Section 5.

2.5.1. Separable distributions. In the separable setting of [KY17] where $g = Xw$ for a vector $w \in \mathbb{R}^d$ having independent entries, the cumulant tensor $\kappa_k(g)$ is a contraction of a diagonal tensor $\kappa_k(w) \in (\mathbb{R}^d)^{\otimes k}$ with $X$ along each coordinate axis. It is easily checked that in this case, Assumption 3 holds in a stronger form, as guaranteed by the following proposition.

Proposition 2.12. Suppose Assumption 1 holds, where $\Sigma = XX^*$ for a matrix $X \in \mathbb{R}^{n\times d}$ and $cn \leq d \leq Cn$ for constants $C, c > 0$. Suppose $g = Xw$ where $w = (w[\alpha])_{\alpha=1}^d \in \mathbb{R}^d$ has independent entries satisfying $\mathbb{E}w[\alpha] = 0$, $\mathbb{E}w[\alpha]^2 = 1$, and $\mathbb{E}|w[\alpha]|^k < C_k$ for each $k \geq 3$ and a constant $C_k > 0$. Then $g$ satisfies the conditions of Assumptions 2 and 3. Each set $\mathcal{U}_k$ of Assumption 3 may be taken as $\mathcal{U}_k \equiv \mathcal{U} = \{e_1, \ldots, e_n\} \cup \{Xe_1, \ldots, Xe_d\}$, and the bound (2.1) holds in the stronger form (for any $k \geq 3$, $1 \leq m \leq k-1$, and a constant $C \equiv C(k) > 0$)
$$\big|\langle \kappa_k(g), s_1 \otimes \cdots \otimes s_m \otimes T\rangle\big| \leq C\sqrt{n}^{\,\mathbf{1}\{m=1\}}\,\|T\|_{\mathcal{U}}\prod_{t=1}^m \|s_t\|_2. \tag{2.16}$$

2.5.2. Conditionally mean-zero distributions. We next consider a more general class of distributions $g = Xw$ where the entries of $w$ need not be independent, but satisfy a conditional mean-zero assumption: $\mathbb{E}\prod_{i=1}^k w[\alpha_i] = 0$ whenever one index among $\alpha_1, \ldots, \alpha_k \in [d]$ is distinct from the rest. We check that in such settings, Assumption 3 for $g$ is implied by Assumption 2 for $w$.

Proposition 2.13. Suppose Assumption 1 holds, where $\Sigma = XX^*$ for a matrix $X \in \mathbb{R}^{n\times d}$ and $cn \leq d \leq Cn$ for constants $C, c > 0$.
Suppose $g = Xw$ where Assumption 2 holds for $w \in \mathbb{R}^d$, $\mathbb{E}ww^* = I$, and $\mathbb{E}\prod_{i=1}^k w[\alpha_i] = 0$ for any indices $\alpha_1, \ldots, \alpha_k \in [d]$ such that $\alpha_1 \notin \{\alpha_2, \ldots, \alpha_k\}$. Then $g$ satisfies Assumptions 2 and 3, where each set $\mathcal{U}_k$ may again be taken as $\mathcal{U}_k \equiv \mathcal{U} = \{e_1, \ldots, e_n\} \cup \{Xe_1, \ldots, Xe_d\}$.

Example 2.14. Suppose that $w$ is a uniformly random vector on the sphere $\{w \in \mathbb{R}^d : \|w\|_2^2 = d\}$. It is clear that all conditions of Proposition 2.13 hold for $w$.

Example 2.15. Suppose $w$ is an isotropic "unconditionally" log-concave random vector, i.e. the log-density $\log p(w) \in [-\infty, \infty)$ is a concave function on $\mathbb{R}^d$, $\mathbb{E}ww^* = I$, and $w$ is equal in law to $w \odot s$ (the entrywise product) for any sign vector $s \in \{\pm 1\}^d$. The random Gram matrix
$$\widetilde K = \frac{1}{N}G^*G, \qquad G = [w_1, \ldots, w_N],$$
where $w_1, \ldots, w_N$ are i.i.d. copies of $w$, was studied recently in [BX25], who built upon an entrywise local law for $\widetilde K$ in [PY14] to directly show Tracy-Widom fluctuations of its leading eigenvalue. For such distributions, Assumption 2 for $w$ follows from a log-concave isoperimetric inequality, as was shown in [BX25, Lemma 3.3]. Then it is readily checked that all conditions of Proposition 2.13 hold for $w$, so our results imply that an isotropic local law holds for $G = [w_1, \ldots, w_N]$, and also that an anisotropic local law holds for $G = [g_1, \ldots, g_N]$ where $g_i = Xw_i$.

Example 2.16. Suppose $w$ follows a mixture model where there exists a random variable $\lambda$ such that $w = (w[\alpha])_{\alpha=1}^d$ has i.i.d. entries conditional on $\lambda$, and for some constants $C_k > 0$,
$$\mathbb{E}[w[\alpha] \mid \lambda] = 0, \qquad \mathbb{E}[|w[\alpha]|^k \mid \lambda] \leq C_k \text{ for all } k \geq 2 \text{ and } \lambda \in \Lambda, \qquad \mathbb{E}\big[\mathbb{E}[w[\alpha]^2 \mid \lambda]\big] = 1. \tag{2.17}$$
Thus $w$ has a mean-zero, isotropic, and exchangeable law, and the conditional law $(w \mid \lambda)$ satisfies Assumption 2 uniformly over $\lambda \in \Lambda$.
If, in addition,
$$\big|\mathbb{E}[w[\alpha]^2 \mid \lambda] - 1\big| \prec 1/\sqrt n, \tag{2.18}$$
then all conditions of Proposition 2.13 hold for $w$. (See Section 5 for a verification of this and the remaining claims in this example.) In general, the condition (2.18) is required for the local law estimates of Theorems 2.5 and 2.8 to hold for such a model: Consider for example
$$\mathbb{P}[\lambda = 1] = \mathbb{P}[\lambda = -1] = 1/2, \qquad (w \mid \lambda) \sim \mathcal{N}(0, (1 + c_n\lambda)I) \quad\text{for some } c_n > 0. \tag{2.19}$$
Then $|\mathbb{E}[w[\alpha]^2 \mid \lambda] - 1| = c_n$, so (2.18) requires $c_n \prec n^{-1/2}$. Note that if $c_n \equiv c$ is a fixed constant, then even the convergence in probability of (1.2) fails; in this setting, the standard Marchenko-Pastur law $\mu_0$ (corresponding to $\Sigma = I$) is not the correct deterministic equivalent for the spectral distribution of $K = N^{-1}\sum_{i=1}^N w_iw_i^*$, and one must instead view $K$ as a Gram matrix and take the "companion" law of the deformed Marchenko-Pastur law that results from conditioning on the corresponding latent variables $\lambda_1, \ldots, \lambda_N$:
$$K = N^{-1}\widetilde W^* \operatorname{diag}(1 + c\lambda_1, \ldots, 1 + c\lambda_N)\widetilde W, \qquad \widetilde W_{ij} \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, 1).$$
If $n^{-1/2} \ll c_n \ll 1$, then $\mu_0$ provides a global approximation for the spectral distribution of $K$, but Assumption 2 does not hold, and an averaged local law also does not hold in the quantitative form of Theorem 2.5. For this model (2.19), one may also directly check that Assumption 3 holds under the weaker condition $c_n \prec n^{-1/4}$. Thus for $n^{-1/2} \ll c_n \ll n^{-1/4}$, Assumption 3 holds while Assumption 2 does not, illustrating that Assumption 3 is not strictly stronger than Assumption 2.

2.5.3. Random features models. Consider the model $g = \sigma(Xw)$ where $w \in \mathbb{R}^d$ has independent entries and $\sigma(\cdot)$ is an entrywise nonlinear map.
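For a rough numerical feel of this nonlinear model, the sketch below uses $\sigma = \tanh$ — odd (so $\mathbb{E}\sigma(Xw) = 0$ for symmetric $w$) and bounded, but not entire, so it is only an illustrative stand-in rather than an instance of the propositions that follow. It checks the $n^{-1/2}$-scale concentration of the quadratic form $n^{-1}g^*g$ that Assumption 2 calls for (the matrix $X$ and all sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
n = d = 400
X = rng.standard_normal((n, d)) / np.sqrt(d)   # a generic (hypothetical) feature matrix
trials = 400

qf = np.empty(trials)
for t in range(trials):
    w = rng.standard_normal(d)
    gvec = np.tanh(X @ w)          # g = sigma(Xw), sigma = tanh applied entrywise
    qf[t] = gvec @ gvec / n        # normalized quadratic form n^{-1} g^T g

# Lipschitz concentration of Gaussian measure suggests fluctuations of size O(n^{-1/2})
std = qf.std()
```

Here the mean sits near $\mathbb{E}\tanh^2(\mathcal{N}(0,1)) \approx 0.39$, and the sample standard deviation is of order $n^{-1/2}$.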
This model has been of considerable interest in the statistical learning literature on random features regression models [LLC18; HMRT22; MM22] and linear approximants for neural networks [PW17; FW20; BP21]. The following proposition verifies that Assumptions 2 and 3 both hold in a linear scaling regime $n \asymp d$ under suitable conditions on $\sigma(\cdot)$. We note that we allow the activation function to be different on each entry.

Proposition 2.17. Let $X \in \mathbb{R}^{n\times d}$ be a deterministic matrix, where $\|X\|_{\mathrm{op}} \leq C$ and $d \leq Cn$ for a constant $C > 0$. Let $w \in \mathbb{R}^d$ be a random vector with independent entries, and let $g = \sigma(Xw) \in \mathbb{R}^n$ where $\sigma: \mathbb{R}^n \to \mathbb{R}^n$ is given by $\sigma(x) = (\sigma_i(x[i]))_{i=1}^n$ for scalar functions $\sigma_1, \ldots, \sigma_n: \mathbb{R} \to \mathbb{R}$ applied entrywise. Suppose $\mathbb{E}\sigma(Xw) = 0$ and $\mathbb{E}\sigma(Xw)\sigma(Xw)^* = \Sigma$, where $\Sigma \in \mathbb{R}^{n\times n}$ satisfies Assumption 1.
(a) Suppose the entries of $w = (w[\alpha])_{\alpha=1}^d$ satisfy $\mathbb{E}w[\alpha] = 0$, $\mathbb{E}w[\alpha]^2 = 1$, and $\mathbb{E}|w[\alpha]|^k \leq C_k$ for each $k \geq 3$ and a constant $C_k > 0$. If, for some constants $C, D > 0$, each function
$$\sigma_i(x) = \sum_{l=0}^D a_{il}x^l$$
is a polynomial with degree at most $D$ and coefficients satisfying $\max_{i=1}^n\max_{l=0}^D |a_{il}| < C$, then $g$ satisfies Assumptions 2 and 3.
(b) Suppose that $w \sim \mathcal{N}(0, I_d)$ is a standard Gaussian vector. If, for some constants $C > 0$ and $\beta > 1/2$, each function $\sigma_i$ admits an (absolutely convergent) series representation
$$\sigma_i(x) = \sum_{l=0}^\infty a_{il}x^l$$
where $\max_{i=1}^n |a_{il}| < C(l!)^{-\beta}$, then $g$ also satisfies Assumptions 2 and 3.

Condition (b) is equivalent to the condition that $\sigma_i(x)$ extends to an entire function on $\mathbb{C}$ satisfying the growth bound $|\sigma_i(z)| \leq C\exp(c|z|^{1/\beta})$ for some constants $C, c > 0$ and $\beta > 1/2$. In particular, for $\beta = 1$, this is the class of entire functions of exponential type (i.e.
having Fourier transform with compact support [Rud87, Theorem 19.3]), which has also been considered recently in a slightly different context of random inner-product kernel matrices [KNH24; LY25].

2.5.4. Random features tilt. A related class of examples arises when the random features structure appears not in the random vector itself but in its density. Consider a random vector $w \in \mathbb{R}^n$ with density proportional to $\exp(-U(w))$, where
$$U(w) = \frac{1}{2}\|w\|_2^2 + \lambda\sum_{i=1}^d \sigma_i(x_i^*w) \tag{2.20}$$
for deterministic directions $x_1, \ldots, x_d \in \mathbb{R}^n$, smooth nonlinearities $\sigma_i$, and a coupling constant $\lambda$. The following proposition establishes that for $|\lambda|$ sufficiently small, $d$ at most polynomial in $n$, and under a boundedness condition on the derivatives of $\sigma_i(\cdot)$, the centered vector $g = w - \mathbb{E}w$ satisfies both Assumptions 2 and 3.

Proposition 2.18. Let $X \in \mathbb{R}^{d\times n}$ be a deterministic matrix with rows $\{x_i\}_{i=1}^d$ such that $\|X\|_{\mathrm{op}} \leq C$ and $d \leq n^C$ for a constant $C > 0$. Suppose $w \in \mathbb{R}^n$ is a random vector having density proportional to $\exp(-U(w))$, where $U(w)$ is given by (2.20), and $\sigma_1, \ldots, \sigma_d: \mathbb{R} \to \mathbb{R}$ are smooth functions satisfying
$$\sup_{a\in\mathbb{R}^d} \big\|(\sigma_i^{(k)}(a[i]))_{i=1}^d\big\|_2 \leq C_k \quad\text{for all } k \geq 1$$
(where $\sigma_i^{(k)}$ is the $k$th-order derivative). Let $g = w - \mathbb{E}w$ and $\Sigma = \mathbb{E}gg^*$. Then there exists a constant $\lambda_* > 0$ such that whenever $|\lambda| < \lambda_*$, Assumptions 1, 2, and 3 hold for $\Sigma$ and $g$. Each set $\mathcal{U}_k$ of Assumption 3 may be taken as $\mathcal{U}_k \equiv \mathcal{U} = \{e_1, \ldots, e_n\} \cup \{x_1, \ldots, x_d\}$, and the bound (2.1) holds in the stronger form (for any $k \geq 3$, $1 \leq m \leq k-1$, and a constant $C \equiv C(k) > 0$)
$$\big|\langle \kappa_k(g), s_1 \otimes \cdots \otimes s_m \otimes T\rangle\big| \leq C\Big(\prod_{i=1}^m \|s_i\|_2\Big)\|T\|_{\mathcal{U}}.$$

This model is of particular interest when $d$ is bounded and $\|x_i\|_2 = 1$: the higher-order cumulant tensors of $g$ then contain spikes along the directions $\{x_i\}_{i=1}^d$.
Moreover, this higher cumulant structure persists even in the whitened vector $\widetilde g = \Sigma^{-1/2}g$, which has identity covariance and satisfies both Assumptions 2 and 3. Our local law implies that the resolvent of the sample covariance of i.i.d. copies of $\widetilde g$ is well-approximated by the Marchenko-Pastur deterministic equivalent, showing that even refined local statistics of the eigenvalues and eigenvectors of the sample covariance matrix alone cannot recover the directions $x_1, \ldots, x_d$ in the regime of sample size $N$ proportional to the dimension $n$. This style of "spiked cumulant" model and the associated hidden-direction recovery task is closely related to sample-complexity and computational lower-bound results for non-Gaussian component analysis and higher-order correlation models [WL19; TV18; DKRS23; DKPP24; SBGG24].

2.6. Negative examples. For illustration, we also discuss some examples which do not satisfy the hypotheses of Assumptions 2 and 3, to illustrate the necessity of the assumptions and/or the limitations of our current techniques.

2.6.1. Chaos vectors. Let $M, d$ be integers with $1 \leq M \leq d$. For a vector $x = (x[1], \ldots, x[d]) \in \mathbb{R}^d$ of i.i.d. standard normal random variables, define a vector
$$g[i,j] = x[i]x[j] \quad\text{for } 1 \leq i < j \leq \min\{i + M, d\}.$$
(The case of $M = d$ corresponds to all pairwise products of entries of $x$.) We set $n$ to be the size of the indexing set $\{(i,j) : 1 \leq i < j \leq \min(i+M, d)\}$, which is up to constants $n \asymp Md$. This random vector provides an example which satisfies the weak concentration of quadratic forms (1.2) but does not have the rate in Assumption 2; moreover, it is easily verified not to satisfy the averaged local law with the optimal rate.
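A small simulation sketch of this chaos-vector construction ($d = 40$, the trial count, and the comparison $M = 2$ versus $M = d$ are our arbitrary choices) illustrates how the fluctuations of $\operatorname{Tr} K$ grow with the band width $M$:

```python
import numpy as np

rng = np.random.default_rng(5)
d, trials = 40, 400

def trace_K_samples(M):
    """Tr K over independent trials for the chaos vector g[i,j] = x[i]x[j],
    1 <= i < j <= min(i + M, d), with N = n samples per trial."""
    n = d * (d - 1) // 2 if M == d else M * d - M * (M + 1) // 2
    vals = np.empty(trials)
    for t in range(trials):
        S = rng.standard_normal((n, d)) ** 2        # squared entries, one row per sample
        if M == d:                                   # all pairs i < j
            norms = (S.sum(axis=1) ** 2 - (S ** 2).sum(axis=1)) / 2
        else:                                        # banded pairs (i, i+1), ..., (i, i+M)
            norms = sum((S[:, :-k] * S[:, k:]).sum(axis=1) for k in range(1, M + 1))
        vals[t] = norms.mean()                       # Tr K with N = n samples
    return vals

v_band = trace_K_samples(2).var()     # Var(Tr K) of order M = 2 (up to constants)
v_full = trace_K_samples(d).var()     # Var(Tr K) of order M = 40 (up to constants)
```

Consistent with $\operatorname{Var}(\operatorname{Tr}K) \asymp M$, the full case $M = d$ shows markedly larger trace fluctuations than the narrow band $M = 2$.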
More precisely, for any test matrix $A \in \mathbb{R}^{n\times n}$, it is readily checked that $\mathbb{E}|g^*Ag - \mathbb{E}g^*Ag|^2 \asymp M\|A\|_F^2$, and hence from hypercontractivity of the Gaussian measure, for any $k \geq 2$,
$$\mathbb{E}|g^*Ag - \mathbb{E}g^*Ag|^k \leq C_k\big(\mathbb{E}|g^*Ag - \mathbb{E}g^*Ag|^2\big)^{k/2} \lesssim M^{k/2}\|A\|_F^k.$$
This suffices for the Bai-Zhou theorem [BZ08] for any $M \leq d$, i.e. the Marchenko-Pastur law holds at the global spectral scale for the sample covariance matrix $K$ defined by $N \asymp n$ samples. However, it is also easily checked that $\operatorname{Var}(\operatorname{Tr}K) \asymp Mn/N \asymp M$. By a simple fourth moment estimate for $\operatorname{Tr}K - \mathbb{E}\operatorname{Tr}K$, we can argue, with probability bounded below independently of $n$, that $|\operatorname{Tr}K - \mathbb{E}\operatorname{Tr}K| \gtrsim \sqrt M$. Hence when $M = d^\alpha$ for any $\alpha \in (0,1]$, this contradicts the eigenvalue rigidity that would follow from an optimal local law. Thus the rate of concentration assumed in Assumption 2 is needed to ensure that an averaged local law holds at the optimal scale.

2.6.2. Spin glass models. As an illustration of the role of Assumption 3, consider a spherical spin glass model. Let $J \in (\mathbb{R}^n)^{\otimes p}$ be a symmetric random tensor with i.i.d. standard Gaussian entries (up to symmetry), scaled so that $\|J\|_F \asymp n^{p/2}$, and let $g$ be drawn from the Gibbs measure
$$d\mu(g) \propto \exp\Big(\frac{\beta}{n^{(p-1)/2}}\langle J, g^{\otimes p}\rangle\Big)$$
on the sphere $\{g \in \mathbb{R}^n : \|g\|_2^2 = n\}$, where $\beta > 0$ is the inverse temperature or coupling constant. For $\beta < \beta_0$ an absolute constant, the measure $\mu$ is easily checked to satisfy a log-Sobolev inequality with a constant independent of $n$, which implies concentration of quadratic forms; hence Assumption 2 holds and the averaged local law of Theorem 2.5 applies to the sample covariance matrix of i.i.d. draws from $\mu$. However, for the spherical 4-spin model, it can be quantified rigorously (cf. [FMPW26]) that
$$\|\kappa_4\|_{\mathrm{inj}} \leq C\Big(\beta^3 + \frac{\beta}{\sqrt n} + \frac{1}{n}\Big), \qquad \|\kappa_4\|_F \geq c\beta\sqrt n.$$
At $\beta = n^{-1/6}$, this gives $\|\kappa_4\|_{\mathrm{inj}}/\|\kappa_4\|_F^2 \leq Cn^{-7/6}$, exhibiting a small ratio of injective norm to Frobenius norm. Then by the reasoning discussed in Remark 2.2, Assumption 3 does not hold, so our current results do not imply an optimal anisotropic local law in the form of Theorem 2.8 for the sample covariance matrix of i.i.d. draws from such a model. In this case of "disordered" cumulant tensors, we note that cancellations of fluctuations in the cumulant tensors themselves must be considered to obtain optimal estimates of various cumulant contractions, and this is not captured by the proof method we employ based on Assumption 3.

2.7. Proof ideas. We conclude this section by briefly explaining the main novelties of the proofs. For any subset $S \subset [N]$, let $R^{(S)} = (N^{-1}\sum_{i\notin S} g_ig_i^* - zI)^{-1}$ denote the resolvent of the sample covariance matrix leaving out the samples in $S$. Then a classical argument of [Sil95], applying the definition of $z_0(\widetilde m)$ and the Sherman-Morrison identity $R = R^{(i)} - \frac{N^{-1}R^{(i)}g_ig_i^*R^{(i)}}{1 + N^{-1}g_i^*R^{(i)}g_i}$, shows the equality
$$z_0(\widetilde m) - z = -\frac{1}{\widetilde m}\cdot\frac{1}{N}\sum_{i=1}^N \frac{N^{-1}\operatorname{Tr}(g_ig_i^* - \Sigma)R^{(i)}(I + \widetilde m\Sigma)^{-1}}{1 + N^{-1}g_i^*R^{(i)}g_i}.$$
We introduce a control parameter $\Phi = \Phi(z) \in [N^{-1/2}, 1]$ depending on $z$ and $N$, whose explicit form is not important here (cf. Lemma 3.4). Assuming the two a priori estimates
$$N^{-1}\|R^{(S)}\|_F \prec \Phi \text{ over all } S \subset [N] \text{ of bounded cardinality}, \qquad |\widetilde m - \widetilde m_0| \ll 1, \tag{2.21}$$
it is readily checked from Assumption 2 and regularity of the spectral domain that
$$\left|\frac{N^{-1}\operatorname{Tr}(g_ig_i^* - \Sigma)R^{(i)}(I + \widetilde m\Sigma)^{-1}}{1 + N^{-1}g_i^*R^{(i)}g_i}\right| \prec \Phi,$$
and hence also $|z_0(\widetilde m) - z| \prec \Phi$.
The main technical step in establishing an optimal averaged local law is to improve this estimate to
$$|z_0(\widetilde m) - z| \prec \Phi^2. \tag{2.22}$$
The averaged local law then follows by applying the stability condition of (2.11) in conjunction with a stochastic continuity argument as developed and used in e.g. [ESY09a; EKYY13b; EKYY13a; PY14; KY17].

To illustrate the fluctuation averaging mechanism that leads to (2.22), and also to explain our main novelties for extending this to an anisotropic local law, consider the slightly simpler quantity
$$\Delta(A) = \sum_{i=1}^N N^{-1}\operatorname{Tr}(g_ig_i^* - \Sigma)R^{(i)}A$$
for any deterministic matrix $A \in \mathbb{C}^{n\times n}$ with $\|A\|_{\mathrm{op}} \leq 1$. The bound (2.22) is analogous to the simpler bound $|\Delta(A)| \prec N\Phi^2$ (ignoring the factors in the denominator of $z_0(\widetilde m) - z$, which do not significantly alter the proof strategy). Applying the Sherman-Morrison identity to resolve the dependence of $R^{(i)}$ on $g_j$ and vice versa,
$$\mathbb{E}|\Delta(A)|^2 = \sum_{i,j=1}^N \mathbb{E}\Big[N^{-1}\operatorname{Tr}(g_ig_i^* - \Sigma)R^{(i)}A \cdot N^{-1}\operatorname{Tr}(g_jg_j^* - \Sigma)R^{(j)}A\Big]$$
$$= O_\prec(N\Phi^2) + \frac{1}{N}\sum_{i\neq j}\mathbb{E}\Bigg[\bigg(\underbrace{\frac{\operatorname{Tr}(g_ig_i^* - \Sigma)R^{(ij)}A}{\sqrt N}}_{=:\,Y_i^{(ij)}} - \underbrace{\frac{\operatorname{Tr}(g_ig_i^* - \Sigma)\frac{R^{(ij)}}{N}g_jg_j^*R^{(ij)}A}{\sqrt N}}_{=:\,Z_{ijj}^{(ij)}}\cdot\underbrace{\frac{1}{1 + N^{-1}g_j^*R^{(ij)}g_j}}_{=:\,Q_j^{(ij)}}\bigg)$$
$$\times \bigg(\underbrace{\frac{\operatorname{Tr}(g_jg_j^* - \Sigma)R^{(ij)}A}{\sqrt N}}_{=:\,Y_j^{(ij)}} - \underbrace{\frac{\operatorname{Tr}(g_jg_j^* - \Sigma)\frac{R^{(ij)}}{N}g_ig_i^*R^{(ij)}A}{\sqrt N}}_{=:\,Z_{jii}^{(ij)}}\cdot\underbrace{\frac{1}{1 + N^{-1}g_i^*R^{(ij)}g_i}}_{=:\,Q_i^{(ij)}}\bigg)\Bigg]$$
Observe that $\mathbb{E}[Y_i^{(ij)}Y_j^{(ij)}] = 0$, $\mathbb{E}[Y_i^{(ij)}Z_{jii}^{(ij)}Q_i^{(ij)}] = 0$, and $\mathbb{E}[Z_{ijj}^{(ij)}Q_j^{(ij)}Y_j^{(ij)}] = 0$. Assumption 2 and the a priori estimates (2.21) imply that $|Z_{jii}^{(ij)}|, |Z_{ijj}^{(ij)}| \prec \sqrt N\Phi^2$ and $|Q_i^{(ij)}|, |Q_j^{(ij)}| \prec 1$. Hence $\mathbb{E}|\Delta(A)|^2 \prec N^2\Phi^4$, and extending this to a high-moment estimate shows $|\Delta(A)| \prec N\Phi^2$ as desired.
This fluctuation averaging idea was used in [WWF24], in the same context of a non-separable sample covariance matrix model as our current work, to analyze outlier eigenvalues/eigenvectors in the setting of a spiked covariance $\Sigma$. Showing an optimal anisotropic local law for the resolvent $R(z)$ will instead require a stronger estimate, analogous to a bound of the form $|\Delta(uu^*)| \prec \Phi$ for a rank-one matrix input $A = uu^*$, assuming the a priori conditions in place of (2.21),
$$N^{-1/2}\|R^{(S)}x\|_2 \prec \Phi \text{ over all deterministic unit vectors } x \in \mathbb{C}^n, \qquad |\widetilde m - \widetilde m_0| \ll 1. \tag{2.23}$$
Applying the same expansion for $\mathbb{E}|\Delta(uu^*)|^2$, we may readily check as above that
$$\mathbb{E}|\Delta(uu^*)|^2 = O_\prec(\Phi^2) + \frac{1}{N}\sum_{i\neq j}\mathbb{E}\big[Z_{ijj}^{(ij)}Q_j^{(ij)}Z_{jii}^{(ij)}Q_i^{(ij)}\big] \tag{2.24}$$
where now $|Z_{ijj}^{(ij)}|, |Z_{jii}^{(ij)}| \prec \Phi^2$ and $|Q_i^{(ij)}|, |Q_j^{(ij)}| \prec 1$ under (2.23). However, the primary challenge is that this only gives a bound of $\mathbb{E}|\Delta(uu^*)|^2 \prec N\Phi^4$, leading to
$$|\Delta(uu^*)| \prec \sqrt N\Phi^2, \tag{2.25}$$
which does not imply the desired bound $|\Delta(uu^*)| \prec \Phi$ when $z \in \mathbb{C}^+$ is a local spectral parameter. Overcoming this challenge was a central technical contribution of [KY17], who proposed instead a method of bootstrapping on the spectral scale in multiplicative increments of $N^{-\delta}$ in the imaginary part, using the additional a priori estimate
$$|x^*R^{(S)}y| \prec N^{2\delta} \text{ over all deterministic unit vectors } x, y \in \mathbb{C}^n \tag{2.26}$$
obtained from the preceding bootstrap iteration to establish the bound
$$|x^*R^{(S)}y - x^*(-zI - z\widetilde m_0\Sigma)^{-1}y| \prec N^{C\delta}\Phi. \tag{2.27}$$
This then implies also $|x^*R^{(S)}y| \prec 1$, which allows one to continue the bootstrap.
The argument in [KY17] for showing (2.27) from (2.26) was specific to a separable model $g = \Sigma^{1/2}w$ where $w$ has independent entries, and relied on a technically intricate interpolation between $w$ and a Gaussian vector $\mathbf z \sim \mathcal N(0, I)$. In this work, we employ the same bootstrapping structure of [KY17], but pursue a more general proof by directly establishing the desired fluctuation averaging bound
$$|\Delta(uu^*)| \prec N^{2\delta}\Phi \tag{2.28}$$
under the a priori assumptions (2.23) and (2.26), without interpolation. We note that the above estimates $|Z^{(ij)}_{ijj}|, |Z^{(ij)}_{jii}| \prec \Phi^2$ and $|Q^{(ij)}_i|, |Q^{(ij)}_j| \prec 1$ leading to (2.25) are sharp, and that this improved bound (2.28) will instead arise from verifying that, under the cumulant condition of Assumption 3, the expectation of $Z^{(ij)}_{ijj}Q^{(ij)}_jZ^{(ij)}_{jii}Q^{(ij)}_i$ is much smaller than its typical size. We will show this through a power series expansion of $Q^{(ij)}_i$ together with a moment-cumulant expansion of $\mathbb E\,Z^{(ij)}_{ijj}Q^{(ij)}_jZ^{(ij)}_{jii}Q^{(ij)}_i$.

To illustrate this calculation, consider the simplest term $\mathbb E\,Z^{(ij)}_{ijj}Z^{(ij)}_{jii}$. Recalling the forms of $Z^{(ij)}_{ijj}$ and $Z^{(ij)}_{jii}$ with $A = uu^*$, we observe that $\mathbb E\,Z^{(ij)}_{ijj}Z^{(ij)}_{jii}$ depends on the 4th-order moment tensors $\mathbb E\,g_i^{\otimes 4}$ and $\mathbb E\,g_j^{\otimes 4}$, and one term in the resulting moment-cumulant expansion takes the form
$$\mathrm{val} := \sum_{\alpha_1,\ldots,\alpha_4,\beta_1,\ldots,\beta_4=1}^{n} \kappa_4(g_i)[\alpha_1,\ldots,\alpha_4]\,\kappa_4(g_j)[\beta_1,\ldots,\beta_4] \times u[\alpha_1]\,\frac{R^{(ij)}u}{\sqrt N}[\alpha_2]\; u[\beta_1]\,\frac{R^{(ij)}u}{\sqrt N}[\beta_2]\; \frac{R^{(ij)}}{N}[\alpha_3,\beta_3]\,\frac{R^{(ij)}}{N}[\alpha_4,\beta_4]. \tag{2.29}$$
We observe that $\mathrm{val}$ takes the form
$$\mathrm{val} = \Big\langle \kappa_4(g_i),\ u \otimes \frac{R^{(ij)}u}{\sqrt N} \otimes T_i\Big\rangle,$$
where $T_i \in (\mathbb C^n)^{\otimes 2}$ is the order-2 tensor encoding the remaining contraction involving $\kappa_4(g_j)$.
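As a sanity check on this tensor bookkeeping, the contraction (2.29) can be evaluated directly with `einsum`, and the two-step form $\mathrm{val} = \langle \kappa_4(g_i), u \otimes R^{(ij)}u/\sqrt N \otimes T_i\rangle$ gives the same number. The sketch below is our own, with small random tensors standing in for the cumulants and resolvent blocks (nothing here is computed from an actual sample covariance model):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
k_i = rng.standard_normal((n, n, n, n))   # stand-in for kappa_4(g_i)
k_j = rng.standard_normal((n, n, n, n))   # stand-in for kappa_4(g_j)
u = rng.standard_normal(n)
Ru = rng.standard_normal(n)               # stand-in for R^{(ij)} u / sqrt(N)
M = rng.standard_normal((n, n))           # stand-in for R^{(ij)} / N

# (2.29): sum over alpha_1..4 (a,b,c,d) and beta_1..4 (e,f,g,h) of
#   k_i[a,b,c,d] k_j[e,f,g,h] u[a] Ru[b] u[e] Ru[f] M[c,g] M[d,h]
val = np.einsum('abcd,efgh,a,b,e,f,cg,dh->', k_i, k_j, u, Ru, u, Ru, M, M)

# two-step form: first contract kappa_4(g_j) into the order-2 tensor T_i,
# then pair k_i against u, Ru, and T_i
T_i = np.einsum('efgh,e,f,cg,dh->cd', k_j, u, Ru, M, M)
val2 = np.einsum('abcd,a,b,cd->', k_i, u, Ru, T_i)
print(np.isclose(val, val2))
```

This is exactly the kind of mechanical index accounting that the tensor network formalism introduced below is designed to organize at scale.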
Applying Assumption 3 with $k = 4$ and $m = 2$ gives
$$|\mathrm{val}| \prec (\sqrt N)^{4-2-1}\,\|u\|_2 \left\|\frac{R^{(ij)}u}{\sqrt N}\right\|_2 \|T_i\|_{\mathcal U_4}. \tag{2.30}$$
To bound the $\mathcal U_4$-norm, we expand $\|T_i\|_{\mathcal U_4} = \sup_{x,y\in\mathcal U_4}|\langle x\otimes y, T_i\rangle|$. For each fixed $x, y \in \mathcal U_4$,
$$\langle x\otimes y, T_i\rangle = \Big\langle \kappa_4(g_j),\ u \otimes \frac{R^{(ij)}u}{\sqrt N} \otimes \underbrace{\frac{R^{(ij)*}x}{N} \otimes \frac{R^{(ij)*}y}{N}}_{=:\,T_j(x,y)}\Big\rangle.$$
A second application of Assumption 3 with $k = 4$ and $m = 2$ then gives
$$|\langle x\otimes y, T_i\rangle| \prec (\sqrt N)^{4-2-1}\,\|u\|_2\left\|\frac{R^{(ij)}u}{\sqrt N}\right\|_2\|T_j(x,y)\|_{\mathcal U_4} \le (\sqrt N)^{4-2-1}\,\|u\|_2\left\|\frac{R^{(ij)}u}{\sqrt N}\right\|_2\Bigg(\sup_{v,w\in\mathcal U_4}\left|\frac{v^*R^{(ij)}w}{N}\right|\Bigg)^2.$$
Since $|\mathcal U_4| \le n^{C_4}$ has polynomial cardinality, combining with (2.30) and applying the a priori conditions (2.23) and (2.26) shows that $|\mathrm{val}| = O_\prec(N^{-1+4\delta}\Phi^2)$.

For general terms in the high moment expansion, to manage the algebraic complexity we will introduce a tensor network formalism and systematically map the moment expansions to a family of tensor networks. The example above has the following diagrammatic representation:

[Diagrams: left, a tensor network with vertices $\kappa_4(g_i)$ and $\kappa_4(g_j)$ joined by two edges labeled $R^{(ij)}/N$ (connecting legs $(\alpha_3,\beta_3)$ and $(\alpha_4,\beta_4)$), with pendant edges $u$ and $R^{(ij)}u/\sqrt N$ attached at legs $\alpha_1, \alpha_2, \beta_1, \beta_2$; right, the network remaining after applying Assumption 3 to remove the vertex $\kappa_4(g_i)$, with external legs $x$ and $y$.]

As illustrated in this example, we employ a recursive "peeling" strategy: we successively remove high-degree vertices (representing higher-order cumulants) from the graph, using Assumption 3 to control the cost of each removal, until the network is reduced to simple chains and cycles that can be bounded directly (see Lemmas 3.20–3.24).

Finally, we emphasize the dual role of our tensor network framework.
Beyond driving the proofs of our main results, it provides the essential machinery for verifying Assumption 3 in complex nonlinear settings. While verification is relatively straightforward for linear or sign-invariant models (Propositions 2.12 and 2.13), the random features model (Proposition 2.17) presents significant combinatorial challenges. In Section 5.3, we address these by coupling a monomial expansion of $g = \sigma(Xw)$ with a tensor network representation of the cumulant form $\langle \kappa(g), s_1 \otimes \cdots \otimes s_m \otimes T\rangle$. This allows us to systematically bound the resulting diagrams and rigorously establish the required cumulant decay.

3. Fluctuation averaging lemmas

The primary technical input for our main results consists of a sequence of fluctuation averaging lemmas that resolve the weak dependence of the resolvent $R(z)$ on each individual sample $g_j$, $j \neq i$. In this section, we first state and prove these fluctuation averaging lemmas.

Definition 3.1 (Minors). For any $S \subseteq [N]$, let $G^{(S)} \in \mathbb R^{n\times N}$ be the matrix with columns
$$G^{(S)}e_i = \begin{cases} g_i & \text{if } i \notin S, \\ 0 & \text{otherwise.}\end{cases}$$
Let
$$K^{(S)} = \frac1N G^{(S)}G^{(S)*} = \frac1N\sum_{i\in[N]\setminus S}g_ig_i^*, \qquad \widetilde K^{(S)} = \frac1N G^{(S)*}G^{(S)},$$
and denote their resolvents and Stieltjes transforms
$$R^{(S)}(z) = (K^{(S)} - zI_n)^{-1}, \quad m^{(S)}(z) = n^{-1}\operatorname{Tr}R^{(S)}(z), \quad \widetilde R^{(S)}(z) = (\widetilde K^{(S)} - zI_N)^{-1}, \quad \widetilde m^{(S)}(z) = N^{-1}\operatorname{Tr}\widetilde R^{(S)}(z).$$
The expectation with respect to the columns $\{g_i\}_{i\in S}$ of $G$ is denoted as
$$\mathbb E_S[\cdot] := \mathbb E[\cdot \mid \{g_i\}_{i\in[N]\setminus S}].$$
We will often omit the spectral argument $z$ for brevity when the meaning is clear.

Definition 3.2 (Stochastic domination).
For $(N,n)$-dependent random variables $X \in \mathbb R$ and $Y \ge 0$, we write $X \prec Y$ or $X = O_\prec(Y)$ if, for any constants $\varepsilon, D > 0$, there exists $C \equiv C(\varepsilon,D) > 0$ such that
$$\mathbb P[|X| \ge N^\varepsilon Y] \le CN^{-D}.$$
(Equivalently, there exists $N_0 \equiv N_0(\varepsilon,D)$ such that $\mathbb P[|X| \ge N^\varepsilon Y] \le N^{-D}$ for all $N \ge N_0$.) For $X \equiv X(u)$ and $Y \equiv Y(u)$ depending on a parameter $u \in U^{(N)}$, we say that $X \prec Y$ or $X = O_\prec(Y)$ uniformly over $u \in U^{(N)}$ if the preceding holds for a uniform constant $C \equiv C(\varepsilon,D) > 0$ and all $u \in U^{(N)}$. This constant $C(\varepsilon,D)$ may depend on other constant quantities in the context of the statement, including $\tau$ and the regularity constants $C, c > 0$ in Definition 2.3 for a regular spectral domain.

We will often use implicitly the following well-known properties of stochastic domination $\prec$. For the proof of the following lemma, we refer to [FJ22, Lemma D.2].

Lemma 3.3.
(a) If $X(u,v) \prec \zeta(u,v)$ uniformly over $u \in U$ and $v \in V$, and $|V| \le n^C$ for some constant $C > 0$, then uniformly over $u \in U$,
$$\sum_{v\in V}X(u,v) \prec \sum_{v\in V}\zeta(u,v).$$
(b) If $X_1 \prec \zeta_1$ and $X_2 \prec \zeta_2$ uniformly over $u \in U$, then also $X_1X_2 \prec \zeta_1\zeta_2$ uniformly over $u \in U$.
(c) Suppose $X \prec \zeta$ uniformly over $u \in U$, where $\zeta$ is deterministic, $\zeta > n^{-C}$, and $\mathbb E[|X|^k] \le n^{C_k}$ for all $k \in [1,\infty)$ and some constants $C, C_k > 0$. Then $\mathbb E[X \mid \mathcal G] \prec \zeta$ uniformly over $u \in U$ and over all sub-sigma-fields $\mathcal G \subseteq \mathcal F$ of the underlying probability space $(\Omega, \mathcal F, \mathbb P)$.

For any $z = E + \mathrm i\eta \in \mathbb C^+$ and $S \subset [N]$, let us define the control parameter
$$\Gamma^{(S)}(z) = \max_{i\notin S}\big|\widetilde R^{(S)}_{ii}(z) - \widetilde m_0(z)\big|.$$
The following lemma will be used to show the averaged local law of Theorem 2.5.

Lemma 3.4. Suppose Assumptions 1 and 2 hold, and $\mathcal D \subset \mathbb C^+$ is a regular spectral domain.
Suppose there exists a constant $\tau' > 0$ and a deterministic function $\Phi: \mathcal D \to [N^{-1/2}, N^{-\tau'}]$ such that for any fixed $L \ge 1$, uniformly over all $S \subseteq [N]$ with $|S| \le L$ and all $z \in \mathcal D$, we have
$$\Gamma^{(S)} \prec N^{-\tau'}, \qquad \frac{\|R^{(S)}\|_F}{N} \prec \Phi.$$
Then uniformly over all $z \in \mathcal D$ and deterministic matrices $A \in \mathbb C^{n\times n}$ with $\|A\|_{\mathrm{op}} \le 1$,
$$\left|\sum_{i=1}^N \frac{N^{-1}\operatorname{Tr}(g_ig_i^*-\Sigma)R^{(i)}A}{1+N^{-1}g_i^*R^{(i)}g_i}\right| \prec N\Phi^2.$$

To establish anisotropic local laws, our fluctuation averaging lemmas will assume, as a stronger input, the condition $N^{-1/2}\|R^{(S)}x\|_2 \prec \Phi$ uniformly over unit vectors $x \in \mathbb C^n$. We note that this implies the preceding condition $N^{-1}\|R^{(S)}\|_F \prec \Phi$, since $N^{-1}\|R^{(S)}\|_F \le \max_{\alpha=1}^n N^{-1/2}\|R^{(S)}e_\alpha\|_2$. The following lemma will be used to show Theorem 2.10 outside the spectrum.

Lemma 3.5. Suppose Assumptions 1 and 2 hold, and $\mathcal D \subset \mathbb C^+$ is a regular spectral domain. Suppose there exists a constant $\tau' > 0$ and a deterministic function $\Phi: \mathcal D \to [N^{-1/2}, N^{-\tau'}]$ such that for any fixed $L \ge 1$, uniformly over all $S \subseteq [N]$ with $|S| \le L$, all $z \in \mathcal D$, and all deterministic unit vectors $x \in \mathbb C^n$, we have
$$\Gamma^{(S)} \prec N^{-\tau'}, \qquad \frac{\|R^{(S)}x\|_2}{\sqrt N} \prec \Phi.$$
Then
(a) Uniformly over all $z \in \mathcal D$ and deterministic unit vectors $u \in \mathbb C^n$,
$$\left|\sum_{i=1}^N \frac{N^{-1}u^*(g_ig_i^*-\Sigma)R^{(i)}u}{1+N^{-1}g_i^*R^{(i)}g_i}\right| \prec \sqrt N\,\Phi^2.$$
(b) Uniformly over all $z \in \mathcal D$ and deterministic unit vectors $v \in \mathbb C^N$,
$$\left|\sum_{i\neq j}\frac{\bar v[i]v[j]\,N^{-1}g_i^*R^{(ij)}g_j}{(1+N^{-1}g_i^*R^{(i)}g_i)(1+N^{-1}g_j^*R^{(ij)}g_j)}\right| \prec N\Phi^3.$$
(c) Uniformly over all $z \in \mathcal D$ and deterministic unit vectors $v \in \mathbb C^N$ and $u \in \mathbb C^n$,
$$\left|\sum_{i=1}^N \frac{\bar v[i]\,N^{-1/2}g_i^*R^{(i)}u}{1+N^{-1}g_i^*R^{(i)}g_i}\right| \prec \sqrt N\,\Phi^2.$$

The following strengthening of Lemma 3.4 will show, under the additional condition of Assumption 3, the full anisotropic local law of Theorem 2.8.
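To convey the size gain that Lemma 3.4 encodes, here is a small simulation of our own (Gaussian data, $\Sigma = I_n$, $A = I_n$; all parameters assumed): each $Y$-type summand is of typical size $\Phi$, yet the signed sum over $i$ is far smaller than the triangle-inequality bound $N\Phi$, because the leave-one-out terms fluctuate with nearly independent signs.

```python
import numpy as np

rng = np.random.default_rng(3)
n, N = 150, 300
z = 0.6 + 0.2j
G = rng.standard_normal((n, N))              # Sigma = I_n
I = np.eye(n)

# Y-type terms N^{-1} Tr((g_i g_i^* - Sigma) R^{(i)}) from Lemma 3.4 (with A = I);
# the self-normalizing denominators 1 + N^{-1} g_i^* R^{(i)} g_i are omitted here,
# since they are of order 1 and do not drive the cancellation.
terms = np.empty(N, dtype=complex)
for i in range(N):
    g = G[:, i]
    Gi = np.delete(G, i, axis=1)
    Ri = np.linalg.inv(Gi @ Gi.T / N - z * I)    # leave-one-out resolvent R^{(i)}
    terms[i] = (g @ Ri @ g - np.trace(Ri)) / N

# the signed sum is a small fraction of the sum of absolute values,
# reflecting the fluctuation averaging mechanism
ratio = abs(terms.sum()) / np.abs(terms).sum()
print(ratio)
```

The observed ratio shrinks roughly like $N^{-1/2}$, consistent with the improvement from the naive bound $N\Phi$ to the averaged bound $N\Phi^2$ that the lemmas below establish rigorously.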
For a resolvent with spectral decomposition $R = \sum_\alpha (\lambda_\alpha - z)^{-1}v_\alpha v_\alpha^*$, we write $\|R\|_1 = \sum_\alpha |\lambda_\alpha - z|^{-1}$.

Lemma 3.6. Suppose, in addition to the conditions of Lemma 3.5, that Assumption 3 also holds, and there exists a constant $\delta \ge 0$ such that for any fixed $L \ge 1$, uniformly over all $S \subseteq [N]$ with $|S| \le L$, all $z \in \mathcal D$, and all deterministic unit vectors $x, y \in \mathbb C^n$, we have
$$N^{-1}\|R^{(S)}\|_1 \prec 1, \qquad |x^*R^{(S)}y| \prec N^\delta.$$
Then
(a) Uniformly over all $z \in \mathcal D$ and deterministic unit vectors $u \in \mathbb C^n$,
$$\left|\sum_{i=1}^N \frac{N^{-1}u^*(g_ig_i^*-\Sigma)R^{(i)}u}{1+N^{-1}g_i^*R^{(i)}g_i}\right| \prec N^\delta\Phi. \tag{3.1}$$
(b) Uniformly over all $z \in \mathcal D$ and deterministic unit vectors $v \in \mathbb C^N$,
$$\left|\sum_{i\neq j}\frac{\bar v[i]v[j]\,N^{-1}g_i^*R^{(ij)}g_j}{(1+N^{-1}g_i^*R^{(i)}g_i)(1+N^{-1}g_j^*R^{(ij)}g_j)}\right| \prec N^{2\delta}\Phi.$$
(c) Uniformly over all $z \in \mathcal D$ and deterministic unit vectors $u \in \mathbb C^n$ and $v \in \mathbb C^N$,
$$\left|\sum_{i=1}^N \frac{\bar v[i]\,N^{-1/2}g_i^*R^{(i)}u}{1+N^{-1}g_i^*R^{(i)}g_i}\right| \prec N^\delta\Phi.$$

In the remainder of this section, we prove Lemmas 3.4, 3.5, and 3.6.

3.1. Preliminaries.

Lemma 3.7 (Resolvent identities). For any $S \subseteq [N]$:
(a) For any distinct $i, j \notin S$,
$$\widetilde R^{(S)}_{ii} = -\frac1z\cdot\frac{1}{1+N^{-1}g_i^*R^{(iS)}g_i}, \qquad \widetilde R^{(S)}_{ij} = z\,\widetilde R^{(S)}_{ii}\widetilde R^{(iS)}_{jj}\cdot N^{-1}g_i^*R^{(ijS)}g_j. \tag{3.2}$$
(b) For any $i, j, k \notin S$ (including $i = j$) with $k \notin \{i,j\}$,
$$\widetilde R^{(S)}_{ij} = \widetilde R^{(kS)}_{ij} + \frac{\widetilde R^{(S)}_{ik}\widetilde R^{(S)}_{kj}}{\widetilde R^{(S)}_{kk}}, \qquad \frac{1}{\widetilde R^{(S)}_{ii}} = \frac{1}{\widetilde R^{(kS)}_{ii}} - \frac{\widetilde R^{(S)}_{ik}\widetilde R^{(S)}_{ki}}{\widetilde R^{(kS)}_{ii}\widetilde R^{(S)}_{kk}\widetilde R^{(S)}_{ii}}. \tag{3.3}$$
(c) (Sherman–Morrison) For any $i \notin S$,
$$R^{(S)} = R^{(iS)} - \frac{N^{-1}R^{(iS)}g_ig_i^*R^{(iS)}}{1+N^{-1}g_i^*R^{(iS)}g_i} = R^{(iS)} + z\,\widetilde R^{(S)}_{ii}\cdot N^{-1}R^{(iS)}g_ig_i^*R^{(iS)}. \tag{3.4}$$
(d) (Ward's identity)
$$\|R^{(S)}\|_F^2 := \operatorname{Tr}R^{(S)}R^{(S)*} = \frac{\operatorname{Im}\operatorname{Tr}R^{(S)}}{\operatorname{Im}z}. \tag{3.5}$$

Proof. Parts (a–b) follow from applying Schur's complement to the linearized resolvent $\Pi(z)$. Part (c) is the standard Sherman–Morrison formula for a rank-one update of the matrix inverse $R^{(S)}$. Part (d) follows from the identity $R^{(S)} - R^{(S)*} = R^{(S)}(z - \bar z)R^{(S)*} = (2\mathrm i\operatorname{Im}z)R^{(S)}R^{(S)*}$. □

Lemma 3.8 (Concentration). Under Assumptions 1 and 2, we have:
(a) (Linear and quadratic forms) Uniformly over all deterministic $A \in \mathbb C^{n\times n}$, $u \in \mathbb C^n$, and $i, j \in [N]$ with $i \neq j$,
$$g_i^*Ag_j \prec \|A\|_F, \qquad \|Ag_i\|_2 \prec \|A\|_F, \qquad u^*g_i \prec \|u\|_2.$$
(b) (Operator norm) For every $S \subseteq [N]$, $\|K^{(S)}\|_{\mathrm{op}} \le \|K\|_{\mathrm{op}} \prec 1$.

Proof. For (a), we have $|u^*g_i|^2 = g_i^*uu^*g_i = u^*\Sigma u + O_\prec(\|u\|_2^2) = O_\prec(\|u\|_2^2)$, where the second equality follows from Assumption 2 applied to the matrix $uu^*$. Thus $u^*g_i \prec \|u\|_2$. Next, since $g_i$ and $g_j$ are independent, this implies $g_i^*Ag_j \prec \|Ag_j\|_2$. Moreover,
$$\|Ag_j\|_2^2 = g_j^*A^*Ag_j = \operatorname{Tr}\Sigma A^*A + O_\prec(\|A^*A\|_F) = O_\prec(\|A\|_F^2),$$
so $\|Ag_j\|_2 \prec \|A\|_F$ and thus $g_i^*Ag_j \prec \|A\|_F$. This proves (a).

For (b), note that $0 \preceq K^{(S)} \preceq K$ in the positive-semidefinite ordering, so $\|K^{(S)}\|_{\mathrm{op}} \le \|K\|_{\mathrm{op}}$. To show $\|K\|_{\mathrm{op}} \prec 1$, define $A_i = N^{-1}(g_ig_i^* - \Sigma)$. Then $K = \sum_{i=1}^N A_i + \Sigma$. By Assumption 1, $\|\Sigma\|_{\mathrm{op}} \le C$, so it suffices to show $\|\sum_{i=1}^N A_i\|_{\mathrm{op}} \prec 1$. Since the $A_i$ are independent, mean-zero, symmetric random matrices, the matrix Rosenthal inequality implies that for any $k \ge 1$, there exists a constant $C_k > 0$ such that
$$\mathbb E\,\frac1N\operatorname{Tr}\Big(\sum_{i=1}^NA_i\Big)^{2k} \le C_k\underbrace{\Big\|\sum_{i=1}^N\mathbb EA_i^2\Big\|_{\mathrm{op}}^k}_{=\,\mathrm I} + C_k\underbrace{\sum_{i=1}^N\mathbb E\|A_i\|_{\mathrm{op}}^{2k}}_{=\,\mathrm{II}}.$$
By Assumption 1, for each $i \in [N]$, $\|A_i\|_{\mathrm{op}} \le N^{-1}\|g_i\|_2^2 + N^{-1}\|\Sigma\|_{\mathrm{op}} \le N^{-1}\|g_i\|_2^2 + C/N$.
Moreover, $\|g_i\|_2^2 = g_i^*I_ng_i = \operatorname{Tr}\Sigma + O_\prec(\|I_n\|_F) = O_\prec(n)$, so $N^{-1}\|g_i\|_2^2 \prec 1$. Thus $\|A_i\|_{\mathrm{op}} \prec 1$. It follows that for any fixed $k \ge 1$, $\|A_i\|_{\mathrm{op}}^{2k} \prec 1$ and hence $\mathrm{II} \prec N$. For the first term $\mathrm I$, consider any unit vector $v \in \mathbb C^n$. Then, since $0 \preceq \mathbb EA_i^2 = N^{-2}[\mathbb E(g_ig_i^*)^2 - \Sigma^2] \preceq N^{-2}\mathbb E(g_ig_i^*)^2$, we have $\sum_{i=1}^N v^*\mathbb EA_i^2v \le \sum_{i=1}^N N^{-2}\mathbb E\,v^*(g_ig_i^*)^2v$. Since also $v^*g_i \prec 1$ and $\|g_i\|_2^2 \prec n$ as shown above, this shows $\sum_{i=1}^N v^*\mathbb EA_i^2v \prec 1$. Thus for any fixed $k \ge 1$, $\|\sum_{i=1}^N\mathbb EA_i^2\|_{\mathrm{op}}^k \prec 1$ and hence $\mathrm I \prec 1$. Putting this together yields
$$\mathbb E\,\frac1N\operatorname{Tr}\Big(\sum_{i=1}^NA_i\Big)^{2k} \prec N.$$
The claim now follows from Markov's inequality: for any $\varepsilon, D > 0$, there exist constants $k \equiv k(\varepsilon,D) > 0$ and $C \equiv C(\varepsilon,D) > 0$ large enough such that
$$\mathbb P\Bigg[\Big\|\sum_{i=1}^NA_i\Big\|_{\mathrm{op}} \ge N^\varepsilon\Bigg] \le N^{-2\varepsilon k}\,\mathbb E\Big\|\sum_{i=1}^NA_i\Big\|_{\mathrm{op}}^{2k} \le N^{-2\varepsilon k}\,\mathbb E\operatorname{Tr}\Big(\sum_{i=1}^NA_i\Big)^{2k} \le CN^{-2\varepsilon k}N^{2+\varepsilon} \le CN^{-D}. \qquad\square$$

3.2. Sherman–Morrison recursions.

For any $A \in \mathbb C^{n\times n}$, $u \in \mathbb C^n$, $S \subset [N]$, and $i, j, k \in [N]\setminus S$, define
$$Y^{(S)}_i[A] = \operatorname{Tr}(g_ig_i^*-\Sigma)\frac{R^{(S)}A}{\sqrt N}, \qquad Z^{(S)}_{ijk}[A] = \operatorname{Tr}(g_ig_i^*-\Sigma)\frac{R^{(S)}}{N}\,g_jg_k^*\,\frac{R^{(S)}A}{\sqrt N}, \quad i \notin \{j,k\},$$
$$X^{(S)}_i[u] = g_i^*\frac{R^{(S)}u}{\sqrt N}, \qquad B^{(S)}_{jk} = g_j^*\frac{R^{(S)}}{N}g_k, \quad j \neq k, \qquad P^{(S)}_i = \operatorname{Tr}(g_ig_i^*-\Sigma)\frac{R^{(S)}}{N},$$
$$Q^{(S)}_i = \Big(1 + g_i^*\frac{R^{(S)}}{N}g_i\Big)^{-1}, \qquad C^{(S)} = \Big(1 + \operatorname{Tr}\Sigma\frac{R^{(S)}}{N}\Big)^{-1}.$$
Note that each $R^{(S)}$ in these expressions is scaled by $N^{-1}$, and each $R^{(S)}A$ and $R^{(S)}u$ is scaled by $N^{-1/2}$. We remark that the vectors $g_i$ are real and $R^{(S)}$ is symmetric (in the real sense, without complex conjugation), so $B^{(S)}_{jk} = B^{(S)}_{kj}$.

We first establish basic bounds for these quantities under the conditions of Lemmas 3.4 and 3.5.

Lemma 3.9. Suppose the conditions of Lemma 3.4 hold.
Then for any fixed $L \ge 1$, uniformly over all $S \subseteq [N]$ with $|S| \le L$, all $i, j, k \in [N]\setminus S$ with $i \notin \{j,k\}$, all $A \in \mathbb C^{n\times n}$ with $\|A\|_{\mathrm{op}} \le 1$, all unit vectors $u \in \mathbb C^n$, and all $z = E + \mathrm i\eta \in \mathcal D$, we have
$$|Y^{(S)}_i[A]| \prec N^{1/2}\Phi, \quad |Z^{(S)}_{ijk}[A]| \prec N^{1/2}\Phi^2, \quad |B^{(S)}_{ij}| \prec \Phi, \quad |P^{(S)}_i| \prec \Phi, \quad |Q^{(S)}_i| \prec 1, \quad |C^{(S)}| \prec 1.$$
If, in addition, the condition $N^{-1/2}\|R^{(S)}u\|_2 \prec \Phi$ of Lemma 3.5 holds, then
$$|Y^{(S)}_i[uu^*]| \prec \Phi, \qquad |Z^{(S)}_{ijk}[uu^*]| \prec \Phi^2, \qquad |X^{(S)}_i[u]| \prec \Phi.$$

Proof. For the first two bounds for $Y^{(S)}_i$ and $Z^{(S)}_{ijk}$, Assumption 2 and Lemma 3.8(a) give
$$|Y^{(S)}_i[A]| \prec N^{-1/2}\|R^{(S)}A\|_F, \qquad |Z^{(S)}_{ijk}[A]| \prec \Big\|\frac{R^{(S)}}{N}g_jg_k^*\frac{R^{(S)}A}{\sqrt N}\Big\|_F \prec N^{-1}\|R^{(S)}\|_F \cdot N^{-1/2}\|R^{(S)}A\|_F.$$
Since $\|R^{(S)}A\|_F \le \|R^{(S)}\|_F\|A\|_{\mathrm{op}} \le \|R^{(S)}\|_F$ and $N^{-1}\|R^{(S)}\|_F \prec \Phi$, the first two bounds follow. For a unit vector $u$ and $A = uu^*$, we have $\|R^{(S)}uu^*\|_F = \|R^{(S)}u\|_2$, so under the additional assumption $N^{-1/2}\|R^{(S)}u\|_2 \prec \Phi$, this shows $|Y^{(S)}_i[uu^*]| \prec \Phi$ and $|Z^{(S)}_{ijk}[uu^*]| \prec \Phi^2$, as well as
$$|X^{(S)}_i[u]| \prec N^{-1/2}\|R^{(S)}u\|_2 \prec \Phi.$$
The bounds $|B^{(S)}_{ij}| \prec \Phi$ and $|P^{(S)}_i| \prec \Phi$ follow directly from Assumption 2 and the condition $N^{-1}\|R^{(S)}\|_F \prec \Phi$. For $Q^{(S)}_i$ and $C^{(S)}$, Lemma 3.7(a) and the condition $\Gamma^{(S)} \prec N^{-\tau'}$ imply
$$Q^{(S)}_i = -z\widetilde R^{(S\setminus\{i\})}_{ii} = -z\widetilde m_0 + O_\prec(\Gamma^{(S\setminus\{i\})}) = -z\widetilde m_0 + O_\prec(N^{-\tau'}).$$
The conditions $|z| \asymp 1$ and $|\widetilde m_0(z)| \asymp 1$ in Definition 2.3 for regularity of $\mathcal D$ then imply that for some constants $C, c > 0$ and any $D > 0$,
$$\mathbb P\big[c \le |Q^{(S)}_i| \le C\big] \ge 1 - C(D)n^{-D}.$$
This shows $|Q^{(S)}_i| \prec 1$.
Moreover,
$$(Q^{(S)}_i)^{-1} = 1 + g_i^*(N^{-1}R^{(S)})g_i = 1 + \operatorname{Tr}\Sigma(N^{-1}R^{(S)}) + \operatorname{Tr}(g_ig_i^*-\Sigma)(N^{-1}R^{(S)}) = (C^{(S)})^{-1} + O_\prec(\Phi).$$
Since $\Phi \le N^{-\tau'}$, the above bounds for $|Q^{(S)}_i|$ then also imply $|C^{(S)}| \prec 1$. □

The following lemma describes a system of recursions for these quantities, derived from the Sherman–Morrison identity (3.4).

Lemma 3.10. For any $A \in \mathbb C^{n\times n}$, $u \in \mathbb C^n$, $S \subseteq [N]$, $i, j, k, l \in [N]\setminus S$ with $l \notin \{i,j,k\}$ and $i \notin \{j,k\}$, and $z \in \mathbb C^+$, we have
$$Y^{(S)}_i[A] = Y^{(Sl)}_i[A] - Z^{(Sl)}_{ill}[A]\,Q^{(Sl)}_l, \tag{3.6}$$
$$Z^{(S)}_{ijk}[A] = Z^{(Sl)}_{ijk}[A] - Z^{(Sl)}_{ijl}[A]\,B^{(Sl)}_{kl}Q^{(Sl)}_l - Z^{(Sl)}_{ilk}[A]\,B^{(Sl)}_{lj}Q^{(Sl)}_l + Z^{(Sl)}_{ill}[A]\,B^{(Sl)}_{kl}B^{(Sl)}_{lj}\big(Q^{(Sl)}_l\big)^2, \tag{3.7}$$
$$Z^{(S)}_{ikk}[A] = Z^{(Sl)}_{ikk}[A] - Z^{(Sl)}_{ikl}[A]\,B^{(Sl)}_{kl}Q^{(Sl)}_l - Z^{(Sl)}_{ilk}[A]\,B^{(Sl)}_{lk}Q^{(Sl)}_l + Z^{(Sl)}_{ill}[A]\,B^{(Sl)}_{kl}B^{(Sl)}_{lk}\big(Q^{(Sl)}_l\big)^2, \tag{3.8}$$
$$X^{(S)}_i[u] = X^{(Sl)}_i[u] - X^{(Sl)}_l[u]\,B^{(Sl)}_{il}\,Q^{(Sl)}_l, \tag{3.9}$$
$$B^{(S)}_{ij} = B^{(Sl)}_{ij} - B^{(Sl)}_{il}B^{(Sl)}_{lj}Q^{(Sl)}_l. \tag{3.10}$$
Moreover, fix any constants $L \ge 1$ and $D > 0$. If the conditions of Lemma 3.4 hold, then uniformly over all $z \in \mathcal D$, $S \subseteq [N]$ with $|S| \le L$, and $i, l \in [N]\setminus S$ with $i \neq l$,
$$Q^{(S)}_i = \sum_{m=0}^D\big(Q^{(Sl)}_i\big)^{m+1}\big(B^{(Sl)}_{il}B^{(Sl)}_{li}Q^{(Sl)}_l\big)^m + O_\prec\big(N^{-2\tau'(D+1)}\big), \tag{3.11}$$
$$Q^{(S)}_i = \sum_{m=0}^D\big(C^{(S)}\big)^{m+1}\big(-P^{(S)}_i\big)^m + O_\prec\big(N^{-\tau'(D+1)}\big). \tag{3.12}$$

Proof. The identities (3.6)–(3.10) follow directly from applying the Sherman–Morrison formula (3.4) to the left sides. For (3.11), by Lemma 3.9 we have
$$|Q^{(S)}_i|, |C^{(S)}| \prec 1, \qquad |B^{(S)}_{ij}|, |P^{(S)}_i| \prec \Phi \le N^{-\tau'}.$$
Applying Sherman–Morrison to $(Q^{(S)}_i)^{-1}$ gives
$$(Q^{(S)}_i)^{-1} = 1 + g_i^*(N^{-1}R^{(S)})g_i = (Q^{(Sl)}_i)^{-1} - B^{(Sl)}_{il}B^{(Sl)}_{li}Q^{(Sl)}_l.$$
Using the identity, valid for scalars $a, b \neq 0$,
$$a^{-1} = \sum_{m=0}^D b^{-(m+1)}(b-a)^m + a^{-1}b^{-(D+1)}(b-a)^{D+1}, \tag{3.13}$$
we obtain
$$Q^{(S)}_i = \sum_{m=0}^D\big(Q^{(Sl)}_i\big)^{m+1}\big(B^{(Sl)}_{il}B^{(Sl)}_{li}Q^{(Sl)}_l\big)^m + Q^{(S)}_i\big(Q^{(Sl)}_i\big)^{D+1}\big(B^{(Sl)}_{il}B^{(Sl)}_{li}Q^{(Sl)}_l\big)^{D+1} = \sum_{m=0}^D\big(Q^{(Sl)}_i\big)^{m+1}\big(B^{(Sl)}_{il}B^{(Sl)}_{li}Q^{(Sl)}_l\big)^m + O_\prec\big(N^{-2\tau'(D+1)}\big),$$
since $|B^{(Sl)}_{il}B^{(Sl)}_{li}Q^{(Sl)}_l| \prec N^{-2\tau'}$ and $|Q^{(S)}_i|, |Q^{(Sl)}_i| \prec 1$. For (3.12), apply (3.13) to
$$(Q^{(S)}_i)^{-1} = 1 + g_i^*(N^{-1}R^{(S)})g_i = (C^{(S)})^{-1} + P^{(S)}_i,$$
yielding similarly
$$Q^{(S)}_i = \sum_{m=0}^D\big(C^{(S)}\big)^{m+1}\big(-P^{(S)}_i\big)^m + O_\prec\big(N^{-\tau'(D+1)}\big). \qquad\square$$

3.3. Proof of Lemma 3.4.

We fix a matrix $A \in \mathbb C^{n\times n}$ with $\|A\|_{\mathrm{op}} \le 1$, and abbreviate $Y^{(S)}_i = Y^{(S)}_i[A]$, $Z^{(S)}_{ijk} = Z^{(S)}_{ijk}[A]$. All subsequent $O_\prec(\cdot)$ bounds are implicitly uniform over all such matrices $A$. The following lemma characterizes the combinatorics of an expansion of $Y^{(i)}_iQ^{(i)}_i$ that resolves the dependence on the variables $\{g_j\}_{j\in S}$ in a set $S \subset [N]$, using the recursions of Lemma 3.10.

Lemma 3.11. Suppose the conditions of Lemma 3.4 hold. Fix any $L \ge 1$ and $D > 0$. Then uniformly over $z \in \mathcal D$, $S \subset [N]$ with $|S| \le L$, and $i \in S$, the following holds: Denote $S^{(i)} = S\setminus\{i\}$. There exists a collection of monomials $\mathcal M_{i,S}$ such that the quantity $Y^{(i)}_iQ^{(i)}_i$ can be expanded as
$$Y^{(i)}_iQ^{(i)}_i = \sum_{q\in\mathcal M_{i,S}}q\Big(Y^{(S)}_i,\ \{Z^{(S)}_{ijk}\}_{j,k\in S^{(i)}},\ \{B^{(S)}_{ij}\}_{j\in S^{(i)}},\ \{B^{(S)}_{jk}\}_{j\neq k\in S^{(i)}},\ P^{(S)}_i,\ \{P^{(S)}_j\}_{j\in S^{(i)}},\ C^{(S)}\Big) + O_\prec\big(N^{-\tau'(D+1)+1/2}\big). \tag{3.14}$$
Each monomial $q \in \mathcal M_{i,S}$ is a product of $\pm 1$ and one or more of its inputs, allowing repetition.
We have $q = O_\prec(N^{1/2}\Phi)$ uniformly over $q \in \mathcal M_{i,S}$, and the number of monomials $|\mathcal M_{i,S}|$ is at most a constant depending only on $L, D$. Furthermore, for each $q \in \mathcal M_{i,S}$, letting $m_Y, m_Z, m_B^*, m_B$ denote the numbers of factors of the forms $Y^{(S)}_i$, $\{Z^{(S)}_{ijk}\}_{j,k\in S^{(i)}}$, $\{B^{(S)}_{ij}\}_{j\in S^{(i)}}$, and $\{B^{(S)}_{jk}\}_{j\neq k\in S^{(i)}}$ in $q$, we have:
(a) $m_Y + m_Z = 1$.
(b) $m_B^*$ is even.
(c) The number of distinct indices of $S^{(i)}$ that appear as lower indices across all factors of $q$ is at most $m_Z + m_B + \frac12m_B^*$.

Proof. We arbitrarily order the indices of $S^{(i)}$ as $l_1, \ldots, l_{|S|-1}$. Beginning with the term $Y^{(i)}_iQ^{(i)}_i$, iteratively for $j = 1, \ldots, |S|-1$, we replace all factors with superscript $(il_1\ldots l_{j-1})$ by a sum of terms with superscript $(il_1\ldots l_j)$, using the recursions (3.6)–(3.8) for $Y$ and $Z$, (3.10) for $B$, and (3.11) for $Q$. After we have replaced all superscripts to be $(S) = (il_1\ldots l_{|S|-1})$, we then apply the recursion (3.12) for $Q$ to replace each factor $Q^{(S)}_l$ by factors $C^{(S)}$ and $P^{(S)}_l$. It is then direct to check that this gives a representation of the form (3.14), where:
• Each application of (3.6)–(3.8) replaces a factor $Y^{(\ldots)}_i$ or $Z^{(\ldots)}_{ijk}$ by terms having exactly one such factor. Thus, each monomial $q \in \mathcal M_{i,S}$ has exactly one factor $Y^{(S)}_i$ or $Z^{(S)}_{ijk}$, i.e. $m_Y + m_Z = 1$.
• The total number of applications of (3.6)–(3.8), (3.10), (3.11), and (3.12) is bounded by a constant depending on $L, D$, so $|\mathcal M_{i,S}|$ and the number of factors of each $q \in \mathcal M_{i,S}$ are also bounded by constants depending on $L, D$. By the bounds of Lemma 3.9, the factor $Y^{(S)}_i$ or $Z^{(S)}_{ijk}$ of $q$ is $O_\prec(N^{1/2}\Phi)$, and all other factors are $O_\prec(1)$. Thus each $q \in \mathcal M_{i,S}$ satisfies $q = O_\prec(N^{1/2}\Phi)$, and the remainder in (3.14) is at most $O_\prec(N^{-\tau'(D+1)+1/2})$.
• Each application of (3.11) replaces $Q^{(\ldots)}_i$ or $Q^{(\ldots)}_j$ by terms having an even number of factors $\{B^{(\ldots)}_{ij}\}_{j\in S^{(i)}}$, and each application of (3.6)–(3.8) does not change the number of factors of the form $\{B^{(\ldots)}_{ij}\}_{j\in S^{(i)}}$. Thus $m_B^*$ is even.
• If an index of $S^{(i)}$ appears as a lower index on the left side of (3.6)–(3.8), (3.10), or (3.11), then it also appears as a lower index on some factor of each term of the right side. Furthermore, each term on the right side of (3.6)–(3.8) and (3.10) that contains any factor with the new lower index $l$ has at least one more factor of the form $\{Z^{(\ldots)}_{ijk}\}_{j,k\in S^{(i)}}$ or $\{B^{(\ldots)}_{jk}\}_{j\neq k\in S^{(i)}}$ than the left side, and similarly each term on the right side of (3.11) that contains the new lower index $l$ has at least two more factors of the form $\{B^{(\ldots)}_{ij}\}_{j\in S^{(i)}}$ than the left side. That is, whenever a new lower index of $S^{(i)}$ is introduced, either $m_Z + m_B$ increases by at least 1, or $m_B^*$ increases by at least 2. Thus the number of distinct lower indices of $S^{(i)}$ across all factors of $q$ is at most $m_Z + m_B + \frac12m_B^*$.

Combining these observations yields the lemma. □

Proof of Lemma 3.4. Fix constants $L \ge 1$ and $D > 0$. A high moment expansion gives
$$\mathbb E\left|\frac{1}{\sqrt N}\sum_{i=1}^NY^{(i)}_iQ^{(i)}_i\right|^{2L} = N^{-L}\sum_{i_1,\ldots,i_{2L}=1}^N\underbrace{\mathbb E\prod_{l=1}^LY^{(i_l)}_{i_l}Q^{(i_l)}_{i_l}\prod_{l=L+1}^{2L}\overline{Y^{(i_l)}_{i_l}Q^{(i_l)}_{i_l}}}_{\mathbb E\,m(i_1,\ldots,i_{2L})}.$$
Let $S = \{i_1,\ldots,i_{2L}\}$ denote the set of distinct indices in $i_1,\ldots,i_{2L}$. Applying Lemma 3.11 to expand each $Y^{(i_l)}_{i_l}Q^{(i_l)}_{i_l}$ over $S$,
$$\mathbb E\,m(i_1,\ldots,i_{2L}) = \sum_{q_1\in\mathcal M_{i_1,S}}\cdots\sum_{q_{2L}\in\mathcal M_{i_{2L},S}}\mathbb E\underbrace{\prod_{l=1}^Lq_l\prod_{l=L+1}^{2L}\bar q_l}_{:=\,q} + O_\prec\big(N^{-\tau'(D+1)+L}\big).$$
Hence, noting that the number of index tuples $(i_1,\ldots,i_{2L})$ with $|\{i_1,\ldots,i_{2L}\}| = s$ is at most $C(L)N^s$ for some constant $C(L) > 0$,
$$\mathbb E\left|\frac1{\sqrt N}\sum_{i=1}^NY^{(i)}_iQ^{(i)}_i\right|^{2L} \prec \max_{\substack{i_1,\ldots,i_{2L}\in[N]\\ q_1\in\mathcal M_{i_1,S},\ldots,q_{2L}\in\mathcal M_{i_{2L},S}}}\Big\{N^{|S|-L}\,\mathbb E|\mathbb E_Sq|\Big\} + N^{-\tau'(D+1)+L} \tag{3.15}$$
where $\mathbb E_S$ denotes the partial expectation over $\{g_i: i \in S\}$. We now bound $\mathbb E_Sq = \mathbb E_S\prod_{l=1}^Lq_l\prod_{l=L+1}^{2L}\bar q_l$. Let $K \subseteq S$ correspond to the indices appearing exactly once in the list $(i_1,\ldots,i_{2L})$. For each $l \in [2L]$, let $m_Z(l), m_B(l), m_B^*(l)$ denote the counts $m_Z, m_B, m_B^*$ of Lemma 3.11 for $q_l$, and let $m_P^*(l)$ denote also the number of factors $P^{(S)}_{i_l}$ in $q_l$. Set
$$m_Z = \sum_{l=1}^{2L}m_Z(l), \qquad m_B = \sum_{l=1}^{2L}m_B(l), \qquad m_B^* = \sum_{l=1}^{2L}m_B^*(l), \qquad m_P^* = \sum_{l=1}^{2L}m_P^*(l). \tag{3.16}$$
We consider three cases for each index $i_l \in K$:
• $i_l$ does not appear as a lower index on any factor of $q_l$ other than $Y^{(S)}_{i_l}$ or $Z^{(S)}_{i_ljk}$, or on any factor of $\{q_{l'}: l' \neq l\}$. In this case, Lemma 3.11(a) implies that the only factor of $q$ which depends on $g_{i_l}$ is the (exactly one) factor $Y^{(S)}_{i_l}$ or $Z^{(S)}_{i_ljk}$ of $q_l$. Since $\mathbb E_{i_l}[Y^{(S)}_{i_l}] = \mathbb E_{i_l}[Z^{(S)}_{i_ljk}] = 0$, it follows that $\mathbb E_Sq = 0$.
• $i_l$ appears as a lower index in some $\{q_{l'}: l' \neq l\}$. Since $i_l$ is distinct from $i_{l'}$ for each $l' \neq l$ and hence belongs to $S^{(i_{l'})}$, Lemma 3.11(c) ensures that the total number of such indices $i_l \in K$ is at most $m_Z + m_B + \frac12m_B^*$.
• $i_l$ does not appear as a lower index on any $\{q_{l'}: l' \neq l\}$, but appears as a lower index on at least one factor $P^{(S)}_{i_l}$ or $\{B^{(S)}_{i_lj}\}_{j\in S^{(i_l)}}$ of $q_l$. Then either $m_P^*(l) \ge 1$, or $m_P^*(l) = 0$, in which case Lemma 3.11(b) ensures that $m_B^*(l) \ge 2$. So the number of such indices $i_l \in K$ is at most $m_P^* + \frac12m_B^*$.
Combining these cases, we see that either $\mathbb E_Sq = 0$, or
$$m_Z + m_B + \tfrac12m_B^* + m_P^* + \tfrac12m_B^* = m_Z + m_B + m_B^* + m_P^* \ge |K|.$$
In the latter case, using Lemma 3.9 to bound $|Y^{(S)}_i| \prec N^{1/2}\Phi$, $|Z^{(S)}_{ijk}| \prec N^{1/2}\Phi^2$, $|B^{(S)}_{ij}| \prec \Phi$ (for the factors counted by both $m_B$ and $m_B^*$), $|P^{(S)}_i| \prec \Phi$, and each other factor of $q$ by $O_\prec(1)$, and recalling that $q$ has exactly $2L$ factors of the forms $Y^{(S)}_i$ and $Z^{(S)}_{ijk}$ by Lemma 3.11(a), we get
$$|\mathbb E_Sq| \prec \Phi^{|K|}\big(N^{1/2}\Phi\big)^{2L}. \tag{3.17}$$
Hence
$$\mathbb E\left|\frac1{\sqrt N}\sum_{i=1}^NY^{(i)}_iQ^{(i)}_i\right|^{2L} \prec N^{|S|}\Phi^{|K|+2L} + N^{-\tau'(D+1)+L}.$$
Since indices in $K$ appear exactly once, the remaining $2L - |K|$ indices must appear at least twice, so $(2L-|K|)/2 + |K| = L + |K|/2 \ge |S|$ (the number of distinct indices). Thus
$$N^{|S|}\Phi^{|K|+2L} \le \big(N\Phi^2\big)^{|K|/2+L} \le \big(N\Phi^2\big)^{2L},$$
where the last inequality uses $|K| \le 2L$. Since $\Phi \ge N^{-1/2}$, this is larger than the second term $N^{-\tau'(D+1)+L}$ for a sufficiently large choice of constant $D \equiv D(\tau',L) > 0$. Thus
$$\mathbb E\left|\frac1{\sqrt N}\sum_{i=1}^NY^{(i)}_iQ^{(i)}_i\right|^{2L} \prec \big(N\Phi^2\big)^{2L}.$$
Then Markov's inequality implies $\big|N^{-1/2}\sum_{i=1}^NY^{(i)}_iQ^{(i)}_i\big| \prec N\Phi^2$, which shows Lemma 3.4. □

3.4. Proof of Lemma 3.5.

Fixing a unit vector $u \in \mathbb C^n$, let us now abbreviate $Y^{(S)}_i = Y^{(S)}_i[uu^*]$, $Z^{(S)}_{ijk} = Z^{(S)}_{ijk}[uu^*]$, $X^{(S)}_i = X^{(S)}_i[u]$. All subsequent $O_\prec(\cdot)$ bounds are implicitly uniform in $u$. We now show Lemma 3.5 under the additional condition that $N^{-1/2}\|R^{(S)}u\|_2 \prec \Phi$.

Proof of Lemma 3.5(a). The proof is identical to that of Lemma 3.4, where now by Lemma 3.9 we have the bounds $|Y^{(S)}_i| \prec \Phi$ and $|Z^{(S)}_{ijk}| \prec \Phi^2$.
This gives, instead of (3.17), $|\mathbb E_Sq| \prec \Phi^{|K|}\Phi^{2L}$, and hence
$$\mathbb E\left|\frac1{\sqrt N}\sum_{i=1}^NY^{(i)}_iQ^{(i)}_i\right|^{2L} \prec N^{|S|-L}\Phi^{|K|+2L} + N^{-\tau'(D+1)+L} \le N^{-L}\big(N\Phi^2\big)^{2L} + N^{-\tau'(D+1)+L}.$$
Choosing $D$ large enough and applying Markov's inequality shows Lemma 3.5(a). □

We next show Lemma 3.5(b). The following lemma is similar to Lemma 3.11.

Lemma 3.12. Suppose the conditions of Lemma 3.5 hold. Fix any $L \ge 1$ and $D > 0$. Then uniformly over $z \in \mathcal D$, $S \subset [N]$ with $|S| \le L$, and $i, j \in S$ with $i \neq j$, the following holds: Denote $S^{(ij)} = S\setminus\{i,j\}$. There exists a collection of monomials $\mathcal M_{ij,S}$ such that $B^{(ij)}_{ij}Q^{(i)}_iQ^{(ij)}_j$ can be expanded as
$$B^{(ij)}_{ij}Q^{(i)}_iQ^{(ij)}_j = \sum_{q\in\mathcal M_{ij,S}}q\Big(\{B^{(S)}_{ij}\},\ \{B^{(S)}_{ik}\}_{k\in S^{(ij)}},\ \{B^{(S)}_{jk}\}_{k\in S^{(ij)}},\ \{B^{(S)}_{kl}\}_{k\neq l\in S^{(ij)}},\ P^{(S)}_i,\ P^{(S)}_j,\ \{P^{(S)}_k\}_{k\in S^{(ij)}},\ C^{(S)}\Big) + O_\prec\big(N^{-\tau'(D+1)}\big). \tag{3.18}$$
Each monomial $q \in \mathcal M_{ij,S}$ is a product of $\pm1$ and one or more of its inputs, allowing repetition. We have $q = O_\prec(\Phi)$ uniformly over $q \in \mathcal M_{ij,S}$, and the number of monomials $|\mathcal M_{ij,S}|$ is at most a constant depending only on $L, D$. Furthermore, for each $q \in \mathcal M_{ij,S}$, letting $m_B^{**}, m_B^{*1}, m_B^{*2}, m_B$ denote the numbers of factors of the forms $\{B^{(S)}_{ij}\}$, $\{B^{(S)}_{ik}\}_{k\in S^{(ij)}}$, $\{B^{(S)}_{jk}\}_{k\in S^{(ij)}}$, $\{B^{(S)}_{kl}\}_{k\neq l\in S^{(ij)}}$ appearing in $q$, we have:
(a) Either $m_B^{**} \ge 1$, or both $m_B^{*1} \ge 1$ and $m_B^{*2} \ge 1$.
(b) The number of occurrences of $i$ as a lower index in all factors $B^{(S)}_{\cdot\cdot}$ of $q$ (i.e. $m_B^{**} + m_B^{*1}$) is odd. Similarly, the number of such occurrences of $j$ (i.e. $m_B^{**} + m_B^{*2}$) is odd. For each $k \in S^{(ij)}$, the number of occurrences of $k$ as a lower index across all factors $B^{(S)}_{\cdot\cdot}$ of $q$ is even.
(c) The number of distinct indices of $S^{(ij)}$ that appear as lower indices across all factors of $q$ is at most $m_B + \frac12\big(m_B^{*1} + m_B^{*2}\big)$.

Proof. We may first apply (3.11) to expand $Q^{(i)}_i$ in $j$, to get
$$B^{(ij)}_{ij}Q^{(i)}_iQ^{(ij)}_j = \sum_{m=0}^D\big[B^{(ij)}_{ij}\big]^{2m+1}\big[Q^{(ij)}_i\big]^{m+1}\big[Q^{(ij)}_j\big]^{m+1} + O_\prec\big(N^{-2\tau'(D+1)}\big). \tag{3.19}$$
We may then successively expand $B$ and $Q$ in the indices of $S^{(ij)}$ using (3.10) and (3.11), and finally expand each $Q^{(S)}_{\cdot}$ in $C^{(S)}$ and $P^{(S)}_{\cdot}$ using (3.12). This gives a representation of the form (3.18). Since $|B^{(S)}_{ij}| \prec \Phi$ and $|Q^{(S)}_i| \prec 1$, we have $q = O_\prec(\Phi)$, and the remainder is $O_\prec(N^{-\tau'(D+1)})$.

Each term of (3.19) has at least one factor of $B^{(ij)}_{ij}$. If (3.10) is applied to expand $B^{(\ldots)}_{ij}$, then the resulting terms either have a factor of $B^{(\ldots)}_{ij}$, or have a factor each of $B^{(\ldots)}_{ik}$ and $B^{(\ldots)}_{jl}$ for some $k, l \in S^{(ij)}$, and this is preserved in later steps of the expansion. This shows property (a). Property (b) holds because $i$ and $j$ each occur an odd number of times as a lower index of $B^{(ij)}_{\cdot\cdot}$ in (3.19), and each $k \in S^{(ij)}$ occurs zero times; the parities of these occurrences do not change in the applications of (3.10) and (3.11). Property (c) holds as in Lemma 3.11, because each application of (3.10) or (3.11) which introduces a new lower index introduces at least one additional factor of $\{B^{(\ldots)}_{kl}\}_{k,l\in S^{(ij)}}$ or two additional factors of $\{B^{(\ldots)}_{ik}\}_{k\in S^{(ij)}} \cup \{B^{(\ldots)}_{jk}\}_{k\in S^{(ij)}}$. □

Proof of Lemma 3.5(b). Fix any $L \ge 1$ and $D > 0$.
For any unit vector $v \in \mathbb{R}^N$, we may expand
$$\mathbb{E}\bigg|\sum_{i\ne j}\bar v(i)v(j)\,B^{(ij)}_{ij}Q^{(i)}_iQ^{(ij)}_j\bigg|^{2L} = \sum_{i_1\ne j_1}\cdots\sum_{i_{2L}\ne j_{2L}}\ \prod_{l=1}^{L}\bar v(i_l)v(j_l)\prod_{l=L+1}^{2L}v(i_l)\bar v(j_l)\ \mathbb{E}\Bigg[\prod_{l=1}^{L}B^{(i_lj_l)}_{i_lj_l}Q^{(i_l)}_{i_l}Q^{(i_lj_l)}_{j_l}\prod_{l=L+1}^{2L}\overline{B^{(i_lj_l)}_{i_lj_l}Q^{(i_l)}_{i_l}Q^{(i_lj_l)}_{j_l}}\Bigg].$$
Writing $S = \{i_1,\ldots,i_{2L},j_1,\ldots,j_{2L}\}$ for the set of distinct indices, $q_l \in \mathcal{M}_{i_lj_l,S}$ for the monomials in the expansion of $B^{(i_lj_l)}_{i_lj_l}Q^{(i_l)}_{i_l}Q^{(i_lj_l)}_{j_l}$ in Lemma 3.12, and $q = \prod_{l=1}^{L}q_l\prod_{l=L+1}^{2L}\bar q_l$, we have
$$\mathbb{E}\bigg|\sum_{i\ne j}\bar v(i)v(j)\,B^{(ij)}_{ij}Q^{(i)}_iQ^{(ij)}_j\bigg|^{2L} \le \sum_{i_1\ne j_1}\cdots\sum_{i_{2L}\ne j_{2L}}\prod_{l=1}^{2L}|v(i_l)v(j_l)|\Bigg(\sum_{q_1\in\mathcal{M}_{i_1j_1,S},\ldots,q_{2L}\in\mathcal{M}_{i_{2L}j_{2L},S}}\mathbb{E}|\mathbb{E}_S q| + O_\prec(N^{-\tau'(D+1)})\Bigg).$$
Let $K \subseteq S$ denote the subset of indices that appear exactly once in the list $(i_1,\ldots,i_{2L},j_1,\ldots,j_{2L})$. Applying $\sum_i|v(i)| \le \sqrt N$ and $\sum_i|v(i)|^p \le 1$ for each $p \ge 2$, for any fixed $k \in \{1,\ldots,4L\}$ and a constant $C \equiv C(L) > 0$ we then have
$$\sum_{\substack{i_1\ne j_1,\ldots,i_{2L}\ne j_{2L}\\ |K|=k}}\ \prod_{l=1}^{2L}|v(i_l)v(j_l)| \le CN^{k/2} \le CN^{2L}.$$
Thus
$$\mathbb{E}\bigg|\sum_{i\ne j}\bar v(i)v(j)\,B^{(ij)}_{ij}Q^{(i)}_iQ^{(ij)}_j\bigg|^{2L} \prec \max_{\substack{i_1,\ldots,i_{2L},j_1,\ldots,j_{2L}\in[N]\\ q_1\in\mathcal{M}_{i_1j_1,S},\ldots,q_{2L}\in\mathcal{M}_{i_{2L}j_{2L},S}}}\Big\{N^{|K|/2}\,\mathbb{E}|\mathbb{E}_S q|\Big\} + O_\prec(N^{-\tau'(D+1)+2L}). \qquad (3.20)$$
Here $S = \{i_1,\ldots,i_{2L},j_1,\ldots,j_{2L}\}$ denotes the set of distinct indices and $K \subseteq S$ denotes the subset appearing exactly once, both of which depend implicitly on $(i_1,\ldots,i_{2L},j_1,\ldots,j_{2L})$.
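The two norm facts invoked above are elementary: $\sum_i |v(i)| \le \sqrt N$ follows from Cauchy-Schwarz against the all-ones vector, and $\sum_i |v(i)|^p \le \sum_i |v(i)|^2 = 1$ for $p \ge 2$ since each $|v(i)| \le 1$. As an informal numerical check (not part of the argument, variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1000

# A random unit vector v in R^N.
v = rng.standard_normal(N)
v /= np.linalg.norm(v)

a = np.abs(v)
# l1 bound: sum_i |v(i)| <= sqrt(N), by Cauchy-Schwarz with the all-ones vector.
assert a.sum() <= np.sqrt(N) + 1e-12
# lp bounds: sum_i |v(i)|^p <= 1 for all p >= 2, since |v(i)| <= 1 entrywise.
for p in [2, 3, 4, 10]:
    assert (a ** p).sum() <= 1 + 1e-12
```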
For each $l \in [2L]$, let $m^{**}_B(l), m^{*1}_B(l), m^{*2}_B(l), m_B(l)$ denote the counts $m^{**}_B, m^{*1}_B, m^{*2}_B, m_B$ of Lemma 3.12 for $q_l$, and let $m^{*1}_P(l), m^{*2}_P(l)$ denote also the total number of factors $P^{(S)}_{i_l}, P^{(S)}_{j_l}$ in $q_l$ respectively. Set
$$m^{**}_B = \sum_{l=1}^{2L} m^{**}_B(l), \qquad m^*_B = \sum_{l=1}^{2L} m^{*1}_B(l) + m^{*2}_B(l), \qquad m_B = \sum_{l=1}^{2L} m_B(l), \qquad m^*_P = \sum_{l=1}^{2L} m^{*1}_P(l) + m^{*2}_P(l),$$
and denote also
$$E(l) = m^{**}_B(l) + \tfrac12\big(m^{*1}_B(l) + m^{*2}_B(l)\big) + m^{*1}_P(l) + m^{*2}_P(l) - 1.$$
Note that Lemma 3.12(a) implies $E(l) \ge 0$ for each $l = 1, \ldots, 2L$. Consider three cases for an index $i_l \in K$ (or $j_l \in K$), analogous to the proof of Lemma 3.4:
• $i_l$ appears as a lower index on exactly one factor of $q_l$ (which must be $B^{(S)}_{i_l k}$ for some $k \ne i_l$), and it does not appear as a lower index on any factor of $\{q_{l'} : l' \ne l\}$. Then $\mathbb{E}_S q = 0$.
• $i_l$ appears as a lower index in some $\{q_{l'} : l' \ne l\}$. Since $i_l$ is distinct from $\{i_{l'}, j_{l'}\}$ for each $l' \ne l$, Lemma 3.12(c) ensures that the total number of such indices $i_l, j_l \in K$ is at most $m_B + \frac12 m^*_B$.
• $i_l$ does not appear as a lower index on any $\{q_{l'} : l' \ne l\}$, but appears as a lower index on at least two factors of $q_l$. Then Lemma 3.12(a–b) ensures that either $m^{*1}_P(l) \ge 1$ and $m^{**}_B(l) + \frac12(m^{*1}_B(l)+m^{*2}_B(l)) \ge 1$, or $m^{*1}_P(l) = 0$ and $m^{**}_B(l) + \frac12(m^{*1}_B(l)+m^{*2}_B(l)) \ge 2$ (since $m^{**}_B(l) + m^{*1}_B(l) \ge 2$ is odd, and also $m^{*2}_B(l) \ge 1$ if $m^{**}_B(l) = 0$). This implies $E(l) \ge 1$.
Furthermore, if also $j_l \in K$ and $j_l$ appears as a lower index on at least two factors of $q_l$, then similarly, either $m^{*1}_P(l)+m^{*2}_P(l) \ge 2$ and $m^{**}_B(l)+\frac12(m^{*1}_B(l)+m^{*2}_B(l)) \ge 1$, or $m^{*1}_P(l)+m^{*2}_P(l) = 1$ and $m^{**}_B(l)+\frac12(m^{*1}_B(l)+m^{*2}_B(l)) \ge 2$, or $m^{*1}_P(l)+m^{*2}_P(l) = 0$ and $m^{**}_B(l)+\frac12(m^{*1}_B(l)+m^{*2}_B(l)) \ge 3$. This implies $E(l) \ge 2$.

Thus, the total number of such indices $i_l, j_l \in K$ is at most $\sum_{l=1}^{2L} E(l)$. Combining these cases, we see that either $\mathbb{E}_S q = 0$, or
$$m_B + \tfrac12 m^*_B + \sum_{l=1}^{2L} E(l) = m^{**}_B + m^*_B + m_B + m^*_P - 2L \ge |K|.$$
In the latter case, using Lemma 3.9 to bound $|B^{(S)}_{ij}|, |P^{(S)}_i| \prec \Phi$ and each other factor of $q$ by $O_\prec(1)$, we get $|\mathbb{E}_S q| \prec \Phi^{2L+|K|}$, and hence
$$\mathbb{E}\bigg|\sum_{i\ne j}\bar v(i)v(j)\,B^{(ij)}_{ij}Q^{(i)}_iQ^{(ij)}_j\bigg|^{2L} \prec N^{|K|/2}\Phi^{2L+|K|} + N^{-\tau'(D+1)+2L} \le (\sqrt N\Phi)^{4L}\Phi^{2L} + N^{-\tau'(D+1)+2L}$$
where the last inequality applies $\sqrt N\Phi \ge 1$ and $|K| \le 4L$. For sufficiently large $D$, the second term is at most the first, so Lemma 3.5(b) again follows from Markov's inequality. □

Finally, we show Lemma 3.5(c). The following lemma is similar to Lemmas 3.11 and 3.12.

Lemma 3.13. Suppose the conditions of Lemma 3.4 hold. Fix any $L \ge 1$ and $D > 0$. Then uniformly over $z \in \mathbf{D}$, $S \subset [N]$ with $|S| \le L$, and $i \in S$, the following holds: Denote $S^{(i)} = S \setminus \{i\}$.
There exists a collection of monomials $\mathcal{M}_{i,S}$ such that the quantity $X^{(i)}_i Q^{(i)}_i$ can be expanded as
$$X^{(i)}_i Q^{(i)}_i = \sum_{q\in\mathcal{M}_{i,S}} q\Big(X^{(S)}_i,\, \{X^{(S)}_j\}_{j\in S^{(i)}},\, \{B^{(S)}_{ij}\}_{j\in S^{(i)}},\, \{B^{(S)}_{jk}\}_{j\ne k\in S^{(i)}},\, P^{(S)}_i,\, \{P^{(S)}_j\}_{j\in S^{(i)}},\, C^{(S)}\Big) + O_\prec(N^{-\tau'(D+1)}).$$
Each monomial $q \in \mathcal{M}_{i,S}$ is a product of $\pm 1$ and one or more of its inputs, allowing repetition. We have $q = O_\prec(\Phi)$ uniformly over $q \in \mathcal{M}_{i,S}$, and the number of monomials $|\mathcal{M}_{i,S}|$ is at most a constant depending only on $L, D$. Furthermore, for each $q \in \mathcal{M}_{i,S}$, letting $m^*_X, m_X, m^*_B, m_B$ denote the numbers of factors of the forms $X^{(S)}_i$, $\{X^{(S)}_j\}_{j\in S^{(i)}}$, $\{B^{(S)}_{ij}\}_{j\in S^{(i)}}$, and $\{B^{(S)}_{jk}\}_{j\ne k\in S^{(i)}}$ in $q$, we have:
(a) $m^*_X + m_X = 1$.
(b) $m^*_X + m^*_B$ is odd, and for each $j \in S^{(i)}$, the number of occurrences of $j$ as a lower index of the factors $\{X^{(S)}_j\}_{j\in S^{(i)}}$ and $\{B^{(S)}_{jk}\}_{j\ne k\in S^{(i)}}$ is even.
(c) The number of distinct indices of $S^{(i)}$ that appear as a lower index across all factors of $q$ is at most $\frac12 m_X + m_B + \frac12 m^*_B$.

Proof. The proof is similar to the proofs of Lemmas 3.11 and 3.12, where we instead apply the recursion (3.9) and the bound $|X^{(S)}_j| \prec \Phi$. We omit the details for brevity. □

Proof of Lemma 3.5(c). Fixing any $L \ge 1$ and $D > 0$, a high moment expansion gives
$$\mathbb{E}\bigg|\sum_{i=1}^N \bar v(i)\,X^{(i)}_i Q^{(i)}_i\bigg|^{2L} = \sum_{i_1,\ldots,i_{2L}=1}^N \prod_{l=1}^{L}\bar v(i_l)\prod_{l=L+1}^{2L} v(i_l)\,\mathbb{E}\Bigg[\prod_{l=1}^{L} X^{(i_l)}_{i_l}Q^{(i_l)}_{i_l}\prod_{l=L+1}^{2L}\overline{X^{(i_l)}_{i_l}Q^{(i_l)}_{i_l}}\Bigg] \le \sum_{i_1,\ldots,i_{2L}=1}^N \prod_{l=1}^{2L}|v(i_l)|\Bigg(\sum_{q_1\in\mathcal{M}_{i_1,S},\ldots,q_{2L}\in\mathcal{M}_{i_{2L},S}} \mathbb{E}|\mathbb{E}_S q| + O_\prec(N^{-\tau'(D+1)})\Bigg),$$
where $S = \{i_1,\ldots,i_{2L}\}$ is the set of distinct indices, $q_l \in \mathcal{M}_{i_l,S}$ are the monomials in the expansion of $X^{(i_l)}_{i_l}Q^{(i_l)}_{i_l}$ in Lemma 3.13, and $q = \prod_{l=1}^{L}q_l\prod_{l=L+1}^{2L}\bar q_l$.
Let $K \subseteq S$ be those indices appearing exactly once in $(i_1,\ldots,i_{2L})$. Then, applying $\sum_i|v(i)|\le\sqrt N$ and $\sum_i|v(i)|^p \le 1$ for each $p \ge 2$, this implies as in Lemma 3.4(b) that
$$\mathbb{E}\bigg|\sum_{i=1}^N \bar v(i)\,X^{(i)}_i Q^{(i)}_i\bigg|^{2L} \prec \max_{\substack{i_1,\ldots,i_{2L}\in[N]\\ q_1\in\mathcal{M}_{i_1,S},\ldots,q_{2L}\in\mathcal{M}_{i_{2L},S}}}\Big\{N^{|K|/2}\,\mathbb{E}|\mathbb{E}_S q|\Big\} + O_\prec(N^{-\tau'(D+1)+L}). \qquad (3.21)$$
For each $l\in[2L]$, let $m^*_X(l), m_X(l), m^*_B(l), m_B(l)$ denote the counts of Lemma 3.13 for $q_l$, and let $m^*_P(l)$ denote the total number of factors $P^{(S)}_{i_l}$ in $q_l$. Set
$$m^*_X = \sum_{l=1}^{2L} m^*_X(l),\qquad m_X = \sum_{l=1}^{2L} m_X(l),\qquad m^*_B = \sum_{l=1}^{2L} m^*_B(l),\qquad m_B = \sum_{l=1}^{2L} m_B(l),\qquad m^*_P = \sum_{l=1}^{2L} m^*_P(l),$$
and set also
$$E(l) = m^*_X(l) + \tfrac12\big(m_X(l) + m^*_B(l)\big) + m^*_P(l) - 1.$$
Note that Lemma 3.13(a) implies $E(l) \ge 0$ for each $l = 1,\ldots,2L$. Consider three cases for an index $i_l \in K$:
• $i_l$ appears as a lower index on exactly one factor of $q_l$ (which must be $X^{(S)}_{i_l}$ or $B^{(S)}_{i_l k}$ for some $k \ne i_l$), and it does not appear as a lower index on any factor of $\{q_{l'} : l'\ne l\}$. Then $\mathbb{E}_S q = 0$.
• $i_l$ appears as a lower index in some $\{q_{l'} : l'\ne l\}$. Lemma 3.13(c) ensures that the total number of such indices $i_l \in K$ is at most $\frac12 m_X + m_B + \frac12 m^*_B$.
• $i_l$ does not appear as a lower index on any $\{q_{l'} : l'\ne l\}$, but appears as a lower index on at least two factors of $q_l$. Then Lemma 3.13(a–b) ensures that either $m^*_P(l) \ge 1$ and $m^*_X(l)+\frac12(m_X(l)+m^*_B(l)) \ge 1$, or $m^*_P(l)=0$ and $m^*_X(l)+\frac12(m_X(l)+m^*_B(l)) \ge 2$, so $E(l)\ge 1$.
Thus the total number of such indices $i_l \in K$ is at most $\sum_{l=1}^{2L} E(l)$.
Combining these cases, either $\mathbb{E}_S q = 0$, or
$$\tfrac12 m_X + m_B + \tfrac12 m^*_B + \sum_{l=1}^{2L} E(l) = m^*_X + m_X + m^*_B + m_B + m^*_P - 2L \ge |K|.$$
Bounding $|X^{(S)}_i|, |B^{(S)}_{ij}|, |P^{(S)}_i| \prec \Phi$ gives $|\mathbb{E}_S q| \prec \Phi^{2L+|K|}$. Then
$$\mathbb{E}\bigg|\sum_{i=1}^N\bar v(i)\,X^{(i)}_i Q^{(i)}_i\bigg|^{2L} \prec N^{|K|/2}\Phi^{2L+|K|}+N^{-\tau'(D+1)+L} \le (\sqrt N\Phi)^{2L}\Phi^{2L}+N^{-\tau'(D+1)+L}$$
where the last inequality uses $\sqrt N\Phi \ge 1$ and $|K| \le 2L$. For large enough $D > 0$, the second term is at most the first, so Lemma 3.5(c) follows again by Markov's inequality. □

3.5. Proof of Lemma 3.6.

3.5.1. Weak cumulant tensor bound. We will use implicitly the observation that if Assumption 3 holds, then the condition (2.1) is valid also for complex inputs $s_1,\ldots,s_m\in\mathbb{C}^n$ and $T\in(\mathbb{C}^n)^{\otimes k-m}$, as it may be applied separately to the real and imaginary parts. In what follows, $\langle\cdot,\cdot\rangle$ represents the non-conjugate scalar product on $(\mathbb{C}^n)^{\otimes k}$. The condition in Assumption 3 pertains to settings where $m, k-m \in \{1,\ldots,k-1\}$. The following lemma clarifies that for all values of $m$, including $\{0,k\}$, Assumption 3 has the following weaker implication.

Lemma 3.14 (Weak cumulant tensor bound). Suppose Assumption 3 holds. Fix any $k \ge 3$, let $\|\cdot\|_{U_k}$ be as in Assumption 3, and for a scalar quantity $T \in (\mathbb{C}^n)^{\otimes 0}$ define $\|T\|_{U_k} = |T|$. Then for any $\varepsilon > 0$, there exists a constant $C \equiv C(\varepsilon,k) > 0$ such that for all $i\in[N]$, $0 \le m \le k$, $s_1,\ldots,s_m \in \mathbb{C}^n$, and $T \in (\mathbb{C}^n)^{\otimes k-m}$,
$$|\langle \kappa_k(g_i),\, s_1\otimes\cdots\otimes s_m\otimes T\rangle| \le C n^\varepsilon (\sqrt n)^{k-m}\,\|T\|_{U_k}\prod_{t=1}^m \|s_t\|_2.$$

Proof. For $1 \le m \le k-1$, the result is implied by Assumption 3. For $m = k$, the result also follows by identifying a single vector $s_t$ as $T$ in Assumption 3 and applying $\|s_t\|_{U_k} \le C_k\|s_t\|_2$. For $m = 0$, let $T \in (\mathbb{C}^n)^{\otimes k}$ be any $k$-th order tensor.
We can write
$$T = \sum_{\alpha_1,\ldots,\alpha_k} T[\alpha_1,\ldots,\alpha_k]\, e_{\alpha_1}\otimes\cdots\otimes e_{\alpha_k} = \sum_{\alpha_1} e_{\alpha_1}\otimes \underbrace{\sum_{\alpha_2,\ldots,\alpha_k} T[\alpha_1,\ldots,\alpha_k]\, e_{\alpha_2}\otimes\cdots\otimes e_{\alpha_k}}_{\tilde T(\alpha_1)\,\in\,(\mathbb{C}^n)^{\otimes k-1}}.$$
Fix $\varepsilon > 0$. By Assumption 3, for a constant $C\equiv C(\varepsilon,k)>0$, we have
$$|\langle\kappa_k(g_i),T\rangle| \le \sum_{\alpha_1}\big|\langle\kappa_k(g_i),\,e_{\alpha_1}\otimes\tilde T(\alpha_1)\rangle\big| \le n\cdot Cn^\varepsilon(\sqrt n)^{k-1-1}\max_{\alpha_1}\big(\|e_{\alpha_1}\|_2\cdot\|\tilde T(\alpha_1)\|_{U_k}\big) = Cn^\varepsilon(\sqrt n)^{k}\max_{\alpha_1}\|\tilde T(\alpha_1)\|_{U_k}.$$
The proof is completed by the definition of the $U_k$-norm,
$$\max_{\alpha_1}\|\tilde T(\alpha_1)\|_{U_k} = \max_{\alpha_1}\max_{u_1,\ldots,u_{k-1}\in\mathcal{U}_k}\big|\langle\tilde T(\alpha_1),\,u_1\otimes\cdots\otimes u_{k-1}\rangle\big| = \max_{\alpha_1}\max_{u_1,\ldots,u_{k-1}\in\mathcal{U}_k}|\langle T,\, e_{\alpha_1}\otimes u_1\otimes\cdots\otimes u_{k-1}\rangle| \le \|T\|_{U_k}. \qquad\square$$

3.5.2. Moment-cumulant expansion. For any random vector $g\in\mathbb{R}^n$ with finite moments of all orders, any integer $k\in\mathbb{N}$, and any partition $\pi$ of $[k]$, we define the tensor $\kappa_\pi(g)\in(\mathbb{R}^n)^{\otimes k}$ as
$$\kappa_\pi(g) = \bigotimes_{B\in\pi}\kappa_{|B|}(g),$$
where $\kappa_{|B|}(g)$ is the $|B|$-th order cumulant tensor of $g$, and each factor $\kappa_{|B|}(g)$ of the tensor product corresponds to the indices of $B\in\pi$. We have the following moment-cumulant decomposition:

Lemma 3.15. Suppose $g\in\mathbb{R}^n$ is a random vector with $\mathbb{E}g=0$, $\mathbb{E}gg^* = \Sigma$, and finite moments of all orders. Then for any integers $k,m\ge0$, we have
$$\mathbb{E}[(gg^*-\Sigma)^{\otimes k}\otimes g^{\otimes m}] = \sum_{\pi\in\dot{\mathcal{P}}_{k,m}}\kappa_\pi(g),$$
where $\dot{\mathcal{P}}_{k,m}$ is the set of all partitions of $[2k+m]$ that do not have any singleton block or any of the cardinality-2 blocks $\{1,2\},\{3,4\},\ldots,\{2k-1,2k\}$.

Proof. We prove by induction on $k$: For $k=0$, let $\mathcal{P}_m$ denote the set of all partitions of $[m]$. By the moment-cumulant relations, for any vectors $v_1,\ldots,v_m\in\mathbb{R}^n$,
$$\mathbb{E}\langle g^{\otimes m},\, v_1\otimes\cdots\otimes v_m\rangle = \sum_{\pi\in\mathcal{P}_m}\prod_{B\in\pi}\kappa_{|B|}(g^*v_i : i\in B),$$
hence $\mathbb{E}[g^{\otimes m}] = \sum_{\pi\in\mathcal{P}_m}\kappa_\pi(g)$.
If $\pi$ has a singleton block, then $\kappa_\pi(g) = 0$ since $\kappa_1(g) = \mathbb{E}[g] = 0$, and the result follows for $k = 0$.

Now suppose the claim holds for $k-1$ for all $m\ge0$. Let $Q:(\mathbb{R}^n)^{\otimes 2k+m}\to(\mathbb{R}^n)^{\otimes 2k+m}$ be the permutation for which $Q(T)(i_1,\ldots,i_{2k+m}) = T(i_1,\ldots,i_{2k-2},i_{2k+m-1},i_{2k+m},i_{2k-1},\ldots,i_{2k+m-2})$, i.e. rotating the last two coordinates to positions $2k-1$ and $2k$. Then
$$\mathbb{E}[(gg^*-\Sigma)^{\otimes k}\otimes g^{\otimes m}] = \mathbb{E}[(gg^*-\Sigma)^{\otimes k-1}\otimes g^{\otimes m+2}] - Q\big(\mathbb{E}[(gg^*-\Sigma)^{\otimes k-1}\otimes g^{\otimes m}]\otimes\Sigma\big) = \sum_{\pi\in\dot{\mathcal{P}}_{k-1,m+2}}\kappa_\pi(g) - \sum_{\pi\in\dot{\mathcal{P}}_{k-1,m}}Q\big(\kappa_\pi(g)\otimes\Sigma\big),$$
where the second equality holds by the induction hypothesis. The first sum is over partitions $\pi$ of $[2k+m]$ that do not contain singletons or the blocks $\{1,2\},\ldots,\{2k-3,2k-2\}$. The second sum may be understood as the sum of $\kappa_{\pi'}(g)$ over partitions $\pi'$ of $[2k+m]$ that do not contain singletons or the blocks $\{1,2\},\ldots,\{2k-3,2k-2\}$, but contain the block $\{2k-1,2k\}$. Hence their difference is exactly $\sum_{\pi\in\dot{\mathcal{P}}_{k,m}}\kappa_\pi(g)$, completing the induction. □

3.5.3. Tensor network representation. We again fix a deterministic unit vector $u\in\mathbb{C}^n$ and abbreviate $Y^{(S)}_i = Y^{(S)}_i[uu^*]$, $Z^{(S)}_{ijk} = Z^{(S)}_{ijk}[uu^*]$, $X^{(S)}_i = X^{(S)}_i[u]$. All subsequent $O_\prec(\cdot)$ bounds are implicitly uniform over all such vectors $u$. To prove Lemma 3.6, we shall now develop a tensor network language to express the high moment expansions of the quantities to be bounded. For any tensor $T\in(\mathbb{C}^n)^{\otimes k}$, we write $\deg(T)=k$ for the degree or order of the tensor. In particular, this means for vectors $x\in\mathbb{C}^n$ that $\deg(x)=1$, and for matrices $A\in\mathbb{C}^{n\times n}$ that $\deg(A)=2$.

Definition 3.16.
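As an informal numerical illustration of Lemma 3.15 (not part of the proof), consider the scalar case $n = 1$ with $g \sim \mathcal{N}(0,\sigma^2)$: every cumulant except $\kappa_2 = \sigma^2$ vanishes, so only pair partitions in $\dot{\mathcal{P}}_{k,m}$ contribute. The sketch below (function names are ours) checks the identity against a direct binomial expansion of the left-hand side:

```python
from math import comb

def pairings(elems):
    # All perfect matchings (pair partitions) of a list of even length.
    if not elems:
        yield []
        return
    a, rest = elems[0], elems[1:]
    for i, b in enumerate(rest):
        for rem in pairings(rest[:i] + rest[i + 1:]):
            yield [(a, b)] + rem

def lhs(k, m, s2):
    # E[(g^2 - s2)^k g^m] for g ~ N(0, s2), via E[g^p] = (p-1)!! s2^(p/2).
    def mom(p):
        if p % 2:
            return 0.0
        r = 1.0
        for t in range(1, p, 2):
            r *= t
        return r * s2 ** (p // 2)
    return sum(comb(k, j) * (-s2) ** (k - j) * mom(2 * j + m)
               for j in range(k + 1))

def rhs(k, m, s2):
    # Sum of kappa_pi over pi in P-dot_{k,m}: for Gaussian g only pair
    # partitions survive, and the blocks {1,2}, ..., {2k-1,2k} are forbidden
    # (written 0-indexed below as (0,1), ..., (2k-2,2k-1)).
    forbidden = {(2 * i, 2 * i + 1) for i in range(k)}
    total = 0.0
    for pp in pairings(list(range(2 * k + m))):
        if any(p in forbidden for p in pp):
            continue
        total += s2 ** len(pp)
    return total

for k, m in [(0, 4), (1, 2), (2, 0), (2, 2)]:
    assert abs(lhs(k, m, 1.7) - rhs(k, m, 1.7)) < 1e-9
```

For instance, for $(k,m)=(1,2)$ both sides equal $2\sigma^4$, matching $\mathbb{E}[(g^2-\sigma^2)g^2]=3\sigma^4-\sigma^4$: the three pairings of $[4]$ lose the forbidden block $\{1,2\}$, leaving two.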
For any integer $L\ge1$ and $S\subset[N]$ with $|S|\le L$, we say that $(G,f_G)$ is an $(S,L)$-valid tensor network if there exist constants $M,C\ge1$ depending only on $L$ such that the following holds: Let $\{v_\alpha\}_{\alpha=1}^n$ be the (random, real) eigenvectors of $R^{(S)}$, let $u\in\mathbb{C}^n$ be a fixed deterministic unit vector, and let $\mathcal{U} = \bigcup_{k=1}^M \mathcal{U}_k$ where $\{\mathcal{U}_k\}_{k\ge1}$ are the sets of Assumption 3. Then $G=(V_G,E_G)$ is an undirected multi-graph with no self-loops and $|V_G|<C$, and $f_G: V_G\to\mathcal{T}$ is a labeling function on the vertices $V_G$ taking values in a set of tensors $\mathcal{T} = \mathcal{L}\cup\mathcal{R}\cup\mathcal{E}\cup\mathcal{I}_m\cup\mathcal{I}_t$, where
$$\mathcal{L} := \{u,\bar u\},\qquad \mathcal{R} := \mathcal{R}_o\cup\mathcal{R}_n,\qquad \mathcal{R}_o := \bigg\{\frac{R^{(S)}u}{\sqrt N},\,\frac{\bar R^{(S)}\bar u}{\sqrt N}\bigg\},\qquad \mathcal{R}_n := \bigcup_{x\in\mathcal{U}}\bigg\{\frac{R^{(S)}x}{\sqrt N},\,\frac{\bar R^{(S)}x}{\sqrt N}\bigg\},$$
$$\mathcal{E} := \{v_1,\ldots,v_n\},\qquad \mathcal{I}_m := \bigg\{\frac{R^{(S)}}{N},\,\frac{\bar R^{(S)}}{N}\bigg\},\qquad \mathcal{I}_t := \bigcup_{k=2}^M\{\kappa_k(g_1),\ldots,\kappa_k(g_N)\}\quad(\text{where }\kappa_2(g_i)=\Sigma).$$
We say that $v\in V_G$ is a left leaf, a right leaf, a right original leaf, a right new leaf, an eigen-leaf, a matrix vertex, or a tensor vertex if $f_G(v)$ belongs to $\mathcal{L}$, $\mathcal{R}$, $\mathcal{R}_o$, $\mathcal{R}_n$, $\mathcal{E}$, $\mathcal{I}_m$, or $\mathcal{I}_t$ respectively. We denote the corresponding sets of vertices $V^l_G, V^r_G, V^{ro}_G, V^{rn}_G, V^e_G, V^m_G, V^t_G\subseteq V_G$. Moreover, we require that the following hold for $(G,f_G)$:
(a) For all $v\in V_G$, we have $\deg(v) = \deg(f_G(v))$, where $\deg(v)$ is the vertex degree of $v\in V_G$ and $\deg(f_G(v))$ is the degree/order of its label $f_G(v)$. In particular, $G$ has no vertices of degree 0, each vertex of $V^l_G\cup V^r_G\cup V^e_G$ has degree 1, and each vertex of $V^m_G$ has degree 2.
(b) $G$ is a bipartite multi-graph between $V^l_G\cup V^r_G\cup V^e_G\cup V^m_G$ and $V^t_G$, i.e., each edge of $E_G$ connects some $u\in V^l_G\cup V^r_G\cup V^e_G\cup V^m_G$ with some $v\in V^t_G$.
(c) If $u\in V^m_G$ and $v\in V^t_G$ have 2 edges between them, then $\deg(v)\ge3$.
We refer to such a vertex $u\in V^m_G$ as a type 1 matrix vertex, and we refer to each other matrix vertex $u\in V^m_G$ (connecting to two distinct vertices $v,v'\in V^t_G$) as a type 2 matrix vertex. We denote the sets of such vertices by $V^{m1}_G, V^{m2}_G\subseteq V^m_G$. For each tensor vertex $v\in V^t_G$, we will write $l(v), r(v), ro(v), rn(v), e(v), m(v), m1(v), m2(v)$ for the numbers of left leaves, right leaves, right original leaves, right new leaves, eigen-leaves, matrix vertices, type 1 matrix vertices, and type 2 matrix vertices adjacent to $v$. (Each adjacent type 1 matrix neighbor contributes a count of 1 to $m1(v)$, even though it has 2 edges connecting to $v$.)

The proof of Lemma 3.6(a) will consider networks having only labels $\mathcal{L},\mathcal{R}_o,\mathcal{R}_n,\mathcal{I}_m,\mathcal{I}_t$ (thus no eigen-leaves), while the proof of Lemma 3.6(b–c) will pertain to networks having only labels $\mathcal{E},\mathcal{R}_o,\mathcal{R}_n,\mathcal{I}_m,\mathcal{I}_t$ (thus no left leaves). For convenience, we state here several general results that pertain to networks which may have vertices of all types.

Definition 3.17. For an $(S,L)$-valid tensor network $(G,f_G)$, we define its contracted value to be
$$\mathrm{val}(G,f_G) := \sum_{\alpha\in[n]^{E_G}}\ \prod_{v\in V_G}[f_G(v)]_{\alpha(\partial v)}$$
where $\alpha(\partial v)$ denotes the multi-set of indices $\alpha_e$ associated with the edges $e$ incident to $v$. Note that this is well-defined since all tensors $f_G(v)\in\mathcal{T}$ are symmetric (under permutations of coordinates, without complex conjugation).

Remark 3.18. $\mathrm{val}(G,f_G)$ is multiplicative across disjoint connected components of the graph. That is, if $(G,f_G)$ is an $(S,L)$-valid tensor network such that $G$ splits into two connected components $G_1, G_2$, then both $(G_1,f_{G_1})$, $(G_2,f_{G_2})$ must be $(S,L)$-valid tensor networks, with $f_{G_i}$ defined as the restriction of $f_G$ to $G_i$, and $\mathrm{val}(G,f_G) = \mathrm{val}(G_1,f_{G_1})\times\mathrm{val}(G_2,f_{G_2})$.
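As an informal numerical aside (not part of the proof), the contracted value of Definition 3.17 is an Einstein-style contraction over edge indices, and the multiplicativity of Remark 3.18 can be checked on toy labels. The sketch below uses labels of our own choosing (a symmetric order-3 tensor in the role of a cumulant, a symmetric matrix, and a vector), not the paper's actual resolvent quantities:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Toy labels: a symmetric order-3 tensor (tensor vertex), a symmetric matrix
# (matrix vertex, both edges to the tensor vertex), and a vector (a leaf).
T3 = rng.standard_normal((n, n, n))
kappa = sum(T3.transpose(p) for p in
            [(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]) / 6
A = rng.standard_normal((n, n))
M = (A + A.T) / 2
u = rng.standard_normal(n)

# val(G, f_G): sum over index assignments on the edges of the product of
# tensor entries; edges a, b join kappa to M, edge c joins kappa to u.
val = np.einsum('abc,ab,c->', kappa, M, u)

# Brute-force check of Definition 3.17.
val_brute = sum(kappa[a, b, c] * M[a, b] * u[c]
                for a in range(n) for b in range(n) for c in range(n))
assert abs(val - val_brute) < 1e-8

# Remark 3.18: a disjoint union of two copies of this component has value val^2.
val2 = np.einsum('abc,ab,c,def,de,f->', kappa, M, u, kappa, M, u)
assert abs(val2 - val ** 2) < 1e-6
```

Note the matrix vertex here has both edges to the same tensor vertex, i.e. it is "type 1" in the terminology above.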
We will proceed to bound the value of an $(S,L)$-valid tensor network via operations that successively remove tensor vertices $v\in V^t_G$ with $\deg(v)\ge3$ from $G$. This procedure is formalized by the following definition.

Definition 3.19. For an $(S,L)$-valid tensor network $(G,f_G)$ and $v\in V^t_G$ with $\deg(v)\ge3$, we say that $(G',\mathcal{F}_{G'})$ is the family of networks generated by removing $v$ if $G'$ is constructed by removing $v$, its incident edges, and all of its adjacent leaves and adjacent type 1 matrix vertices, and $\mathcal{F}_{G'}$ is the set of all labelings $f_{G'}: V_{G'}\to\mathcal{T}$ satisfying the following:
• If $u\in V_G$ is not adjacent to $v$, then $f_{G'}(u) = f_G(u)$.
• If $u\in V^{m2}_G$ is adjacent to $v$, i.e. $u$ is a type 2 matrix neighbor of $v$ (and hence remains in $G'$), then $f_{G'}(u)\in\mathcal{R}_n$.
In words, each $(G',f_{G'})$ is constructed by removing $v$ and replacing each type 2 matrix neighbor of $v$ by a right new leaf, and $\mathcal{F}_{G'}$ comprises all possible labelings of these right new leaves. We note that this procedure of removing $v$ from $G$ does not change the degree of any vertices which remain in $G'$.

The following lemma gives a basic bound for $\mathrm{val}(G,f_G)$ via such a removal.

Lemma 3.20. Suppose the conditions of Lemma 3.6 hold. Let $(G,f_G)$ be an $(S,L)$-valid tensor network, let $v\in V^t_G$ be a tensor vertex satisfying $\deg(v)\ge3$, and let $(G',\mathcal{F}_{G'})$ be the family of networks generated by removing $v$. Then
$$|\mathrm{val}(G,f_G)| \prec \Phi^{r(v)}\times\max_{f_{G'}\in\mathcal{F}_{G'}}|\mathrm{val}(G',f_{G'})|$$
uniformly over $z\in\mathbf{D}$ and all $S\subseteq[N]$ with $|S|\le L$.

Proof. By Remark 3.18, we may consider without loss of generality the case where $G$ is a single connected component.
A represen tative form of ( G , f G ) is given b y the follo wing picture, where ANISOTROPIC LOCAL LA W F OR NON-SEP ARABLE SAMPLE CO V ARIANCE MA TRICES 37 v ∈ V t G has lab el κ k ( g i ), x denotes a deterministic vector in U , v α denotes an eigen v ector of R ( S ) , and T denotes the con traction of all tensors connecting to v via its neigh boring t yp e 2 matrix v ertices: κ k ( g i ) u u . . . l ( v ) v α v α . . . e ( v ) R ( S ) N R ( S ) N . . . T R ( S ) N R ( S ) N . . . m 1( v ) R ( S ) u √ N R ( S ) u √ N . . . ro ( v ) R ( S ) x √ N R ( S ) x √ N . . . rn ( v ) (Other forms of ( G , f G ) ma y replace certain copies of R ( S ) or u by their complex conjugates, ha ve differing vectors x ∈ U for the r n ( v ) neighbors that are right new lea ves, and/or hav e differing eigen v ectors v α for the e ( v ) neigh b ors that are eigen-lea ves. F or all suc h netw orks, v al( G , f G ) ma y b e b ounded similarly . Throughout this and the subsequent pro ofs, we will fo cus on a single represen tativ e example of v al( G , f G ) for notational conv enience.) 
Writing the spectral decomposition $R^{(S)} = \sum_\alpha(\lambda_\alpha - z)^{-1}v_\alpha^{\otimes2}$, $|\mathrm{val}(G,f_G)|$ for the above network may be bounded using Lemma 3.14 as
$$\Big|\Big\langle\kappa_k(g_i),\,(N^{-1}R^{(S)})^{\otimes m1(v)}\otimes v_\alpha^{\otimes e(v)}\otimes u^{\otimes l(v)}\otimes(N^{-1/2}R^{(S)}u)^{\otimes ro(v)}\otimes(N^{-1/2}R^{(S)}x)^{\otimes rn(v)}\otimes T\Big\rangle\Big|$$
$$\le N^{-m1(v)}\times\sum_{\alpha_1,\ldots,\alpha_{m1(v)}}\prod_{l=1}^{m1(v)}\frac{1}{|\lambda_{\alpha_l}-z|}\times\Bigg|\Bigg\langle\kappa_k(g_i),\,\bigotimes_{l=1}^{m1(v)}v_{\alpha_l}^{\otimes2}\otimes v_\alpha^{\otimes e(v)}\otimes u^{\otimes l(v)}\otimes(N^{-1/2}R^{(S)}u)^{\otimes ro(v)}\otimes(N^{-1/2}R^{(S)}x)^{\otimes rn(v)}\otimes T\Bigg\rangle\Bigg|$$
$$\le(N^{-1}\|R^{(S)}\|_1)^{m1(v)}\times\max_{\alpha_1,\ldots,\alpha_{m1(v)}}\Bigg|\Bigg\langle\kappa_k(g_i),\,\bigotimes_{l=1}^{m1(v)}v_{\alpha_l}^{\otimes2}\otimes v_\alpha^{\otimes e(v)}\otimes u^{\otimes l(v)}\otimes(N^{-1/2}R^{(S)}u)^{\otimes ro(v)}\otimes(N^{-1/2}R^{(S)}x)^{\otimes rn(v)}\otimes T\Bigg\rangle\Bigg|$$
$$\overset{(a)}{\prec}(N^{-1}\|R^{(S)}\|_1)^{m1(v)}\times(\sqrt N)^{\deg(T)}\times\|N^{-1/2}R^{(S)}u\|_2^{ro(v)}\times\|N^{-1/2}R^{(S)}x\|_2^{rn(v)}\times\|T\|_{U_k}$$
$$\overset{(b)}{\prec}\Phi^{r(v)}\times\|(\sqrt N)^{\deg(T)}T\|_{U_k}\overset{(c)}{\prec}\Phi^{r(v)}\times\max_{f_{G'}\in\mathcal{F}_{G'}}|\mathrm{val}(G',f_{G'})|.$$
Here (a) applies Lemma 3.14, (b) applies the bounds $N^{-1}\|R^{(S)}\|_1\prec1$, $N^{-1/2}\|R^{(S)}u\|_2\prec\Phi$, and $N^{-1/2}\|R^{(S)}x\|_2\prec\Phi\|x\|_2\prec\Phi$ uniformly over all vectors $x\in\mathcal{U}$, and (c) uses that $\|(\sqrt N)^{\deg(T)}T\|_{U_k}$ is the maximum of $|\mathrm{val}(G',f_{G'})|$ over a subset of the labelings $\mathcal{F}_{G'}$ in the network generated by removing $v$. □

For certain types of vertices $v\in V^t_G$, we will require a refinement of the above bound using Assumption 3 directly, instead of its weaker implication in Lemma 3.14.

Lemma 3.21. Suppose the conditions of Lemma 3.6 hold. Let $(G,f_G)$ be an $(S,L)$-valid tensor network, let $v\in V^t_G$ be a tensor vertex satisfying $\deg(v)\ge3$, and let $(G',\mathcal{F}_{G'})$ be the family of networks generated by removing $v$.
Then:
(a) If $m1(v)\ge1$, or if $l(v)+e(v)=1$ and $r(v)+m1(v)=0$, then
$$|\mathrm{val}(G,f_G)| \prec \frac{N^\delta}{\sqrt N}\times\Phi^{r(v)}\times\max_{f_{G'}\in\mathcal{F}_{G'}}|\mathrm{val}(G',f_{G'})|$$
uniformly over all $S\subset[N]$ with $|S|\le L$ and over $z\in\mathbf{D}$.
(b) If $r(v)\ge1$, then
$$|\mathrm{val}(G,f_G)| \prec \frac{N^\delta}{\sqrt N}\times\Phi^{r(v)-1}\times\max_{f_{G'}\in\mathcal{F}_{G'}}|\mathrm{val}(G',f_{G'})|$$
uniformly over all $S\subset[N]$ with $|S|\le L$ and over $z\in\mathbf{D}$.

Proof. We may again consider the case where $G$ is a single connected component. For (a), suppose first $m1(v)\ge1$ and the only vertices adjacent to $v$ are type 1 matrix vertices. This implies that in fact $m1(v)\ge2$, because $\deg(v)\ge3$. Then $v$ and these adjacent vertices form a connected component, and thus must be all of $G$ since $G$ is connected. In this case, $\mathrm{val}(G,f_G)$ takes the representative form $|\langle\kappa_k(g_i),(N^{-1}R^{(S)})^{\otimes m1(v)}\rangle|$, for which we have
$$\Big|\Big\langle\kappa_k(g_i),\,(N^{-1}R^{(S)})^{\otimes m1(v)}\Big\rangle\Big| \le N^{-(m1(v)-1)}\sum_{\alpha_1,\ldots,\alpha_{m1(v)-1}}\Bigg(\prod_{t=1}^{m1(v)-1}\frac{1}{|\lambda_{\alpha_t}-z|}\Bigg)\Bigg|\Bigg\langle\kappa_k(g_i),\,\bigotimes_{t=1}^{m1(v)-1}v_{\alpha_t}^{\otimes2}\otimes N^{-1}R^{(S)}\Bigg\rangle\Bigg|$$
$$\le(N^{-1}\|R^{(S)}\|_1)^{m1(v)-1}\max_{\alpha_1,\ldots,\alpha_{m1(v)-1}}\Bigg|\Bigg\langle\kappa_k(g_i),\,\bigotimes_{t=1}^{m1(v)-1}v_{\alpha_t}^{\otimes2}\otimes N^{-1}R^{(S)}\Bigg\rangle\Bigg| \prec(N^{-1}\|R^{(S)}\|_1)^{m1(v)-1}(\sqrt N)^{2-1}\|N^{-1}R^{(S)}\|_{U_k} \prec N^{-1/2}N^\delta,$$
by Assumption 3 and the bounds $N^{-1}\|R^{(S)}\|_1\prec1$ and $\|R^{(S)}\|_{U_k}\prec N^\delta$. This implies the lemma since $r(v)=0$ and $G'$ is empty.
If $m1(v)\ge1$ and the only vertices adjacent to $v$ are matrix vertices, of which at least one is type 2, then a representative form of $\mathrm{val}(G,f_G)$ is bounded using Assumption 3 similarly as
$$\Big|\Big\langle\kappa_k(g_i),\,(N^{-1}R^{(S)})^{\otimes m1(v)}\otimes T\Big\rangle\Big| \le N^{-m1(v)}\sum_{\alpha_1,\ldots,\alpha_{m1(v)}}\prod_{t=1}^{m1(v)}\frac{1}{|\lambda_{\alpha_t}-z|}\Bigg|\Bigg\langle\kappa_k(g_i),\,\bigotimes_{t=1}^{m1(v)}v_{\alpha_t}^{\otimes2}\otimes T\Bigg\rangle\Bigg|$$
$$\prec(N^{-1}\|R^{(S)}\|_1)^{m1(v)}(\sqrt N)^{\deg(T)-1}\|T\|_{U_k} \prec N^{-1/2}\times\max_{f_{G'}\in\mathcal{F}_{G'}}|\mathrm{val}(G',f_{G'})|.$$
This implies the lemma since $r(v)=0$ and $\delta\ge0$.

If $m1(v)\ge1$ and $v$ is adjacent to at least one leaf vertex, i.e. $l(v)+e(v)+r(v)\ge1$, then a representative form of $\mathrm{val}(G,f_G)$ is bounded similarly using Assumption 3 as
$$\Big|\Big\langle\kappa_k(g_i),\,u^{\otimes l(v)}\otimes v_\beta^{\otimes e(v)}\otimes(N^{-1/2}R^{(S)}u)^{\otimes ro(v)}\otimes(N^{-1/2}R^{(S)}x)^{\otimes rn(v)}\otimes(N^{-1}R^{(S)})^{\otimes m1(v)}\otimes T\Big\rangle\Big|$$
$$\le N^{-(m1(v)-1)}\sum_{\alpha_1,\ldots,\alpha_{m1(v)-1}}\Bigg(\prod_{t=1}^{m1(v)-1}\frac{1}{|\lambda_{\alpha_t}-z|}\Bigg)\Bigg|\Bigg\langle\kappa_k(g_i),\,u^{\otimes l(v)}\otimes v_\beta^{\otimes e(v)}\otimes(N^{-1/2}R^{(S)}u)^{\otimes ro(v)}\otimes(N^{-1/2}R^{(S)}x)^{\otimes rn(v)}\otimes\bigotimes_{t=1}^{m1(v)-1}v_{\alpha_t}^{\otimes2}\otimes(N^{-1}R^{(S)})\otimes T\Bigg\rangle\Bigg|$$
$$\prec(N^{-1}\|R^{(S)}\|_1)^{m1(v)-1}(\sqrt N)^{\deg(T)+2-1}\|N^{-1/2}R^{(S)}u\|_2^{ro(v)}\|N^{-1/2}R^{(S)}x\|_2^{rn(v)}\|N^{-1}R^{(S)}\|_{U_k}\|T\|_{U_k}$$
$$\prec N^{-1/2}N^\delta\times\Phi^{r(v)}\times\|(\sqrt N)^{\deg(T)}T\|_{U_k} \prec N^{-1/2}N^\delta\times\Phi^{r(v)}\times\max_{f_{G'}\in\mathcal{F}_{G'}}|\mathrm{val}(G',f_{G'})|.$$
This covers all cases where $m1(v)\ge1$. Next, suppose $l(v)+e(v)=1$ and $r(v)+m1(v)=0$. Then, since $\deg(v)\ge3$, $v$ is adjacent to at least one type 2 matrix vertex, so a representative form of $\mathrm{val}(G,f_G)$ is bounded as
$$\Big|\Big\langle\kappa_k(g_i),\,u^{\otimes l(v)}\otimes v_\beta^{\otimes e(v)}\otimes T\Big\rangle\Big| \prec(\sqrt N)^{\deg(T)-1}\|T\|_{U_k} \prec N^{-1/2}\times\max_{f_{G'}\in\mathcal{F}_{G'}}|\mathrm{val}(G',f_{G'})|.$$
This again implies the lemma since $r(v)=0$ and $\delta\ge0$. This shows part (a).

For (b), if $m1(v)\ge1$, then the statement is implied by (a) as $\Phi\le1$. Suppose then that $m1(v)=0$. If $v$ is adjacent to at least one type 2 matrix vertex, then a representative form of $\mathrm{val}(G,f_G)$ is bounded as
$$\Big|\Big\langle\kappa_k(g_i),\,u^{\otimes l(v)}\otimes v_\beta^{\otimes e(v)}\otimes(N^{-1/2}R^{(S)}u)^{\otimes ro(v)}\otimes(N^{-1/2}R^{(S)}x)^{\otimes rn(v)}\otimes T\Big\rangle\Big| \prec(\sqrt N)^{\deg(T)-1}\Phi^{r(v)}\|T\|_{U_k} \prec N^{-1/2}\times\Phi^{r(v)}\times\max_{f_{G'}\in\mathcal{F}_{G'}}|\mathrm{val}(G',f_{G'})|,$$
which again implies the lemma. Finally, if $v$ is not adjacent to any matrix vertex, then all neighbors of $v$ have degree 1, and $v$ and its neighbors constitute all of $G$, which is a star. Isolating one factor of $R^{(S)}u$ or $R^{(S)}x$ as $T$ in Assumption 3, a representative form of $\mathrm{val}(G,f_G)$ is bounded as
$$\Big|\Big\langle\kappa_k(g_i),\,u^{\otimes l(v)}\otimes v_\beta^{\otimes e(v)}\otimes(N^{-1/2}R^{(S)}u)^{\otimes ro(v)}\otimes(N^{-1/2}R^{(S)}x)^{\otimes rn(v)}\Big\rangle\Big| \prec\max\{N^{-1/2}\|R^{(S)}x\|_{U_k},\,N^{-1/2}\|R^{(S)}u\|_{U_k}\}\times\max\{N^{-1/2}\|R^{(S)}u\|_2,\,N^{-1/2}\|R^{(S)}x\|_2\}^{r(v)-1} \prec N^{-1/2}N^\delta\times\Phi^{r(v)-1}.$$
This implies the lemma since $G'$ is empty, completing the proof. □

3.5.4. Proof of Lemma 3.6(a). We now focus attention on networks with left leaves and no eigen-leaves, and show Lemma 3.6(a). In addition to Definition 3.16, we define the following.

Definition 3.22. Let $(G,f_G)$ be an $(S,L)$-valid tensor network that contains no eigen-leaves. We say that a left leaf $u\in V^l_G$ is singular if its adjacent tensor vertex $v\in V^t_G$ is not adjacent to any left leaves other than $u$, and we call $v$ a singular tensor vertex. We write $V^{sl}_G, V^{st}_G$ for the sets of such vertices, and $t(u)=v$ for the one-to-one correspondence between these two sets.

The following lemma now expresses products of monomials arising from Lemma 3.11 via tensor network values.
Lemma 3.23. Fix any integer $L\ge1$. For any $i_1,\ldots,i_{2L}\in[N]$, denote by $S=\{i_1,\ldots,i_{2L}\}$ the set of distinct indices. Let $q_1,\ldots,q_{2L}$ be a sequence of monomials satisfying the conditions of Lemma 3.11, where each $q_l$ is a monomial from the expansion of $Y^{(i_l)}_{i_l}Q^{(i_l)}_{i_l}$, and let $q=\prod_{l=1}^{L}q_l\prod_{l=L+1}^{2L}\bar q_l$. Then there exist a constant $C\equiv C(L)>0$, an exponent $m\in[0,C]$, and a set $\mathcal{N}$ of $(S,2L)$-valid tensor networks with $|\mathcal{N}|\le C$, such that
$$|\mathbb{E}_S q| = |C^{(S)}|^m\cdot\sum_{(G,f_G)\in\mathcal{N}}|\mathrm{val}(G,f_G)|.$$
Moreover, for all $(G,f_G)\in\mathcal{N}$, there exist disjoint subsets of singular left leaves $U^1_G, U^2_G\subseteq V^{sl}_G$ such that the following holds:
(a) We have $|V^l_G|=|V^r_G|=|V^{ro}_G|=2L$ and $|V^{rn}_G|=|V^e_G|=0$. (I.e., there are no right new leaves or eigen-leaves, and exactly $2L$ left leaves and $2L$ right original leaves.)
(b) Let $K\subseteq S$ be the set of indices that appear exactly once in the list $(i_1,\ldots,i_{2L})$. Then $|K|=|U^1_G|+|U^2_G|$.
(c) We have $|V^{m2}_G|\ge|U^1_G|$.
(d) For all $u\in U^2_G$, we have $\deg(t(u))\ge3$ and $m1(t(u))\ge1$.

Proof. For each distinct index $i\in S$, collecting all appearances of $g_i$ and $(g_ig_i^*-\Sigma)$ across $q_1,\ldots,q_{2L}$ into a single tensor, and then evaluating the expectation $\mathbb{E}_S$ over the independent vectors $\{g_i\}_{i\in S}$, it is clear from the forms of $Y^{(S)}_i, Z^{(S)}_{ijk}, B^{(S)}_{jk}, P^{(S)}_j, C^{(S)}$ that $\mathbb{E}_S q$ may be expressed as a product of factors $C^{(S)}, \bar C^{(S)}$ and contractions of tensors of the form
$$\big\{\mathbb{E}[(g_ig_i^*-\Sigma)^{\otimes k}\otimes g_i^{\otimes m}] : i\in S\text{ and }k,m\in[0,C(L)]\big\}$$
with matrices and vectors of the forms
$$\mathcal{L}:=\{u,\bar u\},\qquad\mathcal{R}_o:=\bigg\{\frac{R^{(S)}u}{\sqrt N},\,\frac{\bar R^{(S)}\bar u}{\sqrt N}\bigg\},\qquad\mathcal{I}_m:=\bigg\{\frac{R^{(S)}}{N},\,\frac{\bar R^{(S)}}{N}\bigg\}.$$
For each distinct index $i\in S$, we may then decompose the corresponding moment tensor using Lemma 3.15,
$$\mathbb{E}[(g_ig_i^*-\Sigma)^{\otimes k}\otimes g_i^{\otimes m}] = \sum_{\pi\in\dot{\mathcal{P}}_{k,m}}\kappa_\pi(g_i).$$
Then expanding by multi-linearity, we claim that $\mathbb{E}_S q$ is a product of factors $C^{(S)}, \bar C^{(S)}$ with $\sum_{(G,f_G)\in\mathcal{N}}|\mathrm{val}(G,f_G)|$ over a family $\mathcal{N}$ of $(S,2L)$-valid tensor networks, where $|\mathcal{N}|\le C(L)$.

To see this claim, consider as an illustrative example $q_1 = Z^{(ij)}_{ijj}C^{(ij)}$, $q_2 = Z^{(ij)}_{jii}C^{(ij)}$, $q = q_1\bar q_2$ for two distinct indices $S=\{i,j\}$. Then, defining a permutation $Q:(\mathbb{C}^n)^{\otimes8}\to(\mathbb{C}^n)^{\otimes8}$ such that $Q(T)(i_1,\ldots,i_8)=T(i_1,i_2,i_7,i_8,i_5,i_6,i_3,i_4)$, we have
$$\mathbb{E}_{\{ij\}}[q] = |C^{(ij)}|^2\,\mathbb{E}_{\{ij\}}\Big[u^*(g_ig_i^*-\Sigma)(N^{-1}R^{(ij)})g_jg_j^*(N^{-1/2}R^{(ij)}u)\times\overline{u^*(g_jg_j^*-\Sigma)(N^{-1}R^{(ij)})g_ig_i^*(N^{-1/2}R^{(ij)}u)}\Big]$$
$$=|C^{(ij)}|^2\Big\langle\mathbb{E}\big[(g_ig_i^*-\Sigma)\otimes g_i^{\otimes2}\big]\otimes\mathbb{E}\big[(g_jg_j^*-\Sigma)\otimes g_j^{\otimes2}\big],\ Q\big(\bar u\otimes N^{-1}R^{(ij)}\otimes N^{-1/2}R^{(ij)}u\otimes u\otimes N^{-1}\bar R^{(ij)}\otimes N^{-1/2}\bar R^{(ij)}\bar u\big)\Big\rangle$$
$$=\sum_{\pi_1\in\dot{\mathcal{P}}_{1,2}}\sum_{\pi_2\in\dot{\mathcal{P}}_{1,2}}|C^{(ij)}|^2\Big\langle\kappa_{\pi_1}(g_i)\otimes\kappa_{\pi_2}(g_j),\ Q\big(\bar u\otimes N^{-1}R^{(ij)}\otimes N^{-1/2}R^{(ij)}u\otimes u\otimes N^{-1}\bar R^{(ij)}\otimes N^{-1/2}\bar R^{(ij)}\bar u\big)\Big\rangle.$$
The summand for each pair of partitions $(\pi_1,\pi_2)$ corresponds to $|C^{(ij)}|^2$ times $\mathrm{val}(G,f_G)$ for a (possibly disconnected) $(S,2L)$-valid tensor network.
For example, the partitions $\pi_1 = \{\{1,3\},\{2,4\}\}$ and $\pi_2 = \{\{1,2,3,4\}\}$ correspond to the network

[Tensor network diagram: a $\kappa_4(g_j)$ tensor vertex and two $\kappa_2(g_i)$ tensor vertices, contracted through matrix vertices labeled $N^{-1}R^{(ij)}$, with leaves labeled $u$ and $N^{-1/2}R^{(ij)}u$.]

while the partitions $\pi_1 = \{\{1,4\},\{2,3\}\}$ and $\pi_2 = \{\{1,4\},\{2,3\}\}$ correspond to the network

[Tensor network diagram: two components, each a chain joining $\kappa_2(g_i)$ and $\kappa_2(g_j)$ tensor vertices through matrix vertices labeled $N^{-1}R^{(ij)}$, with endpoint leaves labeled $u$ and $N^{-1/2}R^{(ij)}u$.]

More generally, any such monomial $q$ may be decomposed in this way, where it is clear by construction that each resulting $(\mathcal{G}, f_\mathcal{G})$ satisfies conditions (a) and (b) of Definition 3.16, that $\mathcal{G} = (V_\mathcal{G}, E_\mathcal{G})$ is bipartite, and that $\deg(v) = \deg(f_\mathcal{G}(v))$ for all $v \in V_\mathcal{G}$. It satisfies condition (c) by the following reasoning: if $u \in V^{\mathrm{m}}_\mathcal{G}$, then its label $f_\mathcal{G}(u)$ must be a copy of $N^{-1} R^{(S)}$ or $N^{-1} \overline{R^{(S)}}$ arising from a term $Z^{(S)}_{ijk}$, $B^{(S)}_{jk}$, or $P^{(S)}_j$. For $Z^{(S)}_{ijk}$, this vertex $u$ must be type 2 by the constraint $i \neq j$ in $Z^{(S)}_{ijk}$, and the same holds for $B^{(S)}_{jk}$ by the constraint $j \neq k$. Thus $u \in V^{\mathrm{m}}_\mathcal{G}$ can be type 1 (i.e., both of its edges connect to a single neighbor $v \in V^{\mathrm{t}}_\mathcal{G}$) only if $f_\mathcal{G}(u)$ arises from a term $P^{(S)}_j$. In this case, its neighbor $v \in V^{\mathrm{t}}_\mathcal{G}$ cannot have $\deg(v) = 2$, because the partitions belonging to $\dot{\mathcal{P}}_{k,m}$ cannot have a cardinality-2 block that represents a pairing of the two copies of $g_j$ in $P^{(S)}_j$. Thus $\deg(v) \geq 3$, verifying condition (c). This shows the first claim of the lemma, that
\[
|\mathbb{E}_S\, q| = |\mathcal{C}^{(S)}|^m \cdot \sum_{(\mathcal{G}, f_\mathcal{G}) \in \mathcal{N}} |\mathrm{val}(\mathcal{G}, f_\mathcal{G})|
\]
for some $m \in [0, C(L)]$ and family $\mathcal{N}$ of $(S, 2L)$-valid tensor networks with $|\mathcal{N}| \leq C(L)$.
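As an illustrative numerical aside (not part of the proof): when $g_i$ is Gaussian, all cumulants of order other than two vanish, so the moment-cumulant decomposition above collapses to the Wick formula, which can be checked by Monte Carlo with an arbitrary covariance:

```python
import numpy as np

# Monte Carlo illustration of the moment-cumulant decomposition: for a
# Gaussian vector g ~ N(0, S), only second-order cumulants survive, so
#     E[g_a g_b g_c g_d] = S_ab S_cd + S_ac S_bd + S_ad S_bc  (Wick formula).
# The covariance S below is an arbitrary small PSD example.
rng = np.random.default_rng(0)
S = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
g = rng.multivariate_normal(np.zeros(3), S, size=500_000)

emp = np.einsum("na,nb,nc,nd->abcd", g, g, g, g) / len(g)  # empirical 4th moment
wick = (np.einsum("ab,cd->abcd", S, S)
        + np.einsum("ac,bd->abcd", S, S)
        + np.einsum("ad,bc->abcd", S, S))
err = np.abs(emp - wick).max()
assert err < 0.1  # Monte Carlo error is O(1/sqrt(#samples))
print("Wick formula verified to tolerance 0.1")
```

For non-Gaussian $g_i$, the higher-order cumulant tensors $\kappa_\pi(g_i)$ contribute additional terms, which is exactly what the tensor network framework organizes.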
We now prove the remaining claims of the lemma. Since each $q_l$ contains exactly one factor $Y^{(S)}_{i_l}$ or $Z^{(S)}_{i_l jk}$, there are exactly $2L$ appearances of $u$ and $2L$ appearances of $N^{-1/2} R^{(S)} u$ in $q$. Thus $|V^{\mathrm{l}}_\mathcal{G}| = |V^{\mathrm{r}}_\mathcal{G}| = 2L$. There are no right new leaves or eigen-leaves in the above construction, so $|V^{\mathrm{rn}}_\mathcal{G}| = |V^{\mathrm{e}}_\mathcal{G}| = 0$, and hence also $|V^{\mathrm{r}}_\mathcal{G}| = |V^{\mathrm{ro}}_\mathcal{G}|$. This proves (a).

Next, let $K \subseteq S$ be the subset of distinct indices which appear exactly once in $(i_1, \ldots, i_{2L})$. For each $i_l \in K$, there is exactly one factor of the form $Y^{(S)}_{i_l}$ or $Z^{(S)}_{i_l jk}$ in $q$. It follows that across all tensor vertices labeled by a cumulant of $g_{i_l}$, there is exactly one adjacent left leaf. This left leaf for each $i_l \in K$ must then be singular. Let $U_\mathcal{G} \subseteq V^{\mathrm{sl}}_\mathcal{G}$ be the subset of all such singular left leaves corresponding to the indices $i_l \in K$, so $|U_\mathcal{G}| = |K|$. We will partition $U_\mathcal{G}$ into disjoint subsets $U^1_\mathcal{G}, U^2_\mathcal{G}$, so that $|K| = |U^1_\mathcal{G}| + |U^2_\mathcal{G}|$, showing (b).

For each index $i_l \in K$, let $u(i_l) \in U_\mathcal{G}$ be the corresponding singular left leaf identified above, and let $v(i_l) = t(u(i_l)) \in V^{\mathrm{t}}_\mathcal{G}$ be its tensor vertex neighbor. We define $U^1_\mathcal{G}$ to be the set of $u(i_l) \in U_\mathcal{G}$ satisfying either:
(1) $i_l$ appears as a lower index in some $\{q_{l'} : l' \neq l\}$, or
(2) $i_l$ does not appear as a lower index in any $\{q_{l'} : l' \neq l\}$, and either $\deg(v(i_l)) = 2$ or $\mathrm{ro}(v(i_l)) + \mathrm{m2}(v(i_l)) \geq 2$.

To bound $|U^1_\mathcal{G}|$, let $m_Z(l)$, $m^*_B(l)$, $m_B(l)$ be the counts of Lemma 3.11 for each $q_l$, and let $m^*_P(l)$ be the number of factors $P^{(S)}_{i_l}$ in $q_l$ for which the copy of $N^{-1} R^{(S)}$ in $P^{(S)}_{i_l}$ labels a type 2 matrix vertex. (Note that this is different from the proof of Lemma 3.5, where $m^*_P(l)$ counted all factors of $P^{(S)}_{i_l}$ in $q_l$.)
Set
\[
m_Z = \sum_{l=1}^{2L} m_Z(l), \qquad m^*_B = \sum_{l=1}^{2L} m^*_B(l), \qquad m_B = \sum_{l=1}^{2L} m_B(l), \qquad m^*_P = \sum_{l=1}^{2L} m^*_P(l).
\]
Lemma 3.11 ensures that the number of vertices $u(i_l) \in U_\mathcal{G}$ satisfying condition (1) is at most $m_Z + m_B + \tfrac12 m^*_B$.

For condition (2), if $\deg(v(i_l)) = 2$, let $u'$ be the vertex other than $u(i_l)$ which is incident to $v(i_l)$. Then $u'$ is not a left leaf, because $u(i_l)$ is singular. If $u'$ were a right leaf, then its label $f_\mathcal{G}(u')$ would have to be the copy of $N^{-1/2} R^{(S)} u$ multiplying $g^*_{i_l}$ in some factor $Z^{(S)}_{ijk}$ of $q$ with $k = i_l$, or multiplying $(g_{i_l} g_{i_l}^* - \Sigma)$ in the factor $Y^{(S)}_{i_l}$ of $q_l$. The former is not possible because $i_l$ does not appear as a lower index of any $\{q_{l'} : l' \neq l\}$, and the latter is not possible because the partitions belonging to $\dot{\mathcal{P}}_{k,m}$ cannot pair the two copies of $g_{i_l}$ in $Y^{(S)}_{i_l}$. Thus $u'$ is also not a right leaf, so it must be a type 2 matrix vertex. Since $i_l$ does not appear as a lower index of any $\{q_{l'} : l' \neq l\}$, and since $\dot{\mathcal{P}}_{k,m}$ cannot pair the two copies of $g_{i_l}$ in either $Z^{(S)}_{i_l}$ or $P^{(S)}_{i_l}$, the label $f_\mathcal{G}(u')$ must be the matrix $N^{-1} R^{(S)}$ arising from a factor $B^{(S)}_{i_l j}$ of $q_l$. Since $m^*_B(l)$ is even by Lemma 3.11, this means $\tfrac12 m^*_B(l) \geq 1$.

By similar reasoning, if $\deg(v(i_l)) \geq 3$ and $\mathrm{ro}(v(i_l)) + \mathrm{m2}(v(i_l)) \geq 2$, then the factor $Y^{(S)}_{i_l}$ or $Z^{(S)}_{i_l}$ of $q$ can contribute at most one right original leaf or type 2 matrix vertex neighbor for $v(i_l)$. Thus, since $\mathrm{ro}(v(i_l)) + \mathrm{m2}(v(i_l)) \geq 2$, $v(i_l)$ has some other neighboring type 2 matrix vertex, which must arise from a factor $P^{(S)}_{i_l}$ or $B^{(S)}_{i_l j}$ of $q_l$. Since $m^*_B(l)$ is even, this means $\tfrac12 m^*_B(l) + m^*_P(l) \geq 1$. Thus the number of vertices $u \in U_\mathcal{G}$ satisfying condition (2) is at most $\tfrac12 m^*_B + m^*_P$.
Putting this together,
\[
|U^1_\mathcal{G}| \leq \underbrace{m_Z + m_B + \tfrac12 m^*_B}_{\text{condition (1)}} + \underbrace{\tfrac12 m^*_B + m^*_P}_{\text{condition (2)}} = m_Z + m_B + m^*_B + m^*_P \leq |V^{\mathrm{m2}}_\mathcal{G}|,
\]
the last inequality holding because each factor counted by $m_Z, m_B, m^*_B, m^*_P$ has a copy of $N^{-1} R^{(S)}$ labeling a different type 2 matrix vertex. This shows (c).

Finally, by construction, all remaining vertices $u(i_l) \in U^2_\mathcal{G} = U_\mathcal{G} \setminus U^1_\mathcal{G}$ satisfy $\deg(v(i_l)) \geq 3$ and $\mathrm{ro}(v(i_l)) + \mathrm{m2}(v(i_l)) \leq 1$. Also $\mathrm{rn}(v(i_l)) = \mathrm{e}(v(i_l)) = 0$, and $\mathrm{l}(v(i_l)) = 1$ since $u(i_l)$ is singular. Then the condition $\deg(v(i_l)) \geq 3$ requires $\mathrm{m1}(v(i_l)) \geq 1$. This shows (d) and completes the proof. □

After removing all tensor vertices $v \in V^{\mathrm{t}}_\mathcal{G}$ with $\deg(v) \geq 3$, the following lemma will be used to bound the value of a tensor network with only chains and cycles.

Lemma 3.24. Suppose the conditions of Lemma 3.6 hold. Let $(\mathcal{G}, f_\mathcal{G})$ be an $(S, L)$-valid tensor network that contains no eigen-leaves, and such that $\deg(v) = 2$ for all $v \in V^{\mathrm{t}}_\mathcal{G}$. (This implies that $\mathcal{G}$ is a simple graph with no type 1 matrix vertices and only chains and cycles.) Then for any subset (possibly empty) of singular left leaves $U_\mathcal{G} \subseteq V^{\mathrm{sl}}_\mathcal{G}$, uniformly over all $|S| \leq L$ and $z \in \mathbf{D}$,
\[
|\mathrm{val}(\mathcal{G}, f_\mathcal{G})| \prec \Big(\frac{N^\delta}{\sqrt N}\Big)^{|U_\mathcal{G}|} \times \Phi^{|V^{\mathrm{r}}_\mathcal{G}| + |V^{\mathrm{m2}}_\mathcal{G}| - |U_\mathcal{G}|}.
\]

Proof. By Remark 3.18, we may consider without loss of generality the case where $\mathcal{G}$ is a single connected component. We will use repeatedly that under the conditions of Lemma 3.6, we have also
\[
N^{-1} \|R^{(S)}\|_F \leq \max_\alpha N^{-1/2} \|R^{(S)} e_\alpha\|_2 \prec \Phi.
\]
Since $\deg(v) = 2$ for each $v \in V^{\mathrm{t}}_\mathcal{G}$, all tensor vertices have label $\Sigma$. If $\mathcal{G}$ is a chain, we write $u_1, u_2$ for the two endpoints of the chain. We then consider all possible cases for $\mathcal{G}$:

(1) Suppose $\mathcal{G}$ is a cycle.
Then a representative form of $(\mathcal{G}, f_\mathcal{G})$ is

[Tensor network diagram: a cycle alternating between tensor vertices labeled $\Sigma$ and matrix vertices labeled $N^{-1}R^{(S)}$.]

The number of vertices with label $N^{-1} R^{(S)}$ in the cycle is $|V^{\mathrm{m2}}_\mathcal{G}|$, which is at least two since these vertices are type 2. Thus the value is bounded as
\[
\mathrm{Tr}\big((N^{-1} R^{(S)} \Sigma)^{|V^{\mathrm{m2}}_\mathcal{G}|}\big) \leq \|\Sigma\|_{\mathrm{op}}^{|V^{\mathrm{m2}}_\mathcal{G}|} \times \big(N^{-1} \|R^{(S)}\|_F\big)^{|V^{\mathrm{m2}}_\mathcal{G}|} \prec \Phi^{|V^{\mathrm{m2}}_\mathcal{G}|}.
\]
This implies the lemma, since $|U_\mathcal{G}| = |V^{\mathrm{r}}_\mathcal{G}| = 0$.

(2) Suppose $\mathcal{G}$ is a chain, with endpoints $u_1, u_2 \notin U_\mathcal{G}$. These endpoints may be left leaves, right original leaves, or right new leaves, leading to the following 6 types of representative networks $(\mathcal{G}, f_\mathcal{G})$:

[Tensor network diagrams: six chains of $\Sigma$-labeled tensor vertices and $N^{-1}R^{(S)}$-labeled matrix vertices, whose two endpoint leaves carry each unordered pair of labels from $u$, $N^{-1/2}R^{(S)}x$, and $N^{-1/2}R^{(S)}u$.]

In each case, we may apply $\|u\|_2 = 1$, $\|\Sigma\|_{\mathrm{op}} \leq C$, $\|N^{-1} R^{(S)}\|_F \prec \Phi$, $\|N^{-1/2} R^{(S)} u\|_2 \prec \Phi$, and $\|N^{-1/2} R^{(S)} x\|_2 \prec \Phi$ to get
\[
|\mathrm{val}(\mathcal{G}, f_\mathcal{G})| \prec \Phi^{|V^{\mathrm{r}}_\mathcal{G}| + |V^{\mathrm{m2}}_\mathcal{G}|}.
\]
We clarify that in particular, if $|V^{\mathrm{r}}_\mathcal{G}| + |V^{\mathrm{m2}}_\mathcal{G}| = 0$, this is the bound $|u^* \Sigma u| \prec 1$. Here, since $|U_\mathcal{G}| = 0$, the lemma also holds.

(3) Suppose $\mathcal{G}$ is a chain and exactly one of $u_1, u_2$, say $u_1$, belongs to $U_\mathcal{G}$. Then $u_1$ is singular, so its adjacent tensor vertex $t(u_1)$ cannot be adjacent to any other left leaves, meaning the other neighbor of $t(u_1)$ is a matrix vertex or a right leaf. If $|V^{\mathrm{r}}_\mathcal{G}| + |V^{\mathrm{m2}}_\mathcal{G}| = 1$, then we have the cases

[Tensor network diagrams: the chains with endpoint labels $u$ and $N^{-1/2}R^{(S)}u$, with endpoint labels $u$ and $N^{-1/2}R^{(S)}x$, and the chain $u$, $\Sigma$, $N^{-1}R^{(S)}$, $\Sigma$, $u$.]

For these cases, we apply the bounds
\[
\frac{1}{\sqrt N} |u^* \Sigma R^{(S)} u| \prec \frac{N^\delta}{\sqrt N}, \qquad \frac{1}{\sqrt N} |u^* \Sigma R^{(S)} x| \prec \frac{N^\delta}{\sqrt N}, \qquad \frac{1}{N} |u^* \Sigma R^{(S)} \Sigma u| \prec \frac{N^\delta}{N},
\]
which imply the lemma since $|U_\mathcal{G}| = 1$ and $|V^{\mathrm{r}}_\mathcal{G}| + |V^{\mathrm{m2}}_\mathcal{G}| = 1$.
If $|V^{\mathrm{r}}_\mathcal{G}| + |V^{\mathrm{m2}}_\mathcal{G}| \geq 2$, then $t(u_1)$ must be adjacent to a matrix vertex, and we are left with the following possibilities:

[Tensor network diagrams: chains beginning with $u$, $\Sigma$, $N^{-1}R^{(S)}$ and ending with $N^{-1}R^{(S)}$, $\Sigma$, $u$, or with $\Sigma$, $N^{-1/2}R^{(S)}u$, or with $\Sigma$, $N^{-1/2}R^{(S)}x$.]

In these cases, we may contract the labels $u$ and $\Sigma$ on $u_1$ and $t(u_1)$ with the first $N^{-1} R^{(S)}$, to get
\[
|\mathrm{val}(\mathcal{G}, f_\mathcal{G})| \prec \frac{1}{\sqrt N} \Phi^{|V^{\mathrm{r}}_\mathcal{G}| + |V^{\mathrm{m2}}_\mathcal{G}|}.
\]
For example,
\[
\big|u^* \Sigma (N^{-1} R^{(S)} \Sigma)^{|V^{\mathrm{m2}}_\mathcal{G}|} (N^{-1/2} R^{(S)} u)\big| \prec \frac{1}{\sqrt N} \|N^{-1/2} u^* \Sigma R^{(S)}\|_2 \times \|N^{-1} R^{(S)}\|_F^{|V^{\mathrm{m2}}_\mathcal{G}| - 1} \times \|N^{-1/2} R^{(S)} u\|_2 \prec \frac{1}{\sqrt N} \Phi^{|V^{\mathrm{r}}_\mathcal{G}| + |V^{\mathrm{m2}}_\mathcal{G}|}.
\]
This is stronger than the lemma, because $|U_\mathcal{G}| = 1$, $\Phi \leq 1$, and $\delta \geq 0$.

(4) Finally, suppose both endpoints $u_1, u_2 \in U_\mathcal{G}$. Then $|V^{\mathrm{r}}_\mathcal{G}| = 0$. If $|V^{\mathrm{m2}}_\mathcal{G}| = 1$, then a representative case is

[Tensor network diagram: the chain $u$, $\Sigma$, $N^{-1}R^{(S)}$, $\Sigma$, $u$ with singular left leaves $u_1, u_2$ at both ends.]

We apply the bound
\[
\frac{1}{N} |u^* \Sigma R^{(S)} \Sigma u| \prec \frac{N^\delta}{N} \prec \Big(\frac{N^\delta}{\sqrt N}\Big)^2,
\]
which implies the lemma since $|U_\mathcal{G}| = 2$ and $\Phi \leq 1$. If $|V^{\mathrm{m2}}_\mathcal{G}| \geq 2$, then since $u_1, u_2$ are singular left leaves, their adjacent tensor vertices $t(u_1), t(u_2)$ must be adjacent to matrix vertices. A representative case is

[Tensor network diagram: the chain $u$, $\Sigma$, $N^{-1}R^{(S)}$, $\ldots$, $N^{-1}R^{(S)}$, $\Sigma$, $u$.]

for which we may bound
\[
\big|u^* \Sigma (N^{-1} R^{(S)} \Sigma)^{|V^{\mathrm{m2}}_\mathcal{G}|} u\big| \prec \frac{1}{N} \|N^{-1/2} u^* \Sigma R^{(S)}\|_2 \times \|N^{-1} R^{(S)}\|_F^{|V^{\mathrm{m2}}_\mathcal{G}| - 2} \times \|N^{-1/2} R^{(S)} \Sigma u\|_2 \prec \frac{1}{N} \Phi^{|V^{\mathrm{m2}}_\mathcal{G}|}.
\]
This again implies the lemma since $|U_\mathcal{G}| = 2$, $\Phi \leq 1$, and $\delta \geq 0$. This completes the proof. □

Proof of Lemma 3.6(a). We recall from (3.15) that, for any fixed constants $L \geq 1$ and $D > 0$, we have
\[
\mathbb{E}\bigg|\frac{1}{\sqrt N} \sum_{i=1}^N Y^{(i)}_i Q^{(i)}_i\bigg|^{2L} \prec \max_{\substack{i_1, \ldots, i_{2L} \in [N] \\ q_1 \in \mathcal{M}_{i_1, S}, \ldots, q_{2L} \in \mathcal{M}_{i_{2L}, S}}} \Big\{N^{|S| - L}\, \mathbb{E}|\mathbb{E}_S\, q|\Big\} + N^{-\tau'(D+1) + L}, \tag{3.22}
\]
where
$S = \{i_1, \ldots, i_{2L}\}$ is the set of distinct indices in $i_1, \ldots, i_{2L}$, the monomials $q_l \in \mathcal{M}_{i_l, S}$ are those arising in the expansion of $Y^{(i_l)}_{i_l} Q^{(i_l)}_{i_l}$ via Lemma 3.11, and $q = \prod_{l=1}^L q_l \prod_{l=L+1}^{2L} \bar q_l$. By Lemma 3.23 and the bound $|\mathcal{C}^{(S)}| \prec 1$, there exists a family $\mathcal{N}$ of $(S, 2L)$-valid tensor networks for which
\[
|\mathbb{E}_S\, q| \prec \sum_{(\mathcal{G}, f_\mathcal{G}) \in \mathcal{N}} |\mathrm{val}(\mathcal{G}, f_\mathcal{G})| \prec \max_{(\mathcal{G}, f_\mathcal{G}) \in \mathcal{N}} |\mathrm{val}(\mathcal{G}, f_\mathcal{G})|,
\]
where the second inequality applies $|\mathcal{N}| \leq C$. It remains to bound $\mathrm{val}(\mathcal{G}, f_\mathcal{G})$ for each $(\mathcal{G}, f_\mathcal{G}) \in \mathcal{N}$.

The proof for each $(\mathcal{G}, f_\mathcal{G}) \in \mathcal{N}$ proceeds by reducing $(\mathcal{G}, f_\mathcal{G})$ via a sequence of four steps. In each of the first three steps, we sequentially apply either Corollary 3.20 or Lemma 3.21 to remove a single tensor vertex $v \in V^{\mathrm{t}}_\mathcal{G}$ with $\deg(v) \geq 3$. After removing all such vertices, the graph is reduced to chains and cycles, and we apply Lemma 3.24 in the fourth step to complete the proof. Letting $U^1_\mathcal{G}, U^2_\mathcal{G} \subseteq V^{\mathrm{sl}}_\mathcal{G}$ be the disjoint subsets of singular left leaves in Lemma 3.23 (which are fixed throughout the process below), the four steps are as follows:

(1) While there exists a singular left leaf $u \in U^1_\mathcal{G} \cup U^2_\mathcal{G}$ incident to a tensor vertex $v = t(u) \in V^{\mathrm{t}}_\mathcal{G}$ such that $\deg(v) \geq 3$ and either $\mathrm{m1}(v) \geq 1$ or $r(v) + \mathrm{m1}(v) = 0$: remove $v$, and apply Lemma 3.21(a) to bound
\[
|\mathrm{val}(\mathcal{G}, f_\mathcal{G})| \prec \frac{N^\delta}{\sqrt N} \Phi^{r(v)} \max_{f_{\mathcal{G}'} \in F_{\mathcal{G}'}} |\mathrm{val}(\mathcal{G}', f_{\mathcal{G}'})|.
\]

(2) While there exists a singular left leaf $u \in U^1_\mathcal{G}$ incident to a tensor vertex $v \in V^{\mathrm{t}}_\mathcal{G}$ such that $\deg(v) \geq 3$ and $r(v) \geq 1$: remove $v$, and apply Lemma 3.21(b) to bound
\[
|\mathrm{val}(\mathcal{G}, f_\mathcal{G})| \prec \frac{N^\delta}{\sqrt N} \Phi^{r(v) - 1} \max_{f_{\mathcal{G}'} \in F_{\mathcal{G}'}} |\mathrm{val}(\mathcal{G}', f_{\mathcal{G}'})|.
\]

(3) While there exists a tensor vertex $v \in V^{\mathrm{t}}_\mathcal{G}$ such that $\deg(v) \geq 3$: remove $v$, and apply Corollary 3.20 to bound
\[
|\mathrm{val}(\mathcal{G}, f_\mathcal{G})| \prec \Phi^{r(v)} \max_{f_{\mathcal{G}'} \in F_{\mathcal{G}'}} |\mathrm{val}(\mathcal{G}', f_{\mathcal{G}'})|.
\]
(4) At this point, the remaining network $(\mathcal{G}, f_\mathcal{G})$ is such that all $v \in V^{\mathrm{t}}_\mathcal{G}$ satisfy $\deg(v) = 2$. Let $|V^{\mathrm{ro}}_\mathcal{G}(4)|$, $|V^{\mathrm{rn}}_\mathcal{G}(4)|$, $|V^{\mathrm{m2}}_\mathcal{G}(4)|$, and $|U^1_\mathcal{G}(4)|$ denote the numbers of right original leaves, right new leaves, (type 2) matrix vertices, and singular left leaves of $U^1_\mathcal{G}$ in $(\mathcal{G}, f_\mathcal{G})$. We apply Lemma 3.24 to bound
\[
|\mathrm{val}(\mathcal{G}, f_\mathcal{G})| \prec \Big(\frac{N^\delta}{\sqrt N}\Big)^{|U^1_\mathcal{G}(4)|} \times \Phi^{|V^{\mathrm{ro}}_\mathcal{G}(4)| + |V^{\mathrm{rn}}_\mathcal{G}(4)| + |V^{\mathrm{m2}}_\mathcal{G}(4)| - |U^1_\mathcal{G}(4)|}.
\]

We clarify that in the four inequalities above, $(\mathcal{G}, f_\mathcal{G})$ on the left side denotes the modified network after having applied all previous removals, $r(v)$ on the right side denotes the number of right leaves adjacent to $v$ in this modified network $(\mathcal{G}, f_\mathcal{G})$, and $(\mathcal{G}', F_{\mathcal{G}'})$ denotes the family of networks obtained by further removing $v$ from $(\mathcal{G}, f_\mathcal{G})$ according to Definition 3.19. Within Steps 1–3, if there are multiple vertices that may be removed, we may choose one to remove arbitrarily, but we do this sequentially, as each removal may increase $r(v')$ for the remaining vertices $v' \in V^{\mathrm{t}}_\mathcal{G}$. For example, suppose two vertices $u_1, u_2 \in U^1_\mathcal{G}$ with adjacent vertices $v_1 = t(u_1)$ and $v_2 = t(u_2)$ are such that $\deg(v_1) \geq 3$ and $\deg(v_2) \geq 3$, $r(v_1) + \mathrm{m1}(v_1) = 0$ and $r(v_2) + \mathrm{m1}(v_2) = 0$, and $v_1, v_2$ are adjacent to a common type 2 matrix vertex $u \in V^{\mathrm{m}}_\mathcal{G}$. Pictorially, we have

[Tensor network diagrams: two $\kappa_k(g_i)$ tensor vertices $v_1, v_2$, with singular left leaves $u_1$ resp. $u_2$, joined through a common matrix vertex labeled $N^{-1}R^{(S)}$; after applying Step 1 to remove $v_1$, that matrix vertex is replaced by a right new leaf labeled $N^{-1/2}R^{(S)}x$ adjacent to $v_2$.]

Then Step 1 may first remove either $v_1$ or $v_2$. Upon removing, say, $v_1$, the adjacent matrix vertex $u$ is replaced by a right new leaf according to Definition 3.19. Then $r(v_2) \geq 1$ in the resulting graph, implying that $v_2$ is no longer a candidate for removal in Step 1 and instead will be removed in Step 2.
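The priority order of the three removal rules can be summarized in pseudocode. The sketch below is purely hypothetical bookkeeping (not the paper's machinery): it tracks only the accumulated exponents of $N^\delta/\sqrt N$ and $\Phi$, and omits the graph surgery of Definition 3.19 that can increase $r(v)$ for neighbors of a removed vertex.

```python
from dataclasses import dataclass

# Bookkeeping sketch of the Step 1-3 removal order in the proof of
# Lemma 3.6(a): each removal of a degree >= 3 tensor vertex contributes a
# factor (N^delta / sqrt N)^a * Phi^b; we accumulate only the exponents.

@dataclass
class TensorVertex:
    deg: int     # total degree
    r: int       # number of adjacent right leaves
    m1: int      # number of adjacent type 1 matrix vertices
    group: str   # "U1", "U2", or "" for other tensor vertices

def peel(vertices):
    """Remove all vertices of degree >= 3 in the Step 1-3 priority order,
    returning the accumulated exponents (a, b)."""
    def candidates(step):
        for v in vertices:
            if v.deg < 3:
                continue
            if step == 1 and v.group in ("U1", "U2") and (v.m1 >= 1 or v.r + v.m1 == 0):
                yield v    # Lemma 3.21(a): factor (N^delta/sqrt N) * Phi^r
            elif step == 2 and v.group == "U1" and v.r >= 1:
                yield v    # Lemma 3.21(b): factor (N^delta/sqrt N) * Phi^(r-1)
            elif step == 3:
                yield v    # Corollary 3.20: factor Phi^r
    a = b = 0
    while True:
        for step in (1, 2, 3):
            v = next(candidates(step), None)
            if v is not None:
                break
        if v is None:
            return a, b    # only chains and cycles remain; Step 4 applies Lemma 3.24
        if step == 1:
            a, b = a + 1, b + v.r
        elif step == 2:
            a, b = a + 1, b + v.r - 1
        else:
            b += v.r
        vertices.remove(v)   # neighbor updates (which may grow r) are omitted
```

For instance, `peel([TensorVertex(3, 0, 1, "U2"), TensorVertex(4, 2, 0, "U1"), TensorVertex(3, 1, 0, "")])` returns `(2, 2)`: the first vertex is removed by Step 1, the second by Step 2, and the third by Step 3.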
Since the removal of any vertex $v \in V^{\mathrm{t}}_\mathcal{G}$ does not change $\deg(v')$ or $\mathrm{m1}(v')$ and can only increase $r(v')$ for each remaining vertex $v' \in V^{\mathrm{t}}_\mathcal{G}$, the above procedure is forward-progressing in the sense that each removal of a tensor vertex in Steps 1–3 cannot result in a tensor vertex being eligible for removal by a previous step.

For each Step $k \in \{1, 2, 3\}$, let $|V^{\mathrm{ro}}_\mathcal{G}(k)|$, $|V^{\mathrm{rn}}_\mathcal{G}(k)|$, $|U^1_\mathcal{G}(k)|$, and $|U^2_\mathcal{G}(k)|$ denote the total numbers of right original leaves, right new leaves, singular left leaves of $U^1_\mathcal{G}$, and singular left leaves of $U^2_\mathcal{G}$ that are removed in Step $k$. Importantly, all vertices $u \in U^2_\mathcal{G}$ have $\mathrm{m1}(t(u)) \geq 1$ by Lemma 3.23(d), so after all applications of Step 1, the network has no remaining vertices in $U^2_\mathcal{G}$. Thus $|U^2_\mathcal{G}(2)| = |U^2_\mathcal{G}(3)| = 0$, and the number of total applications of Step 2 is $|U^1_\mathcal{G}(2)|$. Also, after all applications of Steps 1 and 2, no singular left leaf $u \in U^1_\mathcal{G}$ has $\deg(t(u)) \geq 3$, so Step 3 does not remove any vertices of $U^1_\mathcal{G}$, i.e. $|U^1_\mathcal{G}(3)| = 0$. Thus, letting $\mathrm{val}(\mathcal{G}, f_\mathcal{G})$ denote the value of the starting network, the above steps yield the bound
\[
|\mathrm{val}(\mathcal{G}, f_\mathcal{G})| \prec \underbrace{\Big(\frac{N^\delta}{\sqrt N}\Big)^{|U^1_\mathcal{G}(1)| + |U^2_\mathcal{G}(1)|} \Phi^{|V^{\mathrm{ro}}_\mathcal{G}(1)| + |V^{\mathrm{rn}}_\mathcal{G}(1)|}}_{\text{Step 1}} \times \underbrace{\Big(\frac{N^\delta}{\sqrt N}\Big)^{|U^1_\mathcal{G}(2)|} \Phi^{|V^{\mathrm{ro}}_\mathcal{G}(2)| + |V^{\mathrm{rn}}_\mathcal{G}(2)| - |U^1_\mathcal{G}(2)|}}_{\text{Step 2}} \times \underbrace{\Phi^{|V^{\mathrm{ro}}_\mathcal{G}(3)| + |V^{\mathrm{rn}}_\mathcal{G}(3)|}}_{\text{Step 3}} \times \underbrace{\Big(\frac{N^\delta}{\sqrt N}\Big)^{|U^1_\mathcal{G}(4)|} \Phi^{|V^{\mathrm{ro}}_\mathcal{G}(4)| + |V^{\mathrm{rn}}_\mathcal{G}(4)| + |V^{\mathrm{m2}}_\mathcal{G}(4)| - |U^1_\mathcal{G}(4)|}}_{\text{Step 4}}.
\]
Furthermore, letting $|V^{\mathrm{ro}}_\mathcal{G}|$, $|V^{\mathrm{rn}}_\mathcal{G}|$, $|V^{\mathrm{m2}}_\mathcal{G}|$, $|U^1_\mathcal{G}|$, and $|U^2_\mathcal{G}|$ denote these quantities for the original network $(\mathcal{G}, f_\mathcal{G})$, we observe the following:

(1) The singular left leaves removed in Steps 1–2 and those remaining in Step 4 form a partition of $U^1_\mathcal{G} \cup U^2_\mathcal{G}$, so $|U^1_\mathcal{G}(1)| + |U^2_\mathcal{G}(1)| + |U^1_\mathcal{G}(2)| + |U^1_\mathcal{G}(4)| = |U^1_\mathcal{G}| + |U^2_\mathcal{G}|$.
(2) Similarly, the right original leaves removed in Steps 1–4 form a partition of $V^{\mathrm{ro}}_\mathcal{G}$, so
\[
\sum_{k=1}^4 |V^{\mathrm{ro}}_\mathcal{G}(k)| = |V^{\mathrm{ro}}_\mathcal{G}|.
\]

(3) By the removal process of Definition 3.19, every right new leaf removed in Steps 1–4 must have been a type 2 matrix vertex in the original network $(\mathcal{G}, f_\mathcal{G})$. Together with the type 2 matrix vertices remaining in Step 4, this accounts for all type 2 matrix vertices of $(\mathcal{G}, f_\mathcal{G})$. Thus
\[
\sum_{k=1}^4 |V^{\mathrm{rn}}_\mathcal{G}(k)| + |V^{\mathrm{m2}}_\mathcal{G}(4)| = |V^{\mathrm{m2}}_\mathcal{G}|.
\]

(4) It is clear that we must have $|U^1_\mathcal{G}(2)| + |U^1_\mathcal{G}(4)| \leq |U^1_\mathcal{G}|$.

Thus, we have the bound
\[
|\mathrm{val}(\mathcal{G}, f_\mathcal{G})| \prec \Big(\frac{N^\delta}{\sqrt N}\Big)^{|U^1_\mathcal{G}| + |U^2_\mathcal{G}|} \times \Phi^{|V^{\mathrm{ro}}_\mathcal{G}| + |V^{\mathrm{m2}}_\mathcal{G}| - |U^1_\mathcal{G}|} \overset{(a)}{\leq} \Big(\frac{N^\delta}{\sqrt N}\Big)^{|K|} \times \Phi^{|V^{\mathrm{ro}}_\mathcal{G}|} \overset{(b)}{\leq} N^{-|K|/2} \times (N^\delta \Phi)^{2L} \overset{(c)}{\leq} N^{-|S| + L} \times (N^\delta \Phi)^{2L},
\]
where (a) applies $|U^1_\mathcal{G}| + |U^2_\mathcal{G}| = |K|$ and $|V^{\mathrm{m2}}_\mathcal{G}| \geq |U^1_\mathcal{G}|$ from Lemma 3.23(b–c), (b) applies $|K| \leq 2L$ and $|V^{\mathrm{ro}}_\mathcal{G}| = 2L$ by Lemma 3.23(a), and (c) applies the bound $(2L - |K|)/2 + |K| \geq |S|$, which holds because every index of $S \setminus K$ appears at least twice in $(i_1, \ldots, i_{2L})$. Applying this bound to (3.22) and choosing $D > 0$ large enough shows
\[
\mathbb{E}\bigg|\sum_{i=1}^N N^{-1/2} Y^{(i)}_i Q^{(i)}_i\bigg|^{2L} \prec (N^\delta \Phi)^{2L},
\]
and the desired bound follows from Markov's inequality. □

3.5.5. Proof of Lemma 3.6(b). The following lemma is similar to Lemma 3.23.

Lemma 3.25. Fix any integer $L \geq 1$. For any $i_1, \ldots, i_{2L}, j_1, \ldots, j_{2L} \in [N]$ where $i_l \neq j_l$, denote by $S = \{i_1, \ldots, i_{2L}\} \cup \{j_1, \ldots, j_{2L}\}$ the set of distinct indices. Let $q_1, \ldots, q_{2L}$ be a sequence of monomials satisfying the conditions of Lemma 3.12, where each $q_l$ is a monomial in the expansion of $B^{(i_l j_l)}_{i_l j_l} Q^{(i_l)}_{i_l} Q^{(i_l j_l)}_{j_l}$. Let $q = \prod_{l=1}^L q_l \prod_{l=L+1}^{2L} \bar q_l$.
Then there exist a constant $C \equiv C(L) > 0$, an exponent $m \in [0, C]$, and a set $\mathcal{N}$ of $(S, 4L)$-valid tensor networks with $|\mathcal{N}| \leq C$ such that
\[
|\mathbb{E}_S\, q| = |\mathcal{C}^{(S)}|^m \cdot \sum_{(\mathcal{G}, f_\mathcal{G}) \in \mathcal{N}} |\mathrm{val}(\mathcal{G}, f_\mathcal{G})|.
\]
Moreover, for all $(\mathcal{G}, f_\mathcal{G}) \in \mathcal{N}$, there exist disjoint subsets of tensor vertices $U^1_\mathcal{G}, U^2_\mathcal{G} \subseteq V^{\mathrm{t}}_\mathcal{G}$ such that the following holds:

(a) We have $|V^{\mathrm{l}}_\mathcal{G}| = |V^{\mathrm{r}}_\mathcal{G}| = 0$. (I.e., there are no leaves, and the only vertices are tensor vertices and matrix vertices.)

(b) Let $K \subseteq S$ be the subset of indices that appear exactly once in the list $(i_1, \ldots, i_{2L}, j_1, \ldots, j_{2L})$. Then $|K| = |U^1_\mathcal{G}| + |U^2_\mathcal{G}|$.

(c) For all $v \in U^1_\mathcal{G} \cup U^2_\mathcal{G}$, $\deg(v)$ is odd and $\deg(v) \geq 3$.

(d) We have $|V^{\mathrm{m2}}_\mathcal{G}| \geq 2L + |U^1_\mathcal{G}|$.

(e) For all $v \in U^2_\mathcal{G}$, $\mathrm{m1}(v) \geq 1$.

Proof. The proof is similar to that of Lemma 3.23, so we omit some details. The same arguments as in Lemma 3.23 show that
\[
|\mathbb{E}_S\, q| = |\mathcal{C}^{(S)}|^m \cdot \sum_{(\mathcal{G}, f_\mathcal{G}) \in \mathcal{N}} |\mathrm{val}(\mathcal{G}, f_\mathcal{G})|
\]
for some $m \in [0, C(L)]$ and family $\mathcal{N}$ of $(S, 4L)$-valid tensor networks where $|\mathcal{N}| \leq C(L)$. These networks have no leaves, since the terms $B^{(S)}_{ij}$ and $P^{(S)}_i$ do not involve the vector $u$.

Let $K$ be the set of indices appearing exactly once in $(i_1, \ldots, i_{2L}, j_1, \ldots, j_{2L})$. For each $i_l \in K$ (or $j_l \in K$), Lemma 3.12(b) ensures that $g_{i_l}$ (resp. $g_{j_l}$) appears an odd number of times in $q_l$ and an even number of times in $\prod_{l' \neq l} q_{l'}$. Thus, in the moment-cumulant expansion, there must exist some cumulant tensor $\kappa_k(g_{i_l})$ (resp. $\kappa_k(g_{j_l})$) with $k$ odd. Moreover, since $\mathbb{E} g_{i_l} = \mathbb{E} g_{j_l} = 0$, we must have $k \geq 3$. Let $U_\mathcal{G} \subseteq V^{\mathrm{t}}_\mathcal{G}$ be a choice of one odd-degree tensor vertex representing an odd-order cumulant for each $i_l \in K$ or $j_l \in K$; we denote this choice by $v(i_l) \in U_\mathcal{G}$ or $v(j_l) \in U_\mathcal{G}$. It follows that $|U_\mathcal{G}| = |K|$.
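The bookkeeping of once-appearing indices drives the final gains of $N^{-1/2}$ in these arguments. As a small illustrative aside (the choices of $L$ and of the index range below are arbitrary), the underlying counting fact, used for instance as step (c) in the proof of Lemma 3.6(a), is that indices outside $K$ appear at least twice, so $(2L - |K|)/2 + |K| \geq |S|$; this can be verified exhaustively:

```python
from collections import Counter
from itertools import product

# Exhaustive check of the counting bound: for every list of 2L indices,
# with |S| distinct values and |K| of them appearing exactly once,
# (2L - |K|)/2 + |K| >= |S|, since each index outside K appears >= twice.
L = 2
for idx in product(range(4), repeat=2 * L):
    counts = Counter(idx)
    n_distinct = len(counts)                            # |S|
    n_once = sum(1 for c in counts.values() if c == 1)  # |K|
    assert (2 * L - n_once) / 2 + n_once >= n_distinct
print("counting inequality holds for all 4^4 index lists with 2L = 4")
```

The same count, applied to the doubled list $(i_1, \ldots, i_{2L}, j_1, \ldots, j_{2L})$, underlies the factor $N^{-|K|/2}$ in the proofs of Lemma 3.6(b–c).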
We now partition $U_\mathcal{G}$ into $U^1_\mathcal{G}$ and $U^2_\mathcal{G}$ satisfying conditions (d) and (e) of the lemma. Let $U^1_\mathcal{G}$ be the set of $v(i_l)$ or $v(j_l)$ in $U_\mathcal{G}$ for which either (1) $i_l$ (resp. $j_l$) appears as a lower index in some $\{q_{l'} : l' \neq l\}$, or (2) $i_l$ (resp. $j_l$) does not appear as a lower index in any $\{q_{l'} : l' \neq l\}$, and $v(i_l)$ (resp. $v(j_l)$) has at least three type 2 matrix neighbors.

To bound $|U^1_\mathcal{G}|$, let $m^{**}_B(l)$, $m^{*1}_B(l)$, $m^{*2}_B(l)$, $m_B(l)$ denote the values of $m^{**}_B$, $m^{*1}_B$, $m^{*2}_B$, $m_B$ for $q_l$ as defined in Lemma 3.12, and set
\[
m^{**}_B = \sum_{l=1}^{2L} m^{**}_B(l), \qquad m^*_B = \sum_{l=1}^{2L} m^{*1}_B(l) + m^{*2}_B(l), \qquad m_B = \sum_{l=1}^{2L} m_B(l).
\]
Each $i_l \in K$ (or $j_l \in K$) is distinct from $\{i_{l'}, j_{l'} : l' \neq l\}$, so Lemma 3.12(c) ensures that the number of vertices $v(i_l), v(j_l) \in U_\mathcal{G}$ satisfying condition (1) is at most $m_B + \tfrac12 m^*_B$.

For condition (2), denote further by $m^{*1}_P(l)$ the number of factors of $P^{(S)}_{i_l}$ appearing in $q_l$ for which its copy of $N^{-1} R^{(S)}$ labels a type 2 matrix vertex. Define similarly $m^{*2}_P(l)$ as the number of such factors of $P^{(S)}_{j_l}$ in $q_l$, and set
\[
E(l) = m^{**}_B(l) + \tfrac12\big(m^{*1}_B(l) + m^{*2}_B(l)\big) + m^{*1}_P(l) + m^{*2}_P(l) - 1.
\]
Consider any $l \in \{1, \ldots, 2L\}$. Suppose first that both $i_l, j_l \in K$, and that $v(i_l), v(j_l) \in U_\mathcal{G}$ satisfy condition (2). Then either (i) $m^{**}_B(l) \geq 3$; (ii) $m^{**}_B(l) = 2$, in which case $m^{*1}_B(l), m^{*2}_B(l) \geq 1$ by Lemma 3.12(b); (iii) $m^{**}_B(l) = 1$, in which case $m^{*1}_B(l) + m^{*1}_P(l) \geq 2$ and $m^{*2}_B(l) + m^{*2}_P(l) \geq 2$ since $v(i_l), v(j_l)$ both have at least three type 2 matrix neighbors; or (iv) $m^{**}_B(l) = 0$, in which case $m^{*1}_B(l) + m^{*1}_P(l) \geq 3$ and $m^{*2}_B(l) + m^{*2}_P(l) \geq 3$. In all four cases (i–iv), we verify that $E(l) \geq 2$.

Now suppose that only $v(i_l) \in U_\mathcal{G}$ satisfies condition (2)
(so either $j_l \notin K$, or $v(j_l) \in U_\mathcal{G}$ does not satisfy condition (2)). Then either (i) $m^{**}_B(l) \geq 2$; (ii) $m^{**}_B(l) = 1$ and $m^{*1}_B(l) + m^{*1}_P(l) \geq 2$; or (iii) $m^{**}_B(l) = 0$ and $m^{*1}_B(l) + m^{*1}_P(l) \geq 3$. In case (iii), we have also $m^{*2}_B(l) \geq 1$, since $m^{*2}_B(l)$ is odd by Lemma 3.12(b). Thus in all three cases (i–iii), we verify that $E(l) \geq 1$. Finally, we have $E(l) \geq 0$ for every $l = 1, \ldots, 2L$ by Lemma 3.12(a). Thus, we conclude that the number of vertices $v(i_l), v(j_l) \in U_\mathcal{G}$ satisfying condition (2) is at most $\sum_{l=1}^{2L} E(l)$, and so
\[
|U^1_\mathcal{G}| \leq \underbrace{m_B + \tfrac12 m^*_B}_{\text{condition (1)}} + \underbrace{\sum_{l=1}^{2L} E(l)}_{\text{condition (2)}} \leq m_B + m^*_B + m^{**}_B + m^*_P - 2L,
\]
where we have set also $m^*_P = \sum_{l=1}^{2L} m^{*1}_P(l) + m^{*2}_P(l)$. Here $m_B + m^*_B + m^{**}_B + m^*_P \leq |V^{\mathrm{m2}}_\mathcal{G}|$, the total number of type 2 matrix vertices, showing statement (d) of the lemma. Finally, each remaining vertex $v \in U^2_\mathcal{G} = U_\mathcal{G} \setminus U^1_\mathcal{G}$ satisfies $\deg(v) \geq 3$ and $\mathrm{m2}(v) \leq 2$, implying that $\mathrm{m1}(v) \geq 1$. This shows (e) and completes the proof. □

Proof of Lemma 3.6(b). Recall from (3.20) that, for any fixed constants $L \geq 1$ and $D > 0$,
\[
\mathbb{E}\bigg|\sum_{i \neq j} \bar v(i) v(j) B^{(ij)}_{ij} Q^{(i)}_i Q^{(ij)}_j\bigg|^{2L} \prec \max_{\substack{i_1, \ldots, i_{2L}, j_1, \ldots, j_{2L} \in [N] \\ q_1 \in \mathcal{M}_{i_1 j_1, S}, \ldots, q_{2L} \in \mathcal{M}_{i_{2L} j_{2L}, S}}} \Big\{N^{|K|/2}\, \mathbb{E}|\mathbb{E}_S\, q|\Big\} + O_\prec\big(N^{-\tau'(D+1) + 2L}\big). \tag{3.23}
\]
By Lemma 3.25, there exists a family $\mathcal{N}$ of $(S, 4L)$-valid tensor networks $(\mathcal{G}, f_\mathcal{G})$ for which
\[
|\mathbb{E}_S\, q| \prec \max_{(\mathcal{G}, f_\mathcal{G}) \in \mathcal{N}} |\mathrm{val}(\mathcal{G}, f_\mathcal{G})|.
\]
Similar to the proof of Lemma 3.6(a), we reduce $(\mathcal{G}, f_\mathcal{G})$ in a sequence of four steps. These steps will successively remove tensor vertices from the network, introducing right new leaves and also eigen-leaves to the network.
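The eigen-leaves just mentioned arise from expanding a matrix vertex in the eigenbasis of $R^{(S)}$. As a numerical aside (not part of the proof; sizes and $z$ below are arbitrary), both the spectral decomposition of the resolvent and the Schatten-1 quantity $N^{-1}\|R^{(S)}\|_1 = N^{-1}\sum_\alpha |\lambda_\alpha - z|^{-1}$ invoked there can be checked directly:

```python
import numpy as np

# Spectral decomposition used for eigen-leaves: if K = sum_a lambda_a v_a v_a^T,
# then R = (K - z)^{-1} = sum_a (lambda_a - z)^{-1} v_a v_a^T, and the
# Schatten-1 norm of R is sum_a |lambda_a - z|^{-1}.
rng = np.random.default_rng(2)
n, N, z = 40, 60, 0.7 + 0.05j
G = rng.standard_normal((n, N))
K = G @ G.T / N                          # sample covariance (real symmetric)
lam, V = np.linalg.eigh(K)               # eigenvalues and orthonormal eigenvectors
R = np.linalg.inv(K - z * np.eye(n))
R_spec = (V / (lam - z)) @ V.T           # sum_a (lambda_a - z)^{-1} v_a v_a^T
assert np.allclose(R, R_spec)
assert np.isclose(np.sum(1 / np.abs(lam - z)), np.linalg.norm(R, "nuc"))
print("resolvent spectral decomposition verified")
```

Here `np.linalg.norm(R, "nuc")` is the sum of singular values of $R$, which equals $\sum_\alpha |\lambda_\alpha - z|^{-1}$ since $R$ is normal.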
Letting $U^1_\mathcal{G}, U^2_\mathcal{G} \subseteq V^{\mathrm{t}}_\mathcal{G}$ be the subsets of tensor vertices in Lemma 3.25, these steps are:

(1) While there exists a tensor vertex $v \in U^2_\mathcal{G}$: remove $v$ and apply Lemma 3.21(a) to bound
\[
|\mathrm{val}(\mathcal{G}, f_\mathcal{G})| \prec \frac{N^\delta}{\sqrt N} \Phi^{r(v)} \max_{f_{\mathcal{G}'} \in F_{\mathcal{G}'}} |\mathrm{val}(\mathcal{G}', f_{\mathcal{G}'})|.
\]

(2) While there exists a tensor vertex $v \in U^1_\mathcal{G}$:

(a) If there exists such a vertex with either $\mathrm{m1}(v) \geq 1$, $r(v) \geq 1$, or both $\mathrm{e}(v) = 1$ and $r(v) + \mathrm{m1}(v) = 0$: remove $v$ and apply either Lemma 3.21(a) or (b) to bound
\[
|\mathrm{val}(\mathcal{G}, f_\mathcal{G})| \prec \frac{1}{\sqrt N} \Phi^{r(v) - 1} \max_{f_{\mathcal{G}'} \in F_{\mathcal{G}'}} |\mathrm{val}(\mathcal{G}', f_{\mathcal{G}'})|.
\]
(Lemma 3.21(a) would give a stronger bound with $r(v)$ in place of $r(v) - 1$, and we weaken this to $r(v) - 1$ above.)

(b) If all vertices $v \in U^1_\mathcal{G}$ satisfy $\mathrm{e}(v) + r(v) + \mathrm{m1}(v) = 0$, then since $\deg(v) \geq 3$, each such $v$ is adjacent to at least three type 2 matrix vertices. Pick one of them, say $u \in V^{\mathrm{m2}}_\mathcal{G}$. Decomposing the label $N^{-1} R^{(S)}$ of $u$ as $N^{-1} \sum_\alpha |\lambda_\alpha - z|^{-1} v_\alpha v_\alpha^*$, let $(\mathcal{G}'', F_{\mathcal{G}''})$ denote the family of networks obtained by replacing $u$ with two eigen-leaves having a common label in $\{v_\alpha : \alpha \in [N]\}$. Pictorially, this corresponds to the following operation:

[Tensor network diagrams: a $\kappa_k(g_i)$ tensor vertex adjacent to several matrix vertices labeled $N^{-1}R^{(S)}$ is transformed, with prefactor $N^{-1}\sum_\alpha |\lambda_\alpha - z|^{-1}$, into the same network with one matrix vertex split into two eigen-leaves labeled $v_\alpha$.]

By the given bound $N^{-1} \|R^{(S)}\|_1 \prec 1$, we have
\[
|\mathrm{val}(\mathcal{G}, f_\mathcal{G})| \prec \max_{f_{\mathcal{G}''} \in F_{\mathcal{G}''}} |\mathrm{val}(\mathcal{G}'', f_{\mathcal{G}''})|.
\]
The vertex $v$ in $(\mathcal{G}'', f_{\mathcal{G}''})$ now satisfies $\mathrm{e}(v) = 1$ and $r(v) + \mathrm{m1}(v) = 0$, so we may apply Lemma 3.21(a) to remove $v$ and bound the resulting value.
Letting $(\mathcal{G}', F_{\mathcal{G}'})$ denote the family of networks obtained by removing $v$ from some $(\mathcal{G}'', f_{\mathcal{G}''})$ in this way, this yields
\[
|\mathrm{val}(\mathcal{G}, f_\mathcal{G})| \prec \frac{N^\delta}{\sqrt N} \Phi^{r(v)} \max_{f_{\mathcal{G}'} \in F_{\mathcal{G}'}} |\mathrm{val}(\mathcal{G}', f_{\mathcal{G}'})|.
\]

(3) While there exists a tensor vertex $v \in V^{\mathrm{t}}_\mathcal{G}$ such that $\deg(v) \geq 3$: remove $v$, and apply Corollary 3.20 to bound
\[
|\mathrm{val}(\mathcal{G}, f_\mathcal{G})| \prec \Phi^{r(v)} \max_{f_{\mathcal{G}'} \in F_{\mathcal{G}'}} |\mathrm{val}(\mathcal{G}', f_{\mathcal{G}'})|.
\]

(4) At this point there are no type 1 matrix vertices, and we have $\deg(v) = 2$ for all $v \in V^{\mathrm{t}}_\mathcal{G}$, so $\mathcal{G}$ contains only chains and cycles. The endpoints of a chain can be either right (new) leaves or eigen-leaves. Then, applying the bounds
\begin{align*}
\mathrm{Tr}\big((N^{-1} R^{(S)} \Sigma)^k\big) &\prec \|N^{-1} R^{(S)}\|_F^k \prec \Phi^k, \\
(N^{-1/2} R^{(S)} x)^* (N^{-1} R^{(S)})^k (N^{-1/2} R^{(S)} y) &\prec \|N^{-1} R^{(S)}\|_F^k\, \|N^{-1/2} R^{(S)} x\|_2\, \|N^{-1/2} R^{(S)} y\|_2 \prec \Phi^{k+2}, \\
v_\alpha^* (N^{-1} R^{(S)})^k (N^{-1/2} R^{(S)} x) &\prec \|N^{-1} R^{(S)}\|_F^k\, \|N^{-1/2} R^{(S)} x\|_2 \prec \Phi^{k+1}, \\
v_\alpha^* (N^{-1} R^{(S)})^k v_\beta &\prec \|N^{-1} R^{(S)}\|_F^k \prec \Phi^k
\end{align*}
uniformly over $x, y \in \mathcal{U}$ and eigenvectors $v_\alpha, v_\beta$ of $R^{(S)}$, we have
\[
|\mathrm{val}(\mathcal{G}, f_\mathcal{G})| \prec \Phi^{|V^{\mathrm{m2}}_\mathcal{G}(4)| + |V^{\mathrm{r}}_\mathcal{G}(4)|},
\]
where $|V^{\mathrm{m2}}_\mathcal{G}(4)|$ and $|V^{\mathrm{r}}_\mathcal{G}(4)|$ denote the numbers of type 2 matrix vertices and right (new) leaves in the network at the start of Step 4.

We clarify that, as in Lemma 3.6(a), Steps 1–3 are applied sequentially, where each application of Step 2 applies Step 2(b) only when there are no vertices $v \in U^1_\mathcal{G}$ satisfying the conditions of Step 2(a). At the conclusion of Step 1, all vertices of $U^2_\mathcal{G}$ are removed from the network. Each time Step 2(b) is applied, it creates an eigen-leaf in the resulting network $(\mathcal{G}', f_{\mathcal{G}'})$. If this eigen-leaf is adjacent to a different vertex $v' \in U^1_\mathcal{G}$, then Step 2(b) cannot be applied again until $v'$ is removed by a further application of Step 2(a).
Thus, Step 2 preserves the property that every $v \in U^1_\mathcal{G}$ always has a number of adjacent eigen-leaves $\mathrm{e}(v)$ that is 0 or 1. Then at the conclusion of Step 2, all vertices of $U^1_\mathcal{G}$ are removed from the network.

For each Step $k \in \{1, 2, 3\}$, let $|V^{\mathrm{r}}_\mathcal{G}(k)|$ denote the total number of right (new) leaves removed in Step $k$. Denote also by $|U^1_\mathcal{G}(2a)|$ and $|U^1_\mathcal{G}(2b)|$ the total numbers of vertices in $U^1_\mathcal{G}$ that are removed in Steps 2(a) and 2(b), respectively. Then the above procedure gives the bound
\[
|\mathrm{val}(\mathcal{G}, f_\mathcal{G})| \prec \underbrace{\Big(\frac{N^\delta}{\sqrt N}\Big)^{|U^2_\mathcal{G}|} \Phi^{|V^{\mathrm{r}}_\mathcal{G}(1)|}}_{\text{Step 1}} \times \underbrace{\Big(\frac{N^\delta}{\sqrt N}\Big)^{|U^1_\mathcal{G}|} \Phi^{|V^{\mathrm{r}}_\mathcal{G}(2)| - |U^1_\mathcal{G}(2a)|}}_{\text{Step 2}} \times \underbrace{\Phi^{|V^{\mathrm{r}}_\mathcal{G}(3)|}}_{\text{Step 3}} \times \underbrace{\Phi^{|V^{\mathrm{m2}}_\mathcal{G}(4)| + |V^{\mathrm{r}}_\mathcal{G}(4)|}}_{\text{Step 4}}.
\]
Each type 2 matrix vertex of the original network $(\mathcal{G}, f_\mathcal{G})$ is either turned into a right new leaf in Steps 1–4, turned into an eigen-leaf in Step 2(b), or remains a type 2 matrix vertex in Step 4. Moreover, each application of Step 2(b) turns exactly one matrix vertex into an eigen-leaf. Thus
\[
\sum_{k=1}^4 |V^{\mathrm{r}}_\mathcal{G}(k)| + |V^{\mathrm{m2}}_\mathcal{G}(4)| = |V^{\mathrm{m2}}_\mathcal{G}| - |U^1_\mathcal{G}(2b)|.
\]
Combining with $|U^1_\mathcal{G}(2a)| + |U^1_\mathcal{G}(2b)| = |U^1_\mathcal{G}|$ and $|U^1_\mathcal{G}| + |U^2_\mathcal{G}| = |K|$ (by Lemma 3.25) gives
\[
|\mathrm{val}(\mathcal{G}, f_\mathcal{G})| \prec \Big(\frac{N^\delta}{\sqrt N}\Big)^{|K|} \Phi^{|V^{\mathrm{m2}}_\mathcal{G}| - |U^1_\mathcal{G}|} \leq N^{-|K|/2} (N^{2\delta} \Phi)^{2L},
\]
the second inequality applying $|V^{\mathrm{m2}}_\mathcal{G}| \geq 2L + |U^1_\mathcal{G}|$ from Lemma 3.25 and $|K| \leq 4L$. Applying this to (3.23) and choosing $D > 0$ large enough shows that (3.23) is $O_\prec((N^{2\delta} \Phi)^{2L})$, so the result follows again from Markov's inequality. □

3.5.6. Proof of Lemma 3.6(c). The following lemma is similar to Lemmas 3.23 and 3.25.

Lemma 3.26. Fix any integer $L \geq 1$. For any $i_1, \ldots, i_{2L} \in [N]$, denote by $S = \{i_1, \ldots, i_{2L}\}$ the set of distinct indices. Let
$q_1, \ldots, q_{2L}$ be a sequence of monomials satisfying the conditions of Lemma 3.13, where each $q_l$ is a monomial from the expansion of $X^{(i_l)}_{i_l} Q^{(i_l)}_{i_l}$, and let $q = \prod_{l=1}^L q_l \prod_{l=L+1}^{2L} \bar q_l$. Then there exist a constant $C \equiv C(L) > 0$, an exponent $m \in [0, C]$, and a set $\mathcal{N}$ of $(S, 2L)$-valid tensor networks with $|\mathcal{N}| \leq C$, such that
\[
|\mathbb{E}_S\, q| = |\mathcal{C}^{(S)}|^m \cdot \sum_{(\mathcal{G}, f_\mathcal{G}) \in \mathcal{N}} |\mathrm{val}(\mathcal{G}, f_\mathcal{G})|.
\]
Moreover, for all $(\mathcal{G}, f_\mathcal{G}) \in \mathcal{N}$, there exist disjoint subsets of tensor vertices $U^1_\mathcal{G}, U^2_\mathcal{G} \subseteq V^{\mathrm{t}}_\mathcal{G}$ such that the following holds:

(a) We have $|V^{\mathrm{ro}}_\mathcal{G}| = 2L$ and $|V^{\mathrm{l}}_\mathcal{G}| = |V^{\mathrm{rn}}_\mathcal{G}| = |V^{\mathrm{e}}_\mathcal{G}| = 0$. (I.e., there are no left leaves, right new leaves, or eigen-leaves, and only right original leaves.)

(b) For all $v \in U^1_\mathcal{G} \cup U^2_\mathcal{G}$, $\deg(v)$ is odd and $\deg(v) \geq 3$.

(c) Let $K \subseteq S$ be the subset of indices appearing exactly once in $(i_1, \ldots, i_{2L})$. Then $|K| = |U^1_\mathcal{G}| + |U^2_\mathcal{G}|$.

(d) We have $|V^{\mathrm{ro}}_\mathcal{G}| + |V^{\mathrm{m2}}_\mathcal{G}| \geq 2L + |U^1_\mathcal{G}|$.

(e) For all $v \in U^2_\mathcal{G}$, $\mathrm{m1}(v) \geq 1$.

Proof. As in Lemmas 3.23 and 3.25, we have
\[
|\mathbb{E}_S\, q| = |\mathcal{C}^{(S)}|^m \cdot \sum_{(\mathcal{G}, f_\mathcal{G}) \in \mathcal{N}} |\mathrm{val}(\mathcal{G}, f_\mathcal{G})|
\]
for some $m \in [0, C(L)]$ and family $\mathcal{N}$ of $(S, 2L)$-valid tensor networks where $|\mathcal{N}| \leq C(L)$. It is clear that $|V^{\mathrm{l}}_\mathcal{G}| = |V^{\mathrm{rn}}_\mathcal{G}| = |V^{\mathrm{e}}_\mathcal{G}| = 0$ in this representation. The claim $|V^{\mathrm{ro}}_\mathcal{G}| = 2L$ follows from the fact that there is exactly one factor of the form $X^{(S)}_{i_l}$ or $\{X^{(S)}_j\}_{j \in S^{(i_l)}}$ in each $q_l$, establishing (a). As in Lemma 3.25, for each $i_l \in K$, $g_{i_l}$ appears an odd number of times in $q_l$ and an even number of times in $\prod_{l' \neq l} q_{l'}$, so we may identify a subset $U_\mathcal{G} = \{v(i_l) : i_l \in K\} \subseteq V^{\mathrm{t}}_\mathcal{G}$ of tensor vertices representing an odd-order cumulant of $g_{i_l}$ for each $i_l \in K$; thus $|U_\mathcal{G}| = |K|$, and $\deg(v) \geq 3$ and $\deg(v)$ is odd for each $v \in U_\mathcal{G}$.
We partition $U^{\mathcal{G}}$ into $U_1^{\mathcal{G}} \cup U_2^{\mathcal{G}}$ by letting $U_1^{\mathcal{G}}$ be the set of vertices $v(i_l)$ for which either (1) $i_l$ appears as a lower index in some $\{q_{l'} : l' \ne l\}$, or (2) $i_l$ does not appear as a lower index in any $\{q_{l'} : l' \ne l\}$, and $ro(v(i_l)) + m2(v(i_l)) \ge 3$.

Let $m_X^*(l), m_X(l), m_B^*(l), m_B(l)$ denote the counts of Lemma 3.13 for each $q_l$, and let $m_P^*(l)$ denote the number of factors $P_{i_l}^{(S)}$ of $q_l$ which correspond to type 2 matrix vertices. Set
\[
m_X^* = \sum_{l=1}^{2L} m_X^*(l), \quad m_X = \sum_{l=1}^{2L} m_X(l), \quad m_B^* = \sum_{l=1}^{2L} m_B^*(l), \quad m_B = \sum_{l=1}^{2L} m_B(l), \quad m_P^* = \sum_{l=1}^{2L} m_P^*(l),
\]
and set also
\[
E(l) = m_X^*(l) + \tfrac12\big(m_X(l) + m_B^*(l)\big) + m_P^*(l) - 1.
\]
By Lemma 3.13, the number of vertices $v(i_l) \in U^{\mathcal{G}}$ satisfying (1) is at most $\tfrac12 m_X + m_B + \tfrac12 m_B^*$. For each $v(i_l) \in U^{\mathcal{G}}$ satisfying (2), we have either $m_X^*(l) = 1$, in which case $m_B^*(l) + m_P^*(l) \ge 2$, or $m_X^*(l) = 0$, in which case $m_B^*(l) + m_P^*(l) \ge 3$ and also $m_X(l) = 1$ by Lemma 3.13(a). Thus in both cases, $E(l) \ge 1$. We have also $E(l) \ge 0$ for each $l = 1,\ldots,2L$ by Lemma 3.13, so the number of vertices $v(i_l) \in U^{\mathcal{G}}$ satisfying (2) is at most $\sum_{l=1}^{2L} E(l)$, implying
\[
|U_1^{\mathcal{G}}| \le \tfrac12 m_X + m_B + \tfrac12 m_B^* + \sum_{l=1}^{2L} E(l) \le m_X^* + m_X + m_B^* + m_B + m_P^* - 2L \le |V_{ro}^{\mathcal{G}}| + |V_{m2}^{\mathcal{G}}| - 2L.
\]
This shows (d). For each remaining $v \in U_2^{\mathcal{G}}$, we have $\deg(v) \ge 3$ and $ro(v) + m2(v) \le 2$, implying $m1(v) \ge 1$, which shows (e). □

Proof of Lemma 3.6(c). Recall from (3.21) that, for any fixed constants $L \ge 1$ and $D > 0$,
\[
\mathbb{E}\Bigg[\bigg|\sum_{i=1}^{N} \bar v(i)\, X_i^{(i)} Q_i^{(i)}\bigg|^{2L}\Bigg] \prec \max_{\substack{i_1,\ldots,i_{2L} \in [N] \\ q_1 \in \mathcal{M}_{i_1,S},\ldots,q_{2L} \in \mathcal{M}_{i_{2L},S}}} \Big\{ N^{|K|/2}\, \mathbb{E}\,|\mathbb{E}_S q| \Big\} + O_\prec\big(N^{-\tau'(D+1)+L}\big). \tag{3.24}
\]
By Lemma 3.26, there exists a family $\mathcal{N}$ of $(S,2L)$-valid tensor networks for which
\[
|\mathbb{E}_S q| \prec \max_{(\mathcal{G},f_{\mathcal{G}}) \in \mathcal{N}} |\operatorname{val}(\mathcal{G},f_{\mathcal{G}})|.
\]
By following the same four steps and arguments as in the proof of Lemma 3.6(b), we arrive at the same bound
\[
|\operatorname{val}(\mathcal{G},f_{\mathcal{G}})| \prec
\underbrace{\left(\frac{N^{\delta}}{\sqrt{N}}\right)^{|U_2^{\mathcal{G}}|}\Phi^{|V_r^{\mathcal{G}}(1)|}}_{\text{Step 1}}
\times \underbrace{\left(\frac{N^{\delta}}{\sqrt{N}}\right)^{|U_1^{\mathcal{G}}|}\Phi^{|V_r^{\mathcal{G}}(2)|-|U_1^{\mathcal{G}}(2a)|}}_{\text{Step 2}}
\times \underbrace{\Phi^{|V_r^{\mathcal{G}}(3)|}}_{\text{Step 3}}
\times \underbrace{\Phi^{|V_{m2}^{\mathcal{G}}(4)|+|V_r^{\mathcal{G}}(4)|}}_{\text{Step 4}},
\]
where the only difference is that $|V_r^{\mathcal{G}}(k)|$ for each Step $k \in \{1,2,3,4\}$ counts right original leaves in addition to right new leaves. By the same arguments as in Lemma 3.6(b), we have
\[
\sum_{k=1}^{4}|V_r^{\mathcal{G}}(k)| + |V_{m2}^{\mathcal{G}}(4)| = |V_{m2}^{\mathcal{G}}| + |V_{ro}^{\mathcal{G}}| - |U_1^{\mathcal{G}}(2b)|, \qquad |U_1^{\mathcal{G}}(2a)| + |U_1^{\mathcal{G}}(2b)| = |U_1^{\mathcal{G}}|, \qquad |U_1^{\mathcal{G}}| + |U_2^{\mathcal{G}}| = |K|,
\]
so
\[
|\operatorname{val}(\mathcal{G},f_{\mathcal{G}})| \prec \left(\frac{N^{\delta}}{\sqrt{N}}\right)^{|K|}\Phi^{|V_{ro}^{\mathcal{G}}|+|V_{m2}^{\mathcal{G}}|-|U_1^{\mathcal{G}}|} \le N^{-|K|/2}(N^{\delta}\Phi)^{2L},
\]
the last inequality using $|V_{ro}^{\mathcal{G}}| + |V_{m2}^{\mathcal{G}}| \ge 2L + |U_1^{\mathcal{G}}|$ and $|K| \le 2L$. Applying this above and choosing $D > 0$ large enough shows that (3.24) is $O_\prec((N^{\delta}\Phi)^{2L})$, so the result follows again by Markov's inequality. □

4. Proofs of the main results

We now show Theorems 2.5, 2.8, and 2.10. Many of the arguments are close to those of [Sil95; BS98; PY14; KY17], which we reproduce here for the reader's convenience.

4.1. Resolvent bounds. For each $z = E + i\eta \in \mathbb{C}^+$, let us define the control parameters
\[
\Lambda = \max_{i,j=1}^{N} \big|\widetilde R_{ij} - \widetilde m_0 \mathbf{1}\{i=j\}\big|, \qquad \Theta = |\widetilde m - \widetilde m_0|, \qquad \Psi_\Theta = \sqrt{\frac{\operatorname{Im} \widetilde m_0 + \Theta}{N\eta}}.
\]
Fixing a constant $C_0 > 0$, we define the ($z$-dependent) event $\Xi = \{\Lambda \le N^{-\tau/C_0}\}$.

Lemma 4.1 (Global bounds). Suppose Assumptions 1 and 2 hold.
Then for all $i \in [N]$ and $z = E + i\eta \in \mathbb{C}^+$ satisfying $|z| \le C$ for a constant $C > 0$, we have
\[
|\widetilde m| \le \eta^{-1}, \qquad |\widetilde R_{ii}| \le \eta^{-1}, \qquad |z \widetilde R_{ii}| = |1 + N^{-1} g_i^* R^{(i)} g_i|^{-1} \le C\eta^{-1},
\]
\[
N^{-1/2}\|R^{(i)}\|_F \le \eta^{-1}, \qquad \|(I + \widetilde m \Sigma)^{-1}\|_{op} \le C\eta^{-1}, \qquad \|(I + \widetilde m^{(i)} \Sigma)^{-1}\|_{op} \le C\eta^{-1}.
\]
Furthermore, uniformly over $i \in [N]$ and $z = E + i\eta \in \mathbb{C}^+$ with $|z| \le C$,
\[
|\widetilde m|^{-1} \prec \eta^{-1}, \qquad |\widetilde m - \widetilde m^{(i)}| \prec N^{-1}\eta^{-3}.
\]

Proof. The first four bounds follow from $\|\widetilde R\|_{op}, \|\widetilde R^{(i)}\|_{op} \le \eta^{-1}$. For $\|(I + \widetilde m\Sigma)^{-1}\|_{op}$, letting $\lambda_i$ denote the $i$-th eigenvalue of $\widetilde K$, note that
\[
\operatorname{Im}[z\widetilde m(z)] = \frac{1}{N}\sum_{i=1}^{N} \frac{\eta \lambda_i}{|\lambda_i - z|^2} \ge 0.
\]
Then for any $\alpha \in [n]$, $|z + z\sigma_\alpha \widetilde m(z)| \ge |\eta + \sigma_\alpha \operatorname{Im}[z\widetilde m(z)]| \ge \eta$, so $\|(I + \widetilde m\Sigma)^{-1}\|_{op} \le |z|\eta^{-1} \le C\eta^{-1}$, and similarly $\|(I + \widetilde m^{(i)}\Sigma)^{-1}\|_{op} \le C\eta^{-1}$. For the bound on $|\widetilde m|^{-1}$,
\[
|\widetilde m| \ge \operatorname{Im} \widetilde m = \frac{1}{N}\sum_{i=1}^{N} \frac{\eta}{(\lambda_i - E)^2 + \eta^2} \ge \frac{\eta}{\max_i (\lambda_i - E)^2 + \eta^2}.
\]
Applying $\|\widetilde K\|_{op} = \|K\|_{op} \prec 1$ from Lemma 3.8(b) and $|z| \le C$ yields $\max_i (\lambda_i - E)^2 + \eta^2 \prec 1$, so $|\widetilde m|^{-1} \prec \eta^{-1}$. For $|\widetilde m - \widetilde m^{(i)}|$, apply (3.4) to obtain
\[
|\widetilde m - \widetilde m^{(i)}| = \gamma|m - m^{(i)}| = |z\widetilde R_{ii}\, N^{-2} g_i^* (R^{(i)})^2 g_i| \prec \eta^{-1}\big(N^{-1}\|R^{(i)}\|_F\big)^2 \le N^{-1}\eta^{-3},
\]
where we applied Assumptions 1 and 2 to show $g_i^*(R^{(i)})^2 g_i \prec |\operatorname{Tr}(R^{(i)})^2\Sigma| + \|(R^{(i)})^2\|_F \prec \|R^{(i)}\|_F^2$. □

Lemma 4.2 (Local bounds). Suppose Assumptions 1 and 2 hold, and $\mathbf{D} = \{z \in \mathbb{C}^+ : E \in U,\ \eta \in [N^{-1+\tau}, 1]\}$ is a regular spectral domain. Then for any fixed $L \ge 0$, uniformly over all $z \in \mathbf{D}$, $S \subseteq [N]$ with $|S| \le L$, and $i, j \notin S$ with $i \ne j$,
\[
|\widetilde R_{ii}^{(S)}| 1_\Xi \prec 1, \qquad |\widetilde R_{ii}^{(S)}|^{-1} 1_\Xi \prec 1, \qquad |z\widetilde R_{ii}^{(S)}| 1_\Xi = |1 + N^{-1} g_i^* R^{(iS)} g_i|^{-1} 1_\Xi \prec 1,
\]
\[
|\widetilde R_{ij}^{(S)}| 1_\Xi \prec \Psi_\Theta, \qquad |\widetilde m^{(S)}| 1_\Xi \prec 1, \qquad |\widetilde m^{(S)}|^{-1} 1_\Xi \prec 1, \qquad \|(I + \widetilde m^{(S)}\Sigma)^{-1}\|_{op} 1_\Xi \prec 1,
\]
\[
N^{-1}\|R^{(S)}\|_F 1_\Xi \prec \Psi_\Theta, \qquad N^{-1}\|R^{(S)}\|_1 1_\Xi \prec 1.
\]
Moreover, uniformly also over deterministic unit vectors $x, y \in \mathbb{C}^n$,
\[
|\widetilde m - \widetilde m^{(S)}| 1_\Xi \prec \Psi_\Theta^2, \qquad |\widetilde R_{ii}^{(S)} - \widetilde m^{(S)}| 1_\Xi \prec \Psi_\Theta,
\]
\[
|x^* R y - x^* R^{(S)} y| 1_\Xi \prec \frac{\|Rx\|_2}{\sqrt N} \cdot \frac{\|Ry\|_2}{\sqrt N}, \qquad \|R^{(S)} x\|_2 1_\Xi \prec \|Rx\|_2.
\]

Proof. The bounds
\[
|\widetilde R_{ii}^{(S)}| 1_\Xi \prec 1, \qquad |\widetilde R_{ii}^{(S)}|^{-1} 1_\Xi \prec 1, \qquad |z\widetilde R_{ii}^{(S)}| 1_\Xi = |1 + N^{-1}g_i^* R^{(iS)} g_i|^{-1} 1_\Xi \prec 1
\]
follow from $|z|, |\widetilde m_0(z)| \asymp 1$ in Definition 2.3 for regularity of $\mathbf{D}$, the definition of $\Xi$, and iterative application of the resolvent identity (3.2).

We next prove $|\widetilde m - \widetilde m^{(S)}| 1_\Xi \prec \Psi_\Theta^2$ by induction on $|S|$. For $|S| = 1$, say $S = \{i\}$, we have $|\widetilde m - \widetilde m^{(i)}| = \gamma|m - m^{(i)}| = |N^{-1}\operatorname{Tr} R - N^{-1}\operatorname{Tr} R^{(i)}|$. Applying the Sherman--Morrison identity (3.4) and Assumption 2 yields
\[
|\widetilde m - \widetilde m^{(i)}| 1_\Xi = |z\widetilde R_{ii}|\, |N^{-2} g_i^* (R^{(i)})^2 g_i|\, 1_\Xi \prec N^{-2}\|R^{(i)}\|_F^2\, 1_\Xi \prec \Psi_\Theta^2 + (N\eta)^{-1}|\widetilde m - \widetilde m^{(i)}| 1_\Xi,
\]
where the last bound uses Ward's identity (3.5),
\[
N^{-2}\|R^{(i)}\|_F^2 = \frac{\gamma \operatorname{Im} m^{(i)}}{N\eta} = \frac{\operatorname{Im}\widetilde m^{(i)}}{N\eta} + \frac{(1-\gamma)\eta|z|^{-2}}{N\eta} \prec \frac{\operatorname{Im}\widetilde m_0 + \Theta}{N\eta} + \frac{|\widetilde m - \widetilde m^{(i)}|}{N\eta} + \frac{1}{N} \prec \Psi_\Theta^2 + \frac{|\widetilde m - \widetilde m^{(i)}|}{N\eta}.
\]
Since $(N\eta)^{-1} \le N^{-\tau} \le 1/2$ for large $N$, rearranging proves the case $|S| = 1$. Now assume the claim holds for $|S| = s$. For $i \notin S$ with $|S| = s$, let $S = \{i_1,\ldots,i_s\}$ and $S_k = \{i_1,\ldots,i_k\}$. Iteratively applying (3.4) gives
\[
|\widetilde m - \widetilde m^{(iS)}| 1_\Xi = \bigg|z\widetilde R_{ii}^{(S)} N^{-2} g_i^* (R^{(iS)})^2 g_i + \sum_{k=1}^{s} z\widetilde R_{i_k i_k}^{(S_{k-1})} N^{-2} g_{i_k}^*(R^{(S_k)})^2 g_{i_k}\bigg| 1_\Xi
\prec \Psi_\Theta^2 + (N\eta)^{-1}|\widetilde m - \widetilde m^{(iS)}| 1_\Xi + (N\eta)^{-1}\sum_{k=1}^{s}|\widetilde m - \widetilde m^{(S_k)}| 1_\Xi
\prec \Psi_\Theta^2 + (N\eta)^{-1}|\widetilde m - \widetilde m^{(iS)}| 1_\Xi,
\]
where the last line applies the inductive hypothesis.
Rearranging completes the induction.

Since $|\widetilde m - \widetilde m_0| = \Theta \le \Lambda$, the bounds $|\widetilde m^{(S)}| 1_\Xi \prec 1$, $|\widetilde m^{(S)}|^{-1} 1_\Xi \prec 1$, and $\|(I + \widetilde m^{(S)}\Sigma)^{-1}\|_{op} 1_\Xi \prec 1$ then follow from the definition of $\Xi$ and the corresponding statements for $\widetilde m_0$ in Definition 2.3 for the regularity of $\mathbf{D}$. The bound $N^{-1}\|R^{(S)}\|_F 1_\Xi \prec \Psi_\Theta$ follows from Ward's identity as applied above, and $|\widetilde R_{ij}^{(S)}| 1_\Xi \prec \Psi_\Theta$ follows from the resolvent identity (3.3) and Assumption 2.

For $N^{-1}\|R^{(S)}\|_1$, corresponding to $z = E + i\eta \in \mathbf{D}$, define
\[
L = \max\{l \in \mathbb{N} : \eta 2^{l-1} < 1\}, \qquad \eta_l = 2^l \eta \ \text{ for } l = 0,\ldots,L-1, \qquad \eta_L = 1.
\]
Then $L \le C\log N$ and $\eta_l/\eta_{l-1} \le 2$. Let $\{\lambda_\alpha\}_{\alpha=1}^n$ be the eigenvalues of $K^{(S)}$. Define $\eta_{-1} = 0$, $\eta_{L+1} = \infty$, and
\[
U_l = \{\alpha : \eta_{l-1} \le |\lambda_\alpha - E| < \eta_l\}, \qquad l = 0,\ldots,L+1.
\]
Then
\[
N^{-1}\|R^{(S)}\|_1 = \frac{1}{N}\sum_{\alpha=1}^{n} \frac{1}{|\lambda_\alpha - z|} = \sum_{l=0}^{L+1} \frac{1}{N}\sum_{\alpha \in U_l} \frac{1}{|\lambda_\alpha - z|}.
\]
For $l = 1,\ldots,L$, since $\eta_{l-1} \le |\lambda_\alpha - E| < \eta_l$ for $\alpha \in U_l$ and $|\lambda_\alpha - z| \ge |\lambda_\alpha - E|$, we have
\[
\frac{1}{n}\sum_{\alpha \in U_l} \frac{1}{|\lambda_\alpha - z|} \le \frac{1}{n}\sum_{\alpha \in U_l} \frac{\eta_l}{(\lambda_\alpha - E)^2} \le \frac{2}{n}\sum_{\alpha \in U_l} \frac{\eta_l}{(\lambda_\alpha - E)^2 + \eta_{l-1}^2} \le \frac{2\eta_l}{\eta_{l-1}} \operatorname{Im} m(E + i\eta_{l-1}).
\]
Since the map $\eta \mapsto \eta \operatorname{Im} m(E + i\eta)$ is nondecreasing, $\operatorname{Im} m(E + i\eta_{l-1}) \le \frac{\eta_l}{\eta_{l-1}}\operatorname{Im} m(E + i\eta_l)$. Then, applying $\eta_l/\eta_{l-1} \le 2$,
\[
\frac{1}{n}\sum_{\alpha \in U_l}\frac{1}{|\lambda_\alpha - z|} \le 8 \operatorname{Im} m(E + i\eta_l).
\]
For $l = 0$, since $0 \le |\lambda_\alpha - E| < \eta_0 = \eta$ for $\alpha \in U_0$, we have similarly
\[
\frac{1}{n}\sum_{\alpha \in U_0}\frac{1}{|\lambda_\alpha - z|} \le \frac{2}{n}\sum_{\alpha \in U_0}\frac{\eta}{(\lambda_\alpha - E)^2 + \eta^2} \le 2\operatorname{Im} m(E + i\eta) \le 4\operatorname{Im} m(E + i\eta_1).
\]
For $l = L+1$, since $\eta_L = 1$, $|\lambda_\alpha - E| \ge 1$ for $\alpha \in U_{L+1}$, and $\max_\alpha \lambda_\alpha = \|K\|_{op} \prec 1$ by Lemma 3.8(b), we have
\[
\frac{1}{n}\sum_{\alpha \in U_{L+1}}\frac{1}{|\lambda_\alpha - z|} \le \frac{1}{n}\sum_{\alpha \in U_{L+1}}\frac{|\lambda_\alpha - E| + \eta}{(\lambda_\alpha - E)^2 + \eta^2} \prec \frac{1}{n}\sum_{\alpha \in U_{L+1}}\frac{1}{(\lambda_\alpha - E)^2 + 1} \le \operatorname{Im} m(E + i\eta_L).
\]
Thus
\[
N^{-1}\|R^{(S)}\|_1 \prec \sum_{l=1}^{L} \operatorname{Im} m^{(S)}(E + i\eta_l). \tag{4.1}
\]
Since $\gamma m^{(S)} = \widetilde m^{(S)} + (1-\gamma)z^{-1}$ and $\gamma \asymp 1$, this implies
\[
N^{-1}\|R^{(S)}\|_1 1_\Xi \prec \sum_{l=1}^{L}\left(\operatorname{Im}\widetilde m^{(S)}(E + i\eta_l) + \frac{\eta_l}{|E + i\eta_l|^2}\right) 1_\Xi \prec 1,
\]
where the last bound uses $\operatorname{Im}\widetilde m^{(S)}(E + i\eta_l) 1_\Xi \prec 1$, $\eta_l/|E + i\eta_l|^2 \prec 1$ since $|E| \ge \delta$ by Definition 2.3 for regularity of $\mathbf{D}$, and $L \le C\log N \prec 1$.

For $|\widetilde R_{ii}^{(S)} - \widetilde m^{(S)}|$, applying the resolvent identity (3.2), we have on $\Xi$ that
\[
\begin{aligned}
\frac{1}{\widetilde R_{ii}^{(S)}} &= -z - zN^{-1} g_i^* R^{(iS)} g_i \\
&= -z - zN^{-1}\operatorname{Tr}\Sigma R^{(iS)} + O_\prec\big(N^{-1}\|R^{(iS)}\|_F\big) \\
&= -z - zN^{-1}\operatorname{Tr}\Sigma R - z^2 \widetilde R_{ii}^{(S)} N^{-2} g_i^* R^{(iS)}\Sigma R^{(iS)} g_i + O_\prec\big(N^{-1}\|R^{(iS)}\|_F\big) \\
&= -z - zN^{-1}\operatorname{Tr}\Sigma R - z^2\widetilde R_{ii}^{(S)} N^{-2}\operatorname{Tr}\Sigma R^{(i)}\Sigma R^{(iS)} + O_\prec\big(N^{-2}\|R^{(iS)}\|_F^2\big) + O_\prec\big(N^{-1}\|R^{(iS)}\|_F\big) \\
&= -z - zN^{-1}\operatorname{Tr}\Sigma R + O_\prec\big(N^{-2}\|R^{(iS)}\|_F^2\big) + O_\prec\big(N^{-1}\|R^{(iS)}\|_F\big).
\end{aligned}
\]
Thus, for $i \ne j$,
\[
\bigg|\frac{1}{\widetilde R_{ii}^{(S)}} - \frac{1}{\widetilde R_{jj}^{(S)}}\bigg| 1_\Xi \prec \Psi_\Theta^2 + \Psi_\Theta \prec \Psi_\Theta,
\]
and so
\[
|\widetilde R_{ii}^{(S)} - \widetilde m^{(S)}| 1_\Xi \le \frac{1}{N}\sum_{j=1}^{N}|\widetilde R_{ii}^{(S)} - \widetilde R_{jj}^{(S)}| 1_\Xi \le \frac{1}{N}\sum_{j=1}^{N}|\widetilde R_{ii}^{(S)}\widetilde R_{jj}^{(S)}|\, \bigg|\frac{1}{\widetilde R_{ii}^{(S)}} - \frac{1}{\widetilde R_{jj}^{(S)}}\bigg| 1_\Xi \prec \Psi_\Theta.
\]
Finally, for $|x^* R y - x^* R^{(S)} y|$, the bound $|g_i^* R^{(S)} x| \prec \|R^{(S)} x\|_2$ in Lemma 3.8(a) and iterative application of the Sherman--Morrison identity (3.4) yield $|x^* R y - x^* R^{(S)} y| 1_\Xi \prec \|Rx\|_2 \|Ry\|_2 / N$. The last claim for $\|R^{(S)} x\|_2$ then follows from
\[
\|R^{(S)} x\|_2^2 1_\Xi = \frac{\operatorname{Im} x^* R^{(S)} x}{\eta} 1_\Xi \prec \frac{\operatorname{Im} x^* R x}{\eta} + \frac{|x^* R x - x^* R^{(S)} x|}{\eta} 1_\Xi \prec \|Rx\|_2^2 + \frac{\|Rx\|_2^2}{N\eta} \prec \|Rx\|_2^2. \qquad\square
\]

4.2. Proof of entrywise law (Theorem 2.5). We now begin analysis of the Marchenko--Pastur fixed point equation.

Lemma 4.3.
For any $A \in \mathbb{C}^{n\times n}$ and $z \in \mathbb{C}^+$, we have
\[
\operatorname{Tr} RA - \operatorname{Tr}(-zI - z\widetilde m\Sigma)^{-1}A = \frac{1}{z}\sum_{i=1}^{N}\frac{d_i(A)}{1 + N^{-1}g_i^* R^{(i)} g_i}, \tag{4.2}
\]
where
\[
d_i(A) = N^{-1} g_i^* R^{(i)} A (I + \widetilde m\Sigma)^{-1} g_i - N^{-1}\operatorname{Tr} RA(I + \widetilde m\Sigma)^{-1}\Sigma.
\]
In particular, recalling the function $z_0(m)$ from (2.7) and setting $A = I$ in (4.2),
\[
z_0(\widetilde m) - z = -\frac{1}{\widetilde m}\cdot\frac{1}{N}\sum_{i=1}^{N}\frac{d_i(I)}{1 + N^{-1}g_i^* R^{(i)} g_i}. \tag{4.3}
\]

Proof. The argument is the same as [Sil95, Eq. (2.4)]: For any $B \in \mathbb{C}^{n\times n}$,
\[
\operatorname{Tr} B = \operatorname{Tr}(K - zI)RB = -z\operatorname{Tr} RB + \frac{1}{N}\sum_{i=1}^{N}g_i^* RB g_i = -z\operatorname{Tr} RB + \sum_{i=1}^{N}\frac{N^{-1}g_i^* R^{(i)} B g_i}{1 + N^{-1}g_i^* R^{(i)} g_i},
\]
where the last equality applies the Sherman--Morrison identity (3.4). Taking $B = I_n$, applying $\operatorname{Tr} R(z) = nm(z) = N\widetilde m(z) - (n - N)/z$, and rearranging gives
\[
\widetilde m = -\frac{1}{Nz}\sum_{i=1}^{N}\frac{1}{1 + N^{-1}g_i^* R^{(i)} g_i}.
\]
Then taking $B = A(I + \widetilde m\Sigma)^{-1}$ and identifying $d_i(A)$ gives
\[
\begin{aligned}
\operatorname{Tr}(I + \widetilde m\Sigma)^{-1}A &= -z\operatorname{Tr} RA(I + \widetilde m\Sigma)^{-1} + \sum_{i=1}^{N}\frac{d_i(A) + N^{-1}\operatorname{Tr} RA(I + \widetilde m\Sigma)^{-1}\Sigma}{1 + N^{-1}g_i^* R^{(i)} g_i} \\
&= -z\operatorname{Tr} RA(I + \widetilde m\Sigma)^{-1} - z\widetilde m\operatorname{Tr} RA(I + \widetilde m\Sigma)^{-1}\Sigma + \sum_{i=1}^{N}\frac{d_i(A)}{1 + N^{-1}g_i^* R^{(i)} g_i} \\
&= -z\operatorname{Tr} RA + \sum_{i=1}^{N}\frac{d_i(A)}{1 + N^{-1}g_i^* R^{(i)} g_i}.
\end{aligned}
\]
Rearranging shows (4.2), and specializing to $A = I$ shows (4.3). □

The following lemma will allow us to simplify the analysis by replacing factors of $(I + \widetilde m\Sigma)^{-1}$ in $d_i(A)$ with their deterministic counterpart $(I + \widetilde m_0\Sigma)^{-1}$.

Lemma 4.4. Suppose Assumptions 1 and 2 hold, and $\mathbf{D}$ is a regular spectral domain. Then there exists a constant $l \ge 1$ such that, uniformly over all $z \in \mathbf{D}$, deterministic matrices $A \in \mathbb{C}^{n\times n}$, and $i \in [N]$,
\[
d_i(A) 1_\Xi = \sum_{k=0}^{l}(\widetilde m - \widetilde m_0)^k\Big[N^{-1}\operatorname{Tr}(g_i g_i^* - \Sigma)R^{(i)} A\Sigma^k(I + \widetilde m_0\Sigma)^{-k-1}\Big] 1_\Xi + O_\prec\big(\Psi_\Theta^2\|A\|_{op}\big).
\]
Moreover, if $A = uv^*$ for vectors $u, v \in \mathbb{C}^n$, then
\[
\begin{aligned}
d_i(uv^*) 1_\Xi = \sum_{k=0}^{l}(\widetilde m - \widetilde m_0)^k\Big[N^{-1}\operatorname{Tr}(g_i g_i^* - \Sigma)R^{(i)} uv^*\Sigma^k(I + \widetilde m_0\Sigma)^{-k-1}\Big] 1_\Xi
&+ O_\prec\big(N^{-2}\|R^{(i)}u\|_2 \|v^*\Sigma^{k+1}(I + \widetilde m_0\Sigma)^{-k-1}R^{(i)}\|_2\big) \\
&+ O_\prec\big(N^{-1}\Psi_\Theta\|u\|_2\|v\|_2\big).
\end{aligned}
\]

Proof. Iteratively applying the identity $A^{-1} - B^{-1} = A^{-1}(B - A)B^{-1}$ gives, for any integer $l \ge 1$,
\[
(I + \widetilde m\Sigma)^{-1} = \sum_{k=0}^{l}(\widetilde m - \widetilde m_0)^k\Sigma^k(I + \widetilde m_0\Sigma)^{-k-1} + (\widetilde m - \widetilde m_0)^{l+1}\Sigma^{l+1}(I + \widetilde m_0\Sigma)^{-l-1}(I + \widetilde m\Sigma)^{-1},
\]
using the commutativity of $\Sigma$, $(I + \widetilde m_0\Sigma)^{-1}$, and $(I + \widetilde m\Sigma)^{-1}$. Thus, for any $i \in [N]$,
\[
\begin{aligned}
&\frac{1}{N}g_i^* R^{(i)} A(I + \widetilde m\Sigma)^{-1}g_i - \frac{1}{N}\operatorname{Tr} RA(I + \widetilde m\Sigma)^{-1}\Sigma \\
&\quad= \sum_{k=0}^{l}(\widetilde m - \widetilde m_0)^k\left(\frac{1}{N}g_i^* R^{(i)} A\Sigma^k(I + \widetilde m_0\Sigma)^{-k-1}g_i - \frac{1}{N}\operatorname{Tr} RA\Sigma^k(I + \widetilde m_0\Sigma)^{-k-1}\Sigma\right) \\
&\qquad+ (\widetilde m - \widetilde m_0)^{l+1}\frac{1}{N}g_i^* R^{(i)} A\Sigma^{l+1}(I + \widetilde m_0\Sigma)^{-l-1}(I + \widetilde m\Sigma)^{-1}g_i \\
&\qquad- (\widetilde m - \widetilde m_0)^{l+1}\frac{1}{N}\operatorname{Tr} RA\Sigma^{l+1}(I + \widetilde m_0\Sigma)^{-l-1}(I + \widetilde m\Sigma)^{-1}\Sigma.
\end{aligned}
\]
Since $\Theta \le \Lambda \le N^{-\tau/C_0}$ on $\Xi$, the remainder terms satisfy
\[
\begin{aligned}
&\left|(\widetilde m - \widetilde m_0)^{l+1}\frac{1}{N}g_i^* R^{(i)} A\Sigma^{l+1}(I + \widetilde m_0\Sigma)^{-l-1}(I + \widetilde m\Sigma)^{-1}g_i\right| 1_\Xi \\
&\quad\le |\widetilde m - \widetilde m_0|^{l+1}\frac{1}{N}\|g_i\|_2^2\|R^{(i)}\|_{op}\|A\|_{op}\|\Sigma\|_{op}^{l+1}\|(I + \widetilde m_0\Sigma)^{-1}\|_{op}^{l+1}\|(I + \widetilde m\Sigma)^{-1}\|_{op} 1_\Xi \\
&\quad\prec N^{-(l+1)\tau/C_0}\eta^{-1}\|A\|_{op} \le N^{-(l+1)\tau/C_0 + 1}\|A\|_{op},
\end{aligned}
\]
using $\eta \ge N^{-1+\tau} > N^{-1}$, and similarly
\[
\left|(\widetilde m - \widetilde m_0)^{l+1}\frac{1}{N}\operatorname{Tr} RA\Sigma^{l+1}(I + \widetilde m_0\Sigma)^{-l-1}(I + \widetilde m\Sigma)^{-1}\Sigma\right| 1_\Xi \prec N^{-(l+1)\tau/C_0 + 1}\|A\|_{op}.
\]
For each $0 \le k \le l$,
\[
\frac{1}{N}g_i^* R^{(i)} A\Sigma^k(I + \widetilde m_0\Sigma)^{-k-1}g_i\, 1_\Xi - \frac{1}{N}\operatorname{Tr} RA\Sigma^k(I + \widetilde m_0\Sigma)^{-k-1}\Sigma\, 1_\Xi = \frac{1}{N}\operatorname{Tr}(g_i g_i^* - \Sigma)R^{(i)} A\Sigma^k(I + \widetilde m_0\Sigma)^{-k-1} 1_\Xi + \frac{N^{-2}g_i^* R^{(i)} A\Sigma^{k+1}(I + \widetilde m_0\Sigma)^{-k-1}R^{(i)} g_i}{1 + N^{-1}g_i^* R^{(i)} g_i}\, 1_\Xi,
\]
and applying Assumption 2, Cauchy--Schwarz, and Lemma 4.2 yields
\[
\begin{aligned}
\frac{N^{-2}|g_i^* R^{(i)} A\Sigma^{k+1}(I + \widetilde m_0\Sigma)^{-k-1}R^{(i)} g_i|}{|1 + N^{-1}g_i^* R^{(i)} g_i|}\, 1_\Xi &\prec \big|N^{-2}\operatorname{Tr}\Sigma R^{(i)} A\Sigma^{k+1}(I + \widetilde m_0\Sigma)^{-k-1}R^{(i)}\big|\, 1_\Xi + N^{-2}\|R^{(i)} A\Sigma^{k+1}(I + \widetilde m_0\Sigma)^{-k-1}R^{(i)}\|_F\, 1_\Xi \\
&\prec \big(N^{-1}\|R^{(i)}\|_F\big)^2\|A\|_{op}\, 1_\Xi \prec \Psi_\Theta^2\|A\|_{op}.
\end{aligned}
\]
If $A = uv^*$, then we may apply instead
\[
\frac{N^{-2}|g_i^* R^{(i)} uv^*\Sigma^{k+1}(I + \widetilde m_0\Sigma)^{-k-1}R^{(i)} g_i|}{|1 + N^{-1}g_i^* R^{(i)} g_i|}\, 1_\Xi \prec N^{-2}\|R^{(i)} u\|_2\|v^*\Sigma^{k+1}(I + \widetilde m_0\Sigma)^{-k-1}R^{(i)}\|_2\, 1_\Xi.
\]
The result follows by choosing $l$ large enough so that $N^{-(l+1)\tau/C_0 + 1} \le N^{-1}\Psi_\Theta \le \Psi_\Theta^2$, noting that $l$ is a constant depending only on $\tau$ and $C_0$. □

Lemma 4.5. Suppose Assumptions 1 and 2 hold, and $\mathbf{D}$ is a regular spectral domain. Then there exists a constant $l \ge 1$ such that, uniformly over $z \in \mathbf{D}$,
\[
[z_0(\widetilde m) - z] 1_\Xi = O_\prec(\Psi_\Theta^2) + O_\prec\left(\max_{0\le k\le l}\bigg|\frac{1}{N}\sum_{i=1}^{N}\frac{N^{-1}\operatorname{Tr}(g_i g_i^* - \Sigma)R^{(i)}\Sigma^k(I + \widetilde m_0\Sigma)^{-k-1}}{1 + N^{-1}g_i^* R^{(i)} g_i}\bigg| 1_\Xi\right) = O_\prec(\Psi_\Theta).
\]

Proof. By Lemma 4.3,
\[
[z_0(\widetilde m) - z] 1_\Xi = -\frac{1}{\widetilde m}\cdot\frac{1}{N}\sum_{i=1}^{N}\frac{d_i(I)}{1 + N^{-1}g_i^* R^{(i)} g_i}\, 1_\Xi.
\]
Applying Lemma 4.4 with $A = I$ gives, for a sufficiently large integer $l \ge 1$,
\[
d_i(I) 1_\Xi = \sum_{k=0}^{l}(\widetilde m - \widetilde m_0)^k\Big[N^{-1}\operatorname{Tr}(g_i g_i^* - \Sigma)R^{(i)}\Sigma^k(I + \widetilde m_0\Sigma)^{-k-1}\Big] 1_\Xi + O_\prec(\Psi_\Theta^2).
\]
Since $\Theta = |\widetilde m - \widetilde m_0| \le \Lambda \le N^{-\tau/C_0} \le 1$ on $\Xi$ and $l$ is fixed, substituting the expansion of $d_i(I)$ into the expression for $z_0(\widetilde m) - z$ and applying the bounds from Lemma 4.2 yields the first equality. Moreover, by Assumption 2, uniformly over all $0 \le k \le l$,
\[
\big|N^{-1}\operatorname{Tr}(g_i g_i^* - \Sigma)R^{(i)}\Sigma^k(I + \widetilde m_0\Sigma)^{-k-1}\big| 1_\Xi \prec N^{-1}\|R^{(i)}\|_F\, 1_\Xi \prec \Psi_\Theta.
\]
Then the second equality follows since $\Psi_\Theta^2 \prec \Psi_\Theta$. □

Lemma 4.6. Suppose Assumptions 1 and 2 hold, and $\mathbf{D}$ is a regular spectral domain. Then, uniformly over $z = E + i\eta \in \mathbf{D}$ with $\eta = 1$, we have $\Lambda(z) \prec N^{-1/4}$.

Proof. Let $d_i(I)$ be as in Lemma 4.3. As in [BS98], we decompose $d_i(I) = d_{i,1} + d_{i,2} + d_{i,3} + d_{i,4}$, where
\[
\begin{aligned}
d_{i,1} &= N^{-1}g_i^* R^{(i)}(I + \widetilde m\Sigma)^{-1}g_i - N^{-1}g_i^* R^{(i)}(I + \widetilde m^{(i)}\Sigma)^{-1}g_i, \\
d_{i,2} &= N^{-1}g_i^* R^{(i)}(I + \widetilde m^{(i)}\Sigma)^{-1}g_i - N^{-1}\operatorname{Tr}\Sigma R^{(i)}(I + \widetilde m^{(i)}\Sigma)^{-1}, \\
d_{i,3} &= N^{-1}\operatorname{Tr}\Sigma R^{(i)}(I + \widetilde m^{(i)}\Sigma)^{-1} - N^{-1}\operatorname{Tr}\Sigma R(I + \widetilde m^{(i)}\Sigma)^{-1}, \\
d_{i,4} &= N^{-1}\operatorname{Tr}\Sigma R(I + \widetilde m^{(i)}\Sigma)^{-1} - N^{-1}\operatorname{Tr}\Sigma R(I + \widetilde m\Sigma)^{-1}.
\end{aligned}
\]
Applying the bounds from Lemma 4.1 and Assumption 2 yields
\[
\begin{aligned}
|d_{i,1}| &= |\widetilde m - \widetilde m^{(i)}|\, |N^{-1}g_i^* R^{(i)}(I + \widetilde m\Sigma)^{-1}\Sigma(I + \widetilde m^{(i)}\Sigma)^{-1}g_i| \\
&\prec |\widetilde m - \widetilde m^{(i)}|\big(\|R^{(i)}\|_{op} + N^{-1}\|R^{(i)}\|_F\big)\|(I + \widetilde m\Sigma)^{-1}\|_{op}\|(I + \widetilde m^{(i)}\Sigma)^{-1}\|_{op} \prec N^{-1}\eta^{-6}, \\
|d_{i,2}| &\prec N^{-1}\|R^{(i)}(I + \widetilde m^{(i)}\Sigma)^{-1}\|_F \le N^{-1}\|R^{(i)}\|_F\|(I + \widetilde m^{(i)}\Sigma)^{-1}\|_{op} \prec N^{-1/2}\eta^{-2}, \\
|d_{i,3}| &= \frac{|N^{-2}g_i^* R^{(i)}(I + \widetilde m^{(i)}\Sigma)^{-1}\Sigma R^{(i)} g_i|}{|1 + N^{-1}g_i^* R^{(i)} g_i|} \prec N^{-2}\|R^{(i)}\|_F^2\,\eta^{-2} \prec N^{-1}\eta^{-4}, \\
|d_{i,4}| &= |\widetilde m - \widetilde m^{(i)}|\, |N^{-1}\operatorname{Tr}\Sigma R(I + \widetilde m^{(i)}\Sigma)^{-1}\Sigma(I + \widetilde m\Sigma)^{-1}| \\
&\le |\widetilde m - \widetilde m^{(i)}|\, \|\Sigma\|_{op}^2\|(I + \widetilde m^{(i)}\Sigma)^{-1}\|_{op}\|(I + \widetilde m\Sigma)^{-1}\|_{op}\|R\|_{op} \prec N^{-1}\eta^{-6}.
\end{aligned}
\]
Since $\eta = 1$, we have $|d_i(I)| \prec N^{-1/2}$.
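As an aside, the exact Silverstein-type identity $\widetilde m = -\frac{1}{Nz}\sum_{i=1}^N (1 + N^{-1}g_i^* R^{(i)} g_i)^{-1}$ from the proof of Lemma 4.3, which drives the next step, is easy to verify numerically. The following minimal sketch (illustrative only; the dimensions, spectral parameter, and Gaussian columns are arbitrary choices, not taken from the paper) checks it with leave-one-out resolvents computed by brute force.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 40, 60
G = rng.standard_normal((n, N))       # columns g_1, ..., g_N
z = 1.2 + 0.3j                        # spectral parameter in the upper half-plane

# m_tilde = N^{-1} Tr (K_tilde - z)^{-1} for the companion matrix K_tilde = N^{-1} G^* G
Kt = G.T @ G / N
m_tilde = np.trace(np.linalg.inv(Kt - z * np.eye(N))) / N

# Right side: sum over leave-one-out resolvents R^{(i)} = (N^{-1} sum_{j != i} g_j g_j^* - z I_n)^{-1}
acc = 0.0 + 0.0j
for i in range(N):
    Gi = np.delete(G, i, axis=1)                         # remove column i
    Ri = np.linalg.inv(Gi @ Gi.T / N - z * np.eye(n))    # leave-one-out resolvent
    gi = G[:, i]
    acc += 1.0 / (1.0 + gi @ Ri @ gi / N)
rhs = -acc / (N * z)

print(abs(m_tilde - rhs))  # agreement up to machine precision
```

The identity is algebraically exact (it is a Schur complement computation entry by entry), so the two sides agree up to floating-point error for any realization.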
Thus, by Lemma 4.3,
\[
z_0(\widetilde m) - z = -\frac{1}{\widetilde m}\cdot\frac{1}{N}\sum_{i=1}^{N}\frac{O_\prec(N^{-1/2})}{1 + N^{-1}g_i^* R^{(i)} g_i} = O_\prec(N^{-1/2}).
\]
For any small $\varepsilon > 0$ and large $D > 0$, there exists $N_0(\varepsilon, D)$ such that for all $N \ge N_0$,
\[
\mathbb{P}\big(|z_0(\widetilde m) - z| \le N^\varepsilon N^{-1/2}\big) \ge 1 - N^{-D}.
\]
Since $N^\varepsilon N^{-1/2}$ satisfies the requirements for $\Delta$ in Definition 2.3, on this event,
\[
|\widetilde m - \widetilde m_0| \le C\frac{N^\varepsilon N^{-1/2}}{\sqrt{\kappa + \eta}} + N^{\varepsilon/2}N^{-1/4} \le CN^{\varepsilon/2}N^{-1/4}.
\]
Since $\varepsilon$ is arbitrary, $\Theta \prec N^{-1/4}$. By Lemma 4.2 with $\eta = 1$, uniformly over distinct $i, j \in [N]$,
\[
|\widetilde R_{ii} - \widetilde m_0| \le |\widetilde R_{ii} - \widetilde m| + \Theta \prec \Psi_\Theta + \Theta \prec N^{-1/4}, \qquad |\widetilde R_{ij}| \prec \Psi_\Theta \prec N^{-1/2}. \qquad\square
\]

The following lemma now establishes a weak local law over all $z \in \mathbf{D}$ using a stochastic continuity argument (see e.g. [PY14, Lemma 6.12]).

Lemma 4.7. Suppose Assumptions 1 and 2 hold, and $\mathbf{D}$ is a regular spectral domain. Then uniformly over $z = E + i\eta \in \mathbf{D}$, we have $\Lambda(z) \prec (N\eta)^{-1/4} \le N^{-\tau/4}$.

Proof. For any $z = E + i\eta \in \mathbf{D}$, let $\mathcal{L}(z)$ be as in Definition 2.3. Then $M = |\mathcal{L}(z)| \le N^5$. Define $z_0 = E + i$, $z_M = z$, and $z_k := E + i\eta_k = E + i(1 - kN^{-5})$ for $1 \le k \le M - 1$. For a small enough $\varepsilon > 0$, introduce the events
\[
\Omega_k := \Omega(z_k) := \bigg\{|\widetilde m(z_k) - \widetilde m_0(z_k)| \le C\frac{N^{2\varepsilon}(N\eta_k)^{-1/2}}{\sqrt{\kappa + \eta_k}} + N^\varepsilon(N\eta_k)^{-1/4}\bigg\}, \qquad \mathcal{E}_k := \mathcal{E}(z_k) := \big\{\Lambda(z_k) \le N^{2\varepsilon}(N\eta_k)^{-1/4}\big\},
\]
where $\eta_k = \operatorname{Im} z_k$. Choose a large enough $D > 0$. We proceed with a stochastic continuity argument. For $k = 0$, by Lemma 4.6 and Definition 2.3, we have $\mathbb{P}((\Omega_0 \cap \mathcal{E}_0)^c) \le N^{-D}$.
For any $k \ge 1$, since $\Lambda$ is $2N^2$-Lipschitz continuous over $z \in \mathbf{D}$ and $|\eta_k - \eta_{k-1}| = N^{-5}$, on the event $\Omega_{k-1} \cap \mathcal{E}_{k-1}$ we have
\[
\Lambda(z_k) \le N^{2\varepsilon}(N\eta_{k-1})^{-1/4} + 2N^{2-5} \le N^{2\varepsilon - \tau/4} + 2N^{-3},
\]
implying that for $C_0 > 5$, small enough $\varepsilon > 0$, and large enough $N$, the event $\Xi(z_k) = \{\Lambda(z_k) \le N^{-\tau/C_0}\}$ holds. Thus, by Lemma 4.5,
\[
\mathbb{P}\Big(\Omega_{k-1} \cap \mathcal{E}_{k-1} \cap \big\{|z_0(\widetilde m(z_k)) - z_k| \ge N^{2\varepsilon}(N\eta_k)^{-1/2}\big\}\Big) \le N^{-D},
\]
where we used $\Psi_\Theta(z_k) 1_{\Xi(z_k)} \prec (N\eta_k)^{-1/2}$. Since the function $\Delta(w) = N^{2\varepsilon}(N\operatorname{Im} w)^{-1/2}$ satisfies the requirements of Definition 2.3, it follows that
\[
\mathbb{P}\bigg(\Omega_{k-1} \cap \mathcal{E}_{k-1} \cap \bigg\{|\widetilde m(z_k) - \widetilde m_0(z_k)| \ge C\frac{N^{2\varepsilon}(N\eta_k)^{-1/2}}{\sqrt{\kappa + \eta_k}} + N^\varepsilon(N\eta_k)^{-1/4}\bigg\}\bigg) \le N^{-D},
\]
which implies $\mathbb{P}(\Omega_{k-1} \cap \mathcal{E}_{k-1} \cap \Omega_k^c) \le N^{-D}$. On the other hand, by Lemma 4.2, on the event $\Omega_{k-1} \cap \mathcal{E}_{k-1} \cap \Omega_k$ we have
\[
\Lambda(z_k) 1_{\Omega_{k-1}\cap\mathcal{E}_{k-1}\cap\Omega_k} \prec \big(\Psi_\Theta(z_k) + \Theta(z_k)\big) 1_{\Omega_{k-1}\cap\mathcal{E}_{k-1}\cap\Omega_k} \prec N^\varepsilon(N\eta_k)^{-1/4},
\]
which gives $\mathbb{P}(\Omega_{k-1}\cap\mathcal{E}_{k-1}\cap\Omega_k\cap\mathcal{E}_k^c) \le N^{-D}$ for sufficiently large $N$. Therefore,
\[
\mathbb{P}\big(\Omega_{k-1}\cap\mathcal{E}_{k-1}\cap(\Omega_k\cap\mathcal{E}_k)^c\big) \le \mathbb{P}(\Omega_{k-1}\cap\mathcal{E}_{k-1}\cap\Omega_k^c) + \mathbb{P}(\Omega_{k-1}\cap\mathcal{E}_{k-1}\cap\Omega_k\cap\mathcal{E}_k^c) \le 2N^{-D}.
\]
In general, for any $0 \le k \le M$,
\[
\mathbb{P}\big((\Omega_k\cap\mathcal{E}_k)^c\big) \le \mathbb{P}\big((\Omega_0\cap\mathcal{E}_0)^c\big) + \sum_{j=1}^{k}\mathbb{P}\big(\Omega_{j-1}\cap\mathcal{E}_{j-1}\cap(\Omega_j\cap\mathcal{E}_j)^c\big) \le N^{-D} + 2kN^{-D} \le CN^5 N^{-D} = CN^{5-D}.
\]
Since $D$ is arbitrary, this proves the claim. □

Remark 4.8. Let us take $C_0 > 5$ in the definition $\Xi = \{\Lambda \le N^{-\tau/C_0}\}$. Then Lemma 4.7 implies that $\Xi$ holds with overwhelming probability uniformly over $z \in \mathbf{D}$, so all preceding estimates hold with $1_\Xi$ replaced by 1.

Proof of Theorem 2.5. Applying Lemma 4.5 gives
\[
z_0(\widetilde m) - z = O_\prec(\Psi_\Theta^2) + O_\prec\left(\max_{0\le k\le l}\bigg|\frac{1}{N}\sum_{i=1}^{N}\frac{N^{-1}\operatorname{Tr}(g_i g_i^* - \Sigma)R^{(i)}\Sigma^k(I + \widetilde m_0\Sigma)^{-k-1}}{1 + N^{-1}g_i^* R^{(i)} g_i}\bigg|\right). \tag{4.4}
\]
Now suppose for some constant $c \in (0,1]$ we have $\Theta \prec (N\eta)^{-c}$. Then
\[
\Psi_\Theta \prec \sqrt{\frac{\operatorname{Im}\widetilde m_0 + (N\eta)^{-c}}{N\eta}} =: \Phi.
\]
Note that Definition 2.3 implies $\eta \lesssim \operatorname{Im}\widetilde m_0 \lesssim 1$, so $N^{-1/2} \lesssim \Phi \lesssim N^{-\tau/2}$. By Lemma 4.2, uniformly over all $S \subseteq [N]$ with $|S| \le L$ a fixed constant, we have $N^{-1}\|R^{(S)}\|_F \prec \Phi$, and also
\[
\Gamma^{(S)} = \max_i |\widetilde R_{ii}^{(S)} - \widetilde m^{(S)}| \prec \Psi_\Theta + \Theta \prec N^{-c\tau/2}.
\]
Thus, by the fluctuation averaging result of Lemma 3.4, for all $0 \le k \le l$,
\[
\bigg|\frac{1}{N}\sum_{i=1}^{N}\frac{N^{-1}\operatorname{Tr}(g_i g_i^* - \Sigma)R^{(i)}\Sigma^k(I + \widetilde m_0\Sigma)^{-k-1}}{1 + N^{-1}g_i^* R^{(i)} g_i}\bigg| \prec \|\Sigma\|_{op}^k\|(I + \widetilde m_0\Sigma)^{-1}\|_{op}^{k+1}\Phi^2 \prec \Phi^2.
\]
This implies $|z_0(\widetilde m) - z| \prec \Phi^2$. Then by the stability condition of Definition 2.3 and the bound $\operatorname{Im}\widetilde m_0(z) \le C\sqrt{\kappa + \eta}$,
\[
\Theta = |\widetilde m - \widetilde m_0| \prec \frac{\operatorname{Im}\widetilde m_0 + (N\eta)^{-c}}{N\eta\sqrt{\kappa + \eta}} + \sqrt{\frac{\operatorname{Im}\widetilde m_0 + (N\eta)^{-c}}{N\eta}} \prec \frac{1}{N\eta} + (N\eta)^{-c-1} + (N\eta)^{-c/2 - 1/2} \prec (N\eta)^{-c/2 - 1/2}.
\]
Thus, we have shown the implication
\[
\Theta \prec (N\eta)^{-c} \;\Longrightarrow\; \Theta \prec (N\eta)^{-c/2 - 1/2}.
\]
For any $\varepsilon > 0$, starting with $c = 1/4$ and iterating this a constant $C_\varepsilon$ number of times yields $\Theta \prec (N\eta)^{-1+\varepsilon}$. Since $\varepsilon > 0$ is arbitrary, $\Theta \prec (N\eta)^{-1}$. Substituting into $\Psi_\Theta$ gives
\[
\Psi_\Theta \prec \sqrt{\frac{\operatorname{Im}\widetilde m_0}{N\eta}} + \frac{1}{N\eta} =: \Psi.
\]
Then by Lemma 4.2, $|\widetilde R_{ii} - \widetilde m_0| \prec \Theta + \Psi_\Theta \prec \Psi$ and $|\widetilde R_{ij}| \prec \Psi$, which shows (2.13). For (2.12), applying the above bounds with $c = 1$ gives $|z_0(\widetilde m) - z| \prec \Psi^2$, so applying Definition 2.3 once more gives
\[
|\widetilde m - \widetilde m_0| \prec \frac{\Psi^2}{\sqrt{\kappa + \eta}} + \Psi.
\]
Since $\widetilde m - \widetilde m_0 = \gamma(m - m_0)$ and $\gamma \asymp 1$, the same bound holds for $m - m_0$, which shows (2.12). Since all quantities in (2.13) and (2.12) are $N^2$-Lipschitz over $z \in \mathbf{D}$, taking a union bound over a covering net shows that with probability $1 - CN^{-D}$, these statements hold simultaneously for every $z \in \mathbf{D}$. □

Corollary 2.7 now follows from known arguments, which we summarize briefly here.
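Before turning to Corollary 2.7, note that the self-improving bound in the proof of Theorem 2.5 above iterates the exponent map $c \mapsto c/2 + 1/2$ in $\Theta \prec (N\eta)^{-c}$. A two-line illustrative sketch (not part of the argument) confirms that, from the starting value $c = 1/4$, the exponent increases monotonically toward the fixed point $c = 1$, so any target exponent $1 - \varepsilon$ is reached after a constant number of steps.

```python
# Exponent iteration from the proof of Theorem 2.5:
# Theta ≺ (N eta)^{-c}  implies  Theta ≺ (N eta)^{-(c/2 + 1/2)}.
c = 0.25
steps_to_090 = None
for step in range(1, 200):
    c = c / 2 + 0.5                      # one application of the implication
    if steps_to_090 is None and c >= 0.90:
        steps_to_090 = step
print(steps_to_090, c)  # -> 3 1.0  (three steps already exceed exponent 0.9; c -> 1)
```

Since $1 - c_k = (1 - c_0)/2^k$, the gap to the fixed point halves at each step, which is why a constant $C_\varepsilon$ number of iterations suffices for exponent $1 - \varepsilon$.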
The following lemma is a version of [KY17, Lemmas 10.3, 10.4] (see also the asymptotic exact separation result of [BS99]), clarifying that no bulk/edge regularity is needed for this statement, and that it holds also for values $a \in (0,\infty)$ below the smallest left edge.

Lemma 4.9. Suppose Assumptions 1 and 2 hold. Let $\lambda_1,\ldots,\lambda_N$ be the eigenvalues of $\widetilde K$, and for each $a \in (0,\infty)$, let
\[
\widehat N(a) = \sum_{i=1}^{N}\mathbf{1}\{\lambda_i \ge a\}, \qquad N(a) = N\int_a^\infty \rho_0(x)\,dx. \tag{4.5}
\]
Then for any constants $\delta, D > 0$, there exists a constant $C \equiv C(\delta, D) > 0$ such that if $a \in (0,\infty)$ satisfies $(a - \delta, a + \delta) \cap \operatorname{supp}(\widetilde\mu_0) = \emptyset$, then with probability at least $1 - CN^{-D}$, $\widehat N(a) = N(a)$.

Proof. The averaged local law (2.12) applied for $z = E + i\eta \in \mathbf{D}^o(\delta,\tau)$ with any fixed constant $\tau \in (0, 1/2)$ implies $|\widetilde m(z) - \widetilde m_0(z)| \ll (N\eta)^{-1}$. Then, since $\operatorname{Im}\widetilde m_0 \lesssim \eta \ll (N\eta)^{-1}$ by (2.10) and regularity of $\mathbf{D}^o(\delta,\tau)$ (c.f. Lemma 2.4), we have $\sum_{i=1}^N \mathbf{1}\{\lambda_i \in [E - \eta, E + \eta]\} \le 2N\eta\operatorname{Im}\widetilde m(z) \ll 1$, so $\widetilde K$ has no eigenvalue in $[E - \eta, E + \eta]$. Thus, (2.12) implies for some constant $C \equiv C(\delta, D) > 0$, with probability $1 - CN^{-D}$, $\widetilde K$ has no eigenvalues in
\[
\{x \in \mathbb{R} : |x| \le \delta^{-1},\; \operatorname{dist}(x, \operatorname{supp}(\widetilde\mu_0) \cup \{0\}) \ge \delta\}. \tag{4.6}
\]
To show $\widehat N(a) = N(a)$, we begin with the case where $G$ is Gaussian, i.e. $\widetilde K = N^{-1}X^*\Sigma X$ where $X \in \mathbb{R}^{n\times N}$ has i.i.d. $\mathcal{N}(0,1)$ entries, and assume without loss of generality $\Sigma = \operatorname{diag}(\sigma_1,\ldots,\sigma_n)$ with $\sigma_1 \ge \cdots \ge \sigma_n$. Let $m_1,\ldots,m_{2p}$ and $x_1 > x_2 \ge x_3 > \cdots > x_{2p} \ge 0$ be as defined in (2.8). Note that if $a > x_1$ ($a$ is above the largest edge), then $N(a) = 0$, and (4.6) together with $\|\widetilde K\|_{op} \le N^{-1}\|X\|_{op}^2\|\Sigma\|_{op}$ and the high-probability bound $\|X\|_{op} \le C\sqrt{N}$ [Ver12, Corollary 5.35] shows $\widehat N(a) = N(a) = 0$ with probability $1 - C(\delta, D)N^{-D}$.
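The operator norm bound invoked here from [Ver12, Corollary 5.35] is easy to visualize numerically: the largest singular value of an i.i.d. standard Gaussian matrix concentrates near $\sqrt n + \sqrt N$. The sketch below is purely illustrative, with arbitrarily chosen dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 200, 400
X = rng.standard_normal((n, N))
# ord=2 gives the largest singular value, i.e. the operator norm of X
ratio = np.linalg.norm(X, 2) / (np.sqrt(n) + np.sqrt(N))
print(round(ratio, 2))  # close to 1, consistent with ||X||_op <= C sqrt(N)
```

For a single draw the ratio is typically within a few percent of 1, with fluctuations shrinking as the dimensions grow.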
If $a \in (x_{2k+1}, x_{2k})$ for some $k \in \{1,\ldots,p-1\}$ ($a$ is between two separated bulk components), then [KY17, Lemma A.1] shows that $N(a) = \sum_{\alpha=1}^{n}\mathbf{1}\{\sigma_\alpha \ne 0,\; -\sigma_\alpha^{-1} \in (m_{2k}, 0)\}$, i.e. $N(a)$ counts the number of non-zero poles of $z_0(m)$ to the right of $m_{2k}$ with multiplicity. Let $\Sigma = \operatorname{diag}(\Sigma_1, \Sigma_2)$ where $\Sigma_1 = \operatorname{diag}(\sigma_1,\ldots,\sigma_{N(a)})$ contains the eigenvalues satisfying $-\sigma_\alpha^{-1} \in (m_{2k}, 0)$, and $\Sigma_2 = \operatorname{diag}(\sigma_{N(a)+1},\ldots,\sigma_n)$ contains the rest. Define, for $t \in [0,1]$,
\[
\Sigma(t) = \operatorname{diag}\big((1-t)\Sigma_1 + t\sigma_1 I,\; (1-t)\Sigma_2\big) =: \operatorname{diag}\big(\sigma_1(t),\ldots,\sigma_n(t)\big),
\]
which interpolates between $\Sigma(0) = \Sigma$ and $\Sigma(1) = \operatorname{diag}(\underbrace{\sigma_1,\ldots,\sigma_1}_{N(a)}, \underbrace{0,\ldots,0}_{n - N(a)})$, and let $\widetilde K(t) = N^{-1}X^*\Sigma(t)X$ with eigenvalues $\lambda_1(t),\ldots,\lambda_N(t)$. Observe that
\[
\frac{d}{d\sigma_\alpha} z_0'(m) = -\frac{1}{N}\frac{d}{d\sigma_\alpha}\frac{1}{(\sigma_\alpha^{-1} + m)^2},
\]
which is positive when $-\sigma_\alpha^{-1} > m$ and negative when $-\sigma_\alpha^{-1} < m$. Then, writing $z_0(m; t)$ and $z_0'(m; t)$ for $z_0(m)$ and $z_0'(m)$ defined by $\Sigma(t)$, we see that $z_0'(m; t)$ is increasing in $t$ for each $m \in (m_{2k+1}, m_{2k})$. This implies that there remain two distinct critical points of $z_0(m; t)$ between the poles $-\sigma_{N(a)}(t)^{-1}$ and $-\sigma_{N(a)+1}(t)^{-1}$ for all $t \in (0,1)$. Writing $m_{2k+1}(t) < m_{2k}(t)$ for these two critical points, we have
\[
\frac{d}{dt}\Big[z_0(m_{2k}(t); t) - z_0(m_{2k+1}(t); t)\Big] = \frac{1}{N}\sum_{\alpha=1}^{n}\left(\frac{1}{(1 + \sigma_\alpha(t)m_{2k}(t))^2} - \frac{1}{(1 + \sigma_\alpha(t)m_{2k+1}(t))^2}\right)\frac{d}{dt}\sigma_\alpha(t).
\]
Since $\frac{1}{(1+\sigma_\alpha(t)m_{2k}(t))^2} - \frac{1}{(1+\sigma_\alpha(t)m_{2k+1}(t))^2} > 0$ and $\frac{d}{dt}\sigma_\alpha(t) > 0$ for $\alpha = 1,\ldots,N(a)$, and similarly both are negative for $\alpha = N(a)+1,\ldots,n$, we have $\frac{d}{dt}[z_0(m_{2k}(t); t) - z_0(m_{2k+1}(t); t)] > 0$.
Then
\[
z_0(m_{2k}(t); t) - z_0(m_{2k+1}(t); t) > z_0(m_{2k}(0); 0) - z_0(m_{2k+1}(0); 0) > 2\delta \tag{4.7}
\]
for all $t \in (0,1)$, by the starting assumption that $(a-\delta, a+\delta) \cap \operatorname{supp}(\widetilde\mu_0) = \emptyset$. Note that $\Sigma(t)$ satisfies Assumption 1 for all $t \in [0, 1-c]$ and any constant $c > 0$. Then (4.6) holds for each matrix $\widetilde K(t)$ and $t \in [0, 1-c]$. Choosing $c \equiv c(\delta) > 0$ sufficiently small, the operator norm bound $\|X\|_{op} \le C\sqrt{N}$ and a standard covering argument show that with probability $1 - CN^{-D}$, (4.6) holds simultaneously for all $\{\widetilde K(t) : t \in [0,1]\}$. On this event, (4.6) and (4.7) imply
\[
\widehat N(a) = \sum_{i=1}^{N}\mathbf{1}\big\{\lambda_i(0) \ge z_0(m_{2k}(0); 0) - \delta\big\} = \sum_{i=1}^{N}\mathbf{1}\big\{\lambda_i(1) \ge z_0(m_{2k}(1); 1) - \delta\big\}, \tag{4.8}
\]
where $m_{2k}(1) = \lim_{t\to 1} m_{2k}(t)$. At $t = 1$, denoting by $Y \in \mathbb{R}^{N(a)\times N}$ the first $N(a)$ rows of $X$, we have $\widetilde K(1) = \sigma_1 \cdot N^{-1}Y^*Y$, and $z_0(m_{2k}(1); 1)$ is the lower edge of the single bulk component of the standard Marchenko--Pastur law, given by $z_0(m_{2k}(1); 1) = \sigma_1(1 - \sqrt{N(a)/N})^2$. Then (4.7) implies that $N(a) \le (1 - c(\delta))N$ for some constant $c(\delta) > 0$, and by concentration of the smallest singular value of $Y$ [Ver12, Corollary 5.35], the right side of (4.8) is exactly equal to $N(a)$ with probability $1 - CN^{-D}$.

Finally, if $a \in (0, x_{2p})$ ($a$ is below the smallest left edge), then [KY17, Lemma A.1] shows $N(a) = \min(N, \bar n)$ where we denote $\bar n = \operatorname{rank}(\Sigma)$. Write $\Sigma = \operatorname{diag}(\Sigma_1, 0)$ where $\Sigma_1 \in \mathbb{R}^{\bar n\times\bar n}$, and consider the interpolation $\Sigma(t) = \operatorname{diag}((1-t)\Sigma_1 + t\sigma_1 I, 0)$. For the (unique) critical point $m_{2p}(t) \in (-\infty, -\sigma_{\bar n}(t)^{-1}) \cup (0, \infty]$, the above argument establishes $\frac{d}{dt}z_0(m_{2p}(t); t) > 0$ and hence
\[
z_0(m_{2p}(t); t) > z_0(m_{2p}(0); 0) > \delta. \tag{4.9}
\]
Hence, on the event where (4.6) holds for all $\{\widetilde K(t) : t \in [0,1]\}$, we have that (4.8) holds with $k = p$. Here $\widetilde K(1) = \sigma_1 \cdot N^{-1}Y^*Y$ where $Y \in \mathbb{R}^{\bar n\times N}$ contains the first $\bar n$ rows of $X$, $z_0(m_{2p}(1); 1) = \sigma_1(1 - \sqrt{\bar n/N})^2$, and (4.9) implies that $\bar n \notin (1 \pm c(\delta))N$ for some $c(\delta) > 0$. Then again, the right side of (4.8) is equal to $N(a) = \min(N, \bar n)$ with probability $1 - CN^{-D}$. This shows $\widehat N(a) = N(a)$ in all cases.

To extend this to the general non-Gaussian setting, let $\widetilde K = N^{-1}G^*G$ where $G$ satisfies Assumptions 1 and 2, let $\widetilde G \in \mathbb{R}^{n\times N}$ have columns $\{\widetilde g_i\}_{i=1}^N$ that are i.i.d. $\mathcal{N}(0,\Sigma)$ and independent from $G$, and consider the interpolation, for $t \in \{0, 1/N, 2/N, \ldots, 1\}$,
\[
G(t) = \sqrt{1-t}\,G + \sqrt{t}\,\widetilde G, \qquad \widetilde K(t) = N^{-1}G(t)^*G(t).
\]
Then $G(t)$ satisfies Assumption 2 uniformly over all such $t$, so (4.6) holds with probability $1 - CN^{-D}$ for each such matrix $\widetilde K(t)$. It is easy to check that for a constant $C > 0$ and each $l = 1,\ldots,N$,
\[
\|\widetilde K(l/N) - \widetilde K((l-1)/N)\|_{op} \le CN^{-1/2}\max_{0\le s\le N}\|\widetilde K(s/N)\|_{op},
\]
and we have $\|\widetilde K(s/N)\|_{op} \le N^\varepsilon$ for any $\varepsilon > 0$ and each $s = 0, 1, \ldots, N$ with probability $1 - C(\varepsilon, D)N^{-D}$ by Lemma 3.8(b). Then on the intersection of these events, the equality $\widehat N(a) = N(a)$ for the Gaussian matrix $\widetilde K(1)$ and (4.6) for each matrix $\widetilde K(l/N)$ imply $\widehat N(a) = N(a)$ for the original matrix $\widetilde K = \widetilde K(0)$, completing the proof. □

Proof of Corollary 2.7. For (a), by (4.6) and the implication of Lemma 4.9 that $\widehat N(a) = N(a) = 0$ for $a > \max(x \in \operatorname{supp}(\widetilde\mu_0)) + \delta$, we know that with probability $1 - C(\delta, D)N^{-D}$, $K$ and $\widetilde K$ have no eigenvalues outside $\{x \in \mathbb{R} : \operatorname{dist}(x, \operatorname{supp}(\widetilde\mu_0) \cup \{0\}) \le \delta\}$.
If there is a constant $\delta > 0$ for which $\operatorname{dist}(0, \operatorname{supp}(\widetilde\mu_0)) > \delta$, then applying Lemma 4.9 with $a = \delta/2$ shows $\widehat N(a) = N(a) = N$, and thus $\widetilde K$ also has no eigenvalues in $[0, \delta/2]$. Similarly, if $\operatorname{dist}(0, \operatorname{supp}(\mu_0)) > \delta$, then $N \ge n$ and $\widetilde\mu_0 = (n/N)\mu_0 + (1 - n/N)\delta_0$, so applying Lemma 4.9 with $a = \delta/2$ shows $\widehat N(a) = N(a) = n$. Then $K$ has full rank with no eigenvalues in $[0, \delta/2]$. This shows all statements of (a).

For (b), the averaged local law (2.12) applied for $z = E + i\eta \in \mathcal D_j^e(\delta, \tau)$ with $\kappa = E - x_{2k-1} \ge N^{-2/3+\varepsilon}$ and $\eta = N^{-1/2-\varepsilon/4}\kappa^{1/4} \ge N^{-2/3}$ implies $|\widetilde m(z) - \widetilde m_0(z)| \ll (N\eta)^{-1}$ (see e.g. [PY14, Proof of (3.4), Step 1]). Then again $\sum_{i=1}^N \mathbf 1\{\lambda_i \in [E - \eta, E + \eta]\} \le 2N\eta \operatorname{Im} \widetilde m(z) \ll 1$, so $\widetilde K$ has no eigenvalue in $[E - \eta, E + \eta]$. Applying this over a lattice of such values $z \in \mathcal D_j^e(\delta, \tau)$ shows (b).

For (c), if $\mathcal D = \{z \in \mathbb C^+ : E \in [E_1, E_2],\, \eta \in [N^{-1+\tau}, 1]\}$ is a regular domain, then applying (2.12) for sufficiently small $\tau \equiv \tau(\varepsilon) \in (0,1)$, the argument of [PY14, Proof of (3.4), Step 2] implies that for any $a, b \in [E_1, E_2]$,
$$|(\widehat N(b) - \widehat N(a)) - (N(b) - N(a))| \le C(\log N)N^\varepsilon.$$
Thus, together with (4.5), this shows that if an edge $x_j$ is such that $\mathcal D_j^e(\delta, \tau)$ is regular, then with probability $1 - CN^{-D}$ for some $C \equiv C(\delta, \varepsilon, D) > 0$, $|\widehat N(a) - N(a)| \le C(\log N)N^\varepsilon$ for each $a \in [x_j - \delta, x_j + \delta]$. Furthermore, if $\mathcal D_k^b(\delta, \tau)$ is regular and either $\mathcal D_{2k}^e(\delta, \tau)$ or $\mathcal D_{2k-1}^e(\delta, \tau)$ is regular, then this holds also for each $a \in [x_{2k} + \delta, x_{2k-1} - \delta]$. Together with the square-root decay of $\rho_0(x)$ at regular edges, this implies part (c) (see e.g. [PY14, Proof of (3.6)]).
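As a quick numerical sanity check (illustrative only and not part of the proof; the dimensions below are our own choice), the Marchenko-Pastur edge formula $\sigma_1(1 \pm \sqrt{M/N})^2$ invoked at $t = 1$ in the argument above can be observed for a sampled Gaussian matrix:

```python
import numpy as np

# Illustration of the Marchenko-Pastur edges: for an M x N standard
# Gaussian matrix Y with M < N, the nonzero eigenvalues of N^{-1} Y* Y
# (equivalently, the eigenvalues of N^{-1} Y Y*) concentrate on the
# interval [(1 - sqrt(M/N))^2, (1 + sqrt(M/N))^2].
rng = np.random.default_rng(0)
M, N = 100, 400
Y = rng.standard_normal((M, N))
evals = np.linalg.eigvalsh(Y @ Y.T / N)
lower = (1 - np.sqrt(M / N)) ** 2   # = 0.25 here
upper = (1 + np.sqrt(M / N)) ** 2   # = 2.25 here
print(evals.min(), lower, evals.max(), upper)
```

The extreme eigenvalues land within $O(N^{-2/3})$ fluctuations of the two edges, matching the rigidity statements of Corollary 2.7.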
For (d), writing the spectral decomposition $\widetilde K = \sum_{i=1}^N \lambda_i \widetilde{\mathbf x}_i \widetilde{\mathbf x}_i^*$, for any $j \in [N]$ we have $\operatorname{Im} \widetilde R_{jj} = \sum_{i=1}^N |\widetilde{\mathbf x}_i[j]|^2\, \eta/((\lambda_i - E)^2 + \eta^2)$. If $\mathcal D_j^e(\delta, \varepsilon)$ is regular, then for each eigenvalue $\lambda_i \in [x_j - \delta, x_j + \delta]$, applying (2.13) at $z = \lambda_i + iN^{-1+\varepsilon}$ shows $|\widetilde{\mathbf x}_i[j]|^2 \le N^{-1+\varepsilon} \operatorname{Im} \widetilde R_{jj} \prec N^{-1+\varepsilon}$. The same argument holds for a regular bulk domain $\mathcal D_k^b(\delta, \varepsilon)$, showing part (d). □

4.3. Proof of the anisotropic law (Theorem 2.8). We now prove Theorem 2.8 on anisotropic approximations for the linearized resolvent $\Pi(z)$. Note that for $z \in \mathbb C^+$,
$$\Pi(z) = \begin{bmatrix} -zI_n & \frac{1}{\sqrt N} G \\ \frac{1}{\sqrt N} G^* & -I_N \end{bmatrix}^{-1} = \begin{bmatrix} R(z) & \Pi_o(z)^* \\ \Pi_o(z) & z\widetilde R(z) \end{bmatrix} \in \mathbb C^{(n+N) \times (n+N)}.$$
Indexing $[N]$ by Roman indices $i, j, \ldots$ and $[n]$ by Greek indices $\alpha, \beta, \ldots$, the following resolvent identities hold by the Schur complement:
$$(\Pi(z))_{ij} = z\widetilde R_{ij}(z), \qquad (\Pi(z))_{\alpha\beta} = R_{\alpha\beta}(z), \qquad (\Pi(z))_{i\alpha} = (\Pi_o(z))_{i\alpha} = z\widetilde R_{ii}(z)\, \mathbf g_i^* R^{(i)}(z) \mathbf e_\alpha. \qquad (4.10)$$
We will first show an anisotropic local law for the upper-left block $R(z)$, and then use this to show the result for $z\widetilde R(z)$ and $\Pi_o(z)$. Fix two constants $C_0 > 0$ and $\delta \in (0, \tau/2C_0)$. Following the same argument as in [KY17], we will bootstrap on the spectral scale in multiplicative increments of $N^{-\delta}$: for any $\eta \ge N^{-1+\tau}$, define
$$L \equiv L(\eta) = \max\{l \in \mathbb N : \eta N^{\delta(l-1)} < 1\}, \qquad \eta_l = \eta N^{\delta l} \text{ for } l = 0, \ldots, L-1, \qquad \eta_L = 1.$$
Note that $L \le \delta^{-1} + 1$. The following lemma, applying the fluctuation averaging result of Lemma 3.6(a), is the main input for the bootstrap argument.

Lemma 4.10. Suppose Assumptions 1, 2, and 3 hold, and $\mathcal D$ is a regular spectral domain. Suppose there exist a deterministic function $\Phi : \mathcal D \to [N^{-1/2}, N^{-\tau/2}]$ and a constant $\delta > 0$ such that uniformly over all deterministic unit vectors $\mathbf x, \mathbf y \in \mathbb C^n$,
$$\frac{\|R\mathbf x\|_2}{\sqrt N} \prec \Phi, \qquad |\mathbf x^* R \mathbf y| \prec N^{2\delta}.$$
Then, uniformly over all deterministic unit vectors $\mathbf u \in \mathbb C^n$ and $z \in \mathcal D$,
$$|\mathbf u^* R \mathbf u - \mathbf u^*(-zI - z\widetilde m_0 \Sigma)^{-1}\mathbf u| \prec \max(N^{2\delta}\Phi, \Psi)$$
where $\Psi$ is the bound in Theorem 2.5.

Proof. We have
$$|\mathbf u^* R \mathbf u - \mathbf u^*(-z - z\widetilde m_0 \Sigma)^{-1}\mathbf u| \le \underbrace{|\mathbf u^* R \mathbf u - \mathbf u^*(-z - z\widetilde m \Sigma)^{-1}\mathbf u|}_{\mathrm{I}} + \underbrace{|\mathbf u^*(-z - z\widetilde m \Sigma)^{-1}\mathbf u - \mathbf u^*(-z - z\widetilde m_0 \Sigma)^{-1}\mathbf u|}_{\mathrm{II}}.$$
For II, by the identity $A^{-1} - B^{-1} = A^{-1}(B - A)B^{-1}$,
$$|\mathrm{II}| = |z|\,|\widetilde m - \widetilde m_0|\,|\mathbf u^*(-z - z\widetilde m \Sigma)^{-1}\Sigma(-z - z\widetilde m_0 \Sigma)^{-1}\mathbf u| \prec \Psi\, \|(-z - z\widetilde m \Sigma)^{-1}\|_{\mathrm{op}}\, \|(-z - z\widetilde m_0 \Sigma)^{-1}\|_{\mathrm{op}} \prec \Psi,$$
where we applied Lemma 4.2 and Theorem 2.5. For I, by Lemmas 4.3 and 4.4, there exists a constant $l \ge 1$ such that
$$|\mathbf u^* R \mathbf u - \mathbf u^*(-z - z\widetilde m \Sigma)^{-1}\mathbf u| \prec \max_{0 \le k \le l}\left|\frac{1}{N}\sum_{i=1}^N \frac{\mathbf u^* \Sigma^k (I + \widetilde m_0 \Sigma)^{-k-1}(\mathbf g_i \mathbf g_i^* - \Sigma) R^{(i)}\mathbf u}{1 + N^{-1}\mathbf g_i^* R^{(i)}\mathbf g_i}\right| + \max(\Phi^2, \Psi),$$
where we have used the given condition and Lemma 4.2 to bound the remainder in Lemma 4.4 as $N^{-1/2}\|R^{(i)}\mathbf u\|_2 \prec N^{-1/2}\|R\mathbf u\|_2 \prec \Phi$, and similarly $N^{-1/2}\|\mathbf u^* \Sigma^{k+1}(I + \widetilde m_0 \Sigma)^{-k-1} R^{(i)}\|_2 \prec \Phi$. Note that $\mathbf u^* \Sigma^k (I + \widetilde m_0 \Sigma)^{-k-1}$ is a deterministic vector bounded in $\ell_2$-norm, and the conditions of Lemma 3.6 are satisfied by the given assumptions $N^{-1/2}\|R\mathbf x\|_2 \prec \Phi$ and $|\mathbf x^* R \mathbf y| \prec N^{2\delta}$, and by the conditions $\Gamma^{(S)} \prec \Psi$ and $N^{-1}\|R^{(S)}\|_1 \prec 1$ in light of Lemma 4.2 and Theorem 2.5. Thus the first term is $O_\prec(N^{2\delta}\Phi)$ by the polarization identity and the fluctuation averaging result of Lemma 3.6(a), concluding the proof. □

Lemma 4.11. Under Assumptions 1 and 2, uniformly over all $z = E + i\eta \in \mathcal D$ and all deterministic unit vectors $\mathbf x, \mathbf y \in \mathbb C^n$,
$$|\mathbf x^* R(z) \mathbf y| \prec N^{2\delta}\sum_{l=1}^{L(\eta)}\left[\operatorname{Im} \mathbf x^* R(E + i\eta_l)\mathbf x + \operatorname{Im} \mathbf y^* R(E + i\eta_l)\mathbf y\right].$$

Proof.
Writing the spectral decomposition $K = \sum_{\alpha=1}^n \lambda_\alpha \mathbf v_\alpha \mathbf v_\alpha^*$,
$$|\mathbf x^* R(z) \mathbf y| \le \sum_\alpha \frac{|\mathbf x^* \mathbf v_\alpha||\mathbf y^* \mathbf v_\alpha|}{|\lambda_\alpha - z|} \le \sum_\alpha \frac{|\mathbf x^* \mathbf v_\alpha|^2}{|\lambda_\alpha - z|} + \sum_\alpha \frac{|\mathbf y^* \mathbf v_\alpha|^2}{|\lambda_\alpha - z|}.$$
Define $\eta_{-1} = 0$, $\eta_{L+1} = \infty$, and the index sets $U_l = \{\alpha : \eta_{l-1} \le |\lambda_\alpha - E| < \eta_l\}$ for $l = 0, \ldots, L+1$. Then the same argument as that leading to (4.1), using $\eta_l/\eta_{l-1} \le N^\delta$ in place of $\eta_l/\eta_{l-1} \le 2$, shows
$$\sum_\alpha \frac{|\mathbf x^* \mathbf v_\alpha|^2}{|\lambda_\alpha - z|} = \sum_{l=0}^{L+1}\sum_{\alpha \in U_l}\frac{|\mathbf x^* \mathbf v_\alpha|^2}{|\lambda_\alpha - z|} \prec N^{2\delta}\sum_{l=1}^{L(\eta)}\operatorname{Im} \mathbf x^* R(E + i\eta_l)\mathbf x,$$
and the analogous bound holds for $\mathbf y$. □

Lemma 4.12. Suppose Assumptions 1, 2, and 3 hold, and $\mathcal D$ is a regular spectral domain. For $k \in \mathbb N$, define the domain $\mathcal D_k = \{z \in \mathcal D : \operatorname{Im} z \in [N^{-k\delta}, 1]\}$. Let $\Psi$ be the bound in Theorem 2.5. Then the following hold:

(a) Uniformly over all $z \in \mathcal D_0$ and all deterministic unit vectors $\mathbf u \in \mathbb C^n$,
$$|\mathbf u^* R \mathbf u - \mathbf u^*(-zI - z\widetilde m_0 \Sigma)^{-1}\mathbf u| \prec N^{C_0\delta}\Psi.$$
(b) For all $0 \le k \le \delta^{-1}$: if, uniformly over all $z \in \mathcal D_k$ and all deterministic unit vectors $\mathbf u \in \mathbb C^n$,
$$|\mathbf u^* R \mathbf u - \mathbf u^*(-zI - z\widetilde m_0 \Sigma)^{-1}\mathbf u| \prec N^{C_0\delta}\Psi,$$
then, uniformly over all $z \in \mathcal D_k$ and all deterministic unit vectors $\mathbf u \in \mathbb C^n$,
$$\operatorname{Im} \mathbf u^* R \mathbf u \prec \operatorname{Im} \widetilde m_0 + N^{C_0\delta}\Psi.$$
(c) For all $0 \le k \le \delta^{-1}$: if, uniformly over all $z \in \mathcal D_k$ and all deterministic unit vectors $\mathbf u \in \mathbb C^n$,
$$\operatorname{Im} \mathbf u^* R \mathbf u \prec \operatorname{Im} \widetilde m_0 + N^{C_0\delta}\Psi,$$
then, uniformly over all $z \in \mathcal D_{k+1}$ and all deterministic unit vectors $\mathbf u \in \mathbb C^n$,
$$|\mathbf u^* R \mathbf u - \mathbf u^*(-zI - z\widetilde m_0 \Sigma)^{-1}\mathbf u| \prec N^{C_0\delta}\Psi.$$

Proof. For (a), note that for $z \in \mathcal D_0$ we have $\eta = 1$, so $\|R\mathbf x\|_2/\sqrt N \le \|R\|_{\mathrm{op}}/\sqrt N \le N^{-1/2}$ and $|\mathbf x^* R \mathbf y| \le \|R\|_{\mathrm{op}} \le 1$ for any unit vectors $\mathbf x, \mathbf y \in \mathbb C^n$. By Lemma 4.10,
$$|\mathbf u^* R \mathbf u - \mathbf u^*(-zI - z\widetilde m_0 \Sigma)^{-1}\mathbf u| \prec \frac{N^{2\delta}}{\sqrt N} = N^{2\delta - 1/2} \le N^{C_0\delta}\Psi,$$
where the last bound holds for sufficiently large $C_0 > 0$ and any $\delta \in (0, \tau/2C_0)$. This proves (a).
For (b), note that uniformly over $z \in \mathcal D_k$, the assumed bound implies
$$\operatorname{Im} \mathbf u^* R \mathbf u \prec \operatorname{Im} \mathbf u^*(-zI - z\widetilde m_0 \Sigma)^{-1}\mathbf u + N^{C_0\delta}\Psi.$$
Denoting by $\mathbf v_\alpha$ the eigenvectors of $\Sigma$ with eigenvalues $\sigma_\alpha$,
$$\operatorname{Im} \mathbf u^*(-zI - z\widetilde m_0 \Sigma)^{-1}\mathbf u = \sum_{\alpha=1}^n \frac{\eta(1 + \sigma_\alpha\operatorname{Re}\widetilde m_0) + E\sigma_\alpha\operatorname{Im}\widetilde m_0}{|z|^2\,|1 + \sigma_\alpha\widetilde m_0|^2}\,|\mathbf v_\alpha^*\mathbf u|^2 \le C\operatorname{Im}\widetilde m_0,$$
since, over a regular domain $\mathcal D$, we have $\operatorname{Im}\widetilde m_0 \gtrsim \eta$, $|z| \asymp 1$, $|\operatorname{Re}\widetilde m_0| \le |\widetilde m_0| \lesssim 1$, and $\min_\alpha |1 + \sigma_\alpha\widetilde m_0| \gtrsim 1$. This proves (b).

For (c), let $z = E + i\eta \in \mathcal D_{k+1}$, so $\eta \ge N^{-(k+1)\delta}$. By construction, $E + i\eta_l \in \mathcal D_k$ for $l = 1, \ldots, L(\eta)$, since $\eta_l = \eta N^{\delta l} \ge N^{-(k+1)\delta}N^\delta = N^{-k\delta}$. The assumption implies, uniformly over all such $z = E + i\eta \in \mathcal D_{k+1}$ and unit vectors $\mathbf x \in \mathbb C^n$,
$$\operatorname{Im}\mathbf x^* R(E + i\eta_l)\mathbf x \prec \operatorname{Im}\widetilde m_0(E + i\eta_l) + N^{C_0\delta}\Psi(E + i\eta_l) \prec 1,$$
since $\operatorname{Im}\widetilde m_0(E + i\eta_l) \le C$, $\Psi(E + i\eta_l) \le CN^{-\tau/2}$ for any $\eta_l \ge N^{-1+\tau}$, and $\delta \in (0, \tau/2C_0)$. Then by Lemma 4.11, uniformly over unit vectors $\mathbf x, \mathbf y \in \mathbb C^n$, $|\mathbf x^* R(z)\mathbf y| \prec N^{2\delta}$. Since the function $\eta \mapsto \eta\operatorname{Im} s(z)$ is nondecreasing and $\eta \mapsto \operatorname{Im} s(z)/\eta$ is nonincreasing for any Stieltjes transform $s(z)$, letting $z_1 = E + iN^\delta\eta \in \mathcal D_k$, we have
$$\operatorname{Im}\mathbf u^* R(z)\mathbf u \le \frac{N^\delta\eta}{\eta}\operatorname{Im}\mathbf u^* R(z_1)\mathbf u \prec N^\delta\big[\operatorname{Im}\widetilde m_0(z_1) + N^{C_0\delta}\Psi(z_1)\big] \le N^{2\delta}\big[\operatorname{Im}\widetilde m_0(z) + N^{C_0\delta}\Psi(z)\big].$$
Thus,
$$\frac{\|R\mathbf u\|_2}{\sqrt N} = \sqrt{\frac{\operatorname{Im}\mathbf u^* R\mathbf u}{N\eta}} \prec N^\delta\sqrt{\frac{\operatorname{Im}\widetilde m_0 + N^{C_0\delta}\Psi}{N\eta}} \le N^\delta(1 + N^{C_0\delta/2})\Psi,$$
the last inequality using $\sqrt{a + b} \le \sqrt a + \sqrt b$ and $\sqrt{\operatorname{Im}\widetilde m_0(z)/N\eta} \le \Psi(z)$. Noting that $N^\delta(1 + N^{C_0\delta/2}) \le 2N^{(C_0/2+1)\delta}$ and setting $\Phi = N^{(C_0/2+1)\delta}\Psi$, Lemma 4.10 then shows that
$$|\mathbf u^* R\mathbf u - \mathbf u^*(-zI - z\widetilde m_0\Sigma)^{-1}\mathbf u| \prec N^{2\delta}\Phi = N^{(C_0/2+3)\delta}\Psi \le N^{C_0\delta}\Psi,$$
the last inequality holding for $C_0 > 6$. This completes the proof. □

Proof of Theorem 2.8. Lemma 4.12 implies, uniformly over $z \in \mathcal D$ and all deterministic unit vectors $\mathbf u \in \mathbb C^n$,
$$|\mathbf u^* R\mathbf u - \mathbf u^*(-zI - z\widetilde m_0\Sigma)^{-1}\mathbf u| \prec N^{C_0\delta}\Psi.$$
Since the constant $\delta \in (0, \tau/2C_0)$ here is arbitrarily small, this implies
$$|\mathbf u^* R\mathbf u - \mathbf u^*(-zI - z\widetilde m_0\Sigma)^{-1}\mathbf u| \prec \Psi. \qquad (4.11)$$
Taking $\delta > 0$ arbitrarily small within the arguments of Lemma 4.12 shows also
$$\frac{\|R\mathbf x\|_2}{\sqrt N} \prec \Psi, \qquad |\mathbf x^* R\mathbf y| \prec 1$$
uniformly over unit vectors $\mathbf x, \mathbf y \in \mathbb C^n$ and $z \in \mathcal D$. Then by the resolvent identity (3.2), uniformly over unit vectors $\mathbf v \in \mathbb C^N$,
$$\mathbf v^*\widetilde R\mathbf v = \widetilde m_0 + \sum_{i=1}^N |\mathbf v(i)|^2(\widetilde R_{ii} - \widetilde m_0) + \sum_{i \ne j}\bar{\mathbf v}(i)\mathbf v(j)\widetilde R_{ij} = \widetilde m_0 + O_\prec(\Psi) + z\sum_{i \ne j}\bar{\mathbf v}(i)\mathbf v(j)\widetilde R_{ii}\widetilde R^{(i)}_{jj}N^{-1}\mathbf g_i^* R^{(ij)}\mathbf g_j$$
$$= \widetilde m_0 + O_\prec(\Psi) + z^{-1}\sum_{i \ne j}\bar{\mathbf v}(i)\mathbf v(j)\frac{N^{-1}\mathbf g_i^* R^{(ij)}\mathbf g_j}{(1 + N^{-1}\mathbf g_i^* R^{(i)}\mathbf g_i)(1 + N^{-1}\mathbf g_j^* R^{(ij)}\mathbf g_j)}.$$
Then by the fluctuation averaging result of Lemma 3.6(b) applied with $\delta = 0$ and $\Phi \asymp \Psi$,
$$|\mathbf v^*\widetilde R\mathbf v - \widetilde m_0| \prec \Psi. \qquad (4.12)$$
Similarly, by the identity (4.10),
$$\mathbf v^*\Pi_o\mathbf u = \sum_{i=1}^N \bar{\mathbf v}(i)N^{-1/2}z\widetilde R_{ii}\mathbf g_i^* R^{(i)}\mathbf u = -\sum_{i=1}^N \bar{\mathbf v}(i)N^{-1/2}\frac{\mathbf g_i^* R^{(i)}\mathbf u}{1 + N^{-1}\mathbf g_i^* R^{(i)}\mathbf g_i},$$
so by the fluctuation averaging result of Lemma 3.6(c) applied with $\delta = 0$ and $\Phi \asymp \Psi$,
$$|\mathbf v^*\Pi_o\mathbf u| \prec \Psi. \qquad (4.13)$$
The theorem follows from (4.11), (4.12), (4.13), and the polarization identity. □

Proof of Corollary 2.9. Writing $K = \sum_{\alpha=1}^n \lambda_\alpha\mathbf x_\alpha\mathbf x_\alpha^*$ and $\widetilde K = \sum_{i=1}^N \lambda_i\widetilde{\mathbf x}_i\widetilde{\mathbf x}_i^*$, for any $\mathbf u \in \mathbb R^n$ and $\mathbf v \in \mathbb R^N$ we have $\operatorname{Im}\mathbf u^* R\mathbf u = \sum_{\alpha=1}^n \langle\mathbf u, \mathbf x_\alpha\rangle^2\,\eta/((\lambda_\alpha - E)^2 + \eta^2)$ and $\operatorname{Im}\mathbf v^*\widetilde R\mathbf v = \sum_{i=1}^N \langle\mathbf v, \widetilde{\mathbf x}_i\rangle^2\,\eta/((\lambda_i - E)^2 + \eta^2)$. Thus the result follows from Theorem 2.8 by the same argument as Corollary 2.7(d). □

4.4. Proof outside the spectrum (Theorem 2.10).

Proof of Theorem 2.10. Fixing $\eta \in (0,1)$ and $\delta > 0$, we first show the result over the (regular) spectral domain $\mathcal D_o \equiv \mathcal D_o(\delta, \eta)$ defined in (2.9).
(4.6) shows that $K$ has no eigenvalues in $\{x \in \mathbb R : |x| \le 2\delta^{-1},\, \operatorname{dist}(x, \operatorname{supp}(\widetilde\mu_0) \cup \{0\}) \ge \delta/2\}$ with high probability. In light of Lemma 4.2, Theorem 2.5, and the bound $\operatorname{Im}\widetilde m_0 \lesssim \eta$ for $z \in \mathcal D_o$, we have $|\widetilde m^{(S)} - \widetilde m| \prec \Psi_\Theta^2 \prec N^{-1} + (N\eta)^{-2}$. Then the argument for (4.6) may be applied equally with $\widetilde m^{(S)}$ in place of $\widetilde m$, to show that for all $S \subset [N]$ with $|S| \le L$ a fixed constant, also $\widetilde K^{(S)}$ has no eigenvalues in $\{x \in \mathbb R : |x| \le 2\delta^{-1},\, \operatorname{dist}(x, \operatorname{supp}(\widetilde\mu_0) \cup \{0\}) \ge \delta/2\}$ with high probability. Thus $\|R^{(S)}(z)\|_{\mathrm{op}} \prec 1$ uniformly over $z \in \mathcal D_o$ and such subsets $S \subset [N]$. This implies $N^{-1}\|R^{(S)}(z)\|_F \le N^{-1/2}\|R^{(S)}\|_{\mathrm{op}} \prec N^{-1/2}$, so Lemmas 4.2, 4.4, and 4.5 all hold with $N^{-1/2}$ in place of $\Psi_\Theta$. Applying Lemma 3.4 with $\Phi = N^{-1/2}$ shows, in place of (4.4), that $|z_0(\widetilde m) - z| \prec N^{-1}$. Then regularity of $\mathcal D_o$ and the observation $\kappa \asymp 1$ for $z \in \mathcal D_o$ imply that
$$|\widetilde m - \widetilde m_0| \prec \frac{N^{-1}}{\sqrt{\kappa + \eta} + N^{-1/2}} \prec N^{-1},$$
and hence also $|m - m_0| \prec N^{-1}$.

We have also, uniformly over unit vectors $\mathbf x, \mathbf y \in \mathbb C^n$, $N^{-1/2}\|R^{(S)}\mathbf x\|_2 \le N^{-1/2}\|R^{(S)}\|_{\mathrm{op}} \prec N^{-1/2}$ and $|\mathbf x^* R^{(S)}\mathbf y| \le \|R^{(S)}\|_{\mathrm{op}} \prec 1$. Then applying Lemma 3.5 with $\Phi = N^{-1/2}$ in place of Lemma 3.6 in the arguments of Lemma 4.10, (4.12), and (4.13) shows
$$|\mathbf u^* R\mathbf u - \mathbf u^*(-zI - z\widetilde m_0\Sigma)^{-1}\mathbf u| \prec N^{-1/2}, \qquad |\mathbf v^*\widetilde R\mathbf v - \widetilde m_0| \prec N^{-1/2}, \qquad |\mathbf v^*\Pi_o\mathbf u| \prec N^{-1/2}.$$
By the polarization identity, this implies all statements of the theorem for $z \in \mathcal D_o$. To extend these estimates to $\bar{\mathcal D}_o(\delta)$, first note that by the conjugate symmetries $\widetilde m(\bar z) = \overline{\widetilde m(z)}$ and $\Pi(\bar z) = \overline{\Pi(z)}$, it suffices to extend to the domain $\bar{\mathcal D}_o(\delta) \cap \mathbb C^+$.
Next, observe that (4.6) also implies that, on an event of high probability, $R(z)$ and $\widetilde R(z)$ are $2/\delta$-Lipschitz in the operator norm over $\bar{\mathcal D}_o(\delta)$. Applying this Lipschitz continuity and the results for $\mathcal D_o(\delta, \eta)$ shows that all statements of Theorem 2.10 hold over $\bar{\mathcal D}_o(\delta) \cap \mathbb C^+$ with an additional additive error of $(2/\delta)N^{-1+\tau}$. Since $\tau > 0$ is arbitrary, this implies the theorem. □

5. Analysis of examples

5.1. Separable distributions.

Proof of Proposition 2.12. Suppose first that $d = n$, $X = I$, and $\mathbf g = \mathbf w$. Then Assumption 2 holds by a standard high moment estimate, see e.g. [BEK+14, Lemma 3.1]. For Assumption 3, set $\mathcal U = \{\mathbf e_1, \ldots, \mathbf e_n\}$, so $\|T\|_{\mathcal U} = \|T\|_\infty$. Since $\mathbf w$ has independent entries, $\kappa_k(\mathbf w) \in (\mathbb R^n)^{\otimes k}$ is diagonal. Then for any $1 \le m \le k-1$, $\mathbf s_1, \ldots, \mathbf s_m \in \mathbb R^n$, and $T \in (\mathbb R^n)^{\otimes k-m}$,
$$|\langle\kappa_k(\mathbf w), \mathbf s_1 \otimes \cdots \otimes \mathbf s_m \otimes T\rangle| \le \sum_{\alpha=1}^n |\kappa_k(\mathbf w[\alpha])|\,|T[\alpha, \ldots, \alpha]|\prod_{t=1}^m |\mathbf s_t[\alpha]| \le C\|T\|_{\mathcal U}\sum_{\alpha=1}^n\prod_{t=1}^m |\mathbf s_t[\alpha]|,$$
where the last inequality holds for some $C \equiv C(k) > 0$ by the moment bounds for $\mathbf w[\alpha]$. We may bound $\sum_{\alpha=1}^n |\mathbf s_1[\alpha]| \le \sqrt n\,\|\mathbf s_1\|_2$ for $m = 1$, and $\sum_{\alpha=1}^n\prod_{t=1}^m |\mathbf s_t[\alpha]| \le \prod_{t=1}^m\|\mathbf s_t\|_2$ for $m \ge 2$, showing Assumption 3 in the form of (2.16), which is stronger since $k \ge 3$ and $m \le k-1$.

For general $d \asymp n$ and $X \ne I$: since $\|X\|_{\mathrm{op}} \le \|\Sigma\|_{\mathrm{op}}^{1/2} \le C$ for a constant $C > 0$, it is clear that Assumption 2 holds for $\mathbf g$ since it holds for $\mathbf w$. For any $1 \le m \le k-1$, $\mathbf s_1, \ldots, \mathbf s_m \in \mathbb R^n$, and $T \in (\mathbb R^n)^{\otimes k-m}$, letting $\mathbf s_t' = X^*\mathbf s_t \in \mathbb R^d$ and $T' = (X^*)^{\otimes(k-m)} \cdot T \in (\mathbb R^d)^{\otimes k-m}$ (denoting the multiplication of $T$ along each axis by $X^*$), it follows from the multilinearity of cumulants that
$$\langle\kappa_k(\mathbf g_i), \mathbf s_1 \otimes \cdots \otimes \mathbf s_m \otimes T\rangle = \langle\kappa_k(\mathbf w_i), \mathbf s_1' \otimes \cdots \otimes \mathbf s_m' \otimes T'\rangle.$$
Setting $\mathcal U = \{\mathbf e_1, \ldots, \mathbf e_n, X\mathbf e_1, \ldots, X\mathbf e_d\}$, we have $\|\mathbf s_t'\|_2 \le \|X\|_{\mathrm{op}}\|\mathbf s_t\|_2$ and $\|T'\|_\infty \le \|T\|_{\mathcal U}$, so (2.16) follows from the above result for $X = I$. □

5.2. Conditionally mean-zero distributions. We show Proposition 2.13.

Lemma 5.1. In the setting of Proposition 2.13, for any fixed $k \ge 1$, uniformly over $\alpha_1, \ldots, \alpha_k \in [d]$, we have
$$|\kappa(\mathbf w[\alpha_1], \ldots, \mathbf w[\alpha_k])| \prec 1 \qquad (5.1)$$
and
$$\sum_{\beta=1}^d \kappa_{k+2}(\mathbf w[\beta], \mathbf w[\beta], \mathbf w[\alpha_1], \ldots, \mathbf w[\alpha_k])^2 \prec 1. \qquad (5.2)$$

Proof. Applying Assumption 2 for $\mathbf w$ with $A = \mathbf e_\alpha\mathbf e_\alpha^*$ shows $|\mathbf w[\alpha]| \prec 1$. Together with the moment bound for $\mathbf w$ in Assumption 2, this implies for any fixed integer $k \ge 1$ that $\mathbb E|\mathbf w[\alpha]|^k \prec 1$. The first claim $|\kappa(\mathbf w[\alpha_1], \ldots, \mathbf w[\alpha_k])| \prec 1$ then follows from the moment-cumulant relations.

For the second claim, let $S = \{\alpha_1, \ldots, \alpha_k\}$, and let $\mathcal P_{k+2}$ denote the set of all partitions of $[k+2]$. For any $\beta \notin S$, denoting $\alpha_{k+1} = \alpha_{k+2} = \beta$, the moment-cumulant relations give
$$\kappa_{k+2}(\mathbf w[\alpha_1], \ldots, \mathbf w[\alpha_k], \mathbf w[\beta], \mathbf w[\beta]) = \sum_{\pi \in \mathcal P_{k+2}}(-1)^{|\pi|-1}(|\pi|-1)!\prod_{B \in \pi}\mathbb E\Big[\prod_{j \in B}\mathbf w[\alpha_j]\Big].$$
Observe that if any block $B \in \pi$ contains exactly one of the two indices $\{k+1, k+2\}$, then $\mathbb E\prod_{j \in B}\mathbf w[\alpha_j] = 0$ by the given condition for $\mathbf w$. Thus, denoting by $\dot{\mathcal P}_{k+2}$ those partitions which put $k+1$ and $k+2$ in the same block, we have
$$\kappa_{k+2}(\mathbf w[\alpha_1], \ldots, \mathbf w[\alpha_k], \mathbf w[\beta], \mathbf w[\beta]) = \sum_{\pi \in \dot{\mathcal P}_{k+2}}(-1)^{|\pi|-1}(|\pi|-1)!\prod_{B \in \pi}\mathbb E\Big[\prod_{j \in B}\mathbf w[\alpha_j]\Big] = \kappa_{k+1}(\mathbf w[\alpha_1], \ldots, \mathbf w[\alpha_k], \mathbf w[\beta]^2) = \kappa_{k+1}(\mathbf w[\alpha_1], \ldots, \mathbf w[\alpha_k], \mathbf w[\beta]^2 - 1).$$
Now let $\mathcal P_{k+1}$ be the set of all partitions of $[k+1]$, and for each $\pi \in \mathcal P_{k+1}$, let $B_* \in \pi$ denote the block containing the last element $k+1$. Then we may further expand this as
$$\kappa_{k+1}(\mathbf w[\alpha_1], \ldots, \mathbf w[\alpha_k], \mathbf w[\beta]^2 - 1) = \sum_{\pi \in \mathcal P_{k+1}}(-1)^{|\pi|-1}(|\pi|-1)!\,\Big(\prod_{B \in \pi\setminus B_*}\mathbb E\Big[\prod_{j \in B}\mathbf w[\alpha_j]\Big]\Big)\,\mathbb E\Big[\Big(\prod_{j \in B_*\setminus\{k+1\}}\mathbf w[\alpha_j]\Big)(\mathbf w[\beta]^2 - 1)\Big]$$
$$= \mathbb E\Bigg[\underbrace{\sum_{\pi \in \mathcal P_{k+1}}(-1)^{|\pi|-1}(|\pi|-1)!\,\Big(\prod_{B \in \pi\setminus B_*}\mathbb E\Big[\prod_{j \in B}\mathbf w[\alpha_j]\Big]\Big)\prod_{j \in B_*\setminus\{k+1\}}\mathbf w[\alpha_j]}_{:=\,u}\;(\mathbf w[\beta]^2 - 1)\Bigg] = \mathbb E[u(\mathbf w[\beta]^2 - 1)],$$
where $u \in \mathbb R$ is a scalar random variable in the probability space of $\mathbf w$, depending on $\alpha_1, \ldots, \alpha_k$ and satisfying $|u| \prec 1$ uniformly over $\alpha_1, \ldots, \alpha_k \in [d]$. Then, applying also the first claim (5.1) for $\beta \in S$, we have
$$\sum_{\beta=1}^d \kappa_{k+2}(\mathbf w[\alpha_1], \ldots, \mathbf w[\alpha_k], \mathbf w[\beta], \mathbf w[\beta])^2 = \sum_{\beta \notin S}\kappa_{k+1}(\mathbf w[\alpha_1], \ldots, \mathbf w[\alpha_k], \mathbf w[\beta]^2 - 1)^2 + O_\prec(1) = \sum_{\beta=1}^d\big(\mathbb E[u(\mathbf w[\beta]^2 - 1)]\big)^2 + O_\prec(1).$$
The proof is concluded by the observation
$$\sum_{\beta=1}^d\big(\mathbb E[u(\mathbf w[\beta]^2 - 1)]\big)^2 = \sup_{\mathbf v \in \mathbb R^d : \|\mathbf v\|_2 = 1}\Big(\sum_{\beta=1}^d\mathbf v[\beta]\,\mathbb E[u(\mathbf w[\beta]^2 - 1)]\Big)^2 \le \sup_{\mathbf v \in \mathbb R^d : \|\mathbf v\|_2 = 1}\mathbb E u^2\cdot\mathbb E\Big(\sum_{\beta=1}^d\mathbf v[\beta](\mathbf w[\beta]^2 - 1)\Big)^2 \prec 1,$$
where the last inequality uses $|u| \prec 1$ and Assumption 2 for $\mathbf w$ with $A = \operatorname{diag}(\mathbf v)$. □

Proof of Proposition 2.13. Suppose first that $d = n$, $X = I$, and $\mathbf g = \mathbf w$. Let $\mathcal U = \{\mathbf e_1, \ldots, \mathbf e_d\}$. Fix any $k \ge 3$, $m \in \{1, \ldots, k-1\}$, unit vectors $\mathbf v_1, \ldots, \mathbf v_m \in \mathbb R^n$, and $T \in (\mathbb R^n)^{\otimes k-m}$. Then $\|T\|_{\mathcal U} = \|T\|_\infty$, and
$$|\langle\kappa_k(\mathbf w), \mathbf v_1 \otimes \cdots \otimes \mathbf v_m \otimes T\rangle| \le \|T\|_{\mathcal U}\sum_{\alpha_1, \ldots, \alpha_k = 1}^n |\kappa_k(\mathbf w[\alpha_1], \ldots, \mathbf w[\alpha_k])|\prod_{l=1}^m |\mathbf v_l[\alpha_l]|.$$
For any index tuple $\alpha = (\alpha_1, \ldots, \alpha_k) \in [n]^k$, write $\pi(\alpha)$ for the partition of $[k]$ such that $i, j$ belong to the same block of $\pi(\alpha)$ if and only if $\alpha_i = \alpha_j$. If any block of $\pi(\alpha)$ has cardinality 1, then $|\kappa_k(\mathbf w[\alpha_1], \ldots, \mathbf w[\alpha_k])| = 0$ by the given condition for $\mathbf w$.
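The moment-cumulant expansions used in Lemma 5.1 and in the argument above can be checked mechanically. The following sketch (illustrative only; the two-coordinate distribution $\mathbf w = (s_1 u, s_2 u)$ is our own toy example of a dependent, conditionally mean-zero vector) computes joint cumulants from exact moments via the partition sum, and confirms that an index appearing only once forces the cumulant to vanish:

```python
from fractions import Fraction
from math import factorial, prod

def set_partitions(items):
    """Generate all set partitions of a list, recursing on the first element."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

def moment(dist, idx):
    """E[prod_j w[idx_j]] for a finite distribution [(prob, vector), ...]."""
    return sum(p * prod(v[i] for i in idx) for p, v in dist)

def cumulant(dist, idx):
    """kappa(w[idx_1], ..., w[idx_k]) via the moment-cumulant partition sum."""
    return sum((-1) ** (len(pi) - 1) * factorial(len(pi) - 1)
               * prod(moment(dist, [idx[j] for j in B]) for B in pi)
               for pi in set_partitions(list(range(len(idx)))))

# Toy distribution: w = (s1*u, s2*u) with s1, s2 independent signs and u
# uniform on {1, 2}; any moment in which a coordinate appears an odd
# number of times vanishes, as in the conditionally mean-zero setting.
p = Fraction(1, 8)
dist = [(p, (s1 * u, s2 * u)) for s1 in (1, -1) for s2 in (1, -1)
        for u in (Fraction(1), Fraction(2))]
assert moment(dist, [0]) == 0                       # mean zero
assert cumulant(dist, [0, 0]) == Fraction(5, 2)     # kappa_2 = Var(w[0]) = E[u^2]
assert cumulant(dist, [0, 0, 1]) == 0               # a lone index kills the cumulant
assert cumulant(dist, [0, 0, 1, 1]) == Fraction(9, 4)
```

The last assertion matches the mean-zero identity $\kappa_4(X,X,Y,Y) = \mathbb E[X^2Y^2] - \mathbb E[X^2]\mathbb E[Y^2] - 2\mathbb E[XY]^2 = 17/2 - 25/4 = 9/4$ for this toy distribution.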
Thus, letting $\mathcal P_k^e$ be the set of partitions of $[k]$ in which each block has cardinality at least 2,
$$|\langle\kappa_k(\mathbf w), \mathbf v_1 \otimes \cdots \otimes \mathbf v_m \otimes T\rangle| \le \|T\|_{\mathcal U}\sum_{\pi \in \mathcal P_k^e}\,\sum_{\alpha \in [n]^k : \pi(\alpha) = \pi}|\kappa_k(\mathbf w[\alpha_1], \ldots, \mathbf w[\alpha_k])|\prod_{l=1}^m |\mathbf v_l[\alpha_l]|.$$
Now consider any $\pi \in \mathcal P_k^e$. Note that
$$\sum_{\alpha=1}^n 1 = n, \qquad \sum_{\alpha=1}^n |\mathbf v_l[\alpha]| \le \sqrt n\,\|\mathbf v_l\|_2, \qquad \sum_{\alpha=1}^n |\mathbf v_{l_1}[\alpha]\cdots\mathbf v_{l_p}[\alpha]| \le \|\mathbf v_{l_1}\|_2\cdots\|\mathbf v_{l_p}\|_2 \text{ for } p \ge 2. \qquad (5.3)$$
Let $K_0(\pi)$ and $K_1(\pi)$ be the numbers of blocks $B \in \pi$ to which 0 and 1, respectively, of the first $m$ indices $\alpha_1, \ldots, \alpha_m$ belong. Then, applying (5.1) and (5.3), we get
$$|\langle\kappa_k(\mathbf w), \mathbf v_1 \otimes \cdots \otimes \mathbf v_m \otimes T\rangle| \prec \|T\|_{\mathcal U}\prod_{l=1}^m\|\mathbf v_l\|_2\sum_{\pi \in \mathcal P_k^e}\sqrt n^{\,2K_0(\pi) + K_1(\pi)}. \qquad (5.4)$$
Note that if 0 (or 1) such indices belong to $B$, then $B$ has at least 2 (resp. 1) indices from $m+1, \ldots, k$, so $2K_0 + K_1 \le k - m$. This establishes a weaker version of (2.1), with exponent $k - m$ in place of $k - m - 1$.

To improve this exponent to $k - m - 1$, consider any $\pi \in \mathcal P_k^e$ such that $2K_0(\pi) + K_1(\pi) = k - m$. Each block of $\pi$ counted by $K_0$ must have exactly 2 indices in $m+1, \ldots, k$, and each block of $\pi$ counted by $K_1$ must have exactly 1 index in $m+1, \ldots, k$. If $K_0 \ge 1$, let $B_* \in \pi$ be a block counted by $K_0$, and suppose for notational convenience that $B_* = \{k-1, k\}$. Denote $\alpha_{[k-2]} = (\alpha_1, \ldots, \alpha_{k-2})$. Then we may apply (5.2) to bound
$$\sum_{\alpha \in [n]^k : \pi(\alpha) = \pi}|\kappa_k(\mathbf w[\alpha_1], \ldots, \mathbf w[\alpha_k])|\prod_{l=1}^m |\mathbf v_l[\alpha_l]| \le \sum_{\alpha_{[k-2]} : \pi(\alpha_{[k-2]}) = \pi\setminus B_*}\prod_{l=1}^m |\mathbf v_l[\alpha_l]|\sum_{\beta=1}^n |\kappa(\mathbf w[\alpha_1], \ldots, \mathbf w[\alpha_{k-2}], \mathbf w[\beta], \mathbf w[\beta])|$$
$$\prec \sqrt n\sum_{\alpha_{[k-2]} : \pi(\alpha_{[k-2]}) = \pi\setminus B_*}\prod_{l=1}^m |\mathbf v_l[\alpha_l]| \le \sqrt n\cdot n^{K_0 - 1}\cdot n^{K_1/2} = \sqrt n^{\,k-m-1}.$$
If $K_0 = 0$ and $K_1 = k - m$, let $B_* \in \pi$ be a block counted by $K_1$, and suppose for notational convenience that $B_* = \{1, k\}$.
Denote in this case $\alpha_{[k-2]} = (\alpha_2, \ldots, \alpha_{k-1})$. Then we may apply (5.2) to bound
$$\sum_{\alpha \in [n]^k : \pi(\alpha) = \pi}|\kappa_k(\mathbf w[\alpha_1], \ldots, \mathbf w[\alpha_k])|\prod_{l=1}^m |\mathbf v_l[\alpha_l]| \le \sum_{\alpha_{[k-2]} : \pi(\alpha_{[k-2]}) = \pi\setminus B_*}\prod_{l=2}^m |\mathbf v_l[\alpha_l]|\sum_{\beta=1}^n |\kappa(\mathbf w[\beta], \mathbf w[\alpha_2], \ldots, \mathbf w[\alpha_{k-1}], \mathbf w[\beta])|\,|\mathbf v_1[\beta]|$$
$$\le \sum_{\alpha_{[k-2]} : \pi(\alpha_{[k-2]}) = \pi\setminus B_*}\prod_{l=2}^m |\mathbf v_l[\alpha_l]|\Big(\sum_{\beta=1}^n \kappa(\mathbf w[\beta], \mathbf w[\alpha_2], \ldots, \mathbf w[\alpha_{k-1}], \mathbf w[\beta])^2\Big)^{1/2}\|\mathbf v_1\|_2 \prec \sum_{\alpha_{[k-2]} : \pi(\alpha_{[k-2]}) = \pi\setminus B_*}\prod_{l=2}^m |\mathbf v_l[\alpha_l]| \le n^{(K_1 - 1)/2} = \sqrt n^{\,k-m-1}.$$
Applying these cases to (5.4) for the partitions $\pi \in \mathcal P_k^e$ with $2K_0(\pi) + K_1(\pi) = k - m$, this shows that Assumption 3 holds. The argument to extend to general $d \asymp n$ and $X \ne I$ is the same as in Proposition 2.12. □

Verification of Example 2.16. We verify the claims of Example 2.16. We first check that the conditions of Proposition 2.13 for $\mathbf w$ hold under (2.17) and (2.18): the conditions $\mathbb E\mathbf w\mathbf w^* = I$ and $\mathbb E\prod_{i=1}^k\mathbf w[\alpha_i] = 0$ if $\alpha_1 \notin \{\alpha_2, \ldots, \alpha_k\}$ are immediate from (2.17). Furthermore, (2.17) implies that the conditional law $(\mathbf w \mid \lambda)$ satisfies Assumption 2 uniformly over $\lambda \in \Lambda$. To check Assumption 2 unconditionally, for any matrix $A \in \mathbb R^{n \times n}$ we may decompose
$$\mathbf w^* A\mathbf w - \operatorname{Tr} A = (\mathbf w^* A\mathbf w - \mathbb E[\mathbf w^* A\mathbf w \mid \lambda]) + (\mathbb E[\mathbf w^* A\mathbf w \mid \lambda] - \operatorname{Tr} A) = (\mathbf w^* A\mathbf w - \mathbb E[\mathbf w^* A\mathbf w \mid \lambda]) + (\mathbb E[\mathbf w[\alpha]^2 \mid \lambda] - 1)\operatorname{Tr} A. \qquad (5.5)$$
The first term is $O_\prec(\|A\|_F)$ by Assumption 2 conditional on $\lambda$, while the second is $O_\prec(\|A\|_F)$ by (2.18) and the bound $|\operatorname{Tr} A| \le \sqrt n\,\|A\|_F$.

Next, for the specific example of (2.19), we note that $\mathbb E[\mathbf w[\alpha]^2 \mid \lambda] - 1 = c_n\lambda$. Then taking $A = I$ in (5.5) shows that if $c_n \gg n^{-1/2}$, then Assumption 2 does not hold.
In this case, we also have
$$\operatorname{Var}[\operatorname{Tr} K] = N^{-1}\operatorname{Var}[\|\mathbf w_i\|_2^2] = N^{-1}\big(\mathbb E[\operatorname{Var}[\|\mathbf w_i\|_2^2 \mid \lambda]] + \operatorname{Var}[\mathbb E[\|\mathbf w_i\|_2^2 \mid \lambda]]\big) \asymp 1 + Nc_n^2 \gg 1.$$
Then Theorem 2.5 cannot hold, as the eigenvalue rigidity in Corollary 2.7 would instead imply the concentration $|\operatorname{Tr} K - \mathbb E\operatorname{Tr} K| \prec 1$ of the linear spectral statistic $\operatorname{Tr} K$.

Finally, let us check that for this model (2.19), Assumption 3 holds as long as $c_n \prec n^{-1/4}$: since $\mathbf w$ is sign-invariant, we have $\kappa_k(\mathbf w) = 0$ for any odd $k$. It is therefore enough to consider the case where $k \ge 4$ is even. Let $\gamma = 1 + c_n\lambda$, denote by $\mathcal P_k$ the set of all partitions of $[k]$, and let $\mathcal P_k'$ be the set of all pairings of $[k]$. For any $T \in (\mathbb R^n)^{\otimes k}$, we have by the law of total cumulance
$$\langle\kappa_k(\mathbf w), T\rangle = \sum_{\alpha_1, \ldots, \alpha_k}\kappa(\mathbf w[\alpha_1], \ldots, \mathbf w[\alpha_k])\,T[\alpha_1, \ldots, \alpha_k] = \sum_{\alpha_1, \ldots, \alpha_k}\sum_{\pi \in \mathcal P_k}\kappa\big(\kappa(\mathbf w[\alpha_b] : b \in B \mid \gamma) : B \in \pi\big)\,T[\alpha_1, \ldots, \alpha_k]$$
$$= \sum_{\alpha_1, \ldots, \alpha_k}\sum_{\pi \in \mathcal P_k'}\kappa_{k/2}(\gamma)\prod_{\{p,q\} \in \pi}\kappa(\mathbf w[\alpha_p], \mathbf w[\alpha_q])\,T[\alpha_1, \ldots, \alpha_k] = \kappa_{k/2}(\gamma)\sum_{\pi \in \mathcal P_k'}\sum_{\alpha_1, \ldots, \alpha_k}T[\alpha_1, \ldots, \alpha_k]\prod_{\{p,q\} \in \pi}\mathbf 1_{\alpha_p = \alpha_q},$$
where we used the fact that $(\mathbf w \mid \gamma)$ has i.i.d. $\mathcal N(0, \gamma)$ entries, so that $\kappa(\mathbf w[\alpha_b] : b \in B \mid \gamma) = \gamma$ if $B = \{p, q\}$ with $\alpha_p = \alpha_q$, and $\kappa(\mathbf w[\alpha_b] : b \in B \mid \gamma) = 0$ otherwise. Then, for any even $k \ge 4$, $1 \le m \le k-1$, $\mathbf s_1, \ldots, \mathbf s_m \in \mathbb R^n$, and $T \in (\mathbb R^n)^{\otimes k-m}$, we have
$$\langle\kappa_k(\mathbf w), \mathbf s_1 \otimes \cdots \otimes \mathbf s_m \otimes T\rangle = \kappa_{k/2}(\gamma)\sum_{\pi \in \mathcal P_k'}\sum_{\alpha_1, \ldots, \alpha_k}\mathbf s_1[\alpha_1]\cdots\mathbf s_m[\alpha_m]\,T[\alpha_{m+1}, \ldots, \alpha_k]\prod_{\{p,q\} \in \pi}\mathbf 1_{\alpha_p = \alpha_q}.$$
Note that $|\kappa_{k/2}(\gamma)| = c_n^{k/2}|\kappa_{k/2}(\lambda)| \prec c_n^{k/2} \prec n^{-1/2}$, where the last bound holds for $c_n \prec n^{-1/4}$ and $k \ge 4$.
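For the smallest even case $k = 4$, the pairing formula derived above predicts $\langle\kappa_4(\mathbf w), \mathbf e_\alpha^{\otimes 4}\rangle = \kappa_2(\gamma)\cdot|\mathcal P_4'| = 3\kappa_2(\gamma)$. This can be verified exactly; the sketch below (our own toy instance of (2.19), taking $\lambda$ a Rademacher sign and $c_n = 1/10$) uses the exact Gaussian moments $\mathbb E z^2 = 1$, $\mathbb E z^4 = 3$:

```python
from fractions import Fraction

# Toy instance of the Gaussian scale mixture: w[a] = sqrt(gamma) * z with
# z ~ N(0,1), gamma = 1 + c*lam, and lam = +-1 with equal probability.
c = Fraction(1, 10)
gammas = [1 + c, 1 - c]                    # two equally likely values of gamma
E = lambda f: sum(f(g) for g in gammas) / 2
m2 = E(lambda g: g)                        # E w^2 = E[gamma] * E[z^2]
m4 = E(lambda g: 3 * g**2)                 # E w^4 = E[gamma^2] * E[z^4]
kappa4 = m4 - 3 * m2**2                    # 4th cumulant of a symmetric variable
kappa2_gamma = E(lambda g: g**2) - E(lambda g: g) ** 2
print(kappa4, 3 * kappa2_gamma)            # both equal 3*c^2 = 3/100
```

Note $\kappa_4 = 3c^2 \prec 1$ here, but it does not vanish: the mixture over $\gamma$ creates exactly the residual cumulant structure that Assumption 3 controls via the $c_n \prec n^{-1/4}$ condition.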
Thus we may bound, similarly to the proof of Proposition 2.13,
$$|\langle\kappa_k(\mathbf w), \mathbf s_1 \otimes \cdots \otimes \mathbf s_m \otimes T\rangle| \prec n^{-1/2}\|T\|_\infty\max_{\pi \in \mathcal P_k'}\sum_{\alpha_1, \ldots, \alpha_k}|\mathbf s_1[\alpha_1]|\cdots|\mathbf s_m[\alpha_m]|\prod_{\{p,q\} \in \pi}\mathbf 1_{\alpha_p = \alpha_q} \prec n^{-1/2}\|T\|_\infty\, n^{(k-m)/2}\|\mathbf s_1\|_2\cdots\|\mathbf s_m\|_2 \prec n^{(k-m-1)/2}\|\mathbf s_1\|_2\cdots\|\mathbf s_m\|_2\|T\|_\infty,$$
establishing Assumption 3. □

5.3. Random features model. We now show Proposition 2.17. To simplify notation, we will assume that $X \in \mathbb R^{n \times d}$ and $\mathbf w \in \mathbb R^d$ have equal dimensions $n = d$. This is without loss of generality: if $n \ne d$, it is equivalent to show Proposition 2.17 with $X$, $\mathbf w$, and/or $\mathbf s_1, \ldots, \mathbf s_m, T$ extended by 0-padding so that all dimensions equal $\max(n, d)$.

5.3.1. Tensor networks and bipolar orientations.

Definition 5.2 (Tensor network). We say that a tuple $(G, f)$ is a tensor network if the following hold:
(1) $G = (\mathcal V, \mathcal E)$ is a multigraph with no self-loops. Each vertex $v \in \mathcal V$ has an ordered list $\partial v \subseteq \mathcal E$ of its incident edges, and each edge $e = (u, v)$ has an ordering of its 2 incident vertices $u, v \in \mathcal V$. We denote by $\deg(v) = |\partial v|$ the vertex degree, counting edge multiplicity.
(2) $f$ is a labeling of $\mathcal V \cup \mathcal E$ such that $f(v) \in (\mathbb R^n)^{\otimes\deg(v)}$ is a tensor of order $\deg(v)$ for each vertex $v \in \mathcal V$, and $f(e) \in \mathbb R^{n \times n}$ is a matrix for each edge $e \in \mathcal E$.

The contracted value of this tensor network is then given by
$$\operatorname{val}(G, f) = \sum_{i_{v,e} \in [n] :\, v \in \mathcal V,\, e \in \partial v}\;\prod_{v \in \mathcal V} f(v)[i_{v,e} : e \in \partial v]\prod_{e = (u,v) \in \mathcal E} f(e)[i_{u,e}, i_{v,e}],$$
where the summation is over one index $i_{v,e} \in [n]$ for each vertex-edge pair $(v, e)$ such that $v$ is incident to $e$, and $[i_{v,e} : e \in \partial v]$ denotes the ordered tuple of such indices for a given vertex $v \in \mathcal V$.

Note that if one permutes the order of the edges $\partial v$ incident to $v \in \mathcal V$, and permutes correspondingly the indices of the tensor label $f(v)$, then $\operatorname{val}(G, f)$ remains unchanged.
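The contracted value in Definition 5.2 is exactly an Einstein summation, so small instances can be computed with `numpy.einsum`. The sketch below (our own toy example) contracts the network with two vertices $u, v$ joined by two parallel edges labeled $A$ and $B$, and checks the result against an equivalent trace formula:

```python
import numpy as np

# Tensor network: V = {u, v}, two parallel edges e1, e2 between them,
# vertex labels U = f(u), V = f(v) (order-2 tensors) and edge labels
# A = f(e1), B = f(e2) (matrices).  One summation index per vertex-edge
# incidence gives
#   val(G, f) = sum_{i,k,j,l} U[i,k] V[j,l] A[i,j] B[k,l].
rng = np.random.default_rng(0)
n = 4
U, V, A, B = (rng.standard_normal((n, n)) for _ in range(4))
val = np.einsum('ik,jl,ij,kl->', U, V, A, B)
# The same contraction, rewritten as a trace of matrix products:
assert np.isclose(val, np.trace(U.T @ A @ V @ B.T))
```

Reordering $\partial u$ while permuting the indices of $f(u)$ accordingly simply relabels the summation indices, which is why $\operatorname{val}(G, f)$ is invariant under such permutations.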
Similarly, if one exchanges the vertex order of $e = (u, v)$ and replaces $f(e)$ by the transpose $f(e)^*$, then $\operatorname{val}(G, f)$ also remains unchanged. If two tensor networks $(G, f)$ and $(G', f')$ are equivalent under some such permutations of their vertex/edge orderings and labels, we will consider $(G, f)$ and $(G', f')$ to be the same tensor network; otherwise, $(G, f)$ and $(G', f')$ are distinct. We will omit the specification of the ordering of $\partial v$ or $e = (u, v)$ if the tensor/matrix label $f(v)$ or $f(e)$ is symmetric.

Definition 5.3 (Bipolar orientation). Let $G = (\mathcal V, \mathcal E)$ be a multigraph with no self-loops. $G$ is biconnected if it consists of a single connected component and, furthermore, $G$ is not broken into disconnected components by removing any single vertex $v \in \mathcal V$ together with its incident edges $\partial v$. For two vertices $s, t \in \mathcal V$, $G$ admits an $(s, t)$-bipolar orientation if there is a choice of direction for each edge such that $G$ has no directed cycles, $s$ is the unique source (vertex with no incoming edges), and $t$ is the unique sink (vertex with no outgoing edges).

We note that the existence of an $(s, t)$-bipolar orientation depends only on which pairs of vertices are connected by an edge in $G$, and not on the edge multiplicities. It is well known that the following are equivalent (see e.g. [RT86, Section 2]):
(1) $G$ admits an $(s, t)$-bipolar orientation.
(2) $G$ admits an $(s, t)$-numbering, i.e. a numbering of its vertices $v_1, \ldots, v_m$ with $m = |\mathcal V|$ such that $s = v_1$, $t = v_m$, and each other vertex is adjacent to both a lower-numbered and a higher-numbered vertex.
(3) The multigraph $G^+ = (\mathcal V, \mathcal E \cup (s, t))$ obtained by adding the additional edge $(s, t)$ is biconnected.

For any tensor $T \in (\mathbb R^n)^{\otimes k}$ and any partition of $\{1, \ldots, k\}$ into two disjoint subsets $\mathcal I, \mathcal J$, we denote by $\operatorname{mat}_{\mathcal I, \mathcal J}(T) \in \mathbb R^{n^{|\mathcal I|} \times n^{|\mathcal J|}}$ its matricization or flattening with respect to this index partition, and by $\operatorname{vec}(T) \in \mathbb R^{n^k}$ its vectorization. We define the norms
$$\|T\|_F = \|\operatorname{vec}(T)\|_2, \qquad \|T\|_{\mathcal I \to \mathcal J} = \|\operatorname{mat}_{\mathcal I, \mathcal J}(T)\|_{\mathrm{op}}, \qquad \|T\|_{\text{mat-op}} = \max_{\substack{\mathcal I, \mathcal J :\, \mathcal I \sqcup \mathcal J = [k] \\ |\mathcal I| \ge 1,\, |\mathcal J| \ge 1}}\|T\|_{\mathcal I \to \mathcal J}.$$
Thus $\|T\|_{\text{mat-op}}$ is the maximum of the matrix operator norm $\|T\|_{\mathcal I \to \mathcal J}$ over all flattenings of $T$ corresponding to all possible partitions of $\{1, \ldots, k\}$ into two non-empty index sets. The following lemma is similar to the analyses of [MS12], and allows us to bound the value of any tensor network admitting a bipolar orientation.

Lemma 5.4. Let $(G, f)$ be a tensor network, where $G = (\mathcal V, \mathcal E)$ admits an $(s, t)$-bipolar orientation. Then
$$|\operatorname{val}(G, f)| \le \|f(s)\|_F\|f(t)\|_F\prod_{v \in \mathcal V\setminus\{s,t\}}\|f(v)\|_{\text{mat-op}}\prod_{e \in \mathcal E}\|f(e)\|_{\mathrm{op}}.$$

Proof. Number the vertices $\mathcal V = \{v_1, \ldots, v_m\}$ according to an $(s, t)$-numbering, and consider the bipolar orientation in which each edge is directed from a lower-numbered vertex to a higher-numbered vertex. By replacing the label $f(e)$ with $f(e)^*$ as needed, we may assume that the vertices of each edge $e = (u, v)$ in Definition 5.2 are also ordered such that $u$ has lower number than $v$. For each $k \in \{2, \ldots, m-1\}$, let $\mathcal I_k, \mathcal O_k$ be the partition of $\{1, \ldots, \deg(v_k)\}$ corresponding to the incoming and outgoing (ordered) edges of $v_k$, respectively. We may then express
$$\operatorname{val}(G, f) = \operatorname{vec}(f(v_1))^\top E_2 M_2 \cdots E_{m-1} M_{m-1} E_m \operatorname{vec}(f(v_m))$$
where
• Each $M_k$ for $k = 2, \ldots, m-1$ is a matrix given by the tensor product of $\operatorname{mat}_{\mathcal I_k, \mathcal O_k}(f(v_k))$ and a number of identity matrices.
• Each $E_k$ for $k = 2, \ldots, m$ is a matrix given by the tensor product of the matrices $\{f(e) : e \text{ is an incoming edge of } v_k\}$ and a number of identity matrices.
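The norm $\|T\|_{\text{mat-op}}$ and the bound of Lemma 5.4 can be checked numerically on a small network. In the sketch below (our own toy example), the network has source $s$ of degree 1, middle vertex $v$ of degree 3, and sink $t$ of degree 2, with one edge $s \to v$ (label $A$) and two parallel edges $v \to t$ (labels $B$, $C$); this orientation is $(s,t)$-bipolar, so Lemma 5.4 applies:

```python
import numpy as np
from itertools import combinations

def mat_op_norm(T):
    """||T||_mat-op: max matrix operator norm over all nontrivial flattenings."""
    k, n = T.ndim, T.shape[0]
    axes, best = list(range(T.ndim)), 0.0
    for r in range(1, k):
        for I in combinations(axes, r):
            J = [a for a in axes if a not in I]
            M = np.transpose(T, list(I) + J).reshape(n ** len(I), n ** len(J))
            best = max(best, np.linalg.norm(M, 2))  # spectral norm
    return best

rng = np.random.default_rng(2)
n = 3
s = rng.standard_normal(n)            # f(s), order-1 tensor
t = rng.standard_normal((n, n))       # f(t), order-2 tensor
Tv = rng.standard_normal((n, n, n))   # f(v), order-3 tensor
A, B, C = (rng.standard_normal((n, n)) for _ in range(3))
# Contract the network s --A--> v ==B,C==> t (one index per incidence):
val = np.einsum('i,ip,pqr,qu,rw,uw->', s, A, Tv, B, C, t)
bound = (np.linalg.norm(s) * np.linalg.norm(t, 'fro') * mat_op_norm(Tv)
         * np.linalg.norm(A, 2) * np.linalg.norm(B, 2) * np.linalg.norm(C, 2))
assert abs(val) <= bound   # the inequality of Lemma 5.4
```

Here $s, t$ contribute Frobenius norms, the interior vertex contributes its mat-op norm, and each edge contributes an operator norm, exactly as in the factorization $\operatorname{vec}(f(v_1))^\top E_2 M_2 \cdots E_m \operatorname{vec}(f(v_m))$ above.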
As an example, consider the following network with $\mathcal V = \{v_1, \ldots, v_6\}$:

[Figure: a network on vertices $v_1, \ldots, v_6$ with vertex labels $f(v_1), \ldots, f(v_6)$ and edge labels $f(\{1,2\})$, $f(\{1,3\})$, $f(\{2,3\})$, $f(\{2,5\})$, $f(\{3,4\})$, $f(\{4,5\})$, $f(\{4,6\})$, $f(\{5,6\})$, together with its factorization into the matrices $M_2, \ldots, M_{m-1}$ and $E_2, \ldots, E_m$; e.g. $M_2 = I \otimes \operatorname{mat}_{1,2}(f(v_2))$, $M_3 = \operatorname{mat}_{2,1}(f(v_3)) \otimes I$, $M_4 = \operatorname{mat}_{1,2}(f(v_4)) \otimes I$, $M_5 = I \otimes \operatorname{mat}_{2,1}(f(v_5))$, with endpoint vectors $\operatorname{vec}(f(v_1))$ and $\operatorname{vec}(f(v_6))$.]

More generally, for each directed edge $e = (v_{k'}, v_k)$, we introduce a factor of the identity matrix $I \in \mathbb R^{n \times n}$ into the tensor products defining each of the matrices $M_{k'+1}, \ldots, M_{k-1}$ and $E_{k'+1}, \ldots, E_{k-1}$, and a factor of $f(e) \in \mathbb R^{n \times n}$ into that defining $E_k$, to form a path of degree-2 vertices from $v_{k'}$ to $v_k$ whose labels have product $f(e)$. Then, since $\|M_k\|_{\mathrm{op}} = \|f(v_k)\|_{\mathcal I_k \to \mathcal O_k}$ and $\|E_k\|_{\mathrm{op}} = \prod_{\text{incoming edges } e \text{ of } v_k}\|f(e)\|_{\mathrm{op}}$, where each edge appears in exactly one such term $\|E_k\|_{\mathrm{op}}$, this implies the bound
$$|\operatorname{val}(G, f)| \le \|\operatorname{vec}(f(v_1))\|_2\|\operatorname{vec}(f(v_m))\|_2\prod_{k=2}^{m-1}\|M_k\|_{\mathrm{op}}\prod_{k=2}^m\|E_k\|_{\mathrm{op}} \le \|f(s)\|_F\|f(t)\|_F\prod_{v \in \mathcal V\setminus\{s,t\}}\|f(v)\|_{\text{mat-op}}\prod_{e \in \mathcal E}\|f(e)\|_{\mathrm{op}}. \qquad \Box$$

Many of the networks that arise in our proofs will be of the following form, which by the following lemma admits a bipolar orientation:

Lemma 5.5. Suppose $(G, f)$ is a tensor network where $G = (\{s, t\} \cup A \cup B, \mathcal E)$ has its vertices partitioned into three disjoint sets $\{s, t\}, A, B$, such that
• The subgraph induced by $A \cup B$ is connected.
• Each vertex of $\{s,t\}$ has at least 1 incident edge connecting to $A$, and each vertex $v \in A$ has at least 1 incident edge connecting to $\{s,t\}$.
• Each vertex $u \in B$ has at least 2 distinct neighbors in $A$.
Then $G$ admits an $(s,t)$-bipolar orientation.

Proof. It suffices to check that the augmented multigraph $G^+ = (\{s,t\} \cup A \cup B, \mathcal{E} \cup (s,t))$ is biconnected, i.e. it remains connected upon removing any vertex $v$ and its incident edges. We denote this graph by $G^+ \setminus \{v\}$. For $a \in \{s,t\}$, $G^+ \setminus \{a\}$ remains connected since the subgraph induced by $A \cup B$ is connected, and the remaining vertex of $\{s,t\}$ different from $a$ is a neighbor of $A$. For $u \in B$, note that all vertices of $A$ are neighbors of $\{s,t\}$, which are neighbors of each other via the added edge $(s,t)$. Thus $\{s,t\} \cup A$ forms a single connected component. Upon removing $u$, each remaining vertex of $B$ is a neighbor of $A$, so $G^+ \setminus \{u\}$ is connected. For $v \in A$, by the same reasoning, all vertices of $\{s,t\} \cup (A \setminus \{v\})$ form a single connected component of $G^+ \setminus \{v\}$. Upon removing $v$ and its incident edges, each $u \in B$ remains a neighbor of $A \setminus \{v\}$, since $u$ has at least 2 distinct neighbors in $A$. Thus $G^+ \setminus \{v\}$ is connected. $\Box$

5.3.2. Tensor network representation.

Lemma 5.6. Let $x \in \R^n$ be a random vector having mean 0 and finite moments of all orders, and let $x^{\odot l} = (x[i]^l)_{i=1}^n \in \R^n$ be its entrywise $l$th power. Let $k \geq 1$, $l_1, \ldots, l_k \geq 1$, and $U \in (\R^n)^{\otimes k}$. Then
\[ \langle \kappa_k(x^{\odot l_1}, \ldots, x^{\odot l_k}), U \rangle = \sum_{(G,f) \in \mathcal{G}} \mathrm{val}(G, f) \]
where $\mathcal{G} \equiv \mathcal{G}(l_1, \ldots, l_k, U)$ is the set of all distinct tensor networks $(G, f)$ satisfying the properties:
(1) $G = (\{t\} \cup K \cup V, \mathcal{E})$ where the vertices are partitioned into disjoint sets $\{t\}, K, V$.
Each edge $e \in \mathcal{E}$ has label $f(e) = I \in \R^{n \times n}$, and connects one vertex of $\{t\} \cup K$ with one vertex of $V$ (hence $G$ is bipartite). Furthermore, the bipartite subgraph induced by $K \cup V$ is connected.
(2) $V = \{v_1, \ldots, v_k\}$ where each $v_i$ has degree $\deg(v_i) = l_i + 1$ and label $f(v_i) = \mathcal{I} \in (\R^n)^{\otimes l_i + 1}$. Here $\mathcal{I}$ is the order-$(l_i+1)$ identity tensor, having all diagonal entries 1 and off-diagonal entries 0.
(3) The vertex $t$ has $\deg(t) = k$, label $f(t) = U$, and its ordered edges connect to $v_1, \ldots, v_k$.
(4) Each vertex $u \in K$ has $\deg(u) \geq 2$ and label $f(u) = \kappa_{\deg(u)}(x) \in (\R^n)^{\otimes \deg(u)}$.

Proof. By linearity, it suffices to show the lemma when $U = e_{i_1} \otimes \cdots \otimes e_{i_k}$ is a standard basis element, for every choice of indices $i_1, \ldots, i_k \in [n]$. We induct on $k$. For $k = 1$ and $U = e_i$, we have by the moment-cumulant relation
\[ \langle \kappa_1(x^{\odot l}), U \rangle = \kappa_1(x[i]^l) = \E[x[i]^l] = \sum_{\pi \in \mathcal{P}_l} \prod_{B \in \pi} \kappa_{|B|}(x[i]), \]
where $\mathcal{P}_l$ is the set of all partitions of $\{1, \ldots, l\}$. Since $x$ has mean 0, the summand corresponding to $\pi \in \mathcal{P}_l$ is 0 if $\pi$ has a singleton block. Each remaining summand corresponding to $\pi = \{B_1, \ldots, B_j\}$ may be understood as $\mathrm{val}(G, f)$ for the tensor network described by the lemma, where $K$ has $j$ vertices of degrees $|B_1|, \ldots, |B_j|$ with all edges connecting to the single vertex $v \in V$. The set $\mathcal{G}(l, U)$ of such tensor networks is in 1-to-1 correspondence with such summands $\pi \in \mathcal{P}_l$. This proves the lemma for $k = 1$.

Now suppose the lemma holds for all $k \leq t$, and consider $k = t+1$ and $U = e_{i_1} \otimes \cdots \otimes e_{i_{t+1}}$. By the moment-cumulant relation for a mixed moment of order $t+1$,
\[ \E\Bigl[\prod_{m=1}^{t+1} x[i_m]^{l_m}\Bigr] = \sum_{\pi \in \mathcal{P}_{t+1}} \prod_{B \in \pi} \kappa_{|B|}(x[i_b]^{l_b} : b \in B). \]
Let $1_{t+1} \in \mathcal{P}_{t+1}$ denote the partition having one block containing all $t+1$ elements.
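As an aside, the scalar moment-cumulant relation used in the $k = 1$ base case is easy to verify numerically. The sketch below (our own illustration; `partitions` is a hypothetical helper) enumerates all set partitions of $[4]$ and checks $\E[x^4] = \sum_{\pi \in \mathcal{P}_4} \prod_{B \in \pi} \kappa_{|B|}(x)$ for a centered Poisson variable, whose cumulants equal $\lambda$ for every order $\geq 2$; partitions with a singleton block drop out exactly as in the proof.

```python
from itertools import combinations
from math import prod

def partitions(s):
    """Enumerate all set partitions of the list s."""
    if not s:
        yield []
        return
    first, rest = s[0], s[1:]
    for r in range(len(rest) + 1):
        for block in combinations(rest, r):
            remaining = [x for x in rest if x not in block]
            for p in partitions(remaining):
                yield [[first, *block]] + p

# Centered Poisson(lam): kappa_1 = 0 and kappa_k = lam for every k >= 2.
lam = 2.0
kappa = lambda k: 0.0 if k == 1 else lam

# Moment-cumulant relation: E[x^4] = sum over partitions pi of {1,2,3,4}
# of prod over blocks B of kappa_{|B|}.
moment4 = sum(prod(kappa(len(B)) for B in p) for p in partitions([1, 2, 3, 4]))

# Singleton blocks vanish, leaving the full block {1,2,3,4} and the three
# pair-pair partitions: lam + 3*lam**2 = 14, the fourth central moment
# of Poisson(lam), lam*(1 + 3*lam).
print(moment4)  # 14.0
```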
Then rearranging gives
\[ \langle \kappa_{t+1}(x^{\odot l_1}, \ldots, x^{\odot l_{t+1}}), U \rangle = \kappa_{t+1}(x[i_1]^{l_1}, \ldots, x[i_{t+1}]^{l_{t+1}}) = \E\Bigl[\prod_{m=1}^{t+1} x[i_m]^{l_m}\Bigr] - \sum_{\pi \in \mathcal{P}_{t+1} \setminus 1_{t+1}} \prod_{B \in \pi} \kappa_{|B|}(x[i_b]^{l_b} : b \in B). \]
On the other hand, denote by $\mathcal{A}(l_1, \ldots, l_k, U)$ the set of all distinct tensor networks $(G, f)$ satisfying all properties of the lemma except the property that $K \cup V$ is connected. Let $L = l_1 + \ldots + l_{t+1}$, and let $j_1 = \ldots = j_{l_1} = i_1$, $j_{l_1+1} = \ldots = j_{l_1+l_2} = i_2$, \ldots, $j_{L-l_{t+1}+1} = \ldots = j_L = i_{t+1}$. Applying the moment-cumulant relation for a mixed moment of order $L$, we have
\[ \E\Bigl[\prod_{m=1}^{t+1} x[i_m]^{l_m}\Bigr] = \E\Bigl[\prod_{m=1}^{L} x[j_m]\Bigr] = \sum_{\pi \in \mathcal{P}_L} \prod_{B \in \pi} \kappa_{|B|}(x[j_b] : b \in B) = \sum_{(G,f) \in \mathcal{A}(l_1, \ldots, l_{t+1}, U)} \mathrm{val}(G, f). \]
For each $\pi \in \mathcal{P}_{t+1} \setminus 1_{t+1}$ and block $B \in \pi$, let $U_B = \otimes_{b \in B}\, e_{i_b}$ be the corresponding factors of $U$. Then by the induction hypothesis, we have
\[ \kappa_{|B|}(x[i_b]^{l_b} : b \in B) = \sum_{(G,f) \in \mathcal{G}((l_b : b \in B), U_B)} \mathrm{val}(G, f). \]
Taking the product over blocks $B \in \pi$, we get
\[ \prod_{B \in \pi} \kappa_{|B|}(x[i_b]^{l_b} : b \in B) = \sum_{(G,f) \in \mathcal{G}_\pi(l_1, \ldots, l_{t+1}, U)} \mathrm{val}(G, f) \]
where, if $\pi = \{B_1, \ldots, B_j\}$, then $\mathcal{G}_\pi(l_1, \ldots, l_{t+1}, U)$ is the set of tensor networks in $\mathcal{A}(l_1, \ldots, l_{t+1}, U)$ for which $K \cup V$ splits exactly into $j$ connected components, with one component corresponding to each block $B \in \pi$ that contains the vertices $\{v_b \in V : b \in B\}$. Then, since $\mathcal{G}(l_1, \ldots, l_{t+1}, U) = \mathcal{G}_{1_{t+1}}(l_1, \ldots, l_{t+1}, U)$ is precisely the set of tensor networks with a single connected component,
\[ \mathcal{A}(l_1, \ldots, l_{t+1}, U) \setminus \mathcal{G}(l_1, \ldots, l_{t+1}, U) = \bigcup_{\pi \in \mathcal{P}_{t+1} \setminus 1_{t+1}} \mathcal{G}_\pi(l_1, \ldots, l_{t+1}, U). \]
Thus
\[ \langle \kappa_{t+1}(x^{\odot l_1}, \ldots, x^{\odot l_{t+1}}), U \rangle = \sum_{(G,f) \in \mathcal{A}(l_1, \ldots, l_{t+1}, U)} \mathrm{val}(G, f) - \sum_{(G,f) \in \mathcal{A}(l_1, \ldots, l_{t+1}, U) \setminus \mathcal{G}(l_1, \ldots, l_{t+1}, U)} \mathrm{val}(G, f) = \sum_{(G,f) \in \mathcal{G}(l_1, \ldots, l_{t+1}, U)} \mathrm{val}(G, f). \]
This completes the inductive argument. $\Box$

5.3.3. Proof of Assumption 2.

Lemma 5.7. Let $x \in \R^n$ be a random vector having mean 0 and finite moments of all orders. Then for any $k \in \N$ and any deterministic, symmetric matrix $A \in \R^{n \times n}$,
\[ \E(x^* A x - \E x^* A x)^{2k} \leq \sum_{\text{partitions } \pi \text{ of } [4k]} \prod_{B \in \pi} \|\kappa_{|B|}(x)\|_{\text{mat-op}} \cdot \|A\|_F^{2k}. \]

Proof. Let $\Sigma = \E xx^*$. We can rewrite the expectation as
\[ \E(x^* A x - \E x^* A x)^{2k} = \E[\mathrm{Tr}((xx^* - \Sigma)A)]^{2k} = \langle \E[(xx^* - \Sigma)^{\otimes 2k}], A^{\otimes 2k} \rangle. \]
Applying Lemma 3.15, the above value may be decomposed as
\[ \langle \E[(xx^* - \Sigma)^{\otimes 2k}], A^{\otimes 2k} \rangle = \sum_{(G,f) \in \mathcal{N}} \mathrm{val}(G, f) \]
where $\mathcal{N}$ is the set of all distinct tensor networks $(G, f)$ satisfying the properties:
(1) $G = (\mathcal{V}, \mathcal{E})$ where the vertices $\mathcal{V} = \mathcal{A} \cup \mathcal{B}$ are partitioned into disjoint sets $\mathcal{A}, \mathcal{B}$. Each edge $e \in \mathcal{E}$ has label $f(e) = I \in \R^{n \times n}$, and connects one vertex of $\mathcal{A}$ with one vertex of $\mathcal{B}$ (hence $G$ is bipartite).
(2) $\mathcal{A} = \{a_1, \ldots, a_{2k}\}$ where each vertex $a_i$ has degree 2 and label $f(a_i) = A \in \R^{n \times n}$.
(3) $\mathcal{B} = \{b_1, \ldots, b_m\}$ for some $1 \leq m \leq 2k$, where each vertex $b_i$ satisfies $2 \leq \deg(b_i) \leq 4k$ and has label $f(b_i) = \kappa_{\deg(b_i)}(x)$.
(4) If any $b_i$ has degree 2, then the two edges incident to $b_i$ connect to distinct neighbors in $\mathcal{A}$.
Here $\deg(b_i) \geq 2$ because $\E x = \kappa_1(x) = 0$, so only diagrams without $\kappa_1(x)$ contribute. Additionally, Lemma 3.15 ensures that no copy of $A$ is contracted with a single cumulant tensor $\kappa_2(x) = \Sigma$, implying property (4). Note that these properties ensure each connected component of $G$ contains at least 2 vertices from $\mathcal{A}$, and each vertex of $\mathcal{B}$ has at least 2 distinct neighbors in $\mathcal{A}$.
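Before completing the proof, the statement of Lemma 5.7 can be sanity-checked by Monte Carlo in the simplest case $x \sim \mathcal{N}(0, I)$, where $\|\kappa_2(x)\|_{\text{mat-op}} = 1$ and all higher cumulants vanish, so the right side is of order $\|A\|_F^{2k}$. The sketch below (illustrative only; the constant 100 is a crude stand-in for the partition sum, not a claimed sharp constant) checks the cases $2k = 2$, where the expectation equals $2\|A\|_F^2$ exactly, and $2k = 4$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 30, 100_000
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                        # deterministic symmetric A

x = rng.standard_normal((trials, n))     # x ~ N(0, I): kappa_k = 0 for k >= 3
q = np.einsum('ti,ij,tj->t', x, A, x) - np.trace(A)   # x^T A x - E x^T A x
fro = np.linalg.norm(A, 'fro')

m2 = np.mean(q**2)                       # = 2 ||A||_F^2 exactly for Gaussian x
m4 = np.mean(q**4)                       # O(||A||_F^4), as Lemma 5.7 asserts
print(abs(m2 / fro**2 - 2) < 0.1, m4 <= 100 * fro**4)
```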
Since $\mathrm{val}(G, f)$ factorizes across connected components of $G$, we may bound $\mathrm{val}(G', f)$ for each connected component $G'$ of $G$. Let us modify $G'$ to yield a network satisfying the conditions of Lemma 5.5: Let $A = U \mathrm{diag}(v) U^*$ be the spectral decomposition of $A$. For each $a \in \mathcal{A} \cap G'$, denote its two incident edges $e_1, e_2$. We introduce a new vertex $v$, connect $v$ to $a$ via a new edge $e_3$ so that $\deg(v) = 1$ and $\deg(a) = 3$, and relabel as $f(v) = v$, $f(a) = \mathcal{I} \in (\R^n)^{\otimes 3}$ (the diagonal order-3 tensor with all diagonal entries 1), $f(e_3) = I \in \R^{n \times n}$, and $f(e_1) = f(e_2) = U \in \R^{n \times n}$. Note that this does not change $\mathrm{val}(G', f)$. Let $\mathcal{V}$ be the set of newly added vertices. Then $|\mathcal{V}| = |\mathcal{A} \cap G'| \geq 2$. We partition $\mathcal{V}$ into two non-empty sets $\mathcal{V}_1, \mathcal{V}_2$ of cardinalities $i, j \geq 1$, merge the vertices in $\mathcal{V}_1$ into a single vertex $s$ having label $f(s) = v^{\otimes i}$, and merge the vertices in $\mathcal{V}_2$ into a single vertex $t$ having label $f(t) = v^{\otimes j}$. This also does not change $\mathrm{val}(G', f)$. It is readily checked that $G'$ now satisfies the condition of Lemma 5.5. Thus by Lemma 5.4,
\[ |\mathrm{val}(G', f)| \leq \|v\|_2^i \|v\|_2^j \|U\|_{\mathrm{op}}^{2|\mathcal{A} \cap G'|} \prod_{b \in \mathcal{B} \cap G'} \|\kappa_{\deg(b)}(x)\|_{\text{mat-op}} = \|A\|_F^{|\mathcal{A} \cap G'|} \prod_{b \in \mathcal{B} \cap G'} \|\kappa_{\deg(b)}(x)\|_{\text{mat-op}}, \]
where we have used $i + j = |\mathcal{V}| = |\mathcal{A} \cap G'|$ and $\|v\|_2 = \|A\|_F$. Multiplying across all connected components $G'$ of $G$ and then summing over all $(G, f) \in \mathcal{N}$ completes the proof. $\Box$

Lemma 5.8. Fix any $k \geq 1$ and $l_1, \ldots, l_k \geq 1$. Let $|\mathcal{G}(l_1, \ldots, l_k)|$ be the cardinality of the set $\mathcal{G} \equiv \mathcal{G}(l_1, \ldots, l_k, U)$ in Lemma 5.6 (where this cardinality does not depend on $U$). Let $l = l_1 + \ldots + l_k$, and set $C_l(w) = \sup_{\text{partitions } \pi \text{ of } [l]} \prod_{B \in \pi} \|\kappa_{|B|}(w)\|_\infty$. Then
\[ \|\kappa_k((Xw)^{\odot l_1}, \ldots, (Xw)^{\odot l_k})\|_{\text{mat-op}} \leq C_l(w) \|X\|_{\mathrm{op}}^l \, |\mathcal{G}(l_1, \ldots, l_k)|. \]

Proof. Consider any partition of $[k]$ into two disjoint non-empty sets $I, J$, and suppose $U \in (\R^n)^{\otimes k}$ factorizes as $S \otimes T$ with respect to this partition, for some $S \in (\R^n)^{\otimes |I|}$ and $T \in (\R^n)^{\otimes |J|}$. We have by Lemma 5.6
\[ \langle \kappa_k((Xw)^{\odot l_1}, \ldots, (Xw)^{\odot l_k}), U \rangle = \sum_{(G,f) \in \mathcal{G}} \mathrm{val}(G, f) \]
where $\mathcal{G} \equiv \mathcal{G}(l_1, \ldots, l_k, U)$. We proceed to modify each tensor network $(G, f) \in \mathcal{G}$ via a sequence of steps that do not change $\mathrm{val}(G, f)$, to yield a network satisfying the requirements of Lemma 5.5:
(1) The vertex $t$ in Lemma 5.6 has label $f(t) = U = S \otimes T$. Since $f(t)$ factorizes, we may split $t$ into two new vertices $s, t$ with labels $f(s) = S$ and $f(t) = T$, such that edges corresponding to $I$ connect to $s$ and edges corresponding to $J$ connect to $t$.
(2) Each vertex $u \in K$ has label $f(u) = \kappa_{\deg(u)}(Xw)$, with $\deg(u)$ incident edges having label $I$. By multilinearity of mixed cumulants, we may relabel $f(u) = \kappa_{\deg(u)}(w)$ and relabel each incident edge $e = (v, u)$ with $X$. Note that $f(u)$ is then diagonal because $w$ has independent entries.
(3) For each vertex $u \in K$ having a single unique neighbor $v \in V$: Let its incident edges be $e_1, \ldots, e_{\deg(u)} = (v, u)$. Note that $\deg(v) \geq \deg(u) + 2$ since $v$ is connected to at least one of $\{s, t\}$ and to $(K \cup V) \setminus \{u, v\}$. (If $K \cup V = \{u, v\}$, this continues to hold as then $v$ is connected to both $\{s, t\}$.) We contract $u$ by removing $u$ and $e_1, \ldots, e_{\deg(u)}$ from $(G, f)$, and replacing the diagonal tensor label $f(v) \equiv \mathrm{diag}(\tilde{v}) \in (\R^n)^{\otimes \deg(v)}$ with the new label $\mathrm{diag}(\hat{v}) \in (\R^n)^{\otimes \deg(v) - \deg(u)}$ having entries
\[ \hat{v}[i] = \tilde{v}[i] \times \Bigl( \sum_{j=1}^n X[i,j]^{\deg(u)} \kappa_{\deg(u)}(w)[j] \Bigr). \]
At the end of Step 2, we have
\[ \|f(s)\|_F \|f(t)\|_F \prod_{v \in K \cup V} \|f(v)\|_\infty \prod_{e \in \mathcal{E}} \|f(e)\|_{\mathrm{op}} \leq C_l(w) \|X\|_{\mathrm{op}}^l \|S\|_F \|T\|_F. \quad (5.6) \]
For each application of Step 3, since $\deg(u) \geq 2$, we have the bound
\[ \|\hat{v}\|_\infty \leq \|\tilde{v}\|_\infty \|\kappa_{\deg(u)}(w)\|_\infty \|X\|_\infty^{\deg(u)-2} \max_{i=1}^n \sum_{j=1}^n X[i,j]^2 \leq \|\tilde{v}\|_\infty \|\kappa_{\deg(u)}(w)\|_\infty \|X\|_{\mathrm{op}}^{\deg(u)}, \]
and thus the inequality (5.6) continues to hold after Step 3. It is readily checked that after all three steps of the above procedure, $G$ satisfies the condition of Lemma 5.5 with $A \equiv V$ and $B \equiv K$. So applying Lemma 5.4 gives
\[ |\langle \kappa_k(x^{\odot l_1}, \ldots, x^{\odot l_k}), U \rangle| \leq \sum_{(G,f) \in \mathcal{G}} |\mathrm{val}(G, f)| \leq C_l(w) \|X\|_{\mathrm{op}}^l |\mathcal{G}(l_1, \ldots, l_k)| \cdot \|S\|_F \|T\|_F. \]
Since $I, J$ and $S, T$ defining $U = S \otimes T$ are arbitrary, by definition of $\|\cdot\|_{\text{mat-op}}$, this completes the proof. $\Box$

Proof of Proposition 2.17, Assumption 2. Denote $a_l = (a_{il})_{i=1}^n$ (where in the setting of Proposition 2.17(a), $a_l = 0$ for $l > D$). For any fixed $k \geq 1$ and $l_1, \ldots, l_k \geq 1$, we may expand by multilinearity of the mixed cumulant
\[ \kappa_k(a_{l_1} \odot (Xw)^{\odot l_1}, \ldots, a_{l_k} \odot (Xw)^{\odot l_k}) = (a_{l_1} \otimes \cdots \otimes a_{l_k}) \odot \kappa_k((Xw)^{\odot l_1}, \ldots, (Xw)^{\odot l_k}). \]
Thus, noting that $g = \sigma(Xw) = \sum_{l=0}^\infty a_l \odot (Xw)^{\odot l}$ and any mixed cumulant $\kappa_k(\ldots)$ is 0 if an argument is constant (non-random), we have
\[ \|\kappa_k(g)\|_{\text{mat-op}} = \Bigl\| \sum_{l_1, \ldots, l_k = 1}^\infty (a_{l_1} \otimes \cdots \otimes a_{l_k}) \odot \kappa_k((Xw)^{\odot l_1}, \ldots, (Xw)^{\odot l_k}) \Bigr\|_{\text{mat-op}} \leq \sum_{l_1, \ldots, l_k = 1}^\infty \Bigl( \prod_{i=1}^k \|a_{l_i}\|_\infty \Bigr) \|\kappa_k((Xw)^{\odot l_1}, \ldots, (Xw)^{\odot l_k})\|_{\text{mat-op}} \leq \sum_{l_1, \ldots, l_k = 1}^\infty \Bigl( \prod_{i=1}^k \|a_{l_i}\|_\infty \Bigr) C_l(w) \|X\|_{\mathrm{op}}^l |\mathcal{G}(l_1, \ldots, l_k)|, \]
the last inequality using Lemma 5.8. In the setting of Proposition 2.17(a), $C_l(w)$, $\|X\|_{\mathrm{op}}^l$, $|\mathcal{G}(l_1, \ldots, l_k)|$, and $\prod_{i=1}^k \|a_{l_i}\|_\infty$ are all bounded by a constant depending only on $k, D$, so $\|\kappa_k(g)\|_{\text{mat-op}} \leq C$ for a constant $C \equiv C(k, D) > 0$.

In the setting of Proposition 2.17(b) where $w \sim \mathcal{N}(0, I)$, we have $\kappa_2(w) = I$ and $\kappa_k(w) = 0$ for all $k \geq 3$. Then $C_l(w) = 1$ for all even $l \geq 2$, and $C_l(w) = 0$ for odd $l$. In Lemma 5.6, also $\mathrm{val}(G, f) = 0$ unless $l = l_1 + \ldots + l_k$ is even and each vertex $u \in K$ has degree exactly 2, in which case $|\mathcal{G}(l_1, \ldots, l_k)| \leq l!! \leq C^l l^{l/2}$, where $l!!$ is the number of partitions of $[l]$ into $l/2$ blocks of size 2. Applying the given condition $\|a_{l_i}\|_\infty < C(l_i!)^{-\beta} < C(l_i/e)^{-\beta l_i}$ and the inequality $\sum_{i=1}^k l_i \log l_i > l \log(l/k)$, we have that $\prod_{i=1}^k \|a_{l_i}\|_\infty < (C_k e^\beta)^l l^{-\beta l}$ for a constant $C_k$ depending only on $k$. Then, for a constant $C' > 0$ not depending on $k, l$ and for any $l \geq k$,
\[ \sum_{\substack{l_1, \ldots, l_k \geq 1 \\ l_1 + \ldots + l_k = l}} \Bigl( \prod_{i=1}^k \|a_{l_i}\|_\infty \Bigr) C_l(w) \|X\|_{\mathrm{op}}^l |\mathcal{G}(l_1, \ldots, l_k)| \leq \sum_{\substack{l_1, \ldots, l_k \geq 1 \\ l_1 + \ldots + l_k = l}} (C_k)^l l^{(\frac{1}{2}-\beta)l} \leq (C_k^2)^l l^{(\frac{1}{2}-\beta)l}. \quad (5.7) \]
Since $\beta > 1/2$, this is summable over $l \geq k$, so also $\|\kappa_k(g)\|_{\text{mat-op}} \leq C$ for a constant $C \equiv C(k, \beta) > 0$ in this case.

Then by Lemma 5.7, for any $A \in \R^{n \times n}$, we have $\E(x^* A x - \E x^* A x)^{2k} \leq C \|A\|_F^{2k}$ for each fixed $k \geq 1$ and a constant $C \equiv C(k) > 0$ (where this holds also for asymmetric $A$ upon applying the lemma with $(A + A^*)/2$). This implies Assumption 2 by Markov's inequality. $\Box$

5.3.4. Proof of Assumption 3. For each integer $l \geq 0$, let us now define a set of vectors $\mathcal{U}(l) \subset \R^n$ for verifying Assumption 3, as follows: Define first the normalized matrices and scalar cumulants, for $k \geq 2$,
\[ \bar{X} = X / \|X\|_{\mathrm{op}}, \qquad \bar{\kappa}_k(w[j]) = \kappa_k(w[j]) / \|\kappa_k(w)\|_\infty. \]
(If $\|\kappa_k(w)\|_\infty = 0$, we set $\bar{\kappa}_k(w[j]) = 0$.)
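The pairing count used in case (b) above, $|\mathcal{G}(l_1, \ldots, l_k)| \leq l!! \leq C^l l^{l/2}$, is elementary: fixing one element and choosing its partner leaves a smaller pairing problem, giving $(l-1)(l-3)\cdots 1$ pair partitions. A small sketch (our own illustration) computes this count and checks it against $l^{l/2}$:

```python
import math

def pair_partitions(l):
    """Number of partitions of {1,...,l} into l/2 blocks of size 2 (even l):
    fix element 1, choose its partner (l-1 ways), recurse -> (l-1)(l-3)...1."""
    return math.prod(range(l - 1, 0, -2))

counts = [pair_partitions(l) for l in (2, 4, 6, 8)]
print(counts)  # [1, 3, 15, 105]

# Each of the l/2 factors is at most l, giving the bound l!! <= l^{l/2}.
assert all(pair_partitions(l) <= l ** (l / 2) for l in range(2, 21, 2))
```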
Define the diagonal matrices
\[ D_k = \mathrm{diag}\Bigl( \Bigl( \sum_{j=1}^n \bar{X}[i,j]^k \bar{\kappa}_k(w[j]) \Bigr)_{i=1}^n \Bigr), \qquad K_k = \mathrm{diag}\bigl( (\bar{\kappa}_k(w[j]))_{j=1}^n \bigr). \]
Let $\mathcal{M}_0 = \{I, \bar{X}, \bar{X}^*\} \cup \{D_k\}_{k \geq 2} \cup \{K_k\}_{k \geq 2}$. We say that $I$ has degree 0 in both $X$ and $w$, that $\bar{X}, \bar{X}^*$ have degree 1 in $X$ and 0 in $w$, that $D_k$ has degree $k$ in both $X$ and $w$, and that $K_k$ has degree 0 in $X$ and $k$ in $w$. For each $j \geq 1$, let $\mathcal{M}_j = \{AB, A \odot B : A, B \in \mathcal{M}_{j-1}\}$ (where $AB$ is the matrix product and $A \odot B$ is the entrywise product). Each set $\mathcal{M}_j$ is closed under matrix transpose, since this holds for $\mathcal{M}_0$. If $A, B$ have degrees $a_x, b_x$ in $X$ and $a_w, b_w$ in $w$, then we say that $AB$ and $A \odot B$ have degrees $a_x + b_x$ in $X$ and $a_w + b_w$ in $w$. Let $\mathcal{M} = \bigcup_{j \geq 0} \mathcal{M}_j$, i.e. $\mathcal{M}$ consists of all matrices obtained from $I, \bar{X}, D_k, K_k$ through combinations of matrix products, entrywise products, and matrix transposes, and let $\mathcal{M}(l) \subset \mathcal{M}$ be those elements with degree at most $l$ in $X$ and at most $l$ in $w$. We set
\[ \mathcal{U}(l) = \{M e_i : M \in \mathcal{M}(l),\ M \neq 0,\ i \in [n]\}. \quad (5.8) \]
The following lemma is the main step to verify Assumption 3.

Lemma 5.9. Fix any $k \geq 3$ and $l_1, \ldots, l_k \geq 1$, and let $l = l_1 + \ldots + l_k$. Let $|\mathcal{G}(l_1, \ldots, l_k)|$ and $C_l(w)$ be as defined in Lemma 5.8. Then for any $m \in \{1, \ldots, k-1\}$, $s_1, \ldots, s_m \in \R^n$ and $T \in (\R^n)^{\otimes k - m}$,
\[ \bigl| \langle \kappa_k((Xw)^{\odot l_1}, \ldots, (Xw)^{\odot l_k}), s_1 \otimes \cdots \otimes s_m \otimes T \rangle \bigr| \leq C_l(w) \|X\|_{\mathrm{op}}^l |\mathcal{G}(l_1, \ldots, l_k)| \times \sqrt{n}^{\,k-m-1} \|T\|_{\mathcal{U}(l)} \prod_{i=1}^m \|s_i\|_2. \quad (5.9) \]
We proceed to show Lemma 5.9. For the later combinatorial arguments and casework, it will be convenient to first specialize Lemma 5.6 to this setting where $U = s_1 \otimes \cdots \otimes s_m \otimes T$, and to simplify the networks by contracting vertices with 1 or 2 unique neighbors. The structure of the resulting simplified networks is captured by the following lemma.

Lemma 5.10.
In the setting of Lemma 5.9, there exists a set $\mathcal{G}$ of tensor networks $(G, f)$ having cardinality $|\mathcal{G}(l_1, \ldots, l_k)|$ such that
\[ \langle \kappa_k((Xw)^{\odot l_1}, \ldots, (Xw)^{\odot l_k}), s_1 \otimes \cdots \otimes s_m \otimes T \rangle = \sum_{(G,f) \in \mathcal{G}} \mathrm{val}(G, f), \]
and each $(G, f) \in \mathcal{G}$ satisfies the following properties:
(1) $G = (\{s,t\} \cup K \cup V, \mathcal{E})$ where the vertices are partitioned into disjoint sets $\{s,t\}, K, V$. The label $f(v)$ for each $v \in K \cup V$ is a diagonal tensor, the label $f(s)$ is a rank-1 tensor (or a vector if $\deg(s) = 1$), and the label $f(e)$ for each edge incident to $\{s,t\}$ is $I \in \R^{n \times n}$.
(2) The subgraph induced by $K \cup V$ is connected. Every edge that is incident to a vertex in $\{s,t\} \cup K$ connects to a vertex in $V$. (Vertices of $V$ may also connect to each other, so $G$ is not necessarily bipartite.) The vertices $\{s,t\}$ have degrees $\deg(s) = m$ and $\deg(t) = k - m$, and each vertex of $V$ is incident to at least one edge connecting to $\{s,t\}$.
(3) Each vertex $u \in K \cup V$ has $\deg(u) \geq 3$, and any two vertices $u, v \in K \cup V$ are connected by at most 1 edge (i.e. the subgraph induced by $K \cup V$ is simple).
(4) Let $\mathcal{U}(l)$ be as defined in (5.8), and let $C_l(w)$ be as defined in Lemma 5.9. Then
\[ \|f(s)\|_F \|f(t)\|_\infty \prod_{v \in K \cup V} \|f(v)\|_\infty \prod_{e \in \mathcal{E}} \|f(e)\|_{\mathrm{op}} \leq C_l(w) \|X\|_{\mathrm{op}}^l \|T\|_{\mathcal{U}(l)} \prod_{i=1}^m \|s_i\|_2. \quad (5.10) \]

Proof. We begin with the representation of Lemma 5.6 applied to $x = Xw$,
\[ \langle \kappa_k((Xw)^{\odot l_1}, \ldots, (Xw)^{\odot l_k}), s_1 \otimes \cdots \otimes s_m \otimes T \rangle = \sum_{(G,f) \in \mathcal{G}} \mathrm{val}(G, f) \]
where $\mathcal{G} \equiv \mathcal{G}(l_1, \ldots, l_k, s_1 \otimes \cdots \otimes s_m \otimes T)$. We proceed to modify each tensor network $(G, f) \in \mathcal{G}$ via a sequence of steps that do not change $\mathrm{val}(G, f)$, to yield a network satisfying the stated properties:
(1) As in Lemma 5.8, we split the vertex $t$ with label $f(t) = s_1 \otimes \cdots \otimes s_m \otimes T$ into two vertices $s, t$ with labels $f(s) = s_1 \otimes \cdots \otimes s_m$ and $f(t) = T$, such that the first $m$ edges are incident to $s$ and the last $k - m$ edges are incident to $t$.
(2) As in Lemma 5.8, we relabel each $u \in K$ by (the diagonal label) $f(u) = \kappa_{\deg(u)}(w)$ and relabel each incident edge $e = (v, u)$ with $X$. The resulting network satisfies Conditions 1-2 of Lemma 5.10.
(3) We then repeatedly apply the following steps. It is readily checked that each step preserves $\mathrm{val}(G, f)$ and Conditions 1-2:
(a) As in Lemma 5.8, if a vertex $u \in K$ has a single unique neighbor $v \in V$ with incident edges $e_1, \ldots, e_{\deg(u)} = (v, u)$, we contract $u$ by removing $u$ and $e_1, \ldots, e_{\deg(u)}$ from $(G, f)$, and replacing the label $f(v) \equiv \mathrm{diag}(\tilde{v}) \in (\R^n)^{\otimes \deg(v)}$ with the new label $\mathrm{diag}(\hat{v}) \in (\R^n)^{\otimes \deg(v) - \deg(u)}$ having entries
\[ \hat{v}[i] = \tilde{v}[i] \times \underbrace{\Bigl( \sum_{j=1}^n X[i,j]^{\deg(u)} \kappa_{\deg(u)}(w)[j] \Bigr)}_{= \|X\|_{\mathrm{op}}^{\deg(u)} D_{\deg(u)}[i,i]}. \]
(b) Suppose two vertices $u, v \in K \cup V$ each have at least 2 distinct neighbors, and $u, v$ are also connected by $a \geq 2$ edges $e_1, \ldots, e_a = (u, v)$. We replace these edges by a single edge $e = (u, v)$ having label $f(e) = f(e_1) \odot \cdots \odot f(e_a)$ (the entrywise product), and reduce the orders of $f(u), f(v)$ by $a - 1$ while keeping their diagonal entries the same.
(c) Suppose a vertex $u \in K$ has $\deg(u) = 2$ and two distinct neighbors $v, v' \in V$. Let $e = (v, u)$ and $e' = (u, v')$. We contract $u$ by removing $u$ and $e, e'$, and replacing these with the single new edge $e'' = (v, v')$ labeled by $f(e'') = f(e) f(u) f(e')$.
(d) Suppose a vertex $v \in V$ has $\deg(v) = 2$ and is adjacent to one vertex in $\{s,t\}$ (say, $s$ via its last incident edge) and one vertex $u \in K \cup V$. Let $e = (s, v)$ and $e' = (v, u)$, and note that $f(u)$ is diagonal and $f(e) = I$.
We contract $v$ by removing $v$ and $e, e'$, replacing these by a single new edge $e'' = (s, u)$ with label $f(e'') = I$, reassigning $u$ to the vertex set $V$ if it belongs to $K$ (so that each edge incident to $s$ still connects to $V$), and replacing $f(s) \equiv \widetilde{S} \in (\R^n)^{\otimes \deg(s)}$ by the new label $\widehat{S} \in (\R^n)^{\otimes \deg(s)}$ given by the contraction
\[ \widehat{S}[i_1, \ldots, i_{\deg(s)}] = \sum_{j=1}^n \widetilde{S}[i_1, \ldots, i_{\deg(s)-1}, j] \, f(v)[j,j] \, f(e')[j, i_{\deg(s)}]. \]
Note that if $\widetilde{S} = \tilde{s}_1 \otimes \cdots \otimes \tilde{s}_{\deg(s)}$, then $\widehat{S} = \tilde{s}_1 \otimes \cdots \otimes \tilde{s}_{\deg(s)-1} \otimes (f(e')^* f(v) \tilde{s}_{\deg(s)})$, so $\widehat{S}$ remains rank-1, as required in Condition 1.

We apply the above steps (a-d) iteratively in any order, until none of these steps are further applicable. (The number of such applications is finite, since each step removes at least one vertex or edge.)

We now check Conditions 3-4 for this resulting network: Note first that since Conditions 1-2 hold, either every $v \in V$ has at least 1 neighbor in $\{s,t\}$ and at least 1 neighbor in $(K \cup V) \setminus \{v\}$, or $K \cup V = \{v\}$ consists of the single vertex $v$ which has $m$ edges connecting to $s$ and $k - m$ edges connecting to $t$. In both cases, $v$ has at least 2 distinct neighbors. We have $\deg(v) \geq 3$ in the former case because Step 3(d) is no longer applicable, and $\deg(v) \geq 3$ in the latter case because $k \geq 3$. For each $u \in K$, we also have that $u$ has at least 2 distinct neighbors and $\deg(u) \geq 3$, because Steps 3(a,c) are no longer applicable. Then any two vertices $u, v \in K \cup V$ are connected by at most 1 edge, as Step 3(b) is no longer applicable. This shows Condition 3.

To check Condition 4, note that the network $(G, f)$ after Step 2 has $f(s) = s_1 \otimes \cdots \otimes s_m$, $f(t) = T$, and
\[ \|f(s)\|_F \|f(t)\|_\infty \prod_{v \in K \cup V} \|f(v)\|_\infty \prod_{e \in \mathcal{E}} \|f(e)\|_{\mathrm{op}} = \Bigl( \prod_{i=1}^m \|s_i\|_2 \Bigr) \|T\|_\infty \Bigl( \prod_{u \in K} \|\kappa_{\deg(u)}(w)\|_\infty \Bigr) \|X\|_{\mathrm{op}}^l. \quad (5.11) \]
Thus Condition 4 holds for this network. For each application of Step 3(a), since $\deg(u) \geq 2$, we have
\[ \|\hat{v}\|_\infty \leq \|\tilde{v}\|_\infty \|\kappa_{\deg(u)}(w)\|_\infty \|X\|_\infty^{\deg(u)-2} \max_{i=1}^n \sum_{j=1}^n X[i,j]^2 \leq \|\tilde{v}\|_\infty \|\kappa_{\deg(u)}(w)\|_\infty \|X\|_{\mathrm{op}}^{\deg(u)}. \]
For each application of Step 3(b), we have $\|f(e)\|_{\mathrm{op}} = \|f(e_1) \odot \cdots \odot f(e_a)\|_{\mathrm{op}} \leq \|f(e_1)\|_{\mathrm{op}} \cdots \|f(e_a)\|_{\mathrm{op}}$. For each application of Step 3(c), we have $\|f(e'')\|_{\mathrm{op}} = \|f(e) f(u) f(e')\|_{\mathrm{op}} \leq \|f(e)\|_{\mathrm{op}} \|f(u)\|_\infty \|f(e')\|_{\mathrm{op}}$. For each application of Step 3(d) to $s$, we have $\|\widehat{S}\|_F = \|\tilde{s}_1\|_2 \cdots \|\tilde{s}_{\deg(s)-1}\|_2 \|f(e')^* f(v) \tilde{s}_{\deg(s)}\|_2 \leq \|f(e')\|_{\mathrm{op}} \|f(v)\|_\infty \|\widetilde{S}\|_F$. Thus each of these operations does not increase the left side of (5.11).

Each time Step 3(d) is applied to $t$, a matrix of the form $f(v) f(e')$ is contracted with one axis of the tensor $f(t) \in (\R^n)^{\otimes k - m}$. Here, $f(v) f(e')$ is obtained from successively applying matrix products, entrywise products, and transposes to the matrices $\|X\|_{\mathrm{op}}^{k'} D_{k'}$, $\mathrm{diag}((\kappa_{k'}(w[j]))_{j=1}^n) = \|\kappa_{k'}(w)\|_\infty K_{k'}$, and $X = \|X\|_{\mathrm{op}} \bar{X}$ arising in the preceding applications of Steps 3(a-c). Hence, the final label $f(t)$ is given by the contraction of $T$ along each axis $j = 1, \ldots, k-m$ with a matrix of the form $\|X\|_{\mathrm{op}}^{a_j} \prod_{k' \in B_j} \|\kappa_{k'}(w)\|_\infty \cdot M_j$ for some $a_j \geq 0$, some list $B_j$ (possibly empty) of integers $\geq 2$, and some matrix $M_j \in \mathcal{M}$ having degree $a_j$ in $X$ and $b_j := \sum_{k' \in B_j} k'$ in $w$, where $\mathcal{M}$ and this notion of degree are as defined preceding (5.8). In the network after Step 2, we have $l$ edges with label $X$, and also $\sum_{u \in K} \deg(u) = l$.
Thus, for the label $f(t)$ after the conclusion of the above steps, we must have $a_1 + \ldots + a_{k-m} \leq l$ and $b_1 + \ldots + b_{k-m} \leq l$, and in particular $M_j \in \mathcal{M}(l)$ for each $j = 1, \ldots, k-m$. Thus
\[ \|f(t)\|_\infty \leq \|T\|_{\mathcal{U}(l)} \prod_{j=1}^{k-m} \Bigl( \|X\|_{\mathrm{op}}^{a_j} \prod_{k' \in B_j} \|\kappa_{k'}(w)\|_\infty \Bigr). \]
Whenever Step 3(d) is applied to contract a matrix of the form $f(v) f(e') = \|X\|_{\mathrm{op}}^a \prod_{k' \in B} \|\kappa_{k'}(w)\|_\infty \cdot M$ into $f(t)$, this factor of $\|X\|_{\mathrm{op}}^a \prod_{k' \in B} \|\kappa_{k'}(w)\|_\infty$ is removed from the bound (5.11) for the remaining factors on the left side of (5.11). Thus, for the final network $(G, f)$, we have the bound
\[ \|f(s)\|_F \|f(t)\|_\infty \prod_{v \in K \cup V} \|f(v)\|_\infty \prod_{e \in \mathcal{E}} \|f(e)\|_{\mathrm{op}} \leq \Bigl( \prod_{i=1}^m \|s_i\|_2 \Bigr) \|T\|_{\mathcal{U}(l)} \Bigl( \prod_{u \in K} \|\kappa_{\deg(u)}(w)\|_\infty \Bigr) \|X\|_{\mathrm{op}}^l \]
which shows Condition 4 since $\prod_{u \in K} \|\kappa_{\deg(u)}(w)\|_\infty \leq C_l(w)$. $\Box$

Note that each network $(G, f)$ of Lemma 5.10 satisfies the conditions of Lemma 5.5 with $A = V$ and $B = K$. Then, by Lemma 5.4, we have
\[ |\mathrm{val}(G, f)| \leq \|f(s)\|_F \|f(t)\|_F \prod_{u \in K \cup V} \|f(u)\|_{\text{mat-op}} \prod_{e \in \mathcal{E}} \|f(e)\|_{\mathrm{op}}. \]
Applying also $\|f(t)\|_F \leq \sqrt{n}^{\,k-m} \|f(t)\|_\infty$ for any $f(t) \in (\R^n)^{\otimes k-m}$, and $\|f(u)\|_{\text{mat-op}} = \|f(u)\|_\infty$ because $f(u)$ is diagonal, Lemma 5.10 then implies the bound
\[ \bigl| \langle \kappa_k((Xw)^{\odot l_1}, \ldots, (Xw)^{\odot l_k}), s_1 \otimes \cdots \otimes s_m \otimes T \rangle \bigr| \leq \sum_{(G,f) \in \mathcal{G}} |\mathrm{val}(G, f)| \leq \sum_{(G,f) \in \mathcal{G}} \sqrt{n}^{\,k-m} \|f(s)\|_F \|f(t)\|_\infty \prod_{u \in K \cup V} \|f(u)\|_\infty \prod_{e \in \mathcal{E}} \|f(e)\|_{\mathrm{op}} \leq C_l(w) \|X\|_{\mathrm{op}}^l |\mathcal{G}(l_1, \ldots, l_k)| \times \sqrt{n}^{\,k-m} \|T\|_{\mathcal{U}(l)} \prod_{i=1}^m \|s_i\|_2. \]
This shows a weaker form of (5.9) with factor $\sqrt{n}^{\,k-m}$ instead of $\sqrt{n}^{\,k-m-1}$. We conclude the proof of Lemma 5.9 by improving this factor to $\sqrt{n}^{\,k-m-1}$ using a more involved combinatorial argument and some casework.

Lemma 5.11.
Let $(G, f) \in \mathcal{G}$ be any tensor network satisfying the properties of Lemma 5.10. Suppose that $k - m \geq 2$, and that the vertex $t$ connects to $k - m$ distinct neighbors $V_T \subseteq V$. Let $G'$ be the subgraph of $G$ induced by $\{s\} \cup K \cup V$. Then $G'$ contains a tree $H$ satisfying the following property: Call a leaf vertex $v$ of $H$ "good" if $v \in V_T$ and $v$ is connected via a path of edges in $G' \setminus H$ to either a different vertex $u \in V_T$ or to $s$. Then $H$ has at least 2 good leaf vertices.

Proof. Since the subgraph induced by $K \cup V$ is connected, there exists at least one tree in this subgraph containing $V_T$. Among all such trees, let $H$ be one having the smallest number of vertices. Then each leaf of $H$ belongs to $V_T$, and all edges incident to $s$ belong to $G' \setminus H$. Let $V_S \subseteq V$ be the vertices neighboring $s$. Note that $V = V_S \cup V_T$, and $V_S \cap V_T$ may be non-empty as a vertex can neighbor both $s$ and $t$. Let us call an edge of $G' \setminus H$ a "non-tree edge". Since $|V_T| = \deg(t) = k - m$, the vertex $t$ has only a single edge in $G$ connecting to each vertex of $V_T$. Then, since every $v \in V_T$ has degree at least 3 in $G$, every leaf $v$ of $H$ must have at least one incident non-tree edge $(v, w)$. If $w = s$, $w \in V_T$, or $w \in V_S$ (and hence connects to $s$ via a non-tree edge), then by definition $v$ is good. If $w \in K$ and $w \notin H$, then letting $u \in V_S \cup V_T$ be any neighbor of $w$ different from $v$, $(w, u)$ must be a non-tree edge, so $v$ is also good. Thus we have established the following claim: If a leaf vertex $v$ of $H$ is not good, then there exists a non-tree edge $(v, w)$ where $w \in H \cap K$.

Note that if $H$ consists of two leaf vertices connected by a single edge, or if $H$ is a star consisting of a single vertex $u \in K \cup V$ connecting by an edge to all other vertices of $H$ (which are leaves and hence belong to $V_T$), then no such non-tree edge $(v, w)$ can exist.
Then all leaf vertices of $H$ are good, so in particular $H$ has at least 2 good leaves, and the lemma holds for $H$.

For any other tree $H$, there is a path of at least 3 distinct edges in $H$, say $u - r - r' - v$. Let us call the two components of $H$ connected to $r$ and to $r'$ upon removing the edge $r - r'$ the "sub-trees rooted at $r$ and $r'$" respectively. Let $L$ be the set of leaves in the sub-tree rooted at $r$ that have maximal distance from $r$, and define similarly $L'$ for the sub-tree rooted at $r'$. If $L$ and $L'$ each contain a good leaf, then again $H$ has at least 2 good leaves, so the lemma holds for $H$. Otherwise, at least one of $L$ and $L'$, say $L$, does not contain any good leaf. We then consider the following modification of $H$: Take any $v \in L$, let $p(v)$ be its parent in the sub-tree rooted at $r$, and let $L_v$ be the set of all children of $p(v)$ (including $v$). Each $v' \in L_v$ is a leaf vertex that is not good, so by the above claim, for each $v' \in L_v$, there exists a non-tree edge $(v', w(v'))$ where $w(v') \in H \cap K$. We consider the graph $\tilde{H}$ that removes the edges $\{(v', p(v)) : v' \in L_v\}$ from $H$ and adds the new edges $\{(v', w(v')) : v' \in L_v\}$. Note that $\tilde{H}$ is a tree having the same vertices as $H$, that each $v' \in L_v$ and $p(v)$ itself are now leaves of $\tilde{H}$, and that each leaf of $\tilde{H}$ other than $p(v)$ was a leaf of $H$ and hence belongs to $V_T$. We must have $p(v) \in V_T$, because otherwise we may remove $p(v)$ from $\tilde{H}$ to obtain a smaller tree containing $V_T$, contradicting that $H$ was such a tree having the smallest number of vertices. Thus all leaves of $\tilde{H}$ belong to $V_T$. Furthermore, each $v' \in L_v$ connects to $p(v)$ by an edge not belonging to $\tilde{H}$, so $p(v)$ and all vertices of $L_v$ are good leaves of $\tilde{H}$.
Th us ˜ H has at least 2 go o d leav es, so the lemma holds for ˜ H . □ Pr o of of L emma 5.9 . Consider any tensor net w ork ( G, f ) ∈ G given in Lemma 5.10 . Recall that the vertex t has deg( t ) = k − m and label f ( t ) ∈ ( R n ) ⊗ k − m . W e will establish the b ound | v al( G, f ) | ≤ √ n k − m − 1 ∥ f ( s ) ∥ F ∥ f ( t ) ∥ ∞ Y v ∈K∪V ∥ f ( v ) ∥ ∞ Y e ∈E ∥ f ( e ) ∥ op . (5.12) Case 1: Supp ose that t has strictly few er than k − m distinct neighbors, i.e. that some tw o edges inciden t to t — say , its last tw o edges — connect to the same vertex of V . Consider the mo dified net w ork ( ˜ G, ˜ f ) that replaces the last t wo edges incident to t b y a single edge with lab el ˜ f ( e ) = I , and replaces f ( t ) by ˜ f ( t ) ∈ ( R n ) ⊗ k − m − 1 ha ving entries ˜ f ( t )[ i 1 , . . . , i k − m − 1 ] = f ( t )[ i 1 , . . . , i k − m − 2 , i k − m − 1 , i k − m − 1 ] . Then v al( G, f ) = v al( ˜ G, ˜ f ). Lemma 5.5 (with A = V and B = K ) implies that ˜ G still admits a ( s, t )-bip olar orientation. Then, denoting by ˜ E the edges of ˜ G , we hav e b y Lemma 5.4 that | v al( G, f ) | = | v al( ˜ G, ˜ f ) | ≤ ∥ ˜ f ( s ) ∥ F ∥ ˜ f ( t ) ∥ F Y v ∈K∪V ∥ ˜ f ( v ) ∥ ∞ Y e ∈ ˜ E ∥ ˜ f ( e ) ∥ op ≤ √ n k − m − 1 ∥ f ( s ) ∥ F ∥ f ( t ) ∥ ∞ Y v ∈K∪V ∥ f ( v ) ∥ ∞ Y e ∈E ∥ f ( e ) ∥ op , the second inequalit y applying ∥ ˜ f ( t ) ∥ F ≤ √ n k − m − 1 ∥ ˜ f ( t ) ∥ ∞ = √ n k − m − 1 ∥ f ( t ) ∥ ∞ . Case 2: Supp ose now that t has exactly k − m distinct neigh b ors, and k − m = 1. Let v t ∈ V b e the vertex adjacent to t . W e modify the netw ork ( G, f ) in to a new netw ork ( ˜ G, ˜ f ) via t w o steps: • W e contract f ( t ) ∈ R n b y removing f ( t ) and its incident edge, and replacing the diagonal lab el f ( v t ) ≡ diag ( v ) ∈ ( R n ) ⊗ deg( v ) b y ˜ f ( v t ) = diag( v ⊙ f ( t )) ∈ ( R n ) ⊗ deg( v ) − 1 . • Note that m ≥ 2, since k ≥ 3. Let e 1 , . . . 
, $e_m$ be the ordered edges incident to $s$ in $G$, and suppose its (rank-1) label factorizes as $f(s) = \tilde{s}_1 \otimes \cdots \otimes \tilde{s}_m$. We split $s$ into two vertices $s', s''$ of $\tilde{G}$, where $s'$ is incident to $e_1$ with label $\tilde{f}(s') = \tilde{s}_1$, and $s''$ is incident to $e_2, \ldots, e_m$ with label $\tilde{f}(s'') = \tilde{s}_2 \otimes \cdots \otimes \tilde{s}_m$.

Then $\mathrm{val}(G, f) = \mathrm{val}(\tilde{G}, \tilde{f})$. We claim that $(\tilde{G}, \tilde{f})$ admits an $(s', s'')$-bipolar orientation. Indeed, if $v_t$ has an edge connecting to either $s'$ or $s''$ in $\tilde{G}$, then Lemma 5.5 (with $A = \mathcal{V}$ and $B = \mathcal{K}$) applies to show $(\tilde{G}, \tilde{f})$ has an $(s', s'')$-bipolar orientation. Otherwise, each edge of $\tilde{G}$ incident to $v_t$ must connect to a distinct neighbor in $\mathcal{K} \cup \mathcal{V}$, and there are at least 2 such edges, with each such neighbor having degree $\geq 3$. Then the graph $\tilde{G} \setminus \{v_t\}$ removing $v_t$ and its incident edges admits an $(s', s'')$-bipolar orientation by Lemma 5.5 applied with $A = \mathcal{V} \setminus \{v_t\}$ and $B = \mathcal{K}$. Since $v_t$ has at least 2 distinct neighbors in $\tilde{G} \setminus \{v_t\}$, this implies that $\tilde{G}$ also admits an $(s', s'')$-bipolar orientation, verifying our claim. Then, denoting by $\tilde{\mathcal{E}}$ the edges of $(\tilde{G}, \tilde{f})$, we have by Lemma 5.4 that
$$|\mathrm{val}(G, f)| = |\mathrm{val}(\tilde{G}, \tilde{f})| \leq \|\tilde{f}(s')\|_F \|\tilde{f}(s'')\|_F \prod_{v \in \mathcal{K} \cup \mathcal{V}} \|\tilde{f}(v)\|_\infty \prod_{e \in \tilde{\mathcal{E}}} \|\tilde{f}(e)\|_{\mathrm{op}} \leq \|f(s)\|_F \|f(t)\|_\infty \prod_{v \in \mathcal{K} \cup \mathcal{V}} \|f(v)\|_\infty \prod_{e \in \mathcal{E}} \|f(e)\|_{\mathrm{op}},$$
the second inequality applying $\|\tilde{f}(s')\|_F \|\tilde{f}(s'')\|_F = \|f(s)\|_F$ and $\|\tilde{f}(v_t)\|_\infty \leq \|f(t)\|_\infty \|f(v_t)\|_\infty$.

Case 3: Suppose $t$ has exactly $k - m$ distinct neighbors, and $k - m \geq 2$. Let $\partial s$ and $\partial t$ denote the edges incident to $s$ and $t$, respectively, where $|\partial s| = m$ and $|\partial t| = k - m$. Let $\mathcal{E}_{\mathcal{K} \cup \mathcal{V}}$ denote all remaining edges, which belong to the connected subgraph induced by $\mathcal{K} \cup \mathcal{V}$.
For each edge $e \in \partial s \cup \partial t$, denote by $v_e \in \mathcal{V}$ the vertex other than $\{s, t\}$ that is incident to $e$. As $f(s)$ is rank-1, it admits a factorization $f(s) = \bigotimes_{e \in \partial s} \tilde{s}_e$ for some $m$ vectors $(\tilde{s}_e : e \in \partial s)$ in $\mathbb{R}^n$. Then, since $f(e) = I$ for each $e \in \partial s \cup \partial t$ and $f(v)$ is diagonal for each vertex $v \in \mathcal{K} \cup \mathcal{V}$, we may express $\mathrm{val}(G, f)$ as
$$\mathrm{val}(G, f) = \sum_{i_v \in [n]:\, v \in \mathcal{K} \cup \mathcal{V}} f(t)[i_{v_e} : e \in \partial t] \prod_{e \in \partial s} \tilde{s}_e[i_{v_e}] \prod_{v \in \mathcal{K} \cup \mathcal{V}} f(v)[i_v, \ldots, i_v] \prod_{e = (u, v) \in \mathcal{E}_{\mathcal{K} \cup \mathcal{V}}} f(e)[i_u, i_v]$$
where the summation is over one distinct index $i_v \in [n]$ for each vertex of $\mathcal{K} \cup \mathcal{V}$. Let $\mathcal{V}_T = \{v_e : e \in \partial t\} \subseteq \mathcal{V}$ and $\mathcal{V}_S = \{v_e : e \in \partial s\} \subseteq \mathcal{V}$ denote the sets of vertices adjacent to $t$ and $s$, where $\mathcal{V} = \mathcal{V}_T \cup \mathcal{V}_S$ and $|\mathcal{V}_T| = k - m \geq 2$. (We recall that $\mathcal{V}_S \cap \mathcal{V}_T$ may be non-empty.) We will next partition the edges of $\mathcal{E}_{\mathcal{K} \cup \mathcal{V}} \cup \partial s$, including those incident to $s$ but not those incident to $t$, into two non-empty disjoint sets $\mathcal{E}_1 \cup \mathcal{E}_2$, so as to bound the above expression using Cauchy–Schwarz. Given any such partition $\mathcal{E}_1 \cup \mathcal{E}_2$, let us split the vertices of $\mathcal{K} \cup \mathcal{V}$ that are incident to $\mathcal{E}_1$ into two types: Let $\mathcal{U}_1^* \subseteq \mathcal{K} \cup \mathcal{V}$ be those vertices incident to some edge of $\mathcal{E}_1$, but not belonging to $\mathcal{V}_T$ or incident to any edge of $\mathcal{E}_2$. Let $\mathcal{U}_1 \subseteq \mathcal{K} \cup \mathcal{V}$ be those remaining vertices which are incident to an edge of $\mathcal{E}_1$ and either belong to $\mathcal{V}_T$ or are incident to an edge of $\mathcal{E}_2$. We define a tensor $U_1 \in (\mathbb{R}^n)^{\otimes |\mathcal{U}_1|}$ of order $|\mathcal{U}_1|$, which sums over vertices of $\mathcal{U}_1^*$ and is indexed by $(i_v : v \in \mathcal{U}_1)$ corresponding to vertices of $\mathcal{U}_1$. This tensor $U_1$ has entries
$$U_1[i_v : v \in \mathcal{U}_1] = \sum_{i_v \in [n]:\, v \in \mathcal{U}_1^*} \prod_{e \in \partial s \cap \mathcal{E}_1} \tilde{s}_e[i_{v_e}] \prod_{v \in \mathcal{U}_1^*} f(v)[i_v, \ldots, i_v] \prod_{e = (u, v) \in \mathcal{E}_1 \cap \mathcal{E}_{\mathcal{K} \cup \mathcal{V}}} f(e)[i_u, i_v].$$
We define analogously the sets $\mathcal{U}_2^*, \mathcal{U}_2 \subseteq \mathcal{K} \cup \mathcal{V}$ and tensor $U_2$ for $\mathcal{E}_2$.
Note that $\mathcal{K} \cup \mathcal{V}$ is then the disjoint union of the three sets $\mathcal{U}_1^*$, $\mathcal{U}_2^*$, and $\mathcal{U}_1 \cup \mathcal{U}_2$, where $\mathcal{V}_T \subseteq \mathcal{U}_1 \cup \mathcal{U}_2$ because each vertex of $\mathcal{V}_T$ must be incident to at least one edge of the connected subgraph $\mathcal{K} \cup \mathcal{V}$. Then
$$\mathrm{val}(G, f) = \sum_{i_v \in [n]:\, v \in \mathcal{U}_1 \cup \mathcal{U}_2} f(t)[i_{v_e} : e \in \partial t] \Bigl(\prod_{v \in \mathcal{U}_1 \cup \mathcal{U}_2} f(v)[i_v, \ldots, i_v]\Bigr) U_1[i_v : v \in \mathcal{U}_1]\, U_2[i_v : v \in \mathcal{U}_2].$$
Applying Cauchy–Schwarz over the indices of $\mathcal{U}_1 \cup \mathcal{U}_2$,
$$|\mathrm{val}(G, f)| \leq \|f(t)\|_\infty \Bigl(\prod_{v \in \mathcal{U}_1 \cup \mathcal{U}_2} \|f(v)\|_\infty\Bigr) \Bigl(\sum_{i_v \in [n]:\, v \in \mathcal{U}_1 \cup \mathcal{U}_2} U_1[i_v : v \in \mathcal{U}_1]^2\Bigr)^{1/2} \Bigl(\sum_{i_v \in [n]:\, v \in \mathcal{U}_1 \cup \mathcal{U}_2} U_2[i_v : v \in \mathcal{U}_2]^2\Bigr)^{1/2}.$$
Note that $(\mathcal{U}_1 \cup \mathcal{U}_2) \setminus \mathcal{U}_1$ consists of the vertices of $\mathcal{V}_T$ that are not incident to any edge of $\mathcal{E}_1$. Let $k_1$ be the number of such vertices, and define similarly $k_2$ as the number of vertices of $\mathcal{V}_T$ that are not incident to any edge of $\mathcal{E}_2$. Then the above yields the bound
$$|\mathrm{val}(G, f)| \leq \sqrt{n}^{\,k_1 + k_2}\, \|f(t)\|_\infty \Bigl(\prod_{v \in \mathcal{U}_1 \cup \mathcal{U}_2} \|f(v)\|_\infty\Bigr) \|U_1\|_F \|U_2\|_F. \quad (5.13)$$
To further bound $\|U_1\|_F$, consider the connected components of the subgraph induced by $\mathcal{E}_1$ (i.e. the edges $\mathcal{E}_1$, their incident vertices $\mathcal{U}_1, \mathcal{U}_1^*$, and the vertex $s$ if $\mathcal{E}_1 \cap \partial s$ is non-empty). If $\mathcal{E}_1 \cap \partial s$ is non-empty, and the component of $s$ breaks into $j$ pieces upon removing $s$, let us further split this component of $s$ into $j$ separate components, each containing a copy of $s$. Let $\mathcal{C}$ be the set of all components. For each component $C \in \mathcal{C}$, let $\mathcal{U}_C^* \subseteq \mathcal{U}_1^*$, $\mathcal{U}_C \subseteq \mathcal{U}_1$, $\mathcal{E}_C \subseteq \mathcal{E}_1 \cap \mathcal{E}_{\mathcal{K} \cup \mathcal{V}}$, and $\partial s_C \subseteq \partial s$ be the vertices and edges belonging to $C$, where we set $\partial s_C = \emptyset$ if $C$ does not contain (a copy of) $s$. Note that $\mathcal{U}_C$ must be non-empty, because Lemma 5.10 ensures that the subgraph $(\mathcal{K} \cup \mathcal{V}, \mathcal{E}_{\mathcal{K} \cup \mathcal{V}})$ is connected.
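The Cauchy–Schwarz step above uses only the generic inequality $|\sum_x a(x) u(x) v(x)| \leq \|a\|_\infty \|u\|_2 \|v\|_2$ over the grouped indices. A minimal numerical sanity check of this inequality (random NumPy arrays standing in for $f(t) \prod_v f(v)$, $U_1$, and $U_2$; the names and shapes here are purely illustrative):

```python
import numpy as np

# Sanity check of the Cauchy-Schwarz step leading to (5.13): for any bounded
# weight tensor a and square-summable tensors u, v over the same index set,
# |sum_x a(x) u(x) v(x)| <= ||a||_inf * ||u||_2 * ||v||_2.
rng = np.random.default_rng(0)
n = 7
for _ in range(100):
    a = rng.standard_normal((n, n, n))  # stand-in for f(t) times the diagonal labels
    u = rng.standard_normal((n, n, n))  # stand-in for U_1
    v = rng.standard_normal((n, n, n))  # stand-in for U_2
    lhs = abs(np.sum(a * u * v))
    rhs = np.max(np.abs(a)) * np.linalg.norm(u) * np.linalg.norm(v)
    assert lhs <= rhs + 1e-12
```

In the proof, $U_1$ and $U_2$ depend only on subsets of the shared indices, which is where the extra $\sqrt{n}^{\,k_1+k_2}$ factor in (5.13) comes from.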
Then $U_1$ factorizes as
$$U_1 = \bigotimes_{C \in \mathcal{C}} U_C$$
where each $U_C \in (\mathbb{R}^n)^{\otimes |\mathcal{U}_C|}$ is the tensor indexed by vertices of $\mathcal{U}_C$, with entries
$$U_C[i_v : v \in \mathcal{U}_C] = \sum_{i_v \in [n]:\, v \in \mathcal{U}_C^*} \prod_{e \in \partial s_C} \tilde{s}_e[i_{v_e}] \prod_{v \in \mathcal{U}_C^*} f(v)[i_v, \ldots, i_v] \prod_{e = (u, v) \in \mathcal{E}_C} f(e)[i_u, i_v].$$
Thus,
$$\|U_1\|_F = \prod_{C \in \mathcal{C}} \|U_C\|_F = \prod_{C \in \mathcal{C}} \Bigl(\sup_{T' \in (\mathbb{R}^n)^{\otimes |\mathcal{U}_C|}:\, \|T'\|_F = 1} \langle U_C, T' \rangle\Bigr). \quad (5.14)$$
For each component $C \in \mathcal{C}$ where $C$ contains (a copy of) $s$, let $\tilde{G}$ be the graph that adds a sink vertex $t'$ to $C$, connected by an edge to each vertex of $\mathcal{U}_C$. We claim that $\tilde{G}$ admits an $(s, t')$-bipolar orientation: Let $A$ be the vertices of $\tilde{G} \setminus \{s, t'\}$ that are connected to $\{s, t'\}$ by an edge, and let $B$ be all remaining vertices of $\tilde{G} \setminus \{s, t'\}$. The subgraph induced by $A \cup B$ in $\tilde{G}$ is connected, by the construction of $C$ as a single component in $\mathcal{C}$. By definition, any vertex $v \in \tilde{G} \cap \mathcal{U}_C$ is adjacent to $t'$, hence $v \in A$. Recalling the definition of $\mathcal{U}_1$, note that any vertex $v \in \tilde{G} \cap \mathcal{V}_T$ must be in $\mathcal{U}_C$, hence also $v \in A$. Any vertex $v \in \tilde{G} \cap \mathcal{V}_S$ must either neighbor $s$ in $\tilde{G}$ (if the edge $(s, v)$ belongs to $\mathcal{E}_1$ and hence to $\tilde{G}$) or neighbor $t'$ in $\tilde{G}$ (if $(s, v)$ belongs to $\mathcal{E}_2$ and hence $v \in \mathcal{U}_C$); thus also $v \in A$. Then $\tilde{G} \cap (\mathcal{U}_C \cup \mathcal{V}) \subseteq A$, implying that $B \subseteq \mathcal{U}_C^* \cap \mathcal{K}$. Any such vertex $u \in B$ has at least 3 incident edges, all of these edges belong to $\tilde{G}$ (since $u \in \mathcal{U}_C^*$), and each such edge must connect to a distinct neighbor in $\mathcal{V} = \mathcal{V}_S \cup \mathcal{V}_T$ (since $u \in \mathcal{K}$) which must belong to $A$. Thus, $\tilde{G}$ satisfies the conditions of Lemma 5.5, implying that $\tilde{G}$ has an $(s, t')$-bipolar orientation as claimed. We observe that $\langle U_C, T' \rangle = \mathrm{val}(\tilde{G}, \tilde{f})$ for a labeling $\tilde{f}$ where $\tilde{f}(s) = \bigotimes_{e \in \partial s_C} \tilde{s}_e$ and $\tilde{f}(t') = T'$.
Then, applying Lemma 5.4 and $\|\tilde{f}(s)\|_F = \prod_{e \in \partial s_C} \|\tilde{s}_e\|_2$, we get
$$|\langle U_C, T' \rangle| = |\mathrm{val}(\tilde{G}, \tilde{f})| \leq \|T'\|_F \prod_{e \in \partial s_C} \|\tilde{s}_e\|_2 \prod_{v \in \mathcal{U}_C^*} \|f(v)\|_\infty \prod_{e \in \mathcal{E}_C} \|f(e)\|_{\mathrm{op}}. \quad (5.15)$$
For each $C \in \mathcal{C}$ where $C$ does not contain $s$, let $\tilde{G}$ be the graph that adds two vertices $s', t'$ to $C$, where $s'$ connects by an edge to any single vertex of $\mathcal{U}_C$, and $t'$ connects by an edge to every vertex of $\mathcal{U}_C$. The same arguments as above show that $\tilde{G}$ satisfies the conditions of Lemma 5.5 (with $A$ being the vertices adjacent to $\{s', t'\}$ and $B$ being the rest), and hence admits an $(s', t')$-bipolar orientation. We observe that $\langle U_C, T' \rangle = \mathrm{val}(\tilde{G}, \tilde{f})$ for a labeling $\tilde{f}$ where $\tilde{f}(s') = (1, 1, \ldots, 1) \in \mathbb{R}^n$ and $\tilde{f}(t') = T'$. Applying Lemma 5.4 and $\|\tilde{f}(s')\|_F = \sqrt{n}$, we get
$$|\langle U_C, T' \rangle| = |\mathrm{val}(\tilde{G}, \tilde{f})| \leq \sqrt{n}\, \|T'\|_F \prod_{v \in \mathcal{U}_C^*} \|f(v)\|_\infty \prod_{e \in \mathcal{E}_C} \|f(e)\|_{\mathrm{op}}. \quad (5.16)$$
Let $m_1, m_2$ be the numbers of connected components in the subgraphs induced by $\mathcal{E}_1, \mathcal{E}_2$ that do not contain $s$. Then, applying the bound (5.15) or (5.16) to each component $C \in \mathcal{C}$ of (5.14), we obtain
$$\|U_1\|_F \leq \sqrt{n}^{\,m_1} \prod_{e \in \partial s \cap \mathcal{E}_1} \|\tilde{s}_e\|_2 \prod_{v \in \mathcal{U}_1^*} \|f(v)\|_\infty \prod_{e \in \mathcal{E}_{\mathcal{K} \cup \mathcal{V}} \cap \mathcal{E}_1} \|f(e)\|_{\mathrm{op}},$$
and similarly for $\|U_2\|_F$ and $m_2$. Applying these bounds back to (5.13) and recalling that $\mathcal{U}_1^*$, $\mathcal{U}_2^*$, and $\mathcal{U}_1 \cup \mathcal{U}_2$ form a partition of $\mathcal{K} \cup \mathcal{V}$,
$$|\mathrm{val}(G, f)| \leq \sqrt{n}^{\,k_1 + k_2 + m_1 + m_2}\, \|f(s)\|_F \|f(t)\|_\infty \prod_{v \in \mathcal{K} \cup \mathcal{V}} \|f(v)\|_\infty \prod_{e \in \mathcal{E}_{\mathcal{K} \cup \mathcal{V}}} \|f(e)\|_{\mathrm{op}}.$$
Finally, to show (5.12), let us argue that we may choose the edge partition $\mathcal{E}_1 \cup \mathcal{E}_2$ of $\mathcal{E}_{\mathcal{K} \cup \mathcal{V}} \cup \partial s$ to ensure that $k_1 + k_2 + m_1 + m_2 \leq k - m - 1$. Let $G'$ be the subgraph induced by $\{s\} \cup \mathcal{K} \cup \mathcal{V}$, i.e. containing the vertices $\{s\} \cup \mathcal{K} \cup \mathcal{V}$ and edges $\mathcal{E}_{\mathcal{K} \cup \mathcal{V}} \cup \partial s$.
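The first equality in (5.14) is the standard multiplicativity of the Frobenius norm under tensor products, $\|A \otimes B\|_F = \|A\|_F \|B\|_F$, which can be checked numerically (a toy check; the random tensors are stand-ins for the component tensors $U_C$):

```python
import numpy as np

# Frobenius norm is multiplicative under tensor (outer) products:
# ||A (x) B||_F = ||A||_F * ||B||_F, giving the first equality in (5.14).
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 5))       # stand-in for one component tensor U_C
B = rng.standard_normal((3, 4, 2))    # stand-in for another component
AB = np.tensordot(A, B, axes=0)       # outer product: shape (4, 5, 3, 4, 2)
assert AB.shape == (4, 5, 3, 4, 2)
assert np.isclose(np.linalg.norm(AB), np.linalg.norm(A) * np.linalg.norm(B))
```

This is immediate from $\|A \otimes B\|_F^2 = \sum_{i,j} A[i]^2 B[j]^2 = \|A\|_F^2 \|B\|_F^2$, and it is why $\|U_1\|_F$ can be bounded component by component.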
Since $t$ has exactly $k - m$ distinct neighbors in $\mathcal{V}_T$ and $k - m \geq 2$, Lemma 5.11 implies that there exists a tree $H$ in $G'$ with at least two good leaves in $\mathcal{V}_T$. Let $G''$ be the graph $G'$ removing all edges of $H$; here $G''$ may be disconnected and/or have isolated vertices. By definition, each good leaf $v \in \mathcal{V}_T$ is connected by a path in $G''$ either to another vertex $u \in \mathcal{V}_T$ or to $s$. We consider two cases:

(a) Suppose there is a good leaf $v \in \mathcal{V}_T$ whose connected component in $G''$ does not contain $s$. Let $\mathcal{E}_1$ be all edges of this connected component, and let $\mathcal{E}_2$ be all remaining edges of $G'$. Since $v$ is connected by a path in $G''$ to some other $u \in \mathcal{V}_T$, the (connected) subgraph induced by $\mathcal{E}_1$ has at least two vertices of $\mathcal{V}_T$, so $k_1 \leq k - m - 2$ and $m_1 = 1$. The subgraph induced by $\mathcal{E}_2$ is also connected, because all connected components of $G''$ must be connected via the removed tree $H$. Furthermore, this subgraph contains both $s$ and all vertices of $\mathcal{V}_T$, since $\mathcal{E}_2$ contains all edges of $H$. Thus $k_2 = 0$ and $m_2 = 0$, so $k_1 + k_2 + m_1 + m_2 \leq k - m - 1$.

(b) Suppose all good leaves $v \in \mathcal{V}_T$ belong to the connected component of $s$ in $G''$. Again let $\mathcal{E}_1$ be all edges of this connected component, and let $\mathcal{E}_2$ be all remaining edges of $G'$. The subgraph induced by $\mathcal{E}_1$ has at least two good leaves in $\mathcal{V}_T$ and also contains $s$, so $k_1 \leq k - m - 2$ and $m_1 = 0$. The subgraph induced by $\mathcal{E}_2$ again consists of a single connected component which contains all vertices of $\mathcal{V}_T$, so $k_2 = 0$ and $m_2 = 1$. Thus, we also have $k_1 + k_2 + m_1 + m_2 \leq k - m - 1$.

This shows that (5.12) holds in all cases. The desired bound (5.9) then follows from (5.12), (5.10), and $|\mathcal{G}| = |\mathcal{G}(l_1, \ldots, l_k)|$ in Lemma 5.10, completing the proof of Lemma 5.9. □

Proof of Proposition 2.17, Assumption 3. Denote $a_l = (a_{il})_{i=1}^n$. We may expand by multilinearity of the mixed cumulant $\langle \kappa_k(g), s_1 \otimes \cdots$
$\otimes\, s_m \otimes T \rangle$ as
$$\langle \kappa_k(g), s_1 \otimes \cdots \otimes s_m \otimes T \rangle = \sum_{l_1, \ldots, l_k = 1}^{\infty} \Bigl\langle \kappa_k\bigl(a_{l_1} \odot (Xw)^{\odot l_1}, \ldots, a_{l_k} \odot (Xw)^{\odot l_k}\bigr),\ s_1 \otimes \cdots \otimes s_m \otimes T \Bigr\rangle$$
$$= \sum_{l_1, \ldots, l_k = 1}^{\infty} \Bigl\langle (a_{l_1} \otimes \cdots \otimes a_{l_k}) \odot \kappa_k\bigl((Xw)^{\odot l_1}, \ldots, (Xw)^{\odot l_k}\bigr),\ s_1 \otimes \cdots \otimes s_m \otimes T \Bigr\rangle$$
$$= \sum_{l_1, \ldots, l_k = 1}^{\infty} \Bigl\langle \kappa_k\bigl((Xw)^{\odot l_1}, \ldots, (Xw)^{\odot l_k}\bigr),\ (a_{l_1} \odot s_1) \otimes \cdots \otimes (a_{l_m} \odot s_m) \otimes \bigl((a_{l_{m+1}} \otimes \cdots \otimes a_{l_k}) \odot T\bigr) \Bigr\rangle.$$
Then Lemma 5.9 implies
$$\langle \kappa_k(g), s_1 \otimes \cdots \otimes s_m \otimes T \rangle \leq \sum_{l_1, \ldots, l_k = 1}^{\infty} C_l(w)\, \|X\|_{\mathrm{op}}^{l}\, |\mathcal{G}(l_1, \ldots, l_k)|\, \sqrt{n}^{\,k-m-1}\, \|(a_{l_{m+1}} \otimes \cdots \otimes a_{l_k}) \odot T\|_{\mathcal{U}(l)} \prod_{i=1}^m \|a_{l_i} \odot s_i\|_2.$$
In the definitions preceding (5.8), we have $\|\bar{X}\|_{\mathrm{op}} \leq 1$, $\|K_k\|_{\mathrm{op}} \leq 1$, and $\|D_k\|_{\mathrm{op}} \leq \|K_k\|_{\mathrm{op}} \cdot \max_{i=1}^n \sum_{j=1}^n |\bar{X}[i,j]|^k \leq 1$ for all $k \geq 2$. Then each $M \in \mathcal{M}_0$ has $\|M\|_{\mathrm{op}} \leq 1$, implying that also each $M \in \mathcal{M} = \bigcup_{j \geq 0} \mathcal{M}_j$ has $\|M\|_{\mathrm{op}} \leq 1$. Thus, $\sup_{x \in \mathcal{U}(l)} \|x\|_2 \leq 1$. Furthermore $I \in \mathcal{M}(l)$, so $e_1, \ldots, e_n \in \mathcal{U}(l)$. Let us define $\mathbf{e} = (1, \ldots, 1) \in \mathbb{R}^n$, and set
$$\mathcal{U}'(l) = \bigl\{(a/\|a\|_\infty) \odot x : a \in \{\mathbf{e}, a_1, \ldots, a_l\},\ x \in \mathcal{U}(l)\bigr\}.$$
Then also $e_1, \ldots, e_n \in \mathcal{U}'(l)$, $\sup_{x \in \mathcal{U}'(l)} \|x\|_2 \leq 1$, and we have $\|(a_{l_{m+1}} \otimes \cdots \otimes a_{l_k}) \odot T\|_{\mathcal{U}(l)} \leq \|T\|_{\mathcal{U}'(l)} \prod_{i=m+1}^{k} \|a_{l_i}\|_\infty$. Applying this and $\|a_{l_i} \odot s_i\|_2 \leq \|a_{l_i}\|_\infty \|s_i\|_2$,
$$\langle \kappa_k(g), s_1 \otimes \cdots \otimes s_m \otimes T \rangle \leq \sum_{l_1, \ldots, l_k = 1}^{\infty} C_l(w)\, \|X\|_{\mathrm{op}}^{l}\, |\mathcal{G}(l_1, \ldots, l_k)| \Bigl(\prod_{i=1}^k \|a_{l_i}\|_\infty\Bigr) \sqrt{n}^{\,k-m-1}\, \|T\|_{\mathcal{U}'(l)} \prod_{i=1}^m \|s_i\|_2.$$
Under Proposition 2.17(a), noting that $C_l(w)$, $\|X\|_{\mathrm{op}}^{l}$, $|\mathcal{G}(l_1, \ldots, l_k)|$, and $\prod_{i=1}^k \|a_{l_i}\|_\infty$ are all bounded by a constant, this shows for a constant $C \equiv C(k, D) > 0$ that
$$|\langle \kappa_k(g), s_1 \otimes \cdots \otimes s_m \otimes T \rangle| \leq C \sqrt{n}^{\,k-m-1}\, \|T\|_{\mathcal{U}'(kD)} \prod_{i=1}^m \|s_i\|_2.$$
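The expansion above began by moving Hadamard factors across the tensor inner product, i.e. $\langle (a_1 \otimes \cdots \otimes a_k) \odot S,\ T \rangle = \langle S,\ (a_1 \otimes \cdots \otimes a_k) \odot T \rangle$, since both sides equal $\sum_{i_1, \ldots, i_k} a_1[i_1] \cdots a_k[i_k]\, S[i_1, \ldots, i_k]\, T[i_1, \ldots, i_k]$. A quick numerical check of this adjunction for order-3 tensors (random data, purely illustrative):

```python
import numpy as np

# Check the Hadamard adjunction used in the cumulant expansion:
# <(a1 (x) a2 (x) a3) . S, T> = <S, (a1 (x) a2 (x) a3) . T>,
# where (x) is the tensor product and . the entrywise (Hadamard) product.
rng = np.random.default_rng(2)
n = 6
a1, a2, a3 = rng.standard_normal((3, n))
S = rng.standard_normal((n, n, n))
T = rng.standard_normal((n, n, n))
W = a1[:, None, None] * a2[None, :, None] * a3[None, None, :]  # a1 (x) a2 (x) a3
lhs = np.sum((W * S) * T)
rhs = np.sum(S * (W * T))
assert np.isclose(lhs, rhs)
```

The same identity, applied with $a_{l_i} \odot s_i$ absorbed into the test vectors, gives the third equality in the display.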
Since also $|\mathcal{U}'(kD)| \leq C(k, D)\, n$ for a constant $C(k, D) > 0$, this shows Assumption 3. Under Proposition 2.17(b), recalling the bound (5.7), we have
$$|\langle \kappa_k(g), s_1 \otimes \cdots \otimes s_m \otimes T \rangle| \leq \sum_{l=k}^{\infty} \underbrace{(Ck^2)^l\, l^{(\frac{1}{2} - \beta) l}\, \sqrt{n}^{\,k-m-1}\, \|T\|_{\mathcal{U}'(l)} \prod_{i=1}^m \|s_i\|_2}_{:= F(l)}.$$
Note that in the definitions preceding (5.8), the number of distinct non-zero matrices in $\mathcal{M}(l)$ is at most the number of strings in the symbols $I, \bar{X}, \bar{X}^*, D_2, K_2, \odot, \times, (, )$ of length $Cl$, for some absolute constant $C > 0$. Then for some constants $C', C'' > 0$, we have $|\mathcal{M}(l)| \leq C'^l$ and $|\mathcal{U}'(l)| \leq C''^l$. For a sufficiently large constant $C_0 > 0$, let us set
$$L = \lfloor C_0 k \log n + k^{C_0} \rfloor, \qquad \mathcal{U} \equiv \mathcal{U}'(L).$$
Then $|\mathcal{U}| \leq n^C$ for a constant $C \equiv C(C_0, k) > 0$. For all $l > L$, recalling that $\sup_{x \in \mathcal{U}'(l)} \|x\|_2 \leq 1$, we may bound $\|T\|_{\mathcal{U}'(l)} \leq \|T\|_F \leq \sqrt{n}^{\,k-m} \|T\|_\infty \leq \sqrt{n}^{\,k} \|T\|_{\mathcal{U}}$. This gives
$$\sum_{l > L} F(l) \leq \sqrt{n}^{\,k} \cdot \sqrt{n}^{\,k-m-1}\, \|T\|_{\mathcal{U}} \prod_{i=1}^m \|s_i\|_2 \times \sum_{l > L} (Ck^2)^l\, l^{(\frac{1}{2} - \beta) l}.$$
Under the given condition $\beta > 1/2$, choosing a large enough constant $C_0 > 0$ that defines $L$ ensures $\sum_{l > L} (Ck^2)^l\, l^{(\frac{1}{2} - \beta) l} < \sqrt{n}^{\,-k}$, and thus
$$\sum_{l > L} F(l) < \sqrt{n}^{\,k-m-1}\, \|T\|_{\mathcal{U}} \prod_{i=1}^m \|s_i\|_2.$$
For $l \leq L$, since $\mathcal{U}'(l) \subseteq \mathcal{U}$, we may bound $\|T\|_{\mathcal{U}'(l)} \leq \|T\|_{\mathcal{U}}$. Then for a constant $C(k, \beta) > 0$,
$$\sum_{l=k}^{L} F(l) \leq \sqrt{n}^{\,k-m-1}\, \|T\|_{\mathcal{U}} \prod_{i=1}^m \|s_i\|_2 \sum_{l=1}^{\infty} (Ck^2)^l\, l^{(\frac{1}{2} - \beta) l} \leq C(k, \beta)\, \sqrt{n}^{\,k-m-1}\, \|T\|_{\mathcal{U}} \prod_{i=1}^m \|s_i\|_2.$$
This implies $|\langle \kappa_k(g), s_1 \otimes \cdots \otimes s_m \otimes T \rangle| \leq (C(k, \beta) + 1) \sqrt{n}^{\,k-m-1}\, \|T\|_{\mathcal{U}} \prod_{i=1}^m \|s_i\|_2$, which again shows Assumption 3. □

5.4. Random features tilt. To prove Proposition 2.18, we use techniques developed in constructive field theory.
In particular, we follow an approach similar to that outlined in [Gur14, Section A]. The main technical tool we rely on is the universal Brydges–Kennedy–Abdesselam–Rivasseau forest formula [AR95, Theorem III.1], which we state here for the reader's convenience:

Lemma 5.12 (BKAR forest formula). For some $l \geq 1$, let $f : [0,1]^{l(l-1)/2} \to \mathbb{R}$ be a smooth function. Then
$$f(\underbrace{1, \ldots, 1}_{l(l-1)/2}) = \sum_{G \in \mathcal{F}_l} \int_{[0,1]^{l(l-1)/2}} \frac{\partial^{|E(G)|}}{\prod_{(i,j) \in E(G)} \partial y_{ij}} f(y) \bigg|_{y = y(G, u)} \prod_{i < j} \mathrm{d}u_{ij},$$
where $\mathcal{F}_l$ is the set of all forests on the vertex set $[l]$, and $y(G, u)_{ij}$ is the minimum of $u_{i'j'}$ over the edges $(i', j')$ of the path connecting $i$ and $j$ in $G$ if $i, j$ belong to the same tree of $G$, and $y(G, u)_{ij} = 0$ otherwise. Moreover, the symmetric matrix with unit diagonal and off-diagonal entries $y(G, u)_{ij}$ is positive-semidefinite for all $0 \leq u_{ij} \leq 1$.

We also need the following simple identity.

Lemma 5.13. Fix any $l \geq 1$, and let $Y = (y_{ij})_{1 \leq i,j \leq l} \in \mathbb{R}^{l \times l}$ be any positive-semidefinite matrix. For a smooth function $\Phi : \mathbb{R}^{nl} \to \mathbb{R}$ defined by $(w_1, \ldots, w_l) \in \mathbb{R}^{nl} \mapsto \Phi(w_1, \ldots, w_l)$, denote
$$\langle \nabla_{w_i}, \nabla_{w_j} \rangle \Phi = \sum_{\alpha=1}^n \frac{\partial^2}{\partial w_i[\alpha]\, \partial w_j[\alpha]} \Phi.$$
If $(w_1, \ldots, w_l) \sim \mathcal{N}(0, Y \otimes I_n)$, then for any $i < j$,
$$\frac{\partial}{\partial y_{ij}} \mathbb{E}[\Phi(w_1, \ldots, w_l)] = \mathbb{E}[\langle \nabla_{w_i}, \nabla_{w_j} \rangle \Phi(w_1, \ldots, w_l)].$$

Proof. Since the covariance $Y \otimes I_n$ can be singular, we shall prove this via the Fourier transform. Let $w = (w_1, \ldots, w_l) \sim \mathcal{N}(0, Y \otimes I_n)$. Its characteristic function, for any $t = (t_1, \ldots, t_l) \in \mathbb{R}^{nl}$, is given by
$$\mu_w(t) = \mathbb{E} \exp(i \langle t, w \rangle) = \exp\Bigl(-\tfrac{1}{2} t^*(Y \otimes I_n) t\Bigr) = \exp\Bigl(-\tfrac{1}{2} \sum_{i=1}^l y_{ii} \|t_i\|_2^2 - \sum_{i < j} y_{ij} \langle t_i, t_j \rangle\Bigr).$$
Differentiating in $y_{ij}$ multiplies $\mu_w(t)$ by $-\langle t_i, t_j \rangle$, which is the Fourier multiplier corresponding to $\langle \nabla_{w_i}, \nabla_{w_j} \rangle$, and the identity follows. □

Lemma 5.14. In the setting of Proposition 2.18, there exists a constant $c_* > 0$ such that if $\max(|\lambda|, \|t\|_2) < c_*$ and $\sum_{l=1}^\infty \sum_{G \in \mathcal{T}_l} \frac{1}{l!} |F(G, \lambda, t)| < \infty$, then
$$\log \mathbb{E}_{w \sim \mathcal{N}(0, I_n)} \exp \Phi(w, t, \lambda) = \sum_{l=1}^\infty \sum_{G \in \mathcal{T}_l} \frac{1}{l!} F(G, \lambda, t).$$

Proof. To begin, we expand $\mathbb{E} \exp \Phi(w, t, \lambda)$ as
$$\mathbb{E} \exp \Phi(w, t, \lambda) = \sum_{l=0}^\infty \frac{1}{l!}\, \mathbb{E}\, \Phi(w, t, \lambda)^l. \quad (5.18)$$
This exchange of summation and expectation is justified by the Fubini–Tonelli theorem: Since $w \sim \mathcal{N}(0, I_n)$, we have by [ABW17, eq.
(2.4)] that under the conditions of Proposition 2.18 and the assumption (5.17), for some constants $C, C' > 0$,
$$(\mathbb{E}|\Phi(w, t, \lambda)|^l)^{1/l} = (\mathbb{E}|t^* w - \lambda \mathbf{1}_d^* \sigma(Xw)|^l)^{1/l} \leq (\mathbb{E}|t^* w|^l)^{1/l} + |\lambda|\, (\mathbb{E}|\mathbf{1}_d^* \sigma(Xw)|^l)^{1/l} \leq C\sqrt{l}\,\bigl(\|t\|_2 + |\lambda|\, (\mathbb{E}\|X^* \sigma'(Xw)\|_2^l)^{1/l}\bigr) \leq C'\sqrt{l}\,(\|t\|_2 + |\lambda|).$$
For $\max(\|t\|_2, |\lambda|) < c_*$ sufficiently small, this implies $\sum_{l=0}^\infty \frac{1}{l!}\, \mathbb{E}|\Phi(w, t, \lambda)|^l < \infty$, which justifies (5.18). We may then write
$$\mathbb{E}\, \Phi(w, t, \lambda)^l = \mathbb{E} \prod_{i=1}^l \Phi(w_i, t, \lambda)$$
where $w_1 = \cdots = w_l = w$, i.e. $(w_1, \ldots, w_l) \sim \mathcal{N}(0, \mathbf{1}_l \mathbf{1}_l^* \otimes I_n)$. Let $S_l \subset [0,1]^{l \times l}$ be the space of positive-semidefinite matrices such that $S_l[i,i] = 1$ for all $i \in [l]$ and $S_l[i,j] \in [0,1]$ for all $i \neq j$, and define the function $f : S_l \to \mathbb{R}$ by
$$f(Y) = \mathbb{E} \prod_{i=1}^l \Phi(w_i, t, \lambda), \qquad (w_1, \ldots, w_l) \sim \mathcal{N}(0, Y \otimes I_n).$$
Identifying $S_l$ with a subset of $[0,1]^{l(l-1)/2}$ given by its upper-triangular entries, we have by Lemmas 5.12 and 5.13
$$\mathbb{E}\, \Phi(w, t, \lambda)^l = f(\mathbf{1}_l \mathbf{1}_l^*) = \sum_{G \in \mathcal{F}_l} \int_{[0,1]^{l(l-1)/2}} \frac{\partial^{|E(G)|}}{\prod_{(i,j) \in E(G)} \partial y_{ij}} f(Y) \bigg|_{Y = Y(G, u)} \prod_{i < j} \mathrm{d}u_{ij}.$$
In particular, there exists $c_* > 0$ such that if $\max(|\lambda|, \|t\|_2) < c_*$, then
$$\log \mathbb{E} \exp \Phi(w, t, \lambda) = \sum_{l=1}^\infty \frac{1}{l!} \sum_{G \in \mathcal{T}_l} \sum_{f \in F(G)} \int_{[0,1]^{l(l-1)/2}} \mathbb{E}_{Y(G,u)}[\mathrm{val}(G, f)] \prod_{i < j} \mathrm{d}u_{ij},$$
with the labels of these networks bounded by a constant $C' > 0$. Since $G$ is a tree on $l$ vertices, it has $l - 1$ edges, and $\sum_{i=1}^l \deg(v_i) = 2|E(G)| = 2(l-1)$. Then this implies $|\mathrm{val}(G, f)| \leq C'^l \max(\|t\|_2, |\lambda|)^l$ (with probability 1 over $(w_1, \ldots, w_l)$), so also
$$|F(G, \lambda, t)| \leq \int_{[0,1]^{l(l-1)/2}} \mathbb{E}_{Y(G,u)}\Bigl[\sum_{f \in F(G)} |\mathrm{val}(G, f)|\Bigr] \prod_{i < j} \mathrm{d}u_{ij} \leq C^l \max(\|t\|_2, |\lambda|)^l$$
for a constant $C > 0$. Using Cayley's formula $|\mathcal{T}_l| = l^{l-2}$ and $l! \geq (l/e)^l$, we obtain for a constant $C > 0$ that
$$\sum_{l=1}^\infty \sum_{G \in \mathcal{T}_l} \frac{1}{l!} |F(G, \lambda, t)| \leq \sum_{l=1}^\infty \frac{1}{l^2}\, C^l \max(\|t\|_2, |\lambda|)^l.$$
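The forest formula applied above can be verified directly in small cases. Below is a numerical check for $l = 3$ with our own smooth test function $f = \exp(y_{12} + 2y_{13} + 3y_{23})$ (an illustrative choice, not from the paper): the seven forests on three vertices are the empty forest, the three single edges, and the three spanning trees, with $y(G, u)$ given by the minimum of $u$ over tree paths as in Lemma 5.12.

```python
import numpy as np

# Verify the BKAR forest formula (Lemma 5.12) numerically for l = 3, using the
# test function f(y12, y13, y23) = exp(a*y12 + b*y13 + c*y23).
a, b, c = 1.0, 2.0, 3.0
f = lambda y12, y13, y23: np.exp(a * y12 + b * y13 + c * y23)

N = 400
u = (np.arange(N) + 0.5) / N          # midpoint-rule nodes on [0, 1]
lhs = f(1.0, 1.0, 1.0)

# Empty forest: no derivatives, evaluate at y = 0.
rhs = f(0.0, 0.0, 0.0)

# Single-edge forests: one derivative; disconnected pairs interpolate to 0.
rhs += np.mean(a * f(u, 0.0, 0.0))    # edge {1,2}
rhs += np.mean(b * f(0.0, u, 0.0))    # edge {1,3}
rhs += np.mean(c * f(0.0, 0.0, u))    # edge {2,3}

# Spanning trees: two derivatives; the entry for the non-edge pair equals the
# minimum of u over the tree path joining that pair.
U1, U2 = np.meshgrid(u, u)
m = np.minimum(U1, U2)
rhs += np.mean(a * b * f(U1, U2, m))  # tree {12, 13}: y23 = min(u12, u13)
rhs += np.mean(a * c * f(U1, m, U2))  # tree {12, 23}: y13 = min(u12, u23)
rhs += np.mean(b * c * f(m, U1, U2))  # tree {13, 23}: y12 = min(u13, u23)

assert abs(lhs - rhs) / lhs < 1e-2
```

The small discrepancy comes only from the midpoint-rule quadrature; the formula itself is exact.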
This is finite for $\max(\|t\|_2, |\lambda|) < c_*$ sufficiently small, so Lemma 5.14 applies to show (5.20). All preceding arguments apply equally with $z \in \mathbb{C}^n$ in place of $t \in \mathbb{R}^n$, so for each fixed $\lambda$ with $|\lambda| < c_*$, the Weierstrass M-test implies $\sum_{l=1}^\infty \sum_{G \in \mathcal{T}_l} \frac{1}{l!} F(G, \lambda, z)$ is uniformly convergent to an analytic limit $F(z)$ over the disk $\{z \in \mathbb{C}^n : \|z\|_2 < c_*\}$. This implies the uniform convergence of the partial derivatives in $z$, hence also
$$\frac{\partial}{\partial t[\alpha_1]} \cdots \frac{\partial}{\partial t[\alpha_k]} \log \mathbb{E} \exp \Phi(w, t, \lambda) = \sum_{l=1}^\infty \sum_{G \in \mathcal{T}_l} \frac{1}{l!} \frac{\partial}{\partial t[\alpha_1]} \cdots \frac{\partial}{\partial t[\alpha_k]} F(G, \lambda, t). \qquad \Box$$

We now show that the quantity to be bounded in Assumption 3 admits a series expansion in terms of expected tensor network values.

Lemma 5.16. In the setting of Proposition 2.18, there exists a constant $\lambda_* > 0$ such that for all $|\lambda| < \lambda_*$ the following holds: For any $k \geq 3$, $1 \leq m \leq k - 1$, any $s_1, \ldots, s_m \in \mathbb{R}^n$, and any $T \in (\mathbb{R}^n)^{\otimes k-m}$, we have the convergent series expansion
$$\langle \kappa_k(g), s_1 \otimes \cdots \otimes s_m \otimes T \rangle = \sum_{l=k+1}^\infty \frac{1}{l!} \sum_{G \in \mathcal{T}_l} \sum_{(H, f) \in \mathcal{G}(G)} \int_{[0,1]^{l(l-1)/2}} \mathbb{E}_{Y(G,u)}[\mathrm{val}(H, f)] \prod_{i < j} \mathrm{d}u_{ij},$$
absolutely convergent for a sufficiently small constant $\lambda_* > 0$, completing the proof. □

Acknowledgments. RM would like to thank Garrett G. Wen for many insightful discussions on the random features model, and Jing Guo for a helpful discussion on elements of graph theory. ZF was supported in part by NSF DMS-2142476 and a Sloan Research Fellowship. EP was supported by an NSERC Discovery Grant RGPIN-2025-04643, an FRQNT-NSERC NOVA Grant, a CIFAR Catalyst Grant, and a gift from Google Canada.

References

[AZ24] Pierre Abdalla and Nikita Zhivotovskiy. "Covariance estimation: Optimal dimension-free guarantees for adversarial corruption and heavy tails". In: Journal of the European Mathematical Society (2024). To appear.

[AR95] Abdelmalek Abdesselam and Vincent Rivasseau.
"Trees, forests and jungles: A botanical garden for cluster expansions". In: Constructive Physics: Results in Field Theory, Statistical Mechanics and Condensed Matter Physics. Ed. by Vincent Rivasseau. Berlin, Heidelberg: Springer, 1995, pp. 7–36. isbn: 978-3-540-49222-1.

[Ada15] Radosław Adamczak. "A note on the Hanson-Wright inequality for random vectors with dependencies". In: Electronic Communications in Probability 20.72 (2015), pp. 1–13.

[ABW17] Radosław Adamczak, Witold Bednorz, and Paweł Wolff. "Moment estimates implied by modified log-Sobolev inequalities". In: ESAIM: Probability and Statistics 21 (2017), pp. 467–494. doi: 10.1051/ps/2016030. url: https://www.numdam.org/articles/10.1051/ps/2016030/.

[ALPT10] Radosław Adamczak, Alexander E Litvak, Alain Pajor, and Nicole Tomczak-Jaegermann. "Quantitative estimates of the convergence of the empirical covariance matrix in log-concave ensembles". In: Journal of the American Mathematical Society 23.2 (2010), pp. 535–561.

[AH20] Arka Adhikari and Jiaoyang Huang. "Dyson Brownian motion for general β and potential at the edge". In: Probability Theory and Related Fields 178 (2020), pp. 893–950. doi: 10.1007/s00440-020-00992-9.

[AEK17] Oskari H Ajanki, László Erdős, and Torben Krüger. "Universality for general Wigner-type matrices". In: Probability Theory and Related Fields 169.3 (2017), pp. 667–727.

[AEK19] Oskari H Ajanki, László Erdős, and Torben Krüger. "Stability of the matrix Dyson equation and random matrices with correlations". In: Probability Theory and Related Fields 173.1 (2019), pp. 293–373.

[AEK18] Johannes Alt, László Erdős, and Torben Krüger. "Local inhomogeneous circular law". In: The Annals of Applied Probability 28.1 (2018), pp. 148–203.

[AEK20] Johannes Alt, László Erdős, and Torben Krüger.
"The Dyson equation with linear self-energy: spectral bands, edges and cusps". In: Documenta Mathematica 25 (2020), pp. 1421–1539.

[AEKS21] Johannes Alt, László Erdős, Torben Krüger, and Dominik Schröder. "Correlated random matrices: band rigidity and edge universality". In: The Annals of Probability 49.2 (2021), pp. 963–1001.

[BS99] ZD Bai and Jack W Silverstein. "Exact separation of eigenvalues of large dimensional sample covariance matrices". In: Annals of Probability (1999), pp. 1536–1555.

[BS98] Zhi-Dong Bai and Jack W Silverstein. "No eigenvalues outside the support of the limiting spectral distribution of large-dimensional sample covariance matrices". In: The Annals of Probability 26.1 (1998), pp. 316–345.

[BS10] Zhidong Bai and Jack W Silverstein. Spectral Analysis of Large Dimensional Random Matrices. 2nd. New York: Springer, 2010.

[BY12] Zhidong Bai and Jian-feng Yao. "On sample eigenvalues in a generalized spiked population model". In: Journal of Multivariate Analysis 106 (2012), pp. 167–177.

[BZ08] Zhidong Bai and Wang Zhou. "Large sample covariance matrices without independence structures in columns". In: Statistica Sinica 18.2 (2008), pp. 425–442.

[BBP05] Jinho Baik, Gérard Ben Arous, and Sandrine Péché. "Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices". In: The Annals of Probability 33.5 (2005), pp. 1643–1697.

[BPZ15] Zhigang Bao, Guangming Pan, and Wang Zhou. "Universality for the largest eigenvalue of sample covariance matrices with general population". In: The Annals of Statistics 43.1 (2015), pp. 382–421.

[BX25] Zhigang Bao and Xiaocong Xu. "Extreme eigenvalues of log-concave ensemble". In: Annales de l'Institut Henri Poincaré (B) Probabilités et Statistiques 61.1 (2025), pp. 155–184.

[BLLT20] Peter L. Bartlett, Philip M. Long, Gábor Lugosi, and Alexander Tsigler.
"Benign overfitting in linear regression". In: Proc. Natl. Acad. Sci. U.S.A. 117.48 (2020), pp. 30063–30070. doi: 10.1073/pnas.1907378117. url: https://doi.org/10.1073/pnas.1907378117.

[BHMM19] Mikhail Belkin, Daniel Hsu, Siyuan Ma, and Soumik Mandal. "Reconciling modern machine-learning practice and the classical bias–variance trade-off". In: Proc. Natl. Acad. Sci. U.S.A. 116.32 (2019), pp. 15849–15854. doi: 10.1073/pnas.1903070116. url: https://doi.org/10.1073/pnas.1903070116.

[BN12] Florent Benaych-Georges and Raj Rao Nadakuditi. "The singular values and vectors of low rank perturbations of large rectangular random matrices". In: Journal of Multivariate Analysis 111 (2012), pp. 120–135.

[Ben20] Lucas Benigni. "Eigenvectors distribution and quantum unique ergodicity for deformed Wigner matrices". In: Annales de l'Institut Henri Poincaré Probabilités et Statistiques 56.4 (2020), pp. 2822–2867. doi: 10.1214/20-AIHP1060.

[BP21] Lucas Benigni and Sandrine Péché. "Eigenvalue distribution of some nonlinear models of random matrices". In: Electronic Journal of Probability 26 (2021), pp. 1–37.

[BEK+14] Alex Bloemendal, László Erdős, Antti Knowles, Horng-Tzer Yau, and Jun Yin. "Isotropic local laws for sample covariance and generalized Wigner matrices". In: Electron. J. Probab. 19.33 (2014), pp. 1–53.

[Bou22] Paul Bourgade. "Extreme gaps between eigenvalues of Wigner matrices". In: Journal of the European Mathematical Society 24.8 (2022), pp. 2823–2873. doi: 10.4171/JEMS/1141. url: https://doi.org/10.4171/JEMS/1141.

[BF22] Paul Bourgade and Hugo Falconet. "Liouville quantum gravity from random matrix dynamics". In: arXiv preprint arXiv:2206.03029 (2022).

[CES24] Giorgio Cipolloni, László Erdős, and Dominik Schröder. "Mesoscopic central limit theorem for non-Hermitian random matrices". In: Probability Theory and Related Fields 188 (2024), pp. 1131–1182.
doi: 10.1007/s00440-023-01229-1.

[CL22] Romain Couillet and Zhenyu Liao. Random Matrix Methods for Machine Learning. Cambridge University Press, 2022.

[DLM24] Leonardo Defilippis, Bruno Loureiro, and Theodor Misiakiewicz. "Dimension-free deterministic equivalents and scaling laws for random feature regression". In: arXiv preprint (2024).

[DKRS23] Ilias Diakonikolas, Daniel M. Kane, Lisheng Ren, and Yuxin Sun. "SQ Lower Bounds for Non-Gaussian Component Analysis with Weaker Assumptions". In: Advances in Neural Information Processing Systems. Vol. 36. 2023. url: https://proceedings.neurips.cc/paper_files/paper/2023/hash/0d00a699f60e642b310eb04b76cc7731-Abstract-Conference.html.

[DKPP24] Ilias Diakonikolas, Sushrut Karmalkar, Spencer Pang, and Aaron Potechin. "Sum-of-Squares Lower Bounds for Non-Gaussian Component Analysis". In: arXiv preprint arXiv:2410.21426 (2024). To appear in FOCS 2024. doi: 10.48550/arXiv.2410.21426. url: https://arxiv.org/abs/2410.21426.

[DLMY23] Sofiia Dubova, Yue M Lu, Benjamin McKenna, and Horng-Tzer Yau. "Universality for the global spectrum of random inner-product kernel matrices in the polynomial regime". In: arXiv preprint arXiv:2310.18280 (2023).

[El10] Noureddine El Karoui. "The spectrum of kernel random matrices". In: The Annals of Statistics 38.1 (2010), pp. 1–50.

[EKY13] László Erdős, Antti Knowles, and Horng-Tzer Yau. "Averaging fluctuations in resolvents of random band matrices". In: Annales Henri Poincaré 14.8 (2013), pp. 1837–1926.

[EKYY13a] László Erdős, Antti Knowles, Horng-Tzer Yau, and Jun Yin. "Spectral statistics of Erdős–Rényi graphs I: Local semicircle law". In: The Annals of Probability 41.3 (2013), pp. 2279–2375.

[EKYY13b] László Erdős, Antti Knowles, Horng-Tzer Yau, and Jun Yin.
"The local semicircle law for a general class of random matrices". In: Electron. J. Probab. 18.59 (2013), pp. 1–58.

[ER24] László Erdős and Volodymyr Riabov. "Eigenstate thermalization hypothesis for Wigner-type matrices". In: Communications in Mathematical Physics 405.12 (2024), p. 282.

[ESY09a] László Erdős, Benjamin Schlein, and Horng-Tzer Yau. "Local semicircle law and complete delocalization for Wigner random matrices". In: Communications in Mathematical Physics 287.2 (2009), pp. 641–655.

[ESY09b] László Erdős, Benjamin Schlein, and Horng-Tzer Yau. "Semicircle law on short scales and delocalization of eigenvectors for Wigner random matrices". In: Ann. Probab. 37.1 (2009), pp. 815–852.

[EYY12a] László Erdős, Horng-Tzer Yau, and Jun Yin. "Bulk universality for generalized Wigner matrices". In: Probability Theory and Related Fields 154.1 (2012), pp. 341–407.

[EYY12b] László Erdős, Horng-Tzer Yau, and Jun Yin. "Rigidity of eigenvalues of generalized Wigner matrices". In: Advances in Mathematics 229.3 (2012), pp. 1435–1515.

[FJ22] Zhou Fan and Iain M Johnstone. "Tracy-Widom at each edge of real covariance and MANOVA estimators". In: Annals of Applied Probability 32.4 (2022), p. 2967.

[FMPW26] Zhou Fan, Renyuan Ma, Elliot Paquette, and Zhichao Wang. "Spherical 4-Spin Model: Loop Equations for Cumulant Bounds". To appear. 2026.

[FW20] Zhou Fan and Zhichao Wang. "Spectra of the conjugate kernel and neural tangent kernel for linear-width neural networks". In: Advances in Neural Information Processing Systems 33 (2020), pp. 7710–7721.

[GLK+20] Federica Gerace, Bruno Loureiro, Florent Krzakala, Marc Mézard, and Lenka Zdeborová. "Generalisation error in learning with random features and the hidden manifold model". In: International Conference on Machine Learning (2020), pp. 3452–3462.
[GMKZ20] Sebastian Goldt, Marc Mézard, Florent Krzakala, and Lenka Zdeborová. "Modelling the influence of data structure on learning in neural networks". In: Physical Review X 10.4 (2020), p. 041044.

[Gur14] Razvan Gurau. "Universality for random tensors". In: Ann. Inst. H. Poincaré Probab. Statist. 50.4 (2014), p. 1474.

[HMRT22] Trevor Hastie, Andrea Montanari, Saharon Rosset, and Ryan J Tibshirani. "Surprises in high-dimensional ridgeless least squares interpolation". In: Annals of Statistics 50.2 (2022), p. 949.

[HL22] Hong Hu and Yue M Lu. "Universality laws for high-dimensional learning with random features". In: IEEE Transactions on Information Theory 69.3 (2022), pp. 1932–1964.

[HLM24] Hong Hu, Yue M Lu, and Theodor Misiakiewicz. "Asymptotics of random feature regression beyond the linear scaling regime". In: arXiv preprint (2024).

[HL19] Jiaoyang Huang and Benjamin Landon. "Rigidity and a mesoscopic central limit theorem for Dyson Brownian motion for general β and potentials". In: Probability Theory and Related Fields 175 (2019), pp. 209–253. doi: 10.1007/s00440-018-0889-y.

[JGH18] Arthur Jacot, Franck Gabriel, and Clément Hongler. "Neural tangent kernel: Convergence and generalization in neural networks". In: Advances in Neural Information Processing Systems 31 (2018).

[KY17] Antti Knowles and Jun Yin. "Anisotropic local laws for random matrices". In: Probability Theory and Related Fields 169.1 (2017), pp. 257–352.

[KNH24] David Kogan, Sagnik Nandy, and Jiaoyang Huang. "Extremal Eigenvalues of Random Kernel Matrices with Polynomial Scaling". In: arXiv preprint (2024).

[LS16] Ji Oon Lee and Kevin Schnelli. "Tracy–Widom distribution for the largest eigenvalue of real sample covariance matrices with general population". In: The Annals of Applied Probability 26.6 (2016), pp. 3786–3839. doi: 10.1214/16-AAP1193.
[LSSY16] Ji Oon Lee, Kevin Sc hnelli, Ben Stetler, and Horng-Tzer Y au. “Bulk univ ersalit y for deformed Wigner matrices”. In: The Annals of Pr ob ability 44.3 (2016), pp. 2349–2425. [LLC18] Cosme Louart, Zhenyu Liao, and Romain Couillet. “A random matrix approac h to neural netw orks”. In: The Annals of Applie d Pr ob ability 28.2 (2018), pp. 1190–1248. [L Y25] Y ue M Lu and Horng-Tzer Y au. “An equiv alence principle for the spectrum of random inner-pro duct kernel matrices with p olynomial scalings”. In: The A nnals of Applie d Pr ob ability 35.4 (2025), pp. 2411–2470. [MR08] Jacques Magnen and Vincent Riv asseau. “Constructive ϕ 4 Field Theory without T ears”. In: Annales Henri Poinc ar ´ e 9 (2008), pp. 403–424. [MP67] Vladimir A Marcenko and Leonid Andreevic h P astur. “Distribution of eigen v alues for some sets of random matrices”. In: Mathematics of the USSR-Sb ornik 1.4 (1967), p. 457. [MM22] Song Mei and Andrea Montanari. “The generalization error of random features re- gression: Precise asymptotics and the double descen t curv e”. In: Commun. Pur e Appl. Math. 75.4 (2022), pp. 667–766. doi : 10. 1002/cpa .22008 . url : https: / /doi . org/ 10.1002/cpa.22008 . [MS12] James A Mingo and Roland Sp eicher. “Sharp b ounds for sums asso ciated to graphs of matrices”. In: Journal of F unctional Analysis 262.5 (2012), pp. 2272–2288. [MRSY19] Andrea Montanari, F eng Ruan, Y oungtak Sohn, and Jun Y an. “The generalization error of max-margin linear classifiers: High-dimensional asymptotics in the o verpa- rameterized regime”. In: arXiv pr eprint arXiv:1911.01544 (2019). [PWZ25] P arthe Pandit, Zhichao W ang, and Yizhe Zh u. “Universalit y of k ernel random matri- ces and k ernel regression in the quadratic regime”. In: Journal of Machine L e arning R ese ar ch 26.224 (2025), pp. 1–73. [PP AP25] Courtney Paquette, Elliot Paquette, Ben Adlam, and Jeffrey P ennington. “Homoge- nization of SGD in high dimensions: exact dynamics and generalization prop erties”. 
In: Math. Pr o gr am. 214.1 (2025), pp. 1–90. doi : 10 . 1007 / s10107 - 024 - 02171 - 3 . url : https://doi.org/10.1007/s10107- 024- 02171- 3 . [P as72] L. A. P astur. “On the spectrum of random matrices”. In: The or etic al and Mathemat- ic al Physics 10.1 (1972), pp. 67–74. doi : 10.1007/BF01035768 . [PW17] Jeffrey Pennington and Pratik W orah. “Nonlinear random matrix theory for deep learning”. In: A dvanc es in neur al information pr o c essing systems 30 (2017). REFERENCES 99 [PY14] Natesh S Pillai and Jun Yin. “Universalit y of cov ariance matrices”. In: The Annals of Applie d Pr ob ability 24.3 (2014), p. 935. [RR07] Ali Rahimi and Benjamin Rech t. “Random features for large-scale kernel machines”. In: A dvanc es in neur al information pr o c essing systems 20 (2007). [R T86] Pierre Rosenstiehl and Rob ert E T arjan. “Rectilinear planar la y outs and bip olar orien tations of planar graphs”. In: Discr ete & Computational Ge ometry 1.4 (1986), pp. 343–353. [Rud87] W alter Rudin. R e al and c omplex analysis . McGra w-Hill, Inc., 1987. [SCDL23] Dominik Schr¨ oder, Hugo Cui, Daniil Dmitriev, and Bruno Loureiro. “Deterministic equiv alent and error univ ersality of deep random features learning”. In: International Confer enc e on Machine L e arning (2023), pp. 30285–30320. [Sil95] Jac k W Silv erstein. “Strong con vergence of the empirical distribution of eigenv alues of large dimensional random matrices”. In: Journal of Multivariate Analysis 55.2 (1995), pp. 331–339. [SB95] Jac k W Silverstein and Zhi Dong Bai. “On the empirical distribution of eigen v alues of a class of large dimensional random matrices”. In: Journal of Multivariate analysis 54.2 (1995), pp. 175–192. [SC95] Jac k W Silv erstein and Sang-Il Choi. “Analysis of the limiting sp ectral distribution of large dimensional random matrices”. In: Journal of Multivariate Analysis 54.2 (1995), pp. 295–309. [SW19] P er von So osten and Simone W arzel. “Random c haracteristics for Wigner matrices”. 
In: Ele ctr onic Communic ations in Pr ob ability 24.75 (2019), pp. 1–12. doi : 10.1214/ 19- ECP278 . [SV13] Nikhil Sriv astav a and Roman V ershynin. “Cov ariance estimation for distributions with 2+ ε momen ts”. In: The Annals of Pr ob ability 41.5 (2013), pp. 3081–3111. [SBGG24] Eszter Sz´ ekely, Lorenzo Bardone, F e derica Gerace, and Sebastian Goldt. “Learn- ing from higher-order correlations, efficien tly: h yp othesis tests, random features, and neural net w orks”. In: A dvanc es in Neur al Information Pr o c essing Systems . V ol. 37. 2024. url : https : / / proceedings . neurips . cc / paper _ files / paper / 2024 / hash / 8f8af4eebc4e50994e0490898d891c96- Abstract- Conference.html . [TV18] Y an Shuo T an and Roman V ersh ynin. “P olynomial Time and Sample Complexity for Non-Gaussian Comp onent Analysis: Sp ectral Methods”. In: Pr o c e e dings of the 31st Confer enc e On L e arning The ory . V ol. 75. Pro ceedings of Machine Learning Research. PMLR, 2018, pp. 498–534. url : https : / / proceedings . mlr . press / v75 / tan18a . html . [Tik18] Konstan tin Tikhomiro v. “Sample cov ariance matrices of heavy-tailed distributions”. In: International Mathematics R ese ar ch Notic es 2018.20 (2018), pp. 6254–6289. [TV04] An tonia M T ulino and Sergio V erd ´ u. R andom matrix the ory and wir eless c ommuni- c ations . V ol. 1. Now Publishers Inc, 2004. [V er12] Roman V ersh ynin. “Introduction to the non-asymptotic analysis of random matrices”. In: Compr esse d Sensing . Cambridge Universit y Press, 2012, pp. 210–268. [V oi87] Dan V oiculescu. “Multiplication of certain non-commuting random v ariables”. In: Journal of Op er ator The ory (1987), pp. 223–235. [WL19] Ch uang W ang and Y ue M. Lu. “The scaling limit of high-dimensional online indep en- den t comp onent analysis”. In: Journal of Statistic al Me chanics: The ory and Exp eri- ment (2019). doi : 10 .1088 / 1742- 5468/ ab39d6 . url : https: / / doi . org / 10 . 1088 / 1742- 5468/ab39d6 . 
[WWF24] Zhic hao W ang, Denny W u, and Zhou F an. “Nonlinear spiked cov ariance matrices and signal propagation in deep neural netw orks”. In: The Thirty Seventh Annual Confer enc e on L e arning The ory . PMLR. 2024, pp. 4891–4957. 100 REFERENCES [WHL+25] Garrett G. W en, Hong Hu, Y ue M Lu, Zhou F an, and Theo dor Misiakiewicz. “When do es Gaussian equiv alence fail and ho w to fix it: Non-universal b eha vior of random features with quadratic scaling”. In: arXiv pr eprint arXiv:2512.03325 (2025). [Wig55] Eugene P Wigner. “Characteristic vectors of b ordered matrices with infinite dimen- sions”. In: A nnals of Mathematics 62.3 (1955), pp. 548–564. [Wis28] John Wishart. “The generalised pro duct momen t distribution in samples from a nor- mal multiv ariate population”. In: Biometrika 20.1/2 (1928), pp. 32–52. [XHM+22] Lechao Xiao, Hong Hu, Theo dor Misiakiewicz, Y ue M Lu, and Jeffrey Pennington. “Precise learning curves and higher-order scaling limits for dot-pro duct k ernels”. In: A dvanc es in Neur al Information Pr o c essing Systems (2022). [Y as16] P av el Y asko v. “Necessary and sufficient conditions for the Marchenk o-P astur theo- rem”. In: (2016).
