Testing for high-dimensional geometry in random graphs

Sébastien Bubeck*   Jian Ding†   Ronen Eldan‡   Miklós Z. Rácz§

November 24, 2015

Abstract

We study the problem of detecting the presence of an underlying high-dimensional geometric structure in a random graph. Under the null hypothesis, the observed graph is a realization of an Erdős–Rényi random graph $G(n,p)$. Under the alternative, the graph is generated from the $G(n,p,d)$ model, where each vertex corresponds to a latent independent random vector uniformly distributed on the sphere $\mathbb{S}^{d-1}$, and two vertices are connected if the corresponding latent vectors are close enough. In the dense regime (i.e., $p$ is a constant), we propose a near-optimal and computationally efficient testing procedure based on a new quantity which we call signed triangles. The proof of the detection lower bound is based on a new bound on the total variation distance between a Wishart matrix and an appropriately normalized GOE matrix. In the sparse regime, we make a conjecture for the optimal detection boundary. We conclude the paper with some preliminary steps on the problem of estimating the dimension in $G(n,p,d)$.

1 Introduction

Extracting information from large graphs has become an important statistical problem, as network data is now common in various fields. Whether one talks about social networks, gene networks, or (biological) neural networks, in all cases there is a rapidly growing industry dedicated to extracting knowledge from these graphs. A particularly important task in this spirit is to learn a useful representation of the vertices, that is, a mapping from the vertices to some metric space (usually $\mathbb{R}^d$).
In this paper, we take a step back and study the hypothesis testing problem that underlies these investigations: given a large graph, one wants to tell whether the observed connections result from a latent geometric structure on the vertices, or whether they are purely random. The null hypothesis that we consider is that the observed graph $G$ on $n$ vertices has been generated by the standard Erdős–Rényi random graph $G(n,p)$ [17], where each edge appears independently with probability $p$:
\[
H_0 : G \sim G(n,p).
\]
On the other hand, for the alternative, we consider the simplest model of a random geometric graph. Recall that a geometric graph is such that each vertex is labeled with a point in some metric space, and an edge is present between two vertices if the distance between the corresponding labels is smaller than some prespecified threshold. We focus on the case where the underlying metric space is the Euclidean sphere $\mathbb{S}^{d-1} = \{x \in \mathbb{R}^d : \|x\|_2 = 1\}$, and the latent labels are i.i.d. uniform random vectors in $\mathbb{S}^{d-1}$. We denote this model by $G(n,p,d)$, where $p$ is the probability of an edge between two vertices ($p$ determines the threshold distance for connection), and thus
\[
H_1 : G \sim G(n,p,d).
\]
Slightly more formally, $G(n,p,d)$ is defined as follows. Let $X_1, \ldots, X_n$ be independent random vectors, uniformly distributed on $\mathbb{S}^{d-1}$. In $G(n,p,d)$, distinct vertices $i \in [n]$ and $j \in [n]$ are connected by an edge if and only if $\langle X_i, X_j \rangle \ge t_{p,d}$, where the threshold value $t_{p,d} \in [-1,1]$ is such that $\mathbb{P}(\langle X_1, X_2 \rangle \ge t_{p,d}) = p$.

*Microsoft Research and Princeton University; sebubeck@microsoft.com.
†University of Chicago; jianding@galton.uchicago.edu.
‡University of Washington; roneneldan@gmail.com.
§University of California, Berkeley; racz@stat.berkeley.edu.
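The model above is straightforward to simulate, which is useful for sanity-checking the statistics discussed below. The following sketch (our illustration, not code from the paper; the function names are ours) samples uniform points on $\mathbb{S}^{d-1}$ as normalized Gaussian vectors and estimates $t_{p,d}$ by a Monte Carlo quantile of the one-dimensional marginal $\langle X_1, X_2 \rangle$:

```python
import numpy as np

def sample_sphere(n, d, rng):
    """n i.i.d. points, uniform on the sphere S^{d-1} (normalized Gaussians)."""
    X = rng.standard_normal((n, d))
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def threshold(p, d, rng, m=200_000):
    """Monte Carlo estimate of t_{p,d}: the (1-p)-quantile of <X_1, X_2>,
    which has the same law as the first coordinate of one uniform point."""
    return np.quantile(sample_sphere(m, d, rng)[:, 0], 1 - p)

def sample_gnpd(n, p, d, rng):
    """Adjacency matrix of G(n, p, d): connect i and j iff <X_i, X_j> >= t_{p,d}."""
    X = sample_sphere(n, d, rng)
    A = (X @ X.T >= threshold(p, d, rng)).astype(int)
    np.fill_diagonal(A, 0)  # no self-loops
    return A

rng = np.random.default_rng(0)
A = sample_gnpd(500, 0.3, 10, rng)
density = A.sum() / (500 * 499)  # close to p = 0.3
```

For $p = 1/2$ the threshold is exactly $t_{1/2,d} = 0$ by symmetry, so the Monte Carlo step can be skipped in that case.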
In contrast to most previous works (see Section 1.1), the focus of this paper is on the high-dimensional situation, where $d$ can scale as a function of $n$. This point of view is in line with recent advances in all areas of applied mathematics, and in particular statistics and learning theory, where high-dimensional feature spaces are becoming the new norm. We also consider both the dense regime, where $p$ is a constant independent of $n$, and the sparse regime, where $p = c/n$ for some constant $c$.

A natural test to uncover geometric structure is to count the number of triangles in $G$. Indeed, in a purely random scenario, vertex $u$ being connected to both $v$ and $w$ says nothing about whether $v$ and $w$ are connected. In a geometric setting, on the other hand, this implies that $v$ and $w$ are close to each other by the triangle inequality, thus increasing the probability of a connection between them. This, in turn, implies that the expected number of triangles is larger in the geometric setting, given the same edge density. More broadly, the number of triangles is a basic statistic in the analysis of networks, as it gives an indication of the dependencies in the network; see, e.g., [19] and the references therein.

One of the main results of this paper is that in the dense regime there are much more powerful statistics than triangles for distinguishing between $G(n,p)$ and $G(n,p,d)$. In particular, we propose a near-optimal statistic based on a new quantity which we call signed triangles. To put it succinctly, if $A$ denotes the adjacency matrix of a graph $G$, then the total number of triangles equals $\mathrm{Tr}(A^3)/6$, while the total "number" of signed triangles is defined analogously via $\mathrm{Tr}\left( (A - p(J - I))^3 \right)$, where $I$ is the identity matrix and $J$ is the matrix with every entry equal to $1$.
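Both statistics are cheap to compute from the adjacency matrix. Since $A$ and $B = A - p(J - I)$ have zero diagonal, every term of the trace of the cube with a repeated index vanishes, so each statistic is one sixth of the corresponding trace. A minimal sketch (ours, assuming numpy; not code from the paper):

```python
import numpy as np

def triangles(A):
    """Number of triangles: Tr(A^3) counts each triangle 6 times."""
    return int(np.trace(A @ A @ A)) // 6

def signed_triangles(A, p):
    """Signed triangle count: Tr(B^3)/6 with B = A - p(J - I).
    B has zero diagonal, so only triples of distinct vertices survive."""
    n = A.shape[0]
    B = A - p * (np.ones((n, n)) - np.eye(n))
    return np.trace(B @ B @ B) / 6

# A 4-cycle on vertices 0..3 has no triangles, but its signed count is nonzero.
A = np.zeros((4, 4), dtype=int)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    A[i, j] = A[j, i] = 1
```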
The key insight motivating this definition is that the variance of signed triangles is dramatically smaller than the variance of triangles, due to the cancellations introduced by the centering of the adjacency matrix. While elementary, this idea of using cancellations of various patterns seems to be new in the network analysis literature. It could have applications beyond the problem studied in this paper, in particular because it is quite different from other popular statistics, such as those coming from the method of moments [10] (where patterns are always counted positively), or those with the flavor of the clustering coefficient [28] (where ratios of counts are considered). Indeed, the key aspect of signed triangles is that different patterns are reweighted with positive and negative weights: triangles and induced single edges are counted positively, while induced wedges and independent sets on three vertices are counted negatively.

1.1 Related work

In the social networks literature the latent metric space is referred to as the social space. Starting with [20], there have been numerous works on estimating positions in the social space for individuals in a social network. Various models are considered in this literature, and interestingly some of them are very closely related to $G(n,p,d)$; see [20, Section 2.2]. Most papers focus on various approximations to the maximum likelihood estimator, and some theoretical results have been obtained with other methods, such as counting the number of common neighbors; see [25, 2]. In the present paper we settle for a less ambitious goal, as we are not learning the representation but simply testing for the presence of a meaningful representation.
On the other hand, we obtain much more precise results, such as an almost tight characterization of the pairs $(n,d)$ for which testing between $G(n,p)$ and $G(n,p,d)$ is possible in the dense regime. Most importantly, this provides, to the best of our knowledge, the first result for the high-dimensional setting, as all previous works on social space inference were in the low-dimensional regime (i.e., $d$ is fixed and $n \to \infty$).

In probability theory, models such as $G(n,p,d)$ have been studied for a long time in the low-dimensional regime; see, e.g., [24]. The high-dimensional setting was first investigated recently in [13], where it was observed that with $n$ fixed and $d \to \infty$, $G(n,p,d)$ converges in total variation to $G(n,p)$. In other words, if the dimension $d$ is very large compared to $n$, then one cannot distinguish between $G(n,p)$ and $G(n,p,d)$. Our main result is that, in fact, in the dense regime there is a phase transition when $d$ is of order $n^3$, with the total variation distance going from $1$ to $0$. In spite of previous works, the high-dimensional $G(n,p,d)$ remains mysterious in many ways. Essentially, in the dense regime the only graph parameter which is well understood is the clique number, due to the results of [13, 6], while in the sparse case basically nothing is known. One of the technical contributions of the present paper is to compute rather precisely the probability of a triangle in the sparse case.

This paper can be seen as part of a series of recent papers studying hypothesis testing in random graph models [7, 27, 12]. For instance, Arias-Castro and Verzelen consider the problem of testing for community structure, while we test for geometric structure.
More precisely, their null is identical to ours, that is, $G \sim G(n,p)$, while for the alternative they consider the model $G(n,p,k,q)$, which differs from $G(n,p)$ by having a random subset of $k$ vertices between which edges appear with probability $q$. This problem is closely related to community detection in stochastic block models, a problem which has recently attracted a lot of attention; see, e.g., [23, 1] and the references therein. Interestingly, when testing for community structure, one of the main obstacles is computational rather than information-theoretic. For example, it is obvious that one can distinguish between $G(n,1/2)$ and $G(n,1/2,k,1)$ as long as $k \gg \log(n)$, though when $k = o(\sqrt{n})$ no polynomial-time test is known for this problem (referred to as the planted clique problem); see, e.g., [4, 9]. On the contrary, in the context of testing for geometric structure, we show that polynomial-time methods are near-optimal, at least in the dense regime, since one can efficiently compute signed triangles. This is perhaps surprising, as the worst-case version of our problem, namely recognizing whether a graph can be realized as a geometric graph, is known to be NP-hard [11]. Finally, we also note that our new signed triangles statistic is closely related to the tensor introduced in [18] for the planted clique problem. While [18] computes the spectral norm of this tensor, here we simply sum its entries (in the case $p = 1/2$).

1.2 Contributions and content of the paper

The main objective of the paper is to identify the boundary of testability between $G(n,p)$ and $G(n,p,d)$, in both the dense and the sparse regimes. To put it differently, we are interested in studying the total variation distance between these two models, denoted by $\mathrm{TV}(G(n,p), G(n,p,d))$.
Recall that in [13] it was proved that, if $n$ is fixed and $d \to \infty$, then $\mathrm{TV}(G(n,p), G(n,p,d)) \to 0$.¹ Given this result, our question of interest becomes the following: how large can $d$ be, as a function of $n$, so that $\mathrm{TV}(G(n,p), G(n,p,d))$ remains bounded away from $0$ (or even becomes close to $1$)? The core of our contribution to this question is summarized by the following theorem.

Theorem 1
(a) Let $p \in (0,1)$ be fixed, and assume that $d/n^3 \to 0$. Then
\[
\mathrm{TV}(G(n,p), G(n,p,d)) \to 1. \tag{1}
\]
(b) Let $c > 0$ be fixed, and assume that $d/\log^3(n) \to 0$. Then
\[
\mathrm{TV}\left( G\left(n, \tfrac{c}{n}\right), G\left(n, \tfrac{c}{n}, d\right) \right) \to 1. \tag{2}
\]
(c) Furthermore, if $d/n^3 \to \infty$, then
\[
\sup_{p \in [0,1]} \mathrm{TV}(G(n,p), G(n,p,d)) \to 0. \tag{3}
\]

The results of Theorem 1(a) and 1(c) tightly characterize the dense regime. We conjecture that the result of Theorem 1(b) is tight for the sparse regime:

Conjecture 1 Let $c > 0$ be fixed, and assume that $d/\log^3(n) \to \infty$. Then
\[
\mathrm{TV}\left( G\left(n, \tfrac{c}{n}\right), G\left(n, \tfrac{c}{n}, d\right) \right) \to 0.
\]

In the following we outline the methods we use to prove Theorem 1. We first discuss the proof of part (a) of Theorem 1, where our main methodological contribution lies. As pointed out above, a natural test to consider is the number of triangles:
\[
T(G) := \sum_{\{i,j,k\} \in \binom{[n]}{3}} A_{i,j} A_{i,k} A_{j,k}, \tag{4}
\]
where $A$ denotes the adjacency matrix of the graph $G$ with vertex set $[n]$, so that $A_{i,j} = 1$ if vertices $i$ and $j$ are connected by an edge, and $A_{i,j} = 0$ otherwise. An elementary calculation shows that $\mathbb{E}\, T(G(n,p)) = \binom{n}{3} p^3$, while $\mathrm{Var}(T(G(n,p)))$ is of order $n^4$. A key calculation of the paper is to show that $\mathbb{E}\, T(G(n,p,d)) \ge \binom{n}{3} p^3 (1 + C_p/\sqrt{d})$ for some constant $C_p$ that depends only on $p$; see Lemma 1.

¹More precisely, they show that this convergence happens when $d$ is exponential in $n$.
Intuitively, this shows that triangles have some power as long as
\[
\left| \mathbb{E}\, T(G(n,p)) - \mathbb{E}\, T(G(n,p,d)) \right| \gg \sqrt{\mathrm{Var}(T(G(n,p)))},
\]
which is equivalent to $d \ll n^2$. One of the main contributions of the paper is to introduce a statistic that is asymptotically powerful as long as $d \ll n^3$, which we refer to as the number of signed triangles:
\[
\tau(G) := \sum_{\{i,j,k\} \in \binom{[n]}{3}} (A_{i,j} - p)(A_{i,k} - p)(A_{j,k} - p). \tag{5}
\]
The key point of signed triangles is the reduction of variance: $\mathrm{Var}(\tau(G(n,p)))$ is of order $n^3$, instead of $n^4$ for triangles. The improvement comes from the fact that 4-vertex subgraphs with at least 5 edges do not contribute to $\mathrm{Var}(\tau(G(n,p)))$, but they do contribute to $\mathrm{Var}(T(G(n,p)))$. We study $\tau$ in detail in Section 3, where we prove the following result (which implies Theorem 1(a)).

Theorem 2 Let $p \in (0,1)$ be fixed, and assume that $d/n^3 \to 0$. Then
\[
\mathrm{TV}\left( \tau(G(n,p)), \tau(G(n,p,d)) \right) \to 1.
\]

In the sparse regime we analyze the triangle statistic and prove the following theorem, which implies part (b) of Theorem 1.

Theorem 3 Let $c > 0$. If $d/\log^3(n) \to 0$, then
\[
\mathrm{TV}\left( T\left( G\left(n, \tfrac{c}{n}\right) \right), T\left( G\left(n, \tfrac{c}{n}, d\right) \right) \right) \to 1.
\]

In contrast with the dense regime, in the sparse regime the signed triangle statistic $\tau$ does not give significantly more power than the triangle statistic $T$. This is because in the sparse regime, with high probability, the graph does not contain any 4-vertex subgraph with at least 5 edges, which is where the improvement comes from in the dense regime. We believe that the upper bound $\log^3(n)$ in Theorem 3 cannot be significantly improved, as stated above in Conjecture 1. The main reason for this conjecture is that, when $d \gg \log^3(n)$, $G(n,c/n)$ and $G(n,c/n,d)$ seem to be locally equivalent (in particular, they both have the same Poisson number of triangles asymptotically).
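To see the resulting test concretely, one can compare $\tau$ on samples from the two models: under the null it fluctuates around $0$ with standard deviation of order $n^{3/2}$, while under the alternative its mean is of order $n^3/\sqrt{d}$. A small simulation sketch (ours, not from the paper; we take $p = 1/2$ so that $t_{p,d} = 0$ exactly, and the sizes are chosen only for illustration):

```python
import numpy as np

def sample_sphere(n, d, rng):
    X = rng.standard_normal((n, d))
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def tau(A, p):
    """Signed triangle statistic: Tr(B^3)/6 with B = A - p(J - I)."""
    n = A.shape[0]
    B = A - p * (np.ones((n, n)) - np.eye(n))
    return np.trace(B @ B @ B) / 6

def tau_er(n, p, rng):
    """tau on one draw of G(n, p)."""
    mask = np.triu(np.ones((n, n), dtype=bool), 1)  # strict upper triangle
    A = (mask & (rng.random((n, n)) < p)).astype(float)
    return tau(A + A.T, p)

def tau_geo(n, p, d, rng):
    """tau on one draw of G(n, p, d); for p = 1/2 the threshold is 0."""
    X = sample_sphere(n, d, rng)
    A = (X @ X.T >= 0).astype(float)
    np.fill_diagonal(A, 0)
    return tau(A, p)

rng = np.random.default_rng(0)
n, d, p, reps = 64, 2, 0.5, 40
null_vals = np.array([tau_er(n, p, rng) for _ in range(reps)])
alt_vals = np.array([tau_geo(n, p, d, rng) for _ in range(reps)])
# null_vals hover around 0; alt_vals are shifted far to the right.
```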
Thus the only way to distinguish between them would be to find an emergent global property which is significantly different under the two models, but this seems unlikely to exist.

The final result of Theorem 1, part (c), complements part (a) by giving a matching lower bound. This bound is also valid for the sparse regime, though in this case we believe that it is not tight. The main idea behind the proof of this result is to view the random graphs in question as (essentially) the same function of appropriate random matrices. On the one hand, the random geometric graph $G(n,p,d)$ can be viewed as a function of an $n \times n$ Wishart matrix with $d$ degrees of freedom, i.e., a matrix of inner products of $n$ $d$-dimensional Gaussian vectors, denoted by $W(n,d)$. On the other hand, one can view $G(n,p)$ as a function of an $n \times n$ GOE random matrix, i.e., a symmetric matrix with i.i.d. Gaussian entries on and above the diagonal, denoted by $M(n)$. Moreover, these two functions are essentially the same. Theorem 1(c) then follows from the following result on random matrices, stating that, if $d/n^3$ is very large, then the Wishart matrix has approximately the same law as an appropriately centered and scaled GOE random matrix.

Theorem 4 Let $I_n$ denote the $n \times n$ identity matrix. If $d/n^3 \to \infty$, then
\[
\mathrm{TV}\left( W(n,d), \sqrt{d}\, M(n) + d I_n \right) \to 0.
\]

The random ensembles $W(n,d)$ and $M(n)$ are defined more precisely in Section 5, where the above theorem is also proved. Finally, in Section 6 we touch upon the following question: for which values of $n$ and $d$ can the dimension $d$ be reconstructed by observing a sample of $G(n,p,d)$? We give the following bound for $p = 1/2$, which can be considered as a proof of concept.
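The mechanism behind Theorem 4 can be illustrated numerically: thresholding the suitably normalized entries of either $W(n,d)$ or $\sqrt{d}\,M(n) + dI_n$ yields a graph, and for $d$ large compared to $n^3$ the two constructions are statistically similar. A rough sketch (ours; we take $p = 1/2$ and threshold at $0$, and we symmetrize an i.i.d. Gaussian matrix, which gives variance $2$ on the diagonal, matching the $\chi^2_d$ fluctuations of $W_{ii}$; the paper's exact conventions for $W(n,d)$ and $M(n)$ are in its Section 5):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 12, 400_000  # d is large compared to n^3 = 1728

# Wishart W(n, d): Gram matrix of n i.i.d. d-dimensional Gaussian vectors.
Z = rng.standard_normal((n, d))
W = Z @ Z.T

# GOE surrogate: symmetrizing an i.i.d. N(0,1) matrix gives N(0,1)
# off-diagonal entries and N(0,2) diagonal entries.
G = rng.standard_normal((n, n))
M = (G + G.T) / np.sqrt(2)
S = np.sqrt(d) * M + d * np.eye(n)

def graph_from(K):
    """p = 1/2 graph: connect i and j iff the normalized entry is >= 0."""
    D = np.sqrt(np.diagonal(K))
    A = (K / np.outer(D, D) >= 0).astype(int)
    np.fill_diagonal(A, 0)
    return A

A_wishart, A_goe = graph_from(W), graph_from(S)
# Both graphs have edge density close to 1/2.
```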
Theorem 5 There exists a universal constant $C > 0$ such that for all integers $n$ and $d_1 < d_2$, one has
\[
\mathrm{TV}\left( G(n,1/2,d_1), G(n,1/2,d_2) \right) \ge 1 - C \left( \frac{d_1}{n} \right)^2 .
\]

This bound is tight, as demonstrated by an older result of the third named author [16], which states that when $d \gg n$, the graphs $G(n,1/2,d)$ and $G(n,1/2,d+1)$ are indistinguishable.

Theorem 6 There exists a universal constant $C > 0$ such that for all integers $n < d$,
\[
\mathrm{TV}\left( G(n,1/2,d), G(n,1/2,d+1) \right) \le C \sqrt{ \left( \frac{d+1}{d-n} \right)^2 - 1 } .
\]

2 Estimates for the number of triangles in a geometric graph

The point of this section is to give a lower bound for the expected number of triangles in the random geometric graph $G(n,p,d)$, using elementary estimates. To this end, let $X_1$, $X_2$, and $X_3$ be independent uniformly distributed points on $\mathbb{S}^{d-1}$. Consider the event
\[
E = \left\{ \langle X_1, X_2 \rangle \ge t_{p,d},\ \langle X_1, X_3 \rangle \ge t_{p,d},\ \langle X_2, X_3 \rangle \ge t_{p,d} \right\}
\]
that the corresponding vertices form a triangle; the expected number of triangles in $G(n,p,d)$ is thus $\mathbb{E}\, T(G(n,p,d)) = \binom{n}{3} \mathbb{P}(E)$. Our main result of this section is the following.

Lemma 1 There exists a universal constant $C > 0$ such that whenever $p < \frac14$ we have
\[
\mathbb{P}(E) \ge C^{-1} p^3 \frac{\left( \log \frac{1}{p} \right)^{3/2}}{\sqrt{d}} . \tag{6}
\]
Moreover, for every fixed $0 < p < 1$, there exists a constant $C_p > 0$ such that for all $d \ge C_p^{-1}$,
\[
\mathbb{P}(E) \ge p^3 \left( 1 + \frac{C_p}{\sqrt{d}} \right) . \tag{7}
\]

Before we move on to the proof, we need some preliminary results, for which we need some additional notation. Denote by $\varphi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ the standard Gaussian density and by $\Phi(x) = \int_x^\infty \varphi(z)\, dz$ the associated complementary cumulative distribution function (note that $\Phi$ is decreasing). Moreover, define
\[
f_d(x) = \frac{\Gamma(d/2)}{\Gamma((d-1)/2)\sqrt{\pi}} \left( 1 - x^2 \right)^{(d-3)/2}, \qquad x \in [-1,1]. \tag{8}
\]
A standard calculation gives that $f_d$ is the density of a one-dimensional marginal of a uniform random point on $\mathbb{S}^{d-1}$ (see [26, Section 2]). Throughout the paper we use the notation $a \wedge b := \min\{a, b\}$.

Lemma 2 There exists a universal constant $C > 0$ such that for all $0 < p \le 1/2$ we have
\[
C^{-1} \left( \left( \tfrac12 - p \right) \sqrt{\frac{\log(1/p)}{d}} \right) \wedge \frac12 \ \le\ t_{p,d} \ \le\ C \sqrt{\frac{\log(1/p)}{d}} \tag{9}
\]
and
\[
f_{d-1}(t_{p,d}) \ge C^{-1} d\, p\, t_{p,d}. \tag{10}
\]

Proof We may assume that $d \ge 4$; the statement is easily checked when $d \le 3$. We begin by recalling the following well-known inequalities (see, e.g., [3, Chapter 6]):
\[
\frac{\sqrt{x}}{100} \Gamma(x) \le \Gamma(x + 1/2) \le 2\sqrt{x}\, \Gamma(x), \qquad \forall x > 1. \tag{11}
\]
Therefore, we have for $0 < \theta < 1$ that
\[
\begin{aligned}
\int_\theta^1 f_d(x)\, dx
&\le \frac{2\sqrt{d}}{\sqrt{2\pi}} \int_\theta^1 (1 - x^2)^{(d-3)/2}\, dx
= \frac{2\sqrt{d}}{\sqrt{2\pi}} \int_0^{1-\theta} e^{\frac{d-3}{2} \log(1 - (\theta + x)^2)}\, dx \\
&\le \frac{2\sqrt{d}}{\sqrt{2\pi}} (1 - \theta^2)^{(d-3)/2} \int_0^{1-\theta} e^{\frac{d-3}{2} \left( \log(1 - (\theta + x)^2) - \log(1 - \theta^2) \right)}\, dx \\
&\le \frac{2\sqrt{d}}{\sqrt{2\pi}} (1 - \theta^2)^{(d-3)/2} \int_0^1 e^{-\frac{d-3}{2} (\theta x + x^2)}\, dx
\le 8 \left( 1 \wedge \frac{1}{\theta \sqrt{d}} \right) (1 - \theta^2)^{(d-3)/2}. 
\end{aligned} \tag{12}
\]
Taking $\theta = t_{p,d}$ thus gives $p \le 8 (1 - t_{p,d}^2)^{(d-3)/2}$, and so $\log(p/8) \le -\frac{d-3}{2} t_{p,d}^2$, which implies the upper bound in (9). Moreover, the inequality in (12) also gives
\[
p \le \frac{8}{t_{p,d}\sqrt{d}} (1 - t_{p,d}^2)^{(d-3)/2} \overset{(11)}{\le} \frac{8C}{d\, t_{p,d}} f_d(t_{p,d})
\]
for a universal constant $C > 0$, which proves (10).

It remains to prove the lower bound in (9). First assume $10^{-6} \le p \le 1/2$. For all $0 \le x \le 1$ we see that $f_d(x) \le 2\sqrt{d}$, and so for all $0 \le \theta \le 1$ we have $\int_0^\theta f_d(x)\, dx \le 2\theta\sqrt{d}$. Since $\int_0^{t_{p,d}} f_d(x)\, dx = 1/2 - p$, we get that $t_{p,d} \ge \frac{1/2 - p}{2\sqrt{d}}$, as required. Next, we consider the case $0 < p \le 10^{-6}$. In this case it suffices to assume that $d$ is larger than an absolute constant, so that the left hand side of (9) is at most $1/2$.
By (11), we see that for $0 < \theta \le 1/2$ we have
\[
\int_\theta^1 f_d(x)\, dx \ \ge\ \frac{\sqrt{d}}{400} \int_\theta^1 (1 - x^2)^{(d-3)/2}\, dx
\ \ge\ \frac{\sqrt{d}}{400} \int_\theta^{\theta + \frac{1}{\sqrt{d}} \wedge \frac{1}{\theta d}} e^{-x^2 (d-3)}\, dx
\ \ge\ 10^{-4} \left( 1 \wedge \frac{1}{\theta\sqrt{d}} \right) (1 - \theta^2)^{(d-3)/2}. \tag{13}
\]
Since $\int_{t_{p,d}}^1 f_d(x)\, dx = p$, the preceding inequality yields that $t_{p,d} \ge \left( c \sqrt{\log(1/p)/d} \right) \wedge (1/2)$ for some absolute constant $c > 0$. This completes the proof.

We are finally ready to prove the main estimate of this section.

Proof of Lemma 1 First, assume that $d \le \frac14 \log(1/p)$. Denote by $\angle(\cdot,\cdot)$ the geodesic distance on $\mathbb{S}^{d-1}$ and set $g(\theta) := \mathbb{P}(\angle(X_1, X_2) < \theta)$. A standard calculation shows that
\[
g(\theta) = \frac{(d-1)\, \pi^{(d-1)/2}}{\Gamma\left( \frac{d+1}{2} \right)} \int_0^\theta \sin(x)^{d-2}\, dx.
\]
Using a change of variables and the fact that $\sin(x/2) \ge \sin(x)/2$ for all $x \in [0, \pi]$, we have that
\[
g(\theta/2) \ge 2^{-d} g(\theta), \qquad \forall \theta < \pi. \tag{14}
\]
Next observe that, by the triangle inequality,
\[
\begin{aligned}
E &= \left\{ \angle(X_1,X_2) < \arccos(t_{p,d}),\ \angle(X_1,X_3) < \arccos(t_{p,d}),\ \angle(X_2,X_3) < \arccos(t_{p,d}) \right\} \\
&\supseteq \left\{ \angle(X_1,X_2) < \tfrac12 \arccos(t_{p,d}),\ \angle(X_1,X_3) < \tfrac12 \arccos(t_{p,d}) \right\}.
\end{aligned}
\]
Since $\angle(X_1,X_2)$ and $\angle(X_1,X_3)$ are independent, we get that
\[
\mathbb{P}(E) \ge g\left( \tfrac12 \arccos(t_{p,d}) \right)^2 \ge g(\arccos(t_{p,d}))^2\, 2^{-2d} = p^2\, 2^{-2d} \ge p^2\, 2^{-\frac12 \log(1/p)} \ge p^3 \left( 1 + c \left( \log \tfrac1p \right)^{3/2} \right)
\]
for a universal constant $c > 0$, yielding the desired estimates.

From this point on, we may therefore assume that $d \ge \frac14 \log(1/p)$. We first prove the lemma under the assumption that $p < \frac12$; the proof of the case $p \ge \frac12$ is similar, as explained below. Let $Z$ be the first coordinate of a uniform point on $\mathbb{S}^{d-1}$, hence a random variable whose probability density function is $f_d(\cdot)$. Denote by $E_{i,j}$ the event $\{ \langle X_i, X_j \rangle \ge t_{p,d} \}$ and by $E_{i,j}(x)$ the event $\{ \langle X_i, X_j \rangle = x \}$. In what follows, we occasionally allow ourselves to condition on the zero-probability event $E_{i,j}(x)$.
This should be understood as conditioning on $\{ \langle X_i, X_j \rangle \in [x - \varepsilon, x + \varepsilon] \}$ and then taking $\varepsilon \to 0$, which is well-defined thanks to a simple continuity argument. Note that $\mathbb{P}(E_{1,3}, E_{2,3} \mid E_{1,2}(x))$ is an increasing function of $x$. Therefore,
\[
\begin{aligned}
&\mathbb{P}(E_{1,3}, E_{2,3} \mid E_{1,2}) - \mathbb{P}(E_{1,3}, E_{2,3}) \\
&\quad = \mathbb{P}(E_{1,3}, E_{2,3} \mid E_{1,2}) - \tfrac12 \mathbb{P}(E_{1,3}, E_{2,3} \mid \langle X_1, X_2 \rangle \le 0) - \tfrac12 \mathbb{P}(E_{1,3}, E_{2,3} \mid \langle X_1, X_2 \rangle \ge 0) \\
&\quad \ge \tfrac12 \left\{ \mathbb{P}(E_{1,3}, E_{2,3} \mid E_{1,2}(t_{p,d})) - \mathbb{P}(E_{1,3}, E_{2,3} \mid E_{1,2}(0)) \right\}. 
\end{aligned} \tag{15}
\]
Write $Z_1 = \langle X_3, X_1 \rangle$ and
\[
Z_2 = \left\langle X_3, \frac{\mathrm{Proj}_{X_1^\perp} X_2}{\left\| \mathrm{Proj}_{X_1^\perp} X_2 \right\|} \right\rangle,
\]
where $\mathrm{Proj}_{X_1^\perp}$ denotes the orthogonal projection onto the orthogonal complement of $X_1$. We have that
\[
\mathbb{P}(E_{1,3}, E_{2,3} \mid E_{1,2}(t_{p,d})) - \mathbb{P}(E_{1,3}, E_{2,3} \mid E_{1,2}(0))
= \int_{z_1 \ge t_{p,d}} \left\{ \mathbb{P}(E_{2,3} \mid Z_1 = z_1, E_{1,2}(t_{p,d})) - \mathbb{P}(E_{2,3} \mid Z_1 = z_1, E_{1,2}(0)) \right\} f_d(z_1)\, dz_1. \tag{16}
\]
Conditioning on $Z_1 = z_1$, it is easy to verify that $Z_2$ has the same distribution as $\sqrt{1 - z_1^2}\, Z'$, where $Z'$ has density $f_{d-1}$. Therefore
\[
\mathbb{P}(E_{2,3} \mid Z_1 = z_1, E_{1,2}(t_{p,d}))
= \mathbb{P}\left( \sqrt{1 - z_1^2} \sqrt{1 - t_{p,d}^2}\, Z' + t_{p,d} z_1 \ge t_{p,d} \right)
= \mathbb{P}\left( \sqrt{1 - t_{p,d}^2}\, Z' \ge \sqrt{\frac{1 - z_1}{1 + z_1}}\, t_{p,d} \right).
\]
Since $\sqrt{(1 - z_1)/(1 + z_1)}$ is a decreasing function of $z_1$, the right hand side of the preceding display is increasing in $z_1$. Note also that $\mathbb{P}(E_{2,3} \mid Z_1 = z_1, E_{1,2}(0))$ is a decreasing function of $z_1$. This is because under $E_{1,2}(0)$ we have $\langle X_2, X_3 \rangle = Z_2$, which, as mentioned above, under this conditioning has the same distribution as $\sqrt{1 - z_1^2}\, Z'$, where $Z'$ has density $f_{d-1}$, and so the tail probability of $\langle X_2, X_3 \rangle$ is a decreasing function of $z_1$. Thus the integral in (16) is bounded from below by $p$ times the difference of the conditional probabilities evaluated at $z_1 = t_{p,d}$.
Therefore, using also the previous display and that $\{ Z_1 = z_1 \} = E_{1,3}(z_1)$, we have
\[
\mathbb{P}(E_{1,3}, E_{2,3} \mid E_{1,2}(t_{p,d})) - \mathbb{P}(E_{1,3}, E_{2,3} \mid E_{1,2}(0))
\ \ge\ p\, \mathbb{P}\left( \frac{t_{p,d}}{1 + t_{p,d}} \le Z' \le \frac{t_{p,d}}{\sqrt{1 - t_{p,d}^2}} \right)
\ \ge\ p \int_{\frac{t_{p,d}}{1 + t_{p,d}}}^{t_{p,d}} f_{d-1}(x)\, dx. \tag{17}
\]
Since $f_{d-1}(z)/f_d(z) \ge 1/10$ for all $-1 \le z \le 1$, using (10) we learn that $f_{d-1}(z) \ge c' d\, p\, t_{p,d}$ for all $0 \le z \le t_{p,d}$, where $c' > 0$ is an absolute constant. Plugging the preceding inequality into (17) gives that
\[
\mathbb{P}(E_{1,3}, E_{2,3} \mid E_{1,2}(t_{p,d})) - \mathbb{P}(E_{1,3}, E_{2,3} \mid E_{1,2}(0))
\ \ge\ c\, d\, p^2 t_{p,d}^3
\ \overset{(9)}{\ge}\ c' p^2 \left( \left( \tfrac12 - p \right)^3 \frac{(\log(1/p))^{3/2}}{\sqrt{d}} \wedge d \right),
\]
where $c, c' > 0$ are absolute constants and the last inequality follows from (9). By our assumption that $d \ge \frac14 \log(1/p)$, we get that
\[
\mathbb{P}(E_{1,3}, E_{2,3} \mid E_{1,2}(t_{p,d})) - \mathbb{P}(E_{1,3}, E_{2,3} \mid E_{1,2}(0))
\ \ge\ \frac{1}{16} c' \left( \tfrac12 - p \right)^3 p^2 \frac{(\log(1/p))^{3/2}}{\sqrt{d}}.
\]
Plugging the preceding inequality into (15) we get that
\[
\begin{aligned}
\mathbb{P}(E) &= \mathbb{P}(E_{1,2}, E_{1,3}, E_{2,3}) = \mathbb{P}(E_{1,3}, E_{2,3} \mid E_{1,2})\, \mathbb{P}(E_{1,2}) \\
&= \left\{ \mathbb{P}(E_{1,3}, E_{2,3} \mid E_{1,2}) - \mathbb{P}(E_{1,3}, E_{2,3}) \right\} \mathbb{P}(E_{1,2}) + \mathbb{P}(E_{1,3}, E_{2,3})\, \mathbb{P}(E_{1,2}) \\
&\ge \left\{ \frac{1}{32} c' \left( \tfrac12 - p \right)^3 p^2 \frac{(\log(1/p))^{3/2}}{\sqrt{d}} \right\} \times p + p^2 \times p
= p^3 \left( 1 + \frac{1}{32} c' \left( \tfrac12 - p \right)^3 \frac{(\log(1/p))^{3/2}}{\sqrt{d}} \right),
\end{aligned}
\]
which completes the proof of (6), and it also completes the proof of (7) for the case $p < 1/2$ (note that $\mathbb{P}(E_{1,3}, E_{2,3}) = p^2$).

It remains to prove the lemma for the case $p \ge 1/2$. Since the case $p = 1/2$ is established in Lemma 5 in Section 6, we may assume that $\frac12 < p < 1$. This case is treated by considering the event $\widetilde{E} = E_{1,2}^C \cap E_{1,3}^C \cap E_{2,3}^C$ and by observing that, since $\mathbb{P}(E_{1,2}) = p$ and $\mathbb{P}(E_{1,2} \cap E_{1,3}) = p^2$, we have by the inclusion-exclusion principle that
\[
\mathbb{P}(E) = 1 - 3p + 3p^2 - \mathbb{P}(\widetilde{E}). \tag{18}
\]
In light of this equation, we need to establish an upper bound on $\mathbb{P}(\widetilde{E})$. Our computation in what follows can be considered as a dual of the case $p < 1/2$. Note that $\mathbb{P}\left( E_{1,3}^C, E_{2,3}^C \mid E_{1,2}(x) \right)$ is an increasing function of $x$. Therefore,
\[
\mathbb{P}\left( E_{1,3}^C, E_{2,3}^C \mid E_{1,2}^C \right) - \mathbb{P}\left( E_{1,3}^C, E_{2,3}^C \right)
\le \frac12 \left\{ \mathbb{P}\left( E_{1,3}^C, E_{2,3}^C \mid E_{1,2}(t_{p,d}) \right) - \mathbb{P}\left( E_{1,3}^C, E_{2,3}^C \mid E_{1,2}(0) \right) \right\}.
\]
We claim that the proof of the bound (7) (and thus of the entire lemma) is concluded if we manage to show that
\[
\mathbb{P}\left( E_{1,3}^C, E_{2,3}^C \mid E_{1,2}(t_{p,d}) \right) - \mathbb{P}\left( E_{1,3}^C, E_{2,3}^C \mid E_{1,2}(0) \right) \le - c_p / \sqrt{d}, \tag{19}
\]
for a constant $c_p > 0$ which depends only on $p$. Indeed, using the last two inequalities we would then have that
\[
\mathbb{P}(\widetilde{E}) = (1 - p)\, \mathbb{P}\left( E_{1,3}^C, E_{2,3}^C \mid E_{1,2}^C \right) \le -(1 - p)\, c_p / \sqrt{d} + (1 - p)^3,
\]
which, combined with (18), gives that $\mathbb{P}(E) \ge p^3 + (1 - p) c_p / \sqrt{d}$.

It thus remains to prove (19). As above, we write $Z_1 = \langle X_3, X_1 \rangle$ and
\[
Z_2 = \left\langle X_3, \frac{\mathrm{Proj}_{X_1^\perp} X_2}{\left\| \mathrm{Proj}_{X_1^\perp} X_2 \right\|} \right\rangle.
\]
We see that
\[
\mathbb{P}\left( E_{1,3}^C, E_{2,3}^C \mid E_{1,2}(t_{p,d}) \right) - \mathbb{P}\left( E_{1,3}^C, E_{2,3}^C \mid E_{1,2}(0) \right)
= \int_{z_1 \le t_{p,d}} \left\{ \mathbb{P}\left( E_{2,3}^C \mid Z_1 = z_1, E_{1,2}(t_{p,d}) \right) - \mathbb{P}\left( E_{2,3}^C \mid Z_1 = z_1, E_{1,2}(0) \right) \right\} f_d(z_1)\, dz_1. \tag{20}
\]
Conditioning on $Z_1 = z_1$, we see that $Z_2$ has the same distribution as $\sqrt{1 - z_1^2}\, Z'$, where $Z'$ has density $f_{d-1}$. Therefore
\[
\mathbb{P}\left( E_{2,3}^C \mid Z_1 = z_1, E_{1,2}(t_{p,d}) \right)
= \mathbb{P}\left( \sqrt{1 - z_1^2} \sqrt{1 - t_{p,d}^2}\, Z' + t_{p,d} z_1 \le t_{p,d} \right)
= \mathbb{P}\left( \sqrt{1 - t_{p,d}^2}\, Z' \le \sqrt{\frac{1 - z_1}{1 + z_1}}\, t_{p,d} \right).
\]
Since $\sqrt{(1 - z_1)/(1 + z_1)}$ is a decreasing function of $z_1$ (and $t_{p,d} < 0$ in this case), the right hand side of the preceding display is increasing in $z_1$. It is clear that $\mathbb{P}\left( E_{2,3}^C \mid Z_1 = z_1, E_{1,2}(0) \right)$ is a decreasing function of $z_1$. Thus, the right hand side of (20) is bounded from above by $p$ times the integrand evaluated at $z_1 = t_{p,d}$.
Therefore,
\[
\mathbb{P}\left( E_{1,3}^C, E_{2,3}^C \mid E_{1,2}(t_{p,d}) \right) - \mathbb{P}\left( E_{1,3}^C, E_{2,3}^C \mid E_{1,2}(0) \right)
\le -p\, \mathbb{P}\left( \frac{t_{p,d}}{1 + t_{p,d}} \le Z' \le \frac{t_{p,d}}{\sqrt{1 - t_{p,d}^2}} \right)
\le -p\, t_{p,d} \left( \frac{1}{\sqrt{1 - t_{p,d}^2}} - \frac{1}{1 + t_{p,d}} \right) f_{d-1}\!\left( \frac{t_{p,d}}{1 + t_{p,d}} \right).
\]
It is easily seen that the above is smaller than $-C_{p,d}$, where $C_{p,d} > 0$ for any fixed $d$ and $\frac12 < p < 1$. Thus, by choosing the constant $c_p$ to be small enough, we can make sure that the inequality in (19) holds true for any fixed value of $d$. In other words, we may legitimately assume that $d > \widetilde{c}_p$, where $\widetilde{c}_p$ depends on $p$. Also, since $t_{1-p,d} = -t_{p,d}$, the second inequality in (9) teaches us that
\[
|t_{p,d}| \le C' \sqrt{\frac{\log \frac{1}{1-p}}{d}}, \tag{21}
\]
for a universal constant $C' > 0$. In turn, by taking $\widetilde{c}_p$ to be large enough, we may assume that $|t_{p,d}| < 1/10$. Plugging this assumption into the last inequality, we deduce that
\[
\mathbb{P}\left( E_{1,3}^C, E_{2,3}^C \mid E_{1,2}(t_{p,d}) \right) - \mathbb{P}\left( E_{1,3}^C, E_{2,3}^C \mid E_{1,2}(0) \right) \le -c\, p\, t_{p,d}^2\, f_{d-1}(2 t_{p,d})
\]
for a universal constant $c > 0$. Using the bounds (11) and (21), we have
\[
f_{d-1}(2 t_{p,d}) \ge \frac{\sqrt{d}}{2} \left( 1 - 4 t_{p,d}^2 \right)^{(d-3)/2}
\ge \frac{\sqrt{d}}{2} \left( 1 - 4 (C')^2 \frac{\log \frac{1}{1-p}}{d} \right)^{(d-3)/2}
\ge c'_p \sqrt{d},
\]
where $c'_p$ depends only on $p$. Combining the last two inequalities with (9) finally yields (19).

3 Proof of Theorem 2

Recall from the introduction that $T(G)$ and $\tau(G)$ denote the number of triangles and signed triangles, respectively, of a graph $G$ (see (4) and (5)). We denote by $A$ the adjacency matrix of a graph $G$; we omit the dependence of $A$ on $G$, as the graph will always be obvious from the context. To abbreviate notation, for distinct vertices $i$, $j$, and $k$, we define $\bar{A}_{i,j} := A_{i,j} - \mathbb{E}[A_{i,j}]$ and $\tau_G(i,j,k) := \bar{A}_{i,j} \bar{A}_{i,k} \bar{A}_{j,k}$. The number of signed triangles of a graph $G$ with vertex set $[n]$ can then be written as
\[
\tau(G) = \sum_{\{i,j,k\} \in \binom{[n]}{3}} \tau_G(i,j,k).
\]
In order to prove Theorem 2, we need to show that $\tau(G(n,p))$ and $\tau(G(n,p,d))$ behave very differently as long as $d$ is much smaller than $n^3$. This is done by establishing that the expectations of $\tau(G(n,p))$ and $\tau(G(n,p,d))$ differ by a quantity which exceeds both of the corresponding standard deviations. We break the proof into four parts accordingly: in Section 3.1 we analyze $\tau(G(n,p))$, in Section 3.2 we give a lower bound on $\mathbb{E}\, \tau(G(n,p,d))$, in Section 3.3 we give an upper bound on the variance of $\tau(G(n,p,d))$, and finally in Section 3.4 we conclude the proof of Theorem 2.

3.1 Analysis of $\tau(G(n,p))$

The analysis of the statistic $\tau$ in the Erdős–Rényi model is straightforward. First, note that for each pair $\{i,j\} \subset [n]$ one has $\mathbb{E}\, \bar{A}_{i,j} = 0$. Therefore, by the independence of edges it is clear that $\mathbb{E}\, \tau_{G(n,p)}(i,j,k) = 0$ for all triples $\{i,j,k\} \subset [n]$, and so $\mathbb{E}\, \tau(G(n,p)) = 0$. In order to compute the variance, we again use the independence of edges to see that
\[
\mathbb{E}\left[ \tau_{G(n,p)}(1,2,3)\, \tau_{G(n,p)}(1,2,4) \right]
= \mathbb{E}\left[ \bar{A}_{1,2}^2 \bar{A}_{1,3} \bar{A}_{2,3} \bar{A}_{1,4} \bar{A}_{2,4} \right]
= \mathbb{E}\left[ \bar{A}_{1,2}^2 \bar{A}_{1,3} \bar{A}_{2,3} \bar{A}_{1,4} \right] \mathbb{E}\left[ \bar{A}_{2,4} \right] = 0.
\]
Moreover, a simple calculation reveals that $\mathbb{E}\left[ \bar{A}_{i,j}^2 \right] = p(1-p)$ for every pair $\{i,j\} \subset [n]$. We conclude that for two triples of vertices with indices $i,j,k$ and $i',j',k'$ we have
\[
\mathbb{E}\left[ \tau_{G(n,p)}(i,j,k)\, \tau_{G(n,p)}(i',j',k') \right] =
\begin{cases}
0 & \text{if } \{i,j,k\} \neq \{i',j',k'\}, \\
p^3 (1-p)^3 & \text{if } \{i,j,k\} = \{i',j',k'\}.
\end{cases}
\]
Thus the variance of $\tau(G(n,p))$ is
\[
\mathrm{Var}\left( \tau(G(n,p)) \right) = \sum_{\{i,j,k\}} \sum_{\{i',j',k'\}} \mathbb{E}\left[ \tau_{G(n,p)}(i,j,k)\, \tau_{G(n,p)}(i',j',k') \right] = \binom{n}{3} p^3 (1-p)^3.
\]
We have therefore established that
\[
\mathbb{E}\, \tau(G(n,p)) = 0 \qquad \text{and} \qquad \mathrm{Var}\left( \tau(G(n,p)) \right) \le n^3. \tag{22}
\]

3.2 A lower bound for $\mathbb{E}\, \tau(G(n,p,d))$

Our main goal here is to prove the following.
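The two facts in (22) are easy to confirm by simulation; the following quick check (ours, assuming numpy; not code from the paper) estimates the mean and variance of $\tau(G(n,p))$ over many draws and compares the latter with the exact value $\binom{n}{3} p^3 (1-p)^3$:

```python
import numpy as np
from math import comb

def tau_er(n, p, rng):
    """Signed triangle count of one draw of G(n, p), via Tr(B^3)/6."""
    mask = np.triu(np.ones((n, n), dtype=bool), 1)  # strict upper triangle
    A = (mask & (rng.random((n, n)) < p)).astype(float)
    A = A + A.T
    B = A - p * (np.ones((n, n)) - np.eye(n))
    return np.trace(B @ B @ B) / 6

rng = np.random.default_rng(0)
n, p, reps = 20, 0.3, 4000
vals = np.array([tau_er(n, p, rng) for _ in range(reps)])
exact_var = comb(n, 3) * (p * (1 - p)) ** 3
# vals.mean() is close to 0 and vals.var() is close to exact_var.
```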
Lemma 3 For every $0 < p < 1$, there exists a constant $C_p > 0$ (depending only on $p$) such that for all $n$ and $d$ we have
\[ \mathbb{E}\,\tau(G(n,p,d)) \ge \binom{n}{3} \frac{C_p}{\sqrt{d}}. \tag{23} \]

Proof It suffices to estimate $\mathbb{E}\,\tau_{G(n,p,d)}(1,2,3)$, since
\[ \mathbb{E}\,\tau(G(n,p,d)) = \binom{n}{3}\, \mathbb{E}\,\tau_{G(n,p,d)}(1,2,3). \]
We have that
\[ \mathbb{E}\,\tau_{G(n,p,d)}(1,2,3) = \mathbb{E}\left[ \overline{A}_{1,2} \overline{A}_{1,3} \overline{A}_{2,3} \right] = \mathbb{E}\left[ (A_{1,2} - p)(A_{1,3} - p)(A_{2,3} - p) \right] = \mathbb{E}\left[ A_{1,2} A_{1,3} A_{2,3} \right] - p \left( \mathbb{E}[A_{1,2}A_{1,3}] + \mathbb{E}[A_{1,2}A_{2,3}] + \mathbb{E}[A_{1,3}A_{2,3}] \right) + p^2 \left( \mathbb{E}\,A_{1,2} + \mathbb{E}\,A_{1,3} + \mathbb{E}\,A_{2,3} \right) - p^3 = \mathbb{E}\left[ A_{1,2} A_{1,3} A_{2,3} \right] - p^3, \]
where the last equality follows from the simple facts that $\mathbb{E}\,A_{i,j} = p$ and $\mathbb{E}[A_{i,j}A_{i,k}] = p^2$ for all triples $\{i,j,k\} \in \binom{[n]}{3}$. The bound (7) from Lemma 1 then gives that $\mathbb{E}\,\tau_{G(n,p,d)}(1,2,3) \ge C_p/\sqrt{d}$ for some $C_p > 0$, which concludes the proof.

3.3 The variance of $\tau(G(n,p,d))$

The estimation of the variance of $\tau(G(n,p,d))$ requires the following technical lemma.

Lemma 4 For every $p \in [0,1]$ we have that
\[ \mathbb{E}\left[ \tau_{G(n,p,d)}(1,2,3)\, \tau_{G(n,p,d)}(1,2,4) \right] \le \pi^2/d. \]

Proof Define
\[ g(x) := \mathbb{E}\left[ \tau_{G(n,p,d)}(1,2,3)\, \tau_{G(n,p,d)}(1,2,4) \,\middle|\, \langle X_1, X_2 \rangle = x \right]. \]
Note first that by rotational invariance, and by the independence of $X_3$ and $X_4$, we have that
\[ g(x) = \mathbb{E}\left[ \tau_{G(n,p,d)}(1,2,3)\, \tau_{G(n,p,d)}(1,2,4) \,\middle|\, X_1 = e_1,\ X_2 = x e_1 + \sqrt{1-x^2}\, e_2 \right]. \]

4 Proof of Theorem 3

Let $c > 0$ and let $p = c/n$. To abbreviate notation, for distinct vertices $i$, $j$, and $k$ of a graph $G$, let $T_G(i,j,k) := A_{i,j} A_{i,k} A_{j,k}$; in other words, $T_G(i,j,k)$ is the indicator that the vertices $i$, $j$, and $k$ form a triangle in $G$. We can then write the number of triangles in a graph $G$ with vertex set $[n]$ as
\[ T(G) = \sum_{\{i,j,k\} \in \binom{[n]}{3}} T_G(i,j,k). \]
First, we have a simple estimate for the expectation of $T(G(n,c/n))$:
\[ \mathbb{E}[T(G(n,c/n))] = \binom{n}{3} \left( \frac{c}{n} \right)^3 \le c^3. \tag{30} \]
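To see Lemma 3 in action one can sample from the model directly. For $p = 1/2$ the threshold satisfies $t_{1/2,d} = 0$ by symmetry, which makes the sampler particularly simple; a sketch (ours, not from the paper, and restricted to $p = 1/2$ to avoid computing $t_{p,d}$):

```python
import numpy as np

def sample_gnpd_half(n, d, rng):
    """Sample the adjacency matrix of G(n, 1/2, d): latent points X_i are
    uniform on S^{d-1} (normalized standard Gaussians), and i ~ j iff
    <X_i, X_j> >= 0 (the threshold t_{1/2,d} is 0 by symmetry)."""
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    A = (X @ X.T >= 0).astype(float)
    np.fill_diagonal(A, 0.0)
    return A
```

Averaging the signed triangle count of such samples over many draws exhibits the positive bias of order $n^3/\sqrt{d}$ promised by (23).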
We now turn to estimating the first two moments of $T(G(n,c/n,d))$. An application of the inequality (6) from Lemma 1 gives that
\[ \mathbb{E}[T(G(n,c/n,d))] \ge \kappa c^3 \frac{\left( \log\frac{n}{c} \right)^{3/2}}{\sqrt{d}} \tag{31} \]
for a universal constant $\kappa > 0$. Note that the right hand side of the inequality (31) goes to infinity when $d/\log^3(n) \to 0$.

In order to establish an upper bound for the variance of $T(G(n,c/n,d))$, first note that $T_{G(n,p,d)}(i,j,k)$ and $T_{G(n,p,d)}(i',j',k')$ are independent whenever $\{i,j,k\}$ and $\{i',j',k'\}$ do not share an edge, i.e., whenever $|\{i,j,k\} \cap \{i',j',k'\}| \le 1$. Furthermore, using the independence of $T_{G(n,p,d)}(1,2,3)$ and $A_{1,4}$, we have that
\[ \mathbb{E}\left[ T_{G(n,p,d)}(1,2,3)\, T_{G(n,p,d)}(1,2,4) \right] \le \mathbb{E}\left[ T_{G(n,p,d)}(1,2,3)\, A_{1,4} \right] = \mathbb{E}\left[ T_{G(n,p,d)}(1,2,3) \right] \times p. \]
Using these facts and expanding $T(G(n,p,d))^2$ as a sum of indicators, we have that
\[ \mathbb{E}\left[ T(G(n,p,d))^2 \right] = \sum_{\{i,j,k\}} \sum_{\{i',j',k'\}} \mathbb{E}\left[ T_{G(n,p,d)}(i,j,k)\, T_{G(n,p,d)}(i',j',k') \right] \le \binom{n}{3}^2 \mathbb{E}\left[ T_{G(n,p,d)}(1,2,3)\, T_{G(n,p,d)}(4,5,6) \right] + \binom{n}{4}\binom{4}{2}\, \mathbb{E}\left[ T_{G(n,p,d)}(1,2,3)\, T_{G(n,p,d)}(1,2,4) \right] + \binom{n}{3} \mathbb{E}\left[ T_{G(n,p,d)}(1,2,3)^2 \right] \le \left( \mathbb{E}[T(G(n,p,d))] \right)^2 + \binom{n}{3} (2np + 1)\, \mathbb{E}\left[ T_{G(n,p,d)}(1,2,3) \right], \]
and so
\[ \mathrm{Var}[T(G(n,c/n,d))] \le (2c+1)\, \mathbb{E}[T(G(n,c/n,d))]. \tag{32} \]
Now using this estimate, together with Chebyshev's inequality, gives that
\[ \mathbb{P}\left( T\left(G\left(n,\tfrac{c}{n},d\right)\right) \le \frac12\, \mathbb{E}\,T\left(G\left(n,\tfrac{c}{n},d\right)\right) \right) \le \frac{4\, \mathrm{Var}\left[ T\left(G\left(n,\frac{c}{n},d\right)\right) \right]}{\left( \mathbb{E}\,T\left(G\left(n,\frac{c}{n},d\right)\right) \right)^2} \le \frac{8c+4}{\mathbb{E}\,T\left(G\left(n,\frac{c}{n},d\right)\right)}. \]
On the other hand, Markov's inequality and the estimate (30) together give that
\[ \mathbb{P}\left( T\left(G\left(n,\tfrac{c}{n}\right)\right) \ge \frac12\, \mathbb{E}\,T\left(G\left(n,\tfrac{c}{n},d\right)\right) \right) \le \frac{2c^3}{\mathbb{E}\,T\left(G\left(n,\frac{c}{n},d\right)\right)}. \]
Finally, note that both of these upper bounds go to 0 when $d/\log^3(n) \to 0$, as can be seen from the estimate (31). This concludes the proof of Theorem 3.
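The test implicit in this proof can be sketched in a few lines. The quantity $\mathbb{E}\,T(G(n,c/n,d))$ has no simple closed form, so it is treated here as a user-supplied parameter; the function names are ours, not the paper's:

```python
import numpy as np

def triangle_count(A):
    # T(G) = trace(A^3)/6 for a symmetric 0/1 adjacency matrix with zero diagonal
    return np.trace(A @ A @ A) / 6.0

def sparse_test(A, expected_geo_triangles):
    """Declare the geometric model if T(G) >= (1/2) E[T(G(n,c/n,d))].
    The proof shows both error probabilities vanish when d = o(log^3 n)."""
    threshold = expected_geo_triangles / 2.0
    return "G(n,p,d)" if triangle_count(A) >= threshold else "G(n,p)"
```

Under the null the numerator of Markov's bound is at most $c^3$ by (30), while under the alternative $T(G)$ concentrates around its mean by (32), which is why the midpoint threshold separates the two models.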
5 Proof of the lower bound

Recall that if $Y_1$ is a standard normal random vector in $\mathbb{R}^d$, then $Y_1/\|Y_1\|$ is uniformly distributed on the sphere $S^{d-1}$. Consequently, we can view $G(n,p,d)$ as a function of an appropriate Wishart matrix, as follows. Let $Y$ be an $n \times d$ matrix where the entries are i.i.d. standard normal random variables, and let $W \equiv W(n,d) = YY^T$ be the corresponding $n \times n$ Wishart matrix. Note that $W_{ii} = \langle Y_i, Y_i \rangle = \|Y_i\|^2$, and so $\langle Y_i/\|Y_i\|,\, Y_j/\|Y_j\| \rangle = W_{ij}/\sqrt{W_{ii}W_{jj}}$. Thus the $n \times n$ matrix $A$ defined as
\[ A_{i,j} = \begin{cases} 1 & \text{if } W_{ij}/\sqrt{W_{ii}W_{jj}} \ge t_{p,d} \text{ and } i \neq j, \\ 0 & \text{otherwise} \end{cases} \]
has the same law as the adjacency matrix of $G(n,p,d)$. Denote the map that takes $W$ to $A$ by $H_{p,d}$, i.e., $A = H_{p,d}(W)$.

In a similar way we can view $G(n,p)$ as a function of an $n \times n$ matrix drawn from the Gaussian Orthogonal Ensemble (GOE). Let $M(n)$ be a symmetric $n \times n$ random matrix where the diagonal entries are i.i.d. normal random variables with mean zero and variance 2, the entries above the diagonal are i.i.d. standard normal random variables, and the entries on and above the diagonal are all independent. Then the $n \times n$ matrix $B$ defined as
\[ B_{i,j} = \begin{cases} 1 & \text{if } M(n)_{i,j} \ge \Phi^{-1}(p) \text{ and } i \neq j, \\ 0 & \text{otherwise} \end{cases} \]
has the same law as the adjacency matrix of $G(n,p)$. Note that $B$ depends only on the off-diagonal elements of $M(n)$, so in the definition of $B$ we can replace $M(n)$ with $M(n,d) := \sqrt{d}\, M(n) + d\, I_n$, where $I_n$ is the $n \times n$ identity matrix, provided we also replace $\Phi^{-1}(p)$ with $\sqrt{d} \times \Phi^{-1}(p)$. Denote the map that takes $M(n,d)$ to $B$ by $K_{p,d}$, i.e., $B = K_{p,d}(M(n,d))$. The maps $H_{p,d}$ and $K_{p,d}$ are different, but very similar, as we shall quantify later.
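The two couplings can be written out explicitly. The sketch below (ours, not the paper's) takes $p = 1/2$, where both thresholds $t_{1/2,d}$ and $\Phi^{-1}(1/2)$ vanish, so no quantile computation is needed:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 5, 50

# Wishart route: W = Y Y^T with Y an n x d standard Gaussian matrix;
# H_{p,d} thresholds the normalized entries W_ij / sqrt(W_ii W_jj).
Y = rng.standard_normal((n, d))
W = Y @ Y.T
norm = np.sqrt(np.outer(np.diag(W), np.diag(W)))
A = (W / norm >= 0).astype(int)   # t_{1/2,d} = 0 at p = 1/2
np.fill_diagonal(A, 0)

# GOE route: M(n) symmetric, N(0,2) on the diagonal, N(0,1) above it;
# K_{p,d} thresholds the off-diagonal entries of M(n,d) = sqrt(d) M + d I.
Gsym = rng.standard_normal((n, n))
M = (Gsym + Gsym.T) / np.sqrt(2)  # off-diagonal variance 1, diagonal variance 2
Mnd = np.sqrt(d) * M + d * np.eye(n)
B = (Mnd >= 0).astype(int)        # threshold sqrt(d) * Phi^{-1}(1/2) = 0
np.fill_diagonal(B, 0)
```

Here `A` is distributed as the adjacency matrix of $G(n, 1/2, d)$ and `B` as that of $G(n, 1/2)$; the theorem below says their laws merge once $d \gg n^3$.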
Using the triangle inequality, we can conclude from the previous two paragraphs that for any $p \in [0,1]$ we have that
\[ \mathrm{TV}(G(n,p,d),\, G(n,p)) = \mathrm{TV}(H_{p,d}(W(n,d)),\, K_{p,d}(M(n,d))) \le \mathrm{TV}(H_{p,d}(W(n,d)),\, H_{p,d}(M(n,d))) + \mathrm{TV}(H_{p,d}(M(n,d)),\, K_{p,d}(M(n,d))) \le \mathrm{TV}(W(n,d),\, M(n,d)) + \mathrm{TV}(H_{p,d}(M(n,d)),\, K_{p,d}(M(n,d))). \]
The second term in the line above is a lower-order error term which we deal with in Appendix A, and thus Theorem 1(c) follows from the following result, which is a more precise restatement of Theorem 4 from the Introduction.

Theorem 7 Define the random matrix ensembles $W(n,d)$ and $M(n,d)$ as above. If $d/n^3 \to \infty$, then
\[ \mathrm{TV}(W(n,d),\, M(n,d)) \to 0. \tag{33} \]

After proving this result, we learned of the work of Jiang and Li [21], who also study the question of when Wishart matrices can be approximated by a GOE random matrix. They prove a result analogous to Theorem 7 for the joint eigenvalue distributions, from which Theorem 7 follows by orthogonal invariance. Nonetheless, we present our proof here, for the following two reasons: (1) to keep the paper self-contained, and (2) because our proof is shorter and simpler than the one presented in [21].

Theorem 7 thus states that as $d/n^3 \to \infty$, all statistics of the random matrix ensembles $W(n,d)$ and $M(n,d)$ have asymptotically the same distribution. We note that in the random matrix literature there has been much work showing that particular statistics (e.g., the empirical spectral distribution, or the largest eigenvalue) of these two ensembles have asymptotically the same distribution, even when $d/n \to c \in [0,\infty]$; see, e.g., [8, 22, 14, 15].

Proof We show (33) by comparing the densities of the two random matrix ensembles. Let $\mathcal{P} \subset \mathbb{R}^{n^2}$ denote the cone of positive semidefinite matrices.
It is well known (see, e.g., [29]) that when $d \ge n$, $W(n,d)$ has the following density with respect to the Lebesgue measure on $\mathcal{P}$:
\[ f_{n,d}(A) := \frac{(\det A)^{\frac12(d-n-1)} \exp\left( -\frac12 \mathrm{Tr}(A) \right)}{2^{\frac12 dn}\, \pi^{\frac14 n(n-1)}\, \prod_{i=1}^n \Gamma\left( \frac12(d+1-i) \right)}, \]
where $\mathrm{Tr}(A)$ denotes the trace of the matrix $A$. It is also known that the density of a GOE random matrix with respect to the Lebesgue measure on $\mathbb{R}^{n^2}$ is
\[ A \mapsto (2\pi)^{-\frac14 n(n+1)}\, 2^{-\frac{n}{2}} \exp\left( -\frac14 \mathrm{Tr}\left( A^2 \right) \right), \]
and so the density of $M(n,d)$ with respect to the Lebesgue measure on $\mathbb{R}^{n^2}$ is
\[ g_{n,d}(A) := \frac{\exp\left( -\frac{1}{4d} \mathrm{Tr}\left( (A - dI_n)^2 \right) \right)}{(2\pi d)^{\frac14 n(n+1)}\, 2^{\frac{n}{2}}}. \]
Denote the measure given by this density by $\mu_{n,d}$, let $\lambda$ denote the Lebesgue measure on $\mathbb{R}^{n^2}$, and write $A \succeq 0$ if $A$ is positive semidefinite. We can then write
\[ \mathrm{TV}(W(n,d),\, M(n,d)) = \frac12 \int_{\mathbb{R}^{n^2}} \left| f_{n,d}(A)\mathbb{1}\{A \succeq 0\} - g_{n,d}(A) \right| d\lambda(A) = \frac12 \int_{\mathbb{R}^{n^2}} \left| \frac{f_{n,d}(A)\mathbb{1}\{A \succeq 0\}}{g_{n,d}(A)} - 1 \right| d\mu_{n,d}(A). \]
In order to show that $\mathrm{TV}(W(n,d), M(n,d)) = o(1)$, it suffices to show that
\[ \frac{f_{n,d}(A)\mathbb{1}\{A \succeq 0\}}{g_{n,d}(A)} = 1 + o(1) \tag{34} \]
as $d/n^3 \to \infty$, with probability $1 - o(1)$ according to the measure $\mu_{n,d}$. This is because
\[ \int_{\mathbb{R}^{n^2}} \left( \frac{f_{n,d}(A)\mathbb{1}\{A \succeq 0\}}{g_{n,d}(A)} - 1 \right) d\mu_{n,d}(A) = 0, \]
and so
\[ \int_{\mathbb{R}^{n^2}} \left( \frac{f_{n,d}(A)\mathbb{1}\{A \succeq 0\}}{g_{n,d}(A)} - 1 \right)_+ d\mu_{n,d}(A) = \int_{\mathbb{R}^{n^2}} \left( \frac{f_{n,d}(A)\mathbb{1}\{A \succeq 0\}}{g_{n,d}(A)} - 1 \right)_- d\mu_{n,d}(A). \tag{35} \]
Since $\left( \frac{f_{n,d}(A)\mathbb{1}\{A \succeq 0\}}{g_{n,d}(A)} - 1 \right)_- \le 1$, if (34) holds, then the right hand side of (35) is $o(1)$, and consequently so is the left hand side of (35), which then shows that $\mathrm{TV}(W(n,d), M(n,d)) = o(1)$.

It is known (see, e.g., [5]) that, with probability $1 - o(1)$, all the eigenvalues of $M(n)$ are in the interval $\left[ -3\sqrt{n},\, 3\sqrt{n} \right]$, and so all the eigenvalues of $M(n,d)$ are in the interval $\left[ d - 3\sqrt{dn},\, d + 3\sqrt{dn} \right]$. Since $d/n^3 \to \infty$, we thus have $\mathbb{P}(M(n,d) \succeq 0) = 1 - o(1)$, and so we may restrict our attention to $\mathcal{P}$.
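As a small consistency check on the Wishart density (ours, not from the paper): for $n = 1$ the formula for $f_{n,d}$ reduces to the $\chi^2_d$ density, since $\det A = \mathrm{Tr}(A) = a$ and the normalization collapses to $2^{d/2}\Gamma(d/2)$, so it must integrate to 1:

```python
import math

def wishart_density_n1(a, d):
    # f_{n,d} with n = 1: a^{(d-2)/2} e^{-a/2} / (2^{d/2} Gamma(d/2)),
    # i.e. the chi-square density with d degrees of freedom
    return a ** ((d - 2) / 2) * math.exp(-a / 2) / (2 ** (d / 2) * math.gamma(d / 2))

d, step = 7, 0.001
# midpoint rule on (0, 200); the chi-square_7 mass beyond 200 is negligible
total = sum(wishart_density_n1(step * (k + 0.5), d) * step for k in range(200_000))
print(total)  # close to 1
```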
Define $\alpha_{n,d}(A) := \log\left( f_{n,d}(A)/g_{n,d}(A) \right)$. It remains then to show that $\alpha_{n,d}(A) = o(1)$ as $d/n^3 \to \infty$, with probability $1 - o(1)$ according to the measure $\mu_{n,d}$. Denote the eigenvalues of an $n \times n$ matrix $A$ by $\lambda_1(A) \le \cdots \le \lambda_n(A)$; when the matrix is obvious from the context, we omit the dependence on $A$. Recall that $\det(A) = \prod_{i=1}^n \lambda_i$ and $\mathrm{Tr}(A) = \sum_{i=1}^n \lambda_i$. We then have
\[ \alpha_{n,d}(A) = \frac12 \sum_{i=1}^n \left\{ (d-n-1)\log\lambda_i - \lambda_i + \frac{1}{2d}(\lambda_i - d)^2 \right\} + \left( \frac{n(n+3)}{4} - \frac{dn}{2} \right) \log 2 + \frac{n}{2}\log\pi + \frac{n(n+1)}{4}\log d - \sum_{i=1}^n \log\Gamma\left( \frac12(d+1-i) \right). \]
By Stirling's formula we know that
\[ \log\Gamma(z) = \left( z - \frac12 \right)\log z - z + \frac12\log(2\pi) + O\left( \frac1z \right) \]
as $z \to \infty$, so
\[ \alpha_{n,d}(A) = \frac12 \sum_{i=1}^n \left\{ (d-n-1)\log\lambda_i - \lambda_i + \frac{1}{2d}(\lambda_i - d)^2 \right\} + \frac{n(n+1)}{4}\log d - \frac12 \sum_{i=1}^n (d-i)\log(d+1-i) + \frac12 \sum_{i=1}^n (d+1-i) + O\left( \frac{n}{d} \right). \]
Now writing
\[ \log(d+1-i) = \log d + \log\left( 1 - \frac{i-1}{d} \right) = \log d - \frac{i-1}{d} + O\left( \frac{i^2}{d^2} \right), \]
we get that
\[ \alpha_{n,d}(A) = \frac12 \sum_{i=1}^n \left\{ (d-n-1)\log\lambda_i - \lambda_i + \frac{1}{2d}(\lambda_i - d)^2 \right\} + \frac12 \left\{ -nd\log d + nd + n(n+1)\log d \right\} + O\left( \frac{n^3}{d} \right). \]
Defining
\[ h(x) := \frac12 \left\{ (d-n-1)\log(x/d) - (x-d) + \frac{1}{2d}(x-d)^2 \right\}, \]
we have that
\[ \alpha_{n,d}(A) = \sum_{i=1}^n h(\lambda_i) + O\left( \frac{n^3}{d} \right). \tag{36} \]
Note that the derivatives of $h$ at $d$ are $h(d) = 0$, $h'(d) = -\frac{n+1}{2d}$, $h''(d) = \frac{n+1}{2d^2}$, and $h^{(3)}(d) = \frac{d-n-1}{d^3}$, and also $h^{(4)}(x) = -\frac{3(d-n-1)}{x^4}$. Approximating $h$ with its third order Taylor polynomial around $d$, we get
\[ h(x) = -\frac{n+1}{2d}(x-d) + \frac{n+1}{4d^2}(x-d)^2 + \frac{d-n-1}{6d^3}(x-d)^3 - \frac{d-n-1}{8\xi^4}(x-d)^4, \]
where $\xi$ is some real number between $x$ and $d$. By (36) we need to show that $\sum_{i=1}^n h(\lambda_i) = o(1)$ with probability tending to 1. Recall that with probability $1 - o(1)$ all eigenvalues of $M(n,d)$ are in the interval $\left[ d - 3\sqrt{dn},\, d + 3\sqrt{dn} \right]$.
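The derivative values of $h$ at $x = d$ quoted above are easy to double-check numerically with finite differences; a small sketch (ours) for concrete $n$ and $d$:

```python
import math

def make_h(n, d):
    # h from the proof: h(x) = ((d-n-1) log(x/d) - (x-d) + (x-d)^2/(2d)) / 2
    return lambda x: 0.5 * ((d - n - 1) * math.log(x / d) - (x - d)
                            + (x - d) ** 2 / (2 * d))

def deriv(f, x, k, eps):
    # k-th derivative of f at x via nested central finite differences
    if k == 0:
        return f(x)
    g = lambda y: (f(y + eps) - f(y - eps)) / (2 * eps)
    return deriv(g, x, k - 1, eps)

n, d = 7, 1000.0
h = make_h(n, d)
print(deriv(h, d, 1, 1e-3), -(n + 1) / (2 * d))      # h'(d)  = -(n+1)/(2d)
print(deriv(h, d, 2, 1e-2), (n + 1) / (2 * d ** 2))  # h''(d) = (n+1)/(2d^2)
```

Each printed pair agrees up to finite-difference error.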
If $\lambda_i \in \left[ d - 3\sqrt{dn},\, d + 3\sqrt{dn} \right]$ and $\xi_i$ is between $\lambda_i$ and $d$ for all $i \in [n]$, then it is immediate that
\[ \left| \sum_{i=1}^n \left\{ \frac{n+1}{4d^2}(\lambda_i - d)^2 - \frac{d-n-1}{8\xi_i^4}(\lambda_i - d)^4 \right\} \right| \le \frac{cn^3}{d} \]
for some constant $c$, and so what remains to show is that
\[ \sum_{i=1}^n \left\{ -\frac{n+1}{2d}(\lambda_i - d) + \frac{d-n-1}{6d^3}(\lambda_i - d)^3 \right\} = o(1) \tag{37} \]
as $d/n^3 \to \infty$, with probability $1 - o(1)$. This follows from known results about the moments of the empirical spectral distribution of Wigner matrices. In particular, [5, Theorem 2.1.31] shows that
\[ \frac{1}{(nd)^{1/2}} \sum_{i=1}^n (\lambda_i - d) \quad \text{and} \quad \frac{1}{(nd)^{3/2}} \sum_{i=1}^n (\lambda_i - d)^3 \]
both converge weakly to a normal distribution with appropriate variance, from which (37) immediately follows.

6 Dimension estimation

In this section we prove Theorem 5. The idea for the proof is very similar to that of Theorem 2, and also uses the statistic $\tau(G)$ counting the "number" of signed triangles, analyzed in Section 3. However, dimension estimation is a slightly more subtle matter, since here it is necessary to have a bound on the difference of the expected number of triangles between consecutive dimensions, rather than just a lower bound on the expected number of triangles in the random geometric graph $G(n,p,d)$, as in Lemma 1. The next lemma gives a bound on this difference; note that this lemma only deals with the case $p = 1/2$. We believe that this result should hold true for any fixed $0 < p < 1$, but the proof seems to be much more involved.

Lemma 5 Let $\{e_i : 1 \le i \le d\}$ denote the standard basis in $\mathbb{R}^d$, and define, for each dimension $d$,
\[ h(d) := \mathbb{P}\left( \langle X_1, e_1 \rangle \ge 0,\ \langle X_2, e_1 \rangle \ge 0,\ \langle X_1, X_2 \rangle \ge 0 \right), \]
where $X_1$ and $X_2$ are independent uniformly distributed random vectors on $S^{d-1}$. Then
\[ h(d-1) - h(d) \ge c\, d^{-3/2} \tag{38} \]
for some universal constant $c > 0$.
Proof Define the event
\[ E := \left\{ \langle X_1, e_1 \rangle \ge 0,\ \langle X_2, e_1 \rangle \ge 0,\ \langle X_1, X_2 \rangle \ge 0 \right\}. \]
We first claim that
\[ \mathbb{P}\left( E \mid \langle X_1, e_1 \rangle = t \right) = \frac{\pi - \arccos t}{2\pi}\, \mathbb{1}\{ t \in [0,1] \} \tag{39} \]
for all $t \in [-1,1]$. To see this, first note that by the rotational invariance of the vectors $X_1$ and $X_2$, and since the event $E$ is invariant under the action of the orthogonal group on the three vectors simultaneously, when conditioning on $\langle X_1, e_1 \rangle = t$ we can assume that $X_1 = t e_1 + \sqrt{1-t^2}\, e_2$. Thus for all $t \in [0,1]$ we have that
\[ \mathbb{P}\left( E \mid \langle X_1, e_1 \rangle = t \right) = \mathbb{P}\left( \langle X_2, e_1 \rangle \ge 0,\ t\langle X_2, e_1 \rangle + \sqrt{1-t^2}\, \langle X_2, e_2 \rangle \ge 0 \right). \tag{40} \]
Since the projection of $X_2$ on the span of $e_1$ and $e_2$ is distributed according to a law which is invariant under rotations of this span, the probability in (40) is proportional to the angle of the wedge defined by
\[ \left\{ x \in \mathrm{span}(e_1, e_2) : \langle x, e_1 \rangle \ge 0,\ t\langle x, e_1 \rangle + \sqrt{1-t^2}\, \langle x, e_2 \rangle \ge 0 \right\}. \]
The angle of this wedge is exactly $\pi - \arccos t$, and thus formula (39) follows.

Let $Z_d$ be a random variable with the same law as that of $\langle X_1, e_1 \rangle$; in other words, $Z_d$ has density $f_d$, see (8) in Section 2. Defining $Y_d = \arccos(Z_d)$, we therefore have that
\[ \mathbb{P}(E) = \mathbb{E}\left[ \left( \frac12 - \frac{Y_d}{2\pi} \right) \mathbb{1}\{ Y_d \in [0, \pi/2] \} \right] = \frac14 - \frac{1}{2\pi}\, \mathbb{E}\left[ Y_d\, \mathbb{1}\{ Y_d \in [0, \pi/2] \} \right]. \]
Using the change of variables $y = \arccos z$ and the formula for $f_d$ in (8), we have that the density of $Y_d$ is given by
\[ g_d(x) = f_d(\cos x)\, \sin x = \frac{\Gamma(d/2)}{\Gamma((d-1)/2)\sqrt{\pi}}\, (\sin x)^{d-2}, \quad x \in [0, \pi]. \]
To abbreviate notation, define $A_d := \frac{\Gamma(d/2)}{\Gamma((d-1)/2)\sqrt{\pi}}$. By integrating we thus get that
\[ h(d) = \mathbb{P}(E) = \frac14 - \frac{A_d}{2\pi} \int_0^{\pi/2} x\, (\sin x)^{d-2}\, dx. \tag{41} \]
Let $\mu_d$ be the probability measure on $[0, \pi/2]$ defined by
\[ \frac{d\mu_d(x)}{dx} = 2 A_d\, (\sin x)^{d-2}. \]
We then have that $\frac{d\mu_d}{d\mu_{d-1}}(x) = \frac{A_d}{A_{d-1}} \sin x$, and so
\[ \int_0^{\pi/2} \sin(x)\, d\mu_{d-1}(x) = \frac{A_{d-1}}{A_d}. \tag{42} \]
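Formula (41) makes $h(d)$ straightforward to evaluate numerically, which gives an empirical check of the claimed bound (38); a sketch (ours), using `math.gamma` and a midpoint rule:

```python
import math

def h_geo(d, m=50_000):
    # h(d) = 1/4 - (A_d / (2 pi)) * int_0^{pi/2} x sin(x)^{d-2} dx, formula (41)
    A_d = math.gamma(d / 2) / (math.gamma((d - 1) / 2) * math.sqrt(math.pi))
    step = (math.pi / 2) / m
    integral = sum(x * math.sin(x) ** (d - 2)
                   for x in (step * (k + 0.5) for k in range(m))) * step
    return 0.25 - A_d / (2 * math.pi) * integral

for d in (10, 20, 40):
    diff = h_geo(d - 1) - h_geo(d)
    print(d, diff, diff * d ** 1.5)  # last column: stays bounded away from 0
```

The differences are positive and the rescaled column $d^{3/2}\,(h(d-1) - h(d))$ stays of constant order, consistent with (38).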
Elementary estimates concerning the $\Gamma$ function give that
\[ \frac{A_d}{A_{d-1}} = \frac{\Gamma\left(\frac{d}{2}\right)\Gamma\left(\frac{d-2}{2}\right)}{\Gamma\left(\frac{d-1}{2}\right)^2} \in \left( 1 + \frac{1}{12d},\ 1 + \frac{2}{d} \right), \quad \text{and} \quad \frac{A_{d-1}}{A_d} \in \left( 1 - \frac{2}{d},\ 1 - \frac{1}{12d} \right). \tag{43} \]
In the following we write $\nu \equiv \mu_{d-1}$ to abbreviate notation. We then have that
\[ h(d-1) - h(d) = \frac{1}{2\pi}\left( A_d \int_0^{\pi/2} x (\sin x)^{d-2}\, dx - A_{d-1} \int_0^{\pi/2} x (\sin x)^{d-3}\, dx \right) = \frac{1}{4\pi}\left( \int x\, d\mu_d(x) - \int x\, d\mu_{d-1}(x) \right) = \frac{1}{4\pi} \int x \left( \frac{A_d}{A_{d-1}} \sin x - 1 \right) d\nu(x) = \frac{1}{4\pi} \frac{A_d}{A_{d-1}} \int x \left( \sin x - \frac{A_{d-1}}{A_d} \right) d\nu(x) = \frac{1}{4\pi} \frac{A_d}{A_{d-1}} \int x \left( \sin x - \int \sin(y)\, d\nu(y) \right) d\nu(x). \tag{44} \]
Since the function $x \mapsto \sin x$ is continuous and monotone on the interval $[0, \pi/2]$, there exists a unique $x_0 \in (0, \pi/2)$ such that
\[ \sin(x_0) = \int_0^{\pi/2} \sin(y)\, d\nu(y). \tag{45} \]
Since the function $x \mapsto \sin x$ is concave on the interval $[0, \pi/2]$, we have that $\sin x - \sin x_0 \le \cos(x_0)(x - x_0)$ for all $x \in [0, \pi/2]$. Thus continuing from (44), and also using that $A_d/A_{d-1} \ge 1$ from (43), we have that
\[ h(d-1) - h(d) = \frac{1}{4\pi} \frac{A_d}{A_{d-1}} \int_0^{\pi/2} (x - x_0)(\sin x - \sin x_0)\, d\nu(x) \ge \frac{1}{4\pi} \cos(x_0) \int_0^{x_0} (x - x_0)^2\, d\nu(x). \tag{46} \]
Using (42), (43), and (45), we have that
\[ x_0 = \arcsin\left( \frac{A_{d-1}}{A_d} \right) \in \left( \arcsin\left( 1 - \frac{2}{d} \right),\ \arcsin\left( 1 - \frac{1}{12d} \right) \right), \]
and so
\[ \cos(x_0) \ge \cos\left( \arcsin\left( 1 - \frac{1}{12d} \right) \right) = \sqrt{1 - \left( 1 - \frac{1}{12d} \right)^2} = \sqrt{\frac{1}{6d} - \frac{1}{144d^2}} \ge \frac{1}{4\sqrt{d}}. \]
Furthermore, since $x \mapsto \arcsin(x)$ is convex on $[0,1]$ with derivative $1/\sqrt{1-x^2}$, we have that for all $x \in \left[ 0,\ \arcsin\left( 1 - \frac{3}{d} \right) \right]$,
\[ (x - x_0)^2 \ge \left( \arcsin\left( 1 - \frac{2}{d} \right) - \arcsin\left( 1 - \frac{3}{d} \right) \right)^2 \ge \left( \frac{1}{d} \times \frac{1}{\sqrt{1 - \left( 1 - \frac{3}{d} \right)^2}} \right)^2 = \frac{1}{6d - 9} \ge \frac{1}{6d}. \]
Plugging the estimates from the previous three displays back into (46), we get that
\[ h(d-1) - h(d) \ge \frac{1}{100 \pi d^{3/2}}\, \nu\left( \left[ 0,\ \arcsin\left( 1 - \frac{3}{d} \right) \right] \right). \tag{47} \]
Let $Z'$ be a random variable with density $f_{d-1}$.
Then by the definition of $\nu$ we have that
\[ \nu\left( \left[ 0,\ \arcsin\left( 1 - \frac{3}{d} \right) \right] \right) = 2\, \mathbb{P}\left( \arccos Z' \in \left[ 0,\ \arcsin\left( 1 - \frac{3}{d} \right) \right] \right) = 2\, \mathbb{P}\left( Z' \ge \cos\left( \arcsin\left( 1 - \frac{3}{d} \right) \right) \right) = 2\, \mathbb{P}\left( Z' \ge \sqrt{1 - \left( 1 - \frac{3}{d} \right)^2} \right) \ge 2\, \mathbb{P}\left( Z' \ge \frac{\sqrt{6}}{\sqrt{d-1}} \right), \]
which, by (13), is bounded from below by a universal constant. Together with (47), this concludes the proof.

Proof of Theorem 5 Recall from the proof of Lemma 3 that
\[ \mathbb{E}\,\tau\left( G\left( n, \tfrac12, d \right) \right) = \binom{n}{3} \left( h(d) - (1/2)^3 \right). \]
Thus by Lemma 5 we have for all $d_1 < d_2$ that
\[ \mathbb{E}\,\tau\left( G\left( n, \tfrac12, d_1 \right) \right) - \mathbb{E}\,\tau\left( G\left( n, \tfrac12, d_2 \right) \right) = \binom{n}{3} (h(d_1) - h(d_2)) \ge \binom{n}{3} (h(d_1) - h(d_1 + 1)) \ge \binom{n}{3} \frac{c}{(d_1 + 1)^{3/2}} \ge \frac{c_1 n^3}{d_1^{3/2}} \]
for a universal constant $c_1 > 0$. By (29) we have a bound on the variance of these statistics:
\[ \max\left\{ \mathrm{Var}\left( \tau\left( G\left( n, \tfrac12, d_1 \right) \right) \right),\ \mathrm{Var}\left( \tau\left( G\left( n, \tfrac12, d_2 \right) \right) \right) \right\} \le n^3 + \frac{3 n^4}{d_1}. \]
Using Chebyshev's inequality thus gives that
\[ \mathbb{P}\left( \tau\left( G\left( n, \tfrac12, d_1 \right) \right) \le \frac12 \left( \mathbb{E}\,\tau\left( G\left( n, \tfrac12, d_1 \right) \right) + \mathbb{E}\,\tau\left( G\left( n, \tfrac12, d_2 \right) \right) \right) \right) \le \frac{12}{c_1^2}\, \frac{n^3 d_1^3 + n^4 d_1^2}{n^6} \le \frac{24}{c_1^2}\, \frac{d_1^2}{n^2}, \]
where the second inequality holds if $d_1 \le n$; if $d_1 > n$, then the upper bound on the probability is vacuously true by choosing $c_1 \le 1$. Similarly, we also have that
\[ \mathbb{P}\left( \tau\left( G\left( n, \tfrac12, d_2 \right) \right) \ge \frac12 \left( \mathbb{E}\,\tau\left( G\left( n, \tfrac12, d_1 \right) \right) + \mathbb{E}\,\tau\left( G\left( n, \tfrac12, d_2 \right) \right) \right) \right) \le \frac{12}{c_1^2}\, \frac{n^3 d_1^3 + n^4 d_1^2}{n^6} \le \frac{24}{c_1^2}\, \frac{d_1^2}{n^2}. \]
Putting the two previous displays together concludes the proof, and shows that we can take $C = 48 \max\left\{ 1/c_1^2,\ 1 \right\}$ for the constant in the statement of the theorem.

Acknowledgements

We thank Noureddine El Karoui, Gábor Lugosi, and Johan Ugander for helpful discussions and useful references, and Benedek Valkó for pointing us to ref. [21]. We also thank Shirshendu Ganguly and three anonymous referees for useful comments that helped improve the manuscript. This work was done while J.D. and R.E. were visiting researchers and M.Z.R. was an intern at the Theory Group of Microsoft Research. They thank the Theory Group for their hospitality. J.D.
acknowledges support from NSF grant DMS 1313596, and M.Z.R. acknowledges support from NSF grant DMS 1106999.

References

[1] Emmanuel Abbe, Afonso Bandeira, and Georgina Hall. Exact Recovery in the Stochastic Block Model. arXiv preprint arXiv:1405.3267, 2014.
[2] Ittai Abraham, Shiri Chechik, David Kempe, and Aleksandrs Slivkins. Low-distortion inference of latent similarities from a multiplex social network. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1853–1883. SIAM, 2013.
[3] Milton Abramowitz and Irene A. Stegun. Handbook of Mathematical Functions. Dover, 1964.
[4] Noga Alon, Michael Krivelevich, and Benny Sudakov. Finding a large hidden clique in a random graph. In Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 594–598. Society for Industrial and Applied Mathematics, 1998.
[5] Greg W. Anderson, Alice Guionnet, and Ofer Zeitouni. An Introduction to Random Matrices. Number 118. Cambridge University Press, 2010.
[6] Ery Arias-Castro, Sébastien Bubeck, and Gábor Lugosi. Detecting Positive Correlations in a Multivariate Sample. Bernoulli, 21(1):209–241, 2015.
[7] Ery Arias-Castro and Nicolas Verzelen. Community detection in dense random networks. The Annals of Statistics, 42(3):940–969, 2014.
[8] Zhidong Bai and Jack W. Silverstein. Spectral Analysis of Large Dimensional Random Matrices. Springer, 2nd edition, 2009.
[9] Quentin Berthet and Philippe Rigollet. Complexity Theoretic Lower Bounds for Sparse Principal Component Detection. In Conference on Learning Theory, pages 1046–1066, 2013.
[10] Peter J. Bickel, Aiyou Chen, and Elizaveta Levina. The method of moments and degree distributions for network models. The Annals of Statistics, 39(5):2280–2301, 2011.
[11] Heinz Breu and David G. Kirkpatrick.
Unit disk graph recognition is NP-hard. Computational Geometry, 9(1):3–24, 1998.
[12] Sébastien Bubeck, Elchanan Mossel, and Miklós Z. Rácz. On the influence of the seed graph in the preferential attachment model. IEEE Transactions on Network Science and Engineering, 2(1):30–39, 2015.
[13] Luc Devroye, András György, Gábor Lugosi, and Frederic Udina. High-dimensional random geometric graphs and their clique number. Electronic Journal of Probability, 16:2481–2508, 2011.
[14] Noureddine El Karoui. On the largest eigenvalue of Wishart matrices with identity covariance when n, p and p/n → ∞. arXiv preprint math/0309355, 2003.
[15] Noureddine El Karoui. Tracy–Widom limit for the largest eigenvalue of a large class of complex sample covariance matrices. The Annals of Probability, pages 663–714, 2007.
[16] Ronen Eldan. An efficiency upper bound for inverse covariance estimation. Israel Journal of Mathematics, 207(1):1–9, 2015.
[17] Paul Erdős and Alfréd Rényi. On the evolution of random graphs. Publ. Math. Inst. Hungar. Acad. Sci., 5:17–61, 1960.
[18] Alan Frieze and Ravi Kannan. A new approach to the planted clique problem. In Ramesh Hariharan, Madhavan Mukund, and V Vinay, editors, IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, volume 2 of Leibniz International Proceedings in Informatics (LIPIcs), pages 187–198, Dagstuhl, Germany, 2008. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.
[19] Rishi Gupta, Tim Roughgarden, and C. Seshadhri. Decompositions of Triangle-Dense Graphs. In Proceedings of the 5th Conference on Innovations in Theoretical Computer Science, pages 471–482. ACM, 2014.
[20] Peter D. Hoff, Adrian E. Raftery, and Mark S. Handcock. Latent Space Approaches to Social Network Analysis.
Journal of the American Statistical Association, 97(460):1090–1098, 2002.
[21] Tiefeng Jiang and Danning Li. Approximation of Rectangular Beta-Laguerre Ensembles and Large Deviations. Journal of Theoretical Probability, 28(3):804–847, 2015.
[22] Iain M. Johnstone. On the distribution of the largest eigenvalue in principal components analysis. Annals of Statistics, 29(2):295–327, 2001.
[23] Elchanan Mossel, Joe Neeman, and Allan Sly. Reconstruction and estimation in the planted partition model. Probability Theory and Related Fields, 162(3):431–461, 2015.
[24] Mathew Penrose. Random Geometric Graphs, volume 5 of Oxford Studies in Probability. Oxford University Press, 2003.
[25] Purnamrita Sarkar, Deepayan Chakrabarti, and Andrew W. Moore. Theoretical Justification of Popular Link Prediction Heuristics. In Conference on Learning Theory, pages 295–307, 2010.
[26] Sasha Sodin. Tail-Sensitive Gaussian Asymptotics for Marginals of Concentrated Measures in High Dimension. In Vitali D. Milman and Gideon Schechtman, editors, Geometric Aspects of Functional Analysis, volume 1910 of Lecture Notes in Mathematics, pages 271–295. Springer Berlin Heidelberg, 2007.
[27] Nicolas Verzelen and Ery Arias-Castro. Community Detection in Sparse Random Networks. arXiv preprint arXiv:1308.2955, 2013.
[28] Duncan J. Watts and Steven H. Strogatz. Collective dynamics of 'small-world' networks. Nature, 393(6684):440–442, 1998.
[29] John Wishart. The Generalised Product Moment Distribution in Samples from a Normal Multivariate Population. Biometrika, 20A(1/2):32–52, 1928.

A The error term in the proof of the lower bound

Recalling notation from Section 5, we prove here the following lemma, which is necessary in order to conclude the proof of Theorem 1(c).

Lemma 6 If $d/n^3 \to \infty$, then
\[ \sup_{p \in [0,1]} \mathrm{TV}\left( H_{p,d}(M(n,d)),\ K_{p,d}(M(n,d)) \right) \to 0. \]
Define $\Psi_d(x) = \int_{x/\sqrt{d}}^1 f_d(y)\, dy$, and note that by the definition of $t_{p,d}$,
\[ p = \int_{t_{p,d}}^1 f_d(x)\, dx = \Psi_d\left( t_{p,d} \sqrt{d} \right). \]
During the proof of Lemma 6 we use the following result of Sodin [26].

Lemma 7 There exist constants $C, C_1, C_2 > 0$, and a sequence $\varepsilon_d \searrow 0$ that satisfies $\varepsilon_d = O(1/d)$, such that the following inequalities hold for all $0 < t < C\sqrt{d}$:
\[ (1 - \varepsilon_d)\, \Phi(t) \exp\left( -C_1 t^4/d \right) \le \Psi_d(t) \le (1 + \varepsilon_d)\, \Phi(t) \exp\left( -C_2 t^4/d \right). \]

We note that in [26, Lemma 1] the fact that $\varepsilon_d$ satisfies $\varepsilon_d = O(1/d)$ is not specified, but this can be read off from the proof, where this error comes from the error in Stirling's formula.

Proof of Lemma 6 To abbreviate notation, we write $X := H_{p,d}(M(n,d))$, $Y := K_{p,d}(M(n,d))$, and also $M \equiv M(n)$. Define
\[ \widehat{D}_{i,j} := \sqrt{\left( 1 + M_{i,i}/\sqrt{d} \right)\left( 1 + M_{j,j}/\sqrt{d} \right)} \]
and recall that for $1 \le i < j \le n$ we have that
\[ X_{i,j} = \mathbb{1}\left\{ \widehat{D}_{i,j}^{-1} M_{i,j} \ge \sqrt{d}\, t_{p,d} \right\}, \tag{48} \]
\[ Y_{i,j} = \mathbb{1}\left\{ M_{i,j} \ge \Phi^{-1}(p) \right\}, \]
and define also
\[ Z_{i,j} = \mathbb{1}\left\{ M_{i,j} \ge \sqrt{d}\, t_{p,d} \right\}. \tag{49} \]
By the triangle inequality we have that
\[ \mathrm{TV}(X, Y) \le \mathrm{TV}(X, Z) + \mathrm{TV}(Y, Z), \]
and we deal with the second term first. By a union bound, we have that
\[ \mathrm{TV}(Y, Z) \le \mathbb{P}\left( \exists\, 1 \le i < j \le n : Y_{i,j} \neq Z_{i,j} \right) \le \binom{n}{2}\, \mathbb{P}\left( Y_{1,2} \neq Z_{1,2} \right), \]
and thus it remains to show that $\mathbb{P}(Y_{1,2} \neq Z_{1,2}) = o\left( n^{-2} \right)$; since $n^3/d \to 0$, it is enough to show that $\mathbb{P}(Y_{1,2} \neq Z_{1,2}) = O\left( d^{-2/3} \right)$. Since $M_{1,2}$ is a standard normal random variable, we have that
\[ \mathbb{P}(Y_{1,2} \neq Z_{1,2}) = \left| \Phi\left( \sqrt{d}\, t_{p,d} \right) - p \right| = \left| \Phi\left( \sqrt{d}\, t_{p,d} \right) - \Psi_d\left( \sqrt{d}\, t_{p,d} \right) \right|. \tag{50} \]
For $p = 1/2$ this expression is zero, and noting that $t_{1-p,d} = -t_{p,d}$ by symmetry, it is enough to bound the expression in (50) for $p \in [0, 1/2)$. For $p \in (0, 1/2)$ fixed, note that Lemma 2 states that there exists a constant $C_p < \infty$ such that $0 \le \sqrt{d}\, t_{p,d} \le C_p$. Together with Lemma 7 applied at $t = \sqrt{d}\, t_{p,d}$, this implies that the expression in (50) is $O(1/d)$.
More generally, by Lemma 2 there exists a universal constant $C > 0$ such that if $p \in (n^{-\alpha}, 1/2)$, then $0 \le \sqrt{d}\, t_{p,d} \le C\sqrt{\alpha \log(n)}$, and if $p \in [0, n^{-\alpha}]$, then $\sqrt{d}\, t_{p,d} \ge C^{-1}\sqrt{\alpha \log(n)}$. When $p \in (n^{-\alpha}, 1/2)$, Lemma 7 applied at $t = \sqrt{d}\, t_{p,d}$ then implies that the expression in (50) is $O\left( \alpha^2 \log^2(n)/d \right)$, which is $O\left( d^{-2/3} \right)$ for constant $\alpha$. When $p \in [0, n^{-\alpha}]$ we then have that
\[ \mathbb{P}(Y_{1,2} \neq Z_{1,2}) \le p + \Phi\left( \sqrt{d}\, t_{p,d} \right) \le n^{-\alpha} + \Phi\left( C^{-1}\sqrt{\alpha \log(n)} \right) \le n^{-\alpha} + \exp\left( -\frac{\alpha \log(n)}{2C^2} \right). \]
By choosing $\alpha := \max\left\{ 3,\ 6C^2 \right\}$, this expression becomes $O\left( n^{-3} \right)$. This concludes the proof that
\[ \sup_{p \in [0,1]} \mathbb{P}(Y_{1,2} \neq Z_{1,2}) = O\left( d^{-2/3} + n^{-3} \right). \]

We now deal with the term $\mathrm{TV}(X, Z)$. For a matrix $A$, let $D'(A)$ denote the diagonal matrix obtained from $A$ by setting the non-diagonal terms to zero, and let $D(A) := A - D'(A)$. With $I \equiv I_n$ denoting the $n \times n$ identity matrix, define the function
\[ f(A) := \left( I_n + \frac{1}{\sqrt{d}} D'(A) \right)^{-1/2} A \left( I_n + \frac{1}{\sqrt{d}} D'(A) \right)^{-1/2}. \]
Note that $Z$ can be obtained from $D(M)$ by thresholding the entries appropriately (see (49)), and $X$ can be obtained from $D(f(M))$ in the same way (see (48)). Consequently, we have that
\[ \sup_{p \in [0,1]} \mathrm{TV}(X, Z) \le \mathrm{TV}\left( D(f(M)),\ D(M) \right), \tag{51} \]
and in the remainder of the proof we show that this latter quantity goes to zero as $n^3/d \to 0$.

Let $\Omega = \mathbb{R}^{(n^2-n)/2}$, which we canonically identify with the space of symmetric $n \times n$ matrices with zeros on the diagonal, and let $\Omega' = \mathbb{R}^n$, which we think of as the space of the diagonal entries of an $n \times n$ matrix. By slight abuse of notation, we allow ourselves to think of $D$ and $D'$ as functions from the space of symmetric $n \times n$ matrices to $\Omega$ and $\Omega'$, respectively. We can thus naturally view the function $D \circ f$ as a mapping from $\Omega \oplus \Omega'$ to $\Omega$.
In order to bound the right hand side of (51), we need an estimate for the density of $D(f(M))$, which we denote by $w(x)$. Note that $w(x)$ is the density of the push-forward under the function $D \circ f$ of the measure whose density is $\gamma(x,y)$, where, with $x$ denoting the off-diagonal and $y$ the diagonal entries of $M$,
\[ \gamma(x,y) := \frac{1}{2^{n/2} (2\pi)^{(n^2+n)/4}} \exp\left( -\frac12 \sum_{1 \le i < j \le n} x_{i,j}^2 - \frac14 \sum_{1 \le i \le n} y_i^2 \right). \]
A standard Gaussian tail bound gives $\mathbb{P}\left( |\Gamma_i| > \sqrt{d}/2 \right) < 2 e^{-d/8}$. Consequently, we have for all $1 \le i \le n$ that
\[ \mathbb{E}\left[ \left( 1 - \frac{(n/4 + 1) \sum_{j \neq i} x_{i,j}^2}{d}\, \Gamma_i^2 \right) \mathbb{1}\left\{ |\Gamma_i| \le \sqrt{d}/2 \right\} \right] \ge 1 - 2 e^{-d/8} - \frac{(n/4 + 1) \sum_{j \neq i} x_{i,j}^2}{d}, \]
and the two last displays give that
\[ \frac{w(x)}{q(x)} \ge 1 - 2n e^{-d/8} - \frac{n/4 + 1}{d} \sum_{1 \le i < j \le n} x_{i,j}^2. \]
Together with a union bound over the complementary events, this yields
\[ \mathrm{TV}\left( D(f(M)),\ D(M) \right) \le 2n e^{-d/8} + \frac{2n^3}{d} + 2n^2 e^{-\frac{d}{2n^2}}, \]
which completes the proof.