Information-theoretic bounds for exact recovery in weighted stochastic block models using the Renyi divergence

Varun Jog (varunjog@wharton.upenn.edu)
Departments of Statistics & CIS, Warren Center for Network and Data Sciences, University of Pennsylvania, Philadelphia, PA 19104

Po-Ling Loh (loh@wharton.upenn.edu)
Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA 19104

September 2015

Abstract

We derive sharp thresholds for exact recovery of communities in a weighted stochastic block model, where observations are collected in the form of a weighted adjacency matrix, and the weight of each edge is generated independently from a distribution determined by the community membership of its endpoints. Our main result, characterizing the precise boundary between success and failure of maximum likelihood estimation when edge weights are drawn from discrete distributions, involves the Renyi divergence of order $\frac{1}{2}$ between the distributions of within-community and between-community edges. When the Renyi divergence is above a certain threshold, meaning the edge distributions are sufficiently separated, maximum likelihood succeeds with probability tending to 1; when the Renyi divergence is below the threshold, maximum likelihood fails with probability bounded away from 0. In the language of graphical channels, the Renyi divergence pinpoints the information-theoretic capacity of discrete graphical channels with binary inputs. Our results generalize previously established thresholds derived specifically for unweighted block models, and support an important natural intuition relating the intrinsic hardness of community estimation to the problem of edge classification.
Along the way, we establish a general relationship between the Renyi divergence and the probability of success of the maximum likelihood estimator for arbitrary edge weight distributions. Finally, we discuss consequences of our bounds for the related problems of censored block models and submatrix localization, which may be seen as special cases of the framework developed in our paper.

1 Introduction

The recent explosion of interest in network data has created a need for new statistical methods for analyzing network datasets and interpreting results [30, 13, 22, 16]. One active area of research with diverse applications in many scientific fields pertains to community detection and estimation, where the information available consists of the presence or absence of edges between nodes in the graph, and the goal is to partition the nodes into disjoint groups based on their relative connectivity [14, 19, 33, 36, 26, 32].

A standard assumption in statistical modeling is that conditioned on the community labels of the nodes in the graph, edges are generated independently according to fixed distributions governing the connectivity of nodes within and between communities in the graph. This is the setting of the stochastic block model (SBM) [21, 39, 38]. In the homogeneous case, edges follow one distribution when both endpoints are in the same community, regardless of the community label; and edges follow a second distribution when the endpoints are in different communities. A variety of interesting statistical results have been derived recently characterizing the regimes under which exact or weak recovery of community labels is possible (e.g., [27, 29, 25, 1, 2, 4, 17, 18, 40]). Exact recovery refers to the case where the communities are partitioned perfectly, and a corresponding estimator is called strongly consistent.
On the other hand, weak recovery refers to the case where the estimated community labels are positively correlated with the true labels.

In the setting of stochastic block models with nearly-equal community sizes and homogeneous connection probabilities, Zhang and Zhou [40] derive minimax rates for statistical estimation in the case of exact recovery. Interestingly, the expression they obtain contains the Renyi divergence of order $\frac{1}{2}$ between two Bernoulli distributions, corresponding to the probability of generation for within-community and between-community edges. Hence, the hardness of recovering the community assignments is somehow captured in the hardness of inferring whether pairs of nodes lie within the same community or in different communities. This result has a very natural intuitive interpretation, since knowing whether each pair of nodes (or even each pair of nodes along the edges of a spanning tree of the graph) lies in the same community would clearly lead to perfect recovery of the community labels. On the other hand, this constitutes a somewhat different perspective from the prevailing viewpoint of the hardness of recovering community labels being innately tied to the success or failure of a hypothesis testing problem determining whether an individual node lies in one community or another [4, 29, 40]. Several other attempts have been made to relate the sharp threshold behavior of community estimation to various quantities in information theory [3, 10, 12, 4], but the precise relationship is still largely unknown.

The vast majority of existing literature on stochastic block models has focused on the case where no other information is available beyond the unweighted adjacency matrix.
In an attempt to better understand the information-theoretic quantities at work in determining the thresholds for exact recovery in stochastic block models, we will widen our consideration to the more general weighted problem. Note that situations naturally arise where network datasets contain information about the strength or type of connectivity between edges, as well [31, 9]. In social networks, information may be available quantifying the strength of a tie, such as the number of interactions between the individuals in a certain time period [35]; in cellular networks, information may be available quantifying the frequency of communication between users [8]; in airline networks, edges may be labeled according to the type of air traffic linking pairs of cities [7]; and in neural networks, edge weights may symbolize the level of neural activity between regions in the brain [34]. Of course, the connectivity data could be condensed into an adjacency matrix consisting of only zeros and ones, but this would result in a loss of valuable information that could be used to recover node communities.

In this paper, we analyze the "weighted" setting of the stochastic block model, where edges are generated from arbitrary distributions that are not restricted to being Bernoulli. Our key question is whether the Renyi divergence of order $\frac{1}{2}$ appearing in the results of Zhang and Zhou [40] continues to persist as a fundamental quantity that determines the hardness of exact recovery in the generalized setting. Surprisingly, our answer is affirmative. First, we show that the Renyi divergence between the within-community and between-community edge distributions may be used directly to control the probability of failure of the maximum likelihood estimator.
Hence, as the Renyi divergence increases, corresponding to edge distributions that are further apart, the probability of failure of maximum likelihood is driven to zero. Next, we focus on a specific regime involving discrete weights (or colors), where the average number of edges of each specific color connected to a node scales according to $\Theta(\log n)$. In this case, we show that the bounds derived earlier involving the Renyi divergence are in fact tight, and exact recovery is impossible when the Renyi divergence between the weighted distributions is below a certain threshold. Our results are also applicable in the more general setting of more than two communities. Finally, we discuss the consequences of our theorems in the context of decoding in discrete graphical channels and submatrix localization with continuous distributions.

The remainder of the paper is organized as follows: In Section 2, we introduce the basic background and mathematical notation used in the paper. In Section 3, we present our main theoretical contributions, beginning with achievability results for the maximum likelihood estimator in a weighted stochastic block model with arbitrarily many communities. We then derive sharp thresholds for exact recovery in the discrete weighted case, and then interpret our results in the framework of graphical channels and submatrix localization. Section 4 contains the main arguments for the proofs of our theorems. We conclude in Section 5 with a discussion of several open questions related to phase transitions in weighted stochastic block models.

2 Background and problem setup

Consider a stochastic block model with $K \ge 2$ communities, each with $n$ nodes. For each node $i$, let $\sigma(i) \in \{1, 2, \dots, K\}$ denote the community assignment of the node.
A weighted stochastic block model consists of a random graph generated on the vertices $\{1, 2, \dots, nK\}$, using the community assignments $\sigma$, as well as a sequence of distributions $p_n^{(k_1,k_2)}$ $(= p_n^{(k_2,k_1)})$, for $1 \le k_1, k_2 \le K$ and $n \ge 1$. The support of the distributions may be continuous or discrete. In the discrete case, we will often use the terms weight, color, and label interchangeably. The weighted random graph is generated as follows: Each edge $(i,j)$ is assigned a random weight $W(i,j) \sim p_n^{(\sigma(i),\sigma(j))}$, independent of the weights of all other edges. Such a stochastic block model is called non-homogeneous, since the distributions of the edge weights depend not only on whether the endpoints of an edge belong to the same community, but also on which communities they belong to.

In this paper, we will consider a homogeneous weighted stochastic block model, which may be described simply as follows: Given sequences of distributions $\{p_n\}$ and $\{q_n\}$, every edge $(i,j)$ is assigned a random weight $W(i,j)$, independently of all other edge weights, such that

$$W(i,j) \sim \begin{cases} p_n & \text{if } \sigma(i) = \sigma(j), \\ q_n & \text{if } \sigma(i) \ne \sigma(j). \end{cases} \tag{1}$$

The traditional (unweighted) stochastic block models constitute a special case of weighted stochastic block models, since we may encode edges with weights 1 or 0, corresponding to the presence or absence of an edge.

Our ultimate goal is to infer the underlying communities based on observing the weight matrix $W$. Several differing notions of inference have been studied in the case of unweighted stochastic block models.
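The generative description in model (1) can be sketched in a few lines. The snippet below is only an illustration under stated assumptions: the function name `sample_weighted_sbm`, the sampler arguments, and the Gaussian example weights are hypothetical choices, not part of the paper.

```python
import random

def sample_weighted_sbm(sigma, sample_p, sample_q, rng):
    """Draw a symmetric weight matrix W following the homogeneous model (1).

    sigma    : list of community labels, one per node
    sample_p : callable rng -> weight, used when sigma[i] == sigma[j]
    sample_q : callable rng -> weight, used when sigma[i] != sigma[j]
    """
    n_nodes = len(sigma)
    W = [[None] * n_nodes for _ in range(n_nodes)]  # diagonal stays None
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            # Each edge weight is drawn independently of all the others.
            draw = sample_p if sigma[i] == sigma[j] else sample_q
            W[i][j] = W[j][i] = draw(rng)
    return W

# Hypothetical example: K = 2 communities of n = 3 nodes each,
# with Gaussian weights N(1, 1) within and N(0, 1) between communities.
rng = random.Random(0)
sigma = [1, 1, 1, 2, 2, 2]
W = sample_weighted_sbm(
    sigma,
    sample_p=lambda r: r.gauss(1.0, 1.0),
    sample_q=lambda r: r.gauss(0.0, 1.0),
    rng=rng,
)
```

Passing the two samplers as callables keeps the sketch agnostic to whether the weights are continuous or discrete; the unweighted block model corresponds to samplers that return 0 or 1.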
In the "sparse regime," where the distributions $p_n$ and $q_n$ scale as

$$p_n(0) = 1 - \frac{a}{n}, \quad p_n(1) = \frac{a}{n}, \quad \text{and} \quad q_n(0) = 1 - \frac{b}{n}, \quad q_n(1) = \frac{b}{n},$$

for constants $a, b \ge 0$, one cannot hope to recover the communities exactly, since the graph is not connected with high probability. The notion of "detection" or "weak recovery" considered in this regime consists of obtaining community assignments that are positively correlated with the true assignment. It has been shown in the case $K = 2$ that if

$$(a - b)^2 < a + b, \tag{2}$$

it is impossible to obtain such an assignment;¹ whereas if $(a - b)^2 > a + b$, obtaining a positively correlated assignment becomes possible [28, 25].

In order to obtain exact recovery, a simple necessary condition is that the graph must be connected, meaning the probability of having an edge must scale according to $\Omega\left(\frac{\log n}{n}\right)$. This regime was considered in Abbe et al. [2], where the probabilities were given by

$$p_n(0) = 1 - \frac{a \log n}{n}, \quad p_n(1) = \frac{a \log n}{n}, \quad \text{and} \quad q_n(0) = 1 - \frac{b \log n}{n}, \quad q_n(1) = \frac{b \log n}{n},$$

for constants $a, b \ge 0$. In this regime, it was shown [2] that exact recovery of communities is possible if $\left|\sqrt{a} - \sqrt{b}\right| > 1$, and impossible if $\left|\sqrt{a} - \sqrt{b}\right| < 1$.

Apart from exact recovery (also known as strong consistency) and weak recovery, a notion of partial recovery (also known as weak consistency) has also been considered [29, 5, 40]. This notion lies between the other two notions of recovery, and only requires the fraction of misclassified nodes to converge in probability to 0 as $n$ becomes large. A very general result for the $K = 2$ case, characterizing when exact and partial recovery are possible for the unweighted homogeneous stochastic block model, is provided in Mossel et al. [29].
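As a quick illustration of the logarithmic regime of Abbe et al. [2] with communities of size $n$, the following sketch evaluates the edge probabilities and the exact recovery condition $|\sqrt{a} - \sqrt{b}| > 1$. The function names and the numerical values of $a$, $b$, and $n$ are hypothetical and chosen only for this example.

```python
import math

def edge_probabilities(a, b, n):
    """Within- and between-community edge probabilities p_n(1), q_n(1)."""
    scale = math.log(n) / n
    return a * scale, b * scale

def exact_recovery_regime(a, b):
    """Classify (a, b) relative to the threshold |sqrt(a) - sqrt(b)| = 1."""
    gap = abs(math.sqrt(a) - math.sqrt(b))
    if gap > 1:
        return "possible"
    if gap < 1:
        return "impossible"
    return "boundary"  # the stated result does not cover equality

# Hypothetical parameter choices.
p1, q1 = edge_probabilities(9.0, 1.0, n=1000)
print(p1, q1)
print(exact_recovery_regime(9.0, 1.0))  # gap = 3 - 1 = 2
print(exact_recovery_regime(4.0, 2.0))  # gap = 2 - sqrt(2), below 1
```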
Zhang and Zhou [40] consider the problem of community detection in a minimax setting with an appropriate loss function, where the parameter space consists of both homogeneous and non-homogeneous stochastic block models, the number of communities may be fixed or growing, and the community sizes need not be exactly equal. In particular, for the case of homogeneous stochastic block models where the community sizes are almost equal and scale as $\frac{n(1+o(1))}{K}$, they show that the loss function decays at the rate of $e^{-(1+o(1)) nI/K}$ whenever $\frac{nI}{K} \to \infty$. Here, $I$ is the Renyi divergence of order $\frac{1}{2}$ between the two Bernoulli distributions corresponding to between-community and within-community edges. Furthermore, they show that exact recovery is possible if and only if the loss function is $o(n^{-1})$, whereas partial recovery is possible if and only if it is $o(1)$. The exact recovery bounds achieved in this way match those of Abbe et al. [2].

Heimlicher et al. [20] also conjectured that similar threshold phenomena should exist in the case of the stochastic block model with discrete weights. In particular, Heimlicher et al. [20] consider the homogeneous case where $K = 2$ and the between-community and within-community connection probabilities scale as $\Theta\left(\frac{1}{n}\right)$. Analogous to expression (2), they conjectured a threshold in terms of the discrete probabilities such that weak recovery is possible above this threshold and impossible below the threshold. The impossibility of reconstruction below the conjectured threshold was established in Lelarge et al. [24], and efficient algorithms that achieve weak recovery were provided for a constant above the threshold.

In this paper, we consider the problem of exact recovery in the homogeneous weighted stochastic block model with $K \ge 2$ communities.
By definition, the estimator that minimizes the probability of erroneous community assignments is the maximum likelihood estimator: If the maximum likelihood estimator fails to recover the communities with a certain probability, then the probability of error of any other estimator is also lower-bounded by the same probability. Thus, to show impossibility of recovery, it is sufficient to show that the maximum likelihood estimator fails with a nonzero probability.

¹ We appropriately modify the conditions to take into account that the community size in our setting is $n$, as opposed to $n/2$.

Finally, note that as in the unweighted case, the maximum likelihood estimator in the weighted case is easy to describe in terms of a min-cut graph partition [24]. Let $\mathcal{L}$ be the class of edge labels, and let $p_n$ and $q_n$ be distributions supported on $\mathcal{L}$ which describe the probabilities of edge labels for within-community and between-community edges. For an edge with label $\ell \in \mathcal{L}$, we assign a weight of $\log\left(\frac{p_n(\ell)}{q_n(\ell)}\right)$. The maximum likelihood estimator then seeks to partition the vertices into disjoint communities in such a way that the sum of weights of between-community edges is minimized.

3 Main results and consequences

In this section, we present our main results concerning achievability and impossibility of exact recovery, along with several applications.

3.1 Renyi divergence and achievability

We begin with a result that controls the probability of success for maximum likelihood estimation under the general homogeneous model (1), when $K = 2$. Our first theorem relates the probability of failure of maximum likelihood to the Renyi divergence between the distributions for within-community and between-community edge weights.

Theorem 3.1 (Proof in Section 4.1).
Consider a stochastic block model with two communities of size $n$, with connection probabilities governed by the model (1). Then the probability that the maximum likelihood estimator fails is bounded as

$$P(F) \le \sum_{k=1}^{n/2} \exp\left( 2k\left(\log\frac{n}{k} + 1\right) - 2k(n-k)I \right), \tag{3}$$

where $I$ is the Renyi divergence of order $\frac{1}{2}$ between the edge weight distributions $p_n(x)$ and $q_n(x)$, given by

$$I = \begin{cases} -2 \log\left( \int_{-\infty}^{\infty} \sqrt{p_n(x) q_n(x)} \, dx \right), & \text{for continuous distributions on } \mathbb{R}, \\ -2 \log \sum_{\ell \ge 0} \sqrt{p_n(\ell) q_n(\ell)}, & \text{for discrete distributions on } \mathbb{N}. \end{cases}$$

Note that the general exponential bound in inequality (3) decreases with $I$, which corresponds to the distributions $p_n$ and $q_n$ becoming more separated. This corroborates the intuition that the failure probability of maximum likelihood $P(F)$ appearing on the left-hand side of inequality (3) should decrease with $I$, since the problem becomes easier to solve as the within-community and between-community distributions become easier to distinguish.

Of course, Theorem 3.1 is particularly informative in regimes where we can show that the right-hand side of inequality (3) tends to 0, implying that the maximum likelihood estimator succeeds with probability tending to 1. To illustrate this point, we have the following corollary:

Corollary 3.1 (Proof in Section 4.2). Suppose the Renyi divergence between $p_n$ and $q_n$ satisfies

$$\liminf_{n \to \infty} \frac{nI}{\log n} > 1.$$

Then the maximum likelihood estimator succeeds with probability converging to 1 as $n \to \infty$.

We will discuss the implications of Corollary 3.1 in various scenarios in the sections below. We also have a version of Theorem 3.1 that is applicable to the case of more than two communities. We state and prove the more general theorem separately, since the argument for $K = 2$ is substantially simpler.

Theorem 3.2 (Proof in Section 4.3).
Consider a stochastic block model with $K$ communities of size $n$, with connection probabilities governed by the model (1). Then the probability that the maximum likelihood estimator fails is bounded as

$$P(F) \le \sum_{m=1}^{\lfloor n/2 \rfloor} \min\left\{ \left(\frac{enK^2}{m}\right)^m, K^{nK} \right\} e^{(-nm + m^2) I} + \sum_{m=\lfloor n/2 \rfloor + 1}^{nK} \min\left\{ \left(\frac{enK^2}{m}\right)^m, K^{nK} \right\} e^{-\frac{2mn}{9} I}, \tag{4}$$

where $I$ is the Renyi divergence of order $\frac{1}{2}$ between the edge weight distributions $p_n(x)$ and $q_n(x)$. In particular, if

$$\liminf_{n \to \infty} \frac{nI}{\log n} > 1, \tag{5}$$

then the maximum likelihood estimator succeeds with probability converging to 1 as $n \to \infty$.

The proof of Theorem 3.2 builds upon the arguments of Zhang and Zhou [40] and extends them to more general distributions.

3.2 Thresholds for weighted stochastic block models

In this section, we derive a threshold phenomenon for exact recovery in the case when $p_n$ and $q_n$ are discrete distributions. Analogous to the scenario considered in [2], we now concentrate on the regime where the probability of having an edge scales as $\Theta\left(\frac{\log n}{n}\right)$. However, in addition to Bernoulli distributions, our framework accommodates distributions on a larger alphabet, denoted by the set $\{0, 1, \dots, L\}$ for $L \ge 1$. Thus, instead of simply observing the presence or absence of an edge, we may also observe the corresponding color or weight of the edge.

We define the distributions $\{p_n, q_n\}$ as follows: For two vectors $a = [a_1, a_2, \dots, a_L]$ and $b = [b_1, b_2, \dots, b_L]$ in $\mathbb{R}_+^L$, define

$$p_n(0) = 1 - \frac{u \log n}{n}, \quad \text{and} \quad p_n(\ell) = \frac{a_\ell \log n}{n}, \quad \forall\, 1 \le \ell \le L, \tag{6}$$

$$q_n(0) = 1 - \frac{v \log n}{n}, \quad \text{and} \quad q_n(\ell) = \frac{b_\ell \log n}{n}, \quad \forall\, 1 \le \ell \le L, \tag{7}$$

where $u = \sum_{\ell=1}^{L} a_\ell$ and $v = \sum_{\ell=1}^{L} b_\ell$. We wish to determine a criterion in terms of $a$ and $b$ that describes when it is possible to exactly determine the communities in this model.
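To make the quantities above concrete, the following sketch builds the discrete distributions in equations (6) and (7), computes the Renyi divergence of order $\frac{1}{2}$ from its definition, and evaluates the right-hand side of inequality (3) for two communities. The function names and the values of $a$, $b$, and $n$ are hypothetical, chosen only to illustrate the computation.

```python
import math

def color_distributions(a, b, n):
    """Return (p_n, q_n) as lists indexed by color 0..L, following (6)-(7)."""
    s = math.log(n) / n
    p = [1 - sum(a) * s] + [al * s for al in a]
    q = [1 - sum(b) * s] + [bl * s for bl in b]
    if min(p) < 0 or min(q) < 0:
        raise ValueError("n is too small for these vectors a and b")
    return p, q

def renyi_half(p, q):
    """Renyi divergence of order 1/2: -2 log sum_l sqrt(p(l) q(l))."""
    return -2 * math.log(sum(math.sqrt(x * y) for x, y in zip(p, q)))

def failure_bound(n, I):
    """Right-hand side of inequality (3) for two communities of size n."""
    total = 0.0
    for k in range(1, n // 2 + 1):
        exponent = 2 * k * (math.log(n / k) + 1) - 2 * k * (n - k) * I
        if exponent > 700:  # exp would overflow; the bound is vacuous here
            return float("inf")
        total += math.exp(exponent)
    return total

# Hypothetical example with L = 2 colors.
a, b, n = [9.0, 4.0], [1.0, 1.0], 10_000
p, q = color_distributions(a, b, n)
I = renyi_half(p, q)
ratio = n * I / math.log(n)  # the quantity appearing in Corollary 3.1
bound = failure_bound(n, I)
print(ratio, bound)
```

In this example the ratio $nI/\log n$ is well above 1, so the bound (3) is extremely small; shrinking the gap between $a$ and $b$ makes the bound vacuous, which the sketch reports as `inf`.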
Our first result is the following theorem guaranteeing the success of the maximum likelihood estimator:

Theorem 3.3 (Proof in Section 4.4). Suppose

$$\sum_{\ell=1}^{L} \left( \sqrt{a_\ell} - \sqrt{b_\ell} \right)^2 > 1. \tag{8}$$

Then the maximum likelihood estimator recovers the communities exactly with probability converging to 1 as $n \to \infty$.

We note that the expression on the left-hand side of inequality (8) is increasing in $L$, agreeing with the intuition that the exact recovery problem becomes easier when more edge colors are available: Given a graph with $L$ edge colors, we may always erase certain colors to obtain a new graph with $L' < L$ colors, and then apply a maximum likelihood estimator to the new graph. The probability of success of this estimator must be no larger than the probability of success of a maximum likelihood estimator applied to the original graph; in particular, if

$$\sum_{\ell=1}^{L'} \left( \sqrt{a_\ell} - \sqrt{b_\ell} \right)^2 > 1, \tag{9}$$

implying that maximum likelihood succeeds with probability converging to 1 on the graph with $L'$ colors, the probability of success of maximum likelihood on the graph with $L$ colors must also converge to 1. Indeed, inequality (9) implies inequality (8), since $L' < L$.

Similarly, we may check that by the Cauchy-Schwarz inequality, the following relation holds:

$$\left( \sqrt{\sum_{\ell=1}^{L} a_\ell} - \sqrt{\sum_{\ell=1}^{L} b_\ell} \right)^2 \le \sum_{\ell=1}^{L} \left( \sqrt{a_\ell} - \sqrt{b_\ell} \right)^2.$$

This captures the fact that if the maximum likelihood estimator succeeds with probability converging to 1 on a graph with $L$ colors when we replace all occurring edges with a single color, then the maximum likelihood estimator on the original graph should also succeed with probability converging to 1.

Remark 3.1. Examining the proof of Theorem 3.3, we may see that it is not necessary for the number of colors $L$ to be finite.
Indeed, as long as we have

$$\sum_{\ell=1}^{\infty} \left( \sqrt{a_\ell} - \sqrt{b_\ell} \right)^2 > 1$$

in the infinite case, we will also have $\liminf_{n \to \infty} \frac{nI}{\log n} > 1$, implying the desired result.

As will be seen in the proof of Theorem 3.3 below, we have the characterization

$$I = \left( \sum_{\ell=1}^{L} \left( \sqrt{a_\ell} - \sqrt{b_\ell} \right)^2 \right) \frac{\log n}{n} + O\left( \frac{\log^2 n}{n^2} \right)$$

of the Renyi divergence. Hence, inequality (8) governs whether $I < \frac{\log n}{n}$ or $I > \frac{\log n}{n}$, for large $n$. As will be illustrated in the computation appearing in the proof of Theorem 3.3, the inequality $I > \frac{\log n}{n}$ implies that the right side of inequality (3) tends to 0 as $n \to \infty$. On the other hand, the next theorem guarantees that if $I < \frac{\log n}{n}$, we have $P(F)$ bounded away from 0. Hence, the success or failure of maximum likelihood occurs with respect to a sharp threshold that is encoded within the Renyi divergence.

In the next theorem, we will make the additional assumption that

$$a_\ell, b_\ell > 0, \quad \forall\, 1 \le \ell \le L, \tag{10}$$

meaning the probabilities of all $L$ colors are nonzero both within and between communities.

Theorem 3.4 (Proof in Section 4.5). Suppose the condition (10) holds. If

$$\sum_{\ell=1}^{L} \left( \sqrt{a_\ell} - \sqrt{b_\ell} \right)^2 < 1,$$

then for any $K \ge 2$ and for sufficiently large $n$, the maximum likelihood estimator fails with probability at least $\frac{1}{3}$.

Viewed from another angle, Theorems 3.3 and 3.4 imply that the quantity $\sum_{\ell=1}^{L} \left( \sqrt{a_\ell} - \sqrt{b_\ell} \right)^2$ determines a sharp threshold for when exact recovery is possible in the $K$-community weighted stochastic block model; when the quantity is larger than 1, the maximum likelihood estimator succeeds with probability converging to 1, whereas when the quantity is smaller than 1, the maximum likelihood estimator fails with probability bounded away from 0.
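The threshold quantity $\sum_{\ell} (\sqrt{a_\ell} - \sqrt{b_\ell})^2$ is simple to evaluate numerically. The sketch below computes it, classifies a parameter pair relative to the threshold of Theorems 3.3 and 3.4, and compares it with the values obtained after erasing a color or merging all colors into one. The function names and the vectors $a$, $b$ are hypothetical examples.

```python
import math

def threshold_statistic(a, b):
    """Sum over colors of (sqrt(a_l) - sqrt(b_l))^2."""
    return sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(a, b))

def regime(a, b):
    """Classify (a, b) relative to the threshold 1.

    The failure statement (Theorem 3.4) additionally assumes that all
    entries of a and b are strictly positive, as in condition (10).
    """
    s = threshold_statistic(a, b)
    if s > 1:
        return "succeeds w.h.p."
    if s < 1:
        return "fails with probability bounded away from 0"
    return "boundary"

# Hypothetical example with L = 2 colors.
a, b = [4.0, 0.25], [1.0, 1.0]
full = threshold_statistic(a, b)                  # (2-1)^2 + (0.5-1)^2
erased = threshold_statistic(a[:1], b[:1])        # keep only color 1
merged = threshold_statistic([sum(a)], [sum(b)])  # single merged color
print(full, erased, merged, regime(a, b))
```

Both the erased and the merged statistics are no larger than the full one, matching the monotonicity in $L$ and the Cauchy-Schwarz comparison discussed after Theorem 3.3.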
Also note that the quantity is a sort of Hellinger distance between $a$ and $b$, although $a$ and $b$ need not be the probability mass functions of discrete distributions, since their components do not necessarily sum to 1.

Remark 3.2. The assumption (10) appears to be an undesirable artifact of the technique used to prove Theorem 3.4, which involves bounding appropriate functions of the likelihood ratio between within-community and between-community distributions. However, it appears that a substantially different approach may be required to handle the case when assumption (10) does not necessarily hold. Furthermore, note that our argument also requires the likelihood ratio to be bounded by some constant $M$. Hence, although our impossibility proof continues to hold when $L$ is infinite, we will need to assume a bound of the form

$$\sup_{\ell \ge 0} \left| \log\left( \frac{p_n(\ell)}{q_n(\ell)} \right) \right| \le M$$

to establish the impossibility result when $L$ is infinite. (Such a bound clearly holds for finite values of $L$.)

We also note that the results of Theorems 3.3 and 3.4 could be generalized further to include a mixture of discrete and continuous distributions. In other words, the distributions of $p_n(x)$ and $q_n(x)$ could follow arbitrary (discrete or continuous) distributions for the nonzero values, as long as

$$p_n(0) = 1 - \frac{u \log n}{n}, \quad \text{and} \quad q_n(0) = 1 - \frac{v \log n}{n}.$$

This reflects the fact that the graph is still fairly sparse, with average degree scaling as $\Theta(\log n)$. However, whenever two nodes are connected by an edge, the distribution of the corresponding edge may follow a more general distribution.

3.3 Censored block models and graphical channels

We now discuss the relationship between our results and the notion of graphical channels introduced by Abbe and Montanari [3].
Recall that a graphical channel takes as input a labeling of vertices on a graph, and each edge is encoded by a deterministic function of the adjacent vertices. The edges are then passed through a channel, and the output is observed.

Abbe et al. [1] analyze a specific instantiation of a discrete graphical channel known as the censored block model. In this case, the node labelings are binary, and edges are encoded using the XOR operation on adjacent vertices. The channel is a discrete memoryless channel with output alphabet $\{\star, 0, 1\}$, and for fixed probabilities $p, q_1, q_2 \in [0, 1]$, the transition matrix of the channel is given by

$$\begin{array}{c|ccc} & \star & 0 & 1 \\ \hline 0 & 1-p & p(1-q_1) & pq_1 \\ 1 & 1-p & p(1-q_2) & pq_2 \end{array}$$

In other words, an edge is replaced by $\star$ with probability $1 - p$, and is otherwise flipped with probability $q_1$ or $1 - q_2$, depending on whether the transmitted edge label is 0 or 1. Clearly, the observed graph may be viewed as a special case of the discrete model described in Section 3.2, with $K = 2$ and $L = 2$, where $\star$ represents an empty edge and the two "colors" are represented by 0 and 1. This leads to the following result, a corollary of Theorems 3.3 and 3.4:

Corollary 3.2. In the censored block model, suppose

$$\liminf_{n \to \infty} \left( \frac{pn}{\log n} \right) \left( \left( \sqrt{1 - q_1} - \sqrt{1 - q_2} \right)^2 + \left( \sqrt{q_1} - \sqrt{q_2} \right)^2 \right) > 1.$$

Then the maximum likelihood estimator succeeds with probability converging to 1 as $n \to \infty$. On the other hand, if

$$\limsup_{n \to \infty} \left( \frac{pn}{\log n} \right) \left( \left( \sqrt{1 - q_1} - \sqrt{1 - q_2} \right)^2 + \left( \sqrt{q_1} - \sqrt{q_2} \right)^2 \right) < 1,$$

then the maximum likelihood estimator fails with probability bounded away from 0.

Sharp thresholds were derived for the censored block model by Abbe et al. [1] and Hajek et al. [18] when $K = 2$ and $q_1 = 1 - q_2 = \epsilon$, in the cases where $\epsilon = \frac{1}{2}$ and $\epsilon \in [0, 1]$, respectively. It is easy to check that their thresholds agree with ours.
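The reduction from the censored block model to the two-color model of Section 3.2 can be checked numerically. The sketch below evaluates the quantity in Corollary 3.2 directly and also through the vectors $a$ and $b$ obtained by matching the channel's transition probabilities to equations (6) and (7). The function names and the parameter values are hypothetical.

```python
import math

def censored_statistic(p, q1, q2, n):
    """Quantity compared against 1 in Corollary 3.2 (at a fixed n)."""
    hellinger_part = (
        (math.sqrt(1 - q1) - math.sqrt(1 - q2)) ** 2
        + (math.sqrt(q1) - math.sqrt(q2)) ** 2
    )
    return p * n / math.log(n) * hellinger_part

def censored_to_colors(p, q1, q2, n):
    """Vectors a, b such that a_l log(n)/n and b_l log(n)/n match the channel.

    Output 0 plays the role of color 1 and output 1 the role of color 2.
    """
    scale = p * n / math.log(n)
    a = [scale * (1 - q1), scale * q1]
    b = [scale * (1 - q2), scale * q2]
    return a, b

# Hypothetical parameters: reveal probability p = 20 log(n)/n,
# with outputs equal to 1 at rates q1 = 0.1 and q2 = 0.9.
n = 10_000
p = 20 * math.log(n) / n
stat = censored_statistic(p, 0.1, 0.9, n)
a, b = censored_to_colors(p, 0.1, 0.9, n)
via_colors = sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(a, b))
print(stat, via_colors)  # both are approximately 16
```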
On the other hand, Corollary 3.2 does not require the graphical channel to flip edge labels with equal probability, and we may slightly relax the scaling requirement $p \asymp \frac{\log n}{n}$ in the statement of our corollary.

Furthermore, the theorems in Section 3.2 clearly hold for more general graphical channels aside from the channel giving rise to the censored block model; we may have more than two labels for each node, corresponding to a larger codebook, and the output alphabet of the channel may be arbitrarily large. Translated into the language of graphical channels, our results from Section 3.2 show the following:

Corollary 3.3. Consider a graphical channel, where node inputs are binary and edges are encoded using an XOR operation. The edges are passed through a discrete memoryless channel that maps each edge to a discrete label $\ell \in \{1, \dots, L\}$, with probability $\frac{a_\ell \log n}{n}$ for edges encoded with 0 and probability $\frac{b_\ell \log n}{n}$ for edges encoded with 1, and erases edges with probabilities $1 - \sum_{\ell=1}^{L} \frac{a_\ell \log n}{n}$ and $1 - \sum_{\ell=1}^{L} \frac{b_\ell \log n}{n}$, respectively. Let $I$ denote the Renyi divergence of order $\frac{1}{2}$ between the two output distributions. If $\liminf_{n \to \infty} \frac{nI}{\log n} > 1$, the maximum likelihood decoder succeeds with probability tending to 1. If $\limsup_{n \to \infty} \frac{nI}{\log n} < 1$, the maximum likelihood decoder fails with probability bounded away from 0.

As noted by Abbe and Sandon [4] in a slightly different setting, the threshold for reliable communication in a graphical channel is governed by a different quantity from the mutual information between the input distribution and the output of the channel, which arises from the analysis of channel capacity in traditional channel coding theory.
This is because the encoding of the graphical channel is already built into the stochastic block model framework, rather than being optimized by the user. It is interesting to observe that Renyi divergence and Hellinger distance are the information-theoretic quantities that determine the "capacity" of graphical channels in the case of equal-sized communities.

3.4 Thresholds for submatrix localization

The stochastic block model framework described in this paper also has natural connections to the submatrix localization problem, in which our more general framework involving arbitrary (discrete or continuous) distributions is useful in deriving thresholds for exact recovery. The goal in submatrix localization is to partition the rows and columns of a random matrix $A \in \mathbb{R}^{n_L \times n_R}$ into disjoint subsets $\{C_1, \dots, C_K\}$ and $\{D_1, \dots, D_K\}$, where $n_L = \sum_{k=1}^{K} |C_k|$ and $n_R = \sum_{k=1}^{K} |D_k|$. For each $1 \le k \le K$, the entries $(i, j) \in C_k \times D_k$ are drawn i.i.d. from a distribution $G$ with mean $\mu_n > 0$, and all other entries in $A$ are drawn from the recentered distribution $G - \mu_n$.

Chen and Xu [11] derive impossibility and achievability results for submatrix localization when $|C_k| = K_L$ and $|D_k| = K_R$; i.e., the row and column subsets have equal size. Furthermore, the distribution $G$ is assumed to be sub-Gaussian with parameter 1. Chen and Xu [11] show that the maximum likelihood estimator succeeds with probability tending to 1 when

$$\mu_n^2 \ge \frac{c_1 \log n}{\min\{K_L, K_R\}}. \tag{11}$$

Furthermore, if $G \sim N(\mu_n, 1)$, the probability that maximum likelihood fails is bounded away from 0 when

$$\mu_n^2 \le \frac{1}{12} \max\left\{ \frac{\log(n_R - K_R)}{K_L}, \frac{\log(n_L - K_L)}{K_R} \right\}.$$
(12) Sp ecializing to the case when K R = K L = n , inequalities ( 11 ) and ( 12 ) imply the existence of a threshold at µ 2 = Θ  log n n  , although th e v alue of the constant h as n ot b een determined precisely . When K R = K L = n , the results in S ection 3.1 ma y b e applied to obtain su ﬃcien t conditions under whic h the maxim u m lik eliho o d estimator su cceeds for the sub matrix lo calization problem with p robabilit y con verging to 1. W e ha ve th e follo wing result, wh ic h follo ws directly from Corol- lary 3.1 and the computation I = µ 2 n 2 in the case when G ∼ N ( µ n , 1): Corollary 3.4. Supp ose K R = K L = n , and let I denote the the R enyi diver genc e of or der 1 2 b etwe en the distributions G and G − µ n . Su pp o se lim inf n →∞ nI log n > 1 . (13) Then the maximum likeliho o d estimato r suc c e e ds with pr ob ability c onver ging to 1. In p art i cular, when G ∼ N ( µ n , 1) , maximum likeliho o d suc c e e ds if lim inf n →∞ nµ 2 n log n > 4 . (14) In particular, note that th e condition ( 14 ) matc hes inequalit y ( 11 ), with a v alue for the sp eciﬁc constan t. F urthermore, the suﬃ cient condition ( 13 ) in Coroll ary 3.4 m ay b e of indep end en t interest in obtaining thresholds for a general v ersion of the sub matrix localization p roblem, where the remaining entries in the martrix are dra wn from a distrib ution G ′ rather than a sh ifted version of G . F or instance, if G ∼ N ( µ n , σ 2 n ) and G ′ ∼ N ( µ ′ n , σ ′ 2 n ), the suﬃcien t condition f or exact reco ve r y in Corollary 3.4 b ecomes lim inf n →∞  ( µ n − µ ′ n ) 2 4 ¯ σ 2 n + log  σ ′ n σ n  − 2 log  σ ′ n ¯ σ n  log n n > 1 , where ¯ σ 2 n := σ 2 n + σ ′ 2 n 2 . Although w e do not y et h a v e tec hn iques for deriving imp ossibilit y results in the general submatrix lo calization setting, w e conjecture that the up p er b oun d s of C orollary 3.4 based on the Ren yi div ergence m ay b e tight here, as well. 
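As a numerical sanity check on the Gaussian expression above, the following sketch compares the closed form for $-2 \log \int \sqrt{pq}$ with a direct trapezoidal integration. The function names, the integration window $[-20, 20]$, and the parameter values are illustrative assumptions, not part of the paper. For unit variances the closed form reduces to $\mu^2/4$, which is what turns condition (13) into the constant 4 in condition (14).

```python
import math

def renyi_half_gaussians(mu1, s1, mu2, s2):
    """Closed form of -2*log(int sqrt(p*q)) for p = N(mu1, s1^2), q = N(mu2, s2^2)."""
    sbar2 = (s1 ** 2 + s2 ** 2) / 2.0  # averaged variance, sigma-bar squared
    return ((mu1 - mu2) ** 2 / (4.0 * sbar2)
            + math.log(s2 / s1)
            - 2.0 * math.log(s2 / math.sqrt(sbar2)))

def normal_pdf(x, mu, s):
    return math.exp(-0.5 * ((x - mu) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def renyi_half_numeric(mu1, s1, mu2, s2, lo=-20.0, hi=20.0, steps=20000):
    """Trapezoidal approximation of -2*log(int sqrt(p*q)); window is an assumption."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.sqrt(normal_pdf(x, mu1, s1) * normal_pdf(x, mu2, s2))
    return -2.0 * math.log(total * h)

# Unit-variance case: the divergence equals mu^2 / 4.
unit_case = renyi_half_gaussians(0.8, 1.0, 0.0, 1.0)
# Unequal variances: closed form versus numerical integration.
general_closed = renyi_half_gaussians(1.0, 1.0, -0.5, 2.0)
general_numeric = renyi_half_numeric(1.0, 1.0, -0.5, 2.0)
```

With $\mu_n^2 = c \log n / n$ and unit variances, the quantity $nI/\log n$ equals $c/4$, so the sufficient condition holds exactly when $c > 4$.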
4 Proofs of theorems

In this section, we outline the proofs of the main theorems. Detailed proofs of the more technical lemmas are contained in the appendix.

4.1 Proof of Theorem 3.1

We first show that the result holds when $p_n$ and $q_n$ are absolutely continuous with respect to each other. We provide a proof for the case when $p_n$ and $q_n$ are continuous distributions; the result for discrete distributions follows by replacing the integrals with summation signs. When $p_n$ and $q_n$ are not absolutely continuous with respect to each other, we establish the theorem for the two cases (continuous and discrete distributions) separately. Define the function
$$d_n(x) = \log\left(\frac{p_n(x)}{q_n(x)}\right).$$
We have the following lemma:

Lemma 4.1. Let the sets of vertices constituting the two communities be denoted by $A$ and $B$. If the maximum likelihood estimator does not coincide with the truth, then there exist $1 \le k \le \frac{n}{2}$ and sets $A_w \subset A$ and $B_w \subset B$ such that $|A_w| = |B_w| = k$, and
$$S(A_w, \bar{A}_w) + S(B_w, \bar{B}_w) \le S(A_w, \bar{B}_w) + S(\bar{A}_w, B_w). \tag{15}$$
Here, $\bar{A}_w = A \setminus A_w$, $\bar{B}_w = B \setminus B_w$, and for disjoint sets of vertices $\hat{A}$ and $\hat{B}$,
$$S(\hat{A}, \hat{B}) := \sum_{i \in \hat{A}, j \in \hat{B}} d_n(w_{ij}).$$

Proof. Consider an assignment, different from the truth, that is at least as likely as the truth. For this assignment, let $A_w$ and $B_w$ be the sets of misclassified nodes. Without loss of generality, we will assume that $k = |A_w| = |B_w| \le n/2$. For disjoint sets of vertices $\hat{A}$ and $\hat{B}$, define
$$p_n(\hat{A}, \hat{B}) = \prod_{i \in \hat{A}, j \in \hat{B}} p_n(w_{ij}),$$
and define $q_n(\hat{A}, \hat{B})$ analogously. Since the new assignment is at least as likely as the truth, we must have
$$p_n(A_w, \bar{A}_w)\, p_n(B_w, \bar{B}_w)\, q_n(A_w, \bar{B}_w)\, q_n(\bar{A}_w, B_w) \le q_n(A_w, \bar{A}_w)\, q_n(B_w, \bar{B}_w)\, p_n(A_w, \bar{B}_w)\, p_n(\bar{A}_w, B_w).$$
Taking logarithms, this immediately implies that
$$S(A_w, \bar{A}_w) + S(B_w, \bar{B}_w) \le S(A_w, \bar{B}_w) + S(\bar{A}_w, B_w),$$
completing the proof.

Let $F$ be the event that the maximum likelihood estimate does not coincide with the truth. For fixed sets $A_w$ and $B_w$ of size $k$, denote
$$P_n^{(k)} = P\left( S(A_w, \bar{A}_w) + S(B_w, \bar{B}_w) \le S(A_w, \bar{B}_w) + S(\bar{A}_w, B_w) \right).$$
By Lemma 4.1 and a union bound, we have
$$P(F) \le \sum_{k=1}^{n/2} \binom{n}{k}^2 P_n^{(k)}. \tag{16}$$
Let $\{X_i\}_{i \ge 1}$ be a sequence of i.i.d. random variables distributed according to $p_n$, and let $\{Y_i\}_{i \ge 1}$ be a sequence of i.i.d. random variables distributed according to $q_n$. For a natural number $N > 0$, define the expression
$$T(N, p_n, q_n, \epsilon) = P\left( \sum_{i=1}^{N} \big( d_n(Y_i) - d_n(X_i) \big) \ge \epsilon \right). \tag{17}$$
Then
$$P_n^{(k)} = P\left( \sum_{i=1}^{2k(n-k)} d_n(Y_i) - \sum_{i=1}^{2k(n-k)} d_n(X_i) \ge 0 \right) = T(2k(n-k), p_n, q_n, 0). \tag{18}$$
Let $Z_i = d_n(Y_i) - d_n(X_i)$. The moment generating function of $Z_i$ is then given by
$$M(t) = E\left[e^{t d_n(Y_i)}\right] E\left[e^{-t d_n(X_i)}\right] = \left( \int_{-\infty}^{\infty} \left(\frac{p_n(x)}{q_n(x)}\right)^{t} q_n(x)\, dx \right) \left( \int_{-\infty}^{\infty} \left(\frac{p_n(x)}{q_n(x)}\right)^{-t} p_n(x)\, dx \right).$$
Let $t^\star$ be the point where $M(t)$ is minimized for $t > 0$. We evaluate $t^\star$ by differentiating $M(t)$ and setting the derivative to 0, as follows:
$$M'(t) = \left( \int_{-\infty}^{\infty} \left(\frac{p_n(x)}{q_n(x)}\right)^{t} \log\frac{p_n(x)}{q_n(x)}\, q_n(x)\, dx \right) \left( \int_{-\infty}^{\infty} \left(\frac{p_n(x)}{q_n(x)}\right)^{-t} p_n(x)\, dx \right) + \left( \int_{-\infty}^{\infty} \left(\frac{p_n(x)}{q_n(x)}\right)^{t} q_n(x)\, dx \right) \left( \int_{-\infty}^{\infty} \left(\frac{p_n(x)}{q_n(x)}\right)^{-t} \log\frac{q_n(x)}{p_n(x)}\, p_n(x)\, dx \right).$$
Note that if we substitute $t = 1/2$ in the above expression, we obtain
$$M'(1/2) = \left( \int_{-\infty}^{\infty} \sqrt{p_n(x) q_n(x)} \log\frac{p_n(x)}{q_n(x)}\, dx \right) \left( \int_{-\infty}^{\infty} \sqrt{p_n(x) q_n(x)}\, dx \right) + \left( \int_{-\infty}^{\infty} \sqrt{p_n(x) q_n(x)}\, dx \right) \left( \int_{-\infty}^{\infty} \sqrt{p_n(x) q_n(x)} \log\frac{q_n(x)}{p_n(x)}\, dx \right) = 0.$$
Since $M(t)$ is a convex function, we conclude that $t^\star = 1/2$.
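The minimization of $M(t)$ can be checked numerically in the discrete case, where the integrals become sums and $M(t) = \big(\sum_\ell p(\ell)^t q(\ell)^{1-t}\big)\big(\sum_\ell p(\ell)^{1-t} q(\ell)^t\big)$. The sketch below is only an illustration: the two distributions and the grid of $t$ values are arbitrary assumptions, chosen to share a common support.

```python
import math

def mgf(t, p, q):
    """M(t) = E[exp(t d(Y))] * E[exp(-t d(X))] with d = log(p/q), X ~ p, Y ~ q."""
    e_y = sum(pi ** t * qi ** (1.0 - t) for pi, qi in zip(p, q))  # E exp(t d(Y))
    e_x = sum(pi ** (1.0 - t) * qi ** t for pi, qi in zip(p, q))  # E exp(-t d(X))
    return e_y * e_x

# Hypothetical edge-weight distributions on three labels.
p = [0.6, 0.3, 0.1]
q = [0.2, 0.5, 0.3]

# Search a grid in (0, 1) for the minimizer of M(t).
grid = [k / 100 for k in range(1, 100)]
t_star = min(grid, key=lambda t: mgf(t, p, q))

# At t = 1/2, M collapses to the squared sum of sqrt(p*q).
bhattacharyya = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
```

The symmetry $M(t) = M(1 - t)$ visible in the formula, together with convexity, is what pins the minimizer at $t = 1/2$ regardless of the particular distributions.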
Substituting, we then obtain
$$M(t^\star) = \left( \int_{-\infty}^{\infty} \sqrt{p_n(x) q_n(x)}\, dx \right)^2.$$
In particular,
$$I = -\log M(t^\star) = -2 \log\left( \int_{-\infty}^{\infty} \sqrt{p_n(x) q_n(x)}\, dx \right)$$
is the Renyi divergence defined in the statement of the theorem. By a Chernoff bound on the sum $\sum_{i=1}^{2k(n-k)} Z_i$, we have
$$P_n^{(k)} \le \left( \inf_{t > 0} M(t) \right)^{2k(n-k)} = \left( \int_{-\infty}^{\infty} \sqrt{p_n(x) q_n(x)}\, dx \right)^{4k(n-k)} = \exp(-2k(n-k) I).$$
Using $\binom{n}{k} \le \left(\frac{ne}{k}\right)^k$ and substituting into inequality (16), we arrive at the bound
$$P(F) \le \sum_{k=1}^{n/2} \left(\frac{ne}{k}\right)^{2k} \exp(-2k(n-k) I) = \sum_{k=1}^{n/2} \exp\left( 2k\left(\log\frac{n}{k} + 1\right) - 2k(n-k) I \right), \tag{19}$$
which is exactly inequality (3). As noted earlier, the proof for mutually absolutely continuous discrete distributions follows exactly the same steps as above, and we will not repeat them here. We now turn to the case where $p_n$ and $q_n$ are not necessarily absolutely continuous with respect to each other.

Case 1: $p_n$ and $q_n$ are continuous distributions. Our strategy is to deliberately create a noisy version of the edges by adding a small Gaussian random variable to the existing edge weights, and then apply the maximum likelihood estimator to the new noisy graph. Naturally, this new estimator is worse than directly using a maximum likelihood estimator for the original distributions; however, the benefit of adding noise is that it makes the new distributions absolutely continuous with respect to each other. For some $\nu > 0$, we write $\hat{p}_n = p_n \star N(0, \nu^2)$ and $\hat{q}_n = q_n \star N(0, \nu^2)$, where $\star$ represents convolution. Let the Renyi divergence between $\hat{p}_n$ and $\hat{q}_n$ be denoted by $I_\nu$. Using the argument for absolutely continuous distributions, we conclude that
$$P(F) \le \sum_{k=1}^{n/2} \exp\left( 2k\left(\log\frac{n}{k} + 1\right) - 2k(n-k) I_\nu \right). \tag{20}$$
We claim that $\lim_{\nu \to 0} I_\nu = I$, which implies the desired result.
From van Erven and Harremoës [37], the Renyi divergence is uniformly continuous in $(P, Q)$, with respect to the total variation topology. Hence, it suffices to show that
$$\lim_{\nu \to 0} \|\hat{p}_n - p_n\|_1 = 0, \quad \text{and} \quad \lim_{\nu \to 0} \|\hat{q}_n - q_n\|_1 = 0. \tag{21}$$
The proof of the above fact is standard and may be found in Theorem 6.20 of Knapp [23] or the lecture notes [6].

Case 2: $p_n$ and $q_n$ are discrete distributions. Similar to the case of continuous distributions, we deliberately create a noisy graph and use the maximum likelihood estimator on this new graph. We fix an $\epsilon > 0$ and assume, without loss of generality, that $p_n(0), q_n(0) > 0$. We first replace every edge with weight 0 in the original graph by an edge with weight $i$, with probability $\frac{\epsilon}{2^i}$, for all $i \ge 1$. Thus, the new edge weight distributions are given by $\hat{p}_n$ and $\hat{q}_n$, where
$$\hat{p}_n(0) = p_n(0)(1 - \epsilon), \quad \text{and} \quad \hat{p}_n(\ell) = p_n(\ell) + p_n(0) \frac{\epsilon}{2^\ell}, \quad \text{for } \ell \ge 1,$$
and
$$\hat{q}_n(0) = q_n(0)(1 - \epsilon), \quad \text{and} \quad \hat{q}_n(\ell) = q_n(\ell) + q_n(0) \frac{\epsilon}{2^\ell}, \quad \text{for } \ell \ge 1.$$
Since $\hat{p}_n$ and $\hat{q}_n$ are absolutely continuous with respect to each other, we have
$$P(F) \le \sum_{k=1}^{n/2} \exp\left( 2k\left(\log\frac{n}{k} + 1\right) - 2k(n-k) I_\epsilon \right), \tag{22}$$
where $I_\epsilon$ is the Renyi divergence between $\hat{p}_n$ and $\hat{q}_n$. It is easy to see that as $\epsilon \to 0$, we have
$$\|\hat{p}_n - p_n\|_1 \to 0 \quad \text{and} \quad \|\hat{q}_n - q_n\|_1 \to 0.$$
Again using the continuity of the Renyi divergence from van Erven and Harremoës [37], we conclude that $\lim_{\epsilon \to 0} I_\epsilon = I$, which concludes the proof.

4.2 Proof of Corollary 3.1

Note that for sufficiently large $n$, we have $I \ge (1 + \epsilon) \frac{\log n}{n}$, for some $\epsilon > 0$.
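Substituting into the bound (3) of Theorem 3.1, we therefore have
$$\begin{aligned}
P(F) &\le \sum_{k=1}^{n/2} \exp\left( 2k\left(\log\frac{n}{k} + 1\right) - 2k(n-k)(1+\epsilon)\frac{\log n}{n} \right) \\
&= \sum_{k=1}^{n/2} \exp\Big( 2k\big(\log n - \log k + 1 - (1 - k/n)(1+\epsilon)\log n\big) \Big) \\
&= \sum_{k=1}^{n/2} \exp\Big( 2k\big({-\log k} + 1 - (\epsilon - k/n - k\epsilon/n)\log n\big) \Big) \\
&= \sum_{k=1}^{n/2} n^{-2k\epsilon} \exp\left( -2k\left(\log k - 1 - \frac{k \log n}{n} - \frac{k \epsilon \log n}{n}\right) \right).
\end{aligned}$$
We break up the summation into two parts:
$$\begin{aligned}
\sum_{k=1}^{n/2} n^{-2k\epsilon} \exp\left( -2k\left(\log k - 1 - \frac{k \log n}{n} - \frac{k \epsilon \log n}{n}\right) \right)
&= \sum_{k=1}^{2} n^{-2k\epsilon} \exp\left( -2k\left(\log k - 1 - \frac{k \log n}{n} - \frac{k \epsilon \log n}{n}\right) \right) \\
&\quad + \sum_{k=3}^{n/2} n^{-2k\epsilon} \exp\left( -2k\left(\log k - 1 - \frac{k \log n}{n} - \frac{k \epsilon \log n}{n}\right) \right).
\end{aligned} \tag{23}$$
For $3 \le k \le \frac{n}{2}$, we have
$$\log k - \frac{k \log n}{n} - \frac{k \epsilon \log n}{n} = \log k - \frac{k(1+\epsilon)\log n}{n} \ge \frac{\log k}{3}. \tag{24}$$
This is because the function $\frac{\log x}{x}$ is decreasing for $x \ge 3$, so we only need to verify that $\frac{2 \log k}{3k} \ge \frac{(1+\epsilon)\log n}{n}$ holds at $k = n/2$. This is equivalent to checking that
$$\frac{4}{3n} \log\left(\frac{n}{2}\right) = \frac{4}{3n} \log n - \frac{4 \log 2}{3n} \ge \frac{(1+\epsilon)\log n}{n},$$
which indeed holds for sufficiently large $n$. Substituting the bound (24) into inequality (23), we then obtain
$$\begin{aligned}
\sum_{k=1}^{n/2} n^{-2k\epsilon} \exp\left( -2k\left(\log k - 1 - \frac{k \log n}{n} - \frac{k \epsilon \log n}{n}\right) \right)
&\le \sum_{k=1}^{2} n^{-2k\epsilon} \exp\left( -2k\left(\log k - 1 - \frac{k \log n}{n} - \frac{k \epsilon \log n}{n}\right) \right) \\
&\quad + \sum_{k=3}^{n/2} n^{-2k\epsilon} \exp\left( -2k\left(-1 + \frac{\log k}{3}\right) \right).
\end{aligned} \tag{25}$$
The first term in inequality (25) may be bounded as follows:
$$\sum_{k=1}^{2} n^{-2k\epsilon} \exp\left( -2k\left(\log k - 1 - \frac{k \log n}{n} - \frac{k \epsilon \log n}{n}\right) \right) = n^{-2\epsilon} \exp\left( 2 + \frac{2(1+\epsilon)\log n}{n} \right) + n^{-4\epsilon} \exp\left( -4\left(\log 2 - 1 - \frac{2 \log n}{n} - \frac{2 \epsilon \log n}{n}\right) \right) < C n^{-2\epsilon},$$
for a suitable constant $C$.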
For the second term in inequality (25), note that
$$\begin{aligned}
\sum_{k=3}^{n/2} n^{-2k\epsilon} \exp\left( -2k\left(-1 + \frac{\log k}{3}\right) \right)
&\le n^{-6\epsilon} \sum_{k=3}^{n/2} \exp\left( \frac{2k \log k}{3}\left(-1 + \frac{3}{\log k}\right) \right) \\
&\le n^{-6\epsilon} \sum_{k=3}^{e^6} \exp\left( \frac{2k \log k}{3}\left(-1 + \frac{3}{\log k}\right) \right) + n^{-6\epsilon} \sum_{k=e^6+1}^{\infty} \exp\left( \frac{2k \log k}{3}\left(-1 + \frac{3}{\log k}\right) \right) \\
&\le C_1 n^{-6\epsilon} + n^{-6\epsilon} \sum_{k=e^6+1}^{\infty} \exp\left( -\frac{k \log k}{3} \right) \\
&= O(n^{-6\epsilon}).
\end{aligned}$$
Thus, we conclude that
$$P(F) \le C_2 n^{-2\epsilon}, \tag{26}$$
for a suitable constant $C_2$, implying that $P(F) \to 0$ as $n \to \infty$. This concludes the proof.

4.3 Proof of Theorem 3.2

We will follow the arguments used in the proof of Theorem 3.2 of Zhang and Zhou [40]. We label the nodes $\{1, 2, \dots, nK\}$. Without loss of generality, suppose community $i$ comprises the nodes $\{(i-1)n + 1, \dots, in\}$, and denote the corresponding assignment mapping nodes to communities by $\sigma_0$. Let $A_{nK \times nK}$ be the adjacency matrix for the graph, where $A_{i,j} \in \{0, \dots, L\}$ is the color of edge $(i, j)$. Just as in the $K = 2$ case, the maximum likelihood estimator for $K > 2$ communities seeks the partition that minimizes the weight of cross-community edges (equivalently, maximizes the weight of within-community edges), where the weight of an edge with color $\ell \in \{0, 1, \dots, L\}$ is given by
$$w_\ell = \log\left(\frac{p_n(\ell)}{q_n(\ell)}\right).$$
In other words, the maximum likelihood estimator $\hat{\sigma}$ satisfies
$$\hat{\sigma} = \operatorname*{argmax}_{\sigma} \sum_{i,j} w_\ell \cdot 1\{A_{i,j} = \ell\}\, 1\{\sigma(i) = \sigma(j)\} := \operatorname*{argmax}_{\sigma} T(\sigma).$$
Note that the value of $T(\sigma)$ remains the same for permutations of $\sigma$. To be precise, let $\Delta$ be the set of all permutations from $\{1, \dots, K\}$ to $\{1, \dots, K\}$. For an assignment $\sigma$, denote
$$\Gamma(\sigma) = \{\sigma' : \exists\, \delta \in \Delta \text{ s.t. } \sigma' = \delta \circ \sigma\}.$$
We may check that for all $\sigma' \in \Gamma(\sigma)$, we have $T(\sigma') = T(\sigma)$. Thus, the maximum likelihood estimator finds the best equivalence class $\Gamma$ such that any $\sigma \in \Gamma$ achieves the maximum value of $T$.
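The invariance of $T$ under relabeling can be seen on a toy instance. The sketch below is illustrative only and rests on several assumptions: each unordered pair is counted once, the distributions `p` and `q` are hypothetical, and edge colors are drawn uniformly rather than from the block model, since the invariance holds for any color matrix.

```python
import itertools
import math
import random

def score(sigma, A, w):
    """T(sigma): total weight of pairs placed in the same community (each pair once)."""
    n = len(sigma)
    return sum(w[A[i][j]]
               for i in range(n) for j in range(i + 1, n)
               if sigma[i] == sigma[j])

random.seed(0)
K, n_nodes, L = 3, 9, 2            # hypothetical sizes: 3 communities, colors {0, 1, 2}
p = [0.5, 0.3, 0.2]                # hypothetical within-community color distribution
q = [0.7, 0.2, 0.1]                # hypothetical between-community color distribution
w = [math.log(pl / ql) for pl, ql in zip(p, q)]

# Symmetric color matrix with arbitrary entries.
A = [[0] * n_nodes for _ in range(n_nodes)]
for i in range(n_nodes):
    for j in range(i + 1, n_nodes):
        A[i][j] = A[j][i] = random.randrange(L + 1)

sigma = [i % K for i in range(n_nodes)]
base = score(sigma, A, w)

# Apply every permutation delta of the community labels to sigma.
relabeled_scores = [score([delta[s] for s in sigma], A, w)
                    for delta in itertools.permutations(range(K))]
```

All $K!$ relabelings induce the same partition of the nodes, so the indicator $1\{\sigma(i) = \sigma(j)\}$, and hence the score, is unchanged.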
From the equivalence class $\Gamma$, we pick an assignment $\sigma$ that is closest to $\sigma_0$ in terms of the Hamming distance. Let us denote this assignment by $\sigma(\Gamma)$; i.e.,
$$\sigma(\Gamma) \in \operatorname*{argmin}_{\sigma \in \Gamma} d_H(\sigma, \sigma_0),$$
where $d_H(\sigma, \sigma_0)$ denotes the Hamming distance between $\sigma$ and $\sigma_0$. We now define
$$P_m := P\{\exists\, \Gamma : d_H(\sigma(\Gamma), \sigma_0) = m \text{ and } T(\sigma(\Gamma)) \ge T(\sigma_0)\}.$$
Let $P(F)$ be the probability that the maximum likelihood estimator fails. Clearly,
$$P(F) \le \sum_{m=1}^{nK} P_m. \tag{27}$$
Furthermore, we have the inequality
$$P_m \le |\{\Gamma : d_H(\sigma(\Gamma), \sigma_0) = m\}| \cdot \max_{\{\sigma : d_H(\sigma, \sigma_0) = m\}} P(T(\sigma) \ge T(\sigma_0)).$$
We will bound each of the terms in the above product separately. For the first term, we use the following lemma:

Lemma 4.2 (Proposition 5.2 of Zhang and Zhou [40]). The cardinality of the equivalence classes $\Gamma$ such that $d_H(\sigma(\Gamma), \sigma_0) = m$ is bounded as follows:
$$|\{\Gamma : d_H(\sigma(\Gamma), \sigma_0) = m\}| \le \min\left\{ \left(\frac{enK^2}{m}\right)^m, K^{nK} \right\}.$$

Suppose there exists an assignment $\sigma$ such that $d_H(\sigma, \sigma_0) = m$ and $T(\sigma) \ge T(\sigma_0)$. This is equivalent to
$$\sum_{i,j} w_\ell \cdot 1\{A_{i,j} = \ell\}\, 1\{\sigma(i) = \sigma(j)\} \ge \sum_{i,j} w_\ell \cdot 1\{A_{i,j} = \ell\}\, 1\{\sigma_0(i) = \sigma_0(j)\},$$
or
$$\sum_{i,j :\, \sigma(i) = \sigma(j),\, \sigma_0(i) \ne \sigma_0(j)} w_\ell \cdot 1\{A_{i,j} = \ell\} \ge \sum_{i,j :\, \sigma(i) \ne \sigma(j),\, \sigma_0(i) = \sigma_0(j)} w_\ell \cdot 1\{A_{i,j} = \ell\}.$$
Denoting
$$\gamma = \big|\{(i, j) : \sigma(i) = \sigma(j),\ \sigma_0(i) \ne \sigma_0(j)\}\big|, \quad \text{and} \quad \alpha = \big|\{(i, j) : \sigma(i) \ne \sigma(j),\ \sigma_0(i) = \sigma_0(j)\}\big|,$$
we then have the bound
$$\begin{aligned}
P(T(\sigma) \ge T(\sigma_0)) &= P\left( \sum_{i=1}^{\gamma} d_n(Y_i) - \sum_{i=1}^{\alpha} d_n(X_i) \ge 0 \right) \\
&\le \inf_{t > 0} \left( E e^{t d_n(Y_1)} \right)^{\gamma} \left( E e^{-t d_n(X_1)} \right)^{\alpha} \\
&= \inf_{t > 0} \left( \sum_{\ell=0}^{L} \left(\frac{p_n(\ell)}{q_n(\ell)}\right)^{t} q_n(\ell) \right)^{\gamma} \cdot \left( \sum_{\ell=0}^{L} \left(\frac{p_n(\ell)}{q_n(\ell)}\right)^{-t} p_n(\ell) \right)^{\alpha},
\end{aligned}$$
where as before, $d_n(\ell) = \log\left\{\frac{p_n(\ell)}{q_n(\ell)}\right\}$, and $X_i \sim p_n$ and $Y_i \sim q_n$, and we have used a Chernoff bound in the above inequality.
Taking $t = 1/2$, each of the two sums equals $\sum_{\ell=0}^{L} \sqrt{p_n(\ell) q_n(\ell)} = e^{-I/2}$, and we arrive at
$$P(T(\sigma) \ge T(\sigma_0)) \le e^{-\frac{\gamma + \alpha}{2} I} \le e^{-\min(\gamma, \alpha) I}, \tag{28}$$
where $I$ denotes the Renyi divergence of order $\frac{1}{2}$ between $p_n$ and $q_n$. We then use the following lemma:

Lemma 4.3 (Lemma 5.3 of Zhang and Zhou [40]). For $0 < m < nK$, the minimum of $\alpha$ and $\gamma$ is bounded from below as follows:
$$\min(\alpha, \gamma) \ge \begin{cases} nm - m^2, & \text{if } m \le \frac{n}{2}, \\ \frac{2nm}{9}, & \text{if } m > \frac{n}{2}. \end{cases}$$

Substituting the bound from Lemma 4.3 into inequality (28), we obtain the upper bound
$$P(T(\sigma) \ge T(\sigma_0)) \le \begin{cases} e^{(-nm + m^2) I}, & \text{if } m \le \frac{n}{2}, \\ e^{-\frac{2mn}{9} I}, & \text{if } m > \frac{n}{2}. \end{cases} \tag{29}$$
Finally, substituting the results of Lemma 4.2 and inequality (29) into inequality (27), we arrive at the bound (4). Note that we have
$$\exp\left((-nm + m^2) I\right) < \exp\left(-\frac{2mn}{9} I\right), \quad \text{for } m < \frac{7n}{9}.$$
In particular, the bound (4) may be relaxed to obtain a bound of the form
$$P(F) \le \sum_{m=1}^{m'} \min\left\{ \left(\frac{enK^2}{m}\right)^m, K^{nK} \right\} e^{(-nm + m^2) I} + \sum_{m=m'+1}^{nK} \min\left\{ \left(\frac{enK^2}{m}\right)^m, K^{nK} \right\} e^{-\frac{2mn}{9} I}, \tag{30}$$
for any $m' \le \lfloor \frac{n}{2} \rfloor$.

We now verify the sufficiency of inequality (5). Suppose that for some $\epsilon > 0$ and for all sufficiently large $n$, we have
$$\frac{nI}{\log n} > 1 + \epsilon.$$
In particular, for $m' = \lfloor \frac{\epsilon n}{2} \rfloor$ and $m \in \{1, \dots, m'\}$, we have the bound
$$(n - m') I \ge n\left(1 - \frac{\epsilon}{2}\right) I \ge n\left(1 - \frac{\epsilon}{2}\right)(1 + \epsilon)\frac{\log n}{n} > \left(1 + \frac{\epsilon}{3}\right)\log n,$$
for small enough $\epsilon$ and large enough $n$, implying that
$$P_m \le \left( \frac{enK^2}{m} e^{-(n-m) I} \right)^m \le \left( enK^2 e^{-(n-m') I} \right)^m \le \left( eK^2 n^{-\epsilon/3} \right)^m.$$
Thus,
$$\sum_{m=1}^{m'} P_m \le \sum_{m=1}^{m'} \left( eK^2 n^{-\epsilon/3} \right)^m \le \sum_{m=1}^{\infty} \left( eK^2 n^{-\epsilon/3} \right)^m \le \left( eK^2 n^{-\epsilon/3} \right) \sum_{m=0}^{\infty} \left( eK^2 n^{-\epsilon/3} \right)^m \le C_1 n^{-\epsilon/3},$$
where the last inequality follows because the geometric series converges for large enough $n$. For $m \in \{m' + 1, \dots, nK\}$, we have the bound
$$P_m \le \left( \frac{enK^2}{m} e^{-\frac{2nI}{9}} \right)^m \le \left( \frac{enK^2}{m' + 1} e^{-\frac{2nI}{9}} \right)^m \le \left( \frac{2e}{\epsilon} K^2 e^{-\frac{2nI}{9}} \right)^m.$$
Note that for large enough $n$, we also have
$$\frac{2nI}{9} > \frac{2n}{9}(1 + \epsilon)\frac{\log n}{n} > \frac{2 \log n}{9}.$$
Hence,
$$P_m \le \left( \frac{2e}{\epsilon} K^2 n^{-\frac{2}{9}} \right)^m,$$
implying the bound
$$\sum_{m=m'+1}^{nK} P_m \le \sum_{m=1}^{\infty} \left( \frac{2e}{\epsilon} K^2 n^{-\frac{2}{9}} \right)^m \le \left( \frac{2e}{\epsilon} K^2 n^{-\frac{2}{9}} \right) \sum_{m=0}^{\infty} \left( \frac{2e}{\epsilon} K^2 n^{-\frac{2}{9}} \right)^m \le C_2 n^{-\frac{2}{9}}.$$
Therefore, using the decomposition (30), the total probability of failure is bounded by
$$P(F) \le C_1 n^{-\epsilon/3} + C_2 n^{-2/9},$$
which converges to 0 as $n \to \infty$. This shows that the maximum likelihood estimator succeeds with probability tending to 1 as $n \to \infty$, as wanted.

4.4 Proof of Theorem 3.3

Note that in this setting, we have
$$\begin{aligned}
I &= -2 \log\left( \sqrt{\left(1 - \frac{u \log n}{n}\right)\left(1 - \frac{v \log n}{n}\right)} + \sum_{\ell=1}^{L} \sqrt{a_\ell b_\ell}\, \frac{\log n}{n} \right) \\
&= -2 \log\left( \left(1 - \frac{u \log n}{2n} + O\left(\frac{\log^2 n}{n^2}\right)\right)\left(1 - \frac{v \log n}{2n} + O\left(\frac{\log^2 n}{n^2}\right)\right) + \sum_{\ell=1}^{L} \sqrt{a_\ell b_\ell}\, \frac{\log n}{n} \right) \\
&= -2 \log\left( 1 - \frac{u \log n}{2n} - \frac{v \log n}{2n} + \sum_{\ell=1}^{L} \sqrt{a_\ell b_\ell}\, \frac{\log n}{n} + O\left(\frac{\log^2 n}{n^2}\right) \right) \\
&= -2 \left( -\frac{u \log n}{2n} - \frac{v \log n}{2n} + \sum_{\ell=1}^{L} \sqrt{a_\ell b_\ell}\, \frac{\log n}{n} + O\left(\frac{\log^2 n}{n^2}\right) \right) \\
&= C \frac{\log n}{n} + O\left(\frac{\log^2 n}{n^2}\right),
\end{aligned} \tag{31}$$
where $C = \sum_{\ell=1}^{L} (\sqrt{a_\ell} - \sqrt{b_\ell})^2$. In particular,
$$I = \left( \sum_{\ell=1}^{L} \left(\sqrt{a_\ell} - \sqrt{b_\ell}\right)^2 \right) \frac{\log n}{n} + O\left(\frac{\log^2 n}{n^2}\right).$$
Corollary 3.1 (for $K = 2$ communities) and Theorem 3.2 (for more than two communities) then imply the desired result.

4.5 Proof of Theorem 3.4

We will follow the proof strategy of Abbe et al. [2]. We will show that if
$$\sum_{\ell=1}^{L} \left(\sqrt{a_\ell} - \sqrt{b_\ell}\right)^2 < 1,$$
then with probability at least $1/3$, we can find nodes $i \in A$ and $j \in B$ such that exchanging their community assignments has a larger likelihood than the ground truth. This would establish that the maximum likelihood estimator fails with probability at least $1/3$. Although we will establish the proof for the case of two communities, we note that the proof below trivially extends to $K > 2$ communities each of size $n$, simply by taking $A$ and $B$ to be any two fixed communities from the $K$ communities.
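The expansion (31) can be illustrated numerically by comparing the exact divergence with its leading term. The sketch below assumes $u = \sum_\ell a_\ell$ and $v = \sum_\ell b_\ell$, so that label 0 carries the remaining probability mass; the parameter values and function names are hypothetical choices made for illustration.

```python
import math

def exact_renyi_half(a, b, n):
    """Exact I = -2*log(sum_l sqrt(p_n(l) q_n(l))) for the labeled model.

    Assumes p_n(l) = a_l*log(n)/n and q_n(l) = b_l*log(n)/n for l >= 1,
    with p_n(0) = 1 - u*log(n)/n and q_n(0) = 1 - v*log(n)/n.
    """
    s = math.log(n) / n
    u, v = sum(a), sum(b)
    overlap = (math.sqrt((1.0 - u * s) * (1.0 - v * s))
               + sum(math.sqrt(x * y) for x, y in zip(a, b)) * s)
    return -2.0 * math.log(overlap)

def threshold_constant(a, b):
    """C = sum_l (sqrt(a_l) - sqrt(b_l))^2, the leading constant in (31)."""
    return sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(a, b))

a = [3.0, 2.0]   # hypothetical within-community label rates
b = [1.0, 0.5]   # hypothetical between-community label rates
n = 10 ** 6

C = threshold_constant(a, b)
ratio = n * exact_renyi_half(a, b, n) / math.log(n)  # approaches C as n grows
```

Comparing `ratio` with 1 is then a finite-$n$ proxy for the threshold condition; the asymptotic statement itself only concerns whether $C$ lies above or below 1.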
Let $A = \{1, 2, \dots, n\}$ and $B = \{n + 1, n + 2, \dots, 2n\}$. For $i \ne j$, let $w_{ij} \in \{0, 1, \dots, L\}$ be the weight of the edge $(i, j)$. Just as in the case of unlabeled edges, maximizing the likelihood in the labeled case may be interpreted as finding the min-cut for the stochastic block model, where the weight of an edge with color $\ell \in \{0, \dots, L\}$ is $\log\left(\frac{p_n(\ell)}{q_n(\ell)}\right)$. For ease of notation, define the function
$$d_n(\ell) = \log\left(\frac{p_n(\ell)}{q_n(\ell)}\right).$$
We may describe $d_n$ explicitly as
$$d_n(0) = \log\left(\frac{1 - u \log n / n}{1 - v \log n / n}\right), \qquad d_n(\ell) = \log\left(\frac{a_\ell}{b_\ell}\right) \text{ for } 1 \le \ell \le L. \tag{32}$$
Note that since $d_n(0) \to 0$ as $n \to \infty$, we may find a constant $M > 0$ that upper-bounds $d_n$ for all $n$. Thus,
$$M \ge \max_\ell d_n(\ell), \quad \text{for all } n \text{ and all } 0 \le \ell \le L.$$
For any node $i$ and any subset of nodes $H$, denote
$$S(i, H) = \sum_{j \in H,\, j \ne i} d_n(w_{ij}).$$
Using an argument along the lines of Lemma 4.1, it is easy to check that if there exist nodes $i \in A$ and $j \in B$ such that
$$S(i, A \setminus \{i\}) + S(j, B \setminus \{j\}) < S(i, B \setminus \{j\}) + S(j, A \setminus \{i\}), \tag{33}$$
then the community assignment where $\sigma(i) = B$ and $\sigma(j) = A$ and every other assignment remains the same is more likely than the truth. Thus, the maximum likelihood estimator will fail if this happens. Define the following events:
$$\begin{aligned}
F &= \text{maximum likelihood fails}, \\
F_A &= \exists\, i \in A : S(i, A \setminus \{i\}) < S(i, B) - M, \\
F_B &= \exists\, j \in B : S(j, B \setminus \{j\}) < S(j, A) - M.
\end{aligned}$$
We have the following simple lemma:

Lemma 4.4. If $P(F_A) \ge 2/3$, then $P(F) \ge 1/3$.

Proof. By symmetry, we have $P(F_B) \ge 2/3$, so by a union bound, $P(F_A \cap F_B) \ge 1/3$. Thus, with probability at least $1/3$, there exist nodes $i \in A$ and $j \in B$ such that
$$S(i, A \setminus \{i\}) < S(i, B) - M \le S(i, B) - S(i, j) = S(i, B \setminus \{j\}),$$
and
$$S(j, B \setminus \{j\}) < S(j, A) - M \le S(j, A) - S(i, j) = S(j, A \setminus \{i\}).$$
This implies
$$S(i, A \setminus \{i\}) + S(j, B \setminus \{j\}) < S(i, B \setminus \{j\}) + S(j, A \setminus \{i\}),$$
which from expression (33), implies that the maximum likelihood estimator fails.

We now define $\gamma(n)$ and $\delta(n)$ as follows:
$$\gamma(n) = (\log n)^{\log^{2/3} n}, \quad \text{and} \quad \delta(n) = \frac{\sqrt{\log n}}{\log\log n}.$$
Let $H$ be a fixed subset of $A$ of size $\frac{n}{\gamma(n)}$. We will take $\gamma(n) \asymp (\log n)^{\log^{2/3} n}$, such that $\frac{n}{\gamma(n)}$ is an integer. Define the event $\Delta$ as follows:
$$\Delta = \text{for all nodes } i \in H,\ S(i, H) < \delta(n).$$
We then have the following lemma:

Lemma 4.5 (Proof in Appendix A.1). $P(\Delta) \ge \frac{9}{10}$.

Finally, define the events $F_H^{(i)}$ and $F_H$ as follows:
$$\begin{aligned}
F_H^{(i)} &= \text{node } i \in H \text{ satisfies } S(i, A \setminus H) + \delta(n) < S(i, B) - M, \\
F_H &= \cup_{i \in H} F_H^{(i)},
\end{aligned}$$
and define
$$\rho(n) = P\left(F_H^{(i)}\right). \tag{34}$$
We have the following result:

Lemma 4.6. If $\rho(n) > \frac{\gamma(n) \log 10}{n}$, then $P(F) > 1/3$ for sufficiently large $n$.

Proof. We first show that $P(F_H) > \frac{9}{10}$ for large enough $n$. Since the events $F_H^{(i)}$ are independent, we have
$$P(F_H) = P\left(\cup_{i \in H} F_H^{(i)}\right) = 1 - P\left(\cap_{i \in H} \left(F_H^{(i)}\right)^c\right) = 1 - (1 - \rho(n))^{\frac{n}{\gamma(n)}}.$$
Clearly, if $\rho(n)$ is not $o(1)$, then $P(F_H)$ tends to 1 and we are done. If $\rho(n)$ is $o(1)$, then
$$\lim_{n \to \infty} (1 - \rho(n))^{\frac{n}{\gamma(n)}} = \lim_{n \to \infty} \left( (1 - \rho(n))^{\frac{1}{\rho(n)}} \right)^{\frac{\rho(n)\, n}{\gamma(n)}} = \lim_{n \to \infty} \exp\left( -\frac{\rho(n)\, n}{\gamma(n)} \right) < \frac{1}{10},$$
where the last inequality used the fact that $\rho(n) > \frac{\gamma(n) \log 10}{n}$. Hence, $P(F_H) > \frac{9}{10}$, as claimed. Now note that $\Delta \cap F_H \subseteq F_A$. By Lemma 4.5, we also have $P(\Delta) \ge \frac{9}{10}$. Hence,
$$P(F_A) \ge P(\Delta) + P(F_H) - 1 \ge \frac{8}{10} > \frac{2}{3},$$
which combined with Lemma 4.4 implies the desired result.

Let $\{X_i\}_{i \ge 1}$ be a sequence of i.i.d. random variables distributed according to $p_n$, and let $\{Y_i\}_{i \ge 1}$ be a sequence of i.i.d. random variables distributed according to $q_n$.
From the definition (34) of $\rho(n)$, and using independence, we have
$$\begin{aligned}
\rho(n) &= P\left( \sum_{i=1}^{n} d_n(Y_i) - \sum_{i=1}^{n - \frac{n}{\gamma(n)}} d_n(X_i) > \delta(n) + M \right) \\
&\ge P\left( \sum_{i=1}^{n - \frac{n}{\gamma(n)}} d_n(Y_i) - \sum_{i=1}^{n - \frac{n}{\gamma(n)}} d_n(X_i) > \delta(n) + M - \hat{\delta}(n) \right) \times P\left( \sum_{i = n - \frac{n}{\gamma(n)} + 1}^{n} d_n(Y_i) \ge \hat{\delta}(n) \right),
\end{aligned} \tag{35}$$
for any $\hat{\delta}(n)$. We will choose a suitable $\hat{\delta}(n)$ so that
$$P\left( \sum_{i = n - \frac{n}{\gamma(n)} + 1}^{n} d_n(Y_i) \ge \hat{\delta}(n) \right) \longrightarrow 1. \tag{36}$$
Note that $d_n(Y_i)$ is a random variable satisfying
$$P\left( d_n(Y_i) = \log\left(\frac{1 - u \log n / n}{1 - v \log n / n}\right) \right) = 1 - \frac{v \log n}{n}.$$
Thus,
$$P\left( d_n(Y_i) = \log\left(\frac{1 - u \log n / n}{1 - v \log n / n}\right), \text{ for all } n - \frac{n}{\gamma(n)} + 1 \le i \le n \right) = \left(1 - \frac{v \log n}{n}\right)^{\frac{n}{\gamma(n)}}.$$
We may check that
$$\left(1 - \frac{v \log n}{n}\right)^{\frac{n}{\gamma(n)}} \longrightarrow 1,$$
implying that
$$P\left( \sum_{i = n - \frac{n}{\gamma(n)} + 1}^{n} d_n(Y_i) = \frac{n}{\gamma(n)} \cdot \log\left(\frac{1 - u \log n / n}{1 - v \log n / n}\right) \right) \longrightarrow 1.$$
Thus, equation (36) holds with
$$\hat{\delta}(n) = \left| \frac{n}{\gamma(n)} \cdot \log\left(\frac{1 - u \log n / n}{1 - v \log n / n}\right) \right|. \tag{37}$$
Since
$$\hat{\delta}(n) = O\left(\frac{\log n}{\gamma(n)}\right) = o\left(\sqrt{\log n}\right),$$
we have $\delta(n) + M - \hat{\delta}(n) = o(\sqrt{\log n})$. Recall the definition of the function $T$ in equation (17). We have the following technical lemma:

Lemma 4.7 (Proof in Appendix A.2). Let $\omega(n) = o(\sqrt{\log n})$ and $N(n) = n(1 + o(1))$. Then
$$-\log T(N(n), p_n, q_n, \omega(n)) \le \left( \sum_{\ell=1}^{L} \left(\sqrt{a_\ell} - \sqrt{b_\ell}\right)^2 \right) \log n + o(\log n).$$

Noting that
$$T\left( n - \frac{n}{\gamma(n)}, p_n, q_n, \delta(n) + M - \hat{\delta}(n) \right) = P\left( \sum_{i=1}^{n - \frac{n}{\gamma(n)}} d_n(Y_i) - \sum_{i=1}^{n - \frac{n}{\gamma(n)}} d_n(X_i) \ge \delta(n) + M - \hat{\delta}(n) \right),$$
and using Lemma 4.7, we conclude that
$$-\log T\left( n - \frac{n}{\gamma(n)}, p_n, q_n, \delta(n) + M - \hat{\delta}(n) \right) \le \left( \sum_{i=1}^{L} \left(\sqrt{a_i} - \sqrt{b_i}\right)^2 \right) \log n + o(\log n). \tag{38}$$
Substituting the bounds (36) and (38) into equation (35), we then conclude that
$$-\log \rho(n) \le \left( \sum_{i=1}^{L} \left(\sqrt{a_i} - \sqrt{b_i}\right)^2 \right) \log n + o(\log n).$$
In particular, when
$$\sum_{\ell=1}^{L} \left(\sqrt{a_\ell} - \sqrt{b_\ell}\right)^2 < 1,$$
we have
$$-\log \rho(n) \le \log n - \log \gamma(n) - \log\log 10,$$
for sufficiently large $n$. Lemma 4.6 then implies that maximum likelihood fails with probability at least $\frac{1}{3}$, completing the proof of the theorem.

5 Discussion and open questions

We have established thresholds for exact recovery in the framework of weighted stochastic block models, where edge weights may be drawn from arbitrary distributions. Whereas previous investigations had concentrated on the setting of unweighted edges, we show that the same techniques may be extended to the weighted case. Furthermore, the Renyi divergence of order $\frac{1}{2}$ between the distributions of edges coming from within-community and between-community connections arises as a fundamental quantity governing the hardness of the community estimation problem.

The conclusions of this paper leave open a number of questions regarding phase transitions in general weighted stochastic block models. We conclude our paper by highlighting several interesting directions for future research.

• Thresholds for exact recovery under continuous distributions. Although the error bound for maximum likelihood derived in Theorem 3.1 does not impose any conditions on the distributions $p_n$ and $q_n$, the proofs of the upper and lower bounds in Section 3.2 assume a specific setting involving discrete distributions with the same support. However, situations may arise where the observed edge weights are generated from continuous distributions. The submatrix localization problem in Section 3.4 provides one such example. It would be interesting to see if the Renyi divergence between $p_n$ and $q_n$ again plays a role in characterizing the threshold for exact recovery in the continuous case.
However, a number of hurdles exist in extending our proof of impossibility to continuous distributions. Just as with discrete distributions, our proof technique does not allow for distributions that are not absolutely continuous with respect to each other. Furthermore, we have assumed the existence of a finite upper bound $M$ on the likelihood ratio between $p_n$ and $q_n$. Such a bound may not exist even for absolutely continuous distributions; for example, no such bound exists for $p_n = N(\mu_n, 1)$ and $q_n = N(0, 1)$ in the submatrix localization problem. Finally, the emergence and relevance of the Renyi divergence term as a sharp threshold in this problem may be attributed in part to the specific regime we have considered, where the probabilities of connection scale according to $\Theta(\log n / n)$. Mossel et al. [29] have shown that for Bernoulli distributions $p_n$ and $q_n$ in slightly denser regimes, where the probabilities scale according to $\Theta\left(\frac{\log^3 n}{n}\right)$, the threshold is no longer simply a function of the Renyi divergence.

• General thresholds for weighted distributions. Mossel et al. [29] derive a very general theorem involving thresholds for the binary stochastic block model when $K = 2$. Defining
$$P(n, p_n, q_n) = P\left( \sum_{i=1}^{n} Y_i \ge \sum_{i=1}^{n} X_i \right), \tag{39}$$
where $X \sim p_n$ and $Y \sim q_n$, and $p_n$ and $q_n$ are Bernoulli distributions such that $p_n$ stochastically dominates $q_n$, Mossel et al. [29] prove that exact recovery of the two communities is possible if and only if $P(n, p_n, q_n) = o\left(\frac{1}{n}\right)$. On the other hand, there exists an estimator for which the fraction of misclassified nodes converges to 0 if and only if $P(n, p_n, q_n) = o(1)$. It would be interesting to derive such a statement when $p_n$ and $q_n$ are general distributions, which could then be used to prove our results in Section 3.2 as a special case.
Specifically, one might construct the analog of expression (39) to be
$$P(n, p_n, q_n) = P\left( \sum_{i=1}^{n} d_n(Y_i) - \sum_{i=1}^{n} d_n(X_i) \ge 0 \right),$$
and conjecture analogous results about exact and partial recovery based on the rate at which $P(n, p_n, q_n)$ converges to 0.

• Efficient algorithms for exact recovery in weighted stochastic block models. Hajek et al. [17, 18] and Gao et al. [15] provide efficiently computable algorithms that achieve the threshold for exact recovery in the case of binary stochastic block models. Now that we have characterized the threshold for a more general class of weighted distributions, it would be interesting to see if similar efficient algorithms may be derived to obtain community assignments in the weighted case.

A Proofs of technical lemmas

In this section, we collect the proofs of the more technical lemmas used in proving the main results.

A.1 Proof of Lemma 4.5

Let $\Delta_i$ be the event
$$S(i, H) < \delta(n).$$
By a simple union bound calculation, we have
$$P(\Delta) = 1 - P(\Delta^c) = 1 - P(\cup_{i \in H} \Delta_i^c) \ge 1 - |H| \cdot P(\Delta_i^c).$$
We will show that
$$|H| \cdot P(\Delta_i^c) = o(1),$$
by showing that
$$\log |H| + \log P(\Delta_i^c) \to -\infty,$$
as $n \to \infty$. Let the weights of the edges from $i$ to nodes within $H$ be the random variables $\{X_1, \dots, X_{|H|-1}\}$. Note that the $X_j$'s are independent and identically distributed according to $p_n$. We have
$$P(\Delta_i^c) = P\left( S(i, H) \ge \frac{\sqrt{\log n}}{\log\log n} \right) = P\left( \sum_{j=1}^{|H|-1} d_n(X_j) \ge \frac{\sqrt{\log n}}{\log\log n} \right) \le \inf_{t > 0} \left\{ \frac{E\left[e^{t d_n(X_1)}\right]^{|H|-1}}{e^{t \frac{\sqrt{\log n}}{\log\log n}}} \right\},$$
using a Chernoff bound in the last inequality. Thus, for $t > 0$, we have
$$\begin{aligned}
\log |H| + \log P(\Delta_i^c) &\le \log\frac{n}{\gamma(n)} + \log\left( \frac{E\left[e^{t d_n(X_1)}\right]^{\frac{n}{\gamma(n)} - 1}}{e^{t \frac{\sqrt{\log n}}{\log\log n}}} \right) \\
&= \log\frac{n}{\gamma(n)} + \left(\frac{n}{\gamma(n)} - 1\right) \log E\left[e^{t d_n(X_1)}\right] - t \frac{\sqrt{\log n}}{\log\log n}.
\end{aligned}$$
Picking $t = \sqrt{\log n}\, \log\log n$, the last expression simplifies to
$$-\log \gamma(n) + \left(\frac{n}{\gamma(n)} - 1\right) \log E\left[e^{\sqrt{\log n}\, (\log\log n)\, d_n(X_1)}\right]. \tag{40}$$
We now analyze $\log E\left[e^{\sqrt{\log n}\, (\log\log n)\, d_n(X_1)}\right]$ carefully. Note that
$$\begin{aligned}
\log E\left[e^{\sqrt{\log n}\, (\log\log n)\, d_n(X_1)}\right] &= \log\left[ \left(\frac{1 - u \log n / n}{1 - v \log n / n}\right)^{\sqrt{\log n}\, \log\log n} \left(1 - \frac{u \log n}{n}\right) + \sum_{\ell=1}^{L} \left(\frac{a_\ell}{b_\ell}\right)^{\sqrt{\log n}\, \log\log n} \frac{a_\ell \log n}{n} \right] \\
&:= \log(1 + \mu_n + \nu_n),
\end{aligned}$$
where
$$1 + \mu_n = \left(\frac{1 - u \log n / n}{1 - v \log n / n}\right)^{\sqrt{\log n}\, \log\log n} \left(1 - \frac{u \log n}{n}\right), \quad \text{and} \quad \nu_n = \sum_{\ell=1}^{L} \left(\frac{a_\ell}{b_\ell}\right)^{\sqrt{\log n}\, \log\log n} \left(\frac{a_\ell \log n}{n}\right).$$
The following bound holds for $\nu_n$:
$$\nu_n = \sum_{\ell=1}^{L} (\log n)^{\sqrt{\log n}\, \log\frac{a_\ell}{b_\ell}} \left(\frac{a_\ell \log n}{n}\right) \le C_1 \frac{(\log n)^{C_2 \sqrt{\log n}}}{n},$$
for suitable constants $C_1, C_2$. For $\mu_n$, we have
$$\begin{aligned}
\mu_n &= \left(\frac{1 - u \log n / n}{1 - v \log n / n}\right)^{\sqrt{\log n}\, \log\log n} \left(1 - \frac{u \log n}{n}\right) - 1 \\
&= \left( \left(\frac{1 - u \log n / n}{1 - v \log n / n}\right)^{n / \log n} \right)^{\frac{(\log n)^{3/2} \log\log n}{n}} \left(1 - \frac{u \log n}{n}\right) - 1.
\end{aligned}$$
The term $\left(\frac{1 - u \log n / n}{1 - v \log n / n}\right)^{n / \log n}$ tends to a constant, $\exp(v - u)$. Thus, for large enough $n$, we may find constants $0 < c_1 < c_2$ such that $\left(\frac{1 - u \log n / n}{1 - v \log n / n}\right)^{n / \log n} \in (c_1, c_2)$. Using the Taylor series approximation of $c_i^x$ near 0, we have
$$c_i^{\frac{(\log n)^{3/2} \log\log n}{n}} = 1 + \frac{(\log n)^{3/2} \log\log n}{n} \log c_i + O\left( \left(\frac{(\log n)^{3/2} \log\log n}{n}\right)^2 \right),$$
so
$$c_i^{\frac{(\log n)^{3/2} \log\log n}{n}} \left(1 - \frac{u \log n}{n}\right) - 1 = \frac{(\log n)^{3/2} \log\log n}{n} \log c_i + O\left( \left(\frac{(\log n)^{3/2} \log\log n}{n}\right)^2 \right) - \frac{u \log n}{n} \left( 1 + \frac{(\log n)^{3/2} \log\log n}{n} \log c_i \right).$$
Thus, for large enough $n$, there exists a constant $C_3$ that satisfies
$$|\mu_n| \le C_3 \frac{\log^2 n}{n}.$$
Using the bound $\log(1 + \mu_n + \nu_n) \le |\mu_n| + |\nu_n|$, we conclude that
$$\log E\left[e^{\sqrt{\log n}\, (\log\log n)\, d_n(X_1)}\right] \le C_1' \frac{(\log n)^{C_2' \sqrt{\log n}}}{n},$$
for suitable constants $C_1'$ and $C_2'$.
Returning to the expression (40), we conclude that
\[
-\log \gamma(n) + \left( \frac{n}{\gamma(n)} - 1 \right) \log E\left[ e^{\sqrt{\log n} (\log\log n) d_n(X_1)} \right] \le -\log \gamma(n) + \left( \frac{n}{\gamma(n)} - 1 \right) \frac{C_1' (\log n)^{C_2' \sqrt{\log n}}}{n}.
\]
Substituting $\gamma(n) = (\log n)^{\log^{2/3} n}$, we arrive at the upper bound
\[
-\log^{2/3} n \, (\log\log n) + \left( \frac{n}{(\log n)^{\log^{2/3} n}} - 1 \right) \frac{C_1' (\log n)^{C_2' \sqrt{\log n}}}{n}.
\]
It is easy to check that as $n \to \infty$, we have
\[
\left( \frac{n}{(\log n)^{\log^{2/3} n}} - 1 \right) \frac{C_1' (\log n)^{C_2' \sqrt{\log n}}}{n} \to 0, \quad \text{and} \quad -\log^{2/3} n \, (\log\log n) \to -\infty.
\]
This concludes the proof.

A.2 Proof of Lemma 4.7

We will use the proof strategy found in Zhang and Zhou [40]. Let $Z = d_n(Y) - d_n(X)$, where $X \sim p_n$ and $Y \sim q_n$. Let $M(t) = E e^{tZ}$, and recall the following results from the proof of Theorem 3.1:
\[
t^\star = \arg\min_{t > 0} M(t) = \frac{1}{2}, \qquad M(t^\star) = \left( \sum_{\ell=0}^{L} \sqrt{p_n(\ell) q_n(\ell)} \right)^2, \qquad I = -\log M(t^\star) = -2 \log \left( \sum_{\ell=0}^{L} \sqrt{p_n(\ell) q_n(\ell)} \right).
\]
Let $S_N = \sum_{i=1}^{N(n)} Z_i$, where the $Z_i$'s are i.i.d. and distributed according to $Z$, and denote the distribution of $Z$ by $p_Z$. Define $\eta(n) = \log^{3/4} n$. Then
\[
\begin{aligned}
P(S_N \ge \omega(n)) &\ge \sum_{z : S_N \in [\omega(n), \eta(n))} \prod_{i=1}^{N(n)} p_Z(z_i) \\
&\ge \frac{M^{N(n)}(t^\star)}{e^{t^\star \eta(n)}} \sum_{z : S_N \in [\omega(n), \eta(n))} \prod_{i=1}^{N(n)} \frac{e^{t^\star z_i} p_Z(z_i)}{M(t^\star)} \\
&= \exp\left( -N(n) I - \frac{\eta(n)}{2} \right) \sum_{z : S_N \in [\omega(n), \eta(n))} \prod_{i=1}^{N(n)} \frac{e^{t^\star z_i} p_Z(z_i)}{M(t^\star)},
\end{aligned} \tag{41}
\]
where the second inequality uses the fact that $e^{t^\star \eta(n)} \ge e^{t^\star \sum_i z_i}$ when $\sum_{i=1}^{N(n)} z_i < \eta(n)$. Now denote
\[
r(w) = \frac{e^{t^\star w} p_Z(w)}{M(t^\star)},
\]
and note that $r$ defines a probability distribution. Defining $W_1, W_2, \dots, W_n$ to be i.i.d. random variables with probability mass function $r(w)$, we then have
\[
\sum_{z : S_N \in [\omega(n), \eta(n))} \prod_{i=1}^{N(n)} \frac{e^{t^\star z_i} p_Z(z_i)}{M(t^\star)} = P\left( \omega(n) \le \sum_{i=1}^{N(n)} W_i < \eta(n) \right). \tag{42}
\]
We also have the following concentration result:

Lemma A.1 (Proof in Appendix A.3).
Let $\{W_i\}_{i \ge 1}$ be i.i.d. random variables distributed as $r(w)$. Then
\[
\frac{\sum_{i=1}^{n} W_i}{\sqrt{\log n}} \xrightarrow{d} N(0, \nu^2), \quad \text{as } n \to \infty,
\]
where $\nu > 0$ is a constant.

By Lemma A.1, it follows that
\[
\frac{1}{\sqrt{\log N(n)}} \sum_{i=1}^{N(n)} W_i \xrightarrow{d} N(0, \nu^2),
\]
for some constant $\nu > 0$. Furthermore, by our choices of $\omega(n)$, $N(n)$, and $\eta(n)$, we have
\[
\frac{\omega(n)}{\sqrt{\log N(n)}} \to 0, \quad \text{and} \quad \frac{\eta(n)}{\sqrt{\log N(n)}} \to +\infty.
\]
Thus,
\[
P\left( \frac{\omega(n)}{\sqrt{\log N(n)}} \le \frac{1}{\sqrt{\log N(n)}} \sum_{i=1}^{N(n)} W_i < \frac{\eta(n)}{\sqrt{\log N(n)}} \right) \to \frac{1}{2},
\]
implying that the left-hand probability becomes larger than $1/4$ for all large enough $n$. Combining this with the bounds (41) and (42), we then obtain
\[
P(S_N \ge \omega(n)) \ge \exp\left( -N(n) I - \frac{\log^{3/4} n}{2} - \log 4 \right).
\]
Now recall the computation in equation (31). Using $N = n(1 + o(1))$, we arrive at
\[
-\log T(N(n), p_n, q_n, \omega(n)) = -\log P(S_N \ge \omega(n)) \le \left( \sum_{\ell=1}^{L} \left( \sqrt{a_\ell} - \sqrt{b_\ell} \right)^2 \right) \log n + o(\log n).
\]
This concludes the proof.

A.3 Proof of Lemma A.1

We show that the moment generating function of $\frac{\sum_{i=1}^{n} W_i}{\sqrt{\log n}}$ converges to that of a normal random variable. By a simple computation, we may check that $r$ is a sum of delta distributions with mass
\[
\zeta\left( \log \frac{p_n(y)}{q_n(y)} - \log \frac{p_n(x)}{q_n(x)} \right) = \frac{\sqrt{p_n(x) q_n(x) p_n(y) q_n(y)}}{\left( \sum_{\ell=0}^{L} \sqrt{p_n(\ell) q_n(\ell)} \right)^2}, \tag{43}
\]
at the point $\log \frac{p_n(y)}{q_n(y)} - \log \frac{p_n(x)}{q_n(x)}$, for all $0 \le x, y \le L$. Note that the right-hand side of equation (43) is symmetric with respect to $x$ and $y$, implying that $r$ is a symmetric distribution. For $x, y \ne 0$, we then have
\[
\zeta\left( \log \frac{p_n(y)}{q_n(y)} - \log \frac{p_n(x)}{q_n(x)} \right) = \frac{\sqrt{a_x b_x a_y b_y}}{\left( \sum_{\ell=0}^{L} \sqrt{p_n(\ell) q_n(\ell)} \right)^2} \cdot \frac{\log^2 n}{n^2} = O\left( \frac{\log^2 n}{n^2} \right).
\]
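Equation (43) and the symmetry claim can be verified numerically. The sketch below is a hedged illustration: it assumes $d_n(x) = \log(p_n(x)/q_n(x))$, $p_n(\ell) = a_\ell \log n / n$ and $q_n(\ell) = b_\ell \log n / n$ for $\ell \ge 1$ with the remaining mass at $\ell = 0$, and the parameter values and function names are hypothetical. Masses are indexed by the pair $(x, y)$ rather than by the support point, which avoids merging floating-point values.

```python
import math

def make_pmfs(n, a, b):
    """Build p_n and q_n on {0, 1, ..., L} under the assumed parametrization."""
    logn = math.log(n)
    p = [1 - sum(a) * logn / n] + [al * logn / n for al in a]
    q = [1 - sum(b) * logn / n] + [bl * logn / n for bl in b]
    return p, q

def tilted_masses(p, q, t_star=0.5):
    """Return (M(t*), zeta), where zeta[(x, y)] = (support point, tilted mass)
    for Z = d(Y) - d(X) with X ~ p and Y ~ q."""
    d = [math.log(px / qx) for px, qx in zip(p, q)]
    weights = {}
    for x, px in enumerate(p):
        for y, qy in enumerate(q):
            z = d[y] - d[x]
            weights[(x, y)] = (z, px * qy * math.exp(t_star * z))
    M = sum(w for _, w in weights.values())
    zeta = {xy: (z, w / M) for xy, (z, w) in weights.items()}
    return M, zeta

# Hypothetical parameters (not taken from the paper).
p, q = make_pmfs(10**4, [4.0, 3.0], [1.0, 2.0])
M, zeta = tilted_masses(p, q)
bc = sum(math.sqrt(px * qx) for px, qx in zip(p, q))  # sum of sqrt(p_n q_n)
print(f"M(1/2) = {M:.6f}, (sum sqrt(p q))^2 = {bc**2:.6f}")
```

Swapping $x$ and $y$ negates the support point and leaves the mass unchanged, which is the symmetry of $r$ used in the remainder of the proof.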
For $x = 0$ and $y \ne 0$ (and by symmetry, for $y = 0$ and $x \ne 0$), we have
\[
\zeta\left( \log \frac{p_n(y)}{q_n(y)} - \log \frac{p_n(0)}{q_n(0)} \right) = \frac{\sqrt{(1 - u \log n / n)(1 - v \log n / n) a_y b_y}}{\left( \sum_{\ell=0}^{L} \sqrt{p_n(\ell) q_n(\ell)} \right)^2} \cdot \frac{\log n}{n} = C_y \frac{\log n}{n} + O\left( \frac{\log^2 n}{n^2} \right),
\]
for a suitable constant $C_y > 0$. Hence,
\[
r(0) = 1 - C_0 \frac{\log n}{n} + O\left( \frac{\log^2 n}{n^2} \right),
\]
for some constant $C_0 > 0$.

We now examine the range of $W \stackrel{d}{=} W_i$, which we denote by the set $\mathcal{W}$. Note that the range is finite, since $W$ can only take values from the set
\[
\left\{ \log \left( \frac{p_n(y) q_n(x)}{q_n(y) p_n(x)} \right) : 0 \le x, y \le L \right\}.
\]
Also note that the range depends on $n$, since the ratio $\log \left( \frac{p_n(0)}{q_n(0)} \right)$ changes with $n$. However, since $\log \left( \frac{p_n(0)}{q_n(0)} \right) = O\left( \frac{\log n}{n} \right)$, this dependence may only perturb the range by $O\left( \frac{\log n}{n} \right)$. Thus, we may fix constants $\{0, \pm w_1, \dots, \pm w_R\}$ such that the range of $W$ is given by
\[
\mathcal{W} = \{0, \pm \hat{w}_1, \dots, \pm \hat{w}_R\}, \quad \text{where } \hat{w}_i = w_i + O\left( \frac{\log n}{n} \right), \text{ for } 1 \le i \le R.
\]
Since $W$ is a symmetric random variable, it is easy to see that its moment generating function is given by
\[
E e^{tW} = 1 + \sum_{j=1}^{R} r(\hat{w}_j) \left( e^{t \hat{w}_j / 2} - e^{-t \hat{w}_j / 2} \right)^2, \tag{44}
\]
using the fact that $r(0) = 1 - \sum_{j=1}^{R} 2 r(\hat{w}_j)$. As noted above, for certain nonzero $\hat{w} \in \mathcal{W}$, we have $r(\pm \hat{w}) = \Theta\left( \frac{\log n}{n} \right)$; whereas for other values, we have $r(\hat{w}) = O\left( \frac{\log^2 n}{n^2} \right)$. Without loss of generality, let $r(\hat{w}_j)$, for $1 \le j \le N$, be $\Theta\left( \frac{\log n}{n} \right)$, and let $r(\hat{w}_j)$, for $N + 1 \le j \le R$, be $O\left( \frac{\log^2 n}{n^2} \right)$. We then write
\[
r(\hat{w}_j) = C_j \frac{\log n}{n} + O\left( \frac{\log^2 n}{n^2} \right), \quad \text{for } 1 \le j \le N.
\]
Using the expression (44), the moment generating function of $W$ is then given by
\[
E e^{tW} = 1 + \sum_{1 \le j \le N} \left( C_j \frac{\log n}{n} + O\left( \frac{\log^2 n}{n^2} \right) \right) \left( e^{t \hat{w}_j / 2} - e^{-t \hat{w}_j / 2} \right)^2 + \sum_{N + 1 \le j \le R} O\left( \frac{\log^2 n}{n^2} \right) \left( e^{t \hat{w}_j / 2} - e^{-t \hat{w}_j / 2} \right)^2.
\]
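The identity (44) follows from $e^{tw} + e^{-tw} - 2 = (e^{tw/2} - e^{-tw/2})^2$ together with $r(0) = 1 - 2 \sum_j r(\hat{w}_j)$. As a sanity check, the Python sketch below compares the direct definition of the moment generating function with formula (44) on a small symmetric distribution. The support points and masses in `r_pos` are hypothetical values chosen for illustration, and the function names are ours.

```python
import math

def mgf_direct(r_pos, t):
    """E[exp(tW)] for a symmetric pmf: mass r_pos[w] at each of +w and -w,
    with the remaining mass placed at 0."""
    r0 = 1 - 2 * sum(r_pos.values())
    return r0 + sum(r * (math.exp(t * w) + math.exp(-t * w)) for w, r in r_pos.items())

def mgf_formula_44(r_pos, t):
    """The same quantity written in the form of equation (44)."""
    return 1 + sum(
        r * (math.exp(t * w / 2) - math.exp(-t * w / 2)) ** 2
        for w, r in r_pos.items()
    )

# Hypothetical symmetric distribution: positive support point -> mass.
r_pos = {0.4: 0.10, 1.3: 0.05}
t_values = [-2.0, -0.5, 0.0, 0.3, 1.0, 2.5]
pairs = [(mgf_direct(r_pos, t), mgf_formula_44(r_pos, t)) for t in t_values]
for t, (direct, formula) in zip(t_values, pairs):
    print(f"t = {t:+.1f}: direct = {direct:.6f}, formula (44) = {formula:.6f}")
```

Both expressions are even functions of $t$ and equal 1 at $t = 0$, as expected for a symmetric distribution.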
Substituting $\frac{t}{\sqrt{\log n}}$ in place of $t$ and using the approximation $a^{x/2} - a^{-x/2} = x \log a + O(x^2 \log^2 a)$ for $x = o(1)$, we arrive at
\[
\begin{aligned}
E e^{tW / \sqrt{\log n}} &= 1 + \sum_{1 \le j \le N} \left( C_j \frac{\log n}{n} + O\left( \frac{\log^2 n}{n^2} \right) \right) \left( \frac{t \hat{w}_j}{\sqrt{\log n}} + O\left( \frac{1}{\log n} \right) \right)^2 + \sum_{N + 1 \le j \le R} O\left( \frac{\log^2 n}{n^2} \right) \left( \frac{t \hat{w}_j}{\sqrt{\log n}} + O\left( \frac{1}{\log n} \right) \right)^2 \\
&= 1 + \frac{C t^2}{n} + o\left( \frac{1}{n} \right),
\end{aligned}
\]
for a suitable constant $C$, where the second equality uses the fact that $\hat{w}_j = w_j + O\left( \frac{\log n}{n} \right)$, for all $1 \le j \le R$. Hence, the moment generating function of $\frac{\sum_{i=1}^{n} W_i}{\sqrt{\log n}}$ is given by
\[
\left( E e^{tW / \sqrt{\log n}} \right)^n = \left( 1 + \frac{C t^2}{n} + o\left( \frac{1}{n} \right) \right)^n \longrightarrow e^{C t^2},
\]
which is the moment generating function of $N(0, 2C)$. This completes the proof.

References

[1] E. Abbe, A. S. Bandeira, A. Bracher, and A. Singer. Decoding binary node labels from censored edge measurements: Phase transition and efficient recovery. IEEE Transactions on Network Science and Engineering, 1(1):10–22, 2014.
[2] E. Abbe, A. S. Bandeira, and G. Hall. Exact recovery in the stochastic block model. arXiv preprint arXiv:1405.3267, 2014.
[3] E. Abbe and A. Montanari. Conditional random fields, planted constraint satisfaction and entropy concentration. In P. Raghavendra, S. Raskhodnikova, K. Jansen, and J. D. P. Rolim, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, volume 8096 of Lecture Notes in Computer Science, pages 332–346. Springer Berlin Heidelberg, 2013.
[4] E. Abbe and C. Sandon. Community detection in general stochastic block models: Fundamental limits and efficient recovery algorithms. arXiv preprint arXiv:1503.00609, 2015.
[5] A. A. Amini, A. Chen, P. J. Bickel, E. Levina, et al. Pseudo-likelihood methods for community detection in large sparse networks. The Annals of Statistics, 41(4):2097–2122, 2013.
[6] R. Banuelos. Convolutions and approximation to the identity.
Available online at http://www.math.purdue.edu/~banuelos/ma544-05/lectures3.pdf.
[7] A. Barrat, M. Barthelemy, R. Pastor-Satorras, and A. Vespignani. The architecture of complex weighted networks. Proceedings of the National Academy of Sciences of the United States of America, 101(11):3747–3752, 2004.
[8] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008, 2008.
[9] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D.-U. Hwang. Complex networks: Structure and dynamics. Physics Reports, 424(4):175–308, 2006.
[10] Y. Chen, C. Suh, and A. J. Goldsmith. Information recovery from pairwise measurements: A Shannon-theoretic approach. arXiv preprint arXiv:1504.01369, 2015.
[11] Y. Chen and J. Xu. Statistical-computational tradeoffs in planted problems and submatrix localization with a growing number of clusters and submatrices. arXiv preprint arXiv:1402.1267, 2014.
[12] Y. Deshpande, E. Abbe, and A. Montanari. Asymptotic mutual information for the two-groups stochastic block model. arXiv preprint arXiv:1507.08685, 2015.
[13] D. Easley and J. Kleinberg. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press, New York, NY, USA, 2010.
[14] S. E. Fienberg, M. M. Meyer, and S. S. Wasserman. Statistical analysis of multiple sociometric relations. Journal of the American Statistical Association, 80(389):51–67, 1985.
[15] C. Gao, Z. Ma, A. Y. Zhang, and H. H. Zhou. Achieving optimal misclassification proportion in stochastic block model. arXiv preprint arXiv:1505.03772, 2015.
[16] A. Goldenberg, A. X. Zheng, S. E. Fienberg, and E. M. Airoldi. A survey of statistical network models. Found. Trends Mach. Learn., 2(2):129–233, February 2010.
[17] B. Hajek, Y. Wu, and J. Xu. Achieving exact cluster recovery threshold via semidefinite programming. arXiv preprint arXiv:1412.6156, 2014.
[18] B. Hajek, Y. Wu, and J. Xu. Achieving exact cluster recovery threshold via semidefinite programming: Extensions. arXiv preprint arXiv:1502.07738, 2015.
[19] E. Hartuv and R. Shamir. A clustering algorithm based on graph connectivity. Information Processing Letters, 76(4–6):175–181, 2000.
[20] S. Heimlicher, M. Lelarge, and L. Massoulié. Community detection in the labelled stochastic block model. arXiv preprint arXiv:1209.2910, 2012.
[21] P. W. Holland, K. B. Laskey, and S. Leinhardt. Stochastic blockmodels: First steps. Social Networks, 5(2):109–137, 1983.
[22] M. O. Jackson. Social and Economic Networks. Princeton University Press, 2010.
[23] A. W. Knapp. Basic real analysis, volume 10. Springer Science & Business Media, 2005.
[24] M. Lelarge, L. Massoulié, and J. Xu. Reconstruction in the labeled stochastic block model. In Information Theory Workshop (ITW), 2013 IEEE, pages 1–5. IEEE, 2013.
[25] L. Massoulié. Community detection thresholds and the weak Ramanujan property. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing, STOC '14, pages 694–703. ACM, 2014.
[26] F. McSherry. Spectral partitioning of random graphs. In Foundations of Computer Science, 2001. Proceedings. 42nd IEEE Symposium on, pages 529–537. IEEE, 2001.
[27] E. Mossel, J. Neeman, and A. Sly. Stochastic block models and reconstruction. arXiv preprint arXiv:1202.1499.
[28] E. Mossel, J. Neeman, and A. Sly. A proof of the block model threshold conjecture. arXiv preprint arXiv:1311.4115, 2013.
[29] E. Mossel, J. Neeman, and A. Sly. Consistency thresholds for binary symmetric block models. arXiv preprint arXiv:1407.1591, 2014.
[30] M. Newman, A.-L. Barabasi, and D. J. Watts. The Structure and Dynamics of Networks: (Princeton Studies in Complexity). Princeton University Press, Princeton, NJ, USA, 2006.
[31] M. E. J. Newman. Analysis of weighted networks. Physical Review E, 70(5):056131, 2004.
[32] M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, 69(2):026113, 2004.
[33] J. K. Pritchard, M. Stephens, and P. Donnelly. Inference of population structure using multilocus genotype data. Genetics, 155(2):945–959, 2000.
[34] M. Rubinov and O. Sporns. Complex network measures of brain connectivity: Uses and interpretations. NeuroImage, 52(3):1059–1069, 2010. Computational Models of the Brain.
[35] D. S. Sade. Sociometrics of Macaca mulatta: I. Linkages and cliques in grooming matrices. Folia Primatologica, 18(3–4):196–223, 1972.
[36] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell., 22(8):888–905, August 2000.
[37] T. van Erven and P. Harremoës. Rényi divergence and Kullback-Leibler divergence. IEEE Transactions on Information Theory, 60(7):3797–3820, 2014.
[38] S. Wasserman and C. Anderson. Stochastic a posteriori blockmodels: Construction and assessment. Social Networks, 9(1):1–36, 1987.
[39] H. C. White, S. A. Boorman, and R. L. Breiger. Social structure from multiple networks: I. Blockmodels of roles and positions. American Journal of Sociology, 81(4):730–780, 1976.
[40] A. Y. Zhang and H. H. Zhou. Minimax rates of community detection in stochastic block model. arXiv preprint arXiv:1507.05313, 2015.
