Consistency of community detection in networks under degree-corrected stochastic block models

The Annals of Statistics 2012, Vol. 40, No. 4, 2266–2292
DOI: 10.1214/12-AOS1036
© Institute of Mathematical Statistics, 2012

By Yunpeng Zhao, Elizaveta Levina¹ and Ji Zhu²
George Mason University, University of Michigan and University of Michigan

Community detection is a fundamental problem in network analysis, with applications in many diverse areas. The stochastic block model is a common tool for model-based community detection, and asymptotic tools for checking consistency of community detection under the block model have been recently developed. However, the block model is limited by its assumption that all nodes within a community are stochastically equivalent, and provides a poor fit to networks with hubs or highly varying node degrees within communities, which are common in practice. The degree-corrected stochastic block model was proposed to address this shortcoming and allows variation in node degrees within a community while preserving the overall block community structure. In this paper we establish general theory for checking consistency of community detection under the degree-corrected stochastic block model and compare several community detection criteria under both the standard and the degree-corrected models. We show which criteria are consistent under which models and constraints, as well as compare their relative performance in practice. We find that methods based on the degree-corrected block model, which includes the standard block model as a special case, are consistent under a wider class of models and that modularity-type methods require parameter constraints for consistency, whereas likelihood-based methods do not.
On the other hand, in practice, the degree correction involves estimating many more parameters, and empirically we find it is only worth doing if the node degrees within communities are indeed highly variable. We illustrate the methods on simulated networks and on a network of political blogs.

Received November 2011; revised July 2012.
¹Supported in part by NSF Grants DMS-08-05798, DMS-01106772 and DMS-1159005.
²Supported in part by NSF Grant DMS-07-48389 and NIH Grant R01-GM-096194.
AMS 2000 subject classifications. 62G20.
Key words and phrases. Community detection, degree-corrected stochastic block models, consistency.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2012, Vol. 40, No. 4, 2266–2292. This reprint differs from the original in pagination and typographic detail.

1. Introduction. Networks have become one of the more common forms of data, and network analysis has received a lot of attention in computer science, physics, social sciences, biology and statistics (see [13, 15, 25] for reviews). The applications are many and varied, including social networks [31, 37], gene regulatory networks [33], recommender systems and security monitoring. One of the fundamental problems in network analysis is community detection, where communities are groups of nodes that are, in some sense, more similar to each other than to other nodes. The precise definition of community, like that of a cluster in multivariate analysis, is difficult to formalize, but many methods have been developed to address this problem (see [11, 15, 23] for comprehensive recent reviews), often relying on the intuitive notion of community as a group of nodes with many links between themselves and fewer links to the rest of the network.
Three groups of methods for community detection can be loosely identified in the literature. A number of greedy algorithms such as hierarchical clustering have been proposed (see [22] for a review), which we will not focus on in this paper. The second class of methods involves optimization of some "reasonable" global criteria over all possible network partitions and includes graph cuts [34, 38], spectral clustering [28] and modularity [23, 26], the latter discussed in detail below. Finally, model-based methods rely on fitting a probabilistic model for a network with communities. Perhaps the best known such model is the stochastic block model, which we will also refer to as simply the block model [18, 29, 35]. Other models include a recently introduced degree-corrected stochastic block model [20], mixture models for directed networks [27], multivariate latent variable models [16], latent feature models [17] and mixed membership stochastic block models for modeling overlapping communities [2]. From the algorithmic point of view, many model-based methods also lead to criteria to be optimized over all partitions, such as the profile likelihood under the assumed model. The large number of available methods leads to the question of how to compare them in a principled manner, other than on individual examples. There has been little theoretical analysis of community detection methods until very recently, when a consistency framework for community detection was introduced by Bickel and Chen [5].
They developed general theory for checking the consistency of detection criteria under the stochastic block model (discussed in detail below) as the number of nodes grows and the number of communities remains fixed, and their result has been generalized to allow the number of communities to grow in [7]; see also [32]. The stochastic block model, however, has serious limitations in practice: it treats all nodes within a community as stochastically equivalent, and thus does not allow for the existence of "hubs," high-degree nodes at the center of many communities observed in real data. To address this issue, Karrer and Newman [20] proposed the degree-corrected stochastic block model, which can accommodate hubs (a similar model for a directed network was previously proposed in [36], but they did not focus on community detection and assumed known community membership). In [20], the authors gave several examples showing this model fits data with hubs much better than the block model; however, there are no consistency results available under this new model, and thus no way to compare methods in general. In this paper we generalize the consistency framework of [5] to the degree-corrected stochastic block model and obtain a general theorem for community detection consistency. Since the degree-corrected model includes the regular block model as a special case, consistency results under the block model follow automatically. We then evaluate two types of modularity and the two criteria derived from the block model and the degree-corrected block model using this general framework.
One of our goals is to emphasize the difference between assumed models (needed for theoretical analysis) and criteria for finding the optimal partition, which may or may not be motivated by a particular model. What we ultimately show agrees with statistical common sense: criteria derived from a particular model are consistent when this model is assumed, but not necessarily consistent if the model does not hold. Further, if a criterion relies implicitly on an assumption about the model parameters (e.g., modularity implicitly assumes that links within communities are stronger than between), then it will be consistent only if the model parameters are constrained to satisfy this assumption. We make all of the above statements precise later in the paper.

The rest of the article is organized as follows. We set up all notation and define the relevant models and criteria in Section 2. Consistency results under the regular and the degree-corrected stochastic block models for all of the criteria in Section 2 are stated in Section 3. The general consistency theorem which implies all of these results is presented in Section 4. In Section 5 we compare the performance of these criteria on simulated networks, and in Section 6 we illustrate the methods on a network of political blogs. Section 7 concludes with a summary and discussion. All proofs are given in the Appendix.

2. Network models and community detection criteria. Before we proceed to discuss specific criteria and models, we introduce some basic notation. A network $N = (V, E)$, where $V$ is the set of nodes (vertices), $|V| = n$, and $E$ is the set of edges, can be represented by its $n \times n$ adjacency matrix $A = [A_{ij}]$, where $A_{ij} = 1$ if there is an edge from $i$ to $j$, and $A_{ij} = 0$ otherwise. We only consider unweighted and undirected networks here, and thus $A$ is a binary symmetric matrix.
The community detection problem can be formulated as finding a disjoint partition $V = V_1 \cup \cdots \cup V_K$ or, equivalently, a set of node labels $e = \{e_1, \ldots, e_n\}$, where $e_i$ is the label of node $i$ and takes values in $\{1, 2, \ldots, K\}$. For any set of label assignments $e$, let $O(e)$ be the $K \times K$ matrix defined by
$$O_{kl}(e) = \sum_{ij} A_{ij} I\{e_i = k, e_j = l\},$$
where $I$ is the indicator function. Further, let
$$O_k(e) = \sum_l O_{kl}(e), \qquad L = \sum_{ij} A_{ij}.$$
For $k \neq l$, $O_{kl}$ is the total number of edges between communities $k$ and $l$; $O_k$ is the sum of node degrees in community $k$, and $L$ is the sum of all degrees in the network. If self-loops are not allowed (i.e., $A_{ii} = 0$ is enforced), then we can also interpret $O_{kk}$ as twice the total number of edges within community $k$ and $L$ as twice the number of edges in the whole network. Finally, let $n_k(e) = \sum_i I\{e_i = k\}$ be the number of nodes in the $k$th community, and $f(e) = (n_1/n, n_2/n, \ldots, n_K/n)^T$.

The stochastic block model, which is perhaps the most commonly used model for networks with communities, postulates that, given node labels $c = \{c_1, \ldots, c_n\}$, the edge variables $A_{ij}$ are independent Bernoulli random variables with
$$E[A_{ij}] = P_{c_i c_j}, \tag{2.1}$$
where $P = [P_{ab}]$ is a $K \times K$ symmetric matrix. We will use this formulation throughout the paper, which allows for self-loops. While it is also common to exclude self-loops, sometimes they are present in the data (as in our example in Section 6) and allowing them leads to simpler notation. In principle, all of our results go through for the version of the models with self-loops excluded, with appropriate modifications made to the proofs.
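The block summaries $O(e)$, $O_k(e)$, $L$, $n_k(e)$ and $f(e)$ are matrix products of the adjacency matrix with a one-hot membership matrix. The NumPy sketch below is our own illustration, not code from the paper; `block_summaries` is a hypothetical name, and labels are 0-based ($0, \ldots, K-1$) rather than $1, \ldots, K$:

```python
import numpy as np

def block_summaries(A, e, K):
    """Return O(e), O_k(e), L, n_k(e) and f(e) for labels e in {0, ..., K-1}."""
    A = np.asarray(A)
    e = np.asarray(e)
    Z = np.eye(K)[e]      # one-hot membership, Z[i, k] = I{e_i = k}
    O = Z.T @ A @ Z       # O_kl = sum_ij A_ij I{e_i = k, e_j = l}
    O_k = O.sum(axis=1)   # sum of node degrees in community k
    L = A.sum()           # sum of all degrees
    n_k = Z.sum(axis=0)   # number of nodes in community k
    f = n_k / len(e)      # f(e) = (n_1/n, ..., n_K/n)^T
    return O, O_k, L, n_k, f

# Path 0-1-2-3 without self-loops, split into {0, 1} and {2, 3}.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
O, O_k, L, n_k, f = block_summaries(A, [0, 0, 1, 1], K=2)
# O equals [[2, 1], [1, 2]]: without self-loops each within-community edge is counted twice.
```

Computing $O$ as $Z^T A Z$ takes a single line. For large sparse networks one would instead accumulate the counts over the edge list.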
Under the model (2.1), all nodes with the same label are stochastically equivalent to each other, which in practice limits the applicability of the stochastic block model, as pointed out in [20]. The alternative proposed in [20], the degree-corrected stochastic block model, is to replace (2.1) with
$$E[A_{ij}] = \theta_i \theta_j P_{c_i c_j}, \tag{2.2}$$
where $\theta_i$ is a "degree parameter" associated with node $i$, reflecting its individual propensity to form ties. The degree parameters have to satisfy a constraint to be identifiable, which in [20] was set to $\sum_i \theta_i I(c_i = k) = 1$ for each $k$ (other constraints are possible). Further, they replaced the Bernoulli likelihood by the Poisson, to simplify technical derivations. With these assumptions, a profile likelihood can be derived by maximizing over $\theta$ and $P$, giving the following criterion to be optimized over all possible partitions:
$$Q_{\mathrm{DCBM}}(e) = \sum_{kl} O_{kl} \log \frac{O_{kl}}{O_k O_l}. \tag{2.3}$$
We have compared the performance of this criterion in practice to its slightly more complicated version based on the (correct) Bernoulli likelihood instead of the Poisson and found no difference in the solutions these two methods produce. The Bernoulli distribution with a small mean is well approximated by the Poisson distribution, and most real networks are sparse, so one can expect the approximation to work well; see also a more detailed discussion of this in [30]. We will use (2.3) in all further analysis, to be consistent with [20] and take advantage of the simpler form. The degree-corrected model includes the regular stochastic block model as a special case, with all $\theta_i$'s equal. Enforcing this additional constraint on the profile likelihood leads to the following criterion to be optimized over all partitions:
$$Q_{\mathrm{BM}}(e) = \sum_{kl} O_{kl} \log \frac{O_{kl}}{n_k n_l}. \tag{2.4}$$
Like criterion (2.3), this is based on the Poisson assumption but gives identical results to the Bernoulli version in practice. Here we use the form (2.4) for consistency with (2.3) and with [20].

A different type of criterion used for community detection is modularity, introduced in [26]; see also [23] and [24]. The basic idea of modularity is to compare the number of observed edges within a community to the number of expected edges under a null model and maximize this difference over all possible community partitions. Thus, the general form of a modularity criterion is
$$Q(e) = \sum_{ij} [A_{ij} - P_{ij}]\, I(e_i = e_j), \tag{2.5}$$
where $P_{ij}$ is the (estimated) probability of an edge falling between $i$ and $j$ under the null model. The convention in the physics literature is to divide $Q$ by $L$, which we omit here, since it does not change the solution. The choice of the null model, that is, of a model with no communities ($K = 1$), determines the exact form of modularity. The stochastic block model with $K = 1$ is simply the Erdős–Rényi random graph, where $P_{ij}$ is a constant which can be estimated by $L/n^2$. Plugging $P_{ij} = L/n^2$ into (2.5) gives what we will call the Erdős–Rényi modularity (ERM),
$$Q_{\mathrm{ERM}}(e) = \sum_k \left( O_{kk} - \frac{n_k^2}{n^2} L \right). \tag{2.6}$$
If instead we take the degree-corrected model with $K = 1$ as the null model, it postulates that $P_{ij} \propto \theta_i \theta_j$, where $\theta_i$ is the degree parameter. This is essentially the well-known expected degree random graph, also known as the configuration model. In this case, $P_{ij}$ can be estimated by $d_i d_j / L$, where $d_i = \sum_j A_{ij}$ is the degree of node $i$. Substituting this into (2.5) gives the popular Newman–Girvan modularity (NGM), introduced in [26]:
$$Q_{\mathrm{NGM}}(e) = \sum_k \left( O_{kk} - \frac{O_k^2}{L^2} L \right). \tag{2.7}$$

The four different criteria for community detection are summarized in Table 1. Note that the two likelihood-based criteria, BM and DCBM, take into account all links within and between communities, and which communities they connect; whereas the modularities would not change if all the links connecting different communities were randomly permuted (as long as they did not become links within communities). Further, note that the degree correction amounts to substituting $O_k$ for $n_k$ and $L$ for $n$, both for modularity and likelihood-based criteria. Thus, if all nodes within a community are treated as equivalent, their number suffices to weigh community strength appropriately; and if the nodes are allowed to have different expected degrees, then the number of edges becomes the correct weight. Both of these features make sense intuitively and, as we will see later, will fit in naturally with consistency conditions. Our analysis indicates that Newman–Girvan modularity and degree-corrected block model criteria are consistent under the more general degree-corrected models but Erdős–Rényi modularity and block model criteria are not, even though they are consistent under the regular block model. Further, we show that likelihood-based methods are consistent under their assumed model with no restrictions on parameters, whereas modularities are only consistent if the model parameters are constrained to satisfy a "stronger links within than between" condition, which is the basis of modularity derivations. In short, we show that a criterion is consistent when the underlying model and assumptions are correct, and not necessarily otherwise.
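All four criteria in Table 1 depend on the network only through $O(e)$, $n_k(e)$ and $L$. They are therefore cheap to evaluate once the block counts are available. The following minimal NumPy sketch is our own illustration. `table1_criteria` is a hypothetical name. It uses natural logarithms and the convention $0 \log 0 = 0$ for empty blocks; that convention is our assumption, since the text does not discuss empty blocks. Labels are 0-based:

```python
import numpy as np

def table1_criteria(A, e, K):
    """Evaluate ERM (2.6), NGM (2.7), BM (2.4) and DCBM (2.3) for labels e in {0, ..., K-1}."""
    A = np.asarray(A, dtype=float)
    e = np.asarray(e)
    n = len(e)
    Z = np.eye(K)[e]
    O = Z.T @ A @ Z                 # block counts O_kl
    O_k = O.sum(axis=1)             # community degree sums O_k
    L = O.sum()                     # equals sum_ij A_ij
    n_k = Z.sum(axis=0)             # community sizes n_k

    def sum_o_log_ratio(denom):
        # sum_kl O_kl log(O_kl / denom_kl), skipping blocks with O_kl = 0
        mask = O > 0
        return float(np.sum(O[mask] * np.log(O[mask] / denom[mask])))

    return {
        "ERM": float(np.sum(np.diag(O) - n_k**2 / n**2 * L)),
        "NGM": float(np.sum(np.diag(O) - O_k**2 / L**2 * L)),
        "BM": sum_o_log_ratio(np.outer(n_k, n_k)),
        "DCBM": sum_o_log_ratio(np.outer(O_k, O_k)),
    }

# Two disjoint 4-node cliques with self-loops: the true split versus an interleaved one.
A = np.kron(np.eye(2), np.ones((4, 4)))
true_split = table1_criteria(A, [0, 0, 0, 0, 1, 1, 1, 1], K=2)
mixed_split = table1_criteria(A, [0, 1, 0, 1, 0, 1, 0, 1], K=2)
```

On this toy graph every criterion is larger for `true_split` than for `mixed_split`. The comparison only illustrates the formulas; it says nothing about consistency.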
T able 1 Summary of c ommunity dete ction crite ria Block mo del Degree-corrected blo ck mo del Modu larit y P k ( O kk − n 2 k n 2 L ) (ERM) P k ( O kk − O 2 k L 2 L ) (NGM) Likel iho o d P kl O kl log O kl n k n l (BM) P kl O kl log O kl O k O l (DCBM) CONSISTENCY OF COMMUNITY DETECTION 7 3. Consistency of comm unit y det ection criteria. Here w e present all the consistency results for the four diﬀeren t criteria d eﬁned in S ection 2 . All these results follo w from the general consistency theorem in Section 4 ; th e pro ofs are giv en in the App end ix . The notion of consistency of comm unit y detection as the n umber of no des grows was in tro du ced in [ 5 ]. They d eﬁ ned a comm unit y detectio n criterion Q to b e consistent if the n o de lab els obtained b y maximizing the criterio n, ˆ c = arg max e Q ( e ), satisfy P [ ˆ c = c ] → 1 as n → ∞ . (3.1) Strictly sp eaking, this deﬁn ition suﬀers from an identiﬁabilit y problem, sin ce most reasonable criteria, including all the ones d iscussed ab ov e, are inv arian t under a p ermutation of c ommunit y la b els { 1 , . . . , K } . Thus, a b etter w a y to deﬁne consistency is to replace the equalit y ˆ c = c with the requiremen t that ˆ c and c b elong to the same equiv alence class of lab el p ermutat ions. F or simplicit y of notation, w e still write ˆ c = c in all consistency results in the rest of the pap er, but tak e them to mean that ˆ c and c are equal up to a p ermutatio n of lab els. The notion o f consistency in ( 3.1 ) is very strong, sin ce it requires asymp- totical ly no errors. One can also deﬁne wh at w e w ill call w eak consistency , ∀ ε > 0 P " 1 n n X i =1 1(ˆ c i 6 = c i ) ! < ε # → 1 as n → ∞ , (3.2) where equalit y is also interpreted to mean members hip in the same equiv- alence class with resp ect to lab el p ermutat ions. In [ 6 ], cond itions were es- tablished for a criterion to b e w eakly consistent under the sto chastic blo c k mo del. 
All other assumptions being equal, weak consistency only requires that the expected degree of the graph $\lambda_n \to \infty$, whereas strong consistency requires $\lambda_n / \log n \to \infty$. Here, we will analyze both strong and weak consistency under the degree-corrected stochastic block model.

For the asymptotic analysis, we use a slightly different formulation of the degree-corrected model than that given by [20]. The main difference is that we treat true community labels $c$ and degree parameters $\theta = (\theta_1, \ldots, \theta_n)$ as latent random variables rather than fixed parameters. Note, however, that the criteria we analyze were obtained as profile likelihoods with parameters treated as constants. This is one of the standard approaches to random effects models, known as conditional likelihood (see page 234 of [21]). The network model we use for consistency analysis can be described as follows:

(1) Each node is independently assigned a pair of latent variables $(c_i, \theta_i)$, where $c_i$ is the community label taking values in $\{1, \ldots, K\}$, and $\theta_i$ is a discrete "degree variable" taking values in $x_1 \le \cdots \le x_M$. We do not assume that $c_i$ is independent of $\theta_i$.
(2) The marginal distribution of $c$ is multinomial with parameter $\pi = (\pi_1, \ldots, \pi_K)^T$, and $\theta$ satisfies $E[\theta_i] = 1$ for identifiability.
(3) Given $c$ and $\theta$, the edges $A_{ij}$ are independent Bernoulli random variables with
$$E[A_{ij} \mid c, \theta] = \theta_i \theta_j P_{c_i c_j},$$
where $P = [P_{ab}]$ is a $K \times K$ symmetric matrix.

For simplicity, we allow self-loops in the network, that is, $E[A_{ii} \mid c, \theta] = \theta_i^2 P_{c_i c_i}$. Otherwise the diagonal terms of $A$ have to be treated separately, which ultimately makes no difference for the analysis but makes notation more awkward.
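Steps (1)–(3) translate directly into a sampler. The sketch below is our own illustration and uses NumPy's `Generator` API. `sample_dcbm` and `joint_pmf` are hypothetical names: `joint_pmf` is the $K \times M$ table of probabilities $P(c_i = a, \theta_i = x_u)$, and `x` holds the support $x_1 \le \cdots \le x_M$. Each pair $i \le j$, including $i = j$ (self-loops), receives exactly one Bernoulli draw, and the result is then symmetrized:

```python
import numpy as np

def sample_dcbm(n, joint_pmf, x, P, rng):
    """Draw labels c, degree variables theta and adjacency A from steps (1)-(3)."""
    joint_pmf = np.asarray(joint_pmf, dtype=float)   # K x M table of P(c_i = a, theta_i = x_u)
    x = np.asarray(x, dtype=float)                   # support x_1 <= ... <= x_M
    P = np.asarray(P, dtype=float)
    K, M = joint_pmf.shape
    assert np.isclose(joint_pmf.sum(axis=0) @ x, 1.0), "need E[theta_i] = 1"
    # (1) independent pairs (c_i, theta_i); the flattened cell index is a * M + u
    cells = rng.choice(K * M, size=n, p=joint_pmf.ravel())
    c, u = np.divmod(cells, M)
    theta = x[u]
    # (3) Bernoulli edges with E[A_ij | c, theta] = theta_i theta_j P_{c_i c_j}
    probs = np.outer(theta, theta) * P[np.ix_(c, c)]
    assert probs.max() <= 1.0, "edge probabilities must not exceed 1"
    upper = np.triu(rng.random((n, n)) < probs)      # one draw per pair i <= j
    A = (upper | upper.T).astype(int)                # symmetrize
    return c, theta, A

rng = np.random.default_rng(0)
c, theta, A = sample_dcbm(200, [[0.25, 0.25], [0.25, 0.25]], [0.4, 1.6],
                          [[0.1, 0.05], [0.05, 0.1]], rng)
```

The dense $n \times n$ draw keeps the sketch short. For large sparse networks one would sample edges block by block instead.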
To ensure that all probabilities are always less than 1, we require the model to satisfy the constraint $x_M^2 \max_{a,b} P_{ab} \le 1$. We also need to consider how the model changes with $n$. If $P_{ab}$ remains fixed as $n$ grows, the expected degree $\lambda_n$ will be proportional to $n$, which makes the network unrealistically dense. Instead, we allow the matrix $P$ to scale with $n$ and, in a slight abuse of notation, reparameterize it as $P_n = \rho_n P$, where $\rho_n = P(A_{ij} = 1) \to 0$ and $P$ is fixed. We then specify the rate of the expected degree $\lambda_n = n\rho_n$, which has to satisfy $\lambda_n / \log n \to \infty$ for strong consistency and $\lambda_n \to \infty$ for weak consistency.

Let $\Pi$ be the $K \times M$ matrix representing the joint distribution of $(c_i, \theta_i)$ with $P(c_i = a, \theta_i = x_u) = \Pi_{au}$. Further, define $\tilde\pi_a = \sum_u x_u \Pi_{au}$. Note that $\sum_a \tilde\pi_a = 1$ since $E(\theta_i) = 1$. Moreover, we have $\tilde\pi_a = \pi_a$ if $c$ and $\theta$ are independent, or if $\theta_i \equiv 1$ (block models). Thus, we can view $\tilde\pi$ as an adjusted version of $\pi$.

Next, we state our consistency results for the two types of modularities under both the degree-corrected and the standard block model.

Theorem 3.1. Under the degree-corrected stochastic block model, if the parameters satisfy
$$\tilde E_{aa} > 0, \qquad \tilde E_{ab} < 0 \quad \text{for all } a \neq b,$$
where
$$\tilde P_0 = \sum_{ab} \tilde\pi_a \tilde\pi_b P_{ab}, \qquad \tilde W_{ab} = \frac{\tilde\pi_a \tilde\pi_b P_{ab}}{\tilde P_0}, \qquad \tilde E = \tilde W - (\tilde W \mathbf{1})(\tilde W \mathbf{1})^T,$$
the Newman–Girvan modularity is strongly consistent when $\lambda_n / \log n \to \infty$ and weakly consistent when $\lambda_n \to \infty$.

The parameter constraints in Theorem 3.1 require, essentially, that the links within communities are more likely than the links between. This is particularly easy to see when $K = 2$, in which case the constraint simplifies to $P_{11} P_{22} > P_{12}^2$. Taking $\theta_i \equiv 1$, we immediately obtain the following.

Corollary 3.1 (Established in [5]).
Under the standard stochastic block model with parameters satisfying the Theorem 3.1 constraints with $\tilde\pi$ replaced by $\pi$, Newman–Girvan modularity is strongly consistent when $\lambda_n / \log n \to \infty$ and weakly consistent when $\lambda_n \to \infty$.

For Erdős–Rényi modularity, which has not been studied theoretically before, we can also show consistency under the standard block model, albeit with a slightly stronger condition on links within communities being more likely than the links between:

Theorem 3.2. Under the standard stochastic block model, if the parameters satisfy
$$P_{aa} > P_0, \qquad P_{ab} < P_0 \quad \text{for all } a \neq b,$$
where $P_0 = \sum_{ab} \pi_a \pi_b P_{ab}$, the Erdős–Rényi modularity criterion (2.6) is strongly consistent when $\lambda_n / \log n \to \infty$ and weakly consistent when $\lambda_n \to \infty$.

However, the Erdős–Rényi modularity is not consistent under the degree-corrected model, at least not under the same parameter constraint. The Erdős–Rényi modularity prefers to group nodes with similar degrees together, which may not agree with true communities when the variance in node degrees is large. Here is a counter-example demonstrating this. Let $K = 2$, $\pi = (1/2, 1/2)^T$, $\rho_n = 1$ (so that the graph becomes dense as $n \to \infty$), and
$$P = \begin{pmatrix} 0.1 & 0.05 \\ 0.05 & 0.1 \end{pmatrix}.$$
Further, $\theta$ is independent of $c$ and takes only two values, 1.6 and 0.4, with probability 1/2 each. If we assign all nodes their true labels, the population version of the criterion (where all random quantities are replaced by their expectations under the true model) gives $Q_{\mathrm{ERM}} = 0.0125$. However, by grouping nodes with the same value of $\theta_i$ together, we get the population version $Q_{\mathrm{ERM}} = 0.0135$, higher than the value for the true partition, and this solution will therefore be preferred in the limit.
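The constraints of Theorems 3.1 and 3.2 and the counter-example above can be checked numerically. The sketch below is our own illustration; `ngm_condition`, `erm_condition`, `pop_erm` and `pop_ngm` are hypothetical names.

For a partition described by $S_{kau}$, the limiting fraction of nodes placed in class $k$ with $c_i = a$ and $\theta_i = x_u$, we take $H_{kl} = \sum_{abuv} x_u x_v P_{ab} S_{kau} S_{lbv}$ as the limit of $O_{kl}/n^2$ and $h_k = \sum_{au} S_{kau}$ as the limit of $n_k/n$. With $\rho_n = 1$, dividing (2.6) by $n^2$ then gives $\sum_k (H_{kk} - h_k^2 \sum_{kl} H_{kl})$. Dividing (2.7) by $n^2$ gives $\sum_k (H_{kk} - H_k^2 / \sum_{kl} H_{kl})$, where $H_k = \sum_l H_{kl}$.

The counter-example's parameters satisfy both theorems' constraints. Yet the ERM population value favors grouping by degree, while the NGM population value favors the truth:

```python
import numpy as np

def ngm_condition(P, joint_pmf, x):
    """Theorem 3.1: E~_aa > 0 and E~_ab < 0 for all a != b."""
    P = np.asarray(P, dtype=float)
    pi_t = np.asarray(joint_pmf, dtype=float) @ np.asarray(x, dtype=float)  # pi~_a
    W = np.outer(pi_t, pi_t) * P
    W = W / W.sum()                          # divide by P~_0
    r = W.sum(axis=1)                        # W~ 1
    E = W - np.outer(r, r)
    off = ~np.eye(len(P), dtype=bool)
    return bool(np.all(np.diag(E) > 0) and np.all(E[off] < 0))

def erm_condition(P, pi):
    """Theorem 3.2: P_aa > P_0 and P_ab < P_0 for all a != b."""
    P = np.asarray(P, dtype=float)
    pi = np.asarray(pi, dtype=float)
    P0 = pi @ P @ pi
    off = ~np.eye(len(P), dtype=bool)
    return bool(np.all(np.diag(P) > P0) and np.all(P[off] < P0))

def population_H(S, P, x):
    # H_kl(S) = sum_{a,b,u,v} x_u x_v P_ab S_kau S_lbv
    return np.einsum("u,v,ab,kau,lbv->kl", x, x, P, S, S)

# Counter-example: K = 2, pi = (1/2, 1/2), theta in {0.4, 1.6} independent of c.
P = np.array([[0.1, 0.05], [0.05, 0.1]])
x = np.array([0.4, 1.6])
joint_pmf = np.full((2, 2), 0.25)            # P(c_i = a, theta_i = x_u)

S_true = np.zeros((2, 2, 2))                 # class k = true community a
S_degree = np.zeros((2, 2, 2))               # class k = nodes with theta_i = x_k
for a in range(2):
    for u in range(2):
        S_true[a, a, u] = joint_pmf[a, u]
        S_degree[u, a, u] = joint_pmf[a, u]

def pop_erm(S):
    H, h = population_H(S, P, x), S.sum(axis=(1, 2))
    return float(np.sum(np.diag(H) - h**2 * H.sum()))

def pop_ngm(S):
    H = population_H(S, P, x)
    return float(np.sum(np.diag(H) - H.sum(axis=1)**2 / H.sum()))

print(ngm_condition(P, joint_pmf, x), erm_condition(P, joint_pmf.sum(axis=1)))  # True True
print(pop_erm(S_true), pop_erm(S_degree))    # about 0.0125 and 0.0135
print(pop_ngm(S_true), pop_ngm(S_degree))    # about 0.0125 and 0.0
```

The two "population" functions only compare two hand-picked partitions. They do not search over all $S$.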
Once again, the result makes sense intuitively, since the Erdős–Rényi modularity uses the regular block model as its null hypothesis, and the parameter constraint matches the "fewer links between than within" notion. From the algorithmic point of view, the main difference between Erdős–Rényi modularity and Newman–Girvan modularity is that the latter depends on the edge matrix $O$ only and "weighs" communities by the number of edges, whereas the former weighs communities by the number of nodes $n_k$ (which, under the block model, is proportional to the number of edges, but under the degree-corrected model is not).

Next we state the consistency results for the two criteria derived from profile likelihoods, DCBM (2.3) and BM (2.4). These require no parameter constraints.

Theorem 3.3. Under the degree-corrected stochastic block model (and therefore under the regular model as well), the degree-corrected criterion (2.3) is strongly consistent when $\lambda_n / \log n \to \infty$ and weakly consistent when $\lambda_n \to \infty$.

Theorem 3.4. Under the stochastic block model, the block model criterion (2.4) is strongly consistent when $\lambda_n / \log n \to \infty$ and weakly consistent when $\lambda_n \to \infty$.

Theorem 3.4 was proved in [5] for a slightly different form of the profile likelihood (Bernoulli rather than Poisson). Under the degree-corrected block model, criterion (2.4) is not necessarily consistent; the same counter-example can be used to demonstrate this. As was the case with modularities, the criterion consistent under the degree-corrected block model depends on $O$ only, whereas the criterion consistent only under the regular block model also depends on $n_k$.
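The same counter-example separates the two likelihood criteria. Write $S_{kau}$ for the limiting fraction of nodes placed in class $k$ with $c_i = a$ and $\theta_i = x_u$. Let $H_{kl} = \sum_{abuv} x_u x_v P_{ab} S_{kau} S_{lbv}$ be the limit of $O_{kl}/n^2$, let $h_k = \sum_{au} S_{kau}$ be the limit of $n_k/n$, and let $H_k = \sum_l H_{kl}$.

Dividing (2.4) and (2.3) by $n^2$ (with $\rho_n = 1$) gives, up to a term that does not depend on the partition, $\sum_{kl} H_{kl} \log\{H_{kl}/(h_k h_l)\}$ for BM and $\sum_{kl} H_{kl} \log\{H_{kl}/(H_k H_l)\}$ for DCBM. This limiting step is our own derivation for illustration. In the sketch below, `pop_bm` and `pop_dcbm` are hypothetical names:

```python
import numpy as np

P = np.array([[0.1, 0.05], [0.05, 0.1]])
x = np.array([0.4, 1.6])
joint_pmf = np.full((2, 2), 0.25)          # c independent of theta, pi = (1/2, 1/2)

def population_H(S):
    # H_kl(S) = sum_{a,b,u,v} x_u x_v P_ab S_kau S_lbv
    return np.einsum("u,v,ab,kau,lbv->kl", x, x, P, S, S)

def pop_bm(S):
    H, h = population_H(S), S.sum(axis=(1, 2))
    return float(np.sum(H * np.log(H / np.outer(h, h))))

def pop_dcbm(S):
    H = population_H(S)
    Hk = H.sum(axis=1)
    return float(np.sum(H * np.log(H / np.outer(Hk, Hk))))

S_true = np.zeros((2, 2, 2))               # class k = true community
S_degree = np.zeros((2, 2, 2))             # class k = nodes with theta_i = x_k
for a in range(2):
    for u in range(2):
        S_true[a, a, u] = joint_pmf[a, u]
        S_degree[u, a, u] = joint_pmf[a, u]

print(pop_bm(S_true) < pop_bm(S_degree))       # True: BM prefers grouping by degree
print(pop_dcbm(S_true) > pop_dcbm(S_degree))   # True: DCBM prefers the true partition
```

Under the degree grouping, every ratio $H_{kl}/(H_k H_l)$ is the same, so that partition carries no community signal for DCBM. BM, by contrast, rewards it.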
The theoretical results suggest that the likelihood-based criteria are always preferable over the modularity-based criteria, and that criteria based on the degree-corrected model are always preferred to the criteria based on the regular block model, since they are consistent under weaker conditions. In practice, however, this may not always hold. Computationally, modularity-type criteria can be approximately optimized by solving an eigenvalue problem [24], whereas likelihood-type criteria have no such approximations and thus have to be optimized by slower heuristic search algorithms, as was done in [5] and [20]. Moreover, fitting the degree-corrected block model requires estimating many more parameters than fitting a block model and creates the usual trade-off between model complexity and goodness of fit. If the node degrees within communities do not vary widely, fitting a block model may provide a better solution; see more on this in Section 5.

4. A general theorem on consistency under degree-corrected stochastic block models. Here we prove a general theorem for checking consistency under degree-corrected stochastic block models for any criterion defined by a reasonably nice function. All consistency results for specific methods discussed in Section 3 are corollaries of this theorem.

A large class of community detection criteria can be written as
$$Q(e) = F\left( \frac{O(e)}{\mu_n}, f(e) \right), \tag{4.1}$$
where $\mu_n = n^2 \rho_n$. For instance, many graph cut methods (mincut, ratio cut [38], normalized cut [34]) have this form and use functions that are designed to minimize the number of edges between communities. All criteria discussed in Section 3 can also be written in this form.
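To make (4.1) concrete, the sketch below takes $M = O(e)/\mu_n$ and $t = f(e)$. The functions `F_erm`, `F_ngm`, `F_bm` and `F_dcbm` are our own hypothetical choices of $F$. The sketch checks numerically that ERM and NGM are exactly $\mu_n F(M, t)$. It also checks that BM and DCBM equal $\mu_n F(M, t)$ plus a multiple of $L$, which does not depend on $e$ and so leaves the maximizer unchanged. The random graph and the value of `rho` used to form $\mu_n$ are arbitrary choices made only for the illustration:

```python
import numpy as np

def F_ngm(M, t):
    return float(np.sum(np.diag(M) - M.sum(axis=1) ** 2 / M.sum()))

def F_erm(M, t):
    return float(np.sum(np.diag(M) - t ** 2 * M.sum()))

def F_bm(M, t):
    mask = M > 0
    return float(np.sum(M[mask] * np.log(M[mask] / np.outer(t, t)[mask])))

def F_dcbm(M, t):
    Mk = M.sum(axis=1)
    mask = M > 0
    return float(np.sum(M[mask] * np.log(M[mask] / np.outer(Mk, Mk)[mask])))

rng = np.random.default_rng(2)
n, K, rho = 40, 2, 0.2
upper = np.triu(rng.random((n, n)) < rho)
A = (upper | upper.T).astype(float)          # any symmetric 0/1 matrix will do
e = rng.integers(0, K, size=n)
Z = np.eye(K)[e]
O = Z.T @ A @ Z
O_k, L, n_k = O.sum(axis=1), O.sum(), Z.sum(axis=0)
mu = n**2 * rho                              # mu_n = n^2 rho_n
M, t = O / mu, n_k / n                       # the two arguments of F in (4.1)

# The criteria computed directly from (2.3), (2.4), (2.6) and (2.7).
Q_ngm = np.sum(np.diag(O) - O_k**2 / L)
Q_erm = np.sum(np.diag(O) - n_k**2 / n**2 * L)
mask = O > 0
Q_bm = np.sum(O[mask] * np.log(O[mask] / np.outer(n_k, n_k)[mask]))
Q_dcbm = np.sum(O[mask] * np.log(O[mask] / np.outer(O_k, O_k)[mask]))

print(np.isclose(Q_ngm / mu, F_ngm(M, t)), np.isclose(Q_erm / mu, F_erm(M, t)))
# The likelihood criteria differ from mu_n F only by terms proportional to L:
print(np.isclose(Q_bm / mu, F_bm(M, t) + M.sum() * np.log(rho)))
print(np.isclose(Q_dcbm / mu, F_dcbm(M, t) - M.sum() * np.log(mu)))
```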
Our goal here is to establish conditions for consistency of a criterion of this form under degree-corrected block models. A natural condition for consistency is that the "population version" of $Q(e)$ should be maximized by the correct community assignment, as in M-estimation. To define the population version of $Q$, we first define functions $H(S)$ and $h(S)$ corresponding to population versions of $O(e)$ and $f(e)$, respectively (the precise meaning of "population version" is clarified in Proposition 4.1 below). For any generic array $S = [S_{kau}] \in \mathbb{R}^{K \times K \times M}$, define a $K \times K$ matrix $H(S) = [H_{kl}(S)]$ by
$$H_{kl}(S) = \sum_{abuv} x_u x_v P_{ab} S_{kau} S_{lbv},$$
and a $K$-dimensional vector $h(S) = [h_k(S)]$ by
$$h_k(S) = \sum_{au} S_{kau}.$$
Also define $R(e) \in \mathbb{R}^{K \times K \times M}$ by
$$R_{kau}(e) = \frac{1}{n} \sum_{i=1}^n I(e_i = k, c_i = a, \theta_i = x_u).$$
Then we have the following:

Proposition 4.1.
$$\frac{1}{\mu_n} E[O_{kl} \mid c, \theta] = H_{kl}(R(e)), \tag{4.2}$$
$$f_k(e) = h_k(R(e)). \tag{4.3}$$

Proposition 4.1 explains the precise meaning of "population version": we take the conditional expectations given $c$ and $\theta$ and write them as functions of a generic variable $S$ instead of $R(e)$. The population version of $Q$ is defined as $F(H(S), h(S))$. Now we can specify the key sufficient condition as follows:

(∗) $F(H(S), h(S))$ is uniquely maximized over $\mathcal{S} = \{S : S \ge 0, \sum_k S_{kau} = \Pi_{au}\}$ by $S = D$, with $D_{kau} = \Pi_{au} E_{ka}$ for any $a$ and $u$, where $E$ is any row permutation of a $K \times K$ identity matrix.

The matrix $E$ deals with the permutation equivalence class. Since $R(c) \to D$ as $n \to \infty$, $S = D$ implies each class $k$ exactly matches a community in the population. For simplicity, in what follows we assume that $E$ is in fact the identity matrix itself. We will elaborate on this condition below.
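Proposition 4.1 is an exact algebraic identity, so it can be checked on any small example. The sketch below is our own illustration, with labels 0-based. `R_array`, `H_of` and `h_of` are hypothetical names, and `u` is a hypothetical array of indices into the support `x`, so that $\theta_i = x_{u_i}$. The sketch builds $R(e)$ and evaluates $H$ and $h$ with `numpy.einsum`. It then compares them with $\mu_n^{-1} E[O_{kl} \mid c, \theta] = n^{-2} \sum_{ij} \theta_i \theta_j P_{c_i c_j} I(e_i = k, e_j = l)$, which follows from $E[A_{ij} \mid c, \theta] = \rho_n \theta_i \theta_j P_{c_i c_j}$ and $\mu_n = n^2 \rho_n$:

```python
import numpy as np

def R_array(e, c, u, K, M):
    """R_kau(e) = (1/n) sum_i I(e_i = k, c_i = a, theta_i = x_u)."""
    R = np.zeros((K, K, M))
    np.add.at(R, (e, c, u), 1.0 / len(e))   # accumulates repeated index triples
    return R

def H_of(S, P, x):
    # H_kl(S) = sum_{a,b,u,v} x_u x_v P_ab S_kau S_lbv
    return np.einsum("u,v,ab,kau,lbv->kl", x, x, P, S, S)

def h_of(S):
    # h_k(S) = sum_{a,u} S_kau
    return S.sum(axis=(1, 2))

rng = np.random.default_rng(1)
n, K, M = 30, 2, 2
x = np.array([0.5, 1.5])
P = np.array([[0.3, 0.1], [0.1, 0.2]])
c = rng.integers(0, K, size=n)        # true labels
u = rng.integers(0, M, size=n)        # theta_i = x[u_i]
e = rng.integers(0, K, size=n)        # an arbitrary candidate labeling
theta = x[u]

R = R_array(e, c, u, K, M)
Z = np.eye(K)[e]
direct = Z.T @ (np.outer(theta, theta) * P[np.ix_(c, c)]) @ Z / n**2   # left side of (4.2)
print(np.allclose(direct, H_of(R, P, x)))                                # (4.2): True
print(np.allclose(np.bincount(e, minlength=K) / n, h_of(R)))             # (4.3): True
```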
In addition, we need some regularity conditions, analogous to those in [5]:

(a) $F$ is Lipschitz in its arguments;
(b) Let $W = H(D)$. The directional derivatives
$$\frac{\partial^2 F}{\partial \varepsilon^2}\big(M_0 + \varepsilon(M_1 - M_0),\, t_0 + \varepsilon(t_1 - t_0)\big)\Big|_{\varepsilon = 0+}$$
are continuous in $(M_1, t_1)$ for all $(M_0, t_0)$ in a neighborhood of $(W, \pi)$;
(c) Let $G(S) = F(H(S), h(S))$. Then on $\mathcal{S}$,
$$\frac{\partial G((1 - \varepsilon) D + \varepsilon S)}{\partial \varepsilon}\Big|_{\varepsilon = 0+} < -C < 0$$
for all $\pi$, $P$.

Now we are ready to state the main theorem.

Theorem 4.1. For any $Q(e)$ of the form (4.1), if $\pi$, $P$, $F$ satisfy (∗) and (a)–(c), then $Q$ is strongly consistent under degree-corrected stochastic block models if $\lambda_n / \log n \to \infty$ and weakly consistent if $\lambda_n \to \infty$.

The proof is given in the Appendix. This theorem is a generalization of Theorem 1 in [5] from the standard stochastic block models to degree-corrected models, and it implies all of the consistency results in Section 3.

Finally, we return to the key condition (∗). If $Q(e)$ is maximized by the true community labels $c$, then as $n \to \infty$, $F(H(S), h(S))$, the population version of $Q(e)$, should also be maximized by the true partition $S = D$, since $R(c) \to D$ and $Q(c) \to F(H(D), h(D))$, making (∗) a natural condition. Further, since for any $e$, $\sum_k R_{kau}(e) \to \Pi_{au}$, the limit $S$ of $R(e)$ must satisfy $\sum_k S_{kau} = \Pi_{au}$. Therefore, we only need to consider maximizers of $F(H(S), h(S))$ satisfying this constraint.

5. Numerical evaluation. In this section we compare the performance of the four community detection criteria from Section 2 on simulated data, generated from the regular or the degree-corrected block model. The criteria are maximized over partitions using a greedy label-switching algorithm called tabu search [4, 14].
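A simplified label-switching search in the tabu style can be sketched as follows. This is our own illustration, not the implementation of [4, 14]. At each iteration it applies the best single-node label change among nodes that are not tabu, even when that change lowers the criterion. It then forbids the moved node from switching again for `tenure` iterations, and it returns the best labeling visited. `tabu_search`, `q_ngm`, `n_iter` and `tenure` are hypothetical names. The criterion plugged in is Newman–Girvan modularity (2.7), and restarts over initial values and node orderings are left to the caller:

```python
import numpy as np

def q_ngm(A, e, K):
    """Newman-Girvan modularity (2.7) for labels e in {0, ..., K-1}."""
    Z = np.eye(K)[e]
    O = Z.T @ A @ Z
    O_k = O.sum(axis=1)
    L = O.sum()
    return float(np.sum(np.diag(O) - O_k**2 / L))

def tabu_search(A, K, criterion, init, n_iter=50, tenure=3):
    """Best-move label switching with a tabu list; returns the best labels seen."""
    A = np.asarray(A, dtype=float)
    e = np.array(init, dtype=int)
    n = len(e)
    tabu_until = np.zeros(n, dtype=int)     # node i is tabu while iteration < tabu_until[i]
    best_e, best_q = e.copy(), criterion(A, e, K)
    for it in range(n_iter):
        move, move_q = None, -np.inf
        for i in range(n):
            if tabu_until[i] > it:          # skip nodes that moved recently
                continue
            for k in range(K):
                if k == e[i]:
                    continue
                trial = e.copy()
                trial[i] = k
                q = criterion(A, trial, K)
                if q > move_q:              # best non-tabu move, even if it is worse
                    move, move_q = (i, k), q
        if move is None:                    # every node is currently tabu
            continue
        i, k = move
        e[i] = k
        tabu_until[i] = it + 1 + tenure
        if move_q > best_q:
            best_e, best_q = e.copy(), move_q
    return best_e, best_q

A = np.kron(np.eye(2), np.ones((4, 4)))    # two disjoint 4-cliques with self-loops
labels, q = tabu_search(A, 2, q_ngm, init=[0, 1, 0, 1, 0, 1, 0, 1])
```

Re-evaluating the criterion from scratch for every candidate move costs $O(n^2)$ per evaluation. That is acceptable for toy examples but not for $n = 1000$, where a practical search would update $O(e)$ incrementally after each single-label change.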
The key idea of tabu search is that once a node label has been switched, it becomes "tabu" and is not available for switching for a certain number of iterations, which prevents the search from being trapped in a local maximum. Even though tabu search cannot guarantee convergence to the global maximum, it performs well in practice. Moreover, we run the search from a number of initial values and with different orderings of nodes, to help avoid local maxima.

To compare the solution to the true labels, we use the adjusted Rand index [19], a measure of similarity between partitions commonly used in clustering. We have also computed the normalized mutual information, a measure more commonly used by physicists in the networks literature, which gives very similar results (not reported to save space). The adjusted Rand index is scaled so that 1 corresponds to a perfect match and 0 to the agreement expected between two random partitions, with higher values indicating better agreement. The figures in this section all present the median adjusted Rand index over 100 replications.

In all examples below, we generate networks with $n = 1000$ nodes and $K = 2$ communities. The node labels are generated independently with $P(c_i = 1) = \pi$, $P(c_i = 2) = 1 - \pi$. By varying $\pi$, we can investigate robustness of the methods to unbalanced community sizes. The probability matrix for the block model and the degree-corrected block model is set to
$$P = \rho\begin{pmatrix} 4 & 1 \\ 1 & 4 \end{pmatrix},$$
where we vary $\rho$ to obtain different expected degrees $\lambda$.

5.1. The degree-corrected stochastic block model. For this simulation, we generate data from the degree-corrected model with two possible values for the degree parameter $\theta$.
The degree parameters are generated independently from the labels, with $P(\theta_i = mx) = P(\theta_i = x) = 1/2$, which implies $x = 2/(m+1)$, since we need to have $E(\theta_i) = 1$. We vary the ratio $m$ from 1 (the regular block model) to 10, which allows us to study the effect of model misspecification on the regular block model. In this simulation, the community sizes are balanced ($\pi = 0.5$). Figure 1 shows the results for three different expected degrees $\lambda$. For the densest network with $\lambda = 125$ in Figure 1(a), the degree-corrected block model and Newman–Girvan modularity perform the best overall, as they assume the correct model and the methods are consistent. At $m = 1$, the regular block model is just as good, but its performance deteriorates rapidly as $m$ increases. The Erdos–Renyi modularity also performs perfectly for $m = 1$, and it takes larger values of $m$ for its performance to deteriorate than for the block model likelihood, so we can conclude that the Erdos–Renyi modularity is more robust to variation in degrees. For both of them, poor results are due to grouping nodes with similar degrees together. The overall trend for sparser networks [Figure 1(b) and (c)] is similar, but all methods perform worse, as with fewer links there is effectively less data to use for fitting the model, and the effect is more pronounced for large $m$, when degrees have higher variance.

[Fig. 1. Results for the degree-corrected stochastic block model with two values for the degree parameters, $\pi = 0.5$, $m$ varies.]

[Fig. 2. Results for the standard stochastic block model, $m = 1$, $\pi$ varies.]

5.2. The stochastic block model. Here we focus on the standard stochastic block model ($m = 1$) and vary $\pi$ to assess robustness to unbalanced community sizes.
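The simulation design of Sections 5.1 and 5.2 (two communities with $P(c_i = 1) = \pi$, $P = \rho\,[4, 1; 1, 4]$, and $\theta_i \in \{mx, x\}$ with $x = 2/(m+1)$, where $m = 1$ recovers the standard block model) can be sketched as a small sampler. This is an illustrative sketch under stated assumptions, not the authors' simulation code: it assumes edges are independent Bernoulli with probability $\theta_i\theta_j P_{c_i c_j}$ for $i < j$, no self-loops, labels coded 0/1, and the name `sample_dcsbm` is hypothetical.

```python
import numpy as np

def sample_dcsbm(n, pi, rho, m, seed=None):
    """Hypothetical sampler for the Section 5 design with K = 2 communities.

    Labels: P(first community) = pi (coded 0 here, 1 in the text).
    Degrees: theta_i = m*x or x with probability 1/2 each, x = 2/(m+1), so E(theta_i) = 1.
    Edges: A_ij ~ Bernoulli(theta_i theta_j P_{c_i c_j}) independently for i < j.
    m = 1 gives the standard stochastic block model.
    """
    rng = np.random.default_rng(seed)
    c = np.where(rng.random(n) < pi, 0, 1)
    x = 2.0 / (m + 1)
    theta = np.where(rng.random(n) < 0.5, m * x, x)
    P = rho * np.array([[4.0, 1.0], [1.0, 4.0]])
    prob = np.clip(np.outer(theta, theta) * P[np.ix_(c, c)], 0.0, 1.0)
    upper = np.triu(rng.random((n, n)) < prob, k=1)    # keep only pairs with i < j
    A = (upper | upper.T).astype(np.int8)              # symmetric, zero diagonal
    return A, c, theta

# Rough calibration (ours, not necessarily the paper's): with pi = 0.5 and
# E(theta_i) = 1, the expected degree is about 2.5 * (n - 1) * rho.
n, lam = 1000, 12
A, c, theta = sample_dcsbm(n, pi=0.5, rho=lam / (2.5 * (n - 1)), m=10, seed=1)
print(A.sum(axis=1).mean())    # random, but should be close to lam
```

The calibration comes from $E[\deg_i] = (n-1)\sum_{ab}\pi_a\pi_b P_{ab} = 2.5(n-1)\rho$ at $\pi = 0.5$, because $\boldsymbol\theta$ is independent of the labels and has mean 1; with $m = 10$ the largest edge probability stays well below 1, so the clipping is inactive here.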
All four criteria are consistent in this case, but, in practice, the closer $\pi$ is to 0.5, the better they perform (Figure 2), with the exception of the block model likelihood in the dense case ($\lambda = 125$), where it performs perfectly for all $\pi$. Overall, the block model likelihood performs best, which is natural because it is the maximum likelihood estimator of the correct model. The Erdos–Renyi modularity also performs better than the other two criteria, which overfit the data by assuming the degree-corrected model and accounting for variation in observed degrees, which in this case only adds noise.

5.3. Unbalanced community sizes. In this simulation we consider the degree-corrected stochastic block model with unbalanced community sizes. We fix $\pi = 0.3$ and vary the ratio $m$ in Figure 3. For a dense network [$\lambda = 125$, Figure 3(a)], the performance with $\pi = 0.3$ is similar to the balanced case with $\pi = 0.5$ [Figure 1(a)]. However, in sparser networks modularity performs much worse with unbalanced community sizes. This can also be seen in Figure 2 for the case $m = 1$. The failure of modularity to deal with unbalanced community sizes was also recently pointed out by [39]. Note also that in the sparsest case ($\lambda = 12$, Figure 3), the degree-corrected model suffers from over-fitting when $m = 1$, as was also seen in Figure 2.

[Fig. 3. Results for the degree-corrected stochastic block model with two values for the degree parameters, $\pi = 0.3$, $m$ varies.]

5.4. A different degree distribution. In the last simulation we test the sensitivity of all methods, but in particular the degree-corrected model, to the assumption of a discrete degree distribution. Here we sample the degree parameters $\theta_i$ independently from the following distribution:
$$\theta_i = \begin{cases} \eta_i, & \text{w.p. } \alpha, \\ 2/(m+1), & \text{w.p. } (1-\alpha)/2, \\ 2m/(m+1), & \text{w.p. } (1-\alpha)/2, \end{cases}$$
where $\eta_i$ is uniformly distributed on the interval $[0, 2]$. Since $E(\eta_i) = 1$, this still gives $E(\theta_i) = 1$, and the variance of $\theta_i$ is equal to $\alpha/3 + (1-\alpha)(m-1)^2/(m+1)^2$. In this simulation, we fix $m = 10$, which makes the variance a decreasing function of $\alpha$ (since $(m-1)^2/(m+1)^2 = 81/121 > 1/3$), and vary $\alpha$ from 0 to 1. We also fix $\pi = 0.5$. The results in Figure 4 show that the degree-corrected block model likelihood and Newman–Girvan modularity still perform well, which suggests that the discreteness of $\theta$ is not a crucial assumption. The regular block model fails in this case, as we would expect from earlier results since $m = 10$, but the performance of the Erdos–Renyi modularity improves as $\alpha$ increases, which agrees with our earlier observation on its relative robustness to variation in degrees.

[Fig. 4. Results for the degree-corrected stochastic block model with a mixture degree distribution, $m = 10$, $\pi = 0.5$, mixture parameter $\alpha$ varies.]

6. Example: The political blogs network

In this section we analyze a real network of political blogs compiled by [1]. The nodes of this network are blogs about US politics and the edges are hyperlinks between these blogs. The data were collected right after the 2004 presidential election and demonstrate strong divisions; each blog was manually labeled as liberal or conservative by [1], which we take as ground truth. Following the analysis in [20], we ignore directions of the hyperlinks and focus on the largest connected component of this network, which contains 1222 nodes and 16,714 edges, and has an average degree of approximately 27.

Table 2
Statistics of node degrees in the political blogs network

Mean     Median   Min     1st Qt.   3rd Qt.   Max
27.36    13.00    1.00    3.00      36.00     351.00
Some summary statistics of the node degrees are given in Table 2, which shows that the degree distribution is heavily skewed to the right.

We compare the partitions into two communities found by the four different community detection criteria with the true labels using the adjusted Rand index. The Newman–Girvan modularity and the degree-corrected model find very similar partitions (they differ over only four nodes and have the same adjusted Rand index value of 0.819, the highest of all methods). The partition found by the Erdos–Renyi modularity has slightly worse agreement with the truth (adjusted Rand index of 0.793). The block model likelihood divides the nodes into two groups of low degree and high degree, with an adjusted Rand index of nearly 0, which is equivalent to random guessing. The results are shown in Figure 5 (drawn using the igraph package in R [9] with the Fruchterman and Reingold layout [12]). These are consistent with what we observed in simulation studies: the Newman–Girvan modularity and the degree-corrected block model likelihood perform better in a network with high degree variation, and the Erdos–Renyi modularity is more robust to degree variation than the block model likelihood.

All criteria were maximized by tabu search, but for modularities we also computed the solutions based on the eigendecomposition of the modularity matrix. Both solutions were worse than those found by tabu search, but while for Newman–Girvan modularity the difference was slight (an adjusted Rand index of 0.781 instead of 0.819), eigendecomposition of the Erdos–Renyi modularity yielded a poor result similar to that of the block model likelihood (an adjusted Rand index of 0.092 instead of 0.819 by tabu search). This suggests that Erdos–Renyi modularity is numerically less stable under high degree variation, in addition to not being theoretically consistent. More analysis of the eigendecomposition-based solutions is needed for both types of modularities to understand conditions under which these approximations work well.

[Fig. 5. Political blogs data. Node area is proportional to the logarithm of its degree and the colors represent community labels.]

7. Summary and discussion

In this paper we developed a general tool for checking consistency of community detection criteria under the degree-corrected stochastic block model, a more general and practical model than the standard stochastic block model for which such theory was previously available [5]. This general tool allowed us to obtain consistency results for four different community detection criteria, and, to the best of our knowledge for the first time in the networks literature, to clearly separate the effects of the model assumed for criteria derivation from the model assumed true for analysis of the criteria. What we have shown is, essentially, statistical common sense: methods are consistent when the model they assume holds for the data. The parameter constraints are needed when methods implicitly rely on them, although we found that the two different modularity methods, while using the same constraint in spirit, require somewhat different conditions on parameters to be consistent.
The theoretical analysis agrees well with both the simulation studies and the data analysis, which also indicate that the methods with better theoretical consistency properties do not always perform best in practice: there is a cost associated with fitting the extra complexity of the degree-corrected model, and if there is not enough data for that, or the data do not have much variation in node degrees, simpler methods based on the standard stochastic block model will in fact do better.

There are many questions that require further investigation here, even in the context of model-based community detection when a model is assumed true. For example, we assumed that $K$ is known, which is not unreasonable in some cases (e.g., dividing political blogs into liberal and conservative), but is in general a difficult open problem in community detection. Standard methods such as AIC and BIC do not seem to lend themselves easily to this case, because parameters disappear in nonstandard ways when going from $K + 1$ to $K$ blocks. A permutation test was proposed in [40], but clearly more work is needed. There is also the question of what happens if $K$ is allowed to grow with $n$, which is probably more realistic than fixed $K$; for the stochastic block model, this case has been considered by [7] and [32], but their analysis is specific to the particular methods they considered and does not extend easily to the degree-corrected block model. Another open question concerns the properties of approximate but more easily computable solutions based on the eigendecomposition, as opposed to the properties of the global maximizers we studied here. For the stochastic block model, part of this analysis was performed in [32].
Our practical experience suggests that the behavior of eigenvectors can be quite complicated, and it is not understood at this point when this approximation works well. Finally, the sparse case $\lambda_n = O(1)$ is an open problem in general, although results for some special cases of the stochastic block model have been recently obtained [8, 10].

APPENDIX

We start by summarizing notation. Let $R(\mathbf{e}), V(\mathbf{e}) \in \mathbb{R}^{K\times K\times M}$, $\hat\Pi \in \mathbb{R}^{K\times M}$, $f(\mathbf{e}), f^0(\mathbf{e}) \in \mathbb{R}^{K}$, where
$$R_{kau}(\mathbf{e}) = \frac{1}{n}\sum_{i=1}^{n} I(e_i = k, c_i = a, \theta_i = x_u), \qquad V_{kau}(\mathbf{e}) = \frac{\sum_{i=1}^{n} I(e_i = k, c_i = a, \theta_i = x_u)}{\sum_{i=1}^{n} I(c_i = a, \theta_i = x_u)},$$
$$\hat\Pi_{au} = \frac{1}{n}\sum_{i=1}^{n} I(c_i = a, \theta_i = x_u), \qquad f_k(\mathbf{e}) = \frac{1}{n}\sum_{i=1}^{n} I(e_i = k) = \sum_{au} V_{kau}(\mathbf{e})\hat\Pi_{au}, \qquad f^0_k(\mathbf{e}) = \sum_{au} V_{kau}(\mathbf{e})\Pi_{au}.$$
Even though the arbitrary labeling $\mathbf{e}$ is not random, intuitively one can think of $R$ as the empirical joint distribution of $\mathbf{e}$, $\mathbf{c}$ and $\boldsymbol{\theta}$, and of $V$ as the conditional distribution of $\mathbf{e}$ given $\mathbf{c}$ and $\boldsymbol{\theta}$. Further, $\hat\Pi$ is the empirical joint distribution of $\mathbf{c}$ and $\boldsymbol{\theta}$, and thus an estimate of their true joint distribution $\Pi$; $f$ is the empirical marginal "distribution" of $\mathbf{e}$, and $f^0$ is the same marginal but with the empirical joint distribution $\hat\Pi$ replaced by its population version $\Pi$. Then $\sum_k V_{kau}(\mathbf{e}) = 1$, and $V_{kau}(\mathbf{c}) = I(k = a)$ for all $u$. Further, define $\hat T(\mathbf{e}) \in \mathbb{R}^{K\times K}$ to be a rescaled expectation of the matrix $O$ conditional on $\mathbf{c}$ and $\boldsymbol{\theta}$,
$$\hat T_{kl}(\mathbf{e}) = \frac{1}{\mu_n} E[O_{kl} \mid \mathbf{c}, \boldsymbol{\theta}].$$
From Proposition 4.1,
$$\hat T_{kl}(\mathbf{e}) = \sum_{abuv} x_u x_v P_{ab} R_{kau}(\mathbf{e}) R_{lbv}(\mathbf{e}) = \sum_{abuv} x_u x_v P_{ab} V_{kau}(\mathbf{e})\hat\Pi_{au} V_{lbv}(\mathbf{e})\hat\Pi_{bv}.$$
Replacing $\hat\Pi$ by its expectation $\Pi$, we define $T(\mathbf{e}) \in \mathbb{R}^{K\times K}$ by
$$T_{kl}(\mathbf{e}) = \sum_{abuv} x_u x_v P_{ab} V_{kau}(\mathbf{e})\Pi_{au} V_{lbv}(\mathbf{e})\Pi_{bv}.$$
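The empirical quantities defined so far can be computed directly from a labeling. The sketch below is illustrative only: labels and degree-value indices are coded from 0, empty $(a, u)$ cells are given $V = 0$ by our own convention (the definition leaves them undefined), $\Pi$ is a hypothetical uniform distribution, and the function name is ours. It checks the identities stated above, $\sum_k V_{kau}(\mathbf{e}) = 1$, $V_{kau}(\mathbf{c}) = I(k = a)$ and $f_k(\mathbf{e}) = \sum_{au} V_{kau}(\mathbf{e})\hat\Pi_{au}$, together with the bound $|f_k - f^0_k| \le \sum_{au}|\hat\Pi_{au} - \Pi_{au}|$ that the proof of Theorem 4.1 uses.

```python
import numpy as np

def appendix_quantities(e, c, u_idx, K, M, Pi):
    """Return R(e), V(e), Pi_hat, f(e), f0(e) as defined in the Appendix.

    Labels are coded 0..K-1 and theta_i = x_{u_idx[i]} with u_idx in 0..M-1.
    Pi is a (hypothetical) population joint distribution of (c, theta), shape (K, M).
    """
    n = len(e)
    R = np.zeros((K, K, M))
    np.add.at(R, (e, c, u_idx), 1.0 / n)                 # R_kau(e)
    Pi_hat = R.sum(axis=0)                               # empirical joint of (c, theta)
    # V_kau(e) = R_kau(e) / Pi_hat_au; empty (a, u) cells get V = 0 (our convention)
    V = np.divide(R, Pi_hat, out=np.zeros_like(R), where=Pi_hat > 0)
    f = np.einsum("kau,au->k", V, Pi_hat)                # f_k(e)
    f0 = np.einsum("kau,au->k", V, Pi)                   # f^0_k(e)
    return R, V, Pi_hat, f, f0

rng = np.random.default_rng(2)
n, K, M = 500, 2, 2
c = rng.integers(K, size=n)
u_idx = rng.integers(M, size=n)
e = rng.integers(K, size=n)
Pi = np.full((K, M), 1.0 / (K * M))                      # hypothetical uniform Pi

R, V, Pi_hat, f, f0 = appendix_quantities(e, c, u_idx, K, M, Pi)
print(np.allclose(V.sum(axis=0), 1.0))                   # sum_k V_kau(e) = 1
print(np.allclose(f, np.bincount(e, minlength=K) / n))   # f_k(e) = sum_au V_kau Pi_hat_au
_, Vc, _, _, _ = appendix_quantities(c, c, u_idx, K, M, Pi)
print(np.allclose(Vc, np.eye(K)[:, :, None]))            # V_kau(c) = I(k = a)
print(np.abs(f - f0).max() <= np.abs(Pi_hat - Pi).sum()) # |f_k - f0_k| <= sum |Pi_hat - Pi|
```

All four lines print `True` here; the last bound holds for any labeling because $0 \le V_{kau}(\mathbf{e}) \le 1$.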
Also define $X(\mathbf{e}) \in \mathbb{R}^{K\times K}$ to be the rescaled difference between $O$ and its conditional expectation,
$$X_{kl}(\mathbf{e}) = \frac{O_{kl}(\mathbf{e})}{\mu_n} - \hat T_{kl}(\mathbf{e}).$$
These quantities will be used in the proof of the general Theorem 4.1, where we first approximate $O_{kl}/\mu_n$ by $\hat T_{kl}(\mathbf{e})$ and then approximate $\hat T_{kl}(\mathbf{e})$ by $T_{kl}(\mathbf{e})$.

Proof of Proposition 4.1. We only prove (4.2), since (4.3) is trivial.
$$\frac{1}{\mu_n} E[O_{kl} \mid \mathbf{c}, \boldsymbol{\theta}] = \frac{1}{\mu_n}\sum_{ij}\sum_{abuv} E\bigl[A_{ij} I(e_i = k, c_i = a, \theta_i = x_u) I(e_j = l, c_j = b, \theta_j = x_v) \mid \mathbf{c}, \boldsymbol{\theta}\bigr] = \sum_{abuv} x_u x_v P_{ab} R_{kau}(\mathbf{e}) R_{lbv}(\mathbf{e}) = H_{kl}(R(\mathbf{e})).$$
∎

Before we proceed to the general theorem, we state a lemma based on Bernstein's inequality.

Lemma A.1. Let $\|X\|_\infty = \max_{kl}|X_{kl}|$ and $|\mathbf{e} - \mathbf{c}| = \sum_{i=1}^{n} I(e_i \ne c_i)$. Then
$$P\Bigl(\max_{\mathbf{e}} \|X(\mathbf{e})\|_\infty \ge \varepsilon\Bigr) \le 2K^{n+2}\exp\Bigl(-\frac{\varepsilon^2\mu_n}{8C}\Bigr) \quad \text{for } \varepsilon < 3C, \tag{A.1}$$
where $C = \max\{x_u x_v P_{ab}\}$;
$$P\Bigl(\max_{|\mathbf{e}-\mathbf{c}| \le m} \|X(\mathbf{e}) - X(\mathbf{c})\|_\infty \ge \varepsilon\Bigr) \le 2\binom{n}{m} K^{m+2}\exp\Bigl(-\frac{3}{8}\varepsilon\mu_n\Bigr) \quad \text{for } \varepsilon \ge 6Cm/n; \tag{A.2}$$
$$P\Bigl(\max_{|\mathbf{e}-\mathbf{c}| \le m} \|X(\mathbf{e}) - X(\mathbf{c})\|_\infty \ge \varepsilon\Bigr) \le 2\binom{n}{m} K^{m+2}\exp\Bigl(-\frac{n\varepsilon^2\mu_n}{16mC}\Bigr) \quad \text{for } \varepsilon < 6Cm/n. \tag{A.3}$$

This lemma is similar to Lemma 1.1 of [5], with a few minor errors corrected. The proof can be found in the electronic supplement to this article [41].

Proof of Theorem 4.1. The proof is divided into three steps.

Step 1: show that $F(O(\mathbf{e})/\mu_n, f(\mathbf{e}))$ is uniformly close to its population version. More precisely, we need to prove that there exists $\varepsilon_n \to 0$ such that
$$P\Bigl(\max_{\mathbf{e}}\Bigl|F\Bigl(\frac{O(\mathbf{e})}{\mu_n}, f(\mathbf{e})\Bigr) - F(T(\mathbf{e}), f^0(\mathbf{e}))\Bigr| < \varepsilon_n\Bigr) \to 1 \quad \text{if } \lambda_n \to \infty. \tag{A.4}$$
Since
$$\Bigl|F\Bigl(\frac{O(\mathbf{e})}{\mu_n}, f(\mathbf{e})\Bigr) - F(T(\mathbf{e}), f^0(\mathbf{e}))\Bigr| \le \Bigl|F\Bigl(\frac{O(\mathbf{e})}{\mu_n}, f(\mathbf{e})\Bigr) - F(\hat T(\mathbf{e}), f(\mathbf{e}))\Bigr| + \bigl|F(\hat T(\mathbf{e}), f(\mathbf{e})) - F(T(\mathbf{e}), f^0(\mathbf{e}))\bigr|,$$
it is sufficient to bound these two terms uniformly.
By Lipschitz continuity,
$$\Bigl|F\Bigl(\frac{O(\mathbf{e})}{\mu_n}, f(\mathbf{e})\Bigr) - F(\hat T(\mathbf{e}), f(\mathbf{e}))\Bigr| \le M_1\|X(\mathbf{e})\|_\infty. \tag{A.5}$$
By (A.1), (A.5) converges to 0 uniformly if $\lambda_n \to \infty$, and
$$\bigl|F(\hat T(\mathbf{e}), f(\mathbf{e})) - F(T(\mathbf{e}), f^0(\mathbf{e}))\bigr| \le M_1\|\hat T(\mathbf{e}) - T(\mathbf{e})\|_\infty + M_2\|f(\mathbf{e}) - f^0(\mathbf{e})\|, \tag{A.6}$$
where $\|\cdot\|$ is the Euclidean norm for vectors. Further,
$$|\hat T_{kl}(\mathbf{e}) - T_{kl}(\mathbf{e})| = \Bigl|\sum_{abuv} x_u x_v P_{ab} V_{kau}(\mathbf{e}) V_{lbv}(\mathbf{e})(\hat\Pi_{au}\hat\Pi_{bv} - \Pi_{au}\Pi_{bv})\Bigr| \le \sum_{abuv} x_u x_v P_{ab}|\hat\Pi_{au}\hat\Pi_{bv} - \Pi_{au}\Pi_{bv}|, \tag{A.7}$$
and
$$|f_k(\mathbf{e}) - f^0_k(\mathbf{e})| = \Bigl|\sum_{au} V_{kau}(\mathbf{e})(\hat\Pi_{au} - \Pi_{au})\Bigr| \le \sum_{au}|\hat\Pi_{au} - \Pi_{au}|. \tag{A.8}$$
Since $\hat\Pi \xrightarrow{P} \Pi$, (A.6) converges to 0 uniformly. Thus, (A.4) holds.

Step 2: prove that there exists $\delta_n \to 0$ such that
$$P\Bigl(\max_{\{\mathbf{e}: \|V(\mathbf{e}) - I\|_1 \ge \delta_n\}} F\Bigl(\frac{O(\mathbf{e})}{\mu_n}, f(\mathbf{e})\Bigr) < F\Bigl(\frac{O(\mathbf{c})}{\mu_n}, f(\mathbf{c})\Bigr)\Bigr) \to 1, \tag{A.9}$$
where $\|W\|_1 = \sum_{kau}|W_{kau}|$ for $W \in \mathbb{R}^{K\times K\times M}$.

By continuity and (∗), there exists $\delta_n \to 0$ such that $F(T(\mathbf{c}), f^0(\mathbf{c})) - F(T(\mathbf{e}), f^0(\mathbf{e})) > 2\varepsilon_n$ if $\|V(\mathbf{e}) - I\|_1 \ge \delta_n$, where $I = V(\mathbf{c})$. Thus, from (A.4),
$$\begin{aligned} &P\Bigl(\max_{\{\mathbf{e}: \|V(\mathbf{e}) - I\|_1 \ge \delta_n\}} F\Bigl(\frac{O(\mathbf{e})}{\mu_n}, f(\mathbf{e})\Bigr) < F\Bigl(\frac{O(\mathbf{c})}{\mu_n}, f(\mathbf{c})\Bigr)\Bigr) \\ &\quad \ge P\Bigl(\Bigl|\max_{\{\mathbf{e}: \|V(\mathbf{e}) - I\|_1 \ge \delta_n\}} F\Bigl(\frac{O(\mathbf{e})}{\mu_n}, f(\mathbf{e})\Bigr) - \max_{\{\mathbf{e}: \|V(\mathbf{e}) - I\|_1 \ge \delta_n\}} F(T(\mathbf{e}), f^0(\mathbf{e}))\Bigr| < \varepsilon_n, \\ &\qquad\quad \Bigl|F\Bigl(\frac{O(\mathbf{c})}{\mu_n}, f(\mathbf{c})\Bigr) - F(T(\mathbf{c}), f^0(\mathbf{c}))\Bigr| < \varepsilon_n\Bigr) \to 1. \end{aligned}$$
(A.9) implies $P(\|V(\hat{\mathbf{c}}) - I\|_1 < \delta_n) \to 1$. Since
$$\frac{1}{n}|\mathbf{e} - \mathbf{c}| = \frac{1}{n}\sum_{i=1}^{n} I(c_i \ne e_i) = \sum_{au}\hat\Pi_{au}(1 - V_{aau}(\mathbf{e})) \le \sum_{au}(1 - V_{aau}(\mathbf{e})) = \frac{1}{2}\Bigl(\sum_{au}(1 - V_{aau}(\mathbf{e})) + \sum_{au}\sum_{k \ne a} V_{kau}(\mathbf{e})\Bigr) = \frac{1}{2}\|V(\mathbf{e}) - I\|_1,$$
weak consistency follows.

Step 3: in order to prove strong consistency, we need to show that
$$P\Bigl(\max_{\{\mathbf{e}: 0 < \|V(\mathbf{e}) - I\|_1 < \delta_n\}} F\Bigl(\frac{O(\mathbf{e})}{\mu_n}, f(\mathbf{e})\Bigr) < F\Bigl(\frac{O(\mathbf{c})}{\mu_n}, f(\mathbf{c})\Bigr)\Bigr) \to 1. \tag{A.10}$$
Note that combining (A.9) and (A.10), we have
$$P\Bigl(\max_{\{\mathbf{e}: \mathbf{e} \ne \mathbf{c}\}} F\Bigl(\frac{O(\mathbf{e})}{\mu_n}, f(\mathbf{e})\Bigr) < F\Bigl(\frac{O(\mathbf{c})}{\mu_n}, f(\mathbf{c})\Bigr)\Bigr) \to 1,$$
which implies strong consistency. Here we closely follow the derivation given in [3]. To prove (A.10), note that by Lipschitz continuity and the continuity of derivatives of $F$ with respect to $V(\mathbf{e})$ in the neighborhood of $I$, we have
$$F\Bigl(\frac{O(\mathbf{e})}{\mu_n}, f(\mathbf{e})\Bigr) - F\Bigl(\frac{O(\mathbf{c})}{\mu_n}, f(\mathbf{c})\Bigr) = F(\hat T(\mathbf{e}), f(\mathbf{e})) - F(\hat T(\mathbf{c}), f(\mathbf{c})) + \Delta(\mathbf{e}, \mathbf{c}), \tag{A.11}$$
where $|\Delta(\mathbf{e}, \mathbf{c})| \le M'\|X(\mathbf{e}) - X(\mathbf{c})\|_\infty$, and
$$F(T(\mathbf{e}), f^0(\mathbf{e})) - F(T(\mathbf{c}), f^0(\mathbf{c})) \le -C'\|V(\mathbf{e}) - I\|_1 + o(\|V(\mathbf{e}) - I\|_1). \tag{A.12}$$
Since the derivative of $F$ is continuous with respect to $V(\mathbf{e})$ in the neighborhood of $I$, there exists a $\delta'$ such that
$$F(\hat T(\mathbf{e}), f(\mathbf{e})) - F(\hat T(\mathbf{c}), f(\mathbf{c})) \le -(C'/2)\|V(\mathbf{e}) - I\|_1 + o(\|V(\mathbf{e}) - I\|_1) \tag{A.13}$$
holds when $\|\hat\Pi - \Pi\|_\infty \le \delta'$. Since $\hat\Pi \to \Pi$, (A.13) holds with probability approaching 1. Combining (A.11) and (A.13), it is easy to see that (A.10) follows if we can show
$$P\bigl(|\Delta(\mathbf{e}, \mathbf{c})| \le C'\|V(\mathbf{e}) - I\|_1/4 \text{ for all } \mathbf{e} \ne \mathbf{c}\bigr) \to 1. \tag{A.14}$$
Again note that $\frac{1}{n}|\mathbf{e} - \mathbf{c}| \le \frac{1}{2}\|V(\mathbf{e}) - I\|_1$. So for each $m \ge 1$,
$$P\bigl(|\Delta(\mathbf{e}, \mathbf{c})| > C'\|V(\mathbf{e}) - I\|_1/4 \text{ for some } \mathbf{e} \text{ with } |\mathbf{e} - \mathbf{c}| = m\bigr) \le P\Bigl(\max_{|\mathbf{e}-\mathbf{c}| \le m}\|X(\mathbf{e}) - X(\mathbf{c})\|_\infty > \frac{C'm}{2M'n}\Bigr) = I_1. \tag{A.15}$$
Let $\alpha = C'/(2M')$. If $\alpha \ge 6C$, by (A.2) and $\binom{n}{m} \le n^m$,
$$I_1 \le 2K^{m+2}n^m\exp\Bigl(-\frac{3\alpha m}{8n}\mu_n\Bigr) = 2K^2\Bigl[K\exp\Bigl(\log n - \frac{3\alpha\mu_n}{8n}\Bigr)\Bigr]^m.$$
If $\alpha < 6C$, by (A.3),
$$I_1 \le 2K^{m+2}n^m\exp\Bigl(-\frac{\alpha^2 m}{16Cn}\mu_n\Bigr) = 2K^2\Bigl[K\exp\Bigl(\log n - \frac{\alpha^2\mu_n}{16Cn}\Bigr)\Bigr]^m.$$
In both cases, since $\mu_n/n = n\rho_n = \lambda_n$ and $\lambda_n/\log n \to \infty$, the bracketed term tends to 0, so by the union bound
$$P\bigl(|\Delta(\mathbf{e}, \mathbf{c})| > C'\|V(\mathbf{e}) - I\|_1/4 \text{ for some } \mathbf{e} \ne \mathbf{c}\bigr) \le \sum_{m=1}^{\infty} P\bigl(|\Delta(\mathbf{e}, \mathbf{c})| > C'\|V(\mathbf{e}) - I\|_1/4 \text{ for some } \mathbf{e} \text{ with } |\mathbf{e} - \mathbf{c}| = m\bigr) \to 0$$
as $n \to \infty$, which completes the proof. ∎
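Two steps of this proof lend themselves to quick numerical checks: the inequality $\frac{1}{n}|\mathbf{e} - \mathbf{c}| \le \frac{1}{2}\|V(\mathbf{e}) - I\|_1$ that turns (A.9) into weak consistency, and the geometric series that closes Step 3, $\sum_{m \ge 1} 2K^2 r^m = 2K^2 r/(1 - r)$ with $r = K\exp(\log n - c\,\lambda_n)$ when $r < 1$. The sketch below is a hedged illustration, not part of the proof: the constant `c_const` stands in for $3\alpha/8$ or $\alpha^2/(16C)$ and its value 0.1 is purely hypothetical, the growth rates $\lambda_n = (\log n)^2$ and $\lambda_n = \log n$ are our own choices, and labels are coded from 0.

```python
import math
import numpy as np

def misclassified_vs_V(e, c, u_idx, K, M):
    """Return ((1/n)|e - c|, (1/2)||V(e) - I||_1), with I = V(c) and labels coded 0..K-1."""
    n = len(e)
    R = np.zeros((K, K, M))
    np.add.at(R, (e, c, u_idx), 1.0 / n)
    Pi_hat = R.sum(axis=0)
    V = np.divide(R, Pi_hat, out=np.zeros_like(R), where=Pi_hat > 0)
    I = np.broadcast_to(np.eye(K)[:, :, None], V.shape)     # I_kau = 1 if k == a
    return np.mean(e != c), 0.5 * np.abs(V - I).sum()

def tail_bound(n, K, c_const, lam):
    """sum_{m>=1} 2 K^2 r^m with r = K exp(log n - c_const * lam); inf when r >= 1."""
    r = K * math.exp(math.log(n) - c_const * lam)
    return math.inf if r >= 1 else 2 * K**2 * r / (1 - r)

rng = np.random.default_rng(3)
n, K, M = 300, 3, 2
c = rng.integers(K, size=n)
u_idx = rng.integers(M, size=n)
e = np.where(rng.random(n) < 0.2, rng.integers(K, size=n), c)   # reassign ~20% of labels at random
lhs, rhs = misclassified_vs_V(e, c, u_idx, K, M)
print(lhs <= rhs)                                                # True

# c_const = 0.1 is a hypothetical stand-in for 3*alpha/8 or alpha^2/(16C).
for n_ in (1e6, 1e8):
    print(n_, tail_bound(n_, 2, 0.1, math.log(n_) ** 2))       # lam = (log n)^2: bound shrinks
print(tail_bound(1e6, 2, 0.1, math.log(1e6)))                   # lam = log n: inf (bound useless)
```

The contrast between the last two outputs mirrors the theorem: when $\lambda_n/\log n \to \infty$ the series bound vanishes, whereas $\lambda_n$ of order $\log n$ with a small constant leaves $r \ge 1$ and the argument gives nothing.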
Proof of Theorem 3.2. The regularity conditions are easy to verify. To check the key condition (∗), note that under the block model assumption, (∗) becomes

(∗∗) $F(H(S), h(S))$ is uniquely maximized over $\mathcal{S} = \{S : S \ge 0, \sum_k S_{ka} = \pi_a\}$ by $S = D$, with $D = \operatorname{diag}(\pi)$,

where $S$ is a generic $K \times K$ matrix. Up to a constant, the population version of $Q_{\mathrm{ERM}}$ is
$$F(H(S), h(S)) = \sum_k (H_{kk} - h_k^2 P_0).$$
Using the identity
$$\sum_k (H_{kk} - h_k^2 P_0) + \sum_{k \ne l}(H_{kl} - h_k h_l P_0) = \sum_{kl} H_{kl} - \Bigl(\sum_k h_k\Bigr)^2 P_0 = 0,$$
and defining
$$\Delta_{kl} = \begin{cases} 1, & \text{if } k = l, \\ -1, & \text{if } k \ne l, \end{cases}$$
we have
$$\begin{aligned} F(H(S), h(S)) &= \frac{1}{2}\sum_{kl}\Delta_{kl}(H_{kl} - h_k h_l P_0) = \frac{1}{2}\sum_{kl}\Delta_{kl}\Bigl(\sum_{ab} S_{ka}S_{lb}P_{ab} - \sum_{ab} S_{ka}S_{lb}P_0\Bigr) \\ &= \frac{1}{2}\sum_{kl}\sum_{ab} S_{ka}S_{lb}\Delta_{kl}(P_{ab} - P_0) \le \frac{1}{2}\sum_{kl}\sum_{ab} S_{ka}S_{lb}\Delta_{ab}(P_{ab} - P_0) \\ &= \frac{1}{2}\sum_{ab}\Delta_{ab}\pi_a\pi_b(P_{ab} - P_0) = F(H(D), h(D)). \end{aligned}$$
Now it remains to show that the diagonal matrix $D$ (up to a permutation) is the unique maximizer of $F$. This follows from Lemma 3.2 in [5], since equality holds only if $\Delta_{kl} = \Delta_{ab}$ when $S_{ka}S_{lb} > 0$, and $\Delta$ does not have two identical columns. ∎

Proof of Theorem 3.1. The consistency of Newman–Girvan modularity under the block model has already been shown in [5]. To extend this result to the degree-corrected block model, define $\tilde S_{ka} = \sum_u x_u S_{kau}$. Then
$$\tilde\pi_a = \sum_k \tilde S_{ka}, \qquad H_{kl} = \sum_{abuv} x_u x_v P_{ab} S_{kau} S_{lbv} = \sum_{ab}\tilde S_{ka}\tilde S_{lb}P_{ab}, \qquad H_k = \sum_l H_{kl} = \sum_{as}\tilde S_{ka}\tilde\pi_s P_{as}.$$
The population version of $Q_{\mathrm{NGM}}$ is
$$F(H(S)) = \sum_k\Bigl(\frac{H_{kk}}{\tilde P_0} - \Bigl(\frac{H_k}{\tilde P_0}\Bigr)^2\Bigr).$$
Using the identity
$$\sum_k\Bigl(\frac{H_{kk}}{\tilde P_0} - \Bigl(\frac{H_k}{\tilde P_0}\Bigr)^2\Bigr) + \sum_{k \ne l}\Bigl(\frac{H_{kl}}{\tilde P_0} - \frac{H_k H_l}{\tilde P_0^2}\Bigr) = \sum_{kl}\frac{H_{kl}}{\tilde P_0} - \Bigl(\sum_k\frac{H_k}{\tilde P_0}\Bigr)^2 = 0,$$
we obtain
$$\begin{aligned} F(H(S)) &= \frac{1}{2}\sum_{kl}\Delta_{kl}\Bigl(\frac{\sum_{ab}\tilde S_{ka}\tilde S_{lb}P_{ab}}{\tilde P_0} - \frac{(\sum_{as}\tilde S_{ka}\tilde\pi_s P_{as})(\sum_{bt}\tilde S_{lb}\tilde\pi_t P_{bt})}{\tilde P_0^2}\Bigr) \\ &= \frac{1}{2}\sum_{kl}\sum_{ab}\tilde S_{ka}\tilde S_{lb}\Delta_{kl}\Bigl(\frac{P_{ab}}{\tilde P_0} - \frac{(\sum_s\tilde\pi_s P_{as})(\sum_t\tilde\pi_t P_{bt})}{\tilde P_0^2}\Bigr) \\ &\le \frac{1}{2}\sum_{kl}\sum_{ab}\tilde S_{ka}\tilde S_{lb}\Delta_{ab}\Bigl(\frac{P_{ab}}{\tilde P_0} - \frac{(\sum_s\tilde\pi_s P_{as})(\sum_t\tilde\pi_t P_{bt})}{\tilde P_0^2}\Bigr) \\ &= \frac{1}{2}\sum_{ab}\Delta_{ab}\tilde\pi_a\tilde\pi_b\Bigl(\frac{P_{ab}}{\tilde P_0} - \frac{(\sum_s\tilde\pi_s P_{as})(\sum_t\tilde\pi_t P_{bt})}{\tilde P_0^2}\Bigr) = F(H(D)). \end{aligned}$$
As in Theorem 3.2, $D$ is the unique maximizer of $F(H(\tilde S))$, so to prove uniqueness it is enough to show that $S = D$ whenever $\tilde S = D$. $\tilde S = D$ implies $\tilde S_{ka} = 0$ if $k \ne a$. Since $x_u > 0$, we obtain $S_{kau} = 0$ if $k \ne a$, which gives the result. We note that this argument cannot be applied to prove the consistency of Erdos–Renyi modularity under degree-corrected block models, because in that case
$$h_k = \sum_{au} S_{kau} \ne \sum_a\Bigl(\sum_u x_u S_{kau}\Bigr) = \sum_a\tilde S_{ka}$$
when we use the transformation $\tilde S_{ka} = \sum_u x_u S_{kau}$. ∎

Proof of Theorem 3.4. Up to a constant, the population version of $Q_{\mathrm{BL}}$ is
$$F(H(S), h(S)) = \sum_{kl}\Bigl(H_{kl}\log\frac{H_{kl}}{h_k h_l} - H_{kl}\Bigr).$$
Let $g_{kl} = H_{kl}/(h_k h_l)$. Then
$$\begin{aligned} F(H(S), h(S)) &= \sum_{kl}(H_{kl}\log g_{kl} - h_k h_l g_{kl}) = \sum_{abkl} S_{ka}S_{lb}(P_{ab}\log g_{kl} - g_{kl}) \\ &\le \sum_{ab}\sum_{kl} S_{ka}S_{lb}(P_{ab}\log P_{ab} - P_{ab}) = \sum_{ab}(\pi_a\pi_b P_{ab}\log P_{ab} - \pi_a\pi_b P_{ab}) = F(H(D), h(D)). \end{aligned}$$
Since equality holds if and only if $g_{kl} = P_{ab}$ whenever $S_{ka}S_{lb} > 0$, uniqueness follows from Lemma A.2, stated next. ∎

Lemma A.2. Let $g$, $P$, $S$ be $K \times K$ matrices with nonnegative entries.
Assume that:
(a) $P$ and $g$ are symmetric;
(b) $P$ does not have two identical columns;
(c) there exists at least one nonzero entry in each column of $S$;
(d) for $1 \le k, l, a, b \le K$, $g_{kl} = P_{ab}$ whenever $S_{ka}S_{lb} > 0$.
Then $S$ is a diagonal matrix or a row/column permutation of a diagonal matrix.

This lemma is a generalization of Lemma 3.2 in [5]. The proof is given in the electronic supplement [41].

Proof of Theorem 3.3. Up to a constant, the population version of $Q_{\mathrm{DCBM}}$ is
$$F(H(S)) = \sum_{kl}\Bigl(H_{kl}\log\frac{H_{kl}}{H_k H_l} - H_{kl}\Bigr), \tag{A.16}$$
where we only check (∗∗) [the form (∗) takes under the block model]. The generalization to the degree-corrected block model is similar to the proof of Theorem 3.1 and is omitted. Let $g_{kl} = H_{kl}/(H_k H_l)$. Then
$$\begin{aligned} F(H(S)) &= \sum_{kl}(H_{kl}\log g_{kl} - H_k H_l g_{kl}) \\ &= \sum_{kl}\Bigl(\sum_{ab} S_{ka}S_{lb}P_{ab}\log g_{kl} - \Bigl(\sum_{as} S_{ka}\pi_s P_{as}\Bigr)\Bigl(\sum_{bt}\pi_t S_{lb}P_{tb}\Bigr)g_{kl}\Bigr) \\ &= \sum_{kl}\sum_{ab} S_{ka}S_{lb}\Bigl(P_{ab}\log g_{kl} - \Bigl(\sum_s\pi_s P_{as}\Bigr)\Bigl(\sum_t\pi_t P_{tb}\Bigr)g_{kl}\Bigr) = I_2. \end{aligned}$$
Since $\arg\max_x(c_1\log x - c_2 x) = c_1/c_2$ for $c_1, c_2 > 0$, replacing $g_{kl}$ by $P_{ab}/[(\sum_s\pi_s P_{as})(\sum_t\pi_t P_{tb})]$, we obtain
$$\begin{aligned} I_2 &\le \sum_{kl}\sum_{ab} S_{ka}S_{lb}\Bigl(P_{ab}\log\frac{P_{ab}}{(\sum_s\pi_s P_{as})(\sum_t\pi_t P_{tb})} - P_{ab}\Bigr) \\ &= \sum_{ab}\Bigl(\pi_a\pi_b P_{ab}\log\frac{P_{ab}}{(\sum_s\pi_s P_{as})(\sum_t\pi_t P_{tb})} - \pi_a\pi_b P_{ab}\Bigr) = F(H(D)). \end{aligned}$$
∎

Acknowledgments. This work was carried out while Yunpeng Zhao was a Ph.D. student at the University of Michigan. We thank Brian Karrer and Mark Newman (University of Michigan) for helpful comments and corrections, Peter Bühlmann (ETH) for the role he played as Editor, and two anonymous referees for their constructive feedback and corrections.

SUPPLEMENTARY MATERIAL

Proofs of Lemmas A.1 and A.2 (DOI: 10.1214/12-AOS1036SUPP; .pdf). The supplemental material contains proofs of Lemmas A.1 and A.2 stated in the Appendix.

REFERENCES

[1] Adamic, L. A. and Glance, N. (2005). The political blogosphere and the 2004 US Election: Divided they blog. In Proceedings of the 3rd International Workshop on Link Discovery 36–43. ACM, New York.
[2] Airoldi, E. M., Blei, D. M., Fienberg, S. E. and Xing, E. P. (2008). Mixed membership stochastic blockmodels. J. Mach. Learn. Res. 9 1981–2014.
[3] Airoldi, E. M. and Choi, D. (2011). Summary of proof in "A nonparametric view of network models and Newman–Girvan and other modularities." Personal communication.
[4] Beasley, J. E. (1998). Heuristic algorithms for the unconstrained binary quadratic programming problem. Technical report, Management School, Imperial College, London, UK.
[5] Bickel, P. J. and Chen, A. (2009). A nonparametric view of network models and Newman–Girvan and other modularities. Proc. Natl. Acad. Sci. USA 106 21068–21073.
[6] Bickel, P. J. and Chen, A. (2012). Weak consistency of community detection criteria under the stochastic block model. Unpublished manuscript.
[7] Choi, D. S., Wolfe, P. J. and Airoldi, E. M. (2012). Stochastic blockmodels with growing number of classes. Biometrika 99 273–284.
[8] Coja-Oghlan, A. and Lanka, A. (2010). Finding planted partitions in random graphs with general degree distributions. SIAM J. Discrete Math. 23 1682–1714. MR2570199
[9] Csardi, G. and Nepusz, T. (2006). The igraph software package for complex network research. InterJournal Complex Systems 1695.
[10] Decelle, A., Krzakala, F., Moore, C. and Zdeborová, L. (2012). Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Phys. Rev. E 84 066106.
[11] Fortunato, S. (2010). Community detection in graphs. Phys. Rep. 486 75–174. MR2580414
[12] Fruchterman, T. M. J. and Reingold, E. M. (1991). Graph drawing by force-directed placement. Software: Practice and Experience 21 1129–1164.
[13] Getoor, L. and Diehl, C. P. (2005). Link mining: A survey. ACM SIGKDD Explorations Newsletter 7 3–12.
[14] Glover, F. W. and Laguna, M. (1997). Tabu Search. Kluwer Academic, Norwell.
[15] Goldenberg, A., Zheng, A. X., Fienberg, S. E. and Airoldi, E. M. (2010). A survey of statistical network models. Foundations and Trends in Machine Learning 2 129–233.
[16] Handcock, M. S., Raftery, A. E. and Tantrum, J. M. (2007). Model-based clustering for social networks. J. Roy. Statist. Soc. Ser. A 170 301–354. MR2364300
[17] Hoff, P. D. (2007). Modeling homophily and stochastic equivalence in symmetric relational data. In Advances in Neural Information Processing Systems 19. MIT Press, Cambridge, MA.
[18] Holland, P. W., Laskey, K. B. and Leinhardt, S. (1983). Stochastic blockmodels: First steps. Social Networks 5 109–137. MR0718088
[19] Hubert, L. and Arabie, P. (1985). Comparing partitions. J. Classification 2 193–218.
[20] Karrer, B. and Newman, M. E. J. (2011). Stochastic blockmodels and community structure in networks. Phys. Rev. E (3) 83 016107. MR2788206
[21] McCulloch, C. E. and Searle, S. R. (2001). Generalized, Linear, and Mixed Models. Wiley-Interscience, New York. MR1884506
[22] Newman, M. E. J. (2004). Detecting community structure in networks. Eur. Phys. J. B 38 321–330.
[23] Newman, M. E. J. (2006). Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA 103 8577–8582.
[24] Newman, M. E. J. (2006). Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E (3) 74 036104, 19. MR2282139
[25] Newman, M. E. J. (2010). Networks: An Introduction. Oxford Univ. Press, Oxford. MR2676073
[26] Newman, M. E. J. and Girvan, M. (2004). Finding and evaluating community structure in networks. Phys. Rev. E 69 026113.
[27] Newman, M. E. J. and Leicht, E. A. (2007). Mixture models and exploratory analysis in networks. Proc. Natl. Acad. Sci. USA 104 9564–9569.
[28] Ng, A., Jordan, M. and Weiss, Y. (2001). On spectral clustering: Analysis and an algorithm. In Neural Information Processing Systems 14 (T. Dietterich, S. Becker and Z. Ghahramani, eds.) 849–856. MIT Press, Cambridge.
[29] Nowicki, K. and Snijders, T. A. B. (2001). Estimation and prediction for stochastic blockstructures. J. Amer. Statist. Assoc. 96 1077–1087. MR1947255
[30] Perry, P. O. and Wolfe, P. J. (2012). Null models for network data. Available at arXiv:1201.5871v1.
[31] Robins, G., Snijders, T., Wang, P., Handcock, M. and Pattison, P. (2007). Recent developments in exponential random graph (p∗) models for social networks. Social Networks 29 192–215.
[32] Rohe, K., Chatterjee, S. and Yu, B. (2011). Spectral clustering and the high-dimensional stochastic blockmodel. Ann. Statist. 39 1878–1915. MR2893856
[33] Schlitt, T. and Brazma, A. (2007). Current approaches to gene regulatory network modelling. BMC Bioinformatics 8 S9. Suppl 6.
[34] Shi, J. and Malik, J. (2000). Normalized cuts and image segmentation. IEEE Trans. Pattern Analysis and Machine Intelligence 22 888–905.
[35] Snijders, T. A. B. and Nowicki, K. (1997). Estimation and prediction for stochastic blockmodels for graphs with latent block structure. J. Classification 14 75–100. MR1449742
[36] Wang, Y. J. and Wong, G. Y. (1987). Stochastic blockmodels for directed graphs. J. Amer. Statist. Assoc. 82 8–19. MR0883333
[37] Wasserman, S. and Faust, K. (1994). Social Network Analysis: Methods and Applications (Structural Analysis in the Social Sciences). Cambridge Univ. Press, Cambridge.
[38] Wei, Y. C. and Cheng, C. K. (1989). Toward efficient hierarchical designs by ratio cut partitioning. In Proceedings of the IEEE International Conference on Computer Aided Design 298–301. IEEE, New York.
[39] Zhang, S. and Zhao, H. (2012). Community identification in networks with unbalanced structure. Phys. Rev. E 85 066114.
[40] Zhao, Y., Levina, E. and Zhu, J. (2011). Community extraction for social networks. Proc. Natl. Acad. Sci. USA 108 7321–7326.
[41] Zhao, Y., Levina, E. and Zhu, J. (2012). Supplement to "Consistency of community detection in networks under degree-corrected stochastic block models." DOI: 10.1214/12-AOS1036SUPP.

Y. Zhao
Department of Statistics
George Mason University
4400 University Drive, MS 4A7
Fairfax, Virginia 22030-4444
USA
E-mail: yzhao15@gmu.edu

E. Levina
J. Zhu
Department of Statistics
University of Michigan
439 West Hall
1085 S. University Ave.
Ann Arbor, Michigan 48109-1107
USA
E-mail: elevina@umich.edu
jizhu@umich.edu

The Annals of Statistics 2015, Vol. 43, No. 1, 462–466
DOI: 10.1214/14-AOS1271
© Institute of Mathematical Statistics, 2015

CORRECTION TO THE PROOF OF CONSISTENCY OF COMMUNITY DETECTION

By Peter J. Bickel, Aiyou Chen, Yunpeng Zhao, Elizaveta Levina and Ji Zhu

University of California, Berkeley, Google Inc., George Mason University, University of Michigan and University of Michigan

This note corrects an error in two related proofs of consistency of community detection: under stochastic block models by Bickel and Chen [Proc. Natl. Acad. Sci. USA 106 (2009) 21068–21073] and under degree-corrected stochastic block models by Zhao, Levina and Zhu [Ann. Statist. 40 (2012) 2266–2292].

This note provides a correction to the proof of consistency of community detection under degree-corrected stochastic block models [2], published in this journal. The same error appeared earlier in the proof of consistency under stochastic block models [1].
In this note, w e provide the correction for the pro of of [ 2 ], using the notation of that pap er, since the case of the degree-correcte d sto c hastic b lo c k mod els is more general and includes the regular s to c hastic blo c k mo d els as a sp ecial case. V ery similar arguments can b e used to correct the pro of of [ 1 ] directly . W e start b y ve ry br ieﬂy r estating notation. L et e b e an arbitrary set of lab el assignment s, c b e the true lab el assignmen ts and ˆ c b e the maximizer of a comm unity detecti on criterion. Let O ( e ) ∈ R K × K , V ( e ) ∈ R K × K × M , ˆ Π ∈ R K × M , f ( e ) ∈ R K , where O k l ( e ) = X ij A ij I { e i = k , e j = l } , V k au ( e ) = P n i =1 I ( e i = k , c i = a, θ i = x u ) P n i =1 I ( c i = a, θ i = x u ) , Received Au gust 2014; revised Septemb er 2014. AMS 2000 subje ct classiﬁc ations. 62G20. Key wor ds and phr ases. Netw ork comm unities, stochastic blo ck model, degree- corrected sto chastic blo ck mod el, consistency of communit y d etection. This is an electronic repr int of the origina l ar ticle published by the Institute of Mathematical Statistics in The Annals of Statistics , 2015, V ol. 43, No. 1, 462–466 . This repr int diﬀers from the o r iginal in pagination and t yp ogr aphic detail. 1 2 P . J. BICKEL ET AL. ˆ Π au = 1 n n X i =1 I ( c i = a, θ i = x u ) , f k ( e ) = 1 n n X i =1 I ( e i = k ) = X au V k au ( e ) ˆ Π au . W e considered comm unit y detection criteria that can b e wr itten in the form Q ( e ) = F  O ( e ) µ n , f ( e )  , where µ n = n 2 ρ n and ρ n → 0 is the a verage probability of an edge in the net w ork. F or any mat rix B , k B k ∞ = max k l | B k l | . The statemen t | ∆( e , c ) | ≤ M 1 ( k X ( e ) − X ( c ) k ∞ ) b elo w (A.11) in [ 2 ] is incorrect. (W e ha v e r eplaced M ′ and C ′ in the original with M 1 and C 1 in this correction since w e will need more constan ts.) 
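To make the label-dependent quantities concrete, the following Python sketch computes $O(e)$, $f(e)$, $V(e)$ and $\hat\Pi$ for a tiny graph and checks the identity $f_k(e) = \sum_{au} V_{kau}(e)\hat\Pi_{au}$. It is an illustration only, under stated assumptions: labels and degree categories are 0-indexed integers, the degree parameter of node $i$ is encoded by the index $u_i$ of the value $x_{u_i}$ it takes, $|e - c|$ is read as the number of nodes on which $e$ and $c$ disagree, and all function names and the example graph are hypothetical.

```python
from collections import Counter


def block_counts(A, e, K):
    """O_kl(e) = sum over ordered pairs (i, j) of A_ij * 1{e_i = k, e_j = l}."""
    n = len(e)
    O = [[0] * K for _ in range(K)]
    for i in range(n):
        for j in range(n):
            if A[i][j]:
                O[e[i]][e[j]] += A[i][j]
    return O


def label_fractions(e, K):
    """f_k(e) = (1/n) * #{i : e_i = k}."""
    counts = Counter(e)
    return [counts.get(k, 0) / len(e) for k in range(K)]


def v_tensor(e, c, u, K, M):
    """V_kau(e): among nodes with true label a and degree category u,
    the fraction given label k by e (0.0 if the (a, u) cell is empty)."""
    num = [[[0] * M for _ in range(K)] for _ in range(K)]
    den = [[0] * M for _ in range(K)]
    for ei, ci, ui in zip(e, c, u):
        num[ei][ci][ui] += 1
        den[ci][ui] += 1
    return [[[num[k][a][w] / den[a][w] if den[a][w] else 0.0
              for w in range(M)] for a in range(K)] for k in range(K)]


def pi_hat(c, u, K, M):
    """Pi_hat_au = (1/n) * #{i : c_i = a, theta_i = x_u}."""
    P = [[0.0] * M for _ in range(K)]
    for ci, ui in zip(c, u):
        P[ci][ui] += 1 / len(c)
    return P


def num_mislabeled(e, c):
    """|e - c|, read here as the number of nodes where e and c disagree."""
    return sum(ei != ci for ei, ci in zip(e, c))


# Hypothetical example: path graph 0-1-2-3, node 3 mislabeled by e.
A = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
c = [0, 0, 1, 1]
e = [0, 0, 1, 0]
u = [0, 1, 0, 1]  # hypothetical degree-category index for each node
K, M = 2, 2

O_e = block_counts(A, e, K)
f_e = label_fractions(e, K)
V = v_tensor(e, c, u, K, M)
P = pi_hat(c, u, K, M)
# f_k(e) recovered from V(e) and Pi_hat, as in the identity above.
f_from_V = [sum(V[k][a][w] * P[a][w] for a in range(K) for w in range(M))
            for k in range(K)]
print(O_e, f_e, f_from_V, num_mislabeled(e, c))
```

The identity holds exactly because each $V_{kau}(e)\hat\Pi_{au}$ term reduces to $\frac1n\#\{i: e_i = k, c_i = a, \theta_i = x_u\}$, and summing over $(a, u)$ counts every node with $e_i = k$ once.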
For the proof of [2] to go through, we need a different way of proving

$$P\left(\max_{1\le|e-c|\le\delta_n n}\left(|\Delta(e,c)| - C_1\|V(e)-I\|_1/4\right) \le 0\right) \to 1, \tag{1.1}$$

where $\delta_n \to 0$. Note that (1.1) is similar to (A.14) in [2], with an extra constraint $|e - c| \le \delta_n n$. Since we have already proved $P\left(\frac{1}{n}|\hat c - c| \le \delta_n\right) \to 1$ in [2], (1.1) will complete the proof, and the conclusion of Theorem 4.1 in [2] remains valid.

We first need a lemma based on Bernstein's inequality.

Lemma 1.1. For $m \in \{1, \dots, n\}$,

$$P\left(\max_{|e-c|\le m}\|X(e)\|_\infty \ge \varepsilon\right) \le 2\binom{n}{m}K^{m+2}\exp\left(-\frac{3\mu_n\varepsilon^2}{4(\varepsilon+3)}\right). \tag{1.2}$$

The proof of Lemma 1.1 closely follows the proofs of (A.2) and (A.3) in [2] and hence is omitted here.

Proof of (1.1). By Taylor's expansion,

$$F\left(\frac{O(e)}{\mu_n}, f(e)\right) - F(\hat T(e), f(e)) = \frac{\partial F}{\partial M}\bigg|_{M=\hat T(e),\,t=f(e)}\operatorname{vec}(X(e)) + O(\|X(e)\|_\infty^2),$$

where $\partial F/\partial M$ is the partial derivative with respect to the first (vectorized) component of $F(M, t)$. Similarly,

$$F\left(\frac{O(c)}{\mu_n}, f(c)\right) - F(\hat T(c), f(c)) = \frac{\partial F}{\partial M}\bigg|_{M=\hat T(c),\,t=f(c)}\operatorname{vec}(X(c)) + O(\|X(c)\|_\infty^2).$$

Since $\partial F/\partial M$ is continuous with respect to $M$ and $t$, and $\hat T(e)$ and $f(e)$ are continuous with respect to $e$,

$$\frac{\partial F}{\partial M}\bigg|_{M=\hat T(e),\,t=f(e)} = \frac{\partial F}{\partial M}\bigg|_{M=\hat T(c),\,t=f(c)} + O(\|V(e)-I\|_1). \tag{1.3}$$

Therefore, since

$$\begin{aligned}
\Delta(e,c) &= F\left(\frac{O(e)}{\mu_n}, f(e)\right) - F(\hat T(e), f(e)) - F\left(\frac{O(c)}{\mu_n}, f(c)\right) + F(\hat T(c), f(c))\\
&= \frac{\partial F}{\partial M}\bigg|_{M=\hat T(c),\,t=f(c)}\operatorname{vec}(X(e)-X(c)) + O(\|V(e)-I\|_1)\operatorname{vec}(X(e)) + O(\|X(e)\|_\infty^2) + O(\|X(c)\|_\infty^2),
\end{aligned}$$

we have, for some constants $M_1, \dots, M_4$,

$$|\Delta(e,c)| \le M_1\|X(e)-X(c)\|_\infty + M_2\|V(e)-I\|_1\|X(e)\|_\infty + M_3\|X(e)\|_\infty^2 + M_4\|X(c)\|_\infty^2.$$
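Several of the bounds in this correction end with a per-$m$ term of the form $2K^2[K\exp(\log n - a\lambda_n)]^m$ for some constant $a > 0$ (using $\mu_n/n = n\rho_n = \lambda_n$), which is then summed over $m$. The Python sketch below evaluates that geometric sum numerically. The growth rates $\lambda_n = (\log n)^2$ and $\lambda_n = \log n$, the constants $K = 2$ and $a = 0.5$, and the function name are illustrative assumptions, not part of the original argument.

```python
import math


def summed_bound(n, K, a, lam):
    """Sum over m >= 1 of 2 K^2 r^m with r = K * exp(log n - a * lam).

    This geometric series equals 2 K^2 r / (1 - r) when r < 1;
    it diverges (returned as math.inf) when r >= 1.
    """
    r = K * math.exp(math.log(n) - a * lam)
    if r >= 1:
        return math.inf
    return 2 * K**2 * r / (1 - r)


K, a = 2, 0.5  # arbitrary illustrative constants
for n in (10**3, 10**4, 10**5, 10**6):
    fast = summed_bound(n, K, a, math.log(n) ** 2)  # lambda_n / log n -> infinity
    slow = summed_bound(n, K, a, math.log(n))       # lambda_n / log n constant
    print(f"n={n:>8}  lambda=(log n)^2: {fast:.3e}   lambda=log n: {slow}")
```

When $\lambda_n$ is only proportional to $\log n$, the ratio is $r = K n^{1-a}$, so whether the sum vanishes depends on the unspecified constant $a$; requiring $\lambda_n/\log n \to \infty$ makes the sum tend to 0 for every $a > 0$.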
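Now we prove (1.1), which holds if the following four statements hold:

$$P\left(\max_{1\le|e-c|\le\delta_n n}\left(M_1\|X(e)-X(c)\|_\infty - C_1\|V(e)-I\|_1/16\right) \le 0\right) \to 1, \tag{1.4}$$

$$P\left(\max_{1\le|e-c|\le\delta_n n}\left(M_2\|X(e)\|_\infty - C_1/16\right) \le 0\right) \to 1, \tag{1.5}$$

$$P\left(\max_{1\le|e-c|\le\delta_n n}\left(M_3\|X(e)\|_\infty^2 - C_1\|V(e)-I\|_1/16\right) \le 0\right) \to 1, \tag{1.6}$$

$$P\left(\max_{1\le|e-c|\le\delta_n n}\left(M_4\|X(c)\|_\infty^2 - C_1\|V(e)-I\|_1/16\right) \le 0\right) \to 1. \tag{1.7}$$

Indeed, on the intersection of these four events, multiplying the inequality in (1.5) by $\|V(e)-I\|_1$ and adding the four bounds gives $|\Delta(e,c)| \le 4 \cdot C_1\|V(e)-I\|_1/16 = C_1\|V(e)-I\|_1/4$ for all $1 \le |e-c| \le \delta_n n$, which is (1.1).

The proof of (1.4) is similar to the proof of (A.15) in [2]. Note that $\frac{1}{n}|e-c| \le \frac{1}{2}\|V(e)-I\|_1$, so $C_1\|V(e)-I\|_1/16 \ge C_1 m/(8n)$ whenever $|e - c| = m$. Hence, for each $m \ge 1$,

$$P\left(\max_{|e-c|=m}\left(M_1\|X(e)-X(c)\|_\infty - C_1\|V(e)-I\|_1/16\right) > 0\right) \le P\left(\max_{|e-c|\le m}\|X(e)-X(c)\|_\infty > \frac{C_1 m}{8M_1 n}\right) = I_1.$$

Let $\alpha = C_1/(8M_1)$. If $\alpha \ge 6C$, then by (A.2) in [2],

$$I_1 \le 2K^{m+2}n^m\exp\left(-\alpha\frac{3m}{8n}\mu_n\right) = 2K^2\left[K\exp\left(\log n - \frac{3\alpha\mu_n}{8n}\right)\right]^m.$$

If $\alpha < 6C$, then by (A.3) in [2],

$$I_1 \le 2K^{m+2}n^m\exp\left(-\frac{\alpha^2 m}{16Cn}\mu_n\right) = 2K^2\left[K\exp\left(\log n - \frac{\alpha^2\mu_n}{16Cn}\right)\right]^m.$$

In both cases, since $\lambda_n/\log n \to \infty$ (where $\lambda_n = n\rho_n = \mu_n/n$), the bracketed term tends to 0, and summing the resulting geometric series over $m$ gives

$$P\left(\max_{1\le|e-c|\le\delta_n n}\left(M_1\|X(e)-X(c)\|_\infty - C_1\|V(e)-I\|_1/16\right) > 0\right) \le \sum_{m=1}^\infty P\left(\max_{|e-c|=m}\left(M_1\|X(e)-X(c)\|_\infty - C_1\|V(e)-I\|_1/16\right) > 0\right) \to 0$$

as $n \to \infty$, which completes the proof of (1.4).

Equation (1.5) follows directly from (A.1) in [2].

We next prove (1.6). For each $1 \le m \le \delta_n n$,

$$P\left(\max_{|e-c|=m}\left(M_3\|X(e)\|_\infty^2 - C_1\|V(e)-I\|_1/16\right) > 0\right) \le P\left(\max_{|e-c|\le m}\|X(e)\|_\infty^2 > \frac{C_1 m}{8M_3 n}\right) = I_2.$$

Let $\varepsilon = \sqrt{C_1 m/(8M_3 n)}$ and $\alpha = C_1/(64M_3)$. Then from Lemma 1.1, using $\binom{n}{m} \le n^m$,

$$I_2 \le 2K^{m+2}n^m\exp\left(-\frac{3\mu_n\varepsilon^2}{4(\varepsilon+3)}\right) \le 2K^{m+2}n^m\exp\left(-\frac{\mu_n\varepsilon^2}{8}\right) = 2K^{m+2}n^m\exp\left(-\alpha\frac{\mu_n}{n}m\right) = 2K^2\left[K\exp\left(\log n - \alpha\frac{\mu_n}{n}\right)\right]^m.$$

The second inequality holds whenever $\varepsilon \le 3$, which is true for all large $n$ because $m \le \delta_n n$ implies $\varepsilon^2 \le C_1\delta_n/(8M_3) \to 0$.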
Since $\lambda_n/\log n \to \infty$,

$$P\left(\max_{1\le|e-c|\le\delta_n n}\left(M_3\|X(e)\|_\infty^2 - C_1\|V(e)-I\|_1/16\right) > 0\right) \le \sum_{m=1}^\infty P\left(\max_{|e-c|=m}\left(M_3\|X(e)\|_\infty^2 - C_1\|V(e)-I\|_1/16\right) > 0\right) \to 0$$

as $n \to \infty$, which completes the proof of (1.6).

We now complete the proof by showing (1.7). For each $1 \le m \le \delta_n n$,

$$P\left(\max_{|e-c|=m}\left(M_4\|X(c)\|_\infty^2 - C_1\|V(e)-I\|_1/16\right) > 0\right) \le P\left(\|X(c)\|_\infty^2 > \frac{C_1 m}{8M_4 n}\right) = I_3.$$

Let $\varepsilon = \sqrt{C_1 m/(8M_4 n)}$ and $\alpha = C_1/(64M_4)$. Then from Bernstein's inequality (the second inequality again uses $\varepsilon \le 3$, which holds for all large $n$),

$$I_3 \le 2K^2\exp\left(-\frac{3\mu_n\varepsilon^2}{4(\varepsilon+3)}\right) \le 2K^2\exp\left(-\alpha\frac{\mu_n}{n}m\right). \tag{1.8}$$

Since $\mu_n/n = \lambda_n \to \infty$, the right-hand side of (1.8) is a geometric sequence in $m$ whose sum tends to 0. Therefore,

$$P\left(\max_{1\le|e-c|\le\delta_n n}\left(M_4\|X(c)\|_\infty^2 - C_1\|V(e)-I\|_1/16\right) > 0\right) \le \sum_{m=1}^\infty P\left(\max_{|e-c|=m}\left(M_4\|X(c)\|_\infty^2 - C_1\|V(e)-I\|_1/16\right) > 0\right) \to 0$$

as $n \to \infty$.

Acknowledgements. We are very grateful to Emma Jingfei Zhang, a former Ph.D. student at the University of Illinois at Urbana-Champaign, now at the University of Miami, who discovered the error and persisted in tracking down its root cause.

REFERENCES

[1] Bickel, P. J. and Chen, A. (2009). A nonparametric view of network models and Newman-Girvan and other modularities. Proc. Natl. Acad. Sci. USA 106 21068–21073.
[2] Zhao, Y., Levina, E. and Zhu, J. (2012). Consistency of community detection in networks under degree-corrected stochastic block models. Ann. Statist. 40 2266–2292. MR3059083

P. J. Bickel, Department of Statistics, University of California, Berkeley, 367 Evans Hall, Berkeley, California 94720-3860, USA. E-mail: bickel@stat.berkeley.edu
A. Chen, Google Inc, 1600 Amphitheatre Pkwy, Mountain View, California 94043, USA. E-mail: aiyouchen@google.com
Y. Zhao, Department of Statistics, George Mason University, 1714 Engineering Building, 4400 University Drive, Fairfax, Virginia 22030-4444, USA. E-mail: yzhao15@gmu.edu
E. Levina and J. Zhu, Department of Statistics, University of Michigan, 311 West Hall, 1085 S. University Ave., Ann Arbor, Michigan 48109-1107, USA. E-mail: elevina@umich.edu, jizhu@umich.edu
