Scalable detection of statistically significant communities and hierarchies, using message-passing for modularity

Pan Zhang and Cristopher Moore
Santa Fe Institute, Santa Fe, New Mexico 87501, USA

Modularity is a popular measure of community structure. However, maximizing the modularity can lead to many competing partitions, with almost the same modularity, that are poorly correlated with each other. It can also produce illusory “communities” in random graphs where none exist. We address this problem by using the modularity as a Hamiltonian at finite temperature, and using an efficient Belief Propagation algorithm to obtain the consensus of many partitions with high modularity, rather than looking for a single partition that maximizes it. We show analytically and numerically that the proposed algorithm works all the way down to the detectability transition in networks generated by the stochastic block model. It also performs well on real-world networks, revealing large communities in some networks where previous work has claimed no communities exist. Finally, we show that by applying our algorithm recursively, subdividing communities until no statistically significant subcommunities can be found, we can detect hierarchical structure in real-world networks more efficiently than previous methods.

Significance: Most work on community detection does not address the issue of statistical significance, and many algorithms are prone to overfitting. We address this using tools from statistical physics. Rather than trying to find the partition of a network that maximizes the modularity, our approach seeks the consensus of many high-modularity partitions. We do this with a scalable message-passing algorithm, derived by treating the modularity as a Hamiltonian and applying the cavity method.
We show analytically that our algorithm succeeds all the way down to the detectability transition in the stochastic block model; it also performs well on real-world networks. It also provides a principled method for determining the number of groups, or hierarchies of communities and subcommunities.

Community detection, or node clustering, is a key problem in network science, computer science, sociology, and biology. It aims to partition the nodes in a network into groups such that there are many edges connecting nodes within the same group, and comparatively few edges connecting nodes in different groups.

Many methods have been proposed for this problem. These include spectral clustering, where we classify nodes according to the eigenvectors of a linear operator such as the adjacency matrix, random walk matrix, graph Laplacian, or other linear operators [1–3]; statistical inference, where we fit the network with a generative model such as the stochastic block model [4–7]; and a wide variety of other methods, e.g. [8–10]. See [11] for a review.

We focus here on a popular measure of the quality of a partition, the modularity (e.g. [8, 12–14]). A partition into $q$ groups is a set of labels $\{t\}$, where $t_i \in \{1, \ldots, q\}$ is the group to which node $i$ belongs. The modularity of a partition $\{t\}$ of a network with $n$ nodes and $m$ edges is defined as follows,

\[
Q(\{t\}) = \frac{1}{m} \left( \sum_{\langle ij \rangle \in \mathcal{E}} \delta_{t_i t_j} - \sum_{\langle ij \rangle} \frac{d_i d_j}{2m} \, \delta_{t_i t_j} \right) . \tag{1}
\]

Here $\mathcal{E}$ is the set of edges, the degree $d_i$ is the number of neighbors node $i$ has, and $\delta$ is the Kronecker delta. Thus $Q$ is proportional to the number of edges within communities, minus the expected number of such edges if the graph were randomly rewired while keeping the degrees fixed; that is, the expectation in a null model where $i$ and $j$ are connected with probability $d_i d_j / 2m$.
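As a concrete illustration of Eq. (1), here is a minimal Python sketch that evaluates the modularity of a given partition. The function name `modularity` and the edge-list input format are our own choices for illustration. We also assume that the second sum in Eq. (1) runs over unordered pairs of distinct nodes, which lets us evaluate it per group from the total degree and the sum of squared degrees rather than looping over all pairs.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Modularity Q of Eq. (1) for an undirected simple graph.

    edges  : list of (i, j) pairs, each undirected edge listed once
    labels : labels[i] is the group of node i
    Assumption: the null-model sum runs over unordered pairs i < j.
    """
    m = len(edges)
    degree = defaultdict(int)
    internal = 0  # number of edges whose endpoints share a group
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
        if labels[i] == labels[j]:
            internal += 1

    deg_sum = defaultdict(float)     # sum of d_i within each group
    deg_sq_sum = defaultdict(float)  # sum of d_i^2 within each group
    for i, d in degree.items():
        deg_sum[labels[i]] += d
        deg_sq_sum[labels[i]] += d * d

    # Sum over pairs i < j in one group of d_i d_j = (D^2 - sum_i d_i^2) / 2.
    expected = sum(
        (deg_sum[g] ** 2 - deg_sq_sum[g]) / 2.0 for g in deg_sum
    ) / (2.0 * m)
    return (internal - expected) / m
```

For example, `modularity(edges, labels)` can be called on two triangles joined by a single edge, with one label per triangle, to check that the natural split scores higher than putting all nodes in one group.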
However, maximizing over all possible partitions often gives a large modularity even in random graphs with no community structure [15–18]. Thus maximizing the modularity can lead to overfitting, where the “optimal” partition simply reflects random noise. Even in real-world networks, the modularity often exhibits a large amount of degeneracy, with multiple local optima that are poorly correlated with each other, and are not robust to small perturbations [19].

Thus we need to add some notion of statistical significance to our algorithms. One approach is hypothesis testing, comparing various measures of community structure to the distribution we would see in a null model such as Erdős-Rényi (ER) graphs [20–22]. However, even when communities really exist, the modularity of the true partition is often no higher than that of random graphs. In Fig. 1, we show partitions of two networks with the same size and degree distribution: an ER graph (left), and a graph generated by the stochastic block model (right), in the detectable regime where it is easy to find a partition correlated with the true one [5, 6]. The true partition of the network on the right has a smaller modularity than the partition found for the random graph on the left. We can find a partition with higher modularity (and lower accuracy) on the right using e.g. simulated annealing, but then the modularities we obtain for the two networks are similar. Thus the usual approach of null distributions and $p$-values for hypothesis testing does not appear to work.

FIG. 1: The adjacency matrices of two networks, partitioned to show possible community structure. Each blue point is an edge. The network on the left is an ER graph, with no real community structure; however, a search by simulated annealing finds a partition with modularity 0.391. The network on the right has true communities, and is generated by the stochastic block model, but the true partition has modularity just 0.333. Thus illusory communities in random graphs can have higher modularity than true communities in structured graphs. Both networks have size $n = 5000$ and a Poisson degree distribution with mean $c = 3$; the network on the right has $c_{\mathrm{out}}/c_{\mathrm{in}} = 0.2$, in the easily-detectable regime of the stochastic block model.

We propose to solve this problem with the tools of statistical physics. Like [16], we treat the modularity as the Hamiltonian of a spin system. We define the energy of a partition $\{t\}$ as $E(\{t\}) = -mQ(\{t\})$, and introduce a Gibbs distribution as a function of inverse temperature $\beta$, $P(\{t\}) \propto e^{-\beta E(\{t\})}$. Rather than maximizing the modularity by searching for the ground state of this system, we focus on its Gibbs distribution at a finite temperature, looking for many high-modularity partitions rather than a single one.

In analogy with previous work on the stochastic block model [5, 6], we define a partition $\{\hat{t}\}$ by computing the marginals of the Gibbs distribution, and assigning each node to its most-likely community. Specifically, if $\psi^i_t$ is the marginal probability that $i$ belongs to group $t$, then $\hat{t}_i = \mathrm{argmax}_t \, \psi^i_t$, breaking ties randomly if more than one $t$ achieves the maximum. We call $\{\hat{t}\}$ the retrieval partition, and call its modularity $Q(\{\hat{t}\})$ the retrieval modularity.

We claim that $\{\hat{t}\}$ is a far better measure of significant community structure than the maximum-modularity partition. In the language of statistics, the maximum marginal prediction is better than the maximum a posteriori prediction (e.g. [23]). More informally, the consensus of many good solutions is better than the “best” single one [24, 25].
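The definition of the retrieval partition can be sketched in a few lines of Python. This is only an illustration: the function name `retrieval_partition` and the list-of-lists input are hypothetical, groups are numbered from 0 rather than 1, and ties are detected by exact floating-point equality, which a practical implementation might want to relax.

```python
import random


def retrieval_partition(marginals, rng=None):
    """Assign each node to its most likely group.

    marginals : marginals[i][t] is the marginal probability psi^i_t
    rng       : optional random.Random used to break ties (assumed helper)
    Returns a list t_hat with t_hat[i] = argmax_t marginals[i][t].
    """
    rng = rng or random.Random()
    t_hat = []
    for psi in marginals:
        best = max(psi)
        # All groups achieving the maximum; pick one uniformly at random.
        candidates = [t for t, p in enumerate(psi) if p == best]
        t_hat.append(rng.choice(candidates))
    return t_hat
```

Feeding the result into a modularity routine then gives the retrieval modularity $Q(\{\hat{t}\})$.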
We give an efficient Belief Propagation (BP) algorithm to approximate these marginals, which is derived from the cavity method of statistical physics. This algorithm is highly scalable; each iteration takes linear time on sparse networks if the number of groups is fixed, and it converges rapidly in most cases. It is optimal in the sense that for synthetic graphs generated by the stochastic block model, it works all the way down to the detectability transition. It provides a principled way to choose the number of communities, unlike other algorithms that tend to overfit. Finally, by applying this algorithm recursively, subdividing communities until no statistically significant subcommunities exist, we can uncover hierarchical structure.

We validate our approach with experiments on real and synthetic networks. In particular, we find significant large communities in some large networks where previous work claimed there were none. We also compare our algorithm with several others, finding that it obtains more accurate results, both in terms of determining the number of communities and matching their ground truth structure.

I. RESULTS

A. Results on the Stochastic Block Model

Also called the planted partition model, the stochastic block model (SBM) is a popular ensemble of networks with community structure. There are $q$ groups of nodes, and each node $i$ has a group label $t^*_i \in \{1, \ldots, q\}$; thus $\{t^*\}$ is the true, or planted, partition. Edges are generated independently according to a $q \times q$ matrix $p$, by connecting each pair of nodes $\langle ij \rangle$ with probability $p_{t^*_i, t^*_j}$. Here for simplicity we discuss the commonly studied case where the $q$ groups have equal size and where $p$ has only two distinct entries, $p_{rs} = c_{\mathrm{in}}/n$ if $r = s$ and $c_{\mathrm{out}}/n$ if $r \neq s$. We use $\epsilon = c_{\mathrm{out}}/c_{\mathrm{in}}$ to denote the ratio between these two entries. In the assortative case, $c_{\mathrm{in}} > c_{\mathrm{out}}$ and $\epsilon < 1$.
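The two-parameter SBM just described can be sampled with the following Python sketch. It is a naive $O(n^2)$ illustration suitable only for small $n$, not the generator used for the experiments; the names `sbm_rates` and `sample_sbm` are hypothetical. The conversion from the average degree $c = (c_{\mathrm{in}} + (q-1) c_{\mathrm{out}})/q$ and the ratio $\epsilon$ to $c_{\mathrm{in}}$ and $c_{\mathrm{out}}$ follows directly from those two definitions.

```python
import random


def sbm_rates(q, c, eps):
    """Return (c_in, c_out) for average degree c and ratio eps = c_out / c_in."""
    c_in = q * c / (1.0 + (q - 1) * eps)
    return c_in, eps * c_in


def sample_sbm(n, q, c, eps, seed=0):
    """Sample planted labels and an edge list from the equal-size SBM.

    Groups are numbered 0..q-1 and assigned round-robin, so they have
    (nearly) equal size. Each pair i < j is linked independently with
    probability c_in / n (same group) or c_out / n (different groups).
    """
    rng = random.Random(seed)
    c_in, c_out = sbm_rates(q, c, eps)
    labels = [i % q for i in range(n)]
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = (c_in if labels[i] == labels[j] else c_out) / n
            if rng.random() < p:
                edges.append((i, j))
    return labels, edges
```

For instance, `sbm_rates(2, 3, 0.2)` gives $c_{\mathrm{in}} = 5$ and $c_{\mathrm{out}} = 1$, the parameters corresponding to $q = 2$, $c = 3$, $\epsilon = 0.2$.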
When $\epsilon$ is small, the community structure is strong; when $\epsilon = 1$, the network becomes an ER graph. For a given average degree $c = (c_{\mathrm{in}} + (q-1) c_{\mathrm{out}})/q$, there is a so-called detectability phase transition [5, 6], at a critical value

\[
\epsilon^* = \frac{\sqrt{c} - 1}{\sqrt{c} - 1 + q} . \tag{2}
\]

For $\epsilon < \epsilon^*$, BP can label the nodes with high accuracy; for $\epsilon > \epsilon^*$, neither BP nor any other algorithm can label the nodes better than chance, and indeed no algorithm can distinguish the network from an ER graph with high probability. This transition was recently established rigorously in the case $q = 2$ [26–28].

For larger numbers of groups, the situation is more complicated. For $q \le 4$, in the assortative case, this detectability transition coincides with the Kesten-Stigum bound [29, 30]. For $q \ge 5$ the Kesten-Stigum bound marks a conjectured transition to a “hard but detectable” phase where community detection is still possible but takes exponential time, while the detectability transition is at a larger value of $\epsilon$; that is, the thresholds for reconstruction and robust reconstruction become different. Our claim is that our algorithm succeeds down to the Kesten-Stigum bound, i.e., throughout the detectable regime for $q \le 4$ and the easily detectable regime for $q \ge 5$.

In Fig. 2 we compare the behavior of our BP algorithm on an ER graph and a network generated by the SBM in the detectable regime. Both graphs have the same size and average degree $c = 3$. For the ER graph (left) there are just two phases, separated by a transition at $\beta^* = 1.317$: the paramagnetic phase where BP converges to a factorized fixed point where every node is equally likely to be in every group, and the spin glass phase where replica symmetry is broken, and BP fails to converge. The convergence time diverges at the transition.
Note that in the spin glass phase, the retrieval modularity returned by BP fluctuates wildly as BP jumps from one local optimum to another, and has little meaning. In any case BP assumes replica symmetry, which is incorrect in this phase.

In contrast, the SBM network in Fig. 2 (right) has strong community structure. In addition to the paramagnetic and spin glass phases, there is now a retrieval phase in a range of $\beta$, where BP finds a retrieval state describing statistically significant community structure. The retrieval modularity jumps sharply at $\beta_R = 1.072$ when we first enter this phase, and then increases gently to 0.393 as $\beta$ increases; for comparison, the modularity of the planted partition is $M_{\mathrm{hidden}}(\epsilon) = 1/(1+\epsilon) - 1/2 = 0.33$. When we enter the spin glass phase at $\beta_{SG} = 2.27$, the retrieval modularity fluctuates as in the ER graph. The convergence time diverges at both phase transitions.

We can compute two of these transition points analytically by analyzing the linear stability of the factorized fixed point (see Methods). Stability against random perturbations gives

\[
\beta^*(q, c) = \log \left[ \frac{q}{\sqrt{c} - 1} + 1 \right] , \tag{3}
\]

and stability against correlated perturbations gives

\[
\beta_R(q, c, \epsilon) = \log \left[ \frac{q \, (1 + (q-1)\epsilon)}{c(1-\epsilon) - (1 + (q-1)\epsilon)} + 1 \right] . \tag{4}
\]

These cross at the Kesten-Stigum bound, where $\epsilon = \epsilon^*$. We do not currently have an analytic expression for $\beta_{SG}$.

In Fig. 3 (left) we show the phase diagram of our algorithm on SBM networks, including the paramagnetic, retrieval, and spin glass phases as a function of $\epsilon$, with $q = 2$ and $c = 3$. The boundary $\beta_R$ between the paramagnetic and retrieval phases is in excellent agreement with our expression (4). For $\epsilon < \epsilon^* \approx 0.267$, our algorithm finds a retrieval state for $\beta_R < \beta < \beta_{SG}$. On the right, we show the accuracy of the retrieval partition $\{\hat{t}\}$, defined as its overlap with the planted partition, i.e., the fraction of nodes labeled correctly.
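The closed-form thresholds in Eqs. (2)–(4) are easy to evaluate numerically. The Python sketch below uses hypothetical function names and simply transcribes the three formulas; note that Eq. (4) is only defined when $c(1-\epsilon) > 1 + (q-1)\epsilon$. For $q = 2$ and $c = 3$ it gives $\beta^* \approx 1.317$ and $\epsilon^* \approx 0.268$, and for $\epsilon = 0.2$ it gives $\beta_R = \log 3 \approx 1.099$.

```python
import math


def epsilon_star(q, c):
    """Detectability (Kesten-Stigum) threshold of Eq. (2)."""
    return (math.sqrt(c) - 1) / (math.sqrt(c) - 1 + q)


def beta_star(q, c):
    """Onset of the spin glass instability, Eq. (3)."""
    return math.log(q / (math.sqrt(c) - 1) + 1)


def beta_retrieval(q, c, eps):
    """Paramagnetic-to-retrieval boundary beta_R of Eq. (4).

    Requires c * (1 - eps) > 1 + (q - 1) * eps.
    """
    a = 1 + (q - 1) * eps
    gap = c * (1 - eps) - a
    if gap <= 0:
        raise ValueError("Eq. (4) is undefined for these parameters")
    return math.log(q * a / gap + 1)
```

Evaluating `beta_retrieval(q, c, epsilon_star(q, c))` returns the same value as `beta_star(q, c)`, which is the crossing at the Kesten-Stigum bound mentioned above.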
We emphasize that $\beta^*$ is not the optimal value of $\beta$, i.e., it is not on the Nishimori line [23, 31, 32]. However, the optimal $\beta$ depends on the parameters of the SBM (see Appendix). Our claim is that setting $\beta = \beta^*$ in our algorithm succeeds throughout the easily-detectable regime, even when the parameters are unknown. In Fig. 3 (right) we compare our algorithm with that of [5, 6], which learns the SBM parameters using an expectation-maximization (EM) algorithm. Our algorithm provides nearly the same overlap, without the need for the EM loop.

FIG. 2: Retrieval modularity (blue ×, left y-axis) and BP convergence time (red +, right y-axis) of an ER random graph (left) and a network generated by the stochastic block model in the detectable regime (right). Both networks have $n = 1000$ and average degree $c = 3$, and the network on the right has $\epsilon = 0.2$. In both cases we ran BP with $q = 2$ groups. In the ER graph, which has no community structure, there are two phases, paramagnetic (P) and spin glass (SG), with a transition at $\beta^* = 1.317$. In the SBM network, there is an additional retrieval phase (R) between $\beta_R = 1.072$ and $\beta_{SG} = 2.27$ where BP finds a retrieval state with high modularity, indicating statistically significant community structure.

FIG. 3: Left: phase diagram for networks generated by the stochastic block model, showing the paramagnetic (P), retrieval (R), and spin glass (SG) phases. Blue circles with error bars denote experimental estimates of $\beta_R$, the boundary between the paramagnetic and retrieval phases, and the solid green line shows our theoretical expression (4).
The spin glass instability occurs for $\beta > \beta^*(2, 3)$ (red dash-dotted line) and $\epsilon^*$ is the detectability transition (black dashed line). Right: The overlap of the retrieval partition at $\beta = 1.315 \approx \beta^*(2, 3)$ (blue circles) and the partition obtained with the algorithm of [5], which infers the parameters of the SBM with an additional EM learning algorithm. Each experiment is on the giant component of a network with $n = 10^5$, $q = 2$ groups, and average degree $c = 3$. We average over 10 random instances.

B. Results on real-world networks and choosing the number of groups

We tested our algorithm on a number of real-world networks. As for networks generated by the SBM in the detectable regime, we find a retrieval phase between the paramagnetic and spin glass phases (see figure in Appendix). Rather than attempting to learn the optimal parameters or temperature for these networks, we simply set $\beta = \beta^*(q^*, c)$ as defined in (3), where $q^*$ is the ground-truth number of groups (if known) and $c$ is the average degree. Again, this value of $\beta$ is not optimal, and varying $\beta$ may improve the algorithm’s performance; however, setting $\beta = \beta^*$ appears to work well in practice.

When the number of groups is not known, determining it is a classic model-selection problem. The maximum modularity typically grows with $q$. In contrast, the retrieval modularity stops growing when $q$ exceeds the correct value, giving us a principled method of choosing $q^*$ (see Appendix). For those networks where $q^*$ is known, we found that this procedure agrees perfectly with the ground truth.

As shown in Table I, our algorithm finds a retrieval state in all these networks, with high retrieval modularity and high overlap with the ground truth. For the Gnutella, Epinions and web-Google networks, no ground truth is known; but in contrast with [37], our algorithm finds significant large-scale communities.
While most of these networks are assortative, one network in the table, the adjacency network of common adjectives and nouns in the novel David Copperfield [2], is disassortative, since nouns are more likely to be adjacent to adjectives than to other nouns and vice versa. In this case, we found a retrieval state with negative modularity, and high overlap with the ground truth, by setting $\beta$ to $-\beta^*(q^*, c)$.

TABLE I: Retrieval modularity, overlap between the retrieval partition and the ground truth, the number of groups $q^*$ as determined by our algorithm, the inverse temperature $\beta^*$ defined in (3), and the convergence time measured in seconds and iterations for several real-world networks [2, 33–37]. For Gnutella, Epinions and web-Google [37] no ground truth is known (overlap shown as "-"), but based on our results we claim, contrary to [37], that these networks have statistically significant large-scale communities.

Network                       n        m   q*      β*    Q(t̂)   overlap   time (sec)   # iterations
Zachary's karate club        34       78    2   1.012    0.371    1             0.001             26
Dolphin social network       62      159    2   0.948    0.395    0.887         0.001             33
Books about US politics     105      441    3   0.948    0.521    0.829         0.002             23
Word adjacencies            112      425    2  -0.761   -0.275    0.848         0.003             35
Political blogs            1222    16714    2   0.387    0.426    0.948         0.043             18
Gnutella                  62586   147892    7   0.995    0.517    -             37.43            433
Epinions                  75888   405740    4   0.632    0.429    -             57.13            213
Web-Google               916428  4322051    5   0.676    0.724    -             2331             505

C. Results on hierarchical clustering

Many networks appear to have hierarchical structure with communities and subcommunities on many scales [2, 8, 24, 38, 39]. We can look for such structures by working recursively: we determine the optimal number $q^*$ of groups, divide the network into subgraphs, and apply the algorithm to each one. We stop dividing when there is no retrieval state, indicating that the remaining subgraphs have no significant internal structure. For networks generated by the SBM, each subgraph is an ER graph.
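The recursive procedure described above can be sketched as follows. This is an illustrative skeleton only: `find_retrieval_partition` is a hypothetical callback standing in for the full pipeline (choosing $q^*$, running BP, and testing for a retrieval state), and it is assumed to return `None` when no retrieval state is found. The nested-dictionary tree format is likewise our own choice.

```python
def hierarchical_split(nodes, edges, find_retrieval_partition):
    """Recursively divide a network until no retrieval state is found.

    nodes : iterable of node ids
    edges : list of (i, j) pairs among those nodes
    find_retrieval_partition(nodes, edges) : assumed callback returning a
        dict node -> group label, or None if there is no retrieval state
    Returns a tree {"nodes": [...], "children": [subtrees]}.
    """
    nodes = list(nodes)
    leaf = {"nodes": sorted(nodes), "children": []}
    labels = find_retrieval_partition(nodes, edges)
    if labels is None:
        return leaf  # no significant internal structure: stop here

    groups = {}
    for v in nodes:
        groups.setdefault(labels[v], []).append(v)
    if len(groups) < 2:
        return leaf  # a one-group "partition" is not a division

    children = []
    for members in groups.values():
        inside = set(members)
        # Keep only the edges internal to this group (the induced subgraph).
        sub_edges = [(i, j) for i, j in edges if i in inside and j in inside]
        children.append(
            hierarchical_split(members, sub_edges, find_retrieval_partition)
        )
    return {"nodes": sorted(nodes), "children": children}


def leaves(tree):
    """List the node sets at the leaves, i.e. the final subgroups."""
    if not tree["children"]:
        return [tree["nodes"]]
    return [leaf for child in tree["children"] for leaf in leaves(child)]
```

The depth of the returned tree corresponds to the number of levels of division, and `leaves(tree)` gives the final subgroups.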
Our algorithm finds no retrieval state in the subgraphs, so it stops after one level of division. The same occurs in some small real-world networks, e.g. Zachary’s karate club. In some larger real-world networks, on the other hand, our algorithm repeatedly finds a retrieval state in the subgraphs, suggesting a deep hierarchical structure.

An example is the network of political blogs [34]. Our algorithm first finds two large communities corresponding to liberals and conservatives, and agreeing with the ground-truth labels on 95% of the nodes. But as shown in Fig. 4, it splits these into subcommunities, eventually finding a hierarchy 5 levels deep with a total of 14 subgroups (the shaded leaves of the tree in Fig. 4). We show the adjacency matrix with nodes ordered by this final partition on the right of Fig. 4, and the hierarchical structure is clearly visible. The modularities of the 2nd through 5th levels are 0.426, 0.331, 0.285, and 0.282 respectively. This decreasing modularity may explain why the algorithm did not immediately split the network all the way down to the subcommunities.

A nested SBM was used to explore hierarchical structure in [39], where the blog network was also reported to have hierarchical structure. Our results are slightly different, giving 14 rather than 17 subgroups, but the first 3 levels of subdivision are similar.

FIG. 4: Left, a hierarchical division of the political blog network [34]. We apply our technique recursively, looking for a retrieval state and optimizing the number of groups into which to split the community at each stage. We stop when no retrieval state is detected, indicating that the remaining groups have no statistically significant subcommunities. Each leaf denotes one node, the size indicates its degree, and the colors indicate different groups in the final division. Right, the adjacency matrix of the network ordered according to this partition.

D. Comparison with other algorithms

In this section we compare the performance of our algorithm with two popular algorithms: Louvain [9] and OSLOM [21]. In particular, OSLOM tries to focus on statistically significant communities.

Louvain gives partitions with modularity similar to that of our algorithm, but with a much larger number of groups, particularly on large networks. For example, on the Gnutella and Epinions networks [37], our algorithm finds $q^* = 7$ and $q^* = 4$ groups with modularity 0.517 and 0.429 respectively, while the Louvain method finds 66 and 949 groups with modularity 0.499 and 0.430 respectively. Thus our algorithm finds large-scale communities, with a modularity similar to the smaller communities found by Louvain. Of course, we emphasize that maximizing the modularity is not our goal: finding statistically significant communities is.

FIG. 5: Comparison of BP with Louvain and OSLOM on SBM networks with $n = 10^4$, $c = 6$, and $q = 6$. On the left, we show the normalized mutual information (NMI) between each algorithm’s results and the true partition as a function of $\epsilon$; the other algorithms’ NMI drops sharply well below the detectability transition at $\epsilon = 0.195$. On the right, we show the inferred number of groups on the giant component of an ER graph with $c = 4$. While our algorithm correctly finds $q^* = 1$, the other algorithms overfit, finding a growing number of small communities as $n$ increases. Each point is averaged over 20 instances.

We show results on synthetic networks in Fig. 5. On the left, we apply Louvain, OSLOM, and our algorithm to SBM networks with $q = 6$. We compute the normalized mutual information (NMI) [40] between the inferred partition and the planted one.
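For readers who want to reproduce this kind of comparison, the following Python sketch computes an NMI between two partitions. It uses one common normalization, $2 I(X;Y) / (H(X) + H(Y))$; we assume, but have not verified, that this matches the convention of [40]. The function name `nmi` and the convention of returning 1 when both partitions consist of a single group are our own choices.

```python
import math
from collections import Counter


def nmi(a, b):
    """Normalized mutual information 2 I(a; b) / (H(a) + H(b)).

    a, b : equal-length sequences of group labels, one entry per node.
    The two partitions may use different numbers of groups.
    """
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    count_ab = Counter(zip(a, b))

    def entropy(counts):
        return -sum(k / n * math.log(k / n) for k in counts.values())

    h_a, h_b = entropy(count_a), entropy(count_b)
    mutual = sum(
        k / n * math.log(k * n / (count_a[x] * count_b[y]))
        for (x, y), k in count_ab.items()
    )
    if h_a + h_b == 0:
        return 1.0  # both partitions are a single group (our convention)
    return 2 * mutual / (h_a + h_b)
```

Because the NMI depends only on the joint label counts, it is invariant under relabeling of the groups.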
(We use the NMI rather than the overlap because the number of groups given by OSLOM and Louvain is very different from that of the planted partition.) For Louvain and OSLOM, the NMI drops off well below the detectability transition. On the right, we show the number of groups that each algorithm infers for an ER graph with $c = 4$. Our algorithm correctly chooses $q = 1$, recognizing that this network has no internal structure. The other algorithms overfit, inferring a number of communities that grows with $n$. In the Appendix we report on experiments on benchmark networks with heavy-tailed degree distributions [41], with similar results.

II. DISCUSSION

We have presented a physics-based method for finding statistically significant communities. Rather than using an explicit generative or graphical model, it uses a popular measure of community structure, namely the modularity. It does not attempt to maximize the modularity, which is both computationally difficult and prone to overfitting. Instead it estimates the marginals of the Gibbs distribution using a scalable BP algorithm derived from the cavity method (see next section), and defines the retrieval partition by assigning each node to its most-likely community according to these marginals.

In essence, the algorithm looks for the consensus of many partitions with high modularity. When this consensus exists, it indicates statistically significant community structure, as opposed to random fluctuations. Moreover, by testing for the existence of this retrieval state, as opposed to a spin glass state where the algorithm fluctuates between many unrelated local optima, we can determine the correct number of groups, and decompose a network hierarchically.

We note that this algorithm is related to BP for the degree-corrected stochastic block model (DCSBM).
Specifically, for a fixed $\beta$, the modularity is linearly related to the log-likelihood of the DCSBM with particular parameters (see Appendix). However, our algorithm does not have to learn the parameters of the block model with an EM algorithm, or perform model selection between the stochastic block model and its degree-corrected variant [42]. To be clear, $\beta$ is still a tunable parameter that can be optimized, but the heuristic value $\beta = \beta^*$ appears to work well for a wide range of networks.

In addition to the detectability transition in the SBM, another well-known barrier to community detection is the resolution limit [43], where communities become difficult to find when their size is $O(\sqrt{n})$ or less. In the Appendix, we give some evidence that our hierarchical clustering algorithm overcomes this barrier. Namely, for the classic example of a ring of cliques, at the second level our algorithm divides the graph precisely into these cliques.

Another recent proposal for determining the number of groups is to use the number of real eigenvalues of the non-backtracking matrix outside the bulk of the spectrum [3]. For some networks, such as the political blogs, this gives a larger number than the $q^*$ we found here; it may be that, in some sense, this method detects not just top-level communities, but subcommunities deeper in the hierarchy. It would be interesting to perform a detailed comparison of the two methods.

Our approach can be extended to generalizations of the modularity, where the graph is weighted, or where a parameter $\gamma$ represents the relative importance of the expected number of internal edges [16]. It would also be interesting to apply BP to other objective functions, such as normalized cut or conductance, devising Hamiltonians from them and considering the resulting Gibbs distributions.
Finally, we note that rather than running BP once and using the resulting marginals, we could use decimation [51] to fix the labels of the most biased nodes, run BP again to update the marginals, and so on. This would increase the running time of the algorithm, but it may improve its performance. Another approach would be reinforcement [51], where we add external fields that point toward the likely configuration. We leave this for future work.

III. METHODS

A. Defining statistical significance

As described above, an ER random graph has many partitions with high modularity. However, these partitions are nearly uncorrelated with each other. In the language of disordered materials, the landscape of partitions is glassy: while the optimal one might be unique, there are many others whose modularity is almost as high, but which have a large Hamming distance from the optimum and from each other. If we define a Gibbs distribution on the partitions, we encounter either a paramagnetic state where the marginals are uniform, or a spin glass with replica symmetry breaking where we jump between local optima. In either case, focusing on any one of these optima is simply overfitting.

For networks such as the one on the right of Fig. 1, in contrast, there are many high-modularity partitions that are correlated with each other, and with the ground truth. As a result, the landscape has a smooth valley surrounding the ground truth. At a suitable temperature, the Gibbs distribution is in a retrieval phase with both low energy (high modularity) and high entropy, giving it a lower free energy than the paramagnetic state, with its marginals biased towards the ground truth. When BP converges to a fixed point, it finds a (local) minimum of the Bethe free energy, approximating this lower free energy phase.

We propose the existence of this retrieval phase as a physics-based definition of statistical significance.
When it exists, the retrieval partition defined by the maximum marginals is an optimal prediction for which nodes belong to which groups.

The idea of using the free energy to separate real community structure from random noise, and using the Gibbs marginals to define a partition, also appeared in [5, 6]. However, that work is based on a specific generative model, namely the stochastic block model, and the energy is (minus) the log-likelihood of the observed network. In contrast, we avoid explicit generative models, and focus directly on the modularity as a measure of community structure.

B. The cavity method and belief propagation

Our goal is to compute the marginal probability distribution that each node belongs to a given group and the free energy of the Gibbs distribution. We could do this using a Markov chain Monte Carlo (MCMC) algorithm. However, to obtain marginals we would need many independent samples, and to obtain the free energy we would need to sample at many different temperatures. Thus MCMC is prohibitively slow for our purposes.

Instead, for sparse networks, we can use Belief Propagation [44], known in statistical physics as the cavity method [45]. BP makes a conditional independence assumption, which is exact only on trees; however, in the regimes we will consider (the detectable regime of the stochastic block model, and typical real-world graphs), its estimates of the marginals are quite accurate. It also provides an estimate of the free energy, called the Bethe free energy, which is a function of one- and two-point marginals.

BP works with “messages” $\psi^{i \to k}_t$: these are estimates, sent from node $i$ to node $k$, of the marginal probability that $t_i = t$ based on $i$’s interactions with nodes $j \neq k$. The update equations for these messages are as follows:

\[
\psi^{i \to k}_t \propto \exp \left[ -\beta \frac{d_i}{2m} \theta_t + \sum_{j \in \partial i \setminus k} \log \left( 1 + \psi^{j \to i}_t (e^{\beta} - 1) \right) \right] . \tag{5}
\]

Here $\partial i$ denotes the set of $i$’s neighbors, and $\theta_t = \sum_{j=1}^{n} d_j \psi^j_t$ denotes an external field acting on nodes in group $t$, which we update after each BP iteration. We refer to the Appendix for detailed derivations of the BP update equations and Bethe free energy.

For $q$ groups and $m$ edges, each iteration of (A4) takes time $O(qm)$. If $q$ is fixed this is linear in the number of edges, and linear in the number of nodes when the network is sparse (i.e., when the average degree is constant). Moreover, these updates can be easily parallelized. Empirically, the number of iterations required to converge appears to depend very weakly on the network size, although in some cases it must grow at least logarithmically.

C. The factorized solution and local stability

Observe that the factorized solution, $\psi^{j \to i}_t = 1/q$, where each node is equally likely to be in each possible group, is always a fixed point of (A4). If BP converges to this solution, we cannot label the nodes better than chance, and the retrieval modularity is zero. This is the paramagnetic state.

There are two other possibilities: BP fails to converge, or it converges to a non-factorized fixed point, which we call the retrieval state. In the latter case, we can compute the marginals by

\[
\psi^{i}_t \propto \exp \left[ -\beta \frac{d_i}{2m} \theta_t + \sum_{j \in \partial i} \log \left( 1 + \psi^{j \to i}_t (e^{\beta} - 1) \right) \right] , \tag{6}
\]

and define the retrieval partition $\{\hat{t}\}$ that assigns each node to its most-likely community. This partition represents the consensus of the Gibbs distribution: it indicates that there are many high-modularity partitions that are correlated with each other. The retrieval modularity $Q(\{\hat{t}\})$ is then a good measure of the extent to which the network has statistically significant community structure.
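To make the update equations (5) and (6) concrete, here is a minimal pure-Python sketch of one way to iterate them. It is not the authors' implementation. The function name `modularity_bp`, the random initialization, the random asynchronous update order, the max-change convergence test, and the choice to refresh the field $\theta_t$ once per sweep are all assumptions made for illustration. The graph is assumed to be simple (no self-loops or repeated edges) with at least one edge, and groups are numbered from 0.

```python
import math
import random


def modularity_bp(n, edges, q, beta, max_iter=200, tol=1e-6, seed=0):
    """Iterate Eqs. (5)-(6); return (marginals, converged, sweeps)."""
    rng = random.Random(seed)
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    deg = [len(a) for a in nbrs]
    m = len(edges)
    w = math.exp(beta) - 1.0

    def normalized(v):
        s = sum(v)
        return [x / s for x in v]

    def softmax(logits):
        top = max(logits)  # subtract the max for numerical stability
        return normalized([math.exp(x - top) for x in logits])

    # msg[i][k] holds the message psi^{i->k}; start from random distributions.
    msg = [{k: normalized([rng.random() + 0.5 for _ in range(q)])
            for k in nbrs[i]} for i in range(n)]
    marg = [[1.0 / q] * q for _ in range(n)]
    theta = [sum(deg[j] * marg[j][t] for j in range(n)) for t in range(q)]

    order = list(range(n))
    for sweep in range(1, max_iter + 1):
        rng.shuffle(order)
        change = 0.0
        for i in order:
            # log(1 + psi^{j->i}_t (e^beta - 1)) for each neighbor j, group t
            logs = {j: [math.log1p(w * msg[j][i][t]) for t in range(q)]
                    for j in nbrs[i]}
            total = [-beta * deg[i] * theta[t] / (2.0 * m)
                     + sum(logs[j][t] for j in nbrs[i]) for t in range(q)]
            marg[i] = softmax(total)  # Eq. (6)
            for k in nbrs[i]:
                # Eq. (5): leave out the contribution of the recipient k.
                new = softmax([total[t] - logs[k][t] for t in range(q)])
                change = max(change,
                             max(abs(a - b) for a, b in zip(new, msg[i][k])))
                msg[i][k] = new
        # Refresh the external field theta_t from the current marginals.
        theta = [sum(deg[j] * marg[j][t] for j in range(n)) for t in range(q)]
        if change < tol:
            return marg, True, sweep
    return marg, False, max_iter
```

In this sketch, a run that returns `converged = False` would be read as the spin glass phase, uniform marginals as the paramagnetic state, and converged non-uniform marginals as a retrieval state. Each sweep costs $O(qm)$ operations, consistent with the scaling stated above.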
On the other hand, if BP does not converge, this means that neither the factorized solution nor any other fixed point is locally stable; the spin glass susceptibility diverges, and replica symmetry is broken. In other words, the space of partitions breaks into an exponential number of clusters, and BP jumps from one to another. The retrieval partition obtained using the current marginals will change to a very different partition if we run BP a bit longer, or if we perturb the initial BP messages slightly. In the spin glass phase, we are free to define a retrieval modularity from the current marginals, but it fluctuates rapidly, and does not represent a consensus of many partitions.

The linear stability of the factorized solution can be characterized by computing the derivatives of messages with respect to each other at the factorized fixed point. Using (A4), we find that $\partial \psi^{i\to k}_t / \partial \psi^{j\to i}_s = T_{st}$, where $T$ is the $q \times q$ matrix

$$T_{st} = \left. \frac{\partial \psi^{i\to k}_t}{\partial \psi^{j\to i}_s} \right|_{1/q} = \frac{e^{\beta} - 1}{e^{\beta} - 1 + q} \left( \delta_{st} - \frac{1}{q} \right). \tag{7}$$

Its largest eigenvalue (in magnitude) is

$$\lambda = \frac{e^{\beta} - 1}{e^{\beta} - 1 + q}. \tag{8}$$

On locally tree-like graphs with Poisson degree distributions and average degree $c$, the factorized fixed point is then unstable with respect to random noise whenever $c\lambda^2 > 1$. This is also known as the de Almeida-Thouless local stability condition [46], the Kesten-Stigum bound [29, 30], or the threshold for census or robust reconstruction [47, 48]. In our case, it shows that $\beta$ must exceed a critical $\beta^*$ given by (3). If the network has some other degree distribution but is otherwise random, (3) holds where $c$ is the average excess degree, i.e., the expected number of additional neighbors of the endpoint of a random edge.
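The condition $c\lambda^2 = 1$ can be solved for $\beta$ in closed form using (8): setting $\lambda = 1/\sqrt{c}$ gives $e^{\beta} - 1 = q/(\sqrt{c} - 1)$. The short Python sketch below evaluates this; the function names are hypothetical, and the closed form is derived here from (8) rather than copied from eq. (3) of the main text, with which it should agree.

```python
import math


def stability_eigenvalue(beta, q):
    """Largest eigenvalue (8) of the q x q matrix T at the factorized fixed point."""
    g = math.exp(beta) - 1.0
    return g / (g + q)


def beta_star(c, q):
    """Solve c * lambda(beta)^2 = 1 for beta; needs average (excess) degree c > 1."""
    if c <= 1.0:
        raise ValueError("c must exceed 1 for the instability to be reachable")
    return math.log(1.0 + q / (math.sqrt(c) - 1.0))


if __name__ == "__main__":
    b = beta_star(3.0, 2)
    # The product c * lambda^2 should come out as 1 at the critical beta.
    print(b, 3.0 * stability_eigenvalue(b, 2) ** 2)
```

For $c = 3$ and $q = 2$ this gives $\beta^* = \log(2 + \sqrt{3}) \approx 1.317$.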
If there is no statistically significant community structure, then BP has just two phases, the paramagnetic one and the spin glass: for $\beta < \beta^*$ it converges to the factorized fixed point, and for $\beta > \beta^*$ it does not converge at all. On the other hand, if there are statistically significant communities, then BP converges to a retrieval state in the range $\beta_R < \beta < \beta_{SG}$. Typically $\beta_R < \beta^*$ and $\beta^*$ is in the retrieval phase, since even if the factorized fixed point is locally stable, BP can still converge to a retrieval state if its free energy is lower than that of the paramagnetic solution. Thus we can test for statistically significant communities by running BP at $\beta = \beta^*$.

Note that our calculation of $\beta^*$ in (3) assumes that the network is random conditioned on its degree distribution; in principle $\beta^*$ could fall outside the retrieval phase for real-world networks. In that case, our heuristic method of setting $\beta = \beta^*$ fails, and it would be necessary to scan values of $\beta$ in the vicinity of $\beta^*$ for the retrieval state.

To estimate $\beta_R$, we again consider the linear stability of BP around the factorized fixed point; but now we consider arbitrary perturbations, as opposed to random noise. Let $T$ be the $q \times q$ matrix defined in (7). The matrix of derivatives of all $2qm$ messages with respect to each other is a tensor product $T \otimes B$, where $B$ is the non-backtracking matrix [3]. The adaptive external field in the BP equations suppresses eigenvectors where every node is in the same community. As a result, the relevant eigenvalue is $\lambda\mu$, where $\lambda$ is the largest eigenvalue of $T$ and $\mu$ is the second-largest eigenvalue of $B$, and the factorized fixed point is unstable whenever $\lambda\mu > 1$. For networks generated by the SBM, we have [3]

$$\mu = \frac{c\,(1 - \epsilon)}{1 + (q - 1)\,\epsilon}. \tag{9}$$

Combining this with (8) and setting $\lambda\mu = 1$ gives eq. (4).
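The same algebra gives $\beta_R$ explicitly: with $\lambda$ from (8) and $\mu$ from (9), the condition $\lambda\mu = 1$ becomes $e^{\beta} - 1 = q/(\mu - 1)$. The Python sketch below evaluates this under the SBM assumptions of the text; the function names are hypothetical, and the closed form is obtained here by solving $\lambda\mu = 1$, so it should be read as a restatement of eq. (4) rather than a quotation of it.

```python
import math


def stability_eigenvalue(beta, q):
    """Largest eigenvalue (8) of T at the factorized fixed point."""
    g = math.exp(beta) - 1.0
    return g / (g + q)


def sbm_mu(c, q, eps):
    """Second eigenvalue (9) of the non-backtracking matrix for the SBM."""
    return c * (1.0 - eps) / (1.0 + (q - 1) * eps)


def beta_retrieval(c, q, eps):
    """Solve lambda(beta) * mu = 1 for beta; only meaningful when mu > 1."""
    mu = sbm_mu(c, q, eps)
    if mu <= 1.0:
        raise ValueError("mu must exceed 1 for lambda * mu = 1 to have a solution")
    return math.log(1.0 + q / (mu - 1.0))


if __name__ == "__main__":
    # Parameters loosely matching the synthetic examples: c = 3, q = 2.
    for eps in (0.01, 0.075, 0.15):
        print(eps, beta_retrieval(3.0, 2, eps))
```

Note that this sketch does not check whether $\mu$ lies outside the bulk of the spectrum of $B$, so it says nothing by itself about detectability.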
However, this assumes that the corresponding eigenvector of $B$ is correlated with the community structure, so that perturbing BP away from the factorized fixed point will lead to the retrieval state. This is true as long as $\mu$ is outside the bulk of $B$'s eigenvalues, which are confined to a disk of radius $\sqrt{c}$ in the complex plane [3]; if it is inside the bulk, then the community structure is washed out by isotropic eigenvectors and becomes hard to find. Thus the communities are detectable as long as $\mu > \sqrt{c}$. This is equivalent to $\beta_R < \beta^*$, or equivalently $\epsilon < \epsilon^*$. Thus the retrieval state exists all the way down to the Kesten-Stigum transition where $\epsilon = \epsilon^*$, $\mu = \sqrt{c}$, and $\beta_R = \beta^*$. At that point, the relevant eigenvalue crosses into the bulk, and the retrieval phase disappears.

We note that the paramagnetic, retrieval, and spin glass states were also studied in [49], using a generalized Potts model and a heat bath MCMC algorithm. However, their Hamiltonian depends on a tunable cut-size parameter, rather than on a general measure of community structure such as the modularity. Moreover, it is difficult to obtain analytical results on phase transitions using MCMC algorithms, while the stability of BP fixed points is quite tractable.

D. Defining the spin glass phase

While we have identified the spin glass phase with the non-convergence of belief propagation, the true phase diagram is potentially more complicated. The spin glass phase is defined by the divergence of the spin glass susceptibility. If this phase appears continuously, then in sparse problems this is equivalent to the sensitivity of the BP messages to noise, i.e., whether BP converges to a stable fixed point. However, if the spin glass phase appears discontinuously, it could be that BP converges even though the true susceptibility diverges (see e.g. [50]).
We expect this to happen above the Nishimori line when the "hard but detectable" phase exists [5], when there is a retrieval state with lower free energy than the factorized fixed point but with an exponentially small basin of attraction, so that BP starting with random messages fails to converge to the true minimum of the free energy. Detecting this spin glass phase would require us to go beyond the replica-symmetric BP equations used here to equations with one-step replica symmetry breaking [51]. In the assortative case of the stochastic block model, the hard-but-detectable phase exists for $q \geq 5$. Happily, the corresponding range of parameters is quite narrow; nevertheless, more work on this needs to be done.

A C++ implementation can be found at [52].

Acknowledgments

We are grateful to Silvio Franz, Florent Krzakala, Mark Newman, Federico Ricci-Tersenghi, Christophe Schulke, and Lenka Zdeborová for helpful discussions, and to Tiago de Paula Peixoto for drawing Fig. 4 (left) using his software at http://graph-tool.skewed.de/. This work was supported by AFOSR and DARPA under grant FA9550-12-1-0432.

[1] Von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17:395.
[2] Newman MEJ (2006) Finding community structure in networks using the eigenvectors of matrices. Phys Rev E 74:036104.
[3] Krzakala F, Moore C, Mossel E, Neeman J, Sly A, Zdeborová L, Zhang P (2013) Spectral redemption in clustering sparse networks. Proc Natl Acad Sci USA 110:20935.
[4] Hastings MB (2006) Community detection as an inference problem. Phys Rev E 74:035102.
[5] Decelle A, Krzakala F, Moore C, Zdeborová L (2011) Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Phys Rev E 84:066106.
[6] Decelle A, Krzakala F, Moore C, Zdeborová L (2011) Inference and phase transitions in the detection of modules in sparse networks. Phys Rev Lett 107:065701.
[7] Karrer B, Newman MEJ (2011) Stochastic blockmodels and community structure in networks. Phys Rev E 83:016107.
[8] Clauset A, Newman MEJ, Moore C (2004) Finding community structure in very large networks. Phys Rev E 70:066111.
[9] Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech 2008:P10008.
[10] Rosvall M, Bergstrom CT (2008) Maps of random walks on complex networks reveal community structure. Proc Natl Acad Sci USA 105:1118.
[11] Fortunato S (2010) Community detection in graphs. Physics Reports 486:75.
[12] Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69:026113.
[13] Newman MEJ (2004) Fast algorithm for detecting community structure in networks. Phys Rev E 69:066133.
[14] Duch J, Arenas A (2005) Community detection in complex networks using extremal optimization. Phys Rev E 72:027104.
[15] Guimera R, Sales-Pardo M, Amaral LAN (2004) Modularity from fluctuations in random graphs and complex networks. Phys Rev E 70:025101.
[16] Reichardt J, Bornholdt S (2006) Statistical mechanics of community detection. Phys Rev E 74:016110.
[17] Zdeborová L, Boettcher S (2010) A conjecture on the maximum cut and bisection width in random regular graphs. J Stat Mech 2010:P02020.
[18] Sulc P, Zdeborová L (2010) Belief propagation for graph partitioning. J Phys A: Math Gen 43:B5003.
[19] Good BH, de Montjoye YA, Clauset A (2010) Performance of modularity maximization in practical contexts. Phys Rev E 81:046106.
[20] Lancichinetti A, Radicchi F, Ramasco J (2010) Statistical significance of communities in networks. Phys Rev E 81:046110.
[21] Lancichinetti A, Radicchi F, Ramasco J, Fortunato S (2011) Finding statistically significant communities in networks. PloS One 6:e18961.
[22] Wilson JD, Wang S, Mucha PJ, Bhamidi S, Nobel AB (2009) A testing based extraction algorithm for identifying significant communities in networks. Oxford University Press.
[23] Iba Y (1999) The Nishimori line and Bayesian statistics. J Phys A: Math Gen 32:3875.
[24] Clauset A, Moore C, Newman MEJ (2008) Hierarchical structure and the prediction of missing links in networks. Nature 453:98.
[25] Lancichinetti A, Fortunato S (2012) Consensus clustering in complex networks. Nature Scientific Reports 2:336.
[26] Mossel E, Neeman J, Sly A (2012) Stochastic block models and reconstruction.
[27] Massoulie L (2013) Community detection thresholds and the weak Ramanujan property.
[28] Mossel E, Neeman J, Sly A (2013) A proof of the block model threshold conjecture.
[29] Kesten H, Stigum BP (1966) A limit theorem for multidimensional Galton-Watson processes. Ann Math Stat 37:1211.
[30] Kesten H, Stigum BP (1966) Additional limit theorems for indecomposable multidimensional Galton-Watson processes. Ann Math Stat 37:1463.
[31] Nishimori H (2012) Statistical Physics of Spin Glasses and Information Processing. Oxford University Press.
[32] Montanari A (2008) Estimating random variables from random sparse observations. European Transactions on Telecommunications 19:385.
[33] Zachary WW (1977) An information flow model for conflict and fission in small groups. Journal of Anthropological Research :452–473.
[34] Adamic LA, Glance N (2005) The political blogosphere and the 2004 US election: divided they blog. Proceedings of the 3rd Intl Workshop on Link Discovery 452–473.
[35] Lusseau D, Schneider K, Boisseau OJ, Haase P, Slooten E, Dawson SM (2003) The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behavioral Ecology and Sociobiology 54:396.
[36] Krebs V Social Network Analysis software & services for organizations, communities, and their consultants, www.orgnet.com/.
Accessed September 26, 2014.
[37] Leskovec J, Lang KJ, Dasgupta A, Mahoney MW (2009) Community structure in large networks: natural cluster sizes and the absence of large well-defined clusters. Internet Math 6:29.
[38] Sales-Pardo M, Guimera R, Moreira AA, Amaral LAN (2007) Extracting the hierarchical organization of complex systems. Proc Natl Acad Sci USA 104:15224.
[39] Peixoto TP (2014) Hierarchical block structures and high-resolution model selection in large networks. Phys Rev X 4:011047.
[40] Danon L, Diaz-Guilera A, Duch J, Arenas A (2005) Comparing community structure identification. J Stat Mech 2005:P09008.
[41] Lancichinetti A, Fortunato S, Radicchi F (2008) Benchmark graphs for testing community detection algorithms. Phys Rev E 78:046110.
[42] Yan X, Jensen JE, Krzakala F, Moore C, Shalizi CR, Zdeborová L, Zhang P, Zhu Y (2014) Model selection for degree-corrected block models. J Stat Mech 2014:P05007.
[43] Fortunato S, Barthelemy M (2007) Resolution limit in community detection. Proc Natl Acad Sci USA 104:36.
[44] Yedidia J, Freeman W, Weiss Y (2003) Understanding belief propagation and its generalizations. Exploring Artificial Intelligence in the New Millennium (Morgan Kaufmann Publishers Inc., San Francisco).
[45] Mézard M, Parisi G (2001) The Bethe lattice spin glass revisited. Eur Phys J B 20:217.
[46] De Almeida J, Thouless D (1978) Stability of the Sherrington-Kirkpatrick solution of a spin glass model. J Phys A: Math Gen 11:983.
[47] Mézard M, Montanari A (2006) Reconstruction on trees and spin glass transition. J Stat Phys 124:1317.
[48] Janson S, Mossel E (2004) Robust reconstruction on trees is determined by the second eigenvalue. Ann Prob :2630–2649.
[49] Hu D, Ronhovde P, Nussinov Z (2012) Phase transitions in random Potts systems and the community detection problem. Phil Mag 92:406.
[50] Zdeborová L (2009) Statistical physics of hard optimization problems. Acta Phys Slov 59:169.
[51] Mézard M, Montanari A (2009) Information, Physics, and Computation. Oxford University Press.
[52] A C++ implementation of our algorithm can be found at http://panzhang.net.

Appendix A: Belief Propagation equation and Bethe free energy

In this section we derive the BP update equations appearing in the main text. BP works with "messages" $\psi^{i\to k}_t$: these are estimates, sent from node $i$ to node $k$, of the marginal probability that $t_i = t$ based on $i$'s interactions with nodes $j \neq k$. If the Hamiltonian is $-mQ$, the update equations for these messages are as follows:

$$\begin{aligned}
\psi^{i\to k}_t &= \frac{1}{Z^{i\to k}} \prod_{j \in \partial i \setminus k} \sum_{s=1}^{q} e^{\beta \delta_{st}} \psi^{j\to i}_s \prod_{j \neq i,k} \sum_{s=1}^{q} e^{-\beta \frac{d_i d_j}{2m} \delta_{st}} \psi^{j\to i}_s \\
&= \frac{1}{Z^{i\to k}} \prod_{j \in \partial i \setminus k} \left[ 1 + \psi^{j\to i}_t \left(e^{\beta} - 1\right) \right] \prod_{j \neq i,k} \left[ 1 + \psi^{j\to i}_t \left(e^{-\beta \frac{d_i d_j}{2m}} - 1\right) \right].
\end{aligned} \tag{A1}$$

Here $Z^{i\to k}$ is simply a normalization factor, and $\partial i$ denotes the neighborhood of node $i$. The BP estimate of the marginal probability $\psi^i_t = \Pr[t_i = t]$ is then

$$\begin{aligned}
\psi^{i}_t &= \frac{1}{Z^{i}} \prod_{j \in \partial i} \sum_{s=1}^{q} e^{\beta \delta_{st}} \psi^{j\to i}_s \prod_{j \neq i} \sum_{s=1}^{q} e^{-\beta \frac{d_i d_j}{2m} \delta_{st}} \psi^{j\to i}_s \\
&= \frac{1}{Z^{i}} \prod_{j \in \partial i} \left[ 1 + \psi^{j\to i}_t \left(e^{\beta} - 1\right) \right] \prod_{j \neq i} \left[ 1 + \psi^{j\to i}_t \left(e^{-\beta \frac{d_i d_j}{2m}} - 1\right) \right],
\end{aligned} \tag{A2}$$

which is the same as (A1) except that we remove the condition $j \neq k$. We can also estimate the two-point marginals, and in particular, the probability that two neighboring nodes belong to the same group. If $\langle ij \rangle \in E$, the BP estimate of the probability that $t_i = t$ and $t_j = s$ is

$$\psi^{ij}_{st} = \frac{1}{Z^{ij}}\, e^{\beta \delta_{st}}\, \psi^{j\to i}_s\, \psi^{i\to j}_t. \tag{A3}$$

The update equations (A1) involve $qn^2$ messages: every node interacts with every other one, not just with its neighbors. However, in the sparse case we can simplify the effect of non-neighbors, by replacing them with an external field as in [5, 6]. If $k \notin \partial i$ and $d_i, d_k \ll \sqrt{m}$, we have

$$\psi^{i}_t = \psi^{i\to k}_t \sum_{s} e^{-\beta \frac{d_i d_k}{2m} \delta_{st}} \psi^{k\to i}_s \approx \psi^{i\to k}_t \left( 1 - \beta \frac{d_i d_k}{2m} \psi^{k\to i}_t \right) \approx \psi^{i\to k}_t.$$
In that case, we can identify the messages $\psi^{i\to k}_t$ that $i$ sends to its non-neighbors $k$ with its marginal $\psi^i_t$. Then (A1) simplifies to

$$\begin{aligned}
\psi^{i\to k}_t &= \frac{1}{Z^{i\to k}} \prod_{j \in \partial i \setminus k} \left[ 1 + \psi^{j\to i}_t \left(e^{\beta} - 1\right) \right] \prod_{j \neq i,k} \left[ 1 + \psi^{j}_t \left(e^{-\beta \frac{d_i d_j}{2m}} - 1\right) \right] \\
&\approx \frac{1}{Z^{i\to k}} \exp\left[ -\frac{\beta d_i}{2m}\,\theta_t + \sum_{j \in \partial i \setminus k} \log\left( 1 + \psi^{j\to i}_t \left(e^{\beta} - 1\right) \right) \right],
\end{aligned} \tag{A4}$$

where

$$\theta_t = \sum_{j=1}^{n} d_j \psi^{j}_t \tag{A5}$$

denotes an external field acting on nodes in group $t$, which we update after each BP iteration. Iterating (A4) now has computational complexity $O(qm)$, which is linear in the number of edges when $q$ is fixed.

The Bethe free energy of a BP fixed point is a function of the messages:

$$f_{\mathrm{Bethe}} = -\frac{1}{n\beta} \left[ \sum_{i} \log Z^{i} - \sum_{\langle ij \rangle \in E} \log Z^{ij} + \frac{\beta}{4m} \sum_{t} \theta_t^2 \right], \tag{A6}$$

where $Z^{i}$ and $Z^{ij}$ are the normalization constants for the one- and two-point marginals appearing in (A2) and (A3). BP fixed points are also stationary points of the Bethe free energy [44].

Observe that the factorized solution, $\psi^{j\to i}_t = 1/q$, where each node is equally likely to be in each possible group, is always a fixed point of the BP equations (A4). Assuming it does not get stuck in a local minimum, BP converges to a retrieval state whenever the Bethe free energy of that state is less than that of the factorized state. If the network has average degree $c$, the free energy of the factorized state is simply

$$f^{\mathrm{fact}}_{\mathrm{Bethe}} = -\frac{1}{\beta} \left[ \log q + \frac{c}{2} \log\left( 1 - \frac{1}{q} + \frac{e^{\beta}}{q} \right) - \frac{c\beta}{2q} \right].$$

In Fig. 6 we compare the free energy, convergence time, and retrieval modularity for networks generated by the stochastic block model at three different values of $\epsilon$, alongside an Erdős-Rényi graph of the same average degree $c = 3$. For small enough $\beta$, their free energies are all equal to $f^{\mathrm{fact}}_{\mathrm{Bethe}}$, since they are all in the paramagnetic phase. For each value of $\epsilon$, there is a critical $\beta_R$ at which the free energy splits off from the others, where BP makes a transition to a retrieval state with $f_{\mathrm{Bethe}} < f^{\mathrm{fact}}_{\mathrm{Bethe}}$.
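The factorized free energy above is easy to evaluate numerically, which gives a baseline against which a measured Bethe free energy can be compared. The following Python sketch does only that; the function name `f_fact_bethe` is hypothetical, and computing $f_{\mathrm{Bethe}}$ itself from (A6) would additionally require the converged messages.

```python
import math


def f_fact_bethe(beta, q, c):
    """Bethe free energy per node of the factorized fixed point.

    beta: inverse temperature (must be nonzero), q: number of groups,
    c: average degree of the network.
    """
    entropy_term = math.log(q)
    edge_term = 0.5 * c * math.log(1.0 - 1.0 / q + math.exp(beta) / q)
    field_term = c * beta / (2.0 * q)
    return -(entropy_term + edge_term - field_term) / beta


if __name__ == "__main__":
    # Parameters of the synthetic networks discussed in the text: q = 2, c = 3.
    for beta in (0.6, 1.0, 1.4, 1.8):
        print(beta, f_fact_bethe(beta, 2, 3.0))
```

A BP run whose free energy (A6) falls below this value at the same $\beta$ is, by the criterion above, in a retrieval state rather than the paramagnetic one.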
At this transition the retrieval modularity jumps to a nonzero value, indicating community structure, and the convergence time diverges. For the Erdős-Rényi graph, the apparent modularity also jumps, but at $\beta^* = \beta_{SG}$ it enters the spin glass phase rather than the retrieval phase: BP fails to converge and the retrieval modularity fluctuates, indicating partitions that are uncorrelated with each other.

FIG. 6: Left: Free energy (solid) and convergence time (dashed) as a function of $\beta$ for networks generated by the stochastic block model for three different values of $\epsilon = c_{out}/c_{in}$ (0.01, 0.075, and 0.15), also compared with an Erdős-Rényi graph. Right: retrieval modularity for these networks. All networks have size $n = 10^4$ and average degree $c = 3$. The networks generated by the SBM have $q = 2$ groups of equal size.

Appendix B: Relation with the degree-corrected stochastic block model

The degree-corrected stochastic block model (DCSBM) was introduced in [7] to overcome the fact that the SBM typically places low-degree and high-degree vertices into different groups, since it expects the degree distribution within each group to be Poisson. The DCSBM's parameters are the expected node degrees $\{d_i\}$ and a $q \times q$ matrix of parameters $\omega_{rs}$. Given a partition $\{t\}$, the number of edges $A_{ij}$ between each pair $\langle ij \rangle$ is Poisson-distributed with mean $d_i d_j \omega_{t_i, t_j}$. In the simple graph case where $A_{ij} = 1$ if $\langle ij \rangle \in E$ and $A_{ij} = 0$ otherwise, the log-likelihood of the network is then

$$L(\{t\}) = \log P(G \mid \{\omega_{ab}\}, \{t\}) = \log \left[ \prod_{\langle ij \rangle \in E} d_i d_j \omega_{t_i t_j} \prod_{\langle ij \rangle} e^{-d_i d_j \omega_{t_i t_j}} \right]. \tag{B1}$$

If $\omega_{rs} = \omega_{in}$ for $r = s$ and $\omega_{out}$ for $r \neq s$, the likelihood can be written as

$$L = \sum_{\langle ij \rangle} \left[ \log(d_i d_j \omega_{out}) - d_i d_j \omega_{out} \right] + \left( \log \frac{\omega_{in}}{\omega_{out}} \right) \left[ \sum_{\langle ij \rangle \in E} \delta_{t_i t_j} - \frac{\omega_{in} - \omega_{out}}{\log(\omega_{in}/\omega_{out})} \sum_{\langle ij \rangle} d_i d_j \delta_{t_i t_j} \right]. \tag{B2}$$

Comparing with the definition of modularity, if we set $\omega_{in}$ and $\omega_{out}$ such that

$$\beta = \log \frac{\omega_{in}}{\omega_{out}} \quad \text{and} \quad 2m = \frac{\log(\omega_{in}/\omega_{out})}{\omega_{in} - \omega_{out}}, \tag{B3}$$

then the second term in (B2) is $\beta m Q(\{t\})$. Since the first term in (B2) does not depend on $\{t\}$, we have $e^{L(\{t\})} \propto e^{\beta m Q(\{t\})}$, and the Gibbs distribution is exactly the Gibbs distribution of partitions in the DCSBM. Thus, for any fixed $\beta$, there are parameters $\omega_{in}, \omega_{out}$ of the DCSBM such that these distributions have the same free energy and the same ground state.

Belief propagation on the DCSBM was described in [42], and one can optimize the parameters $\omega_{in}, \omega_{out}$ through an expectation-maximization (EM) algorithm analogous to [5, 6]. However, our approach is different in several ways.

• We define community structure directly in terms of a classic measure, the modularity, as opposed to the log-likelihood of a generative model.

• Rather than having to fit the parameters of the DCSBM with an EM algorithm, we have a single temperature parameter $\beta$. We can usually detect communities by setting $\beta = \beta^*$ as in the main text; at worst, we just have to scan a small region.

• For real-world networks the retrieval modularity appears to be a good guide to the number of groups $q^*$, while the free energy of the (DC)SBM continues to decrease for $q > q^*$.
• Our approach appears to work equally well for networks with Poisson degree distributions (generated by the SBM) and those with heavy-tailed degree distributions, such as the LFR benchmark [41] and the network of political blogs, where the DCSBM does much better than the SBM [7]. In particular, we have no need to do model selection between the SBM and the DCSBM, as was done using the Bethe free energy in [42].

Appendix C: The Nishimori line and the optimal temperature

When data is produced by an underlying generative model, inference of the latent parameters can be done optimally along the Nishimori line [23, 31], where the Gibbs distribution is exactly the posterior distribution of the latent parameters (in this case the group labels or partitions). If the network is generated by the DCSBM, then (B3) gives a $\beta_{Nishimori}$ that corresponds to the correct parameters on the Nishimori line. Determining the parameters, and therefore $\beta_{Nishimori}$, could be done with an EM algorithm as in [5, 6], but our goal is to avoid this additional learning step. Moreover, if the network is not actually generated by the DCSBM, there is a priori no value of $\beta$ that corresponds to the Nishimori line, and no way to determine the optimal $\beta$ without access to the ground truth.

FIG. 7: The phase diagram from the main text for networks generated by the stochastic block model, with the approximate Nishimori line $\beta_{Nishimori} = -\log \epsilon$ added (blue). Replica symmetry breaking cannot occur on the Nishimori line, and indeed it avoids the spin-glass phase. Inference at $\beta_{Nishimori}$ would be optimal, but it would require us to learn, or infer, the correct value of the parameter $\epsilon$. [Axes: $\epsilon = p_{out}/p_{in}$ (horizontal) and $\beta$ (vertical). Regions: paramagnetic (P), retrieval (R), spin glass (SG). Curves: $\beta_R(\epsilon)$ from theory and from experiments, $\beta_{SG}$, and the approximate Nishimori line; the critical point $(\epsilon^*, \beta^*)$ is marked.]
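The correspondence (B3) between $\beta$ and the DCSBM parameters can be inverted in closed form: from $\omega_{in} = e^{\beta} \omega_{out}$ and $\omega_{in} - \omega_{out} = \beta/(2m)$ one gets $\omega_{out} = \beta / \left(2m\,(e^{\beta} - 1)\right)$. The Python sketch below implements both directions under the assortative two-parameter form of $\omega_{rs}$ used in Appendix B; the function names are hypothetical and it assumes $\beta > 0$.

```python
import math


def dcsbm_params_from_beta(beta, m):
    """Return (omega_in, omega_out) satisfying (B3) for a given beta > 0 and m edges."""
    if beta <= 0.0 or m <= 0:
        raise ValueError("this sketch assumes beta > 0 and m > 0")
    omega_out = beta / (2.0 * m * math.expm1(beta))  # expm1(x) = e^x - 1
    omega_in = math.exp(beta) * omega_out
    return omega_in, omega_out


def beta_from_dcsbm_params(omega_in, omega_out):
    """First relation in (B3): beta = log(omega_in / omega_out)."""
    return math.log(omega_in / omega_out)


if __name__ == "__main__":
    w_in, w_out = dcsbm_params_from_beta(1.2, 500)
    # Both relations in (B3) should be recovered: beta and 2m.
    print(beta_from_dcsbm_params(w_in, w_out))
    print(math.log(w_in / w_out) / (w_in - w_out))
```

This only illustrates the algebra of (B3); it does not estimate the parameters from data, which is the learning step the approach described here is designed to avoid.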
However, for synthetic networks generated by the SBM, we can construct an approximate Nishimori line by ignoring the difference between the SBM and the DCSBM, i.e., by assuming that the expected degrees are all the same. This gives $\beta_{Nishimori} = \log(c_{in}/c_{out}) = -\log \epsilon$. In Fig. 7 we show the phase diagram from the main text with this approximate Nishimori line added. It passes through the critical point $(\epsilon^*, \beta^*)$ (one can check analytically that $\beta^* = -\log \epsilon^*$) and avoids the spin-glass phase, passing directly from the paramagnetic phase to the retrieval phase. This recovers the fact that replica symmetry breaking cannot occur on the Nishimori line [32].

Appendix D: Choosing the number of groups

Choosing the number $q$ of groups in a network is a classic model selection problem. Setting $q$ by maximizing the modularity is a widely used heuristic in the network literature; however, as we have already seen, it is prone to overfitting. For example, the maximum modularity for an Erdős-Rényi graph is an increasing function of $q$, while the correct model has $q = 1$. Similarly, in the stochastic block model the likelihood increases, or the ground state energy decreases, until every node is assigned to its own group.

One approach [5, 6] is to use the free energy rather than the ground state energy. In essence, the entropic term penalizes overfitting, and gives us the total likelihood of the model summed over all partitions, as opposed to the likelihood of the best partition. This approach works well on synthetic graphs: the free energy decreases until we reach the correct number of groups, after which it stays roughly constant. However, on real-world networks the free energy continues to decrease with $q$, for example as shown in Fig. 8 of [5]. Thus, for networks not generated by the SBM, it is not clear that this method works.

FIG. 8: Retrieval modularity (blue ×) and BP convergence time (red +) of the karate club network with 2 groups (top left), 3 groups (top right), and 4 groups (bottom). With $q = 2$, which is the ground truth value, the system has a very strong community structure, represented by a large retrieval phase starting at $\beta_R = 0.565$. With $q = 3$, the retrieval phase exists between $\beta_R = 0.79$ and $\beta_{SG} = 1.35$; compare Fig. 2 (right) in the main text. With $q = 4$ groups, the retrieval phase becomes even narrower, between $\beta_R = 0.97$ and $\beta_{SG} = 1.3$.

Here we propose to use the retrieval modularity $Q(\{\hat{t}\})$ as a criterion for choosing $q$.
Namely , w e claim that Q ( { ˆ t } ) increases with q until w e reach the correct v alue q ∗ . F or q > q ∗ , either Q ( { ˆ t } ) stays the same, or the retriev al phase disapp ears and we enter the spin glass phase. In Fig. 8 we plot Q ( { ˆ t } ) and BP conv ergence time for the k arate club net work with diﬀeren t v alues of q . With q = 2, i.e., the ground-truth num b er of groups, the retriev al phase is v ery large. F or larger q , the retriev al phase b ecomes narrow er, and Q ( { ˆ t } ) do es not increase. Note the similarity with Fig. 2 (righ t) in the main text. In Fig. 9, we plot Q ( { ˆ t } ) for diﬀerent v alues of q as a function of β for three netw orks with known comm unity structure: a syn thetic netw ork generated by the SBM with q ∗ = 4, the k arate club with q ∗ = 2 [33], and a netw ork of p olitical b ooks with q ∗ = 3 [36]. In each case, Q ( { ˆ t } ) stops growing at q = q ∗ , and is nearly indep endent of β throughout the retriev al phase. (T o deal with ﬂuctuations, in practice we do not increase q unless the retriev al mo dularit y increases by at least some threshold v alue.) Thus our metho d giv es the correct num b er of communities, rather than o verﬁtting. Note that here q ∗ refers to the top level of organization in the netw ork. In the main text, we discuss using our approac h to recursively divide communities into sub comm unities. In that case, we use this pro cedure to determine the num b er q ∗ of sub comm unities we should split the net work into at each stage, and stop splitting when we reac h comm unities with q ∗ = 1. App endix E: Additional comparisons with Louv ain and OSLOM In Fig. 10 we show comparisons b etw een our BP algorithm, Louv ain [9], and OSLOM [21] on netw orks with pow er- la w degree distributions. On the left, the graphs are generated b y the LFR benchmark pro cess [41]. W e sho w the normalized mutual information [40] as a function of the mixing parameter µ . 
As for the SBM graphs shown in the main text, there is a parameter range where BP achieves a higher NMI than the other algorithms. On the right, we show results for a network with no community structure, where the degree distribution follows a power law with exponent $-2$. While BP correctly chooses $q^* = 1$ as the number of groups, the other algorithms overfit, finding a number of communities that grows with the network size. These results are similar to those shown in Fig. 5 of the main text.

FIG. 9: Retrieval modularity as a function of $q$ for three networks where the number of groups is known: a network generated by the stochastic block model with $q^* = 4$, $n = 10^4$, and $\epsilon = 0.1$ (top left), the karate club with $q^* = 2$ (top right), and the network of political books with $q^* = 3$ (bottom). In each case, for $q > q^*$ the retrieval modularity stops growing until the spin glass phase appears.

Appendix F: The resolution limit

In this section we describe results of our algorithm on the ring-of-cliques network, which is the standard example of the resolution limit [43]. This network has size $n = ab$; it consists of $a$ cliques, each of which is composed of $b$ nodes, and each clique is connected to each of its neighboring cliques by a single link. Thus the intuitively correct partition of the network puts each clique into its own group. However, when $b$ is sufficiently small compared to $a$, maximizing the modularity forces us to combine multiple cliques [43]. For example, if $a = 24$ and $b = 5$, the correct partition with 24 groups has modularity 0.8674, while the division with 12 groups of 2 cliques each has modularity 0.8712. As a consequence, maximizing the modularity fails to divide the network correctly into the cliques.

In Fig. 11 we plot the dendrograms obtained by our hierarchical clustering algorithm starting from 3 different initial conditions (from top to bottom). All three dendrograms have 2 levels below the root. The first split creates groups consisting of multiple cliques, but the second split correctly assigns each clique to its own group. At that point the algorithm concludes that the cliques have no internal structure, and it stops subdividing. This suggests that our hierarchical clustering algorithm may be able to avoid the resolution limit.
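
The two modularity values quoted for $a = 24$ and $b = 5$ in Appendix F can be checked directly. The following is a minimal Python sketch, assuming the standard per-community form of the modularity, $Q = \sum_g [\ell_g/m - (D_g/2m)^2]$, where $\ell_g$ is the number of edges inside group $g$ and $D_g$ is the total degree of its nodes. The function name and the restriction to groups of consecutive cliques are illustrative choices, not taken from the original text.

```python
def ring_of_cliques_modularity(a, b, cliques_per_group=1):
    """Modularity of a ring of `a` cliques with `b` nodes each, when every
    group consists of `cliques_per_group` consecutive cliques.

    Assumes the standard form Q = sum_g [ l_g / m - (D_g / (2 m))**2 ].
    """
    if a < 3 or b < 2:
        raise ValueError("need at least 3 cliques of at least 2 nodes each")
    if a % cliques_per_group != 0:
        raise ValueError("cliques_per_group must divide a")
    k = cliques_per_group
    edges_per_clique = b * (b - 1) // 2
    # Total edges: all clique edges plus one link between each adjacent pair.
    m = a * edges_per_clique + a
    n_groups = a // k
    # Links inside one group: k - 1 between consecutive cliques, unless a
    # single group covers the whole ring, in which case all a links are inside.
    internal_links = a if k == a else k - 1
    internal_edges = k * edges_per_clique + internal_links
    # Each clique contributes b(b-1) to the degree sum from its own edges,
    # plus 2 from the links to its two neighbouring cliques.
    degree_sum = k * (b * (b - 1) + 2)
    return n_groups * (internal_edges / m - (degree_sum / (2 * m)) ** 2)


q_cliques = ring_of_cliques_modularity(24, 5, cliques_per_group=1)
q_pairs = ring_of_cliques_modularity(24, 5, cliques_per_group=2)
print(f"{q_cliques:.4f} {q_pairs:.4f}")  # prints "0.8674 0.8712"
```

Under this assumption the merged partition scores slightly higher than the one-clique-per-group partition, which is the resolution limit described in the text.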

FIG. 10: Comparison of BP, the Louvain method, and OSLOM on benchmark networks with power-law degree distributions. On the left, networks are LFR benchmarks with $n = 10^4$ and $c = 4$. The distribution of community sizes follows a power law with exponent $-1$, ranging from 200 to 400. The degree distribution is a power law with exponent $-2$, and the maximum degree is 30. We show the normalized mutual information (NMI) as a function of the mixing parameter $\mu$; there is a range of $\mu$ where BP achieves a higher NMI than the other algorithms. On the right, we show results on a random graph with no community structure, with a power-law degree distribution with exponent $-2$ and mean $c = 6$. Here BP correctly chooses $q^* = 1$ for the number of groups, while the other algorithms overfit, selecting a number of groups that grows with $n$. For both graphs, each data point is averaged over 20 instances. Compare Fig. 5 in the main text.
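
For readers who want to reproduce a comparison like the left panel, the following is a minimal Python sketch of the normalized mutual information between two partitions. It assumes the common normalization $2I(A;B)/(H(A)+H(B))$; the exact variant used in the paper is the one defined in [40], which may differ. The function name and the convention for two trivial partitions are illustrative assumptions.

```python
from collections import Counter
from math import log


def normalized_mutual_information(labels_a, labels_b):
    """NMI between two partitions given as equal-length label sequences.

    Assumes the normalization 2 I(A;B) / (H(A) + H(B)); other variants exist.
    """
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label sequences must be non-empty and equal length")
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    # Mutual information from the joint and marginal label frequencies.
    mutual_info = sum(
        (n_ab / n) * log(n_ab * n / (count_a[x] * count_b[y]))
        for (x, y), n_ab in count_ab.items()
    )
    entropy_a = -sum((c / n) * log(c / n) for c in count_a.values())
    entropy_b = -sum((c / n) * log(c / n) for c in count_b.values())
    if entropy_a + entropy_b == 0.0:
        # Both partitions put every node in one group; treating them as
        # identical is a convention chosen for this sketch.
        return 1.0
    return 2.0 * mutual_info / (entropy_a + entropy_b)
```

The measure is invariant under relabeling of the groups, so a detected partition can be compared with the planted one without first matching group labels.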

FIG. 11: Three dendrograms obtained by our hierarchical clustering algorithm on the ring of cliques, generated by independent runs with different initial conditions. Here there are $a = 24$ cliques of size $b = 5$ each. The number inside each node indicates the number of nodes in it. In all three runs, the first level of splitting merges multiple cliques together, but the second level correctly divides the network into individual cliques. This offers some evidence that our hierarchical algorithm can overcome the resolution limit, as opposed to algorithms that maximize the modularity.
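
The recursive procedure behind these dendrograms combines the rule of Appendix D for choosing the number of groups with repeated splitting, stopping when $q^* = 1$. The following is a minimal Python sketch under stated assumptions: `score(nodes, q)` and `split(nodes, q)` are hypothetical callables standing in for the BP computation of the retrieval modularity and of the resulting partition, with `score` returning `None` when no retrieval phase exists. The baseline of 0 for $q = 1$ and the values of `threshold` and `q_max` are placeholders, not the authors' settings.

```python
def choose_q(score, q_max=6, threshold=0.01):
    """Increase q while the retrieval modularity grows by at least `threshold`.

    `score(q)` is a hypothetical callable returning the retrieval modularity
    for q groups, or None if there is no retrieval phase.
    """
    best_q, best_score = 1, 0.0  # assumed baseline: one group, modularity 0
    for q in range(2, q_max + 1):
        s = score(q)
        if s is None or s < best_score + threshold:
            break
        best_q, best_score = q, s
    return best_q


def subdivide(nodes, score, split, q_max=6, threshold=0.01):
    """Recursively split `nodes`; returns a nested dict describing the tree."""
    q = choose_q(lambda k: score(nodes, k), q_max, threshold)
    children = []
    if q > 1:
        parts = split(nodes, q)
        if any(len(part) >= len(nodes) for part in parts):
            raise ValueError("split must return strictly smaller parts")
        children = [subdivide(p, score, split, q_max, threshold) for p in parts]
    return {"size": len(nodes), "children": children}


# Toy stand-ins (not BP): 20 nodes in 4 blocks of 5. A node set is treated as
# having two detectable groups only if it spans at least two blocks.
def toy_score(nodes, q):
    return 0.3 if q == 2 and len(nodes) // 5 >= 2 else None


def toy_split(nodes, q):
    half = len(nodes) // 2
    return [nodes[:half], nodes[half:]]


tree = subdivide(list(range(20)), toy_score, toy_split)
```

In the toy run the tree has two levels below the root and stops at blocks of 5 nodes, mirroring the structure of the dendrograms in Fig. 11 only qualitatively.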
