Bayesian Degree-Corrected Stochastic Blockmodels for Community Detection
Authors: Lijun Peng, Luis Carvalho
Boston University

Abstract: Community detection in networks has drawn much attention in diverse fields, especially social sciences. Given its significance, there has been a large body of literature with approaches from many fields. Here we present a statistical framework that is representative, extensible, and that yields an estimator with good properties. Our proposed approach considers a stochastic blockmodel based on a logistic regression formulation with node correction terms. We follow a Bayesian approach that explicitly captures the community behavior via prior specification. We further adopt a data augmentation strategy with latent Pólya-Gamma variables to obtain posterior samples. We conduct inference based on a principled, canonically mapped centroid estimator that formally addresses label non-identifiability and captures representative community assignments. We demonstrate the proposed model and estimation on real-world as well as simulated benchmark networks and show that the proposed model and estimator are more flexible, representative, and yield smaller error rates when compared to the MAP estimator from classical degree-corrected stochastic blockmodels.

1. Introduction. Networks can be used to describe interactions among objects in diverse fields such as physics (Newman, 2006), biology (Hancock et al., 2010), and especially social sciences (Zachary, 1977; Adamic and Glance, 2005). In network theory, objects are represented by nodes and their interactions by edges. Clusters of nodes that share many edges between them but that, in contrast, do not interact often with nodes in other clusters can be thought of as communities.
This characterization follows a traditional approach in social sciences that aims at discerning the structure of a network according to relationship patterns among "actors", e.g. friendship or collaboration. These interaction patterns may reflect "assortativity", a concept that originated in the ecological and epidemiological literature (Albert and Barabási, 2002): it refers to the tendency of nodes to associate with other similar nodes in a network. Among measures of similarity, the degree of a node is of common interest in the study of assortativity in networks (Newman, 2002, 2003; Vázquez, 2003); that is, assortative networks usually show a preference for high-degree nodes to connect to other high-degree nodes. We expect in some applications that actors exercise assortativity and prefer to group themselves according to similarity or kinship in communities, so that communities are dense in within-group associations but sparse in between-group interactions. Thus, not surprisingly, community detection has sparked great interest in many fields where recent applications aim at characterizing the structure of a network by detecting its communities. There have been many approaches to address community detection (see Section 2 for a more thorough review), but a common modeling choice is to treat actors as behaving similarly given their respective communities. This structural equivalence assumption is at the core of blockmodels (Lorrain and White, 1971), which were later extended to stochastic blockmodels (Holland and Leinhardt, 1981; Fienberg et al., 1985).

∗ Supported by NSF grant DMS-1107067.
Keywords and phrases: community detection, label non-identifiability, canonical remapping, centroid estimation, Pólya-Gamma latent variable
Here, to tackle community detection, we adopt a hierarchical Bayesian stochastic blockmodel where group labels are random. We contend that a suitable prior specification is essential to accurately characterize assortative behavior, and thus that a Bayesian approach is essential to community detection (see, e.g., the examples in Section 7.1.) Our results can be connected to the work of Nowicki and Snijders (2001), Karrer and Newman (2011) and Hofman and Wiggins (2008), but we make two important distinctions: (i) we capture community behavior by explicitly requiring that the probability of within-group associations is higher than that of between-group relations; and (ii) we address parameter and label non-identifiability issues directly by remapping configurations to a unique canonical space. The first point is important in light of the examples in the last section. The second point allows us to sample from the posterior space of label configurations more efficiently and to formally define an estimator based on a meaningful loss function. Moreover, our model can be related to the work of Mariadassou et al. (2010) and Vu et al. (2013) as they are all based on exponential-family clustering frameworks, but our model differs from theirs in two respects besides the two points just mentioned: (i) we make exact inference by adopting latent variables, rather than approximate variational approaches; and (ii) we add more flexibility by requiring hyper-prior structure on model parameters controlling degree correction. More specifically, we make the following contributions: (1) We propose a Bayesian degree-corrected stochastic blockmodel for community detection that explicitly characterizes community behavior. We discuss this new model and how we account for parameter non-identifiability in Section 3.
(2) We treat label non-identifiability issues by defining a canonical projection of the space of label configurations in Section 4. (3) We develop an efficient posterior sampler by identifying good initial configurations through approximate mode finding and then exploring a Gibbs sampler based on a data augmentation strategy in Section 5. (4) We propose a remapped centroid estimator for community inference in Section 6. This new estimator is based on Hamming loss and is arguably a good representative of a projected space of label configurations. In Section 7 we show that our proposed method is efficient and able to fit medium-sized networks with thousands of nodes in reasonable time. Moreover, we show that our proposed estimator yields, in practice, smaller misclassification rates due to a more refined loss function when compared to ML-based estimators. Finally, in Section 8, we offer some concluding remarks and directions for future work.

2. Prior and Related Work. There is a large body of literature in community detection, given its significance and interest. Traditional methods include graph partitioning (Kernighan and Lin, 1970; Barnes, 1982), hierarchical clustering (Hastie et al., 2001), and spectral clustering (Donath and Hoffman, 1973; Von Luxburg, 2007; Rohe et al., 2011); while these methods are heuristic and thus suitable for large networks, they do not address community detection directly but aim instead at partitioning the network according to edge densities between groups and thus identifying connection "bottlenecks". The concept of modularity better captures community structure by also taking within-group edge densities into account (Newman and Girvan, 2004; Newman, 2006). Optimization methods based on modularity can then be used to detect communities, but since modularity optimization is NP-complete (Brandes et al., 2007), interest lies mostly in approximate methods such as the greedy method of Newman (2004) and extremal optimization (Duch and Arenas, 2005; Bickel and Chen, 2009). However, there are still drawbacks: methods based on modularity may fail to detect small communities and thus exhibit a "resolution limit" (Fortunato and Barthelemy, 2007). Latent space network models (Hoff et al., 2002), latent variable models (Hoff et al., 2005), and latent position cluster models (Handcock et al., 2007) assume that the probability of an interaction depends on node-specific latent factors such as the distance between two nodes in an unobserved continuous "social space"; these models are generalizations of exponential random graph models [ERGMs; see (Robins et al., 2007)] where community structure is assumed from cluster structure in the latent space. There are many other methods to mention [see, for example, the review in (Parthasarathy et al., 2011)], but we focus on parametric statistical approaches where inference on community structure is based on an assumed model of association. The motivation is that since there are many possible community configurations, that is, assignments of actors to communities, we want to not only infer communities, but to also assess how likely each configuration is according to the model. The first endeavors in such parametric models, albeit not in community detection, are the p_1 exponential family models due to Holland and Leinhardt (1981). These models follow a log-linear formulation (Fienberg and Wasserman, 1981) with parameters that are related to in- and out-degrees and edge densities. Later, these models were extended to incorporate actor and group parameters (Fienberg et al., 1985; Tallberg, 2005; Daudin et al., 2008). Wang and Wong (1987) further adapted the models to consider a block structure through stochastic blockmodels [SBMs (Holland et al., 1983; Anderson et al., 1992)], yielding p_1 blockmodels. Zanghi et al. (2010), Mariadassou et al. (2010) and Vu et al. (2013) proposed scalable approximate variational approaches based on modified versions of those p_1 (block)models. Stochastic blockmodels explore a simpler model structure where the probability of an association between two actors depends on the groups to which they belong, that is, two actors within the same group are stochastically equivalent. Karrer and Newman (2011) developed an SBM that allows for degree-correction, that is, models where the degree distribution of nodes within each group can be heterogeneous. Celisse et al. (2012), Choi et al. (2012) and Bickel et al. (2013) addressed asymptotic inference in SBMs by use of maximum likelihood and variational approaches. More flexible approaches generalize the SBM by adopting a hierarchical Bayesian setup that regards probabilities of association as random and group membership as latent variables (Snijders and Nowicki, 1997; Nowicki and Snijders, 2001; Hofman and Wiggins, 2008). As in all latent mixture models, label non-identifiability is a known problem since multiple label assignments yield the same partition into communities; ultimately, we only care whether two actors are in the same community or in different communities. It is also possible to incorporate node attributes in the model (Kim and Leskovec, 2011; Fosdick and Hoff, 2013) and to allow actors to belong to more than one community (Airoldi et al., 2008).

3. A Bayesian Stochastic Blockmodel for Community Detection.
Under our community detection setup we assume a fixed number of groups K ≥ 2 and we are given, as data, a matrix [A]_{ij} representing relationships between "actors" i and j in a network with n > K nodes. We represent the assignment of actors to communities through σ : {1, …, n} → {1, …, K}, a vector of labels: σ_i = k codes for the i-th individual belonging to the k-th community. A simple stochastic blockmodel specifies that the probability of an edge between actors i and j depends only on their labels σ_i and σ_j, and that σ follows a product multinomial distribution:

(1) A_{ij} | σ, θ ~ind Bern(θ_{σ_i σ_j}), i, j = 1, …, n, i < j;  σ_i ~iid MN(1; π), i = 1, …, n,

where π is a vector of prior probabilities over the K labels, parameter θ_{kk} is the "within" probability of a relationship in community k, and θ_{kl} is the "between" probability of a relationship for communities k and l, k, l = 1, …, K, k < l. If we define θ_w := θ_{11} = ⋯ = θ_{KK} and θ_b := θ_{12} = ⋯ = θ_{K−1,K}, we have a simpler model with single within and between probabilities (Hofman and Wiggins, 2008). We regard SBMs as log-linear models and exploit this formulation to define a node-corrected SBM by

(2) A_{ij} | σ, γ, η ~ind Bern(logit^{−1}(γ_{σ_i σ_j} + η_i + η_j)),

where, in logit scale, parameters γ capture within and between community probabilities of association and node intercepts η = (η_1, …, η_n) capture the expected degrees of the nodes. To avoid redundancies, we only code γ_{kl} for k ≤ l. We note that without η, model (2) is equivalent to model (1) with γ_{kl} = logit(θ_{kl}).
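As a concrete illustration of model (2), the following minimal numpy sketch (ours, not part of the paper's sbmlogit package; the parameter values are illustrative) draws an adjacency matrix with assortative structure, i.e. γ_{12} < 0 and γ_{kk} = 0:

```python
import numpy as np

def simulate_network(sigma, gamma, eta, rng):
    """Draw an adjacency matrix from the node-corrected SBM in model (2):
    A_ij ~ Bern(logit^{-1}(gamma[k, l] + eta_i + eta_j)), for pair labels k <= l.
    Labels in `sigma` are 0-based indices into `gamma` here."""
    n = len(sigma)
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            k, l = sorted((sigma[i], sigma[j]))  # gamma is coded for k <= l
            p = 1.0 / (1.0 + np.exp(-(gamma[k, l] + eta[i] + eta[j])))
            A[i, j] = A[j, i] = rng.binomial(1, p)
    return A

rng = np.random.default_rng(1)
sigma = np.repeat([0, 1], 25)                 # two communities of 25 nodes each
gamma = np.array([[0.0, -2.0], [0.0, 0.0]])   # gamma_kk = 0; gamma_12 < 0 (assortative)
eta = rng.normal(-1.0, 0.5, size=50)          # node intercepts (expected-degree terms)
A = simulate_network(sigma, gamma, eta, rng)
```

With γ_{12} = −2 the within-block edge density should clearly exceed the between-block density, which is exactly the community behavior the prior in Section 3.2 encodes.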
We also remark that we call the above model node-corrected, which is arguably more suitable for a broader generalized linear model formulation; in Karrer and Newman's approach the observed A_{ij} follow a Poisson distribution, and so η is related to expected log degrees, hence their degree-correction denomination (Karrer and Newman, 2011).

3.1. Parameter Identifiability. In what follows, to simplify the notation we group β = (γ, η) and define the design matrix X associated with model (2) such that A_{ij} | σ, β ~ Bern(logit^{−1}(x_{ij}(σ)^T β)). Note that we make explicit the dependence of each row x_{ij} on the labels σ. Model (2) then has K(K−1)/2 + K + n parameters, but the next result shows that only K(K−1)/2 + n parameters are needed for the model to be identifiable if each community has at least two nodes (the proof is in Appendix 9.1.)

Theorem 1. The design matrix X associated with model (2) has the following properties: (1) It has K linearly dependent columns. (2) It is full column-ranked if and only if each community has at least two nodes.

Based on these two criteria, to attain an identifiable model we remove K parameters from γ and modify the prior on σ to a constrained multinomial distribution,

P(σ) ∝ ∏_{k=1}^{K} I(N_k > 1) ∏_{i=1}^{n} π_k^{I(σ_i = k)},

where I(·) is the indicator function and N_k = Σ_i I(σ_i = k) is the number of nodes in community k. There are still problems with label identifiability that we address by label remapping in Section 4; for now, to allow for a straightforward remapping of community labels, we just set

(3) γ_{11} = ⋯ = γ_{KK} = 0

to remove the redundant γ parameters.

3.2. Hierarchical model for community detection. We attain a more realistic model by further setting a hyper-prior distribution on γ = (γ_{12}, …,
γ_{K−1,K}), η, and π,

(4) β = (γ, η) ~ I(γ ≤ 0) · N(0, τ² I_{n + K(K−1)/2}),  π ~ Dir(α_1, …, α_K),

where τ² controls how informative the prior is. The prior on γ and η can be seen as a ridge regularization for the logistic regression in (2). The constraint γ ≤ 0 in this stochastic blockmodel is essential to community detection since we should expect as many as or fewer edges between communities than within communities on average, and thus that the log-odds of between relative to within probabilities is non-positive. The conjugate prior on π adds more flexibility to the model, and is important when identifying communities of varied sizes and alleviating resolution limit issues.

4. Label Identifiability. Since the likelihood in (2) only considers whether individuals are in the same community or not, labels are not identifiable due to this stochastic equivalence. Moreover, if π follows a strongly informative symmetric Dirichlet, α = W · 1_K with W large, then the marginal prior on σ is approximately non-identifiable:

P(σ) = ∫ P(σ | π) P(π) dπ = [∏_k Γ(N_k + W)/Γ(W)] / [Γ(n + KW)/Γ(KW)] ≈ [∏_k W^{N_k}] / (KW)^n = 1/K^n.

Since the σ_i are i.i.d. multinomial, then if π is non-informative, π = (1/K, …, 1/K), the labels are not identifiable in the posterior P(σ | A) either. In fact, non-identifiability issues occur within a group of labels I whenever π_i = π_j for all i, j ∈ I, but we discuss a non-informative π for simplicity and because that is a common modeling choice. A common approach in latent class models to fix label non-identifiability is to fix an arbitrary order in the parameters (Gelman et al., 2003, Chapter 18), e.g. γ_{12} < ⋯ < γ_{K−1,K}.
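The approximation P(σ) ≈ 1/K^n above can be checked numerically with log-gamma functions; a small sketch (our own, with illustrative values of n, K and W):

```python
import math

def log_marginal_prior(counts, W):
    """log P(sigma) for a symmetric Dirichlet(W, ..., W) prior on pi:
    log[ prod_k Gamma(N_k + W) / Gamma(W) ] - log[ Gamma(n + K W) / Gamma(K W) ]."""
    n, K = sum(counts), len(counts)
    num = sum(math.lgamma(Nk + W) - math.lgamma(W) for Nk in counts)
    den = math.lgamma(n + K * W) - math.lgamma(K * W)
    return num - den

counts = [4, 4]   # community sizes N_k for n = 8, K = 2
for W in (1.0, 10.0, 1000.0):
    print(W, log_marginal_prior(counts, W), -8 * math.log(2))
```

As W grows, the exact log marginal prior approaches −n log K, confirming that a strongly informative symmetric Dirichlet flattens the marginal prior on σ.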
However, as Nowicki and Snijders (2001) point out, this solution can lead to imperfect identification of the classes if the parameters are close with high posterior probability; a major drawback then is that parameters and labels can be interpreted incorrectly. To address this problem, a label switching algorithm was proposed by Stephens (2000) in the context of MCMC sampling, but it is slow in practice. Another approach is to simply focus on permutation-invariant functions; in particular, when estimating σ, we can adopt a permutation-invariant loss, such as Binder's loss (Binder, 1978). We discuss such an approach in more detail in Section 6. Next, we propose an alternative, simpler procedure to remap labels and address non-identifiability.

4.1. Canonical Projection and Remapping Labels. Let L := {1, …, K} and 𝓛 = {σ ∈ L^n : N_k(σ) > 1, k = 1, …, K} be the space of labels with positive prior probability. If ρ is any permutation of the labels then P(σ | A) = P(ρ(σ) | A), where (ρ(σ))_j = ρ(σ_j) for j = 1, …, n. Non-identifiability here means that P(· | A) is invariant under ρ, and that σ and ρ(σ) are P(· | A)-equivalent, which we denote by σ ~_P ρ(σ). Moreover, we can partition 𝓛 according to ~_P: if S is one such partitioned subspace, then any σ ∈ S is such that σ is not P(· | A)-equivalent to any other label configuration in S. To achieve label identifiability we anchor one such subspace as a reference space Q and regard all other subspaces as permuted copies of Q. Let ind(σ) be the vector with the first positions in σ where each label appears, ind(σ)_k := min{i : σ_i = k}, and further define ord(σ) as the vector with the order in which the labels appear in σ,

(5) ord(σ)_k = σ_{ind(σ)_{(k)}}, k ∈ L.
Note that ind(σ)_{(k)} is the k-th entry of the ordered vector ind(σ). As an example, if σ = (2, 2, 3, 1, 3, 4, 2, 1) with K = 4 (and n = 8) then ind(σ) = (4, 1, 3, 6), the ordered ind(σ) is (1, 3, 4, 6), and so ord(σ) = (2, 3, 1, 4). To maintain identifiability we then simply constrain label assignments to the subset of 𝓛 where ord(·) is fixed. As a simple, natural choice, let us restrict assignments to Q = {σ : ord(σ) = L}. Note that any σ can be mapped to its canonical assignment by

(6) ρ(σ) := ord(σ)^{−1}(σ).

Taking our previous example, σ = (2, 2, 3, 1, 3, 4, 2, 1) would then be mapped to ρ(σ) = (1, 1, 2, 3, 2, 4, 1, 3). The definitions of ind and ord can then be used to derive a procedure that remaps σ to ρ(σ); for completeness, we list an algorithm that implements such a remap procedure in Appendix 9.2. Our proposed reference set above is also described by Q = {σ ∈ 𝓛 : σ = ρ(σ)}, the quotient space of 𝓛 with respect to ord, 𝓛/ord: any pair of label configurations σ_1 and σ_2 such that ρ(σ_1) = ρ(σ_2) are identified to a single label ρ(σ_1) in Q. By constraining the labels to a reference quotient space we achieve not only identifiability, but also make the labels interpretable: label j marks the j-th community to appear in the sequence of labels. As a consequence, we are not restricted to estimating permutation-invariant functions of the labels, as in the approach of Nowicki and Snijders (2001), since now, for example, P(σ_i = j | A) is meaningful. As a particular application, we derive a direct estimator of σ based on Hamming loss in Section 6; in the next section we discuss how the constraint to Q is implemented in practice.

5. Posterior Sampling.
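The remap (6) is listed as an algorithm in Appendix 9.2; a straightforward Python transcription of the ind/ord definitions (ours, not the paper's implementation) reproduces the worked example:

```python
def remap(sigma, K):
    """Map sigma to its canonical assignment rho(sigma) = ord(sigma)^{-1}(sigma)."""
    # ind(sigma)_k: first (0-based) position where label k appears
    ind = [sigma.index(k) for k in range(1, K + 1)]
    # ord(sigma): labels in order of first appearance, i.e. sigma evaluated
    # at the sorted first-appearance positions
    ord_ = [sigma[i] for i in sorted(ind)]
    inverse = {label: k + 1 for k, label in enumerate(ord_)}
    return [inverse[s] for s in sigma]

remap([2, 2, 3, 1, 3, 4, 2, 1], 4)  # -> [1, 1, 2, 3, 2, 4, 1, 3]
```

Remapped configurations are fixed points of ρ, so applying `remap` to its own output returns the input unchanged, as expected for elements of Q.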
To sample from the joint posterior on σ, β and π, we use a Gibbs sampler (Geman and Geman, 1984; Robert and Casella, 1999) that iteratively alternates between sampling from [σ | γ, η, π, A], [π | σ, γ, η, A], and [γ, η | σ, π, A] until convergence. Next, we discuss how we obtain each conditional distribution in closed form.

5.1. Sampling σ and π. Let us start with the most relevant parameters: the labels σ. We can sample a candidate, unconstrained assignment for actor i, σ_i, conditional on all the other labels σ_{[−i]}, parameters (β, π), and data A from a multinomial with probabilities

(7) P(σ_i = k | σ_{[−i]}, β, π, A) ∝ π_k ∏_{j≠i} [logit^{−1}(γ_{k σ_j} + η_i + η_j)]^{A_{ij}} [1 − logit^{−1}(γ_{k σ_j} + η_i + η_j)]^{1−A_{ij}} = π_k ∏_{j≠i} exp{A_{ij}(γ_{k σ_j} + η_i + η_j)} / [1 + exp{γ_{k σ_j} + η_i + η_j}].

To guarantee that parameters are identifiable, we reject the candidate σ if N_k ≤ 1 for any community k. Moreover, to keep the labels identifiable, we remap σ using the routine in Section 4 and remap γ accordingly. As an example, consider the label samples obtained from running the Gibbs sampler on the political blogs study in Section 7. In Figure 1 we plot a multidimensional scaling [MDS (Gower, 1966)] representation of the samples. We have K = 2 communities, and so 𝓛 is partitioned into a reference quotient space on the right and a "mirrored" space on the left; any point in the mirrored space can be obtained by swapping labels 1 and 2 in the reference space, and vice-versa. The green arrow shows a valid sampling move σ^(t) → σ^(t+1) at iteration t that does not require a remap, while the red arrow is an invalid move since it crosses spaces. The blue arrow remaps σ^(t+1) to ρ(σ^(t+1)) in the reference space. The dashed green arrow summarizes both operations.
[Figure 1: MDS representation of the two copies of the quotient space 𝓛/ord using posterior samples for the political blogs example in Section 7, showing the mirror space and the solution space. Legend: sampled label, remapped label, inner-space move, cross-space move, remapping move, equivalent move; arrows are described in the text.]
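The label update (7) can be sketched directly; the following numpy fragment (our own illustration, with 0-based labels and our own function names, not sbmlogit's) computes the unnormalized log-probabilities of each candidate label for one node and draws from the resulting multinomial:

```python
import numpy as np

def sample_label(i, sigma, A, gamma, eta, pi, rng):
    """Draw sigma_i from the multinomial conditional (7), given all other labels."""
    K, n = len(pi), len(sigma)
    logp = np.log(pi).copy()
    for k in range(K):
        for j in range(n):
            if j == i:
                continue
            a, b = min(k, sigma[j]), max(k, sigma[j])   # gamma coded for k <= l
            x = gamma[a, b] + eta[i] + eta[j]
            # Bernoulli log-likelihood of edge (i, j): A_ij * x - log(1 + e^x)
            logp[k] += A[i, j] * x - np.logaddexp(0.0, x)
    p = np.exp(logp - logp.max())
    return rng.choice(K, p=p / p.sum())
```

In the sampler proper, a draw is rejected whenever it would leave some community with fewer than two nodes, and the whole label vector is then remapped as in Section 4.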
For the nuisance parameter π we summon conjugacy to obtain

(8) π | σ, θ, A ~ Dir(α + N(σ)),

where N(σ) = (N_1, …, N_K) and the N_k are community sizes.

5.2. Sampling γ and η. Sampling β conditional on σ, π, and data A is more challenging since the logistic likelihood in (2) does not yield a closed-form distribution. However, we can explore a data augmentation strategy by introducing latent Pólya-Gamma variables ω = (ω_{ij})_{i<j}, with ω_{ij} | σ, β ~ PG(1, x_{ij}(σ)^T β); then β | ω, σ, A ~ I(γ ≤ 0) · N(m, V) where, with Ω = Diag(ω_{ij}) and latent weighted responses z_{ij} = (A_{ij} − 1/2) ω_{ij}^{−1},

(9) V = (X^T Ω X + τ^{−2} I_{n + K(K−1)/2})^{−1} and m = V X^T Ω z.

The assortativity constraint γ ≤ 0 in the β prior is clearly also present in the conditional posterior, and so we can use a simple rejection sampling step for the truncated normal: sample from the unconstrained N(m, V) and accept only if γ ≤ 0. However, since

β = (γ, η) | ω, σ, A ~ N(m, V), with m = (m_γ, m_η) and V partitioned into blocks V_γ, V_{γη}, V_{ηγ}, V_η,

we can adopt a more efficient way of sampling β by first sampling η marginally,

(10) η | ω, σ, A ~ N(m_η, V_η),

and then sampling

(11) γ | η, ω, σ, A ~ I(γ ≤ 0) · N(m_γ + V_{γη} V_η^{−1} (η − m_η), V_γ − V_{γη} V_η^{−1} V_{ηγ})

from a truncated normal. In practice, we compute the Schur complement of V_η, V_γ − V_{γη} V_η^{−1} V_{ηγ}, using the SWEEP operator (Goodnight, 1979).

5.3. Gibbs sampler. To summarize, after setting initial parameters σ, β and π arbitrarily, we iterate until convergence the following Gibbs sampling steps:

1. Sample σ | β, π, A: for each node i, (a) sample σ_i | σ_{[−i]}, β, A from a multinomial distribution as in (7); if N_k(σ) < 2 for some community k, reject and keep the previous value of σ_i. (b) Remap σ using the procedure in Section 4.
2. Sample π | σ, β, A from the Dirichlet distribution in (8).
3.
Sample β | σ, π, A: (a) sample ω | σ, β, π, A: for each pair i < j, ω_{ij} | σ, β ~ PG(1, x_{ij}(σ)^T β). (b) Sample β | σ, π, ω, A: compute m and V as in (9), sample η marginally as in (10), and then sample γ | η from a truncated multivariate normal distribution as in (11).

To speed up convergence and improve precision, we set the initial σ to be an approximate posterior mode obtained from a greedy optimization version of the above routine, similar to a cyclic gradient descent method. The main changes are:

1. In Step 1.a we take σ_i to be the mode of σ_i | σ_{[−i]}, β, A (but we might still reject σ_i if N_k(σ) < 2 for some k and remap σ in Step 1.b.)
2. In Step 2, we take π to be the mode of the Dirichlet distribution in (8).
3. Step 3 is substituted by a regularized iteratively reweighted least squares (IRLS) step. IRLS is usual when fitting logistic regression models (McCullagh and Nelder, 1989). At the t-th iteration we define μ_{ij} = logit^{−1}(x_{ij}(σ)^T β^{(t)}) and W = Diag(μ_{ij}(1 − μ_{ij})) to obtain the update

V = (X^T W X + τ^{−2} I_{n + K(K−1)/2})^{−1} and β^{(t+1)} = V X^T W z^{(t)},

where z^{(t)} = X β^{(t)} + W^{−1}(y − μ) is now the "working response" and y stacks the observed A_{ij}. To guarantee that the community constraints γ ≤ 0 are met, we use an active-set method (Nocedal and Wright, 2006, Chapter 16).

Since we expect the posterior space to be multimodal, we adopt a strategy similar to Karrer and Newman (2011) and sample multiple starting points for σ according to its prior distribution and then obtain approximate posterior modes for each simulation. We elect the best approximate mode over all simulations as the starting point for the Gibbs sampler, which is then run until convergence to more thoroughly explore the posterior space.
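The Pólya-Gamma step can be sketched for a generic Bayesian logistic regression; this is our own simplified illustration, not the paper's implementation: it approximates a PG(1, c) draw by truncating the infinite-convolution representation of the distribution (production samplers use the exact method of Polson, Scott and Windle), and it omits the γ ≤ 0 truncation and the η-marginal/Schur-complement refinement of (10)-(11):

```python
import numpy as np

def rpg_approx(c, rng, terms=200):
    """Approximate PG(1, c) draw via its truncated series representation:
    PG(1, c) = (1 / 2 pi^2) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)),
    with g_k ~ Exp(1)."""
    k = np.arange(1, terms + 1)
    g = rng.exponential(size=terms)
    return (g / ((k - 0.5) ** 2 + (c / (2 * np.pi)) ** 2)).sum() / (2 * np.pi ** 2)

def draw_beta(X, y, omega, tau2, rng):
    """One Gibbs draw of beta | omega, data for logistic regression with an
    N(0, tau2 I) ridge prior, following (9):
    V = (X' Omega X + I / tau2)^{-1}, m = V X' kappa, with kappa = y - 1/2
    (since Omega z = y - 1/2 for z_ij = (y_ij - 1/2) / omega_ij)."""
    kappa = y - 0.5
    V = np.linalg.inv(X.T @ (omega[:, None] * X) + np.eye(X.shape[1]) / tau2)
    m = V @ X.T @ kappa
    return rng.multivariate_normal(m, V)
```

Given ω drawn pairwise with `rpg_approx`, `draw_beta` returns an exact Gaussian draw of the (unconstrained) regression coefficients, which is the key payoff of the augmentation: no Metropolis correction is needed.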
For convenience, the Gibbs sampler and its optimization version are implemented in the R package sbmlogit, available as supplementary material.

6. Posterior Inference. The usual estimator for label assignment is the maximum a posteriori (MAP) estimator,

σ̂_M = argmin_{σ̃ ∈ {1,…,K}^n} E_{σ|A} I(σ̃ ≠ σ) = argmax_{σ̃ ∈ {1,…,K}^n} P(σ = σ̃ | A),

which, albeit based on a zero-one loss function (Besag, 1986), has the advantage of being invariant to label permutations. However, given the flexibility in our model due to the hierarchical levels, the posterior space is often complex and so the MAP might fail to capture the variability and might focus on sharp peaks that gather a small amount of posterior mass around them. Another estimator for label assignment arises from minimizing Binder's loss B (Binder, 1978, 1981),

(12) σ̂_B = argmin_{σ̃ ∈ {1,…,K}^n} E_{σ|A} B(σ̃, σ),

where B(σ̃, σ) = Σ_{i<j} |I(σ̃_i = σ̃_j) − I(σ_i = σ_j)| counts the node pairs on which the two configurations disagree about co-assignment.

7.3. Case Study. Next, we evaluate our estimator for community detection on two real-world network datasets.

7.3.1. Political blogs. The first case study is the political blogs network (Adamic and Glance, 2005), a medium-sized real-world network containing over one thousand nodes.
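Binder's loss in (12) depends only on pairwise co-assignments, so it is permutation-invariant by construction; a small numpy sketch (ours, not from sbmlogit) evaluates it and its Monte Carlo posterior expectation over a collection of sampled label vectors:

```python
import numpy as np

def binder_loss(sig_a, sig_b):
    """B(sig_a, sig_b): number of node pairs i < j on which the two
    configurations disagree about co-assignment."""
    a, b = np.asarray(sig_a), np.asarray(sig_b)
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    return int(np.triu(same_a != same_b, k=1).sum())

def expected_binder(candidate, samples):
    """Monte Carlo estimate of E_{sigma | A} B(candidate, sigma)."""
    return float(np.mean([binder_loss(candidate, s) for s in samples]))
```

Note that relabeled copies of the same partition incur zero loss, which is exactly the invariance that motivates using such losses when labels are not identifiable.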
Fig 5: Benchmark networks of n = 100 and 500 nodes, with different combinations of the exponents a, b and the average degree ⟨k⟩. Each boxplot corresponds to the precision of the estimator over 100 and 50 graph realizations for n = 100 and n = 500, respectively. (Panels: n = 100 and n = 500 with a = 2, b = 1 and ⟨k⟩ ∈ {10, 15, 25}; precision against mu for the methods Centroid, Binder, KN, FG, ML, WT, and LP.)

In this network, each node is a blog over the period of two months preceding the U.S. Presidential Election of 2004, and two nodes are considered to be connected if they referred to one another and there was overlap in the topics they discussed. The network is known to be split into two communities (K = 2), liberals and conservatives, and has n = 1,222 nodes after isolated nodes are removed. It is expected that blogs in favor of the same party are more likely to be linked and to discuss the same topics than those in favor of different parties, which corroborates a community behavior. The centroid estimator, depicted in the leftmost panel of Figure 6, agrees well with the reference for this network. We estimate each η_i for node i by its estimated posterior mean using the converged samples and plot the estimated η_i against the logit normalized degree of node i in the middle panel of Figure 6. There is a positive linear relationship between η_i and the logit of the normalized degrees, indicating that the expected degree, and thus the probability of having an edge, is positively related to the observed degree of the node.
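The middle-panel diagnostic plots η_i against the logit of the normalized degree, logit(degree_i/(n − 1)). A minimal sketch of this transformation (the function name and the toy graph are ours, for illustration only):

```python
import numpy as np

def logit_normalized_degree(A):
    """Logit of the normalized degree, logit(d_i / (n - 1)), for each node
    of a simple undirected graph given by its adjacency matrix A."""
    A = np.asarray(A)
    n = A.shape[0]
    p = A.sum(axis=1) / (n - 1)       # normalized degree, in (0, 1)
    return np.log(p) - np.log1p(-p)   # logit(p) = log(p / (1 - p))

# Toy 4-node path graph 0-1-2-3: degrees are [1, 2, 2, 1].
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
x = logit_normalized_degree(A)  # [-log 2, log 2, log 2, -log 2]
```

Plotting the posterior means of η_i against x then reveals the positive relationship described above.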
If there is a community effect, that is, if the network can be better explained by partitioning nodes into two different communities, then γ_12 is expected to be significantly negative. The rightmost panel in Figure 6 shows the estimated posterior distribution of γ_12. An estimated 95% credible interval for γ_12 is [−3.16, −2.99], which shows a clear deviation from 0 and thus indicates a strong community effect in the network. We further compare the centroid estimator with two other estimators, Binder and KN, as in the previous section. The estimated 90% error intervals for the centroid, Binder, and KN estimators are [0.053, 0.054], [0.053, 0.054], and [0.045, 0.051], respectively. In general, the three estimators perform equally well, with the KN estimator yielding a slightly smaller error rate on average.
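The credible-interval check for γ_12 reduces to reading off posterior quantiles. A minimal sketch, with simulated draws standing in for converged Gibbs samples (the location and scale below are illustrative, not the fitted values):

```python
import numpy as np

# Hypothetical posterior draws of gamma_12; in practice these would be
# the converged Gibbs samples from the fitted model.
rng = np.random.default_rng(0)
gamma12 = rng.normal(loc=-3.07, scale=0.045, size=5000)

# Equal-tailed 95% credible interval from the empirical quantiles.
lo, hi = np.quantile(gamma12, [0.025, 0.975])

# An interval bounded away from zero signals a community effect.
community_effect = hi < 0
```

A clearly negative interval, as observed for the political blogs network, indicates that within-community ties are more likely than between-community ties.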
Fig 6: Political blogs network. Left: node sizes are proportional to degree; node colors signal the centroid estimator (red/green). Node color intensities are proportional to P̂*(σ_i | A) and node borders mark the reference. Middle: η_i on logit(degree_i/(n − 1)) for each node i; the color of each node i represents (σ̂_C)_i. Right: estimated posterior distribution for γ_12.

7.3.2. Political books. Finally, we pick the political books dataset compiled by Valdis Krebs (unpublished).
This is a network of political books sold by the online bookseller Amazon around the time of the 2004 U.S. presidential election. The network is split into three communities: liberal, neutral, and conservative. An edge between two books represents frequent co-purchasing by the same buyers. We again use weakly informative priors and run multiple chains. The estimated 90% error intervals for the centroid, Binder, and KN estimators are [0.167, 0.175], [0.167, 0.175], and [0.171, 0.171], respectively. The large error rates under all estimation procedures analyzed here might be due to the reference provided by Valdis Krebs not being that reliable, or to misclassified books appealing to buyers who purchase books across all three political opinions. Most of the misclassified nodes are in the neutral (red) community. Figure 7 shows the centroid estimator of the political books network in the left panel. The communities corresponding to liberal (blue) and conservative (green) are clearly separated by the neutral (red) community and agree well with the reference. The middle panel plots estimated η_i against normalized degrees in logit scale; it is evident that the in-between red community has a different intercept for η, indicating that it is less connected. The right panel shows estimated marginal posterior distributions for γ. Not surprisingly, γ_23 < γ_12 and γ_23 < γ_13 with high posterior probability, since communities 2 (green) and 3 (blue) are separated by community 1 (red) and so do not share many edges.
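The error rates reported here compare an estimate to a reference only up to a relabeling of the communities. A small sketch of such a permutation-minimized misclassification rate (brute force over relabelings, which is fine for small K; the function and toy labels are ours):

```python
from itertools import permutations

def misclassification_rate(est, ref, K):
    """Smallest fraction of disagreements between `est` and `ref`
    over all relabelings (permutations) of the K community labels."""
    n = len(ref)
    best = n
    for perm in permutations(range(K)):
        mismatches = sum(perm[e] != r for e, r in zip(est, ref))
        best = min(best, mismatches)
    return best / n

# Toy example: the estimate matches the reference up to swapping labels 0 and 1.
ref = [0, 0, 1, 1, 2, 2]
est = [1, 1, 0, 0, 2, 2]
rate = misclassification_rate(est, ref, K=3)  # 0.0
```

Brute force costs K! evaluations; for larger K a Hungarian-algorithm matching would replace the permutation loop.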
Fig 7: Political books network. Left: node sizes are proportional to degree; node colors signal the centroid estimator. Node color intensities are proportional to P̂*(σ_i | A) and node borders mark the reference. Middle: η_i on logit(degree_i/(n − 1)) for each node i; the color of each node i represents (σ̂_C)_i. Right: estimated posterior distribution for γ.

8. Discussion. In this paper we have proposed a Bayesian model based on degree-corrected stochastic blockmodels that is tailored for community detection. More specifically, our model is flexible due to its hierarchical structure and aims to capture gregarious community behavior by requiring, through prior specification, that the probability of within-community association be no smaller than the probability of between-community association. Moreover, we argue that the model better represents assortatively mixing networks with binary data coding the associations instead of frequency counts, since we model binary observations using a suitable logistic regression with parameters for within- and between-community probabilities of association. We devise a Gibbs sampler to obtain posterior samples and exploit a latent variable formulation to yield closed-form conditionals. We formally address label identifiability by restricting label configurations to a canonical reference subspace, and propose a remap procedure to implement this constraint in practice.
As a consequence, labels are interpretable and we are able to estimate any function of the labels, as opposed to previous approaches that were restricted to permutation-invariant functions. In particular, we propose a novel remapped centroid estimator to infer community assignments. We contend that while the model can arguably represent the data well, the posterior space can be complex and a bad estimator can spoil the analysis; it is then imperative to adopt an estimator that arises from a principled and refined loss function and thus better summarizes the posterior space. Our proposed remapped centroid estimator is more similar to a posterior mean and thus, while considering the whole posterior distribution in the space of remapped label assignments, tends to situate itself in regions of high concentration of posterior mass. From a practical point of view, we show that the proposed estimator performs better than the MAP and Binder estimators and achieves lower misclassification rates.

If the posterior space is multimodal then a single point estimator has difficulty representing the space, and the centroid estimator is not immune to this problem. We intend to further extend the proposed estimation procedure to account for multiple modes by exploring conditional estimators on partitions of the space. While this can be done empirically by clustering posterior samples, we will pursue a more principled way of identifying partitions. As simple extensions to the proposed model, we also intend to incorporate parameters for node attributes and to generalize the formulation to account for count, categorical, and ordinal data. Other directions for future work, albeit not related to community detection, include extending the remap procedure to other settings such as clustering and mixture model inference.

9. Appendix.

9.1.
Proof of Theorem 1. For the proof we first note that we can split each row x_ij of the design matrix in (2) according to its γ and η entries, x_ij := [b_ij c_ij], where

(14)    b_{ij,kl} = I[min(σ_i, σ_j) = k, max(σ_i, σ_j) = l],   k, l = 1, ..., K,  k ≤ l,
        c_{ij,v} = I(i = v) + I(j = v),   v = 1, ..., n,

that is, b_ij identifies the pair of communities at the endpoints of (i, j) for γ and c_ij marks each node correction from η.

Proof of (a). Let us pick an arbitrary community k and a pair (i, j). There are then three ways to classify (i, j): (i) it is entirely outside of community k; (ii) one of its endpoints is in community k; or (iii) it is inside community k. If we now define d_{ij,k} = Σ_{v: σ_v = k} c_{ij,v}, then (i, j) is classified exactly according to d_{ij,k}: d_{ij,k} = 0, 1, or 2 if (i, j) is in cases (i), (ii), or (iii), respectively. Thus, it follows that

2 b_{ij,kk} + Σ_{l ≠ k} b_{ij,kl} = Σ_{v: σ_v = k} c_{ij,v}

for each k = 1, ..., K, and so X has K constraints on its columns.

Proof of (b). Note that X has full column rank if and only if X^⊤X is invertible, so we just need to show that X^⊤X is invertible if N_k ≥ 2 for k = 1, ..., K. Writing B for the matrix with rows b_ij and C for the matrix with rows c_ij, so that X = [B C], we have

X^⊤X = [ B^⊤B   B^⊤C
         C^⊤B   C^⊤C ].

Thus, X^⊤X is invertible if and only if both B^⊤B and the Schur complement of C^⊤C,

Δ := C^⊤[I − B(B^⊤B)^{−1}B^⊤]C,

are invertible. First, B^⊤B is diagonal, since each row b_ij has a single nonzero entry; the invertibility of Δ is the same as that of a block diagonal matrix, since one can be obtained from the other through row and column operations. Thus, the conditions N_k ≠ 0 from B^⊤B and N_k ≠ 1 from Δ can be summarized into N_k ≥ 2.

9.2. Remap Algorithm. Algorithm 1 lists a routine that finds the canonical map ρ based on the canonical order in σ, as in Equation (6), and remaps σ in place.

Algorithm 1 Remapping labels in σ to ρ(σ).
  assigned ← {}
  ρ ← {}
  n ← 0                          {number of different labels in σ}
  for i = 1, ..., |σ| do         {obtain ρ := ord(σ)^{−1}}
    if not assigned(σ(i)) then   {first appearance?}
      assigned(σ(i)) ← true      {mark σ(i)}
      n ← n + 1
      ρ(σ(i)) ← n
    end if
  end for
  for i = 1, ..., |σ| do         {remap σ}
    σ(i) ← ρ(σ(i))
  end for
  return σ

9.3. Proof of Theorem 2. It is sufficient to find the pre-map estimator

σ̂* := argmin_{σ̃ ∈ {1, ..., K}^n} E_{σ|A} H(σ̃, ρ(σ)),

since, by definition, σ̂_C = ρ(σ̂*). Denoting Σ = {1, ..., K}^n and Σ* = Σ/ord, we have that

E_{σ|A} H(σ̃, ρ(σ)) = Σ_{σ ∈ Σ} H(σ̃, ρ(σ)) P(σ | A) = Σ_{σ ∈ Σ*} Σ_{σ*: ρ(σ*) = σ} H(σ̃, σ) P(σ* | A).

Since P(σ* | A) = P(σ | A) follows from the lack of identifiability, we further obtain

E_{σ|A} H(σ̃, ρ(σ)) = Σ_{σ ∈ Σ*} n(σ) H(σ̃, σ) P(σ | A),

where n(σ) = |{σ*: ρ(σ*) = σ}| = K!/(K − k(σ))! is the number of assignments that are identified to σ through ord, and k(σ) is the number of different labels in σ. We can then define P*(σ | A) := n(σ) P(σ | A) as the induced measure on the quotient space Σ* to thus have

E_{σ|A} H(σ̃, ρ(σ)) = Σ_{σ ∈ Σ*} H(σ̃, σ) P*(σ | A)
  = Σ_{σ ∈ Σ*} Σ_{i=1}^n I(σ̃_i ≠ σ_i) P*(σ | A)
  = n − Σ_{i=1}^n Σ_{σ ∈ Σ*} I(σ̃_i = σ_i) P*(σ | A)
  = n − Σ_{i=1}^n P*(σ_i = σ̃_i | A).

But then

argmin_{σ̃ ∈ {1, ..., K}^n} E_{σ|A} H(σ̃, ρ(σ)) = argmax_{σ̃ ∈ {1, ..., K}^n} Σ_{i=1}^n P*(σ_i = σ̃_i | A),

and so (σ̂*)_i = argmax_{k ∈ {1, ..., K}} P*(σ_i = k | A), that is, σ̂* is a consensus estimator, as desired.

9.4. Proof of Theorem 3. To compare σ̃ and σ, let us define n_ij := Σ_k I(σ_k = i, σ̃_k = j), the number of nodes that belong to community i in σ and to community j in σ̃.
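The remap routine of Algorithm 1 and the consensus characterization of the centroid estimator in Theorem 2 can be sketched together in Python. This is an illustrative translation under stated assumptions (samples given as lists of integer labels); the function names and toy samples are ours, not from the paper:

```python
from collections import Counter

def remap(sigma):
    """Algorithm 1: relabel communities by order of first appearance,
    so equivalent label-switched assignments share one canonical form."""
    rho, nxt, out = {}, 0, []
    for s in sigma:
        if s not in rho:      # first appearance of this label
            nxt += 1
            rho[s] = nxt
        out.append(rho[s])
    return out

def centroid(samples):
    """Consensus (remapped centroid) estimate: remap each posterior
    sample, then take the per-node mode under the induced measure P*."""
    remapped = [remap(s) for s in samples]
    n = len(remapped[0])
    est = [Counter(s[i] for s in remapped).most_common(1)[0][0]
           for i in range(n)]
    return remap(est)  # report the estimate in canonical form as well

# Three hypothetical posterior samples; the first two differ only by
# label switching and agree after remapping.
samples = [[2, 2, 1, 1], [1, 1, 2, 2], [1, 1, 2, 1]]
sigma_C = centroid(samples)  # [1, 1, 2, 2]
```

In practice the per-node mode would be computed over the full set of converged Gibbs samples rather than three toy vectors.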
Then, B(σ̃, σ) = Σ_i Σ_j ⋯; for the Hamming loss we have:

nH(σ̃, σ) = (Σ_{i≠j} n_ij)(Σ_{i,j} n_ij)
  = (Σ_{i≠j} n_ij)(Σ_{i≠j} n_ij + Σ_i n_ii)
  = (Σ_{i≠j} n_ij)(Σ_{i≠j} n_ij) + (Σ_{i≠j} n_ij)(Σ_i n_ii)
  = Σ_{i≠j} n_ij²  (term A)  +  Σ_{i≠j, k≠l, k≠i, j≠l} n_ij n_kl  (term B)  +  2 Σ_{i≠j, i≠k} ⋯

Fig 8: Benchmark networks of n = 100 nodes, with different combinations of the exponents a ∈ {2, 3}, b ∈ {1, 2} and the average degree ⟨k⟩ ∈ {10, 15, 25}. Each boxplot corresponds to the precision of the estimator over 100 graph realizations. (Precision against mu for the methods Centroid, Binder, KN, FG, ML, WT, and LP.)

Fig 9: Benchmark networks of n = 500 nodes, with different combinations of the exponents a ∈ {2, 3}, b ∈ {1, 2} and the average degree ⟨k⟩ ∈ {10, 15, 25}. Each boxplot corresponds to the precision of the estimator over 100 graph realizations. (Panels: a = 2, b = 1; a = 2, b = 2; a = 3, b = 1; a = 3, b = 2.)

References

Blondel, V. D., J.-L. Guillaume, R. Lambiotte, and E. Lefebvre (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment 2008(10), P10008.
Brandes, U., D. Delling, M. Gaertler, R. Görke, M. Hoefer, Z. Nikoloski, and D. Wagner (2007). On finding graph clusterings with maximum modularity. In Graph-Theoretic Concepts in Computer Science, pp. 121–132. Springer.
Carvalho, L. and C. Lawrence (2008). Centroid estimation in discrete high-dimensional spaces with applications in biology. Proceedings of the National Academy of Sciences 105(9), 3209–3214.
Celisse, A., J.-J. Daudin, and L. Pierre (2012). Consistency of maximum-likelihood and variational estimators in the stochastic block model. Electronic Journal of Statistics 6, 1847–1899.
Choi, D. S., P. J. Wolfe, and E. M. Airoldi (2012). Stochastic blockmodels with a growing number of classes. Biometrika.
Clauset, A., M. E. J. Newman, and C. Moore (2004, August). Finding community structure in very large networks. Physical Review E 70(6), 066111.
Daudin, J. J., F. Picard, and S. Robin (2008, June). A mixture model for random graphs. Statistics and Computing 18(2), 173–183.
Donath, W. E. and A. J. Hoffman (1973). Lower bounds for the partitioning of graphs. IBM J. Res. Dev. 17(5), 420–425.
Duch, J. and A. Arenas (2005). Community identification using extremal optimization. Physical Review E 72, 027104.
Fienberg, S. E., M. M. Meyer, and S. S. Wasserman (1985). Statistical analysis of multiple sociometric relations. Journal of the American Statistical Association 80(389), 51–67.
Fienberg, S. E. and S. Wasserman (1981).
An exponential family of probability distributions for directed graphs: Comment. Journal of the American Statistical Association 76(373), 54–57.
Fortunato, S. and M. Barthelemy (2007). Resolution limit in community detection. Proceedings of the National Academy of Sciences 104(1), 36–41.
Fosdick, B. and P. Hoff (2013). Testing and modeling dependencies between a network and nodal attributes. arXiv:1306.4708v1.
Fritsch, A. and K. Ickstadt (2009). Improved criteria for clustering based on the posterior similarity matrix. Bayesian Analysis 4(2), 367–392.
Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin (2003). Bayesian Data Analysis. CRC Press.
Geman, S. and D. Geman (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 721–741.
Goodnight, J. H. (1979). A tutorial on the sweep operator. The American Statistician 33(3), 149–158.
Gower, J. (1966). Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53(3-4), 325–338.
Hancock, T., I. Takigawa, and H. Mamitsuka (2010). Mining metabolic pathways through gene expression. Bioinformatics 26(17), 2128–2135.
Handcock, M., A. Raftery, and J. Tantrum (2007). Model-based clustering for social networks. Journal of the Royal Statistical Society: Series A 170(2), 301–354.
Hastie, T., R. Tibshirani, and J. Friedman (2001). Maximum likelihood from incomplete data via the EM algorithm. The Elements of Statistical Learning, 520–528.
Hoff, P., A. Raftery, and M. Handcock (2002). Latent space approaches to social network analysis. Journal of the American Statistical Association 97(460), 1090–1098.
Hoff, P., A. Raftery, and M. Handcock (2005). Bilinear mixed-effects models for dyadic data. Journal of the American Statistical Association 100(469), 286–295.
Hofman, J. and C.
Wiggins (2008). Bayesian approach to network modularity. Physical Review Letters 100(25), 258701.
Holland, P. and S. Leinhardt (1981). An exponential family of probability distributions for directed graphs. Journal of the American Statistical Association 76(373), 33–50.
Holland, P. W., K. B. Laskey, and S. Leinhardt (1983). Stochastic blockmodels: First steps. Social Networks 5(2), 109–137.
Karrer, B. and M. Newman (2011). Stochastic blockmodels and community structure in networks. Physical Review E 83(1), 016107.
Kernighan, B. and S. Lin (1970). An efficient heuristic procedure for partitioning graphs. Bell Sys. Tech. J. 49(2), 291–308.
Kim, M. and J. Leskovec (2011). Modeling social networks with node attributes using the multiplicative attribute graph model. UAI, AUAI Press, 400–409.
Lancichinetti, A., S. Fortunato, and F. Radicchi (2008). Benchmark graphs for testing community detection algorithms. Physical Review E 78(1), 046110.
Lau, J. W. and P. J. Green (2007). Bayesian model-based clustering procedures. Journal of Computational and Graphical Statistics 16(3), 526–558.
Lorrain, F. and H. C. White (1971). Structural equivalence of individuals in social networks. The Journal of Mathematical Sociology 1(1), 49–80.
Mariadassou, M., S. Robin, and C. Vacher (2010). Uncovering latent structure in valued graphs: A variational approach. The Annals of Applied Statistics 4(2), 715–742.
McCullagh, P. and J. A. Nelder (1989). Generalized Linear Models, Volume 37. CRC Press.
Newman, M. (2002). Assortative mixing in networks. Phys. Rev. Lett. 89, 208701.
Newman, M. (2004). Fast algorithm for detecting community structure in networks. Physical Review E 69(6), 066133.
Newman, M. (2006). Modularity and community structure in networks.
Proceedings of the National Academy of Sciences 103(23), 8577–8582.
Newman, M. and M. Girvan (2004). Finding and evaluating community structure in networks. Physical Review E 69(2), 026113.
Newman, M. E. J. (2003). Mixing patterns in networks. Phys. Rev. E (67).
Nocedal, J. and S. J. Wright (2006). Numerical Optimization (2nd ed.). Springer-Verlag.
Nowicki, K. and T. A. B. Snijders (2001). Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association 96(455), 1077–1087.
Parthasarathy, S., Y. Ruan, and V. Satuluri (2011). Community discovery in social networks: Applications, methods and emerging trends. In Social Network Data Analytics, pp. 79–113. Springer.
Polson, N. G., J. G. Scott, and J. Windle (2012). Bayesian inference for logistic models using Pólya-Gamma latent variables. arXiv preprint arXiv:1205.0310.
Pons, P. and M. Latapy (2004). Computing communities in large networks using random walks. J. of Graph Alg. and App. 10, 284–293.
Raghavan, U. N., R. Albert, and S. Kumara (2007). Near linear time algorithm to detect community structures in large-scale networks. Physical Review E 76(3).
Robert, C. and G. Casella (1999). Monte Carlo Statistical Methods. Springer New York.
Robins, G., P. Pattison, Y. Kalish, and D. Lusher (2007). An introduction to exponential random graph (p*) models for social networks. Social Networks 29(2), 173–191.
Rohe, K., S. Chatterjee, and B. Yu (2011). Spectral clustering and the high-dimensional stochastic blockmodel. The Annals of Statistics 39(4), 1878–1915.
Sampson, S. F. (1968). A novitiate in a period of change: An experimental and case study of social relationships. Ph.D. thesis, Cornell University, September.
Snijders, T. A. and K. Nowicki (1997). Estimation and prediction for stochastic blockmodels for graphs with latent block structure. Journal of Classification 14(1), 75–100.
Stephens, M. (2000). Dealing with label switching in mixture models. Journal of the Royal Statistical Society. Series B 62(4), 795–809.
Tallberg, C. (2005). A Bayesian approach to modeling stochastic blockstructures with covariates. Journal of Mathematical Sociology 29, 1–23.
Vázquez, A. (2003). Growing network with local rules: Preferential attachment, clustering hierarchy, and degree correlations. Physical Review E 67(5), 056104.
Von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing 17(4), 395–416.
Vu, D. Q., D. R. Hunter, and M. Schweinberger (2013). Model-based clustering of large networks. Annals of Applied Statistics 7(2), 1010–1039.
Wang, Y. J. and G. Y. Wong (1987). Stochastic blockmodels for directed graphs. Journal of the American Statistical Association 82(397), 8–19.
Zachary, W. W. (1977). An information flow model for conflict and fission in small groups. Journal of Anthropological Research 33(4), 452–473.
Zanghi, H., F. Picard, V. Miele, and C. Ambroise (2010). Strategies for online inference of model-based clustering in large and growing networks. The Annals of Applied Statistics 4(2), 687–714.

Department of Mathematics and Statistics
Boston University
111 Cummington Mall
Boston, Massachusetts 02215
E-mail: ljpeng@math.bu.edu; lecarval@math.bu.edu