Community extraction in multilayer networks with heterogeneous community structure

Journal of Machine Learning Research 18 (2017) ** Submitted 12/16; Revised 6/17; Published 11/17

Community Extraction in Multilayer Networks with Heterogeneous Community Structure

James D. Wilson* jdwilson4@usfca.edu
Department of Mathematics and Statistics, University of San Francisco, San Francisco, CA 94117-1080

John Palowitch* palojj@email.unc.edu
Shankar Bhamidi bhamidi@email.unc.edu
Andrew B. Nobel nobel@email.unc.edu
Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599

Editor: Edoardo Airoldi

Abstract

Multilayer networks are a useful way to capture and model multiple, binary or weighted relationships among a fixed group of objects. While community detection has proven to be a useful exploratory technique for the analysis of single-layer networks, the development of community detection methods for multilayer networks is still in its infancy. We propose and investigate a procedure, called Multilayer Extraction, that identifies densely connected vertex-layer sets in multilayer networks. Multilayer Extraction makes use of a significance based score that quantifies the connectivity of an observed vertex-layer set through comparison with a fixed degree random graph model. Multilayer Extraction directly handles networks with heterogeneous layers where community structure may be different from layer to layer. The procedure can capture overlapping communities, as well as background vertex-layer pairs that do not belong to any community. We establish consistency of the vertex-layer set optimizer of our proposed multilayer score under the multilayer stochastic block model. We investigate the performance of Multilayer Extraction on three applications and a test bed of simulations.
Our theoretical and numerical evaluations suggest that Multilayer Extraction is an effective exploratory tool for analyzing complex multilayer networks. Code is publicly available at https://github.com/jdwilson4/MultilayerExtraction.

Keywords: community detection, clustering, multiplex networks, score based methods, modularity

* JDW is corresponding author. JDW and JP contributed equally to the writing of this paper.

© 2017 James D. Wilson, John Palowitch, Shankar Bhamidi and Andrew B. Nobel.

1. Introduction

Networks are widely used to represent and analyze the relational structure among interacting units of a complex system. In the simplest case, a network model is an unweighted, undirected graph $G = (V, E)$, where $V$ is a set of vertices that represent the units, or actors, of the modeled system, and $E$ is an edge set containing all pairs of vertices $\{u, v\}$ such that actors $u$ and $v$ share a physical or functional relationship. Networks have been successfully applied in a wide array of fields, including the social sciences to study social relationships among individuals (Wasserman and Galaskiewicz, 1994), biology to study interactions among genes and proteins (Bader and Hogue, 2003), and neuroscience to study the structure and function of the brain (Sporns, 2011).

In many cases, the vertices of a network can be divided into groups (often disjoint) with the property that there are many edges between vertices in the same group, but relatively few edges between vertices in different groups. Vertex groups of this sort are commonly referred to as communities. The unsupervised search for communities in a network is known as community detection.
Community structure has been used to identify functionally relevant groups in gene and protein interaction systems (Lewis et al., 2010; Parker et al., 2015), structural brain networks (Bassett et al., 2011), and social networks (Onnela et al., 2011; Greene et al., 2010). As communities are often associated with important structural characteristics of a complex system, community detection is a common first step in the understanding and analysis of networks. The search for communities that optimize a given quantitative performance criterion is typically an NP-hard problem, so in most cases one must rely on approximate algorithms to identify community structure.

The focus of this paper is community detection in multilayer networks. Formally, an $(m, n)$-multilayer network is a collection $G(m, n) = (G_1, \dots, G_m)$ of $m$ simple graphs $G_\ell = ([n], E_\ell)$ having common vertex set $[n] = \{1, \dots, n\}$, where the edge sets $E_\ell$ may vary from layer to layer. The graph $G_\ell$ will be referred to as the $\ell$th layer of the network. We assume that the vertices of the multilayer network are registered, in the sense that a fixed vertex $u \in [n]$ refers to the same actor across all layers. Thus the graph $G_\ell$ reflects the relationships between identified actors $1, \dots, n$ in circumstance $\ell$. There are no edges between vertices in different layers, and the layers are regarded as unordered so that the indices $\ell \in [m]$ do not reflect an underlying spatial or temporal order among the layers.

In general, the actors of a multilayer network may not exhibit the same community structure across all layers. For example, in social networks, a group of individuals may be well connected via friendships on Facebook; however, the same group of actors will likely not all work at the same company.
In realistic situations such as these, a given vertex community will only be present in a subset of the layers, and different communities may be present in different subsets of layers. We refer to such multilayer systems as heterogeneous, as each layer may exhibit noticeably different community structure. Complex and differential relationships between actors will be reflected in the heterogeneous behavior of different layers of a multilayer network. In spite of this heterogeneity, many existing community detection methods for multilayer networks assume that the community structure is the same across all, or a substantial fraction of, the layers (Berlingerio et al., 2011; Rocklin and Pinar, 2013; Barigozzi et al., 2011; Berlingerio et al., 2013; Holland et al., 1983; Paul and Chen, 2015).

We develop and investigate a multilayer community detection method called Multilayer Extraction, which efficiently handles multilayer networks with heterogeneous layers. Theoretical and numerical evaluation of our method reveals that Multilayer Extraction is an effective exploratory tool for analyzing complex multilayer networks. Our contributions to the current literature of statistical analysis of multilayer networks are threefold:

1. We develop a testing-based algorithm for identifying densely connected vertex-layer sets $(B, L)$, where $B \subseteq [n]$ is a set of vertices and $L \subseteq [m]$ is a set of layers such that the vertices in $B$ are densely connected across the layers in $L$. The strength of the connections in $(B, L)$ is measured by a local modularity score derived from a null random network model that is based on the degree sequence of the observed multilayer network. Identified communities can have overlapping vertex or layer sets, and some vertex-layer pairs may not belong to any community.
Vertex-layer pairs that are not assigned to any community are interpreted as background as they are not strongly connected to any other. Overlap and background are common features of real networks that can have deleterious effects on partition based methods (Lancichinetti et al., 2011; Wilson et al., 2013, 2014). The Multilayer Extraction procedure directly addresses community heterogeneity in multilayer networks.

2. We assess the consistency of the global optimizer of the aforementioned local modularity score under a generative model for multilayer networks with communities. The generative model studied is a multilayer generalization of the stochastic 2-block model of Snijders and Nowicki (1997) and Wang and Wong (1987), which we call the Multilayer Stochastic Block Model (MSBM). The MSBM is a generative model that characterizes preferential (or a-preferential) behavior of pre-specified vertex-layer communities in a multilayer network, via specification of inter- and intra-community probabilities. We are able to show that under the MSBM, the number of mis-clustered vertices and layers from the vertex-layer community that maximizes our proposed significance score vanishes to zero, with high probability, as the number of vertices tends to infinity. There has been considerable work in the area of consistency analysis for single-layer networks (e.g. Zhao et al., 2012); however, to the best of our knowledge, we are the first to address the joint optimality properties for both vertices and layers in multilayer networks. Furthermore, we provide complete and explicit expressions of all error bounds in the proof, as we anticipate future analyses where the number of layers is allowed to grow with the size of the network. Our proof involves a novel inductive argument, which, to our knowledge, has not been employed elsewhere.

3.
We apply Multilayer Extraction to three diverse and important multilayer network types, including a multilayer social network, a transportation network, and a collaboration network. We compare and contrast Multilayer Extraction with contemporary methods, and highlight the advantages of our approach over single layer and aggregate alternatives. Our findings reveal important insights about these three complex relational systems.

1.1 Related Work

Multilayer network models have been applied to a variety of problems, including modeling and analysis of air transportation routes (Cardillo et al., 2013), studying individuals with multiple sociometric relations (Fienberg et al., 1980, 1985), and analyzing relationships between social interactions and economic exchange (Ferriani et al., 2013). Kivelä et al. (2014) and Boccaletti et al. (2014) provide two recent reviews of multilayer networks. We note that $G(m, n)$ is also sometimes referred to as a multiplex network.

While there is a large and growing literature concerning community detection in standard, single-layer, networks (Fortunato, 2010; Newman, 2004; Porter et al., 2009), the development of community detection methods for multilayer networks is still relatively new. One common approach to multilayer community detection is to project the multilayer network in some fashion onto a single-layer network and then identify communities in the single layer network (Berlingerio et al., 2011; Rocklin and Pinar, 2013). A second common approach to multilayer community detection is to apply a standard detection method to each layer of the observed network separately (Barigozzi et al., 2011; Berlingerio et al., 2013).
The first approach fails to account for layer-specific community structure and may give an oversimplified or incomplete summary of the community structure of the multilayer network; the second approach does not enable one to leverage or identify common structure between layers. In addition to the methods above, there have also been several generalizations of single-layer methods to multilayer networks. For example, Holland et al. (1983) and Paul and Chen (2015) introduce multilayer generalizations of the stochastic block model from Wang and Wong (1987) and Snijders and Nowicki (1997). Notably, these generative models require the community structure to be the same across layers. Peixoto (2015) considers a multilayer generalization of the stochastic block model for weighted networks that models hierarchical community structure as well as the degree distribution of an observed network. Paul and Chen (2016) describe a class of null models for multilayer community detection based on the configuration and expected degree model. We utilize a similar model in our consideration. Stanley et al. (2016) considered the clustering of layers of multilayer networks based on recurring community structure throughout the network. Peixoto (2015) and Mucha et al. (2010) generalized the notion of modularity to multilayer networks, and De Domenico et al. (2014) generalized the map equation, which measures the description length of a random walk on a partition of vertices, to multilayer networks. De Domenico et al. (2013) discuss a generalization of the multilayer method in Mucha et al. (2010) using tensor decompositions. Approximate optimization of either multilayer modularity or the map equation results in vertex-layer communities that form a partition of $[n] \times [m]$.
By contrast, the communities identified by Multilayer Extraction need not form a partition of $[n] \times [m]$, as some vertices and/or layers may be considered background and not significantly associated with any community.

1.2 Overview of the Paper

In the next section we describe the null multilayer random graph model and the scoring of vertex-layer sets. In Section 3 we present theoretical results regarding the asymptotic consistency properties of our proposed score for multilayer networks; the proofs are given in Section 4. Section 5 provides a detailed description of the proposed Multilayer Extraction procedure. We apply Multilayer Extraction to three real-world multilayer networks and compare and contrast its performance with existing community detection methods in Section 6. In Section 7 we evaluate the performance of Multilayer Extraction on a test bed of simulated multilayer networks. We conclude the main paper with a discussion of future research directions in Section 8. The Appendix is divided into three sections. In Appendix A, we prove supporting lemmas contributing to the results given in Section 3. In Appendix B, we discuss competing methods to Multilayer Extraction. In Appendix C, we give the complete details of our simulation framework.

2. Scoring a Vertex-Layer Group

Seeking a vertex partition that optimizes, or approximately optimizes, an appropriate score function is a standard approach to single layer community detection. Prominent examples of score-based approaches include modularity maximization (Newman, 2006b), likelihood maximization for a stochastic block model (Wang and Wong, 1987), as well as minimization of the conductance of a partition (Chung, 1997). Rather than scoring a partition of the available network, Multilayer Extraction makes use of a significance based score that is applicable to individual vertex-layer sets.
Below, we describe the multilayer null model, and then the proposed score.

2.1 The Null Model

Our significance-based score for vertex-layer sets in multilayer networks relies on the comparison of an observed multilayer network with a null multilayer network model. Let $G(m, n)$ be an observed $(m, n)$-multilayer network. For each layer $\ell \in [m]$ and pair of vertices $u, v \in [n]$, let

$$x_\ell(u, v) = \mathbb{I}(\{u, v\} \in E_\ell)$$

indicate the presence or absence of an edge between $u$ and $v$ in layer $\ell$ of $G(m, n)$. The degree of a vertex $u \in [n]$ in layer $\ell$, denoted by $d_\ell(u)$, is the number of edges incident on $u$ in $G_\ell$. Formally,

$$d_\ell(u) = \sum_{v \in [n]} x_\ell(u, v).$$

The degree sequence of layer $\ell$ is the vector $d_\ell = (d_\ell(1), \dots, d_\ell(n))$ of degrees in that layer; the degree sequence of $G(m, n)$ is the list $d = (d_1, \dots, d_m)$ containing the degree sequence of each layer in the network.

Let $\mathcal{G}(m, n)$ denote the family of all $(m, n)$-multilayer networks. Given the degree sequence $d$ of the observed network $G(m, n)$, we define a multilayer configuration model and an associated probability measure $\mathbb{P}_d$ on $\mathcal{G}(m, n)$, as follows. In layer $G_1$, each node $u$ is given $d_1(u)$ half-stubs. Pairs of these edge stubs are then chosen uniformly at random, to form edges until all half-stubs are exhausted (disallowing self-loops and multiple edges). This process is done for every subsequent layer $G_2, \dots, G_m$ independently, using the corresponding degree sequence from each layer.

In the multilayer network model described above, each layer is distributed according to the configuration model, first introduced by Bollobás and Universitet (1979) and Bender and Canfield (1978). The probability of an edge between nodes $u$ and $v$ in layer $\ell$ depends only on the degree sequence $d_\ell$ of the observed graph $G_\ell$.
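
As a concrete illustration of the degree notation above, the following Python sketch computes the layer-wise degree sequences $d_\ell$ from per-layer edge lists. The function name `degree_sequences`, the 0-based vertex labels, and the edge-list representation are assumptions made for this example; they are not taken from the paper or its accompanying code.

```python
def degree_sequences(n, layers):
    """Return the degree sequence of every layer (illustrative sketch).

    n      : number of vertices, labelled 0..n-1 (0-based labels are an assumption)
    layers : list of m edge collections; each edge is a pair (u, v) with u != v,
             and each undirected edge is listed once per layer
    Returns a list d with d[l][u] = number of edges incident on u in layer l.
    """
    d = []
    for edges in layers:
        deg = [0] * n
        for u, v in edges:
            # An undirected edge {u, v} contributes one to each endpoint's degree.
            deg[u] += 1
            deg[v] += 1
        d.append(deg)
    return d


# A small two-layer example on four vertices: a path and a star.
example = [
    [(0, 1), (1, 2), (2, 3)],
    [(0, 1), (0, 2), (0, 3)],
]
print(degree_sequences(4, example))
```

The multilayer configuration model described above would take each row of this output as the fixed degree sequence for the corresponding layer.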
The distribution $\mathbb{P}_d$ has two complementary properties that make it useful for identifying communities in an observed multilayer network: (i) it preserves the degree structure of the observed network; and (ii) subject to this restriction, edges are assigned at random, without regard to the higher order connectivity structure of the network. Because of these characteristics, the configuration model has long been taken as the appropriate null model against which to judge the quality of a proposed community partition.

The configuration model is the null model which motivates the modularity score of a partition in a network (Newman, 2004, 2006b). Consider a single-layer observed network $G(n)$ with $n$ nodes and degree sequence $d$. For fixed $K > 0$, let $c_u \in [K]$ be the community assignment of node $u$. The modularity score of the partition associated with the assignment $c_1, \dots, c_n$ is defined as

$$M(c_1, \dots, c_n; G(n)) := \frac{1}{2|E|} \sum_{i \in [K]} \sum_{u \ldots} \ldots$$

[...] 0, which satisfy $\pi_1 + \pi_2 = 1$, and (ii) a sequence of symmetric $2 \times 2$ matrices $P = \{P_1, \dots, P_m\}$ where $P_\ell = \{P_\ell(i, j)\}$ with entries $P_\ell(i, j) \in (0, 1)$. Under the distribution $\mathbb{P}_{m,n}$, a random multilayer network $\hat{G}(m, n)$ is generated using two simple steps:

1. A subset of $\lceil \pi_1 n \rceil$ vertices are placed in community 1, and remaining vertices are placed in community 2. Each vertex $u$ in community $j$ is assigned a community label $c_u = j$.

2. An edge is placed between nodes $u, v \in [n]$ in layer $\ell \in [m]$ with probability $P_\ell(c_u, c_v)$, independently from pair to pair and across layers, and no self-loops are allowed.

For a fixed $n$ and $m$, the community labels $c_n = (c_1, \dots, c_n)$ are chosen once and only once, and the community labels are the same across each layer of $\hat{G}(m, n)$.
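
The two generative steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `sample_msbm`, the 0-based vertex labels, the nested-list representation of the matrices $P_\ell$, and the choice to put the first $\lceil \pi_1 n \rceil$ vertices in community 1 are all made up for this example.

```python
import math
import random


def sample_msbm(n, pi1, P, seed=None):
    """Sample one multilayer network from a two-community MSBM (illustrative sketch).

    n    : number of vertices, labelled 0..n-1 (0-based labels are an assumption)
    pi1  : proportion of vertices placed in community 1
    P    : list of m symmetric 2x2 matrices (nested lists), one per layer
    Returns (labels, layers), where labels[u] is 1 or 2 and layers[l] is a set
    of edges (u, v) with u < v.
    """
    rng = random.Random(seed)
    # Step 1: ceil(pi1 * n) vertices go to community 1; taking the first ones is
    # an arbitrary choice. Floating-point rounding in pi1 * n can matter at ties.
    n1 = math.ceil(pi1 * n)
    labels = [1] * n1 + [2] * (n - n1)
    layers = []
    for P_l in P:
        edges = set()
        # Step 2: independent Bernoulli edges within each layer; iterating over
        # u < v excludes self-loops and visits each unordered pair once.
        for u in range(n):
            for v in range(u + 1, n):
                if rng.random() < P_l[labels[u] - 1][labels[v] - 1]:
                    edges.add((u, v))
        layers.append(edges)
    return labels, layers


# One assortative layer and one layer whose matrix has determinant zero.
P = [
    [[0.50, 0.05], [0.05, 0.50]],
    [[0.10, 0.10], [0.10, 0.10]],
]
labels, layers = sample_msbm(n=50, pi1=0.4, P=P, seed=1)
print(labels.count(1), [len(edges) for edges in layers])
```

The labels are drawn once and reused for every layer, while each layer uses its own matrix, which mirrors the description of the model in the text.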
Although the labels are shared across layers, the inter- and intra-community connection probabilities (and hence the assortativity) can be different from layer to layer, introducing heterogeneity among the layers. Note that when $m = 1$, the MSBM reduces to the (single-layer) stochastic block model from Wang and Wong (1987).

3.2 Consistency of the Score

We evaluate the consistency of the Multilayer Extraction score under the MSBM described above. Our first result addresses the vertex set maximizer of the score given a fixed layer set $L \subseteq [m]$. Our second result (Theorem 3 in Section 3.2.1) leverages the former to analyze the global maximizer of the score across layers and vertex sets. Explicitly, consider a multilayer network $\hat{G}(m, n)$ with distribution under the multilayer stochastic block model $\mathbb{P}_{m,n} = \mathbb{P}_{m,n}(P, \pi_1, \pi_2)$. For a fixed vertex set $B \subseteq [n]$ and layer set $L \subseteq [m]$, define the random score by

$$\hat{H}(B, L) := \frac{1}{|L|} \left( \sum_{\ell \in L} \hat{Q}_\ell(B)_+ \right)^2,$$

where $\hat{Q}_\ell(B)$ is the set-modularity of $B$ in layer $\ell$ under $\mathbb{P}_{m,n}$. Our main results address the behavior of $\hat{H}(B, L)$ under various assumptions on the parameters of the MSBM.

Toward the first result, for a fixed layer set $L \subseteq [m]$, let $\hat{B}_{opt}(n)$ denote the node set that maximizes $\hat{H}(B, L)$ (if more than one set does, any may be chosen arbitrarily). To define the notion of a "misclassified" node, for any two sets $B_1, B_2 \subseteq [n]$ let $d_h(B_1, B_2)$ denote the Hamming distance (rigorously defined as the cardinality of the symmetric difference between $B_1$ and $B_2$). We then define the number of misclassified nodes by a set $B$ by

$$\mathrm{Error}(B) := \min\{d_h(B, C_1), d_h(B, C_2)\}.$$

Note that this definition accounts for arbitrary labeling of the two communities. As the nodes and community assignments are registered across layers, neither $d_h$ nor Error depends on the choice of $L$.
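
The error measure just defined is simple to compute. The sketch below assumes that $C_1$ and $C_2$ denote the vertex sets of the two communities and represents every vertex set as a Python `set`; the function names are hypothetical and chosen only for this illustration.

```python
def hamming_distance(b1, b2):
    """Cardinality of the symmetric difference between two vertex sets."""
    return len(set(b1) ^ set(b2))


def misclassification_error(b, c1, c2):
    """Error(B) = min{d_h(B, C1), d_h(B, C2)}, which is invariant to swapping labels."""
    return min(hamming_distance(b, c1), hamming_distance(b, c2))


# Ten vertices split into two communities, and a candidate set that misses
# vertex 3 and wrongly includes vertex 4.
c1 = {0, 1, 2, 3}
c2 = {4, 5, 6, 7, 8, 9}
candidate = {0, 1, 2, 4}
print(misclassification_error(candidate, c1, c2))
```

Taking the minimum over both communities is what makes the measure indifferent to which community is called 1 and which is called 2.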
Before stating the main theorem, we define a few quantities that will be used throughout its statement and proof:

Definition 1 Let "det" denote matrix determinant. For a fixed layer set $L \subseteq [m]$, define

$$\delta_\ell := \det P_\ell, \qquad \delta(L) := \min_{\ell \in L} \delta_\ell, \qquad \pi := (\pi_1, \pi_2)^t, \qquad \kappa_\ell := \pi^t P_\ell \pi, \qquad \kappa(L) := \min_{\ell \in L} \kappa_\ell. \tag{4}$$

We now state the fixed-layer-set consistency result:

Theorem 2 Fix $m$ and let $\{\hat{G}(m, n)\}_{n > 1}$ be a sequence of multilayer stochastic 2 block models where $\hat{G}(m, n)$ is a random graph with $m$ layers and $n$ nodes generated under $\mathbb{P}_{m,n}(\cdot \mid P, \pi_1, \pi_2)$. Assume $\pi_1 \le \pi_2$, and that $\pi_1$, $\pi_2$, and $P$ do not change with $n$. Fix a layer set $L \subseteq [m]$. If $\delta(L) > 0$ then there exist constants $A, \eta > 0$ depending on $\pi_1$ and $\delta(L)$ such that for all fixed $\varepsilon \in (0, \eta)$,

$$\mathbb{P}_{m,n}\left( \mathrm{Error}\big(\hat{B}_{opt}(n)\big) < A n^{\varepsilon} \log n \right) > 1 - \exp\left\{ -\frac{\kappa(L)^2 \varepsilon}{32} n^{\varepsilon} (\log n)^{2 - \varepsilon} + \log 4|L| \right\} \tag{5}$$

for large enough $n$.

Note that the right-hand-side of (5) converges to 1 for all $\varepsilon \in (0, 1)$, regardless of $\eta$. Furthermore, if $\varepsilon \in (0, \eta)$, we have $n^{\varepsilon} < n^{\varepsilon'}$ for all $\varepsilon' > \eta$. Therefore, a corollary of Theorem 2 is that for any $\varepsilon \in (0, 1)$,

$$\mathbb{P}_{m,n}\left( \mathrm{Error}\big(\hat{B}_{opt}(n)\big) < n^{\varepsilon} \log n \right) \to 1 \quad \text{as } n \to \infty.$$

The above statement is perhaps a more illustrative version of Theorem 2, and shows that the constants $A$ and $\eta$ play a role only in bounding the convergence rate of the probability. The proof of Theorem 2 is given in Section 4.1. We note that the assumption that $\pi_1 \le \pi_2$ is made without loss of generality, since the community labels are arbitrary.

When $m = 1$, Theorem 2 implies asymptotic ($n \to \infty$) consistency in the (single-layer) stochastic block model. In this case, the condition that

$$\delta_\ell = P_\ell(1, 1) P_\ell(2, 2) - P_\ell(1, 2)^2 > 0$$

is a natural requirement on the inner community edge density of a block model.
This condition appears in a variety of consistency analyses, including the evaluation of modularity (Zhao et al., 2012). When $m > 1$, Theorem 2 implies the vertex set that maximizes $H(B, L)$ will have asymptotically vanishing error with high probability, given that $L$ is a fixed layer set with all layers satisfying $\delta_\ell > 0$.

3.2.1 Consistency of the joint optimizer

Theorem 2 does not address the joint optimizer of the score across all vertex-layer pairs. First, we point out that for a fixed $B \subseteq [n]$, the limiting behavior of the score $\hat{H}(B, L)$ depends on $L \subseteq [m]$ through the layer-wise determinants $\{\delta_\ell : \ell \in L\}$ and the scaling constant $\frac{1}{|L|}$ inherent to $H(B, L)$, as defined in equation (3). Let $\gamma : \mathbb{N}_+ \mapsto \mathbb{R}_+$ be a non-decreasing function of $|L|$. Define

$$H_\gamma(B, L) := \frac{1}{\gamma(|L|)} \left( \sum_{\ell \in L} Q_\ell(B)_+ \right)^2 \tag{6}$$

and let $\hat{H}_\gamma(B, L)$ be the corresponding random version of this score under an MSBM. We analyze the joint node-set optimizer of $H_\gamma$ under some representative choices of $\gamma$, an analysis which will ultimately motivate the choice $\gamma(|L|) = |L|$.

We first provide an illustrative example. Consider a MSBM with $m > 1$ layers having the following structure: the first layer has positive determinant, and all $m - 1$ remaining layers have determinant equal to 0. Note that $\delta_1 > 0$ implies that the first layer has ground-truth assortative community structure, and that $\delta_\ell = 0$ for all $\ell > 1$ implies that the remaining layers are (independent) Erdős-Rényi random graphs. In this case, the desired global optimizer of $H_\gamma(B, L)$ is community 1 (or 2) and the first layer. However, setting $\gamma(|L|) \equiv 1$ (effectively ignoring the scaling of $H$) will ensure that, in fact, the entire layer set is optimal, since $Q_\ell(B)_+ \ge 0$ by definition.
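
The effect of the scaling function in equation (6) can be explored numerically. The sketch below is illustrative only: the list `q_plus` holds made-up values standing in for $Q_\ell(B)_+$ at one fixed vertex set $B$ (two informative layers and one uninformative layer), and the helper names `scaled_score` and `maximizing_layer_sets` are hypothetical. It enumerates every nonempty layer set and reports which ones maximize $H_\gamma$ for three choices of $\gamma$.

```python
from itertools import combinations


def scaled_score(q_plus, layer_set, gamma):
    """H_gamma for a fixed vertex set: (sum of positive parts)^2 / gamma(|L|)."""
    total = sum(q_plus[l] for l in layer_set)
    return total ** 2 / gamma(len(layer_set))


def maximizing_layer_sets(q_plus, gamma, tol=1e-12):
    """Return all nonempty layer sets (as sorted tuples) attaining the maximum score."""
    m = len(q_plus)
    subsets = [s for k in range(1, m + 1) for s in combinations(range(m), k)]
    scores = {s: scaled_score(q_plus, s, gamma) for s in subsets}
    best = max(scores.values())
    return [s for s in subsets if best - scores[s] <= tol]


# Hypothetical layer-wise values: layers 0 and 1 informative, layer 2 not.
q_plus = [0.5, 0.5, 0.0]

print(maximizing_layer_sets(q_plus, lambda size: 1))          # gamma identically 1
print(maximizing_layer_sets(q_plus, lambda size: size))       # gamma(|L|) = |L|
print(maximizing_layer_sets(q_plus, lambda size: size ** 2))  # gamma(|L|) = |L|^2
```

With these particular numbers, the constant scaling cannot separate the informative layers from the full layer set, the linear scaling singles out exactly the two informative layers, and the quadratic scaling ties every nonempty subset of the informative layers. Exhaustive enumeration is used here only because the example has three layers.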
Setting $\gamma(|L|)$ to increase (strictly) in $|L|$, which introduces a penalty on the size of the layer set, is therefore desirable.

For a fixed scaling function $\gamma$, define the global joint optimizer of $\hat{H}_\gamma(B, L)$ by

$$\left( \hat{B}^{(n)}_{opt}, \hat{L}^{(n)}_{opt} \right) := \arg\max_{2^{[n]} \times 2^{[m]}} \hat{H}_\gamma(B, L). \tag{7}$$

Note that $\left( \hat{B}^{(n)}_{opt}, \hat{L}^{(n)}_{opt} \right)$ is random, and may contain multiple elements of $2^{[n]} \times 2^{[m]}$. The next theorem addresses the behavior of $\left( \hat{B}^{(n)}_{opt}, \hat{L}^{(n)}_{opt} \right)$ under the MSBM for various choices of $\gamma(|L|)$, and shows that setting $\gamma(|L|) = |L|$ is desirable for consistency.

Theorem 3 Fix $m$ and let $\{\hat{G}(m, n)\}_{n > 1}$ be a sequence of multilayer stochastic 2 block models where $\hat{G}(m, n)$ is a random graph with $m$ layers and $n$ nodes generated under $\mathbb{P}_{m,n}(\cdot \mid P, \pi_1, \pi_2)$. Assume $\pi_1 \le \pi_2$, and that $\pi_1$, $\pi_2$, and $P$ do not change with $n$. Fix $0 = \delta^{(0)} < \delta^{(1)} < 1$. Suppose the layer set $[m]$ is split according to $[m] = \cup_{i = 0, 1} L_i$, where $\delta_\ell = \delta^{(i)}$ for all $\ell \in L_i$. Then for any $\varepsilon > 0$, the following hold:

(a) Let $\hat{L}_+ := \{\ell : \hat{Q}_\ell(\hat{B}^{(n)}_{opt}) > 0\}$. If $\gamma(|L|) \equiv 1$, then for all $n > 1$, $\hat{L}^{(n)}_{opt} = \hat{L}_+$, and

$$\mathbb{P}_{m,n}\left( \mathrm{Error}\big(\hat{B}^{(n)}_{opt}\big) < n^{\varepsilon} \log n \right) \to 1 \quad \text{as } n \to \infty.$$

(b) If $\gamma(|L|) = |L|$,

$$\mathbb{P}_{m,n}\left( \hat{L}^{(n)}_{opt} = L_1, \ \mathrm{Error}\big(\hat{B}^{(n)}_{opt}\big) < n^{\varepsilon} \log n \right) \to 1 \quad \text{as } n \to \infty.$$

(c) If $\gamma(|L|) = |L|^2$,

$$\mathbb{P}_{m,n}\left( \hat{L}^{(n)}_{opt} \subseteq 2^{L_1}, \ \mathrm{Error}\big(\hat{B}^{(n)}_{opt}\big) < n^{\varepsilon} \log n \right) \to 1 \quad \text{as } n \to \infty.$$

The proof of Theorem 3 is given in Section 4.2. Part (a) implies that setting $\gamma(|L|) \equiv 1$ ensures that the optimal layer set will be, simply, all layers with positive modularity, thereby making this an undesirable choice for the function $\gamma$.
Part (c) says that if $\gamma(|L|) = |L|^2$, the layer set with the highest average layer-wise modularity will be optimal (with high probability as $n \to \infty$), which means that all subsets of $L_1$ are asymptotically equivalent with respect to $\hat{H}(B, L)$ (with high probability). By part (b), if $\gamma(|L|) = |L|$, then $L_1$ is the unique asymptotic maximizer of the population score (with high probability). Therefore, $\gamma(|L|) = |L|$ is the most desirable choice of scaling.

3.3 Discussion of theoretical results

As mentioned previously, the theoretical results in Section 3.2 regard the global optimizer of the score. The results have the following in-practice implication: if one simulates from an MSBM satisfying the assumptions of the theorem, and subsequently finds the global optimizer of the score, the classification error of the optimizer will be vanishingly small with increasingly high probability (as $n \to \infty$). To illustrate this point, consider an MSBM with $m > 1$ layers, each having community structure $\pi_1 = 0.4$ and, for $r \in (0, 0.95)$, $P(1, 1) = P(2, 2) = 0.05 + r$ and $P(1, 2) = 0.05$. Under this model, $\delta_\ell = r(r + 0.1) > 0$ for all $\ell \in [m]$, and the assumptions of Theorem 3 are satisfied. Therefore, if we were to simulate this MSBM and find the global optimizer of $H$, then with probability tending to one as $n \to \infty$ we would (1) recover the correct layer set $[m]$, and (2) obtain an optimal node set whose classification error is small (vanishingly so as $n \to \infty$).

Of course, in practice it is computationally infeasible to find the global optimizer, so we employ the method laid out in Section 5 to find local maxima. We find through simulation (see Section 7) that our method achieves extremely low error rates for relatively small networks, including on the MSBM described above, for many values of $r$.
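
For reference, the determinant quoted for this example follows in one line from the $2 \times 2$ determinant formula $\delta_\ell = P(1,1)\,P(2,2) - P(1,2)^2$; the worked steps below are added here as a check and use only the values stated in the example.

```latex
\begin{align*}
\delta_\ell &= P(1,1)\,P(2,2) - P(1,2)^2 \\
            &= (0.05 + r)^2 - 0.05^2 \\
            &= r^2 + 0.1\,r \\
            &= r\,(r + 0.1) > 0 \qquad \text{for } r \in (0, 0.95).
\end{align*}
```

The upper limit $r < 0.95$ keeps $P(1,1) = P(2,2) = 0.05 + r$ strictly below 1, as the MSBM requires.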
These simulation results reflect the intuition that theoretical results for global optimizers should have practical implications when local optimization methods are sufficiently effective.

A limitation of our theoretical results is that they assume an MSBM with only two communities. Furthermore, the model does not allow for any notion of overlap between communities. Nevertheless, in the setting we consider we demonstrate that there is a concise condition on the parameters of the model that guarantees consistency, namely, that $\delta(L) > 0$ for a target layer set $L$. We expect a similar condition to exist in more complicated settings. In particular, at the outset of Section 4.1, we sketch how the $\delta(L) > 0$ condition relates to the maxima of the population version of the score, which roughly speaking is the deterministic, limiting form of the score under the MSBM. We expect that, in more complex settings with (say) more than two communities or overlapping nodes, similar conditions on the model parameters would guarantee that the population version of the score be maximized at the correct community/layer partitions, which (as we show) would entail that the classification error of the global maximizer converges in probability to zero. Though the proofs in such settings would undoubtedly be more complicated, the analyses in this paper should serve as a theoretical foundation. Furthermore, we have empirically analyzed those settings via simulation in Section 7.1 (for more than two communities) and Appendix D (for overlapping communities).

4. Proofs

In this section we prove the theoretical results given in Section 3. The majority of the section is devoted to a detailed proof of Theorem 2 and supporting lemmas. This is followed by the proof of Theorem 3, of which we give only a sketch, as many of the results and techniques contributing to the proof of Theorem 2 can be re-used.
4.1 Proof of Theorem 2, and Supporting Lemmas

We prove Theorem 2 via a number of supporting lemmas. We begin with some notation:

Definition 4 For a fixed vertex set $B \subseteq [n]$ define

$$\rho_n(B) = \frac{|B \cap C_{1,n}|}{|B|}, \qquad s_n(B) = \frac{|B|}{n}, \qquad v_n(B) := (\rho_n(B), 1 - \rho_n(B)). \tag{8}$$

We will at times suppress dependence on $n$ and $B$ in the above expressions.

Definition 5 Define the population normalized modularity of a set $B$ in layer $\ell$ by

$$q_\ell(B) := \frac{s_n(B)}{\sqrt{2}} \left( v_n(B)^t P_\ell v_n(B) - \frac{(v_n(B)^t P_\ell \pi)^2}{\kappa_\ell} \right). \tag{9}$$

Define the population score function $H_*(\cdot, L) : 2^{[n]} \mapsto \mathbb{R}$ by

$$H_*(B, L) = |L|^{-1} \left( \sum_{\ell \in L} q_\ell(B) \right)^2. \tag{10}$$

Throughout the results in this section, we assume that $L \subseteq [m]$ is a fixed layer set (as in the statement of Theorem 2). We will therefore, at times, suppress the dependence on $L$ from $\delta(L)$ and $\kappa(L)$ (from Definition 1).

4.1.1 Sketch of the Proof of Theorem 2

The proof of Theorem 2 is involved and broken into many lemmas. In this section, we give a rough sketch of the argument, as follows. The lemmas in this section establish that:

1. $C_{1,n}$ maximizes the population score $H_*(\cdot, L)$ (Lemmas 6 and 7).

2. For large enough sets $B \subseteq [n]$, the random score $\hat{H}(B, L)$ is bounded in probability around the population score $H_*(B, L)$ (Lemmas 9 and 12).

3. Inductive Step: For fixed $k > 1$, assume that $d_h(\hat{B}_{opt}(n), C_{1,n})/n = O_p(b_{n,k})$, where larger $k$ makes $b_{n,k}$ of smaller order. Then, based on concentration properties of the score, in fact $d_h(\hat{B}_{opt}(n), C_{1,n})/n = O_p(b_{n,k+1})$ (Lemma 13).

4. There exists a constant $\eta$ such that for any $\varepsilon \in (0, \eta)$, $d_h(\hat{B}_{opt}(n), C_{1,n})/n = O_p(n^{\varepsilon} \log n)$ (Theorem 2). This result is shown using the Inductive Step.

4.1.2 Supporting lemmas for the Proof of Theorem 2

Lemma 6 Define $\phi(L) := \left( |L|^{-1} \sum_{\ell} \frac{\det P_\ell}{2 \kappa_\ell} \right)^2$. Then:

1.
For any $B \subseteq [n]$, $q_\ell(B) = \frac{s_n(B)}{\sqrt{2}} (\pi_1 - \rho_n(B))^2 \cdot \frac{\det P_\ell}{2\kappa_\ell}$, and therefore $H_*(B, L) = |L|\, \phi(L)\, \frac{s_n(B)^2}{2} (\pi_1 - \rho_n(B))^4$.

2. $\delta(L)^2 \le \phi(L) \le \frac{1}{\pi_1^2}\, \delta(L)^2$ and therefore $H_*(C_{1,n}, L) \ge |L|\, \frac{\pi_1^2}{2} (1 - \pi_1)^4\, \delta(L)^2$.

Lemma 7 Fix any $n \ge 1$. Define
\[
\mathcal{R}(t) :=
\begin{cases}
\{ B \subseteq [n] : \max\{ |s(B) - \pi_1|,\, 1 - \rho(B) \} \le t \}, & \pi_1 < \pi_2 \\
\{ B \subseteq [n] : \max\{ |s(B) - \pi_1|,\, \rho(B) \} \le 1 - \rho(B) \le t \}, & \pi_1 = \pi_2
\end{cases}
\]
Then there exists a constant $a > 0$ depending just on $\pi_1$ such that for sufficiently small $t$, $B \notin \mathcal{R}(t)$ implies $H_*(B, L) < H_*(C_{1,n}, L) - a |L| \phi(L)\, t$.

The proofs of Lemmas 6-7 are given in Appendix A. We now give a general concentration inequality for $\hat{H}(B, L)$, which shows that for sufficiently large sets $B \subseteq [n]$, $\hat{H}(B, L)$ is close to the population score $H_*(B, L)$ with high probability. This result is used in the proof of Lemma 12, and its proof is given in Appendix A. We first give the following definition:

Definition 8 For fixed $\varepsilon > 0$ and $n \ge 1$, define $\mathcal{B}_n(\varepsilon) := \{ B \subseteq [n] : |B| > n\varepsilon \}$.

Lemma 9 Fix $\varepsilon \in (0, 1)$. Let $\kappa$ be as in Definition 1. For each $n \ge 1$ suppose a collection of node sets $\mathcal{B}_n$ is contained in $\mathcal{B}_n(\varepsilon)$. Then for large enough $n$,
\[
P_n\left( \sup_{\mathcal{B}_n} \big| \hat{H}(B, L) - H_*(B, L) \big| > \frac{4|L|\,t}{n^2} + \frac{52|L|}{\kappa n} \right) \le 4 |L|\, |\mathcal{B}_n| \exp\left( -\frac{\kappa^2 \varepsilon\, t^2}{16 n^2} \right)
\]
for all $t > 0$.

We now define new notation that will serve the remaining lemmas:

Definition 10 Let $\gamma_n := \log n / n$, and for any integer $k > 0$, define $b_{n,k} := \gamma_n^{1 - \frac{1}{2^k}}$.

Definition 11 For any $r \in [0, 1]$ and $C \subseteq [n]$, define the $r$-neighborhood of $C$ by $N(C, r) := \{ B \subseteq [n] : d_h(B, C)/n \le r \}$.
For all $n \ge 1$, any constant $A > 0$, and fixed $k \ge 1$, define
\[
N_{n,k}(A) :=
\begin{cases}
N(C_1, A \cdot b_{n,k-1}) \cup N(C_2, A \cdot b_{n,k-1}), & k > 1 \\
\mathcal{B}_n(A), & k = 1
\end{cases}
\]

Lemma 12, stated below, is a concentration inequality for the random variable $\hat{H}(B, L)$ on particular neighborhoods of $C_1$:

Lemma 12 Fix $\varepsilon \in (0, \pi_1)$ and any constant $A > 0$. For $k > 1$ satisfying $1/2^{k-1} < \varepsilon$, we have for sufficiently large $n$ that
\[
\sup_{B \in N_{n,k}(A)} \big| \hat{H}(B, L) - H_*(B, L) \big| \le 5 |L|\, b_{n,k} \tag{11}
\]
with probability greater than $1 - 2 \exp\{ -\frac{\kappa^2 \varepsilon}{32}\, n \gamma_n^{1-\varepsilon} \log(n) + \log 4|L| \}$. The conclusion holds with $k = 1$ if $A = \varepsilon$.

The proof of Lemma 12 is given in Appendix A. We now state and prove the key lemma used to drive the induction step in the proof of Theorem 2:

Figure 1: Illustration of relationship between collections of node sets.

Lemma 13 Fix $\varepsilon \in (0, \pi_1)$ and an integer $k > 1$ satisfying $1/2^{k-1} < \varepsilon$. Suppose there exist constants $A, b > 0$ such that for large enough $n$,
\[
P_n\big( \hat{B}_{\mathrm{opt}}(n) \in N_{n,k}(A) \big) \ge 1 - b \exp\left\{ -\frac{\kappa^2 \varepsilon}{32}\, n \gamma_n^{1-\varepsilon} \log n + \log 4|L| \right\} := 1 - b\, \beta_n(\varepsilon).
\]
Then there exists a constant $A' > 0$ depending only on $\pi_1$ and $\delta$ such that for large enough $n$,
\[
P_n\big( \hat{B}_{\mathrm{opt}}(n) \in N_{n,k+1}(A') \big) \ge 1 - (4 + b)\, \beta_n(\varepsilon).
\]
The conclusion holds for $k = 1$ if $A = \varepsilon$.

Proof Assume $\pi_1 < \pi_2$; the following argument may be easily adapted to the case where $\pi_1 = \pi_2$, which we explain at the end. Recall $b_{n,k}$ from Definition 10. For $c > 0$, define
\[
\mathcal{R}_{n,k}(c) := \{ B \subset [n] : \max\{ |s(B) - \pi_1|,\, 1 - \rho(B) \} \le c \cdot b_{n,k} \}.
\]
Note that sets $B \in \mathcal{R}_{n,k}(c)$ have bounded Hamming distance from $C_{1,n}$, as shown by the following derivation.
Writing $s = s(B)$ and $\rho = \rho(B)$, for all $B \in \mathcal{R}_{n,k}(c)$ we have
\[
\begin{aligned}
n^{-1} |d_h(B, C_{1,n})| &= n^{-1} \big( |B \setminus C_{1,n}| + |C_{1,n} \setminus B| \big) \\
&= n^{-1} \big( |B| - |B \cap C_{1,n}| + |C_{1,n}| - |B \cap C_{1,n}| \big) \\
&= s + \pi_1 - 2\rho s \\
&\le s + (s + c \cdot b_{n,k}) - 2 (1 - c \cdot b_{n,k})\, s \\
&= c \cdot b_{n,k} + 2 s c \cdot b_{n,k} \le 3 c \cdot b_{n,k}.
\end{aligned} \tag{12}
\]
Therefore, $\mathcal{R}_{n,k}(c) \subseteq N(C_{1,n}, A' \cdot b_{n,k}) \subset N_{n,k+1}(A')$ with $A' = 3c$. We have assumed $\hat{B}_{\mathrm{opt}}(n) \in N_{n,k}(A)$ with high probability; our aim is to show $\hat{B}_{\mathrm{opt}}(n) \in N_{n,k+1}(A')$. Since $\mathcal{R}_{n,k}(c) \subseteq N_{n,k+1}(A')$, it is sufficient to show that $\hat{B}_{\mathrm{opt}}(n) \notin N_{n,k}(A) \cap \mathcal{R}_{n,k}(c)^c$ with high probability. This is illustrated by Figure 1: since $\hat{B}_{\mathrm{opt}}(n)$ is inside the outer oval (with high probability), it is sufficient to show that it cannot be outside the inner oval. To this end, it is enough to show that, with high probability, $\hat{H}(B, L) < \hat{H}(C_{1,n}, L)$ for all sets $B$ in $N_{n,k}(A) \cap \mathcal{R}_{n,k}(c)^c$. Note that by Lemma 12,
\[
\sup_{B \in N_{n,k}(A)} \hat{H}(B, L) < H_*(B, L) + 5 |L|\, b_{n,k} \tag{13}
\]
for large enough $n$, with probability at least $1 - 2\beta_n(\varepsilon)$. Next, since $c\, b_{n,k} \to 0$ as $n \to \infty$, by Lemma 7 there exists a constant $a > 0$ depending just on $\pi_1$ such that for large enough $n$, $B \in \mathcal{R}_{n,k}(c)^c$ implies $H_*(B, L) < H_*(C_{1,n}, L) - a |L| \phi(L)\, c\, b_{n,k}$. Applying Lemma 12 again, we also have $H_*(C_{1,n}, L) < \hat{H}(C_{1,n}, L) + 5 |L|\, b_{n,k}$ with probability at least $1 - 2\beta_n(\varepsilon)$. Furthermore, $\phi(L) \ge \delta^2$ by Lemma 6. Applying these inequalities to (13), we obtain
\[
\sup_{B \in N_{n,k}(A) \cap \mathcal{R}_{n,k}(c)^c} \hat{H}(B, L) < \hat{H}(C_{1,n}, L) - a |L| \delta^2 c\, b_{n,k} + 10 |L|\, b_{n,k} \tag{14}
\]
with probability at least $1 - 4\beta_n(\varepsilon)$. With $c$ large enough, (14) implies that $\hat{H}(B, L) < \hat{H}(C_{1,n}, L)$ for all $B \in N_{n,k}(A) \cap \mathcal{R}_{n,k}(c)^c$. This proves the result in the $\pi_1 < \pi_2$ case.
If $\pi_1 = \pi_2$, the argument is almost identical. We instead define $\mathcal{R}_{n,k}(c)$ as
\[
\mathcal{R}_{n,k}(c) := \{ B \subseteq [n] : \max\{ |s(B) - \pi_1|,\, \rho(B),\, 1 - \rho(B) \} \le c \cdot b_{n,k} \}.
\]
A derivation analogous to that giving inequality (12) yields
\[
n^{-1} \max\{ d_h(B, C_{1,n}),\, d_h(B, C_{2,n}) \} \le 3 c \cdot b_{n,k},
\]
which directly implies that $\mathcal{R}_{n,k}(c) \subseteq N_{n,k+1}(A')$ with $A' = 3c$. The remainder of the argument follows unaltered. ∎

4.1.3 Proof of Theorem 2

Recall $Q_\ell(B)$ from Definition 2 and let $\hat{Q}_\ell(B)$ be its random version under the MSBM, as in Section 3.2. For any $B \subseteq [n]$, we have the inequality
\[
\{ \hat{Q}_\ell(B) \}_+ \le \frac{Y_\ell(B)}{n \binom{|B|}{2}^{1/2}} \le \frac{\binom{|B|}{2}}{n \binom{|B|}{2}^{1/2}} \le \frac{|B|}{n}. \tag{15}
\]
This yields the following inequality for $\hat{H}(B, L)$:
\[
\hat{H}(B, L) = |L|^{-1} \Big( \Big\{ \sum_{\ell \in [L]} \hat{Q}_\ell(B) \Big\}_+ \Big)^2 \le |L|^{-1} \Big( \sum_{\ell \in [L]} \{ \hat{Q}_\ell(B) \}_+ \Big)^2 \le |L|\, n^{-2} |B|^2, \tag{16}
\]
where the last step applies (15) to each of the $|L|$ summands. Recall that $\mathcal{B}_n(\varepsilon) := \{ B \in 2^{[n]} : |B| > \varepsilon n \}$. Inequality (16) implies $\hat{H}(B, L) \le |L| \varepsilon^2$ for all $B \in \mathcal{B}_n(\varepsilon)^c$. By part 2 of Lemma 6, $\phi(L) \ge \delta^2$. Therefore, defining $\tau := \frac{\pi_1^2}{2} (1 - \pi_1)^4\, \delta^2 / 2$,
\[
|L|\, \tau < |L|\, \phi(L)\, \frac{\pi_1^2}{2} (1 - \pi_1)^4 = H_*(C_{1,n}, L).
\]
Therefore, for all $B \in \mathcal{B}_n(\varepsilon)^c$, we have $\hat{H}(B, L) \le |L| \varepsilon^2 < H_*(C_{1,n}, L) - |L| (\tau - \varepsilon^2)$. By Lemma 12, for large enough $n$ we therefore have
\[
\sup_{\mathcal{B}_n(\varepsilon)^c} \hat{H}(B, L) < \hat{H}(C_{1,n}, L) - |L| (\tau - \varepsilon^2) + 5 |L|\, \gamma_n^{1-\varepsilon} \tag{17}
\]
with probability greater than $1 - 2\beta_n(\varepsilon)$, where $\beta_n(\varepsilon) := \exp\{ -\frac{\kappa^2 \varepsilon}{32}\, n \gamma_n^{1-\varepsilon} \log n + \log 4|L| \}$. For any $\varepsilon < \sqrt{\tau}$, inequality (17) implies $\hat{H}(B, L) < \hat{H}(C_{1,n}, L)$ for all $B \in \mathcal{B}_n(\varepsilon)^c$, and therefore $\hat{B}_{\mathrm{opt}}(n) \in \mathcal{B}_n(\varepsilon)$, with probability at least $1 - 2\beta_n(\varepsilon)$. Note that $\varepsilon < \sqrt{\tau} < \pi_1$, and $N_{n,1}(\varepsilon) = \mathcal{B}_n(\varepsilon)$ by Definition 11. Therefore, the conditions for Lemma 13 with $k = 1$ (and $A = \varepsilon$) are satisfied.
For any fixed $\varepsilon \in (0, \eta)$ with $\eta := \sqrt{\tau}$, we may now apply Lemma 13 recursively until $1/2^k \le \varepsilon$. This establishes that for sufficiently large $n$,
\[
P_n\big( \hat{B}_{\mathrm{opt}}(n) \in N_{n,k}(A) \big) \ge 1 - (2 + 4k)\, \beta_n(\varepsilon). \tag{18}
\]
By definition, $\hat{B}_{\mathrm{opt}}(n) \in N_{n,k}(A)$ implies that
\[
\mathrm{Error}(\hat{B}_{\mathrm{opt}}(n)) := \min_{C = C_1, C_2} d_h(\hat{B}_{\mathrm{opt}}(n), C) \le A \cdot n \cdot b_{n,k}. \tag{19}
\]
Since $1/2^k \le \varepsilon$, note that
\[
n \cdot b_{n,k} = n \gamma_n^{1 - \frac{1}{2^k}} = n \cdot n^{\frac{1}{2^k} - 1} (\log n)^{1 - \frac{1}{2^k}} < n^{\varepsilon} \log n.
\]
Combined with inequality (18), this completes the proof. ∎

4.2 Proof of Theorem 3

To prove part (a), we first note that Theorem 2 implies that on the layer set $L_1$, for any $\varepsilon > 0$, $\mathrm{Error}(\hat{B}^{(n)}_{\mathrm{opt}}) = O_p(n^{\varepsilon} \log n)$. Lemma 6 can be used to show that $H_*(B, L) = 0$ for any $L \subseteq L_0$ and any $B \subseteq [n]$. Using Lemma 9 and taking a union bound over $L_0$, it is then straightforward to show (using techniques from the proof of Theorem 2) that on the full layer set $[m]$, for any $\varepsilon > 0$, $\mathrm{Error}(\hat{B}_{\mathrm{opt}}) = O_p(n^{\varepsilon} \log n)$. Considering now $\hat{L}^{(n)}_{\mathrm{opt}}$, observe that if $\hat{Q}_\ell(B) \le 0$, then $\hat{H}(B, L) = \hat{H}(B, L \setminus \{\ell\})$. This immediately implies that $\hat{L}^{(n)}_{\mathrm{opt}} = \hat{L}^{+}$.

To prove part (b), we note that it is straightforward to show (using Lemma 6) that $H_*(B, L_1) \ge H_*(B, L)$ for any $L \subset [m]$, with equality if and only if $L = L_1$. Using Lemma 9 and a union bound over $[m]$ will show that $\hat{L}^{(n)}_{\mathrm{opt}} = L_1$ with high probability. Applying Theorem 2 completes the part.

Part (c) is shown similarly, with the application of Lemma 6 showing that for any $L \subseteq L_1$ and $L' \subseteq [m]$, $H_*(B, L) \ge H_*(B, L')$, with equality if and only if $L' \subseteq L_1$. ∎

5. The Multilayer Extraction Procedure

The Multilayer Extraction procedure is built around three operations: initialization, extraction, and refinement. In the initialization stage, a family of seed vertex sets is specified.
Next an iterative extraction procedure (Extraction) is applied to each of the seed sets. Extraction alternately updates the layers and vertices in a vertex-layer community in a greedy fashion, improving the score at each iteration, until no further improvement to the score is possible. The family of extracted vertex-layer communities is then reduced using the Refinement procedure, which ensures that the final collection of communities contains the extracted community with largest score, and that the pairwise overlap between any pair of communities is at most $\beta$, where $\beta \in [0, 1]$ is a user-defined parameter. The importance and relevance of this parameter is discussed in Section 5.3.1. We describe the Multilayer Extraction algorithm in more detail below.

5.1 Initialization

For each vertex $u \in [n]$ and layer $\ell \in [m]$ let $N(u, \ell) = \{ v \in [n] : \{u, v\} \in E_\ell \}$ be the set of vertices connected to $u$ in $G_\ell$. We will refer to $N(u, \ell)$ as the neighborhood of $u$ in layer $\ell$. Let $\mathcal{B}_0 = \{ N(u, \ell),\ u \in [n],\ \ell \in [m] \}$ be the family of all vertex neighborhoods in the observed multilayer network $G(m, n)$. Multilayer Extraction uses the vertex sets in $\mathcal{B}_0$ as seed sets for identifying communities. Our choice of seed sets is motivated by Gleich and Seshadhri (2012), who empirically justified the use of vertex neighborhoods as good seed sets for local detection methods seeking communities with low conductance.

5.2 Extraction

Given an initial vertex set, the Extraction procedure seeks a vertex-layer community with large score. The algorithm iteratively conducts a Layer Set Search followed by a Vertex Set Search, and repeats these steps until a vertex-layer set, whose score is a local maximum, is reached.
In each step of the procedure, the score of the candidate community strictly increases, and the procedure is stopped once no improvements to the score are possible. These steps are described next.

Layer Set Search: For a fixed vertex set $B \subseteq [n]$, Extraction searches for the layer set $L$ that maximizes $H(B, \cdot)$ using a rank ordering of the layers that depends only on $B$. In particular let $Q_\ell(B)$ be the local set modularity of layer $\ell$ from (2). Let $L_o$ be the layer set identified in the previous iteration of the algorithm. We will now update the layer set $L_o \rightsquigarrow L$. This consists of the following three steps:

(i) Order the layers so that $Q_{\ell_1}(B) \ge \cdots \ge Q_{\ell_m}(B)$.
(ii) Identify the smallest integer $k$ such that $H(B, \{\ell_1, \ldots, \ell_k\}) > H(B, \{\ell_1, \ldots, \ell_k, \ell_{k+1}\})$. Write $L_p := \{\ell_1, \ldots, \ell_k\}$ for the proposed change in the layer set.
(iii) If $H(B, L_p) > H(B, L_o)$ set $L = L_p$. Else set $L = L_o$.

In the first iteration of the algorithm (where we take $L_o = \emptyset$), we set $L = L_p$ in step (iii) of the search. The selected layer set $L_p$ is a local maximum for the score $H(B, \cdot)$.

Vertex Set Search: Suppose now that we are given a vertex-layer set $(B, L)$. Extraction updates $B$, one vertex at a time, in a greedy fashion, with updates depending on the layer set $L$ and the current vertex set. In detail, for each $u \in [n]$ let
\[
\delta_u(B, L) =
\begin{cases}
H(B \setminus \{u\}, L) - H(B, L) & \text{if } u \in B \\
H(B \cup \{u\}, L) - H(B, L) & \text{if } u \notin B.
\end{cases} \tag{20}
\]
Vertex Set Search iteratively updates $B$ using the following steps:

(i) Calculate $\delta_u(B, L)$ for all $u \in [n]$. If $\delta_u(B, L) \le 0$ for all $u \in [n]$, then stop. Otherwise, identify $u^* = \arg\max_{u \in [n]} \delta_u(B, L)$.
(ii) If $u^* \in B$, then remove $u^*$ from $B$. Otherwise, add $u^*$ to $B$.
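The two searches above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the callables `Q` (local set modularity $Q_\ell(B)$) and `H` (score $H(B, L)$) are hypothetical parameters that the caller must supply, the function names are invented for this sketch, and the fallback of proposing all layers when step (ii) finds no $k$ is an assumption that the text does not spell out.

```python
from typing import Callable, FrozenSet, Hashable

# Hypothetical signatures: neither callable is defined in this section.
LayerModularity = Callable[[Hashable, FrozenSet], float]  # Q_l(B)
Score = Callable[[FrozenSet, FrozenSet], float]           # H(B, L)


def layer_set_search(B, layers, Q, H, L_old=frozenset()):
    """One Layer Set Search for a fixed vertex set B; returns the new layer set."""
    B = frozenset(B)
    # (i) Rank layers by decreasing local modularity Q_l(B).
    ranked = sorted(layers, key=lambda layer: Q(layer, B), reverse=True)
    # (ii) Smallest k with H(B, top k) > H(B, top k+1).
    #      Assumption: if no such k exists, every layer is proposed.
    k = len(ranked)
    for j in range(1, len(ranked)):
        if H(B, frozenset(ranked[:j])) > H(B, frozenset(ranked[:j + 1])):
            k = j
            break
    L_prop = frozenset(ranked[:k])
    # (iii) Accept the proposal only on strict improvement
    #       (always accepted on the first pass, when L_old is empty).
    if not L_old or H(B, L_prop) > H(B, L_old):
        return L_prop
    return frozenset(L_old)


def vertex_set_search(B, L, vertices, H):
    """Greedy Vertex Set Search: toggle the single best vertex until no gain."""
    B, L, vertices = set(B), frozenset(L), list(vertices)
    while True:
        current = H(frozenset(B), L)
        best_gain, best_u = 0.0, None
        for u in vertices:
            # B ^ {u} removes u if it is in B and adds it otherwise,
            # so this difference is delta_u(B, L) from (20).
            gain = H(frozenset(B ^ {u}), L) - current
            if gain > best_gain:
                best_gain, best_u = gain, u
        if best_u is None:  # delta_u <= 0 for every u: local maximum
            return frozenset(B)
        B ^= {best_u}
```

Because every accepted toggle strictly increases the score and there are finitely many vertex sets, the loop in `vertex_set_search` terminates. Each pass evaluates `H` once per vertex, so a practical implementation would likely update the score incrementally rather than recompute it.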
At each iteration of Extraction, the score of the updated vertex-layer set strictly increases, and the eventual convergence of this procedure to a local maximum is guaranteed as the possible search space is finite. The resulting local maximum is returned as an extracted community.

5.3 Refinement

Beginning with the $n$ vertex neighborhoods in each layer of the network, the Extraction procedure identifies a collection $\mathcal{C}_T = \{ (B_t, L_t) \}_{t \in T}$ of at most $m \cdot n$ vertex-layer communities. Given an overlap parameter $\beta \in [0, 1]$, the family $\mathcal{C}_T$ is refined in a greedy fashion, via the Refinement procedure, to produce a subfamily $\mathcal{C}_S$, $S \subseteq T$, of high-scoring vertex-layer sets having the property that the overlap between any pair of sets is at most $\beta$.

We measure the overlap between two candidate communities $(B_q, L_q)$ and $(B_r, L_r)$ using a generalized Jaccard match score
\[
J(q, r) = \frac{1}{2} \frac{|B_q \cap B_r|}{|B_q \cup B_r|} + \frac{1}{2} \frac{|L_q \cap L_r|}{|L_q \cup L_r|}. \tag{21}
\]
It is easy to see that $J(q, r)$ is between 0 and 1. Moreover, $J(q, r) = 1$ if and only if $(B_q, L_q) = (B_r, L_r)$ and $J(q, r) = 0$ if and only if $(B_q, L_q)$ and $(B_r, L_r)$ are disjoint. Larger values of $J(\cdot, \cdot)$ indicate more overlap between communities.

In the first step of the procedure, Refinement identifies and retains the community $(B_s, L_s)$ in $\mathcal{C}_T$ with the largest score and sets $S = \{s\}$. In the next step, the procedure identifies the community $(B_s, L_s)$ with largest score that satisfies $J(s, s') \le \beta$ for all $s' \in S$. The index $s$ is then added to $S$. Refinement continues expanding $S$ in this way until no further additions to $S$ are possible, namely when for each remaining $s \in T \setminus S$, there exists an $s' \in S$ such that $J(s, s') > \beta$. The refined collection $\mathcal{C}_S = \{ (B_s, L_s) \}_{s \in S}$ is returned.
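As a concrete illustration of (21) and the greedy Refinement loop, the following Python sketch may help. The function names, the representation of a community as a `(vertex_set, layer_set)` pair, and the convention that an empty union contributes 0 are assumptions made for this example only. The single pass in decreasing score order is equivalent to the step-by-step description above, because a community that is ineligible once stays ineligible as $S$ grows.

```python
def jaccard_match(comm_q, comm_r):
    """Generalized Jaccard match score J(q, r) of equation (21)."""
    (B_q, L_q), (B_r, L_r) = comm_q, comm_r

    def ratio(X, Y):
        union = X | Y
        # Assumed convention: two empty sets contribute 0 overlap.
        return len(X & Y) / len(union) if union else 0.0

    return 0.5 * ratio(B_q, B_r) + 0.5 * ratio(L_q, L_r)


def refine(communities, scores, beta):
    """Greedy Refinement; returns the kept indices S in order of selection.

    communities: list of (vertex_set, layer_set) pairs (sets or frozensets)
    scores:      list of scores, one per community
    beta:        overlap parameter in [0, 1]
    """
    order = sorted(range(len(communities)), key=lambda t: scores[t], reverse=True)
    kept = []
    for t in order:
        # Keep t only if its overlap with every retained community is <= beta.
        if all(jaccard_match(communities[t], communities[s]) <= beta for s in kept):
            kept.append(t)
    return kept
```

With `beta = 0` only mutually disjoint communities survive, and with `beta = 1` every extracted community is kept, matching the two extremes discussed in Section 5.3.1.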
5.3.1 Choice of β

Many existing community detection algorithms have one or more tunable parameters that control the number and size of the communities they identify (Von Luxburg, 2007; Leskovec et al., 2008; Mucha et al., 2010; Lancichinetti et al., 2011; Wilson et al., 2014). The family of communities output by Multilayer Extraction depends on the overlap parameter $\beta \in [0, 1]$. In practice, the value of $\beta$ plays an important role in the structure of the vertex-layer communities. For instance, setting $\beta = 0$ will provide vertex-layer communities that are fully disjoint (no overlap between vertices or layers). On the other hand, when $\beta = 1$ the procedure outputs the full set of extracted communities, many of which may be redundant. In exploratory applications, we recommend investigating the identified communities at multiple values of $\beta$, as the structure of communities at different resolutions may provide useful insights about the network itself (see for instance Leskovec et al. (2008) or Mucha et al. (2010)).

Empirically, we observe that the number of communities identified by the Multilayer Extraction procedure is non-decreasing with $\beta$, and there is typically a long interval of $\beta$ values over which the number and identity of communities remains constant. In practice we specify a default value of $\beta$ by analyzing the number of communities across a grid of $\beta$ between 0 and 1 in increments of size 0.01. For $i = 1, \ldots, 101$, let $\beta_i = (i - 1) \times 0.01$ and let $k_i = k(\beta_i)$ denote the number of communities identified at $\beta_i$. The default value $\beta_0$ is the smallest $\beta$ value in the longest stable window, namely
\[
\beta_0 = \text{smallest } \beta_i \text{ such that } k(\beta_i) = \mathrm{mode}(k_1, \ldots, k_{101}).
\]

6. Application Study

In this section, we assess the performance and potential utility of the Multilayer Extraction procedure through an empirical case study of three multilayer networks, including a multilayer social network, transportation network, and collaboration network. We compare and contrast the performance of Multilayer Extraction with six benchmark methods: Spectral Clustering (Newman, 2006a), Label Propagation (Raghavan et al., 2007), Fast and Greedy (Clauset et al., 2004), Walktrap (Pons and Latapy, 2005), multilayer Infomap (De Domenico et al., 2014), and multilayer GenLouvain (Mucha et al., 2010; Jutla et al., 2011). The multilayer Infomap and GenLouvain methods are generalized multilayer methods that can be directly applied to each network considered here. Each of the other four methods has a publicly available implementation in the igraph package in R and in Python, and each is a standard single-layer detection method that can handle weighted edges. We apply the first four methods both to the aggregate (weighted) network computed from the average of the layers in the analyzed multilayer network, and to each layer separately. A more detailed description of the competing methods and their parameter settings is provided in the Appendix. For this analysis and the subsequent analysis in Section 7, we set Multilayer Extraction to identify vertex-layer communities that have a large significance score as specified by equation (2).

For each method we calculate a number of quantitative features, including the number and size of the identified communities, as well as the number of identified background vertices. For all competing methods, we define a background vertex as a vertex that was placed in a trivial community, namely, a community of size one. We also evaluate the similarity of communities identified by each method.
As aggregate and layer-by-layer methods do not provide informative layer information, we compare the vertex sets identified by each of the competing methods with those identified by Multilayer Extraction. To this end, consider two collections of vertex sets $\mathcal{B}, \mathcal{C} \subseteq 2^{[n]}$. Let $\mathrm{size}(\mathcal{B})$ denote the number of vertex sets contained in the collection $\mathcal{B}$ and let $|A|$ represent the number of vertices in the vertex set $A$. Define the coverage of $\mathcal{B}$ by $\mathcal{C}$ as
\[
\mathrm{Co}(\mathcal{B}; \mathcal{C}) = \frac{1}{\mathrm{size}(\mathcal{B})} \sum_{B \in \mathcal{B}} \max_{C \in \mathcal{C}} \left( \frac{|B \cap C|}{|B \cup C|} \right). \tag{22}
\]
The value $\mathrm{Co}(\mathcal{B}; \mathcal{C})$ quantifies the extent to which vertex sets in $\mathcal{B}$ are contained in $\mathcal{C}$. In general, $\mathrm{Co}(\mathcal{B}; \mathcal{C}) \ne \mathrm{Co}(\mathcal{C}; \mathcal{B})$. The coverage value $\mathrm{Co}(\mathcal{B}; \mathcal{C})$ is between 0 and 1, with $\mathrm{Co}(\mathcal{B}; \mathcal{C}) = 1$ if and only if $\mathcal{B}$ is a subset of $\mathcal{C}$.

We investigate three multilayer networks of various size, sparsity, and relational types: a social network from an Australian computer science department (Han et al., 2014); an air transportation network of European airlines (Cardillo et al., 2013); and a collaboration network of network science authors on arXiv.org (De Domenico et al., 2014). The size and edge density of each network is summarized in Table 1.

Network            # Layers   # Vertices   Total # Edges
AU-CS                     5           61             620
EU Air Transport         36          450            3588
arXiv                    13        14489           59026

Table 1: Summary of the real multilayer networks in our study.

              AU-CS Network                      EU Air Transport Network             arXiv Network
Method        Comm.  Mean (sd)     Back.  Cov.   Comm.  Mean (sd)       Back.  Cov.   Comm.  Mean (sd)     Back.  Cov.
M-E               6  8.7 (2.3)        11  1.00      11  13.1 (5.0)        358  1.00     272  8.2 (3.5)     12412  1.00
Fast A.           5  12.2 (2.6)        0  0.68       8  52.1 (54.6)        33  0.10    1543  9.1 (55.5)      424  0.27
Spectral A.       7  8.7 (5.4)         0  0.54       8  52.0 (39.6)        34  0.10    1435  9.8 (222.6)     424  0.15
Walktrap A.       6  10.2 (3.3)        0  0.75      15  22.9 (46.5)       107  0.15    2238  6.3 (36.7)      424  0.50
Label A.          6  10.2 (6.5)        0  0.59       4  104.3 (190.5)      33  0.09    2329  6.0 (14.4)      424  0.59
Fast L.           6  7.7 (4.2)      16.2  0.78       4  13.1 (12.4)       395  0.34     301  6.9 (27.8)    12428  0.68
Spectral L.       6  8.0 (5.0)      16.4  0.72       3  16.1 (13.9)       398  0.30     283  7.2 (38.1)    12457  0.66
Walktrap L.       7  6.6 (5.1)      16.2  0.74       5  10.2 (12.8)       404  0.34     353  5.8 (15.4)    12428  0.76
Label L.          5  9.7 (9.6)      16.2  0.76       2  27.8 (26.8)       395  0.29     383  5.4 (6.7)     12428  0.79
GenLouvain        6  50.8 (28.4)       0  0.50       9  49.6 (26.1)       422  0.64     402  9.1 (4.5)     12101  0.83
Infomap          20  7.0 (4.9)         0  0.74      41  2.3 (2.4)          33  0.21    3655  7.1 (109.1)     423  0.29

Table 2: Quantitative summary of the identified communities in each of the three real applications. Shown are the number of communities (Comm.), the mean and standard deviation of the number of nodes in each community (Mean (sd)), the number of background nodes per layer (Back.), and the coverage of M-E with the method (Cov.). Methods run on the aggregate network are followed by A. and layer-by-layer methods are followed by L.

Table 2 provides a summary of the quantitative features of the communities identified by each method. For ease of discussion, we will abbreviate Multilayer Extraction by M-E in the next two sections of the manuscript, where we evaluate the performance of the method on real and simulated data.

6.1 AU-CS Network

Individuals interact across multiple modes: for example, two people may share a friendship, business partnership, college roommate, or a sexual relationship, to name a few. Thus multilayer network models are particularly useful for understanding social dynamics. In such
We demonstrate the use of M-E on a social network by investigating the AU-CS network, which describes online and offline relationships of 61 employees of a Computer Science research department in Australia. The vertices of the network represent the employees in the department. The layers of the network represent five different relationships among the employees: Facebook, leisure, work, co-authorship, and lunch.

[Figure 2 panels. a) Layers shown: Co-author, Leisure, Work, Lunch, Facebook. c) Axis labels: math-ph, q-bio, q-bio.BM, nlin.AO, cs.CV, cond-mat.dis-nn, physics-data-an, cs.SI, math.OC, physics.bio-ph, cond-mat.stat-mech, physics.soc-ph, q-bio.MN.]

Figure 2: a) The AU-CS multilayer network. The vertices have been reordered based on the six communities identified by M-E. b) The layers of the eleven extracted significant communities identified in the EU transport network. The layers are ordered according to the type of airline. The darkness of the shaded blocks represents the score of the identified community. c) Adjacency matrix of layers in the arXiv network, where edges are placed between layers that were contained in one or more of the communities identified by M-E. Dotted lines separate three communities of submission types that were identified using spectral clustering.

Results

M-E identified 6 non-overlapping vertex-layer communities, which are illustrated in Figure 2 a. These communities reveal several interesting patterns among the individuals in the network. Both the work and lunch layers were contained in all six of the identified communities, reflecting a natural co-occurrence of work and lunch interactions among the employees. Of the competing methods, GenLouvain was the only other method to identify this co-occurrence.
Furthermore, two of the communities identified by M-E contained the leisure and Facebook layers, both of which are social activities that sometimes extend beyond work and lunch. Thus these communities likely represent groups of employees with a stronger friendship than those that were simply colleagues. These interpretable features identified by M-E provide an example of how our method can be used to provide useful insights beyond aggregate and layer-by-layer methods.

With the exception of Infomap, all of the methods identify a similar number of communities (ranging from 5 to 7). Infomap, on the other hand, identified 20 small communities in the multilayer network. The 11 background vertices identified by M-E were sparsely connected, having two or fewer connections in 3 of the layers. As seen by the coverage measure in Table 2, the vertex sets identified by M-E were similar to those identified by the single-layer methods as well as by Infomap. Furthermore, the communities identified by the aggregate approaches are in fact well contained in the family identified by M-E (average coverage = 0.78). In summary, the vertex sets identified by M-E reflect both the aggregate and separate layer community structure of the network, and the layer sets reveal important features about the social relationships among the employees.

6.2 European Air Transportation Network

Multilayer networks have been widely used to analyze transportation networks (Strano et al., 2015; Cardillo et al., 2013). Here, vertices generally represent spatial locations (e.g., an intersection of two streets, a railroad crossing, GPS coordinates, or an airport) and layers represent transit among different modes of transportation (e.g., a car, a subway, an airline, or bus). The typical aim of multilayer analysis of transportation networks is to better understand the efficiency of transit in the analyzed location.
Vertex-layer communities in transportation networks contain collections of vehicles (layers) that frequently travel along the same transit route (vertices). Vertex-layer communities reveal similarity, or even redundancy, in transportation among various modes of transportation and enable optimization of travel efficiency in the location.

In the present example, we use M-E to analyze the European air transportation network, where vertices represent 450 airports in Europe and layers represent 37 different airlines. An edge in layer $j$ is present between two airports if airline $j$ flew a direct flight between the two airports on June 1st, 2011. Notably here, each airline belongs to one of five classes: major (18); low-cost (10); regional (6); cargo (2); and other (1). A multiplex visualization of this network is shown in Figure 3.

Figure 3: A one-dimensional visualization of the European Air Transportation Network. Edges are placed between airlines that share at least two routes between airports.

Results

The layer sets of the extracted M-E communities are illustrated in Figure 2 b, and a summary of M-E and the competing methods is available in the second major column of Table 2. M-E identified 11 small communities (mean number of vertices = 13.1, mean number of layers = 3.73). This suggests that the airlines generally follow distinct routes containing a small number of unique airports. Furthermore, Figure 2 illustrates that the layers of each community are closely associated with airline classes. Indeed, an average of 78% of the layers in each community belonged to the same airline class. This reflects the fact that airlines of the same class tend to have direct flights between similar airports.
Together these two findings suggest that in general there is little redundancy among the travel of airlines in Europe but that airlines of a similar class tend to travel the same routes.

Interestingly, the regional airline Norwegian Air Shuttle (NAX) and the major airline Scandinavian Airlines (SAS) appeared together in 4 unique communities. These airlines are in fact the top two air carriers in Scandinavia and fly primarily to airports in Norway, Sweden, Denmark, and Finland. Thus, M-E reveals that these two airlines share many of the same transportation routes despite the fact that they are different airline classes.

In comparison to the competing methods, we find that both the single-layer methods and M-E identified on the order of 400 background vertices ($\approx$ 89%), which suggests that many of the airports are not frequented by multiple airlines. This finding aligns with the fact that many airports in Europe are small and do not service multiple airline carriers. Aggregate detection approaches, as well as multilayer GenLouvain, identified a similar number of communities as M-E. In fact, the results of M-E most closely matched those of GenLouvain (coverage = 0.64). We found that Infomap again identified the most communities (almost 4 times as many as any other method) and, like the aggregate approaches, identified few background vertices.

6.3 arXiv Network

Our final demonstration of M-E is on the multilayer collaboration arXiv network from De Domenico et al. (2014). In a multilayer representation of a collaboration network, vertices represent researchers or other possible collaborators, and layers represent scientific fields or sub-fields under which researchers collaborated. For these applications, multilayer networks provide information about the dissemination and segmentation of scientific research, including the possible overlap of collaborative work under different scientific fields.
Vertex-layer communities in collaborative networks represent groups of individuals who collaborated with one another across a subset of scientific fields. Such communities describe how differing scientific fields overlap, which collaborators are working in a similar area, as well as how to disseminate research across fields and how scientists can most easily take part in interdisciplinary collaborations.

The arXiv network that we analyze represents the authors of all arXiv submissions that contained the word "networks" in their title or abstract between the years 2010 and 2012. The network has 14489 vertices representing authors, and 13 layers representing the arXiv category under which the submission was placed. An edge is placed between two authors in layer $\ell$ if they co-authored a paper placed in that category. The network is sparse, with each layer having edge density less than 1.5%.

Results

M-E identified 272 multilayer communities, with an average of 2.39 layers per community. The communities were small in size, suggesting that network science collaboration groups are relatively tightly-knit, both in number of authors and number of differing fields. In Figure 2 c, we plot an adjacency matrix for layers whose $(i, j)$ entry is 1 if and only if layers $i$ and $j$ were contained in at least one multilayer community. Using the adjacency matrix, the layers of the network were partitioned into communities using Spectral clustering. This figure identifies the existence of three active interdisciplinary working groups among the selected researchers. These results suggest two important insights. First, a network researcher can identify his or her primary sub-field community and best disseminate research in this area by communicating with researchers from other fields in the same community.
Second, Figure 2c illustrates clear separation of three major areas of study. By promoting cross-disciplinary efforts between each of these three major areas, the dissemination of knowledge among network scientists will likely be much more efficient.

Infomap and the aggregate approaches identify on the order of 1000 small to moderately sized communities (mean number of vertices between 6.04 and 9.80) with approximately 423 background vertices. Notably, the 423 background vertices identified by each of these methods had a 0.92 match. On the other hand, M-E, GenLouvain and the single-layer approaches identify a smaller number of communities (between 272 and 402), and classify about 12 thousand (roughly 86%) of the vertices as background (which share a match of 0.9). These findings suggest that the individual layers of the arXiv network have heterogeneous community structure, and that they contain many non-preferentially attached vertices. Once again, GenLouvain had the highest match with M-E (coverage = 0.83).

7. Simulation Study

As noted above, Multilayer Extraction has three key features: it allows community overlap; it identifies background; and it can identify communities that are present in a small subset of the available layers. Below we describe a simulation study that aims to evaluate the performance of M-E with respect to these features. The results of additional simulations are described in the Appendix.

In this simulation study, we are particularly interested in comparing the performance of M-E with off-the-shelf aggregate and layer-by-layer methods. We make this comparison by applying the layer-by-layer and aggregate approaches with the community detection methods described in Section 6.

Define the match between two vertex families $\mathcal{B}$ and $\mathcal{C}$ by
\[
M(\mathcal{B}; \mathcal{C}) = \tfrac{1}{2}\,\mathrm{Co}(\mathcal{B}; \mathcal{C}) + \tfrac{1}{2}\,\mathrm{Co}(\mathcal{C}; \mathcal{B}), \tag{23}
\]
where $\mathrm{Co}(\mathcal{B}; \mathcal{C})$ is the coverage measure for vertex families from (22). The match $M(\mathcal{B}; \mathcal{C})$ is symmetric and takes values in $[0, 1]$. In particular, $M(\mathcal{B}; \mathcal{C}) = 1$ if and only if $\mathcal{B} = \mathcal{C}$, and $M(\mathcal{B}; \mathcal{C}) = 0$ if and only if $\mathcal{B}$ and $\mathcal{C}$ are disjoint. In our simulations, we compute the match between the family of vertex sets identified by each method and the family of true simulated communities. For the layer-by-layer methods, we evaluate the average match of the communities identified in each layer. Suppose that $\mathcal{T}$ is the true family of vertex sets in a simulation and $\mathcal{D}$ is the family of vertex sets identified by a detection procedure of interest. Note that the value $\mathrm{Co}(\mathcal{D}; \mathcal{T})$ quantifies the specificity of $\mathcal{D}$, while $\mathrm{Co}(\mathcal{T}; \mathcal{D})$ quantifies its sensitivity; thus, $M(\mathcal{D}; \mathcal{T})$ is a quantity between 0 and 1 that summarizes both the sensitivity and specificity of the identified vertex sets $\mathcal{D}$. The results of the simulation study are summarized in Figures 4, 5, and 6 and discussed in more detail below.

7.1 Multilayer Stochastic Block Model

In the first part of the simulation study we generated multilayer stochastic block models with $m \in \{1, 5, 10, 15\}$ layers, $k \in \{2, 5\}$ blocks, and $n = 1000$ vertices such that each layer has the same community structure. In more detail, each vertex is first assigned a community label in $\{1, \ldots, k\}$ according to a probability mass function $\pi = (0.4, 0.6)$ for $k = 2$ and $\pi = (0.2, 0.1, 0.2, 0.1, 0.4)$ for $k = 5$. In each layer, edges are assigned independently, based on vertex community membership, according to the probability matrix $P$ with entries $P(i, i) = r + 0.05$ and $P(i, j) = 0.05$ for $i \neq j$. Here $r$ is a parameter representing connectivity strength of vertices within the same community.
The resulting multilayer network consists of $m$ independent realizations of a stochastic $k$-block model with the same communities. For each value of $m$ and $k$ we vary $r$ from 0.00 to 0.10 in increments of 0.005. M-E and all other competing methods are run on ten replications of each simulation. The average match of each method to the true communities is given in Figure 4.

Results

In the single-layer ($m = 1$) setting M-E is competitive with the existing single-layer and multilayer methods for $r > 0.05$, and identifies the true communities without error for $r > 0.06$. For $m > 5$ M-E outperforms all competing single-layer methods for $r > 0.02$. As the number of layers increases, M-E and the multilayer Infomap and GenLouvain methods exhibit improved performance across all values of $r$. For the two-community block model, M-E and the other multilayer methods have comparable performance in networks with five or more layers. M-E outperforms the single-layer and multilayer methods in the $k = 5$ block model when $m > 1$. As expected, aggregate approaches perform well in this simulation, outperforming or matching other methods when $m \leq 5$ (results not shown). These results suggest that in homogeneous multilayer networks M-E can outperform or match existing methods when the network contains a moderate to large number of layers.

Figure 4: (Color) Simulation results for the multilayer stochastic block model. In each plot, we report the match of the identified communities with the true communities, where the match is calculated using the match score in (23).

Figure 5: (Color) Simulation results for the persistence simulations. In each plot, we report the match of the identified communities with the true communities, where the match is calculated using the match score in (23).

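For concreteness, the generative procedure of Section 7.1 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions and not the code used to produce Figure 4: the function name `simulate_multilayer_sbm`, its arguments, the 0-based community labels, and the representation of each layer as a set of vertex pairs are all choices made here for illustration.

```python
import random
from itertools import combinations


def simulate_multilayer_sbm(n, m, pi, r, p_between=0.05, seed=None):
    """Sketch of the multilayer SBM of Section 7.1 (hypothetical helper).

    n: number of vertices; m: number of layers;
    pi: community probabilities (labels are 0, ..., k-1 here,
        whereas the text uses 1, ..., k);
    r: within-community connectivity strength.
    Returns (labels, layers), where each layer is a set of pairs (u, v), u < v.
    """
    rng = random.Random(seed)
    k = len(pi)
    # Step 1: one shared community label per vertex, drawn from pi.
    labels = rng.choices(range(k), weights=pi, k=n)
    layers = []
    for _ in range(m):
        edges = set()
        # Step 2: independent edges with P(i, i) = r + p_between
        # and P(i, j) = p_between for i != j.
        for u, v in combinations(range(n), 2):
            p = r + p_between if labels[u] == labels[v] else p_between
            if rng.random() < p:
                edges.add((u, v))
        layers.append(edges)
    return labels, layers


if __name__ == "__main__":
    labels, layers = simulate_multilayer_sbm(
        n=200, m=5, pi=[0.4, 0.6], r=0.05, seed=1
    )
    print(len(layers), [len(edges) for edges in layers])
```

The loop over all vertex pairs is slow in pure Python for $n = 1000$, but it keeps the sketch free of external dependencies. Because the labels are drawn once and reused, every layer shares the same community structure, as the simulation design requires.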
7.2 Persistence

In the second part of the simulation study we consider multilayer networks with heterogeneous community structure. We simulated networks with 50 layers and 1000 vertices. The first $\tau \cdot 50$ layers follow the stochastic block model outlined in Section 7.1 with a fixed connection probability matrix $P$ having entries $P(i, i) = 0.15$ and $P(i, j) = 0.05$ for $i \neq j$. The remaining $(1 - \tau) \cdot 50$ layers are independent Erdős-Rényi random graphs with $p = 0.10$, so that in each layer every pair of vertices is connected independently with probability 0.10. For each $k \in \{2, 5\}$ we vary the persistence parameter $\tau$ from 0.02 to 1 in increments of 0.02, and for each value of $\tau$, we run M-E as well as the competing methods on ten replications. The average match of each method is reported in Figure 5.

Results

In both block model settings with $k = 2$ and $k = 5$ communities, M-E outperforms competing aggregate and multilayer methods for small values of $\tau$. At these values, aggregate methods perform poorly since the community structure in the layers with signal is hidden by the noisy Erdős-Rényi layers once the layers are aggregated. Though not shown in Figure 5, the layer-by-layer methods are able to correctly identify the community structure of the layers with signal. However, these methods identify, on average, 4 or more non-trivial communities in each noisy layer where there is in fact no community structure present. Multilayer GenLouvain and Infomap outperform aggregate methods in this simulation, suggesting that available multilayer methods are able to better handle multilayer networks with heterogeneous layers. Whereas the noisy Erdős-Rényi layers posed a challenge for both single-layer and aggregate methods, M-E never included any of these layers in an identified community.
These results highlight M-E's ability to handle networks with noisy and heterogeneous layers.

7.3 Single Embedded Communities

We next evaluate the ability of M-E to detect a single embedded community in a multilayer network. We construct multilayer networks with $m \in \{1, 5, 10, 15\}$ layers and 1000 vertices according to the following procedure. Each layer of the network is generated by embedding a common community of size $\gamma \cdot 1000$ in an Erdős-Rényi random graph with connection probability 0.05 in such a way that vertices within the community are connected independently with probability 0.15. The parameter $\gamma$ is varied between 0.01 and 0.20 in increments of 0.005; ten independent replications of the embedded network are generated for each $\gamma$. For each method, we calculate the coverage $C(E, \mathcal{C})$ of the true embedded community $E$ by the identified collection $\mathcal{C}$. We report the average coverage over the ten replications in Figure 6.

Results

In the single-layer setting, M-E is able to correctly identify the embedded community when the embedded vertex set takes up approximately 11 percent ($\gamma = 0.11$) of the layer. As before, the performance of M-E greatly improves as the number of layers in the observed multilayer network increases. For example, at $m = 5$ and $m = 10$, the algorithm correctly identifies the embedded community (with at least 90% match) once the community takes up as little as roughly 5.5 percent ($\gamma = 0.055$) of each layer. At $m = 15$, M-E correctly extracts communities with size as small as three percent of the graph in each layer.

Figure 6: (Color) Simulation results for the single embedded community simulations. In each plot, we report the match of the identified communities with the true communities, where the match is calculated using the match score in (23).

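To make the evaluation criterion used in these figures concrete, the following Python sketch computes the match score in (23) from a coverage function. Because equation (22) is not reproduced in this section, the sketch assumes that the coverage $\mathrm{Co}(\mathcal{B}; \mathcal{C})$ is the average, over sets $B \in \mathcal{B}$, of the best Jaccard overlap between $B$ and any set in $\mathcal{C}$. This assumed form is consistent with the properties of $M$ stated after (23), but it should be checked against (22). The function names `jaccard`, `coverage`, and `match` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap |a & b| / |a | b| of two vertex sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def coverage(family_b, family_c):
    """Assumed form of Co(B; C): mean best Jaccard overlap of each set in
    family_b with the sets in family_c (an assumption, see the lead-in)."""
    if not family_b or not family_c:
        return 0.0
    best = [max(jaccard(b, c) for c in family_c) for b in family_b]
    return sum(best) / len(family_b)


def match(family_b, family_c):
    """Match score of equation (23): the symmetrized coverage."""
    return 0.5 * coverage(family_b, family_c) + 0.5 * coverage(family_c, family_b)


if __name__ == "__main__":
    truth = [{1, 2, 3}, {4, 5}]
    detected = [{1, 2}, {4, 5}, {6}]
    print("specificity-type term Co(D; T):", coverage(detected, truth))
    print("sensitivity-type term Co(T; D):", coverage(truth, detected))
    print("match M(D; T):", match(detected, truth))
```

Under this assumed coverage, identical families give a match of 1 and families whose sets never intersect give a match of 0, in line with the properties stated for $M$.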
In the lower right plot of Figure 6, we illustrate the results for the aggregate methods applied to the simulated network with $m = 15$ layers. In the lower left plot of Figure 6, we show the results for the layer-by-layer methods in this simulation, and the upper part of the figure shows the results for the multilayer methods. For $m > 5$ M-E outperforms all of the competing aggregate methods. In addition, M-E outperforms every layer-by-layer method for all $m$. Multilayer GenLouvain and Infomap have comparable performance to M-E across all $m$ and $\gamma$. These results emphasize the extraction capabilities of M-E and show that the procedure, contrary to aggregate and single-layer competing methods, is able to detect small embedded communities in the presence of background vertices.

8. Discussion

Multilayer networks have been profitably applied to a number of complex systems, and community detection is a valuable exploratory technique to analyze and understand networks. In many applications, the community structure of a multilayer network will differ from layer to layer due to heterogeneity. In such networks, actors interact in tightly connected groups that persist across only a subset of layers in the network. In this paper we have introduced and evaluated the first community detection method to address multilayer networks with heterogeneous communities, Multilayer Extraction. The core of Multilayer Extraction is a significance score that quantifies the connection strength of a vertex-layer set by comparing connectivity in the observed network to that of a fixed degree random network.

Empirically, we showed that Multilayer Extraction is able to successfully identify communities in multilayer networks with overlapping, disjoint, and heterogeneous community structure.
Our numerical applications revealed that Multilayer Extraction can identify relevant insights about complex relational systems beyond the capabilities of existing detection methods. We also established asymptotic consistency of the global maximizer of the Multilayer Extraction score under the multilayer stochastic block model. We note that in practice the Multilayer Extraction procedure can identify overlapping community structure in multilayer networks; however, our consistency results apply to a multilayer model with non-overlapping communities. Future work should investigate consistency results like the ones derived here on a multilayer model with overlapping communities, such as a multilayer generalization of the mixed membership block model introduced in Airoldi et al. (2008). We expect that similar theory will hold in an overlapping model and that the theoretical techniques utilized here can be used to prove such results.

The Multilayer Extraction method provides a first step in understanding and analyzing multilayer networks with heterogeneous community structure. This work encourages several interesting areas of future research. For instance, the techniques used in this paper could be applied, with suitable models, to networks having ordered layers (e.g. temporal networks), as well as to networks with weighted edges such as the recent work done in Palowitch et al. (2016) and Wilson et al. (2017). Furthermore, one could incorporate both node- and layer-based covariates in the null model to handle exogenous features of the multilayer network. Finally, it would be interesting to evaluate the consistency of Multilayer Extraction in multilayer networks in the high dimensional setting where the number of vertex-layer communities grows with the number of vertices.

Acknowledgements

The authors gratefully acknowledge Peter Mucha for helpful discussions and suggestions for this work. We also thank the associate editor and the two anonymous referees whose comments greatly improved the content and exposition of the paper. The work of JDW was supported in part by NSF grants DMS-1105581, DMS-1310002, and SES grant 1357622. The work of JP was supported in part by NIH/NIMH grant R01-MH101819-01. The work of SB was supported in part by NSF grants DMS-1105581, DMS-1310002, DMS-160683, DMS-161307, SES grant 1357622, and ARO grant W911NF-17-1-0010. The work of ABN was supported in part by NSF DMS-1310002, NSF DMS-1613072, as well as NIH HG009125-01 and NIH MH101819-01. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Appendix A. Proofs of Lemmas from Section 3.2

A.1 Proof of Lemma 6

It is easy to show that for any $2 \times 2$ symmetric matrix $A$ and 2-vectors $x$, $y$,
\[
(x^T A x)(y^T A y) - (x^T A y)^2 = (x_1 y_2 - x_2 y_1)^2 \det(A).
\]
Fix $B \subseteq [n]$ and let $s$, $\rho$, and $v$ correspond to $B$, as in Definition 4. Then for any $\ell \in [L]$, using the fact that $\kappa_\ell := \pi^T P_\ell \pi$ and the identity above (with $x = \pi$ and $y = v$), we have
\[
v^T P_\ell v - \frac{(v^T P_\ell \pi)^2}{\kappa_\ell}
= \frac{\kappa_\ell\, v^T P_\ell v}{\kappa_\ell} - \frac{(v^T P_\ell \pi)^2}{\kappa_\ell}
= \frac{(\pi^T P_\ell \pi)(v^T P_\ell v) - (v^T P_\ell \pi)^2}{\kappa_\ell}
= \frac{\big(\pi_1(1 - \rho) - \pi_2 \rho\big)^2 \det P_\ell}{\kappa_\ell}
= \frac{(\pi_1 - \rho)^2 \det P_\ell}{\kappa_\ell}.
\]
Recall that $q_\ell(B) := \frac{s}{\sqrt{2}} \big( v^T P_\ell v - (v^T P_\ell \pi)^2 / \kappa_\ell \big)$ and $H_*(B, L) = |L|^{-1} \big( \sum_{\ell} q_\ell(B) \big)^2$. Part 1 follows by summation over $L$. For part 2, note that $\pi_1^2 P_\ell(1,1) + \pi_2^2 P_\ell(2,2) \geq 2 \pi_1 \pi_2 \sqrt{P_\ell(1,1) P_\ell(2,2)}$.
Therefore,
\[
\begin{aligned}
\kappa_\ell &= \pi_1^2 P_\ell(1,1) + 2 \pi_1 \pi_2 P_\ell(1,2) + \pi_2^2 P_\ell(2,2) \\
&\geq 2 \pi_1 \pi_2 \Big( \sqrt{P_\ell(1,1) P_\ell(2,2)} + P_\ell(1,2) \Big) \\
&\geq 2 \pi_1 \pi_2 \Big( \sqrt{P_\ell(1,1) P_\ell(2,2)} + P_\ell(1,2) \Big) \Big( \sqrt{P_\ell(1,1) P_\ell(2,2)} - P_\ell(1,2) \Big) \\
&= 2 \pi_1 \pi_2 \delta \geq \pi_1 \delta.
\end{aligned}
\]
Thus $\delta \leq \frac{\det P_\ell}{\kappa_\ell} \leq \frac{1}{\pi_1 \delta}$. Part 2 follows. $\square$

A.2 Proof of Lemma 7

Define $g : 2^{[n]} \mapsto \mathbb{R}$ by $g(B) := \frac{s(B)^2}{2} (\rho(B) - \pi_1)^4$. Recall the function $\phi(L)$ defined in Lemma 6. Note that part 1 of Lemma 6 implies $H_*(B, L) = |L| \phi(L) g(B)$. It is therefore sufficient to show that there exists a constant $a > 0$ such that for sufficiently small $t$, $B \in R(t)^c$ implies $g(B) < g(C_{1,n}) - at$. We will show this separately for the $\pi_1 < \pi_2$ and $\pi_1 = \pi_2$ cases.

Part 1 ($\pi_1 < \pi_2$): Define the intervals $I_1 := [0, \pi_1]$, $I_2 := (\pi_1, \pi_2]$, and $I_3 := (\pi_2, 1]$. We trisect $2^{[n]}$, the domain of $g$, with the collections $D_{i,n} := \{ B \subseteq [n] : s(B) \in I_i \}$, for $i = 1, 2, 3$. We will prove that the inequality $g(B) < g(C_{1,n}) - at$ holds for all $B \in R(t)^c$ on each of those collections. We will continually rely on the fact that $B \in R(t)^c$ implies at least one of the inequalities (I) $|s(B) - \pi_1| > t$ or (II) $1 - \rho(B) > t$ is true.

Suppose $B \in R(t)^c \cap D_{1,n}$ and inequality (I) is true. Then $s(B) < \pi_1 - t$, and
\[
\begin{aligned}
g(B) := \frac{s(B)^2}{2} (\rho(B) - \pi_1)^4
&\leq \frac{s(B)^2}{2} (1 - \pi_1)^4 \quad (\text{since } \pi_1 \leq 1/2) \\
&< \frac{(\pi_1 - t)^2}{2} (1 - \pi_1)^4 \\
&= \frac{\pi_1^2}{2} (1 - \pi_1)^4 - 2t (1 - \pi_1)^4 + o(t) \\
&< \frac{\pi_1^2}{2} (1 - \pi_1)^4 - t (1 - \pi_1)^4 = g(C_{1,n}) - t (1 - \pi_1)^4
\end{aligned}
\tag{24}
\]
for sufficiently small $t$. If inequality (II) is true, then
\[
(\rho(B) - \pi_1)^4 \leq \max\{ (1 - t - \pi_1)^4, \pi_1^4 \} = \max\{ (\pi_2 - t)^4, \pi_1^4 \} = (\pi_2 - t)^4
\]
for sufficiently small $t$, as $\pi_1 < \pi_2$. Therefore, since $s(B) \leq \pi_1$ on $D_{1,n}$,
\[
g(B) \leq \frac{\pi_1^2}{2} (\pi_2 - t)^4 = \frac{\pi_1^2}{2} \pi_2^4 - 4 \pi_2^3 t + o(t) < g(C_{1,n}) - 2 \pi_2^3 t
\tag{25}
\]
for sufficiently small $t$.
Thus for all $B \in R(t)^c \cap D_{1,n}$, $g(B) < g(C_{1,n}) - a_1 t$ with $a_1 = \min\{ (1 - \pi_1)^4, 2 \pi_2^3 \}$.

Suppose $B \in R(t)^c \cap D_{2,n}$ and inequality (I) is true. Then $s(B) > \pi_1 + t$. Note that $0 \leq \rho(B) |B| \leq |C_{1,n}|$, yielding the useful inequality
\[
0 \leq \rho(B) \leq \pi_1 / s(B). \tag{26}
\]
Subtracting through by $\pi_1$ gives
\[
(\rho(B) - \pi_1)^4 \leq \max\{ \pi_1^4, \pi_1^4 (1/s(B) - 1)^4 \} = \pi_1^4 (1/s(B) - 1)^4.
\]
Therefore,
\[
g(B) \leq \frac{s(B)^2}{2} \pi_1^4 (1/s(B) - 1)^4
= \frac{\pi_1^4}{2} \Big( 1/\sqrt{s(B)} - \sqrt{s(B)} \Big)^4
< \frac{\pi_1^4}{2} \Big( 1/\sqrt{\pi_1 + t} - \sqrt{\pi_1 + t} \Big)^4,
\tag{27}
\]
since $F(x) := (1/\sqrt{x} - \sqrt{x})^4$ is decreasing on $(0, 1]$, and $s(B) > \pi_1 + t$. Note that
\[
\frac{d}{dt} \left( \frac{1}{\sqrt{\pi_1 + t}} - \sqrt{\pi_1 + t} \right)^4
= -4 \left( \frac{1}{\sqrt{\pi_1 + t}} - \sqrt{\pi_1 + t} \right)^3 \left( \frac{1}{2 (\pi_1 + t)^{3/2}} + \frac{1}{2 \sqrt{\pi_1 + t}} \right).
\tag{28}
\]
By Taylor's theorem, this implies that
\[
\big( 1/\sqrt{\pi_1 + t} - \sqrt{\pi_1 + t} \big)^4
= \big( 1/\sqrt{\pi_1} - \sqrt{\pi_1} \big)^4 - a_2 t + o(t)
< \big( 1/\sqrt{\pi_1} - \sqrt{\pi_1} \big)^4 - a_2 t / 2
\]
for sufficiently small $t$, where $a_2 > 0$ is the absolute value of the right-hand side of (28) at $t = 0$. Note further that $(1/\sqrt{\pi_1} - \sqrt{\pi_1})^4 = (\pi_2 / \sqrt{\pi_1})^4 = \pi_2^4 / \pi_1^2$. Putting these facts together with inequality (27), we obtain
\[
g(B) < \frac{\pi_1^4}{2} \cdot \frac{\pi_2^4}{\pi_1^2} - a_2 t / 2 = \frac{\pi_1^2}{2} \pi_2^4 - a_2 t / 2 = g(C_{1,n}) - a_2 t / 2.
\tag{29}
\]
If inequality (II) is true, $\rho(B) < 1 - t$. If $\rho(B) \leq \pi_1$, $(\rho(B) - \pi_1)^4$ is maximized when $\rho(B) = 0$, so that, since $s(B) \leq \pi_2$,
\[
g(B) \leq \frac{\pi_2^2}{2} \pi_1^4
= g(C_{1,n}) + \frac{\pi_2^2}{2} \pi_1^4 - \frac{\pi_1^2}{2} \pi_2^4
= g(C_{1,n}) + \frac{\pi_1^2 \pi_2^2}{2} (\pi_1^2 - \pi_2^2)
< g(C_{1,n}) - t
\tag{30}
\]
for sufficiently small $t$, since $\pi_1$ is fixed. If $\rho(B) > \pi_1$, note that inequality (26) implies $s(B) \leq \pi_1 / \rho(B)$. Therefore,
\[
g(B) \leq \frac{\pi_1^2}{2 \rho(B)^2} (\rho(B) - \pi_1)^4
= \frac{\pi_1^2}{2} \Big( \sqrt{\rho(B)} - \pi_1 / \sqrt{\rho(B)} \Big)^4
< \frac{\pi_1^2}{2} \Big( \sqrt{1 - t} - \pi_1 / \sqrt{1 - t} \Big)^4
\tag{31}
\]
since $G(x) := (\sqrt{x} - \pi_1 / \sqrt{x})^4$ is increasing on $(\pi_1, 1]$.
A similar Taylor expansion argument to that yielding inequality (29) yields, for a constant $a_3$ depending only on $\pi_1$,
\[
g(B) < \frac{\pi_1^2}{2} (1 - \pi_1)^4 - a_3 t / 2 = g(C_{1,n}) - a_3 t / 2,
\tag{32}
\]
for sufficiently small $t$. Pulling together inequalities (29), (30), and (32), we have that for all $B \in R(t)^c \cap D_{2,n}$, $g(B) < g(C_{1,n}) - a_4 t$ with $a_4 := \min\{ a_2 / 2, 1, a_3 / 2 \}$.

Suppose $B \in R(t)^c \cap D_{3,n}$. Note that $|B| - |C_{2,n}| \leq |B \cap C_{1,n}| \leq |C_{1,n}|$. Dividing through by $|B|$ yields the useful inequality
\[
1 - \pi_2 / s(B) \leq \rho(B) \leq \pi_1 / s(B). \tag{33}
\]
Subtracting $\pi_1$ throughout inequality (33) gives
\[
\pi_2 (1 - 1/s(B)) \leq \rho(B) - \pi_1 \leq \pi_1 (1/s(B) - 1).
\]
Since $\pi_1 < \pi_2$, this implies that $(\rho(B) - \pi_1)^4 \leq \pi_2^4 (1 - 1/s(B))^4$. Therefore,
\[
g(B) \leq \frac{s(B)^2}{2} \pi_2^4 (1/s(B) - 1)^4
= \frac{\pi_2^4}{2} \Big( 1/\sqrt{s(B)} - \sqrt{s(B)} \Big)^4
< \frac{\pi_2^4}{2} \big( 1/\sqrt{\pi_2} - \sqrt{\pi_2} \big)^4,
\tag{34}
\]
since $F(x) := (1/\sqrt{x} - \sqrt{x})^4$ is decreasing on $I_3 := (\pi_2, 1]$ and $s(B) \in I_3$. Note that $\sqrt{\pi_2} - 1/\sqrt{\pi_2} = -\pi_1 / \sqrt{\pi_2}$. Therefore,
\[
g(B) < \frac{\pi_2^4}{2} \cdot \frac{\pi_1^4}{\pi_2^2} = \frac{\pi_2^2}{2} (0 - \pi_1)^4 = g(C_{2,n}) < g(C_{1,n}) - t
\]
for $t$ sufficiently small. Thus, for $a := \min\{ a_1, a_4, 1 \}$, for sufficiently small $t$ we have $g(B) < g(C_{1,n}) - at$ whenever $B \in R(t)^c$. This completes the proof in the case $\pi_1 < \pi_2$.

Part 2 ($\pi_1 = \pi_2$): Recall that when $\pi_1 = \pi_2$ we define $R(t)$ by
\[
R(t) := \big\{ B \subseteq [n] : \max\{ |s(B) - \pi_1|, \rho(B), 1 - \rho(B) \} \leq t \big\}.
\]
Hence we will use the fact that $B \in R(t)^c$ implies at least one of the inequalities (I) $|s(B) - \pi_1| > t$ or (II) $t < \rho(B) < 1 - t$ is true. Define the intervals $I_1 := [0, \pi_1]$, $I_2 := (\pi_1, 1]$. We bisect $2^{[n]}$, the domain of $g$, with the collections $D_{i,n} := \{ B \subseteq [n] : s(B) \in I_i \}$, for $i = 1, 2$.
We will prove that the inequality $g(B) < g(C_{1,n}) - at$ holds for all $B \in R(t)^c$ on each of those collections. Suppose $B \in R(t)^c \cap D_{1,n}$ and inequality (I) is true. Then the same derivation yielding inequality (24) gives $g(B) < g(C_{1,n}) - t (1 - \pi_1)^4$ for sufficiently small $t$. If inequality (II) is true, then
\[
(\rho(B) - \pi_1)^4 \leq \max\{ (1 - t - \pi_1)^4, (\pi_1 - t)^4 \} = \max\{ (\pi_2 - t)^4, (\pi_1 - t)^4 \} = (\pi_2 - t)^4,
\]
since $\pi_1 = \pi_2$. Therefore, inequality (25) remains intact. Both inequalities hold on $I_2$ as well, for the roles of $\pi_1$ and $\pi_2$ may be interchanged, and the derivations treated symmetrically. This completes the proof in the case $\pi_1 = \pi_2$. $\square$

A.3 Proof of Lemma 9

Recall the definitions of set modularity and population set modularity from Definitions 2 and 9. Define $W := \sum_{\ell \in [L]} \hat{Q}_\ell(B)$ and $w := \sum_{\ell \in [L]} q_\ell(B)$. Note that by Part 1 of Lemma 6, $q_\ell(B) > 0$ regardless of $B$, a fact which will allow the application of Lemma 16 in what follows. We have $\hat{H}(B, L) = |L|^{-1} W_+^2$, $H_*(B, L) = |L|^{-1} w^2$, and for any $B$ such that $|B| > n\varepsilon$,
\[
\begin{aligned}
\mathbb{P}_n \left( \big| \hat{H}(B, L) - H_*(B, L) \big| > \frac{4 |L| t}{n^2} + \frac{52 |L|}{\kappa n} \right)
&= \mathbb{P}_n \left( \big| W_+^2 - w^2 \big| > \frac{4 |L|^2 t}{n^2} + \frac{52 |L|^2}{\kappa n} \right) \\
&\leq \mathbb{P}_n \left( \max_{\ell \in [L]} \big| \hat{Q}_\ell(B) - q_\ell(B) \big| > \frac{t}{n^2} + \frac{13}{\kappa n} \right) \\
&\leq 4 |L| \exp\left( - \frac{\kappa^2 \varepsilon t^2}{16 n^2} \right)
\end{aligned}
\]
for large enough $t > 0$, where the first inequality follows from Lemma 16 for large enough $n$, and the second inequality follows from Lemma 19 and a union bound. Applying a union bound over sets $B \in \mathcal{B}_n$ yields the result. $\square$

A.4 Proof of Lemma 12

Assume first that $k > 1$. By definition, $B \in N_{n,k}(A)$ implies that at least one of $d_h(B, C_1) \leq A \cdot n \cdot b_{n,k-1}$ or $d_h(B, C_2) \leq A \cdot n \cdot b_{n,k-1}$ is true. Suppose the first inequality holds.
Since $d_h(B, C_1) = |B \setminus C_1| + |C_1 \setminus B|$, we have the inequality
\[
\begin{aligned}
\big| |B| - n\pi_1 \big| = \big| |B| - |C_1| \big|
&\leq \big| |B| - |B \cap C_1| - |C_1| + |B \cap C_1| \big| \\
&\leq \big| |B| - |B \cap C_1| \big| + \big| |C_1| - |B \cap C_1| \big| \\
&= |B \setminus C_1| + |C_1 \setminus B| \leq A \cdot n \cdot b_{n,k-1}.
\end{aligned}
\]
Alternatively, if $d_h(B, C_2) \leq A \cdot n \cdot b_{n,k-1}$, we have the same bound for $\big| |B| - n\pi_2 \big|$. Therefore, since $\pi_1 \leq \pi_2$, $B \in N_{n,k}(A)$ implies that $|B| > n\pi_1 - A \cdot n \cdot b_{n,k-1}$. Since $b_{n,k-1} = o(1)$ as $n \to \infty$ and $\varepsilon < \pi_1$, this implies that for large enough $n$, $N_{n,k}(A) \subseteq \mathcal{B}_n(\varepsilon)$. By Lemma 9, therefore, for large enough $n$, we have
\[
\mathbb{P}_n \left( \sup_{N_{n,k}(A)} \big| \hat{H}(B, L) - H_*(B, L) \big| > \frac{4 |L| t}{n^2} + \frac{52 |L|}{\kappa n} \right)
\leq 4 |L| \, |N_{n,k}(A)| \exp\left( - \frac{\kappa^2 \varepsilon t^2}{16 n^2} \right)
\tag{35}
\]
for all $t > 0$. We now bound the right-hand side of inequality (35) with $t$ replaced by $t_n := n^{1 + \frac{1}{2^k}} (\log n)^{1 - \frac{1}{2^k}}$. Note that
\[
\frac{t_n^2}{n^2} = \frac{1}{n^2} \, n^{2 + \frac{1}{2^{k-1}}} (\log n)^{2 - \frac{1}{2^{k-1}}}
= n \cdot n^{\frac{1}{2^{k-1}} - 1} (\log n)^{1 - \frac{1}{2^{k-1}}} \log n
= n \cdot b_{n,k-1} \log n.
\]
Furthermore, by Corollary 15 (see Appendix B) we have $|N_{n,k}(A)| \leq 2 \exp\big[ 3A \cdot n \cdot b_{n,k-1} \log(1 / b_{n,k-1}) \big]$. These facts yield the bound
\[
\begin{aligned}
|N_{n,k}(A)| \exp\left( - \frac{\kappa^2 \varepsilon t_n^2}{16 n^2} \right)
&\leq 2 \exp\left( - \frac{\kappa^2 \varepsilon}{16} \, n \cdot b_{n,k-1} \left( \log n - \frac{16}{\kappa^2 \varepsilon} \, 3A \log(1 / b_{n,k-1}) \right) \right) \\
&\leq 2 \exp\left( - \frac{\kappa^2 \varepsilon}{32} \, n \cdot b_{n,k-1} \log n \right) \quad (\text{for large } n, \text{ since } 1 / b_{n,k-1} = o(n)) \\
&< 2 \exp\left( - \frac{\kappa^2 \varepsilon}{32} \, n \gamma_n^{1 - \varepsilon} \log n \right),
\end{aligned}
\]
where the final inequality follows from the choice of $k$ satisfying $\frac{1}{2^{k-1}} < \varepsilon$. Therefore,
\[
4 |L| \, |N_{n,k}(A)| \exp\left( - \frac{\kappa^2 \varepsilon t_n^2}{16 n^2} \right)
\leq 2 \exp\left( - \frac{\kappa^2 \varepsilon}{32} \, n \gamma_n^{1 - \varepsilon} \log n + O(\log |L|) \right)
\tag{36}
\]
for large enough $n$. Notice now that $t_n / n^2 = b_{n,k}$ vanishes slower than $1/n$, and is therefore the leading order term in the expression $\frac{4 |L| t_n}{n^2} + \frac{52 |L|}{\kappa n}$ (see equation 35).
Hence for large enough $n$ we have $\frac{4 |L| t_n}{n^2} + \frac{52 |L|}{\kappa n} \leq 5 |L| b_{n,k}$. Combining this observation with lines (35) and (36) proves the result in the case $k > 1$. If $k = 1$, assume $A = \varepsilon$. By definition, then (see Definition 11), $N_{n,k}(A) = \mathcal{B}_n(\varepsilon)$. Returning to inequality (35), we note that $\log |\mathcal{B}_n(\varepsilon)| = O(n)$, and thus we can derive the bound (36) with the same choice of $t_n := n^{1 + \frac{1}{2^k}} (\log n)^{1 - \frac{1}{2^k}} = n \sqrt{n \log n}$. The rest of the argument goes through unaltered. $\square$

Appendix B. Technical Results

Lemma 14 Fix $\pi_1 \in [0, 1]$. For each $n$, let $C_1 \subseteq [n]$ be an index set of size $\lfloor n\pi_1 \rfloor$. Let $C_2 := [n] \setminus C_1$. Let $\gamma_n \in [0, 1]$ be a sequence such that $\gamma_n \to 0$ and $n\gamma_n \to \infty$. Then for large enough $n$, $|N(C_1, \gamma_n)| \leq \exp\{ 3 n \gamma_n \log(1/\gamma_n) \}$.

Proof Define the boundary of a neighborhood of $C \subseteq [n]$ by $\partial N(C, r) := \{ B \subseteq [n] : d_h(B, C) = \lfloor nr \rfloor \}$. Note that any $B \subseteq [n]$ may be written as the disjoint union $B = \{ C_2 \cap B \} \cup \{ C_1 \cap B \}$. Since $C_1 \cap B = C_1 \setminus \{ C_1 \setminus B \}$, for fixed $k \in [n]$ it follows that each set $B \in \partial N(C, k/n)$ is uniquely identified with choices of $|C_2 \cap B|$ indices from $C_2$ and $|C_1 \setminus B|$ indices from $C_1$ such that $|B \cap C_2| + |C_1 \setminus B| = |B \setminus C_1| + |C_1 \setminus B| = d_h(B, C_1) = k$. Therefore, we have the equality
\[
|\partial N(C_1, k)| = \sum_{m=0}^{k} \binom{|C_2|}{m} + \binom{|C_1|}{k - m}.
\tag{37}
\]
Note that for positive integers $K$, $N$ with $K < N/2$, properties of the geometric series yield the following bound:
\[
\binom{N}{K}^{-1} \sum_{m=0}^{K} \binom{N}{m}
= \sum_{m=0}^{K} \frac{(N - K)! \, K!}{(N - m)! \, m!}
= \sum_{m=0}^{K} \prod_{j=m+1}^{K} \frac{j}{N - j + 1}
< \sum_{m=0}^{K} \left( \frac{K}{N - K + 1} \right)^m
< \frac{N - (K - 1)}{N - (2K - 1)}.
\tag{38}
\]
For sufficiently small $K/N$, the right-hand side of inequality (38) is less than 2, and thus $\sum_{m=0}^{K} \binom{N}{m} < 2 \binom{N}{K}$ if $K \ll N$. We apply this inequality to equation (37).
Choose $n$ large enough so that $n\gamma_n < \frac{1}{2} \min\{ |C_1|, |C_2| \}$, which is possible since $\gamma_n \to 0$. Then for fixed $k \leq n\gamma_n$, we have that $|\partial N(C_1, k)| < 2 \left[ \binom{|C_2|}{k} + \binom{|C_1|}{k} \right]$ for large enough $n$. By another application of the inequality derived from (38), using the fact that $n\gamma_n = o(n)$, we therefore obtain
\[
|N(C_1, \gamma_n)| = \sum_{k=0}^{\lfloor n\gamma_n \rfloor} |\partial N(C_1, k)|
< \sum_{k=0}^{\lfloor n\gamma_n \rfloor} 2 \left[ \binom{|C_2|}{k} + \binom{|C_1|}{k} \right]
< 4 \left[ \binom{|C_2|}{\lfloor n\gamma_n \rfloor} + \binom{|C_1|}{\lfloor n\gamma_n \rfloor} \right]
\leq 8 \binom{n}{\lfloor n\gamma_n \rfloor}.
\]
As $\binom{N}{K} \leq \left( \frac{N \cdot e}{K} \right)^K$, we have
\[
|N(C_1, \gamma_n)| \leq \exp\big( \log(8) + n\gamma_n \{ \log(e) + \log(1/\gamma_n) \} \big) \leq \exp\big( 3 n \gamma_n \log(1/\gamma_n) \big)
\]
for large enough $n$, since $1/\gamma_n \to \infty$.

Here we give a short Corollary to Lemma 14 which directly serves the proof of Lemma 12. Recall $N_{n,k}(A)$ from Definition 11 in Section 4.1.2.

Corollary 15 Fix an integer $k > 1$. For large enough $n$,
\[
|N_{n,k}(A)| \leq 2 \exp\big[ 3A \cdot n \cdot b_{n,k-1} \log(1 / b_{n,k-1}) \big].
\]

Proof The corollary follows from a direct application of Lemma 14 to $N(C_1, A \cdot b_{n,k-1})$ and $N(C_2, A \cdot b_{n,k-1})$.

Lemma 16 Let $x_1, \ldots, x_k \in (0, 1)$ be fixed and let $X_1, \ldots, X_k$ be arbitrary random variables. Define $W := \sum_i X_i$ and $w := \sum_i x_i$. Then for $t$ sufficiently small, $\mathbb{P}(|W_+^2 - w^2| > 4 k^2 t) \leq \mathbb{P}(\max_i |X_i - x_i| > t)$.

Proof Define $D_i := |X_i - x_i|$ and fix $t < \min_i x_i$. Then if $\max_i D_i \leq t$, all $X_i$'s will be positive, and thus $W_+ = W$ and $|W - w| \leq kt$, by the triangle inequality. Therefore $\max_i D_i \leq t$ implies that
\[
|W_+^2 - w^2| = |(W - w)^2 + 2w(W - w)| \leq k^2 t^2 + 2wkt \leq k^2 t^2 + 2 k^2 t.
\tag{39}
\]
Thus by the law of total probability, we have
\[
\mathbb{P}(|W_+^2 - w^2| > 4 k^2 t) \leq \mathbb{P}\big( \{ |W_+^2 - w^2| > 4 k^2 t \} \cap \{ \max_i D_i \leq t \} \big) + \mathbb{P}(\max_i D_i > t).
\]
Inequality (39) implies that for sufficiently small $t$, the first probability on the right-hand side above is equal to 0.
The result follows.

In what follows we state and prove Lemma 19, a concentration inequality for the modularity of a node set (see Definition 2) from a single-layer SBM with $n$ nodes and two communities. We first give a few short facts about the 2-community SBM. For all results that follow, let $s$, $\rho$, and $v$ (see Definition 4) correspond to the fixed set $B \subseteq [n]$ in each result (though sometimes we will make explicit the dependence on $B$). Define a matrix $V$ by $V(i, j) := P(i, j)(1 - P(i, j))$ for $i, j = 1, 2$, where $P$ is the probability matrix associated with the 2-block SBM.

Lemma 17 Consider a single-layer SBM with $n > 1$ nodes, two communities, and parameters $P$ and $\pi_1$. Fix a node set $B \subseteq [n]$ with $|B| > \alpha n$ for some $\alpha \in (0, 1)$. Then

1. $\left| \mathbb{E}(Y(B)) - \binom{|B|}{2} v^T P v \right| \leq 3 |B| / 2$
2. $\left| \mathbb{E}\left( \sum_{u \in B} \hat{d}(u) \right) - |B| \, n \, v^T P \pi \right| \leq |B|$
3. $\mathrm{Var}\left( \sum_{u \in B} \hat{d}(u) \right) \leq 9 |B| n$

Proof For part 1, note that by definition, $\mathbb{E}(Y(B)) = \sum_{u, v \in B :\, u < v} \cdots$ […]

[…] $> 1$ nodes, two communities, and parameters $P$ and $\pi_1$, define $\kappa := \pi^T P \pi$. Then for large enough $n$,
\[
\mathbb{P}\left( \big| 2 |\hat{E}| - n^2 \kappa \big| > t + 4n \right) \leq 2 \exp\left\{ - \frac{t^2}{n^2} \right\}
\]
for any $t > 0$.

Proof Note that $|\hat{E}| = Y([n])$. Thus part 1 of Lemma 17 with $B = [n]$ yields $\left| \mathbb{E}(|\hat{E}|) - \binom{n}{2} \kappa \right| \leq 3n/2$ for large enough $n$. As $n^2 / 2 = \binom{n}{2} + n/2$, by the triangle inequality,
\[
\left| \mathbb{E}(|\hat{E}|) - \frac{n^2}{2} \kappa \right| \leq \left| \mathbb{E}(|\hat{E}|) - \binom{n}{2} \kappa \right| + \frac{n}{2} \leq 2n.
\]
Thus for any $t > 0$, Hoeffding's inequality gives
\[
\begin{aligned}
\mathbb{P}\left( \big| 2 |\hat{E}| - n^2 \kappa \big| > t + 4n \right)
&\leq \mathbb{P}\left( \big| 2 |\hat{E}| - n^2 \kappa \big| > t + 2 \left| \mathbb{E}(|\hat{E}|) - \frac{n^2}{2} \kappa \right| \right) \\
&\leq \mathbb{P}\left( \big| 2 |\hat{E}| - 2 \mathbb{E}(|\hat{E}|) \big| > t \right)
\leq 2 \exp\left( - \frac{2 t^2}{4 \binom{n}{2}} \right)
\leq 2 \exp\{ - t^2 / n^2 \}.
\end{aligned}
\]

Lemma 19 Consider a single-layer 2-block SBM having $n > 1$ nodes and parameters $P$ and $\pi$. Fix $\alpha \in (0, 1)$ and $B \subseteq [n]$ such that $|B| > \alpha n$.
Then for large enough $n$ we have
\[
\mathbb{P}_n \left( \big| \hat{Q}(B) - q(B) \big| > \frac{t}{n^2} + \frac{8}{\kappa n} \right) \leq 4 \exp\left( - \frac{\kappa^2 \alpha t^2}{16 n^2} \right)
\tag{41}
\]
for any $t > 0$.

Proof With notation laid out in Section 2.2, define
\[
\tilde{Q}(B) := n^{-1} \binom{|B|}{2}^{-1/2} \big( Y(B) - \tilde{\mu}(B) \big),
\tag{42}
\]
where $\tilde{\mu}(B) := \sum_{u, v \in B :\, u < v} \cdots$ […]

[…] $> 0$,
\[
\mathbb{P}\left( \big| \hat{Q}(B) - \tilde{Q}(B) \big| > \frac{t}{2 n^2} + \frac{2}{\kappa n} \right)
\leq \mathbb{P}\left( \big| 2 |\hat{E}| - n^2 \kappa \big| > \kappa t + 4n \right)
\leq 2 \exp\left( - \frac{\kappa^2 t^2}{n^2} \right).
\tag{45}
\]

Step 2. This step relies on McDiarmid's concentration inequality. Recall from Section 2.1 that $\hat{X}(u, v)$ denotes the indicator of edge presence between nodes $u$ and $v$. Note that node pairs have a natural, unique ordering along the upper-diagonal of the adjacency matrix. Define $\mathrm{ord}\{u, v\} = 2(u - 1) + (v - 1)$, for $\{u, v\} \in [n]^2$ with $u < v$ (e.g. $\mathrm{ord}\{1, 2\} = 1$, $\mathrm{ord}\{1, 3\} = 2$, etc.). For all $n > 1$ and $i \leq n(n - 1)/2$, define $\hat{Z}(i) := \hat{X}(u, v)$ such that $\mathrm{ord}\{u, v\} = i$. If $\mathrm{ord}\{u, v\} = i$, we call $\{u, v\}$ the “$i$-th ordered node pair”. Define the set
\[
\mathcal{I}(B) := \{ i : \text{the } i\text{-th ordered node pair has at least one node in } B \}
\]
and let $\hat{Z}(B) := \{ \hat{Z}(i) : i \in \mathcal{I}(B) \}$. Note that the proxy score $\tilde{Q}(B)$ is a function $f(z_1, z_2, \ldots)$ of the indicators $\hat{Z}(B)$. Consider a fixed indicator set $Z(B)$. For each $j \in \mathcal{I}(B)$, define $Z^j(B) := \{ Z^j(i) : i \in \mathcal{I}(B) \}$ with
\[
Z^j(B) := \begin{cases} Z^j(i) = 1 - Z(i), & i = j \\ Z^j(i) = Z(i), & i \neq j \end{cases}
\tag{46}
\]
To apply McDiarmid's inequality, we must bound $\Delta(j) := |f(Z(B)) - f(Z^j(B))|$ uniformly over $j \in \mathcal{I}(B)$. Fix $j \in \mathcal{I}(B)$ and let $\{u', v'\}$ be the $j$-th ordered edge. Without loss of generality, we assume $Z(j) = 1$. Since $f(Z(B)) = \tilde{Q}(B)$, $f(Z(B))$ has a representation in terms of $Y(B)$ and $\tilde{\mu}(B)$. We let $Y^j(B)$ and $\tilde{\mu}^j(B)$ correspond to $f(Z^j(B))$. Notice that
\[
n \binom{|B|}{2}^{1/2} \Delta(j) = \left| Y(B) - Y^j(B) - \big( \tilde{\mu}(B) - \tilde{\mu}^j(B) \big) \right|.
\tag{47}
\]
We bound the right-hand side of equation (47) in two cases: (i) $u', v' \in B$, and (ii) $u' \notin B$, $v' \in B$. In case (i), $Y(B) - Y^j(B) = 1$, and
\[
\begin{aligned}
\tilde{\mu}(B) - \tilde{\mu}^j(B)
&= \sum_{u, v \in B ;\, u \neq v} \frac{d(u) d(v) - d^j(u) d^j(v)}{n^2 \kappa}
= \frac{d(u') d(v') - d^j(u') d^j(v')}{n^2 \kappa} \\
&= \frac{d(u') d(v') - (d(u') - 1)(d(v') - 1)}{n^2 \kappa}
= \frac{d(u') + d(v') - 1}{n^2 \kappa},
\end{aligned}
\]
which is bounded in the interval $(0, 1)$ for large enough $n$. Thus in case (i), $\Delta(j) \leq 2 \binom{|B|}{2}^{-1/2}$ by the triangle inequality, for large enough $n$. In case (ii), $Y(B) - Y^j(B) = 0$, and
\[
\begin{aligned}
\tilde{\mu}(B) - \tilde{\mu}^j(B)
&= \sum_{u, v \in B ;\, u \neq v} \frac{d(u) d(v) - d^j(u) d^j(v)}{n^2 \kappa}
= \sum_{u \in B ;\, u \neq v'} \frac{d(u) \big( d(v') - d^j(v') \big)}{n^2 \kappa} \\
&= \sum_{u \in B ;\, u \neq v'} \frac{d(u)}{n^2 \kappa}
\leq \frac{n |B|}{n^2 \kappa} \leq \kappa^{-1}.
\end{aligned}
\]
Hence due to equation (47), we have for sufficiently large $n$ that
\[
\Delta(j) \leq n^{-1} \binom{|B|}{2}^{-1/2} \cdot \max\{ 2, \kappa^{-1} \} \leq n^{-1} \binom{|B|}{2}^{-1/2} \cdot 2 \cdot \kappa^{-1}
\tag{48}
\]
for all $j \in \mathcal{I}(B)$, as $\kappa \leq 1$. Since $|\mathcal{I}(B)| = \binom{|B|}{2} + |B| |B^C| \leq n |B|$, McDiarmid's bounded-difference inequality implies that for sufficiently large $n$,
\[
\begin{aligned}
\mathbb{P}\left( \big| \tilde{Q}(B) - \mathbb{E}\big( \tilde{Q}(B) \big) \big| > \frac{t}{n} \right)
&\leq 2 \exp\left( - \frac{t^2}{n |B| \Delta(j)} \right)
\leq 2 \exp\left( - \frac{\kappa^2 n^2 \binom{|B|}{2} t^2}{4 n^3 |B|} \right) \\
&\leq 2 \exp\left( - \frac{\kappa^2 (|B| - 1) t^2}{8 n} \right)
\leq 2 \exp\left( - \frac{\kappa^2 \alpha t^2}{16} \right)
\end{aligned}
\]
for any $t > 0$. Replacing $t$ by $t/n$ gives
\[
\mathbb{P}\left( \big| \tilde{Q}(B) - \mathbb{E}\big( \tilde{Q}(B) \big) \big| > \frac{t}{n^2} \right) \leq 2 \exp\left( - \frac{\kappa^2 \alpha t^2}{16 n^2} \right).
\tag{49}
\]

Step 3.
Turning our attention to $\mathbb{E}(\tilde{Q}(B))$, recall that $n \binom{|B|}{2}^{1/2} \tilde{Q}(B) = Y(B) - \tilde{\mu}(B)$ and that $\tilde{\mu}(B) := \sum_{u,v \in B;\, u < \,\dots}$ […]

(i) $\mathbb{P}\left( \left| \hat{Q}(B) - \tilde{Q}(B) \right| > \frac{t}{2n^2} + \frac{2}{\kappa n} \right) \le 2 \exp\left( -\frac{\kappa^2 t^2}{n^2} \right)$

(ii) $\mathbb{P}\left( \left| \tilde{Q}(B) - \mathbb{E}\left( \tilde{Q}(B) \right) \right| > \frac{t}{n^2} \right) \le 2 \exp\left( -\frac{\kappa^2 \alpha t^2}{16 n^2} \right)$

(iii) There exists $c$ with $|c| < 8/\kappa + 3$ such that for large enough $n$, $\mathbb{E}\left( \tilde{Q}(B) \right) = q(B) + c/n$

Noting that $\alpha/16 < 1$, we apply a union bound to the results of steps (i) and (ii):

$$\mathbb{P}\left( \left| \hat{Q}(B) - \mathbb{E}\left( \tilde{Q}(B) \right) \right| > \frac{t}{n^2} + \frac{2}{\kappa n} \right) \le 4 \exp\left( -\frac{\kappa^2 \alpha t^2}{16 n^2} \right) \tag{52}$$

Applying the inequality $|x - a| > |x| - |a|$ with (iii) and some algebra gives the result.

Appendix C. Competing Methods

In Sections 6 and 7, we compare and contrast the performance of Multilayer Extraction with the following methods:

Spectral clustering (Newman, 2006a): an iterative algorithm based on the spectral properties of the modularity matrix of an observed network. In the first step, the modularity matrix of the observed network is calculated and its leading eigenvector is identified. The graph is divided into two disjoint communities so that each vertex is assigned according to its sign in the leading eigenvector. Next, the modularity matrix is calculated for both of the subgraphs corresponding to the previous division. If the modularity of the partition increases, these communities are once again divided into two disjoint communities, and the procedure is repeated in this fashion until the modularity no longer increases. For the desired igraph object graph, the call for this in R was:

    cluster_leading_eigen(graph, steps = -1, weights = NULL, start = NULL, options = arpack_defaults, callback = NULL, extra = NULL, env = parent.frame())

Label Propagation (Raghavan et al., 2007): an iterative algorithm based on propagation through the network. At the first step, all vertices are randomly assigned a community label.
Sequentially, the algorithm chooses a single vertex and updates the labels of its neighborhood to be the majority label of the neighborhood. The algorithm continues updating labels in this way until no updates are available. For the desired igraph object graph, the call for this in R was:

    cluster_label_prop(graph, weights = E(graph)$weight, initial = NULL, fixed = NULL)

Fast and greedy (Clauset et al., 2004): an iterative and greedy algorithm that seeks a partition of vertices with maximum modularity. The algorithm is an agglomerative approach that is a modification of the Kernighan-Lin algorithm commonly used in the identification of community structure in networks. For the desired igraph object graph, the call for this in R was:

    cluster_fast_greedy(graph, merges = TRUE, weights = E(graph)$weight, modularity = TRUE, membership = TRUE)

Walktrap (Pons and Latapy, 2005): an agglomerative algorithm that seeks a partition of vertices that minimizes the total length of a random walk within each community. At the first stage, each vertex of the network is placed in its own community. At each subsequent stage, the two closest communities (according to walk distance) are merged. This process is continued until all vertices have been merged into one large community, and a community dendrogram is formed. The partition with the smallest random walk distance is chosen as the final partition. For the desired igraph object graph, the call for this in R was:

    cluster_walktrap(graph, weights = E(graph)$weight, steps = 4, merges = TRUE, modularity = TRUE, membership = TRUE)

GenLouvain (Jutla et al., 2011): a multilayer generalization of the iterative GenLouvain algorithm. This algorithm seeks a partition of the vertices and layers that maximizes the multilayer modularity of the network, as described in (Mucha et al., 2010).
We use the MATLAB implementation from (Jutla et al., 2011) of GenLouvain with resolution parameter set to 1, and argument randmove = moverandw.

Infomap (De Domenico et al., 2014): a multilayer generalization of the Infomap algorithm from Rosvall and Bergstrom (2008). This algorithm seeks to identify a partition of the vertices and layers that minimizes the generalized map equation, which measures the description length of a random walk on the partition. We use the C++ multiplex implementation of Infomap provided at http://www.mapequation.org/code.html. In implementation, we set the arguments of the function to permit overlapping communities, and set the algorithm to ignore self-loops.

For the first four methods, we use the default settings from the igraph package version 0.7.1 set in R.

Appendix D. Extraction Simulations

D.1 Simulation

We now investigate several intrinsic properties of Multilayer Extraction by applying the method to multilayer networks with several types of community structure, including I) disjoint, II) overlapping, III) persistent, IV) non-persistent, and V) hierarchical structure. Figure 7 illustrates six multilayer networks that we analyze for this purpose. Each simulated network contains 1000 nodes and 90 layers. Vertices within an embedded community connect to one another with probability 0.15, whereas the remaining vertices are independently connected to all other vertices with probability 0.05.

D.2 Results

In the disjoint, overlapping, persistent, and non-persistent networks (I, II, III, and IV, respectively), Multilayer Extraction identifies communities that perfectly match the true embedded communities. On the other hand, in the hierarchical community setting, Multilayer Extraction is unable to identify the full set of communities.
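The simulation design of Section D.1 can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it generates a scaled-down multilayer network (200 nodes and 6 layers rather than 1000 and 90) with one embedded community whose vertex pairs connect with probability 0.15, while all other pairs connect with probability 0.05. The function names `simulate_multilayer` and `edge_density`, the network size, and the placement of the community are assumptions chosen for illustration.

```python
import itertools
import random


def simulate_multilayer(n, n_layers, comm_nodes, comm_layers,
                        p_in=0.15, p_out=0.05, seed=0):
    """Return one undirected edge set per layer (hypothetical helper)."""
    rng = random.Random(seed)
    comm_nodes = set(comm_nodes)
    comm_layers = set(comm_layers)
    layers = []
    for layer in range(n_layers):
        edges = set()
        for u, v in itertools.combinations(range(n), 2):
            # A pair is "inside" only if both endpoints and the layer
            # belong to the embedded vertex-layer community.
            inside = (layer in comm_layers
                      and u in comm_nodes and v in comm_nodes)
            if rng.random() < (p_in if inside else p_out):
                edges.add((u, v))
        layers.append(edges)
    return layers


def edge_density(layers, nodes, layer_ids):
    """Fraction of possible edges among `nodes` present across `layer_ids`."""
    node_set = set(nodes)
    pairs = len(node_set) * (len(node_set) - 1) // 2
    present = sum(
        1
        for layer in layer_ids
        for (u, v) in layers[layer]
        if u in node_set and v in node_set
    )
    return present / (pairs * len(layer_ids))


# Assumed layout: one community on nodes 0-59 in layers 0-2.
layers = simulate_multilayer(200, 6, range(60), range(3))
density_in = edge_density(layers, range(60), range(3))
density_bg = edge_density(layers, range(60, 200), range(3, 6))
print(f"inside community: {density_in:.3f}, background: {density_bg:.3f}")
```

The two printed densities should land near 0.15 and 0.05, which is the contrast an extraction procedure has to detect in this design.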
In example V, Multilayer Extraction does not identify community 1, and in example VI Extraction identifies a community with vertices 1-300 across layers 1-60, which combines community 1 and community 2. Together, these results suggest two properties of the Multilayer Extraction procedure. First, the method can efficiently identify disjoint and overlapping community structure in multilayer networks with heterogeneous community structure. Second, Multilayer Extraction tends to disregard communities with a large number of vertices (e.g. communities that include over half of the vertices in a network). The inverse relationship between the score and the number of vertices in a community may provide some justification as to why this is the case. In networks with large communities, one can in principle modify the score by introducing a reward for large collections. We plan to pursue this further in future research.

Figure 7: Simulation test bed for extraction procedures (panels: I. Disjoint, II. Overlapping, III. Persistent, IV. Non-persistent, V. Hierarchical Embed, VI. Hierarchical Split; axes: Layers, Vertices). Each graphic displays a multilayer network on 1000 nodes and 90 layers. In each plot, shaded rectangles are placed over the nodes (rows) and layers (columns) that are included in a multilayer community. Communities are labeled by number. Vertices within the same community are randomly connected with probability 0.15 while all other vertices have connection probability 0.05 to vertices in their respective layer.

References

E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E.
P. Xing. Mixed membership stochastic blockmodels. The Journal of Machine Learning Research, 9:1981–2014, 2008.

Gary D Bader and Christopher WV Hogue. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics, 4(1):2, 2003.

Matteo Barigozzi, Giorgio Fagiolo, and Giuseppe Mangioni. Identifying the community structure of the international-trade multi-network. Physica A: Statistical Mechanics and its Applications, 390(11):2051–2066, 2011.

Danielle S. Bassett, Nicholas F. Wymbs, Mason A. Porter, Peter J. Mucha, Jean M. Carlson, and Scott T. Grafton. Dynamic reconfiguration of human brain networks during learning. Proceedings of the National Academy of Sciences, 108(18):7641–7646, May 2011. doi: 10.1073/pnas.1018985108.

E. A. Bender and E. R. Canfield. The asymptotic number of labeled graphs with given degree sequences. Journal of Combinatorial Theory, Series A, 24(3):296–307, 1978.

Michele Berlingerio, Michele Coscia, and Fosca Giannotti. Finding redundant and complementary communities in multidimensional networks. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pages 2181–2184. ACM, 2011.

Michele Berlingerio, Fabio Pinelli, and Francesco Calabrese. Abacus: frequent pattern mining-based community discovery in multidimensional networks. Data Mining and Knowledge Discovery, 27(3):294–320, 2013.

Peter J Bickel and Aiyou Chen. A nonparametric view of network models and Newman–Girvan and other modularities. Proceedings of the National Academy of Sciences, 106(50):21068–21073, 2009.

Stefano Boccaletti, G Bianconi, R Criado, CI Del Genio, J Gómez-Gardeñes, M Romance, I Sendina-Nadal, Z Wang, and M Zanin. The structure and dynamics of multilayer networks. Physics Reports, 544(1):1–122, 2014.

B. Bollobás and A. Universitet. A probabilistic proof of an asymptotic formula for the number of labelled regular graphs. Aarhus Universitet, 1979.

Ulrik Brandes, Daniel Delling, Marco Gaertler, Robert Görke, Martin Hoefer, Zoran Nikoloski, and Dorothea Wagner. Maximizing modularity is hard. arXiv preprint physics/0608255, 2006.

Alessio Cardillo, Jesús Gómez-Gardeñes, Massimiliano Zanin, Miguel Romance, David Papo, Francisco Del Pozo, and Stefano Boccaletti. Emergence of network features from multiplexity. Scientific Reports, 3, 2013.

Fan RK Chung. Spectral graph theory, volume 92. American Mathematical Soc., 1997.

A. Clauset, M.E.J. Newman, and C. Moore. Finding community structure in very large networks. Physical Review E, 70(6):066111, 2004.

Aaron Clauset. Finding local community structure in networks. Physical Review E, 72(2):026132, 2005.

Amin Coja-Oghlan and André Lanka. Finding planted partitions in random graphs with general degree distributions. SIAM Journal on Discrete Mathematics, 23(4):1682–1714, 2009.

Manlio De Domenico, Albert Solé-Ribalta, Emanuele Cozzo, Mikko Kivelä, Yamir Moreno, Mason A Porter, Sergio Gómez, and Alex Arenas. Mathematical formulation of multilayer networks. Physical Review X, 3(4):041022, 2013.

Manlio De Domenico, Andrea Lancichinetti, Alex Arenas, and Martin Rosvall. Identifying modular flows on multilayer networks reveals highly overlapping organization in social systems. arXiv preprint arXiv:1408.2925, 2014.

Aurelien Decelle, Florent Krzakala, Cristopher Moore, and Lenka Zdeborová. Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Physical Review E, 84(6):066106, 2011.

Dario Fasino and Francesco Tudisco. Modularity bounds for clusters located by leading eigenvectors of the normalized modularity matrix. arXiv preprint arXiv:1602.05457, 2016.

Simone Ferriani, Fabio Fonti, and Raffaele Corrado. The social and economic bases of network multiplexity: Exploring the emergence of multiplex ties. Strategic Organization, 11(1):7–34, 2013.

Stephen E Fienberg, Michael M Meyer, and Stanley S Wasserman. Analyzing data from multivariate directed graphs: An application to social networks. Technical report, DTIC Document, 1980.

Stephen E Fienberg, Michael M Meyer, and Stanley S Wasserman. Statistical analysis of multiple sociometric relations. Journal of the American Statistical Association, 80(389):51–67, 1985.

S. Fortunato. Community detection in graphs. Physics Reports, 486(3):75–174, 2010.

David F Gleich and C Seshadhri. Vertex neighborhoods, low conductance cuts, and good seeds for local community methods. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 597–605. ACM, 2012.

D. Greene, D. Doyle, and P. Cunningham. Tracking the evolution of communities in dynamic social networks. International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pages 176–183, 2010. doi: 10.1109/ASONAM.2010.17.

Qiuyi Han, Kevin S Xu, and Edoardo M Airoldi. Consistent estimation of dynamic and multi-layer networks. arXiv preprint arXiv:1410.8597, 2014.

Paul W Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt. Stochastic blockmodels: First steps. Social Networks, 5(2):109–137, 1983.

Inderjit S Jutla, Lucas GS Jeub, and Peter J Mucha. A generalized Louvain method for community detection implemented in MATLAB. URL http://netwiki.amath.unc.edu/GenLouvain, 2011.

Mikko Kivelä, Alex Arenas, Marc Barthelemy, James P Gleeson, Yamir Moreno, and Mason A Porter. Multilayer networks. Journal of Complex Networks, 2(3):203–271, 2014.

A. Lancichinetti, F. Radicchi, J. J. Ramasco, and S. Fortunato. Finding statistically significant communities in networks. PloS ONE, 6(4):e18961, 2011.

Jure Leskovec, Kevin J Lang, Anirban Dasgupta, and Michael W Mahoney. Statistical properties of community structure in large social and information networks. In Proceedings of the 17th International Conference on World Wide Web, pages 695–704. ACM, 2008.

Anna CF Lewis, Nick S. Jones, Mason A. Porter, and Charlotte M. Deane. The function of communities in protein interaction networks at multiple scales. BMC Systems Biology, 4(1):114, Dec 2010. ISSN 1752-0509. doi: 10.1186/1752-0509-4-100.

Elchanan Mossel, Joe Neeman, and Allan Sly. Stochastic block models and reconstruction. arXiv preprint arXiv:1202.1499, 2012.

Peter J Mucha, Thomas Richardson, Kevin Macon, Mason A Porter, and Jukka-Pekka Onnela. Community structure in time-dependent, multiscale, and multiplex networks. Science, 328(5980):876–878, 2010.

Raj Rao Nadakuditi and Mark EJ Newman. Graph spectra and the detectability of community structure in networks. Physical Review Letters, 108(18):188701, 2012.

Mark EJ Newman. Finding community structure in networks using the eigenvectors of matrices. Physical Review E, 74(3):036104, 2006a.

Mark EJ Newman. Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23):8577–8582, 2006b.

M.E.J. Newman. Detecting community structure in networks. The European Physical Journal B-Condensed Matter and Complex Systems, 38(2):321–330, 2004.

Jukka-Pekka Onnela, Samuel Arbesman, Marta C. González, Albert-László Barabási, and Nicholas A. Christakis. Geographic constraints on social network groups. PLoS ONE, 6(4):e16939, Apr 2011. doi: 10.1371/journal.pone.0016939.

John Palowitch, Shankar Bhamidi, and Andrew B Nobel. The continuous configuration model: A null for community detection on weighted networks. arXiv preprint arXiv:1601.05630, 2016.

Kaveri S Parker, James D Wilson, Jonas Marschall, Peter J Mucha, and Jeffrey P Henderson. Network analysis reveals sex-and antibiotic resistance-associated antivirulence targets in clinical uropathogens. ACS Infectious Diseases, 1(11):523–532, 2015.

Subhadeep Paul and Yuguo Chen. Community detection in multi-relational data with restricted multi-layer stochastic blockmodel. arXiv preprint arXiv:1506.02699, 2015.

Subhadeep Paul and Yuguo Chen. Null models and modularity based community detection in multi-layer networks. arXiv preprint arXiv:1608.00623, 2016.

Tiago P Peixoto. Inferring the mesoscale structure of layered, edge-valued, and time-varying networks. Physical Review E, 92(4):042807, 2015.

Pascal Pons and Matthieu Latapy. Computing communities in large networks using random walks. In Computer and Information Sciences-ISCIS 2005, pages 284–293. Springer, 2005.

Mason A Porter, Jukka-Pekka Onnela, and Peter J Mucha. Communities in networks. Notices of the AMS, 56(9):1082–1097, 2009.

Usha Nandini Raghavan, Réka Albert, and Soundar Kumara. Near linear time algorithm to detect community structures in large-scale networks. Physical Review E, 76(3):036106, 2007.

Matthew Rocklin and Ali Pinar. On clustering on graphs with multiple edge types. Internet Mathematics, 9(1):82–112, 2013.

M. Rosvall and C.T. Bergstrom. Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences, 105(4):1118–1123, 2008.

Tom AB Snijders and Krzysztof Nowicki. Estimation and prediction for stochastic blockmodels for graphs with latent block structure. Journal of Classification, 14(1):75–100, 1997.

Olaf Sporns. Networks of the Brain. MIT Press, 2011.

Natalie Stanley, Saray Shai, Dane Taylor, and Peter Mucha. Clustering network layers with the strata multilayer stochastic block model. IEEE, 2016.

Emanuele Strano, Saray Shai, Simon Dobson, and Marc Barthelemy. Multiplex networks in metropolitan areas: generic features and local effects. Journal of The Royal Society Interface, 12(111):20150651, 2015.

Ulrike Von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.

Y. J. Wang and G. Y. Wong. Stochastic blockmodels for directed graphs. Journal of the American Statistical Association, 82(397):8–19, 1987.

Stanley Wasserman and Joseph Galaskiewicz. Advances in social network analysis: Research in the social and behavioral sciences, volume 171. Sage Publications, 1994.

James Wilson, Shankar Bhamidi, and Andrew Nobel. Measuring the statistical significance of local connections in directed networks. In NIPS Workshop on Frontiers of Network Analysis: Methods, Models, and Applications, 2013.

James D Wilson, Simi Wang, Peter J Mucha, Shankar Bhamidi, and Andrew B Nobel. A testing based extraction algorithm for identifying significant communities in networks. The Annals of Applied Statistics, 8(3):1853–1891, 2014.

James D Wilson, Matthew J Denny, Shankar Bhamidi, Skyler J Cranmer, and Bruce A Desmarais. Stochastic weighted graphs: Flexible model specification and simulation. Social Networks, 49:37–47, 2017.

Y. Zhao, E. Levina, and J. Zhu. Community extraction for social networks. Proceedings of the National Academy of Sciences, 108(18):7321–7326, 2011.

Yunpeng Zhao, Elizaveta Levina, Ji Zhu, et al. Consistency of community detection in networks under degree-corrected stochastic block models. The Annals of Statistics, 40(4):2266–2292, 2012.
