Inference of Edge Correlations in Multilayer Networks

A. Roxana Pamfil and Sam D. Howison
Mathematical Institute, University of Oxford

Mason A. Porter
Department of Mathematics, University of California Los Angeles, and Mathematical Institute, University of Oxford

(Dated: September 10, 2020)

Many recent developments in network analysis have focused on multilayer networks, which one can use to encode time-dependent interactions, multiple types of interactions, and other complications that arise in complex systems. Like their monolayer counterparts, multilayer networks in applications often have mesoscale features, such as community structure. A prominent type of method for inferring such structures is the employment of multilayer stochastic block models (SBMs). A common (but potentially inadequate) assumption of these models is the sampling of edges in different layers independently, conditioned on the community labels of the nodes. In this paper, we relax this assumption of independence by incorporating edge correlations into an SBM-like model. We derive maximum-likelihood estimates of the key parameters of our model, and we propose a measure of layer correlation that reflects the similarity between connectivity patterns in different layers. Finally, we explain how to use correlated models for edge "prediction" (i.e., inference) in multilayer networks. Taking edge correlations into account improves prediction accuracy both in synthetic networks and in a temporal network of shoppers who are connected to previously purchased grocery products.

I. INTRODUCTION

A network is an abstract representation of a system in which entities called "nodes" interact with each other via connections called "edges" [1]. Most traditionally, in a type of network called a "graph", each edge encodes an interaction between a pair of nodes.
Networks arise in many domains and are useful for numerous practical problems, such as detecting bot accounts on Twitter [2], finding vulnerabilities in electrical grids [3], and identifying potentially harmful interactions between drugs [4]. Many networks have mesoscale (i.e., intermediate-scale) structures. Detecting such structures amounts to a type of coarse-graining, providing representations of a network that are more compact than listing all of the nodes and edges. Types of mesoscale structures include community structure [5], core–periphery structure [6, 7], role similarity [8], and others. An increasingly popular approach for modeling and detecting such structures is to use stochastic block models (SBMs) [9], which are generative models that can produce networks with community structure or other mesoscale structures.

For many applications of network analysis, it is important to move beyond ordinary graphs (i.e., "monolayer networks") to examine more complicated network structures, such as collections of interrelated networks. One can study such structures through the flexible lens of multilayer networks [10–13]. Like a monolayer network, a multilayer network consists of a collection of nodes (here called "state nodes") that are connected pairwise by edges. A state node is a manifestation of a "physical node" (which we will also sometimes call simply a "node"), which represents some entity, in a specific layer. Different layers may correspond to interactions in different time periods (yielding a temporal network), different types of relations (yielding a multiplex network), or other possibilities. As in the setting of monolayer networks, modeling and inferring mesoscale structures in multilayer networks is a prominent research area.
A key assumption of almost all existing models of multilayer networks with mesoscale structure is that edges are generated independently, conditioned on a multilayer partition [14–21]. This independence condition applies both within each layer (which is inconsistent with the fact that real networks often include 3-cliques and other small-scale structures) and across layers (which is inconsistent with the fact that the same nodes are often adjacent to each other in multiple layers). In this paper, we focus on relaxing the edge-independence assumption for edges between the same two physical nodes in different layers. We still consider each pair of nodes independently.

In Fig. 1, we show an example of a two-layer network with both strong positive and strong negative edge correlations. Incorporating such correlations into a network model is beneficial for many applications. For example, a multiplex network of air routes, where each layer corresponds to one airline, is likely to include some popular routes that appear in multiple layers (and unpopular routes may appear only in one layer) [22]. In a temporal social network, we expect people to have repeated interactions with other people [23]; this is a stronger statement than just saying that they tend to interact more within the same community over time. Such edge persistence is also common in many bipartite user–item networks: shoppers tend to buy the same grocery products over time [24], customers of a music-streaming platform listen repeatedly to their favorite songs [25], and Wikipedia users edit specific pages several times [26].

FIG. 1: Example of a correlated multilayer network with two layers and with two blocks of nodes in each layer.
Edges between two black nodes are positively correlated across layers, edges between two white nodes are negatively correlated across layers, and edges between a black node and a white node are uncorrelated across layers.

Multilayer network models that incorporate edge correlations have many important applications. One is the inference task of edge prediction (also called link prediction), where one seeks to assign probabilities of occurrence to unobserved edges. SBMs have often been used for edge prediction for both monolayer networks [4, 9, 27] and multilayer networks [15, 20, 28]. The models that we propose should yield better performance than past efforts, as they take advantage of interlayer edge correlations in data. We discuss this in more detail in Sec. III. Another application is graph matching [29], where one seeks to infer a latent correspondence between nodes in two different networks when one does not know the identities of the nodes. For example, one may wish to match common users between anonymized Twitter and Facebook networks. In a series of papers [30–32], Lyzinski and collaborators established conditions under which graph matching is successful. They tested their methods on correlated Erdős–Rényi (ER) networks and correlated SBMs, which we also investigate in this paper (albeit for a different purpose). We take these models further by incorporating degree correction [33], which generates networks with heterogeneous degree distributions and is important for inference using SBMs. This extension may allow one to study the graph-matching problem on more realistic network models. A third application of our work is efficient computation of correlations between pairs of layers of a multilayer network.
One can use such correlation estimates to quantify the similarity between different layers and potentially to compress multilayer network data by discarding (or merging) layers that are strongly positively correlated with an existing layer [34]. Previous papers have focused primarily on node-centric notions of layer similarity [35–37], whereas our correlated models yield edge-centric measures of similarity. Benefits of our approach over related studies [34, 38, 39] include the fact that correlation values cover an intuitive range (between −1 and 1) and that they work equally well for quantifying layer similarity and dissimilarity.

Our edge-correlated network models are also useful for community detection. Given a multilayer network, one can design an inference algorithm that determines both the parameters that describe the edge probabilities (and correlations) and a multilayer community structure that underlies these probabilities. Solving this inference problem enables the detection of "correlated communities". Because of the additional complexity in the model, this is bound to be more difficult than standard multilayer community detection, so we leave this inference problem for future work. Instead, for the rest of the paper, we assume that we know the block structure of a network; we infer the remaining parameters, including the correlations that are the core element of our model [40]. We also assume that the block assignments $g$ are the same for all layers; the case in which communities can vary arbitrarily across layers is significantly more difficult [41], and we leave its consideration for future work. With these restrictions, one can determine $g$ using any method of choice.
Some existing models of multilayer networks incorporate interlayer dependencies by prescribing joint degree distributions [36, 42, 43], by incorporating edge overlaps [44, 45], or by modeling the appearance of new edges through preferential-attachment mechanisms [46]. The models that we describe in this paper are similar to those that were introduced in [30–32] for graph-matching purposes. Another noteworthy paper is one by Barucca et al. [47] that described a generalized version of the temporal SBM of Ghasemian et al. [17]. This generalization includes an "edge-persistence" parameter $\xi$, which gives the probability that an edge from one layer also occurs in the next temporal layer. For several reasons, we take a different approach. First, the model of [47] is specific to temporal networks, whereas we are also interested in other types of multilayer networks. Second, their model does not easily incorporate degree correction. Third, we want to include correlations explicitly in the model, rather than implicitly using the edge-persistence parameter $\xi$.

Our paper proceeds as follows. In Sec. II, we describe our models of multilayer networks with edge correlations. We start with a simple example of correlated Erdős–Rényi (ER) graphs in Sec. II A to make our exposition of more complicated models easier to follow. In Sec. II B, we integrate mesoscale structures by incorporating correlations in an SBM-like model. We then introduce degree correction in Sec. II C. For all of these models, we derive maximum-likelihood (ML) estimates both of the marginal edge-existence probabilities in each layer and of the interlayer correlations. ML estimation is common for SBMs and degree-corrected SBMs (DCSBMs) in both monolayer networks [33] and multilayer networks [16].
Although ML estimation is less powerful than performing Bayesian inference [9], the former is consistent for both SBMs and DCSBMs [48], and it recovers many common techniques for detecting mesoscale structures in networks [49]. In Sec. III, we describe how to use our models for edge prediction, and we provide some results for synthetic networks. We then proceed with applications in Sec. IV. In Sec. IV A, we use our models to estimate pairwise layer correlations in several empirical networks. In Sec. IV B, we use our correlated models for edge prediction in a temporal network of grocery purchases. We summarize our results in Sec. V and discuss a few ideas for future work.

II. CORRELATED MODELS

In our derivations, we consider just two network layers at a time. Although this may seem limiting, we can apply our framework to generate correlated networks with more than two layers in a sequential manner (see the discussion in Sec. II A), and we can determine pairwise layer correlations for a network with arbitrarily many layers (see the applications in Sec. IV). In Sec. V, we briefly discuss the challenges that arise when modeling three or more layers simultaneously, rather than in a sequential pairwise fashion.

Consider a network with two layers and identical sets of nodes in each layer; this is known as a node-aligned multilayer network [10]. Let $A^1$ and $A^2$ denote the adjacency matrices of our two network layers. As in many generative models of networks, we assume that edges in these two layers are generated by some random process, so the entries $A^1_{ij}$ and $A^2_{ij}$ are random variables. Imposing some statistical correlation between these two sets of random variables introduces interlayer correlations in the resulting multilayer network structure.
Our goal is to propose a model of correlated networks in which each layer is, marginally, a degree-corrected stochastic block model (DCSBM) [33]. However, it is instructive to first consider the simpler cases in which each layer is marginally an Erdős–Rényi random graph (see Sec. II A) or an SBM without degree correction (see Sec. II B). Correlated ER models and correlated SBMs have been studied previously, most notably in work by Lyzinski et al. [30–32] on the graph-matching problem. However, our use of these models for estimating layer correlations is novel, as are the correlated DCSBMs that we propose in Sec. II C.

In monolayer SBMs, it is common to use either Bernoulli or Poisson random variables to generate edges between nodes. The former is generally more accurate, because it does not yield multiedges; however, the latter is more common, as it often simplifies calculations considerably [48, 50]. Nevertheless, we have found that Bernoulli models are simpler when incorporating correlations [24]. They also have several other advantages, including the fact that they work for both sparse and dense networks and that they can handle the entire correlation range between −1 and 1. Therefore, we consider only Bernoulli models in this paper.

A. Correlated Erdős–Rényi Layers

1. Forward model

Assume that the intralayer networks that correspond to $A^1$ and $A^2$ are ER graphs from the $G(n, p)$ ensemble [1] with edge probabilities $p_1$ and $p_2$. For each pair of nodes $(i, j)$, we therefore have

$$P(A^1_{ij} = 1) = p_1 , \tag{1}$$
$$P(A^2_{ij} = 1) = p_2 . \tag{2}$$

To couple edges that connect the same pair of nodes in different layers, let

$$q := P(A^1_{ij} = 1,\ A^2_{ij} = 1) \tag{3}$$

denote the joint probability for an edge to occur in both layers. Unless $q = p_1 p_2$, this construction implies that the random variables $A^1_{ij}$ and $A^2_{ij}$ are not independent.
The parameters $p_1$, $p_2$, and $q$ (which lie in the interval $[0, 1]$) fully specify a forward model of networks with correlated ER layers. To generate a network from this model, one considers each node pair $(i, j)$ and, independently of all other node pairs, assigns values to $A^1_{ij}$ and $A^2_{ij}$ according to the following probabilities:

$$\begin{aligned}
P(A^1_{ij} = 1,\ A^2_{ij} = 1) &= q ,\\
P(A^1_{ij} = 1,\ A^2_{ij} = 0) &= p_1 - q ,\\
P(A^1_{ij} = 0,\ A^2_{ij} = 1) &= p_2 - q ,\\
P(A^1_{ij} = 0,\ A^2_{ij} = 0) &= 1 - p_1 - p_2 + q .
\end{aligned} \tag{4}$$

These expressions follow from the laws of probability and from the definitions of $p_1$, $p_2$, and $q$. For these probabilities to be well-defined, it is both necessary and sufficient that $0 \le q \le \min(p_1, p_2)$ and $p_1 + p_2 \le 1 + q$. In Fig. 2, we illustrate the feasible region for $p_1$ and $p_2$, given a value of $q$.

It is also possible to generate a correlated ER network in a sequential manner. First, one generates the adjacency matrix $A^1$ by placing edges with probability $p_1$. One then determines the probabilities of edges in the second layer by conditioning on the first layer:

$$\begin{aligned}
P(A^2_{ij} = 1 \mid A^1_{ij} = 1) &= \frac{P(A^1_{ij} = 1,\ A^2_{ij} = 1)}{P(A^1_{ij} = 1)} = \frac{q}{p_1} ,\\
P(A^2_{ij} = 1 \mid A^1_{ij} = 0) &= \frac{P(A^1_{ij} = 0,\ A^2_{ij} = 1)}{P(A^1_{ij} = 0)} = \frac{p_2 - q}{1 - p_1} ,\\
P(A^2_{ij} = 0 \mid A^1_{ij} = 1) &= \frac{P(A^1_{ij} = 1,\ A^2_{ij} = 0)}{P(A^1_{ij} = 1)} = \frac{p_1 - q}{p_1} ,\\
P(A^2_{ij} = 0 \mid A^1_{ij} = 0) &= \frac{P(A^1_{ij} = 0,\ A^2_{ij} = 0)}{P(A^1_{ij} = 0)} = \frac{1 - p_1 - p_2 + q}{1 - p_1} .
\end{aligned} \tag{5}$$

FIG. 2: Visualization of the feasible region (gray area) for $p_1$ and $p_2$, given a value of $q$. The boundaries of this region are defined by the inequalities $q \le p_1 \le 1$, $q \le p_2 \le 1$, and $p_1 + p_2 \le 1 + q$. The hyperbola $p_1 p_2 = q$ specifies the boundary between regimes with a positive layer correlation and regimes with a negative layer correlation.
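The forward model in Eqn. (4) and the sequential variant in Eqn. (5) translate directly into a sampler. Below is a minimal Python sketch that uses only the standard library. It makes our own assumptions: undirected layers without self-edges, stored as nested lists of 0/1 entries. The function names (`joint_probs`, `sample_correlated_er`, `sample_layer2_given_layer1`) are hypothetical and are not taken from the paper.

```python
import random

# Hypothetical helper names; an illustrative sketch, not the paper's code.


def joint_probs(p1, p2, q):
    """Joint probabilities of (A1_ij, A2_ij) from Eqn. (4), keyed by the outcome pair."""
    # Necessary and sufficient conditions for Eqn. (4) to define a distribution.
    if not (0.0 <= q <= min(p1, p2) and p1 + p2 <= 1.0 + q):
        raise ValueError(f"infeasible parameters: p1={p1}, p2={p2}, q={q}")
    return {
        (1, 1): q,
        (1, 0): p1 - q,
        (0, 1): p2 - q,
        (0, 0): 1.0 - p1 - p2 + q,
    }


def sample_correlated_er(n, p1, p2, q, rng=random):
    """Sample two undirected layers without self-edges; node pairs are drawn independently."""
    probs = joint_probs(p1, p2, q)
    outcomes = list(probs)
    weights = [probs[o] for o in outcomes]
    A1 = [[0] * n for _ in range(n)]
    A2 = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # One categorical draw per pair fixes both layers' entries at once.
            a1, a2 = rng.choices(outcomes, weights=weights)[0]
            A1[i][j] = A1[j][i] = a1
            A2[i][j] = A2[j][i] = a2
    return A1, A2


def sample_layer2_given_layer1(A1, p1, p2, q, rng=random):
    """Sequential alternative: draw layer 2 conditioned on layer 1 via Eqn. (5)."""
    joint_probs(p1, p2, q)  # validate the parameters
    if not 0.0 < p1 < 1.0:
        raise ValueError("Eqn. (5) requires 0 < p1 < 1")
    prob_if_edge = q / p1                    # P(A2_ij = 1 | A1_ij = 1)
    prob_if_no_edge = (p2 - q) / (1.0 - p1)  # P(A2_ij = 1 | A1_ij = 0)
    n = len(A1)
    A2 = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = prob_if_edge if A1[i][j] == 1 else prob_if_no_edge
            A2[i][j] = A2[j][i] = int(rng.random() < p)
    return A2


# Usage: two 50-node layers with positively correlated edges (q > p1 * p2).
A1, A2 = sample_correlated_er(50, 0.3, 0.2, 0.1, random.Random(1))
```

Drawing each pair's outcome from the four joint probabilities keeps the sketch close to Eqn. (4). The sequential function applies Eqn. (5) to an existing first layer. With $p_1 = p_2 = q$, both functions reproduce the first layer exactly.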
With this approach, it is possible to generate networks with arbitrarily many layers by first sampling edges in the first layer and then sampling edges in each subsequent layer by conditioning on the previous one. This kind of process is especially well-suited to temporal networks, in which layers have a natural ordering. For multiplex networks, it is more appropriate to extend Eqn. (4) to handle more than two layers.

It is also possible to parametrize correlated ER graphs in terms of the marginal Bernoulli probabilities, $p_1$ and $p_2$, and the Pearson correlation

$$\rho = \frac{\mathbb{E}[A^1_{ij} A^2_{ij}] - \mathbb{E}[A^1_{ij}]\,\mathbb{E}[A^2_{ij}]}{\sigma[A^1_{ij}]\,\sigma[A^2_{ij}]} = \frac{q - p_1 p_2}{\sqrt{p_1 (1 - p_1)\, p_2 (1 - p_2)}} , \tag{6}$$

where $\mathbb{E}[\cdot]$ and $\sigma[\cdot]$, respectively, denote the mean and standard deviation of a random variable. One benefit of using $\rho$, rather than $q$, as the third model parameter is that its value is easier to interpret. A value of $\rho$ that is close to 0 indicates a weak correlation between layers, whereas values that are close to the extremes of $+1$ and $-1$ indicate a strong positive correlation and a strong negative correlation, respectively.

We can gain further intuition by considering the cases $\rho = 0$, $\rho = 1$, and $\rho = -1$. First, $\rho = 0$ if and only if $P(A^1_{ij} = 1, A^2_{ij} = 1) = P(A^1_{ij} = 1)\, P(A^2_{ij} = 1)$. That is, the correlation is 0 if and only if edges are generated independently in the two layers with marginal probabilities of $p_1$ and $p_2$. For $\rho = 1$, one can show (see [24]) that $p_1 = p_2 = q$, which corresponds to the two layers having identical network structure. Lastly, for $\rho = -1$, we have $q = 0$ and $p_1 = 1 - p_2$ (see [24]), and two nodes are adjacent in one layer if and only if they are not adjacent in the other layer.

2. Maximum-likelihood parameter estimates

We now derive ML estimates of the parameters $p_1$, $p_2$, and $q$. Let $\mathcal{E}$ denote the set of node pairs that can form edges.
For undirected networks without self-edges, there are $|\mathcal{E}| = N(N-1)/2$ such node pairs to consider, where $N$ is the number of physical nodes. By contrast, $|\mathcal{E}| = N(N-1)$ when generating directed networks without self-edges. With this general notation, all of our derivations in Sec. II are valid for both directed and undirected networks, with or without self-edges. (They are also valid for bipartite networks [24].) We consider each pair of nodes $(i, j) \in \mathcal{E}$ independently when generating edges, so the likelihood of observing adjacency matrices $A^1$ and $A^2$ is

$$\begin{aligned}
P(A^1, A^2 \mid p_1, p_2, q) = \prod_{(i,j) \in \mathcal{E}} & q^{A^1_{ij} A^2_{ij}} (p_1 - q)^{A^1_{ij}(1 - A^2_{ij})} \\
& \times (p_2 - q)^{(1 - A^1_{ij}) A^2_{ij}} (1 - p_1 - p_2 + q)^{(1 - A^1_{ij})(1 - A^2_{ij})} .
\end{aligned} \tag{7}$$

It is helpful to introduce the following notation:

$$\begin{aligned}
e_{11} &:= |\{(i,j) \in \mathcal{E} : A^1_{ij} = 1,\ A^2_{ij} = 1\}| ,\\
e_{10} &:= |\{(i,j) \in \mathcal{E} : A^1_{ij} = 1,\ A^2_{ij} = 0\}| ,\\
e_{01} &:= |\{(i,j) \in \mathcal{E} : A^1_{ij} = 0,\ A^2_{ij} = 1\}| ,\\
e_{00} &:= |\{(i,j) \in \mathcal{E} : A^1_{ij} = 0,\ A^2_{ij} = 0\}| .
\end{aligned}$$

These quantities correspond, respectively, to the number of node pairs that are adjacent in both layers, are adjacent in the first layer but not in the second, are adjacent in the second layer but not in the first, and are not adjacent in either layer. Using this notation and taking the logarithm of (7), we arrive at the following expression for the log-likelihood:

$$\mathcal{L} = e_{11} \log q + e_{10} \log(p_1 - q) + e_{01} \log(p_2 - q) + e_{00} \log(1 - p_1 - p_2 + q) . \tag{8}$$

When fitting our model to network data, the quantities $e_{11}$, $e_{10}$, $e_{01}$, and $e_{00}$ are all known, and we seek to determine the values of $p_1$, $p_2$, and $q$ that best explain the data. To do so, we maximize the log-likelihood (8) by setting its partial derivatives to 0 [51]. We obtain

$$\hat{p}_1 = \frac{e_{11} + e_{10}}{e_{11} + e_{10} + e_{01} + e_{00}} , \tag{9}$$
$$\hat{p}_2 = \frac{e_{11} + e_{01}}{e_{11} + e_{10} + e_{01} + e_{00}} , \tag{10}$$
$$\hat{q} = \frac{e_{11}}{e_{11} + e_{10} + e_{01} + e_{00}} . \tag{11}$$

In all three expressions, the denominator is equal to the number of potential edges (i.e., the cardinality of $\mathcal{E}$). Additionally, let $m_1 = e_{11} + e_{10}$ and $m_2 = e_{11} + e_{01}$ denote the number of observed edges in the first and second layers, respectively. It follows that the ML estimate $\hat{p}_1$ is equal to the number of observed edges in layer 1 divided by the number of potential edges, and an analogous relation holds for $\hat{p}_2$. The estimate $\hat{q}$ is equal to the number of node pairs that are adjacent in both layers divided by the total number of node pairs. These results match our intuition.

We obtain an estimate of the Pearson correlation $\rho$ between the two layers by substituting the ML estimates $\hat{p}_1$, $\hat{p}_2$, and $\hat{q}$ into Eqn. (6) to obtain

$$\hat{\rho} = \frac{e_{00} e_{11} - e_{10} e_{01}}{\sqrt{(e_{11} + e_{10})(e_{11} + e_{01})(e_{10} + e_{00})(e_{01} + e_{00})}} . \tag{12}$$

One can show that maximizing the log-likelihood (8) with respect to $p_1$, $p_2$, and $\rho$ (rather than with respect to $p_1$, $p_2$, and $q$) gives the same expression for $\hat{\rho}$, confirming that this is indeed an ML estimate of the correlation. Note that $\hat{\rho}$ is not defined when either layer is an empty or a complete graph, as the corresponding Bernoulli random variable has a standard deviation of 0.

In App. A, we calculate the variances that are associated with the ML estimates $\hat{p}_1$, $\hat{p}_2$, and $\hat{q}$. We then show using a synthetic example that these scale as $1/N^2$, where we recall that $N$ is the number of physical nodes. These results quantify the uncertainty around the ML estimates for correlated ER models as a function of network size.

B. Correlated SBMs

One of the ways in which real-world networks differ from ER random graphs is that the former have mesoscale structures, such as communities [5]. We use SBMs to incorporate such structures into our correlated models.
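Returning briefly to the correlated ER case of Sec. II A, the estimates (9)–(12) only require counting the four joint adjacency patterns. Below is a minimal Python sketch. It assumes layers stored as nested lists of 0/1 entries without self-edges. The helper names are illustrative and not part of the paper. The sketch also evaluates Eqn. (6) at the ML estimates, which should agree with Eqn. (12).

```python
import math

# Illustrative helper names; a sketch of Eqns. (9)-(12), not the paper's code.
PATTERNS = [(1, 1), (1, 0), (0, 1), (0, 0)]


def pattern_counts(A1, A2, directed=False):
    """Count e11, e10, e01, e00 over the node pairs in E (self-edges excluded)."""
    n = len(A1)
    counts = {pattern: 0 for pattern in PATTERNS}
    for i in range(n):
        for j in range(n):
            # Undirected: each unordered pair once, so |E| = N(N-1)/2.
            # Directed: each ordered pair, so |E| = N(N-1).
            if i == j or (not directed and j < i):
                continue
            counts[(A1[i][j], A2[i][j])] += 1
    return counts


def er_ml_estimates(counts):
    """ML estimates (9)-(11) and the correlation estimate (12); rho is None if undefined."""
    e11, e10, e01, e00 = (counts[pattern] for pattern in PATTERNS)
    total = e11 + e10 + e01 + e00  # |E|
    p1_hat = (e11 + e10) / total
    p2_hat = (e11 + e01) / total
    q_hat = e11 / total
    denom = (e11 + e10) * (e11 + e01) * (e10 + e00) * (e01 + e00)
    # A zero denominator means one layer is empty or complete, so rho is undefined.
    rho_hat = (e00 * e11 - e10 * e01) / math.sqrt(denom) if denom else None
    return p1_hat, p2_hat, q_hat, rho_hat


def pearson_from_params(p1, p2, q):
    """Eqn. (6): Pearson correlation implied by (p1, p2, q), for 0 < p1, p2 < 1."""
    return (q - p1 * p2) / math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))


# Usage: three nodes; both layers contain the edge (0, 1) and differ elsewhere.
A1 = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
A2 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
p1_hat, p2_hat, q_hat, rho_hat = er_ml_estimates(pattern_counts(A1, A2))
```

In this toy example, each layer has two of the three possible edges and they share one, so $\hat{\rho} = -1/2$. Evaluating Eqn. (6) at $(\hat{p}_1, \hat{p}_2, \hat{q})$ gives the same value.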
Let $g$ be a vector of block assignments, which we take to be identical for both network layers, and let $K$ denote the number of blocks. As we explained in Sec. I, we assume throughout the present paper that we are given $g$, and we aim to estimate the remaining model parameters. Following terminology from [27], let $\mathcal{B} = \{1, \ldots, K\} \times \{1, \ldots, K\}$ be the set of "edge bundles" $(r, s)$, each of which is described by its own set of parameters $p^1_{rs}$, $p^2_{rs}$, and $q_{rs}$. The $K \times K$ matrices $\mathbf{p}^1$, $\mathbf{p}^2$, and $\mathbf{q}$ play an analogous role to $p_1$, $p_2$, and $q$ in the ER layers.

Let $g_i$ denote the block assignment of node $i$. A correlated two-layer SBM is described by the following set of equalities:

$$P(A^1_{ij} = 1) = p^1_{g_i g_j} , \qquad P(A^2_{ij} = 1) = p^2_{g_i g_j} , \qquad P(A^1_{ij} = 1,\ A^2_{ij} = 1) = q_{g_i g_j} .$$

Lyzinski et al. proposed this forward model in [31] to study the graph-matching problem. By contrast, we focus on the inverse problem of estimating the parameters $\mathbf{p}^1$, $\mathbf{p}^2$, and $\mathbf{q}$, given some network data.

1. Maximum-likelihood parameter estimates

As in Sec. II A, suppose that we consider each node pair $(i, j)$ independently. The likelihood of observing adjacency matrices $A^1$ and $A^2$ is then

$$\begin{aligned}
P(A^1, A^2 \mid g, \mathbf{p}^1, \mathbf{p}^2, \mathbf{q}) = \prod_{(i,j) \in \mathcal{E}} \Big[ & q_{g_i g_j}^{A^1_{ij} A^2_{ij}} \big(p^1_{g_i g_j} - q_{g_i g_j}\big)^{A^1_{ij}(1 - A^2_{ij})} \big(p^2_{g_i g_j} - q_{g_i g_j}\big)^{(1 - A^1_{ij}) A^2_{ij}} \\
& \times \big(1 - p^1_{g_i g_j} - p^2_{g_i g_j} + q_{g_i g_j}\big)^{(1 - A^1_{ij})(1 - A^2_{ij})} \Big] .
\end{aligned} \tag{13}$$

In this product, each factor depends on $i$ and $j$ only via their block memberships $g_i$ and $g_j$, so we can combine several terms. First, define

$$e^{ab}_{rs} := \big|\{(i,j) \in \mathcal{E} : A^1_{ij} = a,\ A^2_{ij} = b,\ g_i = r,\ g_j = s\}\big|$$

for $(a, b) \in \{(1,1), (1,0), (0,1), (0,0)\}$, in analogy with $e_{11}$, $e_{10}$, $e_{01}$, and $e_{00}$ from Sec. II A.
We can then write the log-likelihood as

$$\mathcal{L} = \sum_{(r,s) \in \mathcal{B}} \Big[ e^{11}_{rs} \log q_{rs} + e^{10}_{rs} \log(p^1_{rs} - q_{rs}) + e^{01}_{rs} \log(p^2_{rs} - q_{rs}) + e^{00}_{rs} \log(1 - p^1_{rs} - p^2_{rs} + q_{rs}) \Big] . \tag{14}$$

The advantage of writing the log-likelihood as in (14) is that it clearly separates the contributions from different edge bundles. Using the results for ER layers from Sec. II A, we immediately obtain (without further calculations) the following ML parameter estimates:

$$\hat{p}^1_{rs} = \frac{e^{11}_{rs} + e^{10}_{rs}}{e^{11}_{rs} + e^{10}_{rs} + e^{01}_{rs} + e^{00}_{rs}} = \frac{m^1_{rs}}{e_{rs}} , \tag{15}$$
$$\hat{p}^2_{rs} = \frac{e^{11}_{rs} + e^{01}_{rs}}{e^{11}_{rs} + e^{10}_{rs} + e^{01}_{rs} + e^{00}_{rs}} = \frac{m^2_{rs}}{e_{rs}} , \tag{16}$$
$$\hat{q}_{rs} = \frac{e^{11}_{rs}}{e^{11}_{rs} + e^{10}_{rs} + e^{01}_{rs} + e^{00}_{rs}} = \frac{e^{11}_{rs}}{e_{rs}} , \tag{17}$$

where $m^1_{rs}$ and $m^2_{rs}$ denote the number of edges between blocks $r$ and $s$ in layers 1 and 2, respectively, and $e_{rs}$ is the number of possible edges between nodes in block $r$ and block $s$. When there is a single edge bundle (i.e., when we do not assume any block structure in a network), the ML estimates (15)–(17) recover those that we obtained for correlated ER networks in Sec. II A. Each edge bundle also has a corresponding Pearson correlation, whose ML estimate is

$$\hat{\rho}_{rs} = \frac{e^{00}_{rs} e^{11}_{rs} - e^{10}_{rs} e^{01}_{rs}}{\sqrt{(e^{11}_{rs} + e^{10}_{rs})(e^{11}_{rs} + e^{01}_{rs})(e^{10}_{rs} + e^{00}_{rs})(e^{01}_{rs} + e^{00}_{rs})}} . \tag{18}$$

In applications to temporal consumer–product networks, we find that different edge bundles have vastly different correlation values [24]. We anticipate that other empirical multilayer networks have similar properties.

2. Effective correlation

Although having different correlation values for different edge bundles can be useful, it is also helpful to have a single correlation measurement for a given multilayer network. For example, one may wish to use such a network diagnostic for one of the purposes that we outlined in Sec. I.
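Below is a minimal Python sketch of the bundle-wise estimates (15)–(18), together with a single pooled correlation of the kind motivated above. It makes several assumptions of our own, not the paper's. Layers are undirected, have no self-edges, and are stored as nested lists. Block labels are given as a list `g`. For undirected layers, bundles $(r, s)$ and $(s, r)$ are merged into one bundle. The pooled function sums the bundle counts and applies the ER formula (12). The derivation that follows in the text, which leads to Eqn. (21), is what justifies this shortcut. All function names are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical helper names; an illustrative sketch of Eqns. (15)-(18).
# Index of each (A1_ij, A2_ij) pattern in a per-bundle count list [e11, e10, e01, e00].
PATTERN_INDEX = {(1, 1): 0, (1, 0): 1, (0, 1): 2, (0, 0): 3}


def correlation_from_counts(e11, e10, e01, e00):
    """Closed form shared by Eqns. (12), (18), and (21); None when undefined."""
    denom = (e11 + e10) * (e11 + e01) * (e10 + e00) * (e01 + e00)
    return (e00 * e11 - e10 * e01) / math.sqrt(denom) if denom else None


def bundle_estimates(A1, A2, g):
    """Per-bundle counts and ML estimates for undirected layers without self-edges."""
    counts = defaultdict(lambda: [0, 0, 0, 0])
    n = len(A1)
    for i in range(n):
        for j in range(i + 1, n):
            # Assumption: for undirected layers, bundles (r, s) and (s, r) are merged.
            bundle = tuple(sorted((g[i], g[j])))
            counts[bundle][PATTERN_INDEX[(A1[i][j], A2[i][j])]] += 1
    estimates = {}
    for bundle, (e11, e10, e01, e00) in counts.items():
        e_rs = e11 + e10 + e01 + e00  # number of possible edges in the bundle
        estimates[bundle] = {
            "p1": (e11 + e10) / e_rs,  # Eqn. (15): m1_rs / e_rs
            "p2": (e11 + e01) / e_rs,  # Eqn. (16): m2_rs / e_rs
            "q": e11 / e_rs,           # Eqn. (17)
            "rho": correlation_from_counts(e11, e10, e01, e00),  # Eqn. (18)
        }
    return dict(counts), estimates


def effective_correlation(counts):
    """Pool the bundle counts, then apply the ER formula (12)."""
    pooled = [sum(c[k] for c in counts.values()) for k in range(4)]
    return correlation_from_counts(*pooled)


# Usage: four nodes in two blocks.
g = [0, 0, 1, 1]
A1 = [[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]]
A2 = [[0, 1, 0, 0], [1, 0, 0, 1], [0, 0, 0, 0], [0, 1, 0, 0]]
counts, estimates = bundle_estimates(A1, A2, g)
```

In this toy example, the within-block bundle $(0, 0)$ contains a single pair that is adjacent in both layers. Its $\hat{\rho}_{rs}$ is therefore undefined (returned as `None`), even though the pooled correlation is well-defined.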
One way to define an "effective correlation" is to first sample a node pair $(I, J)$ uniformly at random from $\mathcal{E}$ and then compute the Pearson correlation of the random variables $A^1_{IJ}$ and $A^2_{IJ}$. That is,

$$\operatorname{corr}(A^1_{IJ}, A^2_{IJ}) = \frac{\mathbb{E}[A^1_{IJ} A^2_{IJ}] - \mathbb{E}[A^1_{IJ}]\,\mathbb{E}[A^2_{IJ}]}{\sigma[A^1_{IJ}]\,\sigma[A^2_{IJ}]} , \tag{19}$$

where we use capital letters for the node indices $I$ and $J$ to emphasize that they are random variables.

We can calculate each term on the right-hand side of (19) by conditioning on the block assignments of the randomly chosen nodes $I$ and $J$. Because $(I, J)$ is uniform on $\mathcal{E}$, we have $P(g_I = r, g_J = s) = e_{rs}/|\mathcal{E}|$. First, for $l \in \{1, 2\}$, we have

$$\begin{aligned}
\mathbb{E}[A^l_{IJ}] = P(A^l_{IJ} = 1) &= \sum_{(r,s) \in \mathcal{B}} P(A^l_{IJ} = 1 \mid g_I = r,\ g_J = s)\, P(g_I = r,\ g_J = s) \\
&= \sum_{(r,s) \in \mathcal{B}} \hat{p}^l_{rs} \frac{e_{rs}}{|\mathcal{E}|} = \sum_{(r,s) \in \mathcal{B}} \frac{m^l_{rs}}{e_{rs}} \frac{e_{rs}}{|\mathcal{E}|} = \frac{m_l}{|\mathcal{E}|} ,
\end{aligned} \tag{20}$$

where $m_l$ denotes the number of edges in layer $l$. The expression (20) is the same as the probability $p_l$ of generating an edge in layer $l$ for the ER case. Because $A^l_{IJ}$ is a Bernoulli random variable (in other words, it can only take the values 1 or 0), its standard deviation is

$$\sigma[A^l_{IJ}] = \sqrt{\frac{m_l}{|\mathcal{E}|}\left(1 - \frac{m_l}{|\mathcal{E}|}\right)} .$$

Lastly,

$$\mathbb{E}[A^1_{IJ} A^2_{IJ}] = P(A^1_{IJ} = 1,\ A^2_{IJ} = 1) = \sum_{(r,s) \in \mathcal{B}} \hat{q}_{rs} \frac{e_{rs}}{|\mathcal{E}|} = \sum_{(r,s) \in \mathcal{B}} \frac{e^{11}_{rs}}{e_{rs}} \frac{e_{rs}}{|\mathcal{E}|} = \frac{e_{11}}{|\mathcal{E}|} .$$

The estimated value of the effective correlation is thus

$$\hat{\rho} = \operatorname{corr}(A^1_{IJ}, A^2_{IJ}) = \frac{e_{00} e_{11} - e_{10} e_{01}}{\sqrt{(e_{11} + e_{10})(e_{11} + e_{01})(e_{10} + e_{00})(e_{01} + e_{00})}} , \tag{21}$$

which recovers the value in (12) for ER layers (i.e., without any block structure in the model). We stress that there is no reason a priori to expect this outcome. In fact, the analogous result does not hold for Poisson models [24]. In the present case, the fact that there is such a correspondence between models is convenient for practical reasons, as it implies that one can perform the simpler calculations from Sec. II A to obtain correlation estimates between network layers, even for networks with nontrivial mesoscale structure.

C. Correlated Degree-Corrected SBMs

The models that we have discussed thus far generate networks in which nodes in the same block have the same expected degree. SBMs that make this kind of assumption tend to perform poorly when they are used to infer mesoscale structure in real networks, many of which have highly heterogeneous degree distributions. This observation led to the development of degree-corrected SBMs (DCSBMs) [33]. We expect that such adjustments can also make a difference when modeling edge correlations, so we now extend the model from Sec. II B to incorporate degree correction.

We continue to work with two-layer networks, which we again specify in terms of two intralayer adjacency matrices, $A^1$ and $A^2$, with a common block structure that we specify with a vector $g$. For each node pair $(i, j) \in \mathcal{E}$, we place edges in the two layers according to the probabilities

$$P(A^1_{ij} = 1) = \theta^1_i \theta^1_j\, p^1_{g_i g_j} , \tag{22}$$
$$P(A^2_{ij} = 1) = \theta^2_i \theta^2_j\, p^2_{g_i g_j} , \tag{23}$$
$$P(A^1_{ij} = 1,\ A^2_{ij} = 1) = \sqrt{\theta^1_i \theta^1_j \theta^2_i \theta^2_j}\; q_{g_i g_j} . \tag{24}$$

We will soon justify the expression in (24). The quantities $\theta^l_i$ and $\theta^l_j$, with $l \in \{1, 2\}$, are the degrees of nodes $i$ and $j$, normalized by the mean degrees. We calculate these quantities directly from an input degree sequence, so they are not model parameters. For undirected and unipartite networks, $\theta^l_i = d^l_i / \langle d^l \rangle$ for each node $i$, where $\langle d^l \rangle$ is the mean degree in layer $l$. This normalization recovers the model in Sec. II B when $\theta^l_i = 1$ (i.e., when all nodes have the same degree). The model parameters $p^1_{rs}$, $p^2_{rs}$, and $q_{rs}$ are now edge "propensities" that, together with the degrees, control the probabilities of edges in the layers.

The probabilities in Eqns. (22)–(23) ensure that, marginally, $A^1$ and $A^2$ are generated according to monolayer DCSBMs [33]. It is not obvious how to model the joint probability $P(A^1_{ij} = 1, A^2_{ij} = 1)$. In particular, it is not clear how it should depend on the observed degrees of nodes $i$ and $j$ in layers 1 and 2 [52]. Part of the complication is that there are four such quantities for each node pair $(i, j)$. The choice in (24) works particularly well when $\rho = 1$ and the normalized degree sequences $\theta^1$ and $\theta^2$ are the same, as it then reduces to a single degree-corrected SBM that generates two identical network layers. Another sensible option is to set $P(A^1_{ij} = 1, A^2_{ij} = 1) = \theta^1_i \theta^1_j \theta^2_i \theta^2_j\, q_{g_i g_j}$. This choice has the nice property that edges in a particular edge bundle $(r, s) \in \mathcal{B}$ are independent if and only if $q_{rs} = p^1_{rs} p^2_{rs}$, which matches the independence condition from Sec. II B for the setting without degree correction. However, this second model underperforms the one from (22)–(24) for edge prediction (see Sec. III and [24]). Consequently, for the rest of this paper, we use the model from Eqns. (22)–(24) as our correlated DCSBM.

1. Maximum-likelihood parameter estimates

When writing the log-likelihood for correlated DCSBMs, we can ignore any additive terms that only involve known quantities, such as the normalized degrees $\theta^l_i$. We can thus write

$$\begin{aligned}
\mathcal{L} = \sum_{(r,s) \in \mathcal{B}} \sum_{(i,j) \in \mathcal{E}} \Bigg[ & A^1_{ij} A^2_{ij} \log q_{rs} + A^1_{ij}(1 - A^2_{ij}) \log\!\left(p^1_{rs} - \sqrt{\frac{\theta^2_i \theta^2_j}{\theta^1_i \theta^1_j}}\; q_{rs}\right) \\
& + (1 - A^1_{ij}) A^2_{ij} \log\!\left(p^2_{rs} - \sqrt{\frac{\theta^1_i \theta^1_j}{\theta^2_i \theta^2_j}}\; q_{rs}\right) \\
& + (1 - A^1_{ij})(1 - A^2_{ij}) \log\!\left(1 - \theta^1_i \theta^1_j\, p^1_{rs} - \theta^2_i \theta^2_j\, p^2_{rs} + \sqrt{\theta^1_i \theta^1_j \theta^2_i \theta^2_j}\; q_{rs}\right) \Bigg] \delta(g_i, r)\, \delta(g_j, s) + (\text{const.}) .
\end{aligned} \tag{25}$$

As in Sec. II B, we seek to maximize $\mathcal{L}$ with respect to the parameters $p^1_{rs}$, $p^2_{rs}$, and $q_{rs}$ by setting the corresponding derivatives to 0.
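To make Eqns. (22)–(24) concrete, here is a minimal Python sketch. It computes normalized degrees and the resulting joint distribution of $(A^1_{ij}, A^2_{ij})$ for one node pair. It assumes undirected, unipartite layers stored as nested lists, and propensity matrices given as nested lists indexed by block label. The function names are hypothetical. The negativity check is our own addition, because the equations alone do not guarantee that large normalized degrees keep all four probabilities in $[0, 1]$.

```python
import math

# Hypothetical helper names; an illustrative sketch of Eqns. (22)-(24).


def normalized_degrees(A):
    """theta_i = d_i / <d> for one undirected, unipartite layer (assumes at least one edge)."""
    degrees = [sum(row) for row in A]
    mean_degree = sum(degrees) / len(degrees)
    return [d / mean_degree for d in degrees]


def dcsbm_pair_probs(i, j, theta1, theta2, g, p1, p2, q, tol=1e-12):
    """Joint distribution of (A1_ij, A2_ij) implied by Eqns. (22)-(24)."""
    r, s = g[i], g[j]
    marg1 = theta1[i] * theta1[j] * p1[r][s]  # Eqn. (22)
    marg2 = theta2[i] * theta2[j] * p2[r][s]  # Eqn. (23)
    both = math.sqrt(theta1[i] * theta1[j] * theta2[i] * theta2[j]) * q[r][s]  # Eqn. (24)
    probs = {
        (1, 1): both,
        (1, 0): marg1 - both,
        (0, 1): marg2 - both,
        (0, 0): 1.0 - marg1 - marg2 + both,
    }
    # Our own check: the four values sum to 1 by construction, but large degrees
    # can make some of them negative, in which case they are not probabilities.
    if min(probs.values()) < -tol:
        raise ValueError(f"invalid probabilities for pair ({i}, {j}): {probs}")
    return probs


# Usage: a three-node path in both layers, one block, and p1 = p2 = q (the rho = 1 case).
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
theta = normalized_degrees(A)
P = [[0.4]]
probs = dcsbm_pair_probs(0, 1, theta, theta, [0, 0, 0], P, P, P)
```

When all $\theta^l_i = 1$, the function returns the correlated-SBM probabilities of Sec. II B. When $\theta^1 = \theta^2$ and $p^1_{rs} = p^2_{rs} = q_{rs}$, the two single-layer outcomes have probability 0, which matches the identical-layers case discussed above.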
However, degree-corrected models have the crucial complication that node pairs $(i, j)$ in the same edge bundle $(r, s)$ are no longer stochastically equivalent (i.e., the corresponding entries of the adjacency matrix are no longer sampled from independent, identically distributed random variables), so their contributions to the log-likelihood are no longer the same in general. Consequently, the ML equations for correlated DCSBMs involve $O(N^2/K^2)$ terms, making them more difficult to solve efficiently.

In certain cases, we are able to make some approximations that make these ML equations easier to solve. Recall that $\theta^l_i = 1$ if the degree of node $i$ is equal to the mean degree in layer $l$. For $(i, j) \in \mathcal{E}$, we write $\theta^1_i \theta^1_j = 1 + \varepsilon^1_{ij}$ and $\theta^2_i \theta^2_j = 1 + \varepsilon^2_{ij}$. If the degree distribution is narrow, such that all node degrees are close to the mean degree, then $\varepsilon^1_{ij}$ and $\varepsilon^2_{ij}$ are small parameters (which can be either positive or negative). In this case, a first-order Taylor expansion yields

$$\sqrt{\theta^1_i \theta^1_j \theta^2_i \theta^2_j} = \sqrt{(1 + \varepsilon^1_{ij})(1 + \varepsilon^2_{ij})} \approx 1 + \frac{\varepsilon^1_{ij} + \varepsilon^2_{ij}}{2}. \tag{26}$$

We also calculate

$$\sqrt{\frac{\theta^1_i \theta^1_j}{\theta^2_i \theta^2_j}} = \sqrt{\frac{1 + \varepsilon^1_{ij}}{1 + \varepsilon^2_{ij}}} \approx 1 + \frac{\varepsilon^1_{ij} - \varepsilon^2_{ij}}{2} \tag{27}$$

and

$$\sqrt{\frac{\theta^2_i \theta^2_j}{\theta^1_i \theta^1_j}} \approx 1 + \frac{\varepsilon^2_{ij} - \varepsilon^1_{ij}}{2}. \tag{28}$$

Using the approximations (26)–(28), we expand the first derivatives of $\mathcal{L}$ to first order in $\varepsilon^1_{ij}$ and $\varepsilon^2_{ij}$. (See [24] for details.)
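The accuracy of the expansions (26)–(28) is easy to check numerically. In the small sketch below (the helper name `taylor_errors` is illustrative), shrinking both perturbations by a factor of 10 should shrink each error by roughly a factor of 100, as expected for expansions whose error is second order in $\varepsilon^1_{ij}$ and $\varepsilon^2_{ij}$.

```python
import math

def taylor_errors(eps1, eps2):
    """Absolute errors of the first-order expansions (26)-(28)."""
    pairs = [
        (math.sqrt((1 + eps1) * (1 + eps2)), 1 + (eps1 + eps2) / 2),  # (26)
        (math.sqrt((1 + eps1) / (1 + eps2)), 1 + (eps1 - eps2) / 2),  # (27)
        (math.sqrt((1 + eps2) / (1 + eps1)), 1 + (eps2 - eps1) / 2),  # (28)
    ]
    return [abs(exact - approx) for exact, approx in pairs]

# Each 10x reduction of the perturbations should cut the errors by about 100x.
for scale in (0.2, 0.02, 0.002):
    print(scale, max(taylor_errors(scale, -scale / 2)))
```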
This calculation yields the following system of equations:

$$\begin{aligned}
&\frac{e^{10}_{rs}}{p^1_{rs} - q_{rs}} - \frac{e^{00}_{rs}}{1 - p^1_{rs} - p^2_{rs} + q_{rs}} + \frac{g^{10}_{rs}}{2}\, \frac{q_{rs}}{(p^1_{rs} - q_{rs})^2} - f^1_{rs}\, \frac{1 - p^2_{rs} + q_{rs}/2}{(1 - p^1_{rs} - p^2_{rs} + q_{rs})^2} - f^2_{rs}\, \frac{p^2_{rs} - q_{rs}/2}{(1 - p^1_{rs} - p^2_{rs} + q_{rs})^2} = 0, \\
&\frac{e^{01}_{rs}}{p^2_{rs} - q_{rs}} - \frac{e^{00}_{rs}}{1 - p^1_{rs} - p^2_{rs} + q_{rs}} + \frac{g^{01}_{rs}}{2}\, \frac{q_{rs}}{(p^2_{rs} - q_{rs})^2} - f^1_{rs}\, \frac{p^1_{rs} - q_{rs}/2}{(1 - p^1_{rs} - p^2_{rs} + q_{rs})^2} - f^2_{rs}\, \frac{1 - p^1_{rs} + q_{rs}/2}{(1 - p^1_{rs} - p^2_{rs} + q_{rs})^2} = 0, \\
&\frac{e^{11}_{rs}}{q_{rs}} - \frac{e^{10}_{rs}}{p^1_{rs} - q_{rs}} - \frac{e^{01}_{rs}}{p^2_{rs} - q_{rs}} + \frac{e^{00}_{rs}}{1 - p^1_{rs} - p^2_{rs} + q_{rs}} - \frac{g^{10}_{rs}}{2}\, \frac{p^1_{rs}}{(p^1_{rs} - q_{rs})^2} - \frac{g^{01}_{rs}}{2}\, \frac{p^2_{rs}}{(p^2_{rs} - q_{rs})^2} + \frac{1}{2}\, \frac{f^1_{rs}(1 + p^1_{rs} - p^2_{rs}) + f^2_{rs}(1 - p^1_{rs} + p^2_{rs})}{(1 - p^1_{rs} - p^2_{rs} + q_{rs})^2} = 0.
\end{aligned} \tag{29}$$

In these equations, we define $e^{11}_{rs}$, $e^{10}_{rs}$, $e^{01}_{rs}$, and $e^{00}_{rs}$ as in Sec. II B. Additionally, we set

$$\begin{aligned}
g^{10}_{rs} &= \sum_{(i,j) \in \mathcal{E}} A^1_{ij}(1 - A^2_{ij})(\varepsilon^2_{ij} - \varepsilon^1_{ij})\, \delta(g_i, r)\, \delta(g_j, s), \\
g^{01}_{rs} &= \sum_{(i,j) \in \mathcal{E}} (1 - A^1_{ij}) A^2_{ij} (\varepsilon^1_{ij} - \varepsilon^2_{ij})\, \delta(g_i, r)\, \delta(g_j, s), \\
f^1_{rs} &= \sum_{(i,j) \in \mathcal{E}} (1 - A^1_{ij})(1 - A^2_{ij})\, \varepsilon^1_{ij}\, \delta(g_i, r)\, \delta(g_j, s), \\
f^2_{rs} &= \sum_{(i,j) \in \mathcal{E}} (1 - A^1_{ij})(1 - A^2_{ij})\, \varepsilon^2_{ij}\, \delta(g_i, r)\, \delta(g_j, s).
\end{aligned}$$

We can efficiently calculate all of these quantities from the matrices $A^1$ and $A^2$.

The system of equations (29) reduces to the analogous equations for correlated SBMs if we ignore all of the terms that depend on $\varepsilon^l_{ij}$ (i.e., the terms that correspond to perturbations of the degrees from their mean values). The zeroth-order solution that we obtain from ignoring these terms provides a good initialization of a numerical algorithm to solve (29) for the parameters $p^1_{rs}$, $p^2_{rs}$, and $q_{rs}$. In practice, when using correlated DCSBMs for edge prediction (see Sec.
III), we find that using a first-order approximation to determine $p^1_{rs}$, $p^2_{rs}$, and $q_{rs}$ gives results that are almost identical to those from the zeroth-order approximation [see Eqns. (15)–(17)]. Consequently, we suggest using these zeroth-order approximations for edge-prediction applications, given that they are straightforward to calculate and have negligible impact on the quality of the results. For large networks, we also obtain a noticeable improvement in calculation speed when using these approximations.

In App. B, we compare the parameters that we estimate using the approximate system of equations (29) with those from the log-likelihood (25). As expected, the quality of the approximation depends on the shape of the degree distribution, with larger discrepancies between the two approaches for broader degree distributions.

2. Correlation values

For the SBMs without degree correction from Sec. II B, node pairs $(i, j)$ from a given edge bundle $(r, s)$ have the same Pearson correlation $\rho_{rs}$. This no longer holds for degree-corrected models. Instead, each node pair $(i, j)$ has its own correlation value

$$\begin{aligned}
\varrho_{ij} &= \frac{\mathbb{E}[A^1_{ij} A^2_{ij}] - \mathbb{E}[A^1_{ij}]\,\mathbb{E}[A^2_{ij}]}{\sigma[A^1_{ij}]\,\sigma[A^2_{ij}]} \\
&= \frac{\sqrt{\theta^1_i \theta^1_j \theta^2_i \theta^2_j}\, q_{rs} - \theta^1_i \theta^1_j \theta^2_i \theta^2_j\, p^1_{rs} p^2_{rs}}{\sqrt{\theta^1_i \theta^1_j\, p^1_{rs}\left(1 - \theta^1_i \theta^1_j\, p^1_{rs}\right) \theta^2_i \theta^2_j\, p^2_{rs}\left(1 - \theta^2_i \theta^2_j\, p^2_{rs}\right)}}.
\end{aligned} \tag{30}$$

As in our earlier expansions of the ML equations, we approximate $\varrho_{ij}$ to first order in $\varepsilon^1_{ij}$ and $\varepsilon^2_{ij}$. We obtain

$$\varrho_{ij} \approx \rho_{rs} + \rho_{rs}\left(\frac{\varepsilon^1_{ij}}{2}\, \frac{p^1_{rs}}{1 - p^1_{rs}} + \frac{\varepsilon^2_{ij}}{2}\, \frac{p^2_{rs}}{1 - p^2_{rs}} - \frac{\varepsilon^1_{ij} + \varepsilon^2_{ij}}{2}\, \frac{p^1_{rs} p^2_{rs}}{q_{rs} - p^1_{rs} p^2_{rs}}\right). \tag{31}$$

Ignoring terms that depend on $\varepsilon^1_{ij}$ and $\varepsilon^2_{ij}$ (i.e., terms that correspond to perturbations of the degrees from their mean values), we obtain $\varrho_{ij} \approx \rho_{rs}$. This approximation works especially well when $p^1_{rs}$ and $p^2_{rs}$ are also small, such that their respective network layers are sparse.
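To see how closely $\rho_{rs}$ tracks the pair-specific correlation, one can compare (30) with its first-order form (31). The sketch below uses hypothetical helper names; the arguments `t1` and `t2` stand for the products $\theta^1_i\theta^1_j$ and $\theta^2_i\theta^2_j$, so that $\varepsilon^l_{ij} = t_l - 1$. It assumes $q_{rs} \neq p^1_{rs}p^2_{rs}$, because (31) divides by $q_{rs} - p^1_{rs}p^2_{rs}$.

```python
import math

def rho_rs(p1, p2, q):
    """Pearson correlation of an edge bundle when every theta equals 1."""
    return (q - p1 * p2) / math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))

def varrho_exact(t1, t2, p1, p2, q):
    """Pair-specific correlation (30); t1 = theta1_i*theta1_j, t2 = theta2_i*theta2_j."""
    num = math.sqrt(t1 * t2) * q - t1 * t2 * p1 * p2
    den = math.sqrt(t1 * p1 * (1 - t1 * p1) * t2 * p2 * (1 - t2 * p2))
    return num / den

def varrho_first_order(eps1, eps2, p1, p2, q):
    """First-order approximation (31); requires q != p1*p2."""
    r = rho_rs(p1, p2, q)
    correction = (eps1 / 2 * p1 / (1 - p1)
                  + eps2 / 2 * p2 / (1 - p2)
                  - (eps1 + eps2) / 2 * p1 * p2 / (q - p1 * p2))
    return r + r * correction

p1, p2, q = 0.1, 0.12, 0.05  # arbitrary example with q > p1*p2 (positive correlation)
for eps1, eps2 in [(0.04, -0.02), (0.004, -0.002)]:
    exact = varrho_exact(1 + eps1, 1 + eps2, p1, p2, q)
    approx = varrho_first_order(eps1, eps2, p1, p2, q)
    print(f"eps=({eps1}, {eps2}): exact={exact:.6f}, first order={approx:.6f}")
```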
The case $q_{rs} = p^1_{rs} p^2_{rs}$ requires separate consideration [53] to avoid dividing by 0. First-order approximations in $\varepsilon^1_{ij}$ and $\varepsilon^2_{ij}$ for this case give

$$\varrho_{ij} \approx -\frac{\varepsilon^1_{ij} + \varepsilon^2_{ij}}{2} \sqrt{\frac{p^1_{rs}}{1 - p^1_{rs}}} \sqrt{\frac{p^2_{rs}}{1 - p^2_{rs}}}. \tag{32}$$

In particular, the zeroth-order solution gives $\varrho_{ij} \approx 0$, in agreement with the SBM without degree correction from Sec. II B.

III. EDGE PREDICTION

The aim of edge prediction (also called "link prediction") in networks is to infer likely missing edges and/or spurious edges [54]. Edge prediction is useful for filling in incomplete data sets, such as protein-interaction networks (in which edges are often established as a result of costly experiments) [55] or terrorist-association networks (which are typically constructed based on partial knowledge) [56]. In the context of bipartite user–item networks, edge-prediction techniques provide candidates for personalized recommendations.

One can perform edge prediction in either a supervised or an unsupervised fashion. We briefly discuss each of these types of approaches.

Supervised methods rely on models that learn how a specified set of features relates to the presence or absence of edges. Existing methods that take advantage of multilayer structure for edge prediction typically do so through the specification of multilayer features. These include aggregations of monolayer features [57], as well as path-based [58] and neighborhood-based [59, 60] features that consider multiple layers. Although many of these features depend indirectly on the similarity between different layers, none of these methods quantifies the level of correlation or uses it for edge prediction.

With unsupervised methods, one obtains a ranking of node pairs such that edges are more likely among higher-ranked pairs. Common approaches include ones that are based on probabilistic models and ones that are based on similarity indices (like the Jaccard index or the Adamic–Adar index) [54]. An example of the former for multilayer networks is a method that maps each network layer independently to a hyperbolic space and then uses the hyperbolic distance between nodes in one layer to predict edges in another layer [61]. This work used node-centric notions of correlation and thereby complements the edge-centric perspective of our work. Methods that rely on similarity indices include those that first generate latent states (i.e., so-called "embeddings") for the nodes and then rank node pairs according to the similarities of these embedding vectors [62, 63]. Tillman et al. [64] used layer-level correlations to combine monolayer similarity indices into a single score. There are also approaches that implicitly take advantage of similarities across layers, such as by extracting common higher-order structures (specifically, subgraphs with three or more nodes) and looking for patterns that differ by exactly one edge [65].

Variants of SBMs are popular choices for unsupervised edge-prediction methods [4, 9, 55, 56], including in multilayer settings [15, 20]. As an example, in a monolayer degree-corrected Bernoulli SBM, the probability that two nodes, $i$ and $j$, are adjacent according to the model is $P(A_{ij} = 1) = \theta_i \theta_j\, p_{g_i g_j}$. The pairs $(i, j)$ for which these probabilities are relatively large but which are not adjacent (i.e., with $A_{ij} = 0$) in the actual network of interest produce a list of likely candidates for missing edges. Similarly, pairs $(i, j)$ for which these probabilities are small but which are adjacent (i.e., with $A_{ij} = 1$) in the actual network may be spurious edges.

A. Edge Prediction Using Correlated Models

There have been several recent attempts to perform edge prediction in multilayer networks [15, 20, 28]. All of these methods use multilayer information to infer mesoscale structures in networks, but they then perform edge prediction independently in each layer, conditioned on the inferred mesoscale structure and any other model parameters. In particular, when using one of these approaches, observing that two nodes are adjacent in one layer has no bearing on their probability to be adjacent in another layer. We aim to use our correlated models to overcome this limitation.

As in our prior discussions, consider a network with two layers with intralayer adjacency matrices $A^1$ and $A^2$, and let $g$ be the shared block structure of these layers. Our goal is to predict edges in the second layer, conditioned on the adjacency structure of the first layer. For each node pair $(i, j) \in \mathcal{E}$, the key quantities to calculate are the probabilities $P(A^2_{ij} = 1 \mid A^1_{ij} = 1)$ and $P(A^2_{ij} = 1 \mid A^1_{ij} = 0)$ for $i$ and $j$ to be adjacent in the second layer, conditioned on them either being adjacent or non-adjacent in the first layer. For example, using the correlated Bernoulli SBM from Sec. II B (which has no degree correction), we have

$$P(A^2_{ij} = 1 \mid A^1_{ij} = 1) = \frac{q_{g_i g_j}}{p^1_{g_i g_j}}, \tag{33}$$

$$P(A^2_{ij} = 1 \mid A^1_{ij} = 0) = \frac{p^2_{g_i g_j} - q_{g_i g_j}}{1 - p^1_{g_i g_j}}. \tag{34}$$

This set of probabilities is the same across all node pairs $(i, j)$ from the same edge bundle $(r, s)$. Now suppose that we have a positive correlation in this edge bundle, so $\rho_{rs} > 0$. From the definition of the Pearson correlation, it follows that $q_{rs} > p^1_{rs} p^2_{rs}$. We then find that $P(A^2_{ij} = 1 \mid A^1_{ij} = 1) > p^2_{rs}$ and $P(A^2_{ij} = 1 \mid A^1_{ij} = 0) < p^2_{rs}$ (rearranging (33) and (34) shows that each of these inequalities is equivalent to $q_{rs} > p^1_{rs} p^2_{rs}$), whereas using a monolayer SBM would entail that $P(A^2_{ij} = 1) = p^2_{rs}$.
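The conditional probabilities (33)–(34) can be computed as follows, together with their degree-corrected analogues, which follow from (22)–(24) by dividing the joint probability by the marginal for $A^1_{ij} = 1$ and by subtracting it from the marginal for $A^2_{ij} = 1$ in the non-adjacent case. This is an illustrative sketch with a hypothetical function name; the example values are arbitrary and chosen so that $q_{rs} > p^1_{rs}p^2_{rs}$.

```python
import math

def cond_probs(p1, p2, q, t1=1.0, t2=1.0):
    """Return P(A2=1 | A1=1) and P(A2=1 | A1=0) for one node pair.

    With t1 = t2 = 1 this is the correlated SBM of Eqs. (33)-(34). Other values
    use the degree-corrected probabilities (22)-(24), where
    t1 = theta1_i * theta1_j and t2 = theta2_i * theta2_j.
    """
    joint = math.sqrt(t1 * t2) * q     # P(A1=1, A2=1), Eq. (24)
    m1, m2 = t1 * p1, t2 * p2          # marginals, Eqs. (22)-(23)
    return joint / m1, (m2 - joint) / (1 - m1)

p1, p2, q = 0.2, 0.3, 0.1              # q > p1*p2 = 0.06, so rho_rs > 0
given_edge, given_no_edge = cond_probs(p1, p2, q)
print(given_edge, given_no_edge)
```

Mathematically, the two values are 0.5 and 0.25: the first exceeds $p^2 = 0.3$ and the second falls below it, which is the shift described above for a positively correlated edge bundle.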
Therefore, the effect of incorporating correlations into our edge-prediction model when these correlations are positive is (1) to increase the probability that nodes $i$ and $j$ are adjacent in the second layer when the corresponding edge also exists in the first layer and (2) to decrease this probability when the corresponding edge is absent from the first layer. The effect is reversed for negative correlations.

In Table I, we summarize the two key probabilities (33)–(34) for four different correlated models, alongside the probabilities $P(A^2_{ij} = 1)$ for monolayer SBMs and DCSBMs. We include a correlated "configuration model" (CM) [66], which is a special case of the degree-corrected SBM from Sec. II C when there is only one block. (Alternatively, one can think of correlated CMs as extensions of correlated ER models that incorporate degree correction.) We use all of these models for edge prediction in synthetic networks in Sec. III B and in consumer–product networks in Sec. IV B. In particular, the two monolayer models are baselines that we hope to outperform using our correlated models.

B. Tests on Synthetic Networks

We use $K$-fold [67] cross-validation to assess the performance of the models from Table I on the edge-prediction task. In machine learning, this is an effective way to measure predictive performance [54]. After partitioning a given data set into $K$ parts, one fits a model to $K - 1$ of these subsets and uses it to make predictions on the remaining (i.e., "holdout") set. One uses each subset once as a holdout, so one does this process $K$ times in total. For our problem, we perform 5-fold cross-validation (which is a standard choice in the machine-learning literature) by splitting the data in the second layer of a given network into 5 subsets. Effectively, this consists of hiding 20% of the entries of the adjacency matrix $A^2$, such that we do not know whether they are edges or not.
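One way to realize this holdout step is to partition the unordered node pairs of the second layer into five disjoint sets and mark one set as unknown in each round. The sketch below assumes a uniformly random partition and uses NaN to flag hidden entries; both choices, and the function names, are illustrative rather than a description of the exact procedure behind the experiments in this paper.

```python
import numpy as np

def holdout_folds(n_nodes, n_folds=5, seed=0):
    """Randomly partition the unordered node pairs (i < j) into n_folds holdout sets."""
    rng = np.random.default_rng(seed)
    i, j = np.triu_indices(n_nodes, k=1)
    order = rng.permutation(len(i))
    return [(i[idx], j[idx]) for idx in np.array_split(order, n_folds)]

def mask_holdout(A2, fold):
    """Copy of A2 in which the holdout pairs, (i, j) and (j, i), are set to NaN."""
    i, j = fold
    A2_train = A2.astype(float)  # astype returns a new array, so A2 is untouched
    A2_train[i, j] = np.nan
    A2_train[j, i] = np.nan
    return A2_train

# Each fold hides about 20% of the layer-2 pairs; A1 is always used in full.
folds = holdout_folds(n_nodes=100)
```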
We then train a model on 100% of the entries of $A^1$ and 80% of the entries of $A^2$, and we use it to make predictions about the 20% holdout data from $A^2$. We do this 5 times to cover each choice of holdout data.

A common way to assess the performance of a binary classification model (i.e., a model that assigns one of two possible values to test data) is by using a receiver operating characteristic (ROC) curve. An ROC curve plots the true-positive rate (TPR) of a classifier versus the false-positive rate (FPR) for various choices of a threshold. Many models, including those that are used for edge prediction in networks, make probabilistic predictions, so specifying a threshold is necessary to convert these into binary predictions. Lowering the threshold increases both the TPR and the FPR. A model has predictive power if the former grows faster than the latter, such that the entire ROC curve lies above the diagonal line TPR = FPR, which gives the performance of a random classifier. As a single summary measure of a model's predictive performance, it is common to report the area under an ROC curve (AUC). Larger AUC values are better, with a value of AUC = 0.5 indicating performance no better than random guessing and AUC = 1 corresponding to perfect prediction. Even in the latter case, one still needs to determine a choice of threshold that completely separates true positives from false positives. The AUC is not the only possible quantity to assess edge-prediction performance, although it is very common [15, 68]. In many networks, the number of edges is much smaller than the number of non-edges (i.e., pairs of nodes that are not adjacent). In situations with such an imbalance, the area under the precision–recall (PR) curve is more sensitive than the AUC to variations in model performance.
Nevertheless, we use the AUC because it has an intuitive interpretation (specifically, as the probability that the underlying model ranks a true positive above a true negative) that allows us to establish the result in App. C. Additionally, the conclusions that we draw in Sec. IV B about the performance of different models do not change significantly if we use PR curves rather than ROC curves.

We now describe how to generate synthetic network benchmarks that are suitable for testing the models in Table I. We construct these networks so that they have two tunable parameters: the Pearson correlation $\rho \in [-1, 1]$ and a community-mixing parameter $\mu \in [0, 1]$ that controls the strength of the planted mesoscale structure. See Bazzi et al. [14] for more details about the definition of $\mu$. One can also explicitly control the degree distribution, such as by including a parameter $\eta_k$ for the slope of a truncated power law (e.g., as used in [14] to sample a degree sequence in each layer). For the experiments in this section, we fix $\eta_k = -2$ and use a minimum degree of $k_{\min} = 10$ and a maximum degree of $k_{\max} = 50$. It would be interesting to explore the performance gap between degree-corrected models and models without degree correction as one varies $\eta_k$, $k_{\min}$, and $k_{\max}$, although we do not do so in the present paper. Lastly, for our numerical experiments in this section, we use networks with $N = 2000$ nodes in each layer and $n_c = 5$ communities, with community sizes sampled from a flat Dirichlet distribution (i.e., one with $\theta = 1$ in the notation of [14]).

We examine two versions, which we call CorrSBM and CorrDCSBM, of a correlated benchmark that is parametrized by the correlation $\rho$ and the community-mixing parameter $\mu$. For both versions of the benchmark, we generate the (undirected and unipartite) adjacency matrix $A^1$ of the first layer in the same way.
Specifically, given $\mu$ and degree-distribution parameters $\eta_k$, $k_{\min}$, and $k_{\max}$, we use the code from [69] to generate $A^1$ and its associated block structure $g$. We fit a monolayer model (either an SBM or a DCSBM, depending on the selected version of the benchmark) to $A^1$ to obtain the marginal edge propensities $p^1$ for the first layer. We then choose $p^2$ in one of two ways. For $\rho \in [0, 1]$, we set $p^2 = p^1$, which ensures that we can generate networks with correlations that cover the entire range from 0 to 1. For $\rho \in [-1, 0]$, we set $p^2 = \mathbf{1} - p^1$, where $\mathbf{1}$ is a matrix with all entries equal to 1; this ensures that we can generate networks with correlations that cover the entire range from $-1$ to 0. Given $p^1$, $p^2$, and $\rho$, we then determine $q$ using either the correlated SBM of Sec. II B or the correlated DCSBM of Sec. II C. For the CorrDCSBM benchmark with $\rho \geq 0$, we set the normalized degrees $\theta^2_i$ to be equal to the corresponding quantities $\theta^1_i$ from the first layer. Again, this choice ensures that we can generate networks all the way to $\rho = 1$. (We also implemented a version of this benchmark that samples degrees independently in the second layer, and we found qualitatively similar results.) The final step consists of generating $A^2$ given $A^1$, the propensities $p^2$ and $q$, and (for the CorrDCSBM benchmark only) the normalized degree sequences $\theta^1$ and $\theta^2$ for both layers. To perform this step, we first compute edge probabilities using either of the correlated models from Secs. II B and II C, and we then generate edges independently according to these probabilities.

TABLE I: Edge-prediction probabilities for various correlated multilayer and monolayer network models.

Model | $P(A^2_{ij} = 1 \mid A^1_{ij} = 1)$ | $P(A^2_{ij} = 1 \mid A^1_{ij} = 0)$
Corr. ER | $q / p^1$ | $(p^2 - q) / (1 - p^1)$
Corr. SBM | $q_{rs} / p^1_{rs}$ | $(p^2_{rs} - q_{rs}) / (1 - p^1_{rs})$
Corr. CM | $\sqrt{\theta^1_i \theta^1_j \theta^2_i \theta^2_j}\, q \,\big/\, (\theta^1_i \theta^1_j\, p^1)$ | $\left(\theta^2_i \theta^2_j\, p^2 - \sqrt{\theta^1_i \theta^1_j \theta^2_i \theta^2_j}\, q\right) \big/ \left(1 - \theta^1_i \theta^1_j\, p^1\right)$
Corr. DCSBM | $\sqrt{\theta^1_i \theta^1_j \theta^2_i \theta^2_j}\, q_{rs} \,\big/\, (\theta^1_i \theta^1_j\, p^1_{rs})$ | $\left(\theta^2_i \theta^2_j\, p^2_{rs} - \sqrt{\theta^1_i \theta^1_j \theta^2_i \theta^2_j}\, q_{rs}\right) \big/ \left(1 - \theta^1_i \theta^1_j\, p^1_{rs}\right)$
SBM | $p^2_{rs}$ | $p^2_{rs}$
DCSBM | $\theta^2_i \theta^2_j\, p^2_{rs}$ | $\theta^2_i \theta^2_j\, p^2_{rs}$

(For the two monolayer models, the prediction does not depend on $A^1_{ij}$, so both columns contain the unconditional probability $P(A^2_{ij} = 1)$.)

FIG. 3: ROC curves from a 5-fold cross-validation for a network that we sample from the CorrDCSBM benchmark with community-mixing parameter $\mu = 0.3$ and correlation $\rho = 0.5$. (a) Models without mesoscale structure: correlated models that do not incorporate any mesoscale structure, compared to a monolayer DCSBM baseline. The baseline gives AUC ≈ 0.83, whereas the AUC values for the two correlated models are approximately 0.76 (correlated ER) and 0.83 (correlated CM). (b) Models with mesoscale structure: correlated models that incorporate mesoscale structure, compared to the same monolayer DCSBM baseline. The baseline again gives AUC ≈ 0.83, and the AUC values for the two correlated models are approximately 0.89 (correlated SBM) and 0.91 (correlated DCSBM).

We now present our results for the two variants of the benchmark. In Fig. 3, we show sample ROC curves for one network that we create using the CorrDCSBM benchmark with $\mu = 0.3$ and $\rho = 0.5$. We compare the performance of our correlated models with a monolayer DCSBM baseline, which performs edge prediction using only information from the second network layer. Two of the correlated models (the correlated SBM and the correlated DCSBM) outperform this baseline, and the correlated CM performs comparably well (i.e., it has a similar AUC).

In Fig. 4, we show results for the CorrSBM benchmark for two choices of the community-mixing parameter $\mu$ and several values (both positive and negative) of the Pearson correlation $\rho$. As expected, the AUC values for monolayer SBMs are independent of $\rho$, whereas the predictive performance of correlated ER models and correlated SBMs improves as we increase $|\rho|$.
In particular, when $|\rho| = 1$, the two correlated models make perfect predictions. When $\rho = 0$, the performance of the correlated ER model is indistinguishable from chance (because AUC ≈ 0.5), whereas correlated SBMs have identical performance to monolayer SBMs. The gap between the two correlated models is smaller for $\mu = 0.8$ than for $\mu = 0.3$, because the underlying block structure is weaker in the former case than in the latter. The AUC of the monolayer baseline is also smaller for $\mu = 0.8$ than for $\mu = 0.3$.

FIG. 4: [Panels: (a) $\mu = 0.3$, $\rho \leq 0$; (b) $\mu = 0.3$, $\rho \geq 0$; (c) $\mu = 0.8$, $\rho \leq 0$; (d) $\mu = 0.8$, $\rho \geq 0$.] Edge-prediction results on synthetic networks from the CorrSBM benchmark with (left) $\rho \leq 0$ and (right) $\rho \geq 0$ using two choices of the community-mixing parameter $\mu$. We use $\mu = 0.3$ in the top row and $\mu = 0.8$ in the bottom row. In all plots, along the horizontal axis, we vary the correlation $\rho$ that we use to generate network instances. On the vertical axis, we indicate the AUC for 5-fold cross-validation using a monolayer SBM (dashed curves) and correlated SBM and ER models (solid curves). Each data point is a mean across 10 trials, and the error bars correspond to one standard deviation from that mean. As expected, the AUC does not change with $\rho$ for the monolayer model, but it increases with $|\rho|$ for the two correlated models. For progressively larger $\mu$, for which the sampled networks have progressively weaker mesoscale structure, there is a smaller performance gap between correlated ER models and correlated SBMs.

One striking feature in Fig. 4 is that all curves are approximately straight lines (to within sampling error). It makes sense that the performance of monolayer SBMs does not vary with $\rho$, as these models do not use any information from the other layer, but the linear dependence on $\rho$ of the other two curves is less intuitive.
For the correlated ER model, we can establish rigorously (see App. C) that the AUC is approximately equal to $(1 + |\rho|)/2$ when $p^1 \approx p^2$ or when $p^1 \approx 1 - p^2$. Given that the correlated SBM curves from Fig. 4 also exhibit a linear dependence on $\rho$, we believe that it is possible to establish similar results for correlated models that incorporate mesoscale structure. These results have practical importance, as they allow one to quickly estimate the additional benefits of using correlated models instead of monolayer SBMs for edge prediction.

FIG. 5: [Panels: (a) $\mu = 0.3$; (b) $\mu = 0.8$.] Edge-prediction results on synthetic networks from the CorrDCSBM benchmark with correlation $\rho \geq 0$ and community-mixing parameters of (a) $\mu = 0.3$ and (b) $\mu = 0.8$. In both panels, along the horizontal axis, we vary the correlation $\rho$ that we use to generate network instances. On the vertical axis, we indicate the AUC for 5-fold cross-validation using a monolayer DCSBM (dashed curves) and the correlated DCSBM and CM (solid curves). Each data point is a mean across 10 trials, and the error bars correspond to one standard deviation from that mean. As expected, the AUC is roughly independent of $\rho$ for the monolayer model, but it increases with $\rho$ for the two correlated models. As we increase $\mu$, such that the sampled networks have progressively weaker mesoscale structure, we observe a substantial narrowing of the performance gap between the correlated CMs and the correlated DCSBMs.

In Fig. 5, we show results for the CorrDCSBM benchmark for two choices of the community-mixing parameter $\mu$ and nonnegative [70] values of the Pearson correlation $\rho$. As expected, when $\rho = 0$, correlated DCSBMs perform similarly to monolayer DCSBMs. As in Fig. 4, the performance of the monolayer model is roughly independent of $\rho$, whereas the two correlated models perform better as $\rho$ increases.
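For the correlated ER model, the linear dependence on $\rho$ can be made explicit in an idealized setting in which the model parameters are known exactly. Each node pair receives one of only two scores (see Table I), so the AUC, with ties counted as 1/2, has a closed form. A short calculation gives AUC $= \tfrac{1}{2} + \tfrac{1}{2}|\rho|\sqrt{p^1(1-p^1)/[p^2(1-p^2)]}$, which equals $(1+|\rho|)/2$ when $p^1 = p^2$ or $p^1 = 1 - p^2$. The sketch below (with hypothetical helper names) evaluates this directly; it ignores the sampling noise and parameter estimation that affect AUC values computed from data.

```python
import math

def q_from_rho(p1, p2, rho):
    """Joint edge probability implied by marginals p1, p2 and Pearson correlation rho."""
    return p1 * p2 + rho * math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))

def corr_er_auc(p1, p2, q):
    """Population AUC of the correlated ER predictor for layer 2 (ties count 1/2)."""
    a = q / p2                # P(A1=1 | A2=1): true edges that also appear in layer 1
    b = (p1 - q) / (1 - p2)   # P(A1=1 | A2=0): non-edges that appear in layer 1
    s1 = q / p1               # score of pairs with A1 = 1 (Table I)
    s0 = (p2 - q) / (1 - p1)  # score of pairs with A1 = 0 (Table I)
    if s1 > s0:               # rho > 0: pairs with a layer-1 edge are ranked higher
        return 0.5 + 0.5 * (a - b)
    if s1 < s0:               # rho < 0: pairs without a layer-1 edge are ranked higher
        return 0.5 + 0.5 * (b - a)
    return 0.5                # rho = 0: every pair gets the same score

for rho in (-0.8, -0.4, 0.0, 0.4, 0.8):
    p1 = 0.1
    p2 = p1 if rho >= 0 else 1 - p1  # the two regimes used in the benchmarks
    print(rho, corr_er_auc(p1, p2, q_from_rho(p1, p2, rho)), (1 + abs(rho)) / 2)
```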
The gap between the two correlated models narrows substantially as one increases $\mu$ from 0.3 to 0.8.

IV. APPLICATIONS

We discuss two applications of correlated multilayer-network models to the analysis of empirical networks. In Sec. IV A, we report pairwise layer correlations for several multiplex networks of different sizes. In Sec. IV B, we consider a temporal bipartite network of customers and products. Using an approach similar to that from Sec. III, we demonstrate that correlated multilayer models have better edge-prediction performance than monolayer baselines.

A. Layer Correlations in Empirical Networks

We now calculate pairwise layer correlations using the formula (21). Recall that this expression gives the effective correlation between two layers, assuming that they have identical block structures (although their edge-propensity parameters can be different). Crucially, this calculation does not require that one first determines the underlying block structure. In fact, as we demonstrated in Sec. II B, the effective correlation of a correlated SBM recovers the correlation of a correlated ER graph, which is straightforward to compute. Accounting for node degrees, as we did in Sec. II C for correlated DCSBMs, significantly increases the complexity of such a calculation. Additionally, as we showed in Sec. II C, correlations using a degree-corrected model are rather similar to those that one obtains without degree correction.

In Table II, we report the mean pairwise layer correlation for 9 multiplex networks. (See App. D for descriptions of these data sets.) To provide additional insight into these networks, we also report the two layers with the largest effective correlations.

We make a few observations about some of the results in Table II. For the C. elegans connectome, the layers that correspond to two types of chemical synapses are highly correlated with each other, and their correlation to the layer of electrical synapses is comparatively lower. For the European Union (EU) air transportation network, the two most correlated layers are those that correspond to Scandinavian Airlines and Norwegian Air Shuttle flights; this is consistent with the findings in [38] (which were based on a different method for quantifying layer similarity). For the network of arXiv collaborations between network scientists, the two most similar categories are "physics.data-an" (which stands for "Data Analysis, Statistics and Probability") and "cs.SI" (which stands for "Social and Information Networks"). One hypothesis is that these two labels are often used together in papers; such common usage results in a large edge overlap between the corresponding layers and hence in a large correlation value.

To quantify edge correlations at a more granular level, one can first infer block assignments $g$ and then calculate correlations $\rho_{rs}$ between all block pairs $(r, s)$. Such a calculation may reveal, for example, that correlations between Scandinavian Airlines and Norwegian Air Shuttle routes are significantly larger in certain geographical regions than in others.

B. Edge Prediction in Shopping Networks

The data-science company dunnhumby gave us access to "pseudonymized" transaction data from stores of a major grocery retailer in the United Kingdom. The data were pseudonymized by replacing personally identifiable information with numerical IDs, rendering it impossible to identify individual shoppers. For our analysis, we aggregate transactions over fixed time windows to construct bipartite networks of customers and products. We refer to these structures as "shopping networks".
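The aggregation step can be sketched as follows. The record format (customer ID, product ID, purchase date), the inclusive date windows, and the function name are assumptions made for illustration. The sketch builds unweighted layers over one shared, fixed set of customers and products, so that the layers can be compared entry by entry.

```python
from datetime import date

def shopping_layers(transactions, windows):
    """Build one bipartite customer-product layer per time window.

    transactions : iterable of (customer_id, product_id, day), with day a datetime.date
    windows      : list of (start, end) date pairs, both ends inclusive
    Returns (customers, products, layers), where layers[k][c][p] is 1 if
    customer c bought product p at least once during window k.
    """
    transactions = list(transactions)
    customers = sorted({c for c, _, _ in transactions})   # fixed node set, shared by layers
    products = sorted({p for _, p, _ in transactions})
    c_idx = {c: k for k, c in enumerate(customers)}
    p_idx = {p: k for k, p in enumerate(products)}
    layers = [[[0] * len(products) for _ in customers] for _ in windows]
    for c, p, day in transactions:
        for k, (start, end) in enumerate(windows):
            if start <= day <= end:
                layers[k][c_idx[c]][p_idx[p]] = 1
    return customers, products, layers

# Hypothetical records and two consecutive three-month windows.
windows = [(date(2013, 3, 1), date(2013, 5, 31)), (date(2013, 6, 1), date(2013, 8, 31))]
tx = [("c1", "milk", date(2013, 3, 5)),
      ("c1", "bread", date(2013, 7, 2)),
      ("c2", "milk", date(2013, 6, 20))]
customers, products, layers = shopping_layers(tx, windows)
```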
Because some purchases occur in higher volumes than others, it is useful to incorporate edge weights. Given a customer $i$ and a product $j$, the item-penetration weight is equal to the fraction of all of the items purchased by customer $i$ that are product $j$. The basket-penetration weight is equal to the fraction of all baskets (i.e., distinct shopping trips) of customer $i$ that include product $j$. See the doctoral dissertation [24] for more details about these weighting schemes.

We now apply the edge-prediction methodology from Sec. III to temporal shopping networks, in which edges and edge weights can change as shopping behavior changes, with a fixed set of customers and a fixed set of products. We construct networks with two layers, which cover the three-month time periods of March–May 2013 and June–August 2013, respectively. Using the same underlying transaction data, we construct two networks for which we determine the vector $g$ of block assignments (the same one for both layers [78]) in different ways. For the first network (which we call ShoppingMod), we use basket-penetration weights for the edges and apply multilayer modularity maximization [79, 80] to the weighted network to determine community assignments $g$ [81]. For the second network (which we call ShoppingSBM), we initially calculate item-penetration weights, and we then apply a threshold to remove edges whose weight is below the median weight (i.e., approximately 50% of the edges). We fit a degree-corrected SBM to the resulting unweighted network using the belief-propagation algorithm from [82]. We expect better edge-prediction performance for the second network, because we detect its block structure using an SBM (as opposed to using modularity maximization, which is more restrictive). As with our tests on synthetic networks in Sec.
III B, we use 5-fold cross-validation to assess edge-prediction performance. We summarize the AUC values of our various correlated multilayer models and the monolayer baselines in Table III, and we show sample ROC curves in Fig. 6. We make a few observations about these results. First, our approximation AUC ≈ $(1 + \rho)/2$ for correlated ER models is very accurate for these two networks, whose correlations are approximately 0.44 and 0.48, respectively. Second, our correlated multilayer models outperform the monolayer baselines for both networks. In particular, the very simple correlated ER model, which assigns one of two probabilities to edges (as indicated in Table I), performs about as well as the more sophisticated monolayer DCSBM for the ShoppingMod network. Third, as expected, AUC values are systematically larger for ShoppingSBM than for ShoppingMod. Finally, although incorporating mesoscale structure leads to better performance when there is no degree correction, this does not seem to be the case for degree-corrected models, as correlated DCSBMs do not perform significantly better than correlated CMs. This is also apparent in Figs. 6(b,d), where we observe almost identical ROC curves for the two models. This result suggests that, for some networks, taking into account layer correlations and degree heterogeneity alleviates the need to also consider mesoscale structure when performing edge prediction. This observation has practical implications, as a correlated CM is much easier than a correlated DCSBM to fit to data and to use for edge prediction. However, for recommendation systems, there are situations in which fitting a correlated DCSBM is beneficial, even if its edge-prediction performance is similar to that of a correlated CM.
For instance, one may wish to identify relevant customers for a chosen product, irrespective of how much they buy (i.e., their degree). SBMs are able to distinguish between customers with equal degrees and identify those with the greatest predisposition to buy a particular product, whereas CMs are not.

We also note a result from [83]: one ROC curve lies completely above (i.e., "dominates") another ROC curve if and only if the same relationship holds for the associated precision–recall (PR) curves. For almost all of the curves in Fig. 6, this result implies that the rankings of our models based on AUCs are almost identical to those that we would obtain if we instead based them on the areas under PR curves. The only curves whose ranking under PR curves this result leaves unclear are the correlated ER and monolayer SBM curves in Fig. 6(c).

It is informative to consider the source of false positives in Fig. 6. Because the correlation between layers is positive, we are likely to predict an edge where none exists when all of the following conditions hold: (1) nodes i and j are adjacent in one layer but not the other; (2) the node pair (i, j) belongs to an edge bundle (r, s) with a large layer correlation $\rho_{rs}$; and (3) nodes i and j have large degrees. Note that condition (3) applies only to correlated models with degree correction.

TABLE II: Pairwise layer correlations in several multiplex networks.

Domain     | Network                      | Number of layers | Mean correlation | Largest correlation (corresponding layers)
Social     | CS Aarhus [71]               | 5                | 0.27             | 0.45 ("work" and "lunch" layers)
Social     | Lazega law firm [72]         | 3                | 0.39             | 0.48 ("advice" and "co-work" layers)
Social     | YouTube [73]                 | 5                | 0.12             | 0.20 ("shared subscriptions" and "shared subscribers")
Biological | C. elegans connectome [74]   | 3                | 0.47             | 0.85 ("MonoSyn" and "PolySyn" layers)
Biological | P. falciparum genes [75]     | 9                | 0.08             | 0.25 ("HVR7" and "HVR9" layers)
Biological | Homo sapiens proteins [76]   | 7                | 0.04             | 0.29 ("direct interaction" and "physical association")
Other      | FAO international trade [34] | 364              | 0.13             | 0.74 ("Pastry" and "Sugar confectionery")
Other      | EU air transportation [22]   | 37               | 0.03             | 0.39 ("Scandinavian Airlines" and "Norwegian Air Shuttle")
Other      | ArXiv collaborations [77]    | 13               | 0.07             | 0.73 ("physics.data-an" and "cs.SI")

TABLE III: Predictive performance of different models on the shopping data set, as measured by the AUC.

Model             | AUC (ShoppingMod) | AUC (ShoppingSBM)
Monolayer SBM     | 0.549             | 0.633
Correlated ER     | 0.724             | 0.743
Correlated SBM    | 0.742             | 0.793
Monolayer DCSBM   | 0.725             | 0.797
Correlated CM     | 0.817             | 0.870
Correlated DCSBM  | 0.818             | 0.875

V. CONCLUSIONS AND DISCUSSION

We introduced models of multilayer networks in which edges that connect the same nodes in different layers are not independent. In comparison to models without edge correlations, our models offer an improved representation of many empirical networks, as interlayer correlations are a common phenomenon: flights between major airports are serviced by multiple airlines, individuals interact repeatedly with the same people, consumers often buy the same products over time, and so on. Among other potential applications, one can use our models to improve edge prediction, to study the graph-matching problem on more realistic benchmark networks, and to calculate layer correlations as insightful summary statistics for networks.

To model layer correlations, we used bivariate Bernoulli random variables to generate edges simultaneously in two network layers. (See [24] for derivations using Poisson random variables.) Correlated Bernoulli stochastic block models were proposed previously [31], although only as forward models for generating networks, rather than for performing inference given empirical data. Another key contribution of our work is a degree-corrected variant of such a model.
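The bivariate Bernoulli construction can be sketched for the simplest case, a correlated ER model. This is our own minimal illustration rather than the code used for this paper. It assumes undirected layers without self-edges, marginal edge probabilities $p_1$ and $p_2$, and a target Pearson correlation $\rho$, which together fix the joint probability $q = p_1 p_2 + \rho\sqrt{p_1(1-p_1)p_2(1-p_2)}$. The function names joint_probabilities and sample_correlated_er are hypothetical.

```python
import math
import random


def joint_probabilities(p1, p2, rho):
    """Cell probabilities (pi11, pi10, pi01, pi00) of a bivariate Bernoulli
    pair (A1_ij, A2_ij) with marginals p1, p2 and Pearson correlation rho."""
    q = p1 * p2 + rho * math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    probs = (q, p1 - q, p2 - q, 1 - p1 - p2 + q)
    if min(probs) < 0:
        raise ValueError("rho is not attainable for these marginals")
    return probs


def sample_correlated_er(n, p1, p2, rho, seed=None):
    """Sample two undirected layers on n nodes. Every node pair draws its
    two edge indicators jointly from one bivariate Bernoulli distribution."""
    rng = random.Random(seed)
    pi11, pi10, pi01, _ = joint_probabilities(p1, p2, rho)
    layer1, layer2 = set(), set()
    for i in range(n):
        for j in range(i + 1, n):
            u = rng.random()
            if u < pi11:                  # edge in both layers
                layer1.add((i, j))
                layer2.add((i, j))
            elif u < pi11 + pi10:         # edge in layer 1 only
                layer1.add((i, j))
            elif u < pi11 + pi10 + pi01:  # edge in layer 2 only
                layer2.add((i, j))
            # otherwise the pair is adjacent in neither layer
    return layer1, layer2


# Usage: check that the sample reproduces the target correlation.
n, p1, p2, rho = 400, 0.1, 0.085, 0.5
layer1, layer2 = sample_correlated_er(n, p1, p2, rho, seed=1)
m = n * (n - 1) // 2
f1, f2, f11 = len(layer1) / m, len(layer2) / m, len(layer1 & layer2) / m
rho_hat = (f11 - f1 * f2) / math.sqrt(f1 * (1 - f1) * f2 * (1 - f2))
print(f"target rho = {rho}, empirical rho = {rho_hat:.3f}")
```

Each node pair is a single draw over four outcomes (both layers, layer 1 only, layer 2 only, neither), so the marginal edge probabilities stay at $p_1$ and $p_2$ while the layers share edges more often than independent sampling would produce. The feasibility check reflects the fact that not every $\rho$ is attainable for given marginals.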
For this degree-corrected model, the maximum-likelihood equations are significantly more difficult to solve, but we were able to make useful simplifications with suitable approximations. Notably, these simplified equations closely approximate those for models without degree correction for networks with almost homogeneous degree distributions.

The models in the present paper that incorporate some mesoscale structure g assume that such structure is given. This setup has the benefit that one can use any desired algorithm to produce a network partition, including ones that operate on weighted or annotated networks or that use nonstandard null models in a modularity objective function. This makes our approach for analyzing correlations suitable for a wide variety of applications.

Fitting a correlated SBM to network data yields a correlation value $\rho_{rs}$ for each edge bundle (r, s). We have defined an effective correlation that combines all of these values into a single measure of similarity between two layers. Notably, the value of the effective correlation is independent of a network's mesoscale structure, making it extremely easy to compute (see Eqn. (12)). We illustrated this method of assessing layer similarity for multiplex networks from social, biological, and other domains.

Another application of our work is to edge prediction in multilayer networks. Our numerical experiments revealed that simple correlated models (e.g., a correlated configuration model or a correlated SBM without degree correction) can outperform monolayer DCSBMs in terms of AUC even for moderate correlation values. We also observed such improved performance for consumer–product networks, which have significant layer correlations (ρ ≈ 0.45). We expect that a correlated multilayer DCSBM will outperform a monolayer DCSBM for most empirical networks, even when there are lower levels of correlation.
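Because the effective correlation does not depend on the block structure, one can get a feel for it from the Pearson correlation of the edge indicators $A^1_{ij}$ and $A^2_{ij}$ over all node pairs, which is exactly the correlation parameter of a correlated ER model. The sketch below assumes that reduction (it does not reproduce Eqn. (12)) and works from the 2 × 2 table of node-pair counts $e_{11}, e_{10}, e_{01}, e_{00}$ used in Appendix C. From the same counts it computes the AUC of the two-level correlated-ER predictor via the ROC geometry of Appendix C, AUC = 1/2 + (a − b)/2, so that one can compare it with (1 + ρ)/2. The function names are hypothetical.

```python
import math


def layer_correlation(e11, e10, e01, e00):
    """Pearson (phi) correlation between the edge indicators of two layers.

    Counts are over node pairs: e11 = adjacent in both layers,
    e10 = layer 1 only, e01 = layer 2 only, e00 = adjacent in neither.
    """
    num = e11 * e00 - e10 * e01
    den = math.sqrt((e11 + e10) * (e01 + e00) * (e11 + e01) * (e10 + e00))
    return num / den


def two_level_auc(e11, e10, e01, e00):
    """AUC of predicting layer 2 from layer 1 with two score levels
    (pairs adjacent in layer 1 ranked first; appropriate when rho > 0)."""
    a = e11 / (e11 + e01)  # true-positive rate at the intermediate threshold
    b = e10 / (e10 + e00)  # false-positive rate at the intermediate threshold
    return 0.5 + (a - b) / 2


# Expected counts for 100000 node pairs with p1 = p2 = 0.1 and rho = 0.5 (q = 0.055).
equal = (5500, 4500, 4500, 85500)
# Expected counts for 10**6 node pairs with p1 = 0.1, p2 = 0.085, q = 0.05.
unequal = (50000, 50000, 35000, 865000)

for name, counts in (("p1 = p2", equal), ("p1 != p2", unequal)):
    rho = layer_correlation(*counts)
    auc = two_level_auc(*counts)
    print(f"{name}: rho = {rho:.3f}, AUC = {auc:.3f}, (1 + rho)/2 = {(1 + rho) / 2:.3f}")
```

For the first table, this prints ρ = 0.500 and AUC = 0.750, which equals (1 + ρ)/2. For the second table (the parameters of Appendix A), ρ ≈ 0.496 but the AUC (≈ 0.767) sits noticeably above (1 + ρ)/2 ≈ 0.748, in line with the more general affine expression derived in Appendix C.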
There are many interesting ways to build further on our work. For example, it would be useful to be able to model all layers simultaneously, rather than in a pairwise fashion, especially for multiplex networks (in which layers do not have a natural ordering). One challenge is that a multivariate Bernoulli distribution of dimension L has $2^L - 1$ parameters; this number grows quickly with the number L of layers. For a temporal setting, we have proposed generating correlated networks in a sequential way by conditioning each layer on the previous one. In some cases, it will be useful to relax this "memoryless" assumption and condition a layer on all previous layers, rather than only on the most recent layer. For example, the purchases of shoppers in December of one year are strongly related not only to their purchases in November, but also to what they bought in December of the previous year.

[Figure 6: four panels of ROC curves, each plotting the true-positive rate (vertical axis, 0 to 1) against the false-positive rate (horizontal axis, 0 to 1). (a) ShoppingMod, no degree correction: correlated SBM, correlated ER, and monolayer SBM. (b) ShoppingMod, degree-corrected models: correlated DCSBM, correlated CM, and monolayer DCSBM. (c) ShoppingSBM, no degree correction: correlated SBM, correlated ER, and monolayer SBM. (d) ShoppingSBM, degree-corrected models: correlated DCSBM, correlated CM, and monolayer DCSBM.]
FIG. 6: ROC curves for the edge-prediction task using 5-fold cross-validation on two temporal networks, ShoppingMod and ShoppingSBM. We consider both degree-corrected models and models without degree correction. The dotted diagonal line in each plot indicates the expected ROC curve for a random classifier. All other curves lie above this diagonal line, suggesting that they have predictive power. In all cases, correlated multilayer models outperform their monolayer counterparts. For both networks, correlated SBMs outperform correlated ER models, whereas correlated DCSBMs perform similarly to correlated CMs [as illustrated by the almost overlapping curves in panels (b) and (d)].

In the present paper, we have not considered the case of nonidentical mesoscale structure across layers. (See [24] for a possible approach.) Additionally, although one can account for edge weights when fitting a block structure g to use with our models, the rest of our derivations apply only to unweighted networks. Modeling correlated weighted networks entails prescribing both edge-existence and edge-weight correlations. Yet another idea is to derive correlated models for networks with overlapping communities. Previous research [15] suggests that this can substantially improve edge-prediction performance. For correlated DCSBMs, it would also be useful to learn the dependence of the joint probability $P(A^1_{ij} = 1, A^2_{ij} = 1)$ on node degrees, rather than assume the parametric form in Eqn. (24). This is a challenging problem for which maximum-likelihood estimation is unlikely to be a suitable tool, so it falls outside the scope of the present paper. Our work is also a starting point for designing algorithms to detect correlated communities in networks by inferring g alongside other model parameters.
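The sequential, "memoryless" construction for temporal networks mentioned above can be sketched as a two-state Markov chain on each node pair. This is our illustration, not the paper's generator. It assumes that every layer has the same edge probability p and that consecutive layers have correlation ρ with 0 ≤ ρ ≤ 1, so the joint probability is $q = p^2 + \rho p(1-p)$. Under these assumptions, layers k steps apart have correlation $\rho^k$, which shows concretely why a layer generated this way cannot "remember" anything beyond its immediate predecessor. The function names are hypothetical.

```python
import math
import random


def multivariate_bernoulli_params(num_layers):
    """Free parameters of a full joint Bernoulli law on num_layers edge indicators."""
    return 2 ** num_layers - 1


def sample_markov_layers(num_pairs, num_layers, p, rho, seed=None):
    """Generate layers one at a time; layer t depends only on layer t - 1.

    Assumes every layer has edge probability p and consecutive layers have
    correlation rho, with 0 <= rho <= 1. Returns one 0/1 list per layer,
    indexed by node pair.
    """
    if not 0 <= rho <= 1:
        raise ValueError("this sketch assumes 0 <= rho <= 1")
    rng = random.Random(seed)
    stay_on = p + rho * (1 - p)  # P(A^t = 1 | A^{t-1} = 1) = q / p
    turn_on = p * (1 - rho)      # P(A^t = 1 | A^{t-1} = 0) = (p - q) / (1 - p)
    layers = [[int(rng.random() < p) for _ in range(num_pairs)]]
    for _ in range(num_layers - 1):
        prev = layers[-1]
        layers.append([int(rng.random() < (stay_on if x else turn_on)) for x in prev])
    return layers


def pearson(x, y):
    """Pearson correlation of two 0/1 sequences of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum(a * b for a, b in zip(x, y)) / n - mx * my
    return cov / math.sqrt(mx * (1 - mx) * my * (1 - my))


layers = sample_markov_layers(200_000, 3, p=0.1, rho=0.5, seed=7)
print("corr(layer 1, layer 2) =", round(pearson(layers[0], layers[1]), 3))  # about rho = 0.5
print("corr(layer 1, layer 3) =", round(pearson(layers[0], layers[2]), 3))  # about rho**2 = 0.25
print("parameters for L = 10 layers:", multivariate_bernoulli_params(10))   # 1023
```

The chain needs only p and ρ, whereas a full joint law over L layers needs $2^L - 1$ parameters; the price is that correlations between non-consecutive layers are forced to decay geometrically.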
A practical outcome of an algorithm that infers g jointly with layer correlations would be a set of communities that persist across layers if and only if the edges in those communities are sufficiently highly correlated with each other. Such an approach would offer a new interpretation of what it means for a community to span multiple layers [41, 84]. Lastly, although we performed edge prediction in an unsupervised manner, it is also possible to use estimated correlations as features in supervised models [57–60] to improve their performance.

In summary, our work highlights the importance of relaxing edge-independence assumptions in statistical models of network data. Doing so provides richer insights into the structure of empirical networks, improves edge-prediction performance, and yields more realistic models on which to test community-detection, graph-matching, and other types of algorithms.

ACKNOWLEDGMENTS

ARP was funded by the EPSRC Centre for Doctoral Training in Industrially Focused Mathematical Modelling (EP/L015803/1) in partnership with dunnhumby. We are grateful to dunnhumby for providing access to grocery-shopping data and to Shreena Patel and Rosie Prior for many helpful discussions. We also thank Junbiao Lu for helpful comments.

Appendix A: Variance of ML Estimates

In Sec. II, we derived ML parameter estimates for various types of correlated network models. In this appendix, we show how to obtain the variance of these estimates and we illustrate how these variances scale with network size (i.e., the number of nodes). We focus on correlated ER models and the corresponding log-likelihood (8). The same approach also works for the other types of models that we examined.

Let $\beta = [p_1\; p_2\; q]^{\top}$ denote the vector of parameters for a correlated ER model. Under mild conditions [85], the ML estimate $\hat{\beta}$ converges in distribution to a multivariate normal distribution as the number N of nodes tends to infinity.
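The asymptotic covariance in this argument is the inverse Fisher information, which for a correlated ER model can be evaluated in closed form. The sketch below is our illustration, not the code behind Fig. 7. It assumes that the log-likelihood (8) is a sum of independent bivariate Bernoulli terms, one per node pair, with cell probabilities $q$, $p_1 - q$, $p_2 - q$, and $1 - p_1 - p_2 + q$, and that the network is undirected with N(N − 1)/2 node pairs; the function names are hypothetical. Because the information contributed by each pair is fixed, the variances fall as 1/[N(N − 1)/2], i.e., roughly as $1/N^2$.

```python
import math


def fisher_information_per_pair(p1, p2, q):
    """Fisher information of one node pair in a correlated ER model,
    with parameters beta = (p1, p2, q)."""
    # (cell probability, gradient of that probability with respect to (p1, p2, q))
    cells = [
        (q, (0.0, 0.0, 1.0)),                  # adjacent in both layers
        (p1 - q, (1.0, 0.0, -1.0)),            # adjacent in layer 1 only
        (p2 - q, (0.0, 1.0, -1.0)),            # adjacent in layer 2 only
        (1 - p1 - p2 + q, (-1.0, -1.0, 1.0)),  # adjacent in neither layer
    ]
    info = [[0.0] * 3 for _ in range(3)]
    for prob, grad in cells:
        for r in range(3):
            for c in range(3):
                info[r][c] += grad[r] * grad[c] / prob
    return info


def invert(matrix):
    """Inverse of a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def confidence_half_widths(n_nodes, p1, p2, q, z=1.96):
    """Approximate 95% confidence half-widths for (p1, p2, q), assuming an
    undirected network with n_nodes * (n_nodes - 1) / 2 independent node pairs."""
    pairs = n_nodes * (n_nodes - 1) / 2
    inv = invert(fisher_information_per_pair(p1, p2, q))
    return [z * math.sqrt(inv[k][k] / pairs) for k in range(3)]


for n in (50, 100, 200):
    widths = confidence_half_widths(n, 0.1, 0.085, 0.05)
    print(n, [round(w, 4) for w in widths])
```

Because $(p_1, p_2, q)$ are the means of the indicators $A^1_{ij}$, $A^2_{ij}$, and $A^1_{ij}A^2_{ij}$, the diagonal of the per-pair inverse Fisher matrix reduces to $p_1(1-p_1)$, $p_2(1-p_2)$, and $q(1-q)$, which gives a quick check on the numerical inversion. Doubling N roughly halves every half-width.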
For large but finite N, the quantity $\hat{\beta}$ is distributed approximately according to $\mathcal{N}(\beta^*, I^{-1}(\beta^*))$, where $\beta^*$ is the true-parameter vector and $I^{-1}(\beta^*)$ is the inverse of the Fisher information matrix evaluated at the parameter values in $\beta^*$. The diagonal entries of $I^{-1}(\beta^*)$ provide variance estimates for $\hat{p}_1$, $\hat{p}_2$, and $\hat{q}$.

To illustrate how these variance estimates scale with the number N of nodes, we simulate correlated ER networks with $p_1 = 0.1$, $p_2 = 0.085$, and $q = 0.05$ (which correspond to a correlation of ρ ≈ 0.5) for different network sizes. In Fig. 7, we plot the 95% confidence intervals around the ML estimates for each of the three parameters in our model. We find empirically that the variance scales with $1/N^2$, so the standard deviation (and thus the width of the 95% confidence intervals) scales with 1/N. This is the expected behavior, because a correlated ER network on N nodes supplies on the order of $N^2$ independent observations (one per node pair).

[Figure 7: estimated parameters $\hat{p}_1$, $\hat{p}_2$, and $\hat{q}$ (vertical axis, 0 to 0.20) versus the number of nodes (horizontal axis, 25 to 200).]
FIG. 7: ML estimates with 95% confidence intervals that we obtain from the inverse of the Fisher information matrix for different values of the number N of nodes. The true parameter values (dashed horizontal lines) are $p_1 = 0.1$, $p_2 = 0.085$, and $q = 0.05$. The width of the confidence intervals scales approximately as 1/N.

Appendix B: Maximizing the Full Log-Likelihood Versus our Approximate Log-Likelihood for Correlated DCSBMs

In Sec. II C, we derived an approximation (29) of the log-likelihood for correlated DCSBMs that enables a more efficient estimation of the parameters than maximizing the exact log-likelihood (25). We now use simulated data to illustrate how the results that one obtains using this approximation compare with those from maximizing the original likelihood.

We simulate networks using the CorrDCSBM benchmark (see Sec. III B) with N = 1000 nodes, K = 5 communities, a mixing parameter of µ = 0.3, and a correlation of ρ = 0.5. For N = 2000 nodes, which we used in other experiments in the paper, we find that maximizing the full log-likelihood is prohibitively slow. Even for N = 1000, obtaining estimates from the full log-likelihood takes about 30 minutes in total, whereas the calculation runs in about 5 seconds on a typical laptop when we use the approximate log-likelihood.

The quality of our approximation depends strongly on the shape of the degree distribution: accuracy degrades for distributions with larger variances and heavier tails. We illustrate this behavior in Fig. 8, where we plot the full-likelihood estimates [see Eqn. (25)] versus the approximate-likelihood estimates [see Eqn. (29)] for two networks. For both networks, we sample degrees from truncated power-law distributions using the code in [69]. In Fig. 8(a), we choose a relatively narrow degree distribution (with $k_{\exp} = 0$, a minimum degree of 18, and a maximum degree of 22). In this case, the parameter values that we estimate using the approximate log-likelihood closely match those from the full log-likelihood. For Fig. 8(b), we choose a relatively wide degree distribution (with $k_{\exp} = -2$, a minimum degree of 10, and a maximum degree of 50). In this case, the approximate log-likelihood tends to overestimate parameter values, especially for larger values of these parameters (see the top-right corner of Fig. 8(b)). As this example illustrates, there is a trade-off between accuracy and speed. It may be possible to derive better approximations, such as by considering a second-order expansion instead of a first-order expansion with respect to the quantities $\varepsilon^1_{ij}$ and $\varepsilon^2_{ij}$ or by adding corrections for large-degree nodes to the approximate log-likelihood.
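To make the contrast between the two degree distributions concrete, the sketch below draws integer degrees from a truncated power law. It is not the code in [69], which may discretize or normalize differently; it assumes that $P(k) \propto k^{k_{\exp}}$ for integers $k_{\min} \le k \le k_{\max}$, which is our reading of the parameters quoted above, and the helper names are hypothetical. The variance it reports is the quantity that, according to the discussion above, governs how well the approximate log-likelihood (29) performs.

```python
import random


def truncated_power_law_pmf(k_exp, k_min, k_max):
    """P(k) proportional to k**k_exp for integer k in [k_min, k_max]."""
    weights = {k: float(k) ** k_exp for k in range(k_min, k_max + 1)}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def mean_and_variance(pmf):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mean = sum(k * p for k, p in pmf.items())
    var = sum((k - mean) ** 2 * p for k, p in pmf.items())
    return mean, var


def sample_degrees(n, k_exp, k_min, k_max, seed=None):
    """Draw n degrees independently from the truncated power law."""
    rng = random.Random(seed)
    pmf = truncated_power_law_pmf(k_exp, k_min, k_max)
    values = list(pmf)
    return rng.choices(values, weights=[pmf[k] for k in values], k=n)


for label, params in (("narrow", (0, 18, 22)), ("wide", (-2, 10, 50))):
    mean, var = mean_and_variance(truncated_power_law_pmf(*params))
    print(f"{label}: mean degree = {mean:.1f}, variance = {var:.1f}")

degrees = sample_degrees(1000, -2, 10, 50, seed=3)
```

With $k_{\exp} = 0$ the distribution is uniform on 18, ..., 22 (mean 20, variance 2), whereas $k_{\exp} = -2$ on 10, ..., 50 gives a variance that is larger by well over an order of magnitude and a long right tail of high-degree nodes.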
[Figure 8: two log–log scatter plots (axes from $10^{-2}$ to $10^{-1}$) of parameters from the approximate likelihood (vertical axis) versus parameters from the full likelihood (horizontal axis), for $\hat{p}_1$, $\hat{p}_2$, and $\hat{q}$. (a) $k_{\exp} = 0$, $k_{\min} = 18$, $k_{\max} = 20$. (b) $k_{\exp} = -2$, $k_{\min} = 10$, $k_{\max} = 50$.]
FIG. 8: ML estimates from the approximate log-likelihood (29) versus ML estimates from the full log-likelihood (25). (a) Example of a network with a relatively narrow degree distribution. For this example, the approximation works well. (b) Example of a network with a relatively wide degree distribution. For this example, we observe some discrepancies between the two sets of estimates.

Appendix C: Edge-Prediction AUC as a Function of the Pearson Correlation ρ

We establish the following result.

Proposition. The AUC for a correlated ER model is an affine function of the Pearson correlation ρ. In particular, when $p_1 \approx p_2$ or $p_1 \approx 1 - p_2$, we have $\mathrm{AUC}_{\mathrm{ER}} \approx (1 + |\rho|)/2$.

Proof. Suppose that ρ > 0. (The case ρ ≤ 0 is similar.) With a correlated ER model, all unobserved interactions (i, j) in the second layer have one of two probabilities: $q/p_1$ if $A^1_{ij} = 1$ and $(p_2 - q)/(1 - p_1)$ if $A^1_{ij} = 0$. Because ρ > 0, we have $q > p_1 p_2$, which implies that $q/p_1 > (p_2 - q)/(1 - p_1)$ (the latter inequality is equivalent to $q(1 - p_1) > p_1(p_2 - q)$, which simplifies to $q > p_1 p_2$). Selecting a threshold between these two probabilities amounts to predicting that everything that is an edge in the first layer is also an edge in the second layer and that everything that is not an edge in the first layer is also not an edge in the second layer. Let a and b denote the TPR and FPR, respectively, at such an intermediate threshold. In this case, (b, a) are the coordinates (FPR on the horizontal axis, TPR on the vertical axis) of the point at which the slope of the ROC curve changes. See the illustration in Fig. 9.

FIG. 9: Diagram of the ROC curve for an ER model, which assigns one of two possible edge probabilities to each pair of nodes.

By straightforward geometry,

$$ \mathrm{AUC}_{\mathrm{ER}} = 1 - \frac{ab}{2} - \frac{(1-a)(1-b)}{2} - (1-a)\,b = \frac{1}{2} + \frac{a-b}{2} $$

is the area under this ROC curve. The next step is to estimate a and b. With a correlated ER model, the number of true positives is proportional [86] to $e_{11}$; the model's prediction is correct every time that an edge that is present in the first layer is also present in the second layer. To find the TPR, one needs to divide this quantity by the number of edges (there are $e_{11} + e_{01}$ of them) in the second layer. We obtain

$$ a \approx \frac{e_{11}}{e_{11} + e_{01}} \approx \frac{q}{p_2} = p_1 + \rho \sqrt{\frac{1 - p_2}{p_2}\, p_1 (1 - p_1)}\,. \tag{C1} $$

Similarly, every time an edge that is present in the first layer is not present in the second layer counts as an incorrect prediction of the model. Therefore, the number of false positives is proportional to $e_{10}$. Dividing this by the number of non-edges in the second layer yields

$$ b \approx \frac{e_{10}}{e_{10} + e_{00}} \approx \frac{p_1 - q}{1 - p_2} = p_1 - \rho \sqrt{\frac{p_2}{1 - p_2}\, p_1 (1 - p_1)}\,. \tag{C2} $$

From (C1) and (C2), it follows that

$$ \mathrm{AUC}_{\mathrm{ER}} \approx \frac{1}{2} + \frac{\rho}{2} \sqrt{p_1 (1 - p_1)} \left( \sqrt{\frac{1 - p_2}{p_2}} + \sqrt{\frac{p_2}{1 - p_2}} \right), $$

which is an affine function of ρ. When $p_1 \approx p_2$ or $p_1 \approx 1 - p_2$, as is the case in Fig. 3, we have $p_1(1 - p_1) \approx p_2(1 - p_2)$, so the product of $\sqrt{p_1(1 - p_1)}$ and the bracketed term is approximately $(1 - p_2) + p_2 = 1$. We thus obtain $\mathrm{AUC}_{\mathrm{ER}} \approx (1 + |\rho|)/2$, as desired. Using a similar argument, one can show that the same result holds (with the same assumptions on $p_1$ and $p_2$) when ρ ≤ 0.

Given that the correlated SBM curves from Fig. 4 also appear to depend linearly on ρ, we believe that it is possible to establish similar results for correlated models that incorporate mesoscale structure.

Appendix D: Data Sets

We provide brief descriptions of the multiplex networks that we analyzed in Sec. IV A. For weighted networks, we disregard edge weights when calculating layer correlations. We downloaded these networks, aside from the YouTube and P. falciparum data sets, from https://comunelab.fbk.eu/data.php.

1. CS Aarhus

This is an undirected and unweighted social network of offline and online relationships between N = 61 members of the Department of Computer Science at Aarhus University [71]. There are T = 5 layers: (1) regularly eating lunch together; (2) friendships on Facebook; (3) co-authorship; (4) leisure activities; and (5) working together.

2. Lazega Law Firm

This directed, unweighted network encompasses interactions between N = 71 partners and associates who work at the same law firm [72]. The network has T = 3 layers that encode co-work, friendship, and advice relationships.

3. YouTube

This is an undirected, weighted network of interactions between N = 15088 YouTube users [73]. There are T = 5 types of interactions: (1) direct contacts ("friendships"); (2) shared contacts; (3) shared subscriptions; (4) shared subscribers; and (5) shared favorites.

4. C. elegans Connectome

This is a directed, unweighted network of synaptic connections between N = 279 neurons of the nematode C. elegans [74]. There are T = 3 layers, which correspond to electric, chemical monadic ("MonoSyn"), and chemical polyadic ("PolySyn") junctions.

5. P. falciparum Genes

This is an undirected, unweighted network of N = 307 recombinant genes from the parasite P. falciparum, which causes malaria [75]. There are T = 9 layers that correspond to distinct highly variable regions (HVRs), in which these recombinations occur. Two genes are adjacent in a layer if they share a substring whose length is statistically significant.

6. Homo sapiens Proteins

This is a directed, unweighted network of interactions between N = 18222 proteins in Homo sapiens [87].
There are T = 7 layers, which correspond to the following types of interactions: (1) direct interactions; (2) physical associations; (3) suppressive genetic interactions; (4) association; (5) colocalization; (6) additive genetic interactions; and (7) synthetic genetic interactions. The original data are from BioGRID [76], a public database of protein interactions (for humans as well as other organisms) that is curated from different types of experiments.

7. Food and Agriculture Organization (FAO) Trade

This is a weighted, directed network of food imports and exports during the year 2010 between N = 214 countries [34]. There are T = 364 layers, which correspond to different food products.

8. European Union Air Transportation

This is an undirected, unweighted network of flights between N = 450 airports in Europe [22]. There are T = 37 layers, each of which corresponds to a different airline.

9. ArXiv Collaborations

This is an undirected, weighted coauthorship network between N = 14489 network scientists [77]. There are T = 13 layers, which correspond to different arXiv subject areas: "physics.soc-ph", "physics.data-an", "physics.bio-ph", "math-ph", "math.OC", "cond-mat.dis-nn", "cond-mat.stat-mech", "q-bio.MN", "q-bio", "q-bio.BM", "nlin.AO", "cs.SI", and "cs.CV".

[1] M. E. J. Newman, Networks, 2nd ed. (Oxford University Press, Oxford, UK, 2018).
[2] C. A. Davis, O. Varol, E. Ferrara, A. Flammini, and F. Menczer, in Proceedings of the 25th International World Wide Web Conference Companion (2016) pp. 273–274.
[3] G. A. Pagani and M. Aiello, Physica A: Statistical Mechanics and its Applications 392, 2688 (2013).
[4] R. Guimerà and M. Sales-Pardo, PLoS Computational Biology 9, e1003374 (2013).
[5] S. Fortunato and D. Hric, Physics Reports 659, 1 (2016).
[6] P. Csermely, A. London, L.-Y. Wu, and B. Uzzi, Journal of Complex Networks 1, 93 (2013).
[7] P. Rombach, M. A. Porter, J. H. Fowler, and P. J. Mucha, SIAM Review 59, 619 (2017).
[8] R. A. Rossi and N. K. Ahmed, IEEE Transactions on Knowledge and Data Engineering 27, 1112 (2015).
[9] T. P. Peixoto, in Advances in Network Clustering and Blockmodeling, edited by P. Doreian, V. Batagelj, and A. Ferligoj (Wiley, Hoboken, NJ, USA, 2020) pp. 289–332.
[10] M. Kivelä, A. Arenas, M. Barthelemy, J. P. Gleeson, Y. Moreno, and M. A. Porter, Journal of Complex Networks 2, 203 (2014).
[11] S. Boccaletti, G. Bianconi, R. Criado, C. D. Genio, J. Gómez-Gardenes, M. Romance, I. Sendina-Nadal, Z. Wang, and M. Zanin, Physics Reports 544, 1 (2014).
[12] M. A. Porter, Notices of the American Mathematical Society 65, 1419 (2018).
[13] A. Aleta and Y. Moreno, Annual Review of Condensed Matter Physics 10, 45 (2019).
[14] M. Bazzi, L. G. S. Jeub, A. Arenas, S. D. Howison, and M. A. Porter, Physical Review Research 2, 023100 (2020).
[15] C. De Bacco, E. A. Power, D. B. Larremore, and C. Moore, Physical Review E 95, 042317 (2017).
[16] N. Stanley, S. Shai, D. Taylor, and P. J. Mucha, IEEE Transactions on Network Science and Engineering 3, 95 (2016).
[17] A. Ghasemian, P. Zhang, A. Clauset, C. Moore, and L. Peel, Physical Review X 6, 031005 (2016).
[18] T. P. Peixoto and M. Rosvall, Nature Communications 8, 582 (2017).
[19] T. P. Peixoto, Physical Review E 92, 042807 (2015).
[20] T. Vallès-Català, F. A. Massucci, R. Guimerà, and M. Sales-Pardo, Physical Review X 6, 011036 (2016).
[21] D. Taylor, S. Shai, N. Stanley, and P. J. Mucha, Physical Review Letters 116, 228301 (2016).
[22] A. Cardillo, J. Gómez-Gardenes, M. Zanin, M. Romance, D. Papo, F. Del Pozo, and S. Boccaletti, Scientific Reports 3, 1344 (2013).
[23] U. Aslak, M. Rosvall, and S. Lehmann, Physical Review E 97, 062312 (2018).
[24] A. R. Pamfil, Communities in Annotated, Multilayer, and Correlated Networks, D.Phil. thesis, University of Oxford (2018).
[25] C. H. Park and M. Kahng, in IEEE/ACIS 9th International Conference on Computer and Information Science (IEEE, 2010) pp. 573–578.
[26] T. Yasseri, R. Sumi, A. Rung, A. Kornai, and J. Kertész, PloS One 7, e38869 (2012).
[27] C. Aicher, A. Z. Jacobs, and A. Clauset, Journal of Complex Networks 3, 221 (2014).
[28] M. Tarrés-Deulofeu, A. Godoy-Lorite, R. Guimerà, and M. Sales-Pardo, Physical Review E 99, 032307 (2019).
[29] D. Conte, P. Foggia, C. Sansone, and M. Vento, International Journal of Pattern Recognition and Artificial Intelligence 18, 265 (2004).
[30] V. Lyzinski, D. E. Fishkind, and C. E. Priebe, Journal of Machine Learning Research 15, 3513 (2014).
[31] V. Lyzinski, D. L. Sussman, D. E. Fishkind, H. Pao, L. Chen, J. T. Vogelstein, Y. Park, and C. E. Priebe, Parallel Computing 47, 70 (2015).
[32] V. Lyzinski and D. L. Sussman, Information and Inference: A Journal of the IMA, iaz031 (2020).
[33] B. Karrer and M. E. J. Newman, Physical Review E 83, 016107 (2011).
[34] M. De Domenico, V. Nicosia, A. Arenas, and V. Latora, Nature Communications 6, 6864 (2015).
[35] J. Iacovacci, Z. Wu, and G. Bianconi, Physical Review E 92, 042806 (2015).
[36] V. Nicosia and V. Latora, Physical Review E 92, 032805 (2015).
[37] F. Battiston, V. Nicosia, and V. Latora, Physical Review E 89, 032804 (2014).
[38] T.-C. Kao and M. A. Porter, Journal of Statistical Physics 173, 1286 (2018).
[39] M. De Domenico and J. Biamonte, Physical Review X 6, 041062 (2016).
[40] In fact, previous algorithms for SBM inference [15, 16, 27, 82] have suggested that one can formulate a full inference algorithm using an iterative approach that alternates between estimating a multilayer network's mesoscale structure and estimating its edge probabilities and correlations. Therefore, the latter task (which is the focus of the present paper) constitutes one half of an algorithm for correlated-community detection.
[41] A. R. Pamfil, S. D. Howison, R. Lambiotte, and M. A. Porter, SIAM Journal on Mathematics of Data Science 1, 667 (2019).
[42] K.-M. Lee, J. Y. Kim, W.-k. Cho, K.-I. Goh, and I. Kim, New Journal of Physics 14, 033027 (2012).
[43] B. Min, S. Do Yi, K.-M. Lee, and K.-I. Goh, Physical Review E 89, 042811 (2014).
[44] V. Marceau, P.-A. Noël, L. Hébert-Dufresne, A. Allard, and L. J. Dubé, Physical Review E 84, 026105 (2011).
[45] S. Funk and V. A. Jansen, Physical Review E 81, 036118 (2010).
[46] J. Y. Kim and K.-I. Goh, Physical Review Letters 111, 058702 (2013).
[47] P. Barucca, F. Lillo, P. Mazzarisi, and D. Tantari, Journal of Statistical Mechanics: Theory and Experiment 2018, 123407 (2018).
[48] Y. Zhao, E. Levina, and J. Zhu, The Annals of Statistics 40, 2266 (2012).
[49] J.-G. Young, G. St-Onge, P. Desrosiers, and L. J. Dubé, Physical Review E 98, 032309 (2018).
[50] P. O. Perry and P. J. Wolfe, arXiv preprint arXiv:1201.5871 (2012).
[51] The Hessian is negative definite at the critical point that one obtains by setting these derivatives to 0, so this point is a local maximum (as opposed to a local minimum or saddle point) of the log-likelihood function.
[52] One desideratum of a correlated degree-corrected model is that it satisfies the inequality $P(A^1_{ij} = 1, A^2_{ij} = 1) \le P(A^1_{ij} = 1)$ for all node pairs (i, j). This requirement rules out some possibilities for $P(A^1_{ij} = 1, A^2_{ij} = 1)$, for instance one where this quantity has no dependence on node degrees.
[53] Situations when one or more of the denominators in Eqn. (31) are $O(\varepsilon)$ also require separate derivations. However, we do not treat these exceptional cases in the present paper.
[54] L. Lü and T. Zhou, Physica A 390, 1150 (2011).
[55] R. Guimerà and M. Sales-Pardo, Proceedings of the National Academy of Sciences of the United States of America 106, 22073 (2009).
[56] A. Clauset, C. Moore, and M. E. J. Newman, Nature 453, 98 (2008).
[57] M. Pujari and R. Kanawati, Networks & Heterogeneous Media 10, 17 (2015).
[58] M. Jalili, Y. Orouskhani, M. Asgari, N. Alipourfard, and M. Perc, Royal Society Open Science 4, 160863 (2017).
[59] D. Hristova, A. Noulas, C. Brown, M. Musolesi, and C. Mascolo, EPJ Data Science 5, 24 (2016).
[60] H. Mandal, M. Mirchev, S. Gramatikov, and I. Mishkovski, in 2018 26th Telecommunications Forum (TELFOR) (IEEE, 2018) pp. 1–4.
[61] K.-K. Kleineberg, M. Boguñá, M. Á. Serrano, and F. Papadopoulos, Nature Physics 12, 1076 (2016).
[62] R. Matsuno and T. Murata, in Companion Proceedings of The Web Conference (2018) pp. 1261–1268.
[63] L. Pio-Lopez, A. Valdeolivas, L. Tichit, É. Remy, and A. Baudot, arXiv preprint arXiv:2008.10085 (2020).
[64] R. E. Tillman, V. K. Potluru, J. Chen, P. Reddy, and M. Veloso, arXiv preprint arXiv:2004.04704 (2020).
[65] M. Coscia and M. Szell, arXiv preprint (2020).
[66] B. K. Fosdick, D. B. Larremore, J. Nishimura, and J. Ugander, SIAM Review 60, 315 (2018).
[67] Note that this K is different from the one that denotes the number of blocks elsewhere in this paper.
[68] A. Ghasemian, H. Hosseinmardi, A. Galstyan, E. M. Airoldi, and A. Clauset, Proceedings of the National Academy of Sciences of the United States of America (2019), available at https://doi.org/10.1073/pnas.1914950117.
[69] L. G. S. Jeub and M. Bazzi, A generative model for mesoscale structure in multilayer networks implemented in MATLAB, https://github.com/MultilayerGM/MultilayerGM-MATLAB (2016–2019), version 1.0.
[70] The case ρ ≤ 0 requires a different way of sampling normalized degrees $\theta^2_i$ of nodes in the second layer to be able to take ρ all the way to ρ = −1. More precisely, when ρ is close to −1, the second layer is dense if the first layer is sparse (and vice versa), so setting $\theta^1 = \theta^2$ is inadequate.
[71] M. Magnani, B. Micenkova, and L. Rossi, arXiv preprint arXiv:1303.4986 (2013).
[72] E. Lazega, The Collegial Phenomenon: The Social Mechanisms of Cooperation Among Peers in a Corporate Law Partnership (Oxford University Press, Oxford, UK, 2001).
[73] L. Tang, X. Wang, and H. Liu, in 2009 Ninth IEEE International Conference on Data Mining (IEEE, 2009) pp. 503–512.
[74] B. L. Chen, D. H. Hall, and D. B. Chklovskii, Proceedings of the National Academy of Sciences of the United States of America 103, 4723 (2006).
[75] D. B. Larremore, A. Clauset, and C. O. Buckee, PLoS Computational Biology 9, e1003268 (2013).
[76] C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers, Nucleic Acids Research 34, D535 (2006).
[77] M. De Domenico, A. Lancichinetti, A. Arenas, and M. Rosvall, Physical Review X 5, 011027 (2015).
[78] The assumption that communities are identical across layers is a reasonable one for this data set. When performing community detection without this restriction, we find that more than 90% of nodes stay in the same community across temporal layers. See [24] for details.
[79] P. J. Mucha, T. Richardson, K. Macon, M. A. Porter, and J.-P. Onnela, Science 328, 876 (2010).
[80] L. G. S. Jeub, M. Bazzi, I. S. Jutla, and P. J. Mucha, A generalized Louvain method for community detection implemented in MATLAB, http://netwiki.amath.unc.edu/GenLouvain (2011–2018), version 2.1.
[81] We use GenLouvain [80] to perform multilayer modularity maximization. This algorithm requires the specification of (at least) two parameters [41, 79], which we set to γ = 1.2 and ω = 5.0. The large value of ω ensures that community assignments are identical across layers.
[82] M. E. J. Newman and A. Clauset, Nature Communications 7 (2016).
[83] J. Davis and M. Goadrich, in Proceedings of the 23rd International Conference on Machine Learning (2006) pp. 233–240.
[84] M. Bazzi, M. A. Porter, S. Williams, M. McDonald, D. J. Fenn, and S. D. Howison, Multiscale Modeling & Simulation: A SIAM Interdisciplinary Journal 14, 1 (2016).
[85] T. J. Sweeting, The Annals of Statistics, 1375 (1980).
[86] It is not exactly equal, because with K-fold cross-validation, we are performing a prediction on only 1/K of the data at a time.
[87] M. De Domenico, M. A. Porter, and A. Arenas, Journal of Complex Networks 3, 159 (2015).
