Consistent estimation of dynamic and multi-layer block models



Qiuyi Han (1), Kevin S. Xu (2), and Edoardo M. Airoldi (1)

(1) Department of Statistics, Harvard University, Cambridge, MA, USA; qiuyihan@fas.harvard.edu, airoldi@fas.harvard.edu
(2) Technicolor Research, Los Altos, CA, USA; kevinxu@outlook.com

May 20, 2015

Abstract

Significant progress has been made recently on theoretical analysis of estimators for the stochastic block model (SBM). In this paper, we consider the multi-graph SBM, which serves as a foundation for many application settings including dynamic and multi-layer networks. We explore the asymptotic properties of two estimators for the multi-graph SBM, namely spectral clustering and the maximum-likelihood estimate (MLE), as the number of layers of the multi-graph increases. We derive sufficient conditions for consistency of both estimators and propose a variational approximation to the MLE that is computationally feasible for large networks. We verify the sufficient conditions via simulation and demonstrate that they are practical. In addition, we apply the model to two real data sets: a dynamic social network and a multi-layer social network with several types of relations.

1 Introduction

Modeling relational data arising from networks, including social, biological, and information networks, has received much attention recently. Various probabilistic models for networks have been proposed, including stochastic block models and their mixed-membership variants (Airoldi et al., 2008; Goldenberg et al., 2010). However, in many settings we observe not a single network but a collection of networks over a common set of nodes, which is often referred to as a multi-graph.
Multi-graphs arise in several types of settings, including dynamic networks with time-evolving edges, such as time-stamped social networks of interactions between people, and multi-layer networks, where edges are measured in multiple ways, such as phone calls, text messages, e-mails, and face-to-face contacts. A significant challenge with multi-graphs is to extract the common information across the layers in a concise representation while remaining flexible enough to allow differences across layers.

Motivated by the above examples, we consider the multi-graph stochastic block model first proposed by Holland et al. (1983), which divides nodes into classes that define blocks in the multi-graph. The key assumption is that nodes share the same block structure over the multiple layers, but the class connection probabilities may vary across layers. We believe this model is a flexible and principled way of analyzing multi-graphs and provides a strong foundation for many applications.

The special case of a single layer, often referred to simply as the stochastic block model (SBM), has been studied extensively in recent years (Bickel and Chen, 2009; Rohe et al., 2011; Choi et al., 2012; Celisse et al., 2012; Jin, 2012; Bickel et al., 2013; Amini et al., 2013). However, the more general multi-graph case has received far less attention.

In this paper, we explore the asymptotic properties of several estimators for the multi-graph SBM by letting the number of network layers grow while keeping the number of nodes fixed. We prove that a spectral clustering estimate of the class memberships is consistent for a special case of the model (Section 4.1). Next we derive sufficient conditions under which the maximum-likelihood estimate (MLE) of the class memberships is consistent in the general case (Section 4.2).
Finally, we propose a variational approximation to the MLE that is computationally tractable and applicable to many multi-graph settings, including dynamic and multi-layer networks (Section 4.2.1). We apply the spectral and variational approximation methods to several simulated and real data sets, including both a dynamic social network and a social network with multiple types of relations between people (Section 5). Our main contribution is the consistency analysis for the MLE, which ensures the tractability of the model and paves the way for more sophisticated models and inference techniques. To the best of our knowledge, we provide the first theoretical results for the multi-graph SBM with a growing number of layers.

2 Related work

Probabilistic models for networks have been studied for several decades; many commonly used models are discussed in the survey by Goldenberg et al. (2010). More recent work includes non-parametric network models using graphons (Airoldi et al., 2013; Wolfe and Olhede, 2013; Gao et al., 2014). Most previous models assume that a single network, rather than a multi-graph, is observed.

Two settings where multi-graphs arise are dynamic and multi-layer networks. Dynamic network models typically assume that a sequence of network snapshots is observed at discrete time steps. Previous work on dynamic network models has built upon models for a single network augmented with Markovian dynamics. Ahmed and Xing (2009) and Hanneke et al. (2010) built upon exponential random graph models. Ishiguro et al. (2010), Yang et al. (2011), Ho et al. (2011), Xu and Hero (2014), and Xu (2015) built on stochastic block models. Sarkar and Moore (2005), Sarkar et al. (2007), and Durante and Dunson (2014) used latent space models. Foulds et al. (2011), Heaukulani and Ghahramani (2013), and Kim and Leskovec (2013) used latent feature models.
Multi-layer networks consider multiple types of connections simultaneously. For example, Facebook users interact by using "likes", comments, messages, and other means. Multi-layer networks go by many other names, including multi-relational, multi-dimensional, multi-view, and multiplex networks. The analysis of multi-layer networks has a long history (Holland et al., 1983; Fienberg et al., 1985; Szell et al., 2010; Mucha et al., 2010; Magnani and Rossi, 2011; Oselio et al., 2014). However, there has not been much work on probabilistic modeling of such networks, aside from the multi-view latent space model proposed by Salter-Townshend and McCormick (2013), which couples the latent spaces of the multiple layers.

A third related setting involves modeling populations of networks, where each observation consists of a network snapshot drawn from a probability mass function over a network-valued sample space. Durante et al. (2014) proposed a nonparametric Bayesian model for this setting. This setting differs from the multi-graph setting that we consider in this paper because the network snapshots (layers) are drawn in an independent and identically distributed (iid) fashion, with no coupling between the snapshots.

The statistical properties of the inference algorithms in both dynamic and multi-layer network models have typically not been studied. Recently there has been substantial progress on consistency analysis for single networks. Maximum-likelihood estimation, its variational approximation, and spectral clustering have all been proven to be consistent under the stochastic block model (Bickel and Chen, 2009; Rohe et al., 2011; Choi et al., 2012; Celisse et al., 2012; Zhao et al., 2012; Jin, 2012; Bickel et al., 2013; Lei and Rinaldo, 2014; Yang et al., 2014) as the number of nodes N → ∞.
Intuitively, for each new node added to the graph we observe N additional realizations, so larger N provides more information, leading to consistent estimation of the model.

We extend the ideas used in single networks to multi-graphs, noting that the asymptotic regime is different in this case. For a single network, one typically lets N → ∞, while for multi-graphs we let T → ∞ with N fixed. Intuitively, this means we do not need to observe a very large network to get a correct understanding of the structure. Instead, we can gain the information through multiple samples, which may represent, for example, multiple observations over time or multiple relationships. In practice, it may be more realistic to allow N to grow along with T, particularly in the dynamic network setting. Allowing N to grow provides more information; thus our analysis with fixed N serves as a conservative analysis for different settings.

3 Multi-graph stochastic block model

We present an overview of the multi-graph stochastic block model first proposed by Holland et al. (1983). A single relation is represented by an adjacency matrix G^t = (G^t_ij), i, j = 1, ..., N. We focus on symmetric binary relations with no self-edges. For a multi-graph, we observe an adjacency array G⃗ = {G^1, G^2, ..., G^T} sharing the same set of nodes. Subscripts denote the same node pairs for any t, while the superscript t indexes layers of the multi-graph. A layer may refer to time or type of relation depending on the application. If G⃗ is a random adjacency array for N nodes and T relations, then the probability distribution of G⃗ is called a stochastic multi-graph. Let the edge G^t_ij be a Bernoulli random variable with success probability Φ^t_ij; Φ^t = (Φ^t_ij) ∈ [0, 1]^{N×N} is the probability matrix of graph G^t. Let Φ⃗ = {Φ^1, Φ^2, ..., Φ^T} be the probability array.
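In array terms, a stochastic multi-graph is a T × N × N binary array sampled entrywise from the probability array. A toy sketch of this sampling (our own illustration; the Φ values are arbitrary placeholders with no block structure):

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 3, 8                                    # illustrative sizes
Phi = rng.uniform(0.1, 0.9, size=(T, N, N))    # probability array (toy values)
Phi = (Phi + Phi.transpose(0, 2, 1)) / 2       # symmetric relations

G = np.zeros((T, N, N), dtype=int)             # adjacency array
for t in range(T):
    # G^t_ij ~ Bernoulli(Phi^t_ij) for i < j; no self-edges
    upper = np.triu(rng.random((N, N)) < Phi[t], k=1)
    G[t] = upper + upper.T                     # symmetric, zero diagonal
```

Each layer G[t] is a symmetric 0/1 matrix with a zero diagonal, matching the symmetric, self-edge-free relations assumed above.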
We assume independence of edges within and across layers conditioned on the probability array. That is, the adjacency array is generated according to

  G^t_ij | Φ⃗ ∼ Bern(Φ^t_ij), independently over i, j, and t.

The multi-graph stochastic block model is a special case of a stochastic multi-graph. In the multi-graph SBM, networks are generated in the following manner. First, each node is assigned to a class according to π = (π_1, ..., π_K), where π_k is the probability that a node is assigned to class k. Then, given that nodes i and j are in classes k and l, respectively, an edge between i and j in network layer t is generated with probability P^t_kl. In other words, node pairs with the same class memberships in the same layer have the same connection probability, governed by P⃗ = {P^1, P^2, ..., P^T} ∈ [0, 1]^{K×K}, the class connection probability array. Let c_i ∈ {1, ..., K} denote the class label of node i; then Φ^t_ij = P^t_{c_i c_j}. The nodes have class labels c⃗ shared by all of the layers of the multi-graph, but in each layer the class connection probabilities P^t_kl may differ. As we consider undirected networks, P^t is a symmetric matrix with K(K + 1)/2 free parameters. One can see that the (single network) SBM is a special case of the multi-graph SBM with T = 1.

Though simple, this multi-graph model has not been formally studied in the a posteriori setting where class labels are estimated. It serves as a basis for many settings, including dynamic networks and networks with multiple relations. More importantly, it can be theoretically analyzed and can provide insight on more complex models.

4 Consistent estimation for the multi-graph stochastic block model

Holland et al. (1983) only discussed estimation of the multi-graph SBM with blocks specified a priori.
In that setting, the sample proportion within each layer t is the maximum-likelihood estimate (MLE) of the class connection probability matrix P^t. However, in most applications the block structure is unknown, so our main goal is to accurately estimate the class memberships. We extend several inference techniques used for the single network SBM to the multi-layer case.

It is not immediately straightforward how to utilize inference techniques designed for the single network SBM. One may imagine inferring c⃗ independently from each network and averaging across them, e.g., by majority voting, where each node is assigned the class label that occurs most often. We find in simulations that this ad-hoc method often does not work well. We propose spectral clustering on the mean graph as a motivating method for a special case of the model. Then we discuss maximum-likelihood estimation, a natural way to combine the information contained in the different layers, for the general case. Maximum-likelihood estimation is intractable for large networks, so we also consider a variational approximation to the MLE.

Our main focus is on the consistency properties of these methods. We consider a fixed number of nodes N but let the number of graph layers T → ∞. Although in reality we never have infinitely many layers, we often encounter situations with a large number of layers, such as dynamic networks over long periods of time.

4.1 Consistency of spectral clustering

Spectral clustering is a popular choice for estimating the block structure of the SBM because it scales to large networks and has been shown to be consistent as N → ∞ (Sussman et al., 2012). The method is based on singular value decomposition followed by K-means clustering on the singular vectors. A natural way to extend spectral clustering from single networks to multi-graphs is to apply spectral clustering to the mean graph

  Ḡ = (1/T) Σ_{t=1}^{T} G^t.
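A minimal sketch of this estimator (our own illustration, not the authors' code): simulate a two-class multi-graph SBM whose layer probabilities P^t fluctuate around a fixed mean M, form the mean graph, and split the nodes by the sign of the second leading eigenvector of Ḡ, which for K = 2 plays the role of K-means on the leading singular vectors. All sizes and probabilities are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, K = 60, 200, 2
c = np.repeat([0, 1], N // 2)                  # true class labels: two balanced classes
M = np.array([[0.6, 0.3],
              [0.3, 0.6]])                     # mean class connection probabilities

G_bar = np.zeros((N, N))
for _ in range(T):
    P_t = M + rng.uniform(-0.1, 0.1, size=(K, K))     # layer-varying P^t with E[P^t] = M
    P_t = (P_t + P_t.T) / 2                           # keep each P^t symmetric
    Phi_t = P_t[np.ix_(c, c)]                         # per-pair edge probabilities
    upper = np.triu(rng.random((N, N)) < Phi_t, k=1)  # sample upper triangle, no self-edges
    G_bar += upper + upper.T
G_bar /= T                                            # mean graph

vals, vecs = np.linalg.eigh(G_bar)
u2 = vecs[:, np.argsort(np.abs(vals))[-2]]     # eigenvector with 2nd-largest |eigenvalue|
c_hat = (u2 > 0).astype(int)                   # sign split: K-means surrogate for K = 2

acc = max(np.mean(c_hat == c), 1 - np.mean(c_hat == c))  # accuracy up to label swap
print(f"recovery accuracy: {acc:.2f}")
```

Averaging over many layers drives Ḡ toward the block-constant matrix CMC′, so the sign split recovers the labels here; with fewer layers the accuracy degrades, which is the regime Theorem 1 addresses.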
This method is intuitively appealing because it matches the assumption of a single set of class labels shared by all of the layers. We show that, under some stationarity and ergodicity conditions, it indeed provides a consistent estimate of the class assignments. Specifically, we consider the case where the class connection probabilities P^t vary across layers but have the same mean M. The following theorem shows the consistency of spectral clustering on the mean graph Ḡ if the mean M is identifiable.

Theorem 1. Assume P⃗ follows a stationary ergodic process such that E(P^t_kl) = μ_kl and Var(P^t_kl) = ε²_kl for all t. Assume M = [μ_kl] is identifiable, i.e., M has no identical rows. Let Ḡ = (1/T) Σ_{t=1}^{T} G^t. Spectral clustering of Ḡ gives accurate labels as T → ∞. That is, let U_{N×K} be the first K right singular vectors in the singular value decomposition of Ḡ. K-means clustering on the rows of U_{N×K} outputs class estimates ĉ_1, ..., ĉ_N. Up to permutation, ĉ = c, a.s. as T → ∞.

We provide a sketch of the proof; details can be found in Appendix A. Since we have independent errors in the probability matrix and also independent errors in the Bernoulli observations, averaging cancels the error, so that Ḡ → CMC′. Here C is a rank-K matrix encoding the class assignment vectors. Using an inequality from Oliveira (2009), we bound the distance between the singular vectors of Ḡ and CMC′. Therefore, spectral clustering on Ḡ clusters the nodes into K different classes.

Remark 1. Determining the number of classes is a difficult model selection problem even for a single network. We will not discuss this problem in detail; we assume K is fixed and known in this paper.

Remark 2. The diagonal of G^t is always 0 because no self-edges are allowed; however, the diagonal of CMC′ is not necessarily 0. This may not cause a problem as N → ∞.
But for finite N, it may cause error in estimating the singular vectors. If this is the case, we may instead use the singular value decomposition that minimizes the off-diagonal mean squared error,

  argmin_{U,S} Σ_{i<j} ( Ḡ_ij − (U S U′)_ij )².

4.2 Consistency of maximum-likelihood estimation

Spectral clustering on the mean graph is consistent only in the special case where the P^t share a common mean; maximum-likelihood estimation applies to the general model. For a candidate label assignment z, let n_k(z) denote the number of nodes assigned to class k under z, and let f_t(z) denote the profile log-likelihood of layer t, i.e., the log-likelihood of G^t under labels z with the class connection probabilities replaced by their maximum-likelihood estimates. Let g(z) and h(z) denote the expectations of the profile and complete log-likelihoods of a single layer, and write σ(p) = p log(p) + (1 − p) log(1 − p). The profile MLE of the class labels is ĉ = argmax_z Σ_t f_t(z). The following theorem gives sufficient conditions under which it is consistent.

Theorem 2. Assume C₀ ≤ P^t_kl ≤ 1 − C₀ for some C₀ > 0, and that δ > 0, where δ is the likelihood gap defined in Lemma 1 below. If m(c) = min_k n_k(c) is sufficiently large, then

  ĉ = argmax_z Σ_t f_t(z) → c, a.s. as T → ∞.

The idea is that Σ_t f_t(z) is a sum of independent profile log-likelihoods. We need N to be sufficiently large so that the expectation of the profile log-likelihood at each layer is maximized at the true labels c. Then, as T → ∞, Σ_t f_t(z) converges to its expectation. We formalize these ideas by establishing the following lemmas.

Lemma 1 (from Choi et al. (2012)). For any label assignment z, let r(z) count the number of nodes whose true class assignments under c are not in the majority within their respective class assignments under z. Let

  δ = min_{k,l} max_m { σ(P_km) + σ(P_lm) − 2σ( (P_km + P_lm)/2 ) }.

Then the expectation of the log-likelihood, h, is maximized by h(c), and

  h(c) − h(z) ≥ (r(z)/2) δ min_k n_k(c).

In particular, for all z ≠ c,

  h(c) − h(z) ≥ (1/2) δ min_k n_k(c).

Lemma 1 shows that the expectation of the log-likelihood is maximized at the true labels, and that its value there exceeds that of any other candidate by a gap that depends on the column differences of the probability matrix. However, since we work with the profile log-likelihood, we establish Lemmas 2 and 3 to bound the difference between the expectations of the profile log-likelihood and the complete log-likelihood.

Lemma 2. Let x ∼ (1/N) Bin(N, p). For p ∈ (0, 1),

  E(σ(x)) = σ(p) + 1/(2N) + O(1/N²), as N → ∞.

Lemma 3. Assume C₀ ≤ P_kl ≤ 1 − C₀ for some C₀ > 0.
For any δ₀ > 0 and any z, if min_k n_k(z) is large enough, then the difference between the expectation of the profile log-likelihood g(z) and the expectation of the complete log-likelihood h(z) is bounded in the following manner:

  | g(z) − h(z) − K(K + 1)/4 | ≤ δ₀.

Lemma 2 utilizes a Taylor series expansion. For simplicity, we use big-O(·) notation instead of specifying an actual bound; readers can refer to Appendix B for the bound and the constants in the bound. Lemma 3 uses Lemma 2 and shows that, with a sufficiently large number of nodes, the difference between the expectations of the profile and complete log-likelihoods is K(K + 1)/4 up to a negligible term δ₀. Combining the lemmas with a concentration inequality establishes Theorem 2, which provides sufficient conditions for consistency of the multi-graph SBM. The proofs of Lemmas 1-3 and Theorem 2 can be found in Appendix B.

Remark 3. The main difference between the N → ∞ case considered in most previous work and the T → ∞ case that we consider is that, for N → ∞, a direct bound is put on f and ℓ, whereas for T → ∞ we need only to bound the expectations of f and ℓ. This is newly studied here. In other words, for a particular class connection probability matrix P, the number of nodes required in a single network to obtain an accurate estimate is much larger than what is needed in a multi-graph with a growing number of layers.

4.2.1 Variational approximation

The MLE is computationally infeasible for large networks because the number of candidate class assignments grows exponentially with the number of nodes. To overcome this computational burden, a variational approximation, which replaces the joint distribution with independent marginal distributions, can be used to approximate the MLE. Daudin et al. (2008) provide a detailed discussion of variational approximation in the SBM. We adapt it to the multi-graph SBM, resulting in the following update equations:

  b_ik ∝ π_k ∏_{j≠i} ∏_t ∏_l [ (P^t_kl)^{g^t_ij} (1 − P^t_kl)^{1 − g^t_ij} ]^{b_jl},

  π_k ∝ Σ_i b_ik,

  P^t_kl = Σ_{i≠j} b_ik b_jl g^t_ij / Σ_{i≠j} b_ik b_jl,

where the b_ik denote the variational parameters. The derivation is straightforward; we provide details in Appendix D. Variational approximation has been shown to be consistent in the SBM (Celisse et al., 2012; Bickel et al., 2013). We conjecture that the performance of variational approximation is also good in the multi-graph SBM. Unless otherwise specified, we use variational approximation in place of the MLE in all experiments.

5 Experiments

5.1 Numerical illustration

We begin with a toy example in which we investigate empirically how many nodes are needed for the profile MLE to correctly recover the classes as T → ∞. Due to the computational intractability of computing the exact profile MLE, we consider a very small network with N = 16 nodes and K = 2 classes, where each class has 8 nodes. Consider two multi-graph SBMs with the following probability matrices:

  Case 1:  P^t ≡ [ 0.55  0.45 ]
                 [ 0.45  0.55 ]

  Case 2:  P^t ≡ [ 0.51  0.49 ]
                 [ 0.49  0.51 ]

The δ (defined in Theorem 2) corresponding to the row difference of P^t is much smaller in case 2. Empirically, the profile MLE succeeds in recovering the true labels in case 1 but fails in case 2. Further analysis shows that, to have consistency given the class connection probability matrix P^t of case 2, the total number of nodes should be at least 40. This toy example demonstrates that conditions on the probability matrices and network size are necessary for consistency; Theorem 2 provides sufficient conditions.

Next we investigate the tightness of the conditions in Theorem 2. The tightness of Lemma 1 was studied by Choi et al. (2012); here we check the tightness of Lemma 2. For different p, we can calculate the exact value of N(E(σ(x)) − σ(p)) − 1/2 and compare it to the bound from Lemma 2. Figure 1 shows that the bound is loose for small N but has almost the same asymptotic decay as the exact calculation. For small N, the remainder in the Taylor expansion causes the deviation. The bounds are also looser for p closer to 0 or 1, but still informative in most cases.

[Figure 1: Comparison of the bound in Lemma 2 to the exact values of N(E(σ(x)) − σ(p)) − 1/2 for varying N and p, with panels (a) p = 0.1, (b) p = 0.25, and (c) p = 0.4. The tightness of the bound affects the minimum number of nodes required to guarantee consistency in Theorem 2.]

Table 1: Minimum number of nodes N required for consistency of the profile MLE with K = 2 classes under different values of the parameters C₀ and δ from Theorem 2.

  δ \ C₀     0.3    0.25    0.2    0.15    0.1    0.05
  0.165       42      50     64      88    124     184
  0.091       44      52     66      92    142     234
  0.040       46      56     70      94    148     314
  0.010       66      68     74     100    156     330
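The exact values plotted in Figure 1 are straightforward to recompute: E(σ(x)) for x = Bin(N, p)/N is a finite sum over the binomial pmf, and N(E(σ(x)) − σ(p)) should approach the leading correction 1/2 from Lemma 2. A sketch (our own check; the grid of N and p values is illustrative):

```python
import numpy as np
from scipy.stats import binom
from scipy.special import xlogy

def sigma(q):
    """sigma(q) = q log q + (1 - q) log(1 - q), with 0 log 0 = 0."""
    return xlogy(q, q) + xlogy(1 - q, 1 - q)

def scaled_gap(N, p):
    """N * (E[sigma(x)] - sigma(p)) for x = Bin(N, p) / N, computed exactly."""
    k = np.arange(N + 1)
    E_sigma = np.sum(binom.pmf(k, N, p) * sigma(k / N))
    return N * (E_sigma - sigma(p))

for N in (25, 100, 400):
    print(N, scaled_gap(N, 0.25))   # approaches 1/2 as N grows
```

Subtracting 1/2 from `scaled_gap` gives exactly the quantity on the vertical axis of Figure 1, which decays toward zero as N increases.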
For the special case of K = 2 classes, we can calculate all of the constants in the sufficient conditions of Theorem 2 for different values of C₀ and δ by enumerating cases; details are provided in Appendix C. Table 1 shows the smallest number of nodes N that is sufficient for consistency of the profile MLE to hold for different values of C₀ and δ. Note that the minimum N is in the tens or hundreds, suggesting that the bounds in Theorem 2 are not overly loose and are indeed of practical significance.

5.2 Comparison with majority voting

As previously mentioned, majority voting is another way to apply single-network inference methods to multi-graphs. We consider two majority vote methods as baselines for comparison: one that applies spectral clustering to each layer, and one that applies a variational approximation to each layer. When using majority voting between different layers of the network, the estimated class labels for each layer must first be aligned or matched. We utilize the Hungarian algorithm (Kuhn, 1955) to compute the maximum agreement matching between the estimated labels at layer t and the majority vote up to layer t − 1.

We conduct simulations to compare our proposed methods of spectral clustering on the mean graph and profile maximum-likelihood estimation with the majority vote baselines. We consider a well-studied scenario with 128 nodes initialized randomly into 4 classes (Newman and Girvan, 2004). For each layer, the within-class connection probability is 0.0968, and the between-class connection probability is 0.0521. Under these connection probabilities, the classes are below the detectability limit (Decelle et al., 2011) for a single layer, so the class estimation accuracy from a single layer is very low. We increase the number of layers and observe how the accuracy changes.
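The label-alignment step for the majority-vote baselines can be sketched with SciPy's assignment solver (a sketch, not the authors' code; `align_labels` is a name we introduce here):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_labels(ref, new, K):
    """Permute the labels in `new` to maximally agree with `ref`
    (maximum agreement matching via the Hungarian algorithm)."""
    ref, new = np.asarray(ref), np.asarray(new)
    agreement = np.zeros((K, K), dtype=int)
    for k in range(K):
        for l in range(K):
            agreement[k, l] = np.sum((new == k) & (ref == l))
    rows, cols = linear_sum_assignment(-agreement)   # maximize total agreement
    mapping = dict(zip(rows, cols))
    return np.array([mapping[k] for k in new])

# A layer whose labels are a pure relabeling of the reference is recovered exactly:
ref = np.array([0, 0, 1, 1, 2, 2])
new = np.array([2, 2, 0, 0, 1, 1])
print(align_labels(ref, new, K=3))   # -> [0 0 1 1 2 2]
```

In the majority-vote pipeline, `ref` would be the running majority vote up to layer t − 1 and `new` the labels estimated from layer t.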
Figure 2 shows the accuracy of the two proposed methods compared to the two majority voting methods, averaged over 100 replications.

[Figure 2: Simulation experiment comparing the proposed methods of profile MLE and spectral clustering on the mean graph with the two majority vote baselines (class estimation accuracy versus number of layers; methods: Profile MLE, Spectral Mean, Vote (Spectral), Vote (Variational)). The proposed methods increase in accuracy as the number of layers increases, but the two heuristic methods based on majority vote do not.]

Both the profile MLE and spectral clustering on the mean graph show the anticipated increase in accuracy over time, but the accuracies of the two heuristic majority vote methods do not improve. Though one may expect the errors in majority voting to cancel out over time, these results show that, without careful averaging of errors, we cannot gain from the multiple layers. We find that this is due to choosing connection probabilities below the detectability limit; if we make the estimation problem easier by increasing the within-class probability above the detectability limit, then the majority vote methods do improve with increasing layers, albeit much more slowly than the methods we propose in this paper.

5.3 MIT Reality Mining data

Next we apply our model to the MIT Reality Mining data set (Eagle and Pentland, 2006). This data set comprises 93 students and staff at MIT in the 2004-2005 school year, during which time their cell phone activities were recorded.

[Figure 3: Estimates of the class connection probabilities in the Reality Mining data set (weekly Sloan-Sloan, Staff-Staff, and Sloan-Staff probabilities). The probabilities vary significantly over time, particularly for edges between Sloan students.]
We construct dynamic networks based on physical proximity, which was measured using scans for nearby Bluetooth devices at 5-minute intervals. We exclude data near the beginning and end of the experiment, where participation was low, and discretize time into 1-week intervals, similar to Mutlu and Aviyente (2012) and Xu et al. (2014), resulting in 39 time steps between August 2004 and May 2005.

We treat the affiliations of the participants as ground-truth class labels and test our proposed methods. There are two communities: one of 26 Sloan business school students, and one of 67 staff working in the same building. Since degree heterogeneity may cause problems in detecting communities using the SBM (Karrer and Newman, 2011), we reduce its impact by connecting each participant to the 5 other participants with whom they spent the most time in physical proximity during each week. Figure 3 shows the empirical block connection probabilities within and between the two classes, estimated by the profile MLE. The class connection probabilities vary significantly over time, which validates the importance of the varying class connection probability assumption in our model. Notice that the two communities become well-separated around week 8.

The class estimation accuracies for the different methods are shown in Table 2.

Table 2: Class estimation accuracy in the Reality Mining data set given data up to the week listed in the first column. The best performer in each row is listed in bold. Both the proposed spectral clustering on the mean graph and profile maximum-likelihood estimation approaches improve over time, but majority vote does not.

  Week   Maj. vote   Spectral Mean   Profile MLE
  10        0.76         0.62            0.57
  15        0.82         0.94            0.95
  20        0.83         0.95            0.98
  25        0.78         0.95            0.99
  30        0.80         0.97            0.99
  35        0.80         0.97            0.99
  End       0.77         0.97            0.99
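The degree-limiting construction described above (connect each participant to the 5 others with the most proximity time each week) can be sketched as follows; `prox` here is a hypothetical weekly matrix of proximity durations, not the actual Reality Mining data:

```python
import numpy as np

def top_k_graph(prox, k=5):
    """Binary symmetric adjacency: link each node to the k others with the
    largest proximity values, then symmetrize by taking the union."""
    prox = prox.astype(float).copy()
    np.fill_diagonal(prox, -np.inf)          # never link a node to itself
    N = prox.shape[0]
    A = np.zeros((N, N), dtype=int)
    for i in range(N):
        A[i, np.argsort(prox[i])[-k:]] = 1   # indices of the k largest entries in row i
    return np.maximum(A, A.T)                # undirected: union of directed choices

rng = np.random.default_rng(1)
prox = rng.random((10, 10))
prox = (prox + prox.T) / 2                   # toy symmetric proximity durations
A = top_k_graph(prox, k=3)
```

Because every row selects k neighbors before symmetrizing, each node ends up with degree at least k, which caps the degree heterogeneity that can confound the SBM.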
Since the community structure only becomes clear at around week 8, the spectral and profile MLE methods are initially worse than majority voting, but they quickly improve and remain superior over the remainder of the data trace. By combining information across time, the proposed methods successfully reveal the community structure, while majority voting continues to misestimate the classes of about 20% of the people.

5.4 AU-CS multi-layer network data

We examine another example from a multi-layer network comprising five kinds of self-reported on-line and off-line relationships between the employees of a research department: Facebook, leisure, work, co-authorship, and lunch (AU-CS ML). We assume the class structure to be invariant across the different types of relations and apply our model. For model selection, we extend the Integrated Classification Likelihood (ICL) criterion proposed by Daudin et al. (2008) for the single-network SBM to multi-graphs to select the number of blocks K. Specifically, we minimize

  −2 Q(G⃗) + (K − 1) log N + T · (K(K + 1)/2) · log( N(N − 1)/2 ),

where Q(G⃗) is the variational approximation to the complete log-likelihood. We initialize the variational approximation with different randomizations as well as with the spectral clustering solution. The criterion is minimized at K = 4.

[Figure 4: The estimated community structures in the AU-CS multi-layer network overlaid onto the adjacency matrices of the different relations: (a) co-authorship, (b) Facebook, (c) leisure, (d) lunch, (e) work. The dots denote connections (edges), and the grids correspond to SBM blocks. Panel (f) lists the ICL for each K (lower denotes better fit): K = 2: 4087; K = 3: 3914; K = 4: 3830; K = 5: 3841; K = 6: 3878.]

Figure 4 shows the estimated 4 classes overlaid onto the adjacency matrix of each relation.
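The ICL penalty above depends only on N, T, and K, so the model-selection trade-off is easy to inspect; a sketch (−2Q would come from the fitted variational approximation; T = 5 matches the five AU-CS relation layers, but N = 60 is a placeholder size, not the actual data dimension):

```python
import numpy as np

def icl_penalty(N, T, K):
    """Penalty term of the multi-graph ICL criterion:
    (K - 1) * log(N) + T * K * (K + 1) / 2 * log(N * (N - 1) / 2)."""
    return (K - 1) * np.log(N) + T * K * (K + 1) / 2 * np.log(N * (N - 1) / 2)

# Model selection adds this penalty to -2Q and picks the K with the smallest sum.
for K in range(2, 7):
    print(K, round(icl_penalty(N=60, T=5, K=K), 1))
```

The K(K + 1)/2 factor counts the free parameters of each symmetric P^t, so the penalty grows quadratically in K and linearly in T, discouraging overly fine block structure.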
Although we have no ground-truth labels with which to evaluate the class estimation accuracy for this data set, we detect well-separated communities in all relations aside from co-authorship, which is an extremely sparse layer. Notice once again the difference in empirical connection probabilities over the multiple layers of the multi-graph. We note also that the ICL obtained by our variational approximation algorithm is much better than the ICLs obtained by fitting an SBM on the mean graph or by majority vote, both of which are over 4000.

6 Discussion

In this paper, we investigated the multi-graph stochastic block model applied to dynamic and multi-layer networks with invariant class structure. Both spectral clustering on the mean graph and maximum-likelihood estimation are proved to be consistent for a fixed number of nodes and an increasing number of network layers, provided certain sufficient conditions are satisfied.

There are several interesting avenues for extensions of our analysis. First, we can add a layer of probabilistic modeling on the probability matrices if we have additional information. Since dynamic networks usually vary smoothly over time, we can put a state-space model on the adjacency array (Xu and Hero, 2014). We can also use a hierarchical model on the probability matrices to couple them for analyzing multi-layer networks. Since our sufficient conditions do not consider such additional structure, an interesting area of future work would be to derive sufficient conditions that utilize the structure on the probability matrices, which would likely produce tighter bounds. It would also be interesting to draw connections to recent work on consistent estimation for populations of networks (Durante et al., 2014), for which no coupling between samples (layers) exists.

A Proof of Theorem 1

Proof.
Denote the latent class label for each node as a vector ~C_i = (C_i1, ..., C_iK), where C_ij = 1 if c_i = j and C_ij = 0 otherwise. Define the N × K matrix C whose i-th row is ~C_i. Notice that

    E(Ḡ) = E(Φ̄) = C E(P̄) C′ = C M C′,

where the last equality follows from ergodicity of the process {P_t}. Intuitively, Ḡ would converge to C M C′. Since the matrix of eigenvectors of C M C′ has only K distinct rows, the eigenvectors of Ḡ would converge to those of C M C′, and eventually the rows of the eigenvector matrix would be well-separated for nodes in different classes.

More formally, we first bound the difference of Ḡ and C M C′. Let ‖A‖_F = (Σ_{i,j} a_ij²)^{1/2} denote the Frobenius norm of a matrix A. We have

    E(‖Ḡ − C M C′‖_F²) = Σ_{i,j} Var(Ḡ_ij) = Σ_{i,j} [ E(Var(Ḡ_ij | Φ̄_ij)) + Var(E(Ḡ_ij | Φ̄_ij)) ].

The first term can be bounded by

    Var(Ḡ_ij | Φ̄_ij) = Φ̄_ij (1 − Φ̄_ij) / T ≤ 1/(4T)

because 0 ≤ Φ̄_ij ≤ 1. For the second term,

    Var(E(Ḡ_ij | Φ̄_ij)) = Var(Φ̄_ij) = Var(P̄_{c_i c_j}) = ε²_{c_i c_j} / T.

Therefore,

    E(‖Ḡ − C M C′‖_F²) ≤ N²(1 + 4ε²)/(4T),

where ε = max_{c_i, c_j} ε_{c_i c_j}. By the Markov inequality, for any δ > 0,

    P(‖Ḡ − C M C′‖_F² > δ) ≤ N²(1 + 4ε²)/(4Tδ) → 0 as T → ∞.

As a result, the spectral norm ‖Ḡ − C M C′‖ ≤ ‖Ḡ − C M C′‖_F goes to 0 too. Based on Lemma A.2 by Oliveira (2009), if M has K distinct eigenvalues, then the eigenvectors of Ḡ are close to the corresponding eigenvectors of C M C′. That is, let u_i be the eigenvector corresponding to the i-th largest eigenvalue of Ḡ, and let θ_i be the counterpart for C M C′. If ‖Ḡ − C M C′‖ < ε, then ‖u_i u_i′ − θ_i θ_i′‖ < δ_ε. This implies that 1 − (u_i′ θ_i)² < δ_θ. That is, u_i is close to θ_i or −θ_i. But C M C′ has only K distinct rows.
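The convergence argument above can be checked numerically. The following is a minimal simulation sketch (our own, not the authors' code), assuming two equal-sized classes, layer probability matrices drawn as M plus uniform noise, and k-means replaced by a simple median split on the separating eigenvector since K = 2:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, T = 60, 2, 200
c = np.repeat(np.arange(K), N // K)            # true class labels
M = np.array([[0.6, 0.3], [0.3, 0.6]])         # mean probability matrix

# Simulate T layers: P_t = M + uniform noise (clipped to [0, 1]),
# then G_t ~ Bernoulli(Phi) with Phi_ij = (P_t)_{c_i c_j}.
G_bar = np.zeros((N, N))
for _ in range(T):
    P_t = np.clip(M + rng.uniform(-0.1, 0.1, size=(K, K)), 0.0, 1.0)
    P_t = (P_t + P_t.T) / 2                    # keep the layer symmetric
    Phi = P_t[np.ix_(c, c)]
    upper = rng.random((N, N)) < Phi
    G = np.triu(upper, 1).astype(float)        # undirected, no self-loops
    G_bar += G + G.T
G_bar /= T                                     # mean graph

# Rows of the top-K eigenvector matrix of G_bar separate by class.
vals, vecs = np.linalg.eigh(G_bar)
U = vecs[:, np.argsort(np.abs(vals))[-K:]]     # two leading eigenvectors
est = (U[:, 0] > np.median(U[:, 0])).astype(int)
acc = max(np.mean(est == c), np.mean(est != c))  # up to label swap
```

With T this large the rows of U are cleanly separated and the split recovers the classes essentially perfectly, matching the theorem's T → ∞ regime.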
So the results show that a spectral clustering on Ḡ will eventually lead to perfect recovery of the class labels.

B Proof of Theorem 2

We begin with the proofs of Lemmas 1–3.

Proof of Lemma 1. This follows from Lemmas A1 and A2 by Choi et al. (2012). The main arguments are as follows: h, the expectation of the log-likelihood, is always maximized at the true parameters; for any partition of P, any refinement of the partition increases h; and for any label assignment z, we can find a refinement that has at least r(z)/2 pairs of nodes that connect to at least min_k n_k(c) nodes that differ at least δ from the truth.

Proof of Lemma 2. Because of symmetry, we only consider p ∈ (0, 1/2]. Let C₀ = p/2; then C₀ < p < 1 − C₀. Let the region C = [C₀, 1 − C₀]. By the Chernoff bound, P(|x − p| > ε) ≤ 2 exp(−2Nε²). Therefore, P(x ∉ C) ≤ 2 exp(−Np²/2). Let E_C(x) = Σ_{x∈C} x p(x); the subscript C denotes any operation restricted to the region C. Define the following functions and constants:

    σ(p) = p log(p) + (1 − p) log(1 − p);    M₀ = max_{p∈C} |σ(p)| = −σ(0.5) ≤ 0.7
    σ′(p) = log(p) − log(1 − p);             M₁ = max_{p∈C} |σ′(p)| = log(1 − C₀) − log(C₀)
    σ″(p) = 1/p + 1/(1 − p);                 M₂ = max_{p∈C} |σ″(p)| = 1/C₀ + 1/(1 − C₀)
    σ‴(p) = −1/p² + 1/(1 − p)²;              M₃ = max_{p∈C} |σ‴(p)| = 1/C₀² − 1/(1 − C₀)²
    σ⁽⁴⁾(p) = 2/p³ + 2/(1 − p)³;             M₄ = max_{p∈C} |σ⁽⁴⁾(p)| = 2/C₀³ + 2/(1 − C₀)³

We can get the following bounds:

    |E_{C̄}(σ(x))| ≤ M₀ P(x ∉ C) ≤ 2 exp(−Np²/2)
    E(x − p)³ = p(1 − p)(1 − 2p)/N² ≤ 1/(4N²)
    E(x − p)⁴ = [p(1 − p)⁴ + p⁴(1 − p)]/N³ + 3(N − 1)p²(1 − p)²/N³ ≤ 1/(2N³) + 1/(4N²)

By Taylor expansion on the region C,

    σ(x) = σ(p) + σ′(p)(x − p) + (σ″(p)/2)(x − p)² + (σ‴(p)/6)(x − p)³ + R(x),

where |R(x)| ≤ max_{x∈C} |σ⁽⁴⁾(x)(x − p)⁴/24|.
Thus

    N[E(σ(x)) − σ(p)] − 1/2
      = N E[σ(x) − σ(p) − σ′(p)(x − p) − (σ″(p)/2)(x − p)²]
      ≤ N E_C[σ(x) − σ(p) − σ′(p)(x − p) − (σ″(p)/2)(x − p)²] + N(2M₀ + 2M₁) P(x ∉ C)
      ≤ N E_C[(σ‴(p)/6)(x − p)³] + N E_C[max_{x∈C} |σ⁽⁴⁾(x)| (x − p)⁴/24] + N(2M₀ + 2M₁) P(x ∉ C)
      ≤ M₃/(24N) + (M₄/24)[1/(2N²) + 1/(4N)] + 2N(1 + M₁ + M₃/6) exp(−Np²/2)
      → 0 as N → ∞,

which completes the proof.

Proof of Lemma 3. For any z, as P̄ averages P, we have C₀ ≤ P̄_kl(z) ≤ 1 − C₀. We have

    g(z) − h(z) = Σ_{k≤l} n_kl(z) [E(σ(x_kl(z))) − σ(P̄_kl)],

where x_kl(z) = o_kl(z)/n_kl(z) ∼ (1/n_kl) Bin(n_kl, P̄_kl). By Lemma 2,

    n_kl(z) [E(σ(x_kl(z))) − σ(P̄_kl)] = 1/2 + O(1/n_kl(z)).

Therefore,

    g(z) − h(z) = K(K + 1)/4 + O(K²/m(z)).

Proof of Theorem 2. We want to show that there exists δ₀ such that

    E f(c) − E f(z) ≥ δ₀    (1)

for all z ≠ c. Then by Bernstein's inequality, we have

    (1/T) |Σ_t [f_t(z) − E f_t(z)]| → 0 as T → ∞.

Therefore

    (1/T) Σ_t f_t(c) − (1/T) Σ_t f_t(z) → (1/T) Σ_t [E f(c) − E f(z)] ≥ δ₀

for all z ≠ c. Then we get the conclusion that c is the unique maximizer of Σ_t f_t(z).

To show (1), we know by Lemma 1 that

    E f(c) − E f(z) = g(c) − g(z)
                    = [h(c) − h(z)] + [g(c) − h(c)] − [g(z) − h(z)]
                    ≥ δ m(c) r(z) + [g(c) − h(c)] − [g(z) − h(z)].

Let n₀ denote the threshold such that, for all N ≥ n₀, |g(z) − h(z) − K(K + 1)/4| ≤ δ₁ if m(z) ≥ n₀. Then for m(c) ≥ n₀,

    g(c) − h(c) ≥ K(K + 1)/4 − δ₁.

For any z, the total number of nodes satisfies N ≥ K m(c). The total number of nodes that do not satisfy n_k(z) ≥ n₀ is at most n₀(K − 1), and for the rest of the nodes we still have the bounds of Lemma 2.
Hence

    g(z) − h(z) ≤ δ₂ n₀(K − 1) + K(K + 1)/4 + δ₁.

Therefore

    E f(c) − E f(z) ≥ δ m(c) r(z) − 2δ₁ − δ₂ n₀(K − 1).

This is an increasing function of m(c), so we can find large enough m(c) such that

    δ m(c) r(z) − 2δ₁ − δ₂ n₀(K − 1) ≥ δ₀,

as required.

C Minimum number of nodes for consistency with 2 classes

Theorem 2 guarantees consistency of the MLE provided the conditions on C₀ and δ are satisfied and the minimum number of nodes in any class is large enough. For the special case of K = 2 classes, we can calculate the minimum number of nodes N required to guarantee consistency. We need N sufficiently large so that the expectation of the log-likelihood under the true class labels c at each layer is larger than the expectation of the log-likelihood under any other class assignment z. With 2 classes, for any given value of N, we can simply enumerate over the number of misclassified nodes to determine whether this is indeed true. It suffices to check only two boundary cases for the other class assignment z:

• 2 nodes are in one class, and the remaining N − 2 nodes are in the other class.
• Only a single node is misclassified in z, i.e., c_i = z_i for all except a single value of i ∈ {1, ..., N}.

If the expectation of the log-likelihood under c is indeed larger than the expectation of the log-likelihood under any other class assignment z for both of these cases, then consistency as T → ∞ is guaranteed for this value of N. If it is not, then one can simply iterate over values of N until it is large enough to guarantee consistency.

D Details of variational approximation

Let the vector ~z_i = (z_i1, ..., z_iK) denote the class assignment vector for each node i, where z_ik = 1 if node i is in class k and 0 otherwise. Thus ~z_i is all zeros except for a single one indicating its class. This notation makes the likelihood easier to write down.
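As a quick illustration of this indicator notation, here is a small helper (ours, not from the paper) that converts integer labels c_i into the matrix whose rows are the vectors ~z_i:

```python
import numpy as np

def one_hot(labels, K):
    """Convert integer class labels c_i into rows z_i with a single one."""
    Z = np.zeros((len(labels), K), dtype=int)
    Z[np.arange(len(labels)), labels] = 1
    return Z

Z = one_hot(np.array([0, 2, 1, 2]), K=3)   # each row marks one class
labels_back = Z.argmax(axis=1)             # recover c_i from ~z_i
```

The one-hot form lets class-membership sums and block counts be written as matrix products, which is what makes the likelihood below compact.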
We denote the initial class assignment probability by ~π = (π₁, ..., π_K); this is the multinomial parameter of ~z_i. The likelihood is

    l = Π_{i,k} π_k^{z_ik} Π_i …
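Our copy of the likelihood expression is cut off above. As a sketch only, assuming the standard multi-graph SBM complete-data likelihood (a multinomial term for the class indicators times independent Bernoulli terms for each dyad in each layer), it could be evaluated as follows; `complete_log_likelihood` and all input values are our own illustration, not the paper's exact expression:

```python
import numpy as np

def complete_log_likelihood(G, Z, pi, P):
    """Complete-data log-likelihood of a multi-graph SBM (sketch).

    G:  (T, N, N) binary adjacency array, symmetric, zero diagonal
    Z:  (N, K) one-hot class indicators
    pi: (K,) class probabilities
    P:  (T, K, K) per-layer block probability matrices, entries in (0, 1)
    """
    T, N, _ = G.shape
    ll = (Z * np.log(pi)).sum()                  # multinomial term for z
    iu = np.triu_indices(N, k=1)                 # count each dyad once
    for t in range(T):
        Phi = Z @ P[t] @ Z.T                     # node-pair edge probabilities
        g, phi = G[t][iu], Phi[iu]
        ll += (g * np.log(phi) + (1 - g) * np.log(1 - phi)).sum()
    return ll

# Tiny usage example (hypothetical values): with every probability 0.5,
# each of the 4 class terms and 12 dyad terms contributes log(0.5).
Z = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
pi = np.array([0.5, 0.5])
P = np.full((2, 2, 2), 0.5)
G = np.zeros((2, 4, 4))
ll = complete_log_likelihood(G, Z, pi, P)
```

The variational approximation replaces the hard indicators ~z_i with variational probabilities, so a sum like the one above becomes an expectation under those probabilities.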
