Statistical test for detecting community structure in real-valued edge-weighted graphs
Authors: Tomoki Tokuda
Okinawa Institute of Science and Technology Graduate University, JAPAN (tomoki.tokuda@oist.jp)
October 11, 2016

Abstract
We propose a novel method to test the existence of community structure in undirected, real-valued, edge-weighted graphs. The method is based on the Wigner semicircular law, which describes the asymptotic behavior of the random distribution of eigenvalues of a real symmetric matrix. We provide a theoretical foundation for this method and report on its performance on synthetic and real data, suggesting that our method outperforms other state-of-the-art methods.

1 Introduction
Clustering objects based on their similarities is a basic data-mining approach in statistical analysis. In particular, graph data (or network data), which reflect relationships between nodes, are often acquired in various scientific domains such as protein-protein interaction, neural networks and social networks [6], and they potentially provide quite useful information on the underlying structure of the system in question. Specifically, our interest is to detect a possible 'community', or cluster, structure of an undirected graph, defined as a block structure of the graph (Fig. 1a) in which the corresponding edge-weight matrix consists of several cluster blocks (four cluster blocks in Fig. 1b). To detect such structure, a number of clustering methods have been proposed in the literature of statistical physics and information theory [15, 18, 5]. There are mainly four approaches: graph partitioning, hierarchical clustering, partitional clustering and spectral clustering [6, 5, 16]. However, the conventional framework for the analysis of community structure typically targets unsigned graphs, in which edge weights are constrained to be non-negative.
Recently, the analysis of signed graphs, which allow for negative weights, has gained much attention [10]. Indeed, in real data it is often essential to take into account negative as well as positive relationships for a better understanding of the underlying community structure of a graph such as a social network. Most methods in the literature, however, address this problem in a rather limited framework in which edge weights within a cluster are positive while those between clusters are negative (i.e., a weakly balanced structure) [10]. It still remains an open question how to cluster nodes in a more general framework, for example one that allows negative edge weights within a cluster [25].

In the present paper, we consider a general framework for community structure as follows. We assume that edge weights are independently generated from a generative model specific to each cluster block, which characterizes the distribution of edges in that block. Further, we assume that these distributions are distinguishable in terms of their mean and variance. For this framework, as a first step toward the clustering problem, we aim to develop a statistical method for testing the existence of the underlying community structure.

As regards statistical tests on community structure, several methods have been proposed in the context of unsigned (weighted or unweighted) graphs [6]. A major approach is to evaluate the stability of cluster solutions when the graph in question is contaminated with noise [7, 9]. If similar cluster solutions are obtained for the contaminated graphs, this suggests that the cluster solution for the original graph is stable, providing evidence of community structure. The bootstrap method [21] follows a similar line.
A second approach is based on comparing the cluster solution for the original graph with the solutions for randomly permuted graphs. As test statistics, the entropy of graph configurations [2] and the 'C-score', which focuses on the lowest internal degrees [11], have been proposed. The common feature of these state-of-the-art methods is that a cluster solution for the graph in question is required for testing. In other words, the test result depends on the clustering method one uses. In this sense, these methods test the significance of a given cluster solution rather than the existence of community structure itself. For the general framework of our interest, such an approach is not applicable because appropriate clustering methods are not readily available.

We propose a general method for testing community structure in edge-weighted graphs with real-valued weights that does not require a cluster solution. Our method is based on the asymptotic behavior of the eigenvalues of the normalized weight matrix of the graph, which is described by the Wigner semicircular law when there is no community structure. Along similar lines, for binary-valued graphs, a statistical test on community structure has recently been proposed [3] that is based on the exact asymptotic behavior of the (maximum) eigenvalues. However, that method is not directly applicable to real-valued graphs in which we take into account both mean and variance, because the Bernoulli distribution assumed there cannot properly capture these two quantities (its variance is determined by its mean). Our method provides a nontrivial extension of community-structure detection to real-valued graphs, which has a wide range of applications to network data.

In the following sections, first, a theoretical foundation for our method is provided.
Second, we show that our method outperforms other methods on synthetic data. Third, we apply our method to real data.

Figure 1: Illustration of two-way community structure in a graph. Panel (a), edge-weighted graph: graph representation. Panel (b), edge-weight matrix: matrix representation, where the strengths of relationships between nodes are denoted by color.

2 Method
Our statistical test on community structure is based on the probability distribution of the eigenvalues of the normalized edge-weight matrix (we define 'normalization' in Section 2.2). We make the best use of asymptotic results on this distribution when there is no community structure, which have been intensively studied in Random Matrix Theory in theoretical physics [12]. In this section, we provide a theoretical foundation for our statistical test.

Figure 2: Panel (a), setting of parameters: illustration of the setting of community structure in matrix representation, where the nodes are arranged in the order of their cluster labels. Each cluster block is characterized by a mean $\mu$ and a standard deviation $\sigma$ with cluster-block index $(k, k')$. Panel (b), Tracy-Widom distribution: the density function of the Tracy-Widom distribution for Gaussian orthogonal ensembles with $\beta = 1$ (the first derivative of $F_1(x)$ in Eq. (7)), generated by the function dtw in the R package {RMTstat}. The critical values at significance levels $\alpha = 0.05$ and $\alpha/4 = 0.0125$ are 0.979 (green line) and 1.889 (red line), respectively.

2.1 Setting
We consider the problem of clustering the nodes of an undirected edge-weighted graph $G = (V, E)$, where $V$ consists of $n$ vertices $\{v_1, \ldots, v_n\}$ and $E$ is represented by the edge-weight matrix $W_n$, an $n \times n$ symmetric (real Hermitian) matrix with elements $w_{i,j} = w_{j,i} \in \mathbb{R}$ and $w_{i,i} = 0$ ($\mathbb{R}$ denotes the set of real numbers). Let us assume that there are $K$ clusters of nodes, denoted $c_1, \ldots, c_K$. We define a cluster block $(k, k')$ as the set of weights $w_{i,j}$ such that nodes $i$ and $j$ belong to clusters $c_k$ and $c_{k'}$, respectively: $v_i \in c_k$ and $v_j \in c_{k'}$ ($1 \le k, k' \le K$). Here, we assume that each off-diagonal weight $w_{i,j}$ is independently drawn from some distribution. Under this assumption, we define a $K$-way community structure as one characterized by different distributions in the $K \times K$ cluster blocks. To make this definition precise, we assume the following distribution for each cluster block:

$$ w_{i,j} \sim g_{k,k'} \quad (i \neq j), \qquad g_{k,k'} = \mu_{k,k'} + g \times \sigma_{k,k'}, \tag{1} $$

where $v_i \in c_k$, $v_j \in c_{k'}$, and $g$ is some probability distribution. This definition implies that a pair of parameters $(\mu_{k,k'}, \sigma_{k,k'})$ characterizes each cluster block and hence the community structure (Fig. 2a). Note that this definition excludes the degenerate case in which $\mu_{k,k'}$ is constant and $\sigma_{k,k'} = 0$, so that the variance of the whole set $\{w_{i,j}\}$ is zero.

Since the community structure of our interest is based on differences between weight distributions, it is invariant under translation and scaling of all the weights. Hence, to simplify the problem, as a preprocessing step we standardize the off-diagonal elements of $W_n$ using all off-diagonal weights $w_{i,j}$ ($i \neq j$) so that their mean is zero and their variance is one. We denote by $S$ the mapping that standardizes the edge-weight matrix in this way, transforming each element of the matrix as

$$ S: \; w_{i,j} \mapsto (w_{i,j} - \mu)/\sigma \quad (i \neq j), \qquad w_{i,i} \mapsto 0, \tag{2} $$

where $\mu$ and $\sigma$ are the mean and the standard deviation of all off-diagonal elements $\{w_{i,j}\}$.
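To make Eqs. (1) and (2) concrete, the block model and the standardization $S$ can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: $g$ is taken to be the standard Gaussian, the empirical mean and standard deviation replace $\mu$ and $\sigma$, and the helper names `sample_block_model` and `standardize` are hypothetical, not part of the paper.

```python
import numpy as np


def sample_block_model(sizes, mu, sigma, rng):
    """Draw a symmetric edge-weight matrix following Eq. (1) with standard Gaussian g.

    sizes: cluster sizes (n_1, ..., n_K); mu, sigma: K x K symmetric arrays holding
    the block means mu[k, k'] and block standard deviations sigma[k, k'].
    """
    labels = np.repeat(np.arange(len(sizes)), sizes)  # cluster label of each node
    n = labels.size
    block_mu = mu[np.ix_(labels, labels)]
    block_sigma = sigma[np.ix_(labels, labels)]
    # w_ij = mu_kk' + g * sigma_kk', drawn independently for each pair i < j
    w = block_mu + block_sigma * rng.standard_normal((n, n))
    w = np.triu(w, k=1)  # keep the strict upper triangle, so w_ii = 0
    return w + w.T, labels  # mirror to obtain a symmetric matrix


def standardize(w):
    """Mapping S of Eq. (2), using the empirical off-diagonal mean and std."""
    off = ~np.eye(w.shape[0], dtype=bool)
    return np.where(off, (w - w[off].mean()) / w[off].std(), 0.0)


# Hypothetical two-cluster example with a mean difference between blocks.
rng = np.random.default_rng(0)
mu = np.array([[0.5, -0.5], [-0.5, 0.5]])
sigma = np.ones((2, 2))
W, labels = sample_block_model([30, 50], mu, sigma, rng)
s_w = standardize(W)
```

After standardization the off-diagonal entries of `s_w` have mean zero and variance one exactly (up to rounding), which is the normalization assumed for $g$ in the rest of this section.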
In practice, the mean $\mu$ and standard deviation $\sigma$ in Eq. (2) may be replaced by their empirical counterparts $\mu_{emp}$ and $\sigma_{emp}$. For the standardized edge-weight matrix $S(W_n)$, we assume that the mean and the variance of $g$ in Eq. (1) are zero and one, respectively. In this setting, the mean and the variance in cluster block $(k, k')$ are $\mu_{k,k'}$ and $\sigma^2_{k,k'}$, respectively. Differences in these parameters distinguish clusters in terms of the first and second moments, while the higher moments are controlled through the common distribution $g$. Using this setting, we define the no-community case as a single community ($K = 1$) with $\mu_{k,k'} = 0$ and $\sigma_{k,k'} = 1$ for $S(W_n)$. Note that since $g$ is arbitrary (it may, for instance, be a mixture of distributions from some family), our definition of no community structure includes the case in which each weight is generated from one of several distributions chosen in random order.

Importantly, when we shuffle the off-diagonal elements of $W_n$ at random (in an element-wise manner), the community structure always disappears. Indeed, in such a case, each element $w'_{i,j}$ of the shuffled matrix $W'_n$ independently and identically follows the mixture distribution of the different components, i.e., $\sum_{k,k'} \pi_{k,k'} g_{k,k'}$, where $\pi_{k,k'}$ is the proportion of elements in cluster block $(k, k')$ of the original matrix $W_n$. We use this property in our statistical test as an alternative way of estimating confidence intervals (Section 2.2).

2.2 Statistical test
In this section, we develop a statistical test on the existence of the community structure defined in Section 2.1 (i.e., $K = 1$ vs. $K > 1$). We base our test on the asymptotic behavior of the eigenvalues of $S(W_n)$ as the number of nodes $n$ goes to $\infty$ when there is no community structure.
A useful result of Random Matrix Theory in our context is the following: if the elements of an infinite-dimensional symmetric matrix $X$ independently follow a distribution with mean zero and variance one, then the empirical (random) distribution of the eigenvalues $\lambda$ of $X_n/\sqrt{n}$, where $X_n$ is the principal submatrix of $X$ consisting of the first $n$ rows and columns, converges almost surely to the Wigner semicircular distribution as $n$ goes to $\infty$ (the semicircular law) [22, p. 136]:

$$ f_{sc}(\lambda) \equiv \frac{1}{2\pi}\sqrt{4 - \lambda^2}, \qquad |\lambda| \le 2. \tag{3} $$

Note that this law holds for any generative distribution of the elements of $X$ (as long as they are independently drawn), which is referred to as the universality property of the law. The law also holds if we replace the diagonal elements by zeros.

To apply the semicircular law in our context, we consider a normalization mapping of the edge-weight matrix $W_n$ that transforms each element of the matrix as $T: w_{i,j} \mapsto S(w_{i,j})/\sqrt{n}$, where $S$ is the standardization mapping in Eq. (2). Now, let us assume that the elements of an edge-weight matrix $W_n$ are generated as in Eq. (1). In this setting, the semicircular law implies that if the eigenvalues of $T(W_n)$ for sufficiently large $n$ do not follow the Wigner semicircular distribution in Eq. (3), then there must be some $K$-way community structure in the graph ($K > 1$), because of our assumption in Eq. (1).¹ However, the converse does not necessarily hold. That is to say, the fact that the eigenvalues of $T(W_n)$ follow the Wigner semicircular distribution does not imply that there is no community structure (i.e., $K = 1$). A counterexample is given as follows (proof in Appendix A).

Example 1. Let $W_n$ be an $n \times n$ symmetric edge-weight matrix that has $K$-way community structure with equal cluster sizes ($n/K$), as defined in Section 2.1. Suppose that $\mu_{k,k'} = 0$ for all $k, k'$, $\sigma^2_{k,k'} = 0$ for $k \neq k'$, and $\sigma^2_{k,k} = 1$.
Then, the empirical eigenvalue distribution of $T(W_n)$ almost surely converges to the Wigner semicircular distribution in Eq. (3) as $n$ goes to $\infty$.

Nonetheless, in our setting, we can show that an additional condition on the eigenvalue distribution of an exponentially mapped edge-weight matrix ensures that the converse also holds. For this purpose, we introduce the exponential mapping $\mathrm{Exp}$ that transforms each element of $W_n$ as

$$ \mathrm{Exp}: \; w_{i,j} \mapsto \exp(t \times w_{i,j}) \quad (i \neq j), \qquad w_{i,i} \mapsto 0, \tag{4} $$

where $t \in \mathbb{R}$ is a tuning parameter (to avoid clutter, we do not explicitly denote the dependence of $\mathrm{Exp}$ on $t$). Subsequently, we define the normalization mapping $T_e$ for the exponentially transformed matrix as

$$ T_e: \; w_{i,j} \mapsto S(\mathrm{Exp}(w_{i,j}))/\sqrt{n}. \tag{5} $$

Now, the following theorem provides a necessary and sufficient condition for the existence of community structure (proof in Appendix B).

Theorem 2.1. Let $W_n$ be an $n \times n$ weight matrix defined in Section 2.1 with fixed proportions of cluster sizes $(r_1, \ldots, r_K)$ and parameter pairs $\{(\mu_{k,k'}, \sigma_{k,k'})\}$ ($k, k' = 1, \ldots, K$). Suppose that the moment generating function $M(t)$ of $g$ (defined in Eq. (1)) exists on an open interval containing zero. Then, the following statements (C1) and (C2) are equivalent:
(C1) There is no community structure (i.e., $K = 1$).
(C2) The empirical eigenvalue distributions of the following two matrices almost surely converge to the Wigner semicircular distribution in Eq. (3) as $n$ goes to $\infty$: $T(W_n)$ and $T_e(W_n)$ for some real value $t_0 \neq 0$.

¹ Without the assumption in Eq. (1), this property does not hold. For instance, one can construct a scale-free graph whose eigenvalues do not follow the semicircular law [20].
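The role of the exponential mapping can be checked numerically on the counterexample of Example 1. The sketch below is illustrative only: it assumes Gaussian $g$, $K = 2$ equal clusters of 300 nodes, $t_0 = 1/2$, and the practical variant $T_e(S(W))$ used in Algorithm 1; all function names are hypothetical. Because $E[\exp(t\,w)]$ depends on the variance of $w$, exponentiation turns the variance difference between blocks into a mean difference, which is expected to push the largest eigenvalue of $T_e(S(W))$ well above 2, while the extreme eigenvalues of $T(W)$ stay close to $\pm 2$.

```python
import numpy as np


def standardize(w):
    """Mapping S of Eq. (2), using the empirical off-diagonal mean and std."""
    off = ~np.eye(w.shape[0], dtype=bool)
    return np.where(off, (w - w[off].mean()) / w[off].std(), 0.0)


def normalize(w):
    """Mapping T: S(w) / sqrt(n)."""
    return standardize(w) / np.sqrt(w.shape[0])


def exp_map(w, t=0.5):
    """Mapping Exp of Eq. (4): exp(t * w) off the diagonal, 0 on it."""
    off = ~np.eye(w.shape[0], dtype=bool)
    return np.where(off, np.exp(t * w), 0.0)


def normalize_exp(w, t=0.5):
    """Mapping T_e of Eq. (5): S(Exp(w)) / sqrt(n)."""
    return normalize(exp_map(w, t))


# Example 1: K = 2 equal clusters, N(0, 1) weights inside clusters, zeros between.
rng = np.random.default_rng(1)
n, K = 600, 2
labels = np.repeat(np.arange(K), n // K)
same_cluster = labels[:, None] == labels[None, :]
w = np.triu(np.where(same_cluster, rng.standard_normal((n, n)), 0.0), k=1)
w = w + w.T

eig_t = np.linalg.eigvalsh(normalize(w))                    # spectrum of T(W)
eig_te = np.linalg.eigvalsh(normalize_exp(standardize(w)))  # spectrum of T_e(S(W))
print(f"T(W):      min {eig_t[0]:.2f}, max {eig_t[-1]:.2f}")
print(f"T_e(S(W)): min {eig_te[0]:.2f}, max {eig_te[-1]:.2f}")
```

`np.linalg.eigvalsh` returns eigenvalues in ascending order, so the first and last entries are the extreme eigenvalues examined by the test.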
Theorem 2.1² motivates us to use the semicircular law to establish a statistical test of the null hypothesis $H_0$:

$$ H_0: \text{There is no community structure.} \tag{6} $$

As implied in the proof of Theorem 2.1, a violation of the semicircular law for $T(W_n)$ is related to differences in the means ($\mu_{k,k'}$) among cluster blocks, while a violation of the law for $T_e(W_n)$ is related to differences in the variances ($\sigma^2_{k,k'}$). Hence, if we take into account the eigenvalues of both matrices, we can capture differences in the first and second moments of the underlying distributions among cluster blocks.

In practice, to test the null hypothesis $H_0$, rather than dealing with the distribution of all eigenvalues, we focus on the extreme eigenvalues, because the proof of Theorem 2.1 suggests that the extreme eigenvalues may be closely related to the violation of the semicircular law when there is community structure: the largest eigenvalue may deviate above its expected value 2, or the smallest eigenvalue may deviate below its expected value $-2$. Strictly speaking, the independence assumption on the weights is broken if we transform them by $T$ or $T_e$ using the empirical mean and standard deviation $\mu_{emp}$ and $\sigma_{emp}$; for simplicity, however, we ignore this effect here.

The behavior of the largest eigenvalue has been well studied in the literature when the elements of the edge-weight matrix $W_n$ are independently generated by some symmetric distribution $g$ (typically Gaussian; otherwise, an even density function with tails no heavier than those of the Gaussian distribution), with mean zero and variance one for the off-diagonal elements and mean zero and variance two for the diagonal elements.
In this setting, the largest eigenvalue $\lambda_{max}$ asymptotically follows the Tracy-Widom distribution for Gaussian orthogonal ensembles with parameter $\beta = 1$:

$$ \lim_{n \to \infty} P\left(\lambda_{max} \le 2 + x/n^{2/3}\right) = F_1(x), \tag{7} $$

where $F_1(x) \equiv \exp\{-\tfrac{1}{2}\int_x^\infty q(y)\,dy\}\,(F_2(x))^{1/2}$ with $F_2(x) \equiv \exp\{-\int_x^\infty (y - x)\,q^2(y)\,dy\}$, and $q(x)$ is the solution of the Painlevé II equation $d^2q/dx^2 = xq + 2q^3$ with the boundary condition $q(x) \sim \mathrm{Ai}(x)$ as $x \to \infty$ [23, 24]. Note that the Tracy-Widom distribution describes the maximum eigenvalue of specific types of symmetric matrices (e.g., Gaussian ensembles), while the semicircular law holds for the distribution of the eigenvalues of general symmetric matrices (Wigner ensembles). Moreover, in our framework the diagonal elements are all zero, which differs slightly from the conventional assumption for the Tracy-Widom distribution. Nevertheless, because of the universality property of the Tracy-Widom distribution [1, Theorem 21.4.3], we can safely apply Eq. (7) in our context (our context clearly satisfies the universality condition that the diagonal part should be symmetric with a sub-Gaussian tail).

Using the Tracy-Widom distribution in Eq. (7), we set confidence intervals for our statistical test as follows. For the normalized edge-weight matrix $T(W_n)$, we set the confidence interval $CI_{max}$ of the largest eigenvalue $\lambda_{max}$ at level $\alpha$.

² Alternatively, one may replace the exponential mapping by the square mapping, which requires weaker assumptions on the existence of moments of $g$. However, the square transformation seems to have less power when we establish a statistical test (as in the following paragraphs), possibly because it is not a one-to-one mapping. This observation motivated us to work with the exponential mapping.
Since the violation of the semicircular law occurs as a positive deviation from the expected value, we consider the one-sided confidence interval $(-\infty, q)$, where $q$ is the critical value at significance level $\alpha$, i.e., $P(\lambda_{max} \ge q \mid H_0) = \alpha$, which is estimated by $F_1(x)$ in Eq. (7) (see the shape of its first derivative in Fig. 2b). By Eq. (7), $q \approx 2 + x_{1-\alpha}/n^{2/3}$, where $x_{1-\alpha}$ satisfies $F_1(x_{1-\alpha}) = 1 - \alpha$. If the generative distribution $g$ may not be symmetric or may be heavy-tailed, one may instead evaluate the distribution of the largest eigenvalue by means of a permutation test for $T(W_n)$, although this may require some computation time. In addition to the largest eigenvalue, we also test the smallest eigenvalue $\lambda_{min}$, which may also signal a violation of the semicircular law (what matters is indeed the largest magnitude of the eigenvalues). In this case, the confidence interval $CI_{min}$ is given by $(-q, \infty)$. In a similar manner, we test the largest and the smallest eigenvalues of the exponentially normalized weight matrix: we first standardize the data and then apply the mapping $T_e$, with $t_0$ set to $1/2$ by default. This results in the transformed matrix $T_e(S(W_n))$ (we denote the corresponding confidence intervals by $CI'_{max}$ and $CI'_{min}$). Since this procedure involves a series of four statistical tests, we set the significance level of each test to $\alpha/4$, following the Bonferroni correction (Algorithm 1³).

³ $I(a)$ is an indicator function: 1 if $a$ is true; 0 otherwise.

Algorithm 1 Testing the existence of community structure
Input: edge-weight matrix $W$; confidence intervals $CI_{max}$, $CI_{min}$, $CI'_{max}$ and $CI'_{min}$ at level $\alpha/4$.
  s ← 0
  s ← s + I(max. eigenvalue of T(W) ∈ CI_max)
  s ← s + I(min. eigenvalue of T(W) ∈ CI_min)
  s ← s + I(max. eigenvalue of T_e(S(W)) ∈ CI′_max)
  s ← s + I(min. eigenvalue of T_e(S(W)) ∈ CI′_min)
  if s = 4 then
    Accept H_0
  else
    Reject H_0
  end if

3 Simulation study on synthetic data
In this section, we report on a simulation study evaluating the performance of our method. First, we investigate the validity of using $F_1(x)$ in Eq. (7) to approximate the distribution of the maximum eigenvalue $\lambda_{max}$ when $n$ is finite. Second, we investigate the power of our method when the null hypothesis $H_0$ is not true. Third, we compare the performance of our method outlined in Algorithm 1 with other methods.

Basically, existing methods in the literature consist of two steps. In the first step, a clustering solution for a given graph is obtained by an (arbitrary) clustering method. The obtained solution is subsequently compared with the clustering solutions for randomized graphs, and the comparison is evaluated by a particular statistic. In this study, we adapt one of the state-of-the-art methods, based on clustering entropy ('CE', originally designed for unweighted graphs) [7]:

$$ S = -\frac{1}{L} \sum_{(i,j)} \left\{ p_{i,j} \log_2 p_{i,j} + (1 - p_{i,j}) \log_2 (1 - p_{i,j}) \right\}, $$

where $L$ is the total number of edges in the graph, and $p_{i,j}$ is the 'in-cluster probability', which measures the proportion of agreement of the cluster memberships of nodes $i$ and $j$ between the given graph and the randomized graphs over a number of different noisy contaminations (we set the number of such contaminations to 100). As regards clustering, to the best of our knowledge, there is no clustering method specifically designed to detect community structure based on differences in the patterns of distributions.
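For reference, the clustering-entropy statistic $S$ above can be computed as follows once the in-cluster probabilities $p_{i,j}$ are available. This is a minimal sketch: the noisy-contamination and re-clustering steps of [7] are not reproduced, the function name is hypothetical, and the convention $0 \log_2 0 = 0$ is assumed, so edges with $p_{i,j} \in \{0, 1\}$ contribute nothing. Lower values indicate more consistent cluster memberships across contaminations.

```python
import numpy as np


def clustering_entropy(p):
    """Clustering entropy S = -(1/L) * sum over edges of p log2 p + (1 - p) log2 (1 - p).

    p: in-cluster probabilities p_ij, one per edge (i, j), so L = len(p).
    Terms with p_ij equal to 0 or 1 are taken as zero (0 * log2(0) = 0).
    """
    p = np.asarray(p, dtype=float)
    h = np.zeros_like(p)
    inner = (p > 0) & (p < 1)
    q = p[inner]
    # binary entropy in bits; the leading minus sign of S makes each term non-negative
    h[inner] = -(q * np.log2(q) + (1 - q) * np.log2(1 - q))
    return h.sum() / p.size


stable = clustering_entropy([1.0, 0.0, 1.0, 1.0])    # identical memberships: S = 0
unstable = clustering_entropy([0.5, 0.5, 0.5, 0.5])  # maximal disagreement: S = 1
```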
As a fallback, we consider one of the state-of-the-art methods for signed networks: signed spectral clustering based on the normalized signed Laplacian ('SignedSpec'), which is designed to detect the weakly balanced structure of a graph, i.e., positive weights within clusters and negative weights between clusters [10]. We also consider conventional spectral clustering (the normalized Laplacian method, 'ConvSpec'), which is applicable to graphs with positive weights. To apply ConvSpec in our context, we transform an edge-weight matrix into a non-negatively weighted matrix by subtracting $\min_{i,j} w_{i,j}$ from each weight. Note that ConvSpec is equivalent to SignedSpec when all edge weights are positive.

3.1 Data generation
For the data structure in this simulation study, we adopted that of [8], setting the number of clusters to five and the cluster sizes to $(10s, 20s, 30s, 40s, 50s)$, where we varied the integer $s$. In this setting, we have $5 \times 5 = 25$ cluster blocks. In each cluster block, weights were independently drawn from a Gaussian distribution $N(\mu_{k,k'}, \sigma^2_{k,k'})$, where $\mu_{k,k'}$ and $\sigma^2_{k,k'}$ are the mean and the variance of cluster block $(k, k')$. We generated 100 datasets for each setting.

3.2 Results
When the number of nodes ranges from 150 to 1500, the distribution function $F_1(x)$ in Eq. (7) provides a good approximation of the critical value at significance level $\alpha = 0.05$ under the null hypothesis $H_0$ (Fig. 3a). Since $F_1(x)$ is the asymptotic probability distribution, this result suggests that $F_1(x)$ also provides a good approximation of the critical value when the number of nodes exceeds this range.
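The finite-$n$ comparison behind Fig. 3a, together with the permutation alternative mentioned in Section 2.2, can be sketched as follows. Under the assumption of Gaussian weights with no community structure ($n = 200$), the sketch compares the Tracy-Widom critical value $2 + x/n^{2/3}$ with $x = 0.979$ (the $\alpha = 0.05$ value read from Fig. 2b) against a permutation estimate of the 95th percentile of $\lambda_{max}$, obtained by shuffling the off-diagonal weights as described in Section 2.1. The number of shuffles (200, rather than the 1000 used for Fig. 5), the seed and all function names are illustrative assumptions.

```python
import numpy as np


def normalize(w):
    """Mapping T: standardize off-diagonal weights as in Eq. (2), divide by sqrt(n)."""
    off = ~np.eye(w.shape[0], dtype=bool)
    s = np.where(off, (w - w[off].mean()) / w[off].std(), 0.0)
    return s / np.sqrt(w.shape[0])


def tw_critical_value(n, x):
    """Critical value for lambda_max implied by Eq. (7): 2 + x / n^(2/3)."""
    return 2.0 + x / n ** (2.0 / 3.0)


def shuffle_offdiag(w, rng):
    """Permute the upper-triangular weights at random, keeping symmetry and a zero diagonal."""
    iu = np.triu_indices(w.shape[0], k=1)
    out = np.zeros_like(w)
    out[iu] = rng.permutation(w[iu])
    return out + out.T


def permutation_critical_value(w, alpha, n_perm, rng):
    """(1 - alpha) quantile of lambda_max of T over element-wise shuffles of w."""
    lmax = [np.linalg.eigvalsh(normalize(shuffle_offdiag(w, rng)))[-1]
            for _ in range(n_perm)]
    return np.quantile(lmax, 1.0 - alpha)


rng = np.random.default_rng(2)
n = 200
w = np.triu(rng.standard_normal((n, n)), k=1)
w = w + w.T  # null case: K = 1, Gaussian g, zero diagonal

q_tw = tw_critical_value(n, 0.979)  # alpha = 0.05
q_perm = permutation_critical_value(w, 0.05, 200, rng)
print(f"Tracy-Widom: {q_tw:.3f}, permutation: {q_perm:.3f}")
```

The Tracy-Widom route needs a single eigendecomposition per matrix, whereas the permutation route needs one per shuffle; the latter is the option to fall back on when $g$ may be asymmetric or heavy-tailed.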
As regards statistical power, the results indicate that our method detects the existence of community structure well even when the block means $\mu_{k,k'}$ differ by at most 0.3 ($3 \times 0.05 + 3 \times 0.05$), with $\sigma_{k,k'} = 1$ and 750 nodes (Fig. 3b). On the other hand, the power may not be sufficient when the differences among cluster blocks are characterized by the variances $\sigma^2_{k,k'}$ (Fig. 3c). However, applying our method to the matrix exponentially transformed by $\mathrm{Exp}$ considerably improves the power (Fig. 3d). All these results suggest that our method performs well in testing the existence of community structure in a graph.

Lastly, we compare the performance of our method with the remaining methods.

Figure 3: Box plots represent the distributions of the largest eigenvalues for various settings. Panel (a), no community: the no-community case ($K = 1$) of Gaussian ensembles for numbers of nodes from 150 to 1500 (x-axis). Panel (b), $\mu$ differs: five-way community case with 750 nodes and cluster sizes (50, 100, 150, 200, 250). Each cluster block is characterized by the mean of a Gaussian distribution (with the variance fixed at 1), chosen at random from $\{-\mu, \mu\}$ with equal probabilities; the value of $\mu$ is varied from 0 to 0.5 in steps of 0.1 (x-axis). Panel (c), $\sigma$ differs: five-way community case characterized by the variance (with the mean fixed at 0), chosen at random from $\{1, \sigma^2\}$ with equal probabilities; the value of $\sigma$ is varied from 1 to 6 in steps of 1 (x-axis). Panel (d), $\sigma$ differs (Exp): the same setting as in (c), but each edge-weight matrix is transformed by the exponential mapping $\mathrm{Exp}$ in Eq. (4) with $t_0 = 1/2$. In all panels, the green line denotes the 95th percentile of the largest eigenvalue under the null hypothesis $H_0$ in (6).
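Algorithm 1 can be sketched end to end as follows. This minimal NumPy version is a sketch under stated assumptions, not the author's implementation: it uses Tracy-Widom critical values rather than permutations, takes $x = 1.889$ (the $\alpha/4 = 0.0125$ critical value for $\alpha = 0.05$ reported in Fig. 2b), sets $t_0 = 1/2$, and applies the intervals $(-\infty, q)$ and $(-q, \infty)$ to both $T(W)$ and $T_e(S(W))$. The function names and the alternating block-mean pattern in the usage example are hypothetical choices (Fig. 3b draws the signs at random instead).

```python
import numpy as np

X_ALPHA_OVER_4 = 1.889  # Tracy-Widom critical value at alpha / 4 = 0.0125 (Fig. 2b)


def standardize(w):
    """Mapping S of Eq. (2), using the empirical off-diagonal mean and std."""
    off = ~np.eye(w.shape[0], dtype=bool)
    return np.where(off, (w - w[off].mean()) / w[off].std(), 0.0)


def extreme_eigenvalues(m):
    """Return (lambda_min, lambda_max) of a symmetric matrix."""
    eig = np.linalg.eigvalsh(m)  # ascending order
    return eig[0], eig[-1]


def community_test(w, t0=0.5, x=X_ALPHA_OVER_4):
    """Algorithm 1: return True if H0 (no community structure) is rejected."""
    n = w.shape[0]
    off = ~np.eye(n, dtype=bool)
    q = 2.0 + x / n ** (2.0 / 3.0)                           # critical value from Eq. (7)
    t_w = standardize(w) / np.sqrt(n)                        # T(W)
    exp_s = np.where(off, np.exp(t0 * standardize(w)), 0.0)  # Exp(S(W)), Eq. (4)
    te_w = standardize(exp_s) / np.sqrt(n)                   # T_e(S(W)), Eq. (5)
    s = 0
    for m in (t_w, te_w):
        lam_min, lam_max = extreme_eigenvalues(m)
        s += int(lam_max < q)   # I(lambda_max in (-inf, q))
        s += int(lam_min > -q)  # I(lambda_min in (-q, inf))
    return s != 4  # accept H0 only if all four eigenvalues fall inside


rng = np.random.default_rng(3)

# Null case: K = 1, Gaussian weights, n = 300.
n = 300
w0 = np.triu(rng.standard_normal((n, n)), k=1)
w0 = w0 + w0.T
null_lmax = extreme_eigenvalues(standardize(w0) / np.sqrt(n))[1]

# Five clusters of sizes (50, ..., 250) as in Section 3.1 with s = 5, unit variance,
# and block means +0.5 / -0.5 in an alternating (illustrative) pattern.
sizes = [50, 100, 150, 200, 250]
labels = np.repeat(np.arange(5), sizes)
sign = np.where(labels % 2 == 0, 1.0, -1.0)
mu = 0.5 * np.outer(sign, sign)
w1 = np.triu(mu + rng.standard_normal(mu.shape), k=1)
w1 = w1 + w1.T
print("null lambda_max:", round(float(null_lmax), 3))
print("reject H0 for the five-cluster graph:", community_test(w1))
```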
We applied our method as outlined in Algorithm 1 to the synthetic data, setting $\alpha$ to 0.05 (hence, $\alpha/4 = 0.0125$). When the community structure is characterized by mean differences, the performance of our method is comparable with that of the clustering entropy method with signed spectral clustering (CE + SignedSpec), while it outperforms the clustering entropy method with conventional spectral clustering (CE + ConvSpec) (Fig. 4a). On the other hand, when the community structure is characterized by scale differences, our method considerably outperforms the remaining methods (Fig. 4b).

Figure 4: Comparison of the power of the test for three different methods: our method, the clustering entropy method for the cluster solution obtained by signed spectral clustering (CE + SignedSpec), and the clustering entropy method with conventional spectral clustering (CE + ConvSpec). Panel (a), $\mu$ differs: x-axis, distance between centroids (× 0.05); y-axis, proportion of rejecting $H_0$. Panel (b), $\sigma$ differs: x-axis, scale (× 0.5 + 1); y-axis, proportion of rejecting $H_0$. The true community structure is set as follows: cluster sizes (50, 100, 150, 200, 250); the means and the variances are varied along the x-axis of panels (a) and (b) as in Fig. 3b and Fig. 3c, respectively.

4 Application to real data
In this section, we apply our method to real data. The objective is to evaluate the performance of our method when it is applied to various types of real graph data.

4.1 Data
First, we applied our method to the following benchmark graph datasets: Karate club, Karate [26]; co-authorships in network science, Co-authors [14]; and tribal relationships in highland New Guinea, Gahuku-Gama [17].
The Karate and Co-authors datasets are binary (i.e., $\{0, 1\}$), while the edges in the Gahuku-Gama dataset take discrete signed values $\{-1, 0, 1\}$. The numbers of nodes in these datasets are 34, 1589, and 16, respectively. These datasets have been well studied in terms of detecting community structure [25]. The clustering results and the underlying social relationships between subjects (nodes) have been fully discussed in the literature, clearly suggesting the existence of community structure in these datasets.

Figure 5: Results of applying our method to the real datasets Karate, Co-authors, and Gahuku-Gama (left to right panels). The star marker denotes the maximum or the minimum eigenvalue of the normalized matrix $T(W)$, while the cross marker denotes those of the exponentially normalized matrix $T_e(S(W))$. The edges at the top or bottom of the boxes denote the critical values of these eigenvalues at significance level $\alpha/2$ with $\alpha = 0.05$ for the Karate and Co-authors datasets, and at $\alpha/4$ for the Gahuku-Gama dataset. These critical values were obtained by a permutation test with 1000 randomized realizations. In contrast, the red dashed lines denote the critical values derived from the Tracy-Widom distribution $F_1(x)$.

Second, we applied our method to a real-valued edge-weighted graph: resting-state functional MRI (fMRI) data [13]. The original dataset consists of the level of the BOLD (Blood-Oxygen-Level Dependent) signal at short intervals, which reflects neural activity in each tiny portion of the brain, called a 'voxel' (4949 voxels in this dataset). We preprocessed this dataset by evaluating the temporal correlations among these voxels and applying Fisher's z-transformation to them, which results in a 4949 × 4949 edge-weight matrix $W$.
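The preprocessing step can be sketched as follows: Pearson correlations between voxel time series, followed by Fisher's z-transformation $z = \operatorname{arctanh}(r)$, with the diagonal set to zero as required in Section 2.1. The synthetic time series, its dimensions and the function name are hypothetical; the exact preprocessing pipeline applied to the data of [13] is not specified here.

```python
import numpy as np


def fisher_z_weights(ts):
    """Edge-weight matrix from time series ts of shape (time points, voxels).

    Computes Pearson correlations between voxels, sets the diagonal to zero
    (w_ii = 0, which also avoids arctanh(1) = inf), and applies Fisher's
    z-transformation arctanh(r) to the remaining entries.
    """
    r = np.corrcoef(ts, rowvar=False)  # voxels x voxels correlation matrix
    np.fill_diagonal(r, 0.0)
    return np.arctanh(r)


rng = np.random.default_rng(4)
ts = rng.standard_normal((120, 50))  # hypothetical: 120 time points, 50 voxels
W = fisher_z_weights(ts)
```

The resulting symmetric, zero-diagonal matrix has the form assumed for $W_n$ in Section 2.1 and can be passed directly to the mappings $T$ and $T_e$.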
The objective with the fMRI dataset is to apply our method to such a real-valued edge-weight matrix and to draw useful implications from the analysis.

4.2 Results
For the first group of real datasets, our method correctly suggests that community structure may exist (i.e., $K > 1$), whether we estimate the critical values by the Tracy-Widom distribution or by a permutation test (Fig. 5).

Figure 6: Results of applying our method to the fMRI dataset. The star markers denote the maximum or the minimum eigenvalues $\lambda$ of the weight matrices normalized by the mapping $T$ in various brain regions, with edge-weight matrix $W_{k,k}$ indexed by brain region $k$ (x-axis). The cross markers denote their counterparts for the weight matrices exponentially normalized by the mapping $T_e$. The horizontal lines denote $y = -2$ and $y = 2$, the values to which the minimum and the maximum eigenvalues asymptotically converge.

Note that in the binary case, we always obtain the same test results for the original matrix and the exponentially transformed matrix, because $T(W) = T_e(S(W))$: a strictly increasing map such as $\exp(t\,\cdot)$ with $t > 0$ keeps a two-valued variable two-valued with the same ordering, so standardizing it again returns the same values. Therefore, we carried out our test only for $T(W)$ in the Karate and Co-authors datasets, setting the significance level to $\alpha/2$. For the fMRI dataset, our test rejected the null hypothesis $H_0$, yielding maximum and minimum eigenvalues of 31.0 and $-7.2$ for $T(W)$, and 31.8 and $-10.9$ for $T_e(S(W))$, which provides strong evidence of community structure in this graph. Furthermore, we carried out our test for subsets of voxels in anatomically predefined brain regions, where the number of voxels ranges from 13 to 498. Our test results suggest that community structure may exist in each region except brain region 16 (Fig. 6).
This result supports the conjecture, discussed in the neuroscience literature [4], that brain activity is heterogeneous within anatomically defined brain regions.

5 Discussion

We have proposed a novel method for statistically testing the existence of community structure in an undirected graph, where community structure is characterized by the first and second moments of the generative model for edge weights. This method can be considered a (nontrivial) extension of the recently proposed method [3] from binary-valued graphs to real-valued ones. Unlike the existing methods for real-valued graphs, our method does not need a cluster solution. Hence, we can apply it even to the nontrivial case of clustering in which edge weights take both positive and negative real values. Our approach also avoids the nontrivial problem of determining the number of clusters. Further, our method is efficient in terms of computation time: if we use the Tracy-Widom distribution, we need to evaluate the eigenvalues of the edge-weight matrix only once, thanks to the asymptotic results provided by random matrix theory.

As the next step of analysis, one may wonder how to find community memberships when our test rejects the null hypothesis of $K = 1$. The present paper did not address this issue, but it would be useful to examine the eigenvectors of the edge-weight matrix, as in spectral clustering. We conjecture that some of the eigenvectors of $T(W)$ and $T_e(S(W))$ carry information on community memberships. How to determine and synthesize the relevant eigenvectors for inferring the underlying community structure is an important topic for future research.

APPENDIX

A Proof of Example 1

The edge-weight matrix can be represented as

$$W_n = \begin{pmatrix} W_{n,1,1} & \cdots & 0_{n_1,n_K} \\ \vdots & \ddots & \vdots \\ 0_{n_K,n_1} & \cdots & W_{n,K,K} \end{pmatrix},$$

where $W_{n,k,k}$ is the principal submatrix of $W_n$ that consists of cluster block $(k,k)$; $n_k$ is the number of nodes in the $k$th cluster; and $0_{n_k,n_{k'}}$ is an $n_k \times n_{k'}$ zero matrix ($1 \le k, k' \le K$). Because of the assumption that $\mu_{k,k'} = 0$, the normalized matrix $T(W_n)$ also has zero off-diagonal blocks. As a result, the eigenvalues of $T(W_n)$ consist of the eigenvalues of the cluster blocks $(k,k)$ in $T(W_n)$. So it suffices to show that the eigenvalues of these cluster blocks follow the semicircular law. Since the variance $\sigma^2_{k,k}$ for $W_{n,k,k}$ is one, the variance of all elements in $W_n$ is $\sum_{k=1}^{K} n_k^2/n^2$. Further, because of the assumption that $n_k = n/K$ ($k = 1, \ldots, K$), the variance of all elements in $W_n$ becomes $1/K$. So the standardized matrix is $S(W_n) = \sqrt{K} \times W_n$. Hence, the normalized matrix becomes

$$T(W_n) = \frac{\sqrt{K}}{\sqrt{n}} \times W_n. \tag{8}$$

Since $n = K \times n_k$, the coefficient $\sqrt{K}/\sqrt{n}$ in Eq.(8) becomes $1/\sqrt{n_k}$. Therefore, the eigenvalues relevant for cluster block $(k,k)$ are the eigenvalues of the matrix $W_{n,k,k}/\sqrt{n_k}$, whose distribution follows the semicircular law as $n \to \infty$. This completes the proof.

B Proof of Theorem 2.1

In our setting, if there is no community structure, then the eigenvalues of $T(W_n)$ and the eigenvalues of $T_e(W_n)$ follow the semicircular law because of the universality property of the law. Now we prove the converse. For this purpose, we prove that if there is community structure, then the semicircular law is violated. First, we consider the situation in which there is community structure such that some of the means $\mu_{k,k'}$ in $S(W_n)$ differ, while the variances $\sigma_{k,k'}$ in $S(W_n)$ are the same across different cluster blocks. Note that means and variances are defined for the standardized matrix $S(W_n)$.
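The scaling argument in the proof of Example 1 (Appendix A) can be checked numerically. The sketch below assumes Gaussian blocks with unit variance (Example 1 itself only fixes zero means, $\sigma^2_{k,k} = 1$, and $n_k = n/K$); it builds a block-diagonal $W_n$, confirms that the overall standard deviation is close to $1/\sqrt{K}$ and that $\sqrt{K/n} = 1/\sqrt{n_k}$, and shows that the extreme eigenvalues of $T(W_n)$ stay near $\pm 2$. The helper names `wigner_block` and `example1_matrix` are hypothetical.

```python
import numpy as np

def wigner_block(m, rng):
    # Symmetric m x m block with mean-zero, unit-variance off-diagonal entries.
    a = rng.standard_normal((m, m))
    return (a + a.T) / np.sqrt(2)

def example1_matrix(n_k, K, seed=0):
    # Block-diagonal W_n: K equal clusters, zero off-diagonal blocks.
    rng = np.random.default_rng(seed)
    w = np.zeros((n_k * K, n_k * K))
    for k in range(K):
        sl = slice(k * n_k, (k + 1) * n_k)
        w[sl, sl] = wigner_block(n_k, rng)
    return w

n_k, K = 300, 4
w = example1_matrix(n_k, K)
n = w.shape[0]

overall_sd = w.std()               # approx sqrt(sum_k n_k^2 / n^2) = 1/sqrt(K)
coef = np.sqrt(K) / np.sqrt(n)     # coefficient in Eq.(8)
t = coef * w                       # T(W_n) = (sqrt(K)/sqrt(n)) W_n
ev = np.linalg.eigvalsh(t)
print(round(overall_sd, 3), np.isclose(coef, 1 / np.sqrt(n_k)))
print(ev.min(), ev.max())          # both close to -2 and 2
```

This also shows why $T$ alone is blind to this kind of structure: the zero off-diagonal blocks differ from the diagonal blocks only in variance, yet the spectrum of $T(W_n)$ still looks semicircular.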
For simplicity of notation, we denote the normalized weight matrix $T(W_n)$ by $W_n$. The matrix $W_n$ can be decomposed as $W_n = W'_n + M_n$, where $M_n$ is the normalized mean matrix

$$M_n = \frac{1}{\sqrt{n}} \begin{pmatrix} \boldsymbol{\mu}_{1,1} & \cdots & \boldsymbol{\mu}_{1,K} \\ \vdots & \ddots & \vdots \\ \boldsymbol{\mu}_{K,1} & \cdots & \boldsymbol{\mu}_{K,K} \end{pmatrix},$$

with $\boldsymbol{\mu}_{k,k'} = \mu_{k,k'} \mathbf{1}_{n_k} \mathbf{1}_{n_{k'}}^T$, $n_k$ the number of nodes in the $k$th cluster, and $\mathbf{1}_m$ an $m \times 1$ vector of ones. By the dual Weyl inequality [22, p.40] (note that both $W'_n$ and $M_n$ are symmetric matrices),

$$\lambda_{i+j-n}(W_n) \ge \lambda_i(W'_n) + \lambda_j(M_n),$$

where $\lambda_i(A)$ denotes the $i$th eigenvalue of matrix $A$ in descending order. Letting $i = n$ and $j = 1$,

$$\lambda_1(W_n) \ge \lambda_n(W'_n) + \lambda_1(M_n). \tag{9}$$

The eigenvalues of $W'_n$ follow the semicircular law by its definition. From the semicircular law and the Bai-Yin theorem [22, p.129] (the fourth moment exists because the moment-generating function exists), $\lambda_n(W'_n)$ converges almost surely to $-2$ as $n \to \infty$.

On the other hand, we evaluate a lower bound of $\lambda_1(M_n)$ as follows. The number of nodes is $n_k = r_k \times n$, where $r_k$ is the proportion of nodes in the $k$th cluster. The largest magnitude of the eigenvalues is given by the operator norm:

$$\max(|\lambda_1(M_n)|, |\lambda_n(M_n)|) = \sup_{|\mathbf{v}|=1} |M_n \mathbf{v}|. \tag{10}$$

For simplicity, we assume that the left-hand side is attained by the largest eigenvalue $\lambda_1(M_n)$ (the following argument applies equally when it is $-\lambda_n(M_n)$). We evaluate a lower bound of $\lambda_1(M_n)$ using Eq.(10). Letting $\mathbf{v} = (\mathbf{v}_1^T, \ldots, \mathbf{v}_K^T)^T$ with $\mathbf{v}_k = v_k \mathbf{1}_{n_k}$,

$$|M_n \mathbf{v}|^2 = \frac{1}{n} \sum_{k=1}^{K} n_k \Big( \sum_{k'=1}^{K} n_{k'} \mu_{k,k'} v_{k'} \Big)^2 = n^2 \sum_{k=1}^{K} r_k \Big( \sum_{k'=1}^{K} r_{k'} \mu_{k,k'} v_{k'} \Big)^2.$$

Hence,

$$\lambda_1(M_n) \ge \epsilon^2 \times n, \tag{11}$$

where $\epsilon^2 = \sum_{k=1}^{K} r_k \big( \sum_{k'=1}^{K} r_{k'} \mu_{k,k'} v_{k'} \big)^2$. Because of our assumption, some $\mu_{k,k'}$ in Eq.(11) are nonzero.
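The two ingredients of this step, the dual Weyl inequality (9) and the growth of $\lambda_1(M_n)$ with $n$, can be illustrated numerically. This sketch is an illustration, not part of the proof: it assumes Gaussian noise for $W'_n$, two equal clusters ($r_1 = r_2 = 1/2$), and the hypothetical block means $\mu = [[0.5, -0.5], [-0.5, 0.5]]$. Since `np.linalg.eigvalsh` returns eigenvalues in ascending order, $\lambda_1$ is its last entry and $\lambda_n$ its first.

```python
import numpy as np

def mean_matrix(sizes, mu):
    # M_n: block (k, k') filled with mu[k][k'], scaled by 1/sqrt(n).
    n = sum(sizes)
    blocks = [[np.full((a, b), mu[i][j]) for j, b in enumerate(sizes)]
              for i, a in enumerate(sizes)]
    return np.block(blocks) / np.sqrt(n)

def wigner(n, rng):
    # W'_n: symmetric unit-variance noise normalized by sqrt(n).
    a = rng.standard_normal((n, n))
    return (a + a.T) / np.sqrt(2 * n)

def weyl_demo(ns, mu, seed=0):
    rng = np.random.default_rng(seed)
    rows = []
    for n in ns:
        m = mean_matrix([n // 2, n - n // 2], mu)
        wp = wigner(n, rng)
        lam1_w = np.linalg.eigvalsh(wp + m)[-1]   # lambda_1(W_n)
        lamn_wp = np.linalg.eigvalsh(wp)[0]       # lambda_n(W'_n), near -2
        lam1_m = np.linalg.eigvalsh(m)[-1]        # lambda_1(M_n)
        # Eq.(9): lambda_1(W_n) >= lambda_n(W'_n) + lambda_1(M_n)
        holds = lam1_w >= lamn_wp + lam1_m - 1e-9
        rows.append((n, lam1_m, lam1_w, holds))
    return rows

mu = [[0.5, -0.5], [-0.5, 0.5]]
for n, lam1_m, lam1_w, holds in weyl_demo([100, 400, 900], mu):
    print(n, round(lam1_m, 2), round(lam1_w, 2), holds)
```

As $n$ grows, $\lambda_1(M_n)$ increases while $\lambda_n(W'_n)$ stays near $-2$, so the bound (9) pushes $\lambda_1(W_n)$ past the semicircle edge at 2.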
Hence, by choosing $\mathbf{v}$ appropriately, we obtain $\epsilon^2 \neq 0$. Therefore, the largest eigenvalue of $W_n$ in Eq.(9) grows without bound, and in particular exceeds 2, as $n \to \infty$. This implies the violation of the semicircular law that we aim to prove.

So far, we have assumed that the variances of the cluster blocks in $W$ are the same. We now relax this condition and assume that the variances differ. It then suffices to show that $\lambda_n(W'_n)$ in Eq.(9) is bounded below. We prove this for $K = 2$; the proof extends easily to $K > 2$. By the dual Weyl inequality,

$$\lambda_n(W'_n) \ge \lambda_n(W'_a) + \lambda_n(W'_b),$$

where

$$W'_a = \begin{pmatrix} W'_{11} & 0 \\ 0 & W'_{22} \end{pmatrix}, \qquad W'_b = \begin{pmatrix} 0 & W'_{12} \\ W'_{21} & 0 \end{pmatrix},$$

$W'_{i,j}$ is the submatrix of $W'$ for cluster block $(i,j)$, and $W'^{T}_{12} = W'_{21}$. Since each diagonal block of $W'_a$ follows the semicircular law up to scale, $\lambda_n(W'_a) = O(1)$. On the other hand, we fill the zero diagonal blocks of the symmetric matrix $W'_b$ with elements generated from the same distribution as its off-diagonal blocks:

$$W'_{b,\mathrm{arg}} = W'_b + W'_{b,\mathrm{diag}},$$

where $W'_{b,\mathrm{arg}}$ is the augmented matrix and $W'_{b,\mathrm{diag}}$ is the block-diagonal matrix of added elements. Again, by the dual Weyl inequality,

$$\lambda_n(W'_b) \ge \lambda_n(W'_{b,\mathrm{arg}}) + \lambda_n(-W'_{b,\mathrm{diag}}).$$

Since the eigenvalues of $W'_{b,\mathrm{arg}}$ and those of $-W'_{b,\mathrm{diag}}$ follow the semicircular law up to scale, $\lambda_n(W'_b)$ is bounded below. Hence, $\lambda_n(W'_n)$ is also bounded below.

Second, we consider the case in which the means are zero but some variances differ across cluster blocks. In such a case, we consider the exponential transformation of the edge weights $W$, $w_{i,j} \mapsto \exp(t_0 \times w_{i,j})$.
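To see concretely how the exponential transformation turns a difference in variances into a difference in means, take Gaussian edge weights (an assumption made only for this illustration). For $w \sim N(0, \sigma^2)$, $E[\exp(t_0 w)] = M(\sigma t_0) = \exp(\sigma^2 t_0^2 / 2)$, so two blocks with equal means but different variances acquire different means after the transformation. The sketch compares Monte Carlo means with this closed form for $\sigma = 1$ and $\sigma = 2$ at the illustrative value $t_0 = 0.2$; `gaussian_mgf` is a hypothetical helper.

```python
import numpy as np

def gaussian_mgf(t, mu=0.0, sigma=1.0):
    # M(t) = exp(mu*t) * M0(sigma*t) with M0(s) = exp(s^2 / 2) for N(0, 1)
    return np.exp(mu * t) * np.exp(0.5 * (sigma * t) ** 2)

t0 = 0.2
rng = np.random.default_rng(0)
block_a = rng.normal(0.0, 1.0, size=200_000)   # zero mean, variance 1
block_b = rng.normal(0.0, 2.0, size=200_000)   # zero mean, variance 4

# Means of the transformed weights exp(t0 * w) in each block
emp_a = np.exp(t0 * block_a).mean()
emp_b = np.exp(t0 * block_b).mean()
print(emp_a, gaussian_mgf(t0, sigma=1.0))   # both near exp(0.02)
print(emp_b, gaussian_mgf(t0, sigma=2.0))   # both near exp(0.08)

# The fourth moment of exp(t0 * w) equals M(4 * t0), finite for a Gaussian at any t0.
fourth_moment_b = gaussian_mgf(4 * t0, sigma=2.0)
```

After the transformation, the blocks differ in their means, which is exactly the situation handled by the first case of the proof.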
By definition, the expectation of the new variable $\exp(t_0 \times w_{i,j})$ is the moment-generating function $M_{k,k'}(t)$ of the distribution of $w_{i,j}$ in block $(k,k')$ evaluated at $t = t_0$, where $M_{k,k'}(t) = \exp(\mu_{k,k'} t)\, M(\sigma_{k,k'} t)$. In general, a probability distribution is uniquely determined by its moment-generating function on an open interval containing zero [19, p.155]. So, if some cluster blocks have different distributions, there exists a real $t_0 \neq 0$ near zero such that some of the $M_{k,k'}(t_0)$ differ. This means that some means of the new variables differ. Moreover, if we take $t_0$ small enough that $M_{k,k'}(4t_0)$ exists (i.e., at least the fourth moment exists for the exponentially transformed variables; this is always possible, because $t_0$ can be taken as small as one wants), then the argument developed for the first case of different means applies here as well, showing the violation of the semicircular law (because the Bai-Yin theorem is also applicable to this case). This completes the proof that if there is community structure, then the semicircular law is violated.

References

[1] Gérard Ben Arous and Alice Guionnet. Wigner matrices. In Gernot Akemann, Jinho Baik, and Philippe Di Francesco, editors, The Oxford Handbook of Random Matrix Theory, pages 433–451. Oxford University Press, 2011.
[2] Ginestra Bianconi, Paolo Pin, and Matteo Marsili. Assessing the relevance of node features for network structure. Proceedings of the National Academy of Sciences, 106(28):11433–11438, 2009.
[3] Peter J Bickel and Purnamrita Sarkar. Hypothesis testing for automated community detection in networks. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78(1):253–273, 2016.
[4] Rasmus M Birn, Ziad S Saad, and Peter A Bandettini. Spatial heterogeneity of the nonlinear dynamics in the fMRI BOLD response. NeuroImage, 14(4):817–826, 2001.
[5] Marianna Bolla. Spectral Clustering and Biclustering: Learning Large Graphs and Contingency Tables. John Wiley & Sons, 2013.
[6] Santo Fortunato. Community detection in graphs. Physics Reports, 486(3):75–174, 2010.
[7] David Gfeller, Jean-Cédric Chappelier, and Paolo De Los Rios. Finding instabilities in the community structure of complex networks. Physical Review E, 72(5):056135, 2005.
[8] Cho-Jui Hsieh, Kai-Yang Chiang, and Inderjit S Dhillon. Low rank modeling of signed networks. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 507–515. ACM, 2012.
[9] Brian Karrer, Elizaveta Levina, and Mark EJ Newman. Robustness of community structure in networks. Physical Review E, 77(4):046119, 2008.
[10] Jérôme Kunegis, Stephan Schmidt, Andreas Lommatzsch, Jürgen Lerner, Ernesto William De Luca, and Sahin Albayrak. Spectral analysis of signed graphs for clustering, prediction and visualization. In SDM, volume 10, pages 559–570. SIAM, 2010.
[11] Andrea Lancichinetti, Filippo Radicchi, and José J Ramasco. Statistical significance of communities in networks. Physical Review E, 81(4):046110, 2010.
[12] Madan Lal Mehta. Random Matrices, volume 142. Academic Press, 2004.
[13] Tom Mitchell. StarPlus fMRI data: subject04799, 2005.
[14] Mark EJ Newman. Finding community structure in networks using the eigenvectors of matrices. Physical Review E, 74(3):036104, 2006.
[15] Mark EJ Newman and Michelle Girvan. Finding and evaluating community structure in networks. Physical Review E, 69(2):026113, 2004.
[16] Andrew Y Ng, Michael I Jordan, Yair Weiss, et al. On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems, 2:849–856, 2002.
[17] Kenneth E Read. Cultures of the central highlands, New Guinea. Southwestern Journal of Anthropology, pages 1–43, 1954.
[18] Jörg Reichardt and Stefan Bornholdt. Statistical mechanics of community detection. Physical Review E, 74(1):016110, 2006.
[19] John Rice. Mathematical Statistics and Data Analysis. Cengage Learning, 2006.
[20] Geoff J Rodgers and Taro Nagao. Complex networks. In Gernot Akemann, Jinho Baik, and Philippe Di Francesco, editors, The Oxford Handbook of Random Matrix Theory, pages 898–911. Oxford University Press, Oxford, 2011.
[21] Martin Rosvall and Carl T Bergstrom. Mapping change in large networks. PLoS ONE, 5(1):e8694, 2010.
[22] Terence Tao. Topics in Random Matrix Theory, volume 132. American Mathematical Society, 2012.
[23] Craig A Tracy and Harold Widom. On orthogonal and symplectic matrix ensembles. Communications in Mathematical Physics, 177(3):727–754, 1996.
[24] Craig A Tracy and Harold Widom. The distributions of random matrix theory and their applications. In New Trends in Mathematical Physics, pages 753–765. Springer, 2009.
[25] Bo Yang, William K Cheung, and Jiming Liu. Community mining from signed social networks. IEEE Transactions on Knowledge and Data Engineering, 19(10):1333–1348, 2007.
[26] Wayne W Zachary. An information flow model for conflict and fission in small groups. Journal of Anthropological Research, pages 452–473, 1977.