A Unified Community Detection, Visualization and Analysis method

Community detection in social graphs has attracted researchers' interest for a long time. With the widespread use of social networks on the Internet, it has recently become an important research domain. Most contributions focus upon the definition of algo…

Authors: Michel Crampes, Michel Plantie

February 26, 2014 1:57 WSPC/INSTRUCTION FILE draftv20.0.2
Advances in Complex Systems
© World Scientific Publishing Company

A Unified Community Detection, Visualization and Analysis method

Michel Crampes, Michel Plantié
Ecole des Mines d'Ales, Parc Georges Besse, 30035 Nîmes Cedex

With the widespread use of social networks on the Internet, community detection in social graphs has recently become an important research domain. Interest was initially limited to unipartite graph inputs and partitioned community outputs. More recently, bipartite graphs, directed graphs and overlapping communities have all been investigated. Few contributions, however, have encompassed all three types of graphs simultaneously. In this paper we present a method that unifies community detection for these three types of graphs while at the same time merging partitioned and overlapping communities. Moreover, the results are visualized in a way that allows for analysis and semantic interpretation. For validation purposes, this method is tested on some well-known simple benchmarks and then applied to real data: photos and tags in Facebook and Human Brain Tractography data. This last application opens the possibility of applying community detection methods to other fields, such as data analysis, with original enhanced performances.

1. Introduction

Thanks to the growth of online social networks, community detection has become an important field of research in computer science. Many algorithms have been proposed (see several surveys on this topic in [8, 28, 34, 29]). Most of them take unipartite graphs as inputs and produce partitioned communities. In unipartite graphs any node may share an edge with any other node. Other contributions have also explored bipartite graphs and directed graphs. In bipartite graphs, nodes are separated into two sets and edges only exist between nodes of different sets.
In directed graphs each link has a start node and an end node. These authors generally introduce community detection methods which are specific to each type of graph, and sometimes to two types of graphs. In this paper we present a method that encompasses all three types of graphs simultaneously in a unique bipartite graph model. To this end, we consider Newman's modularity [25] and apply it to bipartite graphs. We show that this modularity model can be directly applied to bipartite graphs, with the side effect of structurally linking objects of both node sets in the same communities. This structural property is formally demonstrated in Annex 1. In a second step this model is transformed into a unipartite graph model. As a result, any community detection algorithm for unipartite graphs may be applied.

For our experiments we chose the so-called Louvain algorithm [3], which is known for its efficiency in producing partitioned communities from extensive datasets. It is also applicable to weighted and unweighted graphs. Our method extracts communities in which both types of nodes are associated. We show that this result is semantically pertinent, although it has been criticized by some authors [20, 13, 1] who think that there should not be the same number of communities in both sets. Moreover, associating both types of nodes in the same communities opens up new possibilities. It is possible to merge partitioned and quantified overlapping communities in a unique view and then analyze their structure from different perspectives. Indeed, most community detection algorithms, such as Louvain, use heuristics which lead to local optima. With our approach we can identify and explain the final organization and possibly correct some unwanted node assignments.
In the following, we use the term "semantics" to qualify entities which are described by properties or attributes. Community detection is driven by properties that are shared between entities, and consequently the resulting communities are semantically described by these properties. For validation and comparison with other authors, the whole method has been tested on small traditional unipartite and bipartite benchmarks. We have gained interesting insights which extend beyond known results. We then apply our method to real medium-sized bipartite graphs, a step that reveals significant properties such as overlapping communities, community compactness and the role of inter-community objects. These results are valuable when observed in data like the people-photo datasets targeted by our experiments. Beyond community detection, our method has also been applied to brain data extracted through 'tractography' by a team of neurologists and psycho-neurologists seeking to extract macro connections between different brain areas. Our results were compared with those they obtained when applying spectral clustering, a traditional data analysis method. Although the results were very similar, our method provided new insights for the analysis. In conclusion, we observe that after having borrowed algorithms from data analysis methods, community detection may in return offer new tools to these techniques. We also successfully applied our method to most standard unipartite and bipartite graph benchmarks.

The next section presents a state of the art on community detection methods for different types of graphs. Section 3 then focuses on a new method to unify all types of graphs; it uses a definition of modularity for bipartite graphs directly derived from modularity for unipartite graphs, which is presented in Annex 1 (Section 8).
Section 4 then demonstrates how our unifying method is particularly valuable for computing, visualizing and analyzing partitioned and overlapping communities. Section 5 presents several practical results on different types of graph datasets. The conclusion in Section 7 discusses the pros and cons of our method in the light of these experimental results.

2. State of the art

As stated above, several state-of-the-art reviews have already addressed the community detection problem [28, 29, 34, 8]. They mainly focus on unipartite graph partitioning. The calculation performed is based on maximizing a mathematical criterion, in most cases modularity [25], which favors a maximum number of connections within each community and a minimum number of links to other communities. Various methods have been developed to identify the optimum, e.g. greedy algorithms [23, 26], spectral analysis [24], or a search for the most central edges [25]. One of the most efficient greedy algorithms for extracting partitioned communities from large (and possibly weighted) graphs is Louvain [3]. Other new partitioned community detection methods are described in a very comprehensive state-of-the-art report [8].

Partitioning communities, despite being mathematically attractive, is not satisfactory to describe reality. Each individual has 'several lives' and usually belongs to several communities based on family, professional, and other activities. For this reason, other, more recent methods take into account the possibility of overlapping communities. The so-called k-clique percolation method [27] detects overlapping communities by allowing nodes to belong to multiple k-cliques.
A more recent method adapted to bipartite networks, and based on an extension of the k-clique community detection algorithm, is presented in [31]. Several methods use local fitness optimization [16][14]. The 'Label Propagation Algorithms' (LPA) are reported to be particularly efficient [12]. [16] uses a greedy clique expansion method to determine overlapping communities via a two-step process: identify separated cliques, then expand them into overlapping communities by optimizing a local fitness criterion. [7] derives n-order clique graphs from unipartite graphs to produce partitioned and overlapping communities using the Louvain algorithm. Some research has provided results in the form of hypergraph communities, such as [6, 5]. Other methods are found in scientific papers, yet most of them suffer from major problems due to computational complexity. More recently, Wu [33] proposed a fast overlapping community detection method for large real-world unipartite networks. The method in [7] presents some common features with ours, albeit with a different strategy, since it uses a traditional partitioning algorithm to extract overlapping communities.

When considering semantics, it becomes necessary to focus on bipartite or "multipartite" graphs, i.e. graphs whose nodes are divided into several subsets and whose edges only link nodes from different subsets. One example of this type of graph is the set of photos from a Facebook account along with their 'tags' [19]; another is the tripartite network of epistemic graphs [30] linking researchers, their publications and the keywords in these publications. Traditional methods transform the multipartite graph into a unipartite graph by linking two nodes whenever they share a common property. In doing so, however, semantics is lost.
Hence many researchers retain the multipartite graph properties by extending the notion of modularity to these types of graphs and then applying algorithms originally designed for unipartite graphs [32, 22, 1, 20, 7][18].

3. Unifying bipartite, directed and unipartite graphs

3.1. Bipartite graphs partitioning

3.1.1. Turning bipartite graphs into unipartite graphs.

In formal terms, a bipartite graph $G = (U, V, E)$ is a graph $G' = (N, E)$ where the node set $N$ is the union of two independent sets $U$ and $V$, and moreover the edges only connect pairs of vertices $(u, v)$ where $u$ belongs to $U$ and $v$ belongs to $V$:

$$N = U \cup V, \quad U \cap V = \emptyset, \quad E \subseteq U \times V.$$

Let $r = |U|$ and $s = |V|$; then $|N| = n = r + s$.

The unweighted biadjacency matrix of a bipartite graph $G = (U, V, E)$ is an $r \times s$ matrix $B$ in which $B_{i,j} = 1$ iff $(u_i, v_j) \in E$ and $B_{i,j} = 0$ iff $(u_i, v_j) \notin E$. Note that the row margins of $B$ represent the degrees of the nodes $u_i$, while the column margins represent the degrees of the nodes $v_j$. Conversely, in $B^t$, the transpose of $B$, the row margins represent the degrees of the nodes $v_j$ and the column margins represent the degrees of the nodes $u_i$. Let us now define the off-diagonal block square matrix $A'$:

$$A' = \begin{pmatrix} 0_r & B \\ B^t & 0_s \end{pmatrix}$$

where $0_r$ is an all-zero square matrix of order $r$ and $0_s$ is an all-zero square matrix of order $s$. This symmetric matrix is the adjacency matrix of the unipartite graph $G'$, in which node types are not distinguished. It is therefore possible to apply to $G'$ any algorithm for extracting communities from unipartite graphs. $A'$ is also the off-diagonal adjacency matrix of the bipartite graph $G$. Consequently, the communities detected in $G'$ are also detected in $G$.
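The construction of $A'$ from $B$ is mechanical. The following sketch, in plain Python with no external dependencies, builds the block matrix and checks that its row margins are the degrees of the $u_i$ followed by the degrees of the $v_j$. The helper names `block_adjacency` and `degrees` are our own, for illustration only, and placing $v_j$ at index $r + j$ is an assumption of this sketch.

```python
from typing import List

Matrix = List[List[int]]


def block_adjacency(B: Matrix) -> Matrix:
    """Build A' = [[0_r, B], [B^t, 0_s]] from an r x s biadjacency matrix B."""
    r, s = len(B), len(B[0])
    n = r + s
    A = [[0] * n for _ in range(n)]
    for i in range(r):
        for j in range(s):
            # u_i keeps index i; v_j is placed at index r + j.
            A[i][r + j] = B[i][j]  # upper-right block: B
            A[r + j][i] = B[i][j]  # lower-left block: B^t
    return A


def degrees(A: Matrix) -> List[int]:
    """Row margins of a square adjacency matrix, i.e. the node degrees."""
    return [sum(row) for row in A]


# Toy bipartite graph (hypothetical): U = {u0, u1}, V = {v0, v1, v2}.
B = [[1, 1, 0],
     [0, 1, 1]]
A_prime = block_adjacency(B)
print(degrees(A_prime))  # [2, 2, 1, 2, 1]: row margins of B, then column margins
```

The two diagonal blocks stay at zero, which is exactly what prevents any edge between two nodes of the same type.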
The question is to determine the validity of this side effect: what is the quality of the partitioning of $G$ when a unipartite graph partitioning algorithm is applied to $G'$? Barber [1] and Liu/Murata [18] have also introduced the block matrix as a way of detecting communities in bipartite graphs. However, as we see below, they do not draw all the consequences of this approach.

3.1.2. Extending modularity to bipartite graphs

Modularity is an indicator often used to measure the quality of graph partitions [25]. First defined for unipartite graphs, several modularity variants have been proposed for bipartite graph partitioning and overlapping communities. More recently, several authors introduced modularity into bipartite graphs using a probabilistic analogy with the modularity for unipartite graphs, which will be discussed below. However, when unipartite graph modularity optimization algorithms are applied to bipartite graphs, what they optimize is another expression of probabilistic modularity, presented hereafter.

Let $G = (U, V, E)$ be a bipartite graph with biadjacency matrix $B$, and $G'$ the unipartite graph with the off-diagonal block adjacency matrix $A'$. Consider Newman's modularity [25] for this graph $G'$. It is a function $Q$ of both the matrix $A'$ and the communities detected in $G'$:

$$Q = \frac{1}{2m} \sum_{i,j} \left[ A'_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j) \qquad (1)$$

where $A'_{ij}$ represents the weight of the edge between $i$ and $j$, $k_i = \sum_j A'_{ij}$ is the sum of the weights of the edges attached to vertex $i$, $c_i$ denotes the community to which vertex $i$ is assigned, the Kronecker function $\delta(u, v)$ equals 1 if $u = v$ and 0 otherwise, and $m = \frac{1}{2} \sum_{ij} A'_{ij}$. Hereafter we only consider binary graphs, whose weights are equal to 0 or 1.
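To make Equation (1) concrete, the sketch below (plain Python; all helper names are hypothetical) evaluates it directly on $A'$ and compares the result with the per-community form that the text gives next as Equation (3), reading $d_{u|c}$ and $d_{v|c}$ as the total degrees of the U-nodes and V-nodes in community $c$. Since $A'$ has no self-loops, the two computations should agree for any partition.

```python
from typing import List

Matrix = List[List[int]]


def block_adjacency(B: Matrix) -> Matrix:
    """A' = [[0_r, B], [B^t, 0_s]], as in Section 3.1.1."""
    r, s = len(B), len(B[0])
    A = [[0] * (r + s) for _ in range(r + s)]
    for i in range(r):
        for j in range(s):
            A[i][r + j] = A[r + j][i] = B[i][j]
    return A


def newman_modularity(A: Matrix, comm: List[int]) -> float:
    """Equation (1): 1/(2m) * sum_ij [A_ij - k_i k_j / (2m)] * delta(c_i, c_j)."""
    k = [sum(row) for row in A]
    two_m = sum(k)  # each edge is counted twice in a symmetric A
    q = 0.0
    for i in range(len(A)):
        for j in range(len(A)):
            if comm[i] == comm[j]:
                q += A[i][j] - k[i] * k[j] / two_m
    return q / two_m


def community_form(B: Matrix, comm_u: List[int], comm_v: List[int]) -> float:
    """Per-community form: sum_c [ |e_c|/m - ((d_u|c + d_v|c) / (2m))^2 ]."""
    m = sum(map(sum, B))
    e, d = {}, {}  # edges inside c; total degree of U- and V-nodes in c
    for i, row in enumerate(B):
        d[comm_u[i]] = d.get(comm_u[i], 0) + sum(row)
        for j, b in enumerate(row):
            d[comm_v[j]] = d.get(comm_v[j], 0) + b  # accumulates column margins
            if b and comm_u[i] == comm_v[j]:
                e[comm_u[i]] = e.get(comm_u[i], 0) + 1
    return sum(e.get(c, 0) / m - (d[c] / (2 * m)) ** 2 for c in d)


# Two disjoint K_{2,2}: {u0, u1} x {v0, v1} and {u2, u3} x {v2, v3}.
B = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]
comm_u, comm_v = [0, 0, 1, 1], [0, 0, 1, 1]
print(newman_modularity(block_adjacency(B), comm_u + comm_v))  # 0.5
print(community_form(B, comm_u, comm_v))                        # 0.5
```

The per-community version never builds $A'$, which is why the bipartite form is convenient when $U$ and $V$ are large.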
After several transformations, we show (see Annex 1, Section 8) that this modularity can also be written using the biadjacency matrix $B$ of the bipartite graph $G = (U, V, E)$:

$$Q_B = \frac{1}{m} \sum_{ij} \left[ B_{ij} - \frac{(k_i + k_j)^2}{4m} \right] \delta(c_i, c_j) \qquad (2)$$

where $k_i$ is the margin of row $i$ in $B$, $k_j$ the margin of column $j$ in $B$, and $m = \sum_{ij} B_{ij} = \frac{1}{2} \sum_{ij} A'_{ij}$, i.e. the same $m$ as in (1). Another useful formulation is the following (Annex 1, Section 8):

$$Q_B = \sum_c \left[ \frac{|e_c|}{m} - \left( \frac{d_{u|c} + d_{v|c}}{2m} \right)^2 \right] \qquad (3)$$

where $|e_c|$ is the number of edges in community $c$, and $d_{u|c}$ (resp. $d_{v|c}$) is the total degree of the nodes of $U$ (resp. $V$) belonging to $c$. This formulation is the same as Newman's modularity, with more detailed information: it explicitly shows that both sets of nodes are structurally associated in the same communities.

Since in the general case $B$ is not symmetric, this definition characterizes modularity for bipartite graphs after their extension into unipartite graphs. It then becomes possible to apply any partitioning algorithm for unipartite graphs to the matrix $A'$ and obtain a result where both types of nodes are bound in the same communities, except in the case of singletons (i.e. nodes without edges). This definition, derived from unipartite graph modularity and able to bind both types of nodes, is compared in Section 3.2 with other authors' modularity models for bipartite graphs.

3.1.3. Turning oriented graphs into bipartite graphs.

A directed graph is of the form $G_d = (N, E_d)$, where $N$ is a set of nodes and $E_d$ is a set of ordered pairs of nodes belonging to $N$: $E_d \subseteq N \times N$.
From the model in (1), Leicht [17] uses probabilistic reasoning to derive the following modularity for directed networks:

$$Q = \frac{1}{m} \sum_{ij} \left[ A_{ij} - \frac{k^{in}_i k^{out}_j}{m} \right] \delta(c_i, c_j) \qquad (4)$$

where $k^{in}_i$ and $k^{out}_j$ are the in- and out-degrees of vertices $i$ and $j$, $A$ is the asymmetric adjacency matrix, and $m = \sum_{ij} A_{ij} = \sum_i k^{in}_i = \sum_j k^{out}_j$. Symmetry is then restored and spectral optimization applied to extract non-overlapping communities. This model leads to a node partition that does not distinguish between the in and out roles; the nodes are simply clustered within the various communities.

To compare these authors' method to ours, we transformed directed graphs into bipartite graphs (this transformation was also suggested in Guimera's work [13] when applying their method for bipartite networks to directed graphs, as will be seen below). At this point, let us differentiate the nodes' roles in $N \times N$. To this end, we duplicate $N$ and consider two identical sets $N_{out}$ and $N_{in}$. The original directed graph $G_d$ is transformed into a bipartite graph $G = (N_{out}, N_{in}, E)$ in which nodes appear twice depending on their 'out' or 'in' role, and the asymmetric adjacency matrix $A$ plays the role of the biadjacency matrix $B$ of bipartite graphs. We can now define modularity for directed graphs as follows:

$$Q_B = \frac{1}{m} \sum_{ij} \left[ A_{ij} - \frac{(k^{in}_i + k^{out}_j)^2}{4m} \right] \delta(c_i, c_j) \qquad (5)$$

After applying any algorithm for unipartite graphs to the corresponding adjacency matrix $A'$, we obtain a partition in which the two copies of a node may belong to the same community or instead appear in two different communities. Each model has its pros and cons. Leicht's model [17] is preferable when seeking a single partition with no role distinction. Our model is attractive when seeking to distinguish between 'in' and 'out' roles, e.g.
between producers and customers, where anyone can play either role. The brain data example that follows will demonstrate that our model is particularly well suited to analyzing real data.

3.1.4. Turning unipartite graphs into bipartite graphs

In the above presentation, we introduced modularity for bipartite graphs as a formal derivative of unipartite graph modularity. Dually, it is possible to consider unipartite graphs as bipartite graphs, and extract communities as if unipartite graphs were bipartite graphs. To proceed, we consider the original symmetric adjacency matrix $A$ as a biadjacency matrix $B$ (with the same nodes on both dimensions) and build a new adjacency matrix $A'$ using the original adjacency matrix $A$ twice on the off-diagonal, as if the nodes had been cloned. When applying a unipartite graph partitioning algorithm, we then obtain communities in which all nodes appear twice. This method only works if we add the identity matrix $I$ (with the same dimensions as $A$) to $A$ before building $A'$. The main diagonal of $A$ in fact only contains 0s, since loops are generally absent from a unipartite graph adjacency matrix. Semantically, adding $I$ to $A$ means that all objects are linked to their respective clones in $A'$. This is a necessary step: when extracting communities, the objects must drag their clones into the same communities in order to maintain connectivity. In practice, therefore, for unipartite graphs we build $A'$ with $A + I$. It may seem futile to perform such a transformation from a unipartite graph to a bipartite one in order to find communities in unipartite graphs, given that, to compute bipartite graph partitioning, we have already made the extension into unipartite graphs using their (symmetric) adjacency matrix.
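Both transformations described in Sections 3.1.3 and 3.1.4 amount to choosing the biadjacency matrix that is fed into the block construction of Section 3.1.1: the asymmetric adjacency matrix itself for a directed graph, and $A + I$ for a unipartite graph. The sketch below illustrates this in plain Python. The function names are hypothetical, and we assume that `D[i][j] = 1` encodes an edge from $i$ to $j$, so that rows index the 'out' copies and columns the 'in' copies.

```python
from typing import List

Matrix = List[List[int]]


def block_adjacency(B: Matrix) -> Matrix:
    """A' = [[0, B], [B^t, 0]] for an r x s biadjacency matrix B (Section 3.1.1)."""
    r, s = len(B), len(B[0])
    A = [[0] * (r + s) for _ in range(r + s)]
    for i in range(r):
        for j in range(s):
            A[i][r + j] = A[r + j][i] = B[i][j]
    return A


def directed_to_block(D: Matrix) -> Matrix:
    """Section 3.1.3: the asymmetric adjacency matrix D itself plays the role of B.

    Assumes D[i][j] = 1 encodes an edge i -> j, so indices 0..n-1 are the 'out'
    copies (N_out) and indices n..2n-1 are the 'in' copies (N_in).
    """
    return block_adjacency(D)


def unipartite_to_block(A: Matrix) -> Matrix:
    """Section 3.1.4: B = A + I, so that every node is linked to its own clone."""
    n = len(A)
    B = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    return block_adjacency(B)


# Directed 3-cycle 0 -> 1 -> 2 -> 0.
D = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]
Ad = directed_to_block(D)
print(Ad[0][3 + 1])  # 1: 'out' copy of node 0 linked to 'in' copy of node 1

# Undirected path 0 - 1 - 2.
U = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
Au = unipartite_to_block(U)
print([sum(row) for row in Au])  # [2, 3, 2, 2, 3, 2]
```

In `Au`, each original degree is increased by one, reflecting the extra edge from each node to its clone introduced by $I$.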
This transformation is nonetheless worthwhile for several reasons. First, when appearing twice, nodes should be associated with their clones. If the resulting communities do not display this property, i.e. a node's clone lies in another community, then the original matrix is not symmetric and can be considered as the adjacency matrix of a directed graph. This conclusion has been applied to the clustering of the human brain tractography data, which is described in the experimental section below. Conversely, if we are sure that the original adjacency matrix is symmetric, then a result in which all nodes are associated with their clones in the same communities is a good indicator of the quality of the clustering algorithm, and moreover provides an opportunity to compare our bipartite graph approach with other unipartite graph strategies. We also used this method in our experiments (see the karate and other applications below) to verify the validity of the results. Lastly, the most important benefit is the ability to build overlapping communities and ownership functions for unipartite graphs using the method explained in Section 4 below. Although transforming unipartite graphs into bipartite graphs requires more computation, it also provides considerable information, opening the way to semantic interpretation, which justifies its application in a variety of contexts.

3.2. Comparison with other modularity models and partitioning algorithms for bipartite graphs

Most modularity models proposed in the literature for bipartite graphs are inspired by Newman's modularity for unipartite graphs. In some of them the objective is to distinguish the number of communities for each type of node [13] [21][32].
However, there is a recent consensus on a probabilistic null model introduced by Barber [1], which is very close to Newman's original null model for unipartite graphs [18]. Although these authors introduce the same block matrix as we do, their modularity model differs from ours. After small transformations to unify notation, Barber's model (see [2], equation 19) is the following:

$$Q_{Bb} = \frac{1}{m} \sum_{i,j} \left[ B_{ij} - \frac{k_i k_j}{m} \right] \delta(c_i, c_j) \qquad (6)$$

This model is slightly different from ours:

$$Q_B = \frac{1}{m} \sum_{ij} \left[ B_{ij} - \frac{(k_i + k_j)^2}{4m} \right] \delta(c_i, c_j) \qquad (7)$$

The formal difference is obvious and deserves some comments. Our modularity expression is formally derived from Newman's unipartite modularity model (see Annex 1). As shown by Equation (3), it amounts to considering bipartite graphs as unipartite graphs in which both types of nodes behave the same. We are therefore inclined to directly apply unipartite graph algorithms based on this model and expect modularity optimization. Conversely, [1][18], although they consider the same block matrix as we do, specify a different null model, which is conceptually sound but not the result of a direct mathematical derivation from the unipartite model. Therefore either these two definitions are equivalent in terms of final optimization or, if they are not, Barber's model should be used with specific algorithms for bipartite graphs, or with algorithms for unipartite graphs adapted to bipartite graphs. If their interpretations differ, the effects of using one formula or the other can be observed from two perspectives: 1) the number of communities in each set, 2) the distribution of each node type in the communities. According to our definition of modularity, both types of nodes are explicitly bound.
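The formal difference between Equations (6) and (7) can also be observed numerically. The sketch below (plain Python; helper names are ours) evaluates Barber's modularity (6) and our modularity in its per-community form (3) on the same toy partition, which is our own example and not data from the paper. Grouping the terms of (6) by community gives $\sum_c [|e_c|/m - d_{u|c}\, d_{v|c}/m^2]$, so for each community the two null-model terms differ by $(d_{u|c} - d_{v|c})^2 / (4m^2)$: the two values coincide when every community carries as much U-degree as V-degree, and otherwise ours is the smaller one.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def margins(B: Matrix) -> Tuple[List[int], List[int], int]:
    """Row margins k_i, column margins k_j, and m = sum_ij B_ij."""
    k_u = [sum(row) for row in B]
    k_v = [sum(col) for col in zip(*B)]
    return k_u, k_v, sum(k_u)


def barber_q(B: Matrix, cu: List[int], cv: List[int]) -> float:
    """Equation (6): 1/m * sum_ij [B_ij - k_i k_j / m] * delta(c_i, c_j)."""
    k_u, k_v, m = margins(B)
    return sum(B[i][j] - k_u[i] * k_v[j] / m
               for i in range(len(B)) for j in range(len(B[0]))
               if cu[i] == cv[j]) / m


def unified_q(B: Matrix, cu: List[int], cv: List[int]) -> float:
    """Per-community form (3): sum_c [ |e_c|/m - ((d_u|c + d_v|c) / (2m))^2 ]."""
    k_u, k_v, m = margins(B)
    q = 0.0
    for c in set(cu) | set(cv):
        e_c = sum(B[i][j] for i in range(len(B)) for j in range(len(B[0]))
                  if cu[i] == c and cv[j] == c)
        d_c = sum(k for k, ci in zip(k_u, cu) if ci == c) \
            + sum(k for k, cj in zip(k_v, cv) if cj == c)
        q += e_c / m - (d_c / (2 * m)) ** 2
    return q


# Toy graph: u0 - {v0, v1, v2}, u1 - {v2}; communities {u0, v0, v1} and {u1, v2}.
B = [[1, 1, 1],
     [0, 0, 1]]
cu, cv = [0, 1], [0, 0, 1]
print(barber_q(B, cu, cv), unified_q(B, cu, cv))  # 0.25 0.21875
```

Whether the two criteria also steer optimization algorithms toward the same partitions is a separate question, which this small example does not settle.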
Consequently, when any unipartite graph algorithm is applied to detect communities, both types of nodes should have the same number of communities and, except for singletons, they should be grouped into the same communities (a type of node should not be isolated in a community). This side effect is not explicit in Equation (6). However, since in this equation $\delta(c_i, c_j)$ specifies that the summation is applied to objects of both types belonging to the same community, the side effect is the same: optimizing the standard bipartite graph modularity should yield a partitioning of both types of nodes into the same communities (this analysis is also found in [21]: "This definition implicitly indicates that the numbers of communities of both types are equal"). Both modularities should then produce the same results in terms of node type distribution.

As far as the number of communities and node ownership are concerned, it is more difficult to compare the results of both models, in particular if different algorithms are applied depending on the selected model. For instance, in the Southern Women experiment described below, we found 3 communities when applying Louvain, while Murata in [18] found four communities using their original LPAb+ algorithm. These authors, however, only provided a quantitative evaluation via comparison with other algorithms on computational performance and modularity optimization; in contrast, we also provide a qualitative analysis hereafter, which allows for a semantic justification of the partitioning, as will be shown in the next section.

4. Detection and analysis of community overlapping

4.1. Adding semantics to communities

The fact that both types of nodes are bound in their communities yields several important results.
First, considering one type of node, a community can be defined by associating it with a subset of nodes of the other type. In other words, nodes from one set provide sense and semantics for the grouping of nodes from the other set, and moreover may qualitatively explain groupings, as will be seen below. This semantic perspective has not been considered by any of the other authors, because in other contributions either the number of communities differs for the two types of nodes (e.g. [20]), or, when both types of nodes contain the same number of communities, they are not bound in each community [13, 1].

Binding both types of nodes into the same communities yields other pertinent results. For one thing, it is possible to define belonging functions and consequently obtain quantified overlapping communities. In the following discussion, we consider three possible belonging functions, each of which may expose community overlapping in a different light.

4.2. Probabilistic function

Let us adopt the Southern Women benchmark, which will be more thoroughly described in Section 5.3 below. Applying the Louvain community detection algorithm for unipartite graphs yields a partition in which Women and Events are grouped into three exclusive communities. Let us call these communities $c_1$, $c_2$ and $c_3$. Now, let us suppose the fictitious case in which woman $w_1$ participated in events $e_1$, $e_2$, $e_3$ and $e_4$. Furthermore, $w_1$, $e_1$ and $e_2$ are classified in $c_1$, while $e_3$ is classified in $c_2$ and $e_4$ is classified in $c_3$. We can then define a probability function as follows:

$$P(u_i \in c) = \frac{1}{k_i} \sum_j B_{ij}\, \delta(c_j) \qquad (8)$$

where $c$ is a community, $k_i = \sum_j B_{ij}$, and $\delta(c_j) = 1$ if $v_j \in c$ or $\delta(c_j) = 0$ if $v_j \notin c$. In $P(u_i \in c)$, the numerator counts all edges linking $u_i$ to properties $v_j \in c$, and the denominator counts all edges linking $u_i$ to all other nodes.
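Equation (8) only needs the row of $B$ for $u_i$ and the community of each $v_j$. The following sketch (plain Python; the function name and the list encoding are illustrative assumptions) reproduces the fictitious $w_1$ example with exact fractions, so the probabilities over all communities sum to 1.

```python
from fractions import Fraction
from typing import Dict, List


def membership_probability(row: List[int], comm_v: List[str]) -> Dict[str, Fraction]:
    """Equation (8): share of u_i's k_i edges that lead to properties v_j in each community."""
    k_i = sum(row)
    probs: Dict[str, Fraction] = {}
    for b, c in zip(row, comm_v):
        if b:  # edge (u_i, v_j) exists, so it counts toward community c = c_j
            probs[c] = probs.get(c, Fraction(0)) + Fraction(1, k_i)
    return probs


# Fictitious example from the text: w1 attended e1..e4;
# e1 and e2 are in c1, e3 is in c2 and e4 is in c3.
comm_v = ["c1", "c1", "c2", "c3"]
w1 = [1, 1, 1, 1]
print(membership_probability(w1, comm_v))
# {'c1': Fraction(1, 2), 'c2': Fraction(1, 4), 'c3': Fraction(1, 4)}
```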
With this function, in the present example, the probability of $w_1$ being classified in community $c_1$ equals 2/4, and her probabilities of being classified in $c_2$ and in $c_3$ are 1/4 each. The probability that a node belongs to a given community is the proportion of its links that go to this community out of its total number of links to all communities. In other words, the greater the proportion of links to a given community, the higher the expectation of belonging to this community.

4.3. Legitimacy function and overlapping communities

It is possible to add more meaning in order to decide which community a given node should join. The legitimacy function measures the involvement of a node in a community and is used to show community overlapping. The more strongly a node is linked to other nodes in a community, the greater its legitimacy to belong to that community. In the Southern Women example, let us assume that after partitioning, $c_1$ contains 7 events, $c_2$ 5 events and $c_3$ 2 events (which is actually the case in the experiment presented below). Then $w_1$ would have a legitimacy of 2/7 for $c_1$, 1/5 for $c_2$ and 1/2 for $c_3$. The legitimacy function can thus be formalized as follows:

$$L(u_i \in c) = \frac{\sum_j B_{ij}\, \delta(c_j)}{|\{v \in c\}|} \qquad (9)$$

where $c$ is a community, $\delta(c_j) = 1$ if $v_j \in c$ and $\delta(c_j) = 0$ if $v_j \notin c$. The numerator in this expression is the same as the numerator of the probabilistic function; only the denominator differs.

4.4. Reassignment Modularity function

Reassigning a node $w$ from $C_1$ to $C_2$ either increases or decreases the modularity defined in Equation (2). Such a change is referred to as Reassignment Modularity ($RM_{w: C_1 \to C_2}$). The full development of this expression is given in Annex 2 (cf. Section 9).
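Both the legitimacy function (9) and reassignment modularity depend only on how a node's links are spread over communities. The sketch below (plain Python with exact fractions; every function name and the toy graph are hypothetical, not taken from the paper) reproduces the legitimacy values of the fictitious $w_1$ example, implements the closed form given next as Equation (10), and checks it against a brute-force recomputation of the per-community modularity (3) before and after the move. It reads $l_{w|k}$ as the number of links between $w$ and community $C_k$, $d_w$ as the degree of $w$, and $d_{C_k}$ as the total degree of $C_k$ before the move; this reading of the notation is our assumption.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def legitimacy(row: List[int], comm_v: List[str]) -> Dict[str, Fraction]:
    """Equation (9): links from u_i into c, divided by the number of V-nodes in c."""
    size: Dict[str, int] = {}
    links: Dict[str, int] = {}
    for b, c in zip(row, comm_v):
        size[c] = size.get(c, 0) + 1
        links[c] = links.get(c, 0) + b
    return {c: Fraction(links[c], size[c]) for c in size if links[c]}


def modularity(edges: List[Edge], comm: Dict[str, str]) -> Fraction:
    """Per-community form (3): sum_c [ |e_c|/m - (D_c / (2m))^2 ]."""
    m = len(edges)
    e: Dict[str, int] = {}
    deg: Dict[str, int] = {}
    for a, b in edges:
        for x in (a, b):
            deg[comm[x]] = deg.get(comm[x], 0) + 1
        if comm[a] == comm[b]:
            e[comm[a]] = e.get(comm[a], 0) + 1
    return sum((Fraction(e.get(c, 0), m) - Fraction(d, 2 * m) ** 2
                for c, d in deg.items()), Fraction(0))


def total_degree(edges: List[Edge], comm: Dict[str, str], c: str) -> int:
    """d_C: sum of the degrees of the nodes currently assigned to community c."""
    return sum((comm[a] == c) + (comm[b] == c) for a, b in edges)


def reassignment_modularity(edges: List[Edge], comm: Dict[str, str], w: str, c2: str) -> Fraction:
    """Closed form (10) for moving w from its current community C1 to C2 (C2 != C1)."""
    c1, m = comm[w], len(edges)
    nbrs = [b if a == w else a for a, b in edges if w in (a, b)]
    d_w = len(nbrs)
    l1 = sum(comm[x] == c1 for x in nbrs)  # l_{w|1}
    l2 = sum(comm[x] == c2 for x in nbrs)  # l_{w|2}
    d1, d2 = total_degree(edges, comm, c1), total_degree(edges, comm, c2)
    return Fraction(l2 - l1, m) - Fraction(d_w * d_w + d_w * (d2 - d1), 2 * m * m)


def unstable_nodes(edges: List[Edge], comm: Dict[str, str]) -> Dict[str, Tuple[str, Fraction]]:
    """Nodes for which some move has a positive RM, with the best target community."""
    found = {}
    for w, c1 in comm.items():
        gains = {c: reassignment_modularity(edges, comm, w, c)
                 for c in sorted(set(comm.values())) if c != c1}
        best = max(gains, key=gains.get, default=None)
        if best is not None and gains[best] > 0:
            found[w] = (best, gains[best])
    return found


# Legitimacy: the fictitious w1, with c1, c2, c3 holding 7, 5 and 2 events.
comm_events = ["c1", "c1", "c2", "c3"] + ["c1"] * 5 + ["c2"] * 4 + ["c3"]
w1_row = [1, 1, 1, 1] + [0] * 10
print(legitimacy(w1_row, comm_events))
# {'c1': Fraction(2, 7), 'c2': Fraction(1, 5), 'c3': Fraction(1, 2)}

# RM: a toy bipartite graph (hypothetical) in which event e3 sits in community A.
edges = [("w1", "e1"), ("w1", "e2"), ("w2", "e1"), ("w2", "e2"),
         ("w3", "e3"), ("w1", "e3")]
comm = {"w1": "A", "w2": "A", "e1": "A", "e2": "A", "e3": "A", "w3": "B"}
print(reassignment_modularity(edges, comm, "e3", "B"))  # 2/9
```

In `unstable_nodes`, each candidate move is evaluated against the current partition, independently of the other moves; in this toy graph both `e3` and `w3` come out with a positive RM, which is the kind of unstable assignment a greedy algorithm can leave behind.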
After simplification, this expression becomes:

$$RM_{w: C_1 \to C_2} = \frac{1}{m} \left( l_{w|2} - l_{w|1} \right) - \frac{1}{2m^2} \left[ d_w^2 + d_w \left( d_{C_2} - d_{C_1} \right) \right] \qquad (10)$$

where $l_{w|k}$ is the number of links between $w$ and the nodes of $C_k$, $d_w$ is the degree of $w$, and $d_{C_k}$ is the total degree of community $C_k$ before the move. Reassignment is a very useful measure: it allows the detection of nodes that are not properly assigned to a community. Since most community detection algorithms are greedy, some nodes may not be in a stable situation. The RM value reveals unstable nodes and the community to which they should be assigned.

5. Experimentation

This section considers several benchmarks from various sources. We begin by applying our method to two simple graphs: the so-called "karate club" unipartite graph from [35], which shows friendship relations between members of a karate club, and the "Southern Women" bipartite graph, which depicts relations between Southern American women participating in several events. Our method is then applied to a medium-sized dataset extracted from a real-world situation. For this purpose, we consider a bipartite graph (people tagged on photos) drawn from a student's "Facebook" account containing an average number of photos and people. Lastly, the same method is applied to human brain data in order to derive dependencies between several areas of the brain. We also applied our method to several well-known unipartite and bipartite graph benchmarks, as well as to large benchmarks.

5.1. Unipartite graph: Karate club

The karate club graph [35] is a well-known benchmark showing friendship relations between members of a karate club; it is a unipartite graph on which many partitioning algorithms have been tested.
Consequently, this set-up makes it possible not only to verify that our method for bipartite graphs meets expected results when applied to unipartite graphs, but also to assess the additional knowledge extracted from overlapping. We began by directly applying the Louvain algorithm to the original unipartite graph, represented by its adjacency matrix $A$, which yielded four separate communities (as shown in Figure 2). These are the same communities extracted by other authors, e.g. [25]. In a second experiment, we considered the adjacency matrix $A$ as a biadjacency matrix $B$ representing a bipartite graph whose objects are the club members and whose properties are also club members. An edge exists in the bipartite graph between a club member-object and a club member-property provided an edge is present between the two club members in the original unipartite graph. The new adjacency matrix is

$$A' = \begin{pmatrix} 0_r & B \\ B^t & 0_s \end{pmatrix}, \quad B = A + I,$$

where $I$ is the identity matrix (as explained in Section 3.1.4). We once again applied the Louvain algorithm to $A'$.

Results. As expected, the same four communities identified in the unipartite graph were extracted from the bipartite graph, with the same individuals appearing twice in each community (see Figure 2). This initial result confirms the absence of bias when transforming a unipartite graph into a bipartite one. The second result is more pertinent because it reveals an overlap between communities when legitimacy values are considered. Considering just the cell colorings in the figure, an overlap would be observable whenever at least one node of a community is linked to nodes in another community. The legitimacy values, which indicate the involvement of each node in each community, offer an effective tool for identifying and analyzing new features.
Some slight differences from other authors' work were noted. For example, on page 2, Porter [29] placed node number 10 in the second community. In our case, this node was placed in the first community, although its legitimacy value suggests that it could have been placed in the second one. In that case the situation would be reversed, and node 10 would have a legitimacy value favoring its placement in the first community. Node 10 thus hesitates between the two communities.

Fig. 1. Karate club graph with partitioned communities
Fig. 2. Karate club communities and legitimacy measures

To the best of our knowledge, this experiment is the first in which the Karate communities are shown as both separate and overlapping. Partitioning provides a practical way to observe communities. Overlapping, however, reveals how much of the initial information partitioning discards. With our method, for example, it can be seen that some nodes actually straddle several communities, such as node 10 in our experiment.

5.2. Unipartite graphs: other known benchmarks

We have applied our method to several other well-known unipartite graphs, such as Dolphins, and to other benchmark graphs such as those in [11]. As in the Karate case, we obtain the same community partitions as Newman's algorithm [25]. Lancichinetti et al. [15] proposed a well-known algorithm to generate benchmark graphs with well-identified communities, which has also been used by [29, 8, 12] and others. We used this algorithm to generate such graphs with 30, 128, 500 and 1000 nodes to test our algorithms and assess the efficiency of our method. We find the same number of communities as Newman's algorithm, since the modularity formula we use is directly derived from Newman's. We also obtain the same analysis and results as in [15].
In addition, our method provides supplementary data on node overlapping in these communities. Modularity has a limited resolution that depends on the number of edges in the network [9]. We observed a main consequence of this resolution limit: the modules in large networks may contain hidden substructures that require deeper investigation to reveal.

5.3. Bipartite graph: Southern Women

This benchmark has been studied by most authors interested in checking their partitioning algorithms on bipartite graphs. The goal is to partition 18 women who attended 14 social events into groups according to their participation in these events. In his well-known meta-analysis, Freeman [10] compared results from 21 authors, most of whom identified two groups.

Results. In Figure 3, the bipartite graph is depicted in the middle as a bi-layer graph, with women at the top and events below. The edges between women and events represent participations. We found three clusters with their associated women and events, shown in red, blue and yellow. This result is more accurate than the majority of results presented in [10]; only one author found three communities of women. Beyond mere partitioning, Figure 3 presents overlapping communities using two overlapping functions: legitimacy and reassignment modularity (RM). The legitimacy and RM values for women are placed just above the partition of women. For events, both are shown symmetrically below the partition of events. As expected, reassignment to the same community produces a zero RM value. The best legitimacy and RM values are underlined.
Only the values of woman 8 and event 8 indicate that they could have been in another community. This results from early assignment during the first Louvain phase, which affects entities with equal or nearly equal probabilities across several communities. It can be observed in [10] that several authors also disagree about woman 8's community. Our results therefore appear particularly pertinent in terms of both partitioning and overlapping. The fact that women and events are correlated might be considered a source of bias, for instance in the number of communities. However, when our results are compared with those of other authors, merging our blue and yellow communities yields their corresponding second community. Suzuki and Wakita [32] tried to obtain a varying number of communities in both sets and found a large number of singletons. Their results were far from those presented in [10], whereas ours are compatible with them and more detailed.

In conclusion, the results on the Southern Women benchmark are particularly relevant. Moreover, our visualization makes it possible to observe community partitioning, overlapping and possible assignment contradictions. Using reassignment for better modularity optimization will be tested in a subsequent work.

5.4. Bipartite graph: Facebook account

Several types of information may be extracted from a Facebook (FB) account. We extracted and evaluated only data from FB photo albums and their tags; we did not use friendship relations. Three photo files were downloaded from various FB accounts. All these files were extracted with the consent of their owners, none of whom were members of the research team.

Fig. 3. Women-Events communities with legitimacy and reassignment modularity measures
A person was considered to be linked to a photo if he or she had been tagged in it. We thus obtain a bipartite graph composed of two types of nodes: persons and photos. Community extraction with our method reveals some common features among the datasets. These features are shown in Figure 4 for one FB photo file, in which 274 people could be identified across a total of 644 photos.

Results. Communities seldom overlap, which supports the notion that the photos were taken at different times in the owner's life; a forthcoming study will check this. When asked to comment on the communities, the owner made two main observations. The various groups of people were indeed consistent, with one exception. The partition associated the owner with a group she had met on only a few occasions, and not with other groups of close friends. An analysis of the results provided a good explanation, which is partially displayed in Figure 4. In this view, the FB account owner is in the first community on the left, yet she is also present in most of the other communities (see the grey levels in the first column). At first glance one might assume that she is not part of the other communities, but our visualization shows that this is not the case. She is present in most communities, even though she is mainly identified in the first one. Three types of photos can be distinguished in this first community:
- more than 200 photos that contain only the owner's tag;
- a few photos that carry the single tag of another community member;
- for every other person, at least one photo that tags him or her together with the owner.

This first community was in fact built from the group of photos carrying only the owner's tag, which were associated with the owner. The owner's tag thus also spans photos containing two people, one of whom is the owner. It turns out that this group is predominantly the owner's own group.
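As an illustration, a person-photo bipartite graph of this kind can be assembled from tag pairs in plain Python. The tag list, the names `owner`, `alice` and `bob`, and both helper functions are hypothetical and do not reflect the authors' pipeline. The second function lists the photos whose only tag is a given person, which are the single-tag photos described above for the owner.

```python
def biadjacency_from_tags(tags):
    """Build a person x photo biadjacency matrix B from (person, photo) pairs.

    B[i][j] = 1 when person i is tagged on photo j; duplicate tags count once.
    """
    people = sorted({person for person, _ in tags})
    photos = sorted({photo for _, photo in tags})
    row = {p: i for i, p in enumerate(people)}
    col = {f: j for j, f in enumerate(photos)}
    B = [[0] * len(photos) for _ in people]
    for person, photo in tags:
        B[row[person]][col[photo]] = 1
    return people, photos, B


def single_tag_photos(tags, person):
    """Photos on which `person` is the only tagged individual."""
    tagged = {}
    for p, photo in tags:
        tagged.setdefault(photo, set()).add(p)
    return sorted(photo for photo, ps in tagged.items() if ps == {person})


# Hypothetical tags, not the dataset studied in the paper.
tags = [("owner", "p1"), ("owner", "p2"), ("alice", "p2"),
        ("bob", "p3"), ("owner", "p4")]
people, photos, B = biadjacency_from_tags(tags)
print(people)                            # ['alice', 'bob', 'owner']
print(single_tag_photos(tags, "owner"))  # ['p1', 'p4']
```

The row sums of `B` give each person's degree (the number of photos they are tagged on). The column sums give each photo's degree (the number of people tagged on it).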
In conclusion, partitioning the bipartite graph alone would have led to a major pitfall: the owner would have been isolated in a community that is not her top preference. By merging partitioning and overlapping, our method better exposes multiple groupings with broader affinities. The other communities also proved highly consistent when the photos were examined. Each community was associated with a particular event that gathered a group of the FB account owner's friends.

Fig. 4. Facebook account communities with overlapping

5.5. Bipartite graph: Brain data

Our method was initially designed for detecting and analyzing human communities. In this experiment, we show how it can also be applied to other data analysis tasks. The brain dataset was collected on a single patient by a research team affiliated with the "Human Connectome" project, which works on brain tractography techniques [4]. These techniques use Magnetic Resonance Imaging (MRI) and Diffusion Tensor Imaging (DTI) to explore white matter tracts between brain regions. Probabilistic tractography produces 'connectivity' matrices between Regions Of Interest (ROIs) in the brain. In the case we studied, 'seed' ROIs were located in the occipital lobe and 'target' ROIs throughout the entire brain. The goal was to detect possible brain areas in the occipital lobe by clustering ROIs with similar tract behavior. In [4], the research team used Spectral Clustering (SC) to combine ROIs. SC is one of numerous techniques traditionally applied in social community detection, e.g. by Bonacich on the Southern Women benchmark [10]. SC results are limited to community partitioning, although in theory overlapping could also be computed.
We aimed to test our method and produce both partitioning and overlapping analyses of brain areas. The original matrix contained 1,914 rows and 374 columns, with cells denoting the probabilities of linkage between ROIs. We treated this matrix as the weighted biadjacency matrix of a bipartite graph and then applied our community detection method.

Figure 5 presents the results of ROI community partitioning and overlapping. Each color in the first row is associated with a community that gathers several ROIs. Each ROI is represented by a column that indicates its membership in the other communities. When a cell is highlighted with a color, a nonzero overlapping value exists for this ROI and the corresponding community; community numbers are plotted on the left-hand side of the figure. This value was computed with the legitimacy function extended to weighted edges, i.e. the weighted sum of values from cerebral hemisphere zones (ELF) within the selected community. Each community is associated with a threshold value equal to the maximum weighted legitimacy above which the community would lose a full member. For each community, this threshold is computed automatically so as to include all ROI members of the community.

Fig. 5. Brain data communities with overlapping

Results. We found 7 communities. By comparison, the neurologists selected 8 clusters with SC after choosing the most significant eigenvectors with a scree test. Two communities overlap heavily with all the others, which in turn overlap to a lesser extent. Figure 5 confirms the value of this set-up, which exhibits overlapping and non-overlapping data simultaneously.
A team of neurological researchers has taken these results into account as additional observations on brain parcellation.

5.6. Bipartite graphs: other known benchmarks

Bipartite graph datasets are not easy to find in the literature. We also tested our algorithms on bipartite networks used as benchmarks in [1]. One of them describes corporate interlocks in Scotland in the early twentieth century. The dataset characterizes 108 Scottish firms in 1904-5, detailing the corporate sector, capital and board of directors of each firm. It includes only those board members who held multiple directorships, 136 individuals in total. Barber found "roughly" (sic) 20 communities, whereas we find 15 communities and also provide overlapping information for them. We obtained a global modularity of 0.71038, whereas Barber found a lower value of 0.56634.

To evaluate the scalability of our method, we tested it on a rather large co-authorship bipartite dataset built to detect scientific communities. The data were extracted from the well-known PubMed (http://www.ncbi.nlm.nih.gov/pubmed) online library of biomedical literature. Our dataset comprised 30,000 persons and more than 80,000 scientific papers. In about 3 seconds, we extracted 184 communities with an average of 670 members each, together with informative overlapping data. As for the resolution limit mentioned earlier, modularity applied to bipartite graphs has a similar limit with similar consequences.

6. Discussion and new perspectives

The above experiments show that our method can find overlapping communities in different types of graphs. Moreover, it can measure the degree of membership of each node in each community. This gives a first semantic interpretation of each node in terms of community membership.
These results are obtained by using the off-diagonal block square matrix $A'$. Several other methods compute modularity directly from the graph structure without building any off-diagonal block matrix. For example, label propagation (LPA) based methods [12, 18], which use Barber's modularity definition for bipartite graphs, may work directly on the graph structure. However, the results presented in these papers differ from ours: they find 4 communities for the Women-Events dataset instead of 3 in our case. Since Barber's modularity expression differs from ours, these results are difficult to compare.

The Louvain algorithm, which uses Newman's modularity formula, is designed for unipartite (monopartite) graphs. Since the block matrix approach requires more data and computation, we also tried applying the Louvain algorithm directly to the biadjacency matrix $B$. We tested it on the same two bipartite datasets, Women-Events and Facebook. Surprisingly, we obtained the same results as with the block square matrix $A'$. This experiment suggests that unipartite graph models with unipartite modularity might be applied directly to bipartite graphs. Moreover, our method with the block matrix $A'$ could be a good means of validating this possibility.

This counter-intuitive conclusion needs more experiments and a theoretical proof, particularly since other authors use Barber's model, which is specifically adapted to bipartite graphs. Future work will investigate further whether unipartite graph methods can be applied directly to bipartite graphs.

7. Conclusion

In this paper, we have demonstrated the feasibility of unifying bipartite graphs, directed graphs and unipartite graphs under a common unipartite graph model.
We then proved that any unipartite graph partitioning algorithm that optimizes the standard unipartite modularity also yields a bipartite graph partitioning. In that partitioning, both types of nodes are bound within the communities. In the special case of directed graphs, nodes appear twice, possibly in different communities depending on their roles. For unipartite graphs, nodes are cloned and appear with their clones in the same communities.

We also introduced the possibility of showing partitioned and overlapping communities in a single view. This is made possible by associating both types of nodes in the communities. Moreover, overlapping can be characterized through several functions with different interpretations. For instance, it is possible to identify the nodes that form community cores, i.e. those that belong exclusively to a single community. Conversely, it is possible to identify the nodes that serve as bridges between different communities. We also introduced reassignment values, which open up the possibility of improving partitioning results. In practice, when we apply our method to various benchmarks and datasets, we extract meaningful communities and display surprising overlapping properties. Other authors limit their goal to identifying communities, whereas we go beyond this point and provide tools for analyzing and interpreting results.

Lastly, we obtained an essential result by experimenting on real brain datasets supplied by a research team from the Connectome project. Historically, authors dealing with community detection problems borrowed their methods from data or graph analysis, such as hierarchical clustering, clique enumeration or spectral analysis.
Recent community detection approaches based on modularity optimization use original methods (Louvain, label propagation). We showed that these methods can also be applied to data analysis with good results. Moreover, these results can be obtained without having to choose parameters such as the number of clusters or a threshold value. It is particularly interesting to note the reversal this represents. Community detection techniques once borrowed their methods from other scientific domains, and they are now mature enough to provide those domains with new, original and efficient methods.

In the future, we will continue exploring cross-fertilization between community detection techniques and other scientific domains. In particular, we will use Nash equilibrium to study community stability through the reassignment value introduced in this paper. We think that community stability could be another quality criterion, alongside modularity optimization, for driving and assessing the performance of community detection algorithms.

Acknowledgments

The authors would like to thank the Connectome research team, as part of the "CAFO" project (ANR-09-RPDOC-004-01), as well as the CRICM UPMC U975/UMRS 975/UMR 7225 research group for providing the original brain dataset.

References

[1] Michael Barber. Modularity and community detection in bipartite networks. Physical Review E, 76(6):1-9, 2007.
[2] Michael J. Barber and John W. Clark. Detecting network communities by propagating labels under constraints. Physical Review E - Statistical, Nonlinear and Soft Matter Physics, 80(2 Pt 2):026129, 2009.
[3] Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008, October 2008.
[4] Marco Catani and Michel Thiebaut de Schotten. Atlas of Human Brain Connections.
Oxford University Press, 2012.
[5] Abhijnan Chakraborty, Saptarshi Ghosh, and Niloy Ganguly. Detecting overlapping communities in folksonomies. In Proceedings of the 23rd ACM Conference on Hypertext and Social Media (HT '12), page 213. ACM Press, 2012.
[6] Ernesto Estrada and Juan A. Rodriguez-Velazquez. Complex networks as hypergraphs. Systems Research, page 16, 2005.
[7] T. S. Evans and R. Lambiotte. Line graphs, link partitions and overlapping communities. Physical Review E, 80(1):9, 2009.
[8] Santo Fortunato. Community detection in graphs. Physics Reports, 486(3-5):103, June 2009.
[9] Santo Fortunato and Marc Barthélemy. Resolution limit in community detection. Proceedings of the National Academy of Sciences of the United States of America, 104(1):36-41, 2007.
[10] Linton C. Freeman. Finding social groups: A meta-analysis of the southern women data. In Dynamic Social Network Modeling and Analysis, pages 39-97. The National Academies Press, 2003.
[11] M. Girvan and M. E. J. Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences of the United States of America, 99(12):7821-7826, 2002.
[12] Steve Gregory. Finding overlapping communities in networks by label propagation. New Journal of Physics, 12(10):103018, 2009.
[13] Roger Guimerà, Marta Sales-Pardo, and Luís Amaral. Module identification in bipartite and directed networks. Physical Review E, 76(3), September 2007.
[14] Andrea Lancichinetti, Santo Fortunato, and János Kertész. Detecting the overlapping and hierarchical community structure in complex networks. New Journal of Physics, 11(3):033015, March 2009.
[15] Andrea Lancichinetti, Santo Fortunato, and Filippo Radicchi.
Benchmark graphs for testing community detection algorithms. Physical Review E - Statistical, Nonlinear and Soft Matter Physics, 78(4 Pt 2):6, 2008.
[16] Conrad Lee, Fergal Reid, Aaron McDaid, and Neil Hurley. Detecting highly overlapping community structure by greedy clique expansion. 4th Workshop on Social Network Mining and Analysis (SNAKDD10), 10:10, 2010.
[17] E. A. Leicht and M. E. J. Newman. Community structure in directed networks. Physical Review Letters, 100(11):118703, 2007.
[18] Liu Xin and Murata Tsuyoshi. An efficient algorithm for optimizing bipartite modularity in bipartite networks. Journal of Advanced Computational Intelligence and Intelligent Informatics, 14(4):408-415, 2010.
[19] Michel Plantié and Michel Crampes. Mining social networks and their visual semantics from social photos. International Journal of Computer Science & Applications, VIII(II):102-117, 2011.
[20] Tsuyoshi Murata. Modularities for bipartite networks. Proceedings of the 20th ACM Conference on Hypertext and Hypermedia (HT '09), 90(6):245-250, 2009.
[21] Tsuyoshi Murata. Detecting communities from tripartite networks. WWW, pages 0-1, 2010.
[22] Neubauer Nicolas and Obermayer Klaus. Towards community detection in k-partite k-uniform hypergraphs. In Proceedings NIPS 2009 ..., 2009.
[23] Mark Newman. Fast algorithm for detecting community structure in networks. Physical Review E, 69(6), June 2004.
[24] Mark Newman. Finding community structure in networks using the eigenvectors of matrices. Physical Review E - Statistical, Nonlinear and Soft Matter Physics, 74(3 Pt 2):036104, 2006.
[25] Mark Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, 69(2), February 2004.
[26] Andreas Noack and Randolf Rotta. Multi-level algorithms for modularity clustering. arXiv, page 12, December 2008.
[27] Gergely Palla, Imre Derényi, Illés Farkas, and Tamás Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435(7043):814-8, June 2005.
[28] S. Papadopoulos, Y. Kompatsiaris, A. Vakali, and P. Spyridonos. Community detection in social media. Data Mining and Knowledge Discovery, 1(June):1-40, 2011.
[29] Mason A. Porter, Jukka-Pekka Onnela, and Peter J. Mucha. Communities in networks, 2009.
[30] Camille Roth and Paul Bourgine. Epistemic communities: Description and hierarchic categorization. Mathematical Population Studies: An International Journal of Mathematical Demography, 12(2):107-130, 2005.
[31] Sune Lehmann, Martin Schwartz, and Lars Kai Hansen. Biclique communities. Physical Review E, Statistical, Nonlinear, and Soft Matter Physics, 78(1 Pt 2), 2008.
[32] Kenta Suzuki and Ken Wakita. Extracting multi-facet community structure from bipartite networks. 2009 International Conference on Computational Science and Engineering, 4:312-319, 2009.
[33] Zhihao Wu, Youfang Lin, Huaiyu Wan, Shengfeng Tian, and Keyun Hu. Efficient overlapping community detection in huge real-world networks. Physica A: Statistical Mechanics and its Applications, 391(7):2475-2490, 2012.
[34] Bo Yang, Dayou Liu, Jiming Liu, and Borko Furht. Discovering Communities from Social Networks: Methodologies and Applications. Springer US, Boston, MA, 2010.
[35] W. W. Zachary. An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33(4):452-473, 1977.

8. Annex 1

8.1. New use of Newman modularity

In this Annex, we provide full details of the demonstration that yielded Equation (2). For convenience, we use the definition of unipartite graph modularity given by Newman [17].
It is a function $Q$ of matrix $A'$ and of the communities detected in $G$ [25]:

$$Q = \frac{1}{2m}\sum_{i,j}\left[A'_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j) \qquad (11)$$

where $A'_{ij}$ denotes the weight of the edge between $i$ and $j$, and $k_i = \sum_j A'_{ij}$ is the sum of the weights of the edges attached to vertex $i$. $c_i$ is the community to which vertex $i$ has been assigned. The Kronecker function $\delta(u, v)$ equals 1 if $u = v$ and 0 otherwise, and $m = \frac{1}{2}\sum_{ij} A'_{ij}$.

In our particular case, $A'$ is the off-diagonal block adjacency matrix of a bipartite graph, and we apply the following transformations. We rename index $i$ as $i_1$ when $1 \le i \le r$ and as $i_2$ when $r < i \le r + s$. Similarly, we rename index $j$ as $j_1$ when $1 \le j \le r$ and as $j_2$ when $r < j \le r + s$. To avoid confusion between the indices of $A'$ and those of $B$, we rename the indices of $B$ as $i_b$ and $j_b$, with $1 \le i_b \le r$ and $1 \le j_b \le s$. The layout of $A'$ is given in Equation (12):

$$A' = \begin{array}{c|cc} & j_1\ (r\ \text{columns}) & j_2\ (s\ \text{columns}) \\ \hline i_1\ (r\ \text{rows}) & O_r & B\ (\text{rows } i_b,\ \text{columns } j_b) \\ i_2\ (s\ \text{rows}) & B^t\ (\text{rows } j_b,\ \text{columns } i_b) & O_s \end{array} \qquad (12)$$

Let $k_{i_b}$ be the margin of row $i_b$ in $B$ and $k_{j_b}$ the margin of column $j_b$ in $B$:

$$k_{i_b} = \sum_{j_b} B_{i_b j_b} = \sum_{j_2} A'_{i_1 j_2} = \sum_{i_2} A'_{i_2 j_1}, \quad \text{where } i_b = i_1 = j_1 \qquad (13)$$

$$k_{j_b} = \sum_{i_b} B_{i_b j_b} = \sum_{i_1} A'_{i_1 j_2} = \sum_{j_1} A'_{i_2 j_1}, \quad \text{where } j_b = i_2 - r = j_2 - r \qquad (14)$$

$k_{i_b}$ is the degree of node $u_{i_b}$ and $k_{j_b}$ is the degree of node $v_{j_b}$. We define $k_{i/j_1} = \sum_{j_1} A'_{i j_1}$ and $k_{i/j_2} = \sum_{j_2} A'_{i j_2}$, and similarly $k_{j/i_1} = \sum_{i_1} A'_{j i_1}$ and $k_{j/i_2} = \sum_{i_2} A'_{j i_2}$. Hence $k_i = \sum_j A'_{ij} = k_{i/j_1} + k_{i/j_2}$ and $k_j = \sum_i A'_{ij} = k_{j/i_1} + k_{j/i_2}$.
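The block layout of Equation (12) and the margin identities (13) and (14) can be checked numerically on a toy example. The plain-Python sketch below is illustrative only; `block_adjacency` and the 2 × 3 matrix `B` are assumptions, not part of the original method. It also checks the split $k_i = k_{i/j_1} + k_{i/j_2}$, in which one of the two terms is always zero because of the zero diagonal blocks.

```python
def block_adjacency(B):
    """A' = [[O_r, B], [B^t, O_s]] for an r x s biadjacency matrix B (Equation (12))."""
    r, s = len(B), len(B[0])
    top = [[0] * r + list(B[i]) for i in range(r)]
    bottom = [[B[i][j] for i in range(r)] + [0] * s for j in range(s)]
    return top + bottom


B = [[1, 1, 0],
     [0, 1, 1]]          # toy biadjacency matrix: r = 2 nodes u, s = 3 nodes v
r, s = len(B), len(B[0])
A_prime = block_adjacency(B)

k = [sum(row) for row in A_prime]                            # k_i in A'
k_rows = [sum(row) for row in B]                             # k_{i_b}, Eq. (13)
k_cols = [sum(B[i][j] for i in range(r)) for j in range(s)]  # k_{j_b}, Eq. (14)

# k_{i/j1}: part of k_i from the first r columns; k_{i/j2}: part from the last s.
k_i_j1 = [sum(row[:r]) for row in A_prime]
k_i_j2 = [sum(row[r:]) for row in A_prime]

print(k)       # [2, 2, 1, 2, 1]
print(k_i_j1)  # [0, 0, 1, 2, 1]
print(k_i_j2)  # [2, 2, 0, 0, 0]
```

The degrees of the first $r$ nodes of $A'$ are the row margins of $B$. The degrees of the last $s$ nodes are its column margins.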
Taking into account the structure and properties of $A'$ in (13) and (14), we derive the following properties for the indices. $k_{i/j_1}$ is nonzero only for $i = i_2$, where it equals $k_{j_b}$, the degree of node $v_{j_b}$:

$$k_{i/j_1} = k_{i_2/j_1} = \sum_{j_1} A'_{i_2 j_1} = \sum_{i_1} A'_{i_1 j_2} = k_{j_2/i_1} = k_{j_b} \qquad (15)$$

$k_{i/j_2}$ is nonzero only for $i = i_1$, where it equals $k_{i_b}$, the degree of node $u_{i_b}$:

$$k_{i/j_2} = k_{i_1/j_2} = \sum_{j_2} A'_{i_1 j_2} = \sum_{i_2} A'_{i_2 j_1} = k_{j_1/i_2} = k_{i_b} \qquad (16)$$

More directly, $k_{j/i_1}$ is nonzero only for $j = j_2$, where $k_{j/i_1} = k_{j_2/i_1} = k_{i_2/j_1} = k_{j_b}$, the degree of node $v_{j_b}$. Likewise, $k_{j/i_2}$ is nonzero only for $j = j_1$, where $k_{j/i_2} = k_{j_1/i_2} = k_{i_1/j_2} = k_{i_b}$, the degree of node $u_{i_b}$.

8.2. Analyzing the second part of Q in equation (11)

Using these properties of matrix $A'$, we can now analyze $\sum_{ij} k_i k_j$ in equation (11). Expanding $k_i$ and $k_j$ in $A'$, we obtain:

$$\begin{aligned}\sum_{ij} k_i k_j &= \sum_{ij}\left(k_{i/j_1} + k_{i/j_2}\right)\left(k_{j/i_1} + k_{j/i_2}\right) \\ &= \sum_{ij} k_{i/j_1}k_{j/i_1} + \sum_{ij} k_{i/j_2}k_{j/i_2} + \sum_{ij} k_{i/j_1}k_{j/i_2} + \sum_{ij} k_{i/j_2}k_{j/i_1} \\ &= \sum_{i_2 j_2} k_{i_2/j_1}k_{j_2/i_1} + \sum_{i_1 j_1} k_{i_1/j_2}k_{j_1/i_2} + \sum_{i_2 j_1} k_{i_2/j_1}k_{j_1/i_2} + \sum_{i_1 j_2} k_{i_1/j_2}k_{j_2/i_1}\end{aligned} \qquad (17)$$

Note that $\sum_{ij} k_{i/\cdot}\,k_{j/\cdot} = \sum_i k_{i/\cdot}\sum_j k_{j/\cdot}$, where the dot may take any value in $\{i_1, i_2, j_1, j_2\}$. Let $c$ be a community. In equation (11), the summations $\sum_{ij} k_i k_j$ over indices $i$ and $j$ apply only under the condition $\delta(c_i, c_j) = 1$. When an edge is present between two nodes $u$ and $v$ belonging to $c$, both $\delta(c_i, c_j) = 1$ and $\delta(c_j, c_i) = 1$. Consequently, for each row $i$ representing a node belonging to $c$, a corresponding column $j$ represents the same node belonging to $c$, and vice versa.
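This derivation concludes that Equation (11) evaluated on $A'$ coincides with Equation (23) evaluated directly on $B$. The plain-Python sketch below checks this numerically. It is an illustration under stated assumptions (a toy matrix and hypothetical function names), not the authors' implementation. Equation (11) is written with edge weights, so the sketch also accepts nonnegative weighted entries, as in the brain data case.

```python
from collections import defaultdict


def block_adjacency(B):
    """A' = [[O_r, B], [B^t, O_s]] (Equation (12))."""
    r, s = len(B), len(B[0])
    return ([[0] * r + list(B[i]) for i in range(r)] +
            [[B[i][j] for i in range(r)] + [0] * s for j in range(s)])


def newman_q(A, labels):
    """Equation (11): Q = 1/(2m) * sum_ij [A_ij - k_i k_j / (2m)] delta(c_i, c_j)."""
    k = [sum(row) for row in A]
    two_m = sum(k)
    total = 0.0
    for i in range(len(A)):
        for j in range(len(A)):
            if labels[i] == labels[j]:
                total += A[i][j] - k[i] * k[j] / two_m
    return total / two_m


def bipartite_q(B, row_labels, col_labels):
    """Equation (23): Q_B = sum_c [ |e_c|/m - ((d_{u|c} + d_{v|c}) / (2m))^2 ]."""
    m = sum(map(sum, B))
    e = defaultdict(float)  # |e_c|: total weight of edges with both ends in c
    d = defaultdict(float)  # d_{u|c} + d_{v|c}: total degree of the nodes of c
    for i, row in enumerate(B):
        for j, w in enumerate(row):
            d[row_labels[i]] += w
            d[col_labels[j]] += w
            if row_labels[i] == col_labels[j]:
                e[row_labels[i]] += w
    return sum(e[c] / m - (d[c] / (2 * m)) ** 2 for c in d)


B = [[1, 1, 0],
     [1, 1, 0],
     [0, 1, 1]]                    # toy biadjacency matrix
rows, cols = [0, 0, 1], [0, 0, 1]  # community of each node u and each node v
q_block = newman_q(block_adjacency(B), rows + cols)
q_b = bipartite_q(B, rows, cols)
print(round(q_block, 4), round(q_b, 4))  # 0.2083 0.2083
```

The two values agree because, as Equations (19) and (20) state, the community-restricted sums over $A'$ reduce to $(d_{u|c} + d_{v|c})^2$ and to twice the number of internal edges of $B$, with $m = m_b$.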
From (15), (16) and the preceding observations on $\delta(c_i, c_j)$:

$$\sum_{ij}k_{i/j_1}k_{j/i_1}\,\delta(c_i,c_j) = \sum_i k_{i/j_1}\sum_j k_{j/i_1}\,\delta(c_i,c_j) = \sum_{i_2}k_{i_2/j_1}\sum_{j_2}k_{j_2/i_1}\,\delta(c_{i_2},c_{j_2}) = \sum_{j_b}k_{j_b}\sum_{j_b}k_{j_b} = \Big[\sum_{j_b}k_{j_b}\Big]^2$$

$$\sum_{ij}k_{i/j_2}k_{j/i_2}\,\delta(c_i,c_j) = \sum_i k_{i/j_2}\sum_j k_{j/i_2}\,\delta(c_i,c_j) = \sum_{i_1}k_{i_1/j_2}\sum_{j_1}k_{j_1/i_2}\,\delta(c_{i_1},c_{j_1}) = \sum_{i_b}k_{i_b}\sum_{i_b}k_{i_b} = \Big[\sum_{i_b}k_{i_b}\Big]^2$$

$$\sum_{ij}k_{i/j_1}k_{j/i_2}\,\delta(c_i,c_j) = \sum_i k_{i/j_1}\sum_j k_{j/i_2}\,\delta(c_i,c_j) = \sum_{i_2}k_{i_2/j_1}\sum_{j_1}k_{j_1/i_2}\,\delta(c_{i_2},c_{j_1}) = \sum_{j_b}k_{j_b}\sum_{i_b}k_{i_b}$$

$$\sum_{ij}k_{i/j_2}k_{j/i_1}\,\delta(c_i,c_j) = \sum_i k_{i/j_2}\sum_j k_{j/i_1}\,\delta(c_i,c_j) = \sum_{i_1}k_{i_1/j_2}\sum_{j_2}k_{j_2/i_1}\,\delta(c_{i_1},c_{j_2}) = \sum_{i_b}k_{i_b}\sum_{j_b}k_{j_b}$$

In these expressions, $j_b = i_2 - r = j_2 - r$ and $i_b = i_1 = j_1$, and all sums run over nodes $u_{i_b} \in c$ and $v_{j_b} \in c$. These last two conditions can also be formalized with $\delta(c_{i_b}, c_{j_b})$, which equals 1 if $u_{i_b}$ and $v_{j_b}$ belong to the same community $c$ and 0 otherwise. This development yields:

$$\sum_{ij}k_ik_j\,\delta(c_i,c_j) = \Big[\sum_{j_b}k_{j_b}\Big]^2 + \Big[\sum_{i_b}k_{i_b}\Big]^2 + 2\Big[\sum_{j_b}k_{j_b}\Big]\Big[\sum_{i_b}k_{i_b}\Big] = \Big(\sum_{i_b}k_{i_b} + \sum_{j_b}k_{j_b}\Big)^2 \qquad (18)$$

Equation (18) can be rewritten using node degrees. $\sum_{i_b}k_{i_b}$ is the sum of the degrees of the nodes $u_{i_b}$ belonging to $c$, which we denote $d_{u|c}$. $\sum_{j_b}k_{j_b}$ is the sum of the degrees of the nodes $v_{j_b}$ belonging to $c$, which we denote $d_{v|c}$. Then

$$\sum_{ij}k_ik_j\,\delta(c_i,c_j) = \left(d_{u|c} + d_{v|c}\right)^2 \qquad (19)$$

8.3. Analyzing the first part of equation (11)

The first part of $Q$ is $\sum_{ij}A'_{ij}$. We now examine what it represents in terms of $B$. Matrix $B$ can be identified in $A'$ using indices $i_1$ and $j_2$.
Conversely, $B^t$ can be identified with indices $i_2$ and $j_1$. For $i = i_1$, the entries $A'_{ij}$ are nonzero only for $j = j_2$. For $i = i_2$, they are nonzero only for $j = j_1$. Under the index correspondences above, $A'_{i_1 j_2} = B_{i_b j_b}$ and $A'_{i_2 j_1} = B^t_{j_b i_b} = B_{i_b j_b}$. Then

$$\sum_{ij}A'_{ij} = \sum_{i_1 j_2}A'_{i_1 j_2} + \sum_{i_2 j_1}A'_{i_2 j_1}$$

and

$$\sum_{ij}A'_{ij}\,\delta(c_i,c_j) = \sum_{i_1 j_2}A'_{i_1 j_2}\,\delta(c_{i_1},c_{j_2}) + \sum_{i_2 j_1}A'_{i_2 j_1}\,\delta(c_{i_2},c_{j_1}).$$

The first sum on the right-hand side equals the number of edges from nodes $u$ to nodes $v$ inside $c$. The second sum equals the number of edges from these same nodes $v$ to nodes $u$ inside $c$. The two sums correspond to each other by transposition ($i_1 \leftrightarrow j_1$, $j_2 \leftrightarrow i_2$), which gives:

$$\sum_{i_1 j_2}A'_{i_1 j_2}\,\delta(c_{i_1},c_{j_2}) = \sum_{i_2 j_1}A'_{i_2 j_1}\,\delta(c_{i_2},c_{j_1})$$

Then

$$\sum_{ij}A'_{ij}\,\delta(c_i,c_j) = 2\sum_{i_1 j_2}A'_{i_1 j_2}\,\delta(c_{i_1},c_{j_2}) = 2\sum_{i_b j_b}B_{i_b j_b}\,\delta(c_{i_b},c_{j_b}) \qquad (20)$$

This value can also be expressed as a number of edges:

$$\sum_{i_b j_b}B_{i_b j_b}\,\delta(c_{i_b},c_{j_b}) = \left|\{(u_{i_b|c},\, v_{j_b|c})\}\right| = \left|\{e_{i_b|c,\,j_b|c}\}\right|, \quad \text{where } e_{i_b|c,\,j_b|c} \in E \text{ and } u_{i_b|c},\, v_{j_b|c} \in c \qquad (21)$$

For the entire matrix $A'$, $\sum_{ij}A'_{ij} = 2\sum_{i_b j_b}B_{i_b j_b}$. From equation (11), $m = \frac{1}{2}\sum_{ij}A'_{ij}$. We now define $m_b = \sum_{i_b j_b}B_{i_b j_b} = |\{e_{i_b j_b}\}|$, where $e_{i_b j_b} \in E$. Then

$$m = \frac{1}{2}\sum_{ij}A'_{ij} = \frac{1}{2}\times 2\times\sum_{i_b j_b}B_{i_b j_b} = m_b.$$

8.4.
Mo dularity for al l gr aphs Lastly , b y removing sub-index b , which had only b een in tro duced to distinguish indices i and j when applied to A 0 or B , w e can redefine the A 0 mo dularit y in terms of B : Q B = 1 m X ij [ B ij − ( k i + k j ) ² 4 m ] δ ( c i , c j ) (22) In terms of edges, by simplifying e i b | c ,j b | c as e c (where e c has b oth ends in c ) and b y dropping sub-index b Equation (22) b ecomes: Q B = X c [ | e c | m –( ( d u | c + d v | c ) 2 × m ) ² ] (23) This definition of mo dularity may b e used for bipartite graphs since b oth types of no des are b ound. In previous sections, we hav e v alidated the ab ov e results on the basis of another author’s graph mo dularity mo dels. It can thus be concluded that equation (22) offers a go o d candidate for bipartite graph mo dularity that takes some sp ecific c haracteristics in to account. F ebruary 26, 2014 1:57 WSPC/INSTR UCTION FILE draftv20.0.2 24 Cr amp es, Planti ´ e 9. Annex 2: Reassignmen t Mo dularit y function In this App endix, we will pro vide full details of the demonstration that yielded Equation 10. Reassigning node w from C 1 to C 2 either increases or decreases the mo dularity defined in Equation (2). Such a c hange is referred to as Reassignment Mo dularity ( RM w : C 1 → C 2 ). Let w b e a node u or v . If w is withdrawn from C 1 and reassigned to C 2 , then w e can define RM w : C 1 → C 2 = Q B w ∈ C 2 - Q B w ∈ C 1 where Q B is the mo dularit y v alue in: Q B = X c [ | e c | m –( ( d u | c + d v | c ) 2 × m ) ² ] . (24) Let l w | i = l w,w 0 | w 0 ∈ C i b e the n umber of edges b etw een a no de w and all other no des w 0 where w 0 ∈ C i , Let d w b e the degree of w , | e i | the num b er of edges in C i and d C i = d u | c i + d v | c i . W e consider that the no de w which b elongs to C 1 is b ound to b e withdrawn from this comm unit y and assigned to the communit y C 2 . Q B w ∈ C 2 is Q B w ∈ C 1 with correction after w is reassigned. 
With this notation, before the move,

$$Q_B^{w\in C_1} = \left[\frac{1}{m}|e_1| - \frac{(d_{C_1})^2}{(2m)^2} + \frac{1}{m}|e_2| - \frac{(d_{C_2})^2}{(2m)^2}\right] + K_{others}$$

where $K_{others}$ is the contribution to modularity of the communities other than $C_1$ and $C_2$. This value does not change when a node is reassigned from $C_1$ to $C_2$. After the move,

$$Q_B^{w\in C_2} = \left[\frac{1}{m}(|e_1| - l_{w|1}) + \frac{1}{m}(|e_2| + l_{w|2}) - \left(\frac{(d_{C_1} - d_w)^2}{(2m)^2} + \frac{(d_{C_2} + d_w)^2}{(2m)^2}\right)\right] + K_{others}$$

so that

$$Q_B^{w\in C_2} - Q_B^{w\in C_1} = \left[\frac{1}{m}(|e_1| - l_{w|1}) + \frac{1}{m}(|e_2| + l_{w|2}) - \left(\frac{(d_{C_1} - d_w)^2}{(2m)^2} + \frac{(d_{C_2} + d_w)^2}{(2m)^2}\right)\right] - \left[\frac{1}{m}|e_1| - \frac{(d_{C_1})^2}{(2m)^2} + \frac{1}{m}|e_2| - \frac{(d_{C_2})^2}{(2m)^2}\right]$$

Since $(d_{C_1} - d_w)^2 - d_{C_1}^2 + (d_{C_2} + d_w)^2 - d_{C_2}^2 = 2d_w^2 + 2d_w(d_{C_2} - d_{C_1})$, this simplifies to

$$RM_{w:C_1\to C_2} = \frac{1}{m}(l_{w|2} - l_{w|1}) - \frac{1}{2m^2}\left[d_w^2 + d_w(d_{C_2} - d_{C_1})\right] \tag{25}$$

This equation can be partly validated as follows. If we withdraw $w$ from $C_1$ and put it back into $C_1$, we expect no change in $Q_B$, i.e. $RM_{w:C_1\to C_1} = 0$. In that case $C_2$ is in fact $C_1$ without $w$, so $d_{C_2} = d_{C_1} - d_w$ and $l_{w|2} = l_{w|1}$. Substituting these values into equation (25) indeed yields $RM_{w:C_1\to C_1} = 0$.

A second validation can be performed with Equation 5 in [33]. Although the authors' demonstration is limited, their final formula resembles ours. The only difference is a division by 2 in their case, which comes from their definition of modularity for overlapping communities. Moreover, the authors argue that the right part of their equation is not meaningful for large graphs. They therefore only considered $dEQ = \frac{l_2 - l_1}{2m}$, which is the equivalent of $\frac{1}{m}(l_{w|2} - l_{w|1})$ in our Reassignment Modularity definition. In our case, we do not limit reassignment to large graphs, and we keep the whole value of Equation (25).
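Equation (25) can also be checked against a brute-force recomputation of $Q_B$ before and after the move. The sketch below is a hypothetical illustration, not the authors' implementation. It assumes an unweighted bipartite graph without self-loops or multi-edges. Following the definitions above, $d_{C_1}$ is computed with $w$ still inside $C_1$ and $d_{C_2}$ without $w$, and $C_2$ must differ from $C_1$. The function names and the toy data are invented for illustration.

```python
from fractions import Fraction


def q_b(edges, community):
    """Bipartite modularity Q_B of equation (24), in exact arithmetic."""
    m = len(edges)
    e_c, d_c = {}, {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        d_c[cu] = d_c.get(cu, 0) + 1
        d_c[cv] = d_c.get(cv, 0) + 1
        if cu == cv:
            e_c[cu] = e_c.get(cu, 0) + 1
    return sum(Fraction(e_c.get(c, 0), m) - Fraction(d_c[c], 2 * m) ** 2 for c in d_c)


def reassignment_modularity(edges, community, w, c2):
    """RM_{w: C1 -> C2} of equation (25), with C1 = community[w] and C2 != C1."""
    c1 = community[w]
    if c2 == c1:
        raise ValueError("C2 must differ from the current community of w")
    m = len(edges)
    deg = {}
    l1 = l2 = 0  # l_{w|1}, l_{w|2}: edges from w to the other nodes of C1, C2
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
        if w in (u, v):
            other = v if u == w else u
            if community[other] == c1:
                l1 += 1
            elif community[other] == c2:
                l2 += 1
    d_w = deg[w]
    d_c1 = sum(d for n, d in deg.items() if community[n] == c1)  # includes w
    d_c2 = sum(d for n, d in deg.items() if community[n] == c2)  # excludes w
    return Fraction(l2 - l1, m) - Fraction(d_w ** 2 + d_w * (d_c2 - d_c1), 2 * m ** 2)


def brute_force_rm(edges, community, w, c2):
    """Q_B after moving w to c2, minus Q_B before the move."""
    moved = dict(community)
    moved[w] = c2
    return q_b(edges, moved) - q_b(edges, community)


if __name__ == "__main__":
    edges = [("u1", "v1"), ("u2", "v1"), ("u2", "v2"), ("u3", "v2")]
    community = {"u1": "a", "u2": "a", "v1": "a", "u3": "b", "v2": "b"}
    for w in community:
        for c2 in ("a", "b", "c"):  # "c" is a new, initially empty community
            if c2 != community[w]:
                assert reassignment_modularity(edges, community, w, c2) == brute_force_rm(edges, community, w, c2)
    print(reassignment_modularity(edges, community, "v1", "b"))  # prints -1/2
```

The closed form only needs $l_{w|1}$, $l_{w|2}$, $d_w$, $d_{C_1}$ and $d_{C_2}$, so it can be evaluated without recomputing $Q_B$ over all communities. The brute-force version is kept only as a reference.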
