Community detection in multiplex networks using locally adaptive random walks

Multiplex networks, a special type of multilayer networks, are increasingly applied in many domains ranging from social media analytics to biology. A common task in these applications concerns the detection of community structures. Many existing algo…

Authors: Zhana Kuncheva, Giovanni Montana

Zhana Kuncheva, Department of Mathematics, Imperial College London
Giovanni Montana, Department of Biomedical Engineering, King's College London

Abstract: Multiplex networks, a special type of multilayer networks, are increasingly applied in many domains ranging from social media analytics to biology. A common task in these applications concerns the detection of community structures. Many existing algorithms for community detection in multiplexes attempt to detect communities which are shared by all layers. In this article we propose a community detection algorithm, LART (Locally Adaptive Random Transitions), for the detection of communities that are shared by either some or all the layers in the multiplex. The algorithm is based on a random walk on the multiplex, and the transition probabilities defining the random walk are allowed to depend on the local topological similarity between layers at any given node so as to facilitate the exploration of communities across layers. Based on this random walk, a node dissimilarity measure is derived and nodes are clustered based on this distance in a hierarchical fashion. We present experimental results using networks simulated under various scenarios to showcase the performance of LART in comparison to related community detection algorithms.

I. INTRODUCTION

Many real world systems, including social and biological ones, are often represented as complex networks capturing the interacting nature of multiple agents populating the system [1]. The different agents are interpreted as the nodes of the network, and the relations among them are encoded by the edges of the network. An important aspect of network analysis is the discovery of community structures defined as groups of nodes that are more densely connected to each other than they are to the rest of the network [2].
A large body of work exists on community detection; an extensive review of the area is given in [3].

One useful way to explore the structural properties of a real network is to study the behavior of a discrete-time random walk on it [4]. Random walks have been successfully used to unfold the community structure of a network; see, for instance, [5], [6] and [7]. The main intuition behind these approaches is that a random walker that jumps from node to node with preset transition probabilities is expected to get “trapped” for longer times in denser regions defining the communities.

A multiplex is a particular type of multilayer network where all layers share the same set of nodes but may have very different topology [8]. The structure of the multiplex allows for layers to be connected by inter-layer weights. These weights represent some type of association between layers, and can be specified either by using information extracted directly from the multiplex or external data. There is a wide class of real networks that can be represented by a multiplex. Some examples are the social interactions between users with respect to various social media or the different transportation means between stations in a city; for a survey see [9]. One important area of research on multiplex networks is community detection since it can identify shared structures of nodes in the multiple layers.

In this paper we investigate the task of detecting communities that may be shared by a subset of the layers in the multiplex. Many multiplex community detection approaches identify a community partition that best fits all given layers, i.e. they detect communities shared by all layers. Some of these methods collapse the information into a single layer and then use traditional community detection algorithms for networks [10], while other methods extend community detection algorithms from one to multiple layers [11], [12].
There exist real-world systems, however, for which some communities may be shared only by a subset of layers. Take genomic data as an example, where groups of genes can be associated with specific functional processes relevant to some tissues but not to others [13]. Relatively few solutions exist to address this problem of detecting communities in a subset of layers [14], [15]; therefore we propose a new approach to tackle the issue.

The methodology we introduce in this paper is based on a discrete-time random walk on the multiplex. A multiplex random walk explores the network both within and between layers according to some preset transition probabilities [16]. We offer a novel approach of adapting these transition probabilities to depend on the local topological similarity between any pair of layers, at any given node. By encouraging jumps between nodes in different layers sharing similar topology, and penalizing jumps involving nodes across layers that do not share local topology, we aim to facilitate the exploration of potential communities that may be shared across layers.

The resulting algorithm, called LART (Locally Adaptive Random Transitions), defines a multiplex random walker that will spend a longer time moving between nodes in communities which are shared across layers. The random walk will also get “trapped” in one layer for longer if a community is specific to this layer. We take advantage of the properties of this random walk to introduce a distance measure between nodes, and an agglomerative clustering procedure is then used to detect communities within and between layers. The resulting algorithm can be considered as an extension of the WalkTrap algorithm [5] to the multiplex framework.

The paper is organized as follows. In Section II, we provide a concise literature review of multiplex community detection methods. Section III introduces the LART algorithm.
In Section IV we provide an illustrative example to distinguish between communities shared by two or more layers (shared), and communities that are specific to one layer (non-shared). In Section V we compare the performance of the LART algorithm and other multiplex community detection methods. Our experimental results are based on networks simulated under various scenarios for illustrative purposes. In Section VI we provide concluding remarks and directions for future work.

II. RELATED WORK

In recent years a handful of algorithms have been described in the literature to address the problem of finding community structures that are shared by all layers. A straightforward approach relies on a simple layer aggregation procedure whereby all the layers are first collapsed into a single network so that traditional algorithms for community detection can be used afterward. The weights of the edges between any two nodes in the aggregated network are defined as a linear combination of the weights between those same nodes from each of the layers [10], and different assignments of these weights have been discussed in [10], [17], [18], [19]. Another direction consists of applying a community detection algorithm to each separate layer, and then combining all the resulting partitions either by using cluster ensemble approaches [20] or by merging communities across all layers such that each multiplex community contains a predefined minimum number of corresponding nodes [21].

Extensions of community detection algorithms from one to multiple layers have also been proposed in an attempt to take into account as much information as possible for each layer. Two such examples rely on the extension of a function of modularity $Q$ [2]. In its original form, the modularity $Q$ is defined as the number of connections within a community compared to the expected number of such connections in an equivalent random network.
In [11], Principal Modularity Maximization concatenates the partitions obtained on each separate layer by maximizing modularity $Q$. Using results from Generalized Canonical Correlation Analysis, the authors obtain the final partition by computing the top $k$ eigenvectors of the concatenated matrix. Another extension of modularity $Q$ for a multiplex network is proposed in [14]. In addition to the usual interpretation of $Q$, this extension accounts for inter-layer weights that exist between nodes in different layers. In this way, communities that already exist in separate layers can be coupled. The two methods in [11] and [14] are robust to noise and variation in the layers.

The authors of [15] provide an extension to a flow-based and information-theoretic algorithm known as Infomap [7]. The information flow is modeled as a random walk with teleportation through the network. The best community partition over the network is scored by minimizing the map equation, which measures the description length of the random walker within and between communities. In [15] Infomap is generalized for multiplex networks by modifying some of the constraints in the original map equation to allow for nodes in different layers to be assigned to different communities.

Another method for multiplex community detection is the top-bottom network partitioning approach [22], which uses cross-layer edge clustering coefficients to decide whether communities should be split. In [12], a multi-objective optimization algorithm iteratively maximizes the modularity of the current layer while simultaneously maximizing the similarity between the current and the previous layers. The authors of [23] extend a seed-centric algorithm to fit the multiplex network by means of multiplex centrality measures. Following their work on subspace clustering methods used to detect a set of relevant layers for each community, the authors of [24] use a search tree for detecting communities.
Starting from a seed node, the proposed algorithm iteratively expands the communities with respect to a quality function that can be specified by the user.

Since a multiplex can be represented as a third-order tensor [25], tensor decompositions have also been investigated for the problem of community detection in multiple layers; some of many such examples are [26], [27] and [28]. These methods obtain the partitioning using different tensor factorizations. They are fast owing to their closed-form solutions, although the number of communities needs to be specified in advance.

III. METHODS

In this section, we first discuss how the inter-layer weights at each node can be defined using a topological similarity measure, and introduce a supra-adjacency matrix in which within-layer and inter-layer connections are stored. We then adapt the transition probabilities of a multiplex random walk to depend on the local topological similarity between any pair of layers at any given node, and briefly discuss their properties. In order to group nodes into communities using the multiplex random walk, we introduce a dissimilarity measure between nodes that captures the community structure of the multiplex. This measure considers two separate cases: when two nodes from the same layer are compared, and when two nodes from different layers are compared. Lastly we explain how these distances are used to generate hierarchical clusters and detect communities.

A. The supra-adjacency matrix

An $L$-layered multiplex network is a multilayer undirected graph $M = (V; A_k)_{k=1}^{L}$, where $V$ is a set of nodes with $|V| = N$, and $A_k$ is the $N \times N$ adjacency matrix representing the set of edges in layer $L_k$ for $k = 1, 2, \ldots, L$. For any node $v_i \in V$, $i = 1, 2, \ldots, N$, we denote node $v_i$ in layer $L_k$ by $v_i^k$. The connection between nodes $v_i$ and $v_j$ in $L_k$ is given by $A_{ij;k} = A_{ji;k}$.
Nodes $v_i$ and $v_j$ in $L_k$ are neighbors if $A_{ij;k} = A_{ji;k} = 1$; otherwise $A_{ij;k} = 0$. Furthermore, $A_{ij;k} = 0$ for $i = j$ and all $k$, i.e. there are no self-loops. The weighted edge between nodes $v_i^k$ and $v_i^l$ is the inter-layer connection denoted by $\omega_{i;kl} \in \mathbb{R}$.

Inter-layer weights have been modeled in different ways in the existing literature, and usually a fixed inter-layer value $\omega_{i;kl} = \omega$ for all $i, k, l$ is adopted. Different values of $\omega \in [0, \infty)$ have been considered to analyze how they affect the time required for a random walk to cover all nodes in the multiplex [16], [29]. In [14], $\omega \in (0, 1]$ is interpreted as uniform coupling strength. Values of $\omega$ close to 1 encourage the same community assignment of a node in two different layers, while $\omega$ close to 0 does not support coupling of communities from different layers.

In this work, the inter-layer weight $\omega_{i;kl}$ reflects the similarity in local topology between $v_i^k$ and $v_i^l$, and is defined as the number of edges that the two nodes have in common between layers, i.e. $\omega_{i;kl} := |N_{i,k} \cap N_{i,l}|$, where $N_{i,k} := \{ v_j^k : A_{ij;k} = 1 \}$ is the set of edges of $v_i^k$, each edge being identified by its neighboring endpoint; the intersection matches neighbors across layers by their node index $j$. It follows that $\omega_{i;kl} \in [0, N-1]$.

The multiplex $M$ has an associated $NL \times NL$ block matrix called the supra-adjacency matrix $A^*$. The diagonal blocks are the adjacency matrices $A_k$, and the off-diagonal blocks are the diagonal inter-layer connection matrices $W_{km}$, namely $W_{km} = \mathrm{diag}(\omega_{1;km}, \omega_{2;km}, \ldots, \omega_{N;km})$. Thus $A^*_{(i,k)(j,l)}$ indicates the connection between node $v_i^k$ and node $v_j^l$. The matrix $A^*$ is used to define the multiplex random walk on $M$.

In order to define a random walk that is well-suited for exploring the whole multiplex, however, we require $A^*$ to be “well-behaved”, i.e. connected and non-bipartite [4]. A network is connected if there exists a path between any pair of nodes, and is bipartite if its nodes can be divided into two disjoint sets such that no links connect two nodes in the same set.
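
The construction of $\omega_{i;kl}$ and $A^*$ described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names `interlayer_weight` and `supra_adjacency`, the toy layers, and the convention of mapping $(i, k)$ to row `k * N + i` are all assumptions made here.

```python
def interlayer_weight(adj_k, adj_l, i):
    """Number of neighbors that node i has in common between two layers."""
    n = len(adj_k)
    return sum(1 for j in range(n) if adj_k[i][j] == 1 and adj_l[i][j] == 1)


def supra_adjacency(layers):
    """Build the (N*L) x (N*L) block matrix A* from a list of N x N layers.

    Assumed index convention: node i in layer k sits at row/column k * N + i.
    """
    n_layers, n = len(layers), len(layers[0])
    size = n * n_layers
    supra = [[0] * size for _ in range(size)]
    for k in range(n_layers):
        for l in range(n_layers):
            for i in range(n):
                if k == l:
                    # Diagonal block: within-layer adjacency matrix A_k.
                    for j in range(n):
                        supra[k * n + i][k * n + j] = layers[k][i][j]
                else:
                    # Off-diagonal block: diagonal matrix W_kl of weights.
                    supra[k * n + i][l * n + i] = interlayer_weight(
                        layers[k], layers[l], i)
    return supra


# Hypothetical toy multiplex: 4 nodes, 2 layers.
layer_1 = [[0, 1, 1, 0],
           [1, 0, 1, 0],
           [1, 1, 0, 0],
           [0, 0, 0, 0]]
layer_2 = [[0, 1, 1, 0],
           [1, 0, 0, 0],
           [1, 0, 0, 1],
           [0, 0, 1, 0]]
A_star = supra_adjacency([layer_1, layer_2])
```

In this toy example the first node has the same two neighbors in both layers, so its inter-layer weight is 2, while the last node is isolated in the first layer and therefore has weight 0.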
$A^*$ is not necessarily connected since we may have inter-layer connections $\omega_{i;kl} = 0$. Since $A_{ii;k} = 0$ for all $i, k$, $A^*$ may also be bipartite. We introduce a new supra-adjacency matrix $A$ obtained from $A^*$ by replacing each diagonal block $A_k$ with $A_k + \varepsilon I$ and each off-diagonal block $W_{kl}$ with $W_{kl} + \varepsilon I$; here $I$ is the $N \times N$ identity matrix and $0 < \varepsilon \leq 1$. The positive weights on the main diagonal of $A$ make the multiplex non-bipartite, while the positive weights on the main diagonals of the off-diagonal blocks make the multiplex connected. Both $A$ and $A^*$ clearly have the same topology. We use $A$ to define the transition probabilities in the next section.

B. Locally Adapted Random Transition Probabilities

A discrete-time random walk on $M$ should be allowed to move within and across layers. The structure of $M$ allows four possible moves that a random walker can make when in node $v_i^k$: when it stays in the same layer $L_k$, it can either stay at $v_i^k$ or move to a neighboring node $v_j^k$; when it jumps to another layer $L_l$, it can either make a step to its corresponding node, $v_i^l$, or move to a different one, $v_j^l$. The transition probabilities associated with these four possible moves are defined as

$$
\begin{aligned}
P_{(i,k)(i,k)} &:= \frac{A_{(i,k)(i,k)}}{\kappa_{i,k}}, &
P_{(i,k)(i,l)} &:= \frac{A_{(i,k)(i,l)}}{\kappa_{i,k}}, \\
P_{(i,k)(j,k)} &:= \frac{A_{(i,k)(j,k)}}{\kappa_{i,k}}, &
P_{(i,k)(j,l)} &:= 0,
\end{aligned}
\tag{1}
$$

where $\kappa_{i,k}$ is the multiplex degree of node $v_i^k$ in $A$, defined as $\kappa_{i,k} := \sum_{j,l} A_{(i,k)(j,l)}$. In this formulation, the transition probabilities depend on the topological similarity between layers.

The rationale for these definitions is as follows. When $v_i^k$ and $v_i^l$ have several common neighbors, it may be possible that both nodes belong to a community shared by those two layers, $L_k$ and $L_l$; in this case $\omega_{i;kl}$ is high, and in turn $P_{(i,k)(i,l)}$ is also high in order to encourage this type of move.
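
To illustrate the effect of the $\varepsilon$ regularization, the following sketch adds $\varepsilon$ to the main diagonal of every $N \times N$ block and checks connectivity with a breadth-first search. The helper names `regularize` and `is_connected` and the toy matrix are hypothetical; the example uses two layers whose nodes share no neighbors, so every $\omega_{i;kl}$ is 0 and $A^*$ is disconnected.

```python
from collections import deque


def regularize(supra, n_nodes, eps=1.0):
    """Return A: add eps to the main diagonal of every N x N block of A*."""
    n_layers = len(supra) // n_nodes
    reg = [row[:] for row in supra]
    for k in range(n_layers):
        for l in range(n_layers):
            for i in range(n_nodes):
                reg[k * n_nodes + i][l * n_nodes + i] += eps
    return reg


def is_connected(matrix):
    """Breadth-first search over positive off-diagonal entries."""
    size = len(matrix)
    seen, queue = {0}, deque([0])
    while queue:
        u = queue.popleft()
        for v in range(size):
            if v != u and matrix[u][v] > 0 and v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == size


# Hypothetical A* for 3 nodes and 2 layers: layer 1 has the single edge
# (v1, v2), layer 2 has the single edge (v2, v3); no neighbors are shared,
# so all inter-layer weights are 0.
A_star = [[0, 1, 0, 0, 0, 0],
          [1, 0, 0, 0, 0, 0],
          [0, 0, 0, 0, 0, 0],
          [0, 0, 0, 0, 0, 0],
          [0, 0, 0, 0, 0, 1],
          [0, 0, 0, 0, 1, 0]]
A = regularize(A_star, n_nodes=3, eps=1.0)
connected_before = is_connected(A_star)
connected_after = is_connected(A)
```

In this example `connected_before` is `False` and `connected_after` is `True`: the added inter-layer weights link each node to its copy in the other layer, and the self-loops on the main diagonal rule out bipartiteness.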
In the extreme case when the local topology of $v_i$ is exactly the same across all $L$ layers, then $\sum_{l=1;\, l \neq k}^{L} P_{(i,k)(i,l)} = (L-1)/L$ and $\sum_{j} P_{(i,k)(j,k)} = 1/L$, where the second sum includes the self-loop term $j = i$. In this setting, for $L = 2$, the random walker will be equally likely to stay in the current layer $L_k$ or explore the other layer; for $L > 2$, the random walker will have a higher probability of moving to some $L_l$, $l \neq k$, than of staying in the current layer $L_k$. On the contrary, when a node $v_i^k$ belongs to a community which is specific only to $L_k$, we expect that $\omega_{i;kl}$, $l \neq k$, will be small; in this case, $\sum_{l \neq k} P_{(i,k)(i,l)} \approx 0$, and the random walker will remain “trapped” for a longer time in the region of the community on the current layer $L_k$ since $\sum_{j} P_{(i,k)(j,k)} \approx 1$. When a node $v_i^k$ is isolated in $L_k$, the random walker will either stay in the current layer with probability proportional to $A_{(i,k)(i,k)} = \varepsilon$ or move to any other layer $L_l$, $l \neq k$, with probability proportional to $\sum_{l \neq k} A_{(i,k)(i,l)} = \varepsilon (L-1)$.

We note that in a multiplex there exist inter-layer connections only between corresponding nodes. Therefore the probability of moving from a node $v_i^k$ to another node $v_j^l$ with $i \neq j$ and $k \neq l$ is zero, since there cannot be a direct move where there is no connection.

Our transition probabilities in (1) may be represented as an $NL \times NL$ transition matrix $P$ of the random walk process, which may also be written as

$$ P := D^{-1} A. $$

This representation is useful for showing the stationary properties of the random walk. Here $D$ is the $NL \times NL$ diagonal matrix defined by the multiplex node degrees, i.e. $D_{(i,k)(i,k)} := \kappa_{i,k}$ and $D_{(i,k)(j,l)} = 0$ for $i \neq j$ or $k \neq l$. The resulting transition matrix $P$ is stochastic since $0 \leq P_{(i,k)(j,l)} \leq 1$ for all $i, j, k, l$, and $\sum_{(j,l)} P_{(i,k)(j,l)} = 1$ for every $(i, k)$.
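
The row normalization $P = D^{-1}A$ and the extreme case discussed earlier in this subsection can be checked numerically. The sketch below is an assumption-laden illustration: it builds the regularized matrix $A$ directly for a hypothetical multiplex of $L = 3$ identical triangle layers with $\varepsilon = 1$, and the function name `transition_matrix` is not from the paper.

```python
def transition_matrix(reg):
    """Row-normalize the regularized supra-adjacency matrix: P = D^{-1} A."""
    P = []
    for row in reg:
        kappa = sum(row)  # multiplex degree of this node-layer pair
        P.append([a / kappa for a in row])
    return P


# Hypothetical extreme case: N = 3 nodes forming a triangle in each of
# L = 3 layers, so every node has identical local topology in all layers.
N, L, eps = 3, 3, 1.0
A = [[0.0] * (N * L) for _ in range(N * L)]
for k in range(L):
    for l in range(L):
        for i in range(N):
            for j in range(N):
                if k == l:
                    # Within-layer block: triangle edges plus eps self-loop.
                    A[k * N + i][l * N + j] = eps if i == j else 1.0
                elif i == j:
                    # Inter-layer weight: N - 1 = 2 common neighbors, plus eps.
                    A[k * N + i][l * N + j] = (N - 1) + eps

P = transition_matrix(A)
stay = sum(P[0][0:N])   # probability of remaining in the first layer
switch = sum(P[0][N:])  # probability of jumping to one of the other layers
```

Here `stay` equals $1/L = 1/3$ and `switch` equals $(L-1)/L = 2/3$ up to floating-point rounding, in agreement with the extreme case above, and every row of `P` sums to 1.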
Let the probability distribution $p(t) = \left[ p_{(i,k)}(t) \right]_{i=1;\,k=1}^{N,\,L}$ be the vector of probability values for all nodes in the network, where $p_{(i,k)}(t)$ is the probability of finding the random walker in node $v_i^k$ after $t$ steps. The dynamics of the probability distribution $p(t)$ is given by:

$$ p(t+1) = p(t) P = p(0) P^{t+1}. \tag{2} $$

It follows that the probability of starting in node $v_i^k$ and reaching node $v_j^l$ in $t$ steps is given by $P^t_{(i,k)(j,l)}$. A stationary distribution of $P$ satisfies the equation:

$$ p^* = p^* P, \tag{3} $$

with $\sum_{(i,k)} p^*_{(i,k)} = 1$ and $0 \leq p^*_{(i,k)} \leq 1$ for all $(i, k)$; see [4]. It can be proved that $P$ is irreducible and aperiodic since it is defined by $A$, which is connected and non-bipartite. Therefore the existence and uniqueness of the stationary distribution $p^*$ is guaranteed by the Perron-Frobenius Theorem [4]. The stationary distribution corresponds to the left eigenvector of $P$ associated with eigenvalue 1, and is obtained as $p^*_{(i,k)} = \kappa_{i,k} / \sum_{j,l} \kappa_{j,l}$.

There are two implications of the stationary distribution, which we need to consider in order to introduce a dissimilarity measure between nodes in the multiplex. First, as the number of steps $t$ tends to infinity, the probability of being on a node $v_j^l$ depends only on the degree of the node $v_j^l$ regardless of what the starting node is, i.e.:

$$ \lim_{t \to \infty} P^t_{(i,k)(j,l)} = p^*_{(j,l)} = \frac{\kappa_{j,l}}{\sum_{h,m} \kappa_{h,m}}, \quad \forall (i,k). \tag{4} $$

Second, the stationary distribution satisfies the time-reversibility property of the chain:

$$ \kappa_{i,k}\, P^t_{(i,k)(j,l)} = \kappa_{j,l}\, P^t_{(j,l)(i,k)}, \quad \forall (i,k),\ \forall (j,l). \tag{5} $$

These are standard results; see [4].

The convergence to the stationary distribution implies that for large $t$ all rows of the matrix $P^t$ approach the stationary distribution (4), thus the whole multiplex becomes one community.
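
Properties (4) and (5) can be verified numerically on a small example. The sketch below uses a hypothetical regularized matrix $A$ for 3 nodes and 2 layers (with $\varepsilon = 1$); the helpers `mat_mul` and `mat_pow` are plain-Python stand-ins for matrix routines and are not part of the method itself.

```python
def mat_mul(X, Y):
    """Plain-Python matrix product of two square matrices."""
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


def mat_pow(P, t):
    """Compute P^t by repeated multiplication (fine for tiny examples)."""
    size = len(P)
    result = [[1.0 if r == c else 0.0 for c in range(size)] for r in range(size)]
    for _ in range(t):
        result = mat_mul(result, P)
    return result


# Hypothetical regularized supra-adjacency matrix A (symmetric, connected,
# with self-loops), for 3 nodes and 2 layers.
A = [[1.0, 1.0, 0.0, 1.0, 0.0, 0.0],
     [1.0, 1.0, 0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0, 0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0, 1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0, 1.0, 1.0],
     [0.0, 0.0, 1.0, 0.0, 1.0, 1.0]]
kappa = [sum(row) for row in A]                      # multiplex degrees
P = [[a / kappa[r] for a in row] for r, row in enumerate(A)]

# Stationary distribution predicted by the degree formula.
p_star = [k / sum(kappa) for k in kappa]

# Equation (4): every row of P^t approaches p_star for large t.
P_long = mat_pow(P, 500)
max_gap = max(abs(P_long[r][c] - p_star[c]) for r in range(6) for c in range(6))

# Equation (5): kappa_a * P^t[a][b] == kappa_b * P^t[b][a] for a short walk.
P_short = mat_pow(P, 3)
max_asym = max(abs(kappa[a] * P_short[a][b] - kappa[b] * P_short[b][a])
               for a in range(6) for b in range(6))
```

Both `max_gap` and `max_asym` should be numerically negligible. The short-walk matrix `P_short`, in contrast, still has rows that differ from one another, which is what the dissimilarity measure in Section III-C exploits.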
On the other hand, for smaller $t$ the random walk captures local community structures (a fact also observed in [5] and [7]); therefore we use short random walks to detect communities as desired.

The time-reversibility property (5) implies that even if the probability of reaching $v_j^l$ in $t$ steps starting from $v_i^k$ is high, it does not follow that the probability of reaching $v_i^k$ in $t$ steps starting from $v_j^l$ is high. Therefore, it is insufficient to compare nodes $v_i^k$ and $v_j^l$ only through $P^t_{(i,k)(j,l)}$ or $P^t_{(j,l)(i,k)}$. For this reason we use the $NL$-dimensional vector of probabilities $P^t_{(i,k)(\cdot,\cdot)} = \left[ P^t_{(i,k)(h,m)} \right]_{m=1,\,h=1}^{L,\,N}$ available for a node $v_i^k$ to define its dissimilarity to any other node.

C. Node dissimilarity matrix

In this section we introduce an $NL \times NL$ node dissimilarity matrix $S(t)$ which depends on the multiplex random walk of length $t$. This matrix contains all possible distances between any pair of nodes, both within and between layers. These distances are defined such that, when two nodes belong to the same community, their distance is low, regardless of whether they are in the same layer or not; conversely, the distance between two nodes is large when they are not in the same community, again regardless of the layers they are in.

In order to define the elements of the dissimilarity matrix $S$ we need to consider two separate cases. In the case when we compare two nodes in the same layer, they are in the same community if they “see” in a similar way the rest of the nodes in the current layer and all nodes in the other layers. The other case is when we compare two nodes in two separate layers $L_k$ and $L_l$; they will be in the same community only if such a community is shared by both layers. Therefore, they will “see” in a similar way all nodes both in their respective layers $L_k$ and $L_l$ and in each other's layers $L_l$ and $L_k$.
Same layer: when $v_i^k$ and $v_j^k$ are in the same layer, their dissimilarity is defined as:

$$
S(t)_{(i,k)(j,k)} := \sqrt{ \sum_{h=1}^{N} \sum_{m=1}^{L} \frac{ \left( P^t_{(i,k)(h,m)} - P^t_{(j,k)(h,m)} \right)^2 }{ \kappa_{h,m} } }
= \left\| D^{-\frac{1}{2}} P^t_{(i,k)(\cdot,\cdot)} - D^{-\frac{1}{2}} P^t_{(j,k)(\cdot,\cdot)} \right\|
\tag{6}
$$

where $\| \cdot \|$ is the Euclidean norm. The distance $S(t)_{(i,k)(j,k)}$ is small when two nodes from the same layer, $v_i^k$ and $v_j^k$, are in the same community, since the probabilities of reaching any other node in layer $L_k$ starting from $v_i^k$ or $v_j^k$ will be approximately equal, $P^t_{(i,k)(h,k)} \simeq P^t_{(j,k)(h,k)}$, $\forall h = 1, 2, \ldots, N$. Moreover, the probabilities of reaching any other node in any other layer $L_l$, $l \neq k$, starting from $v_i^k$ or $v_j^k$ will also be almost equal, $P^t_{(i,k)(h,l)} \simeq P^t_{(j,k)(h,l)}$, $\forall h = 1, 2, \ldots, N$, $\forall l \neq k$.

Different layers: when $v_i^k$ and $v_j^l$ are in two different layers, $L_k$ and $L_l$, we define the dissimilarity as:

$$ S(t)_{(i,k)(j,l)} := \sqrt{ s_1 + s_2 + s_3 } $$

where

$$
\begin{aligned}
s_1 &:= \sum_{h=1}^{N} \left( \frac{P^t_{(i,k)(h,k)}}{\sqrt{\kappa_{h,k}}} - \frac{P^t_{(j,l)(h,l)}}{\sqrt{\kappa_{h,l}}} \right)^2, \\
s_2 &:= \sum_{h=1}^{N} \left( \frac{P^t_{(i,k)(h,l)}}{\sqrt{\kappa_{h,l}}} - \frac{P^t_{(j,l)(h,k)}}{\sqrt{\kappa_{h,k}}} \right)^2, \\
s_3 &:= \sum_{h=1}^{N} \sum_{m=1;\, m \neq k,l}^{L} \frac{ \left( P^t_{(i,k)(h,m)} - P^t_{(j,l)(h,m)} \right)^2 }{ \kappa_{h,m} }.
\end{aligned}
$$

This definition follows the usual approach to norm definition in Euclidean space. The distance $S(t)_{(i,k)(j,l)}$ is small when $v_i^k$ and $v_j^l$ are in the same community. The value of $s_1$ is small when the probabilities of reaching any node in layer $L_k$ starting from $v_i^k$ are approximately equal to the probabilities of reaching any node in layer $L_l$ starting from $v_j^l$, $P^t_{(i,k)(h,k)} \simeq P^t_{(j,l)(h,l)}$, $\forall h = 1, 2, \ldots, N$. The value of $s_2$ is small when the probabilities of reaching any node in layer $L_l$ starting from $v_i^k$ are approximately equal to the probabilities of reaching any node in layer $L_k$ starting from $v_j^l$, $P^t_{(i,k)(h,l)} \simeq P^t_{(j,l)(h,k)}$, $\forall h = 1, 2, \ldots, N$.
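
The two cases of the dissimilarity can be folded into one routine, since the same-layer formula (6) and the $s_3$ term share the same summand. The following is a minimal sketch under stated assumptions: the function name `dissimilarity`, the index convention (node $h$ in layer $m$ at position `m * n + h`, both zero-based), and the toy multiplex with two identical communities in two layers are all hypothetical choices made here.

```python
import math


def dissimilarity(Pt, kappa, n, a, b):
    """Distance S(t) between node-layer pairs a = (i, k) and b = (j, l)."""
    (i, k), (j, l) = a, b
    n_layers = len(Pt) // n
    ra, rb = Pt[k * n + i], Pt[l * n + j]
    total = 0.0
    for m in range(n_layers):
        for h in range(n):
            if k != l and m == k:
                # s1: a's view of its own layer k vs b's view of its own layer l.
                diff = (ra[k * n + h] / math.sqrt(kappa[k * n + h])
                        - rb[l * n + h] / math.sqrt(kappa[l * n + h]))
            elif k != l and m == l:
                # s2: a's view of layer l vs b's view of layer k.
                diff = (ra[l * n + h] / math.sqrt(kappa[l * n + h])
                        - rb[k * n + h] / math.sqrt(kappa[k * n + h]))
            else:
                # Same-layer case (6), or the s3 term over the remaining layers.
                diff = (ra[m * n + h] - rb[m * n + h]) / math.sqrt(kappa[m * n + h])
            total += diff * diff
    return math.sqrt(total)


def mat_mul(X, Y):
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


# Hypothetical toy multiplex: 4 nodes, 2 layers, communities {0, 1} and {2, 3}
# present in both layers; eps = 1.
n, n_layers, eps = 4, 2, 1.0
A = [[0.0] * (n * n_layers) for _ in range(n * n_layers)]
for m in range(n_layers):
    for u, v in [(0, 1), (2, 3)]:
        A[m * n + u][m * n + v] = A[m * n + v][m * n + u] = 1.0
    for h in range(n):
        A[m * n + h][m * n + h] = eps                   # self-loop
        for m2 in range(n_layers):
            if m2 != m:
                A[m * n + h][m2 * n + h] = 1.0 + eps    # omega = 1, plus eps
kappa = [sum(row) for row in A]
P = [[x / kappa[r] for x in row] for r, row in enumerate(A)]
Pt = P
for _ in range(3 * n_layers - 1):                       # walk length t = 3L
    Pt = mat_mul(Pt, P)

same = dissimilarity(Pt, kappa, n, (0, 0), (1, 0))         # same community, same layer
other = dissimilarity(Pt, kappa, n, (0, 0), (2, 0))        # different community
cross_same = dissimilarity(Pt, kappa, n, (0, 0), (1, 1))   # same community, other layer
cross_other = dissimilarity(Pt, kappa, n, (0, 0), (2, 1))  # different community
```

In this example `same` is smaller than `other` and `cross_same` is smaller than `cross_other`, which is the behavior the definitions are designed to produce.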
We see that if $v_i^k$ and $v_j^l$ are in the same community, then both $s_1$ and $s_2$ are small. However, if $v_i^k$ and $v_j^l$ do not belong to the same community, both $s_1$ and $s_2$ are large. The value of $s_3$ is small when the probabilities of reaching any node in another layer $L_m$, $m \neq k, l$, starting from $v_i^k$ or $v_j^l$ are approximately equal, $P^t_{(i,k)(h,m)} \simeq P^t_{(j,l)(h,m)}$, $\forall h = 1, 2, \ldots, |V|$, $\forall m \neq k, l$.

It can be verified that $S$ is symmetric, non-negative, homogeneous and satisfies the triangle inequality.

[Figure 1: three panels, "Layer 1, L1" (communities C1, C2, C4, C5, C6, C7, C8), "Layer 2, L2" (communities C2, C3, C4, C5, C6, C9, C10) and "Layer 3, L3" (communities C2, C3, C4, C5, C6, C9, C10).]

Fig. 1. Three-layered multiplex network: There is a one-to-one correspondence between nodes in all three layers $L_1$, $L_2$ and $L_3$. Each community has its own node color, and community borders are indicated by shadows which are labeled with respect to the community. Grey nodes represent nodes with no community assignment. Community C1 (red) exists only in $L_1$, while community C3 (black) is shared only by $L_2$ and $L_3$ and does not exist in $L_1$. The communities C2 (green), C4 (yellow), C5 (blue) and C6 (orange) are shared by all three layers. Communities C7 (pink) and C8 (light blue) are communities specific to $L_1$, but the nodes forming these two communities regroup into two other communities C9 (dark green) and C10 (purple) which are shared by $L_2$ and $L_3$. See further discussion on the structure of the different communities in Section V.

D. Agglomerative clustering

We use agglomerative clustering to merge nodes into communities, since such a method allows us to use the topology of the multiplex and ensure that the obtained communities are connected. The algorithm starts by assigning each node in each layer to its own community, and then it iteratively merges nodes based on the average linkage criterion using the distance matrix $S$. In order to ensure that each community, if any is detected, is connected, we impose the criterion that only nodes and communities having at least one within-layer or inter-layer connection between them can be merged. Furthermore, we use the multiplex modularity $Q_M$ proposed in [14] to choose the best partition, as this criterion takes into account both the within-layer and inter-layer connections of a detected community.

IV. SHARED AND NON-SHARED COMMUNITIES: ILLUSTRATIVE EXAMPLES

To provide a complete evaluation of the performance of LART, we introduce some illustrative examples meant to represent shared and non-shared communities. Figure 1 shows examples of different shared and non-shared communities and how their structures can be influenced by noise or the nature of the data used to obtain the networks. The specificity of the data can result in groups of nodes that belong to the same community although there might not be clear connectivity patterns between them.
These are discussed in more detail in the following two sections.

A. Shared communities

A shared community is a set of nodes for which several (but not necessarily all) layers provide topological evidence that these nodes form the same community. An example is a set of nodes that form densely connected communities in each of several layers in Figure 1. The nodes that form C3 (black) are densely connected in both layers $L_2$ and $L_3$. They do not form a community in $L_1$, so C3 is shared only by $L_2$ and $L_3$. Similar examples are communities C9 (dark green) and C10 (purple), both of which are shared by $L_2$ and $L_3$.

Detecting shared communities can help uncover hidden structures that could otherwise go undetected when considering each layer separately. Two such examples are communities C2 (green) and C4 (yellow), both of which are shared by all three layers. Here we observe the disjoint node subsets in C2 present in $L_1$ and $L_2$, and similarly, disjoint subsets in C4 present in $L_2$ and $L_3$. In such cases, the communities might be disjoint by chance or as a result of measurement errors.

Detecting shared community structures can also be helpful when we try to distinguish true signal from noise. Consider as an example communities C5 (blue) and C6 (orange). In $L_1$ and $L_3$, C5 and C6 are clearly disjoint and they are respectively shared by $L_1$ and $L_3$. However, in $L_2$ there are high white noise levels between C5 and C6.

B. Non-shared communities

A non-shared community is a set of nodes which have a densely connected structural pattern specific to one layer. For example, the nodes that form community C1 (red) are densely connected in $L_1$, but the same nodes do not form any communities in $L_2$ or $L_3$.

There can also exist various structural patterns between nodes in different layers. Therefore, the same sets of nodes can form non-shared communities in one layer, and shared communities in other layers.
For example, non-shared communities C7 (pink) and C8 (light blue) are specific to $L_1$. However, the union of these nodes is the same as the union of the nodes forming C9 and C10 in $L_2$ and $L_3$.

V. EXPERIMENTAL RESULTS

In this section we present our experimental results based on simulated networks. We consider five different scenarios of shared and non-shared communities in synthetic multiplexes. LART is compared to other multiplex community detection methods: multiplex modularity maximization (MM) [14], Principal Modularity Maximization (PMM) [11], and two methods for combining partitions obtained on the separate layers using community similarity measures, topological overlap $S_T$ [3] and Normalized Mutual Information (NMI) $S_M$ [20].

A. Simulation settings

We consider five different scenarios, each one describing a different pattern of shared and non-shared communities as discussed earlier:

Scenario S1: We consider both shared and non-shared communities, for three layers, $L = 3$. This scenario is motivated by communities C1 and C3 in Figure 1. The layers in which the communities exist are randomly sampled, and the sets of nodes forming these communities do not form other communities in other layers. The number of nodes on the layers is uniformly sampled from $30 \leq N \leq 90$.

Scenario S2: We consider communities shared by three layers, $L = 3$. This scenario is motivated by communities C2 and C4 in Figure 1. For each shared community, two layers are selected randomly. In these two layers the set of nodes forming the community is randomly split into two or three disjoint subsets. The number of nodes on the layers is uniformly sampled from $60 \leq N \leq 80$.

Scenario S3: We consider communities shared by three layers, $L = 3$. This scenario is motivated by communities C5 and C6 in Figure 1.
For two shared communities, we randomly select one layer in which, first, the within-community edge probability of the two communities is uniformly sampled from $0.10 \leq p \leq 0.20$, and, second, white noise is added between the two communities with probability uniformly sampled from $0.10 \leq p \leq 0.20$. The number of nodes on the layers is uniformly sampled from $60 \leq N \leq 80$.

Scenario S4: We consider both shared and non-shared communities in three layers, $L = 3$. This scenario is motivated by communities C7, C8, C9 and C10 in Figure 1. We randomly select two layers in which sets of nodes form shared communities. In the third layer, the same sets of nodes form non-shared communities. The topological structure of the non-shared communities is simulated to resemble bipartite sets with edge probability $p = 0.4$. The number of nodes in the layers is $N = 80$.

[Figure 2: heat map titled "Transition probabilities matrix for t=9 steps"; both axes are ordered by Layer 1 ($L_1$), Layer 2 ($L_2$), Layer 3 ($L_3$) and, within each layer, by community; color scale from 0.00 to 0.05.]

Fig. 2. Heat map of transition probabilities for a random walk of length $t = 9$ steps on the multiplex network in Figure 1: the order of nodes in each layer is the same, and the order is determined by the communities that nodes belong to. In each block, the order of the transition probabilities follows the order of communities C1, C2, C3, C4, C5, C6; communities C7 and C8 are in $L_1$, while communities C9 and C10 are in $L_2$ and $L_3$. The probabilities of moving between layers are much higher for communities that are shared by two layers (C3). Also, the probabilities of getting “trapped” in a layer-specific community are higher than those of moving to another layer (C1 in $L_1$ only).
Scenario S5: We consider both shared and non-shared communities in four layers, $L = 4$, where communities can be shared by no more than three out of the four layers. The motivation for this scenario is all communities of the multiplex in Figure 1. The communities are a mixture of the patterns considered in Scenarios 1 to 4. The number of nodes on the layers is sampled from $150 \le N \le 180$.

B. Comparative performance

Each community (or indicated disjoint subsets of nodes in a community) has a uniformly sampled within-community edge probability, $0.25 \le p \le 0.40$ (unless explicitly stated otherwise). For each one of the five scenarios, we randomly generate 100 synthetic multiplexes. Each multiplex is simulated using the following steps: first, we randomly sample the number of nodes and communities in the multiplex; second, each community is assigned the same set of nodes in the different layers; third, in each layer the edges between the nodes in these sets are simulated according to the community structure of the respective scenario. Last but not least, on each layer noise is added to represent the random connections between the communities.

The multiplexes generated in each simulation were analyzed using the LART algorithm. For each multiplex, we select the length of the random walk $t$ using the rule of thumb proposed in [5], which states that for dense networks $t = 3$ is sufficient to explore the local topology of the network. Since we work with multiplex networks, we consider $t = 3L$, where $L$ is the number of layers in the multiplex. In this way, we allow enough steps for the random walker to explore the local topology of each node in all layers. We provide an illustrative example of the transition probabilities for a short random walk of length $t = 9$ on a three-layered multiplex network, Figure 2. This figure shows that multiplex random walks of length $t = 3L$ capture the local community structures of the multiplex. We fix $\varepsilon = 1$, which is equivalent to adding a self-loop to each node in every layer.

TABLE I
PERFORMANCE OF COMPETING ALGORITHMS IN FIVE SIMULATED SCENARIOS

NMI index (mean ± std. dev.), by simulation scenario

Algorithm   S1            S2            S3            S4            S5
LART        0.85 ± 0.05   0.95 ± 0.05   0.94 ± 0.07   0.84 ± 0.07   0.88 ± 0.08
Fixed 1     0.70 ± 0.11   0.86 ± 0.10   0.88 ± 0.13   0.79 ± 0.03   0.68 ± 0.02
MM          0.80 ± 0.07   0.90 ± 0.07   0.79 ± 0.11   0.73 ± 0.04   0.82 ± 0.07
Fixed 2     0.64 ± 0.14   0.97 ± 0.05   0.88 ± 0.11   0.68 ± 0.02   0.66 ± 0.02
S_T         0.78 ± 0.07   0.70 ± 0.05   0.68 ± 0.08   0.77 ± 0.06   0.75 ± 0.08
S_M         0.78 ± 0.01   0.70 ± 0.05   0.68 ± 0.09   0.77 ± 0.05   0.75 ± 0.08
PMM         0.62 ± 0.13   0.97 ± 0.04   0.99 ± 0.02   0.69 ± 0.01   0.73 ± 0.16

FM index (mean ± std. dev.), by simulation scenario

Algorithm   S1            S2            S3            S4            S5
LART        0.80 ± 0.12   0.97 ± 0.04   0.97 ± 0.05   0.78 ± 0.12   0.80 ± 0.14
Fixed 1     0.72 ± 0.20   0.91 ± 0.11   0.87 ± 0.10   0.70 ± 0.01   0.57 ± 0.02
MM          0.70 ± 0.19   0.93 ± 0.06   0.85 ± 0.1    0.71 ± 0.03   0.75 ± 0.11
Fixed 2     0.55 ± 0.30   0.92 ± 0.10   0.97 ± 0.06   0.65 ± 0.03   0.57 ± 0.03
S_T         0.72 ± 0.17   0.78 ± 0.07   0.72 ± 0.06   0.61 ± 0.06   0.65 ± 0.12
S_M         0.72 ± 0.17   0.78 ± 0.09   0.72 ± 0.06   0.61 ± 0.08   0.66 ± 0.11
PMM         0.56 ± 0.23   0.97 ± 0.04   0.99 ± 0.01   0.66 ± 0.01   0.66 ± 0.21

Using the same simulated data, we also test the MM and the PMM algorithms. For MM, we use the supra-adjacency matrix $A^*$ as input. The multiplex modularity $Q_M$ used by both LART and MM needs the specification of a resolution parameter $\gamma$. In our application, we consider $\gamma = 1$. PMM is designed to find a shared community structure for all layers. Since PMM uses the $k$-means algorithm to merge communities, we obtain results for different numbers of centers $k = 1, 2, \ldots, 10$, and record the best ones only.
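To make the simulation steps and the choice $t = 3L$ concrete, the following is a minimal sketch under stated assumptions, not the implementation used for the experiments. It plants communities in each layer, adds background noise, builds a supra-adjacency matrix with self-loops $\varepsilon = 1$, and computes the $t$-step transition matrix. For simplicity it couples layers with a single fixed weight $\omega$ (as in the fixed-weight comparison), not the locally adaptive weights of LART. The function names, the noise probability and the small network size are hypothetical choices made for illustration.

```python
import random


def planted_layer(n, communities, p_in, p_noise, rng):
    """Symmetric 0/1 adjacency matrix for one layer.

    communities: list of node-index lists forming communities in this layer.
    p_in: within-community edge probability (hypothetical value in the demo).
    p_noise: probability of a random edge between any other pair of nodes.
    """
    label = {v: c for c, nodes in enumerate(communities) for v in nodes}
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            same = i in label and label.get(j) == label[i]
            if rng.random() < (p_in if same else p_noise):
                adj[i][j] = adj[j][i] = 1
    return adj


def supra_adjacency(layers, omega=1.0, eps=1.0):
    """Layers (plus eps self-loops) on the diagonal blocks and
    omega * identity on every off-diagonal block (fixed coupling, an
    assumption of this sketch)."""
    n_layers, n = len(layers), len(layers[0])
    size = n_layers * n
    sup = [[0.0] * size for _ in range(size)]
    for k in range(n_layers):
        for i in range(n):
            for j in range(n):
                sup[k * n + i][k * n + j] = float(layers[k][i][j])
            sup[k * n + i][k * n + i] += eps
            for other in range(n_layers):
                if other != k:
                    sup[k * n + i][other * n + i] = omega
    return sup


def transition_matrix(sup):
    """Row-normalise; every row sum is positive whenever eps > 0."""
    out = []
    for row in sup:
        total = sum(row)
        out.append([x / total for x in row])
    return out


def mat_mul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def t_step(p, t):
    """t-step transition probabilities, i.e. the matrix power p ** t."""
    out = p
    for _ in range(t - 1):
        out = mat_mul(out, p)
    return out


rng = random.Random(0)
n, n_layers = 12, 3
shared = [list(range(0, 6)), list(range(6, 12))]
# Two layers contain both communities; the third contains only the first one.
layers = [planted_layer(n, shared, 0.4, 0.05, rng) for _ in range(2)]
layers.append(planted_layer(n, [shared[0]], 0.4, 0.05, rng))
P_t = t_step(transition_matrix(supra_adjacency(layers)), 3 * n_layers)
```

Each row of `P_t` is a probability distribution over the node-layer pairs reachable in $t = 9$ steps, which is the kind of quantity visualised in Figure 2.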
Since LART reduces to WalkTrap for $L = 1$ (with the exception of the linkage criterion), we add two methods where best partitions are initially identified separately in each layer using WalkTrap. Then the similarity between communities in the different layers is assigned using either $S_T$, which is based on the relative overlap between communities, or $S_M$, which is based on the NMI between two communities. Any clustering method can be used to merge communities between layers with respect to the resulting similarities, and in this work we use affinity propagation [30] since it does not require a priori knowledge about the number of clusters.

Finally, we consider each multiplex network, but with fixed weights, $\omega_{i;kl} = \omega \;\; \forall i, k, l$. We use both LART and MM to detect communities in such a setting, and present results for $\omega = 1$ since it provides good comparative results for detecting communities shared by several layers. We annotate these two applications with Fixed 1 for the LART framework and Fixed 2 for the MM framework.

The partitions obtained from the competing algorithms are assessed using two different relative performance measures: the generalized Fowlkes-Mallows Index (FM) [31], and the NMI measure [20]. Both measures take values between 0 and 1, and they are equal to 1 when the true communities are correctly identified.

The comparative analysis results for NMI and FM are summarized in Table I. For each of the 5 scenarios, the average results (± std. dev.) over the 100 simulated multiplex networks are presented for each of the algorithms. We use the two-sided Kolmogorov-Smirnov statistic to test against the null hypothesis that the results obtained from two different methods come from the same distribution. If there is enough evidence to reject the null hypothesis, we assume that the difference between two methods is statistically significant.
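The evaluation measures described above can be illustrated with a short, self-contained sketch. It is an assumption-laden illustration, not the evaluation code behind Table I: `nmi` uses the geometric-mean normalisation of [20], `fowlkes_mallows` is the standard pair-counting index (the generalized variant used in the experiments may differ), and `ks_statistic` returns only the two-sample Kolmogorov-Smirnov distance without a p-value. All function names are hypothetical.

```python
from bisect import bisect_right
from collections import Counter
from math import log, sqrt


def nmi(a, b):
    """Normalized mutual information, I(a; b) / sqrt(H(a) * H(b))."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    h_a = -sum(c / n * log(c / n) for c in ca.values())
    h_b = -sum(c / n * log(c / n) for c in cb.values())
    if h_a == 0 or h_b == 0:
        # Degenerate case: at least one partition has a single cluster.
        return 1.0 if h_a == h_b else 0.0
    mi = sum(c / n * log(c * n / (ca[x] * cb[y])) for (x, y), c in cab.items())
    return mi / sqrt(h_a * h_b)


def fowlkes_mallows(a, b):
    """Pair-counting Fowlkes-Mallows index between two labelings."""
    def pairs(counts):
        return sum(c * (c - 1) // 2 for c in counts)

    together_in_both = pairs(Counter(zip(a, b)).values())
    together_in_a = pairs(Counter(a).values())
    together_in_b = pairs(Counter(b).values())
    if together_in_a == 0 or together_in_b == 0:
        return 0.0
    return together_in_both / sqrt(together_in_a * together_in_b)


def ks_statistic(x, y):
    """Largest gap between the empirical CDFs of two samples."""
    xs, ys = sorted(x), sorted(y)
    gap = 0.0
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)
        fy = bisect_right(ys, v) / len(ys)
        gap = max(gap, abs(fx - fy))
    return gap


truth = [0, 0, 0, 1, 1, 1]
relabelled = [1, 1, 1, 0, 0, 0]   # same partition, different label names
mixed = [0, 1, 0, 1, 0, 1]        # partition unrelated to the truth
scores = (nmi(truth, relabelled), fowlkes_mallows(truth, relabelled),
          fowlkes_mallows(truth, mixed))
```

Both indices depend only on the partition and not on the label names, so the relabelled partition scores 1 on each. In practice the per-method scores over the 100 simulated multiplexes would be passed to `ks_statistic`, or to a library routine such as `scipy.stats.ks_2samp` when a p-value is needed.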
The relative performance of LART when detecting communities shared across all layers is very competitive. Scenarios S2 and S3 are cases in which the communities are shared across all layers. These scenarios are favorable for methods that find communities shared across all layers and are designed to be robust to noise. The results suggest this is the case since PMM performs slightly better relative to the other methods. This holds when an appropriate number of centers $k$ has been selected. Fixed 2 also performs better than LART for S3 since the MM framework for fixed weights is well designed to detect communities shared across all layers. Furthermore, the performance of methods that combine partitions obtained on the separate layers, $S_T$ and $S_M$, is poor compared to the other methods.

Scenarios S1, S4 and S5 show the main strength of LART. LART is better able to detect layer-specific communities and communities that are shared across several but not all layers. Furthermore, the results from S4 show that the method distinguishes between different topological structures of communities in different layers. The weaker performance of Fixed 1 and Fixed 2 shows the gains introduced by locally adapting inter-layer weights $\omega_{i;kl}$.

We produce additional simulations for $\omega = 0.5$ and $\omega = 0.1$ which are not included here. The obtained results for $\omega = 0.5, 0.1$ are lower than the presented ones for $\omega = 1$. The only exception is S4, for which weaker couplings $\omega = 0.5, 0.1$ show better performance than $\omega = 1$ for the MM framework. Additionally, we obtain results for varying parameter $\gamma = 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5$ in the case of LART and MM (and $\omega = 1, 0.5, 0.1$). Results not included here show that there exist values of $\gamma \neq 1$ for which the performance of both LART and MM improves with respect to their performance for $\gamma = 1$.
However, even when the best results are selected and compared, the performance of LART for S1, S4 and S5 is higher than the performance of MM. In addition, there exist $\gamma$ values for which MM with $\omega = 1$ and $\omega = 0.5$ performs without error for S2 and S3.

VI. CONCLUSIONS

We distinguish between shared and non-shared community structures on a multiplex, and propose the LART algorithm, which is designed to detect both types of communities. The algorithm takes advantage of the complex multiplex structure, and adapts the transition probabilities of the random walk to depend on the topological similarity between layers at any given node.

One advantage of LART is that it requires the definition of only one parameter $t$, which determines the length of the random walk. The value of $t$ can vary within some boundaries as long as the random walks are short enough to explore only the local community structure. Future work is therefore required to establish a principled way of choosing a range of suitable values for $t$.

Even so, LART performs very well in detecting communities shared by a subset of layers, and it is competitive with methods that detect communities shared by all layers. Future work would include comparison to other methods mentioned in the review section. The method will also be applied to real-world systems to showcase the benefits of the LART algorithm.

REFERENCES

[1] S. H. Strogatz, “Exploring complex networks,” Nature, vol. 410, no. 6825, pp. 268–276, 2001.
[2] M. Girvan and M. E. Newman, “Community structure in social and biological networks,” PNAS, vol. 99, no. 12, pp. 7821–7826, 2002.
[3] S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, no. 3, pp. 75–174, 2010.
[4] L. Lovász, “Random walks on graphs: A survey,” Combinatorics, Paul Erdős is eighty, vol. 2, no. 1, pp. 1–46, 1993.
[5] P. Pons and M. Latapy, “Computing communities in large networks using random walks,” in Computer and Information Sciences-ISCIS 2005. Springer, 2005, pp. 284–293.
[6] H. Zhou and R. Lipowsky, “Network brownian motion: A new method to measure vertex-vertex proximity and to identify communities and subcommunities,” in Computational Science-ICCS 2004. Springer, 2004, pp. 1062–1069.
[7] M. Rosvall and C. T. Bergstrom, “Maps of random walks on complex networks reveal community structure,” Proceedings of the National Academy of Sciences, vol. 105, no. 4, pp. 1118–1123, 2008.
[8] S. Boccaletti, G. Bianconi, R. Criado, C. Del Genio, J. Gómez-Gardeñes, M. Romance, I. Sendina-Nadal, Z. Wang, and M. Zanin, “The structure and dynamics of multilayer networks,” Physics Reports, vol. 544, no. 1, pp. 1–122, 2014.
[9] M. Kivelä, A. Arenas, M. Barthelemy, J. P. Gleeson, Y. Moreno, and M. A. Porter, “Multilayer networks,” Journal of Complex Networks, vol. 2, no. 3, pp. 203–271, 2014.
[10] M. Berlingerio, M. Coscia, and F. Giannotti, “Finding and characterizing communities in multidimensional networks,” in Advances in Social Networks Analysis and Mining (ASONAM), 2011 International Conference on. IEEE, 2011, pp. 490–494.
[11] L. Tang, X. Wang, and H. Liu, “Uncovering groups via heterogeneous interaction analysis,” in Data Mining, 2009. ICDM’09. Ninth IEEE International Conference on. IEEE, 2009, pp. 503–512.
[12] A. Amelio and C. Pizzuti, “Uncovering communities in multidimensional networks with multiobjective genetic algorithms,” in Proceedings of GECCO Comp. ACM, 2014, pp. 75–76.
[13] R. Dobrin, J. Zhu, C. Molony, C. Argman, M. L. Parrish, S. Carlson, M. F. Allan, D. Pomp, E. E. Schadt et al., “Multi-tissue coexpression networks reveal unexpected subnetworks associated with disease,” Genome Biol, vol. 10, no. 5, p. R55, 2009.
[14] P. J. Mucha, T. Richardson, K. Macon, M. A. Porter, and J.-P. Onnela, “Community structure in time-dependent, multiscale, and multiplex networks,” Science, vol. 328, no. 5980, pp. 876–878, 2010.
[15] M. De Domenico, A. Lancichinetti, A. Arenas, and M. Rosvall, “Identifying modular flows on multilayer networks reveals highly overlapping organization in interconnected systems,” Physical Review X, vol. 5, no. 1, p. 011027, 2015.
[16] M. De Domenico, A. Solé-Ribalta, S. Gómez, and A. Arenas, “Navigability of interconnected networks under random failures,” PNAS, vol. 111, no. 23, pp. 8351–8356, 2014.
[17] M. Berlingerio, M. Coscia, and F. Giannotti, “Finding redundant and complementary communities in multidimensional networks,” in Proceedings of the 20th ACM CIKM. ACM, 2011, pp. 2181–2184.
[18] G. Zhu and K. Li, “A unified model for community detection of multiplex networks,” in Web Information Systems Engineering–WISE 2014. Springer, 2014, pp. 31–46.
[19] D. Cai, Z. Shao, X. He, X. Yan, and J. Han, “Mining hidden community in heterogeneous social networks,” in Proceedings of the 3rd international workshop on Link discovery. ACM, 2005, pp. 58–65.
[20] A. Strehl and J. Ghosh, “Cluster ensembles – a knowledge reuse framework for combining multiple partitions,” The Journal of Machine Learning Research, vol. 3, pp. 583–617, 2003.
[21] M. Berlingerio, F. Pinelli, and F. Calabrese, “Abacus: frequent pattern mining-based community discovery in multidimensional networks,” Data Mining and Knowledge Discovery, vol. 27, no. 3, 2013.
[22] P. Bródka, T. Filipowski, and P. Kazienko, “An introduction to community detection in multi-layered social network,” in Information Systems, E-Learning, and Knowledge Management Research. Springer, 2013, pp. 185–190.
[23] M. Hmimida and R. Kanawati, “Community detection in multiplex networks: A seed-centric approach,” Networks and Heterogeneous Media, vol. 10, no. 1, pp. 71–85, 2015.
[24] B. Boden, S. Günnemann, H. Hoffmann, and T. Seidl, “Mining coherent subgraphs in multi-layer graphs with edge labels,” in Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2012, pp. 1258–1266.
[25] M. De Domenico, A. Solé-Ribalta, E. Cozzo, M. Kivelä, Y. Moreno, M. A. Porter, S. Gómez, and A. Arenas, “Mathematical formulation of multilayer networks,” Physical Review X, vol. 3, no. 4, p. 041022, 2013.
[26] X. Li, M. K. Ng, and Y. Ye, “Multicomm: Finding community structure in multi-dimensional networks,” Knowledge and Data Engineering, IEEE Transactions on, vol. 26, no. 4, pp. 929–941, 2014.
[27] E. E. Papalexakis, L. Akoglu, and D. Ienco, “Do more views of a graph help? community detection and clustering in multi-graphs,” in Information Fusion (FUSION), 2013 16th International Conference on. IEEE, 2013, pp. 899–905.
[28] L. Gauvin, A. Panisson, and C. Cattuto, “Detecting the community structure and activity patterns of temporal networks: a non-negative tensor factorization approach,” PloS one, vol. 9, no. 1, 2014.
[29] A. Sole-Ribalta, M. De Domenico, N. E. Kouvaris, A. Diaz-Guilera, S. Gomez, and A. Arenas, “Spectral properties of the laplacian of multiplex networks,” Physical Review E, vol. 88, no. 3, p. 032807, 2013.
[30] B. J. Frey and D. Dueck, “Clustering by passing messages between data points,” Science, vol. 315, no. 5814, pp. 972–976, 2007.
[31] E. B. Fowlkes and C. L. Mallows, “A method for comparing two hierarchical clusterings,” Journal of the American Statistical Association, vol. 78, no. 383, pp. 553–569, 1983.
