Finding Community Structure Based on Subgraph Similarity

📝 Abstract

Community identification is a long-standing challenge in modern network science, especially for very large networks containing millions of nodes. In this paper, we propose a new metric to quantify the structural similarity between subgraphs, and based on it we design an algorithm for community identification. Extensive empirical results on several real networks from disparate fields demonstrate that the present algorithm provides the same level of reliability, as measured by modularity, while taking much less time than the well-known fast algorithm proposed by Clauset, Newman and Moore (CNM). We further propose a hybrid algorithm that simultaneously enhances modularity and saves computational time compared with the CNM algorithm.

📄 Content

arXiv:0902.2425v1 [cs.NI] 14 Feb 2009

Finding Community Structure Based on Subgraph Similarity
Biao Xiang, En-Hong Chen, and Tao Zhou

Biao Xiang, En-Hong Chen: Department of Computer Science, University of Science and Technology of China, Hefei Anhui 230009, P. R. China. e-mail: cheneh@ustc.edu.cn
Tao Zhou: Department of Modern Physics, University of Science and Technology of China, Hefei Anhui 230026, P. R. China, and Department of Physics, University of Fribourg, Chemin du Musée 3, Fribourg 1700, Switzerland. e-mail: zhutou@ustc.edu

1 Introduction

The study of complex networks has become a common focus of many branches of science [1]. An open problem that attracts increasing attention is the identification and analysis of communities [2]. Communities can be loosely defined as distinct subsets of nodes that are densely connected internally but only sparsely connected to one another [3]. Knowledge of community structure is important for understanding network evolution [4] and the dynamics taking place on networks, such as epidemic spreading [5, 6] and synchronization [7, 8]. In addition, reasonable identification of communities helps enhance the accuracy of information filtering and recommendation [9].

Many algorithms for community identification have been proposed. These include the agglomerative method based on node similarity [10], the divisive method via iterative removal of the edge with the highest betweenness [3, 11], the divisive method based on a dissimilarity index between nearest-neighboring nodes [12], a local algorithm based on the edge-clustering coefficient [13], the Potts model for fuzzy community detection [14], simulated annealing [15], extremal optimization [16], spectrum-based algorithms [17], an iterative algorithm based on message passing [18], and so on. Finding the optimal division into communities, as measured by modularity [11], is very hard [19], and in most cases only a near-optimal division can be obtained. Generally speaking, without prior knowledge such as the maximal community size or the number of communities, algorithms that achieve higher modularity are more time consuming [20]. Consequently, providing an accurate division of communities for a very large network in reasonable time is a major challenge in modern network science.

To address this issue, Newman proposed a fast greedy algorithm with time complexity $O(n^2)$ for sparse networks [21], where $n$ denotes the number of nodes. Clauset, Newman, and Moore (CNM) subsequently designed an improved algorithm that gives identical results with a lower computational complexity of $O(n \log^2 n)$ [22]. In this paper, based on a newly proposed metric of similarity between subgraphs, we design an agglomerative algorithm for community identification that gives the same level of reliability but is typically hundreds of times faster than the CNM algorithm. We further propose a hybrid method that simultaneously enhances modularity and saves computational time compared with the CNM algorithm.
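The Python sketch below is offered for readers who want to see modularity [11] and greedy modularity agglomeration in concrete form. It computes $Q = \sum_c [L_c/m - (d_c/2m)^2]$, where $L_c$ is the number of edges inside community $c$, $d_c$ is its total degree and $m$ is the number of edges. It then repeatedly merges the adjacent pair of communities with the largest gain $\Delta Q = e_{ij}/m - d_i d_j/(2m^2)$. This is a naive illustration of the greedy idea, not CNM's heap-based implementation. It recomputes all community statistics at every step and stops as soon as no merge increases $Q$, instead of building the full dendrogram. The function names are hypothetical.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Q = sum_c [L_c/m - (d_c/(2m))^2] for a simple undirected graph.

    edges: list of (u, v) pairs without self-loops or duplicates.
    membership: dict mapping node -> community label.
    """
    m = len(edges)
    internal = defaultdict(int)  # L_c: edges with both endpoints in community c
    degree = defaultdict(int)    # d_c: total degree of the nodes in community c
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def greedy_modularity(edges):
    """Naive greedy agglomeration starting from singleton communities.

    Each step merges the adjacent pair (i, j) with the largest gain
    dQ = e_ij/m - d_i*d_j/(2m^2); stops when no merge increases Q.
    """
    m = len(edges)
    nodes = {x for e in edges for x in e}
    membership = {x: x for x in nodes}
    while True:
        between = defaultdict(int)  # e_ij: edges joining communities i != j
        degree = defaultdict(int)   # d_i: total degree of community i
        for u, v in edges:
            cu, cv = membership[u], membership[v]
            degree[cu] += 1
            degree[cv] += 1
            if cu != cv:
                between[tuple(sorted((cu, cv)))] += 1
        best, best_gain = None, 0.0
        for (ci, cj), e_ij in between.items():
            gain = e_ij / m - degree[ci] * degree[cj] / (2 * m * m)
            if gain > best_gain:
                best, best_gain = (ci, cj), gain
        if best is None:  # no merge raises Q any further
            return membership
        ci, cj = best
        for x in nodes:  # relabel every node of community cj as ci
            if membership[x] == cj:
                membership[x] = ci


if __name__ == "__main__":
    # Two triangles joined by the single edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    membership = greedy_modularity(edges)
    print(round(modularity(edges, membership), 3))  # 0.357
```

On this toy graph, the greedy merges recover the two triangles $\{0,1,2\}$ and $\{3,4,5\}$, with $Q = 5/14$.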
The rest of this paper is organized as follows. In Section 2, we introduce the present method, including the new metric of subgraph similarity, the corresponding algorithm, and the hybrid algorithm. In Section 3, we briefly describe the empirical data used in this paper. The performance of the proposed algorithms, in terms of both accuracy and computational time, is presented in Section 4. Finally, Section 5 summarizes the paper.

2 Method

Consider an undirected simple network $G(V, E)$, where $V$ is the set of nodes and $E$ is the set of edges; multiple edges and self-loops are not allowed. Let $\Gamma = \{V_1, V_2, \dots, V_h\}$ denote a division of $G$, that is, $V_i \cap V_j = \emptyset$ for $1 \le i \ne j \le h$ and $V_1 \cup V_2 \cup \cdots \cup V_h = V$. We propose a new metric of similarity between two subgraphs $V_i$ and $V_j$:

$$s_{ij} = \frac{e_{ij} + \sum_{k=1}^{h} \frac{\sqrt{e_{ik} e_{kj}}}{|V_k|}}{\sqrt{d_i d_j}}, \qquad (1)$$

where $e_{ij}$ is the number of edges with one endpoint in $V_i$ and the other in $V_j$ ($e_{ij}$ is defined to be zero if $i = j$), $|V_k|$ is the number of nodes in subgraph $V_k$, and $d_i = \sum_{x \in V_i} k_x$ is the sum of the degrees $k_x$ of the nodes in $V_i$.

This content is AI-processed based on ArXiv data.