Empirical Comparison of Algorithms for Network Community Detection

Reading time: 6 minutes
...

📝 Original Info

  • Title: Empirical Comparison of Algorithms for Network Community Detection
  • ArXiv ID: 1004.3539
  • Date: 2010-04-21
  • Authors: Jure Leskovec, Kevin J. Lang, Michael W. Mahoney

📝 Abstract

Detecting clusters or communities in large real-world graphs such as large social or information networks is a problem of considerable interest. In practice, one typically chooses an objective function that captures the intuition of a network cluster as a set of nodes with better internal connectivity than external connectivity, and then one applies approximation algorithms or heuristics to extract sets of nodes that are related to the objective function and that "look like" good communities for the application of interest. In this paper, we explore a range of network community detection methods in order to compare them and to understand their relative performance and the systematic biases in the clusters they identify. We evaluate several common objective functions that are used to formalize the notion of a network community, and we examine several different classes of approximation algorithms that aim to optimize such objective functions. In addition, rather than simply fixing an objective and asking for an approximation to the best cluster of any size, we consider a size-resolved version of the optimization problem. Considering community quality as a function of its size provides a much finer lens with which to examine community detection algorithms, since objective functions and approximation algorithms often have non-obvious size-dependent behavior.
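The size-resolved view can be illustrated on a toy graph. The sketch below makes two assumptions. First, it uses conductance (one of the keywords listed for the paper) as the objective. Second, it defines the score at size k as the minimum conductance over all sets of exactly k nodes. The graph, the function names, and the exhaustive search are hypothetical illustrations, not the paper's algorithms, which must approximate this quantity on graphs far too large to enumerate.

```python
from itertools import combinations

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]


def build_adjacency(edges):
    """Undirected adjacency sets built from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def conductance(adj, s):
    """Edges leaving S divided by the smaller of vol(S) and vol(V minus S).

    vol is the sum of node degrees (assumed standard definition).
    """
    s = set(s)
    cut = sum(1 for u in s for v in adj[u] if v not in s)
    vol_s = sum(len(adj[u]) for u in s)
    vol_rest = sum(len(adj[u]) for u in adj) - vol_s
    return cut / min(vol_s, vol_rest)


def size_resolved_profile(adj):
    """Map each size k to (minimum conductance, a minimizing set of k nodes).

    Exhaustive enumeration, so this is feasible only for tiny graphs. Sizes
    stop at half the graph because conductance of S equals that of V minus S.
    """
    nodes = sorted(adj)
    profile = {}
    for k in range(1, len(nodes) // 2 + 1):
        best = min(combinations(nodes, k), key=lambda s: conductance(adj, s))
        profile[k] = (conductance(adj, best), set(best))
    return profile


if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    for k, (phi, members) in size_resolved_profile(adj).items():
        print(k, round(phi, 3), sorted(members))
```

On this graph the best score is 1.0 at k = 1, 0.5 at k = 2, and 1/7 (about 0.143) at k = 3. Even in a six-node example, which set looks like the best community depends on the size at which one looks.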

📄 Full Content

arXiv:1004.3539v1 [cs.DS] 20 Apr 2010

Empirical Comparison of Algorithms for Network Community Detection

Jure Leskovec, Stanford University (jure@cs.stanford.edu)
Kevin J. Lang, Yahoo! Research (langk@yahoo-inc.com)
Michael W. Mahoney, Stanford University (mmahoney@cs.stanford.edu)

Copyright is held by the International World Wide Web Conference Committee (IW3C2). Distribution of these papers is limited to classroom use, and personal use by others. WWW 2010, April 26–30, 2010, Raleigh, North Carolina, USA. ACM 978-1-60558-799-8/10/04.

Categories and Subject Descriptors: H.2.8 Database Management: Database applications – Data mining
General Terms: Measurement; Experimentation.
Keywords: Community structure; Graph partitioning; Conductance; Spectral methods; Flow-based methods.

1. INTRODUCTION

Detecting clusters or communities in real-world graphs such as large social networks, web graphs, and biological networks is a problem of considerable practical interest that has received a great deal of attention [16, 17, 13, 8, 19]. A “network community” (also sometimes referred to as a module or cluster) is typically thought of as a group of nodes with more and/or better interactions amongst its members than between its members and the remainder of the network [30, 16]. To extract such sets of nodes, one typically chooses an objective function that captures the above intuition of a community as a set of nodes with better internal connectivity than external connectivity. Then, since the objective is typically NP-hard to optimize exactly [24, 4, 31], one employs heuristics [16, 20, 9] or approximation algorithms [25, 33, 2] to find sets of nodes that approximately optimize the objective function and that can be understood or interpreted as “real” communities. Alternatively, one might define communities operationally to be the output of a community detection procedure, hoping they bear some relationship to the intuition as to what it means for a set of nodes to be a good community [16, 29]. Once extracted, such clusters of nodes are often interpreted as organizational units in social networks, functional units in biochemical networks, ecological niches in food web networks, or scientific disciplines in citation and collaboration networks [16, 30].
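Conductance is one widely used objective of this kind and appears among the paper's keywords. This excerpt does not define it, so the definition below is the standard one and is an assumption here. It is followed by a hand-worked value on a hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the single edge (2,3), where d(u) is the degree of node u.

```latex
\phi(S) = \frac{\lvert \{ (u,v) \in E : u \in S,\ v \notin S \} \rvert}
               {\min\bigl(\mathrm{vol}(S),\ \mathrm{vol}(V \setminus S)\bigr)},
\qquad \mathrm{vol}(S) = \sum_{u \in S} d(u)

S = \{0,1,2\}:\quad
\phi(S) = \frac{1}{\min(2+2+3,\ 2+2+3)} = \frac{1}{7} \approx 0.143
```

Lower is better. Cutting out a whole triangle crosses one edge and scores 1/7. Taking only nodes 0 and 1 crosses two edges with volume 4 and scores 0.5. This matches the intuition that the full triangle is the better-connected, better-separated community.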
In applications, it is important to note that heuristic approaches to and approximation algorithms for community detection often find clusters that are systematically “biased,” in the sense that they return sets of nodes with properties that might be substantially different from those of the set of nodes that achieves the global optimum of the chosen objective. For example, many spectral-based methods tend to find compact clusters at the expense of those clusters being less well separated from the rest of the network, while other methods tend to find better-separated clusters that may internally be “less nice.” Moreover, certain methods tend to perform particularly well or particularly poorly on certain kinds of graphs, e.g., low-dimensional manifolds or expanders. Thus, drawing on this experience, it is of interest to compare these algorithms on large real-world networks that have many complex structural features such as sparsity, heavy-tailed degree distributions, small diameters, etc. Furthermore, depending on the particular application and the properties of the network being analyzed, one might prefer to identify specific types of clusters. Understanding structural properties of clusters identified by various algorithmic methods and various objective functions can guide in selecting the most appropriate graph clu

…(Full text truncated)…
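The spectral-based methods mentioned above often embed the nodes using an eigenvector of a normalized graph matrix and then sweep over the resulting node ordering. The sketch below is one such procedure under stated assumptions:

- It uses the lazy normalized adjacency (I + D^{-1/2} A D^{-1/2}) / 2.
- It finds the eigenvector with plain power iteration instead of a linear-algebra library.
- It uses conductance as the sweep criterion.

The toy graph and all function names are hypothetical. This is not presented as the specific spectral algorithm the paper evaluates.

```python
import math

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]


def build_adjacency(n, edges):
    """Undirected adjacency lists for nodes 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def conductance(adj, s):
    """Edges leaving S over the smaller of vol(S), vol(V minus S)."""
    s = set(s)
    cut = sum(1 for u in s for v in adj[u] if v not in s)
    vol_s = sum(len(adj[u]) for u in s)
    vol_rest = sum(len(nbrs) for nbrs in adj) - vol_s
    return cut / min(vol_s, vol_rest)


def second_eigenvector_embedding(adj, iters=500):
    """Power iteration for the second eigenvector of (I + D^-1/2 A D^-1/2) / 2.

    The top eigenvector is proportional to sqrt(degree), so it is projected out
    at every step. The result is rescaled by D^-1/2 to give node coordinates.
    """
    n = len(adj)
    sqrt_d = [math.sqrt(len(nbrs)) for nbrs in adj]
    norm = math.sqrt(sum(x * x for x in sqrt_d))
    top = [x / norm for x in sqrt_d]

    def deflate(x):
        dot = sum(a * b for a, b in zip(x, top))
        return [a - dot * b for a, b in zip(x, top)]

    x = deflate([float(i) for i in range(n)])  # deterministic start vector
    for _ in range(iters):
        y = []
        for u in range(n):
            walk = sum(x[v] / (sqrt_d[u] * sqrt_d[v]) for v in adj[u])
            y.append(0.5 * x[u] + 0.5 * walk)
        y = deflate(y)
        length = math.sqrt(sum(a * a for a in y))
        x = [a / length for a in y]
    return [x[u] / sqrt_d[u] for u in range(n)]


def spectral_sweep(adj):
    """Order nodes by the embedding; return the lowest-conductance prefix set."""
    emb = second_eigenvector_embedding(adj)
    order = sorted(range(len(adj)), key=lambda u: emb[u])
    best_phi, best_set = math.inf, None
    for k in range(1, len(order)):
        phi = conductance(adj, order[:k])
        if phi < best_phi:
            best_phi, best_set = phi, set(order[:k])
    return best_phi, best_set


if __name__ == "__main__":
    print(spectral_sweep(build_adjacency(6, EDGES)))
```

On the two-triangle graph, the sweep recovers one of the triangles with conductance 1/7. A graph this small cannot exhibit the systematic biases discussed above. It only shows the mechanics of embedding, ordering, and sweeping.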

Reference

This content is AI-processed based on ArXiv data.
