Detecting clusters or communities in large real-world graphs such as large social or information networks is a problem of considerable interest. In practice, one typically chooses an objective function that captures the intuition of a network cluster as a set of nodes with better internal connectivity than external connectivity, and then one applies approximation algorithms or heuristics to extract sets of nodes that are related to the objective function and that "look like" good communities for the application of interest. In this paper, we explore a range of network community detection methods in order to compare them and to understand their relative performance and the systematic biases in the clusters they identify. We evaluate several common objective functions that are used to formalize the notion of a network community, and we examine several different classes of approximation algorithms that aim to optimize such objective functions. In addition, rather than simply fixing an objective and asking for an approximation to the best cluster of any size, we consider a size-resolved version of the optimization problem. Considering community quality as a function of its size provides a much finer lens with which to examine community detection algorithms, since objective functions and approximation algorithms often have non-obvious size-dependent behavior.
arXiv:1004.3539v1 [cs.DS] 20 Apr 2010
Empirical Comparison of Algorithms for Network Community Detection
Jure Leskovec
Stanford University
jure@cs.stanford.edu
Kevin J. Lang
Yahoo! Research
langk@yahoo-inc.com
Michael W. Mahoney
Stanford University
mmahoney@cs.stanford.edu
ABSTRACT
Detecting clusters or communities in large real-world graphs such
as large social or information networks is a problem of considerable
interest. In practice, one typically chooses an objective function
that captures the intuition of a network cluster as a set of nodes with
better internal connectivity than external connectivity, and then one
applies approximation algorithms or heuristics to extract sets of
nodes that are related to the objective function and that “look like”
good communities for the application of interest.
In this paper, we explore a range of network community detection
methods in order to compare them and to understand their relative
performance and the systematic biases in the clusters they identify.
We evaluate several common objective functions that are used
to formalize the notion of a network community, and we examine
several different classes of approximation algorithms that aim to
optimize such objective functions. In addition, rather than simply
fixing an objective and asking for an approximation to the best
cluster of any size, we consider a size-resolved version of the
optimization problem. Considering community quality as a function of its
size provides a much finer lens with which to examine community
detection algorithms, since objective functions and approximation
algorithms often have non-obvious size-dependent behavior.
Categories and Subject Descriptors: H.2.8 Database Management: Database applications – Data mining
General Terms: Measurement; Experimentation.
Keywords: Community structure; Graph partitioning; Conductance; Spectral methods; Flow-based methods.
1. INTRODUCTION
Detecting clusters or communities in real-world graphs such as
large social networks, web graphs, and biological networks is a
problem of considerable practical interest that has received a great
deal of attention [16, 17, 13, 8, 19]. A “network community” (also
sometimes referred to as a module or cluster) is typically thought of
as a group of nodes with more and/or better interactions amongst
its members than between its members and the remainder of the
network [30, 16].
Copyright is held by the International World Wide Web Conference
Committee (IW3C2). Distribution of these papers is limited to classroom use,
and personal use by others.
WWW 2010, April 26–30, 2010, Raleigh, North Carolina, USA.
ACM 978-1-60558-799-8/10/04.
To extract such sets of nodes one typically chooses an objective
function that captures the above intuition of a community as a set
of nodes with better internal connectivity than external connectivity.
Then, since the objective is typically NP-hard to optimize
exactly [24, 4, 31], one employs heuristics [16, 20, 9] or approximation
algorithms [25, 33, 2] to find sets of nodes that approximately
optimize the objective function and that can be understood or
interpreted as “real” communities. Alternatively, one might define
communities operationally to be the output of a community detection
procedure, hoping they bear some relationship to the intuition
as to what it means for a set of nodes to be a good community [16,
29]. Once extracted, such clusters of nodes are often interpreted
as organizational units in social networks, functional units in
biochemical networks, ecological niches in food web networks, or
scientific disciplines in citation and collaboration networks [16, 30].
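As a hedged illustration of such an objective function, the sketch below scores a node set by its conductance, which is listed among the paper's keywords. This excerpt does not state the formula, so the standard definition is assumed here: the number of edges leaving the set divided by the smaller of the total degree inside the set and the total degree outside it. Lower values mean better separation. The names `conductance`, `best_by_size`, and `two_triangles` are hypothetical, and `best_by_size` is only a toy rendering of the size-resolved view described in the abstract, not the procedure used in the paper.

```python
def conductance(adj, nodes):
    """Conductance of `nodes` in an undirected, unweighted graph.

    adj: dict mapping each node to the set of its neighbours
         (assumed symmetric and without self-loops).
    Assumed standard definition: cut(S) / min(vol(S), vol(V \\ S)).
    """
    s = set(nodes)
    # Edges with exactly one endpoint inside the set.
    cut = sum(1 for u in s for v in adj[u] if v not in s)
    # Volume = sum of degrees.
    vol_s = sum(len(adj[u]) for u in s)
    vol_rest = sum(len(neigh) for neigh in adj.values()) - vol_s
    denom = min(vol_s, vol_rest)
    if denom == 0:
        raise ValueError("conductance is undefined when one side has no edges")
    return cut / denom


def best_by_size(adj, candidate_sets):
    """For each set size, keep the lowest conductance among the candidates.

    A toy version of a size-resolved comparison: the candidates would
    come from whatever heuristic or approximation algorithm is under study.
    """
    best = {}
    for cand in candidate_sets:
        k = len(set(cand))
        score = conductance(adj, cand)
        if k not in best or score < best[k]:
            best[k] = score
    return best


# Two triangles joined by the single edge 2-3.
two_triangles = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}

print(conductance(two_triangles, {0, 1, 2}))  # 1/7, about 0.143
print(best_by_size(two_triangles, [{0, 1}, {2, 3}, {0, 1, 2}]))
```

In this toy graph one triangle is a well-separated set (one cut edge against a volume of 7), whereas the pair {2, 3} straddling the bridge scores much worse, which matches the intuition of better internal than external connectivity.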
In applications, it is important to note that heuristic approaches to
and approximation algorithms for community detection often find
clusters that are systematically “biased,” in the sense that they
return sets of nodes with properties that might be substantially
different from those of the set of nodes that achieves the global
optimum of the chosen objective. For example, many spectral-based
methods tend to find compact clusters at the expense of being less well
separated from the rest of the network, while other methods tend
to find better-separated clusters that may internally be “less nice.”
Moreover, certain methods tend to perform particularly well or
particularly poorly on certain kinds of graphs, e.g., low-dimensional
manifolds or expanders. Thus, drawing on this experience, it is of
interest to compare these algorithms on large real-world networks
that have many complex structural features such as sparsity,
heavy-tailed degree distributions, small diameters, etc. Moreover,
depending on the particular application and the properties of the network
being analyzed, one might prefer to identify specific types of
clusters. Understanding structural properties of clusters identified by
various algorithmic methods and various objective functions can
guide in selecting the most appropriate graph clu
…(Full text truncated)…