THE ART OF COMMUNITY DETECTION
NATALI GULBAHCE AND SUNE LEHMANN
Center for Complex Networks Research, Northeastern University, Boston, MA 02115, USA, and
Center for Cancer Systems Biology, Dana Farber Cancer Institute, Boston, MA 02215, USA.
SUMMARY
Networks in nature possess a remarkable amount of structure. Via a series of data-
driven discoveries, the cutting edge of network science has recently progressed
from positing that the random graphs of mathematical graph theory might
accurately describe real networks to the current viewpoint that networks in nature
are highly complex and structured entities. The identification of higher-order
structures in networks unveils insights into their functional organization. Recently,
Clauset, Moore, and Newman1 introduced a new algorithm that identifies such
heterogeneities in complex networks by utilizing the hierarchy that necessarily
organizes the many levels of structure. Here, we anchor their algorithm in a general
community detection framework and discuss the future of community detection.
STRUCTURE EVERYWHERE
The view that networks are essentially random was challenged in 1999 when it was
discovered that the distribution of the number of links per node (degree) in many real
networks (the Internet, metabolic networks, sexual contacts, airports, etc.) differs
from what is expected in random networks2. In a large random network, node
degrees follow a binomial (approximately Poisson) distribution that is sharply
peaked around the mean, but in many man-made and biological networks the
degree distribution follows a power law. In the
human protein-protein interaction networks3,4, for instance, some proteins act as
hubs: they are highly connected and interact with more than 200 other proteins,
in contrast to most proteins, which interact with only a few others.
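The contrast between the two kinds of degree distribution can be made concrete with a small simulation. The sketch below is illustrative only: the function names are hypothetical, and the preferential-attachment growth rule is an assumption used here simply as a convenient way to generate a heavy-tailed network; it is not described in the text above.

```python
import random
from collections import Counter

def degree_histogram(edges, n):
    """Return a Counter mapping degree -> number of nodes with that degree."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree)

def random_graph(n, p, rng):
    """Random graph: every pair of nodes is linked independently with probability p."""
    return [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]

def preferential_attachment(n, m, rng):
    """Assumed growth rule: each new node links to m distinct existing nodes,
    chosen with probability proportional to their current degree."""
    # Start from a small fully connected seed of m + 1 nodes.
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # Each node appears in this list once per link, so a uniform draw
    # from it is a degree-proportional draw over nodes.
    endpoints = [x for e in edges for x in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges

rng = random.Random(0)
n = 1000
er_hist = degree_histogram(random_graph(n, 6 / (n - 1), rng), n)   # mean degree near 6
pa_hist = degree_histogram(preferential_attachment(n, 3, rng), n)  # mean degree near 6

# Both networks have a similar mean degree, but only the second has hubs.
print("largest degree, random graph:           ", max(er_hist))
print("largest degree, preferential attachment:", max(pa_hist))
```

With comparable mean degree, the random graph should show degrees clustered near the mean, while the grown network should contain a few nodes with far more links than the rest.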
Various measures, ranging from local to global, have been introduced to unveil the
organizational principles of complex networks5,6,7,8,9. Maslov and Sneppen10
discovered that who links to whom can depend on node degree: in many biological
networks, high-degree nodes systematically link to nodes of low degree. This
disassortativity decreases the likelihood of crosstalk between functional modules
inside the cell and increases overall robustness. Other networks, for example social
networks11, are highly assortative: in these networks, nodes of similar degree tend
to link to each other.
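One common way to quantify this tendency is the Pearson correlation between the degrees found at the two ends of each link. The sketch below assumes that definition; it is not necessarily the measure used in the works cited above, and the function name is hypothetical.

```python
def degree_assortativity(edges):
    """Pearson correlation between the degrees at the two ends of each link.

    Positive values: nodes of similar degree link to each other (assortative).
    Negative values: hubs link mostly to low-degree nodes (disassortative).
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    # Count every undirected link in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]

    count = len(xs)
    mean = sum(xs) / count  # xs and ys hold the same values, so they share a mean
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / count
    var = sum((x - mean) ** 2 for x in xs) / count
    if var == 0:
        return float("nan")  # every node has the same degree; correlation undefined
    return cov / var

# A star is the extreme disassortative case: one hub linked only to leaves.
star = [(0, leaf) for leaf in range(1, 5)]
print(degree_assortativity(star))  # -1.0
```

In practice the value should be compared with that of degree-preserving randomized versions of the same network before any structural conclusion is drawn.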
Figure 1: The scales of organization of complex networks. The illustrations on
the left show how to break down the “hairball” that arises when we plot the
entire network. On the smallest scale, the degree provides information about single
nodes. The notion of assortativity enters when we discuss pairs of nodes. With
three or more nodes, we are in the realm of motifs. Larger groups of nodes are
called modules or communities. Hierarchy describes how the various structural
elements are combined: how nodes are linked to form motifs, motifs are
combined to form communities, and communities are joined into the entire
network.
Going beyond the properties of single nodes and pairs of nodes, the natural next
step is to consider structures that include several nodes. Interestingly, a few select
motifs of three to four nodes are ubiquitous in real networks12, while most others
occur only as often as they would at random or are actively suppressed. Other local
measures that signify a dense or sparse local structure in a network are the
clustering coefficient13 and short loops14,15.
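As an illustration of these local measures, the sketch below computes the local clustering coefficient of a node (the fraction of its neighbor pairs that are themselves linked) and counts triangles, the simplest three-node motif. The function names are hypothetical, the graph is assumed to be undirected, and the comparison against randomized networks that motif analysis requires is not shown.

```python
from itertools import combinations

def build_adjacency(edges):
    """Map each node to the set of its neighbors (self-links are ignored)."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def local_clustering(adj, node):
    """Fraction of pairs of neighbors of `node` that are linked to each other."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pairs to check
    linked = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return linked / (k * (k - 1) / 2)

def count_triangles(adj):
    """Number of distinct triangles in the network."""
    seen = sum(1 for u in adj for a, b in combinations(adj[u], 2) if b in adj[a])
    return seen // 3  # each triangle is found once from each of its three corners

# A triangle (0, 1, 2) with one extra node attached to node 2.
adj = build_adjacency([(0, 1), (1, 2), (0, 2), (2, 3)])
print(count_triangles(adj))       # 1
print(local_clustering(adj, 0))   # 1.0
print(local_clustering(adj, 3))   # 0.0
```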
COMMUNITIES
Between the scale of the whole network and the scale of the motifs we find the
network communities16,17. A community is a densely connected subset of nodes that
is only sparsely linked to the remaining network. Modular structure introduces
important heterogeneities in complex networks. For example, each module can have
different local statistics18; some modules may have many connections, while other
modules may be sparse. When there is large variation among communities, global
values of statistical measures can be misleading. The presence of modular structure
may also alter the way in which dynamical processes (e.g., spreading processes and
synchronization19) unfold on the network. In biological networks, communities
correspond to functional modules in which members of a module function
coherently to perform essential cellular tasks. Both metabolic networks20 and
protein phosphorylation networks21, for example, possess high clustering
coefficients and are modular.
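The definition of a community given above (densely connected inside, sparsely linked to the rest) can be checked directly for any proposed group of nodes. The sketch below is a minimal illustration with hypothetical function names; it assumes an undirected network without duplicate links or self-links, and it evaluates a given group rather than detecting communities.

```python
def community_link_counts(edges, community):
    """Return (internal, external): links with both ends inside the group,
    and links with exactly one end inside it."""
    members = set(community)
    internal = external = 0
    for u, v in edges:
        ends_inside = (u in members) + (v in members)
        if ends_inside == 2:
            internal += 1
        elif ends_inside == 1:
            external += 1
    return internal, external

def internal_density(edges, community):
    """Internal links divided by the number of possible pairs inside the group."""
    size = len(set(community))
    if size < 2:
        return 0.0
    internal, _ = community_link_counts(edges, community)
    return internal / (size * (size - 1) / 2)

# Two triangles joined by a single link between nodes 2 and 3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(community_link_counts(edges, {0, 1, 2}))  # (3, 1)
print(internal_density(edges, {0, 1, 2}))       # 1.0
```

A group with high internal density and few external links matches the definition; a group such as {2, 3} in the example, with one internal link and four external ones, does not.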
The ultimate goal in biology is to determine how genes and the proteins they encode
function in the cell. A revolutionary approach to discovering gene function has been to
knock out a gene and observe the resulting phenotype. A nearly complete collection of single
gene deletions has been constructed for Saccharomyces cerevisiae22. Eukaryotes show
large amounts of genetic redundancy, however, and single knockouts are no longer
informative. Hence, the function of a large number of ge
…(Full text truncated)…