The art of community detection

📝 Original Info

  • Title: The art of community detection
  • ArXiv ID: 0807.1833
  • Date: 2008-07-14
  • Authors: Natali Gulbahce and Sune Lehmann

📝 Abstract

Networks in nature possess a remarkable amount of structure. Via a series of data-driven discoveries, the cutting edge of network science has recently progressed from positing that the random graphs of mathematical graph theory might accurately describe real networks to the current viewpoint that networks in nature are highly complex and structured entities. The identification of high-order structures in networks unveils insights into their functional organization. Recently, Clauset, Moore, and Newman introduced a new algorithm that identifies such heterogeneities in complex networks by utilizing the hierarchy that necessarily organizes the many levels of structure. Here, we anchor their algorithm in a general community detection framework and discuss the future of community detection.

📄 Full Content

THE ART OF COMMUNITY DETECTION

NATALI GULBAHCE AND SUNE LEHMANN

Center for Complex Networks Research, Northeastern University, Boston, MA 02115, USA, and Center for Cancer Systems Biology, Dana Farber Cancer Institute, Boston, MA 02215, USA.

SUMMARY

Networks in nature possess a remarkable amount of structure. Via a series of data-driven discoveries, the cutting edge of network science has recently progressed from positing that the random graphs of mathematical graph theory might accurately describe real networks to the current viewpoint that networks in nature are highly complex and structured entities. The identification of high-order structures in networks unveils insights into their functional organization. Recently, Clauset, Moore, and Newman [1] introduced a new algorithm that identifies such heterogeneities in complex networks by utilizing the hierarchy that necessarily organizes the many levels of structure. Here, we anchor their algorithm in a general community detection framework and discuss the future of community detection.

STRUCTURE EVERYWHERE

The view that networks are essentially random was challenged in 1999, when it was discovered that the distribution of the number of links per node (the degree) in many real networks (the Internet, metabolic networks, sexual contacts, airports, etc.) differs from what is expected in random networks [2]. In a large random network, node degrees follow a binomial distribution, well approximated by a Poisson distribution that is sharply peaked around the mean degree; in many man-made and biological networks, by contrast, the degree distribution follows a power law. In the human protein-protein interaction networks [3,4], for instance, some proteins act as hubs: they are highly connected and interact with more than 200 other proteins, whereas most proteins interact with only a few others. Various measures, ranging from local to global, have been introduced to unveil the organizational principles of complex networks [5,6,7,8,9].
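
To make the contrast between narrow and heavy-tailed degree distributions concrete, the sketch below builds two graphs with the same mean degree and compares their largest degrees. It is a minimal illustration under stated assumptions: the random network is the classic G(n, p) model, and Barabási-Albert-style preferential attachment stands in for a mechanism that produces power-law degrees (neither model is named in the text). The function names and parameters (`erdos_renyi`, `preferential_attachment`, n = 1000, mean degree 4) are hypothetical choices, not taken from the paper.

```python
import random
from collections import Counter


def erdos_renyi(n, p, rng):
    """G(n, p): each of the n*(n-1)/2 possible edges is present independently with probability p."""
    return [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]


def preferential_attachment(n, m, rng):
    """Barabasi-Albert-style growth: each new node links to m distinct existing nodes,
    chosen with probability proportional to their current degree."""
    # Seed with a small complete graph on m + 1 nodes.
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # Each node appears in `targets` once per incident edge, so sampling uniformly
    # from this list is degree-proportional sampling.
    targets = [x for e in edges for x in e]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))
        for t in chosen:
            edges.append((new, t))
            targets.extend((new, t))
    return edges


def degrees(edges):
    """Degree of every node that has at least one edge."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


rng = random.Random(42)
n, mean_k = 1000, 4
er = degrees(erdos_renyi(n, mean_k / (n - 1), rng))
ba = degrees(preferential_attachment(n, mean_k // 2, rng))

print("random graph: max degree", max(er.values()))
print("preferential attachment: max degree", max(ba.values()))
```

With these settings the random graph's largest degree stays within a small multiple of the mean, while the grown graph typically develops hubs whose degrees are several times larger; the exact numbers depend on the seed.
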
Maslov and Sneppen [10] discovered that who links to whom can depend on node degree: in many biological networks, high-degree nodes systematically link to nodes of low degree. This disassortativity decreases the likelihood of crosstalk between functional modules inside the cell and increases overall robustness. Other networks, for example social networks [11], are highly assortative; in these networks, nodes of similar degree tend to link to each other.
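
Assortativity is often summarized by a single number, the degree assortativity coefficient: the Pearson correlation between the degrees found at the two ends of an edge. The sketch below assumes that definition (a common convention, not necessarily the measure used in ref. 10) and takes a plain list of node pairs as input; `degree_assortativity` is a hypothetical helper name.

```python
from collections import Counter


def degree_assortativity(edges):
    """Pearson correlation between the degrees at the two ends of each edge.

    Every undirected edge is counted in both directions so the measure is symmetric.
    Negative values mean disassortative mixing (hubs attach to low-degree nodes);
    positive values mean assortative mixing (similar degrees attach to each other).
    """
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    xs, ys = [], []
    for u, v in edges:
        xs += [deg[u], deg[v]]
        ys += [deg[v], deg[u]]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys contain the same values, so they share mean and variance
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var  # undefined (ZeroDivisionError) if every edge end has the same degree


star = [(0, i) for i in range(1, 6)]       # one hub linked to five leaves
pairs = [(0, 1), (1, 2), (2, 0), (3, 4)]   # a triangle plus a separate single edge
print(degree_assortativity(star))   # -1.0
print(degree_assortativity(pairs))  # 1.0
```

The star is the extreme disassortative case, since every edge joins the hub to a leaf; in the second graph every edge joins nodes of equal degree, giving perfect assortativity.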

Figure 1: The scales of organization of complex networks. The illustrations on the left show how to break down the "hairball" that arises when we plot the entire network. On the smallest scale, the degree provides information about single nodes. The notion of assortativity enters when we discuss pairs of nodes. With three or more nodes, we are in the realm of motifs. Larger groups of nodes are called modules or communities. Hierarchy describes how the various structural elements are combined: how nodes are linked to form motifs, how motifs are combined to form communities, and how communities are joined into the entire network.
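
The notion of hierarchy can be made quantitative with a hierarchical random graph (HRG), assuming this is the model underlying the Clauset, Moore, and Newman algorithm cited as ref. 1. In an HRG, a binary dendrogram has the network's nodes as its leaves, and each internal node r carries a probability p_r that two leaves whose lowest common ancestor is r are linked. If E_r of the L_r R_r possible pairs across r's left and right subtrees are linked, the likelihood is the product over r of p_r^{E_r} (1 - p_r)^{L_r R_r - E_r}, which is maximized at p_r = E_r / (L_r R_r). The sketch below only scores a fixed dendrogram; the published method also explores the space of dendrograms, which is not shown. The function names and the nested-tuple tree format are hypothetical.

```python
import math


def leaves(tree):
    """Leaf set of a dendrogram written as nested 2-tuples with integer leaves."""
    if isinstance(tree, int):
        return {tree}
    left, right = tree
    return leaves(left) | leaves(right)


def hrg_log_likelihood(tree, edges):
    """Log-likelihood of a dendrogram under a hierarchical random graph, with each
    internal node's link probability set to its maximum-likelihood value."""
    edge_set = {frozenset(e) for e in edges}
    total = 0.0
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, int):
            continue  # leaves carry no probability
        left, right = node
        L, R = leaves(left), leaves(right)
        # E_r: edges whose lowest common ancestor is this node, i.e. edges with
        # one endpoint in the left subtree and the other in the right subtree.
        e_r = sum(1 for a in L for b in R if frozenset((a, b)) in edge_set)
        pairs = len(L) * len(R)
        p = e_r / pairs
        # E_r*log(p) + (L_r*R_r - E_r)*log(1 - p), with the convention 0*log(0) = 0.
        if 0 < p < 1:
            total += e_r * math.log(p) + (pairs - e_r) * math.log(1 - p)
        stack.extend((left, right))
    return total


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
good = (((0, 1), 2), ((3, 4), 5))   # each triangle kept in its own subtree
bad = (((0, 3), 1), ((4, 2), 5))    # triangles mixed across subtrees
print(hrg_log_likelihood(good, edges))
print(hrg_log_likelihood(bad, edges))
```

The dendrogram that keeps each triangle together scores the higher log-likelihood, which is the sense in which a good hierarchy "explains" the network's structure.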

Going beyond the properties of single nodes and pairs of nodes, the natural next step is to consider structures that include several nodes. Interestingly, a few select motifs of three or four nodes are ubiquitous in real networks [12], while most others occur only as often as they would at random or are actively suppressed. Other local measures that signal dense or sparse local structure in a network are the clustering coefficient [13] and short loops [14,15].
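
The clustering coefficient is the simplest of these local measures to compute. A minimal sketch, assuming an undirected, unweighted edge list: a node's local clustering coefficient is the fraction of pairs of its neighbours that are themselves linked, and counting closed triangles gives the frequency of the simplest three-node motif. Deciding whether a motif is over- or under-represented, as in ref. 12, would also require comparison against randomized networks, which this sketch does not do; the helper names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def adjacency(edges):
    """Neighbour sets for an undirected edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def local_clustering(adj, node):
    """Fraction of pairs of the node's neighbours that are linked to each other."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention: nodes with fewer than two neighbours have no pairs to close
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


def triangle_count(adj):
    """Number of triangles; each triangle is seen once from each of its three corners."""
    return sum(1 for u in adj for a, b in combinations(adj[u], 2) if b in adj[a]) // 3


k4 = adjacency(list(combinations(range(4), 2)))        # complete graph on 4 nodes
tri_tail = adjacency([(0, 1), (1, 2), (2, 0), (2, 3)])  # triangle with a pendant node
print(local_clustering(k4, 0), triangle_count(k4))              # 1.0 4
print(local_clustering(tri_tail, 2), triangle_count(tri_tail))  # 0.3333333333333333 1
```

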
COMMUNITIES

Between the scale of the whole network and the scale of motifs lie the network communities [16,17]. A community is a densely connected subset of nodes that is only sparsely linked to the rest of the network. Modular structure introduces important heterogeneities into complex networks. For example, each module can have different local statistics [18]: some modules may be densely connected, while others are sparse. When communities vary widely, global values of statistical measures can be misleading. The presence of modular structure may also alter how dynamical processes (e.g., spreading processes and synchronization [19]) unfold on the network. In biological networks, communities correspond to functional modules whose members act coherently to perform essential cellular tasks. Metabolic networks [20] and protein phosphorylation networks [21], for example, both have high clustering coefficients and are modular.
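
One widely used way to turn "densely connected inside, sparsely linked outside" into a number is Newman-Girvan modularity, Q = Σ_c [L_c/m - (d_c/2m)^2], where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes. Modularity is not defined in this part of the text, so treat the sketch as an illustration of the community notion rather than of the authors' method; `modularity` and the label dictionaries are hypothetical names.

```python
from collections import Counter


def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition of an undirected, unweighted graph.

    `community` maps each node to a label. For every community, Q adds the fraction
    of edges that fall inside it and subtracts the fraction expected if edges were
    rewired at random while keeping every node's degree.
    """
    m = len(edges)
    internal = Counter()    # L_c: edges with both endpoints in community c
    degree_sum = Counter()  # d_c: total degree of the nodes in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {node: "A" for node in range(6)}
print(modularity(edges, split))   # about 0.357, i.e. 5/14
print(modularity(edges, merged))  # 0.0
```

Splitting the graph at the bridge yields a clearly positive Q, while lumping every node into one community always gives Q = 0, since all edges are internal and the expected fraction is also 1.
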
A central goal in biology is to determine how genes, and the proteins they encode, function in the cell. A revolutionary approach to discovering gene function has been to knock out a gene and observe the resulting phenotype. A nearly complete collection of single-gene deletions has been constructed for Saccharomyces cerevisiae [22]. Eukaryotes show large amounts of genetic redundancy, however, so single knockouts are often uninformative. Hence, the function of a large number of ge

…(Full text truncated)…

Reference

This content is AI-processed based on ArXiv data.