Networks have become a key approach to understanding systems of interacting objects, unifying the study of diverse phenomena including biological organisms and human society. One crucial step when studying the structure and dynamics of networks is to identify communities: groups of related nodes that correspond to functional subunits such as protein complexes or social spheres. Communities in networks often overlap such that nodes simultaneously belong to several groups. Meanwhile, many networks are known to possess hierarchical organization, where communities are recursively grouped into a hierarchical structure. However, the fact that many real networks have communities with pervasive overlap, where each and every node belongs to more than one group, has the consequence that a global hierarchy of nodes cannot capture the relationships between overlapping groups. Here we reinvent communities as groups of links rather than nodes and show that this unorthodox approach successfully reconciles the antagonistic organizing principles of overlapping communities and hierarchy. In contrast to the existing literature, which has entirely focused on grouping nodes, link communities naturally incorporate overlap while revealing hierarchical organization. We find relevant link communities in many networks, including major biological networks such as protein-protein interaction and metabolic networks, and show that a large social network contains hierarchically organized community structures spanning inner-city to regional scales while maintaining pervasive overlap. Our results imply that link communities are fundamental building blocks that reveal overlap and hierarchical organization in networks to be two aspects of the same phenomenon.
Deep Dive into Link communities reveal multiscale complexity in networks.
Although no common definition has been agreed upon, it is widely accepted that a community should have more internal than external connections 2,6-8,12,19. Counterintuitively, highly overlapping communities can have many more external than internal connections (Fig. 1a,b). Because pervasive overlap breaks even this fundamental assumption, a new approach is needed.
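The counterintuitive claim can be checked on a toy construction. The sketch below is an idealized illustration, not the example shown in Fig. 1: it assumes a hypothetical network whose nodes are the points of an n^d grid and whose communities are the axis-aligned lines of that grid, each fully connected internally. Every node then belongs to exactly d communities, and the helper names (`grid_communities`, `internal_external`) are introduced here for illustration only.

```python
from itertools import combinations, product


def grid_communities(n, d):
    """Return the axis-aligned lines of an n^d grid as node sets.

    Assumption for illustration: each line is one community, so every
    node belongs to exactly d overlapping communities.
    """
    communities = []
    for axis in range(d):
        for rest in product(range(n), repeat=d - 1):
            line = []
            for value in range(n):
                coords = list(rest)
                coords.insert(axis, value)  # vary one coordinate, fix the others
                line.append(tuple(coords))
            communities.append(frozenset(line))
    return communities


def internal_external(community, edges):
    """Count links with both ends inside the community and with exactly one."""
    internal = sum(1 for u, v in edges if u in community and v in community)
    external = sum(1 for u, v in edges if (u in community) != (v in community))
    return internal, external


communities = grid_communities(n=4, d=3)
# Each community is a clique; two distinct lines share at most one node,
# so no link is created twice.
edges = {pair for c in communities for pair in combinations(sorted(c), 2)}

internal, external = internal_external(communities[0], edges)
print(internal, external)
```

In this construction a community of 4 nodes has 6 internal links, while each member carries 6 further links into its two other communities, for 24 external links in total. The ratio grows with the number of memberships per node, which is the situation pervasive overlap produces.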
The discovery of hierarchy and community organization has always been considered a problem of determining the correct membership (or memberships) of each node. Notice that, whereas nodes belong to multiple groups (individuals have families, co-workers and friends; Fig. 1c), links often exist for one dominant reason (two people are in the same family, work together or have common interests). Instead of assuming that a community is a set of nodes with many links between them, we consider a community to be a set of closely interrelated links.
Placing each link in a single context allows us to reveal hierarchical and overlapping relationships simultaneously. We use hierarchical clustering with a similarity between links to build a dendrogram where each leaf is a link from the original network and branches represent link communities (Fig. 1d,e and Methods). In this dendrogram, links occupy unique positions whereas nodes naturally occupy multiple positions, owing to their links. We extract link communities at multiple levels by cutting this dendrogram at various thresholds. Each node inherits all memberships of its links and can thus belong to multiple, overlapping communities. Even though we assign only a single membership per link, link communities can also capture multiple relationships between nodes, because multiple nodes can simultaneously belong to several communities together.
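The procedure can be sketched in a few lines of Python. The Methods section is not reproduced in this excerpt, so two choices below are assumptions rather than statements of the authors' exact implementation: the similarity between two links that share a node is taken to be the Jaccard index of the inclusive neighbourhoods of their unshared endpoints, and the clustering is single linkage. Instead of building the full dendrogram, the sketch extracts one level directly, using the fact that a single-linkage cut at a threshold equals the connected components of the "similarity at least threshold" relation. All function names are hypothetical.

```python
from itertools import combinations


def link_similarity_pairs(edges):
    """Yield (similarity, link_a, link_b) for each pair of links sharing a node."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    # Inclusive neighbourhood: the node itself plus its neighbours.
    inclusive = {node: nbrs | {node} for node, nbrs in neighbors.items()}
    for k, nbrs in neighbors.items():
        for i, j in combinations(sorted(nbrs), 2):
            a, b = inclusive[i], inclusive[j]
            similarity = len(a & b) / len(a | b)  # Jaccard index (assumed)
            yield similarity, tuple(sorted((i, k))), tuple(sorted((j, k)))


def link_communities(edges, threshold):
    """Single-linkage link clustering, cut at one similarity threshold."""
    edges = [tuple(sorted(e)) for e in edges]
    parent = {e: e for e in edges}

    def find(e):
        # Union-find root lookup with path halving.
        while parent[e] != e:
            parent[e] = parent[parent[e]]
            e = parent[e]
        return e

    for similarity, a, b in link_similarity_pairs(edges):
        if similarity >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for e in edges:
        groups.setdefault(find(e), set()).add(e)
    return list(groups.values())


def node_memberships(communities):
    """Each node inherits the community of every link attached to it."""
    memberships = {}
    for index, links in enumerate(communities):
        for u, v in links:
            memberships.setdefault(u, set()).add(index)
            memberships.setdefault(v, set()).add(index)
    return memberships


# Toy network: two triangles that share node 2.
bow_tie = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]
communities = link_communities(bow_tie, threshold=0.5)
memberships = node_memberships(communities)
print(communities)
print(memberships)
```

On this toy network every link lands in exactly one of two communities (one per triangle), while node 2 inherits both memberships. Lowering the threshold to 0.2 or below merges the two triangles, which corresponds to cutting the dendrogram at a coarser level.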
The link dendrogram provides a rich hierarchy of structure, but to obtain the most relevant communities it is necessary to determine the best level at which to cut the tree. For this purpose, we introduce a natural objective function, the partition density, D, based on link density inside communities; unlike modularity 2, D does not suffer from a resolution limit 25 (Methods). Computing D at each level of the link dendrogram allows us to pick the best level to cut (although meaningful structure exists above and below that threshold). It is also possible to optimize D directly. We can now formulate overlapping community discovery as a well-posed optimization problem, accounting for overlap at every node without penalizing nodes that participate in multiple communities.
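A minimal sketch of how D might be computed is given below. The exact definition lives in the Methods, which this excerpt does not include, so the formula used here is an assumption: a community with m links touching n nodes is scored by how far m exceeds the n - 1 links of a spanning tree, normalized by the maximum possible excess, and D is the average of these scores weighted by m. The function name and the convention that communities with two or fewer nodes contribute zero are likewise assumptions.

```python
def partition_density(communities):
    """Partition density of a link partition.

    communities: iterable of sets of links, each link a (u, v) tuple.
    Assumed form: D = (2 / M) * sum_c m_c * (m_c - (n_c - 1)) / ((n_c - 2) * (n_c - 1)),
    where M is the total number of links.
    """
    communities = list(communities)
    total_links = sum(len(links) for links in communities)
    if total_links == 0:
        return 0.0
    accumulated = 0.0
    for links in communities:
        m = len(links)
        n = len({node for link in links for node in link})
        if n <= 2:
            continue  # a lone link has no excess over a tree; contributes 0
        accumulated += m * (m - (n - 1)) / ((n - 2) * (n - 1))
    return 2.0 * accumulated / total_links


# Two triangles sharing node 2, scored at three levels of a link dendrogram.
left = {(0, 1), (0, 2), (1, 2)}
right = {(2, 3), (2, 4), (3, 4)}

d_split = partition_density([left, right])               # one community per triangle
d_merged = partition_density([left | right])             # everything in one community
d_leaves = partition_density([{e} for e in left | right])  # every link alone

print(d_split, d_merged, d_leaves)
```

For this toy network the two-triangle partition scores highest (each community is a clique), the fully merged partition scores lower, and the all-singletons partition scores zero, so scanning D across levels would select the cut that separates the triangles.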
As an illustrative example, Fig. 1f shows link communities around the word ‘Newton’ in a network of commonly associated English words. (See Supplementary Information, section S6, for details on networks used throughout the text.) The ‘clever, wit’ community is correctly identified inside the ‘smart/intellect’ community. The words ‘Newton’ and ‘Gravity’ both belong to the ‘smart/intellect’, ‘weight’ and ‘apple’ communities, illustrating that link communities capture multiple relationships between nodes. See Supplementary Information, section S3.6, for further visualizations.
Having unified hierarchy and overlap, we provide quantitative, real-world evidence that a link-based approach is superior to existing, node-based approaches. Using data-driven performance measures, we analyse link communities found at the maximum partition density in real-world networks, compared with node communities found by three widely used and successful methods: clique percolation 11, greedy modularity optimization 14 and Infomap 12. Clique percolation is the most prominent overlapping community algorithm, greedy modularity optimization is the most popular modularity-based 2 technique and Infomap is often considered the most accurate method available 20.
We compiled a test group of 11 networks covering many domains of active research and representing the wide body of available data (Supplementary Table 2). These networks vary from small to large, from sparse to dense, and from those with modular structure to those with highly overlapping structure. We highlight a few data sets of particular scientific importance: The mobile phone network is the most comprehensive proxy of a large-scale social
…(Full text truncated)…