Detecting the overlapping and hierarchical community structure of complex networks
Many networks in nature, society and technology are characterized by a mesoscopic level of organization, with groups of nodes forming tightly connected units, called communities or modules, that are only weakly linked to each other. Uncovering this community structure is one of the most important problems in the field of complex networks. Networks often show a hierarchical organization, with communities embedded within other communities; moreover, nodes can be shared between different communities. Here we present the first algorithm that finds both overlapping communities and the hierarchical structure. The method is based on the local optimization of a fitness function. Community structure is revealed by peaks in the fitness histogram. The resolution can be tuned by a parameter, which makes it possible to investigate different hierarchical levels of organization. Tests on real and artificial networks give excellent results.
Research Summary
The paper introduces a novel algorithm that simultaneously detects overlapping communities and hierarchical organization in complex networks, addressing two major challenges that have traditionally been tackled separately. The authors begin by highlighting the prevalence of mesoscopic modular structures in natural, social, and technological systems and note that existing methods either focus on overlapping detection (e.g., clique percolation) or on hierarchical clustering (e.g., dendrograms based on modularity), but rarely both.
The core of the method is a locally defined “fitness” function for a candidate subgraph G:
f_G = k_in / (k_in + k_out)^α
where k_in is the total internal degree of G (twice the number of links with both endpoints in G) and k_out is the total external degree (the number of links joining a node in G to a node outside it). The parameter α > 0 controls the resolution: large α favors small, dense groups, whereas small α yields larger, looser modules. Maximizing this fitness locally, starting from a seed node, yields the “natural community” of that node.
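The fitness of a candidate group can be computed directly from an adjacency structure. The sketch below assumes the ratio form k_in / (k_in + k_out)^α, an undirected graph stored as a dictionary of neighbour sets, and no self-loops; the names `degrees`, `fitness` and the example graph are illustrative, not taken from the paper.

```python
from typing import Dict, Hashable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]  # undirected adjacency sets, no self-loops


def degrees(graph: Graph, group: Set[Hashable]) -> Tuple[int, int]:
    """Return (k_in, k_out) for the node set `group`."""
    k_in = 0   # every internal link is counted twice, once per endpoint
    k_out = 0  # every link leaving the group is counted once
    for node in group:
        for nbr in graph[node]:
            if nbr in group:
                k_in += 1
            else:
                k_out += 1
    return k_in, k_out


def fitness(graph: Graph, group: Set[Hashable], alpha: float) -> float:
    """Assumed ratio form: k_in / (k_in + k_out) ** alpha."""
    k_in, k_out = degrees(graph, group)
    total = k_in + k_out
    if total == 0:  # isolated group: define the fitness as zero
        return 0.0
    return k_in / total ** alpha


# Hypothetical example: two triangles joined by the single link 2-3.
graph: Graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
triangle = {0, 1, 2}
k_in, k_out = degrees(graph, triangle)        # 3 internal links, 1 external link
f_triangle = fitness(graph, triangle, alpha=1.0)
```

For the triangle {0, 1, 2} this gives k_in = 6 and k_out = 1, so the fitness at α = 1 is 6/7. Raising α penalises the total degree more strongly, which is why larger values favour smaller groups.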
Algorithmic steps:
- Choose an unassigned node A as a seed.
- Initialise G = {A}.
- Repeatedly examine all neighbours of G not yet in G, compute the change in fitness if each were added, and insert the neighbour that gives the greatest positive increase.
- After each insertion, recompute the fitness of every node in G, defined as the fitness of G with that node minus the fitness of G without it. Remove any node whose fitness is negative, then recompute and repeat until no such node remains.
- Continue until no external neighbour can improve the fitness. The resulting G is stored as a community.
- Pick another unassigned node and repeat until every node belongs to at least one community.
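The steps above can be sketched in a few dozen lines. This is a minimal illustration, not the authors' implementation: it assumes the ratio-form fitness k_in / (k_in + k_out)^α, recomputes fitness from scratch at every move (simple but slow), takes seeds in dictionary order rather than at random, and uses hypothetical function names throughout.

```python
def fitness(graph, group, alpha):
    """Assumed ratio form k_in / (k_in + k_out) ** alpha for the node set `group`."""
    k_in = k_out = 0
    for node in group:
        for nbr in graph[node]:
            if nbr in group:
                k_in += 1   # each internal link is seen from both endpoints
            else:
                k_out += 1
    total = k_in + k_out
    return k_in / total ** alpha if total else 0.0


def node_fitness(graph, group, node, alpha):
    """Fitness of `group` with `node` minus fitness of `group` without it."""
    return fitness(graph, group | {node}, alpha) - fitness(graph, group - {node}, alpha)


def natural_community(graph, seed, alpha):
    """Grow a community around `seed` by greedy insertion and pruning."""
    group = {seed}
    while True:
        frontier = {nbr for node in group for nbr in graph[node]} - group
        if not frontier:
            break
        # sorted() only makes tie-breaking reproducible; it needs orderable labels.
        best = max(sorted(frontier), key=lambda n: node_fitness(graph, group, n, alpha))
        if node_fitness(graph, group, best, alpha) <= 0:
            break  # no neighbour improves the fitness
        group.add(best)
        # Expel members whose fitness turned negative, worst first.
        while len(group) > 1:
            worst = min(sorted(group), key=lambda n: node_fitness(graph, group, n, alpha))
            if node_fitness(graph, group, worst, alpha) >= 0:
                break
            group.remove(worst)
    return group


def find_cover(graph, alpha):
    """Build communities from unassigned seeds until the seed list is exhausted."""
    cover, assigned = [], set()
    for seed in graph:  # seed order is an arbitrary choice in this sketch
        if seed in assigned:
            continue
        # Nodes already in other communities stay eligible, so overlaps can arise.
        community = natural_community(graph, seed, alpha)
        cover.append(community)
        # Edge case not handled: a seed expelled from its own community stays unassigned.
        assigned |= community
    return cover


# Hypothetical example: two triangles joined by the single link 2-3.
graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
cover = find_cover(graph, alpha=1.0)
```

On this toy graph with α = 1 the sketch recovers the two triangles as separate communities. Every accepted insertion or removal strictly increases the fitness of the group, which is what guarantees that the inner loops terminate.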
Because nodes already assigned to other communities remain eligible when the neighbours of G are examined, the method naturally produces overlapping communities. The resolution parameter α is varied across a range (typically 0.1–3), and for each value the algorithm yields a cover, that is, a set of communities that together contain every node. The authors then construct a histogram of the average fitness values of the covers; pronounced peaks correspond to “stable” covers that persist over a wide α interval. Stability is quantified by the height and width of these peaks, providing a principled way to rank and select meaningful covers.
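One simple way to look for such peaks is to bin the average cover fitness obtained at each α and count how many α values fall into each bin. The sketch below assumes the per-α averages have already been computed; the function names, the rounding-based binning, the `min_count` threshold and all numbers in the example are invented for illustration.

```python
from collections import Counter


def fitness_histogram(score_by_alpha, ndigits=3):
    """Count how many alpha values give (nearly) the same average cover fitness."""
    return Counter(round(score, ndigits) for score in score_by_alpha.values())


def stable_scores(histogram, min_count):
    """Return the binned scores reached by at least `min_count` alpha values."""
    return sorted(score for score, count in histogram.items() if count >= min_count)


# Made-up averages for ten alpha values; they do not come from any real network.
score_by_alpha = {
    0.6: 0.857, 0.7: 0.857, 0.8: 0.857, 0.9: 0.857, 1.0: 0.857,
    1.1: 0.612, 1.2: 0.598,
    1.3: 0.431, 1.4: 0.431, 1.5: 0.431,
}
histogram = fitness_histogram(score_by_alpha)
peaks = stable_scores(histogram, min_count=3)
```

In this invented data set two values recur over several consecutive α values and would be read as stable covers, while the two isolated values in between would be treated as transient.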
Hierarchical relationships are defined in terms of covers: a cover C′ is higher in the hierarchy than C″ if every community in C″ is fully or partially contained in a single community of C′. This definition accommodates overlapping nodes, allowing a consistent hierarchy of overlapping modules.
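A containment test between two covers might look like the sketch below. The summary does not say how much overlap counts as "partially contained", so the threshold `min_overlap` is a hypothetical parameter: with the default of 1.0 every lower-level community must lie entirely inside one higher-level community. Communities are assumed to be non-empty sets.

```python
def is_higher(upper, lower, min_overlap=1.0):
    """True if every community in `lower` sits (mostly) inside one community of `upper`.

    `min_overlap` is the assumed minimum fraction of a lower-level community
    that must fall within a single upper-level community.
    """
    for small in lower:
        best = max((len(small & big) / len(small) for big in upper), default=0.0)
        if best < min_overlap:
            return False
    return True


# Hypothetical covers of a six-node network.
coarse = [{0, 1, 2, 3, 4, 5}]
fine = [{0, 1, 2}, {3, 4, 5}]
overlapping = [{0, 1, 2, 3}, {3, 4, 5}]  # node 3 is shared

coarse_over_fine = is_higher(coarse, fine)
fine_over_coarse = is_higher(fine, coarse)
coarse_over_overlapping = is_higher(coarse, overlapping)
```

Because the check works community by community, a node shared between two lower-level communities poses no problem, which is the property the definition relies on.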
Complexity analysis shows that building a single community of size s costs O(s²) due to the need to recompute fitness after each move. The overall cost for a fixed α is roughly O(n_c ⟨s²⟩), where n_c is the number of communities in the final cover and ⟨s²⟩ is the second moment of community sizes. In the worst case (one giant community) the algorithm is O(n²), but empirical tests on Erdős‑Rényi graphs with average degree 10 demonstrate a transition from quadratic to near‑linear scaling as α varies, confirming practical efficiency.
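The two scaling regimes can be seen in an idealised case, assumed here purely for illustration: n nodes split into n_c non-overlapping communities of equal size s = n / n_c.

```latex
\[
  n_c \,\langle s^2 \rangle
  \;=\; n_c \left(\frac{n}{n_c}\right)^{2}
  \;=\; \frac{n^{2}}{n_c}
  \;=\; n\,s .
\]
```

When the typical community size s stays fixed as the network grows, the cost is linear in n; when the cover collapses to a single community (n_c = 1, s = n), it becomes n², matching the worst case quoted above.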
Extensive experiments are presented on synthetic benchmarks (including planted partition models with known overlapping and hierarchical structure) and on real‑world networks: a social network of friendships, a web‑page citation graph, and a metabolic network. Across all tests, the method outperforms state‑of‑the‑art techniques such as k‑clique percolation and modularity‑based hierarchical clustering in terms of precision, recall, and the ability to recover multiple hierarchical levels. The fitness‑histogram peaks align with known functional modules, confirming that the stability criterion captures biologically or socially meaningful groupings.
In conclusion, the authors deliver the first unified framework that (i) detects overlapping communities through a local fitness maximization, (ii) uncovers hierarchical organization by scanning a resolution parameter, and (iii) ranks covers by their stability in a fitness histogram. Limitations include the need for an empirical choice of α and potential sensitivity to the order of seed selection. Future work is suggested on automatic α tuning, parallel implementation, and extension to dynamic networks.