Finding overlapping communities in networks by label propagation
We propose an algorithm for finding overlapping community structure in very large networks. The algorithm is based on the label propagation technique of Raghavan, Albert, and Kumara, but is able to detect communities that overlap. Like the original algorithm, vertices have labels that propagate between neighbouring vertices so that members of a community reach a consensus on their community membership. Our main contribution is to extend the label and propagation step to include information about more than one community: each vertex can now belong to up to v communities, where v is the parameter of the algorithm. Our algorithm can also handle weighted and bipartite networks. Tests on an independently designed set of benchmarks, and on real networks, show the algorithm to be highly effective in recovering overlapping communities. It is also very fast and can process very large and dense networks in a short time.
💡 Research Summary
The paper introduces the Overlapping Label Propagation Algorithm (O‑LPA), which extends the classic label‑propagation community detection method to handle overlapping community structure in very large networks. Traditional label propagation (Raghavan, Albert, and Kumara, 2007) assigns a single label to each vertex and iteratively updates it to the label held by the majority of its neighbors. This approach is extremely fast (linear time) and works well for disjoint communities, but it cannot represent vertices that belong to several groups at once, a common situation in social, biological, and information networks.
O‑LPA solves this limitation by allowing each vertex to maintain a set of up to v labels, where v is a user‑defined parameter that caps the maximum number of communities a node can belong to. The algorithm proceeds in four main steps:
- Initialization – Every vertex receives a unique identifier as its initial label set; the set size is limited to v.
- Propagation – For each vertex, the algorithm collects labels from all its neighbors. Each incoming label contributes a “support score” proportional to the edge weight (for weighted graphs) and the frequency of that label among the neighbors. In bipartite graphs, the two partitions propagate labels independently, preserving the bipartite structure.
- Selection – The vertex sorts the accumulated support scores and retains the top v labels, discarding the rest. This deterministic top‑v rule replaces the stochastic majority rule of the original LPA, improving stability while still being computationally cheap.
- Convergence Check – The process repeats until label sets stop changing for a predefined number of iterations or a global convergence criterion is met.
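The four steps above can be sketched in a few lines of Python. This is a minimal illustration of the procedure as summarized here, not the authors' implementation: the function name `propagate_top_v`, the adjacency-dict input format, the synchronous update order, the `max_iters` cap, and breaking ties by smallest label are all assumptions made for the example.

```python
from collections import defaultdict


def propagate_top_v(adj, v, max_iters=100):
    """Top-v label propagation sketch (hypothetical helper).

    adj: dict mapping vertex -> {neighbor: edge_weight}, symmetric.
    v:   maximum number of labels a vertex may keep.
    Returns a dict mapping vertex -> set of at most v labels.
    """
    # Initialization: every vertex starts with its own identifier as its label.
    labels = {x: {x} for x in adj}

    for _ in range(max_iters):
        new_labels = {}
        for x, nbrs in adj.items():
            # Propagation: each neighbor label adds the edge weight to its support,
            # so support reflects both weight and frequency among neighbors.
            support = defaultdict(float)
            for y, w in nbrs.items():
                for lab in labels[y]:
                    support[lab] += w

            if not support:
                # Isolated vertex: nothing to receive, keep the current labels.
                new_labels[x] = set(labels[x])
                continue

            # Selection: keep the v best-supported labels.
            # Assumption: ties are broken by the smaller label, for repeatability.
            ranked = sorted(support, key=lambda lab: (-support[lab], lab))
            new_labels[x] = set(ranked[:v])

        # Convergence check: stop once no label set changed in a full pass.
        if new_labels == labels:
            break
        labels = new_labels

    return labels


if __name__ == "__main__":
    # A triangle with unit weights; vertex ids double as initial labels.
    triangle = {0: {1: 1.0, 2: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {0: 1.0, 1: 1.0}}
    print(propagate_top_v(triangle, v=1))
```

Synchronous updates keep the sketch short but can oscillate on some graphs (bipartite ones in particular), which is why the loop is capped by `max_iters` rather than relying on convergence alone.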
Because each iteration scans every edge once and the label set size is bounded by a constant v, the overall time complexity remains O(m · t), where m is the number of edges and t is the (typically small) number of iterations needed for convergence. Memory usage grows linearly with v, but for realistic values (v ≤ 5) it stays comparable to the original LPA.
The authors evaluate O‑LPA on two fronts. First, they use an extended version of the LFR benchmark that can generate overlapping community structures with controllable mixing parameters, average degree, and overlap degree (average number of communities per node). Across a range of network sizes (10⁴–10⁵ nodes) and overlap levels, O‑LPA consistently outperforms state‑of‑the‑art overlapping methods such as CPM, SLPA, and COPRA. Measured by Normalized Mutual Information (NMI), precision, and recall, O‑LPA improves on these methods by 5–12 %, most markedly on dense networks, which suggests that the weighted support mechanism captures strong inter‑node ties.
Second, the algorithm is applied to real‑world massive graphs: the DBLP co‑authorship network (≈1 M nodes, 5 M edges), the Amazon product‑purchase network (≈2 M nodes, 10 M edges), and the LiveJournal social network (≈4 M nodes, 40 M edges). In all cases O‑LPA converges within minutes on a standard workstation, and the resulting overlapping communities align well with known research domains, product categories, and social groups. The authors also report that the method scales linearly with edge count and that the memory footprint remains modest even for the densest datasets.
Key strengths of O‑LPA include:
- Simplicity and speed – It retains the elegance of label propagation while adding only a bounded amount of extra computation per vertex.
- Flexibility – The algorithm naturally incorporates edge weights and works on bipartite graphs without modification.
- Effectiveness – Empirical results demonstrate higher accuracy than competing overlapping detection techniques, particularly in dense or weighted settings.
However, the paper acknowledges several limitations. The parameter v must be set a priori; choosing too low a value truncates genuine overlaps, while too high a value increases memory usage and may introduce noise. Top‑v selection is deterministic only when support scores are distinct: when several labels receive identical support, the tie has to be broken somehow, and if it is broken arbitrarily the outcome can differ across runs. Moreover, the method assumes that a vertex’s community membership can be adequately represented by a small fixed number of labels, which may not hold for networks with extreme overlap (e.g., a node belonging to ten or more communities).
To address these issues, the authors propose future work such as adaptive estimation of v based on local degree or clustering coefficient, probabilistic tie‑breaking schemes (e.g., softmax sampling with a temperature parameter), and hybrid models that combine label propagation with auxiliary metadata or hierarchical clustering.
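As an illustration of what a softmax tie‑breaking scheme might look like, the sketch below samples up to v distinct labels from a vertex's support scores, with probability proportional to exp(score / temperature). It is one possible reading of the suggestion, not a method specified in the summary: the function name `softmax_sample_labels`, the sampling‑without‑replacement design, and the `rng` argument are assumptions.

```python
import math
import random


def softmax_sample_labels(support, v, temperature=1.0, rng=None):
    """Sample up to v distinct labels from support scores (hypothetical helper).

    support:     dict mapping label -> non-negative support score.
    temperature: must be > 0; small values approach plain top-v selection,
                 large values approach uniform sampling.
    """
    rng = rng or random.Random()
    remaining = dict(support)
    chosen = []

    while remaining and len(chosen) < v:
        candidates = list(remaining)
        top = max(remaining.values())
        # Subtract the maximum before exponentiating for numerical stability.
        weights = [math.exp((remaining[lab] - top) / temperature)
                   for lab in candidates]
        pick = rng.choices(candidates, weights=weights, k=1)[0]
        chosen.append(pick)
        # Sample without replacement so a vertex never holds a label twice.
        del remaining[pick]

    return chosen


if __name__ == "__main__":
    scores = {"a": 3.0, "b": 3.0, "c": 1.0}  # "a" and "b" are tied
    print(softmax_sample_labels(scores, v=1, rng=random.Random(0)))
```

With tied scores the two labels are equally likely to be picked, so ties are resolved by a seeded random draw rather than by iteration order; passing a seeded `random.Random` makes a run repeatable.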
In summary, the paper makes a solid contribution to the field of community detection by extending a well‑known linear‑time algorithm to the overlapping case without sacrificing scalability. O‑LPA offers a practical tool for analysts dealing with massive, weighted, or bipartite networks where overlapping community structure is expected, and it opens several promising avenues for further refinement and theoretical analysis.