Community Detection Using A Neighborhood Strength Driven Label Propagation Algorithm
Studies of community structure and evolution in large social networks require a fast and accurate algorithm for community detection. As the size of the analyzed networks grows, the complexity of the community detection algorithm needs to be kept close to linear. The Label Propagation Algorithm (LPA) has the benefits of nearly linear running time and easy implementation, so it forms a good basis for efficient community detection methods. In this paper, we propose a new update rule and a new label propagation criterion for LPA to improve both its computational efficiency and the quality of the communities that it detects. The speed is optimized by avoiding unnecessary updates performed by the original algorithm. This change significantly reduces (by an order of magnitude for large networks) the number of iterations that the algorithm executes. We also evaluate our generalization of the LPA update rule, which takes into account, with varying strength, the connections to the neighborhood of a node considering a new label. Experiments on computer-generated networks and a wide range of social networks show that our new rule improves the quality of the detected communities compared to those found by the original LPA. The benefit of considering positive neighborhood strength is especially pronounced on real-world networks containing a sufficiently large fraction of nodes with a high clustering coefficient.
💡 Research Summary
The paper addresses two major shortcomings of the classic Label Propagation Algorithm (LPA) for community detection in large social networks: (1) unnecessary label updates that waste computational effort, and (2) the algorithm’s ignorance of the structural strength of a node’s neighborhood. While LPA enjoys near‑linear time complexity and parameter‑free operation, its convergence can be slow and the resulting partitions often vary in quality because the process may get trapped in local minima.
To accelerate convergence, the authors introduce an “active node list” mechanism. Nodes are classified as interior (all neighbors share the same label) or boundary (at least one neighbor has a different label). Interior nodes are passive and removed from the list; only boundary nodes that are still capable of changing their label remain active. During each asynchronous iteration, a random active node is selected, its label is updated according to the chosen rule, and the list is refreshed by checking the node itself and its neighbors for status changes. This approach guarantees that every attempted update actually changes a label, thereby reducing the total number of iterations to the exact count of effective updates. Complexity analysis shows that list initialization costs O(n), each update costs O(d_i) for node i, and convergence checking is O(1). Empirical tests on nine real‑world networks (including Zachary’s karate club, Les Misérables, political books, football schedule, network‑science co‑authorship, email, corporate EVA, arXiv GR‑Qc, and PGP) demonstrate that the scaled number of iterations (iterations / n) stays below 3 even for networks of ten thousand nodes, and speed‑up factors range from 1.5× on small graphs to more than 6× on larger ones.
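The active-node bookkeeping described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names (`best_labels`, `is_active`, `lpa_active`) are hypothetical, and the activity test assumes that a node is active exactly when its current label is not among the most frequent labels of its neighbors, so that every update changes a label. The graph is assumed to be an undirected adjacency dict with integer node ids and no self-loops.

```python
import random
from collections import Counter

def best_labels(adj, labels, v):
    """Return the set of labels that are most frequent among v's neighbors."""
    counts = Counter(labels[u] for u in adj[v])
    if not counts:                      # isolated node keeps its own label
        return {labels[v]}
    top = max(counts.values())
    return {lab for lab, c in counts.items() if c == top}

def is_active(adj, labels, v):
    """Assumed criterion: v is active if an update would change its label."""
    return labels[v] not in best_labels(adj, labels, v)

def lpa_active(adj, seed=0):
    """Asynchronous LPA that only ever visits active nodes."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}        # every node starts with a unique label
    active = {v for v in adj if is_active(adj, labels, v)}
    updates = 0
    while active:                       # an empty set means convergence
        # Sorting only makes the random choice reproducible; a real
        # implementation would use an indexable list with O(1) removal.
        v = rng.choice(sorted(active))
        labels[v] = rng.choice(sorted(best_labels(adj, labels, v)))
        updates += 1
        # Only v and its neighbors can change status after this update.
        for u in (v, *adj[v]):
            if is_active(adj, labels, u):
                active.add(u)
            else:
                active.discard(u)
    return labels, updates
```

Under this criterion each update strictly increases the number of edges whose endpoints share a label, so the loop terminates, and `updates` counts only effective label changes.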
The second contribution is a generalized label‑update rule that incorporates “neighborhood strength.” Instead of simply counting how many neighbors carry each label, the algorithm computes a weighted score for each candidate label k:
S_k = Σ_{j∈N(i)} w_{ij}·δ(L(j),k)
where w_{ij} reflects the structural importance of neighbor j relative to i. The authors propose to set w_{ij} proportional to the local clustering coefficient (or triangle participation) of node j, thereby rewarding labels that belong to tightly‑connected neighborhoods. This modification captures the intuition that a person is more likely to adopt an idea from a neighbor who is well‑connected to the rest of the person’s social circle.
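A small sketch of the score S_k may help make the rule concrete. The summary only says that w_{ij} is proportional to the local clustering coefficient of neighbor j; the specific form used here, w_{ij} = 1 + strength · C(j), is an assumption chosen so that `strength = 0` reduces to plain neighbor counting. The names `local_clustering`, `label_scores`, and `strength` are hypothetical, and the graph is assumed to be an undirected adjacency dict of sets.

```python
from collections import defaultdict

def local_clustering(adj, v):
    """Local clustering coefficient C(v): fraction of neighbor pairs that are linked."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for a in range(k)
        for b in range(a + 1, k)
        if nbrs[b] in adj[nbrs[a]]
    )
    return 2.0 * links / (k * (k - 1))

def label_scores(adj, labels, i, strength=1.0):
    """Compute S_k for every label k carried by a neighbor of node i.

    Assumed weight: w_ij = 1 + strength * C(j). This form is illustrative only.
    """
    scores = defaultdict(float)
    for j in adj[i]:
        w_ij = 1.0 + strength * local_clustering(adj, j)
        scores[labels[j]] += w_ij
    return dict(scores)

# Node 0 has two neighbors labeled 'a' that form a triangle with it,
# and three leaf neighbors labeled 'b'.
adj = {0: {1, 2, 3, 4, 5}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: {0}, 5: {0}}
labels = {0: "x", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
plain = label_scores(adj, labels, 0, strength=0.0)     # 'b' wins by count
weighted = label_scores(adj, labels, 0, strength=1.0)  # 'a' wins by strength
```

In this toy graph, plain counting favors label 'b' (three neighbors against two), while the weighted score favors 'a' because its carriers sit in a triangle with node 0.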
Experiments comparing the original LPA, the speed‑up version, and the strength‑driven version show that the latter consistently yields higher modularity scores (Q). The improvement is especially pronounced in networks with high average clustering coefficients, where Q increases by 10–30 % relative to the baseline. The strength‑driven rule also reduces the number of ambiguous label ties, leading to more stable partitions across multiple runs.
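The summary does not spell out how Q is computed. Assuming it refers to the standard Newman-Girvan modularity for undirected, unweighted graphs, Q = Σ_c [e_c/m − (d_c/2m)²], where e_c is the number of edges inside community c, d_c is the total degree of its nodes, and m is the number of edges, a partition could be scored as in the sketch below. The function name `modularity` is hypothetical.

```python
from collections import defaultdict

def modularity(adj, labels):
    """Newman-Girvan modularity of a partition of an undirected graph.

    adj maps each node to an iterable of neighbors (each edge listed in
    both directions); labels maps each node to its community label.
    """
    two_m = sum(len(nbrs) for nbrs in adj.values())   # 2m = sum of degrees
    if two_m == 0:
        return 0.0
    intra = 0                      # intra-community edges, each counted twice
    degree_sum = defaultdict(int)  # total degree d_c per community
    for v, nbrs in adj.items():
        degree_sum[labels[v]] += len(nbrs)
        intra += sum(1 for u in nbrs if labels[u] == labels[v])
    expected = sum(d * d for d in degree_sum.values()) / (two_m ** 2)
    return intra / two_m - expected
```

For two disjoint triangles placed in separate communities this gives Q = 0.5, and placing all six nodes in a single community gives Q = 0.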
Overall, the paper demonstrates that (i) eliminating redundant updates via an active‑node bookkeeping scheme dramatically cuts runtime without altering asymptotic complexity, and (ii) enriching the label‑choice criterion with neighborhood‑strength information substantially enhances community quality. The authors acknowledge that computing clustering coefficients adds a preprocessing step, and that the weighted score may become costly when many distinct labels coexist. Future work is suggested on approximating the strength weights, extending the method to dynamic or streaming graphs, and handling overlapping communities. The presented enhancements preserve LPA’s simplicity and near‑linear scalability while delivering both faster convergence and more accurate community detection.