Efficient modularity optimization by multistep greedy algorithm and vertex mover refinement
Identifying strongly connected substructures in large networks provides insight into their coarse-grained organization. Several approaches based on the optimization of a quality function, e.g., the modularity, have been proposed. We present here a multistep extension of the greedy algorithm (MSG) that allows the merging of more than one pair of communities at each iteration step. The essential idea is to prevent the premature condensation into few large communities. Upon convergence of the MSG a simple refinement procedure called “vertex mover” (VM) is used for reassigning vertices to neighboring communities to improve the final modularity value. With an appropriate choice of the step width, the combined MSG-VM algorithm is able to find solutions of higher modularity than those reported previously. The multistep extension does not alter the scaling of computational cost of the greedy algorithm.
💡 Research Summary
The paper addresses the well‑known problem of community detection in large networks by optimizing the modularity quality function Q. While modularity optimization is NP‑hard, greedy agglomerative methods are popular because of their speed, but they suffer from premature condensation: early merges create a few very large communities, preventing later refinements and often yielding sub‑optimal Q values.
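For reference, the modularity of a partition of an unweighted, undirected network is commonly written as below. The notation is chosen here for illustration and is not taken from the paper: M is the total number of edges, m_c the number of edges with both endpoints in community c, and K_c the sum of the degrees of the vertices in c.

```latex
Q = \sum_{c} \left[ \frac{m_c}{M} - \left( \frac{K_c}{2M} \right)^{2} \right]
```

The first term rewards edges that fall inside communities; the second subtracts the fraction expected if edges were placed at random with the same degree sequence.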
To overcome this limitation the authors propose a Multistep Greedy (MSG) algorithm. Instead of merging only the single pair of communities with the highest positive modularity gain ΔQ in each iteration, MSG selects the l pairs with the highest positive ΔQ values, where l is a user‑defined “step width”. These candidate pairs are processed in descending ΔQ order, with ties broken by ascending community index, but a “touched‑community exclusion rule” (TCER) forbids any community that has already been merged in the current round from participating in another merge. Consequently several small communities can grow simultaneously, delaying the formation of oversized modules and preserving finer structure.
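The selection step of one MSG round can be sketched as follows. This is a minimal illustration of the description above, not the authors' implementation: the function name `select_merges`, the tuple layout `(i, j, dq)`, and the choice to apply TCER to the top l candidates (so that a round may perform fewer than l merges) are assumptions.

```python
def select_merges(candidates, step_width):
    """Pick the merges for one multistep-greedy round (illustrative sketch).

    candidates: iterable of (i, j, dq) triples, one per pair of connected
                communities, where dq is the modularity gain of merging i and j.
    step_width: the parameter l, i.e. how many top candidates are considered.
    """
    # Keep only merges that would increase modularity.
    positive = [(i, j, dq) for (i, j, dq) in candidates if dq > 0]
    # Descending gain, ties broken by ascending community indices.
    positive.sort(key=lambda t: (-t[2], t[0], t[1]))

    touched = set()   # communities already merged in this round
    merges = []
    for i, j, _dq in positive[:step_width]:
        # Touched-community exclusion rule: skip pairs that reuse a community.
        if i in touched or j in touched:
            continue
        merges.append((i, j))
        touched.update((i, j))
    return merges


candidates = [(0, 1, 0.05), (1, 2, 0.04), (3, 4, 0.03), (5, 6, -0.01)]
print(select_merges(candidates, step_width=3))  # [(0, 1), (3, 4)]
```

With a step width of 1 the same function reduces to the classic greedy choice of the single best pair.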
Implementation details are carefully described. The ΔQ matrix is stored as a vector of rows, each row being a C++ STL set of (neighbor‑community, ΔQ) entries sorted by neighbor index. A second set, the “level set”, holds all candidate triples (i, j, ΔQij) sorted by decreasing ΔQ and increasing (i, j). When two communities i and j are merged into a new community I, only the rows of i and j need to be updated; the new ΔQ values with any third community k are computed by a closed‑form expression (Eq. 1) that distinguishes whether k is linked to i, to j, or to both. Updating a single entry costs O(log N) because of the set structure, and updating all entries for a merge costs O((d_i + d_j)·log N), where d_i and d_j are the sums of degrees of vertices in the two merged communities.
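The row update after a merge can be illustrated with the standard Clauset‑Newman‑Moore update rule, which distinguishes the same three cases. The sketch assumes that Eq. 1 has this form, since the equation itself is not reproduced here. It uses plain Python dictionaries rather than sorted sets, so it shows the arithmetic but not the O(log N) ordering behavior; `merge_rows`, `dq`, and `a` are hypothetical names, with `a[c]` denoting the fraction of edge ends attached to community `c`.

```python
def merge_rows(dq, a, i, j):
    """Merge community j into community i, updating dq and a in place.

    dq: dict mapping community -> {neighbor community: modularity gain}
    a:  dict mapping community -> fraction of edge ends in that community
    Assumes the standard CNM update rule (an assumption about Eq. 1).
    """
    row_i, row_j = dq.pop(i), dq.pop(j)
    row_i.pop(j, None)   # the pair (i, j) itself disappears
    row_j.pop(i, None)

    new_row = {}
    for k in set(row_i) | set(row_j):
        if k in row_i and k in row_j:      # k linked to both i and j
            gain = row_i[k] + row_j[k]
        elif k in row_i:                   # k linked to i only
            gain = row_i[k] - 2.0 * a[j] * a[k]
        else:                              # k linked to j only
            gain = row_j[k] - 2.0 * a[i] * a[k]
        new_row[k] = gain
        # Keep the matrix symmetric: fix the entry in row k as well.
        dq[k].pop(i, None)
        dq[k].pop(j, None)
        dq[k][i] = gain

    dq[i] = new_row
    a[i] += a[j]
    del a[j]


# Path graph 0-1-2-3 with every vertex in its own community (3 edges).
a = {0: 1 / 6, 1: 1 / 3, 2: 1 / 3, 3: 1 / 6}
dq = {0: {1: 2 / 9}, 1: {0: 2 / 9, 2: 1 / 9},
      2: {1: 1 / 9, 3: 2 / 9}, 3: {2: 2 / 9}}
merge_rows(dq, a, 0, 1)
print(dq[0])  # gain of merging {0, 1} with {2}, approximately 0
```

Only the rows of the two merged communities and of their neighbors are touched, which is what keeps the per-merge cost proportional to the number of affected entries.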
The overall time complexity of MSG is therefore O(D·M·log N), where M is the number of edges, N the number of vertices, and D the depth of the dendrogram (the number of outer iterations). This matches the asymptotic cost of the classic Clauset‑Newman‑Moore greedy algorithm, confirming that the multistep extension does not sacrifice speed.
After MSG converges, the authors apply a refinement step called Vertex Mover (VM). VM iterates over all vertices in order of increasing degree, with ties broken by vertex index. For each vertex v currently belonging to community i, the algorithm computes the modularity gain that would result from moving v to each neighboring community j using a simple formula (Eq. 2) that depends on the number of edges from v to j, the degrees of the involved communities, and the total edge weight L (equal to the number of edges M for an unweighted network). The move that yields the largest positive gain is executed immediately, guaranteeing that each individual reassignment increases Q. The process repeats over the whole vertex list until no move can improve modularity. The dominant cost of one VM sweep is proportional to the sum of vertex degrees, i.e., O(L), and in practice only a few sweeps are needed for convergence.
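The refinement loop can be sketched as below for a simple unweighted, undirected graph. This is an illustrative sketch, not the authors' code: the gain expression is derived directly from the definition of modularity and is assumed to be equivalent to Eq. 2, and the names `vertex_mover`, `adj`, and `community` are hypothetical.

```python
from collections import defaultdict


def vertex_mover(adj, community):
    """Greedy vertex-level refinement (illustrative sketch).

    adj:       dict vertex -> set of neighbors (simple undirected graph)
    community: dict vertex -> community label, modified in place
    Returns the number of sweeps performed, including the final idle one.
    """
    num_edges = sum(len(nb) for nb in adj.values()) / 2
    comm_degree = defaultdict(int)           # sum of degrees per community
    for v, nb in adj.items():
        comm_degree[community[v]] += len(nb)

    # Visit vertices by increasing degree, ties broken by vertex index.
    order = sorted(adj, key=lambda v: (len(adj[v]), v))
    sweeps, moved = 0, True
    while moved:
        moved = False
        sweeps += 1
        for v in order:
            k_v, src = len(adj[v]), community[v]
            links = defaultdict(int)         # edges from v into each community
            for u in adj[v]:
                links[community[u]] += 1
            best_gain, best_dst = 0.0, None
            for dst in sorted(c for c in links if c != src):
                # Gain of moving v from src to dst, from the definition of Q.
                gain = ((links[dst] - links.get(src, 0)) / num_edges
                        - k_v * (comm_degree[dst] - comm_degree[src] + k_v)
                        / (2 * num_edges ** 2))
                if gain > best_gain + 1e-12:  # strict improvement only
                    best_gain, best_dst = gain, dst
            if best_dst is not None:
                community[v] = best_dst      # execute the move immediately
                comm_degree[src] -= k_v
                comm_degree[best_dst] += k_v
                moved = True
    return sweeps


# Two triangles joined by the edge 2-3, with vertex 2 initially misassigned.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
community = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 1}
vertex_mover(adj, community)
print(community)  # {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
```

Because a move is accepted only when its gain is strictly positive, modularity increases with every reassignment and the loop must terminate.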
Experimental evaluation is performed on a diverse collection of real‑world networks (social, biological, technological) ranging from a few hundred to several hundred thousand nodes. The combined MSG‑VM method consistently achieves higher modularity scores than the standard greedy algorithm, the Louvain method, simulated annealing, and other recent heuristics. The authors also study the influence of the step‑width parameter l; values around 2–3 provide the best trade‑off between solution quality and runtime, while very large l can degrade performance because too many merges are forced simultaneously. Importantly, the runtime of MSG‑VM remains comparable to the fastest greedy approaches, confirming that the added refinement does not introduce prohibitive computational overhead.
In summary, the paper presents a conceptually simple yet effective enhancement to greedy modularity optimization: (1) a multistep merging scheme that prevents early over‑aggregation, and (2) a lightweight vertex‑level refinement that fine‑tunes the partition. The algorithm retains the O(D·M·log N) scaling of the original greedy method, making it suitable for very large networks, and it delivers superior modularity values across a broad set of benchmarks. Future work could extend the framework to dynamic networks, directed or weighted graphs, or alternative quality functions beyond modularity.