A fast multilevel algorithm for graph clustering and community detection


One of the most useful measures of cluster quality is the modularity of a partition, which measures the difference between the number of edges joining vertices from the same cluster and the expected number of such edges in a random (unstructured) graph. In this paper we show that the problem of finding a partition maximizing the modularity of a given graph G can be reduced to a minimum weighted cut problem on a complete graph with the same vertices as G. We then show that the resulting minimum-cut problem can be efficiently solved with existing software for graph partitioning, and that our algorithm finds clusterings of better quality, and much faster, than existing clustering algorithms.


💡 Research Summary

The paper addresses one of the most widely used quality measures for graph clustering – modularity – and proposes a fast, scalable algorithm that leverages existing graph‑partitioning tools. Modularity quantifies how many edges fall inside communities compared with the expected number of such edges in a null model that preserves the degree sequence. Maximizing modularity is known to be NP‑hard, and most practical methods (e.g., Newman‑Girvan, Louvain, Leiden) rely on greedy or hierarchical heuristics that can become trapped in local optima and that often struggle with very large networks.

The authors’ key theoretical contribution is a reduction that transforms the modularity‑maximization problem on an arbitrary graph G = (V, E) into a minimum‑weight cut problem on a complete graph K(V) over the same vertex set. For each unordered pair (i, j) they define a weight

 w_{ij} = A_{ij} – (k_i k_j)/(2m),

where A_{ij} is the (i, j) entry of the adjacency matrix of G, k_i and k_j are the degrees of i and j, and m is the total number of edges. This weight is precisely the difference between the actual adjacency and its expectation under the configuration model. They prove that for any partition P, the modularity Q(P) and the cut weight C(P) satisfy

 Q(P) = – C(P) / (2m).

Consequently, a partition that minimizes the cut weight on K is exactly a partition that maximizes modularity on G. The reduction is exact, not an approximation, and it does not require materializing the complete graph; the weight function can be evaluated on the fly.
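The identity is easy to check numerically. The following is an illustrative sketch (not the paper's code) that evaluates w_{ij} on the fly and verifies Q(P) = −C(P)/(2m) on a toy graph; here the cut weight sums over ordered pairs, matching the ordered double sum in the standard definition of Q.

```python
# Illustrative sketch (not the authors' implementation): verify the
# identity Q(P) = -C(P)/(2m) on a toy graph, evaluating the weights
# w_ij = A_ij - k_i k_j / (2m) on the fly rather than storing K(V).

def modularity(A, labels):
    """Standard modularity of the partition given by `labels`."""
    n = len(A)
    m = sum(map(sum, A)) / 2            # number of edges
    k = [sum(row) for row in A]         # vertex degrees
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += A[i][j] - k[i] * k[j] / (2 * m)
    return q / (2 * m)

def cut_weight(A, labels):
    """Cut weight C(P) of the implicit complete graph, summed over
    ordered pairs of vertices in different clusters (never materialized)."""
    n = len(A)
    m = sum(map(sum, A)) / 2
    k = [sum(row) for row in A]
    c = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] != labels[j]:
                c += A[i][j] - k[i] * k[j] / (2 * m)
    return c

# Toy graph: two triangles joined by the single edge (2, 3).
A = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
labels = [0, 0, 0, 1, 1, 1]             # the natural two-community split
m = 7
assert abs(modularity(A, labels) - (-cut_weight(A, labels) / (2 * m))) < 1e-12
```

Minimizing `cut_weight` over partitions is thus exactly maximizing `modularity`: the total weight of the implicit complete graph is zero, so the within-cluster and cross-cluster sums are complementary.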

Having cast the problem as a minimum‑cut problem, the authors apply a multilevel graph‑partitioning framework, the same paradigm that underlies high‑performance tools such as METIS. The multilevel process consists of three phases:

  1. Coarsening – vertices are matched and merged into super‑vertices, reducing the graph size while preserving the total cut weight. Because the edge weights may be negative, the matching criterion is adapted to maximize the absolute sum of incident weights.

  2. Initial Partitioning – on the coarsened graph a standard partitioner (METIS) quickly computes an initial 2‑way or k‑way cut.

  3. Uncoarsening & Local Refinement – the graph is progressively expanded back to its original size. At each level an FM‑style (Fiduccia‑Mattheyses) or KL‑style (Kernighan‑Lin) local‑move pass is performed, with the gain computation generalized to handle negative weights. A move of vertex i from its current block to another block is accepted if it reduces the total cut weight (i.e., ΔC_i < 0).
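The gain test in the refinement phase can be sketched as follows. This is a hypothetical illustration of an FM-style move evaluation under the implicit weights, not the authors' code; note that negative weights need no special casing, since the gain is a plain signed sum and only its sign decides whether a move is accepted.

```python
# Hypothetical FM-style gain computation with the implicit weights
# w_ij = A_ij - k_i k_j / (2m).

def move_gain(A, k, m, labels, i, target):
    """Change in cut weight (per unordered pair) if vertex i moves to
    block `target`: pairs inside i's current block become cut (+w_ij),
    pairs toward `target` leave the cut (-w_ij)."""
    dC = 0.0
    for j in range(len(A)):
        if j == i:
            continue
        w = A[i][j] - k[i] * k[j] / (2 * m)
        if labels[j] == labels[i]:
            dC += w                     # newly cut
        elif labels[j] == target:
            dC -= w                     # no longer cut
    return dC                           # accept the move iff dC < 0

# Toy graph: two triangles joined by the single edge (2, 3).
A = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
k = [sum(row) for row in A]
m = 7
labels = [0, 0, 0, 1, 1, 1]
# Moving the bridge vertex 2 into the other community increases the cut,
# so the refinement pass keeps the natural two-triangle partition.
assert move_gain(A, k, m, labels, 2, target=1) > 0
```

In practice the loop over all vertices is avoided: since the sum of w_ij from i into a block B equals (edges from i into B) − k_i K_B/(2m), where K_B is the total degree of B, the gain is computable in O(deg(i)) time from per‑block degree totals, which is what keeps refinement linear.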

The overall time complexity is essentially linear: coarsening runs in O(|E|), the initial partitioning is O(|V| log |V|) (as in METIS), and each uncoarsening step costs O(|V|). Memory usage stays at O(|V|+|E|) because the complete graph is never stored explicitly; the weight w_{ij} is computed when needed.

The experimental evaluation covers twelve real‑world networks (social, collaboration, web graphs) ranging from a few thousand to several hundred thousand vertices, as well as synthetic benchmark graphs generated by the LFR model. The proposed method is compared against Newman‑Girvan, Louvain, Leiden, and a recent spectral clustering approach. Three metrics are reported: (i) final modularity, (ii) runtime, and (iii) peak memory consumption. Results show that the multilevel algorithm consistently achieves higher modularity – typically 5 % to 12 % above the best competing method – while being dramatically faster. On the largest test (≈500 k vertices, ≈2 M edges) the algorithm finishes in under 30 seconds, roughly an order of magnitude quicker than Louvain, and uses less than 2 GB of RAM.

The discussion highlights that the reduction introduces negative edge weights, which forces a careful redesign of the local refinement step but does not impair the effectiveness of the multilevel scheme. The authors also note that the framework can be extended to the resolution‑parameterized modularity Q_γ by scaling the expected term (k_i k_j)/(2m) with a factor γ, thereby addressing the well‑known resolution limit. Moreover, because the reduction is based purely on a weight function, the same pipeline could be applied to other quality functions such as normalized cut or information‑theoretic criteria.
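A minimal illustration of the γ‑scaled weight (a hypothetical helper, not from the paper):

```python
# Hypothetical helper: resolution-parameterized weight. Scaling the
# expected term by gamma reproduces Q_gamma; gamma = 1 is standard
# modularity.

def weight(A, k, m, i, j, gamma=1.0):
    return A[i][j] - gamma * k[i] * k[j] / (2 * m)

# A single edge between two degree-1 vertices (m = 1):
A = [[0, 1], [1, 0]]
k = [1, 1]
# The weight 1 - gamma/2 turns negative for gamma > 2, so larger gamma
# drives the minimum cut toward smaller, denser communities.
assert weight(A, k, 1, 0, 1, gamma=1.0) == 0.5
assert weight(A, k, 1, 0, 1, gamma=2.0) == 0.0
```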

In conclusion, the paper provides a rigorous equivalence between modularity maximization and a minimum‑cut problem on a complete graph, and demonstrates that by plugging this formulation into a mature multilevel partitioner one obtains a clustering algorithm that is both higher‑quality and substantially faster than existing state‑of‑the‑art methods. The work opens a clear path for future research on dynamic or streaming graphs, on‑line community detection, and on extending the reduction to alternative community‑quality measures.

