A Local Perspective-based Model for Overlapping Community Detection
Community detection, which identifies densely connected node clusters with sparse between-group links, is vital for analyzing network structure and function in real-world systems. Most existing GCN-based community detection methods focus primarily on node-level information while overlooking community-level features, leading to performance limitations on large-scale networks. To address this issue, we propose LQ-GCN, an overlapping community detection model built from a local community perspective. LQ-GCN employs a Bernoulli-Poisson model to construct a community affiliation matrix and form an end-to-end detection framework. By adopting local modularity as the objective function, the model incorporates local community information to enhance the quality and accuracy of clustering results. Additionally, the conventional GCN architecture is optimized to improve the model's capability to identify overlapping communities in large-scale networks. Experimental results demonstrate that LQ-GCN achieves up to a 33% improvement in Normalized Mutual Information (NMI) and a 26.3% improvement in Recall compared to baseline models across multiple real-world benchmark datasets.
💡 Research Summary
The paper introduces LQ‑GCN, a novel overlapping community detection framework that combines graph convolutional networks (GCNs) with a Bernoulli‑Poisson (B‑P) affiliation model and a local modularity (LQ) loss. Existing GCN‑based methods such as NOCD, UCoDe, and CDMG focus mainly on node‑level information and employ global modularity, which limits their ability to capture fine‑grained community structure, especially in large‑scale graphs. LQ‑GCN addresses these gaps by (1) integrating the B‑P model to directly learn a probabilistic node‑community affiliation matrix $F$, where the probability of an edge between two nodes grows with the number of shared communities; (2) incorporating a local modularity term that evaluates the connectivity of each community with its immediate neighbors, thereby encouraging high intra‑community cohesion while suppressing inter‑community similarity; and (3) employing a streamlined two‑layer GCN architecture with normalized adjacency $\bar{A} = I + D^{-1/2} A D^{-1/2}$, tanh‑ReLU activations, L2 regularization, and dropout to maintain scalability and avoid over‑smoothing.
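The architectural ingredients above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the weight matrices `W1`/`W2` and the exact placement of the tanh and ReLU activations are assumptions, and dropout/L2 regularization are omitted for brevity.

```python
import numpy as np

def normalized_adjacency(A):
    """Adjacency normalization used by the paper: A_bar = I + D^{-1/2} A D^{-1/2}."""
    deg = A.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    A_norm = (A * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]
    return np.eye(A.shape[0]) + A_norm

def two_layer_gcn(A_bar, X, W1, W2):
    """Two-layer GCN producing a non-negative affiliation matrix F.
    tanh on the hidden layer, ReLU on the output (assumed ordering)."""
    H = np.tanh(A_bar @ X @ W1)
    F = np.maximum(A_bar @ H @ W2, 0.0)  # ReLU keeps affiliations non-negative
    return F

def bp_edge_prob(F, i, j):
    """Bernoulli-Poisson edge probability: p(A_ij = 1) = 1 - exp(-F_i . F_j),
    which grows with the communities nodes i and j share."""
    return 1.0 - np.exp(-F[i] @ F[j])
```

Because each row of $F$ is a non-negative affiliation vector rather than a one-hot label, a node can belong to several communities at once, which is what makes the B-P parameterization suitable for overlapping detection.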
The overall loss is a weighted sum $\mathcal{L} = \alpha \mathcal{L}_{BP} + \beta \mathcal{L}_{LQ}$. $\mathcal{L}_{BP}$ is derived from the log‑likelihood of the B‑P model, with sampling‑based weighting to counteract the extreme imbalance between edges and non‑edges. $\mathcal{L}_{LQ}$ is formulated as a cross‑entropy that maximizes diagonal entries (local intra‑community modularity) and minimizes off‑diagonal entries (local inter‑community similarity) of the matrix $LQ_M = C^{\top} B C$. Training proceeds with Adam; initially only $\mathcal{L}_{BP}$ is optimized for rapid convergence, and after a patience of 30 epochs the LQ term is added to refine community boundaries. Early stopping after 80 stagnant epochs prevents wasteful computation. Final community assignments are obtained by thresholding the learned affiliation scores.
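A rough sketch of the two loss terms and their weighted combination, under stated assumptions: the balanced B-P likelihood is approximated by averaging over the given edge list and a sampled non-edge list, and the cross-entropy over $LQ_M = C^{\top} B C$ is simplified (clipping entries into $(0, 1)$) since the paper's exact normalization of $B$ and $C$ is not reproduced here.

```python
import numpy as np

def bp_loss(F, edges, nonedges, eps=1e-9):
    """Negative B-P log-likelihood, balancing observed edges against
    a sample of non-edges (approximation of the paper's weighting)."""
    i, j = edges.T
    ll_edges = np.log(1.0 - np.exp(-np.einsum("ij,ij->i", F[i], F[j])) + eps).mean()
    i, j = nonedges.T
    ll_non = -np.einsum("ij,ij->i", F[i], F[j]).mean()
    return -(ll_edges + ll_non)

def lq_loss(C, B, eps=1e-9):
    """Cross-entropy over LQ_M = C^T B C: reward diagonal entries (local
    intra-community modularity), penalize off-diagonal ones (inter-community
    similarity). Clipping is an illustrative simplification."""
    M = C.T @ B @ C
    diag = np.clip(np.diag(M), eps, 1.0 - eps)
    k = C.shape[1]
    off_mean = (M.sum() - np.trace(M)) / max(k * (k - 1), 1)
    off_mean = np.clip(off_mean, eps, 1.0 - eps)
    return -(np.log(diag).mean() + np.log(1.0 - off_mean))

def total_loss(F, C, B, edges, nonedges, alpha=1.0, beta=1.0):
    """Weighted combination L = alpha * L_BP + beta * L_LQ."""
    return alpha * bp_loss(F, edges, nonedges) + beta * lq_loss(C, B)
```

In the staged schedule described above, one would call `total_loss` with `beta=0.0` for the first phase and switch to a positive `beta` once the patience threshold is reached.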
Experiments were conducted on six real‑world datasets: three small Facebook social networks (170–792 nodes) and three large co‑authorship graphs from the Microsoft Academic Graph (≈15k–35k nodes). Evaluation metrics were Overlapping Normalized Mutual Information (ONMI) and Recall. LQ‑GCN‑X (using both adjacency and node attributes) consistently outperformed baselines (BIGCLAM, CESNA, NOCD, UCoDe, CDMG). Gains reached up to a 33% improvement in NMI and 26.3% in Recall, with particularly strong results on the large co‑authorship graphs where previous methods suffered from scalability or accuracy issues. Ablation studies demonstrated that removing either the B‑P component or the LQ loss caused substantial performance drops, confirming that both probabilistic affiliation modeling and local modularity regularization are essential. Sensitivity analysis on $\alpha$ and $\beta$ highlighted the need for balanced weighting to achieve stable training.
In summary, LQ‑GCN presents a compelling solution for overlapping community detection in massive networks by explicitly leveraging local structural cues within a GCN‑based end‑to‑end learning pipeline. The authors suggest future extensions to dynamic graphs, automatic estimation of the number of communities, and integration with alternative graph neural architectures to broaden applicability.