Detecting community structure in networks using edge prediction methods
Community detection and edge prediction are both forms of link mining: they are concerned with discovering the relations between vertices in networks. Some of the vertex similarity measures used in edge prediction are closely related to the concept of community structure. We use this insight to propose a novel method for improving existing community detection algorithms by using a simple vertex similarity measure. We show that this new strategy can be more effective in detecting communities than the basic community detection algorithms.
💡 Research Summary
The paper investigates the relationship between community detection and edge prediction, two tasks that both aim to uncover latent relationships in networks. Recognizing that many vertex similarity measures used for link prediction (e.g., common neighbors, Jaccard, Adamic‑Adar) are closely tied to the notion of community structure, the authors propose a straightforward preprocessing step that re‑weights the edges of a given graph based on such similarity scores.
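The three similarity measures named above can be sketched in plain Python. This is a minimal illustration, not the authors' code. It assumes an undirected simple graph stored as a dict that maps each vertex to the set of its neighbors, and it uses open neighborhoods (a vertex is not its own neighbor). The function names and the toy graph are hypothetical.

```python
import math

def common_neighbors(adj, i, j):
    """Number of neighbors shared by i and j."""
    return len(adj[i] & adj[j])

def jaccard(adj, i, j):
    """Shared neighbors divided by the size of the union of both neighborhoods."""
    union = adj[i] | adj[j]
    return len(adj[i] & adj[j]) / len(union) if union else 0.0

def adamic_adar(adj, i, j):
    """Sum of 1 / log(degree) over shared neighbors, so rare neighbors count more."""
    # Skip degree-1 neighbors (possible only when i == j) to avoid dividing by log(1) = 0.
    return sum(1.0 / math.log(len(adj[z])) for z in adj[i] & adj[j] if len(adj[z]) > 1)

# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the bridge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(common_neighbors(adj, 0, 1), jaccard(adj, 0, 1))  # 1 0.3333333333333333
print(common_neighbors(adj, 2, 3), jaccard(adj, 2, 3))  # 0 0.0
```

Endpoints inside a triangle share a neighbor, while the endpoints of the bridge share none; this is the kind of contrast the weighting scheme relies on.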
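The method works as follows. For each pair of vertices (i, j), a similarity score s(i,j) is computed with the chosen metric. If an edge already exists between i and j, the original binary adjacency entry (1) is multiplied by s(i,j), so the edge receives weight s(i,j); if no edge exists, the weight remains zero. The resulting weighted graph therefore adds no new edges: it emphasizes connections that are supported by strong similarity while down‑weighting weak or noisy links. One consequence is that an edge whose endpoints have a similarity of exactly zero (for example, no shared neighbors under common neighbors or Jaccard) receives weight zero and effectively drops out of the graph.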
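The re‑weighting step amounts to a single pass over the edge list. The sketch below assumes an undirected, unweighted edge list and uses Jaccard similarity as the chosen metric; any function with the same signature could be substituted. `reweight` is a hypothetical helper. Because non‑edges keep weight zero, only adjacent pairs are scored and stored.

```python
def jaccard(adj, i, j):
    """Jaccard similarity of the neighbor sets of i and j."""
    union = adj[i] | adj[j]
    return len(adj[i] & adj[j]) / len(union) if union else 0.0

def reweight(edges, similarity=jaccard):
    """Return {(i, j): A_ij * s(i, j)} for the existing edges of an unweighted graph.

    Every listed edge has A_ij = 1, so its weight is just s(i, j); non-edges have
    A_ij = 0, so their weight is zero and they are simply not stored.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return {(u, v): similarity(adj, u, v) for u, v in edges}

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
weights = reweight(edges)
print(weights[(0, 1)], weights[(0, 2)], weights[(2, 3)])  # 0.3333333333333333 0.25 0.0
```

In this toy graph the bridge (2, 3) has no shared neighbors, so its weight is zero, while every triangle edge keeps a positive weight. The resulting weights can then be passed to any weighted community detection routine; NetworkX's `louvain_communities`, for instance, reads edge weights from the `weight` attribute by default.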
Once the graph has been transformed, any off‑the‑shelf community detection algorithm that accepts edge weights can be applied without modification. The authors test four widely used algorithms (Louvain, Infomap, FastGreedy, and Walktrap) on both the original and the similarity‑weighted graphs. Experiments are conducted on synthetic LFR benchmark networks with varying mixing parameters, as well as on several real‑world datasets: Zachary’s Karate Club, DBLP co‑authorship, protein‑protein interaction (PPI), and email communication networks. Performance is evaluated using precision, recall, F1‑score, Normalized Mutual Information (NMI), and modularity.
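NMI, the headline metric in the results that follow, compares a detected partition with a ground‑truth partition and is invariant to how communities are labeled. Several normalizations exist; this sketch assumes the arithmetic‑mean form NMI = 2·I(A;B) / (H(A) + H(B)), which is common in community detection, but the exact variant used in the paper is not stated in this summary. `nmi` is a hypothetical helper.

```python
import math
from collections import Counter

def nmi(labels_a, labels_b):
    """Normalized mutual information 2 I(A;B) / (H(A) + H(B)) of two labelings.

    labels_a[k] and labels_b[k] are the community ids of vertex k in each partition.
    """
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        return -sum(c / n * math.log(c / n) for c in counts.values())

    mutual = sum(c / n * math.log(c * n / (count_a[a] * count_b[b]))
                 for (a, b), c in joint.items())
    h_a, h_b = entropy(count_a), entropy(count_b)
    if h_a + h_b == 0:  # both partitions put every vertex in one community
        return 1.0
    return 2 * mutual / (h_a + h_b)

truth = [0, 0, 0, 1, 1, 1]
print(round(nmi(truth, [1, 1, 1, 0, 0, 0]), 6))  # 1.0: same partition, ids swapped
print(round(nmi([0, 0, 1, 1], [0, 1, 0, 1]), 6))  # 0.0: independent partitions
```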
Results consistently show that the similarity‑weighted graphs lead to higher‑quality partitions. On synthetic networks with high mixing (μ ≥ 0.5), NMI improves by 0.08 to 0.12. On real networks, NMI increases from 0.71 to 0.78 on DBLP and from 0.62 to 0.70 on the PPI network, while the Karate Club result remains perfect with and without weighting. A robustness test adds random edges (5‑15 % noise) to the original graphs: the baseline algorithms suffer steep NMI drops, whereas the weighted approach degrades only marginally (an NMI loss of at most 0.03 at 10 % noise), indicating greater resistance to spurious connections.
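The noise model is only summarized here, so the following sketch of the perturbation step rests on an assumption: that "x % noise" means adding new random edges equal to that fraction of the existing edge count, drawn uniformly among non‑adjacent vertex pairs. `add_random_edges` is a hypothetical helper, and the paper's actual procedure may differ.

```python
import random

def add_random_edges(edges, fraction, seed=0):
    """Return edges plus round(fraction * len(edges)) new random edges.

    Assumes a simple graph. New edges join distinct, previously non-adjacent
    vertices, chosen uniformly at random by rejection sampling.
    """
    rng = random.Random(seed)
    nodes = sorted({v for e in edges for v in e})
    present = {frozenset(e) for e in edges}
    to_add = round(fraction * len(edges))
    free_pairs = len(nodes) * (len(nodes) - 1) // 2 - len(present)
    if to_add > free_pairs:
        raise ValueError("not enough non-adjacent pairs for that noise level")
    noisy = list(edges)
    while to_add > 0:
        u, v = rng.sample(nodes, 2)  # two distinct vertices, so no self-loops
        if frozenset((u, v)) not in present:
            present.add(frozenset((u, v)))
            noisy.append((u, v))
            to_add -= 1
    return noisy

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
noisy = add_random_edges(edges, 0.10)
print(len(noisy))  # 8: 10 % of 7 edges rounds to one extra edge
```

A robustness comparison would then re‑weight and re‑cluster the noisy graph and measure NMI against the ground‑truth communities, as for the clean graph.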
The authors discuss computational considerations: naïve similarity computation is O(|V|²), which can be mitigated through sparse matrix techniques or sampling‑based approximations for large graphs. They also note that the choice of similarity function can affect results and suggest future work on learning‑based similarity measures that incorporate node attributes.
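The summary does not say which sampling‑based approximation the authors have in mind. One standard option, swapped in here purely for illustration, is MinHash: each neighbor set is replaced by a fixed‑length signature of k minimum hash values, and the fraction of matching positions estimates the Jaccard similarity at O(k) cost per pair regardless of degree. The sketch assumes non‑negative integer vertex ids below 2^31 − 1; `minhash_signatures` and `approx_jaccard` are hypothetical names.

```python
import random

PRIME = 2_147_483_647  # 2**31 - 1; vertex ids are assumed to be integers in [0, PRIME)

def minhash_signatures(adj, num_hashes=128, seed=0):
    """Map each non-isolated vertex to a MinHash signature of its neighbor set.

    Hash k is h_k(x) = (a_k * x + b_k) mod PRIME with random a_k != 0, which is a
    bijection on [0, PRIME), so distinct neighbors never share a hash value.
    """
    rng = random.Random(seed)
    params = [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(num_hashes)]
    return {
        v: [min((a * x + b) % PRIME for x in nbrs) for a, b in params]
        for v, nbrs in adj.items()
        if nbrs  # isolated vertices have no edges to weight
    }

def approx_jaccard(signatures, i, j):
    """Estimate Jaccard(N(i), N(j)) as the fraction of positions where the minima agree."""
    sig_i, sig_j = signatures[i], signatures[j]
    return sum(x == y for x, y in zip(sig_i, sig_j)) / len(sig_i)

adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
sigs = minhash_signatures(adj)
print(approx_jaccard(sigs, 2, 3))  # 0.0: neighbor sets {0, 1, 3} and {2, 4, 5} are disjoint
print(approx_jaccard(sigs, 0, 1))  # an estimate of the exact value 1/3
```

Increasing `num_hashes` reduces the variance of the estimate; with idealized hash functions the standard error is roughly sqrt(J(1 − J)/k) for true similarity J.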
In conclusion, the study presents a simple yet effective strategy, using edge prediction similarity scores as a preprocessing weighting scheme, to boost the performance of existing community detection methods. The approach requires no changes to the underlying algorithms, works across diverse network domains, and offers particular benefits for sparse or noisy graphs, making it a practical addition to the network analyst’s toolkit.