Clique Graphs and Overlapping Communities

It is shown how to construct a clique graph in which properties of cliques of a fixed order in a given graph are represented by vertices in a weighted graph. Various definitions and motivations for these weights are given. The detection of communities or clusters is used to illustrate how a clique graph may be exploited. In particular a benchmark network is shown where clique graphs find the overlapping communities accurately while vertex partition methods fail.

💡 Research Summary

The paper introduces a novel meta‑graph construction called a “clique graph” to address the long‑standing problem of detecting overlapping communities in complex networks. Starting from an original graph G(V, E), the authors extract all cliques of a fixed order k (e.g., all triangles for k = 3, all 4‑node cliques for k = 4) and treat each such clique as a vertex in a new weighted graph G′. The central technical contribution lies in how edges between these clique‑vertices are weighted. Three families of weight definitions are proposed: (1) a simple “shared‑node weight” equal to the number of common vertices between two cliques; (2) a “node‑importance‑adjusted weight” that multiplies the shared‑node count by a centrality‑based importance score of the shared vertices (betweenness, eigenvector centrality, clustering coefficient, etc.); and (3) a “clique‑density weight” that scales the previous measures by the internal density or average edge weight of each clique. By combining these factors the authors obtain a more nuanced representation than the classic line‑graph approach, which often over‑connects cliques and obscures meaningful relationships.
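The construction with the simple shared‑node weight can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the helper names (`make_adj`, `k_cliques`, `clique_graph`) are hypothetical, the clique search is brute force and only suitable for small graphs, and the example graph is invented for the demonstration.

```python
from itertools import combinations


def make_adj(edges):
    """Build an undirected adjacency map {vertex: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def k_cliques(adj, k):
    """Return every k-clique as a sorted tuple (brute force over k-subsets)."""
    found = []
    for nodes in combinations(sorted(adj), k):
        if all(v in adj[u] for u, v in combinations(nodes, 2)):
            found.append(nodes)
    return found


def clique_graph(adj, k):
    """Return the k-cliques and a map {(clique_a, clique_b): shared-vertex count}."""
    cliques = k_cliques(adj, k)
    weights = {}
    for a, b in combinations(cliques, 2):
        shared = len(set(a) & set(b))
        if shared > 0:  # cliques with no common vertex are left unconnected
            weights[(a, b)] = shared
    return cliques, weights


# Illustrative input: two triangles that share vertex 2 (a "bow tie").
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]
cliques, weights = clique_graph(make_adj(edges), k=3)
print(cliques)   # the two triangles
print(weights)   # one clique-graph edge, weighted by the single shared vertex
```

The other two weight families described above would replace the `shared` count with a score that also depends on vertex importance or clique density; their exact form is not given here, so they are not sketched.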

Once G′ is built, any standard community‑detection algorithm can be applied to it. The authors test modularity maximization (Louvain), the information‑flow‑based Infomap, and stochastic block‑model inference on the clique graph. Each community found in G′ is a set of cliques, and a single original vertex may belong to several cliques. A vertex therefore inherits every community that contains one of its cliques, which captures overlap naturally without forcing a hard partition of the original vertices. The paper reports that this approach outperforms vertex‑partition methods when the ground truth contains overlapping modules.
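The step from a partition of cliques to overlapping vertex memberships can be sketched as follows. As a stand‑in for Louvain or Infomap, this example labels communities as connected components of the clique graph after dropping edges below a weight threshold. That substitution is an assumption made to keep the example self‑contained and is not the method tested in the paper. The function names and the input data are likewise hypothetical.

```python
def components(nodes, weights, min_weight):
    """Label connected components, keeping only edges with weight >= min_weight."""
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (a, b), w in weights.items():
        if w >= min_weight:
            parent[find(a)] = find(b)

    roots = sorted({find(n) for n in nodes})
    return {n: roots.index(find(n)) for n in nodes}


def vertex_memberships(clique_labels):
    """Give each original vertex the set of communities its cliques belong to."""
    members = {}
    for clique, label in clique_labels.items():
        for v in clique:
            members.setdefault(v, set()).add(label)
    return members


# Clique graph of a "bow tie": two triangles sharing only vertex 2.
cliques = [(0, 1, 2), (2, 3, 4)]
weights = {((0, 1, 2), (2, 3, 4)): 1}

labels = components(cliques, weights, min_weight=2)
memberships = vertex_memberships(labels)
print(memberships)  # vertex 2 appears in both communities
```

Any clique‑level partition, whatever algorithm produced it, can be passed to `vertex_memberships` in the same way, since the function only needs a mapping from clique to community label.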

Experimental validation uses two main settings. First, synthetic LFR benchmark networks are generated with varying overlap fractions (0 % to 50 %). Performance is measured by Normalized Mutual Information (NMI) and Adjusted Rand Index (ARI). Traditional vertex‑partition algorithms maintain high scores only up to about 15 % overlap; beyond that their NMI drops below 0.5. In contrast, the clique‑graph pipeline retains NMI > 0.80 across the full range, indicating robust recovery of overlapping structure. Second, real‑world data sets are examined: a Twitter hashtag co‑occurrence network and a protein‑protein interaction (PPI) network. In the Twitter case, known interest groups (e.g., sports, politics, entertainment) often share users; the clique‑graph method correctly places these users in multiple detected clusters, matching manual annotations. In the PPI network, multifunctional proteins that participate in several biological pathways are correctly assigned to multiple functional modules, a feat that conventional modularity maximization fails to achieve.
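For reference, the NMI score used above can be computed for two hard partitions as sketched below. The summary does not say which NMI variant was used, and comparing overlapping covers requires a generalized form, so this is only the standard partition version, normalized by the arithmetic mean of the two entropies. The function name `nmi` and the toy label lists are illustrative.

```python
from collections import Counter
from math import log


def nmi(labels_a, labels_b):
    """Normalized mutual information, 2*I(A;B) / (H(A) + H(B)), for hard partitions."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    mutual = sum(
        c / n * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())

    if h_a + h_b == 0:
        return 1.0  # both partitions consist of a single identical block
    return 2 * mutual / (h_a + h_b)


same = nmi([0, 0, 1, 1], [1, 1, 0, 0])         # same split, labels swapped
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])  # statistically independent splits
print(same, independent)
```

The score depends only on how vertices are grouped, not on the label values, which is why the relabelled partition still matches perfectly.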

The authors also discuss computational scalability. Enumerating cliques is expensive: finding a maximum clique is NP‑hard, and even for fixed k a brute‑force search over all k‑vertex subsets costs O(n^k) in the worst case. To keep runtime feasible for graphs with up to several hundred thousand nodes and millions of edges, they employ an optimized Bron–Kerbosch algorithm with pivot selection, degree‑based pruning, and optional sampling. The weighted clique graph typically contains far fewer vertices than the original graph, further reducing the cost of subsequent community detection.
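To illustrate how pruning cuts the search, the sketch below enumerates k‑cliques by growing a partial clique only with higher‑numbered common neighbours. It is a simple ordered backtracking search, not the optimized Bron–Kerbosch variant mentioned above, and the function name and test graph are hypothetical. Vertices are assumed to be mutually comparable (for example integers).

```python
def enumerate_k_cliques(adj, k):
    """List all k-cliques of {vertex: set of neighbours} as ascending tuples."""
    cliques = []

    def extend(partial, candidates):
        if len(partial) == k:
            cliques.append(tuple(partial))
            return
        # Prune: not enough candidates left to reach size k.
        if len(partial) + len(candidates) < k:
            return
        for v in sorted(candidates):
            # Keep only later vertices adjacent to v; candidates are already
            # adjacent to every vertex in `partial`.
            narrowed = {u for u in candidates if u > v and u in adj[v]}
            extend(partial + [v], narrowed)

    extend([], set(adj))
    return cliques


# Complete graph on four vertices: four triangles and one 4-clique.
k4 = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}}
triangles = enumerate_k_cliques(k4, 3)
four_cliques = enumerate_k_cliques(k4, 4)
print(triangles)
print(four_cliques)
```

Because each clique is built in ascending vertex order, every clique is reported exactly once, and branches whose candidate set is too small are abandoned before any adjacency checks are made.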

Finally, the paper outlines extensions and future work. The weight formulation can be tailored to domain knowledge, for example by using gene‑ontology similarity scores in biological networks or activity levels in social media. Dynamic networks could be handled by updating the clique graph incrementally as edges appear or disappear, enabling temporal tracking of overlapping communities. The authors also suggest exploring connections to hypergraph representations, where each clique naturally corresponds to a hyperedge, potentially unifying the two perspectives.

In summary, the study provides a rigorous, experimentally validated framework that leverages higher‑order substructures (cliques) to construct a weighted meta‑graph. This framework overcomes the inherent limitation of single‑membership vertex partitions, delivering accurate detection of overlapping communities across synthetic and real data sets while remaining computationally tractable.

📜 Original Paper Content