Parallel Maximum Clique Algorithms with Applications to Network Analysis and Storage
We propose a fast, parallel maximum clique algorithm for large sparse graphs that is designed to exploit characteristics of social and information networks. The method exhibits roughly linear runtime scaling across real-world networks ranging from about 1,000 to 100 million nodes. In a test on a social network with 1.8 billion edges, the algorithm finds the largest clique in about 20 minutes. Our method employs a branch-and-bound strategy with novel and aggressive pruning techniques. For instance, we use the core number of a vertex in combination with a good heuristic clique finder to efficiently remove the vast majority of the search space. In addition, we parallelize the exploration of the search tree. During the search, processes immediately communicate changes to the upper and lower bounds on the size of the maximum clique, which occasionally results in super-linear speedup because vertices with large search spaces can be pruned by other processes. We apply the algorithm to two problems: computing temporal strong components and compressing graphs.
💡 Research Summary
The paper presents a highly efficient parallel exact algorithm for finding the maximum clique in large, sparse graphs typical of social and information networks. Recognizing that the maximum‑clique problem is NP‑hard, the authors exploit structural properties of real‑world networks—particularly low degeneracy and the presence of high‑core subgraphs—to dramatically prune the search space.

The algorithm proceeds in several stages. First, a fast greedy heuristic based on vertex degree quickly constructs a large clique; this clique serves as an initial lower bound (maxSoFar) and is often already optimal. Second, tight upper bounds are derived from k‑core numbers (K(v)+1), from greedy coloring in degeneracy order (L(G)), and from the same bounds applied to each vertex’s reduced neighborhood graph N_R(v). These bounds can be computed in linear or near‑linear time: O(|V|+|E|) for cores and O(|E|+|T|) for neighborhood bounds, where |T| is the number of triangles. Third, the algorithm aggressively prunes any vertex whose upper bound does not exceed the current lower bound, removing it implicitly to avoid costly graph updates. Fourth, the remaining search is performed within a branch‑and‑bound framework that expands a search tree of candidate cliques.

The key parallelization strategy lets multiple workers explore different branches simultaneously while sharing the global upper and lower bounds via shared memory (or message passing). Whenever any worker discovers a better lower bound, all others immediately incorporate it, causing many subtrees to be cut off early; this sometimes yields super‑linear speedups, especially when large‑degree vertices dominate the search space. Implementation details include lock‑free data structures, periodic full graph recomputation to keep memory usage low, and careful ordering of vertices from smallest to largest degree to keep neighborhoods small.
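The stages above can be illustrated with a compact Python sketch. This is not the authors' implementation: it is serial, uses only the simple "clique plus remaining candidates" bound alongside the K(v)+1 core-number prune, and omits the coloring and neighborhood bounds, the implicit-removal machinery, and all parallel bound sharing. All function names here (`build_adj`, `core_numbers`, `heuristic_clique`, `max_clique`) are our own illustrative choices.

```python
from collections import defaultdict

def build_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

def core_numbers(adj):
    """K(v) for every vertex via min-degree peeling.
    (Quadratic here for clarity; bucket queues give the paper's O(|V|+|E|).)"""
    deg = {v: len(adj[v]) for v in adj}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=deg.get)
        k = max(k, deg[v])          # core numbers are nondecreasing in peel order
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                deg[u] -= 1
    return core

def heuristic_clique(adj):
    """Greedy lower bound: from each seed, repeatedly add the highest-degree
    vertex still adjacent to everything chosen so far."""
    best = []
    for seed in adj:
        clique, cand = [seed], set(adj[seed])
        while cand:
            v = max(cand, key=lambda u: len(adj[u]))
            clique.append(v)
            cand &= adj[v]
        if len(clique) > len(best):
            best = clique
    return best

def max_clique(adj):
    """Branch and bound, seeded by the heuristic, pruned by core numbers."""
    core = core_numbers(adj)
    best = heuristic_clique(adj)    # initial lower bound (maxSoFar)

    def expand(clique, cand):
        nonlocal best
        if len(clique) > len(best):
            best = list(clique)
        while cand:
            # Bound: even taking every remaining candidate cannot beat best.
            if len(clique) + len(cand) <= len(best):
                return
            v = cand.pop()
            # Prune: v can only belong to a clique of size at most K(v)+1.
            if core[v] + 1 <= len(best):
                continue
            expand(clique + [v], cand & adj[v])

    expand([], {v for v in adj if core[v] + 1 > len(best)})
    return best
```

For example, on a 4-clique {0,1,2,3} with a pendant path attached, the heuristic already finds the optimum and the core-number test discards the low-core tail before branching begins.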
Empirical evaluation on 32 real networks (ranging from a few thousand to 100 million vertices) shows near‑linear scaling of runtime with graph size, and the method finds the maximum clique of a 1.8 billion‑edge Twitter retweet graph in about 20 minutes on a 16‑core shared‑memory machine. The heuristic alone finds the optimal clique in 52 of the 74 networks examined in the online appendix, dramatically reducing the exact search effort. Two applications are demonstrated. (1) Temporal strong components: by constructing a temporal reachability graph whose edges represent time‑respecting paths, the maximum clique corresponds to the largest temporal strong component, enabling fast analysis of dynamic communication networks. (2) Graph compression: using cliques to derive a vertex ordering that localizes edges, the algorithm achieves compression quality comparable to specialized heuristics. The paper concludes that combining core‑number and coloring bounds with real‑time bound sharing yields a practical, scalable exact maximum‑clique solver, and suggests future extensions to distributed‑memory systems, GPU acceleration, and dynamic graph updates.
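The temporal‑strong‑component reduction can also be sketched in a few lines. Everything below is illustrative rather than taken from the paper: the one‑pass reachability rule assumes edge timestamps are distinct (so sorting by time respects path order), and `largest_clique` is a plain serial branch and bound rather than the parallel solver described above.

```python
def temporal_reach(temporal_edges):
    """reach[v] = vertices with a time-respecting (nondecreasing-time) path to v.
    Single pass over edges sorted by timestamp; assumes distinct timestamps."""
    verts = {x for u, v, _ in temporal_edges for x in (u, v)}
    reach = {v: {v} for v in verts}
    for u, v, _ in sorted(temporal_edges, key=lambda e: e[2]):
        reach[v] |= reach[u]        # whoever reached u by now also reaches v
    return reach

def strong_reachability_graph(reach):
    """Undirected graph with an edge (u, v) iff u and v reach each other."""
    verts = list(reach)
    adj = {v: set() for v in verts}
    for i, u in enumerate(verts):
        for v in verts[i + 1:]:
            if u in reach[v] and v in reach[u]:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def largest_clique(adj):
    """Serial branch and bound; the largest temporal strong component is the
    maximum clique of the strong-reachability graph."""
    best = []
    def expand(clique, cand):
        nonlocal best
        if len(clique) > len(best):
            best = list(clique)
        while cand:
            if len(clique) + len(cand) <= len(best):
                return
            v = cand.pop()
            expand(clique + [v], cand & adj[v])
    expand([], set(adj))
    return best
```

On a toy trace where a, b, and c exchange messages repeatedly while d only ever receives, the mutual-reachability graph is a triangle on {a, b, c} plus an isolated d, and its maximum clique is exactly the largest temporal strong component.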