Approximate Closest Community Search in Networks
Recently, there has been significant interest in the study of the community search problem in social and information networks: given one or more query nodes, find densely connected communities containing the query nodes. However, most existing studies do not address the “free rider” issue, that is, nodes far away from the query nodes and irrelevant to them are included in the detected community. Some state-of-the-art models have attempted to address this issue, but not only are their formulated problems NP-hard, they also do not admit any approximation without restrictive assumptions, which may not always hold in practice. In this paper, given an undirected graph G and a set of query nodes Q, we study community search using the k-truss based community model. We formulate our problem of finding a closest truss community (CTC) as finding a connected k-truss subgraph with the largest k that contains Q and has the minimum diameter among such subgraphs. We prove that this problem is NP-hard. Furthermore, it is NP-hard to approximate the problem within a factor $(2-\varepsilon)$, for any $\varepsilon > 0$. However, we develop a greedy algorithmic framework, which first finds a connected k-truss with the largest k containing Q, and then iteratively removes from the graph the nodes furthest from Q. The method achieves a 2-approximation to the optimal solution. To further improve the efficiency, we make use of a compact truss index and develop efficient algorithms for k-truss identification and maintenance as nodes get eliminated. In addition, using bulk deletion optimization and local exploration strategies, we propose two more efficient algorithms. One of them trades some approximation quality for efficiency, while the other is a very efficient heuristic. Extensive experiments on 6 real-world networks show the effectiveness and efficiency of our community model and search algorithms.
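Every step of the framework relies on edge trussness: the largest k such that an edge belongs to a k-truss, i.e. a subgraph in which each edge lies in at least k − 2 triangles. As a rough illustration, the sketch below computes trussness with the textbook support-peeling algorithm. It is not the paper's compact truss index, and the function name `truss_decomposition` is hypothetical.

```python
from collections import defaultdict


def truss_decomposition(edges):
    """Return {frozenset({u, v}): trussness} for an undirected simple graph.

    Support peeling: in round k, repeatedly delete edges that lie in at most
    k - 2 triangles; every edge deleted in round k has trussness k.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Support of an edge = number of triangles that contain it.
    sup = {}
    for u in adj:
        for v in adj[u]:
            e = frozenset((u, v))
            if e not in sup:
                sup[e] = len(adj[u] & adj[v])

    truss = {}
    k = 2
    while sup:
        stack = [e for e, s in sup.items() if s <= k - 2]
        while stack:
            e = stack.pop()
            if e not in sup:              # defensive: skip edges already peeled
                continue
            u, v = tuple(e)
            truss[e] = k
            del sup[e]
            # Each common neighbour w closed a triangle (u, v, w) that is now gone.
            for w in adj[u] & adj[v]:
                for f in (frozenset((u, w)), frozenset((v, w))):
                    sup[f] -= 1
                    if sup[f] == k - 2:   # just became peelable in this round
                        stack.append(f)
            adj[u].discard(v)
            adj[v].discard(u)
        k += 1
    return truss
```

For a 4-clique with one pendant edge attached, the clique edges get trussness 4 and the pendant edge gets trussness 2.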
Research Summary
The paper addresses the community search problem in undirected graphs, focusing on the “free rider” issue where irrelevant nodes far from the query set are included in the returned community. To overcome this, the authors propose a new model called the Closest Truss Community (CTC). A CTC is defined as a connected k‑truss subgraph that (1) contains the query node set Q, (2) has the maximum possible trussness k among all such subgraphs, and (3) among those with maximal k, has the smallest graph diameter. This dual‑criterion definition simultaneously guarantees high cohesion (through k‑truss) and tight proximity (through diameter), thereby eliminating free riders.
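The three conditions can be written compactly as follows. The symbols $\mathcal{H}_k(Q)$, $k^*$ and $H^*$ are our own shorthand and not necessarily the paper's notation. A k-truss is a subgraph in which every edge lies in at least $k-2$ triangles of that subgraph.

$$
\begin{aligned}
\mathcal{H}_k(Q) &= \{\, H \subseteq G : H \text{ is a connected } k\text{-truss},\ Q \subseteq V(H) \,\},\\
k^* &= \max\{\, k : \mathcal{H}_k(Q) \neq \emptyset \,\},\\
H^* &\in \operatorname*{arg\,min}_{H \in \mathcal{H}_{k^*}(Q)} \operatorname{diam}(H), \qquad \operatorname{diam}(H) = \max_{u,v \in V(H)} \operatorname{dist}_H(u,v).
\end{aligned}
$$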
The authors first prove that, although a connected k‑truss with the largest k containing Q can be found in polynomial time, the additional diameter minimization makes the problem NP‑hard. Moreover, they show that no polynomial‑time algorithm can achieve an approximation factor of (2 − ε) for any ε > 0, unless P = NP. Hence no polynomial‑time algorithm can guarantee an approximation ratio below 2 for the optimal diameter unless P = NP.
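The step from "no (2 − ε)-approximation for any ε > 0" to "a lower bound of 2" follows from the usual definition of an approximation ratio. Here $\rho(\mathcal{A})$ is the worst-case ratio of a polynomial-time algorithm $\mathcal{A}$, $H_{\mathcal{A}}$ is the community it returns and $H^*$ is an optimal CTC; this notation is ours.

$$
\rho(\mathcal{A}) = \sup_{(G,\,Q)} \frac{\operatorname{diam}(H_{\mathcal{A}})}{\operatorname{diam}(H^*)},
\qquad
\mathrm{P} \neq \mathrm{NP} \;\Longrightarrow\; \rho(\mathcal{A}) > 2-\varepsilon \ \ \forall\, \varepsilon > 0 \;\Longrightarrow\; \rho(\mathcal{A}) \ge 2 .
$$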
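To match this bound, the paper introduces a greedy framework. The algorithm proceeds in two phases: (i) using a compact truss index, it quickly identifies the connected k‑truss with the largest k that contains Q; (ii) it iteratively removes the node farthest from Q, restoring the k‑truss property after each removal, until Q is no longer contained in a connected k‑truss. Among the intermediate subgraphs, the one with the smallest query distance (the largest distance from any of its nodes to a query node) is returned. Since every node of such a subgraph lies within its query distance of any query node, the triangle inequality bounds its diameter by twice its query distance; building on this, the authors prove that the procedure yields a 2‑approximation to the optimal diameter, matching the theoretical lower bound.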
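The two phases can be sketched in plain Python as follows. This is a minimal, unoptimized sketch under several assumptions. It recomputes the k-truss from scratch by peeling instead of using the paper's truss index. It deletes one furthest node per iteration. It stops as soon as only query nodes attain the current query distance. All function names (`k_truss`, `component_containing`, `greedy_ctc`) are hypothetical.

```python
from collections import defaultdict, deque


def k_truss(adj, k):
    """Copy of adj keeping only edges that lie in >= k - 2 triangles."""
    g = {u: set(vs) for u, vs in adj.items()}
    queue = deque((u, v) for u in g for v in g[u])
    while queue:
        u, v = queue.popleft()
        if v not in g[u]:
            continue                          # edge already removed
        common = g[u] & g[v]
        if len(common) < k - 2:
            g[u].discard(v)
            g[v].discard(u)
            for w in common:                  # these edges just lost a triangle
                queue.append((u, w))
                queue.append((v, w))
    return {u: vs for u, vs in g.items() if vs}


def bfs(adj, src):
    """Hop distances from src inside adj."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def component_containing(adj, Q):
    """Connected component of adj holding every query node, else None."""
    q0 = next(iter(Q))
    if q0 not in adj:
        return None
    comp = bfs(adj, q0)
    if any(q not in comp for q in Q):
        return None
    return {u: set(adj[u]) for u in comp}


def greedy_ctc(edges, Q):
    """Return (k, community adjacency, query distance), or None."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Phase 1: largest k whose k-truss keeps all of Q in one component.
    # Trussness never exceeds max degree + 1, so start the search there.
    k_best, g = None, None
    for k in range(max(len(vs) for vs in adj.values()) + 1, 1, -1):
        g = component_containing(k_truss(adj, k), Q)
        if g is not None:
            k_best = k
            break
    if g is None:
        return None
    # Phase 2: delete the furthest node, restore the k-truss, and keep the
    # intermediate graph with the smallest query distance.
    best, best_d = g, None
    while g is not None:
        dists = [bfs(g, q) for q in Q]
        ecc = {v: max(d[v] for d in dists) for v in g}   # max distance to Q
        d_g = max(ecc.values())
        if best_d is None or d_g < best_d:
            best, best_d = g, d_g
        far = [v for v in g if ecc[v] == d_g and v not in Q]
        if not far:                           # only query nodes are furthest
            break
        g = {u: vs - {far[0]} for u, vs in g.items() if u != far[0]}
        g = component_containing(k_truss(g, k_best), Q)
    return k_best, best, best_d
```

As a usage example, take the graph on nodes 0 to 9 in which each node is linked to the next three (a 4-truss "strip") and Q = {0, 4}. The sketch keeps k = 4, trims the far end of the strip, and returns the nodes 0 to 6 with query distance 2.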
Scalability is achieved through two complementary techniques. The first is a bulk‑deletion strategy that removes multiple distant nodes in a single step, reducing the number of expensive truss‑maintenance operations. The second is a local‑exploration strategy that first computes a Steiner tree spanning Q, using a distance that incorporates edge trussness so that high‑trussness edges are preferred, and then expands the tree into a k‑truss by exploring neighboring edges, keeping k high while keeping the community compact. The bulk‑deletion variant accepts a slightly weaker approximation guarantee in exchange for substantial speed‑ups, whereas the local‑exploration variant is a heuristic that gives up the approximation guarantee for near‑real‑time performance.
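The Steiner-tree seed of the local-exploration strategy can be approximated with the Takahashi–Matsuyama heuristic. That heuristic starts from one query node and repeatedly grafts the shortest path to the nearest unconnected query node. The sketch below illustrates this step. The edge weight `1 + gamma * (k_max - trussness)` is a hypothetical stand-in for the paper's truss-based distance, and both the function name and the `gamma` parameter are our assumptions. Trussness values are taken as given, for example from a truss decomposition.

```python
import heapq
from collections import defaultdict


def truss_weighted_steiner_tree(edge_truss, Q, gamma=1.0):
    """Approximate Steiner tree over query nodes Q.

    edge_truss maps (u, v) -> trussness. Returns the tree as a set of
    frozenset({u, v}) edges.
    """
    k_max = max(edge_truss.values())
    adj = defaultdict(dict)
    for (u, v), t in edge_truss.items():
        # Hypothetical truss distance: one hop plus a penalty for low trussness.
        w = 1.0 + gamma * (k_max - t)
        adj[u][v] = w
        adj[v][u] = w

    query = set(Q)
    tree_nodes = {next(iter(query))}
    tree_edges = set()
    while not query <= tree_nodes:
        # Multi-source Dijkstra from every node already in the tree.
        dist = {u: 0.0 for u in tree_nodes}
        prev = {}
        heap = [(0.0, u) for u in tree_nodes]
        heapq.heapify(heap)
        target = None
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue                      # stale heap entry
            if u in query and u not in tree_nodes:
                target = u                    # nearest unconnected query node
                break
            for v, w in adj[u].items():
                if d + w < dist.get(v, float("inf")):
                    dist[v] = d + w
                    prev[v] = u
                    heapq.heappush(heap, (d + w, v))
        if target is None:
            raise ValueError("query nodes are not connected")
        # Graft the shortest path from the tree to target onto the tree.
        node = target
        while node not in tree_nodes:
            tree_nodes.add(node)
            tree_edges.add(frozenset((node, prev[node])))
            node = prev[node]
    return tree_edges
```

Because all weights are positive, the interior nodes of each grafted path lie outside the current tree, so the result is always a tree. With two routes between query nodes 0 and 2, a 2-hop route of trussness-2 edges and a 3-hop route of trussness-4 edges, the sketch prefers the longer but more cohesive route. Node labels should be mutually comparable, since `heapq` breaks distance ties by comparing them.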
Extensive experiments were conducted on six real‑world networks (including social, collaboration, and biological graphs) with up to millions of edges. Evaluation metrics comprised diameter, trussness, query distance, runtime, and memory consumption. Results demonstrate that CTC consistently yields smaller diameters and equal or higher trussness compared to existing k‑core, k‑truss, and quasi‑clique based community search methods, effectively removing free riders. The greedy algorithm answers queries in sub‑second time on large graphs, and the bulk‑deletion and local‑exploration variants further reduce response times to a few hundred milliseconds with modest memory overhead.
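To make the quality metrics concrete, the sketch below computes diameter, trussness and query distance for a single connected community. It assumes that community trussness is the minimum edge support plus 2 and that query distance is the largest hop distance from any community node to any query node. The paper's exact evaluation protocol (for example, averaging over many queries) may differ, and `community_metrics` is a hypothetical name.

```python
from collections import deque


def bfs(adj, src):
    """Hop distances from src inside adj."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def community_metrics(edges, Q):
    """Diameter, trussness and query distance of a connected community."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Community trussness: minimum edge support (triangle count) plus 2.
    trussness = min(len(adj[u] & adj[v]) for u, v in edges) + 2
    # Diameter: largest BFS eccentricity over all nodes.
    diameter = max(max(bfs(adj, u).values()) for u in adj)
    # Query distance: largest distance from any node to any query node.
    query_distance = max(max(bfs(adj, q).values()) for q in Q)
    return {"diameter": diameter, "trussness": trussness,
            "query_distance": query_distance}
```

Attaching a single pendant node to a 4-clique illustrates the free-rider effect on these metrics. The diameter and query distance grow from 1 to 2, and the community's trussness drops from 4 to 2.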
The paper’s contributions are fourfold: (1) a novel community model that jointly optimizes cohesion and proximity, (2) rigorous hardness and approximation‑bound proofs, (3) a 2‑approximation greedy algorithm with practical index‑based implementations, and (4) empirical validation showing superior quality and efficiency on real datasets. The authors suggest future work on dynamic graph updates, parallel processing for multiple query sets, and hybrid models that combine trussness with other density measures.