"Tri, Tri again": Finding Triangles and Small Subgraphs in a Distributed Setting
Let G = (V,E) be an n-vertex graph and M_d a d-vertex graph, for some constant d. Is M_d a subgraph of G? We consider this problem in a model where all n processes are connected to all other processes, and each message contains up to O(log n) bits. A simple deterministic algorithm that requires O(n^((d-2)/d) / log n) communication rounds is presented. For the special case that M_d is a triangle, we present a probabilistic algorithm that requires an expected O(ceil(n^(1/3) / (t^(2/3) + 1))) rounds of communication, where t is the number of triangles in the graph, and O(min{n^(1/3) log^(2/3) n / (t^(2/3) + 1), n^(1/3)}) with high probability. We also present deterministic algorithms specially suited for sparse graphs. In any graph of maximum degree Delta, we can test for arbitrary subgraphs of diameter D in O(ceil(Delta^(D+1) / n)) rounds. For triangles, we devise an algorithm featuring a round complexity of O(A^2 / n + log_(2+n/A^2) n), where A denotes the arboricity of G.
💡 Research Summary
The paper studies the fundamental problem of detecting a fixed-size subgraph M_d (in particular, triangles) inside an n-vertex graph G in the congested clique model, that is, a CONGEST-style setting in which every node (process) can communicate directly with every other node but each message is limited to O(log n) bits. The authors present a suite of algorithms that trade off deterministic guarantees, randomness, and structural properties of G (such as maximum degree Δ, arboricity A, and the number of triangles t) to achieve sublinear round complexities.
First, for any constant-size pattern M_d with d vertices, they give a simple deterministic protocol that partitions the vertex set into roughly n^{2/d} groups. In each round a single group broadcasts its 2-hop neighbourhood information to the whole network, using clever encoding to respect the O(log n) bandwidth limit. The total number of rounds is O(n^{(d-2)/d}/log n). For d = 3 (triangles) this yields O(n^{1/3}/log n) rounds.
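The flavour of a partition-based scheme can be illustrated with a small centralized simulation for triangles (d = 3). This is only a sketch under stated assumptions, not the authors' protocol: the function name `find_triangle_by_groups`, the choice of about n^{1/3} groups, and the idea of handing each triple of groups to one responsible node are all illustrative choices that may differ from the construction in the paper.

```python
from itertools import combinations, combinations_with_replacement


def find_triangle_by_groups(n, edges):
    """Centralized sketch: split vertices 0..n-1 into k groups and inspect
    every (unordered) triple of groups for a triangle.

    Assumption: k is chosen so that k**3 <= n, so that in a distributed
    setting each triple of groups could be assigned to its own node.
    """
    k = max(1, round(n ** (1 / 3)))
    while k > 1 and k ** 3 > n:
        k -= 1

    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # Group g holds the vertices congruent to g modulo k.
    groups = [[v for v in range(n) if v % k == g] for g in range(k)]

    for triple in combinations_with_replacement(range(k), 3):
        # The node responsible for this triple would need to learn all
        # edges among these vertices; here we simply read them from adj.
        members = sorted({v for g in triple for v in groups[g]})
        for a, b, c in combinations(members, 3):
            if b in adj[a] and c in adj[a] and c in adj[b]:
                return (a, b, c)
    return None
```

Every triangle has its three corners in some triple of groups, so inspecting all triples is exhaustive. The simulation ignores the communication schedule entirely; it only shows how the search space is divided.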
Next, the authors focus on triangle detection and exploit the fact that the difficulty of the problem depends heavily on how many triangles actually exist. They propose a randomized algorithm that samples neighbours at each node and checks whether two sampled neighbours are themselves adjacent, which would close a triangle. By repeating this process, the expected number of rounds drops to O(⌈n^{1/3}/(t^{2/3}+1)⌉). When t is large (e.g., Θ(n)), the algorithm finishes in constant expected rounds; when t is tiny, the bound degrades gracefully to O(n^{1/3}). Using Chernoff bounds they also obtain a high-probability guarantee of O(min{n^{1/3} log^{2/3} n/(t^{2/3}+1), n^{1/3}}) rounds. This adaptive behaviour is a notable improvement over previous work that treated t as a worst-case parameter.
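To get a feel for how the two triangle bounds behave as t grows, the expressions inside the O(·) can be evaluated directly. The sketch below does only that: the function names are hypothetical, hidden constants are ignored, and the natural logarithm is used because the base does not matter asymptotically.

```python
from math import ceil, log


def expected_round_bound(n, t):
    """Expression inside the expected bound: ceil(n^(1/3) / (t^(2/3) + 1))."""
    return ceil(n ** (1 / 3) / (t ** (2 / 3) + 1))


def whp_round_bound(n, t):
    """Expression inside the high-probability bound:
    min(n^(1/3) * log^(2/3) n / (t^(2/3) + 1), n^(1/3)).
    """
    adaptive = n ** (1 / 3) * log(n) ** (2 / 3) / (t ** (2 / 3) + 1)
    return min(adaptive, n ** (1 / 3))


if __name__ == "__main__":
    n = 10 ** 6
    for t in (0, 10, 1000, n):
        print(t, expected_round_bound(n, t), round(whp_round_bound(n, t), 2))
```

These numbers are not round counts of any real execution; they only show the shape of the trade-off between n and t.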
The paper then addresses sparse graphs. If the input graph has maximum degree Δ, any subgraph of diameter D can be detected deterministically in O(⌈Δ^{D+1}/n⌉) rounds. The intuition is that each node needs only to collect information from its D-hop neighbourhood, which contains O(Δ^{D}) vertices and hence O(Δ^{D+1}) edges; since each node can receive roughly n messages per round, gathering this bounded amount of data requires the stated number of rounds. A triangle has diameter D = 1, so for triangles the bound becomes O(⌈Δ^{2}/n⌉).
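For triangles, the bounded-degree idea reduces to each node comparing its own neighbour list with those of its neighbours. The following centralized sketch assumes the graph is given as a dictionary of adjacency sets; the name `triangle_via_neighbour_lists` and the returned round estimate (the bare expression ⌈Δ²/n⌉ with constants dropped) are illustrative assumptions, not the paper's routing procedure.

```python
from math import ceil


def triangle_via_neighbour_lists(adj):
    """Centralized sketch of degree-bounded triangle detection.

    adj maps each vertex to the set of its neighbours. Each vertex v looks
    at the neighbour list of every neighbour u; any vertex adjacent to both
    v and u closes a triangle. Returns (triangle_or_None, round_estimate).
    """
    n = len(adj)
    max_deg = max((len(nbrs) for nbrs in adj.values()), default=0)
    # A node must learn up to max_deg lists of up to max_deg entries each.
    round_estimate = max(1, ceil(max_deg ** 2 / n)) if n else 0

    for v, nbrs in adj.items():
        for u in nbrs:
            common = adj[u] & nbrs
            if common:
                w = min(common)
                return tuple(sorted((v, u, w))), round_estimate
    return None, round_estimate
```

In a distributed run the lists would be exchanged over the network; here they are read directly from `adj`, so the estimate is only a reminder of where the Δ²/n term comes from.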
Finally, the authors exploit arboricity A, the minimum number of forests into which the edges of G can be partitioned and thus a measure of how “tree-like” a graph is. By decomposing G into A edge-disjoint forests, each node can locally verify triangle candidates within its own forest and across adjacent forests. The communication cost is proportional to A²/n, plus a logarithmic term that stems from a binary-search-style verification phase. The resulting round complexity is O(A²/n + log_{2+n/A²} n). Since many real-world networks (social, web, biological) have low arboricity, this algorithm can be dramatically faster than the generic O(n^{1/3}) bound.
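Why low arboricity helps can be seen in a standard sequential technique, which is swapped in here purely for illustration and is not the forest-decomposition protocol described above: orient the edges along a degeneracy ordering so that every vertex has few out-neighbours (at most 2A - 1), then test only pairs of out-neighbours. The function names are hypothetical.

```python
from itertools import combinations


def orient_by_degeneracy(adj):
    """Repeatedly remove a minimum-degree vertex; its remaining neighbours
    become its out-neighbours. Out-degrees are bounded by the degeneracy."""
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    out = {v: set() for v in adj}
    while remaining:
        v = min(remaining, key=lambda x: len(remaining[x]))
        out[v] = set(remaining[v])  # edges point to vertices removed later
        for u in remaining[v]:
            remaining[u].discard(v)
        del remaining[v]
    return out


def find_triangle_low_arboricity(adj):
    """Check every pair of out-neighbours of every vertex for adjacency."""
    out = orient_by_degeneracy(adj)
    for v, succ in out.items():
        for u, w in combinations(sorted(succ), 2):
            if w in adj[u]:
                return tuple(sorted((v, u, w)))
    return None
```

In any triangle, the corner removed first has the other two as out-neighbours, so no triangle is missed. The work per vertex is quadratic in its out-degree, which loosely mirrors the A² term in the round complexity.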
Overall, the paper contributes a clear taxonomy of distributed subgraph detection algorithms: a universal deterministic method with a simple n^{(d‑2)/d} dependence, a triangle‑specific randomized method that adapts to the actual triangle count, and two structural‑parameter‑driven deterministic schemes for low‑degree or low‑arboricity graphs. The analysis is rigorous, with explicit round‑complexity proofs and high‑probability bounds. However, the model assumes a fully connected communication topology, which may be unrealistic in many practical distributed systems; the impact of network diameter and routing overhead is not addressed. Moreover, the paper focuses on asymptotic round counts and does not provide experimental validation or concrete constant‑factor estimates, leaving open questions about practical performance. Future work could explore extensions to more realistic topologies, hybrid deterministic‑randomized protocols, and empirical studies on real datasets.