Faster Approximate Distance Queries and Compact Routing in Sparse Graphs

A distance oracle is a compact representation of the shortest distance matrix of a graph. It can be queried to approximate shortest paths between any pair of vertices. Any distance oracle that returns paths of worst-case stretch $(2k-1)$ must require space $\Omega(n^{1 + 1/k})$ for graphs of $n$ nodes. The hard cases that enforce this lower bound are, however, rather dense graphs with average degree $\Omega(n^{1/k})$. We present distance oracles that, for sparse graphs, substantially break the lower bound barrier at the expense of higher query time. For any $1 \leq \alpha \leq n$, our distance oracles can return stretch 2 paths using $O(m + n^2/\alpha)$ space and stretch 3 paths using $O(m + n^2/\alpha^2)$ space, at the expense of $O(\alpha m/n)$ query time. By setting appropriate values of $\alpha$, we get the first distance oracles that have size linear in the size of the graph, and return constant stretch paths in non-trivial query time. The query time can be further reduced to $O(\alpha)$, by using an additional $O(m \alpha)$ space for all our distance oracles, or at the cost of a small constant additive stretch. We use our stretch 2 distance oracle to present the first compact routing scheme with worst-case stretch 2. Any compact routing scheme with stretch less than 2 must require linear memory at some nodes even for sparse graphs; our scheme, hence, achieves the optimal stretch with non-trivial memory requirements. Moreover, supported by large-scale simulations on graphs including the AS-level Internet graph, we argue that our stretch-2 scheme would be simple and efficient to implement as a distributed compact routing protocol.

💡 Research Summary

The paper revisits the classic space‑stretch lower bound for distance oracles: any oracle guaranteeing worst‑case stretch 2k‑1 must occupy Ω(n^{1+1/k}) space. The hard instances behind this bound are dense graphs with average degree Ω(n^{1/k}), so the bound is overly pessimistic for the many real‑world networks that are sparse (m = O(n)). Exploiting sparsity, the authors design a family of distance oracles parameterized by α (1 ≤ α ≤ n) that trade off space, query time, and stretch in a controllable way.
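To make the trade‑off concrete, the bounds quoted in the abstract can be evaluated numerically. The following is a minimal sketch with all constants dropped; the function names `oracle_bounds` and `linear_space_alpha` and the sample values of n and m are hypothetical and chosen only for illustration.

```python
import math

def oracle_bounds(n, m, alpha, stretch):
    """Return (space, query_time) from the stated bounds, constants dropped."""
    if not 1 <= alpha <= n:
        raise ValueError("alpha must satisfy 1 <= alpha <= n")
    if stretch == 2:
        space = m + n**2 / alpha        # O(m + n^2/alpha)
    elif stretch == 3:
        space = m + n**2 / alpha**2     # O(m + n^2/alpha^2)
    else:
        raise ValueError("only stretch 2 and 3 are covered")
    return space, alpha * m / n         # query time O(alpha * m / n)

def linear_space_alpha(n, m, stretch):
    """Smallest alpha for which the n^2 term is no larger than m."""
    return n**2 / m if stretch == 2 else n / math.sqrt(m)

n, m = 10_000, 40_000                   # hypothetical sparse graph, average degree 8
for stretch in (2, 3):
    alpha = linear_space_alpha(n, m, stretch)
    space, query = oracle_bounds(n, m, alpha, stretch)
    print(stretch, alpha, space, query)
```

Under these bounds, choosing α = n/√m brings the stretch‑3 space down to O(m) with a query cost of about √m, whereas linear space at stretch 2 needs α = n²/m and pushes the query cost to about n.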

The construction starts by sampling a “core” set S of vertices at rate 1/α, so |S| ≈ n/α in expectation. For every vertex v the oracle stores the exact distance and a shortest‑path pointer to its nearest core vertex s(v). In addition, the distance from every core vertex to every vertex of the graph is pre‑computed, which covers all core‑to‑core distances. The total storage therefore consists of the original edge list (size m) plus a distance table of size O(n·|S|) = O(n²/α). When a query (u, v) arrives, the algorithm retrieves the nearest cores s(u) and s(v), looks up the pre‑computed distance between them, and concatenates the three sub‑paths (u→s(u), s(u)→s(v), s(v)→v). The authors' analysis, based on the triangle inequality, bounds the resulting path length by twice the true shortest‑path distance, yielding a worst‑case stretch of 2. The search around u and v that locates the nearest core vertices costs O(α·m/n) time: it scans the adjacency lists of O(α) vertices, each holding m/n edges on average. If one is willing to spend an extra O(mα) memory to index the core‑to‑core distances in a hash‑based structure, the lookup becomes O(1) and the overall query time drops to O(α).
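The sampling, tabulation, and three‑segment query described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the graph is unweighted, undirected, and connected, the random seed is fixed, and the names `build_core_oracle` and `estimate` are hypothetical. The sketch stores distances from each core to every vertex, matching the O(n·|S|) table size, and it omits the local search step. On its own it therefore guarantees only d(u,v) ≤ estimate ≤ d(u,v) + 2·(d(u,s(u)) + d(v,s(v))), which follows directly from the triangle inequality; it makes no claim of stretch 2.

```python
import random
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable vertex of an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def build_core_oracle(adj, alpha, seed=0):
    """Sample cores at rate 1/alpha and tabulate core-to-vertex distances."""
    rng = random.Random(seed)
    cores = [v for v in adj if rng.random() < 1.0 / alpha]
    if not cores:                       # assumption: force at least one core
        cores = [next(iter(adj))]
    table = {s: bfs_distances(adj, s) for s in cores}    # n * |S| entries
    nearest = {v: min(cores, key=lambda s: table[s][v]) for v in adj}
    return table, nearest

def estimate(table, nearest, u, v):
    """Length of the path u -> s(u) -> s(v) -> v."""
    su, sv = nearest[u], nearest[v]
    return table[su][u] + table[su][sv] + table[sv][v]

# Hypothetical test graph: a ring of 12 vertices.
ring = {i: [(i - 1) % 12, (i + 1) % 12] for i in range(12)}
table, nearest = build_core_oracle(ring, alpha=4)
print(estimate(table, nearest, 1, 7), bfs_distances(ring, 1)[7])
```

Comparing the estimate with the exact BFS distance on small graphs like this one is a quick way to see how much the detours through s(u) and s(v) cost for a given α.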

A second family of oracles achieves stretch 3 with even less space. Here the authors perform a two‑level sampling: first a set S₁ of size n/α, then a second set S₂ of size n/α² drawn from the remaining vertices. Each vertex stores its distance to the nearest member of the appropriate level, and all distances within each level are pre‑computed. The total space becomes O(m + n²/α²). Query processing follows the same three‑segment concatenation idea, but now the algorithm may need to hop through two levels of cores. The analysis shows that the length of the returned path never exceeds three times the optimal distance.

Beyond distance oracles, the paper leverages the stretch‑2 construction to build a compact routing scheme with worst‑case stretch 2. Each router keeps (i) the next hop towards its nearest core vertex and (ii) a routing table for core‑to‑core shortest paths. To forward a packet, the source first routes it to its nearest core, the packet then follows the pre‑computed core‑to‑core path to the destination’s core, and that core finally delivers it to the destination. Because every router only needs information about O(α) core vertices plus its own incident edges, the per‑node memory is O(α + deg(v)), which is sublinear whenever both α and deg(v) are o(n). The authors prove that any routing scheme with stretch < 2 requires linear memory at some node even on sparse graphs, so their scheme achieves the optimal stretch with non‑trivial memory requirements.
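The three forwarding phases can be simulated hop by hop. The sketch below is a simplified illustration, not the paper's protocol, and rests on several assumptions: the graph is unweighted, undirected, and connected, the core set is given explicitly, every node stores a next hop toward every core (more state than the per‑node bound quoted above), and the destination label carries the path from the destination's core down to the destination. The names `build_routing_state`, `make_label`, and `route` are hypothetical.

```python
from collections import deque

def bfs_tree(adj, root):
    """Return (dist, parent): hop distance to root and next hop toward root."""
    dist, parent = {root: 0}, {root: root}
    queue = deque([root])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                parent[y] = x
                queue.append(y)
    return dist, parent

def build_routing_state(adj, cores):
    """Nearest core of each node and per-node next hops toward every core."""
    trees = {s: bfs_tree(adj, s) for s in cores}
    home = {v: min(cores, key=lambda s: trees[s][0][v]) for v in adj}
    next_hop = {v: {s: trees[s][1][v] for s in cores} for v in adj}
    return home, next_hop

def make_label(v, home, next_hop):
    """Destination label: v's core plus the hops from that core down to v."""
    s = home[v]
    up = [v]
    while up[-1] != s:
        up.append(next_hop[up[-1]][s])
    return s, up[::-1][1:]              # path core -> v, core itself excluded

def route(u, label, home, next_hop):
    """Forward from u: to u's core, then to the destination's core, then down."""
    dest_core, down_path = label
    path, x = [u], u
    for target in (home[u], dest_core):     # phases 1 and 2
        while x != target:
            x = next_hop[x][target]
            path.append(x)
    path.extend(down_path)                  # phase 3
    return path

# Hypothetical test graph: a ring of 10 vertices with two hand-picked cores.
ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
home, next_hop = build_routing_state(ring, cores=[0, 5])
hops = route(2, make_label(7, home, next_hop), home, next_hop)
print(hops)
```

Carrying the core‑to‑destination path in the label keeps intermediate nodes free of per‑destination state in this simulation; a real scheme would have to bound the label size as well.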

The authors validate their theoretical claims with extensive simulations on real‑world networks, including the AS‑level Internet topology (≈ 30 K nodes) and several large social graphs (up to several million nodes). By choosing α in the range 10–50, the routing scheme achieves average path lengths that are only 1.9× the true shortest paths while using roughly 5–12 % of the total graph size for routing tables. Query latency scales linearly with α, confirming the predicted O(α) bound. The experiments also show that the additional O(mα) space needed for constant‑time core lookups is modest compared with the overall memory budget.

In conclusion, the paper demonstrates that the classic Ω(n^{1+1/k}) lower bound is not a universal barrier; for sparse graphs one can obtain distance oracles and compact routing schemes with constant stretch, linear (or near‑linear) space, and practical query times. The work opens several avenues for future research: handling dynamic edge updates, extending the techniques to weighted or directed graphs, and implementing the routing protocol in a distributed setting to assess robustness and convergence properties in live networks.