Approximate Distance Oracles with Improved Preprocessing Time

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Given an undirected graph $G$ with $m$ edges, $n$ vertices, and non-negative edge weights, and given an integer $k\geq 1$, we show that for some universal constant $c$, a $(2k-1)$-approximate distance oracle for $G$ of size $O(kn^{1 + 1/k})$ can be constructed in $O(\sqrt{k}\,m + kn^{1 + c/\sqrt{k}})$ time and can answer queries in $O(k)$ time. We also give an oracle which is faster for smaller $k$. Our results break the quadratic preprocessing time bound of Baswana and Kavitha for all $k\geq 6$ and improve the $O(kmn^{1/k})$ time bound of Thorup and Zwick except for very sparse graphs and small $k$. When $m = \Omega(n^{1 + c/\sqrt{k}})$ and $k = O(1)$, our oracle is optimal w.r.t.\ stretch, size, preprocessing time, and query time, assuming a widely believed girth conjecture by Erd\H{o}s.


💡 Research Summary

The paper addresses the classic problem of building compact distance oracles for undirected weighted graphs. A distance oracle is a pre‑processed data structure that, given any pair of vertices, returns an estimate of the shortest‑path distance. The quality of an oracle is measured by four parameters: stretch (the multiplicative factor by which the estimate may exceed the true distance), space usage, preprocessing time, and query time. Thorup and Zwick (2005) introduced a seminal oracle with stretch = 2k − 1, space = O(k·n^{1+1/k}), preprocessing time = O(k·m·n^{1/k}), and query time = O(k). Later, Baswana and Kavitha (2006) showed that the same stretch and space can be achieved with O(n²) preprocessing time, which is advantageous only when the graph is dense (m ≈ n²). The main contribution of this work is to break the quadratic preprocessing barrier for all k ≥ 6 and to improve the Thorup‑Zwick bound for graphs that are not extremely sparse.

The authors combine two well‑known techniques: random vertex sampling and graph spanners. First, each vertex is sampled independently with probability p = n^{−i/k}, where i is a tunable parameter; the sampled set S then has expected size Θ(n^{1−i/k}). For every vertex u they compute its nearest sampled vertex p_S(u) and the distance to it using a single Dijkstra run from a super‑source attached to all sampled vertices; this costs O(m + n log n). Using these nearest‑sample distances they define a subgraph G_S that retains an edge e incident to a vertex v whenever the weight of e is smaller than d_G(v, p_S(v)). Lemma 3 guarantees that for any pair (u,v) where v lies in the “ball” B_S(u) (i.e., v is closer to u than u’s nearest sample is), the distance in G_S equals the true distance in G. Consequently, G_S preserves exact distances for all “close” pairs while having only O(n/p) = O(n^{1+i/k}) edges in expectation.
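A minimal sketch of this step, assuming an adjacency-list representation adj[u] = [(v, weight), ...] and a pre‑drawn sample set S (in the paper each vertex enters S independently with probability n^{−i/k}); function and variable names are illustrative, not from the paper:

```python
import heapq

def nearest_sample_distances(adj, n, S):
    """Distance from every vertex to its nearest sampled vertex, via one
    Dijkstra run from a virtual super-source attached by zero-weight
    edges to all of S.  Returns dist with dist[v] = d_G(v, p_S(v))."""
    dist = [float("inf")] * n
    pq = []
    for s in S:                      # super-source edges have weight 0
        dist[s] = 0.0
        pq.append((0.0, s))
    heapq.heapify(pq)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue                 # stale queue entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist

def ball_subgraph(adj, n, dist):
    """Sparse subgraph G_S: keep an edge when its weight is below the
    nearest-sample distance of one of its endpoints, so that distances
    inside every ball B_S(u) are preserved exactly (Lemma 3)."""
    adj_S = [[] for _ in range(n)]
    for u in range(n):
        for v, w in adj[u]:
            if w < dist[u] or w < dist[v]:
                adj_S[u].append((v, w))
    return adj_S
```

Because the super-source trick initializes every sampled vertex at distance zero, one Dijkstra run suffices for all of S, matching the O(m + n log n) cost stated above.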

On this sparse subgraph G_S the classic Thorup‑Zwick construction is applied, yielding a (2k − 1)‑stretch oracle of size O(k·n^{1+1/k}) and preprocessing O(k·|E_S|·n^{1/k}) = O(k·n^{1+(i+1)/k}). However, for vertex pairs that are far apart (i.e., neither belongs to the other’s ball), G_S alone does not guarantee a good approximation. To handle these, the authors compute a (2k′ − 1)‑spanner H of the original graph using the linear‑time algorithm of Baswana and Sen (2007). The spanner has O(k′·n^{1+1/k′}) edges. They then run Dijkstra on H from every sampled vertex p ∈ S and store the exact spanner distance d_H(p,q) for all q ∈ S. This extra work costs O(|S|·(|E_H|+n log n)) = O(k′·n^{2+1/k′−i/k}) time and O(|S|²) = O(n^{2−2i/k}) space.
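The sampled-pair table reduces to |S| Dijkstra runs restricted to the spanner H. A sketch (names illustrative; the spanner itself would come from the Baswana–Sen algorithm, which is not shown):

```python
import heapq

def sample_pair_distances(adj_H, n, S):
    """Exact spanner distances d_H(p, q) for all (p, q) in S x S, via
    one Dijkstra run per sampled vertex on the spanner H.  Total cost is
    O(|S| * (|E_H| + n log n)) time and O(|S|^2) space, as in the text."""
    table = {}
    for p in S:
        dist = [float("inf")] * n
        dist[p] = 0.0
        pq = [(0.0, p)]
        while pq:
            d, u = heapq.heappop(pq)
            if d > dist[u]:
                continue             # stale queue entry
            for v, w in adj_H[u]:
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(pq, (d + w, v))
        for q in S:                  # keep only sampled targets
            table[(p, q)] = dist[q]
    return table
```

Running Dijkstra on the sparse H rather than on G is exactly what brings the per-source cost down from O(m + n log n) to O(|E_H| + n log n).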

A query for vertices (u,v) proceeds as follows: (1) retrieve p_S(u), p_S(v) and their distances to u and v; (2) ask the Thorup‑Zwick oracle on G_S for an estimate d̃₁(u,v); (3) compute d̃₂(u,v) = d_G(u,p_S(u)) + d_H(p_S(u),p_S(v)) + d_G(v,p_S(v)). The answer is min{d̃₁, d̃₂}. The analysis shows that if u and v are close (one lies in the other’s ball), then d̃₁ already satisfies the (2k − 1) bound. Otherwise, the spanner distance contributes at most (2k′ − 1) times the distance between the samples, and by choosing k′ = ⌊k/3⌋ the combined estimate d̃₂ is bounded by (2k − 1)·d_G(u,v). Hence the overall stretch remains (2k − 1).
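The three query steps combine into a few lines. In this sketch, tz_estimate stands in for the Thorup–Zwick oracle built on G_S and is a hypothetical callable; p_S and d_to_sample are the precomputed nearest-sample arrays, and d_H is the stored table of sampled-pair spanner distances:

```python
def query_distance(u, v, p_S, d_to_sample, tz_estimate, d_H):
    """Minimum of the two estimates described above:
       d~1 from the Thorup-Zwick oracle on G_S, and
       d~2 = d_G(u, p_S(u)) + d_H(p_S(u), p_S(v)) + d_G(v, p_S(v))."""
    d1 = tz_estimate(u, v)
    d2 = d_to_sample[u] + d_H[(p_S[u], p_S[v])] + d_to_sample[v]
    return min(d1, d2)
```

Both estimates are O(1) table/oracle lookups on top of the O(k) Thorup–Zwick query, so the overall query time stays O(k).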

The preprocessing time depends on the choice of i and k′. By examining the three residue classes of k modulo 3, the authors select i and k′ that minimize the exponent of n in the total time expression. For example, when k≡0 (mod 3) they set k′=k/3 and i=k/2+1, yielding a total preprocessing of O(k·m + k·n^{1+(k/2+2)/k} + k·n^{2+3/k−(k/2+1)/k}), which is sub‑quadratic for all k≥6. Similar formulas are given for the other two cases, all achieving sub‑quadratic time and, for sufficiently dense graphs (m=Ω(n^{1+c/√k}) with constant k), linear time O(m). Under Erdős’s girth conjecture, this linear‑time construction is optimal with respect to stretch, space, preprocessing, and query time.
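Taking the k ≡ 0 (mod 3) formula above at face value, a quick arithmetic check shows that the choice i = k/2 + 1 balances the two n-exponents at 3/2 + 2/k, which drops below 2 exactly from k = 6 on:

```python
def preprocessing_exponents(k):
    """Exponents of n in the stated bound for k = 0 (mod 3), with
    k' = k/3 and i = k/2 + 1:
      k * n^{1 + (k/2 + 2)/k}        (Thorup-Zwick on G_S)
      k * n^{2 + 3/k - (k/2 + 1)/k}  (Dijkstra runs on the spanner)
    Both simplify to 3/2 + 2/k, i.e. sub-quadratic for k >= 6."""
    e_tz = 1 + (k / 2 + 2) / k
    e_spanner = 2 + 3 / k - (k / 2 + 1) / k
    return e_tz, e_spanner
```

For k = 6 both exponents equal 11/6 ≈ 1.83, while for k = 3 they exceed 2, consistent with the claim that the quadratic barrier falls only for k ≥ 6.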

In summary, the paper presents a family of distance oracles that retain the optimal (2k − 1) stretch and near‑optimal space of Thorup‑Zwick while dramatically reducing preprocessing time: O(√k·m + k·n^{1+c/√k}) for general k, and even O(k·m + k·n^{3/2}) for small k (k≥3). The results close a long‑standing gap in the literature, offering practically feasible preprocessing for large, moderately dense graphs and establishing theoretical optimality under widely believed conjectures. Future directions include dynamic updates, parallel/distributed implementations, and extensions to directed or negatively weighted graphs.

