Lower Bounds on Near Neighbor Search via Metric Expansion

In this paper we show how the complexity of performing nearest neighbor (NNS) search on a metric space is related to the expansion of the metric space. Given a metric space we look at the graph obtained by connecting every pair of points within a certain distance $r$. We then look at various notions of expansion in this graph, relating them to the cell probe complexity of NNS for randomized and deterministic, exact and approximate algorithms. For example, if the graph has node expansion $\Phi$ then we show that any deterministic $t$-probe data structure for $n$ points must use space $S$ where $(St/n)^t > \Phi$. We show similar results for randomized algorithms as well. These relationships can be used to derive most of the known lower bounds in well-known metric spaces such as $l_1$, $l_2$, $l_\infty$ by simply computing their expansion. In the process, we strengthen and generalize our previous results (FOCS 2008). Additionally, we unify the approach in that work and the communication complexity based approach. Our work reduces the problem of proving cell probe lower bounds of near neighbor search to computing the appropriate expansion parameter. In our results, as in all previous results, the dependence on $t$ is weak; that is, the bound drops exponentially in $t$. We show a much stronger (tight) time-space tradeoff for the class of dynamic low contention data structures. These are data structures that support updates to the data set and do not look up any single cell too often.


💡 Research Summary

The paper establishes a unified framework that translates geometric expansion properties of a metric space into cell‑probe lower bounds for nearest‑neighbor search (NNS). For any metric space $(X,d)$ and a distance threshold $r$, the authors construct a graph $G_r$ whose vertices are the points of $X$ and whose edges connect every pair whose distance does not exceed $r$. Three notions of expansion are considered: node expansion $\Phi$, edge expansion, and a “content” expansion that captures how often a single memory cell is accessed across queries and updates.
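As a concrete illustration, the threshold graph $G_r$ and its node expansion can be computed by brute force on a toy point set. This is a minimal sketch (the function names and the exponential-time enumeration are ours, not the paper's); node expansion here is taken as the worst ratio $|N(A)|/|A|$ over sets $A$ of at most half the points, with $N(A)$ the outside neighborhood of $A$:

```python
from itertools import combinations

def threshold_graph(points, r, dist):
    """Adjacency sets of G_r: connect every pair at distance <= r."""
    adj = {i: set() for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= r:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def node_expansion(adj, max_frac=0.5):
    """Brute-force vertex expansion: min over sets A with |A| <= max_frac * n
    of |N(A)| / |A|, where N(A) is the neighborhood of A outside A.
    Exponential in n -- intended only for toy examples."""
    n = len(adj)
    best = float("inf")
    for k in range(1, int(n * max_frac) + 1):
        for A in combinations(range(n), k):
            A_set = set(A)
            boundary = set().union(*(adj[v] for v in A)) - A_set
            best = min(best, len(boundary) / len(A_set))
    return best
```

For four equally spaced points on a line with $r = 1$ (a path graph), the worst set is an endpoint pair such as $\{0, 1\}$, whose outside neighborhood is a single vertex, giving expansion $1/2$.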

The core technical result for deterministic data structures is the inequality $(St/n)^t > \Phi$, where $n$ is the number of stored points, $S$ the number of memory cells, and $t$ the number of cell probes per query. This inequality follows from an information‑theoretic argument: each query can extract at most $t\log S$ bits of information, while the expansion $\Phi$ forces the algorithm to distinguish at least $\Phi$ times more possible neighborhoods than the size of the queried set. Consequently, any deterministic $t$-probe structure must use space $S = \Omega\bigl(n\,\Phi^{1/t}/t\bigr)$.
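Rearranging the inequality gives $S > (n/t)\,\Phi^{1/t}$, so the smallest admissible space can be computed directly. The helper below is our own illustration of this arithmetic, not code from the paper:

```python
import math

def min_space_lower_bound(n, t, phi):
    """Smallest integer S satisfying (S*t/n)**t > phi,
    i.e. S > (n/t) * phi**(1/t) -- the deterministic t-probe bound."""
    S = math.floor((n / t) * phi ** (1.0 / t)) + 1
    # Guard against floating-point rounding at the boundary.
    while (S * t / n) ** t <= phi:
        S += 1
    return S
```

For example, with $n = 1000$ points, a single probe ($t = 1$), and expansion $\Phi = 10$, any deterministic structure needs more than $10{,}000$ cells; allowing more probes weakens the bound exponentially in $t$, matching the summary's remark about the weak dependence on $t$.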

For randomized algorithms the authors invoke Yao’s minimax principle and a distributed‑communication viewpoint. By averaging over a suitable distribution of queries, they replace node expansion with edge expansion (or an average‑case variant) and obtain essentially the same bound, albeit with a slightly weaker constant factor. Approximate NNS, where the search radius is inflated by a factor $c>1$, is handled by scaling the graph radius accordingly; the same expansion‑based inequality holds with $\Phi$ computed for the enlarged radius.

A major contribution is the treatment of dynamic, low‑contention data structures. “Low contention” means that no single memory cell is probed more than a constant number of times across all queries and updates. Under this restriction, the authors prove a tight time‑space trade‑off: if a structure performs $t$ probes per query and $u$ probes per update, then $S\cdot(t+u) = \Omega\bigl(n\,\Phi^{1/(t+u)}\bigr)$. This bound is essentially optimal and improves upon earlier static results by a factor that depends on the total number of probes per operation.
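The dynamic trade-off rearranges to a lower bound on space in terms of the total probe budget $k = t + u$. The following one-liner (our illustration, returning only the leading term of the $\Omega$ bound) makes the arithmetic explicit:

```python
def dynamic_space_bound(n, t, u, phi):
    """Leading term of the low-contention trade-off from the summary:
    S * (t + u) = Omega(n * phi^(1/(t+u))), solved for S."""
    k = t + u
    return n * phi ** (1.0 / k) / k
```

For instance, with $n = 1000$, $t = u = 2$, and $\Phi = 16$, the implied bound is $1000 \cdot 16^{1/4} / 4 = 500$ cells; unlike the static bound, the exponent shrinks with the combined query-plus-update probe count.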

To demonstrate the power of the framework, the paper computes the expansion parameters for the classic $\ell_1$, $\ell_2$, and $\ell_\infty$ metrics. In each case the expansion grows exponentially with the dimension $d$ (e.g., $\Phi = \Theta((1+1/r)^d)$ for $\ell_1$ and $\ell_2$, and $\Phi = \Theta(2^d)$ for $\ell_\infty$). Plugging these values into the generic inequality reproduces all previously known lower bounds for these spaces, including the classic $S = \Omega\bigl(n^{1-1/t}\bigr)$ bound for constant‑probe deterministic structures.
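Combining the closed-form expansion with the generic space bound shows how the lower bound scales with dimension. This sketch assumes the $\Phi = (1+1/r)^d$ form quoted in the summary for $\ell_1$/$\ell_2$ (constants suppressed; both helper names are ours):

```python
def expansion_l1(d, r):
    """Expansion of the l_1/l_2 threshold graph per the summary:
    Phi = Theta((1 + 1/r)^d); we return the expression without constants."""
    return (1 + 1.0 / r) ** d

def space_bound(n, t, phi):
    """Leading term of the deterministic bound S = Omega(n * phi^(1/t) / t)."""
    return n * phi ** (1.0 / t) / t
```

With $d = 30$ and $r = 1$, the expansion is $2^{30}$; for $n = 10^6$ points and $t = 3$ probes, the implied space bound is $10^6 \cdot 2^{10} / 3 \approx 3.4 \times 10^8$ cells, i.e., exponential in $d/t$.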

Overall, the work reduces the problem of proving cell‑probe lower bounds for NNS to a purely combinatorial question: compute the appropriate expansion of the distance‑threshold graph. This unifies earlier approaches based on encoding arguments and communication complexity, and opens a clear path for deriving lower bounds in new metric spaces simply by analyzing their expansion properties. Future directions include extending the framework to non‑symmetric distances, metrics with locally varying expansion, and developing algorithmic tools for estimating expansion in high‑dimensional or data‑dependent settings.