On Range Searching with Semialgebraic Sets II

Let $P$ be a set of $n$ points in $\R^d$. We present a linear-size data structure for answering range queries on $P$ with constant-complexity semialgebraic sets as ranges, in time close to $O(n^{1-1/d})$. It essentially matches the performance of similar structures for simplex range searching, and, for $d\ge 5$, significantly improves earlier solutions by the first two authors obtained in~1994. This almost settles a long-standing open problem in range searching. The data structure is based on the polynomial-partitioning technique of Guth and Katz [arXiv:1011.4105], which shows that for a parameter $r$, $1 < r \le n$, there exists a $d$-variate polynomial $f$ of degree $O(r^{1/d})$ such that each connected component of $\R^d\setminus Z(f)$ contains at most $n/r$ points of $P$, where $Z(f)$ is the zero set of $f$. We present an efficient randomized algorithm for computing such a polynomial partition, which is of independent interest and is likely to have additional applications.


💡 Research Summary

The paper tackles the classic problem of range searching in high‑dimensional Euclidean space: given a set $P$ of $n$ points in $\mathbb{R}^d$, preprocess $P$ so that queries defined by constant‑complexity semialgebraic sets can be answered quickly. Prior work achieved linear‑size data structures for simplex (linear) ranges with query time close to $O(n^{1-1/d})$, but extending this performance to general semialgebraic ranges remained open, especially for dimensions $d\ge5$, where the best known solutions (by the first two authors in 1994) had a query‑time exponent significantly worse than $1-1/d$.

The breakthrough comes from applying the polynomial‑partition theorem of Guth and Katz (2010). For any parameter $r$ with $1<r\le n$, there exists a $d$‑variate polynomial $f$ of degree $O(r^{1/d})$ such that each connected component of $\mathbb{R}^d\setminus Z(f)$ (the complement of the zero set) contains at most $n/r$ points of $P$. Since a polynomial of degree $D$ in $d$ variables has $O(D^d)$ such components, this theorem splits the points among $O(r)$ “cells” with at most $n/r$ points each. Points lying on $Z(f)$ itself belong to no cell and must be handled separately.
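The one‑dimensional case makes the statement concrete: for $d=1$ the degree bound is $O(r)$, and a polynomial whose roots are suitably spaced input points already works. The following is a minimal sketch under that assumption (distinct real points, $r\le n$); the function names are illustrative and do not come from the paper.

```python
from bisect import bisect_left


def partition_roots_1d(points, r):
    """Roots of f(x) = prod (x - root) such that every open interval of
    R minus Z(f) contains at most n/r of the points.

    Assumes the points are distinct and 1 <= r <= len(points).
    """
    pts = sorted(points)
    m = len(pts) // r          # points allowed in one cell
    # Every (m+1)-th sorted point becomes a root, leaving m points per gap.
    return pts[m::m + 1]


def cell_counts(points, roots):
    """Count the points in each open interval between consecutive roots,
    and separately the points that lie on the zero set (the roots)."""
    counts = [0] * (len(roots) + 1)
    on_zero_set = 0
    for x in points:
        i = bisect_left(roots, x)
        if i < len(roots) and roots[i] == x:
            on_zero_set += 1
        else:
            counts[i] += 1
    return counts, on_zero_set


points = list(range(20))
roots = partition_roots_1d(points, r=4)
counts, on_zero_set = cell_counts(points, roots)
print(roots, counts, on_zero_set)   # [5, 11, 17] [5, 5, 5, 2] 3
```

Here the polynomial has degree 3, every cell holds at most $n/r=5$ points, and three points lie on the zero set, which is the situation the data structure has to treat separately in higher dimensions.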

The authors contribute two major technical advances:

  1. Efficient Randomized Construction of a Polynomial Partition.
    While Guth‑Katz proved existence, they did not give an efficient algorithm. The paper presents a randomized method that samples $O(r)$ points from $P$, sets up a linear system whose unknowns are the coefficients of a degree‑$k$ polynomial (with $k=O(r^{1/d})$), and solves it using Gaussian elimination. With high probability the resulting polynomial satisfies the partition property; the algorithm verifies this and repeats the sampling if the check fails. The expected running time is $O(n\log n)$, and the failure probability of a single round can be made inverse‑polynomial in $n$. This construction is of independent interest and may be useful wherever balanced polynomial partitions are needed.

  2. A Linear‑Size Data Structure for Semialgebraic Range Searching.
    Using the partition polynomial $f$, the space is recursively divided. Points lying in a cell (a connected component of $\mathbb{R}^d\setminus Z(f)$) are stored in a child structure built on that subset. Points that lie on the zero set $Z(f)$ are handled separately: $Z(f)$ is an algebraic variety of dimension at most $d-1$, so the problem reduces to a lower‑dimensional range‑searching instance, which can be solved by the same technique or by classical simplex‑range structures. The recursion depth is $O(\log_r n)$. At each level the query algorithm classifies the cells relative to the query semialgebraic set $Q$: cells disjoint from $Q$ are ignored, cells contained in $Q$ are accounted for as a whole, and only the cells crossed by the boundary of $Q$ are searched recursively. The points on $Z(f)$ are queried through the lower‑dimensional structure.
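The linear‑algebra step mentioned in item 1 rests on a simple fact: once the monomials are fixed, any condition that is linear in the coefficients of $f$ becomes a linear system. The sketch below illustrates only that step, using the simplest such condition (that $f$ vanish at given points); it is not the paper's partitioning algorithm, and all function names are illustrative assumptions. A nonzero solution is guaranteed whenever there are more monomials than points.

```python
from fractions import Fraction
from itertools import product


def monomial_exponents(d, degree):
    """Exponent tuples of all d-variate monomials of total degree <= degree."""
    return [e for e in product(range(degree + 1), repeat=d) if sum(e) <= degree]


def monomial_value(exps, point):
    """Value of x_1^e_1 * ... * x_d^e_d at the given point (exact arithmetic)."""
    value = Fraction(1)
    for x, k in zip(point, exps):
        value *= Fraction(x) ** k
    return value


def null_vector(rows, ncols):
    """Nonzero solution v of rows * v = 0 via Gauss-Jordan elimination,
    or None if the only solution is v = 0."""
    rows = [r[:] for r in rows]
    pivots = []                      # pivots[i] = pivot column of row i
    for col in range(ncols):
        rank = len(pivots)
        pr = next((i for i in range(rank, len(rows)) if rows[i][col] != 0), None)
        if pr is None:
            continue                 # no pivot here: col is a free column
        rows[rank], rows[pr] = rows[pr], rows[rank]
        inv = 1 / rows[rank][col]
        rows[rank] = [v * inv for v in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rank])]
        pivots.append(col)
    free = next((c for c in range(ncols) if c not in pivots), None)
    if free is None:
        return None
    vec = [Fraction(0)] * ncols
    vec[free] = Fraction(1)          # set one free variable to 1, the rest to 0
    for i, pc in enumerate(pivots):
        vec[pc] = -rows[i][free]     # read pivot variables off the reduced rows
    return vec


def vanishing_polynomial(points, degree):
    """Monomials and coefficients of a nonzero polynomial vanishing on points."""
    exps = monomial_exponents(len(points[0]), degree)
    rows = [[monomial_value(e, p) for e in exps] for p in points]
    return exps, null_vector(rows, len(exps))


def evaluate(exps, coeffs, point):
    return sum(c * monomial_value(e, point) for c, e in zip(coeffs, exps))


pts = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3)]   # 5 points, 6 monomials
exps, coeffs = vanishing_polynomial(pts, degree=2)
print([evaluate(exps, coeffs, p) == 0 for p in pts])
```

Exact rational arithmetic is used here to keep the check `== 0` meaningful; a practical implementation would more likely use floating point with a tolerance.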

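The query time obeys a recurrence of the form $T(n)\le C\,r^{1-1/d}\,T(n/r)+O(r)$: the partition has $O(r)$ cells of at most $n/r$ points each, and a constant‑complexity semialgebraic range crosses only $O(r^{1-1/d})$ of them, while cells lying entirely inside the range are accounted for without recursion. (A recurrence with $r$ recursive calls per node would only give $T(n)=O(n)$.) For a sufficiently large constant $r$ this solves to $T(n)=O(n^{1-1/d+\varepsilon})$ for any fixed $\varepsilon>0$, which is the sense in which the query time is “close to” $O(n^{1-1/d})$; the extra factor grows with $n$ and cannot be absorbed into the constant. Points on $Z(f)$ add the cost of the lower‑dimensional structure. The space usage is linear: the subsets stored at the children of a node are disjoint, so the storage satisfies $S(n)\le\sum_i S(n_i)+O(r)$ with $n_i\le n/r$ and $\sum_i n_i\le n$, which solves to $S(n)=O(n)$ for constant $r$.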
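How a bound close to $n^{1-1/d}$ arises can be seen by unrolling a recurrence of the kind used in standard partition‑tree analyses. The version below is a sketch under stated assumptions, not the paper's exact analysis: a query recurses into at most $C\,r^{1-1/d}$ cells of at most $n/r$ points each, spends $c\,r$ time per node, and $C\ge 1$, $c$, and $r>1$ are constants.

$$
\begin{aligned}
T(n) &\le C\,r^{1-1/d}\,T(n/r) + c\,r \\
     &\le \left(C\,r^{1-1/d}\right)^{k} T(1) + c\,r\sum_{j=0}^{k-1}\left(C\,r^{1-1/d}\right)^{j}, \qquad k=\log_r n, \\
     &= O\!\left(\left(C\,r^{1-1/d}\right)^{\log_r n}\right)
      = O\!\left(n^{\,1-1/d+\log_r C}\right).
\end{aligned}
$$

The geometric sum is dominated by its last term because the ratio $C\,r^{1-1/d}$ exceeds 1. Choosing the constant $r$ large enough makes $\log_r C\le\varepsilon$, so the exponent exceeds $1-1/d$ by an arbitrarily small amount.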

Compared with the 1994 results, the new structure essentially matches the simplex‑range bound for all $d$ and significantly improves the query time for $d\ge5$. For low dimensions ($d=2,3,4$) it is competitive with the best known methods while being conceptually simpler and more amenable to implementation because the polynomial partition can be computed efficiently.

The paper also discusses several extensions and open problems. The randomized construction is expected‑time only; a deterministic algorithm with comparable guarantees remains unknown. The approach currently handles only constant‑complexity semialgebraic sets; handling sets whose description size grows with $n$ would require further work. Finally, the authors suggest that the efficient polynomial‑partition algorithm could find applications in other high‑dimensional geometric problems such as clustering, nearest‑neighbor search, and data‑dependent space partitioning in machine learning.

In summary, by turning the existential Guth‑Katz partition theorem into a practical algorithm and integrating it into a recursive data‑structure framework, the authors essentially settle a long‑standing open problem: linear‑size storage with near‑optimal $O(n^{1-1/d})$ query time for constant‑complexity semialgebraic range searching in any fixed dimension.