Pseudo-random graphs and bit probe schemes with one-sided error
We study probabilistic bit-probe schemes for the membership problem: given a set A of at most n elements from a universe of size m, build a data structure so that queries of the form "Is x in A?" can be answered very quickly. H. Buhrman, P. B. Miltersen, J. Radhakrishnan, and S. Venkatesh proposed a bit-probe scheme based on expanders. Their scheme uses $O(n\log m)$ bits of space and requires reading only one randomly chosen bit from memory to answer a query; the answer is correct with high probability, with two-sided error. In this paper we show that for the same problem there exists a bit-probe scheme with one-sided error that uses $O(n\log^2 m+\mathrm{poly}(\log m))$ bits of space. The difference from the model of Buhrman, Miltersen, Radhakrishnan, and Venkatesh is that we consider a bit-probe scheme with an auxiliary word: the memory is split into two parts of different sizes, the main storage of $O(n\log^2 m)$ bits and a short word of $\log^{O(1)}m$ bits that is pre-computed once for the stored set A and 'cached'. To answer a query "Is x in A?" we are allowed to read the whole cached word but only one bit from the main storage. For some reasonable values of the parameters our space bound is better than what can be achieved by any scheme without cached data.
💡 Research Summary
The paper addresses the static membership problem: given a set A of at most n elements drawn from a universe of size m, we must store A so that queries “Is x in A?” can be answered extremely quickly. The classic Buhrman‑Miltersen‑Radhakrishnan‑Venkatesh (BMRV) scheme shows that with two‑sided error one can achieve O(n log m) bits of storage while probing a single random bit of memory per query. However, for one‑sided error (false positives only) the same space bound is impossible; the known lower bound is Ω(n log² m).
The authors propose a new model that splits the data structure into two parts: a large “main” storage B of size O(n log² m) bits and a tiny “cached” auxiliary word C of size poly(log m) bits. The auxiliary word is pre‑computed for the specific set A and stored in fast cache memory. To answer a query, the algorithm reads the entire cached word C (and the query element x) to compute a deterministic position in B, then reads a single bit from B. If x ∈ A the answer is always correct; if x ∉ A the answer is “yes” with probability at most ε, for any fixed ε > 0. Thus the scheme has one‑sided error and uses only one bit probe to the large storage.
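The query procedure described above can be sketched in a few lines. This is an illustrative sketch, not the paper's exact construction: the function names and the use of a hash to model the seed-driven choice of a probe position are assumptions made for brevity (the paper derives the position from a PRG-generated expander neighbor).

```python
import hashlib


def query(x: int, cached_word: bytes, main_storage: bytes, num_bits: int) -> bool:
    """One-probe membership query (sketch; names and hashing are illustrative).

    The cached word plays the role of the pre-computed auxiliary data:
    together with the query element x it deterministically selects a single
    position in the main storage, of which exactly one bit is read.
    """
    # Derive a deterministic probe position from (cached word, x).  The paper
    # uses a PRG-generated expander neighbor here; a hash stands in for it.
    digest = hashlib.sha256(cached_word + x.to_bytes(8, "big")).digest()
    pos = int.from_bytes(digest[:8], "big") % num_bits
    # Read exactly one bit of the main storage.
    byte_index, bit_offset = divmod(pos, 8)
    return (main_storage[byte_index] >> bit_offset) & 1 == 1
```

In the real scheme, encoding sets the bits of the main storage so that every x ∈ A probes a 1-bit (no false negatives), while for x ∉ A the probed bit is 1 with probability at most ε.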
The construction relies on highly unbalanced bipartite expander graphs. A graph G = (L,R,E) is an (m,s,d,k,δ)‑expander if |L| = m, |R| = s, each left vertex has degree d, and every left subset A of size ≤ k has at least (1‑δ)d|A| distinct neighbors in R. Such expanders guarantee that for any small set A the number of left vertices whose neighborhoods overlap heavily with Γ(A) is limited, which translates into a low false‑positive probability when probing a random neighbor.
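The expansion property can be checked directly on toy instances. The brute-force tester below follows the definition verbatim; it is exponential in k and only feasible for tiny parameters, so it illustrates the definition rather than providing an efficient test (the function name and input format are my own).

```python
from itertools import combinations


def is_expander(neighbors, k, d, delta):
    """Brute-force check of the (m,s,d,k,delta)-expansion property.

    `neighbors[v]` lists the d right-neighbors of left vertex v.
    Returns True iff every left subset A with 1 <= |A| <= k has at
    least (1-delta)*d*|A| distinct neighbors.
    """
    m = len(neighbors)
    for size in range(1, k + 1):
        for subset in combinations(range(m), size):
            gamma = set()  # Γ(A), the neighborhood of the subset
            for v in subset:
                gamma.update(neighbors[v])
            if len(gamma) < (1 - delta) * d * size:
                return False
    return True
```

For example, three left vertices with pairwise disjoint neighbor pairs form a perfect (δ = 0) expander for k = 2, while three vertices sharing the same pair of neighbors do not.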
BMRV’s original proof only shows that a random d‑regular bipartite graph satisfies the expansion parameters with high probability; it does not give an explicit construction. The authors “naïvely derandomize” this argument by employing a pseudo‑random generator (PRG). They take a short seed (the cached word C) and generate a graph deterministically; the seed is chosen so that, with overwhelming probability, the generated graph meets the required expansion. The expansion property can be tested in AC⁰ or logarithmic space, allowing the use of classic Nisan‑Wigderson generators or Braverman’s poly‑log‑independent functions. Consequently, a single polylog‑size seed suffices to specify a suitable expander, and the seed is stored in the cache.
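The "naïve derandomization" amounts to a simple search loop: since most seeds yield a good graph, trying random short seeds and testing each generated graph succeeds quickly in expectation. The sketch below captures this shape; `graph_from_seed` stands in for the PRG (Nisan‑Wigderson in the paper) and `is_good` for the expansion tester, both of which are placeholders I introduce here.

```python
import random


def find_good_seed(seed_bits, graph_from_seed, is_good, max_tries=10_000):
    """Search for a short seed whose generated graph passes the quality test.

    `graph_from_seed` models the PRG: a deterministic map from a short seed
    to a full graph description.  Because all but a small fraction of seeds
    produce a good graph, the expected number of trials is small; the found
    seed is what gets stored in the cached word.
    """
    for _ in range(max_tries):
        seed = random.getrandbits(seed_bits)
        if is_good(graph_from_seed(seed)):
            return seed
    raise RuntimeError("no good seed found within max_tries")
```

The key point is that only the seed (polylog m bits) needs to be cached: the decoder can re-expand it to recover any single neighbor of the expander on the fly.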
Space analysis: each element of A is represented by O(log² m) bits in B, yielding total main storage O(n log² m). The cached word contributes only poly(log m) bits, negligible when n ≫ poly(log m). For suitable parameter ranges this beats the space achievable by any one‑probe one‑sided‑error scheme without cached data, while still requiring only a single bit probe to the large storage per query.
Encoding (building the data structure) in the basic version takes expected poly(m) time because one must search for a good seed among many candidates. The authors show that by allowing a modest increase in space to n^{1+δ}·poly(log m) (for any constant δ > 0) the encoding can be performed in average time poly(n, log m). Thus the scheme becomes effectively encodable, with query time remaining polylog m.
The paper also discusses practical motivations: modern computers have hierarchical memory (registers, caches, RAM, disks). Even a tiny cache that depends on the stored set can dramatically reduce the number of slow memory accesses. The authors argue that their model captures this trade‑off and that the cache size required (polylog m) is far smaller than the information‑theoretic lower bound log ( m choose n ), ensuring the scheme is non‑trivial.
In summary, the work introduces a novel bit‑probe data structure for static membership with one‑sided error, achieving O(n log² m + polylog m) total space, a single bit probe to the large storage, and a polylog‑size cached seed derived from a pseudo‑random generator. The construction is conceptually simple, relies on a straightforward derandomization of expander existence, and bridges theoretical lower bounds with practical considerations of cache memory. Open problems include further reducing encoding time, tightening the space gap, and extending the approach to dynamic updates.