An Optimal Lower Bound on the Communication Complexity of Gap-Hamming-Distance


We prove an optimal $\Omega(n)$ lower bound on the randomized communication complexity of the much-studied Gap-Hamming-Distance problem. As a consequence, we obtain essentially optimal multi-pass space lower bounds in the data stream model for a number of fundamental problems, including the estimation of frequency moments. The Gap-Hamming-Distance problem is a communication problem, wherein Alice and Bob receive $n$-bit strings $x$ and $y$, respectively. They are promised that the Hamming distance between $x$ and $y$ is either at least $n/2+\sqrt{n}$ or at most $n/2-\sqrt{n}$, and their goal is to decide which of these is the case. Since the formal presentation of the problem by Indyk and Woodruff (FOCS, 2003), it had been conjectured that the naive protocol, which uses $n$ bits of communication, is asymptotically optimal. The conjecture was shown to be true in several special cases, e.g., when the communication is deterministic, or when the number of rounds of communication is limited. The proof of our aforementioned result, which settles this conjecture fully, is based on a new geometric statement regarding correlations in Gaussian space, related to a result of C. Borell (1985). To prove this geometric statement, we show that random projections of not-too-small sets in Gaussian space are close to a mixture of translated normal variables.


💡 Research Summary

The paper establishes an optimal Ω(n) lower bound on the randomized communication complexity of the Gap‑Hamming‑Distance (GHD) problem, confirming that the naïve protocol using n bits of communication is asymptotically optimal. In GHD, Alice and Bob receive n‑bit strings x and y and must decide whether the Hamming distance d(x,y) is at least n/2 + √n or at most n/2 − √n, under the promise that one of these two cases holds. Prior work had shown Ω(n) lower bounds for deterministic protocols and for randomized protocols with a bounded number of rounds, but the general randomized case remained open.

The authors’ proof proceeds in two conceptual stages. First, they reduce the discrete problem to a continuous one in Gaussian space. By mapping bits to ±1 and interpreting the resulting vectors as samples from the standard n‑dimensional Gaussian distribution, the Hamming distance becomes a linear function of the inner product ⟨X,Y⟩: specifically, d(x,y)=n/2 − ½⟨X,Y⟩. Consequently, distinguishing the two gap cases is equivalent to testing whether ⟨X,Y⟩ exceeds a certain threshold.
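The identity behind this reduction can be checked directly. The sketch below (illustrative only, not from the paper) maps bits {0,1} to {+1,−1} and verifies that d(x,y) = n/2 − ½⟨X,Y⟩:

```python
import random

def hamming(x, y):
    # Hamming distance between two bit strings given as lists of 0/1.
    return sum(a != b for a, b in zip(x, y))

def inner_pm1(x, y):
    # Inner product after mapping bits {0,1} -> {+1,-1}.
    to_pm = lambda b: 1 - 2 * b
    return sum(to_pm(a) * to_pm(b) for a, b in zip(x, y))

n = 1000
x = [random.randint(0, 1) for _ in range(n)]
y = [random.randint(0, 1) for _ in range(n)]

# Each agreeing coordinate contributes +1 and each disagreeing one -1,
# so <X,Y> = n - 2*d(x,y), i.e. d(x,y) = n/2 - <X,Y>/2.
assert hamming(x, y) == (n - inner_pm1(x, y)) // 2
```

Since ⟨X,Y⟩ = (n − d) − d = n − 2d, the two gap cases d ≥ n/2 + √n and d ≤ n/2 − √n correspond to ⟨X,Y⟩ ≤ −2√n and ⟨X,Y⟩ ≥ 2√n, respectively.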

Second, they prove a new geometric statement about random one‑dimensional projections of “not‑too‑small” subsets of Gaussian space. For any measurable set A with Gaussian measure at least a constant, the distribution of the projection ⟨v,X⟩ (where v is a random unit vector and X is a standard Gaussian conditioned on lying in A) is close to a mixture of translated normal distributions. This result is related to a theorem of C. Borell (1985) on Gaussian noise stability, and its proof combines properties of the Gaussian measure with a careful analysis of the mean and variance of the projections.
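A small simulation illustrates the flavor of this statement (a hypothetical toy experiment, not the paper's proof): take A to be the half-space {x : x₁ ≥ 0}, which has Gaussian measure 1/2, and project conditioned samples onto one random unit direction v. For a typical v, the projection looks like a single translated normal with a small shift and variance close to 1:

```python
import math
import random

random.seed(0)
n = 500  # ambient dimension (illustrative choice)

def sample_from_A():
    # Rejection-sample X ~ N(0, I_n) conditioned on the half-space
    # A = {x : x_1 >= 0}, which has Gaussian measure 1/2.
    while True:
        x = [random.gauss(0.0, 1.0) for _ in range(n)]
        if x[0] >= 0:
            return x

# One fixed random direction v on the unit sphere.
v = [random.gauss(0.0, 1.0) for _ in range(n)]
norm = math.sqrt(sum(c * c for c in v))
v = [c / norm for c in v]

# Project many conditioned samples onto v.
proj = [sum(vi * xi for vi, xi in zip(v, sample_from_A()))
        for _ in range(2000)]

mean = sum(proj) / len(proj)
var = sum((p - mean) ** 2 for p in proj) / len(proj)
# For this A and a typical v (nearly orthogonal to e_1), the shift
# |v_1| * E[x_1 | x_1 >= 0] is O(1/sqrt(n)) and the variance is near 1.
print(round(mean, 3), round(var, 3))
```

The interesting regime in the paper is of course more delicate: the theorem handles arbitrary sets of constant measure, where the projection need not be a single normal but is still close to a mixture of translated ones.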

Applying this projection theorem to GHD, the authors show that any randomized protocol communicating only o(n) bits conveys too little information about the projections of the players’ inputs to separate the two distance regimes with error probability below 1/3. In other words, a protocol with sub‑linear communication would contradict the bound on the statistical distance between the two projected distributions. Hence, any randomized protocol must use Ω(n) bits, matching the trivial upper bound.

Beyond communication complexity, the result translates directly into space lower bounds for data‑stream algorithms. GHD reduces to several fundamental streaming problems, including estimation of the second frequency moment (F₂), ℓ₁‑norm approximation, and cut‑size estimation in graph streams. Since a p‑pass streaming algorithm with space s yields a protocol with O(ps) communication, the Ω(n) communication lower bound implies that any streaming algorithm solving these problems with constant error and a constant number of passes must use Ω(n) memory. This improves upon earlier Ω(√n) space lower bounds for the unrestricted multi‑pass setting and shows that many existing ℓ₂‑sketch based algorithms are essentially optimal.
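The reduction to F₂ estimation is simple enough to demonstrate exactly. In the sketch below (a minimal illustration of the standard turnstile‑stream reduction, not code from the paper), Alice streams +1 updates for her set bits and Bob streams −1 updates for his; the second frequency moment of the combined stream equals the Hamming distance:

```python
import random
from collections import Counter

random.seed(1)
n = 64
x = [random.randint(0, 1) for _ in range(n)]
y = [random.randint(0, 1) for _ in range(n)]

# Turnstile stream: Alice emits (i, +1) for each i with x_i = 1,
# then Bob emits (i, -1) for each i with y_i = 1.
freq = Counter()
for i in range(n):
    if x[i]:
        freq[i] += 1
for i in range(n):
    if y[i]:
        freq[i] -= 1

# F2 of the combined stream is sum_i (x_i - y_i)^2, and since each
# coordinate difference is 0 or +-1, this equals d(x,y) exactly.
f2 = sum(c * c for c in freq.values())
d = sum(a != b for a, b in zip(x, y))
assert f2 == d
```

Thus a streaming algorithm that approximates F₂ well enough to resolve the ±√n gap around n/2 would solve GHD, and a low‑space algorithm would yield a low‑communication protocol: the players simply exchange the algorithm's memory state once per pass.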

The paper concludes with several avenues for future work: extending Borell‑type correlation inequalities to broader log‑concave measures, applying the Gaussian projection technique to other gap problems such as Gap‑Inner‑Product, and refining the relationship between round complexity, pass complexity, and memory usage in streaming models. Overall, the work settles a long‑standing conjecture about GHD, introduces a powerful geometric tool for analyzing Gaussian projections, and deepens our understanding of the intrinsic trade‑offs between communication, randomness, and space in high‑dimensional computational problems.

