Local, Private, Efficient Protocols for Succinct Histograms


We give efficient protocols and matching accuracy lower bounds for frequency estimation in the local model for differential privacy. In this model, individual users randomize their data themselves, sending differentially private reports to an untrusted server that aggregates them. We study protocols that produce a succinct histogram representation of the data. A succinct histogram is a list of the most frequent items in the data (often called “heavy hitters”) along with estimates of their frequencies; the frequency of all other items is implicitly estimated as 0. If there are $n$ users whose items come from a universe of size $d$, our protocols run in time polynomial in $n$ and $\log(d)$. With high probability, they estimate the frequency of every item up to error $O\left(\sqrt{\log(d)/(\epsilon^2n)}\right)$, where $\epsilon$ is the privacy parameter. Moreover, we show that this much error is necessary, regardless of computational efficiency, and even for the simple setting where only one item appears with significant frequency in the data set. Previous protocols (Mishra and Sandler, 2006; Hsu, Khanna and Roth, 2012) for this task either ran in time $\Omega(d)$ or had much worse error (about $\sqrt[6]{\log(d)/(\epsilon^2n)}$), and the only known lower bound on error was $\Omega(1/\sqrt{n})$. We also adapt a result of McGregor et al. (2010) to the local setting. In a model with public coins, we show that each user need only send 1 bit to the server. For all known local protocols (including ours), the transformation preserves computational efficiency.
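To make the local model concrete, here is a minimal sketch of the simplest case: each user holds one bit (for example, "my item is x" or not), randomizes it with binary randomized response, and the untrusted server debiases the average of the reports. This is a textbook illustration and not the paper's protocol; the function names `randomize_bit` and `estimate_frequency` are hypothetical.

```python
import math
import random


def randomize_bit(bit, epsilon, rng):
    """User side: keep the true bit with probability e^eps / (e^eps + 1), else flip it."""
    keep_prob = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if rng.random() < keep_prob else 1 - bit


def estimate_frequency(reports, epsilon):
    """Server side: debias the mean of the noisy reports.

    If f is the true fraction of 1s and p the keep probability, then
    E[report] = f * (2p - 1) + (1 - p), so we invert that affine map.
    """
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    noisy_mean = sum(reports) / len(reports)
    return (noisy_mean - (1.0 - p)) / (2.0 * p - 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    true_bits = [1] * 3000 + [0] * 7000  # true frequency 0.3
    reports = [randomize_bit(b, 1.0, rng) for b in true_bits]
    print(estimate_frequency(reports, 1.0))
```

The estimate concentrates around the true frequency with error on the order of $1/(\epsilon\sqrt{n})$ for small $\epsilon$; the additional $\sqrt{\log(d)}$ factor in the abstract's bound reflects estimating all $d$ items at once.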


💡 Research Summary

The paper addresses frequency estimation and heavy‑hitter identification in the local differential privacy (LDP) model, where each user randomizes her own data before sending it to an untrusted server. The authors propose the first polynomial‑time LDP protocol that outputs a succinct histogram—a short list of the most frequent items together with estimated frequencies—while achieving optimal error and minimal communication.

Key contributions are:

  1. Optimal Error – For n users and a domain of size d, the protocol guarantees, with high probability, an ℓ∞ error of O(√(log d/(ε² n))), i.e. O(√(log d)/(ε √n)), for every item. The authors prove a matching lower bound, showing that any LDP protocol (even without computational constraints) must incur Ω(√(log d)/(ε √n)) error when δ ≤ 1/n. This establishes that the proposed error is information‑theoretically optimal.

  2. Efficient Construction – The algorithm runs in time polynomial in n and log d, dramatically improving over prior work that required Ω(d) time. The construction consists of two layers: (a) a “unique heavy‑hitter” sub‑protocol that recovers a single dominant item using an error‑correcting code and a basic ε‑LDP randomizer; (b) a hashing‑based reduction that partitions the domain into many buckets, each likely containing at most one heavy item, and runs many copies of the sub‑protocol in parallel. Random hashing and compressive‑sensing ideas ensure that the total privacy cost is essentially that of a single copy.

  3. 1‑Bit Communication – In the public‑coin model, the authors adapt a technique of McGregor et al. to transform any LDP protocol into one where each user sends only a single bit. The transformation uses public randomness to select a sample from a fixed distribution and a rejection‑sampling decision based on the user’s input. Privacy is preserved because the acceptance probability is tightly controlled by the LDP guarantee.

  4. Building Blocks – The paper introduces a “basic randomizer” that selects a random coordinate of a user’s encoded vector, randomizes that coordinate (a randomized‑response‑style sign flip with calibrated scaling, rather than additive Laplace noise), and outputs a report whose expectation equals the encoded vector. It also leverages the Johnson‑Lindenstrauss lemma for dimensionality reduction and standard coding‑theory tools for error‑correcting encodings of items.

  5. Theoretical Framework – The lower‑bound proof extends the information‑theoretic framework of Duchi et al., showing that the mutual information between inputs and outputs of any (ε, δ)‑LDP protocol is bounded by O(ε² + δ ε log(d/δ)). This bound yields the Ω(√(log d)/(ε √n)) error lower bound even for δ > 0, as long as δ is not too large.
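The basic randomizer from item 4 can be sketched as follows. This is a minimal illustration under stated assumptions: the user's encoded vector has entries ±1/√m, the randomization is a sign flip on one uniformly sampled coordinate, and the report is rescaled by c_eps = (e^ε + 1)/(e^ε − 1) so that its expectation equals the encoded vector. The names `basic_randomizer` and `average_reports` are hypothetical, and the summary itself does not spell out these details.

```python
import math
import random


def basic_randomizer(x, epsilon, rng):
    """Randomize one coordinate of x (entries assumed to be +-1/sqrt(m)).

    Returns (j, z_j): the sampled index and the scaled, possibly sign-flipped
    value. All other coordinates of the report are implicitly zero.
    """
    m = len(x)
    c_eps = (math.exp(epsilon) + 1.0) / (math.exp(epsilon) - 1.0)
    keep_prob = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    j = rng.randrange(m)  # uniformly random coordinate
    sign = 1.0 if rng.random() < keep_prob else -1.0
    # Scaling by c_eps * m undoes the 1/m sampling rate and the sign-flip bias.
    return j, sign * c_eps * m * x[j]


def average_reports(reports, m):
    """Server side: average the sparse reports into a dense length-m estimate."""
    total = [0.0] * m
    for j, z in reports:
        total[j] += z
    return [t / len(reports) for t in total]


if __name__ == "__main__":
    rng = random.Random(1)
    x = [0.5, -0.5, 0.5, -0.5]  # m = 4, entries +-1/sqrt(4)
    reports = [basic_randomizer(x, 1.0, rng) for _ in range(40000)]
    print(average_reports(reports, len(x)))
```

Unbiasedness follows from a one-line calculation: coordinate k is sampled with probability 1/m, the expected sign is (e^ε − 1)/(e^ε + 1) = 1/c_eps, so the expected report at k is (1/m) · (1/c_eps) · c_eps · m · x_k = x_k.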

Overall, the work simultaneously achieves three desiderata that were previously mutually exclusive in the LDP setting: (i) sublinear (in d) computational complexity, (ii) optimal statistical accuracy, and (iii) minimal (1‑bit) communication per user. The results have immediate practical relevance for large‑scale data collection systems such as web browsers, mobile apps, or financial software, where privacy must be guaranteed locally and the domain of possible inputs can be huge. By providing both algorithmic constructions and tight impossibility results, the paper sets a new benchmark for private frequency estimation and heavy‑hitter detection in the local model.
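The 1‑bit transformation from item 3 can be illustrated with a small simulation. This sketch assumes the underlying randomizer is binary randomized response, that the public coins draw a sample from the randomizer's output distribution on the fixed input 0, and that ε ≤ ln 2 so the acceptance probability stays within [0, 1]. The names `rr_prob`, `one_bit_report`, and `simulate_user` are hypothetical, and the code makes no claim about the exact privacy parameter of the transformed protocol.

```python
import math
import random


def rr_prob(output, bit, epsilon):
    """Probability that binary randomized response maps `bit` to `output`."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return p if output == bit else 1.0 - p


def one_bit_report(true_bit, public_sample, epsilon, rng):
    """User side: accept the public sample with probability
    0.5 * Pr[R(true_bit) = y] / Pr[R(0) = y], and send only the accept bit."""
    assert epsilon <= math.log(2.0) + 1e-12, "sketch assumes eps <= ln 2"
    ratio = rr_prob(public_sample, true_bit, epsilon) / rr_prob(public_sample, 0, epsilon)
    return 1 if rng.random() < 0.5 * ratio else 0


def simulate_user(true_bit, epsilon, rng):
    """Draw the public sample y ~ R(0), then compute the user's single bit."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    public_sample = 0 if rng.random() < p else 1
    return public_sample, one_bit_report(true_bit, public_sample, epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(2)
    results = [simulate_user(1, 0.5, rng) for _ in range(40000)]
    accepted = [y for y, b in results if b == 1]
    # About half of the samples are accepted, and the accepted ones are
    # distributed like randomized response applied to the true bit.
    print(len(accepted) / len(results), sum(accepted) / len(accepted))
```

The server keeps only the public samples whose user sent 1. Since Pr[y, accept] = Pr[R(0) = y] · 0.5 · Pr[R(v) = y]/Pr[R(0) = y] = 0.5 · Pr[R(v) = y], each accepted sample is distributed exactly as the original report R(v), at the cost of discarding roughly half of the users.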

