Fast Private Data Release Algorithms for Sparse Queries
We revisit the problem of accurately answering large classes of statistical queries while preserving differential privacy. Previous approaches to this problem have either been very general but have not had run-time polynomial in the size of the database, have applied only to very limited classes of queries, or have relaxed the notion of worst-case error guarantees. In this paper we consider the large class of sparse queries, which take non-zero values on only polynomially many universe elements. We give efficient query release algorithms for this class, in both the interactive and the non-interactive setting. Our algorithms also achieve better accuracy bounds than previous general techniques do when applied to sparse queries: our bounds are independent of the universe size. In fact, even the runtime of our interactive mechanism is independent of the universe size, and so can be implemented in the “infinite universe” model in which no finite universe need be specified by the data curator.
💡 Research Summary
The paper addresses the long‑standing challenge of answering a massive number of statistical queries under differential privacy while keeping the computation time sub‑linear in the size of the data universe. Existing general‑purpose mechanisms (e.g., the Laplace mechanism, multiplicative‑weights based releases) either require time linear in |X| (which can be exponential in the database size n) or are restricted to very simple query families such as low‑dimensional intervals or conjunctions with a constant number of literals.
The authors focus on sparse queries, defined as linear queries whose support (the set of universe elements with non‑zero weight) contains at most m items, where m is polynomial in n. Although each individual query touches only a small part of the universe, the collection of all m‑sparse queries can still be astronomically large (≈|X|^m) and has VC‑dimension Θ(m). This model captures realistic scenarios such as searching for rare diseases, checking overlap between participants of many studies, or any analyst who has prior knowledge that only a limited subset of the population is relevant.
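To make the sparsity definition concrete, here is a minimal sketch (representation and function name are our own, not the paper's) of evaluating an m‑sparse linear query: the query is given only by its support, so evaluation touches at most m universe elements regardless of how large |X| is.

```python
from collections import Counter

def evaluate_sparse_query(support_weights, database):
    """Evaluate an m-sparse linear query on a database.

    support_weights: dict mapping the (at most m) universe elements in the
        query's support to their non-zero weights in [0, 1].
    database: iterable of universe elements, one per record.

    Returns the average query value over the records. Only the m support
    elements are ever examined -- the full universe X never materializes.
    """
    counts = Counter(database)          # record counts, keyed by element
    n = sum(counts.values())
    total = sum(w * counts.get(x, 0) for x, w in support_weights.items())
    return total / n

# Hypothetical usage: a query supported on just two universe elements.
db = ["a", "b", "a", "c"]
q = {"a": 1.0, "c": 0.5}
print(evaluate_sparse_query(q, db))     # (2*1.0 + 1*0.5) / 4 = 0.625
```

Note that even though each such query is cheap to evaluate, the *class* of all m‑sparse queries is huge (≈ |X|^m), which is what makes private release of the whole class non‑trivial.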
Two algorithms are presented:
- Interactive mechanism – a variant of the multiplicative‑weights (MW) framework implemented via an Iterative Database Construction (IDC). The key technical trick is to avoid maintaining a weight for every element of X. Instead, a much smaller "virtual" universe X̂ of size B is introduced, where B is chosen to satisfy B ≥ m·log B / α² (α is the target error). When a query arrives, only the m items in its support are temporarily mapped to slots in X̂; a permanent mapping is created only when the MW update actually modifies the corresponding weight. Because the MW algorithm needs at most O(log B / α²) updates before achieving error α, the total number of permanent slots ever allocated never exceeds B. Consequently the per‑query running time is Õ(m / α²) and does not depend on |X| at all. The resulting privacy‑accuracy trade‑off matches that of the standard MW/IDC analysis, with error bounds that depend on m, n, and the privacy parameters but are independent of |X|.
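The lazy-allocation idea above can be sketched as follows. This is an illustrative reconstruction, not the paper's pseudocode: the class name and internal representation are our own, differential-privacy noise and the bookkeeping that counts updates are omitted, and elements outside the allocated slots are modeled as sharing the residual mass uniformly over the B unallocated virtual slots.

```python
import math

class LazySparseMW:
    """Sketch of multiplicative weights over a virtual universe of size B.

    Explicit weights are stored only for elements that some update has
    touched ("permanent slots"); all other elements implicitly share
    `residual` mass uniformly over the unallocated slots, so no state of
    size |X| is ever created.
    """

    def __init__(self, B, alpha):
        self.B = B            # virtual universe size
        self.alpha = alpha    # target error
        self.weights = {}     # permanent slots: element -> weight
        self.residual = 1.0   # total mass on unallocated slots

    def _slot_weight(self):
        """Weight of a single unallocated virtual slot."""
        free = self.B - len(self.weights)
        return self.residual / free if free else 0.0

    def estimate(self, support_weights):
        """Answer a sparse query from the current synthetic distribution,
        touching only the elements in the query's support."""
        w0 = self._slot_weight()
        return sum(q * self.weights.get(x, w0)
                   for x, q in support_weights.items())

    def update(self, support_weights, true_answer):
        """If the estimate is off by more than alpha, run an MW update on
        the support elements, allocating permanent slots as needed."""
        est = self.estimate(support_weights)
        if abs(est - true_answer) <= self.alpha:
            return False                      # answer was accurate enough
        sign = 1.0 if true_answer > est else -1.0
        w0 = self._slot_weight()
        for x, q in support_weights.items():
            if x not in self.weights:         # make the mapping permanent
                self.weights[x] = w0
                self.residual -= w0
            self.weights[x] *= math.exp(sign * self.alpha * q)
        # Renormalize so explicit weights plus residual mass sum to 1.
        total = sum(self.weights.values()) + self.residual
        for x in self.weights:
            self.weights[x] /= total
        self.residual /= total
        return True
```

Because each update allocates at most m new slots and the MW analysis bounds the number of updates by O(log B / α²), the total number of allocated slots stays below B, which is exactly why B ≥ m·log B / α² suffices and why the running time never involves |X|.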