Fast-MWEM: Private Data Release in Sublinear Time
The Multiplicative Weights Exponential Mechanism (MWEM) is a fundamental iterative framework for private data analysis, with broad applications such as answering $m$ linear queries or privately solving systems of $m$ linear constraints. However, a critical bottleneck hindering its scalability is the $Θ(m)$ time required to execute the exponential mechanism in each iteration. We introduce a modification to the MWEM framework that improves the per-iteration runtime to $Θ(\sqrt{m})$ in expectation. This is achieved via a lazy sampling approach to the Report-Noisy-Max mechanism, implemented efficiently using Gumbel noise and a $k$-Nearest-Neighbor data structure, which allows rapid selection of an approximately top-scoring candidate in the exponential mechanism without an exhaustive linear scan. We apply our accelerated framework to private linear query release and to solving Linear Programs (LPs) under neighboring-constraint and low-sensitivity assumptions. Experimental evaluation confirms that our method provides a substantial runtime improvement over classic MWEM.
💡 Research Summary
The paper tackles a fundamental scalability bottleneck in the Multiplicative Weights Exponential Mechanism (MWEM), a cornerstone algorithm for differentially private data analysis. In the classic MWEM, each iteration requires evaluating the quality score of every candidate (query or constraint) and then running the exponential mechanism (EM) to select a high‑scoring candidate. This naïve implementation incurs a linear $Θ(m)$ cost per iteration, where $m$ is the number of candidates, making the algorithm impractical for large‑scale workloads with thousands or millions of queries.
The authors propose Fast‑MWEM, a modification that reduces the expected per‑iteration cost from $Θ(m)$ to $Θ(\sqrt{m})$. The key insight is that in many DP applications the score of a candidate can be expressed as an inner product between a static vector (the query or constraint) and a dynamic vector (the current difference between the true data histogram $h$ and the synthetic histogram $p$ maintained by the multiplicative weights update (MWU) component). Formally, the score is $s_i = |\langle q_i, h-p\rangle|$.
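As a concrete illustration of this structure, the scores of all $m$ candidates can be computed with a single matrix-vector product against the dynamic vector $h-p$ (a minimal sketch with made-up sizes; the variable names and dimensions are ours, not the paper's):

```python
import numpy as np

# Hypothetical sizes: m = 5 queries over a domain of size |X| = 8.
rng = np.random.default_rng(0)
m, domain = 5, 8
Q = rng.integers(0, 2, size=(m, domain)).astype(float)  # static query matrix, one row per q_i
h = rng.dirichlet(np.ones(domain))                       # true (normalized) data histogram
p = np.full(domain, 1.0 / domain)                        # synthetic histogram, initially uniform

# Score of candidate i: s_i = |<q_i, h - p>|, computed for all i at once.
scores = np.abs(Q @ (h - p))
```

Because $Q$ is static, only the $|\mathcal{X}|$-dimensional vector $h-p$ changes between iterations, which is exactly what makes an index over the rows of $Q$ worthwhile.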
Recognizing this structure, the authors reduce the problem of finding the top‑$k$ scores to a Maximum Inner Product Search (MIPS) problem with $k = \sqrt{m}$. Efficient approximate MIPS solvers—such as Locality Sensitive Hashing (LSH), Inverted File (IVF), and Hierarchical Navigable Small World graphs (HNSW)—can retrieve the $k$ largest inner products in sublinear time. By building a $k$‑MIPS index on the fixed query set $Q$ once at the beginning, each iteration can query the index with the current vector $h-p$ and obtain the set $S_{\sqrt{m}}$ of the $\sqrt{m}$ most promising candidates.
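A brute-force stand-in for the $k$-MIPS index makes the retrieval interface concrete; a real deployment would substitute an approximate index such as IVF or HNSW, since this exact version still scans all $m$ rows (all names and sizes below are illustrative assumptions):

```python
import numpy as np

def topk_mips(Q, v, k):
    """Exact top-k by |inner product| -- a stand-in for an approximate
    k-MIPS index (e.g. IVF or HNSW built over the rows of Q)."""
    scores = np.abs(Q @ v)
    idx = np.argpartition(-scores, k - 1)[:k]   # unordered top-k (O(m) here)
    return idx[np.argsort(-scores[idx])]        # sort only the k winners

rng = np.random.default_rng(1)
m, d = 100, 16
Q = rng.standard_normal((m, d))   # static candidate set, indexed once up front
v = rng.standard_normal(d)        # dynamic vector, h - p in the MWEM setting
k = int(np.sqrt(m))               # k = sqrt(m) = 10 candidates, as in the paper
S = topk_mips(Q, v, k)            # the set S_{sqrt(m)} of most promising candidates
```

Swapping `topk_mips` for an approximate index is what turns the $O(m)$ scan into a sublinear lookup, at the cost of the failure probability handled in the privacy accounting.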
Having isolated $S_{\sqrt{m}}$, the algorithm replaces the full‑vector Gumbel‑Max trick (which would still require $O(m)$ random draws) with lazy Gumbel sampling, a technique introduced by Mussmann et al. (2017). The lazy approach samples Gumbel noise only for the candidates in $S_{\sqrt{m}}$, computes their noisy scores, and then performs a probabilistic correction for the remaining $m-\sqrt{m}$ candidates using a Binomial draw and uniform sampling of transformed Gumbel variables. This yields a distribution that is provably close to the exact exponential mechanism distribution; the deviation contributes at most an additive $1/m$ term to the overall privacy loss.
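For reference, the exact (non-lazy) Gumbel-max trick that the lazy scheme approximates fits in a few lines; the lazy variant of Mussmann et al. reproduces this output distribution, up to the stated $1/m$ slack, while drawing fresh Gumbel noise only for the candidates in $S_{\sqrt{m}}$. The $2\Delta$ scaling below assumes a sensitivity-$\Delta$ score, the standard exponential-mechanism calibration:

```python
import numpy as np

def gumbel_max_em(scores, eps, sensitivity=1.0, rng=None):
    """Exact exponential mechanism via the Gumbel-max trick: adding i.i.d.
    Gumbel(0, 1) noise to the scaled scores and taking the argmax samples
    index i with probability proportional to exp(eps * s_i / (2 * sensitivity)).
    The lazy variant avoids materializing all m Gumbel draws."""
    if rng is None:
        rng = np.random.default_rng()
    scaled = eps * np.asarray(scores, dtype=float) / (2.0 * sensitivity)
    return int(np.argmax(scaled + rng.gumbel(size=len(scores))))
```

With a large privacy budget the mechanism concentrates on the top-scoring index, which is why restricting fresh noise to the $\sqrt{m}$ best candidates loses so little probability mass.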
Theoretical analysis shows that, assuming an exact $k$‑MIPS oracle that runs in $O(k)$ time, each iteration of Fast‑MWEM runs in $Θ(|\mathcal{X}|\sqrt{m})$ where $|\mathcal{X}|$ is the domain size (the cost of updating the MWU weights). When the oracle is only approximate, the authors bound the probability of failure and incorporate it into the privacy accounting, resulting in an overall guarantee of $(\varepsilon, \delta + 1/m)$‑DP. The utility guarantees (error bounds on query answers) match those of the original MWEM, because the lazy EM step samples from a distribution that is statistically indistinguishable from the exact EM distribution up to the negligible failure probability.
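The $Θ(|\mathcal{X}|)$ weight update that dominates each iteration is the standard MWEM-style multiplicative-weights step; a sketch follows, where the step size `eta` and the exact exponent are our illustrative choices rather than the paper's:

```python
import numpy as np

def mwu_update(p, q, true_answer, eta=0.5):
    """One MWEM-style multiplicative-weights step: re-weight the synthetic
    histogram p toward agreeing with the selected query q on the (noisy)
    true answer, then renormalize. This is O(|X|) work per iteration, so
    with sqrt(m)-time selection the total is Theta(|X| * sqrt(m))."""
    err = true_answer - q @ p         # signed error on the chosen query
    p = p * np.exp(eta * err * q)     # boost bins that shrink the error
    return p / p.sum()                # renormalize to a distribution
```

A quick check: starting from a uniform synthetic histogram, one update moves the answer on the selected query toward the true answer.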
The framework is instantiated on two canonical DP tasks:
- Private Linear Query Release – answering a collection $Q$ of $m$ linear queries on a dataset of size $n$. Fast‑MWEM reduces the per‑iteration cost from $O(|\mathcal{X}|m)$ to $O(|\mathcal{X}|\sqrt{m})$, yielding a total runtime improvement of roughly a factor $\sqrt{m}$.
- Private Linear Programming – solving LPs of the form $\max_{x\in\mathbb{R}^d} c^\top x$ subject to $Ax\le b$, where $A$ has $m$ rows. Two privacy models are considered: scalar‑private low‑sensitivity LPs (where only $b$ changes between neighboring databases) and constraint‑private LPs (where one constraint may be added or removed). Fast‑MWEM achieves $O(d\sqrt{m})$ per‑iteration time for the former and $O(m\sqrt{d})$ for the latter, compared with $O(dm)$ in the classic approach.
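For the LP instantiation, one natural way the inner-product structure carries over (our reading of the construction, not verbatim from the paper) is to score each constraint by its violation at the current iterate, which is again an inner product between a static augmented row and a dynamic vector:

```python
import numpy as np

rng = np.random.default_rng(2)
m, d = 50, 4
A = rng.standard_normal((m, d))  # static constraint matrix, m rows
b = rng.standard_normal(m)

x = rng.standard_normal(d)       # current iterate of the LP solver
# Violation score of constraint i at x:
#   s_i = a_i^T x - b_i = <(a_i, -b_i), (x, 1)>,
# an inner product between a static augmented row and a dynamic vector,
# so the same k-MIPS index machinery applies to constraint selection.
P = np.hstack([A, -b[:, None]])  # static augmented rows (a_i, -b_i)
v = np.append(x, 1.0)            # dynamic augmented iterate (x, 1)
violations = P @ v
worst = int(np.argmax(violations))
```

Under this formulation, selecting a near-most-violated constraint privately reduces to exactly the same MIPS-plus-lazy-Gumbel pipeline used for query release.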
Empirical evaluation on synthetic benchmarks confirms the theoretical speedups. Using IVF and HNSW indices, Fast‑MWEM attains 15–30× faster runtimes across a range of $m$ values (from $10^3$ to $10^5$) while preserving the same maximum absolute error and the same $(\varepsilon,\delta)$ privacy parameters as the baseline. The experiments also demonstrate that the overhead of building the $k$‑MIPS index is amortized over the many MWEM iterations, and that the algorithm scales gracefully with both the number of queries and the domain size.
In summary, Fast‑MWEM introduces a principled, sublinear‑time implementation of the exponential mechanism by exploiting inner‑product structure, approximate nearest‑neighbor search, and lazy Gumbel sampling. It retains the strong differential privacy and utility guarantees of the original MWEM while delivering orders‑of‑magnitude runtime improvements, thereby making MWEM‑based private data analysis feasible for large‑scale real‑world workloads. Future directions include extending the technique to non‑linear query families, improving high‑dimensional MIPS indexing, and integrating the approach with other DP primitives such as the private multiplicative weights for histograms or private stochastic gradient descent.