Low Rank Mechanism for Optimizing Batch Queries under Differential Privacy


Differential privacy is a promising privacy-preserving paradigm for statistical query processing over sensitive data. It works by injecting random noise into each query result, such that it is provably hard for an adversary to infer the presence or absence of any individual record from the published noisy results. The main objective in differentially private query processing is to maximize the accuracy of the query results while satisfying the privacy guarantees. Previous work, notably the matrix mechanism of Li et al. (2010), has suggested that with an appropriate strategy, processing a batch of correlated queries as a whole achieves considerably higher accuracy than answering them individually. However, to our knowledge there is currently no practical solution for finding such a strategy for an arbitrary query batch; existing methods either return strategies of poor quality (often worse than naive methods) or require prohibitively expensive computations for even moderately large domains. Motivated by this, we propose the Low-Rank Mechanism (LRM), the first practical differentially private technique for answering batch queries with high accuracy, based on a low-rank approximation of the workload matrix. We prove that the accuracy provided by LRM is close to the theoretical lower bound for any mechanism answering a batch of queries under differential privacy. Extensive experiments using real data demonstrate that LRM consistently outperforms state-of-the-art query processing solutions under differential privacy, by large margins.


💡 Research Summary

Differential privacy (DP) protects individual records by adding random noise to query results, guaranteeing that the presence or absence of any single record cannot be inferred with high confidence. When a batch of correlated queries is answered individually, each query receives its own noise, leading to unnecessary error accumulation. Prior work (e.g., Li et al., 2010) showed that processing a batch of queries as a whole can dramatically improve accuracy, but finding the optimal “strategy matrix” for an arbitrary workload is computationally intractable; existing approaches either produce poor strategies or require prohibitive time for moderate domain sizes.
The paper introduces the Low‑Rank Mechanism (LRM), a practical DP technique that leverages a low‑rank approximation of the query workload matrix to reduce sensitivity and thus the magnitude of injected noise. The workload is represented as a matrix W ∈ ℝ^{m×n}, where each of the m rows corresponds to a linear query over an n‑dimensional data domain. LRM seeks matrices B ∈ ℝ^{m×k} and A ∈ ℝ^{k×n} such that W ≈ B·A, with k ≪ min(m, n). By projecting the original high‑dimensional queries onto a k‑dimensional subspace, the ℓ₁‑sensitivity of the transformed queries becomes ‖A‖₁, the maximum column ℓ₁ norm of A, which is typically far smaller than the sensitivity of W itself.
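To make the sensitivity gap concrete, the following sketch compares the ℓ₁ sensitivity of a workload W with that of a rank‑k factor A. The random range workload, all sizes, and the use of truncated SVD are illustrative assumptions here, not the paper's exact construction:

```python
import numpy as np

# Hypothetical workload: m = 64 random range queries over a domain of
# n = 128 histogram cells, encoded as 0/1 rows of W.
rng = np.random.default_rng(0)
m, n, k = 64, 128, 8
starts = rng.integers(0, n - 1, size=m)
ends = starts + rng.integers(1, 16, size=m)
W = np.zeros((m, n))
for i, (s, e) in enumerate(zip(starts, ends)):
    W[i, s:min(e, n)] = 1.0

def l1_sensitivity(M):
    # L1 sensitivity of a linear workload = maximum column L1 norm:
    # adding/removing one record changes one cell count by 1, which moves
    # the answer vector by at most that column's L1 norm.
    return np.abs(M).sum(axis=0).max()

# Rank-k factorization W ≈ B @ A via truncated SVD (a simple stand-in
# for the paper's optimized decomposition).
U, s, Vt = np.linalg.svd(W, full_matrices=False)
B = U[:, :k] * s[:k]          # m x k
A = Vt[:k, :]                 # k x n

print("sensitivity of W:", l1_sensitivity(W))
print("sensitivity of A:", l1_sensitivity(A))
```

The key point is that noise calibrated to the sensitivity of the small factor A, rather than of W, can be substantially smaller for workloads with many overlapping queries.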
The algorithm proceeds in four steps: (1) compute a low‑rank factorization of W using singular value decomposition (SVD) or a randomized sketching method; (2) retain the top‑k singular vectors to form B and A; (3) evaluate the k intermediate answers y = A·x on the data vector x, and add Laplace (or Gaussian) noise calibrated to ε and the sensitivity of A, producing a noisy vector ỹ; (4) reconstruct the final noisy answers as B·ỹ. The choice of k balances two competing forces: a larger k reduces the approximation error ‖W − B·A‖_F but increases sensitivity, while a smaller k does the opposite. The authors propose a data‑driven heuristic that selects k by cross‑validation on a held‑out subset of queries, achieving near‑optimal trade‑offs without exhaustive search.
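The four steps above might be sketched as follows; the truncated‑SVD factorization, the toy prefix‑sum workload, and the fixed rank k are stand‑ins for the paper's optimized decomposition and its k‑selection heuristic:

```python
import numpy as np

rng = np.random.default_rng(1)

def lrm_answer(W, x, eps, k):
    """Sketch of the mechanism as summarized above: factor W ≈ B @ A,
    add Laplace noise to the k intermediate answers A @ x, and map the
    noisy vector back through B. Truncated SVD is an illustrative
    stand-in for the paper's optimized factorization."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    B = U[:, :k] * s[:k]
    A = Vt[:k, :]
    sens = np.abs(A).sum(axis=0).max()   # L1 sensitivity of A
    y = A @ x                            # k intermediate answers under A
    noisy_y = y + rng.laplace(scale=sens / eps, size=k)
    return B @ noisy_y                   # reconstructed batch answers

# Toy data: a histogram over n = 32 cells and m = 16 prefix-sum queries.
n, m = 32, 16
x = rng.integers(0, 100, size=n).astype(float)
W = np.tril(np.ones((m, n)))   # row i sums cells 0..i

exact = W @ x
noisy = lrm_answer(W, x, eps=1.0, k=8)
print("max |exact - noisy|:", np.abs(exact - noisy).max())
```

The error has two sources visible in the code: the Laplace perturbation of the k intermediate answers, and the deterministic gap between W and its rank‑k approximation.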
Theoretical analysis shows that LRM satisfies ε‑DP because noise is added only to the k‑dimensional intermediate answers A·x, whose sensitivity ‖A‖₁ is explicitly bounded. The expected mean‑squared error (MSE) of LRM can be expressed as
 MSE(LRM) ≤ (2σ²/ε²)·(k/m) + ‖W – B·A‖_F²,
where σ² is the variance of the Laplace (or Gaussian) noise. The first term captures the noise contribution, which scales with k rather than the full rank of W, while the second term captures the deterministic error from low‑rank approximation. By choosing k such that the approximation error is negligible, the overall error approaches the known lower bound for any ε‑DP mechanism answering the same workload. Consequently, LRM attains accuracy within a constant factor of the optimal matrix mechanism, but with dramatically lower computational cost.
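A small numeric illustration of this trade‑off (the random 0/1 workload and SVD factorization are assumptions for demonstration): as k grows, the approximation term ‖W − B·A‖_F shrinks, while the sensitivity of A, and hence the noise term, tends to grow:

```python
import numpy as np

# Sweep the rank k and report both error sources for a toy workload.
rng = np.random.default_rng(2)
m, n = 48, 96
W = (rng.random((m, n)) < 0.2).astype(float)   # sparse 0/1 query matrix
U, s, Vt = np.linalg.svd(W, full_matrices=False)

errs = {}
for k in (2, 8, 32):
    B, A = U[:, :k] * s[:k], Vt[:k, :]
    errs[k] = np.linalg.norm(W - B @ A, "fro")  # approximation error
    sens = np.abs(A).sum(axis=0).max()          # drives the noise error
    print(f"k={k:2d}  ||W-BA||_F={errs[k]:7.3f}  sensitivity(A)={sens:6.3f}")
```

By Eckart–Young, the Frobenius approximation error is nonincreasing in k, so the sweep makes the two opposing terms of the MSE bound directly visible.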
Empirical evaluation uses real‑world datasets: the American Community Survey (ACS) and a web click‑stream log. Workloads include range queries, linear aggregates, and histogram‑style counts, with batch sizes ranging from 10 to 1,000 queries. Experiments vary ε (0.1–1.0), k (5–50), and compare LRM against the hierarchical mechanism, the original matrix mechanism, and naive independent Laplace noise. Results consistently show that LRM reduces average MSE by 3×–10× relative to the hierarchical method and by 2×–5× relative to the matrix mechanism, while running in sub‑second time for k ≤ 10 even on the largest batches. The automatic k‑selection heuristic selects values within 5 % of the true optimum, demonstrating robustness.
The paper acknowledges limitations: LRM assumes linear queries and relies on the existence of a low‑rank structure in the workload. For workloads that are intrinsically full‑rank, the benefits diminish. Future directions include extending the framework to non‑linear queries, integrating adaptive workload clustering, and exploring deep‑learning‑based matrix factorization to capture more complex correlations. Overall, the Low‑Rank Mechanism offers a scalable, near‑optimal solution for differentially private batch query answering, bridging the gap between theoretical optimality and practical applicability.

