Private Learning and Sanitization: Pure vs. Approximate Differential Privacy
We compare the sample complexity of private learning [Kasiviswanathan et al. 2008] and sanitization [Blum et al. 2008] under pure $\epsilon$-differential privacy [Dwork et al. TCC 2006] and approximate $(\epsilon,\delta)$-differential privacy [Dwork et al. Eurocrypt 2006]. We show that the sample complexity of these tasks under approximate differential privacy can be significantly lower than that under pure differential privacy. We define a family of optimization problems, which we call Quasi-Concave Promise Problems, that generalizes some of our considered tasks. We observe that a quasi-concave promise problem can be privately approximated using a solution to a smaller instance of a quasi-concave promise problem. This allows us to construct an efficient recursive algorithm solving such problems privately. Specifically, we construct private learners for point functions, threshold functions, and axis-aligned rectangles in high dimension. Similarly, we construct sanitizers for point functions and threshold functions. We also examine the sample complexity of label-private learners, a relaxation of private learning where the learner is required to only protect the privacy of the labels in the sample. We show that the VC dimension completely characterizes the sample complexity of such learners, that is, the sample complexity of learning with label privacy is equal (up to constants) to learning without privacy.
💡 Research Summary
This paper investigates the sample complexity of two fundamental tasks in differential privacy—private learning and sanitization—under two privacy notions: pure ε‑differential privacy (δ = 0) and approximate (ε, δ)-differential privacy (δ > 0). The authors demonstrate that allowing a negligible failure probability δ can dramatically reduce the number of samples required for learning and sanitizing several basic concept classes, establishing a clear separation between the pure and approximate settings.
The work introduces a new class of optimization problems called Quasi‑Concave Promise Problems (QCPP). In a QCPP, the solution space is totally ordered and the quality function is quasi‑concave: if two solutions f ≤ h both achieve quality at least X, then any intermediate solution g with f ≤ g ≤ h also achieves quality at least X. The goal is to find a solution whose quality is within a (1‑γ) factor of the promised optimum, assuming such an optimum exists. The authors show that a QCPP can be privately approximated by recursively solving a smaller instance of the same problem, which leads to an efficient private algorithm for a broad family of tasks.
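The quasi-concavity condition above can be made concrete with a short check. The following Python sketch (the helper name and the toy quality functions are illustrative, not from the paper) verifies that over a totally ordered solution space, every intermediate solution scores at least as well as the worse of its two endpoints:

```python
def is_quasi_concave(quality, solutions):
    """Check quasi-concavity of `quality` over an ordered list of solutions:
    for every pair f <= h, each g with f <= g <= h must satisfy
    quality(g) >= min(quality(f), quality(h))."""
    values = [quality(s) for s in solutions]
    for i in range(len(values)):
        for k in range(i + 2, len(values)):
            floor = min(values[i], values[k])
            # every intermediate value must be at least min of the endpoints
            if any(values[j] < floor for j in range(i + 1, k)):
                return False
    return True

# A unimodal quality function is quasi-concave...
assert is_quasi_concave(lambda x: -(x - 3) ** 2, list(range(7)))
# ...while one with two separated peaks is not.
assert not is_quasi_concave(lambda x: 1 if x in (0, 6) else 0, list(range(7)))
```

The two assertions at the end illustrate the boundary of the definition: a single peak qualifies, two separated peaks do not.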
Two technical tools are built around QCPPs:
- Choosing Mechanism – a variant of the exponential mechanism that works with “bounded‑growth” quality functions. For such functions, the number of distinct quality values grows only logarithmically with the size of the solution space, allowing the exponential mechanism to operate with a dramatically reduced database size.
- Recursive QCPP Solver – by partitioning the ordered solution space and applying the Choosing Mechanism on each sub‑interval, the algorithm obtains a high‑quality solution while consuming only a polylogarithmic number of samples.
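As a point of reference, the standard exponential mechanism that the Choosing Mechanism refines can be sketched in a few lines. This is a generic illustration, not the paper's construction; the function name and parameters are assumptions:

```python
import math
import random

def exponential_mechanism(candidates, quality, epsilon, sensitivity=1.0):
    """Sample a candidate with probability proportional to
    exp(epsilon * quality / (2 * sensitivity))."""
    candidates = list(candidates)
    scores = [epsilon * quality(c) / (2 * sensitivity) for c in candidates]
    shift = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - shift) for s in scores]
    r = random.uniform(0, sum(weights))
    acc = 0.0
    for c, w in zip(candidates, weights):
        acc += w
        if r <= acc:
            return c
    return candidates[-1]

# Toy usage: privately pick the point best supported by the data.
data = [5] * 100
best = exponential_mechanism(range(10), lambda c: data.count(c), epsilon=10.0)
```

Note that the exponential mechanism iterates over the full candidate set, which is why its sample complexity ordinarily grows with the log of the solution-space size; the Choosing Mechanism's restriction to bounded-growth quality functions is what lets the summary's constructions beat that dependence.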
Using these tools, the paper constructs concrete private learners and sanitizers:
- Point Functions (POINTₙ) – Under pure privacy, learning POINTₙ requires Ω(n) samples. The authors show that with (ε, δ)-privacy, O(log (1/δ)) samples suffice, essentially independent of the domain size. This yields an exponential separation when δ is set to 2^{‑o(n)}.
- Threshold Functions (THRESHₙ) – Pure privacy again demands Ω(n) samples. By formulating learning THRESHₙ as a QCPP and applying the recursive solver, the authors achieve a learner with sample complexity O(log* n)·poly(1/ε, log (1/δ)). This improves upon prior pure‑privacy bounds by a factor of roughly n.
- Axis‑Aligned Rectangles in d Dimensions – The paper extends the QCPP approach to high‑dimensional axis‑aligned rectangles, obtaining a private learner with sample complexity Õ(log d)·poly(1/ε, log (1/δ)), far below the Ω(d) requirement of pure privacy.
- Sanitizers for POINTₙ and THRESHₙ – Analogous constructions yield (ε, δ)-private sanitizers whose sample complexity grows only logarithmically in 1/δ, contrasting with the O(VC·log|X|) bound for pure‑privacy sanitizers.
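To illustrate how approximate privacy can sidestep the domain size for POINTₙ, here is a stability‑style sketch in the spirit of these results. It is not the paper's exact learner; the function name, noise calibration, and release threshold are illustrative assumptions. The key point is that only points actually present in the data receive noisy counts, and the release threshold scales with log(1/δ) rather than with |X| = 2ⁿ:

```python
import math
import random

def laplace(scale):
    # Laplace(scale) noise as the difference of two i.i.d. exponentials.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def private_heavy_point(samples, epsilon, delta):
    """Stability-style sketch (hypothetical, not the paper's algorithm):
    release the point labeled 1 by most samples, but only when its noisy
    count clears a bar that grows with log(1/delta) -- never with |X|."""
    counts = {}
    for x, label in samples:
        if label == 1:
            counts[x] = counts.get(x, 0) + 1
    if not counts:
        return None  # no positive examples: the all-zero hypothesis suffices
    # Laplace noise on each realized count (sensitivity 1 per neighbor)
    noisy = {x: c + laplace(1.0 / epsilon) for x, c in counts.items()}
    best = max(noisy, key=noisy.get)
    # delta absorbs the event that a point with a tiny true count is released
    bar = len(samples) / 2 + 2 * math.log(1 / delta) / epsilon
    return best if noisy[best] >= bar else None
```

Because the threshold never references the domain, the sample size needed for the dominant point to clear the bar depends on ε, δ, and the accuracy target alone, mirroring the O(log (1/δ)) bound stated above.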
Beyond these constructions, the authors study label‑privacy, a relaxed model where only the labels of the training examples must remain private. They prove that in this setting the VC dimension exactly characterizes the sample complexity: learning with label privacy requires Θ(VC/α) samples (α being the target error), matching the non‑private case up to constant factors.
The paper also establishes a reduction from private learning to sanitization, showing that any private learner for a class C can be turned into a sanitizer for C with only a modest increase in sample size. This connection is used to prove lower bounds: there exist explicit concept classes for which any pure‑privacy sanitizer must use Ω(VC·log|X|) samples, confirming that the gap between pure and approximate privacy is not merely an artifact of specific constructions.
Overall, the contributions are threefold:
- Theoretical Separation – Demonstrating that approximate differential privacy can achieve exponentially lower sample complexity than pure privacy for natural learning and sanitization tasks.
- Algorithmic Framework – Introducing QCPPs and the Choosing Mechanism as general tools for designing efficient private algorithms under approximate privacy.
- Characterization of Label‑Privacy – Showing that VC dimension fully determines the sample complexity when only labels need protection.
The paper concludes with open problems, notably extending the QCPP framework to more complex hypothesis classes (e.g., half‑spaces), and establishing lower bounds for approximate‑privacy learners, which remain largely unknown. The results suggest that approximate differential privacy is a powerful and practical alternative to pure privacy, offering substantial gains in data efficiency while preserving strong privacy guarantees.