Sequential Auditing for f-Differential Privacy
We present new auditors to assess the Differential Privacy (DP) of an algorithm based on output samples. Such empirical auditors are commonly used to check for algorithmic correctness and implementation bugs. Most existing auditors are batch-based, targeted toward the traditional notion of $(\varepsilon,\delta)$-DP, or both. In this work, we shift the focus to the highly expressive privacy concept of $f$-DP, in which the entire privacy behavior is captured by a single tradeoff curve. Our auditors detect violations across the full privacy spectrum with statistical significance guarantees, which are supported by theory and simulations. Most importantly, and in contrast to prior work, our auditors do not require a user-specified sample size as an input. Rather, they adaptively determine a near-optimal number of samples needed to reach a decision, thereby avoiding the excessively large sample sizes common in many auditing studies. This reduction in sampling cost is especially beneficial for expensive training procedures such as DP-SGD. Our method supports both whitebox and blackbox settings and can also be executed in single-run frameworks.
💡 Research Summary
The paper introduces a novel sequential auditing framework for assessing differential privacy (DP) guarantees, specifically targeting the expressive notion of f‑DP, in which privacy is characterized by a tradeoff function f. Traditional DP auditors focus on the classic (ε,δ)‑DP definition and rely on a fixed, user‑specified sample size, which often leads to excessive computational costs, especially for expensive training procedures like DP‑SGD. In contrast, the proposed method automatically determines the near‑optimal number of samples required to reach a statistically significant decision, eliminating the need for a priori sample‑size selection.
The core insight is to recast DP auditing as a binary classification problem between the output distributions of a mechanism on neighboring datasets. f‑DP characterizes the entire privacy trade‑off through a convex, non‑increasing function f(α) that maps false‑positive rates (α) to the minimal achievable false‑negative rates (β). By estimating the optimal classifier’s error pair (α̂,β̂) from observed outputs, one can directly compare the empirical trade‑off curve to the claimed f‑DP curve.
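The comparison described above can be illustrated with a small sketch. It assumes the claimed guarantee is μ‑Gaussian DP (μ‑GDP), whose tradeoff curve has the closed form f(α) = Φ(Φ⁻¹(1−α) − μ); the threshold classifier and all parameter values below are illustrative, not the paper's actual estimator:

```python
# Sketch: comparing an empirical error pair (alpha_hat, beta_hat) against a
# claimed f-DP tradeoff curve. Assumes the claim is mu-GDP; the threshold
# classifier is the optimal test for two unit-variance Gaussians.
import numpy as np
from scipy.stats import norm

def gdp_tradeoff(alpha, mu):
    """Tradeoff curve of mu-Gaussian DP: minimal beta at false-positive rate alpha."""
    return norm.cdf(norm.ppf(1 - alpha) - mu)

rng = np.random.default_rng(0)
mu_true = 2.0  # the mechanism actually behaves like the Gaussian mechanism with mu = 2

# Outputs of the mechanism on two neighboring datasets (hypotheses P vs. Q).
n = 20_000
p_samples = rng.normal(0.0, 1.0, n)       # dataset D
q_samples = rng.normal(mu_true, 1.0, n)   # dataset D'

# Optimal classifier for two unit-variance Gaussians: threshold at the midpoint.
t = mu_true / 2
alpha_hat = np.mean(p_samples > t)    # empirical false-positive rate
beta_hat = np.mean(q_samples <= t)    # empirical false-negative rate

# Suppose the mechanism *claims* the stronger guarantee mu = 1. The empirical
# beta_hat then falls well below the claimed curve, flagging a violation.
claimed_beta = gdp_tradeoff(alpha_hat, mu=1.0)
print(beta_hat < claimed_beta)
```

Dipping below the claimed curve means the classifier distinguishes the two datasets better than the guarantee permits, which is exactly the kind of violation the auditor is designed to detect.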
The sequential test proceeds in two stages. First, after each new observation, the auditor updates an optimal (or near‑optimal) classifier using a newly proposed regularization and hyper‑parameter tuning scheme that remains stable even with small batches. Second, the estimated error rates are fed into a cumulative log‑likelihood‑ratio statistic. This statistic is compared against pre‑computed upper and lower boundaries that correspond to a user‑chosen significance level γ (e.g., 0.05). If a boundary is crossed, the null hypothesis “the mechanism satisfies the claimed f‑DP” is rejected; otherwise, sampling continues. The authors prove (Theorem 3.1) that the false‑rejection probability never exceeds γ and that the expected sample size is within a logarithmic factor of the theoretical minimum n_min, which would be required if the effect size were known in advance.
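The boundary‑crossing logic can be sketched with a classical Wald sequential probability ratio test (SPRT). The paper's statistic and boundaries are more involved; here, purely for illustration, we test whether a classifier's per‑sample accuracy equals p0 (consistent with the claimed guarantee) or p1 > p0 (a violation), with all parameter values assumed:

```python
# Minimal Wald-style SPRT: accumulate a log-likelihood ratio and stop as soon
# as it crosses a pre-computed boundary derived from the significance level.
import math
import random

def sprt(stream, p0, p1, gamma=0.05):
    """Return (decision, samples used); stream yields 0/1 correctness bits."""
    upper = math.log((1 - gamma) / gamma)   # crossing -> reject H0 (violation)
    lower = math.log(gamma / (1 - gamma))   # crossing -> stop, no violation found
    llr, n = 0.0, 0
    for x in stream:
        n += 1
        # Log-likelihood ratio of observing x under H1 versus H0.
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "reject H0", n
        if llr <= lower:
            return "accept H0", n
    return "undecided", n

random.seed(0)
# Simulate a classifier that is correct 75% of the time while the claimed
# guarantee would only allow 60% accuracy: the test should reject quickly,
# typically after a few dozen samples rather than the full budget.
stream = (1 if random.random() < 0.75 else 0 for _ in range(10_000))
decision, n_used = sprt(stream, p0=0.60, p1=0.75)
print(decision, n_used)
```

The key property mirrored here is that the stopping time adapts to the effect size: a large gap between claimed and actual behavior triggers an early decision, while a mechanism close to its claim simply keeps accumulating samples.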
The framework supports both black‑box and white‑box settings. In black‑box mode, only input‑output pairs are needed to train the classifier; in white‑box mode, additional knowledge such as noise distribution parameters can be incorporated for tighter estimates. The method also works in a single‑run scenario, where the mechanism is executed once and the auditor reuses the same outputs, dramatically reducing overall cost.
Empirical evaluation includes synthetic experiments that validate the ability to recover the true f‑DP trade‑off curve, as well as real‑world tests on DP‑SGD trained image classifiers. Compared to prior fixed‑sample auditors, the sequential approach achieves the same statistical power while using only 10–25% of the samples. A detailed comparison with a contemporaneous work that applies sequential testing to (ε,δ)‑DP shows superior sample efficiency and broader applicability, as the new method does not require the high‑privacy regime (ε ≪ 1).
Finally, the authors release an open‑source implementation, providing a generic sequential wrapper that can be attached to any existing f‑DP auditor. This contribution lowers the barrier to rigorous DP verification, making privacy audits more practical for large‑scale machine‑learning pipelines.