Unifying Re-Identification, Attribute Inference, and Data Reconstruction Risks in Differential Privacy


Differentially private (DP) mechanisms are difficult to interpret and calibrate because existing methods for mapping standard privacy parameters to concrete privacy risks – re-identification, attribute inference, and data reconstruction – are both overly pessimistic and inconsistent. In this work, we use the hypothesis-testing interpretation of DP ($f$-DP), and determine that bounds on attack success can take the same unified form across re-identification, attribute inference, and data reconstruction risks. Our unified bounds are (1) consistent across a multitude of attack settings, and (2) tunable, enabling practitioners to evaluate risk with respect to arbitrary, including worst-case, levels of baseline risk. Empirically, our results are tighter than prior methods using $\varepsilon$-DP, Rényi DP, and concentrated DP. As a result, calibrating noise using our bounds can reduce the required noise by 20% at the same risk level, which yields, e.g., an accuracy increase from 52% to 70% in a text classification task. Overall, this unifying perspective provides a principled framework for interpreting and calibrating the degree of protection in DP against specific levels of re-identification, attribute inference, or data reconstruction risk.


💡 Research Summary

The paper tackles a long‑standing problem in differential privacy (DP): the difficulty of translating abstract privacy parameters (ε, δ, Rényi‑DP, etc.) into concrete, interpretable risks such as re‑identification, attribute inference, and data reconstruction. Existing conversion methods are either overly pessimistic or inconsistent across the three risk categories, leading practitioners to either add excessive noise or underestimate actual privacy guarantees.

The authors adopt the hypothesis‑testing view of DP, known as f‑DP, which characterizes a DP mechanism by a trade‑off function f(α) that lower‑bounds the false‑negative rate of any test with false‑positive rate α. This formulation subsumes (ε, δ)‑DP, Rényi‑DP, and concentrated‑DP, and is closed under post‑processing. Crucially, f‑DP directly links the mechanism’s output distribution to the optimal membership‑inference test, providing a tight, unified description of privacy loss.
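To make the trade-off function concrete, the μ-Gaussian-DP case has the well-known closed form G_μ(α) = Φ(Φ⁻¹(1 − α) − μ), where Φ is the standard normal CDF. The sketch below (plain Python, standard library only; illustrative, not the paper's code) evaluates it:

```python
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution

def gaussian_tradeoff(alpha: float, mu: float) -> float:
    """Trade-off function G_mu(alpha) for mu-Gaussian DP: a lower bound
    on the false-negative rate of any test with false-positive rate alpha."""
    return _N.cdf(_N.inv_cdf(1.0 - alpha) - mu)
```

At μ = 0 (perfect privacy) this reduces to f(α) = 1 − α, i.e., no test beats random guessing; larger μ pushes the curve down, reflecting weaker privacy.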

Operating under a strong‑adversary model—where the attacker knows the entire dataset except for a single target record and possesses a prior distribution over that record—the paper formalizes three operational risks:

  1. Strong Predicate Singling‑Out (SPSO), a re‑identification notion derived from predicate singling‑out, where the attacker must output a predicate that selects only the unknown record.
  2. Strong Attribute Inference (SAI), where the attacker predicts a specific attribute of the unknown record.
  3. Strong Reconstruction Robustness (SRR), a general reconstruction attack measured by a loss function and a threshold.

For each risk, the authors define a baseline success probability (the best an attacker can achieve without seeing the DP output) and the actual success probability after observing the DP output. The central theorem shows that for any f‑DP mechanism, the post‑output success is bounded by a single expression:

Success ≤ 1 − f(Baseline).

Thus the same bound simultaneously applies to SPSO, SAI, and SRR. This unification eliminates the need for separate, often loose, conversions for each risk type.
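Under the illustrative assumption of a μ-Gaussian-DP mechanism (not the paper's exact setting), the unified bound can be evaluated numerically for any baseline, showing how post-output success grows with the baseline and with μ:

```python
from statistics import NormalDist

_N = NormalDist()

def gaussian_tradeoff(alpha: float, mu: float) -> float:
    # G_mu(alpha) = Phi(Phi^{-1}(1 - alpha) - mu), the Gaussian DP trade-off curve
    return _N.cdf(_N.inv_cdf(1.0 - alpha) - mu)

def success_bound(baseline: float, mu: float) -> float:
    """Unified upper bound on attack success after observing the DP output:
    Success <= 1 - f(Baseline), applied here with f = G_mu."""
    return 1.0 - gaussian_tradeoff(baseline, mu)

for p in (0.5, 0.01, 1e-4):
    print(f"baseline={p:g}  bound={success_bound(p, mu=1.0):.6f}")
```

Because any valid trade-off function satisfies f(α) ≤ 1 − α, the bound always sits at or above the baseline, as expected: observing the output can only help the attacker.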

Empirically, the authors evaluate the bound on two fronts. First, they analyze the U.S. 2020 Census data released with ε = 10.6, δ = 10⁻¹⁰. Prior work using (ε, δ)‑DP predicts a worst‑case attribute‑inference advantage of > 99 percentage points (pp), Rényi‑DP yields 73 pp, while the authors’ f‑DP bound gives 52 pp. When the attacker’s prior is a rare disease with prevalence 1 in 10 000, the bound drops to < 0.001 pp, demonstrating that even relatively large ε can provide meaningful protection in realistic settings.
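The pessimism of (ε, δ)-based conversions at the Census parameters can be reproduced with a standard simplified conversion, success ≤ e^ε · baseline + δ (a textbook bound, not the paper's exact analysis): at ε = 10.6 it clips to 1 even for a 1-in-10 000 prior, i.e., it is vacuous exactly where the f-DP bound remains informative.

```python
import math

def eps_dp_success_bound(baseline: float, eps: float, delta: float) -> float:
    """Simplified (eps, delta)-DP conversion: any attack's success
    probability is at most e^eps * baseline + delta, clipped to 1."""
    return min(1.0, math.exp(eps) * baseline + delta)

# U.S. 2020 Census parameters quoted above, rare-disease prior 1/10000
print(eps_dp_success_bound(1e-4, eps=10.6, delta=1e-10))  # clips to 1.0 (vacuous)
```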

Second, they calibrate DP‑SGD noise for a GPT‑2 sentiment‑classification task. Matching a target risk (e.g., a 5 pp gap between post‑output and baseline success) requires roughly 20 % less Gaussian noise than when using ε‑DP or Rényi‑DP based calibrations. Consequently, test accuracy improves from 52 % to 70 %, an 18‑point gain, illustrating tangible utility benefits.
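The calibration step can be sketched as a bisection over the privacy parameter μ of a Gaussian trade-off function: find the largest μ (i.e., the least noise) whose risk gap stays below the target. This is illustrative only; mapping μ to a DP-SGD noise multiplier additionally requires a privacy accountant, which is omitted here.

```python
from statistics import NormalDist

_N = NormalDist()

def attack_success(baseline: float, mu: float) -> float:
    # Unified bound 1 - G_mu(baseline) for a mu-Gaussian-DP mechanism
    return 1.0 - _N.cdf(_N.inv_cdf(1.0 - baseline) - mu)

def calibrate_mu(baseline: float, target_gap: float,
                 lo: float = 1e-6, hi: float = 20.0, iters: int = 60) -> float:
    """Bisect for the largest mu whose risk gap (success - baseline)
    does not exceed target_gap; success is monotone increasing in mu."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if attack_success(baseline, mid) - baseline <= target_gap:
            lo = mid  # still within budget: try less noise
        else:
            hi = mid  # too risky: need more noise
    return lo

# e.g., worst-case baseline 0.5 and a 5 pp target gap, as in the summary
mu = calibrate_mu(baseline=0.5, target_gap=0.05)
```

For baseline 0.5 the gap condition reduces to Φ(μ) − 0.5 ≤ 0.05, so the bisection converges to μ = Φ⁻¹(0.55) ≈ 0.126.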

Beyond the primary results, the paper derives corollaries for generalization error and model memorization under f‑DP, showing that the same trade‑off function governs these learning‑theoretic quantities. All code is released as a Python package (https://github.com/Felipe-Gomez/riskcal), facilitating adoption.

In summary, by leveraging the f‑DP framework, the authors provide a single, tight, and tunable bound that unifies the assessment of re‑identification, attribute inference, and reconstruction risks. This enables practitioners to interpret DP guarantees in terms of concrete threats, reduce unnecessary noise, and achieve higher utility without compromising privacy—a significant step toward making differential privacy practical for real‑world data releases and machine‑learning pipelines.

