Achieving Dalenius Goal of Data Privacy with Practical Assumptions


Current differential privacy frameworks face significant challenges: vulnerability to correlated data attacks and suboptimal utility-privacy tradeoffs. To address these limitations, we establish a novel information-theoretic foundation for Dalenius’ privacy vision using Shannon’s perfect secrecy framework. By leveraging the fundamental distinction between cryptographic systems (small secret keys) and privacy mechanisms (massive datasets), we replace differential privacy’s restrictive independence assumption with practical partial knowledge constraints ($H(X) \geq b$). We propose an information privacy framework achieving Dalenius security with quantifiable utility-privacy tradeoffs. Crucially, we prove that foundational mechanisms – randomized response, exponential, and Gaussian mechanisms – satisfy Dalenius’ requirements while preserving group privacy and composition properties. Our channel capacity analysis reduces infinite-dimensional evaluations to finite convex optimizations, enabling direct application of information-theoretic tools. Empirical evaluation demonstrates that individual channel capacity (the maximal information leakage about each individual) decreases with increasing entropy constraint $b$, and our framework achieves superior utility-privacy tradeoffs compared to classical differential privacy mechanisms under equivalent privacy guarantees. The framework is extended to computationally bounded adversaries via Yao’s theory, unifying cryptographic and statistical privacy paradigms. Collectively, these contributions provide a theoretically grounded path toward practical, composable privacy – subject to future resolution of the tradeoff characterization – with enhanced resilience to correlation attacks.


💡 Research Summary

The paper revisits the original privacy desideratum articulated by Dalenius – that access to a statistical database should not increase an adversary’s knowledge about any individual – and builds a rigorous information‑theoretic framework that can be applied in realistic settings. The authors observe that differential privacy (DP) achieves strong guarantees only under an independence assumption (records are independent) which is rarely satisfied in practice, especially when data exhibit correlations. Moreover, DP’s utility‑privacy trade‑off is often sub‑optimal and does not exploit the “crowd‑blending” effect that larger datasets naturally provide.

To overcome these limitations, the authors propose an “Information Privacy” (IP) model that replaces the independence assumption with a partial‑knowledge constraint on the adversary: the entropy of the whole dataset must satisfy H(X) ≥ b for some positive constant b. This reflects the practical reality that an attacker cannot possess near‑perfect prior knowledge of a massive dataset. Under this constraint, Dalenius security is defined as a bound on the mutual information between any individual record X_i and the released output Y, i.e., I(X_i; Y) ≤ ε. The parameter ε plays the same role as the privacy budget in DP, with ε → 0 corresponding to perfect secrecy.
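As a toy illustration (not code from the paper), the Dalenius security condition I(X_i; Y) ≤ ε can be checked numerically for any discrete mechanism, given a prior over the record and the mechanism's channel matrix p(y|x). The binary example channel below is hypothetical:

```python
import numpy as np

def mutual_information(p_x, channel):
    """I(X; Y) in bits, given prior p_x over record values and
    channel[x, y] = p(y | x) for the release mechanism."""
    joint = p_x[:, None] * channel            # joint p(x, y)
    p_y = joint.sum(axis=0)                   # output marginal p(y)
    indep = p_x[:, None] * p_y[None, :]       # product p(x) p(y)
    mask = joint > 0
    return float((joint[mask] * np.log2(joint[mask] / indep[mask])).sum())

def satisfies_dalenius(p_x, channel, eps):
    """Check the Dalenius security bound I(X_i; Y) <= eps."""
    return mutual_information(p_x, channel) <= eps

# Hypothetical example: a uniform binary record released through a
# channel that flips the value with probability 0.4, so the leakage
# is 1 - H(0.4) ~ 0.029 bits.
p_x = np.array([0.5, 0.5])
channel = np.array([[0.6, 0.4],
                    [0.4, 0.6]])
```

ε → 0 then forces I(X_i; Y) → 0, i.e., the output becomes statistically independent of the record, matching the perfect-secrecy limit mentioned above.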

A central technical contribution is the reduction of the infinite‑dimensional problem of computing individual channel capacity (the maximal information leakage per individual) to a finite‑dimensional convex optimization problem. The authors define the individual channel capacity C_{b,1} = max_i max_{p∈Δ_b} I(X_i; Y), where Δ_b denotes the set of prior distributions with entropy at least b. By showing that the optimal prior can be restricted to a low‑dimensional family, they make the capacity computable for concrete mechanisms.
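The flavor of that reduction can be sketched in miniature: for a single binary record, the constrained capacity max_{H(p) ≥ b} I(X; Y) is a one-dimensional problem that a grid search approximates directly (a stand-in for the paper's convex program, not its actual algorithm). The Z-channel below is a hypothetical choice, picked because its capacity-achieving prior is non-uniform, so the entropy floor b genuinely binds:

```python
import numpy as np

def binary_entropy(p):
    q = np.clip(p, 1e-12, 1 - 1e-12)
    return float(-(q * np.log2(q) + (1 - q) * np.log2(1 - q)))

def mutual_information(p_x, channel):
    joint = p_x[:, None] * channel
    p_y = joint.sum(axis=0)
    indep = p_x[:, None] * p_y[None, :]
    mask = joint > 0
    return float((joint[mask] * np.log2(joint[mask] / indep[mask])).sum())

def capacity_with_entropy_floor(channel, b, grid=10_001):
    """Approximate max over binary priors with H(p) >= b of I(X; Y)
    by a fine grid search over p(X = 1)."""
    best = 0.0
    for p1 in np.linspace(0.0, 1.0, grid):
        if binary_entropy(p1) < b:
            continue  # prior too concentrated: adversary knows too much
        best = max(best, mutual_information(np.array([1 - p1, p1]), channel))
    return best

# Z-channel: x = 0 is reported faithfully, x = 1 flips with prob. 0.5.
z_channel = np.array([[1.0, 0.0],
                      [0.5, 0.5]])
```

Raising b shrinks the feasible set of priors, so the computed capacity can only decrease, mirroring the monotonicity the paper reports for C_{b,1}.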

The paper then proves that three canonical DP mechanisms—Randomized Response, the Exponential Mechanism, and the Gaussian Mechanism—satisfy the Dalenius security condition under the entropy constraint. For each mechanism, the authors derive explicit relationships between the mechanism’s parameters (flip probability, utility‑sensitivity ratio, noise variance) and the resulting mutual information bound, demonstrating that group privacy and composition properties hold without requiring independence.
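For the simplest of the three, such a parameter-to-leakage relationship is easy to exhibit concretely. Randomized response over a binary attribute is a binary symmetric channel, so under a uniform prior its leakage is the closed-form 1 − H(flip_prob) bits. This sketch is illustrative and does not reproduce the paper's derivations for the other mechanisms:

```python
import numpy as np

def binary_entropy(p):
    q = np.clip(p, 1e-12, 1 - 1e-12)
    return float(-(q * np.log2(q) + (1 - q) * np.log2(1 - q)))

def randomized_response(bit, flip_prob, rng):
    """Report the true bit w.p. 1 - flip_prob, the opposite bit otherwise."""
    return bit ^ int(rng.random() < flip_prob)

def rr_leakage(flip_prob):
    """I(X; Y) in bits for randomized response under a uniform binary
    prior: the capacity of a binary symmetric channel, 1 - H(flip_prob)."""
    return 1.0 - binary_entropy(flip_prob)
```

At flip_prob = 0.5 the output is pure noise and the leakage is zero; as the flip probability moves away from 0.5, leakage grows and utility improves, making the utility-privacy dial explicit in a single parameter.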

Beyond the information‑theoretic setting, the authors extend the framework to computationally bounded adversaries by invoking Yao’s definition of semantic security. This yields a unified model that simultaneously captures Shannon‑style perfect secrecy, Dalenius‑style statistical privacy, and cryptographic computational security. Consequently, the IP framework can be viewed as a bridge between classical cryptography and modern statistical privacy.

Empirical evaluation on synthetic and correlated datasets confirms the theoretical predictions. As the entropy lower bound b increases, the measured individual channel capacity decreases, indicating stronger privacy. When compared against standard DP mechanisms calibrated to the same ε, the IP mechanisms achieve higher utility (e.g., lower mean‑squared error for numeric queries) while remaining robust to correlation attacks that typically break DP.
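How such a "measured" leakage can be obtained is worth making concrete. The toy experiment below (my own illustration, not the paper's evaluation or datasets) draws samples from randomized response and recovers its analytic leakage with a plug-in mutual-information estimator:

```python
import numpy as np

def plugin_mi(x, y):
    """Plug-in estimate of I(X; Y) in bits from paired binary samples."""
    n = len(x)
    joint = np.bincount(2 * x + y, minlength=4).reshape(2, 2) / n
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float((joint[mask] * np.log2(joint[mask] / (px * py)[mask])).sum())

# Toy data: a uniform binary attribute released via randomized
# response with flip probability 0.25.
rng = np.random.default_rng(0)
flip = 0.25
x = rng.integers(0, 2, size=200_000)
y = x ^ (rng.random(200_000) < flip).astype(int)
est = plugin_mi(x, y)   # analytic leakage is 1 - H(0.25) ~ 0.189 bits
```

The same estimator applies unchanged to correlated inputs, which is what makes an empirical comparison against DP mechanisms under correlation attacks possible in the first place.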

In summary, the paper makes five major contributions: (1) it formalizes a practical partial‑knowledge adversary model that replaces DP’s independence assumption; (2) it introduces a tractable method for evaluating individual channel capacity via convex optimization; (3) it establishes Dalenius security for foundational DP mechanisms, preserving group privacy and composability; (4) it unifies information‑theoretic and computational privacy through Yao’s theory; and (5) it provides experimental evidence of superior utility‑privacy trade‑offs and resilience to correlated attacks. The work opens a path toward privacy mechanisms that achieve Dalenius’s original vision in real‑world settings, while offering clear directions for future research on the exact form of the utility‑privacy balance function g(b) and extensions to more complex data domains.

