RPWithPrior: Label Differential Privacy in Regression
With the wide application of machine learning techniques in practice, privacy preservation has gained increasing attention. Protecting user privacy with minimal accuracy loss is a fundamental task in the data analysis and mining community. In this paper, we focus on regression tasks under $ε$-label differential privacy guarantees. Some existing methods for regression with $ε$-label differential privacy, such as the RR-On-Bins mechanism, discretize the output space into finitely many bins and then apply the randomized response (RR) algorithm. To determine these bins efficiently, the authors round the original responses down to integer values. However, such operations do not align well with real-world scenarios. To overcome these limitations, we model both original and randomized responses as continuous random variables, avoiding discretization entirely. Our novel approach estimates an optimal interval for randomized responses and introduces new algorithms designed for scenarios where a prior is either known or unknown. Additionally, we prove that our algorithm, RPWithPrior, guarantees $ε$-label differential privacy. Numerical results demonstrate that our approach outperforms the Gaussian, Laplace, Staircase, RR-On-Bins, and Unbiased mechanisms on the Communities and Crime, Criteo Sponsored Search Conversion Log, and California Housing datasets.
💡 Research Summary
The paper addresses the problem of providing ε‑label differential privacy (DP) for regression tasks, where the label (response) is a continuous variable. Existing approaches such as RR‑on‑Bins and its unbiased variant first discretize the continuous label space into a finite set of bins, often by rounding responses to integers, and then apply a randomized response (RR) mechanism. This discretization introduces storage and computational overhead, and more importantly, it misaligns with the natural continuity of many real‑world regression targets.
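To make the discretize-then-randomize baseline concrete, the sketch below applies plain k-ary randomized response to a label after snapping it to the nearest of a fixed set of bin values. It is only an illustration under stated assumptions: the function name `rr_on_bins`, the nearest-bin rule, and the hand-picked bins are hypothetical, and the actual RR-on-Bins mechanism selects its bins by an optimization that is not reproduced here.

```python
import math
import random


def rr_on_bins(y, bins, epsilon, rng=random):
    """Illustrative k-ary randomized response over fixed bin values.

    Assumption: `bins` is a hand-picked list of representative values;
    this is not the optimized bin selection of the real RR-on-Bins.
    """
    k = len(bins)
    # Discretization step: snap the continuous label to the nearest bin.
    true_idx = min(range(k), key=lambda i: abs(bins[i] - y))
    # Keep the true bin with probability e^eps / (e^eps + k - 1).
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_keep:
        return bins[true_idx]
    # Otherwise report one of the other k - 1 bins uniformly at random,
    # so each of them is chosen with probability 1 / (e^eps + k - 1).
    others = [b for i, b in enumerate(bins) if i != true_idx]
    return rng.choice(others)


if __name__ == "__main__":
    rng = random.Random(0)
    bins = [0.0, 2.0, 4.0]
    print([rr_on_bins(2.4, bins, epsilon=1.0, rng=rng) for _ in range(10)])
```

For any two input labels, the probability of reporting a given bin differs by a factor of at most $e^{ε}$, which is what makes this kind of mechanism $ε$-label DP. The released value is always one of the bin values, never the original continuous label, which is the restriction the continuous formulation is meant to avoid.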
To overcome these drawbacks, the authors propose a fundamentally different framework that treats both the original responses $Y$ and the privatized responses $\tilde{Y}$ as continuous random variables. The core idea is to identify an optimal interval (I=