Learning with Locally Private Examples by Inverse Weierstrass Private Stochastic Gradient Descent

Reading time: 5 minutes
...

📝 Original Info

  • Title: Learning with Locally Private Examples by Inverse Weierstrass Private Stochastic Gradient Descent
  • ArXiv ID: 2602.16436
  • Date: 2026-02-18
  • Authors: Not provided in the source; see the original paper for the author list.

📝 Abstract

Releasing data once and for all under noninteractive Local Differential Privacy (LDP) enables complete data reusability, but the resulting noise may create bias in subsequent analyses. In this work, we leverage the Weierstrass transform to characterize this bias in binary classification. We prove that inverting this transform leads to a bias-correction method to compute unbiased estimates of nonlinear functions on examples released under LDP. We then build a novel stochastic gradient descent algorithm called Inverse Weierstrass Private SGD (IWP-SGD). It converges to the true population risk minimizer at a rate of $\mathcal{O}(1/n)$, with $n$ the number of examples. We empirically validate IWP-SGD on binary classification tasks using synthetic and real-world datasets.

📄 Full Content

Machine Learning (ML) models are increasingly deployed in domains involving sensitive data, such as healthcare, speech recognition, prediction, and forecasting. These models are vulnerable to inference attacks that allow adversaries to extract information about individual training examples (Hu et al., 2022). This has motivated the use of Differential Privacy (DP) (Dwork & Roth, 2014) as a rigorous standard to assess privacy in ML. To achieve meaningful guarantees, DP typically requires data to be centralized by a trusted curator, in charge of enforcing privacy. Unfortunately, this raises several risks: the trusted authority may fall victim to attacks that lead to major data breaches (Primoff & Kess, 2017; Lu, 2019), and data may be misappropriated by untrustworthy third parties that do not prioritize privacy.

Local Differential Privacy (LDP) (Kasiviswanathan et al., 2008; Duchi et al., 2018) addresses this challenge by requiring each data holder to privatize their data locally before release, effectively ensuring privacy without relying on a trusted curator. While this provides strong privacy guarantees, applying it in ML requires adapting the downstream learning process. Existing methods can be categorized into interactive and noninteractive approaches. In interactive methods, the learner adaptively queries data holders over multiple rounds, incurring a communication cost (Smith et al., 2017). In contrast, noninteractive methods require each user to release one or several privatized versions of their data in a single shot, eliminating the need for further interaction during learning (Zheng et al., 2017).

In practice, designing LDP mechanisms involves two considerations: whether downstream learning tasks are known, and whether data release can be adapted during learning. In some scenarios, the downstream learning problem is known, and task-specific algorithms can be designed to correct for LDP noise; however, this limits the potential for the data to be reused for other purposes. In contrast, many real-world scenarios involve unknown downstream tasks or require that data remain reusable in the long run. This motivates the use of task-agnostic, noninteractive LDP methods.

In task-agnostic LDP, each data holder publishes a one-time privatized representation of their data, without prior knowledge of the downstream learning task. Such a mechanism allows institutions or users to safely publish privatized datasets that remain usable for future analyses, for example, hospitals sharing medical records for research purposes. Yet, despite its generality, noninteractive and task-agnostic LDP raises a significant challenge. Indeed, learning from noisy (private) data may bias the process, as previously identified in supervised learning with noisy features (Bishop, 1995) and labels (van Rooyen & Williamson, 2018). Naively applying standard ML frameworks to privatized data may yield suboptimal models: new algorithms tailored for noninteractive and task-agnostic LDP are thus needed.

Contributions. In this paper, we develop a principled view of learning under noninteractive and task-agnostic LDP, and design new algorithms for locally private ML. We show that standard LDP mechanisms can be viewed, in expectation, as functional transforms: the Gaussian mechanism corresponds to the Weierstrass transform, while Randomized Response induces what we call the Bernoulli transform.

This perspective allows us to fully characterize the bias induced by LDP on data-dependent computations.
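To make the transform view concrete, here is a minimal sketch of the correspondence, using our own notation (the symbols $W_\sigma$, $f$, $g$, and $q$ are not taken from the paper). The (generalized) Weierstrass transform of a function $f$ at noise scale $\sigma$ is

$$(W_\sigma f)(x) = \mathbb{E}_{Z \sim \mathcal{N}(0, \sigma^2 I)}\big[f(x + Z)\big],$$

so if an example is released as $\tilde{x} = x + Z$ under the Gaussian mechanism, any downstream computation satisfies $\mathbb{E}[f(\tilde{x})] = (W_\sigma f)(x)$: in expectation one obtains the transform of $f$, not $f(x)$ itself. Similarly, if Randomized Response keeps a binary label $y \in \{-1, +1\}$ with probability $q$ and flips it otherwise, then $\mathbb{E}[g(\tilde{y})] = q\, g(y) + (1 - q)\, g(-y)$, the Bernoulli-transform analogue. In both cases, the gap between the transform and the original function is exactly the bias that inverting the transform removes.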

Crucially, inverting these transforms allows the design of algorithms that provably mitigate privacy-induced bias, yielding unbiased estimators for the underlying data-dependent quantities. In learning contexts, we leverage the inverse of these transforms to construct unbiased gradient estimators for loss functions. Applying this principle to first-order optimization, we introduce Inverse Weierstrass Private Stochastic Gradient Descent (IWP-SGD). We show that IWP-SGD asymptotically recovers, in expectation over the noise, the population risk minimizer of the original, non-private problem, as the number of samples $n$ grows to infinity. Interestingly, the convergence rate of IWP-SGD scales as $\mathcal{O}(1/n)$, similarly to classic interactive LDP approaches (Smith et al., 2017). Finally, we empirically validate IWP-SGD on binary classification tasks using synthetic and real-world datasets. To the best of our knowledge, this is the first method that asymptotically recovers the non-private population risk minimizer in a fully task-agnostic LDP setting using a single privatized release per data point.
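The excerpt does not spell out the IWP-SGD update itself, so the snippet below is only a minimal sketch, under our own assumptions, of the bias-correction principle it builds on: a squared loss for binary classification, with features privatized by the Gaussian mechanism and labels by Randomized Response, a case where both inverse transforms have closed form (subtract $\sigma^2 w$ from the quadratic feature term; rescale the label term by $1/(2q-1)$). Function names and the training setup are illustrative, not the paper's.

```python
import numpy as np

def privatize_features(x, sigma, rng):
    """Gaussian mechanism: one-shot noisy release of a feature vector."""
    return x + rng.normal(scale=sigma, size=x.shape)

def privatize_label(y, eps, rng):
    """Randomized Response on a binary label y in {-1, +1}."""
    q = np.exp(eps) / (1.0 + np.exp(eps))  # probability of keeping the true label
    return y if rng.random() < q else -y

def corrected_grad(w, x_priv, y_priv, sigma, eps):
    """Unbiased estimate of the non-private squared-loss gradient (x x^T) w - y x.

    Feature side: E[x_priv * (x_priv . w)] = (x x^T + sigma^2 I) w, so subtracting
    sigma^2 * w inverts the Gaussian (Weierstrass) bias of the quadratic term.
    Label side:   E[y_priv] = (2q - 1) * y, so dividing by (2q - 1) inverts the
    Randomized Response ("Bernoulli transform") bias.
    """
    q = np.exp(eps) / (1.0 + np.exp(eps))
    feature_term = x_priv * (x_priv @ w) - sigma**2 * w
    label_term = (y_priv / (2.0 * q - 1.0)) * x_priv
    return feature_term - label_term

# Tiny synthetic illustration: each example is privatized once, then reused freely.
rng = np.random.default_rng(0)
d, n, sigma, eps, lr = 5, 50_000, 1.0, 1.0, 0.005
w_true = rng.normal(size=d)
w = np.zeros(d)
for _ in range(n):
    x = rng.normal(size=d)
    y = 1.0 if x @ w_true > 0 else -1.0
    x_priv = privatize_features(x, sigma, rng)  # released before learning starts
    y_priv = privatize_label(y, eps, rng)
    w -= lr * corrected_grad(w, x_priv, y_priv, sigma, eps)

# The least-squares population minimizer here is proportional to w_true,
# so we compare directions only.
err = np.linalg.norm(w / np.linalg.norm(w) - w_true / np.linalg.norm(w_true))
print("direction error:", err)
```

Because every example is privatized once before learning and never queried again, this loop matches the noninteractive, task-agnostic setting described above; the per-step gradient is unbiased for the non-private gradient, which is the kind of property the paper's convergence guarantee rests on.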

Our contributions can be summarized as follows:

• We formalize the processing of data released under the Gaussian and Randomized Response mechanisms as transform operators of the intended computations, enabling a unified analysis of their induced bias (Section 3). This view allows us to fully characterize the bias induced by the Gaussian and Randomized Response in sta

Reference

This content is AI-processed based on open access ArXiv data.
