Generalized Leverage Score for Scalable Assessment of Privacy Vulnerability

Reading time: 5 minutes

📝 Original Info

  • Title: Generalized Leverage Score for Scalable Assessment of Privacy Vulnerability
  • ArXiv ID: 2602.15919
  • Date: 2026-02-17
  • Authors: (not listed in the source metadata)

📝 Abstract

Can the privacy vulnerability of individual data points be assessed without retraining models or explicitly simulating attacks? We answer affirmatively by showing that exposure to membership inference attack (MIA) is fundamentally governed by a data point's influence on the learned model. We formalize this in the linear setting by establishing a theoretical correspondence between individual MIA risk and the leverage score, identifying it as a principled metric for vulnerability. This characterization explains how data-dependent sensitivity translates into exposure, without the computational burden of training shadow models. Building on this, we propose a computationally efficient generalization of the leverage score for deep learning. Empirical evaluations confirm a strong correlation between the proposed score and MIA success, validating this metric as a practical surrogate for individual privacy risk assessment.

💡 Deep Analysis

📄 Full Content

Modern machine learning models, and deep neural networks in particular, are known to memorize aspects of their training data (Zhang et al., 2017; Carlini et al., 2019). This memorization induces privacy vulnerabilities that can be exploited by Membership Inference Attacks (MIAs), which aim to determine whether a specific data point was included in the training set (Shokri et al., 2017; Carlini et al., 2022). A principled defense against membership inference is provided by Differential Privacy (DP) (Dwork, 2006), implemented in deep learning through noise-injected stochastic gradient methods (Abadi et al., 2016). However, controlling the trade-off between privacy protection and model utility remains challenging. Noise calibration typically relies on worst-case theoretical accounting, paired with empirical privacy auditing via MIAs, and often leads to either over-conservative noise levels or insufficient privacy protection. In this context, designing MIAs for empirical auditing is essential to quantify leakage in non-private models (Yeom et al., 2018) or to validate the practical tightness of DP guarantees in private models (Nasr et al., 2021; Jagielski et al., 2020).

While such auditing is now standard practice (Carlini et al., 2022; Nasr et al., 2021; Zarifzadeh et al., 2024), relying on aggregate metrics like average accuracy or AUC is insufficient. Such global measures can obscure critical risk heterogeneity, as outliers and rare subgroups are significantly more prone to memorization than typical samples (Carlini et al., 2022; Feldman and Zhang, 2020). Consequently, a model certified as private on average may still expose specific points to high privacy risks.

To address this heterogeneity, recent work increasingly focuses on individual privacy risk assessment, aiming to quantify membership leakage at the level of each individual data point rather than in aggregate. State-of-the-art methods for per-sample auditing (Carlini et al., 2022; Zarifzadeh et al., 2024) predominantly rely on shadow models. These techniques train multiple reference models on random data splits to characterize each data point’s influence on the model’s behavior. This enables identification of which points exhibit increased sensitivity to their presence in the training dataset. However, these techniques are computationally prohibitive, especially for large-scale models, as they require retraining the model multiple times. This leads to our central question stated in the abstract: can the privacy vulnerability of individual data points be assessed without retraining models or explicitly simulating attacks?
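
To make the retraining burden concrete, here is a minimal, self-contained sketch of shadow-model-style per-sample auditing. It uses a scikit-learn logistic regression on synthetic data as a stand-in for the real model; the dataset, the number of shadow models, and the loss-gap summary are illustrative assumptions, not the paper's setup.

```python
# Hedged sketch of shadow-model-based per-sample auditing (LiRA-style idea):
# train many reference models on random splits and compare the target point's
# loss when it is a member vs. a non-member. Everything here is illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))
y = (X[:, 0] + 0.5 * rng.normal(size=2000) > 0).astype(int)

n_shadow = 16
target_idx = 7                      # the data point whose risk we want to estimate
losses_in, losses_out = [], []

for _ in range(n_shadow):
    # Each shadow model is trained on a random half of the data.
    member = rng.random(len(X)) < 0.5
    clf = LogisticRegression(max_iter=1000).fit(X[member], y[member])
    p = clf.predict_proba(X[target_idx:target_idx + 1])[0, y[target_idx]]
    loss = -np.log(p + 1e-12)
    (losses_in if member[target_idx] else losses_out).append(loss)

# Per-sample vulnerability proxy: separation of the "in" and "out" loss distributions.
gap = np.mean(losses_out) - np.mean(losses_in)
print(f"mean loss gap (out - in) for sample {target_idx}: {gap:.3f}")
```

Even in this toy setting, auditing a single point requires fitting n_shadow models; avoiding exactly this retraining loop is the motivation for the leverage-based score introduced next.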

In classical statistics, the leverage score quantifies a data point’s geometric influence on a model, independent of its label. We establish that in Gaussian linear models, this score precisely characterizes membership inference vulnerability. We prove that under an optimal black-box attack, the privacy loss distribution is controlled by a single scalar, the leverage score. In other words, membership inference vulnerability is fundamentally about self-influence: samples that are geometrically positioned to have a disproportionate effect on the model’s learned parameters are the most at risk of privacy leakage. Consequently, while specific outcomes fluctuate due to noise, the individual average privacy risk in the linear regime is determined by data geometry.
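
For reference, the classical leverage score invoked here is the i-th diagonal entry of the hat matrix; this is the standard textbook definition rather than a formula quoted from the paper:

```latex
h_i \;=\; x_i^{\top}\bigl(X^{\top}X\bigr)^{-1}x_i
     \;=\; \Bigl[\,X\bigl(X^{\top}X\bigr)^{-1}X^{\top}\Bigr]_{ii},
\qquad 0 \le h_i \le 1, \qquad \sum_{i=1}^{n} h_i = d,
```

where X is the n-by-d design matrix. In ordinary least squares, h_i equals the derivative of the fitted value at x_i with respect to its own response y_i, which is exactly the self-influence notion appealed to above.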

To extend this analysis to deep neural networks, we introduce the Generalized Leverage Score (GLS). Derived via implicit differentiation of the training optimality conditions, the GLS measures the infinitesimal sensitivity of a model’s prediction to its own label, generalizing the leverage score to both regression and classification settings. While the exact computation is expensive for deep networks, we demonstrate that a last-layer approximation remains highly effective in practice. This allows us to compute a scalable, theoretically principled proxy for privacy risk that correlates with the success of state-of-the-art attacks, without the need for retraining or shadow models.
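
The excerpt does not give the exact GLS formula, so the following is only a hedged sketch of what a last-layer approximation could look like: treat the network's penultimate-layer features as a fixed design matrix and compute ridge-regularized leverage scores on them. The feature matrix, the function name last_layer_leverage, and the regularizer lam are all illustrative assumptions; the paper's estimator may differ.

```python
# Hedged sketch of a last-layer leverage-style score, assuming the GLS is
# approximated by ridge leverage scores of penultimate-layer features.
import numpy as np

def last_layer_leverage(features: np.ndarray, lam: float = 1e-3) -> np.ndarray:
    """Ridge leverage score h_i = phi_i^T (Phi^T Phi + lam I)^{-1} phi_i."""
    n, d = features.shape
    gram_inv = np.linalg.inv(features.T @ features + lam * np.eye(d))
    # einsum returns the diagonal of Phi @ gram_inv @ Phi^T without forming the n x n matrix.
    return np.einsum("ij,jk,ik->i", features, gram_inv, features)

# Example with random "penultimate-layer" features standing in for a real network.
phi = np.random.default_rng(1).normal(size=(500, 64))
scores = last_layer_leverage(phi)
print("5 highest-scoring (proxy most-at-risk) samples:", np.argsort(scores)[-5:])
```

Keeping the computation to one d-by-d Gram inverse plus a per-sample quadratic form is what makes a last-layer score of this kind scale to large datasets without retraining.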

Contributions. (i) We prove that for Gaussian linear models under black-box access, the leverage score is the sufficient statistic characterizing both the privacy loss distribution and the optimal membership inference test. (ii) We extend this leverage score to general differentiable models via the Generalized Leverage Score (GLS), deriving a principled and scalable estimator of privacy vulnerability. (iii) Through multiple experiments, we show that this metric serves as a surrogate for individual privacy risk: it identifies vulnerable samples and correlates strongly with state-of-the-art shadow-model attacks, at a reduced computational cost.

Membership Inference Attacks (MIA). Membership inference aims to determine whether a specific sample was used to train a model. Early approaches relied on simple metric-based classifiers, exploiting overfitting signals such as prediction confidence, entropy, and per-sample loss (Yeom et al., 2018).
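
As a point of comparison for these early metric-based attacks, here is a minimal sketch of a loss-threshold attack in the spirit of Yeom et al. (2018); the threshold tau and the toy predictions are illustrative assumptions.

```python
# Hedged sketch of a simple metric-based MIA: flag a sample as a training-set
# member when the model's cross-entropy loss on it falls below a threshold.
import numpy as np

def loss_threshold_attack(probs: np.ndarray, labels: np.ndarray, tau: float) -> np.ndarray:
    """Predict 'member' when the model's loss on the sample is below tau."""
    per_sample_loss = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    return per_sample_loss < tau

# Toy example: confident (low-loss) predictions are flagged as likely members.
probs = np.array([[0.95, 0.05],   # confident, correct -> low loss
                  [0.55, 0.45]])  # uncertain          -> higher loss
labels = np.array([0, 0])
print(loss_threshold_attack(probs, labels, tau=0.3))  # [ True False ]
```

Attacks of this kind need only model outputs but apply one global threshold to every sample, which is precisely the per-sample risk heterogeneity that shadow-model audits and leverage-based scores aim to capture.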

Reference

This content is AI-processed based on open access ArXiv data.
