Generalized Leverage Score for Scalable Assessment of Privacy Vulnerability
Can the privacy vulnerability of individual data points be assessed without retraining models or explicitly simulating attacks? We answer affirmatively by showing that exposure to membership inference attack (MIA) is fundamentally governed by a data point’s influence on the learned model. We formalize this in the linear setting by establishing a theoretical correspondence between individual MIA risk and the leverage score, identifying it as a principled metric for vulnerability. This characterization explains how data-dependent sensitivity translates into exposure, without the computational burden of training shadow models. Building on this, we propose a computationally efficient generalization of the leverage score for deep learning. Empirical evaluations confirm a strong correlation between the proposed score and MIA success, validating this metric as a practical surrogate for individual privacy risk assessment.
💡 Research Summary
The paper tackles a fundamental question in machine learning privacy: can we assess the vulnerability of individual training points to membership inference attacks (MIAs) without the heavy computational burden of retraining models or building shadow models? The authors answer affirmatively by establishing a direct theoretical link between a data point’s influence on a learned model and its exposure to MIAs.
In the linear setting, they prove that the classical leverage score—a measure of how much a particular observation “sticks out” in the design matrix—exactly quantifies the risk of successful membership inference. The proof proceeds by first expressing the change in model parameters caused by removing a single point using a second‑order Taylor expansion of the loss. This change is shown to be proportional to the product of the point’s gradient and the inverse Hessian. Next, the authors approximate the MIA success probability by the log‑likelihood ratio between the model trained with and without the point. Substituting the parameter change yields a term that is precisely the leverage score, establishing a one‑to‑one correspondence. Consequently, the leverage score emerges as a principled, model‑agnostic metric for individual privacy risk.
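The linear-setting correspondence is easy to see concretely. Below is a minimal NumPy sketch (not from the paper) that computes the classical leverage scores as the diagonal of the hat matrix and checks the standard leave-one-out identity the proof relies on: removing point i shifts the OLS solution by (XᵀX)⁻¹xᵢeᵢ/(1−hᵢᵢ), where eᵢ is the residual and hᵢᵢ the leverage.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Classical leverage: diagonal of the hat matrix H = X (X^T X)^{-1} X^T
XtX_inv = np.linalg.inv(X.T @ X)
leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)

# Leave-one-out check: removing point i shifts the OLS parameters by
# delta_i = (X^T X)^{-1} x_i e_i / (1 - h_ii), with e_i the full-fit residual.
beta = np.linalg.lstsq(X, y, rcond=None)[0]
i = 0
resid = y[i] - X[i] @ beta
delta = XtX_inv @ X[i] * resid / (1 - leverage[i])

mask = np.arange(n) != i
beta_loo = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
assert np.allclose(beta - delta, beta_loo, atol=1e-8)
```

High-leverage points (hᵢᵢ near 1) thus cause the largest parameter change when removed, which is exactly the signal a membership inference attack exploits.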
Building on this insight, the authors design a “Generalized Leverage Score” (GLS) that extends the concept to deep neural networks. The key innovations are: (1) approximating the Hessian of the loss with the Fisher information matrix of the final layer, which captures the sensitivity of the output logits to parameter changes; (2) avoiding full matrix inversion by employing a K‑nearest‑neighbors sparsity approximation, dramatically reducing computational cost; and (3) aggregating layer‑wise leverage contributions across the network, with appropriate normalization to handle differing scales. The resulting GLS is a single scalar score per training point.
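The paper's exact construction is not reproduced here, but the core of innovation (1) can be sketched as follows: replace the Hessian with a damped empirical Fisher matrix built from per-example final-layer gradients, and score each point by the resulting quadratic form gᵢᵀF⁻¹gᵢ. The function name, the damping term, and the omission of the KNN sparsification and layer-wise aggregation are all simplifications for illustration.

```python
import numpy as np

def generalized_leverage(grads, damping=1e-3):
    """Leverage-style score from per-example final-layer gradients.

    grads: (n, p) array, one flattened gradient per training point.
    Uses the damped empirical Fisher F = (1/n) sum_i g_i g_i^T + damping*I
    in place of the Hessian. Sketch only: the paper additionally applies a
    K-nearest-neighbors sparsity approximation and aggregates normalized
    layer-wise contributions, both omitted here.
    """
    n, p = grads.shape
    fisher = grads.T @ grads / n + damping * np.eye(p)
    sol = np.linalg.solve(fisher, grads.T)    # F^{-1} g_i, column per point
    return np.einsum("ip,pi->i", grads, sol)  # g_i^T F^{-1} g_i

rng = np.random.default_rng(1)
grads = rng.normal(size=(100, 8))  # stand-in for real per-example gradients
scores = generalized_leverage(grads)  # one vulnerability scalar per point
```

Because the Fisher matrix here is positive definite, each score is a positive scalar, matching the role of hᵢᵢ in the linear case: larger values indicate points whose removal would perturb the final layer more, and hence higher MIA exposure.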