Mitigating Sensitive Information Leakage in LLMs4Code through Machine Unlearning

Notice: This research summary and analysis were generated automatically using AI technology. For authoritative details, please refer to the original ArXiv source.

Large Language Models for Code (LLMs4Code) have achieved strong performance in code generation, but recent studies reveal that they may memorize and leak sensitive information contained in their training data, posing serious privacy risks. To address this risk, this work presents the first comprehensive empirical study on applying machine unlearning to mitigate sensitive information leakage in LLMs4Code. We first construct a dedicated benchmark that includes: (i) a synthetic forget set containing diverse forms of personal information, and (ii) a retain set designed to evaluate whether code-generation capability is preserved after unlearning. Using this benchmark, we systematically assess three representative unlearning algorithms (GA, GA+GD, GA+KL) across three widely used open-source LLMs4Code models (AIXCoder-7B, CodeLlama-7B, CodeQwen-7B). Experimental results demonstrate that machine unlearning can substantially reduce direct memorization-based leakage: on average, the direct leak rate drops by more than 50% while over 91% of the original code-generation performance is retained. Moreover, by analyzing post-unlearning outputs, we uncover a consistent shift from direct to indirect leakage, revealing an underexplored vulnerability that persists even when the target data has been successfully forgotten. Our findings show that machine unlearning is a feasible and effective solution for enhancing privacy protection in LLMs4Code, while also highlighting the need for future techniques capable of mitigating both direct and indirect leakage simultaneously.


💡 Research Summary

Large language models for code (LLMs4Code) have become indispensable tools for software development, yet their ability to memorize and reproduce training data raises serious privacy concerns. This paper delivers the first comprehensive empirical investigation of machine unlearning (MU) as a means to make LLMs4Code forget sensitive information while preserving their code‑generation capabilities.

The authors construct a dedicated benchmark consisting of two complementary components. The “forget set” contains 5,000 synthetic personal resumes that embed a wide range of personally identifiable information (PII): account identifiers, addresses, birthdays, financial data, education details, contact information, security credentials, and political affiliations. The “retain set” comprises 5,000 ordinary code snippets drawn from standard code‑generation benchmarks, serving to measure the models’ functional performance after unlearning.
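The direct leak rate described above hinges on checking whether exact PII values from the forget set resurface verbatim in model completions. The sketch below illustrates that measurement with a toy synthetic record; the field names and helper functions are hypothetical, not the paper's actual benchmark code.

```python
# Illustrative synthetic resume record, mirroring the PII categories in the
# paper's forget set (field names here are hypothetical examples).
RESUME = {
    "name": "Jane Doe",
    "email": "jane.doe@example.com",
    "phone": "+1-555-0142",
    "account_id": "JD-88213",
}

def render_resume(record: dict) -> str:
    """Flatten a record into the kind of text a model might memorize."""
    return "\n".join(f"{k}: {v}" for k, v in record.items())

def direct_leak_fields(model_output: str, record: dict) -> list:
    """Return the PII fields whose exact values appear verbatim in the output."""
    return [k for k, v in record.items() if str(v) in model_output]

# Example: a completion that regurgitates the email but no other field.
output = "Contact: jane.doe@example.com (account on file)"
leaked = direct_leak_fields(output, RESUME)
```

Aggregating `direct_leak_fields` over many prompts yields a per-model direct leak rate that can be compared before and after unlearning.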

Three representative MU algorithms are evaluated: (1) Gradient Ascent (GA), which directly maximizes loss on the target data to push the model away from memorized patterns; (2) GA + Gradient Descent (GA+GD), which couples GA with a descent step on the retain set to mitigate performance degradation; and (3) GA + KL‑divergence regularization (GA+KL), which adds a KL term to keep the post‑unlearning model distribution close to the original. These methods are applied to three widely used open‑source LLMs4Code—AIXCoder‑7B, CodeLlama‑7B, and CodeQwen‑7B.

Experimental results show a substantial reduction in direct memorization‑based leakage. Across all models, the direct leak rate drops by an average of 52% (AIXCoder‑7B achieves the highest reduction at 58%). At the same time, code‑generation quality, measured by pass@1, remains high: 91%–94% of the original performance is retained, while avoiding the prohibitive computational cost of retraining from scratch.
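The pass@1 figures above are instances of the standard unbiased pass@k estimator (Chen et al., 2021), computed from n sampled completions of which c pass the unit tests. A small sketch of that estimator and of the retention ratio the paper reports (the function names are illustrative):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k draws from
    n samples (c of them correct) passes, 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def retention(post_pass1: float, pre_pass1: float) -> float:
    """Fraction of the original pass@1 preserved after unlearning."""
    return post_pass1 / pre_pass1
```

For example, a model whose pass@1 falls from 0.50 to 0.46 after unlearning retains 92% of its original performance, within the 91%–94% band reported.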

A nuanced finding emerges after unlearning: models adopt “privacy‑preserving behaviors” such as replacing sensitive fields with generic variable names, abbreviations, or placeholders, and sometimes omitting the fields entirely. While these tactics effectively block straightforward extraction attacks that rely on exact string matches, they do not eliminate the underlying semantic knowledge. Consequently, indirect or “latent” leakage becomes more prevalent. The authors observe a 1.8‑fold increase in indirect leakage cases compared with the pre‑unlearning baseline, indicating that MU currently mitigates the explicit exposure of data but leaves a subtler channel through which information can be inferred.
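Because exact string matching misses these paraphrased or lightly transformed values, distinguishing direct from indirect leakage requires a looser comparison. A minimal heuristic sketch (my own illustration, not the paper's detection pipeline) classifies a single PII value's exposure by also checking a normalized form:

```python
import re

def normalize(text: str) -> str:
    """Lowercase and strip common separators so reformatted values match."""
    return re.sub(r"[\s\-_.()+]", "", text.lower())

def leak_kind(output: str, secret: str) -> str:
    """Heuristic classification of one PII value's exposure in an output.

    'direct'   : the exact string appears verbatim
    'indirect' : the value survives only under light normalization
    'none'     : neither form is found
    """
    if secret in output:
        return "direct"
    if normalize(secret) in normalize(output):
        return "indirect"
    return "none"
```

Real indirect leakage is broader than this (e.g., abbreviations or semantically inferable fields), which is precisely why the authors flag it as a persisting vulnerability.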

Algorithm‑specific trade‑offs are also detailed. GA+KL yields the smallest performance drop thanks to its KL regularization, yet its direct‑leak reduction is slightly lower than pure GA. GA+GD excels at preserving functional accuracy but struggles to completely erase high‑frequency PII such as email addresses and phone numbers. Pure GA achieves the greatest direct‑leak suppression but incurs the most noticeable degradation in code‑generation metrics. These observations suggest that practitioners must select an MU strategy aligned with their privacy‑versus‑utility priorities and the characteristics of the target model.

The study further quantifies the frequency of different privacy‑preserving strategies post‑unlearning, ranking them as variable/abbreviation substitution, field omission, and random string insertion. This taxonomy provides insight into how models internally re‑encode forgotten information.

All code, data, and evaluation scripts are released publicly on Zenodo (doi:10.5281/zenodo.14729266), ensuring reproducibility and enabling future work to extend the benchmark to other models or larger datasets.

In conclusion, the paper demonstrates that machine unlearning is a viable, low‑cost solution for reducing sensitive information leakage in LLMs4Code while maintaining high code‑generation performance. It also uncovers a previously under‑explored vulnerability: the shift from direct to indirect leakage after unlearning. Addressing both forms simultaneously will be a critical direction for future research, calling for more sophisticated unlearning mechanisms that can erase not only raw data tokens but also the associated semantic representations embedded in large code models.

