MeGU: Machine-Guided Unlearning with Target Feature Disentanglement
The growing concern over training data privacy has elevated the “Right to be Forgotten” into a critical requirement, driving demand for effective Machine Unlearning. However, existing unlearning approaches commonly suffer from a fundamental trade-off: aggressively erasing the influence of target data often degrades model utility on retained data, while conservative strategies leave residual target information intact. In this work, we analyze the intrinsic representation properties learned during model pretraining and demonstrate that semantic class concepts are entangled at the feature-pattern level: classes share associated features while retaining concept-specific discriminative components. This entanglement fundamentally limits the effectiveness of existing unlearning paradigms. Motivated by this insight, we propose Machine-Guided Unlearning (MeGU), a novel framework that guides unlearning through concept-aware re-alignment. Specifically, we leverage Multi-modal Large Language Models (MLLMs) to explicitly determine re-alignment directions for target samples by assigning semantically meaningful perturbing labels. To improve efficiency, inter-class conceptual similarities estimated by the MLLM are encoded into a lightweight transition matrix. Furthermore, MeGU introduces a positive-negative feature noise pair to explicitly disentangle the influence of the target concept. During finetuning, the negative noise suppresses target-specific feature patterns, while the positive noise reinforces remaining associated features and aligns them with perturbing concepts. This coordinated design selectively disrupts target-specific representations while preserving shared semantic structures. As a result, MeGU enables controlled and selective forgetting, effectively mitigating both under-unlearning and over-unlearning.
💡 Research Summary
The paper addresses the pressing need for “right‑to‑be‑forgotten” compliance in machine learning systems, highlighting that current machine unlearning (MU) techniques suffer from a fundamental trade‑off: aggressive removal of target data often harms the model’s performance on retained data, while conservative approaches leave residual influence from the forgotten data. The authors first analyze the internal representations learned during pre‑training and argue that models encode information at two hierarchical levels: low‑level feature patterns and higher‑level semantic concepts. Different classes share “associated features” but also possess “unique features”. This entanglement means that when a class is unlearned, indiscriminately removing its features can also erase shared features needed by other classes (over‑unlearning), while conservatively preserving them leaves target‑specific information behind (under‑unlearning).
To overcome this, the authors propose Machine‑Guided Unlearning (MeGU). MeGU leverages zero‑shot multi‑modal large language models (MLLMs) to generate semantically meaningful “perturbing labels” for the target samples. The authors prompt an MLLM with a small subset of data to estimate inter‑class conceptual similarities, which are stored in a lightweight transition matrix. This matrix enables fast lookup of the most appropriate perturbing label for each target class without repeatedly invoking the MLLM.
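As a concrete illustration, the transition-matrix lookup can be sketched with a toy similarity matrix. The class names, similarity values, and the argmax selection rule below are illustrative assumptions; in the actual method, the similarities would come from prompting the MLLM.

```python
import numpy as np

# Hypothetical MLLM-estimated conceptual similarity between 4 classes
# (row i, column j = similarity of class i to class j). A real matrix
# would be filled in by querying the MLLM once, then reused.
similarity = np.array([
    [1.0, 0.8, 0.1, 0.2],   # e.g. "cat"
    [0.8, 1.0, 0.2, 0.1],   # e.g. "dog"
    [0.1, 0.2, 1.0, 0.7],   # e.g. "car"
    [0.2, 0.1, 0.7, 1.0],   # e.g. "truck"
])

def perturbing_label(transition: np.ndarray, target_class: int) -> int:
    """Return the most conceptually similar class, excluding the target itself."""
    row = transition[target_class].copy()
    row[target_class] = -np.inf   # a class cannot serve as its own perturbing label
    return int(np.argmax(row))

print(perturbing_label(similarity, 0))  # "cat" maps to "dog" (class 1)
```

Because the matrix is computed once, unlearning a new target class later costs only an array lookup rather than another round of MLLM prompting.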
The core technical contribution is the “Fragment‑Align” strategy, which introduces a pair of feature noises: a negative noise that suppresses the unique features of the target class, and a positive noise that aligns the remaining (shared) features with the semantics of the perturbing label. Both noises are learned while keeping the original pre‑trained backbone frozen; they are added to the input during fine‑tuning. The training objective combines the standard classification loss on retained data, a loss encouraging alignment with the perturbing label, and an ℓ₂ regularization on the noise magnitude. This dual‑noise design selectively disentangles the target concept while preserving associated features that benefit other classes.
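The combined objective can be sketched in NumPy with a frozen linear layer standing in for the pre‑trained backbone. The loss weights, the way the two noises are applied (positive noise added, negative noise subtracted), and the use of cross‑entropy for the alignment term are all assumptions made for illustration, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    p = softmax(logits)
    return -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()

# Frozen "backbone": a single linear map from 8-dim features to 4 classes.
W = rng.normal(size=(8, 4))

x_retain = rng.normal(size=(16, 8))
y_retain = rng.integers(0, 4, size=16)
x_forget = rng.normal(size=(4, 8))
y_perturb = np.full(4, 1)             # perturbing label chosen via the transition matrix

# Learnable positive/negative feature noises (small initialization).
delta_pos = 0.01 * rng.normal(size=(8,))
delta_neg = 0.01 * rng.normal(size=(8,))

alpha, lam = 1.0, 0.05                # hypothetical loss weight and noise regularization

# Negative noise suppresses target-specific features; positive noise
# re-aligns the remaining ones toward the perturbing concept.
logits_retain = x_retain @ W
logits_forget = (x_forget + delta_pos - delta_neg) @ W

loss = (cross_entropy(logits_retain, y_retain)            # utility on retained data
        + alpha * cross_entropy(logits_forget, y_perturb) # alignment with perturbing label
        + lam * (np.sum(delta_pos**2) + np.sum(delta_neg**2)))  # l2 noise regularization
print(round(float(loss), 3))
```

In practice only `delta_pos` and `delta_neg` would receive gradients during fine‑tuning, with `W` (the backbone) held fixed, matching the frozen‑backbone setup described above.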
Experiments cover three unlearning scenarios—single‑class removal, multi‑class removal, and label‑noise correction—across CIFAR‑10, ImageNet‑Subset, CelebA, and Tiny‑ImageNet. Evaluation metrics include recall of forgotten data, accuracy on retained data, and membership inference attack (MIA) success rate. MeGU consistently outperforms state‑of‑the‑art baselines such as SISA, Golatkar’s scrubbing, and UNSIR. It achieves >95 % removal of target data while degrading retained‑data accuracy by only 1–2 %. Moreover, MIA success drops by over 30 % relative to baselines, indicating stronger privacy protection.
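For readers unfamiliar with the MIA metric, a common loss‑threshold variant of the attack can be sketched as follows. The synthetic loss distributions and the threshold sweep are illustrative assumptions; the paper may use a different attack model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-sample losses: training members typically have lower
# loss than non-members, and a threshold attack exploits that gap.
member_losses = rng.gamma(shape=2.0, scale=0.2, size=500)
nonmember_losses = rng.gamma(shape=2.0, scale=0.6, size=500)

def mia_success_rate(member, nonmember, threshold):
    """Fraction of correct member/non-member guesses for a loss-threshold attack."""
    correct = (member < threshold).sum() + (nonmember >= threshold).sum()
    return correct / (len(member) + len(nonmember))

# The attacker picks the best threshold; successful unlearning should push
# this number back toward 0.5 (random guessing).
best = max(mia_success_rate(member_losses, nonmember_losses, t)
           for t in np.linspace(0.0, 2.0, 201))
print(round(best, 3))
```

An unlearned model that still leaks membership yields a success rate well above 0.5 on the forgotten samples; the reported 30 % relative drop for MeGU corresponds to moving this score closer to chance.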
Ablation studies show that (1) the transition matrix improves perturbing‑label selection efficiency by ~20 %; (2) using both positive and negative noises yields an 8 % higher removal rate than a single‑noise variant; (3) the method is robust to hyper‑parameter variations, with the loss‑weight ratio α:β between 1:1 and 1:3 giving stable performance and λ (noise regularization) in the range 0.01–0.1 having minimal impact.
In summary, MeGU integrates external semantic knowledge from MLLMs, lightweight similarity encoding, and a principled dual‑noise mechanism to resolve the entanglement problem that hampers existing unlearning methods. It delivers precise, efficient, and privacy‑preserving forgetting without sacrificing the model’s overall utility, offering a compelling blueprint for future privacy‑aware AI deployments.