EvoMU: Evolutionary Machine Unlearning

Notice: This research summary and analysis were automatically generated using AI. For authoritative details, please refer to the original arXiv source.

Machine unlearning aims to unlearn specified training data (e.g., sensitive or copyrighted material). A prominent approach is to fine-tune an existing model with an unlearning loss that retains overall utility. The space of suitable unlearning loss functions is vast, making the search for an optimal loss function daunting. Additionally, there might not even exist a universally optimal loss function: differences in the structure and overlap of the forget and retain data can cause a loss to work well in one setting but over-unlearn or under-unlearn in another. Our approach, EvoMU, tackles these two challenges simultaneously. An evolutionary search procedure automatically finds task-specific losses in the vast space of possible unlearning loss functions. This allows us to find dataset-specific losses that match or outperform existing losses from the literature, without the need for a human-in-the-loop. This work is therefore an instance of automatic scientific discovery, a.k.a. an AI co-scientist. In contrast to previous AI co-scientist works, we do so on a budget: we achieve SotA results using a small 4B-parameter model (Qwen3-4B-Thinking), showing the potential of AI co-scientists with limited computational resources. Our experimental evaluation shows that we surpass previous loss-based unlearning formulations on TOFU-5%, TOFU-10%, MUSE, and WMDP by synthesizing novel unlearning losses. Our code is available at https://github.com/Batorskq/EvoMU.


💡 Research Summary

The paper tackles the problem of machine unlearning for large language models (LLMs), where the goal is to remove the influence of specific training data (e.g., sensitive or copyrighted material) without retraining from scratch. Existing approaches rely on hand‑crafted unlearning loss functions that are applied during a fine‑tuning phase. However, the space of possible loss functions is enormous, and a loss that works well for one forget/retain data configuration may either over‑unlearn (damaging utility) or under‑unlearn (leaving residual memorization) for another. The authors therefore propose EvoMU (Evolutionary Machine Unlearning), an automated scientific discovery framework that treats unlearning loss design as an automated discovery problem.

EvoMU consists of an iterative loop that combines a code‑generating language model (the “LLM Proposer,” Qwen3‑4B‑Thinking) with an evolutionary search over executable loss functions. The search space is constrained to Python functions that accept four inputs: the log‑probabilities of the forget batch (z_f), the retain batch (z_r), and the corresponding reference‑model log‑probabilities (z_ref_f, z_ref_r). The function must return a single scalar loss value and be differentiable in PyTorch. For each candidate loss, the target model is fine‑tuned with LoRA adapters on the forget and retain datasets for a small, fixed number of epochs. The resulting checkpoint is evaluated on standardized unlearning benchmarks (TOFU‑5%, TOFU‑10%, MUSE, WMDP), measuring both forgetting effectiveness and utility (e.g., perplexity and accuracy on retain data).
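To make this interface concrete, the sketch below shows what one candidate loss in that search space could look like; it is our own NPO‑style illustration, not code from the paper, and the function name and coefficients are assumptions.

```python
import torch
import torch.nn.functional as F

def npo_style_loss(z_f, z_r, z_ref_f, z_ref_r, beta=0.1, alpha=1.0):
    """Candidate loss over per-sequence log-probabilities of the forget batch
    (z_f), the retain batch (z_r), and the frozen reference model (z_ref_f,
    z_ref_r). Returns a single differentiable scalar, as the search space
    requires. z_ref_r is accepted but unused by this particular candidate."""
    # NPO-style forget term: push forget log-probs below the reference model's.
    forget = (2.0 / beta) * F.softplus(beta * (z_f - z_ref_f)).mean()
    # Retain term: plain negative log-likelihood on the retain batch.
    retain = -z_r.mean()
    return forget + alpha * retain

# Toy check on random log-probabilities (one value per sequence).
z_f = torch.randn(4, requires_grad=True)
z_r = torch.randn(4, requires_grad=True)
loss = npo_style_loss(z_f, z_r, torch.randn(4), torch.randn(4))
loss.backward()  # gradients flow, so the loss can drive LoRA fine-tuning
```

Any Python function with this four-argument signature that returns a differentiable scalar is a valid candidate, which is what makes the space both vast and mechanically searchable.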

After evaluation, the top‑K performing losses are retained and fed back to the LLM Proposer, which mutates them by adjusting coefficients, modifying structural components, or altering training budgets. This mutation‑selection loop is repeated R times, progressively refining the loss functions toward higher forgetting effectiveness while preserving utility.
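The mutate‑select loop above can be sketched in a few lines. Here `propose` stands in for the LLM Proposer and `evaluate` for the fine‑tune‑and‑benchmark step; both names, and the toy demo, are hypothetical rather than EvoMU's actual API.

```python
import random

def evolve(initial, propose, evaluate, top_k=3, rounds=5):
    """Keep the top_k candidates by score, mutate them, repeat for `rounds`."""
    population = list(initial)
    for _ in range(rounds):
        elites = sorted(population, key=evaluate, reverse=True)[:top_k]
        # The LLM Proposer would rewrite each elite's source code here.
        children = [propose(parent) for parent in elites]
        population = elites + children  # elitism: the best candidate survives
    return max(population, key=evaluate)

# Toy demo: candidates are numbers, and the "benchmark" rewards proximity
# to 2. In EvoMU, candidates are loss functions and evaluation is a full
# LoRA fine-tune plus unlearning/utility measurement.
random.seed(0)
best = evolve([0.0, 5.0],
              propose=lambda x: x + random.uniform(-1.0, 1.0),
              evaluate=lambda x: -(x - 2.0) ** 2,
              rounds=20)
```

Because elites are carried over unchanged, the best score found so far can never regress between rounds, which makes the search robust to occasional bad mutations from the proposer.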

Empirical results across multiple benchmarks demonstrate that EvoMU discovers loss functions that match or surpass strong human‑designed baselines such as NPO, SimNPO, and gradient‑ascent‑based losses. Notably, EvoMU achieves state‑of‑the‑art forgetting rates on TOFU‑5% and TOFU‑10% and consistently outperforms baselines on MUSE and WMDP. Analysis of the discovered losses reveals that many top‑performers are surprisingly simple, often lacking explicit reference‑model terms and relying mainly on weighted log‑probability differences. This suggests that current research may be overlooking simple yet effective regions of the loss‑function space. Moreover, even randomly sampled loss functions (without any evolutionary refinement) can be competitive with established baselines, underscoring that loss choice is a dominant lever in unlearning performance.
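To illustrate the kind of simple, reference‑free form the analysis describes, here is a hypothetical member of that family (our own example, not a loss reported in the paper): it ignores the reference‑model inputs entirely and combines weighted mean log‑probabilities.

```python
import torch

def weighted_diff_loss(z_f, z_r, z_ref_f=None, z_ref_r=None, w=0.5):
    # Reference-free candidate: suppress forget log-probs while raising
    # retain log-probs; the reference-model inputs are accepted but ignored.
    return w * z_f.mean() - z_r.mean()

# Deterministic toy check: 0.5 * mean([1, 3]) - mean([2, 4]) = 1.0 - 3.0
toy = weighted_diff_loss(torch.tensor([1.0, 3.0]), torch.tensor([2.0, 4.0]))
```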

A key contribution of the work is demonstrating that a modest 4‑billion‑parameter open‑source model is sufficient to drive effective loss discovery, contrasting with prior automated scientific discovery systems that depend on much larger or closed models. This “small‑model effectiveness” highlights the feasibility of cost‑efficient AI co‑scientists. The authors also release their code and experimental pipeline, enabling reproducibility and encouraging further research in automated loss design for unlearning and related domains.

In summary, EvoMU provides a fully automated, verifiable, and resource‑efficient framework for discovering task‑specific unlearning loss functions, advancing both the practice of machine unlearning and the broader field of AI‑assisted scientific discovery.

