Distill, Forget, Repeat: A Framework for Continual Unlearning in Text-to-Image Diffusion Models

Reading time: 5 minutes

📝 Original Info

  • Title: Distill, Forget, Repeat: A Framework for Continual Unlearning in Text-to-Image Diffusion Models
  • ArXiv ID: 2512.02657
  • Date: 2025-12-02
  • Authors: Naveen George¹,²,†, Naoki Murata², Yuhta Takida², Konda Reddy Mopuri¹, Yuki Mitsufuji²,³ (¹Indian Institute of Technology Hyderabad; ²Sony AI; ³Sony Group Corporation; †work done during an internship at Sony AI)

📝 Abstract

The recent rapid growth of visual generative models trained on vast web-scale datasets has created significant tension with data privacy regulations and copyright laws, such as GDPR's "Right to be Forgotten." This necessitates machine unlearning (MU) to remove specific concepts without the prohibitive cost of retraining. However, existing MU techniques are fundamentally ill-equipped for real-world scenarios where deletion requests arrive sequentially, a setting known as continual unlearning (CUL). Naively applying one-shot methods in a continual setting triggers a stability crisis, leading to a cascade of degradation characterized by retention collapse, compounding collateral damage to related concepts, and a sharp decline in generative quality. To address this critical challenge, we introduce a novel generative-distillation-based continual unlearning framework that ensures targeted and stable unlearning under sequences of deletion requests. By reframing each unlearning step as a multi-objective, teacher-student distillation process, the framework leverages principles from continual learning to maintain model integrity. Experiments on a 10-step sequential benchmark demonstrate that our method unlearns forget concepts with better fidelity and does so without significant interference with performance on retained concepts or overall image quality, substantially outperforming baselines. This framework provides a viable pathway for the responsible deployment and maintenance of large-scale generative models, enabling industries to comply with ongoing data removal requests in a practical and effective manner.
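
The abstract's "multi-objective, teacher-student distillation" can be made concrete with a short sketch. What follows is an assumption about how such an objective is typically composed, not the paper's published implementation: a forget term pulls the student's noise prediction for the forget prompt toward the frozen teacher's prediction for a neutral anchor prompt, while a retain term keeps the student matched to the teacher on concepts that must survive. The names (`cul_distillation_step`, `lambda_retain`, the anchor prompt) are hypothetical, and the `.sample` attribute assumes a diffusers-style conditional UNet.

```python
import torch
import torch.nn.functional as F

def cul_distillation_step(student_unet, teacher_unet,
                          x_t, t, forget_emb, anchor_emb, retain_emb,
                          lambda_retain=1.0):
    """One hypothetical distillation-based unlearning step.

    student_unet : trainable copy of the diffusion UNet
    teacher_unet : frozen UNet from before this unlearning step
    x_t, t       : noised latents and timesteps, sampled as in training
    forget_emb   : text embedding of the concept to erase
    anchor_emb   : embedding of a neutral anchor prompt (e.g. "")
    retain_emb   : embedding of a concept that must be preserved
    """
    with torch.no_grad():
        # Teacher target for the forget prompt: behave as if the
        # concept were the neutral anchor concept.
        eps_anchor = teacher_unet(x_t, t, encoder_hidden_states=anchor_emb).sample
        # Teacher target for retained prompts: unchanged behavior.
        eps_retain = teacher_unet(x_t, t, encoder_hidden_states=retain_emb).sample

    # Forget objective: the student, conditioned on the forget prompt,
    # matches the anchor target.
    eps_f = student_unet(x_t, t, encoder_hidden_states=forget_emb).sample
    loss_forget = F.mse_loss(eps_f, eps_anchor)

    # Retain objective: the student stays close to the teacher on retained
    # prompts -- the continual-learning ingredient against retention collapse.
    eps_r = student_unet(x_t, t, encoder_hidden_states=retain_emb).sample
    loss_retain = F.mse_loss(eps_r, eps_retain)

    return loss_forget + lambda_retain * loss_retain
```

One plausible reading of the "Distill, Forget, Repeat" framing is that, in the continual setting, the teacher at step k is the student produced at step k−1, so each new deletion request distills from the most recent checkpoint rather than from the original model.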

💡 Deep Analysis

Figure 1 (see the full caption under Full Content below): qualitative comparison of our method against SOTA baselines over 10 sequential unlearning steps.

📄 Full Content

Figure 1. Qualitative comparison of our method (bottom row) against SOTA baselines on 10 sequential unlearning steps. The figure highlights the critical failure modes of existing methods in a continual setting: baselines like ESD-x [9] and MACE [19] suffer from catastrophic "retention collapse," where generative quality breaks down completely on retained concepts (right panel), while methods like DUGE [32] avoid total collapse but still exhibit severe quality degradation and poor unlearning in later stages. In contrast, our method both effectively unlearns the target concepts (left panel) and preserves general knowledge (right panel), maintaining generative quality close to the original SD v1.5 model. Forget targets across the 10 steps include Pikachu, Van Gogh's style, Spiderman, Lionel Messi, and a banana; retain probes include related prompts such as a Squirtle figurine, comic-book style, Iron Man, Cristiano Ronaldo, and a pineapple.

1. Introduction

Recent visual generative models such as SORA [5], Gemini [31], Imagen 3.0 [28], and DALL·E 3 [3] can synthesize photorealistic media at scale. These models are trained on massive web-scale datasets that often include sensitive personal data, NSFW material, and copyrighted content. Laws such as GDPR [24] grant a "Right to be Forgotten," but retraining foundation models from scratch is prohibitively expensive. The common workaround, post-generation filtering, is superficial: the model retains the underlying knowledge, and filters can be bypassed by users [23, 36]. This motivates machine unlearning (MU): directly editing model weights to remove targeted concepts (objects, NSFW content, artistic styles, identities) from generative behavior. Crucially, deletion requests arrive sequentially rather than all at once.

In principle, an ideal unlearning method should satisfy three desiderata: (I) Perfect unlearning, where forget concepts can no longer be produced; (II) No Ripple Effects [1], so that related concepts remain intact (e.g., removing Brad Pitt should not distort Angelina Jolie or Leonardo DiCaprio, and unlearning Van Gogh should not degrade other artistic styles); and (III) Quality Preservation, maintaining overall image quality. Current approaches instead map forget concepts to fixed placeholders (empty strings [9, 34], general concepts [10, 18], or random concepts [8]), suppressing rather than truly unlearning them, which leads to ripple effects (collateral damage to related concepts [1, 40]). These challenges are significantly amplified in continual unlearning (CUL) [11, 32, 35], where deletion requests arrive sequentially; a sketch of what these one-shot, placeholder-mapping objectives optimize, and why iterating them destabilizes the model, follows below.
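
To see why iterating these one-shot edits is unstable, it helps to write down what a placeholder-mapping objective optimizes. The sketch below is an illustrative rendering in the spirit of ESD's negative-guidance erasure [9], not the authors' code; the function name, `eta`, and the prompt handling are assumptions, and the `.sample` attribute again assumes a diffusers-style UNet.

```python
import torch
import torch.nn.functional as F

def esd_style_loss(model, frozen_model, x_t, t, concept_emb, uncond_emb, eta=1.0):
    """Sketch of a one-shot, negative-guidance unlearning objective
    in the spirit of ESD [9]; details are illustrative.
    """
    with torch.no_grad():
        # Frozen original model's unconditional and concept-conditioned predictions.
        eps_uncond = frozen_model(x_t, t, encoder_hidden_states=uncond_emb).sample
        eps_concept = frozen_model(x_t, t, encoder_hidden_states=concept_emb).sample
        # Target: the unconditional prediction pushed *away* from the concept.
        target = eps_uncond - eta * (eps_concept - eps_uncond)

    # Fine-tune the edited model so that, when conditioned on the concept,
    # it reproduces the concept-suppressing target.
    eps_pred = model(x_t, t, encoder_hidden_states=concept_emb).sample
    return F.mse_loss(eps_pred, target)
```

Applied once, an objective like this suppresses the concept; applied ten times in sequence, with nothing anchoring the model to its prior behavior on retained concepts, each edit drifts the weights further from the original, which is the retention collapse and compounding collateral damage described above.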

📸 Image Gallery

  • Method_v6.png
  • radar_plots_focused.png

Reference

This content is AI-processed based on open access ArXiv data.
