DGS-Net: Distillation-Guided Gradient Surgery for CLIP Fine-Tuning in AI-Generated Image Detection
The rapid progress of generative models such as GANs and diffusion models has led to the widespread proliferation of AI-generated images, raising concerns about misinformation, privacy violations, and trust erosion in digital media. Although large-scale multimodal models like CLIP offer strong transferable representations for detecting synthetic content, fine-tuning them often induces catastrophic forgetting, which degrades pre-trained priors and limits cross-domain generalization. To address this issue, we propose the Distillation-guided Gradient Surgery Network (DGS-Net), a novel framework that preserves transferable pre-trained priors while suppressing task-irrelevant components. Specifically, we introduce a gradient-space decomposition that separates harmful and beneficial descent directions during optimization. By projecting task gradients onto the orthogonal complement of harmful directions and aligning with beneficial ones distilled from a frozen CLIP encoder, DGS-Net achieves unified optimization of prior preservation and irrelevant suppression. Extensive experiments on 50 generative models demonstrate that our method outperforms state-of-the-art approaches by an average margin of 6.6%, achieving superior detection performance and generalization across diverse generation techniques.
💡 Research Summary
The paper addresses the pressing problem of detecting AI‑generated images in an era where generative models such as GANs and diffusion models are proliferating rapidly. While large‑scale vision‑language models like CLIP provide powerful, transferable representations that can be leveraged for this task, fine‑tuning CLIP often leads to catastrophic forgetting: the model loses the rich semantic priors learned during pre‑training and overfits to dataset‑specific cues, resulting in poor cross‑domain generalization.
To mitigate this, the authors propose DGS‑Net (Distillation‑Guided Gradient Surgery Network), a framework that operates directly in gradient space rather than on raw feature representations. The key idea is to decompose the gradient of the classification loss, coordinate‑wise by sign, into a "harmful" (positive) component and a "beneficial" (negative) component. A positive entry indicates that increasing the corresponding parameter raises the loss, marking a direction to suppress; a negative entry indicates that the update lowers the loss and is therefore desirable.
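A minimal sketch of this sign‑based split, assuming gradients are represented as plain Python lists (`split_gradient` is an illustrative name, not taken from the paper's code):

```python
def split_gradient(g):
    """Split a gradient into its positive ("harmful") and negative
    ("beneficial") components, coordinate-wise by sign."""
    g_pos = [x if x > 0 else 0.0 for x in g]  # entries that raise the loss
    g_neg = [x if x < 0 else 0.0 for x in g]  # entries that lower the loss
    return g_pos, g_neg

g = [0.4, -0.2, 0.0, -1.1]
g_pos, g_neg = split_gradient(g)
# The two parts always sum back to the original gradient.
assert all(p + n == x for p, n, x in zip(g_pos, g_neg, g))
```

By construction the two components are disjoint in support, so each coordinate of the original gradient lands in exactly one of them.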
DGS‑Net consists of two complementary modules:
- Orthogonal Suppression – The text encoder of CLIP (kept frozen) is used to compute a task‑specific gradient g⁺_text, which captures semantic directions that are irrelevant or even detrimental to the detection task. The image‑side gradient g_task, obtained from the trainable image encoder (augmented with LoRA adapters), is projected onto the orthogonal complement of g⁺_text, removing the harmful component while preserving the remaining signal.
- Prior Alignment – A frozen copy of the CLIP image encoder provides a reference gradient g⁻_img, which contains the negative (beneficial) components that pre‑training deems useful for reducing the loss. These components are injected as a lightweight alignment signal, nudging the fine‑tuned encoder back toward the original semantic manifold and preventing drift.
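Treating gradients as plain vectors, the two modules above reduce to a pair of linear‑algebra steps. The sketch below is a hedged illustration: the function names, toy vectors, and mixing weight `alpha` are assumptions for exposition, not the authors' implementation.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project_orthogonal(g_task, g_harm):
    """Orthogonal Suppression: remove from g_task its component along
    the harmful direction g_harm (playing the role of g+_text)."""
    denom = dot(g_harm, g_harm)
    if denom == 0.0:
        return list(g_task)  # nothing to suppress
    coeff = dot(g_task, g_harm) / denom
    return [t - coeff * h for t, h in zip(g_task, g_harm)]

def align_with_prior(g, g_bene, alpha=0.5):
    """Prior Alignment: inject the beneficial component g_bene (playing
    the role of g-_img); alpha is an assumed mixing weight."""
    return [x + alpha * b for x, b in zip(g, g_bene)]

g_task = [1.0, 1.0]   # task gradient from the trainable image encoder
g_harm = [1.0, 0.0]   # harmful direction from the frozen text encoder
g_bene = [0.0, -0.5]  # beneficial direction from the frozen image encoder

g_clean = project_orthogonal(g_task, g_harm)  # -> [0.0, 1.0]
g_final = align_with_prior(g_clean, g_bene)   # -> [0.0, 0.75]
assert dot(g_clean, g_harm) == 0.0  # harmful component fully removed
```

After projection, the surviving gradient is exactly orthogonal to the harmful direction, which is what guarantees the suppression step cannot reintroduce the discarded component.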
The overall training objective combines binary cross‑entropy losses on the image and text branches; the resulting gradients are processed as described above before each parameter update. LoRA adapters keep the number of trainable parameters low, and BLIP is employed to generate textual captions automatically, enabling the text‑side gradient computation without manual annotation.
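The combined objective can be sketched as follows; the balancing weight `lam` and the function names are illustrative assumptions, since the summary does not specify how the two branch losses are weighted.

```python
import math

def bce(p, y):
    """Binary cross-entropy for one predicted probability p and label y."""
    eps = 1e-12  # numerical guard against log(0)
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

def total_loss(p_img, p_text, y, lam=1.0):
    """Combined objective over the image and text branches; lam is an
    assumed balancing weight, not given in the paper summary."""
    return bce(p_img, y) + lam * bce(p_text, y)

# A maximally uncertain prediction costs about log(2) per branch:
# total_loss(0.5, 0.5, 1) ~= 2 * log(2) ~= 1.386
```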
The authors construct four benchmark datasets using ProGAN, R3GAN, SDXL, and SimSwap, each containing real and synthetic images across multiple categories. They visualize feature spaces with t‑SNE, showing that standard LoRA fine‑tuning collapses the geometry while DGS‑Net retains the original structure and achieves clear real/fake separation.
Extensive experiments span 50 diverse generative models, including state‑of‑the‑art diffusion systems (Stable Diffusion, Midjourney, DALL·E) and numerous GAN variants. DGS‑Net consistently outperforms prior methods (plain CLIP, LoRA‑based fine‑tuning, VIB‑Net, C2P‑CLIP, Effort, NS‑Net) by an average of 6.6 percentage points in detection accuracy. The gains are especially pronounced on unseen generators, demonstrating superior cross‑domain robustness. Computational overhead remains modest because the gradient surgery operations are linear algebraic projections that add negligible cost to the forward/backward pass.
In summary, DGS‑Net introduces a principled gradient‑space distillation strategy that selectively suppresses task‑irrelevant updates while reinforcing pre‑trained priors. This approach effectively eliminates catastrophic forgetting during CLIP fine‑tuning for AI‑generated image detection, yielding higher accuracy and markedly better generalization to novel generative techniques. Future work may explore extending the gradient‑surgery concept to other multimodal foundations and investigating unsupervised ways to identify harmful directions.