Dual Attention Guided Defense Against Malicious Edits

Reading time: 5 minutes
...

📝 Original Info

  • Title: Dual Attention Guided Defense Against Malicious Edits
  • ArXiv ID: 2512.14333
  • Date: 2025-12-16
  • Authors: Jie Zhang, Shuai Dong, Shiguang Shan, Xilin Chen

📝 Abstract

Recent progress in text-to-image diffusion models has transformed image editing via text prompts, yet this also introduces significant ethical challenges from potential misuse in creating deceptive or harmful content. While current defenses seek to mitigate this risk by embedding imperceptible perturbations, their effectiveness is limited against malicious tampering. To address this issue, we propose a Dual Attention-Guided Noise Perturbation (DANP) immunization method that adds imperceptible perturbations to disrupt the model's semantic understanding and generation process. DANP functions over multiple timesteps to manipulate both cross-attention maps and the noise prediction process, using a dynamic threshold to generate masks that identify text-relevant and irrelevant regions. It then reduces attention in relevant areas while increasing it in irrelevant ones, thereby misguiding the edit towards incorrect regions and preserving the intended targets. Additionally, our method maximizes the discrepancy between the injected noise and the model's predicted noise to further interfere with the generation. By targeting both attention and noise prediction mechanisms, DANP exhibits impressive immunity against malicious edits, and extensive experiments confirm that our method achieves state-of-the-art performance.
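
To make the two mechanisms concrete, here is a minimal sketch of the dual objective as the abstract describes it, assuming a PyTorch diffusion pipeline that exposes cross-attention maps and the predicted noise at a given timestep. All names here (`danp_loss`, `tau_scale`, the tensor shapes) are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def danp_loss(attn_maps, pred_noise, injected_noise, tau_scale=0.5):
    """Hypothetical sketch of the dual objective described in the abstract.

    attn_maps:      cross-attention maps for the edit-prompt tokens,
                    shape (B, H*W) after averaging over heads and tokens
    pred_noise:     epsilon predicted by the diffusion UNet at timestep t
    injected_noise: the actual noise added during the forward process
    """
    # Dynamic threshold as a fraction of each image's peak attention
    # (the paper's exact thresholding rule is not specified here).
    tau = tau_scale * attn_maps.max(dim=-1, keepdim=True).values
    relevant = (attn_maps >= tau).float()   # text-relevant regions
    irrelevant = 1.0 - relevant             # text-irrelevant regions

    # Minimizing this term pushes attention out of relevant regions
    # and into irrelevant ones, misguiding the edit location.
    attn_term = (attn_maps * relevant).mean() - (attn_maps * irrelevant).mean()

    # Minimizing the negated MSE maximizes the gap between injected
    # and predicted noise, interfering with the denoising process.
    noise_term = -F.mse_loss(pred_noise, injected_noise)

    # The defender (acting as an attacker on the editor) minimizes this
    # loss with respect to the image perturbation.
    return attn_term + noise_term
```

Under this reading, the perturbation on the input image would be optimized to minimize `danp_loss` across multiple sampled timesteps, matching the abstract's statement that DANP operates over multiple timesteps.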

💡 Deep Analysis

Figure 1

📄 Full Content

Index Terms—Diffusion Models, Image Editing, Image Immunization.

Author affiliations: Jie Zhang, Shiguang Shan, and Xilin Chen are with the State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences (CAS), Beijing 100190, China, and also with the University of Chinese Academy of Sciences, Beijing 100049, China (e-mail: zhangjie@ict.ac.cn; sgshan@ict.ac.cn; xlchen@ict.ac.cn). Shuai Dong is with the School of Computer Science, China University of Geosciences, Wuhan 430074, China (e-mail: dongshuai_iu@cug.edu.cn).

I. INTRODUCTION

Recent advancements in diffusion models [1]–[3] have significantly propelled generative modeling, especially in text-to-image generation. Models like DALLE2 [4], Imagen [5], and Stable Diffusion [6] leverage diffusion processes with natural language inputs to generate images that align with user-provided text, offering intuitive control over image generation. Beyond text-to-image tasks, diffusion models have been extended to image inpainting, editing, zero-shot classification, and open-vocabulary segmentation [7]–[15]. Their progressive generation allows dynamic content adjustment, making them highly effective for image editing.

However, the widespread use of diffusion models also raises concerns about potential negative impacts. Misuse of these technologies can lead to the creation of fake or manipulated images, deceiving the public, spreading misinformation, or manipulating opinion, thereby exacerbating societal trust issues. Additionally, privacy risks emerge when personal images are edited without consent, potentially resulting in privacy breaches or psychological harm. More seriously, these techniques can be used to produce harmful content, posing serious threats to social stability. Therefore, a thorough exploration of the security implications surrounding image editing technologies is essential to mitigate the risks of misuse.
In response to these challenges, two primary mitigation strategies have emerged, namely the reactive detection of manipulated content [16]–[21] and the proactive immunization of original images against unauthorized editing [22]–[25]. Reactive detection operates post-facto by training classifiers to identify digital artifacts or inconsistencies left by generative models. However, this approach does not prevent the initial creation and spread of harmful content, as the damage may already be done by the time an image is flagged. In contrast, proactive immunization offers a more robust, preemptive defense by embedding imperceptible, adversarial perturbations into an original image before it is shared. The primary advantage of this proactive stance is its ability to disrupt the malicious editing process at its source, preventing harmful content from being successfully generated. Rather than merely identifying a fake, immunization aims to make its creation infeasible, thereby shifting the security burden from downstream detection to the point of content origin and empowering creators with direct control over their digital assets.

These immunization strategies primarily rely on adversarial attacks to disrupt the image editing process by introducing carefully crafted perturbations at various stages. While early immunization strategies are effective against GAN-based editing models [22], [23], the advent of diffusion models and their robust denoising process necessitates new approaches. For diffusion models, initial defen…
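
As a companion to the loss sketch above, the following is a hedged outline of how such proactive immunization could be realized as a PGD-style optimization loop under an L∞ budget. The `edit_model` call signature and its `num_timesteps` attribute are hypothetical placeholders for a white-box editing pipeline; this illustrates the general immunization recipe, not the paper's exact algorithm.

```python
import torch

def immunize(image, edit_model, prompt, steps=40, eps=8 / 255, alpha=1 / 255):
    """Hypothetical PGD-style immunization: embed an imperceptible
    perturbation into `image` that disrupts later text-guided edits."""
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        # Sample a diffusion timestep and forward noise, so the
        # perturbation is effective across multiple timesteps.
        t = torch.randint(0, edit_model.num_timesteps, (image.size(0),))
        noise = torch.randn_like(image)
        # Assumed white-box interface returning the predicted noise and
        # the cross-attention maps for the edit prompt.
        pred_noise, attn_maps = edit_model(image + delta, prompt, t, noise)
        loss = danp_loss(attn_maps, pred_noise, noise)
        loss.backward()
        with torch.no_grad():
            # Signed gradient descent on the perturbation, projected
            # onto an L-infinity ball of radius eps for imperceptibility.
            delta -= alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)
            delta.grad.zero_()
    return (image + delta).detach()
```

The L∞ projection (`clamp_`) is what keeps the perturbation imperceptible, while resampling the timestep and noise at every step spreads the defense over the editor's whole denoising trajectory rather than a single step.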

📸 Image Gallery

lambda.png

Reference

This content is AI-processed based on open access ArXiv data.
