JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021
Semantic Mismatch and Perceptual Degradation: A New Perspective on Image Editing Immunity
Shuai Dong, Jie Zhang, Member, IEEE, Guoying Zhao, Fellow, IEEE, Shiguang Shan, Fellow, IEEE, Xilin Chen, Fellow, IEEE
Abstract—Text-guided image editing via diffusion models, while powerful, raises significant concerns about misuse, motivating efforts to immunize images against unauthorized edits using imperceptible perturbations. Prevailing metrics for evaluating immunization success typically rely on measuring the visual dissimilarity between the output generated from a protected image and a reference output generated from the unprotected original. This approach fundamentally overlooks the core requirement of image immunization, which is to disrupt semantic alignment with attacker intent, regardless of deviation from any specific output. We argue that immunization success should instead be defined by the edited output either semantically mismatching the prompt or suffering substantial perceptual degradation, both of which thwart malicious intent. To operationalize this principle, we propose Synergistic Intermediate Feature Manipulation (SIFM), a method that strategically perturbs intermediate diffusion features through dual synergistic objectives: (1) maximizing feature divergence from the original edit trajectory to disrupt semantic alignment with the expected edit, and (2) minimizing feature norms to induce perceptual degradation. Furthermore, we introduce the Immunization Success Rate (ISR), a metric designed to rigorously quantify true immunization efficacy for the first time. ISR measures the proportion of edits in which immunization induces either semantic failure relative to the prompt or significant perceptual degradation, assessed via Multimodal Large Language Models (MLLMs). Extensive experiments show that SIFM achieves state-of-the-art performance in safeguarding visual content against malicious diffusion-based manipulation.
Index Terms—Diffusion Models, Image Editing, Image Immunization.
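To make the two objectives in the abstract concrete, the following is a minimal numerical sketch of a dual-objective adversarial loss. The feature shapes, the weight `lam`, the step size, and the closed form of the loss are illustrative assumptions, not the paper's implementation; in SIFM the perturbation lives in pixel space and acts on intermediate diffusion features through the model, whereas here it is applied to mock features directly for brevity.

```python
import numpy as np

def sifm_loss(f_adv, f_ref, lam=0.1):
    """Toy stand-in for SIFM's dual synergistic objectives.

    Minimizing this loss simultaneously
      (1) maximizes divergence of the perturbed intermediate features
          f_adv from the reference edit-trajectory features f_ref, and
      (2) keeps the feature norm of f_adv small.
    """
    divergence = np.sum((f_adv - f_ref) ** 2)  # objective (1): push away
    norm_term = np.sum(f_adv ** 2)             # objective (2): shrink norms
    return -divergence + lam * norm_term

rng = np.random.default_rng(0)
f_ref = rng.normal(size=(4, 8))   # mock intermediate diffusion features

# Projected gradient descent on the loss, keeping the perturbation in an
# L-infinity ball of radius eps (mirroring an imperceptibility budget).
eps, lr, lam = 0.5, 0.05, 0.1
delta = np.zeros_like(f_ref)
for _ in range(50):
    f_adv = f_ref + delta
    grad = -2.0 * (f_adv - f_ref) + 2.0 * lam * f_adv  # d(loss)/d(f_adv)
    delta = np.clip(delta - lr * grad, -eps, eps)      # projected step
f_adv = f_ref + delta
```

After optimization, `f_adv` diverges from `f_ref` within the budget while the norm penalty keeps its magnitudes from growing, which is the synergy the two objectives are meant to achieve.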
I. INTRODUCTION
Recent breakthroughs in diffusion models (DMs) have revolutionized text-to-image synthesis and manipulation [1]–[9], enabling high-fidelity generation and fine-grained editing through natural language guidance [10]–[21]. While these capabilities unlock transformative creative tools, they also introduce severe ethical risks. Malicious actors could exploit DMs to generate deepfakes or forged content for disinformation campaigns, privacy violations, or manipulation of public discourse [22]. Such threats erode trust in digital media and risk destabilizing socio-political systems, making the development of robust safeguards against harmful edits a critical priority. One promising defense paradigm is image immunization, which addresses this challenge by embedding imperceptible perturbations into images to proactively disrupt unauthorized edits [23]–[31].

Jie Zhang, Shiguang Shan, and Xilin Chen are with the State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences (CAS), Beijing 100190, China, and also with the University of Chinese Academy of Sciences, Beijing 100049, China (e-mail: zhangjie@ict.ac.cn; sgshan@ict.ac.cn; xlchen@ict.ac.cn).
Guoying Zhao is with the Center for Machine Vision and Signal Analysis, University of Oulu, 90014 Oulu, Finland (e-mail: guoying.zhao@oulu.fi).
Shuai Dong is with the School of Computer Science, China University of Geosciences, Wuhan 430074, China (e-mail: dongshuai iu@163.com).
Despite notable advancements in mitigating malicious edits [27]–[32], the prevailing standard for defining immunization success remains superficial, predominantly relying on visual dissimilarity between the edited output of the immunized image and the specific edited output of the unprotected original. This approach is fundamentally limited because such a reference edit represents merely one possible outcome among a spectrum of valid edits for a given prompt, especially given the inherent variability of diffusion models. Consequently, deviation from such a non-unique reference does not inherently signify successful immunization. Critically, these evaluations overlook the core requirement that effective protection must disrupt semantic alignment with the attacker's intent, irrespective of comparison to any specific reference instance. For example, with the prompt "make the hairstyles look more gothic" as shown in Fig. 1, various "gothic" interpretations can be semantically valid yet look very different from the edited original. Therefore, significant visual deviation alone does not mean a malicious edit has been prevented. This inadequacy of current standards to genuinely assess immunization raises a critical question: what constitutes an accurate standard for immunization success?
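The ISR metric introduced in the abstract operationalizes such a standard as a simple proportion over per-edit verdicts. The sketch below mocks those verdicts as boolean pairs; in the paper they are produced by an MLLM judge, and the function name and input format here are illustrative assumptions.

```python
def immunization_success_rate(judgments):
    """Immunization Success Rate: the fraction of edits whose output
    either semantically mismatches the prompt OR shows significant
    perceptual degradation. Each judgment is a boolean pair
    (semantic_mismatch, perceptual_degradation); here the verdicts are
    given directly rather than obtained from an MLLM judge.
    """
    if not judgments:
        raise ValueError("need at least one edit to score")
    hits = sum(1 for mismatch, degraded in judgments if mismatch or degraded)
    return hits / len(judgments)

# Four hypothetical edits: one mismatched, one both mismatched and
# degraded, one degraded only, and one edit that slipped through.
verdicts = [(True, False), (True, True), (False, True), (False, False)]
isr = immunization_success_rate(verdicts)  # 3 of 4 edits thwarted -> 0.75
```

Note that either failure mode alone counts as a success: the metric scores whether the attacker's intent was thwarted, not distance from any particular reference output.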
To address this question, we redefine successful image immunization through an adversarial lens