Adversarial Robustness in Zero-Shot Learning: An Empirical Study on Class- and Concept-Level Vulnerabilities
Zero-Shot Learning (ZSL) aims to enable image classifiers to recognize images from unseen classes that were not included during training. Unlike traditional supervised classification, ZSL typically relies on learning a mapping from visual features to predefined, human-understandable class concepts. While ZSL models promise to improve generalization and interpretability, their robustness under systematic input perturbations remains unclear. In this study, we present an empirical analysis of the robustness of existing ZSL methods at both the class level and the concept level. Specifically, we successfully disrupt their class predictions with the well-known non-targeted class attack (clsA). However, in the Generalized Zero-Shot Learning (GZSL) setting, we observe that clsA succeeds only at the original best calibration point. After the attack, the optimal calibration point shifts, and ZSL models maintain relatively strong performance at other calibration points, indicating that clsA yields only a spurious attack success in GZSL. To address this, we propose the Class-Bias Enhanced Attack (CBEA), which eliminates GZSL accuracy across all calibration points by widening the gap between seen- and unseen-class probabilities. Next, at the concept level, we introduce two novel attack modes: the Class-Preserving Concept Attack (CPconA) and the Non-Class-Preserving Concept Attack (NCPconA). Our extensive experiments evaluate three representative ZSL models spanning various architectures from the past three years, and reveal that ZSL models are vulnerable not only to the traditional class attack but also to concept-based attacks, which allow malicious actors to manipulate class predictions simply by erasing or introducing concepts. Our findings highlight a significant performance gap among existing approaches, emphasizing the need for improved adversarial robustness in current ZSL models.
💡 Research Summary
This paper provides the first comprehensive study of adversarial robustness in zero‑shot learning (ZSL) and its generalized variant (GZSL). Unlike conventional supervised classifiers, ZSL models rely on an intermediate concept (attribute) prediction stage (visual → concept → class) and suffer from a strong bias toward seen classes because they are trained only on seen data. The authors first evaluate the well‑known non‑target class attack (clsA) on several recent embedding‑based ZSL models (ReZSL, PSVMA, ZeroMamba) across three benchmark datasets (CUB, AWA2, SUN). While clsA dramatically reduces ZSL accuracy, in the GZSL setting its effect is limited to the originally calibrated point (γ). When the calibration hyper‑parameter is shifted, the models recover most of their performance, revealing a “spurious attack success” that can mislead robustness assessments.
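The calibration mechanism behind this "spurious success" is the standard calibrated-stacking trick used in GZSL: a constant γ is subtracted from seen-class scores before the argmax, trading seen accuracy for unseen accuracy. A minimal sketch (the helper `calibrated_predict` and the toy numbers are illustrative, not the paper's code):

```python
import numpy as np

def calibrated_predict(logits, seen_mask, gamma):
    """Calibrated stacking for GZSL (hypothetical helper, not the paper's code).

    Subtracts a calibration constant `gamma` from the scores of seen classes,
    so unseen classes get a fair chance at test time despite the seen-class bias.
    """
    adjusted = logits - gamma * seen_mask  # penalize seen-class scores only
    return int(np.argmax(adjusted))

# Toy example: classes 0-2 are seen, classes 3-4 are unseen.
logits = np.array([2.0, 1.5, 1.0, 1.8, 0.5])
seen_mask = np.array([1.0, 1.0, 1.0, 0.0, 0.0])

print(calibrated_predict(logits, seen_mask, gamma=0.0))  # 0: a seen class wins
print(calibrated_predict(logits, seen_mask, gamma=0.5))  # 3: an unseen class wins
```

Because the prediction depends on γ, an attack that only degrades accuracy at one fixed γ can be largely undone by re-sweeping the calibration point, which is exactly the recovery effect the paper reports.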
To overcome this limitation, the paper introduces the Class‑Bias Enhanced Attack (CBEA). CBEA augments the adversarial loss with a term that explicitly widens the probability gap between seen and unseen classes, thereby amplifying the intrinsic bias. As a result, accuracy collapses to near‑zero across all calibration points, demonstrating that bias manipulation is a more reliable way to break GZSL systems than merely perturbing class scores.
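The description above suggests an adversarial objective of roughly the following shape; the exact loss is not reproduced here, so this sketch (including the weighting `lam`) is an assumed form based on the summary, not the paper's implementation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cbea_loss(logits, true_label, seen_mask, lam=1.0):
    """Sketch of a CBEA-style objective (assumed form, not the paper's exact loss).

    The attacker ascends this value w.r.t. the input: the cross-entropy term
    pushes the prediction away from the true class, while the bias term widens
    the gap between total seen-class and total unseen-class probability mass.
    """
    p = softmax(logits)
    ce = -np.log(p[true_label] + 1e-12)                    # standard untargeted term
    bias_gap = p[seen_mask == 1].sum() - p[seen_mask == 0].sum()
    return ce + lam * bias_gap                             # amplify the seen-class bias
```

Maximizing the bias term pushes nearly all probability mass onto seen classes, so no single calibration shift can restore unseen-class accuracy, which would explain why CBEA degrades performance at every γ.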
Beyond class‑level attacks, the authors propose two novel concept‑level attacks that target the intermediate attribute prediction. The Class‑Preserving Concept Attack (CPconA) degrades specific attribute predictions while keeping the final class label unchanged, exposing a mismatch between human‑interpretable explanations and model decisions. The Non‑Class‑Preserving Concept Attack (NCPconA) manipulates attributes so that the final class prediction itself changes, effectively inserting or erasing concepts (e.g., adding “wings” to a non‑bird image) to force misclassification.
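The two concept-level modes can be pictured as two different attack objectives over the intermediate attribute scores. The forms below are illustrative reconstructions from the summary (the function names, the L2 pull toward a target attribute signature, and the weight `mu` are all assumptions, not the paper's definitions):

```python
import numpy as np

def cpcona_objective(attr_scores, erase_idx, class_logits, true_label, mu=1.0):
    """Class-preserving concept attack objective (illustrative, assumed form).

    Ascending this value drives the targeted attribute scores down ("erasing"
    those concepts) while a reward on the true-class logit keeps the final
    class prediction unchanged, breaking the explanation but not the label.
    """
    erased = -attr_scores[erase_idx].sum()      # suppress the chosen concepts
    keep_class = mu * class_logits[true_label]  # preserve the class decision
    return erased + keep_class

def ncpcona_objective(attr_scores, target_attrs):
    """Non-class-preserving concept attack objective (illustrative).

    Pulls the predicted attribute vector toward another class's attribute
    signature (e.g. adding "wings"), so the class decision made from those
    concepts flips as well.
    """
    return -np.square(attr_scores - target_attrs).sum()  # maximize = shrink the L2 gap
```

The key contrast is the class-consistency term: CPconA keeps it, so the label survives while the concepts are corrupted; NCPconA drops it and instead steers the concepts toward a different class signature.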
Extensive experiments show: (1) clsA’s impact on GZSL is highly dependent on the calibration γ; (2) CBEA eliminates accuracy for any γ, confirming its robustness‑breaking power; (3) CPconA reduces concept prediction accuracy by 30‑40 % while only marginally affecting class accuracy, highlighting a vulnerability of interpretability; (4) NCPconA can induce class switches in more than 60 % of cases by altering a few attributes. The attacks are evaluated under a standard PGD‑10 setting with ε = 8/255, ensuring comparability.
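For reference, the PGD-10, ε = 8/255 setting mentioned above follows the standard L-infinity projected gradient descent recipe. A generic sketch (the step size `alpha` and the `grad_fn` interface are assumptions; only the step count and budget come from the summary):

```python
import numpy as np

def pgd_attack(grad_fn, x, eps=8/255, steps=10, alpha=2/255):
    """Generic L-inf PGD sketch matching the reported setting (10 steps, eps = 8/255).

    `grad_fn` is assumed to return the gradient of the attack objective
    (e.g. clsA, CBEA, or a concept objective) w.r.t. the input image.
    """
    x_adv = x.copy()
    for _ in range(steps):
        g = grad_fn(x_adv)
        x_adv = x_adv + alpha * np.sign(g)        # signed-gradient ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into the L-inf ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep valid pixel range
    return x_adv
```

Because all four attacks share this outer loop and only swap the objective inside `grad_fn`, the reported numbers are directly comparable across attack modes.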
The findings underscore two critical insights: (i) GZSL’s reliance on a single calibration parameter can mask true adversarial weakness, and (ii) the concept prediction stage is itself an attack surface that can be exploited without directly targeting class logits. The paper calls for future work on (a) robust loss functions that regularize attribute predictions, (b) meta‑learning or adaptive calibration schemes that are resistant to bias‑enhancing attacks, and (c) multi‑objective optimization that balances interpretability with adversarial robustness. By exposing these vulnerabilities, the study sets a new benchmark for evaluating and hardening ZSL systems, especially for safety‑critical applications such as medical imaging and autonomous driving.