Sample-wise Adaptive Weighting for Transfer Consistency in Adversarial Distillation
Adversarial distillation in the standard min-max adversarial training framework aims to transfer adversarial robustness from a large, robust teacher network to a compact student. However, existing work often neglects to incorporate state-of-the-art robust teachers. Through extensive analysis, we find that stronger teachers do not necessarily yield more robust students, a phenomenon known as robust saturation. While this is typically attributed to capacity gaps, we show that such explanations are incomplete. Instead, we identify adversarial transferability (the fraction of student-crafted adversarial examples that remain effective against the teacher) as a key factor in successful robustness transfer. Based on this insight, we propose Sample-wise Adaptive Adversarial Distillation (SAAD), which reweights training examples by their measured transferability without incurring additional computational cost. Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet show that SAAD consistently improves AutoAttack robustness over prior methods. Our code is available at https://github.com/HongsinLee/saad.
💡 Research Summary
This paper presents a deep investigation into a counterintuitive phenomenon in adversarial distillation (AD) and offers a novel solution. AD aims to transfer the robustness of a large, adversarially trained teacher model to a compact student model within a min-max adversarial training framework. While leveraging stronger, state-of-the-art robust teachers seems logical, the authors find that this often leads to worse student robustness—a problem termed “robust saturation.” Contrary to prior explanations attributing this to capacity gaps, the paper identifies “adversarial transferability” as the key factor.
Adversarial transferability is defined as the fraction of adversarial examples crafted on the student model that remain effective (i.e., are misclassified) by the teacher. Through extensive analysis, the authors categorize teachers into Effective Robust Teachers (ERTs) and Ineffective Robust Teachers (IRTs). ERTs exhibit higher entropy (uncertainty) in their outputs on student-crafted attacks and have a high transferability ratio. IRTs, despite being robust themselves, produce overconfident (low-entropy) outputs on these attacks and have a low transferability ratio. The authors show that learning from the overconfident, “hard” soft labels of IRTs drastically increases the student’s “adversarial variance,” leading to unstable training and severe robust overfitting.
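The transferability ratio described above can be estimated directly from model predictions on student-crafted adversarial examples. The following is a minimal NumPy sketch of that measurement; the function name and the choice to count only examples that actually fool the student are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def transferability_ratio(student_logits_adv, teacher_logits_adv, labels):
    """Hypothetical helper: fraction of adversarial examples, crafted
    against the student, that the teacher also misclassifies.

    All inputs are NumPy arrays: logits of shape (N, num_classes) and
    integer labels of shape (N,)."""
    student_pred = student_logits_adv.argmax(axis=1)
    teacher_pred = teacher_logits_adv.argmax(axis=1)
    # Only examples that actually fool the student count as successful attacks.
    fooled_student = student_pred != labels
    if not fooled_student.any():
        return 0.0
    # Of those, how many also fool the teacher?
    transferable = fooled_student & (teacher_pred != labels)
    return transferable.sum() / fooled_student.sum()
```

On this view, an ERT would show a high ratio (student attacks transfer to it), while an IRT would show a low ratio despite its own robustness.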
To address this core issue, the authors propose Sample-wise Adaptive Adversarial Distillation (SAAD). The core idea of SAAD is to adaptively reweight training samples based on their measured transferability. During the adversarial distillation step, SAAD assigns higher weights to transferable adversarial samples (those that fool both student and teacher) and lower weights to non-transferable ones. This focuses the student’s learning on the teacher’s robust decision boundaries for samples where the teacher’s knowledge is directly relevant and mitigates the high-variance effect of non-transferable, noisy samples. Additionally, for clean samples, SAAD introduces a distillation term weighted by the inverse transferability, which helps improve clean accuracy by focusing on samples where the student’s and teacher’s outputs differ significantly.
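The reweighting idea can be sketched as a per-sample weighted distillation loss. The snippet below is an illustrative NumPy approximation, not the authors' exact formulation: the weighting scheme (an affine map from a binary transferability indicator to weights) and all function names are assumptions made for clarity.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def weighted_kd_loss(student_logits_adv, teacher_logits_adv, labels):
    """Sketch of sample-wise adaptive weighting: up-weight adversarial
    samples that transfer to the teacher, down-weight those that do not.
    The specific weight mapping below is hypothetical."""
    p_s = softmax(student_logits_adv)
    p_t = softmax(teacher_logits_adv)
    # Per-sample indicator: 1.0 if the teacher is also fooled by the attack.
    transferable = (teacher_logits_adv.argmax(axis=1) != labels).astype(float)
    # Assumed weighting: transferable samples get weight 1.0, others 0.5.
    w = 0.5 + 0.5 * transferable
    # KL(teacher || student) per sample, with a small epsilon for stability.
    kl = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=1)
    return (w * kl).mean()
```

In an actual training loop, the teacher predictions needed for the indicator are already computed for distillation, which is consistent with the claim that the weights come at no extra adversarial-generation cost.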
A significant advantage of SAAD is its efficiency; it estimates transferability on-the-fly during the standard adversarial training loop without requiring extra adversarial example generation or substantial computational overhead.
Comprehensive experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that SAAD consistently outperforms prior AD methods (including ARD, RSLAD, IAD, AdaAD, and IGDM) in terms of AutoAttack robustness. Crucially, SAAD successfully enables students to distill robustness from powerful modern teachers that previously led to poor performance with existing methods. The work establishes adversarial transferability as a critical diagnostic tool and performance predictor for adversarial distillation, providing both a clear explanation for robust saturation and an effective, practical algorithm to overcome it.