RPP: A Certified Poisoned-Sample Detection Framework for Backdoor Attacks under Dataset Imbalance


Deep neural networks are highly susceptible to backdoor attacks, yet most defense methods to date rely on balanced data, overlooking the pervasive class imbalance in real-world scenarios that can amplify backdoor threats. This paper presents the first in-depth investigation of how dataset imbalance amplifies backdoor vulnerability, showing that (i) imbalance induces a majority-class bias that increases susceptibility and (ii) conventional defenses degrade significantly as the imbalance grows. To address this, we propose Randomized Probability Perturbation (RPP), a certified poisoned-sample detection framework that operates in a black-box setting using only model output probabilities. For any inspected sample, RPP determines whether the input has been backdoor-manipulated, while offering provable within-domain detectability guarantees and a probabilistic upper bound on the false positive rate. Extensive experiments on five benchmarks (MNIST, SVHN, CIFAR-10, TinyImageNet, and ImageNet10) covering 10 backdoor attacks and 12 baseline defenses show that RPP achieves significantly higher detection accuracy than state-of-the-art defenses, particularly under dataset imbalance. RPP establishes a theoretical and practical foundation for defending against backdoor attacks in real-world environments with imbalanced data.


💡 Research Summary

Backdoor attacks on deep neural networks have become a serious security concern, yet most existing defenses assume that the training data are balanced. This paper uncovers a previously overlooked vulnerability: class imbalance dramatically amplifies backdoor success and simultaneously degrades the performance of state-of-the-art defenses. By systematically varying the imbalance ratio (ρ = 2, 10, 100, 200) on both long-tailed and step-imbalanced versions of five benchmark datasets (MNIST, SVHN, CIFAR-10, TinyImageNet, ImageNet-10), the authors demonstrate two key phenomena. First, the attack success rate (ASR) rises sharply as ρ increases, even when the poisoning budget remains fixed. Second, defenses that rely on global distributional cues, such as AC, ASSET, SCALE-UP, and many empirical cleansing methods, experience a steep drop in true-positive rate (TPR) and a surge in false-positive rate (FPR) under the same imbalance conditions.
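To make the experimental setup concrete, the two imbalance regimes can be sketched as below. The exact per-class sampling rule is not given in the summary, so this uses a common construction (exponential decay for long-tailed splits, a hard cut for step imbalance); the function names and the 50% minority fraction are illustrative assumptions.

```python
import numpy as np

def long_tailed_counts(n_max, num_classes, rho):
    """Per-class sample counts for a long-tailed split with imbalance
    ratio rho = n_max / n_min, decaying exponentially across classes.
    (Assumed construction, not taken verbatim from the paper.)"""
    k = np.arange(num_classes)
    return np.round(n_max * rho ** (-k / (num_classes - 1))).astype(int)

def step_counts(n_max, num_classes, rho, minority_fraction=0.5):
    """Step imbalance: the last `minority_fraction` of classes are cut
    to n_max / rho, while the remaining classes keep n_max samples."""
    counts = np.full(num_classes, n_max)
    n_minor = int(num_classes * minority_fraction)
    counts[num_classes - n_minor:] = max(1, n_max // rho)
    return counts

# Example: a CIFAR-10-style setup (5000 samples per class) with rho = 100.
lt = long_tailed_counts(5000, 10, 100)  # head class 5000, tail class 50
st = step_counts(5000, 10, 100)         # first 5 classes 5000, last 5 classes 50
```

Sweeping ρ over {2, 10, 100, 200} with either rule reproduces the kind of controlled-imbalance grid the authors evaluate.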

To address this gap, the authors propose Randomized Probability Perturbation (RPP), a certified poisoned-sample detector that operates in a black-box setting using only the model’s output probability vector. The core idea is that backdoor-triggered inputs exhibit unusually stable prediction probabilities when subjected to small random perturbations, whereas clean inputs fluctuate more noticeably. RPP quantifies this stability by sampling M random noise vectors ε₁,…,ε_M, computing the model’s probability vectors p(x+ε_i) for each perturbed input, and measuring the average ℓ₂ change ΔP(x) = E_ε[‖p(x+ε) − p(x)‖₂] ≈ (1/M) Σᵢ ‖p(x+ε_i) − p(x)‖₂. A small ΔP(x) signals the abnormal prediction stability characteristic of backdoor-triggered inputs.
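The stability statistic described above can be sketched in a few lines. Note that the noise scale `sigma`, the sample count `M`, and the toy black-box model are all illustrative assumptions; the paper's actual noise distribution and decision threshold are not specified in this summary.

```python
import numpy as np

def rpp_stability_score(predict_proba, x, sigma=0.05, M=32, seed=None):
    """Monte-Carlo estimate of ΔP(x) = E_ε[‖p(x+ε) − p(x)‖₂]:
    the average L2 change in the model's output probabilities under
    small random input perturbations. A low score indicates the
    abnormal stability associated with backdoor-triggered inputs."""
    rng = np.random.default_rng(seed)
    p_clean = predict_proba(x)
    diffs = [
        np.linalg.norm(predict_proba(x + rng.normal(0.0, sigma, size=x.shape)) - p_clean)
        for _ in range(M)
    ]
    return float(np.mean(diffs))

# Toy stand-in for the black-box classifier: softmax over a fixed linear map.
_W = np.random.default_rng(0).normal(size=(10, 32))
def toy_predict_proba(x):
    logits = _W @ x
    e = np.exp(logits - logits.max())
    return e / e.sum()

score = rpp_stability_score(toy_predict_proba, np.zeros(32), seed=0)
```

Samples whose score falls below some calibrated threshold would be flagged as suspected poisons; calibrating that threshold is where RPP's certified false-positive bound would come into play.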

