Dashed Line Defense: Plug-And-Play Defense Against Adaptive Score-Based Query Attacks
Score-based query attacks pose a serious threat to deep learning models by crafting adversarial examples (AEs) using only black-box access to model output scores, iteratively optimizing inputs based on observed loss values. While recent runtime defenses attempt to disrupt this process via output perturbation, most either require access to model parameters or fail when attackers adapt their tactics. In this paper, we first reveal that even the state-of-the-art plug-and-play defense can be bypassed by adaptive attacks, exposing a critical limitation of existing runtime defenses. We then propose Dashed Line Defense (DLD), a plug-and-play post-processing method specifically designed to withstand adaptive query strategies. By introducing ambiguity in how the observed loss reflects the true adversarial strength of candidate examples, DLD prevents attackers from reliably analyzing and adapting their queries, effectively disrupting the AE generation process. We provide theoretical guarantees of DLD's defense capability and validate its effectiveness through experiments on ImageNet, demonstrating that DLD consistently outperforms prior defenses, even under worst-case adaptive attacks, while preserving the model's predicted labels.
💡 Research Summary
The paper addresses the growing threat of score‑based black‑box query attacks (SQAs), which generate adversarial examples by repeatedly querying a model’s output scores and optimizing a loss function without any access to model parameters. While training‑time defenses such as adversarial training are costly and often impractical, runtime defenses that perturb model outputs at inference time have become attractive because they require no retraining. Among these, the state‑of‑the‑art plug‑and‑play post‑processing defense, Adversarial Attack on Attackers (AAA), claims robustness by mapping the true margin loss through a piecewise linear (AAA‑linear) or sinusoidal (AAA‑sine) function, thereby confusing the attacker’s gradient estimation.
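The core idea of such a loss-mapping defense can be illustrated with a short sketch. This is a simplified illustration of the sinusoidal variant's principle, not the authors' exact AAA formulation: a periodic term is superimposed on the true margin loss so that, on sub-intervals of each period, the observed loss moves opposite to the true loss, corrupting the attacker's finite-difference gradient estimates.

```python
import math

def aaa_sine_sketch(margin_loss, period=1.0, amp=0.3):
    # Illustrative only: with amp * 2*pi / period > 1, the mapping is
    # non-monotonic, so on parts of each period the observed loss
    # decreases while the true margin loss increases (and vice versa).
    return margin_loss + amp * math.sin(2 * math.pi * margin_loss / period)
```

At the period boundaries the perturbation vanishes (`aaa_sine_sketch(0.0) == 0.0`), so the defended scores stay close to the true ones on average while still misleading local search.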
The authors first demonstrate that AAA is vulnerable to adaptive attacks. AAA‑linear can be bypassed simply by reversing the optimization direction: after the attacker's loss stops decreasing, they switch from minimization to maximization, exploiting the fact that the linear mapping only flips the sign of the loss within each interval. AAA‑sine, designed to defend against both minimization and maximization, still suffers because each monotonic interval can mislead only one direction at a time. By alternating between minimization and maximization whenever loss stagnation is detected (Algorithm 2), the attacker dramatically reduces the defense's effectiveness; under this adaptive tactic, AAA‑sine's under‑attack accuracy drops from 61.7 % to 41.5 %.
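The alternating strategy described above can be sketched as a simple query loop. The function and parameter names (`query_loss`, `perturb`, `patience`) are illustrative, not the paper's Algorithm 2 verbatim: the attacker pursues one optimization direction until the observed loss stagnates, then flips direction.

```python
def adaptive_attack(query_loss, perturb, x0, n_queries=100, patience=5):
    """Sketch of an alternating adaptive query strategy: minimize the
    observed loss, and flip the optimization direction whenever no
    progress is made for `patience` consecutive queries."""
    x, best = x0, query_loss(x0)
    direction, stall = -1, 0          # -1: minimize, +1: maximize
    for _ in range(n_queries):
        cand = perturb(x, direction)  # propose a candidate in this direction
        loss = query_loss(cand)
        if direction * (loss - best) > 0:   # progress in current direction
            x, best, stall = cand, loss, 0
        else:
            stall += 1
            if stall >= patience:
                direction, stall = -direction, 0  # flip min <-> max
    return x
```

Against AAA-sine, each flip moves the attacker out of a monotonic interval that was misleading the current direction, which is why the defense's per-interval deception fails under this tactic.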
To overcome these shortcomings, the paper proposes Dashed Line Defense (DLD), a novel post‑processing module that introduces deliberate ambiguity between the observed loss and the true loss. DLD defines a loss‑mapping function D_post that is non‑smooth and non‑symmetric. For a chosen interval length τ, a bias term L_bias(L;τ)=⌊L/τ⌋·τ partitions the loss axis into consecutive blocks. Within each block, D_post either outputs a "high" transformed loss L_high or a "low" transformed loss L_low, controlled by a scaling parameter h.
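The block structure can be sketched as follows. Only L_bias matches the formula given above; the block-parity rule for choosing between L_high and L_low and the h-scaled offsets are illustrative assumptions, since the paper's precise definitions are not reproduced in this summary.

```python
import math

def l_bias(L, tau):
    # L_bias(L; tau) = floor(L / tau) * tau -- start of the block containing L
    return math.floor(L / tau) * tau

def d_post(L, tau=1.0, h=0.7):
    # Hypothetical instantiation of D_post: within each block of length tau,
    # emit either a "high" or a "low" transformed loss. The parity-based
    # selection and the offsets below are illustrative guesses, not the
    # paper's exact construction.
    bias = l_bias(L, tau)
    if math.floor(L / tau) % 2 == 0:
        return bias + h * tau          # L_high
    return bias + (1.0 - h) * tau      # L_low
```

Because every true loss within a block maps to the same observed value, the attacker cannot tell from the output whether a query actually improved the candidate, which is the ambiguity DLD exploits to defeat adaptive direction-switching.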