In hard-label black-box adversarial attacks, where only the top-1 predicted label is accessible, the prohibitive query complexity poses a major obstacle to practical deployment. In this paper, we focus on optimizing a representative class of attacks that search for the optimal ray direction yielding the minimum ℓ2-norm perturbation required to move a benign image into the adversarial region. Inspired by Nesterov's Accelerated Gradient (NAG), we propose a momentum-based algorithm, ARS-OPT, which proactively estimates the gradient with respect to a future ray direction inferred from accumulated momentum. We provide a theoretical analysis of its convergence behavior, showing that ARS-OPT enables more accurate directional updates and achieves faster, more stable optimization. To further accelerate convergence, we incorporate surrogate-model priors into ARS-OPT's gradient estimation, resulting in PARS-OPT with enhanced performance. The superiority of our approach is supported by theoretical guarantees under standard assumptions. Extensive experiments on ImageNet and CIFAR-10 demonstrate that our method surpasses 13 state-of-the-art approaches in query efficiency.
We focus on hard-label adversarial attacks. Considered among the most practical and challenging black-box attacks, hard-label attacks operate under strict information constraints. While white-box attacks (Goodfellow, Shlens, and Szegedy 2015; Madry et al. 2018) leverage model parameters and gradients, and score-based attacks (Ma, Chen, and Yong 2021) exploit confidence scores, hard-label attacks rely solely on top-1 predicted labels. This makes the efficient generation of adversarial examples substantially more difficult while enhancing their practical applicability.
Why study query-based black-box adversarial attacks under the hard-label setting? Real-world machine-learning services such as cloud vision APIs and biometric recognizers often reveal nothing more than the final predicted decision (i.e., the top-1 label) to external users. With gradients and confidence scores stripped away, an attacker is forced to treat the model as a hard-label black box to probe its decision boundary. This stringent setting accurately reflects the limited feedback of deployed services and raises three key challenges. (1) Minimal feedback: Each query yields only a hard-label response, demanding efficient exploration strategies. (2) Practical relevance: It closely mirrors restricted commercial platforms where probability scores and internal details are deliberately hidden. (3) Security-critical: Hard-label attacks reveal vulnerabilities in “security-through-obscurity” systems and underscore the urgent need for defenses against adversaries with minimal information. Consequently, designing query-efficient attacks based solely on hard-label feedback is essential for vulnerability assessment and robust defenses.
Why are hard-label attacks challenging? Because a model’s predicted label typically changes only when an input moves across or near its decision boundary, hard-label attacks must restrict their search to this narrow region, making the optimization especially challenging. Early hard-label attacks like Boundary Attack (BA) (Brendel, Rauber, and Bethge 2018) and Biased BA (Brunner et al. 2019) initialize from a sample already in the adversarial region and progressively reduce the perturbation by stepping toward the original image while exploring directions on the decision boundary via randomly sampled spherical vectors. However, these approaches remain highly inefficient in terms of query cost: they rely almost entirely on random sampling and neglect valuable information from past queries, which impedes effective perturbation reduction. To address this challenge, recent studies have adopted zeroth-order (ZO) optimization techniques, which leverage boundary information more effectively to identify adversarial examples. Existing ZO-based attacks, such as HopSkipJumpAttack (HSJA) (Chen, Jordan, and Wainwright 2020), OPT (Cheng et al. 2019), Sign-OPT (Cheng et al. 2020), and Prior-OPT (Ma et al. 2025), primarily focus on improving gradient estimation through finite differences. However, their optimization strategies rely on vanilla gradient descent, overlooking well-established acceleration methods such as momentum and Nesterov’s accelerated gradient, which can enhance convergence rates even when the gradient estimation quality remains unchanged. To address these limitations, we propose ARS-OPT, a novel ZO optimization algorithm incorporating accelerated random search (ARS) (Nesterov and Spokoiny 2017). Our theoretical analysis demonstrates that ARS-OPT leverages second-order gradient information implicitly without requiring explicit Hessian estimation, and it establishes a bound on the expected gap between the objective value at iteration T and the optimum value. Building on this, we introduce PARS-OPT, which integrates transfer-based priors to improve gradient estimation. PARS-OPT further extends to combine priors from multiple surrogate models, delivering additional gains in attack performance. Extensive experiments on ImageNet, CIFAR-10, and a CLIP-based model demonstrate that our framework, consisting of ARS-OPT and its prior-enhanced variant PARS-OPT, outperforms 13 state-of-the-art baseline methods with superior query efficiency.
Our main contributions are summarized as follows.
• Novelty in hard-label attacks. We present ARS-OPT, a novel hard-label attack that accelerates convergence by estimating gradients along an interpolated “lookahead” direction, combining the search trajectory with accumulated momentum. We further introduce PARS-OPT, which integrates transfer-based priors from surrogate models to improve gradient estimation and enhance attack efficiency.
• Novelty in theoretical analysis. We establish an O(1/T²) convergence rate under standard assumptions, supported by the construction of an unbiased estimator of the true gradient that is essential for ensuring this rate. The theoretical analysis provides a principled explanation for the acceleration behavior of our approach and clarifies its underlying optimization dynamics.
Given a classifier ψ : R^d → R^C designed for a C-class classification task, and a correctly classified input image x ∈ [0, 1]^d, where d is the dimension of the input image, the adversary seeks to generate an adversarial example x_adv by crafting a minimal perturbation such that the classifier’s prediction for x_adv becomes incorrect. This adversarial objective can be formally expressed as:

min_{x_adv} ∥x_adv − x∥_p  s.t.  Φ(x_adv) = 1,  (1)
where ∥x_adv − x∥_p is the p-norm distortion, and the constraint Φ(x_adv) is defined as an attack success indicator:

Φ(x_adv) = 1 if ŷ ≠ y (untargeted attack) or ŷ = y_adv (targeted attack), and Φ(x_adv) = 0 otherwise.  (2)
Here, ŷ = arg max_{i∈{1,…,C}} ψ(x_adv)_i denotes the top-1 predicted label by classifier ψ, y is the true label of x, and y_adv is the target label in a targeted attack scenario.
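For concreteness, Φ can be evaluated with a single top-1 query. The sketch below is a minimal Python illustration; the `predict_label` callable is a hypothetical stand-in for a hard-label API that returns only the predicted class index.

```python
import numpy as np

def attack_success(predict_label, x_adv, y_true, y_target=None):
    """Hard-label success indicator Phi(x_adv), evaluated with one top-1 query.

    predict_label: hypothetical hard-label oracle mapping an image to its top-1 class index.
    Returns 1 if the attack goal is met, 0 otherwise.
    """
    y_hat = predict_label(np.clip(x_adv, 0.0, 1.0))  # keep the query inside the valid pixel range
    if y_target is None:
        return int(y_hat != y_true)    # untargeted: any label change counts as success
    return int(y_hat == y_target)      # targeted: must hit the chosen target label
```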
Following the ray-search methods (Cheng et al. 2019, 2020; Ma et al. 2025), we reformulate the optimization problem in Eq. (1) as finding the optimal ray direction θ* from x that yields the minimal distance f(θ) to the boundary of the adversarial region. This can be formulated as:

θ* = arg min_θ f(θ),  where  f(θ) = min{λ > 0 : Φ(x + λ·θ/∥θ∥) = 1}.  (3)
By convention, f(θ) = +∞ if the set is empty. Consequently, the resulting adversarial example is constructed as x_adv = x + f(θ*)·θ*/∥θ*∥, where θ* is the optimal solution obtained from the minimization problem defined in Eq. (3).

Figure 1: Illustration of a three-step update: first, compute the perturbation direction θ̃_t = (1 − α_t)θ_t + α_t m_t; then estimate gradients at θ̃_t using a biased g_1(θ̃_t) and an unbiased g_2(θ̃_t); finally, update θ_{t+1} and m_{t+1} via a gradient descent step. In ARS-OPT, g_1(θ̃_t) and g_2(θ̃_t) are collinear, but this does not hold in PARS-OPT. The circle represents the unit-norm constraint.
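In practice, f(θ) is evaluated using hard-label queries only. The sketch below follows the usual coarse-then-binary search used by OPT-style ray-search attacks, reusing the `attack_success` indicator above; the initial upper bound, growth factor, and tolerance are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np

def distance_along_ray(predict_label, x, theta, y_true, y_target=None,
                       lam_hi=1.0, tol=1e-3, lam_max=1e4):
    """Estimate f(theta): the smallest lambda with Phi(x + lambda * theta/||theta||) = 1,
    located by growing an upper bound and then bisecting on hard-label queries."""
    theta = theta / np.linalg.norm(theta)

    def is_adv(lam):
        return attack_success(predict_label, x + lam * theta, y_true, y_target) == 1

    # Grow lam_hi until the point is adversarial; by convention f(theta) = +inf if never reached.
    while not is_adv(lam_hi):
        lam_hi *= 2.0
        if lam_hi > lam_max:
            return np.inf
    lam_lo = 0.0
    # Bisection: keep lam_hi adversarial and lam_lo non-adversarial until the gap is small.
    while lam_hi - lam_lo > tol:
        lam_mid = 0.5 * (lam_lo + lam_hi)
        if is_adv(lam_mid):
            lam_hi = lam_mid
        else:
            lam_lo = lam_mid
    return lam_hi
```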
Previous works (Cheng et al. 2019, 2020; Ma et al. 2025) focus on efficient gradient estimation to optimize the direction θ, with the step size typically determined by line search. However, they do not explore any optimization acceleration techniques beyond gradient estimation. Next, we present an overview of ARS-OPT and its prior-enhanced variant PARS-OPT, both equipped with theoretical convergence guarantees.
Nesterov and Spokoiny (2017) propose an Accelerated Random Search (ARS) method for ZO optimization, which rigorously establishes explicit non-asymptotic convergence rates under various convexity and smoothness assumptions by introducing an accelerated ZO framework. In the score-based setting, Cheng et al. (2021) extend ARS to score-based attacks and provide an analysis of the convergence rate. However, in hard-label attacks, obtaining function values requires extensive binary searches, significantly reducing the query efficiency of gradient estimation based on finite differences.
To address these limitations, we introduce ARS-OPT, a novel ZO optimization framework that can be seamlessly augmented with transfer-based priors to further boost query efficiency. The primary challenge is accelerating convergence in gradient descent when only poorly estimated gradients are available. At iteration t, we employ the following three-step update process for θ_t (Fig. 1):
1. Compute the lookahead direction θ̃_t ← (1 − α_t)θ_t + α_t m_t, where m_0 is initialized to θ_0.
2. At θ̃_t, use multiple queries to estimate the gradients g_1(θ̃_t) (a biased estimator, e.g., via the Sign-OPT or Prior-OPT method) and g_2(θ̃_t) (an unbiased estimator of ∇f(θ̃_t)).
3. Update both parameters by a gradient descent step, yielding θ_{t+1} and m_{t+1}.
Inspired by Nesterov’s accelerated gradient method, our approach dynamically tracks two sequences, i.e., the direction θ_t and the momentum vector m_t, and then computes a lookahead vector θ̃_t by linearly interpolating between θ_t and m_t, controlled by an interpolation coefficient α_t. At θ̃_t, we estimate two gradients, g_1(θ̃_t) and g_2(θ̃_t), to compute the updates of θ_{t+1} and m_{t+1}, respectively. Although we adopt the same estimation procedure for g_1(θ̃_t) as in Prior-OPT, our algorithm converges substantially faster, as demonstrated by our experiments. The convergence guarantee of our approach relies on two technical assumptions: (1) g_2(θ̃_t) serves as an unbiased estimator of ∇f(θ̃_t), and (2) a second technical condition, with the full derivation given in the Appendix. We also note that our framework can incorporate various gradient estimation techniques, such as prior-guided estimation, to further improve performance. Our approach can be intuitively understood through the analogy of a walker descending a valley: rather than relying solely on the current slope, the walker looks ahead to anticipate the upcoming terrain and adjusts the direction of motion accordingly, thereby achieving smoother and faster progress toward the minimum.
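Step 1 of this process is simple to implement once ζ_t and γ_t are available. The sketch below computes α_t as the positive root of α_t² = ζ_t γ_t (1 − α_t) and forms the lookahead direction; the values of ζ_t and γ_t themselves come from the convergence analysis (Algorithm 1 and Appendix A) and are treated here as given inputs.

```python
import numpy as np

def lookahead_direction(theta_t, m_t, zeta_t, gamma_t):
    """Step 1 of ARS-OPT: interpolate between the current direction and the momentum term.

    alpha_t is the positive root of alpha^2 = zeta_t * gamma_t * (1 - alpha),
    i.e. of alpha^2 + c * alpha - c = 0 with c = zeta_t * gamma_t, which lies in (0, 1).
    """
    c = zeta_t * gamma_t
    alpha_t = (-c + np.sqrt(c * c + 4.0 * c)) / 2.0   # positive root of the quadratic
    theta_look = (1.0 - alpha_t) * theta_t + alpha_t * m_t
    return theta_look, alpha_t
```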
Our framework, spanning from Step 1 to Step 3, is compatible with various gradient estimation techniques, enabling flexible algorithmic implementations. In this section, we provide a detailed introduction to the fundamental algorithm, ARS-OPT. In Step 1, unlike standard gradient descent, the gradient is not computed at the current direction θ_t. Instead, the algorithm predicts a candidate ray direction θ̃_t by interpolating between the momentum vector m_t and the current direction θ_t. The sequences θ_t and m_t are referred to as the main sequence and the auxiliary sequence, respectively. θ̃_t is referred to as the lookahead position of θ_t and is computed via interpolation: θ̃_t ← (1 − α_t)θ_t + α_t m_t, where α_t ∈ [0, 1] is the interpolation coefficient. The value of α_t is defined as the positive root of the equation α_t² = ζ_t γ_t (1 − α_t), where γ_t is a scalar determined in Algorithm 1 and ζ_t = (1/L̂)·((2(q−1)+π)/(dπ))². This expression is derived from the convergence analysis of ARS-OPT, and this choice of α_t is critical to establishing the algorithm’s theoretical convergence guarantees. For detailed derivations, we refer readers to Appendix A. To maintain the two sequences, the optimization variable θ_t and the auxiliary variable m_t (which accumulates historical momentum to capture global optimization trends), we employ two gradient estimates, g_1(θ̃_t) and g_2(θ̃_t), to update θ_t and m_t, respectively; the estimates are given in Eqs. (4)-(6),
where d is the dimension of the input image, q is the number of vectors in gradient estimation, and v_t is the sign-based gradient estimate (Cheng et al. 2020), v_t := Σ_{i=1}^{q} sign(∇f(θ̃_t)^⊤ u_i)·u_i, in which each sign term is obtained by estimating the sign of the directional derivative with only a single query (Eq. (7)): following Sign-OPT, one hard-label query at x + f(θ̃_t)·(θ̃_t + ϵu_i)/∥θ̃_t + ϵu_i∥ determines whether the boundary distance increases or decreases along u_i.

Figure 2: Illustration of one iteration in PARS-OPT. We first form a lookahead point θ̃_t by linearly interpolating between the current direction θ_t and the momentum term m_t (with m_0 = θ_0). Next, we estimate v_t via a sign-based procedure over a set of randomly sampled orthonormal basis vectors (Step 2, Estimation: sample q − s random vectors, take s priors, then orthogonalize via Gram-Schmidt). Finally (Step 3, Update), we use v_t to compute the biased gradient estimate g_1(θ̃_t) and the unbiased estimate g_2(θ̃_t), which are then used to update θ_t and m_t, yielding θ_{t+1} and m_{t+1} for the next iteration.
Eq. (4) can be regarded as the projection of the true gradient onto v_t. Eq. (5) is an unbiased estimator of ∇f(θ̃_t), derived from Theorem 4.1.¹

Theorem 4.1. Let {u_1, u_2, …, u_q} be an orthonormal set obtained by orthogonalizing q vectors independently and uniformly sampled from the unit sphere in R^d. Suppose g is a fixed vector in R^d (for example, the true gradient to be estimated). Let v := Σ_{i=1}^{q} sign(g^⊤ u_i)·u_i and ĝ := (g^⊤ v̄)·v̄. Then we have

E[ĝ] = ((2(q−1)+π)/(dπ))·g.  (8)

The proof of Theorem 4.1 is shown in Appendix A. In Eq. (8), ĝ is equal to g_1(θ̃_t); consequently, the true gradient can be recovered as g = (dπ/(2(q−1)+π))·E[ĝ], which shows that g_2(θ̃_t) = (dπ/(2(q−1)+π))·g_1(θ̃_t) is an unbiased estimator of ∇f(θ̃_t).

¹Throughout this paper, for any vector v, we denote v̄ as its ℓ2-normalized vector, v̄ := v/∥v∥.
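The two estimators of Theorem 4.1 can be sketched directly. In the code below, `sign_dir(u)` is assumed to return the single-query sign of the directional derivative of f at θ̃_t along u (as in Sign-OPT), and `dir_deriv(v)` is assumed to return a finite-difference estimate of ∇f(θ̃_t)⊤v obtained with extra queries; both callables are placeholders for the query procedures described above.

```python
import numpy as np

def sample_orthonormal_directions(d, q, rng):
    """q orthonormal directions obtained by orthogonalizing i.i.d. Gaussian vectors,
    whose directions are uniform on the unit sphere."""
    a = rng.standard_normal((d, q))
    qmat, _ = np.linalg.qr(a)                        # columns form an orthonormal set
    return [qmat[:, i] for i in range(q)]

def estimate_gradients(sign_dir, dir_deriv, d, q, rng):
    """Biased and unbiased gradient estimates at the lookahead point (cf. Theorem 4.1)."""
    us = sample_orthonormal_directions(d, q, rng)
    v = sum(sign_dir(u) * u for u in us)             # v_t = sum_i sign(grad^T u_i) * u_i
    v_bar = v / np.linalg.norm(v)                    # l2-normalized v_t
    g1 = dir_deriv(v_bar) * v_bar                    # projection of the gradient onto v_bar (biased)
    g2 = (d * np.pi / (2.0 * (q - 1) + np.pi)) * g1  # debiased by the constant from Theorem 4.1
    return g1, g2
```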
ARS-OPT relies exclusively on random orthonormal vectors to estimate the gradient, which leads to inaccurate gradient approximation and poor query efficiency. To further enhance the efficiency of the algorithm, we propose a variant algorithm named Prior-guided ARS-OPT (PARS-OPT) within our framework. An ideal prior would be the gradient of f(θ) derived from a surrogate model. However, since f(θ) is non-differentiable due to the binary-search process, this gradient cannot be computed directly. To overcome this challenge, we employ a differentiable surrogate function h(θ, λ), defined in Eq. (9), following Ma et al. (2025), which ensures the gradient relationship ∇_θ h(θ_0, λ_0) = c·∇_θ f̂(θ_0). Here, f̂(·) is defined on the surrogate model ψ̂, λ_0 = f̂(θ_0) is treated as a constant scalar during differentiation, and c is a non-zero constant; h is expressed in terms of ψ̂_i := ψ̂(x + λ·θ/∥θ∥)_i, the i-th element of the output of the surrogate model ψ̂, where x is the original image. Given s non-zero vectors k_{t,1}, …, k_{t,s}, computed as ∇_θ h(θ_0, λ_0) from s surrogate models, and q − s randomly sampled vectors r_1, …, r_{q−s} ∼ N(0, I), we apply Gram-Schmidt orthogonalization to these q vectors to obtain an orthonormal set p_{t,1}, …, p_{t,s}, u_1, …, u_{q−s}, which is used by the gradient estimation formulas in Eq. (11) and Eq. (13). To ensure the convergence of PARS-OPT, we still require g_2(θ̃_t) to be an unbiased estimator of ∇f(θ̃_t); the corresponding proof is more involved than in ARS-OPT (see the Appendix for details).
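A transfer-based prior can be obtained by back-propagating through a surrogate model and then orthogonalized together with the random directions. The sketch below is only an illustration: it uses PyTorch and a margin-style loss as a stand-in for the paper's h(θ, λ) in Eq. (9) (untargeted case), and it assumes the surrogate is a differentiable classifier returning logits.

```python
import numpy as np
import torch

def surrogate_prior(surrogate, x, theta0, lam0, y_true):
    """Prior direction k_t = grad_theta h(theta_0, lam_0) on one surrogate model.
    A margin-style loss stands in for the paper's h; lam0 is held constant during differentiation."""
    theta = torch.tensor(theta0, dtype=torch.float32, requires_grad=True)
    x_t = torch.tensor(x, dtype=torch.float32)
    point = x_t + lam0 * theta / theta.norm()            # x + lam0 * theta / ||theta||
    logits = surrogate(point.unsqueeze(0)).squeeze(0)    # surrogate logits at the boundary point
    others = torch.cat([logits[:y_true], logits[y_true + 1:]])
    h = others.max() - logits[y_true]                    # assumed margin loss (untargeted)
    h.backward()
    return theta.grad.detach().numpy().ravel()           # flattened prior vector

def orthonormal_basis_with_priors(priors, d, q, rng):
    """Gram-Schmidt (via QR) over s flattened prior vectors followed by q - s Gaussian vectors,
    yielding p_{t,1..s}, u_{1..q-s} as used by PARS-OPT."""
    s = len(priors)
    cols = [p / np.linalg.norm(p) for p in priors]
    cols += [rng.standard_normal(d) for _ in range(q - s)]
    mat = np.stack(cols, axis=1)                         # priors occupy the leading columns
    qmat, _ = np.linalg.qr(mat)                          # QR = Gram-Schmidt (up to column signs)
    return [qmat[:, i] for i in range(q)]
```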
Algorithm 1: (P)ARS-OPT Attack.
Input: L-smooth function f, smoothness bound L̂ ≥ L, the original image x, the success indicator function Φ(·), initial ray direction θ_0, number of estimation vectors q, finite-difference step size ϵ, input dimension d, number of iterations T, maximum gradient norm g_max, γ_0 > 0, and surrogate model set S = {ψ̂^(1), …, ψ̂^(s)} with s > 0 for PARS-OPT and S = ∅ for ARS-OPT.
Output: Adversarial example.
Main loop (for t = 0, …, T − 1): compute α_t and the lookahead direction θ̃_t; for each ψ̂^(i) ∈ S, compute the prior vector k_{t,i} via the surrogate gradient ∇_θ h; sample r_i ∼ N(0, I) for i = 1, …, q − s; obtain p_{t,1}, …, p_{t,s}, u_1, …, u_{q−s} ← Orthogonalize({k_{t,1}, …, k_{t,s}, r_1, …, r_{q−s}}); approximate f(θ̃_t) and the required directional derivatives by finite differences, which requires extra queries; estimate g_1(θ̃_t) and g_2(θ̃_t) using Eq. (11) and Eq. (13); clip the norm, g_1(θ̃_t) ← ClipGradNorm(g_1(θ̃_t), g_max); compute the prior-quality term D̂_t (used only in PARS-OPT); and update θ_{t+1} and m_{t+1} by gradient descent.
Algorithm 1 presents a unified framework covering both ARS-OPT and PARS-OPT, and Fig. 2 offers an overview of the PARS-OPT procedure. In targeted attacks, we initialize θ_0 with the direction to an image x̃_0 from the target class in the training set. The momentum term m_0 is initialized as θ_0 in the first iteration. Specifically, setting s = 0 reduces Eq. (11) and Eq. (13) to their counterparts in ARS-OPT, namely Eq. (4) and Eq. (6). Note that D̂_t and the gradient-norm estimate ∥∇f_t∥² are estimators rather than exact values, and the terms {∇f(θ_t)^⊤ p_{t,i}}_{i=1}^{s} in D̂_t require additional finite-difference approximations. Details are provided in Remark A.12 of Appendix A. Algorithm 1 is a practical approximation of an idealized version presented in Appendix A. Theorem 4.2 establishes the convergence guarantee for this idealized algorithm, giving an O(1/T²) rate under smooth convex assumptions. In comparison, Theorem A.10 shows that Sign-OPT attains an O((ln T)/T) rate, indicating that the idealized PARS-OPT converges faster than Sign-OPT.

Theorem 4.2. Let θ* denote the optimal solution of Problem (3), and let θ_0, θ_T, γ_0, and ζ_t denote the corresponding quantities in the idealized version of Algorithm 1. Assuming that f(·) is smooth and convex, the expected gap E[f(θ_T)] − f(θ*) satisfies the bound in Eq. (14), which decays at an O(1/T²) rate.
The proof is given in Appendix A (Theorem A.11).
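Putting the pieces together, one iteration of the framework can be sketched as below, reusing `lookahead_direction` from the earlier sketch. The step size and the coefficient on the momentum update are placeholders; the paper determines the actual steps via line search and the ARS analysis (Algorithm 1 and Appendix A).

```python
import numpy as np

def ars_opt_iteration(theta_t, m_t, zeta_t, gamma_t, estimate, eta=0.2):
    """One illustrative iteration: lookahead, estimate, then descend.

    estimate(theta_look) must return (g1, g2) at the lookahead point.
    eta and the gamma_t coefficient below are placeholder step sizes.
    """
    theta_look, alpha_t = lookahead_direction(theta_t, m_t, zeta_t, gamma_t)
    g1, g2 = estimate(theta_look)
    # Main sequence: descend from the lookahead point along the (biased) estimate g1.
    theta_next = theta_look - eta * g1 / (np.linalg.norm(g1) + 1e-12)
    # Auxiliary sequence: accumulate the unbiased estimate g2 into the momentum term.
    m_next = m_t - gamma_t * g2
    # Ray directions live on the unit sphere, so renormalize the main sequence.
    return theta_next / np.linalg.norm(theta_next), m_next
```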
Dataset. We evaluate the proposed method on two publicly available datasets, CIFAR-10 (Krizhevsky and Hinton 2009) and ImageNet (Deng et al. 2009), with images resized to 3 × 32 × 32 and 3 × 299 × 299, respectively. For all experiments, 1,000 images are randomly selected from each dataset as test samples for evaluation. In the case of targeted attacks, the target class is defined as y_adv = (y + 1) mod C, where y denotes the ground-truth class. For the same target class, we use the same initialization image for all methods.
On the ImageNet dataset, we evaluate two target models: Inception-v4 (Szegedy et al. 2017) and Swin Transformer (Liu et al. 2021). For Inception-v4 (input resolution 299 × 299), we use Inception-ResNet-v2 (IncResV2) and Xception as surrogate models. For Swin Transformer (inputs resized to 224 × 224), the surrogate models are ResNet-50 and ConViT (D’Ascoli et al. 2021). See Appendix for details.
Baseline Methods. We compare ARS-OPT and PARS-OPT against baselines, including HSJA, TA, GeoDA, Evolutionary, SurFree, AHA, QEBA, CGBA-H, SQBA, BBA, Sign-OPT, Prior-Sign-OPT, and Prior-OPT. In our methods, the suffix “-S” (e.g., ARS-OPT-S) means the random vectors u_1, …, u_{q−s} for gradient estimation are drawn from a 3 × 56 × 56-dimensional subspace. AHA, QEBA, and CGBA-H also adopt subspace sampling, while SQBA, BBA, Prior-Sign-OPT, Prior-OPT, and PARS-OPT leverage surrogate models, denoted by subscripts; e.g., PARS-OPT_IncResV2 uses Inception-ResNet-v2 as the surrogate model.
Metrics. We report the mean ℓ2 distortion over the evaluated images at given query budgets.
Results of Attacks against Undefended Models. Tables 1 and 2 report the results of attacks against undefended models on 1,000 ImageNet images. In summary:
(1) In Table 1, PARS-OPT performs the best in untargeted attacks, while ARS-OPT-S achieves state-of-the-art performance in targeted attacks due to its stabilized optimization via the lookahead direction, reducing the risk of local minima. (2) Table 2 reports untargeted attack results on CLIP (ViT-L/14). Our methods outperform the baselines (Sign-OPT and Prior-OPT) in mean ℓ2 distortion and attack success rate.
Results of Attacks against Defense Models. We evaluate untargeted attacks against two types of defense models, i.e., adversarial training (AT) (Madry et al. 2018) and MIMIR (Xu et al. 2025). MIMIR achieves state-of-the-art performance on RobustBench (Croce et al. 2021). Fig. 3 shows that our methods achieve the best performance on ImageNet.
In our ablation studies, we perform controlled experiments designed according to our theoretical analysis, using images of dimension d = 3072. Fig. 4a shows the relationship between D̂_t and ζ_t. As D̂_t increases, ζ_t increases accordingly, which in turn improves the convergence rate of PARS-OPT (Eq. (14)). Fig. 4b illustrates that increasing the number of vectors q used for gradient estimation leads to larger ζ_t and improved performance. Fig. 4c shows that when the priors have equal quality (i.e., identical D̂_t values), increasing their number leads to larger ζ_t, thereby improving attack efficiency. Fig. 4d shows that when the prior is effective, even with a small D̂_t, PARS-OPT achieves a lower convergence bound than ARS-OPT, indicating better potential performance.
We propose a novel hard-label attack approach, comprising two algorithms, ARS-OPT and PARS-OPT, that accelerate convergence and improve attack success rates by leveraging a lookahead direction and transfer-based priors. We provide convergence guarantees through theoretical analysis and validate our methods with extensive experiments, demonstrating improvements over 13 state-of-the-art approaches.
However, existing methods overlook established acceleration strategies, such as momentum and Nesterov’s accelerated gradient, that can greatly improve convergence rates without requiring better gradient estimates. In this work, we address this gap by integrating acceleration techniques to enhance query efficiency. Moreover, our framework can further boost efficiency by incorporating transfer-based priors.