Query Strategies for Evading Convex-Inducing Classifiers
Classifiers are often used to detect miscreant activities. We study how an adversary can systematically query a classifier to elicit information that allows the adversary to evade detection while incurring a near-minimal cost of modifying their intended malfeasance. We generalize the theory of Lowd and Meek (2005) to the family of convex-inducing classifiers that partition input space into two sets, one of which is convex. We present query algorithms for this family that construct undetected instances of approximately minimal cost using only polynomially many queries in the dimension of the space and in the level of approximation. Our results demonstrate that near-optimal evasion can be accomplished without reverse-engineering the classifier’s decision boundary. We also consider general ℓp costs and show that near-optimal evasion on the family of convex-inducing classifiers is generally efficient, for both positive and negative convexity and all levels of approximation, when p = 1.
💡 Research Summary
The paper addresses the problem of an adversary who wishes to modify a malicious instance so that a classifier no longer flags it, while incurring as little modification cost as possible. The authors focus on a broad family of classifiers they call “convex‑inducing classifiers”: binary classifiers that partition the feature space into two sets, one of which is convex. This family includes linear classifiers, one‑class SVMs, bounded‑PCA anomaly detectors, hypersphere‑based detectors, and more complex convex shapes such as intersections of half‑spaces or balls.
The adversary’s objective is formalized through a cost function A(x) that measures the weighted ℓp distance between a candidate instance x and the attacker’s original target x_A (the malicious instance the attacker would like to submit). The goal is to find an instance x in the negative class (the “normal” region) that minimizes A(x). The optimal value is called the minimal adversarial cost (MAC). Since the attacker cannot compute the MAC exactly, the paper defines an ε‑approximate instance of minimal adversarial cost (ε‑IMAC) as any negative instance whose cost is at most (1+ε)·MAC.
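To make these definitions concrete, here is a minimal sketch of the weighted ℓp cost A(x) and the ε‑IMAC acceptance test. The names (`weighted_lp_cost`, `is_eps_imac`, `is_negative`) are our own; the paper itself gives no code, and `is_negative` stands in for the membership oracle:

```python
def weighted_lp_cost(x, x_A, w, p=1):
    """A(x): weighted l_p distance between candidate x and target x_A.
    (Illustrative sketch; variable names are ours, not the paper's.)"""
    return sum(wi * abs(xi - ai) ** p for wi, xi, ai in zip(w, x, x_A)) ** (1.0 / p)

def is_eps_imac(x, x_A, w, mac, eps, is_negative, p=1):
    """x is an eps-IMAC if the classifier labels it negative and its
    cost is within a (1 + eps) factor of the minimal adversarial cost."""
    return is_negative(x) and weighted_lp_cost(x, x_A, w, p) <= (1.0 + eps) * mac
```

For instance, with a toy detector that flags x₀ > 0 as positive, a target x_A = (2, 0), and unit weights under ℓ₁, the MAC is 2, and the negative point (−0.1, 0) with cost 2.1 qualifies as a 0.1‑IMAC.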
Lowd and Meek (2005) previously showed how to achieve an ε‑IMAC for linear classifiers by reverse‑engineering the decision boundary (the ACRE approach). Their method required many queries because it first reconstructed the hyperplane. The present work generalizes the setting to any convex‑inducing classifier and eliminates the need to reconstruct the boundary. The key insight is that convexity makes a threshold test cheap: for any cost threshold C, a small number of well‑chosen membership queries—“does point y belong to the negative class?”—suffices to decide whether the cost ball B_C(A) = {x | A(x) ≤ C} (a convex ℓp ball centered at x_A) intersects the negative region. For example, when the positive class is convex and the cost is ℓ₁, the ball is the convex hull of its 2D vertices, so if every vertex is labeled positive the entire ball must lie inside the positive region. The outcome of this test tells whether C is an upper bound or a lower bound on the MAC.
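When the positive class is convex and the cost is ℓ₁, the threshold test reduces to probing the ball's vertices. A hedged sketch under those assumptions (our own naming; `is_negative` is the membership oracle):

```python
def l1_ball_inside_positive(x_A, c, is_negative):
    """Query the 2*D vertices x_A +/- c*e_i of the l1 ball of cost c.
    If every vertex is labeled positive and the positive class is convex,
    the whole ball (the convex hull of its vertices) is positive,
    so c is a strict lower bound on MAC."""
    for i in range(len(x_A)):
        for step in (c, -c):
            v = list(x_A)
            v[i] += step
            if is_negative(v):
                return False  # a negative vertex: evasion exists at cost c
    return True               # ball certified positive: c < MAC
```

For a detector flagging x₀ > 0 with x_A = (2, 0), the ball of cost 1 is certified positive with four queries, while the ball of cost 3 contains the negative vertex (−1, 0).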
Using this test, the attacker performs a binary search on the cost value. Starting with a trivial lower bound C⁺ (an arbitrarily small cost is valid, since x_A itself is classified positive) and an upper bound C⁻ (e.g., the cost of a known negative instance), the algorithm repeatedly queries at the geometric mean of the bounds (the “midpoint” of the multiplicative search) and updates them according to the test outcome. After O(log(log(C⁻/C⁺)/ε)) iterations the multiplicative gap between the bounds falls below 1+ε, so any point found in B_{C_t}(A) ∩ X⁻ is an ε‑IMAC. The total query complexity is polynomial in the feature dimension D and in log 1/ε; the factor of D reflects the number of directions that must be probed to certify a lower bound on the MAC, not the cost of evaluating the ℓp distance.
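The search on the cost scale can be sketched as follows: a simplified single-direction version that probes along the segment from x_A to a known negative instance, querying at the geometric mean of the bounds. This is our own simplification for illustration, not the paper's exact pseudocode:

```python
import math

def multiplicative_line_search(x_A, x_neg, is_negative, eps, cost):
    """Shrink [C_plus, C_minus] multiplicatively until C_minus/C_plus <= 1+eps,
    querying the point on the ray from x_A toward x_neg whose cost is C_t.
    Returns a negative instance whose cost is within (1+eps) of the
    cheapest negative point on this ray."""
    c0 = cost(x_neg)                       # cost of the known negative instance
    def point_at_cost(c):                  # lp costs are homogeneous along the ray
        t = c / c0
        return [a + t * (b - a) for a, b in zip(x_A, x_neg)]
    c_minus, c_plus = c0, 1e-9 * c0        # upper / (trivial) lower bound
    while c_minus / c_plus > 1.0 + eps:
        c_t = math.sqrt(c_plus * c_minus)  # geometric mean: multiplicative step
        if is_negative(point_at_cost(c_t)):
            c_minus = c_t                  # cheaper negative point found
        else:
            c_plus = c_t                   # boundary crossing lies above c_t
    return point_at_cost(c_minus)
```

Each geometric-mean step halves the gap between the bounds on a log scale, which is what makes the number of iterations logarithmic in 1/ε rather than linear.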
The authors give special attention to ℓ₁ costs, which are common in spam‑filter evasion (edit distance on words, URLs, etc.). In D dimensions the ℓ₁ cost ball is a cross‑polytope with 2D vertices, two per axis. They design a K‑step MultiLineSearch algorithm that interleaves line searches along the 2D rays from x_A through these vertices, achieving a query bound of O(K·D·log 1/ε) with K ≤ 2D. This improves over running 2D independent binary searches because the rays share bounds: a single negative response tightens the upper bound for every direction at once.
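A stripped-down version of that idea, searching all vertex directions with one shared pair of bounds, might look like the sketch below. This is a simplification for illustration, not the paper's exact K-step variant, and it assumes the positive class is convex:

```python
import math

def multi_line_search(x_A, directions, is_negative, eps, c_hi):
    """directions: unit-cost steps from x_A, e.g. the 2D vertex directions
    +/- e_i of the l1 ball. c_hi: cost of some known negative instance.
    All rays share one pair of bounds, so one negative answer tightens
    the upper bound for every direction at once."""
    c_minus, c_plus = c_hi, 1e-6 * c_hi
    best = None
    while c_minus / c_plus > 1.0 + eps:
        c_t = math.sqrt(c_plus * c_minus)    # geometric-mean step
        hit = None
        for d in directions:                 # at most one query per direction
            x = [a + c_t * di for a, di in zip(x_A, d)]
            if is_negative(x):
                hit = x
                break
        if hit is not None:
            best, c_minus = hit, c_t         # cheaper evasion found
        else:
            # every vertex of the cost-c_t l1 ball is positive; with a
            # convex positive class the whole ball is, so c_t < MAC
            c_plus = c_t
    return best
```

Note the query count per iteration is at most the number of directions (2D for the ℓ₁ ball), consistent with the dimension factor in the stated bounds.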
The paper also treats the case where the positive class is convex (instead of the negative class) and shows that the same framework applies symmetrically. Moreover, the analysis extends to general ℓp costs (p > 1). Throughout, the multiplicative search takes the geometric mean of the current bounds as its “midpoint,” which is what preserves the multiplicative (1+ε) approximation guarantee.
Complexity results are summarized as follows:
- For any convex‑inducing classifier and any weighted ℓp cost, an ε‑IMAC can be found with polynomially many membership queries in D and log 1/ε.
- For ℓ₁ costs, the K‑step MultiLineSearch reduces the constant factor, yielding fewer queries than the generic method.
- When the classifier is linear (a special case), the proposed algorithm requires fewer queries than the Lowd‑Meek reverse‑engineering technique, matching the lower bound for query‑efficient evasion.
The authors discuss practical assumptions: the attacker has unrestricted access to a membership oracle (e.g., can submit arbitrary feature vectors and observe the binary label). They acknowledge that real systems may limit query rates, hide certain features, or impose costs per query, but argue that their model captures a worst‑case scenario useful for security analysis.
Finally, the paper positions its contribution in the broader adversarial learning literature. While prior work (Dalvi et al., 2004) focused on defending classifiers by anticipating attacks, this work focuses on the attacker’s side, providing a constructive algorithm that does not need to reconstruct the classifier’s decision surface. The authors suggest future directions such as extending the approach to non‑convex classifiers, incorporating query‑rate constraints, and developing defensive mechanisms that detect the characteristic query patterns of convex‑inducing evasion.
In summary, the paper delivers a theoretically grounded, query‑efficient method for near‑optimal evasion of a wide class of convex‑inducing classifiers, showing that an adversary can achieve ε‑approximate minimal cost modifications with only a modest number of queries, especially under ℓ₁ cost metrics. This advances our understanding of the security vulnerabilities of many practical detection systems and highlights the need for defenses that go beyond simple boundary obfuscation.