On Randomized Algorithms in Online Strategic Classification
Online strategic classification studies settings in which agents strategically modify their features to obtain favorable predictions. For example, given a classifier that determines loan approval based on credit scores, applicants may open or close credit cards and bank accounts to obtain a positive prediction. The learning goal is to achieve low mistake or regret bounds despite such strategic behavior. While randomized algorithms have the potential to offer advantages to the learner in strategic settings, they have been largely underexplored. In the realizable setting, no lower bound is known for randomized algorithms, and existing lower bound constructions for deterministic learners can be circumvented by randomization. In the agnostic setting, the best known regret upper bound is $O(T^{3/4}\log^{1/4}(T|\mathcal H|))$, far from the standard online learning rate of $O(\sqrt{T\log|\mathcal H|})$. In this work, we provide refined bounds for online strategic classification in both settings. In the realizable setting, we extend the existing lower bound of $\Omega(\mathrm{Ldim}(\mathcal{H})\Delta)$ for deterministic learners to all learners whenever $T > \mathrm{Ldim}(\mathcal{H})\Delta^2$. This yields the first lower bound that applies to randomized learners. We also provide the first randomized learner that improves on the known deterministic upper bound of $O(\mathrm{Ldim}(\mathcal H)\cdot\Delta\log\Delta)$. In the agnostic setting, we give a proper learner based on convex optimization techniques that improves the regret upper bound to $O(\sqrt{T \log |\mathcal{H}|} + |\mathcal{H}| \log(T|\mathcal{H}|))$. We show a matching lower bound, up to logarithmic factors, for all proper learning rules, demonstrating the optimality of our learner among proper learners. As such, improper learning is necessary to further improve regret guarantees.
💡 Research Summary
Paper Overview
The authors study online strategic classification, a setting where agents can strategically modify their features in response to a deployed classifier. The learner repeatedly selects a hypothesis, reveals it to the agent, and the agent best‑responds by moving to a reachable feature (according to a manipulation graph) that yields a positive prediction if possible. The learner observes only the manipulated feature and the true label, and its goal is to minimize Stackelberg regret against the best fixed hypothesis in hindsight. Two regimes are considered: the realizable case (there exists a hypothesis with zero strategic loss) and the agnostic case (no such hypothesis exists).
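The best-response dynamic above can be made concrete with a small sketch. The adjacency-list encoding of the manipulation graph and the tie-breaking rule (first reachable positively-classified feature) are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of the agent's best response in strategic classification.
# The graph encoding and tie-breaking rule are illustrative assumptions.

def best_response(x, h, graph):
    """Agent at true feature x moves to a reachable feature (x itself or a
    neighbor in the manipulation graph) that h classifies positively, if any;
    otherwise it stays at x. The learner only observes the returned feature."""
    for v in [x] + graph.get(x, []):
        if h(v) == 1:
            return v
    return x

# Example: integer features, edges allow one-step moves.
graph = {0: [1], 1: [0, 2], 2: [1]}
h = lambda v: 1 if v >= 2 else 0   # threshold classifier
print(best_response(1, h, graph))  # agent at 1 manipulates to 2 -> positive
print(best_response(0, h, graph))  # no reachable positive feature: stays at 0
```

Note that the learner observing only the manipulated feature (the return value) rather than `x` is exactly what makes feedback partial in this setting.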
Key Contributions
- **Lower bound for randomized learners in the realizable setting**
  - Prior work only gave an Ω(Ldim·Δ) lower bound for deterministic learners, where Ldim(H) is the Littlestone dimension of the hypothesis class and Δ is the maximum degree of the manipulation graph.
  - The paper constructs a new adversarial sequence and proves that for any learner, deterministic or randomized, if the horizon T exceeds Ldim·Δ², the expected number of mistakes is at least Ω(min{√(T·Ldim), Ldim·Δ}).
  - Consequently, randomization cannot beat the deterministic lower bound once T is large, answering in the negative the open question of whether randomization always helps in the realizable strategic setting.
- **A randomized algorithm that improves the deterministic upper bound**
  - Two algorithms are presented. For finite hypothesis classes, Algorithm 2 (Uniform‑Mix) mixes an "all‑positive" classifier (which yields full‑information feedback) with a uniform random draw from the current version space. By tuning the mixing probability p = min{1, (log |H|)/T}, the expected mistake bound becomes O(√(T log |H|)).
  - For infinite classes, Algorithm 4 embeds the Standard Optimal Algorithm (SOA) as an expert and again mixes with the all‑positive classifier. The resulting bound is O(√(T·Ldim(H)·log Δ)), which is strictly better than the deterministic O(Ldim·Δ·log Δ) bound when T is relatively small (specifically, T < Ldim·Δ²·log Δ).
  - Both algorithms are the first randomized learners to achieve a mistake bound that beats the best known deterministic bound in a non‑trivial regime.
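A single round of the Uniform‑Mix idea might look like the sketch below. The interface (hypotheses as callables, a dict-based manipulation graph, pruning only on full-information rounds) is an illustrative assumption, not the paper's pseudocode.

```python
import random

# Hedged sketch of one Uniform-Mix round: with probability p deploy the
# all-positive classifier -- every feature is then predicted positive, so the
# agent does not manipulate, the true feature is revealed, and the version
# space can be pruned -- otherwise deploy a uniform draw from the version space.

def uniform_mix_round(version_space, x_true, y_true, graph, rng, p):
    """Play one round; return (made_mistake, updated_version_space)."""
    if rng.random() < p:
        # Full-information round: all-positive classifier observes x_true.
        mistake = (y_true != 1)
        pruned = [h for h in version_space if h(x_true) == y_true]
        return mistake, (pruned or version_space)
    h = rng.choice(version_space)  # uniform draw from the version space
    # Agent best-responds to h: moves to a reachable positively-classified
    # feature if one exists, else stays put; learner sees only x_obs.
    x_obs = next((v for v in [x_true] + graph.get(x_true, []) if h(v) == 1), x_true)
    return (h(x_obs) != y_true), version_space

# Example: threshold classifiers over integer features.
H = [lambda v, k=k: 1 if v >= k else 0 for k in range(3)]
graph = {1: [0, 2]}
rng = random.Random(0)
# p = 1 forces a full-information round: the threshold-2 hypothesis is pruned.
mistake, vs = uniform_mix_round(H, x_true=1, y_true=1, graph=graph, rng=rng, p=1.0)
```

The all-positive rounds are what pay for information: they may incur mistakes on negative examples, but they are the only rounds in this sketch where the version space shrinks.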
- **Proper learner for the agnostic setting with near‑optimal regret**
  - The authors design a proper online learner (i.e., one that always deploys a hypothesis) based on convex optimization and online Lagrangian updates. At each round the learner maintains a probability distribution over H and updates it using the losses observed on the manipulated features.
  - The regret guarantee is O(√(T log |H|) + |H|·log(T|H|)), and a lower bound matching up to logarithmic factors holds for all proper learning rules, so improper learning is necessary to improve the regret further.
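The paper's proper learner relies on convex optimization with online Lagrangian updates; as a simplified stand-in, the sketch below maintains the distribution over H with a multiplicative-weights (Hedge) update on the observed losses. Hedge achieves the O(√(T log|H|)) leading term against a fixed loss sequence, but it is not the paper's algorithm: handling the strategic, manipulated-feature feedback is where the convex-optimization machinery comes in.

```python
import math

# Simplified stand-in (not the paper's algorithm): maintain a distribution
# over hypotheses and apply a multiplicative-weights (Hedge) update on the
# per-hypothesis losses observed each round.

def hedge_step(weights, losses, eta):
    """One update: w_h <- w_h * exp(-eta * loss_h), then renormalize so the
    weights remain a probability distribution over H."""
    new = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    z = sum(new)
    return [w / z for w in new]

# Example: three hypotheses; hypothesis 0 never suffers loss.
w = [1.0 / 3] * 3
for _ in range(50):
    w = hedge_step(w, [0.0, 1.0, 1.0], eta=0.5)
# The distribution concentrates on the zero-loss hypothesis.
```

Sampling the deployed hypothesis from `w` keeps the learner proper: it always outputs an actual member of H rather than a mixture classifier.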