Swap Regret Minimization Through Response-Based Approachability
We consider the problem of minimizing different notions of swap regret in online optimization. These forms of regret are tightly connected to correlated equilibrium concepts in games, and have been more recently shown to guarantee non-manipulability against strategic adversaries. The only computationally efficient algorithm for minimizing linear swap regret over a general convex set in $\mathbb{R}^d$ was developed recently by Daskalakis, Farina, Fishelson, Pipis, and Schneider (STOC ’25). However, it incurs a highly suboptimal regret bound of $Ω(d^4 \sqrt{T})$ and also relies on computationally intensive calls to the ellipsoid algorithm at each iteration. In this paper, we develop a significantly simpler, computationally efficient algorithm that guarantees $O(d^{3/2} \sqrt{T})$ linear swap regret for a general convex set and $O(d \sqrt{T})$ when the set is centrally symmetric. Our approach leverages the powerful response-based approachability framework of Bernstein and Shimkin (JMLR ’15) – previously overlooked in the line of work on swap regret minimization – combined with geometric preconditioning via the John ellipsoid. Our algorithm simultaneously minimizes profile swap regret, which was recently shown to guarantee non-manipulability. Moreover, we establish a matching information-theoretic lower bound: any learner must incur in expectation $Ω(d \sqrt{T})$ linear swap regret for large enough $T$, even when the set is centrally symmetric. This also shows that the classic algorithm of Gordon, Greenwald, and Marks (ICML ’08) is existentially optimal for minimizing linear swap regret, although it is computationally inefficient. Finally, we extend our approach to minimize regret with respect to the set of swap deviations with polynomial dimension, unifying and strengthening recent results in equilibrium computation and online learning.
💡 Research Summary
The paper tackles the long‑standing problem of efficiently minimizing linear swap regret—a special case of Φ‑regret where the comparator class consists of all affine endomorphisms of a convex decision set P ⊂ ℝᵈ. Prior to this work, the only polynomial‑time algorithm was the recent Daskalakis‑Farina‑Fishelson‑Pipis‑Schneider method, which incurs a regret bound of Ω(d⁴√T) and requires a costly ellipsoid call at each round.
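For concreteness, linear swap regret over T rounds is typically written as follows (this display is a paraphrase consistent with the summary's notation, not a quote from the paper): the learner's plays pₜ are compared against every affine map φ that sends the decision set into itself,

$$\mathrm{LinSwapReg}_T \;=\; \max_{\substack{\phi:\,\mathbb{R}^d\to\mathbb{R}^d \text{ affine},\\ \phi(P)\subseteq P}} \;\sum_{t=1}^{T} \big\langle \ell_t,\; p_t - \phi(p_t) \big\rangle .$$

Ordinary external regret is the special case where φ ranges only over constant maps φ(p) ≡ p′.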
The authors revive the response‑based approachability framework of Bernstein and Shimkin (JMLR 2015). They first reduce linear swap regret to an approachability problem: define the joint loss profile κₜ = (ℓₜ⊗pₜ, ℓₜ) and a target convex set S = conv{(ℓ⊗b(ℓ), ℓ) : ℓ∈L}, where b(ℓ) ∈ argmin_{p∈P} ⟨ℓ, p⟩ is the best response to the loss ℓ. Lemma 3.1 shows that the average linear swap regret is bounded by twice the maximal Frobenius norm of an affine endomorphism of P times the approachability loss AppLoss_T = min_{s∈S} ‖κ̄_T − s‖_F, where κ̄_T is the average of the κₜ.
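One consistent way to spell out this reduction in display form (under the averaged convention used here; the exact constants are as asserted by the summary, not re-derived) is:

$$\mathrm{AppLoss}_T \;=\; \min_{s\in S}\big\|\bar\kappa_T - s\big\|_F, \qquad \bar\kappa_T = \frac{1}{T}\sum_{t=1}^{T}\kappa_t, \qquad \frac{\mathrm{LinSwapReg}_T}{T} \;\le\; 2\Big(\max_{\phi(P)\subseteq P}\|\phi\|_F\Big)\,\mathrm{AppLoss}_T .$$

Driving κ̄_T toward S therefore drives the swap regret to zero at the same rate.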
The core algorithm (Algorithm 1) maintains a cumulative deviation matrix Uₜ = Σ_{τ≤t} (κ_τ − s_τ) and, at each round, solves the bilinear zero‑sum game min_{p∈P} max_{ℓ∈L} ⟨U_{t‑1}, (ℓ⊗p, ℓ)⟩. The learner plays the minimizer pₜ, observes ℓₜ, and updates Uₜ ← U_{t‑1} + (κₜ − sₜ), where sₜ = (ℓₜ*⊗b(ℓₜ*), ℓₜ*) ∈ S is the response generated by the adversary's equilibrium strategy ℓₜ* in the round‑t game. By a Pythagorean‑type lemma, the inner product ⟨κ̄_{t‑1} − s̄_{t‑1}, κₜ − sₜ⟩ is non‑positive, guaranteeing ‖U_T‖_F ≤ B√T with B = max_{p∈P, ℓ∈L} ‖(ℓ⊗p, ℓ)‖_F.
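The loop above can be sketched numerically. The following is a toy instantiation, not the paper's exact Algorithm 1: it takes both P and L to be the box [−1, 1]^d (so the best response is b(ℓ) = −sign(ℓ) and both per-round zero-sum games reduce to small LPs), plays a random adversary, and checks the Blackwell-style inequality ⟨U_{t−1}, κₜ − sₜ⟩ ≤ 0 and the resulting ‖U_T‖_F = O(√T) growth.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
d, T = 2, 300

def best_response(l):
    # b(l) = argmin_{p in [-1,1]^d} <l, p>; ties at l_j = 0 broken as 0
    return -np.sign(l)

def learner_move(M, u):
    # p_t = argmin_p max_l <U, (l x p, l)> = argmin_p ||M p + u||_1,
    # written as an LP over (p, t):  min sum(t)  s.t.  |M p + u| <= t
    c = np.concatenate([np.zeros(d), np.ones(d)])
    A = np.block([[M, -np.eye(d)], [-M, -np.eye(d)]])
    b = np.concatenate([-u, u])
    res = linprog(c, A_ub=A, b_ub=b,
                  bounds=[(-1, 1)] * d + [(0, None)] * d)
    return res.x[:d]

def adversary_equilibrium(M, u):
    # l*_t = argmax_l min_p <U, (l x p, l)> = argmax_l (u.l - ||M^T l||_1),
    # again an LP, over (l, r):  max u.l - sum(r)  s.t.  |M^T l| <= r
    c = np.concatenate([-u, np.ones(d)])
    A = np.block([[M.T, -np.eye(d)], [-M.T, -np.eye(d)]])
    res = linprog(c, A_ub=A, b_ub=np.zeros(2 * d),
                  bounds=[(-1, 1)] * d + [(0, None)] * d)
    return res.x[:d]

M, u = np.zeros((d, d)), np.zeros(d)            # U_t = (M, u), starts at 0
for t in range(T):
    p = learner_move(M, u)
    l = rng.choice([-1.0, 1.0], size=d)          # any adversary works
    l_star = adversary_equilibrium(M, u)
    k_mat, k_vec = np.outer(l, p), l             # kappa_t = (l x p, l)
    s_mat, s_vec = np.outer(l_star, best_response(l_star)), l_star
    # Blackwell condition <U_{t-1}, kappa_t - s_t> <= 0 (up to LP tolerance)
    gap = np.sum(M * (k_mat - s_mat)) + u @ (k_vec - s_vec)
    assert gap <= 1e-3, gap
    M, u = M + k_mat - s_mat, u + k_vec - s_vec

# ||U_T||_F <= B sqrt(T), so the approachability loss decays like 1/sqrt(T)
U_norm = np.sqrt(np.sum(M ** 2) + np.sum(u ** 2))
print(f"||U_T||_F / T = {U_norm / T:.3f}")
```

For this box geometry the per-step increment satisfies ‖κₜ − sₜ‖_F ≤ 2√6 < 5, so the final norm obeys ‖U_T‖_F ≤ 5√T regardless of the adversary; in a general convex set the two inner LPs would be replaced by linear optimization over P and L.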
To shrink B, the authors precondition P using John's ellipsoid: an affine transformation that puts P in John position ensures that every p∈P satisfies ‖p‖₂ ≤ √d and every ℓ∈L satisfies ‖ℓ‖₂ ≤ 1, yielding B = O(√d). Consequently, the cumulative approachability loss is O(√d·√T) and the resulting linear swap regret is O(d^{3/2}√T) for arbitrary convex bodies. If P is centrally symmetric (P = −P), the John transformation makes B a constant, leading to the optimal‑up‑to‑constants bound O(d√T).
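A short sketch of why such preconditioning is "free": a linear change of coordinates p′ = Ap on the decision set, paired with the dual map ℓ′ = A^{−T}ℓ on losses, leaves every payoff ⟨ℓ, p⟩ unchanged, so regret can be measured in whichever coordinates are geometrically nicest. Here A is an arbitrary invertible matrix standing in for the John-ellipsoid transformation, which this snippet does not compute.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
# stand-in for the John transformation: any invertible matrix
A = rng.normal(size=(d, d)) + d * np.eye(d)

p = rng.normal(size=d)                    # a decision point
l = rng.normal(size=d)                    # a loss vector

p_new = A @ p                             # preconditioned decision
l_new = np.linalg.solve(A.T, l)           # dual map l' = A^{-T} l

# <l', p'> = l^T A^{-1} A p = <l, p>: payoffs are invariant
assert np.isclose(l @ p, l_new @ p_new)
```

With a translation (full affine John positioning) the payoff shifts by a term depending on ℓ alone, which cancels when regret compares two decisions against the same loss.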
The paper also proves a matching information‑theoretic lower bound: for T ≥ Ω(d⁴) there exists a centrally symmetric convex set (the product of ℓ₁ and ℓ∞ balls) and an adversary forcing expected linear swap regret at least Ω(d√T). This shows that the classic Gordon‑Greenwald‑Marks algorithm, which attains Õ(d√T) via external‑regret minimization over the affine endomorphism space, is existentially optimal, though computationally inefficient.
Importantly, the same approachability loss controls profile swap regret, introduced by Arunachaleswaran et al. (2025). Minimizing profile swap regret guarantees non‑manipulability against a dynamic optimizer, so the presented algorithm simultaneously achieves both guarantees.
Finally, the authors extend the technique to swap‑deviation families of polynomial dimension. By allowing mixed strategies and constructing higher‑dimensional approachability targets, they obtain regret bounds that improve upon prior work on low‑degree swap deviations while preserving polynomial‑time complexity.
In summary, the paper delivers (1) a dramatically simpler and faster algorithm for linear swap regret with O(d^{3/2}√T) (general) and O(d√T) (symmetric) guarantees, (2) a tight Ω(d√T) lower bound establishing optimality, (3) simultaneous minimization of profile swap regret for non‑manipulability, and (4) a unified framework that scales to polynomial‑dimensional swap deviation sets. These contributions close a major gap between computational efficiency and theoretical optimality in online learning and algorithmic game theory.