Online Min-Max Optimization: From Individual Regrets to Cumulative Saddle Points
We propose and study an online version of min-max optimization based on cumulative saddle points under a variety of performance measures beyond convex-concave settings. After first observing the incompatibility of (static) Nash equilibrium (SNE-Reg$_T$) with individual regrets even for strongly convex-strongly concave functions, we propose an alternate \emph{static} duality gap (SDual-Gap$_T$) inspired by the online convex optimization (OCO) framework. We provide algorithms that, using a reduction to classic OCO problems, achieve bounds for SDual-Gap$_T$ and a novel \emph{dynamic} saddle point regret (DSP-Reg$_T$), which we suggest naturally represents a min-max version of the dynamic regret in OCO. We derive our bounds for SDual-Gap$_T$ and DSP-Reg$_T$ under strong convexity-strong concavity and a min-max notion of exponential concavity (min-max EC), and in addition we establish a class of functions satisfying min-max EC that captures a two-player variant of the classic portfolio selection problem. Finally, for a dynamic notion of regret compatible with individual regrets, we derive bounds under a two-sided Polyak-Łojasiewicz (PL) condition.
💡 Research Summary
This paper introduces a novel perspective on online zero‑sum games by focusing on the cumulative saddle point rather than the traditional individual regrets of the two players. The authors observe that static Nash equilibrium regret (SNE‑Reg_T) is fundamentally incompatible with sub‑linear individual regrets even in strongly convex‑strongly concave settings, prompting the need for alternative performance measures. They define two new regret concepts: static duality gap (SDual‑Gap_T) and dynamic saddle‑point regret (DSP‑Reg_T). SDual‑Gap_T is the sum over rounds of the gap function g′_t(x_t, y_t) = f_t(x_t, y′) – f_t(x′, y_t), where (x′, y′) is the saddle point of the cumulative loss Σ_t f_t. This quantity coincides with the static regret in the online convex optimization (OCO) framework, allowing the use of classic OCO algorithms. DSP‑Reg_T is the dynamic counterpart, measuring the cumulative deviation of the algorithm’s actions from the per‑round optimal gap, and is equivalent to dynamic regret for the sequence {g′_t}.
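To make the definition concrete, the following is a minimal sketch that evaluates SDual-Gap_T on a toy family of scalar losses f_t(x, y) = (x − a_t)² − (y − b_t)². The loss family, the data, and the played iterates are illustrative choices, not taken from the paper; for this separable quadratic family the cumulative saddle point (x′, y′) happens to have a closed form (the coordinate-wise means), which a general instance would not.

```python
# Toy illustration of the static duality gap
#   SDual-Gap_T = sum_t [ f_t(x_t, y') - f_t(x', y_t) ],
# where (x', y') is the saddle point of the cumulative loss sum_t f_t.
# The quadratic loss family below is illustrative only.

def f(x, y, a, b):
    """Strongly convex in x, strongly concave in y."""
    return (x - a) ** 2 - (y - b) ** 2

def sdual_gap(xs, ys, a_seq, b_seq):
    """Static duality gap against the cumulative saddle point."""
    T = len(xs)
    # For this separable quadratic family, the cumulative saddle point
    # is the coordinate-wise mean of the per-round optima (a_t, b_t).
    x_star = sum(a_seq) / T
    y_star = sum(b_seq) / T
    return sum(f(xs[t], y_star, a_seq[t], b_seq[t])
               - f(x_star, ys[t], a_seq[t], b_seq[t])
               for t in range(T))

a_seq = [0.0, 1.0, 2.0]   # adversary's per-round parameters
b_seq = [1.0, 1.0, 1.0]
xs = [0.5, 0.5, 0.5]      # some fixed played iterates
ys = [1.0, 1.0, 1.0]
gap = sdual_gap(xs, ys, a_seq, b_seq)
```

Note that individual summands g′_t(x_t, y_t) can be negative; only the cumulative quantity is the meaningful performance measure.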
The paper proposes two algorithms tailored to these measures. Online Gradient Descent‑Ascent (OGDA) operates under a λ‑strongly convex‑strongly concave assumption and achieves an O(L_0^2/λ·log T) bound on SDual‑Gap_T. Moreover, the average iterate converges to the cumulative saddle point at a rate O(√(log T/T)) in ℓ₂ norm. Online Min‑Max Newton Step (OMMNS) handles the min‑max exponential concavity (min‑max EC) class, attaining a bound O(2d(1/α+L_0D)·log T) on SDual‑Gap_T, where d is the total dimension, α the EC parameter, and D the diameter of the decision set. Both algorithms require no prior knowledge of the cumulative saddle point.
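A minimal sketch of the OGDA template is below, on the same toy λ-strongly convex-strongly concave quadratic losses as above. The step size η_t = 1/(λt) is the standard choice under strong convexity; the clipping interval [−R, R] stands in for projection onto a bounded decision set, and none of the concrete numbers come from the paper.

```python
# Hedged sketch of Online Gradient Descent-Ascent (OGDA) on toy losses
#   f_t(x, y) = (lam/2) (x - a_t)^2 - (lam/2) (y - b_t)^2.
# Step size 1/(lam * t) is the usual strong-convexity schedule.

def ogda(a_seq, b_seq, lam=1.0, R=5.0):
    x, y = 0.0, 0.0
    xs, ys = [], []
    for t, (a, b) in enumerate(zip(a_seq, b_seq), start=1):
        xs.append(x)                        # play (x_t, y_t), then observe f_t
        ys.append(y)
        eta = 1.0 / (lam * t)
        gx = lam * (x - a)                  # grad_x f_t(x, y)
        gy = -lam * (y - b)                 # grad_y f_t(x, y)
        x = min(max(x - eta * gx, -R), R)   # descent step on x, projected
        y = min(max(y + eta * gy, -R), R)   # ascent step on y, projected
    return xs, ys
```

On a stationary sequence (constant a_t, b_t) the iterates lock onto the saddle point, and on a drifting sequence the averages track the cumulative saddle point, in line with the O(√(log T/T)) average-iterate guarantee stated above.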
Dynamic performance is addressed by embedding OGDA and OMMNS as base learners in the “sleeping experts” framework. This yields a dynamic saddle‑point regret of O(max{log T, √(T V_T) log T}) for strongly convex‑strongly concave functions and O(d·max{log T, √(T V_T) log T}) for min‑max EC functions, where V_T quantifies the total variation of the loss sequence. Under a separability assumption, DSP‑Reg_T is shown to upper‑bound the static duality gap, linking the two notions.
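The reduction relies on running many copies of the base learner, each "awake" on one interval of rounds. A standard way to organize this (used widely in adaptive/dynamic-regret reductions; the paper's exact cover may differ) is the geometric covering of [1, T], under which only O(log T) base learners are active at any round:

```python
# Hedged sketch: geometric covering intervals for a sleeping-experts
# reduction. One base learner (OGDA or OMMNS in the paper) is started
# per interval; at any round t only O(log t) intervals are active.
# This is the standard construction, not necessarily the paper's exact cover.

def geometric_cover(T):
    """All intervals [i*2^k, (i+1)*2^k - 1] with i >= 1, clipped to [1, T]."""
    intervals = []
    length = 1
    while length <= T:
        start = length
        while start <= T:
            intervals.append((start, min(start + length - 1, T)))
            start += length
        length *= 2
    return intervals

def active(intervals, t):
    """Intervals (hence base learners) awake at round t."""
    return [(s, e) for (s, e) in intervals if s <= t <= e]
```

A meta-algorithm (e.g. exponentially weighted aggregation over awake experts) then combines the active base learners' predictions, which is what converts the static logarithmic bounds into the dynamic O(max{log T, √(T V_T) log T})-type guarantees quoted above.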
The authors also explore the setting where each player minimizes its own individual regret. Assuming a two‑sided Polyak‑Łojasiewicz (PL) condition, they prove that an online alternating gradient descent‑ascent algorithm (AGDA) achieves an O(U_T) bound on the duality gap, where U_T captures problem‑specific uncertainty. However, they extend the impossibility result of Zhang et al. (2022) to show that sub‑linear SNE‑Reg_T and sub‑linear individual regrets cannot be simultaneously achieved under strongly convex‑strongly concave, min‑max EC, or two‑sided PL conditions.
A concrete application is presented: an adversarial portfolio selection problem. The loss function f(x, y) = –ln⟨x, A·(1/y)⟩, with diagonal price matrix A, satisfies min‑max EC with α = 1. In this game, player X selects portfolio weights while player Y manipulates asset prices. The cumulative saddle point corresponds to a fixed rebalanced portfolio and a fixed price‑manipulation strategy that are optimal in hindsight. Using OMMNS, the average strategies of both players converge to this optimal pair, demonstrating practical relevance.
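The loss can be evaluated directly. A minimal sketch, with illustrative dimensions and numbers (not from the paper):

```python
import numpy as np

# Hedged sketch of the adversarial portfolio loss
#   f(x, y) = -ln <x, A (1/y)>,
# with A a diagonal price matrix. x: portfolio weights on the simplex
# (player X); y: the adversary's price vector (player Y).
# All concrete numbers below are illustrative.

def portfolio_loss(x, y, A_diag):
    returns = A_diag / y            # A (1/y) for diagonal A
    return -np.log(np.dot(x, returns))

x = np.array([0.5, 0.3, 0.2])       # portfolio weights, sum to 1
y = np.array([1.0, 2.0, 4.0])       # adversarial prices
A_diag = np.array([2.0, 4.0, 8.0])  # diagonal of A

loss = portfolio_loss(x, y, A_diag)
```

The minus-log-of-inner-product form mirrors the classic log-wealth objective of universal portfolio selection, which is why a min-max exponential-concavity condition (here with α = 1) is the natural curvature assumption for this game.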
In summary, the paper establishes a comprehensive theoretical framework for online min‑max optimization aimed at approximating the cumulative saddle point. It introduces SDual‑Gap_T and DSP‑Reg_T as natural extensions of static and dynamic regret, provides algorithmic solutions with logarithmic or near‑logarithmic bounds under various curvature assumptions, and clarifies the fundamental trade‑off between individual regret minimization and cumulative saddle‑point approximation. The results broaden the scope of online game‑theoretic learning and open avenues for applications such as robust portfolio management.