Small Gradient Norm Regret for Online Convex Optimization
This paper introduces a new problem-dependent regret measure for online convex optimization with smooth losses. The notion, which we call the $G^\star$ regret, depends on the cumulative squared gradient norm evaluated at the decision in hindsight. We show that the $G^\star$ regret strictly refines the existing $L^\star$ (small loss) regret, and that it can be arbitrarily sharper when the losses have vanishing curvature around the hindsight decision. We establish upper and lower bounds on the $G^\star$ regret and extend our results to dynamic regret and bandit settings. As a byproduct, we refine the existing convergence analysis of stochastic optimization algorithms in the interpolation regime. Some experiments validate our theoretical findings.
💡 Research Summary
The paper introduces a novel problem‑dependent regret measure for online convex optimization (OCO) with smooth losses, called the $G^{\star}$ regret. Traditional “small‑loss” regret, denoted $L^{\star}$, requires non‑negative losses, a lower bound on each loss, and lacks translation invariance, which limits its applicability to settings such as linear losses. To overcome these drawbacks, the authors define the cumulative squared gradient norm at the hindsight optimal point $x^{\star}$ as $G^{\star}_T=\sum_{t=1}^T\|\nabla\ell_t(x^{\star})\|^2$ and use it as the basis of a new regret notion. Because gradient norms are invariant to additive constants and do not depend on the sign of the loss, $G^{\star}$ is a strictly more general quantity than $L^{\star}$.
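As a concrete illustration (a sketch, not from the paper), the two quantities can be computed side by side for one‑dimensional logistic losses evaluated at a hypothetical comparator $x^{\star}$; all data and constants below are made up for illustration:

```python
import numpy as np

# Compare the cumulative squared gradient norm G*_T with the cumulative
# loss L*_T for 1-D logistic losses ell_t(x) = log(1 + exp(-y_t * a_t * x)),
# evaluated at a fixed hypothetical comparator x_star.
rng = np.random.default_rng(0)
T = 1000
a = rng.normal(size=T)      # illustrative features
y = np.sign(a)              # labels consistent with a positive comparator
x_star = 3.0                # hypothetical hindsight comparator

margins = y * a * x_star
losses = np.log1p(np.exp(-margins))        # ell_t(x_star)
grads = -y * a / (1.0 + np.exp(margins))   # d/dx ell_t(x) at x_star

L_star = losses.sum()                      # small-loss quantity
G_star = np.sum(grads ** 2)                # small-gradient-norm quantity

print(f"L*_T = {L_star:.4f}, G*_T = {G_star:.4f}")
```

In this well‑separated setting the squared gradients are much smaller than the losses, so $G^{\star}_T$ comes out far below $L^{\star}_T$.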
Under the standard assumptions of convexity, $L$‑smoothness (Assumption A1) and a bounded decision set (Assumption A2), the authors prove a fundamental relationship $G^{\star}_T\le 2L\,L^{\star}_T$, showing that a small $L^{\star}$ automatically yields a small $G^{\star}$. More importantly, they construct several loss families—logistic regression with cross‑entropy, $\ell_p$ regression for $p\ge2$, exponential loss, and a synthetic regression with vanishing coefficients—where the gradient norm decays faster than the loss value, i.e., $\|\nabla\ell_t(x^{\star})\|^2=o(\ell_t(x^{\star}))$. In these regimes $G^{\star}_T$ can be arbitrarily smaller than $L^{\star}_T$, demonstrating that $G^{\star}$ strictly refines the small‑loss regret.
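A quick numerical check of this vanishing‑curvature effect for the logistic loss (an illustrative sketch, not the paper's construction): with margin $m$, the loss decays roughly like $e^{-m}$ while the squared gradient decays like $e^{-2m}$, so their ratio tends to zero.

```python
import numpy as np

# For the logistic loss ell(m) = log(1 + exp(-m)), the squared gradient
# (d ell/d m)^2 = 1/(1 + exp(m))^2 decays twice as fast (in the exponent)
# as the loss itself, so grad^2 / loss -> 0 as the margin m grows.
ratios = []
for m in [1.0, 5.0, 10.0, 20.0]:
    loss = np.log1p(np.exp(-m))
    grad_sq = (1.0 / (1.0 + np.exp(m))) ** 2
    ratios.append(grad_sq / loss)
    print(f"m={m:5.1f}  loss={loss:.3e}  grad^2={grad_sq:.3e}  ratio={ratios[-1]:.3e}")
```

The printed ratio shrinks rapidly with the margin, matching the regime $\|\nabla\ell_t(x^{\star})\|^2=o(\ell_t(x^{\star}))$ described above.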
The paper revisits the analysis of Online Gradient Descent (OGD). By exploiting the smoothness inequality $\ell_t(x)\ge\ell_t(x_t)+\langle\nabla\ell_t(x_t),x-x_t\rangle+\frac{1}{2L}\|\nabla\ell_t(x_t)-\nabla\ell_t(x)\|^2$, the authors replace the usual self‑boundedness term with a direct bound on the gradient difference. Evaluated at $x=x^{\star}$, this yields the key per‑round bound $\ell_t(x_t)-\ell_t(x^{\star})\le\langle\nabla\ell_t(x_t),x_t-x^{\star}\rangle-\frac{1}{2L}\|\nabla\ell_t(x_t)-\nabla\ell_t(x^{\star})\|^2$, whose summation leads to a regret bound in terms of $G^{\star}_T$.
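A minimal sketch of projected OGD on a Euclidean ball, with illustrative smooth quadratic losses and an untuned constant step size (none of these choices are the paper's):

```python
import numpy as np

def project_ball(x, radius):
    """Euclidean projection onto the ball {z : ||z|| <= radius}."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

# Illustrative setup: smooth losses ell_t(x) = 0.5 * ||x - z_t||^2 on the
# unit ball; eta is an untuned constant step size chosen for illustration.
rng = np.random.default_rng(1)
d, T, eta, R = 3, 500, 0.05, 1.0
targets = rng.normal(scale=0.3, size=(T, d))

x = np.zeros(d)
cum_loss = 0.0
for z in targets:
    cum_loss += 0.5 * np.sum((x - z) ** 2)
    grad = x - z                          # gradient of ell_t at the iterate
    x = project_ball(x - eta * grad, R)   # projected gradient step

# Best fixed decision in hindsight: for these quadratics, the projection
# of the mean target onto the ball.
x_star = project_ball(targets.mean(axis=0), R)
regret = cum_loss - 0.5 * np.sum((targets - x_star) ** 2)
print(f"regret after {T} rounds: {regret:.4f}")
```

The projection step keeps every iterate inside the bounded decision set required by Assumption A2; the regret printed at the end is the cumulative loss of the iterates minus that of the hindsight comparator.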