Regret Bound by Variation for Online Convex Optimization


In \citep{Hazan-2008-extract}, the authors showed that the regret of online linear optimization can be bounded by the total variation of the cost vectors. In this paper, we extend this result to general online convex optimization. We first analyze the limitations of the algorithm in \citep{Hazan-2008-extract} when it is applied to online convex optimization. We then present two algorithms for online convex optimization whose regrets are bounded by the variation of the cost functions. Finally, we consider the bandit setting and present a randomized algorithm for online bandit convex optimization with a variation-based regret bound. We show that this regret bound is optimal when the variation of the cost functions is independent of the number of trials.


💡 Research Summary

The paper builds on the seminal work of Hazan and Kale (2008), which showed that for online linear optimization the regret of the Follow-the-Regularized-Leader (FTRL) algorithm can be bounded by the total variation of the cost vectors, i.e., \(\mathrm{VAR}=\sum_{t=1}^{T}\|f_t-\mu\|^2\), where \(\mu=\frac{1}{T}\sum_{t=1}^{T}f_t\) is the mean cost vector. The authors ask whether a similar variation-based bound can be obtained for the much broader class of online convex optimization (OCO) problems, where the loss functions \(c_t(\cdot)\) are convex but not necessarily linear.
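To make this quantity concrete, the total variation of a sequence of linear cost vectors can be computed directly from its definition. The sketch below is purely illustrative (the cost vectors are made up, and the function name is my own); it only shows what \(\mathrm{VAR}\) measures, not any algorithm from the paper.

```python
import numpy as np

def total_variation(costs):
    """VAR = sum_t ||f_t - mu||^2, where mu is the mean cost vector."""
    costs = np.asarray(costs, dtype=float)
    mu = costs.mean(axis=0)                      # mean cost vector
    return float(((costs - mu) ** 2).sum())      # squared deviations from mu

# A nearly constant sequence of cost vectors has small total variation,
# which is exactly the regime where variation-based regret bounds help.
stable = [[1.0, 0.0], [1.0, 0.1], [1.0, -0.1]]
print(total_variation(stable))  # small: the environment is "stable"
```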

Limitations of a naïve extension.
A straightforward way to apply FTRL to OCO is to replace each convex loss by its first-order Taylor approximation at the point where the decision was made, i.e., use the gradient \(\nabla c_t(x_t)\) as a surrogate linear cost. This yields a regret decomposition that involves two terms: (i) a term measuring the smoothness of each individual loss (the distance between gradients evaluated at different points of the same loss) and (ii) a term measuring the change of the gradients across time. Even when all loss functions are identical, the first term does not vanish, leading to a regret bound of order \(\mathcal{O}(\sqrt{T})\) regardless of how "stable" the environment is. Hence, the original FTRL analysis does not directly give a tight variation-based bound for general convex losses.
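The linearization step above can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's construction: it runs FTRL with a quadratic regularizer on the surrogate linear costs \(\nabla c_t(x_t)\), over a unit-ball domain chosen here for its closed-form projection; the loss, step size, and horizon are all assumptions made for the example.

```python
import numpy as np

def project_unit_ball(x):
    """Euclidean projection onto the unit ball {x : ||x|| <= 1}."""
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

def ftrl_linearized(grad_fn, dim, eta=0.1, T=100):
    """FTRL with quadratic regularizer on linearized losses:
    x_{t+1} = argmin_{x in K} eta*<sum_s g_s, x> + 0.5*||x||^2,
    which over the unit ball is the projection of -eta * (gradient sum)."""
    x = np.zeros(dim)
    G = np.zeros(dim)            # running sum of surrogate gradients
    iterates = []
    for _ in range(T):
        iterates.append(x.copy())
        G += grad_fn(x)          # surrogate linear cost: gradient at x_t
        x = project_unit_ball(-eta * G)
    return iterates

# Illustrative fixed loss c_t(x) = ||x - a||^2 with gradient 2*(x - a):
# even though the loss never changes, the gradient surrogates do work here,
# and the iterates drift toward the minimizer a.
a = np.array([0.5, 0.5])
xs = ftrl_linearized(lambda x: 2 * (x - a), dim=2)
print(xs[-1])  # close to a
```

With a fixed loss the iterates converge geometrically, which is precisely the situation where the naive analysis is loose: its smoothness term still contributes \(\mathcal{O}(\sqrt{T})\) regret even though nothing in the environment changes.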

Sequential variation.
To overcome this, the authors introduce a new notion of variation, called sequential variation, which measures the change between consecutive loss functions rather than their deviation from a fixed mean:
\[
\mathrm{Var}_{\mathrm{seq}} = \sum_{t=2}^{T} \max_{x \in \mathcal{K}} \|\nabla c_t(x) - \nabla c_{t-1}(x)\|^2 .
\]

