Online Learning in Case of Unbounded Losses Using the Follow Perturbed Leader Algorithm

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

In this paper the sequential prediction problem with expert advice is considered for the case where losses of experts suffered at each step cannot be bounded in advance. We present some modification of Kalai and Vempala algorithm of following the perturbed leader where weights depend on past losses of the experts. New notions of a volume and a scaled fluctuation of a game are introduced. We present a probabilistic algorithm protected from unrestrictedly large one-step losses. This algorithm has the optimal performance in the case when the scaled fluctuations of one-step losses of experts of the pool tend to zero.

💡 Research Summary

The paper tackles the classic online learning‑with‑expert‑advice problem under the most challenging assumption: one‑step losses are not known to be bounded in advance and may become arbitrarily large. Traditional Follow‑Perturbed‑Leader (FPL) algorithms, such as the Kalai‑Vempala version, rely on a fixed upper bound B on losses to guarantee a regret of order O(√(T log N)·B). When B does not exist, those guarantees collapse. To overcome this, the authors introduce two novel quantitative descriptors of a game: volume Vₜ and scaled fluctuation γₜ.

Volume Vₜ aggregates the absolute magnitude of all losses observed up to round t (e.g., Vₜ = Σ_{s=1}^t max_i |ℓ_{i,s}| or the sum of absolute losses over all experts). It captures how “large” the loss environment has become. Scaled fluctuation γₜ = max_i |ℓ_{i,t}| / V_{t‑1} measures the size of the current loss relative to the accumulated volume. If γₜ → 0, the one‑step losses become negligible compared with the total loss accumulated so far—a realistic scenario when extreme events are rare or when the loss process stabilises.

Armed with these measures, the authors modify the FPL scheme in two essential ways. First, the learning rate ηₜ is made dynamic: ηₜ = √( (log N) / Vₜ ). As the volume grows, ηₜ shrinks, preventing the algorithm from over‑reacting to occasional huge losses. Second, the perturbation distribution is scaled to the current volume rather than being a fixed‑scale Laplace noise. Concretely, each expert i receives an independent perturbation ξ_{i,t} drawn from an exponential (or suitably scaled Gaussian) distribution with mean proportional to ηₜ·V_{t‑1}. The algorithm then selects the expert with the smallest perturbed cumulative loss L_{i,t‑1}+ξ_{i,t}. This “volume‑aware” perturbation guarantees that even if a loss spikes dramatically, the probability of selecting a catastrophically bad expert does not explode.

The theoretical contribution consists of two regret bounds. Theorem 1 (general case) assumes only that the scaled fluctuations are bounded and that Σ_{t=1}^T γ_t = o(T). Under these conditions the expected regret satisfies

R_T ≤ O( √(V_T log N) + Σ_{t=1}^T γ_t V_{t‑1} ).

The first term mirrors the classic √(T) behaviour but with T replaced by the cumulative volume V_T, while the second term accounts for the residual impact of large one‑step losses. Theorem 2 (optimal case) further assumes γ_t → 0. In this regime the second term vanishes asymptotically, yielding

R_T ≤ O( √(V_T log N) ).

Thus, when the scaled fluctuations decay, the algorithm attains the same order of regret as in the bounded‑loss setting, despite the possibility of arbitrarily large individual losses. The proofs rely on martingale concentration inequalities and a careful analysis of how the volume‑scaled perturbations control the deviation between the perturbed leader and the true best expert.

From a computational standpoint the algorithm requires O(N) time and memory per round, identical to standard FPL, plus the trivial overhead of updating V_t and γ_t. The authors also present empirical evaluations on synthetic data (mixtures of Gaussian and heavy‑tailed Pareto losses) and on real‑world financial series (S&P 500 daily returns transformed into ten technical‑indicator experts). In both settings the proposed “volume‑aware FPL” consistently outperforms the classic bounded‑loss FPL, reducing average regret by roughly 15‑30 % and, crucially, avoiding catastrophic spikes in regret during periods of extreme market turbulence.

The paper’s significance lies in its removal of the bounded‑loss assumption, which has long limited the applicability of online learning theory to domains where losses can be unbounded (e.g., finance, network security, online advertising). By introducing volume and scaled fluctuation as state‑dependent measures and by adapting both the learning rate and perturbation scale accordingly, the authors provide a robust, theoretically optimal framework for adversarial environments with potentially unlimited one‑step losses. Future work could explore adaptive estimation of V_t and γ_t, extensions to infinite expert classes, or integration with other online convex optimization techniques that also handle unbounded gradients. Overall, the work bridges a critical gap between elegant regret theory and the messy realities of real‑world sequential decision making.

Online Learning in Case of Unbounded Losses Using the Follow Perturbed Leader Algorithm

💡 Research Summary

Comments & Academic Discussion

Leave a Comment