Online Learning via Sequential Complexities

We consider the problem of sequential prediction and provide tools to study the minimax value of the associated game. Classical statistical learning theory provides several useful complexity measures to study learning with i.i.d. data. Our proposed sequential complexities can be seen as extensions of these measures to the sequential setting. The developed theory is shown to yield precise learning guarantees for the problem of sequential prediction. In particular, we show necessary and sufficient conditions for online learnability in the setting of supervised learning. Several examples show the utility of our framework: we can establish learnability without having to exhibit an explicit online learning algorithm.

💡 Research Summary

The paper tackles the fundamental problem of online (sequential) prediction by framing it as a two‑player zero‑sum game: in each round the learner selects a hypothesis from a class 𝔽, and an adversary then chooses the next instance‑label pair, possibly based on the learner’s past actions. Performance is measured by the minimax value V_T(𝔽), i.e., the smallest worst‑case regret the learner can guarantee over T rounds, where regret is the learner’s cumulative loss minus that of the best fixed hypothesis in 𝔽 chosen in hindsight. Classical statistical learning theory provides several complexity measures (VC dimension, Rademacher complexity, covering numbers) that are extremely useful for i.i.d. data, but they do not directly capture the adaptive nature of online settings, where the data sequence can change in response to the learner’s predictions.
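To make the game protocol concrete, here is a minimal Python sketch of a single play of the game, assuming performance is measured as average regret against the best fixed hypothesis in hindsight. The names `play_game`, `follow_the_leader`, and `constant_adversary`, the 0/1 loss, and the two-element hypothesis class are all illustrative assumptions, not taken from the paper. The sketch evaluates one learner against one adversary; it does not compute the minimax value V_T(𝔽), which would require optimizing over both players.

```python
def zero_one_loss(prediction, label):
    """0/1 loss: 0.0 for a correct prediction, 1.0 otherwise."""
    return 0.0 if prediction == label else 1.0


def play_game(learner, adversary, hypotheses, T):
    """Run T rounds and return the learner's average regret.

    learner(history, hypotheses) -> hypothesis (a callable x -> prediction)
    adversary(history)           -> (x, y), may depend on past rounds
    """
    history = []  # one (hypothesis, x, y) triple per completed round
    learner_loss = 0.0
    for _ in range(T):
        f = learner(history, hypotheses)
        x, y = adversary(history)
        learner_loss += zero_one_loss(f(x), y)
        history.append((f, x, y))
    # Loss of the best single hypothesis chosen in hindsight.
    best_fixed = min(
        sum(zero_one_loss(h(x), y) for _, x, y in history) for h in hypotheses
    )
    return (learner_loss - best_fixed) / T


def follow_the_leader(history, hypotheses):
    """Pick the hypothesis with the smallest loss so far (first one on ties)."""
    return min(
        hypotheses,
        key=lambda h: sum(zero_one_loss(h(x), y) for _, x, y in history),
    )


def always_minus(x):
    return -1


def always_plus(x):
    return 1


def constant_adversary(history):
    """A weak, hypothetical adversary that ignores the history entirely."""
    return 0.0, 1


regret = play_game(
    follow_the_leader, constant_adversary, [always_minus, always_plus], T=10
)
print(regret)
```

Against this weak adversary the learner errs only in the first round, so the average regret shrinks as T grows. An adaptive adversary would instead inspect `history` and pick the pair that hurts the learner most.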

To bridge this gap, the authors introduce a family of “sequential complexities,” the most prominent being the Sequential Rademacher Complexity R_T(𝔽). Instead of sampling independent Rademacher signs for a fixed data set, they construct a full binary tree of depth T. Each node of the tree is labeled by a possible instance x_t that the adversary could present after observing the sequence of previous signs σ₁,…,σ_{t‑1}. Independent Rademacher variables σ₁,…,σ_T are then placed on the edges, and the complexity is defined as the expected supremum over 𝔽 of the average signed output along a random root‑to‑leaf path. Formally,

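R_T(𝔽) = 𝔼_{σ} [ sup_{f∈𝔽} (1/T) Σ_{t=1}^{T} σ_t f(x_t(σ₁,…,σ_{t‑1})) ],

where x = (x₁,…,x_T) denotes the tree and x_t(σ₁,…,σ_{t‑1}) is the instance at the node reached by following the signs σ₁,…,σ_{t‑1} from the root. The complexity of the class is then taken as the worst case over all such trees.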
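The tree-based quantity described above (the expected supremum over 𝔽 of the average signed output along a random root‑to‑leaf path) can be computed exactly for a small depth by enumerating all sign sequences. The following Python sketch does this for one fixed tree. The function names, the threshold class on [0, 1], and the two example trees are illustrative assumptions and do not come from the paper's text as quoted here.

```python
from itertools import product


def sequential_rademacher(tree, functions, T):
    """Exact value, for one fixed tree, of
    E_sigma[ max_f (1/T) * sum_t sigma_t * f(x_t(sigma_1..sigma_{t-1})) ].

    tree(prefix) maps the tuple of earlier signs to the instance at that node.
    """
    total = 0.0
    for signs in product((-1, 1), repeat=T):  # all 2**T root-to-leaf paths
        best = max(
            sum(signs[t] * f(tree(signs[:t])) for t in range(T))
            for f in functions
        )
        total += best / T
    return total / 2 ** T


def make_threshold(theta):
    """Hypothetical class member: +1 if x >= theta, else -1."""
    return lambda x: 1 if x >= theta else -1


def bisection_tree(prefix):
    """Path-dependent tree: each sign halves the interval still in play."""
    lo, hi = 0.0, 1.0
    for s in prefix:
        mid = (lo + hi) / 2
        if s == 1:
            hi = mid  # keeps thresholds that output +1 at mid
        else:
            lo = mid  # keeps thresholds that output -1 at mid
    return (lo + hi) / 2


T = 3


def fixed_points(prefix):
    """Path-independent tree: the instance depends only on the depth."""
    return (len(prefix) + 1) / (T + 1)


# Finite grid of thresholds, fine enough to separate every depth-T node.
thresholds = [make_threshold(k / 2 ** T) for k in range(2 ** T + 1)]

adaptive_value = sequential_rademacher(bisection_tree, thresholds, T)
fixed_value = sequential_rademacher(fixed_points, thresholds, T)
print(adaptive_value, fixed_value)
```

On the bisection tree, some threshold agrees with the signs along every path, so the value stays at its maximum of 1 regardless of depth. On the path-independent tree the same class scores strictly less. This gap is one way to see why a fixed data set understates the difficulty of the sequential problem.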

📜 Original Paper Content