A Simple Reduction Scheme for Constrained Contextual Bandits with Adversarial Contexts via Regression


We study constrained contextual bandits (CCB) with adversarially chosen contexts, where each action yields a random reward and incurs a random cost. We adopt the standard realizability assumption: conditioned on the observed context, rewards and costs are drawn independently from fixed distributions whose expectations belong to known function classes. We consider the continuing setting, in which the algorithm operates over the entire horizon even after the budget is exhausted. In this setting, the objective is to simultaneously control regret and cumulative constraint violation. Building on the seminal SquareCB framework of Foster & Rakhlin (2020), we propose a simple and modular algorithmic scheme that leverages online regression oracles to reduce the constrained problem to a standard unconstrained contextual bandit problem with adaptively defined surrogate reward functions. In contrast to most prior work on CCB, which focuses on stochastic contexts, our reduction yields improved guarantees for the more general adversarial context setting, together with a compact and transparent analysis.


💡 Research Summary

The paper tackles the problem of constrained contextual bandits (CCB) where each round presents an adversarially chosen context, a stochastic reward, and a stochastic cost. Under the standard realizability assumption—there exist functions f*∈F and g*∈G that exactly describe the conditional expectations of reward and cost—the learner must maximize cumulative reward while keeping long‑term cost (or budget) violations small. Unlike most prior work that assumes i.i.d. contexts, the authors consider fully adversarial contexts, which allows the model to remain robust under distribution shift or adaptive adversaries.
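In the continuing setting described above, the two performance measures admit the following standard forms (written here in the conventional CCB notation; the paper's exact definitions may differ in minor details, e.g. the comparator class):

```latex
\mathrm{Reg}(T) \;=\; \sum_{t=1}^{T} f^{*}\!\bigl(x_t, \pi^{*}(x_t)\bigr) \;-\; \sum_{t=1}^{T} f^{*}(x_t, a_t),
\qquad
\mathrm{Viol}(T) \;=\; \Bigl( \sum_{t=1}^{T} g^{*}(x_t, a_t) \;-\; B \Bigr)_{\!+},
```

where $a_t$ is the learner's action, $\pi^{*}$ is the optimal policy satisfying the budget constraint in expectation, $B$ is the total budget, and $(\cdot)_+ = \max\{\cdot, 0\}$. The goal is to make both quantities sublinear in $T$ simultaneously.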

The core contribution is a reduction scheme that transforms the constrained problem into an unconstrained contextual bandit problem using the SquareCB framework (Foster & Rakhlin, 2020) together with an online regression oracle O_sq. At each round t, the oracle receives the current context x_t and outputs predictions \hat f_t(x_t,a) and \hat g_t(x_t,a) for all actions a ∈ 𝒜, the action set.
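The SquareCB step converts the oracle's predictions into an action distribution via inverse-gap weighting. The sketch below illustrates this, together with one plausible way to form the surrogate reward from reward and cost predictions: a Lagrangian combination `f_hat - lam * g_hat` with a dual variable `lam`. The surrogate form and the function names are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def squarecb_probs(pred_rewards: np.ndarray, gamma: float) -> np.ndarray:
    """Inverse-gap weighting from SquareCB (Foster & Rakhlin, 2020).

    Given predicted rewards for K actions, play the greedy action with
    high probability and every other action a with probability
    1 / (K + gamma * gap(a)), where gap(a) is its predicted suboptimality.
    """
    K = len(pred_rewards)
    b = int(np.argmax(pred_rewards))  # greedy action under the predictions
    p = np.zeros(K)
    for a in range(K):
        if a != b:
            p[a] = 1.0 / (K + gamma * (pred_rewards[b] - pred_rewards[a]))
    p[b] = 1.0 - p.sum()  # remaining mass goes to the greedy action
    return p

def surrogate_rewards(f_hat: np.ndarray, g_hat: np.ndarray, lam: float) -> np.ndarray:
    """Hypothetical Lagrangian surrogate: trade predicted reward against
    predicted cost, with lam penalizing likely budget violations."""
    return f_hat - lam * g_hat
```

A round of the reduction would then call `squarecb_probs(surrogate_rewards(f_hat, g_hat, lam), gamma)`, sample an action from the resulting distribution, and feed the observed reward and cost back to the regression oracle; the learning rate γ controls the exploration–exploitation trade-off.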

