A Jointly Efficient and Optimal Algorithm for Heteroskedastic Generalized Linear Bandits with Adversarial Corruptions

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

We consider the problem of heteroskedastic generalized linear bandits (GLBs) with adversarial corruptions, which subsumes various stochastic contextual bandit settings, including heteroskedastic linear bandits and logistic/Poisson bandits. We propose HCW-GLB-OMD, which consists of two components: an online mirror descent (OMD)-based estimator and Hessian-based confidence weights to achieve corruption robustness. This is computationally efficient in that it requires only $O(1)$ space and time complexity per iteration. Under the self-concordance assumption on the link function, we show a regret bound of $\tilde{O}\left( d \sqrt{\sum_t g(\tau_t) \dot{\mu}_{t,\star}} + d^2 g_{\max} \kappa + d \kappa C \right)$, where $\dot{\mu}_{t,\star}$ is the slope of $\mu$ around the optimal arm at time $t$, the $g(\tau_t)$'s are potentially exogenously time-varying dispersions (e.g., $g(\tau_t) = \sigma_t^2$ for heteroskedastic linear bandits, $g(\tau_t) = 1$ for Bernoulli and Poisson), $g_{\max} = \max_{t \in [T]} g(\tau_t)$ is the maximum dispersion, and $C \geq 0$ is the total corruption budget of the adversary. We complement this with a lower bound of $\tilde{\Omega}\left(d \sqrt{\sum_t g(\tau_t) \dot{\mu}_{t,\star}} + d C\right)$, unifying previous problem-specific lower bounds. Thus, our algorithm achieves, up to a $\kappa$-factor in the corruption term, instance-wise minimax optimality simultaneously across various instances of heteroskedastic GLBs with adversarial corruptions.


💡 Research Summary

The paper tackles a highly general and challenging setting in stochastic contextual bandits: heteroskedastic generalized linear bandits (GLBs) under adversarial reward corruptions. In this model, at each round the learner receives a possibly adversarially chosen context set, selects an arm, and observes a reward drawn from a GLM whose dispersion parameter may vary over time (capturing heteroskedasticity) and is then perturbed by an adaptive adversary with a total corruption budget C. The goal is to minimize cumulative pseudo‑regret measured in terms of the link function μ.

The authors propose a new algorithm called HCW‑GLB‑OMD (Hessian‑Confidence Weighted GLB‑Online Mirror Descent). The method builds on the recent GLB‑OMD framework but introduces a novel Hessian‑based confidence weighting scheme inspired by the CW‑OFUL algorithm. After observing a possibly corrupted reward, the algorithm assigns a weight
$w_t = \min\left\{ 1,\ \alpha\, g(\tau_t) / \|x_t\|_{H_t^{-1}} \right\}$,
where $g(\tau_t)$ is the known dispersion function (e.g., $\sigma_t^2$ for heteroskedastic linear bandits) and $H_t$ is the accumulated Hessian $H_t = \lambda I + \sum_{s \le t} \dot{\mu}(\langle x_s, \theta_s \rangle)\, x_s x_s^\top$. This weight down-weights the contribution of rounds where the uncertainty relative to the noise level is large, thereby limiting the impact of adversarial corruptions.
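As a concrete illustration, here is a minimal sketch of how such a Hessian-based confidence weight could be computed, assuming the $\min\{1, \alpha\, g(\tau_t)/\|x_t\|_{H_t^{-1}}\}$ form described above; the function name and toy values are illustrative, not from the paper.

```python
import numpy as np

def confidence_weight(x, H_inv, g_tau, alpha):
    """Hessian-based confidence weight (illustrative sketch).

    Down-weights rounds where the Mahalanobis uncertainty ||x||_{H^{-1}}
    is large relative to the dispersion g(tau), so rounds the adversary
    could most easily exploit contribute less to the estimator.
    """
    mahalanobis = np.sqrt(x @ H_inv @ x)  # ||x||_{H_t^{-1}}
    return min(1.0, alpha * g_tau / mahalanobis)

# Toy usage: H_t = 2I, so ||x||_{H^{-1}} = ||x|| / sqrt(2)
H_inv = np.linalg.inv(np.eye(3) * 2.0)
x = np.array([1.0, 0.0, 0.0])
w = confidence_weight(x, H_inv, g_tau=1.0, alpha=0.5)
```

Note that the weight is capped at 1, so well-explored directions (small Mahalanobis norm) are used at full strength while uncertain directions are attenuated.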

The algorithm proceeds in three steps each round: (1) select an arm by maximizing an optimistic estimate that combines the current mean prediction $\mu(\langle x, \theta_t \rangle)$ with a confidence bonus proportional to the Mahalanobis norm $\|x\|_{H_t^{-1}}$; (2) receive the corrupted reward $\tilde{r}_t = r_t + c_t$; (3) update the parameter estimate using an online mirror descent step on the weighted negative log-likelihood. Crucially, all operations require only $O(1)$ additional memory and computation beyond the standard OMD update, making the method highly scalable.
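The three steps above can be sketched as a single round of play. This is a simplified illustration, not the paper's exact algorithm: it uses a logistic link, a plain weighted gradient step in place of the full mirror-descent update, and a generic bonus scale `beta`; all names and constants are assumptions for the sake of the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hcw_round(theta, H, arms, reward_fn, g_tau, alpha=0.5, beta=1.0, eta=0.1):
    """One illustrative round: optimistic selection, weighted update (logistic link)."""
    H_inv = np.linalg.inv(H)

    # (1) Optimistic arm selection: mean prediction + confidence bonus.
    def ucb(x):
        return sigmoid(x @ theta) + beta * np.sqrt(x @ H_inv @ x)
    x_t = max(arms, key=ucb)

    # (2) Observe the (possibly corrupted) reward.
    r_t = reward_fn(x_t)

    # Hessian-based confidence weight, as described above.
    w_t = min(1.0, alpha * g_tau / np.sqrt(x_t @ H_inv @ x_t))

    # (3) Weighted gradient step on the negative log-likelihood
    # (stand-in for the paper's mirror-descent update).
    mu = sigmoid(x_t @ theta)
    theta = theta - eta * w_t * (mu - r_t) * x_t

    # Rank-one Hessian update with logistic slope mu'(z) = mu (1 - mu).
    H = H + w_t * mu * (1.0 - mu) * np.outer(x_t, x_t)
    return theta, H, x_t

# Toy usage: 5 random arms in dimension 3, Bernoulli rewards.
rng = np.random.default_rng(0)
d = 3
theta, H = np.zeros(d), np.eye(d)
arms = [rng.normal(size=d) for _ in range(5)]
for _ in range(20):
    theta, H, _ = hcw_round(
        theta, H, arms,
        lambda x: float(rng.random() < sigmoid(x @ np.ones(d))),
        g_tau=1.0,
    )
```

Because each round touches only the chosen arm's rank-one statistics, the per-round cost beyond the estimator update is constant, matching the $O(1)$ overhead claim.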

Theoretical contributions are twofold. First, under the standard self-concordance assumption on the link function (which bounds the second derivative of $\mu$ by a constant multiple of its first derivative), the authors prove a regret bound of $\tilde{O}\left( d \sqrt{\sum_t g(\tau_t) \dot{\mu}_{t,\star}} + d^2 g_{\max} \kappa + d \kappa C \right)$. Second, they establish a nearly matching lower bound of $\tilde{\Omega}\left( d \sqrt{\sum_t g(\tau_t) \dot{\mu}_{t,\star}} + d C \right)$, so the algorithm is instance-wise minimax optimal up to a $\kappa$-factor in the corruption term.
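For concreteness, the self-concordance condition mentioned above is commonly written as follows; the constant $R_s$ is our notation for the example, and the logistic link satisfies it with $R_s = 1$:

```latex
% Self-concordance of the link function \mu:
% the curvature is controlled by the slope, uniformly in z.
|\ddot{\mu}(z)| \;\le\; R_s \, \dot{\mu}(z) \qquad \text{for all } z \in \mathbb{R}.
% Example: for the logistic link \mu(z) = (1 + e^{-z})^{-1},
% \ddot{\mu}(z) = \dot{\mu}(z)\,(1 - 2\mu(z)), so |\ddot{\mu}(z)| \le \dot{\mu}(z),
% i.e., R_s = 1.
```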

