Lipschitz Bandits with Stochastic Delayed Feedback
The Lipschitz bandit problem extends stochastic bandits to a continuous action set defined over a metric space, where the expected reward function satisfies a Lipschitz condition. In this work, we introduce the Lipschitz bandit problem with stochastic delayed feedback, in which rewards are not observed immediately but only after a random delay. We consider both bounded and unbounded stochastic delays, and design algorithms that attain sublinear regret guarantees in each setting. For bounded delays, we propose a delay-aware zooming algorithm that retains the optimal performance of the delay-free setting up to an additional term that scales with the maximal delay $\tau_{\max}$. For unbounded delays, we propose a novel phased learning strategy that accumulates reliable feedback over carefully scheduled intervals, and we establish a regret lower bound showing that our method is optimal up to logarithmic factors. Finally, we present experimental results demonstrating the effectiveness of our algorithms under various delay scenarios.
💡 Research Summary
This paper initiates the study of Lipschitz (continuum‑armed) bandits when feedback arrives after a random delay. The authors consider two regimes: bounded delays, where the delay τ_t is guaranteed to lie in {0,…,τ_max}, and unbounded delays, where τ_t may be arbitrarily large (including the possibility of never arriving). The action space A is a compact doubling metric space of diameter 1, and the unknown mean reward function μ: A →
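To make the bounded-delay setting concrete, the following is a minimal, hypothetical sketch of a delay-aware zooming-style algorithm on the interval A = [0, 1]. It is an illustrative simplification, not the paper's actual algorithm: arms are activated at uncovered points, rewards arrive after a random delay of at most `tau_max` rounds, and each arm's confidence radius is inflated by an assumed `tau_max / n` term to hedge against in-flight feedback (the precise inflation used by the authors may differ).

```python
# Illustrative sketch (NOT the paper's exact method) of a zooming-style
# bandit on A = [0, 1] with stochastic bounded delays in {0, ..., tau_max}.
import math
import random

def delay_aware_zooming(mu, horizon, tau_max, seed=0):
    """Run the sketch for `horizon` rounds; return realized pseudo-regret."""
    rng = random.Random(seed)
    arms = []       # active arm centers
    stats = {}      # center -> [num_observed_rewards, reward_sum]
    pending = []    # (arrival_round, center, reward) still in flight
    total = 0.0
    for t in range(1, horizon + 1):
        # Deliver feedback whose random delay has elapsed.
        arrived = [p for p in pending if p[0] <= t]
        pending = [p for p in pending if p[0] > t]
        for _, x, r in arrived:
            stats[x][0] += 1
            stats[x][1] += r

        def radius(x):
            n = max(stats[x][0], 1)
            # Standard confidence width, plus an assumed tau_max / n
            # inflation to account for rewards still in flight.
            return math.sqrt(2 * math.log(horizon) / n) + tau_max / n

        # Activation rule: add an arm at an uncovered point (grid check).
        for y in (i / 100 for i in range(101)):
            if all(abs(y - x) > radius(x) for x in arms):
                arms.append(y)
                stats[y] = [0, 0.0]
                break

        # Play the arm with the largest optimistic index.
        def index(x):
            n = stats[x][0]
            mean = stats[x][1] / n if n else 1.0  # optimistic default
            return mean + 2 * radius(x)

        x = max(arms, key=index)
        total += mu(x)
        reward = mu(x) + rng.gauss(0, 0.1)
        # Feedback arrives after a uniform random delay in {0, ..., tau_max}.
        pending.append((t + rng.randint(0, tau_max), x, reward))

    best = max(mu(i / 1000) for i in range(1001))
    return best * horizon - total
```

For example, with a 1-Lipschitz reward `mu(x) = 1 - |x - 0.3|`, a horizon of 200 rounds, and `tau_max = 5`, the returned pseudo-regret is nonnegative and well below the trivial linear bound.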