Nonparametric Bayesian Optimization for General Rewards
This work focuses on Bayesian optimization (BO) under reward model uncertainty. We propose the first BO algorithm that achieves a no-regret guarantee in a general reward setting, requiring only Lipschitz continuity of the objective function and accommodating a broad class of measurement noise. The core of our approach is a novel surrogate model, termed the infinite Gaussian process ($\infty$-GP). It is a Bayesian nonparametric model that places a prior on the space of reward distributions, enabling it to represent a substantially broader class of reward models than the classical Gaussian process (GP). The $\infty$-GP is used in combination with Thompson Sampling (TS) to enable effective exploration and exploitation. Correspondingly, we develop a new TS regret analysis framework for general rewards, which relates the regret to the total variation distance between the surrogate model and the true reward distribution. Furthermore, with a truncated Gibbs sampling procedure, our method is computationally scalable, incurring minimal additional memory and computational overhead compared to a classical GP. Empirical results demonstrate state-of-the-art performance, particularly in settings with non-stationary, heavy-tailed, or other ill-conditioned rewards.
💡 Research Summary
The paper addresses a fundamental limitation of Bayesian optimization (BO): the reliance on restrictive assumptions about the reward function and noise, typically enforced by Gaussian process (GP) surrogates. Classical BO assumes that the deterministic component μ*(x) either follows a GP sample path or belongs to a reproducing kernel Hilbert space (RKHS) with a known norm bound, and that the stochastic noise ε*(x) is i.i.d. sub‑Gaussian. These assumptions are often violated in practice, especially when the objective is non‑stationary, the noise is heavy‑tailed or heteroscedastic, or the smoothness of μ* varies across the domain.
To overcome these constraints, the authors introduce the infinite Gaussian process (∞‑GP), a Bayesian non‑parametric surrogate that places a prior over the entire space of reward distributions rather than over a fixed functional class. The ∞‑GP is constructed as an infinite mixture of GPs, where the mixing weights and component functions are generated by a spatial Dirichlet process (DP) that adapts sequentially to the locations queried during BO. This formulation captures two distinct forms of uncertainty: (i) value uncertainty (the usual GP posterior variance) and (ii) model uncertainty (uncertainty about the correct family of reward distributions). By allowing each input point to be represented as a mixture of infinitely many GP experts, the model can approximate a far broader class of reward distributions while only requiring that μ* be Lipschitz continuous—a strictly weaker condition than the RKHS assumption.
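To make the mixture construction concrete, here is a minimal sketch of drawing one function from a truncated Dirichlet-process mixture of GP experts via stick-breaking. This is an illustration of the general idea only, not the paper's spatial DP: the function names, the truncation level, and the length-scale range are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(X, Y, lengthscale):
    # Squared-exponential kernel; each mixture component gets its own length-scale.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def sample_dp_mixture_of_gps(X, alpha=1.0, n_components=20):
    """Draw one function from a (truncated) DP mixture of GPs.

    Stick-breaking weights pick a GP 'expert' per input point; because each
    expert has its own length-scale, smoothness can vary across the domain.
    Illustrative sketch only; the paper's spatial DP adapts to queried points.
    """
    # Stick-breaking construction of the mixing weights.
    betas = rng.beta(1.0, alpha, size=n_components)
    sticks = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
    weights = betas * sticks
    weights /= weights.sum()  # renormalize after truncation

    # Each component is one GP sample path with its own length-scale.
    lengthscales = rng.uniform(0.05, 1.0, size=n_components)
    experts = np.stack([
        rng.multivariate_normal(
            np.zeros(len(X)),
            rbf_kernel(X, X, ls) + 1e-8 * np.eye(len(X)))
        for ls in lengthscales
    ])

    # Assign each input location to one expert (a spatially varying mixture).
    assignments = rng.choice(n_components, size=len(X), p=weights)
    return experts[assignments, np.arange(len(X))]

X = np.linspace(0, 1, 50)[:, None]
f = sample_dp_mixture_of_gps(X)  # one draw evaluated on a 50-point grid
```

Because different regions of the input space can be served by experts with different length-scales, a single draw can be smooth in one region and rough in another, which is precisely the kind of non-stationarity a single GP prior cannot express.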
The acquisition policy is Thompson Sampling (TS). The authors develop a novel regret analysis that directly relates the Bayesian regret of TS to the total variation (TV) distance between the surrogate’s posterior predictive distribution and the true reward distribution. If the ∞‑GP posterior is consistent with the true distribution, the TV distance converges to zero, yielding a sub‑linear regret bound (O(√T)) without invoking information‑gain arguments that depend on RKHS norms. This analysis is the first to provide a no‑regret guarantee for BO under such general reward settings.
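For readers unfamiliar with the acquisition policy, the generic TS loop can be sketched as follows with an ordinary GP surrogate standing in for the ∞-GP (the paper samples from the ∞-GP posterior instead; the grid, kernel, and toy objective here are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def gp_posterior(X_obs, y_obs, X_grid, lengthscale=0.2, noise=1e-2):
    # Standard GP regression posterior over a discrete candidate grid.
    def k(A, B):
        d2 = (A[:, None] - B[None, :]) ** 2
        return np.exp(-0.5 * d2 / lengthscale**2)
    K = k(X_obs, X_obs) + noise * np.eye(len(X_obs))
    Ks = k(X_grid, X_obs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    V = np.linalg.solve(L, Ks.T)
    return Ks @ alpha, k(X_grid, X_grid) - V.T @ V  # posterior mean, covariance

def thompson_sampling(objective, X_grid, n_rounds=30):
    # Seed with one random query, then repeat: draw one posterior sample path,
    # query its argmax, and fold the new observation into the data set.
    X_obs = np.array([rng.choice(X_grid)])
    y_obs = np.array([objective(X_obs[-1])])
    for _ in range(n_rounds - 1):
        mean, cov = gp_posterior(X_obs, y_obs, X_grid)
        draw = rng.multivariate_normal(mean, cov + 1e-8 * np.eye(len(X_grid)))
        x_next = X_grid[np.argmax(draw)]
        X_obs = np.append(X_obs, x_next)
        y_obs = np.append(y_obs, objective(x_next))
    return X_obs, y_obs

X_grid = np.linspace(0, 1, 100)
f = lambda x: -(x - 0.7) ** 2  # toy objective, maximized at x = 0.7
X_obs, y_obs = thompson_sampling(f, X_grid)
```

The randomness of the posterior draw is what balances exploration and exploitation: points are queried with probability proportional to their posterior probability of being optimal. The paper's analysis then bounds the regret of this loop by how far the surrogate's predictive distribution is from the truth in TV distance.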
Computationally, exact posterior inference in the infinite mixture would normally require expensive MCMC. The authors propose a truncated Gibbs sampler that keeps only O(log n) active mixture components, raising the computational complexity from O(n³) (standard GP) to O(n³ log n) and adding negligible memory overhead. This makes the method scalable to typical BO problem sizes.
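The structure of such a sampler can be illustrated with a deliberately simplified stand-in: a blocked Gibbs sweep over a truncated mixture of scalar Gaussians rather than GP experts, with the mixing weights held fixed. Everything here (the unit prior and noise variances, the fixed uniform weights, the truncation rule) is an assumption for the sketch, not the paper's procedure; the point is only that the truncation level can grow logarithmically with the sample size.

```python
import numpy as np

rng = np.random.default_rng(2)

def gibbs_sweep(y, z, mus, weights):
    """One Gibbs sweep over component assignments in a truncated mixture.

    Simplified stand-in: scalar Gaussian components replace GP experts, and
    the stick-breaking weights are held fixed (a full sampler would resample
    them too), but the blocked-update structure is the same.
    """
    K = len(mus)
    for i in range(len(y)):
        # Posterior responsibility of each component for observation i.
        loglik = -0.5 * (y[i] - mus) ** 2
        p = weights * np.exp(loglik - loglik.max())
        p /= p.sum()
        z[i] = rng.choice(K, p=p)
    # Resample component means given the current assignments
    # (conjugate update with unit prior and unit noise variance).
    for k in range(K):
        members = y[z == k]
        if len(members):
            prec = 1.0 + len(members)
            mus[k] = rng.normal(members.sum() / prec, np.sqrt(1.0 / prec))
        else:
            mus[k] = rng.normal(0.0, 1.0)  # empty components drawn from the prior
    return z, mus

# Two well-separated clusters; the sampler should place means near -3 and +3.
n = 200
y = np.concatenate([rng.normal(-3, 0.5, n // 2), rng.normal(3, 0.5, n // 2)])
K = max(2, int(np.ceil(np.log(n))))  # truncation level grows like log n
z = rng.integers(0, K, size=n)
mus = rng.normal(0, 1, size=K)
weights = np.full(K, 1.0 / K)
for _ in range(20):
    z, mus = gibbs_sweep(y, z, mus, weights)
```

In the GP setting each of the O(log n) active components carries its own O(n³) GP update, which is where the quoted O(n³ log n) total cost comes from.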
Empirical evaluation covers three challenging scenarios: (1) non‑stationary objectives where smoothness and length‑scale vary across the domain, (2) heavy‑tailed noise modeled by α‑stable distributions, and (3) heteroscedastic noise with location‑dependent variance. In all cases, ∞‑GP‑TS outperforms conventional GP‑UCB, GP‑TS, and specialized variants designed for each individual challenge. The method consistently achieves faster convergence to the global optimum and lower cumulative regret, demonstrating robustness to model misspecification and extreme noise.
The contributions can be summarized as: (i) a flexible non‑parametric surrogate (∞‑GP) that relaxes all standard GP assumptions, (ii) a TV‑based TS regret framework that yields no‑regret guarantees under merely Lipschitz continuity and mild tail conditions, (iii) an efficient truncated Gibbs inference scheme with near‑GP computational cost, and (iv) extensive experiments confirming state‑of‑the‑art performance on ill‑conditioned reward landscapes.
Limitations include the reliance on Lipschitz continuity and mild tail assumptions for the theoretical guarantees, potential scalability concerns in very high‑dimensional spaces where DP mixtures may become less efficient, and the need for principled selection of the truncation level in practice. Future work could explore adaptive truncation, extensions to high‑dimensional BO, and tighter regret bounds that incorporate problem‑specific structure.