Portfolio Allocation for Bayesian Optimization
Bayesian optimization with Gaussian processes has become an increasingly popular tool in the machine learning community. It is efficient and can be used when very little is known about the objective function, making it popular in expensive black-box optimization scenarios. It uses Bayesian methods to sample the objective efficiently using an acquisition function which incorporates the model’s estimate of the objective and the uncertainty at any given point. However, there are several different parameterized acquisition functions in the literature, and it is often unclear which one to use. Instead of using a single acquisition function, we adopt a portfolio of acquisition functions governed by an online multi-armed bandit strategy. We propose several portfolio strategies, the best of which we call GP-Hedge, and show that this method outperforms the best individual acquisition function. We also provide a theoretical bound on the algorithm’s performance.
💡 Research Summary
Bayesian optimization (BO) has become a standard tool for optimizing expensive black‑box functions, largely because it can operate with very few prior assumptions about the objective. The core of BO consists of a surrogate model—most commonly a Gaussian process (GP)—that provides a posterior distribution over the unknown function, and an acquisition function that uses this posterior to decide where to sample next. A wide variety of acquisition functions have been proposed in the literature, including Expected Improvement (EI), Probability of Improvement (PI), Upper Confidence Bound (UCB), and many of their parameterized variants. Each of these functions embodies a different trade‑off between exploration (sampling where uncertainty is high) and exploitation (sampling where the surrogate predicts high objective values). In practice, however, it is rarely clear which acquisition function will work best for a given problem, and a poor choice can dramatically slow convergence or lead to sub‑optimal solutions.
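The three acquisition functions named above can be sketched compactly. The following is an illustrative implementation for a maximization problem, not code from the paper; the exploration parameters `xi` (for PI/EI) and `kappa` (for UCB) and their defaults are common conventions assumed here:

```python
import math

def _phi(z):
    # Standard normal pdf
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def _Phi(z):
    # Standard normal cdf
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probability_of_improvement(mu, sigma, f_best, xi=0.01):
    # PI: probability that a point improves on the incumbent f_best by at least xi
    if sigma == 0.0:
        return 0.0
    return _Phi((mu - f_best - xi) / sigma)

def expected_improvement(mu, sigma, f_best, xi=0.01):
    # EI: expected magnitude of improvement over the incumbent f_best
    if sigma == 0.0:
        return 0.0
    z = (mu - f_best - xi) / sigma
    return (mu - f_best - xi) * _Phi(z) + sigma * _phi(z)

def upper_confidence_bound(mu, sigma, kappa=2.0):
    # UCB: optimistic score; larger kappa favors exploration over exploitation
    return mu + kappa * sigma
```

Here `mu` and `sigma` are the GP posterior mean and standard deviation at a candidate point; each function turns that posterior into a scalar score, and BO evaluates the objective wherever the chosen score is maximized.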
The paper “Portfolio Allocation for Bayesian Optimization” addresses this selection problem by treating the set of acquisition functions as a portfolio of experts and dynamically allocating the evaluation budget among them using an online multi‑armed bandit (MAB) strategy. The authors propose several portfolio strategies, the most successful of which they name GP‑Hedge. In GP‑Hedge, each acquisition function is treated as an “arm” of a bandit. At every BO iteration, the GP model is first used to compute the candidate point that maximizes each acquisition function individually. A Hedge‑style weighting scheme then assigns a probability to each arm based on its cumulative performance (reward). The reward for an arm is defined as the improvement in the actual objective value obtained after evaluating the point suggested by that arm, relative to the current best. The algorithm samples one arm according to these probabilities, evaluates the corresponding point, updates the GP model, and updates the Hedge weights exponentially with a learning rate η.
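The bandit layer described above can be sketched in isolation. This is a minimal, hypothetical `Hedge` class covering only the weighting mechanics; the GP surrogate, the per-arm acquisition maximizers, and the exact reward definition used in the paper are assumed to be supplied by the caller:

```python
import math
import random

class Hedge:
    """Minimal Hedge weighting over K acquisition-function "arms".

    Illustrative sketch of the bandit layer only: it tracks cumulative
    rewards (gains) per arm and samples arms with probability proportional
    to exp(eta * gain).
    """

    def __init__(self, n_arms, eta=1.0, seed=0):
        self.gains = [0.0] * n_arms   # cumulative reward per arm
        self.eta = eta                # learning rate of the exponential update
        self.rng = random.Random(seed)

    def probabilities(self):
        # Softmax of eta-scaled gains; subtract the max for numerical stability
        m = max(self.gains)
        w = [math.exp(self.eta * (g - m)) for g in self.gains]
        s = sum(w)
        return [x / s for x in w]

    def select(self):
        # Sample one arm according to the current probabilities
        p = self.probabilities()
        r, acc = self.rng.random(), 0.0
        for i, pi in enumerate(p):
            acc += pi
            if r <= acc:
                return i
        return len(p) - 1

    def update(self, arm, reward):
        # reward: e.g. the improvement over the incumbent after evaluating
        # the point nominated by this arm (following the summary above)
        self.gains[arm] += reward
```

In a full GP‑Hedge loop, each arm's nominee would come from maximizing that arm's acquisition function over the GP posterior, and `update` would be called after the expensive objective evaluation.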
From a theoretical perspective, the authors derive a regret bound for GP‑Hedge. They show that the cumulative regret of the portfolio is at most a constant factor larger than the regret of the best single acquisition function in hindsight, plus an additive term that depends on the learning rate and the number of arms. This result guarantees that the portfolio never performs dramatically worse than the optimal individual acquisition function, while offering the potential to outperform any single method by adapting to the problem’s stage (early exploration versus later exploitation). The bound also highlights the independence of the GP model’s convergence properties from the bandit’s weight‑updating dynamics.
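The additive term mentioned above takes the familiar Hedge form. As a hedged sketch (this is the classical full‑information Hedge guarantee for rewards normalized to $[0,1]$, with $K$ arms, horizon $T$, and $G_{i,T}$ denoting arm $i$'s cumulative reward, not the paper's exact statement):

```latex
\max_{1 \le i \le K} G_{i,T} \;-\; \mathbb{E}\!\left[G_{\mathrm{Hedge},T}\right]
  \;\le\; \frac{\ln K}{\eta} \;+\; \frac{\eta T}{8},
\qquad\text{and with } \eta = \sqrt{8 \ln K / T}:\quad
\max_{1 \le i \le K} G_{i,T} \;-\; \mathbb{E}\!\left[G_{\mathrm{Hedge},T}\right]
  \;\le\; \sqrt{\tfrac{T}{2}\,\ln K}.
```

The $O(\sqrt{T \ln K})$ additive term grows sublinearly in $T$, which is why the portfolio's average per-round performance approaches that of the best single acquisition function in hindsight.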
Empirical evaluation is conducted on a suite of synthetic benchmark functions (e.g., Branin, Hartmann) and on real‑world hyper‑parameter tuning tasks for neural networks and support vector machines. The experiments compare GP‑Hedge against each individual acquisition function (EI, PI, UCB) and against a random‑selection baseline. Results consistently demonstrate that GP‑Hedge achieves lower simple regret and faster convergence. Notably, the weight trajectories reveal a sensible pattern: UCB receives higher weight during the initial iterations, encouraging broad exploration, while EI’s weight grows as the algorithm homes in on promising regions, emphasizing exploitation. Even in the worst case, GP‑Hedge’s performance is within a small margin (5–10 %) of the best single function, confirming the theoretical guarantee.
The paper also discusses practical considerations and future directions. The current portfolio is static; extending the framework to allow dynamic addition or removal of acquisition functions could further improve flexibility. The Hedge update currently uses raw improvement as reward; alternative reward scaling or normalization schemes might yield more stable learning, especially when objective values vary widely. Finally, the computational cost of GP inference grows cubically with the number of observations, so integrating sparse GP approximations or deep kernel learning could make the approach scalable to higher‑dimensional problems.
In summary, the authors present a compelling solution to the longstanding “which acquisition function should I use?” dilemma in Bayesian optimization. By framing acquisition functions as experts in a bandit portfolio, GP‑Hedge automatically balances exploration and exploitation, adapts to the evolving landscape of the optimization task, and enjoys provable regret guarantees. The method adds negligible overhead to standard BO pipelines and demonstrates robust empirical gains across diverse problems. Consequently, practitioners facing uncertain acquisition‑function choices—especially in settings with limited evaluation budgets—should consider adopting a Hedge‑based portfolio as a default strategy for more reliable and efficient Bayesian optimization.