Gaussian Process Bandit Optimization with Machine Learning Predictions and Application to Hypothesis Generation


Many real-world optimization problems involve an expensive ground-truth oracle (e.g., human evaluation, physical experiments) and a cheap, low-fidelity prediction oracle (e.g., machine learning models, simulations). Meanwhile, abundant offline data (e.g., past experiments and predictions) are often available and can be used to pretrain powerful predictive models, as well as to provide an informative prior. We propose Prediction-Augmented Gaussian Process Upper Confidence Bound (PA-GP-UCB), a novel Bayesian optimization algorithm that leverages both oracles and offline data to achieve provable gains in sample efficiency for the ground-truth oracle queries. PA-GP-UCB employs a control-variates estimator derived from a joint Gaussian process posterior to correct prediction bias and reduce uncertainty. We prove that PA-GP-UCB preserves the standard regret rate of GP-UCB while achieving a strictly smaller leading constant that is explicitly controlled by prediction quality and offline data coverage. Empirically, PA-GP-UCB converges faster than Vanilla GP-UCB and naive prediction-augmented GP-UCB baselines on synthetic benchmarks and on a real-world hypothesis evaluation task grounded in human behavioral data, where predictions are provided by large language models. These results establish PA-GP-UCB as a general and sample-efficient framework for hypothesis generation under expensive feedback.


💡 Research Summary

The paper introduces Prediction‑Augmented Gaussian Process Upper Confidence Bound (PA‑GP‑UCB), a Bayesian optimization algorithm designed for settings where a costly, high‑fidelity oracle (e.g., human evaluation or physical experiment) co‑exists with a cheap, low‑fidelity prediction oracle (e.g., a machine‑learning model or large language model), and where abundant offline data from past experiments or predictions are available.

The authors model the true objective function f_true(x) and the prediction function f_ML(x) jointly as a two‑output Gaussian Process (GP) with a correlation matrix B that captures the statistical dependence between the two tasks. In the offline stage, they query the prediction oracle on a uniform ε‑net covering the domain and repeat each query N times to reduce observation noise. These offline observations are used to train a global GP (“GP_all”) that yields a posterior variance σ_ML,all(x) that is uniformly smaller than the variance σ_ML(x) obtained from online data alone. The ratio (σ_ML,all(x)/σ_ML(x))² ≤ R (with 0 < R ≤ 1) quantifies how much the offline data tighten the prediction uncertainty.
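The offline variance-tightening effect can be illustrated with a minimal two-output GP in numpy. The intrinsic-coregionalization form cov((x,i),(x',j)) = B[i,j]·k(x,x'), the specific kernel, the correlation value in B, and all numbers below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def rbf(X1, X2, ls=0.3):
    """Squared-exponential kernel on 1-D inputs."""
    d = X1[:, None] - X2[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

# Two-output GP (assumed ICM form): cov((x,i),(x',j)) = B[i,j] * k(x,x')
B = np.array([[1.0, 0.8],
              [0.8, 1.0]])  # task-correlation matrix (illustrative values)

def posterior_var(X_obs, tasks, x_star, task_star, noise=0.05):
    """Posterior variance of output `task_star` at x_star, given noisy
    observations of possibly different tasks at X_obs."""
    K = B[np.ix_(tasks, tasks)] * rbf(X_obs, X_obs) + noise**2 * np.eye(len(X_obs))
    k_s = B[tasks, task_star] * rbf(X_obs, np.array([x_star]))[:, 0]
    prior = B[task_star, task_star]
    return prior - k_s @ np.linalg.solve(K, k_s)

# Online-only prediction-oracle data vs. offline eps-net augmentation
X_on = np.array([0.2, 0.5]); t_on = np.array([1, 1])           # task 1 = prediction oracle
X_off = np.linspace(0.0, 1.0, 21); t_off = np.ones(21, dtype=int)  # offline eps-net queries
# (repeating each offline query N times would further shrink the effective noise)

v_online = posterior_var(X_on, t_on, 0.7, 1)
v_all = posterior_var(np.concatenate([X_on, X_off]),
                      np.concatenate([t_on, t_off]), 0.7, 1)
R = v_all / v_online  # offline data tighten the ML posterior, so R <= 1
```

Because conditioning a GP on additional observations can only shrink the posterior variance (for fixed hyperparameters), the ratio `R` is at most 1, matching the role of R in the paper's analysis.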

During the online stage, at each round t the algorithm observes both the expensive true observation y_true(x_t) and the cheap prediction y_ML(x_t). It then constructs a control‑variates estimator:

 μ_PA,t(x) = μ_true,t(x) − ρ_t(x)·(σ_true,t(x)/σ_ML,t(x))·(μ_ML,t(x) − μ_ML,all(x)),

where ρ_t(x) is the posterior correlation between the two outputs, and μ_ML,all(x) is the offline-augmented posterior mean of the prediction oracle. Subtracting the scaled discrepancy between the online-only and offline-informed prediction means corrects prediction bias, and the control-variates construction reduces the posterior uncertainty used in the UCB acquisition rule.
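One round of the online acquisition step can be sketched as follows. The function name, the β parameter, and the variance-reduction form σ_PA = σ_true·√(1 − ρ²) are assumptions based on the classical control-variates identity, not the paper's exact expressions:

```python
import numpy as np

def pa_gp_ucb_score(mu_true, sigma_true, mu_ml, sigma_ml, mu_ml_all, rho, beta=2.0):
    """Prediction-augmented UCB score at a single point x (sketch).

    mu_true, sigma_true : online posterior mean/std of f_true at x
    mu_ml, sigma_ml     : online-only posterior mean/std of f_ML at x
    mu_ml_all           : offline-augmented posterior mean of f_ML at x
    rho                 : posterior correlation between the two outputs at x
    """
    # Control-variates mean: correct mu_true with the scaled ML discrepancy
    mu_pa = mu_true - rho * (sigma_true / sigma_ml) * (mu_ml - mu_ml_all)
    # Classical control-variates variance reduction (assumed form)
    sigma_pa = sigma_true * np.sqrt(max(1.0 - rho**2, 0.0))
    return mu_pa + np.sqrt(beta) * sigma_pa

# At round t, score every candidate x and query both oracles at the argmax:
score = pa_gp_ucb_score(mu_true=1.0, sigma_true=0.5,
                        mu_ml=0.8, sigma_ml=0.4, mu_ml_all=0.9, rho=0.6)
```

Note that when ρ_t(x) = 0 the score reduces to the standard GP-UCB rule on f_true alone, which is consistent with the claim that PA-GP-UCB preserves the GP-UCB regret rate while shrinking its leading constant as prediction quality improves.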

