Knowledge Gradient for Preference Learning
The knowledge gradient is a popular acquisition function in Bayesian optimization (BO) for optimizing black-box objectives with noisy function evaluations. Many practical settings, however, allow only pairwise comparison queries, yielding a preferential BO problem where direct function evaluations are unavailable. Extending the knowledge gradient to preferential BO has been hindered by a computational challenge: at its core, the look-ahead step in the preferential setting requires computing a non-Gaussian posterior, which was previously considered intractable. In this paper, we address this challenge by deriving an exact and analytical knowledge gradient for preferential BO. We show that the exact knowledge gradient performs strongly on a suite of benchmark problems, often outperforming existing acquisition functions. We also present a case study illustrating the limitations of the knowledge gradient in certain scenarios.
💡 Research Summary
The paper tackles a long‑standing obstacle in preferential Bayesian optimization (PBO): the inability to compute the Knowledge Gradient (KG) acquisition function exactly because the look‑ahead posterior after a pairwise comparison is non‑Gaussian. In standard Bayesian optimization, KG measures the expected increase in the maximum posterior mean after evaluating a candidate point. Extending this to PBO requires handling the binary outcome of a comparison, which induces a probit likelihood Pr(x₁ ≻ x₂)=Φ(f(x₁)−f(x₂)). The authors observe that conditioning on the event x₁ ≻ x₂ is equivalent to imposing a linear inequality f(x₁)−f(x₂)+ε ≥ 0 with ε∼N(0,1). Under a Gaussian‑process (GP) prior (or posterior approximated by a GP), the joint distribution of f(x) and f(x₁)−f(x₂)+ε is bivariate normal, and the conditional distribution of f(x) given the inequality follows an extended skew‑normal distribution. Crucially, the mean of this distribution has a closed‑form expression (Azzalini, 2013):
E[f(x) ∣ x₁ ≻ x₂] = μ(x) + (Cov(f(x), f(x₁) − f(x₂)) / s) · φ(m/s) / Φ(m/s),

where m = μ(x₁) − μ(x₂), s² = Var(f(x₁) − f(x₂)) + 1 (the +1 accounting for the probit noise ε), and φ, Φ denote the standard normal pdf and cdf. This closed-form look-ahead mean is what makes the exact knowledge gradient tractable.
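The closed-form conditional mean can be checked numerically. The sketch below (assumed toy setup, not the paper's code; the points, kernel, and lengthscale are illustrative choices) computes E[f | x₁ ≻ x₂] via the extended-skew-normal formula under a GP prior and compares it against a Monte Carlo estimate obtained by rejection sampling on the event f(x₁) − f(x₂) + ε ≥ 0:

```python
# Sketch: closed-form look-ahead mean E[f(x) | x1 > x2] under a GP prior,
# verified by Monte Carlo rejection sampling. Toy setup, not the paper's code.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Toy GP prior over three inputs (x, x1, x2): zero mean, RBF covariance.
pts = np.array([0.3, 0.1, 0.8])
K = np.exp(-0.5 * (pts[:, None] - pts[None, :]) ** 2 / 0.25)
mu = np.zeros(3)

# g = f(x1) - f(x2) + eps, with probit noise eps ~ N(0, 1).
a = np.array([0.0, 1.0, -1.0])      # linear functional picking f(x1) - f(x2)
m = a @ mu                           # mean of g
s = np.sqrt(a @ K @ a + 1.0)         # std of g (the +1 is the probit noise)
cov_fg = K @ a                       # Cov(f(.), g) at all three inputs

# Extended-skew-normal mean: E[f | g >= 0] = mu + Cov(f, g)/s * phi(m/s)/Phi(m/s)
mills = norm.pdf(m / s) / norm.cdf(m / s)
mean_closed = mu + cov_fg / s * mills

# Monte Carlo check: sample f and eps jointly, keep draws where the event holds.
f = rng.multivariate_normal(mu, K, size=200_000)
eps = rng.standard_normal(200_000)
keep = f @ a + eps >= 0
mean_mc = f[keep].mean(axis=0)

print(np.round(mean_closed, 3))
print(np.round(mean_mc, 3))
```

The two estimates agree to Monte Carlo error, confirming that conditioning on the comparison outcome shifts the posterior mean by a simple inverse-Mills-ratio correction rather than requiring any non-Gaussian inference.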