CRPS-Based Targeted Sequential Design with Application in Chemical Space


Sequential design of real and computer experiments via Gaussian Process (GP) models has proven useful for parsimonious, goal-oriented data acquisition purposes. In this work, we focus on acquisition strategies for a GP model that needs to be accurate within a predefined range of the response of interest. Such an approach is useful in various fields including synthetic chemistry, where finding molecules with particular properties is essential for developing useful materials and effective medications. GP modeling and sequential design of experiments have been successfully applied to a plethora of domains, including molecule research. Our main contribution here is to use the threshold-weighted Continuous Ranked Probability Score (CRPS) as a basic building block for acquisition functions employed within sequential design. We study pointwise and integral criteria relying on two different weighting measures and benchmark them against competitors, demonstrating improved performance with respect to considered goals. The resulting acquisition strategies are applicable to a wide range of fields and pave the way to further developing sequential design relying on scoring rules.

💡 Research Summary

This paper addresses the problem of sequential experimental design when the ultimate goal is not to locate a global optimum or to improve overall predictive accuracy, but rather to obtain accurate predictions within a predefined region of interest – an excursion set defined by a threshold t (Γ = {x ∈ X | f(x) ≥ t}). The authors propose to use the threshold‑weighted Continuous Ranked Probability Score (CRPS) as the building block of acquisition functions for Gaussian Process (GP) models.

First, the standard GP framework is reviewed, emphasizing that the posterior predictive distribution at any location x is Gaussian with mean mₙ(x) and variance kₙ(x,x). From this, the excursion probability pₙ(x) = Φ((mₙ(x)‑t)/√kₙ(x,x)) can be computed analytically, allowing a simple binary classifier ηₙ(x) = 1{pₙ(x) ≥ 0.5} and the associated estimator of the excursion set. However, classic acquisition criteria such as Expected Improvement (EI), Targeted Mean Square Error (TMSE) or Targeted Integrated MSE (TIMSE) either ignore the location of the threshold or treat all regions equally, which is sub‑optimal for the targeted task.
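The excursion probability and plug-in classifier described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the posterior mean mₙ(x) and variance kₙ(x,x) are already available as arrays. The function names and the numeric values are hypothetical and not taken from the paper.

```python
import numpy as np
from scipy.stats import norm


def excursion_probability(mean, var, t):
    """p_n(x) = Phi((m_n(x) - t) / sqrt(k_n(x, x))) for a Gaussian posterior."""
    mean = np.asarray(mean, dtype=float)
    sd = np.sqrt(np.asarray(var, dtype=float))
    return norm.cdf((mean - t) / sd)


def classify_excursion(mean, var, t):
    """eta_n(x) = 1{p_n(x) >= 0.5}: plug-in estimate of excursion-set membership."""
    return excursion_probability(mean, var, t) >= 0.5


# Hypothetical posterior summaries at three candidate points (illustrative only).
m_n = np.array([0.0, 1.2, 2.5])    # posterior means m_n(x)
k_n = np.array([1.0, 0.25, 0.5])   # posterior variances k_n(x, x)
p_n = excursion_probability(m_n, k_n, t=1.0)
eta_n = classify_excursion(m_n, k_n, t=1.0)
```

Because Φ is increasing and the posterior standard deviation is positive, pₙ(x) ≥ 0.5 holds exactly when mₙ(x) ≥ t, so the classifier amounts to thresholding the posterior mean.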

The CRPS measures the integrated squared difference between a predictive cumulative distribution function F and the step function at the observed value y, CRPS(F, y) = ∫ (F(u) − 1{u ≥ y})² du; lower scores indicate better forecasts. By integrating against a weighting measure γ instead of the Lebesgue measure, the authors obtain a threshold‑weighted CRPS that emphasizes values near or above the threshold. Two weighting schemes are considered: (i) an indicator weight γ₁(u)=1_{u≥t} that gives full weight to response values at or above t, and (ii) a Gaussian weight γ₂(u)=𝒩(u; t, σ_γ²) that smoothly focuses on the neighbourhood of t.
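To make the two weighting schemes concrete, the sketch below evaluates the threshold-weighted CRPS of a Gaussian forecast by simple trapezoidal quadrature. It is an illustration under stated assumptions rather than the authors' implementation: the closed-form expression used as a sanity check is the standard CRPS formula for a Gaussian forecast, the quadrature stands in for any closed forms derived in the paper, and the helper names, the grid and the values of t and σ_γ are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def crps_gaussian(mu, sigma, y):
    """Standard closed-form CRPS of the forecast N(mu, sigma^2) at observation y."""
    z = (y - mu) / sigma
    return sigma * (z * (2.0 * norm.cdf(z) - 1.0) + 2.0 * norm.pdf(z) - 1.0 / np.sqrt(np.pi))


def trapezoid(values, grid):
    """Trapezoidal rule written out by hand to avoid NumPy version differences."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))


def tw_crps_gaussian(mu, sigma, y, weight, grid):
    """Approximate the integral of (F(u) - 1{u >= y})^2 * weight(u) over the grid."""
    F = norm.cdf(grid, loc=mu, scale=sigma)
    step = (grid >= y).astype(float)
    return trapezoid((F - step) ** 2 * weight(grid), grid)


def indicator_weight(t):
    """gamma_1(u) = 1{u >= t}."""
    return lambda u: (u >= t).astype(float)


def gaussian_weight(t, sigma_gamma):
    """gamma_2(u) = N(u; t, sigma_gamma^2), used as a density."""
    return lambda u: norm.pdf(u, loc=t, scale=sigma_gamma)


# Illustrative values only: forecast N(0, 1), observation 0.5, threshold 1.0.
grid = np.linspace(-10.0, 10.0, 200001)
mu, sigma, y, t = 0.0, 1.0, 0.5, 1.0
closed = crps_gaussian(mu, sigma, y)
unweighted = tw_crps_gaussian(mu, sigma, y, lambda u: np.ones_like(u), grid)
above_t = tw_crps_gaussian(mu, sigma, y, indicator_weight(t), grid)
near_t = tw_crps_gaussian(mu, sigma, y, gaussian_weight(t, 0.5), grid)
```

With a constant weight of one, the quadrature should agree with the closed-form value up to discretization error, which gives a quick check that the weighted variants are computed consistently.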

Crucially, the expected weighted CRPS under the current GP posterior can be expressed in closed form using only mₙ(x), kₙ(x,x) and the parameters of γ. This enables the construction of two families of acquisition functions:

Pointwise (myopic) criterion:
Gₙ(x) = E
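The expected weighted score that such criteria build on can be approximated numerically. The sketch below does not reproduce the paper's closed-form expression or its acquisition criterion. It relies only on the identity E[(F(u) − 1{u ≥ Y})²] = F(u)(1 − F(u)) when Y is distributed according to F, so that the expected threshold-weighted CRPS under a Gaussian posterior is the integral of F(u)(1 − F(u)) against the weight. Function names, grid and numeric values are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def expected_tw_crps(mean, sd, weight, grid):
    """E_Y[twCRPS(F, Y)] for Y ~ F = N(mean, sd^2), by trapezoidal quadrature.

    Uses E[(F(u) - 1{u >= Y})^2] = F(u) * (1 - F(u)) when Y ~ F.
    """
    F = norm.cdf(grid, loc=mean, scale=sd)
    integrand = F * (1.0 - F) * weight(grid)
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(grid)))


def gaussian_weight(t, sigma_gamma):
    """gamma_2(u) = N(u; t, sigma_gamma^2), used as a density."""
    return lambda u: norm.pdf(u, loc=t, scale=sigma_gamma)


grid = np.linspace(-30.0, 30.0, 60001)

# Sanity check: with a constant weight the expected CRPS of N(m, s^2) is s / sqrt(pi).
flat = expected_tw_crps(0.0, 2.0, lambda u: np.ones_like(u), grid)

# Hypothetical candidates: one posterior centred on the threshold, one far above it.
w = gaussian_weight(t=1.0, sigma_gamma=0.5)
near = expected_tw_crps(1.0, 1.0, w, grid)
far = expected_tw_crps(4.0, 1.0, w, grid)
```

In this toy comparison the candidate whose posterior straddles the threshold receives a much larger expected weighted score than the one whose posterior lies well above it, which matches the intuition that the weighting concentrates attention on the neighbourhood of t.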
