Scaling factors for ab initio vibrational frequencies: comparison of uncertainty models for quantified prediction


Bayesian Model Calibration is used to revisit the problem of scaling factor calibration for semi-empirical correction of ab initio calculations. Particular attention is devoted to uncertainty evaluation for scaling factors, and to their effect on the prediction of observables involving scaled properties. We argue that the linear models used for calibration of scaling factors are generally not statistically valid, in the sense that they are unable to fit calibration data within their uncertainty limits. Uncertainty evaluation and uncertainty propagation by statistical methods from such invalid models are doomed to failure. To remedy this problem, a stochastic term is included in the model to account for model inadequacy, following the Bayesian Model Calibration approach. In this framework, we demonstrate that standard calibration summary statistics, such as the optimal scaling factor and the root mean square (RMS) of the residuals, can be safely used for uncertainty propagation only when large calibration sets of precise data are used. For small datasets containing a few dozen points, a more accurate formula is provided that involves the calibration uncertainty of the scaling factor. For measurement uncertainties larger than model inadequacy, the problem reduces to a weighted least squares analysis. For intermediate cases, no analytical estimators were found, and numerical Bayesian estimation of the parameters has to be used.


💡 Research Summary

The paper revisits the long‑standing practice of applying a linear scaling factor to correct ab initio vibrational frequencies, using the formalism of Bayesian Model Calibration (BMC). The authors first demonstrate that the conventional linear model ν_exp ≈ s·ν_calc, calibrated by ordinary least squares, is statistically invalid in most realistic situations: the residuals cannot be accommodated within the experimental uncertainties, as shown by χ² tests and residual analyses. Consequently, any uncertainty propagation that relies solely on the optimal scaling factor and the root‑mean‑square (RMS) error is fundamentally flawed, because it ignores model inadequacy, i.e. the systematic discrepancy between the true physical relationship and the simplistic linear form.
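To make this validity test concrete, here is a minimal sketch of such a χ² check, assuming hypothetical calibration data (the frequencies, uncertainties, and variable names below are illustrative, not taken from the paper): fit s by ordinary least squares, then compare the weighted residual χ² against its expected distribution.

```python
import numpy as np
from scipy import stats

# Hypothetical calibration data (cm^-1): calculated and experimental
# frequencies with known measurement uncertainties u_exp.
nu_calc = np.array([520.0, 1015.0, 1630.0, 2980.0, 3410.0])
nu_exp = np.array([498.0, 975.0, 1566.0, 2860.0, 3275.0])
u_exp = np.full_like(nu_exp, 1.0)  # precise measurements

# Ordinary least-squares estimate of the scaling factor for the
# one-parameter model nu_exp ~ s * nu_calc.
s_ols = np.sum(nu_exp * nu_calc) / np.sum(nu_calc**2)

# Chi-squared validity test: if the linear model were adequate,
# chi2 should be a plausible draw from a chi-squared distribution
# with N - 1 degrees of freedom.
resid = nu_exp - s_ols * nu_calc
chi2 = np.sum((resid / u_exp) ** 2)
dof = len(nu_exp) - 1
p_value = stats.chi2.sf(chi2, dof)

print(f"s_OLS = {s_ols:.4f}, chi2 = {chi2:.1f} (dof = {dof}), p = {p_value:.2e}")
# A p-value near zero signals that the residuals exceed the measurement
# uncertainty: the linear model cannot fit the data within error bars.
```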

To remedy this, the authors embed an additional stochastic term δ into the model: ν_exp = s·ν_calc + δ + ε, where ε represents known measurement noise and δ is a zero‑mean Gaussian random variable with variance σ_δ² that captures model inadequacy. Non‑informative priors are assigned to s (uniform) and σ_δ (Jeffreys). The posterior distribution p(s, σ_δ | data) is sampled using Markov‑Chain Monte Carlo (MCMC), providing a full probabilistic description of both the scaling factor and the inadequacy term.
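The following is a minimal sketch of this calibration using a plain random-walk Metropolis sampler; the sampler choice, data, starting point, and tuning constants are illustrative assumptions, not the paper's specific implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical calibration data (cm^-1); names are illustrative.
nu_calc = np.array([520.0, 1015.0, 1630.0, 2980.0, 3410.0])
nu_exp = np.array([498.0, 975.0, 1566.0, 2860.0, 3275.0])
u_exp = np.full_like(nu_exp, 1.0)  # known measurement uncertainty sigma_eps

def log_post(s, sig_d):
    """Log posterior for nu_exp = s*nu_calc + delta + eps, with
    delta ~ N(0, sig_d^2), uniform prior on s, Jeffreys prior on sig_d."""
    if sig_d <= 0.0:
        return -np.inf
    var = u_exp**2 + sig_d**2
    resid = nu_exp - s * nu_calc
    loglik = -0.5 * np.sum(resid**2 / var + np.log(2 * np.pi * var))
    return loglik - np.log(sig_d)  # Jeffreys prior contribution

# Random-walk Metropolis over (s, sigma_delta).
n_steps, chain = 50_000, []
s, sig_d = 0.96, 10.0  # illustrative starting point
lp = log_post(s, sig_d)
for _ in range(n_steps):
    s_new = s + rng.normal(0.0, 0.002)
    sig_new = sig_d + rng.normal(0.0, 2.0)
    lp_new = log_post(s_new, sig_new)
    if np.log(rng.uniform()) < lp_new - lp:  # Metropolis acceptance
        s, sig_d, lp = s_new, sig_new, lp_new
    chain.append((s, sig_d))

samples = np.array(chain[5_000:])  # discard burn-in
print("s           = %.4f +/- %.4f" % (samples[:, 0].mean(), samples[:, 0].std()))
print("sigma_delta = %.1f +/- %.1f cm^-1" % (samples[:, 1].mean(), samples[:, 1].std()))
```

The posterior standard deviations read off the chain are exactly the quantities needed in the propagation formula discussed below.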

The study then explores three regimes defined by the size of the calibration set and the relative magnitude of measurement uncertainty (σ_ε) versus model inadequacy (σ_δ).

  1. Large, high‑precision datasets (N ≫ 200, σ_ε ≪ σ_δ). In this limit the posterior for s collapses around the classical least‑squares estimate, the posterior variance of s shrinks to its RMS‑derived value, and its contribution to the prediction uncertainty becomes negligible. Hence the traditional practice of using the optimal scaling factor together with the RMS for uncertainty propagation remains justified.

  2. Small or moderate‑size high‑precision datasets (tens of points, σ_ε ≪ σ_δ). Here the uncertainty in s becomes non‑negligible. The authors derive an explicit propagation formula that incorporates the posterior variance of s,

     u(ν_pred) = √(ν_calc²·u²(s) + σ_δ²),

     so that a scaled prediction carries both the calibration uncertainty of the scaling factor and the model‑inadequacy variance (see the sketch after this list).

  3. Measurement uncertainty dominant (σ_ε ≫ σ_δ). In this regime the problem reduces to a weighted least‑squares analysis, with weights set by the experimental uncertainties (also illustrated in the sketch below). For intermediate cases (σ_ε ≈ σ_δ), no analytical estimators are available, and the posterior must be estimated numerically as described above.
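As a hedged illustration of regimes 2 and 3, the sketch below applies the propagation formula to a new calculated frequency and then computes the standard closed-form weighted least-squares estimator for the one-parameter model through the origin; all numbers continue the hypothetical data of the earlier sketches.

```python
import numpy as np

# Illustrative posterior summaries, e.g. read off the MCMC sketch above.
s_mean, u_s = 0.9603, 0.0021   # posterior mean and std of s
sigma_delta = 12.0             # posterior model-inadequacy scale (cm^-1)

def u_pred(nu_calc, s_u=u_s, sig_d=sigma_delta):
    """Regime 2: prediction uncertainty combining scaling-factor
    calibration uncertainty and model inadequacy,
    u(nu_pred) = sqrt(nu_calc^2 * u(s)^2 + sigma_delta^2)."""
    return np.sqrt((nu_calc * s_u) ** 2 + sig_d**2)

nu_new = 1750.0  # a new calculated frequency to be scaled
print(f"prediction: {s_mean * nu_new:.1f} +/- {u_pred(nu_new):.1f} cm^-1")

# Regime 3 (sigma_eps >> sigma_delta): weighted least squares with
# weights w = 1/u_eps^2; hypothetical large measurement errors.
nu_calc_cal = np.array([520.0, 1015.0, 1630.0, 2980.0, 3410.0])
nu_exp_cal = np.array([498.0, 975.0, 1566.0, 2860.0, 3275.0])
u_eps = np.array([25.0, 30.0, 20.0, 40.0, 35.0])

w = 1.0 / u_eps**2
s_wls = np.sum(w * nu_exp_cal * nu_calc_cal) / np.sum(w * nu_calc_cal**2)
u_s_wls = 1.0 / np.sqrt(np.sum(w * nu_calc_cal**2))
print(f"s_WLS = {s_wls:.4f} +/- {u_s_wls:.4f}")
```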

