Optimizing Threshold-Schedules for Approximate Bayesian Computation Sequential Monte Carlo Samplers: Applications to Molecular Systems

Likelihood-free sequential Approximate Bayesian Computation (ABC) algorithms are increasingly popular inference tools for complex biological models. Such algorithms construct a succession of probability distributions over the parameter space, conditional upon the simulated data lying in an $\epsilon$-ball around the observed data, for decreasing values of the threshold $\epsilon$. While in theory the distributions (starting from a suitably defined prior) will converge towards the unknown posterior as $\epsilon$ tends to zero, the exact sequence of thresholds can affect the computational efficiency and success of a particular application. In particular, we show here that the currently preferred method of choosing thresholds as a pre-determined quantile of the distances between simulated and observed data from the previous population can lead to an inferred posterior distribution that is very different from the true posterior. Threshold selection thus remains an important challenge. Here we propose an automated and adaptive method that allows us to balance the need to minimise the threshold with computational efficiency. Moreover, our method, which centres on predicting the threshold–acceptance rate curve using the unscented transform, enables us to avoid local minima, a problem that has plagued previous threshold schemes.
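To make the threshold–acceptance rate curve concrete, the minimal sketch below estimates it empirically by brute-force Monte Carlo on a hypothetical toy model: Gaussian data with unknown mean θ, a Uniform(−10, 10) prior, the sample mean as summary statistic, and the absolute difference of summaries as the distance. None of these choices come from the paper, and the helper `empirical_acceptance_curve` is illustrative; the sketch only shows how the fraction of simulations falling inside the ε-ball shrinks with ε.

```python
import numpy as np

def empirical_acceptance_curve(distances, epsilons):
    """Fraction of simulated distances that fall inside each epsilon-ball."""
    distances = np.sort(np.asarray(distances, dtype=float))
    # side="right" counts the distances that are <= eps
    return np.searchsorted(distances, epsilons, side="right") / distances.size

# Hypothetical toy model: 50 observations ~ Normal(theta, 1), summary = sample mean.
rng = np.random.default_rng(1)
observed = rng.normal(2.0, 1.0, size=50)
thetas = rng.uniform(-10.0, 10.0, size=5000)               # draws from the prior
sims = rng.normal(thetas[:, None], 1.0, size=(5000, 50))   # one data set per draw
distances = np.abs(sims.mean(axis=1) - observed.mean())    # distance between summaries

epsilons = np.array([0.1, 0.5, 1.0, 2.0, 5.0])
rates = empirical_acceptance_curve(distances, epsilons)
for eps, rate in zip(epsilons, rates):
    print(f"epsilon={eps:4.1f}  acceptance rate={rate:.3f}")
```

Every point on this curve costs thousands of simulations, which is why predicting it cheaply, rather than simulating it, is attractive.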

💡 Research Summary

This paper investigates the critical role of threshold selection in Approximate Bayesian Computation combined with Sequential Monte Carlo (ABC‑SMC) samplers, a class of likelihood‑free inference methods widely used for complex biological and molecular models. In the standard ABC‑SMC framework, a sequence of intermediate distributions is constructed by accepting simulated data that fall within an ε‑ball around the observed data. As ε decreases, the intermediate distributions ideally converge to the true posterior. However, the authors demonstrate that the commonly adopted practice of setting ε as a fixed quantile (e.g., 50 % or 75 %) of the distances obtained in the previous population can severely distort the resulting posterior, especially when the distance distribution is skewed, multimodal, or exhibits abrupt changes.
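The quantile-based schedule described above can be sketched as a textbook one-parameter ABC-SMC loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `abc_smc_quantile`, the uniform prior, the Gaussian perturbation kernel whose variance is twice the weighted population variance (a common choice), and the toy Gaussian model are all hypothetical. At each generation the threshold is set to a fixed quantile of the previous population's distances.

```python
import numpy as np

def abc_smc_quantile(observed, simulate, distance, n_particles=500, quantile=0.5,
                     n_generations=6, prior_low=-10.0, prior_high=10.0, seed=0):
    """Toy one-parameter ABC-SMC with a quantile-based threshold schedule.

    Assumes a Uniform(prior_low, prior_high) prior and a Gaussian perturbation kernel.
    """
    rng = np.random.default_rng(seed)
    # Generation 0: draw from the prior and record each particle's distance.
    theta = rng.uniform(prior_low, prior_high, size=n_particles)
    dists = np.array([distance(simulate(t, rng), observed) for t in theta])
    weights = np.full(n_particles, 1.0 / n_particles)
    epsilons = []
    for _ in range(n_generations):
        # Quantile schedule: next threshold = fixed quantile of the previous distances.
        eps = float(np.quantile(dists, quantile))
        epsilons.append(eps)
        # Kernel scale: twice the weighted population variance.
        mean = np.sum(weights * theta)
        sigma = np.sqrt(2.0 * np.sum(weights * (theta - mean) ** 2))
        new_theta, new_dists = [], []
        while len(new_theta) < n_particles:
            cand = theta[rng.choice(n_particles, p=weights)] + sigma * rng.normal()
            if not prior_low <= cand <= prior_high:
                continue  # zero prior density: reject without simulating
            d = distance(simulate(cand, rng), observed)
            if d <= eps:
                new_theta.append(cand)
                new_dists.append(d)
        new_theta = np.array(new_theta)
        # Importance weights: flat prior divided by the Gaussian-kernel mixture density
        # (the kernel's normalising constant cancels because sigma is shared).
        kernel = np.exp(-0.5 * ((new_theta[:, None] - theta[None, :]) / sigma) ** 2)
        new_weights = 1.0 / (kernel @ weights)
        theta, dists = new_theta, np.array(new_dists)
        weights = new_weights / new_weights.sum()
    return theta, weights, epsilons

# Hypothetical toy problem: 50 observations from Normal(2, 1); summary = sample mean.
rng = np.random.default_rng(1)
observed = rng.normal(2.0, 1.0, size=50)
simulate = lambda theta, rng: rng.normal(theta, 1.0, size=50)
distance = lambda sim, obs: abs(sim.mean() - obs.mean())
theta, weights, epsilons = abc_smc_quantile(observed, simulate, distance)
print("threshold schedule:", np.round(epsilons, 3))
print("posterior mean:", np.sum(weights * theta))
```

Each new ε depends only on distances already observed, so the loop has no way to anticipate the acceptance rate it will face at the next threshold; `quantile` alone controls how aggressively ε shrinks.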

To address this problem, the authors propose an automated, adaptive threshold‑scheduling scheme based on the Unscented Transform (UT). The UT generates a set of sigma‑points that capture the mean and covariance of the current particle population. Propagating these sigma‑points through the simulator yields an approximate distribution of the distance metric without a full Monte‑Carlo simulation, and from this surrogate distance distribution the ε‑acceptance‑rate curve is predicted analytically. The algorithm then selects the smallest ε that achieves a pre‑specified target acceptance rate (typically 20–30 %). Because this selection is repeated at every SMC iteration, the threshold adapts to the evolving particle cloud and abrupt drops in acceptance probability are avoided.
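The sigma-point prediction step can be sketched as follows. This is a simplified illustration under explicit assumptions, not the paper's algorithm: the population is summarised by a Gaussian with mean `pop_mean` and covariance `pop_cov`, the simulator is replaced by a hypothetical deterministic `distance_fn` (so simulator noise is ignored), and the propagated distance is treated as Gaussian, which makes the predicted acceptance-rate curve Φ((ε − μ_d)/σ_d). The sigma points are the standard Julier–Uhlmann set with spread parameter κ; the names `sigma_points`, `predict_distance` and `choose_epsilon` and the default target rate of 0.25 are illustrative.

```python
import numpy as np
from statistics import NormalDist

def sigma_points(mean, cov, kappa=1.0):
    """Julier-Uhlmann sigma points and weights for N(mean, cov)."""
    mean = np.atleast_1d(np.asarray(mean, dtype=float))
    n = mean.size
    root = np.linalg.cholesky((n + kappa) * np.atleast_2d(cov))  # matrix square root
    points = [mean]
    points += [mean + root[:, i] for i in range(n)]
    points += [mean - root[:, i] for i in range(n)]
    weights = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    weights[0] = kappa / (n + kappa)
    return np.array(points), weights

def predict_distance(mean, cov, distance_fn, kappa=1.0):
    """Gaussian approximation of the distance, from propagated sigma points."""
    points, weights = sigma_points(mean, cov, kappa)
    d = np.array([distance_fn(p) for p in points])
    mu = float(weights @ d)
    var = float(weights @ (d - mu) ** 2)
    return NormalDist(mu, var ** 0.5)

def choose_epsilon(mean, cov, distance_fn, target_rate=0.25, kappa=1.0):
    """Smallest epsilon whose predicted acceptance rate P(d <= eps) reaches target_rate."""
    predicted = predict_distance(mean, cov, distance_fn, kappa)
    # The Gaussian acceptance curve is increasing in eps, so the smallest
    # qualifying epsilon is simply its target_rate quantile.
    return predicted.inv_cdf(target_rate), predicted

# Hypothetical 2-parameter example: noise-free model output = theta, Euclidean distance.
observed = np.array([1.0, -0.5])
distance_fn = lambda theta: float(np.linalg.norm(theta - observed))
pop_mean, pop_cov = np.array([1.5, 0.0]), 0.25 * np.eye(2)
eps, predicted = choose_epsilon(pop_mean, pop_cov, distance_fn, target_rate=0.25)
print(f"epsilon = {eps:.3f}, predicted acceptance = {predicted.cdf(eps):.3f}")
```

Only 2n + 1 evaluations of `distance_fn` are needed per prediction. Because a Gaussian approximation puts probability on negative distances, a very small `target_rate` can return a negative ε, which a practical version would have to guard against.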

Key advantages of the UT‑based approach are twofold. First, it mitigates the “local‑minimum” phenomenon that plagues quantile‑based schedules: because the threshold is derived from a global approximation of the distance distribution rather than a single empirical quantile, the algorithm does not over‑react to temporary low‑acceptance regions, preserving particle diversity and ensuring continued exploration of the parameter space. Second, it improves computational efficiency. By predicting the ε‑acceptance relationship, the method avoids unnecessary simulations that would be rejected under an overly aggressive ε, while still driving the particles toward regions of higher posterior density.

The authors validate their method on two challenging molecular‑system models: (1) a protein‑ligand binding kinetic model with 12 parameters and expensive simulations, and (2) an enzymatic reaction network with 9 parameters and strong non‑linear interactions. For each case they compare three strategies: (i) the standard quantile‑based schedule, (ii) a simple linear decay of ε, and (iii) the proposed UT‑adaptive schedule. Performance is assessed using Kullback‑Leibler divergence between the ABC‑SMC posterior and a reference posterior obtained by conventional MCMC, average acceptance rates, total wall‑clock time, and Effective Sample Size (ESS) as a measure of particle diversity.
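Two of these diagnostics are straightforward to compute from a weighted particle population. The sketch below uses the Kish effective sample size, (Σ w_i)² / Σ w_i², and, as a deliberately crude stand-in for whatever KL estimator the paper used (the summary does not say), the closed-form KL divergence between Gaussian fits to the ABC particles and to reference MCMC samples. The helper names are illustrative, and the input arrays are synthetic placeholders, not results from the paper.

```python
import numpy as np

def effective_sample_size(weights):
    """Kish effective sample size (sum w)^2 / sum w^2 of importance weights."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)

def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for univariate Gaussians p = N(mu_p, var_p), q = N(mu_q, var_q)."""
    return 0.5 * (np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def weighted_moments(samples, weights):
    """Weighted mean and variance of a particle population."""
    w = np.asarray(weights, dtype=float) / np.sum(weights)
    mean = np.sum(w * samples)
    return mean, np.sum(w * (samples - mean) ** 2)

# Synthetic placeholders standing in for ABC-SMC output and a reference MCMC run.
rng = np.random.default_rng(2)
abc_particles = rng.normal(2.0, 0.2, size=1000)
abc_weights = rng.uniform(0.5, 1.5, size=1000)
mcmc_samples = rng.normal(2.0, 0.15, size=5000)

m_abc, v_abc = weighted_moments(abc_particles, abc_weights)
m_ref, v_ref = np.mean(mcmc_samples), np.var(mcmc_samples)
print("ESS:", effective_sample_size(abc_weights))
print("KL(ABC || reference):", gaussian_kl(m_abc, v_abc, m_ref, v_ref))
```

The ESS equals the number of particles only when all weights are equal and drops towards 1 as the weight concentrates on a few particles, which is why a stable ESS across iterations indicates that degeneracy is being avoided.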

Results show that the UT‑adaptive schedule consistently yields posterior approximations that are much closer to the reference (average KL divergence reduced from ~0.12 to ~0.03). It achieves the desired acceptance rate with an ε that is on average 30–45 % larger than that selected by quantile methods, leading to a 30–45 % reduction in total simulation calls and wall‑clock time. Moreover, ESS remains stable across iterations, indicating that particle degeneracy is avoided even in high‑dimensional settings. The method also proves robust to multimodal distance distributions, where quantile‑based thresholds often cause premature collapse of the particle set.

The paper discusses limitations of the UT approximation: because it relies on first‑ and second‑order moments, extreme skewness or heavy‑tailed distance distributions may be imperfectly captured. The authors suggest extensions such as mixture‑of‑Gaussians sigma‑points or hybrid schemes that combine UT with variational approximations to better handle such cases. They also outline future directions, including online ABC‑SMC where data arrive sequentially, and applications beyond molecular systems to ecological, epidemiological, and physical models where likelihoods are intractable.
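The moment-matching limitation can be seen in one dimension. In this illustrative sketch (not taken from the paper), a hypothetical distance d = exp(X) with X ~ N(0, 1) stands in for a skewed, non-negative distance distribution. The unscented transform with κ = 2 recovers its mean well but not its spread or shape, so the Gaussian surrogate built from it misstates the acceptance probability.

```python
import math
from statistics import NormalDist

# 1-D unscented transform of X ~ N(0, 1) through the skewed map d = exp(X).
n, kappa = 1, 2.0
spread = math.sqrt(n + kappa)
points = [0.0, spread, -spread]
weights = [kappa / (n + kappa), 1 / (2 * (n + kappa)), 1 / (2 * (n + kappa))]
d = [math.exp(x) for x in points]
ut_mean = sum(w * v for w, v in zip(weights, d))
ut_var = sum(w * (v - ut_mean) ** 2 for w, v in zip(weights, d))

# Exact moments of the log-normal distribution of exp(X).
true_mean = math.exp(0.5)
true_var = (math.e - 1.0) * math.e

gauss = NormalDist(ut_mean, math.sqrt(ut_var))
print(f"mean: UT {ut_mean:.3f} vs exact {true_mean:.3f}")
print(f"variance: UT {ut_var:.3f} vs exact {true_var:.3f}")
# Acceptance probability at epsilon = 1; the exact value is P(X <= 0) = 0.5.
print(f"P(d <= 1): Gaussian approx {gauss.cdf(1.0):.3f} vs exact 0.500")
# The Gaussian surrogate also puts mass on impossible negative distances.
print(f"P(d < 0): Gaussian approx {gauss.cdf(0.0):.3f} vs exact 0.000")
```

Here the UT mean is close to the exact value (about 1.64 versus 1.65), but the variance is underestimated (about 3.3 versus 4.7), and the surrogate predicts an acceptance rate of roughly 0.36 at ε = 1 where the exact value is 0.5.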

In summary, this work provides a principled, data‑driven mechanism for adapting the ABC‑SMC threshold, balancing the competing demands of statistical fidelity and computational tractability. By leveraging the Unscented Transform to predict the ε‑acceptance curve, the proposed method avoids the pitfalls of fixed‑quantile schedules, maintains particle diversity, and delivers substantial speed‑ups without sacrificing posterior accuracy. The approach is broadly applicable to any likelihood‑free inference problem where ABC‑SMC is employed, marking a significant step forward in the practical deployment of Bayesian inference for complex scientific models.
