Incorporating the ChEES Criterion into Sequential Monte Carlo Samplers
Markov chain Monte Carlo (MCMC) methods are a powerful but computationally expensive way of performing non-parametric Bayesian inference. MCMC proposals which utilise gradients, such as Hamiltonian Monte Carlo (HMC), can better explore the parameter space of interest if the additional hyper-parameters are chosen well. The No-U-Turn Sampler (NUTS) is a variant of HMC which is extremely effective at selecting these hyper-parameters but is slow to run and is not suited to GPU architectures. An alternative to NUTS, Change in the Estimator of the Expected Square HMC (ChEES-HMC), was shown not only to run faster than NUTS on GPU but also to sample from posteriors more efficiently. Sequential Monte Carlo (SMC) samplers are another sampling method, which instead output weighted samples from the posterior. They are very amenable to parallelisation, and therefore to being run on GPUs, while having additional flexibility over MCMC in their choice of proposal. We incorporate ChEES-HMC as a proposal into SMC samplers and demonstrate performance competitive with, but faster than, NUTS on a number of tasks.
💡 Research Summary
This paper presents a novel integration of the Change in the Estimator of the Expected Square (ChEES) criterion—originally introduced as an adaptive alternative to the No‑U‑Turn Sampler (NUTS)—into the framework of Sequential Monte Carlo (SMC) samplers. The motivation stems from the observation that while NUTS automatically tunes HMC trajectory length, its recursive tree‑building logic and complex control flow make it ill‑suited for modern GPU architectures. In contrast, ChEES‑HMC employs a simple jittered trajectory length, optimized via gradient ascent on the ChEES objective, and can be efficiently implemented on parallel hardware.
The authors first review the foundations of Hamiltonian Monte Carlo (HMC), the Metropolis‑Hastings acceptance rule, and the NUTS algorithm. They then introduce the ChEES metric, defined as the expected squared change, between successive states, in a state's squared distance from the sample mean. Maximizing ChEES encourages larger dispersion of the particle cloud, thereby reducing autocorrelation. In practice, a Halton sequence is used to jitter the trajectory length ($l_m = h_m L_{m-1}$); during a warm‑up phase the log‑trajectory length is updated with Adam using a weighted gradient estimate. After warm‑up, the average length $\bar L$ is fixed, while jitter continues to provide diversity across particles.
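The jittering and the ChEES objective described above can be sketched in a few lines. This is a minimal illustration, not the paper's code: `halton_1d` and `chees_estimate` are illustrative names, and the Adam update of the log‑trajectory length is omitted (only the estimator it would differentiate is shown).

```python
import numpy as np

def halton_1d(m, base=2):
    """m-th term of the base-2 van der Corput (1-D Halton) sequence in (0, 1)."""
    f, r = 1.0, 0.0
    while m > 0:
        f /= base
        r += f * (m % base)
        m //= base
    return r

def chees_estimate(theta_prev, theta_prop, weights=None):
    """Monte Carlo estimate of the ChEES objective over a batch of chains/particles:
    (1/4) * E[(||theta' - mean||^2 - ||theta - mean||^2)^2]."""
    mean = np.average(theta_prop, axis=0, weights=weights)
    d_prev = ((theta_prev - mean) ** 2).sum(axis=-1)
    d_prop = ((theta_prop - mean) ** 2).sum(axis=-1)
    return 0.25 * np.average((d_prop - d_prev) ** 2, weights=weights)

# Jitter at iteration m: l_m = h_m * L_{m-1}, so trajectory lengths vary
# between 0 and the current adapted mean length L.
L = 2.0
jittered = [halton_1d(m) * L for m in range(1, 5)]
```

During warm-up, the gradient of `chees_estimate` with respect to the log mean trajectory length would drive the Adam update; after warm-up only the Halton jitter remains active.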
The core contribution is the embedding of this adaptive HMC proposal into an SMC sampler. The SMC algorithm proceeds through a series of importance‑sampling steps, resampling when the effective sample size (ESS) falls below a threshold. The proposal distribution $q(\theta_{k}|\theta_{k-1})$ is taken to be the ChEES‑HMC transition, and the backward kernel $L(\theta_{k-1}|\theta_{k})$ is chosen to be the same distribution, which cancels Jacobian terms in the weight update. Notably, the authors omit an explicit Metropolis‑Hastings accept‑reject step; instead, the acceptance probability $\alpha$ computed by ChEES‑HMC is incorporated directly into the particle weight update. This design preserves the parallel nature of SMC while still benefitting from gradient‑based proposals.
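The SMC bookkeeping this implies can be sketched as follows. This is a hypothetical sketch under simplifying assumptions (multinomial resampling, a per-particle log acceptance probability `log_alpha` supplied by the ChEES‑HMC proposal); the exact form of the paper's weight update depends on its backward-kernel cancellation and is not reproduced here.

```python
import numpy as np

def effective_sample_size(log_w):
    """ESS = (sum w)^2 / sum w^2, computed stably from log-weights."""
    w = np.exp(log_w - log_w.max())
    return w.sum() ** 2 / (w ** 2).sum()

def smc_step(particles, log_w, log_alpha, rng, ess_frac=0.5):
    """One sketched SMC bookkeeping step: fold each particle's HMC acceptance
    probability into its weight (in place of an MH accept/reject), then
    resample when the ESS falls below ess_frac * J."""
    log_w = log_w + log_alpha                    # acceptance enters the weight update
    J = len(log_w)
    if effective_sample_size(log_w) < ess_frac * J:
        w = np.exp(log_w - log_w.max())
        idx = rng.choice(J, size=J, p=w / w.sum())   # multinomial resampling
        particles = particles[idx]
        log_w = np.full(J, -np.log(J))               # reset to uniform weights
    return particles, log_w
```

Because every particle is moved and weighted independently, this loop body maps directly onto a GPU-vectorised implementation.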
A substantial experimental component investigates how the choice of random or quasi‑random number generator (RNG) for the jitter sequence influences performance. Thirteen generators are evaluated, including plain uniform draws, 1‑D and N‑D Halton, inverse Halton, Sobol, prime‑based sequences, the golden ratio, equidistant spacing, and variants with offsets. The authors report two key metrics: the number of gradient evaluations per sample (lower is better) and the effective sample size per gradient evaluation (higher is better). Results consistently show that low‑discrepancy sequences (especially 1‑D Halton and 1‑D Sobol) achieve superior efficiency compared with plain uniform random numbers.
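Why low-discrepancy sequences help can be seen with a crude uniformity proxy: the largest gap they leave in $(0,1)$. The sketch below (illustrative, not the paper's evaluation code) compares the 1‑D Halton (van der Corput) and golden-ratio sequences against i.i.d. uniform draws.

```python
import numpy as np

def van_der_corput(n, base=2):
    """First n points of the base-b van der Corput sequence (1-D Halton)."""
    out = np.empty(n)
    for k in range(1, n + 1):
        f, r, i = 1.0, 0.0, k
        while i > 0:
            f /= base
            r += f * (i % base)
            i //= base
        out[k - 1] = r
    return out

def golden_ratio_seq(n):
    """Additive recurrence h_m = frac(m * phi), another low-discrepancy option."""
    phi = (np.sqrt(5.0) - 1.0) / 2.0
    return np.mod(np.arange(1, n + 1) * phi, 1.0)

def max_gap(points):
    """Largest gap between adjacent sorted points in [0, 1]: low-discrepancy
    sequences keep this small, so no trajectory length is persistently skipped."""
    s = np.sort(np.concatenate(([0.0], points, [1.0])))
    return np.diff(s).max()

n = 64
gap_halton = max_gap(van_der_corput(n))       # evenly spread by construction
gap_uniform = max_gap(np.random.default_rng(0).uniform(size=n))  # typically larger
```

Large gaps in the jitter sequence mean some trajectory lengths are rarely tried, which wastes gradient evaluations; this is consistent with the reported advantage of 1‑D Halton and Sobol over plain uniform draws.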
Three benchmark problems are used: (1) a 5‑dimensional multivariate Gaussian, (2) a 100‑dimensional ill‑conditioned Gaussian with a condition number around $1.3\times10^{5}$, and (3) a non‑linear “banana‑shaped” posterior. Across all tasks, the SMC‑ChEES sampler matches or exceeds the ESS achieved by an SMC sampler using NUTS, while requiring 30–70% fewer gradient evaluations. In the high‑dimensional ill‑conditioned case, NUTS spends considerable time expanding its trajectory tree, whereas ChEES‑HMC stabilizes after warm‑up with a modest fixed trajectory length, leading to a 2–3× speed‑up on GPU hardware. For the banana posterior, jittered trajectories enable the particle cloud to escape narrow curvature regions that often trap NUTS, resulting in a 20% increase in ESS.
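For concreteness, a banana-shaped target of the kind used in such benchmarks can be written down with its gradient, as HMC requires. This is a common Rosenbrock-style parameterisation with hypothetical parameters `a` and `b`; the paper's exact benchmark density is not reproduced here.

```python
import numpy as np

def banana_logpdf(x, a=1.0, b=0.1):
    """Hypothetical 2-D banana-shaped log-density (Rosenbrock-style):
    a Gaussian whose second coordinate is bent along a parabola in the first."""
    x1, x2 = x[..., 0], x[..., 1]
    return -0.5 * (x1 / a) ** 2 - 0.5 * (x2 - b * (x1 ** 2 - a ** 2)) ** 2

def banana_grad(x, a=1.0, b=0.1):
    """Gradient of the log-density, as consumed by HMC leapfrog steps."""
    x1, x2 = x[..., 0], x[..., 1]
    r = x2 - b * (x1 ** 2 - a ** 2)
    g1 = -x1 / a ** 2 + 2.0 * b * x1 * r
    g2 = -r
    return np.stack([g1, g2], axis=-1)
```

The narrow curved ridge of such a density is exactly where a single tuned trajectory length struggles, and where jittered lengths give some particles the short, local moves and others the long, ridge-following moves.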
The paper’s contributions can be summarized as follows:
- A seamless integration of the ChEES adaptive criterion into SMC, allowing particle‑wise trajectory length tuning without sacrificing parallel efficiency.
- Elimination of the Metropolis‑Hastings accept‑reject step in the SMC context, leveraging the acceptance probability directly in weight updates.
- A systematic evaluation of RNG and quasi‑RNG strategies for jitter generation, demonstrating that low‑discrepancy sequences improve sampling efficiency.
- Empirical evidence that SMC‑ChEES outperforms NUTS‑based SMC in both computational cost and effective sample size, particularly on GPU platforms.
Limitations include the relatively modest particle count ($J = 1000$) and fixed number of SMC iterations ($K = 200$) used in the experiments, leaving open questions about scalability to very large models such as deep Bayesian neural networks. Additionally, while jitter provides diversity, overly aggressive jitter could introduce bias; adaptive control of jitter magnitude is identified as future work. The authors suggest extending the method to multi‑GPU or distributed settings and exploring automatic jitter‑scale adaptation to further enhance robustness.
In conclusion, the study demonstrates that incorporating the ChEES criterion into SMC samplers yields a GPU‑friendly, gradient‑based sampling algorithm that retains the adaptive strengths of NUTS while delivering substantial reductions in computational overhead. This makes SMC‑ChEES a compelling alternative for practitioners requiring fast, scalable Bayesian inference on modern parallel hardware.