Tuning Sequential Monte Carlo Samplers via Greedy Incremental Divergence Minimization
The performance of sequential Monte Carlo (SMC) samplers heavily depends on the tuning of the Markov kernels used in the path proposal. For SMC samplers with unadjusted Markov kernels, standard tuning objectives, such as the Metropolis-Hastings acceptance rate or the expected squared jump distance, are no longer applicable. While stochastic gradient-based end-to-end optimization has been explored for tuning SMC samplers, it often incurs excessive training costs, even for tuning just the kernel step sizes. In this work, we propose a general adaptation framework for tuning the Markov kernels in SMC samplers by minimizing the incremental Kullback-Leibler (KL) divergence between the proposal and target paths. For step size tuning, we provide a gradient- and tuning-free algorithm that is generally applicable for kernels such as Langevin Monte Carlo (LMC). We further demonstrate the utility of our approach by providing a tailored scheme for tuning kinetic LMC used in SMC samplers. Our implementations are able to obtain a full schedule of tuned parameters at the cost of a few vanilla SMC runs, which is a fraction of the cost of gradient-based approaches.
💡 Research Summary
This paper addresses a critical bottleneck in sequential Monte Carlo (SMC) samplers: the tuning of unadjusted Markov kernels that are used to propagate particles along a path of intermediate distributions. While adjusted kernels (e.g., Metropolis‑Hastings) can be tuned using classic diagnostics such as acceptance rates or expected squared jump distance, these metrics are not applicable to unadjusted kernels like Langevin Monte Carlo (LMC) or kinetic LMC, which are increasingly popular because of their favorable high‑dimensional convergence properties and differentiability. Existing approaches to tune such kernels typically rely on stochastic gradient descent (SGD) in an end‑to‑end fashion, treating the entire SMC pipeline as a black‑box variational objective. Although theoretically sound, SGD‑based tuning is computationally expensive (often requiring 10⁵–10⁶ gradient steps) and sensitive to hyper‑parameters such as learning rate, batch size, and initialization.
The authors propose a fundamentally different adaptation strategy that sidesteps gradient‑based optimization altogether. Their key insight is to minimize the incremental Kullback‑Leibler (KL) divergence between the proposal path measure (Q_{0:T}) and the target path measure (P_{0:T}) at each SMC step. By exploiting the chain rule for KL, they decompose the path‑wise divergence as

$$
\mathrm{KL}\big(Q_{0:T} \,\|\, P_{0:T}\big) = \mathrm{KL}\big(Q_0 \,\|\, P_0\big) + \sum_{t=1}^{T} \mathbb{E}_{Q_{t-1}}\!\left[\mathrm{KL}\big(Q_t(\cdot \mid X_{t-1}) \,\|\, P_t(\cdot \mid X_{t-1})\big)\right],
$$

so that the kernel parameters at step t can be tuned by greedily minimizing the corresponding incremental term, without backpropagating through the entire SMC pipeline.
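The chain-rule decomposition can be sanity-checked on a toy two-step Gaussian path, where every KL term is available in closed form. This is an illustrative sketch, not the paper's implementation; the names `a`, `b`, `s`, `r` are assumptions introduced here, not the paper's notation.

```python
# Toy check of the chain rule for KL on a two-step Gaussian path.
# Q: X0 ~ N(0,1), X1 | X0 ~ N(a*X0, s^2)   (stands in for the proposal path)
# P: X0 ~ N(0,1), X1 | X0 ~ N(b*X0, r^2)   (stands in for the target path)
import numpy as np

a, s = 0.7, 0.5
b, r = 0.4, 0.8

# Incremental decomposition: KL(Q0 || P0) + E_{Q0}[ KL(Q1(.|X0) || P1(.|X0)) ].
kl_init = 0.0  # both initial marginals are N(0,1)
# Closed-form Gaussian KL of the conditionals, using E[X0^2] = 1 under Q0:
kl_incr = 0.5 * (np.log(r**2 / s**2) + s**2 / r**2 + (a - b)**2 / r**2 - 1)

# Direct path-wise KL between the two zero-mean bivariate Gaussian joints.
Sq = np.array([[1.0, a], [a, a**2 + s**2]])
Sp = np.array([[1.0, b], [b, b**2 + r**2]])
kl_joint = 0.5 * (np.trace(np.linalg.solve(Sp, Sq)) - 2
                  + np.log(np.linalg.det(Sp) / np.linalg.det(Sq)))

print(np.isclose(kl_joint, kl_init + kl_incr))  # the two computations agree
```

Because each summand depends only on the kernel at that step, minimizing the incremental terms one step at a time minimizes the full path divergence, which is what makes the greedy schedule of tuned parameters possible.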