Sequential Monte Carlo approximations of Wasserstein--Fisher--Rao gradient flows


We consider the problem of sampling from a probability distribution $π$. It is well known that this can be written as an optimisation problem over the space of probability distributions in which we aim to minimise the Kullback–Leibler divergence from $π$. We consider several partial differential equations (PDEs) whose solution is a minimiser of the Kullback–Leibler divergence from $π$ and connect them to well-known Monte Carlo algorithms. We focus in particular on PDEs obtained by considering the Wasserstein–Fisher–Rao geometry over the space of probabilities and show that these lead to a natural implementation using importance sampling and sequential Monte Carlo. We propose a novel algorithm to approximate the Wasserstein–Fisher–Rao flow of the Kullback–Leibler divergence and conduct an extensive empirical study to identify when these algorithms outperform other popular Monte Carlo algorithms.


💡 Research Summary

The paper addresses the fundamental problem of sampling from a target distribution π whose density is known only up to a normalising constant. By viewing sampling as the optimisation of the Kullback–Leibler (KL) divergence KL(μ‖π) over the space of probability measures, the authors connect the problem to gradient flows defined with respect to different Riemannian metrics on the space of densities. Three flows are considered: the Wasserstein (W) gradient flow, which corresponds to the Fokker‑Planck equation of a Langevin diffusion; the Fisher–Rao (FR) gradient flow, which yields a birth‑death (reaction) dynamics that re‑weights particles according to their likelihood under π; and the combined Wasserstein–Fisher–Rao (WFR) flow, which adds the diffusive and birth‑death components in a single orthogonal metric. The WFR flow inherits the fast exponential convergence of the FR flow under mild moment assumptions and the diffusion‑driven exploration of the W flow, offering theoretically superior convergence rates (KL(μ_t‖π) ≤ min{KL(μ_t^{W}‖π), KL(μ_t^{FR}‖π)}).
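In the notation above, the three flows take the standard PDE forms from the gradient-flow literature (a sketch for orientation; sign and normalisation conventions may differ from the paper's):

```latex
\begin{align*}
&\text{Wasserstein (Fokker--Planck):} &
\partial_t \mu_t &= \nabla \cdot \Big( \mu_t \, \nabla \log \tfrac{\mu_t}{\pi} \Big), \\
&\text{Fisher--Rao (birth--death):} &
\partial_t \mu_t &= - \mu_t \Big( \log \tfrac{\mu_t}{\pi}
  - \mathbb{E}_{\mu_t}\big[ \log \tfrac{\mu_t}{\pi} \big] \Big), \\
&\text{Wasserstein--Fisher--Rao:} &
\partial_t \mu_t &= \nabla \cdot \Big( \mu_t \, \nabla \log \tfrac{\mu_t}{\pi} \Big)
  - \mu_t \Big( \log \tfrac{\mu_t}{\pi}
  - \mathbb{E}_{\mu_t}\big[ \log \tfrac{\mu_t}{\pi} \big] \Big).
\end{align*}
```

The mean-centring term in the Fisher–Rao equation is what keeps $\mu_t$ a probability measure: mass created in high-density regions of $\pi$ is exactly compensated by mass destroyed elsewhere.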

Existing numerical schemes for the WFR flow first discretise space (particle approximations) and then time, typically coupling a Langevin step with a birth‑death correction. The authors propose a different strategy: they discretise time first, applying a deterministic transport map that implements one step of the W flow (a Gaussian convolution with a transport map Id + γ∇log π), followed by an explicit update that solves the FR flow analytically. The FR update reduces to a mirror‑descent step on the KL functional: μ_{n} ∝ π^{1−e^{−γ}} μ_{n−½}^{e^{−γ}}. Alternating these two steps yields a fully explicit scheme that can be interpreted as a Sequential Monte Carlo (SMC) sampler with importance weights updated by the FR step and particles moved by the W step. Proposition 3.1 establishes exponential decay of KL under a log‑Sobolev inequality for π, with a step‑size condition γ ≤ C_{LSI}^{-1}/(4L_π^2). The bound shows a leading term identical to the unadjusted Langevin algorithm plus an O(γ) discretisation error, confirming that the combined scheme converges at least as fast as pure W flow and typically faster thanks to the FR contribution.
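A minimal sketch of this alternating step for a weighted particle system. Two simplifying assumptions, which are illustrative and not the paper's exact Algorithm 1: the W step is realised as one unadjusted Langevin move (transport by Id + γ∇log π plus Gaussian noise), and the analytic FR update μ_n ∝ π^{1−e^{−γ}} μ^{e^{−γ}} is approximated by a tempering-style tilt of the particle weights toward π; the function names (`wfr_step`, `logpi`, `grad_logpi`) are ours.

```python
import numpy as np

def _logsumexp(v):
    """Numerically stable log(sum(exp(v)))."""
    m = v.max()
    return m + np.log(np.exp(v - m).sum())

def wfr_step(x, logw, grad_logpi, logpi, gamma, rng):
    """One time-discretised WFR step on particles x with log-weights logw."""
    # Wasserstein (diffusion) step, here as an unadjusted Langevin move:
    # transport by Id + gamma * grad log pi, then Gaussian noise.
    x = x + gamma * grad_logpi(x) + np.sqrt(2.0 * gamma) * rng.standard_normal(x.shape)
    # Fisher-Rao (birth-death) step, approximated by reweighting:
    # tilt each particle's weight by pi(x)^{1 - e^{-gamma}}.
    logw = logw + (1.0 - np.exp(-gamma)) * logpi(x)
    return x, logw - _logsumexp(logw)  # keep weights normalised in log space
```

For a standard normal target one would pass `logpi = lambda x: -0.5 * x**2` and `grad_logpi = lambda x: -x`; iterating the step drives even a badly initialised particle cloud toward π, with the reweighting accelerating the concentration on high-density regions.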

Algorithm 1 implements the scheme with resampling to control weight degeneracy, making it a practical SMC algorithm. The authors also provide a broader optimisation viewpoint, showing that many FR‑type gradient flows (including annealed importance sampling, temperature schedules, and mirror descent) can be seen as special cases of SMC updates, paralleling earlier work that identified Langevin dynamics as a discretisation of the Wasserstein gradient flow.
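The summary does not spell out which resampling scheme Algorithm 1 uses; a common choice in SMC practice is systematic resampling, triggered when the effective sample size (ESS) of the normalised weights falls below a threshold such as half the particle count. A sketch under that assumption:

```python
import numpy as np

def ess(logw):
    """Effective sample size 1 / sum(w_i^2) of normalised log-weights."""
    w = np.exp(logw - logw.max())
    w /= w.sum()
    return 1.0 / np.sum(w ** 2)

def systematic_resample(x, logw, rng):
    """Systematic resampling: one uniform draw, stratified comb of positions.

    Returns an equally weighted particle set targeting the same distribution.
    """
    n = len(x)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    positions = (rng.uniform() + np.arange(n)) / n        # stratified comb in [0, 1)
    idx = np.searchsorted(np.cumsum(w), positions)        # invert the weight CDF
    idx = np.clip(idx, 0, n - 1)                          # guard against rounding
    return x[idx], np.full(n, -np.log(n))
```

A typical loop resamples only when `ess(logw) < n / 2`, which controls weight degeneracy from the FR reweighting while avoiding the extra Monte Carlo variance of resampling at every step.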

Empirical evaluation covers one‑dimensional Gaussian and multimodal Gaussian mixtures, as well as a high‑dimensional Bayesian logistic regression problem. The proposed WFR‑SMC method is compared against Unadjusted Langevin (ULA), Metropolis‑adjusted Langevin (MALA), and existing birth‑death Langevin algorithms. Results demonstrate that WFR‑SMC achieves faster KL reduction, higher effective sample size (ESS), and greater robustness to poor initialisations. In multimodal settings the FR component accelerates convergence by re‑weighting particles toward high‑density regions, while the W component maintains exploration, leading to superior performance over pure diffusion or pure birth‑death schemes. The method also scales more gracefully with dimensionality, suggesting applicability to large‑scale Bayesian inference.

In summary, the paper introduces a novel, theoretically grounded SMC algorithm that approximates the Wasserstein–Fisher–Rao gradient flow of the KL divergence. By combining diffusion‑based transport with analytically solvable FR updates, the algorithm enjoys exponential convergence guarantees, improved empirical performance, and a unifying perspective that links gradient‑flow‑based sampling, mirror descent, and sequential Monte Carlo methods. This contribution is likely to influence future research on efficient sampling algorithms for complex, high‑dimensional target distributions.

