On the Convergence of Wasserstein Gradient Descent for Sampling
This paper studies the optimization of the KL functional on the Wasserstein space of probability measures and develops a sampling framework based on Wasserstein gradient descent (WGD). We identify two important subclasses of the Wasserstein space on which the WGD scheme is guaranteed to converge, thereby providing new theoretical foundations for optimization-based sampling methods on measure spaces. For practical implementation, we construct a particle-based WGD algorithm in which the score function is estimated via score matching. Through a series of numerical experiments, we demonstrate that WGD provides good approximations to a variety of complex target distributions, including ones that pose substantial challenges for standard MCMC and parametric variational Bayes methods. These results suggest that WGD offers a promising and flexible alternative for scalable Bayesian inference in high-dimensional or multimodal settings.
💡 Research Summary
This paper revisits Bayesian sampling from the perspective of minimizing the Kullback–Leibler (KL) divergence between a candidate distribution μ and a target posterior π, treating the problem as an optimization over the 2‑Wasserstein space W₂(ℝᵈ). The authors focus on the Wasserstein Gradient Descent (WGD) algorithm, which updates a distribution by pushing it along the Wasserstein gradient of the KL functional: ν = (Id − ε ∇_μ F)#μ, where ∇_μ F = ∇ log μ − ∇ log π.
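On particles, the pushforward update becomes x ← x − ε (∇ log μ(x) − ∇ log π(x)). Below is a minimal sketch of this iteration, with two illustrative assumptions: the target π is a standard Gaussian (so ∇ log π is known in closed form), and a crude Gaussian fit to the particles stands in for the score-matching estimator of ∇ log μ that the paper actually uses. The names `grad_log_pi`, `gaussian_score_estimate`, and `wgd_step` are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed target for illustration: standard 2-D Gaussian, grad log pi(x) = -x.
def grad_log_pi(x):
    return -x

def gaussian_score_estimate(particles):
    # Crude stand-in for the paper's score-matching estimator:
    # fit a Gaussian to the particles and use its exact score,
    # grad log mu(x) = -Sigma^{-1} (x - m).
    m = particles.mean(axis=0)
    cov = np.cov(particles, rowvar=False) + 1e-3 * np.eye(particles.shape[1])
    prec = np.linalg.inv(cov)
    return lambda x: -(x - m) @ prec

def wgd_step(particles, eps=0.1):
    # One WGD update: x <- x - eps * (grad log mu(x) - grad log pi(x)).
    score_mu = gaussian_score_estimate(particles)
    return particles - eps * (score_mu(particles) - grad_log_pi(particles))

# Start far from the target and iterate; the particle cloud contracts
# toward the target's mean (0) and covariance (identity).
particles = rng.normal(loc=5.0, scale=2.0, size=(500, 2))
for _ in range(200):
    particles = wgd_step(particles)

print(particles.mean(axis=0), np.cov(particles, rowvar=False))
```

At the fixed point the two scores cancel, so the particles stop moving exactly when the fitted μ matches π; with a nonparametric score estimator the same cancellation drives convergence to non-Gaussian targets.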
A major obstacle to proving convergence of WGD on the full Wasserstein space is the non‑smoothness of the entropy term H(μ)=∫ log μ dμ. Prior work (Salim et al., 2020) circumvented this by a forward‑backward splitting that requires a proximal “backward” step, which is only tractable in the Gaussian (Bures‑Wasserstein) setting. The present work overcomes this limitation by identifying two subclasses of probability measures on which the KL functional behaves nicely enough for plain WGD to converge.
(α, β)‑regular measures.
A measure μ belongs to P_r(α, β) if its density is C^∞ and its negative log-density f = −log μ is α‑strongly convex and β‑smooth (i.e., α I ≼ ∇²f ≼ β I). For a Gaussian, ∇²f = Σ⁻¹, so this condition is satisfied by any Gaussian whose covariance eigenvalues lie in [β⁻¹, α⁻¹].
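The Gaussian case of this condition reduces to an eigenvalue check: since ∇²f = Σ⁻¹, the bounds α I ≼ ∇²f ≼ β I hold iff every eigenvalue of Σ lies in [1/β, 1/α]. A small sketch of that check (the function name and example covariance are ours, for illustration only):

```python
import numpy as np

def is_alpha_beta_regular_gaussian(cov, alpha, beta):
    # For a Gaussian, f = -log mu has constant Hessian Sigma^{-1}, so
    # alpha I <= Hessian <= beta I  iff  every eigenvalue of Sigma
    # lies in the interval [1/beta, 1/alpha].
    eig = np.linalg.eigvalsh(cov)
    return bool(np.all(eig >= 1.0 / beta) and np.all(eig <= 1.0 / alpha))

cov = np.diag([0.6, 1.0, 1.5])  # eigenvalues 0.6, 1.0, 1.5
print(is_alpha_beta_regular_gaussian(cov, alpha=0.5, beta=2.0))  # True
print(is_alpha_beta_regular_gaussian(cov, alpha=1.0, beta=2.0))  # False
```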