The Wasserstein gradient flow of the Sinkhorn divergence between Gaussian distributions

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv paper.

We study the Wasserstein gradient flow of the Sinkhorn divergence when both the source and the target are Gaussian distributions. We prove the existence of a flow that stays in the class of Gaussian distributions, and is unique in the larger class of measures with strongly concave and smooth log-densities. We prove that the flow globally converges toward the target measure when the source’s covariance matrix is not singular, and provide counter-examples to global convergence when it is, giving a first answer to an open question raised in [Carlier et al. 2024, §4.2]. When the covariance matrix of the source distribution commutes with that of the target, we derive more quantitative results that showcase exponential convergence toward the target when the source and the target share their support, but dropping to linear rates (O(t^{-1})) if the target is concentrated on a strict subspace of the source’s support.


💡 Research Summary

This paper investigates the Wasserstein gradient flow of the Sinkhorn divergence when both the source and target measures are Gaussian. The Sinkhorn divergence Sε(μ,ν) is defined as the entropically regularized optimal transport cost OTε(μ,ν) minus half of each of the two self‑transport terms OTε(μ,μ) and OTε(ν,ν), thereby correcting the shrinking bias of OTε. The authors consider the functional F(μ)=Sε(μ,μ★) for a fixed Gaussian target μ★ and study its Wasserstein gradient flow (SWGF), given by the continuity equation ∂ₜμₜ = div(μₜ ∇(f_{μₜ,μ★} − f_{μₜ})), where f_{μ,ν} is the Schrödinger potential associated with OTε(μ,ν) and f_{μ} is shorthand for the symmetric potential f_{μ,μ}.
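
For reference, the objects named in this paragraph can be written out as follows. This is a sketch under a common convention, not necessarily the paper's normalization: the ground cost c (typically ‖x−y‖² or ½‖x−y‖²) and the scaling of ε in front of the relative entropy are assumptions here.

```latex
% Entropic optimal transport (cost c and scaling of \varepsilon are assumed conventions)
\mathrm{OT}_\varepsilon(\mu,\nu)
  = \min_{\pi \in \Pi(\mu,\nu)} \int c(x,y)\,\mathrm{d}\pi(x,y)
  + \varepsilon\,\mathrm{KL}(\pi \,|\, \mu \otimes \nu)

% Sinkhorn divergence: subtract half of each self-transport term
S_\varepsilon(\mu,\nu)
  = \mathrm{OT}_\varepsilon(\mu,\nu)
  - \tfrac{1}{2}\,\mathrm{OT}_\varepsilon(\mu,\mu)
  - \tfrac{1}{2}\,\mathrm{OT}_\varepsilon(\nu,\nu)

% First variation of F(\mu) = S_\varepsilon(\mu,\mu_\star) and the resulting flow
\frac{\delta F}{\delta \mu}(\mu) = f_{\mu,\mu_\star} - f_{\mu},
\qquad
\partial_t \mu_t = \operatorname{div}\!\Big(\mu_t \,\nabla\big(f_{\mu_t,\mu_\star} - f_{\mu_t}\big)\Big)
```

With this definition Sε(μ,μ) = 0 for every μ, which is the sense in which the two self‑transport terms remove the bias of OTε.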

Key contributions are:

  1. Closed‑form Gaussian formulas – By exploiting known explicit expressions for OTε and the Schrödinger potentials between Gaussian measures, the authors derive exact formulas for Sε and its first variation in terms of means and covariance matrices, extending these results to possibly singular covariances.

  2. Existence, uniqueness, and λ‑convexity – Restricting Sε to the Gaussian family yields a functional that is λ‑convex along generalized geodesics in the Wasserstein space, with a negative convexity constant λ that can be computed explicitly. This λ‑convexity, together with the Evolution Variational Inequality (EVI) framework of Ambrosio–Gigli–Savaré, guarantees the existence of a unique gradient flow curve (μt) that remains Gaussian for all t≥0. Moreover, the flow is shown to be the unique Wasserstein gradient flow of F among all measures with strongly concave and smooth log‑densities.

  3. Global convergence analysis – When the source covariance Σ0 is positive definite, the flow converges globally in Wasserstein distance to the target μ★. If Σ0 is singular and μ★ is not identical to μ0, the flow converges to a different Gaussian limit whose covariance is the orthogonal projection of Σ★ onto the support of Σ0. This provides counter‑examples to global convergence in the singular case, giving a first answer to an open question raised in Carlier et al. (2024, §4.2).

  4. Commuting covariances and quantitative rates – Assuming Σ0 and Σ★ commute, they can be simultaneously diagonalized. The authors derive an ODE for each eigenvalue λi(t) of the covariance matrix along the flow, from which they obtain:

    • Exponential convergence when all target eigenvalues are positive (i.e., the supports coincide). The decay rate is explicitly given in terms of ε and the eigenvalue gaps.
    • Sublinear O(t⁻¹) convergence when some target eigenvalues are zero, meaning the target lives on a strict subspace of the source’s support. The authors prove that the functional Sε(μt,μ★) decays like C/t in this regime.

  5. Numerical scheme and experiments – An explicit Euler discretization of the eigenvalue dynamics is proposed, enabling efficient simulation of the flow even in high dimensions. Numerical experiments confirm the theoretical rates (exponential vs. O(t⁻¹)) and illustrate the behavior in both regular and singular settings.
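
The two convergence regimes in the list above can be illustrated in one dimension with a few lines of Python. This is a minimal sketch under stated assumptions, not the paper's scheme: it uses the closed form for entropic OT between centered Gaussians from Janati et al. (2020), with cost |x − y|² and regularization 2σ²·KL, and it evolves the variance a(t) by explicit Euler steps on da/dt = −4a·∂ₐS(a, b), the restriction of the Wasserstein gradient flow to centered 1D Gaussians. The right‑hand side is derived from that closed form rather than taken from the paper, so its constants and exact form may differ from the paper's eigenvalue ODE. All function names are hypothetical.

```python
import math


def ot_eps(a, b, sigma2):
    """Entropic OT cost between N(0, a) and N(0, b) in 1D.

    Assumed convention (Janati et al. 2020): cost |x - y|^2 and
    regularization 2 * sigma2 * KL(pi | mu x nu).
    """
    d = math.sqrt(4.0 * a * b + sigma2 ** 2)
    return (a + b - d
            + sigma2 * (1.0 - math.log(2.0 * sigma2))
            + sigma2 * math.log(d + sigma2))


def sinkhorn_div(a, b, sigma2):
    """Sinkhorn divergence: OT(a, b) minus half of each self-transport term."""
    return (ot_eps(a, b, sigma2)
            - 0.5 * ot_eps(a, a, sigma2)
            - 0.5 * ot_eps(b, b, sigma2))


def dS_da(a, b, sigma2):
    """Derivative of sinkhorn_div with respect to the source variance a."""
    d_aa = math.sqrt(4.0 * a * a + sigma2 ** 2)
    d_ab = math.sqrt(4.0 * a * b + sigma2 ** 2)
    return 2.0 * a / (d_aa + sigma2) - 2.0 * b / (d_ab + sigma2)


def euler_flow(a0, b, sigma2, dt, n_steps):
    """Explicit Euler steps on da/dt = -4 * a * dS/da (1D Gaussian flow)."""
    a = a0
    traj = [a]
    for _ in range(n_steps):
        a += dt * (-4.0 * a * dS_da(a, b, sigma2))
        traj.append(a)
    return traj


if __name__ == "__main__":
    dt = 0.01
    # Regular case: target variance b > 0, the gap |a(t) - b| shrinks geometrically.
    regular = euler_flow(a0=2.0, b=1.0, sigma2=1.0, dt=dt, n_steps=2000)
    # Singular target: b = 0, the variance decays only like 1 / t.
    singular = euler_flow(a0=1.0, b=0.0, sigma2=1.0, dt=dt, n_steps=20000)
    for k in (500, 1000, 2000):
        print(f"t={k * dt:5.1f}  |a - b| = {abs(regular[k] - 1.0):.3e}")
    for k in (5000, 10000, 20000):
        print(f"t={k * dt:5.1f}  t * a(t) = {k * dt * singular[k]:.4f}")
```

In the singular case the product t·a(t) levels off near σ²/4 under these conventions, because da/dt ≈ −4a²/σ² for small a, whereas in the regular case the gap to the target variance shrinks by a roughly constant factor per unit time.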

Overall, the paper delivers a complete analytical picture of the Wasserstein gradient flow of the Sinkhorn divergence in the Gaussian setting, removing the compact‑support restriction present in earlier works. It establishes existence, uniqueness, and precise convergence rates, and it clarifies when the flow fails to reach the target. These results bridge optimal transport, entropy‑regularized transport, and Riemannian geometry on the space of Gaussian measures (the Bures–Wasserstein manifold), and they open avenues for applications in variational inference, robust statistics, and gradient‑based generative modeling.

