Fast and Large-Scale Unbalanced Optimal Transport via its Semi-Dual and Adaptive Gradient Methods
Unbalanced Optimal Transport (UOT) has emerged as a robust relaxation of standard Optimal Transport, particularly effective for handling outliers and mass variations. However, scalable algorithms for UOT, particularly those based on Stochastic Gradient Descent (SGD), remain largely underexplored. In this work, we address this gap by analyzing the semi-dual formulation of Entropic UOT and demonstrating its suitability for adaptive gradient methods. While the semi-dual is a standard tool for large-scale balanced OT, its geometry in the unbalanced setting appears ill-conditioned under standard analysis. Specifically, worst-case bounds on the marginal penalties using the $\chi^2$ divergence suggest a condition number scaling with $n/\varepsilon$, implying poor scalability. In contrast, we show that the local condition number actually scales as $\mathcal{O}(1/\varepsilon)$, effectively removing the ill-conditioned dependence on $n$. Exploiting this property, we prove that SGD methods adapt to this local curvature, achieving a convergence rate of $\mathcal{O}(n/(\varepsilon T))$ in the stochastic and online regimes, making them suitable for large-scale and semi-discrete applications. Finally, for the full-batch discrete setting, we derive a nearly tight upper bound on local smoothness depending solely on the gradient. Using it to adapt step sizes, we propose a modified Adaptive Nesterov Accelerated Gradient (ANAG) method on the semi-dual functional and prove that it achieves a local complexity of $\mathcal{O}(n^2\sqrt{1/\varepsilon}\ln(1/\delta))$.
💡 Research Summary
The paper addresses the lack of scalable first‑order algorithms for Entropic Unbalanced Optimal Transport (UOT). By focusing on the semi‑dual formulation and choosing a χ² divergence for the target marginal penalty, the authors reveal a favorable geometry that is hidden under traditional worst‑case analyses. They prove that the local condition number at the optimum scales only as O(1/ε), independent of the problem size n, whereas naïve global bounds suggest a dependence of O(n/ε). This insight stems from a novel smoothness bound: the operator norm of the Hessian is bounded by (1/ε) times the infinity‑norm of the transport gradient plus a constant term. Consequently, the smoothness constant L is O(1/ε) within a region K = {g : g_k ≤ ρ₂ + δ} that contains the optimizer.
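This gradient-controlled Hessian bound can be checked numerically. Under one common convention for the entropic semi-dual (our assumption, not necessarily the paper's exact normalization), the transport term's Hessian is (1/ε) Σᵢ aᵢ (diag(pᵢ) − pᵢpᵢᵀ), where pᵢ is a softmax row, and the χ² penalty adds a constant 1/ρ₂ to the diagonal; the operator norm is then dominated by (1/ε)·‖∇J_trans(g)‖_∞ plus that constant:

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps, rho2 = 30, 0.1, 1.0
a = rng.dirichlet(np.ones(n))          # source weights
b = rng.dirichlet(np.ones(n))          # target weights
C = rng.random((n, n))                 # cost matrix
g = rng.normal(size=n)                 # dual potential

# Row-wise stabilized softmax p_i over targets j
logits = (g[None, :] - C) / eps + np.log(b)[None, :]
logits -= logits.max(axis=1, keepdims=True)
P = np.exp(logits)
P /= P.sum(axis=1, keepdims=True)

grad_trans = a @ P                     # transport gradient: sum_i a_i p_ij

# Hessian: (1/eps) * sum_i a_i (diag(p_i) - p_i p_i^T) + (1/rho2) * I
H = (np.diag(grad_trans) - P.T @ (a[:, None] * P)) / eps
H += np.eye(n) / rho2                  # chi^2 penalty: constant curvature (assumed form)

op_norm = np.linalg.eigvalsh(H).max()
bound = np.linalg.norm(grad_trans, np.inf) / eps + 1.0 / rho2
print(op_norm <= bound + 1e-10)        # → True
```

Since diag(pᵢ) − pᵢpᵢᵀ ⪯ diag(pᵢ), the bound holds exactly; the key point is that it depends only on the gradient at g, not on n.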
Leveraging this local curvature, the authors design two algorithms. The first is Projected Averaged Stochastic Gradient Descent (PASGD), suited to semi‑discrete or fully continuous source measures. An unbiased mini‑batch estimator of the semi‑dual gradient is constructed, requiring O(m·n) operations per iteration (m = batch size). By projecting onto K and averaging iterates, PASGD automatically adapts to the local smoothness and achieves an expected convergence rate of O(n/(εT)) after T iterations, in both batch and online streaming settings.
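A minimal sketch of such a scheme, under our own assumed χ² convention for the semi-dual gradient (the exact functional, step-size schedule, and sampler `sample_costs` are illustrative, not the paper's):

```python
import numpy as np

def pasgd(b, sample_costs, eps=0.1, rho2=1.0, delta=0.5,
          m=16, T=1000, lr=0.5, seed=0):
    """Projected Averaged SGD on a UOT semi-dual: illustrative sketch only.

    sample_costs(rng, m) returns the (m, n) cost rows of m sampled source
    points; the gradient below assumes the convention
    J(g) = E_i[eps * logsumexp((g - C_i)/eps + log b)]
           + sum_j (g_j^2 / (2 rho2) - b_j g_j).
    """
    rng = np.random.default_rng(seed)
    n = b.size
    g = np.zeros(n)
    g_avg = np.zeros(n)
    for t in range(1, T + 1):
        Ci = sample_costs(rng, m)                    # minibatch: O(m*n) per step
        logits = (g[None, :] - Ci) / eps + np.log(b)[None, :]
        logits -= logits.max(axis=1, keepdims=True)  # stabilized softmax rows
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        grad = P.mean(axis=0) + g / rho2 - b         # unbiased if sources drawn ~ a
        g = np.minimum(g - lr / np.sqrt(t) * grad,   # SGD step, then project
                       rho2 + delta)                 # onto K = {g_k <= rho2 + delta}
        g_avg += (g - g_avg) / t                     # running Polyak average
    return g_avg
```

The projection onto K is a coordinate-wise clip, and averaging the projected iterates is what yields the stated adaptivity in the streaming setting.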
Second, for the full‑batch discrete case, they propose an Adaptive Nesterov Accelerated Gradient (ANAG) method. Instead of using a conservative global Lipschitz constant, ANAG dynamically adjusts the step size based on the current value of ∥∇J_trans(g)∥_∞, which serves as a data‑dependent smoothness estimate. Combining this adaptive step with Nesterov’s momentum yields a local complexity of O(n²√(1/ε)·log(1/δ)) to reach an ε‑accurate solution with confidence 1−δ.
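The adaptive-step idea can be sketched as follows; the smoothness estimate L_t mirrors the bound above, while the objective convention and momentum schedule are our assumptions rather than the paper's exact method:

```python
import numpy as np

def anag(C, a, b, eps=0.1, rho2=1.0, T=500):
    """Adaptive Nesterov accelerated gradient on the full-batch semi-dual.

    Illustrative sketch: the step 1/L_t uses the data-dependent estimate
    L_t = ||grad J_trans(y)||_inf / eps + 1/rho2 instead of a conservative
    global Lipschitz constant; chi^2 penalty convention assumed as before.
    """
    n = b.size
    g = np.zeros(n)
    y = np.zeros(n)
    t_mom = 1.0
    for _ in range(T):
        logits = (y[None, :] - C) / eps + np.log(b)[None, :]
        logits -= logits.max(axis=1, keepdims=True)
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        grad_trans = a @ P                           # transport part of gradient
        grad = grad_trans + y / rho2 - b             # + chi^2 penalty part (assumed)
        L_t = np.linalg.norm(grad_trans, np.inf) / eps + 1.0 / rho2
        g_next = y - grad / L_t                      # gradient step, adaptive 1/L_t
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t_mom ** 2))
        y = g_next + (t_mom - 1.0) / t_next * (g_next - g)  # Nesterov extrapolation
        g, t_mom = g_next, t_next
    return g
```

Because ‖∇J_trans‖_∞ ≤ 1 under this normalization, L_t never exceeds 1/ε + 1/ρ₂, but near the optimum it can be far smaller, which is where the adaptive step pays off.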
The paper also explains why the χ² target penalty is essential: with a KL‑KL combination the Hessian’s diagonal entries decay exponentially (e^{−g/ρ₂}), destroying uniform strong convexity and making acceleration difficult. The χ² penalty, by contrast, provides a constant curvature of 1/ρ₂, guaranteeing global strong convexity and enabling the theoretical results.
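A tiny numerical illustration of this contrast, taking the curvature expressions from the summary at face value (the exact conjugate-penalty forms are our assumption):

```python
import numpy as np

rho2 = 1.0
g = np.linspace(0.0, 5.0, 6)           # dual-potential values

# KL target penalty: Hessian diagonal ~ (1/rho2) * exp(-g/rho2), per the summary
curv_kl = np.exp(-g / rho2) / rho2     # decays toward 0 as g grows
# chi^2 target penalty: constant curvature 1/rho2
curv_chi2 = np.full_like(g, 1.0 / rho2)

print(curv_kl.round(3))                # exponentially vanishing curvature
print(curv_chi2)                       # uniform curvature everywhere
```

The KL curvature drops below 1% of its initial value already at g = 5ρ₂, so no uniform strong-convexity constant survives on an unbounded domain, whereas the χ² curvature stays at 1/ρ₂.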
Overall, the work provides a thorough geometric analysis of the Entropic UOT semi‑dual, demonstrates that the problem is locally well‑conditioned, and translates these findings into practical, scalable gradient‑based solvers. The proposed methods bridge the gap between the success of semi‑dual approaches in balanced OT and the need for efficient algorithms in the unbalanced setting, opening the door to large‑scale applications such as semi‑discrete transport, generative modeling, and online data assimilation.