Solving Monge problem by Hilbert space embeddings of probability measures
We propose deep learning methods for the classical Monge optimal mass transportation problem, where the distribution constraint is treated as a penalty term defined by the maximum mean discrepancy from the theory of Hilbert space embeddings of probability measures. We prove that the transport maps given by the proposed methods converge to optimal transport maps in the problem with $L^2$ cost. Several numerical experiments validate our methods. In particular, we show that our methods are applicable to large-scale Monge problems. This is a corrected version of the ICORES 2025 proceedings paper.
💡 Research Summary
The paper introduces a novel deep‑learning framework for solving the classical Monge optimal transport problem by leveraging Hilbert‑space embeddings of probability measures. The authors replace the hard constraint that the push‑forward of the source distribution μ under a transport map T equals the target distribution ν with a soft penalty based on the Maximum Mean Discrepancy (MMD). MMD is defined with respect to a positive‑definite kernel K (e.g., Gaussian or Matérn), which guarantees that γK(·,·) is a true metric and metrizes the weak topology on the space of probability measures.
The objective functional is
Mλ(T) = ∫‖x−T(x)‖² dμ(x) + λ·γK²(μ∘T⁻¹, ν),
where the cost is the squared Euclidean distance (L²‑cost). The transport map T is parametrized by a deep neural network (a multilayer perceptron) with parameters θ. In practice the MMD term is estimated from minibatches using the unbiased U‑statistic estimator, and the whole loss is minimized by stochastic gradient descent (Adam) with a fixed learning rate of 1e‑4.
Theoretical contributions are encapsulated in Theorem 2.1. Under the assumptions that μ is absolutely continuous with respect to Lebesgue measure, both μ and ν have finite second moments, the cost is quadratic, and the kernel‑induced MMD metrizes weak convergence, the authors consider a sequence of penalty parameters λn → ∞ and approximation errors εn → 0. They prove that the sequence of minimizers Tn of Mλn converges in law (under μ) to the unique optimal transport map T* and that the corresponding objective values converge to the optimal Monge cost. The proof follows classical optimal‑transport existence/uniqueness arguments and exploits the fact that MMD controls weak convergence, thereby establishing tightness of the push‑forward measures.
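In symbols, the convergence statement summarized above reads roughly as follows (notation follows the summary; the precise assumptions and constants are those of Theorem 2.1 in the paper):

```latex
\[
\lambda_n \to \infty,\qquad \varepsilon_n \to 0,\qquad
M_{\lambda_n}(T_n) \le \inf_T M_{\lambda_n}(T) + \varepsilon_n
\]
\[
\Longrightarrow\quad
T_n \to T^{*} \ \text{in law under } \mu,
\qquad
\int \lVert x - T_n(x)\rVert^2 \, d\mu(x) \;\to\; \int \lVert x - T^{*}(x)\rVert^2 \, d\mu(x).
\]
```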
Empirically, three synthetic two-dimensional experiments are presented: (1) transforming a two-moon distribution into two circles, (2) mapping a standard normal distribution to a two-moon shape, and (3) shifting a standard normal distribution to a normal distribution with mean 5. Each experiment uses 5,000 samples, a batch size of 500, 3,000 training epochs, and λ = 1/0.00001 (i.e., 10⁵). Visual results show that the learned samples match the target distributions, and the loss curves plateau within the first 500 epochs.
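Experiment (3) admits a closed-form answer that makes a useful sanity check: for the L² cost, the optimal map from N(0, 1) to N(5, 1) is the pure translation T*(x) = x + 5, so the Monge cost equals the squared mean shift. A minimal numpy sketch (the translation map here stands in for a trained network, not the paper's learned model):

```python
import numpy as np

rng = np.random.default_rng(0)
source = rng.normal(loc=0.0, scale=1.0, size=5000)  # samples from mu = N(0, 1)

# For the L^2 cost, the optimal map from N(0, 1) to N(5, 1) is the pure
# translation T*(x) = x + 5; a trained network should approximate it.
transported = source + 5.0

# Transport cost of the translation map: ||x - T*(x)||^2 = 25 for every x.
cost = np.mean((source - transported) ** 2)
print(cost)  # 25.0
```

Any learned map whose average transport cost is noticeably above 25 on this experiment is therefore provably suboptimal, independent of how well it matches the target distribution.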
A performance comparison with the Python Optimal Transport (POT) library is also reported. The authors benchmark CPU (AMD EPYC 9654, 768 GB) and GPU (NVIDIA H100) configurations, measuring the number of samples that can be processed, runtime, and standard deviation of the loss. The proposed method scales well on the GPU, handling up to 60 000 samples, whereas CPU memory limits prevent processing of larger batches. However, the paper does not provide quantitative accuracy metrics such as the Wasserstein distance between the learned push‑forward and ν, making it difficult to assess solution quality relative to established OT solvers.
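One lightweight way to supply the missing accuracy metric, at least for one-dimensional targets such as experiment (3), is the exact quantile-coupling formula for the 1-D Wasserstein-2 distance. A sketch under that assumption (the push-forward samples below are synthetic stand-ins, not outputs of the paper's method):

```python
import numpy as np

def wasserstein2_1d(x, y):
    """Empirical 1-D Wasserstein-2 distance between equal-size samples.

    In one dimension the optimal coupling matches sorted samples (the
    quantile coupling), so W2 is computable exactly without an OT solver."""
    xs, ys = np.sort(x), np.sort(y)
    return float(np.sqrt(np.mean((xs - ys) ** 2)))

rng = np.random.default_rng(0)
pushforward = rng.normal(loc=5.0, scale=1.0, size=10000)  # stand-in for T applied to mu-samples
target = rng.normal(loc=5.0, scale=1.0, size=10000)       # samples from nu
print(wasserstein2_1d(pushforward, target))  # small: push-forward matches nu
```

In higher dimensions one would instead fall back on a solver such as POT's exact or Sinkhorn-regularized routines to compare the learned push-forward against ν.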
Strengths of the work include: (i) a clean formulation that turns a hard transport constraint into a differentiable penalty, enabling end‑to‑end training with modern deep‑learning toolkits; (ii) theoretical convergence guarantees that connect the penalty formulation to the classical Monge solution; (iii) demonstration of GPU‑friendly scalability. Weaknesses are: (i) the choice and scheduling of the penalty weight λ lack a principled guideline; the experiments keep λ fixed, which deviates from the asymptotic regime required by the theorem; (ii) the empirical evaluation is limited to low‑dimensional synthetic data, leaving open the question of performance on high‑dimensional or real‑world datasets; (iii) no direct comparison of transport quality (e.g., Wasserstein error) with state‑of‑the‑art OT methods such as Sinkhorn‑regularized solvers is provided.
In summary, the paper proposes an interesting MMD‑based penalty approach to the Monge problem, backed by solid theoretical analysis and initial empirical validation. Future work should address adaptive λ strategies, extend experiments to higher dimensions and real data, and benchmark transport accuracy against established optimal‑transport algorithms.