Geometry-Aware Optimal Transport: Fast Intrinsic Dimension and Wasserstein Distance Estimation


Solving large-scale Optimal Transport (OT) problems in machine learning typically relies on sampling the measures to obtain a tractable discrete problem. While the discrete solver’s accuracy is controllable, the rate of convergence of the discretization error is governed by the intrinsic dimension of the data. The true bottleneck is therefore knowing and controlling the sampling error. In this work, we tackle this issue by introducing novel estimators for both the sampling error and the intrinsic dimension. The key finding is a simple, tuning-free estimator of $\mathrm{OT}_c(\rho, \hat\rho)$ that utilizes the semi-dual OT functional and, remarkably, requires no OT solver. Furthermore, we derive a fast intrinsic dimension estimator from the multi-scale decay of our sampling-error estimator. This framework unlocks significant computational and statistical advantages in practice, enabling us to (i) quantify the convergence rate of the discretization error, (ii) calibrate the entropic regularization of Sinkhorn divergences to the data’s intrinsic geometry, and (iii) introduce a novel, intrinsic-dimension-based Richardson extrapolation estimator that strongly debiases Wasserstein distance estimation. Numerical experiments demonstrate that our geometry-aware pipeline effectively mitigates the discretization error bottleneck while maintaining computational efficiency.


💡 Research Summary

The paper addresses a fundamental bottleneck in large‑scale machine learning applications of Optimal Transport (OT): the error introduced when continuous probability measures are replaced by empirical (discrete) approximations. While modern OT solvers (e.g., Sinkhorn iterations) can compute the transport cost efficiently, the statistical error caused by sampling dominates the overall accuracy, especially in high‑dimensional settings where the convergence rate is O(n⁻¹ᐟᵈ). The authors propose a unified, geometry‑aware framework that (1) provides a fast, solver‑free estimator of the discretization error, (2) extracts the intrinsic dimension of the data from the decay of this estimator, and (3) leverages both quantities to debias Wasserstein distance estimates via a novel diagonal Richardson extrapolation that jointly tunes the entropic regularization parameter ε and the sample size n.

Solver‑free discretization error estimator.
The key theoretical insight (Proposition 3.1) is that for a fixed support set X = {x₁,…,xₙ} drawn i.i.d. from a distribution ρ, the optimal discrete approximation ρₙ is obtained by assigning to each support point a weight equal to the probability mass of its Voronoi cell under the cost c. With these optimal weights, the semi‑dual OT formulation collapses to the zero potential, and the OT cost reduces to the expectation of the c‑transform of the zero vector, i.e., the average cost from a fresh sample to its nearest support point. Consequently, the error OT_c(ρ, ρₙ) can be estimated by a simple Monte‑Carlo average
 d_OTⁿᴺ = (1/N) ∑_{k=1}^{N} min_{1≤j≤n} c(X_k, x_j),
where X_k are additional i.i.d. draws from ρ. This estimator requires no OT solver, is trivially parallelizable on GPUs, and enjoys exponential concentration (Proposition 3.2) via an empirical Bernstein bound. The same Monte‑Carlo scheme yields unbiased estimates of the optimal weights w_i.
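With the squared Euclidean cost, this estimator reduces to an average nearest-neighbor cost, and the optimal Voronoi weights come from the same Monte-Carlo samples. A minimal NumPy sketch (the function name and the choice c(x, y) = ‖x − y‖² are our illustrative assumptions, not taken from the paper):

```python
import numpy as np

def ot_discretization_error(support, fresh):
    """Monte-Carlo estimate of OT_c(rho, rho_n) for c(x, y) = |x - y|^2:
    average cost from a fresh draw of rho to its nearest support point.
    No OT solver is involved."""
    # Pairwise squared distances, shape (N, n).
    C = ((fresh[:, None, :] - support[None, :, :]) ** 2).sum(axis=-1)
    nearest = C.argmin(axis=1)  # index of the closest support point
    # Voronoi-cell weights w_i: fraction of fresh samples whose nearest
    # support point is x_i (an unbiased estimate of the optimal weights).
    weights = np.bincount(nearest, minlength=len(support)) / len(fresh)
    return C.min(axis=1).mean(), weights

rng = np.random.default_rng(0)
support = rng.normal(size=(200, 2))   # n = 200 support points x_j ~ rho
fresh = rng.normal(size=(2000, 2))    # N = 2000 fresh draws X_k ~ rho
err, w = ot_discretization_error(support, fresh)
```

Both outputs are embarrassingly parallel over the fresh samples, which is what makes the GPU implementation mentioned above trivial.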

Intrinsic dimension estimator.
Statistical OT theory shows that the discretization error decays as n⁻¹ᐟd_eff, where d_eff is an effective dimension that can vary with the observation scale ε (covering numbers). By evaluating d_OTⁿᴺ for several sample sizes n and fitting a log‑log regression, the authors obtain a multi‑scale estimate d_int that captures the manifold’s intrinsic geometry. The procedure runs in O(n) time and O(n) memory, making it suitable for millions of points. Empirical results on synthetic manifolds (circles, tori, Swiss rolls) demonstrate that the estimated d_int matches the true manifold dimension within a few percent.
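A toy version of this log-log fit, run on a unit circle embedded in ℝ³ (the sample sizes, the Euclidean cost, and all function names are illustrative choices, not the paper's exact setup):

```python
import numpy as np

def nn_error(support, fresh):
    """Mean Euclidean distance from fresh samples to the nearest support point."""
    d2 = ((fresh[:, None, :] - support[None, :, :]) ** 2).sum(-1)
    return np.sqrt(d2.min(axis=1)).mean()

def intrinsic_dimension(sampler, ns=(50, 100, 200, 400, 800), N=2000, seed=0):
    """Fit log(error) ~ -(1/d_int) * log(n) + const; the slope gives d_int."""
    rng = np.random.default_rng(seed)
    errs = [nn_error(sampler(n, rng), sampler(N, rng)) for n in ns]
    slope, _ = np.polyfit(np.log(ns), np.log(errs), 1)
    return -1.0 / slope

def circle(n, rng):
    """Uniform samples on the unit circle in R^3: intrinsic dimension 1."""
    t = rng.uniform(0.0, 2 * np.pi, n)
    return np.stack([np.cos(t), np.sin(t), np.zeros(n)], axis=1)

d_hat = intrinsic_dimension(circle)  # should be close to 1
```

The regression only needs the error estimator at a handful of sample sizes, which is where the O(n) time and memory cost comes from.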

Diagonal Richardson extrapolation.
Entropic regularization introduces a bias that scales as ε, while sampling introduces a bias scaling as n⁻¹ᐟd_int. Prior work (e.g., Chizat et al., 2020) applied Richardson extrapolation only on ε, achieving an O(n⁻²ᐟ(d+4)) rate for the squared 2‑Wasserstein distance. The authors propose a “diagonal” extrapolation that simultaneously varies ε and n according to the relation ε ∝ n⁻¹ᐟ(d_int+4). By computing Sinkhorn divergences S_ε(μ̂_n, ν̂_n) at two (ε, n) pairs and forming a linear combination that cancels the first‑order terms in both ε and n, they achieve a convergence rate of O(n⁻²ᐟ(d_int+4)) for W₂², which is strictly faster when d_int ≪ d. Experiments confirm that the debiased estimator dramatically reduces bias (up to an order of magnitude) compared with standard Sinkhorn or ε‑only Richardson methods, especially on low‑intrinsic‑dimension data.
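The scheme can be sketched as follows. Note the caveats: this uses the plain entropic cost ⟨P, C⟩ from a vanilla Sinkhorn loop rather than the paper's Sinkhorn divergence (which also subtracts self-transport terms), and the coupling ε = n⁻¹ᐟ(d_int+4) together with the 2/−1 combination weights is our illustrative reading of the diagonal cancellation, not the paper's exact estimator:

```python
import numpy as np

def entropic_ot_cost(x, y, eps, iters=300):
    """Entropic OT cost <P, C> between uniform empirical measures via
    plain Sinkhorn iterations (a sketch, not the full Sinkhorn divergence)."""
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    K = np.exp(-C / eps)
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    v = np.ones_like(b)
    for _ in range(iters):
        u = a / (K @ v)          # scale rows to match marginal a
        v = b / (K.T @ u)        # scale columns to match marginal b
    P = u[:, None] * K * v[None, :]  # entropic transport plan
    return float((P * C).sum())

def diagonal_richardson(draw_mu, draw_nu, n, d_int, rng):
    """Debiased W2^2 sketch: evaluate at (eps, n) and (eps/2, 2n), with
    eps coupled to n as eps = n^(-1/(d_int+4)), then combine so the
    leading bias terms cancel. The 2/-1 weights are illustrative."""
    eps = n ** (-1.0 / (d_int + 4))
    s_coarse = entropic_ot_cost(draw_mu(n, rng), draw_nu(n, rng), eps)
    s_fine = entropic_ot_cost(draw_mu(2 * n, rng), draw_nu(2 * n, rng), eps / 2)
    return 2.0 * s_fine - s_coarse

# Example: two 1-D Gaussians shifted by 2, for which the true W2^2 is 4.
rng = np.random.default_rng(1)
mu = lambda n, rng: rng.normal(0.0, 1.0, size=(n, 1))
nu = lambda n, rng: rng.normal(2.0, 1.0, size=(n, 1))
est = diagonal_richardson(mu, nu, n=400, d_int=1, rng=rng)
```

A production version would use a log-domain Sinkhorn solver (e.g., from the POT library) for numerical stability at small ε; the point here is only the (ε, n) coupling and the cancelling linear combination.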

Experimental validation.
The pipeline is evaluated on (i) synthetic manifolds of known dimension, where the discretization error estimator follows the predicted power law and the intrinsic dimension estimator recovers the ground truth; (ii) real image datasets (MNIST, CIFAR‑10), where the estimated intrinsic dimensions are ≈12 and ≈30 respectively, and the diagonal extrapolation yields Wasserstein distance estimates with substantially lower RMSE than baselines, using far fewer samples. Computationally, the Monte‑Carlo error estimator and weight computation are an order of magnitude faster than solving a semi‑discrete OT problem, and the overall pipeline scales linearly with the number of points.

Impact and future directions.
By turning the traditionally “unknown” sampling error into a directly measurable quantity and by extracting the data’s intrinsic geometry from it, the paper bridges the gap between statistical OT theory and practical OT computation. The framework is modular: the error estimator, dimension estimator, and extrapolation can be combined with any OT solver (Sinkhorn, Greenkhorn, stochastic OT) and extended to other regularizations (quadratic, group‑sparse) or to non‑Euclidean costs. This opens the door to principled, data‑adaptive regularization in high‑dimensional generative modeling, domain adaptation, and beyond.

