Efficient Distribution Learning with Error Bounds in Wasserstein Distance
The Wasserstein distance has emerged as a key metric to quantify distances between probability distributions, with applications in various fields, including machine learning, control theory, decision theory, and biological systems. Consequently, learning an unknown distribution with non-asymptotic and easy-to-compute error bounds in Wasserstein distance has become a fundamental problem in many fields. In this paper, we devise a novel algorithmic and theoretical framework to approximate an unknown probability distribution $\mathbb{P}$ from a finite set of samples by an approximate discrete distribution $\widehat{\mathbb{P}}$ while bounding the Wasserstein distance between $\mathbb{P}$ and $\widehat{\mathbb{P}}$. Our framework leverages optimal transport, nonlinear optimization, and concentration inequalities. In particular, we show that, even if $\mathbb{P}$ is unknown, the Wasserstein distance between $\mathbb{P}$ and $\widehat{\mathbb{P}}$ can be efficiently bounded with high confidence by solving a tractable optimization problem (a mixed integer linear program) of a size that only depends on the size of the support of $\widehat{\mathbb{P}}$. This enables us to develop intelligent clustering algorithms to optimally find the support of $\widehat{\mathbb{P}}$ while minimizing the Wasserstein distance error. On a set of benchmarks, we demonstrate that our approach outperforms state-of-the-art comparable methods by generally returning approximating distributions with substantially smaller support and tighter error bounds.
💡 Research Summary
The paper addresses the fundamental problem of learning an unknown probability distribution from a finite sample set while providing non‑asymptotic, data‑dependent error bounds measured in the Wasserstein distance. Traditional results give uniform convergence rates that depend only on the sample size, dimension, and support size, often leading to overly conservative bounds that require a huge number of samples to be useful in practice. The authors propose a novel framework that leverages optimal transport theory, mixed‑integer linear programming (MILP), and concentration inequalities to obtain much tighter, high‑confidence bounds that adapt to the observed data.
The core idea is to replace the full empirical distribution (which has support equal to the number of samples) with a compact discrete approximation $\widehat{\mathbb{P}} = \sum_{i=1}^{M} \pi_i \delta_{c_i}$. Here $M$ is a user‑chosen support size much smaller than the sample size $N$. The space $\mathcal{X}$ is partitioned into regions $\{C_i\}_{i=1}^M$ and each region is represented by a point $c_i$. The mass $\pi_i$ is simply the proportion of samples falling into region $C_i$. This "clustered empirical distribution" generalizes the standard empirical measure and can be interpreted as a quantization of the unknown distribution.
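The construction above can be sketched in code. The snippet below is an illustrative, simplified version (not the paper's optimized clustering algorithm): it partitions one-dimensional samples into $M$ regions with a plain k-means-style loop, uses the region means as the support points $c_i$, and sets each weight $\pi_i$ to the fraction of samples landing in region $C_i$. The function name and all parameters are hypothetical.

```python
import numpy as np

def clustered_empirical(samples, M, n_iter=50, seed=0):
    """Build a compact discrete approximation of the empirical distribution:
    partition 1-D samples into M regions via a simple k-means loop (one
    plausible choice of partition; the paper selects the support more
    carefully) and weight each center by the fraction of samples it absorbs."""
    rng = np.random.default_rng(seed)
    N = len(samples)
    # Initialize the centers c_i with M distinct samples.
    centers = samples[rng.choice(N, size=M, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign each sample to its nearest center: this defines the regions C_i.
        dists = np.abs(samples[:, None] - centers[None, :])
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of its region.
        for i in range(M):
            if np.any(labels == i):
                centers[i] = samples[labels == i].mean()
    # pi_i = proportion of samples falling into region C_i.
    weights = np.bincount(labels, minlength=M) / N
    return centers, weights

# Example: quantize 500 Gaussian samples down to a 5-point distribution.
rng = np.random.default_rng(1)
samples = rng.normal(size=500)
centers, weights = clustered_empirical(samples, M=5)
```

The returned `(centers, weights)` pair is exactly the data defining $\widehat{\mathbb{P}} = \sum_i \pi_i \delta_{c_i}$: the weights are nonnegative and sum to one by construction.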
To bound the Wasserstein distance $W_\rho(\mathbb{P},\widehat{\mathbb{P}})$, the authors construct data‑dependent confidence bounds that hold with high probability and can be computed by solving a tractable mixed‑integer linear program whose size depends only on the support size of $\widehat{\mathbb{P}}$.
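While the paper's MILP bound controls the distance to the unknown $\mathbb{P}$ itself, a simpler quantity one can compute directly is the exact 1-Wasserstein distance between the empirical measure and the clustered approximation, i.e. the quantization error alone. The sketch below does this in one dimension with SciPy; the 3-point support is a hand-picked toy example, not a value from the paper.

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Diagnostic only: W_1 between the empirical measure (uniform weights on
# the N samples) and a clustered approximation (weights pi_i on centers
# c_i). This captures the quantization error; the paper's bound also
# accounts for the statistical gap between the empirical measure and P.
rng = np.random.default_rng(0)
samples = rng.normal(size=1000)

# Toy 3-point approximation with a hand-picked support (illustrative).
centers = np.array([-1.0, 0.0, 1.0])
labels = np.abs(samples[:, None] - centers[None, :]).argmin(axis=1)
weights = np.bincount(labels, minlength=3) / len(samples)

w1 = wasserstein_distance(samples, centers, v_weights=weights)
print(f"W_1(empirical, clustered) = {w1:.4f}")
```

Because each sample is assigned to its nearest center, this distance is at most the average sample-to-center gap, which is why shrinking the regions $C_i$ (larger $M$) tightens the approximation.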