Finite-Sample Wasserstein Error Bounds and Concentration Inequalities for Nonlinear Stochastic Approximation


This paper derives non-asymptotic error bounds for nonlinear stochastic approximation algorithms in the Wasserstein-$p$ distance. To obtain explicit finite-sample guarantees for the last iterate, we develop a coupling argument that compares the discrete-time process to a limiting Ornstein-Uhlenbeck process. Our analysis applies to algorithms driven by general noise conditions, including martingale differences and functions of ergodic Markov chains. Complementing this result, we obtain the convergence rate of the Polyak-Ruppert average through a direct analysis that applies under the same general setting. Assuming the driving noise satisfies a non-asymptotic central limit theorem, we show that the normalized last iterates converge to a Gaussian distribution in the $p$-Wasserstein distance at a rate of order $\gamma_n^{1/6}$, where $\gamma_n$ is the step size. Similarly, the Polyak-Ruppert average is shown to converge in the Wasserstein distance at a rate of order $n^{-1/6}$. These distributional guarantees imply high-probability concentration inequalities that improve upon those derived from moment bounds and Markov's inequality. We demonstrate the utility of this approach through two applications: (1) linear stochastic approximation, where we explicitly quantify the transition from heavy-tailed to Gaussian behavior of the iterates, thereby bridging the gap between recent finite-sample analyses and asymptotic theory; and (2) stochastic gradient descent, where we establish the rate of convergence in the central limit theorem.


💡 Research Summary

This paper develops non-asymptotic convergence guarantees for nonlinear stochastic approximation (SA) algorithms measured in the Wasserstein-$p$ distance, a metric that is stronger than weak convergence and captures distributional closeness. The authors focus on two estimators: the last iterate $x_n$ and the Polyak-Ruppert averaged iterate $\bar{x}_n$. Their analysis proceeds by coupling the discrete-time SA recursion with a limiting Ornstein-Uhlenbeck (OU) diffusion, allowing explicit control of the distance between the law of the scaled error and the Gaussian law of the OU stationary distribution.
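To make the coupling statement concrete, here is a minimal, illustrative sketch (not the authors' code) for the simplest instance: a one-dimensional linear recursion with i.i.d. Gaussian martingale-difference noise, where the OU stationary law is available in closed form as $N(0, \sigma_w^2/(2A))$. All parameter values ($A$, $\sigma_w$, $\gamma_1$, $a$, the horizons) are arbitrary demo choices; the paper's setting is far more general (nonlinear drift, Markovian noise, Wasserstein-$p$ for general $p$).

```python
# Illustrative sketch: 1-D linear SA recursion
#   x_{k+1} = x_k - gamma_{k+1} * A * x_k + gamma_{k+1} * w_{k+1},
# with i.i.d. Gaussian martingale-difference noise. The scaled error
# x_n / sqrt(gamma_n) is compared, in Wasserstein-1 distance, to the
# stationary law N(0, sigma_w^2 / (2A)) of the limiting OU process.
import numpy as np

rng = np.random.default_rng(0)

A, sigma_w = 1.0, 1.0          # drift and noise scale (target x* = 0)
gamma1, a = 0.5, 0.7           # step size gamma_k = gamma1 * k^{-a}
n_steps, n_runs = 5_000, 20_000

k = np.arange(1, n_steps + 1)
gamma = gamma1 * k ** (-a)

x = np.zeros(n_runs)           # n_runs independent trajectories
for gk in gamma:
    w = sigma_w * rng.standard_normal(n_runs)   # martingale-difference noise
    x = x - gk * A * x + gk * w

scaled = x / np.sqrt(gamma[-1])                 # (x_n - x*) / sqrt(gamma_n)
gauss = np.sqrt(sigma_w**2 / (2 * A)) * rng.standard_normal(n_runs)

# Empirical W1 distance between two equal-size samples equals the mean
# absolute difference of the sorted samples.
w1 = np.mean(np.abs(np.sort(scaled) - np.sort(gauss)))
print(f"empirical W1(scaled error, OU stationary Gaussian) ~ {w1:.4f}")
```

Increasing the horizon $n$ (hence decreasing $\gamma_n$) should shrink the reported distance, in line with the $\gamma_n^{1/6}$ rate, although the empirical $W_1$ estimator itself carries sampling error of order $n_{\text{runs}}^{-1/2}$.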

The setting assumes a diminishing step size $\gamma_k = \gamma_1 k^{-a}$ with $a \in (0,1]$. The drift function $f$ and its Jacobian are globally Lipschitz, and the algorithm is driven by two sources of noise: (i) a Markov chain $\{\xi_k\}$ satisfying a geometric drift condition ($\psi$-irreducibility, aperiodicity, and a Lyapunov function $V$), and (ii) an exogenous martingale-difference sequence $\{W_k\}$ independent of the chain. Under these conditions a Lyapunov matrix $Q$ is defined via a Lyapunov equation, yielding a weighted norm $\|x\|_Q = \sqrt{x^\top Q x}$ under which the flow of the Hurwitz-stable linearized mean field $-\bar{A}$ is contractive.
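For readers unfamiliar with this construction, the display below sketches the standard Lyapunov argument for a Hurwitz matrix $-\bar{A}$; the paper's exact normalization of $Q$ may differ (the identity right-hand side is our assumption for illustration).

```latex
% Standard Lyapunov construction for a Hurwitz matrix -\bar{A};
% the normalization (right-hand side I) is assumed for illustration.
\[
  Q \;=\; \int_0^\infty e^{-t\bar{A}^{\top}} e^{-t\bar{A}} \,\mathrm{d}t
  \qquad\Longrightarrow\qquad
  \bar{A}^{\top} Q + Q \bar{A} \;=\; I, \quad Q \succ 0 .
\]
% Along the linearized flow \dot{x} = -\bar{A} x, the weighted norm decays:
\[
  \frac{\mathrm{d}}{\mathrm{d}t}\, x(t)^{\top} Q\, x(t)
  \;=\; -\, x(t)^{\top}\!\left(\bar{A}^{\top} Q + Q \bar{A}\right) x(t)
  \;=\; -\,\|x(t)\|_2^2
  \;\le\; -\,\lambda_{\max}(Q)^{-1}\, x(t)^{\top} Q\, x(t) .
\]
```

This exponential decay in the $Q$-weighted norm is what lets the discrete recursion be compared to the stable OU dynamics even though $-\bar{A}$ need not be symmetric.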

A key technical device is the solution of Poisson equations for the Markov noise, which decomposes the chain-driven noise into a martingale-difference sequence plus a telescoping remainder; the martingale part supplies the Gaussian fluctuations, while the remainder is shown to be negligible at the scale of the analysis.
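To spell out this step (with standard notation assumed here, not necessarily the paper's): let $P$ denote the transition kernel of $\{\xi_k\}$ with stationary distribution $\pi$, and let $\hat{g}$ solve the Poisson equation for a $\pi$-centered noise function $g$.

```latex
% Standard Poisson-equation decomposition of Markovian noise
% (notation assumed for illustration; the paper's bookkeeping with the
% step sizes \gamma_k is more involved).
\[
  \hat{g} - P\hat{g} \;=\; g, \qquad \pi(g) = 0,
\]
\[
  g(\xi_k)
  \;=\;
  \underbrace{\hat{g}(\xi_k) - P\hat{g}(\xi_{k-1})}_{\text{martingale difference}}
  \;+\;
  \underbrace{P\hat{g}(\xi_{k-1}) - P\hat{g}(\xi_k)}_{\text{telescoping remainder}} .
\]
% The first term has zero conditional mean, since
% E[ \hat{g}(\xi_k) | \mathcal{F}_{k-1} ] = P\hat{g}(\xi_{k-1});
% the second telescopes when summed, leaving boundary terms to control.
```

Under the geometric drift condition, the Poisson solution $\hat{g}$ exists and inherits growth bounds from the Lyapunov function $V$, which is what keeps the remainder terms tractable.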

