Bregman Stochastic Proximal Point Algorithm with Variance Reduction
Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

Stochastic algorithms, especially stochastic gradient descent (SGD), have proven to be the go-to methods in data science and machine learning. In recent years, the stochastic proximal point algorithm (SPPA) emerged and was shown to be more robust than SGD with respect to stepsize settings. However, SPPA still suffers from a slower convergence rate due to the need for vanishing stepsizes, which can be resolved by variance reduction methods. In the deterministic setting, many problems can be solved more efficiently when viewed in a non-Euclidean geometry through Bregman distances. This paper combines these two worlds and proposes variance reduction techniques for the Bregman stochastic proximal point algorithm (BSPPA). As special cases, we obtain SAGA- and SVRG-like variance reduction techniques for BSPPA. Our theoretical and numerical results demonstrate improved stability and convergence rates compared to the vanilla BSPPA with constant and vanishing stepsizes, respectively. Our analysis also allows us to recover the same variance reduction techniques for Bregman SGD in a unified way.


💡 Research Summary

This paper addresses the limitations of stochastic optimization methods that rely on Euclidean geometry and suffer from high variance, particularly stochastic gradient descent (SGD) and the stochastic proximal point algorithm (SPPA). While SPPA improves robustness to stepsize selection by using proximal operators instead of gradients, it still requires diminishing stepsizes to guarantee convergence because the stochastic estimator of the proximal operator retains variance. In parallel, variance‑reduction techniques such as SAGA and SVRG have been shown to eliminate the need for vanishing stepsizes in Euclidean settings, achieving faster sublinear or linear convergence rates.

The authors propose a unified framework that brings together two powerful ideas: (i) the use of Bregman distances, which capture problem‑specific geometry (e.g., KL‑divergence, entropy, or other non‑Euclidean measures), and (ii) generic variance‑reduction mechanisms that can be instantiated as SAGA‑like, SVRG‑like, or L‑SVRG‑like schemes. The resulting algorithm, called the Bregman Stochastic Proximal Point Algorithm with Variance Reduction (BSPPA‑VR), is presented as Algorithm 4.1. It augments the vanilla BSPPA update with a perturbation vector (e_k) that encodes the variance‑reduction correction. By defining a virtual explicit iterate (z_{k+1}) and a virtual gradient‑like term (v_k = g_k - e_k), the authors are able to write the update in a form that mirrors explicit Bregman gradient steps, which greatly simplifies the analysis.
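To make the SAGA-like instantiation concrete, here is a minimal sketch. It is not the paper's Algorithm 4.1: for illustration we fix the Euclidean kernel h(x) = ½‖x‖² (the simplest Bregman geometry), use least-squares components f_i(x) = ½(aᵢᵀx − bᵢ)² so the Bregman proximal subproblem has a closed form, and all function and variable names are ours. The stored-gradient table and its running average play the role of the perturbation vector (e_k).

```python
import numpy as np

def bregman_prox_quadratic(x0, a, b, lin, gamma):
    """Solve argmin_x 0.5*(a@x - b)**2 + lin@x + (1/(2*gamma))*||x - x0||^2.

    With the Euclidean kernel, D_h is the squared distance, so this is an
    ordinary proximal step; closed form via the Sherman-Morrison formula.
    """
    y = x0 + gamma * (b * a - lin)
    return y - gamma * a * (a @ y) / (1.0 + gamma * (a @ a))

def saga_bsppa(A, b, gamma=0.02, iters=40000, seed=0):
    """SAGA-like variance-reduced stochastic proximal point for (1/2n)*||Ax-b||^2."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    table = np.zeros((n, d))          # stored component gradients alpha_i
    avg = table.mean(axis=0)          # running average of the table
    for _ in range(iters):
        i = rng.integers(n)
        # Perturbation e_k = alpha_i - avg: the proximal subproblem of
        # f_i(x) - <e_k, x> adds the linear term (avg - alpha_i) @ x.
        x = bregman_prox_quadratic(x, A[i], b[i], avg - table[i], gamma)
        g_new = A[i] * (A[i] @ x - b[i])   # fresh component gradient at the new iterate
        avg += (g_new - table[i]) / n      # O(d) update of the table average
        table[i] = g_new
    return x
```

The conditional mean of the correction (avg − alpha_i) is zero under uniform sampling, so the step is unbiased in the sense required by the analysis, while its variance shrinks as the table entries approach the gradients at the solution.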

A set of abstract assumptions (Assumption 4.5) is introduced to capture the statistical properties required for any variance‑reduction scheme: unbiasedness of the correction, an expected Bregman‑smoothness bound on the distance between successive iterates, and a recursion controlling the evolution of a variance proxy (\sigma_k). These assumptions are later verified for each concrete algorithm.
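Schematically, such abstract variance-reduction assumptions typically take the following shape (the notation below is illustrative, not necessarily the paper's; it mirrors standard unified variance-reduction analyses and the constants (A_k), (C_k), (\rho) referenced in the convergence theorems):

```latex
% (i) unbiasedness of the corrected direction,
\mathbb{E}\left[v_k \mid \mathcal{F}_k\right] = \nabla F(x_k),
% (ii) an expected bound on its second moment via suboptimality and a variance proxy,
\mathbb{E}\left[\|v_k\|^2 \mid \mathcal{F}_k\right]
  \le A_k\,\bigl(F(x_k) - F_\ast\bigr) + C_k\,\sigma_k^2,
% (iii) a recursion controlling the evolution of the variance proxy,
\mathbb{E}\left[\sigma_{k+1}^2 \mid \mathcal{F}_k\right]
  \le (1-\rho)\,\sigma_k^2 + B_k\,\bigl(F(x_k) - F_\ast\bigr).
```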

Two main convergence theorems are proved. Theorem 4.6 handles the case where the objective (F) is merely convex (relative to the Bregman kernel). Under a modest condition (\rho>0) on the variance‑reduction recursion and a stepsize bound (\alpha_k < 1/(A_k+M_k C_k)), the expected suboptimality decays as (O(1/k)). Theorem 4.7 treats relatively strongly convex objectives (parameter (\mu>0)). It shows a linear (geometric) rate via a Lyapunov recursion of the form (V_{k+1} \le q_k V_k) plus an expectation term that is controlled by the variance‑reduction assumptions, with contraction factor (q_k < 1) under the stepsize bound.
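For contrast with the table-based scheme, an L-SVRG-like instantiation keeps a reference point w that is refreshed with probability p at each step instead of storing per-component gradients. Again this is a hedged, self-contained sketch under Euclidean-kernel, least-squares assumptions of our choosing, not the paper's exact algorithm.

```python
import numpy as np

def prox_ls(x0, a, b, lin, gamma):
    """argmin_x 0.5*(a@x - b)**2 + lin@x + (1/(2*gamma))*||x - x0||^2
    (Euclidean kernel; closed form via Sherman-Morrison)."""
    y = x0 + gamma * (b * a - lin)
    return y - gamma * a * (a @ y) / (1.0 + gamma * (a @ a))

def lsvrg_bsppa(A, b, gamma=0.02, p=0.2, iters=40000, seed=0):
    """L-SVRG-like variance-reduced stochastic proximal point for (1/2n)*||Ax-b||^2."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    w = x.copy()                              # reference point
    full_grad = A.T @ (A @ w - b) / n         # gradient of F at the reference
    for _ in range(iters):
        i = rng.integers(n)
        # Correction term grad F(w) - grad f_i(w); zero-mean under uniform sampling.
        lin = full_grad - A[i] * (A[i] @ w - b[i])
        x = prox_ls(x, A[i], b[i], lin, gamma)
        if rng.random() < p:                  # loopless (random) refresh of w
            w = x.copy()
            full_grad = A.T @ (A @ w - b) / n
    return x
```

The loopless refresh avoids SVRG's inner/outer loop structure while keeping the same variance-reduction effect: as w approaches the solution, the correction drives the step variance to zero, which is what permits a constant stepsize.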

