Convergence Analysis of Randomized Subspace Normalized SGD under Heavy-Tailed Noise
Randomized subspace methods reduce per-iteration cost; however, in nonconvex optimization, most analyses are expectation-based, and high-probability bounds remain scarce even under sub-Gaussian noise. We first prove that randomized subspace SGD (RS-SGD) admits a high-probability convergence bound under sub-Gaussian noise, achieving the same order of oracle complexity as prior in-expectation results. Motivated by the prevalence of heavy-tailed gradients in modern machine learning, we then propose randomized subspace normalized SGD (RS-NSGD), which integrates direction normalization into subspace updates. Assuming the noise has bounded $p$-th moments, we establish both in-expectation and high-probability convergence guarantees, and show that RS-NSGD can achieve better oracle complexity than full-dimensional normalized SGD.
💡 Research Summary
This paper investigates stochastic optimization in high‑dimensional non‑convex settings where each iteration updates the parameters using only a low‑dimensional random subspace. The authors first consider Randomized Subspace Stochastic Gradient Descent (RS‑SGD), which projects the stochastic gradient onto a Haar‑random orthogonal matrix $P_k\in\mathbb{R}^{d\times r}$ (with $r\ll d$) and performs the update $x_{k+1}=x_k-\bar\eta\,P_kP_k^{\top}g_k$. While prior work (Flynn et al., 2020) provided only expectation‑based convergence under a bounded‑variance (BV) assumption, this work establishes a high‑probability convergence guarantee under a sub‑Gaussian noise model (Assumption 2.4). By leveraging concentration for the mini‑batch estimator and properties of Haar matrices (e.g., $\mathbb{E}[P_kP_k^{\top}]=\frac{r}{d}I_d$), the analysis achieves the same order of oracle complexity as the prior in‑expectation results.
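To make the update rule concrete, here is a minimal NumPy sketch of one RS‑SGD step, with an optional normalization flag for the RS‑NSGD variant. The QR‑based Haar sampler and all function names are illustrative assumptions, not code from the paper:

```python
import numpy as np

def haar_columns(d, r, rng):
    """Sample a d x r matrix distributed as the first r columns of a
    Haar-random orthogonal matrix, via QR of a Gaussian matrix with
    sign correction on the diagonal of R."""
    A = rng.standard_normal((d, r))
    Q, R = np.linalg.qr(A)
    Q = Q * np.sign(np.diag(R))  # sign fix so the law is exactly Haar
    return Q

def rs_sgd_step(x, grad_fn, eta, r, rng, normalize=False):
    """One randomized-subspace step: project the stochastic gradient
    onto a fresh r-dimensional Haar-random subspace. With
    normalize=True this sketches the RS-NSGD direction-normalized
    update instead."""
    d = x.shape[0]
    P = haar_columns(d, r, rng)
    g = grad_fn(x)                    # stochastic gradient oracle g_k
    s = P @ (P.T @ g)                 # P_k P_k^T g_k
    if normalize:
        s = s / (np.linalg.norm(s) + 1e-12)  # normalized direction
    return x - eta * s
```

On a noise-free quadratic, repeated steps shrink the iterate because the subspace projection satisfies $\mathbb{E}[P_kP_k^{\top}]=\frac{r}{d}I_d$, so each step contracts the gradient component lying in the sampled subspace.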