On solving large scale polynomial convex problems by randomized first-order algorithms
One of the most attractive recent approaches to processing well-structured large-scale convex optimization problems is based on a smooth convex-concave saddle point reformulation of the problem of interest, followed by solving the resulting problem with a fast First Order saddle point method utilizing smoothness of the saddle point cost function. In this paper, we demonstrate that when the saddle point cost function is polynomial, the precise gradients of the cost function required by deterministic First Order saddle point algorithms, which become prohibitively expensive to compute in the extremely large-scale case, can be replaced with computationally incomparably cheaper unbiased random estimates of the gradients. We show that for large-scale problems with favourable geometry, this randomization accelerates the solution process, progressively so as the sizes of the problem grow. This significantly extends previous results on acceleration by randomization, which, to the best of our knowledge, dealt solely with bilinear saddle point problems. We illustrate our theoretical findings with instructive and encouraging numerical experiments.
💡 Research Summary
The paper addresses the computational bottleneck that arises when solving very large‑scale convex optimization problems whose objective functions are high‑degree polynomials. The authors first reformulate such problems as smooth convex‑concave saddle‑point problems, a paradigm that enables the use of fast first‑order (FO) saddle‑point methods exploiting smoothness. In the deterministic setting, each iteration requires the exact gradient of the saddle‑point cost with respect to both primal and dual variables. For a polynomial of degree d in n variables, computing this gradient entails evaluating O(n^d) monomial terms, which quickly becomes infeasible as n and d grow.
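To make the O(n^d) count concrete: a dense polynomial of degree at most d in n variables has C(n + d, d) = O(n^d) monomials, each of which an exact gradient evaluation must touch. A small illustrative sketch (the helper `num_monomials` is ours, not the paper's):

```python
from math import comb

def num_monomials(n: int, d: int) -> int:
    """Number of monomials in a dense polynomial of degree <= d in n
    variables: C(n + d, d), which grows like n**d for fixed d."""
    return comb(n + d, d)

# The monomial count, and hence the exact-gradient cost, explodes with n.
for n in (10, 1_000, 100_000):
    print(n, num_monomials(n, 3))
```

Even at degree 3, a dense polynomial in 100 000 variables already has on the order of 10^14 monomials, which is why exact gradients become the bottleneck.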
To overcome this, the authors propose replacing the exact gradient by an unbiased stochastic estimator. Because a polynomial is a sum of monomials, one can sample a monomial (or a small batch of monomials) according to a carefully designed probability distribution, compute its exact derivative, and rescale by the inverse sampling probability. The expectation of this random vector equals the true gradient, while the computational cost per iteration is proportional only to the number of sampled monomials, typically a constant or a modestly growing function of d. The paper details how to choose the sampling distribution to minimize estimator variance, showing that variance scales with the magnitude of coefficients and the degree, but remains bounded under reasonable problem geometry.
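A minimal sketch of this construction on a toy degree-3 polynomial, stored as (coefficient, index-tuple) pairs, with a sampling distribution proportional to |c_j| (one common variance-control choice; the paper's actual distribution may differ, and all names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy polynomial in 3 variables:
#   f(x) = 2*x0**2*x1 - x1*x2**2 + 0.5*x0*x1*x2
# Each monomial is (coefficient, tuple of variable indices, repeats allowed).
monomials = [(2.0, (0, 0, 1)), (-1.0, (1, 2, 2)), (0.5, (0, 1, 2))]

# Importance-sampling probabilities proportional to |c_j|.
weights = np.array([abs(c) for c, _ in monomials])
probs = weights / weights.sum()

def monomial_grad(c, idx, x):
    """Exact gradient of the single monomial c * prod_k x[idx[k]]."""
    g = np.zeros_like(x)
    for k in range(len(idx)):
        partial = c
        for m in range(len(idx)):
            if m != k:
                partial *= x[idx[m]]
        g[idx[k]] += partial
    return g

def exact_grad(x):
    return sum(monomial_grad(c, idx, x) for c, idx in monomials)

def stochastic_grad(x):
    # Sample one monomial, differentiate it exactly, rescale by 1/p_j.
    # Then E[g_hat] = sum_j p_j * (grad m_j) / p_j = exact gradient.
    j = rng.choice(len(monomials), p=probs)
    c, idx = monomials[j]
    return monomial_grad(c, idx, x) / probs[j]

x = np.array([1.0, 2.0, -1.0])
est = np.mean([stochastic_grad(x) for _ in range(20_000)], axis=0)
print(exact_grad(x), est)  # the two agree up to Monte Carlo sampling error
```

The per-call cost of `stochastic_grad` depends only on the degree of the sampled monomial, not on the total number of monomials, which is the source of the savings.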
The convergence analysis builds on standard results for smooth convex‑concave saddle‑point algorithms (e.g., Mirror‑Prox, Optimistic Gradient Descent‑Ascent). The authors prove that, under Lipschitz continuity of the gradient, the expected primal‑dual gap after k iterations decays as O(1/√k) when only unbiasedness is assumed, and as O(1/k) when the problem is additionally μ‑strongly convex‑strongly concave. Crucially, the bound contains the variance σ² of the stochastic gradient; when the problem exhibits favorable geometry (low condition number, well‑conditioned Hessian spectra), the variance diminishes with problem size, yielding an overall complexity of O(ε^{-1/2}) instead of the deterministic O(ε^{-1}). This "progressive acceleration" means that the larger the problem, the greater the relative speed‑up over deterministic FO methods.
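A toy illustration of the algorithmic template, here a Euclidean stochastic extragradient step (the simplest instance of Mirror‑Prox), run on a hypothetical two-variable convex‑concave polynomial with exact gradients perturbed by zero-mean noise to mimic an unbiased oracle; this is our own sketch, not an example from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy convex-concave saddle-point cost: f(x, y) = 0.5*x**2 + x*y - 0.5*y**2,
# convex in x, concave in y, with its unique saddle point at (0, 0).
def grad_x(x, y):
    return x + y          # df/dx, descent direction for the convex player

def grad_y(x, y):
    return x - y          # df/dy, ascent direction for the concave player

def noisy(g, sigma=0.5):
    # Unbiased stochastic oracle: exact gradient plus zero-mean noise.
    return g + sigma * rng.standard_normal()

x, y, eta = 5.0, -3.0, 0.1
for _ in range(2000):
    # Extrapolation ("leader") step of the extragradient scheme.
    xh = x - eta * noisy(grad_x(x, y))
    yh = y + eta * noisy(grad_y(x, y))
    # Update step re-evaluates the (noisy) gradients at the extrapolated point.
    x = x - eta * noisy(grad_x(xh, yh))
    y = y + eta * noisy(grad_y(xh, yh))

# With a constant step size the iterates hover in a small noise ball around
# the saddle (0, 0); iterate averaging or decreasing steps would recover the
# O(1/sqrt(k)) expected-gap guarantee quoted above.
print(x, y)
```

The point of the sketch is that the algorithm only ever consumes gradient *estimates*; swapping the Gaussian-noise oracle for the monomial-sampling estimator changes nothing in the loop.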
The contribution is positioned as a significant extension of prior random‑gradient work that was limited to bilinear saddle‑point structures. By handling general polynomial cost functions, the authors open the door to many practical settings: high‑order regularization in machine learning, polynomial constraints in control, and risk measures in finance that are naturally expressed as polynomials.
Empirical validation is provided on three large‑scale testbeds. The first experiment solves a degree‑3 polynomial regression with 100 000 variables, the second tackles a portfolio optimization in which the risk term is a quartic polynomial over 50 000 assets, and the third optimizes a deep‑network hyper‑parameter problem with a fourth‑order regularizer. In all cases, the stochastic FO algorithm reaches the target accuracy ε = 10⁻⁴ between 2 and 5 times faster than its deterministic counterpart, and the speed‑up grows as the problem dimension increases.
In conclusion, the paper delivers (1) a practical unbiased stochastic gradient construction for polynomial saddle‑point problems, (2) rigorous convergence guarantees that quantify how variance and problem geometry affect rates, and (3) convincing numerical evidence that randomization can dramatically accelerate large‑scale convex optimization beyond the bilinear regime. The authors suggest future work on variance‑reduction techniques, adaptive sampling, and extensions to non‑convex polynomial settings, indicating a rich research agenda building on these findings.