Complexity of Projected Gradient Methods for Strongly Convex Optimization with Hölder Continuous Gradient Terms
This paper studies the complexity of projected gradient descent methods for a class of strongly convex constrained optimization problems where the objective function is expressed as a summation of $m$ component functions, each possessing a gradient that is Hölder continuous with an exponent $\alpha_i \in (0, 1]$. Under this formulation, the gradient of the objective function may fail to be globally Hölder continuous, thereby rendering existing complexity results inapplicable to this class of problems. Our theoretical analysis reveals that, in this setting, the complexity of projected gradient methods is determined by $\hat{\alpha} = \min_{i \in \{1, \dotsc, m\}} \alpha_i$. We first prove that, with an appropriately fixed stepsize, the complexity bound for finding an approximate minimizer whose distance to the true minimizer is less than $\varepsilon$ is $O(\log(\varepsilon^{-1})\, \varepsilon^{2(\hat{\alpha} - 1)/(1 + \hat{\alpha})})$, which extends the well-known complexity result for $\hat{\alpha} = 1$. Next we show that the complexity bound can be improved to $O(\log(\varepsilon^{-1})\, \varepsilon^{2(\hat{\alpha} - 1)/(1 + 3\hat{\alpha})})$ if the stepsize is updated by the universal scheme. We illustrate our complexity results with numerical examples arising from elliptic equations with a non-Lipschitz term.
💡 Research Summary
The paper investigates the iteration complexity of projected gradient descent (PGD) methods for a class of strongly convex constrained optimization problems in which the objective function is a finite sum of $m$ component functions, $f(u) = \frac{1}{m} \sum_{i=1}^{m} f_i(u)$. Each component $f_i$ has a gradient that satisfies a Hölder continuity condition with its own exponent $\alpha_i \in (0, 1]$; that is, $\|\nabla f_i(u) - \nabla f_i(v)\| \le L_i \|u - v\|^{\alpha_i}$. Because the exponents may differ, the gradient of the total objective $f$ need not be globally Hölder continuous, and therefore existing complexity results that assume a uniform Lipschitz or Hölder constant cannot be applied.
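To make the Hölder condition concrete, the following sketch numerically checks it for a hypothetical 1-D component of our own choosing (not from the paper): $f(u) = |u|^{1+\alpha}/(1+\alpha)$, whose gradient $g(u) = \operatorname{sign}(u)\,|u|^{\alpha}$ is Hölder continuous with exponent $\alpha$ (with constant at most $2$) but not Lipschitz when $\alpha < 1$, since its slope blows up at the origin.

```python
import random

# Hypothetical component for illustration: f(u) = |u|^(1+alpha)/(1+alpha).
# Its gradient g(u) = sign(u)*|u|^alpha satisfies
#   |g(u) - g(v)| <= L * |u - v|^alpha   with L <= 2,
# i.e. Hölder continuity with exponent alpha, but is not Lipschitz for alpha < 1.
def grad_f(u, alpha):
    return (1.0 if u >= 0 else -1.0) * abs(u) ** alpha

def max_holder_ratio(alpha, n_pairs=10_000, seed=0):
    """Empirically estimate sup |g(u) - g(v)| / |u - v|^alpha over random pairs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_pairs):
        u, v = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if u != v:
            ratio = abs(grad_f(u, alpha) - grad_f(v, alpha)) / abs(u - v) ** alpha
            worst = max(worst, ratio)
    return worst

# The empirical ratio stays bounded (by 2) even though the slope of g is unbounded.
ratio = max_holder_ratio(0.5)
```

Repeating the experiment with the plain Lipschitz ratio (exponent $1$ in the denominator) instead would show the supremum growing without bound as sampled pairs approach $0$, which is exactly why the Hölder framework is needed here.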
The authors introduce the key parameter $\hat{\alpha} = \min_i \alpha_i$, the smallest Hölder exponent among the components, and show that the overall complexity of PGD is governed solely by $\hat{\alpha}$. They develop three algorithmic frameworks:
- Fixed‑step PGD (Algorithm 1).
By choosing a stepsize $\tau = \varepsilon^{2(1 - \hat{\alpha})/(1 + \hat{\alpha})}/M$, where $M$ depends on the individual $\alpha_i$ and $L_i$, they prove that the number of iterations required to obtain an iterate $u_k$ with $\|u_k - u^*\| \le \varepsilon$ is
$$O\bigl(\log(\varepsilon^{-1})\, \varepsilon^{2(\hat{\alpha} - 1)/(1 + \hat{\alpha})}\bigr).$$
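The fixed-step iteration itself is simple: $u_{k+1} = P_C(u_k - \tau \nabla f(u_k))$. Below is a minimal sketch in that spirit on a toy problem of our own construction (not the paper's elliptic-equation experiments, and with a plain hand-picked $\tau$ rather than the $\varepsilon$-dependent stepsize above): minimize $f(u) = \tfrac{1}{2}(u-2)^2 + \tfrac{2}{3}|u|^{3/2}$ over $C = [0,1]$, which is strongly convex with a Hölder ($\hat{\alpha} = 1/2$, non-Lipschitz) gradient term, and has minimizer $u^* = 1$ since $f'(1) = (1-2) + \sqrt{1} = 0$.

```python
# Toy fixed-step projected gradient descent, illustrating the scheme
#   u_{k+1} = P_C(u_k - tau * grad f(u_k))
# on  f(u) = 0.5*(u - 2)**2 + (2/3)*|u|**1.5  over  C = [0, 1].
# The objective and stepsize are our own illustrative choices.

def grad(u):
    # grad f(u) = (u - 2) + sign(u)*|u|^(1/2): strongly convex quadratic part
    # plus a Hölder-continuous (exponent 1/2) term that is not Lipschitz at 0.
    s = 1.0 if u >= 0 else -1.0
    return (u - 2.0) + s * abs(u) ** 0.5

def project(u, lo=0.0, hi=1.0):
    # Euclidean projection onto the box [lo, hi].
    return min(max(u, lo), hi)

def pgd(u0=0.2, tau=0.5, iters=200):
    u = u0
    for _ in range(iters):
        u = project(u - tau * grad(u))
    return u

u_final = pgd()  # approaches the minimizer u* = 1
```

Near $u^* = 1$ the iteration contracts at a linear rate, consistent with the $\log(\varepsilon^{-1})$ factor in the bound; the $\varepsilon^{2(\hat{\alpha}-1)/(1+\hat{\alpha})}$ factor reflects how small the theory requires $\tau$ to be as the target accuracy tightens.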