Iteration Complexity of Randomized Block-Coordinate Descent Methods for Minimizing a Composite Function
In this paper we develop a randomized block-coordinate descent method for minimizing the sum of a smooth and a simple nonsmooth block-separable convex function and prove that it obtains an $\epsilon$-accurate solution with probability at least $1-\rho$ in at most $O(\tfrac{n}{\epsilon} \log \tfrac{1}{\rho})$ iterations, where $n$ is the number of blocks. For strongly convex functions the method converges linearly. This extends recent results of Nesterov [Efficiency of coordinate descent methods on huge-scale optimization problems, CORE Discussion Paper #2010/2], which cover the smooth case, to composite minimization, while at the same time improving the complexity by a factor of 4 and removing $\epsilon$ from the logarithmic term. More importantly, in contrast with the aforementioned work in which the author achieves the results by applying the method to a regularized version of the objective function with an unknown scaling factor, we show that this is not necessary, thus achieving true iteration complexity bounds. In the smooth case we also allow for arbitrary probability vectors and non-Euclidean norms. Finally, we demonstrate numerically that the algorithm is able to solve huge-scale $\ell_1$-regularized least squares and support vector machine problems with a billion variables.
💡 Research Summary
This paper studies the iteration complexity of randomized block‑coordinate descent (RBCD) methods for minimizing a composite convex function of the form
F(x) = f(x) + Ψ(x), x ∈ ℝᴺ,
where f is smooth with block‑wise Lipschitz‑continuous gradients and Ψ is a simple block‑separable convex (possibly nonsmooth) regularizer, such as the ℓ₁ norm, the indicator function of box constraints, or the group‑lasso penalty. The authors propose a generic algorithm (Algorithm 1, RCDC) that at each iteration randomly selects a block i according to a probability vector p and performs a proximal update on that block using the block‑specific Lipschitz constant L_i and a positive‑definite scaling matrix B_i. Two concrete variants are examined in depth: (i) Uniform Coordinate Descent for Composite functions (UCDC), where each block is chosen with equal probability, and (ii) Randomized Coordinate Descent for Smooth functions (RCDS), where Ψ ≡ 0 and arbitrary probabilities are allowed.
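The proximal block update described above can be sketched for one concrete instance, ℓ₁‑regularized least squares F(x) = ½‖Ax − b‖² + λ‖x‖₁, assuming every block is a single coordinate with B_i = 1. This is a minimal pure‑Python illustration under those assumptions, not the authors' implementation; the names `soft_threshold`, `rcdc_lasso` and `objective` are hypothetical. For this f the block Lipschitz constants are L_i = ‖A_{:,i}‖², and the proximal step on coordinate i reduces to soft‑thresholding x_i − ∇_i f(x)/L_i at level λ/L_i. Passing `p=None` samples coordinates uniformly, as in UCDC; the code also accepts a non‑uniform `p`, but per the summary the paper analyzes non‑uniform sampling only for the smooth case (Ψ ≡ 0).

```python
import random


def soft_threshold(z, tau):
    """Proximal operator of tau * |.|: shrink z toward zero by tau."""
    if z > tau:
        return z - tau
    if z < -tau:
        return z + tau
    return 0.0


def rcdc_lasso(A, b, lam, iters, p=None, seed=0):
    """Randomized coordinate descent for F(x) = 0.5*||Ax - b||^2 + lam*||x||_1.

    A is a dense list of m rows with n floats each. Every block is a single
    coordinate with scaling B_i = 1 (an assumption of this sketch). p=None
    samples coordinates uniformly (the UCDC setting); otherwise p is a
    probability vector of length n.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    cols = [[A[r][i] for r in range(m)] for i in range(n)]
    L = [sum(a * a for a in col) for col in cols]  # L_i = ||A[:, i]||^2
    x = [0.0] * n
    res = [-bj for bj in b]  # residual Ax - b, starting from x = 0
    for _ in range(iters):
        i = rng.choices(range(n), weights=p)[0]  # sample block i according to p
        if L[i] == 0.0:
            continue  # zero column: f does not depend on x_i
        g = sum(a * rj for a, rj in zip(cols[i], res))  # i-th partial derivative of f
        new_xi = soft_threshold(x[i] - g / L[i], lam / L[i])  # prox step on block i
        delta = new_xi - x[i]
        if delta != 0.0:
            res = [rj + delta * a for rj, a in zip(res, cols[i])]  # O(m) residual update
            x[i] = new_xi
    return x


def objective(A, b, lam, x):
    """Evaluate F(x) = 0.5*||Ax - b||^2 + lam*||x||_1."""
    res = [sum(a * xi for a, xi in zip(row, x)) - bj for row, bj in zip(A, b)]
    return 0.5 * sum(r * r for r in res) + lam * sum(abs(xi) for xi in x)
```

Keeping the residual Ax − b up to date means each iteration touches only one column of A, so a step costs O(m) for a dense column instead of the O(mn) needed to recompute the full gradient; this cheapness per step is what makes coordinate methods attractive at huge scale.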
The main theoretical contributions are high‑probability (1 − ρ) iteration‑complexity bounds that are explicit in the desired accuracy ε, confidence level ρ, and problem parameters. For the non‑strongly convex composite case, Theorem 4 shows that the number of iterations required to obtain an ε‑accurate solution with probability at least 1 − ρ satisfies
k ≤ 2n · max{R_L²(x⁰), F(x⁰) − F*} · ε⁻¹ · (1 + log(1/ρ)),
where n is the number of blocks, L = (L_1, …, L_n) is the vector of block Lipschitz constants, and R_L(x⁰) is the radius of the initial level set {x : F(x) ≤ F(x⁰)}, i.e. the largest distance from a point of that set to the optimal set, measured in the weighted norm induced by L and the block scaling matrices B_i. In the strongly convex case (μ > 0) a linear convergence rate is obtained (Theorem 7):
k ≤ max{4/μ, μ/(μ − 1)} · n · log
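To get a feel for how the simplified non‑strongly convex bound from Theorem 4 scales, it can be evaluated directly. The sketch below, with the hypothetical name `ucdc_iteration_bound`, only plugs numbers into 2n · max{R², F(x⁰) − F*} · ε⁻¹ · (1 + log(1/ρ)), where R² denotes the squared radius term of that bound; it assumes R² and the initial gap F(x⁰) − F* are known or upper‑bounded, which is rarely the case in practice.

```python
import math


def ucdc_iteration_bound(n, radius_sq, gap0, eps, rho):
    """Evaluate the simplified bound 2n * max{R^2, F(x0) - F*} / eps * (1 + log(1/rho)).

    n         -- number of blocks
    radius_sq -- squared radius term R^2 of the bound (assumed known)
    gap0      -- initial optimality gap F(x0) - F* (assumed known)
    eps       -- target accuracy, > 0
    rho       -- allowed failure probability, in (0, 1)
    """
    if not (eps > 0 and 0 < rho < 1):
        raise ValueError("need eps > 0 and 0 < rho < 1")
    return math.ceil(2 * n * max(radius_sq, gap0) / eps * (1 + math.log(1 / rho)))
```

Because ρ enters only through 1 + log(1/ρ), tightening ρ from 10⁻¹ to 10⁻³ multiplies the bound by (1 + ln 1000)/(1 + ln 10) ≈ 2.4, whereas halving ε doubles it.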