A Linearly Convergent Proximal Gradient Algorithm for Decentralized Optimization
Decentralized optimization is a powerful paradigm that finds applications in engineering and learning design. This work studies decentralized composite optimization problems with non-smooth regularization terms. Most existing gradient-based proximal decentralized methods are known to converge to the optimal solution only at sublinear rates, and it remains unclear whether this family of methods can achieve global linear convergence. To tackle this problem, this work assumes the non-smooth regularization term is common across all networked agents, which is the case for many machine learning problems. Under this condition, we design a proximal gradient decentralized algorithm whose fixed point coincides with the desired minimizer. We then provide a concise proof that establishes its linear convergence. In the absence of the non-smooth term, our analysis technique covers the well-known EXTRA algorithm and provides useful bounds on the convergence rate and step-size.
💡 Research Summary
This paper addresses a long‑standing open problem in decentralized optimization: achieving global linear convergence for proximal‑gradient‑type methods when the objective includes a nonsmooth regularizer. The authors focus on the practically important setting where all agents share the same nonsmooth term R(w) (e.g., a global ℓ₁ penalty, elastic‑net, or indicator of a convex set). Under this assumption, they propose a new algorithm—named Proximal Primal‑Dual Diffusion (P2D2)—that operates solely with local communications and a single vector exchange per iteration.
The problem formulation is a composite consensus optimization: each of K agents holds a smooth local loss Jₖ(w) with Lipschitz‑continuous gradient and the common regularizer R(w). The global objective is
w* = arg min₍w∈ℝᴹ₎ (1/K)∑ₖ Jₖ(w) + R(w).
Strong convexity of the average loss (1/K)∑ₖ Jₖ(w) (parameter ν) and Lipschitz smoothness (parameter δ) are assumed, with 0 < ν ≤ δ.
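As a concrete (hypothetical) instance of this formulation, the sketch below sets up K local least-squares losses with a shared ℓ₁ regularizer and reads off the constants ν and δ as the extreme eigenvalues of the average Hessian; the data, dimensions, and penalty weight are illustrative, not from the paper.

```python
import numpy as np

# Hypothetical toy instance of the shared-regularizer setup: K agents, each
# with a smooth local loss J_k(w) = 1/2 ||X_k w - y_k||^2, plus a common
# l1 regularizer R(w) = rho * ||w||_1.
rng = np.random.default_rng(0)
K, M = 4, 3          # number of agents, dimension of w
rho = 0.1            # l1 penalty weight (illustrative value)

X = [rng.normal(size=(10, M)) for _ in range(K)]
y = [rng.normal(size=10) for _ in range(K)]

def grad_Jk(k, w):
    """Gradient of the k-th smooth local loss."""
    return X[k].T @ (X[k] @ w - y[k])

def global_objective(w):
    """(1/K) sum_k J_k(w) + R(w), the quantity minimized at w*."""
    smooth = sum(0.5 * np.sum((X[k] @ w - y[k]) ** 2) for k in range(K)) / K
    return smooth + rho * np.sum(np.abs(w))

# For quadratic losses, strong convexity nu and smoothness delta of the
# average loss are the extreme eigenvalues of (1/K) sum_k X_k^T X_k.
H = sum(X[k].T @ X[k] for k in range(K)) / K
nu, delta = np.linalg.eigvalsh(H)[[0, -1]]
assert 0 < nu <= delta
```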
To enforce consensus, the authors introduce a symmetric doubly‑stochastic weight matrix A describing the network and define B = ½(I_K − A) ⊗ I_M. The consensus constraint B^{1/2}W = 0 (where W stacks all local copies) is incorporated into an augmented Lagrangian with dual variable Y and penalty μ. The resulting saddle‑point problem leads to the following primal‑dual recursions:
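A small numpy sketch of this construction (the ring topology and Metropolis weights are illustrative choices, not taken from the paper): it builds a symmetric doubly stochastic A, forms B, and checks that B is positive semidefinite and vanishes on consensus vectors 1_K ⊗ w.

```python
import numpy as np

# Illustrative network: a ring of K agents with Metropolis weights, which
# yields a symmetric doubly stochastic combination matrix A.
K, M = 4, 3
A = np.zeros((K, K))
for k in range(K):
    for l in ((k - 1) % K, (k + 1) % K):
        A[k, l] = 1.0 / 3.0          # 1 / (max(deg_k, deg_l) + 1) on a ring
    A[k, k] = 1.0 - A[k].sum()

# B = 1/2 (I_K - A) kron I_M acts blockwise on the stacked vector W.
B = 0.5 * np.kron(np.eye(K) - A, np.eye(M))

# B is PSD and annihilates the consensus subspace: B (1_K kron w) = 0.
w = np.arange(M, dtype=float)
consensus = np.kron(np.ones(K), w)
assert np.allclose(B @ consensus, 0.0)
assert np.all(np.linalg.eigvalsh(B) >= -1e-12)
```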
- Zᵢ = W_{i‑1} – μ∇J_μ(W_{i‑1}) – B^{1/2}Y_{i‑1}
- Yᵢ = Y_{i‑1} + α B^{1/2}Zᵢ
- Wᵢ = prox_{μR}(Zᵢ),
where J_μ(W) = J(W) + (1/(2μ))‖B^{1/2}W‖² and α > 0 is a dual step‑size; with this scaling, μ∇J_μ(W) = μ∇J(W) + BW, so the recursion reduces to EXTRA when R ≡ 0. By initializing Y₀ = 0 and exploiting the structure of B, the authors show that the algorithm can be implemented in a fully decentralized fashion: each node needs only the weighted combination of its neighbors’ current variables (the “communication step”) and a local gradient evaluation.
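The three recursions can be sketched in centralized matrix form as follows. This is a minimal numpy illustration on a hypothetical ℓ₁-regularized least-squares instance, not the paper's implementation: it takes μ∇J_μ(W) = μ∇J(W) + BW (so the primal step is W − μ∇J(W) − BW − B^{1/2}Y, which recovers EXTRA when R ≡ 0), uses soft-thresholding as the shared prox, and picks step sizes purely for illustration. For the demo, B^{1/2} is formed by eigendecomposition; the paper's decentralized implementation avoids this via the Y₀ = 0 initialization and the structure of B.

```python
import numpy as np

rng = np.random.default_rng(1)
K, M = 4, 3
rho, mu, alpha = 0.1, 0.02, 1.0   # illustrative penalty and step sizes

# Local quadratic losses J_k(w) = 1/2 ||X_k w - y_k||^2 (toy data).
X = [rng.normal(size=(10, M)) for _ in range(K)]
y = [rng.normal(size=10) for _ in range(K)]

def grad_J(W):
    """Stacked gradient of J(W) = sum_k J_k(w_k); W has shape (K*M,)."""
    return np.concatenate(
        [X[k].T @ (X[k] @ W[k*M:(k+1)*M] - y[k]) for k in range(K)])

# Ring network with Metropolis weights -> symmetric doubly stochastic A.
A = np.zeros((K, K))
for k in range(K):
    A[k, (k - 1) % K] = A[k, (k + 1) % K] = 1.0 / 3.0
    A[k, k] = 1.0 / 3.0
B = 0.5 * np.kron(np.eye(K) - A, np.eye(M))
ev, U = np.linalg.eigh(B)
Bh = U @ np.diag(np.sqrt(np.clip(ev, 0.0, None))) @ U.T   # B^{1/2}

def prox_l1(v, t):
    """prox of t*||.||_1, i.e. blockwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

W = np.zeros(K * M)
Y = np.zeros(K * M)                 # Y_0 = 0, as in the summary
for _ in range(3000):
    Z = W - mu * grad_J(W) - B @ W - Bh @ Y   # primal step on J_mu
    Y = Y + alpha * (Bh @ Z)                  # dual ascent step
    W = prox_l1(Z, mu * rho)                  # shared prox, applied blockwise

# At the fixed point all K local copies agree (consensus).
blocks = W.reshape(K, M)
assert np.allclose(blocks, blocks.mean(axis=0), atol=1e-4)
```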
A key technical contribution is the proof that the fixed point (W*, Y*, Z*) of these recursions exists, is unique, and satisfies W* = 1_K⊗w* where w* solves the original problem. The proof hinges on the property B^{1/2}Z* = 0, which forces all blocks of Z* to be equal, and on the definition of the proximal operator.
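The equal-blocks argument can be checked numerically: for a connected network with symmetric doubly stochastic A, the null space of B (and hence of B^{1/2}) is exactly the M-dimensional consensus subspace {1_K ⊗ w}, so B^{1/2}Z* = 0 forces the K blocks of Z* to coincide. The ring topology below is an illustrative choice.

```python
import numpy as np

# Illustrative connected network: ring of K agents, Metropolis weights.
K, M = 4, 3
A = np.zeros((K, K))
for k in range(K):
    A[k, (k - 1) % K] = A[k, (k + 1) % K] = 1.0 / 3.0
    A[k, k] = 1.0 / 3.0
B = 0.5 * np.kron(np.eye(K) - A, np.eye(M))

# The zero eigenspace of B has dimension exactly M ...
evals, evecs = np.linalg.eigh(B)
null_basis = evecs[:, evals < 1e-10]
assert null_basis.shape[1] == M

# ... and any vector in it has all K blocks equal, since the consensus
# subspace {1_K kron w} already accounts for those M dimensions.
Z = null_basis @ np.ones(M)
blocks = Z.reshape(K, M)
assert np.allclose(blocks, blocks[0])
```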
The convergence analysis proceeds by defining an error vector e_i that collects the deviations of the primal and dual iterates from the fixed point (W*, Y*), and showing that its norm contracts by a constant factor at every iteration, which yields the claimed global linear convergence.