Achieving Linear Speedup for Composite Federated Learning


This paper proposes FedNMap, a normal map-based method for composite federated learning, where the objective consists of a smooth loss and a possibly nonsmooth regularizer. FedNMap leverages a normal map-based update scheme to handle the nonsmooth term and incorporates a local correction strategy to mitigate the impact of data heterogeneity across clients. Under standard assumptions, including smooth local losses, weak convexity of the regularizer, and bounded stochastic gradient variance, FedNMap achieves linear speedup with respect to both the number of clients $n$ and the number of local updates $Q$ for nonconvex losses, both with and without the Polyak-Łojasiewicz (PL) condition. To our knowledge, this is the first result establishing linear speedup for nonconvex composite federated learning.


💡 Research Summary

The paper introduces FedNMap, a novel federated learning (FL) algorithm designed to handle composite optimization problems where the global objective consists of a smooth loss term f(x) and a possibly nonsmooth regularizer φ(x). Traditional FL methods focus on smooth objectives and achieve linear speedup—i.e., the number of communication rounds needed to reach a target accuracy scales inversely with the product of the number of participating clients n and the number of local updates Q. However, many practical machine‑learning tasks involve nonsmooth regularizers such as ℓ₁ norms, indicator functions of constraint sets, or model‑pruning penalties. Existing FL approaches either assume convexity, require homogeneous data, need bounded subgradients of φ, or fail to provide convergence guarantees for nonconvex composite objectives.

FedNMap addresses these gaps by leveraging a normal-map based update scheme. The normal map is defined as $F^{\mathrm{nor}}_{\gamma}(z) = \nabla f(\operatorname{prox}_{\gamma\varphi}(z)) + \gamma^{-1}\bigl(z - \operatorname{prox}_{\gamma\varphi}(z)\bigr)$. This construction has two crucial properties: (1) $F^{\mathrm{nor}}_{\gamma}(z)$ lies in the subdifferential $\partial\psi(\operatorname{prox}_{\gamma\varphi}(z))$, where $\psi(x) = f(x) + \varphi(x)$; and (2) the stochastic normal-map estimator remains unbiased whenever the stochastic gradient of $f$ is unbiased. Consequently, the bias introduced by the proximal operator in standard Prox-SGD updates is eliminated, even when multiple local steps are performed.
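The normal map is straightforward to compute whenever the proximal operator of $\varphi$ has a closed form. A minimal sketch for the $\ell_1$ case, where $\operatorname{prox}_{\gamma\varphi}$ is soft-thresholding (the function names and the `lam` parameter are illustrative, not from the paper):

```python
import numpy as np

def prox_l1(z, gamma, lam=1.0):
    """Proximal operator of phi(x) = lam * ||x||_1, i.e. soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma * lam, 0.0)

def normal_map(z, grad_f, gamma, lam=1.0):
    """F_gamma^nor(z) = grad_f(prox(z)) + (z - prox(z)) / gamma."""
    x = prox_l1(z, gamma, lam)
    return grad_f(x) + (z - x) / gamma
```

Replacing `grad_f` with an unbiased stochastic gradient leaves the estimator unbiased, since the second term $\gamma^{-1}(z - \operatorname{prox}_{\gamma\varphi}(z))$ is deterministic given $z$.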

The algorithm proceeds as follows. At each communication round $t$, the server broadcasts the global auxiliary variable $z_t$ and the average correction term $\frac{1}{n}\sum_j y_{j,t-1}$. Each client $i$ computes $x_t = \operatorname{prox}_{\gamma\varphi}(z_t)$ and initializes a local correction $c_{i,t}$, updated as $c_{i,t} = c_{i,t-1} - y_{i,t-1} + \frac{1}{n}\sum_j y_{j,t-1}$. The client then performs $Q$ local updates: for $\ell = 0, \dots, Q-1$ it computes $x_{\ell,i,t} = \operatorname{prox}_{\gamma\varphi}(z_{\ell,i,t})$ and updates the auxiliary variable via
$z_{\ell+1,i,t} = z_{\ell,i,t} - \eta_a \cdots$
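The update rule above is cut off after the local step size $\eta_a$. A plausible sketch of one client's local loop, under the assumption (not confirmed by the source) that each step subtracts a stochastic normal-map direction plus the correction term $c_{i,t}$, using the $\ell_1$ prox as a concrete example:

```python
import numpy as np

def local_updates(z, c, grad_f_stoch, gamma, eta_a, Q, lam=1.0):
    """Hypothetical FedNMap-style local loop for one client.

    Assumption: each local step moves z against a stochastic normal-map
    direction plus the correction c. This completion of the truncated
    update rule is a guess, not taken from the paper.
    """
    for _ in range(Q):
        # prox of lam * ||.||_1 (soft-thresholding)
        x = np.sign(z) * np.maximum(np.abs(z) - gamma * lam, 0.0)
        direction = grad_f_stoch(x) + (z - x) / gamma + c
        z = z - eta_a * direction
    return z
```

After the $Q$ local steps, the clients would communicate their final auxiliary variables (and correction statistics $y_{i,t}$) back to the server for averaging, consistent with the broadcast step described above.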

