Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication

We consider decentralized stochastic optimization with the objective function (e.g. data samples for a machine learning task) distributed over $n$ machines that can only communicate with their neighbors on a fixed communication graph. To reduce the communication bottleneck, the nodes compress (e.g. quantize or sparsify) their model updates. We cover both unbiased and biased compression operators with quality denoted by $\omega \leq 1$ ($\omega=1$ meaning no compression). We (i) propose a novel gossip-based stochastic gradient descent algorithm, CHOCO-SGD, that converges at rate $\mathcal{O}\left(1/(nT) + 1/(T \delta^2 \omega)^2\right)$ for strongly convex objectives, where $T$ denotes the number of iterations and $\delta$ the eigengap of the connectivity matrix. Although compression quality and network connectivity affect the higher-order terms, the first term in the rate, $\mathcal{O}(1/(nT))$, is the same as for the centralized baseline with exact communication. We (ii) present a novel gossip algorithm, CHOCO-GOSSIP, for the average consensus problem that converges in time $\mathcal{O}(1/(\delta^2\omega) \log (1/\epsilon))$ for accuracy $\epsilon > 0$. To the best of our knowledge, this is the first gossip algorithm that supports arbitrary compressed messages for $\omega > 0$ and still exhibits linear convergence. We (iii) show in experiments that both of our algorithms outperform the respective state-of-the-art baselines and that CHOCO-SGD can reduce communication by at least two orders of magnitude.


💡 Research Summary

This paper tackles the problem of decentralized stochastic optimization in a setting where n machines hold disjoint data and can only exchange information with immediate neighbors on a fixed communication graph. To alleviate the communication bottleneck that is inherent in such distributed systems, the authors allow each node to compress its model updates before transmission, using either quantization, sparsification, or any other compression operator Q:ℝᵈ→ℝᵈ. The quality of the compressor is captured by a scalar ω ∈ (0, 1] with ω = 1 corresponding to exact (uncompressed) communication; smaller ω means higher compression but larger distortion. Importantly, the analysis covers both unbiased and biased compressors, extending beyond prior work that typically assumes unbiasedness or very high precision.
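As an illustration of such an operator, a top-k sparsifier (one of the biased compressors covered by the analysis) can be sketched as follows; in the paper's notation it satisfies ‖Q(x) − x‖² ≤ (1 − ω)‖x‖² with quality ω = k/d. The function name and shapes here are illustrative, not from the paper.

```python
import numpy as np

def top_k(x: np.ndarray, k: int) -> np.ndarray:
    """Biased top-k sparsifier Q: keep the k largest-magnitude entries of x.

    Satisfies ||Q(x) - x||^2 <= (1 - k/d) * ||x||^2, i.e. compression
    quality omega = k/d in the paper's notation (omega = 1: no compression).
    """
    q = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]  # indices of the k largest |x_i|
    q[idx] = x[idx]
    return q
```

Only the k selected (index, value) pairs need to be transmitted, so the per-message cost drops from d floats to k values plus k indices.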

The main contributions are two algorithms:

  1. CHOCO‑SGD (Compressed Gossip Stochastic Gradient Descent).
    • Each node performs a local stochastic gradient step on its own objective fᵢ, then computes the difference between its current model xᵢᵗ and a locally stored “compressed estimate” x̂ᵢᵗ. This difference is compressed with Q and sent to neighbors.
    • Neighbors aggregate the received compressed differences, update their own estimates, and continue.
    • The algorithm preserves the global average of the iterates while gradually reducing compression error.
    • For strongly convex objectives, the authors prove a convergence rate of $\mathcal{O}\left(1/(nT) + 1/(T \delta^2 \omega)^2\right)$, whose leading term $\mathcal{O}(1/(nT))$ matches the centralized baseline with exact communication.
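The per-node steps above can be put together in a small single-process simulation. The sketch below is illustrative only: it assumes quadratic local objectives fᵢ(x) = ½‖x − bᵢ‖², a ring of n = 4 nodes with a doubly stochastic mixing matrix W, a top-k sparsifier as the compressor Q, and hand-picked step sizes η (gradient) and γ (consensus); none of these concrete choices come from the paper.

```python
import numpy as np

def top_k(x, k):
    # Biased top-k sparsifier Q: keep the k largest-magnitude entries.
    q = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    q[idx] = x[idx]
    return q

def choco_sgd(b, W, eta=0.005, gamma=0.05, k=1, steps=4000, seed=0):
    """Single-process sketch of CHOCO-SGD on f_i(x) = 0.5 * ||x - b[i]||^2.

    b: (n, d) per-node targets; the global minimizer is b.mean(axis=0).
    W: (n, n) symmetric, doubly stochastic gossip (mixing) matrix.
    """
    rng = np.random.default_rng(seed)
    n, d = b.shape
    x = np.zeros((n, d))       # local models x_i^t
    x_hat = np.zeros((n, d))   # publicly shared compressed estimates x̂_i^t
    for _ in range(steps):
        # (1) local stochastic gradient step on each node's own objective
        grad = (x - b) + 0.01 * rng.standard_normal((n, d))
        x = x - eta * grad
        # (2) compress the difference x_i - x̂_i and "send" it to neighbors
        q = np.stack([top_k(x[i] - x_hat[i], k) for i in range(n)])
        # (3) all nodes update the estimates x̂_j from the received messages
        x_hat = x_hat + q
        # (4) gossip consensus step using only the compressed estimates
        x = x + gamma * (W @ x_hat - x_hat)
    return x

# Demo: ring of 4 nodes, self-weight 1/2, each of the two neighbors 1/4.
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])
b = np.array([[2.0, 1.0], [3.0, 0.0], [1.0, 2.0], [2.0, 1.0]])
x_final = choco_sgd(b, W)
```

Because W is doubly stochastic, the gossip step preserves the global average of the iterates, as noted above; with the constant step size η used here each node ends up in a small neighborhood of the minimizer b.mean(axis=0), while a decaying η_t would give exact convergence.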
