ROSS: RObust decentralized Stochastic learning based on Shapley values

Notice: This research summary and analysis were automatically generated using AI. For complete accuracy, please refer to the original arXiv source.

In decentralized learning, a group of agents collaborates to train a global model over a distributed dataset without a central server. This paradigm is severely challenged by heterogeneity in the data distribution across agents: the data may be non-independent and identically distributed (non-IID), and may even be noisy or poisoned. To address these challenges, we propose ROSS, a novel robust decentralized stochastic learning algorithm based on Shapley values. Specifically, in each round, each agent aggregates the cross-gradient information from its neighbors, i.e., the derivatives of its local model with respect to its neighbors' datasets, and updates its local model in a momentum-like manner; our innovation is to weight these derivatives according to their contributions as measured by Shapley values. We provide a rigorous theoretical analysis establishing a linear convergence speedup for ROSS, and verify its efficacy through extensive experiments on public datasets. The results demonstrate that, in the face of the above data challenges, ROSS offers significant advantages over existing state-of-the-art proposals in both convergence and prediction accuracy.


💡 Research Summary

The paper tackles a fundamental challenge in fully decentralized machine learning: how to train a global model when agents hold heterogeneous, noisy, or even poisoned data, and there is no central server to coordinate aggregation. The authors propose ROSS (Robust decentralized Stochastic learning based on Shapley values), a novel algorithm that combines cross‑gradient information with Shapley‑value‑based weighting to achieve both robustness and fast convergence.

In each communication round, every agent i computes its own stochastic gradient gᵢᵢ on its local data and broadcasts its current model parameters xᵢ to its neighbors. Upon receiving a neighbor’s model xⱼ, agent i evaluates a cross‑gradient gᵢⱼ, i.e., the gradient of its local loss evaluated at the neighbor’s model. These gradients are sent back to the originating neighbor, so each agent eventually possesses its own gradient and the set of cross‑gradients from all adjacent agents.
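The exchange above can be sketched in a few lines. The toy setup below (quadratic local losses, a fully connected topology, and all variable names) is an illustrative assumption, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_agents = 3, 4

# Hypothetical quadratic local losses f_i(x) = 0.5 * ||A_i x - b_i||^2.
A = [rng.standard_normal((5, dim)) for _ in range(n_agents)]
b = [rng.standard_normal(5) for _ in range(n_agents)]
x = [rng.standard_normal(dim) for _ in range(n_agents)]  # local models

def local_grad(i, model):
    """Gradient of agent i's local loss, evaluated at an arbitrary model."""
    return A[i].T @ (A[i] @ model - b[i])

# Toy topology: every other agent is a neighbor.
neighbors = {i: [j for j in range(n_agents) if j != i] for i in range(n_agents)}

# Agent i evaluates g[i][j] = grad of f_i at the received model x_j
# (plus g[i][i], its ordinary local stochastic gradient).
g = {i: {j: local_grad(i, x[j]) for j in neighbors[i] + [i]}
     for i in range(n_agents)}

# Cross-gradients are sent back to the originating neighbor, so agent i
# ends up holding g[j][i] for every neighbor j, alongside its own g[i][i].
held_by = {i: {j: g[j][i] for j in neighbors[i]} | {i: g[i][i]}
           for i in range(n_agents)}
```

Note that each cross-gradient g[j][i] is computed by neighbor j on its own data, so agent i learns how its model performs on data it never sees.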

To decide how much each gradient should influence the update, the algorithm estimates Shapley values φᵢⱼ locally. Each agent holds a small validation subset Qᵢ sampled from the global data distribution; using Qᵢ, it measures the marginal contribution of each neighbor’s cross‑gradient to the reduction of the global loss. The Shapley value, defined as the average marginal contribution over all possible coalitions, provides a principled, fair weight that captures both individual and synergistic effects among agents.
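The weighting scheme can be made concrete with the textbook Shapley formula. The sketch below enumerates all coalitions exactly (feasible only for small neighborhoods); the generic coalition value function is a stand-in for the paper's validation-loss-reduction measure on Qᵢ:

```python
import itertools
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values: phi[p] is the weighted average marginal
    contribution of p over all coalitions S of the remaining players."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for r in range(n):
            for combo in itertools.combinations(others, r):
                S = frozenset(combo)
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[p] += w * (value(S | {p}) - value(S))
    return phi

# Toy additive value function: the utility each coalition of neighbors
# contributes (in ROSS this would be the measured drop in validation
# loss when the coalition's cross-gradients are applied).
gain = {0: 0.5, 1: 0.3, 2: 0.2}
phi = shapley_values([0, 1, 2], lambda S: sum(gain[p] for p in S))
# For an additive game each Shapley value equals the player's own gain,
# and the values sum to the grand coalition's total (efficiency).
```

Exact enumeration costs O(2ⁿ) evaluations, which is why practical schemes typically fall back to Monte Carlo sampling over permutations when neighborhoods are large.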

The model update follows a momentum-like rule: each agent first averages the models in its neighborhood with mixing weights ω_{ij}, then takes a descent step along a momentum buffer vᵢ that accumulates the Shapley-weighted gradients it holds:

xᵢ^{t} = Σ_{j∈Nᵢ} ω_{ij}·xⱼ^{t−1} − η·vᵢ^{t},  with  vᵢ^{t} = β·vᵢ^{t−1} + (1−β)·Σ_{j∈Nᵢ∪{i}} φ_{ij}·g_{ji}^{t−1},

where η is the step size and β the momentum coefficient.
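A minimal numerical sketch of one such momentum-like step from a single agent's perspective follows; the mixing weights, momentum coefficient β, step size η, and function name are illustrative assumptions, not the paper's exact recursion:

```python
import numpy as np

def ross_style_step(x, v_i, grads_i, phi_i, w_i, eta=0.1, beta=0.9):
    """One hypothetical update for agent i: consensus-average the
    neighborhood models with mixing weights w_i, then descend along a
    momentum buffer fed by the Shapley-weighted gradients grads_i."""
    direction = sum(phi_i[j] * grads_i[j] for j in grads_i)
    v_new = beta * v_i + (1 - beta) * direction   # momentum accumulation
    x_new = sum(w_i[j] * x[j] for j in range(len(x))) - eta * v_new
    return x_new, v_new

# Two-agent toy example with scalar models.
x = [np.array([1.0]), np.array([3.0])]
x_next, v_next = ross_style_step(
    x,
    v_i=np.zeros(1),
    grads_i={0: np.array([1.0]), 1: np.array([1.0])},
    phi_i={0: 0.5, 1: 0.5},
    w_i=[0.5, 0.5],
)
# x_next = 0.5*1.0 + 0.5*3.0 - 0.1 * (0.1*1.0) = 1.99
```

The intuition is that the consensus average keeps the agents' models from drifting apart, while the Shapley weights φ_{ij} shrink the influence of gradients from noisy or poisoned neighbors before they enter the momentum buffer.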

