Dynamic Topology Optimization for Non-IID Data in Decentralized Learning
Decentralized learning (DL) enables a set of nodes to train a model collaboratively without central coordination, offering benefits for privacy and scalability. However, DL struggles to train a high-accuracy model when the data distribution is non-independent and identically distributed (non-IID) and the communication topology is static. To address these issues, we propose Morph, a topology optimization algorithm for DL. In Morph, nodes adaptively choose peers for model exchange based on maximum model dissimilarity. Morph maintains a fixed in-degree while dynamically reshaping the communication graph through gossip-based peer discovery and diversity-driven neighbor selection, thereby improving robustness to data heterogeneity. Experiments on CIFAR-10 and FEMNIST with up to 100 nodes show that Morph consistently outperforms static and epidemic baselines, while closely tracking the fully connected upper bound. On CIFAR-10, Morph achieves a relative improvement of 1.12x in test accuracy compared to the state-of-the-art baselines. On FEMNIST, Morph achieves an accuracy that is 1.08x higher than Epidemic Learning. Similar trends hold for 50-node deployments, where Morph narrows the gap to the fully connected upper bound to within 0.5 percentage points on CIFAR-10. These results demonstrate that Morph achieves higher final accuracy, faster convergence, and more stable learning as quantified by lower inter-node variance, while requiring fewer communication rounds than baselines and no global knowledge.
💡 Research Summary
The paper addresses a fundamental challenge in decentralized learning (DL): when data across nodes is non‑independent and identically distributed (non‑IID) and the communication graph is static, the system struggles to achieve high‑accuracy models. To overcome this, the authors propose Morph, a fully decentralized topology‑optimization algorithm that dynamically reshapes the peer‑to‑peer communication graph based on model dissimilarity while keeping each node’s in‑degree fixed.
Core Mechanism
Morph measures the “distance” between models using cosine similarity. For each layer the parameter vectors are normalized, the cosine of the angle between two nodes’ weight vectors is computed, and the per‑layer scores are averaged, yielding a scale‑invariant similarity metric that can be computed locally with negligible communication overhead. When a node does not have direct access to another peer’s model, it estimates similarity transitively through intermediate peers using the angular inequality (a quasi‑transitive bound).
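The per-layer averaging described above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function name and the representation of a model as a list of per-layer weight arrays are our assumptions.

```python
import numpy as np

def model_similarity(layers_a, layers_b):
    """Average per-layer cosine similarity between two models.

    `layers_a` / `layers_b` are lists of per-layer weight arrays
    (an assumed representation; the paper does not fix an API).
    """
    scores = []
    for wa, wb in zip(layers_a, layers_b):
        va, vb = wa.ravel(), wb.ravel()
        # Normalizing by the vector norms makes the metric scale-invariant.
        cos = np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb))
        scores.append(cos)
    return float(np.mean(scores))
```

Because only the local model and the received peer models are needed, this score can be computed without any extra communication beyond the model exchange itself.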
Every Δr rounds, each node updates its set of desired senders. It builds a candidate pool C_A of known peers and a full pool C of all peers it has learned about (through gossip). Using a soft‑max distribution p_j ∝ exp(−β·sim(w_i,w_j)), the node samples k peers sequentially, favoring those with the lowest similarity (i.e., highest dissimilarity). The sampled peers become the node’s incoming connections; because the in‑degree is fixed, every node receives exactly k models each round, preventing isolation and over‑fitting.
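The sequential soft-max sampling step might look like the sketch below, assuming similarities are held in a dict keyed by peer id (the function name and the `rng` parameter are our additions for illustration).

```python
import numpy as np

def sample_senders(similarities, k, beta, rng=None):
    """Sample k incoming peers, favoring the most dissimilar models.

    `similarities` maps peer id -> sim(w_i, w_j); `beta` is the
    temperature from p_j ∝ exp(-beta * sim(w_i, w_j)). Peers are drawn
    sequentially without replacement.
    """
    rng = rng or np.random.default_rng()
    peers = list(similarities)
    chosen = []
    for _ in range(min(k, len(peers))):
        logits = np.array([-beta * similarities[p] for p in peers])
        logits -= logits.max()          # shift for numerical stability
        probs = np.exp(logits)
        probs /= probs.sum()
        pick = rng.choice(len(peers), p=probs)
        chosen.append(peers.pop(pick))
    return chosen
```

With a large `beta` the draw concentrates on the lowest-similarity peers; with `beta = 0` it degenerates to uniform random selection, which recovers epidemic-style peer sampling.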
To keep the out‑degree bounded, Morph adopts a matching process analogous to the college‑admissions problem: a node accepts incoming connection requests up to k; if already full, it replaces the least dissimilar existing connection with a more dissimilar request. This ensures that each node also sends its model to at most k peers, and the matching converges in at most ⌈(n‑1)/k⌉ steps.
Preserving Global Connectivity
Purely similarity‑driven selection could fragment the network into tightly knit clusters, harming global mixing. Morph therefore injects a small random component: after selecting k dissimilar peers (C_b), it uniformly samples a set R of s − k additional peers from the unknown portion of C, forming the final neighbor view V = C_b ∪ R. This two‑step peer‑sampling protocol guarantees that the overall graph remains connected with only O(log n) extra messages per node per round, preserving scalability.
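The two-step view construction amounts to concatenating the dissimilarity-driven picks with a uniform sample from the rest of the gossip-discovered pool. A minimal sketch, with assumed names:

```python
import random

def build_view(dissimilar_peers, known_pool, s, rng=None):
    """Form the neighbor view V = C_b ∪ R.

    `dissimilar_peers` is C_b (the k peers chosen for dissimilarity);
    the remaining s - k slots are filled uniformly at random from the
    rest of the pool C, which keeps the global graph well mixed.
    """
    rng = rng or random.Random()
    remainder = [p for p in known_pool if p not in dissimilar_peers]
    n_extra = min(s - len(dissimilar_peers), len(remainder))
    return list(dissimilar_peers) + rng.sample(remainder, n_extra)
```

The uniform component plays the same role as the random peer sampling in epidemic protocols: even if the dissimilarity scores cluster, every pair of nodes retains a nonzero chance of connecting.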
Experimental Evaluation
The authors evaluate Morph on CIFAR‑10 and FEMNIST under realistic non‑IID partitions generated by a Dirichlet distribution. Experiments are conducted with 50 and 100 nodes. Baselines include static graph D‑PSGD, Epidemic Learning (random peer selection), and a fully connected “upper bound”. Results show that Morph consistently outperforms the baselines: on CIFAR‑10 it achieves a 1.12× relative improvement in test accuracy, and on FEMNIST a 1.08× improvement over Epidemic Learning. With 100 nodes, Morph narrows the gap to the fully connected upper bound to within 0.5 percentage points, reduces the number of communication rounds needed for convergence by roughly 15‑20 %, and yields lower inter‑node variance, indicating more stable and fair learning across the network.
Limitations and Future Work
The paper acknowledges that cosine similarity estimates can accumulate error when inferred transitively, and that the hyper‑parameters β and Δr require careful tuning. Moreover, the current design assumes honest peers; malicious actors could manipulate similarity reports. Future directions suggested include more robust similarity estimators (e.g., learned embeddings), adaptive β schedules, and Byzantine‑resilient peer‑selection mechanisms.
Conclusion
Morph demonstrates that dynamic, locally‑driven topology adaptation based on model dissimilarity can substantially mitigate the adverse effects of non‑IID data in decentralized learning. By keeping a fixed in‑degree, blending similarity‑driven and random peer selections, and operating without any global coordinator or global knowledge, Morph achieves higher final accuracy, faster convergence, and more uniform performance across nodes while maintaining low communication overhead. This makes it a practical and scalable solution for privacy‑preserving, large‑scale edge learning scenarios.