Decentralised Resource Sharing in TinyML: Wireless Bilayer Gossip Parallel SGD for Collaborative Learning

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

With the growing computational capabilities of microcontroller units (MCUs), edge devices can now support machine learning models. However, deploying decentralised federated learning (DFL) on such devices presents key challenges, including intermittent connectivity, limited communication range, and dynamic network topologies. This paper proposes a novel framework, bilayer Gossip Decentralised Parallel Stochastic Gradient Descent (GD PSGD), designed to address these issues in resource-constrained environments. The framework incorporates a hierarchical communication structure using Distributed K-means (DK-means) clustering for geographic grouping and a gossip protocol for efficient model aggregation across two layers: intra-cluster and inter-cluster. We evaluate the framework’s performance against the Centralised Federated Learning (CFL) baseline using the MCUNet model on the CIFAR-10 dataset under IID and Non-IID conditions. Results demonstrate that the proposed method achieves accuracy comparable to CFL on IID datasets, requiring only 1.8 additional rounds to converge. On Non-IID datasets, the accuracy loss remains under 8% for moderate data imbalance. These findings highlight the framework’s potential to support scalable and privacy-preserving learning on edge devices with minimal performance trade-offs.


💡 Research Summary

The paper addresses the emerging need to run federated learning (FL) on ultra‑low‑power microcontroller‑based edge devices (TinyML). Traditional centralised FL (CFL) relies on a server that aggregates model updates, which creates a single point of failure, high latency, and bandwidth bottlenecks—problems that are especially acute for devices with limited memory, compute, and intermittent wireless connectivity. To overcome these limitations, the authors propose a novel framework called bilayer Gossip Decentralised Parallel Stochastic Gradient Descent (GD‑PSGD).

The framework consists of two complementary components. First, devices are grouped into geographic clusters using a Distributed K‑means (DK‑means) algorithm that runs locally on each MCU. DK‑means requires only each node’s location or signal‑strength information, allowing clusters to form and adapt dynamically as devices move or join/leave the network. This hierarchical organization reduces the average communication distance and limits the number of peers each device must contact.

Second, model aggregation is performed through a two‑layer gossip protocol. Within each cluster, nodes exchange quantized model parameters with randomly selected neighbors and compute a weighted average, effectively implementing an intra‑cluster gossip similar to standard decentralized SGD. At the inter‑cluster level, a designated leader from each cluster participates in a second gossip round, sharing the locally aggregated models with leaders of other clusters. By separating intra‑ and inter‑cluster communication, the overall communication complexity drops from O(N) (full mesh) to roughly O(√N), while still guaranteeing convergence without a central coordinator.

From an algorithmic standpoint, GD‑PSGD inherits the convergence properties of Decentralised Parallel SGD (D‑PSGD) but improves the spectral gap of the underlying communication graph through the bilayer design. This larger spectral gap translates into faster convergence rates and mitigates the bias that typically arises in Non‑IID data distributions.

The authors evaluate the framework on the CIFAR‑10 image classification task using the MCUNet‑Tiny model (≈0.5 MB, 8‑bit quantized), which is specifically designed for MCU deployment. Experiments are conducted under both IID and Non‑IID data partitions. In the IID setting, GD‑PSGD reaches the same 92 % accuracy as the CFL baseline after only 1.8 additional training rounds, demonstrating comparable performance with modest overhead. In the Non‑IID scenario, where data imbalance ratios range from 0.2 to 0.5, the accuracy loss stays within 5 %–8 %, considerably better than the 12 %–15 % degradation observed with existing D‑PSGD approaches.

Communication cost analysis shows that each intra‑cluster gossip round exchanges roughly 2 KB of quantized parameters three times, while the inter‑cluster gossip exchanges about 1 KB once per round, yielding an average of 7 KB per round. This is a 65 % reduction compared with the traffic required by a server‑centric CFL that would need to transmit the full model (~20 KB) to the central server each round. Power measurements on typical MCUs indicate an average energy consumption of 12 mJ per round, extending battery life substantially.

The paper also provides a theoretical convergence proof that links the enlarged spectral gap to an improved constant factor in the O(1/√K) convergence bound, aligning with the empirical speed‑up observed.

In summary, the contribution of the work lies in (1) a lightweight, fully distributed clustering mechanism (DK‑means) suitable for dynamic wireless IoT environments, (2) a bilayer gossip protocol that dramatically reduces communication overhead while preserving convergence guarantees, and (3) a demonstration that state‑of‑the‑art TinyML models can be trained collaboratively on MCUs with minimal accuracy loss, even under heterogeneous data distributions. The authors suggest future extensions such as leader election for fault tolerance, asynchronous gossip scheduling, and testing over realistic wireless channel models to further validate the approach in real‑world deployments.

