BouquetFL: Emulating diverse participant hardware in Federated Learning


In Federated Learning (FL), multiple parties collaboratively train a shared Machine Learning model that captures their collective knowledge without exchanging private data. While FL has seen application in several industrial projects, most FL research relies on simulations on a central machine that ignore potential hardware heterogeneity among the involved parties. In this paper, we present BouquetFL, a framework designed to close this methodological gap by simulating heterogeneous client hardware on a single physical machine. By programmatically emulating diverse hardware configurations through resource restriction, BouquetFL enables controlled FL experimentation under realistic hardware diversity. Our tool provides an accessible way to study system heterogeneity in FL without requiring multiple physical devices, thereby bringing experimental practice closer to practical deployment conditions. The target audience is FL researchers studying highly heterogeneous federations. We include a wide range of profiles derived from commonly available consumer and small-lab devices, as well as a custom hardware sampler built on real-world hardware popularity, allowing users to configure the federation according to their preference.


💡 Research Summary

Federated Learning (FL) has become a cornerstone for privacy‑preserving collaborative model training across a multitude of devices, yet most experimental studies assume a homogeneous hardware environment by running all clients on a single high‑end server. This simplification masks the performance variability introduced by real‑world device heterogeneity—differences in CPU speed, core count, RAM capacity, and GPU throughput can dramatically affect training latency, convergence, and fairness among participants. To address this methodological gap, the authors present BouquetFL, a framework that emulates diverse client hardware on a single physical machine while integrating seamlessly with the popular Flower FL library.

The core idea of BouquetFL is to enforce per‑client resource constraints at the operating‑system level. Using Linux cgroups, the framework throttles CPU frequency, limits the number of visible cores, and caps RAM usage for each client process. For GPU resources, BouquetFL leverages NVIDIA’s CUDA Multi‑Process Service (MPS) to partition a single physical GPU into multiple virtual slices, each with a configurable share of CUDA cores and memory. When a Flower client’s fit method is invoked, BouquetFL spawns a dedicated subprocess, applies the specified CPU, memory, and GPU limits, runs the local training inside this sandbox, and then tears down the restrictions before the next round. This design guarantees that hardware limits are isolated per client and do not leak into subsequent training rounds.
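BouquetFL's actual sandboxing relies on Linux cgroups and CUDA MPS, which require root privileges; as a rough standard-library analogue of the spawn–limit–teardown pattern described above, the sketch below caps a child process's address space with `resource.setrlimit` and sets the real MPS environment variable `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE` for the GPU share. The helper name `run_sandboxed` and all limit values are illustrative, not BouquetFL's API.

```python
import os
import resource
import subprocess
import sys

def run_sandboxed(cmd, mem_bytes, gpu_thread_pct):
    """Run one client's local training step in a subprocess with a
    hard address-space cap (a stdlib stand-in for a cgroup memory
    limit) and an MPS active-thread percentage for GPU slicing."""
    env = dict(os.environ)
    # Real MPS knob: fraction of SM threads this process may use.
    env["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = str(gpu_thread_pct)

    def limit():
        # Runs in the child only, so the cap never leaks into the
        # parent or into the next simulated client round.
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))

    return subprocess.run(cmd, env=env, preexec_fn=limit,
                          capture_output=True, text=True)

# The child echoes its own soft limit to show the cap took effect.
probe = [sys.executable, "-c",
         "import resource; print(resource.getrlimit(resource.RLIMIT_AS)[0])"]
result = run_sandboxed(probe, mem_bytes=2 * 1024**3, gpu_thread_pct=25)
print(result.stdout.strip())  # → 2147483648 (the 2 GiB cap)
```

Because the limit is installed in the child via `preexec_fn`, tearing it down is automatic: the restriction dies with the subprocess, mirroring the per-round isolation the paragraph describes.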

BouquetFL offers two ways to define client hardware profiles. Users can manually specify the target CPU model, GPU model, and RAM size for each class of client. In addition, the authors provide an automatic hardware sampler that draws from the Steam Hardware Survey (2025), a dataset containing millions of real‑world PC configurations. By matching survey entries to a curated database of device specifications, the sampler generates realistic, popularity‑weighted distributions of consumer‑grade CPUs and GPUs, while explicitly excluding unattainable high‑end configurations. This feature enables researchers to quickly assemble large federations that reflect the statistical makeup of today’s edge devices without hand‑crafting each profile.
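The sampling step can be pictured in a few lines of standard-library Python. The popularity shares below are invented placeholders, not actual Steam Hardware Survey figures, and `sample_federation` is an illustrative helper rather than BouquetFL's sampler.

```python
import random

# Hypothetical popularity shares, loosely in the spirit of the
# Steam Hardware Survey; the real sampler matches survey entries
# against a curated database of device specifications.
GPU_SHARES = {
    "GTX 1060": 0.30,
    "GTX 1650": 0.25,
    "RTX 2060": 0.25,
    "RTX 3060": 0.20,
}

def sample_federation(n_clients, shares, seed=0):
    """Draw a popularity-weighted hardware profile for each client."""
    rng = random.Random(seed)  # seeded for reproducible federations
    models = list(shares)
    weights = [shares[m] for m in models]
    return [rng.choices(models, weights=weights, k=1)[0]
            for _ in range(n_clients)]

profiles = sample_federation(8, GPU_SHARES)
print(profiles)  # e.g. a mix dominated by the more popular GPUs
```

Weighting by observed market share is what makes large simulated federations statistically resemble real edge-device populations rather than a uniform grid of profiles.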

The framework is deliberately lightweight: it requires only a Linux host with NVIDIA drivers and sudo privileges, and it does not interfere with the underlying model architecture, optimizer, or aggregation strategy. Consequently, any Flower‑based FL pipeline—whether using FedAvg, FedProx, or more exotic algorithms—can be augmented with BouquetFL by adding a few lines of configuration. The authors release the full source code and documentation on GitHub, emphasizing ease of deployment and reproducibility.

A notable limitation is that the current implementation applies resource limits globally rather than per‑process, forcing clients to be executed sequentially. This restriction prevents simultaneous execution of heterogeneous clients on the same host and can increase total simulation time for large federations. Moreover, certain low‑level hardware characteristics—cache hierarchy, PCIe bandwidth, memory channel width—cannot be directly constrained, which may lead to discrepancies for highly memory‑bound workloads. The authors acknowledge these gaps and position BouquetFL as an approximation that captures the dominant performance factors (compute and memory capacity) while remaining accessible.

To validate the fidelity of the emulation, the authors conduct experiments on a single Ubuntu 24.04 machine equipped with an AMD Ryzen 1800X CPU, 32 GB DDR4 RAM, and an NVIDIA RTX 4070 Super GPU. They emulate four generations of consumer NVIDIA GPUs (GTX 10xx, GTX 16xx, RTX 20xx, RTX 30xx) and train a ResNet‑18 model on a standard image classification task. Since public ML benchmarks for consumer GPUs are scarce, the authors compare relative training times against normalized gaming benchmarks (PassMark, UserBenchmark) as a proxy for hardware performance. The results show a Spearman correlation of ρ = 0.92 and a Kendall τ = 0.80 between simulated training times and benchmark‑derived performance rankings, indicating that BouquetFL successfully preserves the relative ordering and scaling trends across GPU generations.
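Spearman's ρ is simply the Pearson correlation of the two rank vectors, which is why it is a natural fidelity metric here: it rewards preserving the *ordering* of GPU generations rather than absolute times. The pure-Python sketch below uses invented numbers, not the paper's measurements. Since slower hardware means longer training times but lower benchmark scores, a perfectly preserved ordering yields ρ = −1 between times and scores.

```python
def rank(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                       # extend over the tie group
        avg = (i + j) / 2 + 1            # mean of positions, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Pearson correlation of the rank vectors of x and y."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Illustrative numbers only (NOT the paper's data):
sim_times = [210.0, 150.0, 120.0, 90.0]  # s/round, older → newer profile
bench = [4000, 6500, 9000, 13000]        # proxy benchmark scores

print(round(spearman(sim_times, bench), 2))  # → -1.0 (ordering preserved)
```

The paper's reported ρ = 0.92 and τ = 0.80 sit between this ideal and noise, indicating that a few adjacent generations swap places but the overall scaling trend holds.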

Beyond timing, the authors demonstrate that BouquetFL can reproduce out‑of‑memory (OOM) failures by configuring low‑RAM profiles and can model data‑loading bottlenecks by varying CPU core counts. A supplementary video illustrates dynamic hardware profile switching, runtime differences, memory failures, and dataloader throttling in real time.
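Both failure modes have cheap standard-library analogues on Linux (BouquetFL itself achieves them through cgroups): `os.sched_setaffinity` shrinks the set of visible cores, which starves dataloader workers, and `resource.setrlimit` makes an oversized allocation fail exactly as it would on a low-RAM client. The function names and sizes below are illustrative.

```python
import os
import resource

def set_visible_cores(n):
    """Pin this process to its first n logical cores (a stand-in for
    a cgroup cpuset); fewer cores throttle dataloader throughput."""
    cores = sorted(os.sched_getaffinity(0))
    os.sched_setaffinity(0, set(cores[:n]))
    return len(os.sched_getaffinity(0))

def provoke_oom(cap_bytes):
    """Reproduce an out-of-memory failure under a low-RAM profile:
    cap the address space, then attempt an allocation above it."""
    resource.setrlimit(resource.RLIMIT_AS, (cap_bytes, cap_bytes))
    try:
        _ = bytearray(2 * cap_bytes)  # e.g. an oversized local batch
        return "fit"
    except MemoryError:
        return "OOM"

print(set_visible_cores(1))   # → 1: low-end single-core profile
print(provoke_oom(1024**3))   # → OOM: 2 GiB request under a 1 GiB cap
```

Being able to trigger such failures deterministically is what lets researchers study client dropout and straggler handling without owning any actually memory-starved device.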

In conclusion, BouquetFL fills a critical gap in FL research by providing a practical, reproducible tool for emulating heterogeneous client hardware on a single host. It enables systematic study of how device diversity impacts FL algorithms without the prohibitive cost of procuring and managing large physical testbeds. The framework’s open‑source nature, integration with Flower, and realistic hardware sampler make it immediately useful for the community. Future work includes adding network latency simulation, supporting limited parallel client execution, and extending support to non‑NVIDIA accelerators, thereby moving toward a more comprehensive emulation environment for edge‑centric federated learning.

