Type-Based Unsourced Federated Learning With Client Self-Selection

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

We address the client-selection problem in federated learning over wireless networks under data heterogeneity. Existing client-selection methods often rely on server-side knowledge of client-specific information, thus compromising privacy. To overcome this issue, we propose a client self-selection strategy based solely on the comparison between locally computed training losses and a centrally updated selection threshold. Furthermore, to support robust aggregation of clients’ updates over wireless channels, we integrate this client self-selection strategy into the recently proposed type-based unsourced multiple-access framework over distributed multiple-input multiple-output (D-MIMO) networks. The resulting scheme is completely unsourced: the server does not need to know the identity of the clients. Moreover, no channel state information is required at either the clients or the server. Simulation results conducted over a D-MIMO wireless network show that the proposed self-selection strategy matches the performance of a comparable state-of-the-art server-side selection method and consistently outperforms random client selection.


💡 Research Summary

This paper tackles two intertwined challenges in federated learning (FL) over wireless networks: preserving client privacy during the client‑selection phase and reliably aggregating model updates without channel state information (CSI). Traditional client‑selection methods, such as the Power‑of‑Choice (PoC) scheme, require clients to report their local training losses or other statistics to the server. Although effective at mitigating client drift caused by data heterogeneity, these reports leak information about each client’s local data distribution. To eliminate this privacy risk, the authors propose a fully decentralized self‑selection mechanism. In each round the server broadcasts only the current global model and a scalar selection threshold θ(t). Every active client first joins a candidate set with probability p_cand = d/(λK), where d is a design parameter and λ is the activation probability. Each candidate computes its local loss ℓ_k(t) on the received global model and then decides to participate with probability σ(a·(ℓ_k(t) – θ(t))), where σ(·) is the sigmoid function and a controls the steepness of the transition. After aggregation, the server updates the threshold according to θ(t+1) = θ(t) + ξ·(b_L(t) – K_tar), where b_L(t) is the estimated number of participants and K_tar is the desired average number of participants per round. This scheme requires no client‑specific information at the server, thereby avoiding the disclosure of client‑specific statistics while still steering the average participation level toward the target K_tar.
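The per-round selection and threshold-update logic above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function names are invented, and the values a = 5, ξ = 0.01, and p_cand = 0.125 (i.e., d = 100, λ = 0.8, K = 1000) are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def self_select(local_losses, theta, a=5.0, p_cand=0.125):
    """Client-side decision: join the candidate set with probability p_cand,
    then participate with probability sigmoid(a * (loss - theta)).
    Returns a boolean mask of participating clients (names illustrative)."""
    losses = np.asarray(local_losses, dtype=float)
    candidates = rng.random(losses.shape) < p_cand
    joins = rng.random(losses.shape) < sigmoid(a * (losses - theta))
    return candidates & joins

def update_threshold(theta, n_participants, k_target, xi=0.01):
    """Server-side update theta(t+1) = theta(t) + xi * (b_L(t) - K_tar):
    too many participants raises theta, which lowers future join odds."""
    return theta + xi * (n_participants - k_target)
```

Note that only the scalar θ travels from server to clients; losses never leave the devices, which is the source of the privacy benefit.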

For the communication layer, the paper integrates the self‑selection rule into a type‑based unsourced multiple‑access (TUMA) framework designed for distributed multiple‑input multiple‑output (D‑MIMO) networks. All participating clients use the same codebook; each quantized model update is represented by a sequence of indices that are mapped to zone‑specific codewords. The D‑MIMO access points jointly receive the superimposed signals, and an approximate‑message‑passing (AMP) decoder estimates the multiplicity of each codeword (the “type”). Crucially, this decoder operates without CSI at either the clients or the server, eliminating the need for channel pre‑equalization. Quantization follows a vector‑quantization scheme with error accumulation: each client adds its accumulated quantization error to the current update, splits the resulting vector into D sub‑vectors, quantizes each sub‑vector to the nearest codeword in a shared codebook Q, and updates the error term for the next round. The quantization indices are then transmitted over D sub‑rounds using the TUMA encoding.
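The quantization step with error accumulation can be sketched as follows. This is a hedged NumPy illustration under simplifying assumptions: the function name and the toy codebook are invented, and in the actual system the shared codebook Q is designed jointly with the TUMA encoding that maps the resulting indices to zone-specific codewords.

```python
import numpy as np

def quantize_with_error_feedback(update, error, codebook):
    """Add the accumulated error to the update, split into sub-vectors,
    quantize each to the nearest codeword (Euclidean distance), and
    carry the residual forward as next round's accumulated error."""
    x = update + error                  # error feedback from prior rounds
    q_dim = codebook.shape[1]           # sub-vector length
    subs = x.reshape(-1, q_dim)         # D sub-vectors
    # squared distance of every sub-vector to every codeword
    dists = ((subs[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)          # indices transmitted via TUMA
    quantized = codebook[idx].reshape(x.shape)
    new_error = x - quantized           # accumulated for the next round
    return idx, new_error
```

The invariant worth noting is that quantized + new_error always equals update + error, so quantization noise is never lost, only deferred, which is what keeps the long-run aggregate unbiased.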

The authors evaluate the approach on the FMNIST dataset with a multilayer perceptron (≈52.5 k parameters) and K = 1000 clients, each activated with probability λ = 0.8. Data heterogeneity is induced via a Dirichlet distribution (α = 2). The target participation ratio is set to 10 % (K_tar ≈ 100). Under error‑free communication, the proposed self‑selection achieves test‑accuracy curves virtually indistinguishable from PoC and significantly better than random selection, confirming that privacy‑preserving self‑selection does not sacrifice convergence speed. In realistic D‑MIMO simulations (e.g., 4 APs each with 4 antennas), the CSI‑free TUMA decoder attains higher accuracy than the previously proposed MD‑AirComp scheme, and when combined with self‑selection, its performance approaches the ideal error‑free baseline.
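The Dirichlet-based heterogeneity described above is commonly realized by drawing, for each class, per-client proportions from Dir(α) and splitting that class's samples accordingly; smaller α yields more skewed client datasets. The sketch below is a standard construction under that assumption, not the authors' exact partitioning code.

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha=2.0, seed=0):
    """Assign sample indices to clients with per-class proportions
    drawn from Dir(alpha); alpha = 2 matches the paper's setting."""
    rng = np.random.default_rng(seed)
    client_idx = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for k, part in enumerate(np.split(idx, cuts)):
            client_idx[k].extend(part.tolist())
    return [np.array(ci) for ci in client_idx]
```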

Key contributions are: (1) a client‑side, loss‑based probabilistic selection rule that requires only a scalar threshold broadcast, thus protecting client privacy; (2) a CSI‑free type‑based unsourced multiple‑access scheme that enables scalable digital transmission of high‑dimensional model updates over D‑MIMO networks; (3) extensive simulations demonstrating that the combined system matches the convergence of server‑centric PoC while outperforming random selection and prior over‑the‑air methods. The paper also discusses limitations such as the assumption of perfect synchronization, a fixed shared codebook, and the need for hyper‑parameter tuning (a, ξ, d). Future work is suggested on asynchronous transmissions, adaptive codebook design, dynamic threshold adaptation for highly non‑i.i.d. data, and experimental validation on real wireless testbeds.

