Conflict-Aware Client Selection for Multi-Server Federated Learning

Notice: This research summary and analysis were automatically generated using AI technology. For accuracy, please refer to the [Original Paper Viewer] below or the original arXiv source.

Federated learning (FL) has emerged as a promising distributed machine learning (ML) paradigm that enables collaborative model training across clients without exposing raw data, thereby preserving user privacy and reducing communication costs. Despite these benefits, traditional single-server FL suffers from high communication latency due to the aggregation of models from a large number of clients. While multi-server FL distributes workloads across edge servers, overlapping client coverage and uncoordinated selection often lead to resource contention, causing bandwidth conflicts and training failures. To address these limitations, we propose a decentralized reinforcement learning framework with conflict risk prediction, named RL-CRP, to optimize client selection in multi-server FL systems. Specifically, each server estimates the likelihood of client selection conflicts using a categorical hidden Markov model based on its sparse historical client selection sequence. A fairness-aware reward mechanism is then incorporated to promote long-term client participation while minimizing training latency and resource contention. Extensive experiments demonstrate that the proposed RL-CRP framework effectively reduces inter-server conflicts and significantly improves training efficiency in terms of convergence speed and communication cost.


💡 Research Summary

The paper tackles a critical bottleneck in multi‑server federated learning (FL): client selection conflicts that arise when the same client lies in the coverage area of multiple edge servers and is independently selected by them. Such conflicts lead to bandwidth contention, time‑outs, and overall slowdown of the training process. To mitigate this, the authors propose a decentralized reinforcement‑learning framework called RL‑CRP (Reinforcement Learning with Conflict Risk Prediction). The solution consists of two main components.

First, each server builds a categorical hidden Markov model (HMM) to predict the probability that a given client will be selected by another server (i.e., a conflict risk). The HMM works with K hidden states and V observation categories (conflict vs. no‑conflict). Because historical selection logs are often sparse and outdated, the model fills missing entries using the forward‑backward algorithm and updates its parameters incrementally via a Baum‑Welch‑style EM procedure. This yields a per‑client conflict probability p_i(t) that reflects recent trends while handling incomplete data.
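The paragraph above can be illustrated with a minimal sketch of the prediction step. This is not the authors' implementation: the K = 2 / V = 2 sizes, the convention that category 1 means "conflict", and the handling of missing log entries (a uniform emission likelihood) are assumptions for illustration; parameter re-estimation via Baum-Welch is omitted.

```python
import numpy as np

def forward_backward(obs, pi, A, B):
    """Posterior hidden-state probabilities for a categorical HMM.

    obs: sequence of observation indices; None marks a missing log entry,
         which is handled by using a uniform emission likelihood there.
    pi:  (K,) initial state distribution
    A:   (K, K) state transition matrix
    B:   (K, V) emission matrix over V observation categories
    """
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K))
    beta = np.zeros((T, K))

    def emit(t):
        # Missing observation -> likelihood 1 for every hidden state
        return np.ones(K) if obs[t] is None else B[:, obs[t]]

    # Forward pass (normalized at each step for numerical stability)
    alpha[0] = pi * emit(0)
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * emit(t)
        alpha[t] /= alpha[t].sum()

    # Backward pass
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (emit(t + 1) * beta[t + 1])
        beta[t] /= beta[t].sum()

    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

def conflict_probability(obs, pi, A, B, conflict_cat=1):
    """One-step-ahead probability that the next observation is a conflict."""
    gamma = forward_backward(obs, pi, A, B)
    next_state = gamma[-1] @ A          # predicted hidden-state distribution
    return float(next_state @ B[:, conflict_cat])
```

A client whose recent (partially observed) history is mostly conflicts yields a higher p_i(t) than one with a mostly conflict-free history, which is the signal fed into the selection policy.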

Second, each server runs an independent Soft Actor‑Critic (SAC) agent. The state includes the current upload latency vector of its associated clients and the conflict probabilities supplied by the HMM. The action is a selection of a fixed‑size subset of clients for the current FL round. The reward function is crafted to balance three objectives: (1) minimize the worst‑case latency L_m, (2) penalize actual conflicts C_m, and (3) promote long‑term fairness among clients. Fairness is quantified by f = tanh(μ/(δ + ε)), where μ is the average number of rounds a client participates, δ is the standard deviation of participation counts, and ε is a small constant; α controls the weight of this term. By integrating fairness, the policy avoids repeatedly favoring high‑capacity clients and instead encourages participation of under‑represented devices, which is especially beneficial under non‑IID data distributions.
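The three-term reward can be sketched as follows. The weight `beta` on the conflict penalty, the specific parenthesization tanh(μ/(δ + ε)), and the additive combination are assumptions made for illustration; the paper's exact coefficients may differ.

```python
import numpy as np

def fairness_score(participation_counts, eps=1e-8):
    # f = tanh(mu / (delta + eps)): high mean participation with low
    # spread across clients pushes f toward 1
    mu = np.mean(participation_counts)
    delta = np.std(participation_counts)
    return np.tanh(mu / (delta + eps))

def reward(latencies, selected, num_conflicts, participation_counts,
           beta=1.0, alpha=0.5):
    """Per-round reward for one server's SAC agent (illustrative weights)."""
    # Worst-case upload latency among the clients chosen this round
    L_m = max(latencies[i] for i in selected)
    C_m = num_conflicts
    f = fairness_score(participation_counts)
    # Lower latency and fewer conflicts raise the reward; alpha weights fairness
    return -L_m - beta * C_m + alpha * f
```

Under this shape, a round with conflicts scores strictly worse than the same round without them, and a skewed participation history drags the reward down relative to an even one.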

After the policy selects clients, a water‑filling algorithm allocates the limited wireless bandwidth (e.g., 100 MHz) to the chosen devices, prioritizing those with better channel conditions until the bandwidth budget is exhausted. This step further reduces the likelihood of contention.
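The allocation step described above, serving better channels first until the budget runs out, can be sketched greedily. The per-client requested bandwidth and the tuple layout are hypothetical; the paper's water-filling criterion may weight channels differently.

```python
def allocate_bandwidth(clients, total_bw_mhz=100.0):
    """Greedy bandwidth allocation favouring better channel conditions.

    clients: list of (client_id, channel_gain, requested_bw_mhz) tuples
    Returns {client_id: allocated_bw_mhz}; clients are served in order of
    decreasing channel gain until the bandwidth budget is exhausted.
    """
    allocation = {}
    remaining = total_bw_mhz
    for cid, gain, req in sorted(clients, key=lambda c: c[1], reverse=True):
        grant = min(req, remaining)
        if grant <= 0:
            break
        allocation[cid] = grant
        remaining -= grant
    return allocation
```

With a 100 MHz budget, a low-gain client may receive only a partial grant (or none), which is how this step curbs contention among the selected devices.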

The experimental evaluation uses CIFAR‑10 with a three‑layer CNN under both IID and non‑IID (Dirichlet α = 0.1) data splits. The testbed comprises two edge servers covering a 1 km radius each, with 50 clients (40 overlapping). Baselines include standard FedAvg (random selection), RL‑CRP without the fairness term, and ENSAC (entropy‑normalized SAC). Results show that RL‑CRP achieves the highest test accuracy—67.68 % (IID) and 61.32 % (non‑IID)—outperforming all baselines by 4–7 percentage points. Moreover, RL‑CRP converges to a stable reward significantly faster than ENSAC and exhibits a markedly lower conflict rate, which translates into a 15 % reduction in average round latency. When the number of servers is increased to four, the method still maintains its advantage, confirming scalability.

Key contributions are: (1) a novel HMM‑based conflict risk predictor that works with sparse historical data, (2) a decentralized SAC‑based client‑selection policy that jointly optimizes latency, conflict avoidance, and fairness, and (3) an integrated bandwidth‑allocation scheme that respects real‑world wireless constraints. The approach requires no inter‑server communication for coordination, making it highly scalable for edge‑centric IoT or mobile scenarios where network conditions and client availability are highly dynamic. Overall, RL‑CRP offers a practical pathway to improve efficiency, robustness, and fairness in multi‑server federated learning deployments.

