Dynamic Meta-Layer Aggregation for Byzantine-Robust Federated Learning


Authors: Reek Das, Biplab Kanti Sen

Abstract

Federated Learning (FL) is increasingly applied in sectors like healthcare, finance, and IoT, enabling collaborative model training while safeguarding user privacy. However, FL systems are susceptible to Byzantine adversaries that inject malicious updates, which can severely compromise global model performance. Existing defenses tend to focus on specific attack types and fail against untargeted strategies, such as multi-label flipping or combinations of noise and backdoor patterns. To overcome these limitations, we propose FedAOT, a novel defense mechanism that counters multi-label flipping and untargeted poisoning attacks using a meta-learning-inspired adaptive aggregation framework. FedAOT dynamically weights client updates based on their reliability, suppressing adversarial influence without relying on predefined thresholds or restrictive attack assumptions. Notably, FedAOT generalizes effectively across diverse datasets and a wide range of attack types, maintaining robust performance even in previously unseen scenarios. Experimental results demonstrate that FedAOT substantially improves model accuracy and resilience while maintaining computational efficiency, offering a scalable and practical solution for secure federated learning.

Keywords: Federated Learning (FL) · Byzantine Adversaries · Model Poisoning Attacks · Decentralized Machine Learning · Federated Aggregation

Reek Das, APC Roy Government College. E-mail: reekdas34@gmail.com
Biplab Kanti Sen, Assistant Professor, Department of Computer Science, P.R. Thakur Government College. E-mail: bksen.cu@gmail.com

1 Introduction

Federated Learning (FL) has emerged as a transformative paradigm for distributed model training, enabling multiple clients to collaboratively learn a global model without sharing their raw data [1, 2]. This decentralized framework is particularly crucial for privacy-sensitive domains such as healthcare, finance, and mobile applications, where data confidentiality and regulatory compliance are essential [3, 4]. By transmitting only model updates to a central server, FL preserves data locality while significantly mitigating privacy risks and ensuring compliance with key data protection regulations such as the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA) [5].

However, the distributed and client-driven architecture of FL introduces new vulnerabilities that do not appear in centralized learning settings. Among the various security threats in federated learning, Byzantine attacks, in which certain clients act maliciously or become compromised, represent one of the most severe challenges [6, 7]. Byzantine clients can inject poisoned gradients, flip labels, or manipulate updates to mislead the global aggregation process and degrade model convergence. These threats become more damaging when the data across clients are non-IID (non-independent and identically distributed), meaning that each client holds data with differing statistical properties or class imbalances.
Under such heterogeneous conditions and variable client participation, where different subsets of clients take part in each training round due to network or resource constraints, the global model already struggles to learn consistent patterns, and malicious updates further compromise the stability and reliability of the learning process [8, 9].

Several aggregation strategies have been developed to enhance the robustness of federated learning against Byzantine threats. Classical approaches such as Krum [6], Trimmed Mean [8], and Bulyan [9] attempt to statistically filter or weight client updates based on assumptions about the proportion and behavior of adversaries. More recent methods, including FoolsGold [10], RobustFedAvg [11], and Byzantine-Robust Decentralized Federated Learning [12], have explored geometric median aggregation, reputation-based adjustment, or peer-consensus mechanisms to resist poisoned updates. Although these approaches have advanced the field, their robustness often depends on static thresholds, predefined statistical bounds, or specific assumptions about the attack model. As a result, their performance degrades in adaptive or mixed attack scenarios, where adversaries vary their strategies across training rounds to evade detection [13, 14].

Recent research, such as Shi et al. (2025) [7], has further investigated gradient similarity-based methods to identify and isolate malicious clients. Building on this growing body of work, our study introduces a new perspective through FedAOT (Federated Adaptive Optimal Tuning), a meta-learning-driven defense framework designed to enhance resilience against untargeted and multi-label poisoning attacks. The main contributions of this work are summarized as follows:

– A meta-learning-based aggregation strategy that adaptively assigns client weights using validation feedback to enhance robustness.
– An adaptive optimization mechanism that reduces the impact of unreliable or adversarial updates without prior attack assumptions.
– A unified defense against untargeted poisoning and label-flipping attacks under non-IID and heterogeneous data distributions.
– Empirical evaluation demonstrating superior resilience and convergence over existing Byzantine-robust approaches.

Overall, FedAOT underscores the potential of meta-learning as a practical and scalable approach to strengthening the robustness of federated learning systems. By introducing adaptivity into the aggregation mechanism, it moves beyond static, rule-based defenses toward more intelligent and self-adjusting learning paradigms. The remainder of this paper is organized as follows: Section 2 reviews related Byzantine-resilient FL methods; Section 3 formulates the problem and motivates our approach; Section 4 details the proposed FedAOT framework; Section 5 presents experimental evaluations and results; and Section 6 concludes the paper with discussions on implications and future research directions. (Note: Since multi-label flipping and untargeted poisoning attacks operate similarly in the context of label flipping, we refer to both as "untargeted poisoning attacks" throughout this paper for clarity.)
2 Related Work

Federated Learning (FL) distributes model training across multiple clients, enabling privacy-preserving computation but introducing vulnerabilities to Byzantine attacks, where malicious participants manipulate local updates to degrade the global model. This challenge has led to extensive research on robust aggregation and adaptive defense mechanisms.

Early aggregation strategies, such as Federated Averaging (FedAvg) [1], simply average local model updates. Although efficient in benign settings, FedAvg performs poorly when even a small fraction of clients behave maliciously, since poisoned updates are directly incorporated into the global model. To counter this, Krum [15] selects the update most consistent with the others, assuming that the majority of clients are honest. While it provides basic robustness, Krum's performance deteriorates when adversarial clients constitute a large fraction or when data heterogeneity causes honest updates to diverge significantly. Similarly, Geometric Median (GeoMed) [16] replaces the arithmetic mean with a median-based aggregation to suppress outliers, but it suffers when attackers generate carefully crafted updates that remain statistically indistinguishable from benign gradients.

Other methods, such as FoolsGold [10], analyze the similarity of client gradients to identify potential collusion among malicious agents. Although effective against targeted attacks, FoolsGold performs inconsistently under untargeted poisoning scenarios where attackers act independently. Later approaches like FABA (Filtering Adversaries By Agreement) [17] and BRIEF (Byzantine-Robust Federated Learning via Optimal Voting) [18] refine this concept by adaptively filtering updates based on mutual agreement or voting mechanisms. However, these algorithms still depend on static heuristics or assumptions about attack types, which limits their adaptability to new or evolving threat patterns. Furthermore, most of these techniques rely on fixed similarity thresholds or require a clean reference model, conditions rarely available in real-world federated systems.

To address these limitations, researchers have proposed adaptive and learning-guided defenses. FLTrust [19] introduces a server-side trusted dataset to calibrate client updates, achieving good robustness under controlled environments but facing scalability issues in privacy-sensitive deployments. RobustFL [20] assigns confidence-based weights to updates by modeling uncertainty, enhancing stability under mixed attack scenarios. Meanwhile, personalized FL frameworks [21] explore fine-grained, feedback-based adaptation of model parameters. These works have inspired dynamic aggregation strategies that move beyond static heuristic rules.

Building on these foundations, Shi et al. (2025) [7] proposed the RSDFL framework, which measures pairwise gradient similarity to identify and isolate Byzantine clients. RSDFL significantly improves resilience against direct poisoning but assumes consistent gradient behavior across clients and relies on static similarity thresholds. Consequently, its robustness declines under heterogeneous data distributions or when attack strategies evolve dynamically over training rounds.
Considering a related but distinct perspective, LoRA (Low-Rank Adaptation) [22] fine-tunes a limited set of low-rank parameters, typically within the final layer. Recent trends have focused on meta-learning and reinforcement learning-based aggregation, aiming to continuously adapt aggregation behavior based on client reliability feedback. FedRAD [23] employs reinforcement learning to optimize aggregation policies dynamically, showing improved generalization under dynamic attack patterns. TrustFed [24] introduces a historical trust modeling approach, where each client's contribution is weighted by a dynamically updated reliability score. These frameworks mark a transition toward self-optimizing FL systems that can learn to defend rather than rely on static defense parameters.

Despite this progress, existing Byzantine-resilient frameworks share key limitations: (1) their reliance on predefined heuristics or trust metrics restricts adaptability, (2) their defensive behavior is often reactive rather than predictive, and (3) their effectiveness declines when adversaries evolve or when honest client updates temporarily degrade due to data heterogeneity. These challenges motivate the need for a learning-driven, feedback-aware aggregation mechanism that can continuously refine its defense strategy based on observed model performance. This motivation forms the foundation of the proposed FedAOT framework.

3 Problem Formulation and Motivation

Federated Learning (FL) enables decentralized model training across a set of distributed clients $\{C_1, C_2, \ldots, C_N\}$, where each client $C_i$ holds a private dataset $D_i$ and performs local optimization without sharing raw data. The global learning objective is formulated as

\min_{w} F(w) = \sum_{i=1}^{N} p_i F_i(w),

where $F_i(w) = \mathbb{E}_{(x,y) \sim D_i}[\ell(w; x, y)]$ denotes the local empirical loss on client $i$, $p_i = |D_i| / \sum_j |D_j|$ is the data-proportion weight, and $\ell(\cdot)$ is the task-specific loss function. At each communication round $t$, clients compute local updates $\Delta w_i^{(t)}$ and send them to the server, which aggregates them to update the global model:

w^{(t+1)} = w^{(t)} + \eta \cdot \mathrm{Agg}\big(\{\Delta w_i^{(t)}\}_{i=1}^{N}\big),

where $\eta$ is the global learning rate and $\mathrm{Agg}(\cdot)$ denotes the aggregation function.

Byzantine Challenge: In realistic federated environments, a subset of clients may act maliciously by injecting poisoned or corrupted updates. Let $A \subset \{1, \ldots, N\}$ denote the set of Byzantine clients, each contributing adversarial updates $\Delta w_i^{A} \sim D_{\mathrm{adv}}$. These updates are often statistically similar to benign ones:

\mathbb{E}[\Delta w_i^{A}] \approx \mathbb{E}[\Delta w_j], \quad \forall j \notin A,

making them difficult to detect through conventional distance- or similarity-based filtering. Most existing Byzantine-resilient methods focus on targeted poisoning attacks, where adversaries manipulate updates to misclassify specific classes. However, untargeted model poisoning poses an equally severe yet harder-to-detect threat. In such cases, malicious clients inject random noise or indiscriminate label flips, degrading global model performance without a specific target. These noisy updates often mimic natural client variations under non-IID conditions, allowing them to bypass standard defense mechanisms.
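To make this threat model concrete, the following is a minimal illustrative sketch (an example only, not part of the defense) of how an untargeted Byzantine client could corrupt its contribution: either by indiscriminately flipping labels with the (y + 1) mod 10 rule later used in our experiments (Section 5), or by adding zero-mean noise to an otherwise benign update. The 10-class assumption and the noise scale are illustrative choices, not prescribed by the method.

import torch

NUM_CLASSES = 10     # illustrative: matches the 10-class image benchmarks used in Section 5
NOISE_SIGMA = 0.5    # hypothetical scale for the noise-injection attacker

def flip_labels(labels: torch.Tensor) -> torch.Tensor:
    # Untargeted label flipping: every label y becomes (y + 1) mod 10.
    return (labels + 1) % NUM_CLASSES

def poison_update(benign_update: dict, sigma: float = NOISE_SIGMA) -> dict:
    # Untargeted model poisoning: add zero-mean Gaussian noise to each parameter
    # tensor of an otherwise benign local update, so the poisoned update stays
    # statistically close to honest ones (E[dW_A] is approximately E[dW_j]).
    return {name: p + sigma * torch.randn_like(p) for name, p in benign_update.items()}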
Consequently, traditional aggregation schemes, whether update-based or model-based, often fail to converge:

w^{(t+1)} = w^{(t)} + \eta \cdot \mathrm{Agg}\big(\{\Delta w_i^{(t)}\}_{i=1}^{N}\big) \quad \text{fails to converge as } t \to \infty,

w^{(t+1)} = \mathrm{Agg}\big(\{\Delta w_i^{(t)}\}_{i=1}^{N}\big) \quad \text{may oscillate or stagnate under adversarial influence.}

This challenge is further amplified in heterogeneous and resource-constrained edge environments, where client data distributions and participation patterns vary significantly.

Motivation: To overcome these challenges, this work proposes a robust and adaptive aggregation mechanism that dynamically learns to assign client-specific weights based on their contribution reliability. The aggregation process is reformulated as

w^{(t+1)} = \sum_{i=1}^{N} k_i^{(t)} \Delta w_i^{(t)}, \quad \text{where } k_i^{(t)} \in [0, 1] \text{ and } \sum_{i=1}^{N} k_i^{(t)} = 1.

Unlike heuristic or threshold-based defenses, the proposed approach adaptively infers $k_i^{(t)}$ using validation feedback, allowing it to suppress harmful updates and amplify trustworthy ones. This adaptive learning of aggregation weights aims to ensure robustness, convergence stability, and computational efficiency even under untargeted or evolving Byzantine threats in FL settings.

Fig. 1: Overview of the proposed adaptive meta-layer aggregation mechanism. Each client $C_i$ produces a local model $w_i$, and the meta-layer learns client-specific weights $k_i$ to compute a robust aggregated model $W_{\mathrm{final}} = \sum_{i=1}^{N} k_i w_i$.

4 Proposed Methodology

Federated Adaptive Optimal Tuning (FedAOT) augments the standard server aggregation in federated learning with a lightweight, server-side meta-layer that learns client-specific importance weights. At each communication round, the server collects local updates from participating clients, composes a weighted aggregation using the current importance vector, evaluates the aggregated model on a small held-out meta-validation set, and applies a meta-gradient step to adjust the importance weights so as to reduce the validation loss.

Algorithm 1: FedAOT (Federated Adaptive Optimal Tuning)
Require: Clients {C_1, ..., C_N}; initial global model W^(0); meta learning rate η; server meta-validation set (x, y)
Ensure: Final aggregated global model W_final
 1: Initialize: set importance weights k_i^(0) = 1/N for all i
 2: for communication round t = 1 to T do
 3:   Server broadcasts W^(t) to selected clients
 4:   // Client-side local training
 5:   for each client C_i in parallel do
 6:     Client C_i performs local training and returns update ω_i^(t)
 7:   end for
 8:   // Server aggregation
 9:   W^(t) ← Σ_{i=1}^{N} k_i^(t) · ω_i^(t)
10:   // Meta-layer update
11:   Compute validation prediction: ŷ ← f(W^(t), x)
12:   Compute meta-loss: L ← Loss(ŷ, y)
13:   for each client i do
14:     Compute gradient: g_i ← ∇_{k_i^(t)} L
15:     Update importance weight: k_i^(t+1) ← k_i^(t) - η · g_i
16:   end for
17:   // Stabilization of client importance weights
18:   if normalization (used in this work) then
19:     for each client i do
20:       k_i^(t+1) ← k_i^(t+1) / Σ_{j=1}^{N} k_j^(t+1)
21:     end for
22:   else (better option) SoftMax-based stabilization
23:     for each client i do
24:       k_i^(t+1) ← exp(k_i^(t+1)) / Σ_{j=1}^{N} exp(k_j^(t+1))
25:     end for
26:   end if
27: end for
28: Return W_final ← W^(T)
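To make Algorithm 1 concrete, the following is a minimal PyTorch sketch of one server-side round under stated assumptions: client updates arrive as flattened parameter tensors of equal length, and model_fn(params, x) is a hypothetical helper that evaluates the global model functionally from such a flat parameter vector. It uses the renormalization stabilization adopted in this work; it is an illustrative sketch, not the exact implementation.

import torch
import torch.nn.functional as F

def fedaot_round(client_updates, k, model_fn, x_val, y_val, meta_lr=0.01):
    # client_updates: list of N flattened parameter tensors ω_i (same shape)
    # k:              tensor of N importance weights (k_i >= 0, sum k_i = 1)
    # model_fn:       hypothetical helper, model_fn(params, x) -> logits
    # x_val, y_val:   small server-side meta-validation set
    k = k.clone().detach().requires_grad_(True)

    # Server aggregation: W^(t) = sum_i k_i * ω_i
    stacked = torch.stack(client_updates)            # shape (N, P)
    W = (k.unsqueeze(1) * stacked).sum(dim=0)        # shape (P,)

    # Meta-loss of the aggregated model on the held-out meta-validation set
    meta_loss = F.cross_entropy(model_fn(W, x_val), y_val)

    # Meta-gradient step on the importance weights: k_i <- k_i - η * dL/dk_i
    (grad_k,) = torch.autograd.grad(meta_loss, k)
    with torch.no_grad():
        k_next = (k - meta_lr * grad_k).clamp_min(0.0)   # enforce k_i >= 0
        k_next = k_next / k_next.sum()                   # renormalization stabilization
    return W.detach(), k_next

In practice, W would be written back into the global model and broadcast in the next round, with k initialized to 1/N and carried across rounds.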
This adaptive re-weighting is intended to suppress the influence of adversarial or noisy updates while preserving contributions from helpful clients, and it incurs only a modest server-side overhead compared to local client computation.

The central mechanism of FedAOT proceeds as follows. After each communication round the server forms the aggregated model

W^{(t)} = \sum_{i=1}^{N} k_i^{(t)} \omega_i^{(t)},

where $\{\omega_i^{(t)}\}$ are the client updates collected that round and $\{k_i^{(t)}\}$ are the current importance weights.

Table 1: Notation used in the FedAOT description.

Symbol      Description
N           Total number of clients in the federation
C_i         Client indexed by i
W^(t)       Global model parameters at round t
ω_i^(t)     Client i's update at round t (model difference or gradient)
k_i^(t)     Importance weight for client i at round t (k_i^(0) = 1/N, k_i^(t) ≥ 0)
s_i^(t)     Optional logit for SoftMax parametrization of k_i
η           Meta learning rate for importance updates (server-side)
(x, y)      Server-side meta-validation dataset (held-out)
L           Meta-validation loss evaluated on (x, y)
τ           SoftMax temperature (optional)
α           Exponential smoothing coefficient (optional)
ε           Clipping lower bound for stabilized weights (optional)

The server evaluates the aggregated model on a small held-out meta-validation set $(x, y)$ and computes the meta-validation loss

L = \mathrm{Loss}(f(W^{(t)}, x), y).

The sensitivity of this loss to each importance weight is given by the meta-gradient $g_i^{(t)} = \nabla_{k_i} L$. Importance values are updated by a small meta-step

k_i^{(t+1)} \leftarrow k_i^{(t)} - \eta \, g_i^{(t)},

and then stabilized to enforce a valid convex weighting for aggregation. In our experiments, we used simple renormalization (division by the sum) as the default stabilization,

k_i \leftarrow \frac{k_i}{\sum_j k_j},

because it is computationally trivial and, together with conservative meta learning rates and Adam-based local training, produced stable behaviour across benchmarks. In practice we initialize $k_i^{(0)} = 1/N$ and enforce $k_i^{(t)} \geq 0$.

For deployments facing extreme heterogeneity or noisy meta-gradients, we recommend the SoftMax parametrization on internal logits $s_i$,

k_i = \frac{\exp(s_i / \tau)}{\sum_{j=1}^{N} \exp(s_j / \tau)}, \quad \tau > 0,

with $s_i$ updated by meta-gradient descent. SoftMax guarantees $0 < k_i < 1$ and avoids negative or extremely large intermediate values; use it when raw meta-gradients are noisy or when numerical stability is a concern. Optionally, apply exponential smoothing $\tilde{k}_i^{(t+1)} = \alpha k_i^{(t)} + (1 - \alpha) k_i^{(t+1)}$ and clip $\tilde{k}_i^{(t+1)} \in [\varepsilon, 1]$ to reduce transient penalization of honest clients.

The extra server-side cost per round is one validation forward/backward pass; this cost scales linearly with the number of participating clients and is small compared to local training. For stable operation we recommend choosing the meta learning rate $\eta$ smaller than the typical client learning rate and tuning it on the meta-validation set.
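The alternative stabilizations described above can be sketched as follows: the SoftMax variant keeps internal logits s_i and exposes k_i = softmax(s_i / τ), while exponential smoothing and clipping are optional post-processing steps. The parameter defaults are illustrative, and the final renormalization after clipping is added here only so the weights remain a convex combination.

import torch

def softmax_weights(s: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    # SoftMax parametrization on internal logits: k_i = exp(s_i/tau) / sum_j exp(s_j/tau).
    # Guarantees 0 < k_i < 1 even when raw meta-gradients are noisy.
    return torch.softmax(s / tau, dim=0)

def smooth_and_clip(k_prev: torch.Tensor, k_new: torch.Tensor,
                    alpha: float = 0.9, eps: float = 1e-3) -> torch.Tensor:
    # Optional exponential smoothing, k_i = alpha*k_i^(t) + (1-alpha)*k_i^(t+1),
    # followed by clipping to [eps, 1]; renormalized so the weights still sum to 1.
    k = alpha * k_prev + (1.0 - alpha) * k_new
    k = k.clamp(min=eps, max=1.0)
    return k / k.sum()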
5 Experimental Evaluation

We evaluate FedAOT on all benchmark datasets using a federated learning simulation with 20 to 100 clients, depending on the experiment. To assess robustness under different adversarial levels, we introduce Byzantine clients ranging from 20% to 90% of the total population (A20 to A90). Each client trains a CNN and FNN hybrid model locally using the Adam optimizer with a learning rate of 0.001 and a batch size of 32. The experiments were conducted in a Linux-based environment (Kaggle Notebooks) equipped with two NVIDIA T4 GPUs. All federated simulations were implemented using the Flower framework with PyTorch, ensuring a consistent client-server communication flow and reproducible execution across runs.

Since the attacks considered are untargeted, malicious clients in the label-flipping scenario modify the entire label set using the rule (label + 1) mod 10. This alteration affects the complete class distribution; therefore, we report overall classification accuracy instead of class-wise accuracy.

We compare FedAOT with three standard aggregation methods: FedAvg [1], FoolsGold [10], and GeoMed [16]. Performance is evaluated on the test set using both classification accuracy and the F1 score under each attack intensity.
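For reference, the sketch below gives a minimal PyTorch version of a CNN-and-FNN hybrid classifier of the kind each client trains in our setup; only the optimizer (Adam, learning rate 0.001) and batch size (32) are fixed above, so the specific layer sizes here are illustrative assumptions rather than the exact architecture.

import torch
import torch.nn as nn

class HybridCNNFNN(nn.Module):
    # Illustrative CNN + FNN hybrid for 28x28 grayscale benchmarks
    # (MNIST, KMNIST, FashionMNIST); layer sizes are assumptions.
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.conv = nn.Sequential(                       # convolutional feature extractor
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fnn = nn.Sequential(                        # fully connected head
            nn.Flatten(), nn.Linear(64 * 7 * 7, 128), nn.ReLU(), nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.fnn(self.conv(x))

model = HybridCNNFNN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # settings used in the experiments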
5.1 Dataset

The proposed method is evaluated using benchmark datasets commonly used in FL research, namely MNIST [25], KMNIST [26], and FashionMNIST [27]. These datasets provide a diverse set of tasks suitable for assessing the robustness of the aggregation method under various attack scenarios.

5.2 Results

In this section, we discuss three primary aspects to evaluate the effectiveness and robustness of the proposed FedAOT aggregation method:

i. FedAOT performance against untargeted label-flipping attacks

To assess the standalone effectiveness of FedAOT under varying attack intensities, we evaluate its performance on MNIST, KMNIST, and FashionMNIST across attack levels ranging from A20 to A90. The corresponding results are presented in Tables 2a, 2b, and 2c.

Table 2: Performance comparison under different attack intensities (first row: accuracy in %, second row: F1 score in %).

(a) MNIST
          A20    A30    A40    A50    A60    A70    A80    A85    A90
Accuracy  98.78  98.43  97.71  97.85  97.31  98.19  97.08  98.11  97.77
F1 score  98.77  98.43  97.73  97.87  97.33  98.20  97.08  98.12  97.75

(b) KMNIST
          A20    A30    A40    A50    A60    A70    A80    A85    A90
Accuracy  94.84  93.85  92.46  90.60  91.38  88.22  92.24  88.70  91.66
F1 score  94.83  93.84  92.45  90.59  91.38  88.21  92.22  89.91  91.64

(c) FMNIST
          A20    A30    A40    A50    A60    A70    A80    A85    A90
Accuracy  89.53  89.31  89.27  88.83  88.78  88.90  87.97  88.74  88.46
F1 score  89.54  89.20  89.22  88.72  88.72  88.88  88.09  88.76  88.44

The results clearly demonstrate the robustness of FedAOT across all three datasets under untargeted label-flipping attacks. Accuracy remains consistently high even when up to 90% of the participating clients are malicious, and both accuracy and F1 score show only minor variations across attack levels. This stability indicates that FedAOT successfully neutralizes the influence of poisoned updates.

A notable observation is that the algorithm maintains performance regardless of the baseline difficulty of the dataset. For MNIST, the accuracy decreases only marginally from 98.78% (A20) to 97.77% (A90). KMNIST and FMNIST follow the same pattern, confirming that FedAOT effectively preserves model quality even in more challenging scenarios. The primary limiting factor becomes the model's inherent performance rather than the attack strength.

FedAOT's resilience can be attributed to its adaptive weighting mechanism, which downweights adversarial updates without being influenced by their sheer volume. This property ensures stable global model updates even under extreme attack conditions, making FedAOT highly suitable for real-world federated learning deployments where large-scale Byzantine behavior may occur.

ii. Comparative performance of FedAOT against existing robust aggregation algorithms under Byzantine attacks

We evaluate FedAOT against FoolsGold, GeoMed, and FedAvg under increasing attack intensities (A20–A70).

Note: Krum and its variants were excluded due to extreme instability in untargeted poisoning scenarios. Since they select only one or a few client models per round, a single incorrect malicious selection collapses accuracy to near zero, making them unreliable for this threat model.

Fig. 2: Comparison of aggregation methods under different attack intensities.

Performance at low attack ratios (A20–A30): At low attacker percentages, all baseline methods perform strongly, achieving over 98% accuracy on MNIST and 88–90% on FashionMNIST. Mild adversarial presence does not significantly disrupt learning.

Performance degradation with increased attacks (A35–A50): The baseline defenses begin to degrade noticeably as attack levels rise:
- At A40, FoolsGold and GeoMed decline to approximately 94% and 88% on MNIST, while FedAvg drops to 92.89%.
- By A45–A50, performance collapses sharply: FoolsGold falls below 77%, GeoMed reaches 70.9%, and FedAvg drops to 81.24%.
- On FashionMNIST, all methods fall below 76% at A45, with severe instability at A50 (GeoMed: 37.49%; FedAvg: 35.08%).

Complete breakdown beyond A50:
- At A55, all baselines fail catastrophically, with F1 scores below 0.3 on MNIST and 0.2 on FashionMNIST.
- At A60–A65, FoolsGold and FedAvg operate near randomness, achieving only 3–4% accuracy.
- At A70, all baseline defenses collapse fully, falling below 2% accuracy.

In contrast, FedAOT preserves high accuracy across all attack intensities. As shown in part (i), the meta-layer effectively distinguishes honest from adversarial clients, sustaining stable performance even at A90.

iii. Effectiveness of the FedAOT meta-layer in identifying and downweighting malicious clients

FedAOT utilizes adaptive weighting through learnable importance factors k_i to distinguish between honest and malicious clients. This mechanism ensures that honest clients contribute meaningfully to the aggregation, while malicious clients are systematically downweighted. To demonstrate this, we analyze the distribution of k-values across different attack intensities: A20, A50, A70, and A90.

k-value distribution across different attack levels: For clarity and to avoid redundancy, we present results for a 20-client system on FMNIST, showing the distribution of k-values for the 20 clients after 30 epochs under increasing attack intensities:

– A20: The distribution is nearly uniform, indicating that all clients contribute equally, as the model has not detected significant adversarial influence.
– A50: The honest clients begin to receive higher weights, while malicious clients are downweighted, showing the algorithm's ability to differentiate between the two.
– A70: The difference becomes more pronounced, with honest clients dominating the weight distribution, ensuring that the model primarily learns from non-malicious updates.
– A90: The system effectively isolates the two honest clients, giving them the overwhelming majority of the weight, ensuring that aggregation is not compromised by the large number of attackers.

The gradual adaptation of k-values highlights key strengths of FedAOT:

– Robust adaptive filtering: The algorithm effectively learns to separate honest and malicious clients over time, rather than making hard-coded assumptions about attacker behavior.
– Stable model contribution: Instead of relying on a single trusted client, FedAOT aggregates knowledge from all honest clients, ensuring long-term model stability.
– Potential for future extensions: Since the algorithm dynamically assigns lower weights to malicious clients, this technique could potentially be extended to defend against targeted poisoning attacks or more sophisticated Byzantine threats.

This analysis reinforces that FedAOT is actively learning which clients should contribute more.
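As a simple way to quantify this behaviour in simulation, one can track how much of the total importance mass the meta-layer assigns to the known-malicious clients at each round; the helper below is an illustrative diagnostic for this kind of analysis, not part of the FedAOT algorithm itself.

import torch

def malicious_weight_share(k: torch.Tensor, malicious_ids: list) -> float:
    # Fraction of the total importance weight assigned to known-malicious clients.
    # Only computable in simulation, where the attacker set is known; a value
    # approaching 0 over rounds indicates the meta-layer is isolating attackers.
    return float(k[malicious_ids].sum() / k.sum())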
6 Conclusion and Future Scope

The experimental results demonstrate that many existing robust aggregation strategies struggle when a large proportion of clients behave maliciously, largely because they fail to reliably distinguish honest updates from poisoned ones. Such approaches, particularly those relying on fixed rules or hard selection, may degrade sharply under large-scale Byzantine activity.

FedAOT addresses these limitations through an adaptive importance-weighting mechanism that adjusts each client's contribution via meta-layer optimization. This dynamic adjustment effectively suppresses adversarial updates while retaining the influence of honest clients, enabling the global model to remain stable even under extreme attack intensities. Across all datasets and attack levels, FedAOT consistently preserves high accuracy and robustness against untargeted label-flipping attacks.

Future work includes analyzing the stability and long-term behavior of the adaptive weighting method, particularly to understand the conditions under which the weights reliably converge during training. Another direction is developing more principled strategies for constructing meta-validation data, potentially through curated public datasets, synthetic data generation, or controlled volunteer contributions, depending on deployment requirements. Further evaluation under highly non-IID settings is also necessary to fully understand performance boundaries in heterogeneous environments.

References

1. B. McMahan, E. Moore, D. Ramage, S. Hampson, B.A. y Arcas, in Artificial Intelligence and Statistics (PMLR, 2017), pp. 1273–1282
2. J. Konečný, H.B. McMahan, F.X. Yu, P. Richtárik, A.T. Suresh, D. Bacon, Federated optimization: Distributed machine learning for on-device intelligence, arXiv preprint arXiv:1610.02527 (2016)
3. Q. Yang, Y. Liu, T. Chen, Y. Tong, Federated machine learning: Concept and applications, ACM Transactions on Intelligent Systems and Technology (TIST) 10(2), 1 (2019)
4. T. Li, A.K. Sahu, A. Talwalkar, V. Smith, Federated learning: Challenges, methods, and future directions, IEEE Signal Processing Magazine 37(3), 50 (2020)
5. N. Truong, K. Sun, M. Elkashlan, H.V. Poor, L. Hanzo, Privacy preservation in federated learning: Insights from the GDPR perspective, Computer Law and Security Review 41, 105550 (2021). DOI 10.1016/j.clsr.2020.105550. URL https://doi.org/10.1016/j.clsr.2020.105550
6. P. Blanchard, E.M. El Mhamdi, R. Guerraoui, J. Stainer, Machine learning with adversaries: Byzantine tolerant gradient descent, Advances in Neural Information Processing Systems 30 (2017)
7. Y. Shi, X. Hu, S. Bai, Y. Liu, H. Lin, A Byzantine-robust federated learning against adversarial-majority attacks, The Journal of Supercomputing 81(10), 1133 (2025)
8. D. Yin, Y. Chen, R. Kannan, P. Bartlett, in International Conference on Machine Learning (PMLR, 2018), pp. 5650–5659
9. R. Guerraoui, S. Rouault, et al., in International Conference on Machine Learning (PMLR, 2018), pp. 3521–3530
10. C. Fung, C.J. Yoon, I. Beschastnikh, The limitations of federated learning in Byzantine environments, Proceedings of the 23rd International Symposium on Research in Attacks, Intrusions and Defenses (RAID), pp. 301–316 (2020)
11. K. Pillutla, S.M. Kakade, Z. Harchaoui, Robust aggregation for federated learning, IEEE Transactions on Signal Processing 70, 1142 (2022)
12. C. He, M. Annavaram, S.A. Avestimehr, Byzantine-robust decentralized federated learning, IEEE Transactions on Signal Processing 69, 4583 (2021)
13. G. Baruch, M. Baruch, Y. Goldberg, A little is enough: Circumventing defenses for distributed learning, Advances in Neural Information Processing Systems 32 (2019)
14. C. Xie, O. Koyejo, I. Gupta, in Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI) (PMLR, 2020), pp. 261–270
15. P. Blanchard, E.M. El Mhamdi, R. Guerraoui, J. Stainer, in Proceedings of the 31st International Conference on Neural Information Processing Systems (2017)
16. D. Yin, Y. Chen, K. Ramchandran, P. Bartlett, in International Conference on Machine Learning (2018)
17. X. Li, H. Zhou, Y. Chen, P. Zhang, in Proceedings of the 33rd International Conference on Neural Information Processing Systems (NeurIPS '19) (2019)
18. R. Wang, X. Wang, H. Chen, S. Picek, Z. Liu, K. Liang, BRIEF: Byzantine-robust federated learning via optimal voting, arXiv preprint arXiv:2208.10161 (2022)
19. X. Cao, M. Fang, J. Liu, N.Z. Gong, in Proceedings of the Network and Distributed System Security Symposium (NDSS) (2021)
20. J. Zhang, C. Ge, F. Hu, B. Chen, RobustFL: Robust federated learning against poisoning attacks in industrial IoT systems, IEEE Transactions on Industrial Informatics 18(9), 6388 (2022)
21. A. Fallah, A. Mokhtari, A. Ozdaglar, in Neural Information Processing Systems (2020)
22. E.J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, L. Wang, W. Chen, LoRA: Low-rank adaptation of large language models, arXiv preprint arXiv:2106.09685 (2021)
23. J. Li, M. Zhao, Y. Chen, FedRAD: Reinforcement learning-based adaptive defense for Byzantine-robust federated learning, IEEE Transactions on Neural Networks and Learning Systems (2023)
24. L. Wang, R. Kumar, Q. Li, TrustFed: Dynamic trust modeling for adaptive aggregation in federated learning, Neural Networks 176, 106–119 (2024)
25. Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86(11), 2278 (2002)
26. T. Clanuwat, A. Aizawa, H. Kato, Y. Matsumoto, K. Usami, R. Yamakawa, H.L. Goh, K. Nakanishi, D. Kawahara, I. Kobayashi, Deep learning for classical Japanese literature, arXiv preprint arXiv:1804.05097 (2018)
27. H. Xiao, K. Rasul, R. Vollgraf, Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms, arXiv preprint arXiv:1708.07747 (2017)
