Defending the Edge: Representative-Attention Defense against Backdoor Attacks in Federated Learning
Federated learning (FL) remains highly vulnerable to adaptive backdoor attacks that preserve stealth by closely imitating benign update statistics. Existing defenses predominantly rely on anomaly detection in parameter or gradient space, overlooking behavioral constraints that backdoor attacks must satisfy to ensure reliable trigger activation. These anomaly-centric methods fail against adaptive attacks that normalize update magnitudes and mimic benign statistical patterns while preserving backdoor functionality, creating a fundamental detection gap. To address this limitation, this paper introduces FeRA (Federated Representative Attention) – a novel attention-driven defense that shifts the detection paradigm from anomaly-centric to consistency-centric analysis. FeRA exploits the intrinsic need for backdoor persistence across training rounds, identifying malicious clients through suppressed representation-space variance, a property orthogonal to traditional magnitude-based statistics. The framework conducts multi-dimensional behavioral analysis combining spectral and spatial attention, directional alignment, mutual similarity, and norm inflation across two complementary detection mechanisms: consistency analysis and norm-inflation detection. Through this mechanism, FeRA isolates malicious clients that exhibit low-variance consistency or magnitude amplification. Extensive evaluation across six datasets, nine attacks, and three model architectures under both Independent and Identically Distributed (IID) and non-IID settings confirms FeRA achieves superior backdoor mitigation. Across different non-IID settings, FeRA achieves the lowest average Backdoor Accuracy (BA) of approximately 1.67%, while maintaining high clean accuracy compared to other state-of-the-art defenses. The code is available at https://github.com/Peatech/FeRA_defense.git.
💡 Research Summary
Federated learning (FL) enables collaborative model training without centralizing raw data, but its decentralized nature makes it vulnerable to backdoor attacks that embed hidden malicious behavior while preserving high clean accuracy. Existing defenses largely rely on detecting statistical anomalies in parameter or gradient space—such as unusually large norms, divergent directions, or outlier clusters—and therefore fail against adaptive attacks that deliberately mimic benign update statistics. This paper introduces FeRA (Federated Representative Attention), a novel defense that shifts the detection focus from anomaly‑centric to consistency‑centric analysis. The key insight is that a successful backdoor must remain reliable across many training rounds; this reliability imposes behavioral constraints that manifest as suppressed variance in the model’s representation space and, in some cases, inflated update norms.
FeRA computes six complementary metrics: (1) spectral attention (variance of principal components in client embeddings), (2) spatial attention (variance concentration across activation maps), (3) directional attention (cosine similarity between a client’s update and the global mean update), (4) mutual similarity (CKA‑based similarity matrix among clients), (5) spectral ratio (relationship between L2 norm and spectral norm of updates), and (6) norm‑inflation score (absolute magnitude deviation). Two detection mechanisms are built on these metrics. The consistency filter flags clients that simultaneously exhibit low spectral and spatial attention, low directional attention, and high mutual similarity—signatures of variance suppression needed for backdoor persistence. The norm‑inflation filter catches clients whose spectral norms are unusually large, targeting model‑replacement or scaling attacks.
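To make the metric definitions above concrete, here is an illustrative NumPy sketch (not the authors' implementation) of three of them: spectral attention as the fraction of variance captured by the top principal component of a client's embeddings, mutual similarity via linear CKA between two clients' embedding matrices, and the norm-inflation filter as a median-based threshold on update norms. The function names and the inflation factor are hypothetical choices for this sketch.

```python
import numpy as np

def spectral_attention(embeddings):
    """Fraction of variance explained by the top principal component
    of a client's representation embeddings (rows = samples)."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Singular values of the centered matrix yield the PCA variances.
    s = np.linalg.svd(centered, compute_uv=False)
    var = s ** 2 / max(len(embeddings) - 1, 1)
    return var[0] / var.sum()  # low value => variance spread out; high => concentrated

def linear_cka(x, y):
    """Linear CKA similarity between two clients' embedding matrices."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    num = np.linalg.norm(x.T @ y, 'fro') ** 2
    den = np.linalg.norm(x.T @ x, 'fro') * np.linalg.norm(y.T @ y, 'fro')
    return num / den  # 1.0 for identical representations

def norm_inflation(updates, factor=3.0):
    """Flag clients whose update L2 norm exceeds `factor` times the
    median client norm (factor=3.0 is a hypothetical threshold)."""
    norms = np.array([np.linalg.norm(u) for u in updates])
    return norms > factor * np.median(norms)
```

A full consistency filter would combine these with the spatial and directional metrics and exclude flagged clients from aggregation; this sketch only shows how each individual score could be computed.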
Extensive experiments cover six benchmark datasets (including CIFAR‑10/100, SVHN), three neural architectures (ResNet‑18, VGG‑11, MobileNet), and nine backdoor attack variants (model‑replacement, distributed backdoor, Neurotoxin, IBA, adaptive scaling, etc.) under both IID and highly non‑IID client data distributions. Compared against state‑of‑the‑art defenses such as FoolsGold, FLAME, MKrum, Bulyan, and FedAvgCKA, FeRA consistently achieves the lowest backdoor success rate—averaging 1.67% in non‑IID settings—while preserving clean accuracy within 1.5% of the non‑attacked baseline. Notably, FeRA remains effective when attackers normalize update magnitudes, embed malicious updates inside dense benign clusters, or distribute the trigger across colluding clients, because it simultaneously monitors variance‑based consistency and norm‑based anomalies.
The authors release the full implementation (https://github.com/Peatech/FeRA_defense.git), facilitating reproducibility and future extensions. By exploiting the intrinsic consistency requirement of backdoors and employing multi‑dimensional attention metrics, FeRA establishes a robust, practical defense paradigm for securing federated learning against sophisticated, adaptive backdoor threats.