Protection against Source Inference Attacks in Federated Learning
Federated Learning (FL) was initially proposed as a privacy-preserving machine learning paradigm. However, FL has been shown to be susceptible to a series of privacy attacks. Recently, there has been concern about the Source Inference Attack (SIA), where an honest-but-curious central server attempts to identify exactly which client owns a given data point used during training. Alarmingly, standard gradient obfuscation techniques based on Differential Privacy have been shown to be ineffective against SIAs, at least without severely diminishing model accuracy. In this work, we propose a defense against SIAs within the widely studied shuffle model of FL, where an honest shuffler acts as an intermediary between the clients and the server. First, we demonstrate that standard naive shuffling alone is insufficient to prevent SIAs. To defend effectively against SIAs, shuffling must be applied at a more granular level; we propose a novel combination of parameter-level shuffling with the residue number system (RNS). Our approach provides robust protection against SIAs without affecting the accuracy of the joint model and can be seamlessly integrated with other privacy protection mechanisms. We conduct experiments on a series of models and datasets, confirming that standard shuffling approaches fail to prevent SIAs and that, in contrast, our proposed method reduces the attack’s accuracy to the level of random guessing.
💡 Research Summary
Federated learning (FL) promises privacy by keeping raw data on client devices, yet recent work has shown that an honest‑but‑curious central server can launch source inference attacks (SIAs) to pinpoint which client contributed a specific training record. Existing defenses—differential privacy, regularization, or data‑reconstruction safeguards—either severely degrade model utility or fail to reduce SIA success rates. A popular mitigation is the shuffle model, where a trusted shuffler permutes client updates before they reach the server. This paper first demonstrates that naïve shuffling at the model, layer, or even parameter level is insufficient: an attacker equipped with a small shadow dataset for a target client can reverse the permutation by evaluating each shuffled update on the shadow data and selecting the one with highest accuracy. These “reconstruction attacks” expose a fundamental weakness of the shuffle model.
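The de-shuffling step described above can be sketched on a toy linear model. The function names, the client weight vectors, and the synthetic shadow data below are illustrative assumptions, not the paper's experimental setup; the point is only that an attacker with shadow data labelled by the target client's model can rank the shuffled updates by accuracy and undo the permutation.

```python
import random

def accuracy(w, shadow):
    """Fraction of shadow points (x, y) that the linear model w classifies correctly."""
    correct = 0
    for x, y in shadow:
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1
        correct += (pred == y)
    return correct / len(shadow)

def deshuffle_attack(shuffled_updates, shadow):
    """Return the position of the shuffled update that best fits the shadow data."""
    scores = [accuracy(w, shadow) for w in shuffled_updates]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy demo: three client models; shadow data is labelled by client 0's model.
random.seed(0)
clients = [[1.0, 2.0], [-3.0, 0.5], [0.2, -1.5]]
shadow = []
for _ in range(50):
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    y = 1 if clients[0][0] * x[0] + clients[0][1] * x[1] > 0 else -1
    shadow.append((x, y))

perm = [2, 0, 1]                       # the shuffler's (secret) permutation
shuffled = [clients[i] for i in perm]  # what the server observes
best = deshuffle_attack(shuffled, shadow)
print(best)  # position 1, i.e. client 0's update, despite the shuffle
```

Because whole model updates travel as intact units, the permutation adds essentially no anonymity against an attacker who can score each unit independently.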
To close this gap, the authors propose a novel defense that combines parameter‑level shuffling with the Residue Number System (RNS). Each model parameter is encoded as a vector of residues modulo a set of pairwise‑coprime integers. The shuffler independently permutes each residue dimension across clients, making it mathematically infeasible for the server to map a particular residue vector back to its original client using only accuracy comparisons. Reconstruction of the original parameter values is possible only after the server receives all residue dimensions and applies the Chinese Remainder Theorem, a step performed after aggregation and thus invisible to the attacker.
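The RNS mechanism can be illustrated with a minimal sketch. The moduli, the fixed-point parameter values, and the single-parameter aggregation below are illustrative assumptions chosen for brevity; the key property shown is that per-dimension sums are permutation-invariant, so the server can aggregate shuffled residues and apply the Chinese Remainder Theorem only once, after aggregation.

```python
import random
from functools import reduce

MODULI = (251, 253, 255, 256)  # pairwise coprime; product ~ 4.1e9 (assumed choice)

def to_rns(x, moduli=MODULI):
    """Encode a non-negative integer as its residues modulo each modulus."""
    return [x % m for m in moduli]

def from_rns(residues, moduli=MODULI):
    """Chinese Remainder Theorem: recover x (mod prod(moduli)) from its residues."""
    M = reduce(lambda a, b: a * b, moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)  # modular inverse (Python 3.8+)
    return x % M

# One fixed-point parameter per client (quantized weights, illustrative values).
random.seed(1)
clients = [12345, 67890, 424242]
encoded = [to_rns(p) for p in clients]

# Shuffler: independently permute each residue dimension across clients, so no
# residue vector observed by the server corresponds to a single client.
shuffled = []
for dim in range(len(MODULI)):
    column = [e[dim] for e in encoded]
    random.shuffle(column)
    shuffled.append(column)

# Server: sum each residue dimension modulo its modulus (sums ignore the
# permutations), then apply CRT once to recover the aggregate parameter.
agg_residues = [sum(col) % m for col, m in zip(shuffled, MODULI)]
print(from_rns(agg_residues))  # 504477 == 12345 + 67890 + 424242
```

Since each residue dimension is permuted independently, the server never sees a complete residue vector attributable to one client, yet the aggregate is recovered exactly as long as the sum stays below the product of the moduli.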
The method preserves the original model’s predictive performance because the underlying numeric values are unchanged; only their representation and ordering are altered. Experiments on MNIST (CNN), CIFAR‑10 (CNN), and CIFAR‑100 (ResNet‑18) confirm that standard shuffling leaves SIA accuracy around 60‑80 %, whereas the RNS‑based granular shuffling reduces it to the random‑guess baseline (≈1 / n). Model accuracy drops by less than 0.1 % across all settings. Communication overhead grows modestly (≈1.5×) due to transmitting multiple residue components, which is acceptable in cross‑silo scenarios with a limited number of well‑connected clients. The defense is compatible with differential privacy and secure aggregation, requiring only a trusted shuffler that can be realized via hardware enclaves or distributed MixNets, thus meeting the paper’s design specifications: strong protection, integrability, reasonable communication cost, unchanged model utility, and minimal trust assumptions.
In summary, the work identifies a critical vulnerability of existing shuffle‑based FL defenses against SIAs and introduces a mathematically grounded, parameter‑level RNS shuffling technique that effectively neutralizes the attack while maintaining model performance and compatibility with other privacy mechanisms. This contribution advances the state of privacy‑preserving FL and provides a practical blueprint for deploying robust SIA defenses in real‑world cross‑silo collaborations.