Guarding the Middle: Protecting Intermediate Representations in Federated Split Learning
Big data scenarios, where massive, heterogeneous datasets are distributed across clients, demand scalable, privacy-preserving learning methods. Federated learning (FL) enables decentralized training of machine learning (ML) models across clients without centralizing their data. Decentralized training, however, places a computational burden on client devices. U-shaped federated split learning (UFSL) offloads a fraction of the client computation to the server while keeping both data and labels on the client side. However, the intermediate representations (i.e., smashed data) that clients share with the server can expose clients’ private data. To reduce this exposure, this work proposes k-anonymous differentially private UFSL (KD-UFSL), which leverages privacy-enhancing techniques, namely microaggregation and differential privacy, to minimize data leakage from the smashed data transferred to the server. We first demonstrate that an adversary can recover private client data from intermediate representations via a data-reconstruction attack, and then present KD-UFSL as a privacy-enhancing solution that mitigates this risk. Our experiments on four benchmark datasets indicate that KD-UFSL increases the mean squared error between the actual and reconstructed images by up to 50% in some cases and decreases their structural similarity by up to 40%. More importantly, KD-UFSL improves privacy while preserving the utility of the global model, highlighting its suitability for large-scale big data applications where privacy and utility must be balanced.
💡 Research Summary
The paper addresses a critical privacy vulnerability in U‑shaped Federated Split Learning (UFSL), a hybrid of federated learning (FL) and split learning (SL) that keeps both raw data and labels on client devices while off‑loading the middle part of the model to a server. Although UFSL solves the label‑exposure problem of two‑way federated split learning, the intermediate “smashed” representations transmitted from the client’s head network to the server can be exploited to reconstruct the original inputs.
To demonstrate this risk, the authors construct a data‑reconstruction adversary. The adversary, assumed to be an honest‑but‑curious server, knows the client head architecture and possesses an auxiliary dataset with a similar distribution. By training an inversion network on this auxiliary data, the server learns to map smashed features back to raw images, achieving high fidelity reconstructions.
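The attack pattern can be sketched in a few lines of NumPy. The linear `head` and the least-squares inverter below are illustrative stand-ins for the paper's CNN head and trained inversion network (the dimensions, `W`, and the auxiliary data are all assumptions), but they show the core idea: a server that sees smashed features and holds similarly distributed auxiliary data can fit a map from features back to inputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical client "head" network: a fixed random linear layer + ReLU.
# The paper uses a CNN head; this stand-in only illustrates the attack.
W = rng.standard_normal((32, 64)) * 0.1          # 64-dim inputs -> 32-dim smashed data

def head(x):
    """Client head: maps raw inputs to smashed representations."""
    return np.maximum(x @ W.T, 0.0)

# Honest-but-curious server: auxiliary data from a similar distribution.
aux_x = rng.standard_normal((500, 64))
aux_z = head(aux_x)

# Fit a linear inversion model z -> x by least squares (a stand-in for
# training an inversion network on the auxiliary pairs).
G, *_ = np.linalg.lstsq(aux_z, aux_x, rcond=None)

# Attack a victim client's smashed data.
victim_x = rng.standard_normal((10, 64))
victim_z = head(victim_x)
recon_x = victim_z @ G

err = float(np.mean((victim_x - recon_x) ** 2))
print(f"reconstruction MSE: {err:.3f}")
```

Even this crude linear inverter recovers inputs noticeably better than guessing the data mean, which is why unprotected smashed data is a leakage channel.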
In response, the authors propose KD‑UFSL, a privacy‑enhanced UFSL framework that combines two complementary mechanisms: (1) differential privacy (DP) applied directly to the raw client data, and (2) model‑level k‑anonymity realized through micro‑aggregation of smashed features. Concretely, each client first adds Gaussian noise N(0,σ²) to its input batch, guaranteeing (ε,δ)‑DP when σ satisfies the standard sensitivity bound. After the head network processes the noisy inputs, the server groups clients into clusters of at least k members each round. Within each group, the smashed outputs are averaged, producing a group‑level representation that is indistinguishable from any of the k members. This aggregated representation is then fed through the server‑side body network, and the resulting server‑smashed data is sent back to all group members for final tail‑network inference.
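The two mechanisms compose roughly as follows. The head network, the parameter values (ε, δ, k, sensitivity), and the contiguous grouping rule below are illustrative assumptions rather than the paper's exact choices; the noise scale uses the standard Gaussian-mechanism calibration.

```python
import numpy as np

rng = np.random.default_rng(1)

# --- Assumed parameters (the paper's exact calibration may differ) ---
epsilon, delta = 1.0, 1e-5
sensitivity = 1.0                 # assumed L2 sensitivity of the (clipped) inputs
k = 5                             # minimum micro-aggregation group size

# Gaussian-mechanism noise scale for (eps, delta)-DP (standard bound).
sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon

W = rng.standard_normal((16, 8)) * 0.1

def client_head(x):
    """Toy stand-in for the client head network."""
    return np.maximum(x @ W, 0.0)

# Step 1: each client perturbs its raw inputs with Gaussian noise.
clients_x = [rng.standard_normal((4, 16)) for _ in range(10)]
noisy_x = [x + rng.normal(0.0, sigma, x.shape) for x in clients_x]

# Step 2: clients compute smashed data from the noisy inputs.
smashed = [client_head(x) for x in noisy_x]

# Step 3: clients are grouped into clusters of at least k, and the smashed
# data is averaged within each group (micro-aggregation -> k-anonymity).
groups = [list(range(i, i + k)) for i in range(0, len(smashed) - k + 1, k)]
group_smashed = [np.mean([smashed[c] for c in g], axis=0) for g in groups]

print(f"sigma = {sigma:.2f}, groups of size >= {k}: {len(group_smashed)}")
```

The averaged group representation is what reaches the server-side body network, so no individual client's smashed features are ever exposed on their own.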
The dual‑layer protection has several advantages. Adding DP noise at the data level reduces the sensitivity of the smashed features, making them less informative for inversion. Micro‑aggregation enforces k‑anonymity at the model level, ensuring that any single client’s contribution cannot be isolated by the adversary. Together, they substantially increase the difficulty of reconstructing original inputs while preserving the learning utility.
Experimental evaluation spans four benchmark datasets—MNIST, Fashion‑MNIST, CIFAR‑10, and a medical imaging set—covering both grayscale and RGB modalities. Under a strong reconstruction attack, KD‑UFSL raises the mean‑squared error (MSE) between original and reconstructed images by up to 50% and reduces the structural similarity index (SSIM) by up to 40% compared with vanilla UFSL. Importantly, the global model’s test accuracy remains comparable to the baseline when appropriate values of k (typically 5–10) and ε are chosen, demonstrating that privacy gains do not come at a prohibitive utility cost.
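The two reported metrics are straightforward to reproduce in a minimal sketch. The images below are synthetic stand-ins for dataset samples, and the SSIM here is the global single-window formula rather than the sliding-window version used by imaging libraries, so absolute values will differ from library output.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two images in [0, 1]."""
    return float(np.mean((a - b) ** 2))

def ssim(a, b, data_range=1.0):
    """Global (single-window) SSIM with the standard constants."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    va, vb = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return float(((2 * mu_a * mu_b + c1) * (2 * cov + c2))
                 / ((mu_a ** 2 + mu_b ** 2 + c1) * (va + vb + c2)))

rng = np.random.default_rng(2)
img = rng.random((28, 28))                         # stand-in for a 28x28 image
# A faithful reconstruction (weak defense) vs. a degraded one (strong defense).
good_recon = np.clip(img + rng.normal(0, 0.05, img.shape), 0, 1)
bad_recon = np.clip(img + rng.normal(0, 0.50, img.shape), 0, 1)

print(f"MSE  weak/strong defense: {mse(img, good_recon):.4f} / {mse(img, bad_recon):.4f}")
print(f"SSIM weak/strong defense: {ssim(img, good_recon):.3f} / {ssim(img, bad_recon):.3f}")
```

Higher MSE and lower SSIM between original and reconstructed images are the directions in which KD‑UFSL moves the attacker's output.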
The paper also discusses practical considerations. Larger k values improve privacy but increase communication overhead and require sufficient client population to form groups each round. The authors provide guidance on selecting k and DP parameters based on client pool size, network bandwidth, and acceptable performance degradation.
In summary, KD‑UFSL introduces a novel combination of differential privacy and k‑anonymity to protect intermediate representations in federated split learning. By adding noise to raw inputs and aggregating smashed features across client groups, the framework mitigates data‑reconstruction attacks while maintaining model performance, making it well‑suited for privacy‑sensitive, large‑scale applications such as IoT, healthcare, and smart‑city analytics.