ERIS: Enhancing Privacy and Communication Efficiency in Serverless Federated Learning
Scaling federated learning (FL) to billion-parameter models introduces critical trade-offs between communication efficiency, model accuracy, and privacy guarantees. Existing solutions often tackle these challenges in isolation, sacrificing accuracy or relying on costly cryptographic tools. We propose ERIS, a serverless FL framework that balances privacy and accuracy while eliminating the server bottleneck and distributing the communication load. ERIS combines a model partitioning strategy, distributing aggregation across multiple client-side aggregators, with a distributed shifted gradient compression mechanism. We theoretically prove that ERIS (i) converges at the same rate as FedAvg under standard assumptions, and (ii) bounds mutual information leakage inversely with the number of aggregators, enabling strong privacy guarantees with no accuracy degradation. Experiments across image and text tasks, including large language models, confirm that ERIS achieves FedAvg-level accuracy while substantially reducing communication cost and improving robustness to membership inference and reconstruction attacks, without relying on heavy cryptography or noise injection.
💡 Research Summary
The paper introduces ERIS, a serverless federated learning (FL) framework designed to simultaneously address three major challenges that arise when scaling FL to billion‑parameter models: (1) prohibitive communication costs caused by transmitting full model updates through a central server, (2) degradation of model accuracy when aggressive compression is applied, and (3) privacy leakage from gradients that can be exploited by membership‑inference or reconstruction attacks.
Core design. ERIS removes the central server entirely. Instead, each client performs two operations before sending any data: (i) shifted compression, where a local reference vector sₜᵏ is maintained and the client compresses the shifted gradient (g̃ₜᵏ − sₜᵏ) using an unbiased ω‑compression operator (e.g., random sparsification or quantization); the reference is then updated with a step size γₜ. (ii) model partitioning, where the compressed vector is split into A disjoint shards using a set of binary masks {mₜ(a)} that are mutually exclusive and collectively exhaustive. Each shard vₜᵏ,(a) is sent to a distinct client‑side aggregator a.
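The client-side pipeline above can be sketched in a few lines. This is a minimal illustration under assumed details: the function names, the choice of random sparsification as the ω‑compression operator, and the uniform random shard assignment are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_sparsify(v, p, rng):
    """Unbiased random sparsification: keep each coordinate with prob. p,
    rescale survivors by 1/p so the expectation equals v."""
    mask = rng.random(v.shape) < p
    return np.where(mask, v / p, 0.0)

def client_step(grad, s_ref, A, p, gamma, rng):
    """One client round: (i) compress the shifted gradient, then update the
    local reference; (ii) split the result into A disjoint shards."""
    c = rand_sparsify(grad - s_ref, p, rng)        # (i) shifted compression
    s_new = s_ref + gamma * c                      # reference update, step size gamma
    owner = rng.integers(0, A, size=grad.shape)    # mutually exclusive, exhaustive masks
    shards = [np.where(owner == a, c, 0.0) for a in range(A)]
    return shards, c, s_new

d, A = 8, 4
grad = rng.normal(size=d)
shards, c, s_new = client_step(grad, np.zeros(d), A, p=0.5, gamma=0.9, rng=rng)
```

Each entry of `shards` would then be sent to a distinct aggregator; because the masks are disjoint and exhaustive, the shards recombine exactly to the compressed vector `c`.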
Each aggregator collects the corresponding shards from all K clients, adds back the global reference portion sₜ(a), and computes a permutation‑invariant average: vₜ(a) = sₜ(a) + (1/K)∑ₖ vₜᵏ,(a). The aggregator then updates its segment of the global model xₜ₊₁(a) = xₜ(a) + λₜ vₜ(a) and broadcasts the updated segment (and the new reference segment) back to all clients. Because every aggregator only ever sees a small, randomly selected subset of each client’s update, no single entity can reconstruct the full gradient, providing an inherent privacy shield without cryptographic primitives.
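The aggregator's side is equally simple. The sketch below mirrors the update rule stated above (xₜ₊₁(a) = xₜ(a) + λₜ vₜ(a)); the function name and the toy setup are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def aggregator_step(client_shards, s_seg, x_seg, lam):
    """Aggregator a: average its shard over the K clients, add back its
    reference segment, then step its segment of the global model."""
    K = len(client_shards)
    v_seg = s_seg + sum(client_shards) / K    # permutation-invariant average
    x_next = x_seg + lam * v_seg              # x_{t+1}^{(a)} = x_t^{(a)} + lam * v_t^{(a)}
    return x_next, v_seg

# Toy round: K clients each send a shard for this aggregator's segment.
rng = np.random.default_rng(1)
K, d_seg, lam = 5, 4, 0.1
client_shards = [rng.normal(size=d_seg) for _ in range(K)]
x_next, v_seg = aggregator_step(client_shards, np.zeros(d_seg), np.zeros(d_seg), lam)
```

The updated segment `x_next` (and the new reference segment) would then be broadcast back to all clients.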
Theoretical contributions.
- Convergence: Under standard L‑smoothness and unbiased gradient‑estimator assumptions, the authors prove (Theorem 3.6) that ERIS achieves the same O(1/√T) convergence rate as FedAvg. The bound depends only on the variance of the gradient estimator (C₂) and the compression factor ω; unlike many DP‑ or compression‑based FL analyses, it contains no term that grows with the number of communication rounds T.
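Schematically, nonconvex rates of this kind take the following form (this is a generic sketch of the shape of such bounds, not a reproduction of the paper's Theorem 3.6, whose exact constants are not given in this summary):

```latex
\min_{t < T} \; \mathbb{E}\left\| \nabla f(x_t) \right\|^2
  \;\le\; \mathcal{O}\!\left( \frac{f(x_0) - f^{\star} \;+\; (1+\omega)\, C_2}{\sqrt{T}} \right)
```

The point emphasized above is that every term decays as 1/√T: compression (ω) and gradient variance (C₂) only inflate the constant, and nothing accumulates across rounds.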
- Privacy: Using an information‑theoretic analysis, the paper bounds the mutual information between a client’s raw update Δ and the view of any single aggregator; the bound shrinks as the number of aggregators A grows and as the fraction p of parameters each shard contains falls. Thus, adding aggregators or compressing more aggressively directly reduces leakage, a provable privacy‑amplification effect obtained without injecting noise.
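The quantity behind this amplification is how little of a client's update any single aggregator observes. A toy simulation (the dimension, sparsity level, and function names are illustrative assumptions, not figures from the paper) shows the observed fraction falling as the number of aggregators grows:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 100_000   # model dimension (illustrative)
p = 0.05      # fraction of coordinates surviving compression (assumed)

def observed_fraction(A, trials=20):
    """Average fraction of a client's raw update visible to one aggregator,
    under random p-sparsification followed by a uniform split into A shards."""
    fracs = []
    for _ in range(trials):
        kept = rng.random(d) < p             # coordinates surviving compression
        owner = rng.integers(0, A, size=d)   # disjoint shard assignment
        fracs.append(np.mean(kept & (owner == 0)))  # aggregator 0's view
    return float(np.mean(fracs))

for A in (1, 4, 16):
    print(A, observed_fraction(A))   # shrinks roughly as p / A
```

Each aggregator's view covers roughly a p/A fraction of the coordinates, which is why both sharding and compression tighten the leakage bound.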
Empirical evaluation. Experiments span five datasets (CIFAR‑10, ImageNet‑mini, and FEMNIST for image tasks; GLUE‑SST‑2 and WikiText for text) plus a 2.7 B‑parameter language model. Baselines include vanilla FedAvg, DP‑FedAvg, compressed‑plus‑DP variants, and Secure Aggregation. Key findings:
- Accuracy: ERIS matches FedAvg within 0.1 % on all tasks, while DP‑based methods lose 2–5 % accuracy.
- Communication: ERIS transmits on average under 3.3 % of the model size per round (a reduction of roughly 96 %), with reductions of up to 99.7 %. End‑to‑end latency per round improves by up to 10³×.
- Robustness to attacks: Membership inference success drops by >30 % relative to FedAvg, and reconstruction attacks produce significantly poorer visual/textual fidelity.
Strengths and limitations. The framework elegantly combines compression and decentralization to eliminate the server bottleneck, achieve near‑FedAvg utility, and provide quantifiable privacy gains without heavy cryptography or utility‑damaging noise. However, the privacy benefit scales with the number of aggregators, raising practical questions about how to provision and trust many aggregators in real networks. The current analysis assumes synchronous communication; extending to asynchronous or partially connected topologies remains open. Moreover, malicious aggregators are not explicitly defended against, so additional safeguards would be needed for hostile environments.
Conclusion. ERIS demonstrates that serverless FL with carefully designed shifted compression and model partitioning can deliver FedAvg‑level performance, drastically lower communication overhead, and provable information‑theoretic privacy. Future work should explore dynamic aggregator placement, asynchronous protocols, and integration with lightweight cryptographic checks to harden the system against adversarial aggregators, thereby moving ERIS closer to deployment in real‑world large‑scale federated learning scenarios.