The Gaussian-Head OFL Family: One-Shot Federated Learning from Client Global Statistics
Classical federated learning relies on a multi-round iterative process of model exchange and aggregation between server and clients, incurring high communication costs and privacy risks from repeated model transmissions. In contrast, one-shot federated learning (OFL) alleviates these limitations by reducing communication to a single round, thereby lowering overhead and enhancing practical deployability. Nevertheless, most existing one-shot approaches remain impractical or constrained: they often depend on the availability of a public dataset, assume homogeneous client models, or require uploading additional data or model information. To overcome these issues, we introduce the Gaussian-Head OFL (GH-OFL) family, a suite of one-shot federated methods that assume class-conditional Gaussianity of pretrained embeddings. Clients transmit only sufficient statistics (per-class counts and first/second-order moments), and the server builds heads via three components: (i) closed-form Gaussian heads (NB/LDA/QDA) computed directly from the received statistics; (ii) FisherMix, a linear head with cosine margin trained on synthetic samples drawn in an estimated Fisher subspace; and (iii) Proto-Hyper, a lightweight low-rank residual head that refines Gaussian logits via knowledge distillation on those synthetic samples. In our experiments, GH-OFL methods deliver state-of-the-art robustness and accuracy under strong non-IID skew while remaining strictly data-free.
💡 Research Summary
The paper introduces GH‑OFL (Gaussian‑Head One‑Shot Federated Learning), a novel framework that eliminates the multi‑round communication overhead typical of conventional federated learning (FL). Instead of transmitting model updates or raw data, each client computes and securely aggregates only class‑wise sufficient statistics from a frozen pretrained encoder: per‑class sample counts, first‑order sums, and second‑order moments (full covariance or diagonal variance). To further reduce bandwidth and improve privacy, clients optionally project embeddings into a low‑dimensional random‑projection space before accumulating statistics, which are additive across clients.
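The statistics described above are additive, so the server can aggregate them by simple summation. A minimal sketch (function names and shapes are illustrative, not the authors' code):

```python
import numpy as np

def client_statistics(embeddings, labels, num_classes):
    """Per-class sufficient statistics from a frozen encoder's embeddings:
    counts, first-order sums, and second-order (outer-product) sums.
    Each quantity is additive across clients."""
    d = embeddings.shape[1]
    counts = np.zeros(num_classes)
    s1 = np.zeros((num_classes, d))        # sum of x per class
    s2 = np.zeros((num_classes, d, d))     # sum of x x^T per class
    for x, y in zip(embeddings, labels):
        counts[y] += 1
        s1[y] += x
        s2[y] += np.outer(x, x)
    return counts, s1, s2

def aggregate(stats_list):
    """Server-side aggregation: element-wise sums of each client's statistics
    (in practice this sum would run under secure aggregation)."""
    counts = sum(s[0] for s in stats_list)
    s1 = sum(s[1] for s in stats_list)
    s2 = sum(s[2] for s in stats_list)
    return counts, s1, s2
```

The optional random projection mentioned above would simply replace each embedding `x` with `R @ x` for a shared random matrix `R` before accumulation, shrinking `d` and thus the second-order moments quadratically.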
From the global aggregated statistics the server constructs three closed‑form Gaussian discriminant heads: (i) Naïve Bayes with diagonal covariances (NB), (ii) Linear Discriminant Analysis with a shared pooled covariance (LDA), and (iii) Quadratic Discriminant Analysis with class‑specific full covariances (QDA). All heads are derived analytically, requiring only the aggregated moments, and are stabilized by a standard shrinkage regularizer.
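As an illustration of the closed-form construction, here is a sketch of the LDA head built purely from the aggregated moments, with the shrinkage regularizer applied to the pooled covariance (an assumed implementation consistent with the description above, not the authors' code):

```python
import numpy as np

def lda_head(counts, s1, s2, shrink=0.1):
    """LDA from aggregated moments: class means, pooled covariance with
    shrinkage toward a scaled identity, then the linear scores
    w_c = Sigma^{-1} mu_c,  b_c = -0.5 mu_c^T Sigma^{-1} mu_c + log pi_c."""
    n = counts.sum()
    mu = s1 / counts[:, None]
    # pooled within-class covariance recovered from second-order sums:
    # Sigma = (sum_c S2_c - sum_c n_c mu_c mu_c^T) / n
    sigma = (s2.sum(0)
             - (counts[:, None, None]
                * np.einsum('cd,ce->cde', mu, mu)).sum(0)) / n
    d = sigma.shape[0]
    sigma = (1 - shrink) * sigma + shrink * (np.trace(sigma) / d) * np.eye(d)
    inv = np.linalg.inv(sigma)
    W = mu @ inv                                   # (C, d) weight rows
    b = -0.5 * np.einsum('cd,cd->c', W, mu) + np.log(counts / n)
    return W, b

def predict(W, b, X):
    return np.argmax(X @ W.T + b, axis=1)
```

NB replaces the full pooled covariance with per-class diagonal variances, and QDA keeps per-class full covariances, but all three heads follow the same moments-only recipe.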
Beyond these static heads, the server estimates a Fisher discriminative subspace by solving a generalized eigenvalue problem on between‑class and within‑class scatter matrices derived from the same statistics. Projecting the data onto the top eigenvectors yields a compact subspace where class separation is maximized. In this subspace the server synthesizes class‑conditional Gaussian samples using the estimated means and (shrunk) covariances, optionally adding small perturbations along dominant Fisher directions to probe decision margins. No real client data are ever accessed; the synthesis is entirely data‑free.
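The subspace estimation and data-free synthesis can be sketched as follows, using the standard between-class and within-class scatter definitions (again an assumed implementation, with hyperparameters such as the jitter term chosen for illustration):

```python
import numpy as np

def fisher_subspace(counts, s1, s2, k, shrink=0.1):
    """Top-k Fisher directions from aggregated moments: solve the
    generalized eigenproblem S_B v = lambda S_W v via S_W^{-1} S_B."""
    n = counts.sum()
    mu = s1 / counts[:, None]
    mu_g = s1.sum(0) / n
    d = mu.shape[1]
    # between-class scatter: sum_c n_c (mu_c - mu)(mu_c - mu)^T
    Sb = (counts[:, None] * (mu - mu_g)).T @ (mu - mu_g)
    # within-class scatter: sum_c S2_c - sum_c n_c mu_c mu_c^T, shrunk
    Sw = s2.sum(0) - np.einsum('c,cd,ce->de', counts, mu, mu)
    Sw = (1 - shrink) * Sw + shrink * (np.trace(Sw) / d) * np.eye(d)
    vals, vecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-vals.real)[:k]
    return vecs[:, order].real                 # (d, k) projection matrix

def synthesize(counts, s1, s2, P, n_per_class, seed=0):
    """Draw class-conditional Gaussian samples in the projected subspace
    (small diagonal jitter keeps the projected covariance well-posed)."""
    rng = np.random.default_rng(seed)
    mu = s1 / counts[:, None]
    X, y = [], []
    for c in range(len(counts)):
        cov = s2[c] / counts[c] - np.outer(mu[c], mu[c])
        cov_p = P.T @ cov @ P + 1e-3 * np.eye(P.shape[1])
        X.append(rng.multivariate_normal(P.T @ mu[c], cov_p, size=n_per_class))
        y.append(np.full(n_per_class, c))
    return np.vstack(X), np.concatenate(y)
```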
Two lightweight trainable heads are then learned solely on the synthetic samples:
- FisherMix – a cosine‑margin linear classifier that normalizes features and weights, applies a scale factor and an angular margin, and is trained with standard cross‑entropy. This head excels when class prototypes are well‑formed but decision boundaries need a small angular buffer.
- Proto‑Hyper – a low‑rank residual module added to a base Gaussian head (NB/LDA/QDA). The residual is parameterized as a product of two small matrices, and the combined model is trained with a knowledge‑distillation loss (soft targets from a blended Gaussian teacher) plus cross‑entropy. The residual corrects systematic biases of the Gaussian assumption (e.g., non‑Gaussian tails, mild correlations) while keeping the parameter count minimal.
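The forward passes of these two heads are simple to state. A minimal sketch (the scale, margin, and rank values are illustrative assumptions; the training loops with cross-entropy and distillation are omitted):

```python
import numpy as np

def cosine_margin_logits(X, W, y=None, scale=16.0, margin=0.2):
    """FisherMix-style head: L2-normalize features and class weights,
    subtract an additive cosine margin from the target class at train
    time (y given), then rescale before the cross-entropy loss."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
    cos = Xn @ Wn.T                        # (N, C) cosine similarities
    if y is not None:
        cos[np.arange(len(y)), y] -= margin   # margin only on target class
    return scale * cos

def proto_hyper_logits(gauss_logits, X, A, B):
    """Proto-Hyper-style head: base Gaussian logits plus a low-rank
    residual X @ (A @ B), with A of shape (d, r) and B of shape (r, C),
    so only r*(d + C) extra parameters are learned."""
    return gauss_logits + X @ (A @ B)
```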
The authors evaluate GH‑OFL on four image classification benchmarks (CIFAR‑10, CIFAR‑100, CIFAR‑100‑C, SVHN) under a range of non‑IID partitions generated by Dirichlet distributions with concentration parameters α from 0.05 to 1.0. Multiple backbones (ResNet‑18, ViT‑Base, etc.) are tested to demonstrate backbone‑agnosticism. Results show that the closed‑form heads already outperform existing one‑shot baselines, and the trainable FisherMix and Proto‑Hyper heads further close the gap to multi‑round FL methods, achieving state‑of‑the‑art accuracy and robustness even under severe label skew (α ≤ 0.1) and corruption (CIFAR‑100‑C). Communication cost is reduced to a few hundred kilobytes in the single round (class‑wise statistics in the projected space), a 2–3 order‑of‑magnitude reduction compared to traditional FedAvg, which requires dozens of rounds of model transmission.
Key contributions are: (1) Demonstrating that per‑class sufficient statistics are enough to instantiate high‑performing Gaussian discriminants in a one‑shot setting; (2) Introducing a Fisher‑subspace‑driven synthetic data generator that enables data‑free training of additional lightweight heads; (3) Providing a bandwidth‑efficient, privacy‑preserving pipeline via random‑projection sketches and secure aggregation; (4) Showing strong empirical robustness to non‑IID data, model‑architecture changes, and data corruptions. Limitations include the need to transmit full class‑wise covariance when QDA is desired (increasing bandwidth modestly) and sensitivity to the choice of Fisher subspace dimensionality. Future work may explore automatic subspace dimension selection, extensions to non‑Gaussian generative models, and application to text or time‑series federated tasks.