Operator-Theoretic Framework for Gradient-Free Federated Learning
Background: Federated learning in practice must address client heterogeneity, strict communication and computation requirements, and data privacy, while optimizing performance.

Objectives: Develop an operator-theoretic framework for federated learning that simultaneously addresses statistical heterogeneity, performance guarantees, and privacy under practical communication and computation constraints.

Methods: We first map the L²-optimal solution into a reproducing kernel Hilbert space (RKHS) using a forward operator. Using the available data, we approximate the optimal solution in that RKHS and then map the approximation back to the original L² function space via the inverse operator. This construction yields a gradient-free learning scheme. We derive explicit finite-sample performance bounds for this scheme using concentration inequalities over operator norms. The framework analytically identifies a data-dependent hypothesis space and provides guarantees on risk, prediction error, robustness, and approximation error. Within this space, we design a communication- and computation-efficient model using kernel machines, leveraging the space folding property of Kernel Affine Hull Machines (KAHMs). Clients transfer knowledge to the server using a novel scalar metric, the space folding measure, derived from KAHMs. Being a scalar, this measure greatly reduces communication overhead. It also supports a simple differentially private FL protocol in which scalar space folding summaries are computed from noise-perturbed data matrices obtained via a single application of a noise-adding mechanism, thereby avoiding per-round gradient clipping and privacy accounting. Finally, the induced global prediction rule can be implemented using a small number of integer minimum and equality-comparison operations per test point, making it structurally compatible with fully homomorphic encryption (FHE) during inference.

Results: Across four benchmarks (20 Newsgroups, XGLUE-NC, CIFAR-10-LT, CIFAR-100-LT), the resulting gradient-free FL method built on fixed encoder embeddings is competitive with, and in several cases outperforms, strong gradient-based federated fine-tuning, with gains of up to 23.7 percentage points. In differentially private experiments, the proposed kernel-based smoothing mechanism partially offsets the accuracy loss caused by noise in high-privacy regimes. The induced global prediction rule admits an FHE realization based on Q × C encrypted minimum and C equality-comparison operations per test point (where Q is the number of clients and C the number of classes), and our operation-level benchmarks for these primitives indicate latencies
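To make the forward/inverse operator construction in the Methods paragraph concrete, the following display is a minimal sketch in hypothetical notation: the forward operator T, the RKHS H_K, the target f*, and the sample-based estimate ĝ are placeholder symbols (with T assumed boundedly invertible) and are not taken verbatim from the paper's formal development.

\[
f^{\star} \in L^{2}
\;\xrightarrow{\;T\;}\;
g^{\star} = T f^{\star} \in \mathcal{H}_{K},
\qquad
\hat{g} \approx g^{\star} \ \text{(estimated from the data in } \mathcal{H}_{K}\text{)},
\qquad
\hat{f} = T^{-1}\hat{g} \in L^{2},
\]
\[
\big\| \hat{f} - f^{\star} \big\|_{L^{2}}
\;\le\;
\big\| T^{-1} \big\|\, \big\| \hat{g} - g^{\star} \big\|_{\mathcal{H}_{K}},
\]

so that the estimator \(\hat{f}\) is obtained without any gradient computation, and finite-sample bounds follow once \(\|\hat{g} - g^{\star}\|_{\mathcal{H}_{K}}\) is controlled via concentration inequalities on the relevant operator norms.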
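The differentially private protocol summarized above perturbs each client's data matrix exactly once and then communicates only scalar summaries. Below is a minimal Python sketch of that pattern, assuming a Gaussian noise-adding mechanism; the names privatize_once and scalar_summary are hypothetical, and scalar_summary is only a norm-based stand-in, not the paper's KAHM space folding measure.

    import numpy as np

    def privatize_once(data_matrix, noise_std, rng=None):
        # Single application of a noise-adding mechanism (assumed Gaussian here)
        # to the raw data matrix; everything computed afterwards is
        # post-processing, so no per-round clipping or accounting is needed.
        rng = np.random.default_rng(0) if rng is None else rng
        return data_matrix + rng.normal(scale=noise_std, size=data_matrix.shape)

    def scalar_summary(noisy_matrix):
        # Stand-in for the KAHM-derived space folding measure (the actual
        # definition is given in the paper); any scalar computed from the
        # noisy matrix inherits the same privacy guarantee by post-processing.
        return float(np.linalg.norm(noisy_matrix) / noisy_matrix.size)

    # One client: perturb the data once, then send a single scalar to the server.
    X = np.random.default_rng(1).normal(size=(128, 16))
    print(scalar_summary(privatize_once(X, noise_std=0.5)))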
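Finally, the reported Q × C encrypted minimum and C equality-comparison structure of the global prediction rule can be illustrated in plaintext as follows. This sketch assumes that each client contributes one quantized integer score per class and that the predicted class is the one attaining the smallest score; an actual FHE realization would replace NumPy's min and equality operations with the corresponding encrypted primitives.

    import numpy as np

    def global_prediction(scores):
        # scores: integer array of shape (Q, C), one quantized score per
        # (client, class) pair; smaller means closer to that class.
        per_class_min = scores.min(axis=0)   # minima over the Q clients (Q x C min ops overall)
        global_min = per_class_min.min()     # fold the C per-class minima into one value
        # C equality comparisons recover which class attains the global minimum.
        return int(np.argmax(per_class_min == global_min))

    # Example with Q = 3 clients and C = 4 classes.
    scores = np.array([[7, 2, 9, 5],
                       [6, 3, 8, 4],
                       [7, 1, 9, 6]])
    print(global_prediction(scores))   # -> 1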