Accurate, private, secure, federated U-statistics with higher degree
We study the problem of computing a U-statistic with a kernel function f of degree $k \ge 2$, i.e., the average of a function f over all k-tuples of instances, in a federated learning setting. U-statistics of degree 2 include several useful statistics such as Kendall's $\tau$ coefficient, the Area under the Receiver Operating Characteristic curve, and the Gini mean difference. Existing methods provide solutions only under the lower-utility local differential privacy model and/or scale poorly with the size of the domain discretization. In this work, we propose a protocol that securely computes U-statistics of degree $k \ge 2$ under central differential privacy by leveraging Multi-Party Computation (MPC). Our method substantially improves accuracy over prior solutions. We provide a detailed theoretical analysis of its accuracy, communication, and computational properties, and we evaluate its performance empirically with favorable results: for Kendall's $\tau$ coefficient, for example, our approach reduces the Mean Squared Error by up to four orders of magnitude over existing baselines.
💡 Research Summary
This paper tackles the problem of computing U‑statistics of degree k ≥ 2 in a federated learning setting while providing strong privacy guarantees under central differential privacy (CDP). U‑statistics are a fundamental class of estimators that include many widely used metrics, such as Kendall's τ, the area under the ROC curve (AUC), and the Gini mean difference, all of which have kernels of degree two. Existing federated solutions either rely on local differential privacy (LDP), which adds a large amount of noise to each client's data, or are limited to two‑party protocols that require heavy discretization or matrix approximations (e.g., Johnson‑Lindenstrauss embeddings). These approaches suffer from poor accuracy, high communication overhead, or restrictive trust assumptions.
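To make the connection concrete, the snippet below computes Kendall's τ explicitly as a degree-2 U-statistic: the average of the concordance kernel $\mathrm{sign}((x_i - x_j)(y_i - y_j))$ over all unordered pairs. This is a plain illustration of the definition, not the paper's protocol.

```python
from itertools import combinations

def kendall_tau(pairs):
    """Kendall's tau as a degree-2 U-statistic: the average of the
    concordance kernel sign((x_i - x_j) * (y_i - y_j)) over all
    unordered pairs of instances."""
    def kernel(a, b):
        prod = (a[0] - b[0]) * (a[1] - b[1])
        return (prod > 0) - (prod < 0)  # sign of the product
    tuples = list(combinations(pairs, 2))
    return sum(kernel(a, b) for a, b in tuples) / len(tuples)

# Perfectly concordant rankings give tau = 1.
print(kendall_tau([(1, 1), (2, 2), (3, 3)]))  # 1.0
```

The same pattern, with a different kernel, yields the AUC or the Gini mean difference; the cost of the exact computation is $\binom{n}{2}$ kernel evaluations, which is what motivates the sampling approach below.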
The authors propose a novel protocol, Q U‑MPC, that combines additive secret sharing with a carefully designed hypergraph‑based sampling of k‑tuples. Instead of evaluating the kernel on all $\binom{n}{k}$ possible tuples, the parties jointly generate a random edge set $E \subset C_n^k$ using a shared PRG. The resulting partial U‑statistic $U_{f,E}$ approximates the full statistic while keeping the maximum degree $\delta_{\max}$ of the hypergraph bounded. This bound directly controls the sensitivity of the sum of kernel evaluations, which in turn determines the scale of the Gaussian (or Laplace) noise required for CDP.
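The sampling-and-calibration idea can be sketched in the clear as follows. This is a hedged illustration under assumed names and parameters (the function name, the bounded-kernel assumption, and the Gaussian-mechanism calibration are ours, not the paper's): draw a random edge set of k-tuples from a shared seed, measure the maximum degree $\delta_{\max}$, and scale the noise to the sensitivity it implies.

```python
import math
import random

def partial_u_statistic_dp(data, kernel, k, num_edges, eps, delta,
                           kernel_bound, seed):
    """Illustrative sketch: evaluate the kernel only on a random edge set E
    of k-tuples generated from a shared PRG seed, bound the hypergraph's
    maximum degree, and add Gaussian noise calibrated to the resulting
    sensitivity. Names and calibration are assumptions for illustration."""
    rng = random.Random(seed)  # stand-in for the shared PRG
    n = len(data)
    edges = [tuple(rng.sample(range(n), k)) for _ in range(num_edges)]
    # delta_max: the largest number of sampled tuples containing any one instance.
    degree = [0] * n
    for e in edges:
        for i in e:
            degree[i] += 1
    delta_max = max(degree)
    total = sum(kernel(*(data[i] for i in e)) for e in edges)
    # Replacing one instance changes at most delta_max kernel evaluations,
    # each by at most 2 * kernel_bound (kernel assumed to lie in [-B, B]).
    sensitivity = 2 * kernel_bound * delta_max
    # Standard Gaussian-mechanism scale for (eps, delta)-DP.
    sigma = sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / eps
    return (total + rng.gauss(0.0, sigma)) / num_edges
```

In the actual protocol the noise would be generated and added inside the MPC, so no party ever sees the exact sum; it is added in the clear here only to show how $\delta_{\max}$ enters the noise scale.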
The protocol proceeds in four phases:
- Sharing Phase – Each client secret‑shares its data value $x_i$ among a designated subset of parties for every tuple in E that contains it. The sharing is performed with a $(t, p)$ threshold scheme; the authors assume an honest‑majority setting ($t \approx p/2$) but also discuss extensions to malicious settings.
- Computation Phase – Using the shared inputs, the parties evaluate the kernel function f on each sampled tuple via a standard MPC sub‑protocol (e.g., GMW or a linear secret‑sharing multiplication circuit). The result is a secret‑shared value of the kernel evaluation for each tuple in E, which the parties can aggregate into a secret‑shared sum.
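The secret-sharing primitive underlying the sharing phase can be illustrated with plain additive sharing over a prime field. Note this sketch is p-out-of-p sharing, a simplification of the $(t, p)$ threshold scheme the summary describes (a threshold variant would use, e.g., Shamir sharing); the modulus and function names are ours.

```python
import random

P = 2**61 - 1  # a public prime modulus, chosen here purely for illustration

def share(value, num_parties, rng):
    """Additively secret-share `value` mod P: any strict subset of the
    shares looks uniformly random; summing all shares reconstructs it."""
    shares = [rng.randrange(P) for _ in range(num_parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

rng = random.Random(42)
s = share(123, 5, rng)
assert reconstruct(s) == 123
# Additions are "free": each party adds its shares locally, and
# reconstructing the summed shares yields the sum of the secrets.
t = share(877, 5, rng)
assert reconstruct([(a + b) % P for a, b in zip(s, t)]) == 1000
```

This linearity is what makes aggregating the secret-shared kernel evaluations cheap; only the kernel evaluation itself requires an interactive MPC sub-protocol.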