Computationally Efficient Replicable Learning of Parities

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

We study the computational relationship between replicability (Impagliazzo et al. [STOC 22], Ghazi et al. [NeurIPS 21]) and other stability notions. Specifically, we focus on replicable PAC learning and its connections to differential privacy (Dwork et al. [TCC 2006]) and to the statistical query (SQ) model (Kearns [JACM '98]). Statistically, it was known that differentially private learning and replicable learning are equivalent and strictly more powerful than SQ-learning. Yet, computationally, all previously known efficient (i.e., polynomial-time) replicable learning algorithms were confined to SQ-learnable tasks or restricted distributions, in contrast to differentially private learning. Our main contribution is the first computationally efficient replicable algorithm for realizable learning of parities over arbitrary distributions, a task that is known to be hard in the SQ model, but possible under differential privacy. This result provides the first evidence that efficient replicable learning over general distributions strictly extends efficient SQ-learning, and is closer in power to efficient differentially private learning, despite computational separations between replicability and privacy. Our main building block is a new, efficient, and replicable algorithm that, given a set of vectors, outputs a subspace of their linear span that covers most of them.


💡 Research Summary

The paper investigates the computational relationship among three stability notions that have become central in modern learning theory: replicability, differential privacy, and the statistical query (SQ) model. While these notions are known to be statistically equivalent—meaning that the existence of a sample‑efficient algorithm for a task under one notion implies the existence of a sample‑efficient algorithm under the others—their computational equivalences remain largely unresolved. In particular, all previously known polynomial‑time replicable learners were confined to tasks that admit efficient SQ algorithms or to restricted distribution families, whereas differential privacy has been shown to enable efficient learning for a broader class of problems, including learning parity functions under arbitrary distributions.

Parity learning is a canonical example of a task that is SQ‑hard (Kearns, 1998) but admits efficient pure differential‑privacy algorithms (KLN+11). This makes it a prime candidate for separating replicability from privacy at the computational level. Prior work achieved replicable parity learning only under the uniform distribution or under distributions with low decision‑tree complexity. The authors close this gap by presenting the first polynomial‑time replicable algorithm for realizable learning of parity functions over any distribution on {0,1}^d.
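To make the learning task concrete: in the realizable setting, every labeled example (x, y) satisfies y = ⟨w, x⟩ mod 2 for some hidden parity vector w, so w can be found by solving a linear system over GF(2). The sketch below is the classical (non-replicable) baseline learner, not the paper's algorithm; the function name `learn_parity_gf2` is our own.

```python
import numpy as np

def learn_parity_gf2(X, y):
    """Find a parity vector w with <w, x> = y (mod 2) on every sample,
    via Gauss-Jordan elimination over GF(2). Returns None if no parity
    is consistent with the data (i.e., the instance is not realizable)."""
    X = np.array(X, dtype=np.uint8) % 2
    y = np.array(y, dtype=np.uint8) % 2
    m, d = X.shape
    A = np.concatenate([X, y.reshape(-1, 1)], axis=1)  # augmented matrix [X | y]
    row, pivots = 0, []
    for col in range(d):
        pivot = next((r for r in range(row, m) if A[r, col]), None)
        if pivot is None:
            continue
        A[[row, pivot]] = A[[pivot, row]]       # move pivot into place
        for r in range(m):
            if r != row and A[r, col]:
                A[r] ^= A[row]                  # XOR = addition over GF(2)
        pivots.append(col)
        row += 1
    # a zero coefficient row with label 1 means the system is inconsistent
    if any(not A[r, :d].any() and A[r, d] for r in range(row, m)):
        return None
    w = np.zeros(d, dtype=np.uint8)
    for r, col in enumerate(pivots):
        w[col] = A[r, d]                        # free coordinates default to 0
    return w
```

Note that the recovered w may differ from the hidden parity on coordinates the samples do not pin down; it is only guaranteed to agree with the labels, which suffices for realizable PAC learning. The difficulty the paper addresses is that this solution vector is highly sensitive to the sample, which is exactly what replicability forbids.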

The core technical contribution is a new subroutine called RepLinearSpan. Given a multiset S = {v_1, …, v_m} of binary vectors, RepLinearSpan first runs the stable partition algorithm of Kothari, Meka, and Mahajan (2025). This algorithm greedily partitions S into linearly independent blocks {A_1, …, A_M} such that each block's span belongs to a nested family of at most d subspaces. For each subspace V generated by a block, the algorithm counts how many blocks have exactly that span, yielding a multiplicity function n_S(V). The algorithm then selects a random threshold t uniformly from a carefully calibrated interval and retains the subspaces whose multiplicity n_S(V) exceeds t. Drawing t at random ensures that, with high probability, the small fluctuations in multiplicities between two independent samples do not change which subspaces are retained, which is what makes the subroutine replicable.
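The pipeline above can be illustrated schematically. The sketch below is not the paper's algorithm: the greedy partition is simplified (it just closes a block whenever the next vector would break linear independence, rather than maintaining the nested-subspace guarantee), and the threshold interval is only indicative, where the paper calibrates it carefully. The function names are our own.

```python
import random
import numpy as np

def gf2_row_reduce(vectors):
    """Canonical (reduced row-echelon) basis of the GF(2) span, as a
    hashable tuple of rows, so equal spans compare equal."""
    A = np.array(vectors, dtype=np.uint8) % 2
    row = 0
    for col in range(A.shape[1]):
        pivot = next((r for r in range(row, A.shape[0]) if A[r, col]), None)
        if pivot is None:
            continue
        A[[row, pivot]] = A[[pivot, row]]
        for r in range(A.shape[0]):
            if r != row and A[r, col]:
                A[r] ^= A[row]
        row += 1
    return tuple(map(tuple, A[:row]))

def rep_linear_span_sketch(S, rng):
    """Illustrative sketch: partition S greedily into linearly independent
    blocks, count how many blocks share each span, and keep the spans whose
    multiplicity clears a randomly drawn threshold."""
    blocks, current = [], []
    for v in S:
        if not any(v):                      # skip zero vectors
            continue
        candidate = current + [list(v)]
        if len(gf2_row_reduce(candidate)) == len(candidate):  # still independent
            current = candidate
        else:
            blocks.append(current)          # close the block, start a new one
            current = [list(v)]
    if current:
        blocks.append(current)
    counts = {}                             # multiplicity n_S(V) per span
    for block in blocks:
        basis = gf2_row_reduce(block)
        counts[basis] = counts.get(basis, 0) + 1
    # random threshold: small multiplicity fluctuations between independent
    # samples then rarely flip a span across the cutoff
    t = rng.uniform(0.25, 0.75) * len(blocks)
    return [basis for basis, n in counts.items() if n >= t]
```

Intuitively, spans that recur across many blocks are properties of the underlying distribution rather than of the particular sample, so thresholding their multiplicities at a random cutoff yields the same output across independent runs with high probability.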

