Learning Poisson Binomial Distributions


Abstract

We consider a basic problem in unsupervised learning: learning an unknown Poisson Binomial Distribution. A Poisson Binomial Distribution (PBD) over {0, 1, …, n} is the distribution of a sum of n independent Bernoulli random variables which may have arbitrary, potentially non-equal, expectations. These distributions were first studied by S. Poisson in 1837 and are a natural n-parameter generalization of the familiar Binomial Distribution. Surprisingly, prior to our work this basic learning problem was poorly understood, and known results for it were far from optimal. We essentially settle the complexity of the learning problem for this basic class of distributions. As our first main result, we give a highly efficient algorithm which learns to ε-accuracy (with respect to the total variation distance) using Õ(1/ε³) samples, independent of n. The running time of the algorithm is quasilinear in the size of its input data, i.e., Õ(log(n)/ε³) bit-operations. (Observe that each draw from the distribution is a log(n)-bit string.) Our second main result is a proper learning algorithm that learns to ε-accuracy using Õ(1/ε²) samples, and runs in time (1/ε)^{poly(log(1/ε))} · log n. This is nearly optimal, since any algorithm for this problem must use Ω(1/ε²) samples. We also give positive and negative results for some extensions of this learning problem to weighted sums of independent Bernoulli random variables.


💡 Research Summary

The paper addresses the fundamental unsupervised learning problem of estimating an unknown Poisson Binomial Distribution (PBD), which is the distribution of the sum of n independent Bernoulli variables with possibly distinct success probabilities. Prior to this work, the sample and computational complexities for learning PBDs were far from optimal. The authors present two main algorithms that essentially settle the complexity of this problem.
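To make the learning target concrete, a PBD is simply the law of a sum of independent coin flips with different biases. The following sketch (function and variable names are illustrative, not from the paper) samples from such a distribution:

```python
import random

def sample_pbd(probs, rng=random):
    """Draw one sample from the Poisson Binomial Distribution
    defined by the Bernoulli success probabilities in `probs`."""
    return sum(1 for p in probs if rng.random() < p)

# Example: n = 4 Bernoullis with distinct expectations.
# The mean of this PBD is sum(probs) = 2.2.
probs = [0.1, 0.5, 0.7, 0.9]
samples = [sample_pbd(probs) for _ in range(10000)]
```

The learning problem is the inverse task: given only such samples, output a distribution that is ε-close to the target in total variation distance.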

The first algorithm is a non-proper learner: given parameters n, ε, δ and independent samples from the target PBD X, it outputs a hypothesis distribution X̂ over {0, …, n} (not itself required to be a PBD) such that the total variation distance satisfies d_TV(X̂, X) ≤ ε with probability at least 1 − δ. The algorithm uses Õ(1/ε³) · log(1/δ) samples, independent of n, and runs in Õ(log n / ε³) bit-operations, i.e., almost linear in the input size (each sample is a log n-bit string).

The second algorithm is a proper learner: it outputs a hypothesis that is itself a PBD, i.e., a vector p̂ of n Bernoulli parameters. This algorithm achieves the near-optimal sample complexity Õ(1/ε²) · log(1/δ) and runs in time (1/ε)^{polylog(1/ε)} · log n. The Ω(1/ε²) lower bound follows from the difficulty of distinguishing two nearby binomial distributions, so the proper learner is essentially sample-optimal.

Both results rely on a structural theorem (DP11/Das08) stating that every PBD is either (i) close to a distribution with sparse support (mass concentrated on O(ε√n) points) or (ii) close to a “heavy” binomial distribution after an appropriate translation. For the sparse case the algorithm runs Birgé’s unimodal-distribution learner on a carefully chosen interval, exploiting the small effective support to obtain the Õ(1/ε³) sample bound. For the heavy-binomial case the algorithm estimates the mean and variance, constructs a translated Poisson distribution matching these moments, and then converts it into an actual binomial distribution (which is a proper PBD).
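The heavy-binomial branch can be caricatured in a few lines: estimate the first two moments from the samples, then solve mq = μ̂ and mq(1 − q) = σ̂² for a binomial fit Binomial(m, q). This is only a sketch of the moment-matching idea under those assumptions; the paper's actual algorithm goes through a translated Poisson intermediate with careful rounding, and `fit_binomial` and its clamping constants here are illustrative:

```python
import random
from statistics import mean, variance

def fit_binomial(samples):
    """Moment-matching sketch of the heavy-binomial case:
    pick Binomial(m, q) whose first two moments roughly match
    the empirical mean and variance of the samples."""
    mu = mean(samples)
    var = variance(samples)
    # From mq = mu and mq(1 - q) = var:  q = 1 - var/mu,  m = mu/q.
    q = min(max(1 - var / mu, 1e-9), 1.0)  # clamp to a valid probability
    m = max(round(mu / q), 1)
    return m, q
```

For samples that actually come from a binomial, this recovers its parameters up to estimation error; for a general PBD in the heavy case, the fitted binomial serves as the proper hypothesis.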

To decide which of the two candidate hypotheses (sparse-based or heavy-binomial-based) better approximates the target, the authors employ a hypothesis-testing tournament that compares the candidates against each other using the empirical samples, guaranteeing that the selected hypothesis satisfies the ε-accuracy requirement with high probability.
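The pairwise comparison at the heart of such a tournament is commonly implemented as a Scheffé test. The sketch below is an assumption about the mechanics rather than the paper's exact routine: it picks whichever of two candidate distributions puts probability mass on the set W = {x : p1(x) > p2(x)} closer to the empirical mass on W.

```python
def scheffe_choose(p1, p2, samples):
    """Scheffé-style comparison of two candidate distributions,
    each given as a dict mapping support points to probabilities.
    Returns the candidate whose mass on W = {x : p1(x) > p2(x)}
    better matches the empirical frequency of W in the samples."""
    support = set(p1) | set(p2)
    W = {x for x in support if p1.get(x, 0) > p2.get(x, 0)}
    emp = sum(1 for s in samples if s in W) / len(samples)
    mass1 = sum(p1.get(x, 0) for x in W)
    mass2 = sum(p2.get(x, 0) for x in W)
    return p1 if abs(mass1 - emp) <= abs(mass2 - emp) else p2
```

A standard guarantee for this style of test is that the winner is within a constant factor of the better candidate's total variation distance to the target, plus the empirical estimation error, which is what the ε-accuracy analysis needs.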

A key technical contribution is Lemma 10, a generic learning result: if a class of distributions admits an ε‑cover of size N, then O((log N)/ε²) samples suffice to learn any member of the class to ε accuracy. The authors construct an ε‑cover for the sparse case of size (1/ε)^{O(log²(1/ε))}, leading to the proper learner’s runtime overhead.

The paper also extends the analysis to weighted sums of independent Bernoulli variables, X = Σ a_i X_i. When the number of distinct weights k is constant, they give an algorithm using O(k/ε²·log n·log(1/δ)) samples and polynomial time in n. Conversely, they prove an information‑theoretic lower bound of Ω(n) samples when the weights are all distinct (e.g., a_i = i).
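For concreteness, the weighted-sum extension replaces the plain count with X = Σ a_i X_i; the sampler below (names are illustrative, not the paper's notation) shows why distinct weights are harder — each weight pattern can produce a different output value, blowing up the effective support:

```python
import random

def sample_weighted_sum(weights, probs, rng=random):
    """Sample X = sum_i a_i * X_i for independent Bernoulli X_i
    with P[X_i = 1] = probs[i] and weights a_i = weights[i].
    With few distinct weights the output support stays small;
    with all-distinct weights (e.g. a_i = i) it can be huge."""
    return sum(a for a, p in zip(weights, probs) if rng.random() < p)
```

With k distinct weight values, X decomposes into k independent (scaled) PBDs, which is the structure behind the O(k/ε² · log n · log(1/δ))-sample algorithm; with n distinct weights that decomposition gives no savings, matching the Ω(n) lower bound.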

In summary, the authors provide (1) a near‑optimal non‑proper learner with sample complexity independent of n, (2) an essentially optimal proper learner, (3) a general covering‑based learning framework, and (4) extensions and lower bounds for weighted Bernoulli sums. These results dramatically improve upon prior work and establish a clear understanding of the learnability of Poisson Binomial Distributions, with implications for statistical estimation, algorithmic probability theory, and any application involving sums of independent but non‑identically distributed binary variables.

