Efficient Online Random Sampling via Randomness Recycling


This article studies the fundamental problem of using i.i.d. coin tosses from an entropy source to efficiently generate random variables $X_i \sim P_i$ $(i \ge 1)$, where $(P_1, P_2, \dots)$ is a random sequence of rational discrete probability distributions subject to an *arbitrary* stochastic process. Our method achieves an amortized expected entropy cost within $\varepsilon > 0$ bits of the information-theoretically optimal Shannon lower bound using $O(\log(1/\varepsilon))$ space. This result holds both pointwise in terms of the Shannon information content conditioned on $X_i$ and $P_i$, and in expectation to obtain a rate of $\mathbb{E}[H(P_1) + \dots + H(P_n)]/n + \varepsilon$ bits per sample as $n \to \infty$ (where $H$ is the Shannon entropy). The combination of space, time, and entropy properties of our method improves upon the Knuth and Yao (1976) entropy-optimal algorithm and Han and Hoshi (1997) interval algorithm for online sampling, which require unbounded space. It also uses exponentially less space than the more specialized methods of Kozen and Soloviev (2022) and Shao and Wang (2025) that generate i.i.d. samples from a fixed distribution. Our online sampling algorithm rests on a powerful algorithmic technique called *randomness recycling*, which reuses a fraction of the random information consumed by a probabilistic algorithm to reduce its amortized entropy cost. On the practical side, we develop randomness recycling techniques to accelerate a variety of prominent sampling algorithms. We show that randomness recycling enables state-of-the-art runtime performance on the Fisher-Yates shuffle when using a cryptographically secure pseudorandom number generator, and that it reduces the entropy cost of discrete Gaussian sampling. Accompanying the manuscript is a performant software library in the C programming language.


💡 Research Summary

The paper tackles the classic problem of generating a sequence of random variables $X_i$ from a dynamically changing sequence of discrete probability distributions $P_i$ using only i.i.d. fair coin flips as an entropy source. The setting is fully online: at each round the algorithm receives the next distribution, must produce a sample from it, and then proceeds to the next round. While the optimal entropy lower bound for this task is given by Shannon's source coding theorem (the expected number of coin flips must be at least the sum of the Shannon entropies of the target distributions), previously known optimal algorithms such as Knuth-Yao's tree method (1976) and the Han-Hoshi interval (arithmetic coding) method (1997) require unbounded auxiliary memory, making them impractical for real-world systems with limited space.
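To make the baseline concrete, here is a minimal sketch of the Knuth-Yao method for a distribution with dyadic probabilities. The `bitstream` scaffolding and all helper names are hypothetical, not the paper's library API; the sampler walks the discrete distribution generating (DDG) tree one level per coin flip:

```c
#include <assert.h>
#include <stdint.h>

/* Deterministic bit source standing in for fair coin flips (hypothetical
 * test scaffolding, not part of the paper's library). */
typedef struct { const int *bits; int pos; } bitstream;
static int next_bit(bitstream *b) { return b->bits[b->pos++]; }

/* Knuth-Yao sampling from a dyadic distribution p_i = num[i] / 2^prec,
 * where the num[i] sum to exactly 2^prec.  Each coin flip descends one
 * level of the DDG tree; each level scans one bit-column of the
 * probability matrix to count terminal leaves. */
int knuth_yao_sample(const uint32_t *num, int n, int prec, bitstream *bs) {
    int d = 0;                                  /* index among unresolved leaves */
    for (int col = prec - 1; col >= 0; col--) { /* most significant bit first */
        d = 2 * d + next_bit(bs);
        for (int i = n - 1; i >= 0; i--) {
            d -= (int)((num[i] >> col) & 1);    /* terminal leaves at this level */
            if (d < 0) return i;
        }
    }
    return 0; /* unreachable when the num[i] sum to exactly 2^prec */
}
```

For example, with $p = (1/4, 3/4)$ (so `num = {1, 3}`, `prec = 2`) a first flip of 0 immediately yields outcome 1. The classic guarantee is that the expected flip count lies between $H(p)$ and $H(p) + 2$, but any leftover randomness from the tree walk is discarded each round.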

The authors introduce a novel algorithmic paradigm called Randomness Recycling. The key insight is that after a sampling step a portion of the random bits remains unused; instead of discarding this “residual randomness”, the algorithm stores it in a compact state and reuses it in later rounds. To formalize this, they define two kinds of random states—uniform and non‑uniform—and provide elementary operations for merging and splitting these states while preserving exact distributional correctness. All operations are performed with integer arithmetic only, avoiding costly arbitrary‑precision calculations.
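The uniform case of the recycling idea can be sketched as follows. This is an illustration only, assuming a state $(u, m)$ with $u$ uniform on $\{0, \dots, m-1\}$; the paper's actual state operations, which also cover non-uniform states, may differ:

```c
#include <assert.h>
#include <stdint.h>

/* Deterministic coin-flip source (hypothetical test scaffolding). */
typedef struct { const int *flips; int pos; } flipstream;
static int next_flip(flipstream *f) { return f->flips[f->pos++]; }

/* A uniform random state: u is known to be uniform on {0, ..., m-1}. */
typedef struct { uint64_t u, m; } rr_state;

/* Draw a uniform sample on {0, ..., n-1}, absorbing fresh coin flips
 * only when the state runs low and recycling leftover randomness back
 * into the state.  Assumes the products below stay well under 2^64. */
uint64_t rr_uniform(rr_state *s, uint64_t n, flipstream *fs) {
    for (;;) {
        while (s->m < n) {                 /* top up with fair coin flips */
            s->u = 2 * s->u + (uint64_t)next_flip(fs);
            s->m *= 2;
        }
        uint64_t q = s->m / n;             /* full copies of n inside m */
        if (s->u < q * n) {
            uint64_t x = s->u % n;         /* the sample */
            s->u /= n;                     /* leftover is uniform on {0..q-1} */
            s->m = q;
            return x;
        }
        /* Rejection: conditioned on landing here, u - q*n is uniform on
         * the m - q*n leftover values, so that randomness is kept too. */
        s->u -= q * n;
        s->m -= q * n;
    }
}
```

The key invariant is that $(u \bmod n, \lfloor u/n \rfloor)$ is a bijection from $\{0, \dots, qn-1\}$ to $\{0, \dots, n-1\} \times \{0, \dots, q-1\}$, so after returning the sample the residual state is still exactly uniform and a later call can often be served with few or no new flips. This is what drives the amortized flip count toward the entropy bound. All arithmetic is on integers, matching the summary's point about avoiding arbitrary-precision calculations.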

The central theoretical contribution is Theorem 1.5. For any error tolerance $\varepsilon > 0$ and any common denominator $d$ (so that each distribution has rational probabilities with denominator at most $d$), the authors construct an online sampler that, for every distribution sequence $p \in (\Delta_X^d)^{\mathbb{N}}$, achieves an amortized expected entropy cost of

$$\frac{\mathbb{E}[H(P_1) + \dots + H(P_n)]}{n} + \varepsilon$$

bits per sample as $n \to \infty$, while using only $O(\log(1/\varepsilon))$ bits of space.
