Empirical Measures and Strong Laws of Large Numbers in Categorical Probability

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

The Glivenko–Cantelli theorem is a uniform version of the strong law of large numbers. It states that for every IID sequence of random variables, the empirical measure converges to the underlying distribution (in the sense of uniform convergence of the CDF). In this work, we provide tools to study such limits of empirical measures in categorical probability. We propose two axioms, namely permutation invariance and empirical adequacy, that a morphism of type $X^{\mathbb{N}} \to X$ should satisfy to be interpretable as taking an infinite sequence as input and producing a sample from its empirical measure as output. Since not all sequences have a well-defined empirical measure, such empirical sampling morphisms live in quasi-Markov categories, which, unlike Markov categories, allow for partial morphisms. Given an empirical sampling morphism and a few other properties, we prove representability as well as abstract versions of the de Finetti theorem, the Glivenko–Cantelli theorem and the strong law of large numbers. We provide several concrete constructions of empirical sampling morphisms as partially defined Markov kernels on standard Borel spaces. Instantiating our abstract results then recovers the standard Glivenko–Cantelli theorem and the strong law of large numbers for random variables with finite first moment. Our work thus provides a joint proof of these two theorems in conjunction with the de Finetti theorem from first principles.

💡 Research Summary

The paper develops a categorical framework for studying empirical measures and the classical laws of large numbers. The authors introduce the notion of an “empirical sampling morphism” es : X^ℕ → X, a partial Markov kernel that, given an infinite sequence of outcomes, returns a sample drawn from the empirical distribution of that sequence. Because not every infinite sequence admits a well‑defined empirical measure, the morphism is allowed to be partial; its domain consists precisely of those sequences for which the empirical frequencies converge in the required sense.

Two axioms are imposed on es: (1) Permutation invariance – the probability assigned to any measurable set T is unchanged under any finite permutation of the input sequence; this captures the fact that empirical frequencies do not depend on the order of observations. (2) Empirical adequacy – for any exchangeable probability measure μ on X^ℕ, sampling a sequence from μ and then applying es produces a single sample whose law coincides with the first marginal of μ. This axiom encodes the de Finetti representation principle at the level of the morphism.
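To make the two axioms more tangible, here is a minimal Python sketch of a finite-prefix analogue, assuming we only look at the first n observations x_1, ..., x_n. The helper names (empirical_measure, empirical_sample, adequacy_estimate) are hypothetical and not taken from the paper; the actual empirical sampling morphism acts on whole infinite sequences and is only partially defined. Picking a uniformly random position of the prefix draws from its empirical measure, reordering the prefix leaves that measure unchanged (permutation invariance), and for an IID, hence exchangeable, prefix the sampled value should have the same law as x_1 (a finite stand-in for empirical adequacy).

```python
import random
from collections import Counter
from fractions import Fraction

def empirical_measure(prefix):
    """Empirical distribution of a finite prefix x_1, ..., x_n, as exact fractions."""
    n = len(prefix)
    return {x: Fraction(c, n) for x, c in Counter(prefix).items()}

def empirical_sample(prefix, rng):
    """Sample from the empirical measure by picking a uniformly random position."""
    return prefix[rng.randrange(len(prefix))]

def adequacy_estimate(p, n, trials, seed=0):
    """Monte Carlo check of the finite analogue of empirical adequacy:
    draw an IID Bernoulli(p) prefix (an exchangeable law), apply
    empirical_sample, and estimate P(output == 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        prefix = [1 if rng.random() < p else 0 for _ in range(n)]
        hits += empirical_sample(prefix, rng)
    return hits / trials

prefix = ["a", "b", "a", "c", "a", "b"]
shuffled = prefix[:]
random.Random(1).shuffle(shuffled)

# Permutation invariance: reordering the observations leaves the measure unchanged.
assert empirical_measure(prefix) == empirical_measure(shuffled)
# Empirical adequacy (finite analogue): the estimate should be close to p = 0.3.
print(adequacy_estimate(p=0.3, n=50, trials=20_000))
```

For a finite exchangeable vector, the coordinate at an independent, uniformly random position has exactly the law of the first coordinate, which is why this finite analogue needs no limit; the partiality of the real morphism only appears once infinite sequences are involved.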

To make the abstract theory concrete, the authors work in the quasi‑Markov category of standard Borel spaces and partially defined Markov kernels; quasi‑Markov categories generalise ordinary Markov categories by admitting partial morphisms. They construct explicit partial kernels for any standard Borel space X. For a measurable set T, the kernel assigns the limit

$$\mathrm{es}\big(T \mid (x_i)_{i\in\mathbb{N}}\big) = \lim_{n\to\infty} \frac{\lvert\{\, i \le n : x_i \in T \,\}\rvert}{n}$$

whenever this limit exists; otherwise the kernel is undefined. Three technical subtleties are addressed: (i) for many sequences the frequencies do not converge, so the domain must be restricted; (ii) for infinite X the pointwise limits need not be σ‑additive, so uniform convergence on initial intervals (or on intervals in ℝ) is required; (iii) for uncountable X one cannot demand convergence for every measurable set, so convergence is only required on a generating class (e.g., intervals in ℝ).
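The first two subtleties can be seen numerically. The Python sketch below uses two hypothetical example sequences chosen for illustration, indexes from 1 to match the formula, and computes exact prefix frequencies |{i ≤ n : x_i ∈ T}| / n with fractions. The first sequence alternates dyadic blocks of 1s and 0s of doubling length, so the frequency of T = {1} keeps oscillating and the sequence falls outside the domain. The second is x_i = i on X = {1, 2, 3, ...}: every singleton gets limiting frequency 0 while the whole space always has frequency 1, so the pointwise limits are not σ‑additive. Reading "uniform convergence on initial intervals" as uniform convergence over the segments {1, ..., k} is our assumption; under that reading the second sequence is excluded, because the largest gap between the prefix frequencies and their pointwise limit 0 stays equal to 1.

```python
from fractions import Fraction

def prefix_frequency(seq, in_T, n):
    """|{i <= n : x_i in T}| / n, where seq(i) returns x_i for i = 1, 2, ..."""
    return Fraction(sum(1 for i in range(1, n + 1) if in_T(seq(i))), n)

# Subtlety (i), hypothetical example: x_i = 1 when floor(log2 i) is even, else 0,
# so 1s and 0s come in dyadic blocks of doubling length.
def oscillating(i):
    return 1 if (i.bit_length() - 1) % 2 == 0 else 0

is_one = lambda x: x == 1
for k in range(3, 9):
    n = 2 ** (k + 1) - 1  # last index of the k-th dyadic block
    print(k, float(prefix_frequency(oscillating, is_one, n)))
# Odd k gives exactly 1/3, even k approaches 2/3: no limit, so es is undefined here.

# Subtlety (ii), hypothetical example: x_i = i on X = {1, 2, 3, ...}.
identity = lambda i: i
n = 1000
singleton = prefix_frequency(identity, lambda x: x == 5, n)  # equals 1/n, tends to 0
whole_space = prefix_frequency(identity, lambda x: True, n)  # always 1
# Frequencies of the initial segments {1, ..., k} each tend to 0,
# yet their supremum over k stays 1, so convergence is not uniform.
worst_gap = max(prefix_frequency(identity, lambda x, k=k: x <= k, n) for k in range(n + 1))
```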

With these constructions in place, the paper proves several synthetic theorems. First, representability shows that the space of probability measures P(X) is the limit of the diagram of finite permutations acting on X^ℕ, yielding a categorical version of de Finetti’s theorem. Second, the Synthetic Glivenko–Cantelli Theorem (Theorem 4.7) states that, under the axioms, the empirical distribution functions of an IID sequence converge almost surely and uniformly to the true distribution function, exactly as in the classical Glivenko–Cantelli result. Third, the Synthetic Strong Law of Large Numbers (Corollary 4.11) establishes almost‑sure convergence of empirical averages to the true expectation for IID real‑valued random variables with finite first moment.
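Both classical statements recovered by the synthetic results can be checked numerically. The following Python sketch only illustrates the classical theorems, not the paper's categorical proofs; the choice of Uniform(0, 1) and Exp(1) samples and the helper names ks_distance_uniform and running_mean are assumptions made for the example. The first quantity is the Kolmogorov–Smirnov distance, the supremum over x of |F_n(x) − F(x)|, which bounds the error on every interval (a, b] up to a factor of 2; the second is the empirical average of an integrable variable with expectation 1.

```python
import random

def ks_distance_uniform(samples):
    """sup_x |F_n(x) - x| for samples in [0, 1], where F(x) = x is the Uniform(0, 1) CDF.
    The supremum is attained at a jump of F_n, so it suffices to compare
    both one-sided values of F_n at each order statistic."""
    xs = sorted(samples)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

def running_mean(samples):
    """Empirical average (x_1 + ... + x_n) / n."""
    return sum(samples) / len(samples)

rng = random.Random(42)
for n in (10, 1_000, 100_000):
    uniform = [rng.random() for _ in range(n)]
    expo = [rng.expovariate(1.0) for _ in range(n)]  # Exp(1) has finite mean 1
    print(n, round(ks_distance_uniform(uniform), 4), round(running_mean(expo), 4))
```

As n grows, the second column should shrink towards 0 and the third should approach 1, although a single run only illustrates, rather than proves, the almost‑sure statements.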

A key technical contribution is a functorial treatment of Kolmogorov products in quasi‑Markov categories, together with the notion of σ‑continuity (existence of countable directed meets in hom‑sets that are preserved by composition and tensor). This machinery underlies the ability to compose partial kernels and to reason about limits categorically.

The authors argue that their approach yields shorter, conceptually clearer proofs than traditional measure‑theoretic arguments, while also providing a unified perspective: the same empirical sampling morphism is the central ingredient for de Finetti, Glivenko–Cantelli, and the strong law. Moreover, because the framework works in any quasi‑Markov category, it opens the door to applying these results in settings beyond classical probability, such as categorical models of uncertainty, causal inference, and ergodic theory.

In summary, the paper introduces a novel categorical abstraction of empirical sampling, proposes natural permutation‑invariance and adequacy axioms for it, constructs concrete instances on standard Borel spaces that satisfy them, and leverages this structure to give unified synthetic proofs of three cornerstone theorems of probability theory. This work both deepens the theoretical foundations of categorical probability and suggests new avenues for applying categorical methods to broader areas of stochastic analysis.
