The Redundancy of a Computable Code on a Noncomputable Distribution
We introduce new definitions of universal and superuniversal computable codes, which are based on a code’s ability to approximate Kolmogorov complexity within the prescribed margin for all individual sequences from a given set. Such sets of sequences may be singled out almost surely with respect to certain probability measures. Consider a measure parameterized with a real parameter and put an arbitrary prior on the parameter. The Bayesian measure is the expectation of the parameterized measure with respect to the prior. It appears that a modified Shannon-Fano code for any computable Bayesian measure, which we call the Bayesian code, is superuniversal on a set of parameterized measure-almost all sequences for prior-almost every parameter. According to this result, in the typical setting of mathematical statistics no computable code enjoys redundancy which is ultimately much less than that of the Bayesian code. Thus we introduce another characteristic of computable codes: The catch-up time is the length of data for which the code length drops below the Kolmogorov complexity plus the prescribed margin. Some codes may have smaller catch-up times than Bayesian codes.
💡 Research Summary
The paper revisits the foundations of universal coding by shifting the focus from average‑case performance to guarantees that hold for each individual sequence. Classical universal codes are defined only through an inequality on expected code length, which does not ensure that a code’s length stays close to the Kolmogorov complexity K(x) for every possible data string. To address this gap the authors introduce two refined notions:
- Universal (U) code – a computable code C such that for every sequence x in a prescribed set S, the code length ℓC(x) satisfies ℓC(x) ≤ K(x) + c for some constant c independent of x.
- Super‑universal (SU) code – a stronger version in which the same inequality holds eventually along each sequence, i.e., ℓC(x1…n) ≤ K(x1…n) + ε for all sufficiently large n.
These definitions allow the authors to speak about “individual‑sequence optimality” rather than only about expected redundancy.
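As an illustration, the U-code inequality can be checked mechanically on a finite set of strings, provided one accepts a computable stand-in for K(x), which is itself uncomputable. The sketch below is an assumption-laden approximation, not the paper's construction: the helper names are hypothetical, and zlib merely upper-bounds K from above.

```python
import zlib

def kolmogorov_upper_bound(x: bytes) -> int:
    """Crude, computable upper bound on K(x) in bits, via zlib.
    K itself is uncomputable; a real compressor only bounds it from above."""
    return 8 * len(zlib.compress(x, 9))

def is_universal_on(code_length, xs, c: int) -> bool:
    """Check the U-code inequality ell_C(x) <= K(x) + c on a finite set xs,
    with the zlib bound standing in for K.  `code_length` maps bytes -> bits."""
    return all(code_length(x) <= kolmogorov_upper_bound(x) + c for x in xs)
```

Because zlib over-estimates K on short strings, this check can accept codes that a check against true K would reject; it is useful only as a sanity filter.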
The statistical setting considered is a family of computable probability measures {Pθ} indexed by a real parameter θ∈Θ, together with an arbitrary prior π on Θ. The Bayesian mixture measure is defined as
Pπ = ∫Θ Pθ dπ(θ).
Whenever the mixture Pπ is itself computable (the paper's assumption of a computable Bayesian measure), a Shannon–Fano code can be built for it; computability of each Pθ alone does not guarantee this for an arbitrary prior. The authors call this the Bayesian code: ℓBayes(x1…n) = ⌈−log₂ Pπ(x1…n)⌉ + O(1).
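A toy sketch of this construction: for an i.i.d. Bernoulli family with a uniform prior on a small parameter grid (a discrete stand-in for an arbitrary prior on a continuous Θ), the Shannon–Fano length of the mixture is just ⌈−log₂ Pπ(x)⌉, ignoring the O(1) term. Function names are illustrative, not from the paper.

```python
import math

def bernoulli_likelihood(theta: float, x: list[int]) -> float:
    """P_theta(x) for i.i.d. Bernoulli(theta) bits."""
    k = sum(x)
    return theta**k * (1 - theta)**(len(x) - k)

def bayes_code_length(x: list[int], thetas=(0.25, 0.5, 0.75)) -> int:
    """Shannon-Fano length ceil(-log2 P_pi(x)) for the Bayesian mixture
    under a uniform prior on a finite grid of parameters (toy sketch)."""
    p_mix = sum(bernoulli_likelihood(t, x) for t in thetas) / len(thetas)
    return math.ceil(-math.log2(p_mix))
```

Since extending a sequence can only shrink its mixture probability, the code length is nondecreasing in the prefix length, as a prefix code requires.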
The central theorem states that for π‑almost every parameter θ, the Bayesian code is super‑universal on a set of sequences that has Pθ‑measure one. Formally, for any ε>0,
Pθ{ x : ∃N ∀n≥N ℓBayes(x1…n) ≤ K(x1…n)+ε } = 1
holds for π‑almost all θ. The proof combines the computability of the mixture, Bayesian posterior consistency, and a martingale argument showing that −log₂ Pπ(x1…n) eventually stays within the prescribed margin of K(x1…n) for almost all sample paths.
Consequences are striking for mathematical statistics: in the usual “model is true” scenario, no computable code can achieve asymptotically smaller redundancy than the Bayesian code. In other words, the Bayesian mixture attains the optimal individual‑sequence redundancy bound that any computable code can hope for.
Recognising that asymptotic optimality does not fully capture practical performance, the authors introduce a new metric called catch‑up time. For a given margin ε, the catch‑up time τε(x) is the smallest n such that ℓC(x1…n) ≤ K(x1…n)+ε. While the Bayesian code enjoys the best possible asymptotic bound, its constant overhead can be large, leading to a relatively long τε. By contrast, codes tailored to specific regularities (e.g., context‑tree weighting for Markov sources) may have much smaller catch‑up times even though they are not super‑universal for all θ. Thus the catch‑up time provides a finer‑grained comparison of codes in finite‑sample regimes.
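The catch-up time itself is trivial to compute once per-prefix code lengths and a complexity estimate are available; the whole difficulty lies in approximating K, which is uncomputable. A minimal sketch, with hypothetical inputs (`complexity_bounds[i]` standing in for K of the (i+1)-length prefix):

```python
def catch_up_time(code_lens, complexity_bounds, eps):
    """Smallest prefix length n (1-indexed) at which the code length first
    drops to within eps of the complexity estimate; None if it never does.
    complexity_bounds[i] approximates K(x_1..x_{i+1}), which is uncomputable
    and must be estimated (e.g., by a compressor) in practice."""
    for i, (ell, k) in enumerate(zip(code_lens, complexity_bounds)):
        if ell <= k + eps:
            return i + 1
    return None
```

With a compressor-based estimate in place of K, such a computation can only suggest, not certify, that one code catches up before another.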
The paper also discusses the limits of computable coding for non‑computable measures, showing that no computable code can be universal on a set of positive measure under a non‑computable source. This underscores a fundamental boundary between algorithmic information theory and classical probability.
In the concluding sections the authors compare their work with earlier notions of universal prediction, discuss potential extensions (e.g., multi‑parameter hierarchies, adaptive priors, or approximations of non‑computable mixtures), and suggest that future research should aim at designing codes that balance asymptotic optimality with small catch‑up times. Overall, the study provides a rigorous bridge between Bayesian inference, Kolmogorov complexity, and the theory of universal coding, while offering a practical performance measure that could guide the development of more efficient compression algorithms.