Prequential Plug-In Codes that Achieve Optimal Redundancy Rates even if the Model is Wrong

We analyse prequential plug-in codes relative to one-parameter exponential families M. We show that if data are sampled i.i.d. from some distribution outside M, then the redundancy of any plug-in prequential code grows at a rate larger than 1/2 ln(n) in the worst case. This means that plug-in codes, such as the Rissanen-Dawid ML code, may be inferior to other important universal codes such as the 2-part MDL, Shtarkov and Bayes codes, for which the redundancy is always 1/2 ln(n) + O(1). However, we also show that a slight modification of the ML plug-in code, which is only “almost” in the model, does achieve the optimal redundancy even if the true distribution is outside M.


💡 Research Summary

The paper investigates the performance of prequential plug‑in codes when the underlying statistical model is misspecified, focusing on one‑parameter exponential families. A prequential code encodes each observation sequentially by plugging in a parameter estimate derived from the data observed so far. The classic implementation uses the maximum‑likelihood (ML) estimator, which is known to achieve the optimal redundancy of (1/2) ln n + O(1) when the data are generated from a distribution that belongs to the model.
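
As a concrete illustration of this sequential plug-in construction (a minimal sketch, not code from the paper), the snippet below computes the cumulative code length of a prequential ML plug-in code for the normal-location model with known unit variance, a standard one-parameter exponential family; the choice of model and the initial estimate theta0 are assumptions made purely for illustration.

```python
import numpy as np
from scipy.stats import norm

def prequential_ml_codelength(x, theta0=0.0):
    """Cumulative code length (in nats) of a prequential ML plug-in code for
    the normal-location model {N(theta, 1)}: each x_i is encoded with the
    density whose mean is the ML estimate computed from x_1, ..., x_{i-1}."""
    x = np.asarray(x, dtype=float)
    total = 0.0
    running_sum = 0.0
    theta_hat = theta0                       # estimate used before any data is seen (illustrative choice)
    for i, xi in enumerate(x):
        total += -norm.logpdf(xi, loc=theta_hat, scale=1.0)   # code x_i with the current estimate
        running_sum += xi
        theta_hat = running_sum / (i + 1)    # ML estimate (sample mean) of x_1, ..., x_{i+1}
    return total
```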

The authors first ask what happens when the true data‑generating distribution Q lies outside the model M. They show that in this misspecified setting the worst‑case redundancy of the plug‑in prequential code grows faster than (1/2) ln n; more precisely, it behaves as (1/2 + δ) ln n + O(1) for some δ > 0 that depends on the mismatch between Q and M. The result is derived by analysing the asymptotic behaviour of the ML estimator under misspecification: the estimator converges to the “projection” parameter θ* that minimises KL(Q‖p_θ) over M, but because Q ≠ p_{θ*} a persistent model‑error term remains and contributes an extra logarithmic factor. Consequently, the standard ML plug‑in code can be strictly inferior to other universal coding schemes such as two‑part MDL, Shtarkov’s normalized maximum likelihood (NML), and Bayesian mixture codes, all of which retain the optimal (1/2) ln n + O(1) redundancy regardless of misspecification.
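
To make the misspecification effect tangible, the following Monte Carlo sketch (an illustrative toy experiment, not one reported in the paper) reuses prequential_ml_codelength from the sketch above and measures the redundancy relative to the best in-model distribution when the data come from a source whose variance differs from the model's fixed variance.

```python
import numpy as np
from scipy.stats import norm

def misspecified_redundancy(n=10_000, true_std=2.0, trials=50, seed=0):
    """Average redundancy of the prequential ML plug-in code for the
    normal-location model (unit variance) on data drawn i.i.d. from
    N(0, true_std^2), measured against the best in-model distribution,
    which here is N(0, 1) (the KL projection of the source onto the model)."""
    rng = np.random.default_rng(seed)
    redundancies = []
    for _ in range(trials):
        x = rng.normal(0.0, true_std, size=n)
        plug_in = prequential_ml_codelength(x)                       # defined in the sketch above
        best_in_model = -norm.logpdf(x, loc=0.0, scale=1.0).sum()    # code length under theta* = 0
        redundancies.append(plug_in - best_in_model)
    avg = float(np.mean(redundancies))
    return avg, avg / (0.5 * np.log(n))      # ratio near 1 means (1/2) ln n behaviour

# With true_std=1.0 (well-specified) the ratio stays close to 1; with
# true_std=2.0 it is markedly larger, illustrating the super-(1/2) ln n
# growth analysed in the paper (the constants are specific to this toy setup).
```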

To overcome this limitation, the paper proposes a modest modification of the plug‑in rule, termed the “almost‑in‑model” plug‑in code. Instead of using the raw ML estimate, the algorithm incorporates a small regularisation or prior term that shrinks the estimate toward the interior of the model. Concretely, at each step i the code uses a parameter of the form

 θ̃_i = argmax_{θ∈Θ} [ Σ_{j<i} ln p_θ(x_j) + ln w(θ) ],

where w is a weight (prior) term that keeps the estimate away from the boundary of the parameter set; observation x_i is then encoded with code length −ln p_{θ̃_i}(x_i). Although the resulting code is only “almost” in the model, the paper shows that it recovers the optimal (1/2) ln n + O(1) redundancy even when the true data‑generating distribution Q lies outside M.
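
Under this reading, the penalised estimate has a simple closed form in the same normal-location toy model used above: a Gaussian prior w(θ) ∝ exp(−θ²/(2τ²)) turns the argmax into a shrunken sample mean. The sketch below is only one possible instantiation of the modification described in this summary, with the model, the prior, and its scale τ all chosen purely for illustration; the paper's actual construction (and the choice needed to attain the optimal rate) should be taken from the original text.

```python
import numpy as np
from scipy.stats import norm

def prequential_map_codelength(x, tau=1.0, theta0=0.0):
    """Prequential plug-in code length (nats) for the normal-location model,
    using a penalised (MAP-style) estimate with Gaussian prior N(0, tau^2):
        theta_i = argmax_theta [ sum_{j<i} ln N(x_j; theta, 1) - theta^2 / (2 tau^2) ]
                = (sum_{j<i} x_j) / ((i - 1) + 1 / tau^2),
    i.e. the sample mean is shrunk toward 0 instead of using the raw ML estimate."""
    x = np.asarray(x, dtype=float)
    total = 0.0
    running_sum = 0.0
    theta = theta0                            # estimate used before any data is seen
    for i, xi in enumerate(x):
        total += -norm.logpdf(xi, loc=theta, scale=1.0)        # code x_i with the current estimate
        running_sum += xi
        theta = running_sum / ((i + 1) + 1.0 / tau**2)         # penalised estimate after i+1 points
    return total
```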

