Prequential Plug-In Codes that Achieve Optimal Redundancy Rates even if the Model is Wrong
We analyse prequential plug-in codes relative to one-parameter exponential families M. We show that if data are sampled i.i.d. from some distribution outside M, then the redundancy of any plug-in prequential code grows at a rate larger than 1/2 ln(n) in the worst case. This means that plug-in codes, such as the Rissanen-Dawid ML code, may perform worse than other important universal codes such as the 2-part MDL, Shtarkov and Bayes codes, for which the redundancy is always 1/2 ln(n) + O(1). However, we also show that a slight modification of the ML plug-in code, which keeps the estimates “almost” in the model, does achieve the optimal redundancy even if the true distribution is outside M.
💡 Research Summary
The paper investigates the performance of prequential plug‑in codes when the underlying statistical model is misspecified, focusing on one‑parameter exponential families. A prequential code encodes each observation sequentially by plugging in a parameter estimate derived from the data observed so far. The classic implementation uses the maximum‑likelihood (ML) estimator, which is known to achieve the optimal redundancy of (1/2) ln n + O(1) when the data are generated from a distribution that belongs to the model.
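The paper works with general one-parameter exponential families; as a self-contained toy illustration (my own, not taken from the paper), the sketch below computes the prequential plug-in code length for a Bernoulli model. Since the raw ML estimate assigns probability zero to unseen outcomes early on, the sketch uses the standard smoothed (Laplace, add-one) variant of the plug-in estimator; all function names are illustrative.

```python
import math

def prequential_codelength(xs):
    """Prequential plug-in code length (in nats) for a binary sequence,
    using the Laplace (add-one) estimator theta_i = (k_i + 1) / (i + 2),
    where k_i is the number of ones among the first i symbols."""
    total, k = 0.0, 0
    for i, x in enumerate(xs):
        theta = (k + 1) / (i + 2)        # smoothed estimate from the past only
        p = theta if x == 1 else 1 - theta
        total += -math.log(p)            # code length of the next symbol
        k += x
    return total

def best_fixed_codelength(xs):
    """Code length under the single Bernoulli parameter fitted in hindsight
    (the full-sequence ML estimate), i.e. the best fixed-parameter code."""
    n, k = len(xs), sum(xs)
    if k in (0, n):
        return 0.0
    th = k / n
    return -(k * math.log(th) + (n - k) * math.log(1 - th))

xs = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
redundancy = prequential_codelength(xs) - best_fixed_codelength(xs)
```

For the add-one estimator the product of the plug-in probabilities equals the uniform-prior Bayes mixture probability k!(n−k)!/(n+1)!, so its regret against the best fixed parameter is never negative; the paper's point is that for the ML plug-in under misspecification the corresponding redundancy can grow strictly faster than (1/2) ln n.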
The authors first ask what happens if the true data‑generating distribution Q lies outside the model M. They show that in this misspecified setting the redundancy of any ML‑based prequential code grows faster than (1/2) ln n. More precisely, the worst‑case redundancy is (1/2 + δ) ln n + O(1) where δ > 0 depends on the distance between Q and the closest member of M (typically measured by Kullback‑Leibler divergence). This result is derived by analysing the asymptotic behaviour of the ML estimator under misspecification: the estimator converges to the “projection” parameter θ* that minimises KL(Q‖pθ) within M, but because Q ≠ pθ* there remains a persistent model‑error term that contributes an extra logarithmic factor. Consequently, traditional ML plug‑in codes can be strictly inferior to other universal coding schemes such as the two‑part MDL, Shtarkov’s normalized maximum likelihood (NML), and Bayesian mixture codes, all of which retain the optimal (1/2) ln n redundancy regardless of misspecification.
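The KL projection mentioned above can be made concrete with a small numeric check (again my own illustration, not from the paper): for the Poisson family, a one-parameter exponential family, minimising KL(Q‖p_λ) over λ recovers the mean of Q, as exponential-family theory predicts, even though no Poisson distribution equals Q.

```python
import math

# A "true" distribution Q outside the Poisson family: an arbitrary pmf
# on {0, 1, 2, 3} with mean 1.0 (no Poisson has this shape).
Q = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}
mean_Q = sum(x * p for x, p in Q.items())

def kl_to_poisson(lam):
    """KL(Q || Poisson(lam)) in nats."""
    kl = 0.0
    for x, q in Q.items():
        log_pois = x * math.log(lam) - lam - math.log(math.factorial(x))
        kl += q * (math.log(q) - log_pois)
    return kl

# Grid search for the projection parameter lambda* = argmin_lam KL(Q || p_lam).
grid = [i / 1000 for i in range(200, 3001)]
lam_star = min(grid, key=kl_to_poisson)
```

Here `lam_star` comes out at 1.0, equal to `mean_Q`: the reverse-KL projection onto an exponential family matches the expectation of the sufficient statistic, and this λ* is exactly the point to which the ML plug-in estimator converges under misspecification.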
To overcome this limitation, the paper proposes a modest modification of the plug‑in rule, termed the “almost‑in‑model” plug‑in code. Instead of using the raw ML estimate, the algorithm incorporates a small regularisation or prior term that shrinks the estimate toward the interior of the model. Concretely, at each step i the code uses a penalised estimate of the form

θ̃_i = argmax_{θ∈Θ} [ Σ_{j<i} ln p_θ(x_j) + ln w(θ) ],

where w is a weight (prior) function concentrated on the interior of Θ, so that θ̃_i stays away from the boundary while remaining close to the ML estimate. With this modification the prequential code attains the optimal (1/2) ln n + O(1) redundancy even when the true distribution lies outside M.
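As one concrete, hypothetical instance of such a shrunken estimate (the constants and the choice of prior are mine, not the paper's), consider a Poisson model with a Gamma(a = 2, b = 1) log-prior penalty. The penalised maximiser then has the closed form λ̃ = (Σ x_j + 1)/(m + 1) after m observations, which is strictly positive, whereas the raw ML plug-in can sit on the boundary λ = 0 and assign zero probability (infinite code length) to the next outcome.

```python
import math

def poisson_logpmf(x, lam):
    """log Poisson pmf; returns -inf when the pmf is zero (lam = 0, x > 0)."""
    if lam == 0.0:
        return 0.0 if x == 0 else float("-inf")
    return x * math.log(lam) - lam - math.log(math.factorial(x))

def plugin_codelength(xs, estimator):
    """Prequential code length of xs[1:], plugging in estimator(past) each step."""
    total = 0.0
    for i in range(1, len(xs)):
        lam = estimator(xs[:i])
        total += -poisson_logpmf(xs[i], lam)
    return total

def ml(past):
    """Raw ML estimate: the sample mean of the past observations."""
    return sum(past) / len(past)

def penalised(past):
    """Mode of the Gamma(2, 1) posterior: (sum + 1) / (m + 1), always > 0,
    so the estimate is shrunk into the interior of the parameter space."""
    return (sum(past) + 1) / (len(past) + 1)

xs = [0, 3, 1, 2, 0, 1]
ml_len = plugin_codelength(xs, ml)          # infinite: ML is 0 after the first 0
pen_len = plugin_codelength(xs, penalised)  # finite for every sequence
```

This boundary repair is only a toy benefit of shrinking the estimate; the paper's deeper result is that a suitably chosen modification of this kind also restores the optimal (1/2) ln n + O(1) redundancy when the data come from outside the model.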