Discrete MDL Predicts in Total Variation
The Minimum Description Length (MDL) principle selects the model that has the shortest code for data plus model. We show that for a countable class of models, MDL predictions are close to the true distribution in a strong sense. The result is completely general. No independence, ergodicity, stationarity, identifiability, or other assumption on the model class need be made. More formally, we show that for any countable class of models, the distributions selected by MDL (or MAP) asymptotically predict (merge with) the true measure in the class in total variation distance. Implications for non-i.i.d. domains like time-series forecasting, discriminative learning, and reinforcement learning are discussed.
💡 Research Summary
The paper “Discrete MDL Predicts in Total Variation” establishes a remarkably general convergence result for the Minimum Description Length (MDL) principle applied to a countable class of probabilistic models. It shows that, without any assumptions of independence, stationarity, ergodicity, or identifiability, the model selected by MDL (equivalently, by a Maximum A Posteriori (MAP) rule) merges with the true data‑generating measure, which is assumed to lie in the class, in total variation distance as the amount of observed data grows without bound.
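To make the convergence claim precise, merging in total variation (in the sense of Blackwell and Dubins) means that the predictive distributions of the selected model become uniformly indistinguishable from those of the true measure, over all events concerning the entire future:

\[
\sup_{A}\,\bigl|\, P^*(A \mid x_{1:n}) - \widehat{P}_n(A \mid x_{1:n}) \,\bigr| \;\longrightarrow\; 0 \quad \text{with } P^*\text{-probability } 1,
\]

where \(\widehat{P}_n\) denotes the model selected by MDL after observing \(x_{1:n}\) and the supremum ranges over all measurable sets \(A\) of continuation sequences.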
Problem setting
Let \(\mathcal{M}=\{P_i\}_{i\in\mathbb{N}}\) be a countable set of probability measures on a common measurable space of sequences. Each model receives a strictly positive prior weight \(w_i\) (so that \(\sum_i w_i=1\) can be assumed after normalization). A data sequence \(x_{1:n}\) is generated by an unknown “true” measure \(P^*\in\mathcal{M}\). The MDL estimator at time \(n\) is defined as
\[
\mathrm{MDL}_n \;:=\; \arg\min_{P_i\in\mathcal{M}} \bigl\{ -\log P_i(x_{1:n}) + K(P_i) \bigr\},
\qquad K(P_i) := \log\tfrac{1}{w_i},
\]

where \(K(P_i)\) is the codelength assigned to model \(P_i\) by the prior. Since \(-\log P_i(x_{1:n}) + \log\frac{1}{w_i} = -\log\bigl(w_i\,P_i(x_{1:n})\bigr)\), minimizing this two-part codelength is the same as maximizing the posterior weight, which is why MDL and MAP select the same model.
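Below is a minimal sketch of this selection rule, assuming a finite truncation of the class and i.i.d. Bernoulli models purely for illustration (the theorem itself needs no i.i.d. structure); the model class, prior, and all names here are hypothetical choices, not from the paper:

```python
import numpy as np

def mdl_select(x, logliks, weights):
    """Return the index minimizing the two-part codelength
    -log P_i(x) + log(1/w_i), i.e. the MAP model under prior w."""
    codelengths = [-ll(x) + np.log(1.0 / w) for ll, w in zip(logliks, weights)]
    return int(np.argmin(codelengths))

# Illustrative countable class, truncated: Bernoulli(theta) models with
# prior weights w_i proportional to 2^{-i} (any summable positive prior works).
def bernoulli_loglik(theta):
    return lambda x: np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta))

thetas = [1/2, 1/3, 2/3, 1/4, 3/4, 1/5, 4/5]
logliks = [bernoulli_loglik(t) for t in thetas]
weights = np.array([2.0 ** -(i + 1) for i in range(len(thetas))])
weights /= weights.sum()  # normalize so the prior sums to 1

rng = np.random.default_rng(0)
x = rng.binomial(1, 2/3, size=500)  # data drawn from the "true" model theta = 2/3
print("MDL selects theta =", thetas[mdl_select(x, logliks, weights)])
```

As \(n\) grows, the selected index stabilizes on the true model; the paper proves that the corresponding predictive distributions merge with the truth in total variation, in far greater generality than this i.i.d. toy class.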