Characterizing predictable classes of processes
The problem is sequence prediction in the following setting. A sequence $x_1,\dots,x_n,\dots$ of discrete-valued observations is generated according to some unknown probabilistic law (measure) $\mu$. After observing each outcome, it is required to give the conditional probabilities of the next observation. The measure $\mu$ belongs to an arbitrary class $\mathcal{C}$ of stochastic processes. We are interested in predictors $\rho$ whose conditional probabilities converge to the “true” $\mu$-conditional probabilities if any $\mu\in\mathcal{C}$ is chosen to generate the data. We show that if such a predictor exists, then a predictor can also be obtained as a convex combination of countably many elements of $\mathcal{C}$. In other words, it can be obtained as a Bayesian predictor whose prior is concentrated on a countable set. This result is established for two very different measures of prediction performance: one very strong, namely total variation, and one very weak, namely prediction in expected average Kullback-Leibler divergence.
💡 Research Summary
The paper addresses the classic problem of sequential prediction under uncertainty. A discrete-valued infinite sequence $x_1, x_2, \dots$ is generated by an unknown stochastic law $\mu$. The only information available to the forecaster is that $\mu$ belongs to a prescribed class $\mathcal{C}$ of stochastic processes. After each observation the predictor must output a conditional distribution for the next symbol. The central question is: under what conditions on $\mathcal{C}$ does there exist a predictor $\rho$ whose conditional probabilities converge to the true $\mu$-conditionals for every possible $\mu\in\mathcal{C}$?
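The paper's answer is that when such a $\rho$ exists, it can be taken to be a Bayesian mixture with a prior concentrated on countably many elements of $\mathcal{C}$. As a minimal sketch of what such a mixture predictor looks like (not the paper's construction), the snippet below uses a deliberately small hypothetical class: five i.i.d. Bernoulli measures with a uniform prior, all of which are assumptions for illustration only.

```python
import numpy as np

# Illustrative finite stand-in for a countable class C: i.i.d. Bernoulli(theta)
# measures with a uniform prior. Both choices are hypothetical, for the sketch.
THETAS = np.array([0.1, 0.3, 0.5, 0.7, 0.9])              # candidate measures mu_k
LOG_PRIOR = np.full(len(THETAS), -np.log(len(THETAS)))    # prior weights w_k

def mixture_predictive(history):
    """Return rho(next = 1 | history) for the mixture rho = sum_k w_k * mu_k."""
    ones = sum(history)
    zeros = len(history) - ones
    # log mu_k(history) for each i.i.d. Bernoulli candidate
    log_lik = ones * np.log(THETAS) + zeros * np.log(1.0 - THETAS)
    log_post = LOG_PRIOR + log_lik
    post = np.exp(log_post - log_post.max())
    post /= post.sum()                # posterior over candidate measures
    return float(post @ THETAS)       # posterior-weighted mixture of conditionals

# Example: predictions approach the true conditional probability 0.7.
rng = np.random.default_rng(0)
xs = (rng.random(500) < 0.7).astype(int)
print(mixture_predictive(xs[:10]), mixture_predictive(xs[:500]))
```

The key mechanic is visible even in this toy case: the posterior concentrates on the candidate closest to the data-generating measure, so the mixture's conditional probabilities track the true ones.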
Two notions of convergence are considered. The first, very strong, requires convergence in total variation distance: for almost every history the predictive distribution $\rho(\cdot\mid x_{1:t})$ must become indistinguishable from $\mu(\cdot\mid x_{1:t})$. The second, much weaker, demands that the expected average Kullback-Leibler (KL) divergence between $\rho$ and $\mu$ tends to zero:

$$\frac{1}{n}\,\mathbb{E}_\mu \sum_{t=1}^{n} \mathrm{KL}\bigl(\mu(\cdot\mid x_{1:t-1})\,\big\|\,\rho(\cdot\mid x_{1:t-1})\bigr) \;\longrightarrow\; 0 \quad (n\to\infty).$$
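The weak criterion can be checked empirically by Monte-Carlo simulation. The sketch below estimates the expected average KL divergence for an i.i.d. Bernoulli source and the classical Laplace (add-one) predictor; both the source and the predictor are illustrative assumptions, not objects from the paper.

```python
import numpy as np

def laplace_predictive(history):
    """Laplace (add-one) rule: rho(next = 1 | history) = (#ones + 1) / (t + 2)."""
    return (sum(history) + 1) / (len(history) + 2)

def avg_kl(p_true, predictor, n=2000, trials=20, seed=0):
    """Monte-Carlo estimate of (1/n) E_mu sum_t KL(mu(.|x_<t) || rho(.|x_<t))
    for an i.i.d. Bernoulli(p_true) source, averaged over sampled trajectories."""
    rng = np.random.default_rng(seed)
    est = 0.0
    for _ in range(trials):
        history, total = [], 0.0
        for _ in range(n):
            q = predictor(history)  # predicted P(next = 1); true value is p_true
            total += (p_true * np.log(p_true / q)
                      + (1 - p_true) * np.log((1 - p_true) / (1 - q)))
            history.append(int(rng.random() < p_true))
        est += total / n
    return est / trials

# The average divergence shrinks as the horizon n grows, as the weak
# criterion requires.
for n in (100, 1000, 10000):
    print(n, avg_kl(0.7, laplace_predictive, n=n, trials=5))
```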