Universal Prediction of Selected Bits
Many learning tasks can be viewed as sequence prediction problems. For example, online classification can be converted to sequence prediction with the sequence being pairs of input/target data and where the goal is to correctly predict the target data given input data and previous input/target pairs. Solomonoff induction is known to solve the general sequence prediction problem, but only if the entire sequence is sampled from a computable distribution. In the case of classification and discriminative learning though, only the targets need be structured (given the inputs). We show that the normalised version of Solomonoff induction can still be used in this case, and more generally that it can detect any recursive sub-pattern (regularity) within an otherwise completely unstructured sequence. It is also shown that the unnormalised version can fail to predict very simple recursive sub-patterns.
💡 Research Summary
The paper investigates the predictive capabilities of Solomonoff induction when only a subset of bits in a binary sequence follows a computable pattern, while the rest of the sequence may be completely unstructured. Classical results guarantee that the universal a‑priori semimeasure M converges to the true distribution only when the entire sequence is generated by a computable probability measure. In many learning scenarios, however, only the target variable (e.g., class labels) exhibits regularity, whereas the input data may be arbitrary. The authors therefore focus on the “selected bits” problem: can a predictor infer the regular bits without any assumptions on the remaining bits?
The paper first reviews the necessary formalism: binary strings, semimeasures, monotone and prefix Turing machines, the universal monotone machine U_M, and the associated universal semimeasure M. Because M is only a semimeasure (M(x) ≥ M(x0)+M(x1), with strict inequality in general), it does not sum to one and consequently assigns non‑zero probability to the event that the sequence terminates. Solomonoff’s original solution is to normalise M, yielding a proper probability measure M_norm defined by
M_norm(yₙ | y₁…yₙ₋₁) = M(y₁…yₙ) / (M(y₁…yₙ₋₁0) + M(y₁…yₙ₋₁1)).
The authors argue that this normalisation, often ignored in the literature, is crucial for the selected‑bits problem.
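The effect of this normalisation can be made concrete with a toy example. The sketch below uses a small hand-made defective semimeasure over short binary strings (the numbers are invented for illustration; the real M is uncomputable) and shows that the normalised conditionals always sum to one, even though the raw values of M do not.

```python
# Toy illustration (not the real, uncomputable M): a hand-made defective
# semimeasure over short binary strings, and its Solomonoff-style
# normalisation. All numeric values here are invented for illustration.

M = {
    "":   1.0,
    "0":  0.3, "1":  0.5,          # 0.3 + 0.5 < 1.0: mass 0.2 "lost" to halting
    "00": 0.1, "01": 0.1,
    "10": 0.2, "11": 0.2,
}

def m_norm_conditional(x: str, bit: str) -> float:
    """Normalised conditional M_norm(bit | x) = M(x+bit) / (M(x+'0') + M(x+'1'))."""
    total = M[x + "0"] + M[x + "1"]
    return M[x + bit] / total

for x in ["", "0", "1"]:
    p0, p1 = m_norm_conditional(x, "0"), m_norm_conditional(x, "1")
    print(f"prefix {x!r}: P(0)={p0:.3f}  P(1)={p1:.3f}  sum={p0+p1:.3f}")
```

Note that the deficiency of M (e.g. M("0")+M("1") = 0.8 < 1 = M("")) disappears after normalisation: each pair of conditionals sums to exactly one.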
The central positive result is Theorem 10. Let f : B* → B∪{ε} be a total recursive function that, whenever it outputs a bit (i.e., f(ω₁…ωₙ₋₁) ≠ ε), correctly predicts the next bit ωₙ of an infinite binary sequence ω. Under these conditions, the normalised Solomonoff predictor M_norm eventually assigns probability approaching 1 to the bits that f predicts. In other words, if there exists any computable rule that correctly predicts a subsequence of bits, M_norm will learn to predict exactly those bits with certainty in the limit. The proof constructs a new monotone machine L consisting of all programs of U_M whose outputs are consistent with f up to the first mistake. By carefully pruning L to a prefix‑free subset, the authors define an enumerable semi‑distribution P that is dominated by M. Using known relationships between the universal semimeasure M, the universal prefix measure m, and Kolmogorov complexity (Theorem 8), together with Lemma 9 (which states that m(ω₁…ωₙ)/M(ω₁…ωₙ) → 0 for any ω), they show that the conditional probability M_norm(¬ωₙ | ω₁…ωₙ₋₁) tends to zero on the predicted positions, implying convergence to the correct prediction.
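A minimal sketch of a predictor in the setting of Theorem 10 (this particular f and sequence are illustrative choices, not taken from the paper): f is total recursive, abstains (ε, modelled as None) on the unstructured positions, and is correct whenever it does predict.

```python
import random

# Sketch of a selective predictor f : B* -> B ∪ {ε} (ε modelled as None).
# Example sequence: odd-indexed bits (1st, 3rd, ...) are arbitrary, and
# every even-indexed bit copies the bit before it. f abstains on the
# arbitrary bits and predicts the structured ones.

def f(prefix: str):
    if len(prefix) % 2 == 1:       # next bit sits at an even (1-based) position
        return prefix[-1]          # predict: it copies the preceding bit
    return None                    # ε: abstain on the unstructured positions

def sample_sequence(n_pairs: int, rng: random.Random) -> str:
    seq = ""
    for _ in range(n_pairs):
        b = rng.choice("01")       # arbitrary (here: random) odd bit
        seq += b + b               # even bit copies it
    return seq

omega = sample_sequence(8, random.Random(0))
for n in range(len(omega)):
    guess = f(omega[:n])
    assert guess is None or guess == omega[n]   # f is never wrong when it predicts
```

Theorem 10 says that for any such f, M_norm's conditional probability of the bit that f predicts tends to 1 along ω, without any assumption on how the abstained-on bits are generated.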
The paper also presents a negative result: the unnormalised semimeasure M can fail to learn even the simplest recursive pattern. The authors exhibit a sequence where every even bit equals the preceding odd bit, while odd bits are arbitrary. In this case, M’s conditional probabilities for the even positions never converge to 1, demonstrating that normalisation is not a cosmetic detail but a substantive requirement for learning selected bits.
Theorem 11 sketches an extension to partial recursive predictors, showing that if a partial recursive function f is defined on all prefixes of ω and correctly predicts the bits on which it outputs a value, a similar convergence result may hold. The full proof is left open, indicating a direction for future work.
Overall, the paper establishes that normalised Solomonoff induction can detect any computable sub‑pattern embedded in an otherwise arbitrary binary stream. This result bridges a gap between the theory of universal prediction and practical discriminative learning: even when the input distribution is non‑computable or highly complex, as long as the target output follows a computable rule, the normalised universal predictor will eventually learn it. The work also clarifies the role of normalisation, which had been largely ignored in prior analyses, and suggests that many existing impossibility results for Solomonoff induction may need to be revisited under the normalised framework.