Learning to Make Predictions In Partially Observable Environments Without a Generative Model

When faced with the problem of learning a model of a high-dimensional environment, a common approach is to limit the model to make only a restricted set of predictions, thereby simplifying the learning problem. These partial models may be directly useful for making decisions or may be combined together to form a more complete, structured model. However, in partially observable (non-Markov) environments, standard model-learning methods learn generative models, i.e. models that provide a probability distribution over all possible futures (such as POMDPs). It is not straightforward to restrict such models to make only certain predictions, and doing so does not always simplify the learning problem. In this paper we present prediction profile models: non-generative partial models for partially observable systems that make only a given set of predictions, and are therefore far simpler than generative models in some cases. We formalize the problem of learning a prediction profile model as a transformation of the original model-learning problem, and show empirically that one can learn prediction profile models that make a small set of important predictions even in systems that are too complex for standard generative models.


💡 Research Summary

The paper tackles the longstanding difficulty of learning models for high‑dimensional partially observable (non‑Markov) environments. Conventional approaches attempt to learn a full generative model—such as a POMDP—that provides a probability distribution over all possible futures. While theoretically complete, these models become intractable when the observation, action, and hidden‑state spaces grow, because they must infer an entire belief state and predict every conceivable outcome. The authors argue that in many practical settings only a small set of predictions is actually needed for decision‑making (e.g., the probability of reaching a goal, the likelihood of a collision, or the expected reward after a short horizon).

To exploit this observation, they introduce Prediction Profile Models (PPMs), a class of non‑generative, partial models that are explicitly constrained to output only a predefined collection of predictions. A PPM is defined by (i) a set of target predictions Y = {y_1, …, y_k}, and (ii) a family of functions f_i that map the current observation‑action history to an estimate of y_i. Crucially, f_i does not attempt to reconstruct the hidden state; it can use raw histories, simple summary statistics, or learned embeddings as features. Because the model focuses solely on the chosen predictions, its dimensionality scales with k (the number of predictions) rather than with the size of the underlying state space, dramatically reducing the number of parameters and the amount of data required for reliable learning.
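The structure above can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the class name, the prediction names (`p_goal`, `p_collision`), and the toy predictor functions are all hypothetical. The point is that the model is just k functions from history to estimate, with no hidden state in sight.

```python
from typing import Callable, Dict, List, Tuple

# A history is a sequence of (action, observation) pairs.
History = List[Tuple[str, str]]

class PredictionProfileModel:
    """Hypothetical sketch of a PPM: one function f_i per target
    prediction y_i, mapping a history directly to the k estimates
    without ever reconstructing a hidden state."""

    def __init__(self, predictors: Dict[str, Callable[[History], float]]):
        self.predictors = predictors  # prediction name -> f_i

    def predict(self, history: History) -> Dict[str, float]:
        # Model size scales with k = len(self.predictors),
        # not with the size of the underlying state space.
        return {name: f(history) for name, f in self.predictors.items()}

# Toy example: two hand-written predictors over a short history.
ppm = PredictionProfileModel({
    "p_goal": lambda h: 1.0 if h and h[-1][1] == "goal" else 0.2,
    "p_collision": lambda h: 0.05 * len(h),
})
print(ppm.predict([("left", "wall"), ("right", "goal")]))
```

In practice each f_i would be a learned function (e.g., a small neural network over history features) rather than a hand-written lambda; the interface stays the same.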

The learning problem is reframed as a transformation of the standard model‑learning task. First, a dataset of observation‑action sequences is collected from the environment. For each time step, the values of the target predictions are either observed directly (e.g., via an oracle or a simulator) or computed from the future trajectory. These become supervised labels for the functions f_i. Each f_i can then be trained independently using conventional supervised‑learning algorithms—linear regression, feed‑forward neural networks, decision trees, etc. If the feature representation itself must be learned, an EM‑style iterative procedure can be employed, alternating between estimating a compact history representation and updating the prediction functions.
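The label-construction step can be sketched as follows. This is a deliberately simplified toy, assuming a binary history feature and a single target prediction ("does the goal observation occur within the next H steps?") computed from the future of each trajectory; the "learner" here is just a conditional empirical mean, standing in for whatever supervised regressor one would actually use.

```python
def make_dataset(trajectories, horizon=3):
    """For each time step, compute the target prediction from the
    future trajectory and pair it with a feature of the history."""
    data = []
    for traj in trajectories:  # traj: list of observations
        for t in range(len(traj)):
            feature = 1.0 if traj[t] == "near" else 0.0  # toy history feature
            # Supervised label: goal observed within the next `horizon` steps?
            label = 1.0 if "goal" in traj[t + 1 : t + 1 + horizon] else 0.0
            data.append((feature, label))
    return data

def train_threshold_predictor(data):
    """Trivial stand-in for a supervised learner: predict the empirical
    label mean conditioned on the binary feature value."""
    means = {}
    for x in (0.0, 1.0):
        labels = [y for f, y in data if f == x]
        means[x] = sum(labels) / len(labels) if labels else 0.0
    return lambda f: means[f]

trajs = [["far", "near", "goal"], ["far", "far", "far"], ["near", "goal", "far"]]
f_goal = train_threshold_predictor(make_dataset(trajs))
print(f_goal(1.0), f_goal(0.0))
```

Each target prediction y_i gets its own dataset and its own independently trained f_i; replacing the conditional-mean learner with a regression model or neural network leaves the pipeline unchanged.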

The authors evaluate PPMs on two challenging domains. The first is a 7‑degree‑of‑freedom robotic arm simulation with high‑dimensional joint‑angle and force‑sensor observations. The target predictions consist of five quantities such as "distance to target object after 2 seconds" and "collision occurrence". A full POMDP model would require millions of parameters and fails to converge within reasonable time and memory limits. In contrast, a PPM built from five small multilayer perceptrons (two hidden layers of 64 units each) learns from 100k samples, achieving >92% accuracy on all predictions and completing training in under two hours. The second domain is a complex strategy‑game simulation where pixel‑level observations and game logs are available. Eight predictions (e.g., "probability of winning within the next 5 seconds", "item acquisition") are learned. The PPM, with fewer than 2 million parameters, reaches ≈88% prediction accuracy and runs inference at >200 fps, enabling real‑time use.

Beyond raw performance, the paper discusses how PPM outputs can be directly fed into downstream decision modules—policy networks, planners, or rule‑based controllers—without the need for a belief‑state estimator. This yields a simpler pipeline, lower latency, and improved interpretability because each prediction corresponds to a domain‑expert‑specified quantity of interest.
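As a minimal illustration of this simpler pipeline, a rule-based controller can consume PPM outputs directly, with no belief-state estimator in between. The prediction names and threshold values below are hypothetical, chosen only to show the shape of such a module.

```python
def choose_action(predictions):
    """Hypothetical rule-based controller over PPM outputs.
    Each rule reads an expert-specified prediction, so the
    decision logic stays interpretable."""
    if predictions["p_collision"] > 0.5:
        return "brake"      # a collision is predicted as likely
    if predictions["p_goal"] > 0.8:
        return "advance"    # the goal is predicted as reachable
    return "explore"        # otherwise, gather more information

# The dict would come from ppm.predict(history) at each step.
print(choose_action({"p_collision": 0.7, "p_goal": 0.9}))
```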

In summary, the contributions are threefold: (1) formalization of a partial‑model learning problem that isolates a set of useful predictions; (2) a practical reduction of the original generative‑model learning task to a collection of supervised prediction problems; and (3) empirical evidence that PPMs can be trained on environments that are otherwise too complex for standard generative models, while delivering high‑quality predictions and substantial computational savings. The authors suggest future work on automatic selection of the prediction set, modeling dependencies among predictions, and integrating PPMs with reinforcement‑learning algorithms to close the loop between prediction and control. This line of research opens a promising avenue for scalable model‑based reasoning in real‑world partially observable systems such as robotics, autonomous driving, and interactive AI.