Automatic Induction of Bellman-Error Features for Probabilistic Planning
Domain-specific features are important in representing problem structure throughout machine learning and decision-theoretic planning. In planning, once state features are provided, domain-independent algorithms such as approximate value iteration can learn weighted combinations of those features that often perform well as heuristic estimates of state value (e.g., distance to the goal). Successful applications in real-world domains often require features crafted by human experts. Here, we propose automatic processes for learning useful domain-specific feature sets with little or no human intervention. Our methods select and add features that describe state-space regions of high inconsistency in the Bellman equation (statewise Bellman error) during approximate value iteration. Our method can be applied using any real-valued-feature hypothesis space and corresponding learning method for selecting features from training sets of state-value pairs. We evaluate the method with hypothesis spaces defined by both relational and propositional feature languages, using nine probabilistic planning domains. We show that approximate value iteration using a relational feature space performs at the state-of-the-art in domain-independent stochastic relational planning. Our method provides the first domain-independent approach that plays Tetris successfully (without human-engineered features).
💡 Research Summary
The paper tackles the long‑standing bottleneck of feature engineering in probabilistic planning by proposing a fully automated method that discovers domain‑specific state features during approximate value iteration (AVI). The core insight is that the statewise Bellman error, the discrepancy between a state's current value estimate and the one‑step backed‑up value dictated by the Bellman equation, highlights regions of the state space where the existing feature set fails to capture essential structure. By repeatedly measuring this error, the algorithm identifies high‑error states, constructs a training set of (state, value) pairs, and then selects new real‑valued features that best reduce the error on this set.
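To make the error signal concrete, here is a minimal sketch of computing the statewise Bellman error on a toy two‑state MDP. The data structures (nested dicts mapping states to actions to outcome lists) are hypothetical illustrations, not the paper's implementation.

```python
# Hedged sketch: statewise Bellman error on a toy 2-state MDP.
# An outcome is a (next_state, probability, reward) triple.

GAMMA = 0.95

def bellman_error(state, value, transitions, gamma=GAMMA):
    """Signed Bellman error: best one-step backup minus the current estimate."""
    backed_up = max(
        sum(p * (r + gamma * value[s2]) for (s2, p, r) in outcomes)
        for outcomes in transitions[state].values()
    )
    return backed_up - value[state]

# Toy MDP: from s0 the single action reaches s1 with reward 1;
# s1 is absorbing with reward 0.
transitions = {
    "s0": {"go": [("s1", 1.0, 1.0)]},
    "s1": {"go": [("s1", 1.0, 0.0)]},
}
value = {"s0": 0.0, "s1": 0.0}

err = bellman_error("s0", value, transitions)  # 1.0 + 0.95*0 - 0 = 1.0
```

States whose error magnitude is large, such as `s0` here, are exactly the ones the method targets when building its training set.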
The proposed framework consists of four steps: (1) run AVI with the current feature set; (2) compute the Bellman error for every visited state; (3) sample states with large errors to form a regression dataset; and (4) invoke a generic feature‑selection learner (e.g., L1‑regularized regression or regression trees) to pick additional features from a predefined hypothesis space. The method is deliberately agnostic to the nature of the hypothesis space, allowing the authors to experiment with both propositional (fixed‑length binary vectors) and relational (first‑order logical predicates) feature languages.
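The four steps above can be sketched as a self‑contained loop on a toy chain MDP. Everything here is an illustrative stand‑in under stated assumptions: AVI is approximated by repeated least‑squares fits to one‑step backups, and "feature selection" is a greedy pick of the candidate most correlated with the error, rather than the paper's actual learners.

```python
# Hedged sketch of the four-step loop on a toy 10-state chain
# (deterministic "advance" action, reward 1 on reaching state 9).
import numpy as np

STATES = list(range(10))
GAMMA = 0.9

def step(s):
    # toy dynamics: advance toward the goal state 9
    s2 = min(s + 1, 9)
    return s2, (1.0 if s2 == 9 else 0.0)

def value(s, feats, w):
    return sum(wi * f(s) for wi, f in zip(w, feats))

def backup(s, feats, w):
    s2, r = step(s)
    return r + GAMMA * value(s2, feats, w)

def avi(feats, sweeps=50):
    # Step 1: fit weights so V(s) tracks its one-step Bellman backup.
    w = np.zeros(len(feats))
    for _ in range(sweeps):
        X = np.array([[f(s) for f in feats] for s in STATES])
        y = np.array([backup(s, feats, w) for s in STATES])
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

# Hypothesis space: threshold indicator features "s >= t".
CANDIDATES = [lambda s, t=t: float(s >= t) for t in STATES]

feats = [lambda s: 1.0]  # start from a bias feature only
for _ in range(3):
    w = avi(feats)                                        # step 1
    errors = [backup(s, feats, w) - value(s, feats, w)    # step 2
              for s in STATES]
    training = np.array(errors)                           # step 3
    # Step 4: greedily add the candidate most correlated with the error.
    scores = [abs(np.dot([c(s) for s in STATES], training))
              for c in CANDIDATES]
    feats.append(CANDIDATES[int(np.argmax(scores))])
```

In the paper's setting, step 4 is handled by a generic learner over a rich feature language; the greedy correlation score here is only a minimal placeholder for that selection step.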
Relational features are expressed as parameterized predicates such as “on(x, y)” or “clear(z)”, enabling the system to capture structural regularities that generalize across objects and problem instances. Propositional features, by contrast, are simple Boolean indicators that are cheap to compute and work well in small, fully enumerated domains. The authors evaluate the approach on nine benchmark probabilistic planning domains, ranging from stochastic blocks‑world and logistics problems to the classic video game Tetris.
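The contrast between the two feature languages can be illustrated with a small sketch. Assuming a hypothetical encoding of a blocks‑world state as a set of ground atoms (not the paper's representation), a relational feature quantifies over objects and so transfers across instances, while a propositional indicator is tied to specific object names.

```python
# Illustrative contrast between relational and propositional features.
# A state is encoded as a set of ground atoms, e.g. ('on', 'a', 'table').

def rel_clear_block_on_table(state, objects):
    # Relational: "exists x. on(x, table) and clear(x)" -- parameterized,
    # so it is meaningful in any blocks-world instance.
    return any({("on", x, "table"), ("clear", x)} <= state for x in objects)

def prop_on_a_table(state):
    # Propositional: a fixed Boolean indicator naming block 'a' directly.
    return ("on", "a", "table") in state

s1 = {("on", "a", "table"), ("clear", "a")}          # instance with block a
s2 = {("on", "c", "table"), ("clear", "c")}          # renamed instance

# The relational feature fires in both instances; the propositional one
# only where block 'a' happens to exist.
```

This name‑independence is what lets relational features learned on small problem instances carry over to larger ones.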
Results show that automatically induced features consistently improve policy quality compared with baseline AVI using only the initial feature set, and they often match or surpass hand‑crafted feature baselines. In particular, the relational feature space achieves state‑of‑the‑art performance on stochastic relational planning tasks, and the approach is the first domain‑independent method to play Tetris successfully without any human‑engineered features. These findings demonstrate that Bellman‑error‑driven feature induction can replace expert knowledge in many settings, offering a scalable path toward truly domain‑independent planning.
The paper also discusses limitations. The candidate‑generation step can become computationally expensive when the hypothesis space is large, and the current experiments focus on discrete or factored state representations; extending the approach to high‑dimensional continuous spaces remains an open challenge. Future work is suggested in the areas of efficient candidate sampling, integration with deep neural feature extractors, and online updating of features during execution.
Overall, the contribution is a principled, general‑purpose mechanism for automatically enriching the feature representation of a planning problem, guided by the very error signal that the planner seeks to minimize. This bridges a gap between model‑free reinforcement learning and model‑based planning, and it opens the door to more autonomous, adaptable AI systems that require minimal human intervention in feature design.