Scaling up Heuristic Planning with Relational Decision Trees


Current evaluation functions for heuristic planning are expensive to compute. In many planning problems these functions provide good guidance toward the solution, so they are worth the expense. However, when evaluation functions are misleading, or when planning problems are large enough, many node evaluations are required, which severely limits the scalability of heuristic planners. In this paper, we present a novel machine-learning-based solution for reducing node evaluations in heuristic planning. Specifically, we define the task of learning search control for heuristic planning as a relational classification task, and we use an off-the-shelf relational classification tool to address it. Our relational classification task captures the preferred action to select in the different planning contexts of a specific planning domain. These planning contexts are defined by the set of helpful actions of the current state, the goals remaining to be achieved, and the static predicates of the planning task. This paper shows two methods for guiding the search of a heuristic planner with the learned classifiers. The first uses the resulting classifier as an action policy. The second applies the classifier to generate lookahead states within a Best-First Search algorithm. Experiments over a variety of domains reveal that our heuristic planner, using the learned classifiers, solves larger problems than state-of-the-art planners.


💡 Research Summary

The paper tackles a fundamental scalability bottleneck in heuristic planning: the high computational cost of evaluating heuristic functions at each node of the search tree. While sophisticated heuristics often provide excellent guidance, their evaluation can dominate runtime, especially when the heuristic misguides the search or when the problem size forces the planner to evaluate millions of nodes. To alleviate this, the authors recast the task of search control as a relational classification problem and employ an off‑the‑shelf relational learning system to learn a compact decision model that predicts the most promising action for a given planning context.

Problem formulation
A planning context is defined by three components: (1) the set of “helpful actions” identified by the underlying heuristic for the current state, (2) the set of goals that remain unsatisfied, and (3) the static predicates that describe immutable aspects of the domain (e.g., object types, connectivity). These components naturally form a relational description rather than a flat feature vector. The authors treat each state encountered during training as a relational example and label it with the action that a strong baseline planner (e.g., LAMA or Fast Downward) actually chose as the best expansion. This yields a supervised learning problem: given a relational description of a state, predict the preferred action.
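As a minimal sketch of what one labelled relational example might look like in code (the class names, field names, and Blocksworld facts below are ours for illustration, not the paper's representation):

```python
from dataclasses import dataclass

# Relational facts are represented as tuples: (predicate, arg1, arg2, ...).
Fact = tuple

@dataclass(frozen=True)
class PlanningContext:
    """Relational description of one search state (illustrative sketch)."""
    helpful_actions: frozenset  # actions the heuristic marks as helpful
    pending_goals: frozenset    # goals not yet satisfied in this state
    static_facts: frozenset     # immutable domain facts (types, connectivity)

@dataclass(frozen=True)
class TrainingExample:
    context: PlanningContext
    label: Fact                 # the action the baseline planner chose here

# A Blocksworld-flavoured example: the goal on(a, b) is still pending,
# and the baseline planner expanded this state via stack(a, b).
ctx = PlanningContext(
    helpful_actions=frozenset({("stack", "a", "b"), ("unstack", "c", "d")}),
    pending_goals=frozenset({("on", "a", "b")}),
    static_facts=frozenset({("block", "a"), ("block", "b")}),
)
example = TrainingExample(context=ctx, label=("stack", "a", "b"))
```

Keeping the facts as predicate tuples rather than flattening them into a fixed feature vector is what lets a relational learner consume the example directly.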

Learning method
The relational classification is performed with a relational decision tree (RDT) learner such as TILDE or an ILP‑based system. RDTs can directly handle predicates, variables, and logical relationships, preserving the structural information that would be lost in propositionalization. Training data are collected from successful runs of the baseline planner on a set of training problems; each node in the planner’s search tree contributes one training example. Because the learner is off‑the‑shelf, no domain‑specific engineering is required beyond providing the relational schema.
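Sketched in code, a relational decision tree of this kind could look as follows; the first-order test and the leaf procedures are hand-written stand-ins for what a learner like TILDE would induce from the training examples:

```python
class RDTNode:
    """Minimal relational-decision-tree sketch: internal nodes hold a
    first-order test over the example's facts; leaves return an action."""
    def __init__(self, test=None, yes=None, no=None, leaf=None):
        self.test, self.yes, self.no, self.leaf = test, yes, no, leaf

    def classify(self, example):
        if self.test is None:
            return self.leaf(example)
        return (self.yes if self.test(example) else self.no).classify(example)

# First-order test: does some helpful stack(X, Y) achieve a pending on(X, Y)?
# The variables X, Y are implicit in the iteration over helpful actions.
def stack_achieves_goal(ex):
    return any(a[0] == "stack" and ("on", a[1], a[2]) in ex["pending_goals"]
               for a in ex["helpful_actions"])

def pick_goal_stack(ex):
    for a in ex["helpful_actions"]:
        if a[0] == "stack" and ("on", a[1], a[2]) in ex["pending_goals"]:
            return a

def pick_any_helpful(ex):
    return next(iter(ex["helpful_actions"]))

tree = RDTNode(test=stack_achieves_goal,
               yes=RDTNode(leaf=pick_goal_stack),
               no=RDTNode(leaf=pick_any_helpful))

ex = {"helpful_actions": {("stack", "a", "b"), ("pickup", "c")},
      "pending_goals": {("on", "a", "b")}}
```

Here `tree.classify(ex)` follows the yes-branch and returns `("stack", "a", "b")`; the same tree applies unchanged to states with different objects, which is the point of keeping the representation relational.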

Two integration strategies

  1. Policy‑only mode – The learned tree is used as a deterministic action policy. When the planner expands a node, it queries the tree with the current relational description and immediately executes the tree’s top‑ranked action, bypassing the expensive heuristic evaluation entirely. This yields a dramatic speed‑up, but with no heuristic check left to catch a bad suggestion, a poorly generalizing tree can commit the planner to unrecoverable mistakes.
  2. Look‑ahead mode – The tree is employed inside a Best‑First Search (BFS) framework. The BFS still computes heuristic values, but before expanding a node the planner uses the tree to generate a short look‑ahead trajectory (e.g., applying the tree‑suggested action for a few steps). The heuristic is then evaluated only on the resulting look‑ahead states. Consequently, the number of heuristic calls drops while the search retains the guidance of the original heuristic, mitigating the risk of getting stuck in local minima.
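The two modes can be sketched as follows; `ToyState`, `context()`, and `apply()` are hypothetical names chosen for illustration, not the planner's actual interface:

```python
def policy_step(classifier, state):
    """Policy-only mode: act directly on the classifier's suggestion,
    with no heuristic evaluation at all (illustrative sketch)."""
    action = classifier(state.context())
    return state.apply(action)

def lookahead_states(classifier, state, depth):
    """Look-ahead mode: follow the classifier for up to `depth` steps and
    return the visited states; inside Best-First Search, only these states
    are handed to the heuristic, cutting the number of evaluations."""
    states = []
    for _ in range(depth):
        action = classifier(state.context())
        if action is None:          # classifier has no suggestion: stop
            break
        state = state.apply(action)
        states.append(state)
    return states

class ToyState:
    """Toy stand-in for a planner state: a counter whose goal value is 3."""
    def __init__(self, value=0):
        self.value = value
    def context(self):
        return {"value": self.value}
    def apply(self, action):
        return ToyState(self.value + 1) if action == "inc" else self

def toy_classifier(ctx):
    return "inc" if ctx["value"] < 3 else None
```

With this toy setup, `lookahead_states(toy_classifier, ToyState(0), 5)` yields three states (values 1, 2, 3) and stops once the classifier has no further suggestion, so a real heuristic would be called on three states instead of every successor along the way.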

Experimental evaluation
The authors test both modes on a diverse benchmark suite covering classic domains such as Blocksworld, Logistics, Satellite, Freecell, Rovers, and several more recent IPC domains. Baselines include state‑of‑the‑art planners LAMA, Fast Downward, and recent learning‑augmented planners. Metrics reported are: (i) the largest problem size solved within a fixed time budget, (ii) total number of heuristic evaluations, (iii) overall runtime, and (iv) success rate across multiple runs.

Key findings:

  • Both integration strategies reduce heuristic evaluations by 30–70% on average.
  • In domains with large state spaces and many goals (e.g., Logistics with many packages), the policy‑only mode solves problems up to 1.5× larger than the baseline, while the look‑ahead mode achieves a similar size increase with higher robustness.
  • The amount of training data required is modest; even a few hundred solved instances suffice to learn a tree that generalizes well to larger, unseen problems.
  • Training time is negligible compared to the total planning time, confirming the practicality of the approach.

Technical insights
The central insight is that heuristic functions can be “compressed” into a fast, relational decision rule that captures the essential guidance (which actions are helpful) without recomputing the full heuristic. Relational decision trees preserve logical structure, allowing the model to reason about interactions among objects, goals, and static domain facts. This contrasts with traditional propositional learning approaches that often discard such structure and suffer from feature explosion.
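For instance, a single first-order rule covers every pair of objects at once, where a propositional encoding would need one feature per ground pair. An illustrative Blocksworld sketch (not taken from the paper):

```python
def rule_prefer_stack(pending_goals, state_facts):
    """One relational rule: prefer stack(X, Y) whenever on(X, Y) is a
    pending goal and both X and Y are clear. The variables X and Y range
    over all objects, so no per-object features are needed."""
    preferred = []
    for goal in pending_goals:
        if goal[0] == "on":
            _, x, y = goal
            if ("clear", x) in state_facts and ("clear", y) in state_facts:
                preferred.append(("stack", x, y))
    return preferred

# The same rule applies unchanged whatever the blocks are called:
facts = {("clear", "a"), ("clear", "b"), ("clear", "q7")}
goals = {("on", "a", "b"), ("on", "q7", "a")}
```

A propositional learner would instead need a distinct feature such as `goal_on_a_b_and_clear_a_and_clear_b` for every ground combination, which is the feature explosion the paragraph above refers to.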

The two integration strategies illustrate a trade‑off: pure policy execution maximizes speed but may suffer from reduced exploration, whereas look‑ahead preserves the heuristic’s exploratory power at the cost of a modest number of additional heuristic calls. The authors also analyze tree depth versus generalization, showing that shallow trees (depth ≤ 5) already achieve most of the performance gains, suggesting that the learned control knowledge is relatively simple and domain‑specific.

Limitations and future work
The paper acknowledges several open issues: (1) the learned tree is static; incorporating online updates as new search data become available could improve adaptability. (2) Multi‑objective planning (e.g., minimizing both makespan and cost) is not addressed; extending the relational model to predict a set of Pareto‑optimal actions is a promising direction. (3) The current relational learner is a classic decision‑tree algorithm; recent advances in Graph Neural Networks (GNNs) and neural‑symbolic integration could yield richer representations while retaining relational expressiveness. (4) Scaling to domains with extremely large numbers of static predicates may require predicate pruning or hierarchical learning.

Conclusion
By framing search control as relational classification and leveraging relational decision trees, the authors present a practical, domain‑independent method to dramatically cut heuristic evaluation overhead. The approach integrates seamlessly with existing heuristic planners, either as a fast policy or as a look‑ahead generator within Best‑First Search. Empirical results across a wide range of benchmarks demonstrate that the method enables planners to solve larger problems than current state‑of‑the‑art systems, confirming that learned relational control is a viable path toward scalable heuristic planning.