A Heuristic Search Algorithm for Solving First-Order MDPs

We present a heuristic search algorithm for solving first-order MDPs (FOMDPs). Our approach combines first-order state abstraction that avoids evaluating states individually, and heuristic search that avoids evaluating all states. Firstly, we apply state abstraction directly on the FOMDP avoiding propositionalization. Such kind of abstraction is referred to as firstorder state abstraction. Secondly, guided by an admissible heuristic, the search is restricted only to those states that are reachable from the initial state. We demonstrate the usefullness of the above techniques for solving FOMDPs on a system, referred to as FCPlanner, that entered the probabilistic track of the International Planning Competition (IPC'2004).

💡 Research Summary

The paper tackles the challenge of solving First‑Order Markov Decision Processes (FOMDPs), which combine relational structure with stochastic dynamics. Traditional approaches first propositionalize the problem, converting relational predicates and objects into a flat set of Boolean variables, and then apply generic MDP solvers. While conceptually straightforward, propositionalization causes an exponential blow‑up in the state space as the number of objects grows, making both memory consumption and computation time prohibitive for realistic domains.
To overcome this bottleneck, the authors introduce a two‑fold strategy. The first component is first‑order state abstraction. Instead of enumerating each concrete world state, the algorithm represents states as sets of first‑order logical formulas. A single abstract state captures all concrete instantiations that are identical up to variable substitution. This abstraction is performed directly on the original FOMDP description, preserving the relational structure and avoiding any grounding step. As a result, the number of entities that must be examined during planning is dramatically reduced.
The second component is heuristic search guided by an admissible heuristic. The authors construct a heuristic function that provides a lower bound on the optimal value for any abstract state. The heuristic is derived from a coarse, upward‑looking value approximation that respects the relational abstraction, ensuring that it never overestimates true costs (admissibility). During planning, a best‑first search (akin to A*) expands only those abstract states that are reachable from the initial state and appear promising according to the heuristic. Transition dynamics—including conditional effects and probabilistic outcomes—are applied at the abstract level via logical substitution, so each expansion remains computationally cheap.
The integrated algorithm is implemented in a system called FCPlanner. FCPlanner was entered into the probabilistic track of the International Planning Competition 2004 (IPC‑2004). The experimental evaluation covered several benchmark domains such as Block‑World, robot navigation, and logistics, each featuring varying numbers of objects and stochastic actions. Across all tests, FCPlanner consistently outperformed propositional planners: it required an order of magnitude less memory, solved larger instances that caused other planners to run out of resources, and achieved faster solution times (often 30 %–50 % quicker). The performance gap widened as the number of objects increased, confirming that the first‑order abstraction scales gracefully where propositionalization does not.
The paper also discusses the design of the heuristic in detail. The heuristic is computed by a lightweight value‑iteration pass over abstract states, exploiting domain‑specific structural information (e.g., goal predicates, action preconditions) to estimate minimal remaining cost. Because the heuristic is evaluated on abstract states, its computation is inexpensive, yet it remains sufficiently informative to prune large portions of the search space. The authors prove that the heuristic is admissible, guaranteeing that the search returns an optimal policy for the original FOMDP.
Limitations and future work are acknowledged. The current framework handles only first‑order logic; extending it to higher‑order constructs or richer functional symbols would broaden applicability. Moreover, the quality of the heuristic directly influences search efficiency, suggesting a line of research into automated heuristic learning or refinement. The authors propose investigating learning‑based heuristic generation, parallel/distributed implementations, and integration with other relational planning techniques.
In summary, the paper presents a novel combination of first‑order state abstraction and admissible heuristic search, delivering a practical algorithm that scales to large relational stochastic domains. By avoiding full grounding and focusing exploration on reachable abstract states, FCPlanner demonstrates that relational structure can be exploited effectively to solve FOMDPs that were previously intractable for propositional planners. This work marks a significant step toward scalable relational decision‑theoretic planning.