Planning with Noisy Probabilistic Relational Rules

Noisy probabilistic relational rules are a promising world model representation for several reasons. They are compact and generalize over world instantiations. They are usually interpretable and they can be learned effectively from the action experiences in complex worlds. We investigate reasoning with such rules in grounded relational domains. Our algorithms exploit the compactness of rules for efficient and flexible decision-theoretic planning. As a first approach, we combine these rules with the Upper Confidence Bounds applied to Trees (UCT) algorithm based on look-ahead trees. Our second approach converts these rules into a structured dynamic Bayesian network representation and predicts the effects of action sequences using approximate inference and beliefs over world states. We evaluate the effectiveness of our approaches for planning in a simulated complex 3D robot manipulation scenario with an articulated manipulator and realistic physics and in domains of the probabilistic planning competition. Empirical results show that our methods can solve problems where existing methods fail.

💡 Research Summary

The paper tackles decision‑making in relational domains by leveraging Noisy Probabilistic Relational Rules (NPRRs) as a compact, interpretable world model. NPRRs extend traditional relational planning rules with explicit probability distributions over effects and independent “noise atoms” that capture stochastic variations. This representation yields two major benefits: (1) it generalizes across many concrete world instantiations because the same relational schema can be reused, dramatically reducing model size; and (2) the rules remain human‑readable, facilitating learning, verification, and manual refinement.
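To make the rule representation above concrete, here is a minimal Python sketch of one grounded rule with a probability distribution over outcomes. The class names `Outcome` and `GroundRule`, the literal strings, and the choice to model noise as a single leftover probability mass that leaves the state unchanged are illustrative assumptions, not the paper's data structures.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    prob: float          # probability of this outcome
    add: frozenset       # ground literals that become true
    delete: frozenset    # ground literals that become false


@dataclass(frozen=True)
class GroundRule:
    action: str          # grounded action, e.g. "grab(cube)"
    context: frozenset   # literals that must hold for the rule to apply
    outcomes: tuple      # explicitly modeled outcomes
    noise_prob: float    # probability mass reserved for unmodeled effects

    def __post_init__(self):
        total = sum(o.prob for o in self.outcomes) + self.noise_prob
        assert abs(total - 1.0) < 1e-9, "outcome probabilities must sum to 1"

    def applies(self, state, action):
        return action == self.action and self.context <= state

    def sample(self, state, rng):
        """Sample a successor state; returns (next_state, was_noise)."""
        r, acc = rng.random(), 0.0
        for o in self.outcomes:
            acc += o.prob
            if r < acc:
                return (state - o.delete) | o.add, False
        # Assumption: the noise outcome leaves the state unchanged.
        return state, True


# Hypothetical rule: grabbing a clear cube succeeds 80% of the time.
grab = GroundRule(
    action="grab(cube)",
    context=frozenset({"clear(cube)", "inhand_nil"}),
    outcomes=(
        Outcome(0.8, frozenset({"inhand(cube)"}), frozenset({"inhand_nil"})),
        Outcome(0.1, frozenset(), frozenset()),
    ),
    noise_prob=0.1,
)
```

Because the rule stores only its context and a handful of outcomes, the same abstract schema could be grounded for any object without enumerating a full transition table.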

Two distinct planning algorithms are built on top of NPRRs. The first integrates NPRRs directly with Upper Confidence Bounds applied to Trees (UCT). In this approach each node of the search tree corresponds to a grounded world state, and child nodes are generated by sampling the stochastic outcomes of all applicable rules. The sampling respects the rule’s preconditions, effect probabilities, and noise variables, so the transition model is never fully enumerated. UCT’s exploration‑exploitation balance guides the search toward promising action sequences while keeping the number of simulated rollouts modest. Because the rule set already provides a compressed transition model, the algorithm scales to large relational state spaces with modest memory and CPU demands.
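The tree search described above can be sketched as a generic UCT loop over a sampling simulator. The function `uct_plan`, its parameters, and the toy `simulate` and `reward` functions are hypothetical; in a rule-based planner, `simulate` would sample a successor from the rule that applies to the chosen action. This is a minimal sketch of standard UCT with the UCB1 selection rule, not the authors' implementation.

```python
import math
import random
from collections import defaultdict


def uct_plan(state, actions, simulate, reward,
             horizon=3, n_rollouts=500, c=1.0, gamma=0.95, seed=0):
    """Return the root action with the highest estimated return."""
    rng = random.Random(seed)
    n_s = defaultdict(int)     # visit counts per (state, depth)
    n_sa = defaultdict(int)    # visit counts per (state, depth, action)
    q = defaultdict(float)     # running mean return per (state, depth, action)

    def select(s, d):
        # Try every action once, then pick by the UCB1 score.
        untried = [a for a in actions if n_sa[(s, d, a)] == 0]
        if untried:
            return rng.choice(untried)
        log_n = math.log(n_s[(s, d)])
        return max(actions, key=lambda a: q[(s, d, a)]
                   + c * math.sqrt(log_n / n_sa[(s, d, a)]))

    def rollout(s, d):
        if d == horizon:
            return 0.0
        a = select(s, d)
        s2 = simulate(s, a, rng)   # sample one stochastic outcome
        ret = reward(s2) + gamma * rollout(s2, d + 1)
        n_s[(s, d)] += 1
        n_sa[(s, d, a)] += 1
        q[(s, d, a)] += (ret - q[(s, d, a)]) / n_sa[(s, d, a)]
        return ret

    for _ in range(n_rollouts):
        rollout(state, 0)
    return max(actions, key=lambda a: q[(state, 0, a)])


# Toy domain (assumption): "good" reaches the rewarding state 1 with
# probability 0.8; "bad" always leads to state 0.
def simulate(s, a, rng):
    return 1 if a == "good" and rng.random() < 0.8 else 0


def reward(s):
    return 1.0 if s == 1 else 0.0


best = uct_plan(0, ("good", "bad"), simulate, reward)
```

States only need to be hashable, so a `frozenset` of ground literals would work as a node key. The exploration constant `c` and the discount `gamma` are tuning choices made for this sketch.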

The second approach converts the entire NPRR set into a structured Dynamic Bayesian Network (DBN). Preconditions become conditional probability tables (CPTs) for observed variables, while effect distributions and noise atoms are encoded as hidden nodes. The DBN thus represents the joint distribution over successive world states given an action sequence. Approximate inference, implemented as a hybrid of particle filtering and variational techniques, propagates belief states forward, so the planner can evaluate the expected utility of an action sequence without explicit rollouts. This belief‑based method is most useful when uncertainty accumulates over long horizons, because it maintains a compact distribution over possible worlds rather than a single sampled trajectory.
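The idea of propagating a belief forward instead of sampling trajectories can be illustrated with a deliberately crude sketch. It assumes a fully factored belief (one independent marginal per ground literal) and a single rule per action; the function `propagate` and the literal names are hypothetical, and this is not the inference scheme used in the paper.

```python
def propagate(belief, context, outcomes, noise_prob=0.0):
    """One-step factored belief update for a single grounded rule.

    belief   : dict mapping literal -> P(literal is true)
    context  : literals that must hold for the rule to apply
    outcomes : list of (prob, add_set, delete_set)
    Assumption: literals are treated as independent, and the noise
    outcome leaves every literal unchanged.
    """
    # Probability that the rule applies, under the independence assumption.
    p_apply = 1.0
    for lit in context:
        p_apply *= belief.get(lit, 0.0)

    literals = set(belief)
    for _, add, delete in outcomes:
        literals |= add | delete

    new_belief = {}
    for lit in literals:
        old = belief.get(lit, 0.0)
        p_true = noise_prob * old          # noise: literal keeps its value
        for prob, add, delete in outcomes:
            if lit in add:
                p_true += prob             # outcome sets the literal true
            elif lit not in delete:
                p_true += prob * old       # outcome leaves it untouched
        # Mix the "rule applies" and "rule does not apply" cases.
        new_belief[lit] = p_apply * p_true + (1.0 - p_apply) * old
    return new_belief


belief = {"clear(cube)": 1.0, "inhand_nil": 1.0, "inhand(cube)": 0.0}
after = propagate(
    belief,
    context={"clear(cube)", "inhand_nil"},
    outcomes=[(0.8, {"inhand(cube)"}, {"inhand_nil"})],
    noise_prob=0.2,
)
# The expected value of a goal literal is simply its marginal.
goal_value = after["inhand(cube)"]
```

Chaining `propagate` over an action sequence would give an estimate of how likely a goal literal is at each step. The estimate is exact only when the context literals are certain, since the factored form ignores correlations between literals.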

The authors evaluate both methods in two challenging settings. The primary benchmark is a 3‑D robot manipulation simulation featuring an articulated arm, realistic physics, friction, collision noise, and joint backlash. The task requires the robot to grasp, lift, and place objects under significant stochastic disturbances. The secondary benchmark consists of several domains from the Probabilistic Planning Competition (PPC), which include classic relational planning problems with hidden variables and probabilistic effects. In both suites, the NPRR‑based planners outperform traditional PDDL‑based planners and Monte‑Carlo Tree Search variants that either ignore relational structure or treat stochasticity in a less principled way. UCT‑NPRR shows rapid convergence for shallow horizons and limited rollout budgets, while DBN‑NPRR achieves higher success rates on deep, highly uncertain plans.

Key insights from the study are: (i) NPRRs provide a powerful middle ground between expressive relational models and tractable probabilistic reasoning; (ii) direct rule‑based UCT leverages the compactness of NPRRs for efficient tree search but remains sensitive to rollout policy design; (iii) converting rules to a DBN enables global belief updates and robust long‑term planning under noise; and (iv) the two approaches are complementary: practitioners can select or hybridize them based on domain characteristics such as horizon length, computational budget, and the degree of stochastic coupling.

In conclusion, the paper demonstrates that noisy probabilistic relational rules can be effectively exploited for decision‑theoretic planning in complex, physics‑rich environments. By coupling NPRRs with both tree‑search and belief‑propagation techniques, the authors solve problems on which existing methods fail. The work opens avenues for future research on hybrid planners that dynamically switch between UCT and DBN inference, as well as on learning richer rule structures from limited interaction data.