A Heuristic Search Approach to Planning with Continuous Resources in Stochastic Domains

We consider the problem of optimal planning in stochastic domains with resource constraints, where the resources are continuous and the choice of action at each step depends on resource availability. We introduce the HAO* algorithm, a generalization of the AO* algorithm that performs search in a hybrid state space that is modeled using both discrete and continuous state variables, where the continuous variables represent monotonic resources. Like other heuristic search algorithms, HAO* leverages knowledge of the start state and an admissible heuristic to focus computational effort on those parts of the state space that could be reached from the start state by following an optimal policy. We show that this approach is especially effective when resource constraints limit how much of the state space is reachable. Experimental results demonstrate its effectiveness in the domain that motivates our research: automated planning for planetary exploration rovers.

💡 Research Summary

The paper tackles optimal planning in stochastic domains where resources such as fuel or power are continuous and limited. Traditional heuristic search methods like AO* operate on purely discrete state spaces, and handling continuous resources by discretising them is infeasible because the number of states explodes as the resolution increases. To overcome this, the authors introduce HAO* (Hybrid AO*), a generalisation of AO* that works directly on a hybrid state space composed of discrete variables and continuous resource intervals.

Problem formulation
The authors model the environment as a Markov decision process (MDP) whose state is a pair (s, x), where s is a discrete component (e.g., rover location, mode) and x ∈ ℝⁿ is a vector of continuous resources. A crucial assumption is that each resource is monotonic: it only decreases (or changes in a known direction) as actions are executed. This monotonicity allows the continuous dimensions to be represented compactly as intervals. The transition model P(s′, x′ | s, x, a) and reward function R(s, x, a) are defined as in a standard MDP, and the objective is to maximise the expected cumulative reward from a given start state (s₀, x₀).
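To make the formulation concrete, the following is a minimal Python sketch of a hybrid state (s, x) and a monotonic stochastic transition. It is an illustration under stated assumptions, not the authors' implementation: the names `HybridState`, `Outcome`, `MODEL`, `applicable`, and `step`, the toy rover model, the uniform cost distribution, and the rule that an action is applicable only when the worst-case cost is affordable are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class HybridState:
    discrete: str                  # s: e.g. rover location or mode
    resources: Tuple[float, ...]   # x: continuous resource levels


@dataclass(frozen=True)
class Outcome:
    prob: float                    # probability of this discrete outcome
    next_discrete: str             # resulting discrete component s'
    min_cost: Tuple[float, ...]    # lower bound on resource consumption
    max_cost: Tuple[float, ...]    # upper bound on resource consumption


# Hypothetical toy model: one action with two stochastic outcomes.
MODEL: Dict[Tuple[str, str], List[Outcome]] = {
    ("base", "drive"): [
        Outcome(0.8, "rock", (2.0,), (3.0,)),
        Outcome(0.2, "base", (1.0,), (1.5,)),
    ],
}


def applicable(state: HybridState, action: str) -> bool:
    """Assumed rule: an action is allowed only if every outcome's
    worst-case cost can be paid from the current resources."""
    outcomes = MODEL.get((state.discrete, action))
    if not outcomes:
        return False
    return all(
        level >= cost
        for outcome in outcomes
        for level, cost in zip(state.resources, outcome.max_cost)
    )


def step(state: HybridState, action: str, rng: random.Random) -> HybridState:
    """Sample (s', x') from P(s', x' | s, x, a); resources never increase."""
    outcomes = MODEL[(state.discrete, action)]
    chosen = rng.choices(outcomes, weights=[o.prob for o in outcomes])[0]
    # Assumed cost distribution: uniform between the outcome's bounds.
    cost = tuple(
        rng.uniform(lo, hi) for lo, hi in zip(chosen.min_cost, chosen.max_cost)
    )
    new_levels = tuple(x - c for x, c in zip(state.resources, cost))
    return HybridState(chosen.next_discrete, new_levels)
```

Because every sampled cost is non-negative, the resource vector can only shrink along any trajectory, which is the monotonicity property the formulation relies on.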

Limitations of prior work
Previous approaches either ignore the stochasticity of action outcomes, treat resources as purely discrete (by fine‑grained quantisation), or apply dynamic programming without heuristic guidance. These methods either lose optimality guarantees or incur prohibitive computational costs. The waste is greatest when resource constraints severely limit the reachable portion of the state space, because effort is still spent on states that can never be reached from the start state.
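The reachability argument can be illustrated with a small Python sketch that compares the size of a fully quantised state grid with the number of states actually reachable from the start state. The domain is a hypothetical, deterministic toy (locations on a line, each move costing one unit of energy) chosen only to show the effect; it is not the rover domain from the paper, and the function `count_states` is an assumption made for this example.

```python
from collections import deque
from typing import Tuple


def count_states(
    num_locations: int, energy_levels: int, start_loc: int, start_energy: int
) -> Tuple[int, int]:
    """Return (size of the quantised grid, states reachable from the start).

    Toy domain: the agent moves one location left or right per step,
    and each move consumes exactly one unit of energy.
    """
    # Every (location, energy) pair that uninformed dynamic programming sweeps.
    total = num_locations * energy_levels

    # Forward breadth-first search from the start state only.
    seen = {(start_loc, start_energy)}
    queue = deque(seen)
    while queue:
        loc, energy = queue.popleft()
        if energy == 0:
            continue  # no resource left, so no action is applicable
        for nxt in (loc - 1, loc + 1):
            if 0 <= nxt < num_locations:
                state = (nxt, energy - 1)
                if state not in seen:
                    seen.add(state)
                    queue.append(state)
    return total, len(seen)
```

With 100 locations, 101 energy levels, and a start state at location 0 with 5 units of energy, `count_states(100, 101, 0, 5)` returns `(10100, 12)`: the grid has 10,100 states, but only 12 are reachable under the resource constraint.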

HAO* algorithm
HAO* retains the graph‑based structure of AO* but augments each node with a continuous resource interval X =