A Scalable Method for Solving High-Dimensional Continuous POMDPs Using Local Approximation
Partially-Observable Markov Decision Processes (POMDPs) are typically solved by finding an approximate global solution to a corresponding belief-MDP. In this paper, we offer a new planning algorithm for POMDPs with continuous state, action and observation spaces. Since such domains have an inherent notion of locality, we can find an approximate solution using local optimization methods. We parameterize the belief distribution as a Gaussian mixture, and use the Extended Kalman Filter (EKF) to approximate the belief update. Since the EKF is a first-order filter, we can marginalize over the observations analytically. By using feedback control and state estimation during policy execution, we recover a behavior that is effectively conditioned on incoming observations despite the unconditioned planning. Local optimization provides no guarantees of global optimality, but it allows us to tackle domains that are at least an order of magnitude larger than the current state-of-the-art. We demonstrate the scalability of our algorithm by considering a simulated hand-eye coordination domain with 16 continuous state dimensions and 6 continuous action dimensions.
💡 Research Summary
The paper tackles the long‑standing scalability problem of solving partially observable Markov decision processes (POMDPs) when the state, action, and observation spaces are continuous and high‑dimensional. Traditional approaches either discretize the belief space and compute a global approximation (e.g., point‑based value iteration, particle‑filter‑based solvers) or employ sampling‑based planners such as Monte‑Carlo Tree Search (MCTS). Both strategies quickly become intractable as dimensionality grows because the number of required samples or belief points grows exponentially, and belief updates become computationally expensive.
The authors propose a fundamentally different paradigm that exploits the inherent locality of continuous dynamics. Their method consists of three tightly coupled components: (1) a belief representation as a Gaussian mixture model (GMM), (2) belief propagation using an Extended Kalman Filter (EKF) that provides a first‑order linearization of the dynamics and observation models, and (3) a local trajectory optimization performed around the current belief. By parameterizing the belief with a GMM, the algorithm can capture multimodal uncertainty while keeping the representation compact. The EKF enables analytic marginalization over the observation variable: the expected observation is computed directly from the current Gaussian components, and the covariance update incorporates observation noise without having to draw samples. Consequently, the planning phase can ignore the actual observation sequence and compute an “unconditioned” open‑loop control law.
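The EKF machinery underlying this analytic marginalization can be sketched in a few lines. The key property exploited here is that, under a first‑order (EKF) linearization, the posterior covariance depends only on the Jacobians and noise covariances, not on the realized observation, so planning can propagate uncertainty without sampling observations. A minimal NumPy sketch, with function names and signatures that are illustrative rather than taken from the paper:

```python
import numpy as np

def ekf_predict(mu, Sigma, f, F, Q):
    """EKF prediction step.

    f: dynamics mean function, F: its Jacobian evaluated at mu,
    Q: process-noise covariance.
    """
    mu_pred = f(mu)
    Sigma_pred = F @ Sigma @ F.T + Q
    return mu_pred, Sigma_pred

def ekf_update_covariance(Sigma_pred, H, R):
    """EKF measurement update for the covariance only.

    H: observation Jacobian, R: observation-noise covariance.
    Note that the posterior covariance does not depend on the realized
    observation -- this is what lets the planner marginalize over
    observations analytically and plan 'unconditioned' on them.
    """
    S = H @ Sigma_pred @ H.T + R                 # innovation covariance
    K = Sigma_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    Sigma_post = (np.eye(Sigma_pred.shape[0]) - K @ H) @ Sigma_pred
    return Sigma_post, K
```

In a Gaussian‑mixture belief, the same update would be applied per mixture component (with component weights adjusted by each component's observation likelihood); the sketch above shows the single‑Gaussian core.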
During execution, the same EKF is run in parallel with the controller. When a real observation arrives, the EKF updates the belief, and the controller re‑optimizes locally using the updated belief as a new initial condition. This feedback loop effectively conditions the behavior on incoming observations even though the original plan was unconditioned. The local optimizer can be any gradient‑based nonlinear solver (e.g., L‑BFGS, sequential quadratic programming) and is typically run for a few iterations, which keeps the computational burden modest. Because the optimization is performed in the vicinity of the current belief, the method does not guarantee global optimality, but it dramatically reduces the curse of dimensionality that plagues global solvers.
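The execute/filter/replan loop described above follows a standard receding‑horizon pattern, which can be sketched as follows. Here `plan` (the local trajectory optimizer) and `ekf_step` (the filter update that folds in the real observation) are hypothetical placeholders, not names from the paper:

```python
import numpy as np

def run_feedback_loop(mu0, Sigma0, plan, ekf_step, horizon):
    """Receding-horizon execution of a locally optimized POMDP policy.

    At each step: re-optimize a trajectory from the current belief,
    execute only the first action, then update the belief with the
    filter. The re-planning is what conditions behavior on incoming
    observations, even though each individual plan is open-loop.
    """
    mu, Sigma = mu0, Sigma0
    trajectory = [mu]
    for _ in range(horizon):
        actions = plan(mu, Sigma)           # local optimization (e.g. a few SQP/L-BFGS iterations)
        u = actions[0]                      # execute first action only
        mu, Sigma = ekf_step(mu, Sigma, u)  # belief update with the real observation
        trajectory.append(mu)
    return trajectory
```

Because each call to `plan` is warm‑started near the previous solution, a handful of optimizer iterations per step typically suffices, which is what keeps the per‑step cost modest.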
The authors validate their approach on a simulated hand‑eye coordination task with 16 continuous state dimensions (including arm joint angles, end‑effector pose, and latent target variables) and 6 continuous action dimensions (joint velocity commands). Compared with state‑of‑the‑art particle‑based POMDP solvers, which become infeasible beyond 8–10 dimensions, the proposed algorithm solves the 16‑dimensional problem in roughly one‑thirtieth of the time (approximately a 30× speed‑up). Performance metrics—average cumulative cost, success rate, and robustness to observation noise—are comparable to those of global methods, and in some noisy regimes the local approach even outperforms the baseline because the EKF’s analytic marginalization reduces variance in the belief update.
Two primary limitations are acknowledged. First, the EKF’s first‑order linearization can be inaccurate for highly nonlinear dynamics or when the belief distribution becomes strongly multimodal; the GMM can mitigate this to an extent, but severe non‑Gaussianity may still degrade performance. Second, the local optimizer’s outcome depends heavily on the initial belief; poor initialization can lead to sub‑optimal or unstable trajectories. The authors suggest future work that integrates higher‑order filters (e.g., Unscented Kalman Filter or particle filters) for more accurate belief propagation, and that employs multi‑start strategies or meta‑heuristics to improve global exploration while retaining locality for efficiency.
In summary, the paper introduces a scalable, locally optimal planning framework for continuous high‑dimensional POMDPs. By marrying Gaussian‑mixture belief representations, EKF‑based analytic marginalization, and feedback‑driven local trajectory optimization, it achieves near‑real‑time performance on problems an order of magnitude larger than those tractable by existing methods. This contribution opens the door to applying POMDP‑based decision making in complex robotic domains such as dexterous manipulation, autonomous driving in uncertain environments, and human‑robot collaboration where both the state space and uncertainty are intrinsically continuous and high‑dimensional.