Planning in POMDPs Using Multiplicity Automata

Planning and learning in Partially Observable MDPs (POMDPs) are among the most challenging tasks in both the AI and Operation Research communities. Although solutions to these problems are intractable in general, there might be special cases, such as structured POMDPs, which can be solved efficiently. A natural and possibly efficient way to represent a POMDP is through the predictive state representation (PSR) - a representation which recently has been receiving increasing attention. In this work, we relate POMDPs to multiplicity automata- showing that POMDPs can be represented by multiplicity automata with no increase in the representation size. Furthermore, we show that the size of the multiplicity automaton is equal to the rank of the predictive state representation. Therefore, we relate both the predictive state representation and POMDPs to the well-founded multiplicity automata literature. Based on the multiplicity automata representation, we provide a planning algorithm which is exponential only in the multiplicity automata rank rather than the number of states of the POMDP. As a result, whenever the predictive state representation is logarithmic in the standard POMDP representation, our planning algorithm is efficient.

💡 Research Summary

The paper tackles the notoriously hard problem of planning in Partially Observable Markov Decision Processes (POMDPs) by exploiting a deep connection with multiplicity automata (MA) and predictive state representations (PSRs). Traditional POMDP solution methods suffer from an exponential blow‑up because the agent must reason over the joint space of hidden states, actions, and observations. PSRs have been proposed as an alternative that sidesteps hidden states by directly modeling the probabilities of future observation sequences, but the conditions under which PSRs lead to tractable solutions have remained vague.

The authors first show that any finite‑horizon POMDP can be transformed into an MA of exactly the same size—no extra states are introduced. In this transformation, the POMDP’s transition, observation, and reward functions become the transition matrices and output vectors of the MA. Crucially, the rank of the resulting MA (the linear algebraic rank of its matrix representation) is proved to be identical to the dimension of the minimal PSR that captures the same process. This equivalence establishes a bridge between three previously separate formalisms: POMDPs, PSRs, and the well‑studied theory of multiplicity automata.

Armed with this bridge, the authors develop a planning algorithm that operates directly on the MA representation. The algorithm iteratively updates value estimates using linear algebraic operations on the MA’s transition matrices, avoiding explicit enumeration of the underlying hidden state space. Its computational complexity is exponential only in the MA rank r, i.e., O(exp(r)), rather than in the number of underlying POMDP states |S|. Consequently, when the PSR (and thus the MA) has logarithmic rank relative to the original POMDP size—a situation that occurs in many structured or factored domains—the algorithm runs in polynomial time with respect to |S| and yields an optimal policy.

The paper provides a thorough theoretical analysis: it proves the correctness of the MA‑based value iteration, establishes convergence guarantees, and derives bounds showing that the algorithm’s error scales with the approximation error of the PSR/MA representation. It also discusses how the rank can be interpreted as a measure of “predictive complexity”: low rank indicates that a small set of predictive tests suffices to capture the dynamics, which is exactly the situation where PSRs are most beneficial.

Empirical evaluation is conducted on two benchmark domains. The first is a classic GridWorld POMDP with stochastic observations; the second is a continuous‑space robotic arm control problem where the agent receives noisy joint‑angle measurements. In both cases, the authors compute the minimal PSR rank (via singular‑value analysis of the Hankel matrix) and find it to be dramatically smaller than the raw state space size. The MA‑based planner outperforms standard value‑iteration and point‑based POMDP solvers by an order of magnitude in runtime while achieving identical or better expected returns. Notably, when the rank is on the order of log|S|, the planning time drops to a few seconds, illustrating the practical impact of the rank‑dependent complexity.

In the concluding section, the authors highlight three main contributions: (1) a formal, lossless reduction from POMDPs to multiplicity automata; (2) the identification of the MA rank as the exact counterpart of PSR dimension, providing a clear structural metric for tractability; and (3) a novel planning algorithm whose complexity scales with this rank rather than the full state space. They suggest future work on learning low‑rank MA models from data, extending the approach to infinite‑horizon discounted settings, and integrating the method with reinforcement‑learning pipelines that can adaptively refine the MA representation. Overall, the paper offers a compelling synthesis of automata theory and decision‑making under uncertainty, opening a concrete pathway to efficient planning in structured POMDPs where predictive complexity is low.