Symbolic Dynamic Programming for Discrete and Continuous State MDPs


Many real-world decision-theoretic planning problems can be naturally modeled with discrete and continuous state Markov decision processes (DC-MDPs). While previous work has addressed automated decision-theoretic planning for DC-MDPs, optimal solutions have so far been defined only for limited settings, e.g., DC-MDPs having hyper-rectangular piecewise linear value functions. In this work, we extend symbolic dynamic programming (SDP) techniques to provide optimal solutions for a vastly expanded class of DC-MDPs. To address the inherent combinatorial aspects of SDP, we introduce the XADD, a continuous-variable extension of the algebraic decision diagram (ADD) that maintains compact representations of the exact value function. Empirically, we demonstrate an implementation of SDP with XADDs on various DC-MDPs, showing the first optimal automated solutions to DC-MDPs with linear and nonlinear piecewise partitioned value functions and showing the advantages of constraint-based pruning for XADDs.


💡 Research Summary

This paper tackles the longstanding challenge of solving discrete-and-continuous state Markov decision processes (DC-MDPs) optimally. Previous decision-theoretic planning methods could guarantee optimality only for a narrow class of problems, typically those whose value functions are piecewise linear over hyper-rectangular partitions; the authors extend symbolic dynamic programming (SDP) to a far broader class. The central technical contribution is the Extended Algebraic Decision Diagram (XADD), a data structure that generalizes the classic algebraic decision diagram (ADD) to handle continuous variables and arbitrary logical constraints.
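For orientation, SDP carries out the standard finite-horizon Bellman backup symbolically. Writing b for the discrete state variables, x for the continuous ones, γ for the discount factor, and h for the horizon, one backup step can be sketched as below. This is the generic textbook form; the paper's exact factorization of the transition model P may differ.

```latex
\begin{aligned}
Q_a^{h+1}(\mathbf{b},\mathbf{x}) &= R_a(\mathbf{b},\mathbf{x})
  + \gamma \sum_{\mathbf{b}'} \int P(\mathbf{b}',\mathbf{x}' \mid \mathbf{b},\mathbf{x},a)\,
    V^{h}(\mathbf{b}',\mathbf{x}')\, d\mathbf{x}' \\
V^{h+1}(\mathbf{b},\mathbf{x}) &= \max_{a}\; Q_a^{h+1}(\mathbf{b},\mathbf{x})
\end{aligned}
```

When R, P, and V^h are piecewise (case) functions, every step of this backup is a sum, product, integral, or maximum of case functions, which is why those are the operations an XADD must support.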

An XADD is a directed acyclic graph where each internal node represents a Boolean test on a linear (or, more generally, affine) inequality involving continuous variables, and each leaf stores a symbolic expression (polynomial, exponential, logarithmic, etc.). This representation allows the exact encoding of value functions that are piecewise defined by arbitrary algebraic formulas, not just linear functions. The authors systematically define the core operations required for SDP on XADDs: addition, multiplication, maximization (policy improvement), and symbolic integration (expectation over continuous transition models). Integration is performed analytically for function classes that admit closed‑form antiderivatives; the resulting expressions become new leaf nodes. Maximization is handled by comparing leaf expressions across actions, detecting crossing points, and refining the partition accordingly.
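The Python below is a minimal sketch of the XADD idea under simplifying assumptions: leaves hold only linear expressions, and the variable ordering, node caching, and reduction that a real XADD needs are all omitted. The names `Leaf`, `Node`, `apply`, `xadd_sum`, and `casemax` are hypothetical and are not the authors' API. The sketch shows how a binary operation recurses through the decision nodes of both diagrams down to pairs of leaves, and how the maximum of two linear leaves becomes a new decision node testing `a - b > 0`.

```python
from __future__ import annotations

from dataclasses import dataclass
from fractions import Fraction
from typing import Callable, Dict, Union

# Linear expression c0 + sum_i c_i * x_i stored as {var: coef}; the key "" holds c0.
Linear = Dict[str, Fraction]


def lin(const=0, **coefs) -> Linear:
    expr = {"": Fraction(const)}
    expr.update({v: Fraction(c) for v, c in coefs.items()})
    return expr


def lin_add(a: Linear, b: Linear) -> Linear:
    out = dict(a)
    for v, c in b.items():
        out[v] = out.get(v, Fraction(0)) + c
    return out


def lin_sub(a: Linear, b: Linear) -> Linear:
    return lin_add(a, {v: -c for v, c in b.items()})


def lin_eval(expr: Linear, point: Dict[str, Fraction]) -> Fraction:
    return expr.get("", Fraction(0)) + sum(c * point[v] for v, c in expr.items() if v)


@dataclass
class Leaf:
    expr: Linear  # value of the function on this region of the partition


@dataclass
class Node:
    test: Linear  # decision "test > 0"
    high: XADD    # branch taken when test > 0
    low: XADD     # branch taken when test <= 0


XADD = Union[Leaf, Node]


def evaluate(d: XADD, point: Dict[str, Fraction]) -> Fraction:
    # Follow decisions down to a leaf, then evaluate that leaf's expression.
    while isinstance(d, Node):
        d = d.high if lin_eval(d.test, point) > 0 else d.low
    return lin_eval(d.expr, point)


def apply(op: Callable[[Leaf, Leaf], XADD], f: XADD, g: XADD) -> XADD:
    """Combine two diagrams by expanding f's decisions, then g's, down to leaf pairs."""
    if isinstance(f, Node):
        return Node(f.test, apply(op, f.high, g), apply(op, f.low, g))
    if isinstance(g, Node):
        return Node(g.test, apply(op, f, g.high), apply(op, f, g.low))
    return op(f, g)


def xadd_sum(f: XADD, g: XADD) -> XADD:
    return apply(lambda a, b: Leaf(lin_add(a.expr, b.expr)), f, g)


def casemax(f: XADD, g: XADD) -> XADD:
    # max(a, b) of two linear leaves is itself a case function: a new decision on a - b > 0.
    return apply(lambda a, b: Node(lin_sub(a.expr, b.expr), a, b), f, g)


# Usage: Q1(x) = 2x if x > 1 else 0, and Q2(x) = x + 3.
q1 = Node(lin(-1, x=1), Leaf(lin(0, x=2)), Leaf(lin(0)))
q2 = Leaf(lin(3, x=1))
v = casemax(q1, q2)
for x in (0, 2, 5):
    # prints "0 3 3", then "2 5 9", then "5 10 18" (point, max, sum)
    print(x, evaluate(v, {"x": Fraction(x)}), evaluate(xadd_sum(q1, q2), {"x": Fraction(x)}))
```

Because the max is expressed as a new decision node rather than evaluated numerically, the result remains an exact piecewise function. The crossing point x = 3 of 2x and x + 3 appears as the test `x - 3 > 0`.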

A major obstacle in symbolic approaches is the combinatorial explosion of partitions as the algorithm iterates. To mitigate this, the paper introduces a constraint‑based pruning technique. Each node’s associated inequality set is examined using a SAT/SMT solver; infeasible regions are eliminated, and identical or dominated leaf expressions are merged. This pruning dramatically reduces the size of the XADD, preserving tractability without sacrificing exactness.
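To make constraint-based pruning concrete, here is a small self-contained sketch. In place of the SAT/SMT solver mentioned above, it uses Fourier-Motzkin elimination over exact rationals to decide whether the inequalities along a path are satisfiable. That substitution keeps the example dependency-free but scales poorly, so it is an assumption for illustration only. The helpers `feasible` and `prune` are hypothetical names. Unreachable branches are dropped, and a node whose two children are identical is replaced by that child.

```python
from __future__ import annotations

from dataclasses import dataclass
from fractions import Fraction
from typing import Dict, List, Tuple, Union

# Linear expression c0 + sum_i c_i * x_i as {var: coef}; the key "" holds c0.
Linear = Dict[str, Fraction]
# (expr, strict): expr > 0 when strict, expr >= 0 otherwise.
Constraint = Tuple[Linear, bool]


def combine(a: Linear, ka: Fraction, b: Linear, kb: Fraction) -> Linear:
    return {v: ka * a.get(v, Fraction(0)) + kb * b.get(v, Fraction(0)) for v in set(a) | set(b)}


def feasible(constraints: List[Constraint]) -> bool:
    """Exact satisfiability of strict/non-strict linear inequalities via Fourier-Motzkin."""
    cs = list(constraints)
    variables = {v for e, _ in cs for v, c in e.items() if v and c != 0}
    for x in variables:
        pos = [(e, s) for e, s in cs if e.get(x, 0) > 0]
        neg = [(e, s) for e, s in cs if e.get(x, 0) < 0]
        cs = [(e, s) for e, s in cs if e.get(x, 0) == 0]
        for ep, sp in pos:
            for en, sn in neg:
                # Positive multipliers that cancel x; the result is strict if either input is.
                cs.append((combine(ep, -en[x], en, ep[x]), sp or sn))
    # Only constant terms remain; each must satisfy its own (in)equality.
    for e, strict in cs:
        c = e.get("", Fraction(0))
        if c < 0 or (strict and c == 0):
            return False
    return True


@dataclass
class Leaf:
    label: str  # opaque leaf value in this sketch


@dataclass
class Node:
    test: Linear  # decision "test > 0"
    high: Union[Leaf, Node]
    low: Union[Leaf, Node]


def prune(d: Union[Leaf, Node], path: Tuple[Constraint, ...] = ()) -> Union[Leaf, Node]:
    """Drop branches whose path constraints are unsatisfiable, then merge identical children."""
    if isinstance(d, Leaf):
        return d
    hi_path = path + ((d.test, True),)
    lo_path = path + (({v: -c for v, c in d.test.items()}, False),)  # test <= 0
    hi_ok, lo_ok = feasible(list(hi_path)), feasible(list(lo_path))
    if not lo_ok:
        return prune(d.high, hi_path)
    if not hi_ok:
        return prune(d.low, lo_path)
    high, low = prune(d.high, hi_path), prune(d.low, lo_path)
    return high if high == low else Node(d.test, high, low)


# x > 2 ? (x > 1 ? A : B) : C.  Under x > 2 the inner test always holds, so B is unreachable.
d = Node({"": Fraction(-2), "x": Fraction(1)},
         Node({"": Fraction(-1), "x": Fraction(1)}, Leaf("A"), Leaf("B")),
         Leaf("C"))
print(prune(d))
```

The pruned diagram keeps only the outer test, with leaf A on its high branch and leaf C on its low branch. In a full XADD the same check would run during every operation that can create new paths, such as `casemax`.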

The experimental evaluation covers four benchmark DC‑MDPs of increasing complexity: (1) a two‑dimensional robotic arm with linear rewards, (2) an energy‑management problem featuring nonlinear reward components (square‑root and trigonometric terms), (3) a mixed discrete‑continuous financial portfolio optimization, and (4) a logistics robot navigation task with intricate state constraints. For each domain, the XADD‑based SDP is compared against (a) a traditional linear‑piecewise SDP, (b) a numeric value‑iteration algorithm, and (c) a naïve enumeration approach. Results show that XADD‑SDP consistently computes the exact optimal value function, even when the reward or transition model is nonlinear. In the nonlinear benchmarks, the linear‑only methods either fail to converge or produce substantial approximation errors, whereas XADD‑SDP maintains errors near machine precision. Pruning reduces the number of diagram nodes by roughly 30% on average, and overall runtime is 2–3× faster than the unpruned symbolic baseline.

The authors acknowledge limitations: the current implementation supports only function classes with closed‑form integrals (polynomials, exponentials, logarithms) and linear inequality tests. Extending XADDs to handle arbitrary probability distributions (e.g., Gaussian mixtures) or higher‑degree nonlinear constraints will require additional symbolic integration techniques or hybrid numeric‑symbolic methods. Moreover, while pruning curbs diagram growth, very high‑dimensional continuous spaces (beyond 4–5 dimensions) still pose scalability challenges, suggesting future work on dimensionality reduction, approximate abstraction, or integration with sampling‑based planners.
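The closed-form requirement can be illustrated with polynomials. Assume, purely for illustration (the source does not specify this model), a single continuous variable with uniform additive transition noise u ~ Uniform[-w, w]. The expected next-step value of a polynomial leaf V is then (F(x + w) - F(x - w)) / (2w), where F is an antiderivative of V, and that result is again a polynomial in x. The function names below are hypothetical.

```python
from fractions import Fraction
from math import comb
from typing import List

# Univariate polynomial c0 + c1*x + c2*x^2 + ... as a coefficient list [c0, c1, c2, ...].
Poly = List[Fraction]


def antiderivative(p: Poly) -> Poly:
    # Integrate term by term: c_k x^k -> c_k x^(k+1) / (k+1).
    return [Fraction(0)] + [Fraction(c) / (k + 1) for k, c in enumerate(p)]


def shift(p: Poly, s: Fraction) -> Poly:
    """Coefficients of p(x + s), expanded with the binomial theorem."""
    out = [Fraction(0)] * len(p)
    for k, c in enumerate(p):
        for j in range(k + 1):
            out[j] += Fraction(c) * comb(k, j) * Fraction(s) ** (k - j)
    return out


def expected_under_uniform_noise(v: Poly, w: Fraction) -> Poly:
    """E[v(x + u)] for u ~ Uniform[-w, w], i.e. (F(x + w) - F(x - w)) / (2w) with F' = v."""
    big_f = antiderivative(v)
    return [(a - b) / (2 * w) for a, b in zip(shift(big_f, w), shift(big_f, -w))]


# V(x) = x^2 with noise half-width w = 1 gives E[(x + u)^2] = x^2 + 1/3.
print(expected_under_uniform_noise([Fraction(0), Fraction(0), Fraction(1)], Fraction(1)))
```

The returned coefficients are [1/3, 0, 1, 0], i.e. 1/3 + x^2 with a zero cubic term. Because the result stays inside the polynomial class, it can be stored as a new exact leaf; a distribution without such a closed form would break this step.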

In summary, this work delivers the first exact, symbolic dynamic programming framework capable of solving DC‑MDPs with both linear and nonlinear piecewise‑partitioned value functions. By marrying XADDs with constraint‑driven pruning, the authors provide a compact, manipulable representation of the exact value function, opening the door to optimal automated planning in domains where continuous and discrete decisions intertwine.