Qualitative MDPs and POMDPs: An Order-Of-Magnitude Approximation
We develop a qualitative theory of Markov Decision Processes (MDPs) and Partially Observable MDPs that can be used to model sequential decision making tasks when only qualitative information is available. Our approach is based upon an order-of-magnitude approximation of both probabilities and utilities, similar to epsilon-semantics. The result is a qualitative theory that has close ties with the standard maximum-expected-utility theory and is amenable to general planning techniques.
💡 Research Summary
The paper "Qualitative MDPs and POMDPs: An Order-Of-Magnitude Approximation" proposes a framework for sequential decision-making when only qualitative information about probabilities and utilities is available. The authors observe that classical Markov Decision Processes (MDPs) and Partially Observable MDPs (POMDPs) rely on precise numerical transition probabilities, observation models, and reward functions, data that are often missing or unreliable in real-world domains. To address this gap, they introduce an order-of-magnitude (OOM) approximation inspired by ε-semantics and κ-rankings, which replaces exact numbers with symbolic exponents that capture the relative "size" of probabilities and utilities.
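This exponent representation can be sketched in a few lines of code. A probability of order ε^k is stored simply as the integer k, products add exponents, sums are dominated by the smallest exponent, and comparisons reduce to integer ordering. The helper names below are illustrative, not from the paper:

```python
import math

# A probability p ~ eps^k is represented by its exponent k;
# math.inf encodes the rank of an impossible event (probability 0).

def kappa_mul(a, b):
    """Product eps^a * eps^b = eps^(a+b): exponents add."""
    return a + b

def kappa_add(a, b):
    """Sum eps^a + eps^b ~ eps^min(a,b): the larger-magnitude term dominates."""
    return min(a, b)

def kappa_greater(a, b):
    """eps^a > eps^b iff a < b: a smaller exponent means a larger magnitude."""
    return a < b

# Examples of the exponent rules:
assert kappa_mul(1, 2) == 3          # eps^1 * eps^2 = eps^3
assert kappa_add(1, 2) == 1          # eps^1 + eps^2 ~ eps^1
assert kappa_greater(0, 1)           # eps^0 = 1 dominates eps^1
assert kappa_add(2, math.inf) == 2   # an impossible branch contributes nothing
```

Note that no value of ε is ever chosen: all reasoning happens on the integer exponents, which is what makes the approximation purely qualitative.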
Core Concepts
- ε-Semantics and κ-Rankings: A small positive constant ε (conceptually infinitesimal) is fixed. Any probability p is approximated as ε^k, where k ∈ ℕ ∪ {∞} is its κ-rank; a larger k indicates a smaller probability, and k = ∞ denotes probability 0. Utilities are treated analogously: a reward u becomes ε^l, with l representing the order of magnitude of the reward. Arithmetic on these quantities follows exponent rules: ε^a · ε^b = ε^{a+b}, and comparisons reduce to integer ordering of the exponents (a smaller exponent means a larger magnitude).
- Qualitative MDP (QMDP): A QMDP is defined by a state set S, an action set A, a κ-rank transition function τ(s,a,s′) = k_{ss′}^a, and a κ-rank reward function r(s,a) = l_{s}^a. The expected utility of a state satisfies a qualitative Bellman equation: