Qualitative MDPs and POMDPs: An Order-Of-Magnitude Approximation
We develop a qualitative theory of Markov Decision Processes (MDPs) and Partially Observable MDPs that can be used to model sequential decision making tasks when only qualitative information is available. Our approach is based upon an order-of-magnitude approximation of both probabilities and utilities, similar to epsilon-semantics. The result is a qualitative theory that has close ties with the standard maximum-expected-utility theory and is amenable to general planning techniques.
💡 Research Summary
The paper "Qualitative MDPs and POMDPs: An Order-Of-Magnitude Approximation" proposes a framework for sequential decision-making when only qualitative information about probabilities and utilities is available. The authors observe that classical Markov Decision Processes (MDPs) and Partially Observable MDPs (POMDPs) rely on precise numerical transition probabilities, observation models, and reward functions, data that are often missing or unreliable in real-world domains. To address this gap, they introduce an order-of-magnitude (OOM) approximation inspired by ε-semantics and κ-rankings, which replaces exact numbers with symbolic exponents that capture the relative "size" of probabilities and utilities.
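This exponent representation can be sketched in a few lines of code. A probability of order ε^k is stored simply as the integer k, products add exponents, sums are dominated by the smallest exponent, and comparisons reduce to integer ordering. The helper names below are illustrative, not from the paper:

```python
import math

# A probability p ~ eps^k is represented by its exponent k;
# math.inf encodes the rank of an impossible event (probability 0).

def kappa_mul(a, b):
    """Product eps^a * eps^b = eps^(a+b): exponents add."""
    return a + b

def kappa_add(a, b):
    """Sum eps^a + eps^b ~ eps^min(a,b): the larger-magnitude term dominates."""
    return min(a, b)

def kappa_greater(a, b):
    """eps^a > eps^b iff a < b: a smaller exponent means a larger magnitude."""
    return a < b

# Examples of the exponent rules:
assert kappa_mul(1, 2) == 3          # eps^1 * eps^2 = eps^3
assert kappa_add(1, 2) == 1          # eps^1 + eps^2 ~ eps^1
assert kappa_greater(0, 1)           # eps^0 = 1 dominates eps^1
assert kappa_add(2, math.inf) == 2   # an impossible branch contributes nothing
```

Note that no value of ε is ever chosen: all reasoning happens on the integer exponents, which is what makes the approximation purely qualitative.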
Core Concepts
- ε-Semantics and κ-Rankings: A small positive constant ε (conceptually infinitesimal) is fixed. Any probability p is approximated as ε^k, where k ∈ ℕ ∪ {∞} is its κ-rank; a larger k indicates a smaller probability, and k = ∞ denotes probability 0. Utilities are treated analogously: a reward u becomes ε^l, with l representing the order of magnitude of the reward. Arithmetic on these quantities follows exponent rules: ε^a · ε^b = ε^{a+b}, and comparisons reduce to integer ordering of the exponents (a smaller exponent means a larger magnitude).
- Qualitative MDP (QMDP): A QMDP is defined by a state set S, an action set A, a κ-rank transition function τ(s,a,s′) = k_{ss′}^a, and a κ-rank reward function r(s,a) = l_{s}^a. The expected utility of a state satisfies a qualitative Bellman equation: