Multi-Environment MDPs with Prior and Universal Semantics
Multiple-environment Markov decision processes (MEMDPs) equip an MDP with several probabilistic transition functions (one per possible environment), so that the state is observable but the environment is not. Previous work studies two semantics: (i) the universal semantics, where an adversary picks the environment; and (ii) the prior semantics, where the environment is drawn once before execution from a fixed distribution. We clarify the relation between these semantics. For parity objectives, we show that the qualitative questions (value 1) coincide, and we develop a new algorithm for the general value of MEMDPs under the prior semantics. In particular, we show that the prior value of an MEMDP with a parity objective can be approximated to any precision by a space-efficient algorithm; equivalently, the associated gap problem is decidable in PSPACE when probabilities are given in unary (and in EXPSPACE otherwise). We then prove that the universal value equals the infimum of prior values over all beliefs. This yields a new algorithm for the universal gap problem with the same complexity (PSPACE for unary probabilities, EXPSPACE in general), improving on earlier doubly-exponential-space procedures. Finally, we observe that MEMDPs under the prior semantics form an important tractable subclass of POMDPs: our algorithms exploit the fact that belief entropy never increases, and we establish that any POMDP with this property reduces effectively to a prior MEMDP, showing that prior MEMDPs capture a broad and practically relevant subclass of POMDPs.
💡 Research Summary
This paper investigates Multi‑Environment Markov Decision Processes (MEMDPs), a model that augments a classical MDP with several probabilistic transition functions—one for each possible environment—while keeping the state observable and the environment hidden. Two semantics have been studied previously: (i) the universal (adversarial) semantics, where an adversary selects the environment before the controller acts, and (ii) the prior semantics, where the environment is drawn once from a fixed distribution (the prior) before execution. The authors clarify the relationship between these semantics, focusing on parity objectives, a canonical class of ω‑regular properties.
First, they prove a qualitative equivalence: for parity objectives, the qualitative value‑1 question coincides under both semantics. In other words, a strategy achieving probability 1 in the worst‑case environment exists if and only if one achieving probability 1 in expectation with respect to a full‑support prior does. This result unifies earlier separate analyses and yields PSPACE‑completeness for the almost‑sure and limit‑sure decision problems (PSPACE‑complete in general, polynomial when the number of environments is fixed).
The core technical contribution is an algorithm that approximates the prior value of an MEMDP to any desired precision ε. The algorithm solves the ε‑gap problem: given an MEMDP M, a prior belief b over environments, a parity objective W, a threshold α ∈ (0,1), and a precision ε > 0, it answers YES if the prior value is at least α, NO if it is at most α − ε, and may answer arbitrarily otherwise. The method relies on belief updates whenever a distinguishing state‑action pair is encountered; the belief update follows a Bayes‑style rule.
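The Bayes‑style belief update the summary alludes to can be sketched concretely. The sketch below is illustrative only: the function name, data layout, and example environments are assumptions for exposition, not taken from the paper. The idea is that after observing a step (s, a, s'), each environment is reweighted by the probability it assigns to that step, and the belief is renormalized.

```python
# Illustrative sketch (not the paper's exact rule): each environment i has a
# transition function transition_probs[i], mapping (state, action) pairs to a
# distribution over successor states.

def update_belief(belief, transition_probs, s, a, s_next):
    """Bayes-style update of the belief over environments after seeing (s, a, s')."""
    # Weight each environment by the likelihood it assigns to the observed step.
    weights = [b_i * transition_probs[i][(s, a)].get(s_next, 0.0)
               for i, b_i in enumerate(belief)]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("observed transition has probability 0 under the current belief")
    return [w / total for w in weights]

# Two hypothetical environments that differ on (s, a): a distinguishing pair.
P0 = {("s", "a"): {"t": 0.9, "u": 0.1}}
P1 = {("s", "a"): {"t": 0.1, "u": 0.9}}

b = update_belief([0.5, 0.5], [P0, P1], "s", "a", "t")
# b is now skewed toward environment 0; on a non-distinguishing pair the
# weights stay proportional to the prior, so the belief is unchanged.
```

On a non‑distinguishing state‑action pair all environments assign the same likelihood, so the update leaves the belief untouched; this is consistent with the summary's point that belief updates matter only at distinguishing pairs.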