Lightning Does Not Strike Twice: Robust MDPs with Coupled Uncertainty
We consider Markov decision processes under parameter uncertainty. Previous studies all restrict to the case in which uncertainties among different states are uncoupled, which leads to conservative solutions. In contrast, we introduce an intuitive concept, termed “Lightning Does Not Strike Twice,” to model coupled uncertain parameters. Specifically, we require that the system can deviate from its nominal parameters only a bounded number of times. We give probabilistic guarantees indicating that this model represents real-life situations, and we devise tractable algorithms for computing optimal control policies under this concept.
💡 Research Summary
The paper addresses the problem of decision‑making under parametric uncertainty in Markov decision processes (MDPs). Classical robust MDP formulations assume that uncertainties in transition probabilities and rewards are uncoupled across states, which forces the planner to guard against the worst‑case realization at every time step. While this yields safety guarantees, it often produces overly conservative policies because in real systems parameter deviations tend to be rare and correlated rather than occurring independently at every state.
To capture this phenomenon the authors introduce the “Lightning Does Not Strike Twice” (LDNST) model. The key idea is to bound the total number of times the true parameters may deviate from a nominal set during the entire planning horizon. Formally, let θ⁰ denote the nominal parameter vector and Θ the full uncertainty set. The LDNST uncertainty set is defined as
Θ(k) = {θ ∈ Θ | |{t | θ_t ≠ θ⁰}| ≤ k},
where k is a user‑specified integer that limits the number of deviations. This creates a coupled uncertainty structure: once a deviation has been used, the remaining budget shrinks, and future steps must respect the remaining allowance.
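To make the set definition concrete, the following is a minimal sketch of a membership test for Θ(k): count the time steps at which a parameter trajectory differs from the nominal vector and compare against the budget. The function names and the toy numbers are illustrative, not from the paper.

```python
# Hypothetical sketch: membership test for the LDNST uncertainty set Θ(k).
# A "trajectory" is a list of per-step parameter vectors; theta_nominal is θ⁰.

def num_deviations(trajectory, theta_nominal):
    """Count the time steps whose parameters differ from the nominal ones."""
    return sum(1 for theta_t in trajectory if theta_t != theta_nominal)

def in_budgeted_set(trajectory, theta_nominal, k):
    """True iff the trajectory uses at most k deviations, i.e. lies in Θ(k)."""
    return num_deviations(trajectory, theta_nominal) <= k

nominal = (0.7, 0.3)
traj = [nominal, (0.5, 0.5), nominal, (0.6, 0.4)]  # deviates at steps 1 and 3
print(in_budgeted_set(traj, nominal, k=2))  # → True
print(in_budgeted_set(traj, nominal, k=1))  # → False
```

Note how the coupling arises: whether a deviation at step 3 is admissible depends on how many deviations were already spent at earlier steps, which uncoupled (rectangular) uncertainty sets cannot express.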
The authors provide probabilistic guarantees for the model. Assuming an underlying distribution P over parameter realizations, they show that the event “the number of deviations does not exceed k” occurs with probability at least 1 − δ, where δ can be bounded using the Markov or Chebyshev inequalities. Consequently, a planner that optimizes against Θ(k) enjoys confidence 1 − δ that the true system remains within the considered uncertainty set. They also develop a scenario‑based sampling bound: to achieve confidence 1 − δ with approximation error ε, it suffices to draw N = O((1/ε)·log(1/δ)) independent scenarios, each specifying the time indices and values of the at‑most‑k deviations.
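As a small illustration of how such a guarantee can be used in practice, the sketch below inverts Markov's inequality to pick a budget k for a target confidence level. It assumes only that the number of deviations D is a nonnegative integer-valued random variable with a known (or estimated) mean; the function name and numbers are illustrative, not from the paper.

```python
import math

def budget_for_confidence(mean_deviations, delta):
    """
    Smallest integer budget k such that Markov's inequality guarantees
    P(#deviations > k) <= delta, assuming the deviation count D is a
    nonnegative integer-valued random variable with the given mean:
        P(D > k) = P(D >= k + 1) <= E[D] / (k + 1) <= delta.
    """
    return max(0, math.ceil(mean_deviations / delta) - 1)

# If deviations occur on average twice per horizon and we want 95% confidence:
print(budget_for_confidence(2.0, 0.05))  # → 39
```

A Chebyshev-based version would exploit the variance of D as well and typically yields a smaller k; Markov is shown here because it needs only the mean.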
Algorithmically, the paper extends standard dynamic programming (DP) by augmenting the state space with a “budget” variable m ∈ {0,…,k} that records how many deviations have already been consumed. The Bellman recursion becomes
V_t(s, m) = max_a min{ Σ_{s'} θ⁰(s'|s,a) [ r(s,a) + V_{t+1}(s', m) ],  min_{θ∈Θ} Σ_{s'} θ(s'|s,a) [ r(s,a) + V_{t+1}(s', m+1) ] },
where the second (deviating) branch is available only while m < k; once m = k, the adversary must follow the nominal parameters θ⁰ for the rest of the horizon. Since the augmented state space has size |S|·(k+1), the recursion costs only a factor of k+1 more than standard DP and remains tractable.
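The budget-augmented recursion can be sketched as a finite-horizon DP. The version below assumes, for simplicity, a finite set of alternative transition kernels the adversary may deviate to and parameter-independent rewards; all names and the toy MDP at the end are illustrative, not from the paper.

```python
def robust_dp(S, A, T, k, P_nominal, P_deviations, reward):
    """
    Finite-horizon DP over the budget-augmented state (s, m), where m counts
    deviations already consumed.
      P_nominal[s][a]  -> dict mapping s' to probability (nominal kernel θ⁰)
      P_deviations     -> list of alternative kernels with the same shape
      reward[s][a]     -> immediate reward (assumed parameter-independent)
    Returns V[t][(s, m)] for t = 0..T, with V[T] = 0.
    """
    V = [{(s, m): 0.0 for s in S for m in range(k + 1)} for _ in range(T + 1)]
    for t in range(T - 1, -1, -1):
        for s in S:
            for m in range(k + 1):
                best = float("-inf")
                for a in A:
                    # Adversary keeps the nominal kernel (budget unchanged)...
                    val = reward[s][a] + sum(
                        p * V[t + 1][(s2, m)]
                        for s2, p in P_nominal[s][a].items())
                    # ...or, if budget remains, spends one deviation.
                    if m < k:
                        for P in P_deviations:
                            dev = reward[s][a] + sum(
                                p * V[t + 1][(s2, m + 1)]
                                for s2, p in P[s][a].items())
                            val = min(val, dev)
                    best = max(best, val)
                V[t][(s, m)] = best
    return V

# Toy 2-state, 2-action MDP (illustrative numbers).
P_nom = {0: {0: {0: 1.0}, 1: {1: 1.0}}, 1: {0: {0: 1.0}, 1: {1: 1.0}}}
P_dev = {0: {0: {1: 1.0}, 1: {0: 1.0}}, 1: {0: {1: 1.0}, 1: {0: 1.0}}}
R = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 2.0}}
V_nominal = robust_dp([0, 1], [0, 1], 5, 0, P_nom, [P_dev], R)  # k = 0
V_budget = robust_dp([0, 1], [0, 1], 5, 2, P_nom, [P_dev], R)   # k = 2
# A larger adversary budget can only lower the robust value:
print(V_budget[0][(0, 0)] <= V_nominal[0][(0, 0)])  # → True
```

With k = 0 the inner deviation branch is never taken and the recursion reduces to standard nominal DP, which matches the intuition that Θ(0) = {θ⁰}.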