Control Design for Markov Chains under Safety Constraints: A Convex Approach
This paper focuses on the design of time-invariant memoryless control policies for fully observed controlled Markov chains with a finite state space. Safety constraints are imposed through a pre-selected set of forbidden states: a state is qualified as safe if it is not a forbidden state and the probability of transitioning from it to a forbidden state is zero. The main objective is to obtain control policies whose closed loop generates the maximal set of safe recurrent states, which may comprise multiple recurrent classes. A design method is proposed that relies on a finitely parametrized convex program inspired by entropy maximization principles. A numerical example is provided, and the adoption of additional constraints is discussed.
💡 Research Summary
The paper addresses the problem of synthesizing time‑invariant, memoryless control policies for fully observed, finite‑state Markov decision processes (MDPs) under explicit safety requirements. A set F of forbidden states is prescribed a priori; a state s is deemed “safe” if (i) s∉F and (ii) the probability of transitioning from s to any state in F under the chosen control action is zero. The authors’ objective is to construct a policy that maximizes the set of safe recurrent states generated by the closed‑loop Markov chain. Importantly, the maximal safe set may consist of several disjoint recurrent classes, a situation that most existing works, which typically target a single recurrent class, do not handle.
To achieve this, the authors formulate a convex optimization problem inspired by entropy maximization. For each state‑action pair (s,a) they introduce a non‑negative variable x(s,a) representing the long‑run joint occupation measure of being in s and applying action a. The variables satisfy linear constraints that encode (i) probability conservation (the sum of x(s,a) over actions equals the stationary probability of s), (ii) the Markov transition dynamics (the stationary distribution of the next state is a linear function of x(s,a) and the known transition probabilities P(s′|s,a)), and (iii) the safety requirements (zero stationary mass on any forbidden state and zero flow into forbidden states).
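As a purely illustrative sketch of these linear constraints, the following checks feasibility of a candidate occupation measure x(s,a) on a toy two-state, two-action chain. The transition probabilities and the candidate measure are invented for illustration and are not taken from the paper.

```python
import numpy as np

# Hypothetical toy chain: 2 states, 2 actions; P[a, s, n] = P(next = n | state = s, action = a).
P = np.array([
    [[0.9, 0.1],    # action 0, rows indexed by current state
     [0.2, 0.8]],
    [[0.5, 0.5],    # action 1
     [0.7, 0.3]],
])

def is_feasible(x, forbidden, tol=1e-9):
    """Check the linear constraints on the occupation measure x[s, a]:
    non-negativity, normalization, stationarity (inflow == outflow per state),
    and zero stationary mass on forbidden states."""
    inflow = np.einsum('sa,asn->n', x, P)   # sum_{s,a} x[s,a] * P(n | s, a)
    outflow = x.sum(axis=1)                 # sum_a x[s, a]
    return (np.all(x >= -tol)
            and abs(x.sum() - 1.0) < tol
            and np.allclose(inflow, outflow, atol=tol)
            and all(outflow[s] < tol for s in forbidden))

# Stationary occupation measure of the deterministic policy "always action 0":
# pi = (2/3, 1/3) solves pi = pi P[0], so x puts all its mass on action 0.
x = np.array([[2/3, 0.0],
              [1/3, 0.0]])
assert is_feasible(x, forbidden=set())
```

With a forbidden set such as `{1}`, the same measure fails the check, since state 1 carries stationary mass.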
The objective is to maximize the Shannon entropy H(x) = −∑_{s,a} x(s,a) log x(s,a), or equivalently to minimize its negation ∑_{s,a} x(s,a) log x(s,a), which is a convex function. Maximizing entropy spreads the occupation measure as uniformly as possible over the admissible region, preventing the solution from arbitrarily favoring a particular recurrent class when multiple safe classes exist. Because the negated entropy is convex and all constraints are linear, the resulting program is a standard convex program that can be solved efficiently with interior‑point methods or any off‑the‑shelf convex solver.
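A minimal numeric illustration of this spreading effect (not the paper's program): over the probability simplex with no further constraints, a uniformly spread occupation measure attains strictly higher entropy than a concentrated one.

```python
import numpy as np

def entropy(x):
    """Shannon entropy H(x) = -sum_i x_i log x_i, with the convention 0 log 0 = 0."""
    x = np.asarray(x, dtype=float)
    nz = x > 0
    return float(-np.sum(x[nz] * np.log(x[nz])))

uniform = np.full(4, 0.25)               # mass spread over 4 state-action pairs
skewed = np.array([0.7, 0.1, 0.1, 0.1])  # mass concentrated on one pair

assert entropy(uniform) > entropy(skewed)  # log(4) ~ 1.386 beats ~ 0.94
```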
A notable strength of the formulation is its modularity. Additional operational constraints—such as limits on the frequency of specific actions, energy or cost budgets (∑_{s,a} c(s,a) x(s,a) ≤ C), or minimum dwell‑time requirements in certain states (∑_a x(s,a) ≥ β)—can be incorporated as extra linear inequalities without destroying convexity. This flexibility makes the approach attractive for real‑world applications where safety must coexist with performance, resource, or regulatory constraints.
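For instance, a cost-budget constraint of the kind mentioned above is just one more linear functional of the occupation measure. The costs and measure below are invented numbers for illustration only.

```python
import numpy as np

# Hypothetical per-pair costs c[s, a] and an occupation measure x[s, a]
# over 3 states and 2 actions (illustrative values, not from the paper).
c = np.array([[1.0, 4.0],
              [2.0, 2.0],
              [0.5, 3.0]])
x = np.array([[0.2, 0.1],
              [0.3, 0.0],
              [0.3, 0.1]])

def within_budget(x, c, C):
    """Extra linear inequality: sum_{s,a} c(s,a) * x(s,a) <= C."""
    return float(np.sum(c * x)) <= C

assert within_budget(x, c, C=2.0)       # long-run expected cost here is 1.65
assert not within_budget(x, c, C=1.0)
```

Adding this row to the constraint set leaves the program convex, which is the modularity claimed above.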
The authors illustrate the method on a small example with six states and two control inputs. By declaring state 3 as forbidden, the convex program yields a policy that renders states {1,2,4,5,6} safe and recurrent, while guaranteeing zero probability of ever reaching state 3. The example also demonstrates how imposing an extra action‑frequency constraint reshapes the optimal occupation measure, confirming the method’s adaptability.
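The paper's actual transition data are not reproduced here, but the safety property itself is easy to verify numerically: given a closed-loop transition matrix, every non-forbidden state must have zero one-step probability of entering the forbidden set. The hypothetical six-state chain below (0-indexed, with "state 3" at index 2) is constructed to mirror the described outcome, including two disjoint safe recurrent classes.

```python
import numpy as np

def safe_states(P_closed, forbidden):
    """Return the states s with s not in the forbidden set F and P(s -> F) == 0."""
    S = P_closed.shape[0]
    F = set(forbidden)
    return [s for s in range(S)
            if s not in F and all(P_closed[s, f] == 0.0 for f in F)]

# Hypothetical closed-loop chain: index 2 ("state 3") is forbidden and receives
# zero probability from every other state; {0, 1} and {3, 4, 5} are two
# disjoint safe recurrent classes.
P_closed = np.array([
    [0.5, 0.5, 0.0, 0.0, 0.0, 0.0],
    [0.3, 0.7, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],   # forbidden state (self-loop, never entered)
    [0.0, 0.0, 0.0, 0.4, 0.6, 0.0],
    [0.0, 0.0, 0.0, 0.2, 0.3, 0.5],
    [0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
])

assert safe_states(P_closed, forbidden={2}) == [0, 1, 3, 4, 5]
```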
In summary, the paper contributes a principled, convex‑optimization‑based framework for safety‑constrained control of Markov chains. By leveraging entropy maximization, it ensures a balanced distribution over all admissible safe recurrent classes, and by keeping the problem linear‑convex, it remains computationally tractable even when enriched with a variety of practical constraints. This work bridges a gap between theoretical safety guarantees and implementable control synthesis for stochastic discrete‑time systems.