Risk-sensitive Markov control processes

Notice: This research summary and analysis were automatically generated using AI. For accuracy, please refer to the original arXiv source.

We introduce a general framework for measuring risk in the context of Markov control processes with risk maps on general Borel spaces, generalizing known concepts of risk measures from mathematical finance, operations research, and behavioral economics. Within this framework, using weighted norm spaces to accommodate unbounded costs as well, we study two types of infinite-horizon risk-sensitive criteria, discounted total risk and average risk, and solve the associated optimization problems by dynamic programming. For the discounted case, we propose a new discount scheme, which differs from the conventional form but is consistent with the existing literature. For the average risk criterion, we state Lyapunov-like stability conditions that generalize known conditions for Markov chains and ensure the existence of solutions to the optimality equation.


💡 Research Summary

The paper develops a unified, mathematically rigorous framework for risk‑sensitive Markov control processes (MCPs) on general Borel spaces. Starting from the classical MCP components—state space X, action space A, transition kernel Q, and cost function c—the authors replace the usual conditional expectation in the objective with a “risk map” R that acts on functions of the next state. A risk map satisfies the three basic axioms of a risk measure (monotonicity, translation invariance, and centralization) and may additionally be convex, concave, homogeneous, or coherent, allowing the modeling of risk‑averse, risk‑seeking, or mixed preferences as encountered in behavioral economics.
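To make the risk-map axioms concrete, here is a minimal NumPy sketch (not from the paper) using the standard entropic risk measure as the risk map acting on functions of the next state, and checking monotonicity, translation invariance, and centralization (R(0) = 0) numerically. The kernel `Q` and the parameter `gamma` are illustrative assumptions.

```python
import numpy as np

def entropic_risk_map(Q, gamma):
    """Entropic risk map: R(v) = (1/gamma) * log(Q @ exp(gamma * v)).
    Q is a row-stochastic matrix over next states; for cost functions v,
    gamma > 0 models risk aversion, gamma < 0 risk seeking."""
    def R(v):
        return np.log(Q @ np.exp(gamma * np.asarray(v))) / gamma
    return R

rng = np.random.default_rng(0)
Q = rng.random((4, 4))
Q /= Q.sum(axis=1, keepdims=True)      # normalize rows into a stochastic kernel

R = entropic_risk_map(Q, gamma=0.5)
v = rng.normal(size=4)
w = v + rng.random(4)                  # w >= v pointwise

assert np.all(R(v) <= R(w) + 1e-12)         # monotonicity
assert np.allclose(R(v + 3.0), R(v) + 3.0)  # translation invariance
assert np.allclose(R(np.zeros(4)), 0.0)     # centralization: R(0) = 0
```

With `gamma -> 0` this map recovers the plain conditional expectation `Q @ v`, which is the special case the framework generalizes.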

The authors first study risk maps without control, introducing sub‑module (R♯) and upper‑module (R) constructions that provide, respectively, a sub‑linear and a homogeneous envelope of a given risk map. They prove that R♯ dominates the original map, that both modules preserve the risk‑measure properties, and that when the original map is coherent the two envelopes coincide with it. These results give powerful algebraic tools for later dynamic programming analysis.

Control is then incorporated by defining policy‑dependent transition kernels P^π and cost expectations c^π, and by extending the risk map to R^π. Complex risk maps can be built by convex combinations of simpler ones (e.g., mixing an entropic risk‑averse map with a risk‑seeking map), which greatly expands modeling flexibility.
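The mixing construction can be sketched as follows: a convex combination of two risk maps is again a risk map, since each axiom is preserved pointwise under convex combination. The example below (an illustrative sketch, not the paper's notation) mixes a risk-averse and a risk-seeking entropic map and verifies two of the axioms.

```python
import numpy as np

def entropic(Q, gamma):
    # gamma > 0: risk-averse; gamma < 0: risk-seeking (for cost functions v)
    return lambda v: np.log(Q @ np.exp(gamma * np.asarray(v))) / gamma

def mix(R1, R2, lam):
    """Convex combination lam*R1 + (1-lam)*R2 of two risk maps.
    Monotonicity, translation invariance and centralization all survive
    the mixing, so the result is again a valid risk map."""
    return lambda v: lam * R1(v) + (1 - lam) * R2(v)

rng = np.random.default_rng(1)
Q = rng.random((3, 3))
Q /= Q.sum(axis=1, keepdims=True)

# risk-averse (gamma = +1) mixed with risk-seeking (gamma = -1)
R_mixed = mix(entropic(Q, +1.0), entropic(Q, -1.0), lam=0.7)
v = rng.normal(size=3)

assert np.allclose(R_mixed(v + 2.0), R_mixed(v) + 2.0)  # translation invariance
assert np.allclose(R_mixed(np.zeros(3)), 0.0)           # centralization
```

Such mixtures let a single model weight risk-averse and risk-seeking attitudes, matching the mixed preferences from behavioral economics mentioned above.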

Two infinite‑horizon criteria are examined. For the discounted total risk, the paper proposes a novel discounting scheme that departs from the conventional form, in which the discount factor α simply multiplies the expected continuation value, while remaining consistent with the existing literature. For the average risk criterion, the authors state Lyapunov‑like stability conditions, generalizing known conditions for Markov chains, that ensure the existence of solutions to the optimality equation.
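As a rough illustration of risk-sensitive dynamic programming, and not the paper's specific discounting scheme, the sketch below runs value iteration on a toy two-state, two-action model, with an entropic risk map standing in for the conditional expectation in the conventional recursion v ← min_a [c(·,a) + α·R_a(v)]. All model data (`c`, `Q`, `alpha`, `gamma`) are invented for the example.

```python
import numpy as np

alpha, gamma = 0.9, 0.5
c = np.array([[1.0, 2.0],                  # c[x, a]: cost of action a in state x
              [0.5, 3.0]])
Q = np.array([[[0.8, 0.2], [0.3, 0.7]],    # Q[x, a]: next-state distribution
              [[0.5, 0.5], [0.9, 0.1]]])

def risk(v, x, a):
    # entropic risk map replacing the expectation E[v(x') | x, a]
    return np.log(Q[x, a] @ np.exp(gamma * v)) / gamma

v = np.zeros(2)
for _ in range(500):                       # fixed-point iteration
    v = np.array([min(c[x, a] + alpha * risk(v, x, a) for a in range(2))
                  for x in range(2)])

print(v)  # approximate risk-sensitive discounted value per state
```

Because the risk map is monotone and translation invariant, the operator above is an α-contraction in the sup norm, so the iteration converges; the paper's own scheme places the discount factor differently, but the fixed-point machinery is analogous.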

