Risk-sensitive reinforcement learning using expectiles, shortfall risk and optimized certainty equivalent risk
We propose risk-sensitive reinforcement learning algorithms catering to three families of risk measures, namely expectiles, utility-based shortfall risk and optimized certainty equivalent risk. For each risk measure, in the context of a finite horizon Markov decision process, we first derive a policy gradient theorem. Second, we propose estimators of the risk-sensitive policy gradient for each of the aforementioned risk measures, and establish $\mathcal{O}\left(1/m\right)$ mean-squared error bounds for our estimators, where $m$ is the number of trajectories. Further, under standard assumptions for policy gradient-type algorithms, we establish smoothness of the risk-sensitive objective, in turn leading to stationary convergence rate bounds for the overall risk-sensitive policy gradient algorithm that we propose. Finally, we conduct numerical experiments to validate the theoretical findings on popular RL benchmarks.
💡 Research Summary
This paper develops a unified risk‑sensitive reinforcement‑learning (RL) framework that accommodates three important families of convex risk measures: expectiles, utility‑based shortfall risk (UBSR), and optimized certainty equivalent (OCE) risk. For each risk measure, the authors work within a finite‑horizon Markov decision process (MDP) and first derive a policy‑gradient theorem that expresses the gradient of the risk‑sensitive objective with respect to the policy parameters.
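For orientation, the objective in this setting is a risk measure of the cumulative discounted return rather than its expectation; in symbols (the notation $R^{\theta}$, horizon $T$, and discount $\gamma$ below are ours, introduced only to fix ideas):

$$\max_{\theta}\; \rho\big(R^{\theta}\big), \qquad R^{\theta} = \sum_{t=0}^{T-1} \gamma^{t}\, r(s_t, a_t), \qquad a_t \sim \pi_{\theta}(\cdot \mid s_t),$$

where $\rho$ is one of the three risk measures (an expectile, UBSR, or OCE); taking $\rho$ to be the expectation recovers the standard policy-gradient objective.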
The expectile risk is defined via the smooth asymmetric quadratic loss $e_\nu(x) = |\nu - \mathbb{I}\{x \le 0\}|\, x^2$: the $\nu$-expectile $\xi_\nu(\theta)$ is the minimizer over $\xi$ of $\mathbb{E}_\theta[e_\nu(R(\tau) - \xi)]$, where $R(\tau)$ is the discounted return of a trajectory $\tau$ generated under policy $\pi_\theta$. By exploiting the strong convexity and smoothness of this loss, the authors prove that $\xi_\nu(\theta)$ is a unique, differentiable function of $\theta$ and, via an implicit-function-theorem argument, obtain the gradient formula

$$\nabla_\theta \xi_\nu(\theta) \;=\; \frac{\mathbb{E}_\theta\!\left[\,\big|\nu - \mathbb{I}\{R(\tau) \le \xi_\nu(\theta)\}\big|\,\big(R(\tau) - \xi_\nu(\theta)\big)\,\nabla_\theta \log p_\theta(\tau)\right]}{\mathbb{E}_\theta\!\left[\,\big|\nu - \mathbb{I}\{R(\tau) \le \xi_\nu(\theta)\}\big|\right]},$$

where $p_\theta$ denotes the trajectory distribution induced by $\pi_\theta$.
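This gradient is a ratio of two expectations under the trajectory distribution, so a natural plug-in estimator replaces both by sample means over $m$ trajectories, with the unknown expectile itself replaced by its empirical counterpart. The NumPy sketch below illustrates one way such an estimator could be organized; the function names, the bisection-based expectile solver, and the array layout are illustrative choices, not the paper's implementation.

```python
import numpy as np

def expectile(returns, nu, tol=1e-8, max_iter=100):
    """Estimate the nu-expectile of sampled returns by solving the
    first-order condition E[|nu - 1{R <= xi}| (R - xi)] = 0 with bisection."""
    lo, hi = returns.min(), returns.max()
    xi = 0.5 * (lo + hi)
    for _ in range(max_iter):
        xi = 0.5 * (lo + hi)
        w = np.where(returns <= xi, 1.0 - nu, nu)   # |nu - 1{R <= xi}|
        g = np.mean(w * (returns - xi))             # FOC residual, nonincreasing in xi
        if abs(g) < tol:
            break
        if g > 0:
            lo = xi                                  # root lies to the right
        else:
            hi = xi                                  # root lies to the left
    return xi

def expectile_policy_gradient(returns, score_functions, nu):
    """Plug-in estimator of grad xi_nu(theta) from m sampled trajectories.

    returns:          shape (m,) array of per-trajectory returns R(tau_i)
    score_functions:  shape (m, d) array of grad_theta log p_theta(tau_i)
    """
    xi = expectile(returns, nu)
    w = np.where(returns <= xi, 1.0 - nu, nu)        # asymmetric weights
    num = np.mean(w[:, None] * (returns - xi)[:, None] * score_functions, axis=0)
    den = np.mean(w)
    return num / den
```

Both the numerator and the denominator are sample means, which is consistent with the $\mathcal{O}(1/m)$ mean-squared-error bounds stated in the abstract.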