Average optimality for risk-sensitive control with general state space


📝 Abstract

This paper deals with discrete-time Markov control processes on a general state space. A long-run risk-sensitive average cost criterion is used as a performance measure. The one-step cost function is nonnegative and possibly unbounded. Using the vanishing discount factor approach, the optimality inequality and an optimal stationary strategy for the decision maker are established.


📄 Content

  1. Introduction and the model. This paper deals with discrete-time Markov control processes on a general state space. The one-step cost function is nonnegative and possibly unbounded. The decision maker is supposed to be risk-averse with a constant risk coefficient γ > 0. The risk-sensitive average cost criterion is used as a performance measure. The aim of the work is to establish the optimality inequality for risk-sensitive dynamic programming and derive an optimal stationary policy. The result is proved under two different sets of compactness-continuity assumptions, namely, for Markov control processes with weakly continuous transition probabilities [Condition (W)], as well as transition probabilities that are continuous with respect to setwise convergence [Condition (S)]. A similar problem for risk-neutral stochastic control models has been examined in [27] using the vanishing discount factor approach. However, it is well known that, for risk-sensitive control models, an analogous approximation of the average cost via a sequence of the corresponding discounted models does not work. Instead, following [9,15,16], we introduce an auxiliary discounted minimax problem. A variational formula that expresses the mutual relationship between the relative entropy function and the logarithmic moment-generating function enables us to connect the discounted minimax model with the original one.
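The variational formula alluded to above is the classical Donsker–Varadhan duality between relative entropy and the logarithmic moment-generating function: for a probability measure μ on a Borel space and a bounded measurable function g,

```latex
\[
\ln \int e^{g} \, d\mu
\;=\;
\sup_{\nu} \left\{ \int g \, d\nu \;-\; R(\nu \,\|\, \mu) \right\},
\]
```

where the supremum runs over all probability measures ν, and R(ν‖μ) = ∫ ln(dν/dμ) dν if ν ≪ μ and +∞ otherwise. Taking g proportional to the cost links the exponential (risk-sensitive) functional with an entropy-penalized minimax problem.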

Next, assuming that a certain family of functions is bounded [Condition (B)] and using Fatou’s lemma (for weakly or setwise convergent measures), we obtain the optimality inequality.
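In the risk-sensitive setting, the optimality inequality typically takes the following additive form (normalizations and constants vary across the literature; this is a sketch of the standard shape, with a constant ρ and a bias function h):

```latex
\[
\rho + h(x) \;\ge\;
\inf_{a \in A(x)}
\left\{ c(x,a) + \frac{1}{\gamma}
\ln \int_X e^{\gamma h(y)} \, q(dy \mid x, a) \right\},
\qquad x \in X.
\]
```

A measurable selector attaining the infimum on the right-hand side then yields an optimal stationary policy.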

The predecessor of our result is Theorem 4.1 in [16], where the optimality inequality for risk-sensitive dynamic programming with a countable state space was established. Instead of the boundedness assumption (B), Hernández-Hernández and Marcus [16] assume that there exists a stationary policy which induces a finite average cost equal to the same constant in every state. On the other hand, it is well known that an optimal risk-sensitive average cost may depend on the initial state (see Example 1); this behavior occurs if the risk factor is too large. Instead of restricting the risk coefficient in this way, we use Condition (B), which makes the process reach “good states” sufficiently fast.

There is a rich literature in risk-sensitive control, going back at least to the seminal works of Howard and Matheson [18] and Jacobson [19], which covered the finite horizon case. The average cost criterion on the infinite horizon was studied in [5,8,14,15,16,31] for a denumerable state space and in [10,11,20] for a general state space. It is also worth mentioning that risk-sensitive control finds natural applications in portfolio management, where the objective is to maximize the growth rate of the expected utility of wealth; see [3,4,30] and the references cited therein.

The paper is organized as follows. Below, we describe a Markov control model with the long-run average cost criterion as a performance measure and set up some basic notation. In Section 2 we introduce preliminaries and present the auxiliary discounted minimax problem, which is, in turn, solved in Section 3. The main result is established in Section 4. Section 5 contains a discussion of Condition (B), and in the Appendix a variational formula for the logarithmic moment-generating function is stated.

A discrete-time Markov control process is specified by the following objects:

(i) The state space X is a standard Borel space (i.e., a nonempty Borel subset of some Polish space).

(ii) A is a Borel action space.

(iii) K is a nonempty Borel subset of X ×A. We assume that, for each x ∈ X, the nonempty x-section A(x) = {a ∈ A : (x, a) ∈ K} of K is compact and represents the set of actions available in state x.

(iv) q is a regular conditional distribution from K to X.

(v) The one-step cost function c is a Borel measurable mapping from K to [0, +∞].

Then the history spaces are defined as
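Following the usual convention for Markov control models (with K the set of admissible state-action pairs defined above), the history spaces are

```latex
\[
H_0 = X, \qquad
H_n = K^{n} \times X \quad (n \ge 1), \qquad
H_\infty = K^{\infty},
\]
```

so that an element of H_n is an admissible history ((x_0, a_0), …, (x_{n-1}, a_{n-1}), x_n) of the process up to stage n.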

The class of stationary policies is identified with the class F of measurable functions f from X to A such that f(x) ∈ A(x) for each x ∈ X. It is well known that F is nonempty [6]. By the Ionescu-Tulcea theorem [24], for each policy π and each initial state x_0 = x, a probability measure P_x^π and a stochastic process {(x_k, a_k)} are defined on H_∞ in a canonical way, where x_k and a_k describe the state and the decision at stage k, respectively. By E_x^π we denote the expectation operator with respect to the probability measure P_x^π.

Let γ > 0 be a given risk factor. For any initial state x ∈ X and policy π ∈ Π, we define the following risk-sensitive average cost criterion:
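In the risk-sensitive literature this criterion is standardly the long-run exponential average of accumulated costs; as a sketch consistent with the notation above:

```latex
\[
J(x, \pi)
\;=\;
\limsup_{n \to \infty} \frac{1}{\gamma n}
\ln E_{x}^{\pi} \exp\!\left( \gamma \sum_{k=0}^{n-1} c(x_k, a_k) \right).
\]
```

For γ → 0 this formally recovers the risk-neutral average cost, which is why γ measures the decision maker's aversion to risk.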

Our aim is to minimize J(x, π) over the class of all policies and to find a policy π* for which
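In symbols, the optimality requirement is:

```latex
\[
J(x, \pi^{*}) \;=\; \inf_{\pi \in \Pi} J(x, \pi)
\qquad \text{for all } x \in X.
\]
```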

Throughout the paper, the following assumption is supposed to hold, even without explicit reference:

Remark 1. Throughout the remainder, we assume that the risk factor γ > 0 is arbitrary and fixed.
