We consider a class of optimal control problems, with finite or infinite horizon, for a continuous-time Markov chain with finite state space, where the control process affects the transition rates. We suppose that the controlled process cannot be observed and that at any time the control actions are chosen based on the observation of a related stochastic process perturbed by an exogenous Brownian motion. We describe a construction of the controlled Markov chain with stochastic transition rates adapted to the observation filtration. By a change of probability measure of Girsanov type, we introduce the so-called separated optimal control problem, where the state is the conditional (unnormalized) distribution of the controlled Markov chain and the observation process becomes a driving Brownian motion, and we prove its equivalence with the original control problem. The controlled equations for the separated problem are an instance of the Wonham filtering equations. Next we present an analysis of the separated problem: we characterize the value function as the unique viscosity solution to the dynamic programming equations (both in the parabolic and the elliptic case), we prove verification theorems, and we establish a version of the stochastic maximum principle in the form of necessary conditions for optimality.
This paper is devoted to the study of optimal control problems for controlled Markov chains with partial observation. Except for some initial general constructions, we will consider controlled Markov processes $(X^\alpha_t)_{t \ge 0}$ in continuous time, with values in a finite state space $S$. The controlled process depends on a control process $(\alpha_t)$, with values in a general action space $A$, which is chosen in order to maximize a reward functional of the form
$$\bar{\mathbb{E}} \left[ \int_0^T f(X^\alpha_t, \alpha_t)\,dt + g(X^\alpha_T) \right] \qquad \text{or} \qquad \bar{\mathbb{E}} \left[ \int_0^\infty e^{-\beta t} f(X^\alpha_t, \alpha_t)\,dt \right]$$
for the finite and infinite horizon cases respectively, where $f$, $g$ are given real functions and $\beta > 0$ is a discount factor (below we also consider some slightly more general reward functionals). Here $\bar{\mathbb{E}}$ denotes the expectation with respect to some probability $\bar{\mathbb{P}}$, called the "physical" probability to distinguish it from the reference probability $\mathbb{P}$ introduced below.
We consider the case of partial observation, namely when the state is not directly observable and the choice of the control $\alpha_t$ at any time $t$ is based on the observation of the past values of another related process, denoted $(W_t)_{t \ge 0}$. In the literature the related terminology Hidden (or Latent) Markov Model is also used. Thus, the control process $(\alpha_t)$ will be required to be $(\mathcal{F}^W_t)$-predictable, where $\mathcal{F}^W_t$ is the $\sigma$-algebra generated by $(W_s)_{s \le t}$. In our model we assume that the observation process $W$ takes values in $\mathbb{R}^d$ and has the form
$$W_t = \int_0^t h(X^\alpha_s, \alpha_s)\,ds + B_t,$$
where $h : S \times A \to \mathbb{R}^d$ is a given function and $(B_t)_{t \ge 0}$ is a Brownian motion in $\mathbb{R}^d$. Among many possible variations, this model, a controlled Markov chain whose observation is corrupted by Brownian noise, is often deemed to be of basic importance.
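To fix ideas, the following minimal Python sketch simulates one trajectory of the pair $(X^\alpha, W)$ on a two-state space, with jumps approximated on a time grid of mesh dt. The rates q, the observation function h and the feedback rule below are hypothetical choices made for illustration; none of them are taken from the paper.

```python
import numpy as np

# Minimal sketch: simulate (X^alpha, W) on a two-state space with a control
# that depends only on past observations. All model ingredients (q, h,
# feedback) are illustrative placeholders.

rng = np.random.default_rng(0)

def q(a, i, j):                       # controlled transition rate, j != i
    return 1.0 + a if (i, j) == (0, 1) else 2.0 - a

def h(i, a):                          # observation drift, here d = 1
    return np.array([float(i)])

def feedback(w_path):                 # toy control rule based on past observations
    return 1.0 if w_path[-1] > 0.0 else 0.0

T, dt = 5.0, 1e-3
n = int(T / dt)
X = np.empty(n + 1, dtype=int); X[0] = 0
W = np.zeros((n + 1, 1))

for k in range(n):
    a = feedback(W[: k + 1, 0])       # alpha_t depends only on W up to time t
    i = X[k]
    j = 1 - i                         # the only other state in this toy model
    X[k + 1] = j if rng.random() < q(a, i, j) * dt else i
    W[k + 1] = W[k] + h(i, a) * dt + np.sqrt(dt) * rng.standard_normal(1)
```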
The main route to the solution of the optimal control problem, which we also adopt in this paper, consists in reducing it to a different problem with complete observation (sometimes called the separated problem), where the controlled state process is given by the so-called filter process, whose value at time $t$ is the conditional distribution of the unobserved state $X^\alpha_t$ given $\mathcal{F}^W_t$. For our model, in the uncontrolled case, explicit recursive equations for the filter were obtained in [22], and their solutions are called the Wonham filter.
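For orientation, here is a minimal Euler discretization of the normalized Wonham filter equations, in the form commonly found in the literature (the paper itself works with an unnormalized version). The rate matrix Q, the observation drifts h_vals and the increment dW below are illustrative placeholders, not data from the paper.

```python
import numpy as np

# Euler step for the (normalized) Wonham filter pi_t(j) = P(X_t = j | F^W_t):
#   d pi_t(j) = sum_i q(a,i,j) pi_t(i) dt
#               + pi_t(j) (h(j,a) - hbar_t) . (dW_t - hbar_t dt),
# where hbar_t = sum_i pi_t(i) h(i,a) and q(a,i,i) = -sum_{j != i} q(a,i,j).

def wonham_step(pi, Q, h_vals, dW, dt):
    """One Euler step of the filter.

    pi     : (m,)   current conditional distribution
    Q      : (m, m) rate matrix q(a, i, j) for the current control value a
    h_vals : (m, d) observation drifts h(i, a), one row per state
    dW     : (d,)   observation increment over [t, t + dt]
    """
    hbar = pi @ h_vals                       # (d,) predicted observation drift
    innov = dW - hbar * dt                   # innovation increment
    pi_new = pi + (Q.T @ pi) * dt + pi * ((h_vals - hbar) @ innov)
    pi_new = np.clip(pi_new, 0.0, None)      # guard against discretization error
    return pi_new / pi_new.sum()             # renormalize

# toy usage: two states, one-dimensional observation
Q = np.array([[-1.0, 1.0], [2.0, -2.0]])
h_vals = np.array([[0.0], [1.0]])
pi = wonham_step(np.array([0.5, 0.5]), Q, h_vals, dW=np.array([0.01]), dt=1e-3)
```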
There is a huge literature on partially observed control problems, and we refer the reader to the monographs [2], [16] and [8], which include expositions of the required technical prerequisites and contain extensive references. The books [2] and [16] mainly consider the case when the controlled process is defined as the solution to a controlled stochastic differential equation in Euclidean space driven by a Brownian motion. The treatise [8] presents a large number of hidden Markov models with many variations with respect to our case, for instance discrete-time problems, continuous state spaces, different observation models and so on. In the sequel we will also refer to [1] and [3], dealing with technical aspects of stochastic filtering theory and of the optimal control of marked point processes.
The analysis of our model is of course made easier by the assumption that the state space $S$ is finite, but it turns out that a direct application of general existing theories does not yield satisfactory results: it either requires unnecessary assumptions or fails to give sharp conclusions. It is the purpose of this paper to present a rather complete analysis of the model sketched above, with various methodologies (stochastic maximum principle and dynamic programming, including an analysis of the Hamilton-Jacobi-Bellman equation), encompassing the finite and infinite horizon cases and with a careful formulation of the optimization problem. Except for some natural boundedness or continuity assumptions on the coefficients (the functions $f$, $g$, $h$ introduced above, as well as the controlled transition rates presented below), we try to be as general as possible.
In order to explain our contributions more carefully we have to enter into some technical details, describing the plan of the paper at the same time. The first issue concerns the construction of a controlled Markov chain. In this case the transition rate from state $i \in S$ to state $j \neq i$, denoted $q(a, i, j)$, depends on the choice of the control parameter $a \in A$. Given the functions $q(a, i, j)$ and an $\mathcal{F}^W$-predictable control process $(\alpha_t)$, the aim is to construct a process $(X^\alpha_t)$ admitting stochastic transition rates $q(\alpha_t, i, j)$. The precise meaning of this, according to most of the literature, is that the random measure $q(\alpha_t, X^\alpha_{t-}, j)\,dt$ is the compensator of the process $N_t(j)$ which counts the number of jumps of $(X^\alpha_t)$ to the state $j$ in the time interval $[0, t]$; namely,
$$N_t(j) - \int_0^t q(\alpha_s, X^\alpha_{s-}, j)\,ds$$
is a martingale with respect to the filtration generated by $(W_t)$ and by the controlled process itself. When there is no observation process and the only filtration is the natural one, the existence of the controlled process may be deduced from a general result on a martingale problem for marked point processes: see [13].
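As a concrete illustration of the compensator property, the following Monte Carlo sketch, with hypothetical rates and a deterministic (hence trivially predictable) control on a two-state space, compares the sample means of $N_T(1)$ and $\int_0^T q(\alpha_s, X^\alpha_{s-}, 1)\,ds$, which should be close since their difference is a martingale vanishing at $t = 0$.

```python
import numpy as np

# Monte Carlo sanity check of the compensator property on a two-state space:
# E[N_T(1)] should be close to E[ int_0^T q(alpha_s, X_{s-}, 1) ds ], where
# N_t(1) counts the jumps into state 1. Rates and control are hypothetical.

rng = np.random.default_rng(1)

def q(a, i, j):                          # illustrative controlled rates, j != i
    return 1.0 + a if (i, j) == (0, 1) else 2.0 - a

def alpha(t):                            # deterministic, hence predictable, control
    return 0.5 * (1.0 + np.sin(t))

T, dt, n_paths = 2.0, 1e-3, 2000
n = int(T / dt)
x = np.zeros(n_paths, dtype=int)         # current state on each path
jumps_to_1 = np.zeros(n_paths)           # N_T(1)
comp = np.zeros(n_paths)                 # int_0^T q(alpha_s, x_{s-}, 1) ds

for k in range(n):
    a = alpha(k * dt)
    at0 = (x == 0)
    comp += np.where(at0, q(a, 0, 1), 0.0) * dt   # rate of jumping *into* 1
    rate = np.where(at0, q(a, 0, 1), q(a, 1, 0))
    jump = rng.random(n_paths) < rate * dt
    jumps_to_1 += (jump & at0)
    x = np.where(jump, 1 - x, x)

print(jumps_to_1.mean(), comp.mean())    # the two sample means should be close
```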
In this case the controlled process