We consider a class of optimal control problems, with finite or infinite horizon, for a continuous-time Markov chain with finite state space, where the control process affects the transition rates. We suppose that the controlled process cannot be observed and that at any time the control actions are chosen based on the observation of a related stochastic process perturbed by an exogenous Brownian motion. We describe a construction of the controlled Markov chain with stochastic transition rates adapted to the observation filtration. By a change of probability measure of Girsanov type, we introduce the so-called separated optimal control problem, where the state is the conditional (unnormalized) distribution of the controlled Markov chain and the observation process becomes a driving Brownian motion, and we prove its equivalence with the original control problem. The controlled equations for the separated problem are an instance of the Wonham filtering equations. Next we present an analysis of the separated problem: we characterize the value function as the unique viscosity solution to the dynamic programming equations (both in the parabolic and the elliptic case), we prove verification theorems, and we establish a version of the stochastic maximum principle in the form of necessary conditions for optimality.
This paper is devoted to the study of optimal control problems for controlled Markov chains with partial observation. Except for some initial general constructions, we will consider controlled Markov processes $(X^\alpha_t)_{t \ge 0}$ in continuous time, with values in a finite state space $S$. The controlled process depends on a control process $(\alpha_t)$, with values in a general action space $A$, which is chosen in order to maximize a reward functional of the form
$$\bar{\mathbb{E}} \left[ \int_0^T f(X^\alpha_t, \alpha_t)\,dt + g(X^\alpha_T) \right] \qquad \text{or} \qquad \bar{\mathbb{E}} \left[ \int_0^\infty e^{-\beta t} f(X^\alpha_t, \alpha_t)\,dt \right]$$
for the finite and infinite horizon cases respectively, where $f$, $g$ are given real functions and $\beta > 0$ is a discount factor (below we also consider some slightly more general reward functionals). Here $\bar{\mathbb{E}}$ denotes the expectation with respect to some probability $\bar{\mathbb{P}}$, called the "physical" probability to distinguish it from the reference probability $\mathbb{P}$ introduced below.
We consider the case of partial observation, namely when the state is not directly observable and the choice of the control $\alpha_t$ at any time $t$ is based on the observation of the past values of another related process, denoted $(W_t)_{t \ge 0}$. In the literature the related terminology Hidden (or Latent) Markov Model is also used. Thus, the control process $(\alpha_t)$ will be required to be $(\mathcal{F}^W_t)$-predictable, where $\mathcal{F}^W_t$ is the $\sigma$-algebra generated by $(W_s)_{s \le t}$. In our model we assume that the observation process $W$ takes values in $\mathbb{R}^d$ and has the form
$$W_t = \int_0^t h(X^\alpha_s, \alpha_s)\,ds + B_t,$$
where $h : S \times A \to \mathbb{R}^d$ is a given function and $(B_t)_{t \ge 0}$ is a Brownian motion in $\mathbb{R}^d$. Among many possible variations, this model, a controlled Markov chain whose observation is corrupted by Brownian noise, is often deemed to be of basic importance.
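To fix ideas, the following minimal Python sketch simulates one trajectory of the pair $(X^\alpha, W)$ on a two-state space, with jumps approximated on a time grid of mesh dt. The rates q, the observation function h and the feedback rule below are hypothetical choices made for illustration; none of them are taken from the paper.

```python
import numpy as np

# Minimal sketch: simulate (X^alpha, W) on a two-state space with a control
# that depends only on past observations. All model ingredients (q, h,
# feedback) are illustrative placeholders.

rng = np.random.default_rng(0)

def q(a, i, j):                       # controlled transition rate, j != i
    return 1.0 + a if (i, j) == (0, 1) else 2.0 - a

def h(i, a):                          # observation drift, here d = 1
    return np.array([float(i)])

def feedback(w_path):                 # toy control rule based on past observations
    return 1.0 if w_path[-1] > 0.0 else 0.0

T, dt = 5.0, 1e-3
n = int(T / dt)
X = np.empty(n + 1, dtype=int); X[0] = 0
W = np.zeros((n + 1, 1))

for k in range(n):
    a = feedback(W[: k + 1, 0])       # alpha_t depends only on W up to time t
    i = X[k]
    j = 1 - i                         # the only other state in this toy model
    X[k + 1] = j if rng.random() < q(a, i, j) * dt else i
    W[k + 1] = W[k] + h(i, a) * dt + np.sqrt(dt) * rng.standard_normal(1)
```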
The main route to the solution of the optimal control problem, which we also adopt in this paper, consists in reducing it to a different problem with complete observation (sometimes called the separated problem), where the controlled state process is given by the so-called filter process, whose value at time $t$ is the conditional distribution of the unobserved state $X^\alpha_t$ given $\mathcal{F}^W_t$. For our model, in the uncontrolled case, explicit recursive equations for the filter were obtained in [22], and their solutions are called the Wonham filter.
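For orientation, here is a minimal Euler discretization of the normalized Wonham filter equations, in the form commonly found in the literature (the paper itself works with an unnormalized version). The rate matrix Q, the observation drifts h_vals and the increment dW below are illustrative placeholders, not data from the paper.

```python
import numpy as np

# Euler step for the (normalized) Wonham filter pi_t(j) = P(X_t = j | F^W_t):
#   d pi_t(j) = sum_i q(a,i,j) pi_t(i) dt
#               + pi_t(j) (h(j,a) - hbar_t) . (dW_t - hbar_t dt),
# where hbar_t = sum_i pi_t(i) h(i,a) and q(a,i,i) = -sum_{j != i} q(a,i,j).

def wonham_step(pi, Q, h_vals, dW, dt):
    """One Euler step of the filter.

    pi     : (m,)   current conditional distribution
    Q      : (m, m) rate matrix q(a, i, j) for the current control value a
    h_vals : (m, d) observation drifts h(i, a), one row per state
    dW     : (d,)   observation increment over [t, t + dt]
    """
    hbar = pi @ h_vals                       # (d,) predicted observation drift
    innov = dW - hbar * dt                   # innovation increment
    pi_new = pi + (Q.T @ pi) * dt + pi * ((h_vals - hbar) @ innov)
    pi_new = np.clip(pi_new, 0.0, None)      # guard against discretization error
    return pi_new / pi_new.sum()             # renormalize

# toy usage: two states, one-dimensional observation
Q = np.array([[-1.0, 1.0], [2.0, -2.0]])
h_vals = np.array([[0.0], [1.0]])
pi = wonham_step(np.array([0.5, 0.5]), Q, h_vals, dW=np.array([0.01]), dt=1e-3)
```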
There is a huge literature on partially observed control problems, and we refer the reader to the monographs [2], [16] and [8], which include expositions of the required technical prerequisites and contain extensive references. The books [2] and [16] mainly consider the case when the controlled process is defined as the solution to a controlled stochastic differential equation in Euclidean space driven by a Brownian motion. The treatise [8] presents a large number of hidden Markov models with many variations with respect to our case, for instance discrete-time problems, continuous state spaces, different observation models and so on. In the sequel we will also refer to [1] and [3], dealing with technical aspects of stochastic filtering theory and of the optimal control of marked point processes.
The analysis of our model is of course made easier by the assumption that the state space $S$ is finite, but it turns out that a direct application of general existing theories does not yield satisfactory results: it either requires unnecessary assumptions or fails to give sharp conclusions. It is the purpose of this paper to present a rather complete analysis of the model sketched above, with various methodologies (stochastic maximum principle and dynamic programming, including an analysis of the Hamilton-Jacobi-Bellman equation), encompassing the finite and infinite horizon cases and with a careful formulation of the optimization problem. Except for some natural boundedness or continuity assumptions on the coefficients (the functions $f$, $g$, $h$ introduced above, as well as the controlled transition rates presented below), we try to be as general as possible.
In order to explain our contributions more carefully we have to enter into some technical details, describing the plan of the paper at the same time. The first issue concerns the construction of a controlled Markov chain. In this case the transition rate from state $i \in S$ to state $j \neq i$, denoted $q(a, i, j)$, depends on the choice of the control parameter $a \in A$. Given the functions $q(a, i, j)$ and an $\mathcal{F}^W$-predictable control process $(\alpha_t)$, the aim is to construct a process $(X^\alpha_t)$ admitting stochastic transition rates $q(\alpha_t, i, j)$. The precise meaning of this, according to most of the literature, is that the random measure $q(\alpha_t, X^\alpha_{t-}, j)\,dt$ is the compensator of the process $N_t(j)$ which counts the number of jumps of $(X^\alpha_t)$ to the state $j$ in the time interval $[0, t]$; namely,
$$N_t(j) - \int_0^t q(\alpha_s, X^\alpha_{s-}, j)\,ds$$
is a martingale with respect to the filtration generated by $(W_t)$ and by the controlled process itself. When there is no observation process and the only filtration is the natural one, the existence of the controlled process may be deduced from a general result on a martingale problem for marked point processes: see [13].
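As a concrete illustration of the compensator property, the following Monte Carlo sketch, with hypothetical rates and a deterministic (hence trivially predictable) control on a two-state space, compares the sample means of $N_T(1)$ and $\int_0^T q(\alpha_s, X^\alpha_{s-}, 1)\,ds$, which should be close since their difference is a martingale vanishing at $t = 0$.

```python
import numpy as np

# Monte Carlo sanity check of the compensator property on a two-state space:
# E[N_T(1)] should be close to E[ int_0^T q(alpha_s, X_{s-}, 1) ds ], where
# N_t(1) counts the jumps into state 1. Rates and control are hypothetical.

rng = np.random.default_rng(1)

def q(a, i, j):                          # illustrative controlled rates, j != i
    return 1.0 + a if (i, j) == (0, 1) else 2.0 - a

def alpha(t):                            # deterministic, hence predictable, control
    return 0.5 * (1.0 + np.sin(t))

T, dt, n_paths = 2.0, 1e-3, 2000
n = int(T / dt)
x = np.zeros(n_paths, dtype=int)         # current state on each path
jumps_to_1 = np.zeros(n_paths)           # N_T(1)
comp = np.zeros(n_paths)                 # int_0^T q(alpha_s, x_{s-}, 1) ds

for k in range(n):
    a = alpha(k * dt)
    at0 = (x == 0)
    comp += np.where(at0, q(a, 0, 1), 0.0) * dt   # rate of jumping *into* 1
    rate = np.where(at0, q(a, 0, 1), q(a, 1, 0))
    jump = rng.random(n_paths) < rate * dt
    jumps_to_1 += (jump & at0)
    x = np.where(jump, 1 - x, x)

print(jumps_to_1.mean(), comp.mean())    # the two sample means should be close
```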
In this case the controlled process