Partially observed controlled Markov chains and optimal control of the Wonham filter
Authors: Fulvia Confortola, Marco Fuhrman
Abstract

We consider a class of optimal control problems, with finite or infinite horizon, for a continuous-time Markov chain with finite state space, where the control process affects the transition rates. We suppose that the controlled process cannot be observed, and that at any time the control actions are chosen based on the observation of a related stochastic process perturbed by an exogenous Brownian motion. We describe a construction of the controlled Markov chain, having stochastic transition rates adapted to the observation filtration. By a change of probability measure of Girsanov type, we introduce the so-called separated optimal control problem, where the state is the conditional (unnormalized) distribution of the controlled Markov chain and the observation process becomes a driving Brownian motion, and we prove the equivalence with the original control problem. The controlled equations for the separated problem are an instance of the Wonham filtering equations. Next we present an analysis of the separated problem: we characterize the value function as the unique viscosity solution to the dynamic programming equations (both in the parabolic and the elliptic case), we prove verification theorems, and we prove a version of the stochastic maximum principle in the form of a necessary condition for optimality.

MSC Classification: 60H30; 60J27; 93E11; 93E20; 49L25.

Key words: optimal control with partial observation; controlled hidden Markov models; Wonham filter; Bellman's equation; viscosity solutions; stochastic maximum principle.

1 Introduction

This paper is devoted to the study of optimal control problems for controlled Markov chains with partial observation.
Except for some initial general constructions, we will consider controlled Markov processes $(X^\alpha_t)_{t\ge 0}$ which are time-continuous and take values in a finite state space $S$. The controlled process depends on a control process $(\alpha_t)$, with values in a general action space $A$, which is chosen in order to maximize a reward functional of the form
$$ J(\alpha) = \bar{\mathbb E}\left[\int_0^T f(X^\alpha_t,\alpha_t)\,dt + g(X^\alpha_T)\right], \qquad\text{or}\qquad J(\alpha) = \bar{\mathbb E}\left[\int_0^\infty e^{-\beta t} f(X^\alpha_t,\alpha_t)\,dt\right], $$
for the finite and infinite horizon cases respectively, where $f$, $g$ are given real functions and $\beta>0$ is a discount factor (below we also consider some slightly more general reward functionals). Here $\bar{\mathbb E}$ denotes the expectation with respect to some probability $\bar{\mathbb P}$, called the "physical" probability to distinguish it from the reference probability $\mathbb P$ introduced below.

* Dipartimento di Matematica, Politecnico di Milano, fulvia.confortola at polimi.it. This author is a member of INdAM-GNAMPA.
† Dipartimento di Matematica, Università degli Studi di Milano, marco.fuhrman at unimi.it. This author is a member of INdAM-GNAMPA.

We consider the case of partial observation, namely when the state is not directly observable and the choice of the control $\alpha_t$ at any time $t$ is based on the observation of the past values of another related process, denoted $(W_t)_{t\ge 0}$. In the literature the related terminology Hidden (or Latent) Markov Model is also used. Thus, the control process $(\alpha_t)$ will be required to be $(\mathcal F^W_t)$-predictable, where $\mathcal F^W_t$ is the $\sigma$-algebra generated by $(W_s)_{s\le t}$. In our model we assume that the observation process $W$ takes values in $\mathbb R^d$ and has the form
$$ W_t = \int_0^t h(X^\alpha_s,\alpha_s)\,ds + B_t \tag{1.1} $$
where $h: S\times A\to\mathbb R^d$ is a given function and $(B_t)_{t\ge 0}$ is a Brownian motion in $\mathbb R^d$. Among many possible variations, this model - a controlled Markov chain with observation corrupted by Brownian noise - is often deemed to be of basic importance.
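To fix ideas, the observation model (1.1) can be simulated on a time grid. The following sketch is purely illustrative and not part of the paper: the two-state chain path, the constant control and the sensor function $h$ are hypothetical choices, and the integral in (1.1) is approximated by a left-endpoint Riemann sum.

```python
import numpy as np

rng = np.random.default_rng(0)

T, n = 1.0, 1000                     # horizon and number of grid steps
dt = T / n
t = np.linspace(0.0, T, n + 1)

# A fixed piecewise-constant path of a two-state chain (illustrative, not controlled here)
X = np.where(t < 0.5, 0, 1)          # state 0 on [0, 0.5), state 1 afterwards
alpha = np.zeros(n + 1)              # constant control, for illustration

def h(x, a):
    """Hypothetical scalar observation drift h(x, a), d = 1."""
    return 1.0 if x == 0 else -1.0

drift = np.array([h(x, a) for x, a in zip(X, alpha)])
# Brownian motion B and observation W_t = int_0^t h(X_s, alpha_s) ds + B_t
B = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))])
W = np.concatenate([[0.0], np.cumsum(drift[:-1] * dt)]) + B
```

Under the physical probability, the controller only sees the path of `W`, from which the drift (and hence the hidden state) must be inferred.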
The main route to the solution of the optimal control problem - the one that we also adopt in this paper - consists in reducing it to a different problem with complete observation (sometimes called the separated problem), where the controlled state process is given by the so-called filter process, whose value at time $t$ is the conditional distribution of the unobserved process $X^\alpha_t$ given $\mathcal F^W_t$. For our model, in the uncontrolled case, explicit recursive equations for the filter were obtained in [22] and their solution is called the Wonham filter. There is a huge literature on partially observed control problems and we refer the reader to the monographs [2], [16] and [8], which include expositions of the required technical prerequisites and contain extensive references. The books [2] and [16] mainly consider the case when the controlled process is defined as the solution to a controlled stochastic differential equation in Euclidean space driven by a Brownian motion. The treatise [8] presents a large number of hidden Markov models with many variations with respect to our case, for instance discrete-time problems, continuous state spaces, different observation models and so on. In the sequel we will also refer to [1] and [3], dealing with technical aspects of stochastic filtering theory and optimal control of marked point processes. The analysis of our model is of course made easier by the assumption that the state space $S$ is finite, but it turns out that a direct application of general existing theories does not yield satisfactory results, as it requires unnecessary assumptions or does not give sharp conclusions.
It is the purpose of this paper to present a rather complete analysis of the model sketched above, with various methodologies (stochastic maximum principle and dynamic programming, including analysis of the Hamilton-Jacobi-Bellman equation), encompassing the finite and infinite horizon cases and with a careful formulation of the optimization problem. Except for some natural boundedness or continuity assumptions on the coefficients (the functions $f$, $g$, $h$ introduced above, as well as the controlled transition rates presented below) we try to be as general as possible. In order to explain our contributions more carefully we have to enter into some technical details, while describing the plan of the paper at the same time.

The first issue concerns the construction of a controlled Markov chain. In this case the transition rate from state $i\in S$ to state $j\ne i$, denoted $q(a,i,j)$, depends on the choice of the control parameter $a\in A$. Given the functions $q(a,i,j)$ and an $\mathbb F^W$-predictable control process $(\alpha_t)$, the aim is to construct a process $(X^\alpha_t)$ admitting stochastic transition rates $q(\alpha_t,i,j)$. The precise meaning of this, according to most of the literature, is that the random measure $q(\alpha_t,X^\alpha_{t-},j)\,dt$ is the compensator of the process $N_t(j)$ which counts the number of jumps of $(X^\alpha_t)$ to the state $j$ in the time interval $[0,t]$, namely
$$ N_t(j) - \int_0^t q(\alpha_s,X^\alpha_{s-},j)\,ds $$
is a martingale with respect to the filtration generated by $(W_t)$ and by the controlled process itself. When there is no observation process and the only filtration is the natural one, the existence of the controlled process may be deduced from a general result on a martingale problem for marked point processes: see [13]. In this case the controlled process is defined in a weak sense, as a law on a canonical space.
In the general case with observation, when the state space $S$ is finite, one can write down stochastic differential equations for a pure jump process identifying $S$ with a finite subset of $\mathbb R^N$: see [8], chapter 12. In the present paper we resort to a different construction, which is inspired by the Grigelionis theorem (see e.g. [3], section 5.7). It admits several variants: see for instance Section 3 of [4] for related results and references. We construct the controlled process in strong formulation, starting from an auxiliary Poisson process on an extended space and then taking an appropriate projection (depending on the control process) on $(0,\infty)\times S$ of the corresponding random measure. This direct construction for a controlled pure jump process has the advantage that it can be extended to a general state space $S$. Section 2 is devoted to the exposition of this result in its general form.

In the following sections we apply the previous construction and we formulate the optimal control problem. In order to introduce the separated control problem for the Wonham filter one needs to perform a change of probability of Girsanov type: given the martingale
$$ (Z^\alpha_t)^{-1} = \exp\left( -\int_0^t h(X^\alpha_s,\alpha_s)\,dB_s - \frac12\int_0^t |h(X^\alpha_s,\alpha_s)|^2\,ds \right), \qquad t\ge 0, $$
(this involved notation is consistent with the following sections) one defines the so-called reference probability $\mathbb P$ by setting $\mathbb P(d\omega) = (Z^\alpha_T(\omega))^{-1}\,\bar{\mathbb P}(d\omega)$, and the filter process of the unnormalized conditional laws
$$ \rho^i_t = \mathbb E\,[\,1_{X^\alpha_t=i}\, Z^\alpha_t \mid \mathcal F^W_t\,], \qquad t\ge 0,\ i\in S. $$
By the Girsanov theorem the observation $W$ is a Brownian motion on $[0,T]$ under $\mathbb P$, and it is well known (see e.g. [1]) that the processes $(\rho^i_t)$ solve the Zakai filtering equations, which are called the Wonham filtering equations in this particular situation:
$$ d\rho^i_t = \sum_{j\in S} \rho^j_t\, q(\alpha_t,j,i)\,dt + \rho^i_t\, h(i,\alpha_t)\,dW_t, \qquad i\in S. \tag{1.2} $$
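As a purely illustrative sketch (not part of the paper's analysis), the system (1.2) can be discretized by the Euler–Maruyama scheme. The two-state rate matrix and observation function below are hypothetical, and the control is frozen at a single value so that the coefficients are constant; $d=1$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical two-state data for a fixed control value a:
# Q[j, i] = q(a, j, i) (rows sum to zero), h[i] = h(i, a).
Q = np.array([[-1.0, 1.0],
              [2.0, -2.0]])
h = np.array([1.0, -1.0])

T, n = 1.0, 2000
dt = T / n
rho = np.array([0.5, 0.5])           # initial unnormalized conditional law
path = [rho.copy()]
for _ in range(n):
    dW = rng.normal(0.0, np.sqrt(dt))
    # Euler step for (1.2): drift_i = sum_j rho_j q(a, j, i), diffusion_i = rho_i h(i, a)
    rho = rho + (rho @ Q) * dt + rho * h * dW
    path.append(rho.copy())
path = np.array(path)
```

Note that the total mass $\sum_i\rho^i_t$ is not conserved along a path: it is only a martingale under the reference probability, consistently with $\rho$ being an unnormalized law.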
The reward functional takes the form
$$ J(\alpha) = \mathbb E\left[\int_0^T \sum_{i\in S}\rho^i_t f(i,\alpha_t)\,dt + \sum_{i\in S}\rho^i_T g(i)\right], \quad\text{or}\quad J(\alpha) = \mathbb E\left[\int_0^\infty e^{-\beta t}\sum_{i\in S}\rho^i_t f(i,\alpha_t)\,dt\right]. \tag{1.3} $$
This way we obtain the separated control problem, where the new state equation is now (1.2), driven by the observation Brownian motion $(W_t)$, so that the new control problem is fully observed. As is customary (see e.g. [2]), it is more convenient to formulate the entire setting under the reference probability $\mathbb P$ from the beginning and to perform the inverse Girsanov transformation to construct the physical probability $\bar{\mathbb P}$: this way one obtains a weak formulation of the original control problem under $\bar{\mathbb P}$. Section 3 is devoted to the presentation of this standard material, and it also contains some preliminary properties of the corresponding value function.

In the following sections we address the optimal control problem for the state equation (1.2) and the reward (1.3). The controlled state $\rho_t=(\rho^i_t)_{i\in S}$ evolves in the state space
$$ D = \{ x=(x^1,\dots,x^N)\in\mathbb R^N : x^i\ge 0,\ i=1,\dots,N \}. $$
We first consider the dynamic programming approach. We introduce the value functions $v(t,x)$ or $v(x)$ for the finite and infinite horizon cases, where $x\in D$ denotes the starting state. The value functions are related to the Hamilton-Jacobi-Bellman (HJB) equations, which are, respectively, parabolic and elliptic equations on $D$. For instance, in the elliptic case, for a function $v(x)=v(x^1,\dots,x^N)$ this is:
$$ \beta v(x) - \sup_{a\in A}\left\{ \frac12 \sum_{ij} \partial^2_{ij} v(x)\, x^i x^j \sum_{k=1}^d h_k(i,a)h_k(j,a) + \sum_{ij} \partial_i v(x)\, x^j q(a,j,i) + \sum_i x^i f(i,a) \right\} = 0. \tag{1.4} $$
In the general case this equation is fully nonlinear and it is not uniformly elliptic, so the convenient notion of solution is that of viscosity solution, see e.g. [7].
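For comparison with (1.4), the parabolic equation of the finite horizon case can be sketched with the same Hamiltonian; this is only our reading of the dynamic programming equation associated with (1.2)-(1.3), the precise statement appearing in the body of the paper:

```latex
\partial_t v(t,x) + \sup_{a\in A}\Big\{ \frac12 \sum_{ij}\partial^2_{ij} v(t,x)\, x^i x^j \sum_{k=1}^d h_k(i,a) h_k(j,a)
  + \sum_{ij}\partial_i v(t,x)\, x^j q(a,j,i) + \sum_i x^i f(i,a) \Big\} = 0,
\qquad v(T,x) = \sum_i x^i g(i),
```

with the discount term $\beta v(x)$ of (1.4) replaced by the time derivative and a terminal condition.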
While the proof that the value function is a viscosity solution follows from standard results, uniqueness of solutions is more delicate and is usually proved via comparison results between sub- and super-solutions to the equation. While there exist very sophisticated versions of this kind of result for more general cases, for instance even when $D$ is replaced by a Hilbert space (see [15], [11] or [9]), we are not aware of any result which can be applied to (1.4), or to its parabolic version, under our assumptions. Therefore we present two comparison theorems in Sections 4 and 5, thus establishing uniqueness of the viscosity solution and concluding that the value functions are completely characterized analytically as solutions to suitable PDEs. In Section 6 we prove two verification theorems, for the finite and infinite time horizon, showing that if a classical solution to the dynamic programming equation exists then, under some additional conditions, it coincides with the value function and it is possible to construct an optimal control in feedback form for the separated problem. In our context of a controlled finite Markov chain it may happen that the HJB equation is uniformly elliptic and analytical results on the existence of smooth solutions apply: see Remark 6.1.

Section 7 is devoted to the approach to the control problem (1.2)-(1.3) by means of the stochastic maximum principle. This is a basic tool in stochastic optimization and as such it has been applied to partially observed optimal control problems as well. The reader may find an exposition and further references in [2] or [8]. We formulate a stochastic maximum principle for the separated problem as a necessary condition for optimality related to our optimization problem. Although the proof relies on classical arguments, the final statement improves existing results in the literature.
Indeed, the maximum principle for the controlled Zakai equation is usually formulated under the assumptions that the set of control actions $A$ is convex and the coefficients are differentiable with respect to $a\in A$. These restrictions, for completely observable control problems, have been removed by Peng [17], and we follow the same approach here. In spite of the greater generality, the final formulation does not require the second adjoint equation introduced in [17], since simplifications occur due to the linearity of the separated control problem with respect to the state variable. In any case, the restrictions mentioned above can be avoided and in particular the coefficients are only assumed to be continuous with respect to $a\in A$: see Theorem 7.1 below.

In conclusion, our contribution consists in: the construction of a controlled Markov chain with stochastic transition rates adapted to a general given filtration (in particular, a Brownian filtration); the formulation of a separated optimal control problem for the Wonham filter and the proof of its equivalence with the original one; a complete and largely self-contained analysis of the separated problem, both for the finite and infinite horizon case, including a characterization of the value function as the unique viscosity solution to the dynamic programming equations, a verification theorem, and an instance of the stochastic maximum principle in the form of a necessary condition for optimality.

2 A construction of a point process with random compensator

In this section we suppose that $S$ is a Polish space with a Borel probability measure $\mu$. We assume we are given a nonnegative function $q(\omega,t,x,y)$ with suitable properties (in particular, boundedness) and we show how to construct an $S$-valued pure jump process $(X^q_t)$ such that the corresponding random measure admits compensator $q(t,X^q_{t-},y)\,\mu(dy)\,dt$. We refer e.g.
to [3] for prerequisites on random measures and point processes. In the following sections this construction will be applied to define a controlled Markov chain in $S$. Our setting is summarized in the following hypotheses.

Assumption 2.1 Assume that on a probability space $(\Omega,\mathcal F,\mathbb P)$ the following independent random elements are defined:
1. a Poisson process $(T_n)_{n\ge 1}$ on $(0,\infty)$ with intensity $K>0$; we set $T_0=0$;
2. an independent sequence $(X_n)_{n\ge 1}$ of random variables, taking values in a Polish space $S$, each with the same law $\mu$;
3. an $S$-valued random variable $X_0$;
4. an independent sequence $(U_n)_{n\ge 1}$ of random variables, each uniformly distributed on $(0,1)$.

We define a random measure $\bar N(dt,dy,du)$ on $(0,\infty)\times S\times(0,1)$ by the formula
$$ \bar N(dt,dy,du) = \sum_{n\ge 1} \delta_{(T_n,X_n,U_n)}(dt,dy,du) $$
and we denote by $\mathbb F^{\bar N}=(\mathcal F^{\bar N}_t)_{t\ge 0}$ the filtration generated by $\bar N$ and $X_0$. We note that $\bar N$ is a marked Poisson process, with independent marks $(X_n,U_n)$ taking values in $S\times(0,1)$. Therefore the $\mathbb F^{\bar N}$-compensator of $\bar N$ is
$$ \bar\nu(dt,dy,du) = K\,dt\,\mu(dy)\,du. $$
Now suppose that we are given a filtration $\mathbb F^1=(\mathcal F^1_t)_{t\ge 0}$ in $(\Omega,\mathcal F)$, with $\mathcal F^1_\infty$ independent of the above random processes and variables. Denote by $\mathbb F^{\bar N,1}=(\mathcal F^{\bar N,1}_t)_{t\ge 0}$ the filtration defined by $\mathcal F^{\bar N,1}_t = \mathcal F^{\bar N}_t \vee \mathcal F^1_t$. Since $\mathcal F^1_\infty$ is independent of $\bar N$, it is easily verified that $\bar\nu$ is also the compensator of $\bar N$ with respect to $\mathbb F^{\bar N,1}$. Also suppose that we are given a function $q:\Omega\times[0,\infty)\times S\times S\to\mathbb R$ satisfying, $\mathbb P$-a.s.,
$$ 0\le q(\omega,t,x,y)\le C_q, \qquad t\ge 0,\ x,y\in S, \tag{2.1} $$
for some constant $C_q>0$. We assume that $q$ is $\mathcal P(\mathbb F^1)\otimes\mathcal B(S)\otimes\mathcal B(S)$-measurable, where $\mathcal P(\mathbb F^1)$ denotes the predictable $\sigma$-algebra in $\Omega\times[0,\infty)$ for the filtration $\mathbb F^1$ and $\mathcal B(S)$ the Borel $\sigma$-algebra in $S$. Finally we assume that the constant in Assumption 2.1-1 satisfies $K\ge C_q$.
Define inductively $\nu_0=0$ and, for $k\ge 0$,
$$ \nu_{k+1} = \inf\{ n>\nu_k : U_n < q(T_n,X_{\nu_k},X_n)/K \}, $$
with the convention $\inf\emptyset=\infty$. We take an element $\delta\notin S$ and we add it to $S$ as an isolated point. We set $T_{\nu_n}=\infty$ and $X_{\nu_n}=\delta$ if $\nu_n=\infty$, and we consider the marked point process $(T_{\nu_n},X_{\nu_n})_{n\ge 1}$. We also introduce the corresponding $S\cup\{\delta\}$-valued piecewise-constant process $(X^q_t)_{t\ge 0}$ (starting from $X_0$ at time $0$) and the associated random measure $N(dt,dy)$ on $(0,\infty)\times S$: for $n\ge 0$,
$$ X^q_t = X_{\nu_n}, \quad T_{\nu_n}\le t<T_{\nu_{n+1}}; \qquad N(dt,dy) = \sum_{n\ge 1} \delta_{(T_{\nu_n},X_{\nu_n})}(dt,dy)\, 1_{T_{\nu_n}<\infty}. $$
We denote by $\mathbb F^N=(\mathcal F^N_t)_{t\ge 0}$ the filtration generated by $N$ and $X_0$, and by $\mathbb F^{N,1}=(\mathcal F^{N,1}_t)_{t\ge 0}$ the filtration defined by $\mathcal F^{N,1}_t=\mathcal F^N_t\vee\mathcal F^1_t$. We note that in fact $\nu_k$, $N(dt,dy)$ and $\mathbb F^{N,1}$ also depend on $q$, but we omit indicating this dependence.

Lemma 2.1 The process $X^q$ is càdlàg and $\mathbb F^{\bar N,1}$-adapted.

Proof. Since
$$ X^q_t = \sum_{n\ge 0} X_{\nu_n}\, 1_{[T_{\nu_n},T_{\nu_{n+1}})}(t)\, 1_{\nu_n<\infty} \tag{2.2} $$
we see that $X^q$ is clearly càdlàg. Adaptedness is intuitive, since at any time $t$ all its present and past values and jump times can be recovered by observing $T_n$ and $q(T_n,i,j)$ up to time $t$, as well as the corresponding $X_n$, $U_n$. Now we proceed to a formal proof.

Step I: for each $k,n\ge 0$, $\{\nu_k=n\}\in\mathcal F^{\bar N,1}_{T_n}$, i.e., $\nu_k$ is a stopping time for the filtration $(\mathcal F^{\bar N,1}_{T_n})_{n\ge 0}$. Since $T_n$ is a stopping time for $\mathbb F^{\bar N}$, it is also a stopping time for $\mathbb F^{\bar N,1}$. Since $(q(t,i,j))_t$ is predictable for $\mathbb F^1$, it is also predictable - and hence progressively measurable - for $\mathbb F^{\bar N,1}$. It follows that $q(T_n,i,j)$ is $\mathcal F^{\bar N,1}_{T_n}$-measurable. The same holds for $(U_n,X_n)$ and hence for $q(T_n,i,X_n)$, by composition.
We define a discrete-time filtration $\mathbb H=(\mathcal H_n)_{n\ge 0}$ and, for every $i\in S$, a discrete-time process $(Y_n(i))_{n\ge 0}$ by the formulae
$$ Y_n(i) = (U_n,\ q(T_n,i,X_n)), \qquad \mathcal H_n = \mathcal F^{\bar N,1}_{T_n} $$
(here we set $U_0=0$). Then we have seen that $(Y_n(i))_n$ is $\mathbb H$-adapted. It takes values in the set $\{(u,q): 0<u<1,\ 0\le q<\infty\}$. Define $D=\{(u,q): u<q/K\}$. We can express $\nu_1$ as the first hitting time of $D$ by the process $(Y_n(X_0))_n$:
$$ \nu_1 = \inf\{n>0: U_n<q(T_n,X_0,X_n)/K\} = \inf\{n>0: Y_n(X_0)\in D\}. $$
Since $(Y_n(X_0))_n$ is $\mathbb H$-adapted we conclude that $\nu_1$ is a stopping time for $\mathbb H$. Since $(X_n)_n$ is $\mathbb H$-adapted, the process $(X_{n\wedge\nu_1})_n$ is also $\mathbb H$-adapted. Similarly, we can express $\nu_1$ and $\nu_2$ as the first and second hitting times of $D$ by the process $(Y_n(X_{n\wedge\nu_1}))_n$:
$$ \nu_1 = \inf\{n>0: U_n<q(T_n,X_{n\wedge\nu_1},X_n)/K\} = \inf\{n>0: Y_n(X_{n\wedge\nu_1})\in D\}, $$
$$ \nu_2 = \inf\{n>\nu_1: U_n<q(T_n,X_{\nu_1},X_n)/K\} = \inf\{n>\nu_1: Y_n(X_{n\wedge\nu_1})\in D\}. $$
Since $(Y_n(X_{n\wedge\nu_1}))_n$ is $\mathbb H$-adapted we conclude that $\nu_2$ is a stopping time for $\mathbb H$. Since $(X_n)_n$ is $\mathbb H$-adapted, the process $(X_{n\wedge\nu_2})_n$ is also $\mathbb H$-adapted. Iterating this argument we can show that all the random times $\nu_k$ are stopping times for $\mathbb H$, and Step I is proved.

Step II: for every $k\ge 0$, $T_{\nu_k}$ is a stopping time for the filtration $\mathbb F^{\bar N,1}$. This is trivial for $k=0$, so assume $k\ge 1$. We write
$$ \{T_{\nu_k}\le t\} = \bigcup_{n\ge 0} \{\nu_k=n,\ T_n\le t\} $$
and we recall that $T_n$ is a stopping time for $\mathbb F^{\bar N,1}$ and, by Step I, that $\{\nu_k=n\}\in\mathcal F^{\bar N,1}_{T_n}$. It follows that $\{\nu_k=n,\ T_n\le t\}\in\mathcal F^{\bar N,1}_t$ (by the very definition of $\mathcal F^{\bar N,1}_{T_n}$) and therefore also $\{T_{\nu_k}\le t\}\in\mathcal F^{\bar N,1}_t$.

Step III: for every $k\ge 0$, $X_{\nu_k}1_{\nu_k<\infty}$ is $\mathcal F^{\bar N,1}_{T_{\nu_k}}$-measurable. This is clear for $k=0$, since $\nu_0=0$, $T_0=0$ and $X_0$ is $\mathcal F^{\bar N}_0=\sigma(X_0)$-measurable. Next we assume $k\ge 1$.
For any $B\subset S$ and any $t\ge 0$ we have
$$ \{X_{\nu_k}1_{\nu_k<\infty}\in B,\ T_{\nu_k}\le t\} = \bigcup_{n\ge 1} \{\nu_k=n,\ X_n\in B,\ T_n\le t\}. $$
From Step I we have $\{\nu_k=n\}\in\mathcal F^{\bar N,1}_{T_n}$. Since $X_n$ is measurable with respect to $\mathcal F^{\bar N}_{T_n}\subset\mathcal F^{\bar N,1}_{T_n}$, it follows that $\{\nu_k=n,\ X_n\in B\}\in\mathcal F^{\bar N,1}_{T_n}$ and so $\{\nu_k=n,\ X_n\in B,\ T_n\le t\}\in\mathcal F^{\bar N,1}_t$ (by definition of $\mathcal F^{\bar N,1}_{T_n}$); finally we obtain $\{X_{\nu_k}1_{\nu_k<\infty}\in B,\ T_{\nu_k}\le t\}\in\mathcal F^{\bar N,1}_t$, which proves Step III.

Now adaptedness of $X^q$ follows from Steps II and III and the representation (2.2).

We are now ready for the main result of this section.

Theorem 2.1 Suppose that Assumption 2.1 holds and that $\mathbb F^1=(\mathcal F^1_t)_{t\ge 0}$ is a filtration in $(\Omega,\mathcal F)$, with $\mathcal F^1_\infty$ independent of the random elements in Assumption 2.1. With the previous notation, let $q:\Omega\times[0,\infty)\times S\times S\to\mathbb R$ be $\mathcal P(\mathbb F^1)\otimes\mathcal B(S)\otimes\mathcal B(S)$-measurable and satisfy (2.1). Let the constant in Assumption 2.1-1 be so large that $K\ge C_q$. Then the $\mathbb F^{N,1}$-compensator of $N(dt,dy)$ is
$$ \nu(dt,dy) = q(t,X^q_{t-},y)\,\mu(dy)\,dt. $$

Proof. First we note that $(X^q_{t-})$ is $\mathbb F^N$-predictable, so that by the measurability assumptions on $q$ the random measure $q(t,X^q_{t-},y)\,\mu(dy)\,dt$ is $\mathbb F^{N,1}$-predictable. Let $H(t,y)\ge 0$ be an $\mathbb F^{N,1}$-predictable process. We have
$$ \mathbb E\int_S\int_0^\infty H(t,y)\,N(dt,dy) = \mathbb E\sum_{k\ge 1} H(T_{\nu_k},X_{\nu_k})\,1_{T_{\nu_k}<\infty}. $$
For $n\ge 1$ and $k\ge 1$ such that $\nu_{k-1}<n<\nu_k$, the inequality $U_n\ge q(T_n,X_{\nu_{k-1}},X_n)/K$ holds. So we may rewrite the previous sum, adding several null terms, as follows:
$$ \sum_{k\ge 1} H(T_{\nu_k},X_{\nu_k})\,1_{T_{\nu_k}<\infty} = \sum_{k\ge 1}\left[\sum_{n=1+\nu_{k-1}}^{\nu_k} H(T_n,X_n)\,1_{U_n<q(T_n,X_{\nu_{k-1}},X_n)/K}\right] 1_{T_{\nu_k}<\infty} $$
(in each sum in square brackets only the last term may be non-zero). Next note that, for $k\ge 1$,
$$ X_{\nu_{k-1}} = X^q(t-) \ \text{ for } T_{\nu_{k-1}}<t\le T_{\nu_k} \quad\Longrightarrow\quad X_{\nu_{k-1}} = X^q(T_n-) \ \text{ for } \nu_{k-1}<n\le\nu_k. $$
So we obtain
$$ \sum_{k\ge 1} H(T_{\nu_k},X_{\nu_k})\,1_{T_{\nu_k}<\infty} = \sum_{k\ge 1}\sum_{n=1+\nu_{k-1}}^{\nu_k} H(T_n,X_n)\,1_{U_n<q(T_n,X^q(T_n-),X_n)/K}\,1_{T_{\nu_k}<\infty} = \sum_{n\ge 1} H(T_n,X_n)\,1_{U_n<q(T_n,X^q(T_n-),X_n)/K}. $$
This may be written as an integral with respect to the random measure $\bar N$, leading to
$$ \mathbb E\int_S\int_0^\infty H(t,y)\,N(dt,dy) = \mathbb E\int_S\int_0^\infty\int_0^1 H(t,y)\,1_{u<q(t,X^q(t-),y)/K}\,\bar N(dt,dy,du). $$
From Lemma 2.1 it follows that $(X^q(t-))$ is $\mathbb F^{\bar N,1}$-predictable, and so is the integrand on the right-hand side of the last displayed formula. Recalling the form of the compensator of $\bar N$ we obtain
$$ \mathbb E\int_S\int_0^\infty H(t,y)\,N(dt,dy) = \mathbb E\int_S\int_0^\infty\int_0^1 H(t,y)\,1_{u<q(t,X^q(t-),y)/K}\,du\,K\,dt\,\mu(dy). $$
Noting that $q(t,X^q(t-),y)/K\le C_q/K\le 1$, we can compute the integral in $du$ and we conclude that
$$ \mathbb E\int_S\int_0^\infty H(t,y)\,N(dt,dy) = \mathbb E\int_S\int_0^\infty H(t,y)\,q(t,X^q(t-),y)\,dt\,\mu(dy) = \mathbb E\int_S\int_0^\infty H(t,y)\,\nu(dt,dy). $$

3 The partially observed control problem and its reformulations

In this section we suppose that Assumption 2.1 holds. From now on we also assume that the state space $S$ is finite; we will use the letters $i,j$ to denote its elements. We need to introduce a space $A$ of control actions where the control process $(\alpha_t)$ takes values. We also need to introduce controlled transition rates $q(a,t,i,j)$, a function $h(i,a,t)$ to model the observation, and real functions $f(i,a,t)$ and $g(i)$ to define the reward to be maximized; they may depend on the control action $a\in A$. According to the usual approach (see e.g. [2]) we will initially set the control problem under the reference probability measure $\mathbb P$, so that in particular the observation $W$ will be a given Brownian motion under $\mathbb P$. This has the advantage that the corresponding filtration does not depend on the control process.
Here are the hypotheses we need, which will be valid in the rest of the paper (in addition to Assumption 2.1).

Assumption 3.1
1. $(W_t)_{t\ge 0}$ is a standard $d$-dimensional Brownian motion defined in $(\Omega,\mathcal F,\mathbb P)$; we denote by $\mathbb F^W=(\mathcal F^W_t)_{t\ge 0}$ its completed filtration.
2. $S$ is a finite set with cardinality $N$. $A$ is a Polish space. $T>0$ and $\beta>0$ are given constants.
3. For every $i,j\in S$ ($i\ne j$) we are given numbers $g(i)\in\mathbb R$ and functions $q(\cdot,\cdot,i,j):A\times[0,\infty)\to[0,\infty)$, $h(i,\cdot,\cdot):A\times[0,\infty)\to\mathbb R^d$, $f(i,\cdot,\cdot):A\times[0,\infty)\to\mathbb R$, which are Borel measurable, and there exists a constant $K_0$ such that
$$ |q(a,t,i,j)| + |h(i,a,t)| + |f(i,a,t)| + |g(i)| \le K_0, \qquad a\in A,\ t\ge 0,\ i,j\in S\ (i\ne j). \tag{3.1} $$
4. The constant in Assumption 2.1-1 is taken so large that
$$ N\cdot q(a,t,i,j)\le K, \qquad a\in A,\ t\ge 0,\ i,j\in S\ (i\ne j). $$
5. The random variables in Assumption 2.1-2 are uniformly distributed on $S$.

We complete the definition of the rate matrix by setting, as usual,
$$ q(a,t,i,i) = -\sum_{j\ne i} q(a,t,i,j). $$
We finally define the set of admissible controls of the partial observation problem as
$$ \mathcal A = \{\alpha:\Omega\times[0,\infty)\to A,\ \mathbb F^W\text{-predictable}\}. $$

3.1 The partially observed control problem for the reference probability

For every $\alpha\in\mathcal A$ we next define a corresponding controlled $S$-valued process using the construction of the previous section. Instead of a general filtration $\mathbb F^1$ we now take the filtration $\mathbb F^W$. Then we consider the $\mathbb F^W$-predictable processes $(N\cdot q(\alpha_t,t,i,j))_t$ and we construct the corresponding process $X^q$ as in the previous section, which will now be called $X^\alpha$. Explicitly, we define $\nu_0=0$ and, for $k\ge 0$,
$$ \nu_{k+1} = \inf\{n>\nu_k: U_n < N\cdot q(\alpha_{T_n},T_n,X_{\nu_k},X_n)/K\}, $$
with the convention $\inf\emptyset=\infty$.
We take an element $\delta\notin S$, we set $T_{\nu_n}=\infty$ and $X_{\nu_n}=\delta$ if $\nu_n=\infty$, and we consider the marked point process $(T_{\nu_n},X_{\nu_n})_{n\ge 1}$. The corresponding $S\cup\{\delta\}$-valued process $(X^\alpha_t)_{t\ge 0}$ (starting from $X_0$ at time $0$), defined by
$$ X^\alpha_t = X_{\nu_n}, \qquad\text{for } T_{\nu_n}\le t<T_{\nu_{n+1}},\ n\ge 0, $$
is the controlled process corresponding to $\alpha\in\mathcal A$. On the finite state space $S$, measures $\mu(dy)$ are identified with their masses $\mu(j)$ at any point $j\in S$. For instance, the uniform distribution of the variables $X_n$ is $\mu(j)=1/N$ (which accounts for the factor $N$ in some of the previous formulae). Correspondingly, the random measure on $(0,\infty)\times S$ associated to $(T_{\nu_n},X_{\nu_n})_{n\ge 1}$ is now denoted
$$ N^\alpha(dt,j) = \sum_{n\ge 1}\delta_{(T_{\nu_n},X_{\nu_n})}(dt,j)\,1_{T_{\nu_n}<\infty}. $$
We denote by $\mathbb F^{N^\alpha,W}$ the filtration generated by $N^\alpha$, $X_0$ and $W$. By Theorem 2.1, the $\mathbb F^{N^\alpha,W}$-compensator of $N^\alpha(dt,j)$ is
$$ \nu^\alpha(dt,j) = q(\alpha_t,t,X^\alpha_{t-},j)\,dt. $$
This formula justifies the interpretation of $X^\alpha$ as a Markov chain with "stochastic transition rates" given by $q(\alpha_t,t,i,j)$.

Having constructed the controlled processes $X^\alpha$, we can formulate the optimization problem by introducing the reward functional to be maximized. Let us define
$$ Z^\alpha_t = \exp\left(\int_0^t h(X^\alpha_s,\alpha_s,s)\,dW_s - \frac12\int_0^t |h(X^\alpha_s,\alpha_s,s)|^2\,ds\right), \qquad t\ge 0. $$
The optimal control problem for a finite horizon $T$ consists in maximizing the reward functional
$$ J_T(\alpha) = \mathbb E\left[\int_0^T Z^\alpha_t f(X^\alpha_t,\alpha_t,t)\,dt + Z^\alpha_T g(X^\alpha_T)\right] $$
over all $\alpha\in\mathcal A$. The infinite horizon optimal control problem consists in maximizing the discounted reward functional
$$ J_\infty(\alpha) = \mathbb E\left[\int_0^\infty e^{-\beta t} Z^\alpha_t f(X^\alpha_t,\alpha_t)\,dt\right] $$
with discount rate $\beta$. In the infinite horizon case the functions $q$, $h$ and $f$ are taken to be time-independent. The occurrence of the process $Z^\alpha$ is explained in the following reformulation.
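The thinning construction of $X^\alpha$ above lends itself to direct simulation. The following sketch, which is only an illustration and not taken from the paper, uses a hypothetical two-state example with a constant control (so that the rates reduce to a fixed rate matrix $q(i,j)$) and checks empirically that accepted $0\to 1$ jumps occur at rate $q(0,1)$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical time-homogeneous rates on S = {0, 1} (control suppressed); q(i, i) = 0
# here, so candidate self-jumps are never accepted (U_n > 0 almost surely).
N = 2
def q(i, j):
    return {(0, 1): 1.5, (1, 0): 0.5}.get((i, j), 0.0)

K = 4.0                                      # dominating intensity, K >= N * q
T_horizon = 2000.0

# Marked Poisson stream (T_n, X_n, U_n) as in Assumption 2.1
n_pts = rng.poisson(K * T_horizon)
T = np.sort(rng.uniform(0.0, T_horizon, n_pts))
X_marks = rng.integers(0, N, n_pts)          # marks uniformly distributed on S
U = rng.uniform(0.0, 1.0, n_pts)

# Thinning: accept candidate n iff U_n < N * q(X_current, X_n) / K
x = 0                                        # X_0
jumps_01 = 0
time_in_0 = 0.0
last_t = 0.0
for Tn, Xn, Un in zip(T, X_marks, U):
    if x == 0:
        time_in_0 += Tn - last_t
    last_t = Tn
    if Un < N * q(x, Xn) / K:
        if x == 0 and Xn == 1:
            jumps_01 += 1
        x = Xn

rate_01 = jumps_01 / time_in_0               # empirical rate of 0 -> 1 jumps, ~ 1.5
```

During any sojourn in state $0$, accepted $0\to 1$ candidates arrive at rate $K\cdot\mu(1)\cdot N q(0,1)/K = q(0,1)$, which is exactly the compensator identity of Theorem 2.1 in this simple case.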
3.2 The partially observed control problem for the physical probability

Here we show that the previous formulation corresponds to the original control problem outlined in the introduction, provided an appropriate weak-sense formulation is given. The first step will be to construct a physical probability under which the observation has the desired form (1.1).

Let us start with some preliminary remarks. We note that, since $h$ is bounded, the process $Z^\alpha$ introduced above is a continuous $\mathbb F^{N^\alpha,W}$-martingale. By the form of the compensator, the processes
$$ M^{j,\alpha}_t := N^\alpha((0,t],j) - \nu^\alpha((0,t],j), \qquad t\ge 0, $$
are also $\mathbb F^{N^\alpha,W}$-martingales. Since they are locally of integrable variation, they are purely discontinuous martingales, hence orthogonal to $Z^\alpha$, which means that the products $M^{j,\alpha}Z^\alpha$ are local martingales. Now let us define, for each $t\ge 0$, a consistent family of probabilities $\bar{\mathbb P}^\alpha_t$ on $\mathcal F^{N^\alpha,W}_t$ corresponding to the Doléans exponential $Z^\alpha$, namely
$$ d\bar{\mathbb P}^\alpha_t = Z^\alpha_t\,d\mathbb P\big|_{\mathcal F^{N^\alpha,W}_t}, $$
as well as the process $B^\alpha_t := W_t - \int_0^t h(X^\alpha_s,\alpha_s,s)\,ds$, $t\ge 0$. Then the following holds.

1. For every $T>0$, under $\bar{\mathbb P}^\alpha_T$ the process $B^\alpha$ is a Wiener process on $[0,T]$: this follows from the Girsanov theorem.
2. For every $T>0$, under $\bar{\mathbb P}^\alpha_T$ the random measure $N^\alpha(dt,j)$ has the same $\mathbb F^{N^\alpha,W}$-compensator $\nu^\alpha(dt,j)=q(\alpha_t,t,X^\alpha_{t-},j)\,dt$. Indeed, the processes $M^{j,\alpha}$ remain $\mathbb F^{N^\alpha,W}$-martingales under $\bar{\mathbb P}^\alpha_T$, because the products $M^{j,\alpha}Z^\alpha$ are $\mathbb F^{N^\alpha,W}$-local martingales under $\mathbb P$.

It is easy to check that the reward functionals can be written as
$$ J_T(\alpha) = \bar{\mathbb E}^\alpha_T\left[\int_0^T f(X^\alpha_t,\alpha_t,t)\,dt + g(X^\alpha_T)\right], $$
where $\bar{\mathbb E}^\alpha_T$ denotes expectation under $\bar{\mathbb P}^\alpha_T$ and, in the infinite horizon case,
$$ J_\infty(\alpha) = \lim_{T\to\infty}\bar{\mathbb E}^\alpha_T\left[\int_0^T e^{-\beta t} f(X^\alpha_t,\alpha_t)\,dt\right]. \tag{3.2} $$
This is the original optimal control problem outlined in the introduction: indeed, on each interval $[0,T]$, the controlled Markov chain $X^\alpha$ has $\mathbb F^{N^\alpha,W}$-compensator $\nu^\alpha(dt,j)=q(\alpha_t,t,X^\alpha_{t-},j)\,dt$ and the observation process has the form
$$ W_t = \int_0^t h(X^\alpha_s,\alpha_s,s)\,ds + B^\alpha_t, \qquad t\ge 0. $$
This optimization problem is in weak form, since the physical probabilities $\bar{\mathbb P}^\alpha_T$ and the observation noise $B^\alpha$ depend on $\alpha$.

Remark 3.1 Suppose that, in the finite horizon case, one wishes to maximize a functional of the form
$$ \bar{\mathbb E}^\alpha_T\left[\sum_j\int_0^T \ell(X^\alpha_{t-},j,\alpha_t,t)\,N^\alpha(dt,j)\right] $$
for some bounded Borel measurable real function $\ell(i,j,a,t)$. This is a running reward depending explicitly on the random measure $N^\alpha(dt,j)$. Since the previous integrand is $\mathbb F^{N^\alpha,W}$-predictable, this is the same as
$$ \bar{\mathbb E}^\alpha_T\left[\sum_j\int_0^T \ell(X^\alpha_t,j,\alpha_t,t)\,q(\alpha_t,t,X^\alpha_t,j)\,dt\right], $$
which has the form of the running reward considered before, setting $f(i,a,t)=\sum_j \ell(i,j,a,t)\,q(a,t,i,j)$. Similar considerations apply to the infinite horizon case.

3.3 The separated optimal control problem

Still assuming that Assumptions 2.1 and 3.1 hold true, we come back to the optimization problem formulated in subsection 3.1, which we rewrite in a different, equivalent form. It is convenient to introduce the generator $Q^a_t$ of the controlled, time-dependent Markov chain, which maps any function $\varphi:S\to\mathbb R$ to the function $Q^a_t\varphi:S\to\mathbb R$ given by
$$ Q^a_t\varphi(i) = \sum_j \varphi(j)\,q(a,t,i,j), \qquad i\in S. $$
Next we introduce the unnormalized conditional law by setting, for any $\varphi$,
$$ \rho_t(\varphi) = \mathbb E\,[\varphi(X^\alpha_t)\,Z^\alpha_t \mid \mathcal F^W_t]. $$
The conditional expectation is taken under the reference probability $\mathbb P$.
The process $(\rho_t(\phi))_t$ is understood as the optional projection of $(\phi(X^\alpha_t)Z^\alpha_t)$ with respect to the filtration $\mathbb F^W$, and the formula defines an optional process with values in the space of nonnegative finite measures over $S$; we refer to [1] for details. It is easy to show that the reward functionals can be written
$$J_T(\alpha) = \mathbb E\left[\int_0^T \rho_t(f(\cdot,\alpha_t,t))\,dt + \rho_T(g)\right] \quad\text{and}\quad J_\infty(\alpha) = \mathbb E\int_0^\infty e^{-\beta t}\rho_t(f(\cdot,\alpha_t))\,dt.$$
The motivation to introduce the process $\rho$ is the well-known fact that it is a solution to the Zakai filtering equation which, in the present case of a finite-state Markov chain, is also called the Wonham filter: for every $\phi:S\to\mathbb R$,
$$d\rho_t(\phi) = \rho_t(Q^{\alpha_t}_t\phi)\,dt + \rho_t\big(h(\cdot,\alpha_t,t)\phi(\cdot)\big)\,dW_t, \qquad \rho_0(\phi) = \mathbb E[\phi(X_0)].$$
We will soon see that for every admissible control and any initial condition $\rho_0$ there exists a unique solution. In this way we have obtained the so-called separated problem: the state equation is the controlled Wonham filter and the reward functionals depend on the control and the corresponding state trajectory.

3.4 Optimal control of the Wonham filter: setting and preliminary results

Here we introduce the appropriate formulation of our optimization problem, which will be the object of the analysis in all the following sections. We start from the separated problem and first note that the space of nonnegative finite measures over $S$ can be identified with
$$D = \{x = (x_1,\ldots,x_N)\in\mathbb R^N : x_i\ge 0,\ i=1,\ldots,N\}.$$
Let us define $\rho^i_t := \rho_t(1_{\{i\}})$, where $1_{\{i\}}:S\to\mathbb R$ is the indicator function of state $i$. Noting that for every $\phi$ we have $\rho_t(\phi)=\sum_i \rho^i_t\phi(i)$, easy computations show that the controlled Wonham filtering equations can be rewritten as a system of SDEs for the process $(\rho^1_t,\ldots,\rho^N_t)$.
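Taking $\phi = 1_{\{i\}}$ in the Wonham equation gives the componentwise dynamics $d\rho^i_t = \sum_j \rho^j_t\,q(\alpha_t,t,j,i)\,dt + \rho^i_t\,h(i,\alpha_t,t)\,dW_t$. The following is a minimal numerical sketch of this system via an Euler-Maruyama scheme, under assumptions of ours that are not in the paper: a two-state chain, a scalar observation ($d=1$), time-independent coefficients, and a control frozen inside $Q$ and $h$; the function name `simulate_wonham` and all numerical values are illustrative.

```python
import numpy as np

def simulate_wonham(x0, Q, h, T=1.0, n_steps=2000, seed=0):
    """Euler-Maruyama scheme for the componentwise Wonham system
        d rho^i_t = sum_j rho^j_t Q[j, i] dt + rho^i_t h[i] dW_t,
    with a scalar observation (d = 1) and the control frozen inside
    Q and h.  Returns the path of rho, an array of shape (n_steps+1, N)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    rho = np.empty((n_steps + 1, len(x0)))
    rho[0] = x0
    for n in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))
        # drift (Q^T rho) dt plus componentwise diffusion rho * h dW
        rho[n + 1] = rho[n] + (Q.T @ rho[n]) * dt + rho[n] * h * dW
    return rho

# Toy two-state generator (off-diagonal rates >= 0, rows sum to zero)
# and observation drift h(i); both correspond to one fixed control action.
Q = np.array([[-1.0,  1.0],
              [ 2.0, -2.0]])
h = np.array([0.5, -0.5])
path = simulate_wonham(np.array([0.5, 0.5]), Q, h)
```

With a strictly positive initial condition the multiplicative structure of the noise keeps every component strictly positive along this simulated path as well, in line with the positivity properties established below for the exact solution.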
Allowing a general starting time $t\in[0,T]$ and a general initial condition $x=(x_i)\in D$, the finite horizon problem is
$$d\rho^i_s = \sum_j \rho^j_s\, q(\alpha_s,s,j,i)\,ds + \rho^i_s\, h(i,\alpha_s,s)\,dW_s, \qquad s\in[t,T],\ i\in S,$$
$$\rho^i_t = x_i, \qquad x\in D,\ i\in S,$$
$$J_T(t,x,\alpha) = \mathbb E\left[\int_t^T \sum_i \rho^i_s f(i,\alpha_s,s)\,ds + \sum_i \rho^i_T\, g(i)\right], \tag{3.3}$$
and the infinite horizon problem starting at time $0$ is
$$d\rho^i_t = \sum_j \rho^j_t\, q(\alpha_t,j,i)\,dt + \rho^i_t\, h(i,\alpha_t)\,dW_t, \qquad t\ge 0,\ i\in S,$$
$$\rho^i_0 = x_i, \qquad x\in D,\ i\in S,$$
$$J_\infty(x,\alpha) = \mathbb E\left[\int_0^\infty e^{-\beta t}\sum_i \rho^i_t f(i,\alpha_t)\,dt\right]. \tag{3.4}$$
Clearly, the problem starting at time $t\ge 0$ also admits a rephrasing in the original formulation, even under the physical probability. Now we define the value functions for these problems:
$$V(t,x) = \sup_{\alpha\in\mathcal A} J_T(t,x,\alpha), \qquad V(x) = \sup_{\alpha\in\mathcal A} J_\infty(x,\alpha), \qquad t\in[0,T],\ x\in D. \tag{3.5}$$
In the following proposition we collect some preliminary properties of these optimization problems and of the corresponding value functions.

Proposition 3.1 Suppose that Assumptions 2.1 and 3.1 hold true, and that the coefficients do not depend on time in the infinite horizon case.
1. For the solution to (3.3) we have, $P$-a.s., $\rho_s\in D$ for every $s\in[t,T]$. If $x_i>0$ for every $i\in S$ then, $P$-a.s., $\rho^i_s>0$ for every $s\in[t,T]$ and $i\in S$. Similar results hold for the solution to (3.4).
2. For every $t\in[0,T]$ the function $x\mapsto V(t,x)$ is convex; moreover there exists a constant $C$ such that for every $t\in[0,T]$, $x,\bar x\in D$,
$$|V(t,x)-V(t,\bar x)|\le C|x-\bar x|, \qquad |V(t,x)|\le C(1+|x|). \tag{3.6}$$
3. The function $x\mapsto V(x)$ is convex, hence locally Lipschitz; moreover, for every $x\in D$,
$$|V(x)|\le \frac{1}{\beta}\sup|f|. \tag{3.7}$$

Proof. 1. We note that the equation is linear with respect to the state variable and has bounded (stochastic) coefficients.
So the classical Lipschitz continuity and linear growth conditions hold true and the equation has a unique continuous $\mathbb F^W$-adapted solution starting from any $x\in\mathbb R^N$. If $x=0\in D$ then $\rho=0$. If $x\in D$ and $x\neq 0$ then $cx$ is a probability measure on $S$ for $c=(\sum_i x_i)^{-1}$, so that $c\rho_t$ coincides with the unnormalized conditional distribution and is therefore a nonnegative measure on $S$; it follows that $\rho^i_t\ge 0$.

To prove the strict positivity result we write the state equations as a system of ordinary (deterministic) differential equations with stochastic coefficients: this is the so-called robust form of the Zakai equation. We set
$$\nu^i_t = \rho^i_t \exp\left(-\int_0^t h(i,\alpha_s)\,dW_s\right),$$
and we look for the equation satisfied by $\nu^i$. Computing the Itô differential $d\nu^i_t$, after some calculations we have
$$d\nu^i_t = \sum_j \nu^j_t\, q(\alpha_t,j,i)\exp\left(\int_0^t [h(j,\alpha_s)-h(i,\alpha_s)]\,dW_s\right)dt - \frac{1}{2}\nu^i_t\,|h(i,\alpha_t)|^2\,dt.$$
So we obtain the robust equation in the form $\frac{d}{dt}\nu^i_t = \sum_j a^{ij}_t\nu^j_t$, setting
$$a^{ii}_t = q(\alpha_t,i,i) - \frac{1}{2}|h(i,\alpha_t)|^2, \qquad a^{ij}_t = q(\alpha_t,j,i)\exp\left(\int_0^t [h(j,\alpha_s)-h(i,\alpha_s)]\,dW_s\right), \quad j\neq i.$$
We have
$$\frac{d}{dt}\nu^i_t = a^{ii}_t\nu^i_t + \sum_{j\neq i} a^{ij}_t\nu^j_t = a^{ii}_t\nu^i_t + g^i_t,$$
where $g^i_t := \sum_{j\neq i} a^{ij}_t\nu^j_t \ge 0$ by the nonnegativity result already proved. By variation of constants it follows that
$$\nu^i_t = \nu^i_0\exp\left(\int_0^t a^{ii}_s\,ds\right) + \int_0^t \exp\left(\int_s^t a^{ii}_r\,dr\right) g^i_s\,ds \ge \nu^i_0\exp\left(\int_0^t a^{ii}_s\,ds\right) > 0$$
for every $i\in S$, provided $\nu^i_0>0$ for every $i\in S$. The same clearly holds for $\rho^i$.

2. Let $\rho,\bar\rho$ denote the solutions starting from $x,\bar x$. By standard estimates we have
$$\mathbb E\sup_{s\in[t,T]}|\rho_s-\bar\rho_s|^2 \le C|x-\bar x|^2$$
for some constant $C$ (depending also on $T$). By the boundedness of $f$ and $g$ it follows easily that $|J_T(t,x,\alpha)-J_T(t,\bar x,\alpha)|^2 \le C'|x-\bar x|^2$ for some constant $C'$ independent of $\alpha$, and (3.6) follows immediately. We note that the state equation and the reward functional are linear.
It follows that $x\mapsto J_T(t,x,\alpha)$ is linear and $x\mapsto V(t,x)$ is convex, being a supremum of linear functions.

3. From formula (3.2) it follows that $|J_\infty(x,\alpha)| \le \liminf_{T\to\infty}\int_0^T e^{-\beta t}\sup|f|\,dt \le \sup|f|/\beta$, and the estimate on $V$ holds. Convexity follows from linearity as before. $\square$

Remark 3.2 By similar arguments it is also easy to show that $x\mapsto V(x)$ is globally Lipschitz provided $\beta>0$ is sufficiently large.

4 Dynamic programming equation for infinite horizon: viscosity theory

In this section we study the value function $V$ for the problem (3.4). We suppose that Assumptions 2.1 and 3.1 hold and that the coefficients do not depend on time. We will show that $V$ is the unique viscosity solution to the dynamic programming equation (also called the Hamilton-Jacobi-Bellman, or HJB, equation), which takes the following form: for $x\in D$,
$$\beta v(x) - \sup_{a\in A}\left\{\frac{1}{2}\sum_{ij}\partial^2_{ij}v(x)\,x_i x_j\sum_{k=1}^d h_k(i,a)h_k(j,a) + \sum_{ij}\partial_i v(x)\,x_j\, q(a,j,i) + \sum_i x_i f(i,a)\right\} = 0. \tag{4.1}$$
This will be written as follows: denoting by $Dv$ and $D^2v$ the gradient and the Hessian matrix of $v$, we have
$$\beta v + F(x,Dv,D^2v) = 0,$$
where, for $x\in D$, $p\in\mathbb R^N$ and $X\in\mathcal S(N)$ (the set of symmetric real $N\times N$ matrices),
$$F(x,p,X) = -\sup_{a\in A}\left\{\frac{1}{2}\,\mathrm{Trace}\big(\Sigma(x,a)\Sigma(x,a)^T X\big) + \langle p, b(x,a)\rangle + f(x,a)\right\},$$
and the $N\times d$ matrix $\Sigma(x,a)$, the vector $b(x,a)$ and the real function $f(x,a)$ are
$$\Sigma(x,a) = \big(x_i h_k(i,a)\big)_{ik}, \qquad b(x,a) = \Big(\sum_j x_j q(a,j,i)\Big)_i, \qquad f(x,a) = \sum_i x_i f(i,a),$$
for $i=1,\ldots,N$, $k=1,\ldots,d$. Recall that we assume (see (3.1))
$$|h_k(i,a)| + |q(a,i,j)| + |f(i,a)| \le K_0, \qquad a\in A;\ k=1,\ldots,d;\ i,j=1,\ldots,N,$$
for some constant $K_0$. Therefore $\Sigma(x,a)$, $b(x,a)$, $f(x,a)$ are Lipschitz continuous in $x$, uniformly in $a$, with a Lipschitz constant that only depends on $K_0$, $N$, $d$.
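When the action space is finite, or is discretized on a grid, the Hamiltonian $F$ can be evaluated directly from the definitions of $\Sigma$, $b$ and $f$ above. The sketch below does this for toy coefficients of our own choosing (the particular $q$, $h$, $f$ and the name `hamiltonian_F` are illustrative assumptions, not from the paper).

```python
import numpy as np

def hamiltonian_F(x, p, X, A_grid, q, h, f):
    """Evaluate F(x, p, X) = -sup_{a in A} { (1/2) Tr(Sigma Sigma^T X)
    + <p, b(x, a)> + f(x, a) } over a finite grid of control actions, with
    Sigma(x, a)_{ik} = x_i h_k(i, a),  b(x, a)_i = sum_j x_j q(a, j, i),
    f(x, a) = sum_i x_i f(i, a).  Here q, h, f are callables of a."""
    vals = []
    for a in A_grid:
        Sigma = x[:, None] * h(a)             # N x d matrix (x_i h_k(i, a))
        b = q(a).T @ x                        # b_i = sum_j x_j q(a, j, i)
        vals.append(0.5 * np.trace(Sigma @ Sigma.T @ X)
                    + p @ b + x @ f(a))
    return -max(vals)

# Toy data: N = 2 states, d = 1 observation, actions on a grid of [0, 1].
q = lambda a: np.array([[-a, a], [a, -a]])    # controlled generator q(a, i, j)
h = lambda a: np.array([[1.0], [-1.0]])       # rows h(i, a) in R^d
f = lambda a: np.array([-0.5 * a**2, -0.5 * a**2])
x = np.array([0.3, 0.7])
F_val = hamiltonian_F(x, np.zeros(2), np.zeros((2, 2)),
                      np.linspace(0.0, 1.0, 101), q, h, f)
```

With $p=0$ and $X=0$ the bracket reduces to $-a^2/2\,\sum_i x_i$, whose supremum over $[0,1]$ is $0$ at $a=0$, so `F_val` is $0$; this kind of spot check against a hand computation is a cheap way to validate a discretized Hamiltonian before feeding it to a numerical HJB solver.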
An easy computation shows that
$$|F(x,p,X) - F(x,q,Y)| \le C_0\big(|x|\,|p-q| + |x|^2|X-Y|\big) \tag{4.2}$$
for every $x\in D$, $p,q\in\mathbb R^N$, $X,Y\in\mathcal S(N)$, and for some constant $C_0$ depending only on $K_0$, $N$, $d$. We also note that $F$ is a continuous function of all its arguments.

Let us briefly recall the definition of viscosity sub-/supersolutions. We find it convenient to state it using sub- and superjets. The equivalence with the other definition based on the use of test functions is well known and can be found e.g. in [7], [10]. For $u:D\to\mathbb R$ and $x\in D$, the superjet $J^{2,+}u(x)$ is the set of pairs $(p,X)\in\mathbb R^N\times\mathcal S(N)$ such that
$$u(y) \le u(x) + \langle p, y-x\rangle + \frac{1}{2}\langle X(y-x), y-x\rangle + o(|y-x|^2) \quad\text{as } y\in D,\ y\to x.$$
The closure $\bar J^{2,+}u(x)$ consists of the pairs $(p,X)$ for which there exists a sequence $(x_n,p_n,X_n)\in D\times\mathbb R^N\times\mathcal S(N)$ such that $x_n\to x$, $p_n\to p$, $X_n\to X$, $u(x_n)\to u(x)$ and $(p_n,X_n)\in J^{2,+}u(x_n)$. We define subjets by setting $J^{2,-}u(x) = -J^{2,+}(-u)(x)$ and $\bar J^{2,-}u(x) = -\bar J^{2,+}(-u)(x)$.

We say that an upper semicontinuous function $u:D\to\mathbb R$ is a viscosity subsolution if for any $x\in D$
$$(p,X)\in\bar J^{2,+}u(x) \implies \beta u(x) + F(x,p,X) \le 0.$$
A lower semicontinuous function $v:D\to\mathbb R$ is called a viscosity supersolution if for any $x\in D$
$$(p,X)\in\bar J^{2,-}v(x) \implies \beta v(x) + F(x,p,X) \ge 0.$$
Finally, a viscosity solution is both a sub- and a supersolution.

The main result of this section is the following comparison result, which immediately implies uniqueness of the bounded viscosity solution to the HJB equation and allows us to prove the characterization result for the value function.

Theorem 4.1 Let $u$ be an upper semicontinuous subsolution bounded above, and $v$ a lower semicontinuous supersolution bounded below. Then $u\le v$.

Proof.
Define, for $x,y\in D$, $\alpha>0$, $\delta>0$,
$$\Phi(x,y) = u(x) - v(y) - \frac{\alpha}{2}|x-y|^2 - \delta\log(\gamma+|x|^2) - \delta\log(\gamma+|y|^2).$$
Here $\alpha$ will eventually tend to $\infty$, $\delta$ to $0$, and $\gamma>0$ will be fixed later, sufficiently large. By the boundedness assumption there exists a maximum point $(\hat x,\hat y)\in D\times D$. $\Phi$, $\hat x$, $\hat y$ depend on $\alpha,\delta,\gamma$, but we omit this dependence in the notation. By standard arguments (see e.g. [7], Lemma 3.1 or Proposition 3.7), for fixed $\delta,\gamma$ we have
$$\alpha|\hat x-\hat y|^2\to 0, \qquad |\hat x-\hat y|\to 0 \tag{4.3}$$
as $\alpha\to\infty$. For every $x\in D$ we have
$$\Phi(\hat x,\hat y) \ge \Phi(x,x) = u(x) - v(x) - 2\delta\log(\gamma+|x|^2),$$
so, letting
$$\theta_\delta = \sup_{x\in D}\big(u(x)-v(x)-2\delta\log(\gamma+|x|^2)\big),$$
we have $\Phi(\hat x,\hat y)\ge\theta_\delta$, which implies in particular
$$\theta_\delta + \delta\log(\gamma+|\hat x|^2) + \delta\log(\gamma+|\hat y|^2) \le u(\hat x) - v(\hat y). \tag{4.4}$$
We note that $\theta_\delta$ is decreasing in $\delta>0$. We claim that $\lim_{\delta\to 0}\theta_\delta\le 0$, so that for every $\delta$ sufficiently small we have $u(x)-v(x)-2\delta\log(\gamma+|x|^2)\le 0$ for every $x\in D$; letting $\delta\to 0$ we obtain the desired conclusion $u\le v$.

To prove the claim we will show that assuming $\lim_{\delta\to 0}\theta_\delta\in(0,\infty]$ leads to a contradiction. Let us define
$$g(x) = \log(\gamma+|x|^2), \qquad \tilde u(x) = u(x)-\delta g(x), \qquad \tilde v(y) = v(y)+\delta g(y).$$
Then $(\hat x,\hat y)$ is a maximum point of $\tilde u(x)-\tilde v(y)-\frac{\alpha}{2}|x-y|^2$. By the Crandall-Ishii lemma (see [6], or [7], Theorem 3.2 and the discussion that follows) there exist $X,Y\in\mathcal S(N)$ such that
$$\big(\alpha(\hat x-\hat y), X\big)\in\bar J^{2,+}\tilde u(\hat x), \qquad \big(\alpha(\hat x-\hat y), Y\big)\in\bar J^{2,-}\tilde v(\hat y), \qquad \begin{pmatrix} X & 0\\ 0 & -Y\end{pmatrix} \le 3\alpha\begin{pmatrix} I & -I\\ -I & I\end{pmatrix}. \tag{4.5}$$
Since $g$ is smooth, it follows that
$$\big(\alpha(\hat x-\hat y)+\delta Dg(\hat x),\; X+\delta D^2g(\hat x)\big)\in\bar J^{2,+}u(\hat x), \qquad \big(\alpha(\hat x-\hat y)-\delta Dg(\hat y),\; Y-\delta D^2g(\hat y)\big)\in\bar J^{2,-}v(\hat y),$$
and since $u$ is a subsolution and $v$ a supersolution,
$$\beta u(\hat x) + F\big(\hat x,\,\alpha(\hat x-\hat y)+\delta Dg(\hat x),\, X+\delta D^2g(\hat x)\big) \le 0, \qquad \beta v(\hat y) + F\big(\hat y,\,\alpha(\hat x-\hat y)-\delta Dg(\hat y),\, Y-\delta D^2g(\hat y)\big) \ge 0.$$
Subtracting these inequalities and recalling (4.4) we obtain
$$\beta\theta_\delta + \beta\delta\log(\gamma+|\hat x|^2) + \beta\delta\log(\gamma+|\hat y|^2) \le F\big(\hat y,\,\alpha(\hat x-\hat y)-\delta Dg(\hat y),\, Y-\delta D^2g(\hat y)\big) - F\big(\hat x,\,\alpha(\hat x-\hat y)+\delta Dg(\hat x),\, X+\delta D^2g(\hat x)\big).$$
Using (4.2), the right-hand side can be estimated by
$$F\big(\hat y,\alpha(\hat x-\hat y),Y\big) - F\big(\hat x,\alpha(\hat x-\hat y),X\big) + \delta C_0\big(|\hat y|\,|Dg(\hat y)|+|\hat y|^2|D^2g(\hat y)|\big) + \delta C_0\big(|\hat x|\,|Dg(\hat x)|+|\hat x|^2|D^2g(\hat x)|\big).$$
By explicit computation,
$$Dg(x) = \frac{2x}{\gamma+|x|^2}, \qquad D^2g(x) = \frac{2I}{\gamma+|x|^2} - \frac{4\,x\otimes x}{(\gamma+|x|^2)^2},$$
so that $C_0(|x|\,|Dg(x)|+|x|^2|D^2g(x)|)\le C_1$, where $C_1$ is another constant depending only on $K_0$, $N$, $d$ (and not on $\gamma>0$). It follows that
$$\beta\theta_\delta + \beta\delta\log(\gamma+|\hat x|^2) + \beta\delta\log(\gamma+|\hat y|^2) \le F\big(\hat y,\alpha(\hat x-\hat y),Y\big) - F\big(\hat x,\alpha(\hat x-\hat y),X\big) + 2\delta C_1.$$
Choosing $\gamma>0$ so large that $\beta\log\gamma\ge C_1$, we arrive at
$$\beta\theta_\delta \le F\big(\hat y,\alpha(\hat x-\hat y),Y\big) - F\big(\hat x,\alpha(\hat x-\hat y),X\big).$$
It is well known (see [7], Example 3.6) that the right-hand side can be estimated as follows:
$$\beta\theta_\delta \le \omega\big(\alpha|\hat x-\hat y|^2 + |\hat x-\hat y|\big)$$
for a modulus $\omega$ (i.e. a function $\omega:[0,\infty)\to[0,\infty)$ such that $\omega(0+)=0$) that only depends on the Lipschitz constants of $\Sigma$, $b$, $f$, hence only on $K_0$, $N$, $d$. Letting $\alpha\to\infty$ and recalling (4.3) we then have $\beta\theta_\delta\le 0$, which contradicts the assumption that $\beta>0$ and $\lim_{\delta\to 0}\theta_\delta>0$. $\square$
This was the main step towards the following result, which summarizes the main conclusions on the control problem.

Theorem 4.2 Suppose that Assumptions 2.1 and 3.1 hold and that the coefficients do not depend on time. Then the value function $V$ for the problem (3.4) is the unique bounded viscosity solution of the HJB equation (4.1).

Proof. Boundedness of $V$ was proved in (3.7) and uniqueness follows from the previous result. Under our assumptions it is well known that $V$ satisfies a dynamic programming principle and that it is a viscosity solution: see e.g. [18] or [10]. $\square$

5 Dynamic programming equation for finite horizon: viscosity theory

In this section we study the value function $V$ for the problem (3.3). We still suppose that Assumptions 2.1 and 3.1 hold. We will show that $V$ is the unique viscosity solution to the HJB equation. Here this is an equation for a function $v(t,x)=v(t,x_1,\ldots,x_N)$ on the domain $(0,T)\times D$, of the form
$$-v_t(t,x) - \sup_{a\in A}\left\{\frac{1}{2}\sum_{ij}\partial^2_{ij}v(t,x)\,x_ix_j\sum_{k=1}^d h_k(i,a,t)h_k(j,a,t) + \sum_{ij}\partial_i v(t,x)\,x_j\,q(a,t,j,i) + \sum_i x_i f(i,a,t)\right\} = 0, \tag{5.1}$$
with the terminal condition
$$v(T,x) = \sum_i x_i\,g(i), \qquad x\in D. \tag{5.2}$$
Denoting by $Dv$ and $D^2v$ the gradient and the Hessian matrix of $v$ with respect to $x$, we have
$$-v_t + F(t,x,Dv,D^2v) = 0,$$
where $F$ is defined for $t\in[0,T]$, $x\in D$, $p\in\mathbb R^N$ and $X\in\mathcal S(N)$ by
$$F(t,x,p,X) = -\sup_{a\in A}\left\{\frac{1}{2}\,\mathrm{Trace}\big(\Sigma(x,a,t)\Sigma(x,a,t)^TX\big) + \langle p, b(x,a,t)\rangle + f(x,a,t)\right\}.$$
Here $\Sigma(x,a,t)$, $b(x,a,t)$ and $f(x,a,t)$ are defined as before, but possibly depending on $t$. The inequality (4.2) still holds for every $t$, with the same constant $C_0$. We will assume that $h$, $q$, $f$ are continuous in $t\in[0,T]$, uniformly in $a\in A$, so that $F$ is a continuous function of all its arguments. We now report the standard definitions of viscosity sub- and supersolutions using parabolic sub/superjets.
For $u:(0,T)\times D\to\mathbb R$, $t\in(0,T)$, $x\in D$, the parabolic superjet $\mathcal P^{2,+}u(t,x)$ is the set of triples $(a,p,X)\in\mathbb R\times\mathbb R^N\times\mathcal S(N)$ such that
$$u(s,y) \le u(t,x) + a(s-t) + \langle p, y-x\rangle + \frac{1}{2}\langle X(y-x), y-x\rangle + o\big(|y-x|^2+|s-t|\big)$$
as $y\in D$, $y\to x$, $s\in(0,T)$, $s\to t$. The closure $\bar{\mathcal P}^{2,+}u(t,x)$ consists of the triples $(a,p,X)$ for which there exists a sequence $(t_n,x_n,a_n,p_n,X_n)\in(0,T)\times D\times\mathbb R\times\mathbb R^N\times\mathcal S(N)$ such that $t_n\to t$, $x_n\to x$, $a_n\to a$, $p_n\to p$, $X_n\to X$, $u(t_n,x_n)\to u(t,x)$ and $(a_n,p_n,X_n)\in\mathcal P^{2,+}u(t_n,x_n)$. We define subjets by setting $\mathcal P^{2,-}u(t,x) = -\mathcal P^{2,+}(-u)(t,x)$ and $\bar{\mathcal P}^{2,-}u(t,x) = -\bar{\mathcal P}^{2,+}(-u)(t,x)$.

We say that an upper semicontinuous function $u:(0,T]\times D\to\mathbb R$ is a viscosity subsolution if for any $t\in(0,T)$, $x\in D$,
$$(a,p,X)\in\bar{\mathcal P}^{2,+}u(t,x) \implies -a + F(t,x,p,X) \le 0,$$
and moreover $u(T,x)\le g(x)$ for $x\in D$. A lower semicontinuous function $v:(0,T]\times D\to\mathbb R$ is called a viscosity supersolution if for any $t\in(0,T]$, $x\in D$,
$$(a,p,X)\in\bar{\mathcal P}^{2,-}v(t,x) \implies -a + F(t,x,p,X) \ge 0,$$
and moreover $v(T,x)\ge g(x)$ for $x\in D$. Finally, a viscosity solution is both a sub- and a supersolution.

We first prove the following comparison result.

Theorem 5.1 Suppose that $h$, $q$, $f$ are continuous in $t\in[0,T]$, uniformly in $a\in A$. Suppose that $u,v:(0,T]\times D\to\mathbb R$ are upper and lower semicontinuous, respectively. Let $u$ be a subsolution and $v$ a supersolution satisfying
$$u(T,x) \le v(T,x), \qquad x\in D. \tag{5.3}$$
Suppose moreover that there exists a constant $C_1>0$ such that
$$u(t,x) \le C_1(1+|x|), \qquad v(t,x) \ge -C_1(1+|x|), \qquad t\in(0,T],\ x\in D. \tag{5.4}$$
Then $u\le v$ on $(0,T]\times D$.

Proof. Step I.
We will first prove the result for sub-/supersolutions to the equation
$$-v_t + Kv(t,x) + F(t,x,Dv,D^2v) = 0, \tag{5.5}$$
where $K>0$ will be taken sufficiently large (in fact, satisfying $K\ge 2C_0$, compare (4.2)). The general case will then be reduced to this one.

For $\delta\in(0,1]$ define
$$\theta_\delta = \sup_{x\in D,\ 0<t\le T}\Big(u(t,x) - v(t,x) - 2\delta(1+|x|^2) - \frac{\delta}{t}\Big).$$
Suppose, by contradiction, that $\sup_{(0,T]\times D}(u-v)>0$; then there exist $\bar\theta>0$ and $\delta>0$ such that $\theta_\delta\ge\bar\theta$. From now on we fix $\delta$ and omit to indicate that several quantities in the sequel may depend on it. Define, for $x,y\in D$, $t\in(0,T]$, $\alpha>0$,
$$\Phi_\alpha(t,x,y) = u(t,x) - v(t,y) - \delta(1+|x|^2) - \delta(1+|y|^2) - \frac{\delta}{t} - \frac{\alpha}{2}|x-y|^2.$$
Later we will let $\alpha\to\infty$. From (5.4) it follows that
$$\Phi_\alpha(t,x,y) \le C_1(1+|x|) - \delta(1+|x|^2) + C_1(1+|y|) - \delta(1+|y|^2) - \frac{\delta}{t},$$
and since $\Phi_\alpha$ is upper semicontinuous it achieves a maximum at a point $(t_\alpha,x_\alpha,y_\alpha)\in(0,T]\times D\times D$. Since
$$\Phi_\alpha(t_\alpha,x_\alpha,y_\alpha) \ge \Phi_\alpha(t,x,x) = u(t,x) - v(t,x) - 2\delta(1+|x|^2) - \frac{\delta}{t}, \qquad t\in(0,T],\ x\in D,$$
it follows that $\Phi_\alpha(t_\alpha,x_\alpha,y_\alpha)\ge\theta_\delta\ge\bar\theta$, namely
$$u(t_\alpha,x_\alpha) - v(t_\alpha,y_\alpha) \ge \bar\theta + \delta(1+|x_\alpha|^2) + \delta(1+|y_\alpha|^2) + \frac{\delta}{t_\alpha} + \frac{\alpha}{2}|x_\alpha-y_\alpha|^2. \tag{5.6}$$
Using (5.4) once more, we deduce from (5.6) that
$$C_1(1+|x_\alpha|) + C_1(1+|y_\alpha|) \ge \delta(1+|x_\alpha|^2) + \delta(1+|y_\alpha|^2) + \frac{\delta}{t_\alpha},$$
which implies that there exists a constant $C$, independent of $\alpha$, such that
$$|x_\alpha| + |y_\alpha| + \frac{1}{t_\alpha} \le C. \tag{5.7}$$
Moreover, by standard arguments (see e.g. [7], Lemma 3.1 or Proposition 3.7), we have
$$\alpha|x_\alpha-y_\alpha|^2\to 0, \qquad |x_\alpha-y_\alpha|\to 0 \tag{5.8}$$
as $\alpha\to\infty$. By (5.7) the family $(x_\alpha,y_\alpha,t_\alpha)_\alpha$ is bounded, so it admits a limit point, necessarily of the form $(\bar x,\bar x,\bar t)$ by (5.8); (5.7) also implies that $\bar t>0$. Suppose we had $\bar t=T$: then, letting $\alpha\to\infty$ along a subsequence, by upper semicontinuity it follows from (5.6) and (5.8) that $u(T,\bar x)-v(T,\bar x)\ge\bar\theta>0$, which contradicts assumption (5.3).
So we conclude that $0<\bar t<T$, and it follows that $(t_\alpha,x_\alpha,y_\alpha)\in(0,T)\times D\times D$ for infinitely many $\alpha\to\infty$. Next recall that $(t_\alpha,x_\alpha,y_\alpha)$ was a maximum point of $\Phi_\alpha$, which we rewrite in the form
$$\Phi_\alpha(t,x,y) = \tilde u(t,x) - \tilde v(t,y) - \varphi_\alpha(t,x,y),$$
where we define
$$\tilde u(t,x) = u(t,x) - \delta(1+|x|^2), \qquad \tilde v(t,y) = v(t,y) + \delta(1+|y|^2), \qquad \varphi_\alpha(t,x,y) = \frac{\delta}{t} + \frac{\alpha}{2}|x-y|^2.$$
Since the quadratic terms are smooth, the parabolic sub-/superjets are related as follows:
$$\bar{\mathcal P}^{2,+}\tilde u(t,x) = \bar{\mathcal P}^{2,+}u(t,x) + (0,-2\delta x,-2\delta I), \qquad \bar{\mathcal P}^{2,-}\tilde v(t,y) = \bar{\mathcal P}^{2,-}v(t,y) + (0,2\delta y,2\delta I). \tag{5.9}$$
We wish to apply the Crandall-Ishii lemma in its parabolic form: see [6], or [7], Theorem 8.3. For our equation, which is backward in time, it is convenient to check the required assumptions in the form stated in [10], Theorem 6.1: we must show that for every $M>0$ there exists a constant $C(M)$ such that
$$(a,p,X)\in\bar{\mathcal P}^{2,+}\tilde u(t,x),\quad |p|+|x|+|X|+|\tilde u(t,x)|\le M \implies a\ge -C(M),$$
$$(a,p,X)\in\bar{\mathcal P}^{2,-}\tilde v(t,y),\quad |p|+|y|+|X|+|\tilde v(t,y)|\le M \implies a\le C(M).$$
We check the first implication, the other one being similar. Assume $(a,p,X)\in\bar{\mathcal P}^{2,+}\tilde u(t,x)$. Then by (5.9) we have $(a,p+2\delta x,X+2\delta I)\in\bar{\mathcal P}^{2,+}u(t,x)$, and since $u$ is a subsolution to (5.5) we have
$$-a + Ku(t,x) + F(t,x,p+2\delta x,X+2\delta I) \le 0.$$
Since $\tilde u\le u$, recalling (4.2) we have
$$a \ge K\tilde u(t,x) + F(t,x,p+2\delta x,X+2\delta I) \ge K\tilde u(t,x) + F(t,x,p,X) - 2\delta C_0|x|^2,$$
and if $|p|+|x|+|X|+|\tilde u(t,x)|\le M$ we obtain the required inequality $a\ge -C(M)$, setting
$$C(M) = KM + 2\delta C_0M^2 + \sup\{|F(t,x,p,X)| : |p|+|x|+|X|\le M,\ t\in[0,T]\}.$$
Having checked the required assumptions, we can now apply the Crandall-Ishii lemma and conclude that there exist $a,b\in\mathbb R$ and $X,Y\in\mathcal S(N)$ such that
$$(a,\alpha(x_\alpha-y_\alpha),X)\in\bar{\mathcal P}^{2,+}\tilde u(t_\alpha,x_\alpha), \qquad (b,\alpha(x_\alpha-y_\alpha),Y)\in\bar{\mathcal P}^{2,-}\tilde v(t_\alpha,y_\alpha),$$
$$a - b = (\varphi_\alpha)_t(t_\alpha,x_\alpha,y_\alpha) = -\frac{\delta}{t_\alpha^2}, \qquad \begin{pmatrix} X & 0\\ 0 & -Y \end{pmatrix} \le 3\alpha\begin{pmatrix} I & -I\\ -I & I \end{pmatrix}. \tag{5.10}$$
From (5.9) it follows that
$$(a,\alpha(x_\alpha-y_\alpha)+2\delta x_\alpha,\,X+2\delta I)\in\bar{\mathcal P}^{2,+}u(t_\alpha,x_\alpha), \qquad (b,\alpha(x_\alpha-y_\alpha)-2\delta y_\alpha,\,Y-2\delta I)\in\bar{\mathcal P}^{2,-}v(t_\alpha,y_\alpha),$$
and since $u$ and $v$ are, respectively, a sub- and a supersolution to (5.5),
$$-a + Ku(t_\alpha,x_\alpha) + F\big(t_\alpha,x_\alpha,\alpha(x_\alpha-y_\alpha)+2\delta x_\alpha,\,X+2\delta I\big) \le 0,$$
$$-b + Kv(t_\alpha,y_\alpha) + F\big(t_\alpha,y_\alpha,\alpha(x_\alpha-y_\alpha)-2\delta y_\alpha,\,Y-2\delta I\big) \ge 0.$$
Subtracting these inequalities and recalling the equality in (5.10) we obtain
$$\frac{\delta}{t_\alpha^2} + K\big(u(t_\alpha,x_\alpha)-v(t_\alpha,y_\alpha)\big) \le F\big(t_\alpha,y_\alpha,\alpha(x_\alpha-y_\alpha)-2\delta y_\alpha,\,Y-2\delta I\big) - F\big(t_\alpha,x_\alpha,\alpha(x_\alpha-y_\alpha)+2\delta x_\alpha,\,X+2\delta I\big).$$
Using (4.2), the right-hand side can be estimated from above by
$$F\big(t_\alpha,y_\alpha,\alpha(x_\alpha-y_\alpha),Y\big) - F\big(t_\alpha,x_\alpha,\alpha(x_\alpha-y_\alpha),X\big) + 2\delta C_0|x_\alpha|^2 + 2\delta C_0|y_\alpha|^2.$$
Inequality (5.6) implies $u(t_\alpha,x_\alpha)-v(t_\alpha,y_\alpha) \ge \bar\theta + \delta(1+|x_\alpha|^2) + \delta(1+|y_\alpha|^2)$, so we can estimate the left-hand side from below and arrive at
$$K\bar\theta + K\delta(1+|x_\alpha|^2) + K\delta(1+|y_\alpha|^2) \le F\big(t_\alpha,y_\alpha,\alpha(x_\alpha-y_\alpha),Y\big) - F\big(t_\alpha,x_\alpha,\alpha(x_\alpha-y_\alpha),X\big) + 2\delta C_0|x_\alpha|^2 + 2\delta C_0|y_\alpha|^2.$$
Recall that $C_0$ was a constant depending only on $K_0$, $N$, $d$. Choosing $K\ge 2C_0$ we obtain
$$K\bar\theta \le F\big(t_\alpha,y_\alpha,\alpha(x_\alpha-y_\alpha),Y\big) - F\big(t_\alpha,x_\alpha,\alpha(x_\alpha-y_\alpha),X\big).$$
It is well known (see [7], Example 3.6) that the right-hand side can be estimated as follows:
$$K\bar\theta \le \omega\big(\alpha|x_\alpha-y_\alpha|^2 + |x_\alpha-y_\alpha|\big)$$
for a modulus $\omega$ (i.e. a function $\omega:[0,\infty)\to[0,\infty)$ such that $\omega(0+)=0$) that only depends on the Lipschitz constants of $\Sigma$, $b$, $f$, hence only on $K_0$, $N$, $d$.
Letting $\alpha\to\infty$ along a subsequence and recalling (5.8) we have $K\bar\theta\le 0$, which is a contradiction.

Step II. Now we consider the general case. For $K>0$ we define
$$u_K(t,x) = e^{-K(T-t)}u(t,x), \qquad v_K(t,x) = e^{-K(T-t)}v(t,x).$$
It is easy to check that $u_K$ and $v_K$ are, respectively, a sub- and a supersolution to the equation
$$-v_t + Kv(t,x) + F_K(t,x,Dv,D^2v) = 0,$$
where $F_K$ is defined by
$$F_K(t,x,p,X) = -\sup_{a\in A}\left\{\frac{1}{2}\,\mathrm{Trace}\big(\Sigma(x,a,t)\Sigma(x,a,t)^TX\big) + \langle p,b(x,a,t)\rangle + e^{-K(T-t)}f(x,a,t)\right\}.$$
We note that this equation is of the form (5.5) and that it satisfies inequality (4.2) with the same constant $C_0$. Therefore the result of Step I applies, and taking $K\ge 2C_0$ we conclude that $u_K\le v_K$, and therefore $u\le v$. $\square$

As in the infinite horizon case we arrive at the following characterization of the value function.

Theorem 5.2 Suppose that Assumptions 2.1 and 3.1 hold and that $h$, $q$, $f$ are continuous in $t\in[0,T]$, uniformly in $a\in A$. Then the value function $V$ for the problem (3.3) is the unique viscosity solution of the HJB equation (5.1) in the class of functions with linear growth in $x$, uniformly in $t$.

Proof. The linear growth condition is the second inequality in (3.6). Uniqueness follows from the previous result. The fact that $V$ is a viscosity solution is a standard result: see e.g. [18] or [10]. $\square$

6 Dynamic programming equation: verification theorems

In general, verification theorems state that if the HJB equation admits a classical solution, and some additional conditions are satisfied, then the solution coincides with the value function and an optimal control admits a feedback form. Looking at the HJB equations, one may note that the second order part degenerates when $x$ approaches the boundary of $D$. Therefore we will present results where a classical solution is assumed to exist only in the interior of $D$, denoted
$$\mathring D = \{x = (x_i)\in\mathbb R^N : x_i > 0,\ i=1,\ldots,N\}.$$
The strict positivity result in Proposition 3.1-1 will repeatedly play a role. In this section we assume that Assumptions 2.1 and 3.1 are satisfied, and we still denote by $V$ the value function of the separated problem. We present two results, for the parabolic and the elliptic case respectively.

Theorem 6.1 Suppose that $v\in C^{1,2}([0,T]\times\mathring D)$ satisfies equation (5.1) on $[0,T]\times\mathring D$ and the terminal condition (5.2) on $\mathring D$, and has polynomial growth in $x$ uniformly in $t$. Then $v\ge V$.

Also assume that, for every $(t,x)\in[0,T]\times\mathring D$, the supremum in the equation is achieved at a point $a=\hat a(t,x)\in A$ for a measurable function $\hat a:[0,T]\times\mathring D\to A$. Assume finally that, for every $t\in[0,T]$ and $x=(x_i)\in\mathring D$, the closed-loop equation
$$d\hat\rho^i_s = \sum_j \hat\rho^j_s\, q(\hat a(s,\hat\rho_s),s,j,i)\,ds + \hat\rho^i_s\, h(i,\hat a(s,\hat\rho_s),s)\,dW_s, \qquad s\in[t,T],\ i\in S, \qquad \hat\rho^i_t = x_i, \tag{6.1}$$
has an $\mathbb F^W$-adapted continuous solution $\hat\rho$. Then the control process in feedback form
$$\hat\alpha_s = \hat a(s,\hat\rho_s), \qquad s\in[t,T],$$
is optimal and $v$ coincides with the value function $V$. In particular, a solution to (6.1) exists if, for every $i,j\in S$, the functions
$$x\mapsto q(\hat a(s,x),s,j,i), \qquad x\mapsto h(i,\hat a(s,x),s), \tag{6.2}$$
are locally Lipschitz on $\mathring D$, uniformly in $s$.

Proof. The argument is classical, but we sketch a proof in order to show that the behavior of the solution near the boundary of $D$ is irrelevant. We introduce the controlled Kolmogorov operator
$$\mathcal L^a v(t,x) = \frac{1}{2}\sum_{ij}\partial^2_{ij}v(t,x)\,x_ix_j\sum_{k=1}^d h_k(i,a,t)h_k(j,a,t) + \sum_{ij}\partial_i v(t,x)\,x_j\,q(a,t,j,i),$$
and we write the HJB equation (5.1) in the form
$$v_t(t,x) + \sup_{a\in A}\left\{\mathcal L^a v(t,x) + \sum_i x_i f(i,a,t)\right\} = 0, \qquad v(T,x) = \sum_i x_i\,g(i).$$
Let us fix $t\in[0,T]$ and $x\in\mathring D$.
Given an arbitrary control process $\alpha\in\mathcal A$, let us denote by $\rho$ the corresponding solution to the equation in (3.3). For every integer $k>0$ define the stopping times
$$T_k = \inf\{s\ge t : |\rho_s|>k \text{ or } \mathrm{dist}(\rho_s,\partial D)<1/k\}, \tag{6.3}$$
where $\mathrm{dist}(\cdot,\partial D)$ denotes the distance from the boundary $\partial D$ of $D$. By the Itô formula we have
$$v(T\wedge T_k,\rho_{T\wedge T_k}) - v(t,x) = \int_t^{T\wedge T_k}\big[v_t(s,\rho_s) + \mathcal L^{\alpha_s}v(s,\rho_s)\big]\,ds + \int_t^{T\wedge T_k}\sum_i \partial_i v(s,\rho_s)\,\rho^i_s\,h(i,\alpha_s,s)\,dW_s.$$
By the choice of $T_k$, the processes $\{\partial_i v(s,\rho_s) : s\in[t,T\wedge T_k]\}$ are bounded, and the function $h$ is also assumed to be bounded. Therefore, upon taking expectations, the stochastic integral disappears. Summing and subtracting terms, after rearrangement we obtain
$$v(t,x) = \mathbb E\int_t^{T\wedge T_k}\Big\{-v_t(s,\rho_s) - \mathcal L^{\alpha_s}v(s,\rho_s) - \sum_i \rho^i_s f(i,\alpha_s,s)\Big\}\,ds + \mathbb E\big[v(T\wedge T_k,\rho_{T\wedge T_k})\big] + \mathbb E\int_t^{T\wedge T_k}\sum_i \rho^i_s f(i,\alpha_s,s)\,ds.$$
By the strict positivity result in Proposition 3.1-1, the trajectories of $\rho$ never leave $\mathring D$, a.s. It follows that a.s. we have $T\wedge T_k = T$ for $k$ large. Letting $k\to\infty$ in the last displayed formula, by dominated convergence the last two terms tend to
$$\mathbb E\big[v(T,\rho_T)\big] + \mathbb E\int_t^T\sum_i \rho^i_s f(i,\alpha_s,s)\,ds = J_T(t,x,\alpha).$$
By the HJB equation the term in curly brackets is nonnegative, and by monotone convergence we obtain
$$v(t,x) = \mathbb E\int_t^T\Big\{-v_t(s,\rho_s) - \mathcal L^{\alpha_s}v(s,\rho_s) - \sum_i \rho^i_s f(i,\alpha_s,s)\Big\}\,ds + J_T(t,x,\alpha).$$
Since the term in curly brackets is nonnegative, it follows that $v(t,x)\ge J_T(t,x,\alpha)$ for every $\alpha\in\mathcal A$, and therefore $v(t,x)\ge V(t,x)$. When the control $\hat\alpha$ is chosen, the term in curly brackets vanishes and it follows that $v(t,x)=J_T(t,x,\hat\alpha)$, which shows the optimality of $\hat\alpha$ and the equality $v(t,x)=V(t,x)$.
To prove the final statement of the theorem, we note that the closed-loop equation (6.1) has coefficients with linear growth in $x$ which, when (6.2) holds, are also locally Lipschitz; therefore a unique solution exists up to the stopping times $T_k\wedge T$. As noted above, by Proposition 3.1-1, a.s. we have $T\wedge T_k = T$ for $k$ large, so that the solution exists on the whole interval $[0,T]$. $\square$

Theorem 6.2 Suppose that the coefficients $q$, $h_k$, $f$ do not depend on time and that $v\in C^2(\mathring D)$ satisfies equation (4.1) on $\mathring D$. Suppose that for every controlled trajectory $\rho$ starting at $x\in\mathring D$ we have
$$\lim_{T\to\infty}e^{-\beta T}\,\mathbb E[v(\rho_T)] = 0.$$
Then $v\ge V$.

Also assume that, for every $x\in\mathring D$, the supremum in the equation is achieved at a point $a=\hat a(x)\in A$ for a measurable function $\hat a:\mathring D\to A$. Assume finally that, for every $x=(x_i)\in\mathring D$, the closed-loop equation
$$d\hat\rho^i_s = \sum_j \hat\rho^j_s\,q(\hat a(\hat\rho_s),j,i)\,ds + \hat\rho^i_s\,h(i,\hat a(\hat\rho_s))\,dW_s, \qquad s\ge 0,\ i\in S, \qquad \hat\rho^i_0 = x_i, \tag{6.4}$$
has an $\mathbb F^W$-adapted continuous solution $\hat\rho$. Then the control process
$$\hat\alpha_s = \hat a(\hat\rho_s), \qquad s\ge 0,$$
is optimal and $v$ coincides with the value function $V$. In particular, a solution to (6.4) exists if, for every $i,j\in S$, the functions
$$x\mapsto q(\hat a(x),j,i), \qquad x\mapsto h(i,\hat a(x)), \tag{6.5}$$
are locally Lipschitz on $\mathring D$.

Proof. We only sketch the arguments, which are similar to the previous ones. Let $\rho$ denote the trajectory corresponding to an arbitrary control $\alpha$ and starting point $x\in\mathring D$. By Proposition 3.1-1, $\rho$ never hits the boundary of $D$, a.s. Applying the Itô formula to $e^{-\beta s}v(\rho_s)$ on $[0,T\wedge T_k]$, taking expectations and letting $k\to\infty$ and $T\to\infty$, we obtain
$$v(x) = \mathbb E\int_0^\infty e^{-\beta s}\Big\{\beta v(\rho_s) - \mathcal L^{\alpha_s}v(\rho_s) - \sum_i \rho^i_s f(i,\alpha_s)\Big\}\,ds + J_\infty(x,\alpha).$$
As before, the term in curly brackets is nonnegative, and it is zero when $\alpha=\hat\alpha$. The conclusion follows. $\square$
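As a numerical sanity check on the discounted functional $J_\infty$ in (3.4) and the bound (3.7), one can estimate the truncated reward by Monte Carlo for a frozen control. The sketch below is ours, not from the paper: with $f\equiv 1$ and $\beta=1$, the total mass $\sum_i\rho^i_t$ is a mean-one martingale (the rows of the generator sum to zero and the diffusion term has zero mean), so the truncated reward has expectation $1-e^{-\beta T}\le \sup|f|/\beta = 1$; all names and numerical values are illustrative.

```python
import numpy as np

def mc_discounted_reward(x0, Q, h, f, beta, T=8.0, n_steps=2000,
                         n_paths=400, seed=1):
    """Monte Carlo estimate of the truncated discounted reward
        J = E int_0^T exp(-beta t) <rho_t, f> dt
    for the unnormalized filter rho under a frozen control (the control
    dependence is hidden inside Q, h, f).  Vectorized Euler-Maruyama
    over n_paths independent scalar Brownian drivers."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    rho = np.tile(np.asarray(x0, float), (n_paths, 1))    # (n_paths, N)
    J = np.zeros(n_paths)
    for n in range(n_steps):
        J += np.exp(-beta * n * dt) * (rho @ f) * dt      # left-point rule
        dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, 1))
        rho = rho + (rho @ Q) * dt + rho * h * dW
    return J.mean()

Q = np.array([[-1.0,  1.0],
              [ 2.0, -2.0]])    # generator: rows sum to zero
h = np.array([0.5, -0.5])       # observation drift h(i)
f = np.ones(2)                  # constant running reward, sup|f| = 1
est = mc_discounted_reward(np.array([0.5, 0.5]), Q, h, f, beta=1.0)
```

For this toy data the estimate should sit near $1-e^{-8}\approx 1$, consistently with the upper bound $\sup|f|/\beta$ in (3.7).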
Example 6.1. Consider the case when $A \subset \mathbb{R}$ is an interval $[0, R]$ for some $R > 0$. Take
$$ q(a, t, i, j) = a, \qquad h(i, a, t) = h(i), \qquad f(i, a, t) = -\frac{a^2}{2} $$
for $a \in [0, R]$, $t \in [0, T]$, $i, j \in S$. Thus, we are considering a control problem for a Markov chain $X$ with controlled transition rates that can take any value in $[0, R]$, with reward functional and observation process given by
$$ J(\alpha) = \bar{\mathbb{E}} \Big[ -\frac{1}{2} \int_0^T \alpha_t^2 \, dt + g(X_T) \Big], \qquad \int_0^t h(X_s)\, ds + B_t, $$
for arbitrary $g : S \to \mathbb{R}$. Setting $\gamma_{ij} = \sum_{k=1}^d h_k(i) h_k(j)$, the HJB equation (5.1) becomes
$$ v_t(t,x) + \frac{1}{2} \sum_{ij} \partial^2_{ij} v(t,x)\, x^i x^j \gamma_{ij} + \sup_{a \in [0,R]} \Big\{ a \sum_{ij} \partial_i v(t,x)\, x^j - \frac{a^2}{2} \sum_i x^i \Big\} = 0, $$
with the boundary condition $v(T,x) = \sum_i x^i g(i)$. Setting
$$ \hat{a}(p) := \arg\max_{a \in [0,R]} \Big( p\,a - \frac{a^2}{2} \Big) = p^+ \wedge R, \qquad p \in \mathbb{R}, $$
the equation becomes
$$ v_t(t,x) + \frac{1}{2} \sum_{ij} \partial^2_{ij} v(t,x)\, x^i x^j \gamma_{ij} + \Big( \hat{a}(p)\, p - \frac{\hat{a}(p)^2}{2} \Big) \sum_i x^i = 0, \qquad p := \frac{\sum_{ij} \partial_i v(t,x)\, x^j}{\sum_i x^i}. $$
Assume that a solution $v \in C^{1,2}([0,T] \times \mathring{D})$ exists, with polynomial growth in $x$ uniformly in $t$. Then Theorem 6.1 applies and we conclude that, if the closed-loop equation
$$ d\hat{\rho}^i_s = \hat{a}\Big( \frac{\sum_{\ell j} \partial_\ell v(s, \hat{\rho}_s)\, \hat{\rho}^j_s}{\sum_\ell \hat{\rho}^\ell_s} \Big) \sum_j \hat{\rho}^j_s \, ds + \hat{\rho}^i_s h(i)\, dW_s, \quad s \in [t,T], \ i \in S, \qquad \hat{\rho}^i_t = x^i, $$
has an $\mathbb{F}^W$-adapted continuous solution $\hat{\rho}$, then the control process in feedback form
$$ \hat{\alpha}_s = \hat{a}\Big( \frac{\sum_{\ell j} \partial_\ell v(s, \hat{\rho}_s)\, \hat{\rho}^j_s}{\sum_\ell \hat{\rho}^\ell_s} \Big), \qquad s \in [t,T], $$
is optimal and $v$ coincides with the value function.

Remark 6.1. Define $\gamma_{ij}(a) = \sum_{k=1}^d h_k(i,a) h_k(j,a)$ and write equation (4.1) in the form
$$ \beta v(x) - \sup_{a \in A} \Big\{ \frac{1}{2} \sum_{ij} \partial^2_{ij} v(x)\, x^i x^j \gamma_{ij}(a) + \sum_{ij} \partial_i v(x)\, x^j q(a, j, i) + \sum_i x^i f(i, a) \Big\} = 0. \tag{6.6} $$
Assume in addition that the functions $h(i, \cdot) : A \to \mathbb{R}^d$, $q(\cdot, i, j) : A \to [0, \infty)$ and $f(i, \cdot) : A \to \mathbb{R}$ are continuous for every $i, j \in S$, and that the following ellipticity condition holds: there exists $\kappa > 0$ such that
$$ \sum_{i,j} \gamma_{ij}(a)\, \xi_i \xi_j \ge \kappa |\xi|^2, \qquad \xi \in \mathbb{R}^N, \ a \in A. $$
Note that this may happen only provided $N \le d$. Then one may prove that the solution $v$ is in fact of class $C^2(\mathring{D})$ with Hölder continuous second derivatives and satisfies the equation in the classical sense. This follows from a result in [19], established for bounded smooth domains in $\mathbb{R}^n$, which therefore applies to any smooth domain compactly contained in $\mathring{D}$. (In that reference the supremum is taken over a countable family; the extension to our setting is a direct consequence of the continuity of the coefficients with respect to $a$ and the fact that the control action space $A$ is Polish.) The same result can be achieved as in [12] by a logarithmic change of variables $y_i = \log x^i$, introducing the auxiliary unknown function
$$ w(y_1, \ldots, y_N) = v(e^{y_1}, \ldots, e^{y_N}), \qquad y = (y_1, \ldots, y_N) \in \mathbb{R}^N, $$
which is defined on the whole of $\mathbb{R}^N$.

Similarly, in the parabolic case, equation (5.1) can be written as
$$ -v_t(t,x) - \sup_{a \in A} \Big\{ \frac{1}{2} \sum_{ij} \partial^2_{ij} v(t,x)\, x^i x^j \gamma_{ij}(a,t) + \sum_{ij} \partial_i v(t,x)\, x^j q(a,t,j,i) + \sum_i x^i f(i,a,t) \Big\} = 0, \tag{6.7} $$
where $\gamma_{ij}(a,t) = \sum_{k=1}^d h_k(i,a,t) h_k(j,a,t)$. Assume that the functions $h(i, \cdot, t) : A \to \mathbb{R}^d$, $q(\cdot, t, i, j) : A \to [0, \infty)$ and $f(i, \cdot, t) : A \to \mathbb{R}$ are continuous for every $i, j \in S$, $t \in [0,T]$, and that the following ellipticity condition holds: there exists $\kappa > 0$ such that
$$ \sum_{i,j} \gamma_{ij}(a,t)\, \xi_i \xi_j \ge \kappa |\xi|^2, \qquad \xi \in \mathbb{R}^N, \ a \in A, \ t \in [0,T]. $$
One may prove again that the solution $v$ is of class $C^{1,2}([0,T] \times \mathring{D})$ with Hölder continuous derivatives. This follows from [21], Theorem 1.1; see also [5] (comments before Theorem 9.1).
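The ellipticity condition of Remark 6.1 is easy to inspect numerically for a fixed action $a$: the matrix $\gamma(a)$ is the Gram matrix $H H^{\mathsf{T}}$ of the $N \times d$ array $H_{ik} = h_k(i,a)$, so its smallest eigenvalue is the best constant $\kappa$, and its rank is at most $d$, which is exactly why $N \le d$ is necessary. A small sketch with made-up Gaussian coefficients (purely illustrative, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 3, 4   # hypothetical sizes with N <= d, as the remark requires

# Hypothetical observation coefficients for one fixed action a:
# H[i, k] = h_k(i, a), so that gamma(a) = H @ H.T has entries
# gamma_ij(a) = sum_k h_k(i, a) h_k(j, a).
H = rng.normal(size=(N, d))
gamma = H @ H.T

# The smallest eigenvalue of gamma is the best constant kappa with
# xi^T gamma xi >= kappa |xi|^2 for all xi.
kappa = np.linalg.eigvalsh(gamma).min()

# If N > d the Gram matrix has rank at most d < N, hence it is singular
# and the ellipticity condition necessarily fails.
H_bad = rng.normal(size=(5, 2))
gamma_bad = H_bad @ H_bad.T
kappa_bad = np.linalg.eigvalsh(gamma_bad).min()
```

For a generic $N \times d$ array with $N \le d$ the computed `kappa` is strictly positive, while `kappa_bad` is zero up to rounding.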
7 Stochastic maximum principle

We devote this final section to formulating a stochastic maximum principle for the separated problem, as a necessary condition for optimality. Although the proof partially relies on known results, the formulation improves on existing results in the literature, especially because we remove the restrictions that the set of control actions $A$ be convex and that the coefficients be differentiable with respect to $a \in A$.

In this section we assume that Assumptions 2.1 and 3.1 hold, and for simplicity we only treat the finite horizon case starting at time 0, namely:
$$ d\rho^i_t = \sum_j \rho^j_t \, q(\alpha_t, t, j, i)\, dt + \rho^i_t \, h(i, \alpha_t, t)\, dW_t, \qquad \rho^i_0 = x^i, \quad t \in [0,T], \ i \in S, $$
$$ J(\alpha) = \mathbb{E} \Big[ \int_0^T \sum_i \rho^i_t f(i, \alpha_t, t)\, dt + \sum_i \rho^i_T g(i) \Big]. $$
It is convenient to write this in vector form. Let us recall the definition of the matrix $Q^a$ and define the $N$-dimensional vectors $\rho = (\rho^i)_i$, $h_k(a,t) = (h_k(i,a,t))_i$, $f(a,t) = (f(i,a,t))_i$, $g = (g(i))_i$, for $a \in A$, $t \ge 0$, $k = 1, \ldots, d$. We also denote the componentwise multiplication of vectors by
$$ x * y = (x(i)\, y(i))_i, \qquad \text{for } x = (x(i))_i, \ y = (y(i))_i \in \mathbb{R}^N. $$
With this notation we write
$$ d\rho_t = (Q^{\alpha_t}_t)^{\mathsf{T}} \rho_t \, dt + \sum_{k=1}^d \rho_t * h_k(\alpha_t, t)\, dW^k_t, \qquad J(\alpha) = \mathbb{E} \Big[ \int_0^T \langle \rho_t, f(\alpha_t, t) \rangle\, dt + \langle \rho_T, g \rangle \Big], $$
where $\langle \cdot, \cdot \rangle$ stands for the scalar product in $\mathbb{R}^N$. For an arbitrary admissible control $(\alpha_t)$ we consider the adjoint BSDE
$$ -dp_t = -\sum_{k=1}^d q^k_t \, dW^k_t + \Big( Q^{\alpha_t}_t p_t + \sum_{k=1}^d h_k(\alpha_t, t) * q^k_t + f(\alpha_t, t) \Big)\, dt, \qquad p_T = g. \tag{7.1} $$
In our specific case the controlled trajectory $(\rho_t)$ does not occur in the BSDE. The solution is understood in the usual way: the process $p$ is continuous and adapted, the processes $q^1, \ldots, q^d$ are progressive, and
$$ \mathbb{E} \Big[ \sup_{t \in [0,T]} |p_t|^2 + \sum_{k=1}^d \int_0^T |q^k_t|^2 \, dt \Big] < \infty. $$
Within this class there exists a solution $(p, q^k)$; the process $p$ is unique up to indistinguishability and the processes $q^k$ up to equality $d\mathbb{P} \otimes dt$-a.s. This follows from standard results on BSDEs and our boundedness assumptions on $q(a,t,i,j)$, $h_k(i,a,t)$, $f(i,a,t)$.

Theorem 7.1. Suppose that the coefficients $q$, $h_k$, $f$ are continuous functions of $a \in A$, for fixed $t, i, j$. Assume that $(\alpha_t)$ is an optimal control. Let $(\rho_t)$ be the corresponding trajectory and $(p_t, q^k_t)$ the solution to the adjoint BSDE. Defining the Hamiltonian
$$ H(t, \rho, a, p, q^1, \ldots, q^d) = \langle f(a,t), \rho \rangle + \langle Q^a_t p, \rho \rangle + \sum_{k=1}^d \langle q^k, h_k(a,t) * \rho \rangle, \qquad a \in A; \ \rho, p, q^k \in \mathbb{R}^N, $$
we have, $d\mathbb{P} \otimes dt$-a.s.,
$$ H(t, \rho_t, \alpha_t, p_t, q^1_t, \ldots, q^d_t) = \max_{a \in A} H(t, \rho_t, a, p_t, q^1_t, \ldots, q^d_t). $$

Proof. As explained before, this result is essentially an application of the general stochastic maximum principle in [17] (see also [23] for a careful exposition). Our sketch of proof is simply intended to give the reader exact indications for all the details and to warn about the minor changes required in our case.

Take an arbitrary admissible control $(\bar{\alpha}_t)$. For any $\epsilon \in (0,T]$ and any Borel set $I_\epsilon \subset [0,T]$ with Lebesgue measure $|I_\epsilon| = \epsilon$, define the spike variation control by setting
$$ \alpha^\epsilon_t = \begin{cases} \bar{\alpha}_t, & t \in I_\epsilon, \\ \alpha_t, & t \in [0,T] \setminus I_\epsilon. \end{cases} $$
Since $\alpha$ is optimal we have $J(\alpha^\epsilon) \le J(\alpha)$. Proceeding as in [17] one arrives at
$$ 0 \ge J(\alpha^\epsilon) - J(\alpha) = \mathbb{E} \Big[ \int_0^T \big( H(t, \rho_t, \alpha^\epsilon_t, p_t, q^1_t, \ldots, q^d_t) - H(t, \rho_t, \alpha_t, p_t, q^1_t, \ldots, q^d_t) \big)\, dt \Big] + o(\epsilon). \tag{7.2} $$
This follows immediately from formula (4.59) in Section 5.4 of [23], where the reader may find a detailed proof. In fact, since $H$ and the terminal reward $\langle g, \rho \rangle$ are linear functions of $\rho$, their second derivatives with respect to $\rho$ vanish and formula (4.59) in [23] reduces to (7.2).
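Since the Hamiltonian $H$ is a cheap finite-dimensional expression, the maximum condition of Theorem 7.1 can be checked pointwise over a grid of actions once $(\rho_t, p_t, q^k_t)$ are known at a given $t$. A sketch with hypothetical two-state coefficients (the `Q`, `h_k`, `f` below are illustrative choices echoing Example 6.1, not the paper's general data):

```python
import numpy as np

N, d = 2, 1
actions = np.linspace(0.0, 1.0, 201)   # grid over a hypothetical A = [0, 1]

def Q(a):
    # hypothetical controlled rate matrix, as in the q(a, j, i) = a example
    return np.array([[-a, a], [a, -a]])

def h_k(a, k):
    # hypothetical observation vector, constant in a
    return np.array([1.0, -1.0])

def f(a):
    # hypothetical running reward f(i, a) = -a^2 / 2
    return np.array([-a**2 / 2, -a**2 / 2])

def hamiltonian(rho, a, p, q_list):
    """H = <f(a), rho> + <Q^a p, rho> + sum_k <q^k, h_k(a) * rho>."""
    val = f(a) @ rho + (Q(a) @ p) @ rho
    val += sum(q_list[k] @ (h_k(a, k) * rho) for k in range(d))
    return val

# Frozen values of the state and adjoint processes at one time instant.
rho = np.array([0.6, 0.4])
p = np.array([1.0, 0.2])
q_list = [np.array([0.1, -0.1])]

values = [hamiltonian(rho, a, p, q_list) for a in actions]
a_star = actions[int(np.argmax(values))]
```

For these particular frozen values $H(a)$ is a concave quadratic decreasing on $[0,1]$, so the grid maximizer sits at the left endpoint.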
We describe in some detail how the conclusion follows from (7.2), a point which is often neglected in several papers. We follow the elegant approach of [20], which is based on the following result (compare Lemma 2.2 of [20]) and avoids using the Lebesgue differentiation theorem.

Lemma 7.1. Let $\ell : [0,T] \to \mathbb{R}$ be Borel measurable and satisfy $\|\ell\|_{L^1} := \int_0^T |\ell(t)|\, dt < \infty$. Then for any $\epsilon \in (0,T]$ there exists a Borel set $I_\epsilon \subset [0,T]$ with $|I_\epsilon| = \epsilon$ and such that
$$ \Big| \frac{\epsilon}{T} \int_0^T \ell(t)\, dt - \int_{I_\epsilon} \ell(t)\, dt \Big| \le \epsilon^2. $$

Proof. We present a self-contained and simplified proof of a more general result that can be found in Theorem 2 of [14]. Take a finite-valued function $\bar{\ell}$ such that $\|\ell - \bar{\ell}\|_{L^1} \le \epsilon^2/2$. Write $\bar{\ell}$ in the form $\sum_{i=1}^n \ell_i \mathbf{1}_{E_i}$ for $\ell_i \in \mathbb{R}$ and a finite partition $\{E_i\}$ of $[0,T]$ consisting of Borel sets. Since the Lebesgue measure is non-atomic, there exist Borel sets $E^i_\epsilon \subset E_i$ such that $|E^i_\epsilon| = \epsilon |E_i| / T$. Then, provided we set $I_\epsilon = \bigcup_{i=1}^n E^i_\epsilon$, we have
$$ \frac{\epsilon}{T} \int_0^T \bar{\ell}(t)\, dt = \sum_{i=1}^n \ell_i \, \epsilon |E_i| / T = \sum_{i=1}^n \ell_i |E^i_\epsilon| = \int_{I_\epsilon} \bar{\ell}(t)\, dt. $$
We have $|I_\epsilon| = \sum_{i=1}^n |E^i_\epsilon| = \sum_{i=1}^n \epsilon |E_i| / T = \epsilon$ and
$$ \Big| \frac{\epsilon}{T} \int_0^T \ell(t)\, dt - \int_{I_\epsilon} \ell(t)\, dt \Big| = \Big| \frac{\epsilon}{T} \int_0^T \ell(t)\, dt - \frac{\epsilon}{T} \int_0^T \bar{\ell}(t)\, dt + \int_{I_\epsilon} \bar{\ell}(t)\, dt - \int_{I_\epsilon} \ell(t)\, dt \Big| \le \frac{\epsilon}{T} \|\ell - \bar{\ell}\|_{L^1} + \|\ell - \bar{\ell}\|_{L^1} \le \epsilon^2. \quad \square $$

We can now conclude the proof of Theorem 7.1. Apply the previous lemma to the function
$$ \ell(t) = \mathbb{E} \big[ H(t, \rho_t, \bar{\alpha}_t, p_t, q^1_t, \ldots, q^d_t) - H(t, \rho_t, \alpha_t, p_t, q^1_t, \ldots, q^d_t) \big] $$
and choose the set $I_\epsilon$ accordingly. Then (7.2) yields $\int_{I_\epsilon} \ell(t)\, dt \le o(\epsilon)$. By the lemma we also have $\frac{\epsilon}{T} \int_0^T \ell(t)\, dt \le o(\epsilon)$ and we conclude that
$$ \int_0^T \ell(t)\, dt = \mathbb{E} \Big[ \int_0^T \big( H(t, \rho_t, \bar{\alpha}_t, p_t, q^1_t, \ldots, q^d_t) - H(t, \rho_t, \alpha_t, p_t, q^1_t, \ldots, q^d_t) \big)\, dt \Big] \le 0 \tag{7.3} $$
for an arbitrary admissible control $\bar{\alpha}$.
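The construction in the proof of Lemma 7.1 is fully explicit for step functions: inside each partition cell $E_i$ one selects a subset of proportional measure $\epsilon |E_i| / T$, and the identity $\frac{\epsilon}{T} \int_0^T \bar{\ell} = \int_{I_\epsilon} \bar{\ell}$ then holds exactly. A small sketch, using a hypothetical step function on $[0,1]$ and initial subintervals of the cells as the sets $E^i_\epsilon$:

```python
import numpy as np

T = 1.0
eps = 0.1

# A hypothetical step function ell_bar = sum_i ell_i 1_{E_i} on a
# partition of [0, T]; values and cell edges are made up for illustration.
edges = np.array([0.0, 0.3, 0.7, 1.0])   # cells E_1, E_2, E_3
vals = np.array([2.0, -1.0, 0.5])

# For each cell E_i pick E_i_eps as the initial subinterval of length
# eps * |E_i| / T, and set I_eps as the union of these subintervals.
I_eps = [(edges[i], edges[i] + eps * (edges[i + 1] - edges[i]) / T)
         for i in range(len(vals))]

lebesgue_I = sum(b - a for a, b in I_eps)
int_full = float(np.sum(vals * np.diff(edges)))                     # integral over [0, T]
int_I = float(np.sum(vals * np.array([b - a for a, b in I_eps])))   # integral over I_eps
```

For a step function the approximation error $\epsilon^2$ of the lemma disappears: `lebesgue_I` equals $\epsilon$ and `int_I` equals $\frac{\epsilon}{T}$ times `int_full` exactly, up to rounding.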
Given any $a \in A$, let
$$ B_a = \big\{ (\omega, t) \in \Omega \times [0,T] : H(t, \rho_t(\omega), a, p_t(\omega), q^1_t(\omega), \ldots, q^d_t(\omega)) > H(t, \rho_t(\omega), \alpha_t(\omega), p_t(\omega), q^1_t(\omega), \ldots, q^d_t(\omega)) \big\}. $$
Choosing
$$ \bar{\alpha}_t(\omega) = a \, \mathbf{1}_{B_a}(\omega, t) + \alpha_t(\omega)\, \mathbf{1}_{(\Omega \times [0,T]) \setminus B_a}(\omega, t) $$
in (7.3), it follows that $B_a$ is $d\mathbb{P} \otimes dt$-negligible. In other words, for every $a \in A$,
$$ H(t, \rho_t, \alpha_t, p_t, q^1_t, \ldots, q^d_t) \ge H(t, \rho_t, a, p_t, q^1_t, \ldots, q^d_t), \qquad d\mathbb{P} \otimes dt\text{-a.s.} $$
By choosing a countable dense set of $a$'s in $A$, and using the continuity of the coefficients with respect to $a$, we obtain the required conclusion. $\square$

Acknowledgements. The authors wish to thank Prof. Andrzej Święch for his help and suggestions on viscosity solutions to the dynamic programming equations considered in this paper.

References

[1] Alan Bain and Dan Crisan. Fundamentals of stochastic filtering, volume 60 of Stochastic Modelling and Applied Probability. Springer, New York, 2009.
[2] Alain Bensoussan. Stochastic control of partially observable systems. Cambridge University Press, Cambridge, 1992.
[3] Pierre Brémaud. Point process calculus in time and space: an introduction with applications, volume 98 of Probability Theory and Stochastic Modelling. Springer, Cham, 2020.
[4] Pierre Brémaud and Laurent Massoulié. Stability of nonlinear Hawkes processes. Ann. Probab., 24(3):1563-1588, 1996.
[5] M. G. Crandall, M. Kocan, and A. Święch. $L^p$-theory for fully nonlinear uniformly parabolic equations. Comm. Partial Differential Equations, 25(11-12):1997-2053, 2000.
[6] Michael G. Crandall and Hitoshi Ishii. The maximum principle for semicontinuous functions. Differential Integral Equations, 3(6):1001-1014, 1990.
[7] Michael G. Crandall, Hitoshi Ishii, and Pierre-Louis Lions. User's guide to viscosity solutions of second order partial differential equations. Bull. Amer. Math. Soc. (N.S.), 27(1):1-67, 1992.
[8] Robert J. Elliott, Lakhdar Aggoun, and John B. Moore. Hidden Markov models: estimation and control, volume 29 of Applications of Mathematics (New York). Springer-Verlag, New York, 1995.
[9] Giorgio Fabbri, Fausto Gozzi, and Andrzej Święch. Stochastic optimal control in infinite dimension: dynamic programming and HJB equations, volume 82 of Probability Theory and Stochastic Modelling. Springer, Cham, 2017. With a contribution by Marco Fuhrman and Gianmario Tessitore.
[10] Wendell H. Fleming and H. Mete Soner. Controlled Markov processes and viscosity solutions, volume 25 of Stochastic Modelling and Applied Probability. Springer, New York, second edition, 2006.
[11] Fausto Gozzi and Andrzej Święch. Hamilton-Jacobi-Bellman equations for the optimal control of the Duncan-Mortensen-Zakai equation. J. Funct. Anal., 172(2):466-510, 2000.
[12] Fausto Gozzi and Tiziano Vargiolu. Superreplication of European multiasset derivatives with bounded stochastic volatility. Math. Methods Oper. Res., 55(1):69-91, 2002.
[13] Jean Jacod. Multivariate point processes: predictable projection, Radon-Nikodým derivatives, representation of martingales. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 31:235-253, 1974/75.
[14] Xun Jing Li and Yung Long Yao. Maximum principle of distributed parameter systems with time lags. In Distributed parameter systems (Vorau, 1984), volume 75 of Lect. Notes Control Inf. Sci., pages 410-427. Springer, Berlin, 1985.
[15] P.-L. Lions. Viscosity solutions of fully nonlinear second order equations and optimal stochastic control in infinite dimensions. II. Optimal control of Zakai's equation. In Stochastic partial differential equations and applications, II (Trento, 1988), volume 1390 of Lecture Notes in Math., pages 147-170. Springer, Berlin, 1989.
[16] Makiko Nisio. Stochastic control theory: dynamic programming principle, volume 72 of Probability Theory and Stochastic Modelling. Springer, Tokyo, second edition, 2015.
[17] Shi Ge Peng. A general stochastic maximum principle for optimal control problems. SIAM J. Control Optim., 28(4):966-979, 1990.
[18] Huyên Pham. Continuous-time stochastic control and optimization with financial applications, volume 61 of Stochastic Modelling and Applied Probability. Springer-Verlag, Berlin, 2009.
[19] M. V. Safonov. Classical solution of second-order nonlinear elliptic equations. Izv. Akad. Nauk SSSR Ser. Mat., 52(6):1272-1287, 1328, 1988.
[20] Shan Jian Tang and Xun Jing Li. Necessary conditions for optimal control of stochastic systems with random jumps. SIAM J. Control Optim., 32(5):1447-1475, 1994.
[21] Lihe Wang. On the regularity theory of fully nonlinear parabolic equations. Bull. Amer. Math. Soc. (N.S.), 22(1):107-114, 1990.
[22] W. M. Wonham. Some applications of stochastic differential equations to optimal nonlinear filtering. J. SIAM Control Ser. A, 2:347-369, 1965.
[23] Jiongmin Yong and Xun Yu Zhou. Stochastic controls: Hamiltonian systems and HJB equations, volume 43 of Applications of Mathematics (New York). Springer-Verlag, New York, 1999.