Partially observed controlled Markov chains and optimal control of the Wonham filter
Authors: Fulvia Confortola, Marco Fuhrman
Abstract

We consider a class of optimal control problems, with finite or infinite horizon, for a continuous-time Markov chain with finite state space, where the control process affects the transition rates. We suppose that the controlled process cannot be observed, and that at any time the control actions are chosen based on the observation of a related stochastic process perturbed by an exogenous Brownian motion. We describe a construction of the controlled Markov chain, having stochastic transition rates adapted to the observation filtration. By a change of probability measure of Girsanov type, we introduce the so-called separated optimal control problem, where the state is the conditional (unnormalized) distribution of the controlled Markov chain and the observation process becomes a driving Brownian motion, and we prove the equivalence with the original control problem. The controlled equations for the separated problem are an instance of the Wonham filtering equations. Next we present an analysis of the separated problem: we characterize the value function as the unique viscosity solution to the dynamic programming equations (both in the parabolic and the elliptic case), we prove verification theorems, and we prove a version of the stochastic maximum principle in the form of a necessary condition for optimality.

MSC Classification: 60H30; 60J27; 93E11; 93E20; 49L25.

Key words: optimal control with partial observation; controlled hidden Markov models; Wonham filter; Bellman's equation; viscosity solutions; stochastic maximum principle.

1 Introduction

This paper is devoted to the study of optimal control problems for controlled Markov chains with partial observation.
Except for some initial general constructions, we will consider controlled Markov processes $(X^\alpha_t)_{t\ge 0}$ which are time-continuous and take values in a finite state space $S$. The controlled process depends on a control process $(\alpha_t)$, with values in a general action space $A$, which is chosen in order to maximize a reward functional of the form
$$ J(\alpha) = \bar{\mathbb E}\left[\int_0^T f(X^\alpha_t,\alpha_t)\,dt + g(X^\alpha_T)\right], \qquad\text{or}\qquad J(\alpha) = \bar{\mathbb E}\left[\int_0^\infty e^{-\beta t} f(X^\alpha_t,\alpha_t)\,dt\right], $$
for the finite and infinite horizon cases respectively, where $f$, $g$ are given real functions and $\beta>0$ is a discount factor (below we also consider some slightly more general reward functionals). Here $\bar{\mathbb E}$ denotes the expectation with respect to some probability $\bar{\mathbb P}$, called the "physical" probability to distinguish it from the reference probability $\mathbb P$ introduced below.

* Dipartimento di Matematica, Politecnico di Milano, fulvia.confortola at polimi.it. This author is a member of INdAM-GNAMPA.
† Dipartimento di Matematica, Università degli Studi di Milano, marco.fuhrman at unimi.it. This author is a member of INdAM-GNAMPA.

We consider the case of partial observation, namely when the state is not directly observable and the choice of the control $\alpha_t$ at any time $t$ is based on the observation of the past values of another related process, denoted $(W_t)_{t\ge 0}$. In the literature the related terminology Hidden (or Latent) Markov Model is also used. Thus, the control process $(\alpha_t)$ will be required to be $(\mathcal F^W_t)$-predictable, where $\mathcal F^W_t$ is the $\sigma$-algebra generated by $(W_s)_{s\le t}$. In our model we assume that the observation process $W$ takes values in $\mathbb R^d$ and has the form
$$ W_t = \int_0^t h(X^\alpha_s,\alpha_s)\,ds + B_t \tag{1.1} $$
where $h: S\times A\to\mathbb R^d$ is a given function and $(B_t)_{t\ge 0}$ is a Brownian motion in $\mathbb R^d$. Among many possible variations, this model - a controlled Markov chain with observation corrupted by Brownian noise - is often deemed to be of basic importance.
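To fix ideas, the observation model (1.1) can be simulated on a time grid. The following sketch is purely illustrative and not part of the paper: the two-state chain path, the constant control and the sensor function $h$ are hypothetical choices, and the integral in (1.1) is approximated by a left-endpoint Riemann sum.

```python
import numpy as np

rng = np.random.default_rng(0)

T, n = 1.0, 1000                     # horizon and number of grid steps
dt = T / n
t = np.linspace(0.0, T, n + 1)

# A fixed piecewise-constant path of a two-state chain (illustrative, not controlled here)
X = np.where(t < 0.5, 0, 1)          # state 0 on [0, 0.5), state 1 afterwards
alpha = np.zeros(n + 1)              # constant control, for illustration

def h(x, a):
    """Hypothetical scalar observation drift h(x, a), d = 1."""
    return 1.0 if x == 0 else -1.0

drift = np.array([h(x, a) for x, a in zip(X, alpha)])
# Brownian motion B and observation W_t = int_0^t h(X_s, alpha_s) ds + B_t
B = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))])
W = np.concatenate([[0.0], np.cumsum(drift[:-1] * dt)]) + B
```

Under the physical probability, the controller only sees the path of `W`, from which the drift (and hence the hidden state) must be inferred.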
The main route to the solution of the optimal control problem - the one that we also adopt in this paper - consists in reducing it to a different problem with complete observation (sometimes called the separated problem), where the controlled state process is given by the so-called filter process, whose value at time $t$ is the conditional distribution of the unobserved process $X^\alpha_t$ given $\mathcal F^W_t$. For our model, in the uncontrolled case, explicit recursive equations for the filter were obtained in [22] and their solution is called the Wonham filter. There is a huge literature on partially observed control problems and we refer the reader to the monographs [2], [16] and [8], which include expositions of the required technical prerequisites and contain extensive references. The books [2] and [16] mainly consider the case when the controlled process is defined as the solution to a controlled stochastic differential equation in Euclidean space driven by a Brownian motion. The treatise [8] presents a large number of hidden Markov models with many variations with respect to our case, for instance discrete-time problems, continuous state spaces, different observation models and so on. In the sequel we will also refer to [1] and [3], dealing with technical aspects of stochastic filtering theory and optimal control of marked point processes. The analysis of our model is of course made easier by the assumption that the state space $S$ is finite, but it turns out that a direct application of general existing theories does not yield satisfactory results, as it requires unnecessary assumptions or does not give sharp conclusions.
It is the purpose of this paper to present a rather complete analysis of the model sketched above, with various methodologies (stochastic maximum principle and dynamic programming, including analysis of the Hamilton-Jacobi-Bellman equation), encompassing the finite and infinite horizon cases and with a careful formulation of the optimization problem. Except for some natural boundedness or continuity assumptions on the coefficients (the functions $f$, $g$, $h$ introduced above, as well as the controlled transition rates presented below) we try to be as general as possible. In order to explain our contributions more carefully we have to enter into some technical details, while describing the plan of the paper at the same time.

The first issue concerns the construction of a controlled Markov chain. In this case the transition rate from state $i\in S$ to state $j\ne i$, denoted $q(a,i,j)$, depends on the choice of the control parameter $a\in A$. Given the functions $q(a,i,j)$ and an $\mathbb F^W$-predictable control process $(\alpha_t)$, the aim is to construct a process $(X^\alpha_t)$ admitting stochastic transition rates $q(\alpha_t,i,j)$. The precise meaning of this, according to most of the literature, is that the random measure $q(\alpha_t,X^\alpha_{t-},j)\,dt$ is the compensator of the process $N_t(j)$ which counts the number of jumps of $(X^\alpha_t)$ to the state $j$ in the time interval $[0,t]$, namely
$$ N_t(j) - \int_0^t q(\alpha_s,X^\alpha_{s-},j)\,ds $$
is a martingale with respect to the filtration generated by $(W_t)$ and by the controlled process itself. When there is no observation process and the only filtration is the natural one, the existence of the controlled process may be deduced from a general result on a martingale problem for marked point processes: see [13]. In this case the controlled process is defined in a weak sense, as a law on a canonical space.
In the general case with observation, when the state space $S$ is finite, one can write down stochastic differential equations for a pure jump process identifying $S$ with a finite subset of $\mathbb R^N$: see [8], chapter 12. In the present paper we resort to a different construction, which is inspired by the Grigelionis theorem (see e.g. [3], section 5.7). It admits several variants: see for instance Section 3 of [4] for related results and references. We construct the controlled process in strong formulation, starting from an auxiliary Poisson process on an extended space and then taking an appropriate projection (depending on the control process) on $(0,\infty)\times S$ of the corresponding random measure. This direct construction for a controlled pure jump process has the advantage that it can be extended to a general state space $S$. Section 2 is devoted to the exposition of this result in its general form.

In the following sections we apply the previous construction and we formulate the optimal control problem. In order to introduce the separated control problem for the Wonham filter one needs to perform a change of probability of Girsanov type: given the martingale
$$ (Z^\alpha_t)^{-1} = \exp\left( -\int_0^t h(X^\alpha_s,\alpha_s)\,dB_s - \frac12\int_0^t |h(X^\alpha_s,\alpha_s)|^2\,ds \right), \qquad t\ge 0, $$
(this involved notation is consistent with the following sections) one defines the so-called reference probability $\mathbb P$ by setting $\mathbb P(d\omega) = (Z^\alpha_T(\omega))^{-1}\,\bar{\mathbb P}(d\omega)$, and the filter process of the unnormalized conditional laws
$$ \rho^i_t = \mathbb E\,[\,1_{X^\alpha_t=i}\, Z^\alpha_t \mid \mathcal F^W_t\,], \qquad t\ge 0,\ i\in S. $$
By the Girsanov theorem the observation $W$ is a Brownian motion on $[0,T]$ under $\mathbb P$, and it is well known (see e.g. [1]) that the processes $(\rho^i_t)$ solve the Zakai filtering equations, which are called the Wonham filtering equations in this particular situation:
$$ d\rho^i_t = \sum_{j\in S} \rho^j_t\, q(\alpha_t,j,i)\,dt + \rho^i_t\, h(i,\alpha_t)\,dW_t, \qquad i\in S. \tag{1.2} $$
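As a purely illustrative sketch (not part of the paper's analysis), the system (1.2) can be discretized by the Euler–Maruyama scheme. The two-state rate matrix and observation function below are hypothetical, and the control is frozen at a single value so that the coefficients are constant; $d=1$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical two-state data for a fixed control value a:
# Q[j, i] = q(a, j, i) (rows sum to zero), h[i] = h(i, a).
Q = np.array([[-1.0, 1.0],
              [2.0, -2.0]])
h = np.array([1.0, -1.0])

T, n = 1.0, 2000
dt = T / n
rho = np.array([0.5, 0.5])           # initial unnormalized conditional law
path = [rho.copy()]
for _ in range(n):
    dW = rng.normal(0.0, np.sqrt(dt))
    # Euler step for (1.2): drift_i = sum_j rho_j q(a, j, i), diffusion_i = rho_i h(i, a)
    rho = rho + (rho @ Q) * dt + rho * h * dW
    path.append(rho.copy())
path = np.array(path)
```

Note that the total mass $\sum_i\rho^i_t$ is not conserved along a path: it is only a martingale under the reference probability, consistently with $\rho$ being an unnormalized law.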
The reward functional takes the form
$$ J(\alpha) = \mathbb E\left[\int_0^T \sum_{i\in S}\rho^i_t f(i,\alpha_t)\,dt + \sum_{i\in S}\rho^i_T g(i)\right], \quad\text{or}\quad J(\alpha) = \mathbb E\left[\int_0^\infty e^{-\beta t}\sum_{i\in S}\rho^i_t f(i,\alpha_t)\,dt\right]. \tag{1.3} $$
This way we obtain the separated control problem, where the new state equation is now (1.2), driven by the observation Brownian motion $(W_t)$, so that the new control problem is fully observed. As is customary (see e.g. [2]), it is more convenient to formulate the entire setting under the reference probability $\mathbb P$ from the beginning and to perform the inverse Girsanov transformation to construct the physical probability $\bar{\mathbb P}$: this way one obtains a weak formulation of the original control problem under $\bar{\mathbb P}$. Section 3 is devoted to the presentation of this standard material, and it also contains some preliminary properties of the corresponding value function.

In the following sections we address the optimal control problem for the state equation (1.2) and the reward (1.3). The controlled state $\rho_t=(\rho^i_t)_{i\in S}$ evolves in the state space
$$ D = \{ x=(x^1,\dots,x^N)\in\mathbb R^N : x^i\ge 0,\ i=1,\dots,N \}. $$
We first consider the dynamic programming approach. We introduce the value functions $v(t,x)$ or $v(x)$ for the finite and infinite horizon cases, where $x\in D$ denotes the starting state. The value functions are related to the Hamilton-Jacobi-Bellman (HJB) equations, which are, respectively, parabolic and elliptic equations on $D$. For instance, in the elliptic case, for a function $v(x)=v(x^1,\dots,x^N)$ this is:
$$ \beta v(x) - \sup_{a\in A}\left\{ \frac12 \sum_{ij} \partial^2_{ij} v(x)\, x^i x^j \sum_{k=1}^d h_k(i,a)h_k(j,a) + \sum_{ij} \partial_i v(x)\, x^j q(a,j,i) + \sum_i x^i f(i,a) \right\} = 0. \tag{1.4} $$
In the general case this equation is fully nonlinear and it is not uniformly elliptic, so the convenient notion of solution is that of viscosity solution, see e.g. [7].
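For comparison with (1.4), the parabolic equation of the finite horizon case can be sketched with the same Hamiltonian; this is only our reading of the dynamic programming equation associated with (1.2)-(1.3), the precise statement appearing in the body of the paper:

```latex
\partial_t v(t,x) + \sup_{a\in A}\Big\{ \frac12 \sum_{ij}\partial^2_{ij} v(t,x)\, x^i x^j \sum_{k=1}^d h_k(i,a) h_k(j,a)
  + \sum_{ij}\partial_i v(t,x)\, x^j q(a,j,i) + \sum_i x^i f(i,a) \Big\} = 0,
\qquad v(T,x) = \sum_i x^i g(i),
```

with the discount term $\beta v(x)$ of (1.4) replaced by the time derivative and a terminal condition.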
While the proof that the value function is a viscosity solution follows from standard results, uniqueness of solutions is more delicate and is usually proved via comparison results between sub- and super-solutions to the equation. While there exist very sophisticated versions of this kind of result for more general cases, for instance even when $D$ is replaced by a Hilbert space (see [15], [11] or [9]), we are not aware of any result which can be applied to (1.4), or to its parabolic version, under our assumptions. Therefore we present two comparison theorems in Sections 4 and 5, thus establishing uniqueness of the viscosity solution and concluding that the value functions are completely characterized analytically as solutions to suitable PDEs. In Section 6 we prove two verification theorems, for the finite and infinite time horizon, showing that if a classical solution to the dynamic programming equation exists then, under some additional conditions, it coincides with the value function and it is possible to construct an optimal control in feedback form for the separated problem. In our context of a controlled finite Markov chain it may happen that the HJB equation is uniformly elliptic and analytical results on the existence of smooth solutions apply: see Remark 6.1.

Section 7 is devoted to the approach to the control problem (1.2)-(1.3) by means of the stochastic maximum principle. This is a basic tool in stochastic optimization and as such it has been applied to partially observed optimal control problems as well. The reader may find an exposition and further references in [2] or [8]. We formulate a stochastic maximum principle for the separated problem as a necessary condition for optimality related to our optimization problem. Although the proof relies on classical arguments, the final statement improves existing results in the literature.
Indeed, the maximum principle for the controlled Zakai equation is usually formulated under the assumptions that the set of control actions $A$ is convex and the coefficients are differentiable with respect to $a\in A$. These restrictions, for completely observable control problems, have been removed by Peng [17], and we follow the same approach here. In spite of the greater generality, the final formulation does not require the second adjoint equation introduced in [17], since simplifications occur due to the linearity of the separated control problem with respect to the state variable. In any case, the restrictions mentioned above can be avoided and in particular the coefficients are only assumed to be continuous with respect to $a\in A$: see Theorem 7.1 below.

In conclusion, our contribution consists in: the construction of a controlled Markov chain with stochastic transition rates adapted to a general given filtration (in particular, a Brownian filtration); the formulation of a separated optimal control problem for the Wonham filter and the proof of its equivalence with the original one; a complete and largely self-contained analysis of the separated problem, both for the finite and infinite horizon case, including a characterization of the value function as the unique viscosity solution to the dynamic programming equations, a verification theorem, and an instance of the stochastic maximum principle in the form of a necessary condition for optimality.

2 A construction of a point process with random compensator

In this section we suppose that $S$ is a Polish space with a Borel probability measure $\mu$. We assume we are given a nonnegative function $q(\omega,t,x,y)$ with suitable properties (in particular, boundedness) and we show how to construct an $S$-valued pure jump process $(X^q_t)$ such that the corresponding random measure admits compensator $q(t,X^q_{t-},y)\,\mu(dy)\,dt$. We refer e.g.
to [3] for prerequisites on random measures and point processes. In the following sections this construction will be applied to define a controlled Markov chain in $S$. Our setting is summarized in the following hypotheses.

Assumption 2.1 Assume that on a probability space $(\Omega,\mathcal F,\mathbb P)$ the following independent random elements are defined:
1. a Poisson process $(T_n)_{n\ge 1}$ on $(0,\infty)$ with intensity $K>0$; we set $T_0=0$;
2. an independent sequence $(X_n)_{n\ge 1}$ of random variables, taking values in a Polish space $S$, each with the same law $\mu$;
3. an $S$-valued random variable $X_0$;
4. an independent sequence $(U_n)_{n\ge 1}$ of random variables, each uniformly distributed on $(0,1)$.

We define a random measure $\bar N(dt,dy,du)$ on $(0,\infty)\times S\times(0,1)$ by the formula
$$ \bar N(dt,dy,du) = \sum_{n\ge 1} \delta_{(T_n,X_n,U_n)}(dt,dy,du) $$
and we denote by $\mathbb F^{\bar N}=(\mathcal F^{\bar N}_t)_{t\ge 0}$ the filtration generated by $\bar N$ and $X_0$. We note that $\bar N$ is a marked Poisson process, with independent marks $(X_n,U_n)$ taking values in $S\times(0,1)$. Therefore the $\mathbb F^{\bar N}$-compensator of $\bar N$ is
$$ \bar\nu(dt,dy,du) = K\,dt\,\mu(dy)\,du. $$
Now suppose that we are given a filtration $\mathbb F^1=(\mathcal F^1_t)_{t\ge 0}$ in $(\Omega,\mathcal F)$, with $\mathcal F^1_\infty$ independent of the above random processes and variables. Denote by $\mathbb F^{\bar N,1}=(\mathcal F^{\bar N,1}_t)_{t\ge 0}$ the filtration defined by $\mathcal F^{\bar N,1}_t = \mathcal F^{\bar N}_t \vee \mathcal F^1_t$. Since $\mathcal F^1_\infty$ is independent of $\bar N$, it is easily verified that $\bar\nu$ is also the compensator of $\bar N$ with respect to $\mathbb F^{\bar N,1}$. Also suppose that we are given a function $q:\Omega\times[0,\infty)\times S\times S\to\mathbb R$ satisfying, $\mathbb P$-a.s.,
$$ 0\le q(\omega,t,x,y)\le C_q, \qquad t\ge 0,\ x,y\in S, \tag{2.1} $$
for some constant $C_q>0$. We assume that $q$ is $\mathcal P(\mathbb F^1)\otimes\mathcal B(S)\otimes\mathcal B(S)$-measurable, where $\mathcal P(\mathbb F^1)$ denotes the predictable $\sigma$-algebra in $\Omega\times[0,\infty)$ for the filtration $\mathbb F^1$ and $\mathcal B(S)$ the Borel $\sigma$-algebra in $S$. Finally we assume that the constant in Assumption 2.1-1 satisfies $K\ge C_q$.
Define inductively $\nu_0=0$ and, for $k\ge 0$,
$$ \nu_{k+1} = \inf\{ n>\nu_k : U_n < q(T_n,X_{\nu_k},X_n)/K \}, $$
with the convention $\inf\emptyset=\infty$. We take an element $\delta\notin S$ and we add it to $S$ as an isolated point. We set $T_{\nu_n}=\infty$ and $X_{\nu_n}=\delta$ if $\nu_n=\infty$, and we consider the marked point process $(T_{\nu_n},X_{\nu_n})_{n\ge 1}$. We also introduce the corresponding $S\cup\{\delta\}$-valued piecewise-constant process $(X^q_t)_{t\ge 0}$ (starting from $X_0$ at time $0$) and the associated random measure $N(dt,dy)$ on $(0,\infty)\times S$: for $n\ge 0$,
$$ X^q_t = X_{\nu_n}, \quad T_{\nu_n}\le t<T_{\nu_{n+1}}; \qquad N(dt,dy) = \sum_{n\ge 1} \delta_{(T_{\nu_n},X_{\nu_n})}(dt,dy)\, 1_{T_{\nu_n}<\infty}. $$
We denote by $\mathbb F^N=(\mathcal F^N_t)_{t\ge 0}$ the filtration generated by $N$ and $X_0$, and by $\mathbb F^{N,1}=(\mathcal F^{N,1}_t)_{t\ge 0}$ the filtration defined by $\mathcal F^{N,1}_t=\mathcal F^N_t\vee\mathcal F^1_t$. We note that in fact $\nu_k$, $N(dt,dy)$ and $\mathbb F^{N,1}$ also depend on $q$, but we omit indicating this dependence.

Lemma 2.1 The process $X^q$ is càdlàg and $\mathbb F^{\bar N,1}$-adapted.

Proof. Since
$$ X^q_t = \sum_{n\ge 0} X_{\nu_n}\, 1_{[T_{\nu_n},T_{\nu_{n+1}})}(t)\, 1_{\nu_n<\infty} \tag{2.2} $$
we see that $X^q$ is clearly càdlàg. Adaptedness is intuitive, since at any time $t$ all its present and past values and jump times can be recovered by observing $T_n$ and $q(T_n,i,j)$ up to time $t$, as well as the corresponding $X_n$, $U_n$. Now we proceed to a formal proof.

Step I: for each $k,n\ge 0$, $\{\nu_k=n\}\in\mathcal F^{\bar N,1}_{T_n}$, i.e., $\nu_k$ is a stopping time for the filtration $(\mathcal F^{\bar N,1}_{T_n})_{n\ge 0}$. Since $T_n$ is a stopping time for $\mathbb F^{\bar N}$, it is also a stopping time for $\mathbb F^{\bar N,1}$. Since $(q(t,i,j))_t$ is predictable for $\mathbb F^1$, it is also predictable - and hence progressively measurable - for $\mathbb F^{\bar N,1}$. It follows that $q(T_n,i,j)$ is $\mathcal F^{\bar N,1}_{T_n}$-measurable. The same holds for $(U_n,X_n)$ and hence for $q(T_n,i,X_n)$, by composition.
We define a discrete-time filtration $\mathbb H=(\mathcal H_n)_{n\ge 0}$ and, for every $i\in S$, a discrete-time process $(Y_n(i))_{n\ge 0}$ by the formulae
$$ Y_n(i) = (U_n,\ q(T_n,i,X_n)), \qquad \mathcal H_n = \mathcal F^{\bar N,1}_{T_n} $$
(here we set $U_0=0$). Then we have seen that $(Y_n(i))_n$ is $\mathbb H$-adapted. It takes values in the set $\{(u,q): 0<u<1,\ 0\le q<\infty\}$. Define $D=\{(u,q): u<q/K\}$. We can express $\nu_1$ as the first hitting time of $D$ by the process $(Y_n(X_0))_n$:
$$ \nu_1 = \inf\{n>0: U_n<q(T_n,X_0,X_n)/K\} = \inf\{n>0: Y_n(X_0)\in D\}. $$
Since $(Y_n(X_0))_n$ is $\mathbb H$-adapted we conclude that $\nu_1$ is a stopping time for $\mathbb H$. Since $(X_n)_n$ is $\mathbb H$-adapted, the process $(X_{n\wedge\nu_1})_n$ is also $\mathbb H$-adapted. Similarly, we can express $\nu_1$ and $\nu_2$ as the first and second hitting times of $D$ by the process $(Y_n(X_{n\wedge\nu_1}))_n$:
$$ \nu_1 = \inf\{n>0: U_n<q(T_n,X_{n\wedge\nu_1},X_n)/K\} = \inf\{n>0: Y_n(X_{n\wedge\nu_1})\in D\}, $$
$$ \nu_2 = \inf\{n>\nu_1: U_n<q(T_n,X_{\nu_1},X_n)/K\} = \inf\{n>\nu_1: Y_n(X_{n\wedge\nu_1})\in D\}. $$
Since $(Y_n(X_{n\wedge\nu_1}))_n$ is $\mathbb H$-adapted we conclude that $\nu_2$ is a stopping time for $\mathbb H$. Since $(X_n)_n$ is $\mathbb H$-adapted, the process $(X_{n\wedge\nu_2})_n$ is also $\mathbb H$-adapted. Iterating this argument we can show that all the random times $\nu_k$ are stopping times for $\mathbb H$, and Step I is proved.

Step II: for every $k\ge 0$, $T_{\nu_k}$ is a stopping time for the filtration $\mathbb F^{\bar N,1}$. This is trivial for $k=0$, so assume $k\ge 1$. We write
$$ \{T_{\nu_k}\le t\} = \bigcup_{n\ge 0} \{\nu_k=n,\ T_n\le t\} $$
and we recall that $T_n$ is a stopping time for $\mathbb F^{\bar N,1}$ and, by Step I, that $\{\nu_k=n\}\in\mathcal F^{\bar N,1}_{T_n}$. It follows that $\{\nu_k=n,\ T_n\le t\}\in\mathcal F^{\bar N,1}_t$ (by the very definition of $\mathcal F^{\bar N,1}_{T_n}$) and therefore also $\{T_{\nu_k}\le t\}\in\mathcal F^{\bar N,1}_t$.

Step III: for every $k\ge 0$, $X_{\nu_k}1_{\nu_k<\infty}$ is $\mathcal F^{\bar N,1}_{T_{\nu_k}}$-measurable. This is clear for $k=0$, since $\nu_0=0$, $T_0=0$ and $X_0$ is $\mathcal F^{\bar N}_0=\sigma(X_0)$-measurable. Next we assume $k\ge 1$.
For any $B\subset S$ and any $t\ge 0$ we have
$$ \{X_{\nu_k}1_{\nu_k<\infty}\in B,\ T_{\nu_k}\le t\} = \bigcup_{n\ge 1} \{\nu_k=n,\ X_n\in B,\ T_n\le t\}. $$
From Step I we have $\{\nu_k=n\}\in\mathcal F^{\bar N,1}_{T_n}$. Since $X_n$ is measurable with respect to $\mathcal F^{\bar N}_{T_n}\subset\mathcal F^{\bar N,1}_{T_n}$, it follows that $\{\nu_k=n,\ X_n\in B\}\in\mathcal F^{\bar N,1}_{T_n}$ and so $\{\nu_k=n,\ X_n\in B,\ T_n\le t\}\in\mathcal F^{\bar N,1}_t$ (by definition of $\mathcal F^{\bar N,1}_{T_n}$); finally we obtain $\{X_{\nu_k}1_{\nu_k<\infty}\in B,\ T_{\nu_k}\le t\}\in\mathcal F^{\bar N,1}_t$, which proves Step III.

Now adaptedness of $X^q$ follows from Steps II and III and the representation (2.2).

We are now ready for the main result of this section.

Theorem 2.1 Suppose that Assumption 2.1 holds and that $\mathbb F^1=(\mathcal F^1_t)_{t\ge 0}$ is a filtration in $(\Omega,\mathcal F)$, with $\mathcal F^1_\infty$ independent of the random elements in Assumption 2.1. With the previous notation, let $q:\Omega\times[0,\infty)\times S\times S\to\mathbb R$ be $\mathcal P(\mathbb F^1)\otimes\mathcal B(S)\otimes\mathcal B(S)$-measurable and satisfy (2.1). Let the constant in Assumption 2.1-1 be so large that $K\ge C_q$. Then the $\mathbb F^{N,1}$-compensator of $N(dt,dy)$ is
$$ \nu(dt,dy) = q(t,X^q_{t-},y)\,\mu(dy)\,dt. $$

Proof. First we note that $(X^q_{t-})$ is $\mathbb F^N$-predictable, so that by the measurability assumptions on $q$ the random measure $q(t,X^q_{t-},y)\,\mu(dy)\,dt$ is $\mathbb F^{N,1}$-predictable. Let $H(t,y)\ge 0$ be an $\mathbb F^{N,1}$-predictable process. We have
$$ \mathbb E\int_S\int_0^\infty H(t,y)\,N(dt,dy) = \mathbb E\sum_{k\ge 1} H(T_{\nu_k},X_{\nu_k})\,1_{T_{\nu_k}<\infty}. $$
For $n\ge 1$ and $k\ge 1$ such that $\nu_{k-1}<n<\nu_k$, the inequality $U_n\ge q(T_n,X_{\nu_{k-1}},X_n)/K$ holds. So we may rewrite the previous sum, adding several null terms, as follows:
$$ \sum_{k\ge 1} H(T_{\nu_k},X_{\nu_k})\,1_{T_{\nu_k}<\infty} = \sum_{k\ge 1}\left[\sum_{n=1+\nu_{k-1}}^{\nu_k} H(T_n,X_n)\,1_{U_n<q(T_n,X_{\nu_{k-1}},X_n)/K}\right] 1_{T_{\nu_k}<\infty} $$
(in each sum in square brackets only the last term may be non-zero). Next note that, for $k\ge 1$,
$$ X_{\nu_{k-1}} = X^q(t-) \ \text{ for } T_{\nu_{k-1}}<t\le T_{\nu_k} \quad\Longrightarrow\quad X_{\nu_{k-1}} = X^q(T_n-) \ \text{ for } \nu_{k-1}<n\le\nu_k. $$
So we obtain
$$ \sum_{k\ge 1} H(T_{\nu_k},X_{\nu_k})\,1_{T_{\nu_k}<\infty} = \sum_{k\ge 1}\sum_{n=1+\nu_{k-1}}^{\nu_k} H(T_n,X_n)\,1_{U_n<q(T_n,X^q(T_n-),X_n)/K}\,1_{T_{\nu_k}<\infty} = \sum_{n\ge 1} H(T_n,X_n)\,1_{U_n<q(T_n,X^q(T_n-),X_n)/K}. $$
This may be written as an integral with respect to the random measure $\bar N$, leading to
$$ \mathbb E\int_S\int_0^\infty H(t,y)\,N(dt,dy) = \mathbb E\int_S\int_0^\infty\int_0^1 H(t,y)\,1_{u<q(t,X^q(t-),y)/K}\,\bar N(dt,dy,du). $$
From Lemma 2.1 it follows that $(X^q(t-))$ is $\mathbb F^{\bar N,1}$-predictable, and so is the integrand on the right-hand side of the last displayed formula. Recalling the form of the compensator of $\bar N$ we obtain
$$ \mathbb E\int_S\int_0^\infty H(t,y)\,N(dt,dy) = \mathbb E\int_S\int_0^\infty\int_0^1 H(t,y)\,1_{u<q(t,X^q(t-),y)/K}\,du\,K\,dt\,\mu(dy). $$
Noting that $q(t,X^q(t-),y)/K\le C_q/K\le 1$, we can compute the integral in $du$ and we conclude that
$$ \mathbb E\int_S\int_0^\infty H(t,y)\,N(dt,dy) = \mathbb E\int_S\int_0^\infty H(t,y)\,q(t,X^q(t-),y)\,dt\,\mu(dy) = \mathbb E\int_S\int_0^\infty H(t,y)\,\nu(dt,dy). $$

3 The partially observed control problem and its reformulations

In this section we suppose that Assumption 2.1 holds. From now on we also assume that the state space $S$ is finite; we will use the letters $i,j$ to denote its elements. We need to introduce a space $A$ of control actions where the control process $(\alpha_t)$ takes values. We also need to introduce controlled transition rates $q(a,t,i,j)$, a function $h(i,a,t)$ to model the observation, and real functions $f(i,a,t)$ and $g(i)$ to define the reward to be maximized; they may depend on the control action $a\in A$. According to the usual approach (see e.g. [2]) we will initially set the control problem under the reference probability measure $\mathbb P$, so that in particular the observation $W$ will be a given Brownian motion under $\mathbb P$. This has the advantage that the corresponding filtration does not depend on the control process.
Here are the hypotheses we need, which will be valid in the rest of the paper (in addition to Assumption 2.1).

Assumption 3.1
1. $(W_t)_{t\ge 0}$ is a standard $d$-dimensional Brownian motion defined in $(\Omega,\mathcal F,\mathbb P)$; we denote by $\mathbb F^W=(\mathcal F^W_t)_{t\ge 0}$ its completed filtration.
2. $S$ is a finite set with cardinality $N$. $A$ is a Polish space. $T>0$ and $\beta>0$ are given constants.
3. For every $i,j\in S$ ($i\ne j$) we are given numbers $g(i)\in\mathbb R$ and functions $q(\cdot,\cdot,i,j):A\times[0,\infty)\to[0,\infty)$, $h(i,\cdot,\cdot):A\times[0,\infty)\to\mathbb R^d$, $f(i,\cdot,\cdot):A\times[0,\infty)\to\mathbb R$, which are Borel measurable, and there exists a constant $K_0$ such that
$$ |q(a,t,i,j)| + |h(i,a,t)| + |f(i,a,t)| + |g(i)| \le K_0, \qquad a\in A,\ t\ge 0,\ i,j\in S\ (i\ne j). \tag{3.1} $$
4. The constant in Assumption 2.1-1 is taken so large that
$$ N\cdot q(a,t,i,j)\le K, \qquad a\in A,\ t\ge 0,\ i,j\in S\ (i\ne j). $$
5. The random variables in Assumption 2.1-2 are uniformly distributed on $S$.

We complete the definition of the rate matrix by setting, as usual,
$$ q(a,t,i,i) = -\sum_{j\ne i} q(a,t,i,j). $$
We finally define the set of admissible controls of the partial observation problem as
$$ \mathcal A = \{\alpha:\Omega\times[0,\infty)\to A,\ \mathbb F^W\text{-predictable}\}. $$

3.1 The partially observed control problem for the reference probability

For every $\alpha\in\mathcal A$ we next define a corresponding controlled $S$-valued process using the construction of the previous section. Instead of a general filtration $\mathbb F^1$ we now take the filtration $\mathbb F^W$. Then we consider the $\mathbb F^W$-predictable processes $(N\cdot q(\alpha_t,t,i,j))_t$ and we construct the corresponding process $X^q$ as in the previous section, which will now be called $X^\alpha$. Explicitly, we define $\nu_0=0$ and, for $k\ge 0$,
$$ \nu_{k+1} = \inf\{n>\nu_k: U_n < N\cdot q(\alpha_{T_n},T_n,X_{\nu_k},X_n)/K\}, $$
with the convention $\inf\emptyset=\infty$.
We take an element $\delta\notin S$, we set $T_{\nu_n}=\infty$ and $X_{\nu_n}=\delta$ if $\nu_n=\infty$, and we consider the marked point process $(T_{\nu_n},X_{\nu_n})_{n\ge 1}$. The corresponding $S\cup\{\delta\}$-valued process $(X^\alpha_t)_{t\ge 0}$ (starting from $X_0$ at time $0$), defined by
$$ X^\alpha_t = X_{\nu_n}, \qquad\text{for } T_{\nu_n}\le t<T_{\nu_{n+1}},\ n\ge 0, $$
is the controlled process corresponding to $\alpha\in\mathcal A$. On the finite state space $S$, measures $\mu(dy)$ are identified with their masses $\mu(j)$ at any point $j\in S$. For instance, the uniform distribution of the variables $X_n$ is $\mu(j)=1/N$ (which accounts for the factor $N$ in some of the previous formulae). Correspondingly, the random measure on $(0,\infty)\times S$ associated to $(T_{\nu_n},X_{\nu_n})_{n\ge 1}$ is now denoted
$$ N^\alpha(dt,j) = \sum_{n\ge 1}\delta_{(T_{\nu_n},X_{\nu_n})}(dt,j)\,1_{T_{\nu_n}<\infty}. $$
We denote by $\mathbb F^{N^\alpha,W}$ the filtration generated by $N^\alpha$, $X_0$ and $W$. By Theorem 2.1, the $\mathbb F^{N^\alpha,W}$-compensator of $N^\alpha(dt,j)$ is
$$ \nu^\alpha(dt,j) = q(\alpha_t,t,X^\alpha_{t-},j)\,dt. $$
This formula justifies the interpretation of $X^\alpha$ as a Markov chain with "stochastic transition rates" given by $q(\alpha_t,t,i,j)$.

Having constructed the controlled processes $X^\alpha$, we can formulate the optimization problem by introducing the reward functional to be maximized. Let us define
$$ Z^\alpha_t = \exp\left(\int_0^t h(X^\alpha_s,\alpha_s,s)\,dW_s - \frac12\int_0^t |h(X^\alpha_s,\alpha_s,s)|^2\,ds\right), \qquad t\ge 0. $$
The optimal control problem for a finite horizon $T$ consists in maximizing the reward functional
$$ J_T(\alpha) = \mathbb E\left[\int_0^T Z^\alpha_t f(X^\alpha_t,\alpha_t,t)\,dt + Z^\alpha_T g(X^\alpha_T)\right] $$
over all $\alpha\in\mathcal A$. The infinite horizon optimal control problem consists in maximizing the discounted reward functional
$$ J_\infty(\alpha) = \mathbb E\left[\int_0^\infty e^{-\beta t} Z^\alpha_t f(X^\alpha_t,\alpha_t)\,dt\right] $$
with discount rate $\beta$. In the infinite horizon case the functions $q$, $h$ and $f$ are taken to be time-independent. The occurrence of the process $Z^\alpha$ is explained in the following reformulation.
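The thinning construction of $X^\alpha$ above lends itself to direct simulation. The following sketch, which is only an illustration and not taken from the paper, uses a hypothetical two-state example with a constant control (so that the rates reduce to a fixed rate matrix $q(i,j)$) and checks empirically that accepted $0\to 1$ jumps occur at rate $q(0,1)$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical time-homogeneous rates on S = {0, 1} (control suppressed); q(i, i) = 0
# here, so candidate self-jumps are never accepted (U_n > 0 almost surely).
N = 2
def q(i, j):
    return {(0, 1): 1.5, (1, 0): 0.5}.get((i, j), 0.0)

K = 4.0                                      # dominating intensity, K >= N * q
T_horizon = 2000.0

# Marked Poisson stream (T_n, X_n, U_n) as in Assumption 2.1
n_pts = rng.poisson(K * T_horizon)
T = np.sort(rng.uniform(0.0, T_horizon, n_pts))
X_marks = rng.integers(0, N, n_pts)          # marks uniformly distributed on S
U = rng.uniform(0.0, 1.0, n_pts)

# Thinning: accept candidate n iff U_n < N * q(X_current, X_n) / K
x = 0                                        # X_0
jumps_01 = 0
time_in_0 = 0.0
last_t = 0.0
for Tn, Xn, Un in zip(T, X_marks, U):
    if x == 0:
        time_in_0 += Tn - last_t
    last_t = Tn
    if Un < N * q(x, Xn) / K:
        if x == 0 and Xn == 1:
            jumps_01 += 1
        x = Xn

rate_01 = jumps_01 / time_in_0               # empirical rate of 0 -> 1 jumps, ~ 1.5
```

During any sojourn in state $0$, accepted $0\to 1$ candidates arrive at rate $K\cdot\mu(1)\cdot N q(0,1)/K = q(0,1)$, which is exactly the compensator identity of Theorem 2.1 in this simple case.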
3.2 The partially observed control problem for the physical probability

Here we show that the previous formulation corresponds to the original control problem outlined in the introduction, provided an appropriate weak-sense formulation is given. The first step will be to construct a physical probability under which the observation has the desired form (1.1).

Let us start with some preliminary remarks. We note that, since $h$ is bounded, the process $Z^\alpha$ introduced above is a continuous $\mathbb F^{N^\alpha,W}$-martingale. By the form of the compensator, the processes
$$ M^{j,\alpha}_t := N^\alpha((0,t],j) - \nu^\alpha((0,t],j), \qquad t\ge 0, $$
are also $\mathbb F^{N^\alpha,W}$-martingales. Since they are locally of integrable variation, they are purely discontinuous martingales, hence orthogonal to $Z^\alpha$, which means that the products $M^{j,\alpha}Z^\alpha$ are local martingales. Now let us define, for each $t\ge 0$, a consistent family of probabilities $\bar{\mathbb P}^\alpha_t$ on $\mathcal F^{N^\alpha,W}_t$ corresponding to the Doléans exponential $Z^\alpha$, namely
$$ d\bar{\mathbb P}^\alpha_t = Z^\alpha_t\,d\mathbb P\big|_{\mathcal F^{N^\alpha,W}_t}, $$
as well as the process $B^\alpha_t := W_t - \int_0^t h(X^\alpha_s,\alpha_s,s)\,ds$, $t\ge 0$. Then the following holds.

1. For every $T>0$, under $\bar{\mathbb P}^\alpha_T$ the process $B^\alpha$ is a Wiener process on $[0,T]$: this follows from the Girsanov theorem.
2. For every $T>0$, under $\bar{\mathbb P}^\alpha_T$ the random measure $N^\alpha(dt,j)$ has the same $\mathbb F^{N^\alpha,W}$-compensator $\nu^\alpha(dt,j)=q(\alpha_t,t,X^\alpha_{t-},j)\,dt$. Indeed, the processes $M^{j,\alpha}$ remain $\mathbb F^{N^\alpha,W}$-martingales under $\bar{\mathbb P}^\alpha_T$, because the products $M^{j,\alpha}Z^\alpha$ are $\mathbb F^{N^\alpha,W}$-local martingales under $\mathbb P$.

It is easy to check that the reward functionals can be written as
$$ J_T(\alpha) = \bar{\mathbb E}^\alpha_T\left[\int_0^T f(X^\alpha_t,\alpha_t,t)\,dt + g(X^\alpha_T)\right], $$
where $\bar{\mathbb E}^\alpha_T$ denotes expectation under $\bar{\mathbb P}^\alpha_T$ and, in the infinite horizon case,
$$ J_\infty(\alpha) = \lim_{T\to\infty}\bar{\mathbb E}^\alpha_T\left[\int_0^T e^{-\beta t} f(X^\alpha_t,\alpha_t)\,dt\right]. \tag{3.2} $$
This is the original optimal control problem outlined in the introduction: indeed, on each interval $[0,T]$, the controlled Markov chain $X^\alpha$ has $\mathbb F^{N^\alpha,W}$-compensator $\nu^\alpha(dt,j)=q(\alpha_t,t,X^\alpha_{t-},j)\,dt$ and the observation process has the form
$$ W_t = \int_0^t h(X^\alpha_s,\alpha_s,s)\,ds + B^\alpha_t, \qquad t\ge 0. $$
This optimization problem is in weak form, since the physical probabilities $\bar{\mathbb P}^\alpha_T$ and the observation noise $B^\alpha$ depend on $\alpha$.

Remark 3.1 Suppose that, in the finite horizon case, one wishes to maximize a functional of the form
$$ \bar{\mathbb E}^\alpha_T\left[\sum_j\int_0^T \ell(X^\alpha_{t-},j,\alpha_t,t)\,N^\alpha(dt,j)\right] $$
for some bounded Borel measurable real function $\ell(i,j,a,t)$. This is a running reward depending explicitly on the random measure $N^\alpha(dt,j)$. Since the previous integrand is $\mathbb F^{N^\alpha,W}$-predictable, this is the same as
$$ \bar{\mathbb E}^\alpha_T\left[\sum_j\int_0^T \ell(X^\alpha_t,j,\alpha_t,t)\,q(\alpha_t,t,X^\alpha_t,j)\,dt\right], $$
which has the form of the running reward considered before, setting $f(i,a,t)=\sum_j \ell(i,j,a,t)\,q(a,t,i,j)$. Similar considerations apply to the infinite horizon case.

3.3 The separated optimal control problem

Still assuming that Assumptions 2.1 and 3.1 hold true, we come back to the optimization problem formulated in subsection 3.1, which we rewrite in a different, equivalent form. It is convenient to introduce the generator $Q^a_t$ of the controlled, time-dependent Markov chain, which maps any function $\varphi:S\to\mathbb R$ to the function $Q^a_t\varphi:S\to\mathbb R$ given by
$$ Q^a_t\varphi(i) = \sum_j \varphi(j)\,q(a,t,i,j), \qquad i\in S. $$
Next we introduce the unnormalized conditional law by setting, for any $\varphi$,
$$ \rho_t(\varphi) = \mathbb E\,[\varphi(X^\alpha_t)\,Z^\alpha_t \mid \mathcal F^W_t]. $$
The conditional expectation is taken under the reference probability $\mathbb P$.
The process $(\rho_t(\phi))_t$ is understood as the optional projection of $(\phi(X^\alpha_t)Z^\alpha_t)$ with respect to the filtration $\mathbb F^W$, and the formula defines an optional process with values in the space of nonnegative finite measures over $S$; we refer to [1] for details. It is easy to show that the reward functionals can be written
$$J_T(\alpha) = \mathbb E\left[\int_0^T \rho_t(f(\cdot,\alpha_t,t))\,dt + \rho_T(g)\right] \quad\text{and}\quad J_\infty(\alpha) = \mathbb E\int_0^\infty e^{-\beta t}\rho_t(f(\cdot,\alpha_t))\,dt.$$
The motivation to introduce the process $\rho$ is the well-known fact that it is a solution to the Zakai filtering equation which, in the present case of a finite-state Markov chain, is also called the Wonham filter: for every $\phi:S\to\mathbb R$,
$$d\rho_t(\phi) = \rho_t(Q^{\alpha_t}_t\phi)\,dt + \rho_t\big(h(\cdot,\alpha_t,t)\phi(\cdot)\big)\,dW_t, \qquad \rho_0(\phi) = \mathbb E[\phi(X_0)].$$
We will soon see that for every admissible control and any initial condition $\rho_0$ there exists a unique solution. In this way we have obtained the so-called separated problem: the state equation is the controlled Wonham filter and the reward functionals depend on the control and the corresponding state trajectory.

3.4 Optimal control of the Wonham filter: setting and preliminary results

Here we introduce the appropriate formulation of our optimization problem, which will be the object of the analysis in all the following sections. We start from the separated problem and first note that the space of nonnegative finite measures over $S$ can be identified with
$$D = \{x = (x_1,\ldots,x_N)\in\mathbb R^N : x_i\ge 0,\ i=1,\ldots,N\}.$$
Let us define $\rho^i_t := \rho_t(1_{\{i\}})$, where $1_{\{i\}}:S\to\mathbb R$ is the indicator function of state $i$. Noting that for every $\phi$ we have $\rho_t(\phi)=\sum_i \rho^i_t\phi(i)$, easy computations show that the controlled Wonham filtering equations can be rewritten as a system of SDEs for the process $(\rho^1_t,\ldots,\rho^N_t)$.
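Taking $\phi = 1_{\{i\}}$ in the Wonham equation gives the componentwise dynamics $d\rho^i_t = \sum_j \rho^j_t\,q(\alpha_t,t,j,i)\,dt + \rho^i_t\,h(i,\alpha_t,t)\,dW_t$. The following is a minimal numerical sketch of this system via an Euler-Maruyama scheme, under assumptions of ours that are not in the paper: a two-state chain, a scalar observation ($d=1$), time-independent coefficients, and a control frozen inside $Q$ and $h$; the function name `simulate_wonham` and all numerical values are illustrative.

```python
import numpy as np

def simulate_wonham(x0, Q, h, T=1.0, n_steps=2000, seed=0):
    """Euler-Maruyama scheme for the componentwise Wonham system
        d rho^i_t = sum_j rho^j_t Q[j, i] dt + rho^i_t h[i] dW_t,
    with a scalar observation (d = 1) and the control frozen inside
    Q and h.  Returns the path of rho, an array of shape (n_steps+1, N)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    rho = np.empty((n_steps + 1, len(x0)))
    rho[0] = x0
    for n in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))
        # drift (Q^T rho) dt plus componentwise diffusion rho * h dW
        rho[n + 1] = rho[n] + (Q.T @ rho[n]) * dt + rho[n] * h * dW
    return rho

# Toy two-state generator (off-diagonal rates >= 0, rows sum to zero)
# and observation drift h(i); both correspond to one fixed control action.
Q = np.array([[-1.0,  1.0],
              [ 2.0, -2.0]])
h = np.array([0.5, -0.5])
path = simulate_wonham(np.array([0.5, 0.5]), Q, h)
```

With a strictly positive initial condition the multiplicative structure of the noise keeps every component strictly positive along this simulated path as well, in line with the positivity properties established below for the exact solution.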
Allowing a general starting time $t\in[0,T]$ and a general initial condition $x=(x_i)\in D$, the finite horizon problem is
$$d\rho^i_s = \sum_j \rho^j_s\, q(\alpha_s,s,j,i)\,ds + \rho^i_s\, h(i,\alpha_s,s)\,dW_s, \qquad s\in[t,T],\ i\in S,$$
$$\rho^i_t = x_i, \qquad x\in D,\ i\in S,$$
$$J_T(t,x,\alpha) = \mathbb E\left[\int_t^T \sum_i \rho^i_s f(i,\alpha_s,s)\,ds + \sum_i \rho^i_T\, g(i)\right], \tag{3.3}$$
and the infinite horizon problem starting at time $0$ is
$$d\rho^i_t = \sum_j \rho^j_t\, q(\alpha_t,j,i)\,dt + \rho^i_t\, h(i,\alpha_t)\,dW_t, \qquad t\ge 0,\ i\in S,$$
$$\rho^i_0 = x_i, \qquad x\in D,\ i\in S,$$
$$J_\infty(x,\alpha) = \mathbb E\left[\int_0^\infty e^{-\beta t}\sum_i \rho^i_t f(i,\alpha_t)\,dt\right]. \tag{3.4}$$
Clearly, the problem starting at time $t\ge 0$ also admits a rephrasing in the original formulation, even under the physical probability. Now we define the value functions for these problems:
$$V(t,x) = \sup_{\alpha\in\mathcal A} J_T(t,x,\alpha), \qquad V(x) = \sup_{\alpha\in\mathcal A} J_\infty(x,\alpha), \qquad t\in[0,T],\ x\in D. \tag{3.5}$$
In the following proposition we collect some preliminary properties of these optimization problems and of the corresponding value functions.

Proposition 3.1 Suppose that Assumptions 2.1 and 3.1 hold true, and that the coefficients do not depend on time in the infinite horizon case.
1. For the solution to (3.3) we have, $P$-a.s., $\rho_s\in D$ for every $s\in[t,T]$. If $x_i>0$ for every $i\in S$ then, $P$-a.s., $\rho^i_s>0$ for every $s\in[t,T]$ and $i\in S$. Similar results hold for the solution to (3.4).
2. For every $t\in[0,T]$ the function $x\mapsto V(t,x)$ is convex; moreover there exists a constant $C$ such that for every $t\in[0,T]$, $x,\bar x\in D$,
$$|V(t,x)-V(t,\bar x)|\le C|x-\bar x|, \qquad |V(t,x)|\le C(1+|x|). \tag{3.6}$$
3. The function $x\mapsto V(x)$ is convex, hence locally Lipschitz; moreover, for every $x\in D$,
$$|V(x)|\le \frac{1}{\beta}\sup|f|. \tag{3.7}$$

Proof. 1. We note that the equation is linear with respect to the state variable and has bounded (stochastic) coefficients.
So the classical Lipschitz continuity and linear growth conditions hold true and the equation has a unique continuous $\mathbb F^W$-adapted solution starting from any $x\in\mathbb R^N$. If $x=0\in D$ then $\rho=0$. If $x\in D$ and $x\neq 0$ then $cx$ is a probability measure on $S$ for $c=(\sum_i x_i)^{-1}$, so that $c\rho_t$ coincides with the unnormalized conditional distribution and is therefore a nonnegative measure on $S$; it follows that $\rho^i_t\ge 0$.

To prove the strict positivity result we write the state equations as a system of ordinary (deterministic) differential equations with stochastic coefficients: this is the so-called robust form of the Zakai equation. We set
$$\nu^i_t = \rho^i_t \exp\left(-\int_0^t h(i,\alpha_s)\,dW_s\right),$$
and we look for the equation satisfied by $\nu^i$. Computing the Itô differential $d\nu^i_t$, after some calculations we have
$$d\nu^i_t = \sum_j \nu^j_t\, q(\alpha_t,j,i)\exp\left(\int_0^t [h(j,\alpha_s)-h(i,\alpha_s)]\,dW_s\right)dt - \frac{1}{2}\nu^i_t\,|h(i,\alpha_t)|^2\,dt.$$
So we obtain the robust equation in the form $\frac{d}{dt}\nu^i_t = \sum_j a^{ij}_t\nu^j_t$, setting
$$a^{ii}_t = q(\alpha_t,i,i) - \frac{1}{2}|h(i,\alpha_t)|^2, \qquad a^{ij}_t = q(\alpha_t,j,i)\exp\left(\int_0^t [h(j,\alpha_s)-h(i,\alpha_s)]\,dW_s\right), \quad j\neq i.$$
We have
$$\frac{d}{dt}\nu^i_t = a^{ii}_t\nu^i_t + \sum_{j\neq i} a^{ij}_t\nu^j_t = a^{ii}_t\nu^i_t + g^i_t,$$
where $g^i_t := \sum_{j\neq i} a^{ij}_t\nu^j_t \ge 0$ by the nonnegativity result already proved. By variation of constants it follows that
$$\nu^i_t = \nu^i_0\exp\left(\int_0^t a^{ii}_s\,ds\right) + \int_0^t \exp\left(\int_s^t a^{ii}_r\,dr\right) g^i_s\,ds \ge \nu^i_0\exp\left(\int_0^t a^{ii}_s\,ds\right) > 0$$
for every $i\in S$, provided $\nu^i_0>0$ for every $i\in S$. The same clearly holds for $\rho^i$.

2. Let $\rho,\bar\rho$ denote the solutions starting from $x,\bar x$. By standard estimates we have
$$\mathbb E\sup_{s\in[t,T]}|\rho_s-\bar\rho_s|^2 \le C|x-\bar x|^2$$
for some constant $C$ (depending also on $T$). By the boundedness of $f$ and $g$ it follows easily that $|J_T(t,x,\alpha)-J_T(t,\bar x,\alpha)|^2 \le C'|x-\bar x|^2$ for some constant $C'$ independent of $\alpha$, and (3.6) follows immediately. We note that the state equation and the reward functional are linear.
It follows that $x\mapsto J_T(t,x,\alpha)$ is linear and $x\mapsto V(t,x)$ is convex, being a supremum of linear functions.

3. From formula (3.2) it follows that $|J_\infty(x,\alpha)| \le \liminf_{T\to\infty}\int_0^T e^{-\beta t}\sup|f|\,dt \le \sup|f|/\beta$, and the estimate on $V$ holds. Convexity follows from linearity as before. $\square$

Remark 3.2 By similar arguments it is also easy to show that $x\mapsto V(x)$ is globally Lipschitz provided $\beta>0$ is sufficiently large.

4 Dynamic programming equation for infinite horizon: viscosity theory

In this section we study the value function $V$ for the problem (3.4). We suppose that Assumptions 2.1 and 3.1 hold and that the coefficients do not depend on time. We will show that $V$ is the unique viscosity solution to the dynamic programming equation (also called the Hamilton-Jacobi-Bellman, or HJB, equation), which takes the following form: for $x\in D$,
$$\beta v(x) - \sup_{a\in A}\left\{\frac{1}{2}\sum_{ij}\partial^2_{ij}v(x)\,x_i x_j\sum_{k=1}^d h_k(i,a)h_k(j,a) + \sum_{ij}\partial_i v(x)\,x_j\, q(a,j,i) + \sum_i x_i f(i,a)\right\} = 0. \tag{4.1}$$
This will be written as follows: denoting by $Dv$ and $D^2v$ the gradient and the Hessian matrix of $v$, we have
$$\beta v + F(x,Dv,D^2v) = 0,$$
where, for $x\in D$, $p\in\mathbb R^N$ and $X\in\mathcal S(N)$ (the set of symmetric real $N\times N$ matrices),
$$F(x,p,X) = -\sup_{a\in A}\left\{\frac{1}{2}\,\mathrm{Trace}\big(\Sigma(x,a)\Sigma(x,a)^T X\big) + \langle p, b(x,a)\rangle + f(x,a)\right\},$$
and the $N\times d$ matrix $\Sigma(x,a)$, the vector $b(x,a)$ and the real function $f(x,a)$ are
$$\Sigma(x,a) = \big(x_i h_k(i,a)\big)_{ik}, \qquad b(x,a) = \Big(\sum_j x_j q(a,j,i)\Big)_i, \qquad f(x,a) = \sum_i x_i f(i,a),$$
for $i=1,\ldots,N$, $k=1,\ldots,d$. Recall that we assume (see (3.1))
$$|h_k(i,a)| + |q(a,i,j)| + |f(i,a)| \le K_0, \qquad a\in A;\ k=1,\ldots,d;\ i,j=1,\ldots,N,$$
for some constant $K_0$. Therefore $\Sigma(x,a)$, $b(x,a)$, $f(x,a)$ are Lipschitz continuous in $x$, uniformly in $a$, with a Lipschitz constant that only depends on $K_0$, $N$, $d$.
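When the action space is finite, or is discretized on a grid, the Hamiltonian $F$ can be evaluated directly from the definitions of $\Sigma$, $b$ and $f$ above. The sketch below does this for toy coefficients of our own choosing (the particular $q$, $h$, $f$ and the name `hamiltonian_F` are illustrative assumptions, not from the paper).

```python
import numpy as np

def hamiltonian_F(x, p, X, A_grid, q, h, f):
    """Evaluate F(x, p, X) = -sup_{a in A} { (1/2) Tr(Sigma Sigma^T X)
    + <p, b(x, a)> + f(x, a) } over a finite grid of control actions, with
    Sigma(x, a)_{ik} = x_i h_k(i, a),  b(x, a)_i = sum_j x_j q(a, j, i),
    f(x, a) = sum_i x_i f(i, a).  Here q, h, f are callables of a."""
    vals = []
    for a in A_grid:
        Sigma = x[:, None] * h(a)             # N x d matrix (x_i h_k(i, a))
        b = q(a).T @ x                        # b_i = sum_j x_j q(a, j, i)
        vals.append(0.5 * np.trace(Sigma @ Sigma.T @ X)
                    + p @ b + x @ f(a))
    return -max(vals)

# Toy data: N = 2 states, d = 1 observation, actions on a grid of [0, 1].
q = lambda a: np.array([[-a, a], [a, -a]])    # controlled generator q(a, i, j)
h = lambda a: np.array([[1.0], [-1.0]])       # rows h(i, a) in R^d
f = lambda a: np.array([-0.5 * a**2, -0.5 * a**2])
x = np.array([0.3, 0.7])
F_val = hamiltonian_F(x, np.zeros(2), np.zeros((2, 2)),
                      np.linspace(0.0, 1.0, 101), q, h, f)
```

With $p=0$ and $X=0$ the bracket reduces to $-a^2/2\,\sum_i x_i$, whose supremum over $[0,1]$ is $0$ at $a=0$, so `F_val` is $0$; this kind of spot check against a hand computation is a cheap way to validate a discretized Hamiltonian before feeding it to a numerical HJB solver.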
An easy computation shows that
$$|F(x,p,X) - F(x,q,Y)| \le C_0\big(|x|\,|p-q| + |x|^2|X-Y|\big) \tag{4.2}$$
for every $x\in D$, $p,q\in\mathbb R^N$, $X,Y\in\mathcal S(N)$, and for some constant $C_0$ depending only on $K_0$, $N$, $d$. We also note that $F$ is a continuous function of all its arguments.

Let us briefly recall the definition of viscosity sub-/supersolutions. We find it convenient to state it using sub- and superjets. The equivalence with the other definition based on the use of test functions is well known and can be found e.g. in [7], [10]. For $u:D\to\mathbb R$ and $x\in D$, the superjet $J^{2,+}u(x)$ is the set of pairs $(p,X)\in\mathbb R^N\times\mathcal S(N)$ such that
$$u(y) \le u(x) + \langle p, y-x\rangle + \frac{1}{2}\langle X(y-x), y-x\rangle + o(|y-x|^2) \quad\text{as } y\in D,\ y\to x.$$
The closure $\bar J^{2,+}u(x)$ consists of the pairs $(p,X)$ for which there exists a sequence $(x_n,p_n,X_n)\in D\times\mathbb R^N\times\mathcal S(N)$ such that $x_n\to x$, $p_n\to p$, $X_n\to X$, $u(x_n)\to u(x)$ and $(p_n,X_n)\in J^{2,+}u(x_n)$. We define subjets by setting $J^{2,-}u(x) = -J^{2,+}(-u)(x)$ and $\bar J^{2,-}u(x) = -\bar J^{2,+}(-u)(x)$.

We say that an upper semicontinuous function $u:D\to\mathbb R$ is a viscosity subsolution if for any $x\in D$
$$(p,X)\in\bar J^{2,+}u(x) \implies \beta u(x) + F(x,p,X) \le 0.$$
A lower semicontinuous function $v:D\to\mathbb R$ is called a viscosity supersolution if for any $x\in D$
$$(p,X)\in\bar J^{2,-}v(x) \implies \beta v(x) + F(x,p,X) \ge 0.$$
Finally, a viscosity solution is both a sub- and a supersolution.

The main result of this section is the following comparison result, which immediately implies uniqueness of the bounded viscosity solution to the HJB equation and allows us to prove the characterization result for the value function.

Theorem 4.1 Let $u$ be an upper semicontinuous subsolution bounded above, and $v$ a lower semicontinuous supersolution bounded below. Then $u\le v$.

Proof.
Define, for $x,y\in D$, $\alpha>0$, $\delta>0$,
$$\Phi(x,y) = u(x) - v(y) - \frac{\alpha}{2}|x-y|^2 - \delta\log(\gamma+|x|^2) - \delta\log(\gamma+|y|^2).$$
Here $\alpha$ will eventually tend to $\infty$, $\delta$ to $0$, and $\gamma>0$ will be fixed later, sufficiently large. By the boundedness assumption there exists a maximum point $(\hat x,\hat y)\in D\times D$. $\Phi$, $\hat x$, $\hat y$ depend on $\alpha,\delta,\gamma$, but we omit this dependence in the notation. By standard arguments (see e.g. [7], Lemma 3.1 or Proposition 3.7), for fixed $\delta,\gamma$ we have
$$\alpha|\hat x-\hat y|^2\to 0, \qquad |\hat x-\hat y|\to 0 \tag{4.3}$$
as $\alpha\to\infty$. For every $x\in D$ we have
$$\Phi(\hat x,\hat y) \ge \Phi(x,x) = u(x) - v(x) - 2\delta\log(\gamma+|x|^2),$$
so, letting
$$\theta_\delta = \sup_{x\in D}\big(u(x)-v(x)-2\delta\log(\gamma+|x|^2)\big),$$
we have $\Phi(\hat x,\hat y)\ge\theta_\delta$, which implies in particular
$$\theta_\delta + \delta\log(\gamma+|\hat x|^2) + \delta\log(\gamma+|\hat y|^2) \le u(\hat x) - v(\hat y). \tag{4.4}$$
We note that $\theta_\delta$ is decreasing in $\delta>0$. We claim that $\lim_{\delta\to 0}\theta_\delta\le 0$, so that for every $\delta$ sufficiently small we have $u(x)-v(x)-2\delta\log(\gamma+|x|^2)\le 0$ for every $x\in D$; letting $\delta\to 0$ we obtain the desired conclusion $u\le v$.

To prove the claim we will show that assuming $\lim_{\delta\to 0}\theta_\delta\in(0,\infty]$ leads to a contradiction. Let us define
$$g(x) = \log(\gamma+|x|^2), \qquad \tilde u(x) = u(x)-\delta g(x), \qquad \tilde v(y) = v(y)+\delta g(y).$$
Then $(\hat x,\hat y)$ is a maximum point of $\tilde u(x)-\tilde v(y)-\frac{\alpha}{2}|x-y|^2$. By the Crandall-Ishii lemma (see [6], or [7], Theorem 3.2 and the discussion that follows) there exist $X,Y\in\mathcal S(N)$ such that
$$\big(\alpha(\hat x-\hat y), X\big)\in\bar J^{2,+}\tilde u(\hat x), \qquad \big(\alpha(\hat x-\hat y), Y\big)\in\bar J^{2,-}\tilde v(\hat y), \qquad \begin{pmatrix} X & 0\\ 0 & -Y\end{pmatrix} \le 3\alpha\begin{pmatrix} I & -I\\ -I & I\end{pmatrix}. \tag{4.5}$$
Since $g$ is smooth, it follows that
$$\big(\alpha(\hat x-\hat y)+\delta Dg(\hat x),\; X+\delta D^2g(\hat x)\big)\in\bar J^{2,+}u(\hat x), \qquad \big(\alpha(\hat x-\hat y)-\delta Dg(\hat y),\; Y-\delta D^2g(\hat y)\big)\in\bar J^{2,-}v(\hat y),$$
and since $u$ is a subsolution and $v$ a supersolution,
$$\beta u(\hat x) + F\big(\hat x,\,\alpha(\hat x-\hat y)+\delta Dg(\hat x),\, X+\delta D^2g(\hat x)\big) \le 0, \qquad \beta v(\hat y) + F\big(\hat y,\,\alpha(\hat x-\hat y)-\delta Dg(\hat y),\, Y-\delta D^2g(\hat y)\big) \ge 0.$$
Subtracting these inequalities and recalling (4.4) we obtain
$$\beta\theta_\delta + \beta\delta\log(\gamma+|\hat x|^2) + \beta\delta\log(\gamma+|\hat y|^2) \le F\big(\hat y,\,\alpha(\hat x-\hat y)-\delta Dg(\hat y),\, Y-\delta D^2g(\hat y)\big) - F\big(\hat x,\,\alpha(\hat x-\hat y)+\delta Dg(\hat x),\, X+\delta D^2g(\hat x)\big).$$
Using (4.2), the right-hand side can be estimated by
$$F\big(\hat y,\alpha(\hat x-\hat y),Y\big) - F\big(\hat x,\alpha(\hat x-\hat y),X\big) + \delta C_0\big(|\hat y|\,|Dg(\hat y)|+|\hat y|^2|D^2g(\hat y)|\big) + \delta C_0\big(|\hat x|\,|Dg(\hat x)|+|\hat x|^2|D^2g(\hat x)|\big).$$
By explicit computation,
$$Dg(x) = \frac{2x}{\gamma+|x|^2}, \qquad D^2g(x) = \frac{2I}{\gamma+|x|^2} - \frac{4\,x\otimes x}{(\gamma+|x|^2)^2},$$
so that $C_0(|x|\,|Dg(x)|+|x|^2|D^2g(x)|)\le C_1$, where $C_1$ is another constant depending only on $K_0$, $N$, $d$ (and not on $\gamma>0$). It follows that
$$\beta\theta_\delta + \beta\delta\log(\gamma+|\hat x|^2) + \beta\delta\log(\gamma+|\hat y|^2) \le F\big(\hat y,\alpha(\hat x-\hat y),Y\big) - F\big(\hat x,\alpha(\hat x-\hat y),X\big) + 2\delta C_1.$$
Choosing $\gamma>0$ so large that $\beta\log\gamma\ge C_1$, we arrive at
$$\beta\theta_\delta \le F\big(\hat y,\alpha(\hat x-\hat y),Y\big) - F\big(\hat x,\alpha(\hat x-\hat y),X\big).$$
It is well known (see [7], Example 3.6) that the right-hand side can be estimated as follows:
$$\beta\theta_\delta \le \omega\big(\alpha|\hat x-\hat y|^2 + |\hat x-\hat y|\big)$$
for a modulus $\omega$ (i.e. a function $\omega:[0,\infty)\to[0,\infty)$ such that $\omega(0+)=0$) that only depends on the Lipschitz constants of $\Sigma$, $b$, $f$, hence only on $K_0$, $N$, $d$. Letting $\alpha\to\infty$ and recalling (4.3) we then have $\beta\theta_\delta\le 0$, which contradicts the assumption that $\beta>0$ and $\lim_{\delta\to 0}\theta_\delta>0$. $\square$
This was the main step towards the following result, which summarizes the main conclusions on the control problem.

Theorem 4.2 Suppose that Assumptions 2.1 and 3.1 hold and that the coefficients do not depend on time. Then the value function $V$ for the problem (3.4) is the unique bounded viscosity solution of the HJB equation (4.1).

Proof. Boundedness of $V$ was proved in (3.7) and uniqueness follows from the previous result. Under our assumptions it is well known that $V$ satisfies a dynamic programming principle and that it is a viscosity solution: see e.g. [18] or [10]. $\square$

5 Dynamic programming equation for finite horizon: viscosity theory

In this section we study the value function $V$ for the problem (3.3). We still suppose that Assumptions 2.1 and 3.1 hold. We will show that $V$ is the unique viscosity solution to the HJB equation. Here this is an equation for a function $v(t,x)=v(t,x_1,\ldots,x_N)$ on the domain $(0,T)\times D$, of the form
$$-v_t(t,x) - \sup_{a\in A}\left\{\frac{1}{2}\sum_{ij}\partial^2_{ij}v(t,x)\,x_ix_j\sum_{k=1}^d h_k(i,a,t)h_k(j,a,t) + \sum_{ij}\partial_i v(t,x)\,x_j\,q(a,t,j,i) + \sum_i x_i f(i,a,t)\right\} = 0, \tag{5.1}$$
with the terminal condition
$$v(T,x) = \sum_i x_i\,g(i), \qquad x\in D. \tag{5.2}$$
Denoting by $Dv$ and $D^2v$ the gradient and the Hessian matrix of $v$ with respect to $x$, we have
$$-v_t + F(t,x,Dv,D^2v) = 0,$$
where $F$ is defined for $t\in[0,T]$, $x\in D$, $p\in\mathbb R^N$ and $X\in\mathcal S(N)$ by
$$F(t,x,p,X) = -\sup_{a\in A}\left\{\frac{1}{2}\,\mathrm{Trace}\big(\Sigma(x,a,t)\Sigma(x,a,t)^TX\big) + \langle p, b(x,a,t)\rangle + f(x,a,t)\right\}.$$
Here $\Sigma(x,a,t)$, $b(x,a,t)$ and $f(x,a,t)$ are defined as before, but possibly depending on $t$. The inequality (4.2) still holds for every $t$, with the same constant $C_0$. We will assume that $h$, $q$, $f$ are continuous in $t\in[0,T]$, uniformly in $a\in A$, so that $F$ is a continuous function of all its arguments. We now report the standard definitions of viscosity sub- and supersolutions using parabolic sub/superjets.
For $u:(0,T)\times D\to\mathbb R$, $t\in(0,T)$, $x\in D$, the parabolic superjet $\mathcal P^{2,+}u(t,x)$ is the set of triples $(a,p,X)\in\mathbb R\times\mathbb R^N\times\mathcal S(N)$ such that
$$u(s,y) \le u(t,x) + a(s-t) + \langle p, y-x\rangle + \frac{1}{2}\langle X(y-x), y-x\rangle + o\big(|y-x|^2+|s-t|\big)$$
as $y\in D$, $y\to x$, $s\in(0,T)$, $s\to t$. The closure $\bar{\mathcal P}^{2,+}u(t,x)$ consists of the triples $(a,p,X)$ for which there exists a sequence $(t_n,x_n,a_n,p_n,X_n)\in(0,T)\times D\times\mathbb R\times\mathbb R^N\times\mathcal S(N)$ such that $t_n\to t$, $x_n\to x$, $a_n\to a$, $p_n\to p$, $X_n\to X$, $u(t_n,x_n)\to u(t,x)$ and $(a_n,p_n,X_n)\in\mathcal P^{2,+}u(t_n,x_n)$. We define subjets by setting $\mathcal P^{2,-}u(t,x) = -\mathcal P^{2,+}(-u)(t,x)$ and $\bar{\mathcal P}^{2,-}u(t,x) = -\bar{\mathcal P}^{2,+}(-u)(t,x)$.

We say that an upper semicontinuous function $u:(0,T]\times D\to\mathbb R$ is a viscosity subsolution if for any $t\in(0,T)$, $x\in D$,
$$(a,p,X)\in\bar{\mathcal P}^{2,+}u(t,x) \implies -a + F(t,x,p,X) \le 0,$$
and moreover $u(T,x)\le g(x)$ for $x\in D$. A lower semicontinuous function $v:(0,T]\times D\to\mathbb R$ is called a viscosity supersolution if for any $t\in(0,T]$, $x\in D$,
$$(a,p,X)\in\bar{\mathcal P}^{2,-}v(t,x) \implies -a + F(t,x,p,X) \ge 0,$$
and moreover $v(T,x)\ge g(x)$ for $x\in D$. Finally, a viscosity solution is both a sub- and a supersolution.

We first prove the following comparison result.

Theorem 5.1 Suppose that $h$, $q$, $f$ are continuous in $t\in[0,T]$, uniformly in $a\in A$. Suppose that $u,v:(0,T]\times D\to\mathbb R$ are upper and lower semicontinuous, respectively. Let $u$ be a subsolution and $v$ a supersolution satisfying
$$u(T,x) \le v(T,x), \qquad x\in D. \tag{5.3}$$
Suppose moreover that there exists a constant $C_1>0$ such that
$$u(t,x) \le C_1(1+|x|), \qquad v(t,x) \ge -C_1(1+|x|), \qquad t\in(0,T],\ x\in D. \tag{5.4}$$
Then $u\le v$ on $(0,T]\times D$.

Proof. Step I.
We will first prove the result for sub-/supersolutions to the equation
$$-v_t + Kv(t,x) + F(t,x,Dv,D^2v) = 0, \tag{5.5}$$
where $K>0$ will be taken sufficiently large (in fact, satisfying $K\ge 2C_0$, compare (4.2)). The general case will then be reduced to this one.

For $\delta\in(0,1]$ define
$$\theta_\delta = \sup_{x\in D,\ 0<t\le T}\Big(u(t,x) - v(t,x) - 2\delta(1+|x|^2) - \frac{\delta}{t}\Big).$$
Suppose, by contradiction, that $\sup_{(0,T]\times D}(u-v)>0$; then there exist $\bar\theta>0$ and $\delta>0$ such that $\theta_\delta\ge\bar\theta$. From now on we fix $\delta$ and omit to indicate that several quantities in the sequel may depend on it. Define, for $x,y\in D$, $t\in(0,T]$, $\alpha>0$,
$$\Phi_\alpha(t,x,y) = u(t,x) - v(t,y) - \delta(1+|x|^2) - \delta(1+|y|^2) - \frac{\delta}{t} - \frac{\alpha}{2}|x-y|^2.$$
Later we will let $\alpha\to\infty$. From (5.4) it follows that
$$\Phi_\alpha(t,x,y) \le C_1(1+|x|) - \delta(1+|x|^2) + C_1(1+|y|) - \delta(1+|y|^2) - \frac{\delta}{t},$$
and since $\Phi_\alpha$ is upper semicontinuous it achieves a maximum at a point $(t_\alpha,x_\alpha,y_\alpha)\in(0,T]\times D\times D$. Since
$$\Phi_\alpha(t_\alpha,x_\alpha,y_\alpha) \ge \Phi_\alpha(t,x,x) = u(t,x) - v(t,x) - 2\delta(1+|x|^2) - \frac{\delta}{t}, \qquad t\in(0,T],\ x\in D,$$
it follows that $\Phi_\alpha(t_\alpha,x_\alpha,y_\alpha)\ge\theta_\delta\ge\bar\theta$, namely
$$u(t_\alpha,x_\alpha) - v(t_\alpha,y_\alpha) \ge \bar\theta + \delta(1+|x_\alpha|^2) + \delta(1+|y_\alpha|^2) + \frac{\delta}{t_\alpha} + \frac{\alpha}{2}|x_\alpha-y_\alpha|^2. \tag{5.6}$$
Using (5.4) once more, we deduce from (5.6) that
$$C_1(1+|x_\alpha|) + C_1(1+|y_\alpha|) \ge \delta(1+|x_\alpha|^2) + \delta(1+|y_\alpha|^2) + \frac{\delta}{t_\alpha},$$
which implies that there exists a constant $C$, independent of $\alpha$, such that
$$|x_\alpha| + |y_\alpha| + \frac{1}{t_\alpha} \le C. \tag{5.7}$$
Moreover, by standard arguments (see e.g. [7], Lemma 3.1 or Proposition 3.7), we have
$$\alpha|x_\alpha-y_\alpha|^2\to 0, \qquad |x_\alpha-y_\alpha|\to 0 \tag{5.8}$$
as $\alpha\to\infty$. By (5.7) the family $(x_\alpha,y_\alpha,t_\alpha)_\alpha$ is bounded, so it admits a limit point, necessarily of the form $(\bar x,\bar x,\bar t)$ by (5.8); (5.7) also implies that $\bar t>0$. Suppose we had $\bar t=T$: then, letting $\alpha\to\infty$ along a subsequence, by upper semicontinuity it follows from (5.6) and (5.8) that $u(T,\bar x)-v(T,\bar x)\ge\bar\theta>0$, which contradicts assumption (5.3).
So we conclude that $0<\bar t<T$, and it follows that $(t_\alpha,x_\alpha,y_\alpha)\in(0,T)\times D\times D$ for infinitely many $\alpha\to\infty$. Next recall that $(t_\alpha,x_\alpha,y_\alpha)$ was a maximum point of $\Phi_\alpha$, which we rewrite in the form
$$\Phi_\alpha(t,x,y) = \tilde u(t,x) - \tilde v(t,y) - \varphi_\alpha(t,x,y),$$
where we define
$$\tilde u(t,x) = u(t,x) - \delta(1+|x|^2), \qquad \tilde v(t,y) = v(t,y) + \delta(1+|y|^2), \qquad \varphi_\alpha(t,x,y) = \frac{\delta}{t} + \frac{\alpha}{2}|x-y|^2.$$
Since the quadratic terms are smooth, the parabolic sub-/superjets are related as follows:
$$\bar{\mathcal P}^{2,+}\tilde u(t,x) = \bar{\mathcal P}^{2,+}u(t,x) + (0,-2\delta x,-2\delta I), \qquad \bar{\mathcal P}^{2,-}\tilde v(t,y) = \bar{\mathcal P}^{2,-}v(t,y) + (0,2\delta y,2\delta I). \tag{5.9}$$
We wish to apply the Crandall-Ishii lemma in its parabolic form: see [6], or [7], Theorem 8.3. For our equation, which is backward in time, it is convenient to check the required assumptions in the form stated in [10], Theorem 6.1: we must show that for every $M>0$ there exists a constant $C(M)$ such that
$$(a,p,X)\in\bar{\mathcal P}^{2,+}\tilde u(t,x),\quad |p|+|x|+|X|+|\tilde u(t,x)|\le M \implies a\ge -C(M),$$
$$(a,p,X)\in\bar{\mathcal P}^{2,-}\tilde v(t,y),\quad |p|+|y|+|X|+|\tilde v(t,y)|\le M \implies a\le C(M).$$
We check the first implication, the other one being similar. Assume $(a,p,X)\in\bar{\mathcal P}^{2,+}\tilde u(t,x)$. Then by (5.9) we have $(a,p+2\delta x,X+2\delta I)\in\bar{\mathcal P}^{2,+}u(t,x)$, and since $u$ is a subsolution to (5.5) we have
$$-a + Ku(t,x) + F(t,x,p+2\delta x,X+2\delta I) \le 0.$$
Since $\tilde u\le u$, recalling (4.2) we have
$$a \ge K\tilde u(t,x) + F(t,x,p+2\delta x,X+2\delta I) \ge K\tilde u(t,x) + F(t,x,p,X) - 2\delta C_0|x|^2,$$
and if $|p|+|x|+|X|+|\tilde u(t,x)|\le M$ we obtain the required inequality $a\ge -C(M)$, setting
$$C(M) = KM + 2\delta C_0M^2 + \sup\{|F(t,x,p,X)| : |p|+|x|+|X|\le M,\ t\in[0,T]\}.$$
Having checked the required assumptions, we can now apply the Crandall-Ishii lemma and conclude that there exist $a,b\in\mathbb R$ and $X,Y\in\mathcal S(N)$ such that
$$(a,\alpha(x_\alpha-y_\alpha),X)\in\bar{\mathcal P}^{2,+}\tilde u(t_\alpha,x_\alpha), \qquad (b,\alpha(x_\alpha-y_\alpha),Y)\in\bar{\mathcal P}^{2,-}\tilde v(t_\alpha,y_\alpha),$$
$$a - b = (\varphi_\alpha)_t(t_\alpha,x_\alpha,y_\alpha) = -\frac{\delta}{t_\alpha^2}, \qquad \begin{pmatrix} X & 0\\ 0 & -Y \end{pmatrix} \le 3\alpha\begin{pmatrix} I & -I\\ -I & I \end{pmatrix}. \tag{5.10}$$
From (5.9) it follows that
$$(a,\alpha(x_\alpha-y_\alpha)+2\delta x_\alpha,\,X+2\delta I)\in\bar{\mathcal P}^{2,+}u(t_\alpha,x_\alpha), \qquad (b,\alpha(x_\alpha-y_\alpha)-2\delta y_\alpha,\,Y-2\delta I)\in\bar{\mathcal P}^{2,-}v(t_\alpha,y_\alpha),$$
and since $u$ and $v$ are, respectively, a sub- and a supersolution to (5.5),
$$-a + Ku(t_\alpha,x_\alpha) + F\big(t_\alpha,x_\alpha,\alpha(x_\alpha-y_\alpha)+2\delta x_\alpha,\,X+2\delta I\big) \le 0,$$
$$-b + Kv(t_\alpha,y_\alpha) + F\big(t_\alpha,y_\alpha,\alpha(x_\alpha-y_\alpha)-2\delta y_\alpha,\,Y-2\delta I\big) \ge 0.$$
Subtracting these inequalities and recalling the equality in (5.10) we obtain
$$\frac{\delta}{t_\alpha^2} + K\big(u(t_\alpha,x_\alpha)-v(t_\alpha,y_\alpha)\big) \le F\big(t_\alpha,y_\alpha,\alpha(x_\alpha-y_\alpha)-2\delta y_\alpha,\,Y-2\delta I\big) - F\big(t_\alpha,x_\alpha,\alpha(x_\alpha-y_\alpha)+2\delta x_\alpha,\,X+2\delta I\big).$$
Using (4.2), the right-hand side can be estimated from above by
$$F\big(t_\alpha,y_\alpha,\alpha(x_\alpha-y_\alpha),Y\big) - F\big(t_\alpha,x_\alpha,\alpha(x_\alpha-y_\alpha),X\big) + 2\delta C_0|x_\alpha|^2 + 2\delta C_0|y_\alpha|^2.$$
Inequality (5.6) implies $u(t_\alpha,x_\alpha)-v(t_\alpha,y_\alpha) \ge \bar\theta + \delta(1+|x_\alpha|^2) + \delta(1+|y_\alpha|^2)$, so we can estimate the left-hand side from below and arrive at
$$K\bar\theta + K\delta(1+|x_\alpha|^2) + K\delta(1+|y_\alpha|^2) \le F\big(t_\alpha,y_\alpha,\alpha(x_\alpha-y_\alpha),Y\big) - F\big(t_\alpha,x_\alpha,\alpha(x_\alpha-y_\alpha),X\big) + 2\delta C_0|x_\alpha|^2 + 2\delta C_0|y_\alpha|^2.$$
Recall that $C_0$ was a constant depending only on $K_0$, $N$, $d$. Choosing $K\ge 2C_0$ we obtain
$$K\bar\theta \le F\big(t_\alpha,y_\alpha,\alpha(x_\alpha-y_\alpha),Y\big) - F\big(t_\alpha,x_\alpha,\alpha(x_\alpha-y_\alpha),X\big).$$
It is well known (see [7], Example 3.6) that the right-hand side can be estimated as follows:
$$K\bar\theta \le \omega\big(\alpha|x_\alpha-y_\alpha|^2 + |x_\alpha-y_\alpha|\big)$$
for a modulus $\omega$ (i.e. a function $\omega:[0,\infty)\to[0,\infty)$ such that $\omega(0+)=0$) that only depends on the Lipschitz constants of $\Sigma$, $b$, $f$, hence only on $K_0$, $N$, $d$.
Letting $\alpha\to\infty$ along a subsequence and recalling (5.8) we have $K\bar\theta\le 0$, which is a contradiction.

Step II. Now we consider the general case. For $K>0$ we define
$$u_K(t,x) = e^{-K(T-t)}u(t,x), \qquad v_K(t,x) = e^{-K(T-t)}v(t,x).$$
It is easy to check that $u_K$ and $v_K$ are, respectively, a sub- and a supersolution to the equation
$$-v_t + Kv(t,x) + F_K(t,x,Dv,D^2v) = 0,$$
where $F_K$ is defined by
$$F_K(t,x,p,X) = -\sup_{a\in A}\left\{\frac{1}{2}\,\mathrm{Trace}\big(\Sigma(x,a,t)\Sigma(x,a,t)^TX\big) + \langle p,b(x,a,t)\rangle + e^{-K(T-t)}f(x,a,t)\right\}.$$
We note that this equation is of the form (5.5) and that it satisfies inequality (4.2) with the same constant $C_0$. Therefore the result of Step I applies, and taking $K\ge 2C_0$ we conclude that $u_K\le v_K$, and therefore $u\le v$. $\square$

As in the infinite horizon case we arrive at the following characterization of the value function.

Theorem 5.2 Suppose that Assumptions 2.1 and 3.1 hold and that $h$, $q$, $f$ are continuous in $t\in[0,T]$, uniformly in $a\in A$. Then the value function $V$ for the problem (3.3) is the unique viscosity solution of the HJB equation (5.1) in the class of functions with linear growth in $x$, uniformly in $t$.

Proof. The linear growth condition is the second inequality in (3.6). Uniqueness follows from the previous result. The fact that $V$ is a viscosity solution is a standard result: see e.g. [18] or [10]. $\square$

6 Dynamic programming equation: verification theorems

In general, verification theorems state that if the HJB equation admits a classical solution, and some additional conditions are satisfied, then the solution coincides with the value function and an optimal control admits a feedback form. Looking at the HJB equations, one may note that the second order part degenerates when $x$ approaches the boundary of $D$. Therefore we will present results where a classical solution is assumed to exist only in the interior of $D$, denoted
$$\mathring D = \{x = (x_i)\in\mathbb R^N : x_i > 0,\ i=1,\ldots,N\}.$$
The strict positivity result in Proposition 3.1-1 will repeatedly play a role. In this section we assume that Assumptions 2.1 and 3.1 are satisfied, and we still denote by $V$ the value function of the separated problem. We present two results, for the parabolic and the elliptic case respectively.

Theorem 6.1 Suppose that $v\in C^{1,2}([0,T]\times\mathring D)$ satisfies equation (5.1) on $[0,T]\times\mathring D$ and the terminal condition (5.2) on $\mathring D$, and has polynomial growth in $x$ uniformly in $t$. Then $v\ge V$.

Also assume that, for every $(t,x)\in[0,T]\times\mathring D$, the supremum in the equation is achieved at a point $a=\hat a(t,x)\in A$ for a measurable function $\hat a:[0,T]\times\mathring D\to A$. Assume finally that, for every $t\in[0,T]$ and $x=(x_i)\in\mathring D$, the closed-loop equation
$$d\hat\rho^i_s = \sum_j \hat\rho^j_s\, q(\hat a(s,\hat\rho_s),s,j,i)\,ds + \hat\rho^i_s\, h(i,\hat a(s,\hat\rho_s),s)\,dW_s, \qquad s\in[t,T],\ i\in S, \qquad \hat\rho^i_t = x_i, \tag{6.1}$$
has an $\mathbb F^W$-adapted continuous solution $\hat\rho$. Then the control process in feedback form
$$\hat\alpha_s = \hat a(s,\hat\rho_s), \qquad s\in[t,T],$$
is optimal and $v$ coincides with the value function $V$. In particular, a solution to (6.1) exists if, for every $i,j\in S$, the functions
$$x\mapsto q(\hat a(s,x),s,j,i), \qquad x\mapsto h(i,\hat a(s,x),s), \tag{6.2}$$
are locally Lipschitz on $\mathring D$, uniformly in $s$.

Proof. The argument is classical, but we sketch a proof in order to show that the behavior of the solution near the boundary of $D$ is irrelevant. We introduce the controlled Kolmogorov operator
$$\mathcal L^a v(t,x) = \frac{1}{2}\sum_{ij}\partial^2_{ij}v(t,x)\,x_ix_j\sum_{k=1}^d h_k(i,a,t)h_k(j,a,t) + \sum_{ij}\partial_i v(t,x)\,x_j\,q(a,t,j,i),$$
and we write the HJB equation (5.1) in the form
$$v_t(t,x) + \sup_{a\in A}\left\{\mathcal L^a v(t,x) + \sum_i x_i f(i,a,t)\right\} = 0, \qquad v(T,x) = \sum_i x_i\,g(i).$$
Let us fix $t\in[0,T]$ and $x\in\mathring D$.
Given an arbitrary control process $\alpha\in\mathcal A$, let us denote by $\rho$ the corresponding solution to the equation in (3.3). For every integer $k>0$ define the stopping times
$$T_k = \inf\{s\ge t : |\rho_s|>k \text{ or } \mathrm{dist}(\rho_s,\partial D)<1/k\}, \tag{6.3}$$
where $\mathrm{dist}(\cdot,\partial D)$ denotes the distance from the boundary $\partial D$ of $D$. By the Itô formula we have
$$v(T\wedge T_k,\rho_{T\wedge T_k}) - v(t,x) = \int_t^{T\wedge T_k}\big[v_t(s,\rho_s) + \mathcal L^{\alpha_s}v(s,\rho_s)\big]\,ds + \int_t^{T\wedge T_k}\sum_i \partial_i v(s,\rho_s)\,\rho^i_s\,h(i,\alpha_s,s)\,dW_s.$$
By the choice of $T_k$, the processes $\{\partial_i v(s,\rho_s) : s\in[t,T\wedge T_k]\}$ are bounded, and the function $h$ is also assumed to be bounded. Therefore, upon taking expectations, the stochastic integral disappears. Summing and subtracting terms, after rearrangement we obtain
$$v(t,x) = \mathbb E\int_t^{T\wedge T_k}\Big\{-v_t(s,\rho_s) - \mathcal L^{\alpha_s}v(s,\rho_s) - \sum_i \rho^i_s f(i,\alpha_s,s)\Big\}\,ds + \mathbb E\big[v(T\wedge T_k,\rho_{T\wedge T_k})\big] + \mathbb E\int_t^{T\wedge T_k}\sum_i \rho^i_s f(i,\alpha_s,s)\,ds.$$
By the strict positivity result in Proposition 3.1-1, the trajectories of $\rho$ never leave $\mathring D$, a.s. It follows that a.s. we have $T\wedge T_k = T$ for $k$ large. Letting $k\to\infty$ in the last displayed formula, by dominated convergence the last two terms tend to
$$\mathbb E\big[v(T,\rho_T)\big] + \mathbb E\int_t^T\sum_i \rho^i_s f(i,\alpha_s,s)\,ds = J_T(t,x,\alpha).$$
By the HJB equation the term in curly brackets is nonnegative, and by monotone convergence we obtain
$$v(t,x) = \mathbb E\int_t^T\Big\{-v_t(s,\rho_s) - \mathcal L^{\alpha_s}v(s,\rho_s) - \sum_i \rho^i_s f(i,\alpha_s,s)\Big\}\,ds + J_T(t,x,\alpha).$$
Since the term in curly brackets is nonnegative, it follows that $v(t,x)\ge J_T(t,x,\alpha)$ for every $\alpha\in\mathcal A$, and therefore $v(t,x)\ge V(t,x)$. When the control $\hat\alpha$ is chosen, the term in curly brackets vanishes and it follows that $v(t,x)=J_T(t,x,\hat\alpha)$, which shows the optimality of $\hat\alpha$ and the equality $v(t,x)=V(t,x)$.
To prove the final statement of the theorem, we note that the closed-loop equation (6.1) has coefficients with linear growth in $x$ which, when (6.2) holds, are also locally Lipschitz; therefore a unique solution exists up to the stopping times $T_k\wedge T$. As noted above, by Proposition 3.1-1, a.s. we have $T\wedge T_k = T$ for $k$ large, so that the solution exists on the whole interval $[0,T]$. $\square$

Theorem 6.2 Suppose that the coefficients $q$, $h_k$, $f$ do not depend on time and that $v\in C^2(\mathring D)$ satisfies equation (4.1) on $\mathring D$. Suppose that for every controlled trajectory $\rho$ starting at $x\in\mathring D$ we have
$$\lim_{T\to\infty}e^{-\beta T}\,\mathbb E[v(\rho_T)] = 0.$$
Then $v\ge V$.

Also assume that, for every $x\in\mathring D$, the supremum in the equation is achieved at a point $a=\hat a(x)\in A$ for a measurable function $\hat a:\mathring D\to A$. Assume finally that, for every $x=(x_i)\in\mathring D$, the closed-loop equation
$$d\hat\rho^i_s = \sum_j \hat\rho^j_s\,q(\hat a(\hat\rho_s),j,i)\,ds + \hat\rho^i_s\,h(i,\hat a(\hat\rho_s))\,dW_s, \qquad s\ge 0,\ i\in S, \qquad \hat\rho^i_0 = x_i, \tag{6.4}$$
has an $\mathbb F^W$-adapted continuous solution $\hat\rho$. Then the control process
$$\hat\alpha_s = \hat a(\hat\rho_s), \qquad s\ge 0,$$
is optimal and $v$ coincides with the value function $V$. In particular, a solution to (6.4) exists if, for every $i,j\in S$, the functions
$$x\mapsto q(\hat a(x),j,i), \qquad x\mapsto h(i,\hat a(x)), \tag{6.5}$$
are locally Lipschitz on $\mathring D$.

Proof. We only sketch the arguments, which are similar to the previous ones. Let $\rho$ denote the trajectory corresponding to an arbitrary control $\alpha$ and starting point $x\in\mathring D$. By Proposition 3.1-1, $\rho$ never hits the boundary of $D$, a.s. Applying the Itô formula to $e^{-\beta s}v(\rho_s)$ on $[0,T\wedge T_k]$, taking expectations and letting $k\to\infty$ and $T\to\infty$, we obtain
$$v(x) = \mathbb E\int_0^\infty e^{-\beta s}\Big\{\beta v(\rho_s) - \mathcal L^{\alpha_s}v(\rho_s) - \sum_i \rho^i_s f(i,\alpha_s)\Big\}\,ds + J_\infty(x,\alpha).$$
As before, the term in curly brackets is nonnegative, and it is zero when $\alpha=\hat\alpha$. The conclusion follows. $\square$
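As a numerical sanity check on the discounted functional $J_\infty$ in (3.4) and the bound (3.7), one can estimate the truncated reward by Monte Carlo for a frozen control. The sketch below is ours, not from the paper: with $f\equiv 1$ and $\beta=1$, the total mass $\sum_i\rho^i_t$ is a mean-one martingale (the rows of the generator sum to zero and the diffusion term has zero mean), so the truncated reward has expectation $1-e^{-\beta T}\le \sup|f|/\beta = 1$; all names and numerical values are illustrative.

```python
import numpy as np

def mc_discounted_reward(x0, Q, h, f, beta, T=8.0, n_steps=2000,
                         n_paths=400, seed=1):
    """Monte Carlo estimate of the truncated discounted reward
        J = E int_0^T exp(-beta t) <rho_t, f> dt
    for the unnormalized filter rho under a frozen control (the control
    dependence is hidden inside Q, h, f).  Vectorized Euler-Maruyama
    over n_paths independent scalar Brownian drivers."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    rho = np.tile(np.asarray(x0, float), (n_paths, 1))    # (n_paths, N)
    J = np.zeros(n_paths)
    for n in range(n_steps):
        J += np.exp(-beta * n * dt) * (rho @ f) * dt      # left-point rule
        dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, 1))
        rho = rho + (rho @ Q) * dt + rho * h * dW
    return J.mean()

Q = np.array([[-1.0,  1.0],
              [ 2.0, -2.0]])    # generator: rows sum to zero
h = np.array([0.5, -0.5])       # observation drift h(i)
f = np.ones(2)                  # constant running reward, sup|f| = 1
est = mc_discounted_reward(np.array([0.5, 0.5]), Q, h, f, beta=1.0)
```

For this toy data the estimate should sit near $1-e^{-8}\approx 1$, consistently with the upper bound $\sup|f|/\beta$ in (3.7).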
Example 6.1. Consider the case when $A \subset \mathbb{R}$ is an interval $[0, R]$ for some $R > 0$. Take
$$ q(a, t, i, j) = a, \qquad h(i, a, t) = h(i), \qquad f(i, a, t) = -\frac{a^2}{2} $$
for $a \in [0, R]$, $t \in [0, T]$, $i, j \in S$. Thus, we are considering a control problem for a Markov chain $X$ with controlled transition rates that can take any value in $[0, R]$, with reward functional and observation process given by
$$ J(\alpha) = \bar{\mathbb{E}} \Big[ -\frac{1}{2} \int_0^T \alpha_t^2 \, dt + g(X_T) \Big], \qquad \int_0^t h(X_s)\, ds + B_t, $$
for arbitrary $g : S \to \mathbb{R}$. Setting $\gamma_{ij} = \sum_{k=1}^d h_k(i) h_k(j)$, the HJB equation (5.1) becomes
$$ v_t(t,x) + \frac{1}{2} \sum_{ij} \partial^2_{ij} v(t,x)\, x^i x^j \gamma_{ij} + \sup_{a \in [0,R]} \Big\{ a \sum_{ij} \partial_i v(t,x)\, x^j - \frac{a^2}{2} \sum_i x^i \Big\} = 0, $$
with the boundary condition $v(T,x) = \sum_i x^i g(i)$. Setting
$$ \hat{a}(p) := \arg\max_{a \in [0,R]} \Big( p\,a - \frac{a^2}{2} \Big) = p^+ \wedge R, \qquad p \in \mathbb{R}, $$
the equation becomes
$$ v_t(t,x) + \frac{1}{2} \sum_{ij} \partial^2_{ij} v(t,x)\, x^i x^j \gamma_{ij} + \Big( \hat{a}(p)\, p - \frac{\hat{a}(p)^2}{2} \Big) \sum_i x^i = 0, \qquad p := \frac{\sum_{ij} \partial_i v(t,x)\, x^j}{\sum_i x^i}. $$
Assume that a solution $v \in C^{1,2}([0,T] \times \mathring{D})$ exists, with polynomial growth in $x$ uniformly in $t$. Then Theorem 6.1 applies and we conclude that, if the closed-loop equation
$$ d\hat{\rho}^i_s = \hat{a}\Big( \frac{\sum_{\ell j} \partial_\ell v(s, \hat{\rho}_s)\, \hat{\rho}^j_s}{\sum_\ell \hat{\rho}^\ell_s} \Big) \sum_j \hat{\rho}^j_s \, ds + \hat{\rho}^i_s h(i)\, dW_s, \quad s \in [t,T], \ i \in S, \qquad \hat{\rho}^i_t = x^i, $$
has an $\mathbb{F}^W$-adapted continuous solution $\hat{\rho}$, then the control process in feedback form
$$ \hat{\alpha}_s = \hat{a}\Big( \frac{\sum_{\ell j} \partial_\ell v(s, \hat{\rho}_s)\, \hat{\rho}^j_s}{\sum_\ell \hat{\rho}^\ell_s} \Big), \qquad s \in [t,T], $$
is optimal and $v$ coincides with the value function.

Remark 6.1. Define $\gamma_{ij}(a) = \sum_{k=1}^d h_k(i,a) h_k(j,a)$ and write equation (4.1) in the form
$$ \beta v(x) - \sup_{a \in A} \Big\{ \frac{1}{2} \sum_{ij} \partial^2_{ij} v(x)\, x^i x^j \gamma_{ij}(a) + \sum_{ij} \partial_i v(x)\, x^j q(a, j, i) + \sum_i x^i f(i, a) \Big\} = 0. \tag{6.6} $$
Assume in addition that the functions $h(i, \cdot) : A \to \mathbb{R}^d$, $q(\cdot, i, j) : A \to [0, \infty)$ and $f(i, \cdot) : A \to \mathbb{R}$ are continuous for every $i, j \in S$, and that the following ellipticity condition holds: there exists $\kappa > 0$ such that
$$ \sum_{i,j} \gamma_{ij}(a)\, \xi_i \xi_j \ge \kappa |\xi|^2, \qquad \xi \in \mathbb{R}^N, \ a \in A. $$
Note that this may happen only provided $N \le d$. Then one may prove that the solution $v$ is in fact of class $C^2(\mathring{D})$ with Hölder continuous second derivatives and satisfies the equation in the classical sense. This follows from a result in [19], established for bounded smooth domains in $\mathbb{R}^n$, which therefore applies to any smooth domain compactly contained in $\mathring{D}$. (In that reference the supremum is taken over a countable family; the extension to our setting is a direct consequence of the continuity of the coefficients with respect to $a$ and the fact that the control action space $A$ is Polish.) The same result can be achieved as in [12] by a logarithmic change of variables $y_i = \log x^i$, introducing the auxiliary unknown function
$$ w(y_1, \ldots, y_N) = v(e^{y_1}, \ldots, e^{y_N}), \qquad y = (y_1, \ldots, y_N) \in \mathbb{R}^N, $$
which is defined on the whole of $\mathbb{R}^N$.

Similarly, in the parabolic case, equation (5.1) can be written as
$$ -v_t(t,x) - \sup_{a \in A} \Big\{ \frac{1}{2} \sum_{ij} \partial^2_{ij} v(t,x)\, x^i x^j \gamma_{ij}(a,t) + \sum_{ij} \partial_i v(t,x)\, x^j q(a,t,j,i) + \sum_i x^i f(i,a,t) \Big\} = 0, \tag{6.7} $$
where $\gamma_{ij}(a,t) = \sum_{k=1}^d h_k(i,a,t) h_k(j,a,t)$. Assume that the functions $h(i, \cdot, t) : A \to \mathbb{R}^d$, $q(\cdot, t, i, j) : A \to [0, \infty)$ and $f(i, \cdot, t) : A \to \mathbb{R}$ are continuous for every $i, j \in S$, $t \in [0,T]$, and that the following ellipticity condition holds: there exists $\kappa > 0$ such that
$$ \sum_{i,j} \gamma_{ij}(a,t)\, \xi_i \xi_j \ge \kappa |\xi|^2, \qquad \xi \in \mathbb{R}^N, \ a \in A, \ t \in [0,T]. $$
One may prove again that the solution $v$ is of class $C^{1,2}([0,T] \times \mathring{D})$ with Hölder continuous derivatives. This follows from [21], Theorem 1.1; see also [5] (comments before Theorem 9.1).
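The ellipticity condition of Remark 6.1 is easy to inspect numerically for a fixed action $a$: the matrix $\gamma(a)$ is the Gram matrix $H H^{\mathsf{T}}$ of the $N \times d$ array $H_{ik} = h_k(i,a)$, so its smallest eigenvalue is the best constant $\kappa$, and its rank is at most $d$, which is exactly why $N \le d$ is necessary. A small sketch with made-up Gaussian coefficients (purely illustrative, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 3, 4   # hypothetical sizes with N <= d, as the remark requires

# Hypothetical observation coefficients for one fixed action a:
# H[i, k] = h_k(i, a), so that gamma(a) = H @ H.T has entries
# gamma_ij(a) = sum_k h_k(i, a) h_k(j, a).
H = rng.normal(size=(N, d))
gamma = H @ H.T

# The smallest eigenvalue of gamma is the best constant kappa with
# xi^T gamma xi >= kappa |xi|^2 for all xi.
kappa = np.linalg.eigvalsh(gamma).min()

# If N > d the Gram matrix has rank at most d < N, hence it is singular
# and the ellipticity condition necessarily fails.
H_bad = rng.normal(size=(5, 2))
gamma_bad = H_bad @ H_bad.T
kappa_bad = np.linalg.eigvalsh(gamma_bad).min()
```

For a generic $N \times d$ array with $N \le d$ the computed `kappa` is strictly positive, while `kappa_bad` is zero up to rounding.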
7 Stochastic maximum principle

We devote this final section to formulating a stochastic maximum principle for the separated problem, as a necessary condition for optimality. Although the proof partially relies on known results, the formulation improves on existing results in the literature, especially because we remove the restrictions that the set of control actions $A$ be convex and that the coefficients be differentiable with respect to $a \in A$.

In this section we assume that Assumptions 2.1 and 3.1 hold, and for simplicity we only treat the finite horizon case starting at time 0, namely:
$$ d\rho^i_t = \sum_j \rho^j_t \, q(\alpha_t, t, j, i)\, dt + \rho^i_t \, h(i, \alpha_t, t)\, dW_t, \qquad \rho^i_0 = x^i, \quad t \in [0,T], \ i \in S, $$
$$ J(\alpha) = \mathbb{E} \Big[ \int_0^T \sum_i \rho^i_t f(i, \alpha_t, t)\, dt + \sum_i \rho^i_T g(i) \Big]. $$
It is convenient to write this in vector form. Let us recall the definition of the matrix $Q^a$ and define the $N$-dimensional vectors $\rho = (\rho^i)_i$, $h_k(a,t) = (h_k(i,a,t))_i$, $f(a,t) = (f(i,a,t))_i$, $g = (g(i))_i$, for $a \in A$, $t \ge 0$, $k = 1, \ldots, d$. We also denote the componentwise multiplication of vectors by
$$ x * y = (x(i)\, y(i))_i, \qquad \text{for } x = (x(i))_i, \ y = (y(i))_i \in \mathbb{R}^N. $$
With this notation we write
$$ d\rho_t = (Q^{\alpha_t}_t)^{\mathsf{T}} \rho_t \, dt + \sum_{k=1}^d \rho_t * h_k(\alpha_t, t)\, dW^k_t, \qquad J(\alpha) = \mathbb{E} \Big[ \int_0^T \langle \rho_t, f(\alpha_t, t) \rangle\, dt + \langle \rho_T, g \rangle \Big], $$
where $\langle \cdot, \cdot \rangle$ stands for the scalar product in $\mathbb{R}^N$. For an arbitrary admissible control $(\alpha_t)$ we consider the adjoint BSDE
$$ -dp_t = -\sum_{k=1}^d q^k_t \, dW^k_t + \Big( Q^{\alpha_t}_t p_t + \sum_{k=1}^d h_k(\alpha_t, t) * q^k_t + f(\alpha_t, t) \Big)\, dt, \qquad p_T = g. \tag{7.1} $$
In our specific case the controlled trajectory $(\rho_t)$ does not occur in the BSDE. The solution is understood in the usual way: the process $p$ is continuous and adapted, the processes $q^1, \ldots, q^d$ are progressive, and
$$ \mathbb{E} \Big[ \sup_{t \in [0,T]} |p_t|^2 + \sum_{k=1}^d \int_0^T |q^k_t|^2 \, dt \Big] < \infty. $$
Within this class there exists a solution $(p, q^k)$; the process $p$ is unique up to indistinguishability and the processes $q^k$ up to equality $d\mathbb{P} \otimes dt$-a.s. This follows from standard results on BSDEs and our boundedness assumptions on $q(a,t,i,j)$, $h_k(i,a,t)$, $f(i,a,t)$.

Theorem 7.1. Suppose that the coefficients $q$, $h_k$, $f$ are continuous functions of $a \in A$, for fixed $t, i, j$. Assume that $(\alpha_t)$ is an optimal control. Let $(\rho_t)$ be the corresponding trajectory and $(p_t, q^k_t)$ the solution to the adjoint BSDE. Defining the Hamiltonian
$$ H(t, \rho, a, p, q^1, \ldots, q^d) = \langle f(a,t), \rho \rangle + \langle Q^a_t p, \rho \rangle + \sum_{k=1}^d \langle q^k, h_k(a,t) * \rho \rangle, \qquad a \in A; \ \rho, p, q^k \in \mathbb{R}^N, $$
we have, $d\mathbb{P} \otimes dt$-a.s.,
$$ H(t, \rho_t, \alpha_t, p_t, q^1_t, \ldots, q^d_t) = \max_{a \in A} H(t, \rho_t, a, p_t, q^1_t, \ldots, q^d_t). $$

Proof. As explained before, this result is essentially an application of the general stochastic maximum principle in [17] (see also [23] for a careful exposition). Our sketch of proof is simply intended to give the reader exact indications for all the details and to warn about the minor changes required in our case.

Take an arbitrary admissible control $(\bar{\alpha}_t)$. For any $\epsilon \in (0,T]$ and any Borel set $I_\epsilon \subset [0,T]$ with Lebesgue measure $|I_\epsilon| = \epsilon$, define the spike variation control by setting
$$ \alpha^\epsilon_t = \begin{cases} \bar{\alpha}_t, & t \in I_\epsilon, \\ \alpha_t, & t \in [0,T] \setminus I_\epsilon. \end{cases} $$
Since $\alpha$ is optimal we have $J(\alpha^\epsilon) \le J(\alpha)$. Proceeding as in [17] one arrives at
$$ 0 \ge J(\alpha^\epsilon) - J(\alpha) = \mathbb{E} \Big[ \int_0^T \big( H(t, \rho_t, \alpha^\epsilon_t, p_t, q^1_t, \ldots, q^d_t) - H(t, \rho_t, \alpha_t, p_t, q^1_t, \ldots, q^d_t) \big)\, dt \Big] + o(\epsilon). \tag{7.2} $$
This follows immediately from formula (4.59) in Section 5.4 of [23], where the reader may find a detailed proof. In fact, since $H$ and the terminal reward $\langle g, \rho \rangle$ are linear functions of $\rho$, their second derivatives with respect to $\rho$ vanish and formula (4.59) in [23] reduces to (7.2).
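Since the Hamiltonian $H$ is a cheap finite-dimensional expression, the maximum condition of Theorem 7.1 can be checked pointwise over a grid of actions once $(\rho_t, p_t, q^k_t)$ are known at a given $t$. A sketch with hypothetical two-state coefficients (the `Q`, `h_k`, `f` below are illustrative choices echoing Example 6.1, not the paper's general data):

```python
import numpy as np

N, d = 2, 1
actions = np.linspace(0.0, 1.0, 201)   # grid over a hypothetical A = [0, 1]

def Q(a):
    # hypothetical controlled rate matrix, as in the q(a, j, i) = a example
    return np.array([[-a, a], [a, -a]])

def h_k(a, k):
    # hypothetical observation vector, constant in a
    return np.array([1.0, -1.0])

def f(a):
    # hypothetical running reward f(i, a) = -a^2 / 2
    return np.array([-a**2 / 2, -a**2 / 2])

def hamiltonian(rho, a, p, q_list):
    """H = <f(a), rho> + <Q^a p, rho> + sum_k <q^k, h_k(a) * rho>."""
    val = f(a) @ rho + (Q(a) @ p) @ rho
    val += sum(q_list[k] @ (h_k(a, k) * rho) for k in range(d))
    return val

# Frozen values of the state and adjoint processes at one time instant.
rho = np.array([0.6, 0.4])
p = np.array([1.0, 0.2])
q_list = [np.array([0.1, -0.1])]

values = [hamiltonian(rho, a, p, q_list) for a in actions]
a_star = actions[int(np.argmax(values))]
```

For these particular frozen values $H(a)$ is a concave quadratic decreasing on $[0,1]$, so the grid maximizer sits at the left endpoint.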
We describe in some detail how the conclusion follows from (7.2), a point which is often neglected in several papers. We follow the elegant approach of [20], which is based on the following result (compare Lemma 2.2 of [20]) and avoids using the Lebesgue differentiation theorem.

Lemma 7.1. Let $\ell : [0,T] \to \mathbb{R}$ be Borel measurable and satisfy $\|\ell\|_{L^1} := \int_0^T |\ell(t)|\, dt < \infty$. Then for any $\epsilon \in (0,T]$ there exists a Borel set $I_\epsilon \subset [0,T]$ with $|I_\epsilon| = \epsilon$ and such that
$$ \Big| \frac{\epsilon}{T} \int_0^T \ell(t)\, dt - \int_{I_\epsilon} \ell(t)\, dt \Big| \le \epsilon^2. $$

Proof. We present a self-contained and simplified proof of a more general result that can be found in Theorem 2 of [14]. Take a finite-valued function $\bar{\ell}$ such that $\|\ell - \bar{\ell}\|_{L^1} \le \epsilon^2/2$. Write $\bar{\ell}$ in the form $\sum_{i=1}^n \ell_i \mathbf{1}_{E_i}$ for $\ell_i \in \mathbb{R}$ and a finite partition $\{E_i\}$ of $[0,T]$ consisting of Borel sets. Since the Lebesgue measure is non-atomic, there exist Borel sets $E^i_\epsilon \subset E_i$ such that $|E^i_\epsilon| = \epsilon |E_i| / T$. Then, provided we set $I_\epsilon = \bigcup_{i=1}^n E^i_\epsilon$, we have
$$ \frac{\epsilon}{T} \int_0^T \bar{\ell}(t)\, dt = \sum_{i=1}^n \ell_i \, \epsilon |E_i| / T = \sum_{i=1}^n \ell_i |E^i_\epsilon| = \int_{I_\epsilon} \bar{\ell}(t)\, dt. $$
We have $|I_\epsilon| = \sum_{i=1}^n |E^i_\epsilon| = \sum_{i=1}^n \epsilon |E_i| / T = \epsilon$ and
$$ \Big| \frac{\epsilon}{T} \int_0^T \ell(t)\, dt - \int_{I_\epsilon} \ell(t)\, dt \Big| = \Big| \frac{\epsilon}{T} \int_0^T \ell(t)\, dt - \frac{\epsilon}{T} \int_0^T \bar{\ell}(t)\, dt + \int_{I_\epsilon} \bar{\ell}(t)\, dt - \int_{I_\epsilon} \ell(t)\, dt \Big| \le \frac{\epsilon}{T} \|\ell - \bar{\ell}\|_{L^1} + \|\ell - \bar{\ell}\|_{L^1} \le \epsilon^2. \quad \square $$

We can now conclude the proof of Theorem 7.1. Apply the previous lemma to the function
$$ \ell(t) = \mathbb{E} \big[ H(t, \rho_t, \bar{\alpha}_t, p_t, q^1_t, \ldots, q^d_t) - H(t, \rho_t, \alpha_t, p_t, q^1_t, \ldots, q^d_t) \big] $$
and choose the set $I_\epsilon$ accordingly. Then (7.2) yields $\int_{I_\epsilon} \ell(t)\, dt \le o(\epsilon)$. By the lemma we also have $\frac{\epsilon}{T} \int_0^T \ell(t)\, dt \le o(\epsilon)$ and we conclude that
$$ \int_0^T \ell(t)\, dt = \mathbb{E} \Big[ \int_0^T \big( H(t, \rho_t, \bar{\alpha}_t, p_t, q^1_t, \ldots, q^d_t) - H(t, \rho_t, \alpha_t, p_t, q^1_t, \ldots, q^d_t) \big)\, dt \Big] \le 0 \tag{7.3} $$
for an arbitrary admissible control $\bar{\alpha}$.
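The construction in the proof of Lemma 7.1 is fully explicit for step functions: inside each partition cell $E_i$ one selects a subset of proportional measure $\epsilon |E_i| / T$, and the identity $\frac{\epsilon}{T} \int_0^T \bar{\ell} = \int_{I_\epsilon} \bar{\ell}$ then holds exactly. A small sketch, using a hypothetical step function on $[0,1]$ and initial subintervals of the cells as the sets $E^i_\epsilon$:

```python
import numpy as np

T = 1.0
eps = 0.1

# A hypothetical step function ell_bar = sum_i ell_i 1_{E_i} on a
# partition of [0, T]; values and cell edges are made up for illustration.
edges = np.array([0.0, 0.3, 0.7, 1.0])   # cells E_1, E_2, E_3
vals = np.array([2.0, -1.0, 0.5])

# For each cell E_i pick E_i_eps as the initial subinterval of length
# eps * |E_i| / T, and set I_eps as the union of these subintervals.
I_eps = [(edges[i], edges[i] + eps * (edges[i + 1] - edges[i]) / T)
         for i in range(len(vals))]

lebesgue_I = sum(b - a for a, b in I_eps)
int_full = float(np.sum(vals * np.diff(edges)))                     # integral over [0, T]
int_I = float(np.sum(vals * np.array([b - a for a, b in I_eps])))   # integral over I_eps
```

For a step function the approximation error $\epsilon^2$ of the lemma disappears: `lebesgue_I` equals $\epsilon$ and `int_I` equals $\frac{\epsilon}{T}$ times `int_full` exactly, up to rounding.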
Given any $a \in A$, let
$$ B_a = \big\{ (\omega, t) \in \Omega \times [0,T] : H(t, \rho_t(\omega), a, p_t(\omega), q^1_t(\omega), \ldots, q^d_t(\omega)) > H(t, \rho_t(\omega), \alpha_t(\omega), p_t(\omega), q^1_t(\omega), \ldots, q^d_t(\omega)) \big\}. $$
Choosing
$$ \bar{\alpha}_t(\omega) = a \, \mathbf{1}_{B_a}(\omega, t) + \alpha_t(\omega)\, \mathbf{1}_{(\Omega \times [0,T]) \setminus B_a}(\omega, t) $$
in (7.3), it follows that $B_a$ is $d\mathbb{P} \otimes dt$-negligible. In other words, for every $a \in A$,
$$ H(t, \rho_t, \alpha_t, p_t, q^1_t, \ldots, q^d_t) \ge H(t, \rho_t, a, p_t, q^1_t, \ldots, q^d_t), \qquad d\mathbb{P} \otimes dt\text{-a.s.} $$
By choosing a countable dense set of $a$'s in $A$, and using the continuity of the coefficients with respect to $a$, we obtain the required conclusion. $\square$

Acknowledgements. The authors wish to thank Prof. Andrzej Święch for his help and suggestions on viscosity solutions to the dynamic programming equations considered in this paper.

References

[1] Alan Bain and Dan Crisan. Fundamentals of stochastic filtering, volume 60 of Stochastic Modelling and Applied Probability. Springer, New York, 2009.
[2] Alain Bensoussan. Stochastic control of partially observable systems. Cambridge University Press, Cambridge, 1992.
[3] Pierre Brémaud. Point process calculus in time and space: an introduction with applications, volume 98 of Probability Theory and Stochastic Modelling. Springer, Cham, 2020.
[4] Pierre Brémaud and Laurent Massoulié. Stability of nonlinear Hawkes processes. Ann. Probab., 24(3):1563-1588, 1996.
[5] M. G. Crandall, M. Kocan, and A. Święch. $L^p$-theory for fully nonlinear uniformly parabolic equations. Comm. Partial Differential Equations, 25(11-12):1997-2053, 2000.
[6] Michael G. Crandall and Hitoshi Ishii. The maximum principle for semicontinuous functions. Differential Integral Equations, 3(6):1001-1014, 1990.
[7] Michael G. Crandall, Hitoshi Ishii, and Pierre-Louis Lions. User's guide to viscosity solutions of second order partial differential equations. Bull. Amer. Math. Soc. (N.S.), 27(1):1-67, 1992.
[8] Robert J. Elliott, Lakhdar Aggoun, and John B. Moore. Hidden Markov models: estimation and control, volume 29 of Applications of Mathematics (New York). Springer-Verlag, New York, 1995.
[9] Giorgio Fabbri, Fausto Gozzi, and Andrzej Święch. Stochastic optimal control in infinite dimension: dynamic programming and HJB equations, volume 82 of Probability Theory and Stochastic Modelling. Springer, Cham, 2017. With a contribution by Marco Fuhrman and Gianmario Tessitore.
[10] Wendell H. Fleming and H. Mete Soner. Controlled Markov processes and viscosity solutions, volume 25 of Stochastic Modelling and Applied Probability. Springer, New York, second edition, 2006.
[11] Fausto Gozzi and Andrzej Święch. Hamilton-Jacobi-Bellman equations for the optimal control of the Duncan-Mortensen-Zakai equation. J. Funct. Anal., 172(2):466-510, 2000.
[12] Fausto Gozzi and Tiziano Vargiolu. Superreplication of European multiasset derivatives with bounded stochastic volatility. Math. Methods Oper. Res., 55(1):69-91, 2002.
[13] Jean Jacod. Multivariate point processes: predictable projection, Radon-Nikodým derivatives, representation of martingales. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 31:235-253, 1974/75.
[14] Xun Jing Li and Yung Long Yao. Maximum principle of distributed parameter systems with time lags. In Distributed parameter systems (Vorau, 1984), volume 75 of Lect. Notes Control Inf. Sci., pages 410-427. Springer, Berlin, 1985.
[15] P.-L. Lions. Viscosity solutions of fully nonlinear second order equations and optimal stochastic control in infinite dimensions. II. Optimal control of Zakai's equation. In Stochastic partial differential equations and applications, II (Trento, 1988), volume 1390 of Lecture Notes in Math., pages 147-170. Springer, Berlin, 1989.
[16] Makiko Nisio. Stochastic control theory: dynamic programming principle, volume 72 of Probability Theory and Stochastic Modelling. Springer, Tokyo, second edition, 2015.
[17] Shi Ge Peng. A general stochastic maximum principle for optimal control problems. SIAM J. Control Optim., 28(4):966-979, 1990.
[18] Huyên Pham. Continuous-time stochastic control and optimization with financial applications, volume 61 of Stochastic Modelling and Applied Probability. Springer-Verlag, Berlin, 2009.
[19] M. V. Safonov. Classical solution of second-order nonlinear elliptic equations. Izv. Akad. Nauk SSSR Ser. Mat., 52(6):1272-1287, 1328, 1988.
[20] Shan Jian Tang and Xun Jing Li. Necessary conditions for optimal control of stochastic systems with random jumps. SIAM J. Control Optim., 32(5):1447-1475, 1994.
[21] Lihe Wang. On the regularity theory of fully nonlinear parabolic equations. Bull. Amer. Math. Soc. (N.S.), 22(1):107-114, 1990.
[22] W. M. Wonham. Some applications of stochastic differential equations to optimal nonlinear filtering. J. SIAM Control Ser. A, 2:347-369, 1965.
[23] Jiongmin Yong and Xun Yu Zhou. Stochastic controls: Hamiltonian systems and HJB equations, volume 43 of Applications of Mathematics (New York). Springer-Verlag, New York, 1999.