Here, there and everywhere: state-dependent time-inconsistent stochastic control


Authors: Dylan Possamaï, Mateo Rodriguez Polo

Dylan Possamaï∗   Mateo Rodriguez Polo†

March 24, 2026

Abstract

This paper addresses the challenge of time-inconsistent stochastic control within a continuous-time framework. Its primary focus lies in uncovering a probabilistic representation, specifically in the shape of a system of backward stochastic differential equations (BSDEs). These equations encapsulate the equilibrium value function essential for resolving cases where the present state affecting the target functional triggers the inconsistency. Additionally, the paper offers an application exemplifying this theory through the time-inconsistent linear–quadratic regulator.

1 Introduction

Classical stochastic control is largely built around an intertemporal consistency principle: the policy that is optimal when the problem is posed at time 0 remains optimal when the same optimisation is reconsidered at any later time t, conditional on the information available at t. This property is the backbone of Bellman's dynamic programming principle (DPP). It allows one to propagate value functions through conditioning and concatenation, and it leads to tractable characterisations of optimal feedback controls via Hamilton–Jacobi–Bellman (HJB) equations and verification arguments; see, for instance, Fleming and Soner [16], or Yong and Zhou [50].

A large and important family of economically and financially motivated objectives violates this principle. In a time-inconsistent control problem, the continuation criterion used by the agent at time t differs from the criterion that will be used at a later date s > t. As a consequence, a plan designed at time 0 is typically not self-enforcing: when time t arrives, the agent re-optimises and may deviate from the original plan even when the underlying dynamics have not changed.
Time inconsistency therefore fundamentally alters the nature of the problem. Since a global optimum in the classical sense is no longer necessarily meaningful, the relevant solution concept must be reconsidered, and one needs new analytical tools to replace the missing DPP. A natural resolution, going back to Strotz [44], is to interpret time inconsistency as an intrapersonal dynamic game in which the 'players' are the agent's successive selves. This viewpoint clarifies three canonical behavioural benchmarks. A pre-committed agent computes an optimum at time 0 and follows it regardless of future incentives. A naive agent re-optimises over time as if the current plan would never be revised again. The sophisticated (game-theoretic) agent studied in this paper instead seeks a self-enforcing, subgame-perfect strategy: no self has an incentive to deviate, given that later selves will also behave optimally from their own perspective. In discrete time, this 'consistent planning' paradigm is classical (Phelps and Pollak [39], Pollak [40], Peleg and Yaari [38]), and it also provides behavioural foundations for quasi-hyperbolic and more general forms of discounting (Laibson [32], O'Donoghue and Rabin [36]). We will illustrate the quantitative gap between precommitment, naivety and sophistication in our linear–quadratic example in Section 4, and remark that analogous two-layer game-theoretic structures also arise when time inconsistency interacts with strategic considerations in multi-player games [41].

In continuous time, equilibrium notions are necessarily local. A 'current self' is allowed to deviate only on a short time interval, while taking the continuation behaviour of future selves as fixed, so that equilibrium controls are locally optimal in the sense of an infinitesimal deviation analysis.
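The preference reversal behind these benchmarks can be seen in a two-line computation. The sketch below (our own illustrative numbers, not from the paper) uses the hyperbolic discount factor D(τ) = 1/(1 + τ): evaluated far in advance, the agent prefers the larger, later reward, but once the payoffs are near, the ranking flips, so the time-0 plan is not self-enforcing.

```python
def value(reward: float, delay: float) -> float:
    """Hyperbolically discounted value: reward * D(delay), D(tau) = 1/(1+tau)."""
    return reward / (1.0 + delay)

# Two rewards: 10 units after a short delay, 15 units after a longer one.
# Viewed 10 periods ahead, the larger-later reward wins ...
far_small, far_large = value(10, 11), value(15, 15)
assert far_large > far_small
# ... but viewed up close, the same agent reverses and takes the
# smaller-sooner reward: the earlier plan is abandoned.
near_small, near_large = value(10, 1), value(15, 5)
assert near_small > near_large
```

Under exponential discounting D(τ) = e^{−rτ} the two comparisons always agree, which is exactly the stationarity that non-exponential discounting destroys.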
Several equilibrium concepts coexist in the stochastic control literature, reflecting both modelling choices (open-loop versus feedback strategies) and analytical requirements (how deviations are measured, and what regularity is imposed on the candidate strategy). The strong/weak equilibrium distinction of Huang and Zhou [25] and the subsequent analysis of equilibrium notions in He and Jiang [19] make this particularly transparent. A related, widely used notion is that of regular equilibrium, which is tailored to the extended HJB approach and is closely connected to the solvability of equilibrium PDE systems (Lindensjö [33], Björk, Khapko, and Murgoci [8]). In this paper we focus on feedback equilibria in the sense of local deviations, as this is the natural notion for dynamic programming.

∗ ETH Zürich, Mathematics department, Switzerland, dylan.possamai@math.ethz.ch. This author gratefully acknowledges partial support by the SNF project MINT 205121-219818.
† ETH Zürich, Mathematics department, Switzerland, mateo.rodriguezpolo@math.ethz.ch. This author gratefully acknowledges partial support by the SNF project MINT 205121-219818.

Time inconsistency can be generated by several conceptually distinct mechanisms, and the continuous-time literature reflects this diversity. First, and perhaps most prominently, non-exponential discounting destroys stationarity: the discount factor depends on the evaluation time and induces a re-weighting of future payoffs as time passes. In continuous time this mechanism motivated the pioneering equilibrium analysis of Ekeland and Lazrak [12; 13], Ekeland and Pirvu [14]. It remains a benchmark class and has been revisited in general Markovian settings; see, for instance, Björk, Khapko, and Murgoci [7; 8]. Second, nonlinear dependence on conditional expectations breaks the DPP even when discounting is exponential.
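Why a nonlinear function of an expectation breaks the tower property can be checked on a two-step binomial tree. In this small sketch (our own illustration, not from the paper), the time-0 variance of the terminal value differs from the expected conditional variance computed one step later: by the law of total variance, the gap is exactly Var(E[X | first step]), so a variance criterion evaluated at different dates cannot be glued together by plain conditioning.

```python
import itertools

# Terminal value X of two independent fair coin flips worth +1 or -1 each.
paths = list(itertools.product([1, -1], repeat=2))
X = {p: p[0] + p[1] for p in paths}

mean = sum(X[p] for p in paths) / 4
var_total = sum((X[p] - mean) ** 2 for p in paths) / 4  # Var(X)

# Condition on the first flip: conditional mean and variance per branch.
cond_mean = {a: sum(a + b for b in [1, -1]) / 2 for a in [1, -1]}
cond_var = {a: sum((a + b - cond_mean[a]) ** 2 for b in [1, -1]) / 2
            for a in [1, -1]}
exp_cond_var = sum(cond_var[a] for a in [1, -1]) / 2   # E[Var(X | G)]
var_cond_mean = sum(cond_mean[a] ** 2 for a in [1, -1]) / 2  # Var(E[X | G])

# Law of total variance: the time-0 variance exceeds the expected
# conditional variance by Var(E[X | G]) > 0, so the variance part of a
# mean-variance criterion is not propagated by conditioning alone.
assert var_total == exp_cond_var + var_cond_mean
assert var_total != exp_cond_var
```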
The paradigmatic example is the mean–variance criterion, which introduces a variance term (a nonlinear function of an expectation) into the objective and is central in dynamic Markowitz portfolio selection. Equilibrium formulations for mean–variance and related deviation–risk criteria have been developed in, among many others, Basak and Chabakauri [2], Björk, Murgoci, and Zhou [6], Gu, Si, and Zheng [17]. This line of work has also motivated robust and ambiguity-averse formulations, where time inconsistency and model uncertainty interact; see, e.g., Pun [42]. Time inconsistency also interacts with additional modelling features such as regime switching and discrete interventions; equilibrium analyses of time-inconsistent stochastic switching problems can be found in, for instance, Mei and Yong [35].

Third, and this is the focus of the present paper, time inconsistency may stem from state-dependent preference parameters. In many models the criterion depends on a parameter that is updated as the state evolves: wealth-dependent risk aversion, moving targets, relative-performance benchmarks, or endogenous reference points. When this parameter is recalibrated by each future self, different selves effectively face different objective functionals even if discounting is exponential and the reward structure is otherwise time-homogeneous. State-dependent risk aversion in deviation–risk criteria provides one family of examples [6; 17; 42], but the mechanism is broader: the preference parameter may itself be the state used as a reference point, as in the criterion considered in (1.1) below.

A further important class, closely related to nonlinear expectation criteria, arises in recursive (BSDE-type) objectives: time inconsistency can emerge from a lack of flow property in the backward component and from non-separable aggregation.
This has led naturally to equilibrium characterisations in terms of flows of forward–backward SDEs and, more generally, backward stochastic Volterra integral equations (BSVIEs); see Wei, Yong, and Yu [47], Hamaguchi [18], Wang and Yong [45], Mastrogiacomo and Tarsia [34]. Finally, time-inconsistent stopping (and mixed control–stopping) problems form a parallel and active strand of the literature, where the game-theoretic equilibrium concept takes a different form but shares the same conceptual origin. We refer to Christensen and Lindensjö [10; 11], Bayraktar, Zhang, and Zhou [3], Bodnariu, Christensen, and Lindensjö [9] for representative recent works and for further references.

We concentrate on a Markovian controlled diffusion in weak formulation and on objective functionals of the form

J(t, x, α) := E^{P^{t,x,α}}[ ∫_t^T f(s, x, X_s, α_s) ds + ξ(x, X_T) ], (t, x, α) ∈ [0, T] × R^n × A,   (1.1)

where X denotes the controlled state, α is the control, and the crucial feature is the appearance of the current state x as an additional argument in both the running and terminal payoff. When the same problem is re-evaluated at time s > t, the parameter x is updated to X_s, so the continuation criterion differs from (1.1) even if the control law is kept fixed. Such state-dependent updating is natural whenever payoffs are formulated relative to a moving target or a reference point that evolves with the system, rather than being fixed at time 0.

At a formal level, criteria of the form (1.1) are encompassed by the general Markovian equilibrium frameworks of [4; 7; 8]. The key insight in these frameworks is that equilibrium behaviour is described not by a single value function but by an extended object (an 'equilibrium value function' together with auxiliary functions) whose diagonal captures the continuation values faced by each self.
However, the existing Markovian literature at this level of generality proceeds primarily via verification-type results: one postulates an extended HJB system (a coupled system of nonlinear PDEs in multiple variables) and proves that any sufficiently smooth solution yields an equilibrium control. This approach was pioneered and systematised in [4; 7] and remains central in the monograph [8]. Parallel approaches based on Pontryagin-type maximum principles lead to equilibrium characterisations in terms of flows of forward–backward SDEs, especially in linear–quadratic settings; see Hu, Jin, and Zhou [23; 24] and the references therein. There are also contributions focusing on the existence of closed-loop equilibria in more general models and on the relationship between different equilibrium notions; see, e.g., Yong [49], Huang and Zhou [25], He and Jiang [19], Wang and Zheng [46].

Despite this substantial progress, genuinely state-dependent time inconsistency raises conceptual and technical obstacles that, in our view, have not been fully resolved at the level of dynamic programming. The key difficulty is that the preference parameter driving the inconsistency becomes stochastic once it is updated to the current state. From a dynamic programming viewpoint, the equilibrium object is therefore not a single scalar value function: one must keep track of a family of continuation values indexed by a reference parameter (the 'reference state'), together with a consistent mechanism that selects the correct diagonal when the parameter is updated along the state process. In smooth PDE approaches this manifests in the need to solve an extended HJB system on an enlarged state space and to evaluate the solution along a diagonal.
Outside smooth settings, however, it is not a priori clear how to interpret this diagonal, how it evolves along the diffusion, and how it interacts with the equilibrium definition based on local deviations.

By contrast, the most complete rigorous dynamic programming foundations currently available in the time-inconsistent literature focus on mechanisms where the preference parameter is either deterministic (as in non-exponential discounting) or enters through conditional expectations (as in mean–variance and deviation–risk criteria). In these cases one can often set up a flow of value functions indexed by the initial time or by auxiliary expectation variables and derive extended HJB systems, FBSDE flows, and/or BSVIE characterisations [2; 12; 13; 14; 23; 45]. Recent works have also developed dynamic programming and viscosity-solution methods for the resulting extended HJB systems in specific settings (Karnam, Ma, and Zhang [30], Xu and Yang [48]). The non-Markovian theory of Hernández and Possamaï [21] provides a very general equilibrium DPP and BSDE representation for sophisticated agents, but does not cover the Markovian specialisation required for state-dependent reference parameters.

To the best of our knowledge, a fully rigorous dynamic programming treatment of time inconsistency stemming from state-dependent preference updating of the form (1.1) has been missing. While state dependence is present in the general Markovian frameworks above, existing results in that direction are predominantly verification-type. They do not derive a dynamic programming principle that is both necessary and sufficient and that explicitly propagates the state-dependent preference parameter through time. Providing such a dynamic programming principle, and turning it into a concrete probabilistic representation, is the central objective of the present paper.
We develop a rigorous and operational dynamic programming theory for state-dependent time-inconsistent stochastic control in continuous time. We work in weak formulation for a controlled diffusion with uncontrolled volatility, and we seek feedback equilibrium controls. The analysis is probabilistic throughout, and the main output is an equilibrium DPP together with a Markovian system of backward stochastic differential equations (BSDEs) characterising the equilibrium value.

The starting point is the non-Markovian equilibrium DPP of [21]. In the state-dependent Markovian setting, this suggests that the equilibrium value at (t, x) should be understood as the diagonal of a flow of continuation values indexed by a reference parameter. Turning this into a tractable Markovian object requires a way to evaluate such a flow along the random curve given by the state process when the reference parameter is updated. The key tool enabling this step is the Itô–Kunita–Wentzell formula (Kunita [31]). Roughly speaking, the Itô–Kunita–Wentzell formula allows us to compute the semimartingale decomposition of a random field evaluated along a stochastic flow. In our context, it provides a clean and explicit 'diagonal calculus' for the equilibrium flow and makes the additional drift terms generated by state dependence transparent. The resulting BSDE system yields a probabilistic counterpart to extended HJB systems that is compatible with low regularity. It also clarifies the role of diagonal objects that appear throughout the equilibrium literature (both in PDE and FBSDE formulations) and that are intimately connected to the local deviation structure of equilibrium definitions [19; 23; 25]. For completeness, we recall that BSDE methods play a central role in stochastic control, both as a probabilistic representation of PDEs and as a natural language for recursive criteria; see, e.g.
, Pardoux and Protter [37], El Karoui, Peng, and Quenez [15].

A second theme of the paper is a unification of time-dependent and state-dependent time inconsistency. In standard (time-consistent) optimal control, explicit time dependence can always be reduced to state dependence by augmenting the state with a clock variable [16; 50]. While this observation is classical, it has not been systematically exploited at the level of equilibrium dynamic programming for sophisticated agents. The reason is that, without a complete treatment of state-dependent preference updating, the reduction is essentially formal: one may embed time into an enlarged state space, but one still needs to understand how the equilibrium flow and its diagonal behave when the preference parameter becomes a component of the state. Our probabilistic approach, and in particular the Itô–Kunita–Wentzell based diagonal calculus, makes this reduction transparent and explicit in the equilibrium setting. It shows that non-exponential discounting can be viewed as a special instance of state-dependent preference updating (with the 'reference' being the augmented state, i.e. the clock), and it clarifies how the BSDE systems appearing in the discounting literature are recovered as a degenerate case of the general state-dependent theory. In that sense, the present work does more than recall the classical state-augmentation trick: it provides the missing state-dependent equilibrium theory that makes the reduction operational.

The present paper provides, to our knowledge, the first complete Markovian dynamic programming theory for time inconsistency driven by state-dependent preference updating. Concretely, our contributions can be summarised as follows.

(i) Equilibrium DPP and Markovian BSDE characterisation for state dependence.
We establish an equilibrium DPP for the criterion (1.1) and derive a Markovian system of BSDEs whose solution characterises both the equilibrium value and the equilibrium feedback control. This yields a probabilistic analogue of the extended HJB approach which does not require smooth PDE solutions and which makes the diagonal structure explicit.

(ii) A transparent diagonal calculus via the Itô–Kunita–Wentzell formula. We show that the Itô–Kunita–Wentzell formula provides the correct probabilistic mechanism behind the diagonal terms that appear in equilibrium conditions. This clarifies and complements the extended HJB viewpoint of [4; 7; 8], and it connects the Markovian state-dependent setting to the general non-Markovian equilibrium DPP of [21].

(iii) Reduction of time dependence to state dependence in the equilibrium setting. We make explicit how time-dependent mechanisms such as non-exponential discounting can be embedded into the state-dependent framework via state augmentation. We then show how the corresponding equilibrium BSDE systems arise as a degenerate case of our general theory. To our knowledge, this 'time as state' reduction has not previously been pointed out and exploited in a dynamic programming framework for sophisticated equilibrium controls.

(iv) A tractable illustration: a time-inconsistent linear–quadratic regulator. We apply the general results to a time-inconsistent linear–quadratic regulator, where we obtain existence and characterisation results in a concrete class and provide numerical experiments comparing equilibrium, naive, and precommitted controls.

The theory developed here fits naturally within the growing probabilistic approach to time-inconsistent control.
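For orientation on item (ii), one common scalar form of the Itô–Wentzell formula (recalled informally here, with all regularity assumptions suppressed; see Kunita [31] for precise statements) applies to a random field with decomposition dF(t, y) = G(t, y) dt + H(t, y) dW_t for each fixed y, evaluated along an Itô process dX_t = b_t dt + σ_t dW_t:

```latex
% Informal scalar Itô--Wentzell formula (regularity assumptions suppressed).
% Random field: dF(t,y) = G(t,y)\,dt + H(t,y)\,dW_t for each fixed y;
% Itô process:  dX_t = b_t\,dt + \sigma_t\,dW_t.
\begin{aligned}
\mathrm{d}F(t, X_t)
  ={}& G(t, X_t)\,\mathrm{d}t + H(t, X_t)\,\mathrm{d}W_t
     + \partial_y F(t, X_t)\,\mathrm{d}X_t \\
   & + \tfrac{1}{2}\,\partial^2_{yy} F(t, X_t)\,\mathrm{d}\langle X \rangle_t
     + \partial_y H(t, X_t)\,\mathrm{d}\langle W, X \rangle_t .
\end{aligned}
```

The last two terms are the corrections relative to applying Itô's formula to a deterministic function; in the equilibrium context they are precisely the extra drift contributions produced when the reference parameter is updated along the state process.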
On the one hand, it complements the general non-Markovian equilibrium theory of [21] by providing an explicit Markovian specialisation adapted to state-dependent preference parameters, and by connecting it to the extended HJB paradigm through a concrete BSDE system. On the other hand, it provides a rigorous dynamic programming underpinning for Markovian state-dependent models that have previously been handled mainly through smooth verification arguments. Time-inconsistent preferences also arise in other domains, including contracting problems with sophisticated agents, where the failure of commitment interacts with moral hazard. We refer to [22] for recent developments in that direction and note that, while our focus is on Markovian diffusion control, the present results strengthen the conceptual bridge between Markovian state-dependent models and the general non-Markovian probabilistic theory.

The rest of the paper is organised as follows. Section 2 introduces the time-inconsistent control problem and the equilibrium concept. Section 3 states the main results, including the equilibrium DPP and the BSDE characterisation. Section 4 studies the linear–quadratic regulator example and compares equilibrium and naive controls. Finally, Section 5 discusses the reduction of time dependence to state dependence and its implications for non-exponential discounting.

Notations: Throughout this paper we take the convention ∞ − ∞ := −∞, and we fix a time horizon T > 0. R_+ and R^⋆_+ denote the sets of non-negative and positive real numbers, respectively. Given (E, ‖·‖) a Banach space, a positive integer p, and a non-negative integer q, C^p_q(E) (resp. C^p_{q,b}(E)) will denote the space of functions from E to R^p which are at least q times continuously differentiable (resp. and bounded with bounded derivatives). Whenever E = [0, T] (resp. q = 0 or b is not specified), we suppress the dependence on E (resp.
on q or b); e.g., C^p denotes the space of continuous functions from [0, T] to R^p. For any (x, y) ∈ C^k × C^k, we write ‖x − y‖_∞ := sup_{t∈[0,T]} ‖x(t) − y(t)‖. For any dimension k ∈ N^⋆ and radius R > 0, we denote by B̄_R the closed ball of radius R centred at the origin in R^k, that is, B̄_R := {y ∈ R^k : ‖y‖ ≤ R}. Given (x, x̃) ∈ C^p × C^p and t ∈ [0, T], we define their concatenation x ⊗_t x̃ ∈ C^p by

(x ⊗_t x̃)(r) := x(r) 1_{r ≤ t} + (x(t) + x̃(r) − x̃(t)) 1_{r > t}, r ∈ [0, T].

For φ ∈ C^p_q(E) with q ≥ 2, ∂²_{xx}φ will denote its Hessian matrix. For (u, v) ∈ R^p × R^p, u · v will denote their usual inner product, and ‖u‖ the corresponding norm. For positive integers m and n, we denote by M_{m,n}(R) the space of m × n matrices with real entries, and we simplify notations by setting M_n(R) := M_{n,n}(R). Tr[M] denotes the trace of a matrix M ∈ M_n(R). For (Ω, G) a measurable space, Prob(Ω) denotes the collection of all probability measures on (Ω, G). For P ∈ Prob(Ω) and a filtration G, G^P := (G^P_t)_{t∈[0,T]} denotes the P-completion of G. We recall that for any t ∈ [0, T], G^P_t := G_t ∨ σ(N^P), where N^P := {N ⊆ Ω : ∃B ∈ G, N ⊆ B, and P[B] = 0}. G^P_+ denotes the right limit of G^P, i.e. G^P_{t+} := ∩_{ε>0} G^P_{t+ε}, t ∈ [0, T), and G^P_{T+} := G^P_T. For (s, t) ∈ [0, T]², with s ≤ t, T_{s,t}(F) denotes the collection of [s, t]-valued F-stopping times.

2 Time-inconsistent stochastic control

We fix two positive integers n and d, which represent respectively the dimension of the process controlled by the agent, and the dimension of the Brownian motion driving this controlled process. We fix a time horizon T > 0, and consider the canonical space Ω := C([0, T], R^n), with canonical process X, and whose generic elements we denote ω.
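The concatenation operation above can be written down directly. The sketch below (our own illustrative code, not from the paper) implements (x ⊗_t x̃)(r) pointwise and checks that the concatenated path follows x up to t and thereafter carries the increments of x̃ on top of the value x(t), so it is continuous at the gluing time.

```python
def concatenate(x, x_tilde, t):
    """Return the path r -> (x ⊗_t x̃)(r) as a callable.

    x, x_tilde: callables representing continuous paths on [0, T];
    t: gluing time. Before t the path is x itself; after t it follows
    the increments of x_tilde started from the value x(t).
    """
    def glued(r):
        if r <= t:
            return x(r)
        return x(t) + x_tilde(r) - x_tilde(t)
    return glued

# Example paths: x(r) = r^2 and x̃(r) = 3r, glued at t = 1.
path = concatenate(lambda r: r * r, lambda r: 3.0 * r, t=1.0)
assert path(0.5) == 0.25        # before t: x itself
assert path(1.0) == 1.0         # continuity at the gluing time
assert path(2.0) == 4.0         # after t: x(1) + (x̃(2) - x̃(1)) = 1 + 3
```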
We let F be the Borel σ-algebra on Ω (for the topology of uniform convergence), and we denote by F^X := (F^X_t)_{t∈[0,T]} the natural filtration of X. We let A be a closed subset of R^k for some positive integer k, where the controls will take values.

Remark 2.1. Note that we do not assume that A is compact. This will allow the case treated in Section 4 to be included in our theory. However, we will later assume that the Hamiltonian in (3.8) is attained, either due to compactness of A or coercivity of the coefficients.

Remark 2.2. We restrict our attention to Euclidean action spaces primarily to facilitate the heuristic derivations in Section 3, which rely on differentiation with respect to the control variable. However, the rigorous results of this paper (specifically the necessity and verification theorems) rely solely on measurable selection arguments. Consequently, our theory extends straightforwardly to the case where A is a closed subset of an arbitrary Polish space.

2.1 Probabilistic setting

We will follow a setting similar to the one in Hernández and Possamaï [21], restricting to a Markovian framework and working exclusively under the weak formulation. We fix a bounded Borel-measurable map σ: [0, T] × R^n → R^{n×d}, an initial condition x_0 ∈ R^n, and assume that there is a unique solution, denoted by P, to the martingale problem for which X is an (F^X, P)-local martingale, such that X_0 = x_0 with P-probability 1, and d[X]_t = σ(t, X_t)σ^⊤(t, X_t) dt, P-a.s. Enlarging the original probability space if necessary (see Stroock and Varadhan [43, Theorem 4.5.2]), we can find an R^d-valued Brownian motion W such that

X_t = x_0 + ∫_0^t σ(r, X_r) dW_r, t ∈ [0, T].

We now let F := (F_t)_{t∈[0,T]} be the P-augmentation of F^X.
We recall that uniqueness of the solution to the martingale problem implies that the predictable martingale representation property holds for (F, P)-martingales, which can be represented as stochastic integrals with respect to X (see Jacod and Shiryaev [27, Theorem III.4.29]). We also mention that the right-continuity of F guarantees that (F, P) satisfies the Blumenthal zero–one law and, in particular, all F_0-measurable random variables are deterministic.

We can then introduce our drift functional b: [0, T] × Ω × A → R^d, which is assumed to be Borel-measurable with respect to all its arguments. Let us recall that for any A-valued, F-predictable process α such that

E^P[ exp( ∫_0^T b(r, X_r, α_r) · dW_r − (1/2) ∫_0^T ‖b(r, X_r, α_r)‖² dr ) ] = 1,   (2.1)

we can define the probability measure P^α on (Ω, F_T), whose density with respect to P is given by

dP^α/dP := exp( ∫_0^T b(r, X_r, α_r) · dW_r − (1/2) ∫_0^T ‖b(r, X_r, α_r)‖² dr ).

Moreover, by Girsanov's theorem, the process W^α := W − ∫_0^· b(r, X_r, α_r) dr is an R^d-valued (F, P^α)-Brownian motion and we have

X_t = x_0 + ∫_0^t σ(r, X_r) b(r, X_r, α_r) dr + ∫_0^t σ(r, X_r) dW^α_r, t ∈ [0, T], P-a.s.

We define A to be the set of all A-valued, F-predictable processes α such that condition (2.1) holds. Let us emphasise that we are working under the so-called weak formulation of the problem. This means that the state process X is fixed and, in contrast to the typical strong formulation, the Brownian motion and the probability measure are not fixed. Indeed, the choice of α corresponds to the choice of the probability measure P^α, and thus impacts the distribution of the process X.

Let us now recall the celebrated result on the existence of well-behaved ω-by-ω versions of the conditional expectation. We also introduce the concatenation of a measure and a stochastic kernel.
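As a numerical sanity check on the change of measure defining P^α: for a bounded drift the stochastic exponential in (2.1) is a true martingale, so its terminal expectation equals 1. The following Monte Carlo sketch (our own illustration, with a constant scalar drift b ≡ 0.5, T = 1, d = 1) estimates E^P[exp(b W_T − b²T/2)] and finds it close to 1.

```python
import random, math

random.seed(0)
b, T, n_paths = 0.5, 1.0, 100_000

# For constant b, the integral int_0^T b dW_r equals b * W_T with
# W_T ~ N(0, T), so the stochastic exponential at T is
# exp(b * W_T - 0.5 * b^2 * T).
total = 0.0
for _ in range(n_paths):
    w_T = random.gauss(0.0, math.sqrt(T))
    total += math.exp(b * w_T - 0.5 * b * b * T)
estimate = total / n_paths

# Since E[exp(b * W_T)] = exp(b^2 T / 2), the exact expectation is 1;
# the Monte Carlo error at this sample size is well below 0.02.
assert abs(estimate - 1.0) < 0.02
```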
Recall that Ω is a Polish space and F is a countably generated σ-algebra. For P ∈ Prob(Ω) and τ ∈ T_{0,T}(F), F_τ is also countably generated, so there exists an associated regular conditional probability distribution (r.c.p.d. for short) (P^τ_x)_{x∈Ω}, see Stroock and Varadhan [43, Theorem 1.3.4], satisfying:

(i) for every x ∈ Ω, P^τ_x is a probability measure on (Ω, F);

(ii) for every E ∈ F, the mapping x ↦ P^τ_x[E] is F_τ-measurable;

(iii) the family (P^τ_x)_{x∈Ω} is a version of the conditional probability measure of P given F_τ, that is to say, for every P-integrable, F-measurable random variable ξ, we have E^P[ξ | F_τ](x) = E^{P^τ_x}[ξ], for P-a.e. x ∈ Ω;

(iv) for every x ∈ Ω, P^τ_x[Ω^x_τ] = 1, where Ω^x_τ := {x′ ∈ Ω : x′(r) = x(r), 0 ≤ r ≤ τ(x)}.

Moreover, for P ∈ Prob(Ω) and an F_τ-measurable stochastic kernel (Q^τ_x)_{x∈Ω} such that Q^τ_x[Ω^x_τ] = 1 for every x ∈ Ω, the concatenated probability measure is defined by

(P ⊗_τ Q_·)[A] := ∫_Ω P(dx) ∫_Ω 1_A(x ⊗_{τ(x)} x̃) Q_x(dx̃), ∀A ∈ F.   (2.2)

The following result, see [43, Theorem 6.1.2], gives a rigorous characterisation of the concatenation procedure.

Theorem 2.3 (Concatenated measure). Consider a stochastic kernel (Q_ω)_{ω∈Ω}, and let τ ∈ T_{0,T}(F). Suppose the map ω ↦ Q_ω is F_τ-measurable and Q_ω[Ω^ω_τ] = 1 for all ω ∈ Ω. Given P ∈ Prob(Ω), there is a unique probability measure P ⊗_{τ(·)} Q_· on (Ω, F) such that P ⊗_{τ(·)} Q_· equals P on (Ω, F_τ), and (δ_ω ⊗_{τ(ω)} Q_ω)_{ω∈Ω} is an r.c.p.d. of P ⊗_{τ(·)} Q_· | F_τ.
For some t ∈ [0, T], suppose that τ ≥ t, that M: [t, T] × Ω → R is a right-continuous, F-progressively measurable function after t, such that M_t is P ⊗_{τ(·)} Q_·-integrable, that (M_{r∧τ})_{r∈[t,T]} is an (F, P)-martingale, and that (M_r − M_{r∧τ(ω)})_{r∈[t,T]} is an (F, Q_ω)-martingale for all ω ∈ Ω. Then (M_r)_{r∈[t,T]} is an (F, P ⊗_{τ(·)} Q_·)-martingale.

In particular, for an F-measurable function ξ, E^{P⊗_τ P^τ_·}[ξ] = E^P[E^P[ξ | F_τ]] = E^P[ξ]. This is the classical tower property. Additionally, the reverse implication in the last statement of Theorem 2.3 holds by [43, Theorem 1.2.10]. In particular, the exposition above means that we can ensure the existence of probability measures indexed by (t, x, α) ∈ [0, T] × R^n × A under which the state process satisfies, for s ∈ [t, T],

X_s = x + ∫_t^s σ(r, X_r) b(r, X_r, α_r) dr + ∫_t^s σ(r, X_r) dW^α_r, P^{t,x,α}-a.s.,

where W^α is a Brownian motion with respect to P^{t,x,α} := (P^α)^t_x.

2.2 Target functional

Let us introduce the running and terminal payoff functionals

J(t, x, α) := E^{P^{t,x,α}}[ ∫_t^T f(s, x, X_s, α_s) ds + ξ(x, X_T) ],   (2.3)

where f: [0, T] × R^n × R^n × A → R and ξ: R^n × R^n → R are Borel-measurable functions. We will refer to f as the running payoff function and to ξ as the terminal payoff function. We will sometimes refer to a more generic payoff functional of the form

J(t, x, y, α) := E^{P^{t,x,α}}[ ∫_t^T f(s, y, X_s, α_s) ds + ξ(y, X_T) ], (t, x, α) ∈ [0, T] × R^n × A.

Note that J(t, x, x, α) = J(t, x, α), justifying our nomenclature. As introduced earlier, we remark that the appearance of x in both functions in the reward functional creates the time inconsistency. The goal of the controller will be, roughly speaking, to choose α to maximise (2.3).
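A concrete instance of (2.3), in the spirit of the moving-target interpretation from the introduction (the specific choice of ξ below is ours, purely for illustration): take f ≡ 0 and a quadratic terminal penalty re-centred at the current state.

```latex
% Illustrative moving-target criterion: each self t penalises the distance
% of the terminal state from its own current position x = X_t.
\xi(y, x') := -\|x' - y\|^2,
\qquad
J(t, x, \alpha) = \mathbb{E}^{\mathbb{P}^{t,x,\alpha}}\big[-\|X_T - x\|^2\big].
% Re-evaluated at s > t, the reference moves from x to X_s, so the
% continuation criterion differs from the time-t one even for fixed alpha.
```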
However, since their preferences change over time, it is not clear what we mean mathematically by this. In the next subsection, we introduce the precise notion of controls that we will be interested in.

2.3 Game formulation

We recall that a strategy profile is sub-game perfect if it prescribes a Nash equilibrium in any sub-game. In our framework, every player together with a past trajectory defines a new sub-game. This motivates the idea behind the definition of an equilibrium model; see, among others, Björk and Murgoci [4], Ekeland and Lazrak [12] and Strotz [44]. The intuition behind this consideration is that at each point in time a different player stands (who can be thought of as a different version of one-self), and we try to achieve a sub-game perfect strategy.

Let α ∈ A be an action, (t, x) ∈ [0, T] × R^n an arbitrary initial condition, and ℓ ∈ (0, T − t]. We recall that α ⊗_τ α^⋆ := α 1_{[t,τ)} + α^⋆ 1_{[τ,T]}.

Definition 2.4 (Equilibrium control). Let α^⋆ ∈ A be an admissible control. We say that α^⋆ is an equilibrium control if, for any ε > 0, we have ℓ_ε > 0, where

ℓ_ε := inf{ ℓ > 0 : ∃α ∈ A, P[{∃t ∈ [0, T], J(t, X_t, α^⋆) < J(t, X_t, α ⊗_ℓ α^⋆) − εℓ}] > 0 }.

In this case, we write α^⋆ ∈ E.

Remark 2.5. We can show that one can recover the essence of the classical definition in [5] in the following sense: assume that α^⋆ is an equilibrium control as in the previous definition, and let ε > 0. Then there exists some ℓ_ε > 0 and a set Ω̃ with P[Ω̃] = 1 such that

J(t, X_t, α^⋆) − J(t, X_t, α ⊗_ℓ α^⋆) ≥ −εℓ, for all (ℓ, α) ∈ (0, ℓ_ε) × A, on Ω̃.

Now, as ε was arbitrary, we can take a sequence ε_n = 1/n, n ∈ N^⋆, with corresponding sets Ω̃_n, and on Ω^⋆ := ∩_{n∈N^⋆} Ω̃_n we have

lim inf_{ℓ↓0} ( J(t, X_t, α^⋆) − J(t, X_t, α ⊗_ℓ α^⋆) ) / ℓ ≥ 0.
In the rest of the document we fix some $(t,x) \in [0,T] \times \mathbb{R}^n$ and study the problem
$$v(t,x) := J(t,x,\alpha^\star), \quad (t,x) \in [0,T] \times \mathbb{R}^n,\ \alpha^\star \in \mathcal{E}. \tag{P}$$
Thanks to the weak uniqueness assumption, $v$ is well-defined for all $(t,x) \in [0,T] \times \mathbb{R}^n$ and Borel-measurable.

Remark 2.6. (P) is fundamentally different from the problem of maximising $\mathcal{A} \ni \alpha \longmapsto J(t,x,\alpha)$. First, in (P) one finds $\alpha^\star \in \mathcal{A}$ and only then defines the value function, which contrasts with the classical formulation of optimal control problems. Second, the maximisation above would instead yield player $t$'s so-called pre-committed strategy.

2.4 Functional spaces

In this section, we introduce the spaces of processes that we will be using throughout this paper. We first recall the standard spaces of square-integrable processes:

• $\mathcal{S}^2(\mathbb{R}^n,\mathbb{F},\mathbb{P})$: the space of $\mathbb{F}$-progressively measurable, càdlàg processes $Y$ taking values in $\mathbb{R}^n$ such that
$$\|Y\|^2_{\mathcal{S}^2(\mathbb{R}^n,\mathbb{F},\mathbb{P})} := \mathbb{E}^{\mathbb{P}}\Big[\sup_{t \in [0,T]} \|Y_t\|^2\Big] < \infty.$$

• $\mathbb{H}^2(\mathbb{R}^d,\mathbb{F},\mathbb{P})$: the space of $\mathbb{F}$-predictable processes $Z$ taking values in $\mathbb{R}^d$ such that
$$\|Z\|^2_{\mathbb{H}^2(\mathbb{R}^d,\mathbb{F},\mathbb{P})} := \mathbb{E}^{\mathbb{P}}\bigg[\int_0^T \|Z_t\|^2\,\mathrm{d}t\bigg] < \infty.$$

For the derivative processes, which depend on the parameter $y \in \mathbb{R}^n$, we require well-posedness uniformly on compact sets. Toward this purpose, we introduce the spaces of locally square-integrable random fields.

Definition 2.7 (Locally uniform random fields). Let $\mathcal{U} = (\mathcal{U}^y)_{y \in \mathbb{R}^n}$ and $\mathcal{V} = (\mathcal{V}^y)_{y \in \mathbb{R}^n}$ be two families of stochastic processes indexed by $y$.

• We say $\mathcal{U} \in \mathcal{S}^2_{\mathrm{loc}}(\mathbb{R}^n,\mathbb{F},\mathbb{P})$ if the map $y \longmapsto \mathcal{U}^y$ is continuous from $\mathbb{R}^n$ to $\mathcal{S}^2(\mathbb{R}^n,\mathbb{F},\mathbb{P})$, and bounded on compact sets. That is, for any compact set $K \subset \mathbb{R}^n$,
$$\sup_{y \in K} \|\mathcal{U}^y\|_{\mathcal{S}^2(\mathbb{R}^n,\mathbb{F},\mathbb{P})} < \infty.$$

• We say $\mathcal{V} \in \mathbb{H}^2_{\mathrm{loc}}(\mathbb{R}^d,\mathbb{F},\mathbb{P})$ if the map $y \longmapsto \mathcal{V}^y$ is continuous from $\mathbb{R}^n$ to $\mathbb{H}^2(\mathbb{R}^d,\mathbb{F},\mathbb{P})$, and for any compact set $K \subset \mathbb{R}^n$,
$$\sup_{y \in K} \|\mathcal{V}^y\|_{\mathbb{H}^2(\mathbb{R}^d,\mathbb{F},\mathbb{P})} < \infty.$$
The spaces are equipped with the topology induced by the family of semi-norms $\{\sup_{y \in \bar B_R} \|\cdot\|\}_{R>0}$.

2.4.1 Auxiliary weighted functional spaces and norms

To carry out the proof of well-posedness, we introduce the specific polynomial weight function $\rho : \mathbb{R}^n \longrightarrow \mathbb{R}_+$ defined by $\rho(y) := (1 + \|y\|^2)^{-k}$, where $k \geq 1$ is a fixed integer chosen sufficiently large relative to the growth rate $m$ appearing in Assumption 3.11. Specifically, we require $2k \geq m$, as we will see later.

Remark 2.8 (General growth conditions). The choice of the weight function $\rho$ has been made for presentation purposes and to directly encompass the LQR example that we will present in Section 4. See also Remark 3.13.

For any $\beta > 0$ and dimension $d \in \mathbb{N}^\star$, we define the following Banach spaces of processes on $[0,T]$.

• $\mathbb{H}^2_\beta(\mathbb{R}^d,\mathbb{F},\mathbb{P})$ is the space of $\mathbb{R}^d$-valued, $\mathbb{F}$-predictable processes $Z$ such that
$$\|Z\|^2_{\mathbb{H}^2_\beta(\mathbb{R}^d,\mathbb{F},\mathbb{P})} := \mathbb{E}^{\mathbb{P}}\bigg[\int_0^T \mathrm{e}^{\beta t}\|Z_t\|^2\,\mathrm{d}t\bigg] < \infty.$$

• $\mathcal{S}^2_\beta(\mathbb{R}^d,\mathbb{F},\mathbb{P})$ is the space of $\mathbb{R}^d$-valued, $\mathbb{F}$-optional càdlàg processes $Y$ such that
$$\|Y\|^2_{\mathcal{S}^2_\beta(\mathbb{R}^d,\mathbb{F},\mathbb{P})} := \mathbb{E}^{\mathbb{P}}\Big[\sup_{t \in [0,T]} \mathrm{e}^{\beta t}\|Y_t\|^2\Big] < \infty.$$

Note that these norms are equivalent for all values of $\beta$, since $[0,T]$ is compact. Let $U = (U^y_t)_{y \in \mathbb{R}^n}$ be a random field where, for each $y$, $U^y$ is a process. We define the weighted spaces:

• $\mathcal{S}^{2,2}_{\beta,\rho}(\mathbb{R}^d,\mathbb{F},\mathbb{P})$ is the space of random fields $U$ such that $U^y \in \mathcal{S}^2_\beta(\mathbb{R}^d,\mathbb{F},\mathbb{P})$ for all $y$, the map $y \longmapsto U^y$ is continuous from $\mathbb{R}^n$ to $\mathcal{S}^2_\beta(\mathbb{R}^d,\mathbb{F},\mathbb{P})$, and
$$\|U\|^2_{\mathcal{S}^{2,2}_{\beta,\rho}(\mathbb{R}^d,\mathbb{F},\mathbb{P})} := \sup_{y \in \mathbb{R}^n}\Big\{\rho(y)\,\|U^y\|^2_{\mathcal{S}^2_\beta(\mathbb{R}^d,\mathbb{F},\mathbb{P})}\Big\} < \infty.$$

• $\mathbb{H}^{2,2}_{\beta,\rho}(\mathbb{R}^d,\mathbb{F},\mathbb{P})$ is the space of random fields $V$ such that $V^y \in \mathbb{H}^2_\beta(\mathbb{R}^d,\mathbb{F},\mathbb{P})$ for all $y$, the map $y \longmapsto V^y$ is continuous from $\mathbb{R}^n$ to $\mathbb{H}^2_\beta(\mathbb{R}^d,\mathbb{F},\mathbb{P})$, and
$$\|V\|^2_{\mathbb{H}^{2,2}_{\beta,\rho}(\mathbb{R}^d,\mathbb{F},\mathbb{P})} := \sup_{y \in \mathbb{R}^n}\Big\{\rho(y)\,\|V^y\|^2_{\mathbb{H}^2_\beta(\mathbb{R}^d,\mathbb{F},\mathbb{P})}\Big\} < \infty.$$
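To see concretely why the polynomial weight $\rho$ is needed, one can look at a toy random field that is unbounded in the parameter. The following sketch is entirely illustrative (the field $U^y_t = y\,W_t$, the horizon, and the choice $k = 1$ are ours, not the paper's): it estimates $\|U^y\|^2_{\mathcal{S}^2}$ by Monte Carlo and checks that, while the unweighted norms blow up as $|y|$ grows, the $\rho$-weighted supremum over all $y$ stays finite.

```python
import math, random

random.seed(0)

T, n_steps, n_paths = 1.0, 200, 2000
dt = T / n_steps

# Monte Carlo estimate of E[sup_{t<=T} W_t^2] for a scalar Brownian motion W.
sup_w2 = 0.0
for _ in range(n_paths):
    w, m = 0.0, 0.0
    for _ in range(n_steps):
        w += math.sqrt(dt) * random.gauss(0.0, 1.0)
        m = max(m, w * w)
    sup_w2 += m
sup_w2 /= n_paths

# For the toy field U^y_t = y * W_t we have ||U^y||^2_{S^2} = y^2 * E[sup W^2].
def s2_norm_sq(y):
    return y * y * sup_w2

rho = lambda y, k=1: (1.0 + y * y) ** (-k)   # polynomial weight, here k = 1

ys = [0.0, 1.0, 10.0, 100.0, 1000.0]
plain = [s2_norm_sq(y) for y in ys]              # blows up as |y| grows
weighted = [rho(y) * s2_norm_sq(y) for y in ys]  # bounded by sup_w2, since y^2/(1+y^2) <= 1

print(max(weighted) <= sup_w2 + 1e-12)  # True: the weighted sup-norm is finite
```

The unweighted norms grow like $y^2$, so $\mathcal{U} \notin \mathcal{S}^2$ uniformly in $y$, yet the field has a finite $\mathcal{S}^{2,2}_{\beta,\rho}$-norm as soon as $k \geq 1$, which is exactly the situation created by the quadratic terminal cost of the LQR example.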
We define the global product space $\mathcal{K}^{n,d}_\beta$ for the tuple $(Y,Z,\partial Y,\partial Z,\partial\partial Y,\partial\partial Z)$, which will solve the BSDE system (3.7) to be introduced in Section 3:
$$\mathcal{K}^{n,d}_\beta(\mathbb{F},\mathbb{P}) := \underbrace{\mathcal{S}^2_\beta(\mathbb{R},\mathbb{F},\mathbb{P}) \times \mathbb{H}^2_\beta(\mathbb{R}^d,\mathbb{F},\mathbb{P})}_{\text{value process }(Y,Z)} \times \underbrace{\mathcal{S}^{2,2}_{\beta,\rho}(\mathbb{R}^n,\mathbb{F},\mathbb{P}) \times \mathbb{H}^{2,2}_{\beta,\rho}(\mathbb{R}^{n \times d},\mathbb{F},\mathbb{P})}_{\text{gradient process }(\partial Y,\partial Z)} \times \underbrace{\mathcal{S}^{2,2}_{\beta,\rho}(\mathbb{R}^{n \times n},\mathbb{F},\mathbb{P}) \times \mathbb{H}^{2,2}_{\beta,\rho}(\mathbb{R}^{n \times n \times d},\mathbb{F},\mathbb{P})}_{\text{Hessian process }(\partial\partial Y,\partial\partial Z)}. \tag{2.4}$$

Proposition 2.9 (Banach structure). The space $\mathcal{K}^{n,d}_\beta(\mathbb{F},\mathbb{P})$ is a Banach space.

Proof. The spaces $\mathcal{S}^2_\beta(\mathbb{R},\mathbb{F},\mathbb{P})$ and $\mathbb{H}^2_\beta(\mathbb{R}^d,\mathbb{F},\mathbb{P})$ are standard spaces of square-integrable processes and are well known to be Banach spaces (in fact, Hilbert spaces). The weighted spaces $\mathcal{S}^{2,2}_{\beta,\rho}(\mathbb{R}^n,\mathbb{F},\mathbb{P})$ (resp. $\mathcal{S}^{2,2}_{\beta,\rho}(\mathbb{R}^{n\times n},\mathbb{F},\mathbb{P})$) and $\mathbb{H}^{2,2}_{\beta,\rho}(\mathbb{R}^{n\times d},\mathbb{F},\mathbb{P})$ (resp. $\mathbb{H}^{2,2}_{\beta,\rho}(\mathbb{R}^{n\times n\times d},\mathbb{F},\mathbb{P})$) are defined as spaces of continuous functions $y \longmapsto U^y$ from $\mathbb{R}^n$ into the Banach spaces $\mathcal{S}^2_\beta(\mathbb{R}^n,\mathbb{F},\mathbb{P})$ (resp. $\mathcal{S}^2_\beta(\mathbb{R}^{n\times n},\mathbb{F},\mathbb{P})$) and $\mathbb{H}^2_\beta(\mathbb{R}^{n\times d},\mathbb{F},\mathbb{P})$ (resp. $\mathbb{H}^2_\beta(\mathbb{R}^{n\times n\times d},\mathbb{F},\mathbb{P})$), equipped with a supremum norm weighted by $\rho(y)^{1/2}$. Since $\rho$ is strictly positive, these are weighted spaces of bounded continuous functions taking values in a Banach space. By standard functional-analysis results, the space of bounded continuous functions from a topological space into a Banach space is itself a Banach space under the supremum norm. Since $\mathcal{K}^{n,d}_\beta(\mathbb{F},\mathbb{P})$ is a finite Cartesian product of Banach spaces, it is itself a Banach space.

To further motivate these spaces at this point, let us present the following lemma, which asserts that they embed into the spaces of Definition 2.7.

Lemma 2.10 (Embedding of weighted spaces). Let $\beta > 0$ and $k \geq 0$. Let $\mathcal{U} = (\mathcal{U}^y)_{y \in \mathbb{R}^n}$ be a random field belonging to the weighted space $\mathcal{S}^{2,2}_{\beta,\rho}(\mathbb{R}^n,\mathbb{F},\mathbb{P})$.
Then, $\mathcal{U}$ belongs to the locally uniform space $\mathcal{S}^2_{\mathrm{loc}}(\mathbb{R}^n,\mathbb{F},\mathbb{P})$. Similarly, $\mathbb{H}^{2,2}_{\beta,\rho}(\mathbb{R}^n,\mathbb{F},\mathbb{P}) \subset \mathbb{H}^2_{\mathrm{loc}}(\mathbb{R}^n,\mathbb{F},\mathbb{P})$.

Proof. Let $\mathcal{U} \in \mathcal{S}^{2,2}_{\beta,\rho}(\mathbb{R}^n,\mathbb{F},\mathbb{P})$. By definition, there exists a constant $C_{\mathcal{U}} < \infty$ such that
$$\sup_{z \in \mathbb{R}^n} \frac{\|\mathcal{U}^z\|^2_{\mathcal{S}^2_\beta(\mathbb{R}^n,\mathbb{F},\mathbb{P})}}{(1+\|z\|^2)^k} = C_{\mathcal{U}}. \tag{2.5}$$
We must show that for any compact set $K \subset \mathbb{R}^n$, the standard $\mathcal{S}^2(\mathbb{R}^n,\mathbb{F},\mathbb{P})$-norm is uniformly bounded. Let $K$ be an arbitrary compact subset of $\mathbb{R}^n$. Since $K$ is bounded, there exists a radius $R > 0$ such that $\|y\| \leq R$ for all $y \in K$. First, we relate the $\beta$-weighted time norm to the standard one. Since $t \in [0,T]$, we have $\mathrm{e}^{\beta t} \geq 1$, so for any process $Y$,
$$\|Y\|^2_{\mathcal{S}^2(\mathbb{R}^n,\mathbb{F},\mathbb{P})} = \mathbb{E}^{\mathbb{P}}\Big[\sup_{t \in [0,T]}\|Y_t\|^2\Big] \leq \mathbb{E}^{\mathbb{P}}\Big[\sup_{t \in [0,T]}\mathrm{e}^{\beta t}\|Y_t\|^2\Big] = \|Y\|^2_{\mathcal{S}^2_\beta(\mathbb{R}^n,\mathbb{F},\mathbb{P})}.$$
Next, we handle the parameter weight. For any $y \in K$,
$$\|\mathcal{U}^y\|^2_{\mathcal{S}^2(\mathbb{R}^n,\mathbb{F},\mathbb{P})} \leq \|\mathcal{U}^y\|^2_{\mathcal{S}^2_\beta(\mathbb{R}^n,\mathbb{F},\mathbb{P})} = (1+\|y\|^2)^k\,\frac{\|\mathcal{U}^y\|^2_{\mathcal{S}^2_\beta(\mathbb{R}^n,\mathbb{F},\mathbb{P})}}{(1+\|y\|^2)^k} \leq (1+R^2)^k \sup_{z \in \mathbb{R}^n}\bigg\{\frac{\|\mathcal{U}^z\|^2_{\mathcal{S}^2_\beta(\mathbb{R}^n,\mathbb{F},\mathbb{P})}}{(1+\|z\|^2)^k}\bigg\} = (1+R^2)^k C_{\mathcal{U}}.$$
The right-hand side is a finite constant independent of $y \in K$. Thus, $\sup_{y \in K}\|\mathcal{U}^y\|_{\mathcal{S}^2(\mathbb{R}^n,\mathbb{F},\mathbb{P})} < \infty$. Continuity of $y \longmapsto \mathcal{U}^y$ in the standard norm follows immediately from continuity in the weighted norm, as the weight function $(1+\|y\|^2)^{-k}$ is smooth and bounded away from zero on compacts. Therefore, $\mathcal{U} \in \mathcal{S}^2_{\mathrm{loc}}(\mathbb{R}^n,\mathbb{F},\mathbb{P})$. The remaining inclusion is proved analogously.

3 Main results

In this section, we present the core theoretical contributions of this paper. We characterise the equilibrium strategies for state-dependent time-inconsistent control problems through a probabilistic approach.
The roadmap is as follows: (i) we first provide an informal derivation of the system of backward stochastic differential equations (BSDEs) that characterises the equilibrium, building intuition from the extended HJB equation; (ii) we then establish an extended dynamic programming principle (DPP), which generalises the Bellman principle by accounting for the changing preferences of the agent; (iii) we derive the BSDE system (as a necessary condition for equilibria) and prove a verification theorem (its sufficiency counterpart); (iv) we prove the well-posedness (existence and uniqueness) of this system.

3.1 An informal derivation of the BSDE system

The purpose of this section is to informally justify the BSDE system that will be at the heart of this work. This derivation is based on the extended HJB equation [8, Definition 15.4]; we thus remain in the Markovian, feedback-control setting (meaning we look for an equilibrium control $\alpha^\star$ that is a deterministic feedback function of time and state, i.e., $\alpha^\star_t = \alpha^\star(t,X_t)$ for some Borel-measurable map $\alpha^\star$), and we use the weak formulation throughout. For simplicity in this derivation, let $n = d = 1$ and let the dynamics of the state process $(X_t)_{t \geq 0}$ under $\mathbb{P}^\alpha$ be given by
$$X_t = x_0 + \int_0^t \sigma(r,X_r) b(r,X_r,\alpha_r)\,\mathrm{d}r + \int_0^t \sigma(r,X_r)\,\mathrm{d}W^\alpha_r, \quad t \in [0,T]. \tag{3.1}$$
Once again, the payoff functional is given by
$$J(t,x,\alpha) := \mathbb{E}^{\mathbb{P}^{t,x,\alpha}}\bigg[\int_t^T f(s,x,X_s,\alpha_s)\,\mathrm{d}s + \xi(x,X_T)\bigg], \quad (t,x) \in [0,T] \times \mathbb{R}.$$
For a fixed control $\alpha^\star$, we let $V(t,x) := J(t,x,\alpha^\star)$ denote the equilibrium value function, and let $J(t,x,y)$ denote the auxiliary value function with fixed preference parameter $y$, defined as
$$J(t,x,y) := \mathbb{E}^{\mathbb{P}^{t,x,\alpha^\star}}\bigg[\int_t^T f(s,y,X_s,\alpha^\star_s)\,\mathrm{d}s + \xi(y,X_T)\bigg], \quad (t,x,y) \in [0,T] \times \mathbb{R} \times \mathbb{R}.$$
According to the theory developed in Björk and Murgoci [5], the pair $(V,J)$ must satisfy the extended HJB system, which we now present, particularised to our case. For any $(t,x) \in [0,T] \times \mathbb{R}^n$ and action $a \in A$, we define the infinitesimal generator $\mathcal{L}^a_t$ acting on smooth functions $\phi \in C^2(\mathbb{R}^n)$ by
$$\mathcal{L}^a_t\phi(x) := b(t,x,a)\sigma(t,x)\nabla_x\phi(x) + \frac12\mathrm{Tr}\big[\sigma(t,x)\sigma(t,x)^\top\nabla^2_{xx}\phi(x)\big].$$
For $(t,x,y) \in [0,T) \times \mathbb{R} \times \mathbb{R}$, the system is
$$\begin{cases}
\partial_t V(t,x) + \sup_{a \in A}\big\{f(t,x,x,a) + b(t,x,a)\sigma(t,x)\big(\partial_x V(t,x) - \partial_y J(t,x,x)\big)\big\} \\
\quad + \dfrac12\sigma^2(t,x)\partial^2_{xx}V(t,x) - \sigma^2(t,x)\partial^2_{xy}J(t,x,x) - \dfrac12\sigma^2(t,x)\partial^2_{yy}J(t,x,x) = 0, \\
\partial_t J(t,x,y) + \mathcal{L}^{\alpha^\star(t,x)}_t J(t,x,y) + f(t,y,x,\alpha^\star(t,x)) = 0, \\
V(T,x) = \xi(x,x), \quad J(T,x,y) = \xi(y,x).
\end{cases} \tag{3.2}$$
The equilibrium control $\alpha^\star(t,x)$ is defined as the argument attaining the supremum in the first equation. Note that in the second equation, the generator $\mathcal{L}^{\alpha^\star(t,x)}_t$ acts on the variable $x$ with $y$ fixed. Note also that the equilibrium control, which attains the supremum in the first equation, appears in the second equation, while the function $J(t,x,y)$ is itself part of the first equation. The system is therefore strongly entangled, and it is hard to determine its well-posedness using analytical techniques. Within the supremum in (3.2), the effective gradient acting on the drift is not the standard $\partial_x V$, but the difference $\partial_x V - \partial_y J$. This specific structure motivates the definition of our Hamiltonian below. The diffusion part includes the standard Hessian $\partial^2_{xx}V$, corrected by the mixed derivative $\sigma^2\partial^2_{xy}J$ and the parameter Hessian $\frac12\sigma^2\partial^2_{yy}J$. To derive the BSDE system, we differentiate the second equation in (3.2) with respect to $y$ to find the dynamics of the derivatives of $J^y(t,x) := J(t,x,y)$.
For $(t,x,y) \in [0,T) \times \mathbb{R} \times \mathbb{R}$,
$$\begin{cases}
\big(\partial_t + \mathcal{L}^{\alpha^\star(t,x)}_t\big)\partial_y J^y(t,x) + \partial_y f(t,y,x,\alpha^\star(t,x)) = 0, \\
\big(\partial_t + \mathcal{L}^{\alpha^\star(t,x)}_t\big)\partial^2_{yy} J^y(t,x) + \partial^2_{yy} f(t,y,x,\alpha^\star(t,x)) = 0.
\end{cases} \tag{3.3}$$
We now define the stochastic processes corresponding to these quantities along the equilibrium trajectory $X_t$:
$$Y_t = V(t,X_t), \quad Z_t = \sigma(t,X_t)\partial_x V(t,X_t), \quad t \in [0,T],$$
$$\partial Y^y_t = \partial_y J(t,X_t,y), \quad \partial Z^y_t = \sigma(t,X_t)\partial^2_{xy}J(t,X_t,y), \quad t \in [0,T],\ y \in \mathbb{R},$$
$$\partial\partial Y^y_t = \partial^2_{yy}J(t,X_t,y), \quad \partial\partial Z^y_t = \sigma(t,X_t)\partial^3_{xyy}J(t,X_t,y), \quad t \in [0,T],\ y \in \mathbb{R}.$$
Applying Itô's formula to $Y_t$, the drift is given by $(\partial_t + \mathcal{L}^{\alpha^\star(t,X_t)})V(t,X_t)$. By rearranging the first equation of the extended HJB system, we can express this as
$$\big(\partial_t + \mathcal{L}^{\alpha^\star(t,X_t)}\big)V(t,X_t) = -f\big(t,X_t,X_t,\alpha^\star(t,X_t)\big) + b\big(t,X_t,\alpha^\star(t,X_t)\big)\sigma(t,X_t)\partial_y J(t,X_t,X_t) + \sigma^2(t,X_t)\partial^2_{xy}J(t,X_t,X_t) + \frac12\sigma^2(t,X_t)\partial^2_{yy}J(t,X_t,X_t).$$
Substituting the process definitions (e.g., $\sigma\partial_y J = \sigma\partial Y^{X_t}$), the driver for $Y_t$ becomes
$$f\big(t,X_t,X_t,\alpha^\star(t,X_t)\big) - b\big(t,X_t,\alpha^\star(t,X_t)\big)\sigma(t,X_t)\partial Y^{X_t}_t - \sigma(t,X_t)\partial Z^{X_t}_t - \frac12\sigma^2(t,X_t)\partial\partial Y^{X_t}_t.$$
We define the extended Hamiltonian $H$ to encapsulate the maximisation problem. For arguments $(t,x,z,\gamma,\eta,\rho)$, representing $(t,X_t,Z_t,\partial Y_t,\partial\partial Y_t,\partial Z_t)$ in $\mathbb{R}$,
$$H(t,x,z,\gamma,\eta,\rho) := \sup_{a \in A}\big\{f(t,x,x,a) + b(t,x,a)\big(z - \sigma(t,x)\gamma\big)\big\} - \sigma(t,x)\rho - \frac12\sigma^2(t,x)\eta. \tag{3.4}$$
We assume, for simplicity in this expository section, that there exists a unique $A$-valued, Borel-measurable map $\mathcal{V}^\star$ satisfying the maximisation condition.
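For intuition, the map $\mathcal{V}^\star$ can be evaluated pointwise. The sketch below is purely illustrative (the scalar coefficients $f$, $b$, $\sigma$ and the compact action set $A = [-2,2]$ are our own choices, not the paper's): it recovers the maximiser of the expression inside the supremum in (3.4) by grid search and compares it with the first-order condition.

```python
# Illustrative sketch: pointwise evaluation of the argmax map V*(t, x, z, gamma)
# for the extended Hamiltonian (3.4), scalar case, by grid search over A = [-2, 2].
# The coefficients below are toy choices of ours, not the paper's.

def f(t, x, y, a):          # running payoff: concave in the action
    return -0.5 * a * a - 0.1 * (x - y) ** 2

def b(t, x, a):             # drift coefficient, linear in the action
    return 0.3 * x + a

def sigma(t, x):            # non-degenerate diffusion
    return 1.0

def V_star(t, x, z, gamma, grid=2001):
    """Grid-search argmax of a -> f(t,x,x,a) + b(t,x,a) * (z - sigma(t,x)*gamma)."""
    best_a, best_val = None, -float("inf")
    for i in range(grid):
        a = -2.0 + 4.0 * i / (grid - 1)
        val = f(t, x, x, a) + b(t, x, a) * (z - sigma(t, x) * gamma)
        if val > best_val:
            best_a, best_val = a, val
    return best_a

# For this toy f and b the first-order condition reads -a + (z - gamma) = 0,
# i.e. a* = z - gamma (when it lies inside A), which the grid search reproduces
# up to the grid resolution.
a_num = V_star(0.0, 1.0, z=0.7, gamma=0.2)
print(abs(a_num - 0.5) < 1e-2)  # True
```

Note how the argmax depends on $z - \sigma\gamma$ rather than on $z$ alone: the inconsistency correction $\gamma$ (the value of $\partial Y$ along the diagonal $y = X_t$) shifts the effective co-state, exactly as in the optimality condition derived below.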
The resulting BSDE system, under the reference measure $\mathbb{P}$, is
$$\begin{cases}
Y_t = \xi(X_T,X_T) + \displaystyle\int_t^T H\big(r,X_r,Z_r,\partial Y^{X_r}_r,\partial\partial Y^{X_r}_r,\partial Z^{X_r}_r\big)\,\mathrm{d}r - \int_t^T Z_r\,\mathrm{d}W_r, & t \in [0,T], \\
\partial Y^y_t = \partial_y\xi(y,X_T) + \displaystyle\int_t^T \big(\partial_y f(r,y,X_r,\alpha^\star_r) + \partial Z^y_r\,b(r,X_r,\alpha^\star_r)\big)\,\mathrm{d}r - \int_t^T \partial Z^y_r\,\mathrm{d}W_r, & t \in [0,T],\ y \in \mathbb{R}^n, \\
\partial\partial Y^y_t = \partial^2_{yy}\xi(y,X_T) + \displaystyle\int_t^T \big(\partial^2_{yy} f(r,y,X_r,\alpha^\star_r) + \partial\partial Z^y_r\,b(r,X_r,\alpha^\star_r)\big)\,\mathrm{d}r - \int_t^T \partial\partial Z^y_r\,\mathrm{d}W_r, & t \in [0,T],\ y \in \mathbb{R}^n.
\end{cases} \tag{3.5}$$
One might ask why the system requires three equations (including the Hessian $\partial\partial Y$) when the original problem is characterised by $V$ and $J$. The reason lies in the second-order adjustment terms that appear in the equation for $V$. In the context of BSDEs, the process $\partial Z^y$ carries the information of the mixed derivative (specifically $\sigma\partial^2_{xy}J$). To write our system, we need the dynamics of the gradient $\partial Y^y$. However, as seen in (3.3), the dynamics of the first derivative depend on second derivatives such as $\partial^2_{yy}J$. Therefore, to determine the evolution of the gradient, we must simultaneously use the Hessian process $\partial\partial Y^y$.

3.2 Assumptions

We require the following regularity assumptions for the validity of our main results.

Assumption 3.1 (Regularity and growth of the coefficients). We assume the following conditions on the problem data:

(i) continuity: the functions $b$, $\sigma$, $f$, and $\xi$ are continuous in all their arguments;

(ii) regularity of the state dynamics: the drift $b : [0,T] \times \mathbb{R}^n \times A \longrightarrow \mathbb{R}^d$ is Lipschitz-continuous with respect to the state variable $x$, uniformly in $(t,a)$.
That is, there exists $K > 0$ such that for all $t \in [0,T]$, $a \in A$, and $(x,x') \in \mathbb{R}^n \times \mathbb{R}^n$,
$$\|b(t,x,a) - b(t,x',a)\| \leq K\|x - x'\|;$$

(iii) regularity and growth of the cost: for every fixed $(t,x,a)$, the cost functions $y \longmapsto f(t,x,y,a)$ and $y \longmapsto \xi(y,x)$ belong to $C^2(\mathbb{R}^n)$. Moreover, the functions and their partial derivatives satisfy a polynomial growth condition: there exist constants $C > 0$ and $m \geq 1$ such that for all $(t,x,y,a) \in [0,T] \times \mathbb{R}^n \times \mathbb{R}^n \times A$,
$$|f(t,x,y,a)| + \|\nabla_y f(t,x,y,a)\| + \|\nabla^2_{yy} f(t,x,y,a)\| + |\xi(y,x)| + \|\nabla_y\xi(y,x)\| + \|\nabla^2_{yy}\xi(y,x)\| \leq C\big(1 + \|x\|^m + \|y\|^m + \|a\|^m\big);$$

(iv) integrability of the state: for any admissible control $\alpha \in \mathcal{A}$ and any $p \geq 1$, the controlled state process $X$ admits finite moments of order $p$, uniformly in time:
$$\mathbb{E}^{\mathbb{P}^\alpha}\Big[\sup_{t \in [0,T]}\|X_t\|^p\Big] < \infty;$$

(v) non-degeneracy: the diffusion matrix $\sigma : [0,T] \times \mathbb{R}^n \longrightarrow \mathbb{R}^{n \times d}$ is bounded and of full rank.

The Lipschitz-continuity of the coefficients ensures that the state process remains well behaved under reasonable controls. We formalise this in the following lemma, which justifies the integrability of the polynomial costs.

Lemma 3.2 (Moment estimates for the state process). Let Assumption 3.1(ii) hold, and let $\alpha \in \mathcal{A}$ be an admissible control such that the drift $b^\alpha(t,x) := b(t,x,\alpha_t)$ satisfies the linear growth condition
$$\|b^\alpha(t,x)\| \leq C(1 + \|x\|), \quad \forall (t,x) \in [0,T] \times \mathbb{R}^n.$$
This holds, for instance, if $\alpha$ is bounded or is a linear feedback control as in the LQR case. Then, for any $p \geq 1$, the state process $X$ admits finite moments of order $p$ under the controlled measure $\mathbb{P}^\alpha$, uniformly in time:
$$\mathbb{E}^{\mathbb{P}^\alpha}\Big[\sup_{t \in [0,T]}\|X_t\|^p\Big] < \infty.$$

Proof. This is a standard result in the theory of stochastic differential equations.
Under the linear growth condition on the drift $b^\alpha$ and the diffusion $\sigma$ (implied by Assumption 3.1(ii)), the existence of moments of all orders follows from standard estimates, such as those in [29, Theorem 5.2.2.9].

3.3 The extended dynamic programming principle

As with all time-inconsistent problems, the classical Bellman principle fails because the cost functional changes with the state as time advances. However, we manage to prove an equality, which we call the extended dynamic programming principle, that resembles a classical DPP, and in fact implies it in the absence of $x$ in the reward functional.

Theorem 3.3 (Extended dynamic programming principle). Let Assumption 3.1 hold and let $\alpha^\star \in \mathcal{E}$ be an equilibrium control. Then, for any $t \in [0,T]$, for all $s \in [0,t]$ and $x \in \mathbb{R}^n$, we have
$$v(s,x) = \sup_{\alpha \in \mathcal{A}} \mathbb{E}^{\mathbb{P}^{s,x,\alpha}}\Bigg[v(t,X_t) + \int_s^t \bigg(f(r,X_r,X_r,\alpha_r) - b(r,X_r,\alpha_r) \cdot \sigma(r,X_r)^\top \mathbb{E}^{\mathbb{P}^{r,X_r,\alpha^\star}}\bigg[\nabla_y\xi(X_r,X_T) + \int_r^T \nabla_y f\big(u,X_r,X_u,\alpha^\star_u\big)\,\mathrm{d}u\bigg]$$
$$- \mathrm{Tr}\bigg[\sigma(r,X_r)\sigma(r,X_r)^\top \mathbb{E}^{\mathbb{P}^{r,X_r,\alpha^\star}}\bigg[\nabla^2_{yx}\xi(X_r,X_T) + \int_r^T \nabla^2_{yx} f\big(u,X_r,X_u,\alpha^\star_u\big)\,\mathrm{d}u\bigg]\bigg]$$
$$- \frac12\mathrm{Tr}\bigg[\sigma(r,X_r)\sigma(r,X_r)^\top \mathbb{E}^{\mathbb{P}^{r,X_r,\alpha^\star}}\bigg[\nabla^2_{yy}\xi(X_r,X_T) + \int_r^T \nabla^2_{yy} f\big(u,X_r,X_u,\alpha^\star_u\big)\,\mathrm{d}u\bigg]\bigg]\bigg)\,\mathrm{d}r\Bigg]. \tag{3.6}$$
Furthermore, the equilibrium control $\alpha^\star$ attains the supremum in (3.6).

The last three rows represent the cost of time-inconsistency: the drift in value caused solely by the updating of preferences along the path. This result is the main building block for the rest of the theory developed in this paper. See Section A for the proof.

3.4 A necessity result

We recall that, for a fixed equilibrium control $\alpha^\star$, we will very often use the notation
$$J(t,x,y) := \mathbb{E}^{\mathbb{P}^{t,x,\alpha^\star}}\bigg[\int_t^T f(u,y,X_u,\alpha^\star_u)\,\mathrm{d}u + \xi(y,X_T)\bigg], \quad t \in [0,T],\ y \in \mathbb{R}^n.$$
In other words, $J(t,x,y)$ represents the payoff under the equilibrium control if we were to freeze the preference parameter at $y$. Using the extended DPP, we can formally characterise the equilibrium via the system of BSDEs (3.7): we identify the scalar value process $Y_t = v(t,X_t)$, the gradient vector process $\partial Y^y_t = \partial_y J(t,X_t,y)$, and the Hessian matrix process $\partial\partial Y^y_t = \partial^2_{yy}J(t,X_t,y)$. The next theorem guarantees that smooth equilibrium controls implicitly define solutions to (3.7).
$$\begin{cases}
Y_t = \xi(X_T,X_T) + \displaystyle\int_t^T H\big(r,X_r,Z_r,\partial Y^{X_r}_r,\partial\partial Y^{X_r}_r,\partial Z^{X_r}_r\big)\,\mathrm{d}r - \int_t^T Z_r\,\mathrm{d}W_r, & t \in [0,T], \\
\partial Y^y_t = \nabla_y\xi(y,X_T) + \displaystyle\int_t^T \big(\nabla_y f(r,y,X_r,\alpha^\star_r) + \partial Z^y_r\,b(r,X_r,\alpha^\star_r)\big)\,\mathrm{d}r - \int_t^T \partial Z^y_r\,\mathrm{d}W_r, & t \in [0,T],\ y \in \mathbb{R}^n, \\
\partial\partial Y^y_t = \nabla^2_{yy}\xi(y,X_T) + \displaystyle\int_t^T \big(\nabla^2_{yy} f(r,y,X_r,\alpha^\star_r) + \partial\partial Z^y_r\,b(r,X_r,\alpha^\star_r)\big)\,\mathrm{d}r - \int_t^T \partial\partial Z^y_r\,\mathrm{d}W_r, & t \in [0,T],\ y \in \mathbb{R}^n.
\end{cases} \tag{3.7}$$
Here, the extended Hamiltonian $H$ is defined to match the variables introduced in the informal derivation. For a state $x \in \mathbb{R}^n$, it takes as arguments the co-state $z \in \mathbb{R}^d$, the parameter gradient $\gamma \in \mathbb{R}^n$, the parameter Hessian $\eta \in \mathcal{M}_n(\mathbb{R})$, and the mixed consistency term $\rho \in \mathcal{M}_{n,d}(\mathbb{R})$:
$$H(t,x,z,\gamma,\eta,\rho) := \sup_{a \in A}\big\{f(t,x,x,a) + b(t,x,a) \cdot \big(z - \sigma(t,x)^\top\gamma\big)\big\} - \frac12\mathrm{Tr}\big[\sigma(t,x)\sigma(t,x)^\top\eta\big] - \mathrm{Tr}\big[\sigma(t,x)\rho^\top\big]. \tag{3.8}$$

Remark 3.4 (Dimensionality of the adjoint processes). Let us clarify the dimensions of the processes appearing in the system (3.7). Recall that the state process $X$ takes values in $\mathbb{R}^n$ and the Brownian motion $W$ in $\mathbb{R}^d$.

• Value process: $Y$ is scalar-valued in $\mathbb{R}$. Its volatility $Z$ takes values in $\mathbb{R}^d$.

• Gradient process: $\partial Y$ takes values in $\mathbb{R}^n$ (representing $\nabla_y J$). Its volatility $\partial Z$ is defined as a matrix in $\mathbb{R}^{n \times d}$.
This specific dimension is required by the Hamiltonian term $\mathrm{Tr}[\sigma\rho^\top]$ in (3.8). Since $\sigma \in \mathbb{R}^{n \times d}$, the variable $\rho$ (identified with $\partial Z$) must be in $\mathbb{R}^{n \times d}$ for the product $\sigma\rho^\top$ to be a square matrix in $\mathbb{R}^{n \times n}$.

• Hessian process: $\partial\partial Y$ takes values in $\mathbb{R}^{n \times n}$ (representing $\nabla^2_{yy}J$). Consequently, its volatility $\partial\partial Z$ is a rank-3 tensor in $\mathbb{R}^{n \times n \times d}$, representing the sensitivity of each entry of the Hessian matrix to the $d$ components of the Brownian motion.

Remark 3.5 (Consistency with the classical theory). The Hamiltonian defined in (3.8) includes the terms involving $\gamma$, $\eta$, and $\rho$, which differ from the standard Hamiltonian in time-consistent stochastic control. These terms represent the inconsistency adjustment. Indeed, consider a standard time-consistent problem where the cost functions $f$ and $\xi$ do not depend on the parameter $y$. In this case, the auxiliary value function $J(t,x,y)$ is independent of $y$, implying that the derivatives $\nabla_y J$, $\nabla^2_{yy}J$, and $\nabla^2_{xy}J$ vanish. Consequently, the inputs $\gamma$, $\eta$, and $\rho$ are zero, and the Hamiltonian reduces to
$$H(t,x,z,0,0,0) = \sup_{a \in A}\big\{f(t,x,a) + b(t,x,a) \cdot z\big\}.$$
Thus, we recover the standard Hamiltonian from classical stochastic control theory.

Let us define what we mean by a solution to such a system.

Definition 3.6. We say that $(Y,Z,\partial Y,\partial Z,\partial\partial Y,\partial\partial Z)$ is a solution to the system (3.7) if

(i) the system of equations (3.7) holds $\mathbb{P}$–a.s.;

(ii) the value process and its control satisfy the standard integrability $Y \in \mathcal{S}^2(\mathbb{R},\mathbb{F},\mathbb{P})$, $Z \in \mathbb{H}^2(\mathbb{R}^d,\mathbb{F},\mathbb{P})$;

(iii) the derivative random fields belong to the locally uniform spaces, that is, for any $\psi \in \{\partial Y, \partial\partial Y\}$ and $\phi \in \{\partial Z, \partial\partial Z\}$,
$$\psi \in \mathcal{S}^2_{\mathrm{loc}}(\mathbb{R}^{k_1},\mathbb{F},\mathbb{P}), \quad \phi \in \mathbb{H}^2_{\mathrm{loc}}(\mathbb{R}^{k_2},\mathbb{F},\mathbb{P}),$$
where $k_1$ and $k_2$ denote the appropriate dimensions of the derivative random fields.
In other words, we ask the processes to be in the classical spaces for solutions of BSDEs, but we additionally ask that the norms of the families indexed by the parameter $y$ be uniformly bounded in the sense of the norm of convergence over compact subsets. Compared with the definition of solution given in Hernández and Possamaï [21], where the space in which the uni-parametric family took values was already compact, we need to consider a weaker norm.

Theorem 3.7 (Necessity). Let Assumption 3.1 hold and let $\alpha^\star \in \mathcal{A}$ be an equilibrium control in the sense of Definition 2.4. Assume that the equilibrium value function $V(t,x) := J(t,x,\alpha^\star)$ belongs to $C^{1,2}([0,T) \times \mathbb{R}^n) \cap C^0([0,T] \times \mathbb{R}^n)$, and that the parametric function
$$J(t,x,y) := \mathbb{E}^{\mathbb{P}^{t,x,\alpha^\star}}\bigg[\int_t^T f(s,y,X_s,\alpha^\star_s)\,\mathrm{d}s + \xi(y,X_T)\bigg]$$
belongs to $C^{1,2,2}([0,T) \times \mathbb{R}^n \times \mathbb{R}^n) \cap C^0([0,T] \times \mathbb{R}^n \times \mathbb{R}^n)$. Then, the processes $(Y,Z,\partial Y,\partial Z,\partial\partial Y,\partial\partial Z)$ defined by
$$Y_t := V(t,X_t), \quad Z_t := \nabla_x V(t,X_t)\sigma(t,X_t), \quad t \in [0,T],$$
$$\partial Y^y_t := \nabla_y J(t,X_t,y), \quad \partial Z^y_t := \nabla^2_{yx}J(t,X_t,y)\sigma(t,X_t), \quad t \in [0,T],\ y \in \mathbb{R}^n,$$
$$\partial\partial Y^y_t := \nabla^2_{yy}J(t,X_t,y), \quad \partial\partial Z^y_t := \nabla^3_{xyy}J(t,X_t,y)\sigma(t,X_t), \quad t \in [0,T],\ y \in \mathbb{R}^n,$$
provided they belong to the suitable spaces stated in Definition 3.6, solve the BSDE system (3.7). Furthermore, $\alpha^\star$ satisfies the optimality condition
$$\alpha^\star_t \in \operatorname*{argmax}_{a \in A}\Big\{f(t,X_t,X_t,a) + b(t,X_t,a) \cdot \big(Z_t - \sigma(t,X_t)^\top\partial Y^{X_t}_t\big)\Big\}, \quad \mathrm{d}t \otimes \mathbb{P}\text{–a.e.} \tag{3.9}$$
The proof can be found in Section B.

Remark 3.8 (Structure of the inconsistency adjustment). In the optimality condition above, it is important to note that the auxiliary function $J(t,x,y)$ and the process $\partial Y^{X_t}_t$ are defined for a fixed equilibrium strategy $\alpha^\star$.
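Since the second equation in (3.7) is linear, the gradient field $\partial Y^y$ admits the conditional-expectation representation that also appears in the adjustment terms of (3.6). The Monte Carlo sketch below uses a toy specification of ours (not the paper's general setting): $f = 0$, $\xi(y,x) = \frac{\Gamma}{2}(x-y)^2$, $\alpha^\star \equiv 0$ and driftless dynamics $\mathrm{d}X = \sigma\,\mathrm{d}W$, so that $\partial Y^y_t = \Gamma(y - x)$ is available in closed form as a sanity check.

```python
import math, random
random.seed(1)

# Monte Carlo sketch (toy specification, ours) of the representation
#   dY^y_t = E[ grad_y xi(y, X_T) + int_t^T grad_y f dr | X_t = x ]
# under the equilibrium measure. Here f = 0, xi(y, x) = 0.5*Gamma*(x - y)^2,
# alpha* = 0 and dX = sigma dW, so dY^y_t = Gamma*(y - E[X_T]) = Gamma*(y - x).

Gamma, sigma_c = 2.0, 0.5
t, T, x, y = 0.0, 1.0, 1.0, 1.5
n_steps, n_paths = 100, 20000
dt = (T - t) / n_steps

acc = 0.0
for _ in range(n_paths):
    X = x
    for _ in range(n_steps):          # Euler scheme for the driftless state
        X += sigma_c * math.sqrt(dt) * random.gauss(0.0, 1.0)
    acc += Gamma * (y - X)            # sample of grad_y xi(y, X_T)
dY_mc = acc / n_paths

dY_exact = Gamma * (y - x)            # closed form for this toy specification
print(dY_mc, dY_exact)
```

For a genuine equilibrium $\alpha^\star \neq 0$ the expectation would be taken under $\mathbb{P}^{t,x,\alpha^\star}$, i.e. the simulated paths would carry the equilibrium drift, but the structure of the estimator is unchanged.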
3.5 Verification theorem

We now present the verification theorem, which states that a solution to the derived BSDE system satisfying the Hamiltonian maximisation condition yields an equilibrium control.

Theorem 3.9 (Verification). Let Assumption 3.1 hold. Assume there exists a solution $(Y,Z,\partial Y,\partial Z,\partial\partial Y,\partial\partial Z)$ to the system (3.7) in the sense of Definition 3.6. Define the candidate feedback control process $\alpha^\star = (\alpha^\star_t)_{t \in [0,T]}$ by the condition that it maximises the extended Hamiltonian:
$$\alpha^\star_t \in \operatorname*{argmax}_{a \in A}\Big\{f(t,X_t,X_t,a) + b(t,X_t,a) \cdot \big(Z_t - \sigma(t,X_t)^\top\partial Y^{X_t}_t\big)\Big\}, \quad \mathrm{d}t \otimes \mathrm{d}\mathbb{P}\text{–a.e.} \tag{3.10}$$
Suppose further that

(i) the control process $\alpha^\star$ is admissible, i.e., $\alpha^\star \in \mathcal{A}$;

(ii) the function $v(t,x)$ identified with $Y$ via $Y_t = v(t,X_t)$ belongs to $C^{1,2}([0,T) \times \mathbb{R}^n) \cap C^0([0,T] \times \mathbb{R}^n)$.

Then, $\alpha^\star$ is an equilibrium control, and $Y$ is the associated value process, i.e., $Y_t = J(t,X_t,\alpha^\star)$. The proof can be found in Section C.

Remark 3.10 (Existence of a measurable equilibrium feedback). In the statement of Theorem 3.9, we defined the candidate control $\alpha^\star$ via the maximisation of the Hamiltonian, assuming that an admissible, measurable selection of the argmax exists. Let us briefly mention why this assumption is perfectly reasonable in our setting. Consider the set-valued map $\Phi : [0,T] \times \mathbb{R}^n \times \mathbb{R}^d \times \mathbb{R}^n \rightrightarrows A$ defined by the set of maximisers
$$\Phi(t,x,z,\gamma) := \operatorname*{argmax}_{a \in A}\big\{f(t,x,x,a) + b(t,x,a) \cdot (z - \sigma(t,x)^\top\gamma)\big\}.$$
Under Assumption 3.1, the coefficients $b$, $\sigma$, and $f$ are continuous in all arguments. Consequently, the function being maximised is jointly continuous in $((t,x,z,\gamma),a)$, which implies that the map $\Phi$ has a measurable graph and takes closed values. Since the action space $A$ is a closed subset of a Polish space (and assuming the maximum is attained, e.g.
, if $A$ is compact or under suitable coercivity conditions), the Kuratowski–Ryll-Nardzewski selection theorem (or rather a corollary of it; see, e.g., [1, Theorem 17.18]) guarantees the existence of a Borel-measurable function $\mathcal{V}^\star : [0,T] \times \mathbb{R}^n \times \mathbb{R}^d \times \mathbb{R}^n \longrightarrow A$ such that $\mathcal{V}^\star(t,x,z,\gamma) \in \Phi(t,x,z,\gamma)$ for all inputs. Defining the process $\alpha^\star_t := \mathcal{V}^\star(t,X_t,Z_t,\partial Y^{X_t}_t)$ yields an $\mathbb{F}$-predictable control candidate.

3.6 Well-posedness of the solution

We finish with a result guaranteeing existence of solutions in the sense of Definition 3.6. We first define the driver functions $G_1$ and $G_2$ corresponding to the second and third equations of the system (3.7). We denote the arguments by $(t,x,y,z,\gamma,v,\bar v)$, where $z$ represents the volatility of the value process $Z_t$, $\gamma$ represents the inconsistency term $\partial Y^{X_t}_t$, and $v$, $\bar v$ represent the derivative volatilities $\partial Z^y_t$ and $\partial\partial Z^y_t$, respectively:
$$G_1(t,x,y,z,\gamma,v) := \nabla_y f(t,y,x,\alpha^\star) + v\,b(t,x,\alpha^\star), \quad G_2(t,x,y,z,\gamma,\bar v) := \nabla^2_{yy} f(t,y,x,\alpha^\star) + \bar v\,b(t,x,\alpha^\star),$$
where $\alpha^\star := \mathcal{V}^\star(t,x,z,\gamma)$ (see Remark 3.10).

Assumption 3.11 (Drivers' integrability and regularity). Let $\Theta := (z,u,v,\bar v)$ be the vector of inputs for the drivers (representing the $Z$, $\partial Y$, $\partial Z$, and $\partial\partial Z$ components, respectively). We assume there exists a constant $C > 0$ such that:

(i) regularity of the Hamiltonian driver $H$: the driver of the value process satisfies a Lipschitz-continuity condition; for any $(t,x)$ and inputs $\Theta$, $\Theta'$,
$$|H(t,x,\Theta) - H(t,x,\Theta')| \leq C\|\Theta - \Theta'\|;$$

(ii) structure of the derivative drivers $G \in \{G_1,G_2\}$: the drivers for the gradient and Hessian processes satisfy a Lipschitz-continuity condition; for any parameter $y$ and input vectors $\Theta$, $\Theta'$,
$$|G(t,x,y,\Theta) - G(t,x,y,\Theta')| \leq C\|\Theta - \Theta'\|;$$

(iii) integrability of source terms.
The terminal conditions and the drivers evaluated at the null input vector $\Theta_0 := 0$ satisfy the following integrability requirements:

• value process source: the diagonal terminal cost and the base Hamiltonian are square-integrable,
$$\mathbb{E}^{\mathbb{P}}\bigg[|\xi(X_T,X_T)|^2 + \int_0^T |H(t,X_t,\Theta_0)|^2\,\mathrm{d}t\bigg] < \infty;$$

• derivative fields source: the parameter-dependent source terms have finite weighted norms,
$$\sup_{y \in \mathbb{R}^n} \rho(y)\,\mathbb{E}^{\mathbb{P}}\bigg[\|\nabla_y\xi(y,X_T)\|^2 + \|\nabla^2_{yy}\xi(y,X_T)\|^2 + \int_0^T \sum_{i=1}^2 \|G_i(t,X_t,y,\Theta_0)\|^2\,\mathrm{d}t\bigg] < \infty.$$

Note that the integrability of the state process $X$ is already guaranteed by Assumption 3.1(iv), which is essential to ensure that these polynomial bounds result in integrable random variables. We are now able to state our existence and uniqueness result.

Theorem 3.12 (Well-posedness). Under Assumptions 3.1 and 3.11, there exists a weighting parameter $\beta > 0$ such that the BSDE system (3.7) admits a unique solution in the weighted space $\mathcal{K}^{n,d}_\beta$. Consequently, this solution also satisfies the conditions of Definition 3.6.

We remark that Assumption 3.11 imposes strong Lipschitz-continuity requirements, and that the inconsistent linear–quadratic regulator is not covered by our result. Our point here is to present a general well-posedness result, and to demonstrate the kind of techniques and spaces that are necessary to consider. We believe that a result where $H$, $G_1$ and $G_2$ have a stochastic Lipschitz coefficient proportional to $1 + \|X_t\|^2 + \|Z_t\|$ (which is exactly what is required to cover the linear–quadratic example) is achievable, and we leave it as an open problem for future research. We content ourselves here with mentioning that the literature on BSDEs whose generators have BMO Lipschitz-continuity constants, see Imkeller, Réveillac, and Richter [26], or on quadratic BSVIEs, see Hernández [20], should be a good starting point.
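The proof of a well-posedness result of this type typically rests on a fixed-point argument in the $\beta$-weighted spaces: for $\beta$ large relative to the Lipschitz constant, the Picard map becomes a contraction. The following deterministic toy sketch (our own, with $Z \equiv 0$, so the "BSDE" collapses to a backward integral equation) makes the role of $\beta$ visible: successive Picard iterates contract in the $\mathrm{e}^{\beta t}$-weighted supremum norm.

```python
import math

# Toy sketch (ours) of the contraction mechanism: for the backward equation
#   y(t) = xi + int_t^T g(y(s)) ds,  g Lipschitz with constant L,
# the Picard map Phi is a contraction for ||y||_beta := sup_t e^{beta t}|y(t)|
# as soon as beta > L, mirroring the role of beta in S^2_beta and H^2_beta.

T, xi, L = 1.0, 1.0, 2.0
g = lambda v: L * math.sin(v)      # Lipschitz driver with constant L
n = 1000
dt = T / n
beta = 4.0 * L                     # contraction factor is roughly L / beta

def picard(y):
    """One application of Phi on the grid t_i = i*dt (backward Riemann sum)."""
    out = [0.0] * (n + 1)
    out[n] = xi
    for i in range(n - 1, -1, -1):
        out[i] = out[i + 1] + g(y[i]) * dt
    return out

def wdist(u, v):
    """beta-weighted supremum distance between two grid functions."""
    return max(math.exp(beta * i * dt) * abs(a - c)
               for i, (a, c) in enumerate(zip(u, v)))

y0 = [0.0] * (n + 1)
y1 = picard(y0)
y2 = picard(y1)
y3 = picard(y2)

d1, d2 = wdist(y1, y2), wdist(y2, y3)
print(d2 < d1)   # True: successive iterates contract in the weighted norm
```

The same mechanism drives the genuine proof, with the additional difficulty that the three equations of (3.7) are coupled through the Hamiltonian and that the parameter family must be controlled in the $\rho$-weighted norms as well.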
Remark 3.13 (Dependency of the functional spaces on the driver's growth). The definition of the weighted space $\mathcal{K}_\beta$ involving the polynomial weight $\rho(y) := (1 + \|y\|^2)^{-k}$ is not intrinsic to the general theory, but is a specific choice made to accommodate polynomial growth such as the one we have in the LQR case.

4 An example: the linear–quadratic time-inconsistent regulator

Having introduced all our results, we present a full study of a time-inconsistent problem whose inconsistency comes entirely from the presence of the current state variable in the reward functional.

4.1 Problem setting

We consider the linear–quadratic regulator (LQR) problem with a state-dependent terminal cost, a classical example in the literature on time-inconsistent control (see Björk, Khapko, and Murgoci [8; 24]). For simplicity, we take the dimension of the state process to be $n = d = 1$. The state process $X$ evolves according to the linear dynamics
$$\mathrm{d}X_t = \big(\bar a X_t + \bar b\alpha_t\big)\,\mathrm{d}t + \sigma\,\mathrm{d}W_t, \quad X_0 = x_0. \tag{4.1}$$
The objective is to minimise the squared distance of the terminal state from the current state, penalised by the control effort. Hence, the cost functional is given by
$$J(t,x,\alpha) := \mathbb{E}^{\mathbb{P}^{t,x,\alpha}}\bigg[\int_t^T \frac12\alpha_s^2\,\mathrm{d}s + \frac{\Gamma}{2}(X_T - x)^2\bigg]. \tag{4.2}$$
Here, we identify $f(t,y,x,a) = \frac12 a^2$ and $\xi(y,x) = \frac{\Gamma}{2}(x-y)^2$. The appearance of the current state $x$ in the terminal cost $\xi$ creates the time-inconsistency.

Example 4.1 (Motivation: the political economy of debt management). Consider a government managing its national debt ratio $X$. The dynamics are governed by the interest-rate gap $\bar a$ (growth rate of debt) and fiscal adjustments $\alpha$ (surplus/deficit spending):
$$\mathrm{d}X_t = \big(\bar a X_t + \bar b\alpha_t\big)\,\mathrm{d}t + \sigma\,\mathrm{d}W_t.$$
The government aims at minimising the cost of fiscal interventions (tax distortions), represented by $\frac12\alpha_t^2$. However, the terminal objective exhibits reference-point adaptation.
A government at time $t$ commits to bringing the debt $X_T$ close to its currently observed level $X_t$. It penalises deviations from this inherited baseline rather than from an absolute historical zero
\[
J(t, x, \alpha) = \mathbb{E}^{\mathbb{P}_{t,x,\alpha}}\bigg[\int_t^T \frac{1}{2}\alpha_s^2\,\mathrm{d}s + \frac{\Gamma}{2}(X_T - x)^2\bigg].
\]
This creates a time-inconsistent preference structure: as the debt drifts, future administrations continuously reset the target $x$ to the new prevailing debt level, leading to the 'drifting goalpost' phenomenon that we will analyse shortly.

4.2 Equilibrium controls representation

Following the general theory in Section 3, the equilibrium value function and the associated dual processes are characterised by the BSDE system (3.7). For the LQR problem, this system corresponds to, under $\mathbb{P}$,
\[
\begin{cases}
Y_t = \displaystyle\int_t^T H\big(r, X_r, Z_r, \partial Y^{X_r}_r, \partial\partial Y^{X_r}_r, \partial Z^{X_r}_r\big)\mathrm{d}r - \int_t^T Z_r\cdot\mathrm{d}W_r, & t \in [0, T],\\[0.4em]
\partial Y^y_t = \Gamma(y - X_T) + \displaystyle\int_t^T \partial Z^y_r\,\sigma^{-1}\big(\bar{a}X_r + \bar{b}\alpha^\star_r\big)\mathrm{d}r - \int_t^T \partial Z^y_r\cdot\mathrm{d}W_r, & t \in [0, T],\\[0.4em]
\partial\partial Y^y_t = \Gamma, & t \in [0, T].
\end{cases} \tag{4.3}
\]
The extended Hamiltonian $H$ corresponds to
\[
H(t, x, z, \gamma, \eta, \rho) := \inf_{a\in\mathbb{R}}\Big\{\frac{1}{2}a^2 + (\bar{a}x + \bar{b}a)\sigma^{-1}z - (\bar{a}x + \bar{b}a)\gamma\Big\} - \frac{1}{2}\sigma^2\eta - \sigma\rho.
\]
The equilibrium control $\alpha^\star$ is the minimiser of this Hamiltonian. The first-order condition yields
\[
a + \bar{b}\sigma^{-1}z - \bar{b}\gamma = 0 \Longleftrightarrow a = -\bar{b}\big(\sigma^{-1}z - \gamma\big).
\]

Remark 4.2 (Sign convention). The general theory in Section 3 is formulated as a maximisation problem, with the agent seeking to maximise the functional $J$. The linear–quadratic example studied in this section is instead a minimisation problem: the agent incurs a quadratic running cost $\frac{1}{2}\alpha^2$ and a quadratic terminal penalty $\frac{\Gamma}{2}(X_T - x)^2$, both non-negative, and seeks to minimise their expected sum.
To embed this within the general framework it suffices to replace $J$ by $-J$ throughout, or equivalently to replace $\sup$ by $\inf$ in the Hamiltonian (3.8) and reverse the inequality in the equilibrium condition 2.4. All structural results—the extended DPP, the BSDE characterisation, the necessity and verification theorems—carry over verbatim under this sign change. In the notation of this section we therefore write the Hamiltonian as an infimum and identify $f(t, y, x, a) = \frac{1}{2}a^2$ and $\xi(y, x) = \frac{\Gamma}{2}(x - y)^2$.

Remark 4.3 (Verification of assumptions). The LQR problem fits within the framework of Assumption 3.1. Thus, Theorems 3.7 and 3.9 apply to this case. In particular, all equilibria that satisfy the hypotheses of Theorem 3.7 must satisfy the above BSDE. The fact that Theorem 3.12 cannot be used here simply prevents us from stating that the equilibrium we derive below is unique.

Substituting the BSDE variables $Z_t$ and $\partial Y^{X_t}_t$, we obtain the feedback form
\[
\alpha^\star_t = \bar{b}\big(\partial Y^{X_t}_t - \sigma^{-1}Z_t\big). \tag{4.4}
\]
To solve this system explicitly, we make use of Theorem 3.9 by looking for a decoupling field $J(t, x, y)$ such that $J(t, X_t, y) = Y^y_t$. This function must solve the following parametrised PDE
\[
\partial_t J + (\bar{a}x + \bar{b}\alpha^\star)\partial_x J + \frac{1}{2}\sigma^2\partial_{xx}J + \frac{1}{2}(\alpha^\star)^2 = 0, \quad J(T, x, y) = \frac{\Gamma}{2}(x - y)^2. \tag{4.5}
\]

Lemma 4.4 (Derivation of the Riccati system). Assume that the value function admits the quadratic Ansatz
\[
J(t, x, y) = A(t)x^2 + B(t)y^2 + C(t)xy + D(t)x + F(t)y + H(t). \tag{4.6}
\]
Then, the equilibrium control is linear in $x$
\[
\alpha^\star(t, x) = -\bar{b}\big(2A(t) + C(t)\big)x.
\]
(4.7)

The time-dependent coefficients satisfy the following system of ordinary differential equations
\[
\begin{cases}
A' + 2\bar{a}A - 2\bar{b}^2A(2A + C) + \frac{1}{2}\bar{b}^2(2A + C)^2 = 0, & A(T) = \Gamma/2,\\
C' + \bar{a}C - \bar{b}^2C(2A + C) = 0, & C(T) = -\Gamma,\\
H' - \frac{1}{2}\bar{b}^2D^2 + \sigma^2A = 0, & H(T) = 0,
\end{cases} \tag{4.8}
\]
with $B(t) \equiv \Gamma/2$, $D(t) \equiv 0$ and $F(t) \equiv 0$.

Proof. We derive the system by substituting the Ansatz into the equilibrium condition and the PDE. First, recall the identifications from the Markovian setting: $Z_t = \sigma\partial_x V(t, x)$ and $\partial Y^{X_t}_t = \partial_y J(t, x, y)|_{y=x}$, where $V(t, x) = J(t, x, x)$ is the equilibrium value function. Using the Ansatz (4.6), the derivatives are
\[
\partial_x J(t, x, y) = 2A(t)x + C(t)y + D(t), \quad \partial_y J(t, x, y) = 2B(t)y + C(t)x + F(t).
\]
The equilibrium value function is $V(t, x) = (A + B + C)x^2 + (D + F)x + H$. Thus, $\partial_x V(t, x) = 2(A + B + C)x + (D + F)$. Substituting these into the control formula (4.4) (noting that $\sigma^{-1}Z_t = \partial_x V(t, x)$ implies that the term $\bar{b}(\partial Y - \sigma^{-1}Z)$ corresponds to $\bar{b}(\partial_y J - \partial_x V)$),
\[
\alpha^\star(t, x) = \bar{b}\Big(\big(2Bx + Cx + F\big) - \big(2(A + B + C)x + D + F\big)\Big) = -\bar{b}\big((2A + C)x + D\big).
\]
Let us define the feedback gains $K(t) := \bar{b}(2A(t) + C(t))$ and $\Lambda(t) := \bar{b}D(t)$, so that $\alpha^\star = -Kx - \Lambda$. Now, substitute $J$ and $\alpha^\star$ into the PDE (4.5). We expand all terms fully
\[
\underbrace{A'x^2 + B'y^2 + C'xy + D'x + F'y + H'}_{\partial_t J} + \underbrace{(\bar{a}x - \bar{b}Kx - \bar{b}\Lambda)(2Ax + Cy + D)}_{(\bar{a}x + \bar{b}\alpha^\star)\partial_x J} + \underbrace{\frac{1}{2}\sigma^2(2A)}_{\frac{1}{2}\sigma^2\partial_{xx}J} + \underbrace{\frac{1}{2}\big(K^2x^2 + 2K\Lambda x + \Lambda^2\big)}_{\frac{1}{2}(\alpha^\star)^2} = 0.
\]
Matching coefficients for each monomial term:
• $x^2$: $A' + 2A(\bar{a} - \bar{b}K) + \frac{1}{2}K^2 = 0$. Substituting $K$,
\[
A' + 2\bar{a}A - 2\bar{b}^2A(2A + C) + \frac{1}{2}\bar{b}^2(2A + C)^2 = 0.
\]
• $xy$: $C' + C(\bar{a} - \bar{b}K) = 0 \Longrightarrow C' + \bar{a}C - \bar{b}^2C(2A + C) = 0$.
• $x$: $D' + D(\bar{a} - \bar{b}K) - 2A\bar{b}\Lambda + K\Lambda = 0$.
Substituting $\Lambda = \bar{b}D$,
\[
D' + D(\bar{a} - \bar{b}K) - 2A\bar{b}^2D + \bar{b}KD = D' + \bar{a}D - 2\bar{b}^2AD = 0.
\]
• $y^2$: $B' = 0$. The boundary condition $B(T) = \Gamma/2$ gives $B(t) \equiv \Gamma/2$.
• $y$: $F' - C\bar{b}\Lambda = 0 \Longrightarrow F' - \bar{b}^2CD = 0$. This implies $F' = 0 \Longrightarrow F(t) \equiv 0$.
• constant: $H' - D\bar{b}\Lambda + \sigma^2A + \frac{1}{2}\Lambda^2 = 0$. Since $D \equiv 0 \Longrightarrow \Lambda \equiv 0$, this simplifies to $H' + \sigma^2A = 0$.
Finally, note that since $D(t) \equiv 0$, the affine part of the control vanishes, and $\alpha^\star(t, x) = -K(t)x$.

4.3 Comparison of strategies

We compare the performance of the sophisticated (equilibrium) agent against the naive agent. More precisely, we consider
(i) the equilibrium strategy: defined by $\alpha^\star(t, x) = -K_{\mathrm{eq}}(t)x$, where $K_{\mathrm{eq}}(t) = \bar{b}(2A(t) + C(t))$ is derived from Lemma 4.4;
(ii) the naive strategy: the naive feedback law $\alpha^{\mathrm{naive}}(t, x) = -K_{\mathrm{naive}}(t)x$ is derived by solving a standard time-consistent LQR problem at each instant $t$, where the agent treats the current state as a fixed target $y = X_t$ for the remaining horizon $[t, T]$.

By postulating a quadratic value function $V(s, x; y) = P(s)x^2 + Q(s)xy + R(s)y^2 + M(s)x + N(s)y + L(s)$, the HJB equation for a fixed parameter $y$ yields the following system for the principal coefficients
\[
P'(t) + 2\bar{a}P(t) - 2\bar{b}^2P(t)^2 = 0, \quad P(T) = \Gamma/2, \qquad Q'(t) + \big(\bar{a} - 2\bar{b}^2P(t)\big)Q(t) = 0, \quad Q(T) = -\Gamma.
\]
Solving for $Q(t)$ via an integrating factor and evaluating the optimal control $a^\ast = -\bar{b}\big(2P(t)x + Q(t)y\big)$ on the diagonal $y = x$ leads directly to
\[
K_{\mathrm{naive}}(t) = \bar{b}\bigg(2P(t) - \Gamma\exp\bigg(\int_t^T\big(\bar{a} - 2\bar{b}^2P(u)\big)\mathrm{d}u\bigg)\bigg). \tag{4.9}
\]
We simulate the trajectories of the state process $X$ under both strategies using an Euler–Maruyama discretisation. We use the parameters $T = 1$, $\bar{a} = 0.5$, $\bar{b} = 1$, $\sigma = 0.5$, $x_0 = 1$, and $\Gamma = 5.0$.

Figure 1: Comparison of state trajectories (left) and control effort (right).
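The Riccati system (4.8), the naive coefficients, and the Euler–Maruyama simulation above can be reproduced with a short script. The following is a minimal sketch in plain Python with the parameters quoted in the text; the number of time steps, the random seed, and all helper names are our own illustrative choices.

```python
import math
import random

# Model parameters from Section 4.3.
T, a_bar, b_bar, sigma, x0, Gamma = 1.0, 0.5, 1.0, 0.5, 1.0, 5.0
N = 1000              # number of time steps (discretisation choice, not from the text)
h = T / N

# Backward explicit Euler for the Riccati system (4.8),
#   A' = -2*a_bar*A + 2*b_bar^2*A*(2A+C) - (1/2)*b_bar^2*(2A+C)^2,  A(T) = Gamma/2,
#   C' = -a_bar*C + b_bar^2*C*(2A+C),                               C(T) = -Gamma,
# and for the naive coefficients P, Q of Section 4.3:
#   P' = -2*a_bar*P + 2*b_bar^2*P^2,    P(T) = Gamma/2,
#   Q' = -(a_bar - 2*b_bar^2*P)*Q,      Q(T) = -Gamma.
A = [0.0] * (N + 1); C = [0.0] * (N + 1)
P = [0.0] * (N + 1); Q = [0.0] * (N + 1)
A[N], C[N], P[N], Q[N] = Gamma / 2, -Gamma, Gamma / 2, -Gamma
for k in range(N, 0, -1):
    S = 2 * A[k] + C[k]
    dA = -2 * a_bar * A[k] + 2 * b_bar**2 * A[k] * S - 0.5 * b_bar**2 * S**2
    dC = -a_bar * C[k] + b_bar**2 * C[k] * S
    dP = -2 * a_bar * P[k] + 2 * b_bar**2 * P[k] ** 2
    dQ = -(a_bar - 2 * b_bar**2 * P[k]) * Q[k]
    A[k - 1] = A[k] - h * dA
    C[k - 1] = C[k] - h * dC
    P[k - 1] = P[k] - h * dP
    Q[k - 1] = Q[k] - h * dQ

# Feedback gains: K_eq = b_bar*(2A+C) from Lemma 4.4, and K_naive = b_bar*(2P+Q),
# which matches the closed form (4.9) since Q solves the linear ODE above.
K_eq = [b_bar * (2 * A[k] + C[k]) for k in range(N + 1)]
K_naive = [b_bar * (2 * P[k] + Q[k]) for k in range(N + 1)]

# Euler-Maruyama for dX = (a_bar*X + b_bar*alpha) dt + sigma dW under a linear
# feedback law alpha = -K(t) X, using the same Brownian increments for both gains.
def simulate(K, seed=0):
    rng = random.Random(seed)
    X = [x0]
    for k in range(N):
        dW = math.sqrt(h) * rng.gauss(0.0, 1.0)
        alpha = -K[k] * X[-1]
        X.append(X[-1] + (a_bar * X[-1] + b_bar * alpha) * h + sigma * dW)
    return X

X_eq, X_naive = simulate(K_eq), simulate(K_naive)
```

Note that both gains vanish at the terminal time, since $2A(T) + C(T) = \Gamma - \Gamma = 0$ and likewise $2P(T) + Q(T) = 0$: close to the horizon there is nothing left to steer.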
The equilibrium strategy (blue) keeps the state close to the target $x_0 = 1.0$ with moderate effort. The naive strategy (red dashed) applies more control initially. To rigorously quantify the performance gap, we compute the exact expected time-$0$ cost $J(0, x_0)$ for both strategies. Since both strategies are linear feedback laws of the form $\alpha(t, x) = -K(t)x$, we can derive the cost analytically.

Proposition 4.5 (Exact cost). For a linear control $\alpha_t = -K(t)X_t$, the expected cost is
\[
J(0, x_0, \alpha) = \int_0^T \frac{1}{2}K(t)^2S(t)\,\mathrm{d}t + \frac{\Gamma}{2}\big(S(T) - 2x_0m(T) + x_0^2\big), \tag{4.10}
\]
where $m(t) = \mathbb{E}^{\mathbb{P}_\alpha}[X_t]$ and $S(t) = \mathbb{E}^{\mathbb{P}_\alpha}[X_t^2]$ are the first two moments of the state process under the controlled measure $\mathbb{P}_\alpha$, satisfying the ODEs
\[
m'(t) = \big(\bar{a} - \bar{b}K(t)\big)m(t), \quad S'(t) = 2\big(\bar{a} - \bar{b}K(t)\big)S(t) + \sigma^2, \tag{4.11}
\]
with initial conditions $m(0) = x_0$, $S(0) = x_0^2$.

Proof. The state dynamics under the measure $\mathbb{P}_\alpha$ are given by $\mathrm{d}X_t = (\bar{a} - \bar{b}K(t))X_t\,\mathrm{d}t + \sigma\,\mathrm{d}W^\alpha_t$. Taking expectations yields the ODE for $m(t)$. Applying Itô's formula to $X_t^2$ gives $\mathrm{d}(X_t^2) = \big(2(\bar{a} - \bar{b}K)X_t^2 + \sigma^2\big)\mathrm{d}t + 2\sigma X_t\,\mathrm{d}W^\alpha_t$. Taking expectations under $\mathbb{P}_\alpha$ yields the ODE for $S(t)$. Substituting $\mathbb{E}^{\mathbb{P}_\alpha}[\alpha_t^2] = K(t)^2S(t)$ and expanding the terminal term yields the cost formula.

The sensitivity analysis in Figure 2, computed using Proposition 4.5, confirms that the sophisticated strategy yields a strictly lower cost for all $\Gamma > 0$, with the gap widening as $\Gamma$ increases. This is coherent with the intuition that the parameter $\Gamma$ incentivises cooperation between past and future versions of the controller by increasing the scale of the quadratic penalty.

Figure 2: Sensitivity analysis. The total expected cost $J(0, x_0)$ is plotted against the inconsistency parameter $\Gamma$.
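Proposition 4.5 lends itself to a direct numerical implementation. The sketch below (plain Python; the step count and the constant test gains are our own choices) integrates the moment ODEs (4.11) by forward Euler and evaluates (4.10); as a sanity check, with the gain $K \equiv 0$ the running cost vanishes and the terminal cost has a closed form, since then $m(T) = x_0\mathrm{e}^{\bar{a}T}$ and $S(T) = x_0^2\mathrm{e}^{2\bar{a}T} + \sigma^2(\mathrm{e}^{2\bar{a}T} - 1)/(2\bar{a})$.

```python
import math

# Parameters as in the simulation of Section 4.3.
T, a_bar, b_bar, sigma, x0, Gamma = 1.0, 0.5, 1.0, 0.5, 1.0, 5.0
N = 2000             # Euler steps for the moment ODEs (a discretisation choice)
h = T / N

def exact_cost(K):
    """Evaluate (4.10) for a gain path K[0..N-1] via forward Euler on the
    moment ODEs (4.11): m' = (a_bar - b_bar*K) m, S' = 2(a_bar - b_bar*K) S + sigma^2."""
    m, S = x0, x0 ** 2
    running = 0.0
    for k in range(N):
        running += 0.5 * K[k] ** 2 * S * h          # integral of (1/2) K^2 S dt
        drift = a_bar - b_bar * K[k]
        m += h * drift * m
        S += h * (2 * drift * S + sigma ** 2)
    terminal = 0.5 * Gamma * (S - 2 * x0 * m + x0 ** 2)
    return running + terminal

# Sanity check: for the (hypothetical) uncontrolled gain K = 0 the cost has a
# closed form, which the Euler scheme should reproduce.
cost_zero = exact_cost([0.0] * N)
mT = x0 * math.exp(a_bar * T)
ST = x0 ** 2 * math.exp(2 * a_bar * T) + sigma ** 2 * (math.exp(2 * a_bar * T) - 1) / (2 * a_bar)
closed_form = 0.5 * Gamma * (ST - 2 * x0 * mT + x0 ** 2)
```

Feeding the equilibrium and naive gain paths from Section 4.3 into `exact_cost` reproduces the comparison of Figure 2.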
The equilibrium strategy consistently outperforms the naive strategy as the penalty parameter increases.

5 Time-dependency as a particular case of state-dependence

The primary focus of this paper has been the dependence of preferences on the current state $x$. However, the vast majority of the literature on time-inconsistent control focuses on a different source of inconsistency: time-dependent preferences. The canonical example is non-exponential discounting (e.g., hyperbolic or quasi-hyperbolic discounting), where the agent's valuation of future rewards depends on the specific time $t$ at which the valuation is made.

A natural question arises: is the theory developed here for state-dependent inconsistency compatible with the existing theory for time-dependent inconsistency? In this section, we show that our result is, in fact, a strict generalisation of [21] in the Markovian, uncontrolled-volatility case. We achieve this by viewing the initial time $t$ not as an independent parameter, but as a component of the initial state vector. By augmenting the state process, we can cover the time-dependent problem perfectly in our state-dependent framework.

Remark 5.1. Note that the non-Markovian case is not feasible in our setting, since the presence of the current state in the reward functional compels us to look for feedback strategies that depend exclusively on the current state. However, we believe that the extension to controlled volatility should be possible, although technically involved.

5.1 General problem formulation

Let us consider a reward functional where the running cost $f$ and the terminal cost $\xi$ depend explicitly on the initialisation time $t$. We define the cost functional for an agent initialised at time $t$ with state $x$ as
\[
\tilde{J}(t, x, \alpha) := \mathbb{E}^{\mathbb{P}_{t,x,\alpha}}\bigg[\int_t^T f(s, t, X_s, \alpha_s)\,\mathrm{d}s + \xi(t, X_T)\bigg].
\]
(5.1)

Here, the distinction between the variable $t$ and the variable $s$ is crucial:
• $s \in [t, T]$ is the running time, representing the evolution of the system;
• $t \in [0, T]$ is the preference parameter, representing the current time from the perspective of the agent.
For example, in non-exponential discounting, one might have $f(s, t, X_s, \alpha_s) = h(s - t)U(X_s, \alpha_s)$, where $h(\cdot)$ is the discount function. The inconsistency arises because the discount factor $h(s - t)$ changes as the initial time $t$ moves forward.

5.2 The augmented state technique

To apply the theory from Section 3, we must recast the dependence on the parameter $t$ as a dependence on a state variable. We accomplish this by introducing the augmented state process. Let $\mathcal{X}$ be a process valued in $\mathbb{R}^{n+1}$ defined for $s \in [0, T]$ by
\[
\mathcal{X}_s := \begin{pmatrix} s \\ X_s \end{pmatrix}.
\]
The dynamics of this augmented process under the control $\alpha$ are given by
\[
\mathrm{d}\mathcal{X}_s = \begin{pmatrix} 1 \\ \sigma(s, X_s)b(s, X_s, \alpha_s) \end{pmatrix}\mathrm{d}s + \begin{pmatrix} 0_{1\times d} \\ \sigma(s, X_s) \end{pmatrix}\mathrm{d}W^\alpha_s, \quad \text{initialised at } \mathcal{X}_t = \begin{pmatrix} t \\ x \end{pmatrix} =: \mathrm{x}. \tag{5.2}
\]
We can now define the augmented cost functions $\tilde{f}$ and $\tilde{\xi}$ on the augmented space $\mathbb{R}^{n+1}\times\mathbb{R}^{n+1}$ (where the first coordinate represents the time component)
\[
\tilde{f}_s(\mathrm{x}, \mathrm{z}, a) := f(s, \mathrm{x}_1, \mathrm{z}_{2:n+1}, a), \quad \tilde{\xi}(\mathrm{x}, \mathrm{z}) := \xi(\mathrm{x}_1, \mathrm{z}_{2:n+1}).
\]
Using this notation, the time-dependent functional (5.1) can be rewritten exactly in the form of our state-dependent problem
\[
J(\mathrm{x}, \alpha) = \mathbb{E}^{\mathbb{P}_{\mathrm{x},\alpha}}\bigg[\int_t^T \tilde{f}_s(\mathrm{x}, \mathcal{X}_s, \alpha_s)\,\mathrm{d}s + \tilde{\xi}(\mathrm{x}, \mathcal{X}_T)\bigg]. \tag{5.3}
\]
This reformulation allows us to apply Theorem 3.9 directly. The parameter of the problem is now the vector $\mathrm{x} = (t, x)$.

5.3 Sanity check: recovering the non-exponential discounting system

We now demonstrate that applying our general BSDE system to this augmented set-up recovers the specific system derived in [21] for the purely time-dependent case.
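The augmentation in (5.2) is mechanical and can be sketched in a few lines. In the snippet below, the coefficient functions `b` and `sigma` and the dimensions $n = d = 2$ are placeholder choices for illustration, not part of the paper's set-up; the point is only the block structure: the time component has drift $1$ and carries no noise.

```python
# Minimal sketch of the state augmentation in (5.2): given placeholder coefficient
# functions b and sigma for an R^n-valued state, build the drift and diffusion of
# the R^(n+1)-valued process (s, X_s). Dimensions n = d = 2 are illustrative.
n = d = 2

def b(s, x, a):
    # placeholder reduced drift, valued in R^d
    return [0.1 * xi + a for xi in x]

def sigma(s, x):
    # placeholder diagonal diffusion matrix, n x d
    return [[0.2 if i == j else 0.0 for j in range(d)] for i in range(n)]

def augmented_drift(s, x, a):
    # first component: d(s)/ds = 1; remaining components: sigma(s, x) b(s, x, a)
    sig, bv = sigma(s, x), b(s, x, a)
    return [1.0] + [sum(sig[i][j] * bv[j] for j in range(d)) for i in range(n)]

def augmented_diffusion(s, x):
    # first row 0_{1 x d}: the time component carries no noise
    return [[0.0] * d] + sigma(s, x)

mu = augmented_drift(0.0, [1.0, -1.0], 0.5)
Sig = augmented_diffusion(0.0, [1.0, -1.0])
```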
In the augmented framework, the equilibrium value function $Y_s$ is accompanied by a gradient process $\partial Y^{\mathrm{y}}$. Since the parameter is $\mathrm{y} = (t, x)$, this gradient decomposes into two components
\[
\partial Y^{\mathrm{y}}_s = \begin{pmatrix} \partial Y^{(t)}_s \\ \partial Y^{(x)}_s \end{pmatrix}.
\]
Here, $\partial Y^{(t)}$ represents the sensitivity of the value to the initial time (the time-inconsistency term), while $\partial Y^{(x)}$ represents the sensitivity to the initial state (the spatial inconsistency term). Assume that the problem's time-inconsistency comes purely from the appearance of the present time (as in [21]). This means that the preferences depend on $t$, but not on $x$ as a parameter. In other words,
\[
\partial_x f(s, t, y, a) = 0, \quad \text{and} \quad \partial_x \xi(t, y) = 0.
\]
Let us examine the BSDE for the gradient component (the second line of Condition (3.7)) applied to our augmented set-up.
(i) The spatial component $\partial Y^{(x)}$: since the drivers $\partial_x f$ and $\partial_x \xi$ are zero, the BSDE for the spatial gradient $\partial Y^{(x)}$ becomes a homogeneous linear BSDE with zero terminal condition. By uniqueness, $\partial Y^{(x)}_s \equiv 0$. This aligns with expectation: if preferences do not depend on the initial state $x$, the inconsistency adjustment for $x$ vanishes.
(ii) The inconsistency adjustment: recall that in our general framework, the driver of the BSDE for $Y$ contains the inconsistency adjustment term corresponding to the operator $\mathcal{L}^{\alpha^\star,(y)}$. For the augmented state $\mathcal{X}$, this is defined as
\[
K_s := b^{\mathcal{X}}_s(\mathcal{X}_s, \alpha^\star_s)\cdot\partial_y J(s, \mathcal{X}_s, \mathcal{X}_s) + \frac{1}{2}\mathrm{Tr}\big[\Sigma_s(\mathcal{X}_s)\Sigma_s(\mathcal{X}_s)^\top\partial^2_{yy}J(s, \mathcal{X}_s, \mathcal{X}_s)\big] + \mathrm{Tr}\big[\Sigma_s(\mathcal{X}_s)\Sigma_s(\mathcal{X}_s)^\top\partial^2_{xy}J(s, \mathcal{X}_s, \mathcal{X}_s)\big],
\]
where $\mathrm{y} = (t, x)$ denotes the preference parameter in the augmented set-up, and $\Sigma_s(\mathcal{X}_s)$ is the diffusion matrix of the augmented process. We compute these terms explicitly.
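The block structure of the augmented coefficients makes the behaviour of the trace terms in $K_s$ easy to check numerically. The sketch below uses placeholder values for $\sigma$, $b$, and the derivatives of $J$ (none taken from the paper): when preferences do not depend on $x$, the only nonzero entry of the $y$-Hessian is the time-time one, which never meets the nonzero block of $\Sigma\Sigma^\top$, so the trace contribution vanishes and only the drift term survives.

```python
# Numerical sketch of the adjustment term K_s for the augmented set-up, with
# placeholder values (n = 2 spatial dimensions). The mixed-derivative trace term
# vanishes by the same block structure, so only the yy-trace is computed here.
n = 2
dtJ = 0.7                              # placeholder value of the time derivative of J
sig = [[0.3, 0.0], [0.1, 0.2]]         # placeholder spatial diffusion sigma (n x n)

# Augmented diffusion: first row is zero (the time component carries no noise).
Sigma = [[0.0] * n] + sig
SS = [[sum(Sigma[i][k] * Sigma[j][k] for k in range(n)) for j in range(n + 1)]
      for i in range(n + 1)]

# With d_x J = 0, the y-gradient is (dtJ, 0, ..., 0) and the y-Hessian has only
# its top-left (time-time) entry nonzero (placeholder value 1.3).
grad_y = [dtJ] + [0.0] * n
hess_yy = [[1.3 if i == j == 0 else 0.0 for j in range(n + 1)] for i in range(n + 1)]

trace_term = sum(sum(SS[i][k] * hess_yy[k][i] for k in range(n + 1))
                 for i in range(n + 1))

# Drift of the augmented process: (1, sigma*b); only its first entry meets grad_y.
b_red = [0.5, -0.4]                    # placeholder reduced drift b
drift = [1.0] + [sum(sig[i][j] * b_red[j] for j in range(n)) for i in range(n)]
K_s = sum(drift[i] * grad_y[i] for i in range(n + 1)) + 0.5 * trace_term
```

As expected, `trace_term` is exactly zero and `K_s` equals the time derivative `dtJ`.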
The augmented state dynamics (5.2) imply that the coefficients are vectors and matrices in $\mathbb{R}^{n+1}$
\[
b^{\mathcal{X}} = \begin{pmatrix} 1 \\ \sigma(s, X_s)b(s, X_s, \alpha^\star_s) \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 0_{1\times d} \\ \sigma(s, X_s) \end{pmatrix}, \quad \Sigma\Sigma^\top = \begin{pmatrix} 0 & 0_{1\times n} \\ 0_{n\times 1} & \sigma(s, X_s)\sigma(s, X_s)^\top \end{pmatrix}.
\]
Since $\partial_x J = 0$, the derivatives with respect to $\mathrm{y}$ simplify. The Jacobian $\partial_y J$ is $(\partial_t J, 0)^\top$, and the Hessian matrices have zeros in all entries except potentially the top-left (time-time) one, which does not interact with the nonzero block of $\Sigma\Sigma^\top$. Specifically,
\[
\mathrm{Tr}\left[\begin{pmatrix} 0 & 0 \\ 0 & \sigma^2 \end{pmatrix}\begin{pmatrix} \partial^2_{tt}J & 0 \\ 0 & 0 \end{pmatrix}\right] = 0.
\]
The mixed-derivative trace term is similarly zero. Thus, the total inconsistency adjustment reduces to the drift term
\[
K_s = b^{\mathcal{X}}\cdot\partial_y J = \partial_t J(s, X_s, X_s).
\]
This confirms that the extra drift in the Hamiltonian is exactly the time-derivative of the value function with respect to the initial time. The BSDE for the time-derivative component $\partial Y^{(t)}_s$ is then obtained directly from our general system (3.7)
\[
\mathrm{d}\partial Y^{(t)}_s = -\big(\partial_t f(s, s, X_s, \alpha^\star_s) + Z_s\cdot b(s, X_s, \alpha^\star_s)\big)\mathrm{d}s + Z_s\cdot\mathrm{d}W_s, \quad \partial Y^{(t)}_T = \partial_t\xi(t, X_T). \tag{5.4}
\]
This recovers the structure of the adjoint equation derived in [21].

References

[1] C. D. Aliprantis and K. Border. Infinite dimensional analysis: a hitchhiker's guide. Springer-Verlag Berlin Heidelberg, third edition, 2006.
[2] S. Basak and G. Chabakauri. Dynamic mean–variance asset allocation. The Review of Financial Studies, 23(8):2970–3016, 2010.
[3] E. Bayraktar, J. Zhang, and Z. Zhou. Equilibrium concepts for time-inconsistent stopping problems in continuous time. Mathematical Finance, 31(1):508–530, 2021.
[4] T. Björk and A. Murgoci. A general theory of Markovian time inconsistent stochastic control problems. Technical report, Stockholm School of Economics and Aarhus University, 2010.
[5] T. Björk and A. Murgoci.
A theory of Markovian time-inconsistent stochastic control in discrete time. Finance and Stochastics, 18(3):545–592, 2014.
[6] T. Björk, A. Murgoci, and X. Y. Zhou. Mean–variance portfolio optimization with state-dependent risk aversion. Mathematical Finance, 24(1):1–24, 2014.
[7] T. Björk, M. Khapko, and A. Murgoci. On time-inconsistent stochastic control in continuous time. Finance and Stochastics, 21(2):331–360, 2017.
[8] T. Björk, M. Khapko, and A. Murgoci. Time-inconsistent control theory with finance applications. Springer finance. Springer Cham, 2021.
[9] A. Bodnariu, S. Christensen, and K. Lindensjö. Local time pushed mixed stopping and smooth fit for time-inconsistent stopping problems. ArXiv preprint arXiv:2206.15124, 2022.
[10] S. Christensen and K. Lindensjö. On finding equilibrium stopping times for time-inconsistent Markovian problems. SIAM Journal on Control and Optimization, 56(6):4228–4255, 2018.
[11] S. Christensen and K. Lindensjö. Time-inconsistent stopping, myopic adjustment and equilibrium stability: with a mean–variance application. Banach Center Publications, 122:53–76, 2020.
[12] I. Ekeland and A. Lazrak. Being serious about non-commitment: subgame perfect equilibrium in continuous time. Technical report, University of British Columbia, 2006.
[13] I. Ekeland and A. Lazrak. The golden rule when preferences are time inconsistent. Mathematics and Financial Economics, 4(1):29–55, 2010.
[14] I. Ekeland and T. A. Pirvu. Investment and consumption without commitment. Mathematics and Financial Economics, 2(1):57–86, 2008.
[15] N. El Karoui, S. Peng, and M.-C. Quenez. Backward stochastic differential equations in finance. Mathematical Finance, 7(1):1–71, 1997.
[16] W. H. Fleming and H. M. Soner. Controlled Markov processes and viscosity solutions, volume 25 of Stochastic modelling and applied probability. Springer-Verlag New York, second edition, 2006.
[17] J.-W.
Gu, S. Si, and H. Zheng. Constrained utility deviation-risk optimization and time-consistent HJB equation. SIAM Journal on Control and Optimization, 58(2):866–894, 2020.
[18] Y. Hamaguchi. Extended backward stochastic Volterra integral equations and their applications to time-inconsistent stochastic recursive control problems. Mathematical Control and Related Fields, 11(2):433–478, 2021.
[19] X. D. He and Z. Jiang. On the equilibrium strategies for time-inconsistent problems in continuous time. SIAM Journal on Control and Optimization, 59(5):3860–3886, 2021.
[20] C. Hernández. On quadratic multidimensional type-I BSVIEs, infinite families of BSDEs and their applications. Stochastic Processes and their Applications, 162:249–298, 2023.
[21] C. Hernández and D. Possamaï. Me, myself and I: a general theory of non-Markovian time-inconsistent stochastic control for sophisticated agents. The Annals of Applied Probability, 33(2):1396–1458, 2023.
[22] C. Hernández and D. Possamaï. Time-inconsistent contract theory. Mathematical Finance, 34(3):1022–1085, 2024.
[23] Y. Hu, H. Jin, and X. Y. Zhou. Time-inconsistent stochastic linear–quadratic control. SIAM Journal on Control and Optimization, 50(3):1548–1572, 2012.
[24] Y. Hu, H. Jin, and X. Y. Zhou. Time-inconsistent stochastic linear–quadratic control: characterization and uniqueness of equilibrium. SIAM Journal on Control and Optimization, 55(2):1261–1279, 2017.
[25] Y.-J. Huang and Z. Zhou. Strong and weak equilibria for time-inconsistent stochastic control in continuous time. Mathematics of Operations Research, 46(2):428–451, 2021.
[26] P. Imkeller, A. Réveillac, and A. Richter. Differentiability of quadratic BSDEs generated by continuous martingales. The Annals of Applied Probability, 22(1):285–336, 2012.
[27] J. Jacod and A. N. Shiryaev.
Limit theorems for stochastic processes, volume 288 of Grundlehren der mathematischen Wissenschaften. Springer-Verlag Berlin Heidelberg, 2003.
[28] M. Jeanblanc, M. Yor, and M. Chesney. Mathematical methods for financial markets. Springer finance. Springer London, 2009.
[29] I. Karatzas and S. E. Shreve. Brownian motion and stochastic calculus, volume 113 of Graduate texts in mathematics. Springer-Verlag New York, second edition, 1998.
[30] C. Karnam, J. Ma, and J. Zhang. Dynamic approaches for some time inconsistent problems. The Annals of Applied Probability, 27(6):3435–3477, 2017.
[31] H. Kunita. Some extensions of Itô's formula. Séminaire de probabilités de Strasbourg, XV:118–141, 1981.
[32] D. Laibson. Golden eggs and hyperbolic discounting. The Quarterly Journal of Economics, 112(2):443–477, 1997.
[33] K. Lindensjö. A regular equilibrium solves the extended HJB system. Operations Research Letters, 47(5):427–432, 2019.
[34] E. Mastrogiacomo and M. Tarsia. Subgame-perfect equilibrium strategies for time-inconsistent recursive stochastic control problems. Journal of Mathematical Analysis and Applications, 527(2):127425, 2023.
[35] H. Mei and J. Yong. Equilibrium strategies for time-inconsistent stochastic switching systems. ESAIM: Control, Optimisation and Calculus of Variations, 25(64):1–60, 2019.
[36] T. O'Donoghue and M. Rabin. Doing it now or later. The American Economic Review, 89(1):103–124, 1999.
[37] É. Pardoux and P. E. Protter. Stochastic Volterra equations with anticipating coefficients. The Annals of Probability, 18(4):1635–1655, 1990.
[38] B. Peleg and M. E. Yaari. On the existence of a consistent course of action when tastes are changing. The Review of Economic Studies, 40(3):391–401, 1973.
[39] E. S. Phelps and R. A. Pollak. On second-best national saving and game-equilibrium growth. The Review of Economic Studies, 35(2):185–199, 1968.
[40] R. A. Pollak.
Consistent planning. The Review of Economic Studies, 35(2):201–208, 1968.
[41] D. Possamaï and C. Rossato. Variance strikes back: sub-game–perfect Nash equilibria in time-inconsistent N-player games, and their mean-field sequel. ArXiv preprint arXiv:2512.08745, 2025.
[42] C. S. Pun. Robust time-inconsistent stochastic control problems. Automatica, 94:249–257, 2018.
[43] D. W. Stroock and S. R. S. Varadhan. Multidimensional diffusion processes, volume 233 of Grundlehren der mathematischen Wissenschaften. Springer-Verlag Berlin Heidelberg, 1997.
[44] R. H. Strotz. Myopia and inconsistency in dynamic utility maximization. The Review of Economic Studies, 23(3):165–180, 1955.
[45] H. Wang and J. Yong. Time-inconsistent stochastic optimal control problems and backward stochastic Volterra integral equations. ESAIM: Control, Optimisation and Calculus of Variations, 27(22):1–40, 2021.
[46] T. Wang and H. Zheng. Closed-loop equilibrium strategies for general time-inconsistent optimal control problems. SIAM Journal on Control and Optimization, 59(5):3152–3178, 2021.
[47] Q. Wei, J. Yong, and Z. Yu. Time-inconsistent recursive stochastic optimal control problems. SIAM Journal on Control and Optimization, 55(6):4156–4201, 2017.
[48] Y. Xu and S. Yang. Dynamic programming principle for a controlled FBSDE system and associated extended HJB equation. ArXiv preprint arXiv:2203.14274, 2022.
[49] J. Yong. Time-inconsistent optimal control problems and the equilibrium HJB equation. Mathematical Control & Related Fields, 2(3):271–329, 2012.
[50] J. Yong and X. Y. Zhou. Stochastic controls: Hamiltonian systems and HJB equations, volume 43 of Stochastic modelling and applied probability. Springer-Verlag New York, 1999.

A Proof of the extended dynamic programming principle

In this section, we provide the detailed proof of Theorem 3.3.
We rely on the definition of equilibrium and the regularity of the value function with respect to the preference parameter. We define the auxiliary value function $\Psi(t, x; y)$ as the expected future reward from state $x$ at time $t$ under the fixed equilibrium strategy $\alpha^\star$, evaluated with the fixed preference parameter $y$
\[
\Psi(t, x; y) := \mathbb{E}^{\mathbb{P}_{t,x,\alpha^\star}}\bigg[\int_t^T f(u, y, X_u, \alpha^\star_u)\,\mathrm{d}u + \xi(y, X_T)\bigg]. \tag{A.1}
\]
By definition, the equilibrium value function corresponds to the diagonal restriction $v(t, x) = \Psi(t, x; x)$. We assume throughout this section that the regularity conditions in Assumption 3.1 hold. Before we start, let us introduce a technical lemma from stochastic calculus that turns out to be the crucial step in understanding the dynamics of the process we are interested in.

Lemma A.1 (Itô–Kunita–Wentzell's formula). Let $f(t, x)$ be a family of $\mathbb{F}$-adapted and measurable stochastic processes, continuous in $(t, x) \in \mathbb{R}_+\times\mathbb{R}^d$, $\mathbb{P}$–a.s., satisfying
(i) for each $t \geq 0$, $\mathbb{R}^d \ni x \longmapsto f(t, x) \in \mathbb{R}$ is $C^2$;
(ii) there is some $m \in \mathbb{N}^\star$ such that for each $x \in \mathbb{R}^d$, $f(t, x)$ is a continuous $(\mathbb{F}, \mathbb{P})$–semi-martingale with
\[
\mathrm{d}f(t, x) = \sum_{j=1}^m f^j_t(x)\,\mathrm{d}M^j_t,
\]
where for any $j \in \{1, \ldots, m\}$, $M^j$ is a continuous $(\mathbb{F}, \mathbb{P})$–semi-martingale, and for any $x \in \mathbb{R}^d$, $f^j(x)$ is an $\mathbb{F}$–adapted and measurable stochastic process, continuous in $(t, x)$, such that $\mathbb{R}^d \ni x \longmapsto f^j(x) \in \mathbb{R}$ is $C^1$.
Let $X = (X^1, \ldots, X^d)$ be a continuous $(\mathbb{F}, \mathbb{P})$–semi-martingale. Then
\[
f(t, X_t) = f(0, X_0) + \sum_{j=1}^m\int_0^t f^j_s(X_s)\,\mathrm{d}M^j_s + \sum_{i=1}^d\int_0^t \partial_{x_i}f(s, X_s)\,\mathrm{d}X^i_s + \sum_{j=1}^m\sum_{i=1}^d\int_0^t \partial_{x_i}f^j_s(X_s)\,\mathrm{d}[X^i, M^j]_s + \frac{1}{2}\sum_{j=1}^d\sum_{i=1}^d\int_0^t \partial^2_{x_ix_j}f(s, X_s)\,\mathrm{d}[X^i, X^j]_s. \tag{A.2}
\]
Remark A.2.
Note that, in particular, the Itô–Kunita–Wentzell formula says that, under the assumptions of the theorem, the composition of an Itô process with a one-parameter family of Itô processes is again an Itô process, which is not such a trivial statement. We readily see that if the process $X$ is a constant $x$, we are left with the original decomposition of the process $f(t, x)$. This version of the theorem was obtained from Jeanblanc, Yor, and Chesney [28, Theorem 1.5.3.2], and we present it here without proof, referring to Kunita [31, Theorem 1].

Let us move now to the proof of the extended DPP. We will divide most of the work between Lemmas A.3, A.4 and A.5.

Lemma A.3. Let $(t, x) \in [0, T]\times\mathbb{R}^n$. Let $\tau \in \mathcal{T}_{t,T}$ be an $\mathbb{F}$–stopping time bounded by $t + \delta$ for some deterministic constant $\delta > 0$. For any admissible control $\alpha \in \mathcal{A}(t, x)$, the following inequality holds
\[
v(t, x) \geq \mathbb{E}^{\mathbb{P}_{t,x,\alpha}}\bigg[v(\tau, X_\tau) + \int_t^\tau f(u, x, X_u, \alpha_u)\,\mathrm{d}u + \big(\Psi(\tau, X_\tau; x) - \Psi(\tau, X_\tau; X_\tau)\big)\bigg] - r(\delta), \tag{A.3}
\]
where $r(\delta)$ is a non-negative error term satisfying the asymptotic property $r(\delta) = o(\delta)$ as $\delta \longrightarrow 0$. Furthermore, if $\alpha = \alpha^\star$, equality holds with $r(\delta) \equiv 0$.

Proof. We begin by constructing a specific perturbation of the equilibrium strategy. As usual, let $\hat{\alpha} := \alpha\otimes_\tau\alpha^\star$ be the concatenated control defined by
\[
\hat{\alpha}_s(\omega) := \alpha_s(\omega)\mathbf{1}_{[t,\tau(\omega))}(s) + \alpha^\star_s(\omega)\mathbf{1}_{[\tau(\omega),T]}(s).
\]
This strategy follows the arbitrary control $\alpha$ until the stopping time $\tau$, and reverts to the equilibrium strategy $\alpha^\star$ thereafter. By Definition 2.4, the strategy $\alpha^\star$ is optimal against local deviations up to a first-order error. In other words, for small enough $\delta$,
\[
J(t, x, \alpha^\star) \geq J(t, x, \hat{\alpha}) - o(\delta).
\]
The left-hand side is, by the definition of the value function, exactly $v(t, x)$. We now analyse the right-hand side, $J(t, x, \hat{\alpha})$.
By the definition of the cost functional, we have
\[
J(t, x, \hat{\alpha}) = \mathbb{E}^{\mathbb{P}_{t,x,\alpha}}\bigg[\int_t^\tau f(u, x, X_u, \alpha_u)\,\mathrm{d}u + \int_\tau^T f(u, x, X_u, \alpha^\star_u)\,\mathrm{d}u + \xi(x, X_T)\bigg].
\]
Note that the probability measure $\mathbb{P}_{t,x,\alpha}$ governs the dynamics on $[t, \tau]$, while the dynamics on $(\tau, T]$ are governed by $\alpha^\star$ given the state at $\tau$. We apply the tower property of conditional expectations, conditioning on the $\sigma$-algebra $\mathcal{F}_\tau$
\[
J(t, x, \hat{\alpha}) = \mathbb{E}^{\mathbb{P}_{t,x,\alpha}}\bigg[\int_t^\tau f(u, x, X_u, \alpha_u)\,\mathrm{d}u + \mathbb{E}^{\mathbb{P}_{t,x,\hat{\alpha}}}\bigg[\int_\tau^T f(u, x, X_u, \alpha^\star_u)\,\mathrm{d}u + \xi(x, X_T)\,\bigg|\,\mathcal{F}_\tau\bigg]\bigg].
\]
By the properties of the concatenated measure introduced in Theorem 2.3, the conditional distribution of the process after $\tau$ given $\mathcal{F}_\tau$ is precisely given by the kernel $\mathbb{P}_{\tau,X_\tau,\alpha^\star}$. Consequently, the inner conditional expectation satisfies
\[
\mathbb{E}^{\mathbb{P}_{t,x,\hat{\alpha}}}\bigg[\int_\tau^T f(u, x, X_u, \alpha^\star_u)\,\mathrm{d}u + \xi(x, X_T)\,\bigg|\,\mathcal{F}_\tau\bigg] = \mathbb{E}^{\mathbb{P}_{\tau,X_\tau,\alpha^\star}}\bigg[\int_\tau^T f(u, x, X_u, \alpha^\star_u)\,\mathrm{d}u + \xi(x, X_T)\bigg].
\]
Comparing this to the definition in (A.1), we identify the right-hand side precisely as the auxiliary value function $\Psi(\tau, X_\tau; x)$. Substituting this back into the expansion of $J$, we obtain
\[
J(t, x, \hat{\alpha}) = \mathbb{E}^{\mathbb{P}_{t,x,\alpha}}\bigg[\int_t^\tau f(u, x, X_u, \alpha_u)\,\mathrm{d}u + \Psi(\tau, X_\tau; x)\bigg].
\]
Using the initial inequality $v(t, x) \geq J(t, x, \hat{\alpha}) - o(\delta)$, we have
\[
v(t, x) \geq \mathbb{E}^{\mathbb{P}_{t,x,\alpha}}\bigg[\int_t^\tau f(u, x, X_u, \alpha_u)\,\mathrm{d}u + \Psi(\tau, X_\tau; x)\bigg] - o(\delta).
\]
Finally, we introduce the equilibrium value function at time $\tau$. Recall that $v(\tau, z) = \Psi(\tau, z; z)$. We add and subtract $v(\tau, X_\tau) = \Psi(\tau, X_\tau; X_\tau)$ inside the expectation
\[
\Psi(\tau, X_\tau; x) = v(\tau, X_\tau) + \big(\Psi(\tau, X_\tau; x) - \Psi(\tau, X_\tau; X_\tau)\big).
\]
Plugging this decomposition into the inequality yields the result (A.3).

Lemma A.4. Fix a time horizon $S > t$ and $N \in \mathbb{N}^\star$. Let $\Pi_N := \{t_0, t_1, \ldots, t_N\}$ be a partition of the interval $[t, S]$, where $t_0 = t$ and $t_N = S$.
For any admissible control $\alpha$,
\[
v(t, x) \geq \mathbb{E}^{\mathbb{P}_\alpha}\bigg[v(S, X_S) + \sum_{i=0}^{N-1}\int_{t_i}^{t_{i+1}} f(u, X_{t_i}, X_u, \alpha_u)\,\mathrm{d}u + \sum_{i=0}^{N-1}\big(\Psi(t_{i+1}, X_{t_{i+1}}; X_{t_i}) - \Psi(t_{i+1}, X_{t_{i+1}}; X_{t_{i+1}})\big)\bigg] - o(1). \tag{A.4}
\]
Proof. We proceed by backward induction, or simple iteration. Consider the interval $[t_i, t_{i+1}]$. We apply Lemma A.3 conditioned on the filtration $\mathcal{F}_{t_i}$, with the preference parameter frozen at the state $X_{t_i}$. This gives
\[
v(t_i, X_{t_i}) \geq \mathbb{E}^{\mathbb{P}_\alpha}\bigg[v(t_{i+1}, X_{t_{i+1}}) + \int_{t_i}^{t_{i+1}} f(u, X_{t_i}, X_u, \alpha_u)\,\mathrm{d}u + \Delta_i\,\bigg|\,\mathcal{F}_{t_i}\bigg] - o(t_{i+1} - t_i),
\]
where $\Delta_i := \Psi(t_{i+1}, X_{t_{i+1}}; X_{t_i}) - \Psi(t_{i+1}, X_{t_{i+1}}; X_{t_{i+1}})$. Taking expectations under $\mathbb{P}_\alpha$ and summing these inequalities from $i = 0$ to $N - 1$ leads to a telescoping sum for the value function terms $v(t_i, X_{t_i})$, leaving only the initial term $v(t, x)$ and the terminal term $v(S, X_S)$, plus the cumulative sums of the running costs and the adjustment terms $\Delta_i$.

Now we conclude the proof of the extended dynamic programming principle in Theorem 3.3 by showing that the sums in Lemma A.4 converge to the terms we expect.

Lemma A.5 (Convergence of the discrete inequality). Let $(\Pi_N)_{N\in\mathbb{N}^\star}$ be a sequence of partitions of $[t, S]$ whose mesh size tends to $0$. The discrete inequality in Lemma A.4 converges to the following integral formulation
\[
v(t, x) \geq \mathbb{E}^{\mathbb{P}_\alpha}\bigg[v(S, X_S) + \int_t^S f(u, X_u, X_u, \alpha_u)\,\mathrm{d}u - \int_t^S\Big(b(u, X_u, \alpha_u)\cdot\partial_y\Psi(u, X_u; X_u) + \frac{1}{2}\mathrm{Tr}\big[\sigma(u, X_u)\sigma^\top(u, X_u)\partial^2_{yy}\Psi(u, X_u; X_u)\big] + \mathrm{Tr}\big[\sigma(u, X_u)\sigma^\top(u, X_u)\partial^2_{xy}\Psi(u, X_u; X_u)\big]\Big)\mathrm{d}u\bigg].
\]
Proof. To rigorously analyse the convergence of the discrete sums appearing in Lemma A.4, we introduce the time-discretisation map $\tau_N : [t, S] \longrightarrow \{t_0, \ldots, t_{N-1}\}$ defined by $\tau_N(u) := t_i$ for $u \in [t_i, t_{i+1})$.
The map $\tau_N$ allows us to express the discrete Riemann sums as continuous-time integrals over the full interval $[t,S]$, facilitating the use of dominated convergence arguments.

Part 1: convergence of the running cost. We consider the Riemann sum approximating the running cost
\[
I_{\Pi_N} := \sum_{i=0}^{N-1}\int_{t_i}^{t_{i+1}} f(u,X_{t_i},X_u,\alpha_u)\,\mathrm{d}u.
\]
Using the discretisation map $\tau_N$, we rewrite this sum as a single global integral
\[
I_{\Pi_N} = \int_t^S f\big(u,X_{\tau_N(u)},X_u,\alpha_u\big)\,\mathrm{d}u.
\]
We claim that $I_{\Pi_N}$ converges to $\int_t^S f(u,X_u,X_u,\alpha_u)\,\mathrm{d}u$ in $L^1(\mathbb{R},\mathcal{F},\mathbb{P}^{\alpha})$. Indeed, we can apply the dominated convergence theorem under the measure $\mathbb{P}^{\alpha}$:

1. pointwise convergence: the trajectories of $X$ are continuous $\mathbb{P}$–a.s. (and thus $\mathbb{P}^{\alpha}$–a.s.). As the mesh size $|\Pi_N| \to 0$, we have $\tau_N(u)\to u$, implying $X_{\tau_N(u)}\to X_u$ for all $u$. Since $f$ is continuous in its arguments, the integrand $f(u,X_{\tau_N(u)},X_u,\alpha_u)$ converges pointwise to $f(u,X_u,X_u,\alpha_u)$ for $\mathrm{d}t\otimes\mathbb{P}^{\alpha}$–almost every $(u,\omega)$;

2. domination: we seek a uniformly integrable bound. By the polynomial growth assumption on $f$ (Assumption 3.1), there exist constants $C>0$ and $m\ge 1$ such that for all $u\in[t,S]$
\[
\big|f(u,X_{\tau_N(u)},X_u,\alpha_u)\big| \le C\big(1 + \|X_{\tau_N(u)}\|^m + \|X_u\|^m\big) \le 2C\Big(1 + \sup_{s\in[t,S]}\|X_s\|^m\Big) =: Z.
\]
Since $Z\in L^1(\mathbb{R},\mathcal{F},\mathbb{P}^{\alpha})$ by Assumption 3.1, we can conclude.

Part 2: convergence of the adjustment term. We now turn to the inconsistency adjustment sum
\[
A_{\Pi_N} := \sum_{i=0}^{N-1}\Delta_i, \qquad \Delta_i := \Psi(t_{i+1},X_{t_{i+1}};X_{t_i}) - \Psi(t_{i+1},X_{t_{i+1}};X_{t_{i+1}}).
\]
All expectations in what follows are taken under $\mathbb{P}^{\alpha}$, the measure induced by the arbitrary control $\alpha$. Fix a partition interval $[t_i,t_{i+1}]$ and decompose
\[
\Delta_i = \underbrace{\Psi(t_{i+1},X_{t_{i+1}};X_{t_i}) - \Psi(t_i,X_{t_i};X_{t_i})}_{\text{Term I}} - \underbrace{\big(v(t_{i+1},X_{t_{i+1}}) - v(t_i,X_{t_i})\big)}_{\text{Term II}}.
\]
Term I.
Apply Itô's formula to $r\mapsto \Psi(r,X_r;X_{t_i})$ under $\mathbb{P}^{\alpha}$, holding the preference parameter $X_{t_i}$ fixed. Writing $\mathcal{L}^{\alpha_r}_r = \mathcal{L}^{\alpha^\star_r}_r + (\mathcal{L}^{\alpha_r}_r - \mathcal{L}^{\alpha^\star_r}_r)$ and using the PDE $(\partial_t + \mathcal{L}^{\alpha^\star_r}_r)\Psi(\cdot,\cdot;y) = -f(\cdot,y,\cdot,\alpha^\star_r)$, we obtain
\[
\Psi(t_{i+1},X_{t_{i+1}};X_{t_i}) - \Psi(t_i,X_{t_i};X_{t_i}) = \int_{t_i}^{t_{i+1}}\Big[-f(r,X_{t_i},X_r,\alpha^\star_r) + \big(b(r,X_r,\alpha_r) - b(r,X_r,\alpha^\star_r)\big)\cdot\sigma(r,X_r)^\top\partial_x\Psi(r,X_r;X_{t_i})\Big]\mathrm{d}r + M^{(i),\mathrm{I}},
\]
where $M^{(i),\mathrm{I}}$ is a stochastic integral against $W^{\alpha}$, and hence a true $\mathbb{P}^{\alpha}$–martingale by the polynomial growth of $\partial_x\Psi$ and Assumption 3.1. Taking the conditional expectation $\mathbb{E}^{\mathbb{P}^{\alpha}}[\,\cdot\,|\,\mathcal{F}_{t_i}]$ eliminates $M^{(i),\mathrm{I}}$.

Term II. Since $v(r,x) = \Psi(r,x;x)$, the chain rule for the spatial derivatives gives
\[
\partial_x v(r,x) = \partial_x\Psi(r,x;x) + \partial_y\Psi(r,x;x), \qquad \partial^2_{xx}v(r,x) = \partial^2_{xx}\Psi(r,x;x) + 2\partial^2_{xy}\Psi(r,x;x) + \partial^2_{yy}\Psi(r,x;x).
\]
Applying Lemma A.1 (the Itô–Kunita–Wentzell formula) to $v(r,X_r)$ under $\mathbb{P}^{\alpha}$, substituting these identities, and using the PDE for $\Psi$ to simplify
\[
\partial_t\Psi + \mathcal{L}^{\alpha_r}_r\Psi = -f(r,X_r,X_r,\alpha^\star_r) + \big(b(r,X_r,\alpha_r) - b(r,X_r,\alpha^\star_r)\big)\cdot\sigma(r,X_r)^\top\partial_x\Psi(r,X_r;X_r),
\]
we find
\[
\mathbb{E}^{\mathbb{P}^{\alpha}}\big[v(t_{i+1},X_{t_{i+1}}) - v(t_i,X_{t_i}) \,\big|\, \mathcal{F}_{t_i}\big] = \mathbb{E}^{\mathbb{P}^{\alpha}}\bigg[\int_{t_i}^{t_{i+1}}\Big[-f(r,X_r,X_r,\alpha^\star_r) + \big(b(r,X_r,\alpha_r) - b(r,X_r,\alpha^\star_r)\big)\cdot\sigma(r,X_r)^\top\partial_x\Psi(r,X_r;X_r) + \mathcal{L}^{\alpha_r,(y)}_r\Psi(r,X_r;X_r)\Big]\mathrm{d}r \,\bigg|\, \mathcal{F}_{t_i}\bigg],
\]
where we define the generator acting exclusively on the $y$-variable, under the arbitrary control $\alpha$, as
\[
\mathcal{L}^{\alpha_r,(y)}_r\Psi := b(r,X_r,\alpha_r)\cdot\sigma(r,X_r)^\top\partial_y\Psi + \frac{1}{2}\mathrm{Tr}\big[\sigma(r,X_r)\sigma(r,X_r)^\top\partial^2_{yy}\Psi\big] + \mathrm{Tr}\big[\sigma(r,X_r)\sigma(r,X_r)^\top\partial^2_{xy}\Psi\big].
\]
Combining.
Subtracting Term II from Term I, taking the unconditional expectation $\mathbb{E}^{\mathbb{P}^{\alpha}}$, summing over $i$, and rewriting the result as a single integral via the discretisation map $\tau_N$ yields
\[
\mathbb{E}^{\mathbb{P}^{\alpha}}\big[A_{\Pi_N}\big] = \mathbb{E}^{\mathbb{P}^{\alpha}}\bigg[\int_t^S \Big[f(r,X_r,X_r,\alpha^\star_r) - f(r,X_{\tau_N(r)},X_r,\alpha^\star_r) + \big(b(r,X_r,\alpha_r) - b(r,X_r,\alpha^\star_r)\big)\cdot\sigma(r,X_r)^\top\big(\partial_x\Psi(r,X_r;X_{\tau_N(r)}) - \partial_x\Psi(r,X_r;X_r)\big) - \mathcal{L}^{\alpha_r,(y)}_r\Psi(r,X_r;X_r)\Big]\mathrm{d}r\bigg].
\]
Passage to the limit. As $|\Pi_N|\to 0$, we have $X_{\tau_N(r)}\to X_r$, $\mathbb{P}$–a.s., by the continuity of the trajectories. By the continuity of $f$ and $\partial_x\Psi$ in all their arguments, the first two lines of the integrand converge pointwise to zero. Specifically:

• $f(r,X_r,X_r,\alpha^\star_r) - f(r,X_{\tau_N(r)},X_r,\alpha^\star_r) \to 0$ pointwise;

• $\partial_x\Psi(r,X_r;X_{\tau_N(r)}) - \partial_x\Psi(r,X_r;X_r) \to 0$ pointwise.

Since for any fixed $(r,\omega)$ the evaluated state and controls are finite, the drift difference $b^{\alpha_r} - b^{\alpha^\star_r}$ acts as a finite multiplier, so the entire cross term also converges pointwise to zero. Both terms are uniformly dominated by the integrable random variable $Z$ constructed in Part 1 (scaled by constants depending on the Lipschitz continuity of $b$ and the polynomial growth of $\partial_x\Psi$ from Assumption 3.1). Applying the dominated convergence theorem under $\mathbb{P}^{\alpha}$ gives
\[
\lim_{|\Pi_N|\to 0} \mathbb{E}^{\mathbb{P}^{\alpha}}\big[A_{\Pi_N}\big] = -\mathbb{E}^{\mathbb{P}^{\alpha}}\bigg[\int_t^S \mathcal{L}^{\alpha_r,(y)}_r\Psi(r,X_r;X_r)\,\mathrm{d}r\bigg].
\]
Substituting the explicit form of $\mathcal{L}^{\alpha_r,(y)}_r\Psi$ and combining with Part 1 yields the integral inequality stated in the lemma. □

With these lemmata, we can finally conclude the proof of the extended dynamic programming principle.

Proof of Theorem 3.3.
Lemma A.5 establishes that, for any admissible control $\alpha\in\mathcal{A}$, the value function satisfies the integral inequality
\[
v(t,x) \ge \mathbb{E}^{\mathbb{P}^{\alpha}}\bigg[v(S,X_S) + \int_t^S f(u,X_u,X_u,\alpha_u)\,\mathrm{d}u - \int_t^S \Big(b(u,X_u,\alpha_u)\cdot\sigma(u,X_u)^\top\nabla_y\Psi(u,X_u;X_u) + \frac{1}{2}\mathrm{Tr}\big[\sigma(u,X_u)\sigma^\top(u,X_u)\nabla^2_{yy}\Psi(u,X_u;X_u)\big] + \mathrm{Tr}\big[\sigma(u,X_u)\sigma^\top(u,X_u)\nabla^2_{xy}\Psi(u,X_u;X_u)\big]\Big)\mathrm{d}u\bigg].
\]
To conclude the proof, we must show that equality holds when $\alpha = \alpha^\star$. Recall from Lemma A.3 that if we choose the equilibrium control $\alpha^\star$, the local error term $r(\delta)$ is identically zero. This implies that the discrete-time inequality becomes an equality at every step of the iteration in Lemma A.4: for $\alpha = \alpha^\star$, the telescoping argument holds exactly, without any $o(1)$ error terms. Consequently, passing to the limit as the mesh size $|\Pi_N|\to 0$ in the equality case proceeds identically to the inequality case, with equalities throughout. Thus
\[
v(t,x) = \mathbb{E}^{\mathbb{P}^{t,x,\alpha^\star}}\bigg[v(S,X_S) + \int_t^S f(u,X_u,X_u,\alpha^\star_u)\,\mathrm{d}u - \int_t^S \Big(b(u,X_u,\alpha^\star_u)\cdot\sigma(u,X_u)^\top\nabla_y\Psi(u,X_u;X_u) + \frac{1}{2}\mathrm{Tr}\big[\sigma(u,X_u)\sigma^\top(u,X_u)\nabla^2_{yy}\Psi(u,X_u;X_u)\big] + \mathrm{Tr}\big[\sigma(u,X_u)\sigma^\top(u,X_u)\nabla^2_{xy}\Psi(u,X_u;X_u)\big]\Big)\mathrm{d}u\bigg].
\]
Finally, we recall the definition of the auxiliary function $\Psi(u,x;y)$ as the expected reward with fixed parameter $y$. Differentiating under the expectation sign (justified by Assumption 3.1), we observe that the derivatives $\nabla_y\Psi$, $\nabla^2_{yy}\Psi$, and $\nabla^2_{xy}\Psi$ evaluated at $(u,X_u;X_u)$ correspond exactly to the expectation terms appearing in the theorem statement (3.6), thereby concluding the proof. □

B Proof of the necessity theorem

Before proving the main necessity result, we establish the following consequence of the extended dynamic programming principle.

Lemma B.1 (Martingale optimality property). Let $\alpha^\star\in\mathcal{A}$ be an equilibrium control satisfying the extended DPP identity (3.6).
Define the inconsistency adjustment term $K_t(a)$, for any $a\in A$, by
\[
K_t(a) := b(t,X_t,a)\cdot\sigma(t,X_t)^\top\nabla_y J(t,X_t,X_t) + \mathrm{Tr}\Big[\Big(\frac{1}{2}\nabla^2_{yy}J(t,X_t,X_t) + \nabla^2_{xy}J(t,X_t,X_t)\Big)\sigma(t,X_t)\sigma(t,X_t)^\top\Big], \quad t\in[0,T].
\]
Then the process $M^{\alpha^\star}$ defined by
\[
M^{\alpha^\star}_t := v(t,X_t) + \int_0^t \big(f(r,X_r,X_r,\alpha^\star_r) - K_r(\alpha^\star_r)\big)\mathrm{d}r, \quad t\in[0,T],
\]
is an $(\mathbb{F},\mathbb{P}^{\alpha^\star})$–martingale. Furthermore, for any arbitrary admissible control $\alpha\in\mathcal{A}$, the corresponding process $M^{\alpha}$ is an $(\mathbb{F},\mathbb{P}^{\alpha})$–super-martingale.

Proof. We prove the martingale property for $\alpha^\star$. Fix $0\le s\le t\le T$. We compute the conditional expectation of the increment
\[
\mathbb{E}^{\mathbb{P}^{\alpha^\star}}\big[M^{\alpha^\star}_t - M^{\alpha^\star}_s \,\big|\, \mathcal{F}_s\big] = \mathbb{E}^{\mathbb{P}^{\alpha^\star}}\bigg[v(t,X_t) - v(s,X_s) + \int_s^t \big(f(r,X_r,X_r,\alpha^\star_r) - K_r(\alpha^\star_r)\big)\mathrm{d}r \,\bigg|\, \mathcal{F}_s\bigg].
\]
By the Markov property of the state process $X$ and the feedback nature of $\alpha^\star$, we can rewrite the conditional expectation as an expectation starting at time $s$
\[
\mathbb{E}^{\mathbb{P}^{\alpha^\star}}\big[M^{\alpha^\star}_t - M^{\alpha^\star}_s \,\big|\, \mathcal{F}_s\big] = \mathbb{E}^{\mathbb{P}^{s,X_s,\alpha^\star}}\bigg[v(t,X_t) + \int_s^t \big(f(r,X_r,X_r,\alpha^\star_r) - K_r(\alpha^\star_r)\big)\mathrm{d}r\bigg] - v(s,X_s).
\]
We now compare this expression with the extended DPP (3.6). Observe that the expectation terms appearing in (3.6) are taken under the measure $\mathbb{P}^{r,X_r,\alpha^\star}$. These terms correspond precisely to the derivatives $\nabla_y J(r,X_r,X_r)$, $\nabla^2_{yy}J(r,X_r,X_r)$, and $\nabla^2_{xy}J(r,X_r,X_r)$ appearing in our definition of $K_r(\alpha^\star_r)$. Consequently, the integral term involving $K_r$ exactly cancels the inconsistency cost terms in the extended DPP, leaving the martingale difference equal to zero. □

With this in mind, we go on to provide a rigorous proof of Theorem 3.7. We assume the existence of a smooth equilibrium control $\alpha^\star$ and smooth value functions $V$ and $J$, and we show that they necessarily induce a solution to the BSDE system (3.7) and satisfy the Hamiltonian maximisation condition.
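The adjustment term $K_t$ collects precisely the parameter and cross derivatives produced when differentiating the diagonal map $x\mapsto J(t,x,x)$. As a quick sanity check (not part of the proof), the underlying chain-rule identities can be verified symbolically in one dimension, with a hypothetical smooth stand-in for $J$:

```python
import sympy as sp

# Symbolic check of the diagonal chain rule for V(x) := J(x, x):
#   V'(x)  = (d_x J + d_y J)(x, x)
#   V''(x) = (d_xx J + 2 d_xy J + d_yy J)(x, x)
# J below is an arbitrary smooth stand-in, not the paper's value function.
x, y = sp.symbols('x y')
J = sp.sin(x * y) + x**3 * y**2

V = J.subs(y, x)                      # diagonal restriction
lhs1 = sp.diff(V, x)
rhs1 = (sp.diff(J, x) + sp.diff(J, y)).subs(y, x)
lhs2 = sp.diff(V, x, 2)
rhs2 = (sp.diff(J, x, 2) + 2 * sp.diff(J, x, y) + sp.diff(J, y, 2)).subs(y, x)
print(sp.simplify(lhs1 - rhs1), sp.simplify(lhs2 - rhs2))
```

In the paper's setting the same identities are applied coordinate-wise in $\mathbb{R}^n$; the terms $\nabla_y J$, $\nabla^2_{yy}J$ and $\nabla^2_{xy}J$ in $K_t$ are exactly the pieces that the diagonal restriction adds on top of the derivatives of $J$ in its second argument.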
Proof of Theorem 3.7. The proof proceeds in three steps: first, we identify the auxiliary processes for the parameter derivatives; second, we derive the dynamics of the value function using the extended DPP; and third, we verify the Hamiltonian maximisation condition.

Let us start by showing that the derivative processes satisfy (3.7). Recall the definition of the auxiliary value function with fixed preference parameter $y\in\mathbb{R}^n$
\[
J(t,x,y) := \mathbb{E}^{\mathbb{P}^{t,x,\alpha^\star}}\bigg[\int_t^T f(s,y,X_s,\alpha^\star_s)\,\mathrm{d}s + \xi(y,X_T)\bigg].
\]
By the classical Feynman–Kac theorem, for each fixed $y$, the function $(t,x)\mapsto J(t,x,y)$ solves the linear PDE
\[
\partial_t J(t,x,y) + \mathcal{L}^{\alpha^\star(t,x)}_t J(t,x,y) + f\big(t,y,x,\alpha^\star(t,x)\big) = 0, \quad (t,x,y)\in[0,T)\times\mathbb{R}^n\times\mathbb{R}^n, \tag{B.1}
\]
with terminal condition $J(T,x,y) = \xi(y,x)$. By the hypotheses of Theorem 3.7, $J$ is of class $C^{1,2}$ with respect to the spatial and parameter variables. We can therefore differentiate (B.1) with respect to the parameter $y$; note that the derivatives $\partial_y f$ and $\partial_y\xi$ of the cost functions exist by Assumption 3.1. Let $v_y(t,x) := \partial_y J(t,x,y)$ denote the gradient with respect to $y$. It satisfies the linearised PDE
\[
\partial_t v_y(t,x) + \mathcal{L}^{\alpha^\star(t,x)}_t v_y(t,x) + \partial_y f\big(t,y,x,\alpha^\star(t,x)\big) = 0, \quad (t,x,y)\in[0,T)\times\mathbb{R}^n\times\mathbb{R}^n, \qquad v_y(T,x) = \partial_y\xi(y,x), \quad (x,y)\in\mathbb{R}^n\times\mathbb{R}^n.
\]
This is a standard linear parabolic equation. The probabilistic representation of its solution is given by the BSDE
\[
\partial Y^y_t = \partial_y\xi(y,X_T) + \int_t^T \partial_y f(r,y,X_r,\alpha^\star_r)\,\mathrm{d}r - \int_t^T \partial Z^y_r\cdot\mathrm{d}W^{\alpha^\star}_r, \quad t\in[0,T],
\]
where we identify $\partial Y^y_t = \partial_y J(t,X_t,y)$ and $\partial Z^y_t = \sigma(t,X_t)^\top\partial^2_{xy}J(t,X_t,y)$. We also let $\alpha^\star_t := \alpha^\star(t,X_t)$, abusing notation slightly. However, the system (3.7) is written under the reference measure $\mathbb{P}$ (where $W$ is an $(\mathbb{F},\mathbb{P})$–Brownian motion), not $\mathbb{P}^{\alpha^\star}$.
Recall that $\mathrm{d}W^{\alpha^\star}_r = \mathrm{d}W_r - b(r,X_r,\alpha^\star_r)\,\mathrm{d}r$. Substituting this change of measure into the equation above yields
\[
\partial Y^y_t = \partial_y\xi(y,X_T) + \int_t^T \big(\partial_y f(r,y,X_r,\alpha^\star_r) + \partial Z^y_r\cdot b(r,X_r,\alpha^\star_r)\big)\mathrm{d}r - \int_t^T \partial Z^y_r\cdot\mathrm{d}W_r.
\]
This matches exactly the second equation of the system (3.7). The derivation for the Hessian process $\partial\partial Y^y$ follows an identical argument, differentiating the PDE twice.

Let us now address the dynamics of the process $Y$. We start by determining its driver, keeping in mind that $Y_t := V(t,X_t)$. By Itô's formula
\[
\mathrm{d}Y_t = \big(\partial_t V(t,X_t) + \mathcal{L}^{\alpha^\star_t}_t V(t,X_t)\big)\mathrm{d}t + \nabla_x V(t,X_t)^\top\sigma(t,X_t)\,\mathrm{d}W^{\alpha^\star}_t.
\]
To identify the drift term $\partial_t V + \mathcal{L}^{\alpha^\star}V$, we use the extended DPP (Theorem 3.3). Since $\alpha^\star$ is an equilibrium control, Lemma B.1 implies that the process
\[
M_t := V(t,X_t) + \int_0^t \big(f(r,X_r,X_r,\alpha^\star_r) - K_r(\alpha^\star_r)\big)\mathrm{d}r, \quad t\in[0,T],
\]
is an $(\mathbb{F},\mathbb{P}^{\alpha^\star})$–martingale. Thus the drift of $M$ must vanish; computing it and setting it to zero gives
\[
\underbrace{\partial_t V(t,X_t) + \mathcal{L}^{\alpha^\star_t}_t V(t,X_t)}_{\text{drift of } V} + \underbrace{f(t,X_t,X_t,\alpha^\star_t) - K_t(\alpha^\star_t)}_{\text{drift from the integral}} = 0, \quad \mathrm{d}t\otimes\mathbb{P}\text{–a.e.}
\]
Therefore, the generator of the value function is given by
\[
\partial_t V(t,X_t) + \mathcal{L}^{\alpha^\star_t}_t V(t,X_t) = -f(t,X_t,X_t,\alpha^\star_t) + K_t(\alpha^\star_t), \quad \mathrm{d}t\otimes\mathbb{P}\text{–a.e.} \tag{B.2}
\]
We now define the BSDE variables for the value function. Let
\[
Z_t := \sigma(t,X_t)^\top\nabla_x V(t,X_t), \quad t\in[0,T].
\]
Under the reference measure $\mathbb{P}$, the dynamics of $Y$ is
\[
\mathrm{d}Y_t = \big(\partial_t V(t,X_t) + \mathcal{L}^{\alpha^\star_t}_t V(t,X_t) - Z_t\cdot b(t,X_t,\alpha^\star_t)\big)\mathrm{d}t + Z_t\cdot\mathrm{d}W_t.
\]
Substituting the generator expression from (B.2) and expanding $K_t$
\[
\mathrm{d}Y_t = \Big(b(t,X_t,\alpha^\star_t)\cdot\sigma(t,X_t)^\top\nabla_y J(t,X_t,X_t) - f(t,X_t,X_t,\alpha^\star_t) + \frac{1}{2}\mathrm{Tr}\big[\sigma(t,X_t)\sigma(t,X_t)^\top\nabla^2_{yy}J(t,X_t,X_t)\big] + \mathrm{Tr}\big[\sigma(t,X_t)\sigma(t,X_t)^\top\nabla^2_{xy}J(t,X_t,X_t)\big] - Z_t\cdot b(t,X_t,\alpha^\star_t)\Big)\mathrm{d}t + Z_t\cdot\mathrm{d}W_t.
\]
We identify the terms with the BSDE variables defined above
\[
\partial Y^{X_t}_t = \nabla_y J(t,X_t,X_t), \qquad \partial\partial Y^{X_t}_t = \nabla^2_{yy}J(t,X_t,X_t), \qquad \partial Z^{X_t}_t = \sigma(t,X_t)^\top\nabla^2_{xy}J(t,X_t,X_t), \quad t\in[0,T].
\]
The driver becomes
\[
\mathrm{driver}_t = f(t,X_t,X_t,\alpha^\star_t) + b(t,X_t,\alpha^\star_t)\cdot\big(Z_t - \sigma(t,X_t)^\top\partial Y^{X_t}_t\big) - \frac{1}{2}\mathrm{Tr}\big[\sigma(t,X_t)\sigma(t,X_t)^\top\partial\partial Y^{X_t}_t\big] - \mathrm{Tr}\big[\sigma(t,X_t)\partial Z^{X_t}_t\big].
\]
This matches the driver of the first equation of (3.7), provided that $\alpha^\star$ maximises the Hamiltonian, which is what we are left to prove. To do so, we compare the dynamics of the equilibrium value function under $\alpha^\star$ versus an arbitrary control $\alpha$. Since $M^{\alpha^\star}$ is an $(\mathbb{F},\mathbb{P}^{\alpha^\star})$–martingale, its drift is exactly zero
\[
\partial_t V(t,X_t) + \mathcal{L}^{\alpha^\star_t}_t V(t,X_t) + f(t,X_t,X_t,\alpha^\star_t) - K_t(\alpha^\star_t) = 0. \tag{B.3}
\]
By Lemma B.1, for an arbitrary control $\alpha$ the process $M^{\alpha}$ is a super-martingale, so its drift must be non-positive
\[
\partial_t V(t,X_t) + \mathcal{L}^{\alpha_t}_t V(t,X_t) + f(t,X_t,X_t,\alpha_t) - K_t(\alpha_t) \le 0. \tag{B.4}
\]
We now subtract the equality (B.3) from the inequality (B.4). Note that the terms not depending on the control cancel immediately:

• the time derivative $\partial_t V$ cancels;

• the second-order diffusion term in $\mathcal{L}^{\alpha^\star_t}_t$ and $\mathcal{L}^{\alpha_t}_t$ involves $\frac{1}{2}\mathrm{Tr}\big[\sigma(t,x)\sigma(t,x)^\top\nabla^2_{xx}V\big]$; since the volatility is uncontrolled, this term is identical for both $\alpha$ and $\alpha^\star$ and cancels;

• the second-order trace term inside the inconsistency adjustment $K_t$ (see Lemma B.1) also depends only on $\sigma(x)$ (see Remark 3.8); it is identical in both equations and also cancels.
We are left with the first-order terms
\[
\Big(b(t,X_t,\alpha_t)\cdot\sigma(t,X_t)^\top\nabla_x V + f(t,X_t,X_t,\alpha_t) - b(t,X_t,\alpha_t)\cdot\sigma(t,X_t)^\top\partial_y J(t,X_t,X_t)\Big) - \Big(b(t,X_t,\alpha^\star_t)\cdot\sigma(t,X_t)^\top\nabla_x V + f(t,X_t,X_t,\alpha^\star_t) - b(t,X_t,\alpha^\star_t)\cdot\sigma(t,X_t)^\top\partial_y J(t,X_t,X_t)\Big) \le 0.
\]
Rearranging this inequality to isolate the terms depending on the control, and identifying $Z_t = \sigma(t,X_t)^\top\nabla_x V$ and $\partial Y^{X_t}_t = \nabla_y J(t,X_t,X_t)$, we obtain
\[
f(t,X_t,X_t,\alpha_t) + b(t,X_t,\alpha_t)\cdot\big(Z_t - \sigma(t,X_t)^\top\partial Y^{X_t}_t\big) \le f(t,X_t,X_t,\alpha^\star_t) + b(t,X_t,\alpha^\star_t)\cdot\big(Z_t - \sigma(t,X_t)^\top\partial Y^{X_t}_t\big).
\]
Since this holds for any admissible control $\alpha$, it implies that $\alpha^\star_t$ maximises the expression $\mathrm{d}t\otimes\mathrm{d}\mathbb{P}$–a.e., concluding the proof. □

C Proof of the verification theorem

In this section we present the proof of Theorem 3.9. We start by showing that the BSDE system, which was introduced informally in Section 3, is closely related to the control problem. Let us introduce the notation
\[
h_t(y,x,z,a) := f(t,y,x,a) + z\cdot b(t,x,a).
\]
We also denote by $\mathcal{L}^{\alpha}_{t,(y)}$ the generator associated with the control $\alpha$ but acting on the variable $y$: for a function $\psi(y)$, we define
\[
\mathcal{L}^{\alpha_t}_{t,(y)}\psi(y) := b(t,X_t,\alpha_t)\cdot\sigma(t,X_t)^\top\nabla_y\psi(y) + \frac{1}{2}\mathrm{Tr}\big[\sigma(t,X_t)\sigma(t,X_t)^\top\nabla^2_{yy}\psi(y)\big].
\]
In the spirit of Hernández and Possamaï [21], for a control process $\alpha\in\mathcal{A}$ and an initial condition $(t,x)$ for the state process $X$, we define the following auxiliary processes $(Y^{y,\alpha},Z^{y,\alpha})$.
For a fixed parameter $y\in\mathbb{R}^n$, they solve the BSDE
\[
\begin{cases}
Y^{y,\alpha}_s = \xi(y,X_T) + \displaystyle\int_s^T h_u\big(y,X_u,Z^{y,\alpha}_u,\alpha_u\big)\,\mathrm{d}u - \int_s^T Z^{y,\alpha}_u\,\mathrm{d}W_u, & s\in[t,T],\\[1ex]
\partial Y^{y,\alpha}_s = \nabla_y\xi(y,X_T) + \displaystyle\int_s^T \big(\nabla_y f(u,y,X_u,\alpha_u) + \partial Z^{y,\alpha}_u\cdot b(u,X_u,\alpha_u)\big)\,\mathrm{d}u - \int_s^T \partial Z^{y,\alpha}_u\,\mathrm{d}W_u, & s\in[t,T],\\[1ex]
\partial\partial Y^{y,\alpha}_s = \nabla^2_{yy}\xi(y,X_T) + \displaystyle\int_s^T \big(\nabla^2_{yy}f(u,y,X_u,\alpha_u) + \partial\partial Z^{y,\alpha}_u\,b(u,X_u,\alpha_u)\big)\,\mathrm{d}u - \int_s^T \partial\partial Z^{y,\alpha}_u\,\mathrm{d}W_u, & s\in[t,T].
\end{cases} \tag{C.1}
\]
The structure of this system is the same as that of (3.7), and we impose the same notion of solution. We start the analysis with the following lemma.

Lemma C.1. We have $Y^{y,\alpha}_t = J(t,x,y,\alpha)$.

Proof. We work under the probability measure $\mathbb{P}^{t,x,\alpha}$, under which $X_t = x$ and the dynamics on $[t,T]$ are controlled by $\alpha$; recall that under this measure the Brownian motion is $W^{\alpha}$. Substituting the dynamics of $X$ into the first equation of (C.1), we have
\[
\mathrm{d}Y^{y,\alpha}_u = -\big(h_u(y,X_u,Z^{y,\alpha}_u,\alpha_u) - Z^{y,\alpha}_u\cdot b(u,X_u,\alpha_u)\big)\,\mathrm{d}u + Z^{y,\alpha}_u\cdot\mathrm{d}W^{\alpha}_u, \quad u\in[t,T].
\]
The drift term simplifies to $-f(u,y,X_u,\alpha_u)$. Integrating from $t$ to $T$
\[
Y^{y,\alpha}_t = \xi(y,X_T) + \int_t^T f(u,y,X_u,\alpha_u)\,\mathrm{d}u - \int_t^T Z^{y,\alpha}_u\cdot\mathrm{d}W^{\alpha}_u.
\]
Taking expectations under $\mathbb{P}^{t,x,\alpha}$ eliminates the stochastic integral
\[
Y^{y,\alpha}_t = \mathbb{E}^{\mathbb{P}^{t,x,\alpha}}\bigg[\int_t^T f(u,y,X_u,\alpha_u)\,\mathrm{d}u + \xi(y,X_T)\bigg].
\]
By definition, the right-hand side is exactly the cost functional $J(t,x,y,\alpha)$. □

In other words, the process $Y^{y,\alpha}$ captures the dynamics of the reward functional when the value $y$ is kept fixed. Note that this could also have been deduced from the PDE for $J$, as it is easy to show that $Y^{y,\alpha^\star}_t = J(t,X_t,y)$. The idea now is to fix an equilibrium control $\alpha^\star$ and to understand the corresponding process $Y^{X_t,\alpha^\star}_t$.
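The mechanism behind Lemma C.1 is the Girsanov cancellation: the $z\cdot b$ term in the driver $h$ is exactly the correction turning an expectation under the reference measure into one under the controlled measure. A toy Monte Carlo check of this identity, with deliberately simple hypothetical one-dimensional coefficients (not the paper's model):

```python
import numpy as np

# Toy check of the measure-change identity behind Lemma C.1. With sigma = 1,
# b(a) = a constant, f(u, y, x, a) = x and xi = 0 (all hypothetical choices),
# the cost functional is
#   J = E^{P^alpha}[ int_0^1 X_u du ],   dX_u = a du + dW^alpha_u,  X_0 = 0,
# which equals  E^P[ Gamma_T * int_0^1 X_u du ]  with X = W a Brownian motion
# under P and Girsanov density Gamma_T = exp(a W_T - a^2 T / 2).
rng = np.random.default_rng(42)
a, T, n_steps, n_paths = 0.3, 1.0, 100, 50_000
dt = T / n_steps

dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.cumsum(dW, axis=1)                      # X = W under P
integral = W.sum(axis=1) * dt                  # Riemann sum for int_0^T X_u du
gamma = np.exp(a * W[:, -1] - 0.5 * a**2 * T)  # Girsanov density
mc = np.mean(gamma * integral)                 # E^P[Gamma * int X]

closed_form = a * T**2 / 2                     # E^{P^alpha}[int X] = a T^2 / 2
print(mc, closed_form)
```

Up to Monte Carlo and time-discretisation error, the reweighted expectation under $\mathbb{P}$ matches the closed-form value computed directly under $\mathbb{P}^{\alpha}$, which is the content of the drift simplification in the proof above.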
One key observation is that it can be understood from two perspectives:

(i) from that of (C.1), fixing the value $y = X_t$ and considering the resulting dynamics; this shows that $Y^{X_t,\alpha^\star}_t = J(t,x,x,\alpha^\star) = V(t,x) = V(t,X_t)$;

(ii) or as an Itô process: we have defined a one-parameter family of processes, and we consider $Y^{X_t,\alpha^\star}_t$ as the composition of this family with a process. In other words, we let the superscript parameter change as time advances.

In the informal derivation of the BSDE system, we wrote $Y_t = V(t,X_t)$. The first goal of this section is to recover this rigorously, starting from the BSDE system (3.7). As we already have $Y^{X_t,\alpha^\star}_t = V(t,X_t)$, it remains to show that $Y^{X_t,\alpha^\star}_t = Y_t$ under suitable assumptions, which happen to be the ones introduced in Section 3.

Proposition C.2. Let Assumption 3.1 hold. Let $(Y,Z,\partial Y,\partial Z,\partial\partial Y,\partial\partial Z)$ be a solution to (3.7) in the sense of Definition 3.6, with $\alpha^\star_t = V^\star(t,X_t,Z_t,\partial Y^{X_t}_t)$. Then, under $\mathbb{P}^{\alpha^\star}$,
\[
Y_t = Y^{X_t,\alpha^\star}_t, \quad t\in[0,T].
\]
Proof. As the equilibrium control $\alpha^\star$ maximises the Hamiltonian $H$, we substitute the optimal drift into the first equation of (3.7). Recall that the Hamiltonian is given by
\[
H(t,x,z,\gamma,\eta,\rho) = f(t,x,x,\alpha^\star_t) + b(t,x,\alpha^\star_t)\cdot\big(z - \sigma(t,x)^\top\gamma\big) - \frac{1}{2}\mathrm{Tr}\big[\sigma(t,x)\sigma(t,x)^\top\eta\big] - \mathrm{Tr}\big[\sigma(t,x)\rho^\top\big].
\]
Thus, the dynamics of $Y$ under the reference measure $\mathbb{P}$ is
\[
\mathrm{d}Y_t = -\Big(f(t,X_t,X_t,\alpha^\star_t) + b(t,X_t,\alpha^\star_t)\cdot\big(Z_t - \sigma(t,X_t)^\top\partial Y^{X_t}_t\big) - \frac{1}{2}\mathrm{Tr}\big[\sigma(t,X_t)\sigma(t,X_t)^\top\partial\partial Y^{X_t}_t\big] - \mathrm{Tr}\big[\sigma(t,X_t)(\partial Z^{X_t}_t)^\top\big]\Big)\mathrm{d}t + Z_t\,\mathrm{d}W_t.
\]
We change the measure to $\mathbb{P}^{\alpha^\star}$ using the transformation $\mathrm{d}W_t = \mathrm{d}W^{\alpha^\star}_t + b(t,X_t,\alpha^\star_t)\,\mathrm{d}t$.
The term $Z_t\cdot b(t,X_t,\alpha^\star_t)$ arising from the Girsanov transformation cancels with the term $-b(t,X_t,\alpha^\star_t)\cdot Z_t$ inside the Hamiltonian driver. This yields the following dynamics for $Y$ under $\mathbb{P}^{\alpha^\star}$
\[
Y_t = \xi(X_T,X_T) + \int_t^T \Big(f(u,X_u,X_u,\alpha^\star_u) - b(u,X_u,\alpha^\star_u)\cdot\sigma(u,X_u)^\top\partial Y^{X_u}_u - \frac{1}{2}\mathrm{Tr}\big[\sigma(u,X_u)\sigma(u,X_u)^\top\partial\partial Y^{X_u}_u\big] - \mathrm{Tr}\big[\sigma(u,X_u)(\partial Z^{X_u}_u)^\top\big]\Big)\mathrm{d}u - \int_t^T Z_u\cdot\mathrm{d}W^{\alpha^\star}_u. \tag{C.2}
\]
Now we apply the Itô–Kunita–Wentzell formula to the composed process $Y^{X_t,\alpha^\star}_t$. From the auxiliary system (C.1), for fixed $y$, the process $Y^{y,\alpha^\star}$ satisfies, under $\mathbb{P}^{\alpha^\star}$, the dynamics
\[
\mathrm{d}Y^{y,\alpha^\star}_u = -f(u,y,X_u,\alpha^\star_u)\,\mathrm{d}u + Z^{y,\alpha^\star}_u\cdot\mathrm{d}W^{\alpha^\star}_u.
\]
The dynamics of the composition $Y^{X_t,\alpha^\star}_t$ is given by
\[
\mathrm{d}Y^{X_t,\alpha^\star}_t = \mathrm{d}Y^{y,\alpha^\star}_t\big|_{y=X_t} + \partial_y Y^{X_t,\alpha^\star}_t\cdot\mathrm{d}X_t + \frac{1}{2}\mathrm{Tr}\big[\sigma(t,X_t)\sigma(t,X_t)^\top\partial^2_{yy}Y^{X_t,\alpha^\star}_t\big]\mathrm{d}t + \mathrm{Tr}\big[\sigma(t,X_t)(\partial Z^{X_t,\alpha^\star}_t)^\top\big]\mathrm{d}t,
\]
where $\partial Z^{X_t,\alpha^\star}$ denotes the gradient field $\nabla_y Z^{y,\alpha^\star}\big|_{y=X_t}$. Substituting $\mathrm{d}X_t = \sigma(t,X_t)b(t,X_t,\alpha^\star_t)\,\mathrm{d}t + \sigma(t,X_t)\,\mathrm{d}W^{\alpha^\star}_t$,
\[
\mathrm{d}Y^{X_t,\alpha^\star}_t = \Big(-f(t,X_t,X_t,\alpha^\star_t) + \partial_y Y^{X_t,\alpha^\star}_t\cdot\sigma(t,X_t)b(t,X_t,\alpha^\star_t) + \frac{1}{2}\mathrm{Tr}\big[\sigma(t,X_t)\sigma(t,X_t)^\top\partial^2_{yy}Y^{X_t,\alpha^\star}_t\big] + \mathrm{Tr}\big[\sigma(t,X_t)(\partial Z^{X_t,\alpha^\star}_t)^\top\big]\Big)\mathrm{d}t + \big(Z^{X_t,\alpha^\star}_t + \sigma(t,X_t)^\top\partial_y Y^{X_t,\alpha^\star}_t\big)\cdot\mathrm{d}W^{\alpha^\star}_t.
\]
Rearranging the drift term $\partial_y Y\cdot\sigma b = b\cdot\sigma^\top\partial_y Y$, and identifying the cross-variation trace term $\mathrm{Tr}[\sigma(\partial Z)^\top]$ with the Hamiltonian term $\mathrm{Tr}[\sigma\partial Z]$, we observe that $Y^{X_t,\alpha^\star}$ satisfies exactly the same linear BSDE as $Y$, derived in (C.2). Specifically, we identify the variable $Z_t$ with the diffusion coefficient $Z^{X_t,\alpha^\star}_t + \sigma(t,X_t)^\top\partial Y^{X_t}_t$, and we identify the auxiliary field derivatives $\partial_y Y$ and $\partial^2_{yy}Y$ with the solution processes $\partial Y$ and $\partial\partial Y$ (which satisfy the same equations, by uniqueness).
Thus, by uniqueness of solutions to BSDEs, we conclude that $Y_t = Y^{X_t,\alpha^\star}_t$. □

We remark that we have arrived at the BSDE system deduced from the PDE system appearing in [8] by purely probabilistic arguments, namely the Itô–Kunita–Wentzell formula. With the central Proposition C.2 proven, we move on to the proof of Theorem 3.9.

Proof of Theorem 3.9. Let $(t,x)$ be a fixed pair in $[0,T]\times\mathbb{R}^n$ and let $\alpha$ be an arbitrary admissible control in $\mathcal{A}$. We aim to verify the equilibrium condition given in Definition 2.4. For a strictly positive time step $\ell>0$, we consider the concatenated control strategy $\hat\alpha := \alpha\otimes_\ell\alpha^\star$. We analyse the difference between the cost of this perturbed strategy and the cost of the equilibrium strategy, $J(t,x,\hat\alpha) - J(t,x,\alpha^\star)$.

Recall that the value function is defined as $v(t,x) = J(t,x,\alpha^\star)$. We expand the cost of the perturbed strategy using the definition of the cost functional
\[
J(t,x,\hat\alpha) = \mathbb{E}^{\mathbb{P}^{t,x,\alpha}}\bigg[\int_t^{t+\ell} f(r,x,X_r,\alpha_r)\,\mathrm{d}r + \int_{t+\ell}^T f(r,x,X_r,\alpha^\star_r)\,\mathrm{d}r + \xi(x,X_T)\bigg] = \mathbb{E}^{\mathbb{P}^{t,x,\alpha}}\bigg[\int_t^{t+\ell} f(r,x,X_r,\alpha_r)\,\mathrm{d}r + \mathbb{E}^{\mathbb{P}^{t,x,\alpha}}\Big[\int_{t+\ell}^T f(r,x,X_r,\alpha^\star_r)\,\mathrm{d}r + \xi(x,X_T)\,\Big|\,\mathcal{F}_{t+\ell}\Big]\bigg].
\]
Using the concatenated-measure property, we identify the conditional expectation as the auxiliary value process $Y$, evaluated with the fixed preference parameter $x$ under the equilibrium control $\alpha^\star$:
\[
J(t,x,\hat\alpha) = \mathbb{E}^{\mathbb{P}^{t,x,\alpha}}\bigg[\int_t^{t+\ell} f(r,x,X_r,\alpha_r)\,\mathrm{d}r + Y^{x,\alpha^\star}_{t+\ell}\bigg].
\]
We add and subtract the equilibrium value function at time $t+\ell$, which satisfies the relation $v(t+\ell,X_{t+\ell}) = Y^{X_{t+\ell},\alpha^\star}_{t+\ell}$. This yields
\[
J(t,x,\hat\alpha) = \mathbb{E}^{\mathbb{P}^{t,x,\alpha}}\bigg[\int_t^{t+\ell} f(r,x,X_r,\alpha_r)\,\mathrm{d}r + v(t+\ell,X_{t+\ell})\bigg] + I,
\]
where the term $I$ captures the cost of inconsistency due to the changing preference parameter
\[
I := \mathbb{E}^{\mathbb{P}^{t,x,\alpha}}\Big[Y^{x,\alpha^\star}_{t+\ell} - Y^{X_{t+\ell},\alpha^\star}_{t+\ell}\Big].
\]
Since we assumed that $v$ is in $C^{1,2}([0,T)\times\mathbb{R}^n)$, we apply Itô's formula to the process $v(s,X_s)$ on the interval $[t,t+\ell]$ under the measure $\mathbb{P}^{t,x,\alpha}$, recalling that $\mathrm{d}X_r = \sigma(r,X_r)b(r,X_r,\alpha_r)\,\mathrm{d}r + \sigma(r,X_r)\,\mathrm{d}W^{\alpha}_r$:
\[
v(t+\ell,X_{t+\ell}) = v(t,x) + \int_t^{t+\ell}\Big(\partial_t v(r,X_r) + b(r,X_r,\alpha_r)\cdot\sigma(r,X_r)^\top\partial_x v(r,X_r) + \frac{1}{2}\mathrm{Tr}\big[\sigma(r,X_r)\sigma(r,X_r)^\top\partial^2_{xx}v(r,X_r)\big]\Big)\mathrm{d}r + \int_t^{t+\ell}\partial_x v(r,X_r)\cdot\sigma(r,X_r)\,\mathrm{d}W^{\alpha}_r.
\]
Taking expectations under $\mathbb{P}^{t,x,\alpha}$ eliminates the stochastic integral. Indeed, we identify the integrand $\sigma(r,X_r)^\top\partial_x v(r,X_r)$ with the process $Z_r$ from the BSDE governing $v$; by Definition 3.6, we have $Z\in\mathbb{H}^2(\mathbb{R}^d,\mathbb{F},\mathbb{P}^{t,x,\alpha})$, so the stochastic integral is a true martingale with zero expectation. Substituting this into the expression for $J(t,x,\hat\alpha)$, we obtain
\[
J(t,x,\hat\alpha) - v(t,x) = \mathbb{E}^{\mathbb{P}^{t,x,\alpha}}\bigg[\int_t^{t+\ell}\Big(f(r,x,X_r,\alpha_r) + \partial_t v(r,X_r) + b(r,X_r,\alpha_r)\cdot\sigma(r,X_r)^\top\partial_x v(r,X_r) + \frac{1}{2}\mathrm{Tr}\big[\sigma(r,X_r)\sigma(r,X_r)^\top\partial^2_{xx}v(r,X_r)\big]\Big)\mathrm{d}r\bigg] + I.
\]
To analyse $I$, we apply Condition (A.2) to the map $y\mapsto Y^{y,\alpha^\star}_{t+\ell}$ along the process $X_r$, for $r\in[t,t+\ell]$. This yields
\[
Y^{X_{t+\ell},\alpha^\star}_{t+\ell} - Y^{x,\alpha^\star}_{t+\ell} = \int_t^{t+\ell}\partial Y^{X_r,\alpha^\star}_{t+\ell}\cdot\mathrm{d}X_r + \frac{1}{2}\int_t^{t+\ell}\mathrm{Tr}\big[\sigma(r,X_r)\sigma(r,X_r)^\top\partial\partial Y^{X_r,\alpha^\star}_{t+\ell}\big]\mathrm{d}r.
\]
Substituting the dynamics of $X$, we isolate the stochastic integral term
\[
\int_t^{t+\ell}\partial Y^{X_r,\alpha^\star}_{t+\ell}\cdot\sigma(r,X_r)\,\mathrm{d}W^{\alpha}_r.
\]
Taking expectations under $\mathbb{P}^{t,x,\alpha}$, this term vanishes. Indeed, for any fixed parameter $y$, the process $\partial Y^y$ solves a linear BSDE whose driver $\nabla_y f$ and terminal condition $\nabla_y\xi$ have polynomial growth in $y$ and $x$ (Assumption 3.1-(iii)). Standard BSDE estimates (e.g. [15, Proposition 2.1]) imply that the solution $\partial Y^y$ inherits this polynomial growth.
Consequently, when evaluated at $y = X_r$, the integrand $\partial Y\cdot\sigma$ has polynomial growth in $X_r$. Given the finite moments of $X$ (Assumption 3.1-(iv)), the integrand belongs to $\mathbb{H}^2(\mathbb{R}^d,\mathbb{F},\mathbb{P}^{t,x,\alpha})$, making the integral a true martingale with zero mean. We are thus left with the drift terms
\[
I = -\mathbb{E}^{\mathbb{P}^{t,x,\alpha}}\bigg[\int_t^{t+\ell}\Big(b(r,X_r,\alpha_r)\cdot\sigma(r,X_r)^\top\partial_y Y^{X_r,\alpha^\star}_{t+\ell} + \frac{1}{2}\mathrm{Tr}\big[\sigma(r,X_r)\sigma(r,X_r)^\top\partial^2_{yy}Y^{X_r,\alpha^\star}_{t+\ell}\big]\Big)\mathrm{d}r\bigg].
\]
We now combine the results. We add and subtract two specific terms inside the integral:

1. the running cost evaluated at the current state preference, $f(r,X_r,X_r,\alpha_r)$;

2. the generator adjustment term evaluated at the current time, $\mathcal{L}^{\alpha_r}_{r,(y)}Y^{X_r,\alpha^\star}_r$.

Grouping these terms appropriately, we obtain the following decomposition
\[
J(t,x,\hat\alpha) - v(t,x) = \mathbb{E}^{\mathbb{P}^{t,x,\alpha}}\bigg[\int_t^{t+\ell}\underbrace{\Big(\partial_t v(r,X_r) + \mathcal{L}^{\alpha_r}_r v(r,X_r) + f(r,X_r,X_r,\alpha_r) - \mathcal{L}^{\alpha_r}_{r,(y)}Y^{X_r,\alpha^\star}_r\Big)}_{\text{Term A: Hamiltonian gap}}\mathrm{d}r\bigg] + \mathbb{E}^{\mathbb{P}^{t,x,\alpha}}\bigg[\int_t^{t+\ell}\underbrace{\big(f(r,x,X_r,\alpha_r) - f(r,X_r,X_r,\alpha_r)\big)}_{\text{Term B: preference approximation}}\mathrm{d}r\bigg] + \mathbb{E}^{\mathbb{P}^{t,x,\alpha}}\bigg[\int_t^{t+\ell}\underbrace{\Big(\mathcal{L}^{\alpha_r}_{r,(y)}Y^{X_r,\alpha^\star}_r - \mathcal{L}^{\alpha_r}_{r,(y)}Y^{X_r,\alpha^\star}_{t+\ell}\Big)}_{\text{Term C: continuity error}}\mathrm{d}r\bigg].
\]
Analysis of Term A. This term measures the local sub-optimality of the control $\alpha$. Let $I_r$ denote the integrand
\[
I_r := \partial_t v(r,X_r) + \mathcal{L}^{\alpha_r}_r v(r,X_r) + f(r,X_r,X_r,\alpha_r) - \mathcal{L}^{\alpha_r}_{r,(y)}Y^{X_r,\alpha^\star}_r.
\]
We identify $\partial_t v$ using the first equation of the BSDE system (3.7). Under the reference measure $\mathbb{P}$, the drift of the process $Y_r = v(r,X_r)$ is given by the driver $-H$. Comparing this with the drift obtained from Itô's formula applied to $v(r,X_r)$, we establish the identity
\[
\partial_t v(r,X_r) = -H\big(r,X_r,Z_r,\partial Y^{X_r}_r,\partial\partial Y^{X_r}_r,\partial Z^{X_r}_r\big) - \frac{1}{2}\mathrm{Tr}\big[\sigma(r,X_r)\sigma(r,X_r)^\top\partial^2_{xx}v(r,X_r)\big], \quad \mathrm{d}r\otimes\mathbb{P}\text{–a.e.}
\]
(C.3)

Substituting this expression (C.3) into $I_r$, and expanding the operators $\mathcal{L}^{\alpha_r}_r$ and $\mathcal{L}^{\alpha_r}_{r,(y)}$, we observe two key cancellations:

(i) the diffusion term $\frac{1}{2}\mathrm{Tr}[\sigma\sigma^\top\partial^2_{xx}v]$ from the generator $\mathcal{L}^{\alpha_r}_r v$ cancels with the corresponding term in (C.3);

(ii) the inconsistency terms involving $\partial^2_{yy}J$ and $\partial^2_{xy}J$ appearing in $\mathcal{L}^{\alpha_r}_{r,(y)}Y$ depend only on the volatility $\sigma$ (which is control-independent) and cancel exactly with the inconsistency adjustment terms included in the definition of the extended Hamiltonian $H$.

Consequently, the integrand reduces to the difference between the Hamiltonian objective evaluated at the arbitrary control $\alpha_r$ and its maximum value
\[
I_r = f(r,X_r,X_r,\alpha_r) + b(r,X_r,\alpha_r)\cdot\big(Z_r - \sigma(r,X_r)^\top\partial Y^{X_r}_r\big) - \sup_{a\in A}\Big(f(r,X_r,X_r,a) + b(r,X_r,a)\cdot\big(Z_r - \sigma(r,X_r)^\top\partial Y^{X_r}_r\big)\Big).
\]
Thus $I_r\le 0$ almost surely, and we readily obtain
\[
\int_t^{t+\ell} I_r\,\mathrm{d}r \le 0, \quad \mathbb{P}^{t,x,\alpha}\text{–a.s.}
\]
Analysis of Term B. We use the Lipschitz continuity of $f$ with respect to its first parameter (Assumption 3.1), with Lipschitz constant $L$:
\[
\big|f(r,x,X_r,\alpha_r) - f(r,X_r,X_r,\alpha_r)\big| \le L\|x - X_r\|.
\]
Taking the expectation under $\mathbb{P}^{t,x,\alpha}$,
\[
\bigg|\mathbb{E}^{\mathbb{P}^{t,x,\alpha}}\bigg[\int_t^{t+\ell}\text{Term B}\,\mathrm{d}r\bigg]\bigg| \le L\int_t^{t+\ell}\mathbb{E}^{\mathbb{P}^{t,x,\alpha}}\big[\|X_r - x\|\big]\,\mathrm{d}r.
\]
Using standard moment estimates for SDEs with linear-growth coefficients (see [29, Corollary 2.5.12]), we have $\mathbb{E}^{\mathbb{P}^{t,x,\alpha}}[\|X_r - x\|]\le C(1+\|x\|)\sqrt{r-t}$. Thus
\[
\int_t^{t+\ell}\sqrt{r-t}\,\mathrm{d}r = \Big[\frac{2}{3}(r-t)^{3/2}\Big]_t^{t+\ell} = \frac{2}{3}\ell^{3/2} = o(\ell).
\]
Analysis of Term C. This term arises from the time continuity of the inconsistency adjustment. We analyse the integral of the difference
\[
\Delta_r := \mathcal{L}^{\alpha_r}_{r,(y)}Y^{X_r,\alpha^\star}_r - \mathcal{L}^{\alpha_r}_{r,(y)}Y^{X_r,\alpha^\star}_{t+\ell}.
\]
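As an aside, the elementary integral in the Term B estimate above can be checked symbolically (a throwaway verification, not part of the argument):

```python
import sympy as sp

# Check  int_t^{t+l} sqrt(r - t) dr = (2/3) * l**(3/2),  and that this is o(l).
t, l, r = sp.symbols('t ell r', positive=True)
integral = sp.integrate(sp.sqrt(r - t), (r, t, t + l))
residual = sp.simplify(integral - sp.Rational(2, 3) * l ** sp.Rational(3, 2))
ratio_limit = sp.limit(integral / l, l, 0)   # o(l) means this limit is 0
print(residual, ratio_limit)
```

The vanishing ratio $\ell^{3/2}/\ell = \sqrt{\ell}\to 0$ is what makes Term B negligible at first order in $\ell$.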
Recall that the operator $\mathcal{L}^{\alpha}_{(y)}$ is linear in the derivatives $\partial_y Y^{\cdot,\alpha^\star}$ and $\partial^2_{yy}Y^{\cdot,\alpha^\star}$, with coefficients $b$ and $\sigma$ satisfying linear-growth conditions. Since the solution to the auxiliary BSDE system belongs to the space $\mathcal{S}^2(\mathbb{R}^d,\mathbb{F},\mathbb{P})$, the mappings $r\mapsto\partial_y Y^{\cdot,\alpha^\star}_r$ and $r\mapsto\partial^2_{yy}Y^{\cdot,\alpha^\star}_r$ are continuous in time in the norm of $L^2(\mathbb{P}^{t,x,\alpha})$. Furthermore, the state process $X$ defines a mapping $r\mapsto X_r$ which is continuous in time in $L^p(\mathbb{P}^{t,x,\alpha})$ for any $p\ge 1$. By Hölder's inequality, the composition appearing in $\Delta_r$ is continuous in time in $L^1(\mathbb{P}^{t,x,\alpha})$. Therefore, we readily obtain
\[
\mathbb{E}^{\mathbb{P}^{t,x,\alpha}}\bigg[\int_t^{t+\ell}\Delta_r\,\mathrm{d}r\bigg] = \int_t^{t+\ell}o(1)\,\mathrm{d}r = o(\ell).
\]
Combining the non-positivity of Term A with the $o(\ell)$ estimates for Terms B and C, we obtain
\[
J(t,x,\hat\alpha) - v(t,x) \le 0 + o(\ell) + o(\ell).
\]
This confirms that the equilibrium strategy $\alpha^\star$ provides a payoff at least as high as that of the perturbed strategy $\hat\alpha$, up to first order, thereby satisfying the definition of an equilibrium control. □

Remark C.3. This result motivates Definition 2.4 in the following sense: one could argue that it would make sense to allow for improvements of order $o(\ell^k)$, since the use of $k=1$ in our definition could seem arbitrary at first. However, we see here that $k=1$ is exactly the power needed to guarantee the result.

D Well-posedness of the BSDE system

In this section, we provide the rigorous proof of the existence and uniqueness of the solution to the system (3.7). We adopt a fixed-point approach on the full system of three equations. To handle the linear growth of the value-function derivatives (typical in linear–quadratic problems), we work in weighted spaces that allow for polynomial growth in the parameter $y$. We also show that our work implies the existence of solutions in the sense of Definition 3.6.
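For readability, we recall the weighted norms used in the estimates below; these are the standard $\beta$-weighted BSDE norms, which we assume coincide with the spaces introduced in Section 3:

```latex
% Standard beta-weighted norms (assumed to match the spaces of Section 3):
\|Y\|^2_{\mathcal{S}^2_\beta(\mathbb{R},\mathbb{F},\mathbb{P})}
   := \mathbb{E}\Big[\sup_{t\in[0,T]} \mathrm{e}^{\beta t}\,|Y_t|^2\Big],
\qquad
\|Z\|^2_{\mathbb{H}^2_\beta(\mathbb{R}^d,\mathbb{F},\mathbb{P})}
   := \mathbb{E}\Big[\int_0^T \mathrm{e}^{\beta t}\,|Z_t|^2\,\mathrm{d}t\Big].
% Taking beta large damps the contribution of the driver and is what
% produces the contraction property used in the fixed-point argument.
```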
To ease the notation, we will denote by $C$ an arbitrary constant that may change from line to line. We first introduce and prove the following standard a priori estimate (similar to El Karoui, Peng, and Quenez [15, Proposition 2.1]).

Lemma D.1 (A priori estimates and contraction). Let $(\delta Y, \delta Z)$ be the solution to the linearised BSDE with driver difference $\delta f$:
\[
-\mathrm{d}(\delta Y_t) = \delta f_t \,\mathrm{d}t - \delta Z_t \,\mathrm{d}W_t, \quad \delta Y_T = 0. \tag{D.1}
\]
For $\beta$ sufficiently large, the following estimate holds:
\[
\| \delta Y \|^2_{\mathcal{S}^2_\beta(\mathbb{R},\mathbb{F},\mathbb{P})} + \| \delta Z \|^2_{\mathcal{H}^2_\beta(\mathbb{R}^d,\mathbb{F},\mathbb{P})} \le \frac{C}{\beta} \| \delta f \|^2_{\mathcal{H}^2_\beta(\mathbb{R},\mathbb{F},\mathbb{P})}. \tag{D.2}
\]

Proof. We start by applying Itô's formula to the process $\mathrm{e}^{\beta t} |\delta Y_t|^2$:
\[
\mathrm{d}\big(\mathrm{e}^{\beta t} |\delta Y_t|^2\big) = \beta \mathrm{e}^{\beta t} |\delta Y_t|^2 \,\mathrm{d}t + \mathrm{e}^{\beta t} \big( 2 \delta Y_t \cdot \mathrm{d}(\delta Y_t) + |\delta Z_t|^2 \,\mathrm{d}t \big) = \mathrm{e}^{\beta t} \big( \beta |\delta Y_t|^2 + |\delta Z_t|^2 - 2 \delta Y_t \cdot \delta f_t \big) \mathrm{d}t + 2 \mathrm{e}^{\beta t} \delta Y_t \cdot \delta Z_t \,\mathrm{d}W_t.
\]
Integrating from $0$ to $T$, taking expectations, and using that $\delta Y_T = 0$ and that the stochastic integral is a martingale, we get
\[
\mathbb{E}\Big[ \int_0^T \mathrm{e}^{\beta s} \big( \beta |\delta Y_s|^2 + |\delta Z_s|^2 \big) \mathrm{d}s \Big] \le 2\, \mathbb{E}\Big[ \int_0^T \mathrm{e}^{\beta s}\, \delta Y_s \cdot \delta f_s \,\mathrm{d}s \Big].
\]
We now use Young's inequality, $2ab \le \frac{\beta}{2} a^2 + \frac{2}{\beta} b^2$, on the right-hand side:
\[
2\, \delta Y_s \cdot \delta f_s \le \frac{\beta}{2} |\delta Y_s|^2 + \frac{2}{\beta} |\delta f_s|^2.
\]
Substituting this back into the integral inequality:
\[
\mathbb{E}\Big[ \int_0^T \mathrm{e}^{\beta s} \big( \beta |\delta Y_s|^2 + |\delta Z_s|^2 \big) \mathrm{d}s \Big] \le \mathbb{E}\Big[ \int_0^T \mathrm{e}^{\beta s} \Big( \frac{\beta}{2} |\delta Y_s|^2 + \frac{2}{\beta} |\delta f_s|^2 \Big) \mathrm{d}s \Big].
\]
Subtracting the term $\frac{\beta}{2} \| \delta Y \|^2_{\mathcal{H}^2_\beta}$ from both sides yields
\[
\frac{\beta}{2} \| \delta Y \|^2_{\mathcal{H}^2_\beta} + \| \delta Z \|^2_{\mathcal{H}^2_\beta} \le \frac{2}{\beta} \| \delta f \|^2_{\mathcal{H}^2_\beta}.
\]
This inequality immediately gives two bounds:
1. $\| \delta Y \|^2_{\mathcal{H}^2_\beta} \le \frac{4}{\beta^2} \| \delta f \|^2_{\mathcal{H}^2_\beta}$;
2. $\| \delta Z \|^2_{\mathcal{H}^2_\beta} \le \frac{2}{\beta} \| \delta f \|^2_{\mathcal{H}^2_\beta}$.
Note that the bound for $\| \delta Z \|^2_{\mathcal{H}^2_\beta}$ is the one we need, and that along the way we also obtained a strong bound for $\| \delta Y \|^2_{\mathcal{H}^2_\beta}$. The latter will help us prove our desired bound for $\| \delta Y \|^2_{\mathcal{S}^2_\beta}$.
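The weighted Young inequality used in the proof, $2ab \le \frac{\beta}{2}a^2 + \frac{2}{\beta}b^2$, is just the expansion of $\big(\sqrt{\beta/2}\,a - \sqrt{2/\beta}\,b\big)^2 \ge 0$. A quick randomised sanity check (purely illustrative; the samples and the grid of $\beta$ values are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

# Weighted Young inequality: 2ab <= (beta/2) a^2 + (2/beta) b^2,
# which follows from (sqrt(beta/2) a - sqrt(2/beta) b)^2 >= 0.
a = rng.normal(size=10_000)
b = rng.normal(size=10_000)
for beta in (0.5, 1.0, 10.0, 100.0):
    lhs = 2 * a * b
    rhs = (beta / 2) * a**2 + (2 / beta) * b**2
    assert np.all(lhs <= rhs + 1e-12)  # small slack for floating point
```

The proof picks the weights so that the $|\delta Y|^2$ term can be absorbed by the $\beta$-term on the left-hand side, which is exactly why the final constant scales like $1/\beta$.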
To bound the supremum, we return to the integral form of the process $\mathrm{e}^{\beta t}|\delta Y_t|^2$. Integrating the Itô differential from $t$ to $T$ and using the terminal condition $\delta Y_T = 0$, we have
\[
\mathrm{e}^{\beta t}|\delta Y_t|^2 + \int_t^T \mathrm{e}^{\beta s}\big(\beta |\delta Y_s|^2 + |\delta Z_s|^2\big)\,\mathrm{d}s = \int_t^T 2\mathrm{e}^{\beta s}\, \delta Y_s \cdot \delta f_s \,\mathrm{d}s - \int_t^T 2\mathrm{e}^{\beta s}\, \delta Y_s \cdot \delta Z_s \,\mathrm{d}W_s.
\]
The integral term on the left-hand side is non-negative. Thus
\[
\mathrm{e}^{\beta t}|\delta Y_t|^2 \le \int_t^T 2\mathrm{e}^{\beta s}|\delta Y_s||\delta f_s|\,\mathrm{d}s + \bigg| \int_t^T 2\mathrm{e}^{\beta s}\, \delta Y_s \cdot \delta Z_s \,\mathrm{d}W_s \bigg|.
\]
We now take the supremum over $t \in [0,T]$ on both sides, followed by the expectation. For the first term on the right (the drift part), we bound it by the integral over $[0,T]$:
\[
\mathbb{E}\Big[ \sup_{t\in[0,T]} \mathrm{e}^{\beta t}|\delta Y_t|^2 \Big] \le \underbrace{\mathbb{E}\int_0^T \mathrm{e}^{\beta s}|2\,\delta Y_s \cdot \delta f_s|\,\mathrm{d}s}_{\text{drift part}} + 2\,\underbrace{\mathbb{E}\Big[ \sup_{t\in[0,T]} \Big| \int_0^t \mathrm{e}^{\beta s}\, \delta Y_s \cdot \delta Z_s \,\mathrm{d}W_s \Big| \Big]}_{\text{martingale } M_t}.
\]
Using Young's inequality for the drift:
\[
\mathbb{E}\Big[ \sup_{t\in[0,T]} \mathrm{e}^{\beta t}|\delta Y_t|^2 \Big] \le \mathbb{E}\int_0^T \mathrm{e}^{\beta s}\Big( \beta|\delta Y_s|^2 + \frac{1}{\beta}|\delta f_s|^2 \Big)\mathrm{d}s + 2\,\mathbb{E}\Big[ \sup_{t\in[0,T]} |M_t| \Big] \le \frac{C}{\beta}\|\delta f\|^2_{\mathcal{H}^2_\beta} + 2\,\mathbb{E}\Big[ \sup_{t\in[0,T]} |M_t| \Big].
\]
Here, we used that the term $\beta\|\delta Y\|^2_{\mathcal{H}^2_\beta}$ is bounded by $\frac{C}{\beta}\|\delta f\|^2_{\mathcal{H}^2_\beta}$. To bound the martingale term, we apply the Burkholder–Davis–Gundy inequality in its $L^1$ form:
\[
\mathbb{E}\Big[ \sup_{t\in[0,T]} \Big| \int_0^t \mathrm{e}^{\beta s}\, \delta Y_s \cdot \delta Z_s \,\mathrm{d}W_s \Big| \Big] \le 3\,\mathbb{E}\bigg[ \Big( \int_0^T \mathrm{e}^{2\beta s}|\delta Y_s|^2 |\delta Z_s|^2 \,\mathrm{d}s \Big)^{1/2} \bigg] \le 3\,\mathbb{E}\bigg[ \Big( \sup_{r\in[0,T]} \mathrm{e}^{\beta r/2}|\delta Y_r| \Big) \Big( \int_0^T \mathrm{e}^{\beta s}|\delta Z_s|^2 \,\mathrm{d}s \Big)^{1/2} \bigg].
\]
Using again Young's inequality, $ab \le \frac{1}{4} a^2 + C b^2$, and putting everything together:
\[
\mathbb{E}\Big[ \sup_{t\in[0,T]} \mathrm{e}^{\beta t}|\delta Y_t|^2 \Big] \le \frac{C}{\beta}\|\delta f\|^2_{\mathcal{H}^2_\beta} + \frac{1}{2}\, \mathbb{E}\Big[ \sup_{r\in[0,T]} \mathrm{e}^{\beta r}|\delta Y_r|^2 \Big] + C\,\mathbb{E}\Big[ \int_0^T \mathrm{e}^{\beta s}|\delta Z_s|^2 \,\mathrm{d}s \Big].
\]
The middle term on the right-hand side can be absorbed into the left-hand side of our supremum estimate. The last term is proportional to $\|\delta Z\|^2_{\mathcal{H}^2_\beta}$, which we already bounded by $\frac{2}{\beta}\|\delta f\|^2_{\mathcal{H}^2_\beta}$.
Concluding, we get
\[
\|\delta Y\|^2_{\mathcal{S}^2_\beta} + \|\delta Z\|^2_{\mathcal{H}^2_\beta} \le \frac{C}{\beta}\|\delta f\|^2_{\mathcal{H}^2_\beta},
\]
which is what we wanted to prove.

We also present this immediate corollary, which will prove useful when proving that the central map in the proof of Theorem 3.12 is a contraction.

Corollary D.2. Consider two real-valued BSDEs, $-\mathrm{d}Y_t = f_t\,\mathrm{d}t - Z_t \cdot \mathrm{d}W_t$ and $-\mathrm{d}Y'_t = f'_t\,\mathrm{d}t - Z'_t \cdot \mathrm{d}W_t$, taking values in $\mathbb{R}$. Assume that $Y_T = Y'_T$. Let $\delta Y := Y - Y'$, $\delta Z := Z - Z'$, and $\delta f_t := f_t - f'_t$. Suppose that there exist a constant $K > 0$ and a non-negative process $\phi \in \mathcal{H}^2_\beta(\mathbb{R},\mathbb{F},\mathbb{P})$ such that
\[
|\delta f_t| \le K\big( |\delta Y_t| + \|\delta Z_t\| + \phi_t \big), \quad \mathrm{d}t \otimes \mathrm{d}\mathbb{P}\text{–a.e.}
\]
Then, for $\beta$ large enough, there exists a constant $C$ such that
\[
\|\delta Y\|^2_{\mathcal{S}^2_\beta(\mathbb{R},\mathbb{F},\mathbb{P})} + \|\delta Z\|^2_{\mathcal{H}^2_\beta(\mathbb{R}^d,\mathbb{F},\mathbb{P})} \le \frac{C}{\beta}\|\phi\|^2_{\mathcal{H}^2_\beta(\mathbb{R},\mathbb{F},\mathbb{P})}.
\]

Proof. From Lemma D.1, there exists $C$ such that
\[
\|\delta Y\|^2_{\mathcal{S}^2_\beta(\mathbb{R},\mathbb{F},\mathbb{P})} + \|\delta Z\|^2_{\mathcal{H}^2_\beta(\mathbb{R}^d,\mathbb{F},\mathbb{P})} \le \frac{CK}{\beta}\Big( T\|\delta Y\|^2_{\mathcal{S}^2_\beta(\mathbb{R},\mathbb{F},\mathbb{P})} + \|\delta Z\|^2_{\mathcal{H}^2_\beta(\mathbb{R}^d,\mathbb{F},\mathbb{P})} + \|\phi\|^2_{\mathcal{H}^2_\beta(\mathbb{R},\mathbb{F},\mathbb{P})} \Big) \le \frac{CK\max(1,T)}{\beta}\Big( \|\delta Y\|^2_{\mathcal{S}^2_\beta(\mathbb{R},\mathbb{F},\mathbb{P})} + \|\delta Z\|^2_{\mathcal{H}^2_\beta(\mathbb{R}^d,\mathbb{F},\mathbb{P})} + \|\phi\|^2_{\mathcal{H}^2_\beta(\mathbb{R},\mathbb{F},\mathbb{P})} \Big),
\]
where we implicitly used
\[
\|\delta Y\|^2_{\mathcal{H}^2_\beta} = \mathbb{E}\int_0^T \mathrm{e}^{\beta t}|\delta Y_t|^2\,\mathrm{d}t \le T\|\delta Y\|^2_{\mathcal{S}^2_\beta}.
\]
Rearranging, we get
\[
\frac{\beta - CK\max(1,T)}{\beta}\Big( \|\delta Y\|^2_{\mathcal{S}^2_\beta(\mathbb{R},\mathbb{F},\mathbb{P})} + \|\delta Z\|^2_{\mathcal{H}^2_\beta(\mathbb{R}^d,\mathbb{F},\mathbb{P})} \Big) \le \frac{C}{\beta}\|\phi\|^2_{\mathcal{H}^2_\beta(\mathbb{R},\mathbb{F},\mathbb{P})}.
\]
Assuming, for instance, that $\beta > 2CK\max(1,T)$, we have
\[
\|\delta Y\|^2_{\mathcal{S}^2_\beta(\mathbb{R},\mathbb{F},\mathbb{P})} + \|\delta Z\|^2_{\mathcal{H}^2_\beta(\mathbb{R}^d,\mathbb{F},\mathbb{P})} \le \frac{2C}{\beta}\|\phi\|^2_{\mathcal{H}^2_\beta(\mathbb{R},\mathbb{F},\mathbb{P})}.
\]

Now we are ready to prove our existence and uniqueness result in weighted spaces.

Proof of Theorem 3.12. We proceed by constructing a contraction mapping on the Banach space $\mathcal{K}_\beta$. We define the map $\Phi : \mathcal{K}_\beta \longrightarrow \mathcal{K}_\beta$ as follows. Let $w = (y, z, u, v, \bar u, \bar v)$ be a fixed input tuple in $\mathcal{K}_\beta$, where the barred components denote the Hessian-level fields. This input serves as the background processes frozen in the drivers.
We define the output $W = (Y, Z, U, V, \bar U, \bar V) = \Phi(w)$ as the unique solution to the following decoupled system of BSDEs:
\[
\mathrm{d}Y_t = -H\big(t, X_t, Z_t, \underline{u}^{X_t}_t, \underline{\bar u}^{X_t}_t, \underline{v}^{X_t}_t\big)\,\mathrm{d}t + Z_t \cdot \mathrm{d}W_t, \tag{D.3}
\]
\[
\mathrm{d}U^y_t = -G_1\big(t, X_t, y, \underline{z}_t, \underline{v}^y_t, \underline{\bar u}^y_t\big)\,\mathrm{d}t + V^y_t \cdot \mathrm{d}W_t, \tag{D.4}
\]
\[
\mathrm{d}\bar U^y_t = -G_2\big(t, X_t, y, \underline{z}_t, \underline{v}^y_t, \underline{\bar v}^y_t\big)\,\mathrm{d}t + \bar V^y_t \cdot \mathrm{d}W_t. \tag{D.5}
\]
In this system, the underlined terms indicate that the drivers depend on the input $w$ rather than on the solution variables being solved for. Specifically, the first equation for $Y$ depends on the diagonal terms of the input fields $(u, \bar u, v)$ evaluated at the random state $X_t$. The second and third equations are parameterised by $y \in \mathbb{R}^n$ and depend on the input fields evaluated at that specific parameter $y$. Since the system is decoupled and the drivers satisfy the Lipschitz and growth conditions from Assumption 3.11, standard BSDE theory guarantees that a unique solution $W$ exists for any given input $w$.

Step 1. Let us prove that the map $\Phi$ is well defined, that is, that for every input $w$ in $\mathcal{K}^{n,d}_\beta$ and $W = \Phi(w)$, we have $W \in \mathcal{K}^{n,d}_\beta$. We must thus verify that each component of the solution vector $W = (Y, Z, U, V, \bar U, \bar V)$ has a finite norm in its respective weighted space.

(i) The value processes $(Y, Z)$. The pair $(Y, Z)$ solves the BSDE
\[
Y_t = \xi(X_T, X_T) + \int_t^T H\big(r, X_r, Z_r, u^{X_r}_r, \bar u^{X_r}_r, v^{X_r}_r\big)\,\mathrm{d}r - \int_t^T Z_r\,\mathrm{d}W_r, \quad t \in [0, T].
\]
By the standard a priori estimates for BSDEs with Lipschitz-continuous drivers (see, e.g., El Karoui, Peng, and Quenez [15, Proposition 2.1], where we take $f_2 = 0$ and $\xi_2 = 0$), the squared norm of the solution in $\mathcal{S}^2_\beta(\mathbb{R},\mathbb{F},\mathbb{P}) \times \mathcal{H}^2_\beta(\mathbb{R}^d,\mathbb{F},\mathbb{P})$ is controlled by the squared norms of the terminal condition and of the driver evaluated at zero volatility.
Specifically, there exists a constant $C > 0$ depending on $T$ and the Lipschitz constant of $H$ such that
\[
\|Y\|^2_{\mathcal{S}^2_\beta(\mathbb{R},\mathbb{F},\mathbb{P})} + \|Z\|^2_{\mathcal{H}^2_\beta(\mathbb{R}^d,\mathbb{F},\mathbb{P})} \le C\,\mathbb{E}^{\mathbb{P}}\Big[ \mathrm{e}^{\beta T}|\xi(X_T, X_T)|^2 + \int_0^T \mathrm{e}^{\beta r}\big| H\big(r, X_r, 0, u^{X_r}_r, \bar u^{X_r}_r, v^{X_r}_r\big) \big|^2\,\mathrm{d}r \Big].
\]
Using the Lipschitz continuity of $H$ with respect to the inputs $\Theta := (z, u, \bar u, v)$ and the growth assumption on the base term $H(\cdot, 0)$, we have
\[
\big| H\big(r, X_r, 0, u^{X_r}_r, \bar u^{X_r}_r, v^{X_r}_r\big) \big|^2 \le 2\big| H(r, X_r, 0, 0, 0, 0) \big|^2 + 2K^2\big( \|u^{X_r}_r\|^2 + \|\bar u^{X_r}_r\|^2 + \|v^{X_r}_r\|^2 \big).
\]
The base term $\mathbb{E}^{\mathbb{P}}\big[\int_0^T |H(r, X_r, 0, 0, 0, 0)|^2\,\mathrm{d}r\big]$ is finite by Assumption 3.11-(iii). To bound the input terms, we rely on the embedding of the weighted spaces. Recall that for any input field, say $u \in \mathcal{S}^{2,2}_{\beta,\rho}$, we have the pointwise bound
\[
\|u^y_r\|^2 \le \rho(y)^{-1}\|u\|^2_{\mathcal{S}^{2,2}_{\beta,\rho}(\mathbb{R},\mathbb{F},\mathbb{P})}.
\]
Substituting the random parameter $y = X_r$,
\[
\mathbb{E}^{\mathbb{P}}\Big[ \int_0^T \mathrm{e}^{\beta r}\|u^{X_r}_r\|^2\,\mathrm{d}r \Big] \le \mathbb{E}^{\mathbb{P}}\Big[ \sup_{t\in[0,T]} \rho(X_t)^{-1} \Big]\,\|u\|^2_{\mathcal{S}^{2,2}_{\beta,\rho}(\mathbb{R},\mathbb{F},\mathbb{P})}.
\]
Since $\rho(x)^{-1}$ has polynomial growth and $X$ admits finite moments of all orders (Assumption 3.1), the expectation $\mathbb{E}^{\mathbb{P}}[\sup_{t\in[0,T]} \rho(X_t)^{-1}]$ is finite. An identical argument applies to $\bar u$ and $v$. Consequently, the right-hand side of the a priori estimate is finite, implying $(Y, Z) \in \mathcal{S}^2_\beta(\mathbb{R},\mathbb{F},\mathbb{P}) \times \mathcal{H}^2_\beta(\mathbb{R}^d,\mathbb{F},\mathbb{P})$.

(ii) The gradient processes $(U, V)$. For any fixed parameter $y \in \mathbb{R}^n$, the pair $(U^y, V^y)$ solves a BSDE driven by $G_1$. Applying the standard a priori estimate (see [15]) yields
\[
\|U^y\|^2_{\mathcal{S}^2_\beta(\mathbb{R}^n,\mathbb{F},\mathbb{P})} + \|V^y\|^2_{\mathcal{H}^2_\beta(\mathbb{R}^{n\times d},\mathbb{F},\mathbb{P})} \le C\,\mathbb{E}^{\mathbb{P}}\Big[ \mathrm{e}^{\beta T}|\nabla_y \xi(y, X_T)|^2 + \int_0^T \mathrm{e}^{\beta t}\big| G_1\big(t, X_t, y, z_t, v^y_t, \bar u^y_t\big) \big|^2\,\mathrm{d}t \Big].
\]
By Assumption 3.11-(ii), the driver $G_1$ is Lipschitz continuous with respect to the input variables $\Theta := (z, v, \bar u)$.
Therefore, we can bound the squared driver by the source term (at zero input) and the norms of the inputs:
\[
\big| G_1\big(t, X_t, y, z_t, v^y_t, \bar u^y_t\big) \big|^2 \le C\Big( \big| G_1(t, X_t, y, 0) \big|^2 + \|z_t\|^2 + \|v^y_t\|^2 + \|\bar u^y_t\|^2 \Big).
\]
To verify that these processes belong to $\mathcal{K}_\beta$, we multiply the entire estimate by the weight $\rho(y)$ and take the supremum over $y \in \mathbb{R}^n$. The inequality splits into two parts:

(a) source terms: by Assumption 3.11-(iii), the source terms have finite weighted norms. Specifically,
\[
\sup_{y\in\mathbb{R}^n} \rho(y)\,\mathbb{E}^{\mathbb{P}}\Big[ |\nabla_y \xi(y, X_T)|^2 + \int_0^T \big| G_1(t, X_t, y, 0) \big|^2\,\mathrm{d}t \Big] < \infty;
\]
(b) input terms: the inputs belong to $\mathcal{K}_\beta$, so their weighted norms are finite:
\[
\sup_{y\in\mathbb{R}^n} \rho(y)\,\mathbb{E}^{\mathbb{P}}\Big[ \int_0^T \mathrm{e}^{\beta t}\big( \|z_t\|^2 + \|v^y_t\|^2 + \|\bar u^y_t\|^2 \big)\,\mathrm{d}t \Big] \le C\Big( \|z\|^2_{\mathcal{H}^2_\beta(\mathbb{R}^d,\mathbb{F},\mathbb{P})} + \|v\|^2_{\mathcal{H}^{2,2}_{\beta,\rho}(\mathbb{R}^n,\mathbb{F},\mathbb{P})} + \|\bar u\|^2_{\mathcal{S}^{2,2}_{\beta,\rho}(\mathbb{R}^{n\times n},\mathbb{F},\mathbb{P})} \Big) < \infty.
\]
Combining these bounds proves that the output pair $(U, V)$ has a finite weighted norm, i.e., $(U, V) \in \mathcal{S}^{2,2}_{\beta,\rho}(\mathbb{R}^n,\mathbb{F},\mathbb{P}) \times \mathcal{H}^{2,2}_{\beta,\rho}(\mathbb{R}^{n\times d},\mathbb{F},\mathbb{P})$.

(iii) The Hessian processes $(\bar U, \bar V)$. The argument is strictly identical to the gradient case, as the driver $G_2$ satisfies the same conditions. Thus, $W \in \mathcal{K}_\beta(\mathbb{F},\mathbb{P})$.

Step 2. Next, to prove that $\Phi$ is a contraction for sufficiently large $\beta$, let us consider two arbitrary inputs $w$ and $w'$ in $\mathcal{K}_\beta(\mathbb{F},\mathbb{P})$. Let $W = \Phi(w)$ and $W' = \Phi(w')$ be their corresponding outputs. We denote the differences by $\delta w = w - w'$ and $\delta W = W - W'$. Our goal is to derive an estimate for $\|\delta W\|_{\mathcal{K}_\beta(\mathbb{F},\mathbb{P})}$ in terms of $\|\delta w\|_{\mathcal{K}_\beta(\mathbb{F},\mathbb{P})}$.

(i) Estimation of the value processes $(Y, Z)$. Consider the first equation, for the value process $Y$, which is scalar-valued. The difference in the drivers, denoted $\delta H_t$, is bounded pointwise by the differences in the solution components and the inputs.
Let us define the scalar aggregate error process $\phi_t$ for the inputs as
\[
\phi_t := \|\delta z_t\| + \|\delta u^{X_t}_t\| + \|\delta \bar u^{X_t}_t\| + \|\delta v^{X_t}_t\|.
\]
By the Lipschitz continuity of the Hamiltonian $H$ (Assumption 3.11), we have the pointwise bound
\[
|\delta H_t| \le C\big( |\delta Y_t| + \|\delta Z_t\| + \phi_t \big).
\]
Applying Corollary D.2 to the scalar BSDE for $\delta Y$, we obtain the following bound in the standard weighted spaces:
\[
\|\delta Y\|^2_{\mathcal{S}^2_\beta(\mathbb{R},\mathbb{F},\mathbb{P})} + \|\delta Z\|^2_{\mathcal{H}^2_\beta(\mathbb{R}^d,\mathbb{F},\mathbb{P})} \le \frac{C}{\beta}\|\phi\|^2_{\mathcal{H}^2_\beta(\mathbb{R},\mathbb{F},\mathbb{P})} = \frac{C}{\beta}\,\mathbb{E}^{\mathbb{P}}\Big[ \int_0^T \mathrm{e}^{\beta t}\phi_t^2\,\mathrm{d}t \Big].
\]
To relate this integral to the norms of the random fields in $\mathcal{K}_\beta$, we use the inequality $(a+b+c+d)^2 \le 4(a^2+b^2+c^2+d^2)$ to separate the components of $\phi_t$. We then bound the integral of each term using the moment constant $M_X := \mathbb{E}^{\mathbb{P}}[\sup_{t\in[0,T]}(1+\|X_t\|^2)^k]$. For instance, for the gradient term $\delta u$, we have
\[
\mathbb{E}^{\mathbb{P}}\Big[ \int_0^T \mathrm{e}^{\beta t}\|\delta u^{X_t}_t\|^2\,\mathrm{d}t \Big] = \mathbb{E}^{\mathbb{P}}\Big[ \frac{\rho(X_t)}{\rho(X_t)} \int_0^T \mathrm{e}^{\beta t}\|\delta u^{X_t}_t\|^2\,\mathrm{d}t \Big] \le C\,\mathbb{E}^{\mathbb{P}}\Big[ \sup_{t\in[0,T]}(1+\|X_t\|^2)^k \Big]\|\delta u\|^2_{\mathcal{S}^{2,2}_{\beta,\rho}(\mathbb{R}^n,\mathbb{F},\mathbb{P})} \le C M_X \|\delta u\|^2_{\mathcal{S}^{2,2}_{\beta,\rho}(\mathbb{R}^n,\mathbb{F},\mathbb{P})}.
\]
Applying identical estimates for the Hessian term $\delta \bar u$ (in the weighted $\mathcal{S}^{2,2}$ space) and the volatility-gradient term $\delta v$ (in the weighted $\mathcal{H}^{2,2}$ space), and bounding the integrals, we arrive at the final estimate for the value process:
\[
\|\delta Y\|^2_{\mathcal{S}^2_\beta(\mathbb{R},\mathbb{F},\mathbb{P})} + \|\delta Z\|^2_{\mathcal{H}^2_\beta(\mathbb{R}^d,\mathbb{F},\mathbb{P})} \le \frac{3 C M_X}{\beta}\Big( \|\delta z\|^2_{\mathcal{H}^2_\beta(\mathbb{R}^d,\mathbb{F},\mathbb{P})} + \|\delta u\|^2_{\mathcal{S}^{2,2}_{\beta,\rho}(\mathbb{R}^n,\mathbb{F},\mathbb{P})} + \|\delta \bar u\|^2_{\mathcal{S}^{2,2}_{\beta,\rho}(\mathbb{R}^{n\times n},\mathbb{F},\mathbb{P})} + \|\delta v\|^2_{\mathcal{H}^{2,2}_{\beta,\rho}(\mathbb{R}^{n\times d},\mathbb{F},\mathbb{P})} \Big). \tag{D.6}
\]

(ii) Estimation of the gradient processes $(U, V)$. Next, we consider the system (D.4) for the gradient. For a fixed parameter $y \in \mathbb{R}^n$, the difference in the driver $G_1$ satisfies the Lipschitz condition stated in Assumption 3.11:
\[
\|\Delta G_1(t, X_t, y)\| \le C\big( \|\delta V^y_t\| + \|\delta z_t\| + \|\delta v^y_t\| + \|\delta \bar u^y_t\| \big).
\]
We apply the stability estimate from Corollary D.2 for this fixed $y$ (or rather, a version of it lifted to $\mathbb{R}^n$), and follow the same reasoning as in Step 1. We obtain
\[
\|\delta U^y\|^2_{\mathcal{S}^2_\beta(\mathbb{R}^n,\mathbb{F},\mathbb{P})} + \|\delta V^y\|^2_{\mathcal{H}^2_\beta(\mathbb{R}^{n\times d},\mathbb{F},\mathbb{P})} \le \frac{C}{\beta}\,\mathbb{E}^{\mathbb{P}}\Big[ \int_0^T \mathrm{e}^{\beta t}\big( \|\delta z_t\|^2 + \|\delta v^y_t\|^2 + \|\delta \bar u^y_t\|^2 \big)\,\mathrm{d}t \Big].
\]
We now lift this pointwise estimate to the functional-space norm. We multiply the entire inequality by the fixed weight $\rho(y)$ and take the supremum. Using that $2\sup(a^2+b^2) \ge \sup(a^2) + \sup(b^2)$, and potentially changing the constants, we obtain
\[
\|\delta U\|^2_{\mathcal{S}^{2,2}_{\beta,\rho}(\mathbb{R}^n,\mathbb{F},\mathbb{P})} + \|\delta V\|^2_{\mathcal{H}^{2,2}_{\beta,\rho}(\mathbb{R}^{n\times d},\mathbb{F},\mathbb{P})} \le \frac{C C_\rho}{\beta}\Big( \|\delta z\|^2_{\mathcal{H}^2_\beta(\mathbb{R}^d,\mathbb{F},\mathbb{P})} + \|\delta v\|^2_{\mathcal{H}^{2,2}_{\beta,\rho}(\mathbb{R}^{n\times d},\mathbb{F},\mathbb{P})} + \|\delta \bar u\|^2_{\mathcal{S}^{2,2}_{\beta,\rho}(\mathbb{R}^{n\times n},\mathbb{F},\mathbb{P})} \Big). \tag{D.7}
\]

(iii) Estimation of the Hessian processes $(\bar U, \bar V)$. The analysis for the Hessian system (D.5) exactly mirrors that of the gradient. The driver $G_2$ satisfies the same Lipschitz condition. Multiplying by $\rho(y)$, taking the supremum, and using the large-$\beta$ estimate yields
\[
\|\delta \bar U\|^2_{\mathcal{S}^{2,2}_{\beta,\rho}(\mathbb{R}^{n\times n},\mathbb{F},\mathbb{P})} + \|\delta \bar V\|^2_{\mathcal{H}^{2,2}_{\beta,\rho}(\mathbb{R}^{n\times n\times d},\mathbb{F},\mathbb{P})} \le \frac{C C_\rho}{\beta}\Big( \|\delta z\|^2_{\mathcal{H}^2_\beta(\mathbb{R}^d,\mathbb{F},\mathbb{P})} + \|\delta v\|^2_{\mathcal{H}^{2,2}_{\beta,\rho}(\mathbb{R}^{n\times d},\mathbb{F},\mathbb{P})} + \|\delta \bar v\|^2_{\mathcal{H}^{2,2}_{\beta,\rho}(\mathbb{R}^{n\times n\times d},\mathbb{F},\mathbb{P})} \Big). \tag{D.8}
\]

(iv) Conclusion. We sum the inequalities (D.6), (D.7), and (D.8). Let $\|\delta W\|^2_{\mathcal{K}_\beta}$ denote the total squared norm of the difference in the output, which is the sum of the squared norms of all components. Similarly, let $\|\delta w\|^2_{\mathcal{K}_\beta}$ denote the norm of the input difference.
Combining the estimates, we find
\[
\begin{aligned}
\|\delta W\|^2_{\mathcal{K}_\beta(\mathbb{F},\mathbb{P})} &= \Big( \|\delta Y\|^2_{\mathcal{S}^2_\beta(\mathbb{R},\mathbb{F},\mathbb{P})} + \|\delta Z\|^2_{\mathcal{H}^2_\beta(\mathbb{R}^d,\mathbb{F},\mathbb{P})} \Big) + \Big( \|\delta U\|^2_{\mathcal{S}^{2,2}_{\beta,\rho}(\mathbb{R}^n,\mathbb{F},\mathbb{P})} + \|\delta V\|^2_{\mathcal{H}^{2,2}_{\beta,\rho}(\mathbb{R}^{n\times d},\mathbb{F},\mathbb{P})} \Big) + \Big( \|\delta \bar U\|^2_{\mathcal{S}^{2,2}_{\beta,\rho}(\mathbb{R}^{n\times n},\mathbb{F},\mathbb{P})} + \|\delta \bar V\|^2_{\mathcal{H}^{2,2}_{\beta,\rho}(\mathbb{R}^{n\times n\times d},\mathbb{F},\mathbb{P})} \Big) \\
&\le \frac{\tilde C}{\beta}\Big( \|\delta z\|^2_{\mathcal{H}^2_\beta(\mathbb{R}^d,\mathbb{F},\mathbb{P})} + \|\delta u\|^2_{\mathcal{S}^{2,2}_{\beta,\rho}(\mathbb{R}^n,\mathbb{F},\mathbb{P})} + \|\delta v\|^2_{\mathcal{H}^{2,2}_{\beta,\rho}(\mathbb{R}^{n\times d},\mathbb{F},\mathbb{P})} + \|\delta \bar u\|^2_{\mathcal{S}^{2,2}_{\beta,\rho}(\mathbb{R}^{n\times n},\mathbb{F},\mathbb{P})} + \|\delta \bar v\|^2_{\mathcal{H}^{2,2}_{\beta,\rho}(\mathbb{R}^{n\times n\times d},\mathbb{F},\mathbb{P})} \Big) \le \frac{\tilde C}{\beta}\|\delta w\|^2_{\mathcal{K}_\beta(\mathbb{F},\mathbb{P})},
\end{aligned}
\]
where $\tilde C$ depends only on the Lipschitz constants, the weight parameter $k$, the maturity $T$, and the moments of $X$. By choosing $\beta > \tilde C$, the factor $\tilde C/\beta$ becomes strictly less than 1. This proves that the map $\Phi$ is a contraction on the Banach space $\mathcal{K}_\beta(\mathbb{F},\mathbb{P})$ when $\beta > \tilde C$. Consequently, by the Banach fixed-point theorem, there exists a unique fixed point $W^\star \in \mathcal{K}_\beta(\mathbb{F},\mathbb{P})$ such that $W^\star = \Phi(W^\star)$. This fixed point is the unique solution to the BSDE system (3.7) in $\mathcal{K}_\beta(\mathbb{F},\mathbb{P})$. The second part of the theorem is a direct consequence of Lemma 2.10.
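To visualise the final contraction step, the toy sketch below (a made-up linear map standing in for $\Phi$, with invented constants $\tilde C$ and $\beta$, not the paper's system) iterates a contraction with factor $q = \tilde C/\beta < 1$ and checks the geometric decay $\|W_n - W^\star\| \le q^n \|W_0 - W^\star\|$ guaranteed by the Banach fixed-point theorem:

```python
import numpy as np

# Toy affine contraction on R^3 standing in for Phi on K_beta:
# phi(w) = q A w + c with ||A||_2 = 1, so its Lipschitz constant is
# q = C_tilde / beta, which is < 1 as soon as beta > C_tilde.
rng = np.random.default_rng(2)
C_tilde, beta = 5.0, 20.0
q = C_tilde / beta                    # contraction factor 0.25

A = rng.normal(size=(3, 3))
A /= np.linalg.norm(A, 2)             # normalise to spectral norm 1
c = rng.normal(size=3)
phi = lambda w: q * A @ w + c

# The fixed point solves (I - qA) w* = c.
w_star = np.linalg.solve(np.eye(3) - q * A, c)

w = np.zeros(3)
errs = []
for _ in range(30):
    w = phi(w)
    errs.append(np.linalg.norm(w - w_star))

# Geometric decay: ||w_n - w*|| <= q^n ||w_0 - w*||.
e0 = np.linalg.norm(w_star)
for n, e in enumerate(errs, start=1):
    assert e <= q**n * e0 + 1e-12     # tiny slack for floating point
```

Enlarging $\beta$ shrinks $q$ and accelerates convergence; the price paid in the proof is only a change of equivalent norm on $\mathcal{K}_\beta$, so uniqueness is unaffected.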
