Optimal scaling and diffusion limits for the Langevin algorithm in high dimensions


Authors: Natesh S. Pillai, Andrew M. Stuart, Alexandre H. Thiéry

The Annals of Applied Probability 2012, Vol. 22, No. 6, 2320–2356
DOI: 10.1214/11-AAP828
© Institute of Mathematical Statistics, 2012

OPTIMAL SCALING AND DIFFUSION LIMITS FOR THE LANGEVIN ALGORITHM IN HIGH DIMENSIONS

By Natesh S. Pillai¹, Andrew M. Stuart² and Alexandre H. Thiéry³
Harvard University, Warwick University and Warwick University

The Metropolis-adjusted Langevin (MALA) algorithm is a sampling algorithm which makes local moves by incorporating information about the gradient of the logarithm of the target density. In this paper we study the efficiency of MALA on a natural class of target measures supported on an infinite-dimensional Hilbert space. These natural measures have density with respect to a Gaussian random field measure and arise in many applications such as Bayesian nonparametric statistics and the theory of conditioned diffusions. We prove that, started in stationarity, a suitably interpolated and scaled version of the Markov chain corresponding to MALA converges to an infinite-dimensional diffusion process. Our results imply that, in stationarity, the MALA algorithm applied to an N-dimensional approximation of the target will take O(N^{1/3}) steps to explore the invariant measure, comparing favorably with the Random Walk Metropolis which was recently shown to require O(N) steps when applied to the same class of problems. As a by-product of the diffusion limit, it also follows that the MALA algorithm is optimized at an average acceptance probability of 0.574. Previous results were proved only for targets which are products of one-dimensional distributions, or for variants of this situation, limiting their applicability. The correlation in our target means that the rescaled MALA algorithm converges weakly to an infinite-dimensional Hilbert-space-valued diffusion, and the limit cannot be described through analysis of scalar diffusions.
The limit theorem is proved by showing that a drift-martingale decomposition of the Markov chain, suitably scaled, closely resembles a weak Euler–Maruyama discretization of the putative limit. An invariance principle is proved for the martingale, and a continuous mapping argument is used to complete the proof.

Received March 2011; revised November 2011.
¹Supported by NSF Grant DMS-11-07070.
²Supported by EPSRC and ERC.
³Supported by (EPSRC-funded) CRISM.
AMS 2000 subject classifications. Primary 60J20; secondary 65C05.
Key words and phrases. Markov chain Monte Carlo, Metropolis-adjusted Langevin algorithm, scaling limit, diffusion approximation.

1. Introduction. Sampling probability distributions π^N in R^N for N large is of interest in numerous applications arising in applied probability and statistics. The Markov chain Monte Carlo (MCMC) methodology [21] provides a framework for many algorithms which effect this sampling. It is hence of interest to quantify the computational cost of MCMC methods as a function of dimension N. This paper is part of a research program designed to develop the analysis of MCMC in high dimensions so that it may be usefully applied to understand target measures which arise in applications. The simplest class of target measures for which analysis can be carried out are perhaps target distributions π^N of the form

    dπ^N/dλ^N(x) = ∏_{i=1}^N f(x_i).    (1.1)

Here λ^N(dx) is the N-dimensional Lebesgue measure, and f(x) is a one-dimensional probability density function. Thus π^N has the form of an i.i.d. product.
Using understanding gained in this situation, we will develop an analysis relevant to an important class of nonproduct measures which arise in a range of applications.

We start by describing the MCMC methods which are studied in this paper. Consider a π^N-invariant Metropolis–Hastings Markov chain {x^{k,N}}_{k≥1}. From the current state x, we propose y drawn from the kernel q(x, y); this is then accepted with probability

    α(x, y) = 1 ∧ [π^N(y) q(y, x)] / [π^N(x) q(x, y)].

Two widely used proposals are the random walk proposal (obtained from the discrete approximation of Brownian motion),

    y = x + √(2δ) Z^N,  Z^N ∼ N(0, I_N),    (1.2)

and the Langevin proposal (obtained from the time discretization of the Langevin diffusion),

    y = x + δ ∇log π^N(x) + √(2δ) Z^N,  Z^N ∼ N(0, I_N).    (1.3)

Here 2δ is the proposal variance, a parameter quantifying the size of the discrete time increment; we will consider "local proposals" for which δ is small. The Markov chain corresponding to proposal (1.2) is the Random Walk Metropolis (RWM) algorithm [20], and the Markov transition rule constructed from the proposal (1.3) is known as the Metropolis-adjusted Langevin algorithm (MALA) [21]. This paper is aimed at analyzing the computational complexity of the MALA algorithm in high dimensions.

A fruitful way to quantify the computational cost of these Markov chains which proceed via local proposals is to determine the "optimal" size of increment δ as a function of dimension N (the precise notion of optimality is discussed below). A simple heuristic suggests the existence of such an "optimal scale" for δ: smaller values of the proposal variance lead to high acceptance rates, but the chain does not move much even when accepted, and therefore may not be efficient.
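The proposals (1.2) and (1.3), together with the accept–reject step, can be sketched in a few lines. The following is a minimal one-dimensional illustration, not the paper's Hilbert-space construction; the function names (`mh_chain`, `log_pi`, `grad_log_pi`) are hypothetical, and the flag `langevin` switches between the RWM and MALA proposals.

```python
import math
import random

def mh_chain(grad_log_pi, log_pi, x0, delta, n_steps, langevin, rng):
    """Metropolis-Hastings chain with RWM (1.2) or MALA (1.3) proposals.

    Proposal: y = x + delta * grad_log_pi(x) * [langevin] + sqrt(2*delta) * Z,
    with Z standard Gaussian; 2*delta is the proposal variance.
    """
    x = x0
    accepts = 0
    samples = [x]
    for _ in range(n_steps):
        drift = delta * grad_log_pi(x) if langevin else 0.0
        y = x + drift + math.sqrt(2.0 * delta) * rng.gauss(0.0, 1.0)

        def log_q(a, b):
            # log proposal density q(a, b), up to an additive constant
            m = a + (delta * grad_log_pi(a) if langevin else 0.0)
            return -((b - m) ** 2) / (4.0 * delta)

        log_alpha = log_pi(y) + log_q(y, x) - log_pi(x) - log_q(x, y)
        if rng.random() < math.exp(min(0.0, log_alpha)):  # accept w.p. 1 ^ e^{log_alpha}
            x = y
            accepts += 1
        samples.append(x)
    return samples, accepts / n_steps
```

For the RWM proposal the correction log_q(y, x) − log_q(x, y) vanishes by symmetry, recovering the plain Metropolis ratio; for MALA it is the Langevin proposal-density correction.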
Larger values of the proposal variance lead to larger moves, but then the acceptance probability is tiny. The optimal scale for the proposal variance strikes a balance between making large moves and still having a reasonable acceptance probability. In order to quantify this idea it is useful to define a continuous interpolant of the Markov chain as follows:

    z^N(t) = (t/Δt − k) x^{k+1,N} + (k + 1 − t/Δt) x^{k,N}  for kΔt ≤ t < (k+1)Δt.    (1.4)

We choose the proposal variance to satisfy δ = ℓΔt, with Δt = N^{−γ} setting the scale in terms of dimension and the parameter ℓ a "tuning" parameter which is independent of the dimension N. Key questions, then, concern the choice of γ and ℓ. If z^N converges weakly to a suitable stationary diffusion process, then it is natural to deduce that the number of Markov chain steps required in stationarity is inversely proportional to the proposal variance, and hence to Δt, and so grows like N^γ. The parametric dependence of the limiting diffusion process then provides a selection mechanism for ℓ.

A research program along these lines was initiated by Roberts and coworkers in the pair of papers [22, 23]. These papers concerned the RWM and MALA algorithms, respectively, when applied to the target (1.1). In both cases it was shown that the projection of z^N onto any single fixed coordinate direction x_i converges weakly in C([0,T]; R) to z, the scalar diffusion process

    dz/dt = h(ℓ) [log f(z)]′ + √(2h(ℓ)) dW/dt    (1.5)

for h(ℓ) > 0, a constant determined by the parameter ℓ from the proposal variance. For RWM the scaling of the proposal variance to achieve this limit is determined by the choice γ = 1 [22], while for MALA γ = 1/3 [23].
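The interpolant (1.4) is elementary to realize in code. Below is a minimal sketch for a scalar chain; `path` and `dt` are hypothetical names for the chain {x^{k,N}} and the time-step Δt.

```python
def interpolant(path, dt):
    """Piecewise-linear interpolant (1.4) of a Markov chain.

    path[k] holds x^{k,N}; for k*dt <= t < (k+1)*dt,
    z^N(t) = (t/dt - k) * path[k+1] + (k + 1 - t/dt) * path[k].
    """
    def z(t):
        k = min(int(t / dt), len(path) - 2)  # clamp at the final segment
        w = t / dt - k
        return (1.0 - w) * path[k] + w * path[k + 1]
    return z
```

By construction z^N(kΔt) = x^{k,N}, and between grid points the interpolant moves linearly from one state to the next.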
The analysis shows that the number of steps required to sample the target measure grows as O(N) for RWM, but only as O(N^{1/3}) for MALA. This quantifies the efficiency gained by use of MALA over RWM, and in particular from employing local moves informed by the gradient of the logarithm of the target density. A second important feature of the analysis is that it suggests that the optimal choice of ℓ is that which maximizes h(ℓ). This value of ℓ leads, in both cases, to a universal [independent of f(·)] optimal average acceptance probability (to three significant figures) of 0.234 for RWM and 0.574 for MALA.

These theoretical analyses have had a huge practical impact as the optimal acceptance probabilities send a concrete message to practitioners: one should "tune" the proposal variance of the RWM and MALA algorithms so as to have acceptance probabilities of 0.234 and 0.574, respectively. However, practitioners use these tuning criteria far outside the class of target distributions given by (1.1). It is natural to ask whether they are wise to do so. Extensive simulations (see [24, 26]) show that these optimality results also hold for more complex target distributions. Furthermore, a range of subsequent theoretical analyses confirmed that the optimal scaling ideas do indeed extend beyond (1.1); these papers studied slightly more complicated models, such as products of one-dimensional distributions with different variances and elliptically symmetric distributions [1, 2, 9, 11]. However, the diffusion limits obtained remain essentially one-dimensional in all of these extensions.⁴

In this paper we study considerably more complex target distributions which are not of the product form, and the limiting diffusion takes values in an infinite-dimensional space.
Our perspective on these problems is motivated by applications such as Bayesian nonparametric statistics, for example, in application to inverse problems [27], and the theory of conditioned diffusions [15]. In both these areas the target measure of interest, π, is on an infinite-dimensional real separable Hilbert space H and, for Gaussian priors (inverse problems) or additive noise (diffusions), is absolutely continuous with respect to a Gaussian measure π₀ on H with mean zero and covariance operator C. This framework for the analysis of MCMC in high dimensions was first studied in the papers [6–8]. The Radon–Nikodym derivative defining the target measure is assumed to have the form

    dπ/dπ₀(x) = M_Ψ exp(−Ψ(x))    (1.6)

for a real-valued functional Ψ : H^s → R defined on a subspace H^s ⊂ H that contains the support of the reference measure π₀; here M_Ψ is a normalizing constant. We are interested in studying MCMC methods applied to finite-dimensional approximations of this measure found by projecting onto the first N eigenfunctions of the covariance operator C of the Gaussian reference measure π₀. It is proved in [12, 16, 17] that the measure π is invariant for H-valued SDEs (or stochastic PDEs, SPDEs) of the form

    dz/dt = −h(ℓ)(z + C ∇Ψ(z)) + √(2h(ℓ)) dW/dt,  z(0) = z₀,    (1.7)

where W is a Brownian motion (see [12]) in H with covariance operator C.

⁴The paper [10] contains an infinite-dimensional diffusion limit, but we have been unable to employ the techniques of that paper.

In [19] the RWM algorithm is studied when applied to a sequence of finite-dimensional approximations of π as in (1.6). The continuous time interpolant of the Markov chain z^N given by (1.4) is shown to converge weakly to z solving (1.7) in C([0,T]; H^s). Furthermore, as for the i.i.d.
target measure, the scaling of the proposal variance which achieves this scaling limit is inversely proportional to N (i.e., corresponds to the exponent γ = 1), and the speed of the limiting diffusion process is maximized at the same universal acceptance probability of 0.234 that was found in the i.i.d. case. Thus, remarkably, the i.i.d. case has been of fundamental importance in understanding MCMC methods applied to complex infinite-dimensional probability measures arising in practice. The paper [19] developed an approach for deriving diffusion limits for such algorithms, using ideas from numerical analysis. We can build on these techniques to derive scaling limits for a wide range of Metropolis–Hastings algorithms with local proposals. The purpose of this article is to develop the techniques in the context of the MALA algorithm.

To the best of our knowledge, the only paper to consider the optimal scaling for the MALA algorithm for nonproduct targets is [9], in the context of nonlinear regression. In [9] the target measure has a structure similar to that of the mean field models studied in statistical mechanics and hence behaves asymptotically like a product measure when the dimension goes to infinity. Thus the diffusion limit obtained in [9] is finite-dimensional. The main contribution of our work is the proof of a diffusion limit for the output of the MALA algorithm, suitably interpolated, to the SPDE (1.7), when applied to N-dimensional approximations of the target measures (1.6) with proposal variance inversely proportional to N^{1/3}. Moreover we show that the speed h(ℓ) of the limiting diffusion is maximized for an average acceptance probability of 0.574, just as in the i.i.d. product scenario [23].
Thus in this regard, our work is the first extension of the remarkable results in [23] for the Langevin algorithm to target measures which are not of product form. This adds theoretical weight to the results observed in computational experiments which demonstrate the robustness of the optimality criteria developed in [22, 23]. In particular, the paper [7] shows numerical results indicating the need to scale the time-step as a function of dimension to obtain O(1) acceptance probabilities.

In Section 2 we state the main theorem of the paper, having defined precisely the setting in which it holds. Section 3 contains the proof of the main theorem, postponing the proof of a number of key technical estimates to Section 4. In Section 5 we conclude by summarizing and providing the outlook for further research in this area.

2. Main theorem. This section is devoted to stating the main theorem of the article. However, the setting is complex, and we develop it in a step-by-step fashion before the theorem statement. In Section 2.1 we introduce the form of the reference, or prior, Gaussian measure π₀, followed in Section 2.2 by the change of measure which induces a genuinely nonproduct structure. In Section 2.3 we describe the finite-dimensional approximation of the measure, enabling us to define application of a variant MALA-type algorithm in Section 2.4. We then discuss in Section 2.5 how the choice of scaling used in the theorem emerges from study of the acceptance probabilities. Finally, in Section 2.6, we state the main theorem.

Throughout the paper we use the following notation in order to compare sequences and to denote conditional expectations:

• Two sequences {α_n} and {β_n} satisfy α_n ≲ β_n if there exists a constant K > 0 satisfying α_n ≤ K β_n for all n ≥ 0. The notation α_n ≍ β_n means that α_n ≲
β_n and β_n ≲ α_n.

• Two sequences of real functions {f_n} and {g_n} defined on the same set D satisfy f_n ≲ g_n if there exists a constant K > 0 satisfying f_n(x) ≤ K g_n(x) for all n ≥ 0 and all x ∈ D. The notation f_n ≍ g_n means that f_n ≲ g_n and g_n ≲ f_n.

• The notation E_x[f(x, ξ)] denotes expectation with respect to ξ with the variable x fixed.

2.1. Gaussian reference measure. Let H be a separable Hilbert space of real-valued functions with scalar product denoted by ⟨·,·⟩ and associated norm ‖x‖² = ⟨x, x⟩. Consider a Gaussian probability measure π₀ on (H, ‖·‖) with covariance operator C. The general theory of Gaussian measures [12] ensures that the operator C is positive and trace class. Let {φ_j, λ_j²}_{j≥1} be the eigenfunctions and eigenvalues of the covariance operator C:

    C φ_j = λ_j² φ_j,  j ≥ 1.

We assume a normalization under which the family {φ_j}_{j≥1} forms a complete orthonormal basis in the Hilbert space H, which we refer to as the Karhunen–Loève basis. Any function x ∈ H can be represented in this basis via the expansion

    x = Σ_{j=1}^∞ x_j φ_j,  x_j := ⟨x, φ_j⟩.    (2.1)

Throughout this paper we will often identify the function x with its coordinates {x_j}_{j=1}^∞ ∈ ℓ² in this eigenbasis, moving freely between the two representations. The Karhunen–Loève expansion (see [12], Section "White noise expansions") refers to the fact that a realization x from the Gaussian measure π₀ can be expressed by allowing the coordinates {x_j}_{j≥1} in (2.1) to be independent random variables distributed as x_j ∼ N(0, λ_j²). Thus, in the coordinates {x_j}_{j≥1}, the Gaussian reference measure π₀ has a product structure.

For every x ∈ H we have the representation (2.1).
Using this expansion, we define Sobolev-like spaces H^r, r ∈ R, with the inner products and norms defined by

    ⟨x, y⟩_r := Σ_{j=1}^∞ j^{2r} x_j y_j,  ‖x‖_r² := Σ_{j=1}^∞ j^{2r} x_j².    (2.2)

Notice that H⁰ = H and H^r ⊂ H ⊂ H^{−r} for any r > 0. The Hilbert–Schmidt norm ‖·‖_C associated to the covariance operator C is defined as

    ‖x‖_C² = Σ_j λ_j^{−2} x_j².

For x, y ∈ H^r, the outer product operator in H^r is the operator x ⊗_{H^r} y : H^r → H^r defined by

    (x ⊗_{H^r} y) z := ⟨y, z⟩_r x  for every z ∈ H^r.

For r ∈ R, let B_r : H → H denote the operator which is diagonal in the basis {φ_j}_{j≥1} with diagonal entries j^{2r}. The operator B_r satisfies B_r φ_j = j^{2r} φ_j, so that B_r^{1/2} φ_j = j^r φ_j. The operator B_r lets us alternate between the Hilbert space H and the Sobolev spaces H^r via the identities

    ⟨x, y⟩_r = ⟨B_r^{1/2} x, B_r^{1/2} y⟩.

Since ‖B_r^{−1/2} φ_k‖_r = ‖φ_k‖ = 1, we deduce that {B_r^{−1/2} φ_k}_{k≥1} forms an orthonormal basis for H^r. For a positive, self-adjoint operator D : H → H, we define its trace in H^r by

    Tr_{H^r}(D) := Σ_{j=1}^∞ ⟨B_r^{−1/2} φ_j, D(B_r^{−1/2} φ_j)⟩_r.    (2.3)

Since Tr_{H^r}(D) does not depend on the orthonormal basis, the operator D is said to be trace class in H^r if Tr_{H^r}(D) < ∞ for some, and hence any, orthonormal basis of H^r. Let us define the operator C_r := B_r^{1/2} C B_r^{1/2}. Notice that Tr_{H^r}(C_r) = Σ_{j=1}^∞ λ_j² j^{2r}. In [19] it is shown that, under the condition

    Tr_{H^r}(C_r) < ∞,    (2.4)

the support of π₀ is included in H^r in the sense that π₀-almost every function x ∈ H belongs to H^r. Furthermore, the induced distribution of π₀ on H^r is identical to that of a centered Gaussian measure on H^r with covariance operator C_r. For example, if ξ ∼ π₀, then E[⟨ξ, u⟩_r ⟨ξ, v⟩_r] = ⟨u, C_r v⟩_r for any functions u, v ∈ H^r.
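These objects are all diagonal in the Karhunen–Loève basis, so they can be explored numerically with partial sums. The sketch below, under the eigenvalue decay λ_j = j^{−κ} assumed later in (2.5), computes truncations of Tr_{H^r}(C_r) = Σ_j λ_j² j^{2r}, draws a truncated Karhunen–Loève sample with x_j ∼ N(0, λ_j²), and evaluates ‖x‖_r²; all function names are hypothetical.

```python
import math
import random

def trace_Hr(kappa, r, N):
    """Partial sum of Tr_{H^r}(C_r) = sum_j lambda_j^2 j^{2r}, lambda_j = j^{-kappa}.

    The series converges iff 2r - 2*kappa < -1, i.e. r < kappa - 1/2,
    matching the support condition (2.4)."""
    return sum(j ** (2 * r - 2 * kappa) for j in range(1, N + 1))

def sample_pi0(kappa, N, rng):
    """Truncated Karhunen-Loeve draw: coordinates x_j ~ N(0, j^{-2 kappa})."""
    return [rng.gauss(0.0, j ** (-kappa)) for j in range(1, N + 1)]

def sobolev_norm_sq(x, r):
    """||x||_r^2 = sum_j j^{2r} x_j^2, coordinates indexed from j = 1 as in (2.2)."""
    return sum(j ** (2 * r) * v * v for j, v in enumerate(x, start=1))
```

With κ = 1, the partial traces stabilize for r = 0.4 < κ − 1/2 but keep growing for r = 0.6 > κ − 1/2, numerically mirroring which spaces H^r carry the full support of π₀.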
Thus, in what follows, we alternate between the Gaussian measures N(0, C) on H and N(0, C_r) on H^r, for those r for which (2.4) holds.

2.2. Change of measure. Our goal is to sample from a measure π defined through the change of probability formula (1.6). As described in Section 2.1, the condition Tr_{H^r}(C_r) < ∞ implies that the measure π₀ has full support on H^r, that is, π₀(H^r) = 1. Consequently, if Tr_{H^r}(C_r) < ∞, the functional Ψ(·) needs only to be defined on H^r in order for the change of probability formula (1.6) to be valid. In this section, we give assumptions on the decay of the eigenvalues of the covariance operator C of π₀ that ensure the existence of a real number s > 0 such that π₀ has full support on H^s. The functional Ψ(·) is assumed to be defined on H^s, and we impose regularity assumptions on Ψ(·) that ensure that the probability distribution π is not too different from π₀ when projected into directions associated with φ_j for j large.

For each x ∈ H^s the derivative ∇Ψ(x) is an element of the dual (H^s)* of H^s, comprising linear functionals on H^s. However, we may identify (H^s)* with H^{−s} and view ∇Ψ(x) as an element of H^{−s} for each x ∈ H^s. With this identification, the following identity holds:

    ‖∇Ψ(x)‖_{L(H^s, R)} = ‖∇Ψ(x)‖_{−s},

and the second derivative ∂²Ψ(x) can be identified as an element of L(H^s, H^{−s}). To avoid technicalities we assume that Ψ(·) is quadratically bounded, with the first derivative linearly bounded and the second derivative globally bounded. Weaker assumptions could be dealt with by use of stopping-time arguments.

Assumption 2.1.
The covariance operator C and functional Ψ satisfy the following:

(1) Decay of eigenvalues λ_j² of C: there is an exponent κ > 1/2 such that

    λ_j ≍ j^{−κ}.    (2.5)

(2) Assumptions on Ψ: there exist constants M_i ∈ R, i ≤ 4, and s ∈ [0, κ − 1/2) such that for all x ∈ H^s the functional Ψ : H^s → R satisfies

    M_1 ≤ Ψ(x) ≤ M_2 (1 + ‖x‖_s²),    (2.6)
    ‖∇Ψ(x)‖_{−s} ≤ M_3 (1 + ‖x‖_s),    (2.7)
    ‖∂²Ψ(x)‖_{L(H^s, H^{−s})} ≤ M_4.    (2.8)

Remark 2.2. The condition κ > 1/2 ensures that the covariance operator C is trace class in H. In fact, equation (2.4) shows that C_r is trace class in H^r for any r < κ − 1/2. It follows that π₀ has full measure in H^r for any r ∈ [0, κ − 1/2). In particular π₀ has full support on H^s.

Remark 2.3. The functional Ψ(x) = ½‖x‖_s² satisfies Assumption 2.1. It is defined on H^s and its derivative at x ∈ H^s is given by ∇Ψ(x) = Σ_{j≥1} j^{2s} x_j φ_j ∈ H^{−s} with ‖∇Ψ(x)‖_{−s} = ‖x‖_s. The second derivative ∂²Ψ(x) ∈ L(H^s, H^{−s}) is the linear operator that maps u ∈ H^s to Σ_{j≥1} j^{2s} ⟨u, φ_j⟩ φ_j ∈ H^{−s}; its norm satisfies ‖∂²Ψ(x)‖_{L(H^s, H^{−s})} = 1 for any x ∈ H^s.

Since the eigenvalues λ_j² of C decrease as λ_j ≍ j^{−κ}, the operator C has a smoothing effect: C^α h gains 2ακ orders of regularity in the sense that the H^β-norm of C^α h is controlled by the H^{β−2ακ}-norm of h ∈ H. Indeed, under Assumption 2.1, the following estimates hold:

    ‖h‖_C ≍ ‖h‖_κ  and  ‖C^α h‖_β ≍ ‖h‖_{β−2ακ}.    (2.9)

The proof follows the methodology used to prove Lemma 3.3 of [19]. The reader is referred to this text for more details.

2.3. Finite-dimensional approximation. We are interested in finite-dimensional approximations of the probability distribution π.
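The identity ‖∇Ψ(x)‖_{−s} = ‖x‖_s in Remark 2.3 is exact coordinate algebra: (∇Ψ(x))_j = j^{2s} x_j, so Σ_j j^{−2s} (j^{2s} x_j)² = Σ_j j^{2s} x_j². A minimal numerical check on a truncated expansion (hypothetical helper names):

```python
def grad_Psi(x, s):
    """Coordinates of grad Psi for Psi(x) = 0.5 * ||x||_s^2: (grad Psi(x))_j = j^{2s} x_j."""
    return [j ** (2 * s) * v for j, v in enumerate(x, start=1)]

def norm_sq(x, r):
    """||x||_r^2 = sum_j j^{2r} x_j^2 as in (2.2), coordinates indexed from j = 1."""
    return sum(j ** (2 * r) * v * v for j, v in enumerate(x, start=1))
```

For any truncated coordinate vector x, norm_sq(grad_Psi(x, s), -s) agrees with norm_sq(x, s) to machine precision.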
To this end, we introduce the vector space spanned by the first N eigenfunctions of the covariance operator,

    X^N := span{φ_1, φ_2, ..., φ_N}.

Notice that X^N ⊂ H^r for any r ∈ [0, +∞). In particular, X^N is a subspace of H^s. Next, we define N-dimensional approximations of the functional Ψ(·) and of the reference measure π₀. To this end, we introduce the orthogonal projection on X^N, denoted by P^N : H^s → X^N ⊂ H^s. The functional Ψ(·) is approximated by the functional Ψ^N : X^N → R defined by

    Ψ^N := Ψ ∘ P^N.    (2.10)

The approximation π₀^N of the reference measure π₀ is the Gaussian measure on X^N given by the law of the random variable

    Σ_{j=1}^N λ_j ξ_j φ_j = (C^N)^{1/2} ξ^N,

where the ξ_j are i.i.d. standard Gaussian random variables, ξ^N = Σ_{j=1}^N ξ_j φ_j and C^N = P^N ∘ C ∘ P^N. Consequently we have π₀^N = N(0, C^N). Finally, one can define the approximation π^N of π by the change of probability formula

    dπ^N/dπ₀^N(x) = M_{Ψ^N} exp(−Ψ^N(x)),    (2.11)

where M_{Ψ^N} is a normalization constant. Notice that the probability distribution π^N is supported on X^N and has Lebesgue density⁵ on X^N equal to

    π^N(x) ∝ exp(−½‖x‖²_{C^N} − Ψ^N(x)).    (2.12)

⁵For ease of notation we do not distinguish between a measure and its density, nor do we distinguish between the representation of the measure in X^N or in coordinates in R^N.

In formula (2.12), the Hilbert–Schmidt norm ‖·‖_{C^N} on X^N is given by the scalar product ⟨u, v⟩_{C^N} = ⟨u, (C^N)^{−1} v⟩ for all u, v ∈ X^N. The operator C^N is invertible on X^N because the eigenvalues of C are assumed to be strictly positive.
The quantity C^N ∇log π^N(x) is repeatedly used in the text and, in particular, appears in the function μ^N(x) given by

    μ^N(x) = −(P^N x + C^N ∇Ψ^N(x)),    (2.13)

which, up to an additive constant, is C^N ∇log π^N(x). This function is the drift of an ergodic Langevin diffusion that leaves π^N invariant. Similarly, one defines the function μ : H^s → H^s given by

    μ(x) = −(x + C ∇Ψ(x)),    (2.14)

which can informally be seen as C ∇log π(x), up to an additive constant. In the sequel, Lemma 4.1 shows that, for π₀-almost every function x ∈ H, we have lim_{N→∞} μ^N(x) = μ(x). This quantifies the manner in which μ^N(·) is an approximation of μ(·).

The next lemma gathers various regularity estimates on the functionals Ψ(·) and Ψ^N(·) that are repeatedly used in the sequel. These are simple consequences of Assumption 2.1, and proofs can be found in [19].

Lemma 2.4 (Properties of Ψ). Let the functional Ψ(·) satisfy Assumption 2.1 and consider the functional Ψ^N(·) defined by equation (2.10). The following estimates hold:

(1) The functionals Ψ^N : H^s → R satisfy the same conditions imposed on Ψ given by equations (2.6), (2.7) and (2.8), with constants that can be chosen independent of N.

(2) The function C∇Ψ : H^s → H^s is globally Lipschitz on H^s: there exists a constant M_5 > 0 such that

    ‖C∇Ψ(x) − C∇Ψ(y)‖_s ≤ M_5 ‖x − y‖_s  for all x, y ∈ H^s.

Moreover, the functions C^N ∇Ψ^N : H^s → H^s also satisfy this estimate with a constant that can be chosen independently of N.

(3) The functional Ψ(·) : H^s → R satisfies a second-order Taylor formula.⁶ There exists a constant M_6 > 0 such that

    Ψ(y) − (Ψ(x) + ⟨∇Ψ(x), y − x⟩) ≤ M_6 ‖x − y‖_s²  for all x, y ∈ H^s.    (2.15)

Moreover, the functionals Ψ^N(·) also satisfy this estimate with a constant that can be chosen independently of N.

Remark 2.5.
The regularity estimates of Lemma 2.4 show, in particular, that the function μ : H^s → H^s defined by (2.14) is globally Lipschitz on H^s. Similarly, it follows that C^N ∇Ψ^N : H^s → H^s and μ^N : H^s → H^s given by (2.13) are globally Lipschitz with Lipschitz constants that can be chosen uniformly in N.

⁶We extend ⟨·,·⟩ from an inner product on H to the dual pairing between H^{−s} and H^s.

2.4. The algorithm. The MALA algorithm is defined in this section. This method is motivated by the fact that the probability measure π^N defined by equation (2.11) is invariant with respect to the Langevin diffusion process

    dz/dt = μ^N(z) + √2 dW^N/dt,    (2.16)

where W^N is a Brownian motion in H with covariance operator C^N. The drift function μ^N : H^s → H^s is, up to an additive constant, the preconditioned gradient C^N ∇log π^N of the log-density of π^N, as described by equation (2.13). The idea of the MALA algorithm is to make a proposal based on an Euler–Maruyama discretization of the diffusion (2.16). To this end we consider, from state x ∈ X^N, proposals y ∈ X^N given by

    y − x = δ μ^N(x) + √(2δ) (C^N)^{1/2} ξ^N,  where δ = ℓ N^{−1/3},    (2.17)

with ξ^N = Σ_{i=1}^N ξ_i φ_i and ξ_i ∼ N(0, 1). Notice that (C^N)^{1/2} ξ^N ∼ N(0, C^N). The quantity δ is the time-step in an Euler–Maruyama discretization of (2.16). We introduce a related parameter Δt := ℓ^{−1} δ = N^{−1/3}, which will be the natural time-step for the limiting diffusion process derived from the proposal above, after inclusion of an accept–reject mechanism. The scaling of Δt, and hence δ, with N will ensure that the average acceptance probability is of order 1 as N grows. This is discussed in more detail in Section 2.5. The quantity ℓ > 0 is a fixed parameter which can be chosen to maximize the speed of the limiting diffusion process; see the discussion in the Introduction and after the main theorem below.
We will study the Markov chain x^N = {x^{k,N}}_{k≥0} resulting from Metropolizing this proposal when it is started at stationarity: the initial position x^{0,N} is distributed as π^N and thus lies in X^N. Therefore, the Markov chain evolves in X^N; as a consequence, only the first N components of an expansion in the eigenbasis of C are nonzero, and the algorithm can be implemented in R^N. However, the analysis is cleaner when written in X^N ⊂ H^s. The acceptance probability only depends on the first N coordinates of x and y and has the form

    α^N(x, ξ^N) = 1 ∧ [π^N(y) T^N(y, x)] / [π^N(x) T^N(x, y)] = 1 ∧ e^{Q^N(x, ξ^N)},    (2.18)

where the proposal y is given by equation (2.17). The function T^N(·,·) is the density of the Langevin proposals (2.17) and is given by

    T^N(x, y) ∝ exp(−(1/(4δ)) ‖y − x − δ μ^N(x)‖²_{C^N}).

The local mean acceptance probability α^N(x) is defined by

    α^N(x) = E_x[α^N(x, ξ^N)].    (2.19)

It is the expected acceptance probability when the algorithm stands at x ∈ H. The Markov chain x^N = {x^{k,N}}_{k≥0} can also be expressed as

    y^{k,N} = x^{k,N} + δ μ^N(x^{k,N}) + √(2δ) (C^N)^{1/2} ξ^{k,N},
    x^{k+1,N} = γ^{k,N} y^{k,N} + (1 − γ^{k,N}) x^{k,N},    (2.20)

where the ξ^{k,N} are i.i.d. samples distributed as ξ^N, and γ^{k,N} = γ^N(x^{k,N}, ξ^{k,N}) creates a Bernoulli random sequence with kth success probability α^N(x^{k,N}, ξ^{k,N}). We may view the Bernoulli random variable as γ^{k,N} = 1{U^k < α^N(x^{k,N}, ξ^{k,N})}, where U^k ∼ Uniform(0, 1) is independent from x^{k,N} and ξ^{k,N}. The quantity Q^N defined in equation (2.18) may be expressed as

    Q^N(x, ξ^N) = −½(‖y‖²_{C^N} − ‖x‖²_{C^N}) − (Ψ^N(y) − Ψ^N(x))
                  − (1/(4δ)) {‖x − y − δ μ^N(y)‖²_{C^N} − ‖y − x − δ μ^N(x)‖²_{C^N}}.    (2.21)
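Since everything is diagonal in the eigenbasis, the chain (2.20) with acceptance log-ratio (2.21) can be sketched coordinate-wise. The illustration below is a minimal sketch, not the paper's construction: it assumes the eigenvalue decay λ_j = j^{−κ} of (2.5), the example functional Ψ(x) = ½‖x‖_s² of Remark 2.3, and, for simplicity, initializes from π₀^N rather than from π^N; the name `mala_XN` is hypothetical.

```python
import math
import random

def mala_XN(N, kappa, s, ell, n_steps, rng):
    """MALA on X^N in coordinates, with lambda_j = j^{-kappa} and
    Psi(x) = 0.5 * ||x||_s^2 (Remark 2.3); proposal (2.17), log-ratio (2.21)."""
    lam2 = [j ** (-2 * kappa) for j in range(1, N + 1)]
    delta = ell * N ** (-1.0 / 3.0)  # delta = ell * N^{-1/3} as in (2.17)

    def Psi(x):
        return 0.5 * sum(j ** (2 * s) * v * v for j, v in enumerate(x, 1))

    def mu(x):  # mu^N(x) = -(x + C^N grad Psi^N(x)), diagonal in the eigenbasis
        return [-(v + lam2[j - 1] * (j ** (2 * s)) * v) for j, v in enumerate(x, 1)]

    def norm_C(u):  # ||u||_{C^N}^2 = sum_j u_j^2 / lambda_j^2
        return sum(v * v / lam2[i] for i, v in enumerate(u))

    x = [math.sqrt(l2) * rng.gauss(0.0, 1.0) for l2 in lam2]  # draw from pi_0^N
    accepts = 0
    for _ in range(n_steps):
        xi = [math.sqrt(l2) * rng.gauss(0.0, 1.0) for l2 in lam2]  # (C^N)^{1/2} xi^N
        mx = mu(x)
        y = [x[i] + delta * mx[i] + math.sqrt(2.0 * delta) * xi[i] for i in range(N)]
        my = mu(y)
        fwd = norm_C([y[i] - x[i] - delta * mx[i] for i in range(N)])
        bwd = norm_C([x[i] - y[i] - delta * my[i] for i in range(N)])
        Q = -0.5 * (norm_C(y) - norm_C(x)) - (Psi(y) - Psi(x)) - (bwd - fwd) / (4.0 * delta)
        if rng.random() < math.exp(min(0.0, Q)):  # accept w.p. 1 ^ e^Q, as in (2.18)
            x, accepts = y, accepts + 1
    return x, accepts / n_steps
```

The forward term `fwd` reproduces T^N(x, y) and the backward term `bwd` reproduces T^N(y, x), so Q matches (2.21) term by term.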
As will be seen in the next section, a key idea behind our diffusion limit is that, for large N, the quantity Q^N(x, ξ^N) behaves like a Gaussian random variable independent from the current position x.

In summary, the Markov chain that we have described in H^s is, when projected onto X^N, equivalent to a standard MALA algorithm on R^N for the Lebesgue density (2.12). Recall that the target measure π in (1.6) is the invariant measure of the SPDE (1.7). Our goal is to obtain an invariance principle for the continuous interpolant (1.4) of the Markov chain x^N = {x^{k,N}}_{k≥0} started in stationarity, that is, to show weak convergence in C([0,T]; H^s) of z^N(t) to the solution z(t) of the SPDE (1.7), as the dimension N → ∞.

2.5. Optimal scale γ = 1/3. In this section, we informally describe why the optimal scale for the MALA proposals (2.17) is given by the exponent γ = 1/3. For the product-form target probability described by equation (1.1), the optimality of the exponent γ = 1/3 was first obtained in [23]. For further discussion, see also [6]. To keep the exposition simple in this explanatory subsection, we focus on the case Ψ(·) = 0. The analysis is similar with a nonvanishing functional Ψ(·), because absolute continuity ensures that the effect of Ψ(·) is small compared to the dominant Gaussian effects described here. Inclusion of nonvanishing Ψ(·) is carried out in Lemma 4.4. In the case Ψ(·) = 0, straightforward algebra shows that the acceptance probability α^N(x, ξ^N) = 1 ∧ e^{Q^N(x, ξ^N)} satisfies

    Q^N(x, ξ^N) = −(ℓΔt/4)(‖y‖²_{C^N} − ‖x‖²_{C^N}).

For Ψ(·) = 0 and x ∈ X^N, the proposal y is distributed as

    y = (1 − ℓΔt) x + √(2ℓΔt) (C^N)^{1/2} ξ^N.
It follows that
$$\|y\|_{\mathcal{C}^N}^2 - \|x\|_{\mathcal{C}^N}^2 = -2\ell\,\Delta t\,(\|x\|_{\mathcal{C}^N}^2 - \|(\mathcal{C}^N)^{1/2}\xi^N\|_{\mathcal{C}^N}^2) + (\ell\,\Delta t)^2\|x\|_{\mathcal{C}^N}^2 + 2\sqrt{2\ell\,\Delta t}\,(1-\ell\,\Delta t)\,\langle x, (\mathcal{C}^N)^{1/2}\xi^N\rangle_{\mathcal{C}^N}.$$
The details can be found in the proof of Lemma 4.4. Since the Markov chain $x^N = \{x^{k,N}\}_{k\ge0}$ evolves in stationarity, for all $k \ge 0$ we have $x^{k,N} \sim \pi^N = \mathrm{N}(0,\mathcal{C}^N)$. Therefore, with $x \sim \mathrm{N}(0,\mathcal{C}^N)$ and $(\mathcal{C}^N)^{1/2}\xi^N \sim \mathrm{N}(0,\mathcal{C}^N)$, the law of large numbers shows that both $\|x\|_{\mathcal{C}^N}^2$ and $\|(\mathcal{C}^N)^{1/2}\xi^N\|_{\mathcal{C}^N}^2$ are of order $O(N)$, while the central limit theorem shows that $\langle x,(\mathcal{C}^N)^{1/2}\xi^N\rangle_{\mathcal{C}^N} = O(N^{1/2})$ and $\|x\|_{\mathcal{C}^N}^2 - \|(\mathcal{C}^N)^{1/2}\xi^N\|_{\mathcal{C}^N}^2 = O(N^{1/2})$. For $\Delta t = N^{-\gamma}$ and $\gamma < \frac13$, it follows that
$$Q^N(x,\xi^N) = -\frac{(\ell\,\Delta t)^3}{4}\|x\|_{\mathcal{C}^N}^2 + O(N^{1/2 - 3\gamma/2}) \approx -\frac{\ell^3}{4}N^{1-3\gamma},$$
which shows that the acceptance probability is exponentially small, of order $\exp(-\frac{\ell^3}{4}N^{1-3\gamma})$. The same argument shows that for $\gamma > \frac13$ we have $Q^N(x,\xi^N) \to 0$, so that the average acceptance probability converges to $1$. For the critical exponent $\gamma = \frac13$, the acceptance probability is of order $O(1)$. In fact, Lemma 4.4 shows that for $\gamma = \frac13$, even when $\Psi(\cdot)$ is nonzero, the following Gaussian approximation holds:
$$Q^N(x,\xi^N) \approx \mathrm{N}\Big(-\frac{\ell^3}{4}, \frac{\ell^3}{2}\Big).$$
This approximation is key to the derivation of the diffusion limit. In summary, choosing $\gamma < \frac13$ leads to exponentially small acceptance probabilities: almost all the proposals are rejected, so that the expected squared jumping distance $\mathbb{E}^{\pi^N}[\|x^{k+1,N} - x^{k,N}\|^2]$ converges exponentially quickly to $0$ as the dimension $N$ goes to infinity. On the other hand, for any exponent $\gamma \ge \frac13$, the acceptance probabilities are bounded away from zero: the Markov chain moves with jumps of size $O(N^{-\gamma/2})$, and the expected squared jumping distance is of order $O(N^{-\gamma})$.
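For $\Psi = 0$ these scalings are easy to check numerically. Sampling $x \sim \mathrm{N}(0,\mathcal{C}^N)$ and the proposal noise directly in whitened coordinates (where both become i.i.d. standard Gaussian vectors), the statistic $Q^N = -\frac{\ell\Delta t}{4}(\|y\|_{\mathcal{C}^N}^2 - \|x\|_{\mathcal{C}^N}^2)$ at the critical scale $\gamma = \frac13$ should have mean close to $-\ell^3/4$ and variance close to $\ell^3/2$. A small Monte Carlo sketch (our illustration; $N$ and the sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
N, ell, M = 1000, 1.0, 2000
dt = N ** (-1.0 / 3.0)           # Delta t = N^{-gamma} at the critical scale gamma = 1/3
d = ell * dt                     # delta = ell * Delta t

# work in the coordinates u = C^{-1/2} x, so that ||x||_{C^N}^2 = ||u||^2
u = rng.standard_normal((M, N))  # M draws of x ~ N(0, C^N), whitened
xi = rng.standard_normal((M, N))                  # proposal noise, whitened
v = (1.0 - d) * u + np.sqrt(2.0 * d) * xi         # whitened proposal y
Q = -(d / 4.0) * (np.sum(v * v, axis=1) - np.sum(u * u, axis=1))

print(Q.mean(), Q.var())  # close to -ell^3/4 = -0.25 and ell^3/2 = 0.5
```

The empirical mean and variance match the Gaussian approximation $\mathrm{N}(-\ell^3/4, \ell^3/2)$ up to Monte Carlo error.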
If we adopt the expected squared jumping distance as a measure of efficiency, the optimal exponent is thus given by $\gamma = \frac13$. This viewpoint is analyzed further in [6].

2.6. Statement of main theorem. The main result of this article describes the behavior of the MALA algorithm at the optimal scale $\gamma = \frac13$; the proposal variance is given by $2\delta = 2\ell N^{-1/3}$. In this case, Lemma 4.4 shows that the acceptance probability $\alpha^N(x,\xi^N) = 1 \wedge e^{Q^N(x,\xi^N)}$ satisfies $Q^N(x,\xi^N) \to Z_\ell \sim \mathrm{N}(-\frac{\ell^3}{4}, \frac{\ell^3}{2})$. As a consequence, the asymptotic mean acceptance probability of the MALA algorithm can be explicitly computed as a function of the parameter $\ell > 0$,
$$\alpha(\ell) \overset{\mathrm{def}}{=} \lim_{N\to\infty}\mathbb{E}^{\pi^N}[\alpha^N(x,\xi^N)] = \mathbb{E}[1 \wedge e^{Z_\ell}].$$
This result is rigorously proved as Corollary 4.6. We then define the "speed function"
$$h(\ell) = \ell\,\alpha(\ell). \tag{2.22}$$
Note that the time step made in the proposal is $\delta = \ell\,\Delta t$ and that, if a fraction $\alpha(\ell)$ of the proposals is accepted, a naive argument invoking independence shows that the effective time step is reduced to $h(\ell)\Delta t$. This is made rigorous in Theorem 2.6, which shows that the quantity $h(\ell)$ is the asymptotic speed function of the limiting diffusion obtained by rescaling the Metropolis–Hastings Markov chain $x^N = \{x^{k,N}\}_{k\ge0}$.

Theorem 2.6 (Main theorem). Let the reference measure $\pi_0$ and the function $\Psi(\cdot)$ satisfy Assumption 2.1. Consider the MALA algorithm (2.20) with initial condition $x^{0,N} \sim \pi^N$. Let $z^N(t)$ be the piecewise linear, continuous interpolant of the MALA algorithm as defined in (1.4), with $\Delta t = N^{-1/3}$. Then $z^N(t)$ converges weakly in $C([0,T],\mathcal{H}^s)$ to the diffusion process $z(t)$ given by
$$\frac{dz}{dt} = -h(\ell)\,(z + \mathcal{C}\nabla\Psi(z)) + \sqrt{2h(\ell)}\,\frac{dW}{dt} \tag{2.23}$$
with initial distribution $z(0) \sim \pi$.
We now explain two important implications of this result:
• Since time has to be accelerated by a factor $(\Delta t)^{-1} = N^{1/3}$ in order to observe a diffusion limit, it follows that in stationarity the work required to explore the invariant measure scales as $O(N^{1/3})$.
• The speed at which the invariant measure is explored, again in stationarity, is maximized by choosing $\ell$ so as to maximize $h(\ell)$; this is achieved at an average acceptance probability $0.574$. From a practical point of view, this shows that one should "tune" the proposal variance of the MALA algorithm so as to have a mean acceptance probability of $0.574$.

The first implication follows from (1.4), since this shows that $O(N^{1/3})$ steps of the MALA Markov chain (2.20) are required for $z^N(t)$ to approximate $z(t)$ on a time interval $[0,T]$ long enough for $z(t)$ to have explored its invariant measure. To understand the second implication, note that if $Z(t)$ solves (2.23) with $h(\ell) \equiv 1$, then, in law, $z(t) = Z(h(\ell)t)$. This suggests choosing the value of $\ell$ that maximizes the speed function $h(\cdot)$, since $z(t)$ will then explore the invariant measure as fast as possible. For practitioners, who often tune algorithms according to the acceptance probability, it is relevant to express the maximization principle in terms of the asymptotic mean acceptance probability $\alpha(\ell)$. Figure 1 shows that the speed function $h(\cdot)$ is maximized at an optimal acceptance probability of $\alpha^\star = 0.574$, to three decimal places.

Fig. 1. Optimal acceptance probability $= 0.574$.
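The optimum behind Figure 1 can be reproduced directly. For $Z_\ell \sim \mathrm{N}(-\frac{\ell^3}{4},\frac{\ell^3}{2})$ the mean equals minus half the variance, and for such a Gaussian one can check that $\mathbb{E}[1 \wedge e^{Z}] = 2\Phi(-\sigma/2)$, so that $\alpha(\ell) = 2\Phi(-\ell^{3/2}/\sqrt{8})$. The following sketch (our own numerical illustration, not code from the paper) maximizes $h(\ell) = \ell\,\alpha(\ell)$ over a crude grid:

```python
import math

def Phi(x):  # standard normal CDF
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def alpha(l):
    # E[1 ∧ e^Z] for Z ~ N(mu, sigma^2) with mu = -sigma^2/2 equals 2*Phi(-sigma/2);
    # here sigma^2 = l^3/2, so alpha(l) = 2*Phi(-l^{3/2}/sqrt(8))
    return 2.0 * Phi(-l ** 1.5 / math.sqrt(8.0))

def speed(l):
    return l * alpha(l)  # speed function h(l) = l * alpha(l), equation (2.22)

# crude grid search for the maximizer of h
l_star = max((0.001 * k for k in range(1, 5000)), key=speed)
alpha_star = alpha(l_star)
print(round(alpha_star, 3))  # 0.574
```

The maximizer is $\ell^\star \approx 1.36$, and the corresponding acceptance probability is $\alpha^\star \approx 0.574$, matching Figure 1.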
This is precisely the argument used in [23] for the case of product target measures, and it is remarkable that the optimal acceptance probability identified in that context is also optimal for the nonproduct measures studied in this paper.

3. Proof of main theorem. In Section 3.1 we outline the proof strategy and introduce the drift-martingale decomposition of our discrete-time Markov chain which underlies it. Section 3.2 contains the statement and proof of a general diffusion approximation, Proposition 3.1. In Section 3.3 we use this proposition to prove the main theorem of this paper, pointing to Section 4 for the key estimates required.

3.1. Proof strategy. To communicate the main ideas, we give a heuristic of the proof before proceeding to full details in subsequent sections. Let us first examine a simpler situation: consider a scalar Lipschitz function $\mu : \mathbb{R} \to \mathbb{R}$ and two scalar constants $\ell, c > 0$. The usual theory of diffusion approximation for Markov processes [14] shows that the sequence $x^N = \{x^{k,N}\}$ of Markov chains
$$x^{k+1,N} - x^{k,N} = \mu(x^{k,N})\,\ell N^{-1/3} + \sqrt{2\ell N^{-1/3}}\,c^{1/2}\xi^k,$$
with i.i.d. $\xi^k \sim \mathrm{N}(0,1)$, converges weakly, when interpolated using a time-acceleration factor of $N^{1/3}$, to the scalar diffusion
$$dz(t) = \ell\,\mu(z(t))\,dt + \sqrt{2\ell}\,dW(t),$$
where $W$ is a Brownian motion with variance $\mathrm{Var}(W(t)) = ct$. Also, if $\gamma^k$ is an i.i.d.
sequence of Bernoulli random variables with success rate $\alpha(\ell)$, independent of the Markov chain $x^N$, one can prove that the sequence $x^N = \{x^{k,N}\}$ of Markov chains given by
$$x^{k+1,N} - x^{k,N} = \gamma^k\{\mu(x^{k,N})\,\ell N^{-1/3} + \sqrt{2\ell N^{-1/3}}\,c^{1/2}\xi^k\}$$
converges weakly, when interpolated using a time-acceleration factor $N^{1/3}$, to the diffusion
$$dz(t) = h(\ell)\,\mu(z(t))\,dt + \sqrt{2h(\ell)}\,dW(t),$$
where the speed function is given by $h(\ell) = \ell\,\alpha(\ell)$. This shows that the Bernoulli random variables $\{\gamma^k\}_{k\ge0}$ have slowed down the original Markov chain by a factor $\alpha(\ell)$. The proof of Theorem 2.6 is an application of this idea in a slightly more general setting. The following complications arise:
• Instead of working with scalar diffusions, the result holds for a Hilbert space-valued diffusion. The correlation structure between the different coordinates is not present in the preceding simple example and has to be taken into account.
• Instead of working with a single drift function $\mu$, a sequence of approximations $d^N$ converging to $\mu$ has to be taken into account.
• The Bernoulli random variables $\gamma^{k,N}$ are not i.i.d. and have an autocorrelation structure. On top of that, the Bernoulli random variables $\gamma^{k,N}$ are not independent of the Markov chain $x^{k,N}$. This is the main difficulty in the proof.
• It should be emphasized that the main theorem uses the fact that the MALA Markov chain is started at stationarity; this, in particular, implies that $x^{k,N} \sim \pi^N$ for any $k \ge 0$, which is crucial to the proof of the invariance principle as it allows us to control the correlation between $\gamma^{k,N}$ and $x^{k,N}$.
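The slowdown by the factor $\alpha(\ell)$ is visible in a one-step computation for the thinned chain above: with i.i.d. Bernoulli($\alpha$) thinning, the expected squared jump is $\alpha\,(2\ell N^{-1/3}c + O(N^{-2/3}))$, i.e., the chain diffuses at speed $h(\ell) = \ell\,\alpha(\ell)$ rather than $\ell$. A minimal Monte Carlo sketch (our illustration; the values of $N$, $c$, the drift and the starting point are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
N, ell, c, alpha, M = 10 ** 6, 1.0, 1.0, 0.574, 200_000   # illustrative values
d = ell * N ** (-1.0 / 3.0)          # effective step ell * N^{-1/3}
mu = lambda x: -x                    # a scalar Lipschitz drift
x0 = 0.5                             # arbitrary starting point

gamma = rng.uniform(size=M) < alpha                  # i.i.d. Bernoulli(alpha) thinning
xi = rng.standard_normal(M)
jump = gamma * (mu(x0) * d + np.sqrt(2.0 * d * c) * xi)
msj = float(np.mean(jump ** 2))                      # expected squared jump, one step
# the diffusive part dominates: E[jump^2] is close to alpha * 2 * ell * c * N^{-1/3}
print(msj / (2.0 * d * c))                           # close to alpha
```

Up to an $O(N^{-1/3})$ drift correction and Monte Carlo error, the ratio printed is the thinning probability $\alpha$.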
The acceptance probability of proposal (2.17) is equal to $\alpha^N(x,\xi^N) = 1 \wedge e^{Q^N(x,\xi^N)}$, and the quantity $\alpha^N(x) = \mathbb{E}_x[\alpha^N(x,\xi^N)]$, given by (2.19), represents the mean acceptance probability when the Markov chain $x^N$ stands at $x$. For our proof it is important to understand how the acceptance probability $\alpha^N(x,\xi^N)$ depends on the current position $x$ and on the source of randomness $\xi^N$. Recall the quantity $Q^N$ defined in equation (2.21): the main observation is that $Q^N(x,\xi^N)$ can be approximated by a Gaussian random variable,
$$Q^N(x,\xi^N) \approx Z_\ell, \tag{3.1}$$
where $Z_\ell \sim \mathrm{N}(-\frac{\ell^3}{4},\frac{\ell^3}{2})$. These approximations are made rigorous in Lemmas 4.4 and 4.5. Therefore, the Bernoulli random variable $\gamma^N(x,\xi^N)$ with success probability $1 \wedge e^{Q^N(x,\xi^N)}$ can be approximated by a Bernoulli random variable, independent of $x$, with success probability equal to
$$\alpha(\ell) = \mathbb{E}[1 \wedge e^{Z_\ell}]. \tag{3.2}$$
Thus, the limiting acceptance probability of the MALA algorithm is as given in equation (3.2).

Recall that $\Delta t = N^{-1/3}$. With this notation we introduce the drift function $d^N : \mathcal{H}^s \to \mathcal{H}^s$ given by
$$d^N(x) = (h(\ell)\Delta t)^{-1}\,\mathbb{E}[x^{1,N} - x^{0,N} \mid x^{0,N} = x] \tag{3.3}$$
and the martingale difference array $\{\Gamma^{k,N} : k \ge 0\}$, defined by $\Gamma^{k,N} = \Gamma^N(x^{k,N},\xi^{k,N})$ with
$$\Gamma^{k,N} = (2h(\ell)\Delta t)^{-1/2}\,(x^{k+1,N} - x^{k,N} - h(\ell)\Delta t\,d^N(x^{k,N})). \tag{3.4}$$
The normalization constant $h(\ell)$ defined in equation (2.22) ensures that the drift function $d^N$ and the martingale difference array $\{\Gamma^{k,N}\}$ are asymptotically independent of the parameter $\ell$. The drift-martingale decomposition of the Markov chain $\{x^{k,N}\}_k$ then reads
$$x^{k+1,N} - x^{k,N} = h(\ell)\Delta t\,d^N(x^{k,N}) + \sqrt{2h(\ell)\Delta t}\,\Gamma^{k,N}. \tag{3.5}$$
Lemmas 4.7 and 4.8 exploit the Gaussian behavior of $Q^N(x,\xi^N)$, described in equation (3.1), in order to give quantitative versions of the following approximations:
$$d^N(x) \approx \mu(x) \quad\text{and}\quad \Gamma^{k,N} \approx \mathrm{N}(0,\mathcal{C}), \tag{3.6}$$
where the function $\mu(\cdot)$ is defined by equation (2.14). From equation (3.5) it follows that for large $N$ the evolution of the Markov chain resembles the Euler discretization of the limiting diffusion (2.23).

The next step consists of proving an invariance principle for a rescaled version of the martingale difference array $\{\Gamma^{k,N}\}$. The continuous process $W^N \in C([0,T],\mathcal{H}^s)$ is defined as
$$W^N(t) = \sqrt{\Delta t}\sum_{j=0}^{k}\Gamma^{j,N} + \frac{t - k\Delta t}{\sqrt{\Delta t}}\,\Gamma^{k+1,N} \qquad\text{for } k\Delta t \le t < (k+1)\Delta t. \tag{3.7}$$
The sequence of processes $\{W^N\}_{N\ge1}$ converges weakly as $N \to \infty$ in $C([0,T],\mathcal{H}^s)$ to a Brownian motion $W$ in $\mathcal{H}^s$ with covariance operator equal to $\mathcal{C}_s$. Indeed, Proposition 4.10 proves the stronger result
$$(x^{0,N}, W^N) \Longrightarrow (z^0, W),$$
where $\Longrightarrow$ denotes weak convergence in $\mathcal{H}^s \times C([0,T],\mathcal{H}^s)$, and $z^0 \sim \pi$ is independent of the limiting Brownian motion $W$. Using this invariance principle and the fact that the noise process is additive [the diffusion coefficient of the SPDE (2.23) is constant], the main theorem follows from a continuous mapping argument which we now outline. For any $W \in C([0,T];\mathcal{H}^s)$ we define the Itô map $\Theta : \mathcal{H}^s \times C([0,T];\mathcal{H}^s) \to C([0,T];\mathcal{H}^s)$ which maps $(z^0, W)$ to the unique solution of the integral equation
$$z(t) = z^0 + h(\ell)\int_0^t \mu(z(u))\,du + \sqrt{2h(\ell)}\,W(t) \qquad\forall t \in [0,T]. \tag{3.8}$$
Notice that $z = \Theta(z^0, W)$ solves the SPDE (2.23). The Itô map $\Theta$ is continuous, essentially because the noise in (2.23) is additive (does not depend on the state $z$).
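In finite dimensions the Itô map $\Theta$ can be realized numerically by an Euler scheme: given $z^0$ and a driving path $W$ on a grid, iterate the integral equation (3.8). A minimal scalar sketch (our own illustration, assuming $\mu(z) = -z$, i.e., $\Psi = 0$, and an arbitrary step count); with $W \equiv 0$ the output should track the exact solution $z^0 e^{-h(\ell)t}$:

```python
import math

def ito_map(z0, W, h, T):
    # Euler discretization of z(t) = z0 + h * int_0^t mu(z(u)) du + sqrt(2h) * W(t),
    # with mu(z) = -z; W is a list of n+1 driving-path values on a uniform grid.
    n = len(W) - 1
    dt = T / n
    z = [z0]
    for k in range(n):
        drift = z[-1] + h * (-z[-1]) * dt
        z.append(drift + math.sqrt(2.0 * h) * (W[k + 1] - W[k]))
    return z

# sanity check against the exact solution when the driving path vanishes
h, T, n = 0.5, 1.0, 4000
z = ito_map(1.0, [0.0] * (n + 1), h, T)
print(abs(z[-1] - math.exp(-h * T)) < 1e-3)  # True
```

Continuity of $\Theta$ in the driving path $W$ is what makes the continuous mapping argument below work; the additive noise enters the recursion only through the increments $W^{k+1} - W^k$.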
The piecewise constant interpolant $\bar z^N$ of $x^N$ is defined by
$$\bar z^N(t) = x^{k,N} \qquad\text{for } k\Delta t \le t < (k+1)\Delta t. \tag{3.9}$$
Using this definition it follows that the continuous piecewise linear interpolant $z^N$, defined in equation (1.4), satisfies
$$z^N(t) = x^{0,N} + h(\ell)\int_0^t d^N(\bar z^N(u))\,du + \sqrt{2h(\ell)}\,W^N(t) \qquad\forall t \in [0,T]. \tag{3.10}$$
Using the closeness of $d^N(\cdot)$ and $\mu(\cdot)$, and of $z^N$ and $\bar z^N$, we will see that there exists a process $\widehat W^N \Rightarrow W$ as $N \to \infty$ such that
$$z^N(t) = x^{0,N} + h(\ell)\int_0^t \mu(z^N(u))\,du + \sqrt{2h(\ell)}\,\widehat W^N(t).$$
Thus we may write $z^N = \Theta(x^{0,N}, \widehat W^N)$. By continuity of the Itô map $\Theta$, it follows from the continuous mapping theorem that $z^N = \Theta(x^{0,N}, \widehat W^N) \Longrightarrow \Theta(z^0, W) = z$ as $N$ goes to infinity. This weak convergence result is the principal result of this article.

3.2. General diffusion approximation. In this section we state and prove a proposition containing a general diffusion approximation result. Using this, we then prove our main theorem in Section 3.3. To this end, consider a general sequence of Markov chains $x^N = \{x^{k,N}\}_{k\ge0}$ evolving at stationarity in the separable Hilbert space $\mathcal{H}^s$, and introduce the drift-martingale decomposition
$$x^{k+1,N} - x^{k,N} = h(\ell)\,d^N(x^{k,N})\Delta t + \sqrt{2h(\ell)\Delta t}\,\Gamma^{k,N}, \tag{3.11}$$
where $h(\ell) > 0$ is a constant parameter, and $\Delta t$ is a time step decreasing to $0$ as $N$ goes to infinity. Here $d^N$ and $\Gamma^{k,N}$ are as defined above. We introduce the rescaled process $W^N(t)$ as in (3.7). The main diffusion approximation result is the following.

Proposition 3.1 (General diffusion approximation for Markov chains). Consider a separable Hilbert space $(\mathcal{H}^s, \langle\cdot,\cdot\rangle_s)$ and a sequence of $\mathcal{H}^s$-valued Markov chains $x^N = \{x^{k,N}\}_{k\ge0}$ with invariant distribution $\pi^N$.
Suppose that the Markov chains start at stationarity, $x^{0,N} \sim \pi^N$, and that the drift-martingale decomposition (3.11) satisfies the following assumptions:
(1) Convergence of initial conditions: $\pi^N$ converges in distribution to the probability measure $\pi$, where $\pi$ has a finite first moment, that is, $\mathbb{E}^\pi[\|x\|_s] < \infty$.
(2) Invariance principle: the sequence $(x^{0,N}, W^N)$, defined by equation (3.7), converges weakly in $\mathcal{H}^s \times C([0,T],\mathcal{H}^s)$ to $(z^0, W)$, where $z^0 \sim \pi$, and $W$ is a Brownian motion in $\mathcal{H}^s$, independent of $z^0$, with covariance operator $\mathcal{C}_s$.
(3) Convergence of the drift: there exists a globally Lipschitz function $\mu : \mathcal{H}^s \to \mathcal{H}^s$ that satisfies
$$\lim_{N\to\infty}\mathbb{E}^{\pi^N}[\|d^N(x) - \mu(x)\|_s] = 0.$$
Then the sequence of rescaled interpolants $z^N \in C([0,T],\mathcal{H}^s)$, defined by equation (1.4), converges weakly in $C([0,T],\mathcal{H}^s)$ to $z \in C([0,T],\mathcal{H}^s)$ given by
$$\frac{dz}{dt} = h(\ell)\,\mu(z(t)) + \sqrt{2h(\ell)}\,\frac{dW}{dt}, \qquad z(0) \sim \pi.$$
Here $W$ is a Brownian motion in $\mathcal{H}^s$ with covariance $\mathcal{C}_s$, and the initial condition $z^0 \sim \pi$ is independent of $W$.

Proof. Define $\bar z^N(t)$ as in (3.9). It then follows that
$$z^N(t) = x^{0,N} + h(\ell)\int_0^t d^N(\bar z^N(u))\,du + \sqrt{2h(\ell)}\,W^N(t) = x^{0,N} + h(\ell)\int_0^t \mu(z^N(u))\,du + \sqrt{2h(\ell)}\,\widehat W^N(t), \tag{3.12}$$
where the process $W^N \in C([0,T],\mathcal{H}^s)$ is defined by equation (3.7) and
$$\widehat W^N(t) = W^N(t) + \sqrt{\frac{h(\ell)}{2}}\int_0^t [d^N(\bar z^N(u)) - \mu(z^N(u))]\,du.$$
Define the Itô map $\Theta : \mathcal{H}^s \times C([0,T];\mathcal{H}^s) \to C([0,T];\mathcal{H}^s)$ that maps $(z^0, W)$ to the unique solution $z \in C([0,T],\mathcal{H}^s)$ of the integral equation
$$z(t) = z^0 + h(\ell)\int_0^t \mu(z(u))\,du + \sqrt{2h(\ell)}\,W(t) \qquad\forall t \in [0,T].$$
Equation (3.12) is thus equivalent to $z^N = \Theta(x^{0,N}, \widehat W^N)$.
The proof of the diffusion approximation is accomplished through the following steps:
• The Itô map $\Theta : \mathcal{H}^s \times C([0,T],\mathcal{H}^s) \to C([0,T],\mathcal{H}^s)$ is continuous. This is Lemma 3.7 of [19].
• The pair $(x^{0,N}, \widehat W^N)$ converges weakly to $(z^0, W)$. In a separable Hilbert space, if the sequence $\{a_n\}_{n\in\mathbb{N}}$ converges weakly to $a$, and the sequence $\{b_n\}_{n\in\mathbb{N}}$ converges in probability to $0$, then the sequence $\{a_n + b_n\}_{n\in\mathbb{N}}$ converges weakly to $a$. It is assumed that $(x^{0,N}, W^N)$ converges weakly to $(z^0, W)$ in $\mathcal{H}^s \times C([0,T],\mathcal{H}^s)$. Consequently, to prove that $\widehat W^N$ converges weakly to $W$, it suffices to prove that $\int_0^T \|d^N(\bar z^N(u)) - \mu(z^N(u))\|_s\,du$ converges in probability to $0$. For any time $k\Delta t \le u < (k+1)\Delta t$, the stationarity of the chain shows that
$$\|d^N(\bar z^N(u)) - \mu(\bar z^N(u))\|_s = \|d^N(x^{k,N}) - \mu(x^{k,N})\|_s \overset{\mathcal{D}}{=} \|d^N(x^{0,N}) - \mu(x^{0,N})\|_s,$$
$$\|\mu(\bar z^N(u)) - \mu(z^N(u))\|_s \le \|\mu\|_{\mathrm{Lip}}\cdot\|x^{k+1,N} - x^{k,N}\|_s \overset{\mathcal{D}}{=} \|\mu\|_{\mathrm{Lip}}\cdot\|x^{1,N} - x^{0,N}\|_s,$$
where in the last step we have used the fact that $\|\bar z^N(u) - z^N(u)\|_s \le \|x^{k+1,N} - x^{k,N}\|_s$. Consequently,
$$\mathbb{E}^{\pi^N}\Big[\int_0^T \|d^N(\bar z^N(u)) - \mu(z^N(u))\|_s\,du\Big] \le T\cdot\mathbb{E}^{\pi^N}[\|d^N(x^{0,N}) - \mu(x^{0,N})\|_s] + T\cdot\|\mu\|_{\mathrm{Lip}}\cdot\mathbb{E}^{\pi^N}[\|x^{1,N} - x^{0,N}\|_s].$$
The first term goes to zero since it is assumed that $\lim_N \mathbb{E}^{\pi^N}[\|d^N(x) - \mu(x)\|_s] = 0$. Since $\mathrm{Tr}_{\mathcal{H}^s}(\mathcal{C}_s) < \infty$, the second term is of order $O(\sqrt{\Delta t})$ and thus also converges to $0$. Therefore $\widehat W^N$ converges weakly to $W$, hence the conclusion.
• Continuous mapping argument. We have proved that $(x^{0,N}, \widehat W^N)$ converges weakly in $\mathcal{H}^s \times C([0,T],\mathcal{H}^s)$ to $(z^0, W)$, and the Itô map $\Theta : \mathcal{H}^s \times C([0,T],\mathcal{H}^s) \to C([0,T],\mathcal{H}^s)$ is a continuous function.
The continuous mapping theorem thus shows that $z^N = \Theta(x^{0,N}, \widehat W^N)$ converges weakly to $z = \Theta(z^0, W)$, finishing the proof of Proposition 3.1. □

3.3. Proof of main theorem. We now prove Theorem 2.6. The proof consists of checking that the conditions needed for Proposition 3.1 to apply are satisfied by the sequence of MALA Markov chains (2.20). The key estimates are proved later in Section 4.
(1) By Lemma 4.3 the sequence of probability measures $\pi^N$ converges weakly in $\mathcal{H}^s$ to $\pi$.
(2) Proposition 4.10 proves that $(x^{0,N}, W^N)$ converges weakly in $\mathcal{H}^s \times C([0,T],\mathcal{H}^s)$ to $(z^0, W)$, where $W$ is a Brownian motion with covariance $\mathcal{C}_s$, independent of $z^0 \sim \pi$.
(3) Lemma 4.7 states that $d^N(x)$, defined by equation (3.3), satisfies $\lim_N \mathbb{E}^{\pi^N}[\|d^N(x) - \mu(x)\|_s^2] = 0$, and Proposition 2.4 shows that $\mu : \mathcal{H}^s \to \mathcal{H}^s$ is a Lipschitz function.
The three assumptions needed for Proposition 3.1 to apply are satisfied, which concludes the proof of Theorem 2.6. □

4. Key estimates. Section 4.1 contains some technical lemmas of use throughout. In Section 4.2 we study the large-$N$ Gaussian approximation of the acceptance probability, simultaneously establishing asymptotic independence from the current state of the Markov chain. This approximation is then used in Sections 4.3 and 4.4 to give quantitative versions of the heuristics (3.6). The section concludes with Section 4.5, in which we prove an invariance principle for $W^N$ given by (3.7).

4.1. Technical lemmas. The first lemma shows that, for $\pi_0$-almost every function $x \in \mathcal{H}^s$, the approximation $\mu^N(x) \approx \mu(x)$ holds as $N$ goes to infinity.

Lemma 4.1 ($\mu^N$ converges $\pi_0$-almost surely to $\mu$). Let Assumption 2.1 hold. The sequence of functions $\mu^N : \mathcal{H}^s \to \mathcal{H}^s$ satisfies
$$\pi_0\Big\{x \in \mathcal{H}^s : \lim_{N\to\infty}\|\mu^N(x) - \mu(x)\|_s = 0\Big\} = 1.$$
Proof.
It is enough to verify that for $x \in \mathcal{H}^s$ we have
$$\lim_{N\to\infty}\|P^N x - x\|_s = 0, \tag{4.1}$$
$$\lim_{N\to\infty}\|\mathcal{C}P^N\nabla\Psi(P^N x) - \mathcal{C}\nabla\Psi(x)\|_s = 0. \tag{4.2}$$
• Let us prove equation (4.1). For $x \in \mathcal{H}^s$ we have $\sum_{j\ge1} j^{2s}x_j^2 < \infty$, so that
$$\lim_{N\to\infty}\|P^N x - x\|_s^2 = \lim_{N\to\infty}\sum_{j=N+1}^\infty j^{2s}x_j^2 = 0. \tag{4.3}$$
• Let us prove (4.2). The triangle inequality shows that
$$\|\mathcal{C}P^N\nabla\Psi(P^N x) - \mathcal{C}\nabla\Psi(x)\|_s \le \|\mathcal{C}P^N\nabla\Psi(P^N x) - \mathcal{C}P^N\nabla\Psi(x)\|_s + \|\mathcal{C}P^N\nabla\Psi(x) - \mathcal{C}\nabla\Psi(x)\|_s.$$
The same proof as Lemma 2.4 reveals that $\mathcal{C}P^N\nabla\Psi : \mathcal{H}^s \to \mathcal{H}^s$ is globally Lipschitz, with a Lipschitz constant that can be chosen independently of $N$. Consequently, equation (4.3) shows that
$$\|\mathcal{C}P^N\nabla\Psi(P^N x) - \mathcal{C}P^N\nabla\Psi(x)\|_s \lesssim \|P^N x - x\|_s \to 0.$$
Also, $z = \nabla\Psi(x) \in \mathcal{H}^{-s}$, so that $\|\nabla\Psi(x)\|_{-s}^2 = \sum_{j\ge1} j^{-2s}z_j^2 < \infty$. The eigenvalues of $\mathcal{C}$ satisfy $\lambda_j^2 \asymp j^{-2\kappa}$ with $s < \kappa - \frac12$. Consequently,
$$\|\mathcal{C}P^N\nabla\Psi(x) - \mathcal{C}\nabla\Psi(x)\|_s^2 = \sum_{j=N+1}^\infty j^{2s}(\lambda_j^2 z_j)^2 \lesssim \sum_{j=N+1}^\infty j^{2s-4\kappa}z_j^2 = \sum_{j=N+1}^\infty j^{4(s-\kappa)}j^{-2s}z_j^2 \le \frac{1}{(N+1)^{4(\kappa-s)}}\|\nabla\Psi(x)\|_{-s}^2 \to 0. \qquad\square$$
The next lemma shows that the size of the jump $y - x$ is of order $\sqrt{\Delta t}$.

Lemma 4.2. Consider $y$ given by (2.17). Under Assumption 2.1, for any $p \ge 1$, we have
$$\mathbb{E}_x[\|y - x\|_s^p] \lesssim (\Delta t)^{p/2}\,(1 + \|x\|_s^p).$$
Proof. Under Assumption 2.1 the function $\mu^N$ is globally Lipschitz on $\mathcal{H}^s$, with a Lipschitz constant that can be chosen independently of $N$. Thus
$$\|y - x\|_s \lesssim \Delta t\,(1 + \|x\|_s) + \sqrt{\Delta t}\,\|\mathcal{C}^{1/2}\xi^N\|_s.$$
We have $\mathbb{E}[\|\mathcal{C}^{1/2}\xi^N\|_s^p] \le \mathbb{E}[\|\zeta\|_s^p]$, where $\zeta \sim \mathrm{N}(0,\mathcal{C})$, and Fernique's theorem [12] shows that $\mathbb{E}[\|\zeta\|_s^p] < \infty$. Consequently, $\mathbb{E}[\|\mathcal{C}^{1/2}\xi^N\|_s^p]$ is uniformly bounded as a function of $N$, proving the lemma.
□ The normalizing constants $M_\Psi^N$ are uniformly bounded, and we use this fact to obtain uniform bounds on moments of functionals on $\mathcal{H}$ under $\pi^N$. Moreover, we prove that the sequence of probability measures $\pi^N$ on $\mathcal{H}^s$ converges weakly in $\mathcal{H}^s$ to $\pi$.

Lemma 4.3 (Finite-dimensional approximation $\pi^N$ of $\pi$). Under Assumption 2.1 the normalization constants $M_\Psi^N$ are uniformly bounded, so that for any measurable functional $f : \mathcal{H} \to \mathbb{R}$ we have
$$\mathbb{E}^{\pi^N}[|f(x)|] \lesssim \mathbb{E}^{\pi_0}[|f(x)|].$$
Moreover, the sequence of probability measures $\pi^N$ satisfies $\pi^N \Rightarrow \pi$, where $\Rightarrow$ denotes weak convergence in $\mathcal{H}^s$.

Proof. The first part is contained in Lemma 3.5 of [19]. Let us prove that $\pi^N \Rightarrow \pi$. We need to show that for any bounded continuous function $g : \mathcal{H}^s \to \mathbb{R}$ we have $\lim_{N\to\infty}\mathbb{E}^{\pi^N}[g(x)] = \mathbb{E}^{\pi}[g(x)]$, where
$$\mathbb{E}^{\pi^N}[g(x)] = \mathbb{E}^{\pi_0^N}[g(x)\,M_\Psi^N e^{-\Psi^N(x)}] = \mathbb{E}^{\pi_0}[g(P^N x)\,M_\Psi^N e^{-\Psi(P^N x)}].$$
Since $g$ is bounded, $\Psi$ is bounded below, and the normalization constants are uniformly bounded, the dominated convergence theorem shows that it suffices to prove that $g(P^N x)\,M_\Psi^N e^{-\Psi(P^N x)}$ converges $\pi_0$-almost surely to $g(x)\,M_\Psi e^{-\Psi(x)}$. For this, in turn, it suffices to show that $\Psi(P^N x)$ converges $\pi_0$-almost surely to $\Psi(x)$, as this also proves almost sure convergence of the normalization constants. By (2.7) we have
$$|\Psi(P^N x) - \Psi(x)| \lesssim (1 + \|x\|_s + \|P^N x\|_s)\,\|P^N x - x\|_s.$$
But $\lim_{N\to\infty}\|P^N x - x\|_s = 0$ for any $x \in \mathcal{H}^s$, and the result follows by dominated convergence. □

Fernique's theorem [12] states that for any exponent $p \ge 0$ we have $\mathbb{E}^{\pi_0}[\|x\|_s^p] < \infty$. It thus follows from Lemma 4.3 that for any $p \ge 0$,
$$\sup_N\{\mathbb{E}^{\pi^N}[\|x\|_s^p] : N \in \mathbb{N}\} < \infty.$$
This estimate is used repeatedly in the sequel.

4.2. Gaussian approximation of $Q^N$.
Recall the quantity $Q^N$ defined in equation (2.21). This section proves that $Q^N$ has Gaussian behavior in the sense that
$$Q^N(x,\xi^N) = Z^N(x,\xi^N) + i^N(x,\xi^N) + e^N(x,\xi^N), \tag{4.4}$$
where the quantities $Z^N$ and $i^N$ are equal to
$$Z^N(x,\xi^N) = -\frac{\ell^3}{4} - \frac{\ell^{3/2}}{\sqrt2}\,N^{-1/2}\sum_{j=1}^N \lambda_j^{-1}\xi_j x_j, \tag{4.5}$$
$$i^N(x,\xi^N) = \frac12(\ell\,\Delta t)^2(\|x\|_{\mathcal{C}^N}^2 - \|(\mathcal{C}^N)^{1/2}\xi^N\|_{\mathcal{C}^N}^2), \tag{4.6}$$
with $i^N$ and $e^N$ small. Thus the principal contribution to $Q^N$ comes from the random variable $Z^N(x,\xi^N)$. Notice that, for each fixed $x \in \mathcal{H}^s$, the random variable $Z^N(x,\xi^N)$ is Gaussian. Furthermore, the Karhunen–Loève expansion of $\pi_0$ shows that for $\pi_0$-almost every choice of function $x \in \mathcal{H}$ the sequence $\{Z^N(x,\xi^N)\}_{N\ge1}$ converges in law to the distribution of $Z_\ell \sim \mathrm{N}(-\frac{\ell^3}{4},\frac{\ell^3}{2})$. The next lemma rigorously bounds the error terms $e^N(x,\xi^N)$ and $i^N(x,\xi^N)$: we show that $i^N$ is an error term of order $O(N^{-1/6})$ and $e^N(x,\xi^N)$ is an error term of order $O(N^{-1/3})$. In Lemma 4.5 we then quantify the convergence of $Z^N(x,\xi^N)$ to $Z_\ell$.

Lemma 4.4 (Gaussian approximation). Let $p \ge 1$ be an integer. Under Assumption 2.1, the error terms $i^N$ and $e^N$ in the Gaussian approximation (4.4) satisfy
$$(\mathbb{E}^{\pi^N}[|i^N(x,\xi^N)|^p])^{1/p} = O(N^{-1/6}) \quad\text{and}\quad (\mathbb{E}^{\pi^N}[|e^N(x,\xi^N)|^p])^{1/p} = O(N^{-1/3}). \tag{4.7}$$
Proof. For notational clarity, without loss of generality, we suppose $p = 2q$.
The quantity $Q^N$ is defined in equation (2.21), and expanding terms leads to $Q^N(x,\xi^N) = I_1 + I_2 + I_3$, where the quantities $I_1$, $I_2$ and $I_3$ are given by
$$I_1 = -\frac12(\|y\|_{\mathcal{C}^N}^2 - \|x\|_{\mathcal{C}^N}^2) - \frac{1}{4\ell\,\Delta t}(\|x - y(1-\ell\,\Delta t)\|_{\mathcal{C}^N}^2 - \|y - x(1-\ell\,\Delta t)\|_{\mathcal{C}^N}^2),$$
$$I_2 = -(\Psi^N(y) - \Psi^N(x)) - \frac12(\langle x - y(1-\ell\,\Delta t), \mathcal{C}^N\nabla\Psi^N(y)\rangle_{\mathcal{C}^N} - \langle y - x(1-\ell\,\Delta t), \mathcal{C}^N\nabla\Psi^N(x)\rangle_{\mathcal{C}^N}),$$
$$I_3 = -\frac{\ell\,\Delta t}{4}\{\|\mathcal{C}^N\nabla\Psi^N(y)\|_{\mathcal{C}^N}^2 - \|\mathcal{C}^N\nabla\Psi^N(x)\|_{\mathcal{C}^N}^2\}.$$
The term $I_1$ arises purely from the Gaussian part of the target measure $\pi^N$ and from the Gaussian part of the proposal. The two other terms $I_2$ and $I_3$ come from the change of probability involving the functional $\Psi^N$. We start by simplifying the expression for $I_1$, and then return to estimate the terms $I_2$ and $I_3$:
$$I_1 = -\frac12(\|y\|_{\mathcal{C}^N}^2 - \|x\|_{\mathcal{C}^N}^2) - \frac{1}{4\ell\,\Delta t}(\|(x-y) + \ell\,\Delta t\,y\|_{\mathcal{C}^N}^2 - \|(y-x) + \ell\,\Delta t\,x\|_{\mathcal{C}^N}^2)$$
$$= -\frac12(\|y\|_{\mathcal{C}^N}^2 - \|x\|_{\mathcal{C}^N}^2) - \frac{1}{4\ell\,\Delta t}(2\ell\,\Delta t\,[\|x\|_{\mathcal{C}^N}^2 - \|y\|_{\mathcal{C}^N}^2] + (\ell\,\Delta t)^2[\|y\|_{\mathcal{C}^N}^2 - \|x\|_{\mathcal{C}^N}^2])$$
$$= -\frac{\ell\,\Delta t}{4}(\|y\|_{\mathcal{C}^N}^2 - \|x\|_{\mathcal{C}^N}^2).$$
The term $I_1$ is $O(1)$ and constitutes the main contribution to $Q^N$. Before analyzing $I_1$ in more detail, we show that $I_2$ and $I_3$ are $O(N^{-1/3})$:
$$(\mathbb{E}^{\pi^N}[I_2^{2q}])^{1/(2q)} = O(N^{-1/3}) \quad\text{and}\quad (\mathbb{E}^{\pi^N}[I_3^{2q}])^{1/(2q)} = O(N^{-1/3}). \tag{4.8}$$
• We expand $I_2$ and use the bound on the remainder of the Taylor expansion of $\Psi$ described in equation (2.15),
$$I_2 = -\{\Psi^N(y) - [\Psi^N(x) + \langle\nabla\Psi^N(x), y - x\rangle]\} + \frac12\langle y - x, \nabla\Psi^N(y) - \nabla\Psi^N(x)\rangle + \frac{\ell\,\Delta t}{2}\{\langle x, \nabla\Psi^N(x)\rangle - \langle y, \nabla\Psi^N(y)\rangle\} = A_1 + A_2 + A_3.$$
Equation (2.15) and Lemma 4.2 show that
$$\mathbb{E}^{\pi^N}[A_1^{2q}] \lesssim \mathbb{E}^{\pi^N}[\|y - x\|_s^{4q}] \lesssim (\Delta t)^{2q}\,\mathbb{E}^{\pi^N}[1 + \|x\|_s^{4q}] \lesssim (\Delta t)^{2q} = (N^{-1/3})^{2q},$$
where we have used the fact that $\mathbb{E}^{\pi^N}[\|x\|_s^{4q}] \lesssim \mathbb{E}^{\pi_0}[\|x\|_s^{4q}] < \infty$.
Assumption 2.1 states that $\partial^2\Psi$ is uniformly bounded in $L(\mathcal{H}^s, \mathcal{H}^{-s})$, so that
$$\|\nabla\Psi(y) - \nabla\Psi(x)\|_{-s} = \Big\|\int_0^1 \partial^2\Psi(x + t(y-x))\cdot(y-x)\,dt\Big\|_{-s} \le \int_0^1 \|\partial^2\Psi(x + t(y-x))\cdot(y-x)\|_{-s}\,dt \le M_4\int_0^1 \|y - x\|_s\,dt. \tag{4.9}$$
This proves that $\|\nabla\Psi^N(y) - \nabla\Psi^N(x)\|_{-s} \lesssim \|y - x\|_s$. Consequently, Lemma 4.2 shows that
$$\mathbb{E}^{\pi^N}[A_2^{2q}] \lesssim \mathbb{E}^{\pi^N}[\|y - x\|_s^{2q}\cdot\|\nabla\Psi^N(y) - \nabla\Psi^N(x)\|_{-s}^{2q}] \lesssim \mathbb{E}^{\pi^N}[\|y - x\|_s^{4q}] \lesssim (\Delta t)^{2q}\,\mathbb{E}^{\pi^N}[1 + \|x\|_s^{4q}] \lesssim (\Delta t)^{2q} = (N^{-1/3})^{2q}.$$
Under Assumption 2.1, for any $z \in \mathcal{H}^s$ we have $\|\nabla\Psi^N(z)\|_{-s} \lesssim 1 + \|z\|_s$. Therefore $\mathbb{E}^{\pi^N}[A_3^{2q}] \lesssim (\Delta t)^{2q}$. Putting these estimates together,
$$(\mathbb{E}^{\pi^N}[I_2^{2q}])^{1/(2q)} \lesssim (\mathbb{E}^{\pi^N}[A_1^{2q} + A_2^{2q} + A_3^{2q}])^{1/(2q)} = O(N^{-1/3}).$$
• Lemma 2.4 states that $\mathcal{C}^N\nabla\Psi^N : \mathcal{H}^s \to \mathcal{H}^s$ is globally Lipschitz, with a Lipschitz constant that can be chosen uniformly in $N$. Therefore,
$$\|\mathcal{C}^N\nabla\Psi^N(z)\|_s \lesssim 1 + \|z\|_s. \tag{4.10}$$
Since $\|\mathcal{C}^N\nabla\Psi^N(z)\|_{\mathcal{C}^N}^2 = \langle\nabla\Psi^N(z), \mathcal{C}^N\nabla\Psi^N(z)\rangle$, bound (2.7) gives
$$\mathbb{E}^{\pi^N}[I_3^{2q}] \lesssim (\Delta t)^{2q}\,\mathbb{E}[\langle\nabla\Psi^N(x), \mathcal{C}^N\nabla\Psi^N(x)\rangle^q + \langle\nabla\Psi^N(y), \mathcal{C}^N\nabla\Psi^N(y)\rangle^q] \lesssim (\Delta t)^{2q}\,\mathbb{E}^{\pi^N}[(1 + \|x\|_s)^{2q} + (1 + \|y\|_s)^{2q}] \lesssim (\Delta t)^{2q}\,\mathbb{E}^{\pi^N}[1 + \|x\|_s^{2q} + \|y\|_s^{2q}] \lesssim (\Delta t)^{2q} = (N^{-1/3})^{2q},$$
which concludes the proof of equation (4.8).

We now simplify further the expression for $I_1$ and demonstrate that it has Gaussian behavior. We use the definition of the proposal $y$, given in equation (2.17), to expand $I_1$. For $x \in X^N$ we have $P^N x = x$.
Therefore, for $x \in X^N$,
$$I_1 = -\frac{\ell\,\Delta t}{4}(\|(1-\ell\,\Delta t)x - \ell\,\Delta t\,\mathcal{C}^N\nabla\Psi^N(x) + \sqrt{2\ell\,\Delta t}\,(\mathcal{C}^N)^{1/2}\xi^N\|_{\mathcal{C}^N}^2 - \|x\|_{\mathcal{C}^N}^2) = Z^N(x,\xi^N) + i^N(x,\xi^N) + B_1 + B_2 + B_3 + B_4,$$
with $Z^N(x,\xi^N)$ and $i^N(x,\xi^N)$ given by equations (4.5) and (4.6), and
$$B_1 = \frac{\ell^3}{4}\Big(1 - \frac{\|x\|_{\mathcal{C}^N}^2}{N}\Big), \qquad B_2 = -\frac{\ell^3}{4}N^{-1}\{\|\mathcal{C}^N\nabla\Psi^N(x)\|_{\mathcal{C}^N}^2 + 2\langle x, \nabla\Psi^N(x)\rangle\},$$
$$B_3 = \frac{\ell^{5/2}}{\sqrt2}N^{-5/6}\langle x + \mathcal{C}^N\nabla\Psi^N(x), (\mathcal{C}^N)^{1/2}\xi^N\rangle_{\mathcal{C}^N}, \qquad B_4 = \frac{\ell^2}{2}N^{-2/3}\langle x, \nabla\Psi^N(x)\rangle.$$
The quantity $Z^N$ is the leading term. For each fixed value of $x \in \mathcal{H}^s$, the term $Z^N(x,\xi^N)$ is Gaussian. Below, we prove that the quantity $i^N$ is $O(N^{-1/6})$. We now establish that each $B_j$ is $O(N^{-1/3})$:
$$(\mathbb{E}^{\pi^N}[B_j^{2q}])^{1/(2q)} = O(N^{-1/3}), \qquad j = 1, \ldots, 4. \tag{4.11}$$
• Lemma 4.3 shows that $\mathbb{E}^{\pi^N}[(1 - \frac{\|x\|_{\mathcal{C}^N}^2}{N})^{2q}] \lesssim \mathbb{E}^{\pi_0}[(1 - \frac{\|x\|_{\mathcal{C}^N}^2}{N})^{2q}]$. Under $\pi_0$,
$$\frac{\|x\|_{\mathcal{C}^N}^2}{N} \overset{\mathcal{D}}{=} \frac{\rho_1^2 + \cdots + \rho_N^2}{N},$$
where $\rho_1, \ldots, \rho_N$ are i.i.d. $\mathrm{N}(0,1)$ Gaussian random variables. Consequently, $(\mathbb{E}^{\pi^N}[B_1^{2q}])^{1/(2q)} = O(N^{-1/2})$.
• The term $\|\mathcal{C}^N\nabla\Psi^N(x)\|_{\mathcal{C}^N}^{2q}$ has already been bounded while proving $\mathbb{E}^{\pi^N}[I_3^{2q}] \lesssim (N^{-1/3})^{2q}$. Equation (2.7) gives the bound $\|\nabla\Psi^N(x)\|_{-s} \lesssim 1 + \|x\|_s$ and shows that $\mathbb{E}^{\pi^N}[\langle x, \nabla\Psi^N(x)\rangle^{2q}]$ is uniformly bounded as a function of $N$. Consequently, $(\mathbb{E}^{\pi^N}[B_2^{2q}])^{1/(2q)} = O(N^{-1})$.
• We have $\langle\mathcal{C}^N\nabla\Psi^N(x), (\mathcal{C}^N)^{1/2}\xi^N\rangle_{\mathcal{C}^N} = \langle\nabla\Psi^N(x), (\mathcal{C}^N)^{1/2}\xi^N\rangle$, so that
$$\mathbb{E}^{\pi^N}[\langle\mathcal{C}^N\nabla\Psi^N(x), (\mathcal{C}^N)^{1/2}\xi^N\rangle_{\mathcal{C}^N}^{2q}] \lesssim \mathbb{E}^{\pi^N}[\|\nabla\Psi^N(x)\|_{-s}^{2q}\cdot\|(\mathcal{C}^N)^{1/2}\xi^N\|_s^{2q}] \lesssim 1.$$
By Lemma 4.3, one can suppose $x \sim \pi_0$; then
$$\langle x, (\mathcal{C}^N)^{1/2}\xi^N\rangle_{\mathcal{C}^N} \overset{\mathcal{D}}{=} \sum_{j=1}^N \rho_j\xi_j,$$
where $\rho_1, \ldots, \rho_N$ are i.i.d. $\mathrm{N}(0,1)$ Gaussian random variables.
Consequently, $(\mathbb{E}^{\pi^N}[\langle x, (\mathcal{C}^N)^{1/2}\xi^N\rangle_{\mathcal{C}^N}^{2q}])^{1/(2q)} = O(N^{1/2})$, which proves that
$$(\mathbb{E}^{\pi^N}[B_3^{2q}])^{1/(2q)} = O(N^{-5/6+1/2}) = O(N^{-1/3}).$$
• The bound $\|\nabla\Psi^N(x)\|_{-s} \lesssim 1 + \|x\|_s$ ensures that $(\mathbb{E}^{\pi^N}[B_4^{2q}])^{1/(2q)} = O(N^{-2/3})$.
Define the quantity $e^N(x,\xi^N) = I_2 + I_3 + B_1 + B_2 + B_3 + B_4$, so that $Q^N$ can also be expressed as
$$Q^N(x,\xi^N) = Z^N(x,\xi^N) + i^N(x,\xi^N) + e^N(x,\xi^N).$$
Equations (4.8) and (4.11) show that $e^N$ satisfies $(\mathbb{E}^{\pi^N}[e^N(x,\xi^N)^{2q}])^{1/(2q)} = O(N^{-1/3})$. We now prove that $i^N$ is $O(N^{-1/6})$. By Lemma 4.3, $\mathbb{E}^{\pi^N}[i^N(x,\xi^N)^{2q}] \lesssim \mathbb{E}^{\pi_0}[i^N(x,\xi^N)^{2q}]$. If $x \sim \pi_0$, we have
$$i^N(x,\xi^N) = \frac{\ell^2}{2}N^{-2/3}\{\|x\|_{\mathcal{C}^N}^2 - \|(\mathcal{C}^N)^{1/2}\xi^N\|_{\mathcal{C}^N}^2\} \overset{\mathcal{D}}{=} \frac{\ell^2}{2}N^{-2/3}\sum_{j=1}^N(\rho_j^2 - \xi_j^2),$$
where $\rho_1, \ldots, \rho_N$ are i.i.d. $\mathrm{N}(0,1)$ Gaussian random variables. Since $\mathbb{E}[\{\sum_{j=1}^N(\rho_j^2 - \xi_j^2)\}^{2q}] \lesssim N^q$, it follows that
$$(\mathbb{E}^{\pi^N}[i^N(x,\xi^N)^{2q}])^{1/(2q)} = O(N^{-2/3+1/2}) = O(N^{-1/6}), \tag{4.12}$$
which ends the proof of Lemma 4.4. □

The next lemma quantifies the fact that $Z^N(x,\xi^N)$ is asymptotically independent of the current position $x$.

Lemma 4.5 (Asymptotic independence). Let $p \ge 1$ be a positive integer and $f : \mathbb{R} \to \mathbb{R}$ be a $1$-Lipschitz function. Consider error terms $e_\star^N(x,\xi)$ satisfying
$$\lim_{N\to\infty}\mathbb{E}^{\pi^N}[e_\star^N(x,\xi^N)^p] = 0.$$
Define the functions $\bar f^N : \mathcal{H}^s \to \mathbb{R}$ and the constant $\bar f \in \mathbb{R}$ by
$$\bar f^N(x) = \mathbb{E}_x[f(Z^N(x,\xi^N) + e_\star^N(x,\xi^N))] \quad\text{and}\quad \bar f = \mathbb{E}[f(Z_\ell)].$$
Then the function $\bar f^N$ is highly concentrated around its mean in the sense that
$$\lim_{N\to\infty}\mathbb{E}^{\pi^N}[|\bar f^N(x) - \bar f|^p] = 0.$$
Proof. Let $f$ be a $1$-Lipschitz function. Define the function $F : \mathbb{R}\times[0,\infty) \to \mathbb{R}$ by $F(\mu,\sigma) = \mathbb{E}[f(\rho_{\mu,\sigma})]$, where $\rho_{\mu,\sigma} \sim \mathrm{N}(\mu,\sigma^2)$.
The function $F$ satisfies
\[
|F(\mu_1,\sigma_1) - F(\mu_2,\sigma_2)| \lesssim |\mu_2 - \mu_1| + |\sigma_2 - \sigma_1| \tag{4.13}
\]
for any choice $\mu_1, \mu_2 \in \mathbb{R}$ and $\sigma_1, \sigma_2 \ge 0$. Indeed,
\[
|F(\mu_1,\sigma_1) - F(\mu_2,\sigma_2)| = |\mathbb{E}[f(\mu_1 + \sigma_1\rho_{0,1}) - f(\mu_2 + \sigma_2\rho_{0,1})]| \le \mathbb{E}[|\mu_2 - \mu_1| + |\sigma_2 - \sigma_1| \cdot |\rho_{0,1}|] \lesssim |\mu_2 - \mu_1| + |\sigma_2 - \sigma_1|.
\]
We have $\mathbb{E}_x[Z^N(x,\xi^N)] = \mathbb{E}[Z_\ell] = -\ell^3/4$, while the variances are given by
\[
\mathrm{Var}[Z^N(x,\xi^N)] = \frac{\ell^3}{2}\, \frac{\|x\|_{C^N}^2}{N} \qquad \text{and} \qquad \mathrm{Var}[Z_\ell] = \frac{\ell^3}{2}.
\]
Therefore, using Lemma 4.3,
\begin{align*}
\mathbb{E}^{\pi^N}[|\bar f^N(x) - \bar f|^p]
&= \mathbb{E}^{\pi^N}[|\mathbb{E}_x[f(Z^N(x,\xi^N) + e_\star^N(x,\xi^N)) - f(Z_\ell)]|^p] \\
&\lesssim \mathbb{E}^{\pi^N}[|\mathbb{E}_x[f(Z^N(x,\xi^N)) - f(Z_\ell)]|^p] + \mathbb{E}^{\pi^N}[|e_\star^N(x,\xi^N)|^p] \\
&= \mathbb{E}^{\pi^N}\Bigl[\Bigl| F\Bigl(-\frac{\ell^3}{4},\, \mathrm{Var}[Z^N(x,\xi^N)]^{1/2}\Bigr) - F\Bigl(-\frac{\ell^3}{4},\, \mathrm{Var}[Z_\ell]^{1/2}\Bigr) \Bigr|^p\Bigr] + \mathbb{E}^{\pi^N}[|e_\star^N(x,\xi^N)|^p] \\
&\lesssim \mathbb{E}^{\pi^N}[|\mathrm{Var}[Z^N(x,\xi^N)]^{1/2} - \mathrm{Var}[Z_\ell]^{1/2}|^p] + \mathbb{E}^{\pi^N}[|e_\star^N(x,\xi^N)|^p] \\
&\lesssim \mathbb{E}^{\pi_0}\Bigl[\Bigl| \Bigl(\frac{\|x\|_{C^N}^2}{N}\Bigr)^{1/2} - 1 \Bigr|^p\Bigr] + \mathbb{E}^{\pi^N}[|e_\star^N(x,\xi^N)|^p] \to 0.
\end{align*}
In the last step we have used the fact that if $x \stackrel{\mathcal{D}}{\sim} \pi_0$, then $\|x\|_{C^N}^2/N \stackrel{\mathcal{D}}{\sim} (\rho_1^2 + \cdots + \rho_N^2)/N$, where $\rho_1, \dots, \rho_N$ are i.i.d. $\mathrm{N}(0,1)$ Gaussian random variables, so that $\mathbb{E}^{\pi_0}|\{\|x\|_{C^N}^2/N\}^{1/2} - 1|^p \to 0$. □

Corollary 4.6. Let $p \ge 1$ be a positive integer. The local mean acceptance probability $\alpha^N(x)$, defined in equation (2.19), satisfies
\[
\lim_{N\to\infty} \mathbb{E}^{\pi^N}[|\alpha^N(x) - \alpha(\ell)|^p] = 0.
\]

Proof. The function $f(z) = 1 \wedge e^z$ is $1$-Lipschitz and $\alpha(\ell) = \mathbb{E}[f(Z_\ell)]$. Also, $\alpha^N(x) = \mathbb{E}_x[f(Q^N(x,\xi^N))] = \mathbb{E}_x[f(Z^N(x,\xi^N) + e_\star^N(x,\xi^N))]$ with $e_\star^N(x,\xi^N) = i^N(x,\xi^N) + e^N(x,\xi^N)$. Lemma 4.4 shows that $\lim_{N\to\infty} \mathbb{E}^{\pi^N}[e_\star^N(x,\xi)^p] = 0$, and therefore Lemma 4.5 gives the conclusion. □

4.3. Drift approximation.
This section proves that the approximate drift function $d^N: \mathcal{H}^s \to \mathcal{H}^s$ defined in equation (3.3) converges to the drift function $\mu: \mathcal{H}^s \to \mathcal{H}^s$ of the limiting diffusion (2.23).

Lemma 4.7 (Drift approximation). Let Assumption 2.1 hold. The drift function $d^N: \mathcal{H}^s \to \mathcal{H}^s$ converges to $\mu$ in the sense that
\[
\lim_{N\to\infty} \mathbb{E}^{\pi^N}[\|d^N(x) - \mu(x)\|_s^2] = 0.
\]

Proof. The approximate drift $d^N$ is given by equation (3.3). The definition of the local mean acceptance probability $\alpha^N(x)$, given by equation (2.19), shows that $d^N$ can also be expressed as
\[
d^N(x) = \bigl(\alpha^N(x)\,\alpha(\ell)^{-1}\bigr)\,\mu^N(x) + \sqrt{2}\,\ell\, h(\ell)^{-1} (\Delta t)^{-1/2}\, \varepsilon^N(x),
\]
where $\mu^N(x) = -(P^N x + C^N\nabla\Psi^N(x))$, and the term $\varepsilon^N(x)$ is defined by
\[
\varepsilon^N(x) = \mathbb{E}_x[\gamma^N(x,\xi^N)\, C^{1/2}\xi^N] = \mathbb{E}_x[(1 \wedge e^{Q^N(x,\xi^N)})\, C^{1/2}\xi^N].
\]
To prove Lemma 4.7, it suffices to verify that
\[
\lim_{N\to\infty} \mathbb{E}^{\pi^N}[\|(\alpha^N(x)\,\alpha(\ell)^{-1})\,\mu^N(x) - \mu(x)\|_s^2] = 0, \tag{4.14}
\]
\[
\lim_{N\to\infty} (\Delta t)^{-1}\, \mathbb{E}^{\pi^N}[\|\varepsilon^N(x)\|_s^2] = 0. \tag{4.15}
\]
• Let us first prove equation (4.14). The triangle inequality and the Cauchy–Schwarz inequality show that
\[
(\mathbb{E}^{\pi^N}[\|(\alpha^N(x)\,\alpha(\ell)^{-1})\,\mu^N(x) - \mu(x)\|_s^2])^2 \lesssim \mathbb{E}[|\alpha^N(x) - \alpha(\ell)|^4] \cdot \mathbb{E}^{\pi^N}[\|\mu^N(x)\|_s^4] + \mathbb{E}^{\pi^N}[\|\mu^N(x) - \mu(x)\|_s^4].
\]
By Remark 2.5, $\mu^N: \mathcal{H}^s \to \mathcal{H}^s$ is Lipschitz, with a Lipschitz constant that can be chosen independent of $N$. It follows that $\sup_N \mathbb{E}^{\pi^N}[\|\mu^N(x)\|_s^4] < \infty$. Lemma 4.5 and Corollary 4.6 show that $\mathbb{E}[|\alpha^N(x) - \alpha(\ell)|^4] \to 0$. Therefore,
\[
\lim_{N\to\infty} \mathbb{E}[|\alpha^N(x) - \alpha(\ell)|^4] \cdot \mathbb{E}^{\pi^N}[\|\mu^N(x)\|_s^4] = 0.
\]
The functions $\mu^N$ and $\mu$ are globally Lipschitz on $\mathcal{H}^s$, with a Lipschitz constant that can be chosen independently of $N$, so that $\|\mu^N(x) - \mu(x)\|_s^4 \lesssim 1 + \|x\|_s^4$.
Lemma 4.1 proves that the sequence of functions $\{\mu^N\}$ converges $\pi_0$-almost surely to $\mu$ in $\mathcal{H}^s$, and Lemma 4.3 shows that $\mathbb{E}^{\pi^N}[\|\mu^N(x) - \mu(x)\|_s^4] \lesssim \mathbb{E}^{\pi_0}[\|\mu^N(x) - \mu(x)\|_s^4]$. It thus follows from the dominated convergence theorem that
\[
\lim_{N\to\infty} \mathbb{E}^{\pi^N}[\|\mu^N(x) - \mu(x)\|_s^4] = 0.
\]
This concludes the proof of equation (4.14).
• Let us prove equation (4.15). If the Bernoulli random variable $\gamma^N(x,\xi^N)$ were independent of the noise term $(C^N)^{1/2}\xi^N$, it would follow that $\varepsilon^N(x) = 0$. In general $\gamma^N(x,\xi^N)$ is not independent of $(C^N)^{1/2}\xi^N$, so $\varepsilon^N(x)$ is not equal to zero. Nevertheless, as quantified by Lemma 4.5, the Bernoulli random variable $\gamma^N(x,\xi^N)$ is asymptotically independent of the current position $x$ and of the noise term $(C^N)^{1/2}\xi^N$. Consequently, we can prove in equation (4.17) that the quantity $\varepsilon^N(x)$ is small. To this end, we establish that each component $\langle \varepsilon^N(x), \hat\varphi_j \rangle_s^2$ satisfies
\[
\mathbb{E}^{\pi^N}[\langle \varepsilon^N(x), \hat\varphi_j \rangle_s^2] \lesssim N^{-1}\, \mathbb{E}^{\pi^N}[\langle x, \hat\varphi_j \rangle_s^2] + N^{-2/3} (j^s\lambda_j)^2. \tag{4.16}
\]
Summation of equation (4.16) over $j = 1, \dots, N$ leads to
\[
\mathbb{E}^{\pi^N}[\|\varepsilon^N(x)\|_s^2] \lesssim N^{-1}\, \mathbb{E}^{\pi^N}[\|x\|_s^2] + N^{-2/3}\, \mathrm{Tr}_{\mathcal{H}^s}(C_s) \lesssim N^{-2/3}, \tag{4.17}
\]
which gives the proof of equation (4.15). To prove equation (4.16) for a fixed index $j \in \mathbb{N}$, the quantity $Q^N(x,\xi)$ is decomposed as the sum of a term independent of $\xi_j$ and a remaining term of small magnitude. To this end we introduce
\[
Q^N(x,\xi^N) = Q_j^N(x,\xi^N) + Q_{j,\perp}^N(x,\xi^N), \qquad
Q_j^N(x,\xi^N) = -\frac{1}{\sqrt{2}}\,\ell^{3/2} N^{-1/2}\lambda_j^{-1} x_j\xi_j - \frac{1}{2}\,\ell^2 N^{-2/3}\lambda_j^2\xi_j^2 + e^N(x,\xi^N). \tag{4.18}
\]
The definitions of $Z^N(x,\xi^N)$ and $i^N(x,\xi^N)$ in equations (4.5) and (4.6) readily show that $Q_{j,\perp}^N(x,\xi^N)$ is independent of $\xi_j$.
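The mechanism behind this decomposition is elementary: since $Q_{j,\perp}^N$ is independent of $\xi_j$, the term $\mathbb{E}[f(Q_{j,\perp}^N)\,\xi_j]$ vanishes, so only the small remainder $Q_j^N$ contributes, and by Lipschitz continuity and Cauchy–Schwarz its contribution is controlled by $(\mathbb{E}[|Q_j^N|^2])^{1/2}$. A toy Monte Carlo check of this mechanism follows, with hypothetical stand-in distributions rather than the actual quantities of the proof.

```python
import math
import random

rng = random.Random(1)

def f(z):
    """1-Lipschitz acceptance function z -> min(1, e^z)."""
    return min(1.0, math.exp(z))

n = 200_000
c = 0.1   # size of the xi-correlated remainder (toy stand-in for Q_j)
corr_sum = 0.0
bound_sum = 0.0
for _ in range(n):
    xi = rng.gauss(0.0, 1.0)
    a = rng.gauss(0.0, 1.0)   # toy stand-in for Q_{j,perp}: independent of xi
    b = c * xi                # toy stand-in for Q_j: small, correlated with xi
    corr_sum += f(a + b) * xi      # E[f(a) * xi] alone would vanish
    bound_sum += abs(b) * abs(xi)  # dominates |f(a + b) - f(a)| * |xi|

estimate = corr_sum / n   # analogue of E[(1 ∧ e^Q) xi_j]
bound = bound_sum / n     # analogue of the Lipschitz/Cauchy-Schwarz bound
```

As expected, `estimate` is nonzero (the remainder correlates the acceptance indicator with the noise coordinate) but stays within `bound`, which shrinks linearly with the size `c` of the remainder.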
The noise term satisfies $C^{1/2}\xi^N = \sum_{j=1}^N (j^s\lambda_j)\,\xi_j\,\hat\varphi_j$. Since $Q_{j,\perp}^N(x,\xi^N)$ and $\xi_j$ are independent and $z \mapsto 1 \wedge e^z$ is $1$-Lipschitz, it follows that
\begin{align*}
\langle \varepsilon^N(x), \hat\varphi_j \rangle_s^2
&= (j^s\lambda_j)^2\, (\mathbb{E}_x[(1 \wedge e^{Q^N(x,\xi^N)})\,\xi_j])^2 \\
&= (j^s\lambda_j)^2\, (\mathbb{E}_x[[(1 \wedge e^{Q^N(x,\xi^N)}) - (1 \wedge e^{Q_{j,\perp}^N(x,\xi^N)})]\,\xi_j])^2 \\
&\lesssim (j^s\lambda_j)^2\, \mathbb{E}_x[|Q^N(x,\xi^N) - Q_{j,\perp}^N(x,\xi^N)|^2] = (j^s\lambda_j)^2\, \mathbb{E}_x[Q_j^N(x,\xi^N)^2].
\end{align*}
By Lemma 4.4, $\mathbb{E}^{\pi^N}[e^N(x,\xi^N)^2] \lesssim N^{-2/3}$. Therefore,
\begin{align*}
(j^s\lambda_j)^2\, \mathbb{E}^{\pi^N}[Q_j^N(x,\xi^N)^2]
&\lesssim (j^s\lambda_j)^2 \{ N^{-1}\lambda_j^{-2}\, \mathbb{E}^{\pi^N}[x_j^2\xi_j^2] + N^{-4/3}\, \mathbb{E}^{\pi^N}[\lambda_j^4\xi_j^4] + \mathbb{E}^{\pi^N}[e^N(x,\xi)^2] \} \\
&\lesssim N^{-1}\, \mathbb{E}^{\pi^N}[(j^s x_j)^2\xi_j^2] + (j^s\lambda_j)^2 (N^{-4/3} + N^{-2/3}) \\
&\lesssim N^{-1}\, \mathbb{E}^{\pi^N}[\langle x, \hat\varphi_j \rangle_s^2] + (j^s\lambda_j)^2 N^{-2/3},
\end{align*}
which finishes the proof of equation (4.16). □

4.4. Noise approximation. Recall definition (3.4) of the martingale difference $\Gamma^{k,N}$. In this section we estimate the error in the approximation $\Gamma^{k,N} \approx \mathrm{N}(0, C_s)$. To this end we introduce the covariance operator
\[
D^N(x) = \mathbb{E}_x[\Gamma^{k,N} \otimes_{\mathcal{H}^s} \Gamma^{k,N} \,|\, x^{k,N} = x].
\]
For any $x, u, v \in \mathcal{H}^s$, the operator $D^N(x)$ satisfies
\[
\mathbb{E}[\langle \Gamma^{k,N}, u \rangle_s \langle \Gamma^{k,N}, v \rangle_s \,|\, x^{k,N} = x] = \langle u, D^N(x)v \rangle_s.
\]
The next lemma gives a quantitative version of the approximation $D^N(x) \approx C_s$.

Lemma 4.8. Let Assumption 2.1 hold. For any pair of indices $i, j \ge 0$, the operator $D^N(x): \mathcal{H}^s \to \mathcal{H}^s$ satisfies
\[
\lim_{N\to\infty} \mathbb{E}^{\pi^N} |\langle \hat\varphi_i, D^N(x)\hat\varphi_j \rangle_s - \langle \hat\varphi_i, C_s\hat\varphi_j \rangle_s| = 0 \tag{4.19}
\]
and, furthermore,
\[
\lim_{N\to\infty} \mathbb{E}^{\pi^N} |\mathrm{Tr}_{\mathcal{H}^s}(D^N(x)) - \mathrm{Tr}_{\mathcal{H}^s}(C_s)| = 0. \tag{4.20}
\]

Proof.
The martingale difference $\Gamma^N(x,\xi)$ is given by
\[
\Gamma^N(x,\xi) = \alpha(\ell)^{-1/2}\,\gamma^N(x,\xi)\, C^{1/2}\xi + \frac{1}{\sqrt{2}}\,\alpha(\ell)^{-1/2} (\ell\Delta t)^{1/2} \{ \gamma^N(x,\xi)\,\mu^N(x) - \alpha(\ell)\, d^N(x) \}. \tag{4.21}
\]
We only prove equation (4.20); the proof of equation (4.19) is essentially identical, but easier. Remark 2.5 shows that the functions $\mu, \mu^N: \mathcal{H}^s \to \mathcal{H}^s$ are globally Lipschitz, and Lemma 4.7 shows that $\mathbb{E}^{\pi^N}[\|d^N(x) - \mu(x)\|_s^2] \to 0$. Therefore,
\[
\mathbb{E}^{\pi^N}[\| \gamma^N(x,\xi)\,\mu^N(x) - \alpha(\ell)\, d^N(x) \|_s^2] \lesssim 1, \tag{4.22}
\]
which implies that the second term on the right-hand side of equation (4.21) is $O(\sqrt{\Delta t})$. Since $\mathrm{Tr}_{\mathcal{H}^s}(D^N(x)) = \mathbb{E}_x[\|\Gamma^N(x,\xi)\|_s^2]$, equation (4.22) implies that
\[
\mathbb{E}^{\pi^N}[ |\alpha(\ell)\,\mathrm{Tr}_{\mathcal{H}^s}(D^N(x)) - \mathbb{E}_x[\|\gamma^N(x,\xi)\, C^{1/2}\xi\|_s^2]| ] \lesssim (\Delta t)^{1/2}.
\]
Consequently, to prove equation (4.20), it suffices to verify that
\[
\lim_{N\to\infty} \mathbb{E}^{\pi^N}[ |\mathbb{E}_x[\|\gamma^N(x,\xi)\, C^{1/2}\xi\|_s^2] - \alpha(\ell)\,\mathrm{Tr}_{\mathcal{H}^s}(C_s)| ] = 0. \tag{4.23}
\]
We have $\mathbb{E}_x[\|\gamma^N(x,\xi)\, C^{1/2}\xi\|_s^2] = \sum_{j=1}^N (j^s\lambda_j)^2\, \mathbb{E}_x[(1 \wedge e^{Q^N(x,\xi)})\,\xi_j^2]$. Therefore, to prove equation (4.23), it suffices to establish
\[
\lim_{N\to\infty} \sum_{j=1}^N (j^s\lambda_j)^2\, \mathbb{E}^{\pi^N}[ |\mathbb{E}_x[(1 \wedge e^{Q^N(x,\xi)})\,\xi_j^2] - \alpha(\ell)| ] = 0. \tag{4.24}
\]
Since $\sum_{j=1}^\infty (j^s\lambda_j)^2 < \infty$ and $|1 \wedge e^{Q^N(x,\xi)}| \le 1$, the dominated convergence theorem shows that (4.24) follows from
\[
\lim_{N\to\infty} \mathbb{E}^{\pi^N}[ |\mathbb{E}_x[(1 \wedge e^{Q^N(x,\xi)})\,\xi_j^2] - \alpha(\ell)| ] = 0 \qquad \forall j \ge 0. \tag{4.25}
\]
We now prove equation (4.25). As in the proof of Lemma 4.7, we use the decomposition $Q^N(x,\xi) = Q_j^N(x,\xi) + Q_{j,\perp}^N(x,\xi)$, where $Q_{j,\perp}^N(x,\xi)$ is independent of $\xi_j$.
Therefore, since $\mathrm{Lip}(f) = 1$,
\begin{align*}
\mathbb{E}_x[(1 \wedge e^{Q^N(x,\xi)})\,\xi_j^2]
&= \mathbb{E}_x[(1 \wedge e^{Q_{j,\perp}^N(x,\xi)})\,\xi_j^2] + \mathbb{E}_x[[(1 \wedge e^{Q^N(x,\xi)}) - (1 \wedge e^{Q_{j,\perp}^N(x,\xi)})]\,\xi_j^2] \\
&= \mathbb{E}_x[1 \wedge e^{Q_{j,\perp}^N(x,\xi)}] + O(\{\mathbb{E}_x[|Q^N(x,\xi) - Q_{j,\perp}^N(x,\xi)|^2]\}^{1/2}) \\
&= \mathbb{E}_x[1 \wedge e^{Q_{j,\perp}^N(x,\xi)}] + O(\{\mathbb{E}_x[Q_j^N(x,\xi)^2]\}^{1/2}).
\end{align*}
Lemma 4.5 ensures that, for $f(\cdot) = 1 \wedge \exp(\cdot)$,
\[
\lim_{N\to\infty} \mathbb{E}^{\pi^N}[ |\mathbb{E}_x[f(Q_{j,\perp}^N(x,\xi))] - \alpha(\ell)| ] = 0,
\]
and the definition of $Q_j^N(x,\xi)$ readily shows that $\lim_{N\to\infty} \mathbb{E}^{\pi^N}[Q_j^N(x,\xi)^2] = 0$. This concludes the proof of equation (4.25) and thus ends the proof of Lemma 4.8. □

Corollary 4.9. More generally, for any fixed vector $h \in \mathcal{H}^s$, the following limit holds:
\[
\lim_{N\to\infty} \mathbb{E}^{\pi^N} |\langle h, D^N(x)h \rangle_s - \langle h, C_s h \rangle_s| = 0. \tag{4.26}
\]

Proof. If $h = \hat\varphi_i$, this is precisely the content of Proposition 3.1. More generally, by linearity, Proposition 3.1 shows that this is true for $h = \sum_{i \le N} \alpha_i \hat\varphi_i$, where $N \in \mathbb{N}$ is a fixed integer. For a general vector $h \in \mathcal{H}^s$, we can use the decomposition $h = h_* + e_*$, where $h_* = \sum_{j \le N} \langle h, \hat\varphi_j \rangle_s\,\hat\varphi_j$ and $e_* = h - h_*$. It follows that
\begin{align*}
|(\langle h, D^N(x)h \rangle_s - \langle h, C_s h \rangle_s) - (\langle h_*, D^N(x)h_* \rangle_s - \langle h_*, C_s h_* \rangle_s)|
&\le |\langle h + h_*, D^N(x)(h - h_*) \rangle_s - \langle h + h_*, C_s (h - h_*) \rangle_s| \\
&\le 2\|h\|_s \cdot \|h - h_*\|_s \cdot (\mathrm{Tr}_{\mathcal{H}^s}(D^N(x)) + \mathrm{Tr}_{\mathcal{H}^s}(C_s)),
\end{align*}
where we have used the fact that for a nonnegative self-adjoint operator $D: \mathcal{H}^s \to \mathcal{H}^s$ we have $\langle u, Dv \rangle_s \le \|u\|_s \cdot \|v\|_s \cdot \mathrm{Tr}_{\mathcal{H}^s}(D)$. Proposition 3.1 shows that $\mathbb{E}^{\pi^N}[\mathrm{Tr}_{\mathcal{H}^s}(D^N(x))] < \infty$, and Assumption 2.1 ensures that $\mathrm{Tr}_{\mathcal{H}^s}(C_s) < \infty$. Consequently,
\[
\lim_{N\to\infty} \mathbb{E}^{\pi^N} |\langle h, D^N(x)h \rangle - \langle h, C_s h \rangle| \lesssim \lim_{N\to\infty} \mathbb{E}^{\pi^N} |\langle h_*, D^N(x)h_* \rangle - \langle h_*, C_s h_* \rangle| + \|h - h_*\|_s = \|h - h_*\|_s.
\]
Since $\|h - h_*\|_s$ can be chosen arbitrarily small, the conclusion follows. □
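The operator inequality invoked in the proof of Corollary 4.9, $\langle u, Dv \rangle_s \le \|u\|_s\|v\|_s\,\mathrm{Tr}(D)$ for a nonnegative self-adjoint $D$, holds because the operator norm of such a $D$ is dominated by its trace. A quick finite-dimensional sanity check on randomly generated matrices (an illustration only, not part of the argument):

```python
import random

rng = random.Random(7)
n = 4

# Random nonnegative self-adjoint matrix D = A^T A.
A = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
D = [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(n)]
     for i in range(n)]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

trace_D = sum(D[i][i] for i in range(n))

# Check <u, D v> <= ||u|| ||v|| Tr(D) on random test vectors:
# the gap below should never be positive.
worst_gap = float("-inf")
for _ in range(500):
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    gap = (abs(dot(u, matvec(D, v)))
           - dot(u, u) ** 0.5 * dot(v, v) ** 0.5 * trace_D)
    worst_gap = max(worst_gap, gap)
```

The inequality is loose by design (the trace overestimates the operator norm by up to a factor of the rank), which is all the corollary needs.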
4.5. Martingale invariance principle. This section proves that the process $W^N$ defined in equation (3.7) converges to a Brownian motion.

Proposition 4.10. Let Assumption 2.1 hold. Let $z_0 \sim \pi$, let $W^N(t)$ be the process defined in equation (3.7), and let $x^{0,N} \stackrel{\mathcal{D}}{\sim} \pi^N$ be the starting position of the Markov chain $x^N$. Then
\[
(x^{0,N}, W^N) \Longrightarrow (z_0, W), \tag{4.27}
\]
where $\Longrightarrow$ denotes weak convergence in $\mathcal{H}^s \times C([0,T]; \mathcal{H}^s)$, and $W$ is an $\mathcal{H}^s$-valued Brownian motion with covariance operator $C_s$. Furthermore, the limiting Brownian motion $W$ is independent of the initial condition $z_0$.

Proof. As a first step, we show that $W^N$ converges weakly to $W$. As described in [19], a consequence of Proposition 5.1 of [3] shows that in order to prove that $W^N$ converges weakly to $W$ in $C([0,T]; \mathcal{H}^s)$, it suffices to prove that for any $t \in [0,T]$ and any pair of indices $i, j \ge 0$ the following three limits hold in probability, the third for any $\varepsilon > 0$:
\[
\lim_{N\to\infty} \Delta t \sum_{k=1}^{k_N(T)} \mathbb{E}[\|\Gamma^{k,N}\|_s^2 \,|\, \mathcal{F}^{k,N}] = T\, \mathrm{Tr}_{\mathcal{H}^s}(C_s), \tag{4.28}
\]
\[
\lim_{N\to\infty} \Delta t \sum_{k=1}^{k_N(t)} \mathbb{E}[\langle \Gamma^{k,N}, \hat\varphi_i \rangle_s \langle \Gamma^{k,N}, \hat\varphi_j \rangle_s \,|\, \mathcal{F}^{k,N}] = t\, \langle \hat\varphi_i, C_s\hat\varphi_j \rangle_s, \tag{4.29}
\]
\[
\lim_{N\to\infty} \Delta t \sum_{k=1}^{k_N(T)} \mathbb{E}[\|\Gamma^{k,N}\|_s^2\, \mathbf{1}_{\{\|\Gamma^{k,N}\|_s^2 \ge \varepsilon/\Delta t\}} \,|\, \mathcal{F}^{k,N}] = 0, \tag{4.30}
\]
where $k_N(t) = \lfloor t/\Delta t \rfloor$, $\{\hat\varphi_j\}$ is an orthonormal basis of $\mathcal{H}^s$, and $\mathcal{F}^{k,N}$ is the natural filtration of the Markov chain $\{x^{k,N}\}$. The proof follows from the estimates on $D^N(x) = \mathbb{E}[\Gamma^{0,N} \otimes \Gamma^{0,N} \,|\, x^{0,N} = x]$ presented in Lemma 4.8. For the sake of simplicity, we write $\mathbb{E}_k[\cdot]$ instead of $\mathbb{E}[\cdot \,|\, \mathcal{F}^{k,N}]$. We now prove that the three conditions are satisfied.
• Condition (4.28).
It is enough to prove that
\[
\lim_{N\to\infty} \mathbb{E}\Bigl| \Bigl\{ \frac{1}{\lfloor N^{1/3} \rfloor} \sum_{k=1}^{\lfloor N^{1/3} \rfloor} \mathbb{E}_k[\|\Gamma^{k,N}\|_s^2] \Bigr\} - \mathrm{Tr}_{\mathcal{H}^s}(C_s) \Bigr| = 0,
\]
where
\[
\mathbb{E}_k[\|\Gamma^{k,N}\|_s^2] = \mathbb{E}_k \sum_{j=1}^N \langle \hat\varphi_j, D^N(x^{k,N})\hat\varphi_j \rangle_s = \mathbb{E}_k\, \mathrm{Tr}_{\mathcal{H}^s}(D^N(x^{k,N})).
\]
Because the Metropolis–Hastings algorithm preserves stationarity and $x^{0,N} \stackrel{\mathcal{D}}{\sim} \pi^N$, it follows that $x^{k,N} \stackrel{\mathcal{D}}{\sim} \pi^N$ for any $k \ge 0$. Therefore, for all $k \ge 0$, we have $\mathrm{Tr}_{\mathcal{H}^s}(D^N(x^{k,N})) \stackrel{\mathcal{D}}{\sim} \mathrm{Tr}_{\mathcal{H}^s}(D^N(x))$ where $x \stackrel{\mathcal{D}}{\sim} \pi^N$. Consequently, the triangle inequality shows that
\[
\mathbb{E}\Bigl| \Bigl\{ \frac{1}{\lfloor N^{1/3} \rfloor} \sum_{k=1}^{\lfloor N^{1/3} \rfloor} \mathbb{E}_k\|\Gamma^{k,N}\|^2 \Bigr\} - \mathrm{Tr}_{\mathcal{H}^s}(C_s) \Bigr| \le \mathbb{E}^{\pi^N} |\mathrm{Tr}_{\mathcal{H}^s}(D^N(x)) - \mathrm{Tr}_{\mathcal{H}^s}(C_s)| \to 0,
\]
where the last limit follows from Lemma 4.8.
• Condition (4.29). It is enough to prove that
\[
\lim_{N\to\infty} \mathbb{E}^{\pi^N}\Bigl| \Bigl\{ \frac{1}{\lfloor N^{1/3} \rfloor} \sum_{k=1}^{\lfloor N^{1/3} \rfloor} \mathbb{E}_k[\langle \Gamma^{k,N}, \hat\varphi_i \rangle_s \langle \Gamma^{k,N}, \hat\varphi_j \rangle_s] \Bigr\} - \langle \hat\varphi_i, C_s\hat\varphi_j \rangle_s \Bigr| = 0,
\]
where $\mathbb{E}_k[\langle \Gamma^{k,N}, \hat\varphi_i \rangle_s \langle \Gamma^{k,N}, \hat\varphi_j \rangle_s] = \langle \hat\varphi_i, D^N(x^{k,N})\hat\varphi_j \rangle_s$. Because $x^{k,N} \stackrel{\mathcal{D}}{\sim} \pi^N$, the conclusion again follows from Lemma 4.8.
• Condition (4.30). For all $k \ge 1$, we have $x^{k,N} \stackrel{\mathcal{D}}{\sim} \pi^N$, so that
\[
\mathbb{E}^{\pi^N}\Bigl| \frac{1}{\lfloor N^{1/3} \rfloor} \sum_{k=1}^{\lfloor N^{1/3} \rfloor} \mathbb{E}_k[\|\Gamma^{k,N}\|_s^2\, \mathbf{1}_{\{\|\Gamma^{k,N}\|_s^2 \ge N^{1/3}\varepsilon\}}] \Bigr| \le \mathbb{E}^{\pi^N}[ \|\Gamma^{0,N}\|_s^2\, \mathbf{1}_{\{\|\Gamma^{0,N}\|_s^2 \ge N^{1/3}\varepsilon\}} ].
\]
Equation (4.21) shows that for any power $p \ge 0$ we have $\sup_N \mathbb{E}^{\pi^N}[\|\Gamma^{0,N}\|_s^p] < \infty$. Therefore the sequence $\{\|\Gamma^{0,N}\|_s^2\}$ is uniformly integrable, which shows that
\[
\lim_{N\to\infty} \mathbb{E}^{\pi^N}[ \|\Gamma^{0,N}\|_s^2\, \mathbf{1}_{\{\|\Gamma^{0,N}\|_s^2 \ge N^{1/3}\varepsilon\}} ] = 0.
\]
The three hypotheses are satisfied, proving that $W^N$ converges weakly in $C([0,T]; \mathcal{H}^s)$ to a Brownian motion $W$ in $\mathcal{H}^s$ with covariance $C_s$. Therefore, Corollary 4.4 of [19] shows that the sequence $\{(x^{0,N}, W^N)\}_{N \ge 1}$ converges weakly to $(z_0, W)$ in $\mathcal{H}^s \times C([0,T]; \mathcal{H}^s)$.
This completes the proof of Proposition 4.10. □

5. Conclusion. We have studied the application of the MALA algorithm to sample from measures defined via their density with respect to a Gaussian measure on Hilbert space. We prove that a suitably interpolated and scaled version of the Markov chain has a diffusion limit in infinite dimensions. There are two main conclusions which follow from this theory: first, this work shows that, in stationarity, the MALA algorithm applied to an $N$-dimensional approximation of the target will take $O(N^{1/3})$ steps to explore the invariant measure; second, the MALA algorithm will be optimized at an average acceptance probability of $0.574$. We have thus significantly extended the work [23], which reaches similar conclusions in the case of i.i.d. product targets. In contrast, we have considered target measures with significant correlation, with structure motivated by a range of applications. As a consequence, our limit theorems are in an infinite dimensional Hilbert space, and we have employed an approach to the derivation of the diffusion limit which differs significantly from that used in [23]. This approach was developed in [19] to study diffusion limits for the RWM algorithm.

There are many possible developments of this work. We list several of these.
• In [4] it is shown that the hybrid Monte Carlo algorithm (HMC) requires, for target measures of the form (1.1), $O(N^{1/4})$ steps to explore the invariant measure. However, there is no diffusion limit in this case. Identifying an appropriate limit, and extending the analysis to the case of target measures (2.11), provides a challenging avenue for exploration.
• In the i.i.d. product case, it is known that if the Markov chain is started "far" from stationarity, a fluid limit (ODE) is observed [11].
It would be interesting to study such limits in the present context.
• Combining the analysis of MCMC methods for hierarchical target measures [2] with the analysis herein provides a challenging set of theoretical questions, as well as having direct applicability.
• It should also be noted that, for measures absolutely continuous with respect to a Gaussian, there exist new nonstandard versions of RWM [8], MALA [7] and HMC [5] for which the acceptance probability does not degenerate to zero as the dimension $N$ increases. These methods may be expensive to implement when the Karhunen–Loève basis is not known explicitly, and comparing their overall efficiency with that of standard RWM, MALA and HMC is an interesting area for further study.
• It is natural to ask whether analysis similar to that undertaken here could be developed for Metropolis–Hastings methods applied to other reference measures with a non-Gaussian product structure. In particular, the Besov priors of [18] provide an interesting class of such reference measures, and the paper [13] provides a machinery for analyzing the change of measure from the Besov prior, analogous to that used here in the Gaussian case. Another interesting class of reference measures are those used in the study of uncertainty quantification for elliptic PDEs: these have the form of an infinite product of compactly supported uniform distributions; see [25].

Acknowledgments. Part of this work was done when A. H. Thiéry was visiting the Department of Statistics at Harvard University, and we thank this institution for its hospitality. We also thank the referee for his/her very useful comments.

REFERENCES
[1] Bédard, M. (2007). Weak convergence of Metropolis algorithms for non-i.i.d. target distributions. Ann. Appl. Probab. 17 1222–1244. MR2344305
[2] Bédard, M. (2009). On the optimal scaling problem of Metropolis algorithms for hierarchical target distributions. Preprint.
[3] Berger, E. (1986). Asymptotic behaviour of a class of stochastic approximation procedures. Probab. Theory Relat. Fields 71 517–552. MR0833268
[4] Beskos, A., Pillai, N., Roberts, G. O., Sanz-Serna, J. M. and Stuart, A. M. (2012). Optimal tuning of the hybrid Monte-Carlo algorithm. Bernoulli. To appear.
[5] Beskos, A., Pinski, F. J., Sanz-Serna, J. M. and Stuart, A. M. (2011). Hybrid Monte Carlo on Hilbert spaces. Stochastic Process. Appl. 121 2201–2230. MR2822774
[6] Beskos, A., Roberts, G. and Stuart, A. (2009). Optimal scalings for local Metropolis–Hastings chains on nonproduct targets in high dimensions. Ann. Appl. Probab. 19 863–898. MR2537193
[7] Beskos, A., Roberts, G., Stuart, A. and Voss, J. (2008). MCMC methods for diffusion bridges. Stoch. Dyn. 8 319–350. MR2444507
[8] Beskos, A. and Stuart, A. (2009). MCMC methods for sampling function space. In ICIAM 07—6th International Congress on Industrial and Applied Mathematics 337–364. Eur. Math. Soc., Zürich. MR2588600
[9] Breyer, L. A., Piccioni, M. and Scarlatti, S. (2004). Optimal scaling of MaLa for nonlinear regression. Ann. Appl. Probab. 14 1479–1505. MR2071431
[10] Breyer, L. A. and Roberts, G. O. (2000). From Metropolis to diffusions: Gibbs states and optimal scaling. Stochastic Process. Appl. 90 181–206. MR1794535
[11] Christensen, O. F., Roberts, G. O. and Rosenthal, J. S. (2005). Scaling limits for the transient phase of local Metropolis–Hastings algorithms. J. R. Stat. Soc. Ser. B Stat. Methodol. 67 253–268. MR2137324
[12] Da Prato, G. and Zabczyk, J. (1992). Stochastic Equations in Infinite Dimensions. Encyclopedia of Mathematics and Its Applications 44. Cambridge Univ. Press, Cambridge. MR1207136
[13] Dashti, M., Harris, S. and Stuart, A. M.
(2012). Besov priors for Bayesian inverse problems. Inverse Probl. Imaging. To appear. Available at http://arxiv.org/abs/1105.0889.
[14] Ethier, S. N. and Kurtz, T. G. (1986). Markov Processes: Characterization and Convergence. Wiley, New York. MR0838085
[15] Hairer, M., Stuart, A. M. and Voss, J. (2011). Signal processing problems on function space: Bayesian formulation, stochastic PDEs and effective MCMC methods. In The Oxford Handbook of Nonlinear Filtering (D. Crisan and B. Rozovsky, eds.) 833–873. Oxford Univ. Press, Oxford. MR2884617
[16] Hairer, M., Stuart, A. M. and Voss, J. (2007). Analysis of SPDEs arising in path sampling. II. The nonlinear case. Ann. Appl. Probab. 17 1657–1706. MR2358638
[17] Hairer, M., Stuart, A. M., Voss, J. and Wiberg, P. (2005). Analysis of SPDEs arising in path sampling. I. The Gaussian case. Commun. Math. Sci. 3 587–603. MR2188686
[18] Lassas, M., Saksman, E. and Siltanen, S. (2009). Discretization-invariant Bayesian inversion and Besov space priors. Inverse Probl. Imaging 3 87–122. MR2558305
[19] Mattingly, J. C., Pillai, N. S. and Stuart, A. M. (2012). Diffusion limits of the random walk Metropolis algorithm in high dimensions. Ann. Appl. Probab. 22 881–930.
[20] Metropolis, N., Rosenbluth, A. W., Teller, M. N. and Teller, E. (1953). Equations of state calculations by fast computing machines. J. Chem. Phys. 21 1087–1092.
[21] Robert, C. P. and Casella, G. (2004). Monte Carlo Statistical Methods, 2nd ed. Springer, New York. MR2080278
[22] Roberts, G. O., Gelman, A. and Gilks, W. R. (1997). Weak convergence and optimal scaling of random walk Metropolis algorithms. Ann. Appl. Probab. 7 110–120. MR1428751
[23] Roberts, G. O. and Rosenthal, J. S. (1998). Optimal scaling of discrete approximations to Langevin diffusions. J. R. Stat. Soc. Ser. B Stat. Methodol. 60 255–268. MR1625691
[24] Roberts, G. O. and Rosenthal, J. S. (2001). Optimal scaling for various Metropolis–Hastings algorithms. Statist. Sci. 16 351–367. MR1888450
[25] Schwab, C. and Stuart, A. M. (2012). Sparse deterministic approximation of Bayesian inverse problems. Inverse Problems 28 045003.
[26] Sherlock, C., Fearnhead, P. and Roberts, G. O. (2010). The random walk Metropolis: Linking theory and practice through a case study. Statist. Sci. 25 172–190. MR2789988
[27] Stuart, A. M. (2010). Inverse problems: A Bayesian perspective. Acta Numer. 19 451–559. MR2652785

N. S. Pillai
Department of Statistics
Harvard University
Cambridge, Massachusetts 02138-2901
USA
E-mail: pillai@fas.harvard.edu

A. M. Stuart
Mathematics Institute
Warwick University
CV4 7AL, Coventry
United Kingdom
E-mail: a.m.stuart@warwick.ac.uk

A. H. Thiéry
Department of Statistics
Warwick University
CV4 7AL, Coventry
United Kingdom
E-mail: a.h.thiery@warwick.ac.uk
