Diffusion limits of the random walk Metropolis algorithm in high dimensions
Diffusion limits of MCMC methods in high dimensions provide a useful theoretical tool for studying computational complexity. In particular, they lead directly to precise estimates of the number of steps required to explore the target measure, in stat…
Authors: Jonathan C. Mattingly, Natesh S. Pillai, Andrew M. Stuart
The Annals of Applied Probability
2012, Vol. 22, No. 3, 881–930
DOI: 10.1214/10-AAP754
© Institute of Mathematical Statistics, 2012

DIFFUSION LIMITS OF THE RANDOM WALK METROPOLIS ALGORITHM IN HIGH DIMENSIONS

By Jonathan C. Mattingly¹, Natesh S. Pillai and Andrew M. Stuart²

Duke University, Harvard University and Warwick University

Diffusion limits of MCMC methods in high dimensions provide a useful theoretical tool for studying computational complexity. In particular, they lead directly to precise estimates of the number of steps required to explore the target measure, in stationarity, as a function of the dimension of the state space. However, to date such results have mainly been proved for target measures with a product structure, severely limiting their applicability. The purpose of this paper is to study diffusion limits for a class of naturally occurring high-dimensional measures found from the approximation of measures on a Hilbert space which are absolutely continuous with respect to a Gaussian reference measure. The diffusion limit of a random walk Metropolis algorithm to an infinite-dimensional Hilbert space valued SDE (or SPDE) is proved, facilitating understanding of the computational complexity of the algorithm.

1. Introduction. Metropolis–Hastings methods [18, 21] form a widely used class of MCMC methods [19, 22] for sampling from complex probability distributions. It is, therefore, of considerable interest to develop mathematical analyses which explain the structure inherent in these algorithms, especially structure which is pertinent to understanding the computational complexity of the algorithm.
Quantifying computational complexity of an MCMC method is most naturally undertaken by studying the behavior of the method on a family of probability distributions indexed by a parameter and studying the cost of the algorithm as a function of that parameter. In this paper we will study the cost as a function of dimension for algorithms applied to a family of probability distributions found from finite-dimensional approximation of a measure on an infinite-dimensional space.

Received March 2010; revised November 2010.
¹ Supported by NSF Grants DMS-04-49910 and DMS-08-54879.
² Supported by EPSRC and ERC.
AMS 2000 subject classifications. 60J22, 60H15, 65C05, 65C40, 60J20.
Key words and phrases. Markov chain Monte Carlo, scaling limits, optimal convergence time, stochastic PDEs.
This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Applied Probability, 2012, Vol. 22, No. 3, 881–930. This reprint differs from the original in pagination and typographic detail.

Our interest is focused on Metropolis–Hastings MCMC methods [22]. We study the simplest of these, the random walk Metropolis algorithm (RWM). Let $\pi$ be a target distribution on $\mathbb{R}^N$. To sample from $\pi$, the RWM algorithm creates a $\pi$-reversible Markov chain $\{x^n\}_{n=0}^{\infty}$ which moves from a current state $x^0$ to a new state $x^1$ via proposing a candidate $y$, using a symmetric Markov transition kernel such as a random walk, and accepting $y$ with probability $\alpha(x^0, y)$, where $\alpha(x, y) = 1 \wedge \frac{\pi(y)}{\pi(x)}$. Although the proposal is somewhat naive, within the class of all Metropolis–Hastings algorithms, the RWM is still used in many applications because of its simplicity.
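The accept/reject mechanism just described can be sketched in a few lines. The following is a minimal illustration only, assuming a Gaussian random walk increment with variance `delta` and a standard Gaussian target on $\mathbb{R}^{10}$; the names `rwm_step`, `log_pi` and `delta`, the target, and all numerical values are hypothetical choices made for this example and do not come from the paper.

```python
import math
import random


def rwm_step(x, log_pi, delta, rng):
    """One random walk Metropolis step for a target with log-density log_pi.

    Assumed proposal (illustrative): y = x + sqrt(delta) * rho, rho ~ N(0, I).
    Returns the new state and a flag saying whether the proposal was accepted.
    """
    y = [xi + math.sqrt(delta) * rng.gauss(0.0, 1.0) for xi in x]
    log_ratio = log_pi(y) - log_pi(x)
    # Accept with probability 1 ∧ pi(y)/pi(x); work with logs to avoid overflow.
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return y, True
    return x, False


def log_standard_normal(x):
    """Unnormalized log-density of N(0, I); a hypothetical test target."""
    return -0.5 * sum(xi * xi for xi in x)


rng = random.Random(0)
x = [0.0] * 10
n_steps = 2000
accepted = 0
for _ in range(n_steps):
    x, ok = rwm_step(x, log_standard_normal, delta=0.5, rng=rng)
    accepted += ok
acceptance_rate = accepted / n_steps
```

Only the ratio $\pi(y)/\pi(x)$ enters the step, so the normalizing constant of the target is never needed.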
The only computational cost involved in calculating the acceptance probabilities is the relative ratio of densities $\frac{\pi(y)}{\pi(x)}$, as compared to, say, the Langevin algorithm (MALA) where one needs to evaluate the gradient of $\log \pi$.

A pioneering paper in the analysis of complexity for MCMC methods in high dimensions is [23]. This paper studied the behavior of random walk Metropolis methods when applied to target distributions with density
$$\pi^N(x) = \prod_{i=1}^{N} f(x_i), \tag{1.1}$$
where $f(x)$ is a one-dimensional probability density function. The authors considered a proposal of the form
$$y = x + \sqrt{\delta}\,\rho, \qquad \rho \overset{\mathcal{D}}{\sim} \mathrm{N}(0, I_N),$$
and the objective was to study the complexity of the algorithm as a function of the dimension $N$ of the state space. It was shown that choosing the proposal variance $\delta$ to scale as $\delta = 2\ell^2\lambda^2 N^{-1}$ with³ $\lambda^{-2} = \int \bigl(\frac{f'}{f}\bigr)^2 f\,dx$ ($\ell > 0$ is a parameter which we will discuss later) leads to an average acceptance probability of order 1 with respect to dimension $N$. Furthermore, with this choice of scaling, individual components of the resulting Markov chain converge to the solution of a stochastic differential equation (SDE). To state this, we define a continuous interpolant
$$z^N(t) = (Nt - k)\,x^{k+1} + (k + 1 - Nt)\,x^{k}, \qquad k \le Nt < k + 1. \tag{1.2}$$
Then [23] shows that, when the Markov chain is started in stationarity, $z^N \Rightarrow z$ as $N \to \infty$ in $C([0, T]; \mathbb{R})$ where $z$ solves the SDE⁴
$$\frac{dz}{dt} = \lambda^2 h(\ell)\,[\log f(z)]' + \sqrt{2\lambda^2 h(\ell)}\,\frac{dW}{dt}, \tag{1.3}$$
$$h(\ell) = 2\ell^2\,\Phi\Bigl(-\frac{\ell}{\sqrt{2}}\Bigr). \tag{1.4}$$

³ If $f$ is the p.d.f. of a Gaussian on $\mathbb{R}$, then $\lambda$ is its standard deviation.
⁴ Our $h(\cdot)$ and $\ell$ are different from the $h_{\mathrm{old}}$ and $\ell_{\mathrm{old}}$ used in [23]. However, they can be recovered from the identities $\ell_{\mathrm{old}}^2 = 2\lambda^2\ell^2$, $h_{\mathrm{old}}(\ell_{\mathrm{old}}) = 2\lambda^2 h(\ell)$.

Here $\Phi$ denotes the CDF of a standard normal distribution, "$\Rightarrow$" denotes weak convergence and $C([0, T], \mathbb{R})$ denotes the Banach space of real-valued continuous functions defined on the interval $[0, T]$ endowed with the usual supremum norm. Note that the invariant measure of the SDE (1.3) has the density $f$ with respect to the Lebesgue measure. This weak convergence result leads to the interpretation that, started in stationarity and applied to target measures of the form (1.1), the RWM algorithm will take on the order of $N$ steps to explore the invariant measure. Furthermore, it may be shown that the value of $\ell$ which maximizes $h(\ell)$ and, therefore, maximizes the speed of convergence of the limiting diffusion, leads to a universal acceptance probability, for random walk Metropolis algorithms applied to targets (1.1), of approximately 0.234.

These ideas have been generalized to other proposals, such as the MALA algorithm in [24]. For Langevin proposals, the scaling of $\delta$ which achieves order 1 acceptance probabilities is $\delta \propto N^{-1/3}$ and the choice of the constant of proportionality which maximizes the speed of the limiting SDE results from an acceptance probability of approximately 0.574. Note, in particular, that this method will take on the order of $N^{1/3}$ steps to explore the invariant distribution. This quantifies the advantage of using information about the gradient of $\log \pi$ in the proposal; RWM algorithms, which do not use this information, take on the order of $N$ steps.

The work by Roberts and co-workers was among the first to develop a mathematical theory of Metropolis–Hastings methods in high dimension and does so in a fashion which leads to clear criteria which practitioners can use to optimize algorithmic performance, for instance, by tuning the acceptance probabilities to 0.234 (RWM) or 0.574 (MALA).
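The figure 0.234 quoted for the RWM can be checked numerically from (1.4). The sketch below maximizes $h(\ell) = 2\ell^2\Phi(-\ell/\sqrt{2})$ by a plain grid search and evaluates the limiting average acceptance probability $2\Phi(-\ell/\sqrt{2})$ at the maximizer (this expression is the constant $\beta$ of (2.13)). The grid range and resolution, and the function names, are assumptions made for illustration.

```python
import math


def Phi(u):
    """CDF of the standard normal distribution, via the complementary error function."""
    return 0.5 * math.erfc(-u / math.sqrt(2.0))


def h(ell):
    """Time-scale factor h(l) = 2 l^2 Phi(-l / sqrt(2)) from (1.4)."""
    return 2.0 * ell ** 2 * Phi(-ell / math.sqrt(2.0))


def acceptance(ell):
    """Limiting average acceptance probability 2 Phi(-l / sqrt(2))."""
    return 2.0 * Phi(-ell / math.sqrt(2.0))


# Illustrative grid search over l in (0, 5] with step 1e-4.
grid = [i / 10000.0 for i in range(1, 50001)]
ell_star = max(grid, key=h)
beta_star = acceptance(ell_star)
```

Here `ell_star` comes out near 1.68 and `beta_star` near 0.234, in agreement with the value stated in the text.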
Yet it is open to the criticism that, from a practitioner's perspective, target measures of the form (1.1) are too limited a class of probability distributions to be useful and, in any case, can be tackled by sampling a single one-dimensional target because of the product structure. There have been papers which generalize this work to target measures which retain the product structure inherent in (1.1), but are no longer i.i.d. (see [1, 25]),
$$\pi_0^N(x) = \prod_{i=1}^{N} \lambda_i^{-1} f(\lambda_i^{-1} x_i). \tag{1.5}$$
However, the same criticism may be applied to this scenario as well.

Despite the apparent simplicity of target measures of the form (1.1) and (1.5), the intuition obtained from the study of Metropolis–Hastings methods applied to these models with product structure is, in fact, extremely valuable. The two key results which need to be transferred to a more general nonproduct measure setting are (i) the scaling of the proposal variance with $N$ in order to ensure order one acceptance probabilities; (ii) the derivation of diffusion limits for the RWM algorithm with a time-scale factor which can be maximized over all acceptance probabilities. There is some work concerning scaling limits for MCMC methods applied to target measures which are not of product form; the paper [2] studies hierarchical target distributions; the paper [8] studies target measures which arise in nonlinear regression and have a mean field structure and the paper [9] studies target densities which are Gibbs measures. We add further to this literature on scaling limits for measures with nonproduct form by adopting the framework studied in [4–6].
There the authors consider a target distribution $\pi$ which lies in an infinite dimensional, real separable Hilbert space which is absolutely continuous with respect to a Gaussian measure $\pi_0$ with mean zero and covariance operator $\mathcal{C}$ (see Section 2.1 for details). The Radon–Nikodym derivative $\frac{d\pi}{d\pi_0}$ has the form
$$\frac{d\pi}{d\pi_0} = M_{\Psi} \exp(-\Psi(x)) \tag{1.6}$$
for a real valued $\pi_0$-measurable functional $\Psi$ on the Hilbert space and $M_{\Psi}$ a normalizing constant. In Section 3.1 we will specify and discuss the precise assumptions on $\Psi$ which we adopt in this paper. This infinite-dimensional framework for the target measures, besides being able to capture a huge number of useful models arising in practice [16, 27], also has an inherent mathematical structure which makes it amenable to the derivation of diffusion limits in infinite dimensions, while retaining links to the product structure that has been widely studied. We highlight two aspects of this mathematical structure.

First, the theory of Gaussian measures naturally generalizes from $\mathbb{R}^N$ to infinite-dimensional Hilbert spaces. Let $(\mathcal{H}, \langle\cdot,\cdot\rangle, \|\cdot\|)$ denote a real separable Hilbert space with full measure under $\mu_0$ ($\Psi$ will be densely defined on $\mathcal{H}$). The covariance operator $\mathcal{C} : \mathcal{H} \mapsto \mathcal{H}$ is a self-adjoint, positive and trace class operator on $\mathcal{H}$ with a complete orthonormal eigenbasis $\{\lambda_j^2, \phi_j\}$,
$$\mathcal{C}\phi_j = \lambda_j^2 \phi_j.$$
Henceforth, we assume that the eigenvalues are arranged in decreasing order and $\lambda_j > 0$. Any function $x \in \mathcal{H}$ can be represented in the orthonormal eigenbasis of $\mathcal{C}$ via the expansion
$$x = \sum_{j=1}^{\infty} x_j \phi_j, \qquad x_j \overset{\mathrm{def}}{=} \langle x, \phi_j \rangle. \tag{1.7}$$
Throughout this paper we will often identify the function $x$ with its coordinates $\{x_j\}_{j=1}^{\infty} \in \ell^2$ in this eigenbasis, moving freely between the two representations. Note, in particular, that $\mathcal{C}$ is diagonal with respect to the coordinates in this eigenbasis. By the Karhunen–Loève [13] expansion, a realization $x$ from the Gaussian measure $\pi_0$ can be expressed by allowing the $x_j$ to be independent random variables distributed as $x_j \sim \mathrm{N}(0, \lambda_j^2)$. Thus, in the coordinates $\{x_j\}$, the prior has the product structure (1.5). For the random walk algorithm studied in this paper we assume that the eigenpairs $\{\lambda_j, \phi_j\}$ are known so that sampling from $\pi_0$ is straightforward.

The measure $\pi$ is absolutely continuous with respect to $\pi_0$ and hence, any almost sure property under $\pi_0$ is also true under $\pi$. For example, it is a consequence of the law of large numbers that, almost surely with respect to $\pi_0$,
$$\frac{1}{N} \sum_{j=1}^{N} \frac{x_j^2}{\lambda_j^2} \to 1 \qquad \text{as } N \to \infty. \tag{1.8}$$
This also holds almost surely with respect to $\pi$, implying that a typical draw from the target measure $\pi$ must behave like a typical draw from $\pi_0$ in the large $j$ coordinates.⁵ This offers hope that ideas from the product case are applicable to measures $\pi$ given by (1.6) as well. However, the presence of $\Psi$ prevents use of the techniques from previous work on this problem; the fact that individual components of the Markov chain converge to a scalar SDE, as proved in [23], is a direct consequence of the product structure inherent in (1.1) or (1.5). For target measures of the form (1.6), this structure is not present and individual components of the Markov chain cannot be expected to converge to a scalar SDE. However, it is natural to expect convergence of the entire Markov chain to an infinite-dimensional continuous time stochastic process and the purpose of this paper is to carry out such a program.
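The law of large numbers statement (1.8) is easy to observe numerically for a draw from the Gaussian reference measure. The sketch below samples the first $N$ Karhunen–Loève coordinates $x_j = \lambda_j \rho_j$ and evaluates the normalized sum in (1.8). The eigenvalue decay $\lambda_j = j^{-1}$, the truncation level and the seed are assumptions chosen purely for illustration.

```python
import random

rng = random.Random(1)

N = 20000      # truncation level (illustrative)
kappa = 1.0    # hypothetical decay exponent: lambda_j = j^{-kappa}
lam = [j ** (-kappa) for j in range(1, N + 1)]

# Karhunen-Loeve coordinates of a draw from the reference measure:
# x_j = lambda_j * rho_j with rho_j ~ N(0, 1) i.i.d.
x = [l * rng.gauss(0.0, 1.0) for l in lam]

# Left-hand side of (1.8): (1/N) * sum_j x_j^2 / lambda_j^2.
ratio = sum((xj / lj) ** 2 for xj, lj in zip(x, lam)) / N
```

For large $N$ the value of `ratio` is close to 1; the fluctuation is of order $\sqrt{2/N}$ because each term $x_j^2/\lambda_j^2$ is a chi-squared variable with one degree of freedom.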
Thus, the second fact which makes the target measure (1.6) attractive from the point of view of establishing diffusion limits is the fact that, as proved in a series of recent papers [15, 17], it is invariant for Hilbert-space valued SDEs (or stochastic PDEs, SPDEs) with the form
$$\frac{dz}{dt} = -h(\ell)\bigl(z + \mathcal{C}\nabla\Psi(z)\bigr) + \sqrt{2h(\ell)}\,\frac{dW}{dt}, \qquad z(0) = z^0, \tag{1.9}$$
where $W$ is a Brownian motion (see [13]) in $\mathcal{H}$ with covariance operator $\mathcal{C}$. Thus, the above result from SPDE theory gives us a natural candidate for the infinite-dimensional limit of an MCMC method. We will prove such a limit for a RWM algorithm with proposal covariance $\frac{2\ell^2}{N}\mathcal{C}$. Moreover, we will show that the time constant $h(\ell)$ is maximized for an average acceptance probability of 0.234, as obtained in [23] in the product case.

⁵ For example, if $\mu_0$ is the Gaussian measure associated with Brownian motion on a finite interval, then (1.8) is an expression for the variance scale in the quadratic variation, and this is preserved under changes of measure such as the Girsanov formula.

These measures $\pi$ given by (1.6) have a number of features which will enable us to develop the ideas of diffusion limits for MCMC methods as originally introduced in the i.i.d. product case. Carrying out this program is worthwhile because measures of the form given by (1.6) arise naturally in a range of applications. In particular, they arise in the context of nonparametric regression in Bayesian statistics where the parameter space is an infinite-dimensional function space. The measure $\pi_0$ is the prior and $\Psi$ the negative log likelihood function. Such Bayesian inverse problems are overviewed in [27]. Another class of problems leading to measures of the form (1.6) are conditioned diffusions (see [16]).
To sample from $\pi$ numerically we need a finite-dimensional target measure. To this end, let $\Psi^N(\cdot) = \Psi(P^N \cdot)$ where $P^N$ denotes projection⁶ (in $\mathcal{H}$) onto the first $N$ eigenfunctions of $\mathcal{C}$. Then consider the target measure $\pi^N$ with the form
$$\frac{d\pi^N}{d\pi_0}(x) \propto \exp(-\Psi^N(x)). \tag{1.10}$$
This measure can be factored as the product of two independent measures: it coincides with $\pi_0$ on $\mathcal{H} \setminus P^N\mathcal{H}$ and has a density with respect to Lebesgue measure on $P^N\mathcal{H}$, in the coordinates $\{x_j\}_{j=1}^{N}$. In computational practice we implement a random walk method on $\mathbb{R}^N$ in the coordinate system $\{x_j\}_{j=1}^{N}$, enabling us to sample from $\pi^N$ in $P^N\mathcal{H}$. However, in order to facilitate a clean analysis, it is beneficial to write this finite-dimensional random walk method in $\mathcal{H}$, noting that the coordinates $\{x_j\}_{j=N+1}^{\infty}$ in the representation of functions sampled from $\pi^N$ do not then change.

We consider proposal distributions for the RWM which exploit the covariance structure of $\pi_0$ and can be expressed in $\mathcal{H}$ as
$$y = x + \sqrt{\frac{2\ell^2}{N}}\,\mathcal{C}^{1/2}\xi \qquad \text{where } \xi = \sum_{j=1}^{N} \xi_j \phi_j \text{ with } \xi_j \overset{\mathcal{D}}{\sim} \mathrm{N}(0, 1) \text{ i.i.d.} \tag{1.11}$$
Note that our proposal variance scales as $N^{-\gamma}$ with $\gamma = 1$. The choice of $\gamma$ in the proposal variance affects the scale of the proposal moves and identifying the optimal choice for $\gamma$ is a delicate exercise. The larger $\gamma$ is, the more "localized" the proposed move is and, therefore, for the algorithm to explore the state space rapidly, $\gamma$ needs to be as small as possible. However, if we take $\gamma$ arbitrarily small, then the acceptance probability decreases to zero very rapidly as a function of $N$.
In fact, it was shown in [4–6] that, for a variety of Metropolis–Hastings proposals, there is $\gamma_c > 0$ such that choice of $\gamma < \gamma_c$ leads to average acceptance probabilities which are smaller than any inverse power of $N$. Thus, in higher dimensions, smaller values of $\gamma$ lead to very poor mixing because of the negligible acceptance probability. However, it turns out that at the critical value $\gamma_c$, the acceptance probability is $O(1)$ as a function of $N$. In [4, 6], the value of $\gamma_c$ was identified to be 1 and 1/3 for the RWM and MALA, respectively. Finally, when using the scalings leading to $O(1)$ acceptance probabilities, it was also shown that the mean square distance moved is maximized by choosing the acceptance probabilities to be 0.234 or 0.574 as in the i.i.d. product case (1.1). Guided by this intuition, we have chosen $\gamma = \gamma_c = 1$ for our RWM proposal variance which, as we will prove below, leads to $O(1)$ acceptance probabilities.

⁶ Actually $\Psi$ is only densely defined on $\mathcal{H}$ but the projection $P^N$ can also be defined on this dense subset.

Summarizing the discussion so far, our goal is to obtain an invariance principle for the RWM Markov chain with proposal (1.11) when applied to target measures of the form (1.6). The diffusion limit will be obtained in stationarity and will be given by the SPDE (1.9). We show that the continuous time interpolant $z^N$ of the Markov chain $\{x^k\}$ defined by (1.2) converges to $z$ solving (1.9). This will show that, in stationarity and properly scaled to achieve $O(1)$ acceptance probabilities, the random walk Metropolis algorithm takes $O(N)$ steps to explore the target distribution.
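Written in the eigenbasis coordinates, the proposal (1.11) perturbs only the first $N$ coordinates, each by an independent Gaussian of standard deviation $\sqrt{2\ell^2/N}\,\lambda_j$. A minimal sketch follows, under stated assumptions: the function name `propose`, the storage of `M > N` coordinates, the decay $\lambda_j = 1/j$ and the numerical values are all hypothetical and serve only to illustrate the structure.

```python
import math
import random


def propose(x, lam, ell, N, rng):
    """Coordinate form of (1.11): y_j = x_j + sqrt(2 l^2 / N) * lambda_j * xi_j for j < N.

    Coordinates with index >= N are copied unchanged, because the noise xi
    is supported on the first N eigenfunctions only.
    """
    scale = math.sqrt(2.0 * ell ** 2 / N)
    y = list(x)
    for j in range(N):
        y[j] = x[j] + scale * lam[j] * rng.gauss(0.0, 1.0)
    return y


rng = random.Random(2)
M = 8                                    # number of stored coordinates (illustrative)
N = 5                                    # dimension of the approximation
lam = [1.0 / (j + 1) for j in range(M)]  # hypothetical eigenvalue decay
x = [lam[j] * rng.gauss(0.0, 1.0) for j in range(M)]
y = propose(x, lam, ell=1.0, N=N, rng=rng)
```

The proposal variance in coordinate $j$ is $2\ell^2\lambda_j^2/N$, which is the $N^{-\gamma}$ scaling with $\gamma = 1$ discussed above.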
From a practical point of view, the take home message of this work is that standard RWM algorithms applied to approximations of target measures with the form (1.6) can be tuned to behave optimally by adjusting the acceptance probability to be approximately 0.234 in the case where the proposal covariance is proportional to the covariance $\mathcal{C}$ in the reference measure. This will lead to $O(N)$ steps to explore the target measure in stationarity. This extends the work in [23] and shows that the ideas developed there apply to nontrivial high-dimensional targets arising in applications.

Although we only analyze the RWM proposal (1.11), we believe that our techniques can be applied to a larger class of Metropolis–Hastings methods, including the MALA algorithm, and/or RWM methods with isotropic proposal variance. In this latter case we expect to get a different (nonpreconditioned) $\pi$-invariant SPDE as the limit when the dimension goes to infinity (see [15, 17] for analysis of these SPDEs) and a different (more severe) restriction on the scaling of the proposal variance with $N$; however, we conjecture that the optimal acceptance probability would not be changed. The proposal that we study in this paper relies on knowledge of the eigenstructure of the covariance operator of the prior or reference measure $\pi_0$. In some applications, this may be a reasonable assumption, for example, for conditioned diffusions or for PDE inverse problems in simple geometries. For others it may not, and then the isotropic proposal covariance is more natural.

We analyze the RWM algorithm started at stationarity, and thus do not attempt to answer the question of "burn-in time": the number of steps required to reach stationarity and how the proposal scaling affects the rate of convergence. These are important questions which we hope to answer in a future paper. Furthermore, practitioners wishing to sample from probability measures on function space with the form (1.6) should be aware that for some examples, new generalizations of random walk Metropolis algorithms, defined on function space, can be more efficient than the standard random walk methods analyzed in this paper [5, 6]; whether or not they are more efficient depends on a trade-off between number of steps to explore the measure (which is lower for the new generalized methods) and cost per step (which can be higher, but may not be).

There exist several methods in the literature to prove invariance principles. For instance, because of the reversibility of the RWM Markov chain, utilizing the abstract but powerful theory of Dirichlet forms [20] is appealing. Another alternative is to show the convergence of generators of the associated Markov processes [14] as used in [23]. However, we chose a more "hands on" approach using simple probabilistic tools, thus gaining more intuition about the RWM algorithm in higher dimensions. We show that with the correct choice of scaling, the one step transition for the RWM Markov chain behaves nearly like an Euler scheme applied to (1.9). Since the noise enters (1.9) additively, the induced Itô map which takes Wiener trajectories into solutions is continuous in the supremum-in-time topology. This fact, which would not be true if (1.9) had multiplicative noise, allows us to employ an argument simpler than the more general techniques often used (see [14]). We first show that the martingale increments converge weakly to a Hilbert space-valued Wiener process using a martingale central limit theorem [3]. Since weak convergence is preserved under a continuous map, the fact that the Itô map is continuous implies the RWM Markov chain converges to the SPDE (1.9). Finally, we emphasize that diffusion limits for the RWM proposal are necessarily of weak convergence type. However, strong convergence results are available for the MALA algorithm, in fixed finite dimension (see [7]).

1.1. Organization of the paper. We start by setting up the notation that is used for the remainder of the paper in Section 2. We then investigate the mathematical structure of the RWM algorithm when applied to target measures of the form (1.10). Before presenting details, a heuristic but detailed outline of the proof strategy is given for communicating the main ideas. In Section 3 we state our assumptions and give the proof of the main theorem at a high level, postponing proofs of some technical estimates. In Section 4 we prove the invariance principle for the noise process. Section 5 contains the proof of the drift and diffusion estimates. All universal constants, unless otherwise stated, are denoted by the letter $M$ whose precise value might vary from one line to the next.

2. Diffusion limits of the RWM algorithm. In this section we state the main theorem, set it in context and explain the proof technique. We first introduce an approximation of the measure $\pi$, namely $\pi^N$, which is finite dimensional. We then state the main theorem concerning a diffusion limit of the algorithm and sketch the ideas of the proof so that technical details in later sections can be readily digested.

2.1. Preliminaries. Recall that $\mathcal{H}$ is a separable Hilbert space of real-valued functions with inner-product and norm $\langle\cdot,\cdot\rangle$ and $\|\cdot\|$. Let $\mathcal{C}$ be a positive, trace class operator on $\mathcal{H}$.
Let $\{\phi_j, \lambda_j^2\}$ be the eigenfunctions and eigenvalues of $\mathcal{C}$, respectively, so that
$$\mathcal{C}\phi_j = \lambda_j^2 \phi_j, \qquad j \in \mathbb{N}.$$
We assume a normalization under which $\{\phi_j\}$ forms a complete orthonormal basis in $\mathcal{H}$. We also assume that the eigenvalues are arranged in decreasing order. For every $x \in \mathcal{H}$ we have the representation (1.7). Using this expansion, we define the Sobolev spaces $\mathcal{H}^r$, $r \in \mathbb{R}$, with the inner-products and norms defined by
$$\langle x, y \rangle_r \overset{\mathrm{def}}{=} \sum_{j=1}^{\infty} j^{2r} x_j y_j, \qquad \|x\|_r^2 \overset{\mathrm{def}}{=} \sum_{j=1}^{\infty} j^{2r} x_j^2. \tag{2.1}$$
Notice that $\mathcal{H}^0 = \mathcal{H}$. Furthermore, $\mathcal{H}^r \subset \mathcal{H} \subset \mathcal{H}^{-r}$ for any $r > 0$. For $r \in \mathbb{R}$, let $B_r : \mathcal{H} \mapsto \mathcal{H}$ denote the operator which is diagonal in the basis $\{\phi_j\}$ with diagonal entries $j^{2r}$, that is,
$$B_r \phi_j = j^{2r} \phi_j$$
so that $B_r^{1/2} \phi_j = j^r \phi_j$. The operator $B_r$ lets us alternate between the Hilbert space $\mathcal{H}$ and the Sobolev spaces $\mathcal{H}^r$ via the identities
$$\langle x, y \rangle_r = \langle B_r^{1/2} x, B_r^{1/2} y \rangle, \qquad \|x\|_r^2 = \|B_r^{1/2} x\|^2. \tag{2.2}$$
Let $\otimes$ denote the outer product operator in $\mathcal{H}$ defined by
$$(x \otimes y) z \overset{\mathrm{def}}{=} \langle y, z \rangle x \qquad \forall x, y, z \in \mathcal{H}. \tag{2.3}$$
For an operator $L : \mathcal{H}^r \mapsto \mathcal{H}^l$, we denote the operator norm on $\mathcal{H}$ by $\|\cdot\|_{\mathcal{L}(\mathcal{H}^r, \mathcal{H}^l)}$ defined by
$$\|L\|_{\mathcal{L}(\mathcal{H}^r, \mathcal{H}^l)} \overset{\mathrm{def}}{=} \sup_{\|x\|_r = 1} \|Lx\|_l.$$
For self-adjoint $L$ and $r = l = 0$ this is, of course, the spectral radius of $L$. For a positive, self-adjoint operator $D : \mathcal{H} \mapsto \mathcal{H}$, define its trace as
$$\operatorname{trace}(D) \overset{\mathrm{def}}{=} \sum_{j=1}^{\infty} \langle \phi_j, D\phi_j \rangle.$$
Since $\operatorname{trace}(D)$ does not depend on the orthonormal basis, an operator $D$ is said to be trace class if $\operatorname{trace}(D) < \infty$ for some, and hence any, orthonormal basis $\{\phi_j\}$.

Let $\pi_0$ denote a mean zero Gaussian measure on $\mathcal{H}$ with covariance operator $\mathcal{C}$, that is, $\pi_0 \overset{\mathrm{def}}{=} \mathrm{N}(0, \mathcal{C})$. If $x \overset{\mathcal{D}}{\sim} \pi_0$, then the $x_j$ in (1.7) are independent $\mathrm{N}(0, \lambda_j^2)$ Gaussians and we may write (Karhunen–Loève)
$$x = \sum_{j=1}^{\infty} \lambda_j \rho_j \phi_j \qquad \text{with } \rho_j \overset{\mathcal{D}}{\sim} \mathrm{N}(0, 1) \text{ i.i.d.} \tag{2.4}$$
Since $\|B_r^{-1/2}\phi_k\|_r = \|\phi_k\| = 1$, we deduce that $\{B_r^{-1/2}\phi_k\}$ form an orthonormal basis for $\mathcal{H}^r$ and, therefore, we may write (2.4) as
$$x = \sum_{j=1}^{\infty} \lambda_j j^r \rho_j B_r^{-1/2}\phi_j \qquad \text{with } \rho_j \overset{\mathcal{D}}{\sim} \mathrm{N}(0, 1) \text{ i.i.d.} \tag{2.5}$$
If $\Omega$ denotes the probability space for sequences $\{\rho_j\}_{j \ge 1}$, then the sum converges in $L^2(\Omega; \mathcal{H}^r)$ as long as $\sum_{j=1}^{\infty} \lambda_j^2 j^{2r} < \infty$. Thus, under this condition, the distribution induced by $\pi_0$ may be viewed as that of a centered Gaussian measure on $\mathcal{H}^r$ with covariance operator $\mathcal{C}_r$ given by
$$\mathcal{C}_r = B_r^{1/2}\,\mathcal{C}\,B_r^{1/2}. \tag{2.6}$$
The assumption on summability is the usual trace-class condition for Gaussian measures on a Hilbert space: $\operatorname{trace}(\mathcal{C}_r) < \infty$. In what follows, we freely alternate between the Gaussian measures $\mathrm{N}(0, \mathcal{C})$ on $\mathcal{H}$ and $\mathrm{N}(0, \mathcal{C}_r)$ on $\mathcal{H}^r$, for values of $r$ for which the trace-class property of $\mathcal{C}_r$ holds.

Our goal is to sample from a measure $\pi$ on $\mathcal{H}$ given by (1.6),
$$\frac{d\pi}{d\pi_0} = M_{\Psi} \exp(-\Psi(x))$$
with $\pi_0$ as constructed above. Frequently in applications, the functional $\Psi$ may not be defined on all of $\mathcal{H}$, but only on a subset $\mathcal{H}^r \subset \mathcal{H}$ for some exponent $r > 0$. For instance, if $\mathcal{H} = L^2([0, 1])$, the functional $\Psi$ might only act on continuous functions, in which case it is natural to define $\Psi$ on some Sobolev space $\mathcal{H}^r[0, 1]$ for $r > \frac{1}{2}$. Even though the Gaussian measure $\pi_0$ is defined on $\mathcal{H}$, depending on the decay of the eigenvalues of $\mathcal{C}$, there exists an entire range of values $r$ such that $\operatorname{trace}(\mathcal{C}_r) < \infty$ so that the measure $\pi_0$ has full support on $\mathcal{H}^r$, that is, $\pi_0(\mathcal{H}^r) = 1$. From now onward we fix a distinguished exponent $s \ge 0$ and assume that $\Psi : \mathcal{H}^s \mapsto \mathbb{R}$ and that the prior is chosen so that $\operatorname{trace}(\mathcal{C}_s) < \infty$. Then $\pi_0 \sim \mathrm{N}(0, \mathcal{C})$ on $\mathcal{H}$ and $\pi(\mathcal{H}^s) = 1$; in addition, we may view $\pi_0$ as a Gaussian measure $\mathrm{N}(0, \mathcal{C}_s)$ on $\mathcal{H}^s$.
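The Sobolev norm (2.1) and the trace-class condition $\sum_j \lambda_j^2 j^{2r} < \infty$ are both simple coordinate sums, which the following sketch evaluates on truncated sequences. The power-law decay $\lambda_j = j^{-\kappa}$ with $\kappa = 1$ is a hypothetical choice made here for illustration; under it the series is $\sum_j j^{2r - 2\kappa}$, which converges exactly when $\kappa - r > 1/2$. The function names are likewise illustrative.

```python
def sobolev_norm_sq(x, r):
    """Truncated version of (2.1): sum_j j^{2r} x_j^2, with j starting at 1."""
    return sum((j ** (2 * r)) * xj * xj for j, xj in enumerate(x, start=1))


def trace_C_r(lam, r):
    """Partial sum of trace(C_r) = sum_j lambda_j^2 j^{2r}."""
    return sum(l * l * j ** (2 * r) for j, l in enumerate(lam, start=1))


kappa = 1.0                                        # hypothetical decay exponent
lam = [j ** (-kappa) for j in range(1, 10001)]

trace_ok = trace_C_r(lam, 0.25)   # sum of j^{-1.5}: partial sums stay bounded
trace_bad = trace_C_r(lam, 0.5)   # sum of j^{-1}: harmonic series, grows like log(n)
```

With these choices `trace_ok` has essentially converged after 10,000 terms, whereas `trace_bad` keeps growing as more terms are added, so $r = 1/4$ is admissible and $r = 1/2$ is not.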
The precise connection between the exponent $s$ and the eigenvalues of $\mathcal{C}$ is given in Section 3.1.

In order to sample from $\pi$ we first approximate it by a finite-dimensional measure. Recall that
$$\hat{\phi}_k \overset{\mathrm{def}}{=} B_s^{-1/2}\phi_k \tag{2.7}$$
form an orthonormal basis for $\mathcal{H}^s$. For $N \in \mathbb{N}$, let $P^N : \mathcal{H}^s \mapsto X^N \subset \mathcal{H}^s$ be the projection operator in $\mathcal{H}^s$ onto $X^N \overset{\mathrm{def}}{=} \operatorname{span}\{\hat{\phi}_1, \hat{\phi}_2, \ldots, \hat{\phi}_N\}$, that is,
$$P^N x \overset{\mathrm{def}}{=} \sum_{j=1}^{N} x_j \hat{\phi}_j \qquad \text{where } x_j = \langle x, \hat{\phi}_j \rangle_s,\ x \in \mathcal{H}^s.$$
This shows that $X^N$ is isomorphic to $\mathbb{R}^N$. Next, we approximate $\Psi$ by $\Psi^N : X^N \mapsto \mathbb{R}$ and attempt to sample from the following approximation to $\pi$, namely,
$$\frac{d\pi^N}{d\pi_0}(x) \overset{\mathrm{def}}{=} M_{\Psi^N} \exp(-\Psi^N(x)) \qquad \text{where } \Psi^N(x) \overset{\mathrm{def}}{=} \Psi(P^N x).$$
Note that $\nabla\Psi^N(x) = P^N \nabla\Psi(P^N x)$ and $\partial^2\Psi^N(x) = P^N \partial^2\Psi(P^N x) P^N$. The constant $M_{\Psi^N}$ is chosen so that $\pi^N(\mathcal{H}^s) = 1$. It may be shown that, for large $N$, the measure $\pi^N$ is close to the measure $\pi$ in the Hellinger metric (see [12]). Set
$$\mathcal{C}^N \overset{\mathrm{def}}{=} P^N \mathcal{C} P^N, \qquad \mathcal{C}_r^N \overset{\mathrm{def}}{=} B_r^{1/2}\,\mathcal{C}^N B_r^{1/2}. \tag{2.8}$$
Notice that on $X^N$, $\pi^N$ has Lebesgue density⁷
$$\begin{aligned}
\pi^N(x) &= M_{\Psi^N} \exp\Bigl(-\Psi^N(x) - \tfrac{1}{2}\langle P^N x, \mathcal{C}^{-1}(P^N x) \rangle\Bigr), \qquad x \in X^N \\
&= M_{\Psi^N} \exp\Bigl(-\Psi^N(x) - \tfrac{1}{2}\langle x, (\mathcal{C}^N)^{-1} x \rangle\Bigr)
\end{aligned} \tag{2.9}$$
since $\mathcal{C}^N$ is invertible on $X^N$ because the eigenvalues are assumed to be strictly positive. On $\mathcal{H}^s \setminus X^N$ we have that $\pi^N = \pi_0$. Later we will impose natural assumptions on $\Psi$ (and hence, on $\Psi^N$) which are motivated by applications.

2.2. The algorithm. Our goal is now to sample from (2.9) with $x \in X^N$. As explained in the Introduction, we use a RWM proposal with covariance operator $\frac{2\ell^2}{N}\mathcal{C}$ on $\mathcal{H}$ given by (1.11). The noise $\xi$ is finite dimensional and is independent of $x$.
Hence, even though the Markov chain evolves in $\mathcal{H}^s$, $x$ and $y$ in (1.11) differ only in the first $N$ coordinates when written in the eigenbasis of $\mathcal{C}$; as a consequence, the Markov chain does not move at all in $\mathcal{H}^s \setminus P^N\mathcal{H}^s$ and can be implemented in $\mathbb{R}^N$. However the analysis is cleaner when written in $\mathcal{H}^s$. The acceptance probability also only depends on the first $N$ coordinates of $x$ and $y$ and has the form
$$\alpha(x, \xi) = 1 \wedge \exp(Q(x, \xi)), \tag{2.10}$$
where
$$Q(x, \xi) \overset{\mathrm{def}}{=} \tfrac{1}{2}\|\mathcal{C}^{-1/2}(P^N x)\|^2 - \tfrac{1}{2}\|\mathcal{C}^{-1/2}(P^N y)\|^2 + \Psi^N(x) - \Psi^N(y). \tag{2.11}$$
The Markov chain for $\{x^k\}$, $k \ge 0$ is then given by
$$x^{k+1} = \gamma^{k+1} y^{k+1} + (1 - \gamma^{k+1}) x^k \qquad \text{and} \qquad y^{k+1} = x^k + \sqrt{\frac{2\ell^2}{N}}\,\mathcal{C}^{1/2}\xi^{k+1} \tag{2.12}$$
with
$$\gamma^{k+1} \overset{\mathrm{def}}{=} \gamma(x^k, \xi^{k+1}) \overset{\mathcal{D}}{\sim} \mathrm{Bernoulli}(\alpha(x^k, \xi^{k+1})) \qquad \text{and} \qquad \xi^{k+1} = \sum_{i=1}^{N} \xi_i^{k+1}\phi_i \quad \text{where } \xi_i^{k+1} \overset{\mathcal{D}}{\sim} \mathrm{N}(0, 1) \text{ i.i.d.}$$
with some initial condition $x^0$. The random variables $\xi^k$ and $x^0$ are independent of one another. Furthermore, conditional on $\alpha(x^{k-1}, \xi^k)$, the Bernoulli random variables $\gamma^k$ are chosen independently of all other sources of randomness. This can be seen in the usual way by introducing an i.i.d. sequence of uniform random variables $\mathrm{Unif}[0, 1]$ and using these for each $k$ to construct the Bernoulli random variable.

⁷ For ease of notation we do not distinguish between a measure and its density.

In summary, the Markov chain that we have described in $\mathcal{H}^s$ is, when projected into coordinates $\{x_j\}_{j=1}^{N}$, equivalent to a standard random walk Metropolis method for the Lebesgue density (2.9) with proposal variance given by $\mathcal{C}^N$ on $\mathcal{H}$. Recall that the target measure $\pi$ in (1.6) is the invariant measure of the SPDE (1.9).
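The chain (2.10)–(2.12), restricted to the first $N$ coordinates, can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in the paper: the functional `psi_N`, the eigenvalue decay $\lambda_j = 1/j$, the value of `ell`, the number of steps and the initialization from the Gaussian reference measure (which is not the exact stationary start assumed in the analysis) are all hypothetical choices.

```python
import math
import random


def psi_N(x):
    """Hypothetical functional Psi^N: a simple quadratic, chosen only for illustration."""
    return 0.5 * sum(xj * xj for xj in x)


def Q(x, y, lam):
    """Log acceptance ratio (2.11) written in the first N eigenbasis coordinates."""
    prior = 0.5 * sum((xj / l) ** 2 - (yj / l) ** 2 for xj, yj, l in zip(x, y, lam))
    return prior + psi_N(x) - psi_N(y)


def rwm_chain(x0, lam, ell, n_steps, rng):
    """Run the chain (2.12); return the final state and the empirical acceptance rate."""
    N = len(x0)
    scale = math.sqrt(2.0 * ell ** 2 / N)
    x = list(x0)
    accepted = 0
    for _ in range(n_steps):
        # Proposal (1.11) in coordinates: y_j = x_j + scale * lambda_j * xi_j.
        y = [xj + scale * l * rng.gauss(0.0, 1.0) for xj, l in zip(x, lam)]
        q = Q(x, y, lam)
        # gamma ~ Bernoulli(1 ∧ exp(Q)), built from a Unif[0, 1] draw.
        if q >= 0.0 or rng.random() < math.exp(q):
            x = y
            accepted += 1
    return x, accepted / n_steps


rng = random.Random(3)
N = 50
lam = [1.0 / j for j in range(1, N + 1)]      # hypothetical eigenvalue decay
x0 = [l * rng.gauss(0.0, 1.0) for l in lam]   # draw from the reference measure
x_final, rate = rwm_chain(x0, lam, ell=1.68, n_steps=2000, rng=rng)
```

Varying `ell` and monitoring `rate` is the practical tuning step that the paper's results are intended to inform.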
Our goal is to obtain an invariance principle for the continuous interpolant (1.2) of the Markov chain $\{x^k\}$ started in stationarity: to show weak convergence of $z^N(t)$ to the solution $z(t)$ of the SPDE (1.9), as the dimension $N\to\infty$.

In the rest of the section, we will give a heuristic outline of our main argument. The emphasis will be on the proof strategy and main ideas, so we will not yet prove the error bounds and use the symbol "$\approx$" to indicate this. Once the main skeleton is outlined, we retrace our arguments and make them rigorous in Sections 3, 4 and 5.

2.3. Main theorem and implications. As mentioned earlier, for fixed $N$, the Markov chain evolves in $X^N\subset\mathcal H^s$ and we prove the invariance principle for the Markov chain in the Hilbert space $\mathcal H^s$ as $N$ goes to infinity. Define the constant $\beta$,
\[ \beta \overset{\mathrm{def}}{=} 2\Phi(-\ell/\sqrt2), \tag{2.13} \]
where $\Phi$ denotes the CDF of the standard normal distribution. Note that with this definition of $\beta$, the time scale $h(\ell)$ appearing in (1.9), and defined in (1.4), is given by $h(\ell)=\ell^2\beta$. The following is the main result of this article (it is stated precisely, with conditions, as Theorem 3.6):

Main theorem. Let the initial condition $x^0$ of the RWM algorithm be such that $x^0\overset{\mathcal D}{\sim}\pi^N$ and let $z^N(t)$ be a piecewise linear, continuous interpolant of the RWM algorithm (2.12) as defined in (1.2). Then $z^N(t)$ converges weakly in $C([0,T],\mathcal H^s)$ to the diffusion process $z(t)$ given by (1.9) with $z(0)\overset{\mathcal D}{\sim}\pi$.
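The constant $\beta$ in (2.13) and the time scale $h(\ell)=\ell^2\beta$ are easy to evaluate numerically. The sketch below computes both and locates the maximizer of $h$ by a golden-section search; the search interval, the tolerance and the function names are choices made for this illustration, and the search assumes that $h$ is unimodal on the interval.

```python
import math

def Phi(t):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def beta(ell):
    """Limiting acceptance probability, equation (2.13)."""
    return 2.0 * Phi(-ell / math.sqrt(2.0))

def h(ell):
    """Time scale h(l) = l^2 * beta(l)."""
    return ell ** 2 * beta(ell)

def argmax_h(lo=0.01, hi=10.0, tol=1e-10):
    """Golden-section search for the maximizer of h on [lo, hi] (h assumed unimodal)."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - g * (b - a)
        d = a + g * (b - a)
        if h(c) > h(d):
            b = d      # the maximum lies in [a, d]
        else:
            a = c      # the maximum lies in [c, b]
    return 0.5 * (a + b)

ell_hat = argmax_h()
beta_hat = beta(ell_hat)
print(ell_hat, beta_hat)
```

The acceptance probability `beta_hat` at the maximizer can be compared with the value 0.234 quoted in the discussion of the second implication of the main theorem.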
We will now explain the following two important implications of this result:
• it demonstrates that, in stationarity, the work required to explore the invariant measure scales as $O(N)$;
• it demonstrates that the speed at which the invariant measure is explored, again in stationarity, is maximized by tuning the average acceptance probability to 0.234.

The first implication follows from (1.2) since this shows that $O(N)$ steps of the Markov chain (2.12) are required for $z^N(t)$ to approximate $z(t)$ on a time interval $[0,T]$ long enough for $z(t)$ to have explored its invariant measure. The second implication follows from (1.9) for $z(t)$ itself. The maximum of the time-scale $h(\ell)$ over the parameter $\ell$ (see [23]) occurs at a universal acceptance probability of $\hat\beta=0.234$, to three decimal places. Thus, remarkably, the optimal acceptance probability identified in [23] for product measures is also optimal for the nonproduct measures studied in this paper.

2.4. Proof strategy. Let $\mathcal F_k$ denote the sigma algebra generated by $\{x^n,\xi^n,\gamma^n,\ n\le k\}$. We denote the conditional expectations $\mathbb E(\cdot\,|\,\mathcal F_k)$ by $\mathbb E_k(\cdot)$. We first compute the one-step expected drift of the Markov chain $\{x^k\}$. For notational convenience let $x^0=x$ and $\xi^1=\xi$. We set $\xi^0=0$ and $\gamma^0=0$. Then, under the assumptions on $\Psi$, $\Psi^N$ given in Section 3.1, we prove the following proposition estimating the mean one-step drift and diffusion. The proof is given in Sections 5.2 and 5.3.

Proposition 2.1. Let Assumptions 3.1 and 3.4 (below) hold. Let $\{x^k\}$ be the RWM Markov chain with $x^0=x\overset{\mathcal D}{\sim}\pi^N$.
Then
\[ N\,\mathbb E_0(x^1-x) = -\ell^2\beta\,\bigl(P^N x + C^N\nabla\Psi^N(x)\bigr) + r^N, \tag{2.14} \]
\[ N\,\mathbb E_0[(x^1-x)\otimes(x^1-x)] = 2\ell^2\beta\,C^N + E^N, \tag{2.15} \]
where the error terms $r^N$ and $E^N$ satisfy $\mathbb E^{\pi^N}\|r^N\|_s^2\to0$, $\mathbb E^{\pi^N}\sum_{i=1}^N|\langle\phi_i,E^N\phi_i\rangle_s|\to0$ and $\mathbb E^{\pi^N}|\langle\phi_i,E^N\phi_j\rangle_s|\to0$ as $N\to\infty$, for any pair of indices $i,j$ and for $s$ appearing in Assumptions 3.1.

Thus the discrete time Markov chain $\{x^k\}$ obtained by the successive accepted samples of the RWM algorithm has approximately the expected drift and covariance structure of the SPDE (1.9). It is also crucial to our subsequent argument involving the martingale central limit theorem that the error terms $r^N$ and $E^N$ converge to zero in the Hilbert space $\mathcal H^s$ norm and inner product as stated.

With this in hand, we need to establish the appropriate invariance principle to show that the dynamics of the Markov chain $\{x^k\}$, when seen as the values of a continuous time process on a time mesh with steps of $O(1/N)$, converge weakly to the law of the SPDE given in (1.9) on $C([0,T],\mathcal H^s)$. To this end we define, for $k\ge0$,
\[ m^N(\cdot) \overset{\mathrm{def}}{=} P^N(\cdot) + C^N\nabla\Psi^N(\cdot), \qquad \Gamma^{k+1,N} \overset{\mathrm{def}}{=} \sqrt{\frac{N}{2\ell^2\beta}}\,\bigl(x^{k+1}-x^k-\mathbb E_k(x^{k+1}-x^k)\bigr), \tag{2.16} \]
\[ r^{k+1,N} \overset{\mathrm{def}}{=} N\,\mathbb E_k(x^{k+1}-x^k) + \ell^2\beta\,\bigl(P^N x^k + C^N\nabla\Psi^N(x^k)\bigr), \tag{2.17} \]
\[ E^{k+1,N} \overset{\mathrm{def}}{=} N\,\mathbb E_k[(x^{k+1}-x^k)\otimes(x^{k+1}-x^k)] - 2\ell^2\beta\,C^N \tag{2.18} \]
with $E^{0,N},\Gamma^{0,N},r^{0,N}=0$. Notice that for fixed $N$, $\{r^{k,N}\}_{k\ge1}$ and $\{E^{k,N}\}_{k\ge1}$ are, since $x^0\sim\pi^N$, stationary sequences. By definition,
\[ x^{k+1} = x^k + \mathbb E_k(x^{k+1}-x^k) + \sqrt{\frac{2\ell^2\beta}{N}}\,\Gamma^{k+1,N}. \tag{2.19} \]
From (2.14) in Proposition 2.1, for large enough $N$,
\[ x^{k+1} \approx x^k - \frac{\ell^2\beta}{N}\bigl(P^N x^k + C^N\nabla\Psi^N(x^k)\bigr) + \sqrt{\frac{2\ell^2\beta}{N}}\,\Gamma^{k+1,N} = x^k - \frac{\ell^2\beta}{N}\,m^N(x^k) + \sqrt{\frac{2\ell^2\beta}{N}}\,\Gamma^{k+1,N}. \tag{2.20} \]
From the definition of $\Gamma^{k,N}$ in (2.16), and from (2.15) in Proposition 2.1, $\mathbb E_k(\Gamma^{k+1,N})=0$ and $\mathbb E_k(\Gamma^{k+1,N}\otimes\Gamma^{k+1,N})\approx C^N$. Therefore, for large enough $N$, equation (2.20) "resembles" the Euler scheme for simulating the finite-dimensional approximation of the SPDE (1.9) on $\mathbb R^N$, with drift function $m^N(\cdot)$ and covariance operator $C^N$:
\[ x^{k+1} \approx x^k - h(\ell)\,m^N(x^k)\,\Delta t + \sqrt{2h(\ell)\,\Delta t}\;\Gamma^{k+1,N} \quad\text{where } \Delta t \overset{\mathrm{def}}{=} \frac1N. \]
This is the key idea underlying our main result (Theorem 3.6): the Markov chain (2.12) looks like a weak Euler approximation of (1.9).

Note that there is an important difference in analyzing the weak convergence from the traditional Euler scheme. In our case, for any fixed $N\in\mathbb N$, $\Gamma^{k,N}\in X^N$ is finite dimensional, but clearly the dimension of $\Gamma^{k,N}$ grows with $N$. Also, the distribution of the initial condition $x(0)\overset{\mathcal D}{\sim}\pi^N$ changes with $N$, unlike the case of the traditional Euler scheme where the distribution of $x(0)$ does not change with $N$. Moreover, for any fixed $N$, the "noise" process $\{\Gamma^{k,N}\}$ is not formed of independent random variables. However, its elements are identically distributed (a stationary sequence) because the Metropolis algorithm preserves stationarity. To obtain an invariance principle, we first use a version of the martingale central limit theorem (Proposition 4.1) to show that the noise process $\{\Gamma^{k,N}\}$, when rescaled and summed, converges weakly to a Brownian motion on $C([0,T],\mathcal H^s)$ with covariance operator $C_s$, for any $T=O(1)$. We then use continuity of an appropriate Itô map to deduce the desired result.

Before we proceed, we introduce some notation.
Fix $T>0$, and define
\[ \Delta t \overset{\mathrm{def}}{=} 1/N, \qquad t^k \overset{\mathrm{def}}{=} k\,\Delta t, \qquad \eta^{k,N} \overset{\mathrm{def}}{=} \sqrt{\Delta t}\,\sum_{l=1}^k \Gamma^{l,N} \tag{2.21} \]
and
\[ W^N(t) \overset{\mathrm{def}}{=} \eta^{\lfloor Nt\rfloor,N} + \frac{Nt-\lfloor Nt\rfloor}{\sqrt N}\,\Gamma^{\lfloor Nt\rfloor+1,N}, \qquad t\in[0,T]. \tag{2.22} \]
Let $W(t)$, $t\in[0,T]$, be an $\mathcal H^s$ valued Brownian motion with covariance operator $C_s$. Using a martingale central limit theorem, we will prove the following proposition in Section 4.

Proposition 2.2. Let Assumptions 3.1 (below) hold. Let $x^0\sim\pi^N$. The process $W^N(t)$ defined in (2.22) converges weakly to $W$ in $C([0,T],\mathcal H^s)$ as $N$ tends to $\infty$, where $W$ is a Brownian motion in time with covariance operator $C_s$ in $\mathcal H^s$ and $s$ is defined in Assumptions 3.1. Furthermore, the pair $(x^0,W^N(t))$ converges weakly to $(z^0,W)$ where $z^0\sim\pi$ and the Brownian motion $W$ is independent of the initial condition $z^0$ almost surely.

Using this invariance principle for the noise process and the fact that the noise process is additive (the diffusion coefficient is constant), the invariance principle for the Markov chain follows from a continuous mapping argument which we now outline. For any $(z^0,W)\in\mathcal H^s\times C([0,T];\mathcal H^s)$, we define the Itô map $\Theta:\mathcal H^s\times C([0,T];\mathcal H^s)\to C([0,T];\mathcal H^s)$ by $\Theta:(z^0,W)\mapsto z$ where $z$ solves
\[ z(t) = z^0 - h(\ell)\int_0^t \bigl(z(s)+C\nabla\Psi(z(s))\bigr)\,ds + \sqrt{2h(\ell)}\,W(t) \tag{2.23} \]
for all $t\in[0,T]$ and $h(\ell)=\ell^2\beta$ is as defined in (1.4). Thus $z=\Theta(z^0,W)$ solves the SPDE (1.9) with $h(\ell)=\ell^2\beta$. We will see in Lemma 3.7 that $\Theta$ is a continuous map from $\mathcal H^s\times C([0,T];\mathcal H^s)$ into $C([0,T];\mathcal H^s)$.

We now define the piecewise constant interpolant of $x^k$,
\[ \bar z^N(t) = x^k \quad\text{for } t\in[t^k,t^{k+1}). \tag{2.24} \]
Set
\[ d^N(x) \overset{\mathrm{def}}{=} N\,\mathbb E_0(x^1-x). \tag{2.25} \]
Note that $d^N(x)\approx-h(\ell)\,m^N(x)$.
We can use $\bar z^N$ to construct a continuous piecewise linear interpolant of $x^k$ by defining
\[ z^N(t) = z^0 + \int_0^t d^N(\bar z^N(s))\,ds + \sqrt{2h(\ell)}\,W^N(t). \tag{2.26} \]
Notice that $d^N(x)$ defined in (2.25) is a function which depends on arbitrary $x=x^0$ and averages out the randomness in $x^1$ conditional on fixing $x=x^0$. We may then evaluate this function at any $x\in\mathcal H^s$ and, in particular, at $\bar z^N(s)$ as in (2.26). Use of the stationarity of the sequence $x^k$, together with equations (2.19), (2.21) and (2.22), reveals that the definition (2.26) coincides with that given in (1.2).

Using the closeness of $d^N$ and $-h(\ell)m^N$, of $z^N$ and $\bar z^N$ and of $m^N$ and the desired limiting drift, we will see that there exists a $\widehat W^N\Rightarrow W$ as $N\to\infty$, such that
\[ z^N(t) = z^0 - h(\ell)\int_0^t \bigl(z^N(s)+C\nabla\Psi(z^N(s))\bigr)\,ds + \sqrt{2h(\ell)}\,\widehat W^N(t), \tag{2.27} \]
so that $z^N=\Theta(z^0,\widehat W^N)$. By the continuity of $\Theta$ we will show, using the continuous mapping theorem, that
\[ z^N = \Theta(z^0,\widehat W^N) \Longrightarrow z = \Theta(z^0,W) \quad\text{as } N\to\infty. \tag{2.28} \]
It will be important to show that the weak limit of $(z^0,\widehat W^N)$, namely $(z^0,W)$, comprises two independent random variables $z^0$ (from the stationary distribution) and $W$. The weak convergence in (2.28) is the principal result of this article and is stated precisely in Theorem 3.6.

To summarize, we have argued that the RWM is well approximated by an Euler approximation of (1.9). The Euler approximation itself can be seen as an approximate solution of (1.9) with a modified Brownian motion. As $N\to\infty$, all approximation errors go to zero in the appropriate sense and one deduces that the RWM algorithm converges to the solution of (1.9).

2.5. A framework for expected drift and diffusion. We now turn to the question of how the RWM algorithm produces the appropriate drift and covariance encapsulated in Proposition 2.1.
This result, which shows that the algorithm (approximately) performs a noisy steepest ascent process, is at the heart of why the Metropolis algorithm works. In the rest of this section we set up a framework which will be used for deriving the expected drift and diffusion terms.

Recall the setup from Section 2. Starting from (2.11), after some algebra we obtain
\[ Q(x,\xi) = -\sqrt{\frac{2\ell^2}{N}}\,\langle\zeta,\xi\rangle - \frac{\ell^2}{N}\|\xi\|^2 - r(x,\xi), \tag{2.29} \]
where we have defined
\[ \zeta \overset{\mathrm{def}}{=} C^{-1/2}(P^N x) + C^{1/2}\nabla\Psi^N(x), \tag{2.30} \]
\[ r(x,\xi) \overset{\mathrm{def}}{=} \Psi^N(y)-\Psi^N(x)-\langle\nabla\Psi^N(x),\,P^N y-P^N x\rangle. \tag{2.31} \]

Remark 2.3. If $x\overset{\mathcal D}{\sim}\pi_0$ in $\mathcal H^s$, then the random variable $C^{-1/2}x$ is not well defined in $\mathcal H^s$ because $C^{-1/2}$ is not a trace class operator. However, equation (2.30) is still well defined because the operator $C^{-1/2}$ acts only in $X^N$ for any fixed $N$.

Notice that $C^{1/2}\zeta$ is approximately the drift term in the SPDE (1.9) and this plays a key role in obtaining the mean drift from the accept/reject mechanism; this point is elaborated on in the arguments leading up to (2.45). By (3.5) and Assumptions 3.1, 3.4 on $\Psi$ and $\Psi^N$ below, we will obtain a global bound on the remainder term of the form
\[ |r(x,\xi)| \le M\,\frac{\ell^2}{N}\,\|C^{1/2}\xi\|_s^2. \tag{2.32} \]
Because of our assumptions on $C$ in (3.1), the moments of $\|C^{1/2}\xi\|_s^2$ stay uniformly bounded as $N\to\infty$. Hence, we will neglect this term to explain the heuristic ideas. Since $\xi=\sum_{i=1}^N\xi_i\phi_i$ with $\xi_i\overset{\mathcal D}{\sim}\mathrm N(0,1)$, we find that for fixed $x$,
\[ Q(x,\xi) \approx \mathrm N\Bigl(-\ell^2,\ 2\ell^2\,\frac{\|\zeta\|^2}{N}\Bigr) \tag{2.33} \]
for large $N$ (see Lemma 5.1). Since $x\overset{\mathcal D}{\sim}\pi$, we have that $C^{-1/2}(P^N x)=\sum_{j=1}^N\rho_j\phi_j$, where $\rho_j$ are i.i.d. $\mathrm N(0,1)$.
Much as with the term $r(x,\xi)$ above, the second term in expression (2.30) for $\zeta$ can be seen as a perturbation term which is small in magnitude compared to the first term in (2.30) as $N\to\infty$. Thus, as shown in Lemma 5.2, we have $\|\zeta\|^2/N\to1$ for $\pi$-a.e. $\zeta$ as $N\to\infty$. Returning to (2.33), this suggests that it is reasonable for $N$ sufficiently large to make the approximation
\[ Q(x,\xi) \approx \mathrm N(-\ell^2,\,2\ell^2), \quad \pi\text{-a.s.} \tag{2.34} \]
Much of this section is concerned with understanding the behavior of one step of the RWM algorithm if we make the approximation in (2.34). Once this is understood, we will retrace our steps, being more careful to control the approximation error leading to (2.34).

The following lemma concerning normal random variables will be critical to identifying the source of the observed drift. It gives us the relation between the constants in the expected drift and diffusion coefficients which ensures $\pi$ invariance, as will be seen later in this section.

Lemma 2.4. Let $Z_\ell\overset{\mathcal D}{\sim}\mathrm N(-\ell^2,2\ell^2)$. Then $\mathbb P(Z_\ell>0)=\mathbb E(e^{Z_\ell}1_{Z_\ell<0})=\Phi(-\ell/\sqrt2)$ and
\[ \mathbb E(1\wedge e^{Z_\ell}) = 2\Phi(-\ell/\sqrt2) = \beta. \tag{2.35} \]
Furthermore, if $z\overset{\mathcal D}{\sim}\mathrm N(0,1)$ then
\[ \mathbb E[z\,(1\wedge e^{az+b})] = a\,\exp(a^2/2+b)\,\Phi\Bigl(-\frac{b}{|a|}-|a|\Bigr) \tag{2.36} \]
for any real constants $a$ and $b$.

Proof. A straightforward calculation. See Lemma 2 in [4].

The calculations of the expected one step drift and diffusion needed to prove Proposition 2.1 are long and technical. In order to enhance the readability, in the next two sections we outline our proof strategy, emphasizing the key calculations.

2.6. Heuristic argument for the expected drift. In this section, we will give heuristic arguments which underlie (2.14) from Proposition 2.1. Recall that $\{\phi_1,\phi_2,\ldots\}$ is an orthonormal basis for $\mathcal H$.
Let $x_i^k$, $i\le N$, denote the $i$th coordinate of $x^k$ and $C^N$ denote the covariance operator on $X^N$, the span of $\{\phi_1,\phi_2,\ldots,\phi_N\}$. Also recall that $\mathcal F_k$ denotes the sigma algebra generated by $\{x^n,\xi^n,\gamma^n,\ n\le k\}$ and the conditional expectations $\mathbb E(\cdot\,|\,\mathcal F_k)$ are denoted by $\mathbb E_k(\cdot)$. Thus $\mathbb E_0(\cdot)$ denotes the expectation with respect to $\xi^1$ and $\gamma^1$ with $x^0$ fixed. Also, for notational convenience, set $x^0=x$ and $\xi^1=\xi$. Letting $\mathbb E_0^\xi$ denote the expectation with respect to $\xi$, it follows that
\[ N\,\mathbb E_0(x_i^1-x_i^0) = N\,\mathbb E_0\bigl(\gamma^1(y_i^1-x_i)\bigr) = N\,\mathbb E_0^\xi\Bigl(\alpha(x,\xi)\sqrt{\frac{2\ell^2}{N}}\,(C^{1/2}\xi)_i\Bigr) = \lambda_i\sqrt{2\ell^2N}\;\mathbb E_0^\xi\bigl(\alpha(x,\xi)\,\xi_i\bigr) = \lambda_i\sqrt{2\ell^2N}\;\mathbb E_0^\xi\bigl((1\wedge e^{Q(x,\xi)})\,\xi_i\bigr). \tag{2.37} \]
To approximately evaluate (2.37) using Lemma 2.4, it is easier to first factor $Q(x,\xi)$ into components involving $\xi_i$ and those orthogonal (under $\mathbb E_0^\xi$) to them. To this end we introduce the following terms:
\[ R(x,\xi) \overset{\mathrm{def}}{=} -\sqrt{\frac{2\ell^2}{N}}\sum_{j=1}^N\zeta_j\xi_j - \frac{\ell^2}{N}\sum_{j=1}^N\xi_j^2, \tag{2.38} \]
\[ R_i(x,\xi) \overset{\mathrm{def}}{=} -\sqrt{\frac{2\ell^2}{N}}\sum_{j=1,\,j\ne i}^N\zeta_j\xi_j - \frac{\ell^2}{N}\sum_{j=1,\,j\ne i}^N\xi_j^2. \tag{2.39} \]
Hence, for large $N$ (see Lemma 5.5),
\[ Q(x,\xi) = R(x,\xi)-r(x,\xi) = R_i(x,\xi)-\sqrt{\frac{2\ell^2}{N}}\,\zeta_i\xi_i-\frac{\ell^2}{N}\xi_i^2-r(x,\xi) = R_i(x,\xi)-\sqrt{\frac{2\ell^2}{N}}\,\zeta_i\xi_i+O\Bigl(\frac1N\Bigr) \approx R_i(x,\xi)-\sqrt{\frac{2\ell^2}{N}}\,\zeta_i\xi_i. \tag{2.40} \]
The important observation here is that, conditional on $x$, the random variable $R_i(x,\xi)$ is independent of $\xi_i$. Hence, the expectation $\mathbb E_0^\xi((1\wedge e^{Q(x,\xi)})\xi_i)$ can be computed by first computing it over $\xi_i$ and then over $\xi\setminus\xi_i$. Let $\mathbb E^{\xi_i^-}$, $\mathbb E^{\xi_i}$ denote the expectation with respect to $\xi\setminus\xi_i$, $\xi_i$, respectively. Using the relation (2.40), and applying (2.36) with $a=-\sqrt{2\ell^2/N}\,\zeta_i$, $z=\xi_i$ and $b=R_i(x,\xi)$, we obtain (see Lemma 5.6)
\[ \mathbb E_0^\xi\bigl((1\wedge e^{Q(x,\xi)})\,\xi_i\bigr) \approx -\sqrt{\frac{2\ell^2}{N}}\,\zeta_i\;\mathbb E_0^{\xi_i^-}\Bigl[e^{R_i(x,\xi)+(\ell^2/N)\zeta_i^2}\,\Phi\Bigl(-\frac{R_i(x,\xi)}{\sqrt{2\ell^2/N}\,|\zeta_i|}-\sqrt{\frac{2\ell^2}{N}}\,|\zeta_i|\Bigr)\Bigr] \approx -\sqrt{\frac{2\ell^2}{N}}\,\zeta_i\;\mathbb E_0^{\xi_i^-}\Bigl[e^{R_i(x,\xi)+(\ell^2/N)\zeta_i^2}\,\Phi\Bigl(-\frac{R_i(x,\xi)}{\sqrt{2\ell^2/N}\,|\zeta_i|}\Bigr)\Bigr]. \tag{2.41} \]
Now, again from the relation (2.40) and the approximation of $Q(x,\xi)$ encapsulated in (2.33), it follows that for sufficiently large $N$
\[ R_i(x,\xi) \approx \mathrm N(-\ell^2,\,2\ell^2), \quad \pi\text{-a.s.} \tag{2.42} \]
Combining (2.41) with the fact that, for large enough $N$, $\Phi\bigl(-R_i(x,\xi)/(\sqrt{2\ell^2/N}\,|\zeta_i|)\bigr)\approx1_{R_i(x,\xi)<0}$, we see that Lemma 2.4 implies that (see Lemmas 5.7–5.10)
\[ \mathbb E_0^{\xi_i^-}\Bigl[e^{R_i(x,\xi)+(\ell^2/N)\zeta_i^2}\,\Phi\Bigl(-\frac{R_i(x,\xi)}{\sqrt{2\ell^2/N}\,|\zeta_i|}\Bigr)\Bigr] \approx \mathbb E_0^{\xi_i^-}\bigl(e^{R_i(x,\xi)}\,1_{R_i(x,\xi)<0}\bigr) \tag{2.43} \]
\[ \approx \mathbb E\bigl(e^{Z_\ell}\,1_{Z_\ell<0}\bigr) = \beta/2, \tag{2.44} \]
where $Z_\ell\overset{\mathcal D}{\sim}\mathrm N(-\ell^2,2\ell^2)$. Hence, from (2.37), (2.41) and (2.44), we gather that for large $N$,
\[ N\,\mathbb E_0(x_i^1-x_i^0) \approx -\ell^2\beta\,\lambda_i\zeta_i. \]
To identify the drift, observe that since $C^{-1/2}$ is self-adjoint and $i\le N$, we have $\lambda_iC^{-1/2}\phi_i=\phi_i$ and
\[ \lambda_i\zeta_i = \lambda_i\langle C^{-1/2}(P^N x)+C^{1/2}\nabla\Psi^N(x),\,\phi_i\rangle = \lambda_i\langle C^{-1/2}(P^N x)+C^{-1/2}C\nabla\Psi^N(x),\,\phi_i\rangle = \langle P^N x+C^N\nabla\Psi^N(x),\,\phi_i\rangle. \tag{2.45} \]
Hence, for large enough $N$, we deduce that (heuristically) the expected drift in the $i$th coordinate after one step of the Markov chain $\{x^k\}$ is well approximated by the expression
\[ N\,\mathbb E_0(x_i^1-x_i^0) \approx -\ell^2\beta\,\bigl(P^N x+C^N\nabla\Psi^N(x)\bigr)_i. \]
This is an approximation of the drift term that appears in the SPDE (1.9). Therefore, the above heuristic arguments show how the Metropolis algorithm achieves the "change of measure" by mapping $\pi_0$ to $\pi$. The above arguments can be made rigorous by quantitatively controlling the errors made.
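The two identities from Lemma 2.4 that drive this heuristic, namely $\mathbb E(e^{Z_\ell}1_{Z_\ell<0})=\beta/2$ used in (2.44) and the formula (2.36), can be checked numerically. The sketch below does so with a simple midpoint rule; the quadrature range, the number of nodes and all function names are assumptions made for this illustration, and the check is a sanity test rather than a proof.

```python
import math

def Phi(t):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def gauss_expect(f, lo=-10.0, hi=10.0, n=40000):
    """Midpoint-rule approximation of E[f(z)] for z ~ N(0, 1)."""
    step = (hi - lo) / n
    total = 0.0
    for k in range(n):
        z = lo + (k + 0.5) * step
        total += f(z) * math.exp(-0.5 * z * z)
    return total * step / math.sqrt(2.0 * math.pi)

def min1_exp(t):
    """min(1, e^t), written so that exp never overflows."""
    return math.exp(min(0.0, t))

def lemma_2_4_gaps(ell):
    """Absolute errors in the three identities of Lemma 2.4 for Z ~ N(-l^2, 2 l^2)."""
    def Z(z):
        return -ell ** 2 + math.sqrt(2.0) * ell * z
    target = Phi(-ell / math.sqrt(2.0))            # equals beta / 2
    p_pos = gauss_expect(lambda z: 1.0 if Z(z) > 0 else 0.0)
    e_neg = gauss_expect(lambda z: math.exp(Z(z)) if Z(z) < 0 else 0.0)
    accept = gauss_expect(lambda z: min1_exp(Z(z)))
    return abs(p_pos - target), abs(e_neg - target), abs(accept - 2.0 * target)

def identity_2_36_gap(a, b):
    """Absolute error between the two sides of (2.36) for nonzero a."""
    lhs = gauss_expect(lambda z: z * min1_exp(a * z + b))
    rhs = a * math.exp(0.5 * a * a + b) * Phi(-b / abs(a) - abs(a))
    return abs(lhs - rhs)

print(lemma_2_4_gaps(1.0), identity_2_36_gap(-0.3, -0.5))
```

The gaps should be of the size of the quadrature error only. In the drift calculation, (2.36) is applied with $a=-\sqrt{2\ell^2/N}\,\zeta_i$, which is small for large $N$; the helper `identity_2_36_gap` can be called with such values of `a` as well.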
In Section 5, we quantify the size of the neglected terms and quantify the rate at which $Q$ is well approximated by a Gaussian distribution. Using these estimates, in Section 5.2 we will retrace the arguments of this section paying attention to the cumulative error, thereby proving (2.14) of Proposition 2.1.

2.7. Heuristic argument for the expected diffusion coefficient. We now give the heuristic arguments for the expected diffusion coefficient, after one step of the Markov chain $\{x^k\}$. The arguments used here are much simpler than the drift calculations. The strategy is the same as in the drift case except that now we consider the covariance between two coordinates $x_i^1$ and $x_j^1$. For $1\le i,j\le N$,
\[ N\,\mathbb E_0[(x_i^1-x_i^0)(x_j^1-x_j^0)] = N\,\mathbb E_0^\xi[(y_i^1-x_i)(y_j^1-x_j)\,\alpha(x,\xi)] = N\,\mathbb E_0^\xi[(y_i^1-x_i)(y_j^1-x_j)(1\wedge\exp Q(x,\xi))] = 2\ell^2\,\mathbb E_0^\xi[(C^{1/2}\xi)_i(C^{1/2}\xi)_j(1\wedge\exp Q(x,\xi))]. \tag{2.46} \]
Now notice that
\[ \mathbb E_0^\xi[(C^{1/2}\xi)_i(C^{1/2}\xi)_j] = \lambda_i\lambda_j\delta_{ij}, \]
where $\delta_{ij}=1_{i=j}$. Similar to the calculations used when evaluating the expected drift, we define
\[ R_{ij}(x,\xi) \overset{\mathrm{def}}{=} -\sqrt{\frac{2\ell^2}{N}}\sum_{k=1,\,k\ne i,j}^N\zeta_k\xi_k - \frac{\ell^2}{N}\sum_{k=1,\,k\ne i,j}^N\xi_k^2 \tag{2.47} \]
and observe that
\[ R(x,\xi) = R_{ij}(x,\xi) - \sqrt{\frac{2\ell^2}{N}}\,\zeta_i\xi_i - \frac{\ell^2}{N}\xi_i^2 - \sqrt{\frac{2\ell^2}{N}}\,\zeta_j\xi_j - \frac{\ell^2}{N}\xi_j^2. \]
Hence, for sufficiently large $N$, we have $Q(x,\xi)\approx R_{ij}(x,\xi)$. By replacing $Q(x,\xi)$ in (2.46) by $R_{ij}(x,\xi)$ we can take advantage of the fact that $R_{ij}(x,\xi)$ is conditionally independent of $\xi_i,\xi_j$. However, the additional error term introduced is easy to estimate because the function $f(x)\overset{\mathrm{def}}{=}(1\wedge e^x)$ is 1-Lipschitz.
So, for large enough $N$ (Lemma 5.12),
\[ \mathbb E_0^\xi[(C^{1/2}\xi)_i(C^{1/2}\xi)_j(1\wedge\exp Q(x,\xi))] \approx \mathbb E_0^\xi[(C^{1/2}\xi)_i(C^{1/2}\xi)_j(1\wedge\exp R_{ij}(x,\xi))] = \lambda_i\lambda_j\delta_{ij}\,\mathbb E_0^{\xi_{ij}^-}[(1\wedge\exp R_{ij}(x,\xi))]. \tag{2.48} \]
Again, as in the drift calculation, we have that
\[ R_{ij}(x,\xi) \Longrightarrow \mathrm N(-\ell^2,\,2\ell^2), \quad \pi\text{-a.s.} \]
So by the dominated convergence theorem and Lemma 2.4,
\[ \lim_{N\to\infty}\mathbb E^{\xi_{ij}^-}[(1\wedge\exp R_{ij}(x,\xi))] = \beta. \tag{2.49} \]
Therefore, for large $N$,
\[ N\,\mathbb E_0[(x_i^1-x_i^0)(x_j^1-x_j^0)] \approx 2\ell^2\beta\,\lambda_i\lambda_j\delta_{ij} = 2\ell^2\beta\,\langle\phi_i,C\phi_j\rangle \]
or, in other words,
\[ N\,\mathbb E_0[(x^1-x^0)\otimes(x^1-x^0)] \approx 2\ell^2\beta\,C^N. \]
As with the drift calculations in the last section, these calculations can be made rigorous by tracking the size of the neglected terms and quantifying the rate at which $Q$ is approximated by the appropriate Gaussian. We will substantiate these arguments in Section 5.3.

3. Main theorem. In this section we state the assumptions we make on $\pi_0$ and $\Psi$ and then prove our main theorem.

3.1. Assumptions on $\Psi$ and $C$. The assumptions we make now concern (i) the rate of decay of the standard deviations in the prior or reference measure $\pi_0$ and (ii) the properties of the Radon–Nikodym derivative (likelihood function). These assumptions are naturally linked; in order for $\pi$ to be well defined we require that $\Psi$ is $\pi_0$-measurable and this can be achieved by ensuring that $\Psi$ is continuous on a space which has full measure under $\pi_0$. In fact, in a wide range of applications, $\Psi$ is Lipschitz on such a space [27]. In this paper we require, in addition, that $\Psi$ be twice differentiable in order to define the diffusion limit. This, too, may be established in many applications. To avoid technicalities, we assume that $\Psi(x)$ is quadratically bounded, with first derivative linearly bounded and second derivative globally bounded.
A simple example of a function $\Psi$ satisfying the above assumptions is $\Psi(x)=\|x\|_s^2$.

Assumptions 3.1. The operator $C$ and functional $\Psi$ satisfy the following:
(1) Decay of eigenvalues $\lambda_i^2$ of $C$: There exist $M_-,M_+\in(0,\infty)$ and $\kappa>\frac12$ such that
\[ M_- \le i^\kappa\lambda_i \le M_+ \quad \forall i\in\mathbb Z_+. \tag{3.1} \]
(2) Assumptions on $\Psi$: There exist constants $M_i\in\mathbb R$, $i\le4$, and $s\in[0,\kappa-1/2)$ such that
\[ M_1 \le \Psi(x) \le M_2(1+\|x\|_s^2) \quad \forall x\in\mathcal H^s, \tag{3.2} \]
\[ \|\nabla\Psi(x)\|_{-s} \le M_3(1+\|x\|_s) \quad \forall x\in\mathcal H^s, \tag{3.3} \]
\[ \|\partial^2\Psi(x)\|_{\mathcal L(\mathcal H^s,\mathcal H^{-s})} \le M_4 \quad \forall x\in\mathcal H^s. \tag{3.4} \]

Notice also that the above assumptions on $\Psi$ imply that for all $x,y\in\mathcal H^s$,
\[ |\Psi(x)-\Psi(y)| \le M_5(1+\|x\|_s+\|y\|_s)\,\|x-y\|_s, \tag{3.5a} \]
\[ \Psi(y) = \Psi(x)+\langle\nabla\Psi(x),\,y-x\rangle+\mathrm{rem}(x,y), \tag{3.5b} \]
\[ \mathrm{rem}(x,y) \le M_6\|x-y\|_s^2 \tag{3.5c} \]
for some constants $M_5,M_6\in\mathbb R_+$.

Remark 3.2. The condition $\kappa>\frac12$ ensures that the covariance operator for $\pi_0$ is trace class. In fact, the $\mathcal H^r$ norm of a realization of a Gaussian measure $\mathrm N(0,C)$ defined on $\mathcal H$ is almost surely finite if and only if $r<\kappa-\frac12$ [13]. Thus the choice of Sobolev space $\mathcal H^s$, with $s\in[0,\kappa-\frac12)$, in which we state the assumptions on $\Psi$, is made to ensure that the Radon–Nikodym derivative of $\pi$ with respect to $\pi_0$ is well defined. Indeed, under our assumptions, $\Psi$ is Lipschitz continuous on a set of full $\pi_0$ measure; it is hence $\pi_0$-measurable. Weaker growth assumptions on $\Psi$, its Lipschitz constant and second derivative could be dealt with by use of stopping time arguments.

The following lemma will be used repeatedly.

Lemma 3.3. Under Assumptions 3.1 it follows that, for all $a\in\mathbb R$,
\[ \|C^a x\| \asymp \|x\|_{-2\kappa a}. \tag{3.6} \]
Furthermore, the function $C\nabla\Psi:\mathcal H^s\to\mathcal H^s$ is globally Lipschitz.

Proof.
The first result follows from the inequality
\[ \|C^a x\|^2 = \sum_{j=1}^\infty\lambda_j^{4a}x_j^2 \le M_+\sum_{j=1}^\infty j^{-4a\kappa}x_j^2 = M_+\|x\|_{-2\kappa a}^2, \]
and a similar lower bound, using (3.1). To prove the global Lipschitz property we first note that
\[ \nabla\Psi(u_1)-\nabla\Psi(u_2) = K(u_1-u_2) := \int_0^1\partial^2\Psi(tu_1+(1-t)u_2)\,dt\,(u_1-u_2). \tag{3.7} \]
Note that $\|K\|_{\mathcal L(\mathcal H^s,\mathcal H^{-s})}\le M_4$ by (3.4). Thus,
\[ \|C(\nabla\Psi(u_1)-\nabla\Psi(u_2))\|_s \le M\|C^{1-s/2\kappa}K(u_1-u_2)\| \le M\|C^{1-s/2\kappa}KC^{s/2\kappa}C^{-s/2\kappa}(u_1-u_2)\| \le M\|C^{1-s/2\kappa}KC^{s/2\kappa}\|_{\mathcal L(\mathcal H,\mathcal H)}\|u_1-u_2\|_s \le M\|C^{1-s/2\kappa}\|_{\mathcal L(\mathcal H^{-s},\mathcal H)}\|K\|_{\mathcal L(\mathcal H^s,\mathcal H^{-s})}\|C^{s/2\kappa}\|_{\mathcal L(\mathcal H,\mathcal H^s)}\|u_1-u_2\|_s. \]
The three linear operators are bounded between the appropriate spaces, in the case of $C^{1-s/2\kappa}$ by using the fact that $s<\kappa-\frac12$ implies $s<\kappa$.

3.2. Finite-dimensional approximation of the invariant distribution. For simplicity we assume throughout this paper that $\Psi^N(\cdot)=\Psi(P^N\cdot)$. We note again that $\nabla\Psi^N(x)=P^N\nabla\Psi(P^N x)$ and $\partial^2\Psi^N(x)=P^N\partial^2\Psi(P^N x)P^N$. Other approximations could be handled similarly. The function $\Psi^N$ may be shown to satisfy the following.

Assumptions 3.4 (Assumptions on $\Psi^N$). The functions $\Psi^N:X^N\to\mathbb R$ satisfy the same conditions imposed on $\Psi$ given by equations (3.2), (3.3) and (3.4) with the same constants uniformly in $N$.

It is straightforward to show that the above assumptions on $\Psi^N$ imply that the sequence of measures $\{\pi^N\}$ converges to $\pi$ in the Hellinger metric (see [12]). Therefore, the measures $\{\pi^N\}$ are good candidates for finite-dimensional approximations of $\pi$. Furthermore, the normalizing constants $M_{\Psi^N}$ are uniformly bounded and we use this fact to obtain uniform bounds on moments of functionals in $\mathcal H$ under $\pi^N$.

Lemma 3.5.
Under the Assumptions 3.4 on $\Psi^N$,
\[ \sup_{N\in\mathbb N}M_{\Psi^N} < \infty \]
and for any measurable functional $f:\mathcal H\to\mathbb R$, and any $p\ge1$,
\[ \sup_{N\in\mathbb N}\mathbb E^{\pi^N}|f(x)|^p \le M\,\mathbb E^{\pi_0}|f(x)|^p. \tag{3.8} \]

Proof. By definition,
\[ M_{\Psi^N}^{-1} = \int_{\mathcal H}\exp\{-\Psi^N(x)\}\,\pi_0(dx) \ge \int_{\mathcal H}\exp\{-M(1+\|x\|_s^2)\}\,\pi_0(dx) \ge e^{-2M}\,\mathbb P^{\pi_0}(\|x\|_s\le1) \]
and therefore $\inf\{M_{\Psi^N}^{-1}:N\in\mathbb N\}>0$, so that $\sup\{M_{\Psi^N}:N\in\mathbb N\}<\infty$. Hence, for any $f:\mathcal H\to\mathbb R$,
\[ \sup_{N\in\mathbb N}\mathbb E^{\pi^N}|f(x)|^p \le \sup_{N\in\mathbb N}M_{\Psi^N}\,\mathbb E^{\pi_0}\bigl(e^{-\Psi^N(x)}|f(x)|^p\bigr) \le M\,\mathbb E^{\pi_0}|f(x)|^p, \]
proving the lemma.

The uniform estimate given in (3.8) will be used repeatedly in the sequel.

3.3. Statement and proof of the main theorem. The assumptions made above allow us to fully state the main result of this article, as outlined in Section 2.4.

Theorem 3.6. Let the Assumptions 3.1, 3.4 hold. Let the initial condition $x^0$ of the RWM algorithm be such that $x^0\overset{\mathcal D}{\sim}\pi^N$ and let $z^N(t)$ be a piecewise linear, continuous interpolant of the RWM algorithm (2.12) as defined in (1.2). Then $z^N(t)$ converges weakly in $C([0,T],\mathcal H^s)$ to the diffusion process $z(t)$ given by (1.9) with $z(0)\overset{\mathcal D}{\sim}\pi$.

Throughout the remainder of the paper we assume that Assumptions 3.1, 3.4 hold, without explicitly stating this fact. The proof of Theorem 3.6 is given below and relies on Proposition 2.1 stated above and proved in Section 5, Proposition 2.2 stated above and proved in Section 4 and Lemma 3.7 which we now state and then prove at the end of this section.

Lemma 3.7. Fix any $T>0$, any $z^0\in\mathcal H^s$ and any $W\in C([0,T],\mathcal H^s)$. Then the integral equation (2.23) has a unique solution $z\in C([0,T],\mathcal H^s)$. Furthermore, $z=\Theta(z^0,W)$ where $\Theta:\mathcal H^s\times C([0,T];\mathcal H^s)\to C([0,T];\mathcal H^s)$ as defined in (2.23) is continuous.

Proof of Theorem 3.6. We begin by tracking the error in the Euler approximation argument.
As before, let $x^0\overset{\mathcal D}{\sim}\pi^N$ and assume $x(0)=x^0$. Returning to (2.19), using the definitions from (2.16) and Proposition 2.1, produces
\[ x^{k+1} = x^k + \mathbb E_k(x^{k+1}-x^k) + \sqrt{\frac{2\ell^2\beta}{N}}\,\Gamma^{k+1,N}, \tag{3.9} \]
\[ x^{k+1} = x^k + \frac1N\,d^N(x^k) + \sqrt{\frac{2\ell^2\beta}{N}}\,\Gamma^{k+1,N} \tag{3.10} \]
\[ \phantom{x^{k+1}} = x^k - \frac{\ell^2\beta}{N}\,m^N(x^k) + \sqrt{\frac{2\ell^2\beta}{N}}\,\Gamma^{k+1,N} + \frac{r^{k+1,N}}{N}, \tag{3.11} \]
where $d^N(\cdot)$ is defined as in (2.25) and $r^{k+1,N}$ as in (2.17). By construction, $\mathbb E_k(\Gamma^{k+1,N})=0$ and
\[ \mathbb E_k(\Gamma^{k+1,N}\otimes\Gamma^{k+1,N}) = \frac{N}{2\ell^2\beta}\bigl[\mathbb E_k((x^{k+1}-x^k)\otimes(x^{k+1}-x^k)) - \mathbb E_k(x^{k+1}-x^k)\otimes\mathbb E_k(x^{k+1}-x^k)\bigr] = C^N + \frac{1}{2\ell^2\beta}\,E^{k+1,N} - \frac{N}{2\ell^2\beta}\bigl[\mathbb E_k(x^{k+1}-x^k)\otimes\mathbb E_k(x^{k+1}-x^k)\bigr], \tag{3.12} \]
where $E^{k+1,N}$ is as given in (2.18).

Recall $t^k$ given by (2.21) and $W^N$, the linear interpolant of a correctly scaled sum of the $\Gamma^{k,N}$, given by (2.22). We now define $\widehat W^N$ so that (2.27) holds as stated and hence, $\Theta(z^0,\widehat W^N)=z^N$. Define
\[ r_1^N(t) \overset{\mathrm{def}}{=} r^{k+1,N} \quad\text{for } t\in[t^k,t^{k+1}), \qquad r_2^N(s) \overset{\mathrm{def}}{=} \ell^2\beta\,\bigl(z^N(s)+C\nabla\Psi(z^N(s))-m^N(\bar z^N(s))\bigr), \]
where $r^{k+1,N}$ is given by (2.17), $m^N$ is from (2.16), $\bar z^N$ from (2.24) and $z^N$ from (2.26). If
\[ \widehat W^N(t) \overset{\mathrm{def}}{=} W^N(t) + \frac{1}{\sqrt{2\ell^2\beta}}\,e^N(t) \quad\text{with}\quad e^N(t) = \int_0^t\bigl(r_1^N(u)+r_2^N(u)\bigr)\,du, \]
then (2.27) holds. To see this, observe from (2.26) that
\[ z^N(t) = z^0 + \int_0^t d^N(\bar z^N(u))\,du + \sqrt{2\ell^2\beta}\,W^N(t) \]
\[ \phantom{z^N(t)} = z^0 - \ell^2\beta\int_0^t m^N(\bar z^N(u))\,du + \int_0^t r_1^N(s)\,ds + \sqrt{2\ell^2\beta}\,W^N(t) \]
\[ \phantom{z^N(t)} = z^0 - \ell^2\beta\int_0^t\bigl(z^N(u)+C\nabla\Psi(z^N(u))\bigr)\,du + \int_0^t\bigl(r_1^N(s)+r_2^N(s)\bigr)\,ds + \sqrt{2\ell^2\beta}\,W^N(t) \]
\[ \phantom{z^N(t)} = z^0 - \ell^2\beta\int_0^t\bigl(z^N(u)+C\nabla\Psi(z^N(u))\bigr)\,du + \sqrt{2\ell^2\beta}\,\widehat W^N(t) \]
and hence, with this definition of $\widehat W^N$, (2.27) holds.
Furthermore, we claim that
\[ \lim_{N\to\infty}\mathbb E^{\pi^N}\Bigl(\sup_{t\in[0,T]}\|e^N(t)\|_s^2\Bigr) = 0. \tag{3.13} \]
To prove this, notice that
\[ \sup_{t\in[0,T]}\|e^N(t)\|_s^2 \le M\Bigl(\sup_{t\in[0,T]}\int_0^t\|r_1^N(u)\|_s^2\,du + \sup_{t\in[0,T]}\int_0^t\|r_2^N(u)\|_s^2\,du\Bigr). \]
Also
\[ \mathbb E^{\pi^N}\sup_{t\in[0,T]}\int_0^t\|r_1^N(u)\|_s^2\,du \le \mathbb E^{\pi^N}\int_0^T\|r_1^N(u)\|_s^2\,du \le M\,\frac1N\,\mathbb E^{\pi^N}\sum_{k=1}^N\|r^{k,N}\|_s^2 = M\,\mathbb E^{\pi^N}\|r^{1,N}\|_s^2 \xrightarrow{N\to\infty} 0, \]
where we used stationarity of $r^{k,N}$ and (2.14) from Proposition 2.1 in the last step. We now estimate the second term similarly to complete the proof. Recall that the function $z\mapsto z+C\nabla\Psi(z)$ is Lipschitz on $\mathcal H^s$ by Lemma 3.3. Note also that $C^N\nabla\Psi^N(\cdot)=CP^N\nabla\Psi(P^N\cdot)$. Thus,
\[ \|r_2^N(u)\|_s \le M\|z^N(u)-P^N\bar z^N(u)\|_s + \|C(I-P^N)\nabla\Psi(P^N\bar z^N(u))\|_s \le M\bigl(\|z^N(u)-\bar z^N(u)\|_s+\|(I-P^N)\bar z^N(u)\|_s\bigr) + \|(I-P^N)C\nabla\Psi(P^N\bar z^N(u))\|_s. \]
But for any $u\in[t^k,t^{k+1})$, we have
\[ \|z^N(u)-\bar z^N(u)\|_s \le \|x^{k+1}-x^k\|_s \le \|y^{k+1}-x^k\|_s. \]
This follows from the fact that $\bar z^N(u)=x^k$ and $z^N(u)=\frac{1}{\Delta t}\bigl((u-t^k)x^{k+1}+(t^{k+1}-u)x^k\bigr)$, because $x^{k+1}-x^k=\gamma^{k+1}(y^{k+1}-x^k)$ and $|\gamma^{k+1}|\le1$. For $u\in[t^k,t^{k+1})$, we also have
\[ \|(P^N-I)\bar z^N(u)\|_s = \|(P^N-I)x^k\|_s = \|(P^N-I)x^0\|_s, \]
because $x^k$ is not updated in $\mathcal H^s\setminus X^N$, and
\[ \|(P^N-I)C\nabla\Psi(P^N\bar z^N(u))\|_s = \|(P^N-I)C\nabla\Psi(P^N x^k)\|_s. \]
Hence, we have by stationarity that, for all $u\in[0,T]$,
\[ \mathbb E^\pi\|r_2^N(u)\|_s^2 \le M\,\mathbb E^\pi\|y^1-x^0\|_s^2 + M\,\mathbb E^\pi\bigl(\|(P^N-I)x^0\|_s^2+\|(P^N-I)C\nabla\Psi(P^N x^0)\|_s^2\bigr). \]
Equation (2.12) shows that $\mathbb E^\pi\|y^1-x^0\|_s^2\le MN^{-1}$. The definition of $P^N$ gives $\mathbb E^\pi\|(P^N-I)x\|_s^2\le N^{-(r-s)}\,\mathbb E^\pi\|x\|_r^2$ for any $r\in(s,\kappa-1/2)$.
Note that $\mathbb{E}^{\pi}\|x^0\|_r^2$ is finite for $r \in (s, \kappa - 1/2)$ by Lemma 3.5 and the properties of $\pi_0$. Similarly, we have that for $r \le 2\kappa - s < \kappa + \frac12$,

\[
\mathbb{E}\|\mathcal{C}\nabla\Psi(P^N x^0)\|_r^2 \le M\, \mathbb{E}\, \|\mathcal{C}^{1-(r+s)/2\kappa}\|_{\mathcal{L}(\mathcal{H},\mathcal{H})}\, \|\nabla\Psi(P^N x^0)\|_{-s}^2 \le M\, \mathbb{E}(1 + \|x^0\|_s^2).
\]

Hence, we deduce that $\mathbb{E}^{\pi^N}\|r_2^N(u)\|_s^2 \to 0$ uniformly for $u \in [0,T]$. It follows that

\[
\mathbb{E}^{\pi^N} \sup_{t\in[0,T]} \int_0^t \|r_2^N(u)\|_s^2\, du
\le \mathbb{E}^{\pi^N} \int_0^T \|r_2^N(u)\|_s^2\, du
\le \int_0^T \mathbb{E}^{\pi^N} \|r_2^N(u)\|_s^2\, du \to 0
\]

and we have proved the claim concerning $e^N$ made in (3.13).

The proof concludes with a straightforward application of the continuous mapping theorem. Let $\widehat{W}^N = W^N + \frac{1}{\sqrt{2\ell^2\beta}}\, e^N$. Let $\Omega$ denote the probability space generating the Markov chain in stationarity. We have shown that $e^N \to 0$ in $L^2(\Omega; C([0,T], \mathcal{H}^s))$ and, by Proposition 2.2, $W^N$ converges weakly to $W$, a Brownian motion with covariance operator $\mathcal{C}_s$, in $C([0,T], \mathcal{H}^s)$. Furthermore, we also have that $W$ is independent of $z^0$. Thus $(z^0, \widehat{W}^N)$ converges weakly to $(z^0, W)$ in $\mathcal{H}^s \times C([0,T], \mathcal{H}^s)$, with $z^0$ and $W$ independent. Notice that $z^N = \Theta(z^0, \widehat{W}^N)$, where $\Theta$ is defined as in Lemma 3.7. Since $\Theta$ is a continuous map by Lemma 3.7, we deduce from the continuous mapping theorem that the process $z^N$ converges weakly in $C([0,T], \mathcal{H}^s)$ to $z$ with law given by $\Theta(z^0, W)$. Since $W$ is independent of $z^0$, this is precisely the law of the SPDE given by (1.9).

Proof of Lemma 3.7. Consider the mapping $z^{(n)} \mapsto z^{(n+1)}$ defined by

\[
z^{(n+1)}(t) = z^0 - h(\ell) \int_0^t \bigl(z^{(n)}(s) + \mathcal{C}\nabla\Psi(z^{(n)}(s))\bigr)\, ds + \sqrt{2h(\ell)}\, W(t)
\]

for arbitrary $z^0 \in \mathcal{H}$ and $W \in C([0,T]; \mathcal{H}^s)$. Recall from Lemma 3.3 that $z \mapsto z + \mathcal{C}\nabla\Psi(z)$ is globally Lipschitz on $\mathcal{H}^s$.
It is then a straightforward application of the contraction mapping theorem to show that this mapping has a unique fixed point in $C([0,T]; \mathcal{H}^s)$, for $T$ sufficiently small. Repeated application of the same idea extends this existence and uniqueness result to arbitrary time intervals. Let $z_i$ solve (2.23) with $(z^0, W) = (w_i, W_i)$, $i = 1, 2$. Subtracting the two equations and using the fact that $z \mapsto z + \mathcal{C}\nabla\Psi(z)$ is globally Lipschitz on $\mathcal{H}^s$ gives

\[
\|z_1(t) - z_2(t)\|_s \le \|w_1 - w_2\|_s + M \int_0^t \|z_1(s) - z_2(s)\|_s\, ds + \sqrt{2\ell^2\beta}\, \|W_1(t) - W_2(t)\|_s.
\]

Thus,

\[
\sup_{0\le t\le T} \|z_1(t) - z_2(t)\|_s \le \|w_1 - w_2\|_s + M \int_0^T \sup_{0\le\tau\le s} \|z_1(\tau) - z_2(\tau)\|_s\, ds + \sqrt{2\ell^2\beta} \sup_{0\le t\le T} \|W_1(t) - W_2(t)\|_s.
\]

The Gronwall lemma gives continuity in the desired spaces.

4. Weak convergence of the noise process: Proof of Proposition 2.2. Throughout, we make the standing Assumptions 3.1, 3.4 without explicit mention. The proof of Proposition 2.2 uses the following result concerning triangular martingale increment arrays. The result is similar to the classical results on triangular arrays of independent increments.

Let $k_N : [0,T] \to \mathbb{Z}_+$ be a sequence of nondecreasing, right-continuous functions indexed by $N$ with $k_N(0) = 0$ and $k_N(T) \ge 1$. Let $\{M^{k,N}, \mathcal{F}^{k,N}\}_{0\le k\le k_N(T)}$ be an $\mathcal{H}^s$-valued martingale difference array. That is, for $k = 1, \ldots, k_N(T)$, we have $\mathbb{E}(M^{k,N} \mid \mathcal{F}^{k-1,N}) = 0$, $\mathbb{E}(\|M^{k,N}\|_s^2 \mid \mathcal{F}^{k-1,N}) < \infty$ almost surely, and $\mathcal{F}^{k-1,N} \subset \mathcal{F}^{k,N}$. We will make use of the following result.

Proposition 4.1 ([3], Proposition 5.1). Let $S : \mathcal{H}^s \to \mathcal{H}^s$ be a self-adjoint, positive definite operator with finite trace.
Assume that, for all $x \in \mathcal{H}^s$, $\epsilon > 0$ and $t \in [0,T]$, the following limits hold in probability:

\begin{align*}
\lim_{N\to\infty} \sum_{k=1}^{k_N(T)} \mathbb{E}\bigl(\|M^{k,N}\|_s^2 \mid \mathcal{F}^{k-1,N}\bigr) &= T \operatorname{trace}(S), \tag{4.1}\\
\lim_{N\to\infty} \sum_{k=1}^{k_N(t)} \mathbb{E}\bigl(\langle M^{k,N}, x\rangle_s^2 \mid \mathcal{F}^{k-1,N}\bigr) &= t\, \langle Sx, x\rangle_s, \tag{4.2}\\
\lim_{N\to\infty} \sum_{k=1}^{k_N(T)} \mathbb{E}\bigl(\langle M^{k,N}, x\rangle_s^2\, \mathbf{1}_{\{|\langle M^{k,N}, x\rangle_s| \ge \epsilon\}} \mid \mathcal{F}^{k-1,N}\bigr) &= 0. \tag{4.3}
\end{align*}

Define a continuous time process $W^N$ by $W^N(t) = \sum_{k=1}^{k_N(t)} M^{k,N}$ if $k_N(t) \ge 1$ and $k_N(t) > \lim_{r\to 0+} k_N(t - r)$, and by linear interpolation otherwise. Then the sequence of random variables $W^N$ converges weakly in $C([0,T], \mathcal{H}^s)$ to an $\mathcal{H}^s$-valued Brownian motion $W$, with $W(0) = 0$, $\mathbb{E}(W(T)) = 0$, and with covariance operator $S$.

Remark 4.2. The first two hypotheses of the above theorem ensure the weak convergence of finite-dimensional distributions of $W^N(t)$ using the martingale central limit theorem in $\mathbb{R}^N$; the last hypothesis is needed to verify the tightness of the family $\{W^N(\cdot)\}$. As noted in [11], the second hypothesis [equation (4.2)] of Proposition 4.1 is implied by

\[
\lim_{N\to\infty} \sum_{k=1}^{k_N(t)} \mathbb{E}\bigl(\langle M^{k,N}, e_n\rangle_s \langle M^{k,N}, e_m\rangle_s \mid \mathcal{F}^{k-1,N}\bigr) = t\, \langle S e_n, e_m\rangle_s \tag{4.4}
\]

in probability, where $\{e_n\}$ is any orthonormal basis for $\mathcal{H}^s$. The third hypothesis in (4.3) is implied by the Lindeberg type condition,

\[
\lim_{N\to\infty} \sum_{k=1}^{k_N(T)} \mathbb{E}\bigl(\|M^{k,N}\|_s^2\, \mathbf{1}_{\{\|M^{k,N}\|_s \ge \epsilon\}} \mid \mathcal{F}^{k-1,N}\bigr) = 0 \tag{4.5}
\]

in probability, for any fixed $\epsilon > 0$.

Using Proposition 4.1 we now give the proof of Proposition 2.2.

Proof of Proposition 2.2. We apply Proposition 4.1 with $k_N(t) \overset{\mathrm{def}}{=} \lfloor Nt \rfloor$, $M^{k,N} \overset{\mathrm{def}}{=} \frac{1}{\sqrt{N}}\Gamma^{k,N}$ and $S \overset{\mathrm{def}}{=} \mathcal{C}_s$; the resulting definition of $W^N(t)$ from Proposition 4.1 coincides with that given in (2.22). We set $\mathcal{F}^{k,N}$ to be the sigma algebra generated by $\{x^j, \xi^j\}_{j\le k}$ with $x^0 \sim \pi^N$.
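Since the chain is stationary, the noise process $\{\Gamma^{k,N}, 1 \le k \le N\}$ is identically distributed, and so are the errors $r^{k,N}$ and $E^{k,N}$ from (2.17) and (2.18), respectively. We now verify the three hypotheses required to apply Proposition 4.1. We generalize the notation $\mathbb{E}^{\xi}_0(\cdot)$ from Section 2.6 and set $\mathbb{E}^{\xi}(\cdot \mid \mathcal{F}^{k,N}) = \mathbb{E}^{\xi}_k(\cdot)$.

• Condition (4.1). It is enough to show that

\[
\lim_{N\to\infty} \mathbb{E}^{\pi^N} \Bigl| \frac{1}{N} \sum_{k=1}^{\lfloor NT \rfloor} \mathbb{E}^{\xi}_{k-1}\bigl(\|\Gamma^{k,N}\|_s^2\bigr) - T \operatorname{trace}(\mathcal{C}_s) \Bigr| = 0
\]

and condition (4.1) will follow from Markov's inequality. By (3.12) and (2.2),

\begin{align*}
\mathbb{E}^{\xi}_0\bigl(\|\Gamma^{1,N}\|_s^2\bigr)
&= \mathbb{E}^{\xi}_0\bigl(\|B_s^{1/2}\Gamma^{1,N}\|^2\bigr)
= \sum_{j=1}^N \mathbb{E}^{\xi}_0 \langle \Gamma^{1,N}, B_s^{1/2}\phi_j\rangle^2
= \sum_{j=1}^N \mathbb{E}^{\xi}_0 \langle B_s^{1/2}\phi_j, \Gamma^{1,N} \otimes \Gamma^{1,N} B_s^{1/2}\phi_j\rangle \tag{4.6}\\
&= \operatorname{trace}(\mathcal{C}^N_s) + \frac{1}{2\ell^2\beta} \sum_{j=1}^N \langle \phi_j, E^{1,N}\phi_j\rangle_s - \frac{N}{2\ell^2\beta}\, \|\mathbb{E}_0(x^1 - x^0)\|_s^2. \tag{4.7}
\end{align*}

By Proposition 2.1 it follows that $\mathbb{E}^{\pi^N} \bigl|\sum_{j=1}^N \langle \phi_j, E^{1,N}\phi_j\rangle_s\bigr| \to 0$. For the third term, notice that by Proposition 2.1, equation (2.14), we have

\begin{align*}
\mathbb{E}^{\pi^N} \frac{N}{2\ell^2\beta}\, \|\mathbb{E}_0(x^1 - x^0)\|_s^2
&\le M\, \frac{1}{N}\, \mathbb{E}^{\pi^N}\bigl(\|m^N(x^0)\|_s^2 + \|r^{1,N}\|_s^2\bigr)\\
&\le M\, \frac{1}{N}\bigl(\mathbb{E}^{\pi^N}(1 + \|x^0\|_s)^2 + \mathbb{E}^{\pi^N}\|r^{1,N}\|_s^2\bigr) \to 0, \tag{4.8}
\end{align*}

where the second inequality follows from the fact that $\mathcal{C}\nabla\Psi$ is globally Lipschitz in $\mathcal{H}^s$. Also $\{E^{k,N}\}$ is a stationary sequence. Therefore,

\begin{align*}
\mathbb{E}^{\pi^N} \Bigl| \frac{1}{N} \sum_{k=1}^{\lfloor NT \rfloor} \mathbb{E}^{\xi}_{k-1}\bigl(\|\Gamma^{k,N}\|_s^2\bigr) - T \operatorname{trace}(\mathcal{C}^N_s) \Bigr|
&\le M\, \mathbb{E}^{\pi^N}\Bigl( \Bigl|\sum_{j=1}^N \langle \phi_j, E^{1,N}\phi_j\rangle_s\Bigr| + \frac{N}{2\ell^2\beta}\, \|\mathbb{E}_0(x^1 - x^0)\|_s^2 \Bigr)\\
&\quad + \operatorname{trace}(\mathcal{C}^N_s)\, \Bigl| \frac{\lfloor NT \rfloor}{N} - T \Bigr| \to 0.
\end{align*}

Condition (4.1) now follows from the fact that $\lim_{N\to\infty} |\operatorname{trace}(\mathcal{C}_s) - \operatorname{trace}(\mathcal{C}^N_s)| = 0$.

• Condition (4.2). By Remark 4.2, it is enough to verify (4.4).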
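To show (4.4), using stationarity and arguments similar to those used in verifying condition (4.1), it suffices to show that

\[
\lim_{N\to\infty} \mathbb{E}^{\pi^N} \bigl| \mathbb{E}^{\xi}_0\bigl(\langle \Gamma^{1,N}, \hat\phi_n\rangle_s \langle \Gamma^{1,N}, \hat\phi_m\rangle_s\bigr) - \langle \hat\phi_n, \mathcal{C}^N_s \hat\phi_m\rangle_s \bigr| = 0, \tag{4.9}
\]

where $\{\hat\phi_k\}$ is as defined in (2.7). We have

\[
\mathbb{E}^{\pi^N} \bigl| \mathbb{E}^{\xi}_0\bigl(\langle \Gamma^{1,N}, \hat\phi_n\rangle_s \langle \Gamma^{1,N}, \hat\phi_m\rangle_s\bigr) - \langle \hat\phi_n, \mathcal{C}^N_s \hat\phi_m\rangle_s \bigr|
= n^{-s} m^{-s}\, \mathbb{E}^{\pi^N} \bigl| \mathbb{E}^{\xi}_0\bigl(\langle \Gamma^{1,N}, \phi_n\rangle_s \langle \Gamma^{1,N}, \phi_m\rangle_s\bigr) - \langle \phi_n, \mathcal{C}^N_s \phi_m\rangle_s \bigr|
\]

and therefore, it is enough to show that

\[
\lim_{N\to\infty} \mathbb{E}^{\pi^N} \bigl| \mathbb{E}^{\xi}_0\bigl(\langle \Gamma^{1,N}, \phi_n\rangle_s \langle \Gamma^{1,N}, \phi_m\rangle_s\bigr) - \langle \phi_n, \mathcal{C}^N_s \phi_m\rangle_s \bigr| = 0. \tag{4.10}
\]

Indeed we have

\begin{align*}
\langle \Gamma^{1,N}, \phi_n\rangle_s \langle \Gamma^{1,N}, \phi_m\rangle_s
&= \langle \Gamma^{1,N}, B_s\phi_n\rangle \langle \Gamma^{1,N}, B_s\phi_m\rangle\\
&= \langle B_s\phi_n, \Gamma^{1,N} \otimes \Gamma^{1,N} B_s\phi_m\rangle\\
&= \langle \phi_n, B_s^{1/2}\, \Gamma^{1,N} \otimes \Gamma^{1,N} B_s^{1/2}\phi_m\rangle_s
\end{align*}

and from (3.12) and Proposition 2.1 we obtain

\begin{align*}
&\langle \phi_n, B_s^{1/2}\, \Gamma^{1,N} \otimes \Gamma^{1,N} B_s^{1/2}\phi_m\rangle_s - \langle \phi_n, \mathcal{C}^N_s \phi_m\rangle_s\\
&\qquad = \langle \phi_n, B_s^{1/2}\, \Gamma^{1,N} \otimes \Gamma^{1,N} B_s^{1/2}\phi_m\rangle_s - \langle \phi_n, B_s^{1/2}\mathcal{C}^N B_s^{1/2}\phi_m\rangle_s\\
&\qquad = n^s m^s \langle \phi_n, E^{1,N}\phi_m\rangle_s - \frac{N}{2\ell^2\beta}\, \mathbb{E}_0\bigl(\langle x^1 - x^0, \phi_n\rangle_s\bigr)\, \mathbb{E}_0\bigl(\langle x^1 - x^0, \phi_m\rangle_s\bigr).
\end{align*}

From Proposition 2.1, it follows that $\lim_{N\to\infty} \mathbb{E}^{\pi^N} |\langle \phi_n, E^{1,N}\phi_m\rangle_s| = 0$. Also notice that

\begin{align*}
&N^2 \bigl[\mathbb{E}^{\pi^N} \bigl| \mathbb{E}_0\bigl(\langle x^1 - x^0, \phi_n\rangle_s\bigr)\, \mathbb{E}_0\bigl(\langle x^1 - x^0, \phi_m\rangle_s\bigr) \bigr|\bigr]^2\\
&\qquad \le M\, \mathbb{E}^{\pi^N}\bigl(N \|\mathbb{E}_0(x^1 - x^0)\|_s^2\, \|\phi_n\|_s^2\bigr)\, \mathbb{E}^{\pi^N}\bigl(N \|\mathbb{E}_0(x^1 - x^0)\|_s^2\, \|\phi_m\|_s^2\bigr) \to 0
\end{align*}

by the calculation done in (4.8). Thus (4.10) holds and since $|\langle \phi_n, \mathcal{C}_s\phi_m\rangle_s - \langle \phi_n, \mathcal{C}^N_s\phi_m\rangle_s| \to 0$, equation (4.2) follows from Markov's inequality.

• Condition (4.3). From Remark 4.2 it follows that verifying (4.5) suffices to establish (4.3).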
To verify (4.5), notice that for any $\epsilon > 0$,

\[
\mathbb{E}^{\pi^N} \Bigl| \frac{1}{N} \sum_{k=1}^{\lfloor NT \rfloor} \mathbb{E}^{\xi}_{k-1}\bigl(\|\Gamma^{k,N}\|_s^2\, \mathbf{1}_{\{\|\Gamma^{k,N}\|_s^2 \ge \epsilon N\}}\bigr) \Bigr|
\le \frac{\lfloor NT \rfloor}{N}\, \mathbb{E}^{\pi^N}\bigl(\|\Gamma^{1,N}\|_s^2\, \mathbf{1}_{\{\|\Gamma^{1,N}\|_s^2 \ge \epsilon N\}}\bigr) \to 0
\]

by the dominated convergence theorem since $\lim_{N\to\infty} \mathbb{E}^{\pi^N}\|\Gamma^{1,N}\|_s^2 = \operatorname{trace}(\mathcal{C}_s) < \infty$. Thus (4.5) is verified.

Thus we have verified all three hypotheses of Proposition 4.1, proving that $W^N(t)$ converges weakly to $W(t)$ in $C([0,T]; \mathcal{H}^s)$.

Recall that $X^R \subset \mathcal{H}^s$ denotes the $R$-dimensional subspace $P^R \mathcal{H}^s$. To prove the second claim of Proposition 2.2, we need to show that $(x^0, W^N(t))$ converges weakly to $(z^0, W(t))$ in $(\mathcal{H}^s, C([0,T]; \mathcal{H}^s))$ as $N \to \infty$, where $z^0 \sim \pi$ and $z^0$ is independent of the limiting noise $W$. For this, it is enough to show that for any $R \in \mathbb{N}$, the pair $(x^0, P^R W^N(t))$ converges weakly to $(z^0, Z_R)$ for every $t > 0$, where $Z_R$ is a Gaussian random variable on $X^R$ with mean zero and covariance $t P^R \mathcal{C}_s P^R$, independent of $z^0$. We will prove this statement as the corollary of the following lemma.

Lemma 4.3. Let $x^0 \sim \pi^N$ and let $\{\theta^{k,N}\}$ be any stationary martingale sequence adapted to the filtration $\{\mathcal{F}^{k,N}\}$ and furthermore, assume that there exists a stationary sequence $\{U^{k,N}\}$ such that for all $k \ge 1$ and any $u \in X^R$:

(1) $\mathbb{E}^{\xi}_{k-1} |\langle u, P^R\theta^{k,N}\rangle_s|^2 = \langle u, P^R\mathcal{C}_s u\rangle_s + U^{k,N}$, with $\lim_{N\to\infty} \mathbb{E}^{\pi^N} |U^{1,N}| = 0$.

(2) $\mathbb{E}^{\xi}_{k-1} \|\theta^{k,N}\|_s^3 \le M$.

Then for any $\mathbf{t} \in \mathcal{H}^s$, $u \in X^R$, $R \in \mathbb{N}$ and $t > 0$,

\[
\lim_{N\to\infty} \mathbb{E}^{\pi^N}\Bigl( e^{\,i\langle \mathbf{t}, x^0\rangle_s + (i/\sqrt{N}) \sum_{k=1}^{\lfloor Nt \rfloor} \langle u, P^R\theta^{k,N}\rangle_s} \Bigr)
= \mathbb{E}^{\pi}\Bigl( e^{\,i\langle \mathbf{t}, z^0\rangle_s - (t/2)\langle u, P^R\mathcal{C}_s u\rangle_s} \Bigr). \tag{4.11}
\]

Note: Here and in Corollary 4.4, $i = \sqrt{-1}$.

Proof of Lemma 4.3. We show (4.11) for $t = 1$, since the calculations are nearly identical for an arbitrary $t$ with minor notational changes.
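Indeed, we have

\[
\mathbb{E}^{\pi^N}\Bigl( e^{\,i\langle \mathbf{t}, x^0\rangle_s + (i/\sqrt{N}) \sum_{k=1}^{N} \langle u, P^R\theta^{k,N}\rangle_s} \Bigr)
= \mathbb{E}^{\pi^N}\Bigl( \mathbb{E}^{\xi}_{N-1}\Bigl( e^{\,i\langle \mathbf{t}, x^0\rangle_s + (i/\sqrt{N}) \sum_{k=1}^{N} \langle u, P^R\theta^{k,N}\rangle_s} \Bigr) \Bigr).
\]

By Taylor's expansion,

\begin{align*}
&\mathbb{E}^{\pi^N}\Bigl( \mathbb{E}^{\xi}_{N-1}\Bigl( e^{\,i\langle \mathbf{t}, x^0\rangle_s + (i/\sqrt{N}) \sum_{k=1}^{N} \langle u, P^R\theta^{k,N}\rangle_s} \Bigr) \Bigr)\\
&\qquad = \mathbb{E}\Bigl[ e^{\,i\langle \mathbf{t}, x^0\rangle_s + (i/\sqrt{N}) \sum_{k=1}^{N-1} \langle u, P^R\theta^{k,N}\rangle_s}
\times \Bigl( 1 - \frac{1}{2N}\, \mathbb{E}^{\xi}_{N-1} |\langle u, P^R\theta^{N,N}\rangle_s|^2 + M\, \frac{1}{N^{3/2}}\, V_N \wedge 2 \Bigr) \Bigr], \tag{4.12}
\end{align*}

where $|V_N| \le \mathbb{E}^{\xi}_{N-1} |\langle u, P^R\theta^{N,N}\rangle_s|^3 \le M$, since by assumption $\mathbb{E}^{\xi}_{N-1} \|\theta^{N,N}\|_s^3 \le M$. We also have that

\[
\mathbb{E}^{\xi}_{N-1} |\langle u, P^R\theta^{N,N}\rangle_s|^2 = \langle u, P^R\mathcal{C}_s u\rangle_s + U^{N,N}, \qquad \lim_{N\to\infty} \mathbb{E}^{\pi^N} |U^{N,N}| = 0.
\]

Thus from (4.12) we deduce that

\begin{align*}
\mathbb{E}^{\pi^N}\Bigl( e^{\,i\langle \mathbf{t}, x^0\rangle_s + (i/\sqrt{N}) \sum_{k=1}^{N} \langle u, P^R\theta^{k,N}\rangle_s} \Bigr)
&= \mathbb{E}^{\pi^N}\Bigl( e^{\,i\langle \mathbf{t}, x^0\rangle_s + (i/\sqrt{N}) \sum_{k=1}^{N-1} \langle u, P^R\theta^{k,N}\rangle_s} \Bigl( 1 - \frac{1}{2N} \langle u, P^R\mathcal{C}_s u\rangle_s \Bigr) \Bigr) + S_N, \tag{4.13}\\
|S_N| &\le M\, \mathbb{E}^{\pi^N}\Bigl( \frac{1}{2N} |U^{N,N}| + \frac{1}{N^{3/2}} |V_N| \Bigr) \le M\, \frac{1}{N}\Bigl( \mathbb{E}^{\pi^N} |U^{N,N}| + \frac{1}{\sqrt{N}} \Bigr).
\end{align*}

Proceeding recursively we obtain

\[
\mathbb{E}^{\pi^N}\Bigl( e^{\,i\langle \mathbf{t}, x^0\rangle_s + (i/\sqrt{N}) \sum_{k=1}^{N} \langle u, P^R\theta^{k,N}\rangle_s} \Bigr)
= \mathbb{E}^{\pi^N}\Bigl( e^{\,i\langle \mathbf{t}, x^0\rangle_s} \Bigl( 1 - \frac{1}{2N} \langle u, P^R\mathcal{C}_s u\rangle_s \Bigr)^{N} \Bigr) + \sum_{k=1}^{N} S_k.
\]

By the stationarity of $\{U^{k,N}\}$ and the fact that $\mathbb{E}^{\pi} |U^{k,N}| \to 0$ as $N \to \infty$, from (4.13) it follows that

\[
\sum_{k=1}^{N} |S_k| \le M \sum_{k=1}^{N} \frac{1}{N}\Bigl( \mathbb{E}^{\pi^N} |U_k| + \frac{1}{\sqrt{N}} \Bigr) \le M \Bigl( \mathbb{E}^{\pi^N} |U_1| + \frac{1}{\sqrt{N}} \Bigr) \to 0.
\]

Thus we have shown that

\[
\mathbb{E}^{\pi^N}\Bigl( e^{\,i\langle \mathbf{t}, x^0\rangle_s} \Bigl( 1 - \frac{1}{2N} \langle u, P^R\mathcal{C}_s u\rangle_s \Bigr)^{N} \Bigr)
= \mathbb{E}^{\pi^N}\bigl[ e^{\,i\langle \mathbf{t}, x^0\rangle_s - (1/2)\langle u, P^R\mathcal{C}_s u\rangle_s} \bigr] + o(1),
\]

and the result follows from the fact that $\mathbb{E}^{\pi^N}[e^{\,i\langle \mathbf{t}, x^0\rangle_s}] \to \mathbb{E}^{\pi}[e^{\,i\langle \mathbf{t}, z^0\rangle_s}]$, finishing the proof of Lemma 4.3.

As a corollary of Lemma 4.3, we obtain the following.

Corollary 4.4. The pair $(x^0, W^N)$ converges weakly to $(z^0, W)$ in $C([0,T]; \mathcal{H}^s)$, where $W$ is a Brownian motion with covariance operator $\mathcal{C}_s$ and is independent of $z^0$ almost surely.

Proof.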
As mentioned before, it is enough to show that for any $\mathbf{t} \in \mathcal{H}^s$, $u \in X^R$, $R \in \mathbb{N}$ and $t > 0$,

\[
\lim_{N\to\infty} \mathbb{E}^{\pi^N}\Bigl( e^{\,i\langle \mathbf{t}, x^0\rangle_s + (i/\sqrt{N}) \sum_{k=1}^{\lfloor Nt \rfloor} \langle u, P^R\Gamma^{k,N}\rangle_s} \Bigr)
= \mathbb{E}^{\pi}\Bigl( e^{\,i\langle \mathbf{t}, z^0\rangle_s - (t/2)\langle u, P^R\mathcal{C}_s u\rangle_s} \Bigr). \tag{4.14}
\]

Now we verify the conditions of Lemma 4.3 to show (4.14). To verify the first hypothesis of Lemma 4.3, notice that from Proposition 2.1 we obtain that for $k \ge 1$,

\begin{align*}
\mathbb{E}^{\xi}_{k-1} |\langle u, P^R\Gamma^{k,N}\rangle_s|^2
&= \mathbb{E}^{\xi}_{k-1} \langle B_s u, P^R\, \Gamma^{k,N} \otimes \Gamma^{k,N} B_s u\rangle
= \langle u, P^R\mathcal{C}_s u\rangle_s + U^{k,N},\\
|U^{k,N}| &\le \frac{1}{2\ell^2\beta}\, M \sum_{l,j=1}^{R\wedge N} u_l u_j\, |\langle \phi_l, P^M E^{k,N}\phi_j\rangle_s|
+ \frac{N}{2\ell^2\beta}\, \|\mathbb{E}^{\xi}_{k-1}(x^k - x^{k-1})\|_s^2\, \|u\|_s^2\\
&\quad + |\langle u, P^R\mathcal{C}^N_s u\rangle_s - \langle u, P^R\mathcal{C}_s u\rangle_s|,
\end{align*}

where $\{E^{k,N}\}$ is as defined in (2.18). Because $\{\Gamma^{k,N}\}$ is stationary, we deduce that $\{U^{k,N}\}$ is stationary. From Proposition 2.1 we obtain

\[
\lim_{N\to\infty} \sum_{l,j=1}^{R\wedge N} \mathbb{E}^{\pi^N} |\langle \phi_l, P^M E^{k,N}\phi_j\rangle_s| = 0
\]

and $\mathbb{E}^{\pi^N} \frac{N}{2\ell^2\beta} \|\mathbb{E}^{\xi}_{k-1}(x^k - x^{k-1})\|_s^2 \to 0$ by the calculation in (4.8). Thus we have shown that $\mathbb{E}^{\pi} |U^{1,N}| \to 0$ as $N \to \infty$. The second hypothesis of Lemma 4.3 is easily verified since $\mathbb{E}^{\xi}_{k-1} \|\Gamma^{k,N}\|_s^3 \le M\, \mathbb{E}^{\xi}_{k-1} \|\mathcal{C}^{1/2}\xi^k\|_s^3 \le M$. Thus the corollary follows from Lemma 4.3.

Thus we have shown that $(x^0, W^N)$ converges weakly to $(z^0, W)$, where $W$ is a Brownian motion in $\mathcal{H}^s$ with covariance operator $\mathcal{C}_s$, and by the above corollary we see that $W$ is independent of $x^0$ almost surely. This proves the two claims made in Proposition 2.2 and the proof is complete.

5. Mean drift and diffusion: Proof of Proposition 2.1. To prove this key proposition we make the standing Assumptions 3.1, 3.4 from Section 3.1 without explicit statement of this fact within the individual lemmas. We start with several preliminary bounds and then consider the drift and diffusion terms, respectively.

5.1. Preliminary estimates.
Recall the definitions of $R(x,\xi)$, $R_i(x,\xi)$ and $R_{ij}(x,\xi)$ from equations (2.38), (2.39) and (2.47), respectively. These quantities were introduced so that the term $Q(x,\xi)$ in the exponential of the acceptance probability could be replaced with $R_i(x,\xi)$ and $R_{ij}(x,\xi)$, to take advantage of the fact that, conditional on $x$, $R_i(x,\xi)$ is independent of $\xi_i$ and $R_{ij}(x,\xi)$ is independent of $\xi_i, \xi_j$. In the next lemma, we estimate the additional error due to this replacement of $Q(x,\xi)$. Recall that $\mathbb{E}^{\xi}_0$ denotes expectation with respect to $\xi = \xi^0$ as in Section 2.2.

Lemma 5.1.

\begin{align*}
\mathbb{E}^{\xi}_0 |Q(x,\xi) - R_i(x,\xi)|^2 &\le \frac{M}{N}\bigl(1 + |\zeta_i|^2\bigr), \tag{5.1}\\
\mathbb{E}^{\xi}_0 \bigl(Q(x,\xi) - R_{ij}(x,\xi)\bigr)^2 &\le \frac{M}{N}\bigl(1 + |\zeta_i|^2 + |\zeta_j|^2\bigr). \tag{5.2}
\end{align*}

Proof. Since the $\xi_j$ are i.i.d. $\mathrm{N}(0,1)$, using (2.1) and (3.1), we obtain that

\[
\mathbb{E}\|\mathcal{C}^{1/2}\xi\|_s^4 \le 3\bigl(\mathbb{E}\|\mathcal{C}^{1/2}\xi\|_s^2\bigr)^2 \le M \Bigl( \sum_{j=1}^{\infty} j^{2s-2\kappa} \Bigr)^2 < \infty \tag{5.3}
\]

since $s < \kappa - \frac12$. Starting from (2.40), the estimates in (2.32) and (5.3) imply that

\begin{align*}
\mathbb{E}^{\xi}_0 |Q(x,\xi) - R_i(x,\xi)|^2
&\le M \Bigl( \mathbb{E}^{\xi}_0 |r(x,\xi)|^2 + \frac{1}{N}\, \mathbb{E}^{\xi}_0\, \zeta_i^2 \xi_i^2 + \frac{1}{N^2}\, \mathbb{E}\, \xi_i^4 \Bigr)\\
&\le M \Bigl( \frac{1}{N^2}\, \mathbb{E}\|\mathcal{C}^{1/2}\xi\|_s^4 + \frac{1}{N}\, \zeta_i^2 + \frac{3}{N^2} \Bigr)
\le M\, \frac{1}{N}\bigl(1 + \zeta_i^2\bigr),
\end{align*}

verifying the first part of the lemma. A very similar argument for the second part finishes the proof.

The random variables $R(x,\xi)$, $R_i(x,\xi)$ and $R_{ij}(x,\xi)$ are approximately Gaussian random variables. Indeed it can be readily seen that

\[
R(x,\xi) \approx \mathrm{N}\Bigl(-\ell^2,\ \frac{2\ell^2}{N}\, \|\zeta\|^2\Bigr).
\]

The next lemma contains a crucial observation. We show that the sequence of random variables $\{\|\zeta\|^2 / N\}$ converges to 1 almost surely under both $\pi_0$ and $\pi$. Thus $R(x,\xi)$ converges almost surely to $Z_\ell \overset{\mathrm{def}}{=} \mathrm{N}(-\ell^2, 2\ell^2)$ and thus the expected acceptance probability $\mathbb{E}\,\alpha(x,\xi) = \mathbb{E}(1 \wedge e^{Q(x,\xi)})$ converges to $\beta = \mathbb{E}(1 \wedge e^{Z_\ell})$.

Lemma 5.2.
As $N \to \infty$ we have

\[
\frac{1}{N}\|\zeta\|^2 \to 1, \quad \pi_0\text{-a.s.} \qquad \text{and} \qquad \frac{1}{N}\|\zeta\|^2 \to 1, \quad \pi\text{-a.s.} \tag{5.4}
\]

Furthermore, for any $m \in \mathbb{N}$, $\alpha \ge 2$, $s < \kappa - \frac12$ and for any $c \ge 0$,

\[
\limsup_{N\in\mathbb{N}}\ \mathbb{E}^{\pi^N} \sum_{j=1}^{N} \lambda_j^{\alpha}\, j^{2s}\, |\zeta_j|^m\, e^{(c/N)\|\zeta\|^2} < \infty. \tag{5.5}
\]

Finally, we have

\[
\lim_{N\to\infty} \mathbb{E}^{\pi^N}\Bigl( \Bigl| 1 - \frac{1}{N}\|\zeta\|^2 \Bigr|^2 \Bigr) = 0. \tag{5.6}
\]

Proof. The proof proceeds by showing the conclusions first in the case when $x \overset{\mathcal{D}}{\sim} \pi_0$; this is easier because the finite-dimensional distributions are Gaussian and by Fernique's theorem $x$ has exponential moments. Next we notice that the almost sure properties are preserved under the change of measure $\pi$. To show the convergence of moments, we use our hypothesis that the Radon–Nikodym derivative $\frac{d\pi^N}{d\pi_0}$ is bounded from above independently of $N$, as shown in Lemma 3.5, equation (3.8).

Indeed, first let $x \overset{\mathcal{D}}{\sim} \pi_0$. Recall that $\zeta = \mathcal{C}^{-1/2}(P^N x) + \mathcal{C}^{1/2}\nabla\Psi^N(x)$ and

\[
\|\nabla\Psi^N(x)\|_{-s} \le M_3\bigl(1 + \|x\|_s\bigr). \tag{5.7}
\]

Using (3.6) and the fact that $s < \kappa - \frac12$, so that $-\kappa < -s$, we deduce that $\|\mathcal{C}^{1/2}\nabla\Psi^N(x)\| \asymp \|\nabla\Psi^N(x)\|_{-\kappa} \le \|\nabla\Psi^N(x)\|_{-s} \le M(1 + \|x\|_s)$ uniformly in $N$. Also, since $x$ is Gaussian under $\pi_0$, from (2.4), we may write $\mathcal{C}^{-1/2}(P^N x) = \sum_{k=1}^N \rho_k \phi_k$, where the $\rho_k$ are i.i.d. $\mathrm{N}(0,1)$. Note that

\begin{align*}
\frac{1}{N}\|\zeta\|^2
&= \frac{1}{N}\, \|\mathcal{C}^{-1/2}(P^N x) + \mathcal{C}^{1/2}\nabla\Psi^N(x)\|^2\\
&= \frac{1}{N}\bigl( \|\mathcal{C}^{-1/2}(P^N x)\|^2 + 2\langle \mathcal{C}^{-1/2}(P^N x), \mathcal{C}^{1/2}\nabla\Psi^N(x)\rangle + \|\mathcal{C}^{1/2}\nabla\Psi^N(x)\|^2 \bigr) \tag{5.8}\\
&= \frac{1}{N}\bigl( \|\mathcal{C}^{-1/2}(P^N x)\|^2 + 2\langle P^N x, \nabla\Psi^N(x)\rangle + \|\mathcal{C}^{1/2}\nabla\Psi^N(x)\|^2 \bigr)\\
&= \frac{1}{N} \sum_{k=1}^N \rho_k^2 + \gamma,
\end{align*}

where

\begin{align*}
|\gamma| &\le \frac{1}{N}\bigl( 2\|x\|_s \|\nabla\Psi^N(x)\|_{-s} + \|\mathcal{C}^{1/2}\nabla\Psi^N(x)\|^2 \bigr) \tag{5.9}\\
&\le \frac{M}{N}\bigl( 2\|x\|_s(1 + \|x\|_s) + (1 + \|x\|_s)^2 \bigr).
\end{align*}

Under $\pi_0$, we have $\|x\|_s < \infty$ a.s. for $s < \kappa - \frac12$ and hence, by (5.9), we conclude that $|\gamma| \to 0$ almost surely as $N \to \infty$.
Now, by the strong law of large numbers, $\frac{1}{N}\sum_{k=1}^N \rho_k^2 \to 1$ almost surely. Hence, from (5.8) we obtain that under $\pi_0$, $\lim_{N\to\infty} \frac{1}{N}\|\zeta\|^2 = 1$ almost surely, proving the first equation in (5.4). Now the second equation in (5.4) follows by noting that almost sure limits are preserved under an (absolutely continuous) change of measure.

Next, notice that by (5.8) and the Cauchy–Schwarz inequality, for any $c > 0$,

\[
\bigl(\mathbb{E}^{\pi_0} e^{(c/N)\|\zeta\|^2}\bigr)^2
\le \bigl(\mathbb{E}^{\pi_0} e^{(2c/N)\sum \rho_k^2}\bigr)\bigl(\mathbb{E}^{\pi_0} e^{2c\gamma}\bigr)
\le \bigl(\mathbb{E}^{\pi_0} e^{(2c/N)\sum \rho_k^2}\bigr)\bigl(\mathbb{E}^{\pi_0} e^{(M/N)\|x\|_s^2}\bigr).
\]

Using the fact that $\sum_{k=1}^N \rho_k^2$ has a chi-squared distribution with $N$ degrees of freedom gives

\[
\bigl(\mathbb{E}^{\pi_0} e^{(c/N)\|\zeta\|^2}\bigr)^2 \le M\, e^{-(N/2)\log(1 - 4c/N)}\, \bigl(\mathbb{E}^{\pi_0} e^{(M/N)\|x\|_s^2}\bigr) \le M, \tag{5.10}
\]

where the last inequality follows from Fernique's theorem since $\mathbb{E}^{\pi_0} e^{(M/N)\|x\|_s^2} < \infty$ for sufficiently large $N$. Hence, by applying Lemma 3.5, equation (3.8), it follows that $\limsup_{N\to\infty} \mathbb{E}^{\pi^N} e^{(c/N)\|\zeta\|^2} < \infty$. Notice that we also have the bound

\[
|\zeta_k|^m \le M\bigl( |\rho_k|^m + |\lambda_k|^m (1 + \|x\|_s^m) \bigr).
\]

Since $s < \kappa - 1/2$, we have that $\sum_{j=1}^{\infty} \lambda_j^2 j^{2s} < \infty$ and therefore, it follows that for $\alpha \ge 2$,

\[
\limsup_{N\to\infty} \sum_{k=1}^{N} \bigl( \mathbb{E}^{\pi^N} \lambda_k^{2\alpha}\, k^{2s}\, |\zeta_k|^{2m} \bigr)^{1/2} < \infty. \tag{5.11}
\]

Hence the claim in (5.5) follows from applying Cauchy–Schwarz combined with (5.10) and (5.11).

Similarly, a straightforward calculation yields that $\mathbb{E}^{\pi_0}\bigl( |1 - \frac{1}{N}\|\zeta\|^2|^2 \bigr) \le \frac{M}{N}$. Hence, again by Lemma 3.5,

\[
\lim_{N\to\infty} \mathbb{E}^{\pi^N}\Bigl( \Bigl| 1 - \frac{1}{N}\|\zeta\|^2 \Bigr|^2 \Bigr) = 0,
\]

proving the last claim, and the proof is complete.

Recall that $Q(x,\xi) = R(x,\xi) - r(x,\xi)$. Thus, from (2.32) and Lemma 5.1 it follows that $R_i(x,\xi)$ and $R_{ij}(x,\xi)$ also are approximately Gaussian. Therefore, the conclusion of Lemma 5.2 leads to the reasoning that, for any fixed realization of $x \overset{\mathcal{D}}{\sim} \pi$, the random variables $R(x,\xi)$, $R_i(x,\xi)$ and $R_{ij}(x,\xi)$ all converge to the same weak limit $Z_\ell \sim \mathrm{N}(-\ell^2, 2\ell^2)$ as the dimension of the noise $\xi$ goes to $\infty$. In the rest of this subsection, we make this argument rigorous by deriving a Berry–Esseen bound for the weak convergence of $R(x,\xi)$ to $Z_\ell$.

For this purpose, it is natural and convenient to obtain these bounds in the Wasserstein metric. Recall that the Wasserstein distance $\operatorname{Wass}(X,Y)$ between two random variables is defined by

\[
\operatorname{Wass}(X,Y) \overset{\mathrm{def}}{=} \sup_{f \in \mathrm{Lip}_1} \mathbb{E}\bigl(f(X) - f(Y)\bigr),
\]

where $\mathrm{Lip}_1$ is the class of 1-Lipschitz functions. The following lemma gives a bound for the Wasserstein distance between $R(x,\xi)$ and $Z_\ell$.

Lemma 5.3. Almost surely with respect to $x \sim \pi$,

\begin{align*}
\operatorname{Wass}(R(x,\xi), Z_\ell) &\le M \Bigl( \frac{1}{N^{3/2}} \sum_{j=1}^{N} |\zeta_j|^3 + \Bigl| 1 - \frac{\|\zeta\|^2}{N} \Bigr| + \frac{1}{\sqrt{N}} \Bigr), \tag{5.12}\\
\operatorname{Wass}(R(x,\xi), R_i(x,\xi)) &\le \frac{M}{\sqrt{N}}\bigl( |\zeta_i| + 1 \bigr). \tag{5.13}
\end{align*}

Proof. Define the Gaussian random variable $G \overset{\mathrm{def}}{=} -\sqrt{\frac{2\ell^2}{N}} \sum_{k=1}^{N} \zeta_k \xi_k - \ell^2$. For any 1-Lipschitz function $f$,

\[
\bigl| \mathbb{E}^{\xi}\bigl( f(G) - f(R(x,\xi)) \bigr) \bigr| \le \ell^2\, \mathbb{E}^{\xi} \Bigl| 1 - \frac{1}{N} \sum_{k=1}^{N} \xi_k^2 \Bigr| < M\, \frac{1}{\sqrt{N}},
\]

implying that $\operatorname{Wass}(G, R(x,\xi)) \le M \frac{1}{\sqrt{N}}$. Now, from classical Berry–Esseen estimates (see [26]), we have that

\[
\operatorname{Wass}(G, Z_\ell) \le M\, \frac{1}{N^{3/2}} \sum_{j=1}^{N} |\zeta_j|^3 + M \Bigl| 1 - \frac{\|\zeta\|^2}{N} \Bigr|.
\]

Hence the proof of the first claim follows from the triangle inequality. To see the second claim, notice that for any 1-Lipschitz function $f$ we have

\[
\mathbb{E}^{\xi}_0 |f(R(x,\xi)) - f(R_i(x,\xi))| \le \mathbb{E}^{\xi}_0 |R(x,\xi) - R_i(x,\xi)| \le M\, \frac{1}{\sqrt{N}}\bigl(1 + |\zeta_i|\bigr)
\]

and the proof is complete.

Hence, from equations (5.13) and (5.12), we obtain

\[
\operatorname{Wass}(R_i(x,\xi), Z_\ell) \le M \Bigl( \frac{1}{\sqrt{N}}\bigl(|\zeta_i| + 1\bigr) + \frac{1}{N^{3/2}} \sum_{j=1}^{N} |\zeta_j|^3 + \Bigl| 1 - \frac{\|\zeta\|^2}{N} \Bigr| \Bigr). \tag{5.14}
\]
We conclude this section with the following observation, which will be used later. Recall the Kolmogorov–Smirnov (KS) distance between two random variables $(W, Z)$:

\[
\operatorname{KS}(W, Z) \overset{\mathrm{def}}{=} \sup_{t\in\mathbb{R}} |\mathbb{P}(W \le t) - \mathbb{P}(Z \le t)|. \tag{5.15}
\]

Lemma 5.4. If a random variable $Z$ has a density with respect to the Lebesgue measure, bounded by a constant $M$, then

\[
\operatorname{KS}(W, Z) \le \sqrt{4M \operatorname{Wass}(W, Z)}. \tag{5.16}
\]

We could not find a reference for the above in any published literature, so we include a short proof here, which was taken from the unpublished lecture notes [10].

Proof of Lemma 5.4. Fix $t \in \mathbb{R}$ and $\epsilon > 0$. Define two functions $g_1$ and $g_2$ as $g_1(y) = 1$ for $y \in (-\infty, t]$, $g_1(y) = 0$ for $y \in [t + \epsilon, \infty)$ and linear interpolation in between. Similarly, define $g_2(y) = 1$ for $y \in (-\infty, t - \epsilon]$, $g_2(y) = 0$ for $y \in [t, \infty)$ and linear interpolation in between. Then $g_1$ and $g_2$ form upper and lower envelopes for the function $\mathbf{1}_{(-\infty, t]}(y)$. So

\[
\mathbb{P}(W \le t) - \mathbb{P}(Z \le t) \le \mathbb{E} g_1(W) - \mathbb{E} g_1(Z) + \mathbb{E} g_1(Z) - \mathbb{P}(Z \le t).
\]

Since $g_1$ is $\frac{1}{\epsilon}$-Lipschitz, we have $\mathbb{E} g_1(W) - \mathbb{E} g_1(Z) \le \frac{1}{\epsilon} \operatorname{Wass}(W, Z)$, and $\mathbb{E} g_1(Z) - \mathbb{P}(Z \le t) \le M\epsilon$ since $Z$ has density bounded by $M$. Similarly, using the function $g_2$, it follows that the same bound holds for the difference $\mathbb{P}(Z \le t) - \mathbb{P}(W \le t)$. Optimizing over $\epsilon$ (the choice $\epsilon = \sqrt{\operatorname{Wass}(W,Z)/M}$ gives $2\sqrt{M \operatorname{Wass}(W,Z)}$) yields the required bound.

5.2. Rigorous estimates for the drift: Proof of Proposition 2.1, equation (2.14). In the following series of lemmas we retrace the arguments from Section 2.6 while deriving explicit bounds for the error terms. Lemma 5.11 at the end of the section gives control of the error terms. The following lemma shows that $Q(x,\xi)$ is well approximated by $R_i(x,\xi) - \sqrt{\frac{2\ell^2}{N}}\, \zeta_i \xi_i$, as indicated in (2.40).

Lemma 5.5.
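\[
N\, \mathbb{E}_0(x^1_i - x_i) = \lambda_i \sqrt{2\ell^2 N}\ \mathbb{E}^{\xi}_0\Bigl( \bigl(1 \wedge e^{R_i(x,\xi) - \sqrt{2\ell^2/N}\, \zeta_i \xi_i}\bigr)\, \xi_i \Bigr) + \omega_0(i),
\qquad |\omega_0(i)| \le \frac{M}{\sqrt{N}}\, \lambda_i.
\]

Proof. We have

\begin{align*}
N\, \mathbb{E}_0(x^1_i - x^0_i) &= N\, \mathbb{E}_0\bigl(\gamma^0 (y^0_i - x_i)\bigr)
= N\, \mathbb{E}^{\xi}_0\Bigl( \alpha(x,\xi) \sqrt{\frac{2\ell^2}{N}}\, (\mathcal{C}^{1/2}\xi)_i \Bigr)\\
&= \lambda_i \sqrt{2\ell^2 N}\ \mathbb{E}^{\xi}_0\bigl(\alpha(x,\xi)\, \xi_i\bigr)
= \lambda_i \sqrt{2\ell^2 N}\ \mathbb{E}^{\xi}_0\bigl( (1 \wedge e^{Q(x,\xi)})\, \xi_i \bigr).
\end{align*}

Now we observe that

\[
\mathbb{E}^{\xi}_0\bigl( (1 \wedge e^{Q(x,\xi)})\, \xi_i \bigr) = \mathbb{E}^{\xi}_0\Bigl( \bigl(1 \wedge e^{R_i(x,\xi) - \sqrt{2\ell^2/N}\, \xi_i \zeta_i}\bigr)\, \xi_i \Bigr) + \frac{\omega_0(i)}{\lambda_i \sqrt{2\ell^2 N}}.
\]

By (2.32) and (2.40),

\[
\Bigl| Q(x,\xi) - R_i(x,\xi) + \sqrt{\frac{2\ell^2}{N}}\, \zeta_i \xi_i \Bigr|^2 \le \frac{M}{N^2}\bigl( |\xi_i|^4 + \|\mathcal{C}^{1/2}\xi\|_s^4 \bigr). \tag{5.17}
\]

Noticing that the map $y \mapsto 1 \wedge e^y$ is Lipschitz, we obtain

\begin{align*}
|\omega_0(i)| &\le M \lambda_i \sqrt{N}\ \mathbb{E}^{\xi}_0 \Bigl| \Bigl( (1 \wedge e^{Q(x,\xi)}) - \bigl(1 \wedge e^{R_i(x,\xi) - \sqrt{2\ell^2/N}\, \xi_i \zeta_i}\bigr) \Bigr)\, \xi_i \Bigr|\\
&\le M \lambda_i \sqrt{N}\ \Bigl[ \mathbb{E}^{\xi}_0 \Bigl| Q(x,\xi) - R_i(x,\xi) + \sqrt{\frac{2\ell^2}{N}}\, \xi_i \zeta_i \Bigr|^2 \Bigr]^{1/2} \bigl[ \mathbb{E}^{\xi}_0 (\xi_i)^2 \bigr]^{1/2}
\le \frac{M}{\sqrt{N}}\, \lambda_i,
\end{align*}

where the last inequality follows from (5.17), and the proof is complete.

The next lemma takes advantage of the fact that $R_i(x,\xi)$ is independent of $\xi_i$ conditional on $x$. Thus, using the identity (2.36), we obtain the bound for the approximation made in (2.41).

Lemma 5.6.

\begin{align*}
\mathbb{E}^{\xi}_0\Bigl( \bigl(1 \wedge e^{R_i(x,\xi) - \sqrt{2\ell^2/N}\, \zeta_i \xi_i}\bigr)\, \xi_i \Bigr)
&= -\sqrt{\frac{2\ell^2}{N}}\, \zeta_i\ \mathbb{E}^{\xi_i^-}_0 \Bigl[ e^{R_i(x,\xi) + \ell^2 \zeta_i^2 / N}\ \Phi\Bigl( \frac{-R_i(x,\xi)}{\sqrt{2\ell^2/N}\, |\zeta_i|} \Bigr) \Bigr] + \omega_1(i), \tag{5.18}\\
|\omega_1(i)| &\le M |\zeta_i|^2\, \frac{1}{N}\, e^{(\ell^2/N)\|\zeta\|^2}.
\end{align*}

Proof. Applying (2.36) with $a = -\sqrt{\frac{2\ell^2}{N}}\, \zeta_i$, $z = \xi_i$ and $b = R_i(x,\xi)$, we obtain the identity

\[
\mathbb{E}^{\xi}_0\Bigl( \bigl(1 \wedge e^{R_i(x,\xi) - \sqrt{2\ell^2/N}\, \xi_i \zeta_i}\bigr)\, \xi_i \Bigr)
= -\sqrt{\frac{2\ell^2}{N}}\, \zeta_i\ \mathbb{E}^{\xi_i^-}_0 \Bigl[ e^{R_i(x,\xi) + (\ell^2/N)\zeta_i^2}\ \Phi\Bigl( \frac{-R_i(x,\xi)}{\sqrt{2\ell^2/N}\, |\zeta_i|} - \sqrt{\frac{2\ell^2}{N}}\, |\zeta_i| \Bigr) \Bigr]. \tag{5.19}
\]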
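The identity (2.36) is not restated in this part of the paper. Reading it off from (5.19), it appears to be the Gaussian formula $\mathbb{E}\bigl[z\,(1 \wedge e^{az+b})\bigr] = a\, e^{a^2/2 + b}\, \Phi\bigl(-b/|a| - |a|\bigr)$ for $z \sim \mathrm{N}(0,1)$ and real $a \ne 0$, $b$. The Python sketch below checks that inferred form by plain trapezoidal quadrature. It is a sanity check under that assumption, not the paper's derivation, and the function names are hypothetical.

```python
import math


def Phi(x):
    """Standard normal CDF, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def stein_lhs(a, b, half_width=12.0, n=200_000):
    """Trapezoidal approximation of E[z * min(1, exp(a z + b))], z ~ N(0,1).

    min(1, e^y) is evaluated as exp(min(y, 0)) to avoid overflow.
    """
    h = 2.0 * half_width / n
    total = 0.0
    for k in range(n + 1):
        z = -half_width + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * z * math.exp(min(a * z + b, 0.0)) * phi(z)
    return total * h


def stein_rhs(a, b):
    """Closed form inferred from (5.19): a e^{a^2/2 + b} Phi(-b/|a| - |a|)."""
    return a * math.exp(0.5 * a * a + b) * Phi(-b / abs(a) - abs(a))


if __name__ == "__main__":
    # In the paper's setting a = -sqrt(2 l^2 / N) * zeta_i and b = R_i(x, xi);
    # the values below are arbitrary illustrative choices.
    for a, b in [(0.5, -0.3), (-0.8, 0.4), (-0.1, -1.0)]:
        print(a, b, stein_lhs(a, b), stein_rhs(a, b))
```

With $a = -\sqrt{2\ell^2/N}\,\zeta_i$ one has $a^2/2 = \ell^2\zeta_i^2/N$, which is the exponent correction appearing in (5.18) and (5.19).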
Now we observe that

\begin{align*}
\mathbb{E}^{\xi_i^-}_0\, e^{R_i(x,\xi) + \ell^2 \zeta_i^2 / N}
&= \mathbb{E}^{\xi_i^-}_0 \Bigl( e^{-\sqrt{2\ell^2/N} \sum_{j=1, j\ne i}^{N} \zeta_j \xi_j - (\ell^2/N) \sum_{j=1, j\ne i}^{N} \xi_j^2 + (\ell^2/N)\zeta_i^2} \Bigr) \tag{5.20}\\
&\le \mathbb{E}^{\xi_i^-}_0 \Bigl( e^{-\sqrt{2\ell^2/N} \sum_{j=1, j\ne i}^{N} \zeta_j \xi_j + (\ell^2/N)\zeta_i^2} \Bigr) = e^{(\ell^2/N)\|\zeta\|^2}.
\end{align*}

Since $\Phi$ is globally Lipschitz, it follows that

\begin{align*}
&\mathbb{E}^{\xi_i^-}_0 \Bigl[ e^{R_i(x,\xi) + \ell^2 \zeta_i^2 / N}\ \Phi\Bigl( \frac{-R_i(x,\xi)}{\sqrt{2\ell^2/N}\, |\zeta_i|} - \sqrt{\frac{2\ell^2}{N}}\, |\zeta_i| \Bigr) \Bigr]\\
&\qquad = \mathbb{E}^{\xi_i^-}_0 \Bigl[ e^{R_i(x,\xi) + \ell^2 \zeta_i^2 / N}\ \Phi\Bigl( \frac{-R_i(x,\xi)}{\sqrt{2\ell^2/N}\, |\zeta_i|} \Bigr) \Bigr] + \omega_1(i), \tag{5.21}\\
&|\omega_1(i)| \le M |\zeta_i|\, \frac{1}{\sqrt{N}}\ \mathbb{E}^{\xi_i^-}_0\, e^{R_i(x,\xi) + \ell^2 \zeta_i^2 / N} \le M |\zeta_i|\, \frac{1}{\sqrt{N}}\, e^{(\ell^2/N)\|\zeta\|^2},
\end{align*}

where the last estimate follows from (5.20). The lemma follows from (5.19) and (5.21).

The next few lemmas are technical and give quantitative bounds for the approximations in (2.43) and (2.44).

Lemma 5.7.

\begin{align*}
\mathbb{E}^{\xi_i^-}_0 \Bigl[ e^{R_i(x,\xi) + \ell^2 \zeta_i^2 / N}\ \Phi\Bigl( \frac{-R_i(x,\xi)}{\sqrt{2\ell^2/N}\, |\zeta_i|} \Bigr) \Bigr]
&= \mathbb{E}^{\xi_i^-}_0 \Bigl[ e^{R_i(x,\xi) + \ell^2 \zeta_i^2 / N}\ \mathbf{1}_{R_i(x,\xi) < 0} \Bigr] + \omega_2(i),\\
|\omega_2(i)| &\le M\, e^{(2\ell^2/N)\|\zeta\|^2}\, \bigl(|\zeta_i| + 1\bigr) \Bigl[ \mathbb{E}^{\xi}_0\, \frac{1}{(1 + |R(x,\xi)|\sqrt{N})^2} \Bigr]^{1/4}.
\end{align*}

Proof. We first prove the following lemma needed for the proof.

Lemma 5.8. Let $\phi(\cdot)$ and $\Phi(\cdot)$ denote the pdf and CDF of the standard normal distribution, respectively. Then we have:

(1) for any $x \in \mathbb{R}$, $|\Phi(-x) - \mathbf{1}_{x<0}| = |1 - \Phi(|x|)|$;

(2) for any $x > 0$ and $\epsilon \ge 0$, $1 - \Phi(x) \le \frac{1+\epsilon}{x+\epsilon}$.

Proof. For the first claim, notice that if $x > 0$, $|\Phi(-x) - \mathbf{1}_{x<0}| = |\Phi(-x)| = |1 - \Phi(|x|)|$. If $x < 0$, $|\Phi(-x) - \mathbf{1}_{x<0}| = |1 - \Phi(|x|)|$ and the claim follows. For the second claim,

\[
1 - \Phi(x) = \int_x^{\infty} \phi(u)\, du \le \int_x^{\infty} \frac{u + \epsilon}{x + \epsilon}\, \phi(u)\, du \le \frac{\phi(x) + \epsilon}{x + \epsilon} \le \frac{1 + \epsilon}{x + \epsilon}
\]

since $\int_{-\infty}^{\infty} \phi(u)\, du = 1$.

We now proceed to the proof of Lemma 5.7.
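Both claims of Lemma 5.8 are elementary and can be spot-checked numerically. The Python sketch below evaluates them on a finite grid; the grid values and function names are arbitrary illustrative choices, and a grid check is of course no substitute for the proof above.

```python
import math


def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def claim1_gap(x):
    """| |Phi(-x) - 1_{x<0}| - |1 - Phi(|x|)| |, zero up to rounding."""
    indicator = 1.0 if x < 0 else 0.0
    return abs(abs(Phi(-x) - indicator) - abs(1.0 - Phi(abs(x))))


def claim2_chain(x, eps):
    """Return the three quantities in the chain used for claim (2):
    1 - Phi(x) <= (phi(x) + eps) / (x + eps) <= (1 + eps) / (x + eps)."""
    return 1.0 - Phi(x), (phi(x) + eps) / (x + eps), (1.0 + eps) / (x + eps)


if __name__ == "__main__":
    xs = [0.01 * k for k in range(1, 1001)]   # x in (0, 10]
    epsilons = [0.0, 0.1, 1.0, 5.0]
    worst_gap = max(claim1_gap(s * x) for x in xs for s in (-1.0, 1.0))
    ok = all(
        t <= m + 1e-15 and m <= u + 1e-15
        for x in xs
        for eps in epsilons
        for (t, m, u) in [claim2_chain(x, eps)]
    )
    print("claim (1) max gap:", worst_gap)
    print("claim (2) holds on grid:", ok)
```

In the proof of Lemma 5.7, claim (2) is applied with $x = |R_i(x,\xi)|\sqrt{N}/(\sqrt{2}\,\ell|\zeta_i|)$ and $\epsilon = 1/(\sqrt{2}\,\ell|\zeta_i|)$, which turns the right-hand side into $(1 + \sqrt{2}\,\ell|\zeta_i|)/(1 + |R_i(x,\xi)|\sqrt{N})$.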
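By Cauchy–Schwarz and an estimate similar to (5.20),

\begin{align*}
|\omega_2(i)| &\le \mathbb{E}^{\xi_i^-}_0 \Bigl| e^{R_i(x,\xi) + (\ell^2/N)\zeta_i^2} \Bigl( \mathbf{1}_{R_i(x,\xi) < 0} - \Phi\Bigl( \frac{-R_i(x,\xi)}{\sqrt{2\ell^2/N}\, |\zeta_i|} \Bigr) \Bigr) \Bigr|\\
&\le \bigl[ \mathbb{E}^{\xi_i^-}_0\, e^{2R_i(x,\xi) + (2\ell^2/N)\zeta_i^2} \bigr]^{1/2} \Bigl[ \mathbb{E}^{\xi_i^-}_0 \Bigl| \mathbf{1}_{R_i(x,\xi) < 0} - \Phi\Bigl( \frac{-R_i(x,\xi)}{\sqrt{2\ell^2/N}\, |\zeta_i|} \Bigr) \Bigr|^2 \Bigr]^{1/2} \tag{5.22}\\
&\le M\, e^{(2\ell^2/N)\|\zeta\|^2} \Bigl[ \mathbb{E}^{\xi_i^-}_0 \Bigl| \mathbf{1}_{R_i(x,\xi) < 0} - \Phi\Bigl( \frac{-R_i(x,\xi)}{\sqrt{2\ell^2/N}\, |\zeta_i|} \Bigr) \Bigr|^2 \Bigr]^{1/2}\\
&\le M\, e^{(2\ell^2/N)\|\zeta\|^2} \Bigl[ \mathbb{E}^{\xi_i^-}_0 \Bigl| \mathbf{1}_{R_i(x,\xi) < 0} - \Phi\Bigl( \frac{-R_i(x,\xi)}{\sqrt{2\ell^2/N}\, |\zeta_i|} \Bigr) \Bigr| \Bigr]^{1/2},
\end{align*}

where the last two observations follow from the computation done in (5.20) and the fact that $\bigl| \mathbf{1}_{R_i(x,\xi) < 0} - \Phi\bigl( \frac{-R_i(x,\xi)}{\sqrt{2\ell^2/N}\, |\zeta_i|} \bigr) \bigr| < 1$. By applying Lemma 5.8 with $\epsilon = \frac{1}{\sqrt{2}\,\ell |\zeta_i|}$,

\begin{align*}
\Bigl| \mathbf{1}_{R_i(x,\xi) < 0} - \Phi\Bigl( \frac{-R_i(x,\xi)}{\sqrt{2\ell^2/N}\, |\zeta_i|} \Bigr) \Bigr|
&= 1 - \Phi\Bigl( \frac{|R_i(x,\xi)|}{\sqrt{2\ell^2/N}\, |\zeta_i|} \Bigr)
= 1 - \Phi\Bigl( \frac{|R_i(x,\xi)| \sqrt{N}}{\sqrt{2}\,\ell\, |\zeta_i|} \Bigr) \tag{5.23}\\
&\le \bigl(1 + \sqrt{2}\,\ell\, |\zeta_i|\bigr)\, \frac{1}{1 + |R_i(x,\xi)| \sqrt{N}}.
\end{align*}

The right-hand side of the estimate (5.23) depends on $i$, but we need estimates which are independent of $i$. In the next lemma, we replace $R_i(x,\xi)$ by $R(x,\xi)$ and control the extra error term.

Lemma 5.9.

\[
\mathbb{E}^{\xi_i^-}_0\, \frac{1}{1 + |R_i(x,\xi)| \sqrt{N}} \le M \bigl(1 + |\zeta_i|\bigr) \Bigl[ \mathbb{E}^{\xi}_0\, \frac{1}{(1 + |R(x,\xi)| \sqrt{N})^2} \Bigr]^{1/2}. \tag{5.24}
\]

Proof. We write

\begin{align*}
\mathbb{E}^{\xi_i^-}_0\, \frac{1}{1 + |R_i(x,\xi)| \sqrt{N}}
&= \mathbb{E}^{\xi}_0\, \frac{1}{1 + |R_i(x,\xi)| \sqrt{N}}
= \mathbb{E}^{\xi}_0\, \frac{1}{1 + |R(x,\xi)| \sqrt{N}} + \gamma \tag{5.25}\\
&\le \Bigl[ \mathbb{E}^{\xi}_0\, \frac{1}{(1 + |R(x,\xi)| \sqrt{N})^2} \Bigr]^{1/2} + \gamma,
\end{align*}

\begin{align*}
|\gamma| &\le \mathbb{E}^{\xi}_0 \Bigl| \frac{1}{1 + |R_i(x,\xi)| \sqrt{N}} - \frac{1}{1 + |R(x,\xi)| \sqrt{N}} \Bigr|\\
&\le \mathbb{E}^{\xi}_0\, \frac{\sqrt{2}\,\ell\, |\zeta_i| |\xi_i| + (\ell^2/\sqrt{N})\, \xi_i^2}{(1 + |R_i(x,\xi)| \sqrt{N})(1 + |R(x,\xi)| \sqrt{N})} \tag{5.26}\\
&\le \mathbb{E}^{\xi}_0\, \frac{\sqrt{2}\,\ell\, |\zeta_i| |\xi_i| + (\ell^2/\sqrt{N})\, \xi_i^2}{1 + |R(x,\xi)| \sqrt{N}}\\
&\le M \bigl(|\zeta_i| + 1\bigr) \Bigl[ \mathbb{E}^{\xi}_0\, \frac{1}{(1 + |R(x,\xi)| \sqrt{N})^2} \Bigr]^{1/2},
\end{align*}

and the claim follows from (5.25) and (5.26).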
Now, by applying the estimates obtained in (5.22), (5.23) and (5.24), we obtain

\[
|\omega_2(i)| \le M\, e^{(2\ell^2/N)\|\zeta\|^2}\, \bigl(|\zeta_i| + 1\bigr) \Bigl[ \mathbb{E}^{\xi}_0\, \frac{1}{(1 + |R(x,\xi)| \sqrt{N})^2} \Bigr]^{1/4}
\]

and the proof is complete.

The error estimate in $\omega_2$ has $R(x,\xi)$ instead of $R_i(x,\xi)$. This bound can be achieved because the terms $R_i(x,\xi)$ for all $i \in \mathbb{N}$ have the same weak limit as $R(x,\xi)$ and thus the additional error term due to the replacement of $R_i(x,\xi)$ by $R(x,\xi)$ in the expression can be controlled uniformly over $i$ for large $N$.

Lemma 5.10.

\begin{align*}
\mathbb{E}^{\xi_i^-}_0\, e^{R_i(x,\xi) + \ell^2 \zeta_i^2 / N}\, \mathbf{1}_{R_i(x,\xi) < 0} &= \frac{\beta}{2} + \omega_3(i),\\
|\omega_3(i)| &\le M\, \frac{\zeta_i^2}{N}\, e^{\ell^2 \|\zeta\|^2 / N} + M \Bigl( \frac{1 + |\zeta_i|}{\sqrt{N}} + \frac{1}{N^{3/2}} \sum_{j=1}^{N} |\zeta_j|^3 + \Bigl| 1 - \frac{\|\zeta\|^2}{N} \Bigr| \Bigr)^{1/2}.
\end{align*}

Proof. Set $g(y) \overset{\mathrm{def}}{=} e^y\, \mathbf{1}_{y<0}$. We first need to estimate the following:

\[
\bigl| \mathbb{E}^{\xi}_0\bigl( g(R_i(x,\xi)) - g(Z_\ell) \bigr) \bigr|.
\]

Notice that the function $g(\cdot)$ is not Lipschitz and therefore, the Wasserstein bounds obtained earlier cannot be used directly. However, we use the fact that the normal distribution has a density which is bounded above. So by Lemma 5.3, (5.14) and (5.16),

\[
\operatorname{KS}(R_i(x,\xi), Z_\ell) \le 2M \sqrt{\operatorname{Wass}(R_i(x,\xi), Z_\ell)}
\le M \Bigl( \frac{1 + |\zeta_i|}{\sqrt{N}} + \frac{1}{N^{3/2}} \sum_{j=1}^{N} |\zeta_j|^3 + \Bigl| 1 - \frac{\|\zeta\|^2}{N} \Bigr| \Bigr)^{1/2}.
\]

Since $g$ is positive on $(-\infty, 0]$, for a real-valued continuous random variable $X$,

\[
\mathbb{E}(g(X)) = \int_{-\infty}^{0} g'(t)\, \mathbb{P}(X > t)\, dt - g(0)\, \mathbb{P}(X \ge 0).
\]

Hence,

\begin{align*}
\bigl| \mathbb{E}^{\xi}_0\, g(R_i(x,\xi)) - \mathbb{E}\, g(Z_\ell) \bigr|
&\le \Bigl| \int_{-\infty}^{0} g'(t) \bigl( \mathbb{P}(R_i(x,\xi) > t) - \mathbb{P}(Z_\ell > t) \bigr)\, dt \Bigr| + g(0)\, \bigl| \mathbb{P}(R_i(x,\xi) \ge 0) - \mathbb{P}(Z_\ell \ge 0) \bigr|\\
&\le \operatorname{KS}(R_i(x,\xi), Z_\ell) \Bigl( \int_{-\infty}^{0} g'(t)\, dt + g(0) \Bigr)
\le M \operatorname{KS}(R_i(x,\xi), Z_\ell).
\end{align*}
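The tail-integral representation of $\mathbb{E}\, g(X)$ used above, and the value $\mathbb{E}(e^{Z_\ell}\mathbf{1}_{Z_\ell<0}) = \beta/2$ that enters the proof of Lemma 5.10, can be cross-checked numerically for $X = Z_\ell \sim \mathrm{N}(-\ell^2, 2\ell^2)$. The Python sketch below does this by trapezoidal quadrature, reading $g'(t) = e^t$ and $g(0) = 1$ (the left limit) in the formula. The closed form $\beta = 2\Phi(-\ell/\sqrt{2})$ used for comparison is not stated in this section; it follows from a standard Gaussian computation and serves here only as a reference value. Function names are hypothetical.

```python
import math


def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def trapezoid(f, lo, hi, n):
    """Composite trapezoidal rule for f on [lo, hi] with n subintervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, n):
        total += f(lo + k * h)
    return total * h


def gaussian_expectation(f, ell, n=100_000):
    """E f(Z_ell) for Z_ell ~ N(-ell^2, 2 ell^2), integrating against
    the standard normal density on [-12, 12]."""
    sd = math.sqrt(2.0) * ell

    def integrand(z):
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        return f(-ell * ell + sd * z) * density

    return trapezoid(integrand, -12.0, 12.0, n)


def beta_closed_form(ell):
    """Reference value 2 * Phi(-ell / sqrt(2)) for E(1 ^ e^{Z_ell})."""
    return 2.0 * Phi(-ell / math.sqrt(2.0))


def beta_quadrature(ell):
    """E(1 ^ e^{Z_ell}); min(1, e^y) is computed as exp(min(y, 0))."""
    return gaussian_expectation(lambda y: math.exp(min(y, 0.0)), ell)


def half_quadrature(ell):
    """E(e^{Z_ell} 1_{Z_ell < 0}), i.e. E g(Z_ell) with g(y) = e^y 1_{y<0}."""
    return gaussian_expectation(lambda y: math.exp(y) if y < 0 else 0.0, ell)


def tail_formula(ell, n=100_000):
    """int_{-inf}^0 e^t P(Z_ell > t) dt - P(Z_ell >= 0), truncated at t = -40."""
    sd = math.sqrt(2.0) * ell

    def survival(t):
        return 1.0 - Phi((t + ell * ell) / sd)

    integral = trapezoid(lambda t: math.exp(t) * survival(t), -40.0, 0.0, n)
    return integral - survival(0.0)


if __name__ == "__main__":
    for ell in (0.5, 1.0, 2.0):
        print(ell, beta_closed_form(ell), beta_quadrature(ell),
              half_quadrature(ell), tail_formula(ell))
```

The second and third printed columns should agree, and the last two columns should each be close to half of them.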
Hence, putting the above calculations together and noticing that $\mathbb{E}(e^{Z_\ell}\,\mathbf{1}_{Z_\ell<0}) = \beta/2$, we have just shown that

$$
\left| \mathbb{E}_0^{\xi} \bigl( e^{R_i(x,\xi)}\,\mathbf{1}_{R_i(x,\xi)<0} \bigr) - \frac{\beta}{2} \right| \le M \sqrt{ \frac{1 + |\zeta_i|}{\sqrt{N}} + \frac{1}{N^{3/2}} \sum_{j=1}^{N} |\zeta_j|^3 + \left| 1 - \frac{\|\zeta\|^2}{N} \right| }.
$$

Notice that

$$
\begin{aligned}
|\omega_3(i)| &\le \left| e^{\ell^2\zeta_i^2/N}\, \mathbb{E}_0^{\xi} \bigl( e^{R_i(x,\xi)}\,\mathbf{1}_{R_i(x,\xi)<0} \bigr) - \beta/2 \right| \\
&\le \left| e^{\ell^2\zeta_i^2/N} - 1 \right| \left| \mathbb{E}_0^{\xi} \bigl( e^{R_i(x,\xi)}\,\mathbf{1}_{R_i(x,\xi)<0} \bigr) \right| + \left| \mathbb{E}_0^{\xi} \bigl( e^{R_i(x,\xi)}\,\mathbf{1}_{R_i(x,\xi)<0} \bigr) - \beta/2 \right| \\
&\le M \frac{\zeta_i^2}{N} e^{\ell^2\|\zeta\|^2/N} + \left| \mathbb{E}_0^{\xi} \bigl( e^{R_i(x,\xi)}\,\mathbf{1}_{R_i(x,\xi)<0} \bigr) - \beta/2 \right|,
\end{aligned}
$$

where the last bound follows from (5.20), proving the claimed error bound for $\omega_3(i)$.

For deriving the error bounds on $\omega_3$, we cannot directly apply the Wasserstein bounds obtained in (5.14), because the function $y \mapsto e^{y}\,\mathbf{1}_{y<0}$ is not Lipschitz on $\mathbb{R}$. However, using (5.16), the KS distance between $R_i(x,\xi)$ and $Z_\ell$ is bounded by the square root of the Wasserstein distance. Thus, using the fact that $e^{y}\,\mathbf{1}_{y<0}$ is bounded and positive, we bound the expectation in Lemma 5.10 by the KS distance.

Combining all the above estimates, we see that

$$
N\, \mathbb{E}_0^{\xi} [x_i^1 - x_i] = -\ell^2 \beta \bigl( P^N x + C \nabla \Psi(P^N x) \bigr)_i + r_i^N
\tag{5.27}
$$

with

$$
|r_i^N| \le |\omega_0(i)| + M \lambda_i \bigl( \sqrt{N}\, |\omega_1(i)| + |\zeta_i||\omega_2(i)| + |\zeta_i||\omega_3(i)| \bigr).
\tag{5.28}
$$

The following lemma gives the control over $r^N$ and completes the proof of (2.14), Proposition 2.1.

Lemma 5.11. For $s < \kappa - 1/2$,

$$
\lim_{N\to\infty} \mathbb{E}^{\pi^N} \|r^N\|_s^2 = \lim_{N\to\infty} \mathbb{E}^{\pi^N} \sum_{i=1}^{N} i^{2s} |r_i^N|^2 = 0.
$$

Proof. By (5.28), we have $|r_i^N| \le |\omega_0(i)| + M \lambda_i ( \sqrt{N}\, |\omega_1(i)| + |\zeta_i||\omega_2(i)| + |\zeta_i||\omega_3(i)| )$. Therefore,

$$
\mathbb{E}^{\pi^N} \sum_{i=1}^{N} i^{2s} |r_i^N|^2 \le M\, \mathbb{E}^{\pi^N} \sum_{i=1}^{N} \Bigl( i^{2s} |\omega_0(i)|^2 + i^{2s} \lambda_i^2 \bigl( N \omega_1(i)^2 + \zeta_i^2 \omega_2(i)^2 + \zeta_i^2 \omega_3(i)^2 \bigr) \Bigr).
\tag{5.29}
$$
Now we evaluate each sum on the right-hand side of (5.29) and show that it converges to zero.

• Since $\sum_{i=1}^{\infty} \lambda_i^2 i^{2s} < \infty$,

$$
\sum_{i=1}^{N} \mathbb{E}^{\pi^N} i^{2s} |\omega_0(i)|^2 \le M \frac{1}{N} \sum_{i=1}^{N} i^{2s} \lambda_i^2 \le M \frac{1}{N} \sum_{i=1}^{\infty} \lambda_i^2 i^{2s} \to 0.
\tag{5.30}
$$

• By Lemmas 5.6 and 5.2,

$$
N\, \mathbb{E}^{\pi^N} \sum_{i=1}^{N} \lambda_i^2 i^{2s} |\omega_1(i)|^2 \le M \frac{1}{N} \sum_{i=1}^{N} \mathbb{E}^{\pi^N} \lambda_i^2 i^{2s} |\zeta_i|^4 e^{(2\ell^2/N)\|\zeta\|^2} \to 0.
\tag{5.31}
$$

• From Lemma 5.7 and Cauchy–Schwarz, we obtain

$$
\sum_{i=1}^{N} \mathbb{E}^{\pi^N} \lambda_i^2 i^{2s} |\zeta_i|^2 |\omega_2(i)|^2 \le M \left( \mathbb{E}^{\pi^N} \mathbb{E}_0^{\xi} \frac{1}{(1 + |R(x,\xi)|\sqrt{N})^2} \right)^{1/2} \times \sum_{i=1}^{N} \left( \mathbb{E}^{\pi^N} e^{(8\ell^2/N)\|\zeta\|^2} \lambda_i^4 i^{4s} (|\zeta_i|^8 + 1) \right)^{1/2}.
$$

Proceeding similarly as in Lemma 5.2, it follows that

$$
\sum_{i=1}^{N} \left( \mathbb{E}^{\pi^N} e^{(8\ell^2/N)\|\zeta\|^2} \lambda_i^4 i^{4s} (|\zeta_i|^8 + 1) \right)^{1/2}
$$

is bounded in $N$. Since, with $x \overset{\mathcal{D}}{\sim} \pi_0$, $R(x,\xi)$ converges weakly to $Z_\ell$ as $N \to \infty$, by the bounded convergence theorem we obtain

$$
\lim_{N\to\infty} \mathbb{E}^{\pi_0} \mathbb{E}_0^{\xi} \frac{1}{(1 + |R(x,\xi)|\sqrt{N})^2} = 0
$$

and thus, by Lemma 3.5,

$$
\lim_{N\to\infty} \mathbb{E}^{\pi^N} \mathbb{E}_0^{\xi} \frac{1}{(1 + |R(x,\xi)|\sqrt{N})^2} = 0.
$$

Therefore, we deduce that

$$
\lim_{N\to\infty} \sum_{i=1}^{N} \mathbb{E}^{\pi^N} |\zeta_i|^2 i^{2s} \lambda_i^2 |\omega_2(i)|^2 = 0.
\tag{5.32}
$$

• After some algebra we obtain from Lemma 5.10 that

$$
\begin{aligned}
\mathbb{E}^{\pi^N} \sum_{i=1}^{N} \lambda_i^2 i^{2s} |\zeta_i|^2 |\omega_3(i)|^2
&\le M \frac{1}{N^2} \sum_{i=1}^{N} \mathbb{E}^{\pi^N} \lambda_i^2 i^{2s} |\zeta_i|^6 e^{2\ell^2(\|\zeta\|^2/N)}
+ M \frac{1}{\sqrt{N}}\, \mathbb{E}^{\pi^N} \sum_{i=1}^{N} \lambda_i^2 i^{2s} \zeta_i^2 (1 + |\zeta_i|) \\
&\quad + M \left[ \mathbb{E}^{\pi^N} \left( \frac{1}{N^{3/2}} \sum_{j=1}^{N} |\zeta_j|^3 \right)^{2} + \mathbb{E}^{\pi^N} \left| 1 - \frac{\|\zeta\|^2}{N} \right|^{2} \right]^{1/2} \times \sum_{i=1}^{N} \left( \mathbb{E}^{\pi^N} \lambda_i^4 i^{4s} \zeta_i^4 \right)^{1/2}.
\end{aligned}
$$

Similar to the previous calculations, using Lemma 5.2, it is quite straightforward to verify that each of the four terms above converges to 0. Thus we obtain

$$
\lim_{N\to\infty} \sum_{i=1}^{N} \mathbb{E}^{\pi^N} \lambda_i^2 i^{2s} |\zeta_i|^2 |\omega_3(i)|^2 = 0.
\tag{5.33}
$$

Now the proof of Lemma 5.11 follows from (5.29)–(5.33).

This completes the proof of Proposition 2.1, equation (2.14).

5.3. Rigorous estimates for the diffusion coefficient: Proof of Proposition 2.1, equation (2.15). Recall that for $1 \le i, j \le N$,

$$
N\, \mathbb{E}_0 [(x_i^1 - x_i^0)(x_j^1 - x_j^0)] = 2\ell^2\, \mathbb{E}_0^{\xi} \bigl[ (C^{1/2}\xi)_i (C^{1/2}\xi)_j \bigl( 1 \wedge \exp Q(x,\xi) \bigr) \bigr].
$$

The following lemma quantifies the approximations made in (2.48) and (2.49).

Lemma 5.12.

$$
\mathbb{E}_0^{\xi} \bigl[ (C^{1/2}\xi)_i (C^{1/2}\xi)_j \bigl( 1 \wedge \exp Q(x,\xi) \bigr) \bigr] = \lambda_i \lambda_j \delta_{ij}\, \mathbb{E}^{\xi_{ij}^-} \bigl[ 1 \wedge \exp R_{ij}(x,\xi) \bigr] + \theta_{ij},
$$

$$
\mathbb{E}^{\xi_{ij}^-} \bigl[ 1 \wedge \exp R_{ij}(x,\xi) \bigr] = \beta + \rho_{ij},
$$

where the error terms satisfy

$$
|\theta_{ij}| \le M \lambda_i \lambda_j (1 + |\zeta_i|^2 + |\zeta_j|^2)^{1/2} \frac{1}{\sqrt{N}},
\tag{5.34}
$$

$$
|\rho_{ij}| \le M \left( \frac{1}{\sqrt{N}} (1 + |\zeta_i| + |\zeta_j|) + \frac{1}{N^{3/2}} \sum_{s=1}^{N} |\zeta_s|^3 + \left| 1 - \frac{\|\zeta\|^2}{N} \right| \right).
\tag{5.35}
$$

Proof. We first derive the bound for $\theta$. Indeed,

$$
\begin{aligned}
|\theta_{ij}| &\le \mathbb{E}_0^{\xi} \bigl[ \bigl| (C^{1/2}\xi)_i (C^{1/2}\xi)_j \bigl( (1 \wedge e^{Q(x,\xi)}) - (1 \wedge e^{R_{ij}(x,\xi)}) \bigr) \bigr| \bigr] \\
&\le M \lambda_i \lambda_j\, \mathbb{E}_0^{\xi} \bigl[ \bigl| \xi_i \xi_j \bigl( (1 \wedge e^{Q(x,\xi)}) - (1 \wedge e^{R_{ij}(x,\xi)}) \bigr) \bigr| \bigr].
\end{aligned}
$$

By the Cauchy–Schwarz inequality,

$$
\begin{aligned}
|\theta_{ij}| &\le M \lambda_i \lambda_j \bigl( \mathbb{E}_0^{\xi} \bigl| (1 \wedge e^{Q(x,\xi)}) - (1 \wedge e^{R_{ij}(x,\xi)}) \bigr|^2 \bigr)^{1/2} \\
&\le M \lambda_i \lambda_j \bigl( \mathbb{E}_0^{\xi} | Q(x,\xi) - R_{ij}(x,\xi) |^2 \bigr)^{1/2}.
\end{aligned}
$$

Using the estimate obtained in (5.2),

$$
|\theta_{ij}| \le M \lambda_i \lambda_j (1 + |\zeta_i|^2 + |\zeta_j|^2)^{1/2} \frac{1}{\sqrt{N}},
$$

verifying (5.34). Now we turn to verifying the error bound in (5.35). We need to bound

$$
\mathbb{E}_0^{\xi} \bigl( g(R_{ij}(x,\xi)) - g(Z_\ell) \bigr),
$$

where $g(y) \stackrel{\mathrm{def}}{=} 1 \wedge e^{y}$. Notice that $\mathbb{E}(g(Z_\ell)) = \beta$. Since $g(\cdot)$ is Lipschitz,

$$
\left| \mathbb{E}_0^{\xi} \bigl( g(R_{ij}(x,\xi)) - g(Z_\ell) \bigr) \right| \le M\, \mathrm{Wass}(R_{ij}(x,\xi), Z_\ell).
\tag{5.36}
$$

A calculation will yield that

$$
\mathrm{Wass}(R_{ij}(x,\xi), R(x,\xi)) \le M (|\zeta_i| + |\zeta_j| + 1) \frac{1}{\sqrt{N}}.
$$

Therefore, by the triangle inequality and Lemma 5.3,

$$
\mathrm{Wass}(R_{ij}(x,\xi), Z_\ell) \le M \left( \frac{1}{\sqrt{N}} (1 + |\zeta_i| + |\zeta_j|) + \frac{1}{N^{3/2}} \sum_{r=1}^{N} |\zeta_r|^3 + \left| 1 - \frac{\|\zeta\|^2}{N} \right| \right).
$$

Hence the estimate in (5.35) follows from the observation made in (5.36).
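The leading-order term in Lemma 5.12 can be illustrated by simulation in a simple special case. The following is a rough Monte Carlo sketch under assumptions that are not restated in this section: $\Psi \equiv 0$, $x$ drawn from the Gaussian reference measure so that $\zeta = C^{-1/2}x$ has i.i.d. standard normal coordinates, $(C^{1/2}\xi)_i = \lambda_i \xi_i$, $Q(x,\xi) = -\sqrt{2\ell^2/N}\,\langle \zeta, \xi \rangle - (\ell^2/N)\|\xi\|^2$, and $\beta = 2\Phi(-\ell/\sqrt{2})$. Under these assumptions, dividing the diagonal term of the lemma by $\lambda_i^2$ and averaging over $i$ gives $\mathbb{E}\bigl[(\|\xi\|^2/N)(1 \wedge e^{Q})\bigr]$, which should be close to $\beta$ for large $N$. Function names, the dimension and the sample size are hypothetical choices.

```python
import math
import random


def limiting_acceptance(ell):
    """beta = 2 * Phi(-ell / sqrt(2)) (assumed definition, see lead-in)."""
    phi = 0.5 * math.erfc((ell / math.sqrt(2.0)) / math.sqrt(2.0))
    return 2.0 * phi


def normalised_diffusion_estimate(n_dim, ell, n_samples, seed=0):
    """Monte Carlo estimate of E[(|xi|^2 / N) * min(1, exp(Q))].

    Works in whitened coordinates zeta = C^{-1/2} x with Psi = 0, so the
    eigenvalues lambda_i cancel and never appear explicitly.
    """
    rng = random.Random(seed)
    step = math.sqrt(2.0 * ell ** 2 / n_dim)
    total = 0.0
    for _sample in range(n_samples):
        inner = 0.0  # running value of <zeta, xi>
        xi_sq = 0.0  # running value of |xi|^2
        for _coord in range(n_dim):
            zeta_j = rng.gauss(0.0, 1.0)
            xi_j = rng.gauss(0.0, 1.0)
            inner += zeta_j * xi_j
            xi_sq += xi_j * xi_j
        # Q = |zeta|^2 / 2 - |zeta + step * xi|^2 / 2 for the Gaussian target.
        q = -step * inner - (ell ** 2 / n_dim) * xi_sq
        accept_prob = 1.0 if q >= 0.0 else math.exp(q)
        total += (xi_sq / n_dim) * accept_prob
    return total / n_samples


estimate = normalised_diffusion_estimate(n_dim=100, ell=1.0, n_samples=4000)
print(estimate, limiting_acceptance(1.0))
```

The estimate averages over $x$ as well as $\xi$, whereas Lemma 5.12 is stated for fixed $x$; agreement is therefore expected only up to Monte Carlo noise and the finite-$N$ error terms $\theta_{ii}$ and $\rho_{ii}$.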
Putting together all the estimates produces

$$
\begin{aligned}
N\, \mathbb{E}_0 [(x_i^1 - x_i^0)(x_j^1 - x_j^0)] &= 2\ell^2 \beta \lambda_i \lambda_j \delta_{ij} + E_{ij}^N \quad \text{and} \\
|E_{ij}^N| &\le M \bigl( |\theta_{ij}| + \lambda_i \lambda_j \delta_{ij} |\rho_{ij}| \bigr).
\end{aligned}
\tag{5.37}
$$

Finally we estimate the error term $E_{ij}^N$.

Lemma 5.13. We have

$$
\lim_{N\to\infty} \sum_{i=1}^{N} \mathbb{E}^{\pi^N} |\langle \phi_i, E^N \phi_i \rangle_s| = 0, \qquad \lim_{N\to\infty} \mathbb{E}^{\pi^N} |\langle \phi_i, E^N \phi_j \rangle_s| = 0
$$

for any pair of indices $i, j$.

Proof. From (5.37) we obtain that

$$
\sum_{i=1}^{N} \mathbb{E}^{\pi^N} |\langle \phi_i, E^N \phi_i \rangle_s| \le M \left( \sum_{i=1}^{N} \mathbb{E}^{\pi^N} i^{2s} |\theta_{ii}| + \sum_{i=1}^{N} \lambda_i^2 i^{2s}\, \mathbb{E}^{\pi^N} |\rho_{ii}| \right),
\tag{5.38}
$$

$$
\begin{aligned}
\sum_{i=1}^{N} \mathbb{E}^{\pi^N} i^{2s} |\theta_{ii}| &\le M \sum_{i=1}^{N} \mathbb{E}^{\pi_0} |\theta_{ii}|\, i^{2s} \le M \sum_{i=1}^{N} \mathbb{E}^{\pi_0} \lambda_i^2 i^{2s} (1 + |\zeta_i|^2)^{1/2} \frac{1}{\sqrt{N}} \\
&\le M \sum_{i=1}^{N} \mathbb{E}^{\pi_0} \lambda_i^2 i^{2s} (1 + |\zeta_i|) \frac{1}{\sqrt{N}} \to 0
\end{aligned}
\tag{5.39}
$$

due to the fact that $\sum_{i=1}^{\infty} \lambda_i^2 i^{2s} < \infty$ and Lemma 5.2. Now consider the second term of (5.38):

$$
\sum_{i=1}^{N} \lambda_i^2 i^{2s}\, \mathbb{E}^{\pi^N} |\rho_{ii}| \le M\, \mathbb{E}^{\pi_0} \sum_{i=1}^{N} \lambda_i^2 i^{2s} \left( \frac{1}{\sqrt{N}} (1 + |\zeta_i|) + \frac{1}{N^{3/2}} \sum_{r=1}^{N} |\zeta_r|^3 + \left| 1 - \frac{\|\zeta\|^2}{N} \right| \right).
$$

The first term above goes to zero by (5.39) and the last term converges to zero by the same arguments used in Lemma 5.2. As mentioned in the proof of the estimate for the term $\omega_3$ in Lemma 5.11, the sum $\mathbb{E}^{\pi^N} \frac{1}{N^{3/2}} \sum_{r=1}^{N} |\zeta_r|^3$ goes to zero. Therefore, we have shown that

$$
\lim_{N\to\infty} \sum_{i=1}^{N} \mathbb{E}^{\pi^N} |\langle \phi_i, E^N \phi_i \rangle_s| = 0,
$$

proving the first claim. Finally, from (5.34) it immediately follows that

$$
\mathbb{E}^{\pi} |\langle \phi_i, E^N \phi_j \rangle_s| \le \mathbb{E}^{\pi}\, i^{s} j^{s} |\theta_{ij}| \to 0,
$$

proving the second claim as well.

Therefore, we have shown

$$
N\, \mathbb{E}_0 [(x_i^1 - x_i^0)(x_j^1 - x_j^0)] = 2\ell^2 \beta \langle \phi_i, C \phi_j \rangle + E_{ij}^N, \qquad \lim_{N\to\infty} \sum_{i=1}^{N} \mathbb{E}^{\pi^N} |\langle \phi_i, E^N \phi_i \rangle_s| = 0.
$$

This finishes the proof of Proposition 2.1, equation (2.15).

Acknowledgments.
We thank Alex Thiery and an anonymous referee for their careful reading and very insightful comments which significantly improved the clarity of the presentation.

REFERENCES

[1] Bédard, M. (2007). Weak convergence of Metropolis algorithms for non-i.i.d. target distributions. Ann. Appl. Probab. 17 1222–1244. MR2344305
[2] Bédard, M. (2009). On the optimal scaling problem of Metropolis algorithms for hierarchical target distributions. Preprint.
[3] Berger, E. (1986). Asymptotic behaviour of a class of stochastic approximation procedures. Probab. Theory Related Fields 71 517–552. MR0833268
[4] Beskos, A., Roberts, G. and Stuart, A. (2009). Optimal scalings for local Metropolis–Hastings chains on nonproduct targets in high dimensions. Ann. Appl. Probab. 19 863–898. MR2537193
[5] Beskos, A., Roberts, G., Stuart, A. and Voss, J. (2008). MCMC methods for diffusion bridges. Stoch. Dyn. 8 319–350. MR2444507
[6] Beskos, A. and Stuart, A. M. (2008). MCMC methods for sampling function space. In ICIAM Invited Lecture 2007 (R. Jeltsch and G. Wanner, eds.). European Mathematical Society, Zürich.
[7] Bou-Rabee, N. and Vanden-Eijnden, E. (2010). Pathwise accuracy and ergodicity of Metropolized integrators for SDEs. Comm. Pure Appl. Math. 63 655–696. MR2583309
[8] Breyer, L. A., Piccioni, M. and Scarlatti, S. (2004). Optimal scaling of MALA for nonlinear regression. Ann. Appl. Probab. 14 1479–1505. MR2071431
[9] Breyer, L. A. and Roberts, G. O. (2000). From Metropolis to diffusions: Gibbs states and optimal scaling. Stochastic Process. Appl. 90 181–206. MR1794535
[10] Chatterjee, S. (2007). Stein's method. Lecture notes. Available at http://www.stat.berkeley.edu/~sourav/stat206Afall07.html.
[11] Chen, X. and White, H. (1998). Central limit and functional central limit theorems for Hilbert-valued dependent heterogeneous arrays with applications. Econometric Theory 14 260–284. MR1629340
[12] Cotter, S. L., Dashti, M. and Stuart, A. M. (2010). Approximation of Bayesian inverse problems. SIAM Journal of Numerical Analysis 48 322–345.
[13] Da Prato, G. and Zabczyk, J. (1992). Stochastic Equations in Infinite Dimensions. Encyclopedia of Mathematics and Its Applications 44. Cambridge Univ. Press, Cambridge. MR1207136
[14] Ethier, S. N. and Kurtz, T. G. (1986). Markov Processes: Characterization and Convergence. Wiley, New York. MR0838085
[15] Hairer, M., Stuart, A. M. and Voss, J. (2007). Analysis of SPDEs arising in path sampling. II. The nonlinear case. Ann. Appl. Probab. 17 1657–1706. MR2358638
[16] Hairer, M., Stuart, A. M. and Voss, J. (2011). Signal processing problems on function space: Bayesian formulation, stochastic PDEs and effective MCMC methods. In The Oxford Handbook of Nonlinear Filtering (D. Crisan and B. Rozovsky, eds.). Oxford Univ. Press, Oxford.
[17] Hairer, M., Stuart, A. M., Voss, J. and Wiberg, P. (2005). Analysis of SPDEs arising in path sampling. I. The Gaussian case. Commun. Math. Sci. 3 587–603. MR2188686
[18] Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57 97–109.
[19] Liu, J. S. (2008). Monte Carlo Strategies in Scientific Computing. Springer, New York. MR2401592
[20] Ma, Z. M. and Röckner, M. (1992). Introduction to the Theory of (nonsymmetric) Dirichlet Forms. Springer, Berlin. MR1214375
[21] Metropolis, N., Rosenbluth, A. W., Teller, M. N. and Teller, E. (1953). Equations of state calculations by fast computing machines. J. Chem. Phys. 21 1087–1092.
[22] Robert, C. P. and Casella, G. (2004). Monte Carlo Statistical Methods, 2nd ed. Springer, New York. MR2080278
[23] Roberts, G. O., Gelman, A. and Gilks, W. R. (1997). Weak convergence and optimal scaling of random walk Metropolis algorithms. Ann. Appl. Probab. 7 110–120. MR1428751
[24] Roberts, G. O. and Rosenthal, J. S. (1998). Optimal scaling of discrete approximations to Langevin diffusions. J. R. Stat. Soc. Ser. B Stat. Methodol. 60 255–268. MR1625691
[25] Roberts, G. O. and Rosenthal, J. S. (2001). Optimal scaling for various Metropolis–Hastings algorithms. Statist. Sci. 16 351–367. MR1888450
[26] Stroock, D. W. (1993). Probability Theory, an Analytic View. Cambridge Univ. Press, Cambridge. MR1267569
[27] Stuart, A. M. (2010). Inverse problems: A Bayesian perspective. Acta Numer. 19 451–559. MR2652785

J. C. Mattingly
Department of Mathematics
Center for Theoretical and Mathematical Sciences
Center for Nonlinear and Complex Systems
and Department of Statistical Sciences
Duke University
Durham, North Carolina 27708-0251
USA
E-mail: jonm@math.duke.edu

N. S. Pillai
Department of Statistics
Harvard University
Cambridge, Massachusetts 02138
USA
E-mail: pillai@stat.harvard.edu

A. M. Stuart
Mathematics Institute
Warwick University
CV4 7AL
United Kingdom
E-mail: a.m.stuart@warwick.ac.uk