Adaptive Gibbs samplers and related MCMC methods

We consider various versions of adaptive Gibbs and Metropolis-within-Gibbs samplers, which update their selection probabilities (and perhaps also their proposal distributions) on the fly during a run by learning as they go in an attempt to optimize the algorithm. We present a cautionary example of how even a simple-seeming adaptive Gibbs sampler may fail to converge. We then present various positive results guaranteeing convergence of adaptive Gibbs samplers under certain conditions.

Authors: Krzysztof Łatuszyński, Gareth O. Roberts, Jeffrey S. Rosenthal
The Annals of Applied Probability 2013, Vol. 23, No. 1, 66–98. DOI: 10.1214/11-AAP806. © Institute of Mathematical Statistics, 2013.

By Krzysztof Łatuszyński, Gareth O. Roberts and Jeffrey S. Rosenthal
University of Warwick, University of Warwick and University of Toronto

1. Introduction. Markov chain Monte Carlo (MCMC) is a commonly used approach to evaluating expectations of the form $\theta := \int_{\mathcal{X}} f(x)\,\pi(dx)$, where $\pi$ is an intractable probability measure, for example, known up to a normalizing constant. One simulates $(X_n)_{n \ge 0}$, an ergodic Markov chain on $\mathcal{X}$, evolving according to a transition kernel $P$ with stationary limiting distribution $\pi$, and typically takes the ergodic average as an estimate of $\theta$. The approach is justified by asymptotic Markov chain theory (see, e.g., [30, 40]). Metropolis algorithms and Gibbs samplers (to be described in Section 2) are among the most common MCMC algorithms; cf. [26, 33, 40]. The quality of an estimate produced by an MCMC algorithm depends on probabilistic properties of the underlying Markov chain. Designing an appropriate transition kernel $P$ that guarantees rapid convergence to stationarity and efficient simulation is often a challenging task, especially in high dimensions.
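The ergodic-average recipe above can be sketched in code as follows; this is an illustrative example, not from the paper. A random-walk Metropolis kernel stands in for the generic transition kernel $P$, the standard normal target, the proposal scale and the run length are all assumptions, and the ergodic average of $f(x) = x^2$ estimates $\theta = \int f\,d\pi = 1$:

```python
import numpy as np

def metropolis_chain(log_pi, x0, n_steps, step, rng):
    """Random-walk Metropolis chain with stationary distribution pi
    (pi known only up to a normalizing constant, via log_pi)."""
    x = x0
    chain = np.empty(n_steps)
    for n in range(n_steps):
        y = x + step * rng.standard_normal()
        # accept with the usual Metropolis probability min(1, pi(y)/pi(x))
        if np.log(rng.uniform()) < log_pi(y) - log_pi(x):
            x = y
        chain[n] = x
    return chain

rng = np.random.default_rng(0)
log_pi = lambda x: -0.5 * x * x              # standard normal, up to a constant
chain = metropolis_chain(log_pi, x0=0.0, n_steps=20000, step=2.4, rng=rng)
theta_hat = np.mean(chain ** 2)              # ergodic average estimating E[X^2] = 1
```

The estimate `theta_hat` converges to $\theta$ as the chain length grows, which is exactly the asymptotic justification cited above.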
For Metropolis algorithms there are various optimal scaling results [4, 10, 11, 34, 38–40, 43] which provide "prescriptions" of how to do this, though they typically depend on unknown characteristics of $\pi$.

(Received January 2011; revised June 2011. While preparing this manuscript K. Łatuszyński was a postdoctoral fellow at the Department of Statistics, University of Toronto. Supported in part by NSERC of Canada, and by CRiSM and other grants from EPSRC. AMS 2000 subject classifications: Primary 60J05, 65C05; secondary 62F15. Key words and phrases: MCMC estimation, adaptive MCMC, Gibbs sampling.)

For random scan Gibbs and Metropolis-within-Gibbs samplers, a further design decision is choosing the selection probabilities (i.e., coordinate weightings) which will be used to select which coordinate to update next. These are usually chosen to be uniform, but some recent work [12, 15, 23, 25, 27, 45] has suggested that nonuniform weightings may sometimes be preferable. For a very simple toy example to illustrate this issue, suppose $\mathcal{X} = [0,1] \times [-100, 100]$, with $\pi(x_1, x_2) \propto x_1^{100}(1 + \sin(x_2))$. Then with respect to $x_1$, this $\pi$ puts almost all of the mass right up against the line $x_1 = 1$. Thus, repeated Gibbs sampler updates of the coordinate $x_1$ provide virtually no help in exploring the state space, and do not need to be done often at all (unless the functional $f$ of interest is extremely sensitive to tiny changes in $x_1$).
By contrast, with respect to $x_2$, this $\pi$ is a highly multi-modal density with wide support and many peaks and valleys, requiring many updates to the coordinate $x_2$ in order to explore the state space appropriately. (Of course, as with any Gibbs sampler, repeatedly updating one coordinate does not help with distributional convergence; it only helps with sampling the entire state space to produce good estimates.) Thus, an efficient Gibbs sampler for this example would not update each of $x_1$ and $x_2$ equally often; rather, it would update $x_2$ very often and $x_1$ hardly at all. Of course, in this simple example, it is easy to see directly that $x_1$ should be updated less than $x_2$, and furthermore, such efficiencies would only improve the sampler by approximately a factor of 2. However, in a high-dimensional example (cf. [12]), such issues could be much more significant, and also much more difficult to detect manually.

One promising avenue to address this challenge is adaptive MCMC algorithms. As an MCMC simulation progresses, more and more information about the target distribution $\pi$ is learned. Adaptive MCMC attempts to use this new information to redesign the transition kernel $P$ on the fly, based on the current simulation output. That is, the transition kernel $P_n$ used for obtaining $X_n \mid X_{n-1}$ may depend on $\{X_0, \ldots, X_{n-1}\}$. So, in the above toy example, a good adaptive Gibbs sampler would somehow automatically "learn" to update $x_1$ less often, without requiring the user to determine this manually (which could be difficult or impossible in a very high-dimensional problem).

Such adaptive algorithms are only valid if their ergodicity can be established.
Unfortunately, the stochastic process $(X_n)_{n \ge 0}$ for an adaptive algorithm is no longer a Markov chain; the potential benefit of adaptive MCMC comes at the price of requiring more sophisticated theoretical analysis. There is a substantial and rapidly growing literature on both theory and practice of adaptive MCMC (see, e.g., [1–3, 5–9, 13, 14, 17–19, 22, 41, 42, 44, 46–48]) which includes counterintuitive examples where $X_n$ fails to converge to the desired distribution $\pi$ (cf. [5, 9, 22, 41]), as well as many results guaranteeing ergodicity under various assumptions. Most of the previous work on ergodicity of adaptive MCMC has concentrated on adapting Metropolis and related algorithms, with less attention paid to ergodicity when adapting the selection probabilities for random scan Gibbs samplers.

Motivated by such considerations, in the present paper we study the ergodicity of various types of adaptive Gibbs samplers. To our knowledge, proofs of ergodicity for adaptively-weighted Gibbs samplers have previously been considered only by [24], and we shall provide a counter-example below (Example 3.1) to demonstrate that their main result is not correct. In view of this, we are not aware of any valid ergodicity results in the literature that consider adapting selection probabilities of random scan Gibbs samplers, and we attempt to fill that gap herein.

This paper is organized as follows. We begin in Section 2 with basic definitions. In Section 3 we present a cautionary Example 3.1, where a seemingly ergodic adaptive Gibbs sampler is in fact transient (as we prove formally later in Section 6) and provides a counter-example to Theorem 2.1 of [24]. Next, we establish various positive results for ergodicity of adaptive Gibbs samplers.
We consider adaptive random scan Gibbs samplers (AdapRSG) which update coordinate selection probabilities as the simulation progresses, adaptive random scan Metropolis-within-Gibbs samplers (AdapRSMwG) which update coordinate selection probabilities as the simulation progresses, and adaptive random scan adaptive Metropolis-within-Gibbs samplers (AdapRSadapMwG) that update coordinate selection probabilities as well as proposal distributions for the Metropolis steps. Positive results in the uniform setting are discussed in Section 4, whereas Section 5 deals with the nonuniform setting. In each case, we prove that under reasonably mild conditions, the adaptive Gibbs samplers are guaranteed to be ergodic, although our cautionary example does show that it is important to verify some conditions before applying such algorithms.

2. Preliminaries. Gibbs samplers are commonly used MCMC algorithms for sampling from complicated high-dimensional probability distributions $\pi$ in cases where the full conditional distributions of $\pi$ are easy to sample from. To define them, let $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$ be a $d$-dimensional state space where $\mathcal{X} = \mathcal{X}_1 \times \cdots \times \mathcal{X}_d$, and write $X_n \in \mathcal{X}$ as $X_n = (X_{n,1}, \ldots, X_{n,d})$. We shall use the shorthand notation

$X_{n,-i} := (X_{n,1}, \ldots, X_{n,i-1}, X_{n,i+1}, \ldots, X_{n,d})$

and similarly $\mathcal{X}_{-i} = \mathcal{X}_1 \times \cdots \times \mathcal{X}_{i-1} \times \mathcal{X}_{i+1} \times \cdots \times \mathcal{X}_d$. Let $\pi(\cdot \mid x_{-i})$ denote the conditional distribution of $Z_i \mid Z_{-i} = x_{-i}$, where $Z \sim \pi$. The random scan Gibbs sampler draws $X_n$ given $X_{n-1}$ (iteratively for $n = 1, 2, 3, \ldots$) by first choosing one coordinate at random according to some selection probabilities $\alpha = (\alpha_1, \ldots, \alpha_d)$ (e.g., uniformly), and then updating that coordinate by a draw from its conditional distribution. More precisely, the Gibbs sampler transition kernel $P = P_\alpha$ is the result of performing the following three steps.
Algorithm 2.1 [RSG($\alpha$)].
(1) Choose coordinate $i \in \{1, \ldots, d\}$ according to selection probabilities $\alpha$, that is, with $\mathbb{P}(i = j) = \alpha_j$.
(2) Draw $Y \sim \pi(\cdot \mid X_{n-1,-i})$.
(3) Set $X_n := (X_{n-1,1}, \ldots, X_{n-1,i-1}, Y, X_{n-1,i+1}, \ldots, X_{n-1,d})$.

Whereas the standard approach is to choose the coordinate $i$ at the first step uniformly at random, which corresponds to $\alpha = (1/d, \ldots, 1/d)$, this may be a substantial waste of simulation effort if $d$ is large and the variability of coordinates differs significantly. This has been discussed theoretically in [27] and also observed empirically, for example, in Bayesian variable selection for linear models in statistical genetics [12, 45].

Throughout the paper we denote the transition kernel of a random scan Gibbs sampler with selection probabilities $\alpha$ as $P_\alpha$, and the transition kernel of a single Gibbs update of coordinate $i$ is denoted as $P_i$; hence, $P_\alpha = \sum_{i=1}^d \alpha_i P_i$.

We consider a class of adaptive random scan Gibbs samplers where the selection probabilities $\alpha = (\alpha_1, \ldots, \alpha_d)$ are subject to optimization within some subset $\mathcal{Y} \subseteq [0,1]^d$ of possible choices. Therefore a single step of our generic adaptive algorithm for drawing $X_n$, given the trajectory $X_{n-1}, \ldots, X_0$ and current selection probabilities $\alpha_{n-1} = (\alpha_{n-1,1}, \ldots, \alpha_{n-1,d})$, amounts to the following steps, where $R_n(\cdot)$ is some update rule for $\alpha_n$.

Algorithm 2.2 (AdapRSG).
(1) Set $\alpha_n := R_n(\alpha_0, \ldots, \alpha_{n-1}, X_{n-1}, \ldots, X_0) \in \mathcal{Y}$.
(2) Choose coordinate $i \in \{1, \ldots, d\}$ according to selection probabilities $\alpha_n$.
(3) Draw $Y \sim \pi(\cdot \mid X_{n-1,-i})$.
(4) Set $X_n := (X_{n-1,1}, \ldots, X_{n-1,i-1}, Y, X_{n-1,i+1}, \ldots, X_{n-1,d})$.
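Algorithm 2.2 can be sketched generically as follows (Algorithm 2.1 is recovered by passing an update rule that returns a constant $\alpha$). The bivariate Gaussian demo target, whose full conditionals are normal, and the particular diminishing update rule `rule` are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def adap_rsg(full_conditionals, update_rule, x0, alpha0, n_steps, rng):
    """Generic AdapRSG (Algorithm 2.2): at each step, (1) update the
    selection probabilities via R_n, (2) pick coordinate i ~ alpha_n,
    (3)-(4) redraw coordinate i from its full conditional."""
    d = len(x0)
    x = np.array(x0, dtype=float)
    alpha = np.array(alpha0, dtype=float)
    chain = np.empty((n_steps + 1, d))
    chain[0] = x
    for n in range(1, n_steps + 1):
        alpha = update_rule(n, alpha, x)     # step (1)
        i = rng.choice(d, p=alpha)           # step (2)
        x[i] = full_conditionals[i](x, rng)  # steps (3)-(4)
        chain[n] = x
    return chain

# Demo target: bivariate normal with correlation rho; conditionals are normal.
rho = 0.5
conds = [
    lambda x, rng: rho * x[1] + np.sqrt(1 - rho**2) * rng.standard_normal(),
    lambda x, rng: rho * x[0] + np.sqrt(1 - rho**2) * rng.standard_normal(),
]

# Illustrative rule: move alpha toward (0.6, 0.4) with step 1/n, so that
# |alpha_n - alpha_{n-1}| <= 2/n -> 0 (diminishing adaptation).
def rule(n, alpha, x):
    target = np.array([0.6, 0.4])
    return alpha + min(1.0, 1.0 / n) * (target - alpha)

rng = np.random.default_rng(1)
chain = adap_rsg(conds, rule, x0=(0.0, 0.0), alpha0=(0.5, 0.5),
                 n_steps=10000, rng=rng)
```

Because both `alpha` and `target` are probability vectors, their convex combination remains one, so step (2) is always well defined.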
Algorithm 2.2 defines $P_n$, the transition kernel used at time $n$, and $\alpha_n$ here plays the role of $\Gamma_n$ in the more general adaptive setting of, for example, [9, 41]. Let $\pi_n = \pi_n(x_0, \alpha_0)$ denote the distribution of $X_n$ induced by Algorithm 2.1 or 2.2, given starting values $x_0$ and $\alpha_0$, that is, for $B \in \mathcal{B}(\mathcal{X})$,

(1)  $\pi_n(B) = \pi_n((x_0, \alpha_0), B) := \mathbb{P}(X_n \in B \mid X_0 = x_0, \alpha_0).$

Clearly, if one uses Algorithm 2.1 then $\alpha_0 = \alpha$ remains fixed and $\pi_n(x_0, \alpha)(B) = P_\alpha^n(x_0, B)$. Let $\|\nu - \mu\|_{TV}$ denote the total variation distance between probability measures $\nu$ and $\mu$, and let

(2)  $T(x_0, \alpha_0, n) := \|\pi_n(x_0, \alpha_0) - \pi\|_{TV}.$

We call the adaptive Algorithm 2.2 ergodic if $T(x_0, \alpha_0, n) \to 0$ for $\pi$-almost every starting state $x_0$ and all $\alpha_0 \in \mathcal{Y}$.

We shall also consider random scan Metropolis-within-Gibbs samplers that, instead of sampling from the full conditional at step (2) of Algorithm 2.1 [resp., at step (3) of Algorithm 2.2], perform a single Metropolis or Metropolis–Hastings step [20, 29]. More precisely, given $X_{n-1,-i}$, the $i$th coordinate $X_{n-1,i}$ is updated by a draw $Y$ from the proposal distribution $Q_{X_{n-1,-i}}(X_{n-1,i}, \cdot)$ with the usual Metropolis acceptance probability for the marginal stationary distribution $\pi(\cdot \mid X_{n-1,-i})$. Such Metropolis-within-Gibbs algorithms were originally proposed by [29] and have been very widely used. Versions of this algorithm which adapt the proposal distributions $Q_{X_{n-1,-i}}(X_{n-1,i}, \cdot)$ were considered by, for example, [19, 42], but always with fixed (usually uniform) coordinate selection probabilities.
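A single Metropolis-within-Gibbs coordinate update of this kind can be sketched as follows; the symmetric Gaussian random-walk proposal is an illustrative assumption, under which the proposal-density factors cancel from the acceptance probability. Since $x_{-i}$ is held fixed, the ratio of full conditionals $\pi(y \mid x_{-i})/\pi(x_i \mid x_{-i})$ equals the ratio of joint densities, so an unnormalized joint log-density suffices:

```python
import numpy as np

def mwg_update(log_joint, x, i, scale, rng):
    """One Metropolis-within-Gibbs update of coordinate i using a symmetric
    random-walk proposal Q_{x_{-i}}(x_i, .)."""
    y = x.copy()
    y[i] += scale * rng.standard_normal()
    # Symmetric proposal: accept with prob. min(1, pi(y|x_{-i})/pi(x_i|x_{-i})),
    # computed via the joint density because x_{-i} is unchanged.
    if np.log(rng.uniform()) < log_joint(y) - log_joint(x):
        return y          # accept the proposal
    return x              # reject: the chain stays at x

rng = np.random.default_rng(2)
log_joint = lambda x: -0.5 * np.sum(x * x)    # standard bivariate normal
x = np.zeros(2)
draws = np.empty((5000, 2))
for n in range(5000):
    for i in range(2):                        # deterministic scan, demo only
        x = mwg_update(log_joint, x, i, scale=2.0, rng=rng)
    draws[n] = x
```

Replacing the deterministic scan with a draw `i = rng.choice(d, p=alpha)` gives the random scan variant discussed in the text.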
If instead the proposal distributions $Q_{X_{n-1,-i}}(X_{n-1,i}, \cdot)$ remain fixed, but the selection probabilities $\alpha_i$ are adapted on the fly, we obtain the following algorithm [where $q_{x,-i}(x, y)$ is the density function for $Q_{x,-i}(x, \cdot)$].

Algorithm 2.3 (AdapRSMwG).
(1) Set $\alpha_n := R_n(\alpha_0, \ldots, \alpha_{n-1}, X_{n-1}, \ldots, X_0) \in \mathcal{Y}$.
(2) Choose coordinate $i \in \{1, \ldots, d\}$ according to selection probabilities $\alpha_n$.
(3) Draw $Y \sim Q_{X_{n-1,-i}}(X_{n-1,i}, \cdot)$.
(4) With probability

(3)  $\min\left(1,\ \frac{\pi(Y \mid X_{n-1,-i})\, q_{X_{n-1,-i}}(Y, X_{n-1,i})}{\pi(X_{n-1,i} \mid X_{n-1,-i})\, q_{X_{n-1,-i}}(X_{n-1,i}, Y)}\right),$

accept the proposal and set $X_n = (X_{n-1,1}, \ldots, X_{n-1,i-1}, Y, X_{n-1,i+1}, \ldots, X_{n-1,d})$; otherwise, reject the proposal and set $X_n = X_{n-1}$.

Ergodicity of AdapRSMwG is considered in Sections 4.2 and 5 below. Of course, if the proposal distribution $Q_{X_{n-1,-i}}(X_{n-1,i}, \cdot)$ is symmetric, then the $q$ factors in the acceptance probability (3) cancel out, and (3) reduces to the simpler probability $\min(1,\ \pi(Y \mid X_{n-1,-i})/\pi(X_{n-1,i} \mid X_{n-1,-i}))$.

We shall also consider versions of the algorithm in which the proposal distributions $Q_{X_{n-1,-i}}(X_{n-1,i}, \cdot)$ are also chosen adaptively, from some family $\{Q_{x_{-i},\gamma}\}_{\gamma \in \Gamma_i}$ with corresponding density functions $q_{x_{-i},\gamma}$, as in, for example, the statistical genetics application [12, 45]. Versions of such algorithms with fixed selection probabilities are considered by, for example, [19] and [42]. They require additional adaptation parameters $\gamma_{n,i}$ that are updated on the fly and are allowed to depend on the past trajectories. More precisely, if $\gamma_n = (\gamma_{n,1}, \ldots, \gamma_{n,d})$ and $\mathcal{G}_n = \sigma\{X_0, \ldots, X_n, \alpha_0, \ldots, \alpha_n, \gamma_0, \ldots, \gamma_n\}$, then the conditional distribution of $\gamma_n$ given $\mathcal{G}_{n-1}$ can be specified by the particular algorithm used, via a second update function $R'_n$. If we combine such proposal distribution adaptations with coordinate selection probability adaptations, this results in a doubly-adaptive algorithm, as follows.

Algorithm 2.4 (AdapRSadapMwG).
(1) Set $\alpha_n := R_n(\alpha_0, \ldots, \alpha_{n-1}, X_{n-1}, \ldots, X_0, \gamma_{n-1}, \ldots, \gamma_0) \in \mathcal{Y}$.
(2) Set $\gamma_n := R'_n(\alpha_0, \ldots, \alpha_{n-1}, X_{n-1}, \ldots, X_0, \gamma_{n-1}, \ldots, \gamma_0) \in \Gamma_1 \times \cdots \times \Gamma_d$.
(3) Choose coordinate $i \in \{1, \ldots, d\}$ according to selection probabilities $\alpha_n$, that is, with $\mathbb{P}(i = j) = \alpha_{n,j}$.
(4) Draw $Y \sim Q_{X_{n-1,-i},\gamma_{n-1,i}}(X_{n-1,i}, \cdot)$.
(5) With probability given by (3), namely

$\min\left(1,\ \frac{\pi(Y \mid X_{n-1,-i})\, q_{X_{n-1,-i},\gamma_{n-1,i}}(Y, X_{n-1,i})}{\pi(X_{n-1,i} \mid X_{n-1,-i})\, q_{X_{n-1,-i},\gamma_{n-1,i}}(X_{n-1,i}, Y)}\right),$

accept the proposal and set $X_n = (X_{n-1,1}, \ldots, X_{n-1,i-1}, Y, X_{n-1,i+1}, \ldots, X_{n-1,d})$; otherwise, reject the proposal and set $X_n = X_{n-1}$.

Ergodicity of AdapRSadapMwG is considered in Sections 4.3 and 5 below.

3. A counter-example. Adaptive algorithms destroy the Markovian nature of $(X_n)_{n \ge 0}$, and are thus notoriously difficult to analyze theoretically. In particular, it is easy to be tricked into thinking that a simple adaptive algorithm "must" be ergodic when in fact it is not. For example, Theorem 2.1 of [24] states that ergodicity of adaptive Gibbs samplers follows from the following two simple conditions: (i) $\alpha_n \to \alpha$ a.s. for some fixed $\alpha \in (0,1)^d$; and (ii) the random scan Gibbs sampler with fixed selection probabilities $\alpha$ induces an ergodic Markov chain with stationary distribution $\pi$.
Unfortunately, this claim is false; that is, (i) and (ii) alone do not guarantee ergodicity, as the following example and proposition demonstrate. (It seems that in the proof of Theorem 2.1 in [24], the same measure is used to represent trajectories of the adaptive process and of a corresponding non-adaptive process, which is not correct and thus leads to the error.)

Example 3.1. Let $\mathbb{N} = \{1, 2, \ldots\}$, and let the state space $\mathcal{X} = \{(i,j) \in \mathbb{N} \times \mathbb{N} : i = j \text{ or } i = j+1\}$, with target distribution given by $\pi(i,j) \propto j^{-2}$. On $\mathcal{X}$, consider a class of adaptive random scan Gibbs samplers for $\pi$, as defined by Algorithm 2.2, with update rule given by

(4)  $R_n(\alpha_{n-1}, X_{n-1} = (i,j)) = \begin{cases} \left(\frac{1}{2} + \frac{4}{a_n},\ \frac{1}{2} - \frac{4}{a_n}\right), & \text{if } i = j, \\ \left(\frac{1}{2} - \frac{4}{a_n},\ \frac{1}{2} + \frac{4}{a_n}\right), & \text{if } i = j + 1, \end{cases}$

for some choice of the sequence $(a_n)_{n=0}^{\infty}$ satisfying $8 < a_n \nearrow \infty$.

Example 3.1 satisfies assumptions (i) and (ii) above. Indeed, (i) clearly holds since $\alpha_n \to \alpha := (\frac12, \frac12)$, and (ii) follows immediately from the standard Markov chain properties of irreducibility and aperiodicity; cf. [30, 40]. However, if $a_n$ increases to $\infty$ slowly enough, then the example exhibits transient behavior and is not ergodic. More precisely, we shall prove the following proposition.

Proposition 3.2. There exists a choice of the $(a_n)$ for which the process $(X_n)_{n \ge 0}$ defined in Example 3.1 is not ergodic. Specifically, starting at $X_0 = (1,1)$, we have $\mathbb{P}(X_{n,1} \to \infty) > 0$; that is, the process exhibits transient behavior with positive probability, so it does not converge in distribution to any probability measure on $\mathcal{X}$. In particular, $\|\pi_n - \pi\|_{TV} \nrightarrow 0$.

Remark 3.3. In fact, we believe that in Proposition 3.2, $\mathbb{P}(X_{n,1} \to \infty) = 1$, though to reduce technicalities we only prove that $\mathbb{P}(X_{n,1} \to \infty) > 0$, which is sufficient to establish nonergodicity.

A detailed proof of Proposition 3.2 is presented in Section 6. We also simulated Example 3.1 on a computer [with the $(a_n)$ as defined in Section 6], resulting in the trace plot of $X_{n,1}$ which illustrates the transient behavior, since $X_{n,1}$ increases quickly and steadily as a function of $n$.

[Figure 1. Trace plot of $X_{n,1}$ from Example 3.1.]

4. Ergodicity: the uniform case. We now present positive results about ergodicity of adaptive Gibbs samplers under various assumptions. Results of this section are specific to uniformly ergodic chains. (Recall that a Markov chain with transition kernel $P$ is uniformly ergodic if there exist $M < \infty$ and $\rho < 1$ s.t. $\|P^n(x, \cdot) - \pi(\cdot)\|_{TV} \le M \rho^n$ for every $x \in \mathcal{X}$; see, e.g., [30, 40] for this and other notions related to general state space Markov chains.) In some sense this is a severe restriction, since most MCMC algorithms arising in statistical applications are not uniformly ergodic. However, truncating the variables involved at some (very large) value is usually sufficient to ensure uniform ergodicity without affecting the statistical conclusions in any practical sense, so the results of this section may be sufficient for a pragmatic user. The nonuniform case is considered in the following Section 5.

To continue, recall that RSG($\alpha$) stands for the random scan Gibbs sampler with selection probabilities $\alpha$ as defined by Algorithm 2.1, and AdapRSG is the adaptive version as defined by Algorithm 2.2. For notation, let

$\Delta_{d-1} := \left\{(p_1, \ldots, p_d) \in \mathbb{R}^d : p_i \ge 0,\ \sum_{i=1}^d p_i = 1\right\}$

be the $(d-1)$-dimensional probability simplex, and let

(5)  $\mathcal{Y} := [\varepsilon, 1]^d \cap \Delta_{d-1}$

for some $0 < \varepsilon \le 1/d$.
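In an implementation, one simple way to keep adapted weights inside a set of the form $\mathcal{Y} = [\varepsilon, 1]^d \cap \Delta_{d-1}$ is to reserve total mass $\varepsilon d$ as a floor and distribute the remaining $1 - d\varepsilon$ proportionally to any nonnegative weight vector. This particular construction is an illustrative choice, not prescribed by the paper:

```python
import numpy as np

def into_Y(weights, eps):
    """Map nonnegative weights to a point of Y = [eps,1]^d ∩ Δ_{d-1}:
    every coordinate receives the floor eps, and the remaining mass
    1 - d*eps is shared proportionally to the weights."""
    w = np.asarray(weights, dtype=float)
    d = len(w)
    assert 0 < eps <= 1.0 / d and (w >= 0).all() and w.sum() > 0
    return eps + (1.0 - d * eps) * w / w.sum()

alpha = into_Y([10.0, 1.0, 1.0], eps=1.0 / 6.0)
```

By construction the result sums to $\varepsilon d + (1 - d\varepsilon) = 1$ and every coordinate lies in $[\varepsilon, 1]$, so it always lands in $\mathcal{Y}$.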
We shall assume that all our selection probabilities are in this set $\mathcal{Y}$.

Remark 4.1. The above assumption may seem constraining; it is, however, irrelevant in practice. The additional computational effort on top of the unknown optimal strategy $\alpha^*$ (which may be in $\Delta_{d-1} \setminus \mathcal{Y}$) is easily controlled by setting $\varepsilon := (Kd)^{-1}$, which effectively upper-bounds it by $1/K$. The argument can easily be made rigorous, for example, in terms of the total variation distance or the asymptotic variance.

4.1. Adaptive random scan Gibbs samplers. The main result of this section is the following theorem.

Theorem 4.2. Let the selection probabilities $\alpha_n \in \mathcal{Y}$ for all $n$, with $\mathcal{Y}$ as in (5). Assume that:
(a) $|\alpha_n - \alpha_{n-1}| \to 0$ in probability for fixed starting values $x_0 \in \mathcal{X}$ and $\alpha_0 \in \mathcal{Y}$;
(b) there exists $\beta \in \mathcal{Y}$ s.t. RSG($\beta$) is uniformly ergodic.
Then AdapRSG is ergodic, that is,

(6)  $T(x_0, \alpha_0, n) \to 0$ as $n \to \infty$.

Moreover, if:
(a′) $\sup_{x_0, \alpha_0} |\alpha_n - \alpha_{n-1}| \to 0$ in probability,
then convergence of AdapRSG is also uniform over all $x_0, \alpha_0$, that is,

(7)  $\sup_{x_0, \alpha_0} T(x_0, \alpha_0, n) \to 0$ as $n \to \infty$.

Remark 4.3.
(1) Assumption (b) will typically be verified for $\beta = (1/d, \ldots, 1/d)$; see also Proposition 4.8 below.
(2) We expect that most adaptive random scan Gibbs samplers will be designed so that $|\alpha_n - \alpha_{n-1}| \le a_n$ for every $n \ge 1$, $x_0 \in \mathcal{X}$, $\alpha_0 \in \mathcal{Y}$, and $\omega \in \Omega$, for some deterministic sequence $a_n \to 0$ (which holds, e.g., for the adaptations considered in [12]). In such cases, (a′) is automatically satisfied.
(3) The sequence $\alpha_n$ is not required to converge and, in particular, the amount of adaptation, that is, $\sum_{n=1}^\infty |\alpha_n - \alpha_{n-1}|$, is allowed to be infinite.
(4) In Example 3.1, condition (a′) is satisfied but condition (b) is not.
(5) If we modify Example 3.1 by truncating the state space to, say, $\tilde{\mathcal{X}} = \mathcal{X} \cap (\{1, \ldots, M\} \times \{1, \ldots, M\})$ for some $1 < M < \infty$, then the corresponding adaptive Gibbs sampler is ergodic and (7) holds.

Before we proceed with the proof of Theorem 4.2, we need some preliminary lemmas, which may be of independent interest.

Lemma 4.4. Let $\beta \in \mathcal{Y}$ with $\mathcal{Y}$ as in (5). If RSG($\beta$) is uniformly ergodic, then RSG($\alpha$) is also uniformly ergodic for every $\alpha \in \mathcal{Y}$. Moreover, there exist $M < \infty$ and $\rho < 1$ s.t.

$\sup_{x_0 \in \mathcal{X},\, \alpha \in \mathcal{Y}} T(x_0, \alpha, n) \le M \rho^n \to 0.$

Proof. Let $P_\beta$ be the transition kernel of RSG($\beta$). It is well known that for uniformly ergodic Markov chains the whole state space $\mathcal{X}$ is small (cf. Theorems 5.2.1 and 5.2.4 in [30] with their $\psi = \pi$). Thus there exist $s > 0$, a probability measure $\mu$ on $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$ and a positive integer $m$, s.t. for every $x \in \mathcal{X}$,

(8)  $P_\beta^m(x, \cdot) \ge s \mu(\cdot).$

Fix $\alpha \in \mathcal{Y}$ and let $r := \min_i \frac{\alpha_i}{\beta_i}$. Since $\beta \in \mathcal{Y}$, we have

$1 \ge r \ge \frac{\varepsilon}{1 - (d-1)\varepsilon} > 0,$

and $P_\alpha$ can be written as a mixture of transition kernels of two random scan Gibbs samplers, namely,

$P_\alpha = r P_\beta + (1 - r) P_q, \quad \text{where } q = \frac{\alpha - r\beta}{1 - r}.$

This, combined with (8), implies that for every $x \in \mathcal{X}$,

(9)  $P_\alpha^m(x, \cdot) \ge r^m P_\beta^m(x, \cdot) \ge r^m s \mu(\cdot) \ge \left(\frac{\varepsilon}{1 - (d-1)\varepsilon}\right)^m s \mu(\cdot).$

By Theorem 8 of [40], condition (9) implies

(10)  $\|P_\alpha^n(x, \cdot) - \pi(\cdot)\|_{TV} \le \left(1 - \left(\frac{\varepsilon}{1 - (d-1)\varepsilon}\right)^m s\right)^{\lfloor n/m \rfloor}$ for all $x \in \mathcal{X}$.

Since the right-hand side of (10) does not depend on $\alpha$, the claim follows. □

Lemma 4.5. Let $P_\alpha$ and $P_{\alpha'}$ be random scan Gibbs samplers using selection probabilities $\alpha, \alpha' \in \mathcal{Y} := [\varepsilon, 1 - (d-1)\varepsilon]^d$ for some $\varepsilon > 0$. Then

(11)  $\|P_\alpha(x, \cdot) - P_{\alpha'}(x, \cdot)\|_{TV} \le \frac{|\alpha - \alpha'|}{\varepsilon + |\alpha - \alpha'|} \le \frac{|\alpha - \alpha'|}{\varepsilon}.$

Proof. Let $\delta := |\alpha - \alpha'|$.
Then

$r := \min_i \frac{\alpha'_i}{\alpha_i} \ge \frac{\varepsilon}{\varepsilon + \max_i |\alpha_i - \alpha'_i|} \ge \frac{\varepsilon}{\varepsilon + \delta},$

and, reasoning as in the proof of Lemma 4.4, we can write $P_{\alpha'} = r P_\alpha + (1 - r) P_q$ for some $q$ and compute

$\|P_\alpha(x, \cdot) - P_{\alpha'}(x, \cdot)\|_{TV} = \|(r P_\alpha + (1-r) P_\alpha) - (r P_\alpha + (1-r) P_q)\|_{TV} = (1 - r)\|P_\alpha - P_q\|_{TV} \le \frac{\delta}{\varepsilon + \delta},$

as claimed. □

Corollary 4.6. $P_\alpha(x, B)$, as a function of $\alpha$ on $\mathcal{Y}$, is Lipschitz with Lipschitz constant $1/\varepsilon$ for every fixed set $B \in \mathcal{B}(\mathcal{X})$.

Corollary 4.7. If $|\alpha_n - \alpha_{n-1}| \to 0$ in probability, then also $\sup_{x \in \mathcal{X}} \|P_{\alpha_n}(x, \cdot) - P_{\alpha_{n-1}}(x, \cdot)\|_{TV} \to 0$ in probability.

Proof of Theorem 4.2. We conclude the result from Theorem 1 of [41], which requires simultaneous uniform ergodicity and diminishing adaptation. Simultaneous uniform ergodicity results from combining assumption (b) and Lemma 4.4. Diminishing adaptation results from assumption (a) with Corollary 4.7. Moreover, note that Lemma 4.4 is uniform in $x_0$ and $\alpha_0$, and (a′) yields uniformly diminishing adaptation, again by Corollary 4.7. A look into the proof of Theorem 1 of [41] reveals that this suffices for the uniform part of Theorem 4.2. □

Finally, we note that verifying uniform ergodicity of a random scan Gibbs sampler, as required by assumption (b) of Theorem 4.2, may not be straightforward. Such issues have been investigated in, for example, [35], and more recently in relation to the parametrization of hierarchical models (see [32] and references therein). In the following proposition, we show that to verify uniform ergodicity of any random scan Gibbs sampler, it suffices to verify uniform ergodicity of the corresponding systematic scan Gibbs sampler (which updates the coordinates $1, 2, \ldots, d$ in sequence rather than selecting coordinates randomly). See also Theorem 2 of [31] for a related result.

Proposition 4.8. Let $\alpha \in \mathcal{Y}$ with $\mathcal{Y}$ as in (5).
If the systematic scan Gibbs sampler is uniformly ergodic, then so is RSG($\alpha$).

Proof. Let $P = P_1 P_2 \cdots P_d$ be the transition kernel of the uniformly ergodic systematic scan Gibbs sampler, where $P_i$ stands for the step that updates coordinate $i$. By the minorization condition characterization, there exist $s > 0$, a probability measure $\mu$ on $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$ and a positive integer $m$, s.t. for every $x \in \mathcal{X}$, $P^m(x, \cdot) \ge s\mu(\cdot)$. However, the probability that the random scan Gibbs sampler $P_{1/d}$, in its $md$ subsequent steps, will update the coordinates in exactly the same order is $(1/d)^{md} > 0$. Therefore, the following minorization condition holds for the random scan Gibbs sampler:

$P_{1/d}^{md}(x, \cdot) \ge (1/d)^{md} s \mu(\cdot).$

We conclude that RSG($1/d$) is uniformly ergodic, and then, by Lemma 4.4, it follows that RSG($\alpha$) is uniformly ergodic for any $\alpha \in \mathcal{Y}$. □

4.2. Adaptive random scan Metropolis-within-Gibbs. In this section we consider random scan Metropolis-within-Gibbs sampler algorithms (see also Section 5 for the nonuniform case). Thus, given $X_{n-1,-i}$, the $i$th coordinate $X_{n-1,i}$ is updated by a draw $Y$ from the proposal distribution $Q_{X_{n-1,-i}}(X_{n-1,i}, \cdot)$ with the usual Metropolis acceptance probability for the marginal stationary distribution $\pi(\cdot \mid X_{n-1,-i})$. Here, we consider algorithm AdapRSMwG, where the proposal distributions $Q_{X_{n-1,-i}}(X_{n-1,i}, \cdot)$ remain fixed, but the selection probabilities $\alpha_i$ are adapted on the fly. We shall prove ergodicity of such algorithms under some circumstances. (The more general algorithm AdapRSadapMwG is then considered in the following section.) To continue, let $P_{x_{-i}}$ denote the resulting Metropolis transition kernel for obtaining $X_{n,i} \mid X_{n-1,i}$ given $X_{n-1,-i} = x_{-i}$. We shall require the following assumption.

Assumption 4.9.
For every $i \in \{1, \ldots, d\}$, the transition kernel $P_{x_{-i}}$ is uniformly ergodic for every $x_{-i} \in \mathcal{X}_{-i}$. Moreover, there exist $s_i > 0$ and an integer $m_i$ s.t. for every $x_{-i} \in \mathcal{X}_{-i}$ there exists a probability measure $\nu_{x_{-i}}$ on $(\mathcal{X}_i, \mathcal{B}(\mathcal{X}_i))$, s.t.

$P_{x_{-i}}^{m_i}(x_i, \cdot) \ge s_i \nu_{x_{-i}}(\cdot)$ for every $x_i \in \mathcal{X}_i$.

We have the following counterpart of Theorem 4.2.

Theorem 4.10. Let $\alpha_n \in \mathcal{Y}$ for all $n$, with $\mathcal{Y}$ as in (5). Assume that:
(a) $|\alpha_n - \alpha_{n-1}| \to 0$ in probability for fixed starting values $x_0 \in \mathcal{X}$ and $\alpha_0 \in \mathcal{Y}$;
(b) there exists $\beta \in \mathcal{Y}$ s.t. RSG($\beta$) is uniformly ergodic;
(c) Assumption 4.9 holds.
Then AdapRSMwG is ergodic, that is,

(12)  $T(x_0, \alpha_0, n) \to 0$ as $n \to \infty$.

Moreover, if:
(a′) $\sup_{x_0, \alpha_0} |\alpha_n - \alpha_{n-1}| \to 0$ in probability,
then convergence of AdapRSMwG is also uniform over all $x_0, \alpha_0$, that is,

(13)  $\sup_{x_0, \alpha_0} T(x_0, \alpha_0, n) \to 0$ as $n \to \infty$.

Remark 4.11. Remarks 4.3(1)–(3) still apply. Also, Assumption 4.9 can easily be verified in some cases of interest, for example:
(1) Independence samplers are essentially uniformly ergodic if and only if the candidate density is bounded below by a multiple of the stationary density, that is, $q(dx) \ge s\pi(dx)$ for some $s > 0$; cf. [28].
(2) The Metropolis–Hastings algorithm with continuous and positive proposal density $q(\cdot, \cdot)$ and bounded target density $\pi$ is uniformly ergodic if the state space is compact; cf. [30, 40].

To prove Theorem 4.10 we build on the approach of [37]. In particular, recall the following notions of reversibility and of strong uniform ergodicity.

Definition 4.12. We say that a transition kernel $P$ on $\mathcal{X}$ is reversible with respect to its stationary distribution $\pi$ if, for any $A, B \in \mathcal{B}(\mathcal{X})$,

$\int_A P(x, B)\,\pi(dx) = \int_B P(y, A)\,\pi(dy).$

Definition 4.13.
We say that a transition kernel $P$ on $\mathcal{X}$ with stationary distribution $\pi$ is $(m, s)$-strongly uniformly ergodic if, for some $s > 0$ and positive integer $m$,

$P^m(x, \cdot) \ge s\pi(\cdot)$ for every $x \in \mathcal{X}$.

Moreover, we will say that a family of Markov chains $\{P_\gamma\}_{\gamma \in \Gamma}$ on $\mathcal{X}$ with stationary distribution $\pi$ is $(m, s)$-simultaneously strongly uniformly ergodic if, for some $s > 0$ and positive integer $m$,

$P_\gamma^m(x, \cdot) \ge s\pi(\cdot)$ for every $x \in \mathcal{X}$ and $\gamma \in \Gamma$.

By Proposition 1 in [37], if a Markov chain is both uniformly ergodic and reversible, then it is strongly uniformly ergodic. The following lemma improves over this result by controlling both involved parameters.

Lemma 4.14. Let $\mu$ be a probability measure on $\mathcal{X}$, let $m$ be a positive integer and let $s > 0$. If a reversible transition kernel $P$ satisfies the condition $P^m(x, \cdot) \ge s\mu(\cdot)$ for every $x \in \mathcal{X}$, then it is $\left(\left(\left\lfloor \frac{\log(s/4)}{\log(1-s)} \right\rfloor + 2\right)m,\ \frac{s^2}{8}\right)$-strongly uniformly ergodic.

Proof. By Theorem 8 of [40], for every $A \in \mathcal{B}(\mathcal{X})$ we have $|P^n(x, A) - \pi(A)| \le (1-s)^{\lfloor n/m \rfloor}$ and, in particular,

(14)  $|P^{km}(x, A) - \pi(A)| \le s/4$ for $k \ge \frac{\log(s/4)}{\log(1-s)}.$

Since $\pi$ is stationary for $P$, we have $\pi(\cdot) \ge s\mu(\cdot)$, and thus an upper bound for the Radon–Nikodym derivative:

(15)  $d\mu/d\pi \le 1/s.$

Moreover, by reversibility, $\pi(dx) P^m(x, dy) = \pi(dy) P^m(y, dx) \ge \pi(dy)\, s\mu(dx)$, and consequently

(16)  $P^m(x, dy) \ge s\,(\mu(dx)/\pi(dx))\,\pi(dy).$

Now define $A := \{x \in \mathcal{X} : \mu(dx)/\pi(dx) \ge 1/2\}$. Clearly $\mu(A^c) \le 1/2$. Therefore by (15) we have $1/2 \le \mu(A) \le (1/s)\pi(A)$ and hence $\pi(A) \ge s/2$.
Moreover, since $\pi(A) \ge s/2$, (14) yields $P^{km}(x, A) \ge s/4$ for $k := \lfloor \frac{\log(s/4)}{\log(1-s)} \rfloor + 1$, and with $k$ defined above, by (16) we have
$$P^{km+m}(x, \cdot) = \int_{\mathcal{X}} P^{km}(x, \mathrm{d}z) P^m(z, \cdot) \ge \int_A P^{km}(x, \mathrm{d}z) P^m(z, \cdot) \ge \int_A P^{km}(x, \mathrm{d}z)\, (s/2)\pi(\cdot) \ge (s^2/8)\pi(\cdot).$$
The proof is complete.

We will need the following generalization of Lemma 4.4.

Lemma 4.15. Let $\beta \in \mathcal{Y}$ with $\mathcal{Y}$ as in (5). If RSG($\beta$) is uniformly ergodic, then there exist $s' > 0$ and a positive integer $m'$ s.t. the family $\{\mathrm{RSG}(\alpha)\}_{\alpha\in\mathcal{Y}}$ is $(m', s')$-simultaneously strongly uniformly ergodic.

Proof. $P_\beta(x,\cdot)$ is uniformly ergodic and reversible; therefore, by Proposition 1 in [37], it is $(m, s_1)$-strongly uniformly ergodic for some $m$ and $s_1$. Therefore, arguing as in the proof of Lemma 4.4 [cf. (9)], there exists $s_2 \ge \big(\frac{\varepsilon}{1-(d-1)\varepsilon}\big)^m$ s.t. for every $\alpha \in \mathcal{Y}$ and every $x \in \mathcal{X}$
$$P^m_\alpha(x, \cdot) \ge s_2 P^m_\beta(x, \cdot) \ge s_1 s_2 \pi(\cdot). \qquad (17)$$
Set $m' = m$ and $s' = s_1 s_2$.

Proof of Theorem 4.10. We proceed as in the proof of Theorem 4.2, that is, we establish diminishing adaptation and simultaneous uniform ergodicity and conclude (12) and (13) from Theorem 1 of [41]. Observe that Lemma 4.5 applies to random scan Metropolis-within-Gibbs algorithms in exactly the same way as to random scan Gibbs samplers; thus diminishing adaptation results from assumption (a) and Corollary 4.7. To establish simultaneous uniform ergodicity, observe that, by Assumption 4.9 and Lemma 4.14, the Metropolis transition kernel for the $i$th coordinate, that is, $P_{x_{-i}}$, has stationary distribution $\pi(\cdot\,|\,x_{-i})$ and is $\big(\big(\lfloor \frac{\log(s_i/4)}{\log(1-s_i)} \rfloor + 2\big)m_i, \frac{s_i^2}{8}\big)$-strongly uniformly ergodic.
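As an aside, the explicit constants delivered by Lemma 4.14 are straightforward to compute. A minimal sketch (the function name and the sample values $m = 3$, $s = 0.2$ are ours, purely for illustration):

```python
import math

def strong_unif_constants(m, s):
    """Lemma 4.14: a reversible kernel P with P^m(x, .) >= s mu(.) for all x
    is (M, S)-strongly uniformly ergodic, where
    M = (floor(log(s/4) / log(1-s)) + 2) * m  and  S = s^2 / 8."""
    k = math.floor(math.log(s / 4) / math.log(1 - s))
    return (k + 2) * m, s * s / 8

# e.g. a 3-step minorization with s = 0.2
M, S = strong_unif_constants(m=3, s=0.2)
print(M, S)
```

Note how quickly the effective epoch $M$ grows as the minorization constant $s$ shrinks, while $S = s^2/8$ degrades quadratically.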
Moreover, by Lemma 4.15, the family RSG($\alpha$), $\alpha \in \mathcal{Y}$, is $(m', s')$-strongly uniformly ergodic; therefore, by Theorem 2 of [37], the family of random scan Metropolis-within-Gibbs samplers with selection probabilities $\alpha \in \mathcal{Y}$, RSMwG($\alpha$), is $(m^*, s^*)$-simultaneously strongly uniformly ergodic with $m^*$ and $s^*$ given as in [37].

We close this section with the following alternative version of Theorem 4.10.

Theorem 4.16. Let $\alpha_n \in \mathcal{Y}$ for all $n$, with $\mathcal{Y}$ as in (5). Assume that:
(a) $|\alpha_n - \alpha_{n-1}| \to 0$ in probability for fixed starting values $x_0 \in \mathcal{X}$ and $\alpha_0 \in \mathcal{Y}$;
(b) there exists $\beta \in \mathcal{Y}$ s.t. RSMwG($\beta$) is uniformly ergodic.
Then AdapRSMwG is ergodic, that is,
$$T(x_0, \alpha_0, n) \to 0 \quad \text{as } n \to \infty. \qquad (18)$$
Moreover, if:
(a′) $\sup_{x_0,\alpha_0} |\alpha_n - \alpha_{n-1}| \to 0$ in probability,
then convergence of AdapRSMwG is also uniform over all $x_0, \alpha_0$, that is,
$$\sup_{x_0,\alpha_0} T(x_0, \alpha_0, n) \to 0 \quad \text{as } n \to \infty. \qquad (19)$$

Proof. Diminishing adaptation results from assumption (a) and Corollary 4.7. Simultaneous uniform ergodicity can be established as in the proof of Lemma 4.4. The claim follows from Theorem 1 of [41].

Remark 4.17. Whereas the statement of Theorem 4.16 may be useful in specific examples, typically condition (b), the uniform ergodicity of a random scan Metropolis-within-Gibbs sampler, will not be available, and establishing it will involve the conditions required by Theorem 4.10.

4.3. Adaptive random scan adaptive Metropolis-within-Gibbs. In this section, and also later in Section 5, we consider the adaptive random scan adaptive Metropolis-within-Gibbs algorithm AdapRSadapMwG, which updates both the selection probabilities of the Gibbs kernel and the proposal distributions of the Metropolis step.
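One iteration of such a doubly adaptive sampler can be sketched in code. This is a minimal illustration, not the paper's Algorithm: the Gaussian target, the symmetric random walk proposal and all names are our own assumptions, and the adaptation of $\alpha$ and $\gamma$ is left as a no-op placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)

def rs_mwg_step(x, alpha, gamma, log_pi):
    """One random scan Metropolis-within-Gibbs step: choose coordinate i
    with probability alpha[i], then perform a symmetric random walk
    Metropolis update of x[i] with proposal scale gamma[i]."""
    i = rng.choice(len(x), p=alpha)
    y = x.copy()
    y[i] += gamma[i] * rng.standard_normal()
    # usual acceptance probability min{1, pi(y)/pi(x)}
    if np.log(rng.uniform()) < log_pi(y) - log_pi(x):
        x = y
    return x

log_pi = lambda x: -0.5 * x @ x        # hypothetical N(0, I) target
x = np.zeros(2)
alpha = np.array([0.5, 0.5])           # selection probabilities, in Y
gamma = np.array([2.4, 2.4])           # per-coordinate proposal scales
for n in range(1000):
    x = rs_mwg_step(x, alpha, gamma, log_pi)
    # a doubly adaptive version would update alpha and gamma here,
    # subject to diminishing adaptation: |alpha_n - alpha_{n-1}| -> 0
    # and vanishing changes in the proposal kernels
```

The ergodicity theory below concerns exactly what constraints such an adaptation rule must satisfy for the resulting chain to remain valid.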
Thus, given $X_{n-1,-i}$, the $i$th coordinate $X_{n-1,i}$ is updated by a draw $Y$ from a proposal distribution $Q_{X_{n-1,-i},\gamma_{n,i}}(X_{n-1,i}, \cdot)$ with the usual acceptance probability. This doubly adaptive algorithm has been used, for example, by [12] for an application in statistical genetics. As with adaptive Metropolis algorithms, the adaptation of the proposal distributions in this setting is motivated by optimal scaling results for random walk Metropolis algorithms [4, 10, 11, 34, 38–40, 42, 43].

Let $P_{x_{-i},\gamma_{n,i}}$ denote the resulting Metropolis transition kernel for obtaining $X_{n,i}\,|\,X_{n-1,i}$ given $X_{n-1,-i} = x_{-i}$. We will prove ergodicity of this generalized algorithm using tools from the previous section. Assumption 4.9 must be reformulated accordingly, as follows.

Assumption 4.18. For every $i \in \{1,\dots,d\}$, $x_{-i} \in \mathcal{X}_{-i}$ and $\gamma_i \in \Gamma_i$, the transition kernel $P_{x_{-i},\gamma_i}$ is uniformly ergodic. Moreover, there exist $s_i > 0$ and an integer $m_i$ s.t. for every $x_{-i} \in \mathcal{X}_{-i}$ and $\gamma_i \in \Gamma_i$ there exists a probability measure $\nu_{x_{-i},\gamma_i}$ on $(\mathcal{X}_i, \mathcal{B}(\mathcal{X}_i))$ s.t.
$$P^{m_i}_{x_{-i},\gamma_i}(x_i, \cdot) \ge s_i \nu_{x_{-i},\gamma_i}(\cdot) \quad \text{for every } x_i \in \mathcal{X}_i.$$

We have the following counterpart of Theorems 4.2 and 4.10.

Theorem 4.19. Let $\alpha_n \in \mathcal{Y}$ for all $n$, with $\mathcal{Y}$ as in (5). Assume that:
(a) $|\alpha_n - \alpha_{n-1}| \to 0$ in probability for fixed starting values $x_0 \in \mathcal{X}$, $\alpha_0 \in \mathcal{Y}$ and $\gamma_0 \in \Gamma$;
(b) there exists $\beta \in \mathcal{Y}$ s.t. RSG($\beta$) is uniformly ergodic;
(c) Assumption 4.18 holds;
(d) the Metropolis-within-Gibbs kernels exhibit diminishing adaptation, that is, for every $i \in \{1,\dots,d\}$ the $\mathcal{G}_{n+1}$-measurable random variable
$$\sup_{x\in\mathcal{X}} \|P_{x_{-i},\gamma_{n+1,i}}(x_i, \cdot) - P_{x_{-i},\gamma_{n,i}}(x_i, \cdot)\|_{\mathrm{TV}} \to 0$$
in probability, as $n \to \infty$, for fixed starting values $x_0 \in \mathcal{X}$, $\alpha_0 \in \mathcal{Y}$ and $\gamma_0$.
Then AdapRSadapMwG is ergodic, that is,
$$T(x_0, \alpha_0, n) \to 0 \quad \text{as } n \to \infty. \qquad (20)$$
Moreover, if:
(a′) $\sup_{x_0,\alpha_0} |\alpha_n - \alpha_{n-1}| \to 0$ in probability,
(d′) $\sup_{x_0,\alpha_0} \sup_{x\in\mathcal{X}} \|P_{x_{-i},\gamma_{n+1,i}}(x_i,\cdot) - P_{x_{-i},\gamma_{n,i}}(x_i,\cdot)\|_{\mathrm{TV}} \to 0$ in probability,
then convergence of AdapRSadapMwG is also uniform over all $x_0, \alpha_0$, that is,
$$\sup_{x_0,\alpha_0} T(x_0, \alpha_0, n) \to 0 \quad \text{as } n \to \infty. \qquad (21)$$

Remark 4.20. Remarks 4.3(1)–(3) still apply, and Remark 4.11 applies for verifying Assumption 4.18. Verifying condition (d) is discussed after the proof.

Proof of Theorem 4.19. We again proceed by establishing diminishing adaptation and simultaneous uniform ergodicity and concluding the result from Theorem 1 of [41]. To establish simultaneous uniform ergodicity we proceed as in the proof of Theorem 4.10. Observe that by Assumption 4.18 and Lemma 4.14 every adaptive Metropolis transition kernel for the $i$th coordinate, that is, $P_{x_{-i},\gamma_i}$, has stationary distribution $\pi(\cdot\,|\,x_{-i})$ and is $\big(\big(\lfloor \frac{\log(s_i/4)}{\log(1-s_i)} \rfloor + 2\big)m_i, \frac{s_i^2}{8}\big)$-strongly uniformly ergodic. Moreover, by Lemma 4.15, the family RSG($\alpha$), $\alpha \in \mathcal{Y}$, is $(m', s')$-strongly uniformly ergodic; therefore, by Theorem 2 of [37], the family of random scan Metropolis-within-Gibbs samplers with selection probabilities $\alpha \in \mathcal{Y}$ and proposals indexed by $\gamma \in \Gamma$ is $(m^*, s^*)$-simultaneously strongly uniformly ergodic with $m^*$ and $s^*$ given as in [37]. For diminishing adaptation we write
$$\sup_{x\in\mathcal{X}} \|P_{\alpha_n,\gamma_n}(x,\cdot) - P_{\alpha_{n-1},\gamma_{n-1}}(x,\cdot)\|_{\mathrm{TV}} \le \sup_{x\in\mathcal{X}} \|P_{\alpha_n,\gamma_n}(x,\cdot) - P_{\alpha_{n-1},\gamma_n}(x,\cdot)\|_{\mathrm{TV}} + \sup_{x\in\mathcal{X}} \|P_{\alpha_{n-1},\gamma_n}(x,\cdot) - P_{\alpha_{n-1},\gamma_{n-1}}(x,\cdot)\|_{\mathrm{TV}}.$$
The first term above converges to 0 in probability by Corollary 4.7 and assumption (a).
The second term,
$$\sup_{x\in\mathcal{X}} \|P_{\alpha_{n-1},\gamma_n}(x,\cdot) - P_{\alpha_{n-1},\gamma_{n-1}}(x,\cdot)\|_{\mathrm{TV}} \le \sum_{i=1}^d \alpha_{n-1,i} \sup_{x\in\mathcal{X}} \|P_{x_{-i},\gamma_{n+1,i}}(x_i,\cdot) - P_{x_{-i},\gamma_{n,i}}(x_i,\cdot)\|_{\mathrm{TV}},$$
converges to 0 in probability as a mixture of terms that converge to 0 in probability.

The following lemma can be used to verify assumption (d) of Theorem 4.19 (see also Example 4.22 below).

Lemma 4.21. Assume that the adaptive proposals exhibit diminishing adaptation, that is, for every $i \in \{1,\dots,d\}$ the $\mathcal{G}_{n+1}$-measurable random variable
$$\sup_{x\in\mathcal{X}} \|Q_{x_{-i},\gamma_{n+1,i}}(x_i,\cdot) - Q_{x_{-i},\gamma_{n,i}}(x_i,\cdot)\|_{\mathrm{TV}} \to 0$$
in probability, as $n \to \infty$, for fixed starting values $x_0 \in \mathcal{X}$ and $\alpha_0 \in \mathcal{Y}$. Then either of the following conditions:
(i) the Metropolis proposals have symmetric densities, that is, $q_{x_{-i},\gamma_{n,i}}(x_i, y_i) = q_{x_{-i},\gamma_{n,i}}(y_i, x_i)$;
(ii) $\mathcal{X}_i$ is compact for every $i$, and $\pi$ is continuous, everywhere positive and bounded;
implies condition (d) of Theorem 4.19.

Proof. The first statement can be concluded from Proposition 12.3 of [1]; however, to be self-contained, we provide the argument. Let $P_1, P_2$ denote transition kernels and $Q_1, Q_2$ proposal kernels of two generic Metropolis algorithms for sampling from $\pi$ on an arbitrary state space $\mathcal{X}$. To see that (i) implies (d), we check that
$$\|P_1(x,\cdot) - P_2(x,\cdot)\|_{\mathrm{TV}} \le 2\|Q_1(x,\cdot) - Q_2(x,\cdot)\|_{\mathrm{TV}}.$$
Indeed, the acceptance probability $\alpha(x,y) = \min\{1, \pi(y)/\pi(x)\} \in [0,1]$ does not depend on the proposal, and for any $x \in \mathcal{X}$ and $A \in \mathcal{B}(\mathcal{X})$ we compute
$$|P_1(x,A) - P_2(x,A)| \le \Big|\int_A \alpha(x,y)(q_1(y) - q_2(y))\,\mathrm{d}y\Big| + \mathbb{I}\{x \in A\}\Big|\int_{\mathcal{X}} (1-\alpha(x,y))(q_1(y) - q_2(y))\,\mathrm{d}y\Big| \le 2\|Q_1(x,\cdot) - Q_2(x,\cdot)\|_{\mathrm{TV}}.$$
For the second statement, note that condition (ii) implies there exists $K < \infty$ s.t. $\pi(y)/\pi(x) \le K$ for every $x, y \in \mathcal{X}$.
To conclude that (d) results from (ii), note that
$$|\min\{a,b\} - \min\{c,d\}| \le |a-c| + |b-d| \qquad (22)$$
and recall the acceptance probabilities $\alpha_i(x,y) = \min\{1, \frac{\pi(y)q_i(y,x)}{\pi(x)q_i(x,y)}\}$. Indeed, for any $x \in \mathcal{X}$ and $A \in \mathcal{B}(\mathcal{X})$, using (22), we have
$$|P_1(x,A) - P_2(x,A)| \le \int_A \Big|\min\Big\{q_1(x,y), \frac{\pi(y)}{\pi(x)}q_1(y,x)\Big\} - \min\Big\{q_2(x,y), \frac{\pi(y)}{\pi(x)}q_2(y,x)\Big\}\Big|\,\mathrm{d}y + \mathbb{I}\{x\in A\}\Big|\int_{\mathcal{X}} \big((1-\alpha_1(x,y))q_1(x,y) - (1-\alpha_2(x,y))q_2(x,y)\big)\,\mathrm{d}y\Big| \le 4(K+1)\|Q_1(x,\cdot) - Q_2(x,\cdot)\|_{\mathrm{TV}},$$
and the claim follows since a random scan Metropolis-within-Gibbs sampler is a mixture of Metropolis samplers.

We now provide an example to show that diminishing adaptation of proposals as in Lemma 4.21 does not necessarily imply condition (d) of Theorem 4.19, so some additional assumption is required, for example, (i) or (ii) of Lemma 4.21.

Example 4.22. Consider a sequence of Metropolis algorithms with transition kernels $P_1, P_2, \dots$ designed for sampling from $\pi(k) = p^k(1-p)$ on $\mathcal{X} = \{0, 1, \dots\}$. The transition kernel $P_n$ results from using proposal kernel $Q_n$ and the standard acceptance rule, where
$$Q_n(j,k) = q_n(k) := \begin{cases} p^k \big(\tfrac{1}{1-p} - p^n + p^{2n}\big)^{-1}, & \text{for } k \ne n, \\ p^{2n} \big(\tfrac{1}{1-p} - p^n + p^{2n}\big)^{-1}, & \text{for } k = n. \end{cases}$$
Clearly,
$$\sup_{j\in\mathcal{X}} \|Q_{n+1}(j,\cdot) - Q_n(j,\cdot)\|_{\mathrm{TV}} = q_{n+1}(n) - q_n(n) \to 0.$$
However,
$$\sup_{j\in\mathcal{X}} \|P_{n+1}(j,\cdot) - P_n(j,\cdot)\|_{\mathrm{TV}} \ge P_{n+1}(n,0) - P_n(n,0) = \min\Big\{q_{n+1}(0), \frac{\pi(0)}{\pi(n)}q_{n+1}(n)\Big\} - \min\Big\{q_n(0), \frac{\pi(0)}{\pi(n)}q_n(n)\Big\} = q_{n+1}(0) - q_n(0)p^n \to 1 - p \ne 0.$$

5. Ergodicity — nonuniform case. In this section we consider the case where the nonadaptive kernels are not necessarily uniformly ergodic.
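Before proceeding, the counterexample of Example 4.22 above can be checked numerically: the proposal kernels change diminishingly, yet the Metropolis kernels they induce do not. A quick sketch (the choice $p = 1/2$ is an arbitrary illustration):

```python
def q(n, k, p):
    """Proposal mass q_n(k) of Example 4.22 for the geometric target
    pi(k) = p^k (1-p): mass p^(2n) instead of p^k at k = n, renormalized."""
    Z = 1 / (1 - p) - p**n + p**(2 * n)
    return (p**(2 * n) if k == n else p**k) / Z

p = 0.5
# sup_j ||Q_{n+1}(j,.) - Q_n(j,.)||_TV = q_{n+1}(n) - q_n(n) -> 0 ...
gap_Q = [q(n + 1, n, p) - q(n, n, p) for n in (5, 10, 20)]
# ... but P_{n+1}(n,0) - P_n(n,0) = q_{n+1}(0) - q_n(0) p^n -> 1 - p
gap_P = [q(n + 1, 0, p) - q(n, 0, p) * p**n for n in (5, 10, 20)]
print(gap_Q, gap_P)
```

The first list tends to 0 while the second tends to $1 - p = 1/2$, matching the two displays of the example.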
We study adaptive random scan Gibbs adaptive Metropolis-within-Gibbs (AdapRSadapMwG) algorithms in the nonuniform setting, with parameters $\alpha \in \mathcal{Y}$ and $\gamma_i \in \Gamma_i$, $i = 1,\dots,d$, subject to adaptation. The conclusions we draw apply immediately to adaptive random scan Gibbs Metropolis-within-Gibbs (AdapRSMwG) algorithms by keeping the parameters $\gamma_i$ fixed for the Metropolis-within-Gibbs steps. We keep the assumption that the selection probabilities are in $\mathcal{Y}$ defined in (5), whereas the uniform ergodicity assumption will be replaced by some natural regularity conditions on the target density.

Our strategy is to use the generic approach of [41] and to verify the diminishing adaptation and containment conditions. The containment condition has been extensively studied in [9] and is essentially necessary for ergodicity of adaptive chains (see Theorem 2 therein for the precise result). In particular, containment is implied by simultaneous geometric ergodicity of the adaptive kernels. More precisely, we shall use the following result of [9].

Theorem 5.1 (Corollary 2 of [9]). Consider the family $\{P_\gamma : \gamma \in \Gamma\}$ of Markov chains on $\mathcal{X} \subseteq \mathbf{R}^d$ satisfying the following conditions:
(i) for any compact set $C \in \mathcal{B}(\mathcal{X})$, there exist some integer $m > 0$, real $\rho > 0$, and a probability measure $\nu_\gamma$ on $C$ s.t. $P^m_\gamma(x,\cdot) \ge \rho\nu_\gamma(\cdot)$ for all $x \in C$;
(ii) there exists a function $V : \mathcal{X} \to (1,\infty)$ s.t. for any compact set $C \in \mathcal{B}(\mathcal{X})$ we have $\sup_{x\in C} V(x) < \infty$, $\pi(V) < \infty$, and
$$\limsup_{|x|\to\infty} \sup_{\gamma\in\Gamma} \frac{P_\gamma V(x)}{V(x)} < 1.$$
Then, for any adaptive strategy using $\{P_\gamma : \gamma \in \Gamma\}$, containment holds.

Throughout this section we assume $\mathcal{X}_i = \mathbf{R}$ for $i = 1,\dots,d$, and $\mathcal{X} = \mathbf{R}^d$, and let $\mu_k$ denote Lebesgue measure on $\mathbf{R}^k$. Denote by $\{e_1,\dots,e_d\}$ the coordinate unit vectors and let $|\cdot|$ be the Euclidean norm.
Our focus is on random walk Metropolis proposals with symmetric densities for updating $X_i\,|\,X_{-i}$, denoted $q_{i,\gamma_i}(\cdot)$, $\gamma_i \in \Gamma_i$. We shall work in the following setting, extensively studied for nonadaptive Metropolis-within-Gibbs algorithms in [16] (see also [36, 37] for related work and [21] for analysis of the random walk Metropolis algorithm).

Assumption 5.2. The target distribution $\pi$ is absolutely continuous with respect to $\mu_d$, with strictly positive and continuous density $\pi(\cdot)$ on $\mathcal{X}$.

Assumption 5.3. The family $\{q_{i,\gamma_i}\}_{1\le i\le d;\, \gamma_i\in\Gamma_i}$ of symmetric proposal densities with respect to $\mu_1$ (one-dimensional Lebesgue measure) is such that there exist constants $\eta_i > 0$, $\delta_i > 0$, for $i = 1,\dots,d$, s.t.
$$\inf_{|x|\le\delta_i} q_{i,\gamma_i}(x) \ge \eta_i \quad \text{for every } 1 \le i \le d \text{ and } \gamma_i \in \Gamma_i. \qquad (23)$$

Assumption 5.4. There exist $0 < \delta < \Delta \le \infty$ such that
$$\xi := \inf_{1\le i\le d,\, \gamma_i\in\Gamma_i} \int_\delta^\Delta q_{i,\gamma_i}(y)\,\mu_1(\mathrm{d}y) > 0 \qquad (24)$$
and, for any sequence $x = \{x^j\}$ with $\lim_{j\to\infty}|x^j| = +\infty$, there exists a subsequence $\tilde{x} = \{\tilde{x}^j\}$ s.t. for some $i \in \{1,\dots,d\}$ and all $y \in [\delta,\Delta]$,
$$\lim_{j\to\infty} \frac{\pi(\tilde{x}^j)}{\pi(\tilde{x}^j - \operatorname{sign}(\tilde{x}^j_i)y e_i)} = 0 \quad \text{and} \quad \lim_{j\to\infty} \frac{\pi(\tilde{x}^j + \operatorname{sign}(\tilde{x}^j_i)y e_i)}{\pi(\tilde{x}^j)} = 0. \qquad (25)$$

Discussion of the seemingly involved Assumption 5.4, and simple criteria for checking it, are given in [16]. It was shown in [16] that under these assumptions nonadaptive random scan Metropolis-within-Gibbs algorithms are geometrically ergodic for subexponential densities. We establish ergodicity of the doubly adaptive AdapRSadapMwG algorithm in the same setting.

Theorem 5.5. Let $\pi$ be a subexponential density and let the selection probabilities $\alpha_n \in \mathcal{Y}$ for all $n$, with $\mathcal{Y}$ as in (5).
Moreover, assume that:
(a) $|\alpha_n - \alpha_{n-1}| \to 0$ in probability for fixed starting values $x_0 \in \mathcal{X}$ and $\alpha_0 \in \mathcal{Y}$, $\gamma_i \in \Gamma_i$, $i = 1,\dots,d$;
(b) the Metropolis-within-Gibbs kernels exhibit diminishing adaptation, that is, for every $i \in \{1,\dots,d\}$ the $\mathcal{G}_{n+1}$-measurable random variable
$$\sup_{x\in\mathcal{X}} \|P_{x_{-i},\gamma_{n+1,i}}(x_i,\cdot) - P_{x_{-i},\gamma_{n,i}}(x_i,\cdot)\|_{\mathrm{TV}} \to 0$$
in probability, as $n \to \infty$, for fixed starting values $x_0 \in \mathcal{X}$ and $\alpha_0 \in \mathcal{Y}$, $\gamma_i \in \Gamma_i$, $i = 1,\dots,d$;
(c) Assumptions 5.2, 5.3, 5.4 hold.
Then AdapRSadapMwG is ergodic, that is,
$$T(x_0, \alpha_0, \gamma_0, n) \to 0 \quad \text{as } n \to \infty. \qquad (26)$$

Before proving this result we state its counterpart for densities that are log-concave in the tails. This is another typical setting carefully studied in the context of geometric ergodicity of nonadaptive chains [16, 28, 37], where Assumption 5.4 is replaced by the following two conditions.

Assumption 5.6. There exist $\phi > 0$ and $\delta$ s.t. $1/\phi \le \delta < \Delta \le \infty$ and, for any sequence $x := \{x^j\}$ with $\lim_{j\to\infty}|x^j| = +\infty$, there exists a subsequence $\tilde{x} := \{\tilde{x}^j\}$ s.t. for some $i \in \{1,\dots,d\}$ and for all $y \in [\delta,\Delta]$,
$$\lim_{j\to\infty} \frac{\pi(\tilde{x}^j)}{\pi(\tilde{x}^j - \operatorname{sign}(\tilde{x}^j_i)y e_i)} \le \exp\{-\phi y\} \quad \text{and} \qquad (27)$$
$$\lim_{j\to\infty} \frac{\pi(\tilde{x}^j + \operatorname{sign}(\tilde{x}^j_i)y e_i)}{\pi(\tilde{x}^j)} \le \exp\{-\phi y\}.$$

Assumption 5.7.
$$\inf_{1\le i\le d,\, \gamma_i\in\Gamma_i} \int_\delta^\Delta y\, q_{i,\gamma_i}(y)\,\mu_1(\mathrm{d}y) \ge \frac{8}{\varepsilon\phi(e-1)}.$$

Remark 5.8. As remarked in [16], Assumption 5.6 generalizes the one-dimensional definition of log-concavity in the tails, and Assumption 5.7 is easy to ensure, at least if $\Delta = \infty$, by taking the proposal distribution to be a mixture of an adaptive component and a uniform on $[-U, U]$ for $U$ large enough, or a mean-zero Gaussian with large enough variance.

Theorem 5.9. Let the selection probabilities $\alpha_n \in \mathcal{Y}$ for all $n$, with $\mathcal{Y}$ as in (5).
Moreover, assume that:
(a) $|\alpha_n - \alpha_{n-1}| \to 0$ in probability for fixed starting values $x_0 \in \mathcal{X}$ and $\alpha_0 \in \mathcal{Y}$, $\gamma_i \in \Gamma_i$, $i = 1,\dots,d$;
(b) the Metropolis-within-Gibbs kernels exhibit diminishing adaptation, that is, for every $i \in \{1,\dots,d\}$ the $\mathcal{G}_{n+1}$-measurable random variable
$$\sup_{x\in\mathcal{X}} \|P_{x_{-i},\gamma_{n+1,i}}(x_i,\cdot) - P_{x_{-i},\gamma_{n,i}}(x_i,\cdot)\|_{\mathrm{TV}} \to 0$$
in probability, as $n \to \infty$, for fixed starting values $x_0 \in \mathcal{X}$ and $\alpha_0 \in \mathcal{Y}$, $\gamma_i \in \Gamma_i$, $i = 1,\dots,d$;
(c) Assumptions 5.2, 5.3, 5.6, 5.7 hold.
Then AdapRSadapMwG is ergodic, that is,
$$T(x_0, \alpha_0, \gamma_0, n) \to 0 \quad \text{as } n \to \infty. \qquad (28)$$

We now proceed to the proofs.

Proof of Theorem 5.5. Ergodicity will follow from Theorem 2 of [41] by establishing the diminishing adaptation and containment conditions. Diminishing adaptation can be verified as in the proof of Theorem 4.19. Containment will result from Theorem 5.1.

Recall that $P_{\alpha,\gamma}$ is the random scan Metropolis-within-Gibbs kernel with selection probabilities $\alpha$ and proposals indexed by $\{\gamma_i\}_{1\le i\le d}$. To verify the small set condition (i), observe that Assumptions 5.2 and 5.3 imply that for every compact set $C$ and every vector $\gamma_i \in \Gamma_i$, $i \in 1,\dots,d$, we can find $m^*$ and $\rho^*$ independent of $\{\gamma_i\}$ such that $P^{m^*}_{1/d,\gamma}(x,\cdot) \ge \rho^*\nu(\cdot)$ for all $x \in C$. Hence, arguing as in the proof of Lemma 4.4, there exist $m$ and $\rho$, independent of $\alpha \in \mathcal{Y}$ and $\{\gamma_i\}$, such that $P^m_{\alpha,\gamma}(x,\cdot) \ge \rho\nu(\cdot)$ for all $x \in C$.

To establish the drift condition (ii), let $V_s := \pi(x)^{-s}$ for some $s \in (0,1)$ to be specified later. Then by Proposition 3 of [37], for all $1 \le i \le d$, $\gamma_i \in \Gamma_i$, and $x \in \mathbf{R}^d$ we have
$$P_{i,\gamma_i}V_s(x) \le r(s)V_s(x), \quad \text{where } r(s) := 1 + s(1-s)^{1/s-1}. \qquad (29)$$
Since $r(s) \to 1$ as $s \to 0$, we can choose $s$ small enough that
$$r(s) < 1 + \frac{\varepsilon\xi}{1 - 2\varepsilon\xi}. \qquad (30)$$
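The choice of $s$ in (30) is always possible precisely because $r(s) \to 1$; this is easy to see numerically. A minimal sketch (the values of $\varepsilon$ and $\xi$ are arbitrary illustrations):

```python
def r(s):
    """The drift bound r(s) = 1 + s (1 - s)^(1/s - 1) from (29)."""
    return 1 + s * (1 - s) ** (1 / s - 1)

eps, xi = 0.1, 0.3
threshold = 1 + eps * xi / (1 - 2 * eps * xi)   # right-hand side of (30)
s = 0.5
while r(s) >= threshold:                        # halve s until (30) holds
    s /= 2
print(s, r(s), threshold)
```

Since $(1-s)^{1/s-1} \to e^{-1}$, in fact $r(s) \approx 1 + s/e$ for small $s$, so the loop terminates after a handful of halvings for any fixed $\varepsilon, \xi \in (0,1)$.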
The rest of the argument follows the proof of Theorem 2 in [16]. We repeat most of it, since we need to ensure it is independent of $\alpha$ and $\gamma$. Assume by contradiction that there exists an $\mathbf{R}^d$-valued sequence $\{x^j\}$ s.t.
$$\limsup_{j\to\infty} \sup_{\alpha\in\mathcal{Y},\, \gamma_i\in\Gamma_i,\, 1\le i\le d} P_{\alpha,\gamma}V_s(x^j)/V_s(x^j) \ge 1.$$
Then there exists a subsequence $\{\hat{x}^j\}$ such that
$$\lim_{j\to\infty} \sup_{\alpha\in\mathcal{Y},\, \gamma_i\in\Gamma_i,\, 1\le i\le d} P_{\alpha,\gamma}V_s(\hat{x}^j)/V_s(\hat{x}^j) \ge 1.$$
Moreover, as shown in [16] (proof of Theorem 2, page 129), there exist an integer $k \in \{1,\dots,d\}$ and a further subsequence $\{\tilde{x}^j\}$ such that
$$\lim_{j\to\infty} \sup_{\gamma_k\in\Gamma_k} P_{k,\gamma_k}V_s(\tilde{x}^j)/V_s(\tilde{x}^j) \le r(s) - (2r(s)-1)\xi. \qquad (31)$$
The contradiction follows from (29), (30) and (31), since
$$\lim_{j\to\infty} \sup_{\alpha\in\mathcal{Y},\, \gamma_i\in\Gamma_i,\, 1\le i\le d} \frac{P_{\alpha,\gamma}V_s(\tilde{x}^j)}{V_s(\tilde{x}^j)} = \lim_{j\to\infty} \sup_{\alpha\in\mathcal{Y}} \sum_{i=1}^d \alpha_i \sup_{\gamma_i\in\Gamma_i} \frac{P_{i,\gamma_i}V_s(\tilde{x}^j)}{V_s(\tilde{x}^j)} = \lim_{j\to\infty} \sup_{\alpha\in\mathcal{Y}} \Big(\alpha_k \sup_{\gamma_k\in\Gamma_k} P_{k,\gamma_k}V_s(\tilde{x}^j)/V_s(\tilde{x}^j) + \sum_{i\ne k} \alpha_i \sup_{\gamma_i\in\Gamma_i} \frac{P_{i,\gamma_i}V_s(\tilde{x}^j)}{V_s(\tilde{x}^j)}\Big) \le \varepsilon(r(s) - (2r(s)-1)\xi) + (1-\varepsilon)r(s) < 1.$$

Proof of Theorem 5.9. The proof is identical to the proof of Theorem 5.5, with the only difference that now the drift condition (ii) of Theorem 5.1 will be established under Assumptions 5.6 and 5.7. Establishing (ii) of Theorem 5.1 follows closely the proof of Theorem 3 in [16]. Let again $V_s := \pi(x)^{-s}$ for some $s \in (0,1)$ to be specified later, and recall that (29) holds for all $1 \le i \le d$, $\gamma_i \in \Gamma_i$, and $x \in \mathbf{R}^d$. Assume by contradiction that there exists an $\mathbf{R}^d$-valued sequence $\{x^j\}$ s.t.
$$\limsup_{j\to\infty} \sup_{\alpha\in\mathcal{Y},\, \gamma_i\in\Gamma_i,\, 1\le i\le d} P_{\alpha,\gamma}V_s(x^j)/V_s(x^j) \ge 1.$$
Then there exists a subsequence $\{\hat{x}^j\}$ such that
$$\lim_{j\to\infty} \sup_{\alpha\in\mathcal{Y},\, \gamma_i\in\Gamma_i,\, 1\le i\le d} P_{\alpha,\gamma}V_s(\hat{x}^j)/V_s(\hat{x}^j) \ge 1.$$
Moreover, as shown in [16] (proof of Theorem 3, page 137, equation (15)), there exist an integer $k \in \{1,\dots,d\}$ and a further subsequence $\{\tilde{x}^j\}$ such that
$$\lim_{j\to\infty} P_{k,\gamma_k}V_s(\tilde{x}^j)/V_s(\tilde{x}^j) \le r(s) - (2r(s)-1)J_{\gamma_k}(0) + J_{\gamma_k}(\phi s) + J_{\gamma_k}(\phi(1-s)) - J_{\gamma_k}(\phi), \qquad (32)$$
where, for $b > 0$,
$$J_{\gamma_k}(b) = \int_\delta^\Delta e^{-by} q_{k,\gamma_k}(y)\,\mu_1(\mathrm{d}y).$$
Now from (29) and (32) compute
$$\lim_{j\to\infty} \sup_{\alpha\in\mathcal{Y},\, \gamma_i\in\Gamma_i,\, 1\le i\le d} \frac{P_{\alpha,\gamma}V_s(\tilde{x}^j)}{V_s(\tilde{x}^j)} = \lim_{j\to\infty} \sup_{\alpha\in\mathcal{Y}} \sum_{i=1}^d \alpha_i \sup_{\gamma_i\in\Gamma_i} \frac{P_{i,\gamma_i}V_s(\tilde{x}^j)}{V_s(\tilde{x}^j)} = \lim_{j\to\infty} \sup_{\alpha\in\mathcal{Y}} \Big(\alpha_k \sup_{\gamma_k\in\Gamma_k} P_{k,\gamma_k}V_s(\tilde{x}^j)/V_s(\tilde{x}^j) + \sum_{i\ne k} \alpha_i \sup_{\gamma_i\in\Gamma_i} \frac{P_{i,\gamma_i}V_s(\tilde{x}^j)}{V_s(\tilde{x}^j)}\Big)$$
$$\le r(s) - \varepsilon \inf_{\gamma_k\in\Gamma_k} \big((2r(s)-1)J_{\gamma_k}(0) - J_{\gamma_k}(\phi s) - J_{\gamma_k}(\phi(1-s)) + J_{\gamma_k}(\phi)\big) = \sup_{\gamma_k\in\Gamma_k} \big(r(s) - \varepsilon\big((2r(s)-1)J_{\gamma_k}(0) - J_{\gamma_k}(\phi s) - J_{\gamma_k}(\phi(1-s)) + J_{\gamma_k}(\phi)\big)\big) =: \sup_{\gamma_k\in\Gamma_k} H(\gamma_k, \phi, s).$$
The result will follow if we can find an $s$ such that $\sup_{\gamma_k\in\Gamma_k} H(\gamma_k, \phi, s) < 1$. Note that $H(\gamma_k, \phi, 0) = 1$ for every $\gamma_k \in \Gamma_k$ and the function is differentiable. Therefore, it is enough to show that there exist $\kappa_1 > 0$ and $\kappa_2 > 0$ such that
$$\frac{\partial}{\partial s} H(\gamma_k, \phi, s) < -\kappa_1 \quad \text{for all } \gamma_k \in \Gamma_k \text{ and } s \in (0, \kappa_2)$$
and conclude (ii) with $V_s(x) = \pi^{-s}(x)$ and $s := \kappa_2$. To this end compute
$$\frac{1}{\varepsilon}\frac{\partial}{\partial s} H(\gamma_k, \phi, s) = \Big(\frac{1}{\varepsilon} - 2J_{\gamma_k}(0)\Big)\frac{\partial r(s)}{\partial s} - \phi \int_\delta^\Delta y e^{-\phi s y} q_{\gamma_k}(y)\,\mu_1(\mathrm{d}y) + \phi \int_\delta^\Delta y e^{-\phi(1-s)y} q_{\gamma_k}(y)\,\mu_1(\mathrm{d}y) \le \frac{1}{\varepsilon}(1-s)^{1/s}\frac{\log(1-s)}{s(s-1)} - \phi I_1 + \phi I_2 =: \clubsuit$$
(here $I_1$ and $I_2$ denote the two integrals above, and we used $\partial r/\partial s = (1-s)^{1/s}\log(1-s)/(s(s-1)) > 0$ together with $J_{\gamma_k}(0) \ge 0$), and notice that, by $1/\phi \le \delta$ and Assumption 5.7, for $s$ small enough we have
$$I_1 - I_2 \ge \frac{e-1}{2e} \int_\delta^\Delta y\, q_{\gamma_k}(y)\,\mu_1(\mathrm{d}y) \ge \frac{e-1}{2e} \cdot \frac{8}{\varepsilon\phi(e-1)} = \frac{4}{\varepsilon\phi e}$$
and
$$(1-s)^{1/s}\frac{\log(1-s)}{s(s-1)} \le \frac{2}{e}.$$
Consequently, there exists $\kappa_2 > 0$ s.t. for all $s \in (0, \kappa_2)$,
$$\clubsuit \le \frac{2}{\varepsilon e} - \frac{4\phi}{\varepsilon\phi e} = -\frac{2}{\varepsilon e} < 0,$$
so that $\partial H/\partial s = \varepsilon\clubsuit \le -2/e =: -\kappa_1$.
Example 5.10. We now give an example involving a simple generalized linear mixed model. Consider the model and prior given by
$$Y_i \sim \mathrm{Pois}(e^{\theta + X_i}), \qquad (33)$$
$$X_i \sim N(0,1), \qquad (34)$$
$$\theta \sim N(0,1). \qquad (35)$$
The model is chosen to be extremely simple so as not to detract from the argument used to demonstrate ergodicity of AdapRSadapMwG, although this argument readily generalizes to different exponential families, link functions and random effect distributions.

We consider simulating from the posterior distribution of $\theta, X$ given observations $y_1,\dots,y_n$ using AdapRSadapMwG. More specifically, we set
$$q_{x_{-i},\gamma}(x_i, y_i) = \frac{\exp\{-(y_i - x_i)^2/2\gamma\}}{\sqrt{2\pi\gamma}}, \qquad (36)$$
where the range of permissible scales $\gamma$ is restricted to some range $\Re = [a, b]$ with $0 < a \le b < \infty$. We are in the subexponential tail case, and specifically we have the following.

Proposition 5.11. Consider AdapRSadapMwG applied to model (33) using any adaptive scheme satisfying conditions (a) and (b) of Theorem 5.5. Then the scheme is ergodic.

For the proof, we require the following definition from [16]. We let
$$\Phi = \{\text{functions } \phi : \mathbf{R}_+ \to \mathbf{R}_+ : \phi(x) \to \infty \text{ as } x \to \infty\}.$$

Proof of Proposition 5.11. According to Theorem 5.5, it remains to check that conditions 5.2, 5.3, 5.4 hold. Conditions 5.2 and 5.3 hold by construction, while condition 5.4 consists of two separate conditions. One of these, given in (24), holds by construction from (36). Moreover, [16] shows that (25) can be replaced by the following condition: there exist functions $\{\phi_i \in \Phi, 1 \le i \le d\}$ such that for every $i \in \{1,\dots$
$,d\}$ and all $y \in [\delta,\Delta]$,
$$\lim_{|x_i|\to\infty} \sup_{\{x_{-i}:\, \phi_j(|x_j|) \le \phi_i(|x_i|),\, j\ne i\}} \frac{\pi(x)}{\pi(x - \operatorname{sign}(x_i)y e_i)} = 0 \qquad (37)$$
and
$$\lim_{|x_i|\to\infty} \sup_{\{x_{-i}:\, \phi_j(|x_j|) \le \phi_i(|x_i|),\, j\ne i\}} \frac{\pi(x + \operatorname{sign}(x_i)y e_i)}{\pi(x)} = 0. \qquad (38)$$
Now take $\phi_i(x) = x$ for all $1 \le i \le d$, so that (37) can be rewritten as the two conditions
$$\lim_{|x_i|\to\infty} \sup_{\{x_{-i}:\, |x_j| \le |x_i|,\, j\ne i\}} \exp\Big\{\int_{-y}^0 \nabla_i \log\pi(x + \operatorname{sign}(x_i)z e_i)\,\mathrm{d}z\Big\} = 0, \qquad (39)$$
$$\lim_{|x_i|\to\infty} \sup_{\{x_{-i}:\, |x_j| \le |x_i|,\, j\ne i\}} \exp\Big\{\int_0^y \nabla_i \log\pi(x + \operatorname{sign}(x_i)z e_i)\,\mathrm{d}z\Big\} = 0 \qquad (40)$$
for all $y \in [\delta,\Delta]$, where $\nabla_i$ denotes the derivative in the $i$th direction. We shall show that, uniformly on the set $S_i(x_i)$, which is defined to be $\{x_{-i} : |x_j| \le |x_i|, j \ne i\}$, the function $\nabla_i \log\pi(x)$ converges to $-\infty$ as $x_i \to +\infty$ and to $+\infty$ as $x_i$ approaches $-\infty$.

Now we have $d = n + 1$, and we let $i$ correspond to the component $x_i$ for $1 \le i \le n$, with $n + 1$ denoting the component $\theta$. Therefore, for $1 \le i \le n$,
$$\nabla_i \log\pi(x) = -e^{\theta + x_i} + y_i - x_i$$
and
$$\nabla_{n+1} \log\pi(x) = -\sum_{i=1}^n e^{\theta + x_i} + \sum_{i=1}^n y_i - \theta.$$
Now for $x_i > 0$, $1 \le i \le n$,
$$\nabla_i \log\pi(x) \le y_i - x_i,$$
which diverges to $-\infty$ independently of $x_{-i}$. Similarly,
$$\nabla_{n+1} \log\pi(x) \le \sum_{i=1}^n y_i - \theta,$$
diverging to $-\infty$ independently of $\{x_i : 1 \le i \le n\}$. For $x_i < 0$, $1 \le i \le n$, and $(x_{-i}, \theta) \in S_i(x_i)$,
$$\nabla_i \log\pi(x) \ge y_i - x_i - 1,$$
again diverging to $+\infty$ uniformly. Finally, for $\theta < 0$ and $x \in S_{n+1}(\theta)$,
$$\nabla_{n+1} \log\pi(x) \ge -n + \sum_{i=1}^n y_i - \theta,$$
again demonstrating the required uniform convergence. Thus ergodicity holds.

Remark 5.12. The random effect distribution in Example 5.10 can be altered to give different results.
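The gradient formulas used in the proof of Proposition 5.11 are easy to sanity-check by finite differences. A small sketch (the observations `y` and the evaluation point are hypothetical values of our own):

```python
import math

def log_post(theta, x, y):
    """Unnormalized log posterior of model (33)-(35): Poisson counts with
    log link, N(0,1) random effects X_i and an N(0,1) prior on theta."""
    lp = -0.5 * theta**2
    for xi, yi in zip(x, y):
        lp += yi * (theta + xi) - math.exp(theta + xi) - 0.5 * xi**2
    return lp

y = [2, 0, 5]                      # hypothetical observations
theta, x = 0.3, [0.1, -0.4, 0.7]

# analytic gradients from the proof above
grad_x = [-math.exp(theta + xi) + yi - xi for xi, yi in zip(x, y)]
grad_theta = -sum(math.exp(theta + xi) for xi in x) + sum(y) - theta

h = 1e-6                           # central finite-difference check
num_theta = (log_post(theta + h, x, y) - log_post(theta - h, x, y)) / (2 * h)
print(abs(num_theta - grad_theta))
```

The printed discrepancy is at the level of floating-point noise, and the upper bound $\nabla_i \log\pi(x) \le y_i - x_i$ is visible in `grad_x` since the $-e^{\theta+x_i}$ term is always negative.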
For instance, if the distribution is doubly exponential, Theorem 4.2 can be applied using very similar arguments to those used above. Extensions to more complex hierarchical models are clearly possible, though we do not pursue them here.

Remark 5.13. An important problem that we have not focused on involves the construction of explicit adaptive strategies. Since little is known about the optimization of the random scan random walk Metropolis, even in the nonadaptive case, this is not a straightforward question. We are engaged in further work exploring adaptation that attempts to maximize a given optimality criterion for the chosen class of samplers. Two possible strategies are:
• to scale the proposal variance to approach 2.4 times the empirically observed conditional variance;
• to scale the proposal variance to achieve an algorithm with acceptance proportion approximately 0.44.
Both these methods are founded on theoretical arguments (see, e.g., [39]).

6. Proof of Proposition 3.2. The analysis of Example 3.1 is somewhat delicate, since the process is both time and space inhomogeneous (as are most nontrivial adaptive MCMC algorithms). To establish Proposition 3.2 we will define a couple of auxiliary stochastic processes. Consider the following one-dimensional process $(\tilde{X}_n)_{n\ge 0}$, obtained from $(X_n)_{n\ge 0}$ by
$$\tilde{X}_n := X_{n,1} + X_{n,2} - 2.$$
Clearly $\tilde{X}_n - \tilde{X}_{n-1} \in \{-1, 0, 1\}$; moreover, $X_{n,1} \to \infty$ and $X_{n,2} \to \infty$ if and only if $\tilde{X}_n \to \infty$. Note that the dynamics of $(\tilde{X}_n)_{n\ge 0}$ are also both time and space inhomogeneous.

We will also use an auxiliary random walk-like, space homogeneous process
$$S_0 = 0 \quad \text{and} \quad S_n := \sum_{i=1}^n Y_i \quad \text{for } n \ge 1,$$
where $Y_1, Y_2, \dots$ are independent random variables taking values in $\{-1, 0, 1\}$. Let the distribution of $Y_n$ on $\{-1, 0, 1\}$ be
$$\nu_n := \Big(\frac{1}{4} - \frac{1}{a_n},\; \frac{1}{2},\; \frac{1}{4} + \frac{1}{a_n}\Big).$$
(41)
We shall couple $(\tilde{X}_n)_{n\ge 0}$ with $(S_n)_{n\ge 0}$, that is, define them on the same probability space $(\Omega, \mathcal{F}, \mathbf{P})$, by specifying the joint distribution of $(\tilde{X}_n, S_n)_{n\ge 0}$ so that the marginal distributions remain unchanged. We describe the details of the construction later. Now define
$$\Omega_{\tilde{X}\ge S} := \{\omega \in \Omega : \tilde{X}_n(\omega) \ge S_n(\omega) \text{ for every } n\} \qquad (42)$$
and
$$\Omega_\infty := \{\omega \in \Omega : S_n(\omega) \to \infty\}. \qquad (43)$$
Clearly, if $\omega \in \Omega_{\tilde{X}\ge S} \cap \Omega_\infty$, then $\tilde{X}_n(\omega) \to \infty$. In the sequel we show that for our coupling construction
$$\mathbf{P}(\Omega_{\tilde{X}\ge S} \cap \Omega_\infty) > 0. \qquad (44)$$
We shall use Hoeffding's inequality for $S^{k+n}_k := S_{k+n} - S_k$. Since $Y_n \in [-1, 1]$, it yields, for every $t > 0$,
$$\mathbf{P}(S^{k+n}_k - \mathbf{E}S^{k+n}_k \le -nt) \le \exp\{-\tfrac{1}{2}nt^2\}. \qquad (45)$$
Note that $\mathbf{E}Y_n = 2/a_n$ and thus $\mathbf{E}S^{k+n}_k = 2\sum_{i=k+1}^{k+n} 1/a_i$. The following choice of the sequence $a_n$ will facilitate further calculations. Let
$$b_0 = 0, \quad b_1 = 1000, \quad b_n = b_{n-1}\Big(1 + \frac{1}{10 + \log(n)}\Big) \quad \text{for } n \ge 2,$$
$$c_n = \sum_{i=0}^n b_i, \quad a_n = 10 + \log(k) \quad \text{for } c_{k-1} < n \le c_k.$$

Remark 6.1. To keep notation reasonable we ignore the fact that $b_n$ will not be an integer. It should be clear that this does not affect the proofs, as the constants we have defined, that is, $b_1$ and $a_1$, are bigger than required.

Lemma 6.2. Let $Y_n$ and $S_n$ be as defined above and let
$$\Omega_1 := \{\omega \in \Omega : S_k = k \text{ for every } 0 < k \le c_1\}, \qquad (46)$$
$$\Omega_n := \Big\{\omega \in \Omega : S_k \ge \frac{b_{n-1}}{2} \text{ for every } c_{n-1} < k \le c_n\Big\} \quad \text{for } n \ge 2. \qquad (47)$$
Then
$$\mathbf{P}\Big(\bigcap_{n=1}^\infty \Omega_n\Big) > 0. \qquad (48)$$

Remark 6.3. Note that $b_n \nearrow \infty$ and therefore $\bigcap_{n=1}^\infty \Omega_n \subset \Omega_\infty$.

Proof of Lemma 6.2. With positive probability, say $p_{1,S}$, we have $Y_1 = \cdots = Y_{1000} = 1$, which gives $S_{c_1} = 1000 = b_1$. Hence $\mathbf{P}(\Omega_1) = p_{1,S} > 0$. Moreover, recall that $S^{c_n}_{c_{n-1}}$ is a sum of $b_n$ i.i.d. random variables with $\mathbf{E}S^{c_n}_{c_{n-1}} = \frac{2b_n}{10 + \log(n)}$.
Therefore, for every $n \ge 1$, by Hoeffding's inequality with $t = 1/(10 + \log(n))$ we can also write
$$\mathbf{P}\Big(S^{c_n}_{c_{n-1}} \le \frac{b_n}{10 + \log(n)}\Big) \le \exp\Big\{-\frac{b_n}{2(10 + \log(n))^2}\Big\} =: p_n.$$
Therefore, using the above bound iteratively, we obtain
$$\mathbf{P}(S_{c_1} = b_1,\, S_{c_n} \ge b_n \text{ for every } n \ge 2) \ge p_{1,S} \prod_{n=2}^\infty (1 - p_n). \qquad (49)$$
Note that $\{S_{c_n} \ge b_n\} \subseteq \Omega_n$ by the choice of $b_n$, and hence equation (49) also implies
$$\mathbf{P}\Big(\bigcap_{n=1}^\infty \Omega_n\Big) \ge p_{1,S} \prod_{n=2}^\infty (1 - p_n). \qquad (50)$$
Clearly, in this case,
$$p_{1,S} \prod_{n=2}^\infty (1 - p_n) > 0 \iff \sum_{n=1}^\infty \log(1 - p_n) > -\infty \iff \sum_{n=1}^\infty p_n < \infty. \qquad (51)$$
We conclude (51) by comparing $p_n$ with $1/n^2$. We show that there exists $n_0$ such that for $n \ge n_0$ the sequence $p_n$ decreases more quickly than the sequence $1/n^2$, and therefore $p_n$ is summable. We check that
$$\log\frac{p_{n-1}}{p_n} > \log\frac{n^2}{(n-1)^2} \quad \text{for } n \ge n_0. \qquad (52)$$
Indeed,
$$\log\frac{p_{n-1}}{p_n} = -\frac{1}{2}\Big[\frac{b_{n-1}}{(10+\log(n-1))^2} - \frac{b_n}{(10+\log(n))^2}\Big] = \frac{b_{n-1}}{2}\Big[\frac{11+\log(n)}{(10+\log(n))^3} - \frac{1}{(10+\log(n-1))^2}\Big] = \frac{b_{n-1}}{2} \cdot \frac{(11+\log(n))(10+\log(n-1))^2 - (10+\log(n))^3}{(10+\log(n))^3(10+\log(n-1))^2}.$$
Now recall that $b_{n-1}$ is an increasing sequence. Moreover, the numerator can be rewritten as
$$(10+\log(n))\big[(10+\log(n-1))^2 - (10+\log(n))^2\big] + (10+\log(n-1))^2;$$
now use $a^2 - b^2 = (a+b)(a-b)$ to identify the leading term $(10+\log(n-1))^2$. Consequently, there exist a constant $C$ and $n_0 \in \mathbf{N}$ s.t. for $n \ge n_0$,
$$\log\frac{p_{n-1}}{p_n} \ge \frac{C}{(10+\log(n))^3} > \frac{2}{n-1} > \log\frac{n^2}{(n-1)^2}.$$
Hence $\sum_{n=1}^\infty p_n < \infty$ follows.

Now we will describe the coupling construction of $(\tilde{X}_n)_{n\ge 0}$ and $(S_n)_{n\ge 0}$. We have already remarked that $\bigcap_{n=1}^\infty \Omega_n \subset \Omega_\infty$. We will define a coupling that also implies
$$\mathbf{P}\Big(\Big(\bigcap_{n=1}^\infty \Omega_n\Big) \cap \Omega_{\tilde{X}\ge S}\Big) \ge C\, \mathbf{P}\Big(\bigcap_{n=1}^\infty \Omega_n\Big) \quad \text{for some universal } C > 0 \qquad (53)$$
and therefore
$$\mathbf{P}(\Omega_{\tilde{X}\ge S} \cap \Omega_\infty) > 0. \qquad (54)$$
Thus nonergodicity of $(X_n)_{n\ge 0}$ will follow from Lemma 6.2.
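The summability established in the proof of Lemma 6.2 is far from marginal, and the lower bound in (49) can be illustrated numerically: the partial sums of $p_n$ stabilize quickly, so the product $\prod_{n\ge 2}(1-p_n)$ stays bounded well away from 0. A quick sketch:

```python
import math

b = 1000.0                 # b_1
p_sum, prod = 0.0, 1.0
for n in range(2, 2001):
    b *= 1 + 1 / (10 + math.log(n))                # b_n
    p_n = math.exp(-b / (2 * (10 + math.log(n)) ** 2))  # Hoeffding bound
    p_sum += p_n
    prod *= 1 - p_n                                # factor in (49)
print(p_sum, prod)
```

Since $b_n$ grows faster than any power of $\log n$, the terms $p_n$ decay super-polynomially; the first handful of terms account for essentially the whole sum.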
We start with the following observation.

Lemma 6.4. There exists a coupling of $\tilde X_n - \tilde X_{n-1}$ and $Y_n$ such that:

(a) For every $n \ge 1$ and every value of $\tilde X_{n-1}$,
\[
\mathbb P(\tilde X_n - \tilde X_{n-1} = 1,\ Y_n = 1) \ge \mathbb P(\tilde X_n - \tilde X_{n-1} = 1)\,\mathbb P(Y_n = 1). \tag{55}
\]

(b) Write even or odd $\tilde X_{n-1}$ as $\tilde X_{n-1} = 2i-2$ or $\tilde X_{n-1} = 2i-3$, respectively. If $2i - 8 \ge a_n$, then the following implications hold a.s.:
\[
Y_n = 1 \ \Rightarrow\ \tilde X_n - \tilde X_{n-1} = 1, \tag{56}
\]
\[
\tilde X_n - \tilde X_{n-1} = -1 \ \Rightarrow\ Y_n = -1. \tag{57}
\]

Proof. Property (a) is a simple fact for any two $\{-1, 0, 1\}$-valued random variables $Z$ and $Z'$ with distributions, say, $\{d_1, d_2, d_3\}$ and $\{d_1', d_2', d_3'\}$. Assign $\mathbb P(Z = Z' = 1) := \min\{d_3, d_3'\}$ and (a) follows. To establish (b) we analyze the dynamics of $(X_n)_{n\ge 0}$ and, consequently, of $(\tilde X_n)_{n\ge 0}$. Recall Algorithm 2.2 and the update rule for $\alpha_n$ in (4). Given $X_{n-1} = (i, j)$, the algorithm will obtain the value of $\alpha_n$ in step (1); next draw a coordinate according to $(\alpha_{n,1}, \alpha_{n,2})$ in step (2). In steps (3) and (4) it will move according to conditional distributions for updating the first or the second coordinate. These distributions are $(1/2, 1/2)$ and
\[
\left(\frac{i^2}{i^2 + (i-1)^2},\ \frac{(i-1)^2}{i^2 + (i-1)^2}\right),
\]
respectively. Hence, given $X_{n-1} = (i, i)$, the distribution of $X_n \in \{(i, i-1), (i, i), (i+1, i)\}$ is
\[
\left(\left(\frac12 - \frac{4}{a_n}\right)\frac{i^2}{i^2 + (i-1)^2},\ \ 1 - \left(\frac12 - \frac{4}{a_n}\right)\frac{i^2}{i^2 + (i-1)^2} - \left(\frac14 + \frac{2}{a_n}\right),\ \ \frac14 + \frac{2}{a_n}\right), \tag{58}
\]
whereas if $X_{n-1} = (i, i-1)$, then $X_n \in \{(i-1, i-1), (i, i-1), (i, i)\}$ with probabilities
\[
\left(\frac14 - \frac{2}{a_n},\ \ 1 - \left(\frac14 - \frac{2}{a_n}\right) - \left(\frac12 + \frac{4}{a_n}\right)\frac{(i-1)^2}{i^2 + (i-1)^2},\ \ \left(\frac12 + \frac{4}{a_n}\right)\frac{(i-1)^2}{i^2 + (i-1)^2}\right), \tag{59}
\]
respectively. We can conclude the evolution of $(\tilde X_n)_{n\ge 0}$.
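As a quick numerical sanity check (ours, not the authors'), the Python snippet below verifies that the transition probabilities in (58) and (59) are nonnegative and sum to one across a range of states $i$ and values $a_n \ge a_1 = 10$. The function names `dist_58` and `dist_59` are our own labels for the two displayed distributions.

```python
from math import isclose

def dist_58(i, a):
    """Distribution (58) of X_n over {(i, i-1), (i, i), (i+1, i)} given X_{n-1} = (i, i)."""
    down = (0.5 - 4.0 / a) * i**2 / (i**2 + (i - 1) ** 2)
    up = 0.25 + 2.0 / a
    return (down, 1.0 - down - up, up)

def dist_59(i, a):
    """Distribution (59) of X_n over {(i-1, i-1), (i, i-1), (i, i)} given X_{n-1} = (i, i-1)."""
    down = 0.25 - 2.0 / a
    up = (0.5 + 4.0 / a) * (i - 1) ** 2 / (i**2 + (i - 1) ** 2)
    return (down, 1.0 - down - up, up)

# Nonnegativity and normalization for a range of states i and a_n >= 10.
for i in range(1, 200):
    for a in (10.0, 10.5, 12.0, 20.0):
        for dist in (dist_58(i, a), dist_59(i, a)):
            assert all(p >= 0.0 for p in dist)
            assert isclose(sum(dist), 1.0)
print("ok")
```

The check passes because $a_n \ge 10$ keeps each adjusted selection probability strictly between 0 and 1, which is exactly what the coupling argument below relies on.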
Namely, if $\tilde X_{n-1} = 2i-2$, then the distribution of $\tilde X_n - \tilde X_{n-1} \in \{-1, 0, 1\}$ is given by (58), and if $\tilde X_{n-1} = 2i-3$, then the distribution of $\tilde X_n - \tilde X_{n-1} \in \{-1, 0, 1\}$ is given by (59). Let $\le_{st}$ denote stochastic ordering. By simple algebra both measures defined in (58) and (59) are stochastically bigger than
\[
\mu_n^i = (\mu_{n,1}^i, \mu_{n,2}^i, \mu_{n,3}^i), \tag{60}
\]
where
\[
\mu_{n,1}^i = \left(\frac14 - \frac{2}{a_n}\right)\left(1 + \frac{2}{i}\right) = \frac14 - \frac{2}{a_n} - \frac{8 - a_n}{2 i a_n}, \tag{61}
\]
\[
\mu_{n,2}^i = 1 - \left(\frac14 - \frac{2}{a_n}\right)\left(1 + \frac{2}{i}\right) - \left(\frac14 + \frac{2}{a_n}\right)\left(1 - \frac{2}{\max\{4, i\}}\right),
\]
\[
\mu_{n,3}^i = \left(\frac14 + \frac{2}{a_n}\right)\left(1 - \frac{2}{\max\{4, i\}}\right) = \frac14 + \frac{2}{a_n} - \frac{8 + a_n}{2 a_n \max\{4, i\}}. \tag{62}
\]
Recall $\nu_n$, the distribution of $Y_n$ defined in (41). Examine (61) and (62) to see that if $2i - 8 \ge a_n$, then $\mu_n^i \ge_{st} \nu_n$. Hence, in this case also the distribution of $\tilde X_n - \tilde X_{n-1}$ is stochastically bigger than the distribution of $Y_n$. The joint probability distribution of $(\tilde X_n - \tilde X_{n-1}, Y_n)$ satisfying (56) and (57) follows.

Proof of Proposition 3.2. Define
\[
\Omega_{1, \tilde X} := \{\omega \in \Omega : \tilde X_n - \tilde X_{n-1} = 1 \text{ for every } 0 < n \le c_1\}. \tag{63}
\]
Since the distribution of $\tilde X_n - \tilde X_{n-1}$ is stochastically bigger than $\mu_n^i$ defined in (60), and $\mu_n^i(1) > c > 0$ for every $i$ and $n$,
\[
\mathbb P(\Omega_{1, \tilde X}) =: p_{1, \tilde X} > 0.
\]
By Lemma 6.4(a) we have
\[
\mathbb P(\Omega_{1, \tilde X} \cap \Omega_1) \ge p_{1,S}\, p_{1, \tilde X} > 0. \tag{64}
\]
Since $S_{c_1} = \tilde X_{c_1} = c_1 = b_1$ on $\Omega_{1, \tilde X} \cap \Omega_1$, the requirements for Lemma 6.4(b) hold for $n - 1 = c_1$. We shall use Lemma 6.4(b) iteratively to keep $\tilde X_n \ge S_n$ for every $n$. Recall that we write $\tilde X_{n-1}$ as $\tilde X_{n-1} = 2i-2$ or $\tilde X_{n-1} = 2i-3$. If $2i - 8 \ge a_n$ and $\tilde X_{n-1} \ge S_{n-1}$, then by Lemma 6.4(b) also $\tilde X_n \ge S_n$. Clearly, if $\tilde X_k \ge S_k$ and $S_k \ge \frac{b_{n-1}}{2}$ for $c_{n-1} < k \le c_n$, then $\tilde X_k \ge \frac{b_{n-1}}{2}$ for $c_{n-1} < k \le c_n$; hence, $2i - 2 \ge \frac{b_{n-1}}{2}$ for $c_{n-1} < k \le c_n$.
This in turn gives $2i - 8 \ge \frac{b_{n-1}}{2} - 6$ for $c_{n-1} < k \le c_n$, and since $a_k = 10 + \log(n)$ for such $k$, for the iterative construction to hold we need $b_n \ge 32 + 2\log(n+1)$. By the definition of $b_n$ and standard algebra we have
\[
b_n \ge 1000\left(1 + \sum_{i=2}^{n} \frac{1}{10 + \log(i)}\right) \ge 32 + 2\log(n+1) \quad\text{for every } n \ge 1.
\]
Summarizing, the above argument provides
\[
\mathbb P(X_{n,1} \to \infty) \ge \mathbb P(\Omega_\infty \cap \Omega_{\tilde X \ge S}) \ge \mathbb P\left(\left(\bigcap_{n=1}^{\infty} \Omega_n\right) \cap \Omega_{\tilde X \ge S}\right) \ge \mathbb P\left(\Omega_{1, \tilde X} \cap \left(\bigcap_{n=1}^{\infty} \Omega_n\right) \cap \Omega_{\tilde X \ge S}\right) \ge p_{1, \tilde X}\, p_{1,S} \prod_{n=2}^{\infty} (1 - p_n) > 0.
\]
Hence, $(X_n)_{n\ge 0}$ is not ergodic and, in particular, $\|\pi_n - \pi\|_{TV} \nrightarrow 0$.

REFERENCES

[1] Andrieu, C. and Moulines, É. (2006). On the ergodicity properties of some adaptive MCMC algorithms. Ann. Appl. Probab. 16 1462–1505. MR2260070
[2] Atchadé, Y. and Fort, G. (2010). Limit theorems for some adaptive MCMC algorithms with subgeometric kernels. Bernoulli 16 116–154. MR2648752
[3] Atchadé, Y., Fort, G., Moulines, E. and Priouret, P. (2011). Adaptive Markov chain Monte Carlo: Theory and methods. In Bayesian Time Series Models (D. Barber, A. T. Cemgil and S. Chiappa, eds.) 33–53. Cambridge Univ. Press, Cambridge.
[4] Atchadé, Y., Roberts, G. O. and Rosenthal, J. S. (2009). Optimal scaling of Metropolis-coupled Markov chain Monte Carlo. Preprint.
[5] Atchadé, Y. F. and Rosenthal, J. S. (2005). On adaptive Markov chain Monte Carlo algorithms. Bernoulli 11 815–828. MR2172842
[6] Bai, Y. (2009). Simultaneous drift conditions for adaptive Markov chain Monte Carlo algorithms. Preprint.
[7] Bai, Y. (2009). An adaptive directional Metropolis-within-Gibbs algorithm. Preprint.
[8] Bai, Y., Craiu, R. V. and Di Narzo, A. F. (2011). Divide and conquer: A mixture-based approach to regional adaptation for MCMC. J. Comput. Graph. Statist. 20 63–79. MR2816538
[9] Bai, Y., Roberts, G. O. and Rosenthal, J. S. (2011). On the containment condition for adaptive Markov chain Monte Carlo algorithms. Adv. Appl. Stat. 21 1–54. MR2849670
[10] Bédard, M. (2007). Weak convergence of Metropolis algorithms for non-i.i.d. target distributions. Ann. Appl. Probab. 17 1222–1244. MR2344305
[11] Bédard, M. (2008). Optimal acceptance rates for Metropolis algorithms: Moving beyond 0.234. Stochastic Process. Appl. 118 2198–2222. MR2474348
[12] Bottolo, L., Richardson, S. and Rosenthal, J. S. (2010). Bayesian models for sparse regression analysis of high dimensional data. In Bayesian Statistics 9, Proceedings of Ninth Valencia International Conference in Bayesian Statistics 539–568. Oxford Univ. Press, Oxford.
[13] Brockwell, A. E. and Kadane, J. B. (2005). Identification of regeneration times in MCMC simulation, with application to adaptive schemes. J. Comput. Graph. Statist. 14 436–458. MR2161623
[14] Craiu, R. V., Rosenthal, J. and Yang, C. (2009). Learn from thy neighbor: Parallel-chain and regional adaptive MCMC. J. Amer. Statist. Assoc. 104 1454–1466. MR2750572
[15] Diaconis, P., Khare, K. and Saloff-Coste, L. (2008). Gibbs sampling, exponential families and orthogonal polynomials. Statist. Sci. 23 151–178. With comments and a rejoinder by the authors. MR2446500
[16] Fort, G., Moulines, E., Roberts, G. O. and Rosenthal, J. S. (2003). On the geometric ergodicity of hybrid samplers. J. Appl. Probab. 40 123–146. MR1953771
[17] Gilks, W. R., Roberts, G. O. and Sahu, S. K. (1998). Adaptive Markov chain Monte Carlo through regeneration. J. Amer. Statist. Assoc. 93 1045–1054. MR1649199
[18] Haario, H., Saksman, E. and Tamminen, J. (2001). An adaptive Metropolis algorithm. Bernoulli 7 223–242. MR1828504
[19] Haario, H., Saksman, E. and Tamminen, J. (2005). Componentwise adaptation for high dimensional MCMC. Comput. Statist. 20 265–273. MR2323976
[20] Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57 97–109.
[21] Jarner, S. F. and Hansen, E. (2000). Geometric ergodicity of Metropolis algorithms. Stochastic Process. Appl. 85 341–361. MR1731030
[22] Latuszyński, K. (2008). Regeneration and fixed-width analysis of Markov chain Monte Carlo algorithms. Ph.D. dissertation. Available at arXiv:0907.4716v1.
[23] Levine, R. A. (2005). A note on Markov chain Monte Carlo sweep strategies. J. Stat. Comput. Simul. 75 253–262. MR2134639
[24] Levine, R. A. and Casella, G. (2006). Optimizing random scan Gibbs samplers. J. Multivariate Anal. 97 2071–2100. MR2301627
[25] Levine, R. A., Yu, Z., Hanley, W. G. and Nitao, J. J. (2005). Implementing random scan Gibbs samplers. Comput. Statist. 20 177–196. MR2162541
[26] Liu, J. S. (2001). Monte Carlo Strategies in Scientific Computing. Springer, New York. MR1842342
[27] Liu, J. S., Wong, W. H. and Kong, A. (1995). Covariance structure and convergence rate of the Gibbs sampler with various scans. J. Roy. Statist. Soc. Ser. B 57 157–169. MR1325382
[28] Mengersen, K. L. and Tweedie, R. L. (1996). Rates of convergence of the Hastings and Metropolis algorithms. Ann. Statist. 24 101–121. MR1389882
[29] Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A. and Teller, E. (1953). Equations of state calculations by fast computing machines. J. Chem. Phys. 21 1087–1091.
[30] Meyn, S. P. and Tweedie, R. L. (1993). Markov Chains and Stochastic Stability. Springer London Ltd., London. MR1287609
[31] Neath, R. C. and Jones, G. L. (2009). Variable-at-a-time implementations of Metropolis–Hastings. Available at arXiv:0903.0664v1.
[32] Papaspiliopoulos, O. and Roberts, G. (2008). Stability of the Gibbs sampler for Bayesian hierarchical models. Ann. Statist. 36 95–117. MR2387965
[33] Robert, C. P. and Casella, G. (2004). Monte Carlo Statistical Methods. Springer, New York.
[34] Roberts, G. O., Gelman, A. and Gilks, W. R. (1997). Weak convergence and optimal scaling of random walk Metropolis algorithms. Ann. Appl. Probab. 7 110–120. MR1428751
[35] Roberts, G. O. and Polson, N. G. (1994). On the geometric convergence of the Gibbs sampler. J. Roy. Statist. Soc. Ser. B 56 377–384. MR1281941
[36] Roberts, G. O. and Rosenthal, J. S. (1997). Geometric ergodicity and hybrid Markov chains. Electron. Commun. Probab. 2 13–25 (electronic). MR1448322
[37] Roberts, G. O. and Rosenthal, J. S. (1998). Two convergence properties of hybrid samplers. Ann. Appl. Probab. 8 397–407. MR1624941
[38] Roberts, G. O. and Rosenthal, J. S. (1998). Optimal scaling of discrete approximations to Langevin diffusions. J. R. Stat. Soc. Ser. B Stat. Methodol. 60 255–268. MR1625691
[39] Roberts, G. O. and Rosenthal, J. S. (2001). Optimal scaling for various Metropolis–Hastings algorithms. Statist. Sci. 16 351–367. MR1888450
[40] Roberts, G. O. and Rosenthal, J. S. (2004). General state space Markov chains and MCMC algorithms. Probab. Surv. 1 20–71. MR2095565
[41] Roberts, G. O. and Rosenthal, J. S. (2007). Coupling and ergodicity of adaptive Markov chain Monte Carlo algorithms. J. Appl. Probab. 44 458–475. MR2340211
[42] Roberts, G. O. and Rosenthal, J. S. (2009). Examples of adaptive MCMC. J. Comput. Graph. Statist. 18 349–367. MR2749836
[43] Rosenthal, J. S. (2011). Optimal proposal distributions and adaptive MCMC. In Handbook of Markov Chain Monte Carlo (S. Brooks, A. Gelman, G. L. Jones and X.-L. Meng, eds.). Chapman & Hall/CRC, London.
[44] Saksman, E. and Vihola, M. (2010). On the ergodicity of the adaptive Metropolis algorithm on unbounded domains. Ann. Appl. Probab. 20 2178–2203. MR2759732
[45] Turro, E., Bochkina, N., Hein, A. M. K. and Richardson, S. (2007). BGX: A Bioconductor package for the Bayesian integrated analysis of Affymetrix GeneChips. BMC Bioinformatics 8 439–448. Available at http://www.biomedcentral.com/1471-2105/8/439.
[46] Vihola, M. (2011). On the stability and ergodicity of adaptive scaling Metropolis algorithms. Stochastic Process. Appl. 121 2839–2860. MR2844543
[47] Yang, C. (2008). On the weak law of large numbers for unbounded functionals for adaptive MCMC. Preprint.
[48] Yang, C. (2008). Recurrent and ergodic properties of adaptive MCMC. Preprint.

K. Latuszyński
G. O. Roberts
Department of Statistics
University of Warwick
CV4 7AL, Coventry
United Kingdom
E-mail: latuch@gmail.com; Gareth.O.Roberts@warwick.ac.uk

J. S. Rosenthal
Department of Statistics
University of Toronto
Toronto, Ontario M5S 3G3
Canada
E-mail: jeff@math.toronto.edu