A Gibbs Sampling Alternative to Reversible Jump MCMC

This note presents a simple and elegant sampler which could be used as an alternative to the reversible jump MCMC methodology.

Author: Stephen G. Walker

Keywords: Gibbs sampler; Model switching; Variable dimension.

1. Introduction.

This note is about the problem of performing posterior Bayesian inference via Markov chains when the dimension of the model is not fixed. The standard solution is the reversible jump MCMC approach described in Green (1995). The set-up is typically of the form p(y | θ^(k), k), so that a model for data y is described for each k ∈ {1, 2, ...}, and the parameter for the model of dimension k is θ^(k). A prior distribution is now assigned to θ^(k), say π_k(θ^(k)), and a prior for k, say π(k). So let us write the (incomplete) joint density for (y, θ^(k), k) as

p(y, θ^(k), k) = p(y | θ^(k), k) π_k(θ^(k)) π(k).

This is incomplete (though obviously a valid model) since there is nothing about the (θ^(j); j ≠ k). So let us add a distribution for (θ^(j); j ≠ k) when model k is conditioned on:

p(y, (θ^(j); j = 1, 2, ...), k) = p(y, θ^(k), k) ∏_{l=1}^{k−1} p(θ^(l) | θ^(l+1)) ∏_{l=k+1}^{∞} p(θ^(l) | θ^(l−1)),

where the choice of p(θ^(l) | θ^(l−1)) and p(θ^(l) | θ^(l+1)) is arbitrary. The marginal model is correct: just integrate out the (θ^(j); j ≠ k). So the latent variables (θ^(j); j ≠ k), conditioned on k, would at first sight appear not to be needed, but they play a crucial role in that they serve as the proposal moves between dimensions. The problem now is how to move between dimensions, since the choice is infinite and so the precise probabilities cannot be found.
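The linking conditionals p(θ^(l) | θ^(l+1)) and p(θ^(l) | θ^(l−1)) are left arbitrary above. As a minimal sketch of one possible choice (ours, not prescribed by the note), take nested parameter vectors where moving up appends a coordinate and moving down drops one, with every coordinate jittered by a Gaussian of scale tau:

```python
import numpy as np

def link_up(theta, tau, rng):
    """Draw theta^(l+1) ~ p(. | theta^(l)): append a zero coordinate, then
    jitter every coordinate with a N(0, tau^2) perturbation."""
    return np.append(theta, 0.0) + tau * rng.standard_normal(len(theta) + 1)

def link_down(theta, tau, rng):
    """Draw theta^(l-1) ~ p(. | theta^(l)): drop the last coordinate, then
    jitter the remaining coordinates with N(0, tau^2) perturbations."""
    return theta[:-1] + tau * rng.standard_normal(len(theta) - 1)
```

With this choice both directions have proper (non-degenerate) Gaussian densities, which keeps the ratios entering the dimension-jump probabilities well defined.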
However, for this we will introduce another latent variable u which facilitates the moves in that it makes the choice finite, and hence probabilities can be computed. We write p(u | k) for this and, for simplicity of exposition (though it is easily allowed to be more general), we take u = k + 1 with probability q and u = k with probability 1 − q, for all k. So, overall, the joint density being considered is

p(u, y, (θ^(j); j = 1, 2, ...), k) = p(u | k) p(y, (θ^(j); j = 1, 2, ...), k).

We are now in a position to describe the Gibbs sampler for dimension jumping.

[Footnote 1: Stephen G. Walker is Professor of Statistics, Institute of Mathematics, Statistics & Actuarial Science, University of Kent, Canterbury, U.K. (email: S.G.Walker@kent.ac.uk)]

2. The Gibbs sampler.

Suppose the chain is currently at k. Then we sample θ^(k) from

π_k(θ^(k) | y, k) ∝ p(y | θ^(k), k) π_k(θ^(k))

in the usual way, and typically this is not an issue in such dimension-varying models since it is done assuming k is fixed. Now, given k, we will also sample θ^(k+1) from p(θ^(k+1) | θ^(k)) and θ^(k−1) from p(θ^(k−1) | θ^(k)). We need these two since the moves from k can be to {k − 1, k, k + 1}. The choice of these conditional densities is made precisely for the same reasons that particular moves are suggested in the reversible jump MCMC approach; basically, to increase the chance of a move away from k. Once this has been done, the u is sampled, and is either k or k + 1. Let us assume it is k + 1, so that the next k, which we will call j, can be either k or k + 1. Now, clearly, we have

π(j = k + 1 | u = k + 1, ...) ∝ (1 − q) p(y, θ^(k+1), k + 1) p(θ^(k) | θ^(k+1))

and

π(j = k | u = k + 1, ...) ∝ q p(y, θ^(k), k) p(θ^(k+1) | θ^(k)).
(1)

All the other latent variables and their densities are common to, and hence cancel out from, both terms, and so are not needed. On the other hand, if u = k then j can be either k or k − 1. It is then easy to derive that

π(j = k | u = k, ...) ∝ (1 − q) p(y, θ^(k), k) p(θ^(k−1) | θ^(k))

and

π(j = k − 1 | u = k, ...) ∝ q p(y, θ^(k−1), k − 1) p(θ^(k) | θ^(k−1)).   (2)

Either way, j is easily sampled.

There is a special case which deserves attention, and this is when we insist on

p(θ^(k) | θ^(k−1)) π_{k−1}(θ^(k−1)) = p(θ^(k−1) | θ^(k)) π_k(θ^(k))   (3)

for all k. Then, for probabilities (1), we now have the simpler situation

π(j = k + 1 | u = k + 1, ...) ∝ (1 − q) p(y | θ^(k+1), k + 1) π(k + 1)

and

π(j = k | u = k + 1, ...) ∝ q p(y | θ^(k), k) π(k),

and for probabilities (2), we have

π(j = k | u = k, ...) ∝ (1 − q) p(y | θ^(k), k) π(k)

and

π(j = k − 1 | u = k, ...) ∝ q p(y | θ^(k−1), k − 1) π(k − 1).

Once the idea of the Gibbs sampler has been understood, a generalization to more types of moves is quite straightforward, and this would involve a modification of p(u | k).

3. Discussion.

In summary, the algorithm is as easy as follows, given k:

1. Sample θ^(k) from π_k(θ^(k) | y, k), and sample θ^(k+1) and θ^(k−1) from the "proposals" p(θ^(k+1) | θ^(k)) and p(θ^(k−1) | θ^(k)), respectively.

2. Sample u from p(u | k), so that u = k + 1 with probability q and u = k with probability 1 − q.

3. Sample the new j from the appropriate probabilities (1) or (2).

Unlike the reversible jump MCMC approach, we do not actually need any special relation between p(θ^(k) | θ^(k−1)) and p(θ^(k−1) | θ^(k)), though the probabilities (1) and (2) are simplified if they do satisfy the particular relation (3).
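To make the three steps above concrete, here is a sketch in Python for a toy problem of our own devising (the note itself gives no numerical example): y_i ~ N(m, 1) with m the sum of the k components of θ^(k), standard normal priors on the components, π(k) uniform on {1, ..., K}, and Gaussian linking densities. The conjugate update in step 1, the boundary handling at k = 1 and k = K, and the values of q and tau are all our choices:

```python
import numpy as np

def norm_logpdf(x, mean, sd):
    # log density of N(mean, sd^2), elementwise
    return -0.5 * np.log(2 * np.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def log_joint(y, theta, K):
    # log p(y, theta^(k), k): N(sum(theta), 1) likelihood, N(0, 1) priors,
    # and pi(k) uniform on {1, ..., K}
    return (norm_logpdf(y, theta.sum(), 1.0).sum()
            + norm_logpdf(theta, 0.0, 1.0).sum()
            - np.log(K))

def link_up(theta, tau, rng):
    # theta^(k+1) ~ p(. | theta^(k)): append a zero coordinate, jitter all
    return np.append(theta, 0.0) + tau * rng.standard_normal(len(theta) + 1)

def link_down(theta, tau, rng):
    # theta^(k-1) ~ p(. | theta^(k)): drop the last coordinate, jitter the rest
    return theta[:-1] + tau * rng.standard_normal(len(theta) - 1)

def log_link(theta_to, theta_from, tau):
    # Gaussian linking density p(theta_to | theta_from), valid in both
    # directions: pad theta_from with a zero and truncate to the target length
    mean = np.append(theta_from, 0.0)[:len(theta_to)]
    return norm_logpdf(theta_to, mean, tau).sum()

def gibbs_step(y, theta, k, K, q, tau, rng):
    # Step 1a: theta^(k) | y, k is Gaussian (conjugate, since y_i ~ N(1'theta, 1))
    n = len(y)
    prec = np.eye(k) + n * np.ones((k, k))   # prior precision I plus n * 11'
    cov = np.linalg.inv(prec)
    theta = rng.multivariate_normal(cov @ (np.ones(k) * y.sum()), cov)
    # Step 1b: proposals for the neighbouring dimensions
    up = link_up(theta, tau, rng)
    down = link_down(theta, tau, rng) if k > 1 else None
    # Step 2: u = k + 1 with probability q, else u = k
    u = k + 1 if rng.random() < q else k
    # Step 3: sample j from the probabilities (1) or (2)
    if u == k + 1 and k < K:      # candidates {k, k+1}, probabilities (1)
        lw = [np.log(q) + log_joint(y, theta, K) + log_link(up, theta, tau),
              np.log(1 - q) + log_joint(y, up, K) + log_link(theta, up, tau)]
        cand = [(theta, k), (up, k + 1)]
    elif u == k and k > 1:        # candidates {k, k-1}, probabilities (2)
        lw = [np.log(1 - q) + log_joint(y, theta, K) + log_link(down, theta, tau),
              np.log(q) + log_joint(y, down, K) + log_link(theta, down, tau)]
        cand = [(theta, k), (down, k - 1)]
    else:                         # boundary k = 1 or k = K: stay (our choice)
        return theta, k
    lw = np.array(lw)
    w = np.exp(lw - lw.max())     # normalise the two weights stably
    return cand[rng.choice(2, p=w / w.sum())]
```

Iterating gibbs_step yields a chain over (θ^(k), k); note that, exactly as in the text, no Jacobian and no accept/reject step appear anywhere, only the two normalised weights in step 3.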
No such relation is needed because there is no pressure to force a detailed balance criterion.

So, we have described a Gibbs sampler version of the reversible jump MCMC approach which shares features such as the use of proposal moves but, as we have just said, removes some of the pressure, and also lacks the need for a Jacobian. Also, unlike reversible jump MCMC, there is no need to explain the algorithm; it is self-evident and remarkably simple.

It would be quite easy to demonstrate in particular cases that a more efficient algorithm has been introduced when compared to reversible jump MCMC. No doubt it would also be achievable the other way round. This is rather beside the point. What is clear is that a vastly simpler algorithm has been presented, and there is no obvious reason why it should be uniformly worse than reversible jump MCMC.

Reference.

Green, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82, 711-732.
