Comments on "Particle Markov Chain Monte Carlo" by C. Andrieu, A. Doucet and R. Hollenstein

Commen ts on “ P article Mark o v Chain Mon te Carlo ” b y C. Andrieu, A. Doucet and R . Hollenstein J. Cornebise ∗ and G.W. P eters †‡ July 28, 2021 Abstract W e merge in this note our tw o discussions ab out the Read Paper “P article Mark o v c hain Mon te Carlo” ( Andrieu, Doucet, and Holenstein , 2010 ) presen ted on October 16 th 2009 at the Roy al Statistical So ciet y , app earing in the Journal of the Roy al Statistical So ciet y Series B. W e also presen t a more detailed v ersion of the ABC extension. 1 Intro duction The article Andrieu et al. ( 2010 ) is clearly going to ha v e signiﬁcan t impact on scientiﬁc disciplines with a st rong in terface with computational statistics and non-linear state space mo dels. Our commen ts are based on practical exp erience with PMCMC implemen tation in latent pro cess multifactor SDE mo dels for commo dities ( P eters et al. , 2009 ), wireless comm unications ( Nev at et al. , 2009 ) and p opulation dynamics ( Hay es et al. , 2009 ), using Rao-Blac kwellised particle ﬁlters ( Doucet et al. , 2000 ) and adaptiv e MCMC ( Rob erts and Rosen thal , 2009 ). 2 Generic comments • F rom our implemen tations, ideal use cases consist of highly non-linear dynamic equa- tions for a small dimension d x of the state-space, large dimension d θ of the static parameter, an d poten tially large length T of the time series. In our cases d x w as 2 or 3, d θ up to 20, and T b et w een 100 and 400. ∗ Statistical and Applied Mathematical Sciences Institute, P .O. Box 14006, Research T riangle Park, NC 27709-4006, USA, jcornebise@samsi.info † Sc hool of Mathematics and Statistics, Universit y of New South W ales, Sydney , NSW, 2052, Australia, garethpeters@unsw.edu.au ‡ This material was based up on w ork supp orted b y the National Science F oundation under Agreemen t No. DMS-0635449. An y opinions, ﬁndings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reﬂect the views of the National Science F oundation. 1 • In PMH, non-adaptiv e MCMC prop osals for θ (e.g. tuned according to pre-simulati on c hains or burn-in iterations) w ould b e costly for large T , and requires to keep N ﬁxed o v er the whole run of the Mark o v chain. Adaptiv e MCMC prop osals such as the Adaptiv e Metrop olis sampler ( Rob erts and Rosen thal , 2009 ), a v oid such issues and pro v ed particularly relev an t for large d θ and T , as can b e seen in Figure 2 . • The Particle Gibbs (PG) could p otentially stay frozen on a state x 1: T ( i ). Consider a state space mo del with state transition function almost linear in x n for some range of θ , from whic h y 1: T is considered to result, and strongly non-linear elsewhere. If the PG samples θ ( i ) in those regions of strong non-linearit y , the particle tree would lik ely coalesce on the tra jectory preserved b y the conditional SMC, leaving it with a high imp ortance w eigh t, main taining ( θ ( i + 1) , x 1: T ( i + 1)) = ( θ ( i ) , x 1: T ( i )) o v er sev eral iterations. Using PMH within PG w ould help escap e this region, esp ecially using PRC and adaptiv e SMC kernels, outlined in another commen t, to ﬁgh t the degeneracy of the ﬁlter and the high v ariance of ˆ p θ ( y 1: T ). 3 Adaptive Sequential Monte Ca rlo Our commen ts on adaptiv e SMC relate to Part icle marginal Metrop olis-Hastings (PMMH) whic h has acceptance probabilit y given in Equation (13) of the read pap er for prop osed state ( θ ∗ , X ∗ 1: T ), relying on the estimate ˆ p θ ∗ ( y 1: T ) = T Y n =1 1 N N X k =1 w n  x ∗ ,k 1: n  . Although a small N suﬃces to appro ximate the mo de of joint path space distribution, pro ducing a reasonable prop osal for x 1: T , it results in high v ariance estimates of ˆ p θ ∗ ( y 1: T ). W e study a p opulation dynamics example from ( Hay es et al. , 2009 , Mo del 3 excerpt), in volving a log-transformed theta-logistic state space model, see ( W ang , 2007 , Equation 3(a), 3(b)) for parameter settings. PMCMC p erformance dep ends on the trade-oﬀ b et ween degeneracy of the ﬁlter, N , and design of the SMC m utation kernel. Regarding the latter: • A Rao-Blac kwellised ﬁlter ( Doucet et al. , 2000 ) can impro v e acceptance rates, Nev at et al. (see 2009 ). • Adaptiv e mutation kernels, which in PMCMC, can b e considered as adaptive SMC prop osals, can reduce degeneracy on the path space, allo wing for higher dimensional state vect ors x n . Adaption can b e lo cal (within ﬁlter) or global (sampled Mark o v c hain history). Though curren tly particularly designed for ABC metho ds, the work of P eters et al. ( 2008 ) incorp orates in to the m utation kernel of SMC Samplers ( Del Moral et al. , 2006 ) the P artial Rejection Con trol (PRC) mec hanism of Liu ( 2001 ), which is also b eneﬁcial for PMCMC. PRC adaption reduces degeneracy b y rejecting a particle m utation when its incremen tal imp ortance w eigh t is b elow a threshold c n . The PRC m utation kernel q ∗ θ ( x n | y n , x n − 1 ) = r ( c n , x n − 1 ) − 1 min  1 , W n − 1 ( x n − 1 ) w n ( x n − 1 , x n ) c n  q θ ( x n | y n , x n − 1 ) , (1) 2 can also b e used in PMH, where q θ ( x n | y n , x n − 1 ) is the standard SMC prop osal, and r ( c n , x n − 1 ) = Z min  1 , W n − 1 ( x n − 1 ) w n ( x n − 1 , x n ) c n  q θ ( x n | y n , x n − 1 ) dx n . (2) As presen ted in Peters et al. ( 2008 ), algorithmic c hoices for q ∗ θ ( x n | y n , x n − 1 ) can a v oid ev aluation of r ( c n , x n − 1 ). Cornebise ( 2009b ) extend this wo rk, dev eloping PRC for Auxiliary SMC samplers, also useful in PMH. Threshold c n can b e set adaptiv ely: lo cally either at each SMC mutatio n or Mark o v c hain iteration; or globally based on c hain acceptance rates. Additionally , c n can b e set adaptiv ely via quantile estimates of pre-PR C incremen tal weigh ts, see Peters et al. ( 2009 ). • Cornebise et al. ( 2008 ) state that adaptiv e SMC prop osals can b e designed by mini- mizing function-free risk theoretic criteria suc h as Kullbac k-Leibler div ergence b etw een a joint prop osal in a parametric family and a join t target. Cornebise ( 2009a , Chapter 5) and Cornebise et al. ( 2009 ) use a mixture of exp erts, adapting kernels of a mixture on distinct regions of the state-space separated b y a softmax partition. These results extend to PMCMC settings. 0 10 20 30 40 50 60 70 80 90 100 3.5 4 4.5 5 5.5 6 6.5 7 7.5 Population dynamics state space model with log transformed theta−logistic latent process. Generated latent state realisations Generated observations Figure 1: Sequence of sim ulated states and observ ations for the p opulation dy- namic log-transformed theta logistic model from W ang ( 2007 ), with static parame- ter θ = ( r , ζ , K ) under constraints K > 0, r < 2 . 69, ζ ∈ R . State transi- tion is f θ ( x n | x n − 1 ) = N  x n ; x n − 1 + r  1 − (exp( x t − 1 ) /K ) ζ  , 0 . 01  , and lo cal likelih o o d is g θ ( y n | x n ) = N ( y n ; x n , 0 . 04), for T = 100 timesteps. 3 0 1 2 3 4 5 6 7 8 9 10 x 10 4 4.4 4.5 4.6 4.7 x 2 Sample path for latent state value x 2 in the PMH via an SIR filter. Markov chain iteration i 0 1 2 3 4 5 6 7 8 9 10 x 10 4 6.7 6.75 6.8 6.85 6.9 x 37 Sample path for latent state value x 37 in the PMH via an SIR filter. Markov chain iteration i 0 1 2 3 4 5 6 7 8 9 10 x 10 4 6.75 6.8 6.85 6.9 6.95 x 93 Sample path for latent state value x 93 in the PMH via an SIR filter. Markov chain iteration i 0 1 2 3 4 5 6 7 8 9 10 x 10 4 0.8 1 1.2 1.4 1.6 Sample path for parameter r in the PMH via an adaptive Metropolis proposal. Markov chain iteration i r 0 1 2 3 4 5 6 7 8 9 10 x 10 4 885 890 895 900 905 910 Sample path for parameter K in the PMH via an adaptive Metropolis proposal. Markov chain iteration i K 0 1 2 3 4 5 6 7 8 9 10 x 10 4 0.2 0.4 0.6 0.8 1 1.2 Sample path for parameter ζ in the PMH via an adaptive Metropolis proposal. Markov chain iteration i ζ Figure 2: P ath of three sampled laten t states x 2 , x 37 , x 93 , and of the sampled parameters θ = ( r, ζ , L ), ov er 100 , 000 PMH iterations based on N = 200 particles using a simple SIR ﬁlter – the one dimensional state did not call for Rao-Blackw ellisation. Note also the eﬀect the Adaptiv e MCMC prop osal for θ , set-up to start at iteration 5 , 000, particularly visible on the mixing of parameter K . The most noticeable prop ert y of the algorithm is the remark able mixing of the chai n, in spite of the high total dimension of the sampled state: each iteration in v olves a prop osal of ( X 1: T , θ ) of dimension 103. 4 0 10 20 30 40 50 60 70 80 90 100 −80 −60 −40 −20 0 20 40 Initialised PMH state for x 1:T true path x 1:T Initialised Path x 1:T 0 10 20 30 40 50 60 70 80 90 100 −80 −60 −40 −20 0 20 40 Average poster mean path estimate for X 1:T , PMH Markov chain iteration i=10, N=200 particles true path x 1:T Poster Mean Path estimate of E(X 1:T |y 1:T ) 0 10 20 30 40 50 60 70 80 90 100 3 4 5 6 7 Average poster mean path estimate for X 1:T , PMH Markov chain iteration i=20,000, N=200 particles true path x 1:T Poster Mean Path estimate of E(X 1:T |y 1:T ) Figure 3: Con vergence of the distribution of the path of laten t states x 1: T . Note the change of v ertical scale. Initializing PMH on a very unlik ely initial path do es not preven t the MMSE estimate of the latent states con v erging: as few as 10 PMH iterations already b egins to concen trate the sampled paths around the true path – assumed here to b e close to the mo de of the p osterior distribution thanks to the small observ ation noise –, with v ery satisfactory results after 20 , 000 iterations. 5 4 App ro ximate Ba y esian Computation and PMCMC F or in tractable joint likelihoo d p θ ( y 1: T | x 1: T ), we could design a SMC-ABC algorithm (see e.g. Peters et al. , 2008 ; Ratmann , 2010 , Chapter 1) for a ﬁxed ABC tolerance  , using the appro ximations ˆ p AB C θ ( y 1: T ) := 1 N N X k =1 1 S P S s =1 I  ρ ( y k 1 ( s ) , y 1 ) <   µ θ ( x k 1 ) q θ ( x k 1 | y 1 ) × T Y n =2   1 N N X k =1 1 S P S s =1 I  ρ ( y k n ( s ) , y n ) <   f θ ( x k n | x A k n − 1 n − 1 ) q θ ( x k n | y n , x A k n − 1 n − 1 )   or ˆ p AB C θ ( y 1: T ) := 1 N N X k =1 1 S P S s =1 N  y k 1 ( s ); y 1 ,  2  µ θ ( x k 1 ) q θ ( x k 1 | y 1 ) × T Y n =2   1 N N X k =1 1 S P S s =1 N  y k n ( s ); y n ,  2  f θ ( x k n | x A k n − 1 n − 1 ) q θ ( x k n | y n , x A k n − 1 n − 1 )   with ρ a distance on the observ ation space and y k n ( s ) ∼ g θ ( ·| x k n ) simulated observ ations. Ad- ditional degeneracy on the path space induced by ABC appro ximation should b e con trolled, e.g. with PR C ( P eters et al. , 2008 ), see Equation ( 1 ). More details on this algorithm are a v ailable in App endix A , whic h is not con tained in our commen t to JRSSB due to space restrictions. A Algo rithmic details of ABC ﬁltering within PMCMC Here we expand on the commen t w e made ab ov e in whic h w e appro ximated the lo cal like- liho o d g θ ( y n | x n ) of the SMC-based ﬁltering part of the PMCMC algorithm. b y the ABC appro ximation g AB C θ ( y n | x n , y n (1 : S ) ,  ) := 1 S S X s =1 π θ ( y n ( s ) | x n , y n ,  ) (3) where p ossible c hoices for π θ are π I θ ( y n ( s ) | x n , y n ,  ) := I ( ρ ( y n ( s ) , y n ) <  ) or π N θ ( y n ( s ) | x n , y n ,  ) := N  y n ( s ); y n ,  2  with ρ a distance on the observ ation space and y n ( s ) ∼ g θ ( ·| x n ) sim ulated observ ations – assumed here to b e univ ariate for sak e of brevity , but generalisation to multiv ariate setting and summary statistics is straightfor w ard. W e note it is critical in the ﬁltering context to ensure that the particle system under appro ximation do es not collapse in to uniformly n ull incremen tal w eigh ts w n  X k 1: n  = 0 whic h 6 Algorithm 1 SMC-ABC-PRC ﬁltering algorithm targeting p θ ( x 1: T | y 1: T ) as required in Step 2(b) of the PMMH of ( Andrieu et al. , 2010 , Section 2.4.2). Replaces the SMC algorithm pre- sen ted in ( Andrieu et al. , 2010 , Section 2.2.1). Appro ximation g AB C θ is deﬁned in Equation ( 3 ) and function r in Equation ( 2 ). Step 1: Initialize  and c 1 Step 2: A t time n = 1, (a) for k = 1 , . . . , N (i) sample X k 1 ∼ q θ ( ·| y 1 ) (ii) sample Y k 1 ( s ) ∼ g θ ( ·| X k 1 ) for s = 1 , . . . , S (iii) compute the incremental weig h t ˜ w 1 ( X k 1 ) := µ θ ( X k 1 ) g AB C θ ( y 1 | X k 1 , Y k 1 (1 : S ) ,  ) q θ ( X k 1 | y 1 ) , (iv) with probabilit y 1 − p k 1 = 1 − min { 1 , ˜ w 1 ( X k 1 ) /c 1 } , reject X k 1 and go to (i) (v) otherwise, accept X k 1 and set w 1 ( X k 1 ) = ˜ w 1 ( X k 1 ) r ( c 1 ) /p k 1 (b) normalise the weigh ts W k 1 := w 1 ( X k 1 ) / P N m =1 w 1 ( X m 1 ). Step 3: A t times n = 2 , . . . , T , (a) p ossibly adapt c n online (b) for k = 1 , . . . , N (i) sample A k n − 1 ∼ F ( ·| W n − 1 ), (ii) sample X k n ∼ q θ ( ·| y n , X A k n − 1 n − 1 ) and set X k 1: n = ( X A k n − 1 n − 1 , X k n ), and (iii) sample Y k n ( s ) ∼ g θ ( ·| X k n ) for s = 1 , . . . , S (iv) compute the incremental weigh t ˜ w n ( X k 1: n ) := f θ ( X k n | X A k n − 1 n − 1 ) g AB C θ ( y n | X k n , Y k n (1 : S ) ,  ) q θ ( X k n | y n , X A k n − 1 n − 1 ) , (v) with probability 1 − p k n = 1 − min { 1 , ˜ w n ( X k 1: n ) /c n } , reject X k n and go to (ii) (vi) otherwise, accept X k n and set w n ( X k 1: n ) = ˜ w n ( X k 1: n ) r ( c n , X A k n − 1 n − 1 ) /p k n (c) normalise the weigh ts W k n := w n ( X k 1: n ) / P N m =1 w n ( X m 1: n ). 7 ma y o ccur for π I θ at any stage of the ﬁltering during eac h PMCMC iteration, esp ecially for small tollerances  . The PRC m utation kernel q ∗ θ ( x n | y n , x n − 1 ) deﬁned in Equation ( 1 ) is critical to ov ercome b oth this collapse and the additional degeneracy on the path space in troduced by the ABC appro ximation. The algorithm presen ted in McKinley et al. ( 2009 , Sections 3.4 and 3.5.1) is a sp ecial case of the SMC samplers PRC-ABC algorithm of P eters et al. ( 2008 ) in whic h the PRC rejection threshold c n = 0, the mutation k ernel is global and resampling is p erformed at each stage of the ﬁlter, which av oids the computation of the normalizing constant r ( c n , x n − 1 ) deﬁned in Equation ( 2 ). W e further note that the w ork of Cornebise ( 2009a ) casts the SMC sampler PR C algorithm ( P eters et al. , 2008 ) with rejection of the ancestor index A k n − 1 – here in Step 3.(a).(v) – in to an Auxiliary SMC sampler framew ork. The com bination of these tw o concepts recov ers a generalized v ersion of McKinley et al. ( 2009 ), see Algorithm 1 , whic h has adv ant age in the PMCMC setting of allo wing for adaptation of the threshold c n . References Andrieu, C., A. Doucet, and R. Holenstein (2010). Particle Marko v c hain Mon te Carlo metho ds. J. R. Statis. So c. B 72 (2), 1–33. Cornebise, J. (2009a). A daptive Se quential Monte Carlo Metho ds . Ph. D. thesis, Univ ersit y Pierre and Marie Curie – Paris 6. Cornebise, J. (2009b). Auxiliary SMC samplers with applications to PR C and ABC. W orking pap er, Stastistical and Applied Mathematical Sciences Institute. Cornebise, J., E. Moulines, and J. Olsson (2008). Adaptiv e methods for sequential imp ortance sampling with appli cation to state space mo dels. Statistics and Computing 18 (4), 461–480. Cornebise, J., E. Moulines, and J. Olsson (2009). Adaptiv e sequen tial Monte Carlo b y means of mixture of exp erts. W orking pap er, T elecom ParisT ec h. Del Moral, P ., A. Doucet, and A. Jasra (2006). Sequential Mon te Carlo samplers. Journal of the R oyal Statistic al So ciety: Series B (Statistic al Metho dolo gy) 68 (3), 411–436. Doucet, A., N. de F reitas, K. Murph y , and S. Russell (2000). Rao-Blac kw ellised particle ﬁltering for dynamic Ba y esian net w orks. In Pr o c e e dings of the Sixte enth Confer enc e on Unc ertainty in Artiﬁcial Intel ligenc e , pp. 176–183. Ha y es, K., G. Hosac k, and G. P eters (2009). Searchin g for the all ee eﬀect in laten t state space mo dels via adaptive particle Mark o v chain Mon te Carlo. W orking pap er, Departmen t of Mathematics and Statistics, Universi t y of New South W ales. Liu, J. (2001). Monte Carlo Str ate gies in Scientiﬁc Computing . New Y ork: Springer. McKinley , T., A. Co ok, and R. Deardon (2009). Inference in epidemic mo dels without lik eliho o ds. The International Journal of Biostatistics 5 (1), 24. 8 Nev at, I., G. P eters, and A. Doucet (2009). Channel tracking for relay net w orks via adaptive particle MCMC. W orking pap er, Univ ersit y of New South W ales. P eters, G., M. Briers, P . Shev c henk o, and A. Doucet (2009). Online calibration and ﬁltering for multi factor commo dit y mo dels with seasonality: incorporating futures and options con tracts. W orking pap er, Department of Mathematics and Statistics, Univ ersit y of New South W ales. P eters, G., Y. F an, and S. Sisson (2008). On sequential Mon te Carlo, partial rejection con trol and approximate Ba yesian computation. T echnical rep ort, Departmen t of Mathematics and Statistics, Universit y of New South W ales. P eters, G., Y. F an, and S. Sisson (2009). Lik elihoo d-free Ba yesian inference for α -stable mo dels. T ec hnical rep ort, Departmen t of Mathematics and Statistics, Universit y of New South W ales. Ratmann, O. (2010). Appr oximate Bayesian Computation under mo del unc ertainty, with an applic ation to sto chastic pr o c esses of network evolution . Ph. D. Thesis, to app ear, Imp erial College. Rob erts, G. and J. Rosenthal (2009). Examples of adaptiv e MCMC. Journal of Computa- tional and Gr aphic al Statistics 18 (2), 349–367. W ang, G. (2007). On the laten t state estimation of nonlinear p opulation dynamics using ba y esian and non-Bay esian state-space mo dels. Ec olo gic al Mo del ling 200 (3-4), 521–528. 9

Comments on "Particle Markov Chain Monte Carlo" by C. Andrieu, A. Doucet and R. Hollenstein

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment