On the use of backward simulation in particle Markov chain Monte Carlo methods

Reading time: 7 minutes

📝 Original Info

  • Title: On the use of backward simulation in particle Markov chain Monte Carlo methods
  • ArXiv ID: 1110.2873
  • Date: 2013-06-01
  • Authors: Fredrik Lindsten, Thomas B. Schön

📝 Abstract

Recently, Andrieu, Doucet and Holenstein (2010) introduced a general framework for using particle filters (PFs) to construct proposal kernels for Markov chain Monte Carlo (MCMC) methods. This framework, termed Particle Markov chain Monte Carlo (PMCMC), was shown to provide powerful methods for joint Bayesian state and parameter inference in nonlinear/non-Gaussian state-space models. However, the mixing of the resulting MCMC kernels can be quite sensitive, both to the number of particles used in the underlying PF and to the number of observations in the data. In the discussion following (Andrieu et al., 2010), Whiteley suggested a modified version of one of the PMCMC samplers, namely the particle Gibbs (PG) sampler, and argued that this should improve its mixing. In this paper we explore the consequences of this modification and show that it leads to a method which is much more robust to a low number of particles as well as a large number of observations. Furthermore, we discuss how the modified PG sampler can be used as a basis for alternatives to all three PMCMC samplers derived in (Andrieu et al., 2010). We evaluate these methods on several challenging inference problems in a simulation study. One of these is the identification of an epidemiological model for predicting influenza epidemics, based on search engine query data.

📄 Full Content

Particle Markov chain Monte Carlo (PMCMC) is an umbrella term for a collection of methods introduced in [1]. The fundamental idea underlying these methods is to use a sequential Monte Carlo (SMC) sampler, i.e. a particle filter (PF), to construct a proposal kernel for an MCMC sampler. The resulting methods were shown to be powerful tools for joint Bayesian parameter and state inference in nonlinear, non-Gaussian state-space models. However, to obtain reasonable mixing of the resulting MCMC kernels, it was reported that a fairly high number of particles was required in the underlying SMC samplers. This might seem like a very natural observation; in order to obtain a good proposal kernel based on a PF, we should intuitively use sufficiently many particles to obtain high accuracy in the PF. However, this is also one of the major criticisms against PMCMC. If we need to run a PF with a high number of particles at each iteration of the PMCMC sampler, then a lot of computational resources will be wasted and the method will be very computationally intense (or "computationally brutal" as Flury and Shephard put it in the discussion following [1]). It is the purpose of this work to show that this need not be the case. We will discuss alternatives to each of the three PMCMC methods derived in [1] that will function properly even when using very few particles. The basic idea underlying these modified PMCMC samplers was originally proposed by Whiteley [2].

To formulate the problem that we are concerned with, consider a general, discrete-time state-space model with state space X, parameterised by θ ∈ Θ,

x_{t+1} ∼ f_θ(x_{t+1} | x_t),
y_t ∼ g_θ(y_t | x_t).

The initial state has density π_θ(x_1). We take a Bayesian viewpoint and model the parameter as a random variable with prior density p(θ). Given a sequence of observations y_{1:T} ≜ {y_1, …, y_T}, we wish to estimate the parameter θ as well as the system state x_{1:T}. That is, we seek the posterior density p(θ, x_{1:T} | y_{1:T}).
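As an illustration, the following sketch simulates data from a model of this form, using a common nonlinear benchmark; the particular transition and observation functions, and the parameterisation θ = (q, r) for the process and measurement noise variances, are illustrative assumptions rather than specifics from this paper.

```python
import numpy as np

def simulate_ssm(T, theta, rng):
    """Simulate x_{1:T}, y_{1:T} from a nonlinear/non-Gaussian SSM of the form
    x_{t+1} ~ f_theta(. | x_t), y_t ~ g_theta(. | x_t), with x_1 ~ pi_theta."""
    q, r = theta                              # process / measurement noise variances
    x = np.empty(T)
    y = np.empty(T)
    x[0] = rng.normal(0.0, np.sqrt(5.0))      # x_1 ~ pi_theta, here N(0, 5)
    for t in range(T):
        # observation: y_t = x_t^2 / 20 + e_t,  e_t ~ N(0, r)
        y[t] = x[t] ** 2 / 20.0 + rng.normal(0.0, np.sqrt(r))
        if t + 1 < T:
            # transition of the standard nonlinear benchmark model
            x[t + 1] = (0.5 * x[t] + 25.0 * x[t] / (1.0 + x[t] ** 2)
                        + 8.0 * np.cos(1.2 * (t + 1))
                        + rng.normal(0.0, np.sqrt(q)))
    return x, y
```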

During the last two decades or so, Monte Carlo methods for state and parameter inference in this type of nonlinear, non-Gaussian state-space model have appeared at an increasing rate and with increasingly better performance. State inference via SMC is thoroughly treated by, for instance, [4][5][6]. For the problem of parameter inference, the two overview papers [7,8] provide a good introduction to both frequentist and Bayesian methods. Some recent results on Monte Carlo approaches to maximum likelihood parameter inference in state-space models can be found in [9][10][11][12]. Existing methods based on PMCMC will be discussed in the next section, where we also provide a preview of the material presented in the present work.

In [1], three PMCMC methods were introduced to address the inference problem mentioned above. These methods are referred to as particle Gibbs (PG), particle marginal Metropolis-Hastings (PMMH) and particle independent Metropolis-Hastings (PIMH).

Let us start by considering the PG sampler, which is an MCMC method targeting the joint density p(θ, x_{1:T} | y_{1:T}). In an "idealised" Gibbs sampler (see e.g. [13] for an introduction to Gibbs sampling), we would target this density by the following two-step sweep:

• Draw θ | x_{1:T} ∼ p(θ | x_{1:T}, y_{1:T}).

• Draw x_{1:T} | θ ∼ p_θ(x_{1:T} | y_{1:T}).
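The two-step sweep can be made concrete in a toy conjugate model. The model below is hypothetical, chosen purely so that both conditionals can be sampled exactly; unlike the state-space models considered in the paper, it has no dynamics, so the state draw factorises over time.

```python
import numpy as np

# Hypothetical conjugate model (both Gibbs steps exact):
#   theta ~ InvGamma(a0, b0),  x_t | theta ~ N(0, theta),  y_t | x_t ~ N(x_t, r)
rng = np.random.default_rng(1)
T, r, a0, b0 = 50, 0.5, 2.0, 2.0
x_true = rng.normal(0.0, np.sqrt(1.5), T)        # data generated with theta = 1.5
y = x_true + rng.normal(0.0, np.sqrt(r), T)

x = y.copy()                                     # initialise the state trajectory
thetas = []
for _ in range(2000):
    # Step 1: theta | x_{1:T} is InvGamma(a0 + T/2, b0 + sum(x^2)/2) by conjugacy
    theta = (b0 + 0.5 * np.sum(x ** 2)) / rng.gamma(a0 + T / 2.0)
    # Step 2: x_{1:T} | theta, y_{1:T} factorises over t into Gaussians
    var = 1.0 / (1.0 / theta + 1.0 / r)
    x = var * y / r + np.sqrt(var) * rng.normal(size=T)
    thetas.append(theta)
```

In a genuine state-space model step 2 requires sampling from the joint smoothing density, which is exactly the step the PG sampler replaces with a particle-filter draw.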

The first step of this procedure can, for some problems, be carried out exactly (essentially, when conjugate priors are used for the model under study). In the PG sampler, this is assumed to be the case. However, the second step, i.e. sampling from the joint smoothing density p_θ(x_{1:T} | y_{1:T}), is in most cases very difficult. In the PG sampler, this is addressed by instead sampling a particle trajectory x_{1:T} using a PF. More precisely, we run a PF targeting the joint smoothing density. We then sample one of the particles at the final time T, according to the importance weights, and trace the ancestral lineage of this particle to obtain the trajectory x_{1:T}. However, as we shall see in Section 4.2, this can lead to very poor mixing when there is degeneracy in the PF (see e.g. [5,14] for a discussion on PF degeneracy). This will inevitably be the case when T is large and/or the number of particles N is small.
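The PF-based state draw described above can be sketched as follows, assuming a bootstrap PF with multinomial resampling at every step; the callables `f_sample`, `g_logpdf` and `x1_sample` are placeholder interfaces for the model, not names from the paper.

```python
import numpy as np

def pf_ancestral_trajectory(y, N, f_sample, g_logpdf, x1_sample, rng):
    """Run a bootstrap PF, sample one particle at time T by its weight, and
    trace its ancestral lineage back to time 1 to obtain a trajectory x_{1:T}.
    Also returns the full particle system (X, logW) for later reuse."""
    T = len(y)
    X = np.empty((T, N))                 # particle positions
    A = np.zeros((T, N), dtype=int)      # ancestor indices
    logW = np.empty((T, N))              # log importance weights
    X[0] = x1_sample(N, rng)
    logw = g_logpdf(y[0], X[0])
    logW[0] = logw
    for t in range(1, T):
        w = np.exp(logw - logw.max()); w /= w.sum()
        A[t] = rng.choice(N, size=N, p=w)         # multinomial resampling
        X[t] = f_sample(X[t - 1, A[t]], t, rng)   # propagate
        logw = g_logpdf(y[t], X[t])               # reweight
        logW[t] = logw
    # sample one index at time T and follow its ancestry backwards
    w = np.exp(logw - logw.max()); w /= w.sum()
    k = rng.choice(N, p=w)
    traj = np.empty(T)
    for t in range(T - 1, -1, -1):
        traj[t] = X[t, k]
        k = A[t, k]
    return traj, X, logW
```

When the PF is degenerate, every sampled index at time T traces back to the same few ancestors at early times, which is precisely the source of the poor mixing discussed in Section 4.2.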

One way to remedy this problem is to append a backward simulator to the PG sampler, leading to a method that we denote particle Gibbs with backward simulation (PG-BSi). The idea of using backward simulation in the PG context was mentioned by Whiteley [2] in the discussion following [1]. To make this paper self-contained, we will present a derivation of the PG-BSi sampler in Section 4. The reason why PG-BSi can avoid the poor mixing of the original PG sampler will be discussed in Section 4.3. Furthermore, in Section 4.2 we shall see that the PG-BSi sampler can operate properly even with very few particles and vastly outperform PG in such cases. Now, as mentioned above, to apply the PG and the PG-BSi samplers we need to sample from the conditional parameter density p(θ | x_{1:T}, y_{1:T}). This is not alw
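A minimal sketch of the backward simulation pass is given below, assuming the particle positions `X` and log-weights `logW` from every time step of a PF have been stored; the function and argument names are illustrative, not the paper's notation.

```python
import numpy as np

def backward_simulate(X, logW, f_logpdf, rng):
    """Backward simulator: draw x_T from the final particle weights, then for
    t = T-1, ..., 1 draw x_t among the time-t particles with weights
    proportional to w_t^i * f_theta(x_{t+1} | x_t^i). The resulting trajectory
    need not coincide with any ancestral lineage of the PF, which is what lets
    a PG sampler with this step mix well even when the PF is degenerate."""
    T, N = X.shape
    traj = np.empty(T)
    w = np.exp(logW[-1] - logW[-1].max()); w /= w.sum()
    traj[-1] = X[-1, rng.choice(N, p=w)]
    for t in range(T - 2, -1, -1):
        # smoothing weights: filter weight times transition density to x_{t+1}
        logs = logW[t] + f_logpdf(traj[t + 1], X[t])
        w = np.exp(logs - logs.max()); w /= w.sum()
        traj[t] = X[t, rng.choice(N, p=w)]
    return traj
```

Each backward pass costs O(TN) evaluations of the transition density, so appending it to the PG sweep is cheap relative to rerunning the PF with many more particles.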

…(Full text truncated)…

Reference

This content is AI-processed based on ArXiv data.
