This paper proposes a new randomized strategy for adaptive MCMC using Bayesian optimization. This approach applies to non-differentiable objective functions and trades off exploration and exploitation to reduce the number of potentially costly objective function evaluations. We demonstrate the strategy in the complex setting of sampling from constrained, discrete and densely connected probabilistic graphical models where, for each variation of the problem, one needs to adjust the parameters of the proposal mechanism automatically to ensure efficient mixing of the Markov chains.
A common line of attack for solving problems in physics, statistics and machine learning is to draw samples from probability distributions π(·) that are only known up to a normalizing constant. Markov chain Monte Carlo (MCMC) algorithms are often the preferred method for accomplishing this sampling task; see, e.g., Andrieu et al. (2003) and Robert & Casella (1998). Unfortunately, these algorithms typically have parameters that must be tuned in each new situation to obtain reasonable mixing times. These parameters are often tuned by a domain expert in a time-consuming and error-prone manual process. Adaptive MCMC methods have been developed to adjust these parameters automatically. We refer the reader to three recent and excellent comprehensive reviews of the field (Andrieu & Thoms, 2008; Atchadé et al., 2009; Roberts & Rosenthal, 2009).
Adaptive MCMC methods based on stochastic approximation have garnered the most interest among the various adaptive MCMC methods, for two reasons. Firstly, they can be shown to be theoretically valid. That is, the Markov chain is made inhomogeneous by the dependence of the parameter updates on the history of the chain, but its ergodicity can still be ensured (Andrieu & Robert, 2001; Andrieu & Moulines, 2006; Saksman & Vihola, 2010). For example, Theorem 5 of Roberts & Rosenthal (2007) establishes two simple conditions that ensure ergodicity: (i) the non-adaptive sampler must be uniformly ergodic, and (ii) the level of adaptation must vanish asymptotically. These conditions are easily satisfied for discrete state spaces and finite adaptation.
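To make condition (ii) concrete, one simple way to enforce vanishing (indeed finite) adaptation is to perform an adaptation step at iteration i with a probability that decays to zero. The sketch below is illustrative only; the schedule and the names adapt_probability, adapt_fn and n_adapt are our own assumptions, not part of any of the cited algorithms.

import random

def adapt_probability(i, n_adapt=1000):
    # Probability of adapting at iteration i: it decays linearly and is
    # exactly zero after n_adapt iterations, so adaptation is finite and
    # the diminishing-adaptation condition (ii) holds.
    return max(0.0, 1.0 - i / n_adapt)

def maybe_adapt(i, adapt_fn, n_adapt=1000):
    # adapt_fn is a hypothetical callback that updates the proposal
    # parameters from the chain's history.
    if random.random() < adapt_probability(i, n_adapt):
        adapt_fn()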
Secondly, adaptive MCMC algorithms based on stochastic approximation have been shown to work well in practice (Haario et al., 2001; Roberts & Rosenthal, 2009; Vihola, 2010). However, the stochastic approximation approach has some limitations. Some of the most successful samplers rely on knowing either the optimal acceptance rate or the gradient of some objective function of interest. Another disadvantage is that these methods may require many iterations, which is particularly problematic when the objective function being optimized by the adaptation mechanism is costly to evaluate. Finally, gradient approaches tend to be local, and hence they can get trapped in local optima when the Markov chains are run for a finite number of steps in practice.
This paper aims to overcome some of these limitations. It proposes the use of Bayesian optimization (Brochu et al., 2009) to tune the parameters of the Markov chain. The proposed approach, Bayesian-optimized MCMC, has a few advantages over adaptive methods based on stochastic approximation.
Bayesian optimization does not require the objective function to be differentiable, which gives us much more flexibility in the design of the adaptation mechanism. In this paper, we use the area under the auto-correlation function up to a specific lag as the objective function. This objective was suggested previously by Andrieu & Robert (2001), but computing gradient estimates for it is far from trivial (Andrieu & Robert, 2001); we believe this is one of the main reasons why practitioners have not embraced the approach. Here, we show that this objective can be optimized easily with Bayesian optimization, and we argue that Bayesian optimization endows the designer with greater freedom in the design of adaptive strategies.
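As a concrete illustration, the objective can be estimated from a realized chain as follows. This is a minimal sketch: the use of summed absolute autocorrelations of a scalar summary of the chain, and the names acf_area and max_lag, reflect our own reading of the objective rather than an exact estimator from the literature.

import numpy as np

def acf_area(chain, max_lag=50):
    # Area under the empirical autocorrelation function of a scalar
    # chain summary, up to max_lag; smaller values indicate faster
    # mixing, so the adaptation mechanism minimizes this quantity.
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = x.size
    c0 = x @ x  # n times the lag-0 autocovariance
    rho = np.array([(x[:n - k] @ x[k:]) / c0 for k in range(1, max_lag + 1)])
    return float(np.sum(np.abs(rho)))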
Bayesian optimization also has the advantage that it is explicitly designed to trade off exploration and exploitation, and implicitly designed to minimize the number of expensive evaluations of the objective function (Brochu et al., 2009; Lizotte et al., 2011).
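The sketch below shows how such a loop could look: a Gaussian-process surrogate is fit to the objective values observed so far over a grid of candidate proposal parameters, and an expected-improvement acquisition function selects where to run the next (expensive) pilot chain. The kernel, its hyperparameters and the helper evaluate_objective are assumptions for illustration, not the specific choices made in this paper.

import numpy as np
from scipy.stats import norm

def rbf_kernel(a, b, ell=0.5):
    # Squared-exponential kernel between 1-D arrays of parameter values.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

def gp_posterior(X, y, Xstar, noise=1e-4, ell=0.5):
    # Standard zero-mean GP regression: posterior mean and standard
    # deviation of the objective at the candidate points Xstar.
    K = rbf_kernel(X, X, ell) + noise * np.eye(X.size)
    Ks = rbf_kernel(X, Xstar, ell)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks)
    mu = Ks.T @ alpha
    var = 1.0 - np.sum(v ** 2, axis=0)  # k(x, x) = 1 for this kernel
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    # EI for minimization: expected reduction below the best objective
    # value observed so far.
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def bayes_opt_mcmc_params(evaluate_objective, grid, n_init=3, n_iter=15, seed=0):
    # evaluate_objective(theta) is a hypothetical, expensive call that
    # runs a pilot chain with proposal parameter theta and returns the
    # ACF-area objective; grid is a 1-D array of candidate settings.
    rng = np.random.default_rng(seed)
    X = rng.choice(grid, size=n_init, replace=False)
    y = np.array([evaluate_objective(t) for t in X])
    for _ in range(n_iter):
        mu, sigma = gp_posterior(X, y, grid)
        theta = grid[np.argmax(expected_improvement(mu, sigma, y.min()))]
        X = np.append(X, theta)
        y = np.append(y, evaluate_objective(theta))
    return X, y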
Another important property of Bayesian-optimized MCMC is that it does not commit to a single setting of the proposal parameters, but rather to a distribution over parameter settings, with probabilities estimated during the adaptation process. Indeed, we find that such a randomized policy over a set of parameter settings mixes faster than any single fixed parameter value for the models considered in this paper.
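For illustration, one simple way to realize a randomized policy of this kind is to draw a setting from the evaluated candidates with probabilities derived from their estimated objective values. The softmax weighting and the temperature parameter below are hypothetical choices, not the weighting used in this paper.

import numpy as np

def sample_proposal_setting(thetas, scores, temperature=1.0, rng=None):
    # thetas: parameter settings evaluated during adaptation.
    # scores: their estimated objective values (lower = better mixing).
    # A softmax over negated scores favors well-mixing settings while
    # retaining randomization across the candidate set.
    rng = np.random.default_rng() if rng is None else rng
    s = np.asarray(scores, dtype=float)
    w = np.exp(-(s - s.min()) / temperature)
    return thetas[rng.choice(len(thetas), p=w / w.sum())]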
Bayesian optimization has been used with MCMC in Rasmussen (2003) with the intent of approximating the posterior with a surrogate function to minimize the cost of hybrid Monte Carlo evaluations. The intent in this paper is instead to adapt the parameters of the Markov chain to improve mixing.
To demonstrate MCMC adaptation with Bayesian optimization, we study the problem of adapting a sampler for constrained discrete state spaces proposed recently by Hamze & de Freitas (2010). The sampler augments the state space in order to make large moves in the discrete state space. In this sense, it is similar to Hamiltonian (hybrid) Monte Carlo for continuous state spaces (Duane et al., 1987; Neal, 2010). Although these samplers typically have only two parameters, these are very tricky to tune even by experts. Moreover