GPU-accelerated Gibbs sampling: a case study of the Horseshoe Probit model
📝 Abstract
Gibbs sampling is a widely used Markov chain Monte Carlo (MCMC) method for numerically approximating integrals of interest in Bayesian statistics and other mathematical sciences. Many implementations of MCMC methods do not extend easily to parallel computing environments, as their inherently sequential nature incurs a large synchronization cost. In the case study illustrated by this paper, we show how to do Gibbs sampling in a fully data-parallel manner on a graphics processing unit, for a large class of exchangeable models that admit latent variable representations. Our approach takes a systems perspective, with emphasis placed on efficient use of compute hardware. We demonstrate our method on a Horseshoe Probit regression model and find that our implementation scales effectively to thousands of predictors and millions of data points simultaneously.
📄 Content
Statistics and Computing, https://doi.org/10.1007/s11222-018-9809-3

Alexander Terenin (Statistics Section, Department of Mathematics, Imperial College London, London, UK; a.terenin17@imperial.ac.uk) · Shawfeng Dong (shaw@ucsc.edu) · David Draper (draper@ucsc.edu) (Applied Mathematics and Statistics, University of California, Santa Cruz, Santa Cruz, CA, USA)

Received: 31 July 2017 / Accepted: 7 March 2018. © The Author(s) 2018

Keywords: Bayesian generalized linear models · Big data · Graphics processing units · High-dimensional statistical modeling · Markov chain Monte Carlo · Parallel computing

1 Introduction

The Bayesian statistical paradigm has a variety of desirable properties. It accounts for the uncertainty inherent in statistical inference by producing a posterior distribution, which fundamentally contains more information about the unknown quantities of interest than a point estimate. It also propagates this uncertainty to predictive distributions and thus does not overfit in the way that paradigms producing only point estimates may. Unfortunately, the computational methods required to produce a posterior distribution tend to be expensive. In particular, Markov chain Monte Carlo (MCMC) methods (Metropolis et al. 1953; Hastings 1970; Geman and Geman 1984), the cornerstone of modern Bayesian computation, often do not scale well either with data set size or with model complexity.

In this paper, we present a case study of a way to implement MCMC for a large class of Bayesian models that admit exchangeable likelihoods with latent variable representations. We do so by performing the computations on a graphics processing unit (GPU), a widely available parallel processor originally designed for 3D video use cases but well suited to a variety of other tasks. In the sections that follow, we describe GPUs, characterize models in which this approach is usable, and demonstrate the method on a Horseshoe Probit model with N = 1,000,000 and p = 1000. Standard computation with such N may easily take O(days); the method we describe runs in O(minutes).

This approach requires no new mathematical theory; instead, we consider the systems perspective. MCMC is widely thought to be inherently sequential and unsuited to parallel environments. Furthermore, many practitioners are not aware that substantially different approaches can be needed to parallelize algorithms for use on GPUs, rather than computer clusters, due to issues such as warp divergence that will be described later. For a comparison with an approach well suited to compute clusters, see Terenin et al. (2016). Our contribution here is to demonstrate that Gibbs sampling on GPUs is feasible for a generic class of models, and to present the ideas a practitioner would need to consider in implementing GPU Gibbs sampling, in the context of a Horseshoe Probit regression model.
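To make the latent-variable structure concrete, the following is a minimal NumPy sketch (not the paper's implementation; the variable names, data sizes, and the rejection sampler are our own simplifications) of the Gibbs update for the latent variables in a Probit model, via the well-known Albert–Chib augmentation. Because each latent draw depends only on its own data row, all N draws are conditionally independent, which is exactly the structure that makes the update data-parallel on a GPU.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data; the sizes and the scaling of beta are illustrative only.
N, p = 200, 5
X = rng.standard_normal((N, p))
beta = 0.5 * rng.standard_normal(p)
y = (X @ beta + rng.standard_normal(N) > 0).astype(int)

def sample_latents(X, y, beta, rng):
    """One Gibbs update of the Probit latent variables (Albert-Chib):
    z_i ~ N(x_i' beta, 1) truncated to (0, inf) if y_i = 1,
    and to (-inf, 0) if y_i = 0.

    Each z_i depends only on row i of the data, so all N draws are
    conditionally independent given beta -- the source of the
    data parallelism exploited on a GPU.
    """
    mu = X @ beta
    # Simple vectorized rejection sampler for the truncated normals;
    # a per-thread GPU kernel would typically use an inverse-CDF draw.
    z = rng.normal(mu)
    wrong_side = (z > 0) != (y == 1)
    while wrong_side.any():
        z[wrong_side] = rng.normal(mu[wrong_side])
        wrong_side = (z > 0) != (y == 1)
    return z

z = sample_latents(X, y, beta, rng)
```

Conditioned on z, the update for beta reduces to a draw from a Gaussian linear model, which is where a prior such as the Horseshoe enters; that step involves linear algebra (matrix products and a system solve) that also maps naturally onto GPU hardware.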
2 Previous work

There are a number of approaches for using GPUs to accelerate computation in Bayesian inference; some are completely model specific, others are generic. The following overview is brief, emphatically not exhaustive, and presented in no particular order.

– Bayesian mixture models. Suchard et al. (2010) review and describe GPUs, and outline a method for performing the calculations needed to fit Bayesian mixture models with MCMC on a GPU.
– Hamiltonian Monte Carlo. Beam et al. (2015) outline a method for fitting a Bayesian multinomial logistic regression model on a GPU with Hamiltonian Monte Carlo.
– Parallel tempering. Mingas and Bouganis (2012) describe a method for sampling from multimodal distributions with parallel tempering, using hardware acceleration in the form of a field-programmable gate array (FPGA), and compare their method with GPU implementations.
– Sequential Monte Carlo. Lee et al. (2010) review the architecture, programming model, and performance of GPUs, and describe methods for running importance-sampling-based algorithms such as sequential Monte Carlo on GPUs.
– Sta