MCMC Methods for Functions: Modifying Old Algorithms to Make Them Faster
Many problems arising in applications result in the need to probe a probability distribution for functions. Examples include Bayesian nonparametric statistics and conditioned diffusion processes. Standard MCMC algorithms typically become arbitrarily slow under the mesh refinement dictated by nonparametric description of the unknown function. We describe an approach to modifying a whole range of MCMC methods, applicable whenever the target measure has density with respect to a Gaussian process or Gaussian random field reference measure, which ensures that their speed of convergence is robust under mesh refinement. Gaussian processes or random fields are fields whose marginal distributions, when evaluated at any finite set of $N$ points, are $\mathbb{R}^N$-valued Gaussians. The algorithmic approach that we describe is applicable not only when the desired probability measure has density with respect to a Gaussian process or Gaussian random field reference measure, but also to some useful non-Gaussian reference measures constructed through random truncation. In the applications of interest the data is often sparse and the prior specification is an essential part of the overall modelling strategy. These Gaussian-based reference measures are a very flexible modelling tool, finding wide-ranging application. Examples are shown in density estimation, data assimilation in fluid mechanics, subsurface geophysics and image registration. The key design principle is to formulate the MCMC method so that it is, in principle, applicable for functions; this may be achieved by use of proposals based on carefully chosen time-discretizations of stochastic dynamical systems which exactly preserve the Gaussian reference measure. Taking this approach leads to many new algorithms which can be implemented via minor modification of existing algorithms, yet which show enormous speed-up on a wide range of applied problems.
💡 Research Summary
The paper addresses a fundamental bottleneck in Markov chain Monte Carlo (MCMC) methods when the target distribution lives on a function space, a situation that arises in Bayesian non‑parametrics, conditioned diffusion processes, and many high‑dimensional inverse problems. Traditional MCMC algorithms—random‑walk Metropolis, Metropolis‑adjusted Langevin (MALA), Hamiltonian Monte Carlo (HMC), etc.—are usually designed for finite‑dimensional Euclidean spaces. When a function is discretised on a mesh, the dimension grows with mesh refinement, and the proposal scales become mismatched with the target, causing acceptance probabilities to collapse and convergence to become arbitrarily slow.
The authors propose a unifying framework that makes a whole class of MCMC algorithms “mesh‑independent” by exploiting the fact that many Bayesian models can be written as a density with respect to a Gaussian process (or Gaussian random field) reference measure μ₀:
π(df) ∝ L(f) · μ₀(df),
where L(f) is the likelihood. The key design principle is to construct proposal kernels that exactly preserve μ₀. This is achieved by discretising stochastic dynamical systems whose invariant measure is μ₀. The simplest instance is the preconditioned Crank–Nicolson (pCN) proposal:
v = √(1‑β²) u + β ξ, ξ ∼ μ₀,
with β∈(0,1) a step‑size parameter. Because this proposal kernel preserves μ₀, the Metropolis acceptance ratio reduces to the likelihood ratio L(v)/L(u) alone, and the acceptance probability does not degenerate as the mesh is refined.
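The pCN Metropolis step described above can be sketched in a few lines. This is a minimal illustration, not the paper's reference implementation; the function and argument names (`pcn_mcmc`, `log_likelihood`, `sample_prior`) are chosen here for clarity:

```python
import numpy as np

def pcn_mcmc(log_likelihood, sample_prior, u0, beta=0.2, n_iters=1000, rng=None):
    """Sketch of a preconditioned Crank-Nicolson (pCN) Metropolis sampler.

    log_likelihood : callable returning log L(u) for a state u
    sample_prior   : callable returning a draw xi ~ mu_0
    """
    rng = rng or np.random.default_rng()
    u = np.asarray(u0, dtype=float)
    ll_u = log_likelihood(u)
    samples = []
    for _ in range(n_iters):
        # pCN proposal: v = sqrt(1 - beta^2) u + beta xi, which preserves mu_0
        v = np.sqrt(1.0 - beta**2) * u + beta * sample_prior()
        ll_v = log_likelihood(v)
        # Accept/reject using the likelihood ratio only; mu_0 cancels exactly
        if np.log(rng.uniform()) < ll_v - ll_u:
            u, ll_u = v, ll_v
        samples.append(u.copy())
    return np.array(samples)
```

In a discretised setting, `u` is the vector of function values on the mesh and `sample_prior` draws from the discretised Gaussian reference measure; the accept rule itself never references the mesh dimension.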
Building on this idea, the paper shows how to plug the pCN kernel into existing algorithms:
- pCN‑Random Walk Metropolis replaces the ordinary Gaussian random‑walk proposal with the μ₀‑preserving pCN step.
- pCN‑MALA retains the gradient drift from the likelihood but uses the pCN diffusion term, guaranteeing μ₀‑invariance while still exploiting local geometry.
- pCN‑HMC constructs a Hamiltonian system whose kinetic energy is defined by the covariance operator of μ₀; the leap‑frog integrator is then a symmetrised pCN update, preserving μ₀ exactly.
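As an illustration of the gradient-based variant, one step of a pCN-MALA (Crank-Nicolson Langevin) proposal can be sketched as below. This shows the proposal only; the full Metropolis correction for this kernel involves additional drift terms beyond the plain likelihood ratio. The helper names `grad_phi` (returning the prior-preconditioned gradient C·DΦ(u) of the negative log-likelihood Φ) and `sample_prior` are assumptions for illustration:

```python
import numpy as np

def pcn_mala_proposal(u, grad_phi, sample_prior, delta=0.1):
    """One Crank-Nicolson Langevin (pCN-MALA) proposal (illustrative sketch).

    Implements, for step size delta > 0 and xi ~ mu_0 = N(0, C):
        (2 + delta) v = (2 - delta) u - 2 delta * C DPhi(u) + sqrt(8 delta) xi
    so that delta -> 0 recovers v = u, and the xi term alone preserves mu_0.
    """
    xi = sample_prior()
    return ((2.0 - delta) * u - 2.0 * delta * grad_phi(u)
            + np.sqrt(8.0 * delta) * xi) / (2.0 + delta)
```

The design choice mirrors pCN: the diffusion term is discretised so that the Gaussian reference measure is preserved exactly, while the likelihood gradient supplies local geometry.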
The authors also extend the framework to certain non‑Gaussian reference measures constructed via random truncation of an infinite series. By treating the truncation level as an auxiliary random variable and applying pCN conditionally, the same mesh‑independent properties are retained for priors such as sparse Bayesian models or Lévy‑process based constructions.
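The random-truncation idea can be sketched as follows. Here the basis (sines on [0, 1]), the geometric prior on the truncation level, and the coefficient decay are all illustrative assumptions, not the paper's exact construction; the point is that the truncation level N is itself a random variable that an MCMC scheme can treat as auxiliary:

```python
import numpy as np

def sample_truncated_prior(x, max_terms=64, p_continue=0.9, rng=None):
    """Draw from a randomly truncated series prior (illustrative sketch).

    Assumes a sine basis on [0, 1], a geometric prior on the truncation
    level N, and i.i.d. Gaussian coefficients with decaying variance.
    """
    rng = rng or np.random.default_rng()
    # Random truncation level: geometric, capped for practicality
    n = int(min(rng.geometric(1.0 - p_continue), max_terms))
    u = np.zeros_like(x, dtype=float)
    for k in range(1, n + 1):
        xi_k = rng.standard_normal() / k  # decaying coefficient scale
        u += xi_k * np.sin(k * np.pi * x)
    return u, n
```

Conditional on N, the coefficient vector is Gaussian, so pCN-type updates can be applied to the coefficients while N is updated by a separate move, retaining the mesh-independence described above.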
Theoretical contributions include proofs that the resulting chains are reversible with respect to π, possess a spectral gap that does not shrink with mesh refinement, and satisfy geometric ergodicity under mild conditions on β. Consequently, effective sample size (ESS) per unit computational effort remains stable as the discretisation becomes finer.
A suite of experiments demonstrates the practical impact. In Bayesian density estimation, pCN‑MALA achieves an ESS roughly 15 times larger than standard MALA. In fluid‑mechanics data assimilation (Navier‑Stokes initial‑condition inference) on a 64×64 grid, pCN‑HMC maintains an acceptance rate around 0.35, whereas conventional HMC drops below 0.05. Subsurface geophysics inversion and image registration problems also show order‑of‑magnitude speed‑ups. Across all cases, the algorithms exhibit the advertised mesh‑independence: refining the mesh by factors of two or four does not degrade convergence.
In conclusion, the paper provides a principled, easy‑to‑implement modification that turns many classical MCMC schemes into robust, infinite‑dimensional samplers. By anchoring proposals to a Gaussian reference measure that is exactly preserved, the authors eliminate the curse of dimensionality associated with mesh refinement. This work opens the door to scalable Bayesian inference for functional parameters across a broad spectrum of scientific and engineering applications, and suggests future directions such as adaptive step‑size selection, extensions to more general non‑Gaussian priors, and distributed implementations.