Proposals which speed up function-space MCMC
Inverse problems lend themselves naturally to a Bayesian formulation, in which the quantity of interest is a posterior distribution of state and/or parameters given some uncertain observations. In the common case in which the forward operator is smoothing, the inverse problem is ill-posed. Well-posedness is imposed via regularisation in the form of a prior, which is often Gaussian. Under quite general conditions, it can be shown that the posterior is absolutely continuous with respect to the prior and may be well-defined on function space in terms of its density with respect to the prior. In this case, by constructing a proposal for which the prior is invariant, one can define Metropolis-Hastings schemes for MCMC which are well-defined on function space, and hence do not degenerate as the dimension of the underlying quantity of interest increases to infinity, e.g. under mesh refinement when approximating PDEs in finite dimensions. In practice, however, despite the attractive theoretical properties of the currently available schemes, they may still suffer from long correlation times, particularly if the data are very informative about some of the unknown parameters. Indeed, in this case it may be the directions of the posterior which coincide with the (already known) prior that decorrelate the slowest. The information incorporated into the posterior through the data is often contained within some finite-dimensional subspace, in an appropriate basis, perhaps even one defined by eigenfunctions of the prior. We aim to exploit this fact and improve the mixing time of function-space MCMC by careful rescaling of the proposal. To this end, we introduce two new basic methods of increasing complexity, involving (i) characteristic function truncation of high frequencies and (ii) Hessian information to interpolate between low and high frequencies.
💡 Research Summary
The paper addresses a fundamental challenge in Bayesian inverse problems where the forward operator is smoothing, leading to ill‑posedness that is regularised by a Gaussian prior. Under broad conditions the posterior is absolutely continuous with respect to the prior and can be defined on function space via its density relative to the prior. This setting enables the construction of Metropolis‑Hastings proposals that leave the prior invariant, such as the preconditioned Crank–Nicolson (pCN) algorithm. These proposals are dimension‑independent: they remain well‑defined as the discretisation is refined and the underlying parameter space approaches infinite dimension.
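The prior-invariance property of pCN can be made concrete in a short sketch. Working in a (hypothetical) eigenbasis of the prior, the proposal is v = √(1 − β²) u + β ξ with ξ drawn from the prior, so the prior terms cancel and the acceptance ratio involves only the likelihood. The prior eigenvalues and the Gaussian likelihood below are illustrative stand-ins, not the paper's examples.

```python
import numpy as np

rng = np.random.default_rng(0)

def pcn_step(u, log_lik, prior_std, beta=0.2):
    """One preconditioned Crank-Nicolson (pCN) step.

    The proposal v = sqrt(1 - beta^2) * u + beta * xi, with xi ~ N(0, C),
    leaves the Gaussian prior N(0, C) invariant, so the Metropolis
    acceptance probability involves only the likelihood ratio.
    """
    xi = prior_std * rng.standard_normal(u.shape)   # draw from the prior
    v = np.sqrt(1.0 - beta**2) * u + beta * xi      # pCN proposal
    log_alpha = log_lik(v) - log_lik(u)             # prior terms cancel exactly
    if np.log(rng.uniform()) < log_alpha:
        return v, True
    return u, False

# Illustrative setup: prior eigenvalues lambda_k = k^{-2}; the data are
# informative about mode 0 only (a hypothetical likelihood, not the paper's).
prior_std = np.array([1.0 / k for k in range(1, 6)])
log_lik = lambda u: -0.5 * (u[0] - 1.0) ** 2 / 0.01
u = prior_std * rng.standard_normal(5)
accepted = 0
for _ in range(2000):
    u, acc = pcn_step(u, log_lik, prior_std)
    accepted += acc
```

Because every mode is perturbed with the same β, the chain takes equally small steps in the data-dominated and prior-dominated directions, which is exactly the inefficiency the paper targets.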
Despite this attractive theoretical property, practical implementations often suffer from long autocorrelation times when the observed data are highly informative about certain directions in the parameter space. In such cases the posterior concentrates in a low‑dimensional subspace, while the remaining high‑frequency components are essentially governed by the prior. The slowest‑mixing directions tend to be those that align with the prior, because the standard pCN proposal perturbs all modes equally and does not exploit the fact that the data only affect a finite number of modes.
The authors propose two complementary strategies to accelerate mixing by deliberately rescaling the proposal in a frequency‑dependent manner.
- Characteristic‑function truncation – The idea is to split the parameter space into a low‑frequency subspace where the data have a strong impact and a high‑frequency complement where the posterior is essentially the prior. In the low‑frequency subspace the usual pCN move is retained, allowing the chain to explore the data‑driven posterior structure. In the high‑frequency subspace the proposal is truncated: the update is drawn from the prior’s Gaussian distribution with variance scaled by the prior eigenvalues, effectively freezing these modes to their prior behaviour. This truncation yields a proposal whose covariance matches the prior in the high‑frequency regime, guaranteeing that the infinite‑dimensional limit remains well‑posed while dramatically reducing unnecessary random walk behaviour in directions that do not affect the likelihood.
- Hessian‑based interpolation – Here the authors incorporate curvature information from the posterior (the Hessian of the negative log‑likelihood) to construct a smooth interpolation of the scaling between low and high frequencies. They approximate the posterior Hessian in the basis of the prior eigenfunctions, obtaining eigenvalues that reflect how strongly the data constrain each mode. The proposal variance for mode (k) is then set to a convex combination of the prior variance and the inverse Hessian eigenvalue, with the weighting depending on the frequency index. Low‑frequency modes receive a larger contribution from the Hessian (larger steps where the posterior is narrow), while high‑frequency modes revert to the prior variance. This approach preserves detailed balance because the scaling matrix remains symmetric and positive‑definite, and the Metropolis acceptance probability is adjusted accordingly.
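Both strategies can be phrased as a mode-wise step size β_k: coordinate-wise pCN with v_k = √(1 − β_k²) u_k + β_k ξ_k still leaves the prior invariant, so the acceptance ratio stays likelihood-only. The sketch below builds β_k for both variants; the cutoff K, the curvature eigenvalues, and the interpolation weights are all illustrative choices, not the paper's exact formulas.

```python
import numpy as np

rng = np.random.default_rng(1)

def pcn_step_rescaled(u, log_lik, prior_var, beta_k):
    """pCN step with a mode-wise step size beta_k in [0, 1].

    Each coordinate v_k = sqrt(1 - beta_k^2) u_k + beta_k xi_k with
    xi_k ~ N(0, lambda_k) preserves the Gaussian prior, so the
    acceptance ratio still involves only the likelihood.
    """
    xi = np.sqrt(prior_var) * rng.standard_normal(u.shape)
    v = np.sqrt(1.0 - beta_k**2) * u + beta_k * xi
    if np.log(rng.uniform()) < log_lik(v) - log_lik(u):
        return v
    return u

k = np.arange(1, 21)
prior_var = k ** -2.0                     # prior eigenvalues lambda_k (illustrative)

# (i) Characteristic-function truncation: the usual step size beta on the
# data-informed low frequencies; beta_k = 1 above the cutoff K, so high
# modes are redrawn exactly from the prior (their approximate posterior).
K, beta = 5, 0.2
beta_trunc = np.where(k <= K, beta, 1.0)

# (ii) Hessian-based interpolation: target proposal variance is a convex
# combination of the (regularised) inverse curvature and the prior
# variance, with a frequency-dependent weight w_k (all choices here are
# hypothetical stand-ins for the paper's construction).
hess_eig = 100.0 * np.exp(-k.astype(float))   # curvature of the negative log-likelihood
w = np.clip((k - 1) / K, 0.0, 1.0)            # weight -> 1 at high frequency
target_var = (1.0 - w) / (hess_eig + 1.0 / prior_var) + w * prior_var
beta_hess = np.minimum(1.0, np.sqrt(target_var / prior_var))

# Toy run with the truncated proposal and an illustrative likelihood.
log_lik = lambda u: -0.5 * np.sum((u[:K] - 1.0) ** 2) / 0.01
u = np.sqrt(prior_var) * rng.standard_normal(20)
for _ in range(100):
    u = pcn_step_rescaled(u, log_lik, prior_var, beta_trunc)
```

Note how β_k → 1 at high frequency in both variants: the proposal covariance there matches the prior, which is what keeps the scheme well-defined in the infinite-dimensional limit.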
Both methods retain the key property that the proposal is invariant with respect to the prior, ensuring that the algorithm does not degenerate as the mesh is refined. The authors prove that the resulting Markov kernels are reversible, ergodic, and possess a dimension‑independent spectral gap under the same assumptions that guarantee absolute continuity of the posterior.
The empirical evaluation focuses on two challenging test cases: (i) a nonlinear elliptic PDE inverse problem with thousands of discretised parameters, and (ii) a high‑resolution image deblurring problem where the forward model is a convolution with a known point‑spread function. In both settings the data strongly constrain a handful of low‑frequency components (e.g., the first few Fourier or Karhunen–Loève modes). Compared with standard pCN, the characteristic‑function truncation yields a 5‑ to 7‑fold increase in effective sample size (ESS) per unit computational time, while the Hessian‑interpolated proposal achieves up to a tenfold gain when accurate Hessian approximations are available. Autocorrelation functions decay markedly faster, and trace plots demonstrate rapid exploration of the posterior modes that were previously bottlenecks.
The paper also discusses computational considerations. Computing the full Hessian is infeasible in very high dimensions, so the authors suggest low‑rank approximations (e.g., using a Lanczos process) or stochastic estimates based on adjoint methods. The truncation approach requires only the prior eigenbasis, which is often analytically available for Gaussian priors defined via differential operators.
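A low-rank curvature estimate of this kind can be extracted matrix-free: only Hessian-vector products are needed, which is what adjoint methods provide. A minimal sketch using SciPy's Lanczos-based `eigsh`, with a synthetic low-rank operator standing in for the real Hessian (the rank, spectrum, and problem size below are invented for illustration):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

n = 2000
# Synthetic stand-in for the Hessian of the negative log-likelihood:
# exactly rank 10, with rapidly decaying eigenvalues. In practice hvp(v)
# would be computed via adjoint solves, never by forming H explicitly.
U = np.linalg.qr(np.random.default_rng(2).standard_normal((n, 10)))[0]
d = np.geomspace(100.0, 1.0, 10)          # decaying curvature spectrum

def hvp(v):
    """Hessian-vector product H v = U diag(d) U^T v, without forming H."""
    v = np.ravel(v)
    return U @ (d * (U.T @ v))

H = LinearOperator((n, n), matvec=hvp, dtype=np.float64)
vals, vecs = eigsh(H, k=5, which='LM')    # 5 dominant eigenpairs via Lanczos
```

The dominant eigenpairs returned here are exactly the quantities the frequency-dependent scaling needs: they identify the few directions in which the data meaningfully constrain the posterior.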
In terms of limitations, the Hessian‑based method relies on a reasonably accurate curvature estimate; poor approximations can lead to suboptimal scaling or even reduced acceptance rates. Moreover, the current analysis assumes a Gaussian prior; extending the framework to non‑Gaussian or hierarchical priors would require additional theoretical work.
Overall, the contribution is twofold: (a) a clear identification of the mismatch between data‑informative low‑dimensional subspaces and the uniformly scaled proposals traditionally used in function‑space MCMC, and (b) concrete, theoretically justified mechanisms to rescale proposals in a frequency‑dependent way, thereby achieving substantial practical speed‑ups without sacrificing the dimension‑independent guarantees that make function‑space MCMC attractive. This work bridges the gap between elegant infinite‑dimensional theory and the demanding computational realities of large‑scale Bayesian inverse problems.