On the Computational Complexity of MCMC-based Estimators in Large Samples
In this paper we examine the implications of the statistical large sample theory for the computational complexity of Bayesian and quasi-Bayesian estimation carried out using Metropolis random walks. Our analysis is motivated by the Laplace-Bernstein-Von Mises central limit theorem, which states that in large samples the posterior or quasi-posterior approaches a normal density. Using the conditions required for the central limit theorem to hold, we establish polynomial bounds on the computational complexity of general Metropolis random walks methods in large samples. Our analysis covers cases where the underlying log-likelihood or extremum criterion function is possibly non-concave, discontinuous, and with increasing parameter dimension. However, the central limit theorem restricts the deviations from continuity and log-concavity of the log-likelihood or extremum criterion function in a very specific manner. Under minimal assumptions required for the central limit theorem to hold under the increasing parameter dimension, we show that the Metropolis algorithm is theoretically efficient even for the canonical Gaussian walk which is studied in detail. Specifically, we show that the running time of the algorithm in large samples is bounded in probability by a polynomial in the parameter dimension $d$, and, in particular, is of stochastic order $d^2$ in the leading cases after the burn-in period. We then give applications to exponential families, curved exponential families, and Z-estimation of increasing dimension.
💡 Research Summary
The paper investigates how large‑sample statistical theory influences the computational burden of Bayesian and quasi‑Bayesian estimators that rely on Metropolis random‑walk Markov chain Monte Carlo (MCMC) methods. The authors anchor their analysis in the Laplace‑Bernstein‑von Mises (LBV) central limit theorem, which asserts that, as the sample size $n$ grows, the posterior (or quasi‑posterior) distribution converges to a multivariate normal distribution centered at an efficient estimator (such as the maximum likelihood estimator, which itself converges to the true parameter $\theta_{0}$) with covariance equal to the inverse Fisher information matrix scaled by $n^{-1}$. By explicitly stating the minimal regularity conditions required for the LBV theorem to hold when the parameter dimension $d$ may increase with $n$, the authors create a bridge between asymptotic normality and algorithmic performance.
The methodological core focuses on the simplest Metropolis–Hastings scheme: a Gaussian random‑walk proposal $q(\theta' \mid \theta)=\mathcal N(\theta, \sigma^{2}I_{d})$. The step size $\sigma$ is calibrated as a function of $n$ and $d$ (typically $\sigma^{2}=c\,n^{-1}$) so that proposals remain local relative to the high‑probability region of the posterior. Under the LBV assumptions, the log‑likelihood (or extremum criterion) can be locally approximated by a quadratic form whose curvature is governed by the information matrix. This quadratic approximation yields a lower bound on the Metropolis acceptance probability that does not deteriorate faster than a polynomial in $d$. Consequently, the mixing time of the chain, measured after an initial burn‑in, scales as $O(d^{2})$ with high probability. In probabilistic notation, the running time $T_{n,d}$ satisfies $T_{n,d}=O_{p}(d^{2})$; that is, for any $\varepsilon>0$ there exists a constant $C_{\varepsilon}$ such that $\Pr(T_{n,d}\le C_{\varepsilon}d^{2})\ge 1-\varepsilon$ for sufficiently large $n$.
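The scheme described above can be sketched in a few lines. The following is a minimal illustration, not code from the paper: the function name `rw_metropolis` and its interface are hypothetical, and the step size is left to the caller (the paper's calibration would take it proportional to $n^{-1/2}$).

```python
import numpy as np

def rw_metropolis(log_post, theta0, n_steps, sigma, rng=None):
    """Gaussian random-walk Metropolis sampler (illustrative sketch).

    log_post : callable, log posterior density up to an additive constant
    theta0   : starting point, shape (d,)
    sigma    : proposal standard deviation, e.g. sqrt(c / n) in the
               paper's calibration
    Returns the chain (n_steps, d) and the empirical acceptance rate.
    """
    rng = np.random.default_rng(rng)
    theta = np.asarray(theta0, dtype=float)
    d = theta.size
    lp = log_post(theta)
    chain = np.empty((n_steps, d))
    accepted = 0
    for t in range(n_steps):
        # Propose from N(theta, sigma^2 I_d)
        prop = theta + sigma * rng.standard_normal(d)
        lp_prop = log_post(prop)
        # Accept with probability min(1, pi(prop) / pi(theta))
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
            accepted += 1
        chain[t] = theta
    return chain, accepted / n_steps
```

On a target close to its normal LBV limit, e.g. `log_post = lambda th: -0.5 * np.sum(th**2)`, the chain explores the high-probability region with a non-vanishing acceptance rate, which is the behaviour the quadratic-approximation argument formalizes.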
A notable contribution is that the analysis does not require global log‑concavity or smoothness of the objective function. The LBV theorem imposes only “local” regularity: the log‑likelihood must be twice differentiable in a shrinking neighbourhood of $\theta_{0}$, and its higher‑order derivatives must be sufficiently small relative to $n$ and $d$. Hence, even if the likelihood is non‑concave, discontinuous, or exhibits irregular behaviour away from the mode, the Metropolis algorithm remains theoretically efficient as long as the posterior mass concentrates in a region where the quadratic approximation holds. This insight broadens the applicability of the result to models where traditional MCMC convergence guarantees are unavailable.
The authors illustrate the theory with three classes of models. First, standard exponential families satisfy all LBV conditions automatically; the resulting complexity bound confirms the well‑known efficiency of random‑walk Metropolis in these settings. Second, curved exponential families, where the natural parameter is a smooth, possibly nonlinear transformation of a lower‑dimensional vector, also meet the conditions provided the transformation is sufficiently smooth and the dimension grows slowly. Third, Z‑estimation problems (including generalized method of moments) are treated by viewing the estimating equations as a quasi‑likelihood; when the estimating functions satisfy a central‑limit‑type condition, the same $O(d^{2})$ bound follows. In each case, the paper shows how the abstract polynomial bound translates into concrete guidance on step‑size selection, burn‑in length, and the number of draws required for a desired Monte Carlo error.
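The exponential-family case can be made concrete with a textbook example not taken from the paper: for Poisson data with a flat prior, the exact posterior is a Gamma distribution, and at moderate $n$ its mean and standard deviation already match the LBV normal approximation $\mathcal N(\hat\theta, (n I(\hat\theta))^{-1})$. All numbers and names below are illustrative.

```python
import numpy as np

# Poisson(theta) model with flat prior: the likelihood is proportional to
# theta^S * exp(-n * theta) with S = sum of observations, so the exact
# posterior is Gamma(shape = S + 1, rate = n).
rng = np.random.default_rng(0)
theta_true, n = 3.0, 2000
x = rng.poisson(theta_true, size=n)
S = x.sum()

theta_hat = S / n                 # MLE
# Fisher information for Poisson is I(theta) = 1/theta, so the LBV
# approximation has standard deviation sqrt(theta_hat / n).
sd_lbv = np.sqrt(theta_hat / n)

# Exact Gamma(S + 1, n) posterior moments.
post_mean = (S + 1) / n
post_sd = np.sqrt(S + 1) / n
```

Here `post_mean` differs from `theta_hat` by only $1/n$ and `post_sd / sd_lbv` is $\sqrt{(S+1)/S} \approx 1$, which is the concentration the complexity analysis exploits.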
From a practical standpoint, the result offers a rigorous justification for using simple Gaussian random‑walk Metropolis in high‑dimensional, large‑sample Bayesian analyses, even when the underlying objective is not globally well‑behaved. The polynomial bound implies that, as long as the dimension does not grow faster than a sub‑linear function of the sample size, the computational cost grows only quadratically with $d$, which is manageable for many modern applications (e.g., hierarchical models, latent variable models, and high‑dimensional generalized linear models). Moreover, the analysis provides a template for extending the complexity results to more sophisticated proposals (e.g., Langevin or Hamiltonian dynamics) by comparing their local acceptance behaviour to the Gaussian baseline.
In summary, the paper establishes that under the minimal regularity needed for the Laplace‑Bernstein‑von Mises theorem, Metropolis random‑walk MCMC algorithms enjoy a provably polynomial computational complexity in large samples. The leading‑order bound of $d^{2}$ after burn‑in holds for a wide range of models, including non‑concave and discontinuous likelihoods, thereby offering both theoretical insight and practical guidance for high‑dimensional Bayesian computation.