Robust adaptive Metropolis algorithm with coerced acceptance rate
The adaptive Metropolis (AM) algorithm of Haario, Saksman and Tamminen [Bernoulli 7 (2001) 223-242] uses the estimated covariance of the target distribution in the proposal distribution. This paper introduces a new robust adaptive Metropolis algorithm that estimates the shape of the target distribution and simultaneously coerces the acceptance rate. The adaptation rule is computationally simple, adding no extra cost compared with the AM algorithm. The adaptation strategy can be seen as a multidimensional extension of a previously proposed method that adapts the scale of the proposal distribution in order to attain a given acceptance rate. The empirical results show promising behaviour of the new algorithm in an example with a Student target distribution having no finite second moment, where the AM covariance estimate is unstable. In the examples with finite second moments, the performance of the new approach seems to be competitive with the AM algorithm combined with scale adaptation.
💡 Research Summary
The paper revisits the Adaptive Metropolis (AM) algorithm introduced by Haario, Saksman, and Tamminen (2001), which updates the proposal covariance matrix using the empirical covariance of the chain’s past states. While AM is widely used because it automatically learns the shape of the target distribution, its performance can deteriorate dramatically when the target has heavy tails or an infinite second moment. In such cases the empirical covariance may explode, leading to a proposal that is either too diffuse or too narrow, and the chain fails to converge. Existing remedies typically add a separate scale‑adaptation step that forces the acceptance rate toward a pre‑specified value, but this does not address the instability of the covariance estimate itself.
The authors propose a new “robust adaptive Metropolis” (RAM) algorithm that simultaneously adapts the shape of the proposal and coerces the acceptance rate. The key idea is to modify the covariance matrix Σₙ at each iteration by a scalar factor that depends on the discrepancy between the observed acceptance rate rₙ and a target rate r*. Concretely, the update rule is
Σₙ₊₁ = Σₙ × exp(γ (rₙ – r*)),
where γ > 0 is a small learning rate. If the chain is rejecting too many proposals (rₙ < r*), the exponent is negative, the factor is < 1, and the proposal covariance shrinks, so the chain makes smaller, more local jumps that are more likely to be accepted. Conversely, if the acceptance rate is too high, the factor enlarges the covariance, encouraging more ambitious moves. Because the factor is a scalar, the computational cost is negligible: the same O(d²) operations required for the standard AM covariance update are retained, and only one extra multiplication is needed. Hence the algorithm adds no asymptotic overhead, even in high‑dimensional settings.
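The scalar coercion rule can be sketched as a few lines around an ordinary random-walk Metropolis loop. This is a minimal illustration of the update described above, not the paper's full algorithm; the function and parameter names (`coerced_rw_metropolis`, `gamma`, `r_star`) are ours, and the per-iteration acceptance indicator stands in for rₙ:

```python
import numpy as np

def coerced_rw_metropolis(target_logpdf, x0, n_iter=5000,
                          r_star=0.234, gamma=0.05, seed=0):
    """Random-walk Metropolis with the scalar acceptance-rate coercion
    sketched above: after each step the proposal covariance is multiplied
    by exp(gamma * (accepted - r_star))."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.size
    cov = np.eye(d)                       # initial proposal covariance
    logp = target_logpdf(x)
    chain = np.empty((n_iter, d))
    n_accept = 0
    for i in range(n_iter):
        prop = rng.multivariate_normal(x, cov)
        logp_prop = target_logpdf(prop)
        accepted = np.log(rng.uniform()) < logp_prop - logp
        if accepted:
            x, logp = prop, logp_prop
            n_accept += 1
        # shrink the covariance after a rejection, enlarge it after an
        # acceptance; on average this pulls the acceptance rate to r_star
        cov = cov * np.exp(gamma * (float(accepted) - r_star))
        chain[i] = x
    return chain, n_accept / n_iter

# Usage: sample a 3-dimensional standard normal; the realised acceptance
# rate should settle near r_star = 0.234.
chain, rate = coerced_rw_metropolis(lambda x: -0.5 * np.sum(x ** 2),
                                    np.zeros(3))
```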
The paper provides a thorough empirical evaluation. The first test case uses a univariate Student‑t distribution with one degree of freedom, which lacks a finite second moment. Standard AM’s covariance estimate diverges, and the chain essentially wanders without stabilising. The RAM algorithm, by continuously pulling the acceptance rate toward the target (typically 0.234 for multivariate proposals), prevents the covariance from blowing up and yields a well‑behaved chain that explores the heavy‑tailed target efficiently.
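The instability that motivates this test case is easy to reproduce outside MCMC: for Cauchy draws (Student-t with one degree of freedom), the running second-moment estimate never settles. A minimal illustration, not the paper's experiment:

```python
import numpy as np

rng = np.random.default_rng(1)
samples = rng.standard_cauchy(100_000)   # Student-t, 1 df: no finite variance

# Running empirical second moment after n samples. For a Cauchy target this
# estimate keeps jumping by orders of magnitude as rare huge draws arrive,
# which is exactly what destabilises the AM covariance estimate.
running_m2 = np.cumsum(samples ** 2) / np.arange(1, samples.size + 1)
for n in (100, 1_000, 10_000, 100_000):
    print(n, running_m2[n - 1])
```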
Subsequent experiments involve targets with finite second moments: a multivariate normal distribution, a beta distribution, and a gamma distribution. For each, the authors compare three methods: (i) the original AM, (ii) AM combined with a separate scale‑adaptation scheme (AM+Scale), and (iii) the proposed RAM. Performance metrics include the average acceptance probability, effective sample size (ESS), and integrated autocorrelation time (IACT). Across these benchmarks, RAM matches or slightly outperforms AM+Scale. It maintains an acceptance rate very close to the prescribed target, achieves comparable ESS, and often exhibits lower IACT, indicating more efficient exploration. Notably, in the multivariate normal case RAM converges faster during the burn‑in phase because the scalar acceptance‑rate correction quickly stabilises the covariance estimate, whereas AM+Scale may require more iterations to tune both shape and scale separately.
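The diagnostics above are standard and easy to compute from a scalar chain. A minimal sketch of IACT and ESS estimation, using the common initial-positive-sequence truncation heuristic (not necessarily the exact estimator used in the paper):

```python
import numpy as np

def iact(x, max_lag=None):
    """Integrated autocorrelation time tau = 1 + 2 * sum_k rho_k,
    truncating the sum at the first non-positive autocorrelation."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    # Empirical autocorrelations rho_0..rho_{n-1} (rho_0 = 1).
    acf = np.correlate(xc, xc, mode="full")[n - 1:]
    acf = acf / (np.arange(n, 0, -1) * xc.var())
    tau = 1.0
    for k in range(1, max_lag or n // 2):
        if acf[k] <= 0:
            break
        tau += 2.0 * acf[k]
    return tau

def ess(x):
    """Effective sample size: chain length deflated by the IACT."""
    return np.asarray(x).size / iact(x)

# White noise has tau ~ 1; a persistent AR(1) chain with phi = 0.9 has a
# much larger tau (about (1 + phi) / (1 - phi) = 19) and hence a lower ESS.
rng = np.random.default_rng(0)
white = rng.standard_normal(10_000)
ar = np.empty(10_000)
ar[0] = 0.0
for t in range(1, ar.size):
    ar[t] = 0.9 * ar[t - 1] + white[t]
```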
The authors also discuss theoretical convergence. The scalar update can be interpreted as a Robbins–Monro stochastic approximation with step sizes γₙ that satisfy ∑γₙ = ∞ and ∑γₙ² < ∞. Under these conditions, the adaptive Markov chain retains the target distribution as its invariant measure, following the framework of Andrieu and Thoms (2008) for adaptive MCMC. Hence the RAM algorithm is provably ergodic, provided the usual regularity conditions (boundedness of the proposal density, diminishing adaptation) hold.
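A concrete step-size family meeting both conditions is γₙ = C·n^(−α) with 1/2 < α ≤ 1; the exponent 2/3 below is an illustrative choice, not necessarily the paper's recommendation. A quick numerical sanity check of the two series:

```python
import numpy as np

n = np.arange(1, 1_000_001, dtype=float)
gamma = n ** (-2.0 / 3.0)        # gamma_n = n^(-2/3), with 1/2 < 2/3 <= 1

partial = np.cumsum(gamma)
# sum gamma_n diverges: partial sums grow like 3 * n^(1/3), without bound.
print(partial[999], partial[999_999])
# sum gamma_n^2 = sum n^(-4/3) converges (exponent > 1), to about 3.6.
print(np.sum(gamma ** 2))
```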
In conclusion, the paper introduces a practically simple yet theoretically sound modification of AM that simultaneously learns the covariance structure and enforces a desired acceptance rate. The method is computationally cheap, robust to heavy‑tailed targets lacking finite variance, and competitive with existing scale‑adapted AM variants on standard problems. The authors suggest future work on automatic tuning of the learning rate γ, extensions to hierarchical Bayesian models with very high dimensionality, and integration with variational approximations for hybrid inference schemes.