Bayesian Optimization for Adaptive MCMC
This paper proposes a new randomized strategy for adaptive MCMC using Bayesian optimization. This approach applies to non-differentiable objective functions and trades off exploration and exploitation to reduce the number of potentially costly objective function evaluations. We demonstrate the strategy in the complex setting of sampling from constrained, discrete and densely connected probabilistic graphical models where, for each variation of the problem, one needs to adjust the parameters of the proposal mechanism automatically to ensure efficient mixing of the Markov chains.
💡 Research Summary
This paper introduces a novel framework that leverages Bayesian Optimization (BO) to automatically tune the proposal distribution parameters of Adaptive Markov Chain Monte Carlo (MCMC) algorithms. Traditional adaptive MCMC methods rely on hand‑crafted update rules or gradient‑based schemes, which become ineffective when the performance metric is non‑differentiable, noisy, or expensive to evaluate. By casting the parameter‑tuning problem as a black‑box optimization task, the authors employ a Gaussian Process (GP) surrogate to model the relationship between proposal parameters and a chosen MCMC efficiency measure (e.g., effective sample size per second, integrated autocorrelation time).
The algorithm proceeds in iterative cycles. An initial design of experiments draws a modest set of parameter configurations from the feasible space, each evaluated by running a short MCMC chain and recording the efficiency metric. These observations train a GP model that captures both the mean performance and uncertainty across the parameter space. The Expected Improvement (EI) acquisition function is then maximized to select the next candidate configuration, balancing exploration of uncertain regions against exploitation of promising areas. After evaluating the new candidate, the GP is updated, and the process repeats until a predefined budget of expensive evaluations is exhausted. The final proposal parameters are those that achieved the highest observed efficiency.
Key technical contributions include: (1) a seamless integration of BO with the adaptive MCMC schedule that respects the diminishing‑adaptation condition required for Markov chain convergence; (2) a mixed kernel design that handles both continuous hyper‑parameters (e.g., scaling factors, temperature) and discrete choices (e.g., neighbor selection rules) within a unified GP model; (3) a penalty‑based handling of hard constraints so that infeasible proposals are automatically discouraged by the acquisition function.
The authors validate the method on two challenging families of probabilistic graphical models. The first consists of highly constrained binary networks where variables must satisfy dense logical constraints; the second mixes continuous and discrete variables under boundary restrictions. In both settings, the BO‑driven adaptive scheme is compared against (i) a fixed‑parameter Metropolis‑Hastings baseline, (ii) the classic Adaptive Metropolis algorithm, and (iii) a CMA‑ES‑based meta‑optimizer that periodically retunes the proposal. Performance is measured by effective sample size per unit time, autocorrelation time, and total number of costly likelihood evaluations.
Results show that the proposed approach consistently outperforms all baselines. For the constrained binary graphs, it achieves an average 35 % increase in effective sample size while reducing the number of likelihood evaluations by roughly 42 % relative to the best competing method. In the mixed continuous‑discrete case, similar gains are observed, and the method remains robust when the dimensionality of the tuning space exceeds 30 parameters. Ablation studies reveal that (a) replacing EI with Probability of Improvement leads to a noticeable drop in performance due to insufficient exploration, and (b) using a simple isotropic RBF kernel (instead of the mixed kernel) hampers the ability to model discrete choices, causing a 15 % degradation.
The paper also discusses computational scalability. Since GP inference scales cubically with the number of observations, the authors acknowledge that very large evaluation budgets could become a bottleneck. They suggest employing sparse GP approximations (e.g., FITC) or batch BO strategies to mitigate this issue. Additionally, they propose future extensions such as multi‑objective acquisition functions that jointly optimize sampling efficiency and computational cost, and the application of the framework to streaming data or multi‑chain parallel adaptation.
In summary, the work demonstrates that Bayesian Optimization provides a principled, sample‑efficient mechanism for automatically calibrating adaptive MCMC proposals, even when the underlying performance surface is non‑smooth, noisy, and constrained. By reducing the need for manual tuning and by cutting down expensive likelihood evaluations, the method opens the door to scalable inference in complex discrete graphical models and other high‑dimensional probabilistic systems.
Comments & Academic Discussion
Loading comments...
Leave a Comment