An Adaptive Interacting Wang-Landau Algorithm for Automatic Density Exploration
While statisticians are well-accustomed to performing exploratory analysis in the modeling stage of an analysis, the notion of conducting preliminary general-purpose exploratory analysis in the Monte Carlo stage (or more generally, the model-fitting stage) of an analysis is an area which we feel deserves much further attention. Towards this aim, this paper proposes a general-purpose algorithm for automatic density exploration. The proposed exploration algorithm combines and expands upon components from various adaptive Markov chain Monte Carlo methods, with the Wang-Landau algorithm at its heart. Additionally, the algorithm is run on interacting parallel chains – a feature which both decreases computational cost and stabilizes the algorithm, improving its ability to explore the density. Performance is studied in several applications. Through a Bayesian variable selection example, the authors demonstrate the convergence gains obtained with interacting chains. The ability of the algorithm’s adaptive proposal to induce mode-jumping is illustrated through a trimodal density and a Bayesian mixture modeling application. Lastly, through a 2D Ising model, the authors demonstrate the ability of the algorithm to overcome the high correlations encountered in spatial models.
💡 Research Summary
The paper introduces a new general‑purpose Monte Carlo algorithm, called Parallel Adaptive Wang‑Landau (PAWL), designed to automatically explore complex probability densities. The core of PAWL is the Wang‑Landau (WL) algorithm, which biases a Markov chain so that it visits a pre‑defined partition of the state space uniformly. The authors identify two main limitations of the classic WL method: (i) the need for a manually chosen partition (the reaction coordinate and number of bins) and (ii) the use of a single chain, which can become trapped in local modes for high‑dimensional or multimodal targets.
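The biased target that WL samples can be sketched concretely: the chain targets π̃_θ(x) ∝ π(x) / θ(J(x)), where J(x) is the bin of the reaction coordinate, so frequently visited bins are penalized. Below is a minimal illustration in log-space; the toy bimodal target, bin grid, and function names are our own choices, not the paper's.

```python
import numpy as np

# Toy 1-D bimodal target: two unit-variance Gaussian modes at -3 and +3.
def log_target(x):
    return np.logaddexp(-0.5 * (x - 3.0) ** 2, -0.5 * (x + 3.0) ** 2)

# Partition of the reaction coordinate, here the negative log-density.
bin_edges = np.linspace(0.0, 10.0, 11)

def bin_index(x):
    e = -log_target(x)                      # reaction coordinate J(x)
    return int(np.clip(np.digitize(e, bin_edges) - 1, 0, len(bin_edges) - 2))

# WL-biased target: log pi_theta(x) = log pi(x) - log theta[J(x)],
# so bins with large theta (heavily visited so far) are penalised.
def log_biased_target(x, log_theta):
    return log_target(x) - log_theta[bin_index(x)]
```

With all bias entries at θ(i)=1 (log θ = 0), the biased target coincides with the original density; as θ grows in well-explored bins, mass is pushed toward unexplored ones.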
To overcome these issues, three innovations are incorporated:
- Adaptive Binning – The state space is partitioned along a reaction coordinate (typically the negative log‑density). During sampling, the algorithm monitors the visitation frequencies of each bin. If a bin is over‑populated, it is split; if several neighboring bins are under‑populated, they are merged. This dynamic binning removes the need for a priori bin selection and keeps the bias estimates stable as the number of bins changes.
- Interacting Parallel Chains – Multiple WL chains run simultaneously. Each chain updates its own bias vector θ and proposal parameters η, but at regular synchronization points the vectors are averaged across chains. This “collaborative adaptation” reduces the variance of the bias estimates, spreads exploration effort across different regions, and accelerates convergence, especially in high‑dimensional settings where a single chain would spend excessive time in a single mode.
- Adaptive Proposal Distribution – The proposal kernel is adapted on the fly, similarly to Adaptive Metropolis algorithms. The covariance of the proposal is estimated from recent samples, but it is also scaled according to the current WL bias so that the proposal matches the flattened (biased) target distribution π̃_θ. Early in the run the proposal makes small local moves; as the bias converges, the step size is increased, encouraging jumps between distant modes.
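The adaptive-binning idea can be made concrete with a small sketch. The function and threshold names below are ours, not the paper's: when one bin absorbs more than a given fraction of all visits, it is split at its midpoint, and the current bias estimate is duplicated into both halves so that θ stays consistent as the partition changes.

```python
import numpy as np

# Hypothetical split rule: if the most-visited bin holds more than
# `frac_threshold` of all visits, split it in two and carry the bias over.
def split_bin(edges, log_theta, counts, frac_threshold=0.5):
    counts = counts.astype(float)
    i = int(np.argmax(counts))
    if counts[i] < frac_threshold * counts.sum():
        return edges, log_theta, counts          # no bin is over-populated
    mid = 0.5 * (edges[i] + edges[i + 1])
    edges = np.insert(edges, i + 1, mid)
    log_theta = np.insert(log_theta, i + 1, log_theta[i])  # duplicate bias
    counts[i] /= 2.0
    counts = np.insert(counts, i + 1, counts[i])           # split the visits
    return edges, log_theta, counts
```

Merging under-populated neighbors would be the symmetric operation: delete a shared edge and combine the two bins' bias estimates and counts.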
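The chain interaction itself is simple to sketch: at a synchronization point, each chain's log-bias vector is replaced by the mean across all chains (a minimal illustration; the function name is ours).

```python
import numpy as np

# Collaborative adaptation: average the per-chain log-bias vectors and
# hand every chain the same averaged copy to continue from.
def synchronize(log_thetas):
    avg = np.mean(log_thetas, axis=0)
    return [avg.copy() for _ in log_thetas]
```

Averaging in log space pools the visit information of all chains, which is what reduces the variance of the bias estimates relative to a single chain.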
The algorithm proceeds as follows: (i) initialise a partition and set all bias entries θ(i)=1; (ii) choose a decreasing learning rate γ_t (typically of order 1/t); (iii) each chain draws a new state using a Metropolis‑Hastings kernel that leaves π̃_θ invariant; (iv) update the bias via log θ(i) ← log θ(i) + γ_t(𝟙{X_t ∈ X_i} − 1/d) for each of the d bins, so that the currently occupied bin is penalised and the others are rewarded; (v) once the empirical visit histogram over the bins is sufficiently flat, reduce γ_t and reset the histogram.
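The steps above can be sketched as a single-chain loop on a toy 1-D bimodal target (the target, bin grid, and flatness threshold here are our illustrative choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    # two unit-variance Gaussian modes at -3 and +3
    return np.logaddexp(-0.5 * (x - 3.0) ** 2, -0.5 * (x + 3.0) ** 2)

edges = np.linspace(0.0, 10.0, 11)   # (i) partition of -log pi(x)
d = len(edges) - 1

def bin_index(x):
    return int(np.clip(np.digitize(-log_target(x), edges) - 1, 0, d - 1))

log_theta = np.zeros(d)              # (i) all theta(i) = 1
gamma = 1.0                          # (ii) learning rate
counts = np.zeros(d)                 # histogram since last gamma reduction
total = np.zeros(d)                  # all visits, kept for inspection
x = 3.0

for t in range(20000):
    # (iii) Metropolis-Hastings step leaving the biased target invariant
    y = x + rng.normal()
    log_alpha = (log_target(y) - log_theta[bin_index(y)]) - \
                (log_target(x) - log_theta[bin_index(x)])
    if np.log(rng.uniform()) < log_alpha:
        x = y
    # (iv) penalise the occupied bin, reward the others
    i = bin_index(x)
    log_theta += gamma * ((np.arange(d) == i) - 1.0 / d)
    counts[i] += 1
    total[i] += 1
    # (v) flat-histogram criterion: halve gamma and restart the histogram
    if counts.min() > 0.8 * counts.mean():
        gamma *= 0.5
        counts[:] = 0
```

At convergence, log θ estimates (up to an additive constant) the log-mass of the target in each bin, and the chain visits the bins roughly uniformly; PAWL runs several such chains and averages their θ vectors at synchronization points.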