The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo
Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo (MCMC) algorithm that avoids the random walk behavior and sensitivity to correlated parameters that plague many MCMC methods by taking a series of steps informed by first-order gradient information. These features allow it to converge to high-dimensional target distributions much more quickly than simpler methods such as random walk Metropolis or Gibbs sampling. However, HMC’s performance is highly sensitive to two user-specified parameters: a step size ε and a desired number of steps L. In particular, if L is too small then the algorithm exhibits undesirable random walk behavior, while if L is too large the algorithm wastes computation. We introduce the No-U-Turn Sampler (NUTS), an extension to HMC that eliminates the need to set a number of steps L. NUTS uses a recursive algorithm to build a set of likely candidate points that spans a wide swath of the target distribution, stopping automatically when it starts to double back and retrace its steps. Empirically, NUTS performs at least as efficiently as and sometimes more efficiently than a well tuned standard HMC method, without requiring user intervention or costly tuning runs. We also derive a method for adapting the step size parameter ε on the fly based on primal-dual averaging. NUTS can thus be used with no hand-tuning at all. NUTS is also suitable for applications such as BUGS-style automatic inference engines that require efficient “turnkey” sampling algorithms.
💡 Research Summary
Hamiltonian Monte Carlo (HMC) is a powerful Markov chain Monte Carlo method that uses gradient information to propose distant moves, thereby avoiding the random‑walk behavior that plagues simpler samplers such as random‑walk Metropolis or Gibbs sampling. Its efficiency, however, hinges on two user‑specified hyper‑parameters: the integrator step size ε and the number of leapfrog steps L. An ε that is too large yields low acceptance rates because the discretisation error becomes severe; an ε that is too small wastes computation. The choice of L is even more problematic: if L is too small the trajectory does not travel far enough, resulting in highly correlated samples; if L is too large the simulated particle doubles back on itself, retracing its steps and wasting computation. Traditionally, practitioners must run costly pilot experiments or rely on expert intuition to tune these parameters, which limits the applicability of HMC in generic inference engines.
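The leapfrog integrator referred to above alternates half-step momentum updates with full-step position updates. A minimal sketch, assuming a user-supplied gradient of the log target density (the function names here are illustrative, not the paper's code):

```python
import numpy as np

def leapfrog(grad_log_p, theta, r, eps):
    """One leapfrog step of size eps for position theta and momentum r.

    grad_log_p: gradient of the log target density (assumed to be
    supplied by the model; here it is a plain Python function).
    """
    r = r + 0.5 * eps * grad_log_p(theta)   # half step for momentum
    theta = theta + eps * r                 # full step for position
    r = r + 0.5 * eps * grad_log_p(theta)   # half step for momentum
    return theta, r

# Example: standard normal target, log p(θ) = -½ θᵀθ, so ∇log p(θ) = -θ.
grad = lambda th: -th
theta, r = np.array([1.0]), np.array([0.5])
for _ in range(10):
    theta, r = leapfrog(grad, theta, r, 0.1)
```

Because the leapfrog map is symplectic, the Hamiltonian ½(θᵀθ + rᵀr) of this toy target is nearly conserved over the trajectory, which is what keeps acceptance rates high when ε is small enough.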
The No‑U‑Turn Sampler (NUTS) eliminates the need to pre‑specify L by dynamically building a set of candidate states until the simulated trajectory begins to make a “U‑turn”. The key stopping criterion is the sign of the inner product between the current momentum r and the vector from the starting position θ to the current position θ̃. When this inner product becomes negative, further integration would move the particle back toward its origin, so the expansion stops.
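The stopping rule just described fits in a few lines. A minimal sketch with NumPy (in the full algorithm the check is applied to both endpoints of the trajectory; here it is shown for a single position/momentum pair, with hypothetical argument names):

```python
import numpy as np

def u_turn(theta0, theta, r):
    """Return True when continuing the integration from (theta, r)
    would shrink the distance back toward the starting point theta0,
    i.e. when (θ̃ − θ)·r̃ < 0."""
    return np.dot(theta - theta0, r) < 0

# A momentum pointing back toward theta0 triggers the stop...
stop = u_turn(np.zeros(2), np.array([1.0, 0.0]), np.array([-0.5, 0.0]))
# ...while a momentum still moving away from theta0 does not.
keep_going = not u_turn(np.zeros(2), np.array([1.0, 0.0]), np.array([0.5, 0.0]))
```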
NUTS implements this criterion via a recursive binary‑tree construction. Starting from the current state, a direction (forward or backward in fictitious time) is chosen uniformly at random, and 2^j leapfrog steps are taken in that direction, where j is the current tree depth. This “doubling” process creates a balanced binary tree whose leaves correspond to (θ, r) pairs visited by the integrator. After each doubling, NUTS checks the U‑turn condition on the leftmost and rightmost leaves of every balanced subtree; if any subtree violates the condition, its expansion is halted. All visited states are collected in a set B, while a subset C of B that satisfies detailed balance and the slice constraint is formed deterministically. One element of C is then drawn uniformly as the next Markov state, guaranteeing that the transition kernel preserves the target distribution without requiring an explicit Metropolis acceptance step.
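The doubling loop can be illustrated with a heavily simplified sketch. This version keeps only the 2^j-step doublings in a random direction and the endpoint stopping rule; it omits the slice variable, the per-subtree U-turn checks, and the balanced selection of the next state from C, so it is an illustration of the trajectory growth, not the full transition kernel:

```python
import numpy as np

def leapfrog(grad, theta, r, eps):
    r = r + 0.5 * eps * grad(theta)
    theta = theta + eps * r
    r = r + 0.5 * eps * grad(theta)
    return theta, r

def nuts_trajectory(grad, theta0, r0, eps, max_depth=8, rng=None):
    """Grow a trajectory by 2^j leapfrog steps per doubling, in a
    direction chosen uniformly at random, until the endpoints make a
    U-turn. Returns the visited positions (illustration only)."""
    rng = rng or np.random.default_rng(0)
    th_minus, r_minus = theta0.copy(), r0.copy()
    th_plus, r_plus = theta0.copy(), r0.copy()
    visited = [theta0.copy()]
    for j in range(max_depth):
        direction = 1 if rng.random() < 0.5 else -1
        for _ in range(2 ** j):
            if direction == 1:   # extend forward in fictitious time
                th_plus, r_plus = leapfrog(grad, th_plus, r_plus, eps)
                visited.append(th_plus.copy())
            else:                # extend backward in fictitious time
                th_minus, r_minus = leapfrog(grad, th_minus, r_minus, -eps)
                visited.append(th_minus.copy())
        span = th_plus - th_minus
        # Stop when either endpoint momentum points back across the span.
        if np.dot(span, r_minus) < 0 or np.dot(span, r_plus) < 0:
            break
    return visited

# Example: 1-D standard normal target.
visited = nuts_trajectory(lambda th: -th, np.array([0.0]), np.array([1.0]), 0.1)
```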
A slice variable u is introduced to simplify the construction: given the current position θ and a freshly resampled momentum r, u is drawn uniformly from the interval [0, exp{L(θ) − ½ r·r}], where L(θ) denotes the log density of θ (up to an additive constant). Only states (θ′, r′) whose joint density exceeds u are eligible candidates; this is the slice constraint enforced when forming the set C.
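Drawing the slice variable u | θ, r ~ Uniform(0, exp{L(θ) − ½ rᵀr}) is a one-liner; a minimal sketch (log_p_theta is an assumed argument holding L(θ); in practice implementations compare log u against log densities instead, to avoid the underflow that exponentiating can cause):

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_slice(log_p_theta, r):
    # u | θ, r ~ Uniform(0, exp{L(θ) − ½ rᵀr})
    return rng.uniform(0.0, np.exp(log_p_theta - 0.5 * np.dot(r, r)))

u = draw_slice(-0.5, np.array([1.0]))  # upper bound here is exp(-1)
```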