Variable length trajectory compressible hybrid Monte Carlo

Hybrid Monte Carlo (HMC) generates samples from a prescribed probability distribution in a configuration space by simulating Hamiltonian dynamics, followed by the Metropolis (-Hastings) acceptance/rejection step. Compressible HMC (CHMC) generalizes H…

Authors: Akihiko Nishimura, David Dunson

Variable length trajectory compressible hybrid Monte Carlo
V a riable length trajecto ry comp ressible hyb rid Monte Ca rlo Akihik o Nishimura 1 , a) and David Dunson 2 1) Dep artment of Mathematics, Duke University, Durham, North Carolina, 27708, USA 2) Dep artment of Statistical Scienc e, Duke University, Durham, North Car olina, 27708, USA (Dated: 1 No vem b er 2021) Hybrid Mon te Carlo (HMC) generates samples from a prescrib ed probabilit y distribution in a con- figuration space b y simulating Hamiltonian dynamics, follo wed b y the Metrop olis (-Hastings) ac- ceptance/rejection step. Compressible HMC (CHMC) generalizes HMC to a situation in which the dynamics is rev ersible but not necessarily Hamiltonian. This article presents a framework to fur- ther extend the algorithm. Within the existing framework, eac h tra jectory of the dynamics m ust b e integrated for the same amoun t of (random) time to generate a v alid Metrop olis prop osal. Our generalized acceptance/rejection mechanism allows a more delib erate choice of the integration time for each tra jectory . The prop osed algorithm in particular enables an effective application of v ariable step size integrators to HMC-type sampling algorithms based on reversible dynamics. The p oten tial of our framework is further demonstrated b y another extension of HMC whic h reduces the wasted computations due to unstable n umerical appro ximations and corresp onding rejected proposals. I. INTRODUCTION A study of molecular systems often relies on generating random v ariables from a prescribed (unnormalized) prob- abilit y distribution ρ ( q ) ∝ exp( − U ( q )) on the configura- tion space. Mark ov c hain Mon te Carlo (MCMC) gener- ates samples from a target distribution b y constructing a Marko v chain whose stationary distribution coincides with the target distribution. Such a Mark ov chain can b e realized b y building a transition rule that satisfies the de- tailed balance condition. MCMC based on the Metropo- lis (-Hastings) algorithm 1 is a general sampling approac h widely used in computational physical science as w ell as in Bay esian statistics and machine learning. Man y suc h algorithms are inefficient, pro ducing highly corre- lated samples, and require a large num ber of iterations to adequately c haracterize the target distribution 2–4 . Hybrid Monte Carlo 5 (HMC) constructs the prop osal distribution for the Metrop olis algorithm by sim ulating molecular dynamics (MD), a pro cedure that can greatly reduce the correlation among successiv e MCMC samples. More precisely , HMC augments the state space by intro- ducing a momentum v ariable p ; the original v ariable q is often referred to as p osition v ariable in the HMC frame- w ork. In this augmented state space ( q , p ), the prop osal distribution for Metrop olis is constructed b y solving an ordinary differential equation (ODE) corresp onding to Newton’s equations of motion with resp ect to the p oten- tial energy U ( q ). There are applications in whic h (par- tial) analytical solutions to an ODE can b e exploited 6–8 , but in general ODEs are discretized and integrated n u- merically . Within the original HMC framework, an integrator for sim ulating MD m ust be rev ersible and volume-preserving to pro duce a v alid Metrop olis prop osal 9 . In fact, the v olume-preserving property can be relaxed by including a) Electronic mail: an88@duke.edu. a Jacobian factor in the calculation of the Metrop olis acceptance probability . 10,11 Under this generalization of HMC, any rev ersible (discrete) dynamics / bijective map can b e applied to generate a prop osal state. This al- gorithm is formalized as c ompr essible HMC (CHMC) in Ref. 12. A generalization of HMC known as Rieman- nian manifold HMC (RMHMC) 13 also falls within the framew ork of CHMC. This article presents an algorithm to relax another condition required by (compressible) HMC. Given a re- v ersible map F and state ( q , p ), CHMC prop oses the next state by applying the map n times, where the n um- b er of steps n can b e drawn randomly at each iteration. Though often not stated explicitly , the detailed balance requires the n umber of steps to b e determined indep en- den tly of the tra jectory { ( q , p ) , F ( q , p ) , F 2 ( q , p ) , . . . } . As w e will show in Section I II, this constraint can prev en t realizing the full potential of MCMC algorithms based on rev ersible dynamics. Our algorithm generalizes the acceptance-rejection mec hanism behind CHMC to allow the num b er of steps to dep end on each tra jectory of the dynamics while pre- serving the detailed balance. The n umber of numer- ical integration steps taken in simulating a tra jectory of HMC is commonly referred to as the “path length” of a tra jectory in the statistics literature. W e there- fore call our algorithm variable length tr aje ctory CHMC (VL T-CHMC). It should b e mentioned that the No-U- T urn-Sampler (NUTS) is another v ariant of HMC that allo ws the path lengths to v ary from one tra jectory to another. 14 Ho wev er, the motiv ation b ehind NUTS is to spare a user the trouble of manually tuning the num- b er of steps, and NUTS in general p erforms no b etter than HMC with well-c hosen path lengths. 14,15 On the other hand VL T-CHMC can improv e the p erformance of CHMC in a more fundamental and significan t wa y . In particular, VL T-CHMC enables an effective application of rev ersible v ariable step size in tegrators to HMC-type sampling algorithms based on rev ersible dynamics. 2 The rest of the paper is organized as follo ws. Section II reviews the main ideas b ehind CHMC and provides an example in which the compressible dynamics arises from the use of non-traditional integrators in HMC settings. Suc h integrators ha ve prov en to b e more efficient than the commonly used v olume-preserving in tegrators in v ar- ious applications. The example also serv es to introduce the notations and concepts needed in the next section, where VL T-CHMC is motiv ated as a metho d to effec- tiv ely apply v ariable step size integrators in HMC set- tings. The presentation is self-contai ned, but some fa- miliarit y with HMC is assumed. VL T-CHMC is devel- op ed in Section I II. Section II I A explains how the ex- isting framew ork limits the utility of v ariable step size in tegrators to sampling algorithms. The key observ ation in addressing this issue leads to a sp ecial case of VL T- CHMC. More general construction of VL T-CHMC is pro- vided in Section I II B. Section I I I C presents another use case of VL T-CHMC, where HMC is mo dified to reduce the wasted computation due to unstable numerical ap- pro ximations and corresp onding rejected prop osals. The sim ulation results are sho wn in Section IV to demonstrate the p oten tial gains from the framework of VL T-CHMC. I I. REVIEW OF COMPRESSIBLE HMC A. Basic Theory T o keep the description of CHMC and the subsequent dev elopment of VL T-CHMC more intuitiv e, the version of CHMC describ ed here is sligh tly less general than the one in Ref. 12. It is straightforw ard to extend the v ariable length tra jectory algorithm of Section I II to the general settings. A bijective map F is said to b e r eversible if F − 1 = R ◦ F ◦ R (1) or equiv alently ( R ◦ F ) − 1 = R ◦ F for an inv olution R (i.e. R ◦ R = id). Note that the rev ersiblity of F implies that of F n for any n . Let D F n denote the Jacobian matrix of F n and | D F n | its determinant. Given a state z and integer n , CHMC prop oses the state z ∗ = R ◦ F n ( z ) and accepts or rejects the prop osal with probabilit y min  1 , ρ ( z ∗ ) | D F n ( z ) | ρ ( z )  (2) T o see that this transition rule satisfies the detailed bal- ance with resp ect to ρ ( · ), consider a small neigh b orhoo d B around z and B ∗ = R ◦ F n ( B ) around z ∗ , so that R ◦ F n ( B ∗ ) = B . The prop osal mov e sends the proba- bilit y mass Z B ρ ( z 0 )d z 0 ≈ ρ ( z )v ol( B ) from B to B ∗ . On the other hand, the mass sent from B ∗ to B by the prop osal mov e can b e seen to be Z B ∗ ρ ( z 0 )d z 0 = Z B ρ ( z 0 ) | D ( R ◦ F n )( z 0 ) | d z 0 ≈ ρ ( z ∗ ) | D F n ( z ) | v ol( B ) b y the change of v ariable formula and the fact | R | = 1. The acceptance and rejection step of CHMC amoun ts to rejecting the fraction of mov e b y the ratio of the proba- bilit y fluxes and th us imp oses the detailed balance. The abov e transition rule preserves the target density ρ ( · ) for an y n , so in practice the num b er of steps can be dra wn randomly at each iteration of CHMC. The steps of CHMC are summarized in Algorithm 1 b elo w, where the distribution p ( · ) for the num ber of steps is a tuning parameter a user m ust specify . The use of a deterministic map as a proposal distribution do es not yield an ergodic Mark ov chain, and therefore such a transition rule m ust b e alternated with another transition rule that preserves the target density ρ ( · ), as done in Step 1 of the algorithm. W e do not concern ourselves here with how to choose such a random mov e since the choice dep ends critically on the particular form of ρ ( · ). Algorithm 1 (Compressible HMC) . With a presp ecified probabilit y mass function p ( · ) on Z + , CHMC generates a Mark ov chain { z ( m ) } m with the follo wing transition rule z ( m ) → z ( m +1) : 1. Make a random change z ( m ) → z that preserves the target densit y ρ ( · ). 2. Sample n ∼ p ( · ) and prop ose the state z ∗ = R ◦ F n ( z ). 3. Let z ( m +1) = z ∗ with probability min  1 , ρ ( z ∗ ) | D F n ( z ) | ρ ( z )  Otherwise, let z ( m +1) = z . B. Example: (Riemann manifold) HMC with non-volume-p reserving integrato rs HMC and its extension Riemann manifold HMC (RMHMC) construct a reversible and volume-preserving bijectiv e map by numerically approximating Hamilto- nian dynamics. T o this end, they require a geometric in tegrator that preserv es the rev ersibility and v olume- preserv ation prop ert y of Hamiltonian dynamics. Under the CHMC framew ork, ho wev er, Hamiltonian dynamics can b e approximated using a wider range of in tegration tec hniques. In order to sample from a probability density of in- terest ρ 0 ( q ) ∝ exp( − U ( q )) in R d , RMHMC introduces an auxiliary v ariable p ∈ R d whose distribution is de- fined conditionally as p | q ∼ N ( 0 , M ( q )) for a fam- ily of p ositiv e definite matrices kno wn as mass tensors 3 { M ( q ) } q . 9,13,16 The joint density ρ ( q , p ) in the phase space then is given as ρ ( q , p ) ∝ exp( − H ( q , p )) where the Hamiltonian H ( q , p ) is given by H ( q , p ) = U ( q ) + 1 2 p T M ( q ) − 1 p + 1 2 log | M ( q ) | (3) The prop osal is generated b y approximating the solution to Hamilton ’s e quations : d q d t = ∇ p H ( q , p ) , d p d t = −∇ q H ( q , p ) (4) F or the Hamiltonian (3), the solution operator of (4) is reversible with resp ect to a momen tum flip op erator R ( q , p ) = ( q , − p ). Solving (4) using a reversible inte- grator with a constant step size ∆ t yields a reversible map F ∆ t so that F n ∆ t ( q 0 , p 0 ) ≈ ( q ( n ∆ t ) , p ( n ∆ t )) (5) where { ( q ( t ) , p ( t )) } t denotes the exact solution with the initial condition ( q 0 , p 0 ). In other words, F ∆ t appro xi- mates the solution op erator Φ ∆ t of (4) defined through the relation d Φ t d t =  ( ∇ p H ) ◦ Φ t , − ( ∇ q H ) ◦ Φ t  (6) for all t . If the rev ersible map F ∆ t is further required to b e volume preserving, then we hav e | DF n ∆ t | = 1 and the Jacobian factor drops from (2), recov ering HMC and RMHMC algorithms of Ref. 5 and 13. In some applica- tions how ev er, non-volume-preserving appro ximations of (4) hav e b een shown to offer substantial gains in compu- tational efficiency . 11,12 F or example, Lan et. al. 11 considers the ODE corre- sp onding to (4) in terms of reparametrization ( q , v ) = ( q , M ( q ) − 1 p ). The reparametrized ODE admits semi- explicit and explicit reversible approximations, requir- ing fewer or no fixed p oin t iterations compared to the St¨ ormer-V erlet integrator typically employ ed in RMHMC. The prop osal mov e using a simulated tra jec- tory is alternated with sampling v from its conditional densit y v | q ∼ N ( 0 , M ( q ) − 1 ), a random mov e corre- sp onding to Step 1 in Algorithm 1. The CHMC algo- rithm based on the semi-explicit and explicit integrator are found to significantly outp erform RMHMC based on the St¨ ormer-V erlet integrator o ver a range of examples. I II. V ARIABLE LENGTH TRAJECTORY CHMC V ariable length tra jectory CHMC (VL T-CHMC) is most naturally motiv ated as a metho d to effectiv ely ap- ply v ariable step size integrators in RMHMC settings. F or this reason, we first develop this special case of VL T- CHMC in Section I II A. A more general theory is dev el- op ed in Section II I B. Section II I C illustrates the use and p oten tial b enefits of the general VL T-CHMC algorithm through another example. A. Sp ecial case of VL T-CHMC 1. Motivation: RMHMC with variable step size integrato rs and limitations of CHMC In Section I I B, we discussed how CHMC allows us to approximate Hamiltonian dynamics with non-volume- preserving in tegrators and still generate a v alid Metropo- lis prop osal. W e in particular considered the use of a rev ersible integrator with a constant step size. A wider range of reversible in tegration techniques for Hamiltonian systems are a v ailable in the literature, how ever, including a num ber of v ariable step size integrators. 17–20 In theory , a v ariable step size in tegrator similarly pro duces a v alid CHMC prop osal as long as the integrator is reversible. Ho wev er, the use of such an in tegrator under the exist- ing CHMC framework generally leads to an algorithm with sub optimal sampling efficiency , for the reasons w e describ e now. Eac h step of a v ariable step size integrator approxi- mates the evolution ( q ( t n ) , p ( t n )) → ( q ( t n + ∆ t n ) , p ( t n + ∆ t n )) where the step size ∆ t n dep ends on the cur- ren t state ( q ( t n ) , p ( t n )) through a step size c ontr ol ler g ( q , p ). The simplest c hoice of step size would b e ∆ t n = g ( q ( t n ) , p ( t n ))∆ s , but the reversibilit y requires a sligh tly more sophisticated relationship and the con- dition g ( q , p ) = g ( q , − p ) (see Section I II A 2). Most imp ortan tly for our discussion, a v ariable step size sc heme is equiv alent to approximating the following time- r esc ale d Hamiltonian dynamics in a new time scale d s = g ( q , p ) − 1 d t with a constant step size ∆ s : d q d s = g ( q , p ) ∇ p H ( q , p ) , d p d s = − g ( q , p ) ∇ q H ( q , p ) (7) In other words, a rev ersible v ariable step size approxima- tion of (4) yields a rev ersible map F ∆ s suc h that F n ∆ s ( q 0 , p 0 ) ≈ ( q ( n ∆ s ) , p ( n ∆ s )) (8) where { q ( s ) , p ( s ) } s is the solution to the time-rescaled dynamics (7) with the initial condition ( q 0 , p 0 ). This implicit time-rescaling b ehind v ariable step size in tegration causes trouble for CHMC. The utility of Hamiltonian dynamics (4) as a prop osal generation mec h- anism stems from the fact that ρ ( q , p ) ∝ exp( − H ( q , p )) is the invariant distribution of the dynamics i.e. if ( q 0 , p 0 ) has the distribution ρ ( q , p ) ∝ exp( − H ( q , p )), then Φ t ( q 0 , p 0 ) also has the same distribution ρ ( · ) for all t . As a consequence, the prop osal generated by an approx- imate solution ( q ∗ , p ∗ ) = F n ∆ t ( q 0 , p 0 ) as in (5) can b e accepted with probability 1 in the limit ∆ t → 0 and n ∆ t → t 0 . On the other hand, the time-rescaled dynam- ics (7) in general do es not preserve the target density ρ ( q , p ), and the prop osal generated by the appro ximate solution ( q ∗ , p ∗ ) = F n ∆ s ( q 0 , p 0 ) may not b e accepted with high probability ev en in the limit ∆ s → 0 and n ∆ s → s 0 . In fact, the acceptance probability of the CHMC prop osal 4 in the limit is giv en b y: min  1 , g ( q ( s 0 ) , p ( s 0 )) g ( q 0 , p 0 )  (9) where { q ( s ) , p ( s ) } s denotes the solution to (7) with the initial condition ( q 0 , p 0 ). The deriv ation is given in Ap- p endix A. 2. Algo rithm: variable length trajectory scheme for time-rescaled dynamics In order to address the issue caused by the implicit time-rescaling asso ciated with v ariable step size inte- grators, VL T-CHMC approximates the dynamics in the original time scale as follows. Fix the initial condition ( q 0 , p 0 ) and denote ( q i , p i ) = F i ∆ s ( q 0 , p 0 ) where F ∆ s ap- pro ximates the dynamics in the time scale s as in (8). The evolution ( q 0 , p 0 ) → ( q ( t ) , p ( t )) in the original time scale can b e appro ximated b y taking the tra jectory de- p enden t num b er of steps N ( q 0 , p 0 ) = N ( t, q 0 , p 0 ) defined as N ( q 0 , p 0 ) = min ( n : n X i =1 ∆ s 2 ( g ( q i − 1 , p i − 1 ) + g ( q i , p i )) > t ) (10) No w w e consider the map F N ∆ s defined as F N ∆ s ( q , p ) = F N ( q , p ) ∆ s ( q , p ) (11) whic h approximates the solution op erator Φ t as defined in (6). The map how ever cannot b e used directly to generate a prop osal b ecause in general it is neither re- v ersible or even bijectiv e. The map would be reversible if N ( q ∗ 0 , p ∗ 0 ) = N ( q 0 , p 0 ) where ( q ∗ 0 , p ∗ 0 ) = R ◦ F N ∆ s ( q 0 , p 0 ), but (10) only implies N ( q ∗ 0 , p ∗ 0 ) ≤ N ( q 0 , p 0 ). F or exam- ple when g ( q ∗ 0 , p ∗ 0 )  g ( q 0 , p 0 ), the simulated time along the reverse tra jectory  ( q ∗ i , p ∗ i ) = F i ∆ s ( q ∗ 0 , p ∗ 0 )  n i =0 n X i =1 ∆ s 2  g ( q ∗ i − 1 , p ∗ i − 1 ) + g ( q ∗ i , p ∗ i )  will likely reach the threshold t b efore n = N ( q 0 , p 0 ) steps. The key observ ation b ehind VL T-CHMC is that we can nonetheless construct collections of states S and S ∗ con taining ( q 0 , p 0 ) and ( q ∗ 0 , p ∗ 0 ) such that R ◦ F N ∆ s ( S ) ⊂ S ∗ and R ◦ F N ∆ s ( S ∗ ) ⊂ S R ◦ F N ∆ s ( S c ) ⊂ ( S ∗ ) c and R ◦ F N ∆ s (( S ∗ ) c ) ⊂ S c (12) The existence of suc h sets S and S ∗ is a prop ert y of the map F N ∆ s and generalizes the notion of rev ersibility (1). The set S is essentially the pre-image of { ( q ∗ 0 , p ∗ 0 ) } under R ◦ F N ∆ s and can b e constructed b y defining S = { ( q − ` , p − ` ) , . . . , ( q r , p r ) } by choosing `, r ≥ 0 such that ` = max  j ≥ 0 : F N ∆ s ( q − j , p − j ) = F N ∆ s ( q 0 , p 0 )  r = max  j ≥ 0 : F N ∆ s ( q j , p j ) = F N ∆ s ( q 0 , p 0 )  (13) Algorithmically , ` and r can b e found by solving the dy- namics backw ard and forward from ( q 0 , p 0 ) using the equiv alen t definitions b elo w: ` = max    j ≥ 0 : N ( t, q 0 , p 0 ) − 1 X i = − j ∆ t i < t    r = max    j ≥ 0 : N ( t, q 0 , p 0 ) X i = j ∆ t i > t    where ∆ t i = ∆ s 2 ( g ( q i − 1 , p i − 1 ) + g ( q i , p i )) (14) The set S ∗ is the pre-image of { ( q r , p r ) } under R ◦ F N ∆ s and can analogously b e constructed. Denoting ( q ∗ i , p ∗ i ) = F i ∆ s ( q ∗ 0 , p ∗ 0 ), let S ∗ =  ( q ∗ − ` ∗ , p ∗ − ` ∗ ) , . . . , ( q ∗ r ∗ , p ∗ r ∗ )  where ` ∗ , r ∗ ≥ 0 is defined as ` ∗ = max  j ≥ 0 : F N ∆ s ( q ∗ − j , p ∗ − j ) = F N ∆ s ( q ∗ 0 , p ∗ 0 )  r ∗ = max  j ≥ 0 : F N ∆ s ( q ∗ j , p ∗ j ) = F N ∆ s ( q ∗ 0 , p ∗ 0 )  (15) It is shown in App endix B that the ab o ve definition ac- tually implies r ∗ = 0. The proof of (12) and of other facts regarding S and S ∗ are also giv en in App endix B. Ha ving constructed the sets S and S ∗ with the prop- ert y (12), VL T-CHMC imp oses the detailed balance by rejecting a fraction of mo ves b et ween S and S ∗ as de- scrib ed in Algorithm 2 b elo w. Algorithm 2 (VL T-CHMC) . Given a reversible map F ∆ s as in (8) and a tra jectory length function N as in (10), VL T-CHMC generates a Marko v c hain { ( q ( m ) , p ( m ) ) } m with the follo wing transition rule ( q ( m ) , p ( m ) ) → ( q ( m +1) , p ( m +1) ): 1. Sample p 0 from the conditional densit y p | q ( m ) and set q 0 = q ( m ) . 2. Find the indices `, r , ` ∗ , r ∗ as in (13) and (15) by sim ulating the dynamics forward and backw ard from ( q 0 , p 0 ) and R ◦ F N ∆ s ( q 0 , p 0 ). Then set S =  F − ` ∆ s ( q 0 , p 0 ) , . . . , F r ∆ s ( q 0 , p 0 )  S ∗ = n R ◦ F N 0 − r ∗ ∆ s ( q 0 , p 0 ) , . . . , R ◦ F N 0 + ` ∗ ∆ s ( q 0 , p 0 ) o where N 0 = N ( q 0 , p 0 ). 3. Prop ose the transition from S to S ∗ with the ac- ceptance probability which is the smaller of 1 and ` ∗ P j = − r ∗ ρ  R ◦ F N 0 + j ∆ s ( q 0 , p 0 )     D F N 0 + j ∆ s ( q 0 , p 0 )    r P i = − ` ρ  F i ∆ s ( q 0 , p 0 )    D F i ∆ s ( q 0 , p 0 )   (16) 5 4. If the transition in Step 3 is accepted, choose a state R ◦ F N 0 + j ∆ s ( q 0 , p 0 ) from S ∗ with the probabil- it y prop ortional to ρ  R ◦ F N 0 + j ∆ s ( q 0 , p 0 )     D F N 0 + j ∆ s ( q 0 , p 0 )    (17) and set ( q ( m +1) , p ( m +1) ) = R ◦ F N 0 + j ∆ s ( q 0 , p 0 ). Otherwise, choose a state F i ∆ s ( q 0 , p 0 ) from S with the probability prop ortional to ρ  F i ∆ s ( q 0 , p 0 )    D F i ∆ s ( q 0 , p 0 )   (18) and set ( q ( m +1) , p ( m +1) ) = F i ∆ s ( q 0 , p 0 ). 3. Theo ry: VL T-CHMC and detailed-balance condition T o o see how VL T-CHMC achiev es the detailed bal- ance, consider a small neigh b orhoo d B 0 around ( q 0 , p 0 ). The total probabilit y in the neigh b orhoo d B = ∪ r i = − ` F i ∆ s ( B 0 ) of S is Z B ρ ( q , p ) d q d p ≈ r X i = − ` ρ  F i ∆ s ( q 0 , p 0 )    D F i ∆ s ( q 0 , p 0 )     B 0   (19) assuming that B 0 is small enough that F i ∆ s ( B 0 )’s are dis- join t. Similarly , the total probability in the neighborho o d B ∗ = ∪ ` ∗ j = − r ∗ R ◦ F N 0 + j ∆ s ( B 0 ) of S ∗ is Z B ∗ ρ ( q , p ) d q d p ≈ ` ∗ X j = − r ∗ ρ  R ◦ F N 0 + j ∆ s ( q 0 , p 0 )     D F N 0 + j ∆ s ( q 0 , p 0 )      B 0   (20) Comparing the acceptance probabilit y (16) with the probabilit y fluxes (19) and (20), one can see that the acceptance-rejection procedure of Step 3 controls the probabilit y fluxes appropriately to achiev e the detailed balance b et ween the neighborho o ds B and B ∗ . Step 4 then imp oses the detailed balance within B and B ∗ b y sampling a state according to the relative amount of probabilit y in the individual comp onen ts { F i ∆ s ( B 0 ) } r i = − ` of B and { R ◦ F N 0 + j ∆ s ( B 0 ) } ` ∗ j = − r ∗ of B ∗ . 4. Theo retical efficiency: imp rovement over CHMC Throughout Section I I I A w e considered the compress- ible dynamics (7) arising from a v ariable step size inte- gration of Hamiltonian dynamics. In this sp ecific setting with the tra jectory length function N as defined in (10), VL T-CHMC is guaranteed to hav e a high av erage accep- tance probability . In fact, in the limit ∆ s → 0 with t fixed, the acceptance probability (16) of a VL T-CHMC prop osal from ( q 0 , p 0 ) conv erges to a v alue b ounded b e- lo w b y g ( Φ t ( q 0 , p 0 )) g ( q 0 , p 0 )  g ( q 0 , p 0 ) g ( Φ t ( q 0 , p 0 ))  (21) when g ( Φ t ( q 0 , p 0 )) < g ( q 0 , p 0 ). In case g ( Φ t ( q 0 , p 0 )) > g ( q 0 , p 0 ), a similar low er b ound holds for the prop osal from R ◦ Φ t ( q 0 , p 0 ). Note that the quan tity (21) is alw ays larger than 1 / 2 and it tends to 1 as the ratio g ( Φ t ( q 0 , p 0 )) /g ( q 0 , p 0 ) increases, in contrast with the ac- ceptance probability (9) of CHMC. More precise results on the acceptance probability of a VL T-CHMC prop osal are derived in App endix A. Of course, the acceptance rate of a prop osal distribu- tion is not the only factor determining the efficiency of an MCMC algorithm. Nonetheless, the theoretical re- sult ab o ve highligh ts an adv antage VL T-CHMC has ov er the usual CHMC. The b ottom line is that VL T-CHMC prop osals approximate the original dynamic (4) while CHMC prop osals approximate the time-rescaled dynam- ics (7). Therefore, VL T-CHMC will generally outp erform CHMC whenever the exact solution of the original dy- namics constitutes an efficient Marko v chain propagator as is t ypically the case in RMHMC applications. 13,21 This is substantiated by our simulation study in Section IV. B. General VL T-CHMC The k ey step in Algorithm 2 is the construction of the sets S and S ∗ with the prop ert y (12). More generally , the detailed balance can be imp osed by the same type of acceptance-rejection mechanism whenever the phase space can b e partitioned in to a collection of pairs S and S ∗ suc h that the set S ∪ S ∗ and ( S ∪ S ∗ ) c is closed un- der a (deterministic) transition rule. Conceiv ably , a wide range of algorithms can b e devised under this general condition. In this section we present one systematic wa y to generalize the framework of Section I II A. Consider a generic rev ersible map F on a state space z and asso ciated in volution R . Fix z 0 and denote z i = F i ( z 0 ). Cho ose a tr aje ctory termination criteria , or more precisely bo olean v alued functions b n ( z 0 , . . . , z n ) ∈ { 0 , 1 } , with the following property b n ( z 0 , . . . , z n ) = b n ( R ( z n ) , . . . , R ( z 0 )) (22) as well as the prop ert y b n ( z 0 , . . . , z n ) = 1 only if b n − i ( z i , . . . , z n ) = 1 (23) for an y i > 0. These prop erties are satisfied, for exam- ple, b y a termination criteria P n i =1 a ( z i ) + a ( z i − 1 ) > c for a scalar function a ( z ) ≥ 0. Define a corresponding tra jectory length function N ( z 0 ) as N ( z 0 ) = min { N 0 ( z 0 ) , N max } for N 0 ( z 0 ) = min  n : b n ( z 0 , . . . , z n ) = 1  (24) 6 With the reversible map F ∆ s and tra jectory length func- tion N of (10) replaced by the generic ones as ab o v e, Algorithm 2 remains a v alid MCMC scheme. This is be- cause the justification of the algorithm (in App endix B) only require a tra jectory length function N to satisfy the short r eturn condition N ( z ∗ ) ≤ N ( z ) where z ∗ = R ◦ F N ( z ) ( z ) (25) and or der pr eserving condition N ( z ) − n ≤ N ( F n ( z )) for any n (26) The in tuition b ehind the terminologies are explained in App endix B along with the pro of of the general VL T- CHMC algorithm. C. Example: Rejection Avoiding HMC Here we illustrate a use of the general VL T-CHMC framew ork through an algorithm of very differen t fla vor from the sp ecial case presented in Section II I A. A step size required for stable numerical integration of Hamilton’s equation (4) can v ary significantly at differ- en t regions of a phase space in some application areas of HMC. 9 In suc h situations, the Hamiltonian may be ap- pro ximately preserved along a simulated tra jectory for a while un til it suddenly starts to deviate wildly , leading to a prop osal with little chance of acceptance. VL T-CHMC pro vides a w ay to “detect” when the tra jectory b ecomes unstable and select an alternate state along the tra jec- tory to transition to. Let F ∆ t b e a v olume-preserving and rev ersible map as in (5), approximating Hamiltonian dynamics. Consider a tra jectory  ( q i , p i ) = F i ∆ t ( q 0 , p 0 )  i =0 , 1 , 2 ,... . When the tra jectory b ecomes unstable, it can b e detected by a tra- jectory termination criteria such as b n = 1  max 0 ≤ i ≤ n H ( q i , p i ) − min 0 ≤ i ≤ n H ( q i , p i ) ≥   (27) where 1 is an indicator function. W e will actually use an alternative criteria b elo w since this leads to a simpler algorithm implementation: b n = 1 n | H ( q i , p i ) − H ( q i − 1 , p i − 1 ) | ≥  for some i = 1 , . . . , n o (28) It is easy to chec k that the criteria (27) and (28) satisfy the prop erties (22) and (23) and define a v alid tra jectory length function N of the form (24) for Algorithm 2. W e refer to the version of VL T-CHMC based on the criteria (28) as r eje ction avoiding HMC . A proposal of rejection av oiding HMC reco vers the usual HMC prop osal with the tra jectory length N max when the fluctuation of a Hamiltonian at each step is within the error tolerance  . How ev er, up on detecting the fluctuation of magnitude larger than  at the step ( q i − 1 , p i − 1 ) → ( q i , p i ), the algorithm pro ceeds to simu- late the tra jectory bac kward from ( q 0 , p 0 ) and ( q ∗ 0 , p ∗ 0 ) = ( q i , − p i ) to determine the sets S and S ∗ according to the rule in Step 2 of Algorithm 2. IV. NUMERICAL RESUL TS A. Geometrically temp ered HMC with va riable step size integrato r HMC is kno wn to hav e a serious difficulty sampling from a multi-modal target density as the p oten tial en- ergy barriers among the mo des preven ts transition from one mo de to another. T o address this issue, Nishimura and Dunson 21 prop ose a v ersion of RMHMC with a mass tensor having the prop ert y | M ( q ) | 1 / 2 ∝ ρ ( q ) 1 − T − 1 (29) with a temp er atur e parameter T ≥ 1. It can b e sho wn that, with suc h a c hoice of a mass tensor, RMHMC algo- rithm is equiv alen t to the usual HMC algorithm (with a constan t mass tensor) applied to a temp ered distribution ˜ ρ ( ˜ q ) ∝ ρ ( q ) 1 /T on a manifold parametrized b y ˜ q . F or this reason, RMHMC with the property (29) is referred to as ge ometric al ly temp er e d HMC (GTHMC) in Ref. 21. The typical velocity of the dynamics (4) at the p osition q is giv en by the operator norm k M ( q ) k − 1 / 2 . This quan- tit y , and in turn the velocity of the dynamics, necessarily b ecomes unboundedly large in the regions where ρ ( q ) is small, due to the constraint (29). F or this reason, the only practical wa y to approximate the dynamics under- lying GTHMC algorithms is through a v ariable step size in tegrator with a step size prop ortional to k M ( q ) k 1 / 2 . W e take an example with a simple bimo dal target densit y from Ref. 21. The density ρ ( q ) is defined as a mixture of t wo-dimensional Gaussians with unit-v ariance cen tered at (4 , 0) and ( − 4 , 0). The mass tensor is chosen as M ( q ) ∝ ρ ( q ) 2 γ ( 1 − T − 1 ) e 1 e T 1 + ρ ( q ) 2(1 − γ )( d − 1) − 1 ( 1 − T − 1 )  I − e 1 e T 1  (30) for d − 1 ≤ γ ≤ 1 where d = 2 is the dimension of q and e 1 = (1 , 0) is a standard basis v ector. The mass tensors suggested in Ref. 12 and 22 hav e apparent resemblance to (30), but the crucial difference is that they do not satisfy (29) and consequen tly offer rather limited improv emen t o ver the standard HMC. W e compare the performance of CHMC and VL T- CHMC with the explicit v ariable step size in tegrator de- v elop ed in Ref. 21. VL T-CHMC is run with the tra jec- tory length function (10). The main challenge in this example to explore the phase space along the first co or- dinate of q due to the multi-modality along this direc- tion. Therefore the efficiency of the sampling algorithms 7 Num b er of steps 5 10 15 20 25 30 35 Acceptance rate 0.48 0.38 0.37 0.36 0.34 0.33 0.33 ESS 75.7 180 145 83.1 103 123 101 T ABLE I. ESS of CHMC along the first co ordinate p er 10 5 force ev aluations at the v arious n umbers of n umerical in tegra- tion steps. The num b er of steps coincides with that of force ev aluations. t 0.50 0.75 1.00 1.25 1.50 1.75 2.00 Num b er of steps 13 17 21 24 27 30 33 Acceptance rate 0.81 0.78 0.76 0.75 0.73 0.72 0.71 ESS 899 966 924 992 925 921 805 T ABLE I I. ESS of VL T-CHMC along the first co ordinate p er 10 5 force ev aluations. The integration time t determines the tra jectory lengths through the termination criteria in (10). is summarized by the effective sample sizes (ESS) along the first co ordinate of q . The ESS’s as w ell as the ac- ceptance probabilities at different parameter settings of CHMC and VL T-CHMC are summarized in T able I and I I. As predicted by our discussion in Section I II A, VL T- CHMC has substantially higher acceptance probabilities and, across v arious parameter settings, is five times more efficien t than CHMC with the optimal parameter c hoice. The time step size ∆ s = . 75 for the v ariable step size inte- grator was used for all the simulations and was c hosen to con trol the error in the Hamiltonian within a reasonable lev el along the tra jectories. ESS’s were computed using the initial monotone sequence estimator of Gey er. 23 B. Rejection avoiding HMC T o illustrate the b enefit of the rejection av oiding algo- rithm described in Section I II C, w e consider the problem of sampling from a probability densit y function ρ ( x, y ) ∝ exp( − U ( x, y )) as plotted in Figure 1. The densit y ρ ( x, y ) is constructed as a (con tinuous) Gaussian mixture ρ ( x, y ) ∝ Z 10 1 1 σ µ exp  − ( x − µ ) 2 2 σ 2 µ − y 2  d µ (31) where σ µ = 0 . 1 + ( µ/ 10) 2 . The densit y has a prop ert y that, along the x -axis, the partial deriv ativ e ∂ y U ( x, y ) v aries substantially and so do es the stable step size for the leap-frog integrator t ypically emplo yed in HMC. F or example, the leap-frog integrator with the step size ∆ t ≥ 0 . 4 approximates the Newton’s equations of mo- tion quite accurately in the region x > 4, while the step size of ∆ t ≈ 0 . 2 is required for a numerically stable ap- pro ximation in the region x < 2. In in practice, such a kno wledge is obviously not av ailable to us and the ap- propriate step size must b e determined empirically from preliminary runs of HMC. A common strategy is to pick a target acceptance rate for the HMC prop osals, t yp- ically in the range 0 . 65 ∼ 0 . 8, and tune the step size accordingly . 9,24,25 This approach would suggest a step x 0 5 10 y -2 -1 0 1 2 0.2 0.4 0.6 0.8 1 FIG. 1. A plot of (unnormalized) probability density func- tion ρ ( x, y ) ∝ exp( − U ( x, y )) used to illustrate the benefit of rejection av oiding HMC. size w ell ab o ve the stability in this example, how ever. Figure 2 shows that the acceptance rate of HMC to b e quite high even for the step size ∆ t = 0 . 4. The accep- tance rate can b e high despite some unstable tra jecto- ries b ecause the region where the approximation b ecome unstable contains relatively small, though not negligible, probabilit y . On the other hand, the p erformance of HMC is severely undermined by the choice of a to o large step size as can b e seen in Figure 3. The ESS’s for 10 6 force ev aluations, estimated from ten indep enden t sim ulations, are shown so that the computational cost is fixed across the exp erimen ts. The error tolerance in Hamiltonian, as in (28), for rejection av oiding HMC is set to  = 3. When ∆ t = 0 . 2, less than 1% of tra jectories exp erience the error in Hamiltonian ab o v e the tolerance, so there is no practical difference b et w een HMC with and without rejection a voidance. How ever, without rejection a void- ance, the ESS is reduced by the factor as large as five when increasing the step size from ∆ t = 0 . 2 to ∆ t = 0 . 3. The p erformance degradation is less severe for rejection a voiding HMC as the algorithm concentrates the compu- tational efforts on the stable portions of approximated tra jectories. In summary , choosing an optimal step size for HMC is difficult in practice as the choice must be made without the detailed knowledge of a target density . A step size can app ear to appro ximate the dynamics accurately but b e ab o ve the stability limit in some regions. Rejection a voiding HMC can alleviate the effect of a sub optimal step size choice and provides far more ESS’s than the standard HMC in such situations. 8 t 0 2 4 6 8 Acceptance rate 0.6 0.7 0.8 0.9 1 ∆ t = 0.2 ∆ t = 0.3 ∆ t = 0.4 FIG. 2. Acceptance rate of HMC prop osals at v arious set- tings of step size and integration time when sampling from the density shown in Figure 1. t 0 2 4 6 8 ESS × 10 4 0 0.5 1 1.5 2 2.5 3 ∆ t = 0.2 ∆ t = 0.3 ∆ t = 0.4 with rejection-avoidance FIG. 3. ESS p er 10 6 force ev aluations at v arious settings of step size and integration time. The ESS’s are for the mean estimation along the x -axis. V. A CKNOWLEDGMENTS W e would like to thank Jiangfeng Lu for his feedbac k on a preliminary draft of the man uscript. App endix A: Derivation of limiting acceptance probabilit y In this section w e analyse the acceptance probability of CHMC and VL T-CHMC algorithms in the sp ecial case of RMHMC with v ariable step size integrators as describ ed in Section I II A. W e deriv e explicit formulas as well as useful bounds on the acceptance probabilities in the limit ∆ s → 0. 1. Acceptance probabilit y of CHMC When approximating a time-rescaled Hamiltonian dy- namics (7) with a rev ersible map F ∆ s as in (8), the ac- ceptance probability of the CHMC prop osal from ( q , p ) is calculated b y the formula 1 ∧ ρ ( R ◦ F n ∆ s ( q , p )) | D F n ∆ s ( q , p ) | ρ ( q , p ) In the limit ∆ s → 0 and n ∆ s → s 0 , the ab o ve quantit y con verges to 1 ∧ ρ ( R ◦ Φ s 0 ( q , p )) | D Φ s 0 ( q , p ) | ρ ( q , p ) where Φ s is the solution operator of the dynamics (7) i.e. d Φ s d s = ( g ◦ Φ s ) f ◦ Φ s (A1) where f = ( ∇ p H , −∇ q H ). W e hav e ρ ◦ Φ s 0 = ρ since Hamiltonian dynamics conserves the energy and so does the time-rescaled dynamics. W e also hav e ρ ◦ R = ρ , so that ρ ( R ◦ Φ s 0 ( q , p )) = ρ ( q , p ). T o establish the limit- ing acceptance probability (9), therefore, it remains to sho w that | D Φ s 0 ( q , p ) | = g ( Φ s 0 ( q , p )) /g ( q , p ). The Ja- cobian D Φ s satisfies a matrix-v alued differential equa- tion ∂ ∂ s D Φ s = D f ◦ Φ s D Φ s and therefore Liouville’s form ula tells us that | D Φ s 0 | = exp Z s 0 0 tr ( D f ◦ Φ s ) d s ! A straightforw ard calculation shows that tr ( D f ◦ Φ s ) = ∂ ∂ s log g ◦ Φ s , from whic h the iden tity | D Φ s 0 | = g ◦ Φ s 0 /g follo ws. 2. Acceptance probabilit y of VL T-CHMC In the deriv ation b elo w, we will follow the notations of Section I II A 2. Namely , w e set ( q ∗ 0 , p ∗ 0 ) = R ◦ F N ( q 0 , p 0 ), ( q i , p i ) = F i ∆ s ( q 0 , p 0 ), and ( q ∗ i , p ∗ i ) = F i ∆ s ( q ∗ 0 , p ∗ 0 ). The tra jectory length function N = N ( t ) is defined as in (10) and the sets S and S ∗ as in Algorithm 2. Note that ( q 0 , p 0 ) is fixed, but other quantities dep end on ∆ s , in- cluding but not limited to ( q i , p i )’s, N ( q 0 , p 0 ), and S . W e do not denote the dep endence explicitly but it is im- plied. W e will show that the acceptance probability of the transition from S to S ∗ con verges to 1 ∧ g ( Φ t ( q 0 , p 0 )) | S ∗ | g ( q 0 , p 0 ) | S | (A2) as ∆ s → 0 while t fixed. Moreo ver, if g ( Φ t ( q 0 , p 0 )) < g ( q 0 , p 0 ), then in the limit ∆ s → 0 we ha ve | S | = 1 and g ( q 0 , p 0 ) g ( Φ t ( q 0 , p 0 )) − 1 ≤ | S ∗ | ≤ g ( q 0 , p 0 ) g ( Φ t ( q 0 , p 0 )) + 1 (A3) 9 The claimed low er bound (21) on the acceptance proba- bilit y follo ws immediately from (A2) and (A3). It is not difficult to sho w that diam( S ) → 0 and diam( S ∗ ) → 0 as ∆ s → 0. This means that the elements of S (and of S ∗ ) collapse to a single state as ∆ s → 0. More precisely , for all − ` ≤ i ≤ r and − r ∗ ≤ j ≤ ` ∗ , F i ∆ s ( q 0 , p 0 ) → ( q 0 , p 0 ) R ◦ F N 0 + j ∆ s ( q 0 , p 0 ) → R ◦ Φ t ( q 0 , p 0 ) (A4) where N 0 = N ( q 0 , p 0 ) and r, `, r ∗ , ` ∗ are defined as in (13) and (15). It follows that ρ  F i ∆ s ( q 0 , p 0 )    D F i ∆ s ( q 0 , p 0 )   → ρ ( q 0 , p 0 ) ρ  R ◦ F N 0 + j ∆ s ( q 0 , p 0 )     D F N 0 + j ∆ s ( q 0 , p 0 )    → ρ ( R ◦ Φ t ( q 0 , p 0 )) | D Φ t ( q 0 , p 0 )) | (A5) By the same argument as in Section A 1, we can show that ρ ( R ◦ Φ t ( q 0 , p 0 )) | D Φ t ( q 0 , p 0 )) | = ρ ( q 0 , p 0 ) g ( Φ t ( q 0 , p 0 )) g ( q 0 , p 0 ) (A6) establishing the claimed formula (A2). W e now turn to the pro of of the inequalit y (A3). The in tuition b ehind the inequality and the pro of b elo w is that the size of the set | S ∗ | is roughly equal to the n umber of interv als of length ∆ s · g ( q ∗ 0 , p ∗ 0 ) that can b e fit inside the interv al ( t, t + ∆ s · g ( q 0 , p 0 )). Denote N ∗ 0 = N ( q ∗ 0 , p ∗ 0 ). By the definition of N ∗ 0 , r ∗ , and ` ∗ , we m ust ha ve N ∗ 0 − 1 X i = − ` ∗ +1 ∆ s 2  g ( q ∗ i − 1 , p ∗ i − 1 ) + g ( q ∗ i , p ∗ i )  < t < N ∗ 0 X i = r ∗ +1 ∆ s 2  g ( q ∗ i − 1 , p ∗ i − 1 ) + g ( q ∗ i , p ∗ i )  (A7) whic h implies that r ∗ X i = − ` ∗ +1 1 2  g ( q ∗ i − 1 , p ∗ i − 1 ) + g ( q ∗ i , p ∗ i )  < 1 2  g ( q ∗ N ∗ 0 − 1 , p ∗ N ∗ 0 − 1 ) + g ( q ∗ N ∗ 0 , p ∗ N ∗ 0 )  (A8) Also by the definition N ∗ 0 , r ∗ , and ` ∗ , we must hav e N ∗ 0 X i = r ∗ +2 ∆ s 2  g ( q ∗ i − 1 , p ∗ i − 1 ) + g ( q ∗ i , p ∗ i )  < t < N ∗ 0 − 1 X i = − ` ∗ ∆ s 2  g ( q ∗ i − 1 , p ∗ i − 1 ) + g ( q ∗ i , p ∗ i )  (A9) whic h implies that 1 2  g ( q ∗ N ∗ 0 − 1 , p ∗ N ∗ 0 − 1 ) + g ( q ∗ N ∗ 0 , p ∗ N ∗ 0 )  < r ∗ +1 X i = − ` ∗ 1 2  g ( q ∗ i − 1 , p ∗ i − 1 ) + g ( q ∗ i , p ∗ i )  (A10) Since diam( S ) → 0 and diam( S ∗ ) → 0 as ∆ s → 0, the inequalities (A8) and (A10) con verge to g ( q ∗ 0 , p ∗ 0 )( | S ∗ | − 1) ≤ g ( q 0 , p 0 ) ≤ g ( q ∗ 0 , p ∗ 0 )( | S ∗ | + 1) (A11) The desired inequality (A3) is obtained by rearranging the terms in the ab o v e inequalit y . Finally , w e turn to the pro of of the fact that | S | → 1 as ∆ s → 0 when g ( q ∗ 0 , p ∗ 0 ) < g ( q 0 , p 0 ). T o this end, we only need to note that all the arguments in the pro of of (A11) remain v alid if w e switc h the role of ( q ∗ i , p ∗ i ), r ∗ , ` ∗ and N ∗ 0 with ( q i , p i ), r , ` and N 0 . This means that the inequalit y (A11) still holds if we switch the role of S ∗ with S and of ( q ∗ 0 , p ∗ 0 ) with ( q 0 , p 0 ), yielding the inequalit y g ( q 0 , p 0 )( | S | − 1) ≤ g ( q ∗ 0 , p ∗ 0 ) ≤ g ( q 0 , p 0 )( | S | + 1) In particular, we hav e | S | ≤ g ( q ∗ 0 , p ∗ 0 ) g ( q 0 , p 0 ) + 1 and hence | S | = 1. App endix B: Justification of VL T-CHMC algorithm As claimed in Section (II I B), Algorithm 2 remains a v alid algorithm when we replace the reversible map F ∆ s with any reversible map and the tra jectory le n gth function N with any function of the form (24). In Section II I A 3, the detailed balance condition of VL T- CHMC was derived using the notations of Algorithm 2. Ho wev er, it is easy to see that the same analysis carries through when we replace the reversible map F ∆ s of Sec- tion I II A with any reversible map as long as the set S and S ∗ satisfies (12). In this section, we establish the last piece in our pro of of the general VL T-CHMC algo- rithm; the prop ert y (12) holds whenev er N satisfies the short-return (25) and order-preserving condition (26). W e consider a generic reversible map F with an asso- ciated in volution R on a general phase space z as well as a generic tra jectory length function N satisfying the short-return and order-preserving condition. Ho wev er, all the notations and definitions directly parallel those in our presentation of the special case of VL T-CHMC in Section I I I A 2. Fix z 0 and denote z i = F i ( z 0 ), z ∗ 0 = R ◦ F N ( z 0 ), and z ∗ i = F i ( z ∗ 0 ). A tra jectory function N determines the sets via the formula S = { z − ` , . . . , z r } and S ∗ =  z ∗ − ` ∗ , . . . , z ∗ r ∗  where `, r, ` ∗ , r ∗ ≥ 0 are de- 10 fined as ` = max  i ≥ 0 : F N ( z − i ) = F N ( z 0 )  r = max  i ≥ 0 : F N ( z i ) = F N ( z 0 )  ` ∗ = max  i ≥ 0 : F N ( z ∗ − i ) = F N ( z ∗ 0 )  r ∗ = max  i ≥ 0 : F N ( z ∗ i ) = F N ( z ∗ 0 )  (B1) T o build the intuition b ehind the pro of, we define a partial ordering  on the phase space as follo ws: z  ˜ z if F i ( z ) = ˜ z for i ≥ 0 (B2) Note that z  ˜ z if and only if R ( ˜ z )  R ( z ), due to the rev ersibility of F . With this notation, the short-return condition can b e expressed as z  R ◦ F N ( z ∗ ) for z ∗ = R ◦ F N ( z ) (B3) The condition (B3) can b e interpreted intuitiv ely as fol- lo ws; according to the tra jectory termination criteria im- p osed b y N , the reverse tra jectory z ∗ 0 , z ∗ 1 , . . . must ter- minate at z 0 or at z i for i > 0 b efore coming all the w ay bac k to z 0 . The order-preserving condition simply amoun ts to F N ( z )  F N ( ˜ z ) if z  ˜ z (B4) W e no w sho w ho w the order-preserving and short- return condition implies (12). By the order-preserving condition, we know that F N ( z − ` )  F N ( z i )  F N ( z r ) (B5) for all − ` ≤ i ≤ r . On the other hand, we ha ve F N ( z − ` ) = F N ( z r ) = R ( z ∗ 0 ) by the definition of ` and r , so it follows that R ◦ F N ( { z − ` , . . . , z r } ) = { z ∗ 0 } . W e now turn to demonstration of R ◦ F N ( S ∗ ) = { z r } . T o this end, it suffices to show R ◦ F N ( z ∗ 0 ) = z r as the definition of ` ∗ and r ∗ com bined with the order- preserving condition implies R ◦ F N ( z ∗ i ) = R ◦ F N ( z ∗ 0 ) for all − ` ∗ ≤ i ≤ r ∗ . Since R ◦ F N ( z r ) = z ∗ 0 , the short- return condition tells us R ◦ F N ( z ∗ 0 ) = z r + k for some k ≥ 0. T o show that k = 0, first observe that an ap- plication of the short-return condition to the state z r + k implies z ∗ 0  R ◦ F N ( z r + k ). On the other hand, the order-preserving condition implies F N ( z r )  F N ( z r + k ) and hence R ◦ F N ( z r + k )  R ◦ F N ( z r ) = z ∗ 0 . The preced- ing inequalities together sho w that R ◦ F N ( z r + k ) = z ∗ 0 . Since r was defined as the largest integer i such that R ◦ F N ( z i ) = z ∗ 0 , it follo ws that k = 0 and R ◦ F N ( z ∗ 0 ) = z r . The remaining relations in (12) as w ell as the fact r ∗ = 0 can b e prov ed similarly with rep eated applications of the short-return and order-preserving prop erties. 1 N. Metrop olis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. T eller, and E. T eller, “Equation of state calculations b y fast com- puting mac hines,” The Journal of Chemical Ph ysics 21 (1953). 2 G. O. Rob erts and J. S. Rosenthal, “Optimal scaling for v arious Metropolis-hastings algorithms,” Statistical Science 16 , 351–367 (2001). 3 J. C. Mattingly , N. S. Pillai, and A. M. Stuart, “Diffusion limits of the random walk Metrop olis algorithm in high dimensions,” The Annals of Applied Probability 22 , 881–930 (2012). 4 N. S. Pillai, A. M. Stuart, and A. H. Thi´ ery , “Optimal scaling and diffusion limits for the Langevin algorithm in high dimen- sions,” The Annals of Applied Probability 22 , 2320–2356 (2012). 5 S. Duane, A. Kennedy , B. J. Pendleton, and D. Ro weth, “Hybrid Monte Carlo,” Physics Letters B 195 , 216 – 222 (1987). 6 B. Shahbaba, S. Lan, W. O. Johnson, and R. M. Neal, “Split Hamiltonian Monte Carlo,” Statistics and Computing 24 , 339– 349 (2013). 7 A. Pakman and L. Paninski, “Auxiliary-v ariable exact Hamilto- nian Mon te Carlo samplers for binary distributions,” in Advanc es in Neur al Information Pr o c essing Systems 26 (2013) pp. 2490– 2498. 8 A. P akman and L. Paninski, “Exact Hamiltonian Monte Carlo for truncated multiv ariate Gaussians,” Journal of Computational and Graphical Statistics 23 , 518–542 (2014). 9 R. M. Neal, “MCMC using Hamiltonian dynamics,” in Handb ook of Markov Chain Monte Carlo (CRC Press). 10 B. Leimkuhler and S. Reic h, “A Metrop olis adjusted nos ´ e-hoover thermostat,” ESAIM: Mathematical Modelling and Numerical Analysis 43 , 743–755 (2009). 11 S. Lan, V. Stathop oulos, B. Shah baba, and M. Girolami, “Marko v chain Mon te Carlo from Lagrangian dynamics,” Journal of Computational and Graphical Statistics 24 , 357–378 (2015). 12 Y. F ang, J. M. Sanz-Serna, and R. D. Skeel, “Compressible gen- eralized hybrid Mon te Carlo,” The Journal of Chemical Physics 140 (2014), h ttp://dx.doi.org/10.1063/1.4874000. 13 M. Girolami and B. Calderhead, “Riemann manifold Langevin and Hamiltonian Monte Carlo metho ds,” Journal of the Royal Statistical So ciet y: Series B (Statistical Metho dology) 73 , 123– 214 (2011). 14 M. D. Hoffman and A. Gelman, “The No-U-T urn Sampler: adap- tively setting path lengths in Hamiltonian Mon te Carlo,” Journal of Mac hine Learnning Research 15 , 1593–1623 (2014). 15 Z. W ang, S. Mohamed, and N. de F reitas, “Adaptive Hamilto- nian and Riemann manifold Mon te Carlo samplers,” in Pr o c e e d- ings of the 30th International Confer enc e on Machine L e arning , V ol. 28 (2013) pp. 1462–1470. 16 C. H. Bennett, “Mass tensor molecular dynamics,” Journal of Computational Ph ysics 19 , 267 – 279 (1975). 17 M. Calvo, M. Lp ez-Marcos, and J. Sanz-Serna, “V ariable step implementation of geometric integrators,” Applied Numerical Mathematics 28 , 1 – 16 (1998). 18 S. Blanes and A. Iserles, “Explicit adaptiv e symplectic integra- tors for solving Hamiltonian systems,” Celestial Mechanics and Dynamical Astronom y 114 , 297–317 (2012). 19 B. Leimkuhler and S. Reich, Simulating Hamiltonian Dynamics (Cambridge University Press, 2005) cambridge Books Online. 20 E. Hairer, C. Lubich, and G. W anner, Ge ometric Numerical Inte gr ation. Structur e-Pr eserving Algorithms for Or dinary Dif- fer ential Equations (Springer-V erlag Berlin Heidelb erg, 2006). 21 A. Nishim ura and D. B. Dunson, “Geometrically tempered Hamiltonian Mon te Carlo,” in preparation. 22 S. Lan, J. Streets, and B. Shahbaba, “W ormhole Hamiltonian Monte Carlo,” in Pr oc e e dings of the Twenty-Eighth AAAI Con- fer enc e on Artificial Intel ligenc e (2014). 23 C. J. Geyer, “Practical marko v chain monte carlo,” Statist. Sci. 7 , 473–483 (1992). 24 A. Beskos, N. Pillai, G. Rob erts, J.-M. Sanz-Serna, and A. Stu- art, “Optimal tuning of the hybrid Monte Carlo algorithm,” Bernoulli 19 , 1501–1534 (2013). 25 Stan Developmen t T eam, Stan Mo deling L anguage Users Guide and Refer enc e Manual, V ersion 2.9.0 (2015).

Original Paper

Loading high-quality paper...

Comments & Academic Discussion

Loading comments...

Leave a Comment