A multi-point Metropolis scheme with generic weight functions


Authors: Luca Martino, Victor Pascual Del Olmo, Jesse Read

Department of Signal Theory and Communications, Universidad Carlos III de Madrid. Avenida de la Universidad 30, 28911, Leganés, Madrid, Spain. luca@tsc.uc3m.es, vpolmo@tsc.uc3m.es, jesse@tsc.uc3m.es

Abstract

The multi-point Metropolis algorithm is an advanced MCMC technique based on drawing several correlated samples at each step and choosing one of them according to some normalized weights. We propose a variation of this technique in which the weight functions are not specified, i.e., their analytic form can be chosen arbitrarily. This has the advantage of greater flexibility in the design of high-performance MCMC samplers. We prove that our method fulfills the detailed balance condition, and provide a numerical simulation. We also give new insight into the functionality of different MCMC algorithms and the connections between them.

Keywords: Multiple Try Metropolis algorithm, Multi-point Metropolis algorithm, MCMC methods

1. Introduction

Monte Carlo statistical methods are powerful tools for numerical inference and stochastic optimization (see Robert and Casella (2004), for instance). Markov chain Monte Carlo (MCMC) methods are classical Monte Carlo techniques that generate samples from a target probability density function (pdf) by drawing from a simpler proposal pdf, usually in order to approximate an integral that cannot be computed analytically (Liu, 2004; Liang et al., 2010). MCMC algorithms produce a Markov chain whose stationary distribution coincides with the target pdf.

The Metropolis-Hastings (MH) algorithm (Metropolis et al., 1953; Hastings, 1970) is the most famous MCMC technique. It can be applied to almost any target distribution. In practice, however, finding a "good" proposal pdf can be difficult. In some applications, the Markov chain generated by the MH algorithm can remain trapped almost indefinitely in a local mode, meaning that, in practice, convergence may not be reached.

The Multiple-Try Metropolis (MTM) method of Liu et al. (2000) is an extension of the MH algorithm in which the next state of the chain is selected among a set of independent and identically distributed (i.i.d.) samples. This enables the MCMC sampler to make large step-size jumps without lowering the acceptance rate; thus MTM can explore a larger portion of the sample space in fewer iterations. An interesting special case of the MTM, well known in the molecular simulation field, is the orientational bias Monte Carlo, described in Chapter 13 of Frenkel and Smit (1996) and Chapter 5 of Liu (2004), where i.i.d. candidates are drawn from a symmetric proposal pdf and one of them is chosen according to weights directly proportional to the target pdf. Here, however, the analytic form of the weight functions is fixed and unalterable.

Casarin et al. (2011) introduced an MTM scheme using different proposal pdfs. In this case the samples produced are independent but not identically distributed. In Qin and Liu (2001), another generalization of the MTM (called the multi-point Metropolis method) is proposed, using correlated candidates at each step. Clearly, the proposal pdfs are also different in this case.
Moreover, in Pandolfi et al. (2010) an extension of the classical MTM technique is introduced where the analytic form of the weights is not specified. In Pandolfi et al. (2010) the same proposal pdf is used to draw all the samples, so that the candidates generated at each step of the algorithm are i.i.d. Further interesting and related considerations about the use of auxiliary variables for building acceptance probabilities within an MH approach can be found in Storvik (2011).

In this paper, we draw from the two approaches of Qin and Liu (2001) and Pandolfi et al. (2010) to create a novel algorithm that selects the new state of the chain among correlated samples using generic weight functions, i.e., the analytic form of the weights can be chosen arbitrarily. Furthermore, we formulate the algorithm and the acceptance rule so as to fulfill the detailed balance condition.

Our method allows more flexibility in the design of efficient MCMC samplers with larger coverage and faster exploration of the sample space. In fact, we can choose any bounded and positive weight functions, to either improve performance or reduce computational complexity, independently of the chosen proposal pdf. Moreover, since in our approach the proposal pdfs are different, adaptive or interacting techniques can be applied, such as those introduced by Andrieu and Moulines (2006) and Casarin et al. (2011). An important advantage of our procedure is that, since each new candidate is drawn from a conditional pdf which depends on the samples generated earlier during the same time step, it constructs an improved proposal by automatically building on the information obtained from the generated samples.

The rest of the paper is organized as follows. In Section 2 we recall the standard multi-point Metropolis algorithm. In Section 3 we introduce our novel scheme with generic weight functions and correlated samples. Section 4 provides a rigorous proof that the novel scheme satisfies the detailed balance condition. A numerical simulation is provided in Section 5 and, finally, in Section 6 we discuss the advantages of our proposed technique and provide insight into the relationships among different MTM schemes in the literature.

2. Multi-point Metropolis algorithm

In the classical MH algorithm, a new possible state is drawn from the proposal pdf and the movement is accepted with a suitable decision rule. In the multi-point approach, several correlated samples are generated and, from these, a "good" one is chosen. Specifically, consider a target pdf $p_o(x)$ known up to a constant (hence, we can evaluate $p(x) \propto p_o(x)$). Given a current state $x \in \mathbb{R}$ (we assume scalar values only for simplicity of treatment), at each step we draw $N$ correlated samples from a sequence of different proposal pdfs $\{\pi_j\}_{j=1}^{N}$, i.e.,

$$y_1 \sim \pi_1(\cdot|x), \quad y_2 \sim \pi_2(\cdot|x, y_1), \quad y_3 \sim \pi_3(\cdot|x, y_1, y_2), \quad \ldots, \quad y_N \sim \pi_N(\cdot|x, y_1, \ldots, y_{N-1}). \qquad (1)$$

Therefore, we can write the joint distribution of the generated samples as

$$q_N(y_1, \ldots, y_N|x) = q_N(y_{1:N}|x) = \pi_1(y_1|x)\,\pi_2(y_2|x, y_1)\cdots\pi_N(y_N|x, y_{1:N-1}), \qquad (2)$$

i.e.,

$$q_N(y_{1:N}|x) = \pi_1(y_1|x)\prod_{j=2}^{N}\pi_j(y_j|x, y_{1:j-1}), \qquad (3)$$

where, for brevity, we use the notation $y_{1:j} \triangleq [y_1, \ldots, y_j]$, while $y_{j:1} \triangleq [y_j, \ldots, y_1]$ denotes the vector with the components in reverse order.
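To illustrate the sequential construction in Eqs. (1)-(3), the following Python sketch draws the $N$ correlated candidates and accumulates $\log q_N(y_{1:N}|x)$. The particular proposal used here (a Gaussian centred on the mean of the conditioning variables) is only an assumption made for this sketch, not a requirement of the method; any sequence of conditional proposals $\pi_j$ could be substituted.

```python
import numpy as np

def draw_candidates(x, N, sigma=1.0, rng=None):
    """Draw y_1,...,y_N as in Eq. (1) and return them with log q_N(y_{1:N}|x), Eq. (3).

    Illustrative choice: pi_j(.|x, y_{1:j-1}) is Gaussian centred on the mean of
    the conditioning variables (an assumption of this sketch only).
    """
    rng = np.random.default_rng() if rng is None else rng
    cond = [x]            # conditioning variables of pi_j: x, y_1, ..., y_{j-1}
    ys, log_q = [], 0.0
    for j in range(1, N + 1):
        mu = np.mean(cond)                # centre of pi_j(.|x, y_{1:j-1})
        y = rng.normal(mu, sigma)         # y_j ~ pi_j(.|x, y_{1:j-1})
        # add log pi_j(y_j | x, y_{1:j-1}) to log q_N, following Eq. (3)
        log_q += -0.5 * ((y - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))
        ys.append(y)
        cond.append(y)
    return np.array(ys), log_q
```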
A "good" candidate among the generated samples is then chosen according to weight functions $\omega_j(z_1, \ldots, z_{j+1}): \mathbb{R}^{j+1} \to \mathbb{R}^{+}$, where $z_1, \ldots, z_{j+1}$ are generic variables and $j = 1, \ldots, N$. The specific analytic form of the weights needed in this technique is

$$\omega_j(z_1, \ldots, z_{j+1}) \triangleq p(z_1)\,\pi_1(z_2|z_1)\cdots\pi_j(z_{j+1}|z_{1:j})\,\lambda_j(z_1, \ldots, z_{j+1}), \qquad (4)$$

where $p(x) \propto p_o(x)$ is the target pdf and $\lambda_j$ can be any bounded, positive, and sequentially symmetric function, i.e.,

$$\lambda_j(z_1, z_{2:j+1}) = \lambda_j(z_{j+1:2}, z_1). \qquad (5)$$

Note that, since $q_j(z_{2:j+1}|z_1) = \pi_1(z_2|z_1)\cdots\pi_j(z_{j+1}|z_{1:j})$ (see Eq. (2)), we can rewrite the weight functions as

$$\omega_j(z_1, \ldots, z_{j+1}) = p(z_1)\,q_j(z_{2:j+1}|z_1)\,\lambda_j(z_1, z_{2:j+1}). \qquad (6)$$

2.1. Algorithm

Given a current state $x = x_t$, the multi-point Metropolis algorithm consists of the following steps:

1. Draw $N$ samples $y_{1:N} = [y_1, y_2, \ldots, y_N]$ from the joint pdf
$$q_N(y_{1:N}|x) = \pi_1(y_1|x)\prod_{j=2}^{N}\pi_j(y_j|x, y_{1:j-1}),$$
namely, draw $y_j$ from $\pi_j(\cdot|x, y_{1:j-1})$, for $j = 1, \ldots, N$.

2. Calculate the weights $\omega_j(y_{j:1}, x)$ as in Eq. (4), and normalize them to obtain $\bar{\omega}_j$, $j = 1, \ldots, N$.

3. Draw $y = y_k \in \{y_1, \ldots, y_N\}$ according to the weights $\bar{\omega}_1, \ldots, \bar{\omega}_N$.

4. Set
$$x^*_1 = y_{k-1}, \quad x^*_2 = y_{k-2}, \quad \ldots, \quad x^*_{k-1} = y_1, \qquad (7)$$
and finally $x^*_k = x$. Then draw the other "reference" samples
$$x^*_j \sim \pi_j(\cdot|y, x^*_{1:j-1}), \qquad (8)$$
for $j = k+1, \ldots, N$. Note that for $j = k+1$ we have $\pi_j(\cdot|y, x^*_{1:j-1}) = \pi_j(\cdot|y, x^*_1 = y_{k-1}, \ldots, x^*_{k-1} = y_1, x^*_k = x)$ and, for $j = k+2, \ldots, N$, we have $\pi_j(\cdot|y, x^*_{1:j-1}) = \pi_j(\cdot|y, x^*_1 = y_{k-1}, \ldots, x^*_{k-1} = y_1, x^*_k = x, x^*_{k+1:j-1})$.

5. Compute $\omega_j(x^*_{j:1}, y)$ as in Eq. (4).

6. Let $x_{t+1} = y_k$ with probability
$$\alpha = \min\left[1, \frac{\sum_{j=1}^{N}\omega_j(y_{j:1}, x)}{\sum_{j=1}^{N}\omega_j(x^*_{j:1}, y)}\right], \qquad (9)$$
otherwise set $x_{t+1} = x$ with probability $1 - \alpha$.

7. Set $t = t + 1$ and repeat from step 1.

The kernel of this technique satisfies the detailed balance condition, as shown in Qin and Liu (2001). However, to fulfill this condition, the algorithm requires that the weights are defined exactly with the form in Eq. (4).
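For concreteness, the weight of Eq. (6) could be evaluated in log scale as in the sketch below (this is our own illustration, not part of the original formulation). Here `log_p`, `log_pi` and `log_lambda` are assumed user-supplied callables for $\log p(\cdot)$, $\log\pi_i(\cdot|\cdot)$ and a sequentially symmetric $\log\lambda_j$; the default $\lambda_j \equiv 1$ trivially satisfies Eq. (5).

```python
def log_weight(z, log_p, log_pi, log_lambda=lambda z: 0.0):
    """Log of the standard multi-point weight of Eq. (6).

    z = [z_1, ..., z_{j+1}], with z_1 the most recently generated point.
    log_pi(i, value, cond) is assumed to return log pi_i(value | cond),
    where cond = [z_1, ..., z_i].
    """
    j = len(z) - 1
    # log q_j(z_{2:j+1} | z_1) = sum_{i=1}^{j} log pi_i(z_{i+1} | z_1, ..., z_i)
    log_q = sum(log_pi(i, z[i], z[:i]) for i in range(1, j + 1))
    return log_p(z[0]) + log_q + log_lambda(z)
```

In step 2 above, $\omega_j(y_{j:1}, x)$ corresponds to calling this function with $z = [y_j, \ldots, y_1, x]$; in step 5, with $z = [x^*_j, \ldots, x^*_1, y]$.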
3. Extension with generic weight functions

Now, we consider generic weight functions $\omega_j(z_1, \ldots, z_{j+1}): \mathbb{R}^{j+1} \to \mathbb{R}^{+}$ that have to be (a) bounded and (b) positive. In this case, the algorithm can be described as follows.

1. Draw $N$ samples $y_{1:N} = [y_1, y_2, \ldots, y_N]$ from the joint pdf
$$q_N(y_{1:N}|x) = \pi_1(y_1|x)\prod_{j=2}^{N}\pi_j(y_j|x, y_{1:j-1}),$$
namely, draw $y_j$ from $\pi_j(\cdot|x, y_{1:j-1})$, for $j = 1, \ldots, N$.

2. Choose some suitable (bounded and positive) weight functions. Then calculate each weight $\omega_j(y_{j:1}, x)$ and normalize them to obtain $\bar{\omega}_j$, $j = 1, \ldots, N$.

3. Draw $y = y_k \in \{y_1, \ldots, y_N\}$ according to $\bar{\omega}_1, \ldots, \bar{\omega}_N$, and set $\bar{W}_y = \bar{\omega}_k$, i.e.,
$$\bar{W}_y \triangleq \frac{\omega_k(y_{k:1}, x)}{\sum_{j=1}^{N}\omega_j(y_{j:1}, x)}. \qquad (10)$$

4. Set
$$x^*_1 = y_{k-1}, \quad x^*_2 = y_{k-2}, \quad \ldots, \quad x^*_{k-1} = y_1, \qquad (11)$$
and finally $x^*_k = x$. Then draw the remaining "reference" samples
$$x^*_j \sim \pi_j(\cdot|y, x^*_{1:j-1}), \qquad (12)$$
for $j = k+1, \ldots, N$.

5. Compute the generic weights $\omega_j(x^*_{j:1}, y)$ and calculate the normalized weight
$$\bar{W}_x \triangleq \frac{\omega_k(x^*_{k:1}, y)}{\sum_{j=1}^{N}\omega_j(x^*_{j:1}, y)}. \qquad (13)$$

6. Set $x_{t+1} = y_k$ with probability
$$\alpha = \min\left[1, \frac{p(y)\,\pi_1(x^*_1|y)\,\pi_2(x^*_2|y, x^*_1)\cdots\pi_k(x^*_k|y, x^*_1, \ldots, x^*_{k-1})}{p(x)\,\pi_1(y_1|x)\,\pi_2(y_2|x, y_1)\cdots\pi_k(y_k|x, y_1, \ldots, y_{k-1})}\,\frac{\bar{W}_x}{\bar{W}_y}\right]. \qquad (14)$$
We can rewrite this in a more compact form as
$$\alpha = \min\left[1, \frac{p(y)\,q_k(x^*_{1:k}|y)}{p(x)\,q_k(y_{1:k}|x)}\,\frac{\bar{W}_x}{\bar{W}_y}\right], \qquad (15)$$
where we recall that
$$q_k(y_{1:k}|x) = \pi_1(y_1|x)\prod_{j=2}^{k}\pi_j(y_j|x, y_{1:j-1}), \qquad (16)$$
and $k$ is the index of the chosen sample $y_k$. Otherwise, set $x_{t+1} = x$ with probability $1 - \alpha$.

7. Set $t = t + 1$ and repeat from step 1.

We emphasise that in the algorithm above we have not specifically defined the weight functions.

3.1. Examples of weight functions

The weight functions must be bounded and positive. The choice can depend on criteria such as improving performance or reducing computational complexity. If the target density is bounded, two possibilities are
$$\omega_j(z_1, \ldots, z_{j+1}) = p(z_1), \qquad (17)$$
or
$$\omega_j(z_1, \ldots, z_{j+1}) = p(z_1)\,p(z_2)\cdots p(z_{j+1}), \qquad (18)$$
with $j = 1, \ldots, N$. Other possible choices are
$$\omega_j(z_1, \ldots, z_{j+1}) = \left[\frac{p(z_1)}{q_j(z_{1:j}|z_{j+1})}\right]^{\theta}, \qquad (19)$$
where $\theta > 0$ is a positive constant, or
$$\omega_j(z_1, \ldots, z_{j+1}) = \frac{p(z_j)}{q_1(z_j|z_{j+1})}\,\frac{p(z_{j-1})}{q_2(z_{j-1:j}|z_{j+1})}\cdots\frac{p(z_1)}{q_j(z_{1:j}|z_{j+1})}, \qquad (20)$$
and a third possible choice is
$$\omega_j(z_1, \ldots, z_{j+1}) = p(z_1)\,\pi_{j+1}(z_1|z_{j+1:2}), \qquad (21)$$
where $\pi_{j+1}(z_1|z_{j+1:2})$ is the $(j+1)$-th proposal pdf used in step 1 of the algorithm. It is important to remark that the $z$-variables are ordered such that $z_1$ is the most recently generated sample, $z_j$ is the first drawn sample, and $z_{j+1}$ represents the previous state of the chain.

Clearly, owing to the great flexibility in the construction of the weight functions, it can be difficult to assert which choice is best in terms of the performance of the algorithm. However, in general, including more statistical information in the weights can improve performance yet, at the same time, increases the computational cost of the designed technique. More specific theoretical or empirical studies are needed to clarify this issue. Indeed, observe that the question of the best selection of the weights remains unclear even for the classical MTM of Liu et al. (2000), as well as for the method in Pandolfi et al. (2010), for instance.
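To make the steps above concrete, the following sketch implements one iteration of the proposed scheme under illustrative assumptions: the weights are the simple choice of Eq. (17), $\omega_j(z_1, \ldots, z_{j+1}) = p(z_1)$, and `p`, `draw_pi(j, cond)` and `log_pi(j, value, cond)` are user-supplied callables for the unnormalized target and the sequence of proposal pdfs (with `cond` the list of conditioning variables). This is a minimal sketch, not a reference implementation.

```python
import numpy as np

def mp_generic_step(x, N, p, draw_pi, log_pi, rng=None):
    """One iteration of the multi-point scheme with generic weights (Section 3),
    using the weight choice w_j = p(z_1) of Eq. (17)."""
    rng = np.random.default_rng() if rng is None else rng
    # Step 1: draw N correlated candidates y_j ~ pi_j(.|x, y_{1:j-1})
    ys = []
    for j in range(1, N + 1):
        ys.append(draw_pi(j, [x] + ys))
    # Steps 2-3: weights w_j = p(y_j), select y_k, keep the normalized weight W_y (Eq. 10)
    w_y = np.array([p(yi) for yi in ys])
    k0 = rng.choice(N, p=w_y / w_y.sum())      # zero-based index, k = k0 + 1
    y = ys[k0]
    W_y = w_y[k0] / w_y.sum()
    # Step 4: reference points x*_1,...,x*_k are fixed, the remaining ones are drawn
    x_star = ys[:k0][::-1] + [x]               # [y_{k-1}, ..., y_1, x]
    for j in range(k0 + 2, N + 1):
        x_star.append(draw_pi(j, [y] + x_star))
    # Step 5: normalized weight W_x (Eq. 13), same weight choice
    w_x = np.array([p(z) for z in x_star])
    W_x = w_x[k0] / w_x.sum()
    # Step 6: acceptance probability of Eq. (15)
    log_qk_y = sum(log_pi(j, ys[j - 1], [x] + ys[:j - 1]) for j in range(1, k0 + 2))
    log_qk_x = sum(log_pi(j, x_star[j - 1], [y] + x_star[:j - 1]) for j in range(1, k0 + 2))
    ratio = np.exp(np.log(p(y)) + log_qk_x - np.log(p(x)) - log_qk_y) * (W_x / W_y)
    return y if rng.uniform() < min(1.0, ratio) else x
```

Iterating this step produces the Markov chain; with a different bounded and positive weight choice, only the lines computing `w_y` and `w_x` would change.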
3.2. Relationship with the independent multiple-try scheme

In Pandolfi et al. (2010), i.i.d. candidates are proposed at each time step. The acceptance probability $\alpha$ in Eqs. (14)-(15) may appear similar to the acceptance probability in Pandolfi et al. (2010). However, the expression of $\alpha$ in Eq. (15) differs from the acceptance probability in Pandolfi et al. (2010) for two main reasons: (a) the first factor, $\frac{p(y)\,q_k(x^*_{1:k}|y)}{p(x)\,q_k(y_{1:k}|x)}$, is distinct (see Eq. (16)); and (b) the definition and computation of $\bar{W}_x$ and $\bar{W}_y$ (see Eqs. (10) and (13)) are also different, since here the weight functions take into account the samples previously generated within the same time step.

If we set $\pi_j(y_j|x, y_{1:j-1}) = \pi(y_j|x)$ for all $j = 1, \ldots, N$, then the steps of our algorithm coincide exactly with those of the technique in Pandolfi et al. (2010), except for step 4. Indeed, the way of choosing the "reference" points differs in the two methods (in our case, some of them are fixed, while in Pandolfi et al. (2010) all the reference points are chosen randomly). A specular difference can be found between the methods in Liu et al. (2000) and Qin and Liu (2001).

3.3. Multi-point Metropolis as a specific case

When the weight functions are chosen as in Eq. (6), i.e.,
$$\omega_j(z_1, \ldots, z_{j+1}) = p(z_1)\,q_j(z_{2:j+1}|z_1)\,\lambda_j(z_1, \ldots, z_{j+1}),$$
where
$$\lambda_j(z_1, z_{2:j+1}) = \lambda_j(z_{j+1:2}, z_1) \qquad (22)$$
is sequentially symmetric, our scheme coincides exactly with the standard multi-point Metropolis method in Qin and Liu (2001). Indeed, first of all we can rewrite the expression (15) as
$$\alpha = \min\left[1, \frac{p(y)\,q_k(x^*_{1:k}|y)}{p(x)\,q_k(y_{1:k}|x)}\,\frac{\omega_k(x^*_{k:1}, y)}{\omega_k(y_{k:1}, x)}\,\frac{\sum_{j=1}^{N}\omega_j(y_{j:1}, x)}{\sum_{j=1}^{N}\omega_j(x^*_{j:1}, y)}\right]. \qquad (23)$$
Then, recalling Eq. (11), i.e., $x^*_1 = y_{k-1}$, $x^*_2 = y_{k-2}$, $\ldots$, $x^*_{k-1} = y_1$, $x^*_k = x$ and $y = y_k$, the two weights $\omega_k(x^*_{k:1}, y)$ and $\omega_k(y_{k:1}, x)$ can be expressed exactly as
$$\omega_k(x^*_{k:1}, y) = \omega_k(x^*_k = x, x^*_{k-1} = y_1, \ldots, x^*_1 = y_{k-1}, y = y_k) = p(x)\,q_k(y_{1:k}|x)\,\lambda_k(x, y_{1:k}),$$
and
$$\omega_k(y_{k:1}, x) = \omega_k(y_k = y, y_{k-1} = x^*_1, \ldots, y_1 = x^*_{k-1}, x = x^*_k) = p(y)\,q_k(x^*_{1:k}|y)\,\lambda_k(y, x^*_{1:k}),$$
respectively. Therefore, replacing the weights $\omega_k(x^*_{k:1}, y)$ and $\omega_k(y_{k:1}, x)$ in Eq. (23), we obtain
$$\alpha = \min\left[1, \frac{\lambda_k(x, y_{1:k})}{\lambda_k(y, x^*_{1:k})}\,\frac{\sum_{j=1}^{N}\omega_j(y_{j:1}, x)}{\sum_{j=1}^{N}\omega_j(x^*_{j:1}, y)}\right] = \min\left[1, \frac{\sum_{j=1}^{N}\omega_j(y_{j:1}, x)}{\sum_{j=1}^{N}\omega_j(x^*_{j:1}, y)}\right],$$
which coincides with the acceptance probability in Eq. (9) of the standard multi-point Metropolis algorithm. Note that we have used $\lambda_k(x, y_{1:k}) = \lambda_k(y, x^*_{1:k})$. Indeed, since $x^*_k = x$, we can write $\lambda_k(y, x^*_{1:k}) = \lambda_k(y, x^*_{1:k-1}, x)$; then, because $x^*_{1:k-1} = y_{k-1:1}$, we obtain $\lambda_k(y, x^*_{1:k}) = \lambda_k(y, y_{k-1:1}, x)$; and, since $y = y_k$, finally $\lambda_k(y, x^*_{1:k}) = \lambda_k(y_{k:1}, x)$, so that the required equality $\lambda_k(x, y_{1:k}) = \lambda_k(y_{k:1}, x)$ is exactly the sequential symmetry condition assumed in Eq. (22).

In the following, we show that the proposed technique satisfies the detailed balance condition.

4. Proof of the detailed balance condition

To guarantee that a Markov chain generated by an MCMC method converges to the target distribution $p(x) \propto p_o(x)$, it suffices that the kernel $A(y|x)$ of the corresponding algorithm fulfills the detailed balance condition
$$p(x)\,A(y|x) = p(y)\,A(x|y).$$
(Note that the detailed balance condition is sufficient but not necessary: detailed balance ensures invariance, but the converse is not true. Markov chains that satisfy the detailed balance condition are called reversible.) First of all, we have to find the kernel $A(y|x)$ of the algorithm, i.e., the conditional probability of moving from $x$ to $y$. For simplicity, we consider the case $x \neq y$ (the case $x = y$ is trivial). The kernel can be expressed as
$$A(y = y_k|x) = \sum_{j=1}^{N} h(y = y_k|x, k = j), \qquad (24)$$
where $h(y = y_k|x, k = j)$ is the probability of accepting $x_{t+1} = y_k$ given $x_t = x$ when the chosen sample $y_k$ is the $j$-th candidate, i.e., when $y_k = y_j$. In the sequel, we study a single $h(y = y_k|x, k)$ for a generic $k \in \{1, \ldots, N\}$.
Indeed, if $h(y = y_k|x, k)$ fulfills the detailed balance condition (i.e., it is symmetric w.r.t. $x$ and $y$), then $A(y|x)$ also satisfies detailed balance, because it is a sum of symmetric functions. Therefore, we want to show that
$$p(x)\,h(y|x, k) = p(y)\,h(x|y, k),$$
for a generic $k \in \{1, \ldots, N\}$. Following the steps of the algorithm above, we can write
$$p(x)\,h(y|x, k) = p(x)\int\cdots\int\left[\prod_{j=1}^{N}\pi_j(y_j|x, y_{1:j-1})\right]\frac{\omega_k(y_{k:1}, x)}{\sum_{j=1}^{N}\omega_j(y_{j:1}, x)}\left[\prod_{i=k+1}^{N}\pi_i(x^*_i|y, x^*_{1:i-1})\right]\min\left[1, \frac{p(y)\,q_k(x^*_{1:k}|y)}{p(x)\,q_k(y_{1:k}|x)}\,\frac{\bar{W}_x}{\bar{W}_y}\right]\,dy_{1:k-1}\,dy_{k+1:N}\,dx^*_{k+1:N}.$$
Note that each factor inside the integral corresponds to a step of the method described in the previous section, and the integral is over all auxiliary variables. Recalling the definitions of the joint probability $q_k(y_{1:k}|x)$ and of $\bar{W}_y$, the expression can be simplified to
$$p(x)\,h(y|x, k) = p(x)\int\cdots\int q_k(y_{1:k}|x)\left[\prod_{j=k+1}^{N}\pi_j(y_j|x, y_{1:j-1})\right]\bar{W}_y\left[\prod_{i=k+1}^{N}\pi_i(x^*_i|y, x^*_{1:i-1})\right]\min\left[1, \frac{p(y)\,q_k(x^*_{1:k}|y)}{p(x)\,q_k(y_{1:k}|x)}\,\frac{\bar{W}_x}{\bar{W}_y}\right]\,dy_{1:k-1}\,dy_{k+1:N}\,dx^*_{k+1:N},$$
and, rearranging the factors, we obtain
$$p(x)\,h(y|x, k) = \int\cdots\int p(x)\,q_k(y_{1:k}|x)\,\bar{W}_y\left[\prod_{j=k+1}^{N}\pi_j(y_j|x, y_{1:j-1})\right]\left[\prod_{i=k+1}^{N}\pi_i(x^*_i|y, x^*_{1:i-1})\right]\min\left[1, \frac{p(y)\,q_k(x^*_{1:k}|y)\,\bar{W}_x}{p(x)\,q_k(y_{1:k}|x)\,\bar{W}_y}\right]\,dy_{1:k-1}\,dy_{k+1:N}\,dx^*_{k+1:N}.$$
Now, we multiply both arguments of the function $\min[\cdot, \cdot]$ by the factor $p(x)\,q_k(y_{1:k}|x)\,\bar{W}_y$, so that
$$p(x)\,h(y|x, k) = \int\cdots\int\left[\prod_{j=k+1}^{N}\pi_j(y_j|x, y_{1:j-1})\right]\left[\prod_{i=k+1}^{N}\pi_i(x^*_i|y, x^*_{1:i-1})\right]\min\Big[p(x)\,q_k(y_{1:k}|x)\,\bar{W}_y,\; p(y)\,q_k(x^*_{1:k}|y)\,\bar{W}_x\Big]\,dy_{1:k-1}\,dy_{k+1:N}\,dx^*_{k+1:N}.$$
It is now straightforward to see that the expression above is symmetric in $x$ and $y$: if we exchange the roles of $x$ and $y$, and of $x^*_i$ and $y_j$, respectively, the expression does not vary. Then we can write
$$p(x)\,h(y|x, k) = p(y)\,h(x|y, k). \qquad (25)$$
We can repeat the same development for each $k$, obtaining
$$p(x)\,A(y|x) = p(y)\,A(x|y), \qquad (26)$$
which is the detailed balance condition. Therefore, the generated Markov chain converges to our target pdf.

5. Toy example

We now provide a simple numerical simulation to show an example of a multi-point scheme with generic weight functions and to compare it with the technique in Pandolfi et al. (2010). Let $X \in \mathbb{R}$ be a random variable (we consider a scalar variable only to simplify the treatment; all the considerations and algorithms remain valid for multi-dimensional variables) with bimodal pdf
$$p_o(x) \propto p(x) = \exp\left\{-(x^2 - 4)^2/4\right\}. \qquad (27)$$
Our goal is to draw samples from $p_o(x)$ using our proposed multi-point technique. We consider Gaussian densities as proposal pdfs (a standard choice),
$$\pi_j(y_j|x_t, y_{1:j-1}) \propto \exp\left\{-\frac{(y_j - \mu_j)^2}{2\sigma^2}\right\}, \qquad (28)$$
where we use $\sigma^2 = 1$ and
$$\mu_j = \frac{\gamma_1}{j-1}\left(x_t + y_1 + \ldots + y_{j-2}\right) + \gamma_2\,y_{j-1}, \qquad (29)$$
i.e., $\mu_j$ is a weighted mean ($\gamma_1 + \gamma_2 = 1$) of the previous state $x_t$ and the samples generated previously within the same time step. Specifically, we set $\gamma_1 = 0.2$ and $\gamma_2 = 0.8$.
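A minimal sketch of the toy target of Eq. (27) and of the sequential Gaussian proposal of Eqs. (28)-(29) is given below. The handling of the case $j = 1$ (no previously drawn candidates, where Eq. (29) does not apply directly) is our own assumption, chosen to be consistent with the weighted-mean interpretation above; the function names are illustrative.

```python
import numpy as np

GAMMA1, GAMMA2, SIGMA = 0.2, 0.8, 1.0

def p(x):
    """Unnormalized bimodal target pdf of Eq. (27)."""
    return np.exp(-((x ** 2 - 4.0) ** 2) / 4.0)

def proposal_mean(x_t, prev):
    """Mean mu_j of Eq. (29); prev = [y_1, ..., y_{j-1}] are the candidates
    already drawn within the current time step."""
    if not prev:                                  # j = 1: assumption, mu_1 = x_t
        return x_t
    older = [x_t] + list(prev[:-1])               # x_t, y_1, ..., y_{j-2}
    return GAMMA1 * np.mean(older) + GAMMA2 * prev[-1]

def draw_pi(j, cond, rng=None):
    """Draw y_j ~ pi_j(.|x_t, y_{1:j-1}) of Eq. (28); cond = [x_t, y_1, ..., y_{j-1}]."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.normal(proposal_mean(cond[0], cond[1:]), SIGMA)
```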
Moreover, we choose very simple weight functions, depending only on the first variable and on the target pdf,
$$\omega^{(1)}_j(z_1, z_2, \ldots, z_{j+1}) = [p(z_1)]^{\theta}, \qquad (30)$$
with $\theta = 1/2$. Note that $p(\cdot)$ is bounded and positive (since it is a pdf). This kind of weight cannot be used in the multi-point scheme of Qin and Liu (2001), except for $\theta = 1$ and using a specific sequence of proposal pdfs. Moreover, for $\theta = 1$ this weight function can also be used in a standard MTM of Liu et al. (2000) if the chosen proposal density $\pi(y|x)$ is symmetric (i.e., $\pi(y|x) = \pi(x|y)$), choosing $\lambda(x, y) = \frac{1}{\pi(x|y)}$. We also compare the performance of the proposed algorithm with the weights
$$\omega^{(2)}_j(z_1, \ldots, z_{j+1}) = p(z_1)\,p(z_2)\cdots p(z_{j+1}), \qquad (31)$$
and
$$\omega^{(3)}_j(z_1, \ldots, z_{j+1}) = p(z_1)\,\pi_{j+1}(z_1|z_{j+1:2}). \qquad (32)$$

Then, we run the proposed multi-point algorithm with different numbers $N$ of candidates and calculate the estimated acceptance rate (the averaged probability of accepting a movement) and the linear correlation coefficient (between one state of the chain and the next). We also run the method in Pandolfi et al. (2010) with proposal pdf $\pi(y_j|x_t) \propto \exp\left\{-\frac{(y_j - x_t)^2}{2\sigma^2}\right\}$ and compare the performance, using the weight functions of Eq. (30) and the third type in Eq. (32). Because in that scheme the samples are generated independently, we do not compare using the weights in Eq. (31), as statistically this no longer makes sense. Moreover, observe that in the scheme of Pandolfi et al. (2010) (where the candidates are drawn independently), the weight functions in Eq. (32) become $\omega^{(3)}(y_j, x_{t-1}) = p(y_j)\,\pi(y_j|x_{t-1})$, where $x_{t-1}$ is the previous state of the chain (we remove the subscript $j$ because in Pandolfi et al. (2010) the analytic form of the weights is the same for each generated sample $y_j$, $j = 1, \ldots, N$). Note also that this particular choice of weights $\omega^{(3)}$ can be used in the standard MTM of Liu et al. (2000) (by choosing $\lambda(x, y) = \frac{1}{\pi(y|x)\,\pi(x|y)}$) and, in this case, the technique of Pandolfi et al. (2010) coincides with a standard MTM.

Figure 1(a) depicts the target density $p_o(x)$ (solid line) and the normalized histogram of 100,000 samples drawn using our proposed scheme with $N = 10$. Figures 1(b)-(c) illustrate the mean acceptance probability and the estimated correlation coefficient (for different values of $N$, averaged over 5,000 runs) of the two techniques with different choices of weights: our method is shown with squares for $\omega^{(1)}_j$, with a solid line for $\omega^{(2)}_j$, and with circles for $\omega^{(3)}_j$. The performance of the method in Pandolfi et al. (2010) is depicted with a dashed line for the first choice $\omega^{(1)}_j$ and a dotted line with triangles for $\omega^{(3)}_j$.

We can see that, although the proposed technique always attains smaller acceptance rates, the resulting correlations are always smaller than those obtained by the other method, except when using the weights $\omega^{(2)}_j$ in Eq. (31). Moreover, the best results are obtained with the proposed technique using the weights $\omega^{(3)}_j$ in Eq. (32). In this case, the correlation decreases as $N$ increases, down to 0.72 with $N = 100$.
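For reference, the three weight choices of Eqs. (30)-(32) could be coded as follows, reusing `p`, `proposal_mean` and `SIGMA` from the sketch above; the argument `z` follows the ordering convention of Section 3.1 ($z_1$ the most recently generated sample, $z_{j+1}$ the previous state of the chain). This is only an illustrative rendering of the formulas.

```python
import numpy as np

def w1(z, theta=0.5):
    """Eq. (30): depends only on the most recent variable z_1."""
    return p(z[0]) ** theta

def w2(z):
    """Eq. (31): product of the target evaluated at all the variables."""
    return np.prod([p(zi) for zi in z])

def w3(z):
    """Eq. (32): p(z_1) times the (j+1)-th proposal density evaluated at z_1,
    conditioned on the remaining variables in reversed order, z_{j+1:2}."""
    cond = z[1:][::-1]                      # [z_{j+1}, z_j, ..., z_2]
    mu = proposal_mean(cond[0], cond[1:])   # mean of pi_{j+1}(.|z_{j+1:2}), Eq. (29)
    dens = np.exp(-0.5 * ((z[0] - mu) / SIGMA) ** 2) / (SIGMA * np.sqrt(2.0 * np.pi))
    return p(z[0]) * dens
```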
Figure 1: (a) The target density $p_o(x)$ (solid line) and the normalized histogram of the samples generated using the proposed scheme with $N = 10$. (b) The mean acceptance probability of jumping to a new state, as a function of the number of tries $N$. We show the results of the technique in Pandolfi et al. (2010) (dashed line for the weights $\omega^{(1)}_j$ and dotted line with triangles for $\omega^{(3)}_j$) and of our method (squares for $\omega^{(1)}_j$, solid line for $\omega^{(2)}_j$, and circles for $\omega^{(3)}_j$). (c) Estimated linear correlation coefficient as a function of the number of tries $N$ for the different techniques.

6. Discussion

In this work, we have introduced a Metropolis scheme with multiple correlated points where the weight functions are not specifically defined, i.e., their analytic form can be chosen arbitrarily. We have proved that our novel scheme satisfies the detailed balance condition. Our approach draws from two different approaches (Pandolfi et al., 2010; Qin and Liu, 2001) to form a novel, efficient and flexible multi-point scheme.

The multi-point approach with correlated samples provides different advantages over the standard MTM. For instance, the multi-point procedure can iteratively improve the proposal pdfs in two different ways. Firstly, since the proposal pdfs can be distinct, as in Casarin et al. (2011), it is possible to tune the parameters of each proposal at every time step. Secondly, since the candidates are generated sequentially, successive proposal pdfs can be improved by learning from the samples already produced during the same time step.

Moreover, in our technique the only constraints on the weight functions are that they must be bounded and positive, unlike in the existing multi-point Metropolis algorithm (Qin and Liu, 2001), which is based on a specific definition of the weight functions. Here the weights can be chosen with respect to criteria such as improving performance or reducing computational complexity. Thus our method avoids any need to control or check the existence of a suitable function $\lambda$ and, therefore, the selection of the weight functions is broader and easier. It is interesting to observe that, in general, the function $\lambda$ may depend on the proposal pdf for a specific choice of weights and, in some cases, may entail certain constraints on the proposal pdf (such as requiring it to be symmetric, for instance). An important consequence of this is that, in our scheme, the weights can be chosen independently of the specific proposal pdf used in the algorithm. Namely, the proposal distribution and the weight functions can be selected separately, so as to fit the specific problem well and to improve the performance of the technique. However, further theoretical or empirical studies are needed to determine the best choice of weight functions given a certain proposal and target density.

Furthermore, unlike in Pandolfi et al. (2010), in our method the weights can depend on the previous candidates, and the domain of the weight functions grows from $\mathbb{R}^2$ to $\mathbb{R}^N$, thus being more general and potentially more powerful. Figure 2 illustrates the relationships among different MTM schemes according to the flexibility in the choice of the proposal and weight functions.
Finally, we have also shown a numerical simulation and a simple multi-point scheme that provides good performance, reducing the correlation in the produced chain.

Figure 2: Comparison of different MTM schemes in the literature according to the flexibility in the choice of the proposal and weight functions. With the acronym OBMC we indicate the orientational bias Monte Carlo introduced by Frenkel and Smit (1996).

7. Acknowledgements

We would like to thank the Reviewer for the comments, which have helped us to improve the first version of the manuscript. Moreover, this work has been partially supported by the Ministerio de Ciencia e Innovación of Spain (project MONIN, ref. TEC-2006-13514-C02-01/TCM, Program Consolider-Ingenio 2010, ref. CSD2008-00010 COMONSENS, and Distributed Learning, Communication and Information Processing (DEIPRO), ref. TEC2009-14504-C02-01) and by the Comunidad Autónoma de Madrid (project PROMULTIDIS-CM, ref. S-0505/TIC/0233).

References

Andrieu, C., Moulines, E., 2006. On the ergodicity properties of some adaptive MCMC algorithms. The Annals of Applied Probability 16 (3), 1462-1505.

Casarin, R., Craiu, R., Leisen, F., 2011. Interacting multiple try algorithms with different proposal distributions. To appear in Statistics and Computing, DOI: 10.1007/s11222-011-9301-9.

Frenkel, D., Smit, B., 1996. Understanding Molecular Simulation: From Algorithms to Applications. Academic Press, San Diego.

Hastings, W. K., 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57 (1), 97-109.

Liang, F., Liu, C., Carroll, R., 2010. Advanced Markov Chain Monte Carlo Methods: Learning from Past Samples. Wiley Series in Computational Statistics, England.

Liu, J. S., 2004. Monte Carlo Strategies in Scientific Computing. Springer.

Liu, J. S., Liang, F., Wong, W. H., 2000. The multiple-try method and local optimization in Metropolis sampling. Journal of the American Statistical Association 95 (449), 121-134.

Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., Teller, E., 1953. Equations of state calculations by fast computing machines. Journal of Chemical Physics 21, 1087-1091.

Pandolfi, S., Bartolucci, F., Friel, N., 2010. A generalization of the Multiple-try Metropolis algorithm for Bayesian estimation and model selection. Journal of Machine Learning Research (Workshop and Conference Proceedings Volume 9: AISTATS 2010) 9, 581-588.

Qin, Z. S., Liu, J. S., 2001. Multi-point Metropolis method with application to hybrid Monte Carlo. Journal of Computational Physics 172, 827-840.

Robert, C. P., Casella, G., 2004. Monte Carlo Statistical Methods. Springer.

Storvik, G., 2011. On the flexibility of Metropolis-Hastings acceptance probabilities in auxiliary variable proposal generation. Scandinavian Journal of Statistics 38 (2), 342-358.
