A multi-point Metropolis scheme with generic weight functions


Authors: Luca Martino, Victor Pascual Del Olmo, Jesse Read

Department of Signal Theory and Communications, Universidad Carlos III de Madrid. Avenida de la Universidad 30, 28911, Leganés, Madrid, Spain. luca@tsc.uc3m.es, vpolmo@tsc.uc3m.es, jesse@tsc.uc3m.es

Abstract

The multi-point Metropolis algorithm is an advanced MCMC technique based on drawing several correlated samples at each step and choosing one of them according to some normalized weights. We propose a variation of this technique in which the weight functions are not specified, i.e., their analytic form can be chosen arbitrarily. This has the advantage of greater flexibility in the design of high-performance MCMC samplers. We prove that our method fulfills the detailed balance condition, and provide a numerical simulation. We also give new insight into the functionality of different MCMC algorithms and the connections between them.

Keywords: Multiple Try Metropolis algorithm, Multi-point Metropolis algorithm, MCMC methods

1. Introduction

Monte Carlo statistical methods are powerful tools for numerical inference and stochastic optimization (see Robert and Casella (2004), for instance). Markov chain Monte Carlo (MCMC) methods are classical Monte Carlo techniques that generate samples from a target probability density function (pdf) by drawing from a simpler proposal pdf, usually in order to approximate an integral that cannot be computed analytically (Liu, 2004; Liang et al., 2010). MCMC algorithms produce a Markov chain whose stationary distribution coincides with the target pdf.

The Metropolis-Hastings (MH) algorithm (Metropolis et al., 1953; Hastings, 1970) is the most famous MCMC technique. It can be applied to almost any target distribution. In practice, however, finding a "good" proposal pdf can be difficult. In some applications, the Markov chain generated by the MH algorithm can remain trapped almost indefinitely in a local mode, meaning that, in practice, convergence may not be reached.

The Multiple-Try Metropolis (MTM) method of Liu et al. (2000) is an extension of the MH algorithm in which the next state of the chain is selected among a set of independent and identically distributed (i.i.d.) samples. This enables the MCMC sampler to make large step-size jumps without lowering the acceptance rate; thus MTM can explore a larger portion of the sample space in fewer iterations. An interesting special case of the MTM, well known in the molecular simulation field, is the orientational bias Monte Carlo, described in Chapter 13 of Frenkel and Smit (1996) and Chapter 5 of Liu (2004), where i.i.d. candidates are drawn from a symmetric proposal pdf and one of them is chosen according to weights directly proportional to the target pdf. Here, however, the analytic form of the weight functions is fixed and unalterable.

Casarin et al. (2011) introduced an MTM scheme using different proposal pdfs. In this case the samples produced are independent but not identically distributed. In Qin and Liu (2001), another generalization of the MTM (called the multi-point Metropolis method) is proposed, using correlated candidates at each step. Clearly, the proposal pdfs are also different in this case.
Moreover, in Pandolfi et al. (2010) an extension of the classical MTM technique is introduced where the analytic form of the weights is not specified. In Pandolfi et al. (2010) the same proposal pdf is used to draw all the samples, so that the candidates generated at each step of the algorithm are i.i.d. Further interesting and related considerations about the use of auxiliary variables for building acceptance probabilities within an MH approach can be found in Storvik (2011).

In this paper, we draw from the two approaches of Qin and Liu (2001) and Pandolfi et al. (2010) to create a novel algorithm that selects the new state of the chain among correlated samples using generic weight functions, i.e., the analytic form of the weights can be chosen arbitrarily. Furthermore, we formulate the algorithm and the acceptance rule so as to fulfill the detailed balance condition.

Our method allows more flexibility in the design of efficient MCMC samplers with larger coverage and faster exploration of the sample space. In fact, we can choose any bounded and positive weight functions, to either improve performance or reduce computational complexity, independently of the chosen proposal pdf. Moreover, since in our approach the proposal pdfs are different, adaptive or interacting techniques can be applied, such as those introduced by Andrieu and Moulines (2006) and Casarin et al. (2011). An important advantage of our procedure is that, since each new candidate is drawn from a conditional pdf which depends on the samples generated earlier during the same time step, it constructs an improved proposal by automatically building on the information obtained from the generated samples.

The rest of the paper is organized as follows. In Section 2 we recall the standard multi-point Metropolis algorithm. In Section 3 we introduce our novel scheme with generic weight functions and correlated samples. Section 4 provides a rigorous proof that the novel scheme satisfies the detailed balance condition. A numerical simulation is provided in Section 5 and, finally, in Section 6 we discuss the advantages of our proposed technique and provide insight into the relationships among different MTM schemes in the literature.

2. Multi-point Metropolis algorithm

In the classical MH algorithm, a new possible state is drawn from the proposal pdf and the movement is accepted with a suitable decision rule. In the multi-point approach, several correlated samples are generated and, from these, a "good" one is chosen. Specifically, consider a target pdf $p_o(x)$ known up to a constant (hence, we can evaluate $p(x) \propto p_o(x)$). Given a current state $x \in \mathbb{R}$ (we assume scalar values only for simplicity of treatment), at each step we draw $N$ correlated samples from a sequence of different proposal pdfs $\{\pi_j\}_{j=1}^{N}$, i.e.,

$$y_1 \sim \pi_1(\cdot|x), \quad y_2 \sim \pi_2(\cdot|x, y_1), \quad y_3 \sim \pi_3(\cdot|x, y_1, y_2), \quad \ldots, \quad y_N \sim \pi_N(\cdot|x, y_1, \ldots, y_{N-1}). \qquad (1)$$

Therefore, we can write the joint distribution of the generated samples as

$$q_N(y_1, \ldots, y_N|x) = q_N(y_{1:N}|x) = \pi_1(y_1|x)\,\pi_2(y_2|x, y_1)\cdots\pi_N(y_N|x, y_{1:N-1}), \qquad (2)$$

i.e.,

$$q_N(y_{1:N}|x) = \pi_1(y_1|x)\prod_{j=2}^{N}\pi_j(y_j|x, y_{1:j-1}), \qquad (3)$$

where, for brevity, we use the notation $y_{1:j} \triangleq [y_1, \ldots, y_j]$, while $y_{j:1} \triangleq [y_j, \ldots, y_1]$ denotes the vector with the components in reverse order.
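To illustrate the sequential construction in Eqs. (1)-(3), the following Python sketch draws the $N$ correlated candidates and accumulates $\log q_N(y_{1:N}|x)$. The particular proposal used here (a Gaussian centred on the mean of the conditioning variables) is only an assumption made for this sketch, not a requirement of the method; any sequence of conditional proposals $\pi_j$ could be substituted.

```python
import numpy as np

def draw_candidates(x, N, sigma=1.0, rng=None):
    """Draw y_1,...,y_N as in Eq. (1) and return them with log q_N(y_{1:N}|x), Eq. (3).

    Illustrative choice: pi_j(.|x, y_{1:j-1}) is Gaussian centred on the mean of
    the conditioning variables (an assumption of this sketch only).
    """
    rng = np.random.default_rng() if rng is None else rng
    cond = [x]            # conditioning variables of pi_j: x, y_1, ..., y_{j-1}
    ys, log_q = [], 0.0
    for j in range(1, N + 1):
        mu = np.mean(cond)                # centre of pi_j(.|x, y_{1:j-1})
        y = rng.normal(mu, sigma)         # y_j ~ pi_j(.|x, y_{1:j-1})
        # add log pi_j(y_j | x, y_{1:j-1}) to log q_N, following Eq. (3)
        log_q += -0.5 * ((y - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))
        ys.append(y)
        cond.append(y)
    return np.array(ys), log_q
```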
A "good" candidate among the generated samples is then chosen according to weight functions $\omega_j(z_1, \ldots, z_{j+1}): \mathbb{R}^{j+1} \to \mathbb{R}^{+}$, where $z_1, \ldots, z_{j+1}$ are generic variables and $j = 1, \ldots, N$. The specific analytic form of the weights needed in this technique is

$$\omega_j(z_1, \ldots, z_{j+1}) \triangleq p(z_1)\,\pi_1(z_2|z_1)\cdots\pi_j(z_{j+1}|z_{1:j})\,\lambda_j(z_1, \ldots, z_{j+1}), \qquad (4)$$

where $p(x) \propto p_o(x)$ is the target pdf and $\lambda_j$ can be any bounded, positive, and sequentially symmetric function, i.e.,

$$\lambda_j(z_1, z_{2:j+1}) = \lambda_j(z_{j+1:2}, z_1). \qquad (5)$$

Note that, since $q_j(z_{2:j+1}|z_1) = \pi_1(z_2|z_1)\cdots\pi_j(z_{j+1}|z_{1:j})$ (see Eq. (2)), we can rewrite the weight functions as

$$\omega_j(z_1, \ldots, z_{j+1}) = p(z_1)\,q_j(z_{2:j+1}|z_1)\,\lambda_j(z_1, z_{2:j+1}). \qquad (6)$$

2.1. Algorithm

Given a current state $x = x_t$, the multi-point Metropolis algorithm consists of the following steps:

1. Draw $N$ samples $y_{1:N} = [y_1, y_2, \ldots, y_N]$ from the joint pdf
$$q_N(y_{1:N}|x) = \pi_1(y_1|x)\prod_{j=2}^{N}\pi_j(y_j|x, y_{1:j-1}),$$
namely, draw $y_j$ from $\pi_j(\cdot|x, y_{1:j-1})$, for $j = 1, \ldots, N$.

2. Calculate the weights $\omega_j(y_{j:1}, x)$ as in Eq. (4), and normalize them to obtain $\bar{\omega}_j$, $j = 1, \ldots, N$.

3. Draw $y = y_k \in \{y_1, \ldots, y_N\}$ according to the weights $\bar{\omega}_1, \ldots, \bar{\omega}_N$.

4. Set
$$x^*_1 = y_{k-1}, \quad x^*_2 = y_{k-2}, \quad \ldots, \quad x^*_{k-1} = y_1, \qquad (7)$$
and finally $x^*_k = x$. Then draw the other "reference" samples
$$x^*_j \sim \pi_j(\cdot|y, x^*_{1:j-1}), \qquad (8)$$
for $j = k+1, \ldots, N$. Note that for $j = k+1$ we have $\pi_j(\cdot|y, x^*_{1:j-1}) = \pi_j(\cdot|y, x^*_1 = y_{k-1}, \ldots, x^*_{k-1} = y_1, x^*_k = x)$ and, for $j = k+2, \ldots, N$, we have $\pi_j(\cdot|y, x^*_{1:j-1}) = \pi_j(\cdot|y, x^*_1 = y_{k-1}, \ldots, x^*_{k-1} = y_1, x^*_k = x, x^*_{k+1:j-1})$.

5. Compute $\omega_j(x^*_{j:1}, y)$ as in Eq. (4).

6. Let $x_{t+1} = y_k$ with probability
$$\alpha = \min\left[1, \frac{\sum_{j=1}^{N}\omega_j(y_{j:1}, x)}{\sum_{j=1}^{N}\omega_j(x^*_{j:1}, y)}\right], \qquad (9)$$
otherwise set $x_{t+1} = x$ with probability $1 - \alpha$.

7. Set $t = t + 1$ and repeat from step 1.

The kernel of this technique satisfies the detailed balance condition, as shown in Qin and Liu (2001). However, to fulfill this condition, the algorithm requires that the weights are defined exactly with the form in Eq. (4).
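For concreteness, the weight of Eq. (6) could be evaluated in log scale as in the sketch below (this is our own illustration, not part of the original formulation). Here `log_p`, `log_pi` and `log_lambda` are assumed user-supplied callables for $\log p(\cdot)$, $\log\pi_i(\cdot|\cdot)$ and a sequentially symmetric $\log\lambda_j$; the default $\lambda_j \equiv 1$ trivially satisfies Eq. (5).

```python
def log_weight(z, log_p, log_pi, log_lambda=lambda z: 0.0):
    """Log of the standard multi-point weight of Eq. (6).

    z = [z_1, ..., z_{j+1}], with z_1 the most recently generated point.
    log_pi(i, value, cond) is assumed to return log pi_i(value | cond),
    where cond = [z_1, ..., z_i].
    """
    j = len(z) - 1
    # log q_j(z_{2:j+1} | z_1) = sum_{i=1}^{j} log pi_i(z_{i+1} | z_1, ..., z_i)
    log_q = sum(log_pi(i, z[i], z[:i]) for i in range(1, j + 1))
    return log_p(z[0]) + log_q + log_lambda(z)
```

In step 2 above, $\omega_j(y_{j:1}, x)$ corresponds to calling this function with $z = [y_j, \ldots, y_1, x]$; in step 5, with $z = [x^*_j, \ldots, x^*_1, y]$.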
3. Extension with generic weight functions

Now, we consider generic weight functions $\omega_j(z_1, \ldots, z_{j+1}): \mathbb{R}^{j+1} \to \mathbb{R}^{+}$ that have to be (a) bounded and (b) positive. In this case, the algorithm can be described as follows.

1. Draw $N$ samples $y_{1:N} = [y_1, y_2, \ldots, y_N]$ from the joint pdf
$$q_N(y_{1:N}|x) = \pi_1(y_1|x)\prod_{j=2}^{N}\pi_j(y_j|x, y_{1:j-1}),$$
namely, draw $y_j$ from $\pi_j(\cdot|x, y_{1:j-1})$, for $j = 1, \ldots, N$.

2. Choose some suitable (bounded and positive) weight functions. Then calculate each weight $\omega_j(y_{j:1}, x)$ and normalize them to obtain $\bar{\omega}_j$, $j = 1, \ldots, N$.

3. Draw $y = y_k \in \{y_1, \ldots, y_N\}$ according to $\bar{\omega}_1, \ldots, \bar{\omega}_N$, and set $\bar{W}_y = \bar{\omega}_k$, i.e.,
$$\bar{W}_y \triangleq \frac{\omega_k(y_{k:1}, x)}{\sum_{j=1}^{N}\omega_j(y_{j:1}, x)}. \qquad (10)$$

4. Set
$$x^*_1 = y_{k-1}, \quad x^*_2 = y_{k-2}, \quad \ldots, \quad x^*_{k-1} = y_1, \qquad (11)$$
and finally $x^*_k = x$. Then draw the remaining "reference" samples
$$x^*_j \sim \pi_j(\cdot|y, x^*_{1:j-1}), \qquad (12)$$
for $j = k+1, \ldots, N$.

5. Compute the generic weights $\omega_j(x^*_{j:1}, y)$ and calculate the normalized weight
$$\bar{W}_x \triangleq \frac{\omega_k(x^*_{k:1}, y)}{\sum_{j=1}^{N}\omega_j(x^*_{j:1}, y)}. \qquad (13)$$

6. Set $x_{t+1} = y_k$ with probability
$$\alpha = \min\left[1, \frac{p(y)\,\pi_1(x^*_1|y)\,\pi_2(x^*_2|y, x^*_1)\cdots\pi_k(x^*_k|y, x^*_1, \ldots, x^*_{k-1})}{p(x)\,\pi_1(y_1|x)\,\pi_2(y_2|x, y_1)\cdots\pi_k(y_k|x, y_1, \ldots, y_{k-1})}\,\frac{\bar{W}_x}{\bar{W}_y}\right]. \qquad (14)$$
We can rewrite this in a more compact form as
$$\alpha = \min\left[1, \frac{p(y)\,q_k(x^*_{1:k}|y)}{p(x)\,q_k(y_{1:k}|x)}\,\frac{\bar{W}_x}{\bar{W}_y}\right], \qquad (15)$$
where we recall that
$$q_k(y_{1:k}|x) = \pi_1(y_1|x)\prod_{j=2}^{k}\pi_j(y_j|x, y_{1:j-1}), \qquad (16)$$
and $k$ is the index of the chosen sample $y_k$. Otherwise, set $x_{t+1} = x$ with probability $1 - \alpha$.

7. Set $t = t + 1$ and repeat from step 1.

We emphasise that in the algorithm above we have not specifically defined the weight functions.

3.1. Examples of weight functions

The weight functions must be bounded and positive. The choice can depend on criteria such as improving performance or reducing computational complexity. If the target density is bounded, two possibilities are
$$\omega_j(z_1, \ldots, z_{j+1}) = p(z_1), \qquad (17)$$
or
$$\omega_j(z_1, \ldots, z_{j+1}) = p(z_1)\,p(z_2)\cdots p(z_{j+1}), \qquad (18)$$
with $j = 1, \ldots, N$. Other possible choices are
$$\omega_j(z_1, \ldots, z_{j+1}) = \left[\frac{p(z_1)}{q_j(z_{1:j}|z_{j+1})}\right]^{\theta}, \qquad (19)$$
where $\theta > 0$ is a positive constant, or
$$\omega_j(z_1, \ldots, z_{j+1}) = \frac{p(z_j)}{q_1(z_j|z_{j+1})}\,\frac{p(z_{j-1})}{q_2(z_{j-1:j}|z_{j+1})}\cdots\frac{p(z_1)}{q_j(z_{1:j}|z_{j+1})}, \qquad (20)$$
and a third possible choice is
$$\omega_j(z_1, \ldots, z_{j+1}) = p(z_1)\,\pi_{j+1}(z_1|z_{j+1:2}), \qquad (21)$$
where $\pi_{j+1}(z_1|z_{j+1:2})$ is the $(j+1)$-th proposal pdf used in step 1 of the algorithm. It is important to remark that the $z$-variables are ordered such that $z_1$ is the most recently generated sample, $z_j$ is the first drawn sample, and $z_{j+1}$ represents the previous state of the chain.

Clearly, owing to the great flexibility in the construction of the weight functions, it can be difficult to assert which choice is best in terms of the performance of the algorithm. However, in general, including more statistical information in the weights can improve performance yet, at the same time, increases the computational cost of the designed technique. More specific theoretical or empirical studies are needed to clarify this issue. Indeed, observe that the question of the best selection of the weights remains unclear even for the classical MTM of Liu et al. (2000), as well as for the method in Pandolfi et al. (2010), for instance.
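To make the steps above concrete, the following sketch implements one iteration of the proposed scheme under illustrative assumptions: the weights are the simple choice of Eq. (17), $\omega_j(z_1, \ldots, z_{j+1}) = p(z_1)$, and `p`, `draw_pi(j, cond)` and `log_pi(j, value, cond)` are user-supplied callables for the unnormalized target and the sequence of proposal pdfs (with `cond` the list of conditioning variables). This is a minimal sketch, not a reference implementation.

```python
import numpy as np

def mp_generic_step(x, N, p, draw_pi, log_pi, rng=None):
    """One iteration of the multi-point scheme with generic weights (Section 3),
    using the weight choice w_j = p(z_1) of Eq. (17)."""
    rng = np.random.default_rng() if rng is None else rng
    # Step 1: draw N correlated candidates y_j ~ pi_j(.|x, y_{1:j-1})
    ys = []
    for j in range(1, N + 1):
        ys.append(draw_pi(j, [x] + ys))
    # Steps 2-3: weights w_j = p(y_j), select y_k, keep the normalized weight W_y (Eq. 10)
    w_y = np.array([p(yi) for yi in ys])
    k0 = rng.choice(N, p=w_y / w_y.sum())      # zero-based index, k = k0 + 1
    y = ys[k0]
    W_y = w_y[k0] / w_y.sum()
    # Step 4: reference points x*_1,...,x*_k are fixed, the remaining ones are drawn
    x_star = ys[:k0][::-1] + [x]               # [y_{k-1}, ..., y_1, x]
    for j in range(k0 + 2, N + 1):
        x_star.append(draw_pi(j, [y] + x_star))
    # Step 5: normalized weight W_x (Eq. 13), same weight choice
    w_x = np.array([p(z) for z in x_star])
    W_x = w_x[k0] / w_x.sum()
    # Step 6: acceptance probability of Eq. (15)
    log_qk_y = sum(log_pi(j, ys[j - 1], [x] + ys[:j - 1]) for j in range(1, k0 + 2))
    log_qk_x = sum(log_pi(j, x_star[j - 1], [y] + x_star[:j - 1]) for j in range(1, k0 + 2))
    ratio = np.exp(np.log(p(y)) + log_qk_x - np.log(p(x)) - log_qk_y) * (W_x / W_y)
    return y if rng.uniform() < min(1.0, ratio) else x
```

Iterating this step produces the Markov chain; with a different bounded and positive weight choice, only the lines computing `w_y` and `w_x` would change.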
3.2. Relationship with the independent multiple-try scheme

In Pandolfi et al. (2010), i.i.d. candidates are proposed at each time step. The acceptance probability $\alpha$ in Eqs. (14)-(15) may appear similar to the acceptance probability in Pandolfi et al. (2010). However, the expression of $\alpha$ in Eq. (15) differs from the acceptance probability in Pandolfi et al. (2010) for two main reasons: (a) the first factor, $\frac{p(y)\,q_k(x^*_{1:k}|y)}{p(x)\,q_k(y_{1:k}|x)}$, is distinct (see Eq. (16)); and (b) the definition and computation of $\bar{W}_x$ and $\bar{W}_y$ (see Eqs. (10) and (13)) are also different, since here the weight functions take into account the samples previously generated within the same time step.

If we set $\pi_j(y_j|x, y_{1:j-1}) = \pi(y_j|x)$ for all $j = 1, \ldots, N$, then the steps of our algorithm coincide exactly with those of the technique in Pandolfi et al. (2010), except for step 4. Indeed, the way of choosing the "reference" points differs in the two methods (in our case, some of them are fixed, while in Pandolfi et al. (2010) all the reference points are chosen randomly). A specular difference can be found between the methods in Liu et al. (2000) and Qin and Liu (2001).

3.3. Multi-point Metropolis as a specific case

When the weight functions are chosen as in Eq. (6), i.e.,
$$\omega_j(z_1, \ldots, z_{j+1}) = p(z_1)\,q_j(z_{2:j+1}|z_1)\,\lambda_j(z_1, \ldots, z_{j+1}),$$
where
$$\lambda_j(z_1, z_{2:j+1}) = \lambda_j(z_{j+1:2}, z_1) \qquad (22)$$
is sequentially symmetric, our scheme coincides exactly with the standard multi-point Metropolis method in Qin and Liu (2001). Indeed, first of all we can rewrite the expression (15) as
$$\alpha = \min\left[1, \frac{p(y)\,q_k(x^*_{1:k}|y)}{p(x)\,q_k(y_{1:k}|x)}\,\frac{\omega_k(x^*_{k:1}, y)}{\omega_k(y_{k:1}, x)}\,\frac{\sum_{j=1}^{N}\omega_j(y_{j:1}, x)}{\sum_{j=1}^{N}\omega_j(x^*_{j:1}, y)}\right]. \qquad (23)$$
Then, recalling Eq. (11), i.e., $x^*_1 = y_{k-1}$, $x^*_2 = y_{k-2}$, $\ldots$, $x^*_{k-1} = y_1$, $x^*_k = x$ and $y = y_k$, the two weights $\omega_k(x^*_{k:1}, y)$ and $\omega_k(y_{k:1}, x)$ can be expressed exactly as
$$\omega_k(x^*_{k:1}, y) = \omega_k(x^*_k = x, x^*_{k-1} = y_1, \ldots, x^*_1 = y_{k-1}, y = y_k) = p(x)\,q_k(y_{1:k}|x)\,\lambda_k(x, y_{1:k}),$$
and
$$\omega_k(y_{k:1}, x) = \omega_k(y_k = y, y_{k-1} = x^*_1, \ldots, y_1 = x^*_{k-1}, x = x^*_k) = p(y)\,q_k(x^*_{1:k}|y)\,\lambda_k(y, x^*_{1:k}),$$
respectively. Therefore, replacing the weights $\omega_k(x^*_{k:1}, y)$ and $\omega_k(y_{k:1}, x)$ in Eq. (23), we obtain
$$\alpha = \min\left[1, \frac{\lambda_k(x, y_{1:k})}{\lambda_k(y, x^*_{1:k})}\,\frac{\sum_{j=1}^{N}\omega_j(y_{j:1}, x)}{\sum_{j=1}^{N}\omega_j(x^*_{j:1}, y)}\right] = \min\left[1, \frac{\sum_{j=1}^{N}\omega_j(y_{j:1}, x)}{\sum_{j=1}^{N}\omega_j(x^*_{j:1}, y)}\right],$$
which coincides with the acceptance probability in Eq. (9) of the standard multi-point Metropolis algorithm. Note that we have used $\lambda_k(x, y_{1:k}) = \lambda_k(y, x^*_{1:k})$. Indeed, since $x^*_k = x$, we can write $\lambda_k(y, x^*_{1:k}) = \lambda_k(y, x^*_{1:k-1}, x)$; then, because $x^*_{1:k-1} = y_{k-1:1}$, we obtain $\lambda_k(y, x^*_{1:k}) = \lambda_k(y, y_{k-1:1}, x)$; and, since $y = y_k$, finally $\lambda_k(y, x^*_{1:k}) = \lambda_k(y_{k:1}, x)$, so that the required equality $\lambda_k(x, y_{1:k}) = \lambda_k(y_{k:1}, x)$ is exactly the sequential symmetry condition assumed in Eq. (22).

In the following, we show that the proposed technique satisfies the detailed balance condition.

4. Proof of the detailed balance condition

To guarantee that a Markov chain generated by an MCMC method converges to the target distribution $p(x) \propto p_o(x)$, it suffices that the kernel $A(y|x)$ of the corresponding algorithm fulfills the detailed balance condition
$$p(x)\,A(y|x) = p(y)\,A(x|y).$$
(Note that the detailed balance condition is sufficient but not necessary: detailed balance ensures invariance, but the converse is not true. Markov chains that satisfy the detailed balance condition are called reversible.) First of all, we have to find the kernel $A(y|x)$ of the algorithm, i.e., the conditional probability of moving from $x$ to $y$. For simplicity, we consider the case $x \neq y$ (the case $x = y$ is trivial). The kernel can be expressed as
$$A(y = y_k|x) = \sum_{j=1}^{N} h(y = y_k|x, k = j), \qquad (24)$$
where $h(y = y_k|x, k = j)$ is the probability of accepting $x_{t+1} = y_k$ given $x_t = x$ when the chosen sample $y_k$ is the $j$-th candidate, i.e., when $y_k = y_j$. In the sequel, we study a single $h(y = y_k|x, k)$ for a generic $k \in \{1, \ldots, N\}$.
Indeed, if $h(y = y_k|x, k)$ fulfills the detailed balance condition (i.e., it is symmetric w.r.t. $x$ and $y$), then $A(y|x)$ also satisfies detailed balance, because it is a sum of symmetric functions. Therefore, we want to show that
$$p(x)\,h(y|x, k) = p(y)\,h(x|y, k),$$
for a generic $k \in \{1, \ldots, N\}$. Following the steps of the algorithm above, we can write
$$p(x)\,h(y|x, k) = p(x)\int\cdots\int\left[\prod_{j=1}^{N}\pi_j(y_j|x, y_{1:j-1})\right]\frac{\omega_k(y_{k:1}, x)}{\sum_{j=1}^{N}\omega_j(y_{j:1}, x)}\left[\prod_{i=k+1}^{N}\pi_i(x^*_i|y, x^*_{1:i-1})\right]\min\left[1, \frac{p(y)\,q_k(x^*_{1:k}|y)}{p(x)\,q_k(y_{1:k}|x)}\,\frac{\bar{W}_x}{\bar{W}_y}\right]\,dy_{1:k-1}\,dy_{k+1:N}\,dx^*_{k+1:N}.$$
Note that each factor inside the integral corresponds to a step of the method described in the previous section, and the integral is over all auxiliary variables. Recalling the definitions of the joint probability $q_k(y_{1:k}|x)$ and of $\bar{W}_y$, the expression can be simplified to
$$p(x)\,h(y|x, k) = p(x)\int\cdots\int q_k(y_{1:k}|x)\left[\prod_{j=k+1}^{N}\pi_j(y_j|x, y_{1:j-1})\right]\bar{W}_y\left[\prod_{i=k+1}^{N}\pi_i(x^*_i|y, x^*_{1:i-1})\right]\min\left[1, \frac{p(y)\,q_k(x^*_{1:k}|y)}{p(x)\,q_k(y_{1:k}|x)}\,\frac{\bar{W}_x}{\bar{W}_y}\right]\,dy_{1:k-1}\,dy_{k+1:N}\,dx^*_{k+1:N},$$
and, rearranging the factors, we obtain
$$p(x)\,h(y|x, k) = \int\cdots\int p(x)\,q_k(y_{1:k}|x)\,\bar{W}_y\left[\prod_{j=k+1}^{N}\pi_j(y_j|x, y_{1:j-1})\right]\left[\prod_{i=k+1}^{N}\pi_i(x^*_i|y, x^*_{1:i-1})\right]\min\left[1, \frac{p(y)\,q_k(x^*_{1:k}|y)\,\bar{W}_x}{p(x)\,q_k(y_{1:k}|x)\,\bar{W}_y}\right]\,dy_{1:k-1}\,dy_{k+1:N}\,dx^*_{k+1:N}.$$
Now, we multiply both arguments of the function $\min[\cdot, \cdot]$ by the factor $p(x)\,q_k(y_{1:k}|x)\,\bar{W}_y$, so that
$$p(x)\,h(y|x, k) = \int\cdots\int\left[\prod_{j=k+1}^{N}\pi_j(y_j|x, y_{1:j-1})\right]\left[\prod_{i=k+1}^{N}\pi_i(x^*_i|y, x^*_{1:i-1})\right]\min\Big[p(x)\,q_k(y_{1:k}|x)\,\bar{W}_y,\; p(y)\,q_k(x^*_{1:k}|y)\,\bar{W}_x\Big]\,dy_{1:k-1}\,dy_{k+1:N}\,dx^*_{k+1:N}.$$
It is now straightforward to see that the expression above is symmetric in $x$ and $y$: if we exchange the roles of $x$ and $y$, and of $x^*_i$ and $y_j$, respectively, the expression does not vary. Then we can write
$$p(x)\,h(y|x, k) = p(y)\,h(x|y, k). \qquad (25)$$
We can repeat the same development for each $k$, obtaining
$$p(x)\,A(y|x) = p(y)\,A(x|y), \qquad (26)$$
which is the detailed balance condition. Therefore, the generated Markov chain converges to our target pdf.

5. Toy example

We now provide a simple numerical simulation to show an example of a multi-point scheme with generic weight functions and to compare it with the technique in Pandolfi et al. (2010). Let $X \in \mathbb{R}$ be a random variable (we consider a scalar variable only to simplify the treatment; all the considerations and algorithms remain valid for multi-dimensional variables) with bimodal pdf
$$p_o(x) \propto p(x) = \exp\left\{-(x^2 - 4)^2/4\right\}. \qquad (27)$$
Our goal is to draw samples from $p_o(x)$ using our proposed multi-point technique. We consider Gaussian densities as proposal pdfs (a standard choice),
$$\pi_j(y_j|x_t, y_{1:j-1}) \propto \exp\left\{-\frac{(y_j - \mu_j)^2}{2\sigma^2}\right\}, \qquad (28)$$
where we use $\sigma^2 = 1$ and
$$\mu_j = \frac{\gamma_1}{j-1}\left(x_t + y_1 + \ldots + y_{j-2}\right) + \gamma_2\,y_{j-1}, \qquad (29)$$
i.e., $\mu_j$ is a weighted mean ($\gamma_1 + \gamma_2 = 1$) of the previous state $x_t$ and the samples generated previously within the same time step. Specifically, we set $\gamma_1 = 0.2$ and $\gamma_2 = 0.8$.
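A minimal sketch of the toy target of Eq. (27) and of the sequential Gaussian proposal of Eqs. (28)-(29) is given below. The handling of the case $j = 1$ (no previously drawn candidates, where Eq. (29) does not apply directly) is our own assumption, chosen to be consistent with the weighted-mean interpretation above; the function names are illustrative.

```python
import numpy as np

GAMMA1, GAMMA2, SIGMA = 0.2, 0.8, 1.0

def p(x):
    """Unnormalized bimodal target pdf of Eq. (27)."""
    return np.exp(-((x ** 2 - 4.0) ** 2) / 4.0)

def proposal_mean(x_t, prev):
    """Mean mu_j of Eq. (29); prev = [y_1, ..., y_{j-1}] are the candidates
    already drawn within the current time step."""
    if not prev:                                  # j = 1: assumption, mu_1 = x_t
        return x_t
    older = [x_t] + list(prev[:-1])               # x_t, y_1, ..., y_{j-2}
    return GAMMA1 * np.mean(older) + GAMMA2 * prev[-1]

def draw_pi(j, cond, rng=None):
    """Draw y_j ~ pi_j(.|x_t, y_{1:j-1}) of Eq. (28); cond = [x_t, y_1, ..., y_{j-1}]."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.normal(proposal_mean(cond[0], cond[1:]), SIGMA)
```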
Moreover, we choose very simple weight functions, depending only on the first variable and on the target pdf,
$$\omega^{(1)}_j(z_1, z_2, \ldots, z_{j+1}) = [p(z_1)]^{\theta}, \qquad (30)$$
with $\theta = 1/2$. Note that $p(\cdot)$ is bounded and positive (since it is a pdf). This kind of weight cannot be used in the multi-point scheme of Qin and Liu (2001), except for $\theta = 1$ and using a specific sequence of proposal pdfs. Moreover, for $\theta = 1$ this weight function can also be used in a standard MTM of Liu et al. (2000) if the chosen proposal density $\pi(y|x)$ is symmetric (i.e., $\pi(y|x) = \pi(x|y)$), choosing $\lambda(x, y) = \frac{1}{\pi(x|y)}$. We also compare the performance of the proposed algorithm with the weights
$$\omega^{(2)}_j(z_1, \ldots, z_{j+1}) = p(z_1)\,p(z_2)\cdots p(z_{j+1}), \qquad (31)$$
and
$$\omega^{(3)}_j(z_1, \ldots, z_{j+1}) = p(z_1)\,\pi_{j+1}(z_1|z_{j+1:2}). \qquad (32)$$

Then, we run the proposed multi-point algorithm with different numbers $N$ of candidates and calculate the estimated acceptance rate (the averaged probability of accepting a movement) and the linear correlation coefficient (between one state of the chain and the next). We also run the method in Pandolfi et al. (2010) with proposal pdf $\pi(y_j|x_t) \propto \exp\left\{-\frac{(y_j - x_t)^2}{2\sigma^2}\right\}$ and compare the performance, using the weight functions of Eq. (30) and the third type in Eq. (32). Because in that scheme the samples are generated independently, we do not compare using the weights in Eq. (31), as statistically this no longer makes sense. Moreover, observe that in the scheme of Pandolfi et al. (2010) (where the candidates are drawn independently), the weight functions in Eq. (32) become $\omega^{(3)}(y_j, x_{t-1}) = p(y_j)\,\pi(y_j|x_{t-1})$, where $x_{t-1}$ is the previous state of the chain (we remove the subscript $j$ because in Pandolfi et al. (2010) the analytic form of the weights is the same for each generated sample $y_j$, $j = 1, \ldots, N$). Note also that this particular choice of weights $\omega^{(3)}$ can be used in the standard MTM of Liu et al. (2000) (by choosing $\lambda(x, y) = \frac{1}{\pi(y|x)\,\pi(x|y)}$) and, in this case, the technique of Pandolfi et al. (2010) coincides with a standard MTM.

Figure 1(a) depicts the target density $p_o(x)$ (solid line) and the normalized histogram of 100,000 samples drawn using our proposed scheme with $N = 10$. Figures 1(b)-(c) illustrate the mean acceptance probability and the estimated correlation coefficient (for different values of $N$, averaged over 5,000 runs) of the two techniques with different choices of weights: our method is shown with squares for $\omega^{(1)}_j$, with a solid line for $\omega^{(2)}_j$, and with circles for $\omega^{(3)}_j$. The performance of the method in Pandolfi et al. (2010) is depicted with a dashed line for the first choice $\omega^{(1)}_j$ and a dotted line with triangles for $\omega^{(3)}_j$.

We can see that, although the proposed technique always attains smaller acceptance rates, the resulting correlations are always smaller than those obtained by the other method, except when using the weights $\omega^{(2)}_j$ in Eq. (31). Moreover, the best results are obtained with the proposed technique using the weights $\omega^{(3)}_j$ in Eq. (32). In this case, the correlation decreases as $N$ increases, down to 0.72 with $N = 100$.
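For reference, the three weight choices of Eqs. (30)-(32) could be coded as follows, reusing `p`, `proposal_mean` and `SIGMA` from the sketch above; the argument `z` follows the ordering convention of Section 3.1 ($z_1$ the most recently generated sample, $z_{j+1}$ the previous state of the chain). This is only an illustrative rendering of the formulas.

```python
import numpy as np

def w1(z, theta=0.5):
    """Eq. (30): depends only on the most recent variable z_1."""
    return p(z[0]) ** theta

def w2(z):
    """Eq. (31): product of the target evaluated at all the variables."""
    return np.prod([p(zi) for zi in z])

def w3(z):
    """Eq. (32): p(z_1) times the (j+1)-th proposal density evaluated at z_1,
    conditioned on the remaining variables in reversed order, z_{j+1:2}."""
    cond = z[1:][::-1]                      # [z_{j+1}, z_j, ..., z_2]
    mu = proposal_mean(cond[0], cond[1:])   # mean of pi_{j+1}(.|z_{j+1:2}), Eq. (29)
    dens = np.exp(-0.5 * ((z[0] - mu) / SIGMA) ** 2) / (SIGMA * np.sqrt(2.0 * np.pi))
    return p(z[0]) * dens
```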
Figure 1: (a) The target density $p_o(x)$ (solid line) and the normalized histogram of the samples generated using the proposed scheme with $N = 10$. (b) The mean acceptance probability of jumping to a new state, as a function of the number of tries $N$. We show the results of the technique in Pandolfi et al. (2010) (dashed line for the weights $\omega^{(1)}_j$ and dotted line with triangles for $\omega^{(3)}_j$) and of our method (squares for $\omega^{(1)}_j$, solid line for $\omega^{(2)}_j$, and circles for $\omega^{(3)}_j$). (c) Estimated linear correlation coefficient as a function of the number of tries $N$ for the different techniques.

6. Discussion

In this work, we have introduced a Metropolis scheme with multiple correlated points where the weight functions are not specifically defined, i.e., their analytic form can be chosen arbitrarily. We have proved that our novel scheme satisfies the detailed balance condition. Our approach draws from two different approaches (Pandolfi et al., 2010; Qin and Liu, 2001) to form a novel, efficient and flexible multi-point scheme.

The multi-point approach with correlated samples provides different advantages over the standard MTM. For instance, the multi-point procedure can iteratively improve the proposal pdfs in two different ways. Firstly, since the proposal pdfs can be distinct, as in Casarin et al. (2011), it is possible to tune the parameters of each proposal at every time step. Secondly, since the candidates are generated sequentially, successive proposal pdfs can be improved by learning from the samples already produced during the same time step.

Moreover, in our technique the only constraints on the weight functions are that they must be bounded and positive, unlike in the existing multi-point Metropolis algorithm (Qin and Liu, 2001), which is based on a specific definition of the weight functions. Here the weights can be chosen with respect to criteria such as improving performance or reducing computational complexity. Thus our method avoids any need to control or check the existence of a suitable function $\lambda$ and, therefore, the selection of the weight functions is broader and easier. It is interesting to observe that, in general, the function $\lambda$ may depend on the proposal pdf for a specific choice of weights and, in some cases, may entail certain constraints on the proposal pdf (such as requiring it to be symmetric, for instance). An important consequence of this is that, in our scheme, the weights can be chosen independently of the specific proposal pdf used in the algorithm. Namely, the proposal distribution and the weight functions can be selected separately, so as to fit the specific problem well and to improve the performance of the technique. However, further theoretical or empirical studies are needed to determine the best choice of weight functions given a certain proposal and target density.

Furthermore, unlike in Pandolfi et al. (2010), in our method the weights can depend on the previous candidates, and the domain of the weight functions grows from $\mathbb{R}^2$ to $\mathbb{R}^N$, thus being more general and potentially more powerful. Figure 2 illustrates the relationships among different MTM schemes according to the flexibility in the choice of the proposal and weight functions.
Finally, we have also shown a numerical simulation and a simple multi-point scheme that provides good performance, reducing the correlation in the produced chain.

Figure 2: Comparison of different MTM schemes in the literature according to the flexibility in the choice of the proposal and weight functions. With the acronym OBMC we indicate the orientational bias Monte Carlo introduced by Frenkel and Smit (1996).

7. Acknowledgements

We would like to thank the Reviewer for the comments, which have helped us to improve the first version of the manuscript. Moreover, this work has been partially supported by the Ministerio de Ciencia e Innovación of Spain (project MONIN, ref. TEC-2006-13514-C02-01/TCM, Program Consolider-Ingenio 2010, ref. CSD2008-00010 COMONSENS, and Distributed Learning, Communication and Information Processing (DEIPRO), ref. TEC2009-14504-C02-01) and by the Comunidad Autónoma de Madrid (project PROMULTIDIS-CM, ref. S-0505/TIC/0233).

References

Andrieu, C., Moulines, E., 2006. On the ergodicity properties of some adaptive MCMC algorithms. The Annals of Applied Probability 16 (3), 1462-1505.

Casarin, R., Craiu, R., Leisen, F., 2011. Interacting multiple try algorithms with different proposal distributions. To appear in Statistics and Computing, DOI: 10.1007/s11222-011-9301-9.

Frenkel, D., Smit, B., 1996. Understanding Molecular Simulation: From Algorithms to Applications. Academic Press, San Diego.

Hastings, W. K., 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57 (1), 97-109.

Liang, F., Liu, C., Carroll, R., 2010. Advanced Markov Chain Monte Carlo Methods: Learning from Past Samples. Wiley Series in Computational Statistics, England.

Liu, J. S., 2004. Monte Carlo Strategies in Scientific Computing. Springer.

Liu, J. S., Liang, F., Wong, W. H., 2000. The multiple-try method and local optimization in Metropolis sampling. Journal of the American Statistical Association 95 (449), 121-134.

Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., Teller, E., 1953. Equations of state calculations by fast computing machines. Journal of Chemical Physics 21, 1087-1091.

Pandolfi, S., Bartolucci, F., Friel, N., 2010. A generalization of the Multiple-try Metropolis algorithm for Bayesian estimation and model selection. Journal of Machine Learning Research (Workshop and Conference Proceedings Volume 9: AISTATS 2010) 9, 581-588.

Qin, Z. S., Liu, J. S., 2001. Multi-point Metropolis method with application to hybrid Monte Carlo. Journal of Computational Physics 172, 827-840.

Robert, C. P., Casella, G., 2004. Monte Carlo Statistical Methods. Springer.

Storvik, G., 2011. On the flexibility of Metropolis-Hastings acceptance probabilities in auxiliary variable proposal generation. Scandinavian Journal of Statistics 38 (2), 342-358.
