Free Energy Methods for Bayesian Inference: Efficient Exploration of Univariate Gaussian Mixture Posteriors

F ree Energy Metho ds for Ba y esian Inference: Eﬃcien t Exploratio n of Univ ariate Gaussian Mixture P o steriors Nicolas Chopin 1 ∗ , T on y Leli` evre 2 and Gabriel Stoltz 2 1: ENS AE-CREST, 3, Aven ue Pierre Larousse, 92245 Malakoﬀ, F rance. 2: U niversi t´ e P aris Est, CERMICS, Pro jet MICMAC Ecole des Po nts P arisT ech - I NRIA 6 & 8 Av. P ascal, 77455 Marne-la-V all ´ ee Cedex 2, F rance. Abstract Because of their multimodality , mixtur e p os terior distributions are diﬃcult to sample with standard Marko v c ha in Monte Carlo ( MCMC) methods. W e propose a strategy to enhance the sampling of MCMC in this context, using a bia sing proc edure whic h origi- nates from computational Statistical Ph ysics. The principle is ﬁrst to cho ose a “ reaction co ordinate” , that is, a “direction” in which the target dis tribution is multimodal. In a s econd step, the mar ginal log-density of the reaction c o ordinate with r esp ect to the po sterior distribution is estimated; min us this quant ity is called “free energ y” in the com- putational Statistical Physics literatur e . T o this end, we us e adaptive bia sing Marko v chain alg orithms whic h adapt their targeted in v ariant distributio n on the ﬂy , in order to ov er come sampling ba rriers along the chosen reaction coor dinate. Finally , we perfor m an impo rtance sampling step in o rder to r emov e the bia s and r ecov e r the true posterio r. The eﬃciency fa c to r of the impor tance sampling step can ea sily be estimated a priori once the bias is known, a nd a pp ea rs to b e rather large for the test cases w e c o nsidered. A crucial point is the choice of the reaction coo rdinate. One standard c hoic e (used for example in the class ic al W ang-Landau algor ithm) is minus the log-p osterio r density . W e discuss other c hoices. W e show in pa rticular that the h y per -parameter that determines the order of mag nitude of the v ariance of ea ch comp onent is bo th a conv enient and an eﬃcient reactio n co ordina te. W e also show how to adapt the metho d to compute the evidence (marginal lik eliho o d) of a mixtur e model. W e illustra te o ur approach b y analy zing tw o real data sets. Keyw o rds: Ada ptive Bia sing F orce; Adaptiv e Bia sing Potential; Adaptive Marko v c hain Monte Car lo ; Imp ortance sampling; Mixture models. 1 In tro duction Mixture mo deling is presumably the most p opular and the most ﬂexible w a y to mo del heterogeneous data; see e.g. Titterington et al. (1986), McLac hlan and P eel (2000) and F r ¨ uh wirth-Schnatte r (2006) for an o v erview of applications of mixture m o dels. Due to the emergence of Ma rko v c hain Mon te Carlo (MCMC), in terest in the Ba y esian analysis of suc h mo dels has sharply increased in r ecen t years, starting with Dieb olt and Ro b ert (1994). Ho w- ev er, MCMC analysis of mixtures p oses several pr oblems (Celeux et al., 2000; Jasra et al., ∗ Corresponding author: nicolas. chopin@ensae.fr 1 2005). I n this p ap er, we f o cus on the d iﬃculties arising from the multimod alit y o f the poste- rior d istribution. F or the sak e of exposition, w e concen trate our d iscussion on univ ariate Gaussian mixtures, but we explain in th e conclusion (Section 6) ho w our id eas may b e exte nded to other mixture mo dels. The data v ector y = ( y 1 , . . . , y n ) con tains indep enden t and identica lly distrib uted (i.i.d.) real random v ariables with density p ( y | θ ) = n Y i =1 p ( y i | θ ) , p ( y i | θ ) = K X k =1 q k ϕ ( y i ; µ k , λ − 1 k ) , (1) where the v ector θ con tains all the unknown parameters, i.e. the mixture w eights q 1 , ..., q K − 1 , the means µ 1 , . . . , µ K , the precisio ns λ 1 , . . . , λ K , and p ossibly h yp er-parameters, and ϕ ( · ; µ, λ − 1 ) denotes t he G aussian densit y with mean µ and v ariance λ − 1 . This mo d el is not iden tiﬁable, b ecause b oth th e lik eliho o d and the p osterior d ensit y are in v arian t by p ermutati on of comp onen ts, pro vided the prior is symmetric. This is the ro ot of the aforemen tioned problems. By construction, an y lo cal mo de of the p osterior dens it y admits K ! − 1 s ymmetric replicates, while a typica l MCMC sampler reco vers only one of these K ! copies. A p ossible remedy is to int ro du ce steps that p erm ute randomly th e comp onent s (F r ¨ uh wirth-Sc hnatter, 2001). Ho w ev er, m ixture p osterior distribu tions are often “gen uinely m u ltimo dal”, foll o w in g the terminology of Jasra et al. (2005): T he n u m b er of sets of K ! equiv alen t mo des is often larger than one; see also M arin and Rob ert (2007, Chap. 6) for an example of a m ultimo dal p osterior distribution generated b y an iden tiﬁable mixture mo del, and Section 2.2 of this p ap er for an example based on real data. Therefore, and follo wing Celeux et al. (2000) and Jasra et al. (2005), w e consider th at a minimum requirement of con verge nce for a MCMC samp ler is to visit a ll p ossible lab elling of the parameter, without resorting to random p erm utations. Inspired b y tec hniqu es used in molecular dynamics (Leli ` evre et al., 2010), our aim is to dev elop samplers that m eet this r equiremen t, usin g an imp ortance sampling strategy where the imp ortance samp ling fun ction is the marginal distribution of a w ell c hosen v ariable. T h e principle of the metho d is (i) to c h o ose a “reactio n co ordinate”, namely a low-dimensional function of the parameters θ ; (ii) to compute the marginal log-densit y of th is r eactio n co ordi- nate (min us this log-densit y is called the “free en er gy” in the molec ular dyn amics literature); and (iii) to u se the free en ergy to build a bias for the target of the MCMC sampler, in order to mo ve more freely betw een the diﬀeren t modal r egions of the initial target d istribution. More precisely , the biased log-density is obtained b y a dding th e free ener gy to the target lo g-densit y; this c hanges the marginal distribu tion of the reaction co ordinate in to a uniform distr ibution (within c h osen b ound s). A t the ﬁ nal stage of the algorithm, the bias can b e r emo ved by p erformin g a simple imp ortance sampling step from th e biased p osterior d istr ibution to the original un biased p osterior distrib u tion. E x p ectat ions with r esp ect to the p osterio r distribu- tion ma y b e computed fr om th e w eigh ted sample. W e also deriv e a metho d for computing the e vidence (marginal lik eliho o d). If the reaction co ordin ate is w ell c hosen, it is likely that a MCMC sampler targeted at the biased p ost erior conv erges to equilibrium m uc h faster than a standard MCMC s amp ler, see Leli ` evre et al. (2008). Tw o questions are then in order: Ho w to c h o ose th e reaction co ordinate? And ho w to compute the free energy? Concerning the c hoice of the reaction co ordinate, the ap p lication of fr ee energy based metho ds to mixture mo dels is not straightforw ard. In m an y cases, samplers targeting a 2 mixture p osterio r distribu tion are metastable : Th is term means that the tra jectory generated b y the algorithm remains stuc k in the vicinit y of some lo cal attraction p oin t for ve ry long times, b efore mo ving to a d iﬀeren t r egion of the accessible sp ace wh ere it again remains stuc k. W e sho w that the degree of metastabilit y of a sampler targeting a mixture p osterior distribution is strongly determined by certain h yp er-parameters, t ypically those that ca librate in the prior the spread of eac h Gaussian comp onent. W e show that suc h hyper -p arameters are go o d reaction co ordinates, in the sense th at (i) it is p ossible to eﬃcien tly compute the asso ciated free energy (b y adaptiv e algorithms, s ee b elo w), (ii) th e corresp onding free energy biased dynamics explores quic kly the (biased) p osterior distribution, and (iii) th e p oin ts sampled from the biased distribution h a ve non-negligible imp ortance weigh ts with r esp ect to the original target p osterior distribution. O ther reaction co ordinates are discussed, suc h as the p osterior log-densit y , wh ic h is also a goo d reaction coordin ate (with the problem ho we v er that an appropriate range of v ariation f or this reaction coord inate, wh ic h is n eeded for the metho d, is diﬃcult to determine a priori ). Th is reaction co ordinate is the standard c hoice for the W ang-Landau a lgorithm, see for instance Atc had ´ e and Liu (2 010), Liang (2005) and Liang (2010) fo r rela ted w orks in Statistics. T o compute the fr ee energy , w e resort to adaptiv e biasing algorithms (Darve and P ohorille , 2001; H ´ enin and Ch ip ot, 2004; Marsili et al., 2006; Leli ` evre et al. , 20 07). I n cont rast with the adaptiv e MCMC algorithms usually considered in the sta tistical lite rature (see the r eview o f Andrieu and Thoms (2 008) and references therein), these adaptiv e algo rithms mo d ify sequ en - tially the targeted inv arian t d istribution of the Mark o v c hain, in s tead of the parameters of the Marko v kernel. Suc h algorithms were initially designed to compute the free energy in molecular dynamics; see also Leli ` evre et al. (2010) for a recent review of alternativ e s tand ard tec hniqu es in molecular dynamics for computing the free energy , su c h as e. g. therm o dynamic in tegration. It is of course p ossible to co m bine our approac h with other strategies aimed at o verco ming m u ltimo dalit y , suc h as temp ering metho d s; see e.g. Iba (2001); Neal (2001) and Celeux et al. (2000). The pap er is organized as follo ws. S ection 2 presents the univ ariate Gaussian mixture mo del, and the diﬃculties encountered with classica l MCMC algorithms. S ection 3 describ es the metho d we prop ose, wh ic h is b ased on free energy biased dynamics. Section 4 explains ho w to apply this metho d to Ba y esian inference on Gauss ian mixture m o dels. S ection 5 illustrates our app roac h with t wo r eal data-sets. Section 6 discusses f urther research directions, in particular ho w o ur approac h ma y b e extended to other B a yesia n mo d els. 2 Scien tiﬁc con text 2.1 Gaussian mixture p osterior distribution As explained in the intro d uction, w e fo cus on the univ ariate Gaussian mixture mo del (1), asso- ciated with the follo wing prior, tak en from Richardson and Green (1997), whic h is symmetric with r esp ect to the comp onen ts k = 1 , . . . , K : µ k ∼ N( m, κ − 1 ) , λ k ∼ Gamma( α, β ) , β ∼ Gamma( g , h ) , ( q 1 , . . . , q K − 1 ) ∼ Diric h let K (1 , . . . , 1) . 3 In our examples, w e tak e m = M , κ = 4 /R 2 , α = 2, g = 0 . 2, h = 100 g /αR 2 , where R and M are resp ectiv ely the range and the mean o f the o bserve d data, as in Jasra e t al . (2005). The p osterior d ensit y reads p ( θ | y ) = 1 Z K p ( θ ) p ( y | θ ) (2) = κ K/ 2 g h β K α + g − 1 Z K Γ( α ) K Γ( g )(2 π ) n + K 2 K Y k =1 λ k ! α − 1 exp ( − κ 2 K X k =1 ( µ k − M ) 2 − β h + K X k =1 λ k !) × n Y i =1 " K X k =1 q k λ 1 / 2 k exp  − λ k 2 ( y i − µ k ) 2  # . In t his expression, θ is the v ector θ = ( q 1 , . . . , q K − 1 , µ 1 , . . . , µ K , λ 1 , . . . , λ K , β ) ∈ Ω = S K × R K × ( R + ) K +1 , (3) the set S K = ( ( q 1 , . . . , q K − 1 ) ∈ ( R + ) K − 1 , K − 1 X i =0 q i ≤ 1 ) is the probabilit y simplex, and Z K = Z Ω p ( θ ) p ( y | θ ) dθ (4) is the normalizing constan t (namely the marginal lik eliho o d in y ), whic h dep ends on K and y . The s ampling of the p oste rior distribution with densit y (2) is the focus of the paper. 2.2 Metastabilit y in Statistical P hysics In th is section, we d ra w a parallel b et w een the pr oblem of sampling (2) and the p r oblem of s ampling Boltzmann-Gibbs p robabilit y measur es th at arise in computational S tatistica l Ph ysics; see for instance (Balian , 2007). W e take this opp ortunit y to int ro du ce some of the concepts and terms of computational S tatistical Ph ysics t hat are relev an t in our con text. In computational Statistic al Physics, one is often in terested in computing a ve rage thermo- dynamic p rop erties of the system under consideration. This requires sampling a Boltzmann- Gibbs ( probabilit y) measure π ( θ ) = 1 Z exp {− V ( θ ) } , Z = Z Ω exp {− V ( θ ) } dθ , (5) where θ ∈ Ω ⊂ R d , and V is the p otential of the system, assumed to b e suc h that Z < ∞ . F rom no w on, the term p oten tial refers to th e logarithm of a give n (p ossibly unnormalized) probabilit y densit y: e.g. V ( θ ) = − log { p ( θ ) p ( y | θ ) } for th e p osterior densit y (2). Probabilit y den sities suc h as the p osterior density (2) for mixture mo d els, or the Bo ltzmann- Gibbs densit y (5) for systems in Statistic al Physics, are often m ultimo dal. S tandard sampling strategies, for instance the Hastings-Met rop olis algorithm (Met rop olis et a l., 1 953; Hastings, 1970) generate samples which t ypically remain stuc k in a small region around a lo cal m axi- m u m (also called local mod e) of the samp led distr ib ution (or, equiv alen tly , a local minimum of the p ote nt ial V ). The sequences of samp les generated b y these algorithms are said to b e metastable . 4 Figure 1 illustrates this p henomenon, in the Ba y esian mixture con text, for the p oste- rior (2), w ith K = 3 comp onent s, and f or t wo datasets (Fishery data, see Section 5.1, and Hidalgo stamps data, see Section 5.2). The p osterio r densit y admits at least K ! = 6 equiv- alen t mo d es, but very few mo de switc hin gs (if any) are observe d in the MCMC tra jectories. A simple Gaussian random w alk Hastings-Metrop olis is used, bu t w e obtained the same t yp e of plot s (n ot sh o wn ) with Gibbs sampling. 2 4 6 8 1 0 1 2 1 4 y 0 .0 0 .1 0 .2 0 . 3 0 .4 0 .5 6 7 8 9 1 0 1 1 1 2 1 3 1 4 y 0 . 0 0 . 1 0 . 2 0 .3 0 . 4 0 .5 0 .6 0 .7 0 .8 0 2.5e+08 5.0e+08 7.5e+08 1.0e+09 2 5 8 11 Number of iterations Mu 0.0 2.5e+08 5.0e+08 7.5e+08 1.0e+09 7 9 11 Number of iterations Mu Figure 1: L eft (resp. righ t) hand side corresp onds to Fish ery (r esp . Hidalgo stamps) dataset . T op ro w: histograms of the data and an estimated 3 -comp onent Gaussian mixture probabilit y densit y fu nction (solid line). T he d ashed line co rresp onds to a local mo de of the p osterior den- sit y . Bottom ro w: rand om walk Hastings-Metrop olis tra jectories of ( µ 1 , µ 2 , µ 3 ) a s a function of t he n umber o f iterat ions. See Section 5 for more details o n the data and the sampler. The top righ t plot of Figure 1 also represen ts a Gaussian mixture p r obabilit y densit y (in dashed line) whic h corresp onds to a neg ligible (in terms of p oste rior probabilit y) local mo de of the p osterior densit y f or the Hidalgo datase t. In n um er ical tests not rep orted here, wh en this lo cal m o de is used as a starting p oint, the Hastings-Metrop olis sampler used n eeds ab out T = 10 5 iterations to leav e th e attractio n of this meaningless mo de. It is easy to mak e this problem worse b y slig ht ly c hanging th e data. F or instance , T is increased to 10 7 b y adding 2 to the three largest y i ’s (while this local mode remained of v ery small p osterior probabilit y ). This lo cal mo de illustrates the typical “gen uine m ultimo dalit y” of mixture p osteriors mentioned in th e in tro duction – m ultimo dalit y which cannot be cured b y label permutation. 5 3 F ree-energy biased sampling Consider a generic probabilit y density π ( θ ), with θ ∈ Ω ⊂ R d as deﬁned in (5). T he principle of free en ergy biased sampling (describ ed more precisely in S ection 3.1) is to c hange the original densit y π to the biased den s it y: π A ( θ ) ∝ π ( θ ) exp { ( A ◦ ξ )( θ ) } , where ξ is s ome real-v alued function ξ : Ω → [ z min , z max ] where [ z min , z max ] ⊂ R is a b ounded in terv al, ( A ◦ ξ )( θ ) = A ( ξ ( θ )), and the s o-calle d fr ee energy A (see deﬁnition (6) b elo w ) is su c h that the distribution of ξ ( θ ) when θ is d istr ibuted according to π A is un iform o v er the interv al [ z min , z max ]. By sampling π A ( θ ) dθ rather than π ( θ ) dθ , the aim is to remov e th e metastabilit y in the d irection of ξ . Av erages with resp ect to the original distribu tion of interest π ( θ ) dθ are ﬁnally obtained by standard imp ortance sampling (see Section 3.3). W e assume ﬁrst that ξ ( θ ) take s v alues in a b ounded interv al [ z min , z max ], (think of ξ ( θ ) = q 1 and [ z min , z max ] = [0 , 1] f or the m ixture p osterior distribution case), and p ostp one the discussion of ho w to treat reaction co ordinates with v alues in an unboun ded domain to Section 3.4. An imp orta nt part of the algo rithm is to co mpute (an appro ximation of ) the free energy A . There a re man y w a ys to this end. W e describ e a class of metho d s w h ic h are v ery eﬃcie nt in the ﬁeld of computational Statist ical Physics (see Section 3 .2) and w h ic h, to our kn o wledge, ha ve n ot b ee n used so f ar in Statistics. 3.1 Principle of the metho d Consider the conditional probabilit y measures asso ciated with ξ : π ξ ( dθ | ξ ( θ ) = z ) = exp {− V ( θ ) } δ ξ ( θ ) − z ( dθ ) Z Σ( z ) exp {− V ( θ ′ ) } δ ξ ( θ ′ ) − z ( dθ ′ ) , where δ ξ ( θ ) − z ( dθ ) is a m easure with supp ort Σ( z ) = n θ ∈ Ω    ξ ( θ ) = z o , deﬁned by the form u la: for all smooth test fu nctions ϕ and ψ , Z Ω ψ ( ξ ( θ )) ϕ ( θ ) dθ = Z ψ ( z ) Z Σ( z ) ϕ ( θ ) δ ξ ( θ ) − z ( dθ ) dz . The main assumption underlying free-energy biased metho ds is that the fun ction ξ is chosen so that the sampling of π ξ ( dθ | ξ ( θ ) = z ) is signiﬁcan tly “ea sier” th an th e samplin g of π ( θ ) dθ , at least for some v alues of z . I n other wo rds, π ξ ( dθ | ξ ( θ ) = z ) should b e muc h le ss m ultimo dal than π ( θ ) dθ , at least for some v alues of z , see the discussion in Section 4.2 b elo w. T o giv e a co ncrete example, consider the c hoice ξ ( θ ) = q 1 . In this case, π ξ ( dθ | ξ ( θ ) = z ) is the conditional posterior distribution of a ll v ariables except q 1 , conditionally on q 1 = z . 6 The function ξ is calle d a reaction co ordinate in the ph ysics literature, b ecause of its phys- ical in terpretation: this function ξ parameterizes the pr ogress of some c hemical ev en t at a coarser scale (c h emical reaction or change of conformation for example). Giv en the tra ject ory { θ t } t ≥ 0 of a Mark o v chain, ξ ( θ t ) is typically a metastable tra jectory , a nd v aries on timescale s m u c h larger than the t yp ical microscopic ﬂu ctuations of the sy s tem arou n d its metastable conﬁgurations. Of course, in our Bay esian mixture con text, this physic al inte rpretation is not very useful. F or th e moment, w e stic k with the more g eneric (a nd in formal) un derstand- ing mentio ned at the b eginning of this paragraph, i . e. the samp lin g of π ξ ( dθ | ξ ( θ ) = z ) should b e “easier” than the samplin g of π ( θ ) dθ . W e defer the imp ortan t discussion on how to in terpret and c ho ose this “reaction co ordinate” in our sp eci ﬁc con text to Sectio n 4.2. W e also refer the readers to Leli ` evre et al. (2008); Leli ` evre and Minouk adeh (2011) for a precise quan tiﬁcation of this concept in a fun ctional analysis framework. Finally , although w e con- sider only one-dimens ional reaction co ordinates in this p ap er, we men tion that extensions to reaction co ord inates with v alues in R m with m ≥ 2 are p ossible (Leli ` evre et a l., 2010; Chip ot and Leli ` evre , 2010). Some algorithms allo wing to switch b et w een diﬀerent reaction co ordinates ha v e also been dev elop ed (Piana and Laio, 2007). The f r e e ener gy A ( z ) is deﬁned as exp {− A ( z ) } = Z Σ( z ) exp {− V ( θ ) } δ ξ ( θ ) − z ( dθ ) , (6) see for instance Section 5.6 in Balian (2007). In other w ord s, the f r ee energy is minus the marginal log-densit y of the r eactio n co ordinate. As ab ov e, let u s again consider the simple example when the react ion coordinate i s ξ ( θ ) = q 1 . T hen, the free-e nergy is simply , up to an additiv e constan t, equal to A ( q 1 ) = − log Z S K − 1 ( q 1 ) × R K × ( R + ) K +1 exp {− V ( θ ) } dq 2 . . . dq K − 1 dµ 1 . . . dµ K dλ 1 . . . dλ K dβ ! , where S K − 1 ( q 1 ) = ( ( q 2 , . . . , q K − 1 ) ∈ ( R + ) K − 2 , K − 1 X i =2 q i = 1 − q 1 ) . In w ords, t he f ree e nergy is in t his ca se the opp osite of the log -marginal densit y of q 1 . The f ree e nergy can b e used to bias the target densit y π as follo ws: π A ( θ ) = 1 Z A exp {− V ( θ ) + ( A ◦ ξ )( θ ) } . W e refer to d ensities of this form as fr e e ener gy-biase d densities. The essent ial p rop erty of π A ( θ ) is that, by constru ction, the corresp ond ing marginal distr ibution of ξ is uniform on the in terv al [ z min , z max ]. A s amp ler targeting π A ( θ ) dθ is thus m uc h less lik ely to b e metastable (namely to get stuck around a lo cal minimum of the d ensit y) than a s ampler targeting π ( θ ) dθ , b ec ause (i) the former sampler should mov e f reely along the direction ξ ( θ ) deﬁned by the reaction co ord inate, since the marginal d istribution of ξ ( θ ) is uniform, and (ii) w e h a ve assumed that th e r eactio n co ord inate ξ ( θ ) is su ch the conditional p robabilit y distributions π ξ A ( dθ | ξ ( θ ) = z ) = π ξ ( dθ | ξ ( θ ) = z ) are easy to sample, at least for some v alues of z (namely that they do not ha v e v ery separated mo des). 7 Therefore, fr ee energy-based metho d s aim at samp ling π A ( θ ), in order to mov e freely across the sampling space. Then, π is ev en tually reco ve red through an im p ortance sampling step, from π A to π : f or any test function ϕ , E π ( ϕ ) = Z Ω ϕ ( θ ) π ( θ ) dθ = Z Ω ϕ ( θ ) exp {− A ◦ ξ ( θ ) } π A ( θ ) dθ Z Ω exp {− A ◦ ξ ( θ ) } π A ( θ ) dθ = E π A  ϕ exp {− A ◦ ξ }  E π A  exp {− A ◦ ξ }  . (7) W e refer to Section 3.3 for fur th er precisions. Note th at for (7) to hold, A only needs to b e deﬁned u p to an additive constan t. 3.2 Computing the free energy b y adaptiv e metho ds In most cases, th e free energy A deﬁned in (6) do es not admit a closed-form expression, and must b e estimated. There are no w ada ys ma ny tec hn iques to this end, with v arious degrees o f eﬃciencies and conceptual co mplexities. W e p resen t in this section some p o werful algorithms, namely adap tive b iasing method s , wh ich are not so we ll kno wn in the statistical literature. Of course, an y other standard metho d such as thermo dynamic integrat ion could b e used (see the b o ok (Leli` evre et al., 2010) for a precise presenta tion of standard metho ds for fr ee energy compu tations in the framewo rk of compu tational Statistical Ph ysics, as well as G elman and Meng (1998 ) for a review from the viewpoint of Stat istics). 3.2.1 General structure of adapt iv e metho ds In adaptive biasing Marko v c hain Mon te Carlo metho ds, a time-v arying biasing p ote n tial A t ( z ) is considered. T he biasing p oten tial A t is sequen tially u p dated in order to con ve rge to the free energy A in the limit. As already mentioned in the in tro duction, the term “adaptiv e” refers in this pap er to the dynamic adap tation of the targeted probability measur e, and not of the paramete rs of a Marko v k ernel used in th e sim u lations. Sp eciﬁcally , at iteration t , the time-v arying ta rgeted densit y is π A t ( θ ) = 1 Z A t exp {− V ( θ ) + ( A t ◦ ξ )( θ ) } . (8) An adaptiv e MCMC algorithm s im u lates a non-homogeneous Mark o v c hain ( θ t ), t = 1 , 2 , . . . , using t he tw o follo wing steps at iteratio n t : (1) a M CMC mo v e ac cording to the curren t target π A t deﬁned in (8), θ t ∼ K t ( θ t − 1 , · ) , where K t is a Mark o v k ernel lea ving π A t in v arian t; (2) the up date of the bias t o A t +1 , using a tra jectory a v erage, see Sectio n 3.2.2 b elo w. The ﬁ rst step ma y b e d one usin g a Hastings-Metrop olis kernel for instance, see Fi gure 2. Before explaining th e second step, let us mention how the discretization of the reaction co ordinate v alues for the biasing p oten tial A t is done in p ractice. A simple strategy , whic h w e adopt in this pap er, is to u se predeﬁ n ed bins, and approximate the biasing p oten tial A t or 8 its deriv ativ e A ′ t (with resp ect to z ) b y piecewise constant fun ctions. Sp eci ﬁcally , w e consider N z bins of equal sizes ∆ z , [ z min , z max ] = N z − 1 [ i =0 [ z i , z i +1 ] , z i = z min + i ∆ z , ∆ z = z max − z min N z . Other discreti zations ma y of course be used, bu t this is not the fo cus o f this pap er. 3.2.2 Strategies for up dating the bias Recall that the b ottom line of adaptiv e metho ds is that A t should conv erge to A . Adaptive b i- asing metho d s can b e classiﬁed in to t w o categories, d ep endin g on whether it is the free e nergy A t ( z ), o r its deriv ativ e A ′ t ( z ) with resp ect to z , whic h is up dated. Instances of the ﬁrst strat- egy , called adaptiv e b iasing p oten tial (ABP) metho ds, in clude nonequilibrium metadynam- ics (Bussi et al., 2006; Raiteri et al., 200 6), the W ang-Landau algorithm (W ang and L andau, 2001b,a) and Self-Healing Um brella S ampling (Marsili et al., 2006). The adaptiv e biasing force (ABF) m etho dology (Da rv e and Pohorille, 2001 ; H ´ enin and Chip ot , 2004; Leli` evre et al ., 2007), whic h is the m ain adaptive metho d u s ed in this pap er, is an instance of the second. F rom now on, we fo cu s on t w o p articular str ategies, one b elo nging to the ABP class, and another to ABF class. The ABP strategy we c ho ose is b ased on Marsili et al. (2006). In particular, w e do n ot use the W ang-Landau algo rithm, whic h is, to our kno wledge, th e only ABP method discussed b efore in th e Statistic al literature; see e.g. At c h ad ´ e and Liu (20 10), Liang (2005) and Lia ng (2010). Indeed, a delic ate p oin t with the W ang-Landau approac h is ho w to c h o ose the v anish- ing rate of the “gain factor”; see e.g. Liang (20 05). On the other h and, Self-Healing Umbrella Sampling do es not inv olv e such an additional parameter t o b e tun ed . I t c onsists in up dating A t as follo ws. Th e biasing p ote nt ial for z ∈ ( z i , z i +1 ) is initia lly set to exp { − A 0 ( z ) } = 1 / N z (for all i ∈ { 0 , . . . , N z − 1 } ) , and then up dated for all i ∈ { 0 , . . . , N z − 1 } and for t ≥ 1 as ∀ z ∈ ( z i , z i +1 ) , exp {− A t ( z ) } = 1 Z t   1 + t − 1 X j =1 1 { z i ≤ ξ ( θ j ) α t , r eje ct the move and set θ t = θ t − 1 . (d) F ol lowing (12) , up date the biasing for c e, henc e the biasing p otential A t +1 . (e) Go to Step (a). 3.3 Rew eigh ting free-energy biased sim ulations Up on stabilization of the adaptiv e alg orithm at iteration T , an estimate b A = A T of the biasing p oten tial A is o btained, from which one deﬁ n es the biased den sit y e π ( θ ) = π b A ( θ ) = 1 e Z π ( θ ) exp n b A ◦ ξ ( θ ) o . (13) T o sample the true p osterior π , w e use the follo wing simple strategy . W e sim ulate a standard MCMC algorithm, e.g. a rand om wa lk Hastings-Met rop olis algorithm, targeted at the b iased p osterior density e π , and then p erform an imp ortance sampling step from e π to π , based on the importance sampling w eigh ts: w ( θ ) = exp n − b A ◦ ξ ( θ ) o ∝ π ( θ ) e π ( θ ) . (14) 11 F rom the MCMC c hain ( θ t ) t ≥ 1 targeted at e π , the exp ectation with resp ect to π of a test function h can thus b e estimat ed as (see (7)): E π ( h ) = E e π ( hw ) E e π ( w ) ≃ t max X t =1 h ( θ t ) w ( θ t ) t max X t =1 w ( θ t ) , (15) where t max is the n u m b er of iterations of the M CMC c hain. 3.4 Reaction co ordinates with un b ound ed v alues There are man y cases when th e reaction co ord inate takes v alues in an unboun ded in terv al I . Here I should be u ndersto o d as the sup p ort of the d istr ibution of the random v ariable ξ ( θ ) when θ ∼ π ( θ ) dθ . On e ma y thin k of ξ ( θ ) = µ 1 as an example for the mixture p osterior distribution ca se (in which case I = R ), and see S ection 4.2 for more exa mples. It is not p ossible to apply th e ab ov e pro cedur e on the whole inte rv al I . Some truncation is required f or at least t wo r easons. First, numerically , it would b e diﬃcult to discretize in space a fun ction deﬁned on an unb ounded domain. Seco nd (and more imp ortant ly) the use of the full free energy o ver I w ould lead to a den sit y π A whic h is not in tegrable (since the uniform law o v er I is not well d eﬁned as a probabilit y distribution). W e therefore resort to the follo wing strategy . First, we c ho ose some truncation in terv al [ z min , z max ]. Then, in the adap tive MCMC algorithm (wh ic h calculates the free energy), we reject an y p oin t θ suc h that ξ ( θ ) fall outside of this in terv al. This is tan tamount to restricting the sampling s pace with the constraint z min ≤ ξ ( θ ) ≤ z max . In this wa y , one obtains an estimate b A ( z ) of the free energy A ( z ), but only for z ∈ [ z min , z max ]. When b A is obtained, we s imply extend its deﬁnition outside z ∈ [ z min , z max ] as follo ws: b A ( z ) = b A ( z min ), for z ≤ z min , b A ( z ) = b A ( z max ), for z ≥ z max . Finally , we run a standard MCMC s amp ler targeting th e distrib ution e π = π b A , as describ ed in the previous section. Note that the biased distribution e π is d eﬁ ned o v er the whole parameter space Ω ( in particular, no additional r ejection ste p is needed in the samp ling o f th is distr ib ution). In p ractice, one should choose an in terv al [ z min , z max ] whic h is not to o large, bu t at the same time su c h that the pr ob ab ility (with resp ect to π ) of the ev ent z min ≤ ξ ( θ ) ≤ z max is close t o one: Z z max z min exp( − A ( z )) dz Z I exp( − A ( z )) dz ≃ 1 , (16) so that I \ [ z min , z max ] is barely v isited (see also Section 4.1.2). This is one of the practic al diﬃculties that w e shall discuss in the next section. 4 Ba y esian infere nce from free energy b iased dy n amics In this section, we explain ho w to p erf orm Ba yesia n inf erence for the u niv ariate Gaussian mixture mo del describ ed Section 2.1, that is, how to compute quantit ies such as p osterior exp ectations and ratio s of marginal lik eliho o ds (equal to ratios of normalizing constan ts Z K 12 deﬁned in (4)), usin g the free energy asso ciated to a giv en reaction co ordin ate to b u ild an im - p ortance function. Th e Gaussian mixture mo del corresp onds, in the notation of Section 3, to π ( θ ) = p ( θ | y ) ∝ p ( θ ) p ( y | θ ), hence V ( θ ) = − log { p ( θ ) p ( y | θ ) } . F or a giv en reactio n c o ordinate, and a giv en estimate b A of the free energy A , the free-energy biased probabilit y distribution is e p ( θ | y ) ∝ p ( θ | y ) /w ( θ ) ∝ p ( θ | y ) exp  b A ◦ ξ  , where w is deﬁned by (14). W e start by listing the criteria we use to assess the qualit y of the imp ortance sampling pro cedure in Section 4.1. As mentio ned in the int ro du ction, the strategy to sample the p osterior distribu tion (2) consists of three steps: choosing a reactio n coordin ate, computing (an appro ximation of ) the free energy asso ciated to this reaction co ord inate, and using the free energy to b uild an imp ortance sampling prop osal distribu tion according to (13). The previous section was dev oted to th e second and third steps . W e discuss the ﬁrst step in Section 4.2 fo r the mixture model at hand. Section 4.3 presents an extension of the m etho d to the computatio n of the r atio of normalizing constan ts asso ciated to diﬀerent v alues of the n umber of c omp onents K , in order to p erform mo del c hoices b et w een mo dels corresp ondin g to diﬀerent n umber of comp onen ts. Note that we discuss in this section th e th eoretical eﬃciency of th e whole approac h. These discussions are su pp orted b y n umerical exp erimen ts in S ection 5. 4.1 Criteria for c ho osing the reaction co ordinate 4.1.1 General criteria W e consider the follo wing criteria f or ev aluating the pr actical eﬃciency of the wh ole pro cedure, for a giv en c hoice of the reaction co ordinate ξ : (a) In the execution of the (either ABF or ABP) adaptive algorithm, h o w fast d o es the appro ximate fr ee en er gy A t con verge to its l imit A ? (b) Ho w eﬃcien t is the imp ortance samp ling step from the biased distribu tion to the originally targeted p oste rior distribution? Actually , this criterion is t w ofold: (b1) Ho w eﬃcien t is the MCMC sampling of t he b iased densit y e p ( θ | y )? (b2) Ho w representa tiv e are the p oin ts simula ted from the biased distribu tion with re- sp ect to t he target p osterior distrib ution? ( i.e. how many of these p oints are assigned non -n egligible imp ortance we igh ts?) (c) A more practical criterion is (in the case of a reaction co ordinate w ith v alues in an un- b ound ed d omain): How diﬃcult is it to determine, a priori, an in terv al [ z min , z max ] for the reaction coordin ate v alues, wh ic h ensures goo d p erformance with resp ec t to Criteria (a) and (b) and wh ic h satisﬁes (16) ? Criterion (b2) is discussed in the next sectio n. Cr iteria (a) and (b1) can actually b e shown to b e closely related, at least for some family of adaptiv e methods , see Leli ` evre et a l. (2008). Roughly sp eaking, an adaptiv e alg orithm yields qu ickly an estimate of t he free energy , if and only if the free en ergy is in deed a go o d biasing p ote nt ial, in the sense that the dynamics driv en b y the biased p oten tial conv erges quickly to a limiting d istribution. Th eoreticall y and 13 as mentio ned ab o v e, a suﬃcient condition for an eﬃcien t samp lin g is th at the conditional probabilit y distributions π ξ ( dx | ξ ( x ) = z ) are easy to samp le, at least for some v alues of z (namely they do not ha v e v ery separated mo d es). W e refer to Leli ` evre et al. (2008); Leli ` evre and Mi nouk adeh (2 011) for precise mathematic al results. Numerically , to assess th e con verge nce of adaptiv e metho d s, we recommend the follo w in g t wo basic chec ks: (i) that the output of the adaptiv e algorithm has explored the full range [ z min , z max ] and has a distrib ution wh ic h is close to uniform; and (ii) using the criterion men tioned in the in tro d uction, and sp eciﬁcally for mixture p osterior distr ib utions, that the algorithm has visited the K ! symmetric r eplicates of an y signiﬁcant lo cal mo de. Th e same con verge nce c h ec ks can be applied to t he MCMC dynamics ta rgeted at e π . 4.1.2 Eﬃciency of t he imp ortance sampling step W e giv e here a wa y to quan tify Criterion (b2). T o ev aluate the p erformance of the imp ortance sampling step, w e compu te t he f ollo wing eﬃciency factor EF = T X t =1 w ( θ t ) ! 2 T T X t =1 w ( θ t ) 2 where w ( θ ) is deﬁn ed in (14), and where { θ t } t ≥ 0 denotes the MCMC sample targeting the biased p osterior e p ( θ | y ), as describ ed in Section 3.3 . The eﬃciency factor is the E ﬀective Sample Size of Kong et al. (1994) divided by the num b er of sampled v alues. T h is quan tit y lies in [0 , 1]. It is close to one (resp. to zero) w h en the random v ariable w ( θ ) has a sm all (resp. a la rge) v ariance. Indeed, it is easy to chec k that EF =  V ar T ( w ) (E T ( w )) 2 + 1  − 1 , V ar T ( w ) = 1 T T X t =1 w ( θ t ) 2 − h E T ( w ) i 2 , where the latte r quan tit y is the empirical v ariance of the sample { w ( θ t ) } 1 ≤ t ≤ T , and E T ( w ) = P T t =1 w ( θ t ) /T its empirical a verage . W e now prop ose an estimate of the eﬃciency factor in terms of th e con verged bias b A only , whic h ma y therefore b e compu ted b ef or e the MCMC algorithm targeting th e b iased p osterior is run, and the imp ortance sampling step is p erformed. This estimate is based on the fact th at, with resp ect to e p ( θ | y ), the marginal distribu tion of ξ is appro ximately un iform o ver [ z min , z max ]. F or we ll c hosen z min and z max , ξ ( θ t ) hardly visits v alues out of the interv al [ z min , z max ] ( see (1 6) abov e) and thus V ar T ( w ) (E T ( w )) 2 ≃ 1 z max − z min Z z max z min  exp n − b A ( z ) o − 1 z max − z min Z z max z min exp n − b A o  2 dz  1 z max − z min Z z max z min exp n − b A ( z ) o dz  2 , 14 whic h p ro vid es a justiﬁcation for the follo wing “theoretica l” eﬃciency fact or: EF theoretical =  Z z max z min exp n − b A ( z ) o dz  2 ( z max − z min ) Z z max z min exp n − 2 b A ( z ) o dz . (17) The a greemen t b et w een theoret ical and numerically c omputed eﬃciency factors in our sim ula- tions is v er y go o d, see T ables 1 , 2 an d 4 in Section 5.1 .3. Thus, the theoretical eﬃciency fac tor allo ws for a quic k c h ec k that the subsequent imp ortance samp ling i s r easonably e ﬃcien t. F rom the expr ession (17), it is seen that the eﬃciency factor is close to one wh en A is close to a constant. T h us, Criterion (b2) men tioned in the previous section is likely to b e satisﬁed if the fr ee energy asso ciated to ξ has a small amplitude, i.e . max A − m in A is as small a s p ossible. 4.2 Practical c hoice of t he reaction co ordinate W e no w discuss the pr actica l c hoice of the reaction coord inate ξ : θ → R in the mixture p osterior s ampling co n text, with resp ect to the crite ria listed ab ov e. W e discuss su ccessiv ely the follo wing four p ossib le c hoices: ξ ( θ ) = µ 1 , ξ ( θ ) = V ( θ ) = − log { p ( θ ) p ( y | θ ) } , ξ ( θ ) = q 1 and ξ ( θ ) = β . T his discuss ion is also illustrated numerically in Sectio n 5.1.2. The requirement that the multimodalit y of th e target measure conditional on ξ ( θ ) = z is m uch less n oticeable than the m u ltimo d alit y of the original target measure rules out the c hoice of µ 1 as a go o d reaction co ordinate since, conditionally on µ 1 , the p osterior den sit y still has at least ( K − 1)! mo des, as the comp onen ts 2 to K remain exchangea ble. Nu merical tests in deed su pp ort t hese consid er ations, see belo w. A more n atur al reaction co ordinate is min us the p osterior log-densit y ξ ( θ ) = − log { p ( θ ) p ( y | θ ) } , in the spirit of the original W ang-Landau algorithm (W ang and Lan d au, 2001 b ,a). Ind eed, exploring regions of lo w p osterior density should help to escape more easily from local mod es. Unfortunately , determining a range [ z min , z max ] of “lik ely v alues” (with resp ect to the posterior distribution) for su c h functions of θ is not str aigh tforward; see C r iterion (c) ab o v e. Moreov er, since the p osterior densit y is exp ected to b e multimod al and diﬃcult to explore, there se ems to b e litt le p oint in p erforming MCMC pilot runs in order to d etermine [ z min , z max ]. A conser- v ativ e approac h is to c ho ose a v ery wide in terv al [ z min , z max ], but t his mak es the subsequent imp ortance sampling step quite ineﬃcien t. In our simulatio ns, w e rep ort satisfactory results for this reaction co ordin ate, b ut with the ca vea t that our c h oice for [ z min , z max ] was facili- tated by our diﬀeren t simulati on exercises, based on sev eral r eaction co ordinates. Another practical d iﬃcult y we obs er ved in one case is that the estimated bias is quite inaccurate in the immediat e vicinit y of the p osterior mo d e, b ecause the f ree e nergy tends to increase v ery sharply in this reg ion, see Section 5 .2 for more d etails. The choice ξ ( θ ) = q 1 is satisfactory with r esp ect to Criterion (c): Th e range on whic h it can v ary , namely [0 , 1], is clearly kno wn. With resp ect t o (a), this c h oice lo oks app eali ng as wel l, sin ce forcing q 1 to get close to 1 should empt y the K − 1 other comp onen ts, whic h then ma y swap more easily . Unfortun ately , we obs er ve in some of our exp eriments that the dynamics biased b y the f ree-energy asso ciated with this reaction coordin ate is not v ery successful i n te rms of mo de switc hings, see Figures 6 and 1 3. Finally , ξ ( θ ) = β app ears to b e a go o d trade-oﬀ with resp ect to our criteria, at least in the examples we consider b elo w . Concerning the determination of the in terv al [ z min , z max ] 15 (Criterion (c)), since β determines the order of m agnitude of the co mp onent v ariances σ 2 k = λ − 1 k , there should b e high p osterior pr obabilit y that β is a small fraction of R 2 , where R is the r an ge of the d ata. F or instance, w e obtain satisfact ory results in all our exp erimen ts with [ z min , z max ] = [ R 2 / 2000 , R 2 / 20]. Concerning Criterion (a), w e observe that the c hoice ξ = β p erforms w ell ( see the numerical results belo w). W e p rop ose the follo w ing explanation. Since the λ k ’s hav e a Gamma( α, β ) prior, large v alues of β p enalize large v alues for the comp onen t precisions λ k , or equiv alen tly p enalize small v alues for the comp onent standard deviations σ k = λ − 1 / 2 k . If β is large enough, the Gaussian comp onen ts are forced to co v er the complete range of the data, and th u s can switch easily . Th is in teresting phenomenon is illustrated by Figure 8 , see b elo w for further precisions. In ot her w ords, a “goo d” reaction co ord in ate ξ should b e such that the conditional probabilit y distributions π ξ ( dx | ξ ( x ) = z ) are less m ultimo dal than π , at le ast f or some values of z . F or a theoretical r esult supp orting this inte rpretation, w e refer to Leli ` evre and Minou k adeh (2011). 4.3 Computing normalizing c onst ants and mo del choice In this section, w e discuss an extension of the method to p erform mo del c hoice b et we en mo d els with diﬀerent num b ers of comp onent s. Th e principle is to compute the norm alizing c onstan t Z K of the p osterio r densit y for diﬀerent v alues of K , see (4) f or a deﬁn ition of Z K . More precisely , it is su ﬃ cien t to ev aluate Z K / Z K − 1 for a giv en range of K (see (Rob ert and Casella, 2004, Ch ap. 7) for a review on Ba y esian m o del c hoice). W e prop ose the follo wing strategy . The estimat ion of Z K / Z K − 1 can b e p erformed by ﬁrst estimating Z K / e Z , then estimating Z K − 1 / e Z , and ﬁnally dividing the tw o quan tities. A simp le estimator o f Z K / e Z (wh er e e Z is the normalizing constan t in (13)) is give n b y b I K = 1 T T X t =1 w ( θ t ) , where { θ t } t ≥ 0 is a samp le distr ib uted according to the biased p robabilit y e p ( θ | y ) (with K normal comp onen ts). This formula is based on the fact that the exp ect ation of w ( θ ) = exp {− b A ◦ ξ ( θ ) } w ith resp ect to e p ( θ | y ) is Z K / e Z . Let θ − k denote t he parameter ve ctor o btained b y remo ving in θ the parameters attac hed to a giv en comp onen t k , and r eplacing the probabilities q l (for l 6 = k ) by e q l = q l / (1 − q k ). Let p ( y | θ − k ) denote the likel iho o d of the mo del with K − 1 comp onents, and parameter θ − k . Then the follo wing quant it y b I K − 1 = 1 K K X k =1 b I K − 1 ,k , b I K − 1 ,k = 1 T T X t =1 w − k ( θ t ) , where { θ t } t ≥ 0 is the same Marko v c hain as ab o v e, and w − k ( θ ) = p ( y | θ − k ) p ( y | θ ) exp n − b A ◦ ξ ( θ ) o , (18) is an estimator of Z K − 1 / e Z . The estimato rs b I K and b I K − 1 are reminiscen t of the birth and death mo v es of the rev er s ible jump algorithm of Ric hard son and Green (1997), where a new mo del is prop osed by a dding or remo vin g a comp onent chosen at random. The diﬀerence is that the biased p oste rior 16 e p ( θ | y ) acts as an intermediate state b et w een the p osterior w ith K comp onen ts, p ( θ | y ) and the p osterior w ith K − 1 co mp onents (or more precisely , the p ost erior with K − 1 comp onen ts times the prior of a K -th “non-acting” comp onen t, in order to matc h the dimensionalit y of b oth p ( θ | y ) and e p ( θ | y )). In our numerical exp erimen ts, th e estimator of Z K / Z K − 1 obtained from this strategy p erforms w ell, se e Secti on 5 (in particular T able 3). 5 Numerical examples In o ur exp erimen ts and as explained abov e, w e use the follo w in g approac h. First, w e run an adaptiv e algorithm (ABF, except for ξ ( θ ) = V ( θ ) = − log { p ( θ ) p ( y | θ ) } , in whic h case we u se ABP), f or a giv en c hoice of the reaction coordinate ξ , and a giv en interv al [ z min , z max ], un til a con verge d bias b A is obtained. Second, we run a MCMC algorithm, with e p ( θ | y ) giv en b y (1 3) as inv ariant densit y . Th ird, we p erform an imp ortance sampling step from e p ( θ | y ) to p ( θ | y ), the un biased p osterior dens ity . See the in tro duction of Secti on 4 for the notation. The qualit y of the biasing procedu r e is assessed us ing the c riteria menti oned in Sec- tions 4.1. This consists in: (i) c hec kin g that the reaction co ordin ate v alues are u n iformly sampled o v er [ z min , z max ], (ii) c hec kin g that the output is symmetric w ith r esp ect to lab ellings, and man y switc hings b et w een the mo des are o bserved and (iii) computing th e eﬃc iency factor (a go o d indicator b eing the esti mator (17) deﬁ n ed in terms of b A ). In th e ﬁ rst step of the metho d (appr o ximation of the free energy usin g adaptiv e algo- rithm), w e d elib erately use the simplest exploration strategy , namely a Gaussian random w alk Hastings-M etrop olis up date, with small scales (see b elo w for th e precise v alues). This is to illustrate that the abilit y of adaptiv e algorithms to appro ximate the free energy do es not crucially d ep end o n a ﬁnely t uned up dating strategy . In the sec ond step, w e run a Hastings-Metropolis algorithm targeted a t the biased p oste- rior, using Cauch y random w alk prop osals, and the follo win g scales: τ µ = R/ 1000, τ v = 2 /R 2 , τ β = 2 × 10 − 5 αR 2 , where R is the range of the data, w hic h leads to acceptance rates b etw een 10% and 30% in all cases. 5.1 A ﬁrst example : the Fishery data W e ﬁrst consider the Fishery d ata of Titterington et al. (1986) (see also F r ¨ uh wir th -Sc h natter (2006)), wh ic h consist of the lengths of 256 sn app ers, and a Gaussian mixture mo del w ith K = 3 comp onents; see Fi gure 1 for a histogram. 5.1.1 Con vergence of the adaptive algorithms In the adaptiv e algorithm, w e use Gaussian random wa lk p rop osals with scales τ q = 5 × 10 − 4 , τ µ = 0 . 025, τ v = 0 . 05 and τ β = 5 × 10 − 3 . These parameters were also used to p ro duce the un biased tra ject ory in Figure 1. W e illustrate here the conv ergence p ro cess in the case ξ ( θ ) = β , u sing the ABF algorithm describ ed in Section 3.2, with z min = 0 . 05, z max = 4 . 0 and ∆ z = 0 . 01. First, we plot on Figure 3 the tra jectory of ( µ 1 , µ 2 , µ 3 ) and β for T = 10 8 iterations. With the ABF algorithm, the v alues visited b y β co ve r the wh ole in terv al [ z min , z max ], and the applied bias enables a frequen t switc hing o f t he m o des (observe d here on the p arameters ( µ 1 , µ 2 , µ 3 )). The tra jectories f or ( µ 1 , µ 2 , µ 3 ) should b e compared with the ones give n on 17 Figure 1, where no adaptive biasing force is applied (n ote that th e x -axis scale is not the same o n both plots). 0.0 2.5e+07 5.0e+07 7.5e+07 1.0e+08 4 6 8 10 12 Number of iterations Mu 0.0 2.5e+07 5.0e+07 7.5e+07 1.0e+08 0 1 2 3 4 Number of iterations Beta Figure 3: T ra j ectories o v er the 10 8 ﬁrst iteratio ns of th e ABF algorithm for the c hoice ξ ( θ ) = β , for ( µ 1 , µ 2 , µ 3 ) (left) and for the β v ariable (rig h t). Second, w e monitor the con vergence of the biasing p otenti al. T o this end, w e run a sim u lation for a total n umber of iterations T = 10 9 , an d store the biasing p oten tial eve ry N cvg = 10 6 iterations. The d istance b et w een the current bias and the b ias at iteration t − N cvg is mea sured b y δ t = v u u t inf c ∈ R N z − 1 X i =0  A t,i − A t − N cvg ,i − c  2 , (19) where A t,i denotes the v alue o f the bias in bin i , i.e. A t ( z ) = A t,i if z ∈ ( z i , z i +1 ). S ince the p oten tial is deﬁn ed only up to an additiv e constant, w e consider the optimal shift constan t c wh ich minimizes the mean-squared distance b et ween the t w o proﬁles. An ele men tary com- putation shows that this constan t is equal to the diﬀerence b etw een the a v erages of A t and A t − N cvg . W e ﬁnally renorm alize this distance as ε t = δ t q P N z − 1 i =0 A 2 t,i . The relati v e distance ε t as a function of the ite ration index t is plotted in Figure 4 . Corr ect appro ximations of the bias a re obtai ned after a few m ultiples of N cvg iterations (t he relativ e error b eing a lready lo we r than 0.1 a t the ﬁrst con v ergence c h ec k). W e emphasize again t hat w e did not optimize t he prop osal mo v es in order to reac h t he fastest conv ergence of the bias. It is very lik ely that b etter conv ergence results could b e obtained by carefully tun ing the parameters o f t he prop osal fu n ction, or resorting to prop osals of a diﬀeren t t yp e. On Figure 5, w e plot the free energies associated to the four react ion co ordinates ment ioned ab o v e, as estimated by adaptive algorithms. W e r ecall that, for ξ ( θ ) = β , z min = 0 . 05, z max = 4 . 0 and ∆ z = 0 . 01. F or ξ ( θ ) = − log { p ( θ ) p ( y | θ ) } , we used an ABP algorithm, with z min = 500, z max = 540 and ∆ z = 0 . 1. F or ξ ( θ ) = q 1 and ξ ( θ ) = µ 1 , we used ABF, with resp ectiv ely z min = 0, z max = 1, ∆ z = 0 . 005 an d z min = min y i = 2 . 5, z max = max y i = 13, ∆ z = 0 . 05. Recall that the s o-obtained b ias is minus the marginal p osterior log-densit y of ξ . 18 0.0e+00 1.0e+08 2.0e+08 3.0e+08 4.0e+08 5.0e+08 −4 −3 −2 −1 Number of iterations logarithmic relative distance Figure 4: C on vergence of the logarithmic relativ e distance log( ε t ) / log (10) (see (19)), as a function o f the n umber o f iterat ions. This is wh y the thr ee imp ortant mo des in the µ parameter can b e read from the corresp onding bias in Figure 5 . Note also that there is a lo w er b ound on the admissible v alues of m in us the log-p osterior densit y , hence the platea u v alue of the corresp onding bias f or lo w v alues of the reaction c o ordinate corresp onding to unexplored regions. 3.0 6.0 9.0 12.0 −14.0 −10.0 −6.0 −2.0 1.0 Mu Bias 0.0 0.25 0.5 0.75 1.0 −8.0 −4.0 0.0 4.0 8.0 q Bias 0.0 1.0 2.0 3.0 4.0 −4.0 −2.0 0.0 2.0 4.0 6.0 Beta Bias 500 510 520 530 540 −10 −5 0 5 10 15 Negative of the log−posterior density Bias Figure 5: The ﬁshery data. F ree energies ob tained for the reaction co ordinates: ξ ( θ ) = µ 1 (top left), ξ ( θ ) = q 1 (top righ t), ξ ( θ ) = β (bottom left) and ξ ( θ ) = − log { p ( θ ) p ( y | θ ) } (b ottom righ t). 5.1.2 Eﬃciency of t he biasing pro cedure W e n o w discuss the results of the MCMC algorithm targeted at the biased p osterior d istr i- bution, u sing the free energies compu ted ab o v e. In Figure 6, w e observe that all the b iased dynamics are m uc h more successful in terms of m o de switc h ings than the unbiased dynam- 19 Reaction co ord inate β − log { p ( θ ) p ( y | θ ) } q 1 µ 1 EF (numerical) 0.1 7 0.16 0.48 0.04 EF (theoretical) 0.179 0.178 0.4 54 0.079 T able 1 : Eﬃciency factor for v arious c hoices of reaction coordin ates, in the case K = 3. ics (see Figure 1). More pr ecisely , th e dynamics biased b y th e free energy asso ciated w ith ξ ( θ ) = − log { p ( θ ) p ( y | θ ) } is the most successful in terms of switc h ings, but th e dynamics with ξ ( θ ) = β p er f orms correctly as w ell. The dynamics with ξ ( θ ) = q 1 seems to b e less su ccessful. In the case ξ ( θ ) = µ 1 , on e v alue of the parameters µ is forced to visit th e w hole r ange of v alues. The lo w est mo de (a round µ = 3) is not v ery w ell visited here. The eﬃciency factors are pr esen ted in T able 1. Th ey are rather large, w hic h sho w s that the imp ortance sampling pro cedure do es not yield a d egenerate sample. T he c hoice ξ ( θ ) = q 1 is the b est one, but ξ ( θ ) = β and ξ ( θ ) = − log { p ( θ ) p ( y | θ ) } giv e comparable and satisfactory results as w ell. The choice ξ ( θ ) = µ 1 on the other hand is a p o or c hoice in this case. In view of these results, it seems that ξ ( θ ) = − log { p ( θ ) p ( y | θ ) } is the b est choice , with the problem ho w ev er th at w e h ad to sligh tly modify the bias for the lo west v alues of the reactio n co ordinate b ecause of the to o s harp v ariations of the b ias in th is r egion. (Our numerical exp erience is indeed that the bias obtained from m inus the log- p osterior den sit y is sometimes diﬃcult to use d irectly .) On the other h and, the pr o cedure is more automatic for ξ ( θ ) = q 1 and β , the latter r eaction co ordin ate b eing a m uch b etter choic e when it comes to mo de switc hin gs. W e no w fo cus on ξ ( θ ) = β . As explained in the in tro duction, a go o d samp ler s hould visit all the p ossib le lab ellings of the parameter space. This implies in particular that the marginal p osterior distributions of the simulated comp onent parameters should b e nearly iden tical. This is clearly the case here, see the scatter plots of th e 1-in-10 4 sub-sample of the sim ulated pairs ( µ i , log λ i ), i = 1 , . . . , 3 in Figure 7. Th e top left picture in Figure 7 also demonstrates that th e b iased dyn amics in deed samples un if orm ly the v alues of β ov er the c hosen in terv al [ z min , z max ]. Finally , Figure 8 illustrates why the reaction co ordin ate ξ ( θ ) = β a llo ws for escaping from lo cal mo des; s ee the discussion in Section 4.2. Eac h plot represen ts a sub-sample of the sim u- lated p airs ( µ k ,t , log λ k ,t ) (where the s ubscript t denotes the iteration index while the subscrip t k l ab els the comp onen ts), restricted to β t b eing in in terv als, from left to r ight, [0 , 0 . 5], [1 . 5 , 2] and [3 . 5 , 4]. Since the bias fun ction d ep ends only on β , these plots are rough appro ximations of the p osterior distribution conditional on β = 0 . 25 (its p oste rior exp ect ation), β = 1 . 75 and β = 3. In the leftmost plot of Figure 8, β is ﬁxed to its p osterior exp ectatio n and the three mo des are well separated. As β is forced to tak e artiﬁcially large v alues (in the sense that the p osterior probabilit y densit y of suc h v alues is v ery small), the three mo d es get closer and ev entually merge. 5.1.3 Larger v alues of K and mo del c hoice W e apply our approac h to o ther v alues of K , n amely K = 4 to 6, in the case ξ ( θ ) = β . T able 2 rep orts the eﬃciency factor as a fu n ction of K . These factors remain quite satisfactory , whic h is related to th e fact that the amplitude of the free en er gy ( diﬀerence b et w een the m axim um and the minimum v alues) asso ciated to this reaction coord in ate is not to o large o ver the 20 0 2.5e+06 5.0e+08 7.5e+06 1.0e+07 2 5 8 11 14 14 14 14 Number of iterations Mu 0 2.5e+06 5.0e+08 7.5e+06 1.0e+07 2 5 8 11 14 14 14 14 Number of iterations Mu 0 2.5e+06 5.0e+08 7.5e+06 1.0e+07 2 5 8 11 14 14 14 14 Number of iterations Mu 0 2.5e+06 5.0e+08 7.5e+06 1.0e+07 2 5 8 11 14 14 14 14 Number of iterations Mu Figure 6: Th e ﬁshery data. T ra jectories of ( µ 1 , µ 2 , µ 3 ) for the b iased dynamics for diﬀerent reaction co ordinates. T op left: ξ ( θ ) = µ 1 . T op right: ξ ( θ ) = q 1 . Bottom left: ξ ( θ ) = β . Bottom r igh t: ξ ( θ ) = − log { p ( θ ) p ( y | θ ) } . K 3 4 5 6 EF (numerical) 0.17 0.18 0.17 0.1 6 EF (t heoretical) 0.179 0.195 0.180 0.171 T able 2: Eﬃciency f actors, for v arious v alues of the n umber of comp onen ts consid ered in the mixture, w ith the c hoice ξ ( θ ) = β . c hosen inte rv al [ z min , z max ], and d o es not change dramatically , see the p roﬁles obtained for diﬀeren t v alues of K in Figure 9. Figure 10 repr esen ts th e marginal p osterior distribu tion of ( µ 1 , log λ 1 ), for K = 4 , 5 , 6. These plots are obtained by r esampling 2000 p oints from the outpu t of the MCMC targeting the biased p osterior, with probabilit y prop ortional to the imp ortance sampling w eight deﬁned in (14 ). In eac h case, we c hec ke d that the MCMC output is symmetric w ith r esp ect to lab el p ermutatio ns. T able 3 rep orts, for K = 3 , . . . , 6, the estimated log-Ba yes factor for c ho osing a mixture mo del with K comp onents against a mixture mo del with K − 1 comp onent s, w hic h equals log Z K / Z K − 1 , a ssuming equal p rior p robabilit y for d iﬀeren t v alues of K . The reported error lev els in T able 3 corresp ond to 90% conﬁdence in terv als, whic h are deduced from r ep eated indep end en t runs of T = 10 7 iterations from the same MCMC alg orithm targeting the b iased 21 0 .0 0 .5 1 . 0 1 . 5 2 . 0 2 .5 3 . 0 3 . 5 4 .0  0 . 0 0 . 5 1 . 0 1 . 5 2 . 0 2 .5 2 4 6 8 1 0 1 2  1  3  2  1 0 1 2 3 4 l og(  1 ) 2 4 6 8 1 0 1 2  2  3  2  1 0 1 2 3 4 l og(  2 ) 2 4 6 8 1 0 1 2  3  3  2  1 0 1 2 3 4 l og(  3 ) Figure 7: T op left: Histogram of sim ulated β ’s and estimated marginal p osterior densit y of β . Remaining pictures: Scatter plots of simulate d v alues for ( µ i , log λ i ), for i = 1 , 2 , 3 when β is u s ed as a reaction coordinate. K log Z K / Z K − 1 error 3 7.1 ± 0.2 4 4.2 ± 0.1 5 1.5 ± 0.1 6 0.9 ± 0.1 T able 3: L og Ba y es factors for comparing mo del with K comp onen ts against K − 1 comp o- nen ts, for K = 3 , ..., 6; estimation error as ev aluated from in dep endent MCMC run s. p osterior. Th e estimation er r or is quite small, d espite b eing based on imp ortance samp ling steps in high dimen sional sp aces. 5.2 A second example: the Hidalgo stamps data Another well- kno wn b enchmark for mixtures is the Hidalgo stamps d ataset, ﬁrst studied b y Izenman and Sommer (1988) (see also e.g. Basford et al. (1997)), whic h consists of the thic kn ess (in m m) of n = 485 stamps from a giv en Mexican stamp issue; see Figure 1 for a histogram. (F or con venience w e multiplied the obs erv ations b y 100.) W e fo cus our presen- 22 Figure 8: Sim u lated pairs ( µ 1 , log λ 1 ) conditional on, from left to righ t, β ∈ [0 , 0 . 5], β ∈ [1 . 5 , 2] and β ∈ [3 . 5 , 4], see S ection 5.1.3 f or more detail s. 0 1 2 3 4 −8 −4. 0 4 8 Beta Bias K=3 K=4 K=5 K=6 Figure 9 : Estimated bias (free energy), for K = 3 , . . . , 6, and ξ ( θ ) = β . tation on the c hallenging case K = 3 . F or other v alues of K b etw een 4 and 7 our approac h p erforms b etter than f or K = 3. F or the sake of space the corresp onding results are not rep orted. This example is more c hallenging th an the previous one, presumably b ecause th e num b er of observ ations is larger, w hic h mak es the lik eliho o d more p eak ed. A clear sign of the increasing metastabilit y is the increase in th e free-e nergy barr iers. F or the reaction co ord inates ξ ( θ ) = q 1 , ξ ( θ ) = β , ξ ( θ ) = − log { p ( θ ) p ( y | θ ) } , w e had to ru n the adaptive algorithm for T = 10 9 iterations in order to obtain a con v erged bias and r eco v er a b iased p oste rior sa mple whic h is symmetric b y lab elli ng. Again, a more elab orate prop osa l strateg y for the Hastings-Metrop olis step in the adaptiv e algo rithm w ould b e lik ely to stabilize the bias faster. As an illustration, Figure 11 presents the tr a jectories of ( µ 1 , µ 2 , µ 3 ) sampled by the adaptive algorithm, with random w alk scales: τ q = 0 . 001, τ µ = 0 . 0 5, τ v = 0 . 1 and τ β = 0 . 005. The tra jectories should b e compared to the ones depicted in Figure 1, whic h are obtained with the same prop osal, but without an y biasing p ro cedure. In Figure 12, we represen t the b iases obtained with v arious c h oices of the reaction co or- dinate. In the case ξ ( θ ) = β , w e set z min = 0 . 005, z max = 2 . 5 and ∆ z = 0 . 00 5. F or ξ ( θ ) = q 1 , w e consider z min = 0, z max = 1 and ∆ z = 0 . 005 . Finally , for ξ ( θ ) = − log { p ( θ ) p ( y | θ ) } , w e c ho ose z min = 720, z max = 780 and ∆ z = 0 . 1. 23 2 4 6 8 1 0 1 2  1  2  1 0 1 2 3 log  1 2 4 6 8 1 0 1 2  1  2  1 0 1 2 3 log  1 2 4 6 8 1 0 1 2  1  2  1 0 1 2 3 log  1 Figure 10: Marginal p osterior d istr ibution of ( µ 1 , log λ 1 ), from left to r ight, for K = 4, 5 and 6, as represent ed by 2000 p oints resampled from the MCMC output. 0 2.5e+08 5.0e+08 7.5e+08 1.0e+09 8 10 12 6 Number of iterations Mu 0 2.5e+08 5.0e+08 7.5e+08 1.0e+09 6 8 10 12 Number of iterations Mu Figure 11: S ampled tra jectories for ( µ 1 , µ 2 , µ 3 ) during th e adaptiv e biasing pro cedure. Left: ABF tra j ectory when the reaction co ordinate is β . Righ t: ABP tra jectory when the reaction co ordinate i s min us the log-p osterior densit y . Reaction co ord inate β − log { p ( θ ) p ( y | θ ) } q 1 EF (numerical) 0 .02 0.24 0.23 EF (t heoretical) 0.06 0.13 0.18 T able 4 : Eﬃciency factor for v arious c hoices of reaction coordin ates, in the case K = 3. Biased tra jectories a re pr esen ted in Figure 13 for ξ ( θ ) = q 1 , ξ ( θ ) = β , and ξ ( θ ) = − log { p ( θ ) p ( y | θ ) } . Eﬃciency factors are rep orted in T able 4. Th e results sho w th at, i n terms of mo de sw itc hing, ξ ( θ ) = − log { p ( θ ) p ( y | θ ) } is the b est c hoice. The c h oice ξ ( θ ) = q 1 , although it leads to the highest eﬃciency factor, is a p oor c hoice s in ce v ery few switc hings are observ ed; in particular, the mo d e starting arou n d 7.5 do es not c hange during the ﬁrst 2 . 5 × 10 7 iterations. Such transitions are observ ed in the case ξ ( θ ) = β . 24 0.0 0.25 0.5 0.75 1.0 −30.0 0.0 30.0 60.0 90.0 120.0 q Bias 0.0 0.5 1.0 1.5 2.0 2.5 −10.0 0.0 10.0 20.0 Beta Bias 720 740 760 780 −10 0 10 −20 Negative of the log−posterior density Bias Figure 12: Hidalgo stamps pr oblem. F ree energies obtained for th e reaction co ordinates: ξ ( θ ) = q 1 (top left), ξ ( θ ) = β (top righ t) and ξ ( θ ) = − log { p ( θ ) p ( y | θ ) } (b otto m). 0.0e+00 5.0e+06 1.0e+07 1.5e+07 2.0e+07 2.5e+07 6 8 10 12 14 14 14 14 14 14 Number of iterations Mu 0.0e+00 5.0e+06 1.0e+07 1.5e+07 2.0e+07 2.5e+07 6 8 10 12 14 14 14 14 14 14 Number of iterations Mu 0.0e+00 5.0e+06 1.0e+07 1.5e+07 2.0e+07 2.5e+07 6 8 10 12 14 14 14 14 14 14 Number of iterations Mu Figure 1 3: Hidalgo stamps problem. T ra jectories of ( µ 1 , µ 2 , µ 3 ) o f th e biased dynamics. T op left: reaction coordinate ξ ( θ ) = q 1 . T op righ t: ξ ( θ ) = β . Bottom: ξ ( θ ) = − log { p ( θ ) p ( y | θ ) } . 25 6 Conclusion W e sho we d in this pap er h ow to sample eﬃcient ly p osteriors of univ ariate Gauss ian mixture mo dels, u sing a free energy biasing app roac h, whic h can b e summarized as follo ws: (1) W e c ho ose a reaction coordin ate (a fun ction ξ of θ ). (2) W e r u n an adaptive MC MC sampler, in ord er to compute an estimation of the free energy A associated to ξ . (3) F rom th e estimated free energy b A (the output of the previous step), w e deﬁne the biased densit y e π = π b A . W e ru n a standar d M CMC sampler that targets e π (say a Gaussian random-wa lk Hastings-Metrop olis algorithm). (4) W e do an imp ortance sampling step, from e π to π to remo ve the bias and r eco v er the true p osterior π . In the particular case of univ ariate Ga ussian mixture mod els, a go o d c hoice for the reac tion co ordinate is ξ = β or ξ ( θ ) = − log { p ( θ ) p ( y | θ ) } . When β is c hosen, it is easy to estimate an in terv al of t ypical v alues for β as [ c 1 R 2 , c 2 R 2 ] (with R the range of the data, and c 1 < c 2 small constan ts, sa y c 1 = 1 / 2 000 a nd c 2 = 1 / 20), wh ile the determination of suc h an i nt erv al for − log { p ( θ ) p ( y | θ ) } do es not seem to b e s tr aigh tforward. W e think that the same ideas ma y b e applied to other mixture mo dels. F or instance, Figure 14 p lots the p osterior densit y of a t wo-co mp onent P oisson mixture mo del, conditional on diﬀerent v alues for the hyp er-parameters. S p eciﬁcally , p ( y i | θ ) = q 1 P oisson ( y i ; λ 1 ) + (1 − q 1 )P oisson ( y i ; λ 2 ) for i = 1 , . . . , n , where Poisson( · ; λ ) denotes the probabilit y densit y function of a Poisson distribution of parameter λ . W e use a Gamma ( β ¯ y , β ) p rior for the λ k ’s, and a uniform pr ior for q 1 . The n = 100 observ atio ns are sim ulated fr om this mo del with parameters ( q 1 , λ 1 , λ 2 ) = (0 . 7 , 3 , 10). It c an b e seen a gain t hat b iasing the p oste rior distrib ution to wards larger v alues of β mak es it p ossible to reduce th e distance b et wee n the diﬀerent modes. W e also obtained in teresting preliminary results for m ultiv ariate Ga ussian mixtures. Figure 14: Scatter plots of 10 00 simulated pairs ( λ 1 , λ 2 ) from the posterior distribution of a t w o-comp onen t Po isson mixtu re mo del, and n = 100 simulate d data p oints, with a Gamma( β ¯ y , β ) p rior for the λ k , and, from left to right , β = 1, 10 , 2 0. W e would lik e to highlight some practi cal a dv an tages of our approac h. First, it requires lit- tle tuning: The main tunin g paramete rs are the scales of the random walks in b oth algo rithms (adaptiv e, and MCMC), and we ob tained satisfactory results without trying to o ptimize these scales. S econd, it is easy to chec k that the ﬁnal r esults are correct: If the free energy h as 26 b een we ll estimated, and the MCMC algorithm f or the biased p oste rior has con v erged, then a nearly uniform marginal distribution for the react ion co ordinate is observ ed, and the marginal distributions for the ( µ k , λ k , q k ) are nearly identi cal, b ecause of the symmetry of the tru e p os- terior, and the n umerous mode switc hings in the MC MC tra jectories. Finally , one natural question is ho w to extend such an approac h to other classes of Ba y esian mo dels. As w e ha ve made clear already , th e c hoice of the reactio n co ordinate is the crucial p oint . If the reaction coord inate is p oorly c h osen, a free energy biasing approac h w ill bring no b eneﬁt. As for applications in computational Stat istical Physics, there is no general recip e for c h o osing this r eaction coord inate. No netheless, the follo wing simple remarks ma y b e considered as some guidelines. First, it seems w orth inv estigating alternativ es to the reaction co ordinate usually chose n in S tatistics, namely the negativ e of the p oste rior log- densit y . Second, in d oing so, one ma y keep in mind th e in terpr etation we pr op osed for ξ ( θ ) = β in Section 4.2, i.e. a particular p arameter that ﬁxes to s ome exten t the size of th e energy barriers b et w een the d iﬀeren t modes. In particular, in a giv en Ba yesian hierarc h ical mo del, the h yp er-parameters at the highest le v el of the hierarch y co uld b e in teresting ca ndidates, b ecause their v alues strongly inﬂuence the t yp ical v alues of the other comp onents of the system. More researc h in th is dir ection is ho wev er required to dra w more d eﬁ nite conclusions. Ac kno wledgemen ts P art of this work w as done wh ile the t wo last authors w ere participating to the program “Computational Mathematics” at the Haussdorﬀ Institute for Mathematics in Bonn, Ger- man y . Supp ort f r om the ANR gran ts ANR-00 8-BLAN-02 18 and ANR-09-BLAN-0216 of the F renc h Ministry of Researc h is ac knowledge d. Th e authors thank Julien Cornebise, Arnaud Doucet, Peter J. Green, Pierre Jaco b, Christian P . Rob ert, Gareth Roberts, and the referees for in s igh tfu l remarks. References L. Am brosio, N. F usco, and D. P allara. F unctions of Bounde d V ariation and F r e e Disc onti- nuity Pr oblems . Oxford Science Pub licatio ns, 2000 . C. Andrieu and J. Thoms. A t utorial on adaptiv e MCMC. Statist. Comput. , 18(4):3 43–373 , 2008. Y. F. A tc had´ e and J. S. Liu. The W ang-Landau algorithm for Mon te-Carlo computation in general state spaces. Stat. Sinic a , 20 (1):209– 233, 2010. R. Bal ian. F r om Micr ophysics to Macr ophysics. Metho ds and Applic ations of Stat istic al Physics , v olume I - I I. Sprin ger, 2007. K. E. Basford, G. J. Mcla c h lan, and M. G. Y ork. Mo delling the d istribution o f st amp pap er thic kn ess via ﬁnite norm al mixtures: The 1872 Hidalgo stamp issue of M exico revisited. J. Appl. Stat. , 24(2):169– 180, 1997. G. Bussi, A. Laio, and M. P arrin ello. Equilibrium free energies f rom n onequilibrium metady- namics. Phys. R ev. L ett. , 96 (9):090 601, 2006. 27 G. Celeux, M. Hurn, and C. P . Rob ert. C omputational and inferential diﬃculties with mixtur e p osterior d istributions. J. Am. Sta tist. Asso c. , 95:95 7–970, 2000. C. Chip ot and T. Leli ` evre. Enhanced sampling of multidimensional free-energy landscap es using a daptiv e b iasing force s. arXiv pr eprint , 100 8.3457, 2010. E. Darve and A. P ohorille. Calculating free en er gies using av erage force. J. Ch em. P hys. , 115 (20):9 169–91 83, 2001 . B.M. Dic kson, F. Legoll, T. Lelivre, G. Stoltz, and P . Fleura-Lessard. F ree energy calculations: An eﬃcien t adaptiv e biasing p oten tial method . J. Phys. Chem. B , 114(17 ):5823– 5830, 2010. J. Dieb olt and C. P . Rob ert. Es timation of ﬁnite mixture distribu tions through Ba yesian sampling. J. R. Statist. So c. B , 56: 363–37 5, 1994. L. C. Ev ans and R. F. Gariep y . Me asur e The ory and Fine Pr op erties of F u nc tions . Studies in Ad v anced Mathematics. CR C Press, 1992. S. F r ¨ uhwirth-Sc hnatter. Finite Mixtur e and Markov Switching Mo dels . S pringer, 2 006. S. F r ¨ uhwirth-Sc hnatter. Mark o v c hain Mon te C arlo estimation of cl assical and d ynamic switc hin g a nd mixture mo dels. J. Am. Sta tist. A sso c. , 96(45 3):194– 209, 2001. A. Gelman and X. L. Meng. Sim ulating normalizing constant s: from imp ortance sampling to bridge s ampling to path samp ling. Stat. Sci. , 13(2):1 63–18 5, 1998. W. K. Hastings. Mon te Carlo sampling metho ds using Mark o v c hains and their applications. Biometrika , 5 7:97–10 9, 1970. J. H ´ enin and C. Chip ot. Ov ercoming free energy b arriers using un constrained molecular dynamics sim ulations. J. Chem. Phys. , 121( 7):2904 –2914, 200 4. Y. Iba. Extended ensem ble Mon te Carlo. Int. J. Mo dern Phys. C , 12(5):6 23–65 6, 2001. A. J. Izenman and C. J. S ommer. Philatelic mixtu res and m u ltimo dal d ensities. J. Am. Stat. Asso c. , 8 3(404) :941–95 3, 1988. A. J asra, C. C. Holmes, and D. A. S tephens. Mark ov c h ain Monte C arlo metho ds and the lab el sw itc hing problem in Ba y esian mixture mo d eling. Statist. Scienc e , pages 50–67, 2005 . B. Jourdain, T. Leli ` evre, and R. Roux. Existe nce, un iqueness and con vergence of a particle appro ximation for the Adaptiv e Biasing F orce pr o cess. ESAIM-M ath. Mo del. N u m. , 44(5): 831–8 65, 2010. A. Kong, J. S . Liu , and W. H. W ong. Sequential imputation and ba yesian missin g d ata problems. J. Am. Sta tist. Asso c. , 89:27 8–288 , 1994. T. Leli` evre and K. Minouk adeh. Longtime c on v ergence of an adaptiv e biasing fo rce metho d : The b i-c hann el case. Ar ch. R ation. Me ch. Anal. , accepted, 2 011. T. Leli ` evre, M. Rousset, and G. S toltz. Computation of free energy proﬁles with p arallel adaptiv e dynamics. J. Chem. Phys , 12 6:1341 11, 2007. 28 T. Leli ` evre, M. Rousset, and G. Stoltz. Long-time con v ergence of an adaptiv e biasing force metho d. Nonline arity , 21: 1155–1 181, 2008. T. L eli` evre, M. Rousset, and G. Stoltz. F r e e Ener gy Computations : A Mathematic al Persp e c- tive . Imp erial Colle ge Press, 2010. F. Liang. A generalized W ang-Landau algorithm for Mon te-Carlo c omputation. J. A m. Stat. Asso c. , 1 00(472 ):1311– 1327, 2005. F. L iang. T ra jectory a v eraging for sto chastic appr oximati on MCMC algorithms. Ann. Sta t. , 38(5): 2823–2 856, 2010. J. M. Marin and C. P . Rob ert. Bayesian c or e : a pr actic al appr o ach to c omputat ional Bayesian statistics . Sp r inger V erlag, 20 07. S. Marsili, A. Bardu cci, R. C helli, P . Pro cacci, and V. Sc hettino. Self-healing Umbrella Sampling: A non-equilibriu m approac h for qu an titativ e free energy calculations. J. Phys. Chem. B , 1 10(29): 14011– 14013, 2006. G.J. M cLac hlan and D. Pe el. Finite mixtur e mo dels . Wiley New Y ork, 2000. N. Metropolis, A. W. R osenbluth, M. N. Rosen bluth, A. H. T eller, and E. T elle r. Equations of state calculat ions by fast compu ting mac hines. J . Chem. Phys. , 21(6):10 87–109 1, 1953. R. M. Neal. Annealed importance sampling. Statist. Comput. , 11:12 5–139 , 2001 . S. Piana and A. Laio. A bias-exc hange approac h to protein f olding. J. Phys. Chem. B. , 111 (17):4 553–45 59, 2007 . P . Raiteri, A. Laio, F. L. Gerv asio, C. Mic heletti, and M. P arrinello. Eﬃcien t reconstruction of complex free energy landscap es by m ultiple w alke rs metadynamics. J . Phys. Chem. B , 110(8 ):3533– 3539, 2006. S. Ric hard s on a nd P . J. Green. O n Ba y esian analysis of mixtures with a n unknown num b er of comp onents. J. R. Statist. So c. B , 59(4):731–7 92, 1997 . C. P . Rob ert and G. Casella. Monte Ca rlo Sta tistic al Metho ds, 2nd e d. Sprin ger-V erlag, New Y ork, 2 004. D. M. Titterington, A. F. Sm ith, and U. E. Mak o v. Statistic al Analy sis of Finite Mixtur e Distributions . Wiley , 198 6. F. G. W ang and D. P . L an d au. Determining the density of states for classical statistical mo dels: A random wa lk algorithm to pro du ce a ﬂat histogram. Phys. R ev. E , 64(5): 05610 1, 2001a . F. G. W ang and D. P . Landau. Eﬃcien t, m u ltiple-range rand om w alk a lgorithm to calculate the den s it y of s tates. Phys. R ev. L ett. , 86(10):205 0–205 3, 2001b. 29

Free Energy Methods for Bayesian Inference: Efficient Exploration of Univariate Gaussian Mixture Posteriors

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment