Computational methods for Bayesian model choice


Authors: Christian P. Robert, Darren Wraith

October 22, 2018

Abstract

In this note, we briefly survey some recent approaches to the approximation of the Bayes factor used in Bayesian hypothesis testing and in Bayesian model choice. In particular, we reassess importance sampling, harmonic mean sampling, and nested sampling from a unified perspective.

Keywords: Bayesian inference, Bayes factor, importance sampling, nested sampling, Monte Carlo

1 Introduction

The Bayes factor is a fundamental procedure that stands at the core of the Bayesian theory of testing hypotheses, at least in the approach advocated by Jeffreys (1939) and by Jaynes (2003). (Robert et al. (2009) provides a reassessment of the role of Jeffreys (1939) in setting a formal framework for Bayesian testing.) Given a hypothesis $H_0:\ \theta\in\Theta_0$ on the parameter $\theta\in\Theta$ of a statistical model with density $f(x|\theta)$, under a compatible prior of the form
$$\pi(\Theta_0)\,\pi_0(\theta) + \pi(\Theta_0^c)\,\pi_1(\theta),$$
the Bayes factor is defined as the ratio of the posterior odds to the prior odds, namely
$$B_{01}(x) = \frac{\pi(\Theta_0|x)}{\pi(\Theta_0^c|x)} \Big/ \frac{\pi(\Theta_0)}{\pi(\Theta_0^c)} = \int_{\Theta_0} f(x|\theta)\,\pi_0(\theta)\,\mathrm{d}\theta \Big/ \int_{\Theta_0^c} f(x|\theta)\,\pi_1(\theta)\,\mathrm{d}\theta.$$
Since model choice can be considered from a similar perspective under the Bayesian paradigm (see, e.g., Robert (2001)), the comparison of models
$$M_i:\ x \sim f_i(x|\theta_i), \qquad i\in I,$$

∗ C. P. Robert: CEREMADE, Université Paris Dauphine, 75775 Paris, and CREST, ENSAE, France. Email: xian@ceremade.dauphine.fr
† D. Wraith: CEREMADE, Université Paris Dauphine, 75775 Paris, France.
Email: darren@ceremade.dauphine.fr

where the family $I$ can be finite or infinite, leads to the same quantities,
$$p_i \int_{\Theta_i} f_i(x|\theta_i)\,\pi_i(\theta_i)\,\mathrm{d}\theta_i \Big/ p_j \int_{\Theta_j} f_j(x|\theta_j)\,\pi_j(\theta_j)\,\mathrm{d}\theta_j, \qquad i,j\in I.$$
In this short survey, we consider some of the most common Monte Carlo solutions used to approximate a generic Bayes factor or its fundamental component, the evidence
$$Z_k = \int_{\Theta_k} \pi_k(\theta_k)\,L_k(\theta_k)\,\mathrm{d}\theta_k,$$
also known as the marginal likelihood. Longer entries can be found in Carlin and Chib (1995), Chen et al. (2000), Robert and Casella (2004), or Friel and Pettitt (2008). Note that we do not cover here the trans-dimensional methods issued from the revolutionary paper of Green (1995), since our goal is to demonstrate that within-model simulation allows for the computation of Bayes factors and thus avoids the additional complexity involved in trans-dimensional methods.

2 Importance sampling solutions

While a regular importance sampling approach is feasible for the approximation of the Bayes factor
$$B_{12} = \int_{\Theta_1} f_1(x|\theta_1)\,\pi_1(\theta_1)\,\mathrm{d}\theta_1 \Big/ \int_{\Theta_2} f_2(x|\theta_2)\,\pi_2(\theta_2)\,\mathrm{d}\theta_2,$$
as for instance in
$$\widehat B_{12} = \frac{n_1^{-1}\sum_{i=1}^{n_1} f_1(x|\theta_1^i)\,\pi_1(\theta_1^i)\big/\varpi_1(\theta_1^i)}{n_2^{-1}\sum_{i=1}^{n_2} f_2(x|\theta_2^i)\,\pi_2(\theta_2^i)\big/\varpi_2(\theta_2^i)},$$
which relies on importance functions (densities) $\varpi_1$ and $\varpi_2$ and on simulations $\theta_1^i\sim\varpi_1$ and $\theta_2^i\sim\varpi_2$, specific solutions targeted toward Bayesian model choice are available and preferable. Most of those solutions fit under the denomination of bridge sampling and aim at taking advantage of the connections between the two models under comparison.
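To make the plain estimator $\widehat B_{12}$ concrete, here is a minimal Python sketch (our own illustration, not part of the original development) on a conjugate normal toy problem, $x|\theta\sim\mathcal N(\theta,1)$ with prior $\theta\sim\mathcal N(0,\tau_k^2)$ under model $k$, where both evidences are available in closed form; the values of $x$, $\tau_k^2$ and the common choice $\varpi_1=\varpi_2=\mathcal N(0,9)$ are arbitrary:

```python
import math
import random

random.seed(1)

x = 1.0  # a single observation

def norm_pdf(y, mu, var):
    """Density of N(mu, var) evaluated at y."""
    return math.exp(-(y - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# Model k: x | theta ~ N(theta, 1), prior theta ~ N(0, tau2[k]).
tau2 = {1: 1.0, 2: 4.0}

def integrand(theta, k):
    """f_k(x|theta) * pi_k(theta), the integrand of the evidence Z_k."""
    return norm_pdf(x, theta, 1.0) * norm_pdf(theta, 0.0, tau2[k])

def is_evidence(k, n=100_000):
    """Importance sampling estimate of Z_k with varpi_k = N(0, 9)."""
    total = 0.0
    for _ in range(n):
        t = random.gauss(0.0, 3.0)                  # draw from varpi_k
        total += integrand(t, k) / norm_pdf(t, 0.0, 9.0)
    return total / n

B12_hat = is_evidence(1) / is_evidence(2)

# Analytic check: here Z_k = N(x; 0, 1 + tau2[k]) by conjugacy.
B12_true = norm_pdf(x, 0.0, 1.0 + tau2[1]) / norm_pdf(x, 0.0, 1.0 + tau2[2])
```

Since $Z_k=\mathcal N(x;0,1+\tau_k^2)$ by conjugacy, the Monte Carlo output can be checked against the exact Bayes factor; the same toy model is reused in the sketches below.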
In fact, when comparing two models of the same complexity (i.e., with the same dimension for their respective parameter spaces), it is often possible to find a reparameterisation of both models in terms of some specific moments of the sampling model, like $\mathbb{E}[X]$, so that the parameters under both models have a common meaning.

2.1 Bridge sampling

Assuming that the parameters of both models under comparison, $\theta_1$ and $\theta_2$ respectively, belong to the same parameter space (i.e., $\Theta_1=\Theta_2$), a first solution is to syndicate simulations between both models in order (a) to recycle simulations under one model for the other model and (b) to create correlation between the estimates of the numerator and of the denominator of the Bayes factor, so as to improve the stability of the estimate. This solution is made clear by the formula of Gelman and Meng (1998),
$$B_{12} \approx \frac{1}{n}\sum_{i=1}^{n} \frac{\tilde\pi_1(\theta_{2i}|x)}{\tilde\pi_2(\theta_{2i}|x)}, \qquad \theta_{2i}\sim\pi_2(\theta|x),$$
where
$$\pi_1(\theta_1|x) \propto \tilde\pi_1(\theta_1|x), \qquad \pi_2(\theta_2|x) \propto \tilde\pi_2(\theta_2|x),$$
as in most Bayesian settings. (An extension to the cases when $\Theta_1\subset\Theta_2$, including those when the dimension of $\Theta_1$ is smaller than the dimension of $\Theta_2$, can easily be derived, as shown in Chen et al. (2000). Note that the assumption $\Theta_1=\Theta_2$ signifies that the representations of both models have been reparameterised in terms of the same moments.) This is a very special case of the general representation (Torrie and Valleau 1977)
$$B_{12} = \frac{\mathbb{E}_\varphi[\tilde\pi_1(\theta)/\varphi(\theta)]}{\mathbb{E}_\varphi[\tilde\pi_2(\theta)/\varphi(\theta)]},$$
which holds for any density $\varphi$ with a sufficiently large support and requires a single sample $\theta_1,\ldots,\theta_n$ generated from $\varphi$ to produce an importance sampling estimate of the ratio. In that case, a quasi-optimal solution is provided by Chen et al. (2000), namely
$$\varphi^*(\theta) \propto |\pi_1(\theta)-\pi_2(\theta)|.$$
The missing normalising constants in both $\pi_1$ and $\pi_2$ obviously mean that this solution cannot be used per se.
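Both identities above can be sketched on the same conjugate normal toy model (our own illustration; the observation, prior variances and the choice $\varphi=\mathcal N(0,9)$ are arbitrary, and exact conjugate posterior draws stand in for MCMC output):

```python
import math
import random

random.seed(2)

x = 1.0  # single observation shared by both models

def norm_pdf(y, mu, var):
    return math.exp(-(y - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

tau2 = {1: 1.0, 2: 4.0}  # prior variances: theta ~ N(0, tau2[k]) under model k

def pi_tilde(theta, k):
    """Unnormalised posterior f_k(x|theta) pi_k(theta); its integral is Z_k."""
    return norm_pdf(x, theta, 1.0) * norm_pdf(theta, 0.0, tau2[k])

n = 100_000

# (a) Gelman-Meng formula: average pi~_1/pi~_2 against draws from pi_2(.|x);
#     for tau2[2] = 4 the posterior is conjugate, N(0.8, 0.8), and these
#     exact draws stand in for MCMC output.
draws2 = [random.gauss(0.8, math.sqrt(0.8)) for _ in range(n)]
B12_gm = sum(pi_tilde(t, 1) / pi_tilde(t, 2) for t in draws2) / n

# (b) Torrie-Valleau representation: a SINGLE sample from varphi = N(0, 9)
#     feeds both the numerator and the denominator, correlating them.
draws_phi = [random.gauss(0.0, 3.0) for _ in range(n)]
num = sum(pi_tilde(t, 1) / norm_pdf(t, 0.0, 9.0) for t in draws_phi) / n
den = sum(pi_tilde(t, 2) / norm_pdf(t, 0.0, 9.0) for t in draws_phi) / n
B12_tv = num / den

B12_true = norm_pdf(x, 0.0, 1.0 + tau2[1]) / norm_pdf(x, 0.0, 1.0 + tau2[2])
```

Note the contrast with the plain estimator of the previous section: variant (b) shares one sample between numerator and denominator, which is the common-sample feature bridge sampling exploits.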
In fact, considering the very special case when $\pi_1(\theta)=\pi_2(\theta)$ on some region of the parameter space, we see that the solution $\varphi^*(\theta)$ should not be used, because it is null on some portion of the support of $\pi_1$ and $\pi_2$, thus contradicting a fundamental requirement of importance sampling. Another extension of this bridge sampling approach can be based on the general representation
$$B_{12} = \frac{\int \tilde\pi_1(\theta|x)\,\alpha(\theta)\,\pi_2(\theta|x)\,\mathrm{d}\theta}{\int \tilde\pi_2(\theta|x)\,\alpha(\theta)\,\pi_1(\theta|x)\,\mathrm{d}\theta} \approx \frac{n_2^{-1}\sum_{i=1}^{n_2} \tilde\pi_1(\theta_{2i}|x)\,\alpha(\theta_{2i})}{n_1^{-1}\sum_{i=1}^{n_1} \tilde\pi_2(\theta_{1i}|x)\,\alpha(\theta_{1i})},$$
where $\theta_{ji}\sim\pi_j(\theta|x)$, which applies for any positive and integrable function $\alpha$. Some choices of $\alpha$ do lead to very poor performances of the method, in connection with the harmonic mean approach (Raftery et al. 2007), but there exists a quasi-optimal solution, as provided by Gelman and Meng (1998):
$$\alpha^\star \propto \frac{1}{n_1\,\pi_1(\theta|x) + n_2\,\pi_2(\theta|x)}.$$
Once again, the optimum cannot be used per se, since it requires the normalising constants of both $\pi_1$ and $\pi_2$. As suggested by Gelman and Meng (1998), an approximate version uses iterative versions of $\alpha^\star$, based on successive iterates of approximations to the Bayes factor. Note that this solution recycles simulations from each posterior, which is quite appropriate since one model is selected via the Bayes factor, instead of using an importance sample common to both approximations. We will see below an alternative representation of the bridge factor that bypasses this difficulty.

2.2 Harmonic means

While using the generic harmonic mean approximation to the marginal likelihood is often fraught with danger (Neal 1994, Chopin and Robert 2007a), the representation (Gelfand and Dey 1994)
$$\mathbb{E}^{\pi_k}\!\left[\left.\frac{\varphi(\theta_k)}{\pi_k(\theta_k)\,L_k(\theta_k)}\,\right|\,x\right] = \int \frac{\varphi(\theta_k)}{\pi_k(\theta_k)\,L_k(\theta_k)}\,\frac{\pi_k(\theta_k)\,L_k(\theta_k)}{Z_k}\,\mathrm{d}\theta_k = \frac{1}{Z_k} \tag{1}$$
holds, no matter what the density $\varphi(\cdot)$ is.
This representation is remarkable in that it allows for a direct processing of Monte Carlo or MCMC output from the posterior distribution. In addition, and as opposed to the usual importance sampling constraints, the density $\varphi(\theta_k)$ must have lighter (rather than fatter) tails than $\pi_k(\theta_k)L_k(\theta_k)$ for the approximation of (1),
$$\frac{1}{T}\sum_{t=1}^{T} \frac{\varphi(\theta_k^{(t)})}{\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)})},$$
to enjoy a finite variance. Therefore, using $\varphi(\theta_k)=\pi_k(\theta_k)$ as in the original harmonic mean approximation (Newton and Raftery 1994) will often result in an infinite variance, as discussed by Neal (1994). On the contrary, using $\varphi$'s with constrained supports derived from a Monte Carlo sample, like the convex hull of the simulations corresponding to the 10% or to the 25% HPD regions (which again is easily derived from the simulations), is both completely appropriate and implementable, as illustrated by Figure 1 for a toy example. In this example, we used the simulations within the HPD region to define an ellipse and consequently a uniform density $\varphi$ over this ellipse. Since the true "evidence" can be computed analytically, checking the convergence of the harmonic approximation is straightforward. (We warn the reader that this "evidence" cannot be used in a model comparison framework, because it is associated with an improper prior that is not acceptable in testing settings. See DeGroot (1973) or Robert (2001) for more details. Nonetheless, it provides a valid toy example to check the convergence of an integral approximation.)

Figure 1: (left) Representation of a Gibbs sample of $10^3$ parameters $(\theta,\sigma^2)$ for the normal model $x_1,\ldots,x_n\sim\mathcal{N}(\theta,\sigma^2)$, with $\bar x=0$, $s^2=1$ and $n=10$, under Jeffreys' prior, along with the pointwise approximation to the 10% HPD region (in darker hues).
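A one-dimensional sketch of this constrained-support strategy (our own construction: the interquartile interval of the simulated sample plays the role of the HPD ellipse, which is a reasonable stand-in for a symmetric unimodal posterior; the model and all settings are arbitrary choices):

```python
import math
import random

random.seed(4)

x = 1.0

def norm_pdf(y, mu, var):
    return math.exp(-(y - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def prior_times_lik(theta):
    """pi(theta) L(theta) for x ~ N(theta, 1), theta ~ N(0, 1)."""
    return norm_pdf(theta, 0.0, 1.0) * norm_pdf(x, theta, 1.0)

# Exact posterior draws theta | x ~ N(1/2, 1/2), standing in for MCMC output.
T = 100_000
draws = [random.gauss(0.5, math.sqrt(0.5)) for _ in range(T)]

# phi: uniform density on the interquartile interval of the sample; its
# support lies well inside the posterior's, so phi has lighter tails
# than pi * L, as required for a finite variance.
srt = sorted(draws)
lo, hi = srt[T // 4], srt[3 * T // 4]
phi_val = 1.0 / (hi - lo)

# Gelfand-Dey estimate of 1/Z, then invert.
inv_Z = sum(phi_val / prior_times_lik(t) for t in draws if lo < t < hi) / T
Z_hat = 1.0 / inv_Z

Z_true = norm_pdf(x, 0.0, 2.0)   # analytic evidence by conjugacy
```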
(right) Evaluation of the approximation of the evidence based on the density $\varphi$ of the uniform distribution on the ellipse approximating this HPD region.

2.3 Mixture bridge sampling

As noted above, a remarkable feature of the representation (1) is that the derived implementation can directly exploit the output of any MCMC sampler. Another approach, introduced in Chopin and Robert (2007b), aims at the same goal and attains the optimal bridge sampler from a completely different perspective. It considers a specific mixture structure as importance function, of the form
$$\widetilde\varphi(\theta) \propto \omega_1\,\pi(\theta)\,L(\theta) + \varphi(\theta),$$
where $\varphi(\cdot)$ is an arbitrary but fully normalised density. Simulating from this mixture, assuming there already exists an MCMC sampler with stationary distribution $\pi(\theta|x)\propto\pi(\theta)L(\theta)$, is straightforward, thanks to a tailored Gibbs sampler:

Mixture Gibbs sampler
At iteration t:
1. Take $\delta^{(t)}=1$ with probability
$$\omega_1\,\pi_k(\theta_k^{(t-1)})\,L_k(\theta_k^{(t-1)}) \Big/ \left\{\omega_1\,\pi_k(\theta_k^{(t-1)})\,L_k(\theta_k^{(t-1)}) + \varphi(\theta_k^{(t-1)})\right\}$$
and $\delta^{(t)}=2$ otherwise;
2. If $\delta^{(t)}=1$, generate $\theta_k^{(t)}\sim\mathrm{MCMC}(\theta_k^{(t-1)},\theta_k)$, where $\mathrm{MCMC}(\theta_k,\theta_k')$ denotes an arbitrary MCMC kernel associated with the posterior $\pi_k(\theta_k|x)\propto\pi_k(\theta_k)\,L_k(\theta_k)$;
3. If $\delta^{(t)}=2$, generate $\theta_k^{(t)}\sim\varphi(\theta_k)$ independently.

The simulation step 1., selecting between both components of the mixture, not only allows for simulation from this mixture, but also provides a direct estimate of the evidence.
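A compact sketch of this scheme (our own toy implementation, for the conjugate normal model $x|\theta\sim\mathcal N(\theta,1)$, $\theta\sim\mathcal N(0,1)$, with the arbitrary choices $\omega_1=1$, $\varphi$ uniform on $(-5,5)$, and a random walk Metropolis kernel as the MCMC step) also accumulates the Rao-Blackwellised sums that produce the evidence estimate discussed next:

```python
import math
import random

random.seed(5)

x = 1.0

def norm_pdf(y, mu, var):
    return math.exp(-(y - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def target(theta):
    """pi(theta) L(theta) for x ~ N(theta, 1), theta ~ N(0, 1); Z is its integral."""
    return norm_pdf(theta, 0.0, 1.0) * norm_pdf(x, theta, 1.0)

def phi(theta):
    """Fully normalised instrumental density: uniform on (-5, 5)."""
    return 0.1 if -5.0 < theta < 5.0 else 0.0

omega1 = 1.0
T = 100_000
theta = 0.5
num = den = 0.0
for _ in range(T):
    w = omega1 * target(theta)
    if random.random() < w / (w + phi(theta)):   # delta = 1: one MCMC move
        prop = theta + random.gauss(0.0, 1.0)    # random walk Metropolis kernel
        if random.random() < target(prop) / target(theta):
            theta = prop
    else:                                        # delta = 2: independent draw from phi
        theta = random.uniform(-5.0, 5.0)
    w = omega1 * target(theta)
    num += w / (w + phi(theta))                  # Rao-Blackwellised numerator
    den += phi(theta) / (w + phi(theta))         # and denominator

Z_hat = num / den / omega1
Z_true = norm_pdf(x, 0.0, 2.0)
```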
Indeed, the Rao-Blackwellised estimate
$$\hat\xi = \frac{1}{T}\sum_{t=1}^{T} \omega_1\,\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)}) \Big/ \left\{\omega_1\,\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)}) + \varphi(\theta_k^{(t)})\right\}$$
converges to $\omega_1 Z_k/\{\omega_1 Z_k + 1\}$, and we can thus deduce
$$\widehat{Z}_{3k} = \frac{1}{\omega_1}\;\frac{\sum_{t=1}^{T} \omega_1\,\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)}) \big/ \left\{\omega_1\,\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)}) + \varphi(\theta_k^{(t)})\right\}}{\sum_{t=1}^{T} \varphi(\theta_k^{(t)}) \big/ \left\{\omega_1\,\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)}) + \varphi(\theta_k^{(t)})\right\}}.$$
We have thus recovered the optimal bridge sampling estimate from this different perspective, akin to Bartolucci et al. (2006), which is more in line with reversible jump trans-dimensional schemes than with regular importance sampling. The only modification compared with the original version is that the sequence $(\theta_k^{(t)})$ is generated from the mixture for both the numerator and the denominator. Figure 2 shows that, for the toy example introduced in Figure 1, the harmonic mean approximation does as well as the optimal bridge sampling solution.

3 Nested sampling

This method, introduced in Skilling (2006, 2007) (although an earlier version can be found in Burrows (1980)), produces a very specific type of importance sampling based on constrained simulations from the prior distribution. While more detailed descriptions are available in the above references and in Chopin and Robert (2007b) (as well as a full convergence assessment in the latter paper), let us recall here that nested sampling is based on the one-dimensional representation
$$Z = \mathbb{E}^\pi[L(\theta)] = \int_0^1 \varphi(x)\,\mathrm{d}x$$
of the evidence, where $\varphi^{-1}(l) = P^\pi(L(\theta)>l)$ is the survival probability function associated with the likelihood.
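This one-dimensional representation can be checked numerically on a toy problem where the survival function is available in closed form: for $x|\theta\sim\mathcal N(\theta,1)$ and prior $\theta\sim\mathcal N(0,1)$ (our own choice of example), $P^\pi(L(\theta)>l)$ reduces to a normal interval probability, $\varphi$ is obtained by bisection, and $\int_0^1\varphi(x)\,\mathrm dx$ can be compared with the closed-form evidence:

```python
import math

x_obs = 1.0  # single observation; L(theta) = N(x_obs; theta, 1)

def norm_pdf(y, mu, var):
    return math.exp(-(y - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def norm_cdf(y):
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

L_max = norm_pdf(x_obs, x_obs, 1.0)  # likelihood maximum, at theta = x_obs

def survival(l):
    """P^pi(L(theta) > l) for theta ~ N(0, 1): L > l iff |theta - x_obs| < c."""
    if l >= L_max:
        return 0.0
    c = math.sqrt(-2.0 * math.log(l * math.sqrt(2.0 * math.pi)))
    return norm_cdf(x_obs + c) - norm_cdf(x_obs - c)

def phi(mass):
    """Inverse of the (decreasing) survival function, by bisection on (0, L_max)."""
    lo, hi = 0.0, L_max
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if survival(mid) > mass:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Midpoint rule for Z = int_0^1 phi(x) dx.
M = 2000
Z_quad = sum(phi((j + 0.5) / M) for j in range(M)) / M

Z_true = norm_pdf(x_obs, 0.0, 2.0)  # closed-form evidence by conjugacy
```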
Figure 2: For the same setting as Figure 1, (left) evaluation of the bridge sampling approximation of the evidence based on the mixture $\widetilde\varphi$ when the density $\varphi$ corresponds to the uniform distribution on the ellipse approximating the same HPD region as in Figure 1, using a value of $\omega_1$ equal to one-tenth of the true evidence; (right) boxplot comparison of the variations of both approaches based on 100 Monte Carlo replicas with samples of size $10^4$. On both graphs, the true evidence is represented by the horizontal dotted line.

The approximation of $Z$ by a Riemann sum,
$$\widehat Z = \sum_{i=1}^{N} (x_{i-1}-x_i)\,\varphi(x_i),$$
where the $x_i$'s are either deterministic, e.g. $x_i=\exp\{-i/N\}$, or random, also allows for the representation
$$\widehat Z = \sum_{i=0}^{N-1} \{\varphi(x_{i+1})-\varphi(x_i)\}\,x_i,$$
which is a special case of
$$\widehat Z = \sum_{i=0}^{N-1} \{L(\theta^{(i+1)})-L(\theta^{(i)})\}\,\pi(\{\theta;\,L(\theta)>L(\theta^{(i)})\}) \tag{2}$$
where $\cdots L(\theta^{(i+1)})>L(\theta^{(i)})\cdots$. (This can be seen as the Lebesgue version of the Riemann sum, the triangulation being on the second axis instead of the first axis.) Since $\varphi$ is rarely available in closed form, the nested sampling algorithm relies on an estimate of $\varphi(x_i)$ or, equivalently, of $\pi(\{\theta;\,L(\theta)>L(\theta^{(i)})\})$:

Nested sampling algorithm
Start with N values $\theta_1,\ldots,\theta_N$ sampled from $\pi$.
At iteration i:
1. Take $\varphi_i = L(\theta_k)$, where $\theta_k$ is the point with the smallest likelihood in the pool of $\theta_i$'s;
2. Replace $\theta_k$ with a sample from the prior constrained to $L(\theta)>\varphi_i$: the current N points are then sampled from the prior constrained to $L(\theta)>\varphi_i$.

In terms of the representation (2), this amounts to using the approximation
$$\widehat\pi(\{\theta;\,L(\theta)>L(\theta^{(i)})\}) \big/ \widehat\pi(\{\theta;\,L(\theta)>L(\theta^{(i-1)})\}) = (N-1)/N.$$
As discussed in Evans (2007) and Chopin and Robert (2007c), the dominating term in the approximation is the stochastic part, which converges at the usual $\sqrt{N}$ Monte Carlo rate.
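The algorithm can be sketched as follows (our own minimal implementation on the same conjugate normal toy model; $N$, the number of iterations, and the deterministic schedule $x_i=e^{-i/N}$ are arbitrary choices, and plain rejection sampling replaces the constrained moves of real implementations):

```python
import math
import random

random.seed(6)

x = 1.0

def norm_pdf(y, mu, var):
    return math.exp(-(y - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def lik(theta):
    return norm_pdf(x, theta, 1.0)   # L(theta) for one observation x ~ N(theta, 1)

def prior_draw():
    return random.gauss(0.0, 1.0)    # theta ~ N(0, 1)

N = 400
live = [prior_draw() for _ in range(N)]
Z_hat, x_prev = 0.0, 1.0
for i in range(1, 8 * N + 1):
    k = min(range(N), key=lambda j: lik(live[j]))
    L_min = lik(live[k])
    x_i = math.exp(-i / N)                 # deterministic prior-mass schedule
    Z_hat += (x_prev - x_i) * L_min
    x_prev = x_i
    # Replace the worst point by a prior draw constrained to L(theta) > L_min.
    # Rejection sampling is used here only for transparency; it slows down as
    # the constraint tightens, which is why real implementations use
    # constrained random walk moves instead.
    while True:
        t = prior_draw()
        if lik(t) > L_min:
            live[k] = t
            break
# Remaining prior mass, credited with the average likelihood of the live points.
Z_hat += x_prev * sum(lik(t) for t in live) / N

Z_true = norm_pdf(x, 0.0, 2.0)
```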
The method thus formally compares with those mentioned in the previous section. At a more practical level, nested sampling can be interpreted as an importance sampling technique where, instead of simulating a whole sample from the prior distribution $\pi(\theta)$ and approximating the evidence by
$$\frac{1}{N}\sum_{i=1}^{N} L(\theta_i),$$
which is usually inefficient, the points $\theta^{(i)}$ are successively sampled from the prior restricted to higher and higher levels of the likelihood, with decreasing weights
$$\frac{1}{N}\left(1-\frac{1}{N}\right)^{i-1}.$$
To illustrate the comparison with a standard importance sampling approximation, we now consider an artificial example based on the likelihood of a twisted normal distribution (first introduced in Haario et al. (1999) as a benchmark for adaptive MCMC schemes) in two dimensions, with covariance matrix $\Sigma=\mathrm{diag}(\sigma_1^2,1)$. The "twist" is due to considering the transform $\theta_2'=\theta_2+\beta(\theta_1^2-\sigma_1^2)$, which leads to a sharp bend in the likelihood contours, as shown in Figure 3. Since the Jacobian of the twist is equal to 1, the density is thus defined as
$$\psi(\theta_1,\theta_2) = \phi_{\mathcal{N}_2(0,\Sigma)}\!\left(\theta_1,\,\theta_2+\beta(\theta_1^2-\sigma_1^2)\right),$$
where $\phi_{\mathcal{N}_2(0,\Sigma)}$ denotes the $\mathcal{N}_2(0,\Sigma)$ density. If we consider $\beta$ and $\sigma_1^2$ as known, the appeal of this example is in integrating over the parameters $\theta_1$ and $\theta_2$ with priors $\pi(\theta_1,\theta_2)$. The toy evidence can thus be represented as
$$Z_1 = \int \psi(\theta_1,\theta_2)\,\pi(\theta_1,\theta_2)\,\mathrm{d}\theta.$$
For the example that we consider, we fix $\beta=0.03$ and $\sigma_1^2=100$ (as represented in Figure 3) and we use flat priors for both $\theta_1$ and $\theta_2$ on $(-40,40)$. (The prior square is chosen arbitrarily, so as to allow all plausible values while still retaining a compact parameter space. Furthermore, a flat prior allows for an easy implementation of nested sampling, since the constrained simulation can be implemented via a random walk move, as pointed out in Skilling (2006).)
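Assuming the reading of the twisted density given above, a sketch of $\psi$ and of a grid-based reference value for $Z_1$ (mimicking, with our own code, the $1000\times1000$ Riemann approximation used as the reference in the next paragraph) is:

```python
import math

beta, s1sq = 0.03, 100.0   # beta and sigma_1^2 as in the example

def psi(t1, t2):
    """Twisted normal density: N_2(0, diag(s1sq, 1)) evaluated after the twist."""
    u2 = t2 + beta * (t1 * t1 - s1sq)
    return math.exp(-t1 * t1 / (2.0 * s1sq) - u2 * u2 / 2.0) \
        / (2.0 * math.pi * math.sqrt(s1sq))

# Midpoint Riemann reference value of Z_1 over the flat-prior square (-40, 40)^2;
# the flat prior contributes the constant 1/80^2.
n = 1000
h = 80.0 / n
acc = 0.0
for i in range(n):
    t1 = -40.0 + (i + 0.5) * h
    for j in range(n):
        t2 = -40.0 + (j + 0.5) * h
        acc += psi(t1, t2)
Z1_ref = acc * h * h / 80.0 ** 2
```

Since the twist is measure-preserving, $\psi$ integrates to 1 over the plane and almost all of its mass falls inside the square, so $Z_1$ is close to $1/80^2$; the hard part, as the text explains, is recovering this value by simulation rather than by quadrature.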
Integrating the likelihood over this region of the parameter space presents a challenging problem for any approach, as the coverage of the tails of the twisted normal distribution can be difficult or even impossible to capture. (This point is discussed at length in Wraith et al. (2009).)

Figure 3: Contours (at the levels 68%, 95% and 99.9%) of the likelihood of the twisted normal model for $\beta=0.03$, $\sigma_1^2=100$.

In this toy example, the two-dimensional nature of the parameter space does allow for a numerical integration of $Z_1$, thus producing a reference value, based on a Riemann approximation of the integral over a grid of $1000\times1000$ points in the $(-40,40)\times(-40,40)$ square (an adaptive quadrature approach was also used as a check). This approach leads to a stable evaluation of $Z_1$ that can reliably be taken as the reference against which we test alternative approximation methods. The comparison is here restricted to a standard nested sampling algorithm derived from Skilling (2006) and to a population Monte Carlo (PMC) mixture importance sampler, constructed in Wraith et al. (2009) for this benchmark problem and introduced in Cappé et al. (2008) in a general framework. Briefly, this importance sampling approach is adaptive, in that it modifies the parameters of the importance function (a mixture density), bringing it closer to the posterior density over a small number of iterations. The proximity is measured in terms of the Kullback-Leibler divergence between the posterior density and the importance function, and an integrated EM approach ensures that the divergence decreases at each iteration. While this adaptive importance sampler derives a proposal $\varphi$ to simulate from the (pseudo-)posterior $\psi(\theta_1,\theta_2)\,\pi(\theta_1,\theta_2)$, it can obviously provide in addition an approximation of the marginal likelihood $Z_1$.
(We stress that any importance sampler used in this setting offers the facility of providing an approximation of both the evidence and of its variability.) For comparison purposes, the PMC approach uses 5,000 simulated points per iteration over a total of 10 iterations, and then a final sample of 50,000 points simulated from the "optimal" importance sampling function obtained through the PMC sequence. The importance function to be optimised consists of a mixture of 9 multivariate Student t's with 9 degrees of freedom for each component. For the initial values of the importance function, the components of the mixture are located randomly in different directions, slightly away from 0: the means of the components are drawn from a bivariate Gaussian distribution with mean 0 and covariance equal to $\Sigma_0/5$, where $\Sigma_0$ is a diagonal matrix with diagonal entries $(200,50)$.

Figure 4: Comparison of nested sampling with PMC over 100 simulation runs for (left) evidence estimation; (centre) $\mathbb{E}[\theta_1]$ estimation; (right) $\mathbb{E}[\theta_2]$ estimation. The true value is represented as a horizontal dotted line.

In parallel, we run the nested sampling algorithm with $N=1000$ initial points, reproducing the implementation of Skilling (2006), using 50 steps of a random walk in $(\theta_1,\theta_2)$ constrained by the likelihood boundary to produce the next value, based on the contribution of the current value of $(\theta_1,\theta_2)$ to the approximation of $Z_1$. The step size (i.e., the variance) of the random walk is 0.1 and the process is repeated (iterated) 10,000 times, and monitored, to ensure a definitive completion of the accumulation of $Z_1$. Alternative scenarios, including changes to the number of points $N$, to the number of steps and to the step size, were explored to assess the sensitivity of the results to these settings, but they did not lead to an improvement in the results.
To assess the variability of the results, 100 simulation runs of both the PMC and nested sampling algorithms have been used. Figure 4 summarises our results for PMC compared with nested sampling over the 100 simulation runs, for the evaluation of the evidence $Z_1$ and of the posterior means of $\theta_1$ ($\mathbb{E}[\theta_1]$) and $\theta_2$ ($\mathbb{E}[\theta_2]$), since the outcome of a nested sampling run can be utilised as any importance sampling output. Those results suggest that nested sampling exhibits a slight upward bias in the evaluation of the evidence (a point also noted in Chopin and Robert (2007b)), while it produces approximately the same numerical values for the estimates of $\mathbb{E}[\theta_1]$ and of $\mathbb{E}[\theta_2]$, albeit with a greater variability.

4 Comments

Various importance sampling strategies have been proposed recently that explicitly target the evidence. While no clear winner emerges from the comparison, we conclude that the bridge sampling strategy remains a reference in this domain, but also that the harmonic version of Gelfand and Dey (1994), Bartolucci et al. (2006) and Chopin and Robert (2007b) may produce valuable approximations if the empirical HPD regions are exploited in the way described in the current paper.

Acknowledgements

Both C.P. Robert and D. Wraith are supported by the 2006-2009 ANR "Ecosstat". C.P. Robert is grateful to the organisers of MaxEnt 2009 for their kind invitation and to O. Cappé for discussions, as well as to the students attending the course on Bayesian Data Analysis for Ecologists in Gran Paradiso National Park, Aosta, Italy, for they made him realise the strong potential of using empirical HPD regions.

References

H. Jeffreys, Theory of Probability, The Clarendon Press, Oxford, 1939, first edn.
C. Robert, N. Chopin, and J. Rousseau, Statist. Science (2009), (to appear).
E. Jaynes, Probability Theory, Cambridge University Press, Cambridge, 2003.
C.
Robert, The Bayesian Choice, Springer-Verlag, New York, 2001, second edn.
B. Carlin and S. Chib, J. Royal Statist. Society Series B 57, 473–484 (1995).
M. Chen, Q. Shao, and J. Ibrahim, Monte Carlo Methods in Bayesian Computation, Springer-Verlag, New York, 2000.
C. Robert and G. Casella, Monte Carlo Statistical Methods, Springer-Verlag, New York, 2004, second edn.
N. Friel and A. Pettitt, J. Royal Statist. Society Series B 70(3), 589–607 (2008).
P. Green, Biometrika 82, 711–732 (1995).
A. Gelman and X. Meng, Statist. Science 13, 163–185 (1998).
G. Torrie and J. Valleau, J. Comp. Phys. 23, 187–199 (1977).
A. Raftery, M. Newton, J. Satagopan, and P. Krivitsky, "Estimating the integrated likelihood via posterior simulation using the harmonic mean identity (with discussion)," in Bayesian Statistics 8, edited by J. Bernardo, M. Bayarri, J. Berger, A. Dawid, D. Heckerman, A. Smith, and M. West, Oxford University Press, 2007, pp. 371–416.
R. Neal, J. Royal Statist. Society Series B 56(1), 41–42 (1994).
N. Chopin and C. Robert, "Comments on 'Estimating the integrated likelihood via posterior simulation using the harmonic mean identity (with discussion)'," in Bayesian Statistics 8, edited by J. Bernardo et al., Oxford University Press, 2007a, pp. 371–416.
A. Gelfand and D. Dey, J. Royal Statist. Society Series B 56, 501–514 (1994).
M. Newton and A. Raftery, J. Royal Statist. Society Series B 56, 1–48 (1994).
M. DeGroot, J. American Statist. Assoc. 68, 966–969 (1973).
N. Chopin and C. Robert, Contemplating evidence: properties, extensions of, and alternatives to nested sampling, Tech. Rep. 2007-46, CEREMADE, Université Paris Dauphine (2007b).
F. Bartolucci, L. Scaccia, and A. Mira, Biometrika 93, 41–52 (2006).
J. Skilling, Bayesian Analysis 1(4), 833–860 (2006).
J. Skilling, "Nested sampling for general Bayesian computation," in Bayesian Statistics 8, edited by J. Bernardo, M.
Bayarri, J. Berger, A. Dawid, D. Heckerman, A. Smith, and M. West, 2007, (to appear).
B. L. Burrows, IMA J. Appl. Math. 26, 151–173 (1980).
M. Evans, "Discussion of 'Nested sampling for Bayesian computations' by John Skilling," in Bayesian Statistics 8, edited by J. Bernardo, M. Bayarri, J. Berger, A. Dawid, D. Heckerman, A. Smith, and M. West, Oxford University Press, 2007, pp. 491–524.
N. Chopin and C. Robert, "Comments on 'Nested Sampling' by John Skilling," in Bayesian Statistics 8, edited by J. Bernardo et al., Oxford University Press, 2007c, pp. 491–524.
H. Haario, E. Saksman, and J. Tamminen, Computational Statistics 14(3), 375–395 (1999).
D. Wraith, M. Kilbinger, K. Benabed, O. Cappé, J.-F. Cardoso, G. Fort, S. Prunet, and C. Robert, Physical Review D 80(2), 023507 (2009).
O. Cappé, R. Douc, A. Guillin, J.-M. Marin, and C. Robert, Statist. Comput. 18, 447–459 (2008).
