On variance stabilisation by double Rao-Blackwellisation
Authors: Alessandra Iacobucci, Jean-Michel Marin, Christian P. Robert
On variance stabilisation by double Rao-Blackwellisation*

Alessandra Iacobucci, CEREMADE, Université Paris Dauphine & CNRS, Paris
Jean-Michel Marin, INRIA Saclay, Project select, Université Paris-Sud, & CREST, INSEE
Christian P. Robert, CEREMADE, Université Paris Dauphine & CREST, INSEE

December 14, 2021

Abstract

Population Monte Carlo has been introduced as a sequential importance sampling technique to overcome poor fit of the importance function. In this paper, we compare the performances of the original Population Monte Carlo algorithm with a modified version that eliminates the influence of the transition particle via a double Rao-Blackwellisation. This modification is shown to improve the exploration of the modes through a large simulation experiment on posterior distributions of mean mixtures of distributions.

Keywords: Importance Sampling mixture, adaptive Monte Carlo, Population Monte Carlo, multimodality, mixture of distributions, random walk.

1 Introduction

When Cappé et al. (2004) introduced Population Monte Carlo (or PMC) as a repeated Sampling Importance Resampling procedure, the purpose was to overcome the shortcomings of Importance Sampling (abbreviated to IS in the following) procedures, while preserving the advantages of IS compared to alternatives such as Markov Chain Monte Carlo methods, namely the possibility of developing parallel implementations, which becomes more and more important with the generalisation of multiple core machines and computer clusters, and of allowing for easy assessment of the Monte Carlo error and, correlatively, for the development of on-line calibration mechanisms. Indeed, adaptive Monte Carlo naturally answers the difficulty of finding a "good" importance function in IS by gradually improving this importance function against a given target density, based on some form of Monte Carlo approximation.
Previous simulations are used (see, e.g., Douc et al., 2007a,b) to modify proposal distributions represented as mixtures of fixed proposals in order to increase the weights of the most appropriate components. When the proposal is inspired from random walk structures as in Metropolis–Hastings algorithms and when the update of those weights is too crude, as in Cappé et al. (2004), the improvement is so short-sighted that multiple iterations do not increase the efficiency of the method, unless a Rao-Blackwell step is added to replace the actual proposal with the mixture proposal in the importance weight (see Douc et al., 2007a). More recently, Cappé et al. (2007) have thus replaced random walk proposals based on earlier samples with a new PMC scheme with parameterised proposals whose parameters are estimated from earlier samples, leading to a monotonic decrease in the entropic distance to the target distribution. There is indeed a myopic feature in random walk proposals, namely that, once a (first) sample has exhibited some high density regions for the target distribution, the algorithm is reluctant to allow for a wider exploration of the support of the target and it may thus miss important density regions, missing the energy to reach forward to these other regions of interest.

*This work has been supported by the Agence Nationale de la Recherche (ANR, 212, rue de Bercy 75012 Paris) through the 2006-2008 project Adap'MC. The third author is particularly grateful to Andrew Barron for his suggestion to look at biased datasets in the context of mixtures, during the Bayesian Nonparametric Workshop in Roma, June 2004.
In this paper, we nonetheless focus on random walk proposals because those kernels are more open to complex settings than independent proposals: the latter indeed require some preliminary knowledge about the target or else a major upgrade in computing power to face a much larger number of components in the mixture. More precisely, we present an experimental assessment of the use of a so-called double Rao-Blackwellisation scheme towards a better exploration of the modes of the target distribution. The second Rao-Blackwellisation step used in this double Rao-Blackwellisation scheme is essentially an integration over the particles of the previous sample used in the random walk proposal. While this leads to an additional computing cost in the derivation of the importance weights, double Rao-Blackwellisation undoubtedly brings a significant improvement in the stability of those importance weights and therefore justifies (in principle) the use of this additional step. Nevertheless, since an analytic proof of this improvement brought by double Rao-Blackwellisation is too delicate to contemplate, we use an intensive Monte Carlo experiment to establish the clear improvement in the case of a posterior distribution associated with a Gaussian location mixture and a sample with outliers (in order to increase the number of modes).

The paper is organised as follows: In Section 2, we recall the basics of our population Monte Carlo algorithm and we introduce the double Rao-Blackwellisation modification. In Section 3, we motivate the choice of Gaussian mean mixtures as a worthy Monte Carlo experiment to test the mode finding abilities of both algorithms. Section 4 describes the implementation of the Monte Carlo experiment and presents the results. Section 5 contains a short discussion.
2 Population Monte Carlo

Given a target distribution with density $\pi$ that is known up to a normalising constant, the grand scheme of PMC is the same as with other Monte Carlo (including MCMC) methods, namely to produce a sample that is distributed from $\pi$ without resorting to direct simulation from $\pi$. Let us recall that, once a sample $(X_1, \ldots, X_N)$ is produced by sampling importance resampling (see, e.g., Robert and Casella, 2004, Marin and Robert, 2007), i.e., by first simulating $X_i \sim f(x)$, second producing importance weights $\omega_i \propto \pi(X_i)/f(X_i)$, and third resampling the $X_i$'s by multinomial/bootstrap sampling with weights $\omega_i$, this SIR sample provides an approximation to the target distribution $\pi$ and can be used as a stepping stone towards a better approximation to $\pi$.

2.1 PMC basics

In the PMC algorithm of Cappé et al. (2004), if $(X_1, \ldots, X_N)$ is a sample approximately distributed from $\pi$, it is modified stochastically using an arbitrary Markov transition kernel $q(x, x')$ so as to produce a new sample $(X'_1, \ldots, X'_N)$ as $X'_i \sim q(X_i, x)$. Conducting a resampling step based on the IS weights $\omega_i = \pi(X'_i)/q(X_i, X'_i)$, we then produce a new sample $(\tilde X_1, \ldots, \tilde X_N)$ that equally constitutes an approximation to the target distribution $\pi$. It is however necessary to stress that, as established in Douc et al. (2007b), repeating this scheme in an iterative manner is only of interest if samples that have been previously simulated are used to update (or adapt) the kernel $q(x, x')$: a kernel $q$ that is fixed over iterations does not modify the statistical properties of the samples. The choice made in Douc et al. (2007b) of an adaptive proposal kernel $q$ represented as a mixture of fixed transition kernels $q_d$,

$$q_\alpha(x, x') = \sum_{d=1}^{D} \alpha_d\, q_d(x, x'), \qquad \sum_{d=1}^{D} \alpha_d = 1, \qquad (1)$$

can improve the efficiency of the kernel in terms of deviance (or relative entropy) from the target density. To achieve such an improvement, the weights $\alpha_1, \ldots, \alpha_D$ must be tuned adaptively at each iteration of the PMC algorithm.

2.2 Weight updating

If $\alpha^t = (\alpha^t_1, \ldots, \alpha^t_D)$ denotes the mixture weights at the $t$-th iteration of the algorithm (where $t = 1, \ldots, T$), the update of the $\alpha^t$'s of Douc et al. (2007b) takes advantage of the latent variable structure that underlies the mixture representation, resulting in an integrated EM (Expectation-Maximisation) scheme. In the current setting, the latent variable $Z$ is the standard component indicator in the mixture (see, e.g., Marin et al., 2005), with values in $\{1, \ldots, D\}$ such that the joint density $f$ of $x'$ and $z$ given $x$ satisfies

$$f(z) = \alpha_z \quad\text{and}\quad f(x' \mid z, x) = q_z(x, x').$$

The updating mechanism for the $\alpha_d$'s then corresponds to setting the new parameter $\alpha^{t+1}$ equal to

$$\arg\max_\alpha\, \mathbb{E}^{\pi\times\pi}_{X,X'}\Big[ \mathbb{E}^{\alpha^t}_{Z}\big\{ \log\big(\alpha_Z\, q_Z(X, X')\big) \,\big|\, X, X' \big\} \Big] = \arg\max_\alpha\, \mathbb{E}^{\pi\times\pi}_{X,X'}\Big[ \mathbb{E}^{\alpha^t}_{Z}\big\{ \log \alpha_Z \,\big|\, X, X' \big\} \Big],$$

where the inner expectation is computed under the conditional distribution of $Z$ for the current value of the parameter, $\alpha^t$, i.e.

$$f(z \mid x, x', \alpha^t) = \frac{\alpha^t_z\, q_z(x, x')}{\sum_{d=1}^{D} \alpha^t_d\, q_d(x, x')},$$

with solution

$$\alpha^{t+1}_d = \mathbb{E}^{\pi\times\pi}_{X,X'}\big[ f(d \mid X, X', \alpha^t) \big]. \qquad (2)$$

2.3 Single Rao-Blackwellisation updates

Given that (2) is not available in closed form, the adaptivity of the proposed procedure is achieved by approximating this actualising expression based on the sample that has been previously simulated. The implementation of the standard PMC algorithm relies on an arbitrary initial proposal $\mu_0$ that produces a pre-initial sample $(X_{1,-1}, \ldots, X_{N,-1})$, with importance weights $\pi(X_{i,-1})/\mu_0(X_{i,-1})$. This initial sample allows for the derivation of a sample $(\tilde X_{1,-1}, \ldots, \tilde X_{N,-1})$ approximately distributed from $\pi$, using importance sampling based on those weights. The algorithm then picks arbitrary starting weights $\alpha^0_d$ in the $D$-dimensional simplex to produce a (truly) initial sample $(X_{1,0}, \ldots, X_{N,0})$ from the mixture

$$X_{i,0} \sim \sum_{d=1}^{D} \alpha^0_d\, q_d(\tilde X_{i,-1}, x).$$

In the detail of its implementation, the production of this initial sample is naturally associated with latent variables $(Z_{i,0})_{1\le i\le N}$ that indicate from which component $d$ of the mixture the corresponding $X_{i,0}$ has been generated ($i = 1, \ldots, N$).

From this stage, Douc et al. (2007b) proceed recursively. Starting at time $t$ from a sample $(X_{1,t}, \ldots, X_{N,t})$, associated with $(Z_{i,t})_{1\le i\le N}$ and with the current value of the weights $\alpha^{t,N}$, the normalised importance weight of the sample point $X_{i,t}$ is provided by

$$\bar\omega_{i,t} = \frac{\pi(X_{i,t}) \Big/ \sum_{d=1}^{D} \alpha^{t,N}_d\, q_d(\tilde X_{i,t-1}, X_{i,t})}{\sum_{j=1}^{N} \pi(X_{j,t}) \Big/ \sum_{d=1}^{D} \alpha^{t,N}_d\, q_d(\tilde X_{j,t-1}, X_{j,t})}. \qquad (3)$$

To approximate the update step (2) in practice, an initial possibility is

$$\alpha^{t+1,N}_d = \sum_{i=1}^{N} \bar\omega_{i,t}\, \mathbb{1}\{Z_{i,t} = d\},$$

where the sum does not need renormalisation because the $\bar\omega_{i,t}$ sum up to 1 (over $i$). This choice is however likely to be highly variable when $N$ is small and/or the number of components $D$ becomes larger. To robustify this update, Cappé et al. (2007) introduce a Rao-Blackwellisation step (see, e.g., Robert and Casella, 2004, Section 4.2) that consists in replacing the Bernoulli variable $\mathbb{1}\{Z_{i,t} = d\}$ with its conditional expectation given $X_{i,t}$ and $\tilde X_{i,t-1}$, that is,

$$\alpha^{t+1,N}_d = \sum_{i=1}^{N} \bar\omega_{i,t}\, f\big(d \mid X_{i,t}, \tilde X_{i,t-1}, \alpha^{t,N}\big).$$
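As an illustration of this single Rao-Blackwellised update, the following minimal Python sketch (the authors' code is in R; the function name and toy inputs here are ours) computes the normalised weights (3) and the Rao-Blackwellised update of α from precomputed kernel densities:

```python
import numpy as np

def single_rb_update(pi_vals, Q, alpha):
    """Single Rao-Blackwellised PMC update.
    pi_vals[i] = pi(X_{i,t})                (unnormalised target density)
    Q[i, d]    = q_d(X~_{i,t-1}, X_{i,t})   (kernel densities)
    alpha[d]   = current mixture weights alpha^{t,N}
    Returns the normalised weights (3) and the updated alpha^{t+1,N}."""
    mix = Q @ alpha                      # sum_d alpha_d q_d(X~_i, X_i), per i
    w = pi_vals / mix                    # unnormalised importance weights
    w = w / w.sum()                      # normalised weights, eq. (3)
    f_post = (Q * alpha) / mix[:, None]  # f(d | X_i, X~_i, alpha); rows sum to 1
    alpha_new = w @ f_post               # RB update: sum_i w_i f(d | X_i, X~_i, alpha)
    return w, alpha_new

# toy inputs (ours): N = 2 particles, D = 2 components
pi_vals = np.array([1.0, 2.0])
Q = np.array([[0.4, 0.1], [0.2, 0.3]])
alpha = np.array([0.5, 0.5])
w, alpha_new = single_rb_update(pi_vals, Q, alpha)
```

Replacing `f_post` by the one-hot indicators of the simulated components $Z_{i,t}$ recovers the cruder, higher-variance update described above.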
This replacement does not involve a significant increase in the computational cost of the algorithm, while both approximations converge to the solution of (2) as $N$ grows to infinity.

2.4 Double Rao-Blackwellisation

While Cappé et al. (2007) showed through two experimental examples that the above Rao-Blackwellisation step does provide a significant improvement in the stability of the PMC weights (and, thus, in fine, a reduction in the number of iterations), there remains an extra source of variation in the importance weights (3), namely the dependence of $\bar\omega_{i,t}$ on the previous sample point (or particle) $\tilde X_{i,t-1}$. Even though this dependence illustrates the fact that $X_{i,t}$ is indeed generated from the mixture

$$\sum_{d=1}^{D} \alpha^{t,N}_d\, q_d(\tilde X_{i,t-1}, x),$$

and is thus providing a correct IS weight, it also shows a conditioning on the result of a (random) multinomial sampling in the previous iteration that led to the selection of $\tilde X_{i,t-1}$ with probability $\bar\omega_{i,t-1}$. In other words, by de-conditioning, it is also the case that the sample point $X_{i,t}$ has been generated from the (integrated) distribution

$$\sum_{j=1}^{N} \bar\omega_{j,t-1} \sum_{d=1}^{D} \alpha^{t,N}_d\, q_d(\tilde X_{j,t-1}, x), \qquad (4)$$

when averaging over all multinomial outputs. This double averaging over both the components of the mixture and the initial sample points is the reason why we call this representation double Rao-Blackwellisation. The appeal of using (4) is that not only does the averaging over all possible sample points provide a most likely stabilisation of the weights, but it also eliminates a strange feature of the original approach, namely that two identical values $x_{i,t}$ and $x_{j,t} = x_{i,t}$ could have different importance weights simply because their conditioning sample values $\tilde X_{i,t-1}$ and $\tilde X_{j,t-1}$ were different.
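A sketch of how (4) changes the weight computation, in the same illustrative Python style (array names are ours, not the authors' R code): the proposal density at each new particle is averaged over the previous particles as well as over the mixture components, so two identical particle values necessarily receive the same weight.

```python
import numpy as np

def double_rb_weights(pi_vals, Qfull, alpha, w_prev):
    """Double Rao-Blackwellised importance weights based on (4).
    pi_vals[i]     = pi(X_{i,t})
    Qfull[j, i, d] = q_d(X~_{j,t-1}, X_{i,t})   (N x N x D array)
    alpha[d]       = mixture weights alpha^{t,N}
    w_prev[j]      = previous normalised weights omega-bar_{j,t-1}."""
    per_pair = Qfull @ alpha   # (N, N): sum_d alpha_d q_d(X~_j, X_i)
    mix = w_prev @ per_pair    # (N,): integrated proposal (4) at each X_i
    w = pi_vals / mix
    return w / w.sum()
```

The extra cost is visible in the shape of `Qfull`: all $N \times N$ particle pairs now require kernel evaluations, instead of the $N$ matched pairs used in (3).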
Naturally, the replacement of the importance sampling distribution in (3) by (4) has a cost of $O(N^3)$ compared with the original $O(N^2)$, but this is often negligible when compared with the cost of computing $\pi(X_{i,t})$. (We also stress that some time-saving steps could be taken in order to avoid computing all the $q_d(\tilde X_{j,t-1}, X_{i,t})$ by considering first the distance between $\tilde X_{j,t-1}$ and $X_{i,t}$ and keeping only close neighbours within the sum, although the increase in computing time in our case did not justify the filtering.) In the following experiment, the additional cost resulting from the double Rao-Blackwellisation does not induce a considerable upsurge in the computing time, even though it is not negligible. We indeed found an increase in the order of three to five times the original computing time for the same number $N$ of sampled values. This obviously fails to account for the faster stabilisation of the IS approximation resulting from using double Rao-Blackwellisation. Note also that double Rao-Blackwellisation does not remove the need to sample the $\tilde X_{j,t-1}$'s: indeed, when simulating each new $X_{i,t}$ from (4), we need to first select which $\tilde X_{j,t-1}$ is going to be conditioned upon, and second determine which component is to be used.

3 The Gaussian mean mixture benchmark

As benchmark for our Monte Carlo experiment, we consider the case of a one-dimensional Gaussian mean mixture distribution, namely

$$x_1, \ldots, x_n \sim p\,\mathcal{N}(\mu_1, \sigma^2_1) + (1-p)\,\mathcal{N}(\mu_2, \sigma^2_2), \qquad (5)$$

with $p$, $\sigma^2_1$ and $\sigma^2_2$ known and $\theta = (\mu_1, \mu_2)$ being the parameter of the model, as in Marin et al. (2005). Using a flat prior on $\theta$ within a square region, we are thus interested in simulating from the posterior distribution associated with a given sample $(x_1, \ldots, x_n)$.
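For reference, the (unnormalised) log-posterior of (5) under the flat prior is just the mixture log-likelihood; a minimal sketch (function name ours) also exhibits the label-switching symmetry when $p = 1/2$ and $\sigma_1 = \sigma_2$:

```python
import numpy as np

def log_posterior(mu1, mu2, data, p=0.5, s1=1.0, s2=1.0):
    """Unnormalised log-posterior of theta = (mu1, mu2) for the mixture (5)
    under a flat prior on a square: the log-likelihood, up to a constant."""
    c1 = (p / s1) * np.exp(-0.5 * ((data - mu1) / s1) ** 2)
    c2 = ((1 - p) / s2) * np.exp(-0.5 * ((data - mu2) / s2) ** 2)
    return float(np.sum(np.log(c1 + c2)))
```

With $p = 0.5$ and $\sigma_1 = \sigma_2$, swapping $(\mu_1, \mu_2)$ leaves the posterior unchanged, which is one source of the multimodality exploited in the experiment below.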
The appeal of this example is that it is sufficiently simple to allow for an explicit characterisation of the attractive points for the adaptive procedure, being of dimension two, while still illustrating the variety of situations found in more realistic applications. In particular, as already explained in Marin et al. (2005), the model contains at least one attractive point that does not correspond to the global minimum of the entropy criterion, as well as some regions of attraction that can eventually lead to a failure of the algorithm when the data $(x_1, \ldots, x_n)$ is not simulated from the above mixture but by clumps that induce more modes in the posterior. Figure 1 illustrates quite well the diversity of posterior features when changing the repartition of the sample $(x_1, \ldots, x_n)$. We stress the fact that those samples, called artificial samples, do not result from simulations from (5) but from simulations from a normal mixture with five equal and well-separated components centred at $\mu_1 = 0$ and $\pm\mu_2$, $\pm 2\mu_2$, with variances 0.1. This choice was made in order to increase the potential number of modes on the posterior surface when modelling the data with (5).

Figure 1: Series of posterior distributions associated with (5) represented as level sets for various artificial samples $(x_1, \ldots, x_n)$ and different values of $n$, $\mu_1$, $\mu_2$, $\sigma^2_1$ and $\sigma^2_2$.

4 Monte Carlo experiment

Given the target distribution defined by (5), we want to study the improvement brought by the double Rao-Blackwellisation (4) in terms of mode degeneracy, as well as to ascertain the additional cost of using double Rao-Blackwellisation. We therefore study the performance of both single and double Rao-Blackwellisation PMC over a whole range of samples, with various values of $n$ and of the parameters $p$, $\mu_2$ and $\sigma_2$ (since we can always use $\mu_1 = 0$ and $\sigma_1 = 1$ without loss of generality).
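The artificial samples of Section 3 can be generated as follows; this is an illustrative Python sketch (the authors' experiment is in R, and the function name and seed are ours) of the five-component, equal-weight mixture centred at $0$, $\pm\mu_2$, $\pm 2\mu_2$ with variance 0.1:

```python
import numpy as np

rng = np.random.default_rng(1)

def artificial_sample(n, mu2, rng=rng):
    """Draw an 'artificial' sample from the five-component normal mixture
    used to build the benchmark data sets: equal weights, components
    centred at 0, +/-mu2, +/-2*mu2, each with variance 0.1."""
    centres = np.array([0.0, mu2, -mu2, 2 * mu2, -2 * mu2])
    z = rng.integers(0, 5, size=n)            # equal-weight component labels
    return rng.normal(centres[z], np.sqrt(0.1))

x = artificial_sample(200, mu2=3.0)
```

Fitting the two-component model (5) to such clumped data is precisely what inflates the number of posterior modes.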
4.1 Automatic mode finding and classification

A first difficulty, when building this experiment, is to determine the number of modes on the posterior surface for the data at hand. We achieve this goal by first discretising the parameter space in $(\mu_1, \mu_2)$, then recovering the modal points by a brute-force search for maxima and saddle-points over the grid, and then identifying their basins of attraction. (The algorithm is based on multidirectional calls to the function turnpoints() of the pastecs package.) This is obviously prone to overlooking some small local modes, but it also allows for a quick identification of the modal regions and of their exploration by the PMC algorithm. Table 1 illustrates the results of this procedure for a range of values of $(n, p, \mu_2, \sigma_2)$. When conditioning upon $(\mu_2, \sigma_2)$, the number of modes is increasing in $\mu_2$ (since the simulated samples include five clusters that are better separated) and slightly decreasing in $\sigma_2$ (for apparently the same reason). A similar table varying upon the pair $(n, p)$ does not show much variation in the number of modes (which does make sense, since only the relative magnitudes of the modes are changing).
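The grid search can be mimicked by a much cruder stand-in for the turnpoints()-based procedure (this sketch is ours, not the authors' R code): flag every grid point of the discretised posterior surface that strictly dominates its four axis neighbours.

```python
import numpy as np

def count_local_maxima(z):
    """Crude grid-based mode counter on a 2-D array of (log-)posterior values:
    a grid point is a local maximum if it is strictly larger than its four
    axis neighbours (boundary points are compared against -inf padding)."""
    zp = np.pad(z, 1, constant_values=-np.inf)
    c = zp[1:-1, 1:-1]
    is_max = ((c > zp[:-2, 1:-1]) & (c > zp[2:, 1:-1]) &
              (c > zp[1:-1, :-2]) & (c > zp[1:-1, 2:]))
    return int(is_max.sum())

# two-bump toy surface with modes near (mu1, mu2) = (0, 3) and (3, 0)
g = np.linspace(-2, 5, 80)
M1, M2 = np.meshgrid(g, g, indexing="ij")
z = np.exp(-(M1**2 + (M2 - 3)**2)) + np.exp(-((M1 - 3)**2 + M2**2))
```

A four-neighbour test deliberately ignores saddle points (which are minima along at least one axis direction), matching the intent of the maxima/saddle classification described above.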
σ₂ \ μ₂    1.0     1.5     2.0     2.5     3.0     3.5     4.0     4.5     5.0
1.0      2.053   2.184   2.450   2.863   3.265   3.524   3.641   3.707   3.742
        (0.477) (0.486) (1.273) (1.085) (0.895) (0.754) (0.706) (0.668) (0.627)
1.5      2.037   2.124   2.135   2.165   2.294   2.494   2.731   2.958   3.182
        (0.707) (0.362) (0.404) (0.565) (0.580) (0.668) (0.696) (0.729) (0.706)
2.0      2.026   2.067   2.121   2.059   2.083   2.132   2.259   2.546   2.807
        (0.862) (0.337) (0.359) (0.342) (0.350) (0.369) (0.505) (0.648) (0.636)
2.5      1.976   2.024   2.107   2.067   2.073   2.184   2.472   2.757   2.893
        (1.027) (0.354) (0.344) (0.302) (0.313) (0.434) (0.570) (0.533) (0.500)
3.0      1.993   1.982   2.073   2.203   2.137   2.361   2.743   2.933   3.040
        (1.147) (0.359) (0.329) (0.599) (0.469) (0.580) (0.572) (0.518) (0.486)
3.5      1.979   1.964   2.076   2.186   2.367   2.662   2.973   3.181   3.354
        (1.315) (0.432) (0.415) (0.519) (0.938) (0.844) (0.712) (0.678) (0.719)
4.0      1.961   1.939   2.078   2.209   2.412   2.871   3.214   3.466   3.752
        (1.469) (0.507) (0.490) (0.500) (0.851) (0.922) (0.807) (0.806) (0.860)
4.5      1.858   1.893   2.046   2.220   2.524   3.101   3.418   3.777   4.073
        (1.471) (0.504) (0.445) (0.540) (0.832) (1.034) (0.898) (0.910) (0.901)
5.0      1.698   1.886   2.063   2.273   2.678   3.187   3.673   4.046   4.295
        (1.247) (0.605) (0.566) (0.612) (1.128) (1.013) (1.033) (0.970) (0.867)

Table 1: Average number (and standard deviation) of identified modes on a collection of 1470 samples.

The code for implementing both versions of PMC and for evaluating their mode-finding abilities was written in R (R Development Core Team, 2006) and is available at http://www.ceremade.dauphine.fr/~xian/2RB.R. Further documentation is available at http://www.ceremade.dauphine.fr/~xian/2RB.R.pdf.

4.2 Output

The experiment was run on 4860 samples on seven different machines for a total of 238140 simulations, using the same machine for both single and double Rao-Blackwellised versions at a given value of the parameters in order to keep the CPU comparison sensible.
As seen from Table 2, the CPU time required to implement the double Rao-Blackwellised scheme is between five and three times higher than for the original PMC scheme, the factor decreasing with $n$. (This CPU time does not include the mode finding and storing steps that are required for the comparison. It only corresponds to the execution of a regular PMC run with 10 iterations.) Note also that the increase in computing time is certainly less than linear in $n$.

n      t_1RB          t_2RB
20     3.60 (0.52)    15.59 (1.12)
30     3.65 (0.52)    15.51 (1.13)
40     3.68 (0.52)    15.59 (1.13)
50     3.72 (0.52)    15.61 (1.13)
100    3.89 (0.53)    15.74 (1.16)
500    5.13 (0.62)    17.19 (1.26)
1000   6.85 (0.74)    18.92 (1.42)

Table 2: Overall mean CPU times and their standard errors (in seconds) vs. sample size. Each value is obtained by averaging over 4860 generated samples, for a range of parameters (p = .1, …, .8, μ₂ = 1, …, 5, σ₂ = 1, …, 5).

Both single and double Rao-Blackwellisation PMC samplers were used on the same samples, with different values of $p$, $\sigma_2$ and $\mu_2$. Turning now to the central tables, Tables 3–6, we can see that the influence of $\sigma_2$ on the detection of the modes is relatively limited, in opposition to the influence of $\mu_2$. As $\mu_2$ increases (recall that $\mu_1 = 0$), a larger fraction of modes goes undetected. Both after 5 and 10 iterations of PMC, the performances of the double Rao-Blackwellised scheme are superior to those of the single Rao-Blackwellisation in terms of mode detection, by a factor of about 5%. Running the PMC algorithm longer clearly has a downward impact on the number of detected modes, even though this impact is quite limited. It however points out the major issue that importance sampling algorithms like PMC are mostly unable to recover lost modes. Another interesting feature is that single Rao-Blackwellisation suffers more from the loss of modes between 5 and 10 iterations, compared with double Rao-Blackwellisation.
As expected, the latter is more robust because of the averaging over all past values in the PMC sample. Figure 2 presents the evolution of the capture rate for the single and the double Rao-Blackwellisations as a function of $\mu_2$ after 5 and 10 iterations, obtained by averaging Tables 3–6 over $\sigma_2$.

If we consider instead the output of this simulation experiment decomposed by the values of $n$ and $p$, in Tables 7–10, a main feature is the quick deterioration in the exploration of the modes as $n$ increases. This is obviously to be expected given that, the more observations, the steeper the slopes of the posterior surfaces. Thus, for small values of $n$, both types of Rao-Blackwellised PMC algorithms achieve high coverage rates, but larger values of $n$ lead to poorer achievements. Once again, the robustness against the number of PMC iterations is superior in the case of the double Rao-Blackwellisation. Figure 3 presents the evolution of the capture rate for the single and the double Rao-Blackwellisations as a function of the sample size $n$ after 5 and 10 iterations, obtained by averaging Tables 7–10 over $p$.

Figure 4 illustrates the superior performances of the double Rao-Blackwellisation strategy, compared to the single one, in terms of mode detection. Those three graphs plot side by side, for a given sample, the repartition of the PMC samples in the modal domains. The colour codes are associated with the five possible variances used in the 5-kernel proposal. In those three experiments, the persistence of the samples in all modes, as well as the concentration in the regions of importance, is noticeable. Both rows of Figure 4 interestingly contain a ridge structure in one of the modal basins.
σ₂ \ μ₂    1.0     1.5     2.0     2.5     3.0     3.5     4.0     4.5     5.0
1.0      0.743   0.714   0.631   0.536   0.452   0.409   0.399   0.386   0.387
        (0.260) (0.256) (0.254) (0.259) (0.232) (0.212) (0.216) (0.210) (0.209)
1.5      0.697   0.697   0.612   0.573   0.515   0.472   0.425   0.386   0.350
        (0.272) (0.255) (0.226) (0.208) (0.179) (0.175) (0.157) (0.145) (0.121)
2.0      0.653   0.714   0.614   0.564   0.552   0.521   0.495   0.445   0.397
        (0.273) (0.257) (0.226) (0.182) (0.173) (0.151) (0.147) (0.147) (0.131)
2.5      0.666   0.718   0.602   0.563   0.540   0.509   0.452   0.394   0.378
        (0.288) (0.256) (0.220) (0.181) (0.155) (0.147) (0.142) (0.114) (0.117)
3.0      0.684   0.715   0.605   0.545   0.526   0.477   0.402   0.371   0.355
        (0.302) (0.255) (0.219) (0.193) (0.156) (0.151) (0.119) (0.108) (0.094)
3.5      0.706   0.718   0.611   0.538   0.500   0.434   0.373   0.345   0.322
        (0.311) (0.257) (0.225) (0.179) (0.170) (0.152) (0.119) (0.103) (0.083)
4.0      0.725   0.719   0.610   0.535   0.484   0.409   0.347   0.318   0.291
        (0.314) (0.258) (0.225) (0.187) (0.155) (0.151) (0.108) (0.094) (0.084)
4.5      0.756   0.736   0.630   0.536   0.466   0.379   0.330   0.293   0.273
        (0.307) (0.259) (0.237) (0.192) (0.159) (0.137) (0.111) (0.086) (0.091)
5.0      0.781   0.731   0.627   0.528   0.462   0.370   0.312   0.280   0.253
        (0.293) (0.261) (0.238) (0.190) (0.179) (0.136) (0.111) (0.095) (0.079)

Table 3: Average detection rate after 5 iterations of the single Rao-Blackwellised PMC algorithm as a function of σ₂ and μ₂. The number of samples per entry is 1470, obtained for 7 values of n and 6 values of p.
σ₂ \ μ₂    1.0     1.5     2.0     2.5     3.0     3.5     4.0     4.5     5.0
1.0      0.794   0.801   0.725   0.618   0.528   0.474   0.463   0.438   0.440
        (0.258) (0.247) (0.266) (0.285) (0.265) (0.241) (0.238) (0.232) (0.229)
1.5      0.747   0.811   0.753   0.700   0.627   0.558   0.487   0.439   0.386
        (0.275) (0.240) (0.252) (0.257) (0.256) (0.248) (0.223) (0.216) (0.174)
2.0      0.679   0.832   0.763   0.708   0.669   0.613   0.566   0.500   0.443
        (0.279) (0.241) (0.255) (0.253) (0.248) (0.232) (0.212) (0.206) (0.191)
2.5      0.679   0.832   0.756   0.697   0.648   0.586   0.514   0.447   0.416
        (0.291) (0.240) (0.257) (0.251) (0.238) (0.220) (0.212) (0.185) (0.160)
3.0      0.697   0.826   0.758   0.672   0.625   0.555   0.451   0.411   0.386
        (0.304) (0.242) (0.256) (0.263) (0.237) (0.226) (0.179) (0.159) (0.137)
3.5      0.712   0.807   0.745   0.657   0.591   0.488   0.424   0.376   0.355
        (0.312) (0.251) (0.260) (0.257) (0.244) (0.212) (0.178) (0.143) (0.131)
4.0      0.728   0.788   0.740   0.646   0.571   0.457   0.386   0.350   0.317
        (0.312) (0.258) (0.260) (0.257) (0.238) (0.204) (0.153) (0.138) (0.123)
4.5      0.758   0.795   0.738   0.627   0.540   0.431   0.363   0.318   0.298
        (0.307) (0.255) (0.262) (0.249) (0.229) (0.199) (0.146) (0.125) (0.121)
5.0      0.782   0.783   0.709   0.617   0.532   0.413   0.341   0.305   0.273
        (0.292) (0.259) (0.264) (0.253) (0.241) (0.182) (0.141) (0.124) (0.105)

Table 4: Same legend as Table 3 for the double Rao-Blackwellised PMC algorithm.
σ₂ \ μ₂    1.0     1.5     2.0     2.5     3.0     3.5     4.0     4.5     5.0
1.0      0.707   0.641   0.570   0.482   0.413   0.376   0.365   0.356   0.355
        (0.259) (0.244) (0.226) (0.220) (0.199) (0.187) (0.187) (0.181) (0.185)
1.5      0.656   0.599   0.537   0.516   0.478   0.443   0.403   0.371   0.340
        (0.262) (0.222) (0.170) (0.146) (0.129) (0.129) (0.125) (0.119) (0.102)
2.0      0.638   0.622   0.533   0.518   0.512   0.491   0.474   0.427   0.384
        (0.269) (0.231) (0.157) (0.119) (0.115) (0.094) (0.108) (0.115) (0.107)
2.5      0.656   0.641   0.531   0.510   0.507   0.483   0.436   0.386   0.366
        (0.285) (0.237) (0.158) (0.111) (0.100) (0.102) (0.117) (0.095) (0.092)
3.0      0.673   0.660   0.542   0.497   0.499   0.456   0.388   0.359   0.346
        (0.302) (0.243) (0.168) (0.130) (0.112) (0.114) (0.098) (0.084) (0.080)
3.5      0.700   0.660   0.554   0.499   0.472   0.416   0.362   0.335   0.315
        (0.311) (0.245) (0.182) (0.127) (0.122) (0.122) (0.096) (0.085) (0.071)
4.0      0.723   0.670   0.560   0.498   0.462   0.390   0.337   0.308   0.284
        (0.312) (0.248) (0.186) (0.139) (0.126) (0.121) (0.095) (0.075) (0.073)
4.5      0.755   0.700   0.572   0.496   0.445   0.365   0.319   0.285   0.267
        (0.308) (0.255) (0.201) (0.142) (0.129) (0.126) (0.090) (0.076) (0.079)
5.0      0.780   0.700   0.578   0.501   0.438   0.357   0.303   0.270   0.249
        (0.293) (0.258) (0.208) (0.159) (0.149) (0.120) (0.101) (0.078) (0.068)

Table 5: Average detection rate after 10 iterations of the single Rao-Blackwellised PMC algorithm as a function of σ₂ and μ₂.
σ₂ \ μ₂    1.0     1.5     2.0     2.5     3.0     3.5     4.0     4.5     5.0
1.0      0.786   0.785   0.712   0.610   0.521   0.467   0.453   0.431   0.432
        (0.259) (0.251) (0.268) (0.284) (0.262) (0.240) (0.236) (0.228) (0.227)
1.5      0.745   0.795   0.733   0.679   0.612   0.543   0.479   0.431   0.379
        (0.275) (0.246) (0.253) (0.255) (0.249) (0.240) (0.216) (0.208) (0.167)
2.0      0.677   0.813   0.736   0.684   0.647   0.600   0.555   0.494   0.437
        (0.280) (0.246) (0.256) (0.247) (0.242) (0.223) (0.207) (0.200) (0.188)
2.5      0.675   0.809   0.726   0.665   0.623   0.567   0.500   0.435   0.409
        (0.291) (0.247) (0.256) (0.244) (0.229) (0.208) (0.199) (0.169) (0.152)
3.0      0.694   0.804   0.723   0.643   0.606   0.537   0.443   0.404   0.380
        (0.304) (0.251) (0.257) (0.257) (0.227) (0.211) (0.167) (0.154) (0.134)
3.5      0.711   0.783   0.706   0.624   0.568   0.477   0.413   0.372   0.350
        (0.311) (0.257) (0.256) (0.244) (0.233) (0.202) (0.169) (0.135) (0.124)
4.0      0.727   0.764   0.704   0.609   0.542   0.446   0.377   0.346   0.312
        (0.313) (0.259) (0.255) (0.241) (0.218) (0.193) (0.145) (0.131) (0.116)
4.5      0.757   0.766   0.693   0.596   0.518   0.416   0.355   0.313   0.293
        (0.308) (0.259) (0.258) (0.237) (0.215) (0.185) (0.139) (0.117) (0.116)
5.0      0.782   0.764   0.665   0.589   0.512   0.404   0.334   0.300   0.268
        (0.293) (0.260) (0.255) (0.238) (0.228) (0.174) (0.134) (0.120) (0.098)

Table 6: Same legend as Table 5 for the double Rao-Blackwellised PMC algorithm.

Figure 2: Evolution of the capture rate for the single and the double Rao-Blackwellised versions as a function of μ₂ after 5 and 10 iterations, obtained by averaging Tables 3–6 over σ₂.
p \ n      20      30      40      50      100     500     1000
0.10     0.602   0.565   0.542   0.519   0.478   0.423   0.411
        (0.288) (0.276) (0.274) (0.260) (0.243) (0.198) (0.184)
0.20     0.580   0.551   0.527   0.519   0.480   0.431   0.414
        (0.278) (0.263) (0.249) (0.246) (0.222) (0.193) (0.173)
0.30     0.583   0.552   0.527   0.529   0.491   0.445   0.424
        (0.276) (0.262) (0.243) (0.246) (0.219) (0.193) (0.167)
0.40     0.582   0.558   0.538   0.530   0.496   0.455   0.432
        (0.272) (0.259) (0.253) (0.241) (0.221) (0.200) (0.174)
0.50     0.602   0.572   0.564   0.553   0.533   0.476   0.455
        (0.283) (0.263) (0.256) (0.247) (0.230) (0.192) (0.172)
0.60     0.597   0.572   0.544   0.533   0.499   0.442   0.427
        (0.276) (0.266) (0.256) (0.242) (0.220) (0.174) (0.153)

Table 7: Average detection rate after 5 PMC iterations of the single Rao-Blackwellised algorithm as a function of n and p. The number of samples is 2835, obtained for 7 values of μ₁ and 5 values of σ₂.

p \ n      20      30      40      50      100     500     1000
0.10     0.652   0.619   0.601   0.577   0.542   0.463   0.445
        (0.291) (0.286) (0.292) (0.283) (0.283) (0.249) (0.232)
0.20     0.644   0.629   0.613   0.604   0.566   0.482   0.453
        (0.286) (0.284) (0.283) (0.283) (0.279) (0.250) (0.229)
0.30     0.653   0.645   0.630   0.628   0.588   0.500   0.467
        (0.288) (0.288) (0.285) (0.284) (0.280) (0.256) (0.228)
0.40     0.664   0.662   0.654   0.643   0.598   0.513   0.472
        (0.291) (0.289) (0.290) (0.281) (0.283) (0.260) (0.227)
0.50     0.668   0.651   0.648   0.639   0.614   0.524   0.482
        (0.298) (0.290) (0.290) (0.284) (0.275) (0.241) (0.211)
0.60     0.665   0.651   0.627   0.623   0.582   0.482   0.450
        (0.291) (0.288) (0.285) (0.276) (0.272) (0.222) (0.189)

Table 8: Same legend as Table 7 in the double Rao-Blackwellised case.
p \ n      20      30      40      50      100     500     1000
0.10     0.577   0.541   0.519   0.495   0.457   0.408   0.398
        (0.281) (0.266) (0.259) (0.243) (0.223) (0.175) (0.161)
0.20     0.545   0.514   0.492   0.485   0.447   0.413   0.402
        (0.265) (0.242) (0.222) (0.218) (0.183) (0.164) (0.152)
0.30     0.541   0.510   0.489   0.480   0.456   0.424   0.414
        (0.257) (0.233) (0.214) (0.201) (0.180) (0.159) (0.146)
0.40     0.536   0.516   0.496   0.487   0.458   0.431   0.418
        (0.252) (0.234) (0.219) (0.207) (0.178) (0.162) (0.143)
0.50     0.564   0.542   0.528   0.517   0.497   0.459   0.443
        (0.269) (0.246) (0.232) (0.219) (0.197) (0.166) (0.156)
0.60     0.544   0.519   0.499   0.487   0.463   0.426   0.420
        (0.258) (0.238) (0.224) (0.206) (0.180) (0.146) (0.134)

Table 9: Average detection rate after 10 PMC iterations for the single Rao-Blackwellised algorithm as a function of n and p.

p \ n      20      30      40      50      100     500     1000
0.10     0.639   0.608   0.590   0.564   0.529   0.458   0.441
        (0.291) (0.285) (0.288) (0.277) (0.275) (0.243) (0.227)
0.20     0.622   0.612   0.594   0.585   0.549   0.475   0.448
        (0.284) (0.281) (0.278) (0.276) (0.267) (0.246) (0.220)
0.30     0.629   0.623   0.608   0.604   0.571   0.493   0.461
        (0.284) (0.283) (0.279) (0.277) (0.274) (0.250) (0.223)
0.40     0.641   0.636   0.628   0.620   0.582   0.506   0.468
        (0.287) (0.285) (0.286) (0.276) (0.274) (0.254) (0.224)
0.50     0.642   0.630   0.627   0.619   0.600   0.518   0.480
        (0.293) (0.284) (0.283) (0.277) (0.269) (0.237) (0.208)
0.60     0.641   0.628   0.608   0.604   0.569   0.475   0.447
        (0.287) (0.281) (0.278) (0.271) (0.264) (0.215) (0.184)

Table 10: Same legend as Table 9 for the double Rao-Blackwellised PMC algorithm.

Figure 3: Evolution of the capture rate for the single and the double Rao-Blackwellisations as a function of the sample size n after 5 and 10 iterations, obtained by averaging Tables 7–10 over p.

Figure 4: Comparison between PMC samples using a single Rao-Blackwellisation (left) and a double Rao-Blackwellisation (right).
5 Conclusions

The double Rao-Blackwellised algorithm provides a more robust framework for adapting general importance sampling densities represented as mixtures, in the sense that the influence of the starting points diminishes when compared with the original algorithm. It is thus unnecessary to rely on a large value of the number $T$ of iterations of the PMC algorithm: with large enough sample sizes $N$ at each iteration (especially on the initial iteration, which requires many points to counterweight a potentially poor initial proposal), it is quite uncommon to fail to spot a stabilisation of both the estimates and of the entropy criterion within a few iterations.

References

Cappé, O., Douc, R., Guillin, A., Marin, J.-M., and Robert, C. (2007). Adaptive importance sampling in general mixture classes. Technical Report 2007-48, Université Paris Dauphine.

Cappé, O., Guillin, A., Marin, J.-M., and Robert, C. (2004). Population Monte Carlo. J. Comput. Graph. Statist., 13(4):907–929.

Douc, R., Guillin, A., Marin, J.-M., and Robert, C. (2007a). Convergence of adaptive mixtures of importance sampling schemes. Ann. Statist., 35(1).

Douc, R., Guillin, A., Marin, J.-M., and Robert, C. (2007b). Minimum variance importance sampling via population Monte Carlo. ESAIM: Probability and Statistics, 11:427–447.

Marin, J.-M., Mengersen, K., and Robert, C. (2005). Bayesian modelling and inference on mixtures of distributions. In Rao, C. and Dey, D., editors, Handbook of Statistics, volume 25. Springer-Verlag, New York.

Marin, J.-M. and Robert, C. (2007). Bayesian Core. Springer-Verlag, New York.

R Development Core Team (2006). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Robert, C. and Casella, G. (2004). Monte Carlo Statistical Methods. Springer-Verlag, New York, second edition.