Sequential Monte Carlo on large binary sampling spaces


Authors: Christian Schäfer (CREST, CEREMADE), Nicolas Chopin (CREST, ENSAE)

June 1, 2018

Abstract: A Monte Carlo algorithm is said to be adaptive if it automatically calibrates its current proposal distribution using past simulations. The choice of the parametric family that defines the set of proposal distributions is critical for good performance. In this paper, we present such a parametric family for adaptive sampling on high-dimensional binary spaces. A practical motivation for this problem is variable selection in a linear regression context. We want to sample from a Bayesian posterior distribution on the model space using an appropriate version of Sequential Monte Carlo. Raw versions of Sequential Monte Carlo are easily implemented using binary vectors with independent components. For high-dimensional problems, however, these simple proposals do not yield satisfactory results. The key to an efficient adaptive algorithm is a binary parametric family which takes correlations into account, analogously to the multivariate normal distribution on continuous spaces. We provide a review of models for binary data and make one of them work in the context of Sequential Monte Carlo sampling. Computational studies on real-life data with about a hundred covariates suggest that, on difficult instances, our Sequential Monte Carlo approach clearly outperforms standard techniques based on Markov chain exploration.

Keywords: Adaptive Monte Carlo · Multivariate binary data · Sequential Monte Carlo · Linear regression · Variable selection

1 Introduction

We present a Sequential Monte Carlo (Del Moral et al., 2006) algorithm for adaptive sampling from a binary distribution. A Monte Carlo algorithm is said to be adaptive if it adjusts, sequentially and automatically, its sampling distribution to the problem at hand.
Besides Sequential Monte Carlo, important classes of adaptive Monte Carlo methods are Adaptive Importance Sampling (e.g. Cappé et al., 2008) and Adaptive Markov chain Monte Carlo (e.g. Andrieu and Thoms, 2008).

1 Centre de Recherche en Économie et Statistique, 3 Avenue Pierre Larousse, 92240 Malakoff, France
2 CEntre de REcherches en MAthématiques de la DEcision, Université Paris-Dauphine, Place du Maréchal de Lattre de Tassigny, 75775 Paris, France
3 École Nationale de la Statistique et de l'Administration, 3 Avenue Pierre Larousse, 92240 Malakoff, France

A central aspect of adaptive algorithms is their need for a parametric family of auxiliary distributions which should have the following three properties: (a) the family is sufficiently flexible to guarantee a reasonable performance in the context of the specific algorithm; (b) it allows independent samples to be drawn quickly; (c) it can, with reasonable effort, be calibrated using past simulations. For problems in continuous sampling spaces, the typical example is the multivariate normal distribution, which clearly fulfils (b) and (c), and complies with (a) in many practical problems. In this paper, we propose an analogue for high-dimensional binary sampling spaces.

1.1 Adaptive Monte Carlo on multivariate binary spaces

Our objective is to construct a parametric family for Sequential Monte Carlo on the binary sampling space B^d = {0, 1}^d, where d is too large to allow for exhaustive enumeration of the whole space B^d. Since there is no multivariate binary family which we can easily parametrise by its first and second order moments, like the multivariate normal, the construction of suitable proposal distributions seems more difficult for the discrete adaptive sampling problem than for its continuous counterpart.

The major application for our algorithm is variable selection in linear regression models.
In this context, a binary vector γ ∈ B^d encodes whether each of d possible covariates is included in the linear regression model or not. In a Bayesian framework, and for a judicious choice of prior distributions, we can explicitly calculate the posterior distribution π up to a constant. We want to sample from this distribution in order to approximate quantities like the expected value E_π(γ), that is, the marginal probability of inclusion of each variable. Often, the marginal probabilities provide a richer picture of the posterior distribution than a collection of modes found using stochastic optimisation techniques.

1.2 Global versus local methods

Our Sequential Monte Carlo approach to variable selection views a well-studied problem from a different angle and provides new perspectives. The reason is two-fold.

Firstly, there is growing evidence that global methods, which track a population of particles initially well spread over the sampling space B^d, are often more robust than local methods based on Markov chain Monte Carlo. The latter are more prone to get trapped in the neighbourhood of local modes. We illustrate this effect at length in our simulations in Section 6.

Secondly, global methods have the property of being easily parallelisable. Parallel implementations of Monte Carlo algorithms have gained tremendous interest in recent years (Lee et al., 2010; Suchard et al., 2010), due to the increasing availability of multi-core (central or graphical) processing units in standard computers.

1.3 Plan and notations

The paper is organised as follows.

In Section 2, we recapitulate the basics of Bayesian variable selection in linear regression models as the motivating application.
In Section 3, we briefly review the principal Markov chain Monte Carlo methods which are commonly used to integrate with respect to a binary distribution.

In Section 4, we describe an alternative approach to the same problem using Sequential Monte Carlo methods. The key ingredient of this algorithm is a parametric family which is flexible enough to come close to the target distribution.

In Section 5, we extensively discuss approaches for constructing rich parametric families on binary spaces. This is the core of our work. Some of the binary models discussed are not suitable in the framework of our Sequential Monte Carlo algorithm but are mentioned for completeness of the survey.

In Section 6, we construct two examples of variable selection problems which yield challenging posterior distributions. We show that standard Markov chain techniques fail to produce reliable estimates of the marginal probabilities, while our Sequential Monte Carlo approach successfully copes with the integration problem.

Notation
For a vector x ∈ X^d, we write x_M for the sub-vector indexed by M ⊆ {1, . . . , d}. We write x_{i:j} if the indices are a complete sequence i, . . . , j. We denote by x_{-i} the sub-vector x_{{1,...,d}\{i}}. We write |x| for \(\sum_{k=1}^d x_k\). For a matrix A, we denote its components by a_{ij} and its determinant by |A|. The operator diag[x] transforms the vector x into a diagonal matrix. For a finite set M, we denote by #M the number of elements of M.

2 Variable selection: A binary sampling problem

The standard linear normal model postulates that the relationship between the observed explained variable y ∈ R^m and the observations Z = [z_1, . . . , z_d] ∈ R^{m×d} is

\[ y \mid \beta, \gamma, \sigma^2, Z \sim \mathcal{N}\left( Z\, \mathrm{diag}[\gamma]\, \beta,\; \sigma^2 I_m \right). \]

Here, β is a vector of regression coefficients and σ² is the variance of y.
We denote by I_m the identity matrix and assume the first column Z_{·,1} to be constant. The parameter γ ∈ B^d = {0, 1}^d determines which covariates are included in or dropped from the linear regression model. In total, we can construct 2^d different linear normal models from the data. We assign a prior distribution π(β, σ², γ | Z) to the parameters. From the posterior distribution

\[ \pi(\beta, \sigma^2, \gamma \mid y, Z) \propto \pi(y \mid \beta, \sigma^2, \gamma, Z)\, \pi(\beta, \sigma^2, \gamma \mid Z) \]

we may compute the posterior probability of each model

\[ \pi(\gamma \mid y, Z) = \int \pi(\beta, \sigma^2, \gamma \mid y, Z)\, d(\beta, \sigma^2) \tag{1} \]

by integrating out the parameters β and σ².

Hierarchical Bayesian model
In a purely Bayesian context, we obtain, up to a constant, an explicit formula for the integral in (1) by choosing conjugate hierarchical priors, that is a normal π(β | σ², γ, Z) and an inverse-gamma π(σ² | γ, Z). For all Bayesian posterior distributions in this paper, we use the prior distributions

\[ \pi(\beta \mid \sigma^2, \gamma, Z) = \mathcal{N}\left(0, \sigma^2 v^2\, \mathrm{diag}[\gamma]\right), \quad \sigma^2 > 0, \]
\[ \pi(\sigma^2 \mid \gamma, Z) = \mathcal{I}(w/2,\; \lambda w/2), \quad w > 0,\; \lambda > 0, \]
\[ \pi(\gamma \mid Z) = \mathcal{U}(\mathbb{B}^d), \]

where I denotes an inverse-gamma and U a uniform law. For our numerical examples in Section 6, we assume that we have no prior information about the data. We follow the recommendations of George and McCulloch (1997) and choose the hyper-parameters

\[ w = 4.0, \quad \lambda = \hat{\sigma}_1^2, \quad v^2 = 10.0/\lambda, \tag{2} \]

where \(\hat{\sigma}_1^2\) is the least squares estimate of σ² based on the saturated model. The rationale behind this choice is to have a flat prior on β and to provide σ² with sufficient mass on the interval \((\hat{\sigma}_1^2, \hat{\sigma}_0^2)\), where \(\hat{\sigma}_0^2\) denotes the variance of y.

Next, we quickly state the form of the log-posterior mass function. We write Z_γ for Z diag[γ] without zero columns. Let b_γ = Z_γ^⊺ y and let

\[ C_{\gamma,v} C_{\gamma,v}^\intercal = Z_\gamma^\intercal Z_\gamma + v^{-2} I_{|\gamma|} \tag{3} \]

be a Cholesky decomposition.
We denote the least squares estimate of σ² based on v and the model γ by

\[ \hat{\sigma}^2_{\gamma,v} = \frac{1}{m} \left( y^\intercal y - (C_{\gamma,v}^{-1} b_\gamma)^\intercal (C_{\gamma,v}^{-1} b_\gamma) \right). \]

We find the log-posterior probability to be

\[ \log \pi(\gamma \mid y, Z) = \mu - \sum_{i=1}^{|\gamma|} \log c^{(\gamma,v)}_{i,i} - |\gamma| \log(v) - \frac{w+m}{2} \log\left( w\lambda/m + \hat{\sigma}^2_{\gamma,v} \right), \]

where μ is an unknown normalisation constant.

Related approaches
In a frequentist framework, we choose a model which minimises some specified criterion. A popular one is Schwarz's Criterion (Schwarz, 1978; also called Bayesian Information Criterion), which basically is a second-degree Laplace approximation of (1):

\[ \log \pi(\gamma \mid y, Z) \approx \mu - \frac{|\gamma|}{2} \log(m) - \frac{m}{2} \log(\hat{\sigma}^2_\gamma), \]

where \(\hat{\sigma}^2_\gamma = \lim_{v \to \infty} \hat{\sigma}^2_{\gamma,v}\) is the maximum likelihood estimator of σ² based on the model γ. Note that for a large sample size m the hierarchical Bayesian approach and the Bayesian Information Criterion coincide.

Alternative approaches
The posterior of a Bayesian linear regression variable selection problem has, in general, no particular structure we can exploit to speed up optimisation or integration with respect to π. Therefore, alternative approaches such as the Least Absolute Shrinkage and Selection Operator (Tibshirani, 1996) have been proposed, which draw from the theory of convex optimisation and allow for computation of larger problems. While a comparison between competing approaches to variable selection is beyond the scope of this paper, we remark that more sophisticated, parallelisable algorithms are essential for making Bayesian modelling feasible in the context of high-dimensional problems, where alternative methods are often used for practical reasons only.
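As an illustration, the log-posterior formula above can be evaluated through the Cholesky factorisation (3). The following Python sketch (function and variable names are our own, not from the paper) computes log π(γ | y, Z) up to the constant μ:

```python
import numpy as np

def log_posterior(gamma, y, Z, v2, w, lam):
    """Log-posterior log pi(gamma | y, Z) up to an additive constant,
    computed through the Cholesky factor C C' = Z'Z + v^{-2} I."""
    m = len(y)
    Zg = Z[:, gamma.astype(bool)]          # columns included in the model
    k = Zg.shape[1]
    if k == 0:
        s2 = y @ y / m                     # estimate for the empty model
        return -0.5 * (w + m) * np.log(w * lam / m + s2)
    C = np.linalg.cholesky(Zg.T @ Zg + np.eye(k) / v2)
    b = Zg.T @ y                           # b_gamma = Z_gamma' y
    u = np.linalg.solve(C, b)              # u = C^{-1} b_gamma
    s2 = (y @ y - u @ u) / m               # least squares estimate of sigma^2
    return (-np.log(np.diag(C)).sum()      # - sum_i log c_ii
            - 0.5 * k * np.log(v2)         # - |gamma| log(v)
            - 0.5 * (w + m) * np.log(w * lam / m + s2))
```

Since the Cholesky factor is triangular, the sum of the logarithms of its diagonal equals half the log-determinant of Z_γ^⊺ Z_γ + v⁻² I, which makes the formula cheap to evaluate repeatedly.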
3 Markov chain Monte Carlo on binary spaces

Markov chain Monte Carlo is a well-studied approach to approximating the expected value of a posterior π given by a Bayesian model choice problem (George and McCulloch, 1997). In this section, we briefly review the standard methods we are going to compare our Sequential Monte Carlo approach against. There are more advanced Markov chain Monte Carlo algorithms that use parallel tempering ideas combined with more elaborate local moves (see e.g. Liang and Wong, 2000; Bottolo and Richardson, 2010), but a thorough comparison is beyond the scope of this paper. For background on Markov chain Monte Carlo methods, we refer to the standard literature (see e.g. Robert and Casella, 2004, chaps. 7-12).

3.1 Framework

The idea is to construct a transition kernel κ, typically some version of a Metropolis-Hastings kernel, which admits π as its unique stationary distribution. Then the distribution of the Markov chain x_{t+1} ∼ κ(x_t, ·), started at some randomly chosen point x_0 ∈ B^d, converges to π. We obtain an estimate

\[ \mathbb{E}_\pi(\gamma) \approx n^{-1} \sum_{t=b}^{n+b} x_t \]

for the expected value via the ergodic theorems for Markov chains. The first b states are usually discarded to give the chain some time to converge towards the stationary distribution before we start to average. For the estimate to be valid, we need to ensure that at time b the chain is close to its stationary distribution π, and that at time n + b we have sampled an ergodic trajectory such that the ergodic theorems apply.

Classic Markov chain methods on binary spaces work locally, that is, they propose moves to neighbouring models in the Metropolis-Hastings steps. A neighbouring model is a copy of the current model where just a few components are altered.
We shall see that these kinds of transition kernels often fail to sample ergodic trajectories within a reasonable amount of time if the stationary distribution π is strongly multi-modal.

Algorithm
We loop over a uniformly drawn subset of components I ∼ U({M ⊆ {1, . . . , d} | #M = k}) and propose to change the components i ∈ I. The number of components k might be fixed or drawn from some distribution G on the index set {1, . . . , d}. Precisely, we take a copy y of the current state x_t and replace y_i by Y_i ∼ B_{p_i(x)} for all i ∈ I, where

\[ \mathcal{B}_{p_i(x)}(\gamma) = p_i(x)^{\gamma_i} (1 - p_i(x))^{1 - \gamma_i} \]

is a Bernoulli distribution with parameter p_i(x) ∈ (0, 1). We set x_{t+1} = y with probability

\[ \frac{\pi(y)}{\pi(x_t)} \frac{\prod_{i \in I} \mathcal{B}_{p_i(y)}(x_t)}{\prod_{i \in I} \mathcal{B}_{p_i(x_t)}(y)} \wedge 1, \tag{4} \]

and x_{t+1} = x_t otherwise. This framework, summarised in Algorithm 1, yields a Markov chain with unique invariant distribution π for any fixed p ∈ (0, 1)^d. The interesting special cases, however, use a p(x) which depends on the current state of the chain.

Algorithm 1 Generic metropolised Gibbs kernel
Input: x ∈ B^d
  U ∼ U([0, 1]), k ∼ G_{k*}
  I ∼ U({M ⊆ {1, . . . , d} | #M = k})
  y ← x
  for i ∈ I do
    y_i ∼ B_{p_i(x)}
  if [π(y) Π_{i∈I} B_{p_i(y)}(x)] / [π(x) Π_{i∈I} B_{p_i(x)}(y)] > U then x ← y
return x

Performance
We refer to the ratio (4) as the acceptance probability of the Metropolis-Hastings step. In binary spaces, however, accepting a proposal does not imply that we are changing the state of the chain, since we are likely to re-propose the current state y = x_t. We are actually interested in how fast the chain explores the state space, precisely in its mutation probability P(x_{t+1} ≠ x_t).

3.2 Standard Markov chain methods

For this section, let k = 1 be constant. Algorithm 1 then collapses to changing a single component.
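The generic kernel of Algorithm 1 might be sketched in Python as follows; the function signature and the logarithmic computation of ratio (4) are our own choices, not the paper's:

```python
import numpy as np

def metropolised_gibbs_step(x, log_pi, p, k, rng):
    """One transition of the generic metropolised Gibbs kernel (Algorithm 1).

    x       -- current state, a 0/1 numpy array of length d
    log_pi  -- function returning log pi(gamma) up to a constant
    p       -- function x -> vector of proposal parameters p_i(x) in (0, 1)
    k       -- number of components to update
    """
    d = len(x)
    I = rng.choice(d, size=k, replace=False)    # uniform subset of size k
    y = x.copy()
    px = p(x)
    y[I] = (rng.random(k) < px[I]).astype(x.dtype)   # y_i ~ Bernoulli(p_i(x))
    py = p(y)

    def log_bern(q, z):
        # sum over i in I of log Bernoulli_{q_i}(z_i)
        return np.sum(z[I] * np.log(q[I]) + (1 - z[I]) * np.log(1 - q[I]))

    # log of the Metropolis-Hastings ratio (4)
    log_ratio = log_pi(y) - log_pi(x) + log_bern(py, x) - log_bern(px, y)
    if np.log(rng.random()) < min(log_ratio, 0.0):
        return y
    return x
```

With a constant p(x) ≡ 1/2 the proposal is symmetric and the ratio reduces to π(y)/π(x); the special cases discussed next correspond to other choices of p.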
Instead of independently drawing the index i ∼ U({1, . . . , d}), we could also iterate i through a uniformly drawn permutation σ({1, . . . , d}) of the index set. Kernels of this kind are often referred to as metropolised Gibbs samplers, since they proceed component-wise as does the classical Gibbs sampler, but also involve a Metropolis-Hastings step. In the sequel, we discuss some special cases.

Classical Gibbs
The Gibbs sampler sequentially draws each component from the full conditional distribution, which corresponds to

\[ p_i(x) := \pi(\gamma_i = 1 \mid \gamma_{-i} = x_{-i}) = \frac{\pi(\gamma_i = 1, \gamma_{-i} = x_{-i})}{\pi(\gamma_i = 1, \gamma_{-i} = x_{-i}) + \pi(\gamma_i = 0, \gamma_{-i} = x_{-i})}. \]

By construction, the acceptance probability is 1, while the mutation probability is only π(y)/(π(x_t) + π(y)), where y is a copy of the current state x_t with component i altered.

Adaptive metropolised Gibbs
An adaptive extension of the metropolised Gibbs sampler has been proposed by Nott and Kohn (2005). The full conditional distribution π(γ_i = 1 | γ_{-i} = x_{-i}) is approximated by a linear predictor. In their notation,

\[ p_i(x) := \left( \frac{\psi_i - W_{-i} x_{-i}}{w_{i,i}} \vee \delta \right) \wedge (1 - \delta), \]

where ψ is the estimated mean, W^{-1} the estimated covariance matrix and δ ∈ (0, 1/2) a design parameter which ensures that p_i(x) is a probability. Analogously to our vector notation, W_{-i} denotes the matrix W without the i-th row and column. We obtain the estimates from the past trajectory of the chain x_b, . . . , x_{t-1} and update them periodically. The mutation probability is of the same order as that of the Gibbs kernel, but adaption largely avoids the computationally expensive evaluations of π. The non-adaptive Gibbs sampler already requires the evaluation of π(y) just to produce the proposal y.
In contrast, the adaptive metropolised Gibbs sampler samples from a proxy and only evaluates π(y) if y ≠ x_t.

Modified metropolised Gibbs
In comparison to the classical Gibbs kernel, we obtain a more efficient chain (Liu, 1996) using the simple form

\[ p_i(x) := 1 - x_i. \]

Since we always propose to change the current state, the acceptance and mutation probabilities are the same. They amount to π(y)/π(x) ∧ 1, where y is a copy of the current state x with component i altered. Comparing the mutation probabilities of the two kernels, we see that the modified metropolised Gibbs chain moves, on average, faster than the classical Gibbs chain.

3.3 Block updating

The modified metropolised Gibbs sampler easily generalises to the case where k may take values larger than one. Suppose, for example, we propose to change

\[ k \sim G_{k^*}(k) \propto (1 - 1/k^*)^{k-1} \, k^{*-1} \, \mathbb{1}_{\{1,\dots,d\}}(k) \]

components simultaneously, where G_{k*} is a truncated geometric distribution. Note that we propose, on average, to change approximately k* components. In other words, for larger values of k*, we are more likely to propose larger steps in the sampling space. Large-step proposals improve the mixing properties of the chain and help to escape from the attraction of local modes. They are, however, less likely to be accepted than single-component steps, which leads to a problem-dependent trade-off. In our numerical examples, we could not observe any benefit from block updating, and we do not consider it further, in order to keep the comparison with our Sequential Monte Carlo method more concise.

3.4 Independent proposals

We can construct a fast-mixing Markov chain based on independent proposals. Let q be some distribution with π ≪ q, that is, q(γ) = 0 ⇒ π(γ) = 0 for all γ ∈ B^d. We propose a new state y ∼ q and accept it with probability

\[ \frac{\pi(y)}{\pi(x_t)} \frac{q(x_t)}{q(y)} \wedge 1. \]
(5)

The associated Markov chain has the unique invariant measure π. This kernel is referred to as the independent Metropolis-Hastings kernel, since the proposal distribution is not a function of the current state x_t. The mutation rate is the acceptance rate minus q(x_t), so the two notions practically coincide in large spaces.

Obviously, in order to make this approach work, we need to choose q sufficiently close to π, which implies high acceptance rates on average. In the absence of reliable prior information, however, we are not able to produce such a distribution q. We shall, however, use precisely this Markov kernel as part of our Sequential Monte Carlo algorithm. In that context, we can calibrate a sequence q_t of proposal distributions to be close to our current particle approximation.

4 Sequential Monte Carlo on binary spaces

In this section, we show how to estimate the expected value with respect to a probability mass function π(γ) defined on B^d using Sequential Monte Carlo (Del Moral et al., 2006). This general class of algorithms alternates importance sampling steps, resampling steps and Markov chain transitions to recursively approximate a sequence of distributions, using a set of weighted 'particles' which represent the current distribution. In the following, we present a version which is tailored to work on binary spaces.

For readers not familiar with Sequential Monte Carlo, the algorithm described in the following might seem rather complex at first glance. We therefore introduce the steps separately before we look at the complete algorithm. We give comprehensive instructions which correspond exactly to our implementation, in order to make our results plausible and easily reproducible for the reader.
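The independent Metropolis-Hastings move of Section 3.4, which serves as the transition kernel in the move step below, can be sketched as follows (a minimal Python illustration; the function names are ours):

```python
import numpy as np

def independent_mh_step(x, log_pi, log_q, sample_q, rng):
    """One independent Metropolis-Hastings transition: propose y ~ q and
    accept with probability (pi(y)/pi(x)) * (q(x)/q(y)) ∧ 1, cf. (5)."""
    y = sample_q(rng)
    log_ratio = log_pi(y) - log_pi(x) + log_q(x) - log_q(y)
    if np.log(rng.random()) < min(log_ratio, 0.0):
        return y, True    # accepted
    return x, False
```

Note that the proposal does not depend on x, so the whole particle population can be moved with a single batch of draws from q; the sketch keeps the one-state form for clarity.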
4.1 Building a sequence of distributions

The first ingredient of Sequential Monte Carlo is a smooth sequence of distributions (π_t)_{t=0}^τ which ends up at the distribution of interest, π_τ = π. The intermediary distributions π_t are purely instrumental: the idea is to depart from a distribution π_0 with broad support and to progress smoothly towards the distribution of interest π.

Initial distribution
Theoretically, we can use any π_0 with π ≪ π_0 that we can sample from as initial distribution. Numerical experiments taught us, however, that premature adjustment of π_0, for example using Markov chain pilot runs, leads to faster but less robust algorithms. Thus, in practice, we recommend the uniform distribution for its simplicity and reliability. Therefore, in the sequel, we let π_0 = U(B^d).

Intermediate distributions
We construct a smooth sequence of distributions by judicious choice of an associated real sequence (ε_t)_{t=0}^τ increasing from zero to one. The most convenient and somewhat natural strategy is the geometric bridge (Gelman and Meng, 1998; Neal, 2001; Del Moral et al., 2006)

\[ \pi_t(\gamma) \propto \pi_0(\gamma)^{1-\varepsilon_t} \pi(\gamma)^{\varepsilon_t} \propto \pi(\gamma)^{\varepsilon_t}. \tag{6} \]

Alternatively, one could use a sequence of mixtures

\[ \pi_t(\gamma) \propto (1 - \varepsilon_t)\, \pi_0(\gamma) + \varepsilon_t\, \pi(\gamma) \]

or, in a Bayesian context, a sequence of posterior distributions where data is added as ε_t increases, that is π_t(γ) = π(γ | z_1, . . . , z_{⌊ε_t m⌋}); see Chopin (2002). In the following, we use the geometric bridge (6) for its computational simplicity and present a procedure to determine a suitable sequence (ε_t)_{t=0}^τ.

4.2 Assigning importance weights

Suppose we have already produced a sample x_1^{(t-1)}, . . . , x_n^{(t-1)} of size n from π_{t-1}.
We can roughly approximate π_t by the empirical distribution

\[ \pi_t(\gamma) \approx \sum_{k=1}^n w_t(x_k^{(t-1)})\, \delta_{x_k^{(t-1)}}(\gamma), \tag{7} \]

where the corresponding importance function w_t is

\[ w_t(x_k) := \frac{u_t(x_k)}{\sum_{i=1}^n u_t(x_i)}, \qquad u_t(x) := \frac{\pi_t(x)}{\pi_{t-1}(x)} = \pi^{\alpha_t}(x). \tag{8} \]

Note that α_t = ε_t − ε_{t-1} is the step length at time t. As we choose α_t larger, that is, π_t further from π_{t-1}, the weights become more uneven and the accuracy of the importance approximation deteriorates.

Procedure 1 Importance weights
Input: α, π, X = (x_1, . . . , x_n)^⊺
  u_k ← π^α(x_k) for all k = 1, . . . , n
  w_k ← u_k / (Σ_{i=1}^n u_i) for all k = 1, . . . , n
return w = (w_1, . . . , w_n)

If we repeated the weighting steps until we reached π_τ = π, we would obtain a classical importance sampling estimate with instrumental distribution π_0. The idea of the Sequential Monte Carlo algorithm, however, is to control the weight degeneracy such that we can intersperse resample and move steps before losing track of our particle approximation.

Effective sample size
We measure the weight degeneracy through the effective sample size criterion, see Kong et al. (1994). In our case, we have

\[ \eta(\alpha, X) := \frac{\left( \sum_{k=1}^n w_\alpha(x_k) \right)^2}{n \sum_{k=1}^n w_\alpha(x_k)^2} = \frac{\left( \sum_{k=1}^n \pi^\alpha(x_k) \right)^2}{n \sum_{k=1}^n \pi^\alpha(x_k)^2} \in [1/n, 1]. \]

The effective sample size is 1 if all weights are equal and 1/n if all mass is concentrated in a single particle. For a geometric bridge (6), the effective sample size is merely a function of α. We can thus control the weight degeneracy by judicious choice of the step lengths α_t.

4.3 Finding the step length

We pick a step length α such that the effective sample size η(α, X) equals a fixed value η*, see Jasra et al. (2008). Since η is continuous and monotonically decreasing in α, we solve

\[ \eta(\alpha, X) = \eta^* \tag{9} \]

using bisectional search, see Procedure 2.
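The weighting step (Procedure 1) and the effective sample size can be sketched in Python; working with logarithms of π for numerical stability is our own addition, not part of the paper's pseudocode:

```python
import numpy as np

def importance_weights(alpha, log_pi_values):
    """Procedure 1: self-normalised weights with u_k = pi^alpha(x_k),
    computed in log-space to avoid overflow (our own stabilisation)."""
    lu = alpha * np.asarray(log_pi_values)
    lu -= lu.max()                     # shift before exponentiating
    u = np.exp(lu)
    return u / u.sum()

def effective_sample_size(alpha, log_pi_values):
    """eta(alpha, X) in [1/n, 1]: 1 for equal weights, 1/n for a point mass."""
    w = importance_weights(alpha, log_pi_values)
    n = len(w)
    return (w.sum() ** 2) / (n * (w ** 2).sum())
```

Because the particles are fixed while α varies, solving η(α, X) = η* only requires re-exponentiating the stored values of log π(x_k), which keeps the bisectional search cheap.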
This approach is numerically more stable than a Newton-Raphson iteration, since the derivative ∂η(α, X)/∂α involves fractions of sums of exponentials which are difficult to handle.

Let α* be the unique solution to (9). We construct the associated sequence by setting ε_t = 1 ∧ (ε_{t-1} + α*). Thus, the number of steps τ depends on the complexity of the integration problem at hand and is not known in advance. In other words, for fixed η*, the associated sequence (ε_t)_{t=0}^τ is a self-tuning parameter. In our simulations, we always choose η* = 0.9, which yields convincing results on both example problems in Section 6. Smaller values significantly speed up the Sequential Monte Carlo algorithm but lead to a higher variation in the results.

Procedure 2 Find step length
Input: ρ, X = (x_1, . . . , x_n)^⊺
  l ← 0, u ← 1.05 − ρ, α ← 0.05
  repeat
    if η(α, X) < η* then u ← α, α ← (α + l)/2
    else l ← α, α ← (α + u)/2
  until |u − l| < ε or l > 1 − ρ
return α ∧ (1 − ρ)

4.4 Resampling the system

Suppose we have a sample X^{(t-1)} = (x_1^{(t-1)}, . . . , x_n^{(t-1)}) of size n from π_{t-1} with importance weights as defined in (8). We can obtain a sample X̂^{(t)} = (x̂_1^{(t)}, . . . , x̂_n^{(t)}) which is approximately distributed according to π_t by drawing from the empirical approximation defined in (7).

Procedure 3 Resample (systematic)
Input: w = (w_1, . . . , w_n), X = (x_1, . . . , x_n)^⊺
  v ← n w, j ← 1, c ← v_1
  sample u ∼ U([0, 1])
  for k = 1, . . . , n do
    while c < u do
      j ← j + 1, c ← c + v_j
    end while
    x̂_k ← x_j, u ← u + 1
  end for
return X̂ = (x̂_1, . . . , x̂_n)^⊺

For the implementation of the resampling step, several recipes exist. We could apply multinomial resampling (Gordon et al., 1993), which is straightforward.
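Procedure 3 above translates almost line by line into Python; a sketch under the assumption that the weights sum to one:

```python
import numpy as np

def systematic_resample(w, X, rng):
    """Procedure 3: systematic resampling.  w sums to one; X is an
    (n, d) array of particles.  Returns n resampled rows of X."""
    n = len(w)
    v = n * np.asarray(w)          # expected replication counts
    out = np.empty_like(X)
    u = rng.random()               # a single uniform draw u ~ U([0, 1])
    j, c = 0, v[0]
    for k in range(n):
        while c < u:               # advance to the particle covering u
            j += 1
            c += v[j]
        out[k] = X[j]
        u += 1.0                   # deterministic stride of 1
    return out
```

In contrast to multinomial resampling, a single uniform variate drives all n selections, so a particle with weight w_k is replicated either ⌊n w_k⌋ or ⌈n w_k⌉ times.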
There are, however, more efficient schemes such as residual (Liu and Chen, 1998), stratified (Kitagawa, 1996) and systematic resampling (Carpenter et al., 1999). We use the latter in our simulations, see Procedure 3.

In the resulting unweighted particle system X̂^{(t)} of size n, the particles with small weights have vanished while the particles with large weights have been multiplied. There are approaches that resample a weighted particle system of size n from an augmented system of size m > n, see Fearnhead and Clifford (2003), but these techniques are computationally more demanding without visibly improving our numerical results. Theoretically, one would expect a Rao-Blackwellisation effect, but its analysis is beyond the scope of this paper.

In any case, if we repeat the weighting and resampling steps several times, we rapidly deplete our particle reservoir, reducing the number of distinct particles to very few. Thus, the particle approximation becomes totally inaccurate. The key to fighting the decay of our approximation is the following move step.

4.5 Moving the system

The resampling step provides an unweighted particle system X̂^{(t)} = (x̂_1^{(t)}, . . . , x̂_n^{(t)}) of π_t containing multiple copies of many particles. The central idea of the Sequential Monte Carlo algorithm is to diversify the resampled system, replacing the particles by draws from a Markov kernel κ_t with invariant measure π_t (Gilks and Berzuini, 2001).

Since the particle x_k^{(0)} is, approximately, distributed according to π_t, a draw x_k^{(1)} ∼ κ_t(x_k^{(0)}, ·) is again, approximately, distributed according to π_t. We can repeat this procedure over and over without changing the target of the particle approximation. Note that, even if the particles x_k^{(0)} = · · · = x_m^{(0)} are equal after resampling, the particles x_k^{(s)}, . . .
, x_m^{(s)} are almost independent after sufficiently many move steps.

In order to make the algorithm practical, however, we need a transition kernel which is rapidly mixing and therefore diversifies the particle system within a few steps. The locally operating Markov kernels reviewed in Section 3 are hence not suitable. In fact, our numerical experiments suggest that making a Sequential Monte Carlo algorithm work with local kernels is practically impossible.

Therefore, we use a Metropolis-Hastings kernel with independent proposals as described in Section 3.4. Precisely, we construct a kernel κ_t employing a parametric family q_θ on B^d which, for some θ, is sufficiently close to π_t to allow for high acceptance probabilities. For this purpose, we fit a parameter θ_t to the particle approximation (w_t, X_t) of π_t according to some convenient criterion. The choice of the parametric family q_θ is crucial to a successful implementation of the Sequential Monte Carlo algorithm. We come back to this issue in Section 5.

Particle diversity
We need to determine how often we move the particle system before we return to the weight-resample step. An easy criterion for the health of the particle approximation X = (x_1, . . . , x_n) is its particle diversity

\[ \zeta(X) := \frac{\#\{x_k \mid k = 1, . . . , n\}}{n} \in [1/n, 1], \tag{10} \]

that is, the proportion of distinct particles. Note that the particle diversity is a quality criterion which has no simple analogue in continuous sampling spaces.

For optimal results, we recommend to keep moving the particle system until the particle diversity cannot be augmented any longer. In the first steps of the algorithm, π_t is still close to the uniform distribution, and we manage to raise the particle diversity up to one. As π_t approaches a strongly multi-modal target distribution π, however, the particle diversity reaches a steady state we cannot push it beyond. Clearly, even if we could draw a particle system independently from π, the particle diversity would be a lot smaller than one, since we would draw the modes of π several times.

Procedure 4 Move
Input: X^{(0)} = (x_1^{(0)}, . . . , x_n^{(0)}) targeting π_t, and κ(y, γ) such that π_t(γ) = Σ_{y ∈ B^d} π_t(y) κ(y, γ)
  s ← 1
  repeat
    sample x_k^{(s)} ∼ κ(x_k^{(s-1)}, ·) for all k = 1, . . . , n
  until |ζ(X^{(s)}) − ζ(X^{(s-1)})| < 0.02 or ζ(X^{(s)}) > 0.95
return X^{(s)} = (x_1^{(s)}, . . . , x_n^{(s)})^⊺

Aggregated weights
Shifting weights between identical particles does not affect the nature of the approximation, but it changes the effective sample size η(w), which seems paradoxical at first sight. For reasons of parsimony, we could just keep a single representative x* for identical particles x*_1 = · · · = x*_k and aggregate the associated weights into the sum w* = w*_1 + · · · + w*_k without changing the quality of the particle approximation. There are, however, three reasons why we refrain from doing so.

Firstly, it is vital not to confuse the weight disparity induced by reweighting according to the progression of π_t with the weight disparity due to aggregation of the weights of multiply sampled states. From the aggregated system, we cannot tell whether the effective sample size is determined by the gap between π_t and π_{t+1}, that is the step length α, or by the presence of particle copies due to the mass of π_t being very concentrated. Therefore, it seems more difficult to control the smoothness of the sequence of distributions and to find a suitable sequence (ε_t)_{t=0}^τ.
Secondly, aggregation is an additional computational effort equivalent to keeping the particle system sorted. Here, we would trade computing time for memory, while the required memory is proportional to the number of particles and not critical in the context of our algorithm. Thirdly, the straightforward way to implement repeated move steps is to break the particles up into multiple copies corresponding to their weights and to move them separately. Consequently, instead of permanently splitting and aggregating the weights, we may just as well allow for multiple copies of the particles.

4.6 The Resample-move algorithm

Finally, we summarize the complete Sequential Monte Carlo method in Algorithm 2. Note that, in practice, the sequence π_t = π_{ρ_t} is not indexed by t but rather by ρ_t, that is, the counter t is only given implicitly.

Algorithm 2  Resample-move
Input: π : B^d → [0, ∞)
sample x_k iid ∼ U(B^d) for all k = 1, ..., n
α ← find step length(0, X)             (Procedure 2)
w ← importance weights(α, π, X)        (Procedure 1)
ρ ← α
while ρ < 1 do
    q_θ ← fit binary model(w, X)       (Section 5)
    X̂ ← resample(w, X)                (Procedure 3)
    X ← move(κ_{π_ρ, q_θ}, X̂)         (Procedure 4)
    α ← find step length(ρ, X)         (Procedure 2)
    w ← importance weights(α, π, X)    (Procedure 1)
    ρ ← ρ + α
end while
return Σ_{k=1}^n w_k x_k ≈ E_π(γ)

For an efficient implementation, we recommend storing the values π(x_1), ..., π(x_n) and q_θ(x_1), ..., q_θ(x_n) to avoid unnecessary evaluations. When updating the latter set, we can exploit the fact that, in a systematically resampled particle system, multiple copies of the same particle are neighbours.

5 Multivariate binary models

In this section, we address the choice of a multivariate binary parametric family q_θ with parameter θ ∈ Θ needed to construct the independent Metropolis-Hastings kernel used in Procedure 4.
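To fix ideas, the control flow of Algorithm 2 can be sketched at toy scale. The sketch below simplifies every subroutine: a fixed step length replaces Procedure 2, multinomial resampling replaces systematic resampling (Procedure 3), and a simple product model (Section 5.3) replaces the logistic conditionals model, so it illustrates only the weight-resample-move structure, not the full method:

```python
import numpy as np

def resample_move(log_pi, d, n=1000, n_moves=2, delta=0.1, seed=0):
    """Toy sketch of Algorithm 2 on B^d, tempering pi^rho from rho=0 to 1."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(n, d))             # iid uniform on B^d
    log_pi_X = np.array([log_pi(x) for x in X])
    rho = 0.0
    while rho < 1.0:
        alpha = min(delta, 1.0 - rho)               # fixed step, stand-in for Procedure 2
        logw = alpha * log_pi_X                     # bridge pi^rho -> pi^(rho+alpha)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        rho += alpha
        p = np.clip(w @ X, 0.01, 0.99)              # fit product model, stand-in for q_theta
        idx = rng.choice(n, size=n, p=w)            # multinomial resampling
        X, log_pi_X = X[idx], log_pi_X[idx]
        def logq(Z):                                # log-density of the product proposal
            return Z @ np.log(p) + (1 - Z) @ np.log(1 - p)
        for _ in range(n_moves):                    # independent MH moves, Procedure 4
            Y = (rng.random((n, d)) < p).astype(X.dtype)
            log_pi_Y = np.array([log_pi(y) for y in Y])
            log_ratio = rho * (log_pi_Y - log_pi_X) + logq(X) - logq(Y)
            acc = np.log(rng.random(n)) < log_ratio
            X[acc], log_pi_X[acc] = Y[acc], log_pi_Y[acc]
    return X.mean(axis=0)                           # estimate of E_pi(gamma)
```

On an easy target with nearly independent components, this toy version already recovers the marginal probabilities; the point of Section 5 is that the product proposal breaks down once the target exhibits strong dependencies.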
5.1 Desired properties

We first frame the properties that make a parametric family suitable for our Sequential Monte Carlo algorithm.

(a) For reasons of parsimony, we want to construct a family of distributions with at most dim(θ) ≤ d(d+1)/2 parameters. More complex families are usually computationally too expensive to handle.

(b) Given a sample X = (x_1, ..., x_n) from the target distribution π, we want to estimate θ* such that the binary model q_{θ*} is close to π. For instance, θ* might be a maximum likelihood or method of moments estimator.

(c) We want to generate independent samples from q_θ. If we can compute the conditional or marginal distributions, we can write q_θ as

    q_θ(γ) = q_θ(γ_1) ∏_{i=2}^d q_θ(γ_i | γ_{1:i−1})    (11)
           = q_θ(γ_1) ∏_{i=2}^d q_θ(γ_{1:i}) / q_θ(γ_{1:i−1}).

Using the chain rule decomposition (11), we can sample a random vector γ ∼ q_θ component-wise, conditioning on the entries we have already generated.

(d) We need to rapidly evaluate q_θ(γ) for any γ ∈ B^d in order to compute the Metropolis-Hastings ratio (5).

(e) Analogously to the multivariate normal, we want our calibrated binary model q_{θ*} to produce samples with the mean and covariance of π. If q_θ is not flexible enough to capture the dependence structure of π, the Metropolis-Hastings kernel in Procedure 4 cannot provide satisfactory acceptance rates for complex target distributions π.

In the following, we construct a suitable parametric family and explain how to deploy it in Algorithm 2. Most of the literature on binary data stems from response models, multi-way contingency tables and multivariate interaction theory (Cox, 1972). For completeness, we append a brief list of other binary models mentioned in the literature which fail, for various reasons, to work in Sequential Monte Carlo applications.
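Requirements (c) and (d) are both served by the chain rule (11): given any routine returning the conditional probabilities q_θ(γ_i = 1 | γ_{1:i−1}), a single pass yields a draw together with its probability. A generic sketch (with `cond_prob` a stand-in for whatever family is used):

```python
import random

def sample_chain_rule(cond_prob, d, rng=random):
    """Draw gamma ~ q_theta via the chain rule (11) and return its probability.
    cond_prob(i, prefix) must return q_theta(gamma_i = 1 | gamma_{1:i-1})."""
    gamma, prob = [], 1.0
    for i in range(d):
        p = cond_prob(i, gamma)              # conditional given the prefix so far
        bit = 1 if rng.random() < p else 0
        gamma.append(bit)
        prob *= p if bit else 1.0 - p        # accumulate q_theta(gamma)
    return gamma, prob
```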
Providing parametric families which meet the above requirements in high dimensions is a difficult task, and understanding the shortcomings of alternative approaches is an important part of the discussion.

5.2 The logistic conditionals model

In the previous paragraph, we already mentioned that a factorization (11) of the mass function q_θ(γ) into conditional distributions permits sampling from the parametric family. Unfortunately, for a complex d-dimensional binary model, we usually cannot calculate closed-form expressions for the conditional or marginal mass functions. We get around the computation of the marginal distributions of q_θ(γ) if we directly fit univariate models q_{b_i}(γ_i | γ_{1:i−1}) to the conditionals π(γ_i | γ_{1:i−1}) of the target function. Qaqish (2003) suggested the use of linear regressions to model the conditional probabilities. This approach, however, does not guarantee that the fitted model is a valid distribution, since the mass function might be negative.

Construction of the model   We propose to rather use logistic regressions for the conditional probabilities. Precisely, we adjust the univariate models

    ℓ(P_π(γ_i = 1 | γ_{1:i−1})) := b_{i,i} + Σ_{j=1}^{i−1} b_{i,j} γ_j,   i = 1, ..., d,

where ℓ(p) = log p − log(1 − p). In the context of our Sequential Monte Carlo application, we take the particle system X and regress y^(i) = X_i on the columns Z^(i) = (X_{1:i−1}, 1), where the column Z_i^(i) yields the intercept to complete the logistic model.

For a d-dimensional lower triangular matrix B, we define the logistic conditionals model as

    q_B(γ) := ∏_{i=1}^d B_{p(b_{i,i} + b_{i,1:i−1} γ^⊺_{1:i−1})}(γ_i),    (12)

where p(y) = ℓ^{−1}(y) = (1 + exp(−y))^{−1} is the logistic function. Recall that B_p(γ) = p^γ (1 − p)^{1−γ} denotes the univariate Bernoulli distribution with expected value p ∈ [0, 1].

There are d! possible logistic regression models, and we arbitrarily pick one, although there should be a parametrization which is optimal in the sense of nearness to the data Z. We observed, however, that permuting the components had, in practice, no impact on the quality of the approximation. Keep in mind that the number of observations in the logistic regressions is the size n of the particle system and typically very large. For instance, we run our numerical examples in Section 6 using n = 2 × 10^4 particles. Therefore, the fit of the logistic regressions is usually very good.

Sparse version   The major drawback of any kind of multiplicative model is the fact that we have no closed-form likelihood maximizers, and therefore the parameter estimation requires costly iterative fitting procedures. Hence, even before discussing the fitting procedure, we construct a sparse version of the logistic conditionals model which we can estimate faster than the saturated model.

Instead of fitting the saturated model q(γ_i | γ_{1:i−1}), we preferably work with a more parsimonious regression model like q(γ_i | γ_{L_i}) for some index set L_i ⊆ {1, ..., i−1}, where the number of predictors #L_i is typically smaller than i − 1. We solve this nested variable selection problem using some simple, fast to compute criterion.

Given a weighted particle system w ∈ [0, 1]^n, X ∈ B^{n×d}, we denote, for i, j ∈ {1, ..., d}, the weighted sample means by

    x̄_i = Σ_{k=1}^n w_k x_{k,i},   x̄_{i,j} = Σ_{k=1}^n w_k x_{k,i} x_{k,j},    (13)

and the weighted sample correlation by

    r_{i,j} = (x̄_{i,j} − x̄_i x̄_j) / √(x̄_i (1 − x̄_i) x̄_j (1 − x̄_j)).    (14)

For ε = 0.02, we define the index set

    I := {i ∈ {1, ..., d} | x̄_i ∉ (ε, 1 − ε)},

which identifies the components which have, according to the particle system, a marginal probability close to either boundary of the unit interval.
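As a sketch of these quantities, the following routine computes the weighted means and correlations (13)-(14) and the index set I from a weighted particle system; anticipating the next paragraph, it also builds correlation-thresholded predictor sets L_i for the remaining components (the thresholds `eps` and `delta` are the tuning constants used in the text):

```python
import numpy as np

def sparse_predictor_sets(w, X, eps=0.02, delta=0.075):
    """Weighted moments (13)-(14); returns the independence set I (marginals
    within eps of 0 or 1) and, for the remaining components, the predictor
    sets L_i = {j < i : |r_ij| > delta}."""
    xbar = w @ X                                    # eq. (13), first moments
    xcross = (X * w[:, None]).T @ X                 # eq. (13), cross moments
    s = np.sqrt(np.clip(xbar * (1.0 - xbar), 1e-12, None))
    r = (xcross - np.outer(xbar, xbar)) / np.outer(s, s)   # eq. (14)
    d = X.shape[1]
    I = {i for i in range(d) if not eps < xbar[i] < 1.0 - eps}
    L = {i: [j for j in range(i) if abs(r[i, j]) > delta]
         for i in range(d) if i not in I}
    return xbar, r, I, L
```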
For the components i ∈ I, we do not consider fitting a logistic regression, but set L_i = ∅ and draw them independently. Precisely, we set b_{i,i} = ℓ(x̄_i) and b_{i,−i} = 0, which corresponds to a logistic model without predictors. Firstly, interactions do not really matter if the marginal probability is excessively small or large. Secondly, these components are prone to cause complete separation in the data or might even be constant.

For the conditional distribution of the remaining components I^c = {1, ..., d} \ I, we construct parsimonious logistic regressions. For δ = 0.075, we define the predictor sets

    L_i := {j ∈ {1, ..., i−1} | δ < |r_{i,j}|},   i ∈ I^c,

which identify the components with index smaller than i and significant mutual association. Running our examples in Section 6 with δ = 0 shows that a saturated logistic regression kernel achieves about the same acceptance rates as a sparse one, while setting δ = 0.075 dramatically reduces the computational time we need to calibrate the model.

Fitting the model   We maximise the log-likelihood function ℓ(b) = ℓ(b | y, Z) of a weighted logistic regression model by solving the first order condition ∂ℓ/∂b = 0. We find a numerical solution via Newton-Raphson iterations

    −(∂²ℓ(b^(r)) / ∂b ∂b^⊺) (b^(r+1) − b^(r)) = ∂ℓ(b^(r)) / ∂b,   r > 0,    (15)

starting at some b^(0); see Procedure 5 for the exact terms. Other updating formulas like Iteratively Reweighted Least Squares or quasi-Newton iterations should work as well.

Procedure 5  Fitting the weighted logistic regressions
Input: w = (w_1, ..., w_n), X = (x_1, ..., x_n)^⊺, B ∈ R^{d×d}
for i ∈ I^c do
    Z ← (X_{L_i}, 1), y ← X_i, b^(0) ← B_{i, L_i ∪ {i}}
    repeat
        p_k ← ℓ^{−1}(Z_k b^(r−1)) for all k = 1, ..., n
        q_k ← p_k (1 − p_k) for all k = 1, ..., n
        b^(r) ← (Z^⊺ diag[w] diag[q] Z + ε I)^{−1} (Z^⊺ diag[w]) (diag[q] Z b^(r−1) + (y − p))
    until |b_j^(r) − b_j^(r−1)| < 10^{−3} for all j
    B_{i, L_i ∪ {i}} ← b
end for
return B

Sometimes, the Newton-Raphson iterations do not converge because the likelihood function is monotone and thus has no finite maximizer. This problem is caused by data with complete or quasi-complete separation in the sample points (Albert and Anderson, 1984). There are several ways to handle this issue.

(a) We just halt the algorithm after a fixed number of iterations and ignore the lack of convergence. Such a proceeding, however, might cause uncontrolled numerical problems.

(b) In general, Firth (1993) recommends Jeffreys prior, but this option is computationally rather expensive. Instead, we might use a Gaussian prior with variance 1/ε > 0, which adds a quadratic penalty term ε b^⊺ b to the log-likelihood and ensures the penalised objective has a finite maximizer.

(c) As soon as we notice that some terms of b_i are growing beyond a certain threshold, we move the component i from the set of components with an associated logistic regression model, I^c, to the set of independent components I.

In practice, we combine approaches (b) and (c). In Procedure 5, we did not elaborate on how to handle non-convergence, but added a penalty term to the log-likelihood, which causes the extra ε I in the Newton-Raphson update. Since we solve the update equation via Cholesky factorizations, adding a small term on the diagonal also ensures that the matrix is indeed numerically decomposable.

Starting points   The Newton-Raphson procedure is known to rapidly converge for starting values b_i^(0) not too far from the solution b_i^(*). In the absence of prior information about b_i^(*), we would naturally start with a vector of zeros, perhaps setting b_{i,i}^(0) = ℓ(x̄_i).
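One penalised Newton-Raphson fit of Procedure 5, for a single conditional, might look as follows in Python; this is a sketch, in which `Z` already carries the intercept column, `b0` is the starting point just discussed, and the ridge term eps·I implements the penalty of option (b):

```python
import numpy as np

def fit_weighted_logistic(w, Z, y, b0, eps=0.01, tol=1e-3, max_iter=50):
    """Penalised Newton-Raphson iterations (15) for one weighted logistic
    regression, following the update formula of Procedure 5."""
    b = np.asarray(b0, dtype=float)
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(Z @ b)))        # logistic function l^{-1}
        q = p * (1.0 - p)
        H = Z.T @ ((w * q)[:, None] * Z) + eps * np.eye(Z.shape[1])
        rhs = Z.T @ (w * (q * (Z @ b) + (y - p)))
        b_new = np.linalg.solve(H, rhs)
        if np.max(np.abs(b_new - b)) < tol:
            return b_new
        b = b_new
    return b                                      # option (a): halt and ignore
```

At its fixed point, the update solves ∂ℓ/∂b = εb, i.e. it maximises the log-likelihood penalised by (ε/2) b^⊺b, so the ridge term regularises and stabilises the solve at the same time.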
In the context of our Sequential Monte Carlo algorithm, we can do better than that. Recall that we constructed a smooth sequence (π_t)_{t=0}^τ of distributions, which corresponds to a sequence of proposal distributions (q_t)_{t=0}^τ = (q_{θ_t})_{t=0}^τ associated with a sequence (θ_t)_{t=0}^τ of parameters. It significantly speeds up the Newton-Raphson procedure if we choose B_{t−1} as starting point for the estimation of B_t. Indeed, towards the end of the Sequential Monte Carlo algorithm, we fit, for the same precision, a logistic regression in less than four iterations on average when starting at B_{t−1}, compared to about 13 iterations on average when starting at zero.

Sampling and evaluating   In the move step of Sequential Monte Carlo discussed in Section 4.5, we need to sample a proposal state y from q_θ and evaluate the likelihood q_θ(y) to compute the Metropolis-Hastings ratio (5). For the logistic regression model q_B, we can do both in one go; see Procedure 6.

Procedure 6  Sampling from the model
Input: B
y ← (0, ..., 0), p ← 1
for i = 1, ..., d do
    q ← ℓ^{−1}(b_{i,i} + Σ_{j ∈ L_i} b_{i,j} y_j)
    sample y_i ∼ B_q
    p ← p × q if y_i = 1, p × (1 − q) if y_i = 0
end for
return y, p

5.3 Why not use a simpler model?

We briefly justify why we should not use a simpler parametric family for our Sequential Monte Carlo application. Indisputably, the easiest parametric family on B^d that we can think of is a product model q_p(γ) := ∏_{i=1}^d B_{p_i}(γ_i), where B_{p_i}(γ) = p_i^γ (1 − p_i)^{1−γ} denotes a Bernoulli distribution with expected value p_i ∈ [0, 1].

Figure 1: Toy example showing how well the product model q_p and the logistic regression model q_B replicate the mass function of a difficult posterior distribution π.
[Panels: (a) true mass function π(γ); (b) product model q_p(γ); (c) logistic regression model q_B(γ); each panel shows the probabilities of the 16 states 0000 through 1111.]

Let us check the requirement list: the product model is parsimonious with dim(θ) = d; the maximum likelihood estimator θ* is the sample mean x̄ = n^{−1} Σ_{k=1}^n x_k; the decomposition (11) holds trivially, which allows us to sample from q_p and evaluate q_p(γ) in O(d). Obviously, however, q_p does not reproduce any dependencies we might observe in X. Could we just forget about this last point and use the product model for its simplicity?

Toy example   We take a simple linear relation Y = V_1 + V_2. For n = 100 and μ = 10, we draw normal variates

    v_1 ∼ N(−μ, I_n),   v_2 ∼ N(μ, I_n),   y = v_1 + v_2,

where y is the vector of observations, and

    z_1, z_2 ∼ N(v_1, (μ²/4) I_n),   z_3, z_4 ∼ N(v_2, (μ²/4) I_n)

are the four columns of covariates. The posterior distribution π(γ) = π(γ | y, Z), using the prior distributions described in Section 2, typically exhibits strong dependencies between its components due to the correlation in the data. Now we generate a pseudo-random sample from π and fit both a product model q_p and a logistic regression model q_B. Looking at the corresponding mass functions in Figure 1, we notice how badly the product model mimics the true posterior. This observation carries over to larger sampling spaces.

Acceptance rates   A good way to analyse the importance of reproducing the dependencies of π is in terms of acceptance rates and particle diversities.
As we already remarked in Section 4.5, the particle diversity naturally diminishes as our particle system approaches a strongly concentrated target distribution π. However, we want our algorithm to keep up the particle diversity as long as possible to ensure the particle system is well spread out over the entire state space.

In Figure 2, we show a comparison (based on the Boston Housing data set explained in Section 6.1) between two Sequential Monte Carlo algorithms, using a product model q_p and a logistic regression model q_B as proposal distribution of the Metropolis-Hastings kernel (5).

Figure 2: We compare the use of a product model q_p to a logistic regression model q_B as proposal distribution of the Metropolis-Hastings kernel (5). We monitor a typical run (ρ on the x-axis) of our Sequential Monte Carlo algorithm (for the Boston Housing data set described in Section 6.1) and plot the acceptance rates and particle diversities (on the y-axis). [Panels: (a) acceptance rates; (b) particle diversities.]

Clearly, in Figure 2(a), the acceptance rates achieved by the product kernel rapidly decrease and dwell around 5% for the second half of the run. In contrast, the logistic regression kernel always provides acceptance rates greater than 20%. As a consequence, in Figure 2(b), the particle diversity sustained by the product kernel decreases at an early stage, while the logistic regression kernel holds it up until the very last steps.
At first sight, it might seem odd that the acceptance rates of the logistic regression kernel increase during the final steps of the algorithm. If we jump ahead, however, and take a look at the results of the Boston Housing problem, see Figure 3(a), we notice that quite a few marginal probabilities of the posterior π turn out to be zero, which makes it easier to reproduce the distribution towards the end of the Resample-move algorithm. However, if we already decide at an early stage that P(γ_i = 1) = 0 for some component, we fail to ever consider states γ ∈ B^d with γ_i = 1 for the rest of the algorithm. Therefore, the advantage of the logistic regression kernel over the simple product kernel is that we do not completely drop any components from the variable selection problem until the final steps.

5.4 Review of alternative binary models

In the following, we review some alternative approaches to modeling multivariate binary data. Unfortunately, we cannot incorporate any of these models in our Sequential Monte Carlo algorithm. Still, it is instructive to understand why alternative strategies fail to provide suitable proposal distributions in the sense of Section 5.1. For a more detailed review of parametric families suitable for adaptive Monte Carlo algorithms on binary spaces, see Schäfer (2010).

Quadratic multi-linear models   For coefficients a ∈ R^{2^d}, we can write any mass function on B^d as

    π(γ) = Σ_{S ⊆ {1,...,d}} a_S ∏_{i∈S} γ_i.

It is tempting to construct a d(d+1)/2 parameter model

    q_{μ,A}(γ) := μ + γ^⊺ A γ

by removing interaction terms of order higher than two. As Bahadur (1961) points out, the main problem of any additive approach is the fact that a truncated model might not be non-negative and thus not define a probability distribution.
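Bahadur's objection is easy to reproduce: in the following two-dimensional truncated model, the masses sum to one, yet one state receives a negative "probability" (the coefficients are contrived for the purpose):

```python
import numpy as np

# A truncated quadratic multi-linear model q(gamma) = mu + gamma' A gamma on B^2
mu = 0.25
A = np.array([[0.2, -0.4],
              [-0.4, 0.2]])

def q(gamma):
    g = np.asarray(gamma, dtype=float)
    return mu + g @ A @ g

# q(0,0) = 0.25, q(0,1) = q(1,0) = 0.45, but q(1,1) = 0.25 - 0.4 = -0.15 < 0
masses = {g: q(g) for g in [(0, 0), (0, 1), (1, 0), (1, 1)]}
```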
Although the linear structure allows us to derive explicit and recursive formulae for the marginal and conditional distributions, we hardly ever find a useful application for the additive model. As other authors (Park et al., 1996; Emrich and Piedmonte, 1991) remark, additive representations like the much-cited Bahadur (1961) expansion are quite instructive but, unfortunately, impractical.

Quadratic exponential models   For coefficients a ∈ R^{2^d}, we can write any mass function on B^d as

    π(γ) = exp(Σ_{S ⊆ {1,...,d}} a_S ∏_{i∈S} γ_i).

Removing higher order interaction terms, we can construct a d(d+1)/2 parameter model

    q_{μ,A}(γ) := μ exp(γ^⊺ A γ),    (16)

where A is a symmetric matrix. Quadratic exponential models are a well-defined class of distributions, but there is no simple recursive structure for their marginal distributions. Hence, we cannot compute the factorization (11) we need to sample from q_A.

Cox and Wermuth (1994) propose an approximation to the marginal distributions by expressions of the form (16), omitting higher order terms in a Taylor expansion. If we write the parameter A as

    A = ( A′  b^⊺
          b   c  ),

the parameter of the marginal distribution q_{A_{1:d−1}}(γ_{1:d−1}) is approximately given by

    A_{1:d−1} ≈ A′ + (1 + tanh(c/2)) diag[b] + (1/2) sech²(c/2) bb^⊺,

and the normalizing constant is μ_{1:d−1} = μ(1 + exp(c)). We can recursively compute approximations to all marginal distributions q_{A_{1:d−1}}, ..., q_{A_{1:1}} and derive logistic forms

    ℓ(P(γ_i = 1 | γ_{1:i−1})) = log [ q_{A_{1:i}}(γ_i = 1, γ_{1:i−1}) / q_{A_{1:i}}(γ_i = 0, γ_{1:i−1}) ],

which takes us back to (12). However, there is no reason to fit a quadratic exponential model and compute the approximate logistic model if we can directly fit the logistic conditionals model in the same time.
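For illustration, one marginalization step of the Cox and Wermuth (1994) approximation reads as follows; the block-partition convention (last row and column of A holding b and c) follows the display above, and the result is only as accurate as the underlying Taylor expansion:

```python
import numpy as np

def marginalize_quad_exp(mu, A):
    """Approximate parameters (mu, A) of the (d-1)-dimensional marginal of the
    quadratic exponential model (16), integrating out the last component."""
    Ap, b, c = A[:-1, :-1], A[-1, :-1], A[-1, -1]
    sech2 = 1.0 / np.cosh(c / 2.0) ** 2
    A_marg = (Ap + (1.0 + np.tanh(c / 2.0)) * np.diag(b)
              + 0.5 * sech2 * np.outer(b, b))
    return mu * (1.0 + np.exp(c)), A_marg
```

When b = 0 the last component is independent of the rest and the formula is exact, which gives a quick sanity check.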
Latent variable models   Let φ_θ be a parametric family on X and τ : X → B^d a mapping into the binary state space. We can sample from a latent variable model

    q_θ(γ) := ∫_{τ^{−1}(γ)} φ_θ(v) dv

by setting y = τ(v) for a draw v ∼ φ_θ from the latent parametric family. Non-normal parametric families with d(d−1)/2 dependence parameters seem to either have a very limited dependence structure or unfavourable properties (Joe, 1996). Therefore, the multivariate normal

    φ_{(μ,Σ)}(v) = (2π)^{−d/2} |Σ|^{−1/2} e^{−(1/2)(v−μ)^⊺ Σ^{−1} (v−μ)},   τ(v) = (1_{(−∞,0]}(v_1), ..., 1_{(−∞,0]}(v_d)),

appears to be the natural and almost the only option for φ_θ. This kind of model has been discussed repeatedly in the literature (Emrich and Piedmonte, 1991; Leisch et al., 1998; Cox and Wermuth, 2002).

The first and second order marginal probabilities of the model q_{(μ,Σ)} are given by Φ_1(μ_i) and Φ_2(μ_i, μ_j; σ_{i,j}), respectively, where Φ_1(v_i) and Φ_2(v_i, v_j; σ_{i,j}) denote the cumulative distribution functions of the univariate and bivariate normal distributions with zero mean, unit variance and correlation σ_{i,j} ∈ [−1, 1]. We can fit the model q_{(μ,Σ)} to a particle system (w, X) by moment matching, that is, adjusting μ and Σ such that

    Φ_1(μ_i) = x̄_i,   Φ_2(μ_i, μ_j; σ_{i,j}) = x̄_{i,j},

with x̄_i and x̄_{i,j} as defined in (13). However, the locally constructed correlation matrix Σ might not be positive definite. Still, we can obtain a feasible parameter by replacing Σ with Σ* = (Σ + |λ| I)/(1 + |λ|), where λ is the smallest eigenvalue of the locally adjusted matrix Σ.

The main drawback of latent variable approaches is the fact that point-wise evaluation of the probability mass function q_θ(y) is computationally feasible only in special cases.
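Sampling from the latent Gaussian model is straightforward, which explains its popularity; a sketch including the eigenvalue repair Σ* described above (the thresholding convention γ_i = 1 iff v_i ≤ 0 follows τ as defined; the moment-matching fit of μ and Σ is omitted):

```python
import numpy as np

def sample_latent_gaussian(mu, Sigma, n, rng):
    """Draw n binary vectors gamma = tau(v), v ~ N(mu, Sigma)."""
    lam = np.linalg.eigvalsh(Sigma).min()
    if lam < 0:  # repair a locally fitted, non positive-definite matrix
        Sigma = (Sigma + abs(lam) * np.eye(len(mu))) / (1.0 + abs(lam))
    V = rng.multivariate_normal(mu, Sigma, size=n)
    return (V <= 0).astype(int)                    # gamma_i = 1 iff v_i <= 0
```

Sampling is cheap, but note that the reverse direction, evaluating q_θ(γ) for a given γ, would require a d-dimensional Gaussian orthant probability, which is exactly the point-wise evaluation problem mentioned above.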
Hence, we cannot use this class of models in a Sequential Monte Carlo context.

Archimedean copula models   The potentials and pitfalls of applying copula theory, which is well developed for bivariate, continuous random variables, to multivariate discrete distributions are discussed in Genest and Nešlehová (2007). There have been earlier attempts to sample binary vectors via copulae: Lee (1993) describes how to construct an Archimedean copula, more precisely the Frank family (Nelsen, 2006, p. 119), for sampling multivariate binary data. Unfortunately, this approach is limited to very low dimensions.

Multivariate reduction models   Several approaches to generating multivariate binary data are based on a representation of the components γ_i as functions of sums of independent variables (Park et al., 1996; Lunn and Davies, 1998; Oman and Zucker, 2001). These techniques are limited to certain patterns of non-negative correlation and do, therefore, not yield suitable proposal distributions in a Sequential Monte Carlo application. We mention them for the sake of completeness.

6 Numerical experiments

In this section, we compare our Sequential Monte Carlo algorithm to standard Markov chain methods based on local moves as introduced in Section 3. These are standard algorithms and widely used. There are other recent approaches like Bayesian Adaptive Sampling (Clyde et al., 2011) or Evolutionary Stochastic Search (Bottolo and Richardson, 2010) which also aim at overcoming the difficulties of multi-modal binary distributions. However, a thorough and fair comparison of our Sequential Monte Carlo approach to other advanced methods needs careful consideration and is beyond the scope of this paper.
For testing, we created variable selection problems with high dependencies between the covariates, which yield particularly challenging, multi-modal posterior mass functions. The problems are built from freely available datasets by adding logarithms, polynomials and interaction terms. The Markov chain Monte Carlo methods presented in Section 3 tend to fail on these problems due to the very strong multi-modality of the posterior distribution, while the Sequential Monte Carlo approach we advocate in Section 4 yields very reliable results.

Note, however, that using Sequential Monte Carlo we do not get something for nothing. Firstly, the implementation of our algorithm, including the logistic conditionals model introduced in Section 5.2, is quite involved compared to standard Markov chain algorithms. Secondly, simple Markov chain methods are faster than our algorithm while producing results of the same accuracy if the components of the target distribution are nearly independent.

6.1 Construction of the data sets

We briefly describe the variable selection problems composed for our numerical experiments.

Boston Housing   The first example is based on the Boston Housing data set, originally treated by Harrison and Rubinfeld (1978), which is freely available at the StatLib data archive. The data set provides covariates ranging from the nitrogen oxide concentration to the per capita crime rate to explain the median prices of owner-occupied homes, see Table 1. The data has been treated by several authors, mainly because it provides a rich mixture of continuous and discrete variables, resulting in an interesting variable selection problem. Specifically, we aim at explaining the logarithm of the corrected median values of owner-occupied housing.
We enhance the 13 columns of the original data set by adding first order interactions between all covariates. Further, we add a constant column and a squared version of each covariate (except for chas, since it is binary). This gives us a model choice problem with 104 possible predictors and 506 observations. We use a hierarchical Bayesian approach, with priors as explained in Section 2, to construct a posterior distribution π. By construction, there are strong dependencies between the possible predictors, which leads to a rather complex, multi-modal posterior distribution.

Concrete Compressive Strength   The second example is constructed from a less known data set, originally treated by Yeh (1998), which is freely available at the UCI Machine Learning Repository. The data provides information about components of concrete to explain its compressive strength. The compressive strength appears to be a highly non-linear function of age and ingredients. In order to explain the compressive strength, we take the 8 covariates of the original data set and add the logarithms of some covariates (indicated by the prefix lg), see Table 2. Further, we add interactions between all 13 covariates of the augmented data set and a constant column. This gives us a model choice problem with 79 possible predictors and 1030 observations. We use a hierarchical Bayesian approach, with priors as explained in Section 2, to construct a posterior distribution π.

Protein activity data   The third example was originally analyzed by Clyde and Parmigiani (1998). Later, Clyde et al. (2011) used it as a challenging example problem in variable selection and included the raw data in the R package BAS, available at CRAN, which implements the Bayesian Adaptive Sampling algorithm.
In order to explain the protein activity (prot.act1), we first convert the factors buf, ra and det into a factor model. We enhance the 14 columns of this data set by adding first order interactions between all covariates and a constant column. For details on the raw data, see Table 3. Note that some columns turn out to be constant zeros, such that we obtain a model choice problem with 88 possible predictors and 96 observations. For reasons of consistency, we choose the priors explained in Section 2 instead of the original g-prior used in Clyde et al. (2011).

6.2 Main effect restrictions

In some statistical applications, we might want to include the interactions only if the corresponding main effects are present in the model. These constraints are easy to incorporate if needed, but they render the sampling problem even more challenging, since the constrained support makes the state space exploration more difficult.

Let d denote the number of main effects and γ_{i,j} the interaction of the main effects γ_i and γ_j for all i, j = 1, ..., d. For the variable selection problem on B^{d(d+1)/2}, we impose the prior constraints on the feasible interactions

    π(γ | Z) ∝ 1_{{γ ∈ B^{d(d+1)/2} | γ_{i,j} ≤ γ_i γ_j for all i, j = 1, ..., d}}(γ).    (17)

While the Markov chain Monte Carlo algorithms can proceed as before, we need to slightly modify our Sequential Monte Carlo approach. When sampling from a restricted distribution, we initialise the particle system with an iid sample from the prior (17) instead of the uniform distribution. In the sequel, we report for each dataset a comparison with and without the main effect restrictions.

6.3 How to compare to Markov chain Monte Carlo

We do not think it is reasonable to compare two completely different algorithms in terms of pure computational time.
We can guarantee neither that our implementations are optimal nor that the time measurements can be exactly reproduced in other computing environments. We suppose that the number of evaluations of the target function π is a fairer stopping criterion, since it shows how well the algorithms exploit the information obtained from π. Precisely, we parameterise the Sequential Monte Carlo algorithm not to exceed a fixed number ν of evaluations and stop the Markov chains when ν evaluations have been performed.

Assets and drawbacks   The Sequential Monte Carlo and the Markov chain Monte Carlo algorithms both have extensions and numerical speed-ups which make it hard to settle on a fair comparison.

Advocates of Markov chain methods might criticise that the number of target evaluations is a criterion biased towards the Sequential Monte Carlo approach, for there are updating schemes which allow for faster computation of the Cholesky decomposition (3) given the decomposition of a neighbouring model, see Dongarra et al. (1979, chaps. 8, 10). Thus, Markov chains which propose to change one component in each step can evaluate π with less effort and perform more evaluations of π in the same time.

On the other hand, however, the Sequential Monte Carlo algorithm can be parallelised in the sense that we can, on suitable hardware, run many evaluations of π in parallel during the move step, see Procedure 4. No analogous speed-up can be performed in the context of Markov chains. We have processed variable selection problems from genetics with about a thousand covariates within a few hours, running a parallelised version of the algorithm on a 64-CPU cluster. A detailed report is going to be published as supplementary material.
Further, Sequential Monte Carlo methods are more suitable than Markov chain Monte Carlo to approximate the evidence, that is, the normalisation constant of the posterior distribution. We can exploit this property to compare, for instance, regression models with different monotonic link functions.

Parameters  We run our Sequential Monte Carlo (SMC) algorithm with n = 1.5 × 10^4 particles and a target effective sample size η = 0.9, as explained in Section 4. For these parameters, the Sequential Monte Carlo algorithm needs less than ν = 2.5 × 10^6 evaluations of π on all example problems. We compare our algorithm to both the Adaptive Markov chain Monte Carlo (Nott and Kohn, 2005, AMCMC) and the standard metropolised Gibbs (Liu, 1996, MCMC) described in Section 3. For the MCMC, we draw the number of bits to be flipped from a truncated geometric distribution with mean k* = 2, as proposed in Section 3.3. However, as stated earlier, we could not observe a significant effect of changes in the block updating schemes on the quality of the Monte Carlo estimate. For the AMCMC, we use δ = 0.01 and λ = 0.01, following the recommendations of Nott and Kohn (2005). We update the estimates ψ and W every 2 × 10^5 iterations of the chain. Before we start adapting, we generate 2.5 × 10^5 iterations with a metropolised Gibbs kernel (after a discarded burn-in of 2.5 × 10^4 iterations).

6.4 Implementation

The numerical work was done entirely in Python 2.6 using SciPy packages and run on a cluster with 1.86 GHz processors. Scientific work in applied fields is often more accessible to the reader if the source code which generated the numerical evidence is released along with the publication. The complete, documented sources used in this work can be found at http://code.google.com/p/smcdss. We also provide instructions on how to install and run our project.
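As a concrete illustration of the MCMC block size described in the Parameters paragraph above, the number of bits to flip can be drawn from a geometric distribution truncated to {1, ..., d}; setting the success probability to 1/k* gives untruncated mean k* = 2. This is a sketch of one such sampler, not the authors' implementation:

```python
import numpy as np

def sample_block_size(d, k_star=2.0, rng=None):
    """Number of bits to flip: geometric with success probability
    1/k_star (untruncated mean k_star), renormalised on {1, ..., d}."""
    rng = np.random.default_rng() if rng is None else rng
    p = 1.0 / k_star
    k = np.arange(1, d + 1)
    w = p * (1.0 - p) ** (k - 1)   # P(K = k) before truncation
    w /= w.sum()                   # renormalise on {1, ..., d}
    return int(rng.choice(k, p=w))
```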
The program can process data sets in standard csv format and generate R scripts for graphical visualisation of the results. The released version was tested to run on both Windows and Linux machines.

6.5 Results and discussion

We run each algorithm 200 times, and each time we obtain for all covariates a Monte Carlo estimate of the marginal probability of inclusion in the normal linear model. We visualise the variation of the estimator by box-plots that show how much the Monte Carlo estimates have varied throughout the 200 runs (Figures 3 and 5). Here, the white boxes contain 80% of the Monte Carlo results, while the black boxes show the extent of the 20% outliers. For better readability, we add a coloured bar up to the smallest estimate we obtained in the test runs; otherwise, components with a small variation are hard to see. The vertical line in the white box indicates the median of the Monte Carlo estimates.

The medians of the Sequential Monte Carlo runs correspond very precisely to the results we obtained by running a Markov chain Monte Carlo algorithm for a few days. Unquestionably, the Sequential Monte Carlo algorithm is extremely robust; for 200 test runs and for both data sets, the algorithm did not produce a single major outlier in any of the components. This is not true for either of the Markov chain algorithms. The size of the white boxes indicates that adaptive Markov chain Monte Carlo works considerably better than the standard Markov chain procedure. However, even the adaptive Markov chain method is rather vulnerable to generating outliers. The large black boxes indicate that, for some starting points of the chain, the estimates of some marginal probabilities might be completely wrong. The outliers, that is, the black boxes, in the MCMC and the AMCMC plots are strikingly similar.
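The quantities behind these plots can be reproduced as follows: each run yields a vector of marginal inclusion probabilities from its weighted particle system, and the white boxes are the central 80% range of those estimates across runs. A sketch with assumed array shapes:

```python
import numpy as np

def marginal_inclusion(particles, weights):
    """Weighted particle estimate of P(gamma_i = 1) for each covariate;
    particles is an (n, d) 0/1 array, weights a length-n vector."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return w @ np.asarray(particles, dtype=float)

def white_box(estimates):
    """The 'white box' of the plots: 10th percentile, median and 90th
    percentile of the estimates over repeated runs; estimates is a
    (runs, d) array of marginal inclusion probabilities."""
    lo, med, hi = np.percentile(estimates, [10.0, 50.0, 90.0], axis=0)
    return lo, med, hi
```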
The adaptive and the standard Markov chains apparently both fall into the same trap, which in turn confirms the intuition that adaptation makes a method faster but not more robust against outliers. An adaptive local method is still a local method and does not yield reliable estimates for difficult binary sampling problems. Figure 8 suggests that in constrained spaces adaptation is difficult and might even have counter-productive effects.

In Tables 3 to 8, we gather some key performance indicators, each averaged over the 200 runs of the respective algorithms. Note that the time needed to perform 2.5 × 10^6 evaluations of π is a little less than the running time of the standard Markov chain. Thus, even in terms of computational time, the adaptive Markov chain can hardly compete with our Sequential Monte Carlo method, even if evaluations of π were at no cost.

Acknowledgements

N. Chopin is supported by the ANR grant ANR-008-BLAN-0218 "BigMC" of the French Ministry of research. We would like to thank Pierre Jacob and two anonymous referees for their valuable comments on this paper. We acknowledge the StatLib data archive and the UCI Machine Learning Repository for providing the data sets used in this work.

Figure 3: Boston Housing data set. For details see Section 6.5. Panels: (a) SMC, ∼1.4 × 10^6 evaluations of π; (b) AMCMC, 2.5 × 10^6 evaluations; (c) MCMC, 2.5 × 10^6 evaluations. [Box-plots of the marginal inclusion probability estimates, on a 0–1 scale, for the covariates and their first-order interactions; plots not reproduced here.]

Table. Boston Housing data set. Averaged key indicators complementary to Figure 3.

                                 Sequential MC   Adaptive MCMC   Standard MCMC
computational time (h:mm:ss)     0:36:59         4:50:52         0:38:06
evaluations of π                 1.36 × 10^6     2.50 × 10^6     2.50 × 10^6
average acceptance rate          36.4%           29.1%           0.81%
length t of the chain x_t        —               7.52 × 10^7     2.50 × 10^6
moves x_t ≠ x_{t−1}              —               7.28 × 10^5     2.07 × 10^4

Figure 4: Boston Housing data set with main effect restrictions. For details see Section 6.5. Panels: (a) SMC, ∼1.2 × 10^6 evaluations of π; (b) AMCMC, 2.5 × 10^6; (c) MCMC, 2.5 × 10^6. [Plots not reproduced here.]

Table. Boston Housing data set with main effect restrictions. Averaged key indicators complementary to Figure 4.

                                 Sequential MC   Adaptive MCMC   Standard MCMC
computational time (h:mm:ss)     0:18:05         4:33:20         0:14:13
evaluations of π                 1.15 × 10^6     2.50 × 10^6     2.50 × 10^6
average acceptance rate          20.79%          45.4%           1.20%
length t of the chain x_t        —               8.01 × 10^7     2.50 × 10^6
moves x_t ≠ x_{t−1}              —               1.13 × 10^6     2.96 × 10^4

Figure 5: Concrete Compressive Strength data set. For details see Section 6.5. Panels: (a) SMC, ∼1.2 × 10^6 evaluations of π; (b) AMCMC, 2.5 × 10^6; (c) MCMC, 2.5 × 10^6. [Plots not reproduced here.]

Table. Concrete Compressive Strength data set. Averaged key indicators complementary to Figure 5.

                                 Sequential MC   Adaptive MCMC   Standard MCMC
computational time (h:mm:ss)     0:29:01         2:02:06         0:43:17
evaluations of π                 1.19 × 10^6     2.50 × 10^6     2.50 × 10^6
average acceptance rate          30.7%           70.4%           7.20%
length t of the chain x_t        —               2.43 × 10^7     2.50 × 10^6
moves x_t ≠ x_{t−1}              —               1.76 × 10^6     1.79 × 10^5

Figure 6: Concrete Compressive Strength data set with main effect restrictions. For details see Section 6.5. Panels: (a) SMC, ∼2.4 × 10^6 evaluations of π; (b) AMCMC, 2.5 × 10^6; (c) MCMC, 2.5 × 10^6. [Plots not reproduced here.]

Table. Concrete Compressive Strength data set with main effect restrictions. Averaged key indicators complementary to Figure 6.

                                 Sequential MC   Adaptive MCMC   Standard MCMC
computational time (h:mm:ss)     0:43:01         2:29:16         0:41:48
evaluations of π                 2.42 × 10^6     2.50 × 10^6     2.50 × 10^6
average acceptance rate          30.98%          61.1%           5.31%
length t of the chain x_t        —               2.72 × 10^7     2.50 × 10^6
moves x_t ≠ x_{t−1}              —               1.53 × 10^6     1.32 × 10^5

Figure 7: Protein data set. For details see Section 6.5. Panels: (a) SMC, ∼6.1 × 10^5 evaluations of π; (b) AMCMC, 2.5 × 10^6; (c) MCMC, 2.5 × 10^6. [Plots not reproduced here.]

Table. Protein data set. Averaged key indicators complementary to Figure 7.

                                 Sequential MC   Adaptive MCMC   Standard MCMC
computational time (h:mm:ss)     0:14:55         3:58:32         0:29:38
evaluations of π                 6.17 × 10^5     2.50 × 10^6     2.50 × 10^6
average acceptance rate          30.7%           60.7%           1.20%
length t of the chain x_t        —               9.19 × 10^7     2.50 × 10^6
moves x_t ≠ x_{t−1}              —               1.51 × 10^6     3.03 × 10^5

Figure 8: Protein data set with main effect restrictions. For details see Section 6.5. Panels: (a) SMC, ∼6.1 × 10^5 evaluations of π; (b) AMCMC, 2.5 × 10^6; (c) MCMC, 2.5 × 10^6. [Plots not reproduced here.]

Table. Protein data set with main effect restrictions. Averaged key indicators complementary to Figure 8.

                                 Sequential MC   Adaptive MCMC   Standard MCMC
computational time (h:mm:ss)     0:14:45         3:32:06         0:30:21
evaluations of π                 6.19 × 10^5     2.50 × 10^6     2.50 × 10^6
average acceptance rate          26.65%          22.3%           1.20%
length t of the chain x_t        —               1.07 × 10^8     2.50 × 10^6
moves x_t ≠ x_{t−1}              —               5.56 × 10^6     3.03 × 10^5

References

Albert, A. and Anderson, J. A. (1984). On the existence of maximum likelihood estimates in logistic regression models. Biometrika, (72):1–10.

Andrieu, C. and Thoms, J. (2008). A tutorial on adaptive MCMC. Statistics and Computing, 18(4):343–373.

Bahadur, R. (1961). A representation of the joint distribution of responses to n dichotomous items.
In Solomon, H., editor, Studies in Item Analysis and Prediction, pages 158–168. Stanford University Press.

Bottolo, L. and Richardson, S. (2010). Evolutionary stochastic search for Bayesian model exploration. Bayesian Analysis, 5(3):583–618.

Cappé, O., Douc, R., Guillin, A., Marin, J., and Robert, C. (2008). Adaptive importance sampling in general mixture classes. Statistics and Computing, 18(4):447–459.

Carpenter, J., Clifford, P., and Fearnhead, P. (1999). Improved particle filter for non-linear problems. IEE Proc. Radar, Sonar Navigation, 146(1):2–7.

Chopin, N. (2002). A sequential particle filter method for static models. Biometrika, 89(3):539.

Clyde, M., Ghosh, J., and Littman, M. (2011). Bayesian adaptive sampling for variable selection and model averaging. Journal of Computational and Graphical Statistics, 20(1):80–101.

Clyde, M. and Parmigiani, G. (1998). Protein construct storage: Bayesian variable selection and prediction with mixtures. Journal of Biopharmaceutical Statistics, 8(3):431.

Cox, D. (1972). The analysis of multivariate binary data. Applied Statistics, pages 113–120.

Cox, D. and Wermuth, N. (1994). A note on the quadratic exponential binary distribution. Biometrika, 81(2):403–408.

Cox, D. and Wermuth, N. (2002). On some models for multivariate binary variables parallel in complexity with the multivariate Gaussian distribution. Biometrika, 89(2):462.

Del Moral, P., Doucet, A., and Jasra, A. (2006). Sequential Monte Carlo samplers. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(3):411–436.

Dongarra, J., Moler, C., Bunch, J., and Stewart, G. (1979). LINPACK: Users' Guide. Society for Industrial and Applied Mathematics.

Emrich, L. and Piedmonte, M. (1991). A method for generating high-dimensional multivariate binary variates. The American Statistician, 45:302–304.
Fearnhead, P. and Clifford, P. (2003). Online inference for hidden Markov models via particle filters. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65(4):887–899.

Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika, (80):27–38.

Gelman, A. and Meng, X. (1998). Simulating normalizing constants: from importance sampling to bridge sampling to path sampling. Statistical Science, 13(2):163–185.

Genest, C. and Nešlehová, J. (2007). A primer on copulas for count data. Astin Bulletin, 37(2):475.

George, E. I. and McCulloch, R. E. (1997). Approaches for Bayesian variable selection. Statistica Sinica, (7):339–373.

Gilks, W. and Berzuini, C. (2001). Following a moving target: Monte Carlo inference for dynamic Bayesian models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(1):127–146.

Gordon, N. J., Salmond, D. J., and Smith, A. F. M. (1993). Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proc. Radar, Sonar Navigation, 140(2):107–113.

Harrison, D. and Rubinfeld, D. L. (1978). Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management, 5(1):81–102.

Jasra, A., Stephens, D., Doucet, A., and Tsagaris, T. (2008). Inference for Lévy-driven stochastic volatility models via adaptive sequential Monte Carlo. Scandinavian Journal of Statistics.

Joe, H. (1996). Families of m-variate distributions with given margins and m(m-1)/2 bivariate dependence parameters. Lecture Notes-Monograph Series, 28:120–141.

Kitagawa, G. (1996). Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. Journal of Computational and Graphical Statistics, 5(1):1–25.

Kong, A., Liu, J. S., and Wong, W. H. (1994). Sequential imputation and Bayesian missing data problems.
Journal of the American Statistical Association, 89:278–288.

Lee, A. (1993). Generating random binary deviates having fixed marginal distributions and specified degrees of association. The American Statistician, 47(3).

Lee, A., Yau, C., Giles, M., Doucet, A., and Holmes, C. (2010). On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Journal of Computational and Graphical Statistics, 19(4):769–789.

Leisch, F., Weingessel, A., and Hornik, K. (1998). On the generation of correlated artificial binary data. Technical report, WU Vienna University of Economics and Business.

Liang, F. and Wong, W. (2000). Evolutionary Monte Carlo: Applications to Cp model sampling and change point problem. Statistica Sinica, 10(2):317–342.

Liu, J. (1996). Peskun's theorem and a modified discrete-state Gibbs sampler. Biometrika, 83(3):681–682.

Liu, J. and Chen, R. (1998). Sequential Monte Carlo methods for dynamic systems. Journal of the American Statistical Association, 93(443):1032–1044.

Lunn, A. and Davies, S. (1998). A note on generating correlated binary variables. Biometrika, 85(2):487–490.

Neal, R. (2001). Annealed importance sampling. Statistics and Computing, 11(2):125–139.

Nelsen, R. (2006). An Introduction to Copulas. Springer Verlag.

Nott, D. and Kohn, R. (2005). Adaptive sampling for Bayesian variable selection. Biometrika, 92(4):747.

Oman, S. and Zucker, D. (2001). Modelling and generating correlated binary variables. Biometrika, 88(1):287.

Park, C., Park, T., and Shin, D. (1996). A simple method for generating correlated binary variates. The American Statistician, 50(4).

Qaqish, B. (2003). A family of multivariate binary distributions for simulating correlated binary variables with specified marginal means and correlations. Biometrika, 90(2):455.

Robert, C. and Casella, G. (2004).
Monte Carlo Statistical Methods. Springer Verlag.

Schäfer, C. (2010). Parametric families on binary spaces. Technical report, CREST.

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6:461–464.

Suchard, M., Holmes, C., and West, M. (2010). Some of the What?, Why?, How?, Who? and Where? of Graphics Processing Unit Computing for Bayesian Analysis. In Bernardo, J. M. et al., editors, Bayesian Statistics 9. Oxford University Press.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1):267–288.

Yeh, I. (1998). Modeling of strength of high-performance concrete using artificial neural networks. Cement and Concrete Research, 28(12):1797–1808.

Table 1: Boston Housing data summary.

  short name   explanation
  crim         per capita crime
  zn           proportions of residential land zoned for lots over 2323 m²
  indus        proportions of non-retail business acres
  chas         tract borders Charles River (binary)
  nox          nitric oxides concentration (parts per 10⁷)
  rm           average numbers of rooms per dwelling
  age          proportions of owner-occupied units built prior to 1940
  dis          weighted distances to five Boston employment centres
  rad          accessibility to radial highways
  tax          full-value property-tax rate per USD 10⁴
  ptratio      pupil-teacher ratios
  b            (Bk − 0.63)², where Bk is the proportion of the black population
  lstat        percentage of lower status population

Table 2: Concrete Compressive Strength data summary. Components are measured in kg/m³.

  short name   explanation
  c, lg c      cement
  blast        blast furnace slag
  fash         fly ash
  w, lg w      water
  plast        superplasticizer
  ca, lg ca    coarse aggregate
  fa, lg fa    fine aggregate
  age, lg age  age in days

Table 3: Protein activity data summary.
  short name   explanation
  det          detergent
  buf          pH buffer
  NaCl         salt
  con          protein concentration
  ra           reducing agent
  MgCl2        magnesium chloride
  temp         temperature
