Note on the computation of the Metropolis-Hastings ratio for Birth-or-Death moves in trans-dimensional MCMC algorithms for signal decomposition problems

T E C H N I C A L R E P O R T Note on the computatio n of the Metrop olis-Hastings ratio for Birth-or-Death mov es in trans-dimensional MCMC algorithms for signal decom position problem s Alireza Roodaki ∗ , † , ‡ Julien Bect ∗ , § Gilles Fleury § E3S — S U P E L E C S Y S T E M S S C I E N C E S Department of Signal Processing and Electronic Systems S U P E L E C, Gif-sur-Yve tte, France. Abstract Re versible jump MCMC (RJ-MCMC) sampling techniques, which allow to jointly tackle model selection and parameter estimation problems in a co herent Bayesian frame work, have become increasingly popular in the signal p rocessing literatur e since the s eminal p aper of An- drieu and Doucet ( IEEE T rans. Signa l Pr ocess. , 47(10 ), 1999). Crucial to the implementation of any RJ-MCMC samp ler is the computation of the so-ca lled Metro polis-Hastings-Green (MHG) ratio, which determines the acceptance probab ility for the pr oposed mov es. It turns ou t that the expr ession of the MHG r atio th at was gi ven in th e paper of Andrieu and Dou cet for “Birth-or-Death” moves —the simplest kind of trans-d imensional move, used in virtually a ll applications of RJ-MCMC to sign al d ecomposition problems—was erro neous. Unfortu nately , this mistake has been reprod uced in m any su bsequent papers dealing with R J- MCMC sampling in the signal processing literature. This note discusses th e co mputation o f the MHG ratio, with a focus on the case where the proposal kernel can be decomp osed as a mixture of simpler kernels, for which the MHG ratio is easy to com pute. W e provide suf ﬁcient condition s un der which the MHG ratio of the mixture can be deduced from the MHG ratios of the elementary kernels of which it is composed . As an a pplication, we con sider the case of Birth-o r -Death moves, and provide a corrected expression for the erroneous ratio in the paper of Andrieu and Doucet. 1 Introd uction Model sel ection and p arameter esti mation are fundamen tal tasks arisin g in man y (if n ot all) si gnal proces sing problems, when parametric models are employed . Let us conside r a collectio n of models {M k , k ∈ K} , index ed by so me ﬁnite or coun table set K ⊂ N , with parameter v ecto r θ k ∈ Θ k ⊂ R n k under model M k . In a Bayesian framew ork, model selectio n (or av eragin g) an d paramete r estimation can in principle be carried out jointly , u sing the posterior distrib utio n of the pair ( k , θ k ) , π ( k , θ k ) ∝ p ( y | k , θ k ) p ( k , θ k ) , (1) ∗ Corresponding authors. † The results presented here are also part of the PhD thesis of the ﬁrst author [ 39 , Chapter 1]. ‡ Email: alireza4702 @gmail.com. § Email: { julien.bect, gilles.ﬂeury } @supelec.fr 1 where y is the observed data and ∝ indica tes proporti onality . Note that the distrib ution π is deﬁned on X = S k ∈K { k } × Θ k , which is a disj oint unio n of subspac es with differ ing d imen- sional ity . Generic M ark o v Chain Monte Carlo (MCM C) methods for probabi lity distri b ution s deﬁned on such spaces became av ailable during the 90’ s, most notably Green’ s w idel y applicab le RJ-MCMC sa mpler [ 17 ], making it possible to use a ful ly Bayesian approach for model selec- tion (or av eragi ng) an d parameter estimation in all sorts of applicatio ns. The reader is referre d to [ 4 , 9 , 18 , 37 , 4 5 ] for a broad er view on tr ans-di mension al sampl ing techniqu es (includin g alterna- ti ve s to the RJ-MCMC sampler). Green’ s RJ-MCMC samp ler can be seen as a gener alizati on of the well-kn o wn Metropo lis- Hastings sample r [ 19 , 28 ], which is c apable of exp loring not only t he ﬁxed -dimensi onal para m- eter spaces Θ k , b ut also the space K of all models u nder con sidera tion. This algorithm relies on an accep t/rejec t mechan ism, with an ac ceptan ce rati o calibrate d in such a way that the in v ariant distrib ution of the ch ain is the targ et dis trib ution π . The co mputatio n of this acceptance ratio for trans-d imensio nal moves is in general a de licate issue 1 , in volv ing measure theore tic consider a- tions. Andrieu and Doucet [ 1 ] pioneere d the u se of RJ-MCMC sampling in “signal decomposition ” proble ms, by tackling joint mo del selectio n and parameter est imation for an unkno wn number of sinuso idal signal s observe d in white Gaus sian noi se. (At the same period, RJ-MCMC also be- came po pular for image p rocess ing tasks such as se gmentat ion and object recog nition ; see, e.g., [ 11 , 21 , 32 , 3 3 , 41 ].) T his seminal pap ers was follo w ed by many others in the sign al proce ssing literat ure [ 3 , 5 , 6 , 10 , 20 , 25 – 27 , 30 , 31 , 40 , 42 , 43 ], re lying systematic ally on the original paper [ 1 ] for th e co mputatio n of the acceptance ratio o f “Birth-or -Death” mov es—the most elementary typ e of trans-di mension al mov e, which either adds or remov es a component from the signal decompo- sition . Unfor tunate ly , the expressi on of the accep tance rat io for Birth -or -Death mov es prov ided by [ 1 , Equation (20)] t urns out to be erroneo us, as will be explaine d late r . W orse, the e xac t same mistak e has been reproduce d in most of the follo wing papers, referred to abov e. The aim of this note is to provide clear statements of some mathematic al results , perhap s not completely ne w b ut nev er stated explic itly , w hich can be used for a clean justiﬁcatio n of the accepta nce ratio of Birth-or -Death moves in signal decompositi on (and similar) problems. Section 2 recalls , v ery quickly , the basics of MCMC methods , with a focus on Metropolis-Hast ings algori thms on genera l state spaces ( also kn o wn as RJ-MCMC alg orithms) . S ecti on 3 discus ses the computa tion of the acceptance ratio for mixture kernel s, and provid es conditio ns under which the ratio of the mixture can be di rectly deri ved from the ratio of the elementary k ernel s of which it is composed. Section 4 deﬁnes Birth-or -Death mov es and prov ides the express ion of the ratio; se ve ral distinct b ut related m athe matical representati ons—“u nsorted vectors ”, “sorted vectors” and Point processes —are disc ussed. As an illustratio n, Section 5 returns to the problem conside red in [ 1 ] and provi des a corrected expressi on for the Birth-o r -Death ratio. Sec tion 6 conclude s the paper . 2 Backgrou nd on MCMC methods This section recalls bas ic deﬁni tions and resu lts fo r the MCMC method. The reader is ref erred to [ 16 – 18 , 29 , 37 , 38 , 48 , 49 ] for more detail ed explana tions. 1 Fortunately , the simple and po werful “d imension matching” argu ment [ 17 ] allo ws to bypass this difﬁculty for a large c lass of proposal distributions. 2 2.1 MCMC with r eve rsible ker nels Let π be a pro babil ity distri b ution on a m easu rable space ( X , B ) , which is to be sampled from. MCMC sa mpling meth ods pro ceed by constru cting a time-h omogene ous Markov c hain ( x n ) with in va riant distrib ution π , using a tra nsitio n k ernel P that is r ever sibl e with respect to π , i.e., a kernel that satisﬁes the detai led balance condi tion π (d x ) P  x , d x ′  = π  d x ′  P  x ′ , d x  . (2) For a ll measurabl e sets A ∈ B , integra ting ( 2 ) on X × A yields Z X π (d x ) P ( x , A ) = π ( A ) , which means that π is an in vari ant distrib utio n for the ker nel P (it is also said that “ P lea ves π in va riant” ). If the transition kernel P is π -irreducib le and aperio dic, then [ 48 , Theorem 1] π is the unique in va riant dist rib utio n and the chain con ver ges in total vari ation to π for π -almos t all starti ng states x . If P is also Harris recurre nt, then con ver gence occurs for all initial distri b ution s [ 37 , Theorem 6.51]. Remark Some of the abov e requiremen ts on the chain ( x n ) can be relaxe d. Most notably , t ime- inhomog eneou s chai ns are used in the context of “adapti v e MCMC” algori thms; se e, e.g., [ 2 , 7 ] and the referen ces therein . It is also possib le to depart from the re ver sibilit y assumption , which is a sufﬁcie nt b ut no t nec essary c onditi on for π to be an in vari ant dist rib utio n (see, e.g., [ 13 ]), th ough the vast majority of M CMC algorithms c onside red in the lite rature ar e based on re versible k ernels. 2.2 Metr opolis-Hastings-Green k er nels The v ery popular Met ropoli s-Hasting s-Green ke rnels, sometimes simply ca lled Metro polis- Hastings kernels , corre spond to the follo wing two- stage sampling pr ocedur e: ﬁrst, gi ven that t he curren t stat e of the Mar ko v chain is x ∈ X , a new state x ′ ∈ X is prop osed from a transitio n ker nel Q ( x , d x ′ ) ; secon d, this move is accep ted with probabil ity α ( x , x ′ ) a nd rejected othe rwise—in which c ase the new state is equa l to x . More fo rmally , for all x ∈ X and B ∈ B , the tran sition ker nel is giv en by P ( x , B ) = Z B Q  x , d x ′  α  x , x ′  + s ( x ) 1 B ( x ) , (3) where 1 B denote s the indicator function of B , and s ( x ) = Z X Q  x , d x ′   1 − α  x , x ′  is the probabil ity of rejection at x . It is easily seen that the detailed balance condition ( 2 ) holds if and only if [ 17 , 48 , 49 ] π (d x ) Q  x , d x ′  α  x , x ′  = π  d x ′  Q  x ′ , d x  α  x ′ , x  . (4) This is achi e ved , for instance , by the acceptanc e probabili ty α  x , x ′  = min  1 , r  x , x ′  , (5) where r ( x , x ′ ) d enotes the Metropol is-Hastin gs-Green (MHG) rat io r  x , x ′  = π (d x ′ ) Q ( x ′ , d x ) π (d x ) Q ( x , d x ′ ) · (6) 3 The righ t-hand side of ( 6 ) is the Radon-Nyko dim deri vati ve of π (d x ′ ) Q ( x ′ , d x ) with respect to π ( d x ) Q ( x , d x ′ ) ; see [ 49 , Section 2] for techni cal details. Remark It is prov ed in [ 49 , S ecti on 4] that the acceptan ce probabilit y ( 5 ) is optimal in the sense of minimizing the asymptotic varian ce of sample path a verages among all acceptance rat es satis- fying ( 4 ). 3 Mixtur e of prop osal ker nels 3.1 Metr opolis-Hastings-Green ratio f or mixture of pr oposal ker nels It is often con venient to consider a proposa l kern el Q b uilt as a mixture of simpler transition ker nels Q m , with m in so me ﬁnite or countabl e index set M . In this case we ha ve Q  x , d x ′  = X m ∈ M j ( x , m ) Q m  x , d x ′  , (7) where j ( x , m ) is the p robabi lity of choos ing the mov e type m giv en that th e cu rrent state is x . Note that the actua l value of Q m ( x , · ) is irrele van t w hen j ( x , m ) = 0 . It turns out that, unde r some assumptio ns, the M HG ratio for a mixture kerne l Q can be con ven iently deduce d from the elementary ratio s computed for each indi vidual kern el Q m using the formul a r  x , x ′  = j ( x ′ , m ′ ) j ( x , m ) π (d x ′ ) Q m ′ ( x ′ , d x ) π (d x ) Q m ( x , d x ′ ) , (8) where m ∈ M denotes the sp eciﬁc mov e that has be en used to propose x ′ , and m ′ ∈ M is the corres pondin g “re vers e m o ve” . Equation ( 8 ) is routinely used in appl icatio ns of the RJ-MCMC algori thm, and is alluded to in Green’ s paper [ 17 , p. 717] in the sentence : “ If [other] discr ete variab les ar e gener ated in making pr oposals , the pr obabil ity functions of their r ealized values ar e multiplied i nto the move pr obabilities ”—b ut it is wrong in general. Sufﬁcien t con dition s for Equation ( 8 ) to hold are pro vided by the follo wing result: Pro position 1. Let R m (d x , d x ′ ) = j ( x , m ) π (d x ) Q m ( x , d x ′ ) . Assume that ther e e xists a family of disjoint sets W m ∈ B ⊗ B in de xed by M suc h tha t : i) F or each m ∈ M , R m is supported by W m , w hic h mea ns R m  X 2 \ W m  = 0 . ii) Each move m ∈ M has a unique “re ver se move” ϕ ( m ) ∈ M in the sense that W ϕ ( m ) = W T m , wher e W T m = { ( x ′ , x ) : ( x , x ′ ) ∈ W m } . Then, then MHG r atio is give n by Equa tion ( 8 ) with m ′ = ϕ ( m ) . Pr oof. For π (d x ) Q ( x , d x ′ ) -almost ev erywhere o n X 2 , the re is a un ique m = m x , x ′ ∈ M such that ( x , x ′ ) ∈ W m . Equa tion ( 8 ) can be rewritt en as: r  x , x ′  = R ϕ ( m x , x ′ ) (d x ′ , d x ) R m x , x ′ (d x , d x ′ ) · 4 Then, for all A ∈ B ⊗ B , Z Z A r ( x , x ′ ) R (d x , d x ′ ) = Z Z A R ϕ ( m x , x ′ ) (d x ′ , d x ) R m x , x ′ (d x , d x ′ ) · X m 0 ∈ M R m 0 (d x , d x ′ ) = X m 0 ∈ M Z Z A ∩ W m 0 R ϕ ( m 0 ) (d x ′ , d x ) R m 0 (d x , d x ′ ) R m 0 (d x , d x ′ ) = X m 0 ∈ M Z Z A ∩ W m 0 R ϕ ( m 0 ) (d x ′ , d x ) = X m 0 ∈ M Z Z A T ∩ W T m 0 R ϕ ( m 0 ) (d x , d x ′ ) = Z Z A T R (d x , d x ′ ) becaus e W T m 0 = W ϕ ( m 0 ) = Z Z A R (d x ′ , d x ) . 3.2 Mixtur e rep r esentation of trans-dimensional kern els Consider the ca se of a v aria ble-di mensiona l space, that can be written as X = ∪ k ∈K { k } × Θ k , with K a ﬁnite or countable set (usually K ⊂ N ) a nd Θ k ⊂ R n k . A point x ∈ X is a pair ( k, θ ) with k ∈ K and θ ∈ Θ k . The problem of sampli ng a (po sterior ) distrib ution on such a sp ace typica lly occurs in the conte xt of Bayesian model selectio n or av eragin g. Set X k = { k } × Θ k . Any kernel Q on X admits a natural representa tion as a mixture of ﬁxed- dimensio nal and trans-dimens ional kernels : Q  x , d x ′  = X ( k, l ) ∈K 2 p k ,l ( x ) Q k ,l  x , d x ′  , (9) where p k ,l ( x ) = 1 X k ( x ) Q ( x , X l ) , Q k ,l ( x , · ) = 1 p k ,l ( x ) Q ( x , · ∩ X l ) . (An arbitrary v alue can be chosen for Q k ,l ( x , · ) when p k ,l ( x ) = 0 to make it a complete ly deﬁned tra nsition ker nel.) The k ernels Q k ,k , k ∈ K , corres pond to the “ﬁx ed-d imension al” part of the tran sition kernel Q ; while th e kerne ls Q k ,l , ( k , l ) ∈ K 2 , k 6 = l , corres pond to the “trans- dimensio nal” part. The mixture represent ation ( 9 ) satisfy the assumptions of Prop ositio n 1 with M = K 2 , W k ,l = X k × X l for all ( k , l ) ∈ M and ϕ ( k , l ) = ( l , k ) . Therefore, if the current state x is i n X k and the propo sed state x ′ in X l , the MHG ratio ( 8 ) reads r ( x , x ′ ) = p l,k ( x ′ ) p k ,l ( x ) π (d x ′ ) Q l,k ( x ′ , d x ) π (d x ) Q k ,l ( x , d x ′ ) · (10 ) In most “tut orial” papers about the RJ-MCMC meth od, this expressi on is directly written in the specia l case where Green’ s dimension matching argument can be applied (see, e.g., [ 18 ], Sec- tions 2.2 a nd 2.3). Unfort unatel y , th e dimensio n matching ar gument does not a pply direct ly to the 5 commonly used Birth-or -Death kern els (see next section) if the mixture representa tion ( 9 ), which leads to ( 10 ), is used. 4 Birth-or -Death ker nels 4.1 Birth-or -Death kernels on (un sorted) vectors Let us con sider the situati on where a poin t x ∈ X describes a set of k objects s 1 , . . . , s k ∈ S , with ( S , ν ) an atomless 2 measure space and k ∈ N . One possible—a nd co mmonly used—way of repres enting this is to consider pairs ( k, s ) , where the o bjects s i , 1 ≤ i ≤ k , ha ve b een arranged in a vec tor s = ( s 1 , . . . , s k ) ∈ S k . The correspon ding space is X = ∪ k ≥ 0 X k , X k = { k } × S k , with the con vention that S 0 = { ∅ } . Remark The results that will be presented in this section are easily gener alized if the model includ es addit ional (ﬁx ed-d imension al) parameters tha t a re left unc hange d by the Birth-or -Death mov es (for in stance the parameter s Λ an d δ 2 in a fully Bayes ve rsion of the model presen ted in Section 5 ). Birth-or -death ker nels are th e most natural kin d of tra ns-dimen siona l moves in such spaces. Giv en k ∈ N , s = ( s 1 , . . . , s k ) ∈ S k and s ∗ ∈ S , we introduce the notations s − i = ( s 1 , . . . , s i − 1 , s i +1 , . . . , s k ) ∈ S k − 1 , s ⊕ i s ∗ = ( s 1 , . . . , s i − 1 , s ∗ , s i , . . . , s k ) ∈ S k +1 , where 1 ≤ i ≤ k in the ﬁrst case and 1 ≤ i ≤ k + 1 in the seco nd case. Starting from x = ( k, s ) , a birth move inserts a new component s ∗ ∈ S , generated according t o some pr oposa l dist rib utio n q ( s ) ν (d s ) , at a ran domly selec ted location: Q b ( x , · ) = 1 k + 1 k +1 X i =1 Z S δ ( k +1 , s ⊕ i s ∗ ) q ( s ∗ ) ν (d s ∗ ) . (11) A death mov e, on the cont rary , remov es a ran domly sele cted component form the current state: Q d ( x , · ) = 1 k k X i =1 δ ( k − 1 , s − i ) . (12 ) Finally , the birth-or -death k ernel is a mixture of the two: Q ( x , · ) = p b ( x ) Q b ( x , · ) + p d ( x ) Q d ( x , · ) , (13) with p b ( x ) ≥ 0 , p d ( x ) ≥ 0 , p b ( x ) + p d ( x ) = 1 , and p d ((0 , ∅ )) = 0 . 4.2 Expr ession of the MHG ratio The follo w ing proposi tion provid es the exp ressio n of the MHG ratio for the m odel and ke rnel descri bed in Section 4.1 . 2 See, e.g., [ 14 ]. As a concrete example, t hink of S = R d endo wed wi th its usual Borel σ -algebra an d ν equal to Lebesgue’ s meas ure. W e will use the following property in the proo f of Proposition 2 : if ( S , ν ) i s atomless, then the diagonal ∆ = { ( s, s ) : s ∈ S } is ν ⊗ ν -negligible in S × S . 6 Pro position 2. Assume that, for all k ≥ 1 , th e tar get measur e π r estricted to X k admits a pr oba- bility density function f k with r espect to ν ⊗ k . Then th e M HG ratio i s r ( x , x ′ ) = f k +1 ( x ′ ) f k ( x ) p d ( x ′ ) p b ( x ) 1 q ( s ∗ ) (14) for a birth mo ve fr om x = ( k, s ) to x ′ = ( k + 1 , s ⊕ i s ∗ ) . Pr oof. Although a direct computation of the MHG ratio would be possible based on E qua- tions ( 11 )–( 13 ), we ﬁnd it much m ore illuminating to deduce the result fr om Proposition 1 using ker nels which are simple r than Q b and Q d . T o do so, let us consider the family of element ary ker nels Q m , with m in th e index set M =  ( α, k , i ) ∈ { 0 , 1 } × N 2 : 1 ≤ i ≤ k + α  where Q 1 ,k ,i is the kern el from X k to X k +1 that inserts a ne w component s ∗ ∼ q ( s ) ν ( ds ) in positi on i , and Q 0 ,k ,i is the ker nel from X k to X k − 1 that remov es the i th compone nt. T hen we ca n write Q ( x , · ) = X m ∈ M j ( x , m ) Q m ( x , · ) , (15) with j ( x , m ) deﬁned for all x = ( k, s ) ∈ X as j ( x , m ) =      p b ( x ) / ( k + 1) if m = ( 1 , k , i ) , 1 ≤ i ≤ k + 1 , p d ( x ) /k if m = ( 0 , k , i ) , 1 ≤ i ≤ k, 0 otherwis e. Denote by e X k the set of all x ∈ X k in which no tw o component s are equal. For all k , π ( X k \ e X k ) = 0 , since π | X k admits a density with respec t to the product measure ν ⊗ k . T he mixture repres entatio n ( 15 ) thus satisﬁes the assumptio ns of Proposition 1 with W (1 ,k, i ) = n ( x , x ′ ) ∈ e X k × e X k +1 : ∃ s ∈ S k , ∃ s ∗ ∈ S , x = ( k , s ) , x ′ = ( k + 1 , s ⊕ i s ∗ ) o , W (0 ,k, i ) = W T (1 ,k − 1 , i ) , ϕ (1 , k , i ) = (0 , k + 1 , i ) and ϕ (0 , k , i ) = (1 , k − 1 , i ) . A cco rding to Proposit ion 1 , the MHG ratio for a birth mov e m = (1 , k , i ) is thus r ( x , x ′ ) = p d ( x ′ ) p b ( x ) π (d x ′ ) Q 0 ,k +1 ,i ( x ′ , d x ) π (d x ) Q 1 ,k ,i ( x , d x ′ ) . Observ e that the 1 / ( k + 1) terms, in the mov e selection probabilitie s, cancel each other . T o complete the proo f, it remains to sho w that π (d x ′ ) Q 0 ,k +1 ,i ( x ′ , d x ) π (d x ) Q 1 ,k ,i ( x , d x ′ ) = f k +1 ( x ′ ) f k ( x ) 1 q ( s ∗ ) · (16) This can be o btaine d, in the genera l case 3 , by a dir ect computat ion of the densit ies with respect to the symmetri c m easu re ξ  d( k , s ) , d x ′  = ν ⊗ k (d s )  δ ( k − 1 , s − i ) (d x ′ ) + Z S δ ( k +1 , s ⊕ i s ∗ ) ν (d s ∗ )  . 3 In the important special case where S ⊂ R d and ν is (the restriction of) the d -dimen sional Lebesgu e measure, ( 16 ) can be simply seen as the result of Green ’ s dimension matching argumen t [ 17 , Section 3.3], in a very simple case where the Jacobian is equal to one. 7 W e emphas ize tha t ( 15 ) is not the us ual mixture represen tation of tran s-dimens ional ke rnels introd uced in Sec tion 3.2 . Ind eed, start ing, e.g., fro m X k , th ere a re se veral ele mentary k erne ls that can propose a point in X k +1 . T his shows the u sefuln ess of Propositi on 1 , which pr ovi des s uf ﬁcient condit ions for ( 8 ) to hold be yond the case of the usual mixture represent ation ( 9 ). 4.3 Birth-or -Death kernels on sorted vec tors Let us assume now that the ob jects are “sor ted”, in some sense, befo re be ing arranged in the v ector s = ( s 1 , . . . , s k ) ∈ S k . This happens , in p ractic e, either when there is a natura l orderi ng on the set of objects (e.g., the jump times in si gnal segmentatio n or multiple change-po int proble ms [ 17 , 34 ]) or when artiﬁcial constrai nts ar e intr oduced to resto re identiﬁabil ity in the case o f exch angea ble compone nts (see [ 9 , 23 , 35 , 36 , 46 ] for the case of mixture models). T o formalize this, let us conside r the same space X as in Section 4.1 . Ass ume that S is endo w ed with a total order and that the correspo nding “ sort function” ψ : X → X is measurabl e. What we are assuming now is that the tar get m easu re, denoted by e π in this section, i s suppor ted b y ψ ( X ) —in other word s, the components of x ∈ X are e π -almos t surely sorted. In such a setting , the deﬁnition of the Birth-or -Death kern el has to be slightly modiﬁed in order to acc ommodate the sort constrai nt: the death kernel is uncha nged, but ne w co mponent s are inserte d determinist ically at the only location that mak es the resulting vect or sorted (i nstead of being adde d at a random locati on). Mathematically , for x = ( k, s ) ∈ X k , we no w ha v e: e Q b ( x , · ) = Z S δ ψ ( k +1 , s ⊕ 1 s ∗ ) q ( s ∗ ) ν (d s ∗ ) , e Q d ( x , · ) = 1 k k X i =1 δ ( k − 1 , s − i ) = Q d ( x , · ) . Proceedi ng as in the proof of Proposition 2 , it can be prov ed that the MHG ratio for a birth move from x = ( k , s ) to x ′ = ( k + 1 , s ⊕ i s ∗ ) i s r ( x , x ′ ) = e f k +1 ( x ′ ) e f k ( x ) · p d ( x ′ ) / ( k + 1) p b ( x ) η i ( x ) · 1 q ( s ∗ ) /η i ( x ) , (17) where e f k denote s the pdf of e π on X k and η i ( x ) the probabili ty that s ∗ ∼ q ( s ) ν (d s ) is inserted at locatio n i in x . (Not e that p b ( x ) η i ( x ) is the proba bility of performing a birth m o ve at locatio n i , and p d ( x ′ ) / ( k + 1 ) the probab ility of the rev erse death m o ve; this is the appropria te way of decompo sing this kernel as mixture in order to use Proposition 1 .) Let us now consider the case where, in the setti ng of Section 4.1 , the tar get probabil ity measure π is in var iant under permuta tions of the components indices (in other words, the cor - respon ding random v ariab les are exc han gea ble [ 8 , Chapter 4]). Sorting the compon ents (as an identi ﬁability devic e) is equi v alent to looking at the image measure e π = π ψ , which has the pdf e f k = k ! f k 1 ψ ( X ) on X k . As a conse quence , th e MHG ratios ( 14 ) and ( 17 ) are equal. Remark Another opti on, w hen t he componen ts of the vector ( s 1 , . . . , s k ) are exchang eable, is to for get abou t th e indice s and consider the set { s 1 , . . . , s k } instea d. T he obj ect of interest is then a (random) ﬁnite set of points in S —in other words, a point process on S . The expr ession of the MHG ratio for Birth-or -Death mov es in the point proces s frame wor k, with t he Poiss on p oint proces s as a refer ence measure, has b een gi ve n in [ 15 ] (on e year before the p ublica tion of Green’ s paper [ 17 ]). Point process es ha ve been widely used, since th en, in image processing and object identi ﬁcation (see, e.g., [ 12 , 24 , 41 , 47 ]). 8 5 Example: joint detection and estimation of sinusoids in white Gaussian noise The resu lts present ed in Section 4 can b e used to co mpute the MHG rat io easily in man y sign al decompo sition proble ms. Let us illu strate this with the joint Bay esian model selection and param- eter estimation of sinusoids in white Gauss ian n oise, as ﬁ rst consid ered by [ 1 ]. As explaine d in the introd uction , this se minal pa per introduce d the RJ-MCMC methodology in the signal pro cessin g community , and at the same time intro duced an erro neous exp ressio n of the MHG ra tio that h as been, since th en, reprod uced in a long series o f pap ers. W e follo w closely the mod el an d notat ions of [ 1 ]; the reader is referred to the original paper for more details . Let y = ( y 1 , y 2 , . . . , y N ) t be a vecto r of N observ ations of an observ ed signal. W e consider the ﬁnite family of nested models M 0 ⊂ M 1 ⊂ · · · ⊂ M k max , where M k assumes that y is compo sed of k sinuso ids observe d in white Gaussian nois e. L et ω k = ( ω 1 ,k , . . . , ω k ,k ) and a k =  a c 1 ,k , a s 1 ,k , . . . , a c k,k , a s k,k  be the v ectors of radial frequ encie s and cosine/sin e amplitudes under mode l M k , respec ti ve ly; moreov er , let D k be the correspon ding N × 2 k design matri x. Then, the observ ed signal y follows unde r M k a normal linea r regr ession model: y = D k . a k + n , where n is a white Gaussian noise with va riance σ 2 . The unkno wn parameters are, then, assumed to be the number of co mponent s k , the compo nent-s peciﬁc parameters θ k = { a k , ω k } and the noise v ariance σ 2 which is common to all models. T he joint prior distr ib ution is ch osen to ha ve the foll o wing hierarchic al structure: p  k , θ k , σ 2  = p  a k | k , ω k , σ 2  p  ω k | k  p  k  p  σ 2  , where the prior ove r a k is the con ventiona l g -prio r distrib utio n [ 50 ], which is a zero mean Gaussian with σ 2 δ 2 ( D t k D k ) − 1 as its cov aria nce matrix. Conditional on k , the radial frequencie s are inde- pende nt and identically distrib uted, with a uniform dis trib ution on (0 , π ) . The n oise v arianc e σ 2 is endo wed with Jeff reys improper prior , i.e. p ( σ 2 ) ∝ 1 /σ 2 . The number of components k is giv en a Poisson distrib utio n with mean Λ , truncated to { 0 , 1 , . . . , k max } . The parameters a k and σ 2 can be inte grat ed out analytica lly , and the resu lting marg inal posterior becomes p ( k , ω k | y ) ∝ ( y t P k y ) − N/ 2 Λ k π − k k ! ( δ 2 + 1) k 1 (0 ,π ) k ( ω k ) , (18) with P k = I N − δ 2 1 + δ 2 D k  D t k D k  − 1 D t k when k ≥ 1 an d P 0 = I N . Inferen ce under this hierarc hical Bayesian m odel is carried out in [ 1 ] using an RJ-MCMC sampler on X = S k max k =0 { k } × (0 , π ) k with tar get density ( 18 ). W e only focus here on the “between- models” mov es, which are Birth-or -Death mov es of the kind describ ed in Section 4.1 , with a unifor m density on (0 , π ) for the prop osal distrib ution of the ne w frequenc y in the birth mov es. Let u s no w c ompute the M HG r atio for a birth mov e. Note that the p osterio r density ( 18 ) is written in the case of “u nsorte d” compone nts d escribe d in Sections 4.1 – 4.2 . W e shall therefore make use of Propositio n 2 , which assumes that new component is inserted at a random position i (all co mponen ts being selected with th e same pro babilit y). The correct MHG rati o, for a birth mov e from x = ( k , ω k ) t o x ′ = ( k + 1 , ω k ⊕ i ω ∗ ) , turns out to be r ( x , x ′ ) = p ( k + 1 , ω k ⊕ i ω ∗ | y ) p ( k , ω k | y ) × p d ( x ′ ) p b ( x ) × 1 q ( ω ∗ ) , (19) 9 pdf k 0 2 4 6 8 10 0 0 . 2 0 . 4 Figure 1: The pdf ’ s o f Poisson (gray) and accelerate d Poiss on (black) distrib ution s with m ean Λ = 5 . Both distri b ution s are truncate d to the set { 0 , . . . , 32 } . where q denotes the uniform distrib utio n of (0 , π ) . Using p d ( x ′ ) p b ( x ) = p 0 ( k ) p 0 ( k + 1) = k + 1 Λ as in [ 1 ], with p 0 standi ng for the (truncate d Poisson) prior distrib ution o f k , we ﬁnally ﬁnd r ( x , x ′ ) =  y t P k +1 y y t P k y  − N/ 2 Λ π − 1 (1 + k )(1 + δ 2 ) × k + 1 Λ × 1 π − 1 =  y t P k +1 y y t P k y  − N/ 2 1 1 + δ 2 · (20) Note tha t the e xpr ession of the ratio pro posed in [ 1 , Equation (2 0)] dif fers from the on e we ﬁnd here by a factor 1 / ( k + 1) . A s imilar mistak e in computin g R J-MCMC ratios has be en repo rted in the ﬁeld of gene tics [ 22 , 44 ]. In fa ct, using the expr ession of the birth ratio with an additio nal factor of 1 / ( k + 1) , as in [ 1 ], amounts to assigning a dif feren t prior distrib utio n over k called “acceler ated Poisson distrib u- tion” [ 44 ] which read s p 2 ( k ) ∝ e − Λ Λ k ( k !) 2 1 N ( k ) . (21) Figure 1 illustrates the differe nce between both the accelerated (black) and the usual (gray) Poisson distrib utions whe n mean Λ = 5 . It can be observ ed th at the accelerated Poisson distrib ution ( 21 ) puts a stron ger emphasis on “spar se” models, i.e., models with a small number of componen ts. Let us consi der an exp eriment in which the observe d signal of length N = 64 consists of k = 3 sinu soida l components with the radial frequen cies ω k = (0 . 63 , 0 . 68 , 0 . 73 ) t and amplitud es a 2 c i,k + a 2 s i,k = (20 , 6 . 32 , 20) t , 1 ≤ i ≤ k . The signal to nois e ratio, deﬁned as SNR , k D k . a k k 2 / ( N σ 2 ) , is set to a moderate value of 7 dB. Samples from the poste- rior distrib ution of k are ob tained using th e RJ-MCMC sa mpler of [ 1 ], with an in verse Gamm a prior I G (2 , 100) on δ 2 and a Gamm a prior G (1 , 10 − 3 ) on Λ . For each observ ed si gnal in 100 r epli- cation s of the e xperiment, the sampler was run twic e: once with the correct exp ressio n of the rati o, gi ve n by ( 20 ), and once with the erroneous expre ssion from [ 1 ]. Figure 2 shows the frequenc y of 10 probab ility k 0 2 4 6 8 10 0 0 . 2 0 . 4 0 . 6 0 . 8 Figure 2: F requ enc y of selectio n for each model M k for 100 re plicati ons of the exp eriment de- scribe d in Section 5 , using the expressio n of th e ratio gi ven in [ 1 , Equation (20) ] (black) and the correc ted ra tio ( 20 ) (gray). There are k = 3 sinus oidal co mponen ts in the observed s ignal y and the SN R = 7 dB. 100k sampl es were generate d using RJ-MCMC sampler and th e ﬁ rst 20k w ere discar ded as b urn-i n period. selecti on of each model under both the Poisson and the accelerated P oiss on distrib utio n as a prior for k . It appears th at the ( uninte nded) use of t he accelerat ed Poisson distrib ution, induced b y the errone ous e xpression of the MHG ratio, can result in a signiﬁcant shift to the left of the posterior distrib ution of k . Remark W orking with “sorted” v ectors of frequen cies would be quite nat ural in this proble m, since the frequenci es are exchange able under the p osterio r ( 18 ). As explained in S ecti on 4.3 , th e exp ressio n of the MHG ratio would be the same. Remark The reason why the MH G ratio in [ 1 ] is wron g can be understoo d from a subse quent paper [ 4 ], where the same computat ion is e xpl ained in greater detail. T here we can see that the author s, wor king with an “unsorted v ector” re presen tation , consider t hat the new component in a birth move is inserted a t t he end . The death mo v e, h o we ver , is deﬁned as in the pres ent p aper: a sinuso id to be removed is sel ected randomly among the e xistin g componen ts. Here is th e mistak e: if the ne w comp onent is inserted at the end during a birth move , then an y attempt at removing a compone nt which is not the last one should be rejec ted during a de ath move. In other words, the acceptance probability shou ld be zero when an y component b ut the last on e is pic ked to be remov ed during a death move . 6 Conclusion The c omputat ion of MHG ra tios is a delicate matter in v olving measu re-theo retic considerati ons, for which practitioner s need clear mathematica l statements that can be used “out of the box”. Such a st atement has been a vailab le for a long time in the c lassic al ﬁ x ed-dime nsiona l M etro polis- Hastings sample r , and more recently pro vided by Green [ 17 ] for trans-d imensio nal mov es that comply with the assu mptions of his dimensio n m atch ing arg ument. In this note , we hav e prov ided the express ion of the M HG ra tio for Birth-or -Death moves, using a general result for mixtu res of propo sal kernels , and corrected the erro neous express ion pro vided by [ 1 ]. A similar corr ection has to be ap plied to th e ratios used in the long series of 11 signal process ing pap ers [ 3 – 6 , 10 , 2 0 , 25 – 27 , 30 , 3 1 , 40 , 4 2 , 43 ] that h a ve been found to co ntain the same mistak e. While writing this note, we discov ered that a ve ry similar mistak e had been detected and correc ted in the ﬁeld of geneti cs by [ 22 ], from which w e borro w our concl uding words : The fact tha t this err or has r emained in the li tera tur e for o ver 5 year s [12 years in the pre sent case] under sco r es the vie w tha t while Bayesia n analysis using Marko v chain Monte Carl o is incr edibly ﬂe xible and ther efor e powerful, the de vil is in the details. F urth ermor e, incorr ect analyses can give r esults that se em q uite r easonabl e. Refer ences [1] C. Andri eu an d A. D ouce t. Joint Baye sian mode l sele ction and es timation o f no isy sinu soids via re versible jump MCMC. IEEE T rans action s on Si gnal Pr ocessing , 47(10):2 667–2 676, 1999. [2] C. Andrieu and ´ E. Moulines . On the er godicity prop erties of some adapti ve MCMC algo- rithms. Annals of Applied Pr obabil ity , 16(3):1462 –1505 , 2006. [3] C. Andrieu, N. Freitas, and A. Doucet . Rev ersib le Jump MCMC Simulated Anneal ing for Neural Netwo rks. In Pr oceedings of the 16th Conf er ence on U ncer tainty in Artiﬁcial Intel li- gen ce , pages 11–18. Morga n Kaufmann Publishers Inc., 2000. [4] C. Andrieu, PM Djuri ´ c, and A. D ouc et. Model selection by MCMC computati on. Sig nal Pr ocessing , 81(1):19 –37, 2001. [5] C. Andrieu, N. Freitas , and A. Doucet. Robus t fu ll Bayesian learning for radial basis net- works . Neural C omputa tion , 13(10):2 359–2 407, 2001. [6] C. And rieu, E. Barat, and A. Doucet. Bayesia n decon volu tion of noi sy ﬁltered point pro- cesses . IE E E T r ansac tions on S ignal Pr ocessing , 49(1):134 –146, 2002. [7] Yves F . Atchad ´ e an d J. S. Rosenthal . On ada pti v e Markov chain Monte Carlo algo rithms. Bernoull i , 11(5):815 –828, 2005. [8] J.M. Bernardo and A.F .M. Smith. Bayes ian theo ry . Wi ley , 2000. [9] O. Capp ´ e, C.P . Robert, and T . Ryd ´ en. Re vers ible jump, birth-and -death and more general contin uous time Marko v chain Monte Carlo samplers. Jour nal of the Royal Statistic al Soci- ety: Seri es B , 65(3) :679– 700, 2 003. [10] M. Da vy , S . J. Godsill, an d J. Idie r . Baye sian analys is of polyphon ic wester n tonal music. J ourna l of the Acoustica l Society of America , 119:2498 –2517 , 2006 . [11] X. Descombes, MNM va n L iesh out, R. Stoica, and J. Zerubia. Parameter estimation by a Marko v cha in Monte C arlo tech nique for the C andy model . In IEEE W orks hop Statistica l Signa l Pr ocessing , Singap or e , 200 1. [12] X. Descombe s, F . Kruggel , G. W ollny , and H.J. Gertz. An object -based approach for detect- ing small brain lesions: appli cation t o V ircho w-Robin spaces. IEEE T ra nsacti ons on Medical Imagi ng , 23(2):24 6–255 , 2004 . [13] P . Diaconis , S. Holmes, an d R.M. Neal. A naly sis of a non re ve rsible mar ko v chain sampler . Annals of Applied Pr obability , 10(3):726 –752 , 2000. 12 [14] Davi d H. Fremlin. Measur e Theory , V olume 2 : Br oad F ounda tions . T orres Fremlin, 2001. [15] C. J. G e yer an d J. Møller . Simulation pr ocedu res and lik elihood inference for spatial poin t proces ses. Scandinavi an J ournal of S tatist ics , 21(4):35 9–373 , 19 94. [16] W .R. Gilks, S. Richard son, and D.J. Spiegelha lter . Marko v chain Mon te Carl o in practice . Chapman & Hall/CRC, 199 6. [17] P . J. Green. Rev ersible jump MCMC comput ation and Bayesi an m ode l determinat ion. Biometrika , 82(4 ):711– 732, 1 995. [18] P . J. Green. T rans- dimensio nal Mark o v chain Monte Carlo . In P . J. Green, N. L. Hjort, a nd S. Richards on, editors, H ighly Struc tur ed Stochas tic Systems , pages 179–19 8. O.U .P ., 2003. [19] W .K. Hasting s. Monte Carlo sampling methods using Marko v chains and their applicati ons. Biometrika , 57(1 ):97–1 09, 1970. [20] M. Hong, M.F . B uga llo, and P .M. Djuric. Joint M ode l Selection and Parameter Estimation by Populati on Monte Carlo Simulation. IEEE Jou rnal of Selecte d T opics in Signal Pr ocessing , 4(3):5 26–53 9, 2 010. [21] M. Hurn and H. Rue. High level ima ge prio rs in confocal micr oscopy ap plicat ions , pages 36–43 . 1997. [22] J.-L. Jannink and R. L . Fernando . On the Metropolis-Has tings accepta nce probabili ty to add o r dro p a quantitati ve trait locus in Marko v chain Mo nte Carl o-base d Bayesian analy ses. Genetics , 166:641–6 43, 2004. [23] A. Jasra, C C Holmes, an d D A S teph ens. Marko v chain Mon te Carlo methods an d the label switchin g problem in Bayesian mixture modeling . Statistical Science , 20(1):50 –67, 2005. [24] C. Lacoste, X. Descombes, an d J. Zerubia. Point proces ses for unsupervise d line network ex- tractio n in remote sensing. IE EE T ra nsacti ons on P attern Analysis and M achine Intellig ence , pages 156 8–157 9, 2 005. [25] Jean Ren ´ e L aroc que and Ja mes P . Reilly . Re versible jump MCMC for joint detection and estimatio n of sources in coloured nois e. IEEE T ransac tions on Signal P r ocessin g , 50:231– 240, 200 2. [26] J.R. Larocque, J.P . R eilly , and W . Ng. Pa rticle ﬁlters for track ing an unkno w n nu mber of source s. IEEE T ra nsacti ons on Si gnal Pr ocessin g , 50(12):29 26–2 937, 2002 . [27] L. Melie-Garc ´ ıa, E.J. Canales-Rod r ´ ıguez, Y . Alem ´ an-G ´ o mez, C.P . Lin, Y . It urria-Me dina, and P .A . V ald ´ es-Hern ´ andez. A Bayesian fra me work to identify princip al intra vox el dif fusio n proﬁles based on dif fusion-weigh ted MR imagin g. Neur oimage , 42(2):7 50–77 0, 200 8. [28] N. Metropolis , A.W . Rosenbluth, M.N. Rosenblu th, A.H . T eller , and E. T eller . E qua tion of state calcu lation s by f ast computing machin es. The J ournal of Chemical Physic s , 21(6): 1087, 1953 . [29] S.P . Meyn, R.L. T weedie, and JL Hibey . Marko v Chains and S toc hastic Stabili ty . Springer - V erlag London , 1993. [30] W . Ng, J.P . Reilly , T . Kiruba rajan, and J.R. Larocque. Wi deban d array signal proc essing using MCMC method s. IEE E T ransactio ns on Sig nal Pr ocessin g , 53(2):4 11–42 6, 200 5. 13 [31] W . Ng, T . Ch an, HC So, an d KC Ho. On p articl e ﬁlters for landmin e detect ion using impulse groun d penetrating rad ar. In Pr oceedings of the 5th IEEE W orksho p on Senso r Arr ay and Multic hanne l Signal Pr ocessin g , pages 225–22 8, 2008. [32] G.K. Nicholls. Bayesian image analysis with Markov chain Monte Carlo and coloured con- tinuum triangula tion models. J ourna l of the Royal Statistical S ociety: Series B , 60(3):643– 659, 199 8. [33] A. P ie vatol o and PJ G reen . Objec t re storat ion through d ynamic polygons. J ourn al of the Royal Statistical Society : Series B , 60:609 –26, 1998. [34] E. Punskay a, C. Andrieu, A. Doucet, a nd WJ F itzge rald. Bayesian curve ﬁtting using MCMC with applic ations to signal segme ntation . IEEE T ransactio ns on Signal Pr ocessing , 50(3): 747–7 58, 2002. [35] S. Richardson and P . J. Green. On Baye sian analysis of mix tures with an unkno w n number of compon ents. J ourna l of the Royal St atistic al Society: Series B , 59(4) :731– 792, 1997. [36] S. Richardson and P . J. Green. Corrigen dum: On Bayesian analysis of mixtures with an unkno wn number of compon ents. Jou rnal of the R oya l S tatisti cal Society : Series B , 60(3): 661, 199 8. [37] C.P . Robert and G. Casella. M onte Carlo Statis tical Methods (second edition) . Spring er V erlag, 200 4. [38] G.O. Roberts a nd J.S. R osen thal. G ener al state sp ace Markov chains and MCMC algorithms . Pr obability Su rve ys , 1:20–71, 2004. [39] A. R ooda ki. Signal Decompositio ns using T r ans-di mension al B ayes ian Methods . PhD thesis, ´ Ecole Sup ´ erieure d’ ´ Electricit ´ e (Sup ´ elec), Gif-sur -Yvett e, France, 2012. [40] D.V . Rubtso v and J.L. Grif ﬁn. Time-d omain Bayesian detectio n and estimatio n of noisy damped sinusoidal sign als applied t o N MR spectrosco py . J ourna l of Magnetic Reso nance , 188(2 ):367– 379, 2 007. [41] H. Rue and M.A. Hurn. Bayesian object identiﬁcatio n. Biometrika , 86(3):6 49–66 0, 199 9. [42] Mikkel N. Schmidt and Morten Mørup. Inﬁnite non- neg ati v e matrix f actori zation . In Pr o- ceedin gs of th e 18 th Eur opean Signal Pr ocessing Confer ence (EUSIPCO) , Aug 2010 . [43] Z.G. Shi, J.X. Z hou , H.Z. Zhao, and Q. Fu. S tud y on join t Bayesian model sele ction and pa- rameter estimation method o f GTD mod el. Scienc e i n China Series F : Information Sciences , 50(2): 261–2 72, 2007. [44] M.J. Sillanp aa, D . Gasba rra, and E. Arjas. Comment on “ On the Metr opolis -Hasting s Ac- ceptan ce P rob ability to Add or Drop a Quantitati v e Trait Locus in Marko v C hain Monte Carlo-Based Bayesian Analyse s”. Genetics , 167( 2):103 7, 2 004. [45] S. A. Sisson. Tran sdimensi onal Mark ov chains: a decade of progress and fut ure perspecti ves. J ourna l of the American S tatisti cal Association , 100(471 ):107 7–1090, 2005. [46] M. S teph ens. Dealin g with label switching in mixture models. J ournal of the R oya l Statistical Societ y: Series B , pag es 795–809, 2000. [47] R. Stoica, X. Descombes, and J. Zerubia. A Gibbs point process for road extracti on fro m remotely sen sed images. Internat ional Jo urnal o f Computer V ision , 57(2):121– 136, 2 004. 14 [48] L. Tie rne y . M ark o v chains for e xplo ring posterior dist rib utio ns. Annals of S tatisti cs , 22(4) : 1701– 1728, 1994. [49] L. T ierne y . A note on Metropoli s-Hastin gs kerne ls for general state spaces. Annals of A ppli ed Pr obability , pages 1–9, 1998. [50] A. Zellne r . On assess ing prior distr ib utio ns and bay esian reg ressio n analys is with g-prio r distrib utions. In P . K. Goel and A. Zell ner , editors, Bayesian I nfer ence and D ecision T ech- niques : Essays in Honor of Bruno de F inetti , pages 233–24 3. N orth -Holland /Else vier , 1986. 15

Note on the computation of the Metropolis-Hastings ratio for Birth-or-Death moves in trans-dimensional MCMC algorithms for signal decomposition problems

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment