Beta-Product Poisson-Dirichlet Processes
Authors: Federico Bassetti, Roberto Casarin, Fabrizio Leisen
Abstract. Time series data may exhibit clustering over time and, in a multiple time series context, the clustering behavior may differ across the series. This paper is motivated by the Bayesian non-parametric modeling of the dependence between the clustering structures and the distributions of different time series. We follow a Dirichlet process mixture approach and introduce a new class of multivariate dependent Dirichlet processes (DDP). The proposed DDP are represented in terms of vectors of stick-breaking processes with dependent weights. The weights are beta random vectors that determine different and dependent clustering effects along the dimensions of the DDP vector. We discuss some theoretical properties and provide an efficient Markov chain Monte Carlo algorithm for posterior computation. The effectiveness of the method is illustrated with a simulation study and an application to the United States and the European Union industrial production indexes.

JEL: C11, C14, C32. Keywords: Bayesian non-parametrics, Dirichlet process, Poisson-Dirichlet process, Multiple time-series non-parametrics.

1. Introduction

This paper is concerned with some multivariate extensions of the Poisson-Dirichlet process. The class of models considered here originates from the Ferguson Dirichlet process (DP) (see Ferguson (1973, 1974)), which is now widely used in non-parametric Bayesian statistics. Our extensions rely upon the so-called Sethuraman representation of the DP.
Sethuraman (1994) shows that, given a Polish space (X, 𝒳) and a probability measure H_0 on X, a random probability measure G on X is a Dirichlet process of precision parameter α > 0 and base measure H_0, in short DP(α, H_0), if and only if it has the stick-breaking representation

(1) G(\cdot) = \sum_{k \ge 1} W_k \, \delta_{\tilde\varphi_k}(\cdot),

where (\tilde\varphi_k)_k and (W_k)_k are stochastically independent, (\tilde\varphi_k)_k is a sequence of independent random variables (atoms) with common distribution H_0 (base measure), and the weights W_k are defined by the stick-breaking construction

(2) W_k := S_k \prod_{j < k} (1 - S_j),

the stick variables S_k being independent with S_k ∼ Beta(1, α). A two-parameter Poisson-Dirichlet process PD(α, l, H_0), with parameters 0 ≤ l < 1 and α > −l and base measure H_0, is a random probability measure that can be represented with a Sethuraman-like construction by taking in (1)-(2) a sequence of independent random variables S_k with S_k ∼ Beta(1 − l, α + lk); see Pitman (2006) and Pitman and Yor (1997). Further generalizations based on the stick-breaking construction of the DP can be found in Ishwaran and James (2001). The DP and its univariate extensions are now widely used in Bayesian non-parametric statistics; a recent account of Bayesian non-parametric inference can be found in Hjort et al. (2010). The univariate DP is usually employed as a prior for a mixing distribution, resulting in a DP mixture (DPM) model (see for example Lo (1984)). DPM models incorporate DP priors for parameters in Bayesian hierarchical models, providing an extremely flexible class of models. More specifically, a DPM model defines a random density by setting

(3) f(y) = \int K(y \mid \theta) \, G(d\theta) = \sum_{k \ge 1} W_k \, K(y \mid \tilde\varphi_k),

where K is a suitable density kernel. Due to the availability of simple and efficient methods for posterior computation, starting from Escobar (1994) and Escobar and West (1995), DPM models are now routinely implemented and used in many fields.
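As an illustration, the stick-breaking representation (1)-(2) can be simulated by truncation. The sketch below is not part of the paper: the truncation level and the standard-normal base measure are arbitrary choices for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_stick_breaking(alpha, base_sampler, truncation=500):
    """Approximate draw of G ~ DP(alpha, H0) via (1)-(2), truncated at `truncation` atoms."""
    S = rng.beta(1.0, alpha, size=truncation)                    # stick variables S_k ~ Beta(1, alpha)
    leftover = np.concatenate(([1.0], np.cumprod(1.0 - S)[:-1]))
    W = S * leftover                                             # W_k = S_k * prod_{j<k} (1 - S_j)
    atoms = base_sampler(truncation)                             # atoms ~ H0, independent of the weights
    return W, atoms

W, atoms = dp_stick_breaking(2.0, lambda n: rng.normal(size=n))
print(W.sum())  # the truncated weights nearly exhaust the unit stick
```

For a long enough truncation the leftover mass \prod_{j}(1 - S_j) is negligible, so the finite sum is a good approximation of the infinite one.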
The first aim of this paper is to introduce a new class of vectors of Poisson-Dirichlet processes and of DPM models. Vectors of random probability measures arise naturally in generalizations of the DP and DPM models that accommodate dependence of observations on covariates or time. Using covariates, data may be divided into different groups; this leads one to consider group-specific random probability measures and, as we shall see, to assume that the observations are partially exchangeable. Probably the first paper in this direction, which introduced vectors of priors for partially exchangeable data, is Cifarelli and Regazzini (1978). More recently, MacEachern (1999, 2001) introduced the so-called dependent Dirichlet process (DDP) and the DDP mixture models. His specification of the DDP incorporates dependence on covariates through the atoms while assuming fixed weights. More specifically, the atoms in the Sethuraman representation (1)-(3) are replaced with stochastic processes \tilde\varphi_k(z), z being a set of covariates. There exist many applications of this specification of the DDP. For instance, De Iorio et al. (2004) proposed an ANOVA-type dependence structure for the atoms, while Gelfand et al. (2004) considered a spatial dependence structure for the atoms. Later, DDP with both dependent atoms and weights were introduced in Griffin and Steel (2006). Other constructions that incorporate a dependence structure in the weights have been proposed, for instance, in Duan et al. (2007); Chung and Dunson (2011); Dunson and Peddada (2008); Dunson et al. (2008) and Rodriguez et al. (2010). Other approaches to the definition of dependent vectors of random measures rely upon either suitable convex combinations of independent DPs (e.g., Müller et al. (2004); Pennell and Dunson (2006); Hatjispyrosa et al.
(2011); Kolossiatis et al. (2011)) or hierarchical structures of stick-breakings (e.g., Teh et al. (2006); Sudderth and Jordan (2009)). Finally, we should note that it is possible to follow alternative routes, other than the Sethuraman representation, to the definition of vectors of dependent random probabilities. For example, Leisen and Lijoi (2011) used normalized vectors of completely random measures, while Ishwaran and Zarepour (2009) employed bivariate gamma processes. In this paper, we introduce a new class of multivariate Poisson-DP and DP by using a vector of stick-breaking processes with multivariate dependent weights. In the construction of the dependent weights, we consider the class of multivariate beta distributions introduced by Nadarajah and Kotz (2005), which has a tractable stochastic representation and makes the Bayesian inference procedures easier. We discuss some properties of the resulting multivariate DP and Poisson-DP and show that our process has the appealing property that the marginal distributions are DP or Poisson-DP. The second aim of the paper is to apply the new DP to Bayesian non-parametric inference and to provide a simple and efficient method for posterior computation of DDP mixture models. We follow a data-augmentation framework and extend to the multivariate context the slice sampling algorithm described in Walker (2007) and Kalli et al. (2011). The sampling methods for the full conditional distributions of the resulting Gibbs sampling procedure are detailed, and the effectiveness of the proposed algorithm is studied with a set of simulation experiments. Another contribution of this paper is to present an application of the proposed multivariate DDP mixture models to multivariate time series modeling. In recent years, the interest in Bayesian non-parametric models for time series has increased.
In this context, DPs have recently been employed in different ways. Rodriguez and ter Horst (2008) used a Dirichlet process to define an infinite mixture of time series models. Taddy and Kottas (2009) proposed a Markov-switching finite mixture of independent Dirichlet process mixtures. Jensen and Maheu (2010) considered Dirichlet process mixtures of stochastic volatility models, and Griffin (2011) proposed a continuous-time non-parametric model for volatility. A flexible non-parametric model with a time-varying stick-breaking process has recently been proposed by Griffin and Steel (2011). In their model, a sequence of dependent Dirichlet processes is used for capturing time-variations in the clustering structure of a set of time series. In our paper, we extend the existing Bayesian non-parametric models for multiple time series by allowing each series to have a possibly different clustering structure and by accounting for dependence between the series-specific clusterings. Since we obtain a dynamic infinite-mixture model, and since the number of components with negligible weights can be different in each series, our model represents a non-parametric alternative to multivariate dynamic finite-mixture models (e.g., Markov-switching models) that are usually employed in time series analysis. The structure of the paper is as follows. Section 2 introduces vectors of dependent stick-breaking processes and discusses their properties. Section 3 introduces vectors of Poisson-Dirichlet processes for prior modelling. Section 4 proposes a Markov chain Monte Carlo (MCMC) algorithm for approximate inference for vectors of DP mixtures. Section 5 provides some applications to both simulated data and to the time series of the industrial production index for the United States and the European Union.
Section 6 concludes the paper.

2. Dependent stick-breaking processes

Consider a set of observations, taking values in a space Y, say a subset of R^d, divided into r sub-samples (or groups of observations), that is, Y_{ij}, i = 1, ..., r, j = 1, ..., n_i. Here Y_{ij} is the j-th observation within sub-sample i. For instance, i may correspond to a space label or to predictors. Typically, one assumes that the observations of block i have the same (conditional) density f_i and that the observations are (conditionally) independent. Hence, to perform a non-parametric Bayesian analysis of the data, one needs to specify a prior distribution for the vector of densities (f_1, f_2, ..., f_r). Moreover, in assessing a prior for (f_1, f_2, ..., f_r), a possible interest is in borrowing information across blocks. To do this, we first introduce a sequence of density kernels K_i : Y × X → [0, 1] (i = 1, ..., r), where K_i is jointly measurable and C \mapsto \int_C K_i(y \mid x) \nu(dy) defines a probability measure on Y for any x in X, ν being a dominating measure on Y. Secondly, we define

(4) f_i(y) := \int K_i(y \mid \theta) \, G_i(d\theta), \qquad i = 1, ..., r,

where (G_1, ..., G_r) is a vector of dependent stick-breaking processes that will be defined in the next section.

2.1. Vectors of stick-breaking processes. Following a general definition of dependent stick-breaking processes, essentially proposed in MacEachern (1999, 2001), we let

(5) G_i(\cdot) := \sum_{k \ge 1} W_{ik} \, \delta_{\tilde\varphi_{ik}}(\cdot), \qquad i = 1, ..., r,

where the vectors of weights W_k = (W_{1k}, ..., W_{rk}) and the atoms \tilde\varphi_k = (\tilde\varphi_{1k}, ..., \tilde\varphi_{rk}) satisfy the following hypotheses:
• (\tilde\varphi_k)_k and (W_k)_k are stochastically independent;
• (\tilde\varphi_k)_k is an i.i.d. sequence taking values in X^r with common probability distribution G_0;
• (W_k)_k is determined via the stick-breaking construction

W_{ik} := S_{ik} \prod_{j < k} (1 - S_{ij}),

where the S_k = (S_{1k}, ..., S_{rk}) are stochastically independent random vectors taking values in [0, 1]^r such that \sum_{k \ge 1} W_{ik} = 1 a.s. for every i.

Note that (G_1, ..., G_r) is a vector of dependent random measures whenever (S_{11}, ..., S_{r1}) or (\tilde\varphi_{11}, ..., \tilde\varphi_{r1}) are vectors of dependent random variables. The dependence between the measures affects the dependence structure underlying the densities f_1, ..., f_r, which can be represented as the infinite mixtures

(6) f_i(y) = \sum_{k \ge 1} W_{ik} \, K_i(y \mid \tilde\varphi_{ik}), \qquad i = 1, ..., r,

functions of the atoms (\tilde\varphi_k)_k and the weights (W_k)_k. The above definition of dependent random measures is quite general. For the sake of completeness, we note that our specification of vectors of stick-breaking processes can be extended, even if not straightforwardly, to include richer structures such as the matrix of stick-breaking processes proposed in Dunson et al. (2008). In the rest of this section we briefly discuss the choice of the atoms {\tilde\varphi_k}_k and analyze some general features of vectors of stick-breaking processes, while in the next section we focus on the main contribution of this work, which is a new specification of the stick vectors {S_k}_k based on multivariate beta distributions.

2.2. Atoms. The simplest assumption for the atoms is that they are common to all the measures G_i. Otherwise stated, this means that the base measure of the atoms is

(7) G_0(A_1 × A_2 × \cdots × A_r) = H_0(\cap_i A_i)

for every A_1, ..., A_r measurable subsets of X, H_0 being a probability measure on X, which corresponds to the case

(8) (\tilde\varphi_{1k}, ..., \tilde\varphi_{rk}) = (\tilde\varphi_{0k}, \tilde\varphi_{0k}, ..., \tilde\varphi_{0k})

with \tilde\varphi_{0k} distributed according to H_0.
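A minimal numerical sketch of the construction in Subsection 2.1, with the common-atoms choice (7)-(8). The Beta(1, 1) stick factors with a shared factor V_{0k}, and the Gaussian base measure, are placeholder choices for illustration only, not the beta-product specification proposed later in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
r, K = 2, 400                       # number of groups and truncation level (illustrative)

# Dependent stick vectors S_k = (S_1k, ..., S_rk): i.i.d. across k (as in hypothesis (10)),
# dependent across i through a shared factor V_0k (a stand-in for Section 3's beta products).
V0 = rng.beta(1.0, 1.0, size=K)
S = np.stack([V0 * rng.beta(1.0, 1.0, size=K) for _ in range(r)])   # shape (r, K)

# W_ik = S_ik * prod_{j<k} (1 - S_ij), computed row by row
leftover = np.concatenate([np.ones((r, 1)), np.cumprod(1.0 - S, axis=1)[:, :-1]], axis=1)
W = S * leftover

phi = rng.normal(size=K)            # common atoms as in (8), phi_0k ~ H0 = N(0, 1)
G_at_A = (W * (phi <= 0.0)).sum(axis=1)   # G_i(A) for A = (-inf, 0], i = 1, ..., r
print(G_at_A)
```

The two components of `G_at_A` are dependent random probabilities: they share atoms and their stick vectors share the factor V_{0k}.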
Eventually one can choose a more complex structure for the law of the atoms, including covariates (or exogenous effects) related to the specific block i. For instance, one can assume an ANOVA-like scheme of the form

(9) (\tilde\varphi_{1k}, ..., \tilde\varphi_{rk}) = (\tilde\mu_{0k} + \tilde\mu_{1k}, \tilde\mu_{0k} + \tilde\mu_{2k}, ..., \tilde\mu_{0k} + \tilde\mu_{rk}),

where \tilde\mu_{0k} represents the overall "mean" (of the k-th mixture component) and \tilde\mu_{ik} the specific "mean" for factor i (of the k-th mixture component). A similar choice has been used in De Iorio et al. (2004). In many situations it is reasonable to assume that the components of the mixture are essentially the same for all the blocks but that they have different weights. In addition, the choice (7)-(8) yields a simple form for the correlation between the related random measures. This feature, which may be useful in parameter elicitation, is discussed in the next subsection.

2.3. Correlation and Dependence Structure. In the general definition given in Subsection 2.1 the vectors S_k of stick variables are assumed to be independent. If in addition one assumes that they have the same distribution, i.e. that

(10) (S_k)_{k \ge 1} is a sequence of i.i.d. random vectors,

then it is easy to compute the correlation between two elements of the vector (G_1, ..., G_r), as well as the correlation between f_i(y) and f_j(y). For the sake of simplicity we shall consider only the case i = 1, j = 2 and set

(11) C_{1,2} := \frac{E[S_{11} S_{21}]}{1 - E[(1 - S_{11})(1 - S_{21})]} \sqrt{\frac{(2 E[S_{11}] - E[S_{11}^2])(2 E[S_{21}] - E[S_{21}^2])}{E[S_{11}^2] \, E[S_{21}^2]}}

and, for every y,

\kappa_{G_{01}}(y) := \int K_1(y \mid x) \, G_{01}(dx), \qquad \kappa_{G_{02}}(y) := \int K_2(y \mid x) \, G_{02}(dx),

\kappa_{G_0}(y) := \int K_1(y \mid x_1) K_2(y \mid x_2) \, G_0(dx_1 × dx_2 × X^{r-2}),

where G_{0i} denotes the i-th marginal of G_0.

Proposition 1.
Assume that (10) holds true. Then for all measurable sets A and B

(12) Cor(G_1(A), G_2(B)) = C_{1,2} × \frac{G_0(A × B × X^{r-2}) - G_{01}(A) G_{02}(B)}{\sqrt{G_{01}(A)(1 - G_{01}(A)) \, G_{02}(B)(1 - G_{02}(B))}},

and for every y in Y

(13) Cor(f_1(y), f_2(y)) = C_{1,2} × \frac{\kappa_{G_0}(y) - \kappa_{G_{01}}(y) \kappa_{G_{02}}(y)}{\sqrt{\kappa_{G_{01}}(y)(1 - \kappa_{G_{01}}(y)) \, \kappa_{G_{02}}(y)(1 - \kappa_{G_{02}}(y))}}.

The proof of Proposition 1 is in the Appendix.

Corollary 2. Assume that (8) and (10) hold true. Then for every measurable set A

(14) Cor(G_1(A), G_2(A)) = C_{1,2},

where C_{1,2} is defined in (11).

2.4. Partial exchangeability. We conclude this section by observing that [Y_{ij} : i = 1, ..., r, j ≥ 1] is a partially exchangeable random array; indeed the joint law of the infinite process of observations is characterized by

(15) P{Y_{ij} ∈ A_{ij}, i = 1, ..., r, j = 1, ..., n_i} = E\Big[\prod_{i=1}^r \int_{X^{n_i}} \prod_{j=1}^{n_i} \int_{A_{ij}} K_i(y_{ij} \mid x_{ij}) \, \nu(dy_{ij}) \, G_i(dx_{i1}) \cdots G_i(dx_{in_i})\Big],

where the expectation is with respect to the joint law of (G_1, ..., G_r). Recall that an array [Y_{ij} : i = 1, ..., r, j ≥ 1] is said to be row-wise partially exchangeable if, for every n > 1, every measurable sets A_{ij} and any permutations ρ_1, ..., ρ_r of {1, ..., n},

P{Y_{ij} ∈ A_{ij}, i = 1, ..., r, j = 1, ..., n} = P{Y_{i ρ_i(j)} ∈ A_{ij}, i = 1, ..., r, j = 1, ..., n}.

In other words, the joint law is invariant under permutations within each group, but not necessarily under permutations of observations from different groups. From a practical point of view, partial exchangeability represents a suitable model for sets of data that exhibit sub-samples with some possibly different features.

3. Beta-Product Poisson-Dirichlet Processes

We propose a new class of vectors of dependent probability measures (G_1, ..., G_r) such that (marginally) G_i is a Dirichlet process for every i.
This result follows from the Sethuraman representation (1) if one considers a multivariate distribution for (S_{1k}, ..., S_{rk}) such that S_{ik} ∼ Beta(1, α_i) for all i, k. It is worth noticing that there are many possible definitions of the multivariate beta distribution (see for example Olkin and Liu (2003) and Nadarajah and Kotz (2005)), but not all of them have a tractable stochastic representation leading to simple Bayesian inference procedures. For this reason we follow Nadarajah and Kotz (2005) and consider a suitable product of independent beta random variables. More specifically, we apply the following result.

Proposition 3 (Radhakrishna Rao (1949)). If U_1, U_2, ..., U_p are independent beta random variables with shape parameters (a_i, b_i), i = 1, 2, ..., p, and if a_{i+1} = a_i + b_i for i = 1, 2, ..., p − 1, then the product U_1 U_2 \cdots U_p is also a beta random variable, with parameters (a_1, b_1 + \cdots + b_p).

Proof. It is easy to check that the Mellin transform of a beta random variable Z with parameters (a, b) is given by

M_Z(s) := E[Z^s] = \frac{\Gamma(a + b)\,\Gamma(a + s)}{\Gamma(a)\,\Gamma(a + b + s)}.

Hence, since a_2 = a_1 + b_1, and using the independence assumption,

M_{U_1 U_2}(s) = E[U_1^s]\,E[U_2^s] = \frac{\Gamma(a_1 + b_1)\,\Gamma(a_1 + s)}{\Gamma(a_1)\,\Gamma(a_1 + b_1 + s)} \cdot \frac{\Gamma(a_2 + b_2)\,\Gamma(a_2 + s)}{\Gamma(a_2)\,\Gamma(a_2 + b_2 + s)} = \frac{\Gamma(a_1 + b_1 + b_2)\,\Gamma(a_1 + s)}{\Gamma(a_1)\,\Gamma(a_1 + b_1 + b_2 + s)},

which gives the result for p = 2. The general case follows by induction.

We obtain two alternative specifications of the multidimensional beta variables. Specifically, if we set

(16) (S_{1k}, S_{2k}, ..., S_{rk}) := (V_{0k} V_{1k}, V_{0k} V_{2k}, ..., V_{0k} V_{rk})

with V_{0k}, ..., V_{rk} independent, V_{ik} ∼ Beta(α_{0k}, α_{1k}) (i = 1, 2, ..., r) and V_{0k} ∼ Beta(α_{0k} + α_{1k}, α_{2k}), then S_{ik} ∼ Beta(α_{0k}, α_{1k} + α_{2k}). As an alternative, we consider

(17) (S_{1k}, S_{2k}, ..., S_{rk}) := (V_{0k} V_{1k} \cdots V_{r-1,k}, V_{0k} V_{1k} \cdots V_{r-2,k}, ..., V_{0k})

with V_{0k}, ..., V_{r-1,k} independent and V_{ik} ∼ Beta(α_{0k} + \cdots + α_{ik}, α_{i+1,k}), i = 0, ..., r − 1, which gives S_{ik} ∼ Beta(α_{0k}, α_{1k} + \cdots + α_{r+1-i,k}). It should be noted that (16) resembles the specification of the matrix stick-breaking process in Dunson et al. (2008). In that paper all the components of the stick-breaking process are products of two independent beta variables with fixed parameters (1, α) and (1, β), which precludes obtaining Poisson-Dirichlet marginals. For this reason we propose the specifications in (16) and (17) that, for a special choice of the parameters α_{ik}, allow for Poisson-Dirichlet marginals. In the first construction we obtain a random vector with identical marginal distributions, while in the second construction the vector has different marginals. Moreover, in the second case S_{1k} ≤ S_{2k} ≤ \cdots ≤ S_{rk}, which induces an ordering on the concentration parameters of the Dirichlet marginals. These aspects will be further discussed in the following sections.

3.1. Dirichlet Process marginal. For the sake of clarity, we start with r = 2. We assume (10) and discuss how to choose the parameters in (16)-(17) in order to get DP marginals. Note that since (10) holds true, S_k = (S_{1k}, S_{2k}) ∼ (S_{11}, S_{21}).
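Proposition 3 can be checked by a quick Monte Carlo experiment: with the chaining condition a_2 = a_1 + b_1, the first two moments of U_1 U_2 match those of a Beta(a_1, b_1 + b_2) variable. The parameter values below are arbitrary test values, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
a1, b1, b2 = 1.5, 0.7, 2.0
a2 = a1 + b1                              # chaining condition a_{i+1} = a_i + b_i

U1 = rng.beta(a1, b1, size=200_000)
U2 = rng.beta(a2, b2, size=200_000)
prod = U1 * U2                            # Proposition 3: prod ~ Beta(a1, b1 + b2)

b = b1 + b2
m1 = a1 / (a1 + b)                               # mean of Beta(a1, b)
m2 = a1 * (a1 + 1) / ((a1 + b) * (a1 + b + 1))   # second moment of Beta(a1, b)
print(prod.mean(), m1)
print((prod ** 2).mean(), m2)
```

Matching moments is of course only a sanity check, not a proof; the Mellin-transform argument above identifies the full distribution.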
According to the construction schemes given in (16) and (17), with α_{0k} = 1, α_{1k} = α_1, α_{2k} = α_2, two possible and alternative specifications of (S_{11}, S_{21}) are:

(H1) (S_{11}, S_{21}) := (V_0 V_1, V_0 V_2), with V_0, V_1, V_2 independent, V_0 ∼ Beta(1 + α_1, α_2) and V_i ∼ Beta(1, α_1), i = 1, 2, where α_1 > 0 and α_2 > 0;

(H2) (S_{11}, S_{21}) := (V_0 V_1, V_0), with V_0, V_1 independent, V_0 ∼ Beta(1, α_1) and V_1 ∼ Beta(1 + α_1, α_2), with α_1 > 0 and α_2 > 0.

Thanks to Proposition 3, if (H1) holds, then S_{11} ∼ Beta(1, α_1 + α_2) and S_{21} ∼ Beta(1, α_1 + α_2), while if (H2) holds, then S_{11} ∼ Beta(1, α_1 + α_2) and S_{21} = V_0 ∼ Beta(1, α_1). Hence, we have the following result.

Proposition 4. Under (10), if (H1) holds true, G_1 and G_2 are (marginally) Dirichlet processes with the same precision parameter α_1 + α_2 and base measures G_{01}, G_{02} respectively. If (H2) holds true, then the first component of (G_1, G_2) is a Dirichlet process with precision parameter α_1 + α_2 and base measure G_{01}, and the second component is a Dirichlet process with precision parameter α_1 and base measure G_{02}.

[Figure 1. Left column: correlation between S_{11} and S_{21} under (H1) (first row) and (H2) (second row). Right column: correlation between G_1 and G_2, assuming (7), under (H1) (first row) and (H2) (second row).]
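The marginal distributions entering Proposition 4 can be verified by simulation: under (H1) both stick components should have a Beta(1, α_1 + α_2) marginal, while under (H2) the second component should be Beta(1, α_1). The values of α_1 and α_2 below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
a1, a2, n = 1.0, 0.5, 300_000

# (H1): S11 = V0 V1, S21 = V0 V2 with V0 ~ Beta(1 + a1, a2), V1, V2 ~ Beta(1, a1)
V0 = rng.beta(1 + a1, a2, size=n)
S11 = V0 * rng.beta(1, a1, size=n)
S21 = V0 * rng.beta(1, a1, size=n)
print(S11.mean(), S21.mean(), 1 / (1 + a1 + a2))   # both means near 1/(1 + a1 + a2)

# (H2): S11 = V0 V1, S21 = V0 with V0 ~ Beta(1, a1), V1 ~ Beta(1 + a1, a2)
V0h = rng.beta(1, a1, size=n)
T11 = V0h * rng.beta(1 + a1, a2, size=n)
print(T11.mean(), 1 / (1 + a1 + a2))               # Beta(1, a1 + a2) mean
print(V0h.mean(), 1 / (1 + a1))                    # Beta(1, a1) mean
```

Only first moments are compared here; by Proposition 3 the full marginal laws are beta as stated.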
Since with this construction the G_i's are Dirichlet processes, we call (G_1, G_2) a Beta-Product Dependent Dirichlet Process with parameters α = (α_1, α_2) and base measure G_0, in short β_i-DDP(α, G_0), where i = 1 for (H1) and i = 2 for (H2). In addition, when we assume (7), we denote the resulting process by β_i-DDP(α, H_0). It should be noted that the two processes have different marginal behaviors. The β_1-DDP(α, G_0) process has marginals with the same precision parameter and should be used as a prior when the clustering along the different vector dimensions is expected to be similar. In the β_2-DDP(α, G_0) process, the precision parameter decreases along the vector dimension. This process should be used as a prior when a priori one suspects that the clustering features are different in the subgroups of observations.

For parameter elicitation purposes, it is useful to analyze how the choice of (α_1, α_2) affects the correlation between G_1 and G_2. Let us start by considering the correlation between the stick variables. From Theorems 4 and 6 in Nadarajah and Kotz (2005), one obtains the following correlation between the components S_{11} and S_{21} in the cases of (H1) and (H2):

Cor(S_{11}, S_{21}) = \begin{cases} \dfrac{\alpha_2}{(\alpha_1 + 1)(\alpha_1 + \alpha_2)} & \text{for (H1)} \\[2mm] \sqrt{\dfrac{\alpha_1 (2 + \alpha_1 + \alpha_2)}{(\alpha_1 + \alpha_2)(2 + \alpha_1)}} & \text{for (H2).} \end{cases}

Fig. 1 shows the correlation level between the stick-breaking components (left column) and the random measures (right column) for different values of α_1 and α_2. In these graphs, white is used for correlation values equal to one and black for correlation values equal to zero. The gray areas represent correlation values in the (0, 1) interval under both the (H1) and (H2) beta models. According to the graph at the top-left of Fig. 1, one can conclude that the parametrization used in this paper allows for covering the whole range of possible correlation values in the (0, 1) interval. For instance, under (H1), a low correlation between the components of the stick-breaking corresponds to low values of the parameter α_1, say between 0 and 0.1, for any choice of the parameter α_2.

Corollary 5. Under the same assumptions as Proposition 4,

C_{1,2} = \begin{cases} \dfrac{(\alpha_1 + \alpha_2 + 1)(\alpha_1 + 2)}{2(\alpha_1 + 1)(\alpha_1 + \alpha_2 + 2) - (\alpha_1 + 2)} & \text{for (H1)} \\[2mm] \dfrac{2(\alpha_1 + 1)\sqrt{(\alpha_1 + 1)(\alpha_1 + \alpha_2 + 1)}}{(\alpha_1 + 2)(2\alpha_1 + \alpha_2 + 1) - \alpha_1} & \text{for (H2).} \end{cases}

Recall that if (7) holds true, then for any measurable set A: Cor(G_1(A), G_2(A)) = C_{1,2}. The graphs at the bottom of Fig. 1 show how the parameters α_1 and α_2 affect the correlation between the components S_{11} and S_{21} of the bivariate beta used in the stick-breaking process and the correlation between the random measures G_1 and G_2, when assuming (7). It is worth noticing that, under (H1), (S_{11}, S_{21}) converges (in distribution) to (V_1, V_2) as α_2 → 0, where V_1 and V_2 are independent random variables with distribution Beta(1, α_1), while, under (H2), (S_{11}, S_{21}) converges to (V_0, V_0) as α_2 → 0, where V_0 is a Beta(1, α_1) random variable. In particular, if one assumes (7) and (H2), when α_2 → 0 one gets the limit situation in which all the observations are sampled from a common mixture of Dirichlet processes. In other words, in this limit case, as can be seen from (15) with G_1 = G_2, one considers the observations (globally) exchangeable, so no distinction between the two blocks is allowed. The other limiting case is when one assumes (H1) and takes (\tilde\varphi_{1k}, \tilde\varphi_{2k}) to be independent random elements with probability distributions G_{01} and G_{02}.
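The closed-form stick correlations for (H1) and (H2) can be cross-checked by Monte Carlo; α_1 and α_2 below are arbitrary test values.

```python
import numpy as np

rng = np.random.default_rng(4)
a1, a2, n = 1.0, 0.8, 400_000

# (H1) sticks versus the formula Cor(S11, S21) = a2 / ((a1 + 1)(a1 + a2))
V0 = rng.beta(1 + a1, a2, size=n)
S11 = V0 * rng.beta(1, a1, size=n)
S21 = V0 * rng.beta(1, a1, size=n)
emp_h1 = np.corrcoef(S11, S21)[0, 1]
th_h1 = a2 / ((a1 + 1) * (a1 + a2))

# (H2) sticks versus the formula Cor(S11, S21) = sqrt(a1 (2 + a1 + a2) / ((a1 + a2)(2 + a1)))
V0 = rng.beta(1, a1, size=n)
T11 = V0 * rng.beta(1 + a1, a2, size=n)
T21 = V0
emp_h2 = np.corrcoef(T11, T21)[0, 1]
th_h2 = np.sqrt(a1 * (2 + a1 + a2) / ((a1 + a2) * (2 + a1)))

print(emp_h1, th_h1)
print(emp_h2, th_h2)
```

With these parameter values the empirical and theoretical correlations agree to within Monte Carlo error, and, as the text notes, the (H2) correlation is markedly higher than the (H1) one.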
In the limit for α_2 → 0, one obtains two independent Dirichlet processes G_1 and G_2 with base measures G_{01} and G_{02}. In other words, with this choice one considers the blocks of observations as two independent blocks of exchangeable variables, and no sharing of information is allowed.

The (H1) construction for the case r > 2 follows immediately from (16) with α_{0k} = 1, α_{1k} = α_1, α_{2k} = α_2, that is, by assuming V_{0k}, V_{1k}, V_{2k}, ..., V_{rk} to be independent with V_{0k} ∼ Beta(1 + α_1, α_2) and V_{ik} ∼ Beta(1, α_1), i = 1, 2, ..., r, where α_1 > 0 and α_2 > 0. In this case S_{ik} has a Beta(1, α_1 + α_2) distribution for i = 1, ..., r, and hence G_i is marginally DP(α_1 + α_2, G_{0i}). Also the formula for the correlation between two measures easily extends to the case r > 2 under (7):

Cor(G_i(A), G_j(A)) = \frac{(\alpha_1 + \alpha_2 + 1)(\alpha_1 + 2)}{2(\alpha_1 + 1)(\alpha_1 + \alpha_2 + 2) - (\alpha_1 + 2)}.

The (H2) construction extends to r > 2 by setting in (17) α_{0k} = 1 and α_{ik} = α_i (i ≥ 1), that is, by taking V_{0k}, ..., V_{r-1,k} to be independent random variables with V_{0k} ∼ Beta(1, α_1), V_{1k} ∼ Beta(1 + α_1, α_2), ..., V_{r-1,k} ∼ Beta(1 + α_1 + \cdots + α_{r-1}, α_r), where α_i > 0 for all i = 1, ..., r. In this last case S_{ik} ∼ Beta(1, α_1 + \cdots + α_{r-i+1}) for every i = 1, ..., r. Hence G_i is marginally DP(α_1 + \cdots + α_{r-i+1}, G_{0i}). Under (7), the correlation between G_i(A) and G_j(A) with 1 ≤ i < j ≤ r is given by

(18) Cor(G_i(A), G_j(A)) = \frac{2\sqrt{1 + \alpha_1 + \cdots + \alpha_{r-i+1}}\,(1 + \alpha_1 + \cdots + \alpha_{r-j+1})^{3/2}}{2(1 + \alpha_1 + \cdots + \alpha_{r-j+1})^2 + (2 + \alpha_1 + \cdots + \alpha_{r-j+1})(\alpha_{r-j+2} + \cdots + \alpha_{r-i+1})}.

The proof of this last result is given in the Appendix.

3.2. Poisson-Dirichlet process marginal.
Recall that a Poisson-Dirichlet process PD(α, l, H_0), with parameters 0 ≤ l < 1 and α > −l and base measure H_0, is obtained by taking in (1)-(2) a sequence of independent random variables S_k with S_k ∼ Beta(1 − l, α + lk). In this section we show that, by a suitable choice of the parameters in (16)-(17), we obtain vectors of dependent random measures with Poisson-Dirichlet marginals. In the first case, use (16) with α_{0k} = 1 − l, α_{1k} = α_1 and α_{2k} = α_2 + lk, that is, take V_{0k}, ..., V_{rk} to be independent random variables such that

(19) V_{0k} ∼ Beta(1 − l + α_1, α_2 + lk), \qquad V_{ik} ∼ Beta(1 − l, α_1), \quad i = 1, ..., r,

where α_1 > 0, α_2 > 0, and 0 ≤ l < 1. Proposition 3 yields that S_{ik} ∼ Beta(1 − l, α_1 + α_2 + lk). In the second case, use (17) with α_{0k} = 1 − l, α_{1k} = α_1 + lk, and α_{ik} = α_i for i ≥ 2, that is, take V_{0k}, ..., V_{r-1,k} to be independent random variables such that

(20) V_{0k} ∼ Beta(1 − l, α_1 + lk), \quad V_{1k} ∼ Beta(1 + α_1 + l(k − 1), α_2), \quad ..., \quad V_{r-1,k} ∼ Beta(1 + α_1 + \cdots + α_{r-1} + l(k − 1), α_r),

with α_i > 0, i = 1, ..., r, and 0 ≤ l < 1. In this last case S_{ik} ∼ Beta(1 − l, α_1 + \cdots + α_{r-i+1} + lk) for every i = 1, ..., r. Summarizing, we have proved the following.

Proposition 6. If (16) and (19) hold true, G_i is a PD(α_1 + α_2, l, G_{0i}) for every i = 1, ..., r, while if (17) and (20) hold true, G_i is a PD(α_1 + \cdots + α_{r-i+1}, l, G_{0i}) for every i = 1, ..., r.

4. Slice Sampling Algorithm for Posterior Simulation

For posterior computation, we propose an extension of the slice sampling algorithm introduced in Walker (2007) and Kalli et al. (2011).
For the sake of simplicity, we shall describe the sampling strategy for a vector of Beta-Product DDP with r = 2 (β_i-DDP(α, G_0)); see Subsection 3.1. The proposed algorithm can be easily extended to the case r > 2 and to the Beta-Product dependent Poisson-DP. Recall that in a β_i-DDP(α, G_0) the stick variables are defined by (S_{1k}, S_{2k}) := (V_{0k} V_{1k}, V_{0k} V_{2k}) for a sequence of independent vectors V_k := (V_{0k}, V_{1k}, V_{2k}) with the same distribution as (V_0, V_1, V_2), with the convention V_{2k} = 1 and V_k := (V_{0k}, V_{1k}) under (H2). Starting from (6), the key idea of the slice sampling is to find a finite number of variables to be sampled. First, we introduce a latent variable u in such a way that f_i(y) is the marginal density of

f_i(y, u) = \sum_{k \ge 1} I\{u ≤ W_{ik}\} \, K_i(y \mid \tilde\varphi_{ik}).

It is clear that, given u, the number of components is finite. In addition, we introduce a further latent variable d which indicates which of this finite number of components provides the observation, that is,

(21) f_i(y, u, d) := I\{u < W_{i,d}\} \, K_i(y \mid \tilde\varphi_{id}).

Hence, the likelihood function for the augmented variables (y, u, d) is available as a simple product of terms and, crucially, d is finite. To be more precise, we introduce the allocation variables D_{ij} (i = 1, 2; j = 1, ..., n_i), taking values in N, and the slice variables U_{ij} (i = 1, 2; j = 1, ..., n_i), taking values in [0, 1]. We shall use the notation Y_i^{(n_i)} := (Y_{i1}, ..., Y_{in_i}), D_i^{(n_i)} := (D_{i1}, ..., D_{in_i}), U_i^{(n_i)} := (U_{i1}, ..., U_{in_i}), and we write \tilde\varphi for (\tilde\varphi_k)_k, V for (V_k)_k, U for [U_1^{(n_1)}, U_2^{(n_2)}], D for [D_1^{(n_1)}, D_2^{(n_2)}] and Y for [Y_1^{(n_1)}, Y_2^{(n_2)}]. The random elements (\tilde\varphi, V) have the law already described.
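The role of the slice variable u can be illustrated numerically: once u is drawn, only the (almost surely finite) set of indices with W_{ik} > u can generate the observation, so the infinite mixture (6) is handled through finitely many components. The sketch below uses the (H1) stick law with arbitrary α_1, α_2, and a truncation that only bounds the demonstration.

```python
import numpy as np

rng = np.random.default_rng(5)
a1, a2, K = 1.0, 0.5, 2000       # K only bounds the demonstration, not the model

# (H1) sticks for group i = 1 and the induced weights W_1k
V0 = rng.beta(1 + a1, a2, size=K)
S1 = V0 * rng.beta(1, a1, size=K)
W1 = S1 * np.concatenate(([1.0], np.cumprod(1.0 - S1)[:-1]))

# draw an allocation d and a slice variable u below its weight, in the spirit of (21)
d = rng.choice(K, p=W1 / W1.sum())
u = rng.uniform(0.0, W1[d])

active = np.flatnonzero(W1 > u)  # indices surviving the slice indicator I{u < W_1k}
print(len(active))               # finite, typically small
```

Since the weights decay geometrically, the active set is small with high probability, which is what makes the augmented Gibbs sampler computationally feasible.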
Conditionally on $(\tilde\varphi, V)$, the random vectors $(Y_{ij}, U_{ij}, D_{ij})$, $i = 1, 2$; $j = 1, \dots, n_i$, are stochastically independent with the joint density (21).

We conclude by observing that it can be useful to put a prior distribution even on the hyperparameters $(\alpha_1, \alpha_2)$. The law of $\tilde\varphi$ is assumed independent of the random vector of hyperparameters $\tilde\alpha := (\tilde\alpha_1, \tilde\alpha_2)$, while the distribution of $V_j$ depends on $\tilde\alpha$ through (H1) or (H2), so we shall write $P\{V_j \in dv_j \mid \tilde\alpha\}$. Our target is the exploration of the posterior distribution (given $Y$) of $[\tilde\varphi, V, U, \tilde\alpha, D]$ by Markov chain Monte Carlo sampling. Essentially, we shall use a block Gibbs sampler which iteratively simulates $\tilde\varphi$ given $[V, U, D, \tilde\alpha, Y]$, $[V, U, \tilde\alpha]$ given $[D, \tilde\varphi, Y]$, and $D$ given $[V, U, \tilde\varphi, \tilde\alpha, Y]$. In the one-dimensional case, this blocking structure was introduced in Papaspiliopoulos (2008) and Kalli et al. (2011) as an alternative and more efficient version of the original algorithm of Walker (2007). Our algorithm extends the one-dimensional slice sampler of Kalli et al. (2011) to the multidimensional case. This extension is not trivial, as it involves the generation of random samples from vectors of allocation and slice variables of a multivariate stick-breaking process. We present an efficient Gibbs sampling algorithm by elaborating further on the blocking strategy of Kalli et al. (2011).

In order to describe in detail the full conditionals of the above-sketched block Gibbs sampler, we need some more notation. Define, for $i = 1, 2$ and $k \ge 1$,

$\mathcal{D}_{i,k} := \{j \in \{1, \dots, n_i\} : D_{i,j} = k\}$, $\quad A_{i,k} := \sum_{j=1}^{n_i} I\{D_{i,j} = k\} = card(\mathcal{D}_{i,k})$, $\quad B_{i,k} := \sum_{j=1}^{n_i} I\{D_{i,j} > k\}$.

Moreover, let

(22) $D^* := \max_{i=1,2} \max_{1 \le j \le n_i} D_{i,j}$.
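The counts $A_{i,k}$ and $B_{i,k}$ and the maximum occupied index $D^*$ in (22) are simple functions of the allocation vectors; a minimal sketch (helper name ours, assuming NumPy) is:

```python
import numpy as np

def allocation_counts(D_alloc, K):
    """Cluster occupancies A_{i,k} = #{j : D_{i,j} = k} and tail counts
    B_{i,k} = #{j : D_{i,j} > k} for each series' allocation vector."""
    A = np.zeros((len(D_alloc), K), dtype=int)
    B = np.zeros((len(D_alloc), K), dtype=int)
    for i, d in enumerate(D_alloc):
        d = np.asarray(d)
        for k in range(1, K + 1):
            A[i, k - 1] = np.sum(d == k)
            B[i, k - 1] = np.sum(d > k)
    return A, B

D_alloc = [np.array([1, 1, 2, 4]), np.array([2, 2, 3])]
D_star = max(d.max() for d in D_alloc)   # D* in (22)
A, B = allocation_counts(D_alloc, D_star)
```

For the first series above, cluster 1 holds two observations and cluster 4 one, while two observations are allocated beyond index 1, which is exactly the information the full conditionals below consume.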
In our MCMC algorithm we shall treat $V$ as three blocks of random length: $V = (V^*, V^{**}, V^{***})$, where $V^* = \{V_k : k \in \mathcal{D}^*\}$, $V^{**} = \{V_k : k \notin \mathcal{D}^*, k \le D^*\}$, $V^{***} = \{V_k : k > D^*\}$, and $\mathcal{D}^* = \{k : \mathcal{D}_{1,k} \cup \mathcal{D}_{2,k} \ne \emptyset\}$. Note that $D^* = \max\{k \in \mathcal{D}^*\}$ and $|\mathcal{D}^*| \le D^*$ almost surely. In the following subsections we give the details of the full conditionals of the blocked Gibbs sampler; further details on the algorithm are given in the Appendix.

4.1. The full conditional of $\tilde\varphi$. The atoms $\tilde\varphi$ given $[V, D, U, \tilde\alpha, Y]$ are conditionally independent, and the full conditionals are

(23) $P\{\tilde\varphi_k \in d\varphi_k \mid D, U, V, \tilde\alpha, Y\} = P\{\tilde\varphi_k \in d\varphi_k \mid D, Y\} \propto G_0(d\varphi_k) \prod_{j \in \mathcal{D}_{1,k}} K_1(Y_{1,j} \mid \varphi_{1k}) \prod_{j \in \mathcal{D}_{2,k}} K_2(Y_{2,j} \mid \varphi_{2k})$,

where $\varphi_k = (\varphi_{1k}, \varphi_{2k})$. The strategy for sampling from this full conditional depends on the specific form of $K_i$ and $G_0$. In the next section we discuss a possible strategy for Gaussian kernels.

4.2. The full conditional of $[V, U, \tilde\alpha]$. In order to sample from the conditional distribution of $[V, U, \tilde\alpha]$ given $[D, \tilde\varphi, Y]$, a further blocking is used:

• $[V^*, \tilde\alpha]$ given $[D, \tilde\varphi, Y]$.
The joint conditional distribution of $[V^*, \tilde\alpha]$ given $[D, \tilde\varphi, Y]$ is

(24) $P\{V^* \in dv^*, \tilde\alpha \in (d\alpha_1, d\alpha_2) \mid Y, \tilde\varphi, D\} = P\{V^* \in dv^*, \tilde\alpha \in (d\alpha_1, d\alpha_2) \mid D\} \propto \prod_{k \in \mathcal{D}^*} \frac{Q_k(v_k \mid D, \alpha_1, \alpha_2)}{B(\alpha_1+1, \alpha_2)\, B^2(1, \alpha_1)}\, dv_k\; \pi(d\alpha_1, d\alpha_2)$,

where $\pi(d\alpha_1, d\alpha_2) = P\{\tilde\alpha \in (d\alpha_1, d\alpha_2)\}$ is the prior on the concentration parameters and

(25) $Q_k(v_k \mid D, \alpha) := v_{0k}^{\alpha_1 + A_{1k} + A_{2k}} (1 - v_{0k})^{\alpha_2 - 1} \prod_{i=1,2} v_{ik}^{A_{ik}} (1 - v_{ik})^{\alpha_1 - 1} (1 - v_{0k} v_{ik})^{B_{ik}}$

under (H1), with $v_k = (v_{0k}, v_{1k}, v_{2k})$, and

$Q_k(v_k \mid D, \alpha) := v_{0k}^{A_{1k} + A_{2k}} (1 - v_{0k})^{\alpha_1 + B_{2k} - 1}\, v_{1k}^{\alpha_1 + A_{1k}} (1 - v_{1k})^{\alpha_2 - 1} (1 - v_{0k} v_{1k})^{B_{1k}}$

under (H2), with $v_k = (v_{0k}, v_{1k})$.

To sample from (24), we iterate a two-step Metropolis-Hastings (M.-H.) within Gibbs with full conditionals

(26) $P\{V^* \in dv^* \mid \tilde\alpha, D\} \propto \prod_{k \in \mathcal{D}^*} Q_k(v_k \mid D, \tilde\alpha)\, dv_k$

and

(27) $P\{\tilde\alpha \in (d\alpha_1, d\alpha_2) \mid V^*, D\} \propto \prod_{k \in \mathcal{D}^*} \frac{1}{B(\alpha_1+1, \alpha_2)}\, V_{0k}^{\alpha_1} (1 - V_{0k})^{\alpha_2 - 1}\, \frac{1}{B^2(1, \alpha_1)} \prod_{i=1,2} (1 - V_{ik})^{\alpha_1 - 1}\; \pi(d\alpha_1, d\alpha_2)$.

For each element $(V_{0k}, V_{1k}, V_{2k})$ of $V^*$ we consider a multivariate Gaussian random walk proposal with diagonal scale matrix $\tau^2 I_3$, with $\tau^2 = 0.05$, chosen so as to obtain acceptance rates between 0.3 and 0.5 for the elements of $V^*$.

• $[V^{**}, V^{***}]$ given $[D, V^*, \tilde\varphi, \tilde\alpha, Y]$. The $V_k$ (with $k \notin \mathcal{D}^*$) are conditionally independent given $[D, V^*, \tilde\varphi, \tilde\alpha, Y]$, with $P\{V_k \in dv_k \mid \tilde\alpha, D, V^*\} \propto Q_k(v_k \mid D, \tilde\alpha)\, dv_k$ if $k \le D^*$, and $P\{V_k \in dv \mid V^*, \tilde\varphi, D, \tilde\alpha, Y\} = P\{V_k \in dv \mid \tilde\alpha\}$ if $k > D^*$. Note that if $k \notin \mathcal{D}^*$ and $k \le D^*$, then $A_{i,k} = 0$ in the definition of $Q_k(v_k \mid D, \tilde\alpha)$. In order to sample from $Q_k(v_k \mid D, \tilde\alpha)$, the same M.-H. step used for the full conditional in (26) is employed.

• $U$ given $[V, D, \tilde\varphi, \tilde\alpha, Y]$.
The slice variables $U$ are conditionally independent given $[V, D, \tilde\varphi, \tilde\alpha, Y]$, with

(28) $P\{U_{i,j} \in du \mid V, Y, \tilde\varphi, D\} = P\{U_{i,j} \in du \mid V, D\} = \frac{I\{u \le W_{i,D_{i,j}}\}}{W_{i,D_{i,j}}}\, du$.

4.3. The full conditional of $D$. The $D$'s are conditionally independent given $[V, U, \tilde\varphi, \tilde\alpha, Y]$, with

(29) $P\{D_{i,j} = d \mid \tilde\varphi, V, U, \tilde\alpha, Y\} \propto K_i(Y_{i,j} \mid \tilde\varphi_{id})\, I\{U_{i,j} \le W_{i,d}\}$.

Here an important remark is in order. As in the slice samplers proposed in Walker (2007) and Kalli et al. (2011), the full conditional (29) places mass, almost surely, on a finite number of terms, so again it is easy to sample from. More precisely, following Walker (2007), we note that $d > N^*_{i,j}$ ensures that $W_{i,d} < U_{i,j}$, where $N^*_{i,j}$ ($i = 1, 2$; $j = 1, \dots, n_i$) is the smallest integer such that

(30) $\sum_{k=1}^{N^*_{i,j}} W_{i,k} > 1 - U_{i,j}$.

5. Illustrations

We apply our new Beta-product dependent Dirichlet process to make inference for mixtures of normals and mixtures of vector autoregressive processes. The resulting model and inference procedure have been applied both to simulated data and to real data on industrial production in the United States and the European Union.

5.1. $\beta_1$-DDP$(\alpha, H_0)$ mixtures of Gaussian distributions. In this section we apply our $\beta_1$-DDP$(\alpha, H_0)$ Gaussian mixture model for inference on synthetic data generated from finite Gaussian mixtures. More precisely, we assume (8)-(10), (H1), and Gaussian kernels $K_i(y \mid \varphi)$ ($i = 1, 2$) with mean $\mu$ and variance $\sigma^2$ for $\varphi = (\mu, \sigma^2)$, i.e.

$K_i(y \mid \varphi) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{1}{2\sigma^2}(y - \mu)^2\right\}$.

As base measure $H_0(d\varphi)$ we take the product of a normal $N(0, s^{-2})$ and an inverse gamma $IG(\lambda, \lambda)$, which are conjugate distributions for the bivariate kernel at hand.
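Under this conjugate choice, the update of a single atom reduces to a normal draw for the mean and an inverse-gamma draw for the variance (the precise forms are given in (31)-(34) below). The helper below is our own illustrative sketch, not the authors' code; it pools the observations of both series allocated to cluster $k$, as the shared atoms require:

```python
import numpy as np

def update_atom(y_pooled, mu, sigma2, s2, lam, rng):
    """One Gibbs scan for a cluster's atom (mu_k, sigma_k^2) under a
    N(0, 1/s2) x IG(lam, lam) base measure; y_pooled collects the
    observations from both series allocated to the cluster."""
    n = len(y_pooled)
    prec = s2 + n / sigma2                      # posterior precision of mu
    mean = (np.sum(y_pooled) / sigma2) / prec
    mu_new = rng.normal(mean, np.sqrt(1.0 / prec))
    shape = lam + 0.5 * n                       # inverse-gamma shape
    scale = lam + 0.5 * np.sum((y_pooled - mu_new) ** 2)
    sigma2_new = scale / rng.gamma(shape)       # b / Gamma(a) draw is IG(a, b)
    return mu_new, sigma2_new

rng = np.random.default_rng(1)
y = rng.normal(3.0, 0.5, size=500)
mu, s2_k = 0.0, 1.0
for _ in range(200):
    mu, s2_k = update_atom(y, mu, s2_k, s2=0.1, lam=0.5, rng=rng)
# With 500 observations the chain concentrates near the truth (3, 0.25).
```

With many observations in a cluster, the data terms dominate the weak base-measure prior, as the run above illustrates.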
For the vector $\alpha = (\alpha_1, \alpha_2)$ of precision parameters of the bivariate Dirichlet process we consider independent gamma priors $G(\zeta_{11}, \zeta_{21}) \otimes G(\zeta_{12}, \zeta_{22})$. In summary, the Bayesian non-parametric model is

$Y_{ij} \mid (\tilde\mu_{ij}, \tilde\sigma^2_{ij}) \stackrel{ind}{\sim} N(\tilde\mu_{ij}, \tilde\sigma^2_{ij})$, $\quad i = 1, 2$, $j \ge 1$,
$(\tilde\mu_{ij}, \tilde\sigma^2_{ij}) \mid G_1, G_2 \stackrel{iid}{\sim} G_i$, $\quad i = 1, 2$,
$(G_1, G_2) \mid \tilde\alpha \sim \beta_1\text{-DDP}(\tilde\alpha, H_0)$,
$\tilde\alpha \sim G(\zeta_{11}, \zeta_{21}) \otimes G(\zeta_{12}, \zeta_{22})$.

The sampling procedure for $U$ and $D$ given in the previous section applies straightforwardly to this example. We shall describe here in more detail the sampling strategy for the other unknown quantities of the model. For the sake of simplicity we omit indicating the dependence of the full conditionals on the hyperparameters.

In order to sample from the full conditional $P\{\tilde\varphi_k \in d\varphi_k \mid V, D, Y^{(n)}, U\}$, for $k \ge 1$, we consider a two-step Gibbs sampler with full conditional distributions

(31) $P\{\tilde\mu_k \in d\mu_k \mid \tilde\sigma^{-2}_k, D, Y^{(n)}\} \propto \exp\left\{-\tfrac{1}{2} s^2 \mu_k^2\right\} \prod_{m=1,2} \prod_{i \in \mathcal{D}_{m,k}} \exp\left\{-\tfrac{1}{2\tilde\sigma^2_k}(Y_{mi} - \mu_k)^2\right\}\, d\mu_k \propto \exp\left\{-\tfrac{1}{2}\mu_k^2 \left(s^2 + \tilde\sigma^{-2}_k (A_{1,k} + A_{2,k})\right) + \mu_k \tilde\sigma^{-2}_k (\eta^{(1)}_{1,k} + \eta^{(1)}_{2,k})\right\}\, d\mu_k$

and

(32) $P\{\tilde\sigma^{-2}_k \in d\sigma^{-2}_k \mid \tilde\mu_k, D, Y^{(n)}\} \propto e^{-\lambda \sigma^{-2}_k} (\sigma^{-2}_k)^{\lambda - 1} \prod_{m=1,2} \prod_{i \in \mathcal{D}_{m,k}} (\sigma^{-2}_k)^{1/2} \exp\left\{-\tfrac{\sigma^{-2}_k}{2}(Y_{mi} - \tilde\mu_k)^2\right\}\, d\sigma^{-2}_k \propto \exp\left\{-\left(\lambda + \tfrac{1}{2}(\eta^{(2)}_{1,k} + \eta^{(2)}_{2,k})\right)\sigma^{-2}_k\right\} (\sigma^{-2}_k)^{\lambda + \frac{1}{2}(A_{1,k} + A_{2,k}) - 1}\, d\sigma^{-2}_k$,

which are proportional to the density functions of a normal

(33) $N\left(\frac{\tilde\sigma^{-2}_k (\eta^{(1)}_{1,k} + \eta^{(1)}_{2,k})}{s^2 + \tilde\sigma^{-2}_k (A_{1,k} + A_{2,k})},\; \frac{1}{s^2 + \tilde\sigma^{-2}_k (A_{1,k} + A_{2,k})}\right)$
and of an inverse gamma

(34) $IG\left(\lambda + \tfrac{1}{2}(A_{1,k} + A_{2,k}),\; \lambda + \tfrac{1}{2}(\eta^{(2)}_{1,k} + \eta^{(2)}_{2,k})\right)$,

respectively, where $A_{m,k}$, $m = 1, 2$, have been defined in the previous section and

(35) $\eta^{(1)}_{m,k} = \sum_{i \in \mathcal{D}_{m,k}} Y_{im}$ $\quad$ and $\quad$ $\eta^{(2)}_{m,k} = \sum_{i \in \mathcal{D}_{m,k}} (Y_{im} - \tilde\mu_k)^2$.

A sample from the joint conditional distribution of the precision parameters and the stick-breaking elements can be obtained following the blocking scheme described in Subsection 4.2. Since we assume gamma priors $G(\zeta_{11}, \zeta_{21})$ and $G(\zeta_{12}, \zeta_{22})$ for $\alpha_1$ and $\alpha_2$ respectively, (27) becomes

(36) $P\{\tilde\alpha \in (d\alpha_1, d\alpha_2) \mid V^*, D\} \propto \frac{1}{B(\alpha_1+1, \alpha_2)^{d_1}} \frac{1}{B(1, \alpha_1)^{d_2}}\, \alpha_1^{\zeta_{11} - 1} e^{-\alpha_1 \bar\zeta_{21}}\, \alpha_2^{\zeta_{12} - 1} e^{-\alpha_2 \bar\zeta_{22}}\, d\alpha_1\, d\alpha_2$,

where $\bar\zeta_{21} = \zeta_{21} - \sum_{k \in \mathcal{D}^*} (\log(V_{0k}) + \log(1 - V_{1k}) + \log(1 - V_{2k}))$, $\bar\zeta_{22} = \zeta_{22} - \sum_{k \in \mathcal{D}^*} \log(1 - V_{0k})$, $d_1 = card(\mathcal{D}^*)$, and $d_2 = 2 d_1$. We simulate from this full conditional by an M.-H. step, for which we considered two alternative proposals. First, we assume independent proposals: at the $j$-th iteration, given $\alpha^{(j-1)}$, we simulate

(37) $\alpha^{(*)}_1 \sim Ga(\zeta_{11}, \zeta_{21})$, $\quad \alpha^{(*)}_2 \sim Ga(\zeta_{12}, \zeta_{22})$

and accept with probability

(38) $\min\left\{1,\; \frac{B(\alpha^{(j-1)}_1 + 1, \alpha^{(j-1)}_2)^{d_1}\, B(1, \alpha^{(j-1)}_1)^{d_2}}{B(\alpha^{(*)}_1 + 1, \alpha^{(*)}_2)^{d_1}\, B(1, \alpha^{(*)}_1)^{d_2}}\right\}$.

In our experiments this kind of proposal turned out to be highly inefficient, with low M.-H. acceptance rates, so we consider a gamma random walk proposal instead. At the $j$-th iteration of the algorithm, given the previous value $(\alpha^{(j-1)}_1, \alpha^{(j-1)}_2)$ of the chain, we simulate

(39) $\alpha^{(*)}_1 \sim Ga(\kappa (\alpha^{(j-1)}_1)^2, \kappa \alpha^{(j-1)}_1)$, $\quad \alpha^{(*)}_2 \sim Ga(\kappa (\alpha^{(j-1)}_2)^2, \kappa \alpha^{(j-1)}_2)$,

where $\kappa$ represents the scale of the random walk.
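The gamma random-walk update built from (36) and (39) can be sketched as follows. All function names are ours, the toy values of $d_1$, $d_2$, and the data-adjusted rates are illustrative only, and the acceptance step is the standard M.-H. ratio with the asymmetric gamma proposal density:

```python
import math
import numpy as np

def log_target(a1, a2, d1, d2, zeta, zbar):
    """Log of the full conditional (36), up to a constant; uses
    B(1, a) = 1/a and lgamma for the log beta function."""
    logB = math.lgamma(a1 + 1) + math.lgamma(a2) - math.lgamma(a1 + 1 + a2)
    return (-d1 * logB + d2 * math.log(a1)
            + (zeta[0] - 1) * math.log(a1) - zbar[0] * a1
            + (zeta[1] - 1) * math.log(a2) - zbar[1] * a2)

def log_g(y, x, kappa):
    """Log density of the Ga(kappa*x^2, kappa*x) proposal in (39)."""
    shape, rate = kappa * x * x, kappa * x
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1) * math.log(y) - rate * y)

def mh_step(a, d1, d2, zeta, zbar, kappa, rng):
    """One gamma random-walk M.-H. update of (alpha_1, alpha_2)."""
    prop = tuple(rng.gamma(kappa * ai ** 2, 1.0 / (kappa * ai)) for ai in a)
    if min(prop) <= 0.0:
        return a  # guard against numerical underflow of the gamma draw
    log_ratio = (log_target(*prop, d1, d2, zeta, zbar)
                 - log_target(*a, d1, d2, zeta, zbar)
                 + log_g(a[0], prop[0], kappa) - log_g(prop[0], a[0], kappa)
                 + log_g(a[1], prop[1], kappa) - log_g(prop[1], a[1], kappa))
    return prop if math.log(rng.uniform()) < log_ratio else a

rng = np.random.default_rng(2)
a = (1.0, 1.0)
draws = []
for _ in range(2000):
    a = mh_step(a, d1=5, d2=10, zeta=(0.01, 0.01), zbar=(2.0, 1.0),
                kappa=4.0, rng=rng)
    draws.append(a)
```

Note that NumPy's gamma sampler is parameterized by shape and scale, so the rate $\kappa\alpha$ enters as scale $1/(\kappa\alpha)$; $\kappa$ plays the tuning role described in the text.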
The proposal is accepted with probability

(40) $\min\left\{1,\; \frac{P\{(\alpha^{(*)}_1, \alpha^{(*)}_2) \mid V^*, D\}}{P\{(\alpha^{(j-1)}_1, \alpha^{(j-1)}_2) \mid V^*, D\}}\, \frac{g(\alpha^{(j-1)}_1 \mid \alpha^{(*)}_1)}{g(\alpha^{(*)}_1 \mid \alpha^{(j-1)}_1)}\, \frac{g(\alpha^{(j-1)}_2 \mid \alpha^{(*)}_2)}{g(\alpha^{(*)}_2 \mid \alpha^{(j-1)}_2)}\right\}$,

where $g(y \mid x) = \frac{(\kappa x)^{\kappa x^2}}{\Gamma(\kappa x^2)}\, y^{\kappa x^2 - 1} e^{-\kappa x y}$ is the conditional density of the gamma random walk proposal. We set the scale parameter $\kappa$ in order to obtain acceptance rates close to 0.5.

We simulate $n = 50$ independent vectors of observations, $(Y_{1,j}, Y_{2,j})$, $j = 1, \dots, n$. The components of the vectors $(Y_{1,j}, Y_{2,j})$ are independent and alternatively follow one of these models:

• The same three-component mixture of normals (model Mix1):
$\tfrac{1}{3} N(-10, 1) + \tfrac{1}{3} N(0, 1) + \tfrac{1}{3} N(10, 1)$
$\tfrac{1}{3} N(-10, 1) + \tfrac{1}{3} N(0, 1) + \tfrac{1}{3} N(10, 1)$

• Two different mixtures with two common components (model Mix2):
$\tfrac{1}{4} N(0, 0.5) + \tfrac{1}{4} N(3, 0.25) + \tfrac{1}{4} N(2, 0.25) + \tfrac{1}{4} N(5, 0.5)$
$\tfrac{1}{4} N(0, 0.5) + \tfrac{1}{4} N(3, 0.25) + \tfrac{1}{4} N(-3, 0.25) + \tfrac{1}{4} N(7, 0.5)$

• The same three-component mixture of normals with different component probabilities (model Mix3):
$\tfrac{1}{3} N(-10, 1) + \tfrac{1}{3} N(0, 1) + \tfrac{1}{3} N(10, 1)$
$\tfrac{1}{6} N(-10, 1) + \tfrac{4}{6} N(0, 1) + \tfrac{1}{6} N(10, 1)$

The simulated data sets considered in the experiments are given in Fig. 2: the left column shows the histogram of the first component of each data set, and the right column the histogram of the second component. We then estimate the $\beta_1$-DDP$(\alpha, H_0)$ mixture model on the different data sets. In the inference exercise, we choose a non-informative prior specification for the mean and precision parameters of the base measure and set $s^2 = 0.1$, $\lambda = 0.5$ (see for example Walker (2007)). For the concentration parameters $(\alpha_1, \alpha_2)$ of the stick-breaking components, we follow Kalli et al.
(2011) and consider two alternative specifications of the priors: a weakly informative (WI) prior and a strongly informative (SI) prior. For the WI case, the hyperparameter setting is $(\zeta_{1j} = 0.01, \zeta_{2j} = 0.01)$, for $j = 1, 2$, in all the Mix1, Mix2, and Mix3 experiments.

Figure 2. Samples of data for the first (left) and second (right) component generated under different model assumptions: Mix1 (first row), Mix2 (second row), and Mix3 (third row).

This setting corresponds to diffuse priors on the concentration parameters, with prior means $E(\tilde\alpha_1) = E(\tilde\alpha_2) = 1$ and variances $V(\tilde\alpha_1) = V(\tilde\alpha_2) = 100$, and to a low prior dependence level ($Cor(G_1, G_2) = 0.6902$) between the random marginal densities. In our WI setup, a small amount of information exchange between the two marginal densities is allowed a priori, and the posterior level of information exchange is heavily affected by the empirical evidence. For the SI case, a large amount of information exchange between the two data sets is desired instead. In the SI case, we set $(\zeta_{11} = 100, \zeta_{21} = 400)$ and $(\zeta_{12} = 100, \zeta_{22} = 200)$ in the Mix1 and Mix3 examples, and $(\zeta_{11} = 10, \zeta_{21} = 100)$ and $(\zeta_{12} = 200, \zeta_{22} = 100)$ in the Mix2 example. These settings correspond to very concentrated priors and to a high prior dependence level between the two marginal densities, $Cor(G_1, G_2) = 0.7609$ and $Cor(G_1, G_2) = 0.9343$ respectively. For both the WI and SI settings, the Gibbs sampler presented in the previous section was run for 20,000 iterations. The raw output of the MCMC chain for the number of clusters is given in Fig.
3. For the estimation of the number of clusters, a burn-in period of 10,000 samples was discarded. At each Gibbs iteration from 10,000 onwards, a sample $(Y_{1,n+1}, Y_{2,n+1})$ from the predictive was taken. The solid lines in Figure 4 show the estimated predictive distributions using 10,000 samples from the Gibbs sampler together with the original data sets. Fig. 5 shows the raw output and the ergodic averages of the MCMC chains for the parameters $\alpha_1$ and $\alpha_2$ in the WI and SI prior settings.

5.2. $\beta_1$-DDP$(\alpha, H_0)$ mixtures of vector autoregressive processes. In parametric models for the growth rate of industrial production (the business cycle), great advances have been made by allowing for separate parameter values in periods (called regimes) of recession and expansion. The seminal paper of Hamilton (1989) proposes a dynamic mixture model with two components for capturing the clustering of observations during the recession and expansion phases of a business cycle. This simple model has been successfully extended in many directions. In particular, the estimation of the number of regimes is an important issue studied in many papers (e.g., Kim and Murray (2002), Kim and Piger (2000), and Krolzig (2000)) and is still an open issue in the analysis of the business cycle. Moreover, it is of specific interest to verify whether the strong contraction in 2009 calls for more than the three or four regimes usually employed in business cycle models. The above-cited papers consider parametric models with a regime-switching mechanism and use model selection criteria to estimate the number of regimes. Conversely, in this paper we propose a non-parametric approach to the joint estimation of the number of regimes in multiple time series.
We assume our Dirichlet mixture process $\beta_1$-DDP$(\alpha, H_0)$ as a prior for the parameters of a vector autoregressive (VAR) model for time series data. We consider two well-studied cycles of the international economic system: the United States (US) and the European Union (EU) cycles. Even if the features of the regimes (or clusters) in the US and EU growth rates are different, one could expect the regimes in the two cycles to exhibit some dependence. For this reason, we apply a dependent multivariate Dirichlet process to account for the similarity between the clusterings of the two series. In this sense, our model extends the existing literature on the use of univariate Dirichlet process priors for time series models, in which the same clustering process is usually assumed for all the parameters of a multivariate model.

We consider seasonally and working-day adjusted industrial production indexes (IPI), at a monthly frequency from April 1971 to January 2011, for the US and the EU, $\{X_{1t}\}_{t=1}^T$ and $\{X_{2t}\}_{t=1}^T$ respectively (see the first row in Fig. 6). We take the quarterly growth rate $Y_{it} = \log X_{it} - \log X_{it-4}$ (second row in Fig. 6). The histograms of these time series (see Fig. 7) exhibit several modes, which result from different regimes in the series. We consider the

Figure 3.
Number of clusters for different prior settings (panels WI and SI), for the first (left column) and second (right column) component, and for the different models: Mix1 (first row), Mix2 (second row), and Mix3 (third row).

Figure 4. Approximated predictive densities (solid lines) for the two settings (panels WI and SI) and for the first (left) and second (right) component of the data from models Mix1 (first row), Mix2 (second row), and Mix3 (third row).

Figure 5. Output of the M.-H. within Gibbs sampler for the two settings (panels WI and SI) for the parameters $\alpha_1$ and $\alpha_2$ of the models Mix1 (first row), Mix2 (second row), and Mix3 (third row).

following specification for the VAR model:

$\begin{pmatrix} Y_{1t} \\ Y_{2t} \end{pmatrix} = \begin{pmatrix} \tilde\mu_{1t} \\ \tilde\mu_{2t} \end{pmatrix} + \begin{pmatrix} Z_t' & O_{2p}' \\ O_{2p}' & Z_t' \end{pmatrix} \begin{pmatrix} \Upsilon_1 \\ \Upsilon_2 \end{pmatrix} + \begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix}$

for $t = 1, \dots, T$, where $O_{2p} = (0, \dots, 0)' \in \mathbb{R}^{2p}$, $\Upsilon_i = (\upsilon_{1,1,i}, \dots, \upsilon_{1,2p,i})'$, and $Z_t = (Y_{1t-1}, \dots, Y_{1t-p}, Y_{2t-1}, \dots$
$, Y_{2t-p})'$, and $\varepsilon_{it} \sim N(0, \tilde\sigma^2_{it})$ with $\varepsilon_{1t}$ and $\varepsilon_{2s}$ independent for all $s, t$. In this paper we consider four lags (i.e. $p = 4$), as for example in Hamilton (1989) and Krolzig (2000). Moreover, since most forecast errors are due to shifts in the deterministic factors (see Krolzig (2000) and Clements and Krolzig (1998)), we propose a model with shifts in the intercept and in the volatility and assume a vector of Dirichlet processes as a prior for $(\mu_{it}, \sigma^2_{it})$, $i = 1, 2$:

(41)
$(\tilde\mu_{1t}, \tilde\sigma^2_{1t}) \mid G_1, G_2 \stackrel{iid}{\sim} G_1$,
$(\tilde\mu_{2t}, \tilde\sigma^2_{2t}) \mid G_1, G_2 \stackrel{iid}{\sim} G_2$,
$(G_1, G_2) \sim \beta_1\text{-DDP}(\tilde\alpha, H_0)$,
$\tilde\alpha \sim G(\zeta_{11}, \zeta_{21}) \otimes G(\zeta_{12}, \zeta_{22})$,

where the base measure $H_0$ is the product of a normal $N(0, s^{-2} I_2)$ and an inverse gamma $IG(\lambda, \lambda)$. Following standard practice in Bayesian VAR modelling, for the parameters $\Upsilon_1$ and $\Upsilon_2$ we consider improper uniform priors on $\mathbb{R}^{4p}$ and obtain a multivariate normal full conditional posterior distribution to be used in the Gibbs sampler.

The charts in the first row of Fig. 7 show the predictive distributions (solid lines) generated by the non-parametric approach, conditioning on all values of $y_{it}$ for $i = 1, 2$ and $t = 1, \dots, T$, together with the best normal fits (dashed lines) for the empirical distributions of the two series. From a comparison with the empirical distributions, we note that the non-parametric approach, as opposed to the normal model, is able to capture asymmetry, excess kurtosis, and multimodality in the data. The results from our non-parametric approach are in line with the practice of using time-varying parameter models (e.g., Markov-switching models) to capture asymmetry and non-linearity in both the US and EU business cycles. The posterior distribution of the number of clusters is given in the second row of Fig. 7.
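The regressor vector $Z_t$ in the VAR specification above simply stacks the first $p$ lags of each series. A minimal sketch (our own helper, assuming NumPy and 0-based time indexing) is:

```python
import numpy as np

def build_lagged_regressors(y1, y2, p):
    """Regressor vectors Z_t = (y1_{t-1},...,y1_{t-p}, y2_{t-1},...,y2_{t-p}),
    stacked over t = p,...,T-1, as in the VAR specification."""
    T = len(y1)
    Z = np.empty((T - p, 2 * p))
    for t in range(p, T):
        Z[t - p, :p] = y1[t - 1::-1][:p]   # most recent lag of series 1 first
        Z[t - p, p:] = y2[t - 1::-1][:p]   # most recent lag of series 2 first
    return Z

y1 = np.arange(1.0, 11.0)   # 1, 2, ..., 10
y2 = -y1
Z = build_lagged_regressors(y1, y2, p=4)
```

With $p = 4$ and $T = 10$, the first usable observation is $t = 5$ (in the paper's 1-based indexing), so the design matrix has $T - p$ rows of length $2p$.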
The location of the posterior mode of the histograms allows us to conclude that the non-parametric approach detects three clusters for the US cycle and four clusters for the EU cycle. The result for the US data is coherent with the results available in the literature, where three-regime Markov-switching models (see for example Krolzig (2000)) are usually considered. Moreover, we observe that the inclusion in the sample of the 2009 negative-growth (recession) period extends the validity of many past empirical findings that do not include the 2009 slowdown in economic activity. An inspection of the posterior means of the atoms and of the marginal clustering (see below in this section) allows us to conclude that the three clusters can be given the economic interpretation of business cycle phases associated with substantially different levels of IPI growth rates.

The results for the US and EU cycles are, in a certain way, coherent with the output of parametric studies, which suggest considering at least three regimes. Nevertheless, the effect of the 2009 recession on past empirical findings is an open issue and a matter of research. The result from our non-parametric approach is interesting because it suggests that four components are needed in order to capture the effects of the 2009 recession phase (see Fig. 7). As a consequence of the 2009 recession, the long left tail present in the predictive (solid line in Fig. 7) is fatter than the tail of the best normal fit (dashed line in the same figure). Fig. 8 shows the sequence of predictive densities (gray area) indexed by time $t$, for $t = 1, \dots, T$. The predictive density for $y_{it}$ has been estimated conditionally on the whole data set and has been evaluated sequentially over time at the current values of the predictors $y_{it-1}, \dots, y_{it-p}$, for $i = 1, 2$.
In this figure, the effects of the recession are evident from the presence of non-negligible probability mass at extremely negative growth rates that had not been realized before 2009. Similarly, we found that, in both the expansion and recession phases, the posterior distributions of the atoms exhibit multimodality and asymmetry. As an example, Fig. 9 shows the

Figure 6. First row: Industrial Production Index (IPI) for the US ($x_{1t}$) and the EU ($x_{2t}$) at monthly frequency for the period March 1971 to January 2011. Second row: logarithmic quarterly changes in the US IPI ($y_{1t}$) and the EU IPI ($y_{2t}$).

Figure 7. First row: IPI log-changes (histogram), predictive distribution (solid line), best normal fit (dashed line). Second row: number of clusters.

Figure 8. US and EU IPI growth rates (black lines) and predictive densities (gray areas) evaluated sequentially for $t = 1, \dots, T$ at the values of the predictors $y_{it-1}, \dots, y_{it-p}$, for $i = 1, 2$.

approximated posteriors of the atoms $\mu_{it}$ and $\sigma_{it}$ in periods of expansion ($t = 430$) and recession ($t = 450$). The posterior distribution of $\mu_{it}$ exhibits two modes in the positive half of the real line during an expansion phase and two modes in the negative half during a recession phase (first column of Fig. 9).
From the second column of the same figure, one can conclude that the posterior distribution of the volatility, for both the US and the EU, is more concentrated around lower values in expansion periods.

In order to identify the different components of our DP mixture model, we compute the posterior clustering of the data and the associated values of the atoms for each observation and country. We apply the least-squares clustering method originally proposed in Dahl (2006). The method has been successfully used in many applications (see for example Kim et al. (2006) and Rodriguez et al. (2008)) and is based on the posterior pairwise probabilities of joint classification, $P\{D_{is} = D_{jt} \mid Y\}$. This matrix can be estimated by the pairwise probability matrix

$P_{ij,st} = \frac{1}{M} \sum_{l=1}^{M} \delta_{D^l_{is}}(D^l_{jt})$,

computed over every pair of allocation variables $D^l_{is}$, $D^l_{jt}$, with $s, t = 1, \dots, T$, across the $l = 1, \dots, M$ MCMC iterations. In Dahl (2006)'s algorithm, one needs to evaluate

Figure 9. Posterior density approximations for the atoms of the US ($(\mu_{1t}, \sigma_{1t})$) and the EU ($(\mu_{2t}, \sigma_{2t})$) growth rates $y_{1t}$ and $y_{2t}$ during an expansion period ($t = 430$, 1st of July 2007) and a recession period ($t = 450$, 1st of March 2009).

$P_{ij,st}$ for $i = j$ and $i = 1, 2$. The least-squares marginal clustering $D_{i,LS}$ is the clustering $D^{l_i}_i = (D^{l_i}_{i1}, \dots, D^{l_i}_{iT})$ (see Fig. 10) sampled at the $l_i$-th iteration which minimizes the sum of squared deviations from the pairwise posterior probabilities:

$l_i = \arg\min_{l \in \{1, \dots, M\}} \sum_{t=1}^{T} \sum_{s=1}^{T} \left( \delta_{D^l_{is}}(D^l_{it}) - P_{ii,st} \right)^2$.
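Dahl (2006)'s least-squares criterion above can be sketched in a few lines; the implementation below (our own, assuming NumPy) computes the pairwise probability matrix for a single series and selects the minimizing draw:

```python
import numpy as np

def least_squares_clustering(D_draws):
    """D_draws is an (M, T) array of allocation vectors over MCMC
    iterations; returns the index of the draw whose co-clustering matrix
    is closest (in squared error) to the pairwise probability matrix P."""
    D_draws = np.asarray(D_draws)
    co = (D_draws[:, :, None] == D_draws[:, None, :]).astype(float)  # (M,T,T)
    P = co.mean(axis=0)                       # P_st = mean co-clustering
    losses = ((co - P[None]) ** 2).sum(axis=(1, 2))
    return int(np.argmin(losses)), P

D_draws = [[1, 1, 2, 2],
           [1, 1, 2, 2],
           [1, 2, 2, 2]]
best, P = least_squares_clustering(D_draws)
# The majority partition {1,2}{3,4} is closest to P, so draw 0 is selected.
```

The selected draw is an actual sampled partition, so the reported clustering is always one visited by the chain rather than an average that may not correspond to any partition.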
More specifically, the first row (second row) of Fig. 10 shows the posterior probabilities that two observations of the US cycle (EU cycle) belong to the same cluster. In the first column, one can clearly detect the presence of vertical and horizontal dark gray bands. They correspond to observations that do not frequently cluster together with other observations and that are associated with negative growth rates. A similar remark holds for the light gray areas. In the second column of Fig. 10, one can see the different behavior of the clustering for the US and the EU during the 2009 crisis.

Figure 10. Pairwise posterior probabilities for the clustering of the US data, $P_{11,st}$, the EU data, $P_{22,st}$, and the common clustering between the US and EU data, $P_{12,st}$, for $s, t \in \{1, \dots, T\}$.

Finally, the assumption in Eq. (7) implies that the set of atoms sampled at every MCMC iteration is the same for the two series. This makes the allocation variables $D^l_{1t}$ and $D^l_{2s}$ comparable. For this reason, we apply Dahl (2006)'s algorithm to study the posterior probability $P_{ij,st}$ that two observations, each one from a different series (i.e. $i \ne j$), belong to the same cluster. That

Figure 11. Atoms $(\hat\mu^{(l_i)}_{it}, \sigma^{(l_i)}_{it})$ associated with the LS marginal clustering $l_i$, for $i = 1, 2$.

gives a measure of association between the two clusterings, induced by the dependent DP, for the two series. The estimated pairwise probabilities are given in the third row of Fig.
10, which shows the probability that two observations, one from the US cycle and one from the EU cycle, belong to the same cluster. The white and light gray lines show that the two marginal clusterings share some atoms.

The least-squares clustering allows us to find the posterior clustering of the data and to identify the different clusters. For the US cycle, the observations cluster together in three groups (see Fig. 11), and the atoms associated with the three clusters are $(\mu_{1t}, \sigma_{1t}) \in \{(-2.4142, 1.3625), (0.136, 0.6166), (0.6216, 1.0618)\}$, leading to the identification of the clusters as recession, normal expansion, and strong expansion phases. For the EU cycle, the observations are classified into four groups (see Fig. 11), with atoms $\{(-10.6170, 0.6600), (-1.2183, 2.7857), (0.0760, 0.5794), (0.4909, 1.2165)\}$, interpreted as strong recession, normal recession, normal expansion, and strong expansion phases. These results on the features of the cycle phases are coherent with recent findings in the business cycle literature, with an exception for the EU cycle, which presents a fourth cluster of observations with a very large negative growth rate ($-10.6170$) corresponding to the 2009 recession period.

6. Conclusions

We introduce a new class of multivariate dependent Dirichlet processes for modeling vectors of random measures and discuss some properties of the process. We apply the dependent Dirichlet process in the context of non-parametric Bayesian inference and provide an efficient MCMC algorithm for posterior computation. Since our process is particularly suitable for groups of data that exhibit different clustering behaviors, we apply it to multiple time series analysis.
We provide an original application to the joint analysis of the US and the EU business cycles and show that our non-parametric Bayesian model is able to highlight some important issues for this kind of data.

Appendix A. The algorithm

In the blocked Gibbs sampler described in Section 4 one needs, in principle, to sample an infinite number of $V_k$ and $\tilde\varphi_k$. But in order to proceed with the chain it suffices to sample a finite number of $V_k$'s to check condition (30) and the finite number of $\tilde\varphi_k$ to be used in (29). For the sake of clarity we summarize here the blocked Gibbs sampling algorithm.

THE ALGORITHM

• INITIAL STEP. Initialize $U$, $D$, $\tilde\alpha$. With $D$ compute $D^*$ by using (22).
• UPDATING STEP. Suppose we have a sample of all the variables involved in the algorithm. The variables that come from the previous step are labeled "old"; the variables that will be generated in the next step are labeled "new".
(1) $(V^*, \tilde\alpha)|_{new}$ are sampled by using $D|_{old}$ and (24), with a Metropolis-within-Gibbs step as described in Subsection 4.2.
(2) $V^{**}|_{new}$ are sampled by using $D|_{old}$ and $\tilde\alpha|_{new}$, by a Metropolis step as described in Subsection 4.2.
(3) $U|_{new}$ are sampled by using $(V^*, V^{**}, \tilde\alpha)|_{new}$ and (28).
(4) $N^*_{i,j}$ are computed by using (30), with $U|_{new}$ and $V_j|_{new}$. If some $V_k|_{new}$ with $k > D^*|_{old}$ are needed, they are sampled from the prior $P\{V_k \in dv_k \mid \tilde\alpha\}$.
(5) $\tilde\varphi_j|_{new}$, for $j = 1, \ldots, N^*$ with $N^* := \max_{i=1,2} \max_{1 \le j \le n_i} N^*_{i,j}$, are sampled by using (23) and $D|_{old}$, as described in Section 5.1.
(6) $D|_{new}$ are sampled by using (29) with $U|_{new}$, $V^{(n)}|_{new}$ and $\tilde\varphi_1|_{new}, \ldots, \tilde\varphi_{N^*}|_{new}$.

Appendix B. Proofs

Proof of Proposition 12.
First of all observe that
\[
\begin{split}
E[G_1(A) G_2(B)] &= \sum_{h \ge 1,\, k \ge 1} E[I_A(\tilde\varphi_{1k}) I_B(\tilde\varphi_{2h})]\, E[W_{1k} W_{2h}] \\
&= \sum_{\substack{h \ge 1,\, k \ge 1 \\ h \ne k}} E[I_A(\tilde\varphi_{1k})]\, E[I_B(\tilde\varphi_{2h})]\, E[W_{1k}]\, E[W_{2h}]
 + \sum_{h \ge 1} E[I_{A \times B}(\tilde\varphi_{1h}, \tilde\varphi_{2h})]\, E[W_{1h} W_{2h}]. \qquad (42)
\end{split}
\]
Now note that the following equalities hold:
\[
\sum_{h \ge 1} E\Big[S_{1h} S_{2h} \prod_{m \le h-1} (1 - S_{1m})(1 - S_{2m})\Big]
 = \frac{E[S_{11} S_{21}]}{E[S_{11}] + E[S_{21}] - E[S_{11} S_{21}]} \qquad (43)
\]
\[
\sum_{h \ne k} E\Big[S_{1h} S_{2k} \prod_{m \le h-1} (1 - S_{1m}) \prod_{l \le k-1} (1 - S_{2l})\Big]
 = \frac{E[S_{11}] + E[S_{21}] - 2E[S_{11} S_{21}]}{E[S_{11}] + E[S_{21}] - E[S_{11} S_{21}]} \qquad (44)
\]
Combining (42) with (43)-(44), it follows that
\[
E[G_1(A) G_2(B)] = G_0(A \times B \times \mathbb{X}^{r-2}) \frac{E[S_{11} S_{21}]}{E[S_{11}] + E[S_{21}] - E[S_{11} S_{21}]}
 + G_{01}(A) G_{02}(B) \frac{E[S_{11}] + E[S_{21}] - 2E[S_{11} S_{21}]}{E[S_{11}] + E[S_{21}] - E[S_{11} S_{21}]}.
\]
Since $E[G_i(\cdot)] = G_{0i}(\cdot)$, $i = 1, \ldots, r$, it follows that
\[
\mathrm{Cov}[G_1(A), G_2(B)] = \frac{E[S_{11} S_{21}]}{E[S_{11}] + E[S_{21}] - E[S_{11} S_{21}]}\,
 [G_0(A \times B \times \mathbb{X}^{r-2}) - G_{01}(A) G_{02}(B)]. \qquad (45)
\]
In a similar fashion
\[
E[G_1(A)^2] = \frac{G_{01}(A) E[S_{11}^2] + 2 G_{01}^2(A)(E[S_{11}] - E[S_{11}^2])}{2E[S_{11}] - E[S_{11}^2]} \qquad (46)
\]
\[
E[G_2(B)^2] = \frac{G_{02}(B) E[S_{21}^2] + 2 G_{02}^2(B)(E[S_{21}] - E[S_{21}^2])}{2E[S_{21}] - E[S_{21}^2]} \qquad (47)
\]
and then
\[
\mathrm{Var}[G_1(A)] = \frac{G_{01}(A)(1 - G_{01}(A)) E[S_{11}^2]}{2E[S_{11}] - E[S_{11}^2]} \qquad (48)
\]
\[
\mathrm{Var}[G_2(B)] = \frac{G_{02}(B)(1 - G_{02}(B)) E[S_{21}^2]}{2E[S_{21}] - E[S_{21}^2]} \qquad (49)
\]

Proof of Corollary 5.
By direct calculation, or using the results in Nadarajah and Kotz (2005), one obtains
\[
E(S_{21}) = E(S_{11}) = \frac{1}{1 + \alpha_1 + \alpha_2}, \qquad
E(S_{21}^2) = E(S_{11}^2) = \frac{2}{(1 + \alpha_1 + \alpha_2)(2 + \alpha_1 + \alpha_2)},
\]
\[
E(S_{21} S_{11}) = \frac{B(2, \alpha_1)\, B(2, \alpha_1)\, B(\alpha_2, \alpha_1 + 3)}{B(1, \alpha_1)\, B(1, \alpha_1)\, B(\alpha_1 + 1, \alpha_2)}
 = \frac{\alpha_1 + 2}{(\alpha_1 + 1)(\alpha_1 + \alpha_2 + 1)(\alpha_1 + \alpha_2 + 2)}
\]
for (H1), and
\[
E(S_{11}) = \frac{1}{1 + \alpha_1 + \alpha_2}, \quad
E(S_{21}) = \frac{1}{1 + \alpha_1}, \quad
E(S_{11}^2) = \frac{2}{(1 + \alpha_1 + \alpha_2)(\alpha_1 + \alpha_2 + 2)}, \quad
E(S_{21}^2) = \frac{2}{(1 + \alpha_1)(2 + \alpha_1)},
\]
\[
E(S_{21} S_{11}) = \frac{B(3, \alpha_1)\, B(2 + \alpha_1, \alpha_2)}{B(1, \alpha_1)\, B(\alpha_1 + 1, \alpha_2)}
 = \frac{2}{(2 + \alpha_1)(1 + \alpha_1 + \alpha_2)}
\]
for (H2). Hence the correlation between the two random measures is
\[
\mathrm{Cor}(G_1(A), G_2(A)) = \frac{E[S_{21} S_{11}]}{1 - E[(1 - S_{21})(1 - S_{11})]}
 \sqrt{\frac{(2E[S_{21}] - E[S_{21}^2])(2E[S_{11}] - E[S_{11}^2])}{E[S_{11}^2]\, E[S_{21}^2]}}
 = \frac{(\alpha_1 + \alpha_2 + 1)(\alpha_1 + 2)}{2(\alpha_1 + 1)(\alpha_1 + \alpha_2 + 2) - (\alpha_1 + 2)}
\]
and
\[
\mathrm{Cor}(G_1(A), G_2(A)) = \frac{E[S_{21} S_{11}]}{1 - E[(1 - S_{21})(1 - S_{11})]}
 \sqrt{\frac{(2E[S_{21}] - E[S_{21}^2])(2E[S_{11}] - E[S_{11}^2])}{E[S_{11}^2]\, E[S_{21}^2]}}
 = \frac{2(\alpha_1 + 1)}{(\alpha_1 + 2)(2\alpha_1 + \alpha_2 + 1) - \alpha_1} \sqrt{(\alpha_1 + 1)(\alpha_1 + \alpha_2 + 1)}
\]
for (H1) and (H2), respectively. (Note that the (H1) denominator must carry the factor $(\alpha_1 + \alpha_2 + 2)$ for consistency with the moments above: with $\alpha_1 = 1$, $\alpha_2 = 0$ the displayed moments give $\mathrm{Cor} = 2/3$, as this expression yields.)

Proof of (18). For the sake of simplicity, write $V_k$ in place of $V_{k1}$. Recall that $V_0 \sim Beta(1, \alpha_1)$ and, for $1 \le k \le r-1$, $V_k \sim Beta(1 + \alpha_1 + \cdots + \alpha_k, \alpha_{k+1})$. Let $1 \le i < j \le r$. Since $S_{i1} = V_0 V_1 \cdots V_{r-i}$, one gets
\[
S_{i,1} S_{j,1} = V_0^2 V_1^2 \cdots V_{r-j}^2\, V_{r-j+1} \cdots V_{r-i}.
\]
After some computations, using the fact that the $V_j$ are independent,
\[
E[S_{i,1} S_{j,1}] = \frac{2}{(2 + \alpha_1 + \cdots + \alpha_{r-j+1})(1 + \alpha_1 + \cdots + \alpha_{r-i+1})}.
\]
In addition, one has
\[
E[S_{i,1}] = \frac{1}{1 + \alpha_1 + \cdots + \alpha_{r-i+1}}, \qquad
E[S_{i,1}^2] = \frac{2}{(1 + \alpha_1 + \cdots + \alpha_{r-i+1})(2 + \alpha_1 + \cdots + \alpha_{r-i+1})}.
\]
Set
\[
C_{i,j} := \frac{E[S_{i,1} S_{j,1}]}{E[S_{i,1}] + E[S_{j,1}] - E[S_{i,1} S_{j,1}]}
 \sqrt{\Big(\frac{2E[S_{i,1}]}{E[S_{i,1}^2]} - 1\Big)\Big(\frac{2E[S_{j,1}]}{E[S_{j,1}^2]} - 1\Big)}.
\]
Simple algebra gives
\[
\sqrt{\Big(\frac{2E[S_{i,1}]}{E[S_{i,1}^2]} - 1\Big)\Big(\frac{2E[S_{j,1}]}{E[S_{j,1}^2]} - 1\Big)}
 = \sqrt{(1 + \alpha_1 + \cdots + \alpha_{r-i+1})(1 + \alpha_1 + \cdots + \alpha_{r-j+1})}
\]
and
\[
\frac{E[S_{i,1} S_{j,1}]}{E[S_{i,1}] + E[S_{j,1}] - E[S_{i,1} S_{j,1}]}
 = \frac{2(1 + \alpha_1 + \cdots + \alpha_{r-j+1})}{2(1 + \alpha_1 + \cdots + \alpha_{r-j+1})^2 + (2 + \alpha_1 + \cdots + \alpha_{r-j+1})(\alpha_{r-j+2} + \cdots + \alpha_{r-i+1})}.
\]
That is,
\[
C_{i,j} = \frac{2\sqrt{1 + \alpha_1 + \cdots + \alpha_{r-i+1}}\,(1 + \alpha_1 + \cdots + \alpha_{r-j+1})^{3/2}}{2(1 + \alpha_1 + \cdots + \alpha_{r-j+1})^2 + (2 + \alpha_1 + \cdots + \alpha_{r-j+1})(\alpha_{r-j+2} + \cdots + \alpha_{r-i+1})},
\]
which gives (18), since $C_{i,j} = \mathrm{corr}(G_i(A), G_j(A))$.

Full-conditionals. The joint distribution of $[V, \tilde\varphi, U, D, Y, \tilde\alpha]$ is
\[
\begin{split}
P\{&V \in dv,\, \tilde\varphi \in d\varphi,\, Y \in dy,\, U \in du^{(n)},\, D = d^{(n)},\, \tilde\alpha \in (d\alpha_1, d\alpha_2)\} \\
&= \Big[\prod_{i=1,2} \prod_{j=1}^{n_i} I\{u_{ij} < w_{i, d_{i,j}}\}\, \mathcal{K}(y_{i,j} \mid \varphi_{d_{i,j}})\Big]
 \bigotimes_{i=1,2} \bigotimes_{j=1}^{n_i} dy_{ij}\, du_{ij}
 \bigotimes_{k \ge 1} \Big[P\{V_k \in dv_k \mid \tilde\alpha = (\alpha_1, \alpha_2)\} \otimes G_0(d\varphi_k)\Big]
 \otimes P\{\alpha \in (d\alpha_1, d\alpha_2)\} \qquad (50)
\end{split}
\]
where $w_{i,k} = v_{0k} v_{ik} \prod_{j \cdots}$
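Identity (43) in the proofs above has a simple geometric-series structure when the pairs $(S_{1h}, S_{2h})$ are i.i.d. across $h$: each summand equals $E[S_{11} S_{21}]\, q^{h-1}$ with $q = E[(1-S_{11})(1-S_{21})]$, so the series sums to $E[S_{11} S_{21}]/(E[S_{11}] + E[S_{21}] - E[S_{11} S_{21}])$. A minimal numerical check of this identity (independent Beta draws are used purely for illustration; this is not the paper's beta-product construction, under which $S_{11}$ and $S_{21}$ are dependent):

```python
import numpy as np

rng = np.random.default_rng(42)
n_paths, depth = 20_000, 100          # Monte Carlo paths and truncation depth
a1, a2 = 2.0, 3.0

# For illustration take S_1h ~ Beta(1, a1), S_2h ~ Beta(1, a2), independent;
# identity (43) only requires (S_1h, S_2h) to be i.i.d. across h.
s1 = rng.beta(1.0, a1, size=(n_paths, depth))
s2 = rng.beta(1.0, a2, size=(n_paths, depth))

# Partial products prod_{m <= h-1} (1 - S_1m)(1 - S_2m), shifted by one index
# so that the h = 1 term has an empty (unit) product.
tail = np.cumprod((1.0 - s1) * (1.0 - s2), axis=1)
prods = np.concatenate([np.ones((n_paths, 1)), tail[:, :-1]], axis=1)

# Monte Carlo estimate of the left-hand side of (43).
lhs = (s1 * s2 * prods).sum(axis=1).mean()

# Closed form: E[S11 S21] / (E[S11] + E[S21] - E[S11 S21]); here, by
# independence, E[S11 S21] = E[S11] E[S21] with E[Beta(1, a)] = 1/(1 + a).
e1, e2 = 1.0 / (1.0 + a1), 1.0 / (1.0 + a2)
e12 = e1 * e2
rhs = e12 / (e1 + e2 - e12)
```

With these parameters the closed form equals $1/6$, and the truncation error is negligible since $q = 1/2$ here.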