Posterior mean and variance approximation for regression and time series problems


Authors: K. Triantafyllopoulos, P.J. Harrison

October 29, 2018

Affiliations: K. Triantafyllopoulos, Department of Probability and Statistics, Hicks Building, University of Sheffield, Sheffield S3 7RH, UK (email: k.triantafyllopoulos@sheffield.ac.uk); P.J. Harrison, University of Warwick, Coventry, UK.

Abstract

This paper develops a methodology for approximating the first two moments of the posterior distribution in Bayesian inference. Partially specified probability models, which are defined only by specifying means and variances, are constructed based upon second-order conditional independence, in order to facilitate posterior updating and prediction of required distributional quantities. Such models are formulated particularly for multivariate regression and time series analysis with unknown observational variance-covariance components. The similarities and differences of these models with the Bayes linear approach are established. Several subclasses of important models, including regression and time series models with errors following multivariate t, inverted multivariate t and Wishart distributions, are discussed in detail. Two numerical examples, consisting of simulated data and of US investment and change in inventory data, illustrate the proposed methodology.

Some key words: Bayesian inference, conditional independence, regression, time series, Bayes linear methods, state space models, dynamic linear models, Kalman filter, Bayesian forecasting.

1 Introduction

Regression and time series problems are important problems of statistical inference, which appear widely in many scientific fields, for example in econometrics and in medicine. Regression has been discussed in many textbooks (Mardia et al., 1979, Chapter 6; Srivastava and Sen, 1990); from a Bayesian standpoint Tiao and Zellner (1964), Box and Tiao (1973), Mouchart and Simar (1984), Pilz (1986), Leonard and Hsu (1999, Chapter 5) and O'Hagan and Forster (2004, Chapter 9) discuss a variety of parametric regression models, where the residuals follow normal or Student t distributions. Recent work on non-normal responses includes regression models of the generalized linear model (GLM) type (McCullagh and Nelder, 1989) and time series models of the dynamic GLM type (Fahrmeir and Kaufmann, 1987, 1991; Fahrmeir, 1992; West and Harrison, 1997, Chapter 12; Fahrmeir and Tutz, 2001, Chapter 8; Kedem and Fokianos, 2002; Godolphin and Triantafyllopoulos, 2006). Hartigan (1969) and Goldstein (1976) develop Bayesian inference for a general class of linear regression problems, in which the parameters or states of the regression equation are estimated by minimizing the posterior expected risk. Goldstein (1979, 1983), Wilkinson and Goldstein (1996) and Wilkinson (1997) propose modifications to the Bayes linear estimators to allow for variance estimation in regression and time series problems. Such considerations are useful in practice because they extend inference to a range of problems for which the modeller would otherwise need to resort to Monte Carlo estimation (Gamerman, 1997) or to other simulation-based methods (Kitagawa and Gersch, 1996).
West and Harrison (1997, Chapter 4) and Wilkinson (1997) discuss how the above-mentioned regression estimation can be applied to a sequential estimation problem, which is necessary to consider in time series analysis.

In this paper we propose a modelling framework that allows approximate calculation of the first two moments of the posterior distribution in Bayesian inference. This is motivated by situations in which a model may be partially specified in terms of its first two moments, or its probability distribution may be difficult to specify (or may be specified with uncertainty). Partially specified prior posterior (PSPP) models are developed for dynamic situations in which a modeller is reluctant to specify a full probability model and yet requires a facility for approximate prior/posterior updating on mean and variance/covariance components of that model. The basic idea is that a linear function $\phi(X,Y)$ of two random vectors $X$, $Y$ is second-order independent of the observed value of $Y$. Then, in learning, no matter what value of $Y$ is observed, the mean and the variance of $\phi(X,Y)$ take exactly the same value. A further requirement is that the mean and variance of $X|Y=y$ can be deduced from the mean and variance of $\phi(X,Y)$. We show that for a class of regression models, linear Bayes methods are equivalent to PSPP, while we describe situations where PSPP can provide more effective estimation procedures than linear Bayes. We then describe two wide classes of regression and time series models, the scaled observational precision (SOP) model and the generalized SOP model, both of which are aimed at multivariate applications. For the former model, we give the correspondence of PSPP (based on specification of prior means and variances only) with the normal/gamma model (based on specification of the prior distribution as normal/gamma). For the latter model, we show that PSPP can produce efficient estimation, overcoming problems of existing time series models. This relates to covariance estimation for multivariate state space models when the observation covariance matrix is unknown. For this interesting model we present two numerical illustrations, consisting of simulated bivariate data and of US investment and change in inventory data.

The paper is organized as follows. PSPP models are defined in Section 2. Sections 3 and 4 apply PSPP modelling to regression and time series problems. The numerical illustrations are given in Section 5. Section 6 gives concluding comments and the appendix details the proof of a theorem of Section 2.

2 Partially specified probability modelling

2.1 Full probability modelling

In Bayesian analysis, a full probability model for a random vector $Z$ comprises the joint distribution of all its elements. The forecast distribution of any function of $Z$ is then just that function's marginal distribution. Learning or updating simply derives the conditional distribution of $Z$ given the received information on the appropriate function of $Z$. For example, let $Z=[X'\ Y']'$, where $X$, $Y$ are real-valued random vectors, and let the probability density function of $Z$ be denoted by $p(\cdot)$. $X$ will often be the vector comprising the parameters or states of the model and $Y$ will be the vector comprising the observations of interest.
The model is precisely defined if a density of $Y$ given $X$ is specified, e.g. $p(Y|X)$, so that $p(y|X)$ is the likelihood function of $X$ based on the single observation $Y=y$. Then the one-step forecast distribution of $Y$ is the marginal distribution of $Y$,
$$p(Y)=\int_{S}p(X,Y)\,dX, \qquad (1)$$
where $S$ is the space of $X$, also known as the parameter space. When the value $y$ of $Y$ is observed, the revised density of $X$ is
$$p(X|Y=y)=\frac{p(y|X)\,p(X)}{p(y)}, \qquad (2)$$
from direct application of Bayes' theorem.

Most Bayesian parametric regression and time series models (including linear and non-linear) adopt the above model structure, and their inference involves the evaluation of integral (1) and the Bayes rule (2). However, in many situations the above integral cannot be obtained in closed form and the application of rule (2) does not lead to a conjugate analysis, which is usually desirable in a sequential setting such as time series application. For such situations it is desirable to approximate only the mean and variance of $X|Y=y$. In this paper we consider the general problem of obtaining approximations of the first two moments of $X|Y=y$, when we specify only the first two moments of $X$ and $Y$ and not their joint distribution. We achieve this by replacing the full conditional independence structure, which is based on the joint distribution of $X$ and $Y$, by second-order independence, which is based on the means and variances of $X$ and $Y$. Our motivation is generated from the Gaussian case: suppose that $X$ and $Y$ have a joint normal distribution; then $X-A_{xy}Y$ and $Y$ are mutually independent and the distribution of $X|Y=y$ can be derived from the distribution of $X-A_{xy}Y$, where $A_{xy}$ is the regression matrix of $X$ on $Y$ (for a definition of $A_{xy}$ see Section 2.2). So we can define a subclass of the Bayesian models of (1) and (2), where we replace the strict mutual independence requirement by second-order independence. Details appear in our definition of prior posterior probability models that follows.

2.2 Posterior mean and variance approximation

Let $X\in\mathbb{R}^m$, $Y\in\mathbb{R}^p$, $W\in\mathbb{R}^q$ be any random vectors with a joint distribution ($m,p,q\in\mathbb{N}-\{0\}$). We use the notation $E(X)$ for the mean vector of $X$, $\mathrm{Var}(X)$ for the covariance matrix of $X$ and $\mathrm{Cov}(X,Y)$ for the covariance matrix of $X$ and $Y$. We use the notation $X\perp_2 Y$ to indicate that $X$ and $Y$ are second-order independent, i.e. $E(X|Y=y)=E(X)$ and $\mathrm{Var}(X|Y=y)=\mathrm{Var}(X)$, for any value $y$ of $Y$. Furthermore, we use the notation $X\perp_2 W\,|\,Y$ to indicate that, given $Y$, $X$ and $W$ are second-order independent, i.e. $E(X|W=w,Y=y)=E(X|Y=y)$ and $\mathrm{Var}(X|W=w,Y=y)=\mathrm{Var}(X|Y=y)$. Details on conditional independence can be found in Whittaker (1990) or Lauritzen (1996), who discuss independence at a much more sophisticated level, necessary for the development of graphical models. Considering vectors $X$ and $Y$ as above, it is well known that $X-A_{xy}Y$ and $Y$ are uncorrelated, where $A_{xy}=\mathrm{Cov}(X,Y)\{\mathrm{Var}(Y)\}^{-1}$ is the regression matrix of $X$ on $Y$.
In order to obtain approximations of the posterior mean $E(X|Y=y)$ and the posterior covariance matrix $\mathrm{Var}(X|Y=y)$, it is necessary to go one step further and assume that
$$X-A_{xy}Y \perp_2 Y, \qquad (3)$$
which of course implies that $X-A_{xy}Y$ and $Y$ are uncorrelated. With $\mu_x=E(X)$ and $\mu_y=E(Y)$, the prior means of $X$ and $Y$ respectively, the above assumption is equivalent to the following two postulates.

1. Given $Y$, the posterior mean $E(X-A_{xy}Y|Y=y)$ of $X-A_{xy}Y$ does not depend on the value $y$ of $Y$, so that the value of this mean must be the same for all values of $Y$, and so equal to its prior expectation $\mu_x-A_{xy}\mu_y$.

2. Given $Y$, the posterior covariance matrix $\mathrm{Var}(X-A_{xy}Y|Y=y)$ of $X-A_{xy}Y$ does not depend on the value $y$ of $Y$, so that this posterior covariance matrix takes the same value for all values $y$ of $Y$ and is necessarily equal to its prior covariance matrix $\mathrm{Var}(X-A_{xy}Y)$.

Thus it is possible to approximate $E(X|Y=y)$ and $\mathrm{Var}(X|Y=y)$, since from the definition of second-order independence (given above) we have
$$E(X-A_{xy}Y|Y=y)=E(X-A_{xy}Y)\ \Rightarrow\ E(X|Y=y)-A_{xy}y=\mu_x-A_{xy}\mu_y\ \Rightarrow\ E(X|Y=y)=\mu_x+A_{xy}(y-\mu_y),$$
$$\mathrm{Var}(X|Y=y)=\mathrm{Var}(X-A_{xy}Y|Y=y)=\mathrm{Var}(X-A_{xy}Y)=\Sigma_x+A_{xy}\Sigma_yA_{xy}'-2\,\mathrm{Cov}(X,Y)A_{xy}'=\Sigma_x-A_{xy}\Sigma_yA_{xy}',$$
and so we write $X|Y=y\sim\{\mu_x+A_{xy}(y-\mu_y),\ \Sigma_x-A_{xy}\Sigma_yA_{xy}'\}$, where $\Sigma_x=\mathrm{Var}(X)$ and $\Sigma_y=\mathrm{Var}(Y)$. Therefore we can define models that have a prior/posterior updating facility based on second-order independence, and that approximate the posterior mean and variance obtained from an application of Bayes' theorem when the full distributions are specified. Thus we have the following definition.

Definition 1. Let $X$ and $Y$ be any vectors of dimensions $m$ and $p$ respectively, and assume that the joint distribution of $Z=[X'\ Y']'$ exists. Let $A_{xy}$ be the regression matrix of $X$ on $Y$. A first-order partially specified prior posterior probability model for $(X;Y)$ (notation: PSPP(1)) is defined such that: (a) $X-A_{xy}Y\perp_2 Y$ and (b) for any value $y$ of $Y$, the mean vector and the covariance matrix of $X|Y=y$ are obtainable from the mean vector and the covariance matrix of $X-A_{xy}Y$.

We note that if $X$ and $Y$ have a joint normal distribution, then second-order independence is guaranteed and in particular $X-A_{xy}Y$ and $Y$ are mutually independent, which is much stronger than property (3). In this case $E(X|Y=y)$ and $\mathrm{Var}(X|Y=y)$ are the exact posterior moments, produced by an application of Bayes' rule (2). It follows that the approximation of the first two moments reflects the approximation of postulate (3). Thus the approximations of $E(X|Y=y)$ and $\mathrm{Var}(X|Y=y)$ will be as accurate as condition (3) is satisfied. The question is: as we depart from normality, how justified are we in applying (3)? In order to answer this question and to support the adoption of (3), we give the next result, which states that Bayes linear estimation is equivalent to mean and variance estimation employing assumption (3).

Theorem 1. Consider the vectors $X$ and $Y$ as above. Under quadratic loss, $\mu_x+A_{xy}(Y-\mu_y)$ is the Bayes linear estimator if and only if $X-A_{xy}Y\perp_2 Y$.

The proof of this result is given in the appendix. Thus, if one is happy to accept the assumptions of Bayes linear optimality, one has to employ (3).
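To fix ideas, the following is a minimal sketch (not part of the paper) of the PSPP(1) second-order update, computing the approximate posterior mean and covariance of $X|Y=y$ directly from the prior moments; all variable names and the illustrative numbers are assumptions made here for the example.

```python
import numpy as np

def pspp1_update(mu_x, mu_y, Sigma_x, Sigma_y, Sigma_xy, y):
    """Second-order (PSPP(1)) update of X given Y = y.

    Returns the approximate posterior mean mu_x + A_xy (y - mu_y) and
    covariance Sigma_x - A_xy Sigma_y A_xy', with A_xy = Sigma_xy Sigma_y^{-1}.
    """
    A_xy = Sigma_xy @ np.linalg.inv(Sigma_y)        # regression matrix of X on Y
    post_mean = mu_x + A_xy @ (y - mu_y)            # E(X | Y = y)
    post_cov = Sigma_x - A_xy @ Sigma_y @ A_xy.T    # Var(X | Y = y)
    return post_mean, post_cov

# Illustrative use with arbitrary prior moments
mu_x = np.array([0.0, 1.0]); mu_y = np.array([0.5])
Sigma_x = np.array([[2.0, 0.3], [0.3, 1.0]])
Sigma_y = np.array([[1.5]])
Sigma_xy = np.array([[0.8], [0.2]])                 # Cov(X, Y)
m, C = pspp1_update(mu_x, mu_y, Sigma_x, Sigma_y, Sigma_xy, np.array([1.2]))
```

Under joint normality this coincides with the exact conditional moments; otherwise it is the second-order approximation discussed above.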
Next we give three illustrative examples that show assumption (3) may be approximately satisfied.

Example A: checking postulate (3) for the multivariate Student t distribution

Let $X\in\mathbb{R}^m$ and $Y\in\mathbb{R}^p$ be random vectors with a joint Student $t$ distribution with $n$ degrees of freedom (Gupta and Nagar, 1999, §4.2). For example, the marginal density of $X$ is the Student $t$ distribution $X\sim T_m(n,\mu_x,C_{11})$ with density function
$$p(X)=\frac{n^{n/2}\,\Gamma\{(n+m)/2\}}{\pi^{m/2}\,\Gamma(n/2)\,|C_{11}|^{1/2}}\,\{n+(X-\mu_x)'C_{11}^{-1}(X-\mu_x)\}^{-(n+m)/2},$$
for $\mu_x=E(X)$ and $\mathrm{Var}(X)=nC_{11}/(n-2)$, where $\Gamma(\cdot)$ denotes the gamma function and $|\cdot|$ denotes the determinant. Write
$$Z=\begin{pmatrix}X\\ Y\end{pmatrix}\sim T_{m+p}\left\{n,\begin{pmatrix}\mu_x\\ \mu_y\end{pmatrix},\begin{pmatrix}C_{11}&C_{12}\\ C_{12}'&C_{22}\end{pmatrix}\right\},$$
for some known parameters $\mu_x$, $\mu_y$, $C_{11}$, $C_{12}$ and $C_{22}$. The regression coefficient of $X$ on $Y$ is $A_{xy}=C_{12}C_{22}^{-1}$, so that
$$\begin{pmatrix}X-A_{xy}Y\\ Y\end{pmatrix}\sim T_{m+p}\left\{n,\begin{pmatrix}\mu_x-A_{xy}\mu_y\\ \mu_y\end{pmatrix},\begin{pmatrix}C_{11}-A_{xy}C_{22}A_{xy}'&0\\ 0&C_{22}\end{pmatrix}\right\}.$$
Now, for any value $y$ of $Y$, the conditional distribution of $X-A_{xy}Y$ given $Y=y$ is
$$X-A_{xy}Y\,|\,Y=y\sim T_m\left[n+p,\ \mu_x-A_{xy}\mu_y,\ (C_{11}-A_{xy}C_{22}A_{xy}')\{1+n^{-1}(y-\mu_y)'C_{22}^{-1}(y-\mu_y)\}\right].$$
Thus for any $n>0$, $E(X-A_{xy}Y|Y=y)=E(X-A_{xy}Y)$, while for the variance, for $n>2$, $\mathrm{Var}(X-A_{xy}Y|Y=y)\approx n(n-2)^{-1}(C_{11}-A_{xy}C_{22}A_{xy}')=\mathrm{Var}(X-A_{xy}Y)$. For large $n$ the postulate $X-A_{xy}Y\perp_2 Y$ is thought to be satisfactory.

Example B: checking postulate (3) for the inverted multivariate Student t distribution

The inverted Student $t$ distribution is discussed in Dickey (1967) and in Gupta and Nagar (1999, §4.4), and it is generated from a multivariate normal and a Wishart distribution as follows. Suppose that $X^*\sim N_p(0,I_p)$ and $\Sigma\sim W_p(n+p-1,I_p)$, for some $n>0$, where $W_p(n+p-1,I_p)$ denotes a Wishart distribution with $n+p-1$ degrees of freedom and parameter matrix $I_p$; this distribution belongs to the orthogonally invariant and residually independent family of distributions, discussed in Khatri et al. (1991) and Gupta and Nagar (1999, §9.5). For a vector $\mu$ and a covariance matrix $C$ we define $X=n^{1/2}C^{1/2}\{\Sigma+X^*(X^*)'\}^{-1/2}X^*+\mu$, where $C^{1/2}$ denotes the symmetric square root of $C$. Then the density of $X$ is
$$p(X)=\frac{\Gamma\{(n+p)/2\}}{\pi^{p/2}\,\Gamma(n/2)\,|C|^{1/2}\,n^{(p+n-2)/2}}\,\{n-(X-\mu)'C^{-1}(X-\mu)\}^{n/2-1}.$$
This density defines the inverted multivariate Student $t$ distribution and the notation used is $X\sim IT_p(n,\mu,C)$.

Following a similar line of thinking as in Example A, we have that $X-A_{xy}Y\sim IT_m(n,\mu_x-A_{xy}\mu_y,C_{11}-A_{xy}C_{22}A_{xy}')$ and, conditioning on $Y=y$ (Gupta and Nagar, 1999, §4.4), we obtain
$$X-A_{xy}Y\,|\,Y=y\sim IT_m\left[n,\ \mu_x-A_{xy}\mu_y,\ (C_{11}-A_{xy}C_{22}A_{xy}')\{1-n^{-1}(y-\mu_y)'C_{22}^{-1}(y-\mu_y)\}\right].$$
So we conclude that for large $n$ the mean and variance of $X-A_{xy}Y\,|\,Y=y$ and of $X-A_{xy}Y$ are approximately the same, and thus $X-A_{xy}Y\perp_2 Y$.
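As an informal check of the claim in Example A, the small sketch below (an addition, not from the paper) evaluates how far the conditional variance of $X-A_{xy}Y$ departs from the marginal one as $n$ grows, using the standard fact that $\mathrm{Var}\{T_m(\nu,\mu,C)\}=\nu C/(\nu-2)$; the values of $p$ and of the quadratic form $q=(y-\mu_y)'C_{22}^{-1}(y-\mu_y)$ are arbitrary illustrations.

```python
def t_variance_ratio(n, p, q):
    """Ratio of Var(X - A_xy Y | Y = y) to Var(X - A_xy Y) under the joint
    Student t of Example A, up to the common matrix factor C11 - A_xy C22 A_xy'.
    Uses Var{T_m(nu, mu, C)} = nu C / (nu - 2); q is the observed quadratic form.
    """
    cond = (n + p) / (n + p - 2) * (1 + q / n)   # multiplier of the conditional variance
    marg = n / (n - 2)                            # multiplier of the marginal variance
    return cond / marg

# The ratio approaches 1 as the degrees of freedom n grow (p = 2, q = 3 fixed)
for n in [5, 10, 50, 500]:
    print(n, round(t_variance_ratio(n, p=2, q=3.0), 4))
```

A similar calculation with the factor $1-n^{-1}q$ applies to Example B.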
Example C: checking postulate (3) for the Wishart distribution

Suppose that $\Sigma=(\Sigma_{ij})_{i,j=1,2}$ follows a Wishart distribution $\Sigma\sim W_2(n,S)$ with density
$$p(\Sigma)=\{2^n\,\Gamma_2(n/2)\,|S|^{n/2}\}^{-1}\,|\Sigma|^{(n-3)/2}\exp\left\{-\tfrac12\,\mathrm{tr}(S^{-1}\Sigma)\right\},$$
where $\exp(\cdot)$ denotes the exponential function, $\mathrm{tr}(\cdot)$ denotes the trace of a square matrix, $S=(S_{ij})_{i,j=1,2}$, $n>0$ are the degrees of freedom and $\Gamma_2(x)=\sqrt{\pi}\,\Gamma(x)\Gamma(x-1/2)$ denotes the bivariate gamma function. Let $X=\Sigma_{12}$ and $Y=\Sigma_{22}$ and assume that we observe $Y=y$, so that $E(Y)=nS_{22}\approx y$. From the expected values of the Wishart distribution (Gupta and Nagar, 1999, §3.3.6), we can write
$$\begin{pmatrix}X\\ Y\end{pmatrix}\sim\left\{n\begin{pmatrix}S_{12}\\ S_{22}\end{pmatrix},\ n\begin{pmatrix}S_{11}S_{22}+S_{12}^2&2S_{12}S_{22}\\ 2S_{12}S_{22}&2S_{22}^2\end{pmatrix}\right\},$$
which, with $A_{xy}=S_{12}/S_{22}$, yields $E(X-A_{xy}Y)=0$ and $\mathrm{Var}(X-A_{xy}Y)=n(S_{11}S_{22}-S_{12}^2)$. From Gupta and Nagar (1999, §3.3.4), the posterior distribution of $X|Y=y$ is $X|Y=y\sim N\{S_{12}y/S_{22},\ (S_{11}-S_{12}^2/S_{22})y\}$, leading to $E(X-A_{xy}Y|Y=y)=0=E(X-A_{xy}Y)$ and $\mathrm{Var}(X-A_{xy}Y|Y=y)=\mathrm{Var}(X|Y=y)=(S_{11}-S_{12}^2/S_{22})y=(S_{11}S_{22}-S_{12}^2)y/S_{22}\approx\mathrm{Var}(X-A_{xy}Y)$, since $y\approx nS_{22}$. Thus we can establish that $X-A_{xy}Y\perp_2 Y$.

Examples A and B show that PSPP(1) modelling can be regarded as an approximation to the true posterior mean and variance corresponding to the full probability model under the distributions of those examples.

Returning to Definition 1, there are situations where the prior mean vectors and covariance matrices of $X$ and $Y$ are available conditional on some other parameters, the typical example being when the moments of $X$ and $Y$ are given conditional on a covariance matrix $V$. Then, as $V$ is usually unknown, the purpose of the study is to approximate the posterior mean vector and covariance matrix of $X|Y=y$, as well as to approximate the posterior mean vector and covariance matrix of $V$. In such situations postulate (3) reads $X-A_{xy}Y\perp_2 Y\,|\,V$, and another postulate for $V$ is necessary in order to approximate the moments of $X|Y=y$ unconditionally of $V$. Regression problems of this kind are met frequently in practice, as $V$ can represent an observation variance or volatility, the estimation of which is beneficial for accounting for the uncertainty of predictions. We can then extend Definition 1 to accommodate the estimation of $V$.

Definition 2. Let $X$, $V$ and $Y$ be any vectors of dimensions $m$, $r$ and $p$ respectively, and assume that the joint distribution of $Z=[X'\ V'\ Y']'$ exists. Let $A_{xy}$ be the regression matrix of $X$ on $Y$, given $V$, and let $B_{vy}$ be the regression matrix of $V$ on $Y$. A second-order partially specified prior posterior probability model for $(X,V;Y)$ (notation: PSPP(2)) is defined such that: (a) $X-A_{xy}Y\perp_2 Y\,|\,V$ and $V-B_{vy}Y\perp_2 Y$, and (b) for any value $y$ of $Y$, the mean vector and the covariance matrix of $X|V,Y=y$ and of $V|Y=y$ are obtainable from the mean vector and the covariance matrices of $X-A_{xy}Y$ and $V-B_{vy}Y$, respectively.

An example of a PSPP(2) model is the scaled observational precision model, which is examined in detail in Sections 3 and 4. Next we discuss the differences between PSPP(2) and Bayes linear estimation when $V$ is a scalar variance.
Goldstein (1979, 1983), Wilkinson and Goldstein (1996) and Wilkinson (1997) examine some variants of this problem by considering variance modifications of the basic linear Bayes rule considered in Hartigan (1969) and in Goldstein (1976). Below we give a basic description of the proposed estimators and we indicate the similarities and the differences between the proposed PSPP models and the Bayes linear estimators.

Consider a simple regression problem formulated as $Y|X,V\sim(X,V)$, $X\sim\{E(X),\mathrm{Var}(X)\}$, where $Y$ is a scalar response variable, $X$ is a scalar regressor variable and $E(X)$, $\mathrm{Var}(X)$ are the prior mean and variance of $X$. If $V$ is known, the posterior mean $E(X|V,Y=y)$ can be approximated by the Bayes linear rule
$$\mu=\frac{E(X)V+y\,\mathrm{Var}(X)}{V+\mathrm{Var}(X)}=E(X)+A_{xy}\{y-E(X)\}, \qquad (4)$$
with related posterior expected risk
$$R(\mu)=\frac{\mathrm{Var}(X)\,V}{\mathrm{Var}(X)+V}=\mathrm{Var}(X)(1-A_{xy}),$$
where $A_{xy}=\mathrm{Var}(X)/\{\mathrm{Var}(X)+V\}$ is the regression coefficient of $X$ on $Y$, conditional on $V$. As is well known, $R(\mu)$ is the minimum posterior expected risk over all linear estimators of $E(X|Y=y)$, and in this sense $\mu$ attains Bayes linear optimality. If one assumes that the distributions of $Y|X,V$ and $X$ are normal, then $\mu$ gives the exact posterior mean $E(X|V,Y=y)$ and $R(\mu)$ gives the exact posterior variance $\mathrm{Var}(X|V,Y=y)$. However, in many practical problems $V$ is not known, and ideally the modeller wishes to estimate $V$ and provide an approximation to the mean and variance of $X|Y=y$, unconditionally of $V$. Suppose that, in addition to the above modelling assumptions, in order to estimate $V$ a prior mean $E(V)$ and prior variance $\mathrm{Var}(V)$ of $V$ are specified, namely $V\sim\{E(V),\mathrm{Var}(V)\}$. Goldstein (1979, 1983) suggests estimating $V$ with the Bayes linear rule
$$V^*=\frac{E(V)\mathrm{Var}(Y^*)+y^*\mathrm{Var}(V)}{\mathrm{Var}(Y^*)+\mathrm{Var}(V)}, \qquad (5)$$
where $y^*$ is an observation from $Y^*$, a statistic that is unbiased for $V$, and $\mathrm{Var}(Y^*)$ is specified a priori. Then the Bayes rule $\mu$ is replaced by the rule $\mu^*$, where $V$ in $\mu$ is replaced by its estimate $V^*$. One can see that the revised regression coefficient $A^*_{xy}$ becomes
$$A^*_{xy}=\frac{\mathrm{Var}(X)}{\mathrm{Var}(X)+V^*}=\frac{\mathrm{Var}(X)\mathrm{Var}(Y^*)+\mathrm{Var}(X)\mathrm{Var}(V)}{\mathrm{Var}(X)\mathrm{Var}(Y^*)+\mathrm{Var}(X)\mathrm{Var}(V)+E(V)\mathrm{Var}(Y^*)+y^*\mathrm{Var}(V)},$$
and so the variance-modified Bayes rule for $E(X|Y=y)$ is $\mu^*=E(X)+A^*_{xy}\{y-E(X)\}$.

From Theorem 1 it is evident that the Bayes rule (4) is equivalent to $X-A_{xy}Y\perp_2 Y\,|\,V$. The Bayes rule (5) corresponds to the postulate $V-B_{vy}Y\perp_2 Y$, although the latter does not establish the equivalence of the PSPP models and Bayes linear estimation methods, since it can be verified that $\mu^*$ and $V^*$ are not the same as in the PSPP modelling approach (see Section 3). In addition, the roles of $Y^*$ and $y^*$ are not fully understood; for example, one question is how $y$ and $y^*$ are related and how one can determine $y^*$ from $y$, especially when $y$ is a vector of observations.
The main problem experienced with the variance-modified Bayes linear estimator $\mu^*$ is that the related expected risk $R(\mu^*)$ cannot easily be determined, and the work in this direction (Goldstein, 1979, 1983) has led either to intuitive evaluation of $R(\mu^*)$ or to imposing even more restrictions on the model in order to obtain an analytic formula for $R(\mu^*)$. Although both of these approaches can work in regression problems, they are not appropriate for time series problems, where sequential updating is required and thus an accurate evaluation of that risk is necessary. On the other hand, the PSPP approach combines the two postulates, $X-A_{xy}Y\perp_2 Y\,|\,V$ and $V-B_{vy}Y\perp_2 Y$, using conditional expectations. It should be noted that the PSPP treatment is free of most of the assumptions made in the variance-modified Bayes linear system, so that approximate estimation of the posterior $\mathrm{Var}(X|Y)$ can be given. The PSPP models are developed mainly for multivariate regression and time series problems, and they are aimed at situations where either a fully Bayesian model is not available, or computationally intensive calculations, such as Monte Carlo methods, are undesirable, or a model can only be specified via means and variances.

3 The scaled observational precision model

3.1 Main theory

The scaled observational precision (SOP) model is a conjugate regression model, which illustrates the normal dynamic linear model with observational variances; see for example West and Harrison (1997, §4.5). This model is widely used in practice because it is capable of handling the practical problem of unknown observation variances. Here we construct a PSPP(2) model and compare it with the usual conjugate SOP model. Let $V$ be a scalar variance, $X\in\mathbb{R}^m$, $Y\in\mathbb{R}^p$ with
$$Z=\begin{pmatrix}X\\ Y\end{pmatrix}\Big|\,V\sim\left\{\begin{pmatrix}\mu_x\\ \mu_y\end{pmatrix},\ V\begin{pmatrix}\Sigma_x&A_{xy}\Sigma_y\\ \Sigma_yA_{xy}'&\Sigma_y\end{pmatrix}\right\},$$
for some known $\mu_x$, $\mu_y$, $\Sigma_x$ and $\Sigma_y$. Assuming $X-A_{xy}Y\perp_2 Y\,|\,V$, the partially specified posterior is
$$X|V,Y=y\sim\{\mu_x+A_{xy}(y-\mu_y),\ V(\Sigma_x-A_{xy}\Sigma_yA_{xy}')\}.$$
Let $T$ be a, generally non-linear, function of $Y$, often taken as $T=(Y-\mu_y)'\Sigma_y^{-1}(Y-\mu_y)$. Define $K$ to be $\alpha$ times the variance of $T|V$, for some $\alpha>0$, and $A_{v\tau}$ to be the regression coefficient of $V$ on $T$, conditional on $K$. We assume $V-A_{v\tau}T\perp_2 Y,K$, with forecast $T|V,K\sim(V,K/\alpha)$ and $\mathrm{Cov}(T,V|K)=\mathrm{Var}(V|K)$, where $V|K\sim(\hat V,K/\eta)$, which is $\eta/\alpha$ times as precise as the conditional distribution of $T$, for some known $\hat V$, $\alpha$, $\eta$, with
$$\begin{pmatrix}V\\ T\end{pmatrix}\Big|\,K\sim\left\{\begin{pmatrix}\hat V\\ \hat V\end{pmatrix},\ \frac{K}{\eta}\begin{pmatrix}1&1\\ 1&(\eta+\alpha)/\alpha\end{pmatrix}\right\}.$$
Given the observation $T=\tau$, and using $V-A_{v\tau}T\perp_2 Y,K$ with $A_{v\tau}=\alpha/(\eta+\alpha)$, we have
$$E(V|K,T=\tau)=E(V|K)+\frac{\alpha}{\eta+\alpha}\{\tau-E(T|K)\}=\frac{\eta\hat V+\alpha\tau}{\eta+\alpha},$$
$$\mathrm{Var}(V|K,T=\tau)=\mathrm{Var}(V|K)-\mathrm{Cov}(V,T|K)\{\mathrm{Var}(T|K)\}^{-1}\mathrm{Cov}(T,V|K)=\frac{K}{\eta}-\frac{K^2}{\eta^2}\,\frac{\eta\alpha}{K(\eta+\alpha)}=\frac{K}{\eta}\left(1-\frac{\alpha}{\eta+\alpha}\right)=\frac{K}{\eta+\alpha},$$
so that
$$V|K,T=\tau\sim\left(\frac{\eta\hat V+\alpha\tau}{\eta+\alpha},\ \frac{K}{\eta+\alpha}\right). \qquad (6)$$
Hence, using conditional expectations, it follows that
$$X|Y=y\sim\left\{\mu_x+A_{xy}(y-\mu_y),\ \frac{\eta\hat V+\alpha\tau}{\eta+\alpha}\,(\Sigma_x-A_{xy}\Sigma_yA_{xy}')\right\}, \qquad (7)$$
where $\tau=(y-\mu_y)'\Sigma_y^{-1}(y-\mu_y)$.
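The following short sketch (added here for illustration, not from the paper) puts equations (6) and (7) together in code; the quantities $\mu_x,\mu_y,\Sigma_x,\Sigma_y,A_{xy},\hat V,\eta,\alpha$ are taken as given inputs.

```python
import numpy as np

def sop_pspp2_update(mu_x, mu_y, Sigma_x, Sigma_y, A_xy, V_hat, eta, alpha, y):
    """Sketch of the SOP PSPP(2) update of Section 3.1 (equations (6)-(7)).

    Conditional on V, X | V, Y=y has mean mu_x + A_xy (y - mu_y) and covariance
    V (Sigma_x - A_xy Sigma_y A_xy'); V is updated through the statistic
    T = (Y - mu_y)' Sigma_y^{-1} (Y - mu_y).
    """
    tau = float((y - mu_y) @ np.linalg.solve(Sigma_y, y - mu_y))   # observed value of T
    V_post = (eta * V_hat + alpha * tau) / (eta + alpha)            # E(V | K, T = tau), eq. (6)
    post_mean = mu_x + A_xy @ (y - mu_y)                            # E(X | Y = y)
    post_cov = V_post * (Sigma_x - A_xy @ Sigma_y @ A_xy.T)         # Var(X | Y = y), eq. (7)
    return post_mean, post_cov, V_post
```

The posterior variance of $V$ itself, $K/(\eta+\alpha)$, is not needed for (7) and is therefore not returned.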
3.2 Comparison with the conjugate normal/gamma model

Now consider the relationship of the above model with standard normal conjugate models. A typical normal conjugate model with unknown scalar variance $V$ postulates the distribution of $Z$ given $V$ as
$$Z=\begin{pmatrix}X\\ Y\end{pmatrix}\Big|\,V\sim N_{m+p}\left\{\begin{pmatrix}\mu_x\\ \mu_y\end{pmatrix},\ V\begin{pmatrix}\Sigma_x&A_{xy}\Sigma_y\\ \Sigma_yA_{xy}'&\Sigma_y\end{pmatrix}\right\},$$
with the distribution of $V$ as an inverse gamma, so that $\nu s/V\sim\chi^2_\nu$. Here $N_{m+p}(\cdot,\cdot)$ denotes the $(m+p)$-dimensional normal distribution and $\chi^2_\nu$ denotes the chi-squared distribution with $\nu$ degrees of freedom. Writing $T=(Y-\mu_y)'\Sigma_y^{-1}(Y-\mu_y)$, the conditional distribution of $T$ given $V$ can easily be derived from the distribution of $TV^{-1}|V$, which is $TV^{-1}|V\sim\chi^2_p$. Then the posterior distribution of $V^{-1}$ given $Y=y$ is
$$p\left(\frac1V\,\Big|\,T=\tau\right)=\frac{p(\tau|V)\,p(1/V)}{p(\tau)}\propto\left(\frac1V\right)^{(\nu+p)/2-1}\exp\left(-\frac{\nu s+\tau}{2V}\right),$$
from which it is deduced that, given $Y=y$, $(\nu s+\tau)V^{-1}|Y=y\sim\chi^2_{\nu+p}$. The posterior distribution of $X|Y=y$ is a multivariate Student $t$ distribution with $\nu+p$ degrees of freedom:
$$X|Y=y\sim T_m\left\{\nu+p,\ \mu_x+A_{xy}(y-\mu_y),\ \frac{\nu s+\tau}{\nu+p}(\Sigma_x-A_{xy}\Sigma_yA_{xy}')\right\}, \qquad (8)$$
$$\frac{\nu s+\tau}{V}\,\Big|\,Y=y\sim\chi^2_{\nu+p},\qquad \tau=(y-\mu_y)'\Sigma_y^{-1}(y-\mu_y). \qquad (9)$$
Note that if $\hat V=\nu s/(\nu+p-3)$, $\eta=\nu+p-3$ and $\alpha=1$, then the posterior mean vector and covariance matrix of (7) and (8) are identical. However, this is not consistent with the conjugate model, since from the prior assumption $\nu s/V\sim\chi^2_\nu$ it is $E(V|s)=\nu s/(\nu-2)\neq\hat V$ ($\nu>2$), for any $p>1$. If we want to adopt the same prior $\hat V=\nu s/(\nu-2)$ in both the PSPP and the conjugate models, then the respective posterior means for $V$ will differ, i.e.
$$E(V|Y=y,\ \text{PSPP model})-E(V|Y=y,\ \text{conjugate model})=\frac{(p-1)\nu s}{(\nu-2)(\nu+p-2)},$$
where we have used $\eta=\nu+p-3$ and $\alpha=1$ as before. Note that if $Y$ is a scalar response, i.e. $p=1$, then the two variance estimates are identical. So the respective posterior variances of equations (7) and (8) will differ accordingly only when $p>1$. From the posterior distribution of $1/V$ we have that
$$\mathrm{Var}(V|Y=y,\ \text{conjugate model})=\frac{2(\tau+\nu s)^2}{(\nu+p-2)^2(\nu+p-4)}, \qquad (10)$$
while, from equation (6), the respective posterior variance for the PSPP model is
$$\mathrm{Var}(V|K,Y=y,\ \text{PSPP model})=\frac{K}{\nu+p-2}, \qquad (11)$$
where we have used $\alpha=1$ and $\eta=\nu+p-3$. If we choose $K=2(\tau+\nu s)^2/\{(\nu+p-2)(\nu+p-4)\}$, then the two variances will be the same. Note that, irrespective of the choice of $K$ (given that $K$ is bounded), as the degrees of freedom $\nu$ tend to infinity, the variances of both equations (10) and (11) converge to zero, and so as $\nu\to\infty$, $V$ concentrates about its mean, asymptotically degenerating.

3.3 Application to time series modelling I

The above ideas can be applied to time series modelling when interest is placed on the estimation of the observation or measurement variance. Consider, for example, the $p$-dimensional time series vector $Y_t$, which at a particular time $t$ satisfies
$$Y_t=B_tX_t+\epsilon_t,\quad \epsilon_t\sim(0,VZ),\qquad X_t=C_tX_{t-1}+\omega_t,\quad \omega_t\sim(0,VW), \qquad (12)$$
where $B_t$ is a known $p\times m$ design matrix, $C_t$ is a known $m\times m$ transition matrix and the innovation error sequences $\{\epsilon_t\}$ and $\{\omega_t\}$ are individually and mutually uncorrelated. The $p\times p$ and $m\times m$ covariance matrices $Z$ and $W$ are assumed known, while the scalar variance $V$ is unknown.
Initially we assume $X_0|V\sim(m_0,VP_0)$ and $V\sim(\hat V_0,K_0/\eta_0)$, for some known $m_0$, $P_0$, $\hat V_0$, $K_0$ and $\eta_0$. It is also assumed that, a priori, $X_0$ is uncorrelated with $\{\epsilon_t\}$ and $\{\omega_t\}$. Denote by $y^t$ the information set comprising the observations $y_1,y_2,\ldots,y_t$. Then the PSPP model described above applies at each time $t$ with $\mu_x=C_tm_{t-1}$, $\mu_y=f_t=B_tC_tm_{t-1}$, $\Sigma_x=R_t=C_tP_{t-1}C_t'+W$ and $\Sigma_y=Q_t=B_tR_tB_t'+Z$, where $m_{t-1}$ and $P_{t-1}$ are calculated in the same way at time $t-1$, starting with $t=1$. Given $y^{t-1}$, the regression matrix of $X_t$ on $Y_t$ is $A_{xy}=A_t=R_tB_t'Q_t^{-1}$, which is independent of $V$. It follows that $V|y^t\sim(\hat V_t,K_t/\eta_t)$. With $\alpha=1$, it is $K_t=K_{t-1}$ and $\eta_t=\eta_{t-1}+1$, so that $\eta_t\hat V_t=\eta_{t-1}\hat V_{t-1}+e_t'Q_t^{-1}e_t$, where $e_t'Q_t^{-1}e_t=\tau_t$ and $e_t=y_t-f_t$ is the one-step forecast error vector.

The above estimate $\hat V_t$ approximates the variance estimate of the conjugate dynamic linear model (West and Harrison, 1997, §4.5), which, assuming a prior $\eta_{t-1}\hat V_{t-1}V^{-1}|y^{t-1}\sim\chi^2_{\eta_{t-1}}$, arrives at the posterior $(\eta_{t-1}\hat V_{t-1}+\tau_t)V^{-1}|y^t\sim\chi^2_{\eta_{t-1}+p}$, so that $E(V|y^t)=\eta_t\hat V_t/(\eta_t+p-3)\approx\hat V_t$. The variance of $V|y^t$ in the conjugate model is
$$\mathrm{Var}(V|y^t)=\frac{2\eta_t^2\hat V_t^2}{(\eta_t-2)^2(\eta_t-4)},$$
whereas the respective variance in the PSPP model is $\mathrm{Var}(V|y^t)=K/\eta_t$, with $K=K_0$. Although these two variances differ considerably, in the sense that in the conjugate model the variance of $V|y^t$ is a function of the data $y^t$ while in the PSPP model it is only a function of time $t$ and of the prior $K_0$, it can be seen that as $t\to\infty$ both variances converge to zero, and so in both cases $V|y^t$ concentrates about its mean $\hat V_t$, asymptotically degenerating. In the PSPP model, the posterior mean vector and covariance matrix of $X_t|y^t$ are given by $X_t|y^t\sim(m_t,\hat V_tP_t)$, where $m_t=C_tm_{t-1}+A_te_t$ and $P_t=R_t-A_tQ_tA_t'$. These approximate the respective mean vector and covariance matrix produced by the conjugate model, which, under the inverted gamma prior, results in the posterior Student $t$ distribution $X_t|y^t\sim T_m(\eta_t,m_t,\hat V_tP_t)$.
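The recursions of this subsection are summarised in the following minimal sketch (an addition for illustration, not the paper's code); array shapes and names are assumptions, and the step is meant to be iterated over $t=1,2,\ldots$.

```python
import numpy as np

def sop_filter_step(m_prev, P_prev, V_prev, eta_prev, B, C, W, Z, y):
    """One step of the Section 3.3 PSPP/SOP filter with unknown scalar variance V.

    Implements mu_x = C m_{t-1}, R_t = C P_{t-1} C' + W, f_t = B C m_{t-1},
    Q_t = B R_t B' + Z, A_t = R_t B' Q_t^{-1} and, with alpha = 1,
    eta_t V_t = eta_{t-1} V_{t-1} + e_t' Q_t^{-1} e_t.
    """
    a = C @ m_prev
    R = C @ P_prev @ C.T + W
    f = B @ a
    Q = B @ R @ B.T + Z
    A = R @ B.T @ np.linalg.inv(Q)
    e = y - f                                    # one-step forecast error e_t
    tau = float(e @ np.linalg.solve(Q, e))
    eta = eta_prev + 1.0
    V_hat = (eta_prev * V_prev + tau) / eta      # updated variance estimate V_t
    m = a + A @ e                                # posterior state mean m_t
    P = R - A @ Q @ A.T                          # posterior state scale matrix P_t
    return m, P, V_hat, eta                      # X_t | y^t ~ (m, V_hat * P)
```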
4 The generalized observational precision model

4.1 Main theory

A generalization of the SOP model of Section 3 to the case where $V$ is a $p\times p$ variance-covariance matrix is not available, and only special forms of conjugate SOP models are known (West and Harrison, 1997, Chapter 16). The problem is that, since the dimensions of $X$ and $Y$ are different, it is not possible to scale the covariance matrix of $X|V$ by $V$, because $X$ has dimension $m$ while $V$ is a $p\times p$ matrix. This problem is discussed in detail in Barbosa and Harrison (1992) and Triantafyllopoulos (2007). Next we propose a generalization of the SOP model in which, given $V$, we avoid scaling the covariance matrices of $X$ and $Y$ by $V$. This setting is more natural than that of the SOP model, which adopts the somewhat mathematically convenient variance scaling. Let $V$ be a $p\times p$ covariance matrix, $X\in\mathbb{R}^m$, $Y\in\mathbb{R}^p$ with
$$Z=\begin{pmatrix}X\\ Y\end{pmatrix}\Big|\,V\sim\left\{\begin{pmatrix}\mu_x\\ \mu_y\end{pmatrix},\ \begin{pmatrix}\Sigma_x&A_{xy}(\Sigma_y+V)\\ (\Sigma_y+V)A_{xy}'&\Sigma_y+V\end{pmatrix}\right\},$$
for some known $\mu_x$, $\mu_y$, $\Sigma_x$ and $\Sigma_y$, not depending on $V$. Note that now we cannot obtain a scaled precision model. Even if we assume prior distributions for $Z|V$ and $V$, we cannot obtain the marginal distributions of $X|Y=y$ and $V|Y=y$ in closed form, since the covariance matrices of $X$ and $Y$ are not scaled by $V$. Assuming $X-A_{xy}Y\perp_2 Y\,|\,V$, the partially specified posterior, conditional on $V$, is
$$X|V,Y=y\sim\{\mu_x+A_{xy}(y-\mu_y),\ \Sigma_x-A_{xy}(\Sigma_y+V)A_{xy}'\}. \qquad (13)$$
Define $T=(Y-\mu_y)(Y-\mu_y)'-\Sigma_y$ and denote by $\mathrm{vech}(V)$ the column-stacking operator of the lower portion of the symmetric positive definite matrix $V$. Given $V$, the forecast of $T$ is
$$\mathrm{vech}(T)|V,K\sim\left(\mathrm{vech}(V),\ \frac{K}{\alpha}\right)\quad\text{and}\quad \mathrm{Cov}\{\mathrm{vech}(V),\mathrm{vech}(T)|K\}=\frac{K}{\eta}=\mathrm{Var}\{\mathrm{vech}(V)|K\},$$
where $\alpha$, $\eta$ are known positive scalars and $K$ is a known $\{p(p+1)/2\}\times\{p(p+1)/2\}$ covariance matrix. With $\hat V$ the prior estimate of $V$, we have
$$\begin{pmatrix}\mathrm{vech}(V)\\ \mathrm{vech}(T)\end{pmatrix}\Big|\,K\sim\left\{\begin{pmatrix}\mathrm{vech}(\hat V)\\ \mathrm{vech}(\hat V)\end{pmatrix},\ \frac{1}{\eta}\begin{pmatrix}K&K\\ K&(\eta+\alpha)\alpha^{-1}K\end{pmatrix}\right\}.$$
The regression matrix of $\mathrm{vech}(V)$ on $\mathrm{vech}(T)$ is $A_{v\tau}=\alpha(\eta+\alpha)^{-1}I_{p(p+1)/2}$, where $I_{p(p+1)/2}$ denotes the $\{p(p+1)/2\}\times\{p(p+1)/2\}$ identity matrix. Assuming now that $\mathrm{vech}(V)-A_{v\tau}\mathrm{vech}(T)\perp_2 T\,|\,K$, we obtain the posterior mean and covariance of $V$ as
$$E\{\mathrm{vech}(V)|K,T=\tau\}=\mathrm{vech}(\hat V)+\frac{\alpha}{\eta+\alpha}\{\mathrm{vech}(\tau)-\mathrm{vech}(\hat V)\}$$
and
$$\mathrm{Var}\{\mathrm{vech}(V)|K,T=\tau\}=\mathrm{Var}\{\mathrm{vech}(V)|K\}-A_{v\tau}\mathrm{Var}\{\mathrm{vech}(T)|K\}A_{v\tau}'=\frac{K}{\eta+\alpha},$$
so that
$$\mathrm{vech}(V)|K,T=\tau\sim\left\{\frac{\mathrm{vech}(\eta\hat V+\alpha\tau)}{\eta+\alpha},\ \frac{K}{\eta+\alpha}\right\}, \qquad (14)$$
from which we see that the posterior mean of $V$ can be written as
$$E(V|K,T=\tau)=\hat V+\frac{\alpha}{\eta+\alpha}(\tau-\hat V)=\frac{\eta\hat V+\alpha\tau}{\eta+\alpha}.$$
We note that in general the regression matrix $A_{xy}$ in (13) will be a function of $V^{-1}$, and this adds complications to the calculation of the mean and covariance matrix of $X|Y=y$. However, if we impose the assumption that $\mathrm{Cov}(X,Y|V)=A\,\mathrm{Var}(Y)$, where $A$ is a known $m\times p$ matrix not depending on $V$, then $A_{xy}=A$ is independent of $V$ and so we get
$$X|Y=y\sim\left\{\mu_x+A_{xy}(y-\mu_y),\ \Sigma_x-A_{xy}\left(\Sigma_y+\frac{\eta\hat V+\alpha\tau}{\eta+\alpha}\right)A_{xy}'\right\}, \qquad (15)$$
where $\tau=(y-\mu_y)(y-\mu_y)'-\Sigma_y$. Given that $K$ is bounded, as $\eta\to\infty$ the covariance matrix of $\mathrm{vech}(V)|K,T=\tau$ converges to the zero matrix, and so $V|K,T=\tau$ concentrates about its mean $E(V|K,T=\tau)$, asymptotically degenerating. This can be seen as a theoretical validation of the accuracy of the proposed estimator of $V$, $E(V|K,T=\tau)=(\eta\hat V+\alpha\tau)/(\eta+\alpha)$.
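As a brief illustration (an addition, not from the paper), the sketch below carries out the update of equations (14) and (15) under the assumption $\mathrm{Cov}(X,Y|V)=A\,\mathrm{Var}(Y)$ named above; the inputs are hypothetical and $K$ plays no role in the returned moments.

```python
import numpy as np

def generalized_sop_update(mu_x, mu_y, Sigma_x, Sigma_y, A, V_hat, eta, alpha, y):
    """Sketch of the generalized SOP update (equations (14)-(15)), assuming the
    regression matrix A_xy = A does not depend on V."""
    resid = y - mu_y
    tau = np.outer(resid, resid) - Sigma_y                 # observed value of T
    V_post = (eta * V_hat + alpha * tau) / (eta + alpha)   # E(V | K, T = tau), eq. (14)
    post_mean = mu_x + A @ resid
    post_cov = Sigma_x - A @ (Sigma_y + V_post) @ A.T      # eq. (15)
    return post_mean, post_cov, V_post
```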
4.2 Application to linear regression modelling

A typical linear regression model sets
$$Y=BX+\epsilon,\quad \epsilon\sim(0,V),\quad X\sim(\mu_x,\Sigma_x), \qquad (16)$$
where $Y$ is a $p$-dimensional vector of response variables, $B$ is a known $p\times m$ design matrix and $\epsilon$ is a $p$-dimensional error vector, which is uncorrelated with the random $m$-dimensional vector $X$. The mean vector $\mu_x$ and the covariance matrix $\Sigma_x$ are assumed known, and $\Sigma_y=B\Sigma_xB'$, so that $\mathrm{Var}(Y)=B\Sigma_xB'+V$. The covariance matrix of $X$ and $Y$ is $\mathrm{Cov}(X,Y)=\Sigma_xB'$, and so the assumption $\mathrm{Cov}(X,Y)=A\,\mathrm{Var}(Y)$ does not hold, since $\mathrm{Var}(Y)$ is a function of $V$. Thus the posterior mean vector and covariance matrix of equation (15) do not apply, since now $A_{xy}$ is stochastic in $V$. In order to resolve this difficulty, we next propose an approximation that allows computation of equation (13).

In order to proceed, we will need to evaluate $E\{(\Sigma_y+V)^{-1}|Y=y\}$ and $\mathrm{Var}[\mathrm{vech}\{(\Sigma_y+V)^{-1}\}|Y=y]$. Since we only have equation (14) and we have no information on the distribution of $V$, we cannot obtain the above mean vector and covariance matrix exactly. Here we choose to adopt an intuitive approach, suggesting that
$$\tilde V=E\{(\Sigma_y+V)^{-1}|K,T=\tau\}\approx\{\Sigma_y+E(V|K,T=\tau)\}^{-1}=(\eta+\alpha)\{(\eta+\alpha)\Sigma_y+\eta\hat V+\alpha\tau\}^{-1},$$
$$\tilde{\tilde V}=\mathrm{Var}[\mathrm{vech}\{(\Sigma_y+V)^{-1}\}|K,T=\tau]\approx\mathrm{Var}\{\mathrm{vech}(\Sigma_y+V)|K,T=\tau\}=\frac{K}{\eta+\alpha}.$$
The reasoning is as follows. Since $\lim_{\eta\to\infty}\mathrm{Var}\{\mathrm{vech}(V)|K,T=\tau\}=0$, $V$ concentrates about its mean, and so we can write $V\approx E(V|K,T=\tau)$ for sufficiently large $\eta$. Then $(\Sigma_y+V)^{-1}\approx\{\Sigma_y+E(V|K,T=\tau)\}^{-1}$. The covariance matrix of $\mathrm{vech}\{(\Sigma_y+V)^{-1}\}$ has been set approximately equal to the covariance matrix of $\mathrm{vech}(\Sigma_y+V)$, ensuring that for large $\eta$ both covariance matrices converge to zero.

The above problem of the specification of $\tilde V$ and $\tilde{\tilde V}$ can be presented generally as follows. Suppose that $M$ is a bounded covariance matrix and assume that $E(M)$ and $\mathrm{Var}\{\mathrm{vech}(M)\}$ are finite and known. The question is, given only this information, can one obtain $E(M^{-1})$ and $\mathrm{Var}\{\mathrm{vech}(M^{-1})\}$? For example, one can notice that if $M$ follows a Wishart or an inverted Wishart distribution, then $\tilde V$ is approximately true. Formally, if $M\sim W_p(n,S)$ ($M$ follows the Wishart distribution with $n$ degrees of freedom and parameter matrix $S$; see e.g. Gupta and Nagar, 1999, Chapter 3), we have $E(M)=nS$ and $E(M^{-1})=S^{-1}/(n-p-1)=n\{E(M)\}^{-1}/(n-p-1)$, which implies $E(M^{-1})\approx\{E(M)\}^{-1}$ for large $n$. If $M\sim IW_p(n,S)$ ($M$ follows the inverted Wishart distribution with $n$ degrees of freedom and parameter matrix $S$; see e.g. Gupta and Nagar, 1999, Chapter 3), we have $E(M)=S/(n-2p-2)$ and so $E(M^{-1})=(n-p-1)S^{-1}=(n-p-1)\{E(M)\}^{-1}/(n-2p-2)$, which again implies $E(M^{-1})\approx\{E(M)\}^{-1}$ for large $n$. Of course $M$ might not follow a Wishart or an inverted Wishart distribution, and in many practical situations we will not have access to the distribution of $M$. For general application we can verify that $E(M^{-1})\approx\{E(M)\}^{-1}$ if and only if $M$ and $M^{-1}$ are uncorrelated. The accuracy of the choice of $\tilde V$ is reflected in the accuracy of the one-step predictions, which is illustrated in Section 5.1.

We can now apply conditional expectations to obtain the mean vector and the covariance matrix of $X|Y=y$. Indeed, from the above and equation (13) we have
$$E(X|Y=y)=\mu_x+E(A_{xy}|Y=y)(y-\mu_y)=\mu_x+\Sigma_xB'\tilde V(y-\mu_y).$$
For the covariance matrix $\mathrm{Var}(X|Y=y)$ we have
$$E\{\mathrm{Var}(X|V,Y=y)|Y=y\}=\Sigma_x-\Sigma_xB'E\{(\Sigma_y+V)^{-1}|Y=y\}B\Sigma_x=\Sigma_x-\Sigma_xB'\tilde VB\Sigma_x$$
and
$$\mathrm{Var}\{E(X|V,Y=y)|Y=y\}=\mathrm{Var}[\mathrm{vec}\{\Sigma_xB'(\Sigma_y+V)^{-1}(y-\mu_y)\}|Y=y]=\{(y-\mu_y)'\otimes\Sigma_xB'\}\,G_p\,\tilde{\tilde V}\,G_p'\,\{(y-\mu_y)\otimes B\Sigma_x\},$$
where $\otimes$ denotes the Kronecker product, $\mathrm{vec}(\cdot)$ denotes the column-stacking operator of a matrix and $G_p$ is the duplication matrix, namely $\mathrm{vec}\{(\Sigma_y+V)^{-1}\}=G_p\,\mathrm{vech}\{(\Sigma_y+V)^{-1}\}$. Thus the mean vector and the covariance matrix of $X|Y=y$ are
$$X|Y=y\sim\Big\{\mu_x+\Sigma_xB'\tilde V(y-\mu_y),\ \Sigma_x-\Sigma_xB'\tilde VB\Sigma_x+\{(y-\mu_y)'\otimes\Sigma_xB'\}\,G_p\,\tilde{\tilde V}\,G_p'\,\{(y-\mu_y)\otimes B\Sigma_x\}\Big\}. \qquad (17)$$
We note that the mean vector and covariance matrix of $X|Y=y$ depend on the estimates $\tilde V$ and $\tilde{\tilde V}$. A simple intuitive approach was employed in this section, and an assessment of this approach by simulation is given in Section 5.1. In general, equation (17) holds where $\tilde V$ and $\tilde{\tilde V}$ are any estimates of the mean vector and covariance matrix of $(\Sigma_y+V)^{-1}|Y=y$.
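The following sketch (added here, not the paper's implementation) evaluates equation (17) under the intuitive estimates $\tilde V$ and $\tilde{\tilde V}$ given above; the duplication-matrix helper and all inputs (in particular the prior matrix $K$) are hypothetical illustrations.

```python
import numpy as np

def duplication_matrix(p):
    """Duplication matrix G_p with vec(S) = G_p vech(S) for symmetric p x p S
    (vec column-stacking, vech stacking the lower triangle column by column)."""
    q = p * (p + 1) // 2
    G = np.zeros((p * p, q))
    col = 0
    for j in range(p):
        for i in range(j, p):
            G[i + j * p, col] = 1.0    # position of S[i, j] in vec(S)
            G[j + i * p, col] = 1.0    # position of S[j, i] in vec(S)
            col += 1
    return G

def pspp_regression_posterior(mu_x, Sigma_x, B, V_hat, K, eta, alpha, y):
    """Sketch of the regression posterior (17), with V_tilde ~= {Sigma_y + E(V|T=tau)}^{-1}
    and Var{vech((Sigma_y+V)^{-1})} ~= K/(eta+alpha)."""
    p = len(y)
    Sigma_y = B @ Sigma_x @ B.T
    resid = y - B @ mu_x
    tau = np.outer(resid, resid) - Sigma_y
    V_tilde = np.linalg.inv(Sigma_y + (eta * V_hat + alpha * tau) / (eta + alpha))
    VV_tilde = K / (eta + alpha)                           # covariance of vech{(Sigma_y+V)^{-1}}
    G = duplication_matrix(p)
    post_mean = mu_x + Sigma_x @ B.T @ V_tilde @ resid
    left = np.kron(resid.reshape(1, -1), Sigma_x @ B.T)    # (y - mu_y)' (x) Sigma_x B'
    right = np.kron(resid.reshape(-1, 1), B @ Sigma_x)     # (y - mu_y) (x) B Sigma_x
    post_cov = (Sigma_x - Sigma_x @ B.T @ V_tilde @ B @ Sigma_x
                + left @ G @ VV_tilde @ G.T @ right)
    return post_mean, post_cov
```

Here $K$ must be expressed in the same vech ordering as the duplication matrix; dropping the Kronecker-product term recovers the large-$\eta$ approximation used later in Section 4.3.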
4.3 Application to time series modelling II

In this section we consider the state space model (12), but with the covariance matrices of the error drifts $\epsilon_t$ and $\omega_t$ given by $\mathrm{Var}(\epsilon_t)=V$ and $\mathrm{Var}(\omega_t)=W$. Here $V$ is an unknown $p\times p$ covariance matrix and $W$ is a known $m\times m$ covariance matrix. The priors are partially specified by
$$X_0\sim(m_0,P_0)\quad\text{and}\quad \mathrm{vech}(V)\sim\left\{\mathrm{vech}(\hat V_0),\ \frac{K_0}{\eta_0}\right\},$$
for some known $m_0$, $P_0$, $\hat V_0$, $K_0$ and $\eta_0$. It is also assumed that, a priori, $X_0$ is uncorrelated with $\{\epsilon_t\}$ and $\{\omega_t\}$. Note that, in contrast with model (12), the above model is not scaled by $V$ and in fact any factorization of the covariance matrices by $V$ would lead to restrictive forms of the model; for a discussion of this topic see Harvey (1989), Barbosa and Harrison (1992), West and Harrison (1997, §16.4), and Triantafyllopoulos (2006a, 2007). Before we give the proposed estimation algorithm, we give a brief description of the related matrix-variate dynamic linear models (MV-DLMs) and the restrictions imposed on these models.

Suppose $\{Y_t\}$ is a $p$-dimensional vector of observations, observed at roughly equal intervals of time $t=1,2,3,\ldots$. Write $Y_t=[Y_{1t}\ Y_{2t}\ \cdots\ Y_{pt}]'$, where each $Y_{it}$ is modelled as a univariate dynamic linear model (DLM):
$$Y_{it}=B_t'X_{it}+\epsilon_{it},\quad X_{it}=C_tX_{i,t-1}+\omega_{it},\quad \epsilon_{it}\sim N(0,\sigma_{ii}),\quad \omega_{it}\sim N_m(0,\sigma_{ii}W_i),$$
where $B_t$ is an $m$-dimensional design vector, $X_{it}$ is an $m$-dimensional state vector, $C_t$ is an $m\times m$ transition matrix and the error drifts $\epsilon_{it}$ and $\omega_{it}$ are individually and mutually uncorrelated and also uncorrelated with the state prior $X_{i,0}$, which is assumed to follow the normal distribution $X_{i,0}\sim N_m(m_{i,0},P_{i,0})$, for some known $m_{i,0}$ and $P_{i,0}$. The $m\times m$ covariance matrix $W_i$ is assumed known and the variances $\sigma_{11},\sigma_{22},\ldots,\sigma_{pp}$ form the diagonal elements of the covariance matrix $\Sigma=(\sigma_{ij})_{i,j=1,2,\ldots,p}$, which is assumed unknown and is subject to Bayesian estimation under the inverted Wishart prior $\Sigma\sim IW_p(n_0+2p,n_0S_0)$, for some known $n_0$ and $S_0$. The model can be written in compact form as
$$Y_t'=B_t'X_t+\epsilon_t',\quad X_t=C_tX_{t-1}+\omega_t,\quad \epsilon_t\sim N_p(0,\Sigma),\quad \mathrm{vec}(\omega_t)\sim N_{mp}(0,\Sigma\otimes W), \qquad (18)$$
where $B_t'=[B_{1t}'\ B_{2t}'\ \cdots\ B_{pt}']$, $X_t=[X_{1t}\ X_{2t}\ \cdots\ X_{pt}]$, $C_t=\mathrm{diag}(C_{1t},C_{2t},\ldots,C_{pt})$, $\mathrm{vec}(X_0)\sim N_{mp}\{\mathrm{vec}(m_0),\Sigma\otimes P_0\}$, for $m_0=[m_{1,0}\ m_{2,0}\ \cdots\ m_{p,0}]$ and $P_0=\mathrm{diag}(P_{1,0},P_{2,0},\ldots,P_{p,0})$.

Model (18) is termed the matrix-variate dynamic linear model (MV-DLM) and it is studied in Quintana and West (1987, 1988), Smith (1992), West and Harrison (1997, Chapter 16), Triantafyllopoulos and Pikoulas (2002), Salvador et al. (2003, 2004), Salvador and Gargallo (2004), and Triantafyllopoulos (2006a, 2006b); Harvey (1986, 1989) develops a similar model where $\Sigma$ is estimated by a quasi-likelihood estimation procedure. The disadvantage of model (18) is that $Y_{1t},Y_{2t},\ldots,Y_{pt}$ are restricted to follow similar patterns, since the model components $B_t$ and $C_t$ are common for all $i=1,2,\ldots,p$. One can notice that the only difference between $Y_{it}$ and $Y_{jt}$ ($i\neq j$) is due to the error drifts $\epsilon_{it},\omega_{it}$ and $\epsilon_{jt},\omega_{jt}$. Thus, for example, model (18) is not appropriate for modelling $Y_t=[Y_{1t}\ Y_{2t}]'$, where $Y_{1t}$ is a trend time series and $Y_{2t}$ is a seasonal time series. It follows that when there are structural differences between $Y_{it}$ and $Y_{jt}$, the MV-DLM might be thought of as a restrictive and inappropriate model and its use is not recommended. When $p$ is large, one can hardly justify the "similarity" of $Y_{1t},Y_{2t},\ldots,Y_{pt}$. We believe that in practice the popularity of the MV-DLM is driven by its mathematical properties (fully Bayesian conjugate estimation procedures for sequential forecasting and filtering/smoothing), rather than by data-driven analysis. Although we accept that in some cases the MV-DLM can be a useful model, we would submit that in many time series problems this model is unjustifiable, and the above discussion expresses our reluctance to suggest the MV-DLM for general use in multivariate time series problems.

Returning now to the PSPP dynamic model, denote by $y^t$ the information set comprising the data $y_1,y_2,\ldots,y_t$. If at time $t-1$ the posteriors are partially specified by $X_{t-1}|y^{t-1}\sim(m_{t-1},P_{t-1})$ and $\mathrm{vech}(V)|y^{t-1}\sim\{\mathrm{vech}(\hat V_{t-1}),\eta_{t-1}^{-1}K_{t-1}\}$, for some known $m_{t-1}$, $P_{t-1}$, $\hat V_{t-1}$, $K_{t-1}$ and $\eta_{t-1}$, then by direct application of the theory of Section 4 we have, for time $t$: $\mu_x=C_tm_{t-1}$, $\Sigma_x=R_t=C_tP_{t-1}C_t'+W$, $\mu_y=f_t=B_tC_tm_{t-1}$, $\Sigma_y=B_tR_tB_t'$ and $A_{xy}=A_t=R_tB_t'(B_tR_tB_t'+V)^{-1}$. The one-step ahead forecast covariance matrix is $Q_t=\mathrm{Var}(Y_t|y^{t-1})=B_tR_tB_t'+\hat V_{t-1}$, and so we have $Y_t|y^{t-1}\sim(f_t,Q_t)$. Given $Y_t=y_t$, the error vector is $e_t=y_t-f_t$ and the posterior mean of $V|y^t$ satisfies
$$\eta_t\hat V_t=\eta_{t-1}\hat V_{t-1}+e_te_t'-B_tR_tB_t',$$
where we have used $\alpha=1$. Thus $\mathrm{vech}(V)|y^t\sim\{\mathrm{vech}(\hat V_t),K_t/\eta_t\}$, where $\eta_t=\eta_{t-1}+1$ and $K_t=K_{t-1}$. It follows that $K_t=K_0$ and therefore, as $t\to\infty$, $V|y^t$ concentrates about $\hat V_t$, asymptotically degenerating. By observing that $B_tR_tB_t'=Q_t-\hat V_{t-1}$ and writing the updating of $\hat V_t$ recurrently, we get
$$\hat V_t=\hat V_{t-1}+\frac{e_te_t'-Q_t}{\eta_t}=\hat V_0+\sum_{i=1}^{t}\frac{e_ie_i'-Q_i}{\eta_0+i}.$$
By forming the standardized one-step ahead forecast errors $e_t^*=Q_t^{-1/2}e_t$, where $Q_t^{-1/2}$ denotes the symmetric square root of $Q_t^{-1}$, one can obtain a measure of goodness of fit, since $e_t^*\sim(0,I_p)$. This can easily be implemented by checking whether the mean of $e_1^*(e_1^*)',e_2^*(e_2^*)',\ldots,e_t^*(e_t^*)'$ is close to $I_p$, or equivalently by checking that, for $e_t^*=[e_{1t}^*\ e_{2t}^*\ \cdots\ e_{pt}^*]'$, the mean of each $(e_{i,1}^*)^2,(e_{i,2}^*)^2,\ldots,(e_{it}^*)^2$ is close to 1 and $e_{it}^*$ is uncorrelated with $e_{jt}^*$, for all $t$ and $i\neq j$.

Applying the procedure adopted in linear regression, we have that the posterior mean vector and covariance matrix are given by $X_t|y^t\sim(m_t,P_t)$, with $m_t=C_tm_{t-1}+R_tB_t'\tilde V_te_t$ and
$$P_t=R_t-R_tB_t'\tilde V_tB_tR_t+(e_t'\otimes R_tB_t')\,G_p\,\tilde{\tilde V}_t\,G_p'\,(e_t\otimes B_tR_t),$$
where $\tilde V_t=(B_tR_tB_t'+\hat V_t)^{-1}$ and $\tilde{\tilde V}_t=K_0/\eta_t$. From $\eta_t=\eta_{t-1}+1$ it follows that $\lim_{t\to\infty}\eta_t=\infty$, hence $\lim_{t\to\infty}\tilde{\tilde V}_t=0$, and so for large $t$ the posterior covariance matrix $P_t$ can be approximated by $P_t\approx R_t-R_tB_t'\tilde V_tB_tR_t$. This can motivate computational savings, since there is no need to perform calculations involving Kronecker products.
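A minimal sketch of one step of these recursions follows (an addition for illustration, not the paper's code); it takes $\alpha=1$ and uses the large-$t$ approximation of $P_t$ that drops the Kronecker-product term.

```python
import numpy as np

def pspp_mv_filter_step(m_prev, P_prev, V_prev, eta_prev, B, C, W, y):
    """One step of the Section 4.3 PSPP filter with unknown observation covariance V."""
    a = C @ m_prev
    R = C @ P_prev @ C.T + W
    f = B @ a
    Q = B @ R @ B.T + V_prev                       # one-step forecast covariance Q_t
    e = y - f                                      # one-step forecast error e_t
    eta = eta_prev + 1.0
    V_hat = V_prev + (np.outer(e, e) - Q) / eta    # recursive update of V_t (alpha = 1)
    V_tilde = np.linalg.inv(B @ R @ B.T + V_hat)
    m = a + R @ B.T @ V_tilde @ e                  # posterior state mean m_t
    P = R - R @ B.T @ V_tilde @ B @ R              # approximate posterior covariance P_t
    return m, P, V_hat, eta
```

Standardized errors $e_t^*=Q_t^{-1/2}e_t$ for the goodness-of-fit check can be obtained from the same quantities, e.g. via an eigendecomposition of $Q_t$.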
5 Numerical illustrations

In this section we give two numerical examples of the state space model considered in Section 4.3.

5.1 A simulation study

We simulate 1000 bivariate time series under 3 state space models and we compare the performance of the proposed model of Section 4.3 (referred to here as DLM1), of the MV-DLM discussed in Section 4.3 (referred to here as DLM2) and of the general multivariate dynamic linear model (referred to here as DLM3). Let $Y_t=[Y_{1t}\ Y_{2t}]'$ be a bivariate time series. In the first state space model we simulate 1000 bivariate time series from the model
$$Y_t=\begin{pmatrix}1&0\\0&1\end{pmatrix}X_t+\epsilon_t,\quad X_t=\begin{pmatrix}1&0\\0&1\end{pmatrix}X_{t-1}+\omega_t,\quad \epsilon_t\sim N_2(0,V),\quad \omega_t\sim N_2(0,I_2), \qquad (19)$$
where $X_t$ is a bivariate state vector and the remaining components are as in Section 4.3. Initially we assume that $X_0\sim N_2(0,I_2)$ and the covariance matrix $V$ is
$$V=(V_{ij})_{i,j=1,2}=\begin{pmatrix}1&2\\2&5\end{pmatrix},$$
which means that the variables $Y_{1t}$ and $Y_{2t}$ are highly correlated. The generated time series $\{Y_t\}$ comprise two local level components, namely $\{Y_{1t}\}$ and $\{Y_{2t}\}$. We note that DLM3 is the correct model, since it is used to generate the 1000 time series.

Table 1: Performance of the PSPP dynamic model (DLM1), the MV-DLM (DLM2) and the general bivariate dynamic model (DLM3) over 1000 simulated time series of two local level components (LL), one local level and one linear trend component (LT), and one local level and one seasonal component (LS). Shown are the average (over all 1000 simulated series) values of the mean square standardized error (MSSE), the mean square error (MSE), the mean absolute error (MAE) and the mean error (ME).

type  model    MSSE             MSE               MAE              ME
               y1t     y2t      y1t     y2t       y1t     y2t      y1t      y2t
LL    DLM1     0.905   1.045    2.536   7.975     1.521   2.249    -0.049   -0.022
      DLM2     1.009   1.075    2.556   8.635     1.259   2.348     0.012   -0.004
      DLM3     0.998   1.022    2.342   7.894     1.208   2.238     0.013    0.008
LT    DLM1     0.913   1.057    3.407   13.017    1.399   2.784    -0.157   -0.276
      DLM2     1.113   1.075    3.835   16.105    1.552   3.170    -0.003   -0.106
      DLM3     0.996   0.993    2.569   11.221    1.274   2.614    -0.093   -0.320
LS    DLM1     1.054   0.953    2.373   7.897     1.228   2.235     0.015    0.119
      DLM2     1.186   2.829    2.450   200.963   1.259   10.755   -0.006    0.057
      DLM3     0.982   0.994    2.361   7.856     1.224   2.218     0.017    0.112
In the second state space model we simulate 1000 time series from the model
$$Y_t=\begin{pmatrix}1&0\\0&1\end{pmatrix}X_t+\epsilon_t,\quad X_t=\begin{pmatrix}1&1\\0&1\end{pmatrix}X_{t-1}+\omega_t,\quad \epsilon_t\sim N_2(0,V),\quad \omega_t\sim N_2(0,I_2),$$
and the remaining components are as in (19). The time series generated from this model comprise $\{Y_{1t}\}$ as a local level component and $\{Y_{2t}\}$ as a linear trend component. Finally, in the third state space model, we simulate 1000 time series from the model
$$Y_t=\begin{pmatrix}1&0&0\\0&1&0\end{pmatrix}X_t+\epsilon_t,\quad X_t=\begin{pmatrix}1&0&0\\0&\cos(\pi/6)&\sin(\pi/6)\\0&-\sin(\pi/6)&\cos(\pi/6)\end{pmatrix}X_{t-1}+\omega_t, \qquad (20)$$
where $\epsilon_t\sim N_2(0,V)$, $\omega_t\sim N_3(0,I_3)$ and here $X_t$ is a trivariate state vector with initial distribution $X_0\sim N_3(0,I_3)$; the remaining components of the model are as in (19). The time series generated from this model are bivariate time series comprising $\{Y_{1t}\}$ as a local level component and $\{Y_{2t}\}$ as a seasonal component with period $\pi/3$. Such seasonal time series appear frequently (Ameen and Harrison, 1984; Godolphin, 2001; Harvey, 2004).

Tables 1 and 2 show the results. In Table 1 the three state space models (DLM1, DLM2 and DLM3) are compared via the mean of squared standardized 1-step forecast errors (MSSE), the mean square 1-step forecast error (MSE), the mean absolute 1-step forecast error (MAE) and the mean 1-step forecast error (ME). For a discussion of these measures of goodness of fit, also known as measures of forecast accuracy, the reader is referred to general time series textbooks, see e.g. Reinsel (1997) and Durbin and Koopman (2001). In a Bayesian flavour, goodness of fit may be measured via comparisons with MCMC methods (which provide the correct posterior densities) or via Bayes monitoring systems, such as those using Bayes factors; see West and Harrison (1997).

Table 2: Performance of the estimators of the covariance matrix $V=(V_{ij})_{i,j=1,2}$ produced by the PSPP dynamic model (DLM1) and the MV-DLM (DLM2). Shown are the average (over all 1000 simulated series; see Table 1) values of each estimator for times t = 100, t = 200 and t = 500.

type  element     t = 100           t = 200           t = 500
                  DLM1     DLM2     DLM1     DLM2     DLM1     DLM2
LL    V11 = 1     1.347    0.961    1.072    0.954    0.988    0.974
      V12 = 2     2.352    1.047    1.792    0.914    2.087    1.113
      V22 = 5     5.846    3.407    4.332    2.874    5.215    3.290
LT    V11 = 1     2.087    0.475    1.599    0.647    1.210    0.678
      V12 = 2     3.169    0.463    2.375    0.721    2.217    0.802
      V22 = 5     6.200    2.509    4.627    2.718    5.043    2.851
LS    V11 = 1     0.627    0.729    0.782    0.851    0.960    0.955
      V12 = 2     1.497    0.887    1.674    0.901    1.872    0.907
      V22 = 5     4.084    3.548    4.104    11.439   4.626    76.609

Section 4.3 details how the MSSE has been calculated. Of the three models, we know that DLM3 is the correct model, since it is used to generate the time series data. For the local level components (LL), both DLM1 and DLM2 put in good performances, with DLM2 having the edge and being closer to the performance of DLM3. This is expected since, as we noted in Section 4.3, when both time series components $Y_{1t}$ and $Y_{2t}$ are similar the MV-DLM (DLM2) performs well. However, for the LT and LS time series components, where the two series $Y_{1t}$ and $Y_{2t}$ in each case are not similar, we expect that DLM2 will not perform very well.
This is indeed confirmed by our simulations, for which Table 1 clearly shows that the performance of DLM1 is better than that of DLM2. For example, for the LS component, the MSSE of DLM1 is $[1.054\ 0.953]'$, which is close to $[1\ 1]'$, while the respective MSSE of DLM2 is $[1.186\ 2.829]'$. Table 2 looks at the accuracy of the estimation of the covariance matrix $V$ for each model. For the LL components, $V_{11}=1$ is estimated better by DLM2, although for $t=500$ the estimate from DLM1 is slightly better. For $V_{12}=2$ and $V_{22}=5$, DLM2 produces poor results compared with DLM1. For example, even for $t=500$ the estimate of $V_{22}=5$ from DLM2 is only 3.290, while the estimate from DLM1 is 5.215. This phenomenon appears to be magnified when looking at the LT and LS components, where, for example, even at $t=500$, for the LT the estimate of $V_{12}=2$ and for the LS the estimate of $V_{22}=5$ are 0.802 and 76.609 respectively, while the respective estimates from DLM1 are 2.217 and 4.626. The conclusion is that DLM1 produces consistent estimation behaviour over a wide range of bivariate time series, while DLM2 (the matrix-variate DLM) produces acceptable performance when the component time series are all similar. It should be stated here that the matrix-variate state space models of Harvey (1986) produce a performance similar to DLM2; Harvey (1989) calls the above matrix-variate models 'seemingly unrelated time series models' to indicate the similarity of the component time series. The models of Triantafyllopoulos and Pikoulas (2002) and Triantafyllopoulos (2006a, 2006b) and of many other authors (see the citations in Harvey, 1989; West and Harrison, 1997; Durbin and Koopman, 2001) can only accommodate regression-type state space models and local level models. More general structures, such as that of model (20), can only be dealt with via simulation-based methods, such as Monte Carlo simulation. For high-dimensional dynamical systems, and in particular for observation covariance estimation, the proposed PSPP state space model of Section 4.3 offers a fast and reliable approximate estimation procedure, which can be applied to a wide range of time series.

Figure 1: US Investment and Change in Inventory time series $y_t=[y_{1t}\ y_{2t}]'$ with its 1-step forecast mean $f_t=[f_{1t}\ f_{2t}]'$, plotted against year (1950-1970). The top solid line shows $y_{1t}$ and the bottom solid line shows $y_{2t}$; the top dashed line shows $f_{1t}$ and the bottom dashed line shows $f_{2t}$.

5.2 The US investment and business inventory data

We consider US investment and change in business inventory data, which are deseasonalised and measured quarterly as a bivariate time series (variable $y_{1t}$: US investment data; variable $y_{2t}$: US change in inventory data) over the period 1947-1971. The data are fully described and tabulated in Lütkepohl (1993) and Reinsel (1997, Appendix A).
The data are plotted in Figure 1 together with their forecasts, which are generated by fitting the linear trend PSPP state space model

$$Y_t = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} X_t + \epsilon_t, \quad X_t = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} X_{t-1} + \omega_t, \quad \epsilon_t \sim (0, V), \quad \omega_t \sim (0, W_t), \qquad (21)$$

where here we have not specified the distributions of $\epsilon_t$ and $\omega_t$ as normal, and we have replaced the time-invariant W of Section 4.3 with a time-dependent $W_t$. Model (21) is a PSPP linear trend state space model, for which we choose the priors $m_0 = [80.622 \ \ 4.047]'$ (the mean of $[Y_{1t} \ Y_{2t}]'$ over t = 1947-1956, indicated in Figure 1 by the vertical line), $P_0 = 1000 I_2$ (a weakly informative prior covariance matrix, or low precision $P_0^{-1} \approx 0$) and

$$V_0 = \begin{pmatrix} 66.403 & 22.239 \\ 22.239 & 46.547 \end{pmatrix},$$

which is taken as the sample covariance matrix of $Y_{1t}$ and $Y_{2t}$ over the time period 1947-1955.

The covariance matrix $W_t$ measures the durability and stability of the change, or evolution, of the states $X_t$. Here we specify $W_t$ with two discount factors, $\delta_1$ and $\delta_2$, as follows. With G the evolution matrix of $X_t$ and $\Delta$ the discount matrix,

$$G = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \quad \Delta = \begin{pmatrix} \delta_1 & 0 \\ 0 & \delta_2 \end{pmatrix},$$

we have

$$W_t = \Delta^{-1/2} G P_{t-1} G' \Delta^{-1/2} - G P_{t-1} G',$$

where $R_t$ in the recursions of Section 4.3 is replaced by $R_t = G P_{t-1} G' + W_t$. Although this discounting specification is not advocated by West and Harrison (1997, §6.4), it has been used successfully (McKenzie, 1974, 1976; Abraham and Ledolter, 1983, Chapter 7; Ameen and Harrison, 1985; Goodwin, 1997). The values of $\delta_1$ and $\delta_2$ are chosen by experimentation. The above model gave the best results with the combination of discount factors $\delta_1 = 0.2$ and $\delta_2 = 0.4$. The performance measures were MSSE = [1.001 1.101]', MSE = [111.165 66.941]', MAE = [6.718 6.855]' and ME = [0.076 1.725]'. Other combinations of $\delta_1$ and $\delta_2$ yield less accurate results, with the usual effect that one of the two series $y_{1t}$ and $y_{2t}$ is predicted accurately while the other is predicted badly. This problem certainly arises when $\delta_1 = \delta_2$, which clearly indicates the need for multiple discounting. Figure 2 plots the observation variance, covariance and correlation estimates over the time period 1956-1970. From this plot we observe that the variability of the change in inventory component $y_{2t}$ is much larger than that of $y_{1t}$. The estimate of the observation correlation indicates the high cross-correlation between the two series.
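To make the two-discount specification above concrete, here is a minimal sketch (assuming numpy; all numerical values are illustrative and this is not the authors' code) of how $W_t$ and $R_t$ can be formed from the posterior covariance $P_{t-1}$:

```python
import numpy as np

# Two-discount specification (sketch; values illustrative):
#   W_t = Delta^{-1/2} G P_{t-1} G' Delta^{-1/2} - G P_{t-1} G'
#   R_t = G P_{t-1} G' + W_t = Delta^{-1/2} G P_{t-1} G' Delta^{-1/2}

G = np.array([[1.0, 1.0],
              [0.0, 1.0]])                        # linear trend evolution matrix
delta1, delta2 = 0.2, 0.4                         # discount factors chosen by experimentation
Delta_inv_sqrt = np.diag([delta1 ** -0.5, delta2 ** -0.5])

def discounted_prior_cov(P_prev):
    """Return (W_t, R_t) under the two-discount specification."""
    GPG = G @ P_prev @ G.T
    R = Delta_inv_sqrt @ GPG @ Delta_inv_sqrt
    return R - GPG, R

# Example: starting from the weakly informative P_0 = 1000 * I_2
W_t, R_t = discounted_prior_cov(1000 * np.eye(2))
print(W_t)
print(R_t)
```

Note that with $\delta_1 = \delta_2 = \delta$ this reduces to the usual single-discount choice $R_t = G P_{t-1} G' / \delta$.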
[Figure 2 here; two panels titled "Posterior variance" (x-axis: year, y-axis: variance / covariance) and "Posterior correlation" (x-axis: year, y-axis: correlation).]
Figure 2: Posterior estimates of the observation covariance matrix V = (V_ij)_{i,j=1,2} and estimates of the correlation $\rho = V_{12}/\sqrt{V_{11} V_{22}}$. In the left panel, shown are: the estimate of the variance $V_{11}$ (solid line), the estimate of the covariance $V_{12}$ (dashed line), and the estimate of the variance $V_{22}$ (dotted line). In the right panel, the solid line shows the estimate of $\rho$.

6 Discussion

This paper develops a method for approximating the first two moments of the posterior distribution in Bayesian inference. This work is particularly appealing in regression and time series problems where the response and parameter distributions are only partially specified by means and variances. Our partially specified prior posterior (PSPP) models offer an approximation to prior/posterior updating, which is appropriate for sequential application, such as in time series analysis. The similarities and differences with Bayes linear methods are indicated and, although the authors believe that Bayes linear methods offer a great statistical tool, it is pointed out that in some problems considered in this paper, and in particular for time series data, the PSPP modelling approach can offer advantages over Bayes linear methods.

PSPP models are developed with a view to Bayesian inference for multivariate state space models when the observation covariance matrix is unknown and subject to estimation. This paper outlines the deficiency of existing methods in tackling this problem, and it is shown empirically that, for a class of important time series data, including local level, linear trend and seasonal components, PSPP generates much more accurate and reliable posterior estimators, which are remarkably fast and applicable to a wide range of time series data. US investment and change in inventory data are used to illustrate the capabilities of the PSPP state space models.

Given the similarities of PSPP with Bayes linear methods, it is believed that the applicability of the PSPP approach goes beyond the examples considered in this paper. For example, one area that is only slightly touched upon is inference for data following non-normal distributions other than the multivariate t, the inverted multivariate t and the Wishart distributions. In this sense, a more detailed comparison of PSPP with Bayes linear methods, and in particular with Bayes linear kinematics (Goldstein and Shaw, 2004), should shed more light on the performance of PSPP. It is our purpose to consider such comparisons in a future paper.

Acknowledgements

The authors are grateful to the Statistics Department at Warwick University, where this work was initiated. We are grateful to three referees for providing helpful comments.

Appendix

Proof of Theorem 1. ($\Rightarrow$) By hypothesis,

$$E(X \mid Y) = \mu_x + A_{xy}(Y - \mu_y) \;\Rightarrow\; E(X - A_{xy} Y \mid Y) = \mu_x - A_{xy}\mu_y = \text{constant}.$$

Furthermore,

$$\mathrm{Var}(X \mid Y) = E\{(X - \mu_x - A_{xy}(Y - \mu_y))(X - \mu_x - A_{xy}(Y - \mu_y))' \mid Y\} = \Sigma_x - A_{xy}\Sigma_y A_{xy}' = \text{constant},$$

so that $\mathrm{Var}(X - A_{xy} Y \mid Y) = \mathrm{Var}(X \mid Y) = \text{constant}$. It follows that $X - A_{xy} Y \perp_2 Y$.

($\Leftarrow$) The assumption $X - A_{xy} Y \perp_2 Y$ implies that $E(X - A_{xy} Y \mid Y) = \mu$, a constant, so that $E(X \mid Y) = A_{xy} Y + \mu$, which is a linear function of Y. Given that $E(X \mid Y)$ minimizes the quadratic prior expected risk and $\mu_x + A_{xy}(Y - \mu_y)$ minimizes this risk among all linear estimators, it follows that $E(X \mid Y) = \mu_x + A_{xy}(Y - \mu_y)$.
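As a purely illustrative, hypothetical numerical check (not part of the paper), one can verify by simulation the key fact behind the forward direction of the proof: with $A_{xy} = \Sigma_{xy}\Sigma_y^{-1}$, the residual $X - A_{xy} Y$ is uncorrelated with $Y$, so that in the Gaussian case its first two conditional moments do not depend on $Y$. A minimal sketch assuming numpy, with made-up parameter values:

```python
import numpy as np

# Hypothetical check: for jointly Gaussian (X, Y), the residual X - A_xy Y with
# A_xy = Sigma_xy Sigma_y^{-1} has (empirically) zero covariance with Y.

rng = np.random.default_rng(0)

mu_x, mu_y = np.array([1.0, -2.0]), np.array([0.5])
Sigma_x = np.array([[2.0, 0.3],
                    [0.3, 1.0]])
Sigma_xy = np.array([[0.8],
                     [0.4]])
Sigma_y = np.array([[1.5]])

# Draw a large sample from the joint distribution of (X, Y)
mu = np.concatenate([mu_x, mu_y])
Sigma = np.block([[Sigma_x, Sigma_xy],
                  [Sigma_xy.T, Sigma_y]])
Z = rng.multivariate_normal(mu, Sigma, size=100000)
X, Y = Z[:, :2], Z[:, 2:]

A_xy = Sigma_xy @ np.linalg.inv(Sigma_y)
resid = X - Y @ A_xy.T                    # X - A_xy Y

# Cross-covariance between the residual and Y: should be close to zero
print(np.cov(np.hstack([resid, Y]).T)[:2, 2:])
```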
References

[1] Abraham, B. and Ledolter, J. (1983) Statistical Methods for Forecasting. Wiley, New York.
[2] Ameen, J.R.M. and Harrison, P.J. (1984) Discount weighted estimation. Journal of Forecasting 3, 285-296.
[3] Ameen, J.R.M. and Harrison, P.J. (1985) Normal discount Bayesian models. In Bayesian Statistics 2, J.M. Bernardo, M.H. DeGroot, D.V. Lindley and A.F.M. Smith (Eds). North-Holland, Amsterdam, and Valencia University Press.
[4] Barbosa, E. and Harrison, P.J. (1992) Variance estimation for multivariate dynamic linear models. Journal of Forecasting 11, 621-628.
[5] Box, G.E.P. and Tiao, G.C. (1973) Bayesian Inference in Statistical Analysis. Addison-Wesley, Massachusetts.
[6] Dickey, J.M. (1967) Matrix-variate generalizations of the multivariate t distribution and the inverted multivariate t distribution. Annals of Mathematical Statistics 38, 511-518.
[7] Durbin, J. and Koopman, S.J. (2001) Time Series Analysis by State Space Methods. Oxford University Press, Oxford.
[8] Fahrmeir, L. (1992) Posterior mode estimation by extended Kalman filtering for multivariate dynamic generalized linear models. Journal of the American Statistical Association 87, 501-509.
[9] Fahrmeir, L. and Kaufmann, H. (1987) Regression models for non-stationary categorical time series. Journal of Time Series Analysis 8, 147-160.
[10] Fahrmeir, L. and Kaufmann, H. (1991) On Kalman filtering, posterior mode estimation and Fisher scoring in dynamic exponential family regression. Metrika 38, 37-60.
[11] Fahrmeir, L. and Tutz, G. (2001) Multivariate Statistical Modelling Based on Generalized Linear Models, 2nd edn. Springer-Verlag, New York.
[12] Gamerman, D. (1997) Markov Chain Monte Carlo - Stochastic Simulation for Bayesian Inference. Chapman and Hall, New York.
[13] Godolphin, E.J. (2001) Observable trend-projecting state-space models. Journal of Applied Statistics 28, 379-389.
[14] Godolphin, E.J. and Triantafyllopoulos, K. (2006) Decomposition of time series models in state-space form. Computational Statistics and Data Analysis 50, 2232-2246.
[15] Goldstein, M. (1976) Bayesian analysis of regression problems. Biometrika 63, 51-58.
[16] Goldstein, M. (1979) The variance modified linear Bayes estimator. Journal of the Royal Statistical Society Series B 41, 96-100.
[17] Goldstein, M. (1983) General variance modifications for linear Bayes estimators. Journal of the American Statistical Association 78, 616-618.
[18] Goldstein, M. and Shaw, S. (2004) Bayes linear kinematics and Bayes linear Bayes graphical models. Biometrika 91, 425-446.
[19] Goodwin, P. (1997) Adjusting judgemental extrapolations using Theil's method and discounted weighted regression. Journal of Forecasting 16, 37-46.
[20] Gupta, A.K. and Nagar, D.K. (1999) Matrix Variate Distributions. Chapman and Hall, New York.
[21] Hartigan, J.A. (1969) Linear Bayesian methods. Journal of the Royal Statistical Society Series B 31, 446-454.
[22] Harvey, A.C. (1986) Analysis and generalisation of a multivariate exponential smoothing model. Management Science 32, 374-380.
[23] Harvey, A.C. (1989) Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press, Cambridge.
[24] Harvey, A.C. (2004) Tests for cycles. In State Space and Unobserved Component Models: Theory and Applications, A.C. Harvey, S.J. Koopman and N. Shephard (Eds.). Cambridge University Press, Cambridge.
[25] Horn, R.A. and Johnson, C.R. (1999) Matrix Analysis. Cambridge University Press, Cambridge.
[26] Kedem, B. and Fokianos, K. (2002) Regression Models for Time Series Analysis. Wiley, New York.
[27] Khatri, C.G., Khattree, R. and Gupta, R.D. (1991) On a class of orthogonal invariant and residual independent matrix distributions. Sankhyā Series B 53, 1-10.
[28] Kitagawa, G. and Gersch, W. (1996) Smoothness Priors Analysis of Time Series. Springer-Verlag, New York.
[29] Lauritzen, S. (1996) Graphical Models. Oxford University Press, Oxford.
[30] Leonard, T. and Hsu, J.S.J. (1999) Bayesian Methods. Cambridge University Press, Cambridge.
[31] Lütkepohl, H. (1993) Introduction to Multiple Time Series Analysis. Springer-Verlag, Berlin.
[32] Mardia, K.V., Kent, J.T. and Bibby, J.M. (1979) Multivariate Analysis. Academic Press, London.
[33] McCullagh, P. and Nelder, J.A. (1989) Generalized Linear Models, 2nd edn. Chapman and Hall, London.
[34] McKenzie, E. (1974) A comparison of standard forecasting systems with the Box-Jenkins approach. The Statistician 23, 107-116.
[35] McKenzie, E. (1976) An analysis of general exponential smoothing. Operational Research 24, 131-140.
[36] Mouchart, M. and Simar, L. (1984) A note on least-squares approximation in the Bayesian analysis of regression models. Journal of the Royal Statistical Society Series B 46, 124-133.
[37] O'Hagan, A. and Forster, J.J. (2004) Bayesian Inference, 2nd edn. Kendall's Advanced Theory of Statistics, Vol. 2B. Arnold, London.
[38] Pilz, J. (1986) Minimax linear regression estimation with symmetric parameter restrictions. Journal of Statistical Planning and Inference 13, 297-318.
[39] Pitt, M.K. and Shephard, N. (1999) Time varying covariances: a factor stochastic volatility approach (with discussion). In Bayesian Statistics 6, J.M. Bernardo, J.O. Berger, A.P. Dawid and A.F.M. Smith (Eds.). Oxford University Press, Oxford, 547-570.
[40] Quintana, J.M. and West, M. (1987) An analysis of international exchange rates using multivariate DLMs. The Statistician 36, 275-281.
[41] Quintana, J.M. and West, M. (1988) Time series analysis of compositional data. In Bayesian Statistics 3, J.M. Bernardo, M.H. DeGroot, D.V. Lindley and A.F.M. Smith (Eds.). Oxford University Press, Oxford, 747-756.
[42] Reinsel, G.C. (1997) Elements of Multivariate Time Series Analysis, 2nd edn. Springer-Verlag, New York.
[43] Salvador, M. and Gargallo, P. (2004) Automatic monitoring and intervention in multivariate dynamic linear models. Computational Statistics and Data Analysis 47, 401-431.
[44] Salvador, M., Gallizo, J.L. and Gargallo, P. (2003) A dynamic principal components analysis based on multivariate matrix normal dynamic linear models. Journal of Forecasting 22, 457-478.
[45] Salvador, M., Gallizo, J.L. and Gargallo, P. (2004) Bayesian inference in a matrix normal dynamic linear model with unknown covariance matrices. Statistics 38, 307-335.
[46] Smith, J.Q. (1992) Dynamic graphical models. In Bayesian Statistics 4, J.M. Bernardo, J.O. Berger, A.P. Dawid and A.F.M. Smith (Eds.). Oxford University Press, Oxford, 741-751.
[47] Srivastava, M. and Sen, A. (1990) Regression Analysis: Theory, Methods and Applications. Springer-Verlag, New York.
[48] Tiao, G.C. and Zellner, A. (1964) Bayes' theorem and the use of prior knowledge in regression analysis. Biometrika 51, 219-230.
[49] Triantafyllopoulos, K. (2007) Covariance estimation for multivariate conditionally Gaussian dynamic linear models. Journal of Forecasting (to appear).
[50] Triantafyllopoulos, K. (2006a) Multivariate discount weighted regression and local level models. Computational Statistics and Data Analysis 50, 3702-3720.
[51] Triantafyllopoulos, K. (2006b) Multivariate control charts based on Bayesian state space models. Quality and Reliability Engineering International 22, 693-707.
[52] Triantafyllopoulos, K. and Pikoulas, J. (2002) Multivariate regression applied to the problem of network security. Journal of Forecasting 21, 579-594.
[53] West, M. and Harrison, P.J. (1997) Bayesian Forecasting and Dynamic Models, 2nd edn. Springer-Verlag, New York.
[54] Whittaker, J. (1990) Graphical Models in Applied Multivariate Statistics. Wiley, New York.
[55] Wilkinson, D.J. and Goldstein, M. (1996) Bayes linear adjustment for variance matrices. In Bayesian Statistics 5, J.M. Bernardo et al. (Eds.). Oxford University Press, Oxford, 791-800.
[56] Wilkinson, D.J. (1997) Bayes linear variance adjustment for locally linear DLMs. Journal of Forecasting 16, 329-342.
