An EM Algorithm for Continuous-time Bivariate Markov Chains*

Brian L. Mark and Yariv Ephraim
Dept. of Electrical and Computer Engineering
George Mason University
Fairfax, VA 22030, U.S.A.

October 31, 2018

Abstract

We study properties and parameter estimation of finite-state homogeneous continuous-time bivariate Markov chains. Only one of the two processes of the bivariate Markov chain is observable. The general form of the bivariate Markov chain studied here makes no assumptions on the structure of the generator of the chain, and hence, neither the underlying process nor the observable process is necessarily Markov. The bivariate Markov chain allows for simultaneous jumps of the underlying and observable processes. Furthermore, the inter-arrival time of observed events is phase-type. The bivariate Markov chain generalizes the batch Markovian arrival process as well as the Markov modulated Markov process. We develop an expectation-maximization (EM) procedure for estimating the generator of a bivariate Markov chain, and we demonstrate its performance. The procedure does not rely on any numerical integration or sampling scheme of the continuous-time bivariate Markov chain. The proposed EM algorithm is equally applicable to multivariate Markov chains.

Keywords: Parameter estimation, EM algorithm, Continuous-time bivariate Markov chain, Markov modulated processes

1 Introduction

We consider the problem of estimating the parameter of a continuous-time finite-state homogeneous bivariate Markov chain. Only one of the two processes of the bivariate Markov chain is observable. The other is commonly referred to as the underlying process. We do not restrict the structure of the generator of the bivariate Markov chain to have any particular form.
Thus, simultaneous jumps of the observable and underlying processes are possible, and neither of these two processes is necessarily Markov. In [25], a continuous-time bivariate Markov chain was used to model delays and congestion in a computer network, and a parameter estimation algorithm was proposed. The model was motivated by the desire to capture correlations observed empirically in samples of network delays. A continuous-time multivariate Markov chain was used to model ion channel currents in [3].

* This work was supported in part by the U.S. National Science Foundation under Grant CCF-0916568. Part of the work in this paper was presented in a preliminary form at the 45th Conference on Information Sciences and Systems hosted by The Johns Hopkins University, Baltimore, MD, March 23-25, 2011.

The bivariate Markov chain generalizes commonly used models such as the batch Markovian arrival process (BMAP) [5, 13], the Markov modulated Markov process (MMMP) [8], and the Markov modulated Poisson process (MMPP) [10, 20, 21, 18]. In the BMAP, for example, the generator is an infinite upper triangular block Toeplitz matrix. In the MMMP and MMPP, the generator is such that no simultaneous jumps of the underlying and observable processes are allowed. In addition, the underlying processes in all three examples are homogeneous continuous-time Markov chains.

We develop an expectation-maximization (EM) algorithm for estimating the parameter of a continuous-time bivariate Markov chain. The proposed EM algorithm is equally applicable to multivariate Markov chains. An EM algorithm for the MMPP was originally developed by Rydén [21]. Using a similar approach, EM algorithms were subsequently developed for the BMAP in [5, 13] and the MMMP in [8]. The EM algorithm developed in the present paper also relies on Rydén's approach.
It consists of closed-form, stable recursions employing scaling and Van Loan's approach for computation of integrals of matrix exponentials [24], along the lines of [18, 8]. In the parameter estimation algorithm of [25], the continuous-time bivariate Markov chain is first sampled and the transition matrix of the resulting discrete-time bivariate Markov chain is estimated using a variant of the Baum algorithm [4]. The generator of the continuous-time bivariate Markov chain is subsequently obtained from the transition matrix estimate. As discussed in [17], this approach may lead to ambiguous estimates of the generator of the bivariate Markov chain, and in some cases it will not lead to a valid estimate. Moreover, the approach does not allow structuring of the generator estimate since it is obtained as a byproduct of the transition matrix estimate. The EM algorithm developed in this paper estimates the generator of the bivariate Markov chain directly from a sample path of the continuous-time observable process. This leads to a more accurate, computationally efficient estimator which is free of the above drawbacks.

The remainder of this paper is organized as follows. In Section 2, we discuss properties of the continuous-time bivariate Markov chain and develop associated likelihood functions. In Section 3, we develop the EM algorithm. In Section 4, we discuss the implementation of the EM algorithm and provide a numerical example. Concluding remarks are given in Section 5.

2 Continuous-time Bivariate Markov Chain

Consider a finite-state homogeneous continuous-time bivariate Markov chain

Z = (X, S) = {(X(t), S(t)), t ≥ 0},   (1)

defined on a standard probability space, and assume that it is irreducible. The process S = {S(t), t ≥ 0} is the underlying process with state space of say {a_1, ..., a_r}, and X = {X(t), t ≥ 0} is the observable process with state space of say {b_1, ..., b_d}. The orders r and d are assumed known. We assume without loss of generality that a_i = i for i = 1, ..., r and b_l = l for l = 1, ..., d. The state space of Z is then given by {1, ..., d} × {1, ..., r}. Neither X nor S need be Markov. Necessary and sufficient conditions for either process to be a homogeneous continuous-time Markov chain are given in [3, Theorem 3.1]. With probability one, all sample paths of Z are right-continuous step functions with a finite number of jumps in any finite interval [1, Theorem 2.1]. The bivariate Markov chain is parameterized by a generator matrix

H = {h_{ln}(ij), l, n = 1, ..., d; i, j = 1, ..., r},   (2)

where the set of joint states {(l, i)} is ordered lexicographically.

[Figure 1: Example sample path of Z = (X, S).]

With P(·) denoting the probability measure on the given space,

h_{ln}(ij) = lim_{ε→0} (1/ε) P(Z(t + ε) = (n, j) | Z(t) = (l, i)),   (3)

for (l, i) ≠ (n, j). The generator matrix can be expressed as a block matrix H = {H_{ln}, l, n = 1, ..., d}, where H_{ln} = {h_{ln}(ij), i, j = 1, ..., r} are r × r matrices. The number of independent scalar values that constitute the generator H is at most rd(rd − 1). Since none of the rows of H is identically zero, the submatrix H_{ll}, l ∈ {1, ..., d}, is strictly diagonally dominant, i.e.,

−h_{ll}(ii) = Σ_{(n,j): (n,j) ≠ (l,i)} h_{ln}(ij) > Σ_{j: j ≠ i} h_{ll}(ij),   (4)

for all i = 1, ..., r, and thus, H_{ll} is nonsingular [23, p. 476].
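As a concrete check of these properties, the block generator of the numerical example in Section 4 can be assembled and verified with NumPy. This is our own illustrative sketch, not code from the paper.

```python
import numpy as np

# Block generator H = {H_ln} for r = 2 underlying and d = 2 observable
# states, using the true-parameter blocks of the numerical example in
# Section 4; joint states (l, i) are ordered lexicographically.
H11 = np.array([[-70.0, 10.0], [20.0, -55.0]])
H12 = np.array([[50.0, 10.0], [25.0, 10.0]])
H21 = np.array([[50.0, 0.0], [0.0, 10.0]])
H22 = np.array([[-60.0, 10.0], [20.0, -30.0]])
H = np.block([[H11, H12], [H21, H22]])

# A generator has zero row sums, cf. (4): -h_ll(ii) equals the total
# off-diagonal rate out of joint state (l, i).
row_sums = H.sum(axis=1)

# Each diagonal block H_ll is strictly diagonally dominant, hence nonsingular.
dets = [np.linalg.det(H11), np.linalg.det(H22)]
```

The zero row sums and the nonsingularity of the diagonal blocks are exactly the two facts that the EM recursions of Section 3 rely on.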
Clearly, the observable process X(t) is a deterministic function of the bivariate Markov chain Z(t). Conversely, the pair consisting of a univariate Markov chain together with a deterministic function of that chain is a bivariate Markov chain (see [19]).

2.1 Density of observable process

Assume that the observable process X of a bivariate Markov chain Z = (X, S) starts from some state X_0 at time T_0 = 0 and jumps N times in [0, T] at 0 < T_1 < T_2 < ··· < T_N ≤ T. Let X_k = X(T_k) denote the state of X in the interval [T_k, T_{k+1}) for k = 1, 2, ..., N − 1, and let X_N denote the state of X in the interval [T_N, T]. This convention differs slightly from that used in [8], where X_k ≜ X(T_{k−1}), k = 1, ..., N + 1. Define S_k = S(T_k) to be the state of S at the jump time T_k of X. Let Z_k = Z(T_k) = (X_k, S_k). Let ΔT_k = T_k − T_{k−1} denote the dwell time of X in state X_{k−1} during the interval [T_{k−1}, T_k), k = 1, ..., N. We denote realizations of X_k, S_k, Z_k, T_k, and ΔT_k by x_k, s_k, z_k, t_k, and Δt_k, respectively.

Figure 1 depicts a sample path of a bivariate Markov chain Z = (X, S) for which N = 5, r = d = 2, a_1 = 1, a_2 = 2, b_1 = 1, and b_2 = 2. From the figure, we see that the sequence {Z_k} is given by

{(1, 1), (2, 2), (1, 2), (2, 2), (1, 2), (2, 1)}.   (5)

Note that in Figure 1, the processes X and S jump simultaneously at times T_3 and T_5. The bivariate Markov chain Z is a pure-jump Markov process and is therefore strong Markov (see [26, Section 4-1]). Using the strong Markov property of Z, it follows that

P(Z_{k+1} = z, ΔT_{k+1} ≤ τ | Z_0, ..., Z_k; T_0, ..., T_k) = P(Z_{k+1} = z, ΔT_{k+1} ≤ τ | Z_k)   (6)

for all z ∈ {1, ..., d} × {1, ..., r}; τ ≥ 0; and k = 0, 1, 2, .... Therefore, {(Z_k, T_k)} is a Markov renewal process (see [6]).
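Since Z itself is a pure-jump Markov process, a sample path like the one in Figure 1 can be simulated directly from the joint generator. The following is a minimal sketch of standard CTMC simulation (our own helper, with an arbitrary initial state), not the authors' code.

```python
import numpy as np

def simulate_bivariate_chain(H, T, rng, z0=0):
    """Simulate a sample path of the joint chain Z = (X, S) on [0, T].

    H is the rd x rd joint generator with lexicographically ordered joint
    states; z0 is the initial joint-state index.  Returns a list of
    (jump time, joint state) pairs, starting with (0.0, z0).
    """
    t, z = 0.0, z0
    path = [(t, z)]
    while True:
        rate = -H[z, z]                     # total rate out of state z
        t += rng.exponential(1.0 / rate)    # exponential dwell time
        if t > T:
            break
        p = np.clip(H[z], 0.0, None)        # off-diagonal jump rates
        z = int(rng.choice(len(p), p=p / p.sum()))
        path.append((t, z))
    return path
```

With r = d = 2, a joint index z decodes as l = z // 2 + 1 and i = z % 2 + 1, so the observable path is obtained by discarding the i component.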
Since the observable process X can be represented in an equivalent form as {(X_k, T_k)}, it follows that the density of X in [0, T_N] may be obtained from the product of transition densities of {(Z_k, T_k)}. If T > T_N, an additional term is required to obtain the density of X in [0, T], as will be specified shortly. Assuming a stationary bivariate Markov chain, the transition density of {(Z_k, T_k)} follows from the density corresponding to

P(Z(τ) = (n, j), T_1 ∈ [τ, τ + dτ) | Z(0) = (l, i)),   (7)

for l ≠ n, which we denote by f^{ln}_{ij}(τ). Let f^{ln}(τ) = {f^{ln}_{ij}(τ), i, j = 1, ..., r} denote the transition density matrix of {(Z_k, T_k)}. When τ does not coincide with a jump time of the observable process, the transition probability

f̄^l_{ij}(τ) = P(S(τ) = j, T_1 > τ | X(0) = l, S(0) = i)   (8)

is also required. Let f̄^l(τ) = {f̄^l_{ij}(τ), i, j = 1, ..., r} denote the corresponding transition matrix. The following proposition gives explicit forms for f^{ln}(τ) and f̄^l(τ).

Proposition 1. For τ ≥ 0,

f^{ln}(τ) = e^{H_{ll} τ} H_{ln},  l ≠ n,   (9)

and

f̄^l(τ) = e^{H_{ll} τ}.   (10)

Furthermore,

P(Z_{k+1} = (n, j) | Z_k = (l, i)) = [−H_{ll}^{-1} H_{ln}]_{ij}.   (11)

Proof. Following an argument similar to that given in [11, 8], the density f^{ln}_{ij}(τ) satisfies the following equation:

f^{ln}_{ij}(τ) = h_{ln}(ij) e^{h_{ll}(ii) τ} + e^{h_{ll}(ii) τ} ∫_0^τ e^{−h_{ll}(ii) t} Σ_{k ≠ i} h_{ll}(ik) f^{ln}_{kj}(t) dt.   (12)

Differentiating both sides of (12) with respect to τ and simplifying, it follows that

d f^{ln}_{ij}(τ)/dτ = h_{ll}(ii) f^{ln}_{ij}(τ) + Σ_{k ≠ i} h_{ll}(ik) f^{ln}_{kj}(τ) = Σ_k h_{ll}(ik) f^{ln}_{kj}(τ).   (13)

Therefore,

d f^{ln}(τ)/dτ = H_{ll} f^{ln}(τ)   (14)

with initial condition f^{ln}(0) = H_{ln} from (12). Hence, (9) follows. Integrating (9) from 0 to ∞ gives (11).
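Proposition 1 reduces to a matrix exponential and a linear solve, so it can be evaluated directly with SciPy. The blocks below are illustrative numbers of our own choosing, not values from the paper.

```python
import numpy as np
from scipy.linalg import expm

# Illustrative blocks H_ll and H_ln for r = 2 (hypothetical numbers).
H_ll = np.array([[-70.0, 10.0], [20.0, -55.0]])
H_ln = np.array([[50.0, 10.0], [25.0, 10.0]])

def f_ln(tau):
    """Transition density matrix of {(Z_k, T_k)}, eq. (9): expm(H_ll*tau) H_ln."""
    return expm(H_ll * tau) @ H_ln

def f_bar_l(tau):
    """P(S(tau) = j, no observable jump by tau | X(0) = l, S(0) = i), eq. (10)."""
    return expm(H_ll * tau)

# Embedded jump probabilities of {Z_k}, eq. (11): -H_ll^{-1} H_ln,
# computed with a linear solve rather than an explicit inverse.
P_jump = -np.linalg.solve(H_ll, H_ln)
```

With d = 2 there is a single off-diagonal block per row, so here the rows of P_jump sum to one; in general the sum over all n ≠ l of the blocks −H_{ll}^{-1} H_{ln} is stochastic.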
The transition probability f̄^l_{ij}(τ) satisfies an equation similar to (12) except that the first term on the right-hand side is replaced by e^{h_{ll}(ii) τ} δ_{ij}. This change only affects the initial condition, viz., f̄^l(0) = I, where I denotes the identity matrix. Hence, (10) follows.

To simplify notation in the sequel, we shall use P(·) to denote not only a probability measure, but also a density, as appropriate (cf. [21, 8]). The exact meaning of expressions involving P(·) should be clear from the context. In particular, all null probabilities are to be interpreted in the density sense. The density of X in [0, T] depends on the initial state probabilities μ_{x_0}(i) = P(X_0 = x_0, S_0 = i), i = 1, ..., r. Let

μ_{x_0} = {μ_{x_0}(1), μ_{x_0}(2), ..., μ_{x_0}(r)}   (15)

denote the initial state distribution. Using the Markov renewal property of {(Z_k, T_k)}, the density of X in [0, T] can be expressed as

P(X(t), 0 ≤ t ≤ T) = μ_{x_0} { Π_{k=1}^N f^{x_{k−1} x_k}(Δt_k) } f̄^{x_N}(T − t_N) 1,   (16)

where 1 denotes a column vector of all ones. This expression will be used in Section 3 to develop the EM recursions.

2.2 Forward-backward recursions

The density in (16) can be evaluated using forward and backward recursions. The forward density is defined by the row vector

L(k) = {P(X(t), 0 ≤ t ≤ t_k, S_k = i), i = 1, ..., r},   (17)

for k = 0, 1, ..., N. The forward recursion is given by

L(0) = μ_{x_0},  L(k) = L(k − 1) f^{x_{k−1} x_k}(Δt_k).   (18)

The backward density is defined by the column vector

R(k) = {P(X(t), t_{k−1} < t ≤ T | X_{k−1} = x_{k−1}, S_{k−1} = i), i = 1, ..., r}′,

for k = N + 1, N, ..., 1, where ′ denotes matrix transpose. The backward recursion is given by

R(N + 1) = f̄^{x_N}(T − t_N) 1,  R(k) = f^{x_{k−1} x_k}(Δt_k) R(k + 1).   (19)

From (16)–(19), the density of the observable process in [0, T] is given by

P(X(t), 0 ≤ t ≤ T) = L(k) R(k + 1),  k = 0, ..., N.   (20)

To ensure numerical stability, it is necessary to scale the above recursions. Using an approach similar to that developed in [18], the scaled forward recursion is given by

L̃(0) = μ_{x_0},  L̃(k) = L̃(k − 1) f^{x_{k−1} x_k}(Δt_k) / c_k,  k = 1, ..., N,   (21)

where

c_k ≜ L̃(k − 1) f^{x_{k−1} x_k}(Δt_k) 1,  k = 1, ..., N.   (22)

The scaled backward recursion is given by

R̃(N + 1) = f̄^{x_N}(T − t_N) 1,  R̃(k) = f^{x_{k−1} x_k}(Δt_k) R̃(k + 1) / c_k,  k = 1, ..., N.   (23)

Clearly, L̃(0) = L(0) and R̃(N + 1) = R(N + 1). For k = 1, ..., N, one can show straightforwardly that the scaled and unscaled iterates of the forward and backward recursions are related by

L̃(k) = L(k) / Π_{m=1}^k c_m  and  R̃(k) = R(k) / Π_{m=k}^N c_m.   (24)

From (24) and (17), one sees that for k = 1, ..., N, the scaled forward vector L̃(k) can be interpreted as the probability distribution of the underlying process S at time T_k conditioned on the observable sample path up to and including time t_k:

L̃(k) = {P(S_k = i | X(t), 0 ≤ t ≤ t_k), i = 1, ..., r}.   (25)

Thus, the density of X up to and including the Nth jump can be expressed as the product of the scaling constants as follows:

P(X(t), 0 ≤ t ≤ t_N) = L(N) 1 = ( Π_{k=1}^N c_k ) L̃(N) 1 = Π_{k=1}^N c_k.   (26)

Therefore, the log-likelihood of the observed sample path is given by

L = Σ_{k=1}^N log c_k.   (27)

The above forward and backward recursions can be generalized to apply to any time t between jump times of the observable process. In particular, for t ∈ [t_k, t_{k+1}), consider the row vector

ℓ̃(t) = {P(S(t) = i | X(τ), 0 ≤ τ ≤ t), i = 1, ..., r}.   (28)

It follows that

ℓ̃(t) = L(k) f̄^{x_k}(t − t_k) / [L(k) f̄^{x_k}(t − t_k) 1].   (29)

A similar result was derived in [19].

2.3 Properties

The observable process X is conditionally Markov given S, and vice versa. In contrast to the MMMP and BMAP (see Section 2.4), the underlying process S of the bivariate Markov chain Z need not be Markov. A necessary and sufficient condition for S to be a (homogeneous) Markov chain is that there exists a matrix Q satisfying [3, Theorem 3.1]

Q = Σ_{n=1}^d H_{ln}   (30)

for l = 1, ..., d. In this case, Q is the generator of S. Various statistics of X and S can be expressed in terms of the stationary distribution of Z. We denote this distribution by π = {π_{nj}}, where π_{nj} = lim_{t→∞} P(Z(t) = (n, j) | Z(0) = (l, i)) and the set of joint states {(n, j)} is ordered lexicographically. The vector π is the unique solution to the following system:

π H = 0,  π 1 = 1.   (31)

The process {Z_k} is a homogeneous discrete-time Markov chain with transition probabilities given by (11) in Proposition 1. Define the r × r matrix A_{ln} = −H_{ll}^{-1} H_{ln} for l, n ∈ {1, ..., d} with l ≠ n. For l = n, let A_{ll} be a matrix of all zeros. The transition matrix of {Z_k} is given by A = {A_{ln}}. Let D_H = diag{H_{ll}, l = 1, ..., d}. It follows that A = −D_H^{-1} H + I. Let ν = {ν_{nj}} denote the stationary distribution of {Z_k}, where ν_{nj} = lim_{k→∞} P(Z_k = (n, j) | Z_0 = (l, i)). The vector ν is the unique solution to the following system:

ν A = ν,  ν 1 = 1.   (32)

Then ν can be related to π as follows (cf. [10, (6)]):

ν = π D_H / (π D_H 1).   (33)

The dwell time of the observable process X in a given state has a phase-type distribution [16, Chapter 2].
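The scaled forward recursion (21)–(22) and the log-likelihood (27) above can be sketched in a few lines. Here H_blocks, x, and dt are our own names for the generator blocks, the observed state sequence, and the dwell times; this is a sketch, not the paper's implementation.

```python
import numpy as np
from scipy.linalg import expm

def scaled_forward(mu0, H_blocks, x, dt):
    """Scaled forward recursion (21)-(22) and log-likelihood (27).

    mu0      : row vector of initial probabilities mu_{x_0}(i), i = 1..r.
    H_blocks : dict mapping (l, n) to the r x r block H_ln.
    x        : observed states x_0, ..., x_N of the observable process.
    dt       : dwell times dt_1, ..., dt_N (len(x) - 1 of them).
    Returns the scaled forward vector L_tilde(N) and the log-likelihood.
    """
    L = np.asarray(mu0, dtype=float)
    loglik = 0.0
    for k in range(1, len(x)):
        l, n = x[k - 1], x[k]
        f = expm(H_blocks[(l, l)] * dt[k - 1]) @ H_blocks[(l, n)]  # eq. (9)
        L = L @ f
        c = L.sum()            # scaling constant c_k, eq. (22)
        L /= c
        loglik += np.log(c)
    return L, loglik
```

By (25), the returned vector is the filtered distribution of the underlying state at the last observed jump, and by (27) the accumulated log of the scaling constants is the log-likelihood of the observed path up to that jump.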
The phase-type distribution generalizes mixtures and convolutions of exponential distributions and can be used to approximate a large class of dwell time distributions. To state this property formally, we denote the conditional density of the kth dwell time ΔT_k given that X_{k−1} = l by f_{ΔT_k | X_{k−1}}(τ | l). Define α_{li} = lim_{k→∞} P(S_k = i | X_k = l) and let α_l = {α_{li}, i = 1, ..., r}. The conditional probability α_{li} can be expressed in terms of ν as follows:

α_{li} = ν_{li} / Σ_i ν_{li}.   (34)

We then have the following proposition, which is proved in Appendix A.

Proposition 2. The stationary conditional distribution of ΔT_k given X_{k−1} = l is phase-type. In particular,

lim_{k→∞} f_{ΔT_k | X_{k−1}}(τ | l) = α_l e^{H_{ll} τ} β_l,   (35)

where β_l = Σ_{n: n ≠ l} H_{ln} 1.

The phase-type dwell time of the observable process may be useful in explicit durational modeling embedded in a hidden Markov model system (cf. [9, 22, 28]). It should be clear that the dwell time of the underlying process S also has a phase-type distribution. This may be useful in certain applications for which the underlying process has a non-Markovian character.

[Figure 2: Relationships among various bivariate Markov chains: BMAP, MAP, MMMP, MMPP, Poisson.]

2.4 Relation to other models

In this section, we relate the continuous-time bivariate Markov chain to other widely used stochastic models mentioned in Section 1. The MMMP (cf. [8]) is a bivariate Markov chain (X, S) for which the underlying process S is a homogeneous irreducible Markov chain. The observable process X, conditioned on S, is a nonhomogeneous irreducible Markov chain. The generator of S satisfies Q = Σ_{n=1}^d H_{ln} for all l, and the submatrices {H_{ln}, l ≠ n} of the bivariate generator matrix H are all diagonal matrices.
This implies that with probability one, the underlying and observable chains do not jump simultaneously. Conditioned on S(t) = i, the generator of the observable process is given by G_i = {h_{ln}(ii), l, n = 1, ..., d}. Thus, the MMMP may be parameterized by {Q, G_1, ..., G_r}. The number of independent scalar values that constitute the parameter of an MMMP is at most r(r − 1) + rd(d − 1).

The BMAP (cf. [14, 13]) is a bivariate Markov chain (N, S) for which the underlying process S is a homogeneous irreducible finite-state Markov chain, and the observable process N is a counting process with state space {0, 1, 2, ...}. The size of each jump of N, i.e., the batch size, lies in {1, 2, ..., d − 1}. The generator of a BMAP is an upper triangular block Toeplitz matrix with first row given by {D_0, D_1, ..., D_{d−1}, 0, 0, ...}. The generator Q of the underlying chain S satisfies Q = Σ_{m=0}^{d−1} D_m. The BMAP becomes a Markovian arrival process (MAP) when d = 2, i.e., the observable process can jump by at most one. For this model, we only have D_0 and D_1. The MAP in turn becomes an MMPP (cf. [10, 20, 21, 18]) when D_1 is a diagonal matrix with the corresponding Poisson rates along the main diagonal. In contrast to the MMMP and MMPP, the observable and underlying chains of the BMAP and MAP can jump simultaneously.

The BMAP can be represented in the framework of this paper using a finite-state bivariate Markov chain Z = (X, S) where X is defined by X(t) = (N(t) mod d) + 1; i.e., the observable process X records the modulo-d counts of the counting process N of the BMAP. In this case, the generator of Z is given by H = {H_{ln}, l, n = 1, ..., d}, where

H_{ln} ≜ D_{(n−l) mod d}   (36)

and H is a block circulant matrix.
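The modulo-d construction in (36) is easy to realize in code. The D_m blocks below are hypothetical and chosen only so that their sum has zero row sums; this is a sketch of the construction, not data from the paper.

```python
import numpy as np

# Hypothetical BMAP blocks for r = 2 phases and batch sizes in {1, 2},
# so d = 3 observable states after the modulo-d reduction.  D_0 holds
# phase-change rates without arrivals; D_1, D_2 hold batch arrivals.
D = [np.array([[-3.0, 1.0], [0.5, -2.0]]),   # D_0
     np.array([[1.5, 0.0], [0.0, 1.0]]),     # D_1
     np.array([[0.5, 0.0], [0.5, 0.0]])]     # D_2
d = len(D)

# Block-circulant bivariate generator, eq. (36): H_ln = D_{(n - l) mod d}.
H = np.block([[D[(n - l) % d] for n in range(d)] for l in range(d)])

# The generator Q of the underlying chain S is the sum of the D_m.
Q = sum(D)
```

Each block row of H contains every D_m exactly once, so H inherits its zero row sums directly from those of Q.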
The number of independent scalar values that constitute the parameter of the BMAP is at most r²d − r.

The bivariate Markov chain may also be seen as a hidden Markov process where {Z_k} plays the role of the underlying Markov chain, and the observations are continuous random variables given by {ΔT_k} [21]. The conditional density of each observation ΔT_k depends on both Z_{k−1} and Z_k, which follows from (9). A review of hidden Markov processes may be found in [7]. The EM algorithm developed in Section 3 is applicable to any of the above particular cases, as well as multivariate Markov chains (see [3]).

3 EM Algorithm

In this section, we describe an EM algorithm for ML estimation of the parameter of a bivariate Markov chain, denoted by φ_0, given the sample path of the observable process in the interval [0, T]. In the EM approach, a new parameter estimate, say φ_{ι+1}, is obtained from a given parameter estimate, say φ_ι, as follows:

φ_{ι+1} = arg max_φ E{log P({Z(t), 0 ≤ t ≤ T}; φ) | X(t), 0 ≤ t ≤ T; φ_ι},   (37)

where the expectation is taken over {S(t), 0 ≤ t ≤ T} given the observable sample path {X(t), 0 ≤ t ≤ T}. The maximization is over φ, which consists of the off-diagonal elements of the bivariate generator H.

The density of a univariate Markov chain was derived in [1]. A similar approach can be used to derive the density of the bivariate Markov chain {Z(t), 0 ≤ t ≤ T} which is required in (37). The resulting log-density is expressed in terms of the number of jumps m^{ln}_{ij} from each state (l, i) to any other state (n, j) and the dwell time D^l_i in each state (l, i). Let ϕ_{li}(t) = I(Z(t) = (l, i)), where I(·) denotes the indicator function, and let # denote set cardinality.
Then,

m^{ln}_{ij} = #{t: 0 < t ≤ T, Z(t−) = (l, i), Z(t) = (n, j)} = Σ_{t ∈ [0,T]} ϕ_{li}(t−) ϕ_{nj}(t),   (38)

D^l_i = ∫_0^T ϕ_{li}(t) dt,   (39)

where the sum in (38) is over the jump points of Z(t). The conditional mean in (37) involves the conditional mean estimates

m̂^{ln}_{ij} = E{m^{ln}_{ij} | X(t), 0 ≤ t ≤ T},   (40)

D̂^l_i = E{D^l_i | X(t), 0 ≤ t ≤ T},   (41)

where the dependency on φ_ι is suppressed. The maximization in (37) yields the following intuitive estimate in the (ι + 1)st iteration of the EM algorithm [1]:

ĥ_{ln}(ij) = m̂^{ln}_{ij} / D̂^l_i,  (l, i) ≠ (n, j).   (42)

Next, we develop closed-form expressions for the estimates m̂^{ln}_{ij} and D̂^l_i.

3.1 Number of jumps estimate

The conditional expectation of m^{ln}_{ij} in (38) is given by

m̂^{ln}_{ij} = Σ_{t ∈ [0,T]} P(Z(t−) = (l, i), Z(t) = (n, j) | X(τ), 0 ≤ τ ≤ T).   (43)

To further evaluate this expression, we consider two cases: 1) l = n, i ≠ j; and 2) l ≠ n.

Case 1): (l = n, i ≠ j). In this case, the sum in (43) is over jumps of the underlying process S from i to j while the observable chain X remains in state l. The estimate in (43) can be written as a Riemann integral by partitioning the interval [0, T] into N subintervals of length Δ such that NΔ = T and then taking the limit as Δ approaches zero:

m̂^{ll}_{ij} = lim_{Δ→0} Σ_{k=1}^N Δ · (1/Δ) P(Z((k − 1)Δ) = (l, i), Z(kΔ) = (l, j) | X(τ), 0 ≤ τ ≤ T)
           = ∫_0^T P(Z(t−) = (l, i), Z(t) = (l, j) | X(τ), 0 ≤ τ ≤ T) dt.   (44)

In (44), P(Z(t−) = (l, i), Z(t) = (l, j) | X(τ), 0 ≤ τ ≤ T) denotes the conditional density, given X(τ), 0 ≤ τ ≤ T, of a jump of Z from (l, i) to (l, j) at time t. A result similar to (44) was originally stated in [1, 2, 21].
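Integrals of this type are computed in closed form via Van Loan's method [24], which recovers ∫_0^t e^{A(t−y)} B e^{Ay} dy as the upper-right block of the exponential of a doubled-up matrix; the proposition that follows (eqs. (45)–(46)) uses exactly this device. A small self-contained sketch, with matrices of our own choosing:

```python
import numpy as np
from scipy.linalg import expm

def van_loan_integral(A, B, dt):
    """Return int_0^dt expm(A*(dt - y)) @ B @ expm(A*y) dy via Van Loan [24].

    The integral is the upper-right r x r block of expm(C * dt), where
    C = [[A, B], [0, A]], cf. eqs. (45)-(46).
    """
    r = A.shape[0]
    C = np.block([[A, B], [np.zeros((r, r)), A]])
    return expm(C * dt)[:r, r:]
```

For A = 0 the integrand is constant and the integral reduces to B·dt, a convenient sanity check; no numerical quadrature is involved, which is what makes the EM recursions closed-form.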
A detailed proof was provided in [1], in the context of estimating finite-state Markov chains, and in [2], in the context of estimating phase-type distributions. The proof was adapted in [8] for estimating MMMPs. We have the following proposition, which is stated for the case when T = t_N.

Proposition 3. For k = 0, ..., N − 1, define the 2r × 2r matrix

C_k = [ H_{x_k x_k}    H_{x_k x_{k+1}} R̃(k + 2) L̃(k)
        0              H_{x_k x_k} ].   (45)

Let I_k be the r × r upper right block of the matrix exponential e^{C_k Δt_{k+1}}, denoted by

I_k = [e^{C_k Δt_{k+1}}]_{12}.   (46)

Then,

m̂^{ll}_{ij} = [ H_{ll} ⊙ Σ_{k: x_k = l} I′_k / c_{k+1} ]_{ij},   (47)

where ⊙ denotes element-by-element matrix multiplication.

Proof. Let 1_i denote a column vector with a one in the ith element and zeros elsewhere. Suppose t ∈ [t_k, t_{k+1}) and x_k = l. Applying (18), (19), (26) in that order, we obtain

P(Z(t−) = (l, i), Z(t) = (l, j) | X(τ), 0 ≤ τ ≤ T)
 = P(Z(t−) = (l, i), Z(t) = (l, j), X(τ), 0 ≤ τ ≤ T) / P(X(τ), 0 ≤ τ ≤ T)
 = [ μ_{x_0} Π_{m=1}^k f^{x_{m−1} x_m}(Δt_m) / P(X(τ), 0 ≤ τ ≤ T) ] f̄^{x_k}(t − t_k) 1_i h_{x_k x_k}(ij) 1′_j f^{x_k x_{k+1}}(t_{k+1} − t) { Π_{m=k+2}^N f^{x_{m−1} x_m}(Δt_m) } 1
 = h_{ll}(ij) [ L(k) / P(X(τ), 0 ≤ τ ≤ T) ] f̄^{x_k}(t − t_k) 1_i 1′_j f^{x_k x_{k+1}}(t_{k+1} − t) R(k + 2)
 = [h_{ll}(ij) / c_{k+1}] L̃(k) f̄^{x_k}(t − t_k) 1_i 1′_j f^{x_k x_{k+1}}(t_{k+1} − t) R̃(k + 2)
 = [h_{ll}(ij) / c_{k+1}] [ f^{x_k x_{k+1}}(t_{k+1} − t) R̃(k + 2) L̃(k) f̄^{x_k}(t − t_k) ]_{ji}.   (48)

Substituting (48) into (44), it follows that

m̂^{ll}_{ij} = Σ_{k: x_k = l} [h_{ll}(ij) / c_{k+1}] [ ∫_{t_k}^{t_{k+1}} f^{x_k x_{k+1}}(t_{k+1} − t) R̃(k + 2) L̃(k) f̄^{x_k}(t − t_k) dt ]_{ji}.   (49)

Denoting the integral in the above expression by I_k and using (9) and (10), we obtain

I_k = ∫_0^{Δt_{k+1}} e^{H_{x_k x_k}(Δt_{k+1} − y)} H_{x_k x_{k+1}} R̃(k + 2) L̃(k) e^{H_{x_k x_k} y} dy.   (50)

The result (47) now follows from (49) and (50). Following the approach of [18, 8], we apply the result in [24] to evaluate the integral in (50) and obtain (46).

Case 2): (l ≠ n). In this case, the sum in (43) is over the jump points of the observable process X from state l to state n, irrespective of jumps of S. Hence, the conditional mean of the number of jumps can be written as

m̂^{ln}_{ij} = Σ_{k: x_k = l, x_{k+1} = n} P(Z(t_{k+1}−) = (l, i), Z(t_{k+1}) = (n, j) | X(τ), 0 ≤ τ ≤ T).   (51)

We have the following result, which holds for T ≥ t_N.

Proposition 4. Let

J_k = R̃(k + 2) L̃(k) e^{H_{x_k x_k} Δt_{k+1}},  k = 0, ..., N − 1.   (52)

Then for l ≠ n,

m̂^{ln}_{ij} = [ H_{ln} ⊙ Σ_{k: x_k = l, x_{k+1} = n} J′_k / c_{k+1} ]_{ij}.   (53)

Proof. Suppose k is such that x_k = l and x_{k+1} = n. Similarly to Proposition 3,

P(Z(t_{k+1}−) = (l, i), Z(t_{k+1}) = (n, j) | X(τ), 0 ≤ τ ≤ T)
 = P(Z(t_{k+1}−) = (l, i), Z(t_{k+1}) = (n, j), X(τ), 0 ≤ τ ≤ T) / P(X(τ), 0 ≤ τ ≤ T)
 = h_{ln}(ij) [ μ_{x_0} Π_{m=1}^k f^{x_{m−1} x_m}(Δt_m) / P(X(τ), 0 ≤ τ ≤ T) ] f̄^{x_k}(Δt_{k+1}) 1_i 1′_j { Π_{m=k+2}^N f^{x_{m−1} x_m}(Δt_m) } 1
 = [h_{ln}(ij) / P(X(τ), 0 ≤ τ ≤ T)] L(k) f̄^{x_k}(Δt_{k+1}) 1_i 1′_j R(k + 2)
 = [h_{ln}(ij) / c_{k+1}] [ R̃(k + 2) L̃(k) f̄^{x_k}(Δt_{k+1}) ]_{ji}.   (54)

Substituting (54) into (51),

m̂^{ln}_{ij} = Σ_{k: x_k = l, x_{k+1} = n} [h_{ln}(ij) / c_{k+1}] [ R̃(k + 2) L̃(k) f̄^{x_k}(Δt_{k+1}) ]_{ji}.   (55)

The result follows by using (10) in (55) and defining J_k as the bracketed term in that expression.

3.2 Dwell time estimate

Next, we provide an expression for the dwell time estimate D̂^l_i. Taking the conditional expectation in (39), it follows that

D̂^l_i = ∫_0^T P(Z(t) = (l, i) | X(τ), 0 ≤ τ ≤ T) dt.   (56)

We have the following result, which is stated for the case T = t_N.

Proposition 5.

D̂^l_i = [ Σ_{k: x_k = l} I′_k / c_{k+1} ]_{ii},   (57)

where I_k is given in (46).

Proof. The integrand in (56) can be non-zero only for values of t for which X(t) is in state l. Hence, it follows that

D̂^l_i = Σ_{k: x_k = l} ∫_{t_k}^{t_{k+1}} P(Z(t) = (l, i) | X(τ), 0 ≤ τ ≤ T) dt.   (58)

For t ∈ [t_k, t_{k+1}) and x_k = l, we have, similarly to Proposition 3,

P(Z(t) = (l, i) | X(τ), 0 ≤ τ ≤ T)
 = P(Z(t) = (l, i), X(τ), 0 ≤ τ ≤ T) / P(X(τ), 0 ≤ τ ≤ T)
 = [ μ_{x_0} Π_{m=1}^k f^{x_{m−1} x_m}(Δt_m) / P(X(τ), 0 ≤ τ ≤ T) ] · f̄^{x_k}(t − t_k) 1_i 1′_i · { f^{x_k x_{k+1}}(t_{k+1} − t) Π_{m=k+2}^N f^{x_{m−1} x_m}(Δt_m) } 1
 = [ L(k) / P(X(τ), 0 ≤ τ ≤ T) ] f̄^{x_k}(t − t_k) 1_i 1′_i f^{x_k x_{k+1}}(t_{k+1} − t) R(k + 2)
 = (1 / c_{k+1}) [ f^{x_k x_{k+1}}(t_{k+1} − t) R̃(k + 2) L̃(k) f̄^{x_k}(t − t_k) ]_{ii}.   (59)

The result follows from substituting (59) into (58).

4 Implementation and Numerical Example

The EM algorithm for continuous-time bivariate Markov chains developed in Section 3 was implemented in Python using the SciPy and NumPy libraries. The matrix exponential function from the SciPy library is based on a Padé approximation, which has a computational complexity of O(r³) for an r × r matrix (see [15]). For comparison purposes, the parameter estimation algorithm based on time-sampling proposed in [25] was also implemented in Python. We refer to this algorithm as the Baum-based algorithm for estimating the parameter of continuous-time bivariate Markov chains.

4.1 Baum-based Algorithm

In the Baum-based algorithm described in [25], the continuous-time bivariate Markov chain Z is time-sampled to obtain a discrete-time bivariate Markov chain Z̃ = (X̃, S̃) = {Z̃_k = (X̃_k, S̃_k)}, where Z̃_k = Z(kΔ), k = 0, 1, 2, ..., and Δ is the sampling interval. Let R denote the transition matrix of Z̃.
A variant of the Baum algorithm [4] is then employed to obtain a maximum likelihood estimate, R̂, of R. An estimate of the generator of the continuous-time bivariate Markov chain is obtained from

Ĥ = (1/Δ) ln(R̂),   (60)

where ln(R̂) denotes the principal branch of the matrix logarithm of R̂, which is given by its Taylor series expansion

ln(R̂) = Σ_{n=1}^∞ (−1)^{n−1} (R̂ − I)^n / n,   (61)

whenever the series converges. Existence and uniqueness of a generator Ĥ corresponding to a transition matrix R̂ are not guaranteed (see [12, 17]). In practice, existence and uniqueness of Ĥ for a given R̂ depend on the sampling interval Δ (see [17]). Moreover, if a generator matrix Ĥ of a certain structure is desired (e.g., a generator for an MMPP), that structure is difficult to impose through estimation of R̂.

A sufficient condition for the series in (61) to converge is that the diagonal entries of R̂ are all greater than 0.5, i.e., R̂ is strictly diagonally dominant (see Theorem 2.2 in [12] and Theorem 1 in [25]). In this case, the row sums of ln(R̂) are guaranteed to be zero, but some of the off-diagonal elements may possibly be negative [12, Theorem 2.1]. An approximate generator can then be obtained by setting the negative off-diagonal entries to zero and adjusting the diagonal elements such that the row sums of the modified matrix are zero. If R̂ is not strictly diagonally dominant, the algorithm in [25] uses the first term in the series expansion of (61) to obtain an approximate generator, i.e.,

Ĥ = (R̂ − I)/Δ.   (62)

4.2 Computational and Storage Requirements

The computational requirement of the EM algorithm developed in Section 3 depends linearly on the number of jumps, N, of the observable process.
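The generator-recovery step (60)–(62) can be sketched with SciPy's matrix logarithm. This is our reading of the procedure in [25], not its actual code.

```python
import numpy as np
from scipy.linalg import logm

def generator_from_transition_matrix(R_hat, delta):
    """Recover a generator estimate from a sampled transition matrix.

    Uses (1/delta) * ln(R_hat) when all diagonal entries of R_hat exceed
    0.5, cf. (60)-(61); otherwise falls back to the first-order
    approximation (R_hat - I) / delta, cf. (62).  Negative off-diagonal
    entries are then zeroed and diagonals adjusted to restore zero row sums.
    """
    m = R_hat.shape[0]
    if np.all(np.diag(R_hat) > 0.5):
        H = np.real(logm(R_hat)) / delta
    else:
        H = (R_hat - np.eye(m)) / delta
    off = np.clip(H - np.diag(np.diag(H)), 0.0, None)
    return off - np.diag(off.sum(axis=1))
```

When R̂ is exactly e^{QΔ} for a valid generator Q and Δ is small, the recovery is exact; the ambiguities discussed above arise when R̂ is only an estimate or Δ is too coarse.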
For each jump of the observable process, matrix exponentials for the transition density matrix f_{x_k x_{k+1}}(Δt_k) in (18) and (19) and for the matrix I_k in (46) are computed. Computation of the matrix exponential of an r × r matrix requires O(r³) arithmetic operations (see [15]). Thus, the computational requirement due to computation of matrix exponentials is O(N r³). The element-by-element matrix multiplications in (47) and (53) contribute a computational requirement of O(N r² d²). Therefore, the overall computational complexity of the EM algorithm can be stated as O(N(r³ + r² d²)). The storage requirement of the EM algorithm is dominated by the (scaled) forward and backward variables L̃(k) and R̃(k). Hence, the overall storage required is O(N r).

By comparison, the computational requirement of the Baum-based algorithm is O(Ñ r² d²), where Ñ = T/Δ is the number of discrete-time samples. The storage requirement of the Baum-based algorithm is O(Ñ r d). Clearly, both the computational and storage requirements of this algorithm are highly dependent on the choice of the sampling interval Δ.

4.3 Numerical Example

A simple numerical example of estimating the parameter of a continuous-time bivariate Markov chain using the EM procedure developed in Section 3 is presented in Table 1. For this example, the number of underlying states is r = 2 and the number of observable states is d = 2. The generator matrix H is displayed in terms of its block matrix components H_ln, which are 2 × 2 matrices for l, n ∈ {1, 2}.
The column labeled φ_0 shows the true parameter value for the bivariate Markov chain. Similarly, the columns labeled φ^0 and φ̂_em show, respectively, the initial parameter and the EM-based estimate rounded to two decimal places.

          φ_0 (true)        φ^0 (initial)     φ̂_em
  H_11   -70      10      -120      30      -77.03   14.76
          20     -55         2      -8        8.46  -47.95
  H_12    50      10        70      20       51.80   10.46
          25      10         5       1       32.66    6.83
  H_21    50       0        70       0       49.50    0
           0      10         0       1        0       9.54
  H_22   -60      10      -100      30      -59.51   10.01
          20     -30         2      -3       19.61  -29.15

Table 1: φ_0 = true; φ^0 = initial; φ̂_em = EM-based estimate.

          Δ = 0.1           Δ = 0.01          Δ = 0.005         Δ = 0.0025
  H_11   -10.00   0.13    -77.54  10.40     -78.18  15.40     -80.43  15.98
           2.35  -6.08     11.30 -49.21       6.16 -48.20       8.73 -48.53
  H_12     0.42   9.44     57.34   9.81      49.29  13.48      52.13  12.32
           1.28   2.45     20.63  17.29      33.49   8.55      31.44   8.36
  H_21     0.00   7.92      7.10   2.91      52.04   1.18      51.92   0.45
           1.85   1.39      2.19  15.36       2.11   9.74       0.97  10.63
  H_22   -10.00   2.18    -56.29   6.28     -64.22  11.01     -64.51  12.13
           0.53  -3.76      4.21 -21.77      16.09 -27.93      17.72 -29.31

Table 2: Parameter estimates φ̂_baum obtained using the Baum-based approach of [25] with different sampling intervals Δ.

The observed data, generated using the true parameter φ_0, consisted of N = 10⁴ observable jumps. The EM algorithm was terminated when the relative difference of successive log-likelihood values, evaluated using (27), fell below 10⁻⁷.

The bivariate Markov chain parameterized by φ_0 in Table 1 is neither a BMAP nor an MMMP. Indeed, H is not block circulant as in a BMAP, and H_12 is not diagonal as in an MMMP. Moreover, according to [3, Theorem 3.1], the underlying process S is not a homogeneous continuous-time Markov chain, since H_11 + H_12 ≠ H_21 + H_22 (cf. (30)). The estimate φ̂_em was obtained after 63 iterations of the EM procedure. An important property of the EM algorithm is that whenever an off-diagonal element of the generator H is zero in the initial parameter, the corresponding element in any EM iterate remains zero.
This can be seen easily from Propositions 3 and 4. Thus, if structural information about H is known, that structure can be incorporated into the initial parameter estimate, and it will be preserved by the EM algorithm in subsequent iterations. In the example of Table 1, H_21 is diagonal in the initial parameter φ^0 and retains its diagonal structure in the estimate φ̂_em. We also see that the estimate of H_21, which has the diagonal structure required for an MMMP, is markedly more accurate than that of H_12. Based on the numerical experience gained from this and other examples, we can qualitatively say that estimation of the diagonal elements of H_ln (l ≠ n) tends to be more accurate and requires fewer iterations than that of the off-diagonal elements.

For comparison purposes, we have implemented the Baum-based approach proposed in [25] and applied it to the bivariate Markov chain specified in Table 1, with true parameter φ_0 and initial parameter estimate φ^0, using the sampling intervals Δ = 0.1, 0.01, 0.005, and 0.0025. The corresponding numbers of discrete-time samples Ñ were 2408, 24077, 48153, and 96305, respectively. The algorithm was terminated when the relative difference of successive log-likelihood values fell below 10⁻⁷. The numbers of iterations required for the four sampling intervals were 499, 446, 105, and 123, respectively. In this example, a generator matrix could be obtained from the transition matrix estimate using (60) for all of the sampling intervals except Δ = 0.1. When Δ = 0.1, the generator was obtained using the approximation (62). The results are shown in Table 2. For all of the sampling intervals, the estimate of H_21 is not a diagonal matrix, but the accuracy of this estimate appears to improve as Δ is decreased.
The Baum-based estimates of the other block matrices H_ln also appear to become closer in value to the EM-based estimate φ̂_em shown in Table 1 as the sampling interval decreases. On the other hand, as Δ decreases, the computational requirement of the Baum-based approach increases proportionally, as discussed in Section 4.2. In the case Δ = 0.1, the parameter estimate is far from the true parameter, which is not surprising, as many jumps of the observable process are missed in the sampling process. Indeed, the likelihood of the final parameter estimate φ̂_baum obtained in this case is actually lower than that of the initial parameter estimate φ^0 given in Table 1. This example illustrates not only the high sensitivity of the final parameter estimate with respect to the size of the sampling interval, but also that the likelihood values of the continuous-time bivariate Markov chain may decrease from one iteration to the next in the Baum-based approach. In contrast, the EM algorithm generates a sequence of parameter estimates with nondecreasing likelihood values. Conditions for convergence of the sequence of parameter estimates were given in [27].

5 Conclusion

We have studied properties of the continuous-time bivariate Markov chain and developed explicit forward-backward recursions for estimating its parameter based on the EM algorithm. The proposed EM algorithm does not require any sampling scheme or numerical integration. The bivariate Markov chain generalizes a large class of stochastic models, including the MMMP and the BMAP, which both generalize the MMPP but are not equivalent. In its general form, the bivariate Markov chain has been used to model ion channel currents (see [3]) and congestion in computer networks (see [25]).
Since the proposed EM procedure preserves the zero values in the estimates of the generator for the bivariate Markov chain, it can be applied to estimate the parameter of special cases, for example, the MMMP and BMAP, by specifying an initial parameter estimate of the appropriate form.

A Proof of Proposition 2

The density corresponding to (7) can be expressed as

$$f_{\Delta T_k, X_k, S_k \mid X_{k-1}, S_{k-1}}(\tau, n, j \mid l, i) = [f_{ln}(\tau)]_{ij}, \qquad k \ge 1. \qquad (63)$$

Summing both sides of (63) over n, for n ≠ l, and over j, applying (9), and using $\beta_l \triangleq \sum_{n: n \ne l} H_{ln} \mathbf{1}$, we obtain

$$f_{\Delta T_k \mid X_{k-1}, S_{k-1}}(\tau \mid l, i) = \Big[ e^{H_{ll}\tau} \sum_{n: n \ne l} H_{ln} \mathbf{1} \Big]_i = \big[ e^{H_{ll}\tau} \beta_l \big]_i. \qquad (64)$$

Applying the law of total probability,

$$f_{\Delta T_k \mid X_{k-1}}(\tau \mid l) = \sum_i P(S_{k-1} = i \mid X_{k-1} = l)\, \big[ e^{H_{ll}\tau} \beta_l \big]_i. \qquad (65)$$

Taking the limit as k → ∞, it follows that

$$\lim_{k \to \infty} f_{\Delta T_k \mid X_{k-1}}(\tau \mid l) = \alpha_l\, e^{H_{ll}\tau} \beta_l. \qquad (66)$$

Equation (66) has the form of a phase-type distribution parameterized by (α_l, H_ll) [16, Chapter 2]. Indeed, the stationary distribution of ΔT_k conditioned on X_{k-1} = l is equivalent to the distribution of the absorption time of a Markov chain defined on the state space {1, 2, ..., r+1} with initial distribution given by α_l and generator matrix given by

$$G = \begin{pmatrix} H_{ll} & \beta_l \\ \mathbf{0} & 0 \end{pmatrix}, \qquad (67)$$

where **0** denotes a row vector of all zeros. Here, r+1 is an absorbing state, while the remaining states, 1, ..., r, are transient.

References

[1] A. Albert. Estimating the infinitesimal generator of a continuous time, finite state Markov process. Annals of Mathematical Statistics, 33(2):727-753, 1962.

[2] S. Asmussen, O. Nerman, and M. Olsson. Fitting phase-type distributions via the EM algorithm. Scand. J. Stat., 23(4):419-441, 1996.

[3] F. Ball and G. F. Yeo. Lumpability and marginalisability for continuous-time Markov chains.
Journal of Applied Probability, 30(3):518-528, 1993.

[4] L. E. Baum, T. Petrie, G. Soules, and N. Weiss. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Statist., 41:164-171, 1970.

[5] L. Breuer. An EM algorithm for batch Markovian arrival processes and its comparison to a simpler estimation procedure. Annals of Operations Research, 112:123-138, 2002.

[6] E. Çinlar. Markov renewal theory: A survey. Management Science, 21(7):727-752, 1975.

[7] Y. Ephraim and N. Merhav. Hidden Markov processes. IEEE Trans. Inform. Theory, 48:1518-1569, 2002.

[8] Y. Ephraim and W. J. J. Roberts. An EM algorithm for Markov modulated Markov processes. IEEE Trans. Sig. Proc., 57(2), 2009.

[9] J. D. Ferguson. Variable duration models for speech. In Symp. Application of Hidden Markov Models to Text and Speech, pages 143-179, Oct. 1980.

[10] W. Fischer and K. Meier-Hellstern. The Markov-modulated Poisson process (MMPP) cookbook. Perform. Eval., 18:149-171, 1992.

[11] D. S. Freed and L. A. Shepp. A Poisson process whose rate is a hidden Markov chain. Adv. Appl. Probab., 14:21-36, 1982.

[12] R. B. Israel, J. S. Rosenthal, and J. Z. Wei. Finding generators for Markov chains via empirical transition matrices, with applications to credit ratings. Math. Finance, 11(2):245-265, 2001.

[13] A. Klemm, C. Lindemann, and M. Lohmann. Modeling IP traffic using the batch Markovian arrival process. Performance Evaluation, 54:149-173, 2003.

[14] D. M. Lucantoni. New results on the single server queue with a batch Markovian arrival process. Stochastic Models, 7(1):1-46, 1991.

[15] C. Moler and C. Van Loan. Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later. SIAM Review, 45(1):3-49, 2003.

[16] M. F. Neuts.
Matrix-Geometric Solutions in Stochastic Models: An Algorithmic Approach. Dover Publications, Inc., 1981.

[17] W. J. J. Roberts and Y. Ephraim. An EM algorithm for ion-channel current estimation. IEEE Trans. Sig. Proc., 56:26-33, 2008.

[18] W. J. J. Roberts, Y. Ephraim, and E. Dieguez. On Rydén's EM algorithm for estimating MMPPs. IEEE Sig. Proc. Let., 13(6):373-377, 2006.

[19] M. Rudemo. State estimation for partially observed Markov chains. J. Mathematical Analysis and Applications, 56:26-33, 1973.

[20] T. Rydén. Parameter estimation for Markov modulated Poisson processes. Comm. Statist. Stochastic Models, 10(4):795-829, 1994.

[21] T. Rydén. An EM algorithm for estimation in Markov-modulated Poisson processes. Computational Statistics and Data Analysis, 21:431-447, 1996.

[22] R. A. Sohn. Stochastic analysis of exit fluid temperature records from the active TAG hydrothermal mound (Mid-Atlantic Ridge, 26°N): 2. Hidden Markov models of flow episodes. J. Geophysical Res., 112, 2007.

[23] G. Strang. Introduction to Linear Algebra. Wellesley-Cambridge Press, 3rd edition, 2003.

[24] C. F. Van Loan. Computing integrals involving the matrix exponential. IEEE Trans. on Automatic Control, 23(3), 1978.

[25] W. Wei, B. Wang, and D. Towsley. Continuous-time hidden Markov models for network performance evaluation. Performance Evaluation, 49:129-146, Sept. 2002.

[26] R. W. Wolff. Stochastic Modeling and the Theory of Queues. Prentice-Hall, 1989.

[27] C. F. J. Wu. On the convergence properties of the EM algorithm. Ann. Statist., 11(1):95-103, 1983.

[28] S.-Z. Yu and H. Kobayashi. Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model. IEEE Trans. Sig. Proc., 54:1947-1951, 2006.