Infinite Viterbi alignments in the two state hidden Markov models


Authors: J. Lember, A. Koloydenko

Innite Viterbi alignmen ts in the t w o state hidden Mark o v mo dels Jüri Lem b er 1 a and Alexey K olo ydenk o 2 1 Univ ersit y of T artu, Estonia, email:jyrilut.ee 2 Ro y al Hollo w a y , Univ ersit y of London, UK, email:alexey .k olo ydenk orh ul.a.uk Abstrat Sine the early da ys of digital omm uniation, Hidden Mark o v Mo dels (HMMs) ha v e no w b een routinely used in sp ee h reognition, pro essing of natural languages, images, and in bioinformatis. An HMM ( X i , Y i ) i ≥ 1 assumes ob- serv ations X 1 , X 2 , . . . to b e onditionally indep enden t giv en an explanotary Mark o v pro ess Y 1 , Y 2 , . . . , whi h itself is not observ ed; moreo v er, the on- ditional distribution of X i dep ends solely on Y i . Cen tral to the theory and appliations of HMM is the Viterbi algorithm to nd a maximum a p oste- riori estimate q 1: n = ( q 1 , q 2 , . . . , q n ) of Y 1: n giv en the observ ed data x 1: n . Maxim um a p osteriori paths are also alled Viterbi paths or alignmen ts. Re- en tly , attempts ha v e b een made to study the b eha vior of Viterbi alignmen ts of HMMs with t w o hidden states when n tends to innit y . It has indeed b een sho wn that in some sp eial ases a w ell-dened limiting Viterbi align- men t exists. While inno v ativ e, these attempts ha v e relied on rather strong assumptions. This w ork pro v es the existene of innite Viterbi alignmen ts for virtually an y HMM with t w o hidden states. Keyw ords Hidden Mark o v mo dels, maxim um a p osterior path, Viterbi alignmen t, Viterbi extration, Viterbi training 1 In tro dution W e onsider hidden Mark o v mo dels (HMM) ( Y , X ) with t w o hidden states. Namely , Y represen ts the hidden pro ess Y 1 , Y 2 , . . . , whi h is an irreduible ap erio di Mark o v  hain with state spae S = { a , b } . In partiular, the transi- tion probabilities P = ( p lm ) , l , m ∈ S , are p ositiv e and the stationary distri- bution π = π P is unique. 
F or te hnial on v eniene, Y 1 is assumed to follo w π , ho w ev er, the results of the pap er hold for arbitrary initial distributions. T o ev ery state l ∈ S there orresp onds an emission distribution P l on X = R d . a The author has b een supp orted b y the Estonian Siene F oundation Gran t 7553 1 Giv en a realization y 1: ∞ ∈ S ∞ of Y , the observ ations X 1: ∞ := X 1 , X 2 , . . . are generated as follo ws. If Y i = a (resp. Y i = b ), then X i is distributed aording to P a (resp. P b ) and indep enden tly of ev erything else. W e refer to this mo del as the (general) 2-state HMM. In (Cappé et al., 2005), HMMs are alled `one of the most suessful sta- tistial mo delling ideas that ha v e [emerged℄ in the last fort y y ears'. Sine their lassial appliation to digital omm uniation in 1960s (see further referenes in (Cappé et al., 2005)), HMMs ha v e had a dening impat on the mainstream resear h in sp ee h reognition (Huang et al., 1990, Jelinek, 1976, 2001, MDermott and Hazen, 2004, Ney et al., 1994, P admanabhan and Pi hen y, 2002, Rabiner and Juang, 1993, Rabiner et al., 1986, Sh u et al., 2003, Stein biss et al., 1995, Ström et al., 1999), natural language mo dels (Ji and Bilmes, 2006, O h and Ney, 2000), and more reen tly omputational biology (Durbin et al., 1998, Eddy, 2004, Krogh, 1998, Lomsadze et al., 2005). Th us, for example, DNA regions an b e lab eled as a , `o ding', or b , `non-o ding', with P a and P b represen ting the resp etiv e distributions on the { A, C, G, T } alphab et. Giv en observ ations x 1: n := x 1 , . . . , x n , and treating the hidden states y 1: n := y 1 , . . . , y n as parameters, inferene in HMMs t ypially in v olv es v ( x 1: n ) , a maximum a p osteriori (MAP) estimate of Y 1: n . It has no w b een reognized that `[in℄ spite of the theoretial and pratial imp ortane of the MAP path estimator, v ery little is kno wn ab out its prop erties' (Calieb e, 2006). 
The same estimates are also known as Viterbi, or forced, alignments and can be efficiently computed by a dynamic programming algorithm also bearing the name of Viterbi. When substituted for the true $y_{1:n}$ in the likelihood function $\Lambda(y_{1:n}; x_{1:n}, \psi)$, Viterbi alignments can also be used to estimate $\psi$, any unknown free parameters of the model. Starting with an initial guess $\psi^{(0)}$ and alternating between maximization of the likelihood $\Lambda(y_{1:n}; x_{1:n}, \psi)$ in $y_{1:n}$ and in $\psi$ is at the core of Viterbi training (VT), or extraction (Jelinek, 1976), also known as segmental K-means (Ephraim and Merhav, 2002, Juang and Rabiner, 1990). The resulting estimates $\hat\psi_{VT}(x_{1:n}, \psi^{(0)})$ are known to be different from the maximum likelihood (ML) estimates $\hat\psi_{ML}(x_{1:n}, \psi^{(0)})$, which in this case are most commonly delivered by the EM procedure (Baum and Petrie, 1966, Bilmes, 1998, Ephraim and Merhav, 2002). Even if $\psi$ were known, Viterbi alignments $v(x_{1:n}; \psi)$ would typically differ from the true paths $y_{1:n}$, and the long-run properties of $v(x_{1:n}; \psi)$ are not obvious (Caliebe, 2006, Caliebe and Rösler, 2002, Koloydenko et al., 2007, Lember and Koloydenko, 2007, 2008). Furthermore, (Koloydenko et al., 2007, Lember and Koloydenko, 2007, 2008) propose a hybrid of VT and EM which takes into account the asymptotic discrepancy between $\hat\psi_{ML}(x_{1:n}, \psi^{(0)})$ and $\hat\psi_{VT}(x_{1:n}, \psi^{(0)})$ in order to increase computational and statistical efficiencies of estimation of $\psi$ for large $n$. One way or another, an important question is how to find the asymptotic properties of Viterbi alignments, given that the $(n+1)$st observation can in principle change the entire previous alignment, i.e. $v(x_{1:n+1})_i \ne v(x_{1:n})_i$, $1 \le i \le n$. Do the Viterbi alignments then admit well-defined extensions?
We answer this question positively in (Lember and Koloydenko, 2008) for general HMMs (in particular, allowing more than two hidden states) by constructing proper infinite Viterbi alignments. Generalizing and clarifying related results of (Caliebe, 2006, Caliebe and Rösler, 2002), the approach in (Lember and Koloydenko, 2008) is to extend alignments piecewise, separating individual pieces by nodes (see Section 2 below). Although the construction is natural, a detailed formal proof of its correctness for general HMMs is rather long and requires certain mild technical assumptions. This paper, on the other hand, shows that in the special case of two-state HMMs, the existence of infinite Viterbi alignments needs no special assumptions and can be proven considerably more easily. The results of this paper essentially complete and generalize those of (Caliebe, 2006, Caliebe and Rösler, 2002).

2 Preliminaries

Let $\lambda$ be a suitable $\sigma$-finite reference measure on $\mathbb{R}^d$ so that $P_a$ and $P_b$ have densities with respect to $\lambda$. For example, $\lambda$ can be a Lebesgue measure, or, as in the case of discrete observations, a counting measure. Thus, let $f_a$ and $f_b$ be the densities of $P_a$ and $P_b$, respectively. Throughout the rest of the paper, we assume that $P_a \ne P_b$ or, equivalently,

$$\lambda\{x \in \mathcal{X} : f_a(x) \ne f_b(x)\} > 0. \quad (1)$$

Assumption (1) is natural since there would be no need to model the observation process by an HMM should the emission distributions coincide. Note also that, unlike in the general case, the positivity of the transition probabilities is also a natural assumption for the two-state HMMs. No further assumptions on the HMM are made in this paper. In particular, unlike (Caliebe, 2006, Caliebe and Rösler, 2002), we do not assume the square integrability of $\log(f_a/f_b)$, or equality of the supports of $P_a$ and $P_b$.
Ho w ev er, the latter ondition is not v ery restritiv e, sine for the t w o state HMMs with unequal supp orts the existene of innite Viterbi alignmen ts follo ws rather trivially (Corollary 2.1). Th us, for an y n ≥ 1 and an y x 1: n ∈ X n and y 1: n ∈ S n , the lik eliho o d Λ π ( y 1: n ; x 1: n ) is giv en b y P ( Y 1: n = y 1: n ) n Y i =1 f y i ( x i ) , where P ( Y 1: n = y 1: n ) = π y 1 n Y i =2 p y i − 1 y i . Sine estimation of ψ is not a goal of this pap er, the dep endene on ψ is suppressed. Deomp osition (2) and reursion (3) b elo w pro vide a basis for the Viterbi algorithm to ompute alignmen ts. Namely , for all u ∈ { 1 , 2 , . . . , n − 1 } , max y 1: n ∈ S n Λ π ( y 1: n ; x 1: n ) = max l ∈ S  δ u ( l ) × max y u +1: n ∈ S n − u Λ ( p l · ) ( y u +1: n ; x u +1: n )  , (2) 3 where ( p l · ) is the transition distribution giv en state l ∈ S , and the s or es δ u ( l ) := max y 1: u − 1 ∈ S u − 1 Λ(( y 1: u − 1 , l ); x 1: u ) , l = a, b, are dened for all u ≥ 1 , and x 1: u ∈ X u . Th us, δ u ( l ) is the maxim um of the lik eliho o d of the paths terminating at u in state l . Note that δ 1 ( l ) = π l f l ( x 1 ) and δ u ( l ) dep ends on x 1: u . δ u +1 ( a ) = max { δ u ( a ) p aa , δ u ( b ) p ba } f a ( x u +1 ) , (3) δ u +1 ( b ) = max { δ u ( a ) p ab , δ u ( b ) p bb } f b ( x u +1 ) , u ≥ 1 , Example 2.1 L et X 1 , X 2 , . . . b e i.i.d. fol lowing a mixtur e distribution π a P a + π b P b with density π a f a ( x ; θ a ) + π b f b ( x ; θ b ) and mixing weights π a , π b > 0 . Suh a se quen e is an HMM with the tr ansition pr ob abilities π a = p aa = p ba , π b = p bb = p ab . In this sp e ial  ase the alignment is e asy to exhibit. Inde e d, in this  ase r e ursion (3) writes for any u ≥ 1 as δ u +1 ( a ) = cπ a f a ( x u +1 ) , δ u +1 ( b ) = cπ b f b ( x u +1 ) , (4) wher e c = max { δ u ( a ) , δ u ( b ) } . 
Hen e, the alignment v ( x 1: n )  an b e obtaine d p ointwise as fol lows: v ( x 1: n ) = ( v ( x 1 ) , . . . , v ( x n )) , wher e v ( x ) = ar g max { π a f a ( x ) , π b f b ( x ) } . Equivalently (ignoring p ossible ties), using a gener alize d V or onoi p artition X = X a ∪ X b with X a = { x ∈ X : π a f a ( x ) ≥ π b f b ( x ) } , X b = { x ∈ X : π b f b ( x ) > π a f a ( x ) } , v ( x ) = a if and only if x ∈ X a , and otherwise (i.e. x ∈ X b ) v ( x ) = b . Generally , it follo ws from (3) that, if δ u ( a ) p aa > δ u ( b ) p ba , δ u ( a ) p ab > δ u ( b ) p bb , (5) for some u , 1 ≤ u , and some x 1: u ∈ X u , then for an y n > u and for an y extension x u +1: n ∈ X n − u , the Viterbi alignmen t go es through state a at time u . Namely , trunation v ( x 1: n ) 1: u oinides with the Viterbi alignmen t v ( x 1: u ) (indeed, (5) implies δ u ( a ) > δ u ( b ) ). Th us, under ondition (5), maximization of Λ π (( y 1: n , l ); x 1: n ) an b e reset at time u b y learing x 1: u from the memory , retaining v 1: u , and replaing the initial distribution π b y ( p a · ) for further maximization of Λ ( p a · ) ( y u +1: n ; x u +1: n ) . F ollo wing (Lem b er and K olo ydenk o, 2008), if ondition (5) holds, then x u is alled a str ong a -no de (of realization x 1: n , n > u ), where `strong' refers to the inequalities in (5) b eing strit. Supp ose x 1: ∞ on tains innitely man y strong a -no des at times u 1 < u 2 < . . . . Let v 1 = v ( x 1: u 1 ) , and let v k maximize Λ ( p a · ) ( y u k − 1 +1: u k ; x u k − 1 +1: u k ) , for 4 all k ≥ 2 . Then, onatenation ( v 1 , v 2 , v 3 , . . . ) is naturally alled the innite pie  ewise Viterbi alignment (Lem b er and K olo ydenk o, 2008). Apparen tly , the almost sur e existene of our innite alignmen ts diretly dep endends on the existene of innitely man y (strong) no des. A t the same time, whether or not x u is a no de dep ends on x 1: u and hene is diult to v erify diretly . 
F ortunately , in man y ases x u is guaran teed to b e a no de based on sev eral preeding observ ations x u − m : u , 1 ≤ m < u , ignoring the rest. Sp eially , supp ose for example that x ∈ X is su h that p ia f a ( x ) p aj > p ib f b ( x ) p bj , ∀ i, j ∈ S. (6) It is easy to  he k that for an y u ≥ 2 , x u = x is a strong a -no de for an y x 1: u − 1 . Hene, if x 1: ∞ on tains innitely man y observ ations satisfying (6), then x 1: ∞ also on tains innitely man y strong no des. This previous ondition in its turn is met pro vided λ ( { x ∈ X : p ia f a ( x ) p aj > p ib f b ( x ) p bj , ∀ i, j ∈ S } ) > 0 . (7) Indeed, sine our underlying Mark o v  hain Y is ergo di, it is rather easy to see that X is ergo di as w ell (Ephraim and Merha v, 2002, Genon-Catalot et al., 2000, Leroux, 1992). Also, (7) implies that P a ( { x ∈ X : p ia f a ( x ) p aj > p ib f b ( x ) p bj , ∀ i, j ∈ S } ) > 0 . Th us, it follo ws from ergo diit y of X that almost ev ery realization of X has innitely man y elemen ts satisfying (6) and, hene innitely man y strong no des. W e ha v e th us pro v ed the follo wing Lemma. Lemma 2.1 Assume that (7) holds. Then almost every se quen e of obser- vations x 1: ∞ has innitely many str ong a -no des. (Clearly , in ter hanging a and b giv es the same results in terms of b -no des.) Lemma 2.1 is essen tially Theorem 1 in (Calieb e and Rösler, 2002) (disre- garding a misprin t in the statemen t). Condition (7) holds for man y t w o-state HMMs inluding the so-alled additiv e Gaussian noise mo del (Calieb e, 2006), where the emission distributions are Gaussian. Another trivial example is the mo del with unequal supp orts of P a and P b . Indeed, in that ase (7) holds (at least up to sw apping a and b ). Hene, the follo wing Corollary . Corollary 2.1 If the supp orts of P a and P b ar e not e qual, then almost every se quen e of observations has innitely many str ong no des. 
The goal of this work is essentially to remove condition (7) from Lemma 2.1. To this end, following (Lember and Koloydenko, 2008), we call an observation satisfying (6) an $a$-barrier of length 1. More generally, a block of observations $z_{1:k} \in \mathcal{X}^k$ is called a (strong) barrier of length $k \ge 1$ if for every $m \ge 0$ and $x_{1:m} \in \mathcal{X}^m$, $z_{1:k}$ contains a (strong) node of realization $(x_{1:m}, z_{1:k})$. In (Lember and Koloydenko, 2008), we prove the existence of infinitely many barriers for a very general class of HMMs. For the two-state HMMs, the conditions of our result in (Lember and Koloydenko, 2008) are given by (8) and (9) below:

$$P_a(\{x \in \mathcal{X} : f_a(x) \max\{p_{aa}, p_{ba}\} > f_b(x) \max\{p_{bb}, p_{ab}\}\}) > 0 \quad \text{and} \quad (8)$$
$$P_b(\{x \in \mathcal{X} : f_b(x) \max\{p_{bb}, p_{ab}\} > f_a(x) \max\{p_{aa}, p_{ba}\}\}) > 0. \quad (9)$$

To achieve our goal, we will first prove the same result for the two-state HMM under the relaxed assumption that (8) or (9) holds. As we shall see below (Lemma 3.1), in our two-state HMM one of these conditions is automatically satisfied and, moreover, all barriers are strong. Hence, the occurrence of infinitely many strong barriers in this case will be shown (Theorem 4.1) to require no additional assumptions.

Finally, if a node is not strong and $v(x_{1:n})$ is not unique, an alignment might exist that does not go through this node. Such pathologies cause technical inconveniences in defining an infinite Viterbi alignment and are treated in (Lember and Koloydenko, 2008). Fortunately, unlike in the general case, in the case of two-state HMMs almost every realization has infinitely many strong nodes (Theorem 4.1). This allows for a simple resolution of the non-uniqueness in the case of two-state HMMs.

3 Main results

3.1 Three types of the two-state HMM

The following three cases exhaust all the possibilities:

1. $p_{aa} > p_{ba}$ ($\Leftrightarrow p_{bb} > p_{ab}$);
2. $p_{aa} < p_{ba}$ ($\Leftrightarrow p_{bb} < p_{ab}$);
3. $p_{aa} = p_{ba}$ ($\Leftrightarrow p_{bb} = p_{ab}$).

From the definition of nodes, it follows that $x_u$ is not a node only in one of the following two cases:

$$(A)\ \begin{cases} \delta_u(a) p_{aa} > \delta_u(b) p_{ba} \\ \delta_u(b) p_{bb} > \delta_u(a) p_{ab} \end{cases} \qquad \text{or} \qquad (B)\ \begin{cases} \delta_u(b) p_{ba} > \delta_u(a) p_{aa} \\ \delta_u(a) p_{ab} > \delta_u(b) p_{bb}. \end{cases}$$

Case (A) is equivalent to

$$\frac{p_{bb}}{p_{ab}} > \frac{\delta_u(a)}{\delta_u(b)} > \frac{p_{ba}}{p_{aa}} \quad (10)$$

and case (B) is equivalent to

$$\frac{p_{bb}}{p_{ab}} < \frac{\delta_u(a)}{\delta_u(b)} < \frac{p_{ba}}{p_{aa}}. \quad (11)$$

Thus, in case (A), we have $\delta_{u+1}(a) = \delta_u(a) p_{aa} f_a(x_{u+1})$ and $\delta_{u+1}(b) = \delta_u(b) p_{bb} f_b(x_{u+1})$, so that for any $n > u$, the Viterbi alignment $v(x_{1:n})$ must satisfy $v(x_{1:n})_u = v(x_{1:n})_{u+1}$. Similarly, in case (B), $\delta_{u+1}(a) = \delta_u(b) p_{ba} f_a(x_{u+1})$ and $\delta_{u+1}(b) = \delta_u(a) p_{ab} f_b(x_{u+1})$, i.e. $v(x_{1:n})_u \ne v(x_{1:n})_{u+1}$. Evidently, case 1 and case (B) are mutually exclusive, and so are case 2 and case (A). Therefore, if the transition matrix satisfies the conditions of case 1, then $x_u$ is not a node if and only if conditions (A) are fulfilled. This implies that in case 1, nodes are the only possibility for $v(x_{1:n})$ to change state. On the other hand, if the transition matrix satisfies the conditions of case 2, then $x_u$ is not a node if and only if (B) holds. Hence, in case 2 nodes are the only possibility for $v(x_{1:n})$ to remain in one state. Case 3 corresponds to the mixture model (see Example 2.1 above). Apparently (4), every observation is a node in this case (see also Figure 1 below).

Let us now examine conditions (8) and (9).
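The trichotomy above depends only on the transition matrix, so it can be expressed as a small classifier (a hypothetical helper for experimentation, not part of the paper):

```python
def hmm_case(p):
    """Classify a 2-state transition matrix into the three cases of Section 3.1.

    Case 1 (p_aa > p_ba): the Viterbi path can change state only at nodes.
    Case 2 (p_aa < p_ba): the path alternates states, except possibly at nodes.
    Case 3 (p_aa = p_ba): the i.i.d. mixture model; every observation is a node.
    """
    if p['a']['a'] > p['b']['a']:
        return 1
    if p['a']['a'] < p['b']['a']:
        return 2
    return 3
```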
From equation (1), it follows that

$$\lambda(\{x \in \mathcal{X} : f_a(x) > f_b(x)\}) > 0, \quad \lambda(\{x \in \mathcal{X} : f_a(x) < f_b(x)\}) > 0 \quad (12)$$

and, for any $\alpha > \beta > 0$,

$$\lambda(\{x \in \mathcal{X} : \alpha f_a(x) > \beta f_b(x)\}) > 0 \Leftrightarrow P_a(\{x \in \mathcal{X} : \alpha f_a(x) > \beta f_b(x)\}) > 0, \quad (13)$$
$$\lambda(\{x \in \mathcal{X} : \alpha f_b(x) > \beta f_a(x)\}) > 0 \Leftrightarrow P_b(\{x \in \mathcal{X} : \alpha f_b(x) > \beta f_a(x)\}) > 0. \quad (14)$$

Therefore, we have the following Lemma.

Lemma 3.1 Any two-state HMM satisfies at least one of the conditions (8) and (9).

Proof. In case 1, (8) and (9) are equivalent to

$$P_a(\{x \in \mathcal{X} : f_a(x) p_{aa} > f_b(x) p_{bb}\}) = P_a\Big(\Big\{x \in \mathcal{X} : \frac{f_b(x) p_{bb}}{f_a(x) p_{aa}} < 1\Big\}\Big) > 0, \quad (15)$$
$$P_b(\{x \in \mathcal{X} : f_b(x) p_{bb} > f_a(x) p_{aa}\}) = P_b\Big(\Big\{x \in \mathcal{X} : \frac{f_a(x) p_{aa}}{f_b(x) p_{bb}} < 1\Big\}\Big) > 0, \quad (16)$$

respectively. If $p_{aa} = p_{bb}$, then (12) implies that both (15) and (16) are satisfied, and hence both (8) and (9) hold. If $p_{aa} > p_{bb}$, then (15), and subsequently (8), follows from (13). If $p_{aa} < p_{bb}$, then (16), and subsequently (9), follows from (14). Hence, at least one of the assumptions (8), (9) is always guaranteed to hold.

In case 2, (8) and (9) are equivalent to

$$P_a(\{x \in \mathcal{X} : f_a(x) p_{ba} > f_b(x) p_{ab}\}) = P_a\Big(\Big\{x \in \mathcal{X} : \frac{f_b(x) p_{ab}}{f_a(x) p_{ba}} < 1\Big\}\Big) > 0, \quad (17)$$
$$P_b(\{x \in \mathcal{X} : f_b(x) p_{ab} > f_a(x) p_{ba}\}) = P_b\Big(\Big\{x \in \mathcal{X} : \frac{f_a(x) p_{ba}}{f_b(x) p_{ab}} < 1\Big\}\Big) > 0, \quad (18)$$

respectively. Again, if $p_{aa} = p_{bb}$, then (17) and (18) both hold without further assumptions. If $p_{aa} > p_{bb}$, then (17) is automatically satisfied. Likewise, (18) holds if $p_{aa} < p_{bb}$. Hence, one of the assumptions (8), (9) is always guaranteed to hold.

In case 3, (8) and (9) write

$$P_a(\{x \in \mathcal{X} : f_a(x) \pi_a > f_b(x) \pi_b\}) > 0, \quad (19)$$
$$P_b(\{x \in \mathcal{X} : f_b(x) \pi_b > f_a(x) \pi_a\}) > 0. \quad (20)$$

Assume $\pi_a \ge \pi_b$. Then (12) implies $\lambda(\{x \in \mathcal{X} : \pi_a f_a(x) > \pi_b f_b(x)\}) > 0$, which in turn implies (19). (If $\pi_a \le \pi_b$, (20) follows in the same way.)
Finally, we state and prove the main results for each of the three cases.

[Figure 1: Distinct patterns of the Viterbi alignment in the two-state HMM. Top: Case 1, the state can possibly change only at nodes (larger circles). Middle: Case 2, states always alternate, except possibly at nodes. Bottom: Case 3, every observation is a node.]

3.2 Case 1

First, note that condition (7) in this case is equivalent to

$$\lambda(\{x \in \mathcal{X} : p_{ba} f_a(x) p_{ab} > p_{bb} f_b(x) p_{bb}\}) > 0. \quad (21)$$

As mentioned in Section 2, condition (7) need not hold in general. Nonetheless, for the two-state HMM, we have the following Lemma.

Lemma 3.2 In case 1, almost every realization of the two-state HMM has infinitely many strong barriers.

Proof. Without loss of generality, assume $p_{aa} \ge p_{bb}$. Then (15) holds, implying that there exists $\epsilon > 0$ such that $P_a(\mathcal{X}_a) > 0$, where

$$\mathcal{X}_a := \Big\{x \in \mathcal{X} : \frac{f_b(x) p_{bb}}{f_a(x) p_{aa}} < 1 - \epsilon\Big\}.$$

Let the integer $k$ be sufficiently large for $(1-\epsilon)^k < p_{ab} p_{ba} / (p_{aa} p_{bb})$ to hold. Then every sequence $z_{1:k} \in \mathcal{X}_a^k$ satisfies

$$\prod_{j=1}^{k} \frac{f_b(z_j) p_{bb}}{f_a(z_j) p_{aa}} < (1-\epsilon)^k < \frac{p_{ab} p_{ba}}{p_{aa} p_{bb}}. \quad (22)$$

Let $u > k$ be arbitrary and let $z_{0:k} \in \mathcal{X}_a^{k+1}$ be the last $k+1$ observations in a generic sequence $x_{1:u} \in \mathcal{X}^{u-k-1} \times \mathcal{X}_a^{k+1}$. To shorten the notation, we write $d_j(z_i)$ for $\delta_{u-k+i}(j)$ for every $i = 0, 1, \ldots, k$, $j = a, b$. Next, we show that $x_{u-k:u}$ contains at least one strong node, and consequently, $z_{0:k}$ is a strong barrier. Indeed, if none of the observations $x_{u-k:u}$ were a strong $a$-node, then we would have

$$d_b(z_k) = d_b(z_0) \prod_{j=1}^{k} f_b(z_j) p_{bb}.$$
Similarly, if none among the observations $x_{u-k+1:u}$ were a strong $b$-node, we would have

$$\delta_u(a) \ge \delta_{u-k}(b)\, p_{ba} \Big(\prod_{j=1}^{k} f_a(z_j)\Big) p_{aa}^{k-1}.$$

Hence,

$$\frac{\delta_u(b)}{\delta_u(a)} \le \frac{\delta_{u-k}(b)\, p_{bb} \big(\prod_{j=1}^{k} f_b(z_j)\big) p_{bb}^{k-1}}{\delta_{u-k}(b)\, p_{ba} \big(\prod_{j=1}^{k} f_a(z_j)\big) p_{aa}^{k-1}} = \frac{\prod_{j=1}^{k} (f_b(z_j) p_{bb})}{\prod_{j=1}^{k} (f_a(z_j) p_{aa})} \cdot \frac{p_{aa}}{p_{ba}},$$

and by (22)

$$\frac{\delta_u(b)}{\delta_u(a)} < \frac{p_{ab}}{p_{bb}},$$

which contradicts (10). Thus, at least one of $x_{u-k:u}$ must be a strong node. Since $P_a(\mathcal{X}_a) > 0$, by ergodicity of the HMM, almost every realization has infinitely many barriers $z_{0:k} \in \mathcal{X}_a^{k+1}$, implying also that almost every realization has infinitely many strong nodes.

The next Theorem refines the previous result.

Theorem 3.1 Suppose the (transition matrix of the) two-state HMM meets the condition of case 1. If $p_{aa} \ge p_{bb}$, then almost every realization has infinitely many strong $a$-barriers. (If $p_{aa} \le p_{bb}$, then almost every realization has infinitely many strong $b$-barriers.)

Proof. Let $p_{aa} \ge p_{bb}$ and use the notation of the proof of Lemma 3.2. First, we show that none of the observations $x_{u-k+1:u}$ is a $b$-node. Indeed, since $d_b(z_1) = \max\{d_a(z_0) p_{ab}, d_b(z_0) p_{bb}\} f_b(z_1)$, at least one of the following two inequalities must hold:

$$p_{ab} f_b(z_1) p_{ba} \ge p_{aa} f_a(z_1) p_{aa}, \qquad p_{bb} f_b(z_1) p_{ba} \ge p_{ba} f_a(z_1) p_{aa} \quad (23)$$

in order for $x_{u-k+1}$ to be a $b$-node. However, (15) implies that $p_{ba} f_a(z_1) p_{aa} > p_{bb} f_b(z_1) p_{ba}$ and, since $p_{bb} > p_{ab}$, we have $p_{bb} f_b(z_1) p_{ba} > p_{ab} f_b(z_1) p_{ba}$. Hence, neither of the two inequalities (23) holds. Thus, $x_{u-k+1}$ cannot be a $b$-node, and the same argument shows that none of the subsequent observations $x_{u-k+2}, \ldots, x_u$ can be a $b$-node either.
The argument of the proof of Lemma 3.2 then shows that one of the observations in $x_{u-k:u}$ is a strong $a$-node and therefore $z_{0:k}$ is a strong $a$-barrier. The ergodic argument finishes the proof. (The same argument with $a$ and $b$ swapped establishes the second part of the Theorem.)

Note that the condition $p_{bb} \ge p_{aa}$ is sufficient but not necessary for (16) to hold. In fact, for many 2-state HMMs, such as the one with additive white Gaussian noise, both (15) and (16) hold for any (positive) values of $p_{aa}$ and $p_{bb}$. On the other hand, it might happen that one of the conditions (15) and (16), say (16), fails. This would mean $P_b(\{x \in \mathcal{X} : p_{bb} f_b(x) > p_{aa} f_a(x)\}) = 0$ or, equivalently,

$$\lambda(\{x \in \mathcal{X} : p_{bb} f_b(x) > p_{aa} f_a(x)\}) = 0. \quad (24)$$

Corollary 3.1 In case 1, equation (24) implies that almost every sequence of observations has infinitely many strong $a$-barriers and no strong $b$-nodes. Furthermore, equation (24) in case 1 implies that for almost every realization, if a $b$-node does occur, it occurs before the first $a$-node.

Proof. From the proof of Theorem 3.1, it follows that no observation $x \in \mathcal{X}$ such that $p_{bb} f_b(x) \le p_{aa} f_a(x)$ (i.e. from the complement of the set in (24)) can be a strong $b$-node; a closer inspection of the proof actually shows that even a weak (i.e. not strong) $b$-node cannot occur after an $a$-node (since in case 1 $p_{bb} > p_{ab}$). Theorem 3.1 then implies that almost every sequence of observations has infinitely many strong $a$-barriers.

Corollary 3.1 in its turn implies that, starting with the first strong $a$-node onward, the Viterbi alignment $v(x_{1:n})$ stays in state $a$. As we have already mentioned, Viterbi alignments need not be unique (see (Lember and Koloydenko, 2008)), i.e. ties are possible in general, and in this case, in particular, they are possible up until the first strong $a$-node.
However, the impossibility of strong $b$-nodes in this case implies that the ties can be broken in favor of $a$, resulting in the constant all-$a$ alignment.

Theorem 3.1 is a generalization of Theorem 7 in (Caliebe, 2006), which basically states that in case 1, if (15) and (16) hold, then under some additional assumptions (equal supports of $P_a$ and $P_b$ and further conditions A2), almost every realization has infinitely many nodes. Thus, (Caliebe, 2006) stops short of realizing that in case 1 conditions (15) and (16) alone ensure the existence of $a$- and $b$-nodes. This results in (Caliebe, 2006) invoking Theorem 2 of (Caliebe and Rösler, 2002) to prove the existence of nodes, hence the superfluous assumptions A1, A2. Also, the proof of Theorem 7 in (Caliebe and Rösler, 2002) could be simplified and shortened with the help of the notions of nodes and barriers. Finally, Corollary 3.1 generalizes Theorems 8 and 9 of (Caliebe, 2006).

3.3 Case 2

Recall that we have been proving the existence of barriers without condition (7). Note that in case 2, condition (7) becomes

$$\lambda(\{x \in \mathcal{X} : p_{aa} f_a(x) p_{aa} > p_{ab} f_b(x) p_{ba}\}) > 0.$$

Recall (Section 2) also that interchanging $a$ with $b$ gives a similar condition for strong $b$-nodes to occur infinitely often in almost every realization. It follows from (12) that for some $\epsilon > 0$, the sets

$$\mathcal{X}_a := \{x \in \mathcal{X} : f_a(x)(1-\epsilon) > f_b(x)\}, \quad \mathcal{X}_b := \{x \in \mathcal{X} : f_a(x) < f_b(x)(1-\epsilon)\}$$

both have positive $\lambda$-measure. Hence $P_a(\mathcal{X}_a) > 0$ and $P_b(\mathcal{X}_b) > 0$. Then, for $x_{1:2} \in \mathcal{X}_a \times \mathcal{X}_b$, the following holds:

$$\frac{f_b(x_1) f_a(x_2)}{f_a(x_1) f_b(x_2)} < (1-\epsilon)^2. \quad (25)$$

Lemma 3.3 In case 2, almost every realization has infinitely many strong barriers.

Proof. Let $\mathcal{X}_a$ and $\mathcal{X}_b$ be as above. Choose $k$ sufficiently large for $(1-\epsilon)^{2k} < p_{aa} p_{bb} / (p_{ba} p_{ab})$ to hold. Next, consider a sequence $z_{0:2k} \in \mathcal{X}^{2k+1}$, where $z_0, z_{2i} \in \mathcal{X}_a$ and $z_{2i-1} \in \mathcal{X}_b$ for every $i = 1, \ldots, k$. We show that for every $u > 2k$, every sequence of observations $x_{1:u} \in \mathcal{X}^u$ such that $x_{u-2k:u} = z_{0:2k}$ contains a strong node, making $z_{0:2k}$ a strong barrier. The choice of $k$ and $z_{0:2k}$ implies

$$\frac{\prod_{i=1}^{k} p_{ba} f_a(z_{2i-1})\, p_{ab} f_b(z_{2i})}{\prod_{i=1}^{k} p_{ab} f_b(z_{2i-1})\, p_{ba} f_a(z_{2i})} < (1-\epsilon)^{2k} < \frac{p_{bb} p_{aa}}{p_{ba} p_{ab}}. \quad (26)$$

If there is no strong node among $x_{u-2k:u}$, then

$$d_b(z_{2k}) = d_b(z_0) \prod_{i=1}^{k} p_{ba} f_a(z_{2i-1})\, p_{ab} f_b(z_{2i})$$

and

$$d_a(z_{2k}) \ge d_b(z_0) \frac{p_{bb}}{p_{ab}} \prod_{i=1}^{k} p_{ab} f_b(z_{2i-1})\, p_{ba} f_a(z_{2i}).$$

Hence, by (26),

$$\frac{d_b(z_{2k})}{d_a(z_{2k})} \le \frac{\prod_{i=1}^{k} p_{ba} f_a(z_{2i-1})\, p_{ab} f_b(z_{2i})}{\frac{p_{bb}}{p_{ab}} \prod_{i=1}^{k} p_{ab} f_b(z_{2i-1})\, p_{ba} f_a(z_{2i})} < \frac{p_{aa}}{p_{ba}},$$

which contradicts (11).

Next, we refine this result. Without loss of generality, assume $p_{ba} \ge p_{ab}$. Therefore

$$p_{ab}\, p_{aa} \ge p_{ba}\, p_{bb}, \quad (27)$$

and also, for every $x \in \mathcal{X}_a$,

$$p_{ba} f_a(x) > p_{ab} f_b(x). \quad (28)$$

Hence, (17) holds. We multiply the left side of (28) by $p_{ab} p_{aa}$ and the right side by $p_{ba} p_{bb}$, and use (27) to obtain

$$f_a(x) p_{aa} > f_b(x) p_{bb}. \quad (29)$$

Finally, for $x \in \mathcal{X}_b$, we have

$$f_a(x) < f_b(x). \quad (30)$$

We will need the following Lemma.

Lemma 3.4 Assume (in addition to being in case 2) that $p_{ab} \le p_{ba}$.
a) In any pair of observations $z_{1:2} \in \mathcal{X}_a \times \mathcal{X}_b$, $z_1$ is not a $b$-node.
b) In any pair of observations $z_{2:3} \in \mathcal{X}_b \times \mathcal{X}_a$, if $z_2$ is a $b$-node, then $z_3$ is a strong $a$-node.

Proof. Assume that $p_{ab} \le p_{ba}$, and consider a). First note that since we are in case 2, $z_1$ is a $b$-node if and only if

$$d_b(z_1)\, p_{bb} \ge d_a(z_1)\, p_{ab}. \quad (31)$$

Suppose first that $z_0$ is not a node, in which case $d_b(z_1) = d_a(z_0) p_{ab} f_b(z_1)$ and $d_a(z_1) = d_b(z_0) p_{ba} f_a(z_1)$.
Then

$$d_a(z_1) p_{ab} = d_b(z_0) p_{ba} f_a(z_1) p_{ab} \ge d_a(z_0) p_{aa} f_a(z_1) p_{ab} > d_a(z_0) p_{bb} f_b(z_1) p_{ab} = d_a(z_0) p_{ab} f_b(z_1) p_{bb} = d_b(z_1) p_{bb}.$$

The first inequality above follows from the recursion property (3) of the scores $\delta$, whereas the second one follows from (29). Thus, when $z_0$ is not a node, $z_1$ cannot be a $b$-node. Similarly, supposing that $z_0$ is an $a$-node, we obtain that $z_1$ is not a $b$-node. Suppose finally that $z_0$ is a $b$-node. Then $d_b(z_1) = d_b(z_0) p_{bb} f_b(z_1)$ and $d_a(z_1) = d_b(z_0) p_{ba} f_a(z_1)$. Applying consecutively $p_{bb} < p_{ab}$, (28), and $p_{bb} < p_{ab}$ again, we obtain:

$$p_{bb} f_b(z_1) p_{bb} < p_{ab} f_b(z_1) p_{bb} \le p_{ba} f_a(z_1) p_{bb} < p_{ba} f_a(z_1) p_{ab}.$$

Thus, contrary to (31),

$$d_b(z_1) p_{bb} = d_b(z_0) p_{bb} f_b(z_1) p_{bb} < d_b(z_0) p_{ba} f_a(z_1) p_{ab} = d_a(z_1) p_{ab},$$

that is, $z_1$ is not a $b$-node in this case either.

Let us now prove b). If $z_2$ is a $b$-node, then $d_a(z_3) = d_b(z_2) p_{ba} f_a(z_3)$ and $d_b(z_3) = d_b(z_2) p_{bb} f_b(z_3)$. By (29), we now have

$$d_a(z_3) p_{aa} = d_b(z_2) p_{ba} f_a(z_3) p_{aa} > d_b(z_2) p_{bb} f_b(z_3) p_{ba} = d_b(z_3) p_{ba}.$$

Similarly to the argument regarding $b$-nodes guaranteed by (31) above, we now have $d_a(z_3) > d_b(z_3)$, implying $d_a(z_3) p_{ab} > d_b(z_3) p_{bb}$. Thus $z_3$ is a strong $a$-node.

Theorem 3.2 If $p_{ba} \ge p_{ab}$, then almost every realization has infinitely many strong $a$-nodes. If $p_{ba} \le p_{ab}$, then almost every realization has infinitely many strong $b$-nodes.

Proof. Assume again that $p_{ba} \ge p_{ab}$. Let $z_{0:2k}$ be as in the proof of Lemma 3.3 and attach one more element $z_{2k+1} \in \mathcal{X}_b$ to the end. Thus, $z_{2i} \in \mathcal{X}_a$ and $z_{2i+1} \in \mathcal{X}_b$, $i = 0, 1, \ldots, k$. From (the proof of) Lemma 3.3, we know that $z_{0:2k}$ contains at least one strong node. If this is an $a$-node, then the theorem is proven.
Otherwise this is a $b$-node, which, according to part a) of Lemma 3.4, can only be among $z_1, z_3, \ldots, z_{2k-1}$. Applying part b) of Lemma 3.4 then shows that there must also be a strong $a$-node among $z_2, z_4, \ldots, z_{2k}$. Invoking ergodicity again finishes the proof. Clearly, swapping $a$ and $b$ in the above discussion following the proof of Lemma 3.3 establishes the other part of the theorem.

Inequality (27) guarantees (17). Often, the model is such that in addition to (17), (18) also holds. However, to apply the previous proof (i.e. of Theorem 3.2) to guarantee the simultaneous existence of infinitely many strong $a$- and $b$-nodes, we would need the following counterpart of (29):

$$P_b(\{x \in \mathcal{X} : f_b(x) p_{ab} > f_a(x) p_{ba},\ f_b(x) p_{bb} > f_a(x) p_{aa}\}) > 0,$$

which is stronger than (18). However, this condition is indeed often met, resulting in infinitely many strong $a$- and $b$-nodes (in almost every realization $x_{1:\infty}$).

Lemma 3.3 appears without proof as Theorem 10 in (Caliebe, 2006). The author of (Caliebe, 2006) actually suggests that Theorem 10 and other results for case 2 are analogous to the corresponding results for case 1, mainly Theorem 7 (of the same work). It is further stated in (Caliebe, 2006) that the proofs of those results are not given as they are very similar to the corresponding proofs in case 1. Our present workings actually show that case 2 is quite dissimilar to case 1 (due to the fluctuating nature of the typical Viterbi alignment) and in particular requires a more careful treatment. Note that, even if Theorem 10 in (Caliebe, 2006) assumed (8) and (9) (as Theorem 7 in (Caliebe, 2006) does) to help one prove this Theorem by analogy to Theorem 7, it is still not clear how the two proofs could be very similar.

3.4 Case 3 (the mixture model)

Recall that every observation in this case is a (not necessarily strong) node.
Furthermore, every observation from $\{x \in \mathcal{X} : \pi_a f_a(x) > \pi_b f_b(x)\}$ is a strong $a$-node. Thus, we have the following counterpart of Theorems 3.1 and 3.2.

Theorem 3.3. If $\pi_a \ge \pi_b$, then almost every realization has infinitely many strong $a$-nodes. If $\pi_a \le \pi_b$, then almost every realization has infinitely many strong $b$-nodes.

4 Conclusion

In summary, we have proved Theorem 4.1 stated below, which provides a basis for the piecewise construction and asymptotic analysis of the Viterbi alignments of two-state HMMs.

Theorem 4.1. Almost every realization of the two-state HMM has infinitely many strong barriers. Furthermore,
a) if the transition probabilities satisfy $p_{aa} \ge p_{ba}$, then (almost every realization of) the chain has infinitely many strong $s$-barriers, where $s$ is such that $p_{ss} = \max\{p_{aa}, p_{bb}\}$;
b) otherwise (i.e. if $p_{aa} < p_{ba}$), (almost every realization of) the chain has infinitely many strong $s$-barriers, where $s$ is such that $p_{ts} = \max\{p_{ab}, p_{ba}\}$ (for some $t \in S$).

References

Baum, L. E., Petrie, T., 1966. Statistical inference for probabilistic functions of finite state Markov chains. Ann. Math. Statist. 37, 1554-1563.

Bilmes, J., 1998. A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Tech. Rep. 97-021, International Computer Science Institute, Berkeley, CA, USA.

Caliebe, A., 2006. Properties of the maximum a posteriori path estimator in hidden Markov models. IEEE Trans. Inform. Theory 52 (1), 41-51.

Caliebe, A., Rösler, U., 2002. Convergence of the maximum a posteriori path estimator in hidden Markov models. IEEE Trans. Inform. Theory 48 (7), 1750-1758.

Cappé, O., Moulines, E., Rydén, T., 2005. Inference in hidden Markov models. Springer Series in Statistics. Springer, New York. With Randal Douc's contributions to Chapter 9 and Christian P.
Robert's to Chapters 6, 7 and 13, with Chapter 14 by Gersende Fort, Philippe Soulier and Moulines, and Chapter 15 by Stéphane Boucheron and Elisabeth Gassiat.

Durbin, R., Eddy, S., Krogh, A., Mitchison, G., 1998. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press.

Eddy, S., 2004. What is a hidden Markov model? Nature Biotechnology 22 (10), 1315-1316.

Ephraim, Y., Merhav, N., 2002. Hidden Markov processes. IEEE Trans. Inform. Theory 48 (6), 1518-1569. Special issue on Shannon theory: perspective, trends, and applications.

Genon-Catalot, V., Jeantheau, T., Larédo, C., 2000. Stochastic volatility models as hidden Markov models and statistical applications. Bernoulli 6 (6), 1051-1079.

Huang, X., Ariki, Y., Jack, M., 1990. Hidden Markov models for speech recognition. Edinburgh University Press, Edinburgh, UK.

Jelinek, F., 1976. Continuous speech recognition by statistical methods. Proc. IEEE 64, 532-556.

Jelinek, F., 2001. Statistical methods for speech recognition. The MIT Press, Cambridge, MA, USA.

Ji, G., Bilmes, J., 2006. Backoff model training using partially observed data: Application to dialog act tagging. In: Proc. Human Language Techn. Conf. NAACL, Main Conference. Association for Computational Linguistics, New York City, USA, pp. 280-287. URL http://www.aclweb.org/anthology/N/N06/N06-1036

Juang, B.-H., Rabiner, L., 1990. The segmental K-means algorithm for estimating parameters of hidden Markov models. IEEE Trans. Acoust. Speech Signal Process. 38 (9), 1639-1641.

Koloydenko, A., Käärik, M., Lember, J., 2007. On adjusted Viterbi training. Acta Appl. Math. 96 (1-3), 309-326.

Krogh, A., 1998. Computational Methods in Molecular Biology. Elsevier Science, Ch. An Introduction to Hidden Markov Models for Biological Sequences.

Lember, J., Koloydenko, A., 2007.
Adjusted Viterbi training: A proof of concept. Probab. Eng. Inf. Sci. 21 (3), 451-475.

Lember, J., Koloydenko, A., 2008. The adjusted Viterbi training for hidden Markov models. Bernoulli 14 (1), 180-206.

Leroux, B. G., 1992. Maximum-likelihood estimation for hidden Markov models. Stochastic Process. Appl. 40 (1), 127-143.

Lomsadze, A., Ter-Hovhannisyan, V., Chernoff, V., Borodovsky, M., 2005. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Res. 33 (20), 6494-6506.

McDermott, E., Hazen, T., 2004. Minimum classification error training of landmark models for real-time continuous speech recognition. In: Proc. ICASSP.

Ney, H., Steinbiss, V., Haeb-Umbach, R., et al., 1994. An overview of the Philips research system for large vocabulary continuous speech recognition. Int. J. Pattern Recognit. Artif. Intell. 8 (1), 33-70.

Och, F., Ney, H., 2000. Improved statistical alignment models. In: Proc. 38th Ann. Meet. Assoc. Comput. Linguist. Assoc. Comput. Linguist., pp. 440-447.

Padmanabhan, M., Picheny, M., 2002. Large-vocabulary speech recognition algorithms. Computer 35 (4), 42-50.

Rabiner, L., Juang, B., 1993. Fundamentals of speech recognition. Prentice-Hall, Inc., Upper Saddle River, NJ, USA.

Rabiner, L., Wilpon, J., Juang, B., 1986. A segmental K-means training procedure for connected word recognition. AT&T Tech. J. 64 (3), 21-40.

Shu, I., Hetherington, L., Glass, J., 2003. Baum-Welch training for segment-based speech recognition. In: Proc. IEEE ASRU Workshop. St. Thomas, U.S. Virgin Islands, http://groups.csail.mit.edu/sls/publications/2003/ASRU_Shu.pdf, pp. 43-48.

Steinbiss, V., Ney, H., Aubert, X., et al., 1995. The Philips research system for continuous-speech recognition. Philips J. Res. 49, 317-352.

Ström, N., Hetherington, L., Hazen, T., Sandness, E., Glass, J., 1999.
A ous- ti mo deling impro v emen ts in a segmen t-based sp ee h reognizer. In: Pro . IEEE ASR U W orkshop. pp. 139142. 17
