Infinite Viterbi alignments in the two state hidden Markov models
Jüri Lember^{1,a} and Alexey Koloydenko^2
^1 University of Tartu, Estonia, email: jyril@ut.ee
^2 Royal Holloway, University of London, UK, email: alexey.koloydenko@rhul.ac.uk

Abstract

Since the early days of digital communication, Hidden Markov Models (HMMs) have been routinely used in speech recognition, processing of natural languages, images, and in bioinformatics. An HMM $(X_i, Y_i)_{i \ge 1}$ assumes observations $X_1, X_2, \ldots$ to be conditionally independent given an explanatory Markov process $Y_1, Y_2, \ldots$, which itself is not observed; moreover, the conditional distribution of $X_i$ depends solely on $Y_i$. Central to the theory and applications of HMMs is the Viterbi algorithm to find a maximum a posteriori estimate $q_{1:n} = (q_1, q_2, \ldots, q_n)$ of $Y_{1:n}$ given the observed data $x_{1:n}$. Maximum a posteriori paths are also called Viterbi paths or alignments. Recently, attempts have been made to study the behavior of Viterbi alignments of HMMs with two hidden states when $n$ tends to infinity. It has indeed been shown that in some special cases a well-defined limiting Viterbi alignment exists. While innovative, these attempts have relied on rather strong assumptions. This work proves the existence of infinite Viterbi alignments for virtually any HMM with two hidden states.

Keywords: Hidden Markov models, maximum a posteriori path, Viterbi alignment, Viterbi extraction, Viterbi training

1 Introduction

We consider hidden Markov models (HMMs) $(Y, X)$ with two hidden states. Namely, $Y$ represents the hidden process $Y_1, Y_2, \ldots$, which is an irreducible aperiodic Markov chain with state space $S = \{a, b\}$. In particular, the transition probabilities $\mathbb{P} = (p_{lm})$, $l, m \in S$, are positive and the stationary distribution $\pi = \pi \mathbb{P}$ is unique.
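The two-state model just described is easy to simulate; the sketch below is illustrative only, with an assumed transition matrix and assumed Gaussian emissions (none of these numeric values come from the paper). The stationary distribution of a two-state chain has the closed form $\pi_a = p_{ba}/(p_{ab} + p_{ba})$.

```python
import random

# Illustrative transition probabilities (assumed, not from the paper).
P = {("a", "a"): 0.7, ("a", "b"): 0.3,
     ("b", "a"): 0.4, ("b", "b"): 0.6}

def stationary():
    # Solve pi = pi P for a two-state chain: pi_a = p_ba / (p_ab + p_ba).
    pi_a = P[("b", "a")] / (P[("a", "b")] + P[("b", "a")])
    return {"a": pi_a, "b": 1.0 - pi_a}

def simulate(n, rng=random):
    # Draw Y_1 from pi, then run the chain; emit from P_a or P_b pointwise.
    pi = stationary()
    y = ["a" if rng.random() < pi["a"] else "b"]
    for _ in range(n - 1):
        prev = y[-1]
        y.append("a" if rng.random() < P[(prev, "a")] else "b")
    # Assumed emissions: N(0,1) in state a, N(2,1) in state b.
    means = {"a": 0.0, "b": 2.0}
    x = [rng.gauss(means[s], 1.0) for s in y]
    return y, x
```

The emissions depend on the hidden path only pointwise, matching the conditional-independence assumption above.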
For technical convenience, $Y_1$ is assumed to follow $\pi$; however, the results of the paper hold for arbitrary initial distributions. To every state $l \in S$ there corresponds an emission distribution $P_l$ on $\mathcal{X} = \mathbb{R}^d$.

^a The author has been supported by the Estonian Science Foundation Grant 7553.

Given a realization $y_{1:\infty} \in S^\infty$ of $Y$, the observations $X_{1:\infty} := X_1, X_2, \ldots$ are generated as follows. If $Y_i = a$ (resp. $Y_i = b$), then $X_i$ is distributed according to $P_a$ (resp. $P_b$) and independently of everything else. We refer to this model as the (general) two-state HMM.

In (Cappé et al., 2005), HMMs are called 'one of the most successful statistical modelling ideas that have [emerged] in the last forty years'. Since their classical application to digital communication in the 1960s (see further references in (Cappé et al., 2005)), HMMs have had a defining impact on the mainstream research in speech recognition (Huang et al., 1990, Jelinek, 1976, 2001, McDermott and Hazen, 2004, Ney et al., 1994, Padmanabhan and Picheny, 2002, Rabiner and Juang, 1993, Rabiner et al., 1986, Shu et al., 2003, Steinbiss et al., 1995, Ström et al., 1999), natural language models (Ji and Bilmes, 2006, Och and Ney, 2000), and more recently computational biology (Durbin et al., 1998, Eddy, 2004, Krogh, 1998, Lomsadze et al., 2005). Thus, for example, DNA regions can be labeled as $a$, 'coding', or $b$, 'non-coding', with $P_a$ and $P_b$ representing the respective distributions on the $\{A, C, G, T\}$ alphabet.

Given observations $x_{1:n} := x_1, \ldots, x_n$, and treating the hidden states $y_{1:n} := y_1, \ldots, y_n$ as parameters, inference in HMMs typically involves $v(x_{1:n})$, a maximum a posteriori (MAP) estimate of $Y_{1:n}$. It has now been recognized that '[in] spite of the theoretical and practical importance of the MAP path estimator, very little is known about its properties' (Caliebe, 2006).
The same estimates are also known as Viterbi, or forced, alignments and can be efficiently computed by a dynamic programming algorithm also bearing the name of Viterbi. When substituted for the true $y_{1:n}$ in the likelihood function $\Lambda(y_{1:n}; x_{1:n}, \psi)$, Viterbi alignments can also be used to estimate $\psi$, any unknown free parameters of the model. Starting with an initial guess $\psi^{(0)}$ and alternating between maximization of the likelihood $\Lambda(y_{1:n}; x_{1:n}, \psi)$ in $y_{1:n}$ and $\psi$ is at the core of Viterbi training (VT), or extraction (Jelinek, 1976), also known as segmental K-means (Ephraim and Merhav, 2002, Juang and Rabiner, 1990). The resulting estimates $\hat\psi_{VT}(x_{1:n}, \psi^{(0)})$ are known to differ from the maximum likelihood (ML) estimates $\hat\psi_{ML}(x_{1:n}, \psi^{(0)})$, which in this case are most commonly delivered by the EM procedure (Baum and Petrie, 1966, Bilmes, 1998, Ephraim and Merhav, 2002). Even if $\psi$ were known, Viterbi alignments $v(x_{1:n}; \psi)$ would typically differ from the true paths $y_{1:n}$, and the long-run properties of $v(x_{1:n}; \psi)$ are not obvious (Caliebe, 2006, Caliebe and Rösler, 2002, Koloydenko et al., 2007, Lember and Koloydenko, 2007, 2008). Furthermore, (Koloydenko et al., 2007, Lember and Koloydenko, 2007, 2008) propose a hybrid of VT and EM which takes into account the asymptotic discrepancy between $\hat\psi_{ML}(x_{1:n}, \psi^{(0)})$ and $\hat\psi_{VT}(x_{1:n}, \psi^{(0)})$ in order to increase the computational and statistical efficiency of estimation of $\psi$ for large $n$. Thus or otherwise, an important question is how to find the asymptotic properties of Viterbi alignments, given that the $(n+1)$st observation can in principle change the previous alignment entirely, i.e. $v(x_{1:n+1})_i \ne v(x_{1:n})_i$, $1 \le i \le n$. Do the Viterbi alignments then admit well-defined extensions?
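The dynamic program behind the Viterbi algorithm can be sketched as follows, in log-space for numerical stability; the transition matrix, initial distribution, and Gaussian log-densities below are illustrative assumptions, not parameters from the paper.

```python
import math

def viterbi(x, P, pi, logf):
    """Viterbi dynamic program for a two-state HMM, in log-space.

    P[(r, s)]: transition probability r -> s; pi[s]: initial probability;
    logf[s]: log-density of an observation under state s.
    """
    states = ("a", "b")
    delta = {s: math.log(pi[s]) + logf[s](x[0]) for s in states}
    back = []  # back-pointers, one dict per time step 2..n
    for xi in x[1:]:
        new, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda r: delta[r] + math.log(P[(r, s)]))
            ptr[s] = prev
            new[s] = delta[prev] + math.log(P[(prev, s)]) + logf[s](xi)
        back.append(ptr)
        delta = new
    path = [max(states, key=delta.get)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Illustrative parameters (assumed): state a emits N(0,1), state b emits
# N(2,1); the shared normalizing constants of the log-densities cancel.
P = {("a", "a"): 0.7, ("a", "b"): 0.3, ("b", "a"): 0.4, ("b", "b"): 0.6}
pi = {"a": 4 / 7, "b": 3 / 7}
logf = {"a": lambda x: -0.5 * x * x, "b": lambda x: -0.5 * (x - 2) ** 2}
```

Computing `viterbi(x[:n], P, pi, logf)` and `viterbi(x[:n + 1], P, pi, logf)` and comparing the first $n$ entries makes the instability question above concrete: the new observation may or may not revise the earlier part of the alignment.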
We answer this question positively in (Lember and Koloydenko, 2008) for general HMMs (in particular, allowing more than two hidden states) by constructing proper infinite Viterbi alignments. Generalizing and clarifying related results of (Caliebe, 2006, Caliebe and Rösler, 2002), the approach in (Lember and Koloydenko, 2008) is to extend alignments piecewise, separating individual pieces by nodes (see Section 2 below). Although the construction is natural, a detailed formal proof of its correctness for general HMMs is rather long and requires certain mild technical assumptions. This paper, on the other hand, shows that in the special case of two-state HMMs, the existence of infinite Viterbi alignments needs no special assumptions and can be proven considerably more easily. The results of this paper essentially complete and generalize those of (Caliebe, 2006, Caliebe and Rösler, 2002).

2 Preliminaries

Let $\lambda$ be a suitable $\sigma$-finite reference measure on $\mathbb{R}^d$ so that $P_a$ and $P_b$ have densities with respect to $\lambda$. For example, $\lambda$ can be the Lebesgue measure or, as in the case of discrete observations, a counting measure. Thus, let $f_a$ and $f_b$ be the densities of $P_a$ and $P_b$, respectively. Throughout the rest of the paper, we assume that $P_a \ne P_b$ or, equivalently,

$$\lambda\{x \in \mathcal{X} : f_a(x) \ne f_b(x)\} > 0. \quad (1)$$

Assumption (1) is natural since there would be no need to model the observation process by an HMM should the emission distributions coincide. Note also that, unlike in the general case, the positivity of the transition probabilities is also a natural assumption for the two-state HMMs. No further assumption on the HMM is made in this paper. In particular, unlike (Caliebe, 2006, Caliebe and Rösler, 2002), we do not assume the square integrability of $\log(f_a/f_b)$, or equality of the supports of $P_a$ and $P_b$.
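When $\lambda$ is a counting measure, assumption (1) simply says that the two probability mass functions differ at some point of the (common) support. A minimal sketch, using hypothetical emission distributions on the DNA alphabet from the example above:

```python
def emissions_distinct(fa, fb, support):
    # Assumption (1) w.r.t. a counting measure: P_a != P_b iff the
    # probability mass functions differ at some point of the support.
    return any(fa[x] != fb[x] for x in support)

# Hypothetical (assumed) emission distributions on {A, C, G, T}.
fa = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}
fb = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
```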
However, the latter condition is not very restrictive, since for the two-state HMMs with unequal supports the existence of infinite Viterbi alignments follows rather trivially (Corollary 2.1).

Thus, for any $n \ge 1$ and any $x_{1:n} \in \mathcal{X}^n$ and $y_{1:n} \in S^n$, the likelihood $\Lambda_\pi(y_{1:n}; x_{1:n})$ is given by

$$P(Y_{1:n} = y_{1:n}) \prod_{i=1}^{n} f_{y_i}(x_i), \quad \text{where} \quad P(Y_{1:n} = y_{1:n}) = \pi_{y_1} \prod_{i=2}^{n} p_{y_{i-1} y_i}.$$

Since estimation of $\psi$ is not a goal of this paper, the dependence on $\psi$ is suppressed. Decomposition (2) and recursion (3) below provide a basis for the Viterbi algorithm to compute alignments. Namely, for all $u \in \{1, 2, \ldots, n-1\}$,

$$\max_{y_{1:n} \in S^n} \Lambda_\pi(y_{1:n}; x_{1:n}) = \max_{l \in S} \delta_u(l) \times \max_{y_{u+1:n} \in S^{n-u}} \Lambda_{(p_{l\cdot})}(y_{u+1:n}; x_{u+1:n}), \quad (2)$$

where $(p_{l\cdot})$ is the transition distribution given state $l \in S$, and the scores

$$\delta_u(l) := \max_{y_{1:u-1} \in S^{u-1}} \Lambda((y_{1:u-1}, l); x_{1:u}), \quad l = a, b,$$

are defined for all $u \ge 1$ and $x_{1:u} \in \mathcal{X}^u$. Thus, $\delta_u(l)$ is the maximum of the likelihood over the paths terminating at $u$ in state $l$. Note that $\delta_1(l) = \pi_l f_l(x_1)$ and $\delta_u(l)$ depends on $x_{1:u}$. Furthermore, for $u \ge 1$,

$$\delta_{u+1}(a) = \max\{\delta_u(a) p_{aa}, \delta_u(b) p_{ba}\} f_a(x_{u+1}), \quad (3)$$
$$\delta_{u+1}(b) = \max\{\delta_u(a) p_{ab}, \delta_u(b) p_{bb}\} f_b(x_{u+1}).$$

Example 2.1 Let $X_1, X_2, \ldots$ be i.i.d. following a mixture distribution $\pi_a P_a + \pi_b P_b$ with density $\pi_a f_a(x; \theta_a) + \pi_b f_b(x; \theta_b)$ and mixing weights $\pi_a, \pi_b > 0$. Such a sequence is an HMM with the transition probabilities $\pi_a = p_{aa} = p_{ba}$, $\pi_b = p_{bb} = p_{ab}$. In this special case the alignment is easy to exhibit. Indeed, in this case recursion (3) writes for any $u \ge 1$ as

$$\delta_{u+1}(a) = c \pi_a f_a(x_{u+1}), \quad \delta_{u+1}(b) = c \pi_b f_b(x_{u+1}), \quad (4)$$

where $c = \max\{\delta_u(a), \delta_u(b)\}$. Hence, the alignment $v(x_{1:n})$ can be obtained pointwise as follows: $v(x_{1:n}) = (v(x_1), \ldots, v(x_n))$, where $v(x) = \arg\max\{\pi_a f_a(x), \pi_b f_b(x)\}$. Equivalently (ignoring possible ties), using a generalized Voronoi partition $\mathcal{X} = \mathcal{X}_a \cup \mathcal{X}_b$ with

$$\mathcal{X}_a = \{x \in \mathcal{X} : \pi_a f_a(x) \ge \pi_b f_b(x)\}, \quad \mathcal{X}_b = \{x \in \mathcal{X} : \pi_b f_b(x) > \pi_a f_a(x)\},$$

$v(x) = a$ if and only if $x \in \mathcal{X}_a$, and otherwise (i.e. $x \in \mathcal{X}_b$) $v(x) = b$.

Generally, it follows from (3) that if

$$\delta_u(a) p_{aa} > \delta_u(b) p_{ba}, \quad \delta_u(a) p_{ab} > \delta_u(b) p_{bb} \quad (5)$$

for some $u \ge 1$ and some $x_{1:u} \in \mathcal{X}^u$, then for any $n > u$ and any extension $x_{u+1:n} \in \mathcal{X}^{n-u}$, the Viterbi alignment goes through state $a$ at time $u$. Namely, the truncation $v(x_{1:n})_{1:u}$ coincides with the Viterbi alignment $v(x_{1:u})$ (indeed, (5) implies $\delta_u(a) > \delta_u(b)$). Thus, under condition (5), maximization of $\Lambda_\pi((y_{1:n}, l); x_{1:n})$ can be reset at time $u$ by clearing $x_{1:u}$ from the memory, retaining $v_{1:u}$, and replacing the initial distribution $\pi$ by $(p_{a\cdot})$ for further maximization of $\Lambda_{(p_{a\cdot})}(y_{u+1:n}; x_{u+1:n})$.

Following (Lember and Koloydenko, 2008), if condition (5) holds, then $x_u$ is called a strong $a$-node (of the realization $x_{1:n}$, $n > u$), where 'strong' refers to the inequalities in (5) being strict. Suppose $x_{1:\infty}$ contains infinitely many strong $a$-nodes at times $u_1 < u_2 < \ldots$. Let $v^1 = v(x_{1:u_1})$, and let $v^k$ maximize $\Lambda_{(p_{a\cdot})}(y_{u_{k-1}+1:u_k}; x_{u_{k-1}+1:u_k})$ for all $k \ge 2$. Then the concatenation $(v^1, v^2, v^3, \ldots)$ is naturally called the infinite piecewise Viterbi alignment (Lember and Koloydenko, 2008).

Apparently, the almost sure existence of our infinite alignments directly depends on the existence of infinitely many (strong) nodes. At the same time, whether or not $x_u$ is a node depends on $x_{1:u}$ and hence is difficult to verify directly. Fortunately, in many cases $x_u$ is guaranteed to be a node based on several preceding observations $x_{u-m:u}$, $1 \le m < u$, ignoring the rest.
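Condition (5) can be checked mechanically from the scores $\delta_u$; a log-domain sketch, with an assumed transition matrix supplied by the caller, is:

```python
import math

def is_strong_a_node(log_delta_a, log_delta_b, P):
    """Check condition (5) at time u: delta_u(a) p_aa > delta_u(b) p_ba and
    delta_u(a) p_ab > delta_u(b) p_bb, given the log-scores at time u."""
    return (log_delta_a + math.log(P[("a", "a")])
            > log_delta_b + math.log(P[("b", "a")])
            and log_delta_a + math.log(P[("a", "b")])
            > log_delta_b + math.log(P[("b", "b")]))
```

Since (5) implies $\delta_u(a) > \delta_u(b)$, a `True` answer is exactly the license to reset the maximization at time $u$ as described above.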
Specifically, suppose for example that $x \in \mathcal{X}$ is such that

$$p_{ia} f_a(x) p_{aj} > p_{ib} f_b(x) p_{bj}, \quad \forall i, j \in S. \quad (6)$$

It is easy to check that for any $u \ge 2$, $x_u = x$ is a strong $a$-node for any $x_{1:u-1}$. Hence, if $x_{1:\infty}$ contains infinitely many observations satisfying (6), then $x_{1:\infty}$ also contains infinitely many strong nodes. This previous condition is in turn met provided

$$\lambda(\{x \in \mathcal{X} : p_{ia} f_a(x) p_{aj} > p_{ib} f_b(x) p_{bj}, \ \forall i, j \in S\}) > 0. \quad (7)$$

Indeed, since our underlying Markov chain $Y$ is ergodic, it is rather easy to see that $X$ is ergodic as well (Ephraim and Merhav, 2002, Genon-Catalot et al., 2000, Leroux, 1992). Also, (7) implies that

$$P_a(\{x \in \mathcal{X} : p_{ia} f_a(x) p_{aj} > p_{ib} f_b(x) p_{bj}, \ \forall i, j \in S\}) > 0.$$

Thus, it follows from the ergodicity of $X$ that almost every realization of $X$ has infinitely many elements satisfying (6) and hence infinitely many strong nodes. We have thus proved the following Lemma.

Lemma 2.1 Assume that (7) holds. Then almost every sequence of observations $x_{1:\infty}$ has infinitely many strong $a$-nodes.

(Clearly, interchanging $a$ and $b$ gives the same results in terms of $b$-nodes.) Lemma 2.1 is essentially Theorem 1 in (Caliebe and Rösler, 2002) (disregarding a misprint in the statement). Condition (7) holds for many two-state HMMs including the so-called additive Gaussian noise model (Caliebe, 2006), where the emission distributions are Gaussian. Another trivial example is the model with unequal supports of $P_a$ and $P_b$. Indeed, in that case (7) holds (at least up to swapping $a$ and $b$). Hence, the following Corollary.

Corollary 2.1 If the supports of $P_a$ and $P_b$ are not equal, then almost every sequence of observations has infinitely many strong nodes.

The goal of this work is essentially to remove condition (7) from Lemma 2.1. To this end, following (Lember and Koloydenko, 2008), we call an observation satisfying (6) an $a$-barrier of length 1.
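A length-1 $a$-barrier in the sense of (6) is a pointwise test on the emission densities at $x$; the sketch below checks all four $(i, j)$ combinations, and the numeric values in the accompanying check are assumed for illustration.

```python
def is_a_barrier_length_1(fa_x, fb_x, P):
    # Condition (6): p_ia * f_a(x) * p_aj > p_ib * f_b(x) * p_bj for all
    # i, j in S, so x is a strong a-node regardless of what precedes it.
    S = ("a", "b")
    return all(P[(i, "a")] * fa_x * P[("a", j)] > P[(i, "b")] * fb_x * P[("b", j)]
               for i in S for j in S)
```

When $f_a(x)$ dominates $f_b(x)$ strongly enough to beat the worst-case transition ratios, the observation is a barrier no matter what surrounds it.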
More generally, a block of observations $z_{1:k} \in \mathcal{X}^k$ is called a (strong) barrier of length $k \ge 1$ if for every $m \ge 0$ and $x_{1:m} \in \mathcal{X}^m$, $z_{1:k}$ contains a (strong) node of the realization $(x_{1:m}, z_{1:k})$. In (Lember and Koloydenko, 2008), we prove the existence of infinitely many barriers for a very general class of HMMs. For the two-state HMMs, the conditions of our result in (Lember and Koloydenko, 2008) are given by (8) and (9) below:

$$P_a(\{x \in \mathcal{X} : f_a(x) \max\{p_{aa}, p_{ba}\} > f_b(x) \max\{p_{bb}, p_{ab}\}\}) > 0 \quad \text{and} \quad (8)$$
$$P_b(\{x \in \mathcal{X} : f_b(x) \max\{p_{bb}, p_{ab}\} > f_a(x) \max\{p_{aa}, p_{ba}\}\}) > 0. \quad (9)$$

To achieve our goal, we will first prove the same result for the two-state HMM under the relaxed assumption that (8) or (9) holds. As we shall see below (Lemma 3.1), in our two-state HMM one of these conditions is automatically satisfied and, moreover, all barriers are strong. Hence, the occurrence of infinitely many strong barriers in this case will be shown (Theorem 4.1) to require no additional assumptions.

Finally, if a node is not strong and $v(x_{1:n})$ is not unique, an alignment might exist that does not go through this node. Such pathologies cause technical inconveniences in defining an infinite Viterbi alignment and are treated in (Lember and Koloydenko, 2008). Fortunately, unlike in the general case, in the case of two-state HMMs almost every realization has infinitely many strong nodes (Theorem 4.1). This allows for a simple resolution of the non-uniqueness in the case of two-state HMMs.

3 Main results

3.1 Three types of the two-state HMM

The following three cases exhaust all the possibilities:
1. $p_{aa} > p_{ba}$ ($\Leftrightarrow p_{bb} > p_{ab}$);
2. $p_{aa} < p_{ba}$ ($\Leftrightarrow p_{bb} < p_{ab}$);
3. $p_{aa} = p_{ba}$ ($\Leftrightarrow p_{bb} = p_{ab}$).
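The trichotomy above depends only on the transition matrix, and the equivalences in parentheses follow because each row of the matrix sums to one. A minimal sketch:

```python
def classify(P):
    """Return 1, 2, or 3 per the case distinction above: p_aa > p_ba,
    p_aa < p_ba, or p_aa = p_ba (the mixture case of Example 2.1)."""
    if P[("a", "a")] > P[("b", "a")]:
        return 1
    if P[("a", "a")] < P[("b", "a")]:
        return 2
    return 3
```

For instance, since $p_{ab} = 1 - p_{aa}$ and $p_{bb} = 1 - p_{ba}$, $p_{aa} > p_{ba}$ indeed forces $p_{bb} > p_{ab}$, so checking one inequality suffices.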
From the definition of nodes, it follows that $x_u$ is not a node only in one of the following two cases:

$$(A)\quad \delta_u(a) p_{aa} > \delta_u(b) p_{ba} \ \text{ and } \ \delta_u(b) p_{bb} > \delta_u(a) p_{ab}, \qquad \text{or}$$
$$(B)\quad \delta_u(b) p_{ba} > \delta_u(a) p_{aa} \ \text{ and } \ \delta_u(a) p_{ab} > \delta_u(b) p_{bb}.$$

Case (A) is equivalent to

$$\frac{p_{bb}}{p_{ab}} > \frac{\delta_u(a)}{\delta_u(b)} > \frac{p_{ba}}{p_{aa}} \quad (10)$$

and case (B) is equivalent to

$$\frac{p_{bb}}{p_{ab}} < \frac{\delta_u(a)}{\delta_u(b)} < \frac{p_{ba}}{p_{aa}}. \quad (11)$$

Thus, in case (A), we have $\delta_{u+1}(a) = \delta_u(a) p_{aa} f_a(x_{u+1})$ and $\delta_{u+1}(b) = \delta_u(b) p_{bb} f_b(x_{u+1})$, so that for any $n > u$, the Viterbi alignment $v(x_{1:n})$ must satisfy $v(x_{1:n})_u = v(x_{1:n})_{u+1}$. Similarly, in case (B), $\delta_{u+1}(a) = \delta_u(b) p_{ba} f_a(x_{u+1})$ and $\delta_{u+1}(b) = \delta_u(a) p_{ab} f_b(x_{u+1})$, i.e. $v(x_{1:n})_u \ne v(x_{1:n})_{u+1}$. Evidently, case 1 and case (B) are mutually exclusive, and so are case 2 and case (A). Therefore, if the transition matrix satisfies the conditions of case 1, then $x_u$ is not a node if and only if conditions (A) are fulfilled. This implies that in case 1, nodes are the only possibility for $v(x_{1:n})$ to change state. On the other hand, if the transition matrix satisfies the conditions of case 2, then $x_u$ is not a node if and only if (B) holds. Hence, in case 2 nodes are the only possibility for $v(x_{1:n})$ to remain in one state. Case 3 corresponds to the mixture model (see Example 2.1 above). Apparently (4), every observation is a node in this case (see also Figure 1 below).

Let us now examine conditions (8) and (9). From equation (1), it follows that

$$\lambda(\{x \in \mathcal{X} : f_a(x) > f_b(x)\}) > 0, \quad \lambda(\{x \in \mathcal{X} : f_a(x) < f_b(x)\}) > 0 \quad (12)$$

and, for any $\alpha > \beta > 0$,

$$\lambda(\{x \in \mathcal{X} : \alpha f_a(x) > \beta f_b(x)\}) > 0 \Leftrightarrow P_a(\{x \in \mathcal{X} : \alpha f_a(x) > \beta f_b(x)\}) > 0, \quad (13)$$
$$\lambda(\{x \in \mathcal{X} : \alpha f_b(x) > \beta f_a(x)\}) > 0 \Leftrightarrow P_b(\{x \in \mathcal{X} : \alpha f_b(x) > \beta f_a(x)\}) > 0. \quad (14)$$

Therefore, we have the following Lemma.
Lemma 3.1 Any two-state HMM satisfies at least one of the conditions (8) and (9).

Proof. In case 1, (8) and (9) are equivalent to

$$P_a(\{x \in \mathcal{X} : f_a(x) p_{aa} > f_b(x) p_{bb}\}) = P_a\Bigl(\Bigl\{x \in \mathcal{X} : \frac{f_b(x) p_{bb}}{f_a(x) p_{aa}} < 1\Bigr\}\Bigr) > 0, \quad (15)$$
$$P_b(\{x \in \mathcal{X} : f_b(x) p_{bb} > f_a(x) p_{aa}\}) = P_b\Bigl(\Bigl\{x \in \mathcal{X} : \frac{f_a(x) p_{aa}}{f_b(x) p_{bb}} < 1\Bigr\}\Bigr) > 0, \quad (16)$$

respectively. If $p_{aa} = p_{bb}$, then (12) implies that both (15) and (16) are satisfied, and hence both (8) and (9) hold. If $p_{aa} > p_{bb}$, then (15), and subsequently (8), follow from (13). If $p_{aa} < p_{bb}$, then (16), and subsequently (9), follow from (14). Hence, at least one of the assumptions (8), (9) is always guaranteed to hold.

In case 2, (8) and (9) are equivalent to

$$P_a(\{x \in \mathcal{X} : f_a(x) p_{ba} > f_b(x) p_{ab}\}) = P_a\Bigl(\Bigl\{x \in \mathcal{X} : \frac{f_b(x) p_{ab}}{f_a(x) p_{ba}} < 1\Bigr\}\Bigr) > 0, \quad (17)$$
$$P_b(\{x \in \mathcal{X} : f_b(x) p_{ab} > f_a(x) p_{ba}\}) = P_b\Bigl(\Bigl\{x \in \mathcal{X} : \frac{f_a(x) p_{ba}}{f_b(x) p_{ab}} < 1\Bigr\}\Bigr) > 0, \quad (18)$$

respectively. Again, if $p_{aa} = p_{bb}$, then (17) and (18) both hold without further assumptions. If $p_{aa} > p_{bb}$, then (17) is automatically satisfied. Likewise, (18) holds if $p_{aa} < p_{bb}$. Hence, one of the assumptions (8), (9) is always guaranteed to hold.

In case 3, (8) and (9) read

$$P_a(\{x \in \mathcal{X} : f_a(x) \pi_a > f_b(x) \pi_b\}) > 0, \quad (19)$$
$$P_b(\{x \in \mathcal{X} : f_b(x) \pi_b > f_a(x) \pi_a\}) > 0. \quad (20)$$

Assume $\pi_a \ge \pi_b$. Then (12) implies $\lambda(\{x \in \mathcal{X} : \pi_a f_a(x) > \pi_b f_b(x)\}) > 0$, which in turn implies (19). (The case $\pi_a \le \pi_b$ yields (20) symmetrically.)

Finally, we state and prove the main results for each of the three cases.

Figure 1: Distinct patterns of the Viterbi alignment in the two-state HMM. Top: Case 1, the state can possibly change only at nodes (larger circles). Middle: Case 2, states always alternate, except possibly at nodes. Bottom: Case 3, every observation is a node.
3.2 Case 1

First, note that condition (7) in this case is equivalent to

$$\lambda(\{x \in \mathcal{X} : p_{ba} f_a(x) p_{ab} > p_{bb} f_b(x) p_{bb}\}) > 0. \quad (21)$$

As mentioned in Section 2, condition (7) need not hold in general. Nonetheless, for the two-state HMM, we have the following Lemma.

Lemma 3.2 In case 1, almost every realization of the two-state HMM has infinitely many strong barriers.

Proof. Without loss of generality, assume $p_{aa} \ge p_{bb}$. Then (15) holds, implying that there exists $\epsilon > 0$ such that $P_a(\mathcal{X}_a) > 0$, where

$$\mathcal{X}_a := \Bigl\{x \in \mathcal{X} : \frac{f_b(x) p_{bb}}{f_a(x) p_{aa}} < 1 - \epsilon\Bigr\}.$$

Let the integer $k$ be sufficiently large for $(1-\epsilon)^k < p_{ab} p_{ba}/(p_{aa} p_{bb})$ to hold. Then every sequence $z_{1:k} \in \mathcal{X}_a^k$ satisfies

$$\prod_{j=1}^{k} \frac{f_b(z_j) p_{bb}}{f_a(z_j) p_{aa}} < (1-\epsilon)^k < \frac{p_{ab} p_{ba}}{p_{aa} p_{bb}}. \quad (22)$$

Let $u > k$ be arbitrary and let $z_{0:k} \in \mathcal{X}_a^{k+1}$ be the last $k+1$ observations in a generic sequence $x_{1:u} \in \mathcal{X}^{u-k-1} \times \mathcal{X}_a^{k+1}$. To shorten the notation, we write $d_j(z_i)$ for $\delta_{u-k+i}(j)$ for every $i = 0, 1, \ldots, k$ and $j = a, b$. Next, we show that $x_{u-k:u}$ contains at least one strong node, and consequently, $z_{0:k}$ is a strong barrier. Indeed, if none of the observations $x_{u-k:u}$ were a strong $a$-node, then we would have

$$d_b(z_k) = d_b(z_0) \prod_{j=1}^{k} f_b(z_j) p_{bb}.$$

Similarly, if none of the observations $x_{u-k+1:u}$ were a strong $b$-node, we would have

$$\delta_u(a) \ge \delta_{u-k}(b) \, p_{ba} \Bigl(\prod_{j=1}^{k} f_a(z_j)\Bigr) p_{aa}^{k-1}.$$

Hence,

$$\frac{\delta_u(b)}{\delta_u(a)} \le \frac{\delta_{u-k}(b) \, p_{bb} \bigl(\prod_{j=1}^{k} f_b(z_j)\bigr) p_{bb}^{k-1}}{\delta_{u-k}(b) \, p_{ba} \bigl(\prod_{j=1}^{k} f_a(z_j)\bigr) p_{aa}^{k-1}} = \frac{\prod_{j=1}^{k} (f_b(z_j) p_{bb})}{\prod_{j=1}^{k} (f_a(z_j) p_{aa})} \cdot \frac{p_{aa}}{p_{ba}}$$

and by (22)

$$\frac{\delta_u(b)}{\delta_u(a)} < \frac{p_{ab}}{p_{bb}},$$

which contradicts (10). Thus, at least one of $x_{u-k:u}$ must be a strong node.
Since $P_a(\mathcal{X}_a) > 0$, by the ergodicity of the HMM, almost every realization has infinitely many barriers $z_{0:k} \in \mathcal{X}_a^{k+1}$, implying also that almost every realization has infinitely many strong nodes.

The next Theorem refines the previous result.

Theorem 3.1 Suppose the (transition matrix of the) two-state HMM meets the condition of case 1. If $p_{aa} \ge p_{bb}$, then almost every realization has infinitely many strong $a$-barriers. (If $p_{aa} \le p_{bb}$, then almost every realization has infinitely many strong $b$-barriers.)

Proof. Let $p_{aa} \ge p_{bb}$ and use the notation of the proof of Lemma 3.2. First, we show that none of the observations $x_{u-k+1:u}$ is a $b$-node. Indeed, since

$$d_b(z_1) = \max\{d_a(z_0) p_{ab}, d_b(z_0) p_{bb}\} f_b(z_1),$$

at least one of the following two inequalities must hold in order for $x_{u-k+1}$ to be a $b$-node:

$$p_{ab} f_b(z_1) p_{ba} \ge p_{aa} f_a(z_1) p_{aa}, \qquad p_{bb} f_b(z_1) p_{ba} \ge p_{ba} f_a(z_1) p_{aa}. \quad (23)$$

However, (15) implies that $p_{ba} f_a(z_1) p_{aa} > p_{bb} f_b(z_1) p_{ba}$ and, since $p_{bb} > p_{ab}$, we have $p_{bb} f_b(z_1) p_{ba} > p_{ab} f_b(z_1) p_{ba}$. Hence, neither of the two inequalities in (23) holds. Thus, $x_{u-k+1}$ cannot be a $b$-node, and the same argument shows that none of the subsequent observations $x_{u-k+2}, \ldots, x_u$ can be a $b$-node either. The argument of the proof of Lemma 3.2 then shows that one of the observations in $x_{u-k:u}$ is a strong $a$-node, and therefore $z_{0:k}$ is a strong $a$-barrier. The ergodic argument finishes the proof. (The same argument with $a$ and $b$ swapped establishes the second part of the Theorem.)

Note that the condition $p_{bb} \ge p_{aa}$ is sufficient but not necessary for (16) to hold. In fact, for many two-state HMMs, such as the one with additive white Gaussian noise, both (15) and (16) hold for any (positive) values of $p_{aa}$ and $p_{bb}$. On the other hand, it might happen that one of the conditions (15) and (16), say (16), fails.
This would mean

$$P_b(\{x \in \mathcal{X} : p_{bb} f_b(x) > p_{aa} f_a(x)\}) = 0$$

or, equivalently,

$$\lambda(\{x \in \mathcal{X} : p_{bb} f_b(x) > p_{aa} f_a(x)\}) = 0. \quad (24)$$

Corollary 3.1 In case 1, equation (24) implies that almost every sequence of observations has infinitely many strong $a$-barriers and no strong $b$-nodes. Furthermore, equation (24) in case 1 implies that for almost every realization, if a $b$-node does occur, it occurs before the first $a$-node.

Proof. From the proof of Theorem 3.1, it follows that no observation $x \in \mathcal{X}$ such that $p_{bb} f_b(x) \le p_{aa} f_a(x)$ (i.e. from the complement of the set in (24)) can be a strong $b$-node; a closer inspection of the proof actually shows that even a weak (i.e. not strong) $b$-node cannot occur after an $a$-node (since in case 1 $p_{bb} > p_{ba}$). Theorem 3.1 then implies that almost every sequence of observations has infinitely many strong $a$-barriers.

Corollary 3.1 in turn implies that starting with the first strong $a$-node onward, the Viterbi alignment $v(x_{1:n})$ stays in state $a$. As we have already mentioned, Viterbi alignments need not be unique (see (Lember and Koloydenko, 2008)), i.e. ties are possible in general, and in this case, in particular, they are possible up until the first strong $a$-node. However, the impossibility of strong $b$-nodes in this case implies that the ties can be broken in favor of $a$, resulting in the constant, all-$a$ alignment.

Theorem 3.1 is a generalization of Theorem 7 in (Caliebe, 2006), which basically states that in case 1, if (15) and (16) hold, then under some additional assumptions (equal supports of $P_a$ and $P_b$ and further conditions A2), almost every realization has infinitely many nodes. Thus, (Caliebe, 2006) stops short of realizing that in case 1 conditions (15) and (16) alone ensure the existence of $a$- and $b$-nodes.
This results in (Caliebe, 2006) invoking Theorem 2 of (Caliebe and Rösler, 2002) to prove the existence of nodes, hence the superfluous assumptions A1, A2. Also, the proof of Theorem 7 in (Caliebe and Rösler, 2002) could be simplified and shortened with the help of the notions of nodes and barriers. Finally, Corollary 3.1 generalizes Theorems 8 and 9 of (Caliebe, 2006).

3.3 Case 2

Recall that we have been proving the existence of barriers without condition (7). Note that in case 2, condition (7) becomes

$$\lambda(\{x \in \mathcal{X} : p_{aa} f_a(x) p_{aa} > p_{ab} f_b(x) p_{ba}\}) > 0.$$

Recall (Section 2) also that interchanging $a$ with $b$ gives a similar condition for strong $b$-nodes to occur infinitely often in almost every realization. It follows from (12) that for some $\epsilon > 0$, the sets

$$\mathcal{X}_a := \{x \in \mathcal{X} : f_a(x)(1-\epsilon) > f_b(x)\}, \quad \mathcal{X}_b := \{x \in \mathcal{X} : f_a(x) < f_b(x)(1-\epsilon)\}$$

both have positive $\lambda$-measure. Hence $P_a(\mathcal{X}_a) > 0$ and $P_b(\mathcal{X}_b) > 0$. Then, for $x_{1:2} \in \mathcal{X}_a \times \mathcal{X}_b$, the following holds:

$$\frac{f_b(x_1) f_a(x_2)}{f_a(x_1) f_b(x_2)} < (1-\epsilon)^2. \quad (25)$$

Lemma 3.3 In case 2, almost every realization has infinitely many strong barriers.

Proof. Let $\mathcal{X}_a$ and $\mathcal{X}_b$ be as above. Choose $k$ sufficiently large for

$$(1-\epsilon)^{2k} < \frac{p_{aa} p_{bb}}{p_{ba} p_{ab}}$$

to hold. Next, consider a sequence $z_{0:2k} \in \mathcal{X}^{2k+1}$, where $z_0, z_{2i} \in \mathcal{X}_a$ and $z_{2i-1} \in \mathcal{X}_b$ for every $i = 1, \ldots, k$. We show that for every $u > 2k$, every sequence of observations $x_{1:u} \in \mathcal{X}^u$ such that $x_{u-2k:u} = z_{0:2k}$ contains a strong node, making $z_{0:2k}$ a strong barrier. The choice of $k$ and $z_{0:2k}$ implies

$$\frac{\prod_{i=1}^{k} p_{ba} f_a(z_{2i-1}) \, p_{ab} f_b(z_{2i})}{\prod_{i=1}^{k} p_{ab} f_b(z_{2i-1}) \, p_{ba} f_a(z_{2i})} < (1-\epsilon)^{2k} < \frac{p_{bb} p_{aa}}{p_{ba} p_{ab}}. \quad (26)$$

If there is no strong node among $x_{u-2k:u}$, then

$$d_b(z_{2k}) = d_b(z_0) \prod_{i=1}^{k} p_{ba} f_a(z_{2i-1}) \, p_{ab} f_b(z_{2i})$$

and

$$d_a(z_{2k}) \ge d_b(z_0) \frac{p_{bb}}{p_{ab}} \prod_{i=1}^{k} p_{ab} f_b(z_{2i-1}) \, p_{ba} f_a(z_{2i}).$$
Hence, by (26),

$$\frac{d_b(z_{2k})}{d_a(z_{2k})} \le \frac{\prod_{i=1}^{k} p_{ba} f_a(z_{2i-1}) \, p_{ab} f_b(z_{2i})}{\frac{p_{bb}}{p_{ab}} \prod_{i=1}^{k} p_{ab} f_b(z_{2i-1}) \, p_{ba} f_a(z_{2i})} < \frac{p_{aa}}{p_{ba}},$$

which contradicts (11).

Next, we refine this result. Without loss of generality, assume $p_{ba} \ge p_{ab}$. Therefore,

$$p_{ab} p_{aa} \ge p_{ba} p_{bb}, \quad (27)$$

and also, for every $x \in \mathcal{X}_a$,

$$p_{ba} f_a(x) > p_{ab} f_b(x). \quad (28)$$

Hence, (17) holds. We multiply the right side of (28) by $p_{ba} p_{bb}$ and the left side by $p_{ab} p_{aa}$, and use (27) to obtain

$$f_a(x) p_{aa} > f_b(x) p_{bb}. \quad (29)$$

Finally, for $x \in \mathcal{X}_b$, we have

$$f_a(x) < f_b(x). \quad (30)$$

We will need the following Lemma.

Lemma 3.4 Assume (in addition to being in case 2) that $p_{ab} \le p_{ba}$.
a) In any pair of observations $z_{1:2} \in \mathcal{X}_a \times \mathcal{X}_b$, $z_1$ is not a $b$-node.
b) In any pair of observations $z_{2:3} \in \mathcal{X}_b \times \mathcal{X}_a$, if $z_2$ is a $b$-node, then $z_3$ is a strong $a$-node.

Proof. Assume that $p_{ab} \le p_{ba}$, and consider a). First note that since we are in case 2, $z_1$ is a $b$-node if and only if

$$d_b(z_1) p_{bb} \ge d_a(z_1) p_{ab}. \quad (31)$$

Suppose first that $z_0$ is not a node, in which case $d_b(z_1) = d_a(z_0) p_{ab} f_b(z_1)$ and $d_a(z_1) = d_b(z_0) p_{ba} f_a(z_1)$. Then

$$d_a(z_1) p_{ab} = d_b(z_0) p_{ba} f_a(z_1) p_{ab} \ge d_a(z_0) p_{aa} f_a(z_1) p_{ab} > d_a(z_0) p_{bb} f_b(z_1) p_{ab} = d_a(z_0) p_{ab} f_b(z_1) p_{bb} = d_b(z_1) p_{bb}.$$

The first inequality above follows from the recursion property (3) of the scores $\delta$, whereas the second one follows from (29). Thus, when $z_0$ is not a node, $z_1$ cannot be a $b$-node. Similarly, supposing that $z_0$ is an $a$-node, we obtain that $z_1$ is not a $b$-node. Suppose finally that $z_0$ is a $b$-node. Then $d_b(z_1) = d_b(z_0) p_{bb} f_b(z_1)$ and $d_a(z_1) = d_b(z_0) p_{ba} f_a(z_1)$. Applying consecutively $p_{bb} < p_{ab}$, (28), and $p_{bb} < p_{ab}$ again, we obtain:

$$p_{bb} f_b(z_1) p_{bb} < p_{ab} f_b(z_1) p_{bb} \le p_{ba} f_a(z_1) p_{bb} < p_{ba} f_a(z_1) p_{ab}.$$
Thus, contrary to (31),

$$d_b(z_1) p_{bb} = d_b(z_0) p_{bb} f_b(z_1) p_{bb} < d_b(z_0) p_{ba} f_a(z_1) p_{ab} = d_a(z_1) p_{ab},$$

that is, $z_1$ is not a $b$-node in this case either.

Let us now prove b). If $z_2$ is a $b$-node, then $d_a(z_3) = d_b(z_2) p_{ba} f_a(z_3)$ and $d_b(z_3) = d_b(z_2) p_{bb} f_b(z_3)$. By (29), we now have

$$d_a(z_3) p_{aa} = d_b(z_2) p_{ba} f_a(z_3) p_{aa} > d_b(z_2) p_{bb} f_b(z_3) p_{ba} = d_b(z_3) p_{ba}.$$

Similarly to the argument regarding $b$-nodes guaranteed by (31) above, we now have $d_a(z_3) > d_b(z_3)$, implying $d_a(z_3) p_{ab} > d_b(z_3) p_{bb}$. Thus $z_3$ is a strong $a$-node.

Theorem 3.2 If $p_{ba} \ge p_{ab}$, then almost every realization has infinitely many strong $a$-nodes. If $p_{ba} \le p_{ab}$, then almost every realization has infinitely many strong $b$-nodes.

Proof. Assume again that $p_{ba} \ge p_{ab}$. Let $z_{0:2k}$ be as in the proof of Lemma 3.3 and attach one more element $z_{2k+1} \in \mathcal{X}_b$ to the end. Thus, $z_{2i} \in \mathcal{X}_a$ and $z_{2i+1} \in \mathcal{X}_b$, $i = 0, 1, \ldots, k$. From (the proof of) Lemma 3.3 we know that $z_{0:2k}$ contains at least one strong node. If this is an $a$-node, then the theorem is proven. Otherwise this is a $b$-node, which, according to part a) of Lemma 3.4, can only be among $z_1, z_3, \ldots, z_{2k-1}$. Applying part b) of Lemma 3.4 shows that there must also be a strong $a$-node among $z_2, z_4, \ldots, z_{2k}$. Invoking ergodicity again finishes the proof. Clearly, swapping $a$ and $b$ in the above discussion following the proof of Lemma 3.3 establishes the other part of the theorem.

Inequality (27) guarantees (17). Often, the model is such that in addition to (17), (18) also holds. However, to apply the previous proof (i.e. of Theorem 3.2) to guarantee the simultaneous existence of infinitely many strong $a$- and $b$-nodes, we would need the following counterpart of (29):

$$P_b(\{x \in \mathcal{X} : f_b(x) p_{ab} > f_a(x) p_{ba}, \ f_b(x) p_{bb} > f_a(x) p_{aa}\}) > 0,$$

which is stronger than (18). However, this previous condition is indeed often met, resulting in infinitely many strong $a$- and $b$-nodes (in almost every realization $x_{1:\infty}$).

Lemma 3.3 appears without proof as Theorem 10 in (Caliebe, 2006). The author of (Caliebe, 2006) actually suggests that Theorem 10 and other results for case 2 are analogous to the corresponding results for case 1, mainly Theorem 7 (of the same work). It is further stated in (Caliebe, 2006) that the proofs of those results are not given as they are very similar to the corresponding proofs in case 1. Our present workings actually show that case 2 is quite dissimilar to case 1 (due to the fluctuating nature of the typical Viterbi alignment) and in particular requires a more careful treatment. Note that, even if Theorem 10 in (Caliebe, 2006) assumed (8) and (9) (as Theorem 7 in (Caliebe, 2006) does) to help one prove this Theorem by analogy to Theorem 7, it is still not clear how the two proofs could be very similar.

3.4 Case 3 (the mixture model)

Recall that every observation in this case is a (not necessarily strong) node. Furthermore, every observation from $\{x \in \mathcal{X} : \pi_a f_a(x) > \pi_b f_b(x)\}$ is a strong $a$-node. Thus, we have the following counterpart of Theorems 3.1 and 3.2.

Theorem 3.3 If $\pi_a \ge \pi_b$, then almost every realization has infinitely many strong $a$-nodes. If $\pi_a \le \pi_b$, then almost every realization has infinitely many strong $b$-nodes.

4 Conclusion

In summary, we have proved Theorem 4.1, stated below and providing a basis for the piecewise construction and asymptotic analysis of the Viterbi alignments of two-state HMMs.
Theorem 4.1 Almost every realization of the two-state HMM has infinitely many strong barriers. Furthermore,
a) if the transition probabilities satisfy $p_{aa} \ge p_{ba}$, then (almost every realization of) the chain has infinitely many strong $s$-barriers, where $s$ is such that $p_{ss} = \max\{p_{aa}, p_{bb}\}$;
b) otherwise (i.e. if $p_{aa} < p_{ba}$), (almost every realization of) the chain has infinitely many strong $s$-barriers, where $s$ is such that $p_{ts} = \max\{p_{ab}, p_{ba}\}$ (for some $t \in S$).

References

Baum, L. E., Petrie, T., 1966. Statistical inference for probabilistic functions of finite state Markov chains. Ann. Math. Statist. 37, 1554–1563.

Bilmes, J., 1998. A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Tech. Rep. 97-021, International Computer Science Institute, Berkeley, CA, USA.

Caliebe, A., 2006. Properties of the maximum a posteriori path estimator in hidden Markov models. IEEE Trans. Inform. Theory 52 (1), 41–51.

Caliebe, A., Rösler, U., 2002. Convergence of the maximum a posteriori path estimator in hidden Markov models. IEEE Trans. Inform. Theory 48 (7), 1750–1758.

Cappé, O., Moulines, E., Rydén, T., 2005. Inference in hidden Markov models. Springer Series in Statistics. Springer, New York. With contributions by Randal Douc (Chapter 9), Christian P. Robert (Chapters 6, 7 and 13), Gersende Fort, Philippe Soulier and Moulines (Chapter 14), and Stéphane Boucheron and Elisabeth Gassiat (Chapter 15).

Durbin, R., Eddy, S., Krogh, A., Mitchison, G., 1998. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press.

Eddy, S., 2004. What is a hidden Markov model? Nature Biotechnology 22 (10), 1315–1316.

Ephraim, Y., Merhav, N., 2002. Hidden Markov processes. IEEE Trans. Inform. Theory 48 (6), 1518–1569. Special issue on Shannon theory: perspective, trends, and applications.
Genon-Catalot, V., Jeantheau, T., Larédo, C., 2000. Stochastic volatility models as hidden Markov models and statistical applications. Bernoulli 6 (6), 1051–1079.

Huang, X., Ariki, Y., Jack, M., 1990. Hidden Markov models for speech recognition. Edinburgh University Press, Edinburgh, UK.

Jelinek, F., 1976. Continuous speech recognition by statistical methods. Proc. IEEE 64, 532–556.

Jelinek, F., 2001. Statistical methods for speech recognition. The MIT Press, Cambridge, MA, USA.

Ji, G., Bilmes, J., 2006. Backoff model training using partially observed data: Application to dialog act tagging. In: Proc. Human Language Techn. Conf. NAACL, Main Conference. Association for Computational Linguistics, New York City, USA, pp. 280–287. URL http://www.aclweb.org/anthology/N/N06/N06-1036

Juang, B.-H., Rabiner, L., 1990. The segmental K-means algorithm for estimating parameters of hidden Markov models. IEEE Trans. Acoust. Speech Signal Process. 38 (9), 1639–1641.

Koloydenko, A., Käärik, M., Lember, J., 2007. On adjusted Viterbi training. Acta Appl. Math. 96 (1–3), 309–326.

Krogh, A., 1998. An introduction to hidden Markov models for biological sequences. In: Computational Methods in Molecular Biology. Elsevier Science.

Lember, J., Koloydenko, A., 2007. Adjusted Viterbi training: A proof of concept. Probab. Eng. Inf. Sci. 21 (3), 451–475.

Lember, J., Koloydenko, A., 2008. The Adjusted Viterbi training for hidden Markov models. Bernoulli 14 (1), 180–206.

Leroux, B. G., 1992. Maximum-likelihood estimation for hidden Markov models. Stochastic Process. Appl. 40 (1), 127–143.

Lomsadze, A., Ter-Hovhannisyan, V., Chernoff, V., Borodovsky, M., 2005. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Res. 33 (20), 6494–6506.

McDermott, E., Hazen, T., 2004. Minimum classification error training of landmark models for real-time continuous speech recognition. In: Proc. ICASSP.
Ney, H., Steinbiss, V., Haeb-Umbach, R., et al., 1994. An overview of the Philips research system for large vocabulary continuous speech recognition. Int. J. Pattern Recognit. Artif. Intell. 8 (1), 33–70.

Och, F., Ney, H., 2000. Improved statistical alignment models. In: Proc. 38th Ann. Meet. Assoc. Comput. Linguist. Association for Computational Linguistics, pp. 440–447.

Padmanabhan, M., Picheny, M., 2002. Large-vocabulary speech recognition algorithms. Computer 35 (4), 42–50.

Rabiner, L., Juang, B., 1993. Fundamentals of speech recognition. Prentice-Hall, Inc., Upper Saddle River, NJ, USA.

Rabiner, L., Wilpon, J., Juang, B., 1986. A segmental K-means training procedure for connected word recognition. AT&T Tech. J. 64 (3), 21–40.

Shu, I., Hetherington, L., Glass, J., 2003. Baum-Welch training for segment-based speech recognition. In: Proc. IEEE ASRU Workshop. St. Thomas, U.S. Virgin Islands, pp. 43–48. http://groups.csail.mit.edu/sls/publications/2003/ASRU_Shu.pdf

Steinbiss, V., Ney, H., Aubert, X., et al., 1995. The Philips research system for continuous-speech recognition. Philips J. Res. 49, 317–352.

Ström, N., Hetherington, L., Hazen, T., Sandness, E., Glass, J., 1999. Acoustic modeling improvements in a segment-based speech recognizer. In: Proc. IEEE ASRU Workshop. pp. 139–142.