Infinite Viterbi alignments in the two state hidden Markov models


Authors: J. Lember, A. Koloydenko

Innite Viterbi alignmen ts in the t w o state hidden Mark o v mo dels Jüri Lem b er 1 a and Alexey K olo ydenk o 2 1 Univ ersit y of T artu, Estonia, email:jyrilut.ee 2 Ro y al Hollo w a y , Univ ersit y of London, UK, email:alexey .k olo ydenk orh ul.a.uk Abstrat Sine the early da ys of digital omm uniation, Hidden Mark o v Mo dels (HMMs) ha v e no w b een routinely used in sp ee h reognition, pro essing of natural languages, images, and in bioinformatis. An HMM ( X i , Y i ) i ≥ 1 assumes ob- serv ations X 1 , X 2 , . . . to b e onditionally indep enden t giv en an explanotary Mark o v pro ess Y 1 , Y 2 , . . . , whi h itself is not observ ed; moreo v er, the on- ditional distribution of X i dep ends solely on Y i . Cen tral to the theory and appliations of HMM is the Viterbi algorithm to nd a maximum a p oste- riori estimate q 1: n = ( q 1 , q 2 , . . . , q n ) of Y 1: n giv en the observ ed data x 1: n . Maxim um a p osteriori paths are also alled Viterbi paths or alignmen ts. Re- en tly , attempts ha v e b een made to study the b eha vior of Viterbi alignmen ts of HMMs with t w o hidden states when n tends to innit y . It has indeed b een sho wn that in some sp eial ases a w ell-dened limiting Viterbi align- men t exists. While inno v ativ e, these attempts ha v e relied on rather strong assumptions. This w ork pro v es the existene of innite Viterbi alignmen ts for virtually an y HMM with t w o hidden states. Keyw ords Hidden Mark o v mo dels, maxim um a p osterior path, Viterbi alignmen t, Viterbi extration, Viterbi training 1 In tro dution W e onsider hidden Mark o v mo dels (HMM) ( Y , X ) with t w o hidden states. Namely , Y represen ts the hidden pro ess Y 1 , Y 2 , . . . , whi h is an irreduible ap erio di Mark o v  hain with state spae S = { a , b } . In partiular, the transi- tion probabilities P = ( p lm ) , l , m ∈ S , are p ositiv e and the stationary distri- bution π = π P is unique. 
F or te hnial on v eniene, Y 1 is assumed to follo w π , ho w ev er, the results of the pap er hold for arbitrary initial distributions. T o ev ery state l ∈ S there orresp onds an emission distribution P l on X = R d . a The author has b een supp orted b y the Estonian Siene F oundation Gran t 7553 1 Giv en a realization y 1: ∞ ∈ S ∞ of Y , the observ ations X 1: ∞ := X 1 , X 2 , . . . are generated as follo ws. If Y i = a (resp. Y i = b ), then X i is distributed aording to P a (resp. P b ) and indep enden tly of ev erything else. W e refer to this mo del as the (general) 2-state HMM. In (Cappé et al., 2005), HMMs are alled `one of the most suessful sta- tistial mo delling ideas that ha v e [emerged℄ in the last fort y y ears'. Sine their lassial appliation to digital omm uniation in 1960s (see further referenes in (Cappé et al., 2005)), HMMs ha v e had a dening impat on the mainstream resear h in sp ee h reognition (Huang et al., 1990, Jelinek, 1976, 2001, MDermott and Hazen, 2004, Ney et al., 1994, P admanabhan and Pi hen y, 2002, Rabiner and Juang, 1993, Rabiner et al., 1986, Sh u et al., 2003, Stein biss et al., 1995, Ström et al., 1999), natural language mo dels (Ji and Bilmes, 2006, O h and Ney, 2000), and more reen tly omputational biology (Durbin et al., 1998, Eddy, 2004, Krogh, 1998, Lomsadze et al., 2005). Th us, for example, DNA regions an b e lab eled as a , `o ding', or b , `non-o ding', with P a and P b represen ting the resp etiv e distributions on the { A, C, G, T } alphab et. Giv en observ ations x 1: n := x 1 , . . . , x n , and treating the hidden states y 1: n := y 1 , . . . , y n as parameters, inferene in HMMs t ypially in v olv es v ( x 1: n ) , a maximum a p osteriori (MAP) estimate of Y 1: n . It has no w b een reognized that `[in℄ spite of the theoretial and pratial imp ortane of the MAP path estimator, v ery little is kno wn ab out its prop erties' (Calieb e, 2006). 
The same estimates are also known as Viterbi, or forced, alignments and can be efficiently computed by a dynamic programming algorithm also bearing the name of Viterbi. When substituted for the true $y_{1:n}$ in the likelihood function $\Lambda(y_{1:n}; x_{1:n}, \psi)$, Viterbi alignments can also be used to estimate $\psi$, any unknown free parameters of the model. Starting with an initial guess $\psi^{(0)}$ and alternating between maximization of the likelihood $\Lambda(y_{1:n}; x_{1:n}, \psi)$ in $y_{1:n}$ and in $\psi$ is at the core of Viterbi training (VT), or extraction (Jelinek, 1976), also known as segmental K-means (Ephraim and Merhav, 2002, Juang and Rabiner, 1990). The resulting estimates $\hat\psi_{VT}(x_{1:n}, \psi^{(0)})$ are known to be different from the maximum likelihood (ML) estimates $\hat\psi_{ML}(x_{1:n}, \psi^{(0)})$, which in this case are most commonly delivered by the EM procedure (Baum and Petrie, 1966, Bilmes, 1998, Ephraim and Merhav, 2002). Even if $\psi$ were known, Viterbi alignments $v(x_{1:n}; \psi)$ would typically differ from the true paths $y_{1:n}$, and the long-run properties of $v(x_{1:n}; \psi)$ are not obvious (Caliebe, 2006, Caliebe and Rösler, 2002, Koloydenko et al., 2007, Lember and Koloydenko, 2007, 2008). Furthermore, (Koloydenko et al., 2007, Lember and Koloydenko, 2007, 2008) propose a hybrid of VT and EM which takes into account the asymptotic discrepancy between $\hat\psi_{ML}(x_{1:n}, \psi^{(0)})$ and $\hat\psi_{VT}(x_{1:n}, \psi^{(0)})$ in order to increase computational and statistical efficiencies of estimation of $\psi$ for large $n$. One way or another, an important question is how to find the asymptotic properties of Viterbi alignments, given that the $(n+1)$st observation can in principle change the entire previous alignment, i.e. $v(x_{1:n+1})_i \ne v(x_{1:n})_i$, $1 \le i \le n$. Do the Viterbi alignments then admit well-defined extensions?
We answer this question positively in (Lember and Koloydenko, 2008) for general HMMs (in particular, allowing more than two hidden states) by constructing proper infinite Viterbi alignments. Generalizing and clarifying related results of (Caliebe, 2006, Caliebe and Rösler, 2002), the approach in (Lember and Koloydenko, 2008) is to extend alignments piecewise, separating individual pieces by nodes (see Section 2 below). Although the construction is natural, a detailed formal proof of its correctness for general HMMs is rather long and requires certain mild technical assumptions. This paper, on the other hand, shows that in the special case of two-state HMMs, the existence of infinite Viterbi alignments needs no special assumptions and can be proven considerably more easily. The results of this paper essentially complete and generalize those of (Caliebe, 2006, Caliebe and Rösler, 2002).

2 Preliminaries

Let $\lambda$ be a suitable $\sigma$-finite reference measure on $\mathbb{R}^d$ so that $P_a$ and $P_b$ have densities with respect to $\lambda$. For example, $\lambda$ can be a Lebesgue measure, or, as in the case of discrete observations, a counting measure. Thus, let $f_a$ and $f_b$ be the densities of $P_a$ and $P_b$, respectively. Throughout the rest of the paper, we assume that $P_a \ne P_b$ or, equivalently,

$$\lambda\{x \in \mathcal{X} : f_a(x) \ne f_b(x)\} > 0. \quad (1)$$

Assumption (1) is natural since there would be no need to model the observation process by an HMM should the emission distributions coincide. Note also that, unlike in the general case, the positivity of the transition probabilities is also a natural assumption for the two-state HMMs. No further assumptions on the HMM are made in this paper. In particular, unlike (Caliebe, 2006, Caliebe and Rösler, 2002), we do not assume the square integrability of $\log(f_a/f_b)$, or equality of the supports of $P_a$ and $P_b$.
Ho w ev er, the latter ondition is not v ery restritiv e, sine for the t w o state HMMs with unequal supp orts the existene of innite Viterbi alignmen ts follo ws rather trivially (Corollary 2.1). Th us, for an y n ≥ 1 and an y x 1: n ∈ X n and y 1: n ∈ S n , the lik eliho o d Λ π ( y 1: n ; x 1: n ) is giv en b y P ( Y 1: n = y 1: n ) n Y i =1 f y i ( x i ) , where P ( Y 1: n = y 1: n ) = π y 1 n Y i =2 p y i − 1 y i . Sine estimation of ψ is not a goal of this pap er, the dep endene on ψ is suppressed. Deomp osition (2) and reursion (3) b elo w pro vide a basis for the Viterbi algorithm to ompute alignmen ts. Namely , for all u ∈ { 1 , 2 , . . . , n − 1 } , max y 1: n ∈ S n Λ π ( y 1: n ; x 1: n ) = max l ∈ S  δ u ( l ) × max y u +1: n ∈ S n − u Λ ( p l · ) ( y u +1: n ; x u +1: n )  , (2) 3 where ( p l · ) is the transition distribution giv en state l ∈ S , and the s or es δ u ( l ) := max y 1: u − 1 ∈ S u − 1 Λ(( y 1: u − 1 , l ); x 1: u ) , l = a, b, are dened for all u ≥ 1 , and x 1: u ∈ X u . Th us, δ u ( l ) is the maxim um of the lik eliho o d of the paths terminating at u in state l . Note that δ 1 ( l ) = π l f l ( x 1 ) and δ u ( l ) dep ends on x 1: u . δ u +1 ( a ) = max { δ u ( a ) p aa , δ u ( b ) p ba } f a ( x u +1 ) , (3) δ u +1 ( b ) = max { δ u ( a ) p ab , δ u ( b ) p bb } f b ( x u +1 ) , u ≥ 1 , Example 2.1 L et X 1 , X 2 , . . . b e i.i.d. fol lowing a mixtur e distribution π a P a + π b P b with density π a f a ( x ; θ a ) + π b f b ( x ; θ b ) and mixing weights π a , π b > 0 . Suh a se quen e is an HMM with the tr ansition pr ob abilities π a = p aa = p ba , π b = p bb = p ab . In this sp e ial  ase the alignment is e asy to exhibit. Inde e d, in this  ase r e ursion (3) writes for any u ≥ 1 as δ u +1 ( a ) = cπ a f a ( x u +1 ) , δ u +1 ( b ) = cπ b f b ( x u +1 ) , (4) wher e c = max { δ u ( a ) , δ u ( b ) } . 
Hen e, the alignment v ( x 1: n )  an b e obtaine d p ointwise as fol lows: v ( x 1: n ) = ( v ( x 1 ) , . . . , v ( x n )) , wher e v ( x ) = ar g max { π a f a ( x ) , π b f b ( x ) } . Equivalently (ignoring p ossible ties), using a gener alize d V or onoi p artition X = X a ∪ X b with X a = { x ∈ X : π a f a ( x ) ≥ π b f b ( x ) } , X b = { x ∈ X : π b f b ( x ) > π a f a ( x ) } , v ( x ) = a if and only if x ∈ X a , and otherwise (i.e. x ∈ X b ) v ( x ) = b . Generally , it follo ws from (3) that, if δ u ( a ) p aa > δ u ( b ) p ba , δ u ( a ) p ab > δ u ( b ) p bb , (5) for some u , 1 ≤ u , and some x 1: u ∈ X u , then for an y n > u and for an y extension x u +1: n ∈ X n − u , the Viterbi alignmen t go es through state a at time u . Namely , trunation v ( x 1: n ) 1: u oinides with the Viterbi alignmen t v ( x 1: u ) (indeed, (5) implies δ u ( a ) > δ u ( b ) ). Th us, under ondition (5), maximization of Λ π (( y 1: n , l ); x 1: n ) an b e reset at time u b y learing x 1: u from the memory , retaining v 1: u , and replaing the initial distribution π b y ( p a · ) for further maximization of Λ ( p a · ) ( y u +1: n ; x u +1: n ) . F ollo wing (Lem b er and K olo ydenk o, 2008), if ondition (5) holds, then x u is alled a str ong a -no de (of realization x 1: n , n > u ), where `strong' refers to the inequalities in (5) b eing strit. Supp ose x 1: ∞ on tains innitely man y strong a -no des at times u 1 < u 2 < . . . . Let v 1 = v ( x 1: u 1 ) , and let v k maximize Λ ( p a · ) ( y u k − 1 +1: u k ; x u k − 1 +1: u k ) , for 4 all k ≥ 2 . Then, onatenation ( v 1 , v 2 , v 3 , . . . ) is naturally alled the innite pie  ewise Viterbi alignment (Lem b er and K olo ydenk o, 2008). Apparen tly , the almost sur e existene of our innite alignmen ts diretly dep endends on the existene of innitely man y (strong) no des. A t the same time, whether or not x u is a no de dep ends on x 1: u and hene is diult to v erify diretly . 
F ortunately , in man y ases x u is guaran teed to b e a no de based on sev eral preeding observ ations x u − m : u , 1 ≤ m < u , ignoring the rest. Sp eially , supp ose for example that x ∈ X is su h that p ia f a ( x ) p aj > p ib f b ( x ) p bj , ∀ i, j ∈ S. (6) It is easy to  he k that for an y u ≥ 2 , x u = x is a strong a -no de for an y x 1: u − 1 . Hene, if x 1: ∞ on tains innitely man y observ ations satisfying (6), then x 1: ∞ also on tains innitely man y strong no des. This previous ondition in its turn is met pro vided λ ( { x ∈ X : p ia f a ( x ) p aj > p ib f b ( x ) p bj , ∀ i, j ∈ S } ) > 0 . (7) Indeed, sine our underlying Mark o v  hain Y is ergo di, it is rather easy to see that X is ergo di as w ell (Ephraim and Merha v, 2002, Genon-Catalot et al., 2000, Leroux, 1992). Also, (7) implies that P a ( { x ∈ X : p ia f a ( x ) p aj > p ib f b ( x ) p bj , ∀ i, j ∈ S } ) > 0 . Th us, it follo ws from ergo diit y of X that almost ev ery realization of X has innitely man y elemen ts satisfying (6) and, hene innitely man y strong no des. W e ha v e th us pro v ed the follo wing Lemma. Lemma 2.1 Assume that (7) holds. Then almost every se quen e of obser- vations x 1: ∞ has innitely many str ong a -no des. (Clearly , in ter hanging a and b giv es the same results in terms of b -no des.) Lemma 2.1 is essen tially Theorem 1 in (Calieb e and Rösler, 2002) (disre- garding a misprin t in the statemen t). Condition (7) holds for man y t w o-state HMMs inluding the so-alled additiv e Gaussian noise mo del (Calieb e, 2006), where the emission distributions are Gaussian. Another trivial example is the mo del with unequal supp orts of P a and P b . Indeed, in that ase (7) holds (at least up to sw apping a and b ). Hene, the follo wing Corollary . Corollary 2.1 If the supp orts of P a and P b ar e not e qual, then almost every se quen e of observations has innitely many str ong no des. 
The goal of this work is essentially to remove condition (7) from Lemma 2.1. To this end, following (Lember and Koloydenko, 2008), we call an observation satisfying (6) an $a$-barrier of length 1. More generally, a block of observations $z_{1:k} \in \mathcal{X}^k$ is called a (strong) barrier of length $k \ge 1$ if for every $m \ge 0$ and $x_{1:m} \in \mathcal{X}^m$, $z_{1:k}$ contains a (strong) node of realization $(x_{1:m}, z_{1:k})$. In (Lember and Koloydenko, 2008), we prove the existence of infinitely many barriers for a very general class of HMMs. For the two-state HMMs, the conditions of our result in (Lember and Koloydenko, 2008) are given by (8) and (9) below:

$$P_a(\{x \in \mathcal{X} : f_a(x) \max\{p_{aa}, p_{ba}\} > f_b(x) \max\{p_{bb}, p_{ab}\}\}) > 0 \quad \text{and} \quad (8)$$
$$P_b(\{x \in \mathcal{X} : f_b(x) \max\{p_{bb}, p_{ab}\} > f_a(x) \max\{p_{aa}, p_{ba}\}\}) > 0. \quad (9)$$

To achieve our goal, we will first prove the same result for the two-state HMM under the relaxed assumption that (8) or (9) holds. As we shall see below (Lemma 3.1), in our two-state HMM one of these conditions is automatically satisfied and, moreover, all barriers are strong. Hence, the occurrence of infinitely many strong barriers in this case will be shown (Theorem 4.1) to require no additional assumptions.

Finally, if a node is not strong and $v(x_{1:n})$ is not unique, an alignment might exist that does not go through this node. Such pathologies cause technical inconveniences in defining an infinite Viterbi alignment and are treated in (Lember and Koloydenko, 2008). Fortunately, unlike in the general case, in the case of two-state HMMs almost every realization has infinitely many strong nodes (Theorem 4.1). This allows for a simple resolution of the non-uniqueness in the case of two-state HMMs.

3 Main results

3.1 Three types of the two-state HMM

The following three cases exhaust all the possibilities:

1. $p_{aa} > p_{ba}$ ($\Leftrightarrow p_{bb} > p_{ab}$);
2. $p_{aa} < p_{ba}$ ($\Leftrightarrow p_{bb} < p_{ab}$);
3. $p_{aa} = p_{ba}$ ($\Leftrightarrow p_{bb} = p_{ab}$).

From the definition of nodes, it follows that $x_u$ is not a node only in one of the following two cases:

$$(A)\ \begin{cases} \delta_u(a) p_{aa} > \delta_u(b) p_{ba} \\ \delta_u(b) p_{bb} > \delta_u(a) p_{ab} \end{cases} \qquad \text{or} \qquad (B)\ \begin{cases} \delta_u(b) p_{ba} > \delta_u(a) p_{aa} \\ \delta_u(a) p_{ab} > \delta_u(b) p_{bb}. \end{cases}$$

Case (A) is equivalent to

$$\frac{p_{bb}}{p_{ab}} > \frac{\delta_u(a)}{\delta_u(b)} > \frac{p_{ba}}{p_{aa}} \quad (10)$$

and case (B) is equivalent to

$$\frac{p_{bb}}{p_{ab}} < \frac{\delta_u(a)}{\delta_u(b)} < \frac{p_{ba}}{p_{aa}}. \quad (11)$$

Thus, in case (A), we have $\delta_{u+1}(a) = \delta_u(a) p_{aa} f_a(x_{u+1})$ and $\delta_{u+1}(b) = \delta_u(b) p_{bb} f_b(x_{u+1})$, so that for any $n > u$, the Viterbi alignment $v(x_{1:n})$ must satisfy $v(x_{1:n})_u = v(x_{1:n})_{u+1}$. Similarly, in case (B), $\delta_{u+1}(a) = \delta_u(b) p_{ba} f_a(x_{u+1})$ and $\delta_{u+1}(b) = \delta_u(a) p_{ab} f_b(x_{u+1})$, i.e. $v(x_{1:n})_u \ne v(x_{1:n})_{u+1}$. Evidently, case 1 and case (B) are mutually exclusive, and so are case 2 and case (A). Therefore, if the transition matrix satisfies the conditions of case 1, then $x_u$ is not a node if and only if conditions (A) are fulfilled. This implies that in case 1, nodes are the only possibility for $v(x_{1:n})$ to change state. On the other hand, if the transition matrix satisfies the conditions of case 2, then $x_u$ is not a node if and only if (B) holds. Hence, in case 2 nodes are the only possibility for $v(x_{1:n})$ to remain in one state. Case 3 corresponds to the mixture model (see Example 2.1 above). Apparently (4), every observation is a node in this case (see also Figure 1 below).

Let us now examine conditions (8) and (9).
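The trichotomy above depends only on the transition matrix, so it can be expressed as a small classifier (a hypothetical helper for experimentation, not part of the paper):

```python
def hmm_case(p):
    """Classify a 2-state transition matrix into the three cases of Section 3.1.

    Case 1 (p_aa > p_ba): the Viterbi path can change state only at nodes.
    Case 2 (p_aa < p_ba): the path alternates states, except possibly at nodes.
    Case 3 (p_aa = p_ba): the i.i.d. mixture model; every observation is a node.
    """
    if p['a']['a'] > p['b']['a']:
        return 1
    if p['a']['a'] < p['b']['a']:
        return 2
    return 3
```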
From equation (1), it follows that

$$\lambda(\{x \in \mathcal{X} : f_a(x) > f_b(x)\}) > 0, \quad \lambda(\{x \in \mathcal{X} : f_a(x) < f_b(x)\}) > 0 \quad (12)$$

and, for any $\alpha > \beta > 0$,

$$\lambda(\{x \in \mathcal{X} : \alpha f_a(x) > \beta f_b(x)\}) > 0 \Leftrightarrow P_a(\{x \in \mathcal{X} : \alpha f_a(x) > \beta f_b(x)\}) > 0, \quad (13)$$
$$\lambda(\{x \in \mathcal{X} : \alpha f_b(x) > \beta f_a(x)\}) > 0 \Leftrightarrow P_b(\{x \in \mathcal{X} : \alpha f_b(x) > \beta f_a(x)\}) > 0. \quad (14)$$

Therefore, we have the following Lemma.

Lemma 3.1 Any two-state HMM satisfies at least one of the conditions (8) and (9).

Proof. In case 1, (8) and (9) are equivalent to

$$P_a(\{x \in \mathcal{X} : f_a(x) p_{aa} > f_b(x) p_{bb}\}) = P_a\Big(\Big\{x \in \mathcal{X} : \frac{f_b(x) p_{bb}}{f_a(x) p_{aa}} < 1\Big\}\Big) > 0, \quad (15)$$
$$P_b(\{x \in \mathcal{X} : f_b(x) p_{bb} > f_a(x) p_{aa}\}) = P_b\Big(\Big\{x \in \mathcal{X} : \frac{f_a(x) p_{aa}}{f_b(x) p_{bb}} < 1\Big\}\Big) > 0, \quad (16)$$

respectively. If $p_{aa} = p_{bb}$, then (12) implies that both (15) and (16) are satisfied, and hence both (8) and (9) hold. If $p_{aa} > p_{bb}$, then (15), and subsequently (8), follows from (13). If $p_{aa} < p_{bb}$, then (16), and subsequently (9), follows from (14). Hence, at least one of the assumptions (8), (9) is always guaranteed to hold.

In case 2, (8) and (9) are equivalent to

$$P_a(\{x \in \mathcal{X} : f_a(x) p_{ba} > f_b(x) p_{ab}\}) = P_a\Big(\Big\{x \in \mathcal{X} : \frac{f_b(x) p_{ab}}{f_a(x) p_{ba}} < 1\Big\}\Big) > 0, \quad (17)$$
$$P_b(\{x \in \mathcal{X} : f_b(x) p_{ab} > f_a(x) p_{ba}\}) = P_b\Big(\Big\{x \in \mathcal{X} : \frac{f_a(x) p_{ba}}{f_b(x) p_{ab}} < 1\Big\}\Big) > 0, \quad (18)$$

respectively. Again, if $p_{aa} = p_{bb}$, then (17) and (18) both hold without further assumptions. If $p_{aa} > p_{bb}$, then (17) is automatically satisfied. Likewise, (18) holds if $p_{aa} < p_{bb}$. Hence, one of the assumptions (8), (9) is always guaranteed to hold.

In case 3, (8) and (9) write

$$P_a(\{x \in \mathcal{X} : f_a(x) \pi_a > f_b(x) \pi_b\}) > 0, \quad (19)$$
$$P_b(\{x \in \mathcal{X} : f_b(x) \pi_b > f_a(x) \pi_a\}) > 0. \quad (20)$$

Assume $\pi_a \ge \pi_b$. Then (12) implies $\lambda(\{x \in \mathcal{X} : \pi_a f_a(x) > \pi_b f_b(x)\}) > 0$, which in turn implies (19). (If $\pi_a \le \pi_b$, (20) follows in the same way.)
Finally, we state and prove the main results for each of the three cases.

[Figure 1: Distinct patterns of the Viterbi alignment in the two-state HMM. Top: Case 1, the state can possibly change only at nodes (larger circles). Middle: Case 2, states always alternate, except possibly at nodes. Bottom: Case 3, every observation is a node.]

3.2 Case 1

First, note that condition (7) in this case is equivalent to

$$\lambda(\{x \in \mathcal{X} : p_{ba} f_a(x) p_{ab} > p_{bb} f_b(x) p_{bb}\}) > 0. \quad (21)$$

As mentioned in Section 2, condition (7) need not hold in general. Nonetheless, for the two-state HMM, we have the following Lemma.

Lemma 3.2 In case 1, almost every realization of the two-state HMM has infinitely many strong barriers.

Proof. Without loss of generality, assume $p_{aa} \ge p_{bb}$. Then (15) holds, implying that there exists $\epsilon > 0$ such that $P_a(\mathcal{X}_a) > 0$, where

$$\mathcal{X}_a := \Big\{x \in \mathcal{X} : \frac{f_b(x) p_{bb}}{f_a(x) p_{aa}} < 1 - \epsilon\Big\}.$$

Let the integer $k$ be sufficiently large for $(1-\epsilon)^k < p_{ab} p_{ba} / (p_{aa} p_{bb})$ to hold. Then every sequence $z_{1:k} \in \mathcal{X}_a^k$ satisfies

$$\prod_{j=1}^{k} \frac{f_b(z_j) p_{bb}}{f_a(z_j) p_{aa}} < (1-\epsilon)^k < \frac{p_{ab} p_{ba}}{p_{aa} p_{bb}}. \quad (22)$$

Let $u > k$ be arbitrary and let $z_{0:k} \in \mathcal{X}_a^{k+1}$ be the last $k+1$ observations in a generic sequence $x_{1:u} \in \mathcal{X}^{u-k-1} \times \mathcal{X}_a^{k+1}$. To shorten the notation, we write $d_j(z_i)$ for $\delta_{u-k+i}(j)$ for every $i = 0, 1, \ldots, k$, $j = a, b$. Next, we show that $x_{u-k:u}$ contains at least one strong node, and consequently, $z_{0:k}$ is a strong barrier. Indeed, if none of the observations $x_{u-k:u}$ were a strong $a$-node, then we would have

$$d_b(z_k) = d_b(z_0) \prod_{j=1}^{k} f_b(z_j) p_{bb}.$$
Similarly, if none among the observations $x_{u-k+1:u}$ were a strong $b$-node, we would have

$$\delta_u(a) \ge \delta_{u-k}(b)\, p_{ba} \Big(\prod_{j=1}^{k} f_a(z_j)\Big) p_{aa}^{k-1}.$$

Hence,

$$\frac{\delta_u(b)}{\delta_u(a)} \le \frac{\delta_{u-k}(b)\, p_{bb} \big(\prod_{j=1}^{k} f_b(z_j)\big) p_{bb}^{k-1}}{\delta_{u-k}(b)\, p_{ba} \big(\prod_{j=1}^{k} f_a(z_j)\big) p_{aa}^{k-1}} = \frac{\prod_{j=1}^{k} (f_b(z_j) p_{bb})}{\prod_{j=1}^{k} (f_a(z_j) p_{aa})} \cdot \frac{p_{aa}}{p_{ba}},$$

and by (22)

$$\frac{\delta_u(b)}{\delta_u(a)} < \frac{p_{ab}}{p_{bb}},$$

which contradicts (10). Thus, at least one of $x_{u-k:u}$ must be a strong node. Since $P_a(\mathcal{X}_a) > 0$, by ergodicity of the HMM, almost every realization has infinitely many barriers $z_{0:k} \in \mathcal{X}_a^{k+1}$, implying also that almost every realization has infinitely many strong nodes.

The next Theorem refines the previous result.

Theorem 3.1 Suppose the (transition matrix of the) two-state HMM meets the condition of case 1. If $p_{aa} \ge p_{bb}$, then almost every realization has infinitely many strong $a$-barriers. (If $p_{aa} \le p_{bb}$, then almost every realization has infinitely many strong $b$-barriers.)

Proof. Let $p_{aa} \ge p_{bb}$ and use the notation of the proof of Lemma 3.2. First, we show that none of the observations $x_{u-k+1:u}$ is a $b$-node. Indeed, since $d_b(z_1) = \max\{d_a(z_0) p_{ab}, d_b(z_0) p_{bb}\} f_b(z_1)$, at least one of the following two inequalities must hold:

$$p_{ab} f_b(z_1) p_{ba} \ge p_{aa} f_a(z_1) p_{aa}, \qquad p_{bb} f_b(z_1) p_{ba} \ge p_{ba} f_a(z_1) p_{aa} \quad (23)$$

in order for $x_{u-k+1}$ to be a $b$-node. However, (15) implies that $p_{ba} f_a(z_1) p_{aa} > p_{bb} f_b(z_1) p_{ba}$ and, since $p_{bb} > p_{ab}$, we have $p_{bb} f_b(z_1) p_{ba} > p_{ab} f_b(z_1) p_{ba}$. Hence, neither of the two inequalities (23) holds. Thus, $x_{u-k+1}$ cannot be a $b$-node, and the same argument shows that none of the subsequent observations $x_{u-k+2}, \ldots, x_u$ can be a $b$-node either.
The argument of the proof of Lemma 3.2 then shows that one of the observations in $x_{u-k:u}$ is a strong $a$-node and therefore $z_{0:k}$ is a strong $a$-barrier. The ergodic argument finishes the proof. (The same argument with $a$ and $b$ swapped establishes the second part of the Theorem.)

Note that the condition $p_{bb} \ge p_{aa}$ is sufficient but not necessary for (16) to hold. In fact, for many 2-state HMMs, such as the one with additive white Gaussian noise, both (15) and (16) hold for any (positive) values of $p_{aa}$ and $p_{bb}$. On the other hand, it might happen that one of the conditions (15) and (16), say (16), fails. This would mean $P_b(\{x \in \mathcal{X} : p_{bb} f_b(x) > p_{aa} f_a(x)\}) = 0$ or, equivalently,

$$\lambda(\{x \in \mathcal{X} : p_{bb} f_b(x) > p_{aa} f_a(x)\}) = 0. \quad (24)$$

Corollary 3.1 In case 1, equation (24) implies that almost every sequence of observations has infinitely many strong $a$-barriers and no strong $b$-nodes. Furthermore, equation (24) in case 1 implies that for almost every realization, if a $b$-node does occur, it occurs before the first $a$-node.

Proof. From the proof of Theorem 3.1, it follows that no observation $x \in \mathcal{X}$ such that $p_{bb} f_b(x) \le p_{aa} f_a(x)$ (i.e. from the complement of the set in (24)) can be a strong $b$-node; a closer inspection of the proof actually shows that even a weak (i.e. not strong) $b$-node cannot occur after an $a$-node (since in case 1 $p_{bb} > p_{ab}$). Theorem 3.1 then implies that almost every sequence of observations has infinitely many strong $a$-barriers.

Corollary 3.1 in its turn implies that, starting with the first strong $a$-node onward, the Viterbi alignment $v(x_{1:n})$ stays in state $a$. As we have already mentioned, Viterbi alignments need not be unique (see (Lember and Koloydenko, 2008)), i.e. ties are possible in general, and in this case, in particular, they are possible up until the first strong $a$-node.
However, the impossibility of strong $b$-nodes in this case implies that the ties can be broken in favor of $a$, resulting in the constant all-$a$ alignment.

Theorem 3.1 is a generalization of Theorem 7 in (Caliebe, 2006), which basically states that in case 1, if (15) and (16) hold, then under some additional assumptions (equal supports of $P_a$ and $P_b$ and further conditions A2), almost every realization has infinitely many nodes. Thus, (Caliebe, 2006) stops short of realizing that in case 1 conditions (15) and (16) alone ensure the existence of $a$- and $b$-nodes. This results in (Caliebe, 2006) invoking Theorem 2 of (Caliebe and Rösler, 2002) to prove the existence of nodes, hence the superfluous assumptions A1, A2. Also, the proof of Theorem 7 in (Caliebe and Rösler, 2002) could be simplified and shortened with the help of the notions of nodes and barriers. Finally, Corollary 3.1 generalizes Theorems 8 and 9 of (Caliebe, 2006).

3.3 Case 2

Recall that we have been proving the existence of barriers without condition (7). Note that in case 2, condition (7) becomes

$$\lambda(\{x \in \mathcal{X} : p_{aa} f_a(x) p_{aa} > p_{ab} f_b(x) p_{ba}\}) > 0.$$

Recall (Section 2) also that interchanging $a$ with $b$ gives a similar condition for strong $b$-nodes to occur infinitely often in almost every realization. It follows from (12) that for some $\epsilon > 0$, the sets

$$\mathcal{X}_a := \{x \in \mathcal{X} : f_a(x)(1-\epsilon) > f_b(x)\}, \quad \mathcal{X}_b := \{x \in \mathcal{X} : f_a(x) < f_b(x)(1-\epsilon)\}$$

both have positive $\lambda$-measure. Hence $P_a(\mathcal{X}_a) > 0$ and $P_b(\mathcal{X}_b) > 0$. Then, for $x_{1:2} \in \mathcal{X}_a \times \mathcal{X}_b$, the following holds:

$$\frac{f_b(x_1) f_a(x_2)}{f_a(x_1) f_b(x_2)} < (1-\epsilon)^2. \quad (25)$$

Lemma 3.3 In case 2, almost every realization has infinitely many strong barriers.

Proof. Let $\mathcal{X}_a$ and $\mathcal{X}_b$ be as above. Choose $k$ sufficiently large for $(1-\epsilon)^{2k} < p_{aa} p_{bb} / (p_{ba} p_{ab})$ to hold. Next, consider a sequence $z_{0:2k} \in \mathcal{X}^{2k+1}$, where $z_0, z_{2i} \in \mathcal{X}_a$ and $z_{2i-1} \in \mathcal{X}_b$ for every $i = 1, \ldots, k$. We show that for every $u > 2k$, every sequence of observations $x_{1:u} \in \mathcal{X}^u$ such that $x_{u-2k:u} = z_{0:2k}$ contains a strong node, making $z_{0:2k}$ a strong barrier. The choice of $k$ and $z_{0:2k}$ implies

$$\frac{\prod_{i=1}^{k} p_{ba} f_a(z_{2i-1})\, p_{ab} f_b(z_{2i})}{\prod_{i=1}^{k} p_{ab} f_b(z_{2i-1})\, p_{ba} f_a(z_{2i})} < (1-\epsilon)^{2k} < \frac{p_{bb} p_{aa}}{p_{ba} p_{ab}}. \quad (26)$$

If there is no strong node among $x_{u-2k:u}$, then

$$d_b(z_{2k}) = d_b(z_0) \prod_{i=1}^{k} p_{ba} f_a(z_{2i-1})\, p_{ab} f_b(z_{2i})$$

and

$$d_a(z_{2k}) \ge d_b(z_0) \frac{p_{bb}}{p_{ab}} \prod_{i=1}^{k} p_{ab} f_b(z_{2i-1})\, p_{ba} f_a(z_{2i}).$$

Hence, by (26),

$$\frac{d_b(z_{2k})}{d_a(z_{2k})} \le \frac{\prod_{i=1}^{k} p_{ba} f_a(z_{2i-1})\, p_{ab} f_b(z_{2i})}{\frac{p_{bb}}{p_{ab}} \prod_{i=1}^{k} p_{ab} f_b(z_{2i-1})\, p_{ba} f_a(z_{2i})} < \frac{p_{aa}}{p_{ba}},$$

which contradicts (11).

Next, we refine this result. Without loss of generality, assume $p_{ba} \ge p_{ab}$. Therefore

$$p_{ab}\, p_{aa} \ge p_{ba}\, p_{bb}, \quad (27)$$

and also, for every $x \in \mathcal{X}_a$,

$$p_{ba} f_a(x) > p_{ab} f_b(x). \quad (28)$$

Hence, (17) holds. We multiply the left side of (28) by $p_{ab} p_{aa}$ and the right side by $p_{ba} p_{bb}$, and use (27) to obtain

$$f_a(x) p_{aa} > f_b(x) p_{bb}. \quad (29)$$

Finally, for $x \in \mathcal{X}_b$, we have

$$f_a(x) < f_b(x). \quad (30)$$

We will need the following Lemma.

Lemma 3.4 Assume (in addition to being in case 2) that $p_{ab} \le p_{ba}$.
a) In any pair of observations $z_{1:2} \in \mathcal{X}_a \times \mathcal{X}_b$, $z_1$ is not a $b$-node.
b) In any pair of observations $z_{2:3} \in \mathcal{X}_b \times \mathcal{X}_a$, if $z_2$ is a $b$-node, then $z_3$ is a strong $a$-node.

Proof. Assume that $p_{ab} \le p_{ba}$, and consider a). First note that since we are in case 2, $z_1$ is a $b$-node if and only if

$$d_b(z_1)\, p_{bb} \ge d_a(z_1)\, p_{ab}. \quad (31)$$

Suppose first that $z_0$ is not a node, in which case $d_b(z_1) = d_a(z_0) p_{ab} f_b(z_1)$ and $d_a(z_1) = d_b(z_0) p_{ba} f_a(z_1)$.
Then

$$d_a(z_1) p_{ab} = d_b(z_0) p_{ba} f_a(z_1) p_{ab} \ge d_a(z_0) p_{aa} f_a(z_1) p_{ab} > d_a(z_0) p_{bb} f_b(z_1) p_{ab} = d_a(z_0) p_{ab} f_b(z_1) p_{bb} = d_b(z_1) p_{bb}.$$

The first inequality above follows from the recursion property (3) of the scores $\delta$, whereas the second one follows from (29). Thus, when $z_0$ is not a node, $z_1$ cannot be a $b$-node. Similarly, supposing that $z_0$ is an $a$-node, we obtain that $z_1$ is not a $b$-node. Suppose finally that $z_0$ is a $b$-node. Then $d_b(z_1) = d_b(z_0) p_{bb} f_b(z_1)$ and $d_a(z_1) = d_b(z_0) p_{ba} f_a(z_1)$. Applying consecutively $p_{bb} < p_{ab}$, (28), and $p_{bb} < p_{ab}$ again, we obtain:

$$p_{bb} f_b(z_1) p_{bb} < p_{ab} f_b(z_1) p_{bb} \le p_{ba} f_a(z_1) p_{bb} < p_{ba} f_a(z_1) p_{ab}.$$

Thus, contrary to (31),

$$d_b(z_1) p_{bb} = d_b(z_0) p_{bb} f_b(z_1) p_{bb} < d_b(z_0) p_{ba} f_a(z_1) p_{ab} = d_a(z_1) p_{ab},$$

that is, $z_1$ is not a $b$-node in this case either.

Let us now prove b). If $z_2$ is a $b$-node, then $d_a(z_3) = d_b(z_2) p_{ba} f_a(z_3)$ and $d_b(z_3) = d_b(z_2) p_{bb} f_b(z_3)$. By (29), we now have

$$d_a(z_3) p_{aa} = d_b(z_2) p_{ba} f_a(z_3) p_{aa} > d_b(z_2) p_{bb} f_b(z_3) p_{ba} = d_b(z_3) p_{ba}.$$

Similarly to the argument regarding $b$-nodes guaranteed by (31) above, we now have $d_a(z_3) > d_b(z_3)$, implying $d_a(z_3) p_{ab} > d_b(z_3) p_{bb}$. Thus $z_3$ is a strong $a$-node.

Theorem 3.2 If $p_{ba} \ge p_{ab}$, then almost every realization has infinitely many strong $a$-nodes. If $p_{ba} \le p_{ab}$, then almost every realization has infinitely many strong $b$-nodes.

Proof. Assume again that $p_{ba} \ge p_{ab}$. Let $z_{0:2k}$ be as in the proof of Lemma 3.3 and attach one more element $z_{2k+1} \in \mathcal{X}_b$ to the end. Thus, $z_{2i} \in \mathcal{X}_a$ and $z_{2i+1} \in \mathcal{X}_b$, $i = 0, 1, \ldots, k$. From (the proof of) Lemma 3.3, we know that $z_{0:2k}$ contains at least one strong node. If this is an $a$-node, then the theorem is proven.
Otherwise this is a $b$-node, which, according to part a) of Lemma 3.4, can only be among $z_1, z_3, \ldots, z_{2k-1}$. Applying part b) of Lemma 3.4 then shows that there must also be a strong $a$-node among $z_2, z_4, \ldots, z_{2k}$. Invoking ergodicity again finishes the proof. Clearly, swapping $a$ and $b$ in the above discussion following the proof of Lemma 3.3 establishes the other part of the theorem.

Inequality (27) guarantees (17). Often, the model is such that in addition to (17), (18) also holds. However, to apply the previous proof (i.e. of Theorem 3.2) to guarantee the simultaneous existence of infinitely many strong $a$- and $b$-nodes, we would need the following counterpart of (29):

$$P_b(\{x \in \mathcal{X} : f_b(x) p_{ab} > f_a(x) p_{ba},\ f_b(x) p_{bb} > f_a(x) p_{aa}\}) > 0,$$

which is stronger than (18). However, this condition is indeed often met, resulting in infinitely many strong $a$- and $b$-nodes (in almost every realization $x_{1:\infty}$).

Lemma 3.3 appears without proof as Theorem 10 in (Caliebe, 2006). The author of (Caliebe, 2006) actually suggests that Theorem 10 and other results for case 2 are analogous to the corresponding results for case 1, mainly Theorem 7 (of the same work). It is further stated in (Caliebe, 2006) that the proofs of those results are not given as they are very similar to the corresponding proofs in case 1. Our present workings actually show that case 2 is quite dissimilar to case 1 (due to the fluctuating nature of the typical Viterbi alignment) and in particular requires a more careful treatment. Note that, even if Theorem 10 in (Caliebe, 2006) assumed (8) and (9) (as Theorem 7 in (Caliebe, 2006) does) to help one prove this Theorem by analogy to Theorem 7, it is still not clear how the two proofs could be very similar.

3.4 Case 3 (the mixture model)

Recall that every observation in this case is a (not necessarily strong) node.
Furthermore, every observation from $\{x \in \mathcal{X} : \pi_a f_a(x) > \pi_b f_b(x)\}$ is a strong $a$-node. Thus, we have the following counterpart of Theorems 3.1 and 3.2.

Theorem 3.3. If $\pi_a \ge \pi_b$, then almost every realization has infinitely many strong $a$-nodes. If $\pi_a \le \pi_b$, then almost every realization has infinitely many strong $b$-nodes.

4 Conclusion

In summary, we have proved Theorem 4.1 stated below, which provides a basis for the piecewise construction and asymptotic analysis of the Viterbi alignments of two-state HMMs.

Theorem 4.1. Almost every realization of the two-state HMM has infinitely many strong barriers. Furthermore,
a) if the transition probabilities satisfy $p_{aa} \ge p_{ba}$, then (almost every realization of) the chain has infinitely many strong $s$-barriers, where $s$ is such that $p_{ss} = \max\{p_{aa}, p_{bb}\}$;
b) otherwise (i.e. if $p_{aa} < p_{ba}$), (almost every realization of) the chain has infinitely many strong $s$-barriers, where $s$ is such that $p_{ts} = \max\{p_{ab}, p_{ba}\}$ (for some $t \in S$).

References

Baum, L. E., Petrie, T., 1966. Statistical inference for probabilistic functions of finite state Markov chains. Ann. Math. Statist. 37, 1554-1563.

Bilmes, J., 1998. A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Tech. Rep. 97-021, International Computer Science Institute, Berkeley, CA, USA.

Caliebe, A., 2006. Properties of the maximum a posteriori path estimator in hidden Markov models. IEEE Trans. Inform. Theory 52 (1), 41-51.

Caliebe, A., Rösler, U., 2002. Convergence of the maximum a posteriori path estimator in hidden Markov models. IEEE Trans. Inform. Theory 48 (7), 1750-1758.

Cappé, O., Moulines, E., Rydén, T., 2005. Inference in hidden Markov models. Springer Series in Statistics. Springer, New York. With Randal Douc's contributions to Chapter 9 and Christian P.
Robert's to Chapters 6, 7 and 13, with Chapter 14 by Gersende Fort, Philippe Soulier and Moulines, and Chapter 15 by Stéphane Boucheron and Elisabeth Gassiat.

Durbin, R., Eddy, S., Krogh, A., Mitchison, G., 1998. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press.

Eddy, S., 2004. What is a hidden Markov model? Nature Biotechnology 22 (10), 1315-1316.

Ephraim, Y., Merhav, N., 2002. Hidden Markov processes. IEEE Trans. Inform. Theory 48 (6), 1518-1569. Special issue on Shannon theory: perspective, trends, and applications.

Genon-Catalot, V., Jeantheau, T., Larédo, C., 2000. Stochastic volatility models as hidden Markov models and statistical applications. Bernoulli 6 (6), 1051-1079.

Huang, X., Ariki, Y., Jack, M., 1990. Hidden Markov models for speech recognition. Edinburgh University Press, Edinburgh, UK.

Jelinek, F., 1976. Continuous speech recognition by statistical methods. Proc. IEEE 64, 532-556.

Jelinek, F., 2001. Statistical methods for speech recognition. The MIT Press, Cambridge, MA, USA.

Ji, G., Bilmes, J., 2006. Backoff model training using partially observed data: Application to dialog act tagging. In: Proc. Human Language Techn. Conf. NAACL, Main Conference. Association for Computational Linguistics, New York City, USA, pp. 280-287. URL http://www.aclweb.org/anthology/N/N06/N06-1036

Juang, B.-H., Rabiner, L., 1990. The segmental K-means algorithm for estimating parameters of hidden Markov models. IEEE Trans. Acoust. Speech Signal Process. 38 (9), 1639-1641.

Koloydenko, A., Käärik, M., Lember, J., 2007. On adjusted Viterbi training. Acta Appl. Math. 96 (1-3), 309-326.

Krogh, A., 1998. Computational Methods in Molecular Biology. Elsevier Science, Ch. An Introduction to Hidden Markov Models for Biological Sequences.

Lember, J., Koloydenko, A., 2007.
Adjusted Viterbi training: A proof of concept. Probab. Eng. Inf. Sci. 21 (3), 451-475.

Lember, J., Koloydenko, A., 2008. The adjusted Viterbi training for hidden Markov models. Bernoulli 14 (1), 180-206.

Leroux, B. G., 1992. Maximum-likelihood estimation for hidden Markov models. Stochastic Process. Appl. 40 (1), 127-143.

Lomsadze, A., Ter-Hovhannisyan, V., Chernoff, V., Borodovsky, M., 2005. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Res. 33 (20), 6494-6506.

McDermott, E., Hazen, T., 2004. Minimum classification error training of landmark models for real-time continuous speech recognition. In: Proc. ICASSP.

Ney, H., Steinbiss, V., Haeb-Umbach, R., et al., 1994. An overview of the Philips research system for large vocabulary continuous speech recognition. Int. J. Pattern Recognit. Artif. Intell. 8 (1), 33-70.

Och, F., Ney, H., 2000. Improved statistical alignment models. In: Proc. 38th Ann. Meet. Assoc. Comput. Linguist. Assoc. Comput. Linguist., pp. 440-447.

Padmanabhan, M., Picheny, M., 2002. Large-vocabulary speech recognition algorithms. Computer 35 (4), 42-50.

Rabiner, L., Juang, B., 1993. Fundamentals of speech recognition. Prentice-Hall, Inc., Upper Saddle River, NJ, USA.

Rabiner, L., Wilpon, J., Juang, B., 1986. A segmental K-means training procedure for connected word recognition. AT&T Tech. J. 64 (3), 21-40.

Shu, I., Hetherington, L., Glass, J., 2003. Baum-Welch training for segment-based speech recognition. In: Proc. IEEE ASRU Workshop. St. Thomas, U.S. Virgin Islands, http://groups.csail.mit.edu/sls/publications/2003/ASRU_Shu.pdf, pp. 43-48.

Steinbiss, V., Ney, H., Aubert, X., et al., 1995. The Philips research system for continuous-speech recognition. Philips J. Res. 49, 317-352.

Ström, N., Hetherington, L., Hazen, T., Sandness, E., Glass, J., 1999.
A ous- ti mo deling impro v emen ts in a segmen t-based sp ee h reognizer. In: Pro . IEEE ASR U W orkshop. pp. 139142. 17
