Entropy of Hidden Markov Processes via Cycle Expansion

En trop y of Hidden Mark o v Pro cesses via Cycle Expansion. Armen E. Allah verdy an Y er evan Physics Institute, Alikhanian Br other s Str e et 2, Y er evan 375036, A rmenia (Dated: Nov ember 6, 2018) Hidden Marko v Pro cess es (HMP) is one of t he basic tools of the mo dern probabilistic mo deling. The c haracterization of their en tropy remains how ever an op en problem. Here the entrop y of HMP is calculated via the cycle expansion of the zeta-function, a metho d adopt ed from the theory of dynamical sy stems. F or a cla ss of H MP this meth od pro duces exact results b oth for th e entrop y and the moment-generating function. The latter allow s to estimate, via the Chernoﬀ b ound , the probabilities of large deviations for the HMP . More generally , the metho d oﬀers a representation of the moment-generating function and of the entrop y via conv ergen t series. P ACS n um bers: 89.70.Cf, 05.20.-y I. INTRO DUCTION. Hidden Ma rko v Pro ces ses (HMP) are gener ated by a Markov pro cess observed via a memor y-less no isy channel. They are widely employ ed in v arious area s of probabilistic mo deling [1–4]: information theory , signal pro cessing, bioinformatics, ma thematical economics, linguis tics, etc . One of the main reaso ns for these numerous applications is that HMP present simple a nd ﬂexible models fo r a history-dep endent ra ndom pr o cess. This is in contrast to the Marko v pro cess, where the history is irrelev a nt, since the future of the pro ce s s dep ends on its present state only . Much attention w as devoted to the entropy of HMP [5–17]. It characterizes the information cont ent (minimal nu mber of bits needed for a reliable enco ding) of HMP viewed as a probabilis tic source of information. More speciﬁc a lly , the rea lizations gene r ated in the long r un of a random erg o dic pro cess , e.g. HMP , are divided into t wo sets [6, 8]. The ﬁrst (typical) set is the smallest set o f re alizations with the overall pro bability close to one. The rest of r ealizations are contained in the seco nd, low-probabilit y set. Now the entropy c hara c ter izes the num ber o f elements in the typical set [6, 8]. W hen HMP is employed as a mo del for information transmission ov er a noisy c hannel, the en tropy is still impo r tant, since it is the basic non- tr ivial compo nent of the channel ca pacity (other co mpo nent s needed for reconstructing the c hannel capacity are normally easier to calculate and characterize ). How ev er, there is no direct formula for the entrop y of HMP , in contrast to the Mar kov case where such a formula is well-kno wn [5, 6, 8]. Th us p eople studied the entrop y v ia expansio ns around v a rious limiting ca ses, or via upp er and low er bo unds [6, 10–17]. The r e is als o a genera l forma lism that ex presses the e ntropy of HMP via the so lution of an in tegral equation [7–9]. This formalism is howev er relatively diﬃcult to apply in practice. Once the entropy characterizes the n um ber of typical long-run realiza tions, it is of interest to estimate the pro bability of at ypical realizations. These es timates are sta ndardly giv en via the moment-generating function of the ra ndom pro cess [6, 8]. The k nowledge o f this function also allows to r e c onstruct the entropy [6, 8]. This paper pr esents a metho d for calc ulating the momen t-generating function of HMP . The method is adopted fro m the theor y of c haotic dynamical systems, where it is k nown as the cycle expa ns ion of the zeta- function [25, 2 7]. W e show that in a certain class o f HMP one can obtain exa ct expressions for the moment-generating function and for the ent ropy . F or other cases the metho d oﬀers analytic approximations of the moment-generating function via convergen t power series. W e a ttempted to make this pap er s elf-contained and org a nized it as follows. Section II deﬁnes the HMP , settles some notations, and recalls ho w to expr ess the probabilities of HMP via a random matrix pro duct. In section II I w e brieﬂy review the main facts ab out the en tropy of an ergo dic pro cess and the co rresp onding t ypical (highly pro ba ble) set of r ealizations. The main pur po se of section IV is to relate the en tropy of HMP to the spectral radius of the corres p o nding random matrix pro duct. This is done via the Lyapuno v exponent of the rando m matrix pro duct. Section V discusses the momen t-genera ting function of HMP . This function is employ ed (via Cher no ﬀ b ounds ) for characterizing the atypical (improbable) realizations of HMP . Section VI shows how to calculate the entropy a nd the genera ting function via the z e ta-function and the p e rio dic or bit expansion. Section VII discusses o ne o f the simplest examples o f HMP a nd presents ex act expressions for its entrop y and the moment-generating function. Her e we also apply the moment -gener ating function for es timating at ypical realizations o f the HMP . Section VII I studies another p opular mo del for HMP , binar y symmetric HMP . It is shown that the presented approa ch repr o duces known approximate results and pr edicts several new ones. The last section shortly summarizes the obtained results. Some issues, whic h are either too technical o r too gener a l for the present purp oses, are discussed in Appe ndice s . 2 II. DEFINITION OF THE HIDDEN MARKO V PRO CESS. In this section w e r ecall the deﬁnition of the Hidden Mar ko v Pro cess (HMP); see [1 , 2 ] for reviews. Let a dis crete-time random pr o cess S = {S 0 , S 1 , S 2 , ... } b e Ma rko vian, with time- independent conditional probability Pr[ S k = s k |S k − 1 = s k − 1 ] = Pr[ S k + l = s k |S k − 1+ l = s k − 1 ] = p ( s k | s k − 1 ) , (1) where l is an in teger. Each realization s of the random v ariable S takes v alues s = 1 , ..., L . The joint probability of the Marko v pro cess reads Pr[ S N = s N , ..., S 0 = s 0 ] = p ( s N | s N − 1 ) . . . p ( s 1 | s 0 ) p ( s 0 ) = 1 Y k = N p ( s k | s k − 1 ) p ( s 0 ) , (2) where p ( s 0 ) is the initial probability . The conditiona l pr obabilities p ( s k | s k − 1 ) deﬁne the L × L transition matrix P : P s k s k − 1 = p ( s k | s k − 1 ) . (3) W e assume that the Marko v pro cess S is mixing [18]: it has a unique sta tionary distribution p st ( s ), L X s ′ =1 p ( s | s ′ ) p st ( s ′ ) = p st ( s ) , (4) that is established fr om any initial probability in the long time limit. The transition matrix P ha s alwa ys one eigenv alue equal to 1 [since P has a left eigenv ector (1 , ..., 1 )], and the mo dules [a bsolute v a lues ] of all other eigenv alues are not larger than one 1 . The mixing fea tur e ho wev er demands that the e ig env a lue equal to 1 is non- degenerate a nd the mo dules of all other eigenv a lues are smaller than 1 [18]. A suﬃcient c o ndition for mixing is that a ll the conditional probabilities p ( s i +1 | s i ) o f the Ma rko v pro cess are po sitive [18] 2 . T aking p ( s ) = p st ( s ) in (2 ) ma kes the pro cess S stationary . Let random v ariables X i , with realizations x i = 1 , .., M , be noisy observ ations of S i : the (time-in v ar iant) conditional probability of o bserving X i = x i given the realization S i = s i of the Marko v pro cess is π ( x k | s k ). The joint probability of the original pro cess a nd its nois y o bserv ations reads P ( s N , . . . , s 0 ; x N , . . . , x 1 ) = 1 Y k = N π ( x k | s k ) p ( s k | s k − 1 ) p st ( s 0 ) (5) = T s N s N − 1 ( x N ) ...T s 1 s 0 ( x 1 ) p st ( s 0 ) , (6) where the L × L transfer-ma trix T ( x ) with matr ix ele men ts T s i s i − 1 ( x ) is de ﬁned a s T s i s i − 1 ( x ) = π ( x | s i ) p ( s i | s i − 1 ) . (7) Thu s X = {X 1 , X 2 , ... } , called hidden Mar ko v pro cess, results from observing the Ma rko v pro cess S through a memory-less pro cess with the co nditional probability π ( x | s ). The comp osite pro cess S X is Marko vian as well. The probabilities for the pro cess X a re represented via the transfer matrix pro duct (similar repres ent ation were employ ed in [11, 12]) P ( x N . .. 1 ) = h un | T ( x N . .. 1 ) | st i , (8) T ( x N . .. 1 ) ≡ 1 Y k = N T ( x k ) , (9) x N . .. 1 ≡ ( x N , ..., x 1 ) , (10) 1 Indeed, P k P ik x k = ν x i implies | P k P ik x k | ≤ P k P ik | x k | = | ν || x i | , which then leads to | ν | ≤ 1. 2 W eaker suﬃcien t conditions for mixing are that i) f or an y ( i, j ) there exists a positive in teger m ij suc h that ( P m ij ) ij > 0, i.e., for some pow er of the matrix its entries are positive, and ii) P has at least one positive diago nal elemen t [18]. If we do assume the ﬁrst condition, but do not assume the second one, the eigen v alue 1 of P is [algebraically and th us geometrically] non-degen erate, and i s not small er than the absolute v alues of all other eigenv alues [18] . The corresponding [unique] eigenv ecto r has strictly posi tiv e comp onen ts. How ev er, it may b e that the mo dule of some other eigen v al ue(s) is equal to 1 th us preven ting the prop er mixing, but still all owing for ergodicity due to condition i) . 3 where we use d the bra(c)ket notations: | st i is the column vector with elemen ts p st ( k ), k = 1 , ..., L , and h un | = (1 , ..., 1). The HMP deﬁned b y (8) is (in gener al) not a Mar ko v pro ces s, i.e., its probabilities do not factorize a s in (2). Th us the history of the pro cess can b eco me r elev ant. This is the underly ing rea son for widespr e ad applicatio ns o f HMP . The pro cess X is s tationary due to the stationa rity o f S : Pr[ X N + l = x N , ..., X l +1 = x 1 ] = Pr[ X N = x N , ..., X 1 = x 1 ] = P ( x N , ..., x 1 ) , (11) where l is a p o sitive integer. In addition, X inherits the mixing fea ture from the underlying Mar ko v pro ces s S [2], b eca use the o bserv ation pro cess by itself is memor yless: π ( x k | s k ) = π ( x k | s k , s k − 1 , ..., s 0 ). (The ge ne r al deﬁnitions of ergo dicit y and mixing are reminded below.) A. Notations for the eigenv alues and singular v al ues. F or future pur po ses we concre tis e s ome notatio ns. F or a matrix A , let l 0 [ A ] , l 1 [ A ] , .... b e the modules of its eige n- v alues. W e order l k [ A ] as λ [ A ] ≡ l 0 [ A ] ≥ l 1 [ A ] ≥ . . . , (12) λ [ A ] is called the spe ctral radius of A [18]. If A has non-nega tive matrix elements, the s pec tr al radius is an eig env alue by itself [18]. Here are tw o obvious features of the function λ ( d is a p ositive integer): λ [ A d ] = ( λ [ A ]) d , (13) λ [ AB ] = λ [ B A ] , (14) where (14) follows fro m the fa c t that AB and B A hav e identical eig e n v a lues: AB | ψ i = ν | ψ i implies B A ( B | ψ i ) = ν B | ψ i . Let A † be the co mplex conjuga te of A . The singular v alues σ k [ A ] ≥ 0 for a matrix A are the eigenv alues o f a her mitean matrix √ AA † or, equiv alently , of √ A † A ; see Appendix A for a brief reminder on the features o f the singular v alues. W e order σ k [ A ] as σ 0 [ A ] ≥ σ 1 [ A ] ≥ .... (15) II I. ENTROPY AND T Y PICAL SET OF ERG ODIC PROCESSES. The N -blo ck entropy of a s ta tionary [not necessarily Hidden Ma rko v] ra ndo m pro cess X is deﬁned as [5, 6 , 8] H ( N ) = H ( X 1 , ..., X N ) ≡ − X x N ... 1 P ( x N . .. 1 ) ln P ( x N . .. 1 ) , (16) where the probability P ( x N . .. 1 ) is given as in (8), and where x N . .. 1 is deﬁned in (10). V ario us features of H ( N ) and of several related quant ities are discussed in Appendix B. Using (16) one now deﬁnes the entrop y (rate) o f the random pro cess X as [5, 6, 8] h = lim N →∞ H ( N ) N . (17) Alternative representations of h are reca lled in Appendix B. In particular, h is the unce r taint y [p e r unit of time] of the random pro cess g iven its long his tory . F or ergo dic pro cesses the above deﬁnition of en tropy can be related to a single, long sequence of realizations [5 , 6, 8]. First of all let us re c all that the pro cess X is erg o dic if it satisﬁes to the weak law of large num bers (time av erage is equal to the space av erag e): for any function f with a ﬁnite expec ta tion v alue ¯ f ≡ P x k ,...,x 0 f [ x k , ..., x 0 ] P ( x k , ..., x 0 ), we hav e pro bability-one co n vergence for N → ∞ [5 , 6, 8]: 1 N N − 1 X n =0 f [ X n + k , ..., X n ] → ¯ f , (1 8) 4 i.e., for any po sitive num b er s ε and δ , there is such an integer N ( ε, δ ) that for all N > N ( ε, δ ), Pr "      1 N N − 1 X n =0 f [ X n + k , ..., X n ] − ¯ f      ≥ ε # ≤ δ. (19) Several alternative deﬁnitions of ergo dicit y a r e discussed in [3 5] 3 . Now the McMillan lemma sta tes that for an ergo dic pro cess the en tropy (17) characterizes individual realizations in the sense o f pr o bability-one conv ergence for N → ∞ [5, 6 , 8] 4 : − 1 N ln P ( x N . .. 1 ) → h or Pr      − 1 N ln P ( x N . .. 1 ) − h     ≤ ε  ≥ 1 − δ. (20) Based on (20) one deﬁnes the typical set Ω ∗ N ( ε ) as the s et o f all x N . .. 1 , which satisfy to h − ε ≤ − 1 N ln P ( x N . .. 1 ) ≤ h + ε. (21) Now (20) implies that Pr[ x N . .. 1 ∈ Ω ∗ N ( ε )] ≥ 1 − δ , i.e., the ov erall pro ba bilit y of Ω ∗ N ( ε ) conv erges to one in the limit N → ∞ . Since all elements in Ω ∗ N ( ε ) ha ve appr oximately equal probabilities , the num ber of elements | Ω ∗ N ( ε ) | in Ω ∗ N ( ε ) scales a s e N h . More precisely , this num ber is estimated from (20, 21) as [5] (1 − δ ) e N ( h − ε ) ≤ | Ω ∗ N ( ε ) | ≤ e N ( h + ε ) . (22) Relations similar to (2 1) will b e freq uently written a s P ( x N . .. 1 ) ≃ e − N h for x N . .. 1 ∈ Ω ∗ N , (23) meaning that the pr e cise sense of the asymptotic relation ≃ for N → ∞ ca n be clariﬁed upon introducing pro per ǫ and δ . IV. L Y APUNOV EXPONENTS AND ENT R OPY. The purp ose of this section is to establish rela tion (2 9) be tw een the entrop y o f a Hidden Marko v Pr o cess, a nd the sp ectral ra dius of the ass o ciated random matrix pro duct (8). The reader may skip this sectio n, if this relation is taken granted. A. Singular v alues of the random-matrix product. The actual calculation of the entropy h for non-Markov pro cess e s mee ts (in general) considerable diﬃculties. (F or Marko v pro cesses deﬁnition (17) a pplies dir ectly leading to the well-kno wn formula for the en tropy [5].) The ﬁrst step in calculating the entropy h for a Hidden Markov Pro cess (HMP) is to relate h to the larg e - N behaviour of the L × L matrix T ( x N . .. 1 ), which deﬁnes the probability of HMP; see (8, 9). Recall that T ( x N . .. 1 ) is a function o f the random pro cess X . Assume that i) X is stationary , as is the case after (11). ii) The average loga rithm of the maxima l singular v alue of T ( x ) is ﬁnite: h ln σ 0 [ T ( x )] i < ∞ . iii) X is ergo dic. Then the s ubadditive e rgo dic theo rem applies claiming for N → ∞ the proba bility-one conv ergence [19, 20]: − 1 N ln σ k [ T ( x N . .. 1 )] → µ k , k = 0 , . . . , L − 1 , (24) 3 One such deﬁnition i s worth m en tioning: X is ergo dic if f or an y k , m and s : lim N →∞ 1 N P N − 1 n =0 Pr[ X n + k = x k , ..., X n = x 0 , X m + s = y m , .. ., X s = y 0 ] = P ( x k , ..., x 0 ) P ( y m , .. ., y 0 ). This deﬁnition admits a s traigh tforward and imp ortant generalization. X is called mixing if the ab ov e relation holds without the time-av eraging 1 N P N − 1 n =0 , but in the limit n → ∞ . 4 The McMillan lemma cont ains tw o essential steps [5]. First i s to realize that although the deﬁnition (1 8) of ergodicity does not apply directly to 1 N ln P ( x N ... 1 ), i t do es apply to the pr obabilit y Q m ( x N ... 1 ) = P ( x 1 , .. ., x m ) Q N − m i =1 P ( x m + i | x m + i − 1 , .. ., x i ), which deﬁnes an approximat ion of the original ergo dic pro cess by a m -order Mar k ov pro cess. In the second step using a chain of inequalities Pr[ | ln x | ≥ nε ] ≤ 1 nε | ln x | ≤ 1 nε (2 x − ln x ), one prov es that for any stationary [not necessarily er godic] pro cess Q m ( x N ... 1 ) i s indeed a goo d approximation in the sense of 1 N ln Q m ( x N ... 1 ) P ( x N ... 1 ) ≃ 0 for N ≫ m → ∞ . 5 where σ k [ T ( x N . .. 1 )] a r e the singular v alues of T ( x N . .. 1 ) (see section II A for no tations), a nd where µ k are c a lled Lyapuno v exp onents. Acco rding to (15) they are or dered as µ 0 ≤ µ 1 ≤ ... . Using the deﬁnition (21) of the typical set, (24) can b e written as a n asymptotic relation σ k [ T ( x N . .. 1 )] ≃ e − N µ k for x N . .. 1 ∈ Ω N and suﬃcien tly large N [21]. Mor eov er, employing the singular v alue deco mpo sition [see App endix A], one represents T ( x N . .. 1 ) for N → ∞ and x N . .. 1 ∈ Ω ∗ N as T ( x N . .. 1 ) ≃ diag  e − N µ 0 , . . . , e − N µ L − 1  U ( x ) , (25) where diag [ a, . . . , b ] is a dia g onal matrix with entries a, . . . , b , and wher e U ( x ) is an orthogo na l matrix. The fact that (for N → ∞ ) the matrix U do e s not dep end o n N (but do e s in gener a l dep end on the r ealization x ) is a conse q uence of the Oseledec theor em [21, 2 2]. Thu s the meaning of (25) is that the e s sential dependence of T ( x N . .. 1 ) on N is cont ained in the singular v alues e − N µ k , while U ( x ) do es not dep end on N for N → ∞ . B. Eigenv alues of the random-matrix product. The a b ove r easoning by itself is silent ab out the eigenv a lues of T ( x N . .. 1 ). Since the matrix T ( x N . .. 1 ) is in general not nor mal, i.e., the commutator o f T ( x N . .. 1 ) with its transp ose T † ( x N . .. 1 ) is not zero, the mo dules l k [ T ( x N . .. 1 )] of its eigenv alues are not automatically eq ual to its singular v alues e − N µ k ; see Appendix A. F or us the knowledge of the sp ectral radius λ [ T ( x N . .. 1 )] will be impor ta nt , b ecause for calculating the en tropy we s hall employ a metho d that essentially relies on the features (13, 14), which ho ld for the eigenv alues, but do not hold for singula r v a lue s . It is shown in Appendix D that the representation (2 5) c an b e used for deducing that in the limit N → ∞ a nd for x N . .. 1 ∈ Ω ∗ N the spec tral radius λ [ T ( x N . .. 1 )] of T ( x N . .. 1 ) behaves a s [recall (12)] λ [ T ( x N . .. 1 )] ≃ e − N µ 0 , ( 26) where µ 0 is the so called top Lyapuno v exp onent. Appendix D discusses under which generic co nditions (26 ) holds ; see also [23] in this context. Using (8) w e have asymptotically for N → ∞ and x N . .. 1 ∈ Ω ∗ N T ( x N . .. 1 ) ≃ e − N µ 0 | R ( x ) ih L ( x ) | + O [ e − N ν 1 ( x N ... 1 ) ] , (27) P ( x N . .. 1 ) ≃ e − N µ 0 + O (1) + O [ e − N ν 1 ( x N ... 1 ) ] , (28) where we denoted l 1 [ T ( x N . .. 1 )] ≡ e − N ν 1 ( x N ... 1 ) [see (12)], and whe r e | R ( x ) i a nd | L ( x ) i are, r esp ectively , the right and left eigenv ectors of T ( x N . .. 1 ); see Appendix A. They do not dep end on N (for N → ∞ ) for the same re ason as U in (25 ) do es not dep end on N . In writing down (2 7) we assumed tha t the spe ctral r adius λ [ T ( x N . .. 1 )] is not a degenerate eigenv alue of T ( x N . .. 1 ), o r at least that its algebraic and geometric degener acies co incide (see App endix A). In tha t latter case o ne can then use (27) with straightforward mo diﬁca tio ns a nd obtain (28). The term O [ e − N ν 1 ( x N ... 1 ) ] in (27, 2 8 ) can b e neglected for N → ∞ provided that µ 0 > ν 1 ( x N . .. 1 ∈ Ω ∗ N ). The m ultiplicative correction O (1) in (28) comes from the eig env ectors in (2 7). This corr ection can b e neglected if µ 0 stays ﬁnite for N → ∞ . Below we assume that these t wo hypothese s hold. This implies fro m (21) a stra ightforw ard relation b e t ween the entrop y h and the spec tral radius λ [ T ( x N . .. 1 )] of T ( x N . .. 1 ): h = µ 0 = lim N →∞ { − 1 N ln λ [ T ( x N . .. 1 )] } . (29) The r elation b etw een the top Lyapunov exp one nt and the entropy is known [1 1, 12]. The a bove discussion empha - sizes the role of the sp ectral radius in this relation [27]. V. GENERA TING F UNCTION AND A TYPICAL REALIZA T IONS While the entrop y ch ara cterizes typical realiza tions of the pro cess, it is of interest (mainly for a ﬁnite num b e r of realizations ) to describ e atypical rea lizations, those which fall out of the t ypical set Ω ∗ N . T o this end let us intro duce the generating function [8] Λ N ( n, N ) = X x N ... 1 λ n [ T ( x N . .. 1 )] , (30) 6 where n is a non-neg ative num b er. (Note that Λ N ( n, N ) means Λ( n, N ) in degree of N .) The generating function Λ N ( n, N ) is an analog of the partition sum in statistica l physics [8] 5 . W riting Λ N ( n, N ) = X x N ... 1 ∈ Ω ∗ N λ n [ T ( x N . .. 1 )] + X x N ... 1 6∈ Ω ∗ N λ n [ T ( x N . .. 1 )] , (31) one notes that in the limits N → ∞ and n → 1 the second contribution in the RHS of (31 ) can b e neg lected due to deﬁnition (21, 23) of the typicality , and then Λ N ( n, N ) = Λ N ( n ) = e − ( n − 1) N h ; see (27, 28). Here we a lr eady no ted that Λ( n, N ) do es no t dep end on N for N → ∞ , and denoted (in this limit) Λ( n, N ) = Λ( n ). T aking into acco unt tha t Λ(1) = 1, the e n tropy h is calculated via deriv a tive of the ge nerating function: h = − 1 N ∂ Λ N ( n ) ∂ n     n =1 = − dΛ( n ) d n     n =1 ≡ − Λ ′ (1) (32) = − X x N ... 1 λ [ T ( x N . .. 1 )] ln λ [ T ( x N . .. 1 )] . (33) The genera ting function (30) can b e employ ed for estimating the w eight of atypical seq uences. This estimate is known as the Chernoﬀ bound [6, 8], and now we brieﬂy recall its deriv a tion a dopted to our situation. Consider the ov erall weigh t of at ypical seque nce s, whic h have proba bility low er than the typical-sequence proba bility e − N h ; see (21, 2 3). These at ypical sequences are deﬁned to satisfy − ln λ [ T ( x N . .. 1 )] > (1 + η ) N h, (34) where η > 0 qua n tiﬁes the deviation from the t ypical b ehavior. Let P x N ... 1 be the sum ov er all those x N . .. 1 that s atisfy to (34 ). Deﬁne a n a uxiliary pr obability distr ibution e P ( x N . .. 1 | n ) = Λ − N ( n, N ) λ n [ T ( x N . .. 1 )]. The soug ht w eight of the atypical sequences is expressed as ( η > 0 and 0 < n < 1): X x N ... 1 λ [ T ( x N . .. 1 )] = Λ N ( n, N ) X x N ... 1 e P ( x N . .. 1 | n ) e (1 − n ) ln λ [ T ( x N ... 1 )] ≤ e N [ ln Λ( n,N ) +( n − 1)(1+ η ) h ] X x N ... 1 e P ( x N . .. 1 | n ) ≤ e N [ ln Λ( n,N )+( n − 1)(1+ η ) h ] . (35) Eq. (35) leads to the following upper (Cher noﬀ ) b o und for the weigh t of atypical sequences with the pro bability lo wer than the e − N h : X − ln λ [ T ( x N ... 1 )] > (1+ η ) N h λ [ T ( x N . .. 1 )] ≤ e − N f ( η ) , (36) f ( η ) ≡ max 0 0 . (37) Analogously to (35) w e get for the weigh t of the a typical sequences with the proba bility higher than the e − N h (0 < η < 1 ): X − ln λ [ T ( x N ... 1 )] < (1 − η ) N h λ [ T ( x N . .. 1 )] ≤ e − N g ( η ) , (38) g ( η ) ≡ max n> 1 [ ln 1 Λ( n ) + (1 − n )(1 − η ) h ] , η > 0 . (39) The functions f ( η ) and g ( η ) in (37 ) and (39), resp ectively , are called the rate functions [6]. It is seen that f ( η ) and g ( η ) are the Legendre tra nsforms of ln Λ( n ). The latter is a conv ex function of n , d 2 d 2 n ln Λ( n ) ≥ 0 , a s follow fro m its deﬁnition (30). Then f ( η ) and g ( η ) are conv ex a s well [8]. F or example taking in to account that n and η a re related via the extremu m condition d d n ln Λ( n ) = − (1 + η ) h , we get f ′′ ( η ) =  d n d η  2 h d 2 d n 2 ln Λ( n ) i n = n ( η ) ≥ 0. While the above reas o ning is based on the Chernoﬀ b ounds, ther e is another (related, but more formal) appr oach to descr ibing atypical realization, which is known as the measur e c oncentration theory . F or a re cent application o f this theory to HMP s ee [2 4]. 5 Λ( n, N ) is sometimes called the generalized Lyapuno v exponent . It is closely related to the concept of multi-fractality [21]. 7 VI. ZET A FUNCTION A ND ITS EXP ANSION OVER THE PERIODIC ORBITS (CYCLES). A. Zeta function and e nt rop y . In this section we show how to a dopt the method prop osed in [25, 27] for calcula ting the mo men t-genera ting function Λ( n ) (and thus fo r calculating the entrop y h via (32)). The metho d is ba sed on the concepts of the zeta-function and per io dic orbits. Deﬁne the in verse zeta-function as [8, 25, 26, 28] ξ ( z , n ) = exp " − ∞ X m =1 z m m Λ m ( n, m ) # , (40) where Λ m ( n, m ) ≥ 0 is given b y (30). The analogs of (40) are well-kno wn in the theory of dynamic systems; s e e [26] for a ma thema tical in tro duction, and [25, 27, 28] fo r a physicist-oriented discussion. Since for a la rge N , Λ N ( n, N ) → Λ N ( n ), the zeta-function ξ ( z , n ) has a zer o at z = 1 Λ( n ) : ξ ( 1 Λ( n ) , n ) = 0 . (41) Indeed for z close (but smaller than) 1 Λ( n ) , the series P ∞ m =1 z m m Λ m ( n, m ) → P ∞ m =1 [ z Λ( n )] m m almost diverges and one has ξ ( z ) → 1 − z Λ ( n ). Recalling that Λ(1) = 1 and taking n → 1 in 0 = d d n ξ ( 1 Λ( n ) , n ) = − Λ ′ ( n ) Λ 2 ( n ) ∂ ∂ z ξ ( 1 Λ( n ) , n ) + ∂ ∂ n ξ ( 1 Λ( n ) , n ) , (4 2) we get for the en tropy from (32) h = − Λ ′ (1) = − ∂ ∂ n ξ (1 , 1 ) ∂ ∂ z ξ (1 , 1 ) . (43) B. Expansion over the p erio dic orbits. In App endix E 2 we describ e following to [25–28] that under conditio ns (13, 14) one can ex pa nd ξ ( z , n ) ov er the per io dic orbits: ξ ( z , n ) = ∞ Y p =1 Y Γ p ∈ Per( p )  1 − z p λ n [ T ( x γ 1 ) ...T ( x γ p )]  , (44) Γ p ≡ ( γ 1 , ..., γ p ) , (45) where γ i = 1 , ..., M are the indices r eferring to the realizations o f the random pro cess X . The set of per io dic orbits Per( p ) contains sequenc e s Γ p = ( γ 1 , ..., γ p ) selected according to the fo llowing tw o rules: i) Γ p turns to itself after p successive cy c lic p e rmutations o f its elements, but it do e s not turn to itself after an y smaller (than p ) n umber of successive cyclic p ermutations; ii) if Γ p is in P er( p ), then Per( p ) contains none of those p − 1 sequences obtained from Γ p under p − 1 successive cyclic p er mu tations. Concrete examples of Per ( p ) for M = 2 , 3 are g iven in T ables IV and V. It is mor e con venien t to present (44) as an inﬁnite sum [25, 27, 29] ξ ( z , n ) = 1 − z M X l =1 λ n l + ∞ X k =2 ϕ k ( n ) z k , (46) where w e deﬁned λ n α...β ≡ λ n [ T ( x α ) ...T ( x β )] , λ n α + β ≡ λ n [ T ( x α )] λ n [ T ( x β )] , (47) and where ϕ k ( n ) are calculated from (44, 45) and recipes presented in App endix E. These calc ulations become tedious for large v alues of k in ϕ k ( n ). This is why in App endix E 3 it is shown how to generate ϕ k ( n ) via Mathematica 5. 8 F or t wo ( M = 2) rea lizations of the HMP we employ the notations (47) and get for the ﬁrst few terms of the pro duct (44) [consult T able IV for understanding the origin of these ter ms ] ξ ( z , n ) = (1 − z λ n 1 ) (1 − z λ n 2 ) (1 − z λ n 12 ) (1 − z λ n 122 ) (1 − z λ n 112 ) (48) (1 − z λ n 1222 ) (1 − z λ n 1112 ) (1 − z λ n 1122 ) ∞ Y p =5 Y Γ p ∈ Per( p )  1 − z p λ n γ 1 ...γ p  . (49) F or the ﬁrst six terms of the expa ns ion (46) we get ϕ 2 ( n ) = − λ n 12 + λ n 1+2 , (50) ϕ 3 ( n ) = − λ n 221 + λ n 2+21 − λ n 112 + λ n 1+12 , (51) ϕ 4 ( n ) = − λ n 1122 + λ n 2+211 − λ n 1222 + λ n 2+122 − λ n 1112 + λ n 1+211 (52) − λ n 1+2+12 + λ n 1+122 (53) ϕ 5 ( n ) = − λ n 11222 + λ n 1+1222 − λ n 11122 + λ n 2+1112 (54) − λ n 11112 + λ n 1+1112 − λ n 12222 + λ n 2+1222 (55) − λ n 12121 + λ n 1+1122 − λ n 12122 + λ n 2+1122 (56) − λ n 1+2+122 + λ n 12+122 − λ n 1+2+112 + λ n 12+112 , (57) ϕ 6 ( n ) = − λ n 111122 + λ n 1+11122 − λ n 112122 + λ n 1+12122 − λ n 111222 + λ n 1+11222 (58) − λ n 111212 + λ n 1+11212 − λ n 112222 + λ n 1+12222 − λ n 222121 + λ n 2+22121 (59) − λ n 122222 + λ n 2+12222 − λ n 111112 + λ n 1+11112 − λ n 112212 + λ n 2+12121 (60) − λ n 1+12+122 + λ n 1+12122 − λ n 2+12+211 + λ n 12+1122 − λ n 1+12+211 + λ n 12+2111 (61) − λ n 2+12+122 + λ n 12+1222 − λ n 1+2+1222 + λ n 2+11222 − λ n 1+2+2111 + λ n 2+21111 (62) − λ n 1+2+1122 + λ n 122+211 . (63) In section VII E w e s tudy examples, where the ex pansion (46) can be summed exa ctly . In these examples the sum in (46) exp onentially co nv ergences for | z | < α n , where α > 1 is a parameter. As discussed in [28], the exp onential conv ergence of ξ ( z ) is expected to b e a gene r al feature, a nd it is supp or ted by rig orous results on the structure of the zeta-function. 1. The struct ur e of ϕ k ( n ) . Note that ϕ k consists of even num be r of terms. The ter ms are grouped in pairs, e.g., [ − λ n 221 + λ n 2+21 ] + [ − λ n 112 + λ n 1+12 ] for ϕ 3 , and analogo usly for other ϕ k ’s. Ea ch pa ir has the form − λ n A + λ n B , where A and B hav e the same num b er o f symbols 1 and the same n umber o f symbo ls 2 . This fea ture ensures that when the sp e ctral radius o f the pro duct is equal to the pr o duct of the spectr al radii, all the terms ϕ k will v anish. Ultimately , this is the feature that enforc e s the conv ergence of (46) [25, 28]. Once it co nv erges, we can approximate ξ ( z , n ) by a p olynomial of a ﬁnite order. The set o f pair s for ea ch ϕ k can be divided further into several gr oups. The ﬁr st g roup is fo r med by (5 0) a nd (51) for ϕ 2 and ϕ 3 , resp ectively , by (52) for ϕ 4 , by (54–56 ) for ϕ 5 , and by (58 – 60) for ϕ 6 . The pair s in this gro up have the form − λ n Al + λ n A + l , where l = 1 or l = 2 . If A con tains m indices and if m is large, we exp ect ln λ n A = O ( m ) according to the disc us sion in section IV B. Then − λ n Al + λ n A + l → 0 for m → ∞ . (64) The second group is g iven by (53) for ϕ 4 , (57) for ϕ 5 , and by (61, 62) for ϕ 6 . In this second group the terms hav e the for m − λ n A + B + C + λ n A + B C = λ n A ( λ n B + C − λ n B C ). Here the term ( λ n B + C − λ n B C ) has the str ucture of the ﬁrst group. F or B o r/and C cont aining a lar ge n umber of indices, ( λ n B + C − λ n B C ) will go to zer o. Finally the third group app ears o nly for k ≥ 6. F or k = 6 this group has only one pa ir g iven b y (63). The mem b ers of this third g roup are o f the form − λ n A + B + C D + λ n AB D + C . 9 Let us return to (6 4), which holds, in particular , for A consisting of the same type of indices (e.g ., A containing only 1’s). Reca lling our discussions after (28) and after (63), and ex pa nding A ov er its eig env alues and eige nvectors, we c o nclude heuristic al ly that for the conv ergence radius of P ∞ k =2 ϕ k ( n ) z k in (46 ) to b e suﬃciently la rger than 1, it is ne c essary to hav e for the transfer-matrice s T ( x ) (using notations (12)) λ [ T ( x )] 6≈ l 1 [ T ( x )] , λ [ T ( x ) ] 6≈ 1 , (65) i.e., clos er is λ [ T ( x )] to l 1 [ T ( x )] and or λ [ T ( x )] to 1, more ter ms ar e needed in the expa ns ion (46 ) for the relia ble estimate of the ent ropy . Note that if λ [ T ( x )] = l 1 [ T ( x )] > l 2 [ T ( x )], the ﬁrst rela tion in (65 ) should b e mo diﬁed to λ [ T ( x )] 6≈ l 2 [ T ( x )]. W e shall meet such exa mples b elow; see (81 ) and the discuss ion before it. Recall from (43) that fo r calculating the en tropy we need to know ξ ( z , n ) in the vicinit y of z = 1 a nd n = 1. If the qualitative conditio ns (6 5) ar e satisﬁed, we ex pe c t that the v ic init y of z = 1 a nd n = 1 is included in the conv ergence area. The convergence of expa nsions s imila r to (4 6 ) is discuss e d in [2 5, 27, 28]. In par ticular, Refs. [2 5, 27] employ criteria similar to (65) a nd test them numerically . In the context of expansion (46) we should ment ion the r esults devoted to analyticity pr op erties o f the top Ly apunov exp onent [30, 31] and o f the en tropy for HMP [32]. In particula r, Ref. [32] states that the entropy h of HMP is an analytic function of the Markov transition pro babilities (3), provided tha t these pr o babilities are p ositive. At the moment it is unclear for the prese n t a uthor how in gener al this analyticity result can be link ed to the expansion (46). Ho wev er, we show b elow o n concr ete examples that the expansion (46) can b e r ecast in to an expansio n over the Marko v transition probabilities (3). VII. THE SIMPLEST AG GREGA TED MARKO V PR OCESS. A. Deﬁnition. An Agg r egated Markov Pro ces s (so metimes called a Markov source) is a par ticular case of HMP , where the proba - bilities π ( x | s ) in (5) take only tw o v a lues 0 and 1 [2, 5]. Thus it is deﬁned b y the underlying Markov pro ce ss S tog e ther with a deter ministic function F ( s i ) that takes the realizations of the Mar ko v pro ces s to those of the ag grega ted pro- cess: X = ( X 1 , X 2 , ... ) = ( F ( S 1 ) , F ( S 2 ) , ... ). The function F is not one-to - one so that at least tw o realizatio ns of S are lumped tog ether in to one r e alization o f X . The s implest example is given by a Markov pro cess S = {S 0 , S 1 , .... } with thr e e r ealizations S i = 1 , 2 , 3 , such that, e.g., the realiza tions 2 and 3 of S i are no t distinguished from each other and corre sp o nd to one rea lization 2 of the observed pro cess X i [see Fig. 1]: F (1 ) = 1 , F (2) = F ( 3) = 2 , (66) π (1 | 1) = 1 , π (1 | 2 ) = 0 , π (1 | 3 ) = 0 , (67) π (2 | 1) = 0 , π (2 | 2 ) = 1 , π (2 | 3 ) = 1 . (68) The transition matrix of a gener al three-realizatio n Marko v pro cess is [see Fig. 1] P =   1 − p 1 − p 2 q 1 r 1 p 1 1 − q 1 − q 2 r 2 p 2 q 2 1 − r 1 − r 2   , | st i ∝   q 1 ( r 1 + r 2 ) + q 2 r 1 r 2 ( p 1 + p 2 ) + p 1 r 1 p 2 ( q 1 + q 2 ) + p 1 q 2   (69) where all elements of P ar e p ositive, and where we presented the s tationary vector | st i up to the ov erall norma lization 6 . The pro cess X = {X 1 , X 2 , .... } has tw o realizations: X i = 1 , 2. The corresp onding transfer ma trices read from (7) T (1) =   1 − p 1 − p 2 q 1 r 1 0 0 0 0 0 0   , T (2) =   0 0 0 p 1 1 − q 1 − q 2 r 2 p 2 q 2 1 − r 1 − r 2   . (70) Note tha t the second (sub-dominant) eigenv alue of the transfer- matrix pro duct T ( x N . .. 1 ) = Q N k =1 T ( x k ) (with separa te transfer-matr ices deﬁned b y (7 0)) is equa l to zer o , since this eigenv alue can b e pres e n ted a s that of the matrix T (1) A , 6 Note that some authors pr esen t the Marko v transition matri ces P i s suc h a w ay that the elemen ts i n eac h ra w sum to one. This amoun ts to transp osition of (69). The representa tion (69) is p erhaps more famil iar to physicists. 10 1 1 2 3 2 p 1 q 1 p 2 r 1 r 2 q 2 FIG. 1: Schematic represen tation of t he hidden Mark o v process deﬁned by (66–70). The gray squ ares and gra y arro ws indicate, respectively , on the realization of the internal Marko v pro cess and transitions b etw een the realizations; see (69). The circles and black arrows in d icate on the realizations of the observed pro cess. The gray arro ws are probabilistic; the corresp onding probabilities are indicated next to them. The b lac k arro w are d eterministic; see (66). where A is some 3 × 3 matrix . The only exclus io n, which ha s a non- z ero sub-dominant eig env alue, is the realiz a tion of X that do es no t co nt ain 1 a t all: T (2 ... 2) = T N (2). The consider ed HMP (66–70) b elongs to the cla ss of HMP with una mb iguous symbol, since the Marko v r ealization 1 is not cor rupted by the noise; see Fig. 1. F or such HMP , Ref. [32] rep o rts se veral res ults on the analytic features of the en tropy . B. Uniﬁlar pro cess. Before studying in detail the HMP deﬁned by (66–70 ), let us mention one example of HMP , where the e ntropy can b e calcula ted directly [2, 5]. This uniﬁlar pro cess is deﬁned a s follows [5]: for e ach realization s i of the Markov pro cess S consider realizatio ns s j with a strictly po sitive transition proba bilit y p ( s j | s i ) > 0. Now requir e that the realizations F ( s j ) of X j are distinct. Th us given the realization s i of S 1 , there is one to one cor resp ondence b etw een the realizations of ( X 1 , X 2 , ... ) and thos e o f ( S 1 , S 2 , ... ). W rite the blo ck-en tropy of X a s H ( X N , ..., X 1 ) = H ( X N , ..., X 1 |S 1 ) + H ( S 1 ) − H ( S 1 |X 1 , ..., X N ) , (71) where H ( A|B ) ≡ − P a,b Pr( a, b ) ln Pr( a | b ) is the conditional ent ropy of the sto chastic v ariable A given B . Due to the deﬁnition of the uniﬁlar pro cess: H ( X N , ..., X 1 |S 1 ) = H ( S N , ..., S 2 |S 1 ). The latter is work ed out via the Markov feature: H ( S N , ..., S 2 |S 1 ) = ( N − 1) h marko v , (72) h marko v = − X k,l p st ( k ) p ( l | k ) ln p ( l | k ) , (73) where p st ( k ) is the stationar y Markov probability deﬁned in (4), and where p ( l | k ) a re the Mar ko v tr ansition proba- bilities from (3). Since H ( S 1 ) and H ( S 1 |X 1 , ..., X N ) in (71 ) a re ﬁnite in the limit N → ∞ , the entrop y h ( X ) of the uniﬁlar pro cess reduces to tha t of the underlying Mar kov pro cess h marko v [5]. Note that any ﬁnite-or der Markov pr o cess (conv en tionally assuming that the usual Ma r ko v pro cess is of ﬁrst order) can be pr esented as a uniﬁlar pro cess . There are, howev er, uniﬁlar pro ce s ses that do not reduce to a ny ﬁnite-order Marko v pro cess [5] 7 . The main problem in iden tifying uniﬁlar pro cesses is that even if X is no t uniﬁlar for g iven S , it can be s till uniﬁlar with resp ect to another Markov pr o cess S ′ (see section VII C be low for the simplest example). This mak es especially diﬃcult the recognition of uniﬁlar pro cess es that do not reduce to any ﬁnite-or de r Marko v pro cess. 7 The example of suc h a pro cess given i n [5] is not minimal. The minim al example is given by f our-realization M arko v pro cess with non-zero transition probabilities p (4 | 1), p (3 | 4), p (2 | 3), p (1 | 2), p (1 | 1), p (2 | 2), p (3 | 3) and p (4 | 4) (all other transition probabilities are zero), and tw o realizations of X i suc h that F (1) = F (3) = 1, F (2) = F (4) = 2. The uniﬁlar pro cess X do es not reduce to a ﬁnite-order Marko v pro cess, since, e.g., there are tw o diﬀerent mechanisms of pr o ducing the sequence 1 ... 1. This means that P (1 | 111) is not equal to P (1 | 11), et c . 11 C. Par ticular cases. W e now return to the HMP (66–70) and discus s some of its particular cases. 1. F or q 2 = r 2 and q 1 = r 1 all the terms ϕ k with k ≥ 3 in the expansion (4 6) a re zer o. One can chec k that for this case the observed pro cess X is b y itself Ma rko v. 2. F or (1 − q 1 − q 2 )(1 − r 1 − r 2 ) = q 2 r 2 , one ca n chec k that φ k = 0 for k ≥ 4. Now the pro cess X is the second-or der Marko v: P ( x k | x k − 1 , x k − 2 , x k − 3 ) = P ( x k | x k − 1 , x k − 2 ). Thu s at least for thes e tw o cases the ca lculation of the entropy is straightforw ard. The a b ov e t wo facts tend to clarify the meaning of the expa nsion (46 ). It is tempting to s uggest that if the expansion (46) is cut precisely at a p ositive integer K > 2, i.e., ϕ k ≥ K = 0, then the corr e sp o nding pro cess X is K − 2-order Marko vian. If true, this will give conv enien t conditions for deciding on the ﬁnite-o r der Mar kov fea ture, and will mea n that the successive terms in (4 6) are in fac t a pproximations the HMP via ﬁnite-o rder Marko v pro cess es. D. Upp er and low er b ounds for the e nt rop y . Before pr esenting the main results o f this section, let us recall that the entropy of any (stationary) HMP satisﬁes the following inequalities [6] 8 : H ( X 2 |S 1 ) ≤ h ≤ H ( X 2 |X 1 ) ≡ H (2) − H (1 ) , (74) where H ( A|B ) = − P a,b Pr( A = a, B = b ) ln Pr( A = a |B = b ) and H ( N ) are, resp ectively , the conditional entrop y and the blo ck entrop y deﬁned in (16). Employing (5, 7) w e deduce Pr( X 2 = x |S 1 = s ) = L X s ′ =1 T s ′ s ( x ) . (75) This equation together with the stationa r y probability (69) of the Marko v pro cess is suﬃcient for calculating H ( X 2 |S 1 ) for the HMP (66, 7 0): H ( X 2 |S 1 ) = p st (1) χ ( p 1 + p 2 ) + p st (2) χ ( q 1 ) + p st (3) χ ( r 1 ) , (76) χ ( p ) ≡ − p ln p − (1 − p ) ln(1 − p ) . (77) The upper b ound H ( X 2 |X 1 ) is calculated directly from (8, 1 0 , 1 6). E. Generating function and ent rop y: exact resul ts. F or a particular four-para metr ic clas s of HMP (66–7 0) we were able to sum exactly the expansio n (4 6) 9 . This class is character ized by the condition that the t wo leading eigenv a lues o f the transfer-ma trix T (2) in (70) ha ve equal absolute v alues [the third eigenv alue is eq ua l to zero]: λ [ T ( 2)] = λ 1 [ T (2)] . (78 ) A direct insp ection s hows that this condition amounts to tw o po ssible for ms (80) and (88) o f the transition matrix P . These t wo cases are studied below. 1. First c ase. F or this ﬁrst case the transition matrix is obta ine d fro m (7 0 ) under 10 r 2 = 0 and r 1 = q 1 + q 2 . (79) 8 Eq. (74) is a particular case of a sl ight ly more general inequality [6, 10]. F or our purely ill ustrativ e purp oses (74) i s suﬃcient . 9 This was done by hands, c hec king the separate terms of the expansion (44). 10 Or, alternativ ely , via q 2 = 0 and q 1 = r 1 + r 2 . This, how ev er, does not amoun t to an ything new as compared to (80). 12 0.2 0.4 0.6 0.8 1 q 0.1 0.2 0.3 0.4 entropy FIG. 2: En tropy (85) of HMP (69, 70, 80) versus q = q 2 for p 2 = q 1 = 0. Normal line: p 1 = 0 . 5. Thick line: p 1 = 0 . 75. Up p er dashed line: p 1 = 0 . 05. Lo we r dashed line: p 1 = 0 . 01. It is seen that for a small va lue of p 1 , the entropy h is nearly constant for a range of q = q 2 . This leads from (69) to the tra ns ition matrix P =   1 − p 1 − p 2 q 1 q 1 + q 2 p 1 1 − q 1 − q 2 0 p 2 q 2 1 − q 1 − q 2   . (80) It is seen that the rea liz ation {S k +1 = 2 , S k = 3 } for the Mar kov pro cess is prohibited. F or the HMP there ar e no prohibited sequences. The in verse zeta-function reads from (46): ξ ( z , n ) = 1 − [ (1 − p 1 − p 2 ) n + (1 − q 1 − q 2 ) n ] z + [ (1 − p 1 − p 2 ) n (1 − q 1 − q 2 ) n − ( p 1 q 1 + p 2 ( q 1 + q 2 ) ) n ] z 2 + z 3 [ p 1 q 2 ( q 1 + q 2 )] n [ Φ( y , − n, b ) − Φ( y , − n, b + 1) ] , (8 1 ) where w e deﬁned b ≡ (1 − q 1 − q 2 ) p 2 ( q 1 + q 2 ) + p 1 q 1 p 1 q 2 ( q 1 + q 2 ) , (82) y ≡ (1 − q 1 − q 2 ) n z , (83) and where Φ( y , − n, b ) is the Lerch Φ-function: Φ( y , − n, b ) = ∞ X k =0 ( k + b ) n y k . (84) In this repres e ntation, which led to (81), the sum conv erges for | y | < 1 or for z < (1 − q 1 − q 2 ) − n ≥ 1. The conv ergence radius tends to one for q 1 + q 2 → 0 , o r, equiv alently , for λ [ T (2)] → 1; see (70). This violates the se c ond q ualitative condition in (65). Using (43) w e get fro m (81) for the entropy: h = − 1 p 1 + p 2 + q 1 + q 2 + p 1 q 2 q 1 + q 2 { (1 − p 1 − p 2 )( q 1 + q 2 ) ln(1 − p 1 − p 2 ) + (1 − q 1 − q 2 )( p 1 + p 2 + p 1 q 2 q 1 + q 2 ) ln(1 − q 1 − q 2 ) + p 1 q 2 ln[ p 1 q 2 ( q 1 + q 2 ) ] + [ ( p 1 + p 2 ) q 1 + p 2 q 2 ] ln[ ( p 1 + p 2 ) q 1 + p 2 q 2 ] + p 1 q 2 ( q 1 + q 2 ) h Φ ′ [2] (1 − q 1 − q 2 , − 1 , b ) − Φ ′ [2] (1 − q 1 − q 2 , − 1 , b + 1) i } , (85 ) where b is deﬁned in (82 ), and where Φ ′ [2] ( y , − 1 , b ) = ∞ X k =0 ln  1 k + b  ( k + b ) y k . (86) 13 T A BLE I: F or tw o set of parameters of the H MP (66, 79, 80) we present th e exact va lue of en tropy h obt ained from (85), the lo w er bound H ( X 2 |S 1 ), and the upp er b ound H ( X 2 |X 1 ); see (74). h H ( X 2 |S 1 ) H ( X 2 |X 1 ) p 1 = 0 . 75 p 2 = 0 . 10 0.56 9580 0.557243 0.572373 q 1 = 0 . 2 5 q 2 = 0 . 2 0 p 1 = 0 . 30 p 2 = 0 . 20 0.68 4796 0.682486 0.684843 q 1 = 0 . 5 5 q 2 = 0 . 1 0 0 0.05 0.1 0.15 0.2 0.25 Η 0.0005 0.001 0.002 0.0025 0.003 f,g FIG. 3: The rate functions f ( η ) and g ( η ) deﬁn ed by (37) and (39), resp ectivel y for t he HMP given by (70, 88, 89). Normal line : g ( η ). Dashed line : f ( η ). F or th e parameters in (88) w e take: p 1 = 0 . 2 , p 2 = 0 . 3 , q = 0 . 05, and r = 0 . 01. F or t h ese v alues the entrop y (90) is h = 0 . 166671. The b ehavior o f h is illustra ted in Fig. 2 fo r particular v a lues of p 1 , p 2 , q 1 and q 2 . T a ble I compar es the exact expression (85) with the upp er and lower bounds (74). The analytic features of h given by (85) as a function of the Markov transition pro babilities p 1 , p 2 , q 1 and q 2 , agr ee with the res ults o btained in [32]. In particular, note tha t for p 1 + p 2 → 1 the en tropy h b eco mes no n-analytic due to the term ∝ (1 − p 1 − p 2 ) ln(1 − p 1 − p 2 ). 0 0.05 0.1 0.15 0.2 0.25 Η 0.02 0.04 0.06 0.08 0.1 0.12 f,g FIG. 4: The same as in Fig. 3 bu t with q = 0 . 1 and r = 0 . 4. F or these va lues the entrop y (90) is h = 0 . 619519 , which is larger than the entrop y in Fig. 3. 14 T A BLE I I: F or tw o set of parameters of the HMP (69, 70, 88) we p resent the exact v alue of entrop y h obtained from (90), the lo w er b ound H ( X 2 |S 1 ), and th e upp er b oun d H ( X 2 |X 1 ); see (74). The parameters p 1 , p 2 , q and r are tuned suc h that H ( X 2 |S 1 ) and H ( X 2 |X 1 ) provide rather tight b ounds on h . h H ( X 2 |S 1 ) H ( X 2 |X 1 ) p 1 = 0 . 1 p 2 = 0 . 1 0. 528531 0.525 571 0.52853 4 q = 0 . 2 r = 0 . 3 p 1 = 0 . 2 p 2 = 0 . 2 0. 659897 0.656 974 0.65990 1 q = 0 . 3 r = 0 . 4 2. Se c ond c ase. The second po s sibility of s atisfying (78) is given b y q 1 + q 2 = 1 and r 1 + r 2 = 1 , (87) P =    1 − p 1 − p 2 q r p 1 0 1 − r p 2 1 − q 0    . (88) The realizatio ns of the cor resp onding Marko v pro c e s s do not c ontain {S k +1 = 2 , S k = 2 } and {S k +1 = 3 , S k = 3 } . Again, the realizations of the HMP do not hav e any prohibited seq uence. The in verse zeta-function reads from (46) ξ ( z , n ) = 1 − h (1 − p 1 − p 2 ) n + (1 − q ) n/ 2 (1 − r ) n/ 2 i z + h − ( p 1 q + p 2 r ) n + (1 − p 1 − p 2 ) n (1 − q ) n/ 2 (1 − r ) n/ 2 i z 2 + z 3 1 + z (1 − q ) n/ 2 (1 − r ) n/ 2 h ( p 1 q + p 2 r ) n (1 − q ) n/ 2 (1 − r ) n/ 2 − ( p 1 r (1 − q ) + p 2 q (1 − r ) ) n i . (89) The series tha t led to (89) conv erges for | z | < (1 − q ) − n/ 2 (1 − r ) − n/ 2 . Again the co nv ergence radius going to one violates the second qualitative condition in (65). Eqs. (32, 89) imply fo r the sour ce entrop y: h = − 1 2( p 1 + p 2 ) + q (1 − p 1 ) + r (1 − p 2 ) − q r { [ q (1 − r ) + r ](1 − p 1 − p 2 ) ln(1 − p 1 − p 2 ) +( p 1 + p 2 ) (1 − q ) (1 − r ) ln[(1 − q )(1 − r )] + ( p 1 q + p 2 r ) ln [ p 1 q + p 2 r ] +[ p 2 q (1 − r ) + p 1 (1 − q ) r ] ln[ p 2 q (1 − r ) + p 1 (1 − q ) r ] } . (90) Applying the general deﬁnition (73 ) o f the Ma rko v entrop y to the particular case (69) w e g et for the Mar kov entropy h marko v = − 1 2( p 1 + p 2 ) + q (1 − p 1 ) + r (1 − p 2 ) − q r { [ q (1 − r ) + r ][ (1 − p 1 − p 2 ) ln(1 − p 1 − p 2 ) + p 1 ln p 1 + p 2 ln p 2 ] [(1 − r )( p 1 + p 2 ) + p 1 r ] [ q ln q + (1 − q ) ln(1 − q )] [ p 2 + p 1 (1 − q )] [ r ln r + (1 − r ) ln (1 − r )] } . (91) Comparing (90, 91) one can chec k [e.g., numerically] that h marko v > h , as sho uld b e, since lumping several states together decr eases the en tropy . T able I I compares the exact v a lue (90) for the entrop y with the upp e r and lower bo unds (74). 15 3. R ate functions for lar ge deviations. Recall that the rate function f ( η ) ( g ( η )) deﬁned in section V, describ e the weigh t of atypical sequences with the probability smaller (larger) than the typical sequence proba bilit y e − N h . The p o sitive parameter η deﬁnes the a mo unt of this sma llness (largeness); see (37) a nd (3 9). The calcula tio n of f ( η ) and g ( η ) for the co nsidered HMP mo del (88, 70 ) is straig h tforward. One ﬁnds out the zer o of the ξ -function given by (89). This will deﬁne, via (41), the moment-generating function Λ( n ). If there are several zeros of ξ ( z , n ) as a function of z , we select the one that go es to z = 1 for n → 1. Then f ( η ) and g ( η ) ar e ca lculated from their deﬁnitions (37) and (39 ). The b ehavior of f ( η ) a nd g ( η ) as functions of η is presented in Figs. 3 and 4. F or each ﬁgure we take diﬀerent sets of parameters p 1 , p 2 , q and r ; see (88) for their deﬁnition. T o make this diﬀerence explicit let us denote f 3 ( η ), g 3 ( η ) and f 4 ( η ), g 4 ( η ) for Fig. 3 a nd Fig. 4, resp ectively . Now let us observe tha t f 3 ( η ) < f 4 ( η ) , g 3 ( η ) < g 4 ( η ) , (92) g 3 ( η ) > f 3 ( η ) , g 4 ( η ) < f 4 ( η ) . (93) F or explaining these inequalities we note that for the parameters o f Fig. 3 the entropy is smaller than h in Fig. 4: h 3 < h 4 , (94) which means that the typical set Ω ∗ N for Fig. 4 contains mor e s equences, so there remains less of them outside, which may expla in (92). F or the s ame rea son (94), the probability of each typical sequence is higher for the para meter s in Fig. 3 . Th us for the para meters presented in Fig. 3 more high- probability sequences are included in the co rresp onding t ypical set Ω ∗ N . This may explain (93 ). In further num erical chec kings it was no ted that the above relation b etw e en (92 ) and (93) from one side, and (94) from another side, see ms to b e muc h mo re general tha n these particular examples. VII I. BINAR Y SY MMETRIC HIDDEN MARKO V PR OCESS. A. Deﬁnition and symme tries. This is ano ther popula r (and simple to deﬁne) exa mple of HMP . Now the Markov pro cess has t wo states 1 and 2. The realiz ations of the observed (Hidden Marko v) pro cess also tak e tw o v a lues 1 and 2. The internal Ma rko v pro cess is dr iven by the conditiona l pr o bability P =    p (1 | 1) p (1 | 2) p (2 | 1) p (2 | 2)    =    1 − q q q 1 − q    . (95) The stationary probability for this Marko v pro ces s is found v ia (4): p st (1) = p st (2) = 1 2 . The probabilities for the obs e rv a tio ns 1 or 2 given the in ternal state read π ( x i | s i ) =    π (1 | 1) π (1 | 2) π (2 | 1) π (2 | 2)    =    1 − ǫ ǫ ǫ 1 − ǫ    , (96) where ǫ is the er ror probability dur ing the o bserv ation. F or the transfer matrices w e hav e: T (1) =    ǫ (1 − q ) ǫq (1 − ǫ ) q (1 − ǫ )(1 − q )    , T (2) =    (1 − ǫ )(1 − q ) (1 − ǫ ) q ǫq ǫ (1 − q )    . (97) T (2) is obtained from T (1) v ia ǫ → 1 − ǫ . 16 T A BLE I I I: F or tw o sets of the parameters q and ǫ of th e bin ary sy mmetric HMP (95, 96, 97) we present the en tropy h obt ained by approximating (46) via a p olynomial or order 2, 13 an d 12, resp ectively . These v alues are denoted b y h 2 , h 13 and h 12 . W e compare h k with the low er b ound H ( X 2 |S 1 ), and the up p er bou n d H ( X 2 |X 1 ); see ( 74). It is seen that th e relativ e diﬀerence h 13 − h 2 h 13 is not larger th an 0 . 02. h 2 h 13 h 12 H ( X 2 |S 1 ) H ( X 2 |X 1 ) q = 0 . 2 ǫ = 0 . 45 0.68 7811 0.69 3108 0.693100 0.691346 0.693129 q = 0 . 25 ǫ = 0 . 4 0.681322 0.692884 0.692881 0.688139 0.692947 The following symmetry features are deduced directly fr o m (9 5–97): (1) F or any N the probability P ( x N , . . . , x 1 ; q , ǫ ) of the binary symmetr ic HMP is inv ariant with respect to ǫ → 1 − ǫ : P ( x N , . . . , x 1 ; q , ǫ ) = P ( x N , . . . , x 1 ; q , 1 − ǫ ). (2) The probability P ( x N , . . . , x 1 ; q , ǫ ) is inv a r iant with respect to the full ”inversion” o f the realization ( x N , . . . , x 1 ), e.g. P (1 , 2 , 1 , 1; q , ǫ ) = P (2 , 1 , 2 , 2 ; q , ǫ ). (3) In g e neral, the probability P ( x N , . . . , x 1 ; q , ǫ ) is not in v a riant with resp ect to q → 1 − q , e.g., P (1 , 2; q , ǫ ) − P (1 , 2 ; 1 − q , ǫ ) = 1 2 (1 − 2 ǫ )(2 q − 1). Ho wev er, for each given realization ( x N , . . . , x 1 ) o ne can ﬁnd another unique realization ( ¯ x N , . . . , ¯ x 1 ) such that P ( x N , . . . , x 1 ; q , ǫ ) = P ( ¯ x N , . . . , ¯ x 1 ; 1 − q , ǫ ). The logics of rela ting ( x N , . . . , x 1 ) to ( ¯ x N , . . . , ¯ x 1 ) should b e clear from the following exa mple: if ( x 4 , . . . , x 1 ) = (1 , 2 , 2 , 1 ), then ( ¯ x 4 , . . . , ¯ x 1 ) = (2 , 2 , 1 , 1 ). In mo r e detail, ¯ x 4 = 2 is deﬁned to be diﬀerent from x 4 = 1, and o nce x 3 = 2 is diﬀer ent from x 4 = 1, ¯ x 3 = 2 do es not diﬀer from ¯ x 4 = 2, etc . It s hould be clear (e.g., by induction) that for a given ( x N , . . . , x 1 ), ( ¯ x N , . . . , ¯ x 1 ) is indeed unique. This feature means, in particular , that the entropy h o f the binary symmetric HMP—b eing acco rding to (16, 17) a symmetric function o f all pr obabilities P ( x N , . . . , x 1 )—is inv ariant with r e sp e ct to q → 1 − q : h ( q , ǫ ) = h (1 − q , ǫ ), in addition to being inv aria nt with resp ect to ǫ → 1 − ǫ . (4) In g e neral, the pro babilities P ( x N , . . . , x 1 ) are not inv aria n t with resp ect to a cyclic in terchange of the r ealiza- tions, e.g., P (1 , 2 , 1; q , ǫ ) − P (1 , 1 , 2; q , ǫ ) = 1 2 (1 − 2 ǫ ) 2 q (2 q − 1). F or the cons idered binary symmetric HMP we did not ﬁnd any ex actly so lv able situation. Th us, we employ ed (46) and ca lculated ξ ( z , n ) b y a pproximating the inﬁnite sum in the RHS o f (46) via a p olyno m of o rder K : P K k =2 ϕ k ( n ) z k 11 . This approximation w as suggested in [25] and it is based on the fact that the sum supp osed to c o nv erge ex p o nen- tially a t least in the vicinity of z = 1 a nd n = 1. This is what we saw for the ex a ctly so lv able situations (8 1) and (89). The qualitative criterion for the exp onential conv erges w as suggested in [25, 27] and was discussed by us ar ound (65). Since bo th transfer-matric es in (97) ha ve the same eigenv a lues 1 2 h 1 − q ± p q 2 + (1 − 2 q )(1 − 2 ǫ ) 2 i , (98) for the studied bina ry symmetric HMP ther e are several cases, where the [q ualitative] conditions (6 5 ) are violated: i) q → 0 and ǫ → 1 2 ; ii) q → 1; iii) q → 0 and ǫ → 0. In these three ca ses we exp ect that that approximating ξ ( z , n ) by P K k =2 ϕ k ( n ) z k will no t be feasible, since large v alues of K will b e requir ed to achiev e a reasona bly high precision. Fig. 5 and T able I I I present the results fo r the entrop y obtained in the a b ove approximate way and compare them with the upper and low er b ounds, as given by (74 ). B. Small-noise limit. F or ǫ = 1 2 or for q = 1 2 the pro ces s b eco mes memory-le ss: P ( x 1 , ..., x N ) = P ( x 1 ) ...P ( x N ). Here a ll the functions ϕ k in (46) are equal to zero . Another par ticular ca se is the limit ǫ → 0 (no noise), where the hidden Marko v pro cess degenerates in to the origina l Marko v pro cess. It is stra ightforw ard to check that in (46 ) for the entrop y only the term φ 2 is diﬀerent from zer o, while φ k = 0 for k ≥ 3. This pr o duces the well-known ex pression (73) for the entropy of a Marko v pro cess. 11 The terms in this expansion can p erhaps b e r e-arranged so as to facilitate the conv ergence . Since in the presen t pap er the numerical calclations serve mainly illustrative purposes, we shall not dwe ll int o this asp ect. 17 0 0.2 0.4 0.6 0.8 1 Ε 0.3 0.4 0.5 0.6 0.7 entropy FIG. 5: Entrop y of t h e binary hidden Marko v chai n (normal line) versus the error probabilit y ǫ for q = 0 . 1. D ashed lines: upp er and lo w er b ound s for the entrop y as given by (74). The entrop y is calculated from (46, 43) appro ximating the inﬁnite sum in (46) by a p oynomial or th e order 13. Let us work o ut the vicinit y of ǫ = 0, assuming that ǫ is sma ll (qua si-Markov situation). One can chec k that ϕ k = O ( ǫ k − 2 ) for k ≥ 3 . (99) Thu s for ﬁnding the entropy and the ge ne r ating function within the or der O ( ǫ 2 ), we need to expand ϕ k with k = 1 , 2 , 3 , 4 ov er ǫ and s elect all the terms o f orde r O ( ǫ ) and O ( ǫ 2 ). W e write down explicitly the approximation of ξ ( z , n ) via the p olynom of or der 4 (hig her-order ter ms ϕ k ≥ 5 are not needed, since they do not co nt ribute to the order O ( ǫ 2 )): ξ ( z , n ) = 1 + z ϕ 1 ( n ) + z 2 ϕ 2 ( n ) + z 3 ϕ 3 ( n ) + z 4 ϕ 4 ( n ) + O ( z 3 ) . (100) Using (98) and (50 –53) w e get after s tr aightforw ard algebra ic calculations (taking for simplicit y q < 1 2 ) ϕ 1 ( n ) = − 2 (1 − q ) n + 2 ǫ n (1 − q ) n − 2 (1 − 2 q ) − ǫ 2 n (1 − q ) n − 4 (1 − 2 q ) { (1 − 2 q )( n − 1 − q ) + q } + O ( ǫ 3 ) , (101) ϕ 2 ( n ) = (1 − q ) 2 n − q 2 n − 2 ǫ n ( 1 − 2 q ) h (1 − q ) 2 ( n − 1) + q 2 ( n − 1) i − ǫ 2 n (1 − 2 q ) h q 2 ( n − 2) { (1 − 2 q )( q + 2 n − 3) − q } +(1 − q ) 2( n − 2) { (1 − 2 q )( q + 1 − 2 n ) − q } i + O ( ǫ 3 ) , (102) ϕ 3 ( n ) = 2 ǫ n (1 − 2 q ) 2 (1 − q ) n − 2 q 2( n − 1) − ǫ 2 n (1 − 2 q ) 2 (1 − q ) n − 4 q 2( n − 2)  5 − 3 n + 4 q (3 n − 5) + 2 q 2 (16 − 7 n ) +4 q 3 ( n − 6 ) + 10 q 4  + O ( ǫ 3 ) , (103) ϕ 4 ( n ) = ǫ 2 n (1 − 2 q ) 3 (1 − q ) 2( n − 2) q 2( n − 2) [ 2 − 4 q (1 − q ) − n (1 − 2 q ) ] + O ( ǫ 3 ) . (104 ) Note that a ll ǫ c o rrections n ullify for q = 1 2 , once in this limit we should g et a memory - less pro cess. These e quations pro duce for the entropy from (100, 4 3): h = − (1 − q ) ln(1 − q ) − q ln q (105) − 2 ǫ (1 − 2 q ) ln  1 − q q  (106) − 2 ǫ 2 (1 − 2 q )  ln  1 − q q  + 1 − 2 q 4(1 − q ) 2 q 2  + O ( ǫ 3 ) . (107) 18 Eq. (10 5) is just the Markov entrop y (73) o btained in the limit ǫ = 0. Eqs. (106) is the ﬁr st cor rection to the Marko v situatio n; it is obtained in [11, 1 3]. The second cor rection (107) is r ep orted in [15]. The author s of [15] also obtain the higher-o rder corr ections employing the mapping of the binary symmetric HMP to the one-dimensiona l Ising mo del. These higher- o rder correc tion can b e a lso obtained within the pr esent metho d. Thus w e demonstr ated that the small-noise (quasi-Markov) situation can b e a dequately ex plored with the present method. In addition w e obtain the small-nois e expressions (101 -104) for the zeta -function. This result is new and it allows to ﬁnd the momen t-genera ting function, which contains more information than the e ntropy , e.g., (100–104 ) ca n b e used for approximating the rate functions (37) and (39). In particular, for the g enerating function we get from (41) and (101–104 ) Λ( n ) = q n + (1 − q ) n − ǫn (1 − 2 q )  (1 − q ) 2 n q 2 − (1 − q ) 2 q 2 n  q 2 (1 − q ) 2 [ (1 − q ) n + q n ] + O ( ǫ 2 ) . (108) IX. SUMMAR Y. In this pap er w e studied the entropy and the momen t-genera ting function of Hidden Markov Pro cesses (HMP). The fac t that these pro cesses mo del non-Mar ko v memor y is at the origin of their numerous applica tions, and, simul- taneously , the main r eason of diﬃculties in characteriz ing their en tropy and the moment-generating function. Recall that the entrop y gives the num ber of sequenc e s in the typical set o f the r a ndom pro cess [6, 8]; the t ypical s et is the smallest set o f realizations with the ov erall pr obability clos e to one. Alter na tively , the en tropy is the unce r taint y [p er time-unit] of the pro cess given its long histor y . The gener ating function allows to estimate the [sma ll] proba bilit y of atypical sequences via the Cherno ﬀ bound and the rate functions [6, 8]. The entrop y of HMP w as studied via upp er and low er b ounds [6, 10], ex pa nsions over small par ameters [15–17], and via expressing the en tropy as a so lution o f an in tegral equation [7, 8, 1 0–1 4]. Here we prop osed to calculate the entrop y and the moment-generating function of HMP via the cycle e x pansion of the zeta-function, a method adopted from the theory of dy namical sy stems [25, 27, 29]. I show that this method has tw o basic a dv antages. First, it pr o duces exact r esults, b oth for the en tropy and the moment-generating function, for a cla ss o f HMP . W e did not s o far got into any systematic wa y of s earching for the ex a ct so lutio ns within this metho d. The ex a mples of exact solutions pres e n ted in s ection VI I E were obtained in the most stra ightforw ard way . Second, even if no ex a ct solution is found, the metho d oﬀers an ex pa nsion for the entrop y and the moment-generating function via an exp onentially conv ergent p ow er series [2 5, 2 7, 2 9]. Cutting oﬀ these expansio ns at so me ﬁnite order gives normally an improv able approximation for the s o ught quantit ies, es pe c ia lly since there are qualitative estimates for the con vergence radius of the series. This was demons trated in section VI I I. As a by-product of this s tudy , we conjectured in section VI I C o n tentative conditions under which HMP reduces to a ﬁnite-order Mar ko v pro cess. These conditions compa re fav orably with those existing in liter ature, s e e e.g. [34], and they deserve further e xploration. W e a lso conjectured relatio ns (92–9 4) b etw e en the rate functions of the random pro cess and its en tropy . Ackno wledgments I thank David Saakian for a rousing m y in terest in this problem. The work was suppo rted by V olks wagenstiftung gra nt ”Quantum Thermo dynamics: Energ y and infor ma tion ﬂow at na noscale” . [1] L. R. Rabiner, Pro c. IEEE, 77 , 257-286, (1989). [2] Y. Ephraim and N. Merha v, IEEE T rans. Inf. Th., 48 , 1518-1569, (2002). [3] M. Crouse, R. Now ak and R. Baraniuk, IEEE T ran. Signal Process., 46 , 886 (1998). [4] T. Koski, Hidden Markov Mo dels f or Bi oinformatics (Kluw er, Academic Publishers, Dordrech t, 2001). P . Baldi and S. Brunak, Bi oinformatics (MIT Press, Cambridge, USA, 2001). [5] R. Ash, Inf ormation The ory (Interscience Publishers, NY, 1965). [6] T. M. Co ver and J. A. Thomas, Elements of Information The ory , (Wiley , New Y ork, 1991). [7] D. Blac kw ell, The entr opy of functions of ﬁnite-stat e markov chains , in T rans. First Prague Conf. I n f. Th., Statistical Decision F unctions, Random Pro cesses, p . 13 (Pub. H ouse Chec hoslo v ak Acad. Sci., Prague, Czec hoslo v ak ia, 1957). [8] R.L. Stratonovic h, Information The ory (So vietskoe Radio, Moscow , 1976) (In Ru ssian). 19 [9] M. Rezaeian, Hidden Markov Pr o c ess: A New R epr esentation, Entr opy Ra te and Estimation Entr opy , arXiv:cs.IT/06061 14. [10] I.J. Birc h, Ann . Math. Stat. 33 , 930 (1962). [11] P . Jacquet, G. Seroussi, and W. Szpanko wski, On the Entr opy of a H i dden Markov Pr o c ess , Int. Sy mp. Inf. Th. p. 10, Chicago, IL, 2004. [12] T. Hollida y , A. Goldsmith and P . Glynn, I EEE T rans. Inf. Th. 52 , 3509 (2006). [13] E. Ordentlic h and T. W eissman, IEEE T rans. I nf. Th., 52 , 19 (2006). [14] S. Egner et al. , On the Entr opy R ate of a Hidden Markov Mo del , I nt. Sy mp. I n f. Th., p. 12, Chicago , IL (2004). [15] O. Zuk, I. Kanter and E. Domany , J. Stat. Phys. 121 , 343 (2005). [16] O. Zuk, I. Kanter, E. Domany and M. Aizenman, IEEE Signal Pro cessing Letters, 13 , 517 (2006). [17] P .Chigansky , The Entr opy R ate of a Binary Channel with Slolw ly V arying Input , arXiv:cs/0602074 . [18] R. A. Horn and C. R. Johnson, Matrix Analysis (Cambridge Universit y Press, New Jersey , USA, 1985). [19] J.F.C. Kingman, Ann. Probab. 1 , 883 (1973). [20] J.M. Steele, Ann ales d e l’I.H.P . B, 25 , 93 (1989). [21] A. Crisan ti, G. P aladin and A. V ulpiani, Pr o ducts of R andom Matric es in Statistic al Physics , S pringer Series in Solid State Sciences, V ol. 104, (Sp rin ger, Berlin, 1993). [22] L.Y. Goldsheid and G.A. Margulis, Russ. Math. Su rveys 44 , 11 (1989). [23] S.A. Orszag, P .L. Sulem an d I. Goldirsc h, Physica D 27 , 311 (1987). [24] L. Kontoro vic h, Me asur e Conc entr ation of Hidden Markov Pr o c esse s , arXiv:math/060806 4. [25] R. Artuso. E. Au rell an d P . Cvitanovic, Nonlinearit y 3 , 325 (1990). P . Cvitanovic, Phys. Rev. Lett. 61 , 2729 (1988). [26] D. Ruelle, Statistic al Me chanics, Thermo dynamic F ormalism , (Readin g, MA: Ad dison-W esley , 1978). [27] R. Mainieri, Chaos 2 , 91 (1992). [28] E. Aurell, J. Stat. Phys., 58 , 967 (1990). [29] J. N ielsen, Lyapunov exp onents for pr o ducts of r andom matric es , preprin t av ailable at http://ci teseer.ist.psu.edu/438423 .html. [30] L. Arnold, V. M. Gun dlac h and L. Demetrius, Ann. Appl. Probab., 4 , 859 (1994). [31] Y. P eres, An n. Inst. H. Poincare Probab. Statist., 28 , 131 (1992). [32] G. Han and B. Markus, IEEE T rans. I nf. Th., 52 , 5251, ( 2006). [33] I learned about th e funct ion ListNecklaces2 from th e e- mail exchange presen ted in http://forums.w olfram.com/student- supp ort/topics/6401 [34] L. Gurvits and J Ledoux, Linear Algebra and A pplications, 404 , 85 (2005). [35] K. P etersen, L e ctur es on Er go di c The ory , a v ailable from http://ww w.math.unc.edu/F aculty/petersen/lecturesp df.p df APPENDIX A : RECOLLECTION OF SOME F A CTS ABOUT T HE EIGEN-REPRESENT A T ION VERSUS SINGULAR V ALUE DECOMPOSITION. A matrix A ca n b e diag onalized if [18] A = V D V − 1 , (A1) where D is a diag onal matr ix , and where V is an arbitrar y inv ertible matrix . W riting the eigen-resolution o f D , D = P k α k | α k ih α k | , wher e h α k | α n i = δ kn , one gets A = X k α k | R k ih L k | , (A2) where α k are the eig env alues o f A (i.e., the solutions of det ( A − α 1) = 0), a nd where | R k i and | L k i are, resp ectively , the right and left eigenv ectors: A | R k i = α k | R k i , h L k | A = α k h L k | , h L k | R n i = δ kn . (A3) Note that in ge ne r al h L k | L n i 6 = δ kn . The rig ht a nd left eigen vectors c o incide for normal matrices [ A, A † ] = 0 ( A commutes with its complex conjugate). F or those matrices V is unitary . Not every ma trix can b e diago nalized, a necess a ry and suﬃcient condition for this is that for ea ch eigenv a lue the algebraic deg eneracy (i.e., dege ne r acy of this eigenv alue a s the ro o t of the characteristic p olynom) coincides with the geometric degenera cy (the n umber o f eigenv ectors cor resp onding to this eige n v a lue; geometric degener acy cannot b e larger than the algebraic one). Thus a suﬃcient condition for a matrix to b e diago nalizable is that its eigenv alues are not deg enerate. Here is a mor e general suﬃcient condition: Any matrix that commutes with a matrix with non-degenera te eigenv alues, is diagonalizable [1 8]. If for o ne eigen v a lue α o f A the algebr aic and geometr ic degeneracie s are equal (say to m ), then A = V αI m × m 0 0 A ′ ! V − 1 , (A4) 20 where I m × m is the m × m unit matrix. An alternative representation for the ma trix A is given b y the singular v a lue decomp o sition. No te that if det A 6 = 0 , the matrix A [ A † A ] − 1 / 2 is unitary . Then it holds A = U [ A † A ] 1 / 2 , (A5) where U is unitary . Eq . (A5) ho lds also for det A = 0 via the co n tinuit y . Going to the eigen-re s olution o f the hermitian matrix A † A , w e see that for any matrix A there is a sing ular v alue decomp osition: A = X k σ k | u k ih v k | , (A6) A | v k i = σ k | u k i , h v k | v n i = δ kn (A7) h u k | A = σ k h v k | , h u k | u n i = δ kn , (A8) where σ k (singular v alues of A ) is the common eig env alue sp ectrum of √ AA † and √ A † A . F or a given diago na lizable matr ix A , its singular v alue deco mp os ition is related to the eigen-r esolution via [1 8] h v n | R k i σ n = α k h u n | R k i , (A9) h u n | L k i σ n = α ∗ k h v n | L k i . (A10) The matrix A is nor mal if and only if | α k | = σ k . (I did not ﬁnd any standa rd reference on the fact that | α k | = σ k leads to no r mality; the pro of I got m yself is too tedious to b e presented here). Singular v a lues and eigenv alues a r e rela ted v ia the W eyl inequalities. F or a given matrix A , order the absolute v alues of its eigenv a lues as l 0 ≥ l 1 ≥ ... ≥ l n , a nd o r der its singular v alues a s σ 0 ≥ σ 1 ≥ ... ≥ σ n . The W eyl inequalities then read: m Y k =0 σ k ≥ m Y k =0 l k , m Y k =0 σ n − k ≤ m Y k =0 l n − k , (A11) m X k =0 σ ρ i ≥ m X k =0 l ρ i , ρ > 0 . (A12) F or n = m , (A11 ) leads to equa lity: Q n k =0 σ k = Q n k =0 l k . APPENDIX B: ADDITIONAL FEA TURES OF THE ENT R OPY. Recall the deﬁnitions (17 ) and (16 ) o f the entropy h a nd the blo ck entropy H ( N ) = H ( X N , ..., X 1 ), resp ectively , for the stationary pro cess X . Deﬁne: h ( N ) = H ( N ) − H ( N − 1) = H ( X N |X N − 1 , ..., X 1 ) . (B1) h ( N ) [sometimes calle d inno v a tio n entropy] is the uncertaint y o f X N given its histo ry X N − 1 , ..., X 1 . It is clear that once lim N →∞ H ( N ) N exists, h ( N ) co nverges to the so ur ce en tropy for N → ∞ . One can show that [6] H ( N ) N ≥ h ( N ) ≥ h ( N + 1) ≥ h. (B2) T o derive the second inequality in (B2) no te that the stationar ity and the ent ropy reduction due to conditioning imply h ( N ) = H ( X N |X N − 1 , ..., X 1 ) = H ( X N +1 |X N , ..., X 2 ) ≥ H ( X N +1 |X N , ..., X 1 ) = h ( N + 1) . (B3) The ﬁrst inequalit y in (B2) is shown as follows. H ( N ) N = 1 N H ( X 1 ) + 1 N N X i =2 H ( X i |X i − 1 , ..., X 1 ) ≥ 1 N N X i =1 H ( X N |X N − 1 , ..., X 1 ) = h ( N ) , (B4) where the ﬁrst equality is the obvious chain rule fo r the conditional information, while the s econd inequality in (B4) follows fro m the stationarity H ( X 1 ) = H ( X N ), a nd then fro m the same rea soning as in (B3). The last ine q uality in (B2) is no w ob vious. 21 The meaning of H ( N ) N ≥ h ≡ lim N →∞ H ( N ) N is that taking into ac count all the co rrelations dec reases the entrop y . In a rela ted context, h ( N ) ≥ h ( N − 1) mea ns that the innov atio ns decrease under a ccumulation o f exp er ience. This inequality can be employed for putting an upper b ound for H ( N + 1) in terms of H ( N ) and H ( N − 1): 2 H ( N ) − H ( N − 1) ≥ H ( N + 1) ≥ H ( N ) . (B5) Note also tha t H ( N + 1) = H ( N ) + h ( N + 1) ≤ H ( N ) + H ( N +1) N +1 leads to H ( N + 1) N + 1 ≤ H ( N ) N , (B6) i.e., the uncertaint y p er step decreases when incr easing N . APPENDIX C: ERGO DIC FEA T U RES OF THE SINGULAR V ALUES FOR A RAND OM MA TRIX PR ODUCT. Let us recall some imp ortant featur e s of the Lyapuno v exp onents of the rando m matrix pr o duct (8). Employ the known relation b etw een the singular v alues o f AB versus those of A and B [18] m Y k =0 σ k [ AB ] ≤ m Y k =0 σ k [ A ] σ k [ B ] , (C1) where 0 ≤ m ≤ L − 1, and where the o rdering (15) is ass umed: σ 0 [ A ] ≥ σ 1 [ A ] ≥ . . . . Now recall deﬁnitions (9, 10). Applying (C1) with m = 0 to T ( x N . .. 1 ) w e get ( M < N ) ln σ 0 [ T ( x N . .. 1 )] ≤ ln σ 0 [ T ( x M − 1 ... 1 )] + ln σ 0 [ T ( x N . ..M )] . (C2) Thu s, ln σ 0 [ T ( x N . .. 1 )] is sub- a dditive. T o gether with the a ssumptions i) , ii) and iii) of section IV A, Eq. (C2) ensures the applicability of the sub-additive erg o dic theorem [19, 20]. This leads (for N → ∞ ) to the pro bability-one conv ergence (24): − 1 N ln σ k [ T ( x N . .. 1 )] → µ k , (C3) for k = 0. Applying in the same way (C1) with m = 1 to T ( x N . .. 1 ), w e us e the sub-additivity for ln ( σ 0 [ T ( x N . .. 1 )] σ 1 [ T ( x N . .. 1 )]), deduce (24) for k = 1, and so o n. It is clear that we co uld not employ the s ub- additivity directly for l k [ T ( x N . .. 1 )] (mo dules of the eigenv alues), since they in general do not satisfy to anything like (C1). The sub-additive ergo dic theore m is related to the additive (Bir khoﬀ-Khinchin) ergo dic theorem that claims the existence (with proba bilit y one) of a similar limit for a function 1 N P N k =1 f [ X k ] of the stationar y ra ndom pro cess X = { X 1 , . . . , X N , . . . } [20]. APPENDIX D: EIGENV ALUES AND SINGULAR V ALUES OF THE RANDOM MA TRIX PRODUCT. Recall section IV B a nd the main question po sed there: when the mo dules of the eigenv alues of the ma trix pro duct T ( x N . .. 1 ) are equal, for N ≫ 1, to the singular v alues of T ( x N . .. 1 ). As shown b y (25), for N ≫ 1 w e can k eep the dependence on N only in the singular v a lues of T . (W e simpliﬁed notations as T ( x N . .. 1 ) = T .) First ass ume that T is a 2 × 2 matrix. W rite the singular v alue deco mpo sition (A5) for T as T =    e − N µ 0 0 0 e − N µ 1    U, U =    a b c d    , (D1) where e − N µ 0 and e − N µ 1 [with µ 0 < µ 1 ] a re the singular v a lues of T , and wher e the matrix U can be taken real, since T is real. Th us U is o rthogona l: ab + cd = 0, a 2 + c 2 = b 2 + d 2 = 1, ad − bc = ± 1. 22 F or the mo dules of the eigenv alues of T in (D1) o ne ﬁnds l 0 = | a | e − N µ 0 + | bc | | a | e − N ( µ 1 − µ 0 ) + . . . , l 1 = 1 | a | e − N µ 1 − 2 | bc | | a | 3 e − N (2 µ 1 − µ 0 ) + . . . . (D2) If | a | 6 = 0, the singular v a lues of T coincide with the absolute v alues of its eig e n v a lues for N ≫ 1 [2 3]: the terms O ( e − N ( µ 1 − µ 0 ) ) and O ( e − N (2 µ 1 − µ 0 ) ) are negligible and ln | a | is a lso neglected inside of the ex po nents as compare d to N µ 0 and N µ 1 . This conclusion changes for a = 0 (and th us d = 0 since U is o rthogona l). Now the mo dules o f the eigenv alues coincide with each other and are equa l to e − N ( µ 1 + µ 2 ) / 2 which is diﬀerent from the singular v alues . The next ex ample is 3 × 3 matrix T with the deter minant equal to zero: T =        e − N µ 0 0 0 0 e − N µ 1 0 0 0 0        U, U =        a b e c d f x y z        , (D3) where e − N µ 0 and e − N µ 1 [with µ 0 < µ 1 ] a re tw o non-zero singular v alues of T , and where the matrix U is or thogonal. Note tha t provided the third Lyapunov exp onent µ 2 is la rger than µ 1 (and pr ovided w e do not use the orthog onality features of the matrix U in (D3)), the consider e d exa mple is s uﬃcient ly general. Since det T = 0, the third singular v alue of T is zero. The third eigenv a lue o f T ( x N . .. 1 ) is also equal to zer o, while for the absolute v alues of the remaining eig env alues we hav e from (D3) l 0 = | a | e − N µ 0 + O ( 1 | a | e − N ( µ 1 − µ 0 ) ) , l 1 = | ad − b c | | a | e − N µ 1 + O ( e − N (2 µ 1 − µ 0 ) ) . (D4) If | ad − bc | 6 = 0 , the singular v alues e − N µ 0 and e − N µ 1 coincide [for N ≫ 1] with the modules o f the eigenv alues. F or | ad − bc | = 0 the second eigenv a lue of T is equa l to zero, while the second singular v alue is no n- zero. Howev er, the ﬁrst Lyapunov exp onent is still equal to the sp ectral radius (mo dule of the ﬁrst eigenv alue) if a 6 = 0. The latter tw o quantities are not equal for a = 0. Now the mo dules o f bo th eig env alues of T ( x N . .. 1 ) reduce to p | bc | e − N ( µ 1 + µ 2 ) / 2 . Using the ex amples (D1, D3) we g ot a suﬃcient condition fo r deciding whether the maximal singular v alue o f T is equal to the mo dule of the corr esp onding eigenv a lue . It is that the abso lute v alues o f the t wo leading eigenv alues of T are diﬀeren t. APPENDIX E: ZET A -FUNCTION AND PERIODIC ORBIT EXP ANSION. 1. Structure of p erio dic orbits. Deﬁne formally Z m = M X i 1 ,...,i m =1 φ [ A i 1 ...A i m ] , (E1) where A 1 , ..., A M are matrices , and where φ [ . ] is a function tha t turn its matr ix argument to a n umber. W e assume that the following fea tures hold for φ ( d is a p ositive integer): φ [ A d ] = φ d [ A ] , φ [ AB ] = φ [ B A ] . (E2) Using these features one ca n prove for Z m the following formula [26]: Z m = X n | m X ( γ 1 ,...,γ n ) ∈ Per( n ) n [ φ [ A γ 1 ...A γ n ] ] m n , (E3) where P n | m means that the summation go e s over all n that divide m , e.g., n = 1 , 2 , 4 for m = 4. Here Per( n ) cont ains sequences Γ = ( γ 1 , ..., γ n ) (E4 ) 23 T A BLE IV: The elements of Per( p ) for p = 1 , ..., 5 and M = 2. As compared to (9) we d enoted T ( x 1 ) = 1 and T ( x 2 ) = 2. It is seen that Per(1) contains tw o elements, since the cyclic p erm utation is t riv ial. Per(2) conta ins a single elemen t 12, since 11 and 22 remain inv ariant un der a single cyclic p ermutatio n, while B A is obtained from AB v ia a single cy clic p ermutation. Besides th e ob vious sequences 1111 and 2222, Per(4) do es not include the sequences 1212 and 2121 which stay in v arian t after tw o successiv e cyclic p ermutations. In Per(5) w e ﬁrst meet diﬀerent eleme nts that hav e the same ove rall num b er of 1’s and 2’s, e.g., 12121 and 11122. p P er( p ) 1 1, 2 2 12 3 122, 211 4 1222, 2111, 1122 5 12222, 21111, 11222, 22111, 12121, 21212 6 122222, 112222, 111222, 111122, 111112, 112212, 221121 111212, 222121. T A BLE V: The elemen ts of Per( p ) for p = 1 , ..., 4 and M = 3. p P er( p ) 1 1 , 2 , 3 2 12, 13, 23 3 122, 211, 233 322, 133, 311 123, 132 4 1222, 2111, 1122, 2333, 3222, 2233 1333, 3111, 1133 1123, 1132, 1213 2213, 2231, 2321 3312, 3321, 3231 selected acc o rding to the following r ules: i) Γ tur ns to itse lf after n successive cyclic p ermutations, but do es not turn to itself after any smaller (than n ) num b e r of successive cyclic p e r mut ations; ii) if Γ is in Per( n ), then Per( n ) contains none of those n − 1 sequences obtained from Γ under n − 1 successive cyclic p ermutations. Assume that M = 2, which mea ns that the matrices A i can take t wo v alues A 1 = 1 a nd A 2 = 2. With exa mples of Per( n ) given in T able IV, the pro o f of (E3) is s traightforw ard. 2. The i nv erse zeta-function and deriv ation of Eq. (44). The inverse zeta function is deﬁned a s ξ ( z ) = e xp  − P ∞ m =1 z m m Z m  , where Z m is given by (E1). Employing (E3) and in tro ducing notations p = n , q = m n , w e transform ξ ( z ) as ξ ( z ) = exp   − ∞ X p =1 X Γ ∈ Per( p ) ∞ X q =1 z pq q  φ [ A γ 1 ...A γ p ]  q   . (E5) the summation over q in (E5) is taken as ∞ X q =1 z pq q  φ [ A γ 1 ...A γ p ]  q = − ln  1 − z p φ [ A γ 1 ...A γ p ]  . (E6) 24 W e shall then ﬁnally get [25, 26]: ξ ( z ) = ∞ Y p =1 Y Γ ∈ Per( p )  1 − z p φ [ A γ 1 ...A γ p ]  . (E7) 3. How to generate the ele ments of P er( p ) vi a Mathematica 5. The elements o f Per( p ) pres ent ed in T ables IV and V were gener ated by hands. F o r large r p it is more conv enien t to ge nerate these elements via Mathematica 5. Below we assume that the rea der knows Mathematica a t some av erage level. First one should run the pack a ge of combinatoric functions: <

Original Paper

Loading high-quality paper...

Comments & Academic Discussion

Loading comments...

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment