On analytic properties of entropy rate


Author: Alexander Schönhuth

Alexander Schönhuth
Pacific Institute for the Mathematical Sciences
School of Computing Science, Simon Fraser University
8888 University Drive, Burnaby, BC, V5A 1S6, Canada
schoenhuth@cs.sfu.ca

Abstract. Entropy rate is a real-valued functional on the space of discrete random sources for which it exists. However, it lacks existence proofs and/or closed formulas even for classes of random sources which have intuitive parameterizations. A good way to overcome this problem is to examine its analytic properties relative to some reasonable topology. A canonical choice of topology is that of the norm of total variation, as it immediately arises with the idea of a discrete random source as a probability measure on sequence space. It is shown that both upper and lower entropy rate, hence entropy rate itself if it exists, are Lipschitzian relative to this topology, which, by well-known facts, is close to differentiability. An application of this theorem leads to a simple and elementary proof of the existence of entropy rate of random sources with finite evolution dimension. This class of sources encompasses arbitrary hidden Markov sources and quantum random walks.

Keywords. Analytic properties, discrete random source, entropy rate, evolution dimension, hidden Markov source, quantum random walk

1 Introduction

Entropy rate is a key quantity in information theory, as it is equal to the average amount of information per symbol of discrete-time, discrete-valued stochastic processes (usually referred to as discrete random sources in the following). Therefore, it is natural to ask how entropy rate behaves if knowledge of discrete random sources is subject to uncertainties which, for example, may be inherent to inference processes and/or originate from noisy channels.
However, closed formulas for entropy rate exist only for rare examples of classes of discrete random sources. For instance, already hidden Markov sources (HMSs) seem to defy a convenient formula, although there is one for the special case of Markov sources. Therefore, in this case, recent efforts have focused on the direct investigation of analytic properties of entropy rate, such as smoothness or even analyticity [20,21], [31,30], [25], [18].

The purpose of this paper is to contribute to the issue of analytic properties of entropy rate in a more general fashion. Namely, we study the behavior of entropy rate relative to the topology induced by the norm of total variation. This topology is one of the natural choices, and it is ubiquitous in both theoretical and practical work. We show that entropy rate is Lipschitzian on the whole space of discrete random sources, which, due to an elementary theorem of Rademacher, is close to differentiability. We will use this result to give an elementary proof of the existence of entropy rate for sources with finite evolution dimension [6], which contain the classes of arbitrary HMSs [24] and quantum random walks (QRWs) [1], [5].

The paper is organized as follows. We will identify discrete random sources with probability measures acting on the measurable space of symbol sequences equipped with the $\sigma$-algebra generated by the cylinder sets of sequences. Therefore, in section 2, we will briefly compile the theory's standard arguments. In section 3 we prove that entropy rate is Lipschitz continuous relative to the topology induced by the norm of total variation, which is the main contribution of this paper. In section 4 we demonstrate how to exploit this result for an elementary proof of existence of entropy rate of random sources with finite evolution dimension, which include HMSs and QRWs as special cases.
In section 5 we will describe the proof's intuition, thereby commenting on open problems such as other choices of topology and/or stricter choices of analytic properties.

2 Random sources and entropy rate

As usual, $\Sigma^* = \cup_{t \ge 0} \Sigma^t$ is the set of all words (strings of finite length) over the finite alphabet $\Sigma$, together with the concatenation operation

$v \in \Sigma^t,\ w \in \Sigma^s \ \Rightarrow\ vw \in \Sigma^{t+s}$.   (1)

Throughout this paper, $\Omega = \Sigma^{\mathbb{N}} = \prod_{t=0}^{\infty} \Sigma$ is the set of sequences over $\Sigma$ and $\mathcal{B}$ is the $\sigma$-algebra generated by the cylinder sets. Cylinder sets $B$ are identified with sets of words $A_B \subset \Sigma^t$ such that $B$ is the set of sequences which start with the words in $A_B$. In general, the cardinality of a set $A$ is denoted by $|A|$.

We view stochastic processes $(X_t)_{t \in \mathbb{N}}$ with values in $\Sigma$ as probability measures $P_X$ on the measurable space $(\Omega, \mathcal{B})$, and vice versa, via the relationship ($v = v_0 \ldots v_{t-1} \in \Sigma^t$ corresponds to the cylinder set of sequences having $v$ as prefix)

$P_X(v) = P(\{X_0 = v_0, X_1 = v_1, \ldots, X_{t-1} = v_{t-1}\})$,   (2)

where the term on the right-hand side is the probability that the random source emits the symbols $v_0, \ldots, v_{t-1}$ at periods $0, \ldots, t-1$. Note that a stochastic process $(X_t)$ is uniquely determined by the values $P_X(v)$ for all $v \in \Sigma^*$, as the cylinder sets corresponding to words $v$ generate $\mathcal{B}$.

Although being a canonical choice of norm (see appendix A for a short review of the related theory and corresponding definitions), computation of the norm of total variation would not be easy for the measurable space under consideration by means of its original definition alone. The following lemma shows a concrete way to get a grip on the corresponding topology. Exact definition and basic properties of the norm of total variation have been deferred to appendix A.

Lemma 1.
The topology induced by the norm of total variation is that of the metric

$d_{TV}(P_X, P_Y) = \sup_{t \in \mathbb{N}} \sum_{v \in \Sigma^t} |P_X(v) - P_Y(v)| = \lim_{t \to \infty} \sum_{v \in \Sigma^t} |P_X(v) - P_Y(v)|$,   (3)

where $P_X, P_Y$ are probability measures associated to discrete random sources $(X_t), (Y_t)$.

Proof. See sec. A.2 of the appendix for the predominantly measure-theoretical arguments. ⋄

2.1 Entropy Rate

In the following, we will refer to the quantities

$\overline{H}(X) := \overline{H}(P_X) := \limsup_{t \to \infty} H_t(P_X)$   (4)

resp.

$\underline{H}(X) := \underline{H}(P_X) := \liminf_{t \to \infty} H_t(P_X)$   (5)

as upper entropy rate resp. lower entropy rate of a random source $(X_t)$ with associated probability measure $P_X$, where, using the language introduced above,

$H_t(P_X) := -\frac{1}{t} \sum_{v \in \Sigma^t} P_X(v) \log P_X(v)$   (6)

is the entropy of the distribution over the words of length $t$ induced by the random source, divided by $t$. Entropy rate of a random source $(X_t)$ with associated probability measure $P_X$ is denoted by

$H(X) := H(P_X) := \lim_{t \to \infty} H_t(P_X)$.   (7)

The existence of the limit of the $H_t(P_X)$ is also referred to as the existence of entropy rate where, obviously, a necessary and sufficient condition for entropy rate to exist is

$\overline{H}(X) = \underline{H}(X)\ (= H(X))$.   (8)

Throughout this paper, $\Delta^{n-1} = \{x = (x_1, \ldots, x_n) \in \mathbb{R}^n \mid x_i \ge 0,\ \sum_i x_i = 1\}$ is the usual regular $(n-1)$-dimensional simplex in $\mathbb{R}^n$ and, for technical convenience, $\log$ is the natural logarithm. Note that, although it is more common to use the logarithm to base 2, switching bases does not affect any analytic property of entropy rate.

3 Analytic properties of entropy rate

Our main result is the following theorem, which states that entropy rate is Lipschitz continuous with respect to the topology induced by the norm of total variation.
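As a brief computational aside before stating the theorem: the quantities $H_t$ and $d_{TV,t}$ (the level-$t$ sums appearing in Lemma 1 and in the definitions above) are directly computable for small $t$ by enumerating $\Sigma^t$. The following sketch is illustrative only; the helper names are not from the paper, and two i.i.d. binary sources stand in for general discrete random sources.

```python
from itertools import product
from math import log

def words(sigma, t):
    """All words of length t over the alphabet sigma."""
    return ["".join(w) for w in product(sigma, repeat=t)]

def iid_measure(dist):
    """P(v) for an i.i.d. source with single-symbol distribution dist."""
    def P(v):
        p = 1.0
        for a in v:
            p *= dist[a]
        return p
    return P

def H_t(P, sigma, t):
    """H_t(P) = -(1/t) * sum_{v in Sigma^t} P(v) log P(v), natural log, cf. (6)."""
    return -sum(P(v) * log(P(v)) for v in words(sigma, t) if P(v) > 0) / t

def d_tv_t(P, Q, sigma, t):
    """d_{TV,t}(P, Q) = sum_{v in Sigma^t} |P(v) - Q(v)|."""
    return sum(abs(P(v) - Q(v)) for v in words(sigma, t))

sigma = "01"
P = iid_measure({"0": 0.5, "1": 0.5})
Q = iid_measure({"0": 0.6, "1": 0.4})
# For an i.i.d. source, H_t equals the single-symbol entropy for every t.
print(H_t(P, sigma, 5))        # ≈ log 2 ≈ 0.6931
print(d_tv_t(P, Q, sigma, 5))  # increases monotonically in t toward d_TV(P, Q)
```

The monotone growth of $d_{TV,t}$ toward $d_{TV}$ is exactly the content of Lemma 1 (proved in appendix A.2).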
In the following, let $\mathcal{P}$ be the set of the probability measures associated with discrete random sources, viewed as a normed space. Elements of $\mathcal{P}$ will be denoted by $P$ or $Q$. We further denote the normed subspace of discrete random sources for which entropy rate exists by $\mathcal{P}_H$.

Theorem 1 (Lipschitz continuity of entropy rate). The real-valued functionals $\overline{H}$ and $\underline{H}$ on $\mathcal{P}$ are Lipschitzian with $\mathrm{Lip}(\overline{H}) = \mathrm{Lip}(\underline{H}) = \log|\Sigma|$, that is, for $P, Q \in \mathcal{P}$,

$|\overline{H}(P) - \overline{H}(Q)| \le (\log|\Sigma|)\, d_{TV}(P, Q)$   (9)
$|\underline{H}(P) - \underline{H}(Q)| \le (\log|\Sigma|)\, d_{TV}(P, Q)$.   (10)

Clearly, because of (8), a corollary of the theorem is that the same holds true for entropy rate itself.

Corollary 1. Entropy rate is Lipschitzian with $\mathrm{Lip}(H) = \log|\Sigma|$, that is,

$|H(P) - H(Q)| \le (\log|\Sigma|)\, d_{TV}(P, Q)$   (11)

where, here, $P, Q \in \mathcal{P}_H$.

We present two lemmata which incorporate the essential ideas of the proof of the theorem. We write

$d_{TV,t}(P, Q) := \sum_{v \in \Sigma^t} |P(v) - Q(v)|$.   (12)

Lemma 1 says that $\lim_{t \to \infty} d_{TV,t}(P, Q) = d_{TV}(P, Q)$. Note that $d_{TV,t}$ is not a metric on $\mathcal{P}$.

Lemma 2. Let $P, Q \in \mathcal{P}$ such that $d_{TV}(P, Q) \le \frac{1}{e}$. Then it holds that

$|H_t(P) - H_t(Q)| \le \Big(\log|\Sigma| + \frac{1}{t} \log \frac{1}{d_{TV,t}(P, Q)}\Big) \cdot d_{TV,t}(P, Q)$,

where $0 \cdot \log \infty := 0$ in case of $d_{TV,t}(P, Q) = 0$.

For the proof of this lemma we will need a technical sublemma.

Sublemma 3.1. Let $h(x) := x \log(1/x)$ for $x \in\, ]0, 1]$ and $h(0) := 0$. Then, for $x, y \in [0, 1]$,

$|x - y| \le \frac{1}{e} \ \Longrightarrow\ |h(x) - h(y)| \le h(|x - y|)$.   (13)

Proof. Note first that $h'(x) = \log\frac{1}{x} - 1$ and $h''(x) = -\frac{1}{x}$. Hence $h$ is concave, has a global maximum at $\frac{1}{e}$, and $h(\frac{1}{e}) = \frac{1}{e}$. Therefore $x \le h(x) \Leftrightarrow x \le \frac{1}{e}$ (∗).
Because of

$|h(x) - h(y)| = \big|\, |h(x) - h(\tfrac{1}{e})| - |h(\tfrac{1}{e}) - h(y)| \,\big| \le \max\{|h(x) - h(\tfrac{1}{e})|,\ |h(\tfrac{1}{e}) - h(y)|\}$   (14)

and the fact that $h$ is monotonically increasing on $[0, \frac{1}{e}]$, we can, without loss of generality, assume that either $x, y \ge \frac{1}{e}$ or $x, y \le \frac{1}{e}$. Because of $|h'| \le 1$ on $[\frac{1}{e}, 1]$ and the mean value theorem, it holds that

$\frac{1}{e} \le x, y \le 1 \ \Rightarrow\ |h(x) - h(y)| \le \sup_{\xi \in [1/e, 1]} |h'(\xi)|\, |x - y| \le |x - y|$.   (15)

Because of (∗) we obtain the claim for the case $\frac{1}{e} \le x, y \le 1$. It remains the case (w.l.o.g. $x < y$) $x < y \le \frac{1}{e}$. Here it holds that $|h(x) - h(y)| = h(y) - h(x)$. We note that the function $\log\frac{1}{t} - 1$ is positive and monotonically decreasing on $[0, \frac{1}{e}]$ (∗∗). We obtain the claim from the calculation

$|h(x) - h(y)| = \int_x^y \big(\log\tfrac{1}{t} - 1\big)\, dt \ \overset{(**)}{\le}\ \int_x^y \big(\log\tfrac{1}{t-x} - 1\big)\, dt \ \overset{s = t-x}{=}\ \int_0^{y-x} \big(\log\tfrac{1}{s} - 1\big)\, ds = \Big[s \log\tfrac{1}{s}\Big]_0^{y-x} = h(y - x)$.   (16) ⋄

Let now $\Delta^{n-1}_K := K \cdot \Delta^{n-1} = \{x = (x_1, \ldots, x_n) \in \mathbb{R}^n \mid x_i \ge 0,\ \sum_i x_i = K\}$. In a way that is completely analogous to showing that entropy attains a maximum at uniform distributions, we infer that, on $\Delta^{n-1}_K$, the function $h_{K,n}(x_1, \ldots, x_n) := \sum_{i=1}^n x_i \log\frac{1}{x_i}$ (a scaled version of entropy) attains a global maximum at $\bar{x} := (K/n, \ldots, K/n)$ (∗∗∗).

We are now able to prove lemma 2.

Proof. Obviously $H_t(P) = H_t(Q)$ in case of $d_{TV,t}(P, Q) = 0$. In case of $d_{TV,t}(P, Q) > 0$,

$|H_t(P) - H_t(Q)| \le \frac{1}{t} \sum_{v \in \Sigma^t} \Big| P(v) \log\frac{1}{P(v)} - Q(v) \log\frac{1}{Q(v)} \Big| \overset{\text{Subl. 3.1}}{\le} \frac{1}{t} \sum_{v \in \Sigma^t} |P(v) - Q(v)| \log\frac{1}{|P(v) - Q(v)|} \overset{(***)}{\le} \frac{1}{t} \sum_{v \in \Sigma^t} \frac{d_{TV,t}(P, Q)}{|\Sigma|^t} \log\frac{|\Sigma|^t}{d_{TV,t}(P, Q)} = \frac{1}{t}\, d_{TV,t}(P, Q) \Big(t \log|\Sigma| + \log\frac{1}{d_{TV,t}(P, Q)}\Big)$.   (17) ⋄

To get control of the limes superior resp.
inferior involved in the definition of entropy rate, we will further need the following lemma.

Lemma 3. Let $(a_t)$ and $(b_t)$ be two non-negative real-valued sequences such that

$|a_t - b_t| \le c_t$ and $\lim_{t \to \infty} c_t = c$.   (18)

Then it holds that

$|\limsup_{t \to \infty} a_t - \limsup_{t \to \infty} b_t| \le c$   (19)
$|\liminf_{t \to \infty} a_t - \liminf_{t \to \infty} b_t| \le c$.   (20)

Proof. We only display the proof for (19), as that of (20) can be obtained, mutatis mutandis, by analogous considerations. W.l.o.g. assume $a := \limsup a_t \ge \limsup b_t =: b$. Choose a subsequence $k(t)$ such that $\lim_{t \to \infty} a_{k(t)} = a$. We obtain

$a - b \le a - \limsup_{t \to \infty} b_{k(t)} = \limsup_{t \to \infty} a_{k(t)} - \limsup_{t \to \infty} b_{k(t)} \le \limsup_{t \to \infty} |a_{k(t)} - b_{k(t)}| \le c$.   (21) ⋄

We are now in position to prove theorem 1.

Proof. As Lipschitz continuity is a local property, we can assume that $d_{TV}(P, Q) \le \frac{1}{e}$. Setting $a_t := H_t(P)$ and $b_t := H_t(Q)$, we obtain by lemma 2

$|a_t - b_t| \le d_{TV,t}(P, Q) \Big(\log|\Sigma| + \frac{1}{t} \log\frac{1}{d_{TV,t}(P, Q)}\Big) =: c_t$.   (22)

The definition of $d_{TV,t}$ and lemma 1 lead to

$\lim_{t \to \infty} c_t = \lim_{t \to \infty} d_{TV,t}(P, Q) \Big(\log|\Sigma| + \frac{1}{t} \log\frac{1}{d_{TV,t}(P, Q)}\Big) = d_{TV}(P, Q) \cdot \log|\Sigma|$.   (23)

Plugging $(a_t)$, $(b_t)$ and $(c_t)$ into lemma 3 then yields the desired result. ⋄

In order to elucidate that the structure of the proof strongly depends on the choice of the norm, we rephrase lemma 2 in a more general fashion, without the "soul" of an entropy. Therefore let

$h_n(x_1, \ldots, x_n) = \frac{1}{\log n} \sum_{i=1}^n x_i \log\frac{1}{x_i}$   (24)

on $\Delta^{n-1}$, where $n \ge 2$ and $0 \log \infty := 0$. A more prosaic version of lemma 2 then reads

$|h_n(x) - h_n(y)| \le \|x - y\|_1 \cdot \Big(1 + \frac{1}{\log n} \log\frac{1}{\|x - y\|_1}\Big)$,   (25)

where $\|x\|_1 = \sum_i |x_i|$ as usual. A straightforward consequence of the lemma is

$\forall \epsilon \in \mathbb{R}_+\ \exists \delta \in \mathbb{R}_+\ \forall n \ge 2\ \forall x, y \in \Delta^{n-1}: \|x - y\|_1 < \delta \ \Longrightarrow\ |h_n(x) - h_n(y)| < \epsilon$.
(26)

After being translated back to entropies, this states that entropy rate is uniformly continuous on $\mathcal{P}$. We note that the statement of the generalized lemma need not be true relative to norms $\|.\|_p$ different from $\|.\|_1$. More formally:

Lemma 4. Let $2 \le p < \infty$ and $\|x\|_p = \sqrt[p]{\sum_i |x_i|^p}$ the usual $p$-norm on $\mathbb{R}^n$. Then it holds that

$\exists \epsilon \in \mathbb{R}_+\ \forall \delta \in \mathbb{R}_+\ \exists N \ge 2\ \exists x, y \in \Delta^{N-1}: \|x - y\|_p < \delta,\ |h_N(x) - h_N(y)| > \epsilon$,

which is just the negation of (26). For the proof we use the notation ($0 < m \le n$)

$x^*_{m,n} := (\underbrace{\tfrac{1}{m}, \ldots, \tfrac{1}{m}}_{m \text{ times}}, 0, \ldots, 0) \in \Delta^{n-1}$.   (27)

Proof. Choose $\epsilon = 1/2$ and $\delta \in \mathbb{R}_+$ arbitrarily. Choose an $m \in \mathbb{N}$ such that $m > \frac{1}{\delta}$. Then find an $N_0 > m$ such that $\|x^*_{n,n}\|_2 = (\frac{1}{n})^{1/2} < \delta$ for every $n \ge N_0$. Further,

$\|x^*_{m,n} - x^*_{n,n}\|_p \le \|x^*_{n,n}\|_p = \Big(\frac{1}{n^{p-1}}\Big)^{\frac{1}{p}} = n^{-\frac{p-1}{p}} = n^{\frac{1}{p}-1} \le n^{-\frac{1}{2}} = \|(\tfrac{1}{n}, \ldots, \tfrac{1}{n})\|_2 < \delta$,   (28)

but

$|h_n(x^*_{m,n}) - h_n(x^*_{n,n})| = \frac{1}{\log n} |\log m - \log n| \ \xrightarrow{n \to \infty}\ 1$.   (29)

Therefore, we find an $N \in \mathbb{N}$ and suitable $x, y \in \Delta^{N-1}$ which support the statement of the lemma. ⋄

REMARK. Because of lemma 4, one could intuitively be led to the assumption that entropy rate need not be continuous with respect to the norms given through the spaces $L_p(\Omega, \mathcal{B}, P)$, $p \ge 2$. However, this is not true; see [27] for respective considerations.

4 Entropy rate of sources with finite evolution dimension

In the following we will give a direct proof of the existence of entropy rate of sources with finite evolution dimension, which had been introduced in [6]. See the subsequent subsection 4.3 for prevalent examples of random sources of finite evolution dimension. As sources with finite evolution dimension are asymptotically mean stationary [6], the result can be obtained as a corollary of the theorem of Shannon-McMillan-Breiman for asymptotically mean stationary sources [9].
However, the following proof is much simpler. See subsection 4.4 for a detailed comparison of the two proofs.

4.1 Preliminaries

In the following, let the shift operator $T: \Omega \to \Omega$ be defined by

$T(v_0 v_1 v_2 \ldots) := v_1 v_2 \ldots$   (30)

Obviously, $T$ is measurable. If $(X_t)$ is a discrete random source with associated measure $P_X$, then

$(P_X \circ T^{-k})(v) := \sum_{w \in \Sigma^k} P_X(wv)$   (31)

gives rise to a probability measure $P \circ T^{-k}$ which is associated with the discrete random source $((X^k)_t)$ defined through

$P_{X^k}(\{(X^k)_0 = v_0, (X^k)_1 = v_1, \ldots, (X^k)_{t-1} = v_{t-1}\}) := P_X(\{X_k = v_0, X_{k+1} = v_1, \ldots, X_{t-1+k} = v_{t-1}\})$.

A discrete random source $(X_t)$ is said to be of finite evolution dimension if the family $(P_X \circ T^{-k})_{k \ge 0}$ spans a finite-dimensional subspace in the linear space of finite, signed measures on $(\Omega, \mathcal{B})$ (see appendix A for the definition of a finite, signed measure). In the following we will write

$P_{T^{-i}} := P \circ T^{-i}$ and $P_n := \frac{1}{n} \sum_{i=0}^{n-1} P_{T^{-i}}$   (32)

for probability measures $P$ associated with random sources.

Theorem 2. If $P$ is a discrete random source of finite evolution dimension, there is a stationary discrete random source $\bar{P}$, called the stationary mean of $P$, such that

$\lim_{n \to \infty} d_{TV}(P_n, \bar{P}) = 0$.   (33)

Proof. The proof is centered on an elementary fact from linear algebra. As it requires some of the basic theory of finite, signed measures, we have deferred it to sec. A.3 in the appendix. Note that an alternative, slightly more complicated version of the proof has already been given in [6]. ⋄

4.2 Proof for the existence of entropy rate

In order to be prepared for the proof, we provide a lemma whose immediate consequence is that entropy rate coincides for all $P_n$, $n \ge 0$.

Lemma 5. Let $P$ be a probability measure associated with a random source. Then it holds that

$\forall n \in \mathbb{N}: \lim_{t \to \infty} (H_t(P) - H_t(P_n)) = 0$.
(34)

Proof. A straightforward consequence of Lemma 2.3.4, [10] is that for $\alpha \in [0, 1]$ and probability measures $P, Q$:

$\alpha H_t(P) + (1 - \alpha) H_t(Q) \le H_t(\alpha P + (1 - \alpha) Q) \le \alpha H_t(P) + (1 - \alpha) H_t(Q) + \frac{\log 2}{t}$.

Now, by induction on $n$,

$\frac{1}{n} \sum_{i=0}^{n-1} H_t(P_{T^{-i}}) \le H_t(P_n) \le \frac{1}{n} \sum_{i=0}^{n-1} H_t(P_{T^{-i}}) + \frac{n}{t} \log 2$,

and the assertion follows from lemma 7 (appendix B), which states that the $H_t(P_{T^{-i}})$ coincide for all $i \ge 0$. ⋄

We establish that both upper entropy rate $\overline{H}$ and lower entropy rate $\underline{H}$ coincide for all $P_n$, $n \ge 0$.

Corollary 2. Let $P$ be the probability measure associated with a random source. Then it holds that

$\overline{H}(P) = \overline{H}(P_n)$ and $\underline{H}(P) = \underline{H}(P_n)$   (35)

for all $n \in \mathbb{N}$.

Proof. Use lemma 5 in order to apply lemma 3 to the sequences $(a_t := H_t(P))$, $(b_t := H_t(P_n))$ for the first equation. For the second one, rephrase lemma 3 with $\liminf$ instead of $\limsup$. ⋄

As a consequence, we can prove the existence of entropy rate for finite-evolution-dimensional sources.

Theorem 3 (Existence of entropy rate). Let $P$ be a probability measure associated with a random source of finite evolution dimension. Let $\bar{P}$ be the stationary mean of $P$. Then it holds that

$\overline{H}(P) = \underline{H}(P) = \lim_{t \to \infty} H_t(P)$.   (36)

Therefore, entropy rate of $P$ exists. Moreover, it is equal to that of the stationary mean $\bar{P}$.

Proof. As the $P_n$ converge in TV-norm to $\bar{P}$ (theorem 2), we obtain, due to the continuity of $\overline{H}$, $\underline{H}$ (theorem 1),

$\lim_{n \to \infty} \overline{H}(P_n) = \overline{H}(\bar{P})$ and $\lim_{n \to \infty} \underline{H}(P_n) = \underline{H}(\bar{P})$.   (37)

It follows, as $\overline{H}(P_n)$ and $\underline{H}(P_n)$ are constant with respect to $n$ (corollary 2) and $\overline{H}(\bar{P}) = \underline{H}(\bar{P})$ (as entropy rate exists for stationary sources), that $\overline{H}(P) = \underline{H}(P) = H(\bar{P})$. ⋄

REMARK. Theorem 2 can be generalized to general asymptotically mean stationary (AMS) sources (see [9] for the theory of AMS sources).
However, the proof needs a sophisticated ergodic theorem, thereby losing the elementary flavour [26].

4.3 Examples of discrete random sources of finite evolution dimension

In the following, we present two classes of discrete random sources that have finite evolution dimension.

Hidden Markov Sources (HMSs). Hidden Markov sources (HMSs) are the discrete random sources associated with hidden Markov models (HMMs) (also termed hidden Markov chains in the related literature). HMSs have been largely studied; see e.g. [24] for a comprehensive review. In the following, we will give a brief definition of HMMs.

An HMM $M = (\Sigma, S, \pi, A, E)$ is specified by a finite set of output symbols $\Sigma$, a set of hidden states $S = \{1, \ldots, n\}$, a transition probability matrix $A = (A_{ij})_{i,j \in S} \in \mathbb{R}^{n \times n}$, an initial probability distribution $\pi \in \mathbb{R}^n$, and an emission probability matrix $E = (E_{iv})_{i \in S, v \in \Sigma} \in \mathbb{R}^{n \times \Sigma}$. It gives rise to a discrete random source $p_M$ with values in the finite set $\Sigma$, referred to as a hidden Markov source (HMS), by the idea of changing hidden states according to the transition probabilities $A_{ij} = P(i \to j)$, where the first state is picked according to $\pi$, and emitting symbols from the hidden states, as specified by the emission probabilities $E_{ia} = P(a \text{ is emitted from } i)$. More formally, in accordance with (2),

$p_M(v = v_1 \ldots v_t) = \sum_{i_1 \ldots i_t \in S^t} \pi(i_1) E_{i_1 v_1} A_{i_1 i_2} E_{i_2 v_2} \cdot \ldots \cdot A_{i_{t-1} i_t} E_{i_t v_t}$.   (38)

In the literature, HMSs are often introduced as being induced by finite functions of Markov chains, where emission probability distributions are replaced by a finite function $f: S \to \Sigma$ mapping hidden states to output symbols. It is straightforward to see that these give rise to the complete class of HMSs as well. It is well known that HMSs have finite dimension or, equivalently, have finite degree of freedom.
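Although the sum in (38) ranges over $|S|^t$ state paths, $p_M(v)$ can be evaluated in $O(t |S|^2)$ time by the standard forward recursion. The sketch below is illustrative only: the helper names and the small two-state parameterization are assumptions, not taken from the paper.

```python
from itertools import product

def hmm_prob(pi, A, E):
    """p_M(v) of eq. (38) via the forward recursion: keep, per hidden state j,
    the total weight alpha[j] of all state paths emitting the prefix so far."""
    def p(v):
        n = len(pi)
        alpha = [pi[i] * E[i][v[0]] for i in range(n)]
        for sym in v[1:]:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * E[j][sym]
                     for j in range(n)]
        return sum(alpha)
    return p

# Assumed toy HMM: 2 hidden states, output alphabet {0, 1}.
pi = [0.7, 0.3]
A = [[0.9, 0.1], [0.2, 0.8]]
E = [[0.95, 0.05], [0.1, 0.9]]
p = hmm_prob(pi, A, E)

print(p((0, 1, 0)))
# Sanity check in the spirit of (2): the probabilities p_M(v) of all
# words v of a fixed length t sum to 1.
print(sum(p(v) for v in product((0, 1), repeat=5)))  # ≈ 1.0
```

Plugging such word probabilities into (6) gives the finite-$t$ entropies $H_t(p_M)$ whose limit is the entropy rate discussed throughout.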
See [13] for an early work on the topic and [14] for further related work. The relationship of finite dimension and finite evolution dimension has been thoroughly discussed in [6]. It holds that finite evolution dimension is a necessary condition of finite dimension, which establishes that HMSs are of finite evolution dimension.

Examples to which the generalization of the existence of entropy rate to sources with finite evolution dimension applies are non-stationary HMSs. A simple example might be a binary-valued source (i.e. $\Sigma = \{0, 1\}$) induced by a "circular" HMM acting on three hidden states $S := \{1, 2, 3\}$, with transition resp. emission probability matrix

$\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}$ resp. $\begin{pmatrix} 0 & 1 \\ 0.5 & 0.5 \\ 1 & 0 \end{pmatrix}$.   (39)

Clearly, this source is not stationary, such that the simple existence proof for stationary sources does not apply. However, as an HMS, this source is of finite evolution dimension, such that theorem 3 ensures the existence of its entropy rate. See the subsequent sec. 4.4 for a comparison of available proofs of the existence of entropy rate.

REMARK. Related work on analytic properties of entropy rate of HMSs is concerned with topologies referring to the parameterizations of the HMMs giving rise to the HMSs, that is, with the natural topologies of real-valued vector spaces (e.g. [20,21,22]). For example, in the special case of binary-valued i.i.d. processes, emitting values from $\{0, 1\}$, entropy rate is computed as

$p \log\frac{1}{p}$   (40)

where the only parameter $p$ is the probability that the binary-valued i.i.d. process emits a 1. Clearly, $p \log(1/p)$ is not Lipschitz continuous in intervals around zero [$(p \log(1/p))' = \log\frac{1}{p} - 1$]. However, this does not contradict theorem 1, as convergence w.r.t. the parameterization does not imply convergence w.r.t.
the norm of total variation, which we will briefly outline in the following.

As follows from elementary measure-theoretical considerations, the topology induced by the norm of total variation is equivalent to that of the general version of the metric of total variation

$D_{TV}(P, Q) := \sup_{B \in \mathcal{B}} |P(B) - Q(B)|$   (41)

where $P, Q \in \mathcal{P}$ are two probability measures acting on the measurable space $(\Omega, \mathcal{B})$. In the case of the measurable sequence spaces under consideration here, the equivalence of the topologies of the metric and the norm of total variation can be seen by lemma 1, as it follows from straightforward elementary computations that the topologies of $d_{TV}$ of lemma 1 and the metric of total variation $D_{TV}$ of (41) are equivalent. As a consequence, convergence in the sense of the norm of total variation is equivalent to uniform convergence on all measurable sets, that is,

$\lim_{n \to \infty} \|P - P_n\|_{TV} = 0 \ \Leftrightarrow\ \lim_{n \to \infty} \sup_{B \in \mathcal{B}} |P(B) - P_n(B)| = 0$   (42)

where $P, P_n \in \mathcal{P}$.

However, as outlined in [20], sec. VIII, convergence of probability measures induced by hidden Markov models whose parameterizations converge may not even be strong (see [15] for definitions and characterizations of several forms of convergence of probability measures), meaning that there might exist a set $B^* \in \mathcal{B}$ for which

$\limsup_{n \to \infty} |P(B^*) - P_n(B^*)| > 0$   (43)

where the $P_n$, $n = 0, 1, \ldots$, are hidden Markov models whose parameterizations converge to the parameterization of $P$. According to (42), this means that convergence in terms of the parameterization does not necessarily imply convergence w.r.t. the norm of total variation.

Quantum Random Walks (QRWs). Quantum random walks (QRWs) were introduced to quantum information theory in 2001 as an analogon to classical Markov sources [1]. For example, they allow one to emulate Markov Chain Monte Carlo approaches on quantum computers.
However, their properties are much less understood. A QRW $Q = (G, U, \psi_0)$, in a very general form (see [1] for the full range of definitions), is specified by a directed, $K$-regular graph $G = (V, E)$, a unitary (evolution) operator $U: \mathbb{C}^N \to \mathbb{C}^N$, and a wave function $\psi_0 \in \mathbb{C}^N$ (i.e. $\|\psi_0\| = 1$ for $\|.\|$ the Euclidean norm), where $N := K \cdot |V| = |E|$. Dimensions are labeled by edges, which in turn are labeled by $(u, x)$ where $u \in V$ and $x \in X$, $|X| = K$, and $\mathbb{C}^N$ is considered to be spanned by the orthonormal basis $(e_{(u,x)})_{(u,x) \in V \times X = E}$.

A QRW induces a classical random source $p_Q$ with values in $\Sigma := V$ (i.e. the set of nodes) by the following iterative procedure. In the first step, the evolution operator is applied to the initial wave function $\psi_0$, and the resulting wave function $U\psi_0$ is, with probability $\sum_{x \in X} |(U\psi_0)_{(u_1,x)}|^2$, collapsed (i.e. projected and renormalized, which models a quantum mechanical measurement) to the subspace of $\mathbb{C}^N$ spanned by the vectors $e_{(u_1,x)}$, $x \in X$, that is associated with (the edges leaving from) node $u_1$, thereby generating the first symbol $u_1$. This procedure results in a new wave function describing the state $\psi_{u_1}$ the QRW is in after having generated the first symbol $u_1$. In order to generate a second symbol, $U$ is applied to $\psi_{u_1}$, and $U\psi_{u_1}$ is, with probability $\sum_{x \in X} |(U\psi_{u_1})_{(u_2,x)}|^2$, collapsed to state $\psi_{u_1 u_2}$, thereby generating the second symbol $u_2$. Iterative application of this basic procedure of evolving followed by collapsing yields a sequence of symbols. See [1] for further details.

A concise formal description, in terms of formulae analogous to (38), of the discrete random source $p_Q$, along with a proof of QRWs being of finite dimension, has been presented in [28].
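The evolve-and-collapse procedure just described can be made concrete in a few lines. The sketch below is a toy construction, not taken from [1] or [28]: a two-node "graph" with $K = 1$ (one edge dimension per node), a Hadamard matrix as the evolution operator, and illustrative helper names. Instead of sampling, it enumerates all collapse outcomes of the first two steps, so that the induced word distribution $p_Q(u_1 u_2)$ is obtained exactly.

```python
from math import sqrt

def step_outcomes(U, psi, node_edges):
    """One evolve-and-collapse step: apply U to psi, then, for each node u,
    yield (probability of collapsing to u, post-collapse wave function)."""
    N = len(psi)
    phi = [sum(U[r][c] * psi[c] for c in range(N)) for r in range(N)]
    out = {}
    for u, edges in node_edges.items():
        prob = sum(abs(phi[r]) ** 2 for r in edges)
        if prob > 0:
            # project onto the edges leaving u, then renormalize
            out[u] = (prob, [phi[r] / sqrt(prob) if r in edges else 0.0
                             for r in range(N)])
    return out

# Toy walk: two nodes, one edge dimension each; U is the 2x2 Hadamard.
node_edges = {0: [0], 1: [1]}
h = 1 / sqrt(2)
U = [[h, h], [h, -h]]
psi0 = [1.0, 0.0]

# Exact distribution of the first two emitted symbols (u1, u2),
# obtained by enumerating every collapse outcome.
p_two = {}
for u1, (p1, psi1) in step_outcomes(U, psi0, node_edges).items():
    for u2, (p2, _) in step_outcomes(U, psi1, node_edges).items():
        p_two[(u1, u2)] = p1 * p2
print(p_two)  # each of the four pairs gets probability 1/4 here
```

Iterating the enumeration to depth $t$ yields the word probabilities $p_Q(v)$, $v \in V^t$, from which the finite-$t$ entropies $H_t(p_Q)$ of (6) can be computed as for HMSs.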
Finite evolution dimension follows from finite dimension, which, as outlined above, has been thoroughly discussed in [6].

4.4 Comparison of existence proofs of entropy rate

The result of theorem 3 for the special case of HMSs can be obtained as a combination of the Shannon-McMillan-Breiman (SMB) theorem for asymptotically mean stationary (AMS) sources [9] and the fact that HMSs are AMS [19] (see also [24] for a comprehensive review of theoretical results on HMSs). Therefore, the existence of entropy rate for arbitrary HMSs, stationary and non-stationary, has theoretically been known since 1981. For QRWs the result has been known since 2006, implied by combining the results of [9] and [5] in the same fashion as for HMSs. However, even for HMSs, the result seems to have gone rather unnoticed, which might be due to both the complex nature of its proof and the fact that the necessary combination of results has not been explicitly mentioned.

The SMB theorem in its most generalized version is centered around a proof for the class of ergodic, stationary random sources [29,23,3,4], which requires involved ergodic theorems. The extension to general stationary sources [2,16,17], in an exemplary (and elegant) version, needs the sophisticated concept of the ergodic decomposition of stationary random sources [8]. The final step [9] requires again a collection of non-trivial theorems as a prerequisite.

The proof given here is substantially simpler in two main respects. First, it is centered around the standard elementary proof of the existence of entropy rate of stationary sources. Note that, this way, we do not even need to introduce ergodicity. Second, the extension to non-stationary classes of random sources is done by results of exclusively elementary nature (theorems 2, 1).
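To illustrate how elementary the machinery of this section is, here is a small end-to-end sketch. It is illustrative only (helper names are not from the paper, and a deterministic period-2 Markov source stands in for a general finite-evolution-dimension source): it implements the shifted measures (31) and the Cesàro means $P_n$ of (32), and exhibits the stationary mean of theorem 2 directly.

```python
from itertools import product

def markov_P(pi, A):
    """Word probability of a Markov source: P(v) = pi[v0] * prod_i A[vi][vi+1]."""
    def P(v):
        prob = pi[v[0]]
        for a, b in zip(v, v[1:]):
            prob *= A[a][b]
        return prob
    return P

def shift(P, sigma, k):
    """(P o T^{-k})(v) = sum over prefixes w of length k of P(wv), cf. (31)."""
    return lambda v: sum(P(w + v) for w in product(sigma, repeat=k))

def cesaro(P, sigma, n):
    """P_n = (1/n) * sum_{i < n} P o T^{-i}, cf. (32)."""
    shifted = [shift(P, sigma, i) for i in range(n)]
    return lambda v: sum(Pi(v) for Pi in shifted) / n

# Period-2 source: starts with symbol 0, then alternates 0,1,0,1,...
sigma = (0, 1)
P = markov_P({0: 1.0, 1: 0.0}, {0: {0: 0.0, 1: 1.0}, 1: {0: 1.0, 1: 0.0}})

# The stationary mean gives each of the two alternating sequences weight 1/2;
# for this period-2 source, already P_2 coincides with it on every word.
P2 = cesaro(P, sigma, 2)
print(P2((0, 1, 0)), P2((1, 0, 1)), P2((0, 0)))  # 0.5 0.5 0.0
```

Consistent with theorem 3, both the deterministic source $P$ (with $H_t(P) = 0$) and its stationary mean (with $H_t = \frac{1}{t}\log 2 \to 0$) have entropy rate 0.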
5 Conclusion

We show, in an elementary fashion, that entropy rate is Lipschitzian relative to the topology of total variation. Besides providing a comparatively simple existence proof for HMSs and QRWs, this helps getting a more general grip on entropy rate. Moreover, it brings up some interesting open questions:

– A first open question which immediately arises is whether our arguments can be strengthened to stricter analytic properties. A first clue is that the definition of entropy rate as well as theorem 1 can be consistently extended to the whole real vector space of finite, signed measures. Rademacher's theorem [7] states that Lipschitzian functionals on finite-dimensional real vector spaces are differentiable almost everywhere w.r.t. the Lebesgue measure on the Borel sets. This suggests that entropy rate is close to being differentiable and, so far, we have not succeeded in constructing a random source at which entropy rate is not differentiable.

– The intuition behind our proof is that entropy rates cannot differ too much if sets of typical sequences of two sources overlap to a sufficiently high degree. However, it seems obvious that entropy rate is continuous when considering it relative to sizes of sets of typical sequences, which is a more general assumption. A corresponding result would certainly be applicable to coarser topologies such as, say, the weak topology. So far, it has only been known that entropy rate, as a functional on the set of stationary sources only, is upper semicontinuous [10] relative to the weak topology. We believe that theorems of the quality of theorem 1, based on the comparison of sizes of sets of typical sequences, will greatly improve such results.

A The norm of total variation

In the following, let $A \,\dot\cup\, B$ be the disjoint union of two sets $A$ and $B$ and $\complement A$ be the complement of a set $A$.
A.1 Finite signed measures

A finite, signed measure on $(\Omega, \mathcal{B})$ is a $\sigma$-additive, but not necessarily positive, finite set function on $\mathcal{B}$. The most important relevant properties of finite signed measures are summarized in the following theorem (see [11], ch. VI for proofs).

Theorem 4.
1. By eventwise addition and scalar multiplication, the set of finite signed measures can be considered as a real-valued vector space.
2. The Jordan decomposition theorem states that for every $P \in \mathcal{P}$ there are finite measures $P^+, P^-$ such that

$P = P^+ - P^-$   (44)

and for all other decompositions $P = P_1 - P_2$ with measures $P_1, P_2$ it holds that $P_1 = P^+ + \delta$, $P_2 = P^- + \delta$ for another measure $\delta$. In this sense, $P^+$ and $P^-$ are unique and called positive resp. negative variation. The measure $|P| := P^+ + P^-$ is called total variation.
3. In parallel to the Jordan decomposition, we have the Hahn decomposition of $\Omega$ into two disjoint events $\Omega^+, \Omega^-$,

$\Omega = \Omega^+ \,\dot\cup\, \Omega^-$   (45)

such that $P^-(\Omega^+) = 0$ and $P^+(\Omega^-) = 0$. $\Omega^+, \Omega^-$ are uniquely determined up to $|P|$-null-sets.
4. The norm of total variation $\|.\|_{TV}$ on $\mathcal{P}$ is given by

$\|P\|_{TV} := |P|(\Omega) = P^+(\Omega) + P^-(\Omega) = P^+(\Omega^+) + P^-(\Omega^-)$.   (46)

Obviously $\| |P| \|_{TV} = \|P\|_{TV}$.

A.2 Proof of lemma 1

For the proof, we will identify cylinder sets $B \in \mathcal{B}$ with sets of words $A_B \subset \Sigma^t$ as usual ($B$ is the set of sequences which are the continuations of the words in $A_B$). In our notation, we correspondingly obtain

$P(B) = \sum_{v \in A_B} (P^+(v) - P^-(v))$   (47)

for a signed measure $P$ with Jordan decomposition $P = P^+ - P^-$. We will further make use of the approximation theorem (see Halmos [11], p. 56, Th. D), which tells us that, given a measure $P$, an event $B \in \mathcal{B}$, and $\epsilon \in \mathbb{R}_+$, we find a cylinder set $F$ such that

$P(B \,\triangle\, F) < \epsilon$,   (48)

where $B \,\triangle\, F = (B \setminus F) \cup (F \setminus B)$ is the symmetric set difference.
A straightforward consequence of the approximation theorem is that $|P(B) - P(F)| < \epsilon$.

Proof (of lemma 1). It suffices to show
$$\|P\|_{TV} = \sup_{t \in \mathbb{N}} \sum_{v \in \Sigma^t} |P(v)| = \lim_{t \to \infty} \sum_{v \in \Sigma^t} |P(v)| \tag{49}$$
for an arbitrary finite, signed measure $P$. The second equation of (49) follows immediately from
$$\sum_{v \in \Sigma^t} |P(v)| = \sum_{v \in \Sigma^t} \Big| \sum_{a \in \Sigma} P(va) \Big| \le \sum_{v \in \Sigma^t} \sum_{a \in \Sigma} |P(va)| = \sum_{v \in \Sigma^{t+1}} |P(v)|, \tag{50}$$
which shows that $\big(\sum_{v \in \Sigma^t} |P(v)|\big)_{t \in \mathbb{N}}$ is a monotonically increasing sequence. It remains to show that it converges to $\|P\|_{TV}$, that is, that for every $\epsilon \in \mathbb{R}_+$ there is $T_0 \in \mathbb{N}$ with
$$\sum_{v \in \Sigma^{T_0}} |P(v)| > \|P\|_{TV} - \epsilon. \tag{51}$$
To this end, let $P = P^+ - P^-$ be the Jordan decomposition of $P$ and, correspondingly, $\Omega = \Omega^+ \mathbin{\dot\cup} \Omega^-$ the Hahn decomposition. By an application of the approximation theorem (see above) we find $T_0 \in \mathbb{N}$ and a cylinder set corresponding to $A \subset \Sigma^{T_0}$ with
$$|P|(\Omega^+ \mathbin{\triangle} A) < \frac{\epsilon}{4}, \tag{52}$$
a straightforward ($|P| = P^+ + P^-$) consequence of which is that both
$$P^+(\Omega^+ \mathbin{\triangle} A) < \frac{\epsilon}{4} \quad \text{and} \quad P^-(\Omega^+ \mathbin{\triangle} A) < \frac{\epsilon}{4}. \tag{53}$$
Now note that the obvious identity $\complement A \mathbin{\triangle} \complement B = A \mathbin{\triangle} B$ in combination with $\Omega^- = \complement \Omega^+$ and (53) yields
$$P^-(\Omega^- \mathbin{\triangle} \complement A) = P^-(\Omega^+ \mathbin{\triangle} A) < \frac{\epsilon}{4}. \tag{54}$$
(53) and (54) then yield the inequalities
$$P^+(\complement A) \overset{P^+(\Omega^-)=0}{=} P^+(\Omega^+ \setminus A) \le P^+(\Omega^+ \mathbin{\triangle} A) < \frac{\epsilon}{4} \tag{55}$$
and
$$P^-(A) \overset{P^-(\Omega^+)=0}{=} P^-(\Omega^- \setminus \complement A) \le P^-(\Omega^- \mathbin{\triangle} \complement A) < \frac{\epsilon}{4}. \tag{56}$$
Moreover, it is straightforward from (53) and (54) that
$$P^+(A) > P^+(\Omega^+) - \frac{\epsilon}{4} \quad \text{and} \quad P^-(\complement A) > P^-(\Omega^-) - \frac{\epsilon}{4}. \tag{57}$$
We finally compute
$$\begin{aligned}
\sum_{v \in \Sigma^{T_0}} |P(v)| &= \sum_{v \in A} |P(v)| + \sum_{v \in \complement A} |P(v)| \ge |P(A)| + |P(\complement A)| \\
&\ge P^+(A) - P^-(A) + P^-(\complement A) - P^+(\complement A) \\
&\overset{(55),(56),(57)}{>} \Big(P^+(\Omega^+) - \frac{\epsilon}{4}\Big) - \frac{\epsilon}{4} + \Big(P^-(\Omega^-) - \frac{\epsilon}{4}\Big) - \frac{\epsilon}{4} \\
&= P^+(\Omega^+) + P^-(\Omega^-) - \epsilon = \|P\|_{TV} - \epsilon.
\end{aligned} \tag{58}$$
⋄
A.3 Proof of theorem 2

We start with the following lemma.

Lemma 6. Let $P$ be a finite signed measure on $(\Omega, \mathcal{B})$ and $T : \Omega \to \Omega$ a measurable function. Then $P \circ T^{-1}$ is a finite signed measure for which
$$|P \circ T^{-1}|(B) \le |P|(T^{-1}B) \tag{59}$$
for all $B \in \mathcal{B}$. In particular,
$$\|P \circ T^{-1}\|_{TV} \le \|P\|_{TV}. \tag{60}$$

Proof. Note that $P \circ T^{-1} = P^+ \circ T^{-1} - P^- \circ T^{-1}$ is a decomposition into a difference of measures. Because of the uniqueness property of the Jordan decomposition (see Th. 4), there is a measure $\delta$ such that $P^+ \circ T^{-1} = (P \circ T^{-1})^+ + \delta$ and $P^- \circ T^{-1} = (P \circ T^{-1})^- + \delta$. Therefore
$$|P \circ T^{-1}|(B) = (P \circ T^{-1})^+(B) + (P \circ T^{-1})^-(B) \le P^+(T^{-1}B) + P^-(T^{-1}B) = |P|(T^{-1}B).$$
Setting $B = \Omega$ yields the last assertion, as $T^{-1}\Omega = \Omega$. ⋄

Proof of Th. 2. We recall that, in Th. 2, $T$ was supposed to be the shift operator, which is measurable. We observe that
$$\mu P := P \circ T^{-1} \tag{61}$$
establishes a linear operator on the vector space of finite, signed measures. Due to lemma 6, (60), it holds that $\|\mu P\|_{TV} \le \|P\|_{TV}$ for all finite signed measures $P$, which establishes
$$\|\mu\| \le 1, \tag{62}$$
where $\|\cdot\|$ is the operator norm associated with the norm of total variation.

Now consider the subspace $\mathcal{P}_P$ of the finite signed measures spanned by the $P \circ T^{-i}$, $i \in \mathbb{N}$, for a given finite signed measure $P$. Note that an equivalent description of finite evolution dimension is just
$$\dim \mathcal{P}_P < \infty. \tag{63}$$
Note further that
$$\mu(\mathcal{P}_P) \subset \mathcal{P}_P. \tag{64}$$
The elementary, linear algebraic lemma 3.2 in [6] states that, given an endomorphism $F : V \to V$ on a finite-dimensional real or complex vector space $V$ with $\|F\| \le 1$, for all $x \in V$ there is an $F$-invariant $\bar{x} \in V$ such that
$$\lim_{n \to \infty} \Big\| \frac{1}{n} \sum_{k=0}^{n-1} F^k x - \bar{x} \Big\| = 0. \tag{65}$$
As all norms on $V$ are equivalent, this applies for arbitrary choices of norms $\|\cdot\|$. Replacing $V$ by $\mathcal{P}_P$, $\|\cdot\|$ by $\|\cdot\|_{TV}$, $F$ by $\mu$ and $x$ by $P$ concludes the proof of theorem 2. ⋄
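The finite-dimensional mean ergodic fact invoked at the end of the proof of theorem 2 can be illustrated numerically. A Python sketch (the transition matrix and all names are illustrative assumptions, not from the paper): the operator below is the action of the period-two Markov transition matrix $\left(\begin{smallmatrix}0&1\\1&0\end{smallmatrix}\right)$, which is contractive in the 1-norm; its powers do not converge, yet the Cesàro means in (65) do, and the limit is invariant:

```python
def apply_F(x):
    """x -> x P for the period-2 transition matrix P = [[0, 1], [1, 0]];
    this linear map satisfies ||F|| <= 1 w.r.t. the 1-norm."""
    return (x[1], x[0])

def cesaro_mean(x, n):
    """(1/n) * sum_{k=0}^{n-1} F^k x, cf. (65)."""
    s0 = s1 = 0.0
    for _ in range(n):
        s0 += x[0]
        s1 += x[1]
        x = apply_F(x)
    return (s0 / n, s1 / n)

x = (1.0, 0.0)                # a point mass; the orbit F^k x oscillates
xbar = cesaro_mean(x, 10_000)
print(xbar)                   # (0.5, 0.5)
assert apply_F(xbar) == xbar  # the Cesàro limit is F-invariant
```

The design mirrors the proof: $F$ plays the role of $\mu$ on the (here two-dimensional) space $\mathcal{P}_P$, and the invariant limit corresponds to the asymptotic mean of the source.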
B Proof of lemma 7

Lemma 7. Let $P$ be a discrete random source. Then it holds that
$$\lim_{t \to \infty} \big( H_t(P) - H_t(P \circ T^{-k}) \big) = 0. \tag{66}$$

Proof. Using the notation
$$I_t^k(P) := \frac{1}{t} \sum_{v \in \Sigma^k} \sum_{w \in \Sigma^t} P(vw) \log \frac{P \circ T^{-k}(w)}{P(vw)} \tag{67}$$
and
$$J_t^k(P) := \frac{1}{t} \sum_{v \in \Sigma^t} \sum_{w \in \Sigma^k} P(vw) \log \frac{P(v)}{P(vw)}, \tag{68}$$
one obtains
$$H_t(P) + J_t^k(P) \overset{(*)}{=} \frac{k+t}{t}\, H_{k+t}(P) = I_t^k(P) + H_t(P \circ T^{-k}), \tag{69}$$
where $(*)$ follows from a well known and elementary theorem, the chain rule for entropy (e.g. [12], p. 22, theorem 2.1), and the second equation is obvious. Because of
$$0 \le J_t^k(P) \le \frac{k}{t}\, H_k(P \circ T^{-t}) \le \frac{1}{t} \log |\Sigma^k| \xrightarrow{t \to \infty} 0 \tag{70}$$
and
$$0 \le I_t^k(P) \le \frac{k}{t}\, H_k(P) \le \frac{1}{t} \log |\Sigma^k| \xrightarrow{t \to \infty} 0, \tag{71}$$
the assertion follows from an application of the sandwich theorem. ⋄

Acknowledgment

I would like to thank Ulrich Faigle, who contributed considerably to the work presented here. I would also like to thank the anonymous reviewers for helpful comments and suggestions.

References

1. D. Aharonov, A. Ambainis, J. Kempe, U. Vazirani, "Quantum walks on graphs", in Proc. of 33rd ACM STOC, New York, 2001, pp. 50-59.
2. P. Billingsley, Ergodic Theory and Information, Wiley, 1965.
3. L. Breiman, "The individual ergodic theorem of information theory", Annals of Mathematical Statistics, 1957, vol. 28, pp. 809-811.
4. L. Breiman, "A correction to 'The individual ergodic theorem of information theory'", Annals of Mathematical Statistics, 1960, vol. 31, pp. 809-810.
5. U. Faigle, A. Schönhuth, "Quantum predictor models", Electronic Notes in Discrete Mathematics, 2006, vol. 25, pp. 149-155.
6. U. Faigle and A. Schönhuth, "Asymptotic mean stationarity of sources with finite evolution dimension", IEEE Trans. Inf. Theory, 2007, vol. 53(7), pp. 2342-2348.
7. H. Federer, Geometric Measure Theory, Springer, 1969.
8.
R. Gray and L. Davisson, "The ergodic decomposition of stationary discrete random processes", IEEE Transactions on Information Theory, 1974, vol. 20(5), pp. 625-636.
9. R.M. Gray and J.C. Kieffer, "Asymptotically mean stationary measures", Annals of Probability, 1980, vol. 8, pp. 962-973.
10. R.M. Gray, Entropy and Information Theory, Springer Verlag, 1990.
11. P.R. Halmos, Measure Theory, Van Nostrand, 1964.
12. T.S. Han and K. Kobayashi, Mathematics of Information and Coding, American Mathematical Society, 2002.
13. A. Heller, "On stochastic processes derived from Markov chains", Annals of Mathematical Statistics, 1965, vol. 36(4), pp. 1286-1291.
14. H. Ito, S.-I. Amari and K. Kobayashi, "Identifiability of hidden Markov information sources and their minimum degrees of freedom", IEEE Trans. Inf. Theory, 1992, vol. 38(2), pp. 324-333.
15. S.D. Jacka and G.O. Roberts, "On strong forms of weak convergence", Stochastic Processes and Applications, 1997, vol. 67, pp. 41-53.
16. K. Jacobs, "Die Übertragung diskreter Informationen durch periodische und fastperiodische Kanäle", Mathematische Annalen, 1959, vol. 137, pp. 125-135.
17. K. Jacobs, "Über die Struktur der mittleren Entropie", Mathematisches Zentralblatt, 1962, vol. 78, pp. 33-43.
18. P. Jacquet, G. Seroussi, and W. Szpankowski, "On the entropy of a hidden Markov process", in Proc. Data Compression Conf., Snowbird, UT, March 2004, pp. 362-371.
19. J.C. Kieffer and M. Rahe, "Markov channels are asymptotically mean stationary", SIAM J. Math. Anal., 1981, vol. 12(3), pp. 293-305.
20. G. Han and B. Marcus, "Analyticity of entropy rate of hidden Markov chains", IEEE Trans. Inf. Theory, 2006, vol. 52(12), pp. 5251-5266.
21. G. Han and B. Marcus, "Derivatives of entropy rate in special families of hidden Markov chains", IEEE Trans. Inf. Theory, 2007, vol. 53(7), pp. 2642-2652.
22. G. Han and B. Marcus, "Asymptotics of entropy rate of hidden Markov chains at weak black holes", Proc. IEEE Int. Symp. Inf. Th., 2008, pp. 2629-2633.
23. B. McMillan, "The basic theorems of information theory", Annals of Mathematical Statistics, 1953, vol. 24, pp. 196-219.
24. Y. Ephraim and N. Merhav, "Hidden Markov processes", IEEE Trans. on Information Theory, 2002, vol. 48(6), pp. 1518-1569.
25. E. Ordentlich and T. Weissman, "On the optimality of symbol by symbol filtering and denoising", IEEE Trans. Inf. Theory, 2006, vol. 52(1), pp. 19-40.
26. A. Schönhuth, "The ergodic decomposition of asymptotically mean stationary random sources", submitted manuscript, http://arxiv.org/abs/0804.2487.
27. A. Schönhuth, "On analytic properties of entropy rate", 2006, technical report, ZAIK, University of Cologne.
28. A. Schönhuth, "A simple and efficient solution of the identifiability problem for hidden Markov sources and quantum random walks", ISITA 2008, to appear, http://arxiv.org/abs/0808.2833.
29. C. Shannon, "A mathematical theory of communication", Bell System Technical Journal, 1948.
30. O. Zuk, E. Domany, I. Kanter, and M. Aizenman, "From finite system entropy to entropy rate for a hidden Markov process", IEEE Signal Processing Letters, 2006, vol. 13(9), pp. 517-520.
31. O. Zuk, I. Kanter, and E. Domany, "The entropy of a binary hidden Markov process", Journal of Statistical Physics, 2005, vol. 121(3-4), pp. 343-360.
