The ergodic decomposition of asymptotically mean stationary random sources

Alexander Schönhuth, Member IEEE
Pacific Institute for the Mathematical Sciences
School of Computing Science
Simon Fraser University
8888 University Drive
Burnaby, BC, V5A 1S6, Canada
schoenhuth@cs.sfu.ca

Abstract. It is demonstrated how to represent asymptotically mean stationary (AMS) random sources with values in standard spaces as mixtures of ergodic AMS sources. This is an extension of the well-known decomposition of stationary sources, which has facilitated the generalization of prominent source coding theorems to arbitrary, not necessarily ergodic, stationary sources. Asymptotic mean stationarity generalizes the definition of stationarity and covers a much larger variety of real-world examples of random sources of practical interest. It is sketched how to obtain source coding and related theorems for arbitrary, not necessarily ergodic, AMS sources, based on the presented ergodic decomposition.

Keywords. Asymptotic mean stationarity, ergodicity, ergodic decomposition, ergodic theorem, source coding, stationarity.

1 Introduction

The main purpose of this paper is to demonstrate how to decompose asymptotically mean stationary (AMS) random sources into ergodic AMS sources. The issue was brought up in [10], as it is involved in a variety of aspects of substantial interest to information theory. To the best of our knowledge, it had remained unsolved since then. The ergodic decomposition of AMS sources can be viewed as an extension of the ergodic decomposition of stationary sources, which states that a stationary source can be decomposed into ergodic components or, in other words, that it is a mixture of stationary and ergodic sources. This was originally discussed in more abstract measure theoretic settings (see the subsequent remark 1).
The first result in information theory that builds on the idea of decomposing a source into ergodic components was obtained by Jacobs in 1963. He proved that the entropy rate of a stationary source is the average of the rates of its ergodic components [17]. In 1974, the ergodic decomposition of stationary sources was rigorously introduced to the community by Gray and Davisson [7], who also provided an intuitive proof for sources with values in a discrete alphabet. This turned out to be a striking success, as prominent theorems from source coding theory and related fields could be extended to arbitrary, not necessarily ergodic, stationary sources [8, 19, 23, 27, 22, 5] (see the references therein as well as [11] for a complete list).

In general, these results underscore that ergodic and information theory have traditionally been sources of mutual inspiration.

Remark 1. The first variant of an ergodic decomposition of stationary sources (with values in certain topological spaces) was elaborated in a seminal paper by von Neumann [31]. Subsequently, Kryloff and Bogoliouboff [3] obtained the result for compact metric spaces, and it was further extended by Halmos [13, 14] to normal spaces. In parallel, Rokhlin [29] proved the decomposition theorem for Lebesgue spaces, which can still be considered one of the most general results. Oxtoby [24] further clarified the situation by demonstrating that Kryloff's and Bogoliouboff's results can be obtained as corollaries of Riesz' representation theorem. In ergodic theory, the corresponding idea is now standard [26, 32].

Asymptotic mean stationarity was first introduced in 1952 by Dowker [4] and further studied by Rechard [28], but became an area of active research only in the early 1980s, thanks to a fundamental paper of Gray and Kieffer [9].
Asymptotic mean stationarity is a property that applies to a large variety of natural examples of sources of practical interest [9]. Reasons are:

1. Asymptotic mean stationarity is stable under conditioning (see [21], p. 33) whereas stationarity is not.
2. To possess ergodic properties w.r.t. bounded measurements is equivalent to asymptotic mean stationarity [4, 9]. Note that Birkhoff's theorem (e.g. [21]) states that stationarity is sufficient to possess ergodic properties.
3. The Shannon-McMillan-Breiman (SMB) theorem was iteratively extended to finally hold for AMS discrete random sources in 1980 [9]. Note that an alternative, elegant proof of the SMB theorem can be achieved by employing the ergodic decomposition of stationary sources [1].

The second point gives evidence of the practical relevance of AMS sources, as to possess ergodic properties is a necessity in a wide range of real-world applications of stochastic processes. For example, asymptotic mean stationarity is implicitly assumed when relative frequencies along sequences emitted by a real-world process are to converge. See also [20, 6] for expositions of large classes of AMS processes of practical interest. The validity of the SMB theorem is a further theoretical clue to the relevance of AMS sources in information theory.

The benefits of an ergodic decomposition of AMS sources are, on one hand, to arrange the theory of AMS sources and, on the other hand, to facilitate follow-up results in source coding theory and related fields (see the discussion section 7 for some immediate consequences). In [10], one can find a concise proof of the ergodic decomposition of stationary sources as well as the ergodic decomposition of two-sided AMS sources, both with values in standard spaces.
The case of two-sided AMS sources, however, is a straightforward reduction to the stationary case which does not apply to arbitrary AMS sources. As the result for arbitrary AMS sources would have been highly desirable, it was listed as an open question in the discussion section of [10]. The main purpose of this paper is to provide a proof of the ergodic decomposition of arbitrary (two-sided and one-sided) AMS sources with values in standard spaces, which cover discrete-valued and all natural examples of topological spaces.

The paper is organized as follows. In section 2 we collect basic notations and state the two main results. The first one is the ergodic decomposition itself and the second one is an essential lemma that may be interesting in its own right. In section 3, we present basic definitions of probability and measure theory as well as a classical ergodic theorem (Krengel's stochastic ergodic theorem) required for our purposes. The statement of Krengel's theorem is intuitively easy to grasp and can be understood by means of basic definitions from probability theory only. In section 4 we give a proof of lemma 1. Both the statement and the proof of lemma 1 are crucial for the proof of the decomposition. In section 5, we list relevant basic properties of standard spaces (subsection 5.1) and regular conditional probabilities and conditional expectations (subsection 5.2). Finally, in section 6, we present the proof of the ergodic decomposition. For organizational convenience, we have subdivided it into three steps and collected the merely technical passages into lemmata which have been deferred to the appendices A and B. We conclude by outlining immediate consequences of our result and pointing out potential applications in source coding theory, in the discussion section 7.
2 Basic Notations and Statement of Results

Let $(\Omega, \mathcal{B})$ be a measurable space and $T : \Omega \to \Omega$ a measurable function. In this setting (see [26, 12]), a probability measure $P$ is called stationary (relative to $T$) if $P(B) = P(T^{-1}B)$ for all $B \in \mathcal{B}$. It is called asymptotically mean stationary (AMS) (relative to $T$) if there is a measure $\bar{P}$ on $(\Omega, \mathcal{B})$ such that

$$\forall B \in \mathcal{B}: \quad \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} P(T^{-i}B) = \bar{P}(B). \tag{1}$$

Clearly, the measure $\bar{P}$ is stationary and it is therefore called the stationary mean of $P$. An event $I \in \mathcal{B}$ is called invariant (relative to $T$) if $T^{-1}I = I$. The set of invariant events is a sub-$\sigma$-algebra of $\mathcal{B}$ which we will denote by $\mathcal{I}$. A probability measure $P$ on $(\Omega, \mathcal{B})$ is said to be ergodic (relative to $T$) if $P(I) \in \{0, 1\}$ for any invariant $I \in \mathcal{I}$. Note that an AMS system is ergodic if and only if its stationary mean is.

In order to apply this theory to ($A$-valued) random sources, that is, discrete-time stochastic processes with values in a standard space $A$ (for a definition of standard space see subsection 5.1), one sets

$$\Omega = A^{I} = \bigotimes_{i \in I} A$$

where $I \in \{\mathbb{N}, \mathbb{Z}\}$. That is, $\Omega$ is the space of one-sided ($I = \mathbb{N}$) or two-sided ($I = \mathbb{Z}$) $A$-valued sequences. $\mathcal{B}$ then is set to be the $\sigma$-algebra generated by the cylinder sets of sequences. A random source is given by a probability measure $P$ on $(\Omega, \mathcal{B})$. Further, $T : \Omega \to \Omega$ is defined to be the left shift operator, i.e. $(Tx)_n = x_{n+1}$ for $x = (x_0, x_1, \ldots, x_n, \ldots) \in \Omega$ (one-sided case) or $x = (\ldots, x_{-1}, x_0, x_1, \ldots) \in \Omega$ (two-sided case).

The main contribution of this paper is to give a proof of the following theorem.

Theorem 1. Let $P$ be a probability measure on a standard space $(\Omega, \mathcal{B})$ which is AMS relative to the measurable $T : \Omega \to \Omega$. Then there is a $T$-invariant set $E \in \mathcal{I}$ with $P(E) = 1$ such that for each $\omega \in E$ there is an ergodic AMS probability measure $P_\omega$ and the following properties apply:

(a) $\forall B \in \mathcal{B}: \; P_\omega(B) = P_{T\omega}(B)$.
(b) $\forall B \in \mathcal{B}: \; P(B) = \int P_\omega(B) \, dP(\omega)$.
(c) If $f \in L^1(P)$, then also $\omega \mapsto \int f \, dP_\omega \in L^1(P)$ and
$$\int f \, dP = \int \left( \int f \, dP_\omega \right) dP(\omega).$$

Replacing AMS by stationary yields the aforementioned and well-known theorem of the ergodic decomposition of stationary random sources (e.g. [10], th. 2.5).

The following lemma is a key observation for the proof of theorem 1 and may be interesting in its own right. It states that the convergence involved in the definition of AMS measures is uniform over the elements of $\mathcal{B}$. This may seem intuitively surprising, as the underlying measurable space does not even have to be standard.

Lemma 1. Let $P$ be an AMS measure on $(\Omega, \mathcal{B})$ relative to $T$. Then

$$\sup_{B \in \mathcal{B}} \left| \frac{1}{n} \sum_{i=0}^{n-1} P(T^{-i}B) - \bar{P}(B) \right| \xrightarrow[n \to \infty]{} 0.$$

In other words, the convergence of (1) is uniform over the events $B \in \mathcal{B}$.

3 Preliminaries

3.1 Convergence of Measures

Definition 1. Let $(P_n)_{n \in \mathbb{N}}$ be a sequence of probability measures on a measurable space $(\Omega, \mathcal{B})$.

– We say that the $P_n$ converge strongly to a probability measure $\bar{P}$ if the sequences $(P_n(B))_{n \in \mathbb{N}}$ converge to $\bar{P}(B)$ for all $B \in \mathcal{B}$.
– If this convergence happens to be uniform in $B \in \mathcal{B}$ we say that the $P_n$ converge Skorokhod weakly to $\bar{P}$.

See [16] for history and detailed characterisations of these definitions. Obviously Skorokhod weak convergence implies strong convergence.
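To make definition (1) and lemma 1 concrete, here is a minimal sketch on a finite toy system. Everything in it is an assumption made for illustration and does not come from the paper: the four-point space `OMEGA`, the map `T` (0 → 1, 1 → 2, 2 → 1, 3 → 3), the starting measure `P` and the candidate stationary mean `P_BAR`. The sketch computes the averages $\frac{1}{n}\sum_{i<n} P \circ T^{-i}$ exactly and evaluates the supremum over all 16 events by brute force.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy system (not from the paper): Omega = {0,1,2,3}.
OMEGA = (0, 1, 2, 3)
T = {0: 1, 1: 2, 2: 1, 3: 3}  # 0 is transient, {1,2} is a 2-cycle, 3 is fixed


def pushforward(p):
    """Return P o T^{-1}, i.e. the measure B -> P(T^{-1}B), as point masses."""
    q = {w: Fraction(0) for w in OMEGA}
    for w, mass in p.items():
        q[T[w]] += mass
    return q


def cesaro_average(p, n):
    """Return P_n = (1/n) * sum_{i=0}^{n-1} P o T^{-i}."""
    total = {w: Fraction(0) for w in OMEGA}
    current = dict(p)
    for _ in range(n):
        for w in OMEGA:
            total[w] += current[w]
        current = pushforward(current)
    return {w: total[w] / n for w in OMEGA}


def sup_over_events(p, q):
    """Return sup_B |p(B) - q(B)| by enumerating every subset B of Omega."""
    best = Fraction(0)
    for r in range(len(OMEGA) + 1):
        for event in combinations(OMEGA, r):
            gap = abs(sum((p[w] - q[w] for w in event), Fraction(0)))
            best = max(best, gap)
    return best


# A non-stationary start: half the mass on the transient point, half on the fixed point.
P = {0: Fraction(1, 2), 1: Fraction(0), 2: Fraction(0), 3: Fraction(1, 2)}
# Candidate stationary mean: the transient mass spreads evenly over the 2-cycle.
P_BAR = {0: Fraction(0), 1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 2)}

# Uniform distance between P_n and P_BAR for a few even n (and n = 1).
gaps = [sup_over_events(cesaro_average(P, n), P_BAR) for n in (1, 10, 100, 1000)]
```

In this toy case the list `gaps` shrinks like $1/(2n)$ for even $n$, which is the uniform convergence asserted by lemma 1. On a finite space uniformity is automatic, so the sketch only illustrates the statement; the content of the lemma lies in the general case.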
Seen from this perspective, lemma 1 states that the measures $P_n = \frac{1}{n} \sum_{t=0}^{n-1} P \circ T^{-t}$, where $P$ is an AMS measure and $P \circ T^{-t}(B) := P(T^{-t}B)$, do not only converge strongly (which they do by definition), but also Skorokhod weakly to the stationary mean $\bar{P}$.

A helpful characterization of Skorokhod weak convergence is the following theorem. To state it, we recall that a probability measure $Q$ is said to dominate another probability measure $P$ (written $Q \gg P$) if $Q(B) = 0$ implies $P(B) = 0$ for all $B \in \mathcal{B}$. The theorem of Radon-Nikodym (e.g. [15]) states that in case of $Q \gg P$ there is a measurable function $f : \Omega \to \mathbb{R}$, called Radon-Nikodym derivative or simply density, written $f = \frac{dP}{dQ}$, such that

$$P(B) = \int_B f \, dQ$$

for all $B \in \mathcal{B}$. For two densities $f, g = \frac{dP}{dQ}$ it holds that $Q(f = g) = 1$ (hence also $P(f = g) = 1$). As usual, $L^1(Q) := L^1(\Omega, \mathcal{B}, Q)$ denotes the (linear) space of $Q$-integrable functions on $(\Omega, \mathcal{B})$ modulo the subspace of functions that are null almost everywhere. For technical convenience, we will sometimes identify elements $f \in L^1(Q)$ with their representatives $f : \Omega \to \mathbb{R}$. As a consequence we have that $f = g$ in $L^1(Q)$ if and only if $Q(f = g) = 1$ for their representatives. That is, equality is in an almost-everywhere sense for the representatives. Therefore, in $L^1(Q)$, a density is unique. Furthermore, $L^1(Q)$ can be equipped with the norm

$$\|f\|_1 := \int_\Omega |f| \, dQ.$$

See standard textbooks (e.g. [15]) for details. In this language, Skorokhod weak convergence has a useful characterisation.

Theorem 2 ([16]). Let $(P_n)_{n \in \mathbb{N}}$, $\bar{P}$ be probability measures. Then the following statements are equivalent:

(i) The $P_n$ converge Skorokhod weakly to $\bar{P}$.
(ii) There is a probability measure $Q$, which dominates $\bar{P}$ and all of the $P_n$, such that the densities $f_n := \frac{dP_n}{dQ}$ converge stochastically to the density $\bar{f} := \frac{d\bar{P}}{dQ}$, that is
$$\forall \epsilon \in \mathbb{R}_+: \quad Q(\{\omega : |f_n(\omega) - \bar{f}(\omega)| > \epsilon\}) \xrightarrow[n \to \infty]{} 0.$$
(iii) There is a probability measure $Q$, which dominates $\bar{P}$ and all of the $P_n$, such that the densities $f_n := \frac{dP_n}{dQ}$ converge in mean (in $L^1(Q)$) to the density $\bar{f} := \frac{d\bar{P}}{dQ}$, that is
$$\int |f_n - \bar{f}| \, dQ \xrightarrow[n \to \infty]{} 0.$$

Proof. See [16], pp. 6–7. ⋄

3.2 Krengel's theorem

In few words, the stochastic ergodic theorem of Krengel states that the averages of densities which are obtained by iterative applications of a positive contraction in $L^1(Q)$ converge stochastically to a density that is invariant with respect to the positive contraction. To be more precise, let $(\Omega, \mathcal{B}, P)$ be a measure space and $U$ a positive contraction on $L^1(\Omega, \mathcal{B}, P)$, that is, $Uf \geq 0$ for $f \geq 0$ (positivity) and $\|Uf\|_1 \leq \|f\|_1$ (contraction). Then $\Omega$ can be decomposed into two disjoint subsets (uniquely determined up to $P$-nullsets)

$$\Omega = \tilde{C} \,\dot{\cup}\, \tilde{D},$$

where $\tilde{C}$ is the maximal support of an $f_0 \in L^1(\Omega, \mathcal{B}, P)$ with $U f_0 = f_0$. In other words, for all $f \in L^1$ with $Uf = f$, we have $f = 0$ on $\tilde{D}$, and there is an $f_0 \in L^1$ such that both $U f_0 = f_0$ and $f_0 > 0$ on $\tilde{C}$ (see [21], p. 141 ff. for details). Krengel's theorem then reads as follows.

Theorem 3 (Stochastic ergodic theorem; Krengel). If $U$ is a positive contraction on $L^1$ of a $\sigma$-finite measure space $(\Omega, \mathcal{B}, Q)$ (e.g. a probability space; the definition of a $\sigma$-finite measure space [15] is not further needed here) then, for any $f \in L^1$, the averages

$$A_n f := \frac{1}{n} \sum_{t=0}^{n-1} U^t f$$

converge stochastically to a $U$-invariant $\bar{f}$. Moreover, on $\tilde{C}$ we have $L^1$-convergence, whereas on $\tilde{D}$ the $A_n f$ converge stochastically to $0$.
If $f \geq 0$ then

$$\bar{f} = \liminf_{n \to \infty} A_n f \quad \text{in } L^1(Q). \tag{2}$$

Proof. [21], p. 143. ⋄

3.3 Finite Signed Measures

Let $(\Omega, \mathcal{B})$ be a measurable space. A finite signed measure is a $\sigma$-additive, but not necessarily positive, finite set function on $\mathcal{B}$. The theorem of the Jordan decomposition ([15], p. 120 ff.) states that $P = P^+ - P^-$ for measures $P^+, P^-$. These measures are uniquely determined insofar as if $P = P_1 - P_2$ for measures $P_1, P_2$ then there is a measure $\delta$ such that

$$P_1 = P^+ + \delta \quad \text{and} \quad P_2 = P^- + \delta. \tag{3}$$

$P^+$, $P^-$ and $|P| := P^+ + P^-$ are called positive, negative and total variation of $P$. We further define

$$\|P\|_{TV} := |P|(\Omega).$$

By "eventwise" addition and scalar multiplication the set of finite signed measures can be made a normed vector space equipped with the norm of total variation $\|\cdot\|_{TV}$, written $(\mathcal{P}, \|\cdot\|_{TV})$ or simply $\mathcal{P}$. The following observation about signed measures and measurable functions is crucial for this work.

Lemma 2. Let $P$ be a finite signed measure on $(\Omega, \mathcal{B})$ and $T : \Omega \to \Omega$ a measurable function. Then $P \circ T^{-1}$ is a finite signed measure for which

$$|P \circ T^{-1}|(B) \leq |P|(T^{-1}B)$$

for all $B \in \mathcal{B}$. In particular, $\|P \circ T^{-1}\|_{TV} \leq \|P\|_{TV}$.

Proof. Note that $P \circ T^{-1} = P^+ \circ T^{-1} - P^- \circ T^{-1}$ is a decomposition into a difference of measures. Because of the uniqueness property of the Jordan decomposition (3), there is a measure $\delta$ such that $P^+ \circ T^{-1} = (P \circ T^{-1})^+ + \delta$ and $P^- \circ T^{-1} = (P \circ T^{-1})^- + \delta$. Therefore

$$|P \circ T^{-1}|(B) = (P \circ T^{-1})^+(B) + (P \circ T^{-1})^-(B) \leq P^+(T^{-1}B) + P^-(T^{-1}B) = |P|(T^{-1}B).$$

$B = \Omega$ yields the last assertion, as $T^{-1}\Omega = \Omega$. ⋄

We finally observe the following well-known relationship between signed measures dominated by a measure $Q$ and $L^1(Q)$. To this end, as usual (e.g. [15]), we say that a finite signed measure $P$ is dominated by $Q$ if its total variation is, that is, $|P| \ll Q$. Note that the set $\mathcal{P}_Q$ of finite signed measures that are dominated by $Q$ is a linear subspace of $\mathcal{P}$.

Lemma 3. Let $Q$ be a measure on the measurable space $(\Omega, \mathcal{B})$ and $\mathcal{P}_Q$ be the linear space of the finite signed measures that are dominated by $Q$. If $P_f(B) := \int_B f \, dQ$ for $f \in L^1(Q)$, then

$$\Phi : (L^1(Q), \|\cdot\|_1) \longrightarrow (\mathcal{P}_Q, \|\cdot\|_{TV}), \qquad f \mapsto P_f$$

establishes an isometry of normed vector spaces.

Proof. This is a consequence of the theorem of Radon-Nikodym, see [15], p. 128 ff. If $P$ is a finite signed measure with $|P| \ll Q$ then also $P^+, P^- \ll Q$. Define

$$\Psi(P) := \frac{dP^+}{dQ} - \frac{dP^-}{dQ} \in L^1(Q)$$

as the difference of the densities of $P^+, P^-$ relative to $Q$. Then $\Psi$ is just the inverse of $\Phi$. It is straightforward to check that $\|f\|_1 = \|\Phi(f)\|_{TV}$. ⋄

4 Proof of Lemma 1

We start by illustrating one of the core techniques of this work. Let $(\Omega, \mathcal{B})$ be a measurable space and $(Q_n)_{n \in \mathbb{N}}$ be a countable collection of probability measures on it. Then the set function defined by

$$Q(B) := \sum_{n \geq 0} 2^{-n-1} Q_n(B) \quad \forall B \in \mathcal{B} \tag{4}$$

is a probability measure which dominates all of the $Q_n$ [16].

Let now $(\Omega, \mathcal{B}, P, T)$ be such that $P$ is an AMS measure relative to the measurable $T : \Omega \to \Omega$. Define further $P_n$ to be the measures given by

$$P_n(B) = \frac{1}{n} \sum_{t=0}^{n-1} P(T^{-t}B) \tag{5}$$

for $B \in \mathcal{B}$. As a consequence of (4), the set function $Q$ defined by

$$Q(B) := \frac{1}{2} \left( \bar{P}(B) + \sum_{n \geq 0} 2^{-n-1} P(T^{-n}B) \right) \tag{6}$$

for $B \in \mathcal{B}$ is a probability measure which dominates all of the $P \circ T^{-n}$ as well as $\bar{P}$. Hence it also dominates all of the $P_n$. Accordingly, we write

$$f_n := \frac{dP_n}{dQ} \quad \text{and} \quad \bar{f} := \frac{d\bar{P}}{dQ} \tag{7}$$

for the respective densities. Lemma 1 can be obtained as a corollary of the following result.

Lemma 4.
Let $P$ be an AMS probability measure on $(\Omega, \mathcal{B})$ relative to $T$ with stationary mean $\bar{P}$. Let $P_n$, $Q$, $f_n$ and $\bar{f}$ be as defined by equations (5), (6) and (7). Then the $f_n$ converge stochastically to the density $\bar{f} := \frac{d\bar{P}}{dQ}$. Moreover,

$$\bar{f} = \liminf_{n \to \infty} f_n \quad Q\text{-a.e.} \tag{8}$$

Proof. Let $f_1 = \frac{dP}{dQ}$. The road map of the proof is to construct a positive contraction $U$ on $L^1(Q)$ such that

$$f_n = \frac{1}{n} \sum_{t=0}^{n-1} U^t f_1 =: A_n f_1.$$

As a consequence of Krengel's theorem we will obtain that the $f_n$ converge stochastically to a $U$-invariant limit $f^*$. In a final step we will show that indeed $f^* = \bar{f}$ in $L^1(Q)$ (i.e. $Q$-a.e.), which completes the proof.

Our endomorphism $U$ on $L^1(Q)$ is induced by the measurable function $T$. Let $f \in L^1(Q)$. We first recall that, by lemma 3, the set function $\Phi(f)$ given by

$$\Phi(f)(B) := \int_B f \, dQ$$

for $B \in \mathcal{B}$ and $f \in L^1(Q)$ is a finite signed measure on $(\Omega, \mathcal{B})$ whose total variation $|\Phi(f)|$ is dominated by $Q$. We would like to define

$$Uf := \Phi^{-1}(\Phi(f) \circ T^{-1}),$$

which would obviously be linear. However, $\Phi^{-1}$ is only defined on $\mathcal{P}_Q$, that is, for finite signed measures that are dominated by $Q$. Therefore, we have to show that $\Phi(f) \circ T^{-1} \in \mathcal{P}_Q$, which translates to demonstrating that

$$|\Phi(f) \circ T^{-1}| \ll Q.$$

This does not hold in general (see [21]). However, in the special case of the dominating $Q$ chosen here, it can be proven. To see this, let $B$ be such that $|\Phi(f) \circ T^{-1}|(B) > 0$; we have to show that $Q(B) > 0$. Because of lemma 2,

$$|\Phi(f)|(T^{-1}B) \geq |\Phi(f) \circ T^{-1}|(B) > 0.$$

As $|\Phi(f)| \ll Q$, we obtain $Q(T^{-1}B) > 0$. By definition of $Q$ we thus either find an $N_0 \in \mathbb{N}$ such that

$$0 < P(T^{-N_0}(T^{-1}B)) = P(T^{-N_0-1}B)$$

or we have that

$$0 < \bar{P}(T^{-1}B) = \bar{P}(B)$$

because of the stationarity of $\bar{P}$.
Both cases imply $Q(B) > 0$, which we had to show.

If $f \geq 0$ then $\Phi(f)$ is a measure. Hence also $\Phi(f) \circ T^{-1}$ is a measure, which in turn implies $Uf = \frac{d(\Phi(f) \circ T^{-1})}{dQ} \geq 0$. Hence $U$ is positive. It is also a contraction with respect to the $L^1$-norm $\|\cdot\|_1$, as, because of the lemmata 2 and 3,

$$\|Uf\|_1 = \|\Phi(f) \circ T^{-1}\|_{TV} \leq \|\Phi(f)\|_{TV} = \|f\|_1.$$

For $f_1 = \frac{dP}{dQ}$ being the density of $P$ relative to $Q$ we obtain

$$U^n f_1 = \frac{d(P \circ T^{-n})}{dQ}.$$

Hence the $f_n := A_n f_1 = \frac{1}{n} \sum_{t=0}^{n-1} U^t f_1$ are the densities of the $P_n = \frac{1}{n} \sum_{t=0}^{n-1} P \circ T^{-t}$ relative to $Q$. An application of Krengel's theorem 3 then shows that the $A_n f_1$ converge stochastically to a $U$-invariant limit $f^* \in L^1(Q)$. Note that a positive $U$-invariant $f$ just corresponds to a stationary measure.

It remains to show that $\bar{f} = f^*$ in $L^1(Q)$ or, equivalently, $\bar{f} = f^*$ $Q$-a.e. for their representatives (see the discussions in subsection 3.1). Let $\tilde{D}$, as described in subsection 3.2, be the complement of the maximal support of a $U$-invariant $g \in L^1(Q)$. We recall that stationary measures are identified with positive, $U$-invariant elements of $L^1(Q)$. Therefore, $\bar{f} = \frac{d\bar{P}}{dQ}$ is $U$-invariant, which yields

$$Q(\{\bar{f} > 0\} \cap \tilde{D}) = 0,$$

which implies $\bar{f} = 0$ $Q$-a.e. on $\tilde{D}$. Due to Krengel's theorem, it holds that also $f^* = 0$ $Q$-a.e. on $\tilde{D}$, and we obtain that

$$\bar{f} = 0 = f^* \quad Q\text{-a.e. on } \tilde{D}.$$

In order to conclude that

$$\bar{f} = f^* \quad Q\text{-a.e. on } \tilde{C},$$

it remains to show that $\int_B f^* \, dQ = \int_B \bar{f} \, dQ$ for events $B \subset \tilde{C} = \Omega \setminus \tilde{D}$, as two integrable functions coincide almost everywhere if their integrals over arbitrary events coincide ([15]). With this we will have completed the proof. From Krengel's theorem we know that, on $\tilde{C}$, we have $L^1$-convergence of the $f_n$:

$$\lim_{n \to \infty} \int_{\tilde{C}} |f_n - f^*| \, dQ = 0. \tag{9}$$

Therefore, for $B \subset \tilde{C}$,

$$\int_B f^* \, dQ \overset{(9)}{=} \lim_{n \to \infty} \int_B f_n \, dQ = \lim_{n \to \infty} P_n(B) \overset{(**)}{=} \bar{P}(B) = \int_B \bar{f} \, dQ,$$

where $(**)$ follows from the asymptotic mean stationarity of $P$. We thus have completed the proof of the main statement of the lemma.

Finally, (8) is a direct consequence of (2) in Krengel's theorem. ⋄

In sum, we have shown that there is a measure $Q$ that dominates all of the $P_n$ as well as $\bar{P}$ such that the densities of the $P_n$ converge stochastically to the density of $\bar{P}$. According to theorem 2, this is equivalent to Skorokhod weak convergence. Hence we obtain lemma 1 as a corollary.

5 Preliminaries II

In this section we will first review a couple of additional definitions that are necessary for a proof of theorem 1. In subsection 5.1 we give the definition of a standard space. The beneficial properties of standard spaces become apparent in subsection 5.2, where we briefly review conditional probabilities and expectations.

5.1 Standard spaces

See [25], ch. 3 or [12] for thorough treatments of standard spaces. In the following, a field $\mathcal{F}$ is a collection of subsets of a set $\Omega$ that contains $\Omega$ and is closed with respect to complements and finite unions.

Definition 2. A field $\mathcal{F}$ on a set $\Omega$ is said to have the countable extension property if the following two conditions are met.

1. $\mathcal{F}$ has a countable number of elements.
2. Every nonnegative and finitely additive set function $P$ on $\mathcal{F}$ is continuous at $\emptyset$, that is, for a sequence of elements $F_n \in \mathcal{F}$ with $F_{n+1} \subset F_n$ such that $\bigcap_n F_n = \emptyset$ we have $\lim_{n \to \infty} P(F_n) = 0$.

Definition 3. A measurable space $(\Omega, \mathcal{B})$ is called a standard space if the $\sigma$-algebra $\mathcal{B}$ is generated by a field $\mathcal{F}$ which has the countable extension property.

Remark 2.
1. Most of the prevalent examples of measurable spaces in practice are standard.
For example, any measurable space which is generated by a complete, separable metric space (i.e. a Polish space) is standard. Moreover, standard spaces can be characterized as being isomorphic to subspaces $(B, B \cap \mathcal{B})$ of Polish spaces $(\Omega, \mathcal{B})$ where $B \in \mathcal{B}$ is a measurable set (see [25], ch. 3).
2. An alternative characterisation of standard spaces is that the $\sigma$-algebra $\mathcal{B}$ possesses a basis. See [18], app. 6, for a discussion.

5.2 Conditional Probability and Expectation

See [25], ch. 6 or [12] for a discussion of conditional probability and expectation.

Definition 4. Let $P$ be a probability measure on a measurable space $(\Omega, \mathcal{B})$ and let $\mathcal{G} \subset \mathcal{B}$ be a sub-$\sigma$-algebra of $\mathcal{B}$. A function $\delta(\cdot, \cdot) : \mathcal{B} \times \Omega \to \mathbb{R}$ is called a (version of the) conditional probability of $P$ given $\mathcal{G}$ if

(CP1) $\delta(B, \cdot)$ is $\mathcal{G}$-measurable for all $B \in \mathcal{B}$ and
(CP2) $P(B \cap G) = \int_G \delta(B, \omega) \, dP(\omega)$ for all $G \in \mathcal{G}$, $B \in \mathcal{B}$.

$\delta(\cdot, \cdot)$ is called a (version of the) regular conditional probability of $P$ given $\mathcal{G}$ if, in addition to (CP1) and (CP2),

(RCP) $\delta(\cdot, \omega)$ is a probability measure on $\mathcal{B}$ for all $\omega \in \Omega$.

We collect a couple of basic results about conditional probabilities. See [25] or [12] for details.

1. Let $\gamma, \delta$ be two versions of the conditional probability of $P$ given $\mathcal{G}$. Then the $\mathcal{G}$-measurable functions $\gamma(B, \cdot), \delta(B, \cdot)$ agree almost everywhere for any given $B \in \mathcal{B}$, that is, we have
$$\forall B \in \mathcal{B}: \quad P(\{\omega \mid \gamma(B, \omega) = \delta(B, \omega)\}) = 1. \tag{10}$$
2. Conditional probabilities always exist. Existence of regular conditional probabilities is not assured for arbitrary measurable spaces. However, for standard spaces $(\Omega, \mathcal{B})$ existence can be proven.
3. Note that it cannot be shown for arbitrary measurable spaces that two versions $\delta, \gamma$ agree almost everywhere for all $B \in \mathcal{B}$, meaning that we do not have
$$P(\{\omega \mid \forall B \in \mathcal{B}: \gamma(B, \omega) = \delta(B, \omega)\}) = 1.$$
However, for standard spaces $(\Omega, \mathcal{B})$ this beneficial property applies:

Lemma 5. Let $(\Omega, \mathcal{B})$ be a measurable space such that $\mathcal{B}$ is generated by a countable field $\mathcal{F}$. Let $P$ be a probability measure on it and assume that the regular conditional probability of $P$ given a sub-$\sigma$-algebra $\mathcal{G}$ exists. If $\delta, \gamma$ are two versions of it then the measures $\delta(\cdot, \omega)$ and $\gamma(\cdot, \omega)$ agree on a set of measure one, that is,

$$P(\{\omega \mid \forall B \in \mathcal{B}: \gamma(B, \omega) = \delta(B, \omega)\}) = 1.$$

We display the proof, as its (routine) arguments are needed in subsequent sections.

Proof. Enumerate the elements of $\mathcal{F}$ and write $F_k$ for element no. $k$. According to (10) we find for each $k \in \mathbb{N}$ a set $B_k$ of $P$-measure one on which $\delta(F_k, \cdot)$ and $\gamma(F_k, \cdot)$ agree. Hence, on $B := \bigcap_k B_k$, which is an event of $P$-measure one, all of the $\delta(F_k, \cdot)$ and the $\gamma(F_k, \cdot)$ coincide. Thus the measures $\delta(\cdot, \omega)$ and $\gamma(\cdot, \omega)$ agree on a generating field of $\mathcal{B}$ for $\omega \in B$. As a measure is uniquely determined by its values on a generating field ([15]), we obtain that the measures $\delta(\cdot, \omega)$ and $\gamma(\cdot, \omega)$ agree for all $\omega \in B$, that is, $P$-almost everywhere. ⋄

We also give the definition of conditional expectations and point out their extra properties on standard spaces.

Definition 5. Let $(\Omega, \mathcal{B}, P)$ be a probability space and $f \in L^1(P)$. Let $\mathcal{G} \subset \mathcal{B}$ be a sub-$\sigma$-algebra. If $h : \Omega \to \mathbb{R}$ is

1. $\mathcal{G}$-measurable and
2. for all $G \in \mathcal{G}$ it holds that $\int_G f \, dP = \int_G h \, dP$,

we say that $h$ is a version of the conditional expectation of $f$ given $\mathcal{G}$ and write $h(\omega) = E(f \mid \mathcal{G})(\omega)$.

Conditional expectations always exist. In case of standard spaces they have an extra property which we rely on. See [25], ch. 6 for proofs of the following results.

Theorem 4. Let $(\Omega, \mathcal{B}, P)$ be a probability space, $\mathcal{G}$ a sub-$\sigma$-algebra of $\mathcal{B}$ and $f \in L^1(P)$. Then there exists a version $E(f \mid \mathcal{G})$ of the conditional expectation.
In case of a standard space $(\Omega, \mathcal{B})$ it holds that

$$E(f \mid \mathcal{G})(\omega) = \int f(x) \, d\delta_P(x, \omega) \tag{11}$$

where $\delta_P$ is a version of the regular conditional probability of $P$ given $\mathcal{G}$.

Corollary 1. Let $(\Omega, \mathcal{B})$ be a standard space, $P$ a probability measure on it and $f \in L^1(P)$. Let $\mathcal{G}$ be a sub-$\sigma$-algebra and $\delta_P$ the regular conditional probability of $P$ given $\mathcal{G}$. Then $\omega \mapsto \int f \, d\delta_P(\cdot, \omega)$ is $\mathcal{G}$-measurable (hence also $\mathcal{B}$-measurable) and

$$\int_G f \, dP = \int_G \left( \int f \, d\delta_P(\cdot, \omega) \right) dP \tag{12}$$

for all $G \in \mathcal{G}$.

6 Proof of Theorem 1

We recall the notations of section 2 and that, according to the assumptions of theorem 1, $P$ is a measure on a standard space $(\Omega, \mathcal{B})$ that is AMS relative to the measurable $T : \Omega \to \Omega$.

6.1 Sketch of the Proof Strategy

The core idea for proving the theorem is to define the measures $P_\omega$ as being induced by the regular conditional probability measures of $P$ given the invariant events $\mathcal{I}$. That is, we define

$$\forall B \in \mathcal{B}: \quad P_\omega(B) := \delta_P(B, \omega) \tag{13}$$

where, here and in the following, $\delta$ refers to regular conditional probabilities given the invariant events $\mathcal{I}$. Note that, for arbitrary probability measures $P$ on $(\Omega, \mathcal{B})$,

$$\delta_P(B, \omega) = \delta_P(B, T\omega), \tag{14}$$

as, otherwise, $\delta_P(B, \cdot)^{-1}(y)$ would not be an invariant set for $y := \delta_P(B, T\omega)$, which would be a contradiction to the $\mathcal{I}$-measurability of $\delta_P(B, \cdot)$.

As a consequence of (14), we obtain property (a) of the theorem. Furthermore, (b) is the defining property (CP2) of a regular conditional probability (see Def. 4) and (c) is equation (12) from corollary 1 with $G = \Omega$.

What remains to show is that, for $\omega$ in an invariant set $E$ of $P$-measure one, the $P_\omega$ are ergodic and AMS. We intend to do this by the following strategy.
First, we recall that if, in theorem 1, AMS is replaced by stationary, we obtain the well-known result of the ergodic decomposition of stationary measures (see the introduction for a discussion). If one follows the lines of argumentation of its proof (see [10], th. 2.5) one sees that, on an invariant set of $P$-measure one, the $P_\omega$ are just the regular conditional probabilities of the stationary $P$. Applying the ergodic decomposition of stationary measures to the stationary mean $\bar{P}$ of $P$ provides us with an invariant set $\bar{E}$ of $P$-measure 1 such that

$$\omega \in \bar{E} \implies \bar{P}_\omega := \delta_{\bar{P}}(\cdot, \omega) \text{ is stationary and ergodic.} \tag{15}$$

We will show that, on an invariant set $E \subset \bar{E}$ of $P$-measure one, the averages $P_{n,\omega}$ of the $P_\omega \circ T^{-t}$ (see (16)) converge Skorokhod weakly (hence strongly, see Def. 1) to the $\bar{P}_\omega$, which means that the $P_\omega$ are AMS and have stationary means $\bar{P}_\omega$. As an AMS measure is ergodic if its stationary mean is ergodic, we will have completed the proof.

Therefore, we will proceed according to the following steps:

Step 1. We construct measures $Q_\omega$ that dominate $\bar{P}_\omega$ and all of the

$$P_{n,\omega} := \frac{1}{n} \sum_{t=0}^{n-1} (P_\omega \circ T^{-t}), \quad n \geq 1 \tag{16}$$

(note that $P_\omega = P_{1,\omega}$), which will provide us with densities

$$f_{n,\omega} := \frac{dP_{n,\omega}}{dQ_\omega} \quad \text{and} \quad \bar{f}_\omega := \frac{d\bar{P}_\omega}{dQ_\omega} \tag{17}$$

for all $\omega$.

Step 2. We construct positive contractions $U_\omega$ on $L^1(Q_\omega)$ such that

$$U_\omega \frac{d(P_\omega \circ T^{-n})}{dQ_\omega} = \frac{d(P_\omega \circ T^{-n-1})}{dQ_\omega} \tag{18}$$

hence

$$A_n f_{1,\omega} := \frac{1}{n} \sum_{t=0}^{n-1} U_\omega^t f_{1,\omega} = f_{n,\omega}. \tag{19}$$

We apply Krengel's theorem (th. 3) to obtain that the $f_{n,\omega}$ converge stochastically to a $U_\omega$-invariant $f^*_\omega$ as well as

$$f^*_\omega = \liminf_{n \to \infty} f_{n,\omega} \quad \text{in } L^1(Q_\omega).$$

Step 3. We show that, for $\omega$ in an invariant set $E$ of $P$-measure one,

$$f^*_\omega = \bar{f}_\omega \quad \text{in } L^1(Q_\omega).$$
This completes the proof, as this states that the $P_\omega$ converge Skorokhod weakly to the $\bar{P}_\omega$ in $E$, hence that the $P_\omega$ are ergodic and AMS for $\omega$ in the invariant set $E$ of $P$-measure one.

6.2 Step 1

We recall definitions (5) and (6) of $P_n$ and $Q$. We define $Q_\omega$ as the probability measures induced by the regular conditional probability of $Q$ given the invariant events $\mathcal{I}$, that is,
\[ Q_\omega(B) := \delta_Q(B,\omega) \tag{20} \]
for $B \in \mathcal{B}$. It remains to show that, by choosing an appropriate version, $Q_\omega$ indeed dominates all of the $P_\omega \circ T^{-n}$ (hence all of the $P_{n,\omega}$) as well as $\bar{P}_\omega$. This is established by the following lemma, whose merely technical proof has been deferred to appendix A.

Lemma 6.
\[ \alpha(B,\omega) := \frac{1}{2} \Big( \bar{P}_\omega(B) + \sum_{n \ge 0} 2^{-n-1} P_\omega(T^{-n}B) \Big) \tag{21} \]
is a version of the regular conditional probability of $Q$ given $\mathcal{I}$.

Remark 3. In order to achieve that $Q_\omega$ dominates all of the $P_\omega \circ T^{-n}$ and $\bar{P}_\omega$ one could have defined $Q_\omega$ directly via (21). However, the observation that $Q_\omega$ is induced by the regular conditional probability of $Q$ given $\mathcal{I}$ is crucial for step 3.

6.3 Step 2

Construction of positive contractions $U_\omega$ on $L^1(Q_\omega)$ is achieved by, mutatis mutandis, reiterating the arguments accompanying the construction of $U$ in the proof of lemma 4. In more detail, we replace $P, \bar{P}, P_n, Q, f_n, \bar{f}$ there by $P_\omega, \bar{P}_\omega, P_{n,\omega}, Q_\omega, f_{n,\omega}, \bar{f}_\omega$ here (we recall (13), (15), (16), (20), (17) for the latter definitions). Note that choosing the version of $Q_\omega$ according to lemma 6 ensures that $U_\omega$ indeed maps $L^1(Q_\omega)$ onto $L^1(Q_\omega)$. Equations (18) and (19) then are a direct consequence of the definition of $U_\omega$. Finally, application of Krengel's theorem 3 to the positive contraction $U_\omega$ on $L^1(Q_\omega)$ yields a $U_\omega$-invariant $f^*_\omega$ to which the $f_{n,\omega}$ converge stochastically.
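The step from (18) to (19) can be spelled out as follows. This is only a sketch in the notation of this section; it reads (16) with the summation index $t$ in the exponent, that is, $P_{n,\omega} = \frac{1}{n}\sum_{t=0}^{n-1} P_\omega \circ T^{-t}$, which is the reading consistent with $P_\omega = P_{1,\omega}$.

```latex
% Sketch: iterating (18) and averaging yields (19).
% Assumption: (16) is read as P_{n,\omega} = \frac{1}{n}\sum_{t=0}^{n-1} P_\omega \circ T^{-t}.
\begin{align*}
f_{1,\omega}
  &= \frac{dP_{1,\omega}}{dQ_\omega} = \frac{dP_\omega}{dQ_\omega}, \\
U_\omega^{t} f_{1,\omega}
  &= \frac{d(P_\omega \circ T^{-t})}{dQ_\omega}
  \quad \text{for } t \ge 0 \text{ (induction on $t$ via (18))}, \\
A_n f_{1,\omega}
  &= \frac{1}{n} \sum_{t=0}^{n-1} U_\omega^{t} f_{1,\omega}
   = \frac{1}{n} \sum_{t=0}^{n-1} \frac{d(P_\omega \circ T^{-t})}{dQ_\omega}
   = \frac{dP_{n,\omega}}{dQ_\omega}
   = f_{n,\omega}.
\end{align*}
```

The second-to-last equality in the final line uses that the Radon-Nikodym derivative with respect to $Q_\omega$ is linear in the dominated measure.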
Moreover, again by Krengel's theorem,
\[ f^*_\omega = \liminf_{n \to \infty} f_{n,\omega} \quad \text{in } L^1(Q_\omega). \tag{22} \]

6.4 Step 3

We have to show that $f^*_\omega = \bar{f}_\omega$ in $L^1(Q_\omega)$ for $\omega$ in an invariant set $E \subset \bar{E}$ with $Q(E) = 1$. In a first step, the following lemma will provide us with a useful invariant set $E^*$ where $E \subset E^* \subset \bar{E}$ and $Q(E^*) = 1$. We further recall the definitions of $f_n$ and $\bar{f}$ as the densities of $P_n$ and $\bar{P}$ w.r.t. $Q$ (see (7)). Without loss of generality, we choose representatives that are everywhere nonnegative. Due to lemma 4,
\[ \liminf_{n \to \infty} f_n = \bar{f} \quad \text{in } L^1(Q). \tag{23} \]

Lemma 7. There is an invariant set $E^*$ with $P(E^*) = Q(E^*) = 1$ such that, for $\omega \in E^*$,
\[ \liminf_{n \to \infty} f_n = \liminf_{n \to \infty} f_{n,\omega} \quad \text{in } L^1(Q_\omega) \tag{24} \]
and
\[ \bar{f} = \bar{f}_\omega \quad \text{in } L^1(Q_\omega). \tag{25} \]

Proof. We have deferred the merely technical proof to appendix B. ⋄

We compute
\[
\begin{aligned}
\int_{E^*} \Big( \int |f^*_\omega - \bar{f}_\omega|\, dQ_\omega \Big)\, dQ
&\overset{(22),(24),(25)}{=} \int_{E^*} \Big( \int |\liminf_{n \to \infty} f_n - \bar{f}|\, dQ_\omega \Big)\, dQ \\
&\overset{(*)}{=} \int_{E^*} |\liminf_{n \to \infty} f_n - \bar{f}|\, dQ \overset{(23)}{=} 0
\end{aligned}
\]
where $(*)$ follows from the defining properties of the conditional expectation $E(|\liminf_{n \to \infty} f_n - \bar{f}| \mid \mathcal{I})$ in combination with theorem 4. According to the last computation, we find a set $E \subset E^*$ with $Q(E) = 1$ such that
\[ \omega \in E \implies \int |f^*_\omega - \bar{f}_\omega|\, dQ_\omega = 0. \]
The invariance of the regular conditional probabilities (see (14)) involved in the definitions of $f^*_\omega, \bar{f}_\omega$ implies
\[ \int |f^*_\omega - \bar{f}_\omega|\, dQ_\omega = 0 \iff \int |f^*_{T\omega} - \bar{f}_{T\omega}|\, dQ_{T\omega} = 0. \]
This means that $E$ is invariant, so that $E$ meets the requirements of theorem 1. ⋄

7 Discussion

We have demonstrated how to decompose AMS random sources, which encompass a large variety of sources of practical interest, into ergodic components. The result comes in the tradition of the ergodic decomposition of stationary sources.
As outlined in the introduction, this substantially added to source coding theory by facilitating the generalization of a variety of prominent theorems to arbitrary, not necessarily ergodic, stationary sources. Our result can be expected to yield similar contributions to the theory of AMS sources. An immediate clue is that the theorems developed in [10] for two-sided AMS sources are now valid for arbitrary AMS sources by replacing theorem 2.6 there by theorem 1 here. Moreover, a couple of relevant quantities in information theory (e.g. entropy rate) are affine functionals that are upper semicontinuous w.r.t. the space of stationary random sources, equipped with the weak topology. Jacobs' theory of such functionals ([17], see also [5], th. 4) immediately builds on the ergodic decomposition of stationary sources. This theory should now be extendable to AMS sources.

We finally would like to mention that a certain class of source coding theorems for AMS sources was obtained by partially circumventing the lack of an ergodic decomposition. Schematically, this was done by a reduction from AMS sources to their stationary means and subsequent application of the ergodic decomposition for stationary sources in order to further reduce to ergodic sources. In these cases, our contribution would only be to simplify the theorems' statements and would thus be a merely aesthetic one. However, in the remaining cases where the reduction from asymptotic mean stationarity to stationarity is not applicable, our result will be essential. The full exploration of related consequences seems to be a worthwhile undertaking.

8 Acknowledgments

The author would like to thank the Pacific Institute for the Mathematical Sciences for funding.
A Proof of lemma 6

In the following, according to the assumptions of theorem 1, $P$ is a measure on a standard space $(\Omega,\mathcal{B})$ that is AMS relative to the measurable $T: \Omega \to \Omega$. We further recall the notations of section 2 as well as equations (5) and (6) for the necessary definitions.

Lemma 8. Let $g: \Omega \to \mathbb{R}$ be a $T$-invariant (that is, $g(\omega) = g(T\omega)$ for all $\omega \in \Omega$), measurable function. Then it holds that
\[ \int g\, dP = \int g\, d(P \circ T^{-n}) = \int g\, dP_n = \int g\, d\bar{P} = \int g\, dQ. \tag{26} \]
In particular, all of the integrals exist if one of the integrals exists.

Proof. Note that $Q$ and all of the $P \circ T^{-n}$ and $P_n$, like $P$, are AMS with stationary mean $\bar{P}$, which is an obvious consequence of their definitions. Therefore, the claim of the lemma follows from the, intuitively obvious, observation that $\int g\, dP = \int g\, d\bar{P}$ for invariant $g$ and general AMS $P$ with stationary mean $\bar{P}$. See [12] for details. ⋄

Lemma 9. The functions
\[ \zeta_n(B,\omega) := \delta_P(T^{-n}B,\omega) = P_\omega(T^{-n}B) \]
are versions of the regular conditional probabilities $\delta_{P \circ T^{-n}}$ of the $P \circ T^{-n}$ given $\mathcal{I}$.

Proof. The functions $\zeta_n(\cdot,\omega)$ are probability measures for fixed $\omega \in \Omega$ (this is (RCP) of definition 4) as the $P_\omega$ are, by the definition of $\delta_P$. Again by the definition of $\delta_P$, $\zeta_n(B,\cdot)$ is also $\mathcal{I}$-measurable in $\omega$ for fixed $B \in \mathcal{B}$, which is (CP1) of definition 4. For $I \in \mathcal{I}$ and $B \in \mathcal{B}$ we compute
\[
\begin{aligned}
\int_I \delta_P(T^{-n}B,\omega)\, d(P \circ T^{-n})(\omega)
&\overset{(14),(26)}{=} \int_I \delta_P(T^{-n}B,\omega)\, dP(\omega) = P(I \cap T^{-n}B) \\
&\overset{T^{-n}I = I}{=} P(T^{-n}(I \cap B)) = \int_I \delta_{P \circ T^{-n}}(B,\omega)\, d(P \circ T^{-n})(\omega)
\end{aligned}
\]
where the first equation follows from the invariance of the integrands and lemma 8. We have thus shown (CP2) of definition 4.
⋄

We recall that, for lemma 6, we have to show that
\[ \alpha(B,\omega) = \frac{1}{2} \Big( \bar{P}_\omega(B) + \sum_{n \ge 0} 2^{-n-1} P_\omega(T^{-n}B) \Big) \]
is a version of the regular conditional probability $\delta_Q$. Note first that $\bar{P}_\omega$, according to our proof strategy outlined in subsection 6.1, was defined as $\delta_{\bar{P}}(\cdot,\omega)$ where $\delta_{\bar{P}}$ is the regular conditional probability of the stationary mean $\bar{P}$. Furthermore, as a consequence of lemma 9, we can identify the $P_\omega \circ T^{-n}$ with $\delta_{P \circ T^{-n}}(\cdot,\omega)$ and write
\[ \alpha(B,\omega) = \frac{1}{2} \Big( \delta_{\bar{P}}(B,\omega) + \sum_{n \ge 0} 2^{-n-1} \delta_{P \circ T^{-n}}(B,\omega) \Big). \tag{27} \]
We will then exploit the defining properties of the $\delta$'s to finally show that $\alpha$ is a version of $\delta_Q$.

Proof of lemma 6. We have to check properties (RCP), (CP1) and (CP2) of definition 4.

(RCP): That $\alpha(\cdot,\omega)$ is a probability measure for fixed $\omega$ follows from an argumentation which is completely analogous to that at the beginning of section 4, surrounding equations (4) and (6).

(CP1): As all of the $\delta$'s involved in (27) are invariant in $\omega$ (see (14)), we know that $\alpha(B,\cdot)$ is measurable w.r.t. $\mathcal{I}$ for any $B \in \mathcal{B}$, which is (CP1) of definition 4.

(CP2): Fix $B \in \mathcal{B}$ and consider the functions
\[ g_n(\omega) := \frac{1}{2} \Big( \delta_{\bar{P}}(B,\omega) + \sum_{k=0}^{n} 2^{-k-1} \delta_{P \circ T^{-k}}(B,\omega) \Big). \]
This is an increasing sequence of nonnegative measurable functions which converges everywhere to the values $\alpha(B,\omega)$. Because of (14) the summands of $g_n$ are invariant. As all of the summands are also integrable with respect to some $P \circ T^{-k}$ or $\bar{P}$, they are also integrable with respect to $Q$, due to lemma 8. Therefore, the $g_n$ are integrable with respect to $Q$ as well. The monotone convergence theorem of Beppo Levi (e.g. [15]) reveals that $\alpha(B,\cdot)$ is also integrable with respect to $Q$ and further, for $I \in \mathcal{I}$ and $B \in \mathcal{B}$:
\[
\begin{aligned}
\int_I \alpha(B,\omega)\, dQ(\omega)
&= \int_I \lim_{n \to \infty} \frac{1}{2} \Big( \delta_{\bar{P}}(B,\omega) + \sum_{k=0}^{n} 2^{-k-1} \delta_{P \circ T^{-k}}(B,\omega) \Big)\, dQ(\omega) \\
&\overset{(a)}{=} \lim_{n \to \infty} \int_I \frac{1}{2} \Big( \delta_{\bar{P}}(B,\omega) + \sum_{k=0}^{n} 2^{-k-1} \delta_{P \circ T^{-k}}(B,\omega) \Big)\, dQ(\omega) \\
&\overset{(b)}{=} \lim_{n \to \infty} \frac{1}{2} \Big( \int_I \delta_{\bar{P}}(B,\omega)\, d\bar{P}(\omega) + \sum_{k=0}^{n} 2^{-k-1} \int_I \delta_{P \circ T^{-k}}(B,\omega)\, d(P \circ T^{-k})(\omega) \Big) \\
&\overset{(c)}{=} \lim_{n \to \infty} \frac{1}{2} \Big( \bar{P}(I \cap B) + \sum_{k=0}^{n} 2^{-k-1} P(T^{-k}(I \cap B)) \Big) \\
&= \frac{1}{2} \Big( \bar{P}(I \cap B) + \sum_{n \ge 0} 2^{-n-1} P(T^{-n}(I \cap B)) \Big) = Q(I \cap B)
\end{aligned}
\]
where (a) follows from Beppo Levi's theorem, (b) follows from the invariance of the $\delta$'s and subsequent application of lemma 8, and (c) is just the defining property (CP2) of the conditional probabilities $\delta$ (definition 4). We thus have shown property (CP2) for $\alpha$. ⋄

B Proof of Lemma 7

According to the assumptions of theorem 1, $P$ is a measure on a standard space $(\Omega,\mathcal{B})$ that is AMS relative to the measurable $T: \Omega \to \Omega$. We further recall the notations of section 2 as well as equations (5), (6), (7), (13), (15), (16), (17), (20) and the surrounding texts for the necessary definitions. We further recall that, without loss of generality, we had chosen representatives of the $f_n$ and $\bar{f}$ that are everywhere nonnegative. The following lemma will deliver the technical key to lemma 7.

Lemma 10. For each $1 \le n \in \mathbb{N}$ there is an invariant $E_n \in \mathcal{I} \subset \mathcal{B}$ with $P_n(E_n) = Q(E_n) = 1$ such that
\[ \omega \in E_n \implies f_{n,\omega} = f_n \quad \text{in } L^1(Q_\omega). \]
There is also an invariant $E_\infty$ with $\bar{P}(E_\infty) = Q(E_\infty) = 1$ such that
\[ \omega \in E_\infty \implies \bar{f}_\omega = \bar{f} \quad \text{in } L^1(Q_\omega). \]

Loosely speaking, the lemma reveals that the $f_n$ and the $f_{n,\omega}$ as well as $\bar{f}$ and $\bar{f}_\omega$ agree $Q_\omega$-a.e., for $Q$-almost all $\omega \in \Omega$. This means that, for $Q$-almost all $\omega$, they are equal on the parts of $\Omega$ considered relevant by the measures $Q_\omega$.

Proof.
Consider the functions
\[ \beta_n(B,\omega) := \int_B f_{n,\omega}\, dQ_\omega \quad \text{and} \quad \gamma_n(B,\omega) := \int_B f_n\, dQ_\omega. \]
By the definition of a density,
\[ \delta_{P_n}(B,\omega) = \int_B f_{n,\omega}\, dQ_\omega. \]
Hence $\beta_n(B,\omega)$ is just the regular conditional probability of $P_n$ given $\mathcal{I}$. We now show that $\gamma_n$ is a version of the conditional probability of $P_n$ given $\mathcal{I}$ (but not necessarily a regular one). Note first that the $\gamma_n(B,\cdot)$ are $\mathcal{I}$-measurable as, according to (11), we have that $\gamma_n(B,\omega)$ agrees with the conditional expectation $E_Q(1_B f_n \mid \mathcal{I})(\omega)$, which, by definition, is $\mathcal{I}$-measurable. Second, we observe that, for $I \in \mathcal{I}$ and $B \in \mathcal{B}$, as $\gamma_n$ is invariant in $\omega$ $(*)$,
\[
\begin{aligned}
\int_I \gamma_n(B,\omega)\, dP_n
&\overset{(*),(26)}{=} \int_I \gamma_n(B,\omega)\, dQ = \int_I \Big( \int_B f_n\, dQ_\omega \Big)\, dQ = \int_I \Big( \int 1_B f_n\, dQ_\omega \Big)\, dQ \\
&\overset{(12)}{=} \int_I 1_B f_n\, dQ = \int_{I \cap B} f_n\, dQ = P_n(I \cap B),
\end{aligned}
\]
which shows the required property (CP2) of definition 4. Hence the $\gamma_n$'s are versions of the conditional probabilities of the $P_n$'s given $\mathcal{I}$.

Note that the $\gamma_n(\cdot,\omega)$ are measures because the $f_n$ had been chosen nonnegative everywhere. If we follow the line of argumentation of lemma 5 we find a set $E_n$ of $P_n$-measure one such that the measures $\beta_n(\cdot,\omega)$ and $\gamma_n(\cdot,\omega)$ agree for $\omega \in E_n$. Because of the invariance of $\beta_n, \gamma_n$ the set $E_n$ is invariant. Hence (lemma 8) also $Q(E_n) = 1$. Summarizing, we have
\[ \omega \in E_n \implies \forall B \in \mathcal{B}: \int_B f_n\, dQ_\omega = \int_B f_{n,\omega}\, dQ_\omega. \]
As two functions agree almost everywhere if their integrals coincide over arbitrary events, we are done with the assertion of the lemma for the $f_n$.

We find an invariant set $E_\infty$ with $\bar{P}(E_\infty) = Q(E_\infty) = 1$ such that $\bar{f}_\omega = \bar{f}$ in $L^1(Q_\omega)$ for $\omega \in E_\infty$ by a completely analogous argumentation. ⋄

Proof of lemma 7. Define
\[ E^* := E_\infty \cap \Big( \bigcap_{n \ge 1} E_n \Big) \tag{28} \]
with $E_\infty$ and the $E_n$ from lemma 10. $E^*$ is invariant and $Q(E^*) = 1$, as both properties hold for all sets on the right hand side of (28).
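The claim that $E^*$ inherits both properties can be checked directly. The following is a short sketch that uses only countable subadditivity of $Q$ and the fact that preimages commute with intersections; invariance is taken in the sense $T^{-1}E = E$, which matches the use of $T^{-n}I = I$ in the proof of lemma 9 but is otherwise an assumption about the convention fixed in section 2.

```latex
% Sketch: E^* from (28) is invariant and has Q-measure one.
% Assumption: "invariant" means T^{-1}E = E.
\begin{align*}
T^{-1}E^*
  &= T^{-1}E_\infty \cap \bigcap_{n \ge 1} T^{-1}E_n
   = E_\infty \cap \bigcap_{n \ge 1} E_n
   = E^*, \\
Q(\Omega \setminus E^*)
  &\le Q(\Omega \setminus E_\infty) + \sum_{n \ge 1} Q(\Omega \setminus E_n)
   = 0 .
\end{align*}
```

Applying lemma 8 to the invariant indicator function $1_{E^*}$ then gives $P(E^*) = Q(E^*) = 1$ as well, which is the form stated in lemma 7.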
We obtain
\[ \forall n \in \mathbb{N}: \quad f_n = f_{n,\omega} \quad \text{and} \quad \bar{f} = \bar{f}_\omega \quad \text{in } L^1(Q_\omega) \]
for $\omega \in E^*$. Therefore also
\[ \liminf_{n \to \infty} f_n = \liminf_{n \to \infty} f_{n,\omega} \quad \text{in } L^1(Q_\omega) \]
for $\omega \in E^*$. ⋄

References

1. ALGOET, P. and COVER, T. (1988). A sandwich proof of the Shannon-McMillan-Breiman theorem. Annals of Probability, 16:899–909.
2. AMBROSE, W., HALMOS, P. R. and KAKUTANI, S. (1942). The decomposition of measures II. Duke Mathematical Journal, 9:43–47.
3. KRYLOFF, N. and BOGOLIOUBOFF, N. (1937). La théorie générale de la mesure dans son application à l'étude des systèmes dynamiques de la mécanique non linéaire. Annals of Mathematics, 38:65–113.
4. DOWKER, Y. (1951). Finite and σ-finite invariant measures. Annals of Mathematics, 54:595–608.
5. EFFROS, M., CHOU, P. A. and GRAY, R. M. (1994). Variable-rate source coding theorems for stationary nonergodic sources. IEEE Transactions on Information Theory, IT-40(6):1920–1925.
6. FAIGLE, U. and SCHÖNHUTH, A. (2007). Asymptotic mean stationarity of sources with finite evolution dimension. IEEE Transactions on Information Theory, 53(7):2342–2348.
7. GRAY, R. and DAVISSON, L. (1974). The ergodic decomposition of stationary discrete random processes. IEEE Transactions on Information Theory, IT-20(5):625–636.
8. GRAY, R. and DAVISSON, L. (1974). Source coding theorems without the ergodic assumption. IEEE Transactions on Information Theory, IT-20(4):502–516.
9. GRAY, R. M. and KIEFFER, J. C. (1980). Asymptotically mean stationary measures. Annals of Probability, 8:962–973.
10. GRAY, R. M. and SAADAT, F. (1980). Block source coding theory for asymptotically mean stationary measures. IEEE Transactions on Information Theory, 30:54–68.
11. GRAY, R. M. (1990). Entropy and Information Theory. http://ee.stanford.edu/~gray/it.pdf.
12. GRAY, R. M. (2001). Probability, Random Processes and Ergodic Properties. http://ee.stanford.edu/~gray/arp.pdf.
13. HALMOS, P. (1941). The decomposition of measures. Duke Math. J., 8:386–392.
14. HALMOS, P. (1949). On a theorem of Dieudonne. Proc. Nat. Acad. Sci. U.S.A., 35:38–42.
15. HALMOS, P. (1950). Measure Theory. Van Nostrand, Princeton.
16. JACKA, S. D. and ROBERTS, G. O. (1997). On strong forms of weak convergence. Stochastic Processes and Applications, 67:41–53.
17. JACOBS, K. (1963). Ergodic decomposition of the Kolmogorov-Sinai invariant. In: Ergodic Theory, Fred B. Wright, Ed., Academic Press, New York.
18. KATOK, A. and HASSELBLATT, B. (1999). Introduction to the Modern Theory of Dynamical Systems. Cambridge University Press.
19. KIEFFER, J. C. (1975). On the optimum average distortion attainable by fixed-rate coding of a nonergodic source. IEEE Trans. Inform. Theory, 21:190–193.
20. KIEFFER, J. C. and RAHE, M. (1981). Markov channels are asymptotically mean stationary. SIAM J. Math. Anal., 12(3):293–305.
21. KRENGEL, U. (1985). Ergodic Theorems. De Gruyter, Berlin, New York.
22. LEON-GARCIA, A., DAVISSON, L. and NEUHOFF, D. (1979). New results on coding of stationary nonergodic sources. IEEE Transactions on Information Theory, 25(2):137–144.
23. NEUHOFF, D. L., GRAY, R. M. and DAVISSON, L. D. (1975). Fixed rate universal block source coding with a fidelity criterion. IEEE Transactions on Information Theory, 21(5):511–523.
24. OXTOBY, J. (1952). Ergodic sets. Bull. Amer. Math. Soc., 58:116–136.
25. PARTHASARATHY, K. R. (1967). Probability Theory on Metric Spaces. Academic Press, New York.
26. POLLICOTT, M. and YURI, M. (1998). Dynamical Systems and Ergodic Theory. Cambridge University Press.
27. PURSLEY, M. and DAVISSON, L. (1976). Variable rate coding for nonergodic sources and classes of ergodic sources subject to a fidelity constraint. IEEE Transactions on Information Theory, 22(3):324–337.
28. RECHARD, O. W. (1956). Invariant measures for many-one transformations. Duke J. Math., 23:477–488.
29. ROKHLIN, V. A. (1952). On the fundamental ideas of measure theory. Amer. Math. Soc. Translations, 71.
30. SHIELDS, P. C., NEUHOFF, D. L., DAVISSON, L. D. and LEDRAPPIER, F. (1978). The distortion-rate function for nonergodic sources. The Annals of Probability, 6(1):138–143.
31. VON NEUMANN, J. (1932). Zur Operatorenmethode der klassischen Mechanik. Annals of Mathematics, 33:587–642.
32. WALTERS, P. (1982). An Introduction to Ergodic Theory. Springer-Verlag, New York.
