A Divergence Formula for Randomness and Dimension

If $S$ is an infinite sequence over a finite alphabet $\Sigma$ and $\beta$ is a probability measure on $\Sigma$, then the {\it dimension} of $ S$ with respect to $\beta$, written $\dim^\beta(S)$, is a constructive version of Billingsley dimension tha…

Authors: Jack H. Lutz

Jack H. Lutz*
Department of Computer Science, Iowa State University, Ames, IA 50011, USA
lutz@cs.iastate.edu

Abstract

If $S$ is an infinite sequence over a finite alphabet $\Sigma$ and $\beta$ is a probability measure on $\Sigma$, then the dimension of $S$ with respect to $\beta$, written $\dim^\beta(S)$, is a constructive version of Billingsley dimension that coincides with the (constructive Hausdorff) dimension $\dim(S)$ when $\beta$ is the uniform probability measure. This paper shows that $\dim^\beta(S)$ and its dual $\mathrm{Dim}^\beta(S)$, the strong dimension of $S$ with respect to $\beta$, can be used in conjunction with randomness to measure the similarity of two probability measures $\alpha$ and $\beta$ on $\Sigma$. Specifically, we prove that the divergence formula

$$\dim^\beta(R) = \mathrm{Dim}^\beta(R) = \frac{\mathcal{H}(\alpha)}{\mathcal{H}(\alpha) + D(\alpha \| \beta)}$$

holds whenever $\alpha$ and $\beta$ are computable, positive probability measures on $\Sigma$ and $R \in \Sigma^\infty$ is random with respect to $\alpha$. In this formula, $\mathcal{H}(\alpha)$ is the Shannon entropy of $\alpha$, and $D(\alpha \| \beta)$ is the Kullback-Leibler divergence between $\alpha$ and $\beta$. We also show that the above formula holds for all sequences $R$ that are $\alpha$-normal (in the sense of Borel) when $\dim^\beta(R)$ and $\mathrm{Dim}^\beta(R)$ are replaced by the more effective finite-state dimensions $\dim^\beta_{\mathrm{FS}}(R)$ and $\mathrm{Dim}^\beta_{\mathrm{FS}}(R)$. In the course of proving this, we also prove finite-state compression characterizations of $\dim^\beta_{\mathrm{FS}}(S)$ and $\mathrm{Dim}^\beta_{\mathrm{FS}}(S)$.

1 Introduction

The constructive dimension $\dim(S)$ and the constructive strong dimension $\mathrm{Dim}(S)$ of an infinite sequence $S$ over a finite alphabet $\Sigma$ are constructive versions of the two most important classical fractal dimensions, namely, Hausdorff dimension [9] and packing dimension [22, 21], respectively.
These two constructive dimensions, which were introduced in [13, 1], have been shown to have the useful characterizations

$$\dim(S) = \liminf_{w \to S} \frac{\mathrm{K}(w)}{|w| \log |\Sigma|} \tag{1.1}$$

and

$$\mathrm{Dim}(S) = \limsup_{w \to S} \frac{\mathrm{K}(w)}{|w| \log |\Sigma|}, \tag{1.2}$$

where the logarithm is base-2 [16, 1]. In these equations, $\mathrm{K}(w)$ is the Kolmogorov complexity of the prefix $w$ of $S$, i.e., the length in bits of the shortest program that prints the string $w$. (See section 2.6 or [11] for details.) The numerators in these equations are thus the algorithmic information content of $w$, while the denominators are the "naive" information content of $w$, also in bits. We thus understand (1.1) and (1.2) to say that $\dim(S)$ and $\mathrm{Dim}(S)$ are the lower and upper information densities of the sequence $S$. These constructive dimensions and their analogs at other levels of effectivity have been investigated extensively in recent years [10].

* This research was supported in part by National Science Foundation Grants 9988483, 0344187, 0652569, and 0728806 and by the Spanish Ministry of Education and Science (MEC) and the European Regional Development Fund (ERDF) under project TIN2005-08832-C03-02.

The constructive dimensions $\dim(S)$ and $\mathrm{Dim}(S)$ have recently been generalized to incorporate a probability measure $\nu$ on the sequence space $\Sigma^\infty$ as a parameter [14]. Specifically, for each such $\nu$ and each sequence $S \in \Sigma^\infty$, we now have the constructive dimension $\dim^\nu(S)$ and the constructive strong dimension $\mathrm{Dim}^\nu(S)$ of $S$ with respect to $\nu$. (The first of these is a constructive version of Billingsley dimension [2].) When $\nu$ is the uniform probability measure on $\Sigma^\infty$, we have $\dim^\nu(S) = \dim(S)$ and $\mathrm{Dim}^\nu(S) = \mathrm{Dim}(S)$. A more interesting example occurs when $\nu$ is the product measure generated by a nonuniform probability measure $\beta$ on the alphabet $\Sigma$.
In this case, $\dim^\nu(S)$ and $\mathrm{Dim}^\nu(S)$, which we write as $\dim^\beta(S)$ and $\mathrm{Dim}^\beta(S)$, are again the lower and upper information densities of $S$, but these densities are now measured with respect to unequal letter costs. Specifically, it was shown in [14] that

$$\dim^\beta(S) = \liminf_{w \to S} \frac{\mathrm{K}(w)}{\mathcal{I}_\beta(w)} \tag{1.3}$$

and

$$\mathrm{Dim}^\beta(S) = \limsup_{w \to S} \frac{\mathrm{K}(w)}{\mathcal{I}_\beta(w)}, \tag{1.4}$$

where

$$\mathcal{I}_\beta(w) = \sum_{i=0}^{|w|-1} \log \frac{1}{\beta(w[i])}$$

is the Shannon self-information of $w$ with respect to $\beta$. These unequal letter costs $\log(1/\beta(a))$ for $a \in \Sigma$ can in fact be useful. For example, the complete analysis of the dimensions of individual points in self-similar fractals given by [14] requires these constructive dimensions with a particular choice of the probability measure $\beta$ on $\Sigma$.

In this paper we show how to use the constructive dimensions $\dim^\beta(S)$ and $\mathrm{Dim}^\beta(S)$ in conjunction with randomness to measure the degree to which two probability measures on $\Sigma$ are similar. To see why this might be possible, we note that the inequalities $0 \le \dim^\beta(S) \le \mathrm{Dim}^\beta(S) \le 1$ hold for all $\beta$ and $S$ and that the maximum values

$$\dim^\beta(R) = \mathrm{Dim}^\beta(R) = 1 \tag{1.5}$$

are achieved whenever the sequence $R$ is random with respect to $\beta$. It is thus reasonable to hope that, if $R$ is random with respect to some other probability measure $\alpha$ on $\Sigma$, then $\dim^\beta(R)$ and $\mathrm{Dim}^\beta(R)$ will take on values whose closeness to 1 reflects the degree to which $\alpha$ is similar to $\beta$.

This is indeed the case. Our first main theorem says that the divergence formula

$$\dim^\beta(R) = \mathrm{Dim}^\beta(R) = \frac{\mathcal{H}(\alpha)}{\mathcal{H}(\alpha) + D(\alpha \| \beta)} \tag{1.6}$$

holds whenever $\alpha$ and $\beta$ are computable, positive probability measures on $\Sigma$ and $R \in \Sigma^\infty$ is random with respect to $\alpha$. In this formula, $\mathcal{H}(\alpha)$ is the Shannon entropy of $\alpha$, and $D(\alpha \| \beta)$ is the Kullback-Leibler divergence between $\alpha$ and $\beta$.
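The right-hand side of the divergence formula (1.6) is easy to evaluate for concrete alphabets. The Python sketch below is only an illustration: the helper names (`entropy`, `kl_divergence`, `divergence_dimension`, `self_information`) are hypothetical, measures are given as lists indexed by the symbols $0, \ldots, k-1$, and floating-point base-2 logarithms stand in for exact real arithmetic. It computes $\mathcal{H}(\alpha)$, $D(\alpha \| \beta)$, the ratio in (1.6), and the self-information $\mathcal{I}_\beta(w)$ used in (1.3) and (1.4).

```python
import math
from typing import Sequence

def entropy(alpha: Sequence[float]) -> float:
    """Shannon entropy H(alpha) in bits, using the convention 0 log(1/0) = 0."""
    return sum(p * math.log2(1.0 / p) for p in alpha if p > 0)

def kl_divergence(alpha: Sequence[float], beta: Sequence[float]) -> float:
    """Kullback-Leibler divergence D(alpha || beta) in bits (beta assumed positive)."""
    return sum(p * math.log2(p / q) for p, q in zip(alpha, beta) if p > 0)

def divergence_dimension(alpha: Sequence[float], beta: Sequence[float]) -> float:
    """Right-hand side of (1.6): H(alpha) / (H(alpha) + D(alpha || beta)).

    Assumes alpha is positive on an alphabet of size k >= 2, so H(alpha) > 0."""
    h = entropy(alpha)
    return h / (h + kl_divergence(alpha, beta))

def self_information(w: Sequence[int], beta: Sequence[float]) -> float:
    """I_beta(w) = sum_i log2(1 / beta(w[i])) for a string w over {0, ..., k-1}."""
    return sum(math.log2(1.0 / beta[a]) for a in w)

if __name__ == "__main__":
    alpha = [0.5, 0.5]
    print(divergence_dimension(alpha, alpha))         # identical measures, as in (1.5)
    print(divergence_dimension(alpha, [0.25, 0.75]))  # dissimilar measures
```

With $\beta = \alpha$ the divergence term vanishes and the ratio is exactly 1. With $\alpha = (1/2, 1/2)$ and $\beta = (1/4, 3/4)$, one gets $\mathcal{H}(\alpha) = 1$, $D(\alpha \| \beta) \approx 0.2075$, and a ratio of about $0.828$.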
When $\alpha = \beta$, the Kullback-Leibler divergence $D(\alpha \| \beta)$ is 0, so (1.6) coincides with (1.5). When $\alpha$ and $\beta$ are dissimilar, the Kullback-Leibler divergence $D(\alpha \| \beta)$ is large, so the right-hand side of (1.6) is small. Hence the divergence formula tells us that, when $R$ is $\alpha$-random, $\dim^\beta(R) = \mathrm{Dim}^\beta(R)$ is a quantity in $[0, 1]$ whose closeness to 1 is an indicator of the similarity between $\alpha$ and $\beta$.

The proof of (1.6) serves as an outline of our other, more challenging task, which is to prove that the divergence formula (1.6) also holds for the much more effective finite-state $\beta$-dimension $\dim^\beta_{\mathrm{FS}}(R)$ and finite-state strong $\beta$-dimension $\mathrm{Dim}^\beta_{\mathrm{FS}}(R)$. (These dimensions, defined in section 2.5, are generalizations of finite-state dimension and finite-state strong dimension, which were introduced in [6, 1], respectively.)

With this objective in mind, our second main theorem characterizes the finite-state $\beta$-dimensions in terms of finite-state data compression. Specifically, this theorem says that, in analogy with (1.3) and (1.4), the identities

$$\dim^\beta_{\mathrm{FS}}(S) = \inf_C \liminf_{w \to S} \frac{|C(w)|}{\mathcal{I}_\beta(w)} \tag{1.7}$$

and

$$\mathrm{Dim}^\beta_{\mathrm{FS}}(S) = \inf_C \limsup_{w \to S} \frac{|C(w)|}{\mathcal{I}_\beta(w)} \tag{1.8}$$

hold for all infinite sequences $S$ over $\Sigma$. The infima here are taken over all information-lossless finite-state compressors $C$ (a model introduced by Shannon [20] and investigated extensively ever since) with output alphabet $\{0, 1\}$, and $|C(w)|$ denotes the number of bits that $C$ outputs when processing the prefix $w$ of $S$. The special cases of (1.7) and (1.8) in which $\beta$ is the uniform probability measure on $\Sigma$, and hence $\mathcal{I}_\beta(w) = |w| \log |\Sigma|$, were proven in [6, 1]. In fact, our proof uses these special cases as "black boxes" from which we derive the more general (1.7) and (1.8).

With (1.7) and (1.8) in hand, we prove our third main theorem.
This involves the finite-state version of randomness, which was introduced by Borel [3] long before finite-state automata were defined. If $\alpha$ is a probability measure on $\Sigma$, then a sequence $S \in \Sigma^\infty$ is $\alpha$-normal in the sense of Borel if every finite string $w \in \Sigma^*$ appears with asymptotic frequency $\alpha(w)$ in $S$, where we write

$$\alpha(w) = \prod_{i=0}^{|w|-1} \alpha(w[i]).$$

(See section 2.5 for a precise definition of asymptotic frequency.) Our third main theorem says that the divergence formula

$$\dim^\beta_{\mathrm{FS}}(R) = \mathrm{Dim}^\beta_{\mathrm{FS}}(R) = \frac{\mathcal{H}(\alpha)}{\mathcal{H}(\alpha) + D(\alpha \| \beta)} \tag{1.9}$$

holds whenever $\alpha$ and $\beta$ are positive probability measures on $\Sigma$ and $R \in \Sigma^\infty$ is $\alpha$-normal.

In section 2 we briefly review ideas from Shannon information theory, classical fractal dimensions, algorithmic information theory, and effective fractal dimensions that are used in this paper. Section 3 outlines the proof of (1.6), section 4 outlines the proofs of (1.7) and (1.8), and section 5 outlines the proof of (1.9). Various proofs are consigned to a technical appendix.

2 Preliminaries

2.1 Notation and setting

Throughout this paper we work in a finite alphabet $\Sigma = \{0, 1, \ldots, k-1\}$, where $k \ge 2$. We write $\Sigma^*$ for the set of (finite) strings over $\Sigma$ and $\Sigma^\infty$ for the set of (infinite) sequences over $\Sigma$. We write $|w|$ for the length of a string $w$ and $\lambda$ for the empty string. For $w \in \Sigma^*$ and $0 \le i < |w|$, $w[i]$ is the $i$th symbol in $w$. Similarly, for $S \in \Sigma^\infty$ and $i \in \mathbb{N}$ ($= \{0, 1, 2, \ldots\}$), $S[i]$ is the $i$th symbol in $S$. Note that the leftmost symbol in a string or sequence is the 0th symbol.

A prefix of a string or sequence $x \in \Sigma^* \cup \Sigma^\infty$ is a string $w \in \Sigma^*$ for which there exists a string or sequence $y \in \Sigma^* \cup \Sigma^\infty$ such that $x = wy$. In this case we write $w \sqsubseteq x$.

The equation $\lim_{w \to S} f(w) = L$ means that, for all $\epsilon > 0$, for all sufficiently long prefixes $w \sqsubseteq S$, $|f(w) - L| < \epsilon$.
We also use the limit inferior,

$$\liminf_{w \to S} f(w) = \lim_{w \to S} \inf \{ f(x) \mid w \sqsubseteq x \sqsubseteq S \},$$

and the limit superior,

$$\limsup_{w \to S} f(w) = \lim_{w \to S} \sup \{ f(x) \mid w \sqsubseteq x \sqsubseteq S \}.$$

2.2 Probability measures, gales, and Shannon information

A probability measure on $\Sigma$ is a function $\alpha : \Sigma \to [0, 1]$ such that $\sum_{a \in \Sigma} \alpha(a) = 1$. A probability measure $\alpha$ on $\Sigma$ is positive if $\alpha(a) > 0$ for every $a \in \Sigma$. A probability measure $\alpha$ on $\Sigma$ is rational if $\alpha(a) \in \mathbb{Q}$ (i.e., $\alpha(a)$ is a rational number) for every $a \in \Sigma$.

A probability measure on $\Sigma^\infty$ is a function $\nu : \Sigma^* \to [0, 1]$ such that $\nu(\lambda) = 1$ and, for all $w \in \Sigma^*$, $\nu(w) = \sum_{a \in \Sigma} \nu(wa)$. (Intuitively, $\nu(w)$ is the probability that $w \sqsubseteq S$ when the sequence $S \in \Sigma^\infty$ is "chosen according to $\nu$.") Each probability measure $\alpha$ on $\Sigma$ naturally induces the probability measure $\alpha$ on $\Sigma^\infty$ defined by

$$\alpha(w) = \prod_{i=0}^{|w|-1} \alpha(w[i]) \tag{2.1}$$

for all $w \in \Sigma^*$.

We reserve the symbol $\mu$ for the uniform probability measure on $\Sigma$, i.e., $\mu(a) = 1/k$ for all $a \in \Sigma$, and also for the uniform probability measure on $\Sigma^\infty$, i.e., $\mu(w) = k^{-|w|}$ for all $w \in \Sigma^*$.

If $\alpha$ is a probability measure on $\Sigma$ and $s \in [0, \infty)$, then an $s$-$\alpha$-gale is a function $d : \Sigma^* \to [0, \infty)$ satisfying

$$d(w) = \sum_{a \in \Sigma} d(wa)\, \alpha(a)^s \tag{2.2}$$

for all $w \in \Sigma^*$. A 1-$\alpha$-gale is also called an $\alpha$-martingale. When $\alpha = \mu$, we omit it from this terminology, so an $s$-$\mu$-gale is called an $s$-gale, and a $\mu$-martingale is called a martingale.

We frequently use the following simple fact without explicit citation.

Observation 2.1. Let $\alpha$ and $\beta$ be positive probability measures on $\Sigma$, and let $s, t \in [0, \infty)$. If $d : \Sigma^* \to [0, \infty)$ is an $s$-$\alpha$-gale, then the function $\tilde{d} : \Sigma^* \to [0, \infty)$ defined by

$$\tilde{d}(w) = \frac{\alpha(w)^s}{\beta(w)^t}\, d(w)$$

is a $t$-$\beta$-gale.

Intuitively, an $s$-$\alpha$-gale is a strategy for betting on the successive symbols in a sequence $S \in \Sigma^\infty$.
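Observation 2.1 is a one-line calculation from (2.1) and (2.2), but it can also be checked mechanically on short strings. In the Python sketch below (every function name is hypothetical), an $s$-$\alpha$-gale is built from an arbitrary betting rule `bets` that returns a probability vector $p$ for each prefix, by setting $d(wa) = d(w)\,p(a)/\alpha(a)^s$; summing $d(wa)\,\alpha(a)^s$ over $a$ then returns $d(w)$, which is (2.2). The code forms $\tilde{d}$ as in Observation 2.1 and tests the $t$-$\beta$-gale condition on all strings up to a fixed length, with a floating-point tolerance in place of exact equality.

```python
import math
from itertools import product

def prob(measure, w):
    """Product measure (2.1): measure(w) = prod_i measure[w[i]] (1 for the empty string)."""
    return math.prod(measure[a] for a in w)

def make_s_alpha_gale(alpha, s, bets):
    """Build an s-alpha-gale from a betting rule.

    bets(prefix) returns a probability vector p over {0, ..., k-1}.
    Setting d(wa) = d(w) * p[a] / alpha[a]**s makes
    sum_a d(wa) * alpha[a]**s equal to d(w), i.e. condition (2.2)."""
    def d(w):
        capital = 1.0
        for i, a in enumerate(w):
            p = bets(w[:i])
            capital *= p[a] / alpha[a] ** s
        return capital
    return d

def transform(d, alpha, beta, s, t):
    """The function d~(w) = alpha(w)**s / beta(w)**t * d(w) of Observation 2.1."""
    return lambda w: prob(alpha, w) ** s / prob(beta, w) ** t * d(w)

def is_gale(d, beta, t, max_len=4):
    """Check d(w) == sum_a d(wa) * beta[a]**t for every w with |w| < max_len."""
    k = len(beta)
    for n in range(max_len):
        for w in product(range(k), repeat=n):
            rhs = sum(d(w + (a,)) * beta[a] ** t for a in range(k))
            if not math.isclose(d(w), rhs, rel_tol=1e-9):
                return False
    return True

if __name__ == "__main__":
    alpha, beta, s, t = [0.5, 0.5], [0.2, 0.8], 0.8, 0.6

    def bets(w):
        # Hypothetical rule: after a 0 favor another 0; otherwise favor 1.
        return [0.7, 0.3] if w and w[-1] == 0 else [0.4, 0.6]

    d = make_s_alpha_gale(alpha, s, bets)
    print(is_gale(d, alpha, s), is_gale(transform(d, alpha, beta, s, t), beta, t))
```

Strings are represented as tuples of symbols so that `w + (a,)` forms the extension $wa$; the check is exhaustive only up to `max_len`, so it illustrates rather than proves the observation.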
For each prefix $w \sqsubseteq S$, $d(w)$ denotes the amount of capital (money) that the gale $d$ has after betting on the symbols in $w$. If $s = 1$, then the right-hand side of (2.2) is the conditional expectation of $d(wa)$, given that $w$ has occurred, so (2.2) says that the payoffs are fair. If $s < 1$, then $\alpha(a)^s \ge \alpha(a)$ for every $a \in \Sigma$, so (2.2) says that the payoffs are unfair: the conditional expectation $\sum_{a \in \Sigma} d(wa)\,\alpha(a)$ is at most $d(w)$.

Let $d$ be a gale, and let $S \in \Sigma^\infty$. Then $d$ succeeds on $S$ if $\limsup_{w \to S} d(w) = \infty$, and $d$ succeeds strongly on $S$ if $\liminf_{w \to S} d(w) = \infty$. The success set of $d$ is the set $S^\infty[d]$ of all sequences on which $d$ succeeds, and the strong success set of $d$ is the set $S^\infty_{\mathrm{str}}[d]$ of all sequences on which $d$ succeeds strongly.

The Shannon entropy of a probability measure $\alpha$ on $\Sigma$ is

$$\mathcal{H}(\alpha) = \sum_{a \in \Sigma} \alpha(a) \log \frac{1}{\alpha(a)},$$

where $0 \log \frac{1}{0} = 0$. (Unless otherwise indicated, all logarithms in this paper are base-2.) The Kullback-Leibler divergence between two probability measures $\alpha$ and $\beta$ on $\Sigma$ is

$$D(\alpha \| \beta) = \sum_{a \in \Sigma} \alpha(a) \log \frac{\alpha(a)}{\beta(a)}.$$

The Kullback-Leibler divergence is used to quantify how "far apart" the two probability measures $\alpha$ and $\beta$ are. The Shannon self-information of a string $w \in \Sigma^*$ with respect to a probability measure $\beta$ on $\Sigma$ is

$$\mathcal{I}_\beta(w) = \log \frac{1}{\beta(w)} = \sum_{i=0}^{|w|-1} \log \frac{1}{\beta(w[i])}.$$

Discussions of $\mathcal{H}(\alpha)$, $D(\alpha \| \beta)$, $\mathcal{I}_\beta(w)$, and their properties may be found in any good text on information theory, e.g., [5].

2.3 Hausdorff, packing, and Billingsley dimensions

Given a probability measure $\beta$ on $\Sigma$, each set $X \subseteq \Sigma^\infty$ has a Hausdorff dimension $\dim(X)$, a packing dimension $\mathrm{Dim}(X)$, a Billingsley dimension $\dim^\beta(X)$, and a strong Billingsley dimension $\mathrm{Dim}^\beta(X)$, all of which are real numbers in the interval $[0, 1]$.
In this paper we are not concerned with the original definitions of these classical dimensions, but rather with their recent characterizations (which may be taken as definitions) in terms of gales.

Notation. For each probability measure $\beta$ on $\Sigma$ and each set $X \subseteq \Sigma^\infty$, let $\mathcal{G}^\beta(X)$ (respectively, $\mathcal{G}^{\beta,\mathrm{str}}(X)$) be the set of all $s \in [0, \infty)$ such that there is an $s$-$\beta$-gale $d$ satisfying $X \subseteq S^\infty[d]$ (respectively, $X \subseteq S^\infty_{\mathrm{str}}[d]$).

Theorem 2.2 (gale characterizations of classical fractal dimensions). Let $\beta$ be a probability measure on $\Sigma$, and let $X \subseteq \Sigma^\infty$.
1. [12] $\dim(X) = \inf \mathcal{G}^\mu(X)$.
2. [1] $\mathrm{Dim}(X) = \inf \mathcal{G}^{\mu,\mathrm{str}}(X)$.
3. [14] $\dim^\beta(X) = \inf \mathcal{G}^\beta(X)$.
4. [14] $\mathrm{Dim}^\beta(X) = \inf \mathcal{G}^{\beta,\mathrm{str}}(X)$.

2.4 Randomness and constructive dimensions

Randomness and constructive dimensions are defined by imposing computability constraints on gales.

A real-valued function $f : \Sigma^* \to \mathbb{R}$ is computable if there is a computable, rational-valued function $\hat{f} : \Sigma^* \times \mathbb{N} \to \mathbb{Q}$ such that, for all $w \in \Sigma^*$ and $r \in \mathbb{N}$, $|\hat{f}(w, r) - f(w)| \le 2^{-r}$.

A real-valued function $f : \Sigma^* \to \mathbb{R}$ is constructive, or lower semicomputable, if there is a computable, rational-valued function $\hat{f} : \Sigma^* \times \mathbb{N} \to \mathbb{Q}$ such that (i) for all $w \in \Sigma^*$ and $t \in \mathbb{N}$, $\hat{f}(w, t) \le \hat{f}(w, t+1) < f(w)$, and (ii) for all $w \in \Sigma^*$, $f(w) = \lim_{t \to \infty} \hat{f}(w, t)$.

The first successful definition of the randomness of individual sequences $S \in \Sigma^\infty$ was formulated by Martin-Löf [15]. Many characterizations (equivalent definitions) of randomness are now known, of which the following is the most pertinent.

Theorem 2.3 (Schnorr [17, 18]). Let $\alpha$ be a probability measure on $\Sigma$. A sequence $S \in \Sigma^\infty$ is random with respect to $\alpha$ (or, briefly, $\alpha$-random) if and only if there is no constructive $\alpha$-martingale that succeeds on $S$.

Motivated by Theorem 2.2, we now define the constructive dimensions.

Notation.
We define the sets $\mathcal{G}^\beta_{\mathrm{constr}}(X)$ and $\mathcal{G}^{\beta,\mathrm{str}}_{\mathrm{constr}}(X)$ to be like the sets $\mathcal{G}^\beta(X)$ and $\mathcal{G}^{\beta,\mathrm{str}}(X)$ of section 2.3, except that the $s$-$\beta$-gales are now required to be constructive.

Definition. Let $\beta$ be a probability measure on $\Sigma$, let $X \subseteq \Sigma^\infty$, and let $S \in \Sigma^\infty$.
1. [13] The constructive dimension of $X$ is $\mathrm{cdim}(X) = \inf \mathcal{G}^\mu_{\mathrm{constr}}(X)$.
2. [1] The constructive strong dimension of $X$ is $\mathrm{cDim}(X) = \inf \mathcal{G}^{\mu,\mathrm{str}}_{\mathrm{constr}}(X)$.
3. [14] The constructive $\beta$-dimension of $X$ is $\mathrm{cdim}^\beta(X) = \inf \mathcal{G}^\beta_{\mathrm{constr}}(X)$.
4. [14] The constructive strong $\beta$-dimension of $X$ is $\mathrm{cDim}^\beta(X) = \inf \mathcal{G}^{\beta,\mathrm{str}}_{\mathrm{constr}}(X)$.
5. [13] The dimension of $S$ is $\dim(S) = \mathrm{cdim}(\{S\})$.
6. [1] The strong dimension of $S$ is $\mathrm{Dim}(S) = \mathrm{cDim}(\{S\})$.
7. [14] The $\beta$-dimension of $S$ is $\dim^\beta(S) = \mathrm{cdim}^\beta(\{S\})$.
8. [14] The strong $\beta$-dimension of $S$ is $\mathrm{Dim}^\beta(S) = \mathrm{cDim}^\beta(\{S\})$.

It is clear that definitions 1, 2, 5, and 6 above are the special case $\beta = \mu$ of definitions 3, 4, 7, and 8, respectively.

It is known that $\mathrm{cdim}^\beta(X) = \sup_{S \in X} \dim^\beta(S)$ and that $\mathrm{cDim}^\beta(X) = \sup_{S \in X} \mathrm{Dim}^\beta(S)$ [14]. Constructive dimensions are thus investigated in terms of the dimensions of individual sequences. Since one does not discuss the classical dimension of an individual sequence (because the dimensions of section 2.3 are all zero for singleton, or even countable, sets), no confusion results from the notation $\dim(S)$, $\mathrm{Dim}(S)$, $\dim^\beta(S)$, and $\mathrm{Dim}^\beta(S)$.

2.5 Normality and finite-state dimensions

The preceding section developed the constructive dimensions as effective versions of the classical dimensions of section 2.3. We now introduce the even more effective finite-state dimensions.

Notation. $\Delta_{\mathbb{Q}}(\Sigma)$ is the set of all rational-valued probability measures on $\Sigma$.

Definition ([19, 8, 6]).
A finite-state gambler (FSG) is a 4-tuple $G = (Q, \delta, q_0, B)$, where $Q$ is a finite set of states, $\delta : Q \times \Sigma \to Q$ is the transition function, $q_0 \in Q$ is the initial state, and $B : Q \to \Delta_{\mathbb{Q}}(\Sigma)$ is the betting function.

The transition structure $(Q, \delta, q_0)$ here works as in any deterministic finite-state automaton. For $w \in \Sigma^*$, we write $\delta(w)$ for the state reached by starting at $q_0$ and processing $w$ according to $\delta$.

Intuitively, if the above FSG is in state $q \in Q$, then, for each $a \in \Sigma$, it bets the fraction $B(q)(a)$ of its current capital that the next input symbol is an $a$. The payoffs are determined as follows.

Definition. Let $G = (Q, \delta, q_0, B)$ be an FSG.
1. The martingale of $G$ is the function $d_G : \Sigma^* \to [0, \infty)$ defined by the recursion $d_G(\lambda) = 1$, $d_G(wa) = k\, d_G(w)\, B(\delta(w))(a)$ for all $w \in \Sigma^*$ and $a \in \Sigma$.
2. If $\beta$ is a probability measure on $\Sigma$ and $s \in [0, \infty)$, then the $s$-$\beta$-gale of $G$ is the function $d^{(s)}_{G,\beta} : \Sigma^* \to [0, \infty)$ defined by $d^{(s)}_{G,\beta}(w) = \frac{\mu(w)}{\beta(w)^s}\, d_G(w)$ for all $w \in \Sigma^*$.

It is easy to verify that $d_G = d^{(1)}_{G,\mu}$ is a martingale. It follows by Observation 2.1 that $d^{(s)}_{G,\beta}$ is an $s$-$\beta$-gale.

Definition. A finite-state $s$-$\beta$-gale is an $s$-$\beta$-gale of the form $d^{(s)}_{G,\beta}$ for some FSG $G$.

Notation. We define the sets $\mathcal{G}^\beta_{\mathrm{FS}}(X)$ and $\mathcal{G}^{\beta,\mathrm{str}}_{\mathrm{FS}}(X)$ to be like the sets $\mathcal{G}^\beta(X)$ and $\mathcal{G}^{\beta,\mathrm{str}}(X)$ of section 2.3, except that the $s$-$\beta$-gales are now required to be finite-state.

Definition. Let $\beta$ be a probability measure on $\Sigma$, and let $S \in \Sigma^\infty$.
1. [6] The finite-state dimension of $S$ is $\dim_{\mathrm{FS}}(S) = \inf \mathcal{G}^\mu_{\mathrm{FS}}(\{S\})$.
2. [1] The finite-state strong dimension of $S$ is $\mathrm{Dim}_{\mathrm{FS}}(S) = \inf \mathcal{G}^{\mu,\mathrm{str}}_{\mathrm{FS}}(\{S\})$.
3. The finite-state $\beta$-dimension of $S$ is $\dim^\beta_{\mathrm{FS}}(S) = \inf \mathcal{G}^\beta_{\mathrm{FS}}(\{S\})$.
4. The finite-state strong $\beta$-dimension of $S$ is $\mathrm{Dim}^\beta_{\mathrm{FS}}(S) = \inf \mathcal{G}^{\beta,\mathrm{str}}_{\mathrm{FS}}(\{S\})$.
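The two payoff rules above translate directly into code. The Python sketch below is hypothetical (the class `FSG` and the helper `satisfies_gale_condition` are not from the paper) and, as a simplification, stores the bets $B(q)$ as floats rather than as elements of $\Delta_{\mathbb{Q}}(\Sigma)$. It computes $d_G$ by the stated recursion and $d^{(s)}_{G,\beta}(w) = \mu(w)\, d_G(w)/\beta(w)^s$, and then checks numerically, on all short strings, that the latter satisfies the $s$-$\beta$-gale condition (2.2), as Observation 2.1 predicts.

```python
import math
from itertools import product

class FSG:
    """Finite-state gambler G = (Q, delta, q0, B) over Sigma = {0, ..., k-1}.

    delta: dict mapping (state, symbol) -> state.
    B: dict mapping state -> list of k bets summing to 1 (floats here,
       standing in for the rational-valued measures of Delta_Q(Sigma))."""

    def __init__(self, k, delta, q0, B):
        self.k, self.delta, self.q0, self.B = k, delta, q0, B

    def martingale(self, w):
        """d_G(lambda) = 1 and d_G(wa) = k * d_G(w) * B(delta(w))(a)."""
        q, capital = self.q0, 1.0
        for a in w:
            capital *= self.k * self.B[q][a]  # bet placed from state delta(w)
            q = self.delta[(q, a)]            # then move to state delta(wa)
        return capital

    def gale(self, w, s, beta):
        """d^(s)_{G,beta}(w) = mu(w) / beta(w)**s * d_G(w)."""
        mu_w = self.k ** (-len(w))
        beta_w = math.prod(beta[a] for a in w)
        return mu_w / beta_w ** s * self.martingale(w)

def satisfies_gale_condition(G, s, beta, max_len=4):
    """Check d(w) == sum_a d(wa) * beta[a]**s for all w with |w| < max_len."""
    for n in range(max_len):
        for w in product(range(G.k), repeat=n):
            rhs = sum(G.gale(w + (a,), s, beta) * beta[a] ** s for a in range(G.k))
            if not math.isclose(G.gale(w, s, beta), rhs, rel_tol=1e-9):
                return False
    return True

# A hypothetical two-state gambler over {0, 1}: its state is the last symbol
# read, and it bets 3/4 of its capital that this symbol repeats.
G = FSG(k=2, delta={(q, a): a for q in (0, 1) for a in (0, 1)}, q0=0,
        B={0: [0.75, 0.25], 1: [0.25, 0.75]})
```

Taking $s = 1$ and $\beta = \mu$ in `gale` recovers `martingale` itself, matching the remark that $d_G = d^{(1)}_{G,\mu}$.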
We now turn to some ideas based on asymptotic frequencies of strings in a given sequence. For nonempty strings $w, x \in \Sigma^*$, we write

$$\#_\Box(w, x) = \left|\left\{ m \le \frac{|x|}{|w|} - 1 \;:\; x[m|w| \,..\, (m+1)|w| - 1] = w \right\}\right|$$

for the number of block occurrences of $w$ in $x$. For each sequence $S \in \Sigma^\infty$, each positive integer $n$, and each nonempty $w \in \Sigma$ … 0, and $\epsilon > 0$ have the property that, for all $w \in I$,

$$s \ge \frac{|C(w)|}{\mathcal{I}_\beta(w)} + \epsilon. \tag{4.1}$$

Then there exist an FSG $G$ and a real number $\delta > 0$ such that, for all sufficiently long strings $w \in I$,

$$d^{(s)}_{G,\beta}(w) \ge 2^{\delta |w|}. \tag{4.2}$$

Lemma 4.2. Let $\beta$ be a positive probability measure on $\Sigma$, and let $G$ be an FSG. Assume that $I \subseteq \Sigma^*$, $s > 0$, and $\epsilon > 0$ have the property that, for all $w \in I$,

$$d^{(s-2\epsilon)}_{G,\beta}(w) \ge 1. \tag{4.3}$$

Then there is an ILFSC $C$ such that, for all $w \in I$,

$$|C(w)| \le s\, \mathcal{I}_\beta(w). \tag{4.4}$$

We now prove the main result of this section.

Theorem 4.3 (compression characterizations of finite-state $\beta$-dimensions). If $\beta$ is a positive probability measure on $\Sigma$, then, for each sequence $S \in \Sigma^\infty$,

$$\dim^\beta_{\mathrm{FS}}(S) = \inf_C \liminf_{w \to S} \frac{|C(w)|}{\mathcal{I}_\beta(w)} \tag{4.5}$$

and

$$\mathrm{Dim}^\beta_{\mathrm{FS}}(S) = \inf_C \limsup_{w \to S} \frac{|C(w)|}{\mathcal{I}_\beta(w)}, \tag{4.6}$$

where the infima are taken over all ILFSCs $C$.

Proof. Let $\beta$ and $S$ be as given. We first prove that the left-hand sides of (4.5) and (4.6) do not exceed the right-hand sides. For this, let $C$ be an ILFSC. It suffices to show that

$$\dim^\beta_{\mathrm{FS}}(S) \le \liminf_{w \to S} \frac{|C(w)|}{\mathcal{I}_\beta(w)} \tag{4.7}$$

and

$$\mathrm{Dim}^\beta_{\mathrm{FS}}(S) \le \limsup_{w \to S} \frac{|C(w)|}{\mathcal{I}_\beta(w)}. \tag{4.8}$$

To see that (4.7) holds, let $s$ exceed the right-hand side. Then there exist an infinite set $I$ of prefixes of $S$ and an $\epsilon > 0$ such that (4.1) holds for all $w \in I$. It follows by Lemma 4.1 that there exist an FSG $G$ and a $\delta > 0$ such that, for all sufficiently long $w \in I$, $d^{(s)}_{G,\beta}(w) \ge 2^{\delta |w|}$.
Since $I$ is infinite and $\delta > 0$, this implies that $S \in S^\infty[d^{(s)}_{G,\beta}]$, whence $\dim^\beta_{\mathrm{FS}}(S) \le s$. This establishes (4.7).

The proof that (4.8) holds is identical to the preceding paragraph, except that $I$ is now a cofinite set of prefixes of $S$, so $S \in S^\infty_{\mathrm{str}}[d^{(s)}_{G,\beta}]$.

It remains to be shown that the right-hand sides of (4.5) and (4.6) do not exceed the left-hand sides. To see this for (4.5), let $s > \dim^\beta_{\mathrm{FS}}(S)$. It suffices to show that there is an ILFSC $C$ such that

$$\liminf_{w \to S} \frac{|C(w)|}{\mathcal{I}_\beta(w)} \le s. \tag{4.9}$$

By our choice of $s$ there exists $\epsilon > 0$ such that $s - 2\epsilon > \dim^\beta_{\mathrm{FS}}(S)$. This implies that there exist an FSG $G$ and an infinite set $I$ of prefixes of $S$ such that (4.3) holds for all $w \in I$. Choose $C$ for $G$, $I$, $s$, and $\epsilon$ as in Lemma 4.2. Then

$$\liminf_{w \to S} \frac{|C(w)|}{\mathcal{I}_\beta(w)} \le \sup_{w \in I} \frac{|C(w)|}{\mathcal{I}_\beta(w)} \le s \tag{4.10}$$

by (4.4), so (4.9) holds.

The proof that the right-hand side of (4.6) does not exceed the left-hand side is identical to the preceding paragraph, except that the limits inferior in (4.9) and (4.10) are now limits superior, and the set $I$ is now a cofinite set of prefixes of $S$.

5 Divergence formula for normality and finite-state dimensions

This section proves the divergence formula for $\alpha$-normality, finite-state $\beta$-dimension, and finite-state strong $\beta$-dimension. As should now be clear, Theorem 4.3 enables us to proceed in analogy with section 3.

Lemma 5.1. If $\alpha$ and $\beta$ are positive probability measures on $\Sigma$, then, for all $S \in \Sigma^\infty$,

$$\liminf_{w \to S} \frac{\mathcal{I}_\alpha(w)}{\mathcal{I}_\beta(w)} \le \frac{\dim^\beta_{\mathrm{FS}}(S)}{\dim^\alpha_{\mathrm{FS}}(S)} \le \limsup_{w \to S} \frac{\mathcal{I}_\alpha(w)}{\mathcal{I}_\beta(w)} \tag{5.1}$$

and

$$\liminf_{w \to S} \frac{\mathcal{I}_\alpha(w)}{\mathcal{I}_\beta(w)} \le \frac{\mathrm{Dim}^\beta_{\mathrm{FS}}(S)}{\mathrm{Dim}^\alpha_{\mathrm{FS}}(S)} \le \limsup_{w \to S} \frac{\mathcal{I}_\alpha(w)}{\mathcal{I}_\beta(w)}. \tag{5.2}$$

Lemma 5.2. If $\alpha$ and $\beta$ are positive probability measures on $\Sigma$, then, for all $S \in \mathrm{FREQ}_\alpha$,

$$\dim^\beta_{\mathrm{FS}}(S) = \frac{\dim_{\mathrm{FS}}(S)}{\mathcal{H}_k(\alpha) + D_k(\alpha \| \beta)}$$

and

$$\mathrm{Dim}^\beta_{\mathrm{FS}}(S) = \frac{\mathrm{Dim}_{\mathrm{FS}}(S)}{\mathcal{H}_k(\alpha) + D_k(\alpha \| \beta)}.$$
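The estimate that drives Lemma 5.2 (via the proofs of Lemmas 3.2 and 3.3 in the appendix) is that, along a sequence in $\mathrm{FREQ}_\alpha$, $\mathcal{I}_\beta(w)/|w|$ tends to $\sum_a \alpha(a) \log \frac{1}{\beta(a)} = \mathcal{H}(\alpha) + D(\alpha \| \beta)$, so that $\mathcal{I}_\mu(w)/\mathcal{I}_\beta(w) \to 1/(\mathcal{H}_k(\alpha) + D_k(\alpha \| \beta))$. The Python sketch below checks this numerically on one long string. It is a heuristic illustration only: a pseudo-random sample from Python's `random` module stands in for a prefix of a sequence in $\mathrm{FREQ}_\alpha$ (an assumption; its symbol frequencies are only approximately $\alpha$), and the helper names are hypothetical. $\mathcal{I}_\beta(w)$ is computed from symbol counts, as in the second step of the proof of Lemma 3.2.

```python
import math
import random
from collections import Counter

def cross_entropy(alpha, beta):
    """sum_a alpha(a) log2(1/beta(a)), which equals H(alpha) + D(alpha || beta)."""
    return sum(p * math.log2(1 / q) for p, q in zip(alpha, beta) if p > 0)

def self_information_by_counts(w, beta):
    """I_beta(w) computed as sum_a #(a, w) * log2(1/beta(a))."""
    counts = Counter(w)
    return sum(c * math.log2(1 / beta[a]) for a, c in counts.items())

def sample_prefix(alpha, n, seed=0):
    """A pseudo-random length-n string whose symbol frequencies approximate alpha."""
    rng = random.Random(seed)
    return rng.choices(range(len(alpha)), weights=alpha, k=n)

if __name__ == "__main__":
    alpha, beta = [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]
    k, w = len(alpha), sample_prefix(alpha, 200_000)
    observed = len(w) * math.log2(k) / self_information_by_counts(w, beta)  # I_mu / I_beta
    predicted = math.log2(k) / cross_entropy(alpha, beta)                   # 1 / (H_k + D_k)
    print(f"observed {observed:.4f}, predicted {predicted:.4f}")
```

Because the ratio is computed from a single finite sample, agreement is only approximate; the lemma itself concerns the limit as $w \to S$.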
We next prove a finite-state analog of Theorem 3.4.

Theorem 5.3. If $\alpha$ is a probability measure on $\Sigma$, then, for every $\alpha$-normal sequence $R \in \Sigma^\infty$,

$$\dim_{\mathrm{FS}}(R) = \mathrm{Dim}_{\mathrm{FS}}(R) = \mathcal{H}_k(\alpha).$$

We now have our third main theorem.

Theorem 5.4 (divergence theorem for normality and finite-state dimensions). If $\alpha$ and $\beta$ are positive probability measures on $\Sigma$, then, for every $\alpha$-normal sequence $R \in \Sigma^\infty$,

$$\dim^\beta_{\mathrm{FS}}(R) = \mathrm{Dim}^\beta_{\mathrm{FS}}(R) = \frac{\mathcal{H}(\alpha)}{\mathcal{H}(\alpha) + D(\alpha \| \beta)}.$$

Proof. This follows immediately from Lemma 5.2 and Theorem 5.3. (Here we use the facts that every $\alpha$-normal sequence belongs to $\mathrm{FREQ}_\alpha$, and that $\mathcal{H}_k(\alpha) = \mathcal{H}(\alpha)/\log k$ and $D_k(\alpha \| \beta) = D(\alpha \| \beta)/\log k$.)

We again note that $D(\alpha \| \mu) = \log k - \mathcal{H}(\alpha)$, so Theorem 5.3 is the case $\beta = \mu$ of Theorem 5.4.

Acknowledgments. I thank Xiaoyang Gu and Elvira Mayordomo for useful discussions.

References

[1] K. B. Athreya, J. M. Hitchcock, J. H. Lutz, and E. Mayordomo. Effective strong dimension, algorithmic information, and computational complexity. SIAM Journal on Computing, 37:671–705, 2007.
[2] P. Billingsley. Hausdorff dimension in probability theory. Illinois Journal of Mathematics, 4:187–209, 1960.
[3] E. Borel. Sur les probabilités dénombrables et leurs applications arithmétiques. Rend. Circ. Mat. Palermo, 27:247–271, 1909.
[4] C. Bourke, J. M. Hitchcock, and N. V. Vinodchandran. Entropy rates and finite-state dimension. Theoretical Computer Science, 349(3):392–406, 2005.
[5] T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley & Sons, Inc., second edition, 2006.
[6] J. J. Dai, J. I. Lathrop, J. H. Lutz, and E. Mayordomo. Finite-state dimension. Theoretical Computer Science, 310:1–33, 2004.
[7] H. Eggleston. The fractional dimension of a set defined by decimal properties. Quarterly Journal of Mathematics, Oxford Series 20:31–36, 1949.
[8] M. Feder. Gambling using a finite state machine. IEEE Transactions on Information Theory, 37:1459–1461, 1991.
[9] F. Hausdorff. Dimension und äusseres Mass. Mathematische Annalen, 79:157–179, 1919. English translation.
[10] J. M. Hitchcock. Effective Fractal Dimension Bibliography, http://www.cs.uwyo.edu/~jhitchco/bib/dim.shtml (current October, 2008).
[11] M. Li and P. M. B. Vitányi. An Introduction to Kolmogorov Complexity and its Applications. Springer-Verlag, Berlin, 1997. Second Edition.
[12] J. H. Lutz. Dimension in complexity classes. SIAM Journal on Computing, 32:1236–1259, 2003.
[13] J. H. Lutz. The dimensions of individual strings and sequences. Information and Computation, 187:49–79, 2003.
[14] J. H. Lutz and E. Mayordomo. Dimensions of points in self-similar fractals. SIAM Journal on Computing, 38:1080–1112, 2008.
[15] P. Martin-Löf. The definition of random sequences. Information and Control, 9:602–619, 1966.
[16] E. Mayordomo. A Kolmogorov complexity characterization of constructive Hausdorff dimension. Information Processing Letters, 84(1):1–3, 2002.
[17] C. P. Schnorr. A unified approach to the definition of random sequences. Mathematical Systems Theory, 5:246–258, 1971.
[18] C. P. Schnorr. A survey of the theory of random sequences. In R. E. Butts and J. Hintikka, editors, Basic Problems in Methodology and Linguistics, pages 193–210. D. Reidel, 1977.
[19] C. P. Schnorr and H. Stimm. Endliche Automaten und Zufallsfolgen. Acta Informatica, 1:345–359, 1972.
[20] C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379–423, 623–656, 1948.
[21] D. Sullivan. Entropy, Hausdorff measures old and new, and limit sets of geometrically finite Kleinian groups. Acta Mathematica, 153:259–277, 1984.
[22] C. Tricot. Two definitions of fractional dimension. Mathematical Proceedings of the Cambridge Philosophical Society, 91:57–74, 1982.

A Appendix – Various Proofs

Proof of Lemma 3.2.
Assume the hypothesis, and let $S \in \mathrm{FREQ}_\alpha$. Then, as $w \to S$, we have

$$\begin{aligned}
\mathcal{I}_\beta(w) &= \sum_{i=0}^{|w|-1} \log \frac{1}{\beta(w[i])} = \sum_{a \in \Sigma} \#(a, w) \log \frac{1}{\beta(a)} = |w| \sum_{a \in \Sigma} \mathrm{freq}_a(w) \log \frac{1}{\beta(a)} \\
&= |w| \sum_{a \in \Sigma} (\alpha(a) + o(1)) \log \frac{1}{\beta(a)} = |w| \sum_{a \in \Sigma} \alpha(a) \log \frac{1}{\beta(a)} + o(|w|) \\
&= |w| \sum_{a \in \Sigma} \left( \alpha(a) \log \frac{1}{\alpha(a)} + \alpha(a) \log \frac{\alpha(a)}{\beta(a)} \right) + o(|w|) \\
&= (\mathcal{H}(\alpha) + D(\alpha \| \beta))\, |w| + o(|w|).
\end{aligned}$$

Proof of Lemma 3.3. Let $\alpha$, $\beta$, and $S$ be as given. By the frequency divergence lemma, we have

$$\frac{\mathcal{I}_\mu(w)}{\mathcal{I}_\beta(w)} = \frac{|w| \log k}{(\mathcal{H}(\alpha) + D(\alpha \| \beta))|w| + o(|w|)} = \frac{\log k}{\mathcal{H}(\alpha) + D(\alpha \| \beta) + o(1)} = \frac{\log k}{\mathcal{H}(\alpha) + D(\alpha \| \beta)} + o(1) = \frac{1}{\mathcal{H}_k(\alpha) + D_k(\alpha \| \beta)} + o(1)$$

as $w \to S$. The present lemma follows from this and Lemma 3.1.

The following lemma summarizes the first part of the proof of Theorem 2.7.

Lemma A.1 ([6]). For each ILFSC $C$ there is an integer $m \in \mathbb{Z}^+$ such that, for each $l \in \mathbb{Z}^+$, there is an FSG $G$ such that, for all $w \in \Sigma^*$,

$$\log d^{(1)}_G(w) \ge |w| \log k - |C(w)| - m\left(\frac{|w|}{l} + l\right). \tag{A.1}$$

Proof of Lemma 4.1. Assume the hypothesis. Let

$$\delta_\beta = \min_{a \in \Sigma} \log \frac{1}{\beta(a)},$$

noting the following two things.
(i) $\delta_\beta > 0$, because $\beta$ is positive and $k \ge 2$, so every $\beta(a) < 1$.
(ii) For all $w \in \Sigma^*$,
$$\mathcal{I}_\beta(w) \ge \delta_\beta |w|. \tag{A.2}$$

Choose $m$ for $C$ as in Lemma A.1, let

$$l = \left\lceil \frac{3m}{\epsilon \delta_\beta} \right\rceil, \tag{A.3}$$

and choose $G$ for $C$, $m$, and $l$ as in Lemma A.1. Let $\delta = \frac{1}{3}\epsilon\delta_\beta$, noting that $\delta > 0$ and that

$$|w| \ge l^2 \implies \epsilon\delta_\beta|w| - m\left(\frac{|w|}{l} + l\right) = \epsilon\delta_\beta|w| - \frac{m}{l}\left(|w| + l^2\right) \ge \epsilon\delta_\beta|w| - \frac{2m}{l}|w| = \left(\epsilon\delta_\beta - \frac{2m}{l}\right)|w| \overset{(A.3)}{\ge} \frac{1}{3}\epsilon\delta_\beta|w|,$$

i.e., that

$$|w| \ge l^2 \implies \epsilon\delta_\beta|w| - m\left(\frac{|w|}{l} + l\right) \ge \delta|w|. \tag{A.4}$$

It follows that, for all $w \in I$ with $|w| \ge l^2$, we have

$$\begin{aligned}
\log d^{(s)}_{G,\beta}(w) &= \log\left(\frac{\mu(w)}{\beta(w)^s}\, d^{(1)}_G(w)\right) = -|w| \log k + s\,\mathcal{I}_\beta(w) + \log d^{(1)}_G(w) \\
&\overset{(A.1)}{\ge} s\,\mathcal{I}_\beta(w) - |C(w)| - m\left(\frac{|w|}{l} + l\right) \\
&\overset{(4.1)}{\ge} \epsilon\,\mathcal{I}_\beta(w) - m\left(\frac{|w|}{l} + l\right) \\
&\overset{(A.2)}{\ge} \epsilon\delta_\beta|w| - m\left(\frac{|w|}{l} + l\right) \\
&\overset{(A.4)}{\ge} \delta|w|.
\end{aligned}$$

Hence (4.2) holds.

An FSG $G = (Q, \delta, q_0, B)$ is nonvanishing if all its bets are nonzero, i.e., if $B(q)(a) > 0$ holds for all $q \in Q$ and $a \in \Sigma$.

Lemma A.2 ([6]). For each FSG $G$ and each $\delta > 0$, there is a nonvanishing FSG $G'$ such that, for all $w \in \Sigma^*$,

$$d^{(1)}_{G'}(w) \ge k^{-\delta|w|}\, d^{(1)}_G(w). \tag{A.5}$$

The following lemma summarizes the second part of the proof of Theorem 2.7.

Lemma A.3 ([6]). For each nonvanishing FSG $G$ and each $l \in \mathbb{Z}^+$, there exists an ILFSC $C$ such that, for all $w \in \Sigma^*$,

$$|C(w)| \le \left(1 + \frac{2}{l}\right)|w| \log k - \log d^{(1)}_G(w). \tag{A.6}$$

Proof of Lemma 4.2. Assume the hypothesis. Let $\gamma = \log \frac{1}{\beta_{\max}}$, where $\beta_{\max} = \max_{a \in \Sigma} \beta(a)$. Note that $\gamma > 0$ (because $\beta$ is positive and $k \ge 2$, so $\beta_{\max} < 1$) and that, for all $w \in \Sigma^*$,

$$\mathcal{I}_\beta(w) \ge \gamma |w|. \tag{A.7}$$

Let

$$\delta = \frac{\gamma\epsilon}{\log k} \tag{A.8}$$

and choose $G'$ for $G$ and $\delta$ as in Lemma A.2. Let

$$l = \left\lceil \frac{2 \log k}{\gamma\epsilon} \right\rceil, \tag{A.9}$$

and choose $C$ for $G'$ and $l$ as in Lemma A.3. Then, for all $w \in I$,

$$\begin{aligned}
|C(w)| &\overset{(A.6)}{\le} \left(1 + \frac{2}{l}\right)|w| \log k - \log d^{(1)}_{G'}(w) \\
&\overset{(A.9)}{\le} |w|(\gamma\epsilon + \log k) - \log d^{(1)}_{G'}(w) \\
&\overset{(A.5)}{\le} |w|(\gamma\epsilon + \log k) - \log\left(k^{-\delta|w|}\, d^{(1)}_G(w)\right) \\
&= |w|(\gamma\epsilon + \log k + \delta \log k) - \log d^{(1)}_G(w) \\
&\overset{(A.8)}{=} |w|(2\gamma\epsilon + \log k) - \log d^{(1)}_G(w) \\
&= |w|(2\gamma\epsilon + \log k) - \log\left(\frac{\beta(w)^{s-2\epsilon}}{\mu(w)}\, d^{(s-2\epsilon)}_{G,\beta}(w)\right) \\
&\overset{(4.3)}{\le} |w|(2\gamma\epsilon + \log k) - \log \frac{\beta(w)^{s-2\epsilon}}{\mu(w)} \\
&= |w|(2\gamma\epsilon + \log k) - \log\left(k^{|w|}\beta(w)^{s-2\epsilon}\right) \\
&= 2\gamma\epsilon|w| - \log \beta(w)^{s-2\epsilon} \\
&= 2\gamma\epsilon|w| + (s - 2\epsilon)\,\mathcal{I}_\beta(w) \\
&\overset{(A.7)}{\le} s\,\mathcal{I}_\beta(w).
\end{aligned}$$

Proof of Lemma 5.2.
As in the proof of Lemma 3.3, the hypothesis implies that

$$\frac{\mathcal{I}_\mu(w)}{\mathcal{I}_\beta(w)} = \frac{1}{\mathcal{H}_k(\alpha) + D_k(\alpha \| \beta)} + o(1)$$

as $w \to S$. The present lemma follows from this and Lemma 5.1.

Proof of Theorem 5.3. Assume the hypothesis, and let $l \in \mathbb{Z}^+$. Let $\alpha^{(l)}$ be the restriction of the product probability measure $\mu_\alpha$ to $\Sigma^l$, noting that $\mathcal{H}(\alpha^{(l)}) = l\,\mathcal{H}(\alpha)$. We first show that

$$\lim_{n \to \infty} \mathcal{H}(\pi^{(l)}_{R,n}) = \mathcal{H}(\alpha^{(l)}), \tag{A.10}$$

where $\pi^{(l)}_{R,n}$ is the empirical probability measure defined in section 2.5. To see this, let $\epsilon > 0$. By the continuity of the entropy function, there is a real number $\delta > 0$ such that, for all probability measures $\pi$ on $\Sigma^l$,

$$\max_{w \in \Sigma^l} |\pi(w) - \alpha^{(l)}(w)| < \delta \implies |\mathcal{H}(\pi) - \mathcal{H}(\alpha^{(l)})| < \epsilon.$$

Since $R$ is $\alpha$-normal, there is, for each $w \in \Sigma^l$, a positive integer $n_w$ such that, for all $n \ge n_w$,

$$|\pi^{(l)}_{R,n}(w) - \alpha^{(l)}(w)| = |\pi^{(l)}_{R,n}(w) - \mu_\alpha(w)| < \delta.$$

Let $N = \max_{w \in \Sigma^l} n_w$. Then, for all $n \ge N$, we have $|\mathcal{H}(\pi^{(l)}_{R,n}) - \mathcal{H}(\alpha^{(l)})| < \epsilon$, confirming (A.10).

By Theorem 2.5, we now have

$$\dim_{\mathrm{FS}}(R) = \mathrm{Dim}_{\mathrm{FS}}(R) = \inf_{l \in \mathbb{Z}^+} \frac{1}{l \log k} \lim_{n \to \infty} \mathcal{H}(\pi^{(l)}_{R,n}) = \inf_{l \in \mathbb{Z}^+} \frac{1}{l \log k} \mathcal{H}(\alpha^{(l)}) = \frac{\mathcal{H}(\alpha)}{\log k} = \mathcal{H}_k(\alpha).$$
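The proof of Theorem 5.3 reduces finite-state dimension, via Theorem 2.5, to the entropies of empirical block distributions. The Python sketch below computes the quantity $\mathcal{H}(\pi^{(l)})/(l \log k)$ for a finite prefix, where $\pi^{(l)}$ is the empirical distribution of its non-overlapping length-$l$ blocks, counted as in $\#_\Box(w, x)$ of section 2.5. Two assumptions are made: that this block distribution matches the empirical measure $\pi^{(l)}_{R,n}$ of section 2.5, and that a pseudo-random sample can stand in for a prefix of an $\alpha$-normal sequence. The function names are hypothetical.

```python
import math
import random
from collections import Counter

def block_counts(x, l):
    """#_box(w, x) for every length-l w: counts of the blocks x[m*l:(m+1)*l]."""
    return Counter(tuple(x[m * l:(m + 1) * l]) for m in range(len(x) // l))

def block_entropy_rate(x, l, k):
    """H(pi) / (l log2 k), where pi is the empirical distribution of the
    non-overlapping length-l blocks of the prefix x."""
    counts = block_counts(x, l)
    total = sum(counts.values())
    h = sum(c / total * math.log2(total / c) for c in counts.values())
    return h / (l * math.log2(k))

if __name__ == "__main__":
    alpha = [0.5, 0.25, 0.25]
    x = random.Random(1).choices(range(3), weights=alpha, k=300_000)
    h_k = sum(p * math.log2(1 / p) for p in alpha) / math.log2(3)
    for l in (1, 2, 3):
        print(l, round(block_entropy_rate(x, l, 3), 4), round(h_k, 4))
    # For the periodic sequence 0101..., block length 1 looks maximally random,
    # but block length 2 exposes the pattern, so the infimum over l is 0.
    print(block_entropy_rate([0, 1] * 1000, 1, 2), block_entropy_rate([0, 1] * 1000, 2, 2))
```

The periodic example shows why the infimum over $l$ matters: single-symbol frequencies alone cannot detect the finite-state predictability that longer blocks reveal.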
