The Gap Dimension and Uniform Laws of Large Numbers for Ergodic Processes
Authors: Terrence M. Adams, Andrew B. Nobel
May 24, 2010

Abstract

Let $\mathcal{F}$ be a family of Borel measurable functions on a complete separable metric space. The gap (or fat-shattering) dimension of $\mathcal{F}$ is a combinatorial quantity that measures the extent to which functions $f \in \mathcal{F}$ can separate finite sets of points at a predefined resolution $\gamma > 0$. We establish a connection between the gap dimension of $\mathcal{F}$ and the uniform convergence of its sample averages under ergodic sampling. In particular, we show that if the gap dimension of $\mathcal{F}$ at resolution $\gamma > 0$ is finite, then for every ergodic process the sample averages of functions in $\mathcal{F}$ are eventually within $10\gamma$ of their limiting expectations uniformly over the class $\mathcal{F}$. If the gap dimension of $\mathcal{F}$ is finite for every resolution $\gamma > 0$, then the sample averages of functions in $\mathcal{F}$ converge uniformly to their limiting expectations. We assume only that $\mathcal{F}$ is uniformly bounded and countable (or countably approximable). No smoothness conditions are placed on $\mathcal{F}$, and no assumptions beyond ergodicity are placed on the sampling processes. Our results extend existing work for i.i.d. processes.

Running Title: Gap Dimension and Uniform Laws of Large Numbers

Keywords: combinatorial dimension, ergodic process, fat shattering, law of large numbers, uniform convergence.

Terrence Adams is with the Department of Defense, 9800 Savage Rd. Suite 6513, Ft. Meade, MD 20755.

Andrew Nobel is with the Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC 27599-3260. Corresponding author. Email: nobel@email.unc.edu

1 Introduction

Let $\mathcal{X}$ be a complete separable metric space, and let $\mathcal{F}$ be a countable family of Borel-measurable functions $f : \mathcal{X} \to \mathbb{R}$.
We assume in what follows that $\mathcal{F}$ is uniformly bounded in the sense that $|f(x)| \le M$ for every $x \in \mathcal{X}$ and $f \in \mathcal{F}$, where $M < \infty$ is a fixed constant. Let $\mathbf{X} = X_1, X_2, \ldots$ be a stationary ergodic process taking values in $\mathcal{X}$. By the ergodic theorem, for each $f \in \mathcal{F}$ the averages $m^{-1} \sum_{i=1}^m f(X_i)$ converge with probability one to $\mathbb{E} f(X)$. Of interest here is the limiting behavior of the discrepancy

$$\Gamma_m(\mathcal{F} : \mathbf{X}) \triangleq \sup_{f \in \mathcal{F}} \left| \frac{1}{m} \sum_{i=1}^m f(X_i) - \mathbb{E} f(X) \right|, \tag{1}$$

which measures the maximum difference between $m$-sample averages and their limiting expectations over the functions in $\mathcal{F}$. The discrepancy $\Gamma_m(\mathcal{F} : \mathbf{X})$ and related quantities have been studied in a number of fields, including empirical process theory, machine learning and non-parametric inference. The majority of existing work considers the case in which $X_1, X_2, \ldots$ are independent and identically distributed, but there is also a substantial literature concerned with the behavior of the discrepancy for mixing processes (see [1] and the discussion below). Our focus here is on the general dependent case: the process $\mathbf{X}$ is not assumed to satisfy any mixing conditions beyond ergodicity.

When $\mathbf{X}$ is ergodic, the limiting behavior of the discrepancy $\Gamma_m(\mathcal{F} : \mathbf{X})$ can be summarized by a single number. As shown in Steele [15], Kingman's subadditive ergodic theorem implies that there is a non-negative constant $\Gamma(\mathcal{F} : \mathbf{X})$ such that

$$\lim_{m \to \infty} \Gamma_m(\mathcal{F} : \mathbf{X}) = \Gamma(\mathcal{F} : \mathbf{X}) \quad \text{with probability one.} \tag{2}$$

We will call $\Gamma(\mathcal{F} : \mathbf{X})$ the asymptotic discrepancy of $\mathcal{F}$ on $\mathbf{X}$, and will omit mention of $\mathbf{X}$ when no confusion will arise. When $\Gamma(\mathcal{F} : \mathbf{X}) = 0$, the sample averages of the functions $f \in \mathcal{F}$ converge uniformly to their limiting expectations, and $\mathcal{F}$ is said to be a Glivenko-Cantelli class for the process $\mathbf{X}$.
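When $\mathcal{F}$ is finite and the expectations $\mathbb{E} f(X)$ are known, the discrepancy (1) can be evaluated directly along a sample path. The sketch below is illustrative only. The finite family of monomials, the starting point $x_0 = 0.1234$ and the rotation angle $\theta = (\sqrt{5} - 1)/2$ are all assumptions of this example. The sample path is the orbit of the rotation $x \mapsto x + \theta \pmod 1$, which is stationary and ergodic when $X_0$ is uniform.

```python
import math
from typing import Callable, List, Sequence

# Illustrative assumption: rotation angle theta = (sqrt(5) - 1) / 2.
THETA = (math.sqrt(5) - 1) / 2


def rotation_path(x0: float, m: int, theta: float = THETA) -> List[float]:
    """Sample path X_1, ..., X_m with X_i = x0 + i * theta (mod 1)."""
    return [(x0 + i * theta) % 1.0 for i in range(1, m + 1)]


def discrepancy(
    family: Sequence[Callable[[float], float]],
    means: Sequence[float],
    path: Sequence[float],
) -> float:
    """Gamma_m(F : X) from (1) for a finite family: the largest absolute
    difference between an m-sample average and its expectation E f(X)."""
    m = len(path)
    return max(
        abs(sum(f(x) for x in path) / m - mu) for f, mu in zip(family, means)
    )


# Monomials x^k, k = 1..5, have E f(X) = 1 / (k + 1) when X is uniform on [0, 1].
family = [lambda x, k=k: x ** k for k in range(1, 6)]
means = [1.0 / (k + 1) for k in range(1, 6)]

for m in (10, 100, 10_000):
    print(m, discrepancy(family, means, rotation_path(0.1234, m)))
```

This family is finite and continuous, and irrational rotations are uniquely ergodic, so the printed values become small as $m$ grows. The families of interest in the paper are infinite, and for them no such direct computation is available.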
In this paper we provide bounds on the asymptotic discrepancy of $\mathcal{F}$ in terms of a combinatorial quantity known as the gap dimension, which measures the complexity of $\mathcal{F}$ at different resolutions or scales.

Definition: Let $\gamma > 0$. The family $\mathcal{F}$ is said to $\gamma$-shatter a finite set $D \subseteq \mathcal{X}$ if there is an $\alpha \in \mathbb{R}$ such that for every $D' \subseteq D$ there exists a function $f \in \mathcal{F}$ satisfying

$$f(x) > \alpha + \gamma \ \text{ if } x \in D' \quad \text{and} \quad f(x) < \alpha - \gamma \ \text{ if } x \in D \setminus D'.$$

The gap dimension of $\mathcal{F}$ at resolution $\gamma$, written $\dim_\gamma(\mathcal{F})$, is the largest $k$ such that $\mathcal{F}$ $\gamma$-shatters some set of cardinality $k$. If $\mathcal{F}$ can $\gamma$-shatter sets of arbitrarily large finite cardinality, then $\dim_\gamma(\mathcal{F}) = +\infty$.

The gap dimension was introduced by Kearns and Schapire [9] in a slightly more general form. Specifically, they allowed the constant threshold $\alpha$ to be replaced by a fixed function $g : \mathcal{X} \to \mathbb{R}$. We will refer to this notion as the weak gap dimension in what follows. The definition of gap dimension given here was suggested by Alon, Ben-David, Cesa-Bianchi and Haussler [2], who also established elementary bounds relating the gap and weak gap dimensions. Gap dimensions have been referred to by a variety of names in the literature, including scale-sensitive dimension and fat-shattering dimension.

Our principal result is the following theorem. As above, $\mathcal{X}$ is assumed to be a complete separable metric space.

Theorem 1. Let $\mathcal{F}$ be a countable, uniformly bounded family of Borel measurable functions $f : \mathcal{X} \to \mathbb{R}$, and let $\mathbf{X}$ be a stationary ergodic process with values in $\mathcal{X}$. If the asymptotic discrepancy $\Gamma(\mathcal{F} : \mathbf{X}) > \eta$ for some $\eta > 0$, then $\dim_\gamma(\mathcal{F}) = \infty$ for every $\gamma \le \eta/10$.

The constant 10 dividing $\eta$ can, with minor modifications of the proof, be improved to $4 + \epsilon$, where $\epsilon$ is any fixed positive constant. Theorem 1 has the following equivalent form.

Corollary 1. Let $\mathcal{F}$ be as in Theorem 1.
If $\dim_\gamma(\mathcal{F}) < \infty$ for some $\gamma > 0$, then $\Gamma(\mathcal{F} : \mathbf{X}) \le 10\gamma$ for every stationary ergodic process $\mathbf{X}$. In particular, if $\dim_\gamma(\mathcal{F}) < \infty$ for every $\gamma > 0$, then $\Gamma(\mathcal{F} : \mathbf{X}) = 0$ for every stationary ergodic process $\mathbf{X}$.

Uncountable Families. The countability of $\mathcal{F}$ ensures that the discrepancies $\Gamma_m(\mathcal{F} : \mathbf{X})$, $m \ge 1$, are measurable. More importantly, countability of $\mathcal{F}$ is used in the proof of Proposition 1 and is a key assumption in Lemma B. Nevertheless, one may readily extend Theorem 1 to uncountable families under simple approximation conditions. Call a (possibly uncountable) family $\mathcal{F}$ nice for a process $\mathbf{X}$ if $\Gamma_m(\mathcal{F} : \mathbf{X})$ is measurable for each $m \ge 1$, and if for every $\epsilon > 0$ there exists a countable sub-family $\mathcal{F}_0 \subseteq \mathcal{F}$ such that

$$\limsup_{m} \Gamma_m(\mathcal{F} : \mathbf{X}) \le \limsup_{m} \Gamma_m(\mathcal{F}_0 : \mathbf{X}) + \epsilon$$

with probability one. The conclusion of Theorem 1 immediately extends to any ergodic process $\mathbf{X}$ for which $\mathcal{F}$ is nice.

In spite of such extensions, assumptions regarding the countability or countable approximability of $\mathcal{F}$ cannot be dropped altogether, as they exclude extreme examples that can arise in the context of dependent processes. We illustrate with a simple example from [1]. Let $T$ be an irrational rotation of the unit circle $S^1$ with its uniform measure. Denote by $T^i$ the $i$-fold composition of $T$ with itself if $i \ge 1$, the $|i|$-fold composition of $T^{-1}$ with itself if $i \le -1$, and the identity if $i = 0$. For each $x \in S^1$ let $C_x = \bigcup_{i=-\infty}^{\infty} \{T^i x\}$ be the (bi-infinite) trajectory of $x$ under $T$, and let $\mathcal{F}$ be the family of indicator functions of the sets $C_x$. Note that $\mathcal{F}$ is uncountable, and that every set $C_x$ has Lebesgue measure zero. For distinct points $x_1, x_2 \in S^1$, either $C_{x_1} = C_{x_2}$ or $C_{x_1} \cap C_{x_2} = \emptyset$, and therefore $\dim_\gamma(\mathcal{F}) = 1$ for $0 < \gamma < 1/2$. Now let $X_i = T^i X_0$, where $X_0$ is uniformly distributed on $S^1$. Then the process $\mathbf{X} = X_1, X_2, \ldots$ is stationary and ergodic.
Moreover, it is easy to see that $\mathbb{E} f(X) = 0$ for each $f \in \mathcal{F}$, and that $\sup_{f \in \mathcal{F}} m^{-1} \sum_{i=1}^m f(X_i) = 1$ (take $f$ to be the indicator of $C_{X_0}$). Thus $\Gamma_m(\mathcal{F} : \mathbf{X}) = 1$ with probability one for each $m \ge 1$, and the conclusion of Corollary 1 fails to hold.

1.1 Related Work

Vapnik and Chervonenkis [18] gave necessary and sufficient conditions for uniform convergence of sample means in the i.i.d. case. Specifically, they showed that if $\mathbf{X}$ is i.i.d., then $\Gamma(\mathcal{F} : \mathbf{X}) = 0$ if and only if $n^{-1} \log N(\epsilon, \mathcal{F}, X_1^n) \to 0$ in probability for every $\epsilon > 0$. Here $N(\epsilon, \mathcal{F}, X_1^n)$ is the number of $\epsilon$-balls needed to cover $\mathcal{F}$ under the empirical $L_1$ metric $d(f_1, f_2) = n^{-1} \sum_{i=1}^n |f_1(X_i) - f_2(X_i)|$. Extensions of these results to empirical processes can be found, for example, in Giné and Zinn [8] (see also Dudley [7]).

Talagrand [16] gave necessary and sufficient conditions for uniform convergence of sample means that differ from those of [18]. He showed that $\Gamma(\mathcal{F} : \mathbf{X}) > 0$ for an i.i.d. process $\mathbf{X}$ with $X_i \sim P$ if and only if there exist a set $A$ with $P(A) > 0$ and a $\gamma > 0$ such that for every $n \ge 1$ the family $\mathcal{F}$ $\gamma$-shatters $P^n$-almost every sequence $x_1, \ldots, x_n \in A^n$.

Alon et al. [2] considered the relationship between the gap dimension and the learnability of classes of uniformly bounded functions under independent sampling. In particular, they showed that if $\mathcal{F}$ is a family of functions $f : \mathcal{X} \to [0, 1]$ satisfying suitable measurability conditions, and such that $\dim_\gamma(\mathcal{F})$ is finite for some $\gamma > 0$, then

$$\lim_{n \to \infty} \left[ \sup_{\mathbf{X} \in \mathcal{I}(\mathcal{X})} P\left( \sup_{m \ge n} \Gamma_m(\mathcal{F} : \mathbf{X}) > \varepsilon \right) \right] = 0 \tag{3}$$

when $\varepsilon = 48\gamma$. Here $\mathcal{I}(\mathcal{X})$ is the family of all i.i.d. processes taking values in $\mathcal{X}$. Conversely, if $\dim_\gamma(\mathcal{F}) = +\infty$, they showed that (3) fails to hold for every $\varepsilon < 2\gamma$. Further connections between the gap dimension and different notions of learnability (in the i.i.d. case) can be found in [3] and the references therein.
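When both the family and the ground set are finite, the definition of $\gamma$-shattering given above can be checked by brute force. This sketch assumes that each function is represented by a table of its values on a finite ground set. That representation, and the helper names `gamma_shatters` and `gap_dimension`, are hypothetical and used only for illustration. For a fixed subset $D'$ and a fixed function $f$, the admissible thresholds $\alpha$ form an open interval. It therefore suffices to test one candidate $\alpha$ in each gap between the interval endpoints.

```python
import math
from itertools import combinations
from typing import Dict, Hashable, List, Sequence, Tuple

# Assumption for this sketch: a function is a table {point: value}.
Table = Dict[Hashable, float]


def gamma_shatters(family: Sequence[Table], D: Sequence[Hashable], gamma: float) -> bool:
    """True if a single threshold alpha works for every D' subset of D: some f
    has f > alpha + gamma on D' and f < alpha - gamma on D minus D'."""
    D = list(D)
    constraints: List[List[Tuple[float, float]]] = []
    for r in range(len(D) + 1):
        for inside in combinations(D, r):
            outside = [x for x in D if x not in inside]
            # For fixed D' and f, the admissible alphas form the open interval (lo, hi).
            intervals = []
            for f in family:
                hi = min((f[x] - gamma for x in inside), default=math.inf)
                lo = max((f[x] + gamma for x in outside), default=-math.inf)
                if lo < hi:
                    intervals.append((lo, hi))
            if not intervals:
                return False  # no function realizes this D' for any alpha
            constraints.append(intervals)
    # Membership in each union of open intervals changes only at finite
    # endpoints, so one test point per gap between endpoints is exhaustive.
    ends = sorted({e for ivs in constraints for iv in ivs for e in iv if math.isfinite(e)})
    if ends:
        candidates = [ends[0] - 1.0, ends[-1] + 1.0]
        candidates += [(a + b) / 2 for a, b in zip(ends, ends[1:])]
    else:
        candidates = [0.0]
    return any(
        all(any(lo < a < hi for lo, hi in ivs) for ivs in constraints)
        for a in candidates
    )


def gap_dimension(family: Sequence[Table], domain: Sequence[Hashable], gamma: float) -> int:
    """dim_gamma restricted to a finite domain. Subsets of a shattered set are
    shattered (same alpha), so the search stops at the first failing size."""
    best = 0
    for k in range(1, len(domain) + 1):
        if any(gamma_shatters(family, D, gamma) for D in combinations(domain, k)):
            best = k
        else:
            break
    return best


# All {0,1}-valued functions on two points.
binary = [{"a": u, "b": v} for u in (0.0, 1.0) for v in (0.0, 1.0)]
print(gap_dimension(binary, ["a", "b"], 0.4))  # 2
print(gap_dimension(binary, ["a", "b"], 0.5))  # 0
```

The two outputs agree with the remark about indicator classes in the next part of this section. For $\{0,1\}$-valued functions the gap dimension drops to zero once $\gamma \ge 1/2$, since no threshold can leave a gap wider than $2\gamma$ between the values 0 and 1.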
Talagrand [17] and Mendelson and Vershynin [11] showed that the $L_2$ covering numbers of a uniformly bounded set of functions can be bounded in terms of its weak gap dimension.

In addition to the papers cited above, there are a number of results on uniform convergence for dependent processes satisfying a variety of standard mixing conditions; a discussion of these results can be found in [1]. In related work, Rao [13] and Billingsley and Topsøe [6] studied and characterized classes of functions $\mathcal{F}$ such that $\sup_{f \in \mathcal{F}} \left| \int f \, dP_n - \int f \, dP \right| \to 0$ whenever $P_n$ converges weakly to $P$. As noted in [6], the elements of such uniformity classes are necessarily continuous almost everywhere with respect to $P$. Bickel and Millar [4] provided sufficient conditions for a more general notion of uniformity, and revisited several of the results in earlier papers.

Adams and Nobel [1] established Theorem 1 in the special case where the elements of $\mathcal{F}$ are indicator functions of subsets of $\mathcal{X}$. The problem simplifies in this case, as $\dim_\gamma(\mathcal{F})$ is zero for $\gamma \ge 1/2$, and equal to the VC-dimension of $\mathcal{F}$ if $0 \le \gamma < 1/2$. If $\mathcal{F}$ has finite VC-dimension, their results imply that $\Gamma(\mathcal{F} : \mathbf{X}) = 0$ for every ergodic process $\mathbf{X}$. For uniformly bounded families $\mathcal{F}$ they show that $\Gamma(\mathcal{F} : \mathbf{X}) = 0$ for every ergodic process $\mathbf{X}$ if $\dim_0(\mathcal{F}) < \infty$, or if $\mathcal{F}$ is a VC-graph class (cf. [12]).

1.2 Overview

The proof of Theorem 1 is based on the direct construction of $\gamma$-shattered sets of arbitrarily large cardinality. In particular, the proof does not make use of results or techniques from the study of uniform convergence in the i.i.d. case. The core of the construction, which is contained in Section 5 below, follows the arguments in [1]. In the next section we reduce Theorem 1 to an analogous result in which $\mathcal{X}$ is the unit interval. This equivalent result is stated in Theorem 2.
Section 3 contains several preliminary definitions and lemmas used in the proof of Theorem 2. The proof of Theorem 2 is presented in Sections 4-7. Section 4 gives an outline of the proof of the theorem. The proofs of two key propositions are given in Sections 5 and 6. The diagram below provides an overview of the proof.

Theorem 1 ⇐ Theorem 2 ⇐ Proposition 2 + Lemma 1 + Lemma B
Proposition 2 ⇐ Proposition 1 + Lemma 2

2 Reduction to the Unit Interval

Let $\mathcal{X}$ and $\mathcal{F}$ be as in Theorem 1, and let $\mathbf{X}$ be an $\mathcal{X}$-valued ergodic process, defined on an underlying probability space $(\Omega, \mathcal{A}, P)$, such that $\Gamma(\mathcal{F} : \mathbf{X}) > \eta > 0$. By assumption, there exists a number $0 < M < \infty$ such that $|f| \le M$ for each $f \in \mathcal{F}$. Replacing each $f \in \mathcal{F}$ with $f' = (f + M)/(2M)$, we may assume without loss of generality that each $f \in \mathcal{F}$ takes values in $[0, 1]$. This affine change rescales both the asymptotic discrepancy and the resolution $\gamma$ in the definition of the gap dimension by the same factor $1/(2M)$, so the conclusion of Theorem 1 is unaffected.

The proof of the following lemma, which relies on elementary ergodic theory, is similar to that of Lemma 5 in [1], and is omitted.

Lemma A. Let $\mathbf{X}$ be a stationary ergodic process with values in $\mathcal{X}$. If $\Gamma(\mathcal{F} : \mathbf{X}) > \eta > 0$, then $\mathcal{X}$ is necessarily uncountable, and there exists a stationary ergodic process $\tilde{\mathbf{X}}$ with values in $\mathcal{X}$ such that $P(\tilde X_i = x) = 0$ for each $x \in \mathcal{X}$ and $\Gamma(\mathcal{F} : \tilde{\mathbf{X}}) > \eta$.

Let $\mu(\cdot)$ be the marginal distribution of $\mathbf{X}$. By Lemma A, it suffices to establish Theorem 1 in the case where $\mathcal{X}$ is uncountable and $\mu(\cdot)$ is non-atomic. Let $\lambda(\cdot)$ denote ordinary Lebesgue measure on the unit interval $[0, 1]$ equipped with its Borel subsets $\mathcal{B}$. By standard results in real analysis (cf. Theorem 5.16 of [14]), there is a measure space isomorphism between $(\mathcal{X}, \mathcal{S}, \mu)$ and $([0, 1], \mathcal{B}, \lambda)$.
More precisely, there exist Borel measurable sets $\mathcal{X}_0 \subseteq \mathcal{X}$ and $I_0 \subseteq [0, 1]$, and a bijection $\psi : \mathcal{X}_0 \to I_0$ with the following properties: (i) $\mu(\mathcal{X}_0) = \lambda(I_0) = 1$; (ii) $\psi$ and $\psi^{-1}$ are measurable with respect to the restricted sigma algebras $\mathcal{S} \cap \mathcal{X}_0$ and $\mathcal{B} \cap I_0$, respectively; and (iii) $\mu(A) = \lambda(\psi(A))$ for each $A \in \mathcal{S} \cap \mathcal{X}_0$. In particular, the event $E = \{X_i \in \mathcal{X}_0^c \text{ for some } i \ge 1\}$ has probability zero. By removing $E$ from the underlying sample space, we may assume without loss of generality that $X_i(\omega) \in \mathcal{X}_0$ for each sample point $\omega$ and each $i \ge 1$.

Define $Y_i = \psi(X_i)$ for $i \ge 1$. Then the process $\mathbf{Y} = Y_1, Y_2, \ldots \in [0, 1]$ is stationary and ergodic with marginal distribution $\lambda$. For each function $f \in \mathcal{F}$ define an associated function $\tilde f : [0, 1] \to [0, 1]$ via the rule

$$\tilde f(u) = \begin{cases} (f \circ \psi^{-1})(u) & \text{if } u \in I_0 \\ 0 & \text{otherwise} \end{cases}$$

and let $\tilde{\mathcal{F}} = \{\tilde f : f \in \mathcal{F}\}$. It is easy to see that $\tilde f(Y_i) = f(X_i)$, and in particular that $\mathbb{E} \tilde f(Y) = \mathbb{E} f(X)$. Thus $\Gamma_m(\tilde{\mathcal{F}} : \mathbf{Y}) = \Gamma_m(\mathcal{F} : \mathbf{X})$ with probability one for each $m \ge 1$. Moreover, if $k$ distinct points $u_1, \ldots, u_k \in [0, 1]$ are $\gamma$-shattered by $\tilde{\mathcal{F}}$, then necessarily each $u_j \in I_0$, and the (distinct) points $\psi^{-1}(u_1), \ldots, \psi^{-1}(u_k) \in \mathcal{X}$ are $\gamma$-shattered by $\mathcal{F}$. It follows that $\dim_\gamma(\tilde{\mathcal{F}}) \le \dim_\gamma(\mathcal{F})$. Theorem 1 is therefore a corollary of the following result.

Theorem 2. Let $\mathcal{F}$ be a countable family of Borel measurable functions $f : [0, 1] \to [0, 1]$, and let $\mathbf{X} = X_1, X_2, \ldots \in [0, 1]$ be a stationary ergodic process with $X_i \sim \lambda$. If the asymptotic discrepancy $\Gamma(\mathcal{F} : \mathbf{X}) > \eta > 0$, then $\dim_\gamma(\mathcal{F}) = \infty$ for every $\gamma \le \eta/10$.

3 Preliminaries

In this section we define three elementary notions that will be used in the proof of Theorem 2. The first is the notion of segments of a function $f : [0, 1] \to [0, 1]$. The second is the join of a sequence of families of disjoint sets.
The third is an ancestral set in a binary tree. Lemma 1 establishes a simple connection between joins, segments and the gap dimension. Lemma 2 provides a useful bound for obtaining a subtree with good ancestral properties from a large initial binary tree.

3.1 Segments and Regular Families

Let $\mathcal{F}$ and $\mathbf{X}$ be as in the statement of Theorem 2, and suppose that $\Gamma(\mathcal{F} : \mathbf{X}) > \eta > 0$. Assume without loss of generality that $\eta$ is rational, and let $\gamma = \eta/5$. Let $K = \lfloor \gamma^{-1} \rfloor + 1$ if $\gamma^{-1}$ is not an integer, and $K = \gamma^{-1}$ otherwise. For each $f \in \mathcal{F}$ and $1 \le k \le K$ define sets

$$s_k(f) = \begin{cases} f^{-1}[(k-1)\gamma, k\gamma) & \text{if } 1 \le k \le K - 1 \\ f^{-1}[(K-1)\gamma, 1] & \text{if } k = K. \end{cases} \tag{4}$$

Definition: The sets $s_k(f)$ will be called $\gamma$-segments of $f$. Let $\pi(f) = \{s_k(f) : 1 \le k \le K\}$ be the partition of $[0, 1]$ generated by the $\gamma$-segments of $f$. Two segments $s_k(f)$ and $s_{k'}(f)$ will be called adjacent if they correspond to adjacent intervals, equivalently if $|k - k'| = 1$, and non-adjacent if $|k - k'| \ge 2$.

In order to establish Theorem 2, we first consider families $\mathcal{F}$ whose elements satisfy a topological regularity condition. Given a family $\mathcal{F}$ of functions $f : [0, 1] \to [0, 1]$, define the associated collection of sets

$$\mathcal{C}(\mathcal{F}) = \{ f^{-1}[a, b) : 0 \le a < b < 2 \text{ rational, and } f \in \mathcal{F} \}. \tag{5}$$

Including values $b > 1$ ensures that $\mathcal{C}(\mathcal{F})$ contains sets of the form $f^{-1}[a, 1]$. Note that $\mathcal{C}(\mathcal{F})$ is countable if $\mathcal{F}$ is countable.

Definition: A family $\mathcal{F}$ of measurable functions $f : [0, 1] \to [0, 1]$ is regular if it is countable, and each element of $\mathcal{C}(\mathcal{F})$ is a finite union of intervals.

3.2 Joins and the Gap Dimension

In ergodic theory, the join of a finite collection of finite partitions consists of the atoms of the field they generate. Here we employ a minor generalization of this notion.

Definition: Let $\mathcal{D}_1, \ldots, \mathcal{D}_k$ be finite families of sets in $[0, 1]$ such that the elements of each family are disjoint.
The join of $\mathcal{D}_1, \ldots, \mathcal{D}_k$, denoted $\bigvee_{i=1}^k \mathcal{D}_i$ or $\mathcal{D}_1 \vee \cdots \vee \mathcal{D}_k$, is the collection of all non-empty intersections $D_1 \cap \cdots \cap D_k$ where $D_i \in \mathcal{D}_i$ for $i = 1, \ldots, k$.

The next lemma establishes a useful connection between the gap dimension of $\mathcal{F}$ and the join of non-adjacent segments of functions $f \in \mathcal{F}$. Its proof is based on similar results in [10] and [1].

Lemma 1. Suppose that for some $L \ge 1$ there exist a sub-family $\mathcal{F}_0 \subseteq \mathcal{F}$ of $2^L$ functions and a pair $k, k' \in [K]$ of non-adjacent integers such that the join

$$\mathcal{J} = \bigvee_{f \in \mathcal{F}_0} \{ s_k(f), s_{k'}(f) \}$$

of non-adjacent $\gamma$-segments has cardinality $2^{2^L}$. Then $\dim_{\gamma/2}(\mathcal{F}) \ge L$.

Remark: The conditions of the lemma ensure that each of the possible intersections contained in $\mathcal{J}$ is non-empty, and therefore $\mathcal{J}$ has maximum cardinality.

Proof: Indexing the elements of $\mathcal{F}_0$ in an arbitrary manner by subsets of $[L] := \{1, \ldots, L\}$, we may write $\mathcal{F}_0 = \{f_\alpha : \alpha \subseteq [L]\}$. For $i = 1, \ldots, L$, let $x_i$ be any element of the intersection

$$\bigcap_{\alpha \subseteq [L],\, i \in \alpha} s_k(f_\alpha) \ \cap \bigcap_{\alpha \subseteq [L],\, i \notin \alpha} s_{k'}(f_\alpha),$$

which is non-empty by assumption. The points $x_i$ are distinct: for $i \ne j$, the set $\alpha = \{i\}$ places $x_i$ and $x_j$ in the disjoint segments $s_k(f_\alpha)$ and $s_{k'}(f_\alpha)$. Suppose without loss of generality that $k < k'$, and let $c = \gamma(k + k' - 1)/2$. Let $\beta$ be any subset of $[L]$ and consider the corresponding function $f_\beta \in \mathcal{F}_0$. If $i \in \beta$, the selection of $x_i$ ensures that $x_i \in s_k(f_\beta)$, and consequently $f_\beta(x_i) < \gamma k \le c - \gamma/2$. On the other hand, if $i \in \beta^c$, then $x_i \in s_{k'}(f_\beta)$, and in this case $f_\beta(x_i) \ge \gamma(k' - 1) \ge c + \gamma/2$. Both bounds use $k' \ge k + 2$. Since $\mathcal{F}_0$ and $\{x_1, \ldots, x_L\}$ are finite, the first bound holds with a positive margin, so replacing $c$ by $c - \epsilon$ for a sufficiently small $\epsilon > 0$ makes both inequalities strict, as the definition of $\gamma/2$-shattering requires. As $\beta$ was arbitrary, the set $\{x_1, \ldots, x_L\}$ is $\gamma/2$-shattered by $\mathcal{F}$, and it follows that $\dim_{\gamma/2}(\mathcal{F}) \ge L$.

3.3 Binary Trees and Ancestral Sets

Binary trees appear in several key results of the paper. Throughout we consider standard binary trees $T$ that have a single root, which is assumed to be located at the top of the tree. Vertices of $T$ are referred to as nodes, and usually denoted by $s$ or $t$.
Each node of $T$ has either zero or two distinct children and, with the exception of the root, a single parent. A node with two children is said to be internal; a node with no children is called a leaf. The set of leaves in a tree $T$ will be denoted by $\tilde T$. A descending path in $T$ is a sequence of adjacent nodes that proceeds only from parent to child. The depth, or level, of a node $t \in T$ is the length of the shortest (necessarily descending) path from the root to $t$. The set of nodes at level $r$ of $T$ will be denoted $T[r]$. The depth of $T$ is the maximum depth of any node in $T$. We will exclusively consider trees of finite depth, say $L$, that are complete in the sense that $T[r]$ contains $2^r$ nodes for $r = 0, \ldots, L$. In this case, $\tilde T = T[L]$ and each node $t \in T[r]$ with $0 \le r \le L - 1$ is internal.

Definition: Let $T$ be a binary tree. A node $s$ in $T$ is an ancestor of a node $t$ if there is a descending path in $T$ from $s$ to $t$ of length greater than or equal to one. A node $s$ will be called an ancestor of a set $A \subseteq T$ if $s$ is an ancestor of some $t \in A$.

The next lemma establishes a pigeon-hole type result showing that any large collection of leaves must have a correspondingly large set of ancestors in some nearby level of the tree.

Lemma 2. Let $T$ be a complete binary tree of depth $L$, and let $\tilde T$ denote the $2^L$ leaves of $T$. Suppose that there exist a set of leaves $S \subseteq \tilde T$ and a constant $0 < c < 1$ such that $|S| \ge c\,2^L \ge 4$. Let $u = \lceil \log_2 c^{-1} + 1 \rceil$. Then there exists a set $S' \subseteq T[l_0]$ with $L - u \le l_0 \le L - 1$ such that for each node $s \in S'$ both of its children are ancestors of $S$ (when $l_0 = L - 1$, both children belong to $S$), and

$$|S'| \ge \frac{c\,2^L}{4L}. \tag{6}$$

Proof: For $l = 1, \ldots, L - 1$, let $m_l$ be the number of nodes $s$ at level $l$ that are ancestors of some node $t \in S$, and let $n_l$ be the number of nodes at level $l$ with the property that both their children are ancestors of a node $t \in S$ (for $l = L - 1$: both children belong to $S$).
It is easy to see that $|S| = m_{L-1} + n_{L-1}$, and more generally we have

$$|S| = m_{L-v} + n_{L-v} + n_{L-v+1} + \cdots + n_{L-1} \le 2^{L-v} + \sum_{l=L-v}^{L-1} n_l$$

for $v = 1, \ldots, L - 1$. Note that $2^u < 4/c \le 2^L$, so $u \le L - 1$. Setting $v = u$, the assumption that $|S| \ge c\,2^L$ yields

$$\sum_{l=L-u}^{L-1} n_l \ge c\,2^L - 2^{L-u} = 2^{L-u}(c\,2^u - 1) \ge 2^{L-u},$$

where the last inequality follows from the definition of $u$. Let $n_{l_0}$ be the largest value of $n_l$ appearing in the sum above, and let $S'$ be the nodes at level $l_0$ of $T$ with the property that both their children are ancestors of $S$. Then

$$|S'| = n_{l_0} \ge \frac{2^{L-u}}{u} \ge \frac{c\,2^L}{4u} \ge \frac{c\,2^L}{4L},$$

where the second inequality follows from the definition of $u$ and the last from $u \le L$.

4 Outline of the Proof of Theorem 2

In this section we present an outline of the proof of Theorem 2. We begin with Proposition 1, which is the key result of the paper. The proposition shows that if $\mathcal{F}$ is regular and $\Gamma(\mathcal{F} : \mathbf{X}) > 0$, then one can associate the nodes of an arbitrarily large binary tree with segments of selected functions in $\mathcal{F}$ in such a way that (i) the intersection of segments along every path from the root to a leaf is non-empty, and (ii) sibling segments are non-adjacent. The resulting structure will be called an intersection tree. Proposition 2 refines Proposition 1 using the pigeon-hole principle from Lemma 2. It ensures that for every finite $L \ge 1$ there is a family of $L$ functions in $\mathcal{F}$ having non-adjacent segments with maximal join. The final step in the proof of Theorem 2 is to remove the regularity condition on $\mathcal{F}$. This is done by means of a measure space isomorphism described in Lemma B. The proof of Theorem 2 appears in Section 7.

4.1 Intersection Trees

Proposition 1. Let $\mathcal{F}$ and $\mathbf{X}$ be as in Theorem 2. Suppose that $\Gamma(\mathcal{F} : \mathbf{X}) > \eta > 0$ and that $\mathcal{F}$ is regular. Then for each $L \ge 1$ there exist functions $g_1, \ldots, g_L \in \mathcal{F}$ and a complete binary tree $T$ of depth $L$ such that each node $t \in T$ is associated with a subset $B_t$ of $[0, 1]$ in such a way that the following two conditions are satisfied.

(a) For each internal node $t \in T$ at level $\ell$, the sets $B_{t'}$ and $B_{t''}$ associated with its children $t'$ and $t''$ are equal to non-adjacent segments of $g_{\ell+1}$.

(b) For each node $t \in T$, the intersection $W_t$ of the sets $B_s$ appearing along a descending path from the root to $t$ has non-empty interior.

The proof of Proposition 1 is given in Section 5.

4.2 Maximal Joins

Proposition 2. Let $\mathcal{F}$ and $\mathbf{X}$ be as in Theorem 2. Suppose that $\Gamma(\mathcal{F} : \mathbf{X}) > \eta > 0$ and that $\mathcal{F}$ is regular. Let $\gamma = \eta/5$. For each $L \ge 1$ there are functions $f_1, \ldots, f_L \in \mathcal{F}$ and a pair $k, k' \in [K]$ of non-adjacent integers such that the join

$$\mathcal{J} = \{ s_k(f_1), s_{k'}(f_1) \} \vee \cdots \vee \{ s_k(f_L), s_{k'}(f_L) \}$$

of non-adjacent $\gamma$-segments has (maximum) cardinality $2^L$, and every element of $\mathcal{J}$ has positive Lebesgue measure.

The proof of Proposition 2 appears in Section 6 below.

4.3 Removing Regularity

Together, Lemma 1 and Proposition 2 establish Theorem 2 in the special case of regular families: apply Proposition 2 with $2^L$ in place of $L$, and then Lemma 1 at resolution $\gamma/2 = \eta/10$. In order to remove the assumption of regularity, we require the following result, whose proof can be found in [1].

Lemma B. Let $\mathcal{C} = \{C_1, C_2, \ldots\}$ be a countable collection of Borel subsets of $[0, 1]$ such that the maximum diameter of the elements of the join $\mathcal{J}_n = \bigvee_{i=1}^n \{C_i, C_i^c\}$ tends to zero as $n \to \infty$.
Then there exist a Borel-measurable map $\phi : [0, 1] \to [0, 1]$ and a Borel set $V_1 \subseteq [0, 1]$ of measure one such that: (i) $\phi$ preserves Lebesgue measure and is 1:1 on $V_1$; (ii) the image $V_2 = \phi(V_1)$ and the inverse map $\phi^{-1} : V_2 \to V_1$ are Borel measurable; (iii) $\phi^{-1}$ preserves Lebesgue measure; and (iv) for every set $C \in \mathcal{C}$ there is a set $U(C)$, equal to a finite union of intervals, such that $\lambda(\phi(C) \triangle U(C)) = 0$, where $\triangle$ is the usual symmetric difference.

Remark: Lemma B is applied to the family of sets $\mathcal{C} = \mathcal{C}(\mathcal{F})$. The existence of the isomorphism $\phi$ requires that $\mathcal{C}$ be countable, and this leads to the requirement that $\mathcal{F}$ be countable as well.

The proof of Theorem 2 is given in Section 7 below.

5 Proof of Proposition 1

Construction of the intersection tree in Proposition 1 is based on a multi-stage procedure that is detailed below. At the first stage, we produce a refining sequence $\mathcal{J}_1, \mathcal{J}_2, \ldots$ of joins in $[0, 1]$ and simultaneously identify a sequence of functions $f_1, f_2, \ldots \in \mathcal{F}$. The join $\mathcal{J}_n$ is generated from selected non-adjacent segments of $f_1, \ldots, f_n$. The function $f_{n+1}$ chosen at step $n + 1$ is an element of $\mathcal{F}$ whose average differs from its expectation by more than $\eta$ on a sample sufficiently large to ensure that the relative frequency of every element $A \in \mathcal{J}_n$ is close to its probability. From $\mathcal{J}_n$ and $f_{n+1}$ we identify a set $G_n$ equal to the union of the cells in $\mathcal{J}_n$ on which the average of $f_{n+1}$ is far from its expectation. The sets $G_n$ are used, in turn, to produce a limiting "splitting" set $R_1$ via a weak convergence argument. This sequential process is repeated in subsequent stages, with the important feature that the splitting sets $R_1, \ldots, R_{s-1}$ identified at stages $1, \ldots, s - 1$ are used to generate the joins and the splitting set at stage $s$.
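The joins $\mathcal{J}_n$ described above are built from dyadic intervals and the partitions $\pi(f)$ into $\gamma$-segments from Section 3.1. The following is a minimal sketch of segments, joins (Section 3.2) and the adjacency test. It assumes, for illustration only, that each set is replaced by its trace on a finite grid of rational points, whereas the paper works with Borel sets. The helper names are hypothetical. Exact arithmetic with `fractions.Fraction` keeps the segment boundaries in (4) exact.

```python
import math
from fractions import Fraction
from itertools import product
from typing import Callable, FrozenSet, Iterable, List, Set

Point = Fraction


def num_segments(gamma: Fraction) -> int:
    """K from Section 3.1: floor(1/gamma) + 1, or 1/gamma when that is an integer."""
    inv = 1 / gamma
    return int(inv) if inv.denominator == 1 else math.floor(inv) + 1


def segment_index(v: Fraction, gamma: Fraction) -> int:
    """The k with v in s_k, following (4): [(k-1)gamma, k*gamma) for k < K,
    and the closed top segment [(K-1)gamma, 1] for k = K."""
    return min(math.floor(v / gamma) + 1, num_segments(gamma))


def segment_partition(f: Callable[[Point], Fraction], grid: Iterable[Point],
                      gamma: Fraction) -> List[FrozenSet[Point]]:
    """Trace of pi(f) on a finite grid; empty segments are omitted."""
    cells = {}
    for x in grid:
        cells.setdefault(segment_index(f(x), gamma), set()).add(x)
    return [frozenset(c) for c in cells.values()]


def join(*families: List[FrozenSet[Point]]) -> Set[FrozenSet[Point]]:
    """All non-empty intersections D_1 & ... & D_k, one set from each family."""
    cells = set()
    for choice in product(*families):
        cell = frozenset.intersection(*choice)
        if cell:
            cells.add(cell)
    return cells


def non_adjacent(k: int, k2: int) -> bool:
    return abs(k - k2) >= 2


gamma = Fraction(1, 4)
grid = [Fraction(i, 100) for i in range(101)]


def f(x: Point) -> Fraction:
    return x


def g(x: Point) -> Fraction:
    return 1 - x


J = join(segment_partition(f, grid, gamma), segment_partition(g, grid, gamma))
print(num_segments(gamma), len(J))  # 4 7
```

In this example the join has seven cells. The segments of $f(x) = x$ and $g(x) = 1 - x$ are half-open on opposite sides, so the grid points $1/4$, $1/2$ and $3/4$ each form a singleton cell between four longer stretches.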
The proof of Proposition 1 follows the proof of Proposition 3 in [1]. The earlier proposition treats the special case in which the elements of $\mathcal{F}$ are indicator functions of sets, and hence binary valued. The definition and construction of the splitting sets $R_s$ follow the arguments in the binary case, the principal difference being that the generalized joins defined here involve segments rather than sets. The proof of Lemma 4 below and the three displays preceding it are identical to arguments in [1]. Differences in the proofs emerge from the focus here on non-adjacent segments. In particular, the use of intersection trees or a similar hierarchical structure appears to be required, and the arguments that follow Lemma 4 are somewhat more involved than in the binary case.

The proof of Proposition 1 requires that one carefully keep track of the quantities appearing at each step and stage of the construction, and of how these quantities are defined. For this reason, and due to the differences discussed above, it is not possible to substantially shorten the proof of Proposition 1 by an appeal to the earlier results. We provide a detailed argument below for completeness.

5.1 Initial Construction

Let $\mathcal{F}$ be a countable family of Borel measurable functions $f : [0, 1] \to [0, 1]$, and let $\mathbf{X} = X_1, X_2, \ldots \in [0, 1]$ be a stationary ergodic process defined on an underlying probability space $(\Omega, \mathcal{A}, P)$ such that $X_i \sim \lambda$. Assume that $\Gamma(\mathcal{F} : \mathbf{X}) > \eta > 0$, and that every element of $\mathcal{C}(\mathcal{F})$ is a finite union of intervals. Let $\delta = \eta/12$, and note that $0 < \delta < 1$. For each $n \ge 1$ let

$$\mathcal{D}_n = \{ [k 2^{-n}, (k+1) 2^{-n}) : 0 \le k \le 2^n - 2 \} \cup \{ [1 - 2^{-n}, 1] \}$$

be the $n$th order dyadic subintervals of $[0, 1]$, and let $\mathcal{D} = \bigcup_{n \ge 1} \mathcal{D}_n$.
The set $A_0$ consisting of the endpoints of the intervals from which the elements of $\mathcal{C}(\mathcal{F})$ and $\mathcal{D}$ are constructed is countable, and therefore has Lebesgue measure zero. Removing a $P$-null set of outcomes from $\Omega$, we may assume that $X_i(\omega) \in A_0^c$ for each $\omega \in \Omega$ and every $i \ge 1$. (This assumption is used in the last part of the proof.) Below we identify a sequence of splitting sets $R_1, R_2, \ldots \subseteq [0, 1]$ in stages, and then use these sets to construct the intersection tree.

Stage 1. The first stage of the construction proceeds as follows. Let $f_1$ be any function in $\mathcal{F}$, and suppose that functions $f_1, \ldots, f_n \in \mathcal{F}$ have already been selected. Let

$$\mathcal{J}_n = \mathcal{D}_n \vee \pi(f_1) \vee \cdots \vee \pi(f_n)$$

be the join of the dyadic intervals of order $n$ and the $\gamma$-segments of the previously selected functions. Here and in what follows we take $\gamma = \eta/5$. For each $\omega \in \Omega$, each function $g : [0, 1] \to [0, 1]$, and each $m \ge 1$, define the (pointwise) discrepancy

$$\Delta_\omega(g : m) \triangleq \left| \frac{1}{m} \sum_{i=1}^m g(X_i(\omega)) - \mathbb{E} g(X) \right|, \tag{7}$$

which measures the difference between the expectation of $g(X)$ and its average over the sample sequence $X_1(\omega), \ldots, X_m(\omega)$. From the ergodic theorem (applied to the finitely many indicators $I_A$, $A \in \mathcal{J}_n$) and (2), it follows that there exist a sample point $\omega_{n+1} \in \Omega$, an integer $m_{n+1} \ge 1$ and a function $f_{n+1} \in \mathcal{F}$ such that

$$\Delta_{\omega_{n+1}}(I_A : m_{n+1}) \le \delta \lambda(A) \quad \text{for each } A \in \mathcal{J}_n \tag{8}$$

and

$$\Delta_{\omega_{n+1}}(f_{n+1} : m_{n+1}) > \eta. \tag{9}$$

Defining the join $\mathcal{J}_{n+1} = \mathcal{D}_{n+1} \vee \pi(f_1) \vee \cdots \vee \pi(f_{n+1})$ and continuing, we may select functions $f_{n+2}, f_{n+3}, \ldots \in \mathcal{F}$ in a similar fashion.

The relations (8) and (9) together ensure that for many cells $A \in \mathcal{J}_n$ the average of $f_{n+1}$ on $A$ differs from its expectation over $A$. To make this precise, define the family

$$\mathcal{H}_n = \left\{ A \in \mathcal{J}_n : \Delta_{\omega_{n+1}}(f_{n+1} \cdot I_A : m_{n+1}) > \frac{\eta}{2} \lambda(A) \right\}.$$
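The discrepancy (7), condition (8) and the family $\mathcal{H}_n$ can be computed on a toy example. Everything in the following sketch is an assumption made for illustration:

- The sample path is the orbit of the rotation $x \mapsto x + \theta \pmod 1$ with $\theta = (\sqrt{5} - 1)/2$.
- The cells are those of $\mathcal{D}_n$ alone, with no segments.
- The function $f$ is a hypothetical spike, equal to 1 on the observed sample points in $[0, 1/2)$ and 0 elsewhere. Because it vanishes off a finite set, $\mathbb{E} f(X) = 0$, much as for the trajectory indicators in the introduction.

```python
import math
from typing import Callable, List, Sequence

THETA = (math.sqrt(5) - 1) / 2  # illustrative rotation angle (assumption)


def rotation_path(x0: float, m: int) -> List[float]:
    return [(x0 + i * THETA) % 1.0 for i in range(1, m + 1)]


def pointwise_discrepancy(values: Sequence[float], mean: float) -> float:
    """Delta_omega(g : m) from (7), given g(X_1(omega)), ..., g(X_m(omega))
    and the expectation E g(X)."""
    return abs(sum(values) / len(values) - mean)


def dyadic_cell(x: float, n: int) -> int:
    """Index of the cell of D_n containing x (the last cell is closed)."""
    return min(int(x * 2 ** n), 2 ** n - 1)


def condition_8(xs: Sequence[float], n: int, delta: float) -> bool:
    """Check (8) for the cells A of D_n, each of measure lambda(A) = 2^-n."""
    lam = 2.0 ** -n
    return all(
        pointwise_discrepancy([float(dyadic_cell(x, n) == k) for x in xs], lam)
        <= delta * lam
        for k in range(2 ** n)
    )


def h_cells(xs: Sequence[float], f: Callable[[float], float],
            cell_means: Sequence[float], n: int, eta: float) -> List[int]:
    """Indices of cells A of D_n with Delta(f * I_A : m) > (eta / 2) lambda(A);
    cell_means[k] must equal E[(f * I_A)(X)] for the k-th cell."""
    lam = 2.0 ** -n
    return [
        k for k in range(2 ** n)
        if pointwise_discrepancy(
            [f(x) if dyadic_cell(x, n) == k else 0.0 for x in xs], cell_means[k]
        ) > eta / 2 * lam
    ]


eta, n = 0.3, 3
xs = rotation_path(0.3, 40_000)
sample = set(xs)


def f(x: float) -> float:
    # Hypothetical spike: 1 on observed points in [0, 1/2), 0 elsewhere.
    return 1.0 if x in sample and x < 0.5 else 0.0


H = h_cells(xs, f, [0.0] * 2 ** n, n, eta)
print(condition_8(xs, n, eta / 12))                            # (8), delta = eta/12
print(pointwise_discrepancy([f(x) for x in xs], 0.0) > eta)    # (9)
print(H, len(H) * 2.0 ** -n)  # [0, 1, 2, 3] 0.5
```

Here $\mathcal{H}_n$ picks out exactly the cells in $[0, 1/2)$ that carry the excess mass of $f$. The union $G_n$ therefore has measure $1/2 \ge \eta/6$.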
As the next lemma shows, the sets in $\mathcal{H}_n \subseteq \mathcal{J}_n$ occupy a non-trivial fraction of the unit interval.

Lemma 3. If $G_n = \bigcup \mathcal{H}_n$ is the union of the sets $A \in \mathcal{H}_n$, then $\lambda(G_n) \ge \eta/6$.

Proof: To simplify notation, let $\omega = \omega_{n+1}$, $f = f_{n+1}$, and $m = m_{n+1}$. Decomposing $\Delta_\omega(f : m)$ over the elements of $\mathcal{J}_n$ and applying the triangle inequality, we obtain the bound

$$\eta \le \sum_{A \in \mathcal{H}_n} \Delta_\omega(f \cdot I_A : m) + \sum_{A \in \mathcal{J}_n \setminus \mathcal{H}_n} \Delta_\omega(f \cdot I_A : m).$$

By definition of $\mathcal{H}_n$, the second term is at most $\eta/2$. The first term is at most

$$\begin{aligned} \sum_{A \in \mathcal{H}_n} \Delta_\omega(f \cdot I_A : m) &\le \sum_{A \in \mathcal{H}_n} \left[ \frac{1}{m} \sum_{i=1}^m (f \cdot I_A)(X_i(\omega)) + \mathbb{E}(f \cdot I_A)(X) \right] \\ &\le \sum_{A \in \mathcal{H}_n} \left[ \frac{1}{m} \sum_{i=1}^m I_A(X_i(\omega)) + \lambda(A) \right] \\ &\le \sum_{A \in \mathcal{H}_n} \Delta_\omega(I_A : m) + 2\lambda(G_n) \\ &\le (\delta + 2)\lambda(G_n) \le 3\lambda(G_n), \end{aligned}$$

where the first two inequalities follow from the fact that $0 \le f \le 1$, and the fourth follows from (8). Combining the bounds above yields $\eta \le 3\lambda(G_n) + \eta/2$, which is the stated inequality.

For each $n \ge 1$ define a sub-probability measure $\lambda_n(B) = \lambda(B \cap G_n)$ on $([0, 1], \mathcal{B})$, where $G_n = \bigcup \mathcal{H}_n$. The collection $\{\lambda_n\}$ is tight, and is such that $\lambda_n([0, 1]) \ge \eta/6$ for each $n$. There is therefore a subsequence $n(1) < n(2) < \cdots$ such that $\lambda_{n(r)}$ converges weakly to a sub-probability measure $\nu_1$ on $([0, 1], \mathcal{B})$. It is easy to see that $\nu_1$ is absolutely continuous with respect to $\lambda$, that $\nu_1([0, 1]) \ge \eta/6$, and that the Radon-Nikodym derivative $d\nu_1/d\lambda$ is bounded above by 1. Define $R_1 = \{x : (d\nu_1/d\lambda)(x) > \delta\}$. From the previous remarks it follows that

$$\frac{\eta}{6} \le \nu_1([0, 1]) = \int_0^1 \frac{d\nu_1}{d\lambda}\, d\lambda = \int_{R_1} \frac{d\nu_1}{d\lambda}\, d\lambda + \int_{R_1^c} \frac{d\nu_1}{d\lambda}\, d\lambda \le \int_{R_1} 1\, d\lambda + \int_{R_1^c} \delta\, d\lambda \le \lambda(R_1) + \delta. \tag{10}$$

As $\delta = \eta/12$, we have $\lambda(R_1) \ge \eta/12 > 0$. This completes the first stage of the construction.

Further Stages. Subsequent stages follow the general iterative procedure used to construct $R_1$.
Let $\omega_{n,s}$, $f_{n,s}$, $\mathcal{J}_{n,s}$, $m_{n,s}$, $\mathcal{H}_{n,s}$ and $G_{n,s}$ denote the various quantities appearing at the $n$th step of stage $s$. In particular, let $f_{n,1} = f_n$ be the $n$th function produced at stage 1, and define $\mathcal{J}_{n,1}$, $m_{n,1}$, $\mathcal{H}_{n,1}$ and $G_{n,1}$ in a similar fashion. Suppose that for some $s \ge 2$ the construction of the splitting sets $R_1, \ldots, R_{s-1}$ is complete, and that we wish to construct the set $R_s$ at stage $s$. Let $f_{1,s}$ be any element of $\mathcal{F}$, and suppose that $f_{1,s}, \ldots, f_{n,s}$ have already been selected. Define the join

$$\mathcal{J}_{n,s} = \mathcal{D}_n \vee \bigvee_{i=1}^{n} \pi(f_{i,s}) \vee \bigvee_{j=1}^{s-1} \{R_j, R_j^c\}.$$

It follows from the ergodic theorem and Proposition 2 that there exist a sample point $\omega_{n+1,s} \in \Omega$, an integer $m_{n+1,s} \ge 1$, and a function $f_{n+1,s} \in \mathcal{F}$ such that

$$\Delta_{\omega_{n+1,s}}(I_A : m_{n+1,s}) \le \delta\, \lambda(A) \quad \text{for each } A \in \mathcal{J}_{n,s} \tag{11}$$

and

$$\Delta_{\omega_{n+1,s}}(f_{n+1,s} : m_{n+1,s}) > \eta. \tag{12}$$

We may then define the join $\mathcal{J}_{n+1,s}$ using $f_{n+1,s}$ and continue in the same fashion. For each $n \ge 1$ define the family

$$\mathcal{H}_{n,s} = \left\{ A \in \mathcal{J}_{n,s} : \Delta_{\omega_{n+1,s}}(f_{n+1,s} \cdot I_A : m_{n+1,s}) > \frac{\eta}{2}\, \lambda(A) \right\}$$

and $G_{n,s} = \bigcup \mathcal{H}_{n,s} \subseteq [0,1]$. Lemma 3 ensures that $\lambda(G_{n,s}) \ge \eta/6$. As in stage 1, there is a sequence of integers $n_s(1) < n_s(2) < \cdots$ such that the sub-probability measures $\lambda_{r,s}(B) = \lambda(B \cap G_{n_s(r),s})$ converge weakly as $r \to \infty$ to a sub-probability measure $\nu_s$ on $([0,1], \mathcal{B})$ that is absolutely continuous with respect to $\lambda(\cdot)$. Define $R_s = \{x : (d\nu_s/d\lambda)(x) > \delta\}$. The argument in (10) shows that $\lambda(R_s) \ge \eta/12$.

In what follows, we need to consider density points of $R_s$. To this end, for each $s \ge 1$ let

$$\tilde{R}_s = \left\{ x \in R_s : \lim_{\alpha \to 0} \frac{\lambda((x - \alpha, x + \alpha) \cap R_s)}{2\alpha} = 1 \right\}$$

be the Lebesgue points of $R_s$. By standard results on differentiation of integrals (cf. Theorem 31.3 of Billingsley (1995)), we have $\lambda(\tilde{R}_s) = \lambda(R_s) \ge \eta/12$.
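The inequality chain (10), which is reused at every stage, and the passage to density points can both be checked on a toy example. As a hypothetical stand-in for $d\nu_s/d\lambda$, the sketch below uses a step function with values in $[0,1]$ on ten equal cells of $[0,1]$; then $R = \{x : (d\nu/d\lambda)(x) > \delta\}$ is a finite union of intervals, $\lambda(R) \ge \nu([0,1]) - \delta$ as in (10), and the local ratio $\lambda((x-\alpha, x+\alpha) \cap R)/(2\alpha)$ equals 1 at interior points of $R$ and $1/2$ at an endpoint. The density values and function names are illustrative assumptions; exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction as Fr

def threshold_set(density, delta):
    """Cells [j/n, (j+1)/n) where the step density exceeds delta, merged into intervals."""
    n = len(density)
    intervals = []
    for j, value in enumerate(density):
        if value > delta:
            lo, hi = Fr(j, n), Fr(j + 1, n)
            if intervals and intervals[-1][1] == lo:
                intervals[-1] = (intervals[-1][0], hi)   # merge adjacent cells
            else:
                intervals.append((lo, hi))
    return intervals

def measure(intervals):
    """Lebesgue measure of a disjoint union of intervals."""
    return sum(hi - lo for lo, hi in intervals)

def local_density(intervals, x, alpha):
    """lambda((x - alpha, x + alpha) ∩ R) / (2 * alpha) for R a finite union of intervals."""
    overlap = sum(max(Fr(0), min(hi, x + alpha) - max(lo, x - alpha)) for lo, hi in intervals)
    return overlap / (2 * alpha)

eta = Fr(1, 2)
delta = eta / 12
# Hypothetical step density on 10 cells, bounded above by 1.
density = [Fr(0), Fr(1, 50), Fr(9, 10), Fr(1), Fr(1, 3), Fr(0), Fr(1, 30), Fr(1, 2), Fr(0), Fr(0)]
nu_total = sum(density) / len(density)      # nu([0, 1]) = integral of the density
R = threshold_set(density, delta)

assert nu_total <= measure(R) + delta        # the inequality chain in (10)
print(R, measure(R))
print(local_density(R, Fr(3, 10), Fr(1, 1000)))   # interior point of R: ratio 1
print(local_density(R, Fr(1, 5), Fr(1, 1000)))    # left endpoint of R: ratio 1/2
```

For a general measurable $R_s$ the density ratio need not be computable in closed form; the point of the toy case is only that endpoints, a null set, are the sole non-density points, consistent with $\lambda(\tilde R_s) = \lambda(R_s)$.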
5.2 Existence of the Intersection Tree

Fix an integer $L \ge 1$. As the measures of the sets $\tilde{R}_s$ are bounded away from zero, there exist positive integers $s_0 < s_1 < \cdots < s_L$ such that $\lambda\big(\bigcap_{j=0}^{L} \tilde{R}_{s_j}\big) > 0$. Define the intersections

$$Q_l = \bigcap_{j=0}^{L-l} \tilde{R}_{s_j} \quad \text{for } l = 0, 1, \ldots, L,$$

and note that $Q_l \subseteq Q_{l+1}$. In what follows, $B^o$, $\bar{B}$ and $\partial B$ denote, respectively, the interior, closure and boundary of a set $B \subseteq [0,1]$. The following result is a strengthened version of Proposition 1 that incorporates the sets $Q_l$. Its proof completes the proof of Proposition 1.

Proposition 3. Suppose that $\Gamma(\mathcal{F} : X) > \eta > 0$ and that every element of $\mathcal{C}(\mathcal{F})$ is a finite union of intervals. Then there exist functions $g_1, \ldots, g_L \in \mathcal{F}$ and a complete binary tree $T$ of depth $L$ such that each node $t \in T$ is associated with a subset $B_t$ of $[0,1]$, subject to the following conditions:

(a) For each internal node $t \in T[l]$, the sets $B_{t'}$ and $B_{t''}$ associated with its children $t'$ and $t''$ are equal to non-adjacent $\eta/5$-segments of $g_{l+1}$.

(b) For each node $t \in T$, the intersection $W_t$ of the sets $B_s$ appearing along a descending path from the root to $t$ has non-empty interior.

(c) If $t \in T[l]$ then the intersection $W_t^o \cap Q_l$ is non-empty.

Proof of Proposition 3: Let $T$ be a complete binary tree of depth $L$ with root $t_0$, and let $B_{t_0} = [0,1]$. We will assign sets $B_t$ to the nodes of $T$ on a level-by-level basis, beginning with the children of the root. We show below that there exist a function $g_1 \in \mathcal{F}$ and non-adjacent $\gamma$-segments $U, V \in \pi(g_1)$ such that $U^o \cap Q_1$ and $V^o \cap Q_1$ are non-empty. The children of $t_0$ may then be associated with $U$ and $V$, in either order.

To begin, choose a point $x_1 \in Q_0$, which is non-empty by construction, and let $\epsilon = \delta / (2(\delta + 1))$.
It follows from the definition of the sets $\tilde{R}_{s_j}$ that there exists $\alpha_1 > 0$ such that $I_1 \triangleq (x_1 - \alpha_1, x_1 + \alpha_1)$ satisfies

$$\lambda(I_1 \cap Q_0) \ge (1 - \epsilon)\, \lambda(I_1) = 2\alpha_1 (1 - \epsilon). \tag{13}$$

To simplify notation, let $\kappa = s_L$. The last display and the definition of $R_\kappa$ imply that

$$\nu_\kappa(I_1 \cap R_\kappa) = \int_{I_1 \cap R_\kappa} \frac{d\nu_\kappa}{d\lambda}\, d\lambda > \delta\, \lambda(I_1 \cap R_\kappa) \ge 2\alpha_1 (1 - \epsilon)\, \delta.$$

Let $\{n_\kappa(r) : r \ge 1\}$ be the subsequence used to define the sub-probability $\nu_\kappa$. As $I_1$ is an open set, it follows from the Portmanteau theorem that

$$\liminf_{r \to \infty} \lambda(I_1 \cap G_{n_\kappa(r),\kappa}) \ge \nu_\kappa(I_1) \ge \nu_\kappa(I_1 \cap R_\kappa) > 2\alpha_1 (1 - \epsilon)\, \delta.$$

Choose $r$ sufficiently large so that $\lambda(I_1 \cap G_{n_\kappa(r),\kappa}) > 2\alpha_1 (1 - \epsilon)\, \delta$ and $2^{-n_\kappa(r)} < \delta \alpha_1 / 4$.

We require the following subsidiary lemma. Its proof is identical to that of Lemma 4 in [1], but is included in the Appendix for completeness.

Lemma 4. There exists a set $A \in \mathcal{H}_{n_\kappa(r),\kappa}$ such that $A \subseteq I_1$ and $\lambda(A \cap Q_1) > 0$. Moreover, $A$ is contained in $Q_1$.

Let $g_1 = f_{n_\kappa(r)+1,\kappa} \in \mathcal{F}$. By assumption, each element of $\pi(g_1)$ is a finite union of intervals, and no random variable $X_i$ takes values in the finite set $\bigcup_{C \in \pi(g_1)} \partial C$. We argue that the set $A$ identified in Lemma 4 (and therefore $Q_1$) has non-empty intersection with the interiors of two non-adjacent segments of $g_1$. As $A$ has positive measure, and the boundary of each segment of $g_1$ has measure zero, it suffices to exclude the possibility that $A$ intersects no segments, only one segment, or only two adjacent segments of $g_1$. As $\lambda(A) > 0$ and the segments of $g_1$ form a partition of $[0,1]$, $A$ must intersect the interior of at least one segment of $g_1$. Suppose that $A$ intersects only one segment $U = s_k(g_1)$ of $g_1$. Let $h(x) = g_1(x) - (k-1)\gamma$, and note that $0 \le h(x) \le \gamma$ for each $x \in U$.
In this case,

$$E(g_1 I_A)(X) = \sum_{C \in \pi(g_1)} E(g_1 I_A I_C)(X) = E(g_1 I_A I_U)(X) = \gamma(k-1)\, \lambda(A) + E(h I_A)(X). \tag{14}$$

Similarly, for each $m \ge 1$,

$$\begin{aligned}
\frac{1}{m} \sum_{i=1}^{m} (g_1 I_A)(X_i) &= \frac{1}{m} \sum_{i=1}^{m} \sum_{C \in \pi(g_1)} (g_1 I_A I_C)(X_i) = \frac{1}{m} \sum_{i=1}^{m} (g_1 I_A I_U)(X_i) \\
&= \gamma(k-1)\, \frac{1}{m} \sum_{i=1}^{m} I_A(X_i) + \frac{1}{m} \sum_{i=1}^{m} (h I_A)(X_i).
\end{aligned} \tag{15}$$

Letting $m = m_{n_\kappa(r)+1,\kappa}$ and $\omega = \omega_{n_\kappa(r)+1,\kappa}$, we find that

$$\begin{aligned}
\frac{\eta}{2}\, \lambda(A) < \Delta_\omega(g_1 \cdot I_A : m)
&\le \gamma(k-1)\, \Delta_\omega(I_A : m) + \max\left\{ \frac{1}{m} \sum_{i=1}^{m} (h I_A)(X_i),\ E(h I_A)(X) \right\} \\
&\le \gamma(k-1)\, \Delta_\omega(I_A : m) + \gamma \max\left\{ \frac{1}{m} \sum_{i=1}^{m} I_A(X_i),\ \lambda(A) \right\} \\
&\le \gamma(k-1)\, \Delta_\omega(I_A : m) + \gamma \big( \lambda(A) + \Delta_\omega(I_A : m) \big) \\
&\le \Delta_\omega(I_A : m) + \gamma\, \lambda(A) \le (\delta + \gamma)\, \lambda(A).
\end{aligned}$$

Here the first inequality follows from the definition of $\mathcal{H}_{n_\kappa(r),\kappa}$, the second follows from (14) and (15), the third follows from the bound on $h(\cdot)$, and the last follows from (11) and the definition of $m$. Comparing the first and last terms above, our definition of $\delta = \eta/12$ and $\gamma = \eta/5$ yields a contradiction.

Suppose finally that $A$ intersects only two adjacent segments of $g_1$, say $U = s_k(g_1)$ and $V = s_{k+1}(g_1)$. Let $h(x)$ be defined as above, and note that $0 \le h(x) \le 2\gamma$ for $x \in U \cup V$. Arguing as above, we find that $E(g_1 \cdot I_A)(X) = \gamma(k-1)\, \lambda(A) + E(h I_A)(X)$, and that for each $m \ge 1$,

$$\frac{1}{m} \sum_{i=1}^{m} (g_1 I_A)(X_i) = \gamma(k-1)\, \frac{1}{m} \sum_{i=1}^{m} I_A(X_i) + \frac{1}{m} \sum_{i=1}^{m} (h I_A)(X_i).$$

Letting $m = m_{n_\kappa(r)+1,\kappa}$, the previous two displays, and arguments like those above, can be used to show that

$$\begin{aligned}
\frac{\eta}{2}\, \lambda(A) < \Delta_\omega(g_1 \cdot I_A : m)
&\le \gamma(k-1)\, \Delta_\omega(I_A : m) + 2\gamma \big( \lambda(A) + \Delta_\omega(I_A : m) \big) \\
&\le (1 + \gamma)\, \Delta_\omega(I_A : m) + 2\gamma\, \lambda(A) \le \big( (1 + \gamma)\delta + 2\gamma \big)\, \lambda(A).
\end{aligned}$$
Comparing the first and last terms, the definition of $\delta = \eta/12$ and $\gamma = \eta/5$ yields a contradiction, and we conclude that $A$ intersects the interiors of two non-adjacent segments $U$ and $V$ of $g_1$. This completes the assignment of sets to the children of the root $t_0$.

Suppose now that for some $l \le L - 1$ we have assigned sets $B_t \subseteq [0,1]$ to each node $t$ of $T$ having depth less than or equal to $l$, in such a way that properties (a)-(c) of the Proposition hold. There are $2^l$ nodes of $T$ at distance $l$ from the root. Denote these nodes by $1 \le j \le 2^l$, and let $W_j$ be the intersection of the sets $B_s$ appearing on the descending path from the root $t_0$ of $T$ to node $j$ at level $l$. By assumption, $W_j^o \cap Q_l$ is non-empty: let $x_j \in W_j^o \cap Q_l$ for each $j \in [2^l]$. Select $\alpha_{l+1} > 0$ such that, for each $j$, the interval $I_j \triangleq (x_j - \alpha_{l+1}, x_j + \alpha_{l+1})$ is contained in $W_j^o$ and satisfies

$$\lambda(I_j \cap Q_l) \ge (1 - \epsilon)\, \lambda(I_j) = 2\alpha_{l+1} (1 - \epsilon).$$

Let $\kappa' = s_{L-l}$ and let $\{n_{\kappa'}(r) : r \ge 1\}$ be the subsequence used to define the sub-probability $\nu_{\kappa'}$. For each interval $I_j$,

$$\liminf_{r \to \infty} \lambda(I_j \cap G_{n_{\kappa'}(r),\kappa'}) \ge \nu_{\kappa'}(I_j) \ge \nu_{\kappa'}(I_j \cap R_{\kappa'}) > 2\alpha_{l+1} (1 - \epsilon)\, \delta,$$

where the last inequality follows from the previous display, and the fact that $Q_l \subseteq R_{\kappa'}$. Choose $r$ sufficiently large so that $\lambda(I_j \cap G_{n_{\kappa'}(r),\kappa'}) > 2\alpha_{l+1} (1 - \epsilon)\, \delta$ for each $j = 1, \ldots, 2^l$, and $2^{-n_{\kappa'}(r)} < \delta \alpha_{l+1} / 4$. Applying the proof of Lemma 4 to each interval $I_j$, we may identify sets $A_1, A_2, \ldots, A_{2^l} \in \mathcal{H}_{n_{\kappa'}(r),\kappa'}$ such that $\lambda(A_j) > 0$, $A_j \subseteq I_j \subseteq W_j^o$, and $A_j \subseteq Q_{l+1}$ for each $j = 1, \ldots, 2^l$. Define $g_{l+1} = f_{n_{\kappa'}(r)+1,\kappa'} \in \mathcal{F}$. Arguments identical to those in the case $l = 0$ above show that, for each $j$, there exist non-adjacent segments $U_j, V_j$ of $g_{l+1}$ such that $A_j \cap U_j^o$ and $A_j \cap V_j^o$ are non-empty.
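The case analysis for $l = 0$, which the inductive step reuses, rests on elementary facts about the constants $\delta = \eta/12$, $\gamma = \eta/5$ and $\epsilon = \delta/(2(\delta+1))$. Since $\lambda(A) > 0$, a contradiction arises exactly when the final bound is at most $\eta/2$: this holds strictly for one segment ($\delta + \gamma = 17\eta/60$), and for two adjacent segments $(1+\gamma)\delta + 2\gamma = (29\eta + \eta^2)/60 \le \eta/2$ provided $\eta \le 1$, a condition we make explicit here as an assumption. The identity $(1-\epsilon)\delta - \epsilon = \delta/2$ is the step in the proof of Lemma 4 (Appendix A.1) that yields $\lambda(I_1 \cap Q_1 \cap G) \ge \delta\alpha_1$. The sketch below checks all three with exact rational arithmetic on a grid of $\eta$ values; the function names are hypothetical.

```python
from fractions import Fraction as Fr

def constants(eta):
    """delta = eta/12, gamma = eta/5 and eps = delta / (2 (delta + 1)), as in the text."""
    delta = eta / 12
    gamma = eta / 5
    eps = delta / (2 * (delta + 1))
    return delta, gamma, eps

def check(eta):
    """Return the three facts used in the proof as booleans."""
    delta, gamma, eps = constants(eta)
    one_segment = delta + gamma                       # bound when A meets one segment
    two_adjacent = (1 + gamma) * delta + 2 * gamma    # bound for two adjacent segments
    return (one_segment < eta / 2,
            two_adjacent <= eta / 2,                  # <= suffices: the first inequality is strict
            (1 - eps) * delta - eps == delta / 2)     # step used in the proof of Lemma 4

for k in range(1, 101):
    eta = Fr(k, 100)          # eta ranges over (0, 1] in steps of 1/100
    assert all(check(eta)), eta
print("all constant checks pass")
```

For $\eta > 1$ the two-adjacent-segment bound would exceed $\eta/2$, which is why the restriction $\eta \le 1$ is recorded above.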
Assigning the sets $U_j$ and $V_j$ to the left and right children of $j$ in $T$, in either order, ensures that property (a) of the proposition is satisfied. For the child $t$ of node $j$ associated with the set $U_j$ we have $W_t = W_j \cap U_j$. It follows from the facts that $A_j \subseteq W_j^o$, $A_j \cap U_j^o \neq \emptyset$ and $A_j \subseteq Q_{l+1}$ that $W_t^o \cap Q_{l+1} \neq \emptyset$, and therefore properties (b) and (c) of the proposition are satisfied. The argument for the other child of node $j$ is similar. This completes the proof of Proposition 3.

6 Proof of Proposition 2

Proof of Proposition 2: Fix $L \ge 1$ such that $2^{L-1}/K^2 \ge 4$, and let $T$ be the complete binary tree of depth $L$ described in Proposition 1. Suppose that each interior node $t \in T$ is labeled with the indices of the segments assigned to its children: if the segments $s_k(g_r)$ and $s_{k'}(g_r)$ of $g_r$ are assigned to the children of a node $t \in T[r-1]$, then $t$ is assigned the label $\ell(t) = (k, k') \in [K]^2$, where $[K] = \{1, \ldots, K\}$.

Let $L_0 = L - 1$. By an elementary pigeon-hole argument, there exist non-adjacent integers $k_0, k_0' \in [K]$ such that the set $S_0$ of nodes $t \in T[L_0]$ with $\ell(t) = (k_0, k_0')$ has cardinality at least $2^{L_0}/K^2$. (Here $K^2$ is an upper bound on the number of non-adjacent pairs $k, k' \in [K]$.) Let $u_0 = \lceil \log_2 K^2 + 1 \rceil$. It follows from Lemma 2 and an additional pigeon-hole argument that there exist an integer $L_1$, a pair $k_1, k_1' \in [K]$ of non-adjacent integers, and a set of nodes $S_1 \subseteq T[L_1]$ with the following properties:

(i) $L_0 - u_0 \le L_1 \le L_0 - 1$;
(ii) $\ell(t) = (k_1, k_1')$ for every $t \in S_1$;
(iii) for every $t \in S_1$, each child of $t$ is an ancestor of $S_0$; and
(iv) $|S_1| \ge 2^{L_0} / (4LK^4)$.

In particular, inequalities (i) and (iv) imply that

$$|S_1| \ge 2^{L_1}\, \frac{2^{L_0 - L_1}}{4LK^4} \ge 2^{L_1}\, \frac{1}{2LK^4} \ge \frac{2^{L_0}}{8LK^6}. \tag{16}$$

If the last term above is greater than or equal to 4, then we may apply Lemma 2 again to find an integer $L_2$ and a set of nodes $S_2 \subseteq T[L_2]$ with properties analogous to (i)-(iv) above. Continuing in this fashion, we obtain integers $L_0 > L_1 > \cdots > L_R \ge 0$, sets of nodes $S_r \subseteq T[L_r]$, and non-adjacent pairs $k_r, k_r' \in [K]$ such that for $1 \le r \le R$ and for every node $t \in S_r$, $\ell(t) = (k_r, k_r')$ and both children of $t$ are ancestors of $S_{r-1}$. In particular, using arguments like those in (16), one may show that

$$|S_r| \ge 2^{L_r}\, \frac{1}{(2LK^2)^r K^2} \ge \frac{2^{L-1}}{4^r \cdot K^{2r+1} \cdot (2LK^2)^{r(r+1)/2}},$$

and therefore $R = R(L)$ can be taken to be the largest integer $r \ge 1$ for which the last term above is greater than 4. In particular, $R(L)$ tends to infinity with $L$.

From the construction above, and an additional pigeon-hole argument, we may identify an integer $N = N(L) \ge R(L)/K^2$ and a subsequence $i_0 < i_1 < \cdots < i_N$ of $L_R, L_{R-1}, \ldots, L_0$ such that $(k_{i_j}, k'_{i_j}) = (k, k')$ for a fixed non-adjacent pair $(k, k') \in [K]^2$. From the associated node-sets $S_{i_0}, \ldots, S_{i_N}$ one may construct an embedded binary subtree $T^o$ of $T$ all of whose node labels are equal to $(k, k')$. To see this, let the root of $T^o$ be any node $s \in S_{i_0}$. At each level $0 \le r \le N - 1$ let the left and right children of $t \in T^o[r]$ be (necessarily distinct) descendants in $S_{i_{r+1}}$ of the children of $t \in T$. Then it is easy to see that $T^o$ is a complete binary tree of depth $N$.

For $r = 0, \ldots, N - 1$ let $h_r = g_{i_r + 1}$. By construction, each node $t \in T^o[r]$ is contained in $S_{i_r}$ and has label $\ell(t) = (k, k')$. Thus the children $t'$ and $t''$ of $t$ in $T^o$ are associated with the segments $s_k(h_r)$ and $s_{k'}(h_r)$ of $h_r$.
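Two quantitative ingredients of this section can be illustrated numerically. The first is the initial pigeon-hole step: labels $\ell(t) = (k, k')$ range over non-adjacent pairs in $[K]^2$, of which there are at most $K^2$, so some label is carried by at least $2^{L_0}/K^2$ of the $2^{L_0}$ nodes of $T[L_0]$. The second is the growth of $R(L)$, the largest $r \ge 1$ for which the lower bound on $|S_r|$ displayed above exceeds 4. The sketch assumes that "non-adjacent" means $|k - k'| \ge 2$, assigns labels at random purely for illustration, and works with $\log_2$ of the bound to avoid overflow; all function names are hypothetical.

```python
import math
import random
from collections import Counter

def nonadjacent_pairs(K):
    """Ordered pairs (k, k') in [K]^2 with |k - k'| >= 2 (our reading of 'non-adjacent')."""
    return [(k, kp) for k in range(1, K + 1) for kp in range(1, K + 1) if abs(k - kp) >= 2]

def most_common_label(labels):
    """Pigeon-hole step: the most frequent label and the number of nodes carrying it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count

def log2_bound(L, K, r):
    """log2 of 2^(L-1) / (4^r * K^(2r+1) * (2 L K^2)^(r(r+1)/2))."""
    return ((L - 1) - 2 * r - (2 * r + 1) * math.log2(K)
            - (r * (r + 1) / 2) * math.log2(2 * L * K * K))

def R_of_L(L, K):
    """Largest r >= 1 with bound > 4 (i.e. log2 bound > 2); 0 if there is none."""
    r = 0
    while log2_bound(L, K, r + 1) > 2:   # the bound is decreasing in r
        r += 1
    return r

K, L0 = 5, 12
pairs = nonadjacent_pairs(K)
rng = random.Random(0)
labels = [rng.choice(pairs) for _ in range(2 ** L0)]   # one label per node of T[L0]
label, count = most_common_label(labels)
assert len(pairs) <= K * K and count >= 2 ** L0 / K ** 2
print(label, count)
print([R_of_L(L, K) for L in (10**2, 10**3, 10**4, 10**5, 10**6)])
```

Because the exponent of $2LK^2$ is quadratic in $r$, the computed $R(L)$ grows roughly like $\sqrt{L / \log L}$: slowly, but without bound, which is all the argument requires.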
For each terminal node $t \in \tilde{T}^o$ let $W_t$ be the intersection of the sets $B_s$ appearing on the descending path (in $T$) from the root of $T^o$ to $t$. The construction of $T^o$ ensures that every member of $\{W_t : t \in \tilde{T}^o\}$ is contained in a unique element of the join

$$\mathcal{J} = \{s_k(h_0), s_{k'}(h_0)\} \vee \cdots \vee \{s_k(h_{N-1}), s_{k'}(h_{N-1})\}.$$

Moreover, by Proposition 1, each set $W_t$ has non-empty interior, and positive Lebesgue measure, and the same is therefore true for each element of $\mathcal{J}$. As $N(L)$ tends to infinity with $L$, the proposition follows.

7 Proof of Theorem 2

Proof of Theorem 2: Let $\mathcal{F}$ and $X$ be as in the statement of the theorem. Then $\Gamma(\mathcal{F} : X) > \eta > 0$. Let $\mathcal{C}(\mathcal{F})$ be the countable family defined in (5). Without loss of generality, we may assume that $\mathcal{F}$ contains the identity function $f_0(x) = x$, and therefore $\mathcal{C}(\mathcal{F})$ satisfies the shrinking diameter condition of Lemma B. Let the sets $V_1, V_2 \subseteq [0,1]$ and the map $\phi(\cdot)$ be as in the statement of Lemma B. Define random variables $Y_i = \phi(X_i)$ for $i \ge 1$. Then the process $Y = Y_1, Y_2, \ldots$ is stationary and ergodic with $Y_i \sim \lambda$. For each $f \in \mathcal{F}$ define an associated function $g_f : [0,1] \to [0,1]$ via the rule

$$g_f(u) = \begin{cases} (f \circ \phi^{-1})(u) & \text{if } u \in V_2, \\ 0 & \text{if } u \in V_2^c, \end{cases} \tag{17}$$

and let $\mathcal{G} = \{g_f : f \in \mathcal{F}\}$. Arguments like those in Section 2 above show that $\Gamma_m(\mathcal{G} : Y)$ is equal to $\Gamma_m(\mathcal{F} : X)$ with probability one for each $m \ge 1$, and consequently $\Gamma(\mathcal{G} : Y) > \eta$.

Let the constants $\gamma$ (equal to $\eta/5$) and $K$, and the segments $s_k(f)$, be defined as in (4), and let $\epsilon = \Gamma(\mathcal{G} : Y) - \eta > 0$. Choose a finite sequence of rational numbers $0 = a_0 < a_1 < \cdots < a_N = 1$ that includes $\{\gamma k : k = 1, \ldots, K-1\}$ and is such that $\max_j |a_j - a_{j-1}| < \epsilon/2$. Define intervals $U_j = [a_{j-1}, a_j)$ for $j = 1, \ldots, N-1$, and let $U_N = [a_{N-1}, 1]$.
Using (17) one may verify that for each $g_f \in \mathcal{G}$,

$$g_f^{-1} U_j = \begin{cases} \phi(f^{-1} U_j) & \text{if } 2 \le j \le N, \\ \phi(f^{-1} U_j) \cup V_2^c & \text{if } j = 1, \end{cases}$$

where the second case results from the fact that the interval $U_1$ contains zero.

Let $\mathcal{U}$ be the family of subsets of $[0,1]$ that are equal to finite unions of intervals, and let $A \cong B$ denote the fact that $A$ and $B$ are equivalent mod 0, in other words, $\lambda(A \triangle B) = 0$. Fix a function $f \in \mathcal{F}$, and let $g_f$ be the associated element of $\mathcal{G}$. Lemma B and the fact that $\lambda(V_2^c) = 0$ imply that there exist sets $C_1, \ldots, C_N \in \mathcal{U}$ such that $g_f^{-1} U_j \cong C_j$ for $1 \le j \le N$. If $i \ne j$ then

$$\lambda(C_i \cap C_j) = \lambda(g_f^{-1} U_i \cap g_f^{-1} U_j) = \lambda(g_f^{-1}(U_i \cap U_j)) = 0,$$

so that $C_i$ and $C_j$ can intersect only at the endpoints of their constituent intervals. It follows that the function $h_f(u) = \sum_{j=1}^{N} a_{j-1} I_{C_j}(u)$ approximates $g_f$ in the sense that $|g_f(u) - h_f(u)| < \epsilon/2$ with probability one. Moreover, $h_f^{-1}[a, b) \in \mathcal{U}$ for all rational $a, b$.

Let $\mathcal{H} = \{h_f : f \in \mathcal{F}\}$ be the family of simple approximations to the elements of $\mathcal{G}$. Then $\mathcal{C}(\mathcal{H})$ is contained in $\mathcal{U}$, and a straightforward argument shows that $\Gamma(\mathcal{H} : Y) > \eta$. Fix $L \ge 1$. As $\mathcal{H}$ satisfies the conditions of Proposition 2, there exist functions $f_1, \ldots, f_L \in \mathcal{F}$ and a pair of non-adjacent integers $k, k' \in [K]$ such that the join

$$\mathcal{J}_h = \bigvee_{\ell=1}^{L} \{s_k(h_{f_\ell}), s_{k'}(h_{f_\ell})\}$$

has $2^L$ elements, each with positive measure. In order to obtain a full join for the segments of $f_1, \ldots, f_L$, we examine how the segments of $h_f$ are related to those of $f$. To this end, let $i < j$ be such that $a_i = (k-1)\gamma$ and $a_j = k\gamma$. Then for every $f \in \mathcal{F}$,

$$\begin{aligned}
s_k(h_f) &= h_f^{-1}[(k-1)\gamma, k\gamma) = h_f^{-1}[a_i, a_j) = \bigcup_{r=i}^{j-1} C_{r+1} \cong \bigcup_{r=i}^{j-1} g_f^{-1} U_{r+1} \\
&= g_f^{-1}[(k-1)\gamma, k\gamma) \cong \phi\big(f^{-1}[(k-1)\gamma, k\gamma)\big) = \phi(s_k(f)).
\end{aligned}$$
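Pointwise, the display above says that rounding a value down to the grid $\{a_j\}$ moves it by less than $\epsilon/2$ and never across a multiple of $\gamma$, because every $k\gamma$ is a grid point. The sketch below checks both facts in exact rational arithmetic. It rests on hypothetical concrete choices that are not taken from the paper: $\gamma$ is rational, the grid consists of all multiples of $\gamma/M$ in $[0,1]$ together with 1, $K = \lceil 1/\gamma \rceil$, and the segment index of a value $v$ is $\min(\lfloor v/\gamma \rfloor, K-1) + 1$, so that the top segment absorbs the value 1. Definition (4) may use different boundary conventions.

```python
import math
import random
from bisect import bisect_right
from fractions import Fraction as Fr

def make_grid(gamma, M):
    """0 = a_0 < ... < a_N = 1: all multiples of gamma/M in [0, 1], plus the point 1."""
    step = gamma / M
    n = math.ceil(1 / step)
    return sorted({min(i * step, Fr(1)) for i in range(n + 1)})

def round_down(v, grid):
    """Value a_{j-1} of h for v in U_j = [a_{j-1}, a_j), with U_N = [a_{N-1}, 1] closed."""
    idx = min(bisect_right(grid, v), len(grid) - 1)
    return grid[idx - 1]

def segment(v, gamma, K):
    """Index k of the gamma-segment [(k-1)gamma, k*gamma) containing v (top one closed)."""
    return min(math.floor(v / gamma), K - 1) + 1

eta, eps = Fr(1, 2), Fr(1, 20)
gamma = eta / 5                          # gamma = eta / 5, as in the text
K = math.ceil(1 / gamma)
M = math.floor(2 * gamma / eps) + 1      # guarantees mesh gamma/M < eps/2
grid = make_grid(gamma, M)

rng = random.Random(1)
values = [Fr(rng.randint(0, 10**6), 10**6) for _ in range(2000)] + [Fr(0), Fr(1), gamma]
for v in values:
    h = round_down(v, grid)
    assert 0 <= v - h < eps / 2                            # |g_f - h_f| < eps/2
    assert segment(h, gamma, K) == segment(v, gamma, K)    # same gamma-segment
print(len(grid) - 1, "grid intervals; all checks passed")
```

Applied to $v = g_f(u)$, the second assertion is the statement $h_f^{-1}[(k-1)\gamma, k\gamma) = g_f^{-1}[(k-1)\gamma, k\gamma)$ at the level of individual points.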
The same argument applies to $s_{k'}(h_f)$, and therefore every element of $\mathcal{J}_h$ is equivalent mod zero to an element of the join

$$\mathcal{J}_h' = \bigvee_{\ell=1}^{L} \{\phi(s_k(f_\ell)), \phi(s_{k'}(f_\ell))\}.$$

As $\phi$ is a bijection almost everywhere, every element of $\mathcal{J}_h'$ is equivalent mod zero to a set of the form $\phi(A)$, where $A$ is an element of the join

$$\mathcal{J}_f = \bigvee_{\ell=1}^{L} \{s_k(f_\ell), s_{k'}(f_\ell)\}.$$

As each cell of $\mathcal{J}_h'$ has positive Lebesgue measure, the same is true of the cells of $\mathcal{J}_f$. In particular, $\mathcal{J}_f$ has (maximum) cardinality $2^L$. As $L \ge 1$ was arbitrary, Theorem 2 follows from Lemma 1.

A Appendix

A.1 Proof of Lemma 4

The proof of Lemma 4 appears in [1]; we reproduce it here for completeness.

Proof: Let $G = G_{n_\kappa(r),\kappa}$. The choice of $n_\kappa(r)$ ensures that

$$\begin{aligned}
(1 - \epsilon)\, \delta\, \lambda(I_1) \le \lambda(I_1 \cap G) &= \lambda(I_1 \cap Q_1 \cap G) + \lambda(I_1 \cap Q_1^c \cap G) \\
&\le \lambda(I_1 \cap Q_1 \cap G) + \lambda(I_1 \cap Q_1^c) \\
&\le \lambda(I_1 \cap Q_1 \cap G) + \epsilon\, \lambda(I_1),
\end{aligned}$$

where the final inequality follows from (13) and the fact that $Q_0 \subseteq Q_1$. It follows from the display and the definition of $\epsilon$ that $\lambda(I_1 \cap Q_1 \cap G) \ge \delta \alpha_1$. As the collection of sets used to define the join $\mathcal{J}_{n_\kappa(r),\kappa}$ includes the dyadic intervals of order $n_\kappa(r)$, each element $A$ of the join has diameter (and Lebesgue measure) bounded by $2^{-n_\kappa(r)} < \delta \alpha_1 / 4$. These last two inequalities imply that

$$\delta \alpha_1 \le \lambda(I_1 \cap Q_1 \cap G) \le \sum_{A} \lambda(Q_1 \cap A) + 2 \cdot \frac{\delta \alpha_1}{4},$$

where the sum is over $A \in \mathcal{H}_{n_\kappa(r),\kappa}$ such that $A \subseteq I_1$. In particular, the sum is necessarily positive, and the first part of the claim follows. Moreover, for any set $A \in \mathcal{H}_{n_\kappa(r),\kappa}$ the definition of the join $\mathcal{J}_{n_\kappa(r),\kappa}$ requires that $A$ be contained in either $R_{s_j}$ or $R_{s_j}^c$ for each $j = 0, \ldots, L-1$. If $\lambda(A \cap Q_1) > 0$ then necessarily $A \cap Q_1 \neq \emptyset$, and these containment relations imply that $A \subseteq Q_1$.
This completes the proof of Lemma 4.

Acknowledgements

The work presented in this paper was supported in part by NSF grant DMS-0907177.

References

[1] Adams, T.M. and Nobel, A.B. (2009) Uniform convergence of Vapnik-Chervonenkis classes under ergodic sampling. To appear in The Annals of Probability.
[2] Alon, N., Ben-David, S., Cesa-Bianchi, N. and Haussler, D. (1997) Scale-sensitive dimensions, uniform convergence, and learnability. Journal of the ACM 44:4 615–631. MR1481318 (99b:68154)
[3] Bartlett, P.L., Long, P.M., and Williamson, R.V. (1994) Fat-shattering and the learnability of real-valued functions. Journal of Computer and System Sciences 52 434–452. MR1408000 (98i:68234)
[4] Bickel, P.J. and Millar, P.W. (1992) Uniform convergence of probability measures on classes of functions. Statistica Sinica 2 1–15. MR1152295 (93g:60005)
[5] Billingsley, P. (1995) Probability and Measure, 3rd ed., Wiley, New York. MR1324786 (95k:60001)
[6] Billingsley, P. and Topsøe, F. (1967) Uniformity in weak convergence. Z. Wahrscheinlichkeitstheorie und verw. Gebiete 7 1–16. MR0209428 (35 #326)
[7] Dudley, R.M. (1999) Uniform Central Limit Theorems. Cambridge Univ. Press, New York. MR1720712 (2000k:60050)
[8] Giné, E. and Zinn, J. (1984) Some limit theorems for empirical processes (with discussion). Annals of Probability 12:4 929–989. MR0757767 (86f:60028)
[9] Kearns, M.J. and Schapire, R.E. (1994) Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences 48(3) 464–497. MR1279411 (95m:68142)
[10] Matousek, J. (2002) Lectures on Discrete Geometry. Graduate Texts in Mathematics 212, Springer, New York. MR1899299 (2003f:52011)
[11] Mendelson, S. and Vershynin, R. (2003) Entropy and the combinatorial dimension. Inventiones Mathematicae 152 37–55. MR1965359 (2004d:60047)
[12] Pollard, D. (1984) Convergence of Stochastic Processes. Springer, New York. MR0762984 (86i:60074)
[13] Ranga Rao, R. (1962) Relations between weak and uniform convergence of measures with applications. Annals of Mathematical Statistics 33 659–680.
[14] Royden, H.L. (1988) Real Analysis, 3rd ed. Macmillan Publishing Company, New York. MR1013117 (90g:00004)
[15] Steele, J.M. (1978) Empirical discrepancies and subadditive processes. Ann. Probab. 6 118–127. MR0464379 (57 #4310)
[16] Talagrand, M. (1987) The Glivenko-Cantelli problem. Annals of Probability 15:3 837–870. MR0893902 (88h:60012)
[17] Talagrand, M. (2003) Vapnik-Chervonenkis type conditions and uniform Donsker classes of functions. Annals of Probability 31:3 1565–1582. MR1989443 (2004f:60075)
[18] Vapnik, V.N. and Chervonenkis, A.Ya. (1981) Necessary and sufficient conditions for the uniform convergence of means to their expectations. Theory of Probability and its Applications 26 532–553. MR0627861 (83d:60031)