On Sequences with Non-Learnable Subsequences

Vladimir V. V'yugin
Institute for Information Transmission Problems, Russian Academy of Sciences, Bol'shoi Karetnyi per. 19, Moscow GSP-4, 127994, Russia
e-mail: vyugin@iitp.ru

Abstract. The remarkable results of Foster and Vohra were the starting point for a series of papers showing that any sequence of outcomes can be learned (with no prior knowledge) using some universal randomized forecasting algorithm and forecast-dependent checking rules. We show that for the class of all computationally efficient outcome-forecast-based checking rules this property is violated. Moreover, we present a probabilistic algorithm generating, with probability close to one, a sequence with a subsequence which simultaneously miscalibrates all partially weakly computable randomized forecasting algorithms. In accordance with Dawid's prequential framework, we consider partial recursive randomized algorithms.

1 Introduction

Let a binary sequence $\omega_1, \omega_2, \dots, \omega_{n-1}$ of outcomes be observed by a forecaster whose task is to give a probability $p_n$ of the future event $\omega_n = 1$. The evaluation of probability forecasts is based on a method called calibration: informally, following Dawid [1], a forecaster is said to be well-calibrated if for any $p^*$ the event $\omega_n = 1$ holds in $100p^*\%$ of the moments of time at which he chooses $p_n \approx p^*$ (see also [2]).

Let us introduce some notation. Let $\Omega$ be the set of all infinite binary sequences, $\Xi$ be the set of all finite binary sequences and $\lambda$ be the empty sequence. For any finite or infinite sequence $\omega = \omega_1 \dots \omega_n \dots$, we write $\omega^n = \omega_1 \dots \omega_n$ (we put $\omega^0 = \omega_0 = \lambda$). Also, $l(\omega^n) = n$ denotes the length of the sequence $\omega^n$. If $x$ is a finite sequence and $\omega$ is a finite or infinite sequence, then $x\omega$ denotes the concatenation of these sequences, and $x \sqsubseteq \omega$ means that $x = \omega^n$ for some $n$.
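The informal notion of calibration described above can be made concrete on a finite record of forecasts and outcomes. The following Python sketch is only an illustration: the function name `empirical_frequency` and the tolerance parameter `tol` (which makes $p_n \approx p^*$ precise) are assumptions of this example and do not come from the paper.

```python
from typing import Optional, Sequence


def empirical_frequency(
    outcomes: Sequence[int],
    forecasts: Sequence[float],
    p_star: float,
    tol: float = 0.05,
) -> Optional[float]:
    """Frequency of outcome 1 among the steps whose forecast is within tol of p_star.

    Returns None when no forecast falls near p_star.
    """
    selected = [w for w, p in zip(outcomes, forecasts) if abs(p - p_star) <= tol]
    if not selected:
        return None
    return sum(selected) / len(selected)


# Seven ones out of ten outcomes, all forecast with probability 0.7.
outcomes = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
forecasts = [0.7] * 10
freq = empirical_frequency(outcomes, forecasts, p_star=0.7)
```

On this toy record the forecaster looks well-calibrated at $p^* = 0.7$, since the selected outcomes contain ones with frequency close to 0.7; a large gap between the returned frequency and $p^*$ would indicate miscalibration.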
In the measure-theoretic framework we expect that the forecaster has a method for assigning probabilities $p_n$ of the future event $\omega_n = 1$ for all possible finite sequences $\omega_1, \omega_2, \dots, \omega_{n-1}$. In other words, all conditional probabilities $p_n = P(\omega_n = 1 \mid \omega_1, \omega_2, \dots, \omega_{n-1})$ must be specified, and so the overall probability distribution on the space $\Omega$ of all infinite binary sequences is defined. In reality, however, we should recognize that we have only an individual sequence $\omega_1, \omega_2, \dots, \omega_{n-1}$ of events and that the corresponding forecasts $p_n$ whose testing is considered may fall short of defining a full probability distribution on the whole space $\Omega$. This is the point of the prequential principle proposed by Dawid [1]. This principle says that the evaluation of a probability forecaster should depend only on his actual probability forecasts and the corresponding outcomes. The additional information contained in a probability measure that has these probability forecasts as conditional probabilities should not enter into the evaluation. According to Dawid's prequential framework, we do not consider the numbers $p_n$ as conditional probabilities generated by some overall probability distribution defined for all possible events. Accordingly, a deterministic forecasting system is a partial recursive function $f : \Xi \to [0, 1]$. We suppose that a valid forecasting system $f$ is defined on all finite initial fragments $\omega_1, \dots, \omega_{n-1}, \dots$ of an analyzed individual sequence of outcomes. The first examples of individual sequences for which well-calibrated deterministic forecasting is impossible (non-calibrable sequences) were presented by Oakes [6] (see also Schervish [9]).
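A diagonal argument in the spirit of Oakes [6] can be illustrated numerically: at each step the outcome is chosen to be 1 when the deterministic forecast is below 0.5 and 0 otherwise. The sketch below is a minimal illustration under stated assumptions; the forecaster `running_mean`, the helper names and the horizon of 1000 steps are choices of this example, not of the paper.

```python
from typing import Callable, List, Sequence, Tuple

Forecaster = Callable[[Sequence[int]], float]


def diagonal_sequence(forecast: Forecaster, n: int) -> Tuple[List[int], List[float]]:
    """Build n outcomes that contradict the forecaster at every step."""
    outcomes: List[int] = []
    forecasts: List[float] = []
    for _ in range(n):
        p = forecast(outcomes)                 # forecast from the past outcomes only
        forecasts.append(p)
        outcomes.append(1 if p < 0.5 else 0)   # outcome chosen against the forecast
    return outcomes, forecasts


def signed_gap(outcomes, forecasts, in_interval) -> float:
    """(1/n) * sum of (outcome - forecast) over steps whose forecast lies in the interval."""
    n = len(outcomes)
    return sum(w - p for w, p in zip(outcomes, forecasts) if in_interval(p)) / n


def running_mean(history: Sequence[int]) -> float:
    """Hypothetical forecaster: the frequency of ones seen so far (0.5 at the start)."""
    return sum(history) / len(history) if history else 0.5


outcomes, forecasts = diagonal_sequence(running_mean, 1000)
gap_low = signed_gap(outcomes, forecasts, lambda p: p < 0.5)    # interval [0, 1/2)
gap_high = signed_gap(outcomes, forecasts, lambda p: p >= 0.5)  # interval [1/2, 1]
```

Each step with forecast below 0.5 contributes more than 0.5 to `gap_low`, and each remaining step contributes at most -0.5 to `gap_high`, so for any deterministic forecaster at least one of the two gaps has absolute value at least 0.25.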
Unfortunately, the methods used in these papers, and in Dawid [1], [2], do not comply with the prequential principle; they depend on some mild assumptions about the measure from which the probability forecasts are derived as conditional probabilities. The method of generating non-calibrable sequences with probability arbitrarily close to one presented in V'yugin [11] is also based on the same assumptions. In this paper we modify the construction from [11] for the case of partial deterministic and randomized forecasting systems that do not correspond to any overall probability distribution.

Oakes [6] showed that any everywhere defined forecasting system $f$ is not calibrated for the sequence $\omega = \omega_1 \omega_2 \dots$ defined by

$$\omega_i = \begin{cases} 1 & \text{if } p_i < 0.5, \\ 0 & \text{otherwise,} \end{cases}$$

where $p_i = f(\omega_1 \dots \omega_{i-1})$, $i = 1, 2, \dots$.

Foster and Vohra [3] showed that well-calibrated forecasts are possible if these forecasts are randomized. By a randomized forecasting system they mean a random variable $f(\alpha; x)$ defined on some probability space $\Omega_x$ supplied with some probability distribution $Pr_x$, where $x \in \Xi$ is a parameter. As usual, we omit the argument $\alpha$. For any infinite $\omega$, these probability distributions $Pr_{\omega^{i-1}}$ generate the overall probability distribution $Pr$ on the direct product of the probability spaces $\Omega_{\omega^{i-1}}$, $i = 1, 2, \dots$.

It was shown in [3], [4] that any sequence can be learned: for any $\Delta > 0$, a universal randomized forecasting system $f$ was constructed such that for any sequence $\omega = \omega_1 \omega_2 \dots$ the overall probability $Pr$ of the event

$$\left| \frac{1}{n} \sum_{i=1}^{n} I(\tilde p_i)(\omega_i - \tilde p_i) \right| \le \Delta \tag{1}$$

tends to one as $n \to \infty$, where $\tilde p_i = f(\omega^{i-1})$ is the random variable and $I(p)$ is the characteristic function of an arbitrary subinterval of $[0, 1]$; we call this function a forecast-based checking rule. Lehrer [5] and Sandroni et al.
[8] extended the class of checking rules to combinations of forecast- and outcome-based checking rules: a checking rule is a function $c(\omega^{i-1}, p) = \delta(\omega^{i-1}) I(p)$, where $\delta : \Xi \to \{0, 1\}$ is an outcome-based checking rule and $I(p)$ is the characteristic function of a subinterval of $[0, 1]$. They also considered a more general class of randomized forecasting systems: random variables $\tilde p_i = f(\alpha; \omega^{i-1}, p^{i-1})$, where $p^{i-1} = p_1, \dots, p_{i-1}$ is the sequence of past realized forecasts.

For $k = 1, 2, \dots$, let $\{\delta_k\}$ be any sequence of outcome-based checking rules and $\{I_k\}$ be any sequence of characteristic functions of subintervals of $[0, 1]$. Sandroni et al. [8] defined a randomized universal forecasting system which calibrates all checking rules $\{\delta_k I_k\}$, $k = 1, 2, \dots$, i.e., such that for any $\Delta > 0$ and for any sequence $\omega = \omega_1 \omega_2 \dots$, the overall probability of the event (1) tends to one as $n \to \infty$, where $\tilde p_i = f(\omega^{i-1}, p^{i-1})$ and $I(\tilde p_i)$ is replaced by $\delta_k(\omega^{i-1}) I_k(\tilde p_i)$, for all $k = 1, 2, \dots$.

In this paper we consider the class of all computable (partial recursive) outcome-based checking rules $\{\delta_k\}$ and a slightly different class of randomized forecasting systems: our forecasting systems are random variables $\tilde p_i = f(\alpha; \omega^{i-1})$ that do not depend on past realized forecasts (this is the case for the universal forecasting systems defined in [3] and [10]; see footnote 1). At the same time, such a function can be undefined outside $\omega$; we require only that any well-defined forecasting system be defined on all initial fragments of an analyzed sequence of outcomes. This peculiarity is important, since we consider forecasting systems possessing some computational properties: there is an algorithm computing the probability distribution function of such a forecasting system.
This algorithm, when fed some input, may never finish its work, and so is undefined on this input. In this case, a universal randomized forecasting algorithm which calibrates all computationally efficient outcome-forecast-based checking rules does not exist. Moreover, we construct a probabilistic generator (or probabilistic algorithm) of sequences that are non-learnable in this sense. This generator outputs with probability close to one an infinite sequence such that for each randomized forecasting system $\tilde p_i = f(\alpha; \omega^{i-1})$ some computable outcome-based checking rule $\delta$ selects an infinite subsequence of $\omega$ on which property (1) fails for some characteristic function $I$ with overall probability one, where the overall probability is associated with the forecasting system $f$.

Footnote 1. Note that the algorithm from [8] can be modified in the fashion of [3], i.e., so that at any step of the construction past forecasts are replaced by measures with finite supports defined at previous steps. Since these measures are defined recursively in the process of the construction, they can be eliminated from the condition of the universal forecasting algorithm.

2 Miscalibrating the forecasts

We use standard notions of the theory of algorithms. This theory is systematically treated in, for example, Rogers [7]. We fix some effective one-to-one enumeration of all pairs (triples, and so on) of nonnegative integers. We identify any pair $(t, s)$ with its number $\langle t, s \rangle$; let $p(\langle t, s \rangle) = t$.

A function $\phi : A \to R$ is called (lower) semicomputable if $\{(r, x) : r < \phi(x)\}$ ($r$ is a rational number) is a recursively enumerable set. A function $\phi$ is upper semicomputable if $-\phi$ is lower semicomputable.
A standard argument from recursion theory shows that there exist lower and upper semicomputable real functions $\phi^-(j, x)$ and $\phi^+(k, x)$ universal for all lower semicomputable and upper semicomputable functions of $x \in \Xi$; in particular, every computable real function $\phi(x)$ can be represented as $\phi(x) = \phi^-(j, x) = \phi^+(k, x)$ for all $x$, for some $j$ and $k$. Let $\phi^-_s(j, x)$ be equal to the maximal rational number $r$ such that the triple $(r, j, x)$ is enumerated in $s$ steps in the process of enumerating the set $\{(r, j, x) : r < \phi^-(j, x),\ r \text{ is rational}\}$, and equal to $-\infty$ otherwise. Any such function $\phi^-_s(j, x)$ takes only a finite number of rational values distinct from $-\infty$. By definition, $\phi^-_s(j, x) \le \phi^-_{s+1}(j, x)$ for all $j, s, x$, and $\phi^-(j, x) = \lim_{s \to \infty} \phi^-_s(j, x)$. An analogous non-increasing sequence of functions $\phi^+_s(k, x)$ exists for any upper semicomputable function.

Let $i = \langle t, k \rangle$. We say that a real function $\phi_i(x)$ is defined on $x$ if, given any degree of precision (a positive rational number $\kappa > 0$), it holds that $|\phi^+_s(t, x) - \phi^-_s(k, x)| \le \kappa$ for some $s$; $\phi_i(x)$ is undefined otherwise. If such an $s$ exists, then for the minimal such $s$, $\phi_{i,\kappa}(x) = \phi^-_s(k, x)$ is called the rational approximation (from below) of $\phi_i(x)$ up to $\kappa$; $\phi_{i,\kappa}(x)$ is undefined otherwise.

To define a measure $P$ on $\Omega$, we define the values $P(z) = P(\Gamma_z)$ for all intervals $\Gamma_z = \{\omega \in \Omega : z \sqsubseteq \omega\}$, where $z \in \Xi$, and extend this function to all Borel subsets of $\Omega$ in the standard way.

We also use the concept of a computable operation on $\Xi \cup \Omega$ (see [12]). Let $\hat F$ be a recursively enumerable set of ordered pairs of finite sequences satisfying the following properties:
(i) $(x, \lambda) \in \hat F$ for each $x$;
(ii) if $(x, y) \in \hat F$, $(x', y') \in \hat F$ and $x \sqsubseteq x'$ then $y \sqsubseteq y'$ or $y' \sqsubseteq y$, for all finite binary sequences $x, x', y, y'$.
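Property (ii) can be checked mechanically on a finite set of pairs. The Python sketch below is only an illustration under stated assumptions: finite binary sequences are represented as strings of `0` and `1`, the helper names are hypothetical, and property (i), which involves infinitely many pairs, is modelled by letting the empty string be the default output.

```python
from itertools import combinations
from typing import Iterable, Tuple

Pair = Tuple[str, str]


def is_prefix(x: str, y: str) -> bool:
    """True when x is an initial fragment of y."""
    return y.startswith(x)


def comparable(a: str, b: str) -> bool:
    """True when one of the two strings is a prefix of the other."""
    return is_prefix(a, b) or is_prefix(b, a)


def satisfies_property_ii(pairs: Iterable[Pair]) -> bool:
    """Comparable inputs must always carry comparable outputs."""
    for (x1, y1), (x2, y2) in combinations(set(pairs), 2):
        if comparable(x1, x2) and not comparable(y1, y2):
            return False
    return True


def longest_output(pairs: Iterable[Pair], w: str) -> str:
    """Longest y among the pairs (x, y) whose x is a prefix of the finite input w."""
    best = ""  # stands for the empty sequence, cf. property (i)
    for x, y in pairs:
        if is_prefix(x, w) and len(y) > len(best):
            best = y
    return best


good = {("", ""), ("0", "1"), ("01", "10"), ("1", "0")}
bad = {("0", "1"), ("00", "0")}
```

For the set `good` the outputs attached to the prefixes of the input `011` are the empty string, `1` and `10`, which form a chain, so the longest of them is well defined; the set `bad` violates (ii) because `0` is a prefix of `00` while the outputs `1` and `0` are incomparable.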
A computable operation $F$ is defined as follows:

$$F(\omega) = \sup\{ y \mid x \sqsubseteq \omega \text{ and } (x, y) \in \hat F \text{ for some } x \},$$

where $\omega \in \Omega \cup \Xi$ and sup is in the sense of the partial order $\sqsubseteq$ on $\Xi$.

A probabilistic algorithm is a pair $(L, F)$, where $L(x) = L(\Gamma_x) = 2^{-l(x)}$ is the uniform measure on $\Omega$ and $F$ is a computable operation. For any probabilistic algorithm $(L, F)$ and a set $A \subseteq \Omega$, we consider the probability $L\{\omega : F(\omega) \in A\}$ of generating by means of $F$ a sequence from $A$ given a uniformly distributed sequence $\omega$.

A partial randomized forecasting system $f$ is weakly computable if its weak probability distribution function $\varphi_n(\omega^{n-1}) = Pr_n\{ f(\omega^{n-1}) < \frac{1}{2} \}$ is a partial recursive function of $\omega^{n-1}$.

Any function $\delta : \Xi \to \{0, 1\}$ is called an outcome-based selection (or checking) rule. For any sequence $\omega = \omega_1 \omega_2 \dots$, the selection rule $\delta$ selects a sequence of indices $n_i$ such that $\delta(\omega^{n_i - 1}) = 1$, $i = 1, 2, \dots$, and the corresponding subsequence $\omega_{n_1} \omega_{n_2} \dots$ of $\omega$.

The following theorem is the main result of this paper. In particular, it shows that the construction of the universal forecasting algorithm from Sandroni et al. [8] is computationally non-efficient in the case when the class of all partial recursive outcome-based checking rules $\{\delta_k\}$ is used.

Theorem 1. For any $\epsilon > 0$ a probabilistic algorithm $(L, F)$ can be constructed which with probability $\ge 1 - \epsilon$ outputs an infinite binary sequence $\omega = \omega_1 \omega_2 \dots$
such that for every partial weakly computable randomized forecasting system $f$ defined on all initial fragments of the sequence $\omega$ there exists a computable selection rule $\delta$ defined on all these fragments and such that for $\nu = 0$ or for $\nu = 1$ the overall probability of the event

$$\limsup_{n \to \infty} \left| \frac{1}{n} \sum_{i=1}^{n} \delta(\omega^{i-1}) I_\nu(\tilde p_i)(\omega_i - \tilde p_i) \right| \ge 1/16 \tag{2}$$

equals one, where $I_0$ and $I_1$ are the characteristic functions of the intervals $[0, \frac{1}{2})$ and $[\frac{1}{2}, 1]$, $\tilde p_i = f(\omega^{i-1})$ is a random variable, $i = 1, 2, \dots$, and the overall probability distribution is associated with $f$.

Proof. For any probabilistic algorithm $(L, F)$, we consider the function

$$Q(x) = L\{\omega : x \sqsubseteq F(\omega)\}. \tag{3}$$

It is easy to verify that this function is lower semicomputable and satisfies $Q(\lambda) \le 1$ and $Q(x0) + Q(x1) \le Q(x)$ for all $x$. Any function satisfying these properties is called a semicomputable semimeasure. For any semicomputable semimeasure $Q$ a probabilistic algorithm $(L, F)$ exists such that (3) holds. Though the semimeasure $Q$ is not a measure, we consider the corresponding measure on the set $\Omega$:

$$\bar Q(\Gamma_x) = \inf_n \sum_{l(y) = n,\ x \sqsubseteq y} Q(y).$$

We will construct a semicomputable semimeasure $Q$ as a kind of network flow. We define an infinite network on the base of the infinite binary tree. Any $x \in \Xi$ defines two edges $(x, x0)$ and $(x, x1)$ of length one. In the construction below we will mount to the network extra edges $(x, y)$ of length $> 1$, where $x, y \in \Xi$, $x \sqsubseteq y$ and $y \ne x0, x1$. By the length of the edge $(x, y)$ we mean the number $l(y) - l(x)$. For any edge $\sigma = (x, y)$ we denote by $\sigma_1 = x$ its starting vertex and by $\sigma_2 = y$ its terminal vertex. A computable function $q(\sigma)$ defined on all edges of length one and on all extra edges and taking rational values is called a network if for all $x \in \Xi$

$$\sum_{\sigma : \sigma_1 = x} q(\sigma) \le 1.$$
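The network condition can be checked directly on a finite fragment of such a function. The following Python sketch is an illustration under stated assumptions: vertices are binary strings, a fragment of $q$ is a dictionary from edges $(x, y)$ to exact rationals, only vertices with listed outgoing edges are examined, and the names `outgoing_weight` and `is_network` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, Tuple

Edge = Tuple[str, str]


def outgoing_weight(q: Dict[Edge, Fraction]) -> Dict[str, Fraction]:
    """Total weight of the listed edges leaving each starting vertex."""
    totals: Dict[str, Fraction] = defaultdict(Fraction)
    for (x, y), weight in q.items():
        # Every edge must lead from a vertex to one of its proper extensions.
        if not (y.startswith(x) and len(y) > len(x)):
            raise ValueError(f"edge {(x, y)!r} does not go down the tree")
        totals[x] += weight
    return dict(totals)


def is_network(q: Dict[Edge, Fraction]) -> bool:
    """Check that the weights of the edges leaving each vertex sum to at most 1."""
    return all(total <= 1 for total in outgoing_weight(q).values())


half = Fraction(1, 2)
q = {
    ("", "0"): half, ("", "1"): half,
    ("0", "00"): Fraction(3, 8), ("0", "01"): Fraction(3, 8),
    ("0", "0110"): Fraction(1, 4),   # an extra edge of length 3
    ("1", "10"): half, ("1", "11"): half,
}
overloaded = dict(q)
overloaded[("0", "0110")] = half     # now the weights leaving "0" sum to 5/4
```

In the fragment `q` the extra edge leaving `0` carries weight 1/4 and the two unit edges leaving `0` share the remaining 3/4, so the condition holds with equality at that vertex; raising the extra edge to 1/2 without lowering the unit edges breaks it.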
Let $G$ be the set of all extra edges of the network $q$ (it is a part of the domain of $q$). By the $q$-flow we mean the minimal semimeasure $P$ such that $P \ge R$, where the function $R$ is defined by the following recursive equations: $R(\lambda) = 1$ and

$$R(y) = \sum_{\sigma : \sigma_2 = y} q(\sigma) R(\sigma_1) \tag{4}$$

for $y \ne \lambda$. A network $q$ is called elementary if the set of extra edges is finite and $q(\sigma) = 1/2$ for almost all edges of unit length. For any network $q$, we define the network flow delay function ($q$-delay function)

$$d(x) = 1 - q(x, x0) - q(x, x1).$$

The construction below works with all computable real functions $\phi_t(x)$, $x \in \Xi$, $t = 1, 2, \dots$. We suppose that for any computable function $\phi$ there exist infinitely many programs $t$ such that $\phi_t = \phi$ (see footnote 2). Any pair $i = \langle t, s \rangle$ is considered as a program for computing the rational approximation $\phi_{t,\kappa_s}(\omega^{n-1})$ of $\phi_t$ from below up to $\kappa_s = 1/s$. In the construction below we visit any function $\phi_t$ at infinitely many steps $n$. To do this, we use the function $p(n)$: for any positive integer $i$ we have $p(n) = i$ for infinitely many $n$.

Let $\beta$ be a finite sequence and $1 \le k < l(\beta)$. A bit $\beta_k$ of the sequence $\beta$ is called hardly predictable by a program $i = \langle t, s \rangle$ if $\phi_{t,\kappa_s}(\beta^{k-1})$ is defined and

$$\beta_k = \begin{cases} 0 & \text{if } \phi_{t,\kappa_s}(\beta^{k-1}) \ge \frac{1}{2}, \\ 1 & \text{otherwise.} \end{cases}$$

Lemma 1. Let $i = \langle t, s \rangle$ be a program and $\mu$ be an arbitrary sufficiently small positive real number. Then for any binary sequence $x$ of length $n$ the portion of all sequences $\gamma$ of length $K = \lceil (2 + \mu) i \rceil n$ (in the set of all finite sequences of length $K$) such that
1) $\phi_{t,\kappa_s}(x\gamma^k)$ is defined for all $0 \le k < K$,
2) the number of bits of $\gamma$ hardly predictable by the forecasting program $i$ is less than $in$,
is $\le 2^{-2\mu^2 in + O(\log(in))}$ for all sufficiently large $n$.

Proof.
Any function $\sigma(x)$, where $x \in \Xi$ and $\sigma(x) \in \{A, B\}$, is called a labelling if $\sigma(x0) \ne \sigma(x1)$ for all $x \in \Xi$. For any $\gamma$ of length $K$ and for any $k$ such that $1 \le k < K$, define $\sigma(\gamma^{k+1}) = A$ and $\sigma(\gamma^k \bar\gamma_{k+1}) = B$ if the bit $\gamma_{k+1}$ of the sequence $x\gamma$ is hardly predictable, where we denote $\bar\theta = 1 - \theta$ for any binary bit $\theta$. Since $\phi_{t,\kappa_s}(x\gamma^k)$ is defined for all $0 \le k < K$, $\sigma(\gamma^{k+1})$ is also defined for all these $k$. This partial labelling $\sigma$ can be easily extended to the set of all binary sequences of length $K$ in many different ways. We fix some such extension. Then the total number of all $\gamma$ satisfying 1)-2) does not exceed the total number of all binary sequences of length $K$ with $\le in$ labels $A$. Therefore, for all sufficiently large $n$, the portion of these $\gamma$ does not exceed

$$\sum_{j \le in} \binom{K}{j} 2^{-K} \le 2^{-(1 - H(1/2 - \mu)) 2in + O(\log(in))} \le 2^{-2\mu^2 in + O(\log(in))},$$

where $H(r) = -r \log r - (1 - r) \log(1 - r)$. □

Footnote 2. To obtain this property, we can replace the sequence $\phi_t(x)$ by the sequence $\phi'_{\langle t, s \rangle}(x) = \phi_t(x)$ for all $s$.

In the following we put $\mu = 1/\log(i + 1)$. We define an auxiliary relation $B(i, q^{n-1}, \sigma, n)$ and a function $\beta(x, q^{n-1}, n)$. Let $x, \beta \in \Xi$. The value of $B(i, q^{n-1}, (x, \beta), n)$ is true if the following conditions hold:
– $n \ge (1 + \lceil (2 + \log^{-1}(i + 1)) i \rceil) l(x)$;
– $l(\beta) = n$ and $x \sqsubseteq \beta$;
– $d^{n-1}(\beta^j) < 1$ for all $j$ such that $1 \le j < n$;
– for all $j$ such that $l(x) < j \le (1 + \lceil (2 + \log^{-1}(i + 1)) i \rceil) l(x)$, the value $\phi_{t,\kappa_s}(\beta^{j-1})$ is computed in $\le n$ steps, and for at least $i\,l(x)$ of these $j$ the bit $\beta_j$ is hardly predictable by the program $i = \langle t, s \rangle$.
The value of $B(i, q^{n-1}, (x, \beta), n)$ is false otherwise. Define

$$\beta(x, q^{n-1}, n) = \min\{ y : p(l(y)) = p(l(x)),\ B(p(l(x)), q^{n-1}, (x, y), n) \}.$$
Here min is taken with respect to the lexicographical ordering of strings; we suppose that $\min \emptyset$ is undefined.

Construction. Let $\rho(n) = (n + n_0)^2$ for some sufficiently large $n_0$ (the value $n_0$ will be specified below in the proof of Lemma 5). Using mathematical induction on $n$, we define a sequence $q^n$ of elementary networks. Put $q^0(\sigma) = 1/2$ for all edges $\sigma$ of length one.

Let $n > 0$ and let a network $q^{n-1}$ be defined. Let $d^{n-1}$ be the $q^{n-1}$-delay function and let $G^{n-1}$ be the set of all its extra edges. We suppose also that $l(\sigma_2) < n$ for all $\sigma \in G^{n-1}$.

Let us define a network $q^n$. At first, we define a network flow delay function $d^n$ and a set $G^n$. The construction can be split up into two cases.

Let $w(i, q^{n-1})$ be equal to the minimal $m$ such that $p(m) = i$ and $m > l(\sigma_2)$ for each extra edge $\sigma \in G^{n-1}$ such that $p(l(\sigma_1)) < i$. The inequality $w(i, q^m) \ne w(i, q^{m-1})$ can be induced by some task $j < i$ that mounts an extra edge $\sigma = (x, y)$ such that $l(x) > w(i, q^{m-1})$ and $p(l(x)) = p(l(y)) = j$. Lemma 2 (below) will show that this can happen only at finitely many steps of the construction.

Case 1. $w(p(n), q^{n-1}) = n$ (the goal of this part is to start a new task $i = p(n)$ or to restart the existing task $i = p(n)$ if it was destroyed by some task $j < i$ at some preceding step). Put $d^n(y) = 1/\rho(n)$ for $l(y) = n$ and define $d^n(y) = d^{n-1}(y)$ for all other $y$. Put also $G^n = G^{n-1}$.

Case 2. $w(p(n), q^{n-1}) < n$ (the goal of this part is to process the task $i = p(n)$). Let $C_n$ be the set of all $x$ such that $w(i, q^{n-1}) \le l(x) < n$, $0 < d^{n-1}(x) < 1$, the function $\beta(x, q^{n-1}, n)$ is defined (see footnote 3) and there is no extra edge $\sigma \in G^{n-1}$ such that $\sigma_1 = x$.
In this case, for each $x \in C_n$ define $d^n(\beta(x, q^{n-1}, n)) = 0$, and for all other $y$ of length $n$ such that $x \sqsubset y$ define

$$d^n(y) = \frac{d^{n-1}(x)}{1 - d^{n-1}(x)}.$$

Define $d^n(y) = d^{n-1}(y)$ for all other $y$. We add extra edges to $G^{n-1}$, namely, define

$$G^n = G^{n-1} \cup \{ (x, \beta(x, q^{n-1}, n)) : x \in C_n \}.$$

We say that the task $i = p(n)$ mounts the extra edge $(x, \beta(x, q^{n-1}, n))$ to the network and that all existing tasks $j > i$ are destroyed by the task $i$.

Footnote 3. In particular, $p(l(x)) = i$ and $l(\beta(x, q^{n-1}, n)) = n$.

After Case 1 and Case 2, define for any edge $\sigma$ of unit length

$$q^n(\sigma) = \frac{1}{2}(1 - d^n(\sigma_1))$$

and $q^n(\sigma) = d^n(\sigma_1)$ for each extra edge $\sigma \in G^n$.

Case 3. Cases 1 and 2 do not hold. Define $d^n = d^{n-1}$, $q^n = q^{n-1}$, $G^n = G^{n-1}$.

As the result of the construction we define the network $q = \lim_{n \to \infty} q^n$, the network flow delay function $d = \lim_{n \to \infty} d^n$ and the set of extra edges $G = \cup_n G^n$. The functions $q$ and $d$ are computable and the set $G$ is recursive by their definitions. Let $Q$ denote the $q$-flow.

The following lemma shows that any task can mount new extra edges only at a finite number of steps. Let $G(i)$ be the set of all extra edges mounted by the task $i$, and let $w(i, q) = \lim_{n \to \infty} w(i, q^n)$.

Lemma 2. The set $G(i)$ is finite, $w(i, q)$ exists and $w(i, q) < \infty$ for all $i$.

Proof. Note that if $G(j)$ is finite for all $j < i$, then $w(i, q) < \infty$. Hence, we must prove that the set $G(i)$ is finite for any $i$. Suppose that the opposite assertion holds. Let $i$ be the minimal index such that $G(i)$ is infinite. By the choice of $i$ the sets $G(j)$ for all $j < i$ are finite. Then $w(i, q) < \infty$.

For any $x$ such that $l(x) \ge w(i, q)$, consider the maximal $m$ such that for some initial fragment $x^m \sqsubseteq x$ there exists an extra edge $\sigma = (x^m, y) \in G(i)$. If no such extra edge exists, define $m = w(i, q)$.
By definition, if $d(x^m) \ne 0$ then $1/d(x^m)$ is an integer. Define

$$u(x) = \begin{cases} 1/d(x^m) & \text{if } d(x^m) \ne 0,\ l(x) \ge w(i, q), \\ \rho(w(i, q)) & \text{if } l(x) < w(i, q), \\ 0 & \text{otherwise.} \end{cases}$$

By construction the integer-valued function $u(x)$ has the property: $u(x) \ge u(y)$ if $x \sqsubseteq y$. Besides, if $u(x) > u(y)$ then $u(x) > u(z)$ for all $z$ such that $x \sqsubseteq z$ and $l(z) = l(y)$. Then the function

$$\hat u(\omega) = \min\{ n : u(\omega^i) = u(\omega^n) \text{ for all } i \ge n \}$$

is defined for all $\omega \in \Omega$. It is easy to see that this function is continuous. Since $\Omega$ is a compact space in the topology generated by the intervals $\Gamma_x$, this function is bounded by some number $m$. Then $u(x) = u(x^m)$ for all $l(x) \ge m$. By the construction, if any extra edge of the $i$th type is mounted to $G(i)$ at some step, then $u(y) < u(x)$ holds for some new pair $(x, y)$ such that $x \sqsubseteq y$. This contradicts the existence of the number $m$. □

An infinite sequence $\alpha \in \Omega$ is called an $i$-extension of a finite sequence $x$ if $x \sqsubseteq \alpha$ and $B(i, q^{n-1}, (x, \alpha^n), n)$ is true for almost all $n$.

A sequence $\alpha \in \Omega$ is called $i$-closed if $d(\alpha^n) = 1$ for some $n$ such that $p(n) = i$, where $d$ is the $q$-delay function. Note that if $\sigma \in G(i)$ is some extra edge (i.e. an edge of the $i$th type) then $B(i, q^{n-1}, \sigma, n)$ is true, where $n = l(\sigma_2)$.

Lemma 3. Suppose that for any initial fragment $\omega^n$ of an infinite sequence $\omega$ some $i$-extension exists. Then either the sequence $\omega$ will be $i$-closed in the process of the construction or $\omega$ contains an extra edge of the $i$th type (i.e. $\sigma_2 \sqsubseteq \omega$ for some $\sigma \in G(i)$).

Proof. Let the sequence $\omega$ be not $i$-closed. By Lemma 2 the maximal $m$ exists such that $p(m) = i$ and $d(\omega^m) > 0$. Since the sequence $\omega^m$ has an $i$-extension and $d(\omega^m) < 1$, by Case 2 of the construction a new extra edge $(\omega^m, y)$ of the $i$th type must be mounted to the binary tree.
By the construction, $d(y) = 0$ and $d(z) \ne 0$ for all $z$ such that $\omega^m \sqsubseteq z$, $l(z) = l(y)$, and $z \ne y$. By the choice of $m$ we have $y \sqsubseteq \omega$. □

Lemma 4. It holds that $Q(y) = 0$ if and only if $q(\sigma) = 0$ for some edge $\sigma$ of unit length located on $y$ (this edge satisfies $\sigma_2 \sqsubseteq y$).

Proof. The necessary condition is obvious. To prove that this condition is sufficient, let us suppose that $q(y^n, y^{n+1}) = 0$ for some $n < l(y)$ but $Q(y) \ne 0$. Then by definition $d(y^n) = 1$. Since $Q(y) \ne 0$, an extra edge $(x, z) \in G$ exists such that $x \sqsubseteq y^n$ and $y^{n+1} \sqsubseteq z$. But, by the construction, this extra edge cannot be mounted to the network $q^{l(z)-1}$ since $d(z^n) = 1$. This contradiction proves the lemma. □

For any semimeasure $P$ define $E_P = \{\omega \in \Omega : \forall n\ (P(\omega^n) \ne 0)\}$, the support set of $P$. It is easy to see that $\bar P(E_P) = \bar P(\Omega)$. By Lemma 4, $E_Q = \Omega \setminus \cup_{d(x) = 1} \Gamma_x$.

Lemma 5. It holds that $\bar Q(E_Q) > 1 - \frac{1}{2}\epsilon$.

Proof. We bound $\bar Q(\Omega)$ from below. Let $R$ be defined by (4). By the definition of the network flow delay function, we have

$$\sum_{u : l(u) = n+1} R(u) = \sum_{u : l(u) = n} (1 - d(u)) R(u) + \sum_{\sigma : \sigma \in G,\ l(\sigma_2) = n+1} q(\sigma) R(\sigma_1). \tag{5}$$

Define an auxiliary sequence

$$S_n = \sum_{u : l(u) = n} R(u) - \sum_{\sigma : \sigma \in G,\ l(\sigma_2) = n} q(\sigma) R(\sigma_1).$$

At first, we consider the case $w(p(n), q^{n-1}) < n$. If there is no edge $\sigma \in G$ such that $l(\sigma_2) = n$ then $S_{n+1} \ge S_n$. Suppose that some such edge exists. Define

$$P(u, \sigma) \iff l(u) = l(\sigma_2)\ \&\ \sigma_1 \sqsubseteq u\ \&\ u \ne \sigma_2\ \&\ \sigma \in G.$$

By the definition of the network flow delay function, we have

$$\begin{aligned}
\sum_{u : l(u) = n} d(u) R(u) &= \sum_{\sigma : \sigma \in G,\ l(\sigma_2) = n} d(\sigma_2) \sum_{u : P(u, \sigma)} R(u) \\
&= \sum_{\sigma : \sigma \in G,\ l(\sigma_2) = n} \frac{d(\sigma_1)}{1 - d(\sigma_1)} \sum_{u : P(u, \sigma)} R(u) \\
&\le \sum_{\sigma : \sigma \in G,\ l(\sigma_2) = n} d(\sigma_1) R(\sigma_1) = \sum_{\sigma : \sigma \in G,\ l(\sigma_2) = n} q(\sigma) R(\sigma_1).
\end{aligned} \tag{6}$$

Here we used the inequality $\sum_{u : P(u, \sigma)} R(u) \le R(\sigma_1) - d(\sigma_1) R(\sigma_1)$ for all $\sigma \in G$ such that $l(\sigma_2) = n$. Combining this bound with (5) we obtain $S_{n+1} \ge S_n$.

Let us consider the case $w(p(n), q^{n-1}) = n$. Then $\sum_{u : l(u) = n} d(u) R(u) \le 1/\rho(n) = (n + n_0)^{-2}$. Combining (5) and (6) we obtain $S_{n+1} \ge S_n - (n + n_0)^{-2}$ for all $n$. Since $S_0 = 1$, this implies

$$S_n \ge 1 - \sum_{i=1}^{\infty} (i + n_0)^{-2} \ge 1 - \frac{1}{2}\epsilon$$

for some sufficiently large constant $n_0$. Since $Q \ge R$, it holds that

$$\bar Q(\Omega) = \inf_n \sum_{l(u) = n} Q(u) \ge \inf_n S_n \ge 1 - \frac{1}{2}\epsilon.$$

The lemma is proved. □

Lemma 6. There exists a set $U$ of infinite binary sequences such that $\bar Q(U) \le \epsilon/2$ and for any sequence $\omega \in E_Q \setminus U$, for each partial computable forecasting system, the condition (2) holds.

Proof. Let $\omega$ be an infinite sequence and let $f$ be a partial computable forecasting system such that the corresponding $\phi_t(\omega^{n-1})$ is defined for all $n$. Let $i = \langle t, s \rangle$ be a program for computing the rational approximation $\phi_{t,\kappa_s}$ from below up to $\kappa_s = 1/s$. If $d(\omega^m) = 1$ for some $m$ such that $p(m) = i$, then for every $\beta$ of length $(1 + \lceil (2 + \log^{-1}(i + 1)) i \rceil) m$ such that $\omega^m \sqsubseteq \beta$ there are $< im$ bits hardly predictable by the forecasting program $i$.

We show that the $\bar Q$-measure of all intervals generated by such $\beta$ becomes arbitrarily small for all sufficiently large $i$. Since there are no extra edges $\sigma$ such that $\omega^m \sqsubseteq \sigma_1$, the measure $\bar Q$ restricted to the interval $\Gamma_{\omega^m}$ is proportional to the uniform measure. Then by Lemma 1, where $\mu = \log^{-1}(i + 1)$, the $\bar Q$-measure of all such $\beta$ decreases exponentially in $im$. Therefore, for each $j$ there exists a number $m_j$ such that $\bar Q(U_j) \le 2^{-(j+1)}$, where $U_j$ is the union of all intervals $\Gamma_\beta$ defined by all $\beta$ of length $(1 + \lceil (2 + \log^{-1}(i + 1)) i \rceil) m$ for $m \ge m_j$ containing $< im$ bits hardly predictable by the forecasting program $i = p(m)$.
Define $U = \cup_{j > k} U_j$, where $k = \lceil -\log_2 \epsilon - 1 \rceil$. We have $\bar Q(U) < \epsilon/2$.

Define a selection rule $\gamma$ as follows:
– define $\gamma(\omega^{j-1}) = 1$ if $\sigma_1 \sqsubseteq \omega^{j-1} \sqsubseteq \sigma_2$ for some $\sigma \in G(i)$ and the $j$th bit of $\sigma_2$ is hardly predictable by the forecasting program $i$;
– define $\gamma(\omega^{j-1}) = 0$ otherwise.

We also define two selection rules $J_\nu$, where $\nu = 0, 1$,

$$J_\nu(\omega^{j-1}) = \begin{cases} 1 - \nu & \text{if } \phi_{t,\kappa_s}(\omega^{j-1}) < \frac{1}{2}, \\ \nu & \text{if } \phi_{t,\kappa_s}(\omega^{j-1}) \ge \frac{1}{2}. \end{cases}$$

Suppose that $\omega \notin U$ and $\phi_t(\omega^n)$ is defined for all $n$. Then $\omega$ is an $i$-extension of $\omega^n$ for each $n$. Since for each $n$ the sequence $\omega^n$ is not $i$-closed, by Lemma 3 there exists an extra edge $\sigma \in G(i)$ such that $\sigma_2 \sqsubseteq \omega$. In the following, let $m = l(\sigma_1)$ and $n = (1 + \lceil (2 + \log^{-1}(i + 1)) i \rceil) m$.

Then by the construction the selection rule $\delta_\nu(\omega^{j-1}) = \gamma(\omega^{j-1}) J_\nu(\omega^{j-1})$, for $\nu = 0$ or for $\nu = 1$, selects from a fragment of $\omega$ of length $n$ a subsequence $\omega_{t_1}, \dots, \omega_{t_l}$ of length $l \ge im/2$. Since by definition these bits are hardly predictable, we have $\omega_{t_j} = 1$ for all $j$ such that $1 \le j \le l$ if $\nu = 0$, and $\omega_{t_j} = 0$ for all these $j$ if $\nu = 1$.

Let $\tilde p_j = f(\omega^{j-1})$, $j = 1, 2, \dots$, be an arbitrary computable randomized forecasting system (it is a random variable) defined on all initial fragments of $\omega = \omega_1 \omega_2 \dots$. Then $\phi(\omega^{j-1}) = Pr\{\tilde p_j \ge \frac{1}{2}\}$ is a computable real function. By definition $\phi = \phi_t$ for infinitely many $t$ and

$$\phi_{t,\kappa_s}(\omega^{j-1}) \le \phi_t(\omega^{j-1}) \le \phi_{t,\kappa_s}(\omega^{j-1}) + \kappa_s \tag{7}$$

for all $s$ and $j$. Consider two random variables, for $\nu = 0$ and for $\nu = 1$,

$$\vartheta_{n,\nu} = \sum_{j=1}^{n} \delta_\nu(\omega^{j-1}) I_\nu(\tilde p_j)(\omega_j - \tilde p_j).$$

Suppose that $l \ge im/2$ holds for $\nu = 0$.
Then using (7) we obtain

$$E(\vartheta_{n,0}) \ge \sum_{j=m+1}^{n} \delta_0(\omega^{j-1})\, Pr\{\tilde p_j < \tfrac{1}{2}\}\, \frac{1}{2} - m \ge \frac{im}{4}\left(\frac{1}{2} - \kappa_s\right) - m. \tag{8}$$

Since $n = (1 + \lceil (2 + \log^{-1}(i + 1)) i \rceil) m$, $i$ can be arbitrarily large and we visit any pair $i = \langle t, s \rangle$ infinitely often, we obtain from (8)

$$\limsup_{n \to \infty} \frac{1}{n} E(\vartheta_{n,0}) \ge 1/16. \tag{9}$$

Analogously, if $\nu = 1$ we obtain

$$\liminf_{n \to \infty} \frac{1}{n} E(\vartheta_{n,1}) \le -1/16. \tag{10}$$

The martingale strong law of large numbers says that for $\nu = 0, 1$, with $Pr$-probability one,

$$\frac{1}{n} \sum_{j=1}^{n} \delta_\nu(\omega^{j-1}) I_\nu(\tilde p_j)(\omega_j - \tilde p_j) - \frac{1}{n} E(\vartheta_{n,\nu}) \to 0 \tag{11}$$

as $n \to \infty$. Combining (9), (10) and (11) we obtain (2). Lemma 6 and Theorem 1 are proved. □

The following theorem is a generalization of the result from V'yugin [11] to partially defined computable deterministic forecasting systems.

Theorem 2. For any $\epsilon > 0$ a probabilistic algorithm $(L, F)$ can be constructed which with probability $\ge 1 - \epsilon$ outputs an infinite binary sequence $\omega = \omega_1 \omega_2 \dots$ such that for every partial deterministic forecasting algorithm $f$ defined on all initial fragments of the sequence $\omega$ a computable outcome-based selection rule $\delta$ exists, defined on all these fragments, such that

$$\limsup_{n \to \infty} \left| \frac{1}{n} \sum_{i=1}^{n} \delta(\omega^{i-1})(\omega_i - f(\omega^{i-1})) \right| \ge 1/8. \tag{12}$$

The proof of this theorem is based on the same construction.

3 Acknowledgements

The author thanks an anonymous referee for pointing out the connection of the results of this paper with the works [5] and [8]. This research was partially supported by the Russian Foundation for Fundamental Research: 06-01-00122-a.

References

1. Dawid, A.P.: The Well-Calibrated Bayesian [with discussion], J. Am. Statist. Assoc. 77, 605-613 (1982)
2. Dawid, A.P.: Calibration-Based Empirical Probability [with discussion], Ann. Statist. 13, 1251-1285 (1985)
3. Foster, D.P., Vohra, R.: Asymptotic Calibration, Biometrika 85, 379-390 (1998)
4. Kakade, S.M., Foster, D.P.: Deterministic Calibration and Nash Equilibrium, In John Shawe-Taylor and Yoram Singer, editors, Proceedings of the Seventeenth Annual Conference on Learning Theory, Volume 3120 of Lecture Notes in Computer Science, 33-48, Heidelberg, Springer (2004)
5. Lehrer, E.: Any Inspection Rule is Manipulable, Econometrica 69-5, 1333-1347 (2001)
6. Oakes, D.: Self-Calibrating Priors Do Not Exist [with discussion], J. Am. Statist. Assoc. 80, 339-342 (1985)
7. Rogers, H.: Theory of Recursive Functions and Effective Computability, New York, McGraw Hill (1967)
8. Sandroni, A., Smorodinsky, R., Vohra, R.: Calibration with Many Checking Rules, Mathematics of Operations Research 28-1, 141-153 (2003)
9. Schervish, V.: Comment [to Oakes, 1985], J. Am. Statist. Assoc. 80, 341-342 (1985)
10. Vovk, V.: Defensive forecasting for optimal prediction with expert advice, arXiv:0708.1503v1 (2007)
11. V'yugin, V.V.: Non-Stochastic Infinite and Finite Sequences, Theor. Comp. Science 207, 363-382 (1998)
12. Zvonkin, A.K., Levin, L.A.: The Complexity of Finite Objects and the Algorithmic Concepts of Information and Randomness, Russ. Math. Surv. 25, 83-124 (1970)
