Prediction with Expert Advice in Games with Unbounded One-Step Gains
Authors: Vladimir V. V'yugin
Institute for Information Transmission Problems, Russian Academy of Sciences, Bol'shoi Karetnyi per. 19, Moscow GSP-4, 127994, Russia. E-mail: vyugin@iitp.ru

Abstract. The games of prediction with expert advice are considered in this paper. We present a modification of the Kalai and Vempala algorithm of following the perturbed leader for the case of unrestrictedly large one-step gains. We show that in the general case the cumulative gain of any probabilistic prediction algorithm can be much worse than the gain of some expert of the pool. Nevertheless, we give a lower bound for this cumulative gain in the general case and construct a universal algorithm which has the optimal performance; we also prove that in the case when the one-step gains of the experts of the pool have "limited deviations" the performance of our algorithm is close to the performance of the best expert.

1 Introduction

Expert algorithms are used for online prediction, repeated decision making, or repeated game playing. Any such algorithm is based on a "pool of experts". At each step $t$, each expert gives its recommendation. From these, a "master decision" is formed. After that, losses (or rewards) $s^i_t$ are assigned to each expert $i = 1, \dots, m$ by the environment (or adversary). The master algorithm also receives some loss or reward depending on the master decision. The goal of the master algorithm is to perform almost as well as the best expert in hindsight in the long run.

Prediction with Expert Advice considered in this paper proceeds as follows. We are asked to perform sequential actions at times $t = 1, 2, \dots, T$. At each time step $t$, we observe the results of the actions of the experts in the form of their gains and losses on steps $< t$.
After that, at the beginning of step $t$, Learner makes a decision to follow one of these experts, say Expert $i$. At the end of step $t$, Learner receives the same gain or loss as Expert $i$ at step $t$.

We use notations and definitions from [5] and [7]. Let $s^i_{1:t} = s^i_1 + \dots + s^i_t$ be the cumulative loss of Expert $i$ at time $t$. Given $s^i_{1:t-1}$, $i = 1, \dots, m$, at time $t$, a natural idea to solve the expert problem is "to follow the leader", i.e., to select the expert $i$ which performed best in the past. The following simple example from Kalai and Vempala [7] shows that Learner can perform much worse than each expert: let the current losses of the two experts on steps $t = 1, \dots, 6$ be $s^1_{1,\dots,6} = (0, 1, 0, 1, 0, 1)$ and $s^2_{1,\dots,6} = (\frac{1}{2}, 0, 1, 0, 1, 0)$. The "Follow the Leader" algorithm always chooses the wrong prediction.

The method of following the perturbed leader was discovered by Hannan [4]. Kalai and Vempala [7] rediscovered this method and published a simple proof of the main result of Hannan. They called the algorithm of this type FPL (Following the Perturbed Leader). Hutter and Poland [5] presented further developments of the FPL algorithm for countable classes of experts, arbitrary weights, and adaptive learning rates.

The FPL algorithm outputs the prediction of an expert $i$ which minimizes
$$s^i_{1:t-1} - \frac{1}{\epsilon} \xi^i_t,$$
where $\xi^i_t$, $i = 1, \dots, m$, $t = 1, 2, \dots$, is a sequence of i.i.d. random variables distributed according to the exponential distribution with density $p(t) = e^{-t}$, and $\epsilon$ is a learning rate. Kalai and Vempala [7] show that the expected cumulative loss of the FPL algorithm has the upper bound
$$E(s_{1:t}) \le (1 + \epsilon) \min_{i=1,\dots,n} s^i_{1:t} + \frac{\log n}{\epsilon},$$
where $\epsilon$ is a learning rate and $n$ is the number of experts.
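The Kalai-Vempala example above can be replayed in a few lines. The sketch below is only an illustration (variable names and the tie-breaking rule toward Expert 1 are our assumptions): deterministic Follow-the-Leader pays 5 while the experts themselves pay only 3 and 2.5.

```python
# Follow-the-Leader on the Kalai-Vempala example from the text:
# the leader alternates, so FTL picks the wrong expert at every step t >= 2.
losses = {1: [0, 1, 0, 1, 0, 1],      # Expert 1
          2: [0.5, 0, 1, 0, 1, 0]}    # Expert 2

cum = {1: 0.0, 2: 0.0}   # cumulative losses s^i_{1:t-1}
ftl_loss = 0.0
for t in range(6):
    leader = 1 if cum[1] <= cum[2] else 2   # break ties toward Expert 1
    ftl_loss += losses[leader][t]
    for i in (1, 2):
        cum[i] += losses[i][t]

print(ftl_loss, cum[1], cum[2])   # 5.0 3.0 2.5
```

FTL thus incurs roughly twice the loss of either expert, which is what the perturbation in FPL is designed to prevent.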
In the papers cited above, the loss of each expert $i$ can change at any step $t$ by a bounded quantity, for example, $0 \le s^i_t \le 1$ for all $t$. Poland and Hutter [6] extended this analysis to games with one-step losses upper bounded by an increasing sequence $B_t$ given in advance, i.e., $s_t \le B_t$ for all $t$. Allenberg et al. [1] also considered unbounded losses, but with a different algorithm than in this paper.

In the games considered in this paper the players incur gains (a loss is a negative gain); $s^i_t$ denotes the one-step gain of a player $i$. For practical purposes, the property $0 \le s^i_t \le 1$ seems to be too restrictive. In Appendix A we consider some applications of the results of Sections 2-4 of this paper. We define two financial experts learning the fractional Brownian motion whose one-step gains at any step cannot be restricted in advance. This application motivates our special interest in zero-sum games with unbounded gains in Section 4.

In this paper we present a modification of the Kalai and Vempala algorithm for the case of unrestrictedly large one-step gains not bounded in advance. We show that in the general case the cumulative gain of any probabilistic prediction algorithm can be much worse than the gain of some expert of the pool. Nevertheless, we give a lower bound for the cumulative gain of any probabilistic algorithm in the general case and prove that our universal algorithm has optimal performance; we also prove that in the case when the one-step gains of the experts of the pool have "limited deviations" (in particular, when they are bounded) the performance of our algorithm is close to the performance of the best expert. This result is an improvement of the results mentioned above.
2 Learning in games of two experts with unbounded gains

In this section we give some preliminary results presenting bounds on the performance of the algorithm constructed in Section 3. We consider a simple game $G$ of prediction with expert advice by following one of two experts with unbounded one-step gains. The goal of the master algorithm is to receive a cumulative gain not much worse than the gain of the best expert in hindsight.

At each step $t$ of the game both experts receive the nonnegative one-step gains $s^1_t$ and $s^2_t$, and their cumulative gains after step $t$ are equal to $s^1_{1:t} = s^1_{1:t-1} + s^1_t$ and $s^2_{1:t} = s^2_{1:t-1} + s^2_t$. For simplicity, we consider a variant in which at each step $t$ of the game $G$ only one expert can receive a nonnegative one-step gain $s_t$, while the total gain of the other expert is unchanged, i.e., either $s^1_{1:t} = s^1_{1:t-1} + s_t$ and $s^2_{1:t} = s^2_{1:t-1}$, or $s^2_{1:t} = s^2_{1:t-1} + s_t$ and $s^1_{1:t} = s^1_{1:t-1}$. In the general case the analysis is similar. We also consider only non-degenerate experts (games), i.e., such that $\max\{s^1_{1:t}, s^2_{1:t}\} \to \infty$ as $t \to \infty$.

A probabilistic algorithm of following the leader in the game with two experts is based on a computable function $f$ which, given the cumulative gains $s^1_{1:t-1}$ and $s^2_{1:t-1}$ of the experts in hindsight, outputs the probability of following the first expert, $P\{I = 1\} = f(s^1_{1:t-1}, s^2_{1:t-1})$, and the probability of following the second expert, $P\{I = 2\} = 1 - P\{I = 1\}$. The analysis in the case when these probabilities depend on the whole history of gains is similar. Let two experts be given. The master algorithm works as follows.

Probabilistic algorithm of following the leader.
FOR $t = 1, \dots, T$
  Given the past cumulative gains of the experts $s^1_{1:t-1}$ and $s^2_{1:t-1}$, choose the expert $i \in \{1, 2\}$ with probability $P\{I = i\}$.
  Receive the one-step gains $s^1_t$ and $s^2_t$ of the two experts at step $t$ and define the one-step gain $s_t = s^i_t$ of the master algorithm.
ENDFOR

The following theorem says that if a probabilistic algorithm of following the leader has high performance in games with bounded one-step gains, then its performance in games with unbounded one-step gains can be much worse than the performance of some experts.

Theorem 1. Let $\delta, \delta'$ be arbitrarily close and arbitrarily small positive real numbers such that $\delta' > \delta$, and let for any two non-degenerate experts with bounded one-step gains $s^i_t$, $i = 1, 2$, i.e., such that $0 \le s^i_t \le 1$ for all $t$, a master algorithm have the expected cumulative gain
$$E(s_{1:t}) \ge (1 - \delta) \max_{i=1,2} s^i_{1:t} \quad (1)$$
for all sufficiently large $t$. Then there exist two experts with unbounded one-step gains such that the expected cumulative gain of the master algorithm is bounded from above:
$$E(s_{1:t}) \le \delta' \max_{i=1,2} s^i_{1:t} \quad (2)$$
for infinitely many $t$.

Proof. Let a master algorithm be given, and let $P\{I = 1\} = f(s^1, s^2)$ and $P\{I = 2\} = 1 - P\{I = 1\}$ be the probabilities to choose the best expert from two experts with cumulative gains $s^1, s^2$. The proof of the theorem uses the following lemma.

Lemma 1. Let $\delta, \delta'$ be positive real numbers such that $\delta' > \delta$, and let for any two experts with bounded one-step gains the master algorithm have the expected performance (1) for all sufficiently large $t$. Then for any two real numbers $\tilde s^1$ and $\tilde s^2$ a number $s^1$ exists such that $s^1 \ge \tilde s^1$, $s^1 \ge \tilde s^2$, and $P\{I = 1\} \ge 1 - \delta'$, where $P\{I = 1\} = f(s^1, \tilde s^2)$ (and $P\{I = 2\} = 1 - P\{I = 1\}$).

Proof. Suppose that for some pair $\tilde s^1, \tilde s^2$ of real numbers the contrary statement holds. Then we can construct two experts with cumulative gains $s^1_{1:t-1}, s^2_{1:t-1}$, $t = 1, 2, \dots$,
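The master protocol above can be sketched as follows. This is only an illustration, not part of the paper: the function `f`, the `gains` callback, and all names are our assumptions.

```python
import random

def run_master(f, gains, T, seed=0):
    """Play the two-expert protocol from the text: at each step choose
    Expert 1 with probability f(s1, s2) computed from the past cumulative
    gains. `gains(t, s1, s2)` returns the pair (s1_t, s2_t) of gains."""
    rng = random.Random(seed)
    s1 = s2 = master = 0.0
    for t in range(1, T + 1):
        p1 = f(s1, s2)                  # P{I = 1} = f(s^1_{1:t-1}, s^2_{1:t-1})
        i = 1 if rng.random() < p1 else 2
        g1, g2 = gains(t, s1, s2)       # environment reveals one-step gains
        master += g1 if i == 1 else g2  # master gets the chosen expert's gain
        s1, s2 = s1 + g1, s2 + g2
    return master, s1, s2

# Example: deterministic "follow the leader" (f is an indicator), with
# Expert 1 always gaining 1 and Expert 2 gaining nothing.
m, s1, s2 = run_master(lambda a, b: 1.0 if a >= b else 0.0,
                       lambda t, a, b: (1.0, 0.0), T=10)
print(m, s1, s2)   # 10.0 10.0 0.0
```

The adversarial constructions in Theorem 1 and Proposition 1 are precisely choices of the `gains` callback that react to the probabilities produced by `f`.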
and with step-gains equal to 0 or 1, such that (1) is violated. Define the sequences $s^1_t, s^2_t$, $t = 1, 2, \dots, t_0$, such that $s^1_t, s^2_t$ are equal to 0 or 1 and such that $s^1_{1:t_0} = \tilde s^1$ and $s^2_{1:t_0} = \tilde s^2$ for some $t_0$. After that, define $s^1_t = 1$ and $s^2_t = 0$ for all $t > t_0$. We have $s^1_{1:t-1} > s^2_{1:t-1}$ and $P\{I = 1\} < 1 - \delta'$ for all sufficiently large $t$. Then for the expected one-step gain of the master algorithm,
$$E(s_t) = 1 \cdot P\{I = 1\} + 0 \cdot P\{I = 2\} < 1 - \delta' = s^1_t (1 - \delta')$$
holds for all these $t$. Since $s^1_{1:t} \to \infty$ as $t \to \infty$, we have $E(s_{1:t}) < (1 - \delta) s^1_{1:t}$ for all sufficiently large $t$. This is a contradiction with (1). Hence, for some $t$ we have $s^1_{1:t-1} > s^2_{1:t-1}$ and $P\{I = 1\} \ge 1 - \delta'$, where $P\{I = 1\} = f(s^1_{1:t-1}, s^2_{1:t-1})$. △

We define two experts with unbounded one-step gains as follows. Define $s^1_0 = 0$ and $s^2_0 = 0$. By Lemma 1 a number $s^1 > 0$ exists such that $P\{I = 1\} \ge 1 - \delta$, where $P\{I = 1\} = f(s^1, 0)$. Define $s^1_1 = s^1$ and $s^2_1 = 0$.

Let $t$ be even, and let $s^1_{1:t-1}$ and $s^2_{1:t-1}$ be defined on the previous steps. We will use the induction hypothesis: $s^1_{1:t-1} > s^2_{1:t-1}$ and $P\{I = 1\} \ge 1 - \delta$. By definition this induction hypothesis holds for $t = 2$. Define the one-step gains of experts 1 and 2 at step $t$: $s^1_t = 0$ and $s^2_t = M_t$, where
$$M_t = \frac{E(s_{1:t-1})}{\delta' - \delta}$$
and $E(s_{1:t-1})$ is the mathematical expectation of the cumulative gain of the master algorithm on steps $< t$.

Let $t$ be odd. By Lemma 1 a number $s^1$ exists such that $s^1 \ge s^1_{1:t-1}$, $s^1 \ge s^2_{1:t-1}$, and $P\{I = 1\} \ge 1 - \delta$, where $P\{I = 1\} = f(s^1, s^2_{1:t-1})$. Define $s^1_t = s^1 - s^1_{1:t-1}$ and $s^2_t = 0$. Then $s^1_{1:t} = s^1$ and $s^2_{1:t} = s^2_{1:t-1}$. Evidently, the induction hypothesis is valid after step $t$.

Let us prove that this definition is correct. Let $t$ be even.
By the induction hypothesis, $s^1_{1:t-1} > s^2_{1:t-1}$ and $P\{I = 1\} \ge 1 - \delta$, where $P\{I = 1\} = f(s^1_{1:t-1}, s^2_{1:t-1})$. Then $P\{I = 2\} < \delta$. By definition $s^1_{1:t} = s^1_{1:t-1}$ and $s^2_{1:t} = s^2_{1:t-1} + M_t$. Then we obtain an upper bound for the expected one-step gain of the master algorithm:
$$E(s_t) = s^1_t P\{I = 1\} + s^2_t P\{I = 2\} \le \delta M_t.$$
For the expected cumulative gain, we have
$$E(s_{1:t}) \le E(s_{1:t-1}) + \delta M_t \le \delta' (s^2_{1:t-1} + M_t) = \delta' s^2_{1:t}. \quad (3)$$
Inequality (3) holds for all even steps $t$. △

By decreasing the lower bound on the performance of a probabilistic algorithm for games with bounded one-step gain functions, we can increase it for games with unbounded gain functions. The limit case is given by the following simple example. Evidently, the expected cumulative gain of the probabilistic algorithm which chooses one of the two experts with equal probabilities $\frac{1}{2}$ has the lower bound
$$E(s_{1:t}) = \frac{1}{2} s^1_{1:t} + \frac{1}{2} s^2_{1:t} \ge \frac{1}{2} \max_{i=1,2} s^i_{1:t} \quad (4)$$
for all $t$. The following simple diagonal argument shows that the cumulative gain of any probabilistic algorithm of following the leader can be bigger than this bound for some experts; analogously, it can be smaller for some experts.

Proposition 1. For any $\delta$ such that $0 < \delta < 1$ and for any probabilistic algorithm of following the best expert, two experts exist such that the expected cumulative gain of this algorithm satisfies
$$E(s_{1:t}) \le \frac{1}{2}(1 + \delta) \max_{i=1,2} s^i_{1:t} \quad (5)$$
for all sufficiently large $t$, where $s^1_{1:t}, s^2_{1:t}$ are the cumulative gains of these experts. Analogously, two experts exist such that
$$E(s_{1:t}) \ge \frac{1}{2}(1 - \delta) \max_{i=1,2} s^i_{1:t} \quad (6)$$
for all sufficiently large $t$.

Proof. Given a probabilistic algorithm of following the best expert and $\delta$ such that $0 < \delta < 1$, define recursively the gains of expert 1 and expert 2 at any step $t$ as follows.
Let $s^1_{1:t-1}$ and $s^2_{1:t-1}$ be the cumulative gains of these experts incurred at steps $< t$. Let $M_t = E(s_{1:t-1})/\delta$, where $E(s_{1:t-1})$ is the expected cumulative gain of the master algorithm in the past. If $P\{I = 1\} > \frac{1}{2}$ then define $s^1_t = 0$ and $s^2_t = M_t$; define $s^1_t = M_t$ and $s^2_t = 0$ otherwise. Then
$$E(s_t) = s^1_t P\{I = 1\} + s^2_t P\{I = 2\} \le \frac{1}{2} M_t$$
and
$$E(s_{1:t}) = E(s_{1:t-1}) + E(s_t) \le \frac{1}{2}(1 + \delta) M_t \le \frac{1}{2}(1 + \delta) \max_{i=1,2} s^i_{1:t}$$
for all sufficiently large $t$. To prove (6), define $s^1_t = M_t$ and $s^2_t = 0$ if $P\{I = 1\} > \frac{1}{2}$, and define $s^1_t = 0$ and $s^2_t = M_t$ otherwise. The derivation is analogous to the proof of (5). △

3 Asymptotically optimal algorithm of following the perturbed leader

In this section we show that the bounds (1) and (2) obtained in Theorem 1 can be achieved by some probabilistic algorithm. More precisely, using the method of following the perturbed leader, we construct a universal algorithm such that for any $\delta$ with $0 < \delta < 1$ the lower bound
$$E(s_{1:t}) \ge (1 - \delta) \max_{i=1,2} s^i_{1:t}$$
is valid for all sufficiently large $t$ for any two experts ($i = 1, 2$) with bounded one-step gain functions (and even in a more general case), and, at the same time, for some $\delta' > 0$ the bound
$$E(s_{1:t}) \ge \delta' \max_{i=1,2} s^i_{1:t}$$
is valid for all experts with arbitrary unbounded one-step gain functions. Here $E(s_{1:t})$ is the expected cumulative gain of the master algorithm.

Note that in this section the cumulative gains are always nonnegative: $s^i_{1:t} \ge 0$ for all $t$ and for $i = 1, 2$. In Section 4 we consider the case when the gains can be negative, i.e., the experts can incur losses. Recall that, for simplicity, we suppose that at any step $t$ only one expert can receive a positive one-step gain, i.e., $s^1_t = 0$ or $s^2_t = 0$. We denote $s_t = \max\{s^1_t, s^2_t\}$.
Let $\xi^1_1, \xi^2_1, \xi^1_2, \xi^2_2, \dots$ be a sequence of i.i.d. random variables distributed according to the exponential law with density $p(t) = e^{-t}$. We consider the FPL algorithm with the learning rate
$$\epsilon_{t-1} = \frac{1}{\mu \max\{s^1_{1:t-1}, s^2_{1:t-1}\}}, \quad (7)$$
where $t = 1, 2, \dots$ and $\mu$, $0 < \mu < 1$, is a parameter of the algorithm. We suppose without loss of generality that $s^1_0 = s^2_0 = 1$. By definition the sequence $\epsilon_1, \epsilon_2, \dots$ is non-increasing. The FPL algorithm is defined as follows:

FPL algorithm.
FOR $t = 1, \dots, T$
  Output the prediction of the expert $i = i_{\max}$ which maximizes
  $$s^i_{1:t-1} + \frac{1}{\epsilon_{t-1}} \xi^i_t, \quad (8)$$
  where $i = 1, 2$ and $\epsilon_{t-1}$ is defined by (7).
  Receive the one-step gains $s^i_t$ for the experts $i = 1, 2$, and define the one-step gain $s^{i_{\max}}_t$ of the master algorithm.
ENDFOR

Recall that a game $G$ of two experts is called non-degenerate if $v_t = \max\{s^1_{1:t}, s^2_{1:t}\} \to \infty$ as $t \to \infty$, where $s^i_{1:t}$ is the cumulative gain of the expert $i = 1, 2$ at step $t$. The number
$$\mathrm{Dev}(G) = \limsup_{t \to \infty} \frac{s_t}{v_t}, \quad (9)$$
where $s_t = \max\{s^1_t, s^2_t\}$, is called the deviation of the game $G$. For any game, $\mathrm{Dev}(G) \le 1$ by definition. In any non-degenerate game $G$ with a bounded one-step gain function, i.e., such that $0 \le s_t \le A$ for all $t$ ($A$ a positive real number), $\mathrm{Dev}(G) = 0$.

Theorem 2. For any $\mu$ such that $0 < \mu < 1$, an FPL algorithm can be specified such that for any non-degenerate game of two experts its expected cumulative gain at any step $T$ has the lower bound
$$l_{1:T} \ge e^{-2/\mu}(1 - \mu) \max_{i=1,2} s^i_{1:T}, \quad (10)$$
where $s^i_{1:T}$ is the cumulative gain of the expert $i = 1, 2$. If $\mathrm{Dev}(G) \le \frac{1}{2}\mu\delta$ for some $0 < \delta < 1$ then
$$l_{1:T} \ge (1 - \delta)(1 - \mu) \max_{i=1,2} s^i_{1:T} \quad (11)$$
holds for all sufficiently large $T$.¹

Proof. This theorem will follow from Theorem 3 and Corollary 2 below. In the proof we follow the proof scheme of [5] and [7].
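Rule (8) with learning rate (7) can be simulated directly. The sketch below is ours (function names and the test game are assumptions, not part of the paper); it plays FPL against a game in which Expert 1 gains 1 on every step.

```python
import random

def fpl_two_experts(gains, T, mu=0.5, seed=0):
    """Sketch of the FPL rule from the text: at step t follow the expert
    maximizing s^i_{1:t-1} + xi^i_t / eps_{t-1}, with xi i.i.d. Exp(1)
    perturbations and eps_{t-1} = 1 / (mu * max(s1, s2)), see (7)-(8).
    `gains(t)` returns the pair of nonnegative one-step gains."""
    rng = random.Random(seed)
    s = [1.0, 1.0]          # s^1_0 = s^2_0 = 1, as assumed in the text
    master = 0.0
    for t in range(1, T + 1):
        inv_eps = mu * max(s)        # 1 / eps_{t-1}
        xi = [rng.expovariate(1.0), rng.expovariate(1.0)]
        i = max((0, 1), key=lambda k: s[k] + inv_eps * xi[k])   # rule (8)
        g = gains(t)
        master += g[i]
        s[0] += g[0]
        s[1] += g[1]
    return master, s

# Expert 1 gains 1 on every step, Expert 2 never gains: once Expert 1's
# lead dominates the perturbations, FPL follows it on most steps.
m, s = fpl_two_experts(lambda t: (1.0, 0.0), T=1000, mu=0.5)
print(m / (s[0] - 1.0))   # fraction of Expert 1's gain captured by FPL
```

Note that because the perturbation scale $1/\epsilon_{t-1}$ grows with the current leading gain, FPL keeps exploring at a constant rate here, so the captured fraction stays below 1 even in this easy game.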
△

¹ The optimal value of $\mu$ for (10) is $\mu = 0.618$. Then $l_{1:T} \ge 0.015 \max_{i=1,2} s^i_{1:T}$ in (10) and $l_{1:T} \ge 0.382(1 - \delta) \max_{i=1,2} s^i_{1:T}$ in (11). Comparing these bounds with (4), we reveal a large gap between the bounds (10) and (11). The author does not know whether the lower bound (10) can be increased when $\mu \approx \frac{1}{2}$ in (11).

The analysis of the optimality of the FPL algorithm is based on an intermediate predictor IFPL (Infeasible FPL) with the learning rate
$$\epsilon_t = \frac{1}{\mu v_t}, \quad (12)$$
where $v_t = \max\{s^1_{1:t}, s^2_{1:t}\}$.

IFPL algorithm.
FOR $t = 1, \dots, T$
  Output the prediction of the expert $i = i_{\max}$ with the maximal value of
  $$s^i_{1:t} + \frac{1}{\epsilon_t} \xi^i_t,$$
  where $i = 1, 2$, $\epsilon_t$ is defined by (12), and $\xi^1_t, \xi^2_t$ are independent random variables distributed according to the exponential distribution with density $p(t) = e^{-t}$.
  Receive the one-step gains $s^i_t$ for the experts $i = 1, 2$, and define the one-step gain $s^{i_{\max}}_t$ of the master algorithm.
ENDFOR

The IFPL algorithm predicts with knowledge of $s^1_{1:t}$ and $s^2_{1:t}$ ($\epsilon_t$ is determined by their maximum), both of which may not be available at the beginning of step $t$. Using the unknown value of $\epsilon_t$ (like $s^i_{1:t}$, $i = 1, 2$) is the main peculiarity of our version of IFPL. To distinguish the gains of the FPL and IFPL algorithms, we denote by $s^I_t$ the one-step gain of the FPL algorithm at step $t$ and by $s^J_t$ the one-step gain of the IFPL algorithm. The expected one-step gains of the FPL and IFPL algorithms at step $t$ are denoted $l_t = E_t(s^I_t)$ and $r_t = E_t(s^J_t)$.

Theorem 3. For any $\mu$, $0 < \mu < 1$, the expected one-step gain $l_t$ of the FPL algorithm with learning rate (7) and the expected one-step gain $r_t$ of the IFPL algorithm with learning rate (12) satisfy the inequality
$$l_t \ge e^{-2/\mu} r_t \quad (13)$$
for all $t$. If $\mathrm{Dev}(G) \le \frac{1}{2}\mu\delta$ for some $0 < \delta < 1$ then
$$l_t \ge (1 - \delta) r_t \quad (14)$$
holds for all sufficiently large $t$.

Proof.
For any $t > 0$, denote $\xi^1 = \xi^1_t$, $\xi^2 = \xi^2_t$ and consider the two random variables
$$I = \begin{cases} 1 & \text{if } s^1_{1:t-1} + \frac{1}{\epsilon_{t-1}}\xi^1 > s^2_{1:t-1} + \frac{1}{\epsilon_{t-1}}\xi^2, \\ 2 & \text{otherwise,} \end{cases}
\qquad
J = \begin{cases} 1 & \text{if } s^1_{1:t} + \frac{1}{\epsilon_t}\xi^1 > s^2_{1:t} + \frac{1}{\epsilon_t}\xi^2, \\ 2 & \text{otherwise.} \end{cases}$$
Recall that $v_t = \max\{s^1_{1:t}, s^2_{1:t}\}$ for all $t$. For any real number $r$ we compare the conditional probabilities $P\{I = 1 \mid \xi^2 = r\}$ with $P\{J = 1 \mid \xi^2 = r\}$ and $P\{I = 2 \mid \xi^2 = r\}$ with $P\{J = 2 \mid \xi^2 = r\}$. In our analysis, the nontrivial cases are $s^2_{1:t} = s^2_{1:t-1} + s_t$ and $s^1_{1:t} = s^1_{1:t-1}$, or $s^1_{1:t} = s^1_{1:t-1} + s_t$ and $s^2_{1:t} = s^2_{1:t-1}$, where $s_t > 0$ (we indicate these cases in (16)-(19) below by $\pm$). In this case the following chain of equalities is valid:
$$P\{I = 1 \mid \xi^2 = r\} = P\left\{s^1_{1:t-1} + \tfrac{1}{\epsilon_{t-1}}\xi^1 > s^2_{1:t-1} + \tfrac{1}{\epsilon_{t-1}} r \,\middle|\, \xi^2 = r\right\}$$
$$= P\{\xi^1 > \epsilon_{t-1}(s^2_{1:t-1} - s^1_{1:t-1}) + r \mid \xi^2 = r\}$$
$$= P\{\xi^1 > \epsilon_t(s^2_{1:t-1} - s^1_{1:t-1}) + (\epsilon_{t-1} - \epsilon_t)(s^2_{1:t-1} - s^1_{1:t-1}) + r \mid \xi^2 = r\} \quad (15)$$
$$= e^{-(\epsilon_{t-1} - \epsilon_t)(s^2_{1:t-1} - s^1_{1:t-1})}\, P\left\{\xi^1 > \tfrac{1}{\mu v_t}(s^2_{1:t-1} - s^1_{1:t-1}) + r \,\middle|\, \xi^2 = r\right\} \quad (16)$$
$$= e^{-\left(\frac{1}{\mu v_{t-1}} - \frac{1}{\mu v_t}\right)(s^2_{1:t-1} - s^1_{1:t-1}) \pm \frac{s_t}{\mu v_t}}\, P\left\{\xi^1 > \tfrac{1}{\mu v_t}(s^2_{1:t-1} \pm s_t - s^1_{1:t-1}) + r \,\middle|\, \xi^2 = r\right\} \quad (17)$$
$$= e^{\frac{s_t}{\mu v_t}\left(\gamma_t \frac{s^1_{1:t-1} - s^2_{1:t-1}}{v_{t-1}} \pm 1\right)}\, P\left\{\xi^1 > \tfrac{1}{\mu v_t}(s^2_{1:t} - s^1_{1:t}) + r \,\middle|\, \xi^2 = r\right\} \quad (18)$$
$$= e^{\frac{s_t}{\mu v_t}\left(\gamma_t \frac{s^1_{1:t-1} - s^2_{1:t-1}}{v_{t-1}} \pm 1\right)}\, P\{J = 1 \mid \xi^2 = r\}. \quad (19)$$
Here we have used twice, in (15)-(16) and in (16)-(17), the equality $P\{\xi > a + b\} = e^{-b} P\{\xi > a\}$ (valid for nonnegative $a$ and $a + b$) for a random variable $\xi$ distributed according to the exponential law; we also used the equality $v_t - v_{t-1} = \gamma_t s_t$, where $0 \le \gamma_t \le 1$, in (18). The expression in parentheses in the exponent of (19) is bounded:
$$2 \ge \gamma_t \frac{s^1_{1:t-1} - s^2_{1:t-1}}{v_{t-1}} \pm 1 \ge -2. \quad (20)$$
These bounds follow from the inequalities $s^i_{1:t-1}/v_{t-1} \le 1$ and $s^i_{1:t-1} \ge 0$ for all $t$ and for $i = 1, 2$.
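The shift identity $P\{\xi > a + b\} = e^{-b} P\{\xi > a\}$ used in (15)-(17) is easy to check empirically for Exp(1). A Monte Carlo sketch (sample size, seed, and the values of $a, b$ are arbitrary choices of ours):

```python
import math
import random

# Empirical check of the shift identity used in (15)-(17):
# for xi ~ Exp(1) and a, a+b >= 0, P{xi > a+b} = e^(-b) * P{xi > a}.
rng = random.Random(1)
samples = [rng.expovariate(1.0) for _ in range(200_000)]

def tail(x):
    """Empirical tail probability P{xi > x}."""
    return sum(s > x for s in samples) / len(samples)

a, b = 0.7, 0.5
lhs = tail(a + b)
rhs = math.exp(-b) * tail(a)
print(abs(lhs - rhs))   # small Monte Carlo error
```

This memoryless scaling of the exponential tail is exactly what turns the threshold shifts in the chain above into multiplicative factors.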
We also used the inequality $s_t/v_t \le 1$ for all $t$. Therefore,
$$e^{2/\mu} P\{J = 1 \mid \xi^2 = r\} \ge P\{I = 1 \mid \xi^2 = r\} \ge e^{-2/\mu} P\{J = 1 \mid \xi^2 = r\}. \quad (21)$$
Since inequality (21) holds for all $r$, it also holds unconditionally:
$$e^{2/\mu} P\{J = 1\} \ge P\{I = 1\} \ge e^{-2/\mu} P\{J = 1\}. \quad (22)$$
Analogously, we obtain
$$e^{2/\mu} P\{J = 2\} \ge P\{I = 2\} \ge e^{-2/\mu} P\{J = 2\} \quad (23)$$
for all $t = 1, 2, \dots$. If $\mathrm{Dev}(G) \le \frac{1}{2}\mu\delta$, then for all sufficiently large $t$ the exponential factor in (19) is bounded from below by
$$e^{-\frac{2 s_t}{\mu v_t}} \ge e^{-\delta} \ge 1 - \delta,$$
from which (14) follows. From (22) and (23) we obtain the lower bound (13):
$$l_t = E(s^I_t) = s^1_t P\{I = 1\} + s^2_t P\{I = 2\} \ge s^1_t e^{-2/\mu} P\{J = 1\} + s^2_t e^{-2/\mu} P\{J = 2\} = e^{-2/\mu} E(s^J_t) = e^{-2/\mu} r_t. \quad (24)$$
△

The connection between the expected cumulative gain of the IFPL algorithm, $r_{1:T} = \sum_{t=1}^T r_t$, and the expected cumulative gain of the FPL algorithm, $l_{1:T} = \sum_{t=1}^T l_t$, is given in the following corollary.

Corollary 1. For any $\mu$, $0 < \mu < 1$, the expected cumulative gains of the IFPL and FPL algorithms with the parameters defined in Theorem 3 satisfy the inequality
$$l_{1:T} \ge e^{-2/\mu} r_{1:T} \quad (25)$$
for all $T$. If $\mathrm{Dev}(G) \le \frac{1}{2}\mu\delta$ for some $0 < \delta < 1$ then $l_{1:T} \ge (1 - \delta) r_{1:T}$ holds for all sufficiently large $T$.

The second bound also holds for games with unbounded one-step gains, and so it is an improvement of the results of [7] and [5]. The following theorem, which is an analogue of a result from [7], gives a bound for the IFPL algorithm.

Theorem 4. The expected cumulative gain of the IFPL algorithm with the learning rate (12) is bounded by
$$r_{1:T} \ge \max_{i=1,2} s^i_{1:T} - \frac{1}{\epsilon_T} \quad (26)$$
for all $T$.

The proof is along the lines of the proof from [5] (which is a refinement of the proof from [7]).
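Theorem 4 can be probed numerically. The sketch below is entirely our own construction (an alternating game with growing gains, arbitrary names and seed): it runs the infeasible IFPL rule, which sees the current cumulative gains, and compares the realized regret with $1/\epsilon_T = \mu v_T$. Since (26) bounds the expected regret, a single run is only typically, not always, within the bound.

```python
import random

def ifpl_regret(gains, T, mu=0.5, seed=3):
    """Run the (infeasible) IFPL rule: at step t it already knows the
    current cumulative gains s^i_{1:t} and uses eps_t = 1/(mu * v_t)."""
    rng = random.Random(seed)
    s = [0.0, 0.0]
    r = 0.0                                   # cumulative gain of IFPL
    for t in range(1, T + 1):
        g = gains(t)
        s_next = [s[0] + g[0], s[1] + g[1]]
        v = max(s_next)                       # v_t, known only to IFPL
        xi = [rng.expovariate(1.0), rng.expovariate(1.0)]
        i = max((0, 1), key=lambda k: s_next[k] + mu * v * xi[k])
        r += g[i]
        s = s_next
    return max(s) - r, mu * max(s)            # regret and the bound 1/eps_T

# Alternate growing gains between the experts.
reg, bound = ifpl_regret(lambda t: (t, 0.0) if t % 2 else (0.0, t), T=200)
print(reg <= bound)
```

In this unbounded game the perturbation scale $\mu v_t$ dwarfs the gap between the experts, so IFPL effectively hedges between them and stays well within the bound.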
In this proof let $s_t = (s^1_t, s^2_t)$ be the vector of one-step gains and $s_{1:t} = (s^1_{1:t}, s^2_{1:t})$ the vector of cumulative gains of the two experts; also let $\xi_t$ be the vector whose coordinates are the random variables $\xi^1_t$ and $\xi^2_t$. Define $\epsilon_0 = \infty$ and $\tilde s_{1:t} = s_{1:t} + \frac{1}{\epsilon_t}\xi_t$ for $t = 1, 2, \dots$. Consider the one-step gains
$$\tilde s_t = s_t + \xi_t \left(\frac{1}{\epsilon_t} - \frac{1}{\epsilon_{t-1}}\right)$$
for the moment. For any vector $s$ and a unit vector $d$, denote $M(s) = \operatorname{argmax}_{d \in D}\{d \circ s\}$, where $D = \{(0,1)^T, (1,0)^T\}$ is the set of the two unit vectors of dimension 2 and $\circ$ is the inner product of two vectors. We first show that
$$\sum_{t=1}^T M(\tilde s_{1:t}) \circ \tilde s_t \ge M(\tilde s_{1:T}) \circ \tilde s_{1:T}. \quad (27)$$
For $T = 1$ this is obvious. For the induction step from $T - 1$ to $T$ we need to show that
$$M(\tilde s_{1:T}) \circ \tilde s_T \ge M(\tilde s_{1:T}) \circ \tilde s_{1:T} - M(\tilde s_{1:T-1}) \circ \tilde s_{1:T-1}.$$
This follows from $\tilde s_{1:T} = \tilde s_{1:T-1} + \tilde s_T$ and $M(\tilde s_{1:T}) \circ \tilde s_{1:T-1} \le M(\tilde s_{1:T-1}) \circ \tilde s_{1:T-1}$. We rewrite (27) as follows:
$$\sum_{t=1}^T M(\tilde s_{1:t}) \circ s_t \ge M(\tilde s_{1:T}) \circ \tilde s_{1:T} - \sum_{t=1}^T M(\tilde s_{1:t}) \circ \xi_t \left(\frac{1}{\epsilon_t} - \frac{1}{\epsilon_{t-1}}\right). \quad (28)$$
By the definition of $M$ we have
$$M(\tilde s_{1:T}) \circ \tilde s_{1:T} \ge M(s_{1:T}) \circ \left(s_{1:T} + \frac{\xi_T}{\epsilon_T}\right) = \max_d \{d \circ s_{1:T}\} + M(s_{1:T}) \circ \frac{\xi_T}{\epsilon_T}. \quad (29)$$
The expectation of the last term in (29) is equal to $\frac{1}{\epsilon_T}$. We also have
$$\sum_{t=1}^T \left(\frac{1}{\epsilon_t} - \frac{1}{\epsilon_{t-1}}\right) M(\tilde s_{1:t}) \circ \xi_t \le \sum_{t=1}^T \left(\frac{1}{\epsilon_t} - \frac{1}{\epsilon_{t-1}}\right) M(\xi_t) \circ \xi_t. \quad (30)$$
We have $P\{\max\{\xi^1_t, \xi^2_t\} > y\} \le P\{\xi^1_t > y\} + P\{\xi^2_t > y\} = 2e^{-y}$. Since
$$E(M(\xi_t) \circ \xi_t) = E(\max\{\xi^1_t, \xi^2_t\}) \le \int_0^\infty 2e^{-y}\,dy = 2,$$
the expectation of the right-hand side of (30) has the upper bound $\frac{2}{\epsilon_T}$ (the sum telescopes, and $1/\epsilon_0 = 0$). Combining the bounds (28)-(30), we obtain (26). △

Corollary 2. Let $\mu$, $0 < \mu < 1$, be given. If the game of two experts is non-degenerate then the expected cumulative gain of the IFPL algorithm is bounded by
$$r_{1:T} \ge (1 - \mu) \max_{i=1,2} s^i_{1:T}.$$
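The proof above bounds $E(M(\xi_t) \circ \xi_t) = E(\max\{\xi^1_t, \xi^2_t\})$ by 2 via the union bound; the exact value for two i.i.d. Exp(1) variables is $1 + \frac{1}{2} = \frac{3}{2}$. A quick Monte Carlo check (sample size and seed are arbitrary choices of ours):

```python
import random

# E[max(xi1, xi2)] for i.i.d. Exp(1) perturbations: the text bounds it by 2
# using P{max > y} <= 2 e^{-y}; the exact value is the harmonic number 1.5.
rng = random.Random(2)
n = 200_000
est = sum(max(rng.expovariate(1.0), rng.expovariate(1.0))
          for _ in range(n)) / n
print(est)   # close to 1.5, safely below the bound 2 used in the text
```

The looser constant 2 costs only a constant factor in the $2/\epsilon_T$ term of the proof, so the union bound suffices there.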
4 Zero-sum games

We consider a simplest example of the game of prediction with expert advice with arbitrary positive and negative one-step gains and losses. We apply these results in Appendix A.

We consider a game $G$ of two experts with zero sum, i.e., $s^1_t = -s^2_t$ at each step $t$ of the game. If a one-step gain is negative it is called a loss. There are no restrictions on the absolute values of $s^1_t$. Define the volume of the game at step $t$:
$$V_t = \sum_{j=1}^{t} |s^1_j|.$$
A game with zero sum is called non-degenerate if $\lim_{t \to \infty} V_t = \infty$. Analogously to (9), we consider the deviation of the game $G$ with zero sum:
$$\mathrm{Dev}(G) = \limsup_{t \to \infty} \frac{s_t}{V_t},$$
where $s_t = |s^1_t|$ and $V_t$ is the volume of the game.

Evidently, the expected cumulative gain of the algorithm which chooses one of the two experts with probability $\frac{1}{2}$ equals zero. The following proposition is an analogue of Proposition 1.

Proposition 2. For any probabilistic algorithm of following the best expert, two experts exist such that the expected cumulative gain of this algorithm satisfies $E(s_{1:t}) \le 0$, and two experts exist such that $E(s_{1:t}) \ge 0$ for all $t$.

Proof. If $P\{I = 1\} > \frac{1}{2}$ define $s^1_t = 1$, $s^2_t = -1$, and define $s^1_t = -1$, $s^2_t = 1$ otherwise. The estimates are analogous to those given in the proof of Proposition 1. △

The following theorem, which is an analogue of Theorem 1 for games with zero sum, shows that if a probabilistic algorithm of following the leader has high performance in games with bounded one-step gains, then its expected cumulative gain in some games with unbounded one-step expert gains can be arbitrarily negative.

Theorem 5. Let $L_t$ be any sequence of positive real numbers, $t = 1, 2, \dots$. Let $\delta, \delta'$ be arbitrarily close and arbitrarily small positive real numbers such that $\delta > \delta'$, and let for any two experts with bounded one-step gains $s_t$, i.e.,
such that $0 \le s_t \le 1$ for all $t$, the expected cumulative gain of the master algorithm have the lower bound
$$E(s_{1:t}) \ge (1 - \delta) |s^1_{1:t}| \quad (31)$$
for all sufficiently large $t$. Then there exist two non-degenerate experts with unbounded one-step gains such that the expected performance of the master algorithm is bounded from above,
$$E(s_{1:t}) \le 2\delta' |s^1_{1:t}| - (1 - 2\delta') V_t, \quad (32)$$
and such that $V_t \ge L_t$ for infinitely many $t$, where $V_t$ is the volume of the game.

Proof. The proof is similar to the proof of Theorem 1. It uses a modified version of Lemma 1, which is also valid for negative gains with some evident modifications.²

Let a master algorithm be given. We define two experts with unbounded one-step gains as follows. Define $s^1_1 = s^2_1 = 0$. By the modified version of Lemma 1 a number $s^1$ exists such that $s^1 > 0$ and $P\{I = 1\} \ge 1 - \delta$, where $P\{I = 1\} = f(s^1, -s^1)$.

Let $t$ be even, and let $s^1_{1:t-1}$ and $s^2_{1:t-1} = -s^1_{1:t-1}$ be defined on the previous steps. We will use the induction hypothesis: $s^1_{1:t-1} > 0$ and $P\{I = 1\} \ge 1 - \delta$, where $P\{I = 1\} = f(s^1_{1:t-1}, s^2_{1:t-1})$. Define the one-step gains of experts 1 and 2: $s^1_t = -M_t$ and $s^2_t = M_t$, where
$$M_t = \max\left\{ \frac{|E(s_{1:t-1})|}{2(\delta - \delta')},\; L_t,\; \frac{V_{t-1}}{\delta} \right\}.$$

Let $t$ be odd. By the modified version of Lemma 1 a number $s^1$ exists such that $s^1 > |s^1_{1:t-1}|$ and $P\{I = 1\} \ge 1 - \delta$, where $P\{I = 1\} = f(s^1, -s^1)$. Define $s^1_t = s^1 - s^1_{1:t-1}$; then $s^1_{1:t} = s^1$, and $s^2_t = -s^1_t$. Evidently, the induction hypothesis is valid after an odd step $t$.

Let us prove that this construction is correct. Let $t$ be even. Then by the induction hypothesis $s^1_{1:t-1} > 0$ and $P\{I = 1\} \ge 1 - \delta$ (and $P\{I = 2\} < \delta$). By definition $s^1_{1:t} = s^1_{1:t-1} - M_t$ and $s^2_{1:t} = s^2_{1:t-1} + M_t$.
The expected one-step gain of the master algorithm is bounded:
$$E(s_t) = s^1_t P\{I = 1\} + s^2_t P\{I = 2\} \le -(1 - \delta) M_t + \delta M_t = -(1 - 2\delta) M_t.$$
By definition $L_t \le M_t \le V_t = V_{t-1} + M_t \le (1 + \delta) M_t$. Then
$$E(s_{1:t}) \le E(s_{1:t-1}) - (1 - 2\delta) M_t \le -(1 - 2\delta') M_t = -M_t + 2\delta' M_t \le -(1 - \delta) V_t + 2\delta' |s^1_{1:t}|$$
for all even steps $t$. △

We consider non-degenerate games, i.e., such that $V_t$ is unbounded. To obtain the lower bounds, we reduce our zero-sum game to a game with non-negative one-step gains. Define the one-step gains of new experts $\tilde s^i_t = s^i_t + |s^1_t|$ for $i = 1, 2$. Then $\tilde s^i_t \ge 0$ for all $t$, and $\tilde s^1_t = 0$ or $\tilde s^2_t = 0$ for all $t$. By definition $\tilde s^i_{1:t} = s^i_{1:t} + V_t$ for $i = 1, 2$, where $V_t$ is the volume of the initial game. Evidently, the FPL and IFPL algorithms defined in Section 3 make the same choices for the experts of both types. The expected one-step gains of the master algorithm for the experts of both types satisfy
$$\tilde l_t = s^1_t P\{I = 1\} + s^2_t P\{I = 2\} + |s^1_t|,$$
which implies the equality $\tilde l_{1:t} = l_{1:t} + V_t$ for the expected cumulative gains. The analogous equalities hold for $\tilde r_t, \tilde r_{1:t}$ and $r_t, r_{1:t}$. The following theorem is a corollary of Theorem 2.

² The modified version of Lemma 1 looks as follows: Let $\delta, \delta'$ be positive real numbers such that $\delta' > \delta$, and let for any two experts with bounded one-step gains (31) hold for all sufficiently large $t$. Then for any number $\tilde s^1$ a number $s^1 > 0$ exists such that $s^1 \ge \tilde s^1$ and $P\{I = 1\} \ge 1 - \delta'$, where $P\{I = 1\} = f(s^1, -s^1)$.

Theorem 6.
For any $\mu$ such that $0 < \mu < 1$, an FPL algorithm can be specified such that for any non-degenerate game of two experts its expected cumulative gain at any step $T$ has the lower bound
$$l_{1:T} \ge e^{-2/\mu}(1 - \mu)\,|s^1_{1:T}| - V_T \left(1 - e^{-2/\mu}(1 - \mu)\right), \quad (33)$$
where $s^1_{1:T}$ is the cumulative gain of the first expert and $V_T$ is the volume of the game at step $T$. If $\mathrm{Dev}(G) \le \frac{1}{2}\mu\delta$ for some $0 < \delta < 1$ then
$$l_{1:T} \ge (1 - \delta)(1 - \mu)\,|s^1_{1:T}| - (\delta + \mu) V_T \quad (34)$$
holds for all sufficiently large $T$.

Proof. This theorem follows from Theorem 2 and the relations between the one-step gains $\tilde s^i_t$ and $s^i_t$, $i = 1, 2$, of the two types of experts. △

Remark. In the case when $\mathrm{Dev}(G) \le \frac{1}{4}\mu\delta$, the bound (34) can be improved for some $t$ if we replace the learning rate (7) of Section 3 by $\epsilon_{t-1} = \frac{1}{\mu \max_j \dots}$

Expert 1 uses the hypothesis that the Hurst exponent is $> \frac{1}{2}$ (a smoother trend). Expert 2 uses the hypothesis that the Hurst exponent is $< \frac{1}{2}$ (volatility is high). It is reasonable to derandomize the FPL algorithm for this financial game. For that, the investor must follow both experts' strategies simultaneously, holding $P\{I = 1\} C^1_t + P\{I = 2\} C^2_t$ shares of a stock at any step $t$.³ In this case Theorem 6 holds, where the expected gain at step $t$ is replaced by the pure gain $s_t = P\{I = 1\} C^1_t \Delta S_t + P\{I = 2\} C^2_t \Delta S_t$.⁴

³ We suppose that this price is also valid at the beginning of the period $t + 1$.
⁴ Analogously we can derandomize all the probabilistic games of this paper if we allow Learner to receive a given fraction of the gain of an expert.

References

1. Allenberg C., Auer P., Györfi L., Ottucsák G.: Hannan consistency in on-line learning in case of unbounded losses under partial monitoring. LNCS 4264, 229-243. Springer-Verlag, Berlin Heidelberg, 2006.
2. Delbaen F., Schachermayer W.: A general version of the fundamental theorem of asset pricing. Mathematische Annalen, 300 (1994), 463-520.
3. Cheridito P.: Arbitrage in fractional Brownian motion models. Finance and Stochastics, 7(4) (2003), 533-553.
4. Hannan J.: Approximation to Bayes risk in repeated play. In M. Dresher, A.W. Tucker, and P. Wolfe, editors, Contributions to the Theory of Games 3, 97-139. Princeton University Press, 1957.
5. Hutter M., Poland J.: Prediction with expert advice by following the perturbed leader for general weights. In S. Ben-David, J. Case, A. Maruoka (Eds.): ALT 2004, LNAI 3244, 279-293. Springer-Verlag, Berlin Heidelberg, 2004.
6. Poland J., Hutter M.: Defensive universal learning with experts. In S. Jain, H.U. Simon, E. Tomita (Eds.): ALT 2005, LNAI 3734, 356-370. Springer-Verlag, Berlin Heidelberg, 2005.
7. Kalai A., Vempala S.: Efficient algorithms for online decisions. In Proceedings of the 16th Annual Conference on Learning Theory (COLT 2003), LNAI, 506-521. Springer, Berlin, 2003. Extended version in Journal of Computer and System Sciences, 71 (2005), 291-307.
8. Rogers L.C.G.: Arbitrage with fractional Brownian motion. Mathematical Finance, 7 (1997), 95-105.
9. Vovk V.: A game-theoretic explanation of the √dt effect. Working paper #5, 2003, http://www.probabilityandfinance.com