Prediction with Expert Advice in Games with Unbounded One-Step Gains
Authors: Vladimir V. V'yugin
Institute for Information Transmission Problems, Russian Academy of Sciences, Bol'shoi Karetnyi per. 19, Moscow GSP-4, 127994, Russia. E-mail: vyugin@iitp.ru

Abstract. The games of prediction with expert advice are considered in this paper. We present a modification of the Kalai and Vempala algorithm of following the perturbed leader for the case of unrestrictedly large one-step gains. We show that in the general case the cumulative gain of any probabilistic prediction algorithm can be much worse than the gain of some expert of the pool. Nevertheless, we give a lower bound for this cumulative gain in the general case and construct a universal algorithm which has the optimal performance; we also prove that in the case when the one-step gains of the experts of the pool have "limited deviations" the performance of our algorithm is close to the performance of the best expert.

1 Introduction

Expert algorithms are used for online prediction, repeated decision making, or repeated game playing. Any such algorithm is based on a "pool of experts". At each step $t$, each expert gives its recommendation. From these, a "master decision" is formed. After that, losses (or rewards) $s^i_t$ are assigned to each expert $i = 1, \dots, m$ by the environment (or adversary). The master algorithm also receives some loss or reward depending on the master decision. The goal of the master algorithm is to perform almost as well as the best expert in hindsight in the long run.

Prediction with Expert Advice considered in this paper proceeds as follows. We are asked to perform sequential actions at times $t = 1, 2, \dots, T$. At each time step $t$, we observe the results of the actions of the experts in the form of their gains and losses on steps $< t$.
After that, at the beginning of step $t$, Learner makes a decision to follow one of these experts, say Expert $i$. At the end of step $t$, Learner receives the same gain or loss as Expert $i$ at step $t$.

We use notations and definitions from [5] and [7]. Let $s^i_{1:t} = s^i_1 + \dots + s^i_t$ be the cumulative loss of Expert $i$ at time $t$. Given $s^i_{1:t-1}$, $i = 1, \dots, m$, at time $t$, a natural idea to solve the expert problem is "to follow the leader", i.e., to select the expert $i$ which performed best in the past. The following simple example from Kalai and Vempala [7] shows that Learner can perform much worse than each expert: let the current losses of the two experts on steps $t = 1, \dots, 6$ be $s^1_{1,\dots,6} = (0, 1, 0, 1, 0, 1)$ and $s^2_{1,\dots,6} = (\frac{1}{2}, 0, 1, 0, 1, 0)$. The "Follow the Leader" algorithm always chooses the wrong prediction.

The method of following the perturbed leader was discovered by Hannan [4]. Kalai and Vempala [7] rediscovered this method and published a simple proof of the main result of Hannan. They called the algorithm of this type FPL (Following the Perturbed Leader). Hutter and Poland [5] presented further developments of the FPL algorithm for countable classes of experts, arbitrary weights, and adaptive learning rates.

The FPL algorithm outputs the prediction of an expert $i$ which minimizes
$$s^i_{1:t-1} - \frac{1}{\epsilon} \xi^i_t,$$
where $\xi^i_t$, $i = 1, \dots, m$, $t = 1, 2, \dots$, is a sequence of i.i.d. random variables distributed according to the exponential distribution with density $p(t) = e^{-t}$, and $\epsilon$ is a learning rate. Kalai and Vempala [7] show that the expected cumulative loss of the FPL algorithm has the upper bound
$$E(s_{1:t}) \le (1 + \epsilon) \min_{i=1,\dots,n} s^i_{1:t} + \frac{\log n}{\epsilon},$$
where $\epsilon$ is a learning rate and $n$ is the number of experts.
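The Kalai-Vempala example above can be replayed in a few lines. The sketch below is only an illustration (variable names and the tie-breaking rule toward Expert 1 are our assumptions): deterministic Follow-the-Leader pays 5 while the experts themselves pay only 3 and 2.5.

```python
# Follow-the-Leader on the Kalai-Vempala example from the text:
# the leader alternates, so FTL picks the wrong expert at every step t >= 2.
losses = {1: [0, 1, 0, 1, 0, 1],      # Expert 1
          2: [0.5, 0, 1, 0, 1, 0]}    # Expert 2

cum = {1: 0.0, 2: 0.0}   # cumulative losses s^i_{1:t-1}
ftl_loss = 0.0
for t in range(6):
    leader = 1 if cum[1] <= cum[2] else 2   # break ties toward Expert 1
    ftl_loss += losses[leader][t]
    for i in (1, 2):
        cum[i] += losses[i][t]

print(ftl_loss, cum[1], cum[2])   # 5.0 3.0 2.5
```

FTL thus incurs roughly twice the loss of either expert, which is what the perturbation in FPL is designed to prevent.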
In the papers cited above, the loss of each expert $i$ can change at any step $t$ by a bounded quantity, for example, $0 \le s^i_t \le 1$ for all $t$. Poland and Hutter [6] extended this analysis to games with one-step losses upper bounded by an increasing sequence $B_t$ given in advance, i.e., $s_t \le B_t$ for all $t$. Allenberg et al. [1] also considered unbounded losses, but with a different algorithm than in this paper.

In the games considered in this paper the players incur gains (a loss is a negative gain); $s^i_t$ denotes the one-step gain of a player $i$. For practical purposes, the property $0 \le s^i_t \le 1$ seems to be too restrictive. In Appendix A we consider some applications of the results of Sections 2-4 of this paper. We define two financial experts learning the fractional Brownian motion whose one-step gains at any step cannot be restricted in advance. This application motivates our special interest in zero-sum games with unbounded gains in Section 4.

In this paper we present a modification of the Kalai and Vempala algorithm for the case of unrestrictedly large one-step gains not bounded in advance. We show that in the general case the cumulative gain of any probabilistic prediction algorithm can be much worse than the gain of some expert of the pool. Nevertheless, we give a lower bound for the cumulative gain of any probabilistic algorithm in the general case and prove that our universal algorithm has optimal performance; we also prove that in the case when the one-step gains of the experts of the pool have "limited deviations" (in particular, when they are bounded) the performance of our algorithm is close to the performance of the best expert. This result is an improvement of the results mentioned above.
2 Learning in games of two experts with unbounded gains

In this section we give some preliminary results presenting bounds on the performance of the algorithm constructed in Section 3. We consider a simple game $G$ of prediction with expert advice by following one of two experts with unbounded one-step gains. The goal of the master algorithm is to receive a cumulative gain not much worse than the gain of the best expert in hindsight.

At each step $t$ of the game both experts receive the nonnegative one-step gains $s^1_t$ and $s^2_t$, and their cumulative gains after step $t$ are equal to $s^1_{1:t} = s^1_{1:t-1} + s^1_t$ and $s^2_{1:t} = s^2_{1:t-1} + s^2_t$. For simplicity, we consider a variant in which at each step $t$ of the game $G$ only one expert can receive a nonnegative one-step gain $s_t$, while the total gain of the other expert is unchanged, i.e., either $s^1_{1:t} = s^1_{1:t-1} + s_t$ and $s^2_{1:t} = s^2_{1:t-1}$, or $s^2_{1:t} = s^2_{1:t-1} + s_t$ and $s^1_{1:t} = s^1_{1:t-1}$. In the general case the analysis is similar. We also consider only non-degenerate experts (games), i.e., such that $\max\{s^1_{1:t}, s^2_{1:t}\} \to \infty$ as $t \to \infty$.

A probabilistic algorithm of following the leader in the game with two experts is based on a computable function $f$ which, given the cumulative gains $s^1_{1:t-1}$ and $s^2_{1:t-1}$ of the experts in hindsight, outputs the probability of following the first expert, $P\{I = 1\} = f(s^1_{1:t-1}, s^2_{1:t-1})$, and the probability of following the second expert, $P\{I = 2\} = 1 - P\{I = 1\}$. The analysis in the case when these probabilities depend on the whole history of gains is similar. Let two experts be given. The master algorithm works as follows.

Probabilistic algorithm of following the leader.
FOR $t = 1, \dots, T$
  Given the past cumulative gains of the experts $s^1_{1:t-1}$ and $s^2_{1:t-1}$, choose the expert $i \in \{1, 2\}$ with probability $P\{I = i\}$.
  Receive the one-step gains $s^1_t$ and $s^2_t$ of the two experts at step $t$ and define the one-step gain $s_t = s^i_t$ of the master algorithm.
ENDFOR

The following theorem says that if a probabilistic algorithm of following the leader has high performance in games with bounded one-step gains, then its performance in games with unbounded one-step gains can be much worse than the performance of some experts.

Theorem 1. Let $\delta, \delta'$ be arbitrarily close and arbitrarily small positive real numbers such that $\delta' > \delta$, and let for any two non-degenerate experts with bounded one-step gains $s^i_t$, $i = 1, 2$, i.e., such that $0 \le s^i_t \le 1$ for all $t$, a master algorithm have the expected cumulative gain
$$E(s_{1:t}) \ge (1 - \delta) \max_{i=1,2} s^i_{1:t} \quad (1)$$
for all sufficiently large $t$. Then there exist two experts with unbounded one-step gains such that the expected cumulative gain of the master algorithm is bounded from above:
$$E(s_{1:t}) \le \delta' \max_{i=1,2} s^i_{1:t} \quad (2)$$
for infinitely many $t$.

Proof. Let a master algorithm be given, and let $P\{I = 1\} = f(s^1, s^2)$ and $P\{I = 2\} = 1 - P\{I = 1\}$ be the probabilities to choose the best expert from two experts with cumulative gains $s^1, s^2$. The proof of the theorem uses the following lemma.

Lemma 1. Let $\delta, \delta'$ be positive real numbers such that $\delta' > \delta$, and let for any two experts with bounded one-step gains the master algorithm have the expected performance (1) for all sufficiently large $t$. Then for any two real numbers $\tilde s^1$ and $\tilde s^2$ a number $s^1$ exists such that $s^1 \ge \tilde s^1$, $s^1 \ge \tilde s^2$, and $P\{I = 1\} \ge 1 - \delta'$, where $P\{I = 1\} = f(s^1, \tilde s^2)$ (and $P\{I = 2\} = 1 - P\{I = 1\}$).

Proof. Suppose that for some pair $\tilde s^1, \tilde s^2$ of real numbers the contrary statement holds. Then we can construct two experts with cumulative gains $s^1_{1:t-1}, s^2_{1:t-1}$, $t = 1, 2, \dots$,
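The master protocol above can be sketched as follows. This is only an illustration, not part of the paper: the function `f`, the `gains` callback, and all names are our assumptions.

```python
import random

def run_master(f, gains, T, seed=0):
    """Play the two-expert protocol from the text: at each step choose
    Expert 1 with probability f(s1, s2) computed from the past cumulative
    gains. `gains(t, s1, s2)` returns the pair (s1_t, s2_t) of gains."""
    rng = random.Random(seed)
    s1 = s2 = master = 0.0
    for t in range(1, T + 1):
        p1 = f(s1, s2)                  # P{I = 1} = f(s^1_{1:t-1}, s^2_{1:t-1})
        i = 1 if rng.random() < p1 else 2
        g1, g2 = gains(t, s1, s2)       # environment reveals one-step gains
        master += g1 if i == 1 else g2  # master gets the chosen expert's gain
        s1, s2 = s1 + g1, s2 + g2
    return master, s1, s2

# Example: deterministic "follow the leader" (f is an indicator), with
# Expert 1 always gaining 1 and Expert 2 gaining nothing.
m, s1, s2 = run_master(lambda a, b: 1.0 if a >= b else 0.0,
                       lambda t, a, b: (1.0, 0.0), T=10)
print(m, s1, s2)   # 10.0 10.0 0.0
```

The adversarial constructions in Theorem 1 and Proposition 1 are precisely choices of the `gains` callback that react to the probabilities produced by `f`.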
and with step-gains equal to 0 or 1, such that (1) is violated. Define the sequences $s^1_t, s^2_t$, $t = 1, 2, \dots, t_0$, such that $s^1_t, s^2_t$ are equal to 0 or 1 and such that $s^1_{1:t_0} = \tilde s^1$ and $s^2_{1:t_0} = \tilde s^2$ for some $t_0$. After that, define $s^1_t = 1$ and $s^2_t = 0$ for all $t > t_0$. We have $s^1_{1:t-1} > s^2_{1:t-1}$ and $P\{I = 1\} < 1 - \delta'$ for all sufficiently large $t$. Then for the expected one-step gain of the master algorithm,
$$E(s_t) = 1 \cdot P\{I = 1\} + 0 \cdot P\{I = 2\} < 1 - \delta' = s^1_t (1 - \delta')$$
holds for all these $t$. Since $s^1_{1:t} \to \infty$ as $t \to \infty$, we have $E(s_{1:t}) < (1 - \delta) s^1_{1:t}$ for all sufficiently large $t$. This is a contradiction with (1). Hence, for some $t$ we have $s^1_{1:t-1} > s^2_{1:t-1}$ and $P\{I = 1\} \ge 1 - \delta'$, where $P\{I = 1\} = f(s^1_{1:t-1}, s^2_{1:t-1})$. △

We define two experts with unbounded one-step gains as follows. Define $s^1_0 = 0$ and $s^2_0 = 0$. By Lemma 1 a number $s^1 > 0$ exists such that $P\{I = 1\} \ge 1 - \delta$, where $P\{I = 1\} = f(s^1, 0)$. Define $s^1_1 = s^1$ and $s^2_1 = 0$.

Let $t$ be even, and let $s^1_{1:t-1}$ and $s^2_{1:t-1}$ be defined on the previous steps. We will use the induction hypothesis: $s^1_{1:t-1} > s^2_{1:t-1}$ and $P\{I = 1\} \ge 1 - \delta$. By definition this induction hypothesis holds for $t = 2$. Define the one-step gains of experts 1 and 2 at step $t$: $s^1_t = 0$ and $s^2_t = M_t$, where
$$M_t = \frac{E(s_{1:t-1})}{\delta' - \delta}$$
and $E(s_{1:t-1})$ is the mathematical expectation of the cumulative gain of the master algorithm on steps $< t$.

Let $t$ be odd. By Lemma 1 a number $s^1$ exists such that $s^1 \ge s^1_{1:t-1}$, $s^1 \ge s^2_{1:t-1}$, and $P\{I = 1\} \ge 1 - \delta$, where $P\{I = 1\} = f(s^1, s^2_{1:t-1})$. Define $s^1_t = s^1 - s^1_{1:t-1}$ and $s^2_t = 0$. Then $s^1_{1:t} = s^1$ and $s^2_{1:t} = s^2_{1:t-1}$. Evidently, the induction hypothesis is valid after step $t$.

Let us prove that this definition is correct. Let $t$ be even.
By the induction hypothesis, $s^1_{1:t-1} > s^2_{1:t-1}$ and $P\{I = 1\} \ge 1 - \delta$, where $P\{I = 1\} = f(s^1_{1:t-1}, s^2_{1:t-1})$. Then $P\{I = 2\} < \delta$. By definition $s^1_{1:t} = s^1_{1:t-1}$ and $s^2_{1:t} = s^2_{1:t-1} + M_t$. Then we obtain an upper bound for the expected one-step gain of the master algorithm:
$$E(s_t) = s^1_t P\{I = 1\} + s^2_t P\{I = 2\} \le \delta M_t.$$
For the expected cumulative gain, we have
$$E(s_{1:t}) \le E(s_{1:t-1}) + \delta M_t \le \delta' (s^2_{1:t-1} + M_t) = \delta' s^2_{1:t}. \quad (3)$$
Inequality (3) holds for all even steps $t$. △

By decreasing the lower bound on the performance of a probabilistic algorithm for games with bounded one-step gain functions, we can increase it for games with unbounded gain functions. The limit case is given by the following simple example. Evidently, the expected cumulative gain of the probabilistic algorithm which chooses one of the two experts with equal probabilities $\frac{1}{2}$ has the lower bound
$$E(s_{1:t}) = \frac{1}{2} s^1_{1:t} + \frac{1}{2} s^2_{1:t} \ge \frac{1}{2} \max_{i=1,2} s^i_{1:t} \quad (4)$$
for all $t$. The following simple diagonal argument shows that the cumulative gain of any probabilistic algorithm of following the leader can be bigger than this bound for some experts; analogously, it can be smaller for some experts.

Proposition 1. For any $\delta$ such that $0 < \delta < 1$ and for any probabilistic algorithm of following the best expert, two experts exist such that the expected cumulative gain of this algorithm satisfies
$$E(s_{1:t}) \le \frac{1}{2}(1 + \delta) \max_{i=1,2} s^i_{1:t} \quad (5)$$
for all sufficiently large $t$, where $s^1_{1:t}, s^2_{1:t}$ are the cumulative gains of these experts. Analogously, two experts exist such that
$$E(s_{1:t}) \ge \frac{1}{2}(1 - \delta) \max_{i=1,2} s^i_{1:t} \quad (6)$$
for all sufficiently large $t$.

Proof. Given a probabilistic algorithm of following the best expert and $\delta$ such that $0 < \delta < 1$, define recursively the gains of expert 1 and expert 2 at any step $t$ as follows.
Let $s^1_{1:t-1}$ and $s^2_{1:t-1}$ be the cumulative gains of these experts incurred at steps $< t$. Let $M_t = E(s_{1:t-1})/\delta$, where $E(s_{1:t-1})$ is the expected cumulative gain of the master algorithm in the past. If $P\{I = 1\} > \frac{1}{2}$ then define $s^1_t = 0$ and $s^2_t = M_t$; define $s^1_t = M_t$ and $s^2_t = 0$ otherwise. Then
$$E(s_t) = s^1_t P\{I = 1\} + s^2_t P\{I = 2\} \le \frac{1}{2} M_t$$
and
$$E(s_{1:t}) = E(s_{1:t-1}) + E(s_t) \le \frac{1}{2}(1 + \delta) M_t \le \frac{1}{2}(1 + \delta) \max_{i=1,2} s^i_{1:t}$$
for all sufficiently large $t$. To prove (6), define $s^1_t = M_t$ and $s^2_t = 0$ if $P\{I = 1\} > \frac{1}{2}$, and define $s^1_t = 0$ and $s^2_t = M_t$ otherwise. The derivation is analogous to the proof of (5). △

3 Asymptotically optimal algorithm of following the perturbed leader

In this section we show that the bounds (1) and (2) obtained in Theorem 1 can be achieved by some probabilistic algorithm. More precisely, using the method of following the perturbed leader, we construct a universal algorithm such that for any $\delta$ with $0 < \delta < 1$ the lower bound
$$E(s_{1:t}) \ge (1 - \delta) \max_{i=1,2} s^i_{1:t}$$
is valid for all sufficiently large $t$ for any two experts ($i = 1, 2$) with bounded one-step gain functions (and even in a more general case), and, at the same time, for some $\delta' > 0$ the bound
$$E(s_{1:t}) \ge \delta' \max_{i=1,2} s^i_{1:t}$$
is valid for all experts with arbitrary unbounded one-step gain functions. Here $E(s_{1:t})$ is the expected cumulative gain of the master algorithm.

Note that in this section the cumulative gains are always nonnegative: $s^i_{1:t} \ge 0$ for all $t$ and for $i = 1, 2$. In Section 4 we consider the case when the gains can be negative, i.e., the experts can incur losses. Recall that, for simplicity, we suppose that at any step $t$ only one expert can receive a positive one-step gain, i.e., $s^1_t = 0$ or $s^2_t = 0$. We denote $s_t = \max\{s^1_t, s^2_t\}$.
Let $\xi^1_1, \xi^2_1, \xi^1_2, \xi^2_2, \dots$ be a sequence of i.i.d. random variables distributed according to the exponential law with density $p(t) = e^{-t}$. We consider the FPL algorithm with the learning rate
$$\epsilon_{t-1} = \frac{1}{\mu \max\{s^1_{1:t-1}, s^2_{1:t-1}\}}, \quad (7)$$
where $t = 1, 2, \dots$ and $\mu$, $0 < \mu < 1$, is a parameter of the algorithm. We suppose without loss of generality that $s^1_0 = s^2_0 = 1$. By definition the sequence $\epsilon_1, \epsilon_2, \dots$ is non-increasing. The FPL algorithm is defined as follows:

FPL algorithm.
FOR $t = 1, \dots, T$
  Output the prediction of the expert $i = i_{\max}$ which maximizes
  $$s^i_{1:t-1} + \frac{1}{\epsilon_{t-1}} \xi^i_t, \quad (8)$$
  where $i = 1, 2$ and $\epsilon_{t-1}$ is defined by (7).
  Receive the one-step gains $s^i_t$ for the experts $i = 1, 2$, and define the one-step gain $s^{i_{\max}}_t$ of the master algorithm.
ENDFOR

Recall that a game $G$ of two experts is called non-degenerate if $v_t = \max\{s^1_{1:t}, s^2_{1:t}\} \to \infty$ as $t \to \infty$, where $s^i_{1:t}$ is the cumulative gain of the expert $i = 1, 2$ at step $t$. The number
$$\mathrm{Dev}(G) = \limsup_{t \to \infty} \frac{s_t}{v_t}, \quad (9)$$
where $s_t = \max\{s^1_t, s^2_t\}$, is called the deviation of the game $G$. For any game, $\mathrm{Dev}(G) \le 1$ by definition. In any non-degenerate game $G$ with a bounded one-step gain function, i.e., such that $0 \le s_t \le A$ for all $t$ ($A$ a positive real number), $\mathrm{Dev}(G) = 0$.

Theorem 2. For any $\mu$ such that $0 < \mu < 1$, an FPL algorithm can be specified such that for any non-degenerate game of two experts its expected cumulative gain at any step $T$ has the lower bound
$$l_{1:T} \ge e^{-2/\mu}(1 - \mu) \max_{i=1,2} s^i_{1:T}, \quad (10)$$
where $s^i_{1:T}$ is the cumulative gain of the expert $i = 1, 2$. If $\mathrm{Dev}(G) \le \frac{1}{2}\mu\delta$ for some $0 < \delta < 1$ then
$$l_{1:T} \ge (1 - \delta)(1 - \mu) \max_{i=1,2} s^i_{1:T} \quad (11)$$
holds for all sufficiently large $T$.¹

Proof. This theorem will follow from Theorem 3 and Corollary 2 below. In the proof we follow the proof scheme of [5] and [7].
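Rule (8) with learning rate (7) can be simulated directly. The sketch below is ours (function names and the test game are assumptions, not part of the paper); it plays FPL against a game in which Expert 1 gains 1 on every step.

```python
import random

def fpl_two_experts(gains, T, mu=0.5, seed=0):
    """Sketch of the FPL rule from the text: at step t follow the expert
    maximizing s^i_{1:t-1} + xi^i_t / eps_{t-1}, with xi i.i.d. Exp(1)
    perturbations and eps_{t-1} = 1 / (mu * max(s1, s2)), see (7)-(8).
    `gains(t)` returns the pair of nonnegative one-step gains."""
    rng = random.Random(seed)
    s = [1.0, 1.0]          # s^1_0 = s^2_0 = 1, as assumed in the text
    master = 0.0
    for t in range(1, T + 1):
        inv_eps = mu * max(s)        # 1 / eps_{t-1}
        xi = [rng.expovariate(1.0), rng.expovariate(1.0)]
        i = max((0, 1), key=lambda k: s[k] + inv_eps * xi[k])   # rule (8)
        g = gains(t)
        master += g[i]
        s[0] += g[0]
        s[1] += g[1]
    return master, s

# Expert 1 gains 1 on every step, Expert 2 never gains: once Expert 1's
# lead dominates the perturbations, FPL follows it on most steps.
m, s = fpl_two_experts(lambda t: (1.0, 0.0), T=1000, mu=0.5)
print(m / (s[0] - 1.0))   # fraction of Expert 1's gain captured by FPL
```

Note that because the perturbation scale $1/\epsilon_{t-1}$ grows with the current leading gain, FPL keeps exploring at a constant rate here, so the captured fraction stays below 1 even in this easy game.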
△

¹ The optimal value of $\mu$ for (10) is $\mu = 0.618$. Then $l_{1:T} \ge 0.015 \max_{i=1,2} s^i_{1:T}$ in (10) and $l_{1:T} \ge 0.382(1 - \delta) \max_{i=1,2} s^i_{1:T}$ in (11). Comparing these bounds with (4), we reveal a large gap between the bounds (10) and (11). The author does not know whether the lower bound (10) can be increased when $\mu \approx \frac{1}{2}$ in (11).

The analysis of the optimality of the FPL algorithm is based on an intermediate predictor IFPL (Infeasible FPL) with the learning rate
$$\epsilon_t = \frac{1}{\mu v_t}, \quad (12)$$
where $v_t = \max\{s^1_{1:t}, s^2_{1:t}\}$.

IFPL algorithm.
FOR $t = 1, \dots, T$
  Output the prediction of the expert $i = i_{\max}$ with the maximal value of
  $$s^i_{1:t} + \frac{1}{\epsilon_t} \xi^i_t,$$
  where $i = 1, 2$, $\epsilon_t$ is defined by (12), and $\xi^1_t, \xi^2_t$ are independent random variables distributed according to the exponential distribution with density $p(t) = e^{-t}$.
  Receive the one-step gains $s^i_t$ for the experts $i = 1, 2$, and define the one-step gain $s^{i_{\max}}_t$ of the master algorithm.
ENDFOR

The IFPL algorithm predicts with knowledge of $s^1_{1:t}$ and $s^2_{1:t}$ ($\epsilon_t$ is determined by their maximum), both of which may not be available at the beginning of step $t$. Using the unknown value of $\epsilon_t$ (like $s^i_{1:t}$, $i = 1, 2$) is the main peculiarity of our version of IFPL. To distinguish the gains of the FPL and IFPL algorithms, we denote by $s^I_t$ the one-step gain of the FPL algorithm at step $t$ and by $s^J_t$ the one-step gain of the IFPL algorithm. The expected one-step gains of the FPL and IFPL algorithms at step $t$ are denoted $l_t = E_t(s^I_t)$ and $r_t = E_t(s^J_t)$.

Theorem 3. For any $\mu$, $0 < \mu < 1$, the expected one-step gain $l_t$ of the FPL algorithm with learning rate (7) and the expected one-step gain $r_t$ of the IFPL algorithm with learning rate (12) satisfy the inequality
$$l_t \ge e^{-2/\mu} r_t \quad (13)$$
for all $t$. If $\mathrm{Dev}(G) \le \frac{1}{2}\mu\delta$ for some $0 < \delta < 1$ then
$$l_t \ge (1 - \delta) r_t \quad (14)$$
holds for all sufficiently large $t$.

Proof.
For any $t > 0$, denote $\xi^1 = \xi^1_t$, $\xi^2 = \xi^2_t$ and consider the two random variables
$$I = \begin{cases} 1 & \text{if } s^1_{1:t-1} + \frac{1}{\epsilon_{t-1}}\xi^1 > s^2_{1:t-1} + \frac{1}{\epsilon_{t-1}}\xi^2, \\ 2 & \text{otherwise,} \end{cases}
\qquad
J = \begin{cases} 1 & \text{if } s^1_{1:t} + \frac{1}{\epsilon_t}\xi^1 > s^2_{1:t} + \frac{1}{\epsilon_t}\xi^2, \\ 2 & \text{otherwise.} \end{cases}$$
Recall that $v_t = \max\{s^1_{1:t}, s^2_{1:t}\}$ for all $t$. For any real number $r$ we compare the conditional probabilities $P\{I = 1 \mid \xi^2 = r\}$ with $P\{J = 1 \mid \xi^2 = r\}$ and $P\{I = 2 \mid \xi^2 = r\}$ with $P\{J = 2 \mid \xi^2 = r\}$. In our analysis, the nontrivial cases are $s^2_{1:t} = s^2_{1:t-1} + s_t$ and $s^1_{1:t} = s^1_{1:t-1}$, or $s^1_{1:t} = s^1_{1:t-1} + s_t$ and $s^2_{1:t} = s^2_{1:t-1}$, where $s_t > 0$ (we indicate these cases in (16)-(19) below by $\pm$). In this case the following chain of equalities is valid:
$$P\{I = 1 \mid \xi^2 = r\} = P\left\{s^1_{1:t-1} + \tfrac{1}{\epsilon_{t-1}}\xi^1 > s^2_{1:t-1} + \tfrac{1}{\epsilon_{t-1}} r \,\middle|\, \xi^2 = r\right\}$$
$$= P\{\xi^1 > \epsilon_{t-1}(s^2_{1:t-1} - s^1_{1:t-1}) + r \mid \xi^2 = r\}$$
$$= P\{\xi^1 > \epsilon_t(s^2_{1:t-1} - s^1_{1:t-1}) + (\epsilon_{t-1} - \epsilon_t)(s^2_{1:t-1} - s^1_{1:t-1}) + r \mid \xi^2 = r\} \quad (15)$$
$$= e^{-(\epsilon_{t-1} - \epsilon_t)(s^2_{1:t-1} - s^1_{1:t-1})}\, P\left\{\xi^1 > \tfrac{1}{\mu v_t}(s^2_{1:t-1} - s^1_{1:t-1}) + r \,\middle|\, \xi^2 = r\right\} \quad (16)$$
$$= e^{-\left(\frac{1}{\mu v_{t-1}} - \frac{1}{\mu v_t}\right)(s^2_{1:t-1} - s^1_{1:t-1}) \pm \frac{s_t}{\mu v_t}}\, P\left\{\xi^1 > \tfrac{1}{\mu v_t}(s^2_{1:t-1} \pm s_t - s^1_{1:t-1}) + r \,\middle|\, \xi^2 = r\right\} \quad (17)$$
$$= e^{\frac{s_t}{\mu v_t}\left(\gamma_t \frac{s^1_{1:t-1} - s^2_{1:t-1}}{v_{t-1}} \pm 1\right)}\, P\left\{\xi^1 > \tfrac{1}{\mu v_t}(s^2_{1:t} - s^1_{1:t}) + r \,\middle|\, \xi^2 = r\right\} \quad (18)$$
$$= e^{\frac{s_t}{\mu v_t}\left(\gamma_t \frac{s^1_{1:t-1} - s^2_{1:t-1}}{v_{t-1}} \pm 1\right)}\, P\{J = 1 \mid \xi^2 = r\}. \quad (19)$$
Here we have used twice, in (15)-(16) and in (16)-(17), the equality $P\{\xi > a + b\} = e^{-b} P\{\xi > a\}$ (valid for nonnegative $a$ and $a + b$) for a random variable $\xi$ distributed according to the exponential law; we also used the equality $v_t - v_{t-1} = \gamma_t s_t$, where $0 \le \gamma_t \le 1$, in (18). The expression in parentheses in the exponent of (19) is bounded:
$$2 \ge \gamma_t \frac{s^1_{1:t-1} - s^2_{1:t-1}}{v_{t-1}} \pm 1 \ge -2. \quad (20)$$
These bounds follow from the inequalities $s^i_{1:t-1}/v_{t-1} \le 1$ and $s^i_{1:t-1} \ge 0$ for all $t$ and for $i = 1, 2$.
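The shift identity $P\{\xi > a + b\} = e^{-b} P\{\xi > a\}$ used in (15)-(17) is easy to check empirically for Exp(1). A Monte Carlo sketch (sample size, seed, and the values of $a, b$ are arbitrary choices of ours):

```python
import math
import random

# Empirical check of the shift identity used in (15)-(17):
# for xi ~ Exp(1) and a, a+b >= 0, P{xi > a+b} = e^(-b) * P{xi > a}.
rng = random.Random(1)
samples = [rng.expovariate(1.0) for _ in range(200_000)]

def tail(x):
    """Empirical tail probability P{xi > x}."""
    return sum(s > x for s in samples) / len(samples)

a, b = 0.7, 0.5
lhs = tail(a + b)
rhs = math.exp(-b) * tail(a)
print(abs(lhs - rhs))   # small Monte Carlo error
```

This memoryless scaling of the exponential tail is exactly what turns the threshold shifts in the chain above into multiplicative factors.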
We also used the inequality $s_t/v_t \le 1$ for all $t$. Therefore,
$$e^{2/\mu} P\{J = 1 \mid \xi^2 = r\} \ge P\{I = 1 \mid \xi^2 = r\} \ge e^{-2/\mu} P\{J = 1 \mid \xi^2 = r\}. \quad (21)$$
Since inequality (21) holds for all $r$, it also holds unconditionally:
$$e^{2/\mu} P\{J = 1\} \ge P\{I = 1\} \ge e^{-2/\mu} P\{J = 1\}. \quad (22)$$
Analogously, we obtain
$$e^{2/\mu} P\{J = 2\} \ge P\{I = 2\} \ge e^{-2/\mu} P\{J = 2\} \quad (23)$$
for all $t = 1, 2, \dots$. If $\mathrm{Dev}(G) \le \frac{1}{2}\mu\delta$, then for all sufficiently large $t$ the exponential factor in (19) is bounded from below by
$$e^{-\frac{2 s_t}{\mu v_t}} \ge e^{-\delta} \ge 1 - \delta,$$
from which (14) follows. From (22) and (23) we obtain the lower bound (13):
$$l_t = E(s^I_t) = s^1_t P\{I = 1\} + s^2_t P\{I = 2\} \ge s^1_t e^{-2/\mu} P\{J = 1\} + s^2_t e^{-2/\mu} P\{J = 2\} = e^{-2/\mu} E(s^J_t) = e^{-2/\mu} r_t. \quad (24)$$
△

The connection between the expected cumulative gain of the IFPL algorithm, $r_{1:T} = \sum_{t=1}^T r_t$, and the expected cumulative gain of the FPL algorithm, $l_{1:T} = \sum_{t=1}^T l_t$, is given in the following corollary.

Corollary 1. For any $\mu$, $0 < \mu < 1$, the expected cumulative gains of the IFPL and FPL algorithms with the parameters defined in Theorem 3 satisfy the inequality
$$l_{1:T} \ge e^{-2/\mu} r_{1:T} \quad (25)$$
for all $T$. If $\mathrm{Dev}(G) \le \frac{1}{2}\mu\delta$ for some $0 < \delta < 1$ then $l_{1:T} \ge (1 - \delta) r_{1:T}$ holds for all sufficiently large $T$.

The second bound also holds for games with unbounded one-step gains, and so it is an improvement of the results of [7] and [5]. The following theorem, which is an analogue of a result from [7], gives a bound for the IFPL algorithm.

Theorem 4. The expected cumulative gain of the IFPL algorithm with the learning rate (12) is bounded by
$$r_{1:T} \ge \max_{i=1,2} s^i_{1:T} - \frac{1}{\epsilon_T} \quad (26)$$
for all $T$.

The proof is along the lines of the proof from [5] (which is a refinement of the proof from [7]).
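Theorem 4 can be probed numerically. The sketch below is entirely our own construction (an alternating game with growing gains, arbitrary names and seed): it runs the infeasible IFPL rule, which sees the current cumulative gains, and compares the realized regret with $1/\epsilon_T = \mu v_T$. Since (26) bounds the expected regret, a single run is only typically, not always, within the bound.

```python
import random

def ifpl_regret(gains, T, mu=0.5, seed=3):
    """Run the (infeasible) IFPL rule: at step t it already knows the
    current cumulative gains s^i_{1:t} and uses eps_t = 1/(mu * v_t)."""
    rng = random.Random(seed)
    s = [0.0, 0.0]
    r = 0.0                                   # cumulative gain of IFPL
    for t in range(1, T + 1):
        g = gains(t)
        s_next = [s[0] + g[0], s[1] + g[1]]
        v = max(s_next)                       # v_t, known only to IFPL
        xi = [rng.expovariate(1.0), rng.expovariate(1.0)]
        i = max((0, 1), key=lambda k: s_next[k] + mu * v * xi[k])
        r += g[i]
        s = s_next
    return max(s) - r, mu * max(s)            # regret and the bound 1/eps_T

# Alternate growing gains between the experts.
reg, bound = ifpl_regret(lambda t: (t, 0.0) if t % 2 else (0.0, t), T=200)
print(reg <= bound)
```

In this unbounded game the perturbation scale $\mu v_t$ dwarfs the gap between the experts, so IFPL effectively hedges between them and stays well within the bound.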
In this proof let $s_t = (s^1_t, s^2_t)$ be the vector of one-step gains and $s_{1:t} = (s^1_{1:t}, s^2_{1:t})$ the vector of cumulative gains of the two experts; also let $\xi_t$ be the vector whose coordinates are the random variables $\xi^1_t$ and $\xi^2_t$. Define $\epsilon_0 = \infty$ and $\tilde s_{1:t} = s_{1:t} + \frac{1}{\epsilon_t}\xi_t$ for $t = 1, 2, \dots$. Consider the one-step gains
$$\tilde s_t = s_t + \xi_t \left(\frac{1}{\epsilon_t} - \frac{1}{\epsilon_{t-1}}\right)$$
for the moment. For any vector $s$ and a unit vector $d$, denote $M(s) = \operatorname{argmax}_{d \in D}\{d \circ s\}$, where $D = \{(0,1)^T, (1,0)^T\}$ is the set of the two unit vectors of dimension 2 and $\circ$ is the inner product of two vectors. We first show that
$$\sum_{t=1}^T M(\tilde s_{1:t}) \circ \tilde s_t \ge M(\tilde s_{1:T}) \circ \tilde s_{1:T}. \quad (27)$$
For $T = 1$ this is obvious. For the induction step from $T - 1$ to $T$ we need to show that
$$M(\tilde s_{1:T}) \circ \tilde s_T \ge M(\tilde s_{1:T}) \circ \tilde s_{1:T} - M(\tilde s_{1:T-1}) \circ \tilde s_{1:T-1}.$$
This follows from $\tilde s_{1:T} = \tilde s_{1:T-1} + \tilde s_T$ and $M(\tilde s_{1:T}) \circ \tilde s_{1:T-1} \le M(\tilde s_{1:T-1}) \circ \tilde s_{1:T-1}$. We rewrite (27) as follows:
$$\sum_{t=1}^T M(\tilde s_{1:t}) \circ s_t \ge M(\tilde s_{1:T}) \circ \tilde s_{1:T} - \sum_{t=1}^T M(\tilde s_{1:t}) \circ \xi_t \left(\frac{1}{\epsilon_t} - \frac{1}{\epsilon_{t-1}}\right). \quad (28)$$
By the definition of $M$ we have
$$M(\tilde s_{1:T}) \circ \tilde s_{1:T} \ge M(s_{1:T}) \circ \left(s_{1:T} + \frac{\xi_T}{\epsilon_T}\right) = \max_d \{d \circ s_{1:T}\} + M(s_{1:T}) \circ \frac{\xi_T}{\epsilon_T}. \quad (29)$$
The expectation of the last term in (29) is equal to $\frac{1}{\epsilon_T}$. We also have
$$\sum_{t=1}^T \left(\frac{1}{\epsilon_t} - \frac{1}{\epsilon_{t-1}}\right) M(\tilde s_{1:t}) \circ \xi_t \le \sum_{t=1}^T \left(\frac{1}{\epsilon_t} - \frac{1}{\epsilon_{t-1}}\right) M(\xi_t) \circ \xi_t. \quad (30)$$
We have $P\{\max\{\xi^1_t, \xi^2_t\} > y\} \le P\{\xi^1_t > y\} + P\{\xi^2_t > y\} = 2e^{-y}$. Since
$$E(M(\xi_t) \circ \xi_t) = E(\max\{\xi^1_t, \xi^2_t\}) \le \int_0^\infty 2e^{-y}\,dy = 2,$$
the expectation of the right-hand side of (30) has the upper bound $\frac{2}{\epsilon_T}$ (the sum telescopes, and $1/\epsilon_0 = 0$). Combining the bounds (28)-(30), we obtain (26). △

Corollary 2. Let $\mu$, $0 < \mu < 1$, be given. If the game of two experts is non-degenerate then the expected cumulative gain of the IFPL algorithm is bounded by
$$r_{1:T} \ge (1 - \mu) \max_{i=1,2} s^i_{1:T}.$$
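The proof above bounds $E(M(\xi_t) \circ \xi_t) = E(\max\{\xi^1_t, \xi^2_t\})$ by 2 via the union bound; the exact value for two i.i.d. Exp(1) variables is $1 + \frac{1}{2} = \frac{3}{2}$. A quick Monte Carlo check (sample size and seed are arbitrary choices of ours):

```python
import random

# E[max(xi1, xi2)] for i.i.d. Exp(1) perturbations: the text bounds it by 2
# using P{max > y} <= 2 e^{-y}; the exact value is the harmonic number 1.5.
rng = random.Random(2)
n = 200_000
est = sum(max(rng.expovariate(1.0), rng.expovariate(1.0))
          for _ in range(n)) / n
print(est)   # close to 1.5, safely below the bound 2 used in the text
```

The looser constant 2 costs only a constant factor in the $2/\epsilon_T$ term of the proof, so the union bound suffices there.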
4 Zero-sum games

We consider a simplest example of the game of prediction with expert advice with arbitrary positive and negative one-step gains and losses. We apply these results in Appendix A.

We consider a game $G$ of two experts with zero sum, i.e., $s^1_t = -s^2_t$ at each step $t$ of the game. If a one-step gain is negative it is called a loss. There are no restrictions on the absolute values of $s^1_t$. Define the volume of the game at step $t$:
$$V_t = \sum_{j=1}^{t} |s^1_j|.$$
A game with zero sum is called non-degenerate if $\lim_{t \to \infty} V_t = \infty$. Analogously to (9), we consider the deviation of the game $G$ with zero sum:
$$\mathrm{Dev}(G) = \limsup_{t \to \infty} \frac{s_t}{V_t},$$
where $s_t = |s^1_t|$ and $V_t$ is the volume of the game.

Evidently, the expected cumulative gain of the algorithm which chooses one of the two experts with probability $\frac{1}{2}$ equals zero. The following proposition is an analogue of Proposition 1.

Proposition 2. For any probabilistic algorithm of following the best expert, two experts exist such that the expected cumulative gain of this algorithm satisfies $E(s_{1:t}) \le 0$, and two experts exist such that $E(s_{1:t}) \ge 0$ for all $t$.

Proof. If $P\{I = 1\} > \frac{1}{2}$ define $s^1_t = 1$, $s^2_t = -1$, and define $s^1_t = -1$, $s^2_t = 1$ otherwise. The estimates are analogous to those given in the proof of Proposition 1. △

The following theorem, which is an analogue of Theorem 1 for games with zero sum, shows that if a probabilistic algorithm of following the leader has high performance in games with bounded one-step gains, then its expected cumulative gain in some games with unbounded one-step expert gains can be arbitrarily negative.

Theorem 5. Let $L_t$ be any sequence of positive real numbers, $t = 1, 2, \dots$. Let $\delta, \delta'$ be arbitrarily close and arbitrarily small positive real numbers such that $\delta > \delta'$, and let for any two experts with bounded one-step gains $s_t$, i.e.,
such that $0 \le s_t \le 1$ for all $t$, the expected cumulative gain of the master algorithm have the lower bound
$$E(s_{1:t}) \ge (1 - \delta) |s^1_{1:t}| \quad (31)$$
for all sufficiently large $t$. Then there exist two non-degenerate experts with unbounded one-step gains such that the expected performance of the master algorithm is bounded from above,
$$E(s_{1:t}) \le 2\delta' |s^1_{1:t}| - (1 - 2\delta') V_t, \quad (32)$$
and such that $V_t \ge L_t$ for infinitely many $t$, where $V_t$ is the volume of the game.

Proof. The proof is similar to the proof of Theorem 1. It uses a modified version of Lemma 1, which is also valid for negative gains with some evident modifications.²

Let a master algorithm be given. We define two experts with unbounded one-step gains as follows. Define $s^1_1 = s^2_1 = 0$. By the modified version of Lemma 1 a number $s^1$ exists such that $s^1 > 0$ and $P\{I = 1\} \ge 1 - \delta$, where $P\{I = 1\} = f(s^1, -s^1)$.

Let $t$ be even, and let $s^1_{1:t-1}$ and $s^2_{1:t-1} = -s^1_{1:t-1}$ be defined on the previous steps. We will use the induction hypothesis: $s^1_{1:t-1} > 0$ and $P\{I = 1\} \ge 1 - \delta$, where $P\{I = 1\} = f(s^1_{1:t-1}, s^2_{1:t-1})$. Define the one-step gains of experts 1 and 2: $s^1_t = -M_t$ and $s^2_t = M_t$, where
$$M_t = \max\left\{ \frac{|E(s_{1:t-1})|}{2(\delta - \delta')},\; L_t,\; \frac{V_{t-1}}{\delta} \right\}.$$

Let $t$ be odd. By the modified version of Lemma 1 a number $s^1$ exists such that $s^1 > |s^1_{1:t-1}|$ and $P\{I = 1\} \ge 1 - \delta$, where $P\{I = 1\} = f(s^1, -s^1)$. Define $s^1_t = s^1 - s^1_{1:t-1}$; then $s^1_{1:t} = s^1$, and $s^2_t = -s^1_t$. Evidently, the induction hypothesis is valid after an odd step $t$.

Let us prove that this construction is correct. Let $t$ be even. Then by the induction hypothesis $s^1_{1:t-1} > 0$ and $P\{I = 1\} \ge 1 - \delta$ (and $P\{I = 2\} < \delta$). By definition $s^1_{1:t} = s^1_{1:t-1} - M_t$ and $s^2_{1:t} = s^2_{1:t-1} + M_t$.
The expected one-step gain of the master algorithm is bounded:
$$E(s_t) = s^1_t P\{I = 1\} + s^2_t P\{I = 2\} \le -(1 - \delta) M_t + \delta M_t = -(1 - 2\delta) M_t.$$
By definition $L_t \le M_t \le V_t = V_{t-1} + M_t \le (1 + \delta) M_t$. Then
$$E(s_{1:t}) \le E(s_{1:t-1}) - (1 - 2\delta) M_t \le -(1 - 2\delta') M_t = -M_t + 2\delta' M_t \le -(1 - \delta) V_t + 2\delta' |s^1_{1:t}|$$
for all even steps $t$. △

We consider non-degenerate games, i.e., such that $V_t$ is unbounded. To obtain the lower bounds, we reduce our zero-sum game to a game with non-negative one-step gains. Define the one-step gains of new experts $\tilde s^i_t = s^i_t + |s^1_t|$ for $i = 1, 2$. Then $\tilde s^i_t \ge 0$ for all $t$, and $\tilde s^1_t = 0$ or $\tilde s^2_t = 0$ for all $t$. By definition $\tilde s^i_{1:t} = s^i_{1:t} + V_t$ for $i = 1, 2$, where $V_t$ is the volume of the initial game. Evidently, the FPL and IFPL algorithms defined in Section 3 make the same choices for the experts of both types. The expected one-step gains of the master algorithm for the experts of both types satisfy
$$\tilde l_t = s^1_t P\{I = 1\} + s^2_t P\{I = 2\} + |s^1_t|,$$
which implies the equality $\tilde l_{1:t} = l_{1:t} + V_t$ for the expected cumulative gains. The analogous equalities hold for $\tilde r_t, \tilde r_{1:t}$ and $r_t, r_{1:t}$. The following theorem is a corollary of Theorem 2.

² The modified version of Lemma 1 looks as follows: Let $\delta, \delta'$ be positive real numbers such that $\delta' > \delta$, and let for any two experts with bounded one-step gains (31) hold for all sufficiently large $t$. Then for any number $\tilde s^1$ a number $s^1 > 0$ exists such that $s^1 \ge \tilde s^1$ and $P\{I = 1\} \ge 1 - \delta'$, where $P\{I = 1\} = f(s^1, -s^1)$.

Theorem 6.
For any $\mu$ such that $0 < \mu < 1$, an FPL algorithm can be specified such that for any non-degenerate game of two experts its expected cumulative gain at any step $T$ has the lower bound
$$l_{1:T} \ge e^{-2/\mu}(1 - \mu)\,|s^1_{1:T}| - V_T \left(1 - e^{-2/\mu}(1 - \mu)\right), \quad (33)$$
where $s^1_{1:T}$ is the cumulative gain of the first expert and $V_T$ is the volume of the game at step $T$. If $\mathrm{Dev}(G) \le \frac{1}{2}\mu\delta$ for some $0 < \delta < 1$ then
$$l_{1:T} \ge (1 - \delta)(1 - \mu)\,|s^1_{1:T}| - (\delta + \mu) V_T \quad (34)$$
holds for all sufficiently large $T$.

Proof. This theorem follows from Theorem 2 and the relations between the one-step gains $\tilde s^i_t$ and $s^i_t$, $i = 1, 2$, of the two types of experts. △

Remark. In the case when $\mathrm{Dev}(G) \le \frac{1}{4}\mu\delta$, the bound (34) can be improved for some $t$ if we replace the learning rate (7) of Section 3 by $\epsilon_{t-1} = \frac{1}{\mu \max_j \dots}$

Expert 1 uses the hypothesis that the Hurst exponent is $> \frac{1}{2}$ (a smoother trend). Expert 2 uses the hypothesis that the Hurst exponent is $< \frac{1}{2}$ (volatility is high). It is reasonable to derandomize the FPL algorithm for this financial game. For that, the investor must follow both experts' strategies simultaneously, holding $P\{I = 1\} C^1_t + P\{I = 2\} C^2_t$ shares of a stock at any step $t$.³ In this case Theorem 6 holds, where the expected gain at step $t$ is replaced by the pure gain $s_t = P\{I = 1\} C^1_t \Delta S_t + P\{I = 2\} C^2_t \Delta S_t$.⁴

³ We suppose that this price is also valid at the beginning of the period $t + 1$.
⁴ Analogously we can derandomize all the probabilistic games of this paper if we allow Learner to receive a given fraction of the gain of an expert.

References

1. Allenberg C., Auer P., Györfi L., Ottucsák G.: Hannan consistency in on-line learning in case of unbounded losses under partial monitoring. LNCS 4264, 229-243. Springer-Verlag, Berlin Heidelberg, 2006.
2. Delbaen F., Schachermayer W.: A general version of the fundamental theorem of asset pricing. Mathematische Annalen, 300 (1994), 463-520.
3. Cheridito P.: Arbitrage in fractional Brownian motion models. Finance and Stochastics, 7(4) (2003), 533-553.
4. Hannan J.: Approximation to Bayes risk in repeated play. In M. Dresher, A.W. Tucker, and P. Wolfe, editors, Contributions to the Theory of Games 3, 97-139. Princeton University Press, 1957.
5. Hutter M., Poland J.: Prediction with expert advice by following the perturbed leader for general weights. In S. Ben-David, J. Case, A. Maruoka (Eds.): ALT 2004, LNAI 3244, 279-293. Springer-Verlag, Berlin Heidelberg, 2004.
6. Poland J., Hutter M.: Defensive universal learning with experts. In S. Jain, H.U. Simon, E. Tomita (Eds.): ALT 2005, LNAI 3734, 356-370. Springer-Verlag, Berlin Heidelberg, 2005.
7. Kalai A., Vempala S.: Efficient algorithms for online decisions. In Proceedings of the 16th Annual Conference on Learning Theory (COLT 2003), LNAI, 506-521. Springer, Berlin, 2003. Extended version in Journal of Computer and System Sciences, 71 (2005), 291-307.
8. Rogers L.C.G.: Arbitrage with fractional Brownian motion. Mathematical Finance, 7 (1997), 95-105.
9. Vovk V.: A game-theoretic explanation of the √dt effect. Working paper #5, 2003, http://www.probabilityandfinance.com