A Variant of Azuma's Inequality for Martingales with Subgaussian Tails
We provide a variant of Azuma's concentration inequality for martingales, in which the standard boundedness requirement is replaced by the milder requirement of a subgaussian tail.
Authors: Ohad Shamir
Affiliation: Microsoft Research New England (ohadsh@microsoft.com)

A sequence of random variables $Z_1, Z_2, \ldots$ is called a martingale difference sequence with respect to another sequence of random variables $X_1, X_2, \ldots$, if for any $t$, $Z_t$ is a function of $X_1, \ldots, X_t$, and $\mathbb{E}[Z_{t+1} \mid X_1, \ldots, X_t] = 0$ with probability 1.

Azuma's inequality is a useful concentration bound for martingales. Here is one possible formulation of it:

Theorem 1 (Azuma's Inequality). Let $Z_1, Z_2, \ldots$ be a martingale difference sequence with respect to $X_1, X_2, \ldots$, and suppose there is a constant $b$ such that for any $t$, $\Pr(|Z_t| \le b) = 1$. Then for any positive integer $T$ and any $\delta > 0$, it holds with probability at least $1 - \delta$ that

$$
\frac{1}{T} \sum_{t=1}^{T} Z_t \le b \sqrt{\frac{2 \log(1/\delta)}{T}}.
$$

Sometimes, for the martingale we have at hand, $Z_t$ is not bounded, but rather bounded with high probability. In particular, suppose we can show that the probability of $Z_t$ being larger than $a$ (and smaller than $-a$), conditioned on any $X_1, \ldots, X_{t-1}$, is on the order of $\exp(-\Omega(a^2))$. Random variables with this behavior are referred to as having subgaussian tails (since their tails decay at least as fast as those of a Gaussian random variable).

Intuitively, a variant of Azuma's inequality for these 'almost-bounded' martingales should still hold, and is probably known. However, we weren't able to find a convenient reference for it, and the goal of this technical report is to formally provide such a result:

Theorem 2 (Azuma's Inequality for Martingales with Subgaussian Tails). Let $Z_1, Z_2, \ldots, Z_T$ be a martingale difference sequence with respect to a sequence $X_1, X_2, \ldots, X_T$, and suppose there are constants $b > 1$, $c > 0$ such that for any $t$ and any $a > 0$, it holds that

$$
\max\left\{ \Pr(Z_t > a \mid X_1, \ldots, X_{t-1}),\ \Pr(Z_t < -a \mid X_1, \ldots, X_{t-1}) \right\} \le b \exp(-c a^2).
$$

Then for any $\delta > 0$, it holds with probability at least $1 - \delta$ that

$$
\frac{1}{T} \sum_{t=1}^{T} Z_t \le \sqrt{\frac{28\, b \log(1/\delta)}{cT}}.
$$

(It is quite likely that the numerical constant in the bound can be improved.)

Proof of Thm. 2

We begin by proving the following lemma, which bounds the moment generating function of subgaussian random variables.

Lemma 1. Let $X$ be a random variable with $\mathbb{E}[X] = 0$, and suppose there exist a constant $b \ge 1$ and a constant $c$ such that for all $t > 0$, it holds that

$$
\max\{\Pr(X \ge t),\ \Pr(X \le -t)\} \le b \exp(-c t^2).
$$

Then for any $s > 0$,

$$
\mathbb{E}[e^{sX}] \le e^{7 b s^2 / c}.
$$

Proof. We begin by noting that

$$
\mathbb{E}[X^2] = \int_{t=0}^{\infty} \Pr(X^2 \ge t)\, dt \le \int_{t=0}^{\infty} \Pr(X \ge \sqrt{t})\, dt + \int_{t=0}^{\infty} \Pr(X \le -\sqrt{t})\, dt \le 2b \int_{t=0}^{\infty} \exp(-ct)\, dt = \frac{2b}{c}.
$$

Using this, the fact that $\mathbb{E}[X] = 0$, and the fact that $e^a \le 1 + a + a^2$ for all $a \le 1$, we have that

$$
\begin{aligned}
\mathbb{E}[e^{sX}] &= \mathbb{E}\left[e^{sX} \,\middle|\, X \le \tfrac{1}{s}\right] \Pr\left(X \le \tfrac{1}{s}\right) + \sum_{j=1}^{\infty} \mathbb{E}\left[e^{sX} \,\middle|\, j < sX \le j+1\right] \Pr(j < sX \le j+1) \\
&\le \mathbb{E}\left[1 + sX + s^2 X^2 \,\middle|\, sX \le 1\right] \Pr(sX \le 1) + \sum_{j=1}^{\infty} e^{j+1} \Pr\left(X > \tfrac{j}{s}\right) \\
&\le 1 + \frac{2bs^2}{c} + b \sum_{j=1}^{\infty} e^{2j - cj^2/s^2}.
\end{aligned}
\tag{1}
$$

In the last step, the first term is bounded using $\mathbb{E}[sX \mathbf{1}\{sX \le 1\}] \le \mathbb{E}[sX] = 0$ and $\mathbb{E}[X^2] \le 2b/c$, and the sum is bounded using $e^{j+1} \le e^{2j}$ for $j \ge 1$ together with the tail assumption.

We now need to bound the series $\sum_{j=1}^{\infty} e^{j(2 - cj/s^2)}$. If $s \le \sqrt{c}/2$, we have

$$
2 - \frac{cj}{s^2} \le -\frac{c}{2s^2} \le -2
$$

for all $j$. Therefore, the series can be upper bounded by the convergent geometric series

$$
\sum_{j=1}^{\infty} \left(e^{-c/(2s^2)}\right)^j = \frac{e^{-c/(2s^2)}}{1 - e^{-c/(2s^2)}} < 2 e^{-c/(2s^2)} \le \frac{4s^2}{c},
$$

where we used the upper bound $e^{-c/(2s^2)} \le e^{-2} < 1/2$ in the second transition, and the last transition is by the inequality $e^{-x} \le 1/x$ for all $x > 0$. Overall, we get that if $s \le \sqrt{c}/2$, then

$$
\mathbb{E}[e^{sX}] \le 1 + \frac{2bs^2}{c} + b\, \frac{4s^2}{c} \le e^{6bs^2/c}.
\tag{2}
$$

We will now deal with the case $s > \sqrt{c}/2$. For all $j > 3s^2/c$, we have $2 - jc/s^2 < -1$, so the tail of the series satisfies

$$
\sum_{j > 3s^2/c} e^{j(2 - jc/s^2)} \le \sum_{j=0}^{\infty} e^{-j} < 2 < \frac{8s^2}{c}.
$$
Moreover, the function $j \mapsto j(2 - jc/s^2)$ is maximized at $j = s^2/c$, and therefore $e^{j(2 - jc/s^2)} \le e^{s^2/c}$ for all $j$. Therefore, the initial part of the series is at most

$$
\sum_{j=1}^{\lfloor 3s^2/c \rfloor} e^{j(2 - jc/s^2)} \le \frac{3s^2}{c}\, e^{s^2/c} \le e^{s^2/(ec)}\, e^{s^2/c} \le e^{(1 + 1/e) s^2/c},
$$

where the second to last transition is from the fact that $a \le e^{a/e}$ for all $a$.

Overall, we get that if $s > \sqrt{c}/2$, then

$$
\mathbb{E}[e^{sX}] \le 1 + \frac{10 b s^2}{c} + b\, e^{(1 + 1/e) s^2/c} \le e^{7bs^2/c},
\tag{3}
$$

where the last transition follows from the easily verified fact that $1 + 10ba + e^{(1 + 1/e)ba} \le e^{7ba}$ for any $a \ge 1/4$, and indeed $bs^2/c \ge 1/4$ by the assumption on $s$ and the assumption that $b \ge 1$. Combining Eq. (2) and Eq. (3) to handle the different cases of $s$, the result follows.

After proving the lemma, we turn to the proof of Thm. 2.

Proof of Thm. 2. We proceed by the standard Chernoff method. Using Markov's inequality and Lemma 1, we have for any $s > 0$ that

$$
\begin{aligned}
\Pr\left(\frac{1}{T} \sum_{t=1}^{T} Z_t > \epsilon\right) &= \Pr\left(e^{s \sum_{t=1}^{T} Z_t} > e^{sT\epsilon}\right) \le e^{-sT\epsilon}\, \mathbb{E}\left[e^{s \sum_t Z_t}\right] \\
&= e^{-sT\epsilon}\, \mathbb{E}\left[\mathbb{E}\left[\prod_{t=1}^{T} e^{sZ_t} \,\middle|\, X_1, \ldots, X_T\right]\right] \\
&= e^{-sT\epsilon}\, \mathbb{E}\left[\mathbb{E}\left[e^{sZ_T} \prod_{t=1}^{T-1} e^{sZ_t} \,\middle|\, X_1, \ldots, X_{T-1}\right]\right] \\
&= e^{-sT\epsilon}\, \mathbb{E}\left[\mathbb{E}\left[e^{sZ_T} \,\middle|\, X_1, \ldots, X_{T-1}\right] \mathbb{E}\left[\prod_{t=1}^{T-1} e^{sZ_t} \,\middle|\, X_1, \ldots, X_{T-1}\right]\right] \\
&\le e^{-sT\epsilon}\, e^{7bs^2/c}\, \mathbb{E}\left[\mathbb{E}\left[\prod_{t=1}^{T-1} e^{sZ_t} \,\middle|\, X_1, \ldots, X_{T-1}\right]\right] \\
&\;\;\vdots \\
&\le e^{-sT\epsilon + 7Tbs^2/c}.
\end{aligned}
$$

Choosing $s = c\epsilon/(14b)$, the expression above equals $e^{-cT\epsilon^2/(28b)}$, and we get that

$$
\Pr\left(\frac{1}{T} \sum_{t=1}^{T} Z_t > \epsilon\right) \le e^{-cT\epsilon^2/(28b)}.
$$

Setting the r.h.s. to $\delta$ and solving for $\epsilon$ gives $\epsilon = \sqrt{28\, b \log(1/\delta)/(cT)}$, and the theorem follows.

Acknowledgements

We thank Sébastien Bubeck for pointing out a bug in a previous version of this manuscript.
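As an illustrative sanity check, which is not part of the original report, the bound of Theorem 2 can be compared against a simulated martingale. The sketch below assumes a specific toy construction: $X_t = Z_t$, and conditioned on the past, $Z_t$ is Gaussian with standard deviation 0.5 or 1.0 depending on the sign of the previous increment. Since a Gaussian with standard deviation at most 1 satisfies $\Pr(Z > a) \le \exp(-a^2/2)$, the tail condition holds with the (assumed, not optimized) constants $b = 2$, $c = 1/2$. All function names are hypothetical.

```python
import math
import random


def subgaussian_azuma_bound(b, c, T, delta):
    """Right-hand side of Theorem 2: sqrt(28 * b * log(1/delta) / (c * T))."""
    return math.sqrt(28.0 * b * math.log(1.0 / delta) / (c * T))


def sample_average(T, rng):
    """Average of T martingale differences from the toy construction.

    Conditioned on the past, Z_t ~ N(0, sigma_t^2), where sigma_t is 1.0 if
    the previous increment was positive and 0.5 otherwise (assumption made
    for illustration). The conditional mean is always zero.
    """
    total = 0.0
    sigma = 1.0
    for _ in range(T):
        z = rng.gauss(0.0, sigma)
        total += z
        sigma = 1.0 if z > 0 else 0.5
    return total / T


def empirical_failure_rate(T, delta, trials, seed=0):
    """Fraction of simulated runs whose average exceeds the Theorem 2 bound."""
    rng = random.Random(seed)
    # Assumed constants for the toy construction: b = 2, c = 1/2.
    bound = subgaussian_azuma_bound(b=2.0, c=0.5, T=T, delta=delta)
    failures = sum(sample_average(T, rng) > bound for _ in range(trials))
    return failures / trials, bound


if __name__ == "__main__":
    rate, bound = empirical_failure_rate(T=200, delta=0.05, trials=2000)
    print(f"bound = {bound:.3f}, empirical failure rate = {rate:.4f}")
```

With $T = 200$ and $\delta = 0.05$, the bound evaluates to roughly 1.3, while the average in this toy construction fluctuates on the order of $1/\sqrt{T} \approx 0.07$. The empirical failure rate should therefore sit far below $\delta$, which is consistent with the remark that the numerical constant can likely be improved.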