An unbiased estimate for the mean of a {0,1} random variable with relative error distribution independent of the mean

Say $X_1,X_2,\ldots$ are independent identically distributed Bernoulli random variables with mean $p$. This paper builds a new estimate $\hat p$ of $p$ that has the property that the relative error, $\hat p/p - 1$, of the estimate does not depend in any way on the value of $p$.

Authors: Mark Huber

An unbiased estimate for the probability of heads on a coin where the relative error has a distribution independent of the coin

Mark Huber
mhuber@cmc.edu
Version: October 18, 2021

Abstract

Say $X_1, X_2, \ldots$ are independent identically distributed Bernoulli random variables with mean $p$, so $P(X_i = 1) = p$ and $P(X_i = 0) = 1 - p$. Any estimate $\hat p$ of $p$ has relative error $\hat p/p - 1$. This paper builds a new estimate $\hat p$ of $p$ such that the relative error of the estimate does not depend in any way on the value of $p$. This allows the easy construction of exact confidence intervals for $p$ of any desired level without needing any sort of limit or approximation. In addition, $\hat p$ is unbiased. The expected number of Bernoulli draws used by the algorithm is at most 1 more than $(1-p)^{-1}$ times the number of draws needed if the Central Limit Theorem held exactly. For $\epsilon$ and $\delta$ in $(0,1)$, to obtain an estimate where $P(|\hat p/p - 1| > \epsilon) \leq \delta$, the new algorithm takes on average at most $2\epsilon^{-2} p^{-1} \ln(2\delta^{-1})(1-(4/3)\epsilon)^{-1}$ samples. It is also shown that any such algorithm that applies whenever $p \leq 1/2$ requires at least $(1/5)\epsilon^{-2}(1+2\epsilon)(1-\delta)\ln((2-\delta)\delta^{-1})p^{-1}$ samples. The same algorithm can also be applied to estimate the mean of any random variable that falls in $[0,1]$. Applications of this methodology include finding exact $p$-values and estimating normalizing constants and Bayes' Factors using acceptance/rejection.

1 Introduction

Say $X_1, X_2, X_3, \ldots$ are independent, identically distributed (iid) Bernoulli random variables with mean $p$. Write $X_i \sim \mathrm{Bern}(p)$ to denote $P(X_i = 1) = p$ and $P(X_i = 0) = 1 - p$. The purpose of this work is to present a new algorithm for estimating $p$ with $\hat p$ so that the relative error $\hat p/p - 1$ has a known distribution that does not depend on the value of $p$.
In other words, with this algorithm it is possible to compute $P(a \leq \hat p/p - 1 \leq b)$ exactly for any $a \leq 0 \leq b$, without needing any kind of approximation or limiting behavior.

This problem of estimating $p$, which is also known as estimating the parameter of a binomial given a large sample, arises in a wide diversity of contexts. Examples include estimating the percentage of farms growing a particular crop [9], estimating the prevalence of a disease in a population [13, 12], and any situation where it is desirable to know the percentage of a population with a specific property.

Another application is in exact $p$-values. Given a statistical model and a statistic, let "heads" be when the statistic applied to a draw from the model is more unusual than the same statistic applied to the data, and all other events are "tails." Then the $p$-value for the data is just the probability of heads on the coin. This allows estimation of the exact $p$-value for any statistical model that can be simulated from by flipping coins. Models where this has been applied include testing if a population is in Hardy-Weinberg equilibrium [4, 5], the Rasch model [1], two-sample survival data [14], and nonparametric testing of sequential association [10].

In theoretical computer science, many problems of approximation can be reduced to the problem of estimating the binomial parameter. In particular, approximating the permanent of a matrix with positive entries [6], the number of solutions to a disjunctive normal form expression [7], the volume of a convex body [8], estimating exact $p$-values for a model (see for instance [5]), and counting the lattice points inside a polytope can all be put into this framework. In general, anywhere an acceptance rejection method is used to build an approximation algorithm, this problem arises.
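To make the "$p$-value as a coin" viewpoint concrete, here is a toy Monte Carlo sketch (the model, statistic, and helper names below are illustrative, not from the paper): each simulated draw whose statistic is at least as extreme as the observed one is a "head," and the $p$-value is the mean of that Bernoulli coin.

```python
import random

def pvalue_coin(simulate_stat, observed_stat, n_sims, rng):
    """Estimate a p-value by flipping the Bernoulli 'coin' whose heads
    probability is P(simulated statistic >= observed statistic)."""
    heads = sum(simulate_stat(rng) >= observed_stat for _ in range(n_sims))
    return heads / n_sims

# Toy model (hypothetical): statistic = number of successes in 20 fair trials.
def stat(rng):
    return sum(rng.random() < 0.5 for _ in range(20))

rng = random.Random(0)
p_hat = pvalue_coin(stat, 15, 100_000, rng)  # estimates P(Binomial(20, 1/2) >= 15)
```

Here the heads probability can be computed directly, $P(\mathrm{Bin}(20, 1/2) \geq 15) = 21700/2^{20} \approx 0.0207$, which the Monte Carlo estimate should match to within sampling error; in realistic applications only simulation is available.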
The cost here is usually dominated by the number of $\mathrm{Bern}(p)$ flips of the coin that are needed, and so the focus here is on minimizing the expected number of such flips needed.

Definition 1. Suppose $A$ is a function of $X_1, X_2, \ldots \sim_{\text{iid}} \mathrm{Bern}(p)$ and auxiliary randomness (represented by $U \sim \mathrm{Unif}([0,1])$) that outputs $\hat p$. Let $T$ be a stopping time with respect to the natural filtration so that the value of $\hat p$ only depends on $U$ and $X_1, \ldots, X_T$. Then call $T$ the running time of the algorithm.

The simplest algorithm for estimating $p$ just fixes $T = n$, and sets

$$\hat p_n = \frac{X_1 + X_2 + \cdots + X_n}{n}.$$

In this case $n \hat p_n$ has a binomial distribution with parameters $n$ and $p$. The standard deviation of $\hat p_n$ is $\sqrt{p(1-p)/n}$. Therefore, to get an estimate which is close to $p$ in the sense of having small relative error, $n$ should be of the form $C/p$ (for some constant $C$) so that the standard deviation is $p\sqrt{(1-p)/C}$, and so roughly proportional to $p$. From the Central Limit Theorem, roughly $2\epsilon^{-2}\ln(2/\delta)/p$ samples are necessary to get $\hat p_n/p \in [1-\epsilon, 1+\epsilon]$ with probability at least $1-\delta$ for $\epsilon \in (0,1)$. (See Section 4 for a more detailed form of this argument.)

But $p$ is unknown at the beginning of the algorithm! Dagum, Karp, Luby and Ross [3] dealt with this circularity problem with their stopping rule algorithm. In this context of $\mathrm{Bern}(p)$ random variables, their algorithm can be written as follows. Fix $(\epsilon, \delta)$ with $\epsilon \in (0,1)$ and $\delta > 0$. Let $T$ be the smallest integer such that

$$X_1 + \cdots + X_T \geq 1 + (1+\epsilon)4(e-2)\ln(2/\delta)\epsilon^{-2}.$$

Then $\hat p_{\mathrm{DKLR}} = (X_1 + \cdots + X_T)/T$. Call this method DKLR. They showed the following result for their estimate (Stopping Rule Theorem of [3]):

$$P(1-\epsilon \leq \hat p_{\mathrm{DKLR}}/p \leq 1+\epsilon) > 1-\delta, \tag{1}$$

and $E[T] \leq [1 + (1+\epsilon)4(e-2)\ln(2/\delta)\epsilon^{-2}]/p$.
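A minimal Python sketch of the DKLR stopping rule just described; the `flip` callable, seed, and parameter choices are illustrative assumptions, not part of the original algorithm specification.

```python
import math
import random

def dklr(flip, eps, delta, rng):
    """DKLR stopping-rule estimate: flip the coin until the running count
    of heads reaches 1 + (1+eps) * 4(e-2) * ln(2/delta) / eps^2, then
    return (number of heads) / (number of flips)."""
    threshold = 1 + (1 + eps) * 4 * (math.e - 2) * math.log(2 / delta) / eps**2
    heads, flips = 0, 0
    while heads < threshold:
        heads += flip(rng)
        flips += 1
    return heads / flips

rng = random.Random(1)
p = 0.3  # hidden truth, used only to simulate the coin
p_hat = dklr(lambda r: int(r.random() < p), 0.1, 0.05, rng)
```

With $\epsilon = 0.1$ and $\delta = 0.05$ the threshold is about 1167 heads, so roughly $1167/p$ flips on average, matching the $E[T]$ bound above.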
They also showed that any such $(\epsilon, \delta)$ approximation algorithm that applies to all $p \in [0, 1/2]$ (Lemma 7.5 of [3]) must satisfy

$$E[T] \geq (4e^2)^{-1}(1-\delta)(1-\epsilon)^2(1-p)\epsilon^{-2}\ln(\delta^{-1}(2-\delta))p^{-1}.$$

There are several drawbacks to DKLR. First, the factor of $4(e-2)$ (which is about 2.873) in the running time of DKLR is an artifact of the analysis rather than coming from the problem itself. As mentioned before, a heuristic Central Limit Theorem argument (see Section 4) indicates that the correct factor in the running time should be 2. Second, the DKLR estimate is biased.

Our algorithm has a form similar to DKLR, but with a continuous modification that yields several desirable benefits. The DKLR estimate $(X_1 + \cdots + X_T)/T$ is a fixed integer divided by a negative binomial random variable. In the algorithm proposed here, the estimate is a fixed integer divided by a Gamma random variable. Since Gamma random variables are scalable, the relative error of the estimate does not depend on the value of $p$. This allows a much tighter analysis of the error, since the value of $p$ is no longer an issue. In particular, the algorithm attains (to first order) the $2\epsilon^{-2}p^{-1}\ln(2\delta^{-1})$ running time that is likely the best possible. The new algorithm is called the Gamma Bernoulli approximation scheme (GBAS).

Theorem 1. The GBAS method of Section 2, for any integer $k \geq 2$, outputs an estimate $\hat p$ using $T$ samples where $E[T] = k/p$, $E[\hat p] = p$, and $p/\hat p$ has a Gamma distribution with shape parameter $k$ and rate parameter $k-1$. The density of the relative error $\hat p/p - 1$ is

$$\frac{(k-1)^k}{(k-1)!} \cdot \frac{\exp(-(k-1)/(s+1))}{(s+1)^{k+1}}$$

for $s \geq -1$. In particular, for $\epsilon \in (0, 3/4)$, $\delta \in (0,1)$, and $k = \lceil 2\epsilon^{-2}\ln(2\delta^{-1})(1-(4/3)\epsilon)^{-1} \rceil$, then $P(-\epsilon \leq (\hat p/p) - 1 \leq \epsilon) > 1-\delta$.
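As a concrete preview of Theorem 1, here is a minimal Python transcription of the GBAS method (its pseudocode appears in Section 2); the simulated `flip` and the parameter choices below are illustrative.

```python
import random

def gbas(flip, k, rng):
    """Gamma Bernoulli approximation scheme (GBAS), k >= 2.
    Pair each coin flip with an independent Exp(1) draw; stop at the
    k-th head and return (k-1) / (total exponential weight).  Then
    p/p_hat ~ Gamma(k, k-1), so the relative error distribution does
    not depend on p, and E[p_hat] = p."""
    s, r = 0, 0.0
    while s < k:
        s += flip(rng)            # Bern(p) draw
        r += rng.expovariate(1.0) # Exp(1) draw
    return (k - 1) / r

rng = random.Random(2)
p = 0.25  # hidden truth, used only to simulate the coin
p_hat = gbas(lambda r: int(r.random() < p), 388, rng)
```

With $k = 388$ a single run has relative error below $0.1$ with probability at least $0.95$, and averaging many independent runs checks unbiasedness empirically.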
To understand the effectiveness of the new estimate, suppose that in fact the value of $p$ was known exactly. Then for a given $n$, the probability that the relative error was at least $\epsilon$ could be calculated exactly, and the smallest value of $n$ that makes this probability below $\delta$ would be used. The table below presents to four significant digits the number of samples used by the new algorithm, by DKLR, and by using the optimal value of $n$ assuming that $p$ was known ahead of time. The final column gives the expected number used by the new method over the number needed by the exact binomial approach.

$(\epsilon, \delta)$    $p$     New method      DKLR            Exact Bin.      New/Exact
$(0.1, 0.05)$           0.05    7700            23340           7299            1.067
$(0.1, 0.05)$           0.01    3.850·10^4      11.67·10^4      3.755·10^4      1.025
$(0.1, 10^{-6})$        0.05    5.122·10^4      9.174·10^4      4.551·10^4      1.125
$(0.1, 10^{-6})$        0.01    2.561·10^5      4.587·10^5      2.375·10^5      1.078
$(0.01, 0.05)$          0.05    7.683·10^5      21.41·10^5      7.280·10^5      1.055
$(0.01, 0.05)$          0.01    3.842·10^6      10.70·10^6      3.795·10^6      1.012
$(0.01, 10^{-6})$       0.05    4.790·10^6      8.240·10^6      4.545·10^6      1.054
$(0.01, 10^{-6})$       0.01    2.395·10^7      4.210·10^7      2.369·10^7      1.011

It is important to note that the exact binomial column is not an actual algorithm. This is because to use this would require knowledge of the exact value of $p$, which is the unknown that we are trying to find. In some sense, this represents the optimal number of draws necessary to achieve $(\epsilon, \delta)$ performance. The fact that the running time of the new estimate comes so close to the optimal number of draws without needing to know $p$ is one of the great strengths of the new approach.

In [3] a lower bound for the number of samples that any method would require was given in the more general case of $[0,1]$ random variables.
For $\{0,1\}$ random variables, this can be improved. The following theorem is proved in Section 4.

Theorem 2. For $\epsilon > 0$ and $\delta \in (0,1)$, any algorithm that returns $\hat p$ for $p \in [0, 1/2]$ satisfying $P(-\epsilon \leq (\hat p/p) - 1 \leq \epsilon) > 1-\delta$ must have

$$E[T] \geq (1/5)\epsilon^{-2}(1+2\epsilon)(1-\delta)\ln((2-\delta)\delta^{-1})p^{-1}.$$

As $\epsilon$ and $\delta$ go to 0, the ratio between the upper and lower bounds converges to 10 for these results. From Central Limit Theorem considerations, it is likely that the upper bound constant of 2 is the correct one (see Section 4).

2 The GBAS Algorithm

The algorithm is based upon properties of a one dimensional Poisson point process. Write $\mathrm{Exp}(\lambda)$ for the exponential distribution with rate $\lambda$ and mean $1/\lambda$. So $A \sim \mathrm{Exp}(\lambda)$ has density $f_A(t) = \lambda\exp(-\lambda t)\mathbb{1}(t \geq 0)$. Here $\mathbb{1}(\text{expression})$ denotes the indicator function that evaluates to 1 if the expression is true and is 0 otherwise.

Let $A_1, A_2, \ldots$ be iid $\mathrm{Exp}(\lambda)$ random variables. Set $T_i = A_1 + \cdots + A_i$. Then $P = \{T_i\}_{i=1}^{\infty}$ is a one dimensional Poisson point process of rate $\lambda$.

The sum of exponential random variables is well known to be a Gamma distributed random variable. (It is also called the Erlang distribution.) For all $i$, the distribution of $T_i$ is Gamma with shape parameter $i$ and rate parameter $\lambda$. The density of this random variable is

$$f_{T_i}(t) = [(i-1)!]^{-1}\lambda^i t^{i-1}\exp(-t\lambda)\mathbb{1}(t \geq 0).$$

Write $T_i \sim \mathrm{Gamma}(i, \lambda)$.

The key property used by the algorithm is thinning, where each point in $P$ is retained independently with probability $p$. The result is a new Poisson point process $P'$ which has rate $\lambda p$. (See for instance [11, p. 320].) The intuition is as follows.
For a Poisson point process of rate $\lambda$, the chance that a point in $P$ lies in an interval $[t, t+h]$ is approximately $\lambda h$, while the chance that a point in $P'$ lies in the interval $[t, t+h]$ is approximately $\lambda p h$, since points are only retained with probability $p$. Hence the new rate is $\lambda p$. For completeness, the next lemma verifies this fact directly by establishing that the distribution of the minimum point in $P'$ is $\mathrm{Exp}(\lambda p)$.

Lemma 1. Let $G \sim \mathrm{Geo}(p)$, so for $g \in \{1, 2, \ldots\}$, $P(G = g) = (1-p)^{g-1}p$. Let $A_1, A_2, \ldots \sim_{\text{iid}} A$ where $A \sim \mathrm{Exp}(\lambda)$. Then $A_1 + A_2 + \cdots + A_G \sim \mathrm{Exp}(\lambda p)$.

Proof. $G$ has moment generating function $M_G(t) = E[\exp(tG)] = pe^t/(1 - (1-p)e^t)$ when $t < -\ln(1-p)$. The moment generating function of $A$ is $M_A(t) = E[\exp(tA)] = \lambda(\lambda-t)^{-1}$ when $t < \lambda$. The moment generating function of $A_1 + \cdots + A_G$ is the composition

$$M_G(\ln(M_A(t))) = \frac{p\lambda(\lambda-t)^{-1}}{1 - (1-p)\lambda(\lambda-t)^{-1}} = \frac{p\lambda}{p\lambda - t},$$

when $t < p\lambda$, and so $A_1 + \cdots + A_G \sim \mathrm{Exp}(\lambda p)$.

Another useful fact is that exponential distributions (and so Gamma distributions) scale easily.

Lemma 2. Let $X \sim \mathrm{Gamma}(a, b)$. Then for $\beta \in (0, \infty)$, $\beta X \sim \mathrm{Gamma}(a, \beta^{-1}b)$.

Proof. The moment generating function of $X$ is $M_X(t) = [b/(b-t)]^a$ for $t < b$, so that of $\beta X$ is

$$E[\exp(t\beta X)] = M_X(\beta t) = [b/(b-\beta t)]^a = [\beta^{-1}b/(\beta^{-1}b - t)]^a,$$

exactly the moment generating function of a $\mathrm{Gamma}(a, \beta^{-1}b)$.

Together these results give the GBAS approach.

GBAS. Input: $k$.
  $S \leftarrow 0$, $R \leftarrow 0$.
  Repeat:
    $X \leftarrow \mathrm{Bern}(p)$, $A \leftarrow \mathrm{Exp}(1)$
    $S \leftarrow S + X$, $R \leftarrow R + A$
  Until $S = k$.
  Output $\hat p \leftarrow (k-1)/R$.

Lemma 3. The output $\hat p$ of GBAS satisfies $p/\hat p \sim \mathrm{Gamma}(k, k-1)$, and $E[\hat p] = p$. The number of $\mathrm{Bern}(p)$ calls $T$ in the algorithm satisfies $E[T] = k/p$. The relative error $(\hat p/p) - 1$ has density

$$f(s) = \frac{(k-1)^k}{(k-1)!} \cdot \frac{\exp(-(k-1)/(s+1))}{(s+1)^{k+1}}$$

for $s \geq -1$.

Proof. From Lemma 1, the distribution of $R$ is $\mathrm{Gamma}(k, p)$. From Lemma 2, the distribution of $p/\hat p = pR/(k-1)$ is $\mathrm{Gamma}(k, k-1)$. Hence $E[\hat p] = E[p/X]$ where $X \sim \mathrm{Gamma}(k, k-1)$. Now

$$E[1/X] = \int_0^{\infty} \frac{1}{s} \cdot \frac{(k-1)^k}{(k-1)!}s^{k-1}\exp(-(k-1)s)\,ds = \frac{(k-1)^k}{(k-1)!}\int_0^{\infty} s^{k-2}\exp(-(k-1)s)\,ds = \frac{(k-1)^k}{(k-1)!} \cdot \frac{(k-2)!}{(k-1)^{k-1}} = \frac{k-1}{k-1} = 1,$$

so $E[\hat p] = E[p/X] = p$. Since $T$, the number of $\mathrm{Bern}(p)$ draws by the algorithm, is the sum of $k$ geometric random variables (each with mean $1/p$), $T$ has mean $k/p$. The density of $(\hat p/p) - 1$ follows from the fact that $p/\hat p$ has a $\mathrm{Gamma}(k, k-1)$ distribution.

Note that for given $k$ and $a$, $P(\hat p/p \leq a)$ can be computed exactly in $\Theta(k)$ floating point operations using the incomplete gamma function. Hence for a given error bound and accuracy requirement, it is possible to exactly find the minimum $k$ using less work than flipping $k/p$ coins.

Suppose the user desires the absolute relative error to be greater than $\epsilon$ with probability at most $\delta$. The easiest way to compute this is to note

$$P(|\hat p/p - 1| > \epsilon) = P(p/\hat p < (1+\epsilon)^{-1} \text{ or } p/\hat p > (1-\epsilon)^{-1}).$$

Now $p/\hat p \sim \mathrm{Gamma}(k, k-1)$, so it remains to find the smallest value of $k$ that works for given $\epsilon$ and $\delta$. For instance, if $\epsilon = 0.1$ (so $p$ is desired to one significant figure) and $\delta = 0.05$, then $k = 388$ is the smallest value that provides the guarantee. Hence $388/p$ is the expected running time (see the table in the introduction).

3 Upper bounds on k

Suppose a user wants to find $k$ so that

$$P(a \leq \hat p/p - 1 \leq b) \geq c.$$

Then since

$$P(a \leq \hat p/p - 1 \leq b) = P((1+a)^{-1} \geq p/\hat p \geq (1+b)^{-1}),$$

and $p/\hat p \sim \mathrm{Gamma}(k, k-1)$, it suffices to find the smallest value of $k$ such that for $X \sim \mathrm{Gamma}(k, k-1)$,

$$P((1+a)^{-1} \geq X \geq (1+b)^{-1}) \geq c.$$
This is how the values in the table in the introduction were computed. That being said, it is useful to have a simple function $f(\epsilon, \delta)$ such that if $k \geq f(\epsilon, \delta)$ and $X \sim \mathrm{Gamma}(k, k-1)$, then $P((1-\epsilon)^{-1} \geq X \geq (1+\epsilon)^{-1}) > 1-\delta$. In particular, such a function exists for DKLR, and having such a function for GBAS allows a comparison of the time needed for the two methods.

Building such an $f$ requires theoretical bounds on the tail of a Gamma random variable. Chernoff bounds [2] are one way to get these bounds.

Fact 1 (Chernoff bounds). Let $X$ be a random variable with finite mean and finite moment generating function for $t \in [a, b]$, where $a \leq 0 \leq b$. Let $\gamma \in (0, \infty)$, and $h(\gamma) = E[\exp(tX)]/\exp(t\gamma E[X])$. Then

$$P(X \geq \gamma E[X]) \leq h(\gamma) \text{ for all } t \in [0, b] \text{ and } \gamma \geq 1,$$
$$P(X \leq \gamma E[X]) \leq h(\gamma) \text{ for all } t \in [a, 0] \text{ and } \gamma \leq 1.$$

Lemma 4. For $X \sim \mathrm{Gamma}(k, k-1)$, let $g(\gamma) = \gamma/\exp(\gamma - 1)$. Then

$$P(X \geq \gamma E[X]) \leq g(\gamma)^k \text{ for all } \gamma \geq 1,$$
$$P(X \leq \gamma E[X]) \leq g(\gamma)^k \text{ for all } \gamma \leq 1.$$

Proof. For $X \sim \mathrm{Gamma}(k, k-1)$, $E[X] = k/(k-1)$ and the moment generating function is $E[\exp(tX)] = (1 - t/(k-1))^{-k}$ when $t < k-1$. Letting $\alpha = t/(k-1)$, that makes $h(\gamma)$ from the Chernoff bound

$$h(\gamma) = (1-\alpha)^{-k}\exp(-\alpha\gamma k).$$

Letting $\alpha = 1 - 1/\gamma$ minimizes the right hand side, making it $[\gamma/\exp(\gamma-1)]^k$.

Now for a useful bound on the $g$ function.

Lemma 5. For $\epsilon \geq 0$,

$$\max\{g((1+\epsilon)^{-1}), g(1+\epsilon)\} \leq \exp(-(1/2)\epsilon^2(1-(4/3)\epsilon)).$$

Proof. Note $\ln(g(1+\epsilon)) = \ln(1+\epsilon) - \epsilon = -\epsilon^2/2 + \epsilon^3/3 - \cdots$, which is an alternating series for $\epsilon \geq 0$. Similarly, $\ln(g((1+\epsilon)^{-1})) = -\ln(1+\epsilon) + \epsilon/(1+\epsilon) = -\epsilon^2/2 + (2/3)\epsilon^3 - \cdots$, which is also an alternating series for $\epsilon \geq 0$.

Lemma 6. For $\epsilon \in (0, 3/4)$, when $k \geq 2\epsilon^{-2}(1-(4/3)\epsilon)^{-1}\ln(2\delta^{-1})$, $P(|(\hat p/p) - 1| > \epsilon) < \delta$.
Proof. Let $X \sim \mathrm{Gamma}(k, k-1)$. Then $(\hat p/p) - 1 \sim (1/X) - 1$, so

$$P(|(\hat p/p) - 1| > \epsilon) = P(|(1/X) - 1| > \epsilon) = P(-\epsilon > (1/X) - 1) + P((1/X) - 1 > \epsilon) = P((1-\epsilon)^{-1} < X) + P(X < (1+\epsilon)^{-1}) = P(X > \gamma_1 E[X]) + P(X < \gamma_2 E[X]) \leq g(\gamma_1)^k + g(\gamma_2)^k,$$

where $\gamma_1 = [(k-1)/k](1-\epsilon)^{-1}$ and $\gamma_2 = [(k-1)/k](1+\epsilon)^{-1}$. For $k \geq \epsilon^{-2}$, $(k-1)/k \geq 1-\epsilon^2$, so $\gamma_1 \geq 1+\epsilon$, and $\gamma_2 \leq (1+\epsilon)^{-1}$. Since $g(x) = x/\exp(x-1)$ is increasing when $x < 1$ and decreasing when $x > 1$, it holds that $g(\gamma_1) \leq g(1+\epsilon)$ and $g(\gamma_2) \leq g((1+\epsilon)^{-1})$. Using the previous lemma,

$$P(|(\hat p/p) - 1| > \epsilon) \leq 2\exp(-(1/2)\epsilon^2 k(1-(4/3)\epsilon))$$

and the result follows.

4 Lower bound on running time

The new algorithm intentionally introduces random smoothing to make the estimate easier to analyze. For a fixed number of flips, a sufficient statistic for the mean of a Bernoulli random variable is the number of times the coin came up heads. Call this number $S$.

For $n$ flips of the coin, $S$ will be a binomial random variable with parameters $n$ and $p$. Then $\hat p_n = S/n$ is the unbiased estimate of $p$. By the Central Limit Theorem, $\hat p_n$ will be approximately normally distributed with mean $p$ and standard deviation $\sqrt{p(1-p)/n}$. Therefore (for small $p$), $\hat p_n/p$ will be approximately normal with mean 1 and standard deviation $1/\sqrt{pn}$. Let $Z$ denote such a normal. Then well known bounds on the tails of the normal distribution give

$$\frac{\exp(-\epsilon^2 pn/2)}{\sqrt{2\pi}}\left(\frac{1}{\sqrt{\epsilon^2 pn}} - \frac{1}{(\epsilon^2 pn)^{3/2}}\right) \leq P(Z > 1+\epsilon) \leq \frac{\exp(-\epsilon^2 pn/2)}{\sqrt{2\pi}} \cdot \frac{1}{\sqrt{\epsilon^2 pn}}.$$

Therefore, to get $P(Z > 1+\epsilon) < \delta/2$ requires about $2\epsilon^{-2}p^{-1}\ln(2\delta^{-1})$ samples. A bound on the lower tail may be found in a similar fashion. Since only about this many samples are required by the algorithm of Section 2, the constant of 2 in front is most likely the best possible.
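This heuristic can be checked numerically: at $n = 2\epsilon^{-2}p^{-1}\ln(2\delta^{-1})$, the standardized deviation is $z = \epsilon\sqrt{pn} = \sqrt{2\ln(2\delta^{-1})}$, and the normal upper tail at that $z$ falls below $\delta/2$. A quick sketch (the helper name is ours):

```python
import math

def normal_sf(z):
    """P(N(0,1) > z), via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

eps, delta, p = 0.1, 0.05, 0.05
n = 2 * math.log(2 / delta) / (eps**2 * p)  # heuristic CLT sample size
z = eps * math.sqrt(p * n)                  # = sqrt(2 ln(2/delta))
tail = normal_sf(z)                         # approx P(hat_p_n / p > 1 + eps)
```

The tail comes out comfortably under $\delta/2$ because the bound above discards the extra $1/\sqrt{2\pi\epsilon^2 pn}$ factor, which is less than 1 at this $n$.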
To actually prove a lower bound, follow the approach of [3] that uses Wald's sequential probability ratio test. Consider the problem of testing hypothesis $H_0: p = p_0$ versus $H_1: p = p_1$, where $p_1 = p_0/(1+\epsilon)^2$. Suppose there is an approximating scheme that approximates $p$ within a factor of $1+\epsilon$ with chance at least $1-\delta/2$ for all $p \in [p_1, p_0]$ using $T$ flips of the coin. Then take the estimate $\hat p$ and accept $H_0$ (reject $H_1$) if $\hat p \geq p_1(1+\epsilon)$, and accept $H_1$ (reject $H_0$) if $\hat p < p_1(1+\epsilon)$. Then let $\alpha$ be the chance that $H_0$ is rejected even though it is true, and $\beta$ be the chance that $H_1$ is rejected even though it is true. From the properties of the approximation scheme, $\alpha$ and $\beta$ are both at most $\delta/2$.

Wald presented the sequential probability ratio test for testing $H_0$ versus $H_1$, and showed that it minimized the expected number of coin flips among all tests with the type I and type II error probabilities $\alpha$ and $\beta$ [15]. This result was formulated as Corollary 7.2 in [3].

Fact 2 (Corollary 7.2 of [3]). If $T$ is the stopping time of any test of $H_0$ versus $H_1$ with error probabilities $\alpha$ and $\beta$ such that $\alpha + \beta \leq \delta$, then

$$E[T \mid H_0] \geq -(1-\delta)\omega_0^{-1}\ln((2-\delta)\delta^{-1}),$$

where $\omega_0 = E[\ln(f_1(X)/f_0(X))]$ with $X \sim \mathrm{Bern}(p_0)$, $f_0(x) = p_0\mathbb{1}(x=1) + (1-p_0)\mathbb{1}(x=0)$, and $f_1(x) = p_1\mathbb{1}(x=1) + (1-p_1)\mathbb{1}(x=0)$.

This gives the following lemma for $\mathrm{Bern}(p)$ random variables.

Lemma 7. Fix $\epsilon > 0$ and $\delta \in (0,1)$. Let $T$ be the stopping time of any $(\epsilon, \delta/2)$ approximation scheme that applies to $X_i \sim \mathrm{Bern}(p)$ for all $p \in [0,1]$. Then

$$E[T] \geq (1/5)\epsilon^{-2}(1+2\epsilon)(1-\delta)\ln((2-\delta)\delta^{-1})p^{-1}.$$

Proof. As noted above, using the approximation scheme with $\epsilon$ and $\delta/2$ to test if $p_0 = p$ or $p_1 = p_0/(1+\epsilon)^2$ gives $\alpha \leq \delta/2$ and $\beta \leq \delta/2$.
Here

$$\omega_0 = p_0\ln(p_1/p_0) + (1-p_0)\ln((1-p_1)/(1-p_0)) = p_0[\ln(p_1/p_0) + (1/p_0 - 1)\ln((1-p_1)/(1-p_0))] = p_0\ln\left[\frac{p_1(1-p_1)^{1/p_0-1}}{p_0(1-p_0)^{1/p_0-1}}\right].$$

Consider a function of the form $g(x) = x(1-x)^{1/c-1}$ where $c$ is a constant. Then $g(x) > 0$ for $x \in (0,1)$, and $g'(x) = g(x)x^{-1} - (1/c - 1)g(x)(1-x)^{-1}$, which gives

$$g'(x) > 0 \Leftrightarrow x^{-1} - (1/c-1)(1-x)^{-1} > 0 \Leftrightarrow x < c.$$

Hence for all $p_0 > p_1$, $\ln(p_1(1-p_1)^{1/p_0-1})$ is strictly increasing in $p_1$. Setting $p_1 = p_0$ gives $\omega_0 = 0$, so $\omega_0 < 0$ for $0 < p_1 < p_0 \leq 1$. Using $\alpha + \beta \leq \delta$ and $\omega_0 < 0$ in Fact 2 gives

$$E[T] \geq -\omega_0^{-1}(1-\delta)\ln((2-\delta)\delta^{-1}).$$

Since $\ln(1+x) = x - x^2/2 + \cdots$ is alternating and decreasing in magnitude for $x \in (0,1)$:

$$\ln\left(\frac{p_1}{p_0}\right) = \ln\left(\frac{1}{(1+\epsilon)^2}\right) = -2\ln(1+\epsilon) \geq -2\epsilon.$$

Also, since $1 - (1+\epsilon)^{-2} = (2\epsilon+\epsilon^2)/(1+\epsilon)^2$,

$$\left(\frac{1}{p_0}-1\right)\ln\left(\frac{1-p_1}{1-p_0}\right) = \left(\frac{1-p_0}{p_0}\right)\ln\left(\frac{1-p_0(1+\epsilon)^{-2}}{1-p_0}\right) = \left(\frac{1-p_0}{p_0}\right)\ln\left(1 + \frac{p_0(1-(1+\epsilon)^{-2})}{1-p_0}\right) \geq \left(\frac{1-p_0}{p_0}\right)\left[\frac{p_0(1-(1+\epsilon)^{-2})}{1-p_0} - \frac{1}{2}\left(\frac{p_0(1-(1+\epsilon)^{-2})}{1-p_0}\right)^2\right] = \frac{2\epsilon+\epsilon^2}{(1+\epsilon)^2} - \frac{1}{2}\cdot\left(\frac{2\epsilon+\epsilon^2}{(1+\epsilon)^2}\right)^2\cdot\frac{p_0}{1-p_0}.$$

For $p_0 \leq 1/2$, $p_0/(1-p_0) \leq 1$ and the last factor of the second term can be removed. Putting the bounds on the terms of $\omega_0$ together,

$$\omega_0 \geq p_0\left[-2\epsilon + \frac{2\epsilon+\epsilon^2}{(1+\epsilon)^2} - \frac{1}{2}\cdot\left(\frac{2\epsilon+\epsilon^2}{(1+\epsilon)^2}\right)^2\right] = -p_0\cdot\frac{5\epsilon^2(1 + 2\epsilon + (3/2)\epsilon^2 + (2/5)\epsilon^3)}{(1+\epsilon)^4} \geq -p_0\cdot\frac{5\epsilon^2}{1+2\epsilon}.$$

The last inequality follows from the fact that for $\epsilon > 0$,

$$(1+2\epsilon)(1 + 2\epsilon + (3/2)\epsilon^2 + (2/5)\epsilon^3) \leq (1+\epsilon)^4.$$

5 Extension to [0,1] random variables

A well known trick allows extension of the algorithm to $[0,1]$ random variables with mean $\mu$, rather than just Bernoullis.

Lemma 8. Let $W$ be a $[0,1]$ random variable with mean $\mu$.
Then for $U \sim \mathrm{Unif}([0,1])$, $P(U \leq W) = \mu$.

Proof. For $U \sim \mathrm{Unif}([0,1])$ and $W \in [0,1]$,

$$P(U \leq W) = \int_0^1 P(U \leq w)\,dF(w) = \int_0^1 w\,dF(w) = E[W].$$

Therefore the algorithm of Section 2 can be applied to any $[0,1]$ random variable at the cost of one uniform on $[0,1]$ per draw of the random variable.

6 Conclusions

A new algorithm for estimating the mean of $[0,1]$ random variables is given with the remarkable property that the relative error in the estimate has a distribution independent of the quantity to be estimated. The estimate is unbiased. To obtain an estimate which has absolute relative error $\epsilon$ with probability at least $1-\delta$ requires at most $2\epsilon^{-2}(1-(4/3)\epsilon)^{-1}p^{-1}\ln(2\delta^{-1})$ samples. The factor of 2 is an improvement over the factor of $4(e-2)$ in [3]. Informal Central Limit Theorem arguments indicate that this factor of 2 in the running time is the best possible. The provable lower bound on the constant is improved from the $(1/4)e^{-2} \approx 0.0338$ of [3] to $1/5$ for $\{0,1\}$ random variables.

References

[1] J. Besag and P. Clifford. Generalized Monte Carlo significance tests. Biometrika, 76(4):633–642, Dec 1989.

[2] H. Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Ann. of Math. Stat., 23:493–509, 1952.

[3] P. Dagum, R. Karp, M. Luby, and S. Ross. An optimal algorithm for Monte Carlo estimation. SIAM J. Comput., 29(5):1484–1496, 2000.

[4] P. Galbusera, L. Lens, T. Schenck, E. Waiyaki, and E. Matthysen. Genetic variability and gene flow in the globally, critically-endangered Taita thrush. Conservation Genetics, 1:45–55, 2000.

[5] M. Huber, Y. Chen, I. Dinwoodie, A. Dobra, and M. Nicholas. Monte Carlo algorithms for Hardy-Weinberg proportions. Biometrics, 62:49–53, Mar 2006.

[6] M. Huber and J. Law. Fast approximation of the permanent for very dense problems.
In Proc. of 19th ACM-SIAM Symp. on Discrete Alg., pages 681–689, 2008.

[7] R. M. Karp, M. Luby, and N. Madras. Monte Carlo approximation algorithms for enumeration problems. J. Algorithms, 10(3):429–448, 1989.

[8] L. Lovász and S. Vempala. Simulated annealing in convex bodies and an $\tilde{O}(n^4)$ volume algorithm. In Proc. 44th Sympos. on Foundations of Computer Science, pages 650–659, 2003.

[9] P. C. Mahalanobis. A sample survey of the acreage under jute in Bengal with discussion of planning of experiments. Sankhyā, 4:511–531, 1940.

[10] R. Bakeman, V. Quera, and B. F. Robinson. Testing sequential association: Estimating exact p values using sampled permutations. Psychological Methods, 1(1):4–15, 1996.

[11] S. Resnick. Adventures in Stochastic Processes. Birkhäuser, 1992.

[12] E. Rahme, L. Joseph, and T. W. Gyorkos. Bayesian sample size determination for estimating binomial parameters from data subject to misclassification. Appl. Statist., 49:119–128, 2000.

[13] W. J. Rogan and B. Gladen. Estimating prevalence from the results of a screening test. Am. J. Epidemiol., 107(1):71–76, 1978.

[14] P. Senchaudhuri, C. Mehta, and N. R. Patel. Estimating exact p values by the method of control variates or Monte Carlo rescue. J. Amer. Statist. Assoc., 90(430), 1995.

[15] A. Wald. Sequential Analysis. John Wiley, 1947.
