Inverse Sampling for Nonasymptotic Sequential Estimation of Bounded Variable Means
Authors: Xinjia Chen
Revised on December 2, 2007

Abstract

In this paper, we consider the nonasymptotic sequential estimation of means of random variables bounded in between zero and one. We have rigorously demonstrated that, in order to guarantee prescribed relative precision and confidence level, it suffices to continue sampling until the sample sum is no less than a certain bound and then take the average of the samples as an estimate for the mean of the bounded random variable. We have developed an explicit formula and a bisection search method for the determination of such a bound of sample sum, without any knowledge of the bounded variable. Moreover, we have derived bounds for the distribution of the sample size. In the special case of Bernoulli random variables, we have established analytical and numerical methods to further reduce the bound of sample sum and thus improve the efficiency of sampling. Furthermore, the fallacies of existing results are detected and analyzed.

1 Introduction

In various fields of science and engineering, it is a frequent problem to estimate the means of bounded random variables. In particular, Bernoulli random variables constitute an extremely important class of bounded variables, since the universal problem of estimating the probability of an event can be formulated as the estimation of the mean of a Bernoulli variable. For example, the problems of estimating network reliability [9], the probability of acceptable performance of uncertain systems [18] [22], and approximating probabilistic inference in Bayesian networks [7] can be cast into the framework of estimating the means of Bernoulli variables. Clearly, Bernoulli variables can be considered as a special class of random variables bounded in [0, 1].
In many applications, one needs to estimate a quantity µ which can be bounded in [0, 1] after proper operations of scaling and translation. A typical approach is to design an experiment that produces a random variable X distributed in [0, 1] with expectation µ, run the experiment independently a number of times, and use the average of the outcomes as the estimate [6]. (The author is currently with the Department of Electrical Engineering, Louisiana State University at Baton Rouge, LA 70803, USA, and the Department of Electrical Engineering, Southern University and A&M College, Baton Rouge, LA 70813, USA; Email: chenxinjia@gmail.com.) This technique, referred to as the Monte Carlo method, has been applied to tackle a wide range of difficult problems, for instance, estimating multidimensional integrals, volumes and counts [9] [20], finding approximate solutions to enumeration problems [17], approximating the permanent of 0-1 valued matrices [16], solving the Ising model of statistical mechanics [15], and evaluating the bit error rate of communication systems [19].

Since the estimator of the mean of X is obtained from finitely many samples of X and is thus of random nature, for the estimator to be useful, it is necessary to ensure with a sufficiently high confidence that the estimation error is within a certain margin. The well-known Chernoff-Hoeffding bound [5] [14] asserts that if the sample size is fixed and greater than ln(2/δ)/(2ε²), then, with probability at least 1 − δ, the sample mean approximates µ with absolute error ε. Often, however, µ is small, and a good absolute error estimate of µ is typically a poor relative error approximation of µ [6]. Therefore, we seek an (ε, δ) approximation for µ in the sense that the relative error of the estimator is within a margin ε with probability at least 1 − δ.
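As a quick numeric illustration of this fixed-sample-size requirement and of why an absolute-error guarantee is weak for small µ, the following sketch evaluates the Chernoff-Hoeffding sample size (the function name is ours, not from the paper):

```python
import math

def hoeffding_sample_size(eps: float, delta: float) -> int:
    """Smallest fixed sample size n exceeding ln(2/delta)/(2*eps^2).
    By the Chernoff-Hoeffding bound, the sample mean of n i.i.d.
    [0, 1]-valued variables is then within absolute error eps of mu
    with probability at least 1 - delta."""
    return math.floor(math.log(2.0 / delta) / (2.0 * eps * eps)) + 1

n = hoeffding_sample_size(0.01, 0.05)
print(n)  # 18445 samples for absolute error 0.01 at 95% confidence

# For a small mean such as mu = 0.005, an absolute error of 0.01 is a
# relative error of 200%, so this n gives no useful relative-precision
# guarantee, which is the motivation for the (eps, delta) approximation.
```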
Since the mean value µ is exactly what we want to estimate, it is usually not easy to obtain a reasonably tight lower bound for µ. For a sampling scheme with fixed sample size, a loose lower bound of µ can lead to a very conservative sample size. In the most difficult and important case that no positive lower bound of µ is available, it is not possible to guarantee prescribed relative precision and confidence level by a sampling scheme with a fixed sample size. This forces us to look at sampling methods with random sample sizes. The estimation techniques based on sampling schemes without fixed sample sizes have formed a rich branch of modern statistics under the heading of sequential estimation. Wald provided a brief introduction to this area in his seminal book [23]. Ghosh et al. offered a comprehensive exposition in [10]. In particular, Nadas proposed in [21] a sequential sampling scheme for estimating mean values with relative precision. Nadas's sequential method requires no specific information on the mean value to be estimated. However, his sampling scheme is of asymptotic nature: the confidence requirement is guaranteed only as the margin of relative error ε tends to 0, which implies that the actual sample size has to tend to infinity. This drawback severely limits the application of his sampling scheme. Due to the inherent unknown statistical error, asymptotic methods have been criticized in the literature (see, e.g., [9], [13], and the references therein). Especially, researchers in the areas of randomized algorithms, control and communication systems are very reluctant to use asymptotic methods for quantifying the uncertainty of estimation, for the purpose of avoiding another level of uncertainty, namely, the unknown error of inference (see, e.g., [18], [19], [20], [22] and the references therein).
Nevertheless, when a nonasymptotic method is not available or is too conservative, one has to resort to asymptotic methods. In recent years, aimed at making Monte Carlo estimation a more efficient and rigorous method, Dagum et al. and Cheng have attempted to develop nonasymptotic sequential methods for estimating means of random variables bounded in [0, 1]. To guarantee prescribed relative precision and confidence level, Dagum et al. proposed in [6] that one should continue sampling until the sample sum is no less than a threshold value. Obviously, this is simply a generalization of the classical inverse binomial sampling [11] [12]. However, the determination of the threshold of sample sum is not trivial. Dagum et al. provided an explicit formula for computing such a threshold value ensuring prescribed relative precision and confidence level. In [4], Cheng attempted to improve the efficiency by using a smaller threshold value.

In this paper, we revisit the sequential estimation of means of random variables bounded in [0, 1]. We discovered that Dagum et al. and Cheng have left major flaws in the determination of the threshold of sample sum. Specifically, the proof of Dagum et al. for their claim on the reliability of the estimator is incomplete, and the gap cannot be filled by using their arguments. The proof of Cheng for his claim on the reliability of the estimator is basically incorrect. Most importantly, we have developed a new approach to determine the smallest value of the threshold and thus make the sampling much more efficient. An explicit formula for the threshold of sample sum is also derived, which is substantially smaller than that of Dagum et al. A direct consequence of our explicit formula is that Dagum's claim can be proved as a special result of ours.
Moreover, we have derived general bounds on the distribution of sample sizes. Our method applies to arbitrary random variables bounded in [0, 1]. In the special case of Bernoulli random variables, we have developed a method to further reduce the threshold value and thus improve the efficiency of sampling. In particular, a computational method is established for computing the minimum threshold value when knowledge of the Bernoulli parameter is available.

The remainder of the paper is organized as follows. In Section 2, our general theory of inverse sampling is presented. We discuss inverse binomial sampling in Section 3. In Section 4, we illustrate an application example in the performance evaluation of communication systems. Section 5 is the conclusion. All proofs are given in the Appendices. The mistakes of existing works are examined in Appendices D and E.

Throughout this paper, we shall use the following notations. The expectation of a random variable is denoted by E[·]. The set of integers is denoted by Z. The ceiling function and floor function are denoted respectively by ⌈·⌉ and ⌊·⌋ (i.e., ⌈x⌉ represents the smallest integer no less than x; ⌊x⌋ represents the largest integer no greater than x). The limit as t decreases to 0 is denoted by lim_{t↓0}. The notation "⇐⇒" means "if and only if". The other notations will be made clear as we proceed.

2 General Inverse Sampling

Let X be a bounded random variable defined in a probability space (Ω, F, Pr) such that 0 ≤ X ≤ 1 and E[X] = µ ∈ (0, 1). We wish to estimate the mean of X by using a sequence of i.i.d. random samples X₁, X₂, ··· of X based on the following inverse sampling scheme: Continue sampling until the sample size reaches a number n such that the sample sum Σ_{i=1}^n X_i is no less than a positive number γ.
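The stopping rule can be stated operationally as follows (a minimal simulation sketch; the function name, the choice of `draw`, and the seed are our own illustrative choices, not from the paper):

```python
import random

def inverse_sample(draw, gamma):
    """Keep drawing i.i.d. samples X1, X2, ... from draw(), each in [0, 1],
    until the sample sum is no less than gamma; return the sample size n
    and the average of the samples at stopping."""
    total, n = 0.0, 0
    while total < gamma:
        total += draw()
        n += 1
    return n, total / n

random.seed(0)
# Example: Bernoulli samples with p = 0.3 and threshold gamma = 50.
n, mean = inverse_sample(lambda: float(random.random() < 0.3), 50)
```

Since each sample is at most 1, stopping requires n ≥ γ, and the sample sum at stopping is less than γ + 1.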
We call this an inverse sampling scheme, since it reduces to the classical inverse binomial sampling scheme [11] [12] in the special case that X is a Bernoulli random variable. We shall consider the following two estimators for µ:

µ̃ = γ/n,  µ̂ = (γ − 1)/(n − 1).

Specially, when X is a Bernoulli random variable and γ is an integer, µ̃ and µ̂ are, respectively, the maximum likelihood estimator and the minimum variance unbiased estimator for the binomial parameter [8] [11] [12]. It should be noted that µ̃ is not an unbiased estimator of the binomial parameter; the bias may be considerable for small values of γ.

To control the uncertainty of estimation, for a margin of relative error ε ∈ (0, 1) and a confidence coefficient δ ∈ (0, 1), it is highly desirable to determine the minimum γ such that Pr{|µ̃ − µ| < εµ} > 1 − δ when the estimator µ̃ is used, and Pr{|µ̂ − µ| < εµ} > 1 − δ when the estimator µ̂ is used. For this purpose, we have

Theorem 1  Let ε ∈ (0, 1) and γ > 1. Let X₁, X₂, ··· be a sequence of i.i.d. random variables defined in a probability space (Ω, F, Pr) such that 0 ≤ X_i ≤ 1 and E[X_i] = µ ∈ (0, 1) for any positive integer i. Define µ̃ = γ/n and µ̂ = (γ − 1)/(n − 1), where n is a random variable such that n(ω) = min{n ∈ Z : Σ_{i=1}^n X_i(ω) ≥ γ} for any ω ∈ Ω. Define

Q̃(ε, γ) = (1 + ε)^{−γ} exp(εγ/(1 + ε)) + [(γ − 1 + ε)/(γ − εγ)]^γ exp((1 − εγ − ε)/(1 − ε))

and

Q̂(ε, γ) = (1 + ε)^{−γ} exp(εγ/(1 + ε)) + [(γ − 1)/(γ − εγ)]^γ exp((1 − εγ)/(1 − ε)).

Then, the following statements hold true.

(I) Pr{|µ̃ − µ|/µ ≥ ε} ≤ Q̃(ε, γ) provided that γ > (1 − ε)/ε.

(II) Pr{|µ̂ − µ|/µ ≥ ε} ≤ Q̂(ε, γ) provided that γ > 1/ε.

(III) Q̃(ε, γ) is monotone decreasing with respect to γ > (1 − ε)/ε. Moreover, for any δ ∈ (0, 1), there exists a unique number γ̃ > (1 − ε)/ε such that Q̃(ε, γ̃) = δ.
(IV) Q̂(ε, γ) is monotone decreasing with respect to γ > 1/ε. Moreover, for any δ ∈ (0, 1), there exists a unique number γ̂ > 1/ε such that Q̂(ε, γ̂) = δ.

(V) (1 − ε)γ̂ < γ̃ < γ̂ < (1 + ε) ln(2/δ) / [(1 + ε) ln(1 + ε) − ε] < (1 + ε) ln(2/δ) / [(2 ln 2 − 1) ε²] < 4(e − 2)(1 + ε) ln(2/δ) / ε². Moreover,

lim_{δ→0} γ̃ [ln(1 + ε) − ε/(1 + ε)] / ln(2/δ) = lim_{ε→0} γ̃ [ln(1 + ε) − ε/(1 + ε)] / ln(2/δ) = 1.  (1)

(VI) For θ > µ/γ,

Pr{n ≥ γ(1 + θ)/µ} ≤ (1 + θ − µ/γ)^{(1 + θ − µ/γ) γ/µ} × [(1 − µ)/(1 + θ − µ/γ − µ)]^{(1 + θ − µ/γ − µ) γ/µ}.

(VII) For 0 < θ < 1 − µ,

Pr{n ≤ γ(1 − θ)/µ} ≤ (1 − θ)^{(1 − θ) γ/µ} × [(1 − µ)/(1 − θ − µ)]^{(1 − θ − µ) γ/µ}.

See Appendix A for a proof. From Theorem 1, we can see that γ̃ and γ̂ can be readily computed by a bisection search method by making use of the monotonicity properties and the bounds provided in (III), (IV) and (V). As an immediate application of Theorem 1, we can easily determine the bound (i.e., threshold value) of sample sum without a lower bound of µ. Specially, we have

Corollary 1  Let ε, δ ∈ (0, 1) and γ > 1. Let X₁, X₂, ··· be a sequence of i.i.d. random variables defined in a probability space (Ω, F, Pr) such that 0 ≤ X_i ≤ 1 and E[X_i] = µ ∈ (0, 1) for any positive integer i. Define µ̃ = γ/n and µ̂ = (γ − 1)/(n − 1), where n is a random variable such that n(ω) = min{n ∈ Z : Σ_{i=1}^n X_i(ω) ≥ γ} for any ω ∈ Ω. Then,

Pr{|µ̃ − µ|/µ < ε} > 1 − δ,  Pr{|µ̂ − µ|/µ < ε} > 1 − δ  (2)

provided that

γ > (1 + ε) ln(2/δ) / [(1 + ε) ln(1 + ε) − ε].  (3)

Corollary 1 provides an explicit bound of sample sum in the inverse sampling scheme to ensure the reliability requirements (2). Actually, as can be seen from Theorem 1, an implicit bound, γ̂, makes the sampling scheme more efficient while guaranteeing (2). When ε or δ is small, the explicit bound is close to the implicit bound, as indicated by (1). In [6], Dagum et al.
claimed that, in order to ensure

Pr{|µ̃ − µ|/µ ≤ ε} ≥ 1 − δ,  (4)

it suffices to have γ greater than Υ₁ = 1 + 4(e − 2)(1 + ε) ln(2/δ)/ε². For the same purpose of guaranteeing (4), Cheng claimed in [4] that γ can be reduced to α = (1 + ε) ln(2/δ_s) / [(1 + ε) ln(1 + ε) − ε], where δ_s satisfies the equation

1 − (1 − δ_s/2) { (1 − δ_s) + [1 − 2(δ_s/2)^{(1+ε)/(1+2ε)}] (δ_s/2) + [1 − 2(δ_s/2)^{(1+ε)/(1+3ε)}] (δ_s/2)² } = δ.

However, as can be seen from our analysis in Appendices D and E, their arguments in justification of these claims are fundamentally flawed. The chain of inequalities of statement (V) of Theorem 1 shows that our explicit bound (3) is significantly smaller than Υ₁. This indicates that the bound Υ₁, obtained by Dagum et al., indeed suffices for ensuring (4), though their proof is not correct.

Although Cheng failed to prove his claim on the reliability of µ̃, he obtained in [4] the following useful bounds on the average sample size:

γ/µ ≤ E[n] < γ/µ + 1  (5)

by making use of the observation that X₁ + ··· + X_{n−1} < γ ≤ X₁ + ··· + X_n and Wald's identity to conclude that µ(E[n] − 1) < γ ≤ µE[n] and thus (5). From (5), it can be seen that the average sample size is almost proportional to γ. Hence, it is reasonable to compare the efficiency of different inverse sampling schemes by their bounds of sample sum γ. For this purpose, we have plotted our explicit bound, the implicit bound γ̂ and the bound Υ₁ of Dagum et al. in Figs. 1-4. It can be seen that the bound of sample sum of Dagum et al. is too conservative and leads to a substantial waste of sampling effort.

Figure 1: Bounds of Sample Sum versus Margin of Relative Error (δ = 0.05)

Figure 2: Bounds of Sample Sum versus Margin of Relative Error (δ = 0.05)

Figure 3: Bounds of Sample Sum versus Margin of Relative Error (δ = 0.05)

Figure 4: Bounds of Sample Sum versus Margin of Relative Error (δ = 0.05)

3 Inverse Binomial Sampling

For the special case that X is a Bernoulli random variable, the sampling can be made more efficient. When no knowledge of the binomial parameter is available, we have the following results that can be used to determine the threshold value, which is smaller than its counterpart in the general inverse sampling.

Theorem 2  Let ε ∈ (0, 1). Let X₁, X₂, ··· be a sequence of i.i.d. Bernoulli random variables defined in a probability space (Ω, F, Pr) such that Pr{X_i = 1} = p ∈ (0, 1) and Pr{X_i = 0} = 1 − p = q for any positive integer i. Define p̃ = γ/n, where γ is a positive integer and n is a random variable such that n(ω) = min{n ∈ Z : Σ_{i=1}^n X_i(ω) = γ} for any ω ∈ Ω. Then, Pr{|p̃ − p|/p ≥ ε} ≤ Q(ε, γ), where

Q(ε, γ) = (1 + ε)^{−γ} exp(εγ/(1 + ε)) + (1 − ε)^{−γ} exp(−εγ/(1 − ε)),

which is monotone decreasing with respect to γ. Moreover, for any δ ∈ (0, 1), there exists a unique number γ* such that Q(ε, γ*) = δ, and

max{ (1 + ε) ln(1/δ) / [(1 + ε) ln(1 + ε) − ε],  (1 − ε) ln(2/δ) / [(1 − ε) ln(1 − ε) + ε] } < γ* < γ̃ < (1 + ε) ln(2/δ) / [(1 + ε) ln(1 + ε) − ε].
Furthermore,

Pr{n ≥ γ(1 + θ)/p} ≤ [(1 − p)/(1 − p + θ)]^{(1 − p + θ) γ/p} (1 + θ)^{(1 + θ) γ/p},  θ > 0,

and

Pr{n ≤ γ(1 − θ)/p} ≤ [(1 − p)/(1 − p − θ)]^{(1 − p − θ) γ/p} (1 − θ)^{(1 − θ) γ/p},  0 < θ < 1 − p.

See Appendix B for a proof. From Theorem 2, it is clear that γ* can be readily obtained by a bisection search. When the binomial parameter p is known to be bounded in [a, b] ⊂ (0, 1), it is desirable to further reduce the conservativeness by a computational method. For instance, one may wish to determine the smallest γ such that Pr{|p̂ − p|/p < ε} > 1 − δ for any p ∈ [a, b] ⊂ (0, 1). For this purpose, an essential computational routine is to check whether a given value of γ is large enough to ensure Pr{|p̂ − p|/p < ε} > 1 − δ for any p ∈ [a, b]. At first glance, it seems necessary to evaluate Pr{|p̂ − p|/p < ε} for infinitely many values of p. Fortunately, our following result indicates that the number of evaluations can be reduced to finitely many.

Theorem 3  Let ε ∈ (0, 1). Let X₁, X₂, ··· be a sequence of i.i.d. Bernoulli random variables defined in a probability space (Ω, F, Pr) such that Pr{X_i = 1} = p ∈ (0, 1) and Pr{X_i = 0} = 1 − p = q for any positive integer i. Define p̃ = γ/n and p̂ = (γ − 1)/(n − 1), where γ is a positive integer and n is a random variable such that n(ω) = min{n ∈ Z : Σ_{i=1}^n X_i(ω) = γ} for any ω ∈ Ω. Then, the minimum of Pr{|p̂ − p|/p < ε} with respect to p ∈ [a, b] ⊂ (0, 1) is achieved on the set

{a, b} ∪ { (γ − 1)/[(1 − ε)(ℓ + γ − 1)] ∈ (a, b) | ℓ = 0, 1, ···, ∞ } ∪ { (γ − 1)/[(1 + ε)(ℓ + γ − 1)] ∈ (a, b) | ℓ = 0, 1, ···, ∞ }.

Similarly, the minimum of Pr{|p̃ − p|/p < ε} with respect to p ∈ [a, b] ⊂ (0, 1) is achieved on the set

{a, b} ∪ { γ/[(1 − ε)(ℓ + γ)] ∈ (a, b) | ℓ = 0, 1, ···, ∞ } ∪ { γ/[(1 + ε)(ℓ + γ)] ∈ (a, b) | ℓ = 0, 1, ···, ∞ }.

See Appendix C for a proof.
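The two computational routines just described — a bisection search for γ* exploiting the monotonicity of Q(ε, γ), and the finite-set minimization licensed by Theorem 3 — can be sketched as follows (function names are ours; the coverage probability is evaluated directly from the negative binomial distribution of n):

```python
from math import comb, exp, log, floor, ceil

def Q(eps, gamma):
    """Q(eps, gamma) of Theorem 2, an upper bound on Pr{|p~ - p| >= eps*p}."""
    return exp(-gamma * (log(1 + eps) - eps / (1 + eps))) + \
           exp(-gamma * (log(1 - eps) + eps / (1 - eps)))

def gamma_star(eps, delta, tol=1e-9):
    """Bisection for the unique gamma* with Q(eps, gamma*) = delta,
    bracketed by the explicit bounds stated in Theorem 2."""
    lo = (1 + eps) * log(1 / delta) / ((1 + eps) * log(1 + eps) - eps)
    hi = (1 + eps) * log(2 / delta) / ((1 + eps) * log(1 + eps) - eps)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if Q(eps, mid) > delta else (lo, mid)
    return hi

def coverage(eps, gamma, p):
    """Pr{|p^ - p|/p < eps} for p^ = (gamma-1)/(n-1), where
    Pr{n = k} = C(k-1, gamma-1) p^gamma (1-p)^(k-gamma), k >= gamma."""
    lo = (gamma - 1) / ((1 + eps) * p) + 1   # need n > lo
    hi = (gamma - 1) / ((1 - eps) * p) + 1   # need n < hi
    return sum(comb(k - 1, gamma - 1) * p**gamma * (1 - p)**(k - gamma)
               for k in range(max(gamma, floor(lo) + 1), ceil(hi))
               if lo < k < hi)

def candidates(eps, gamma, a, b):
    """Finite set of Theorem 3 on which the minimum of the coverage
    probability over p in [a, b] is attained."""
    pts = {a, b}
    for s in (1 - eps, 1 + eps):
        ell = 0
        while True:
            p = (gamma - 1) / (s * (ell + gamma - 1))
            if p <= a:       # p decreases in ell, so we may stop here
                break
            if p < b:
                pts.add(p)
            ell += 1
    return sorted(pts)

def minimum_gamma(eps, delta, a, b):
    """Smallest integer gamma >= 2 with coverage > 1 - delta on [a, b]."""
    gamma = 2
    while min(coverage(eps, gamma, p)
              for p in candidates(eps, gamma, a, b)) <= 1 - delta:
        gamma += 1
    return gamma
```

For example, `gamma_star(0.1, 0.05)` lies strictly between the two explicit bounds of Theorem 2, and `minimum_gamma` implements the incremental search starting from γ = 2 described below.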
The application of Theorem 3 in the computation of the minimum γ is obvious. For a fixed value of γ, since the minimum of the coverage probability with respect to p ∈ [a, b] is attained on a finite set, it can be determined by a computer whether γ is large enough to ensure Pr{|p̂ − p|/p < ε} > 1 − δ for any p ∈ [a, b]. Starting from γ = 2, one can find the minimum γ by gradually incrementing γ and checking whether γ is large enough.

For p = p_ℓ = (γ − 1)/[(1 + ε)(ℓ + γ − 1)] ∈ (a, b), we have

Pr{|p̂ − p|/p < ε} = Pr{ (γ − 1)/[(1 + ε)p] + 1 < n < (γ − 1)/[(1 − ε)p] + 1 }
= Pr{ ℓ < n − γ < ℓ + (2ε/(1 − ε))(ℓ + γ − 1) }
= Σ_{i=ℓ+1}^{ℓ+⌈(2ε/(1−ε))(ℓ+γ−1)⌉−1} C(γ + i − 1, i) p_ℓ^γ (1 − p_ℓ)^i.

For p = p_ℓ = (γ − 1)/[(1 − ε)(ℓ + γ − 1)] ∈ (a, b), we have

Pr{|p̂ − p|/p < ε} = Pr{ (γ − 1)/[(1 + ε)p] + 1 < n < (γ − 1)/[(1 − ε)p] + 1 }
= Pr{ ℓ − (2ε/(1 + ε))(ℓ + γ − 1) < n − γ < ℓ }
= Σ_{i=max(0, ℓ−⌈(2ε/(1+ε))(ℓ+γ−1)⌉+1)}^{ℓ−1} C(γ + i − 1, i) p_ℓ^γ (1 − p_ℓ)^i.

Convenient formulas for the computation of Pr{|p̃ − p|/p < ε} can be derived in a similar way. We would like to note that the method of reducing the number of evaluations of the coverage probability can also be developed for the problems of computing minimum fixed sample sizes for the estimation of a Poisson parameter and of proportions of infinite and finite populations. In this direction, we have recent research works [1] [2] [3].

4 An Application Example

In this section, we shall illustrate the application of the general inverse sampling method in information technology. Consider the evaluation of the bit error rate performance of a communication system. The stream of bits is divided into blocks of bits with length L > 1. Each block is modulated as waveforms and transmitted via a noisy channel. At the receiver side, the block of bits is recovered by demodulation. Due to the impact of noise, there may be incorrectly recovered bits.
Let Z be the number of erroneous bits. Then, Z/L is a random variable bounded in [0, 1], assuming possible values ℓ/L, ℓ = 0, 1, ···, L. The bit error rate can be defined as P_e = E[Z/L]. Since each block of bits is modulated and demodulated identically and independently, we have a sequence of i.i.d. random variables Z₁, Z₂, ···, which have the same distribution as Z. To estimate P_e, we can continue the simulation of the modulation and demodulation process until the number of blocks reaches a number n such that

Σ_{i=1}^n Z_i > L (1 + ε) ln(2/δ) / [(1 + ε) ln(1 + ε) − ε].

An estimate of the bit error rate can be taken as

P̂_e = (1/(n − 1)) [ (1 + ε) ln(2/δ) / ((1 + ε) ln(1 + ε) − ε) − 1 ].

Then, by our explicit formula (3),

Pr{ |P̂_e − P_e| / P_e < ε } > 1 − δ.

It should be noted that existing asymptotic estimation methods are not appropriate in this context, since the bit error rate P_e is usually very small. Special results for Bernoulli random variables are not applicable, since Z/L is a random variable assuming (L + 1) > 2 values.

5 Concluding Remarks

The problem of finding relative precision estimates for means of random variables bounded in between zero and one has numerous applications and has been studied for a long period of time. Despite the lack of rigorous justification, it was a significant progress made by a number of researchers to realize that the sample mean ensures the prescribed reliability once the sample sum reaches a certain threshold value. Our main contributions are twofold. First, we have discovered critical mistakes in the determination of the threshold value, which determines the reliability of the estimate and the efficiency of sampling.
Second, we have developed explicit formulas and computational methods to calculate the threshold value, which make the sampling as efficient as possible, while guaranteeing prescribed relative precision and confidence level.

A Proof of Theorem 1

Lemma 1  Let ε ∈ (0, 1) and γ > 1/ε. Let η = (εγ − 1)/(γ − 1) and let ζ be the number determined by 1/(1 − ε) = 1/(1 − ζ) + 1/γ. Then, 0 < η < ζ < ε < 1.

Proof. Since 0 < ε < 1 and γ > 1/ε, we have γ > 1 and 0 < η = ε − (1 − ε)/(γ − 1) < ε < 1. To show 0 < ζ < 1, it suffices to exclude three possibilities. First, ζ ≠ 1 because ε ≠ 1. Second, ζ > 1 is impossible, since otherwise 1/(1 − ε) < 1/γ ⇐⇒ γ < 1 − ε, contradicting γ > 1/ε. Third, ζ ≤ 0 is impossible, since otherwise 1/(1 − ε) ≤ 1 + 1/γ ⇐⇒ γ ≤ (1 − ε)/ε, contradicting γ > 1/ε. Finally, since 0 < ε < 1, 0 < ζ < 1 and (1 − η)/(1 − ζ) = (γ − 1 + ε)/(γ − 1) > 1, we have ζ > η. □

We need to use some inequalities concerning the function φ(x) = ln(1 + x) − x/(1 + x) for |x| < 1.

Lemma 2  Let ε ∈ (0, 1). Then, φ(−ε) > ε²/(2(1 − ε)) > ε²/2 > φ(ε) > ε²(2 ln 2 − 1)/(1 + ε) > 0.

Proof. To show φ(ε) < ε²/2, it suffices to note that φ(ε) = ε²/2 = 0 for ε = 0 and that

d[φ(ε) − ε²/2]/dε = ε/(1 + ε)² − ε < 0 for 0 < ε < 1.

To show φ(−ε) > ε²/(2(1 − ε)), it suffices to show (1 − ε) ln(1 − ε) + ε − ε²/2 > 0 for ε ∈ (0, 1). This is true because the left side assumes the value 0 for ε = 0 and its derivative is −ln(1 − ε) − ε > 0 for any ε ∈ (0, 1).

Define f(ε) = (1 + ε) ln(1 + ε) − ε − (2 ln 2 − 1)ε². To show φ(ε) > ε²(2 ln 2 − 1)/(1 + ε), it suffices to show f(ε) > 0. Note that f′(ε) = ln(1 + ε) − 2(2 ln 2 − 1)ε and f″(ε) = 1/(1 + ε) − 2(2 ln 2 − 1) > 0 if ε < 1/(2(2 ln 2 − 1)) − 1. Hence, f′(ε) is increasing for 0 < ε < 1/(2(2 ln 2 − 1)) − 1 and decreasing for 1/(2(2 ln 2 − 1)) − 1 < ε < 1.
Since f′(0) = 0 and f′(1) = ln 2 − 2(2 ln 2 − 1) = 2 − 3 ln 2 < 0, there exists a unique null point ε⋆ ∈ (1/(2(2 ln 2 − 1)) − 1, 1) of f′(ε). This implies that f(ε) is monotone increasing for 0 < ε < ε⋆ and monotone decreasing for ε⋆ < ε < 1. Observing that f(0) = f(1) = 0, we can conclude that f(ε) > 0 for any ε ∈ (0, 1). □

Lemma 3  Let ε, δ ∈ (0, 1) and γ ≥ ln(2/δ)/φ(ε). Let η = (εγ − 1)/(γ − 1). Then, φ(−η) > φ(ε) > 0.

Proof. Since φ(0) = 0 and φ′(ε) = ε/(1 + ε)², it follows that φ(ε) > 0 for any ε ∈ (0, 1). Note that

1 − η = 1 − ε + (1 − ε)/(γ − 1) = (1 − ε)γ/(γ − 1)

and

η/(1 − η) = [ε − (1 − ε)/(γ − 1)] / [(1 − ε)γ/(γ − 1)] = ε/(1 − ε) − 1/((1 − ε)γ).

Hence,

φ(−η) = ln[(1 − ε)(1 + 1/(γ − 1))] + ε/(1 − ε) − 1/((1 − ε)γ).

By the third inequality of Lemma 2, we have φ(ε) < ε²/2 for ε ∈ (0, 1) and thus γ > ln 4/ε² for any δ ∈ (0, 1). Since 1/(γ − 1) > 0, using the inequality ln(1 + x) ≥ x/(1 + x) with x = 1/(γ − 1), we have ln(1 + 1/(γ − 1)) ≥ [1/(γ − 1)] / [1 + 1/(γ − 1)] = 1/γ and thus

φ(−η) ≥ ln(1 − ε) + 1/γ + ε/(1 − ε) − 1/((1 − ε)γ) = ln(1 − ε) + ε/(1 − ε) − ε/((1 − ε)γ).

Define w(ε) = ln(1 − ε) + ε/(1 − ε) − ε³/((1 − ε) ln 4). Applying γ > ln 4/ε², we have −ε/((1 − ε)γ) > −ε³/((1 − ε) ln 4) and thus φ(−η) > w(ε). To show φ(ε) < φ(−η), it suffices to show φ(ε) < w(ε). Note that φ(0) − w(0) = 0 and

φ′(ε) − w′(ε) = ε/(1 + ε)² − [ε/(1 − ε)² − 3ε²/((1 − ε) ln 4) − ε³/((1 − ε)² ln 4)] = ε²[u(ε) − 8 ln 2] / [(1 + ε)²(1 − ε)² ln 4],

where u(ε) = (1 + ε)²(3 − 2ε). Since u′(ε) = 2(1 + ε)(2 − 3ε) = 0 if ε = 2/3, the maximum of u(ε) over the interval [0, 1] must be achieved at ε = 0, 2/3 or 1. It can be checked that u(0) = 3, u(1) = 4 and u(2/3) = 125/27 < 8 ln 2. This shows that u(ε) < 8 ln 2, which implies φ′(ε) − w′(ε) < 0 for any ε ∈ (0, 1). It follows that 0 < φ(ε) < w(ε) < φ(−η).
□

A classical result due to [14] is restated as Lemma 4 as follows.

Lemma 4  Define M(z, µ) = ln(µ/z) + (1/z − 1) ln((1 − µ)/(1 − z)) for 0 < z < 1 and 0 < µ < 1. Let X₁, ···, X_n be i.i.d. random variables bounded in [0, 1] with common mean value µ ∈ (0, 1). Then, Pr{Σ_{i=1}^n X_i / n ≥ z} ≤ exp(nz M(z, µ)) for 1 > z > µ = E[X_i]. Similarly, Pr{Σ_{i=1}^n X_i / n ≤ z} ≤ exp(nz M(z, µ)) for 0 < z < µ = E[X_i].

Lemma 5  M((1 + ε)µ, µ) is monotone decreasing with respect to ε ∈ (0, 1/µ − 1). Similarly, M((1 − ε)µ, µ) is monotone decreasing with respect to ε ∈ (0, 1).

Proof. For ε ∈ (0, 1/µ − 1), we have 0 < (1 + ε)µ < 1 and

∂M((1 + ε)µ, µ)/∂ε = ∂/∂ε { ln(1/(1 + ε)) + [1/(µ(1 + ε)) − 1] ln((1 − µ)/(1 − µ(1 + ε))) }
= −1/(1 + ε) − [1/(µ(1 + ε)²)] ln((1 − µ)/(1 − µ(1 + ε))) + [1/(µ(1 + ε)) − 1] · µ/(1 − µ(1 + ε))
= −[1/(µ(1 + ε)²)] ln((1 − µ)/(1 − µ(1 + ε))) < 0.

Similarly, ∂M((1 − ε)µ, µ)/∂ε = [1/(µ(1 − ε)²)] ln((1 − µ)/(1 − µ(1 − ε))) < 0 for 0 < ε < 1. This completes the proof of the lemma. □

Lemma 6  M((1 + ε)µ, µ) is no greater than −φ(ε) for 0 < µ < 1/(1 + ε). Similarly, M((1 − ε)µ, µ) is no greater than −φ(−ε) for 0 < µ < 1.

Proof. First, note that

lim_{µ→0} M((1 + ε)µ, µ) = lim_{µ→0} { ln(1/(1 + ε)) + [1/(µ(1 + ε)) − 1] ln((1 − µ)/(1 − µ(1 + ε))) }
= ln(1/(1 + ε)) − lim_{µ→0} ln((1 − µ)/(1 − µ(1 + ε))) + lim_{µ→0} [1/(µ(1 + ε))] ln((1 − µ)/(1 − µ(1 + ε)))
= ln(1/(1 + ε)) + lim_{µ→0} { ln(1 − µ) − ln[1 − µ(1 + ε)] } / (µ(1 + ε))
= ln(1/(1 + ε)) + lim_{µ→0} [ −1/(1 − µ) + (1 + ε)/(1 − µ(1 + ε)) ] / (1 + ε)
= ln(1/(1 + ε)) + ε/(1 + ε) = −φ(ε)

and, similarly, lim_{µ→0} M((1 − ε)µ, µ) = ln(1/(1 − ε)) − ε/(1 − ε) = −φ(−ε). Next, we need to show that M((1 + ε)µ, µ) is monotone decreasing with respect to µ.
To this end, note that

∂M((1 + ε)µ, µ)/∂µ = ∂/∂µ { ln(1/(1 + ε)) + [1/(µ(1 + ε)) − 1] ln((1 − µ)/(1 − µ(1 + ε))) }
= ∂/∂µ { [1/(µ(1 + ε)) − 1] ln((1 − µ)/(1 − µ(1 + ε))) }
= −[1/(µ²(1 + ε))] ln((1 − µ)/(1 − µ(1 + ε))) + [1/(µ(1 + ε)) − 1][ −1/(1 − µ) + (1 + ε)/(1 − µ(1 + ε)) ]
= −[1/(µ²(1 + ε))] ln((1 − µ)/(1 − µ(1 + ε))) + ε/(µ(1 − µ)(1 + ε))
≤ 0

if ln((1 − µ)/(1 − µ(1 + ε))) ≥ εµ/(1 − µ), i.e.,

ln(1 − εµ/(1 − µ)) ≤ −εµ/(1 − µ).  (6)

Since 0 < µ < 1/(1 + ε), we have 0 < εµ/(1 − µ) < 1. Using the fact that ln(1 − x) < −x for any x ∈ (0, 1), we can conclude (6) and thus establish the monotone decreasing property of M((1 + ε)µ, µ).

Similarly, to show that M((1 − ε)µ, µ) is monotone decreasing with respect to µ, note that

∂M((1 − ε)µ, µ)/∂µ = ∂/∂µ { ln(1/(1 − ε)) + [1/(µ(1 − ε)) − 1] ln((1 − µ)/(1 − µ(1 − ε))) }
= −[1/(µ²(1 − ε))] ln((1 − µ)/(1 − µ(1 − ε))) − ε/(µ(1 − µ)(1 − ε)) ≤ 0

if ln((1 − µ)/(1 − µ(1 − ε))) ≥ −εµ/(1 − µ), i.e.,

ln(1 + εµ/(1 − µ)) ≤ εµ/(1 − µ).  (7)

Since εµ/(1 − µ) > 0, using the fact that ln(1 + x) < x for any x ∈ (0, ∞), we can conclude (7) and thus establish the monotone decreasing property of M((1 − ε)µ, µ).

Finally, since both functions M((1 + ε)µ, µ) and M((1 − ε)µ, µ) are monotone decreasing with respect to µ, these two functions must be bounded from above by their corresponding limit values as µ tends to 0, which have been obtained at the beginning of the proof. This proves the lemma. □

Lemma 7  M(γµ/(γ(1 + θ) − µ), µ) is monotone decreasing with respect to θ > µ/γ.

Proof. Let z = γµ/(γ(1 + θ) − µ) and m = γ/z.
For θ > µ/γ, we have 0 < z < µ, ∂m/∂θ > 0 and

∂[γM(z, µ)]/∂θ = γ [ −(1/z) ∂z/∂θ + (1/z − 1)(1/(1 − z)) ∂z/∂θ − (1/z²)(∂z/∂θ) ln((1 − µ)/(1 − z)) ]
= −(γ/z²)(∂z/∂θ) ln((1 − µ)/(1 − z))
= (∂m/∂θ) ln((1 − µ)/(1 − z)) < 0,

which implies that M(z, µ) is monotone decreasing with respect to θ > µ/γ. This proves the lemma. □

We are now in a position to prove Theorem 1.

A.1 Proof of (I)

By the definition of the estimator µ̃ = γ/n,

Pr{|µ̃ − µ|/µ ≥ ε} = Pr{n ≤ γ/(µ(1 + ε))} + Pr{n ≥ γ/(µ(1 − ε))}.

Hence, we shall derive upper bounds for the tail probabilities Pr{n ≤ γ/(µ(1 + ε))} and Pr{n ≥ γ/(µ(1 − ε))}.

We first bound Pr{n ≤ γ/(µ(1 + ε))}. Since n is an integer, we have

Pr{n ≤ γ/(µ(1 + ε))} = Pr{n ≤ ⌊γ/(µ(1 + ε))⌋} = Pr{n ≤ γ/(µ(1 + ε*))},

where ε* is a number such that γ/(µ(1 + ε*)) = ⌊γ/(µ(1 + ε))⌋. Clearly,

ε* = γ / (µ ⌊γ/(µ(1 + ε))⌋) − 1 ≥ ε > 0.

For simplicity of notation, let m = γ/(µ(1 + ε*)). Since m is a nonnegative integer, it can be zero or a natural number. If m = 0, then

Pr{n ≤ γ/(µ(1 + ε))} = Pr{n ≤ m} = 0 < exp(−γφ(ε)).

Otherwise, if m ≥ 1, then

Pr{n ≤ γ/(µ(1 + ε*))} = Pr{n ≤ m} = Pr{X₁ + ··· + X_m ≥ γ} = Pr{X̄ ≥ z},

where X̄ = Σ_{i=1}^m X_i / m and z = γ/m = µ(1 + ε*) > µ. Now we shall consider three cases.

(i): In the case of z > 1, we have Pr{X̄ ≥ z} ≤ Pr{Σ_{i=1}^m X_i > m} = 0 < exp(−γφ(ε)).

(ii): In the case of z = 1, we have µ = 1/(1 + ε*), m = γ and

Pr{X̄ ≥ z} = Pr{Σ_{i=1}^m X_i = m} = Π_{i=1}^m Pr{X_i = 1} ≤ Π_{i=1}^m E[X_i] = µ^m = [1/(1 + ε*)]^γ ≤ [1/(1 + ε)]^γ ≤ exp(−γφ(ε)).

(iii): In the case of µ < z < 1, by Lemma 4, we have Pr{X̄ ≥ z} ≤ exp(mz M(z, µ)) = exp(γ M((1 + ε*)µ, µ)). Since ε* ≥ ε, it must be true that µ(1 + ε) ≤ µ(1 + ε*) < 1 and that M((1 + ε*)µ, µ) ≤ M((1 + ε)µ, µ) as a result of Lemma 5. Hence,

Pr{n ≤ γ/(µ(1 + ε))} = Pr{X̄ ≥ z} ≤ exp(γ M((1 + ε)µ, µ))
(8)

By Lemma 6, we have $M((1+\varepsilon)\mu,\mu)\le-\varphi(\varepsilon)$. It follows that
$$\Pr\left\{n\le\frac{\gamma}{\mu(1+\varepsilon)}\right\} = \Pr\{\overline{X}\ge z\} \le \exp(-\gamma\varphi(\varepsilon)).$$
Therefore, we have shown
$$\Pr\left\{n\le\frac{\gamma}{\mu(1+\varepsilon)}\right\}\le\exp(-\gamma\varphi(\varepsilon)) \tag{9}$$
for all cases.

We now bound $\Pr\{n\ge\frac{\gamma}{\mu(1-\varepsilon)}\}$. Since $n$ is an integer, we have
$$\Pr\left\{n\ge\frac{\gamma}{\mu(1-\varepsilon)}\right\} = \Pr\left\{n\ge\left\lceil\frac{\gamma}{\mu(1-\varepsilon)}\right\rceil\right\} = \Pr\left\{n>\frac{\gamma}{\mu(1-\varepsilon)}-1\right\}.$$
Let $\zeta$ be a number such that $\frac{1}{1-\varepsilon}=\frac{1}{1-\zeta}+\frac{1}{\gamma}$. Then, $0<\zeta<\varepsilon$ as a result of Lemma 1. By the definition of $\zeta$, we have
$$\frac{\gamma}{\mu(1-\varepsilon)}-1 > \frac{\gamma}{\mu}\left(\frac{1}{1-\varepsilon}-\frac{1}{\gamma}\right) = \frac{\gamma}{\mu(1-\zeta)}$$
for any $\mu\in(0,1)$. Hence,
$$\Pr\left\{n>\frac{\gamma}{\mu(1-\varepsilon)}-1\right\} \le \Pr\left\{n>\frac{\gamma}{\mu(1-\zeta)}\right\} = \Pr\left\{n>\frac{\gamma}{\mu(1-\zeta^*)}\right\}$$
with $\zeta^*$ satisfying $\frac{\gamma}{\mu(1-\zeta^*)}=\lceil\frac{\gamma}{\mu(1-\zeta)}\rceil$. Clearly, $1>\zeta^*\ge\zeta>0$. Let $m=\frac{\gamma}{\mu(1-\zeta^*)}$. Then, $m$ is a positive integer and
$$\Pr\{n>m\} = \Pr\{X_1+\cdots+X_m<\gamma\} = \Pr\{\overline{X}<z\}$$
where $\overline{X}=\frac{\sum_{i=1}^m X_i}{m}$ and $z=(1-\zeta^*)\mu$. Applying Lemma 4, we have
$$\Pr\left\{n>\frac{\gamma}{\mu(1-\zeta)}\right\} = \Pr\{\overline{X}<z\} \le \exp(mz\,M(z,\mu)) = \exp(\gamma M((1-\zeta^*)\mu,\mu)).$$
Note that $M((1-\zeta^*)\mu,\mu)\le M((1-\zeta)\mu,\mu)$ as a result of $1>\zeta^*\ge\zeta>0$ and Lemma 5. Hence, $\Pr\{\overline{X}<z\}\le\exp(\gamma M((1-\zeta)\mu,\mu))$. By Lemma 6, we have $M((1-\zeta)\mu,\mu)\le-\varphi(-\zeta)$. It follows that
$$\Pr\left\{n>\frac{\gamma}{\mu(1-\zeta)}\right\}\le\exp(-\gamma\varphi(-\zeta)). \tag{10}$$

Thus, we have bounds for the two tail probabilities as follows:
$$\Pr\left\{n\le\frac{\gamma}{\mu(1+\varepsilon)}\right\}\le\exp(-\gamma\varphi(\varepsilon)),\qquad \Pr\left\{n\ge\frac{\gamma}{\mu(1-\varepsilon)}\right\}\le\exp(-\gamma\varphi(-\zeta)).$$
It follows that
$$\Pr\left\{\frac{|\widetilde\mu-\mu|}{\mu}\ge\varepsilon\right\} = \Pr\left\{n\le\frac{\gamma}{\mu(1+\varepsilon)}\right\}+\Pr\left\{n\ge\frac{\gamma}{\mu(1-\varepsilon)}\right\} \le \exp(-\gamma\varphi(\varepsilon))+\exp(-\gamma\varphi(-\zeta)) = \widetilde{Q}(\varepsilon,\gamma),$$
where we have used the definitions of $\zeta$ and $\varphi(\cdot)$ in the last equality. This completes the proof of statement (I).

A.2 Proof of (II)

By the definition of the estimator $\widehat\mu=\frac{\gamma-1}{n-1}$, we have
$$\Pr\left\{\frac{|\widehat\mu-\mu|}{\mu}\ge\varepsilon\right\} = \Pr\left\{n\le1+\frac{\gamma-1}{(1+\varepsilon)\mu}\right\} + \Pr\left\{n\ge1+\frac{\gamma-1}{(1-\varepsilon)\mu}\right\}.$$
To bound $\Pr\{n\le1+\frac{\gamma-1}{(1+\varepsilon)\mu}\}$, we shall consider two cases.
(i): In the case of $(1+\varepsilon)\mu>1$, we have $\Pr\{n\le1+\frac{\gamma-1}{(1+\varepsilon)\mu}\}\le\Pr\{n<\gamma\}=0\le\Pr\{n\le\frac{\gamma}{(1+\varepsilon)\mu}\}$ because $n\ge\sum_{i=1}^n X_i\ge\gamma$ is always true.

(ii): In the case of $(1+\varepsilon)\mu\le1$, we have $\Pr\{n\le1+\frac{\gamma-1}{(1+\varepsilon)\mu}\}\le\Pr\{n\le\frac{\gamma}{(1+\varepsilon)\mu}\}$.

Therefore, in both cases, we have $\Pr\{n\le1+\frac{\gamma-1}{(1+\varepsilon)\mu}\}\le\Pr\{n\le\frac{\gamma}{(1+\varepsilon)\mu}\}$ and, by virtue of (9),
$$\Pr\left\{n\le1+\frac{\gamma-1}{(1+\varepsilon)\mu}\right\}\le\exp(-\gamma\varphi(\varepsilon)). \tag{11}$$
To bound $\Pr\{n\ge1+\frac{\gamma-1}{(1-\varepsilon)\mu}\}$, let $\eta=\frac{\varepsilon\gamma-1}{\gamma-1}$ and note that
$$\Pr\left\{n\ge1+\frac{\gamma-1}{(1-\varepsilon)\mu}\right\} = \Pr\left\{n\ge\left\lceil1+\frac{\gamma-1}{(1-\varepsilon)\mu}\right\rceil\right\} = \Pr\left\{n>\frac{\gamma-1}{(1-\varepsilon)\mu}\right\} = \Pr\left\{n>\frac{\gamma}{(1-\eta)\mu}\right\} \le \exp(-\gamma\varphi(-\eta)) \tag{12}$$
where (12) follows from a method similar to that used in proving (10). Combining (11) and (12) and invoking the definitions of $\eta$ and $\varphi(\cdot)$ yields
$$\Pr\left\{\frac{|\widehat\mu-\mu|}{\mu}\ge\varepsilon\right\} \le \exp(-\gamma\varphi(\varepsilon))+\exp(-\gamma\varphi(-\eta)) = \widehat{Q}(\varepsilon,\gamma).$$
This completes the proof of statement (II).

A.3 Proof of (III)

Note that $\widetilde{Q}(\varepsilon,\gamma)=\exp(-\gamma\varphi(\varepsilon))+\exp(-\gamma\varphi(-\zeta))$, where $\zeta$ is determined by $\frac{1}{1-\varepsilon}=\frac{1}{1-\zeta}+\frac{1}{\gamma}$. By the chain rule of differentiation, we have
$$\frac{\partial\widetilde{Q}(\varepsilon,\gamma)}{\partial\gamma} = -\varphi(\varepsilon)\exp(-\gamma\varphi(\varepsilon)) - \exp(-\gamma\varphi(-\zeta))\left[\varphi(-\zeta)+\gamma\frac{d\varphi(-\zeta)}{d\zeta}\frac{\partial\zeta}{\partial\gamma}\right] < 0$$
by observing that $\varphi(\varepsilon)>0$, $\varphi(-\zeta)>0$, $\frac{d\varphi(-\zeta)}{d\zeta}>0$ and $\frac{\partial\zeta}{\partial\gamma}>0$. This proves that $\widetilde{Q}(\varepsilon,\gamma)$ is monotone decreasing with respect to $\gamma>\frac{1-\varepsilon}{\varepsilon}$.

The existence and uniqueness of $\widetilde\gamma$ in the interval $\left(\frac{1-\varepsilon}{\varepsilon},\infty\right)$ can be seen from the monotone decreasing property of $\widetilde{Q}(\varepsilon,\gamma)$ with respect to $\gamma>\frac{1-\varepsilon}{\varepsilon}$ and the facts that
$$\lim_{\gamma\to\infty}\widetilde{Q}(\varepsilon,\gamma)=0,\qquad \lim_{\gamma\to\frac{1-\varepsilon}{\varepsilon}}\widetilde{Q}(\varepsilon,\gamma) > \lim_{\gamma\to\frac{1-\varepsilon}{\varepsilon}}\left(\frac{\gamma-1+\varepsilon}{\gamma-\varepsilon\gamma}\right)^{\gamma}\exp\left(\frac{1-\varepsilon\gamma-\varepsilon}{1-\varepsilon}\right) = 1.$$

A.4 Proof of (IV)

Note that $\widehat{Q}(\varepsilon,\gamma)=\exp(-\gamma\varphi(\varepsilon))+\exp(-\gamma\varphi(-\eta))$, where $\eta=\frac{\varepsilon\gamma-1}{\gamma-1}$.
By the chain rule of differentiation, we have
$$\frac{\partial\widehat{Q}(\varepsilon,\gamma)}{\partial\gamma} = -\varphi(\varepsilon)\exp(-\gamma\varphi(\varepsilon)) - \exp(-\gamma\varphi(-\eta))\left[\varphi(-\eta)+\gamma\frac{d\varphi(-\eta)}{d\eta}\frac{\partial\eta}{\partial\gamma}\right] < 0$$
by observing that $\varphi(\varepsilon)>0$, $\varphi(-\eta)>0$, $\frac{d\varphi(-\eta)}{d\eta}>0$ and $\frac{\partial\eta}{\partial\gamma}>0$. This proves that $\widehat{Q}(\varepsilon,\gamma)$ is monotone decreasing with respect to $\gamma>\frac{1}{\varepsilon}$.

The existence and uniqueness of $\widehat\gamma$ in the interval $\left(\frac{1}{\varepsilon},\infty\right)$ can be seen from the monotone decreasing property of $\widehat{Q}(\varepsilon,\gamma)$ with respect to $\gamma>\frac{1}{\varepsilon}$ and the facts that
$$\lim_{\gamma\to\infty}\widehat{Q}(\varepsilon,\gamma)=0,\qquad \lim_{\gamma\to\frac{1}{\varepsilon}}\widehat{Q}(\varepsilon,\gamma) > \lim_{\gamma\to\frac{1}{\varepsilon}}\left(\frac{\gamma-1}{\gamma-\varepsilon\gamma}\right)^{\gamma}\exp\left(\frac{1-\varepsilon\gamma}{1-\varepsilon}\right) = 1.$$

A.5 Proof of (V)

First, we shall show the upper bound for $\widehat\gamma$. For this purpose, note that, for $\gamma\ge\frac{\ln\frac{2}{\delta}}{\varphi(\varepsilon)}$, we have $0<\varphi(\varepsilon)<\varphi(-\eta)$ as a result of Lemma 3. Hence,
$$\widehat{Q}(\varepsilon,\gamma) < 2\exp(-\gamma\varphi(\varepsilon)) \le \delta \quad\text{for } \gamma\ge\frac{\ln\frac{2}{\delta}}{\varphi(\varepsilon)}.$$
By the third inequality of Lemma 2, we have $\frac{\ln\frac{2}{\delta}}{\varphi(\varepsilon)}>\frac{2\ln\frac{2}{\delta}}{\varepsilon^2}>\frac{1}{\varepsilon}$. Since $\widehat{Q}(\varepsilon,\widehat\gamma)=\delta$ and $\widehat{Q}(\varepsilon,\gamma)$ is monotone decreasing with respect to $\gamma>\frac{1}{\varepsilon}$, it must be true that $\widehat\gamma<\frac{\ln\frac{2}{\delta}}{\varphi(\varepsilon)}$. Applying Lemma 2, we have
$$\widehat\gamma < \frac{\ln\frac{2}{\delta}}{\varphi(\varepsilon)} < \frac{(1+\varepsilon)\ln\frac{2}{\delta}}{(2\ln2-1)\varepsilon^2} < \frac{4(e-2)(1+\varepsilon)\ln\frac{2}{\delta}}{\varepsilon^2}.$$

Second, we shall show $\widetilde\gamma<\widehat\gamma$. Clearly, if $\widetilde\gamma<\frac{1}{\varepsilon}$, then $\widehat\gamma>\widetilde\gamma$ is trivially true since $\widehat\gamma>\frac{1}{\varepsilon}$. Thus, we can focus on the case that both $\widehat\gamma$ and $\widetilde\gamma$ are greater than $\frac{1}{\varepsilon}$. For $\gamma>\frac{1}{\varepsilon}$, by Lemma 1, we have $\eta<\zeta$ and consequently $\widetilde{Q}(\varepsilon,\gamma)<\widehat{Q}(\varepsilon,\gamma)$. As a result, $\delta=\widetilde{Q}(\varepsilon,\widetilde\gamma)=\widehat{Q}(\varepsilon,\widehat\gamma)>\widetilde{Q}(\varepsilon,\widehat\gamma)$. Since $\widetilde{Q}(\varepsilon,\widetilde\gamma)>\widetilde{Q}(\varepsilon,\widehat\gamma)$ and $\frac{\partial\widetilde{Q}(\varepsilon,\gamma)}{\partial\gamma}<0$, we have $\widetilde\gamma<\widehat\gamma$.

Third, we shall show $(1-\varepsilon)\widehat\gamma<\widetilde\gamma$. In light of the facts that
$$\eta=\frac{\varepsilon\widehat\gamma-1}{\widehat\gamma-1},\qquad \frac{1}{1-\varepsilon}=\frac{1}{1-\zeta}+\frac{1}{\widetilde\gamma},$$
we have
$$1-\eta=\frac{(1-\varepsilon)\widehat\gamma}{\widehat\gamma-1},\qquad 1-\zeta=\frac{1-\varepsilon}{1-\frac{1-\varepsilon}{\widetilde\gamma}},\qquad \frac{1-\eta}{1-\zeta}=\frac{\widehat\gamma}{\widehat\gamma-1}\left(1-\frac{1-\varepsilon}{\widetilde\gamma}\right).$$
Therefore, if $(1-\varepsilon)\widehat\gamma\ge\widetilde\gamma$, then $\frac{1-\eta}{1-\zeta}\le1$, which implies $\zeta\le\eta$.
As a result,
$$\exp(-\widetilde\gamma\varphi(\varepsilon))+\exp(-\widetilde\gamma\varphi(-\zeta)) \ge \exp(-\widetilde\gamma\varphi(\varepsilon))+\exp(-\widetilde\gamma\varphi(-\eta)) > \exp(-\widehat\gamma\varphi(\varepsilon))+\exp(-\widehat\gamma\varphi(-\eta)) = \delta,$$
which contradicts $\exp(-\widetilde\gamma\varphi(\varepsilon))+\exp(-\widetilde\gamma\varphi(-\zeta))=\widetilde{Q}(\varepsilon,\widetilde\gamma)=\delta$. Hence, it must be true that $(1-\varepsilon)\widehat\gamma<\widetilde\gamma$.

Now, we shall show (1). Since $\varphi(\varepsilon)>0$ and $\exp(-\widetilde\gamma\varphi(\varepsilon))<\delta=\widetilde{Q}(\varepsilon,\widetilde\gamma)$, we have $\frac{\ln\frac{1}{\delta}}{\varphi(\varepsilon)}<\widetilde\gamma$. Combining this lower bound with the previously established upper bound yields
$$\frac{\ln\frac{1}{\delta}}{\varphi(\varepsilon)} < \widetilde\gamma < \frac{\ln\frac{2}{\delta}}{\varphi(\varepsilon)}. \tag{13}$$
Clearly, as an immediate consequence of (13), we have
$$\lim_{\delta\to0}\frac{\widetilde\gamma}{[\varphi(\varepsilon)]^{-1}\ln\frac{2}{\delta}}=1.$$
It remains to show $\lim_{\varepsilon\to0}\frac{\widetilde\gamma}{[\varphi(\varepsilon)]^{-1}\ln\frac{2}{\delta}}=1$. To this end, we need to prove
$$\lim_{\varepsilon\to0}\frac{\exp(-\widetilde\gamma\varphi(\varepsilon))/\delta}{\exp(-\widetilde\gamma\varphi(-\zeta))/\delta}=1.$$
It suffices to show $\lim_{\varepsilon\to0}\widetilde\gamma\,[\varphi(\varepsilon)-\varphi(-\zeta)]=0$. By virtue of (13) and the condition $\frac{1}{1-\varepsilon}=\frac{1}{1-\zeta}+\frac{1}{\widetilde\gamma}$,
$$\frac{\zeta}{\varepsilon} = \frac{1-\frac{1-\varepsilon}{\varepsilon\widetilde\gamma}}{1-\frac{1-\varepsilon}{\widetilde\gamma}} \ge \frac{1-\frac{1-\varepsilon}{\varepsilon}\frac{\varphi(\varepsilon)}{\ln\frac{1}{\delta}}}{1-\frac{1-\varepsilon}{\widetilde\gamma}}. \tag{14}$$
Making use of the inequality (14) and the facts that $\frac{\varphi(\varepsilon)}{\varepsilon}\to0$ and $\widetilde\gamma>\frac{1-\varepsilon}{\varepsilon}\to\infty$ as $\varepsilon\to0$, we have
$$\liminf_{\varepsilon\to0}\frac{\zeta}{\varepsilon} \ge \lim_{\varepsilon\to0}\frac{1-\frac{1-\varepsilon}{\varepsilon}\frac{\varphi(\varepsilon)}{\ln\frac{1}{\delta}}}{1-\frac{1-\varepsilon}{\widetilde\gamma}} = 1.$$
On the other hand, $\limsup_{\varepsilon\to0}\frac{\zeta}{\varepsilon}\le1$ because $\zeta<\varepsilon$. Hence,
$$\lim_{\varepsilon\to0}\frac{\zeta}{\varepsilon}=1. \tag{15}$$
Applying the upper bound in (13), we have
$$\exp(-\widetilde\gamma\varphi(\varepsilon)) > \frac{\delta}{2} = \frac{1}{2}\widetilde{Q}(\varepsilon,\widetilde\gamma) = \frac{1}{2}\exp(-\widetilde\gamma\varphi(\varepsilon))+\frac{1}{2}\exp(-\widetilde\gamma\varphi(-\zeta)),$$
which leads to $\exp(-\widetilde\gamma\varphi(\varepsilon))>\exp(-\widetilde\gamma\varphi(-\zeta))$, or equivalently, $\varphi(\varepsilon)-\varphi(-\zeta)<0$.
(16)

By (13), (15) and (16), we have
$$\lim_{\varepsilon\to0}\frac{\varphi(\varepsilon)-\varphi(-\zeta)}{\varphi(\varepsilon)} = \lim_{\varepsilon\to0}\left[1-\frac{\varphi(-\zeta)}{\zeta^2}\left(\frac{\zeta}{\varepsilon}\right)^2\frac{\varepsilon^2}{\varphi(\varepsilon)}\right] = 1-\lim_{\varepsilon\to0}\frac{\varphi(-\zeta)}{\zeta^2}\times\lim_{\varepsilon\to0}\left(\frac{\zeta}{\varepsilon}\right)^2\times\lim_{\varepsilon\to0}\frac{\varepsilon^2}{\varphi(\varepsilon)} = 1-\frac{1}{2}\times1\times2 = 0$$
and consequently,
$$\limsup_{\varepsilon\to0}\widetilde\gamma\,[\varphi(\varepsilon)-\varphi(-\zeta)] \le \limsup_{\varepsilon\to0}\frac{\ln\frac{1}{\delta}}{\varphi(\varepsilon)}[\varphi(\varepsilon)-\varphi(-\zeta)] = 0,$$
$$\liminf_{\varepsilon\to0}\widetilde\gamma\,[\varphi(\varepsilon)-\varphi(-\zeta)] \ge \liminf_{\varepsilon\to0}\frac{\ln\frac{2}{\delta}}{\varphi(\varepsilon)}[\varphi(\varepsilon)-\varphi(-\zeta)] = 0.$$
It follows that $\lim_{\varepsilon\to0}\widetilde\gamma\,[\varphi(\varepsilon)-\varphi(-\zeta)]=0$ and thus $\lim_{\varepsilon\to0}\frac{\exp(-\widetilde\gamma\varphi(\varepsilon))/\delta}{\exp(-\widetilde\gamma\varphi(-\zeta))/\delta}=1$. Finally, since
$$\frac{\exp(-\widetilde\gamma\varphi(\varepsilon))}{\delta}+\frac{\exp(-\widetilde\gamma\varphi(-\zeta))}{\delta}=1 \quad\text{and}\quad \lim_{\varepsilon\to0}\frac{\exp(-\widetilde\gamma\varphi(\varepsilon))/\delta}{\exp(-\widetilde\gamma\varphi(-\zeta))/\delta}=1,$$
we have $\lim_{\varepsilon\to0}\frac{\exp(-\widetilde\gamma\varphi(\varepsilon))}{\delta}=\frac{1}{2}$, which implies $\lim_{\varepsilon\to0}\frac{\widetilde\gamma\varphi(\varepsilon)}{\ln\frac{2}{\delta}}=1$.

A.6 Proof of (VI) and (VII)

First, we shall derive the upper bound for $\Pr\{n\ge\frac{\gamma(1+\varrho)}{\mu}\}$. Since $n$ is an integer, we have
$$\Pr\left\{n\ge\frac{\gamma(1+\varrho)}{\mu}\right\} = \Pr\left\{n\ge\left\lceil\frac{\gamma(1+\varrho)}{\mu}\right\rceil\right\} = \Pr\left\{n\ge\frac{\gamma(1+\varrho^*)}{\mu}\right\}$$
where $\varrho^*$ is a number satisfying $\frac{\gamma(1+\varrho^*)}{\mu}=\lceil\frac{\gamma(1+\varrho)}{\mu}\rceil$. Clearly, $\varrho^*\ge\varrho$ by the definition of $\varrho^*$. Let $m=\frac{\gamma(1+\varrho^*)}{\mu}-1$. Since $\varrho>\frac{\mu}{\gamma}$, we have $\frac{\gamma(1+\varrho)}{\mu}>1$, which implies $m\ge1$. Hence,
$$\Pr\left\{n\ge\frac{\gamma(1+\varrho)}{\mu}\right\} = \Pr\{n\ge m+1\} = \Pr\{X_1+\cdots+X_m<\gamma\} = \Pr\{\overline{X}<z\}$$
with $\overline{X}=\frac{\sum_{i=1}^m X_i}{m}$ and $z=\frac{\gamma}{m}=\frac{\gamma\mu}{\gamma(1+\varrho^*)-\mu}$. Note that $0<z<\mu$ as a result of $\varrho^*\gamma\ge\varrho\gamma>\mu$. It follows from Lemma 4 that
$$\Pr\left\{n\ge\frac{\gamma(1+\varrho)}{\mu}\right\} = \Pr\{\overline{X}<z\} \le \exp(mz\,M(z,\mu)) = \exp\left(\gamma M\!\left(\frac{\gamma\mu}{\gamma(1+\varrho^*)-\mu},\mu\right)\right).$$
Since $\varrho^*\ge\varrho>\frac{\mu}{\gamma}$, applying Lemma 7, we have $M\!\left(\frac{\gamma\mu}{\gamma(1+\varrho^*)-\mu},\mu\right)\le M\!\left(\frac{\gamma\mu}{\gamma(1+\varrho)-\mu},\mu\right)$ and
$$
\begin{aligned}
\Pr\left\{n\ge\frac{\gamma(1+\varrho)}{\mu}\right\}
&\le \exp\left(\gamma M\!\left(\frac{\gamma\mu}{\gamma(1+\varrho^*)-\mu},\mu\right)\right) \le \exp\left(\gamma M\!\left(\frac{\gamma\mu}{\gamma(1+\varrho)-\mu},\mu\right)\right)\\
&= \exp\left(\gamma\left[\ln\left(1+\varrho-\frac{\mu}{\gamma}\right)+\left(\frac{1+\varrho}{\mu}-\frac{1}{\gamma}-1\right)\ln\frac{1-\mu}{1-\frac{\gamma\mu}{\gamma(1+\varrho)-\mu}}\right]\right)\\
&= \left(1+\varrho-\frac{\mu}{\gamma}\right)^{\left(1+\varrho-\frac{\mu}{\gamma}\right)\frac{\gamma}{\mu}}\left(\frac{1-\mu}{1+\varrho-\frac{\mu}{\gamma}-\mu}\right)^{\left(1+\varrho-\frac{\mu}{\gamma}-\mu\right)\frac{\gamma}{\mu}}
\end{aligned}
$$
for $\varrho>\frac{\mu}{\gamma}$. Now we bound $\Pr\{n\le\frac{\gamma(1-\varrho)}{\mu}\}$.
Invoking (8), we have
$$\Pr\left\{n\le\frac{\gamma}{\mu(1+\varepsilon)}\right\} \le \exp(\gamma M(\mu(1+\varepsilon),\mu)) = \exp\left(\gamma\left[\ln\frac{1}{1+\varepsilon}+\left(\frac{1}{\mu(1+\varepsilon)}-1\right)\ln\frac{1-\mu}{1-\mu(1+\varepsilon)}\right]\right)$$
for $\mu(1+\varepsilon)<1$. Letting $\varrho=\frac{\varepsilon}{1+\varepsilon}$, we have
$$
\begin{aligned}
\Pr\left\{n\le\frac{\gamma(1-\varrho)}{\mu}\right\}
&\le \exp\left(\gamma\left[\ln(1-\varrho)+\left(\frac{1-\varrho}{\mu}-1\right)\ln\frac{1-\mu}{1-\frac{\mu}{1-\varrho}}\right]\right)\\
&= \exp\left(\frac{\gamma}{\mu}\left[(1-\varrho)\ln(1-\varrho)+(1-\varrho-\mu)\ln\frac{1-\mu}{1-\varrho-\mu}\right]\right)
= (1-\varrho)^{(1-\varrho)\frac{\gamma}{\mu}}\left(\frac{1-\mu}{1-\varrho-\mu}\right)^{(1-\varrho-\mu)\frac{\gamma}{\mu}}
\end{aligned}
$$
for $0<\varrho<1-\mu$.

B Proof of Theorem 2

We need the following preliminary result.

Lemma 8 Let $k=n-\gamma$. Then,
$$\Pr\{k\ge s\} \le \left(\frac{s+\gamma}{s}q\right)^s\left(\frac{s+\gamma}{\gamma}p\right)^\gamma \quad\text{for } s>\mathbb{E}[k].$$
Similarly,
$$\Pr\{k\le s\} \le \left(\frac{s+\gamma}{s}q\right)^s\left(\frac{s+\gamma}{\gamma}p\right)^\gamma \quad\text{for } 0<s<\mathbb{E}[k].$$

Proof. Note that $k$ is a negative binomial random variable with distribution
$$\Pr\{k=i\} = \binom{\gamma+i-1}{i}p^\gamma q^i,\qquad i=0,1,2,\cdots.$$
For any $t>0$,
$$\Pr\{k\ge s\} = \Pr\{e^{t(k-s)}\ge1\} \le \mathbb{E}\left[e^{t(k-s)}\right] = e^{-ts}\,\mathbb{E}[e^{tk}] = e^{-ts}\sum_{i=0}^\infty e^{ti}\binom{\gamma+i-1}{i}p^\gamma q^i = e^{-ts}p^\gamma\sum_{i=0}^\infty\binom{\gamma+i-1}{i}(qe^t)^i = \phi(t)$$
where $\phi(t)=e^{-ts}\left(\frac{p}{1-qe^t}\right)^\gamma$. It can be checked that
$$\frac{d\ln\phi(t)}{dt} = -s+\frac{\gamma qe^t}{1-qe^t} = 0 \quad\text{if } s=(s+\gamma)qe^t,$$
in which case $t=\ln\frac{s}{(s+\gamma)q}>0$ because $s>\mathbb{E}[k]=\frac{q\gamma}{p}$. Substituting $qe^t=\frac{s}{s+\gamma}$ and $e^{-ts}=\left(\frac{(s+\gamma)q}{s}\right)^s$ into $\phi(t)$ yields the upper bound for $\Pr\{k\ge s\}$. Similarly, the upper bound for $\Pr\{k\le s\}$ can be established for $0<s<\mathbb{E}[k]$. ✷

Now we are in position to prove Theorem 2. By the definition of the estimator $\widetilde{p}=\frac{\gamma}{k+\gamma}$,
$$\Pr\left\{\frac{|\widetilde{p}-p|}{p}\ge\varepsilon\right\} = \Pr\left\{k\le\frac{\gamma}{p(1+\varepsilon)}-\gamma\right\} + \Pr\left\{k\ge\frac{\gamma}{p(1-\varepsilon)}-\gamma\right\}.$$
To bound $\Pr\{k\le\frac{\gamma}{p(1+\varepsilon)}-\gamma\}$, we need to consider three cases as follows.

(i): In the case of $p(1+\varepsilon)>1$, we have $\Pr\{k\le\frac{\gamma}{p(1+\varepsilon)}-\gamma\}=0<\exp(-\gamma\varphi(\varepsilon))$.

(ii): In the case of $p(1+\varepsilon)=1$, we have $\Pr\{k\le\frac{\gamma}{p(1+\varepsilon)}-\gamma\}=\Pr\{k=0\}=p^\gamma=(1+\varepsilon)^{-\gamma}<\exp(-\gamma\varphi(\varepsilon))$.
(iii): In the case of 0 < p (1 + ε ) < 1, app lying Lemma 8 w ith s = γ p (1+ ε ) − γ < γ p − γ = E [ k ] , we ha v e Pr k ≤ γ p (1 + ε ) − γ ≤ s + γ s q s s + γ γ p γ = γ p (1+ ε ) γ p (1+ ε ) − γ q ! γ p (1+ ε ) − γ γ p (1+ ε ) γ p ! γ = q 1 − p (1 + ε ) γ p (1+ ε ) − γ 1 1 + ε γ = exp ( γ M ((1 + ε ) p, p )) . 21 By the fir s t statemen t of Lemma 6, we hav e M ((1 + ε ) p, p ) ≤ − ϕ ( ε ). Therefore, P r n k ≤ γ p (1+ ε ) − γ o ≤ exp ( − γ ϕ ( ε )) is true for all cases. T o b ound Pr n k ≥ γ p (1 − ε ) − γ o , applying Lemma 8 with s = γ p (1 − ε ) − γ > γ p − γ > E [ k ] , w e hav e Pr k ≥ γ p (1 − ε ) − γ ≤ s + γ s q s s + γ γ p γ = q 1 − p (1 − ε ) γ p (1 − ε ) − γ 1 1 − ε γ = exp ( γ M ((1 − ε ) p, p )) . By the second statemen t of Lemma 6, we h a v e M ((1 − ε ) p, p ) ≤ − ϕ ( − ε ) and thus Pr { k ≥ γ p (1 − ε ) − γ } ≤ exp ( − γ ϕ ( − ε )) . Combining the tw o b oun ds of the tail distribution p robabilities of k , w e h a v e Pr e p − p p ≥ ε ≤ exp ( − γ ϕ ( ε )) + exp ( − γ ϕ ( − ε )) = Q ( ε, γ ) . Clearly , Q ( ε, γ ) is monotone decreasing w ith resp ect to γ . Beca us e of suc h monotone prop erty and th e fact that lim γ →∞ Q ( ε, γ ) = 0 , lim γ → 0 Q ( ε, γ ) > 1, there exists a un iqu e num b er γ ∗ suc h that Q ( ε, γ ∗ ) = δ . T o derive the lo w er b ound for γ ∗ , applying Lemma 2, we hav e 0 < ϕ ( ε ) < ϕ ( − ε ) an d thus 2 exp ( − γ ϕ ( − ε )) < Q ( ε, γ ). It follo ws that max { 2 exp ( − γ ∗ ϕ ( − ε )) , exp ( − γ ∗ ϕ ( ε )) } < δ = Q ( ε, γ ∗ ) , from which we can obtain the lo wer b ound of γ ∗ . T o deriv e th e upp er b ound for γ ∗ , note that Q ( ε, γ ) − e Q ( ε, γ ) = exp ( − γ ϕ ( − ε )) − exp ( − γ ϕ ( − ζ )) < 0 b ecause 0 < ζ < ε < 1 and dϕ ( − ε ) dε > 0 for 0 < ε < 1. Hence, δ = Q ( ε, γ ∗ ) = e Q ( ε, e γ ) > Q ( ε, e γ ). Since Q ( ε, γ ) is monotone decreasing w ith resp ect to γ , it m ust b e true th at γ ∗ < e γ < ln 2 δ ϕ ( ε ) . Finally , w e consider the distr ib ution of sample s ize. 
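The unique root $\gamma^*$ of $Q(\varepsilon,\gamma)=\delta$ is straightforward to locate numerically: since $Q(\varepsilon,\gamma)$ is monotone decreasing in $\gamma$, a bisection search between a point where $Q>\delta$ and the upper bound $\frac{\ln(2/\delta)}{\varphi(\varepsilon)}$ established above converges to it. The sketch below is illustrative and not taken from the paper; it assumes the explicit form $\varphi(x)=\ln(1+x)-\frac{x}{1+x}$, which matches the $\mu\to0$ limits of $M((1+x)\mu,\mu)$ used in the proofs.

```python
import math

def phi(x):
    # Exponent function; assumed form phi(x) = ln(1+x) - x/(1+x),
    # consistent with the mu -> 0 limits of M((1+x)mu, mu).
    return math.log(1.0 + x) - x / (1.0 + x)

def Q(eps, gamma):
    # Q(eps, gamma) = exp(-gamma*phi(eps)) + exp(-gamma*phi(-eps))
    return math.exp(-gamma * phi(eps)) + math.exp(-gamma * phi(-eps))

def gamma_star(eps, delta):
    # Bisect for the unique root of Q(eps, .) = delta; Q is monotone
    # decreasing, Q(~0) ~= 2 > delta, and Q at ln(2/delta)/phi(eps) is < delta.
    lo, hi = 1e-9, math.log(2.0 / delta) / phi(eps)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Q(eps, mid) > delta:
            lo = mid
        else:
            hi = mid
    return hi

g = gamma_star(0.1, 0.05)
```

As the proof guarantees, the computed root lies strictly below the analytical upper bound $\frac{\ln(2/\delta)}{\varphi(\varepsilon)}$.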
Note that $\Pr\{n\ge\frac{\gamma(1+\varrho)}{p}\}=\Pr\{k\ge s\}$ where $s=(1-p+\varrho)\frac{\gamma}{p}$. Note that $\frac{s+\gamma}{s}q=q\,\frac{1+\varrho}{1-p+\varrho}$ and $\frac{s+\gamma}{\gamma}p=1+\varrho$. By Lemma 8,
$$
\begin{aligned}
\Pr\left\{n\ge\frac{\gamma(1+\varrho)}{p}\right\} = \Pr\{k\ge s\}
&\le \left(\frac{s+\gamma}{s}q\right)^s\left(\frac{s+\gamma}{\gamma}p\right)^\gamma
= \left(\frac{q(1+\varrho)}{1-p+\varrho}\right)^{(1-p+\varrho)\frac{\gamma}{p}}(1+\varrho)^\gamma\\
&= \left(\frac{q}{1-p+\varrho}\right)^{(1-p+\varrho)\frac{\gamma}{p}}(1+\varrho)^{\gamma+(1-p+\varrho)\frac{\gamma}{p}}
= \left(\frac{1-p}{1-p+\varrho}\right)^{(1-p+\varrho)\frac{\gamma}{p}}(1+\varrho)^{(1+\varrho)\frac{\gamma}{p}}.
\end{aligned}
$$
Changing the sign of $\varrho$ to negative yields
$$\Pr\left\{n\le\frac{\gamma(1-\varrho)}{p}\right\} \le \left(\frac{1-p}{1-p-\varrho}\right)^{(1-p-\varrho)\frac{\gamma}{p}}(1-\varrho)^{(1-\varrho)\frac{\gamma}{p}}.$$
This completes the proof of Theorem 2.

C Proof of Theorem 3

For simplicity of notation, define $C(p)=\Pr\{|\widehat{p}-p|<\varepsilon p\}$ and $S(\gamma,g,h,p)=\sum_{i=g}^h\binom{\gamma+i-1}{i}p^\gamma(1-p)^i$. Then,
$$C(p) = \Pr\left\{\left|\frac{\gamma-1}{k+\gamma-1}-p\right|<\varepsilon p\right\} = \Pr\left\{\frac{\gamma-1}{(1+\varepsilon)p}-\gamma+1<k<\frac{\gamma-1}{(1-\varepsilon)p}-\gamma+1\right\} = \Pr\{g(p)\le k\le h(p)\} = S(\gamma,g(p),h(p),p)$$
where
$$g(p)=\left\lfloor\frac{\gamma-1}{(1+\varepsilon)p}\right\rfloor-\gamma+2,\qquad h(p)=\left\lceil\frac{\gamma-1}{(1-\varepsilon)p}\right\rceil-\gamma.$$
It should be noted that $C(p)$, $g(p)$ and $h(p)$ are actually multivariate functions of $p$, $\varepsilon$ and $\gamma$. We need some preliminary results.

Lemma 9 Let $p_\ell=\frac{\gamma-1}{(1-\varepsilon)(\ell+\gamma-1)}$ where $\ell\in\mathbb{Z}$. Then, $h(p)=h(p_{\ell+1})=\ell$ for any $p\in(p_{\ell+1},p_\ell)$.

Proof. Note that
$$h(p) = \left\lceil\frac{\gamma-1}{(1-\varepsilon)p}\right\rceil-\gamma = \left\lceil\frac{\gamma-1}{(1-\varepsilon)p_\ell}\,\frac{p_\ell}{p}\right\rceil-\gamma = \left\lceil(\ell+\gamma-1)\frac{p_\ell}{p}\right\rceil-\gamma.$$
Since $1<\frac{p_\ell}{p}<\frac{p_\ell}{p_{\ell+1}}=\frac{\ell+\gamma}{\ell+\gamma-1}$ for $p\in(p_{\ell+1},p_\ell)$, we have
$$\ell+\gamma-1 < (\ell+\gamma-1)\frac{p_\ell}{p} < (\ell+\gamma-1)\frac{\ell+\gamma}{\ell+\gamma-1} = \ell+\gamma.$$
Hence, $\ell-1<h(p)\le\ell$. Since $h(p)$ is an integer, it must be true that $h(p)=\ell=h(p_{\ell+1})$. ✷

Lemma 10 Let $p_\ell=\frac{\gamma-1}{(1+\varepsilon)(\ell+\gamma-1)}$ where $\ell\in\mathbb{Z}$. Then, $g(p)=g(p_\ell)=\ell+1$ for any $p\in(p_{\ell+1},p_\ell)$.

Proof. Note that
$$g(p) = \left\lfloor\frac{\gamma-1}{(1+\varepsilon)p}\right\rfloor-\gamma+2 = \left\lfloor\frac{\gamma-1}{(1+\varepsilon)p_\ell}\,\frac{p_\ell}{p}\right\rfloor-\gamma+2 = \left\lfloor(\ell+\gamma-1)\frac{p_\ell}{p}\right\rfloor-\gamma+2.$$
Since $1<\frac{p_\ell}{p}<\frac{p_\ell}{p_{\ell+1}}=\frac{\ell+\gamma}{\ell+\gamma-1}$ for $p\in(p_{\ell+1},p_\ell)$, we have
$$\ell+\gamma-1 < (\ell+\gamma-1)\frac{p_\ell}{p} < (\ell+\gamma-1)\frac{\ell+\gamma}{\ell+\gamma-1} = \ell+\gamma.$$
Hence, $\ell+1\le g(p)<\ell+2$. Since $g(p)$ is an integer, it must be true that $g(p)=\ell+1=g(p_\ell)$. ✷

Lemma 11 Let $\alpha<\beta$ be two consecutive elements of the ascending arrangement of all distinct elements of
$$\{a,b\}\cup\left\{\frac{\gamma-1}{(1-\varepsilon)(\ell+\gamma-1)}\in(a,b):\ell\in\mathbb{Z}\right\}\cup\left\{\frac{\gamma-1}{(1+\varepsilon)(\ell+\gamma-1)}\in(a,b):\ell\in\mathbb{Z}\right\}.$$
Then, both $g(p)$ and $h(p)$ are constant for any $p\in(\alpha,\beta)$.

Proof. Since $\alpha$ and $\beta$ are two consecutive elements of the ascending arrangement of all distinct elements of the set, there is no integer $\ell$ such that $\alpha<\frac{\gamma-1}{(1-\varepsilon)(\ell+\gamma-1)}<\beta$ or $\alpha<\frac{\gamma-1}{(1+\varepsilon)(\ell+\gamma-1)}<\beta$. It follows that there exist two integers $\ell$ and $\ell'$ such that
$$(\alpha,\beta)\subseteq\left(\frac{\gamma-1}{(1-\varepsilon)(\ell+\gamma)},\ \frac{\gamma-1}{(1-\varepsilon)(\ell+\gamma-1)}\right)\quad\text{and}\quad(\alpha,\beta)\subseteq\left(\frac{\gamma-1}{(1+\varepsilon)(\ell'+\gamma)},\ \frac{\gamma-1}{(1+\varepsilon)(\ell'+\gamma-1)}\right).$$
Applying Lemma 9 and Lemma 10, we have $h(p)=\ell$ and $g(p)=\ell'+1$ for any $p\in(\alpha,\beta)$. ✷

Lemma 12 For any $p\in(0,1)$, $\lim_{t\downarrow0}C(p+t)\ge C(p)$ and $\lim_{t\downarrow0}C(p-t)\ge C(p)$.

Proof. Observing that $g(p+t)\le g(p)$ for any $t>0$ and that
$$h(p+t) = h(p)+\left\lceil\frac{\gamma-1}{(1-\varepsilon)(p+t)}\right\rceil-\left\lceil\frac{\gamma-1}{(1-\varepsilon)p}\right\rceil = h(p)$$
for
$$-1<\frac{\gamma-1}{(1-\varepsilon)(p+t)}-\left\lceil\frac{\gamma-1}{(1-\varepsilon)p}\right\rceil<0,\quad\text{i.e.,}\quad 0<t<\frac{\gamma-1}{(1-\varepsilon)}\left(\left\lceil\frac{\gamma-1}{(1-\varepsilon)p}\right\rceil-1\right)^{-1}-p,$$
we have
$$S(\gamma,g(p+t),h(p+t),p+t) \ge S(\gamma,g(p),h(p),p+t) \tag{17}$$
for $0<t<\min\left\{1,\ \frac{\gamma-1}{(1-\varepsilon)}\left(\left\lceil\frac{\gamma-1}{(1-\varepsilon)p}\right\rceil-1\right)^{-1}-p\right\}$. Since $g(p+t)=g(p)+\left\lfloor\frac{\gamma-1}{(1+\varepsilon)(p+t)}\right\rfloor-\left\lfloor\frac{\gamma-1}{(1+\varepsilon)p}\right\rfloor$, we have
$$g(p+t) = \begin{cases} g(p)-1 & \text{if } \frac{\gamma-1}{(1+\varepsilon)p}=\left\lfloor\frac{\gamma-1}{(1+\varepsilon)p}\right\rfloor \text{ and } 0<t\le\frac{\gamma-1}{(1+\varepsilon)}\left(\left\lfloor\frac{\gamma-1}{(1+\varepsilon)p}\right\rfloor-1\right)^{-1}-p,\\[4pt] g(p) & \text{if } \frac{\gamma-1}{(1+\varepsilon)p}\ne\left\lfloor\frac{\gamma-1}{(1+\varepsilon)p}\right\rfloor \text{ and } 0<t\le\frac{\gamma-1}{(1+\varepsilon)}\left(\left\lfloor\frac{\gamma-1}{(1+\varepsilon)p}\right\rfloor\right)^{-1}-p.\end{cases}$$
It follows that both $g(p+t)$ and $h(p+t)$ are independent of $t$ if $t>0$ is small enough.
Since $S(\gamma,g,h,p+t)$ is continuous with respect to $t$ for fixed $g$ and $h$, the limit $\lim_{t\downarrow0}S(\gamma,g(p+t),h(p+t),p+t)$ exists. As a result,
$$\lim_{t\downarrow0}C(p+t) = \lim_{t\downarrow0}S(\gamma,g(p+t),h(p+t),p+t) \ge \lim_{t\downarrow0}S(\gamma,g(p),h(p),p+t) = S(\gamma,g(p),h(p),p) = C(p),$$
where the inequality follows from (17).

Observing that $h(p-t)\ge h(p)$ for any $t>0$ and that
$$g(p-t) = g(p)+\left\lfloor\frac{\gamma-1}{(1+\varepsilon)(p-t)}\right\rfloor-\left\lfloor\frac{\gamma-1}{(1+\varepsilon)p}\right\rfloor = g(p) \quad\text{for } 0<t<p-\frac{\gamma-1}{(1+\varepsilon)}\left(1+\left\lfloor\frac{\gamma-1}{(1+\varepsilon)p}\right\rfloor\right)^{-1},$$
we have
$$S(\gamma,g(p-t),h(p-t),p-t) \ge S(\gamma,g(p),h(p),p-t) \tag{18}$$
for $0<t<p-\frac{\gamma-1}{(1+\varepsilon)}\left(1+\left\lfloor\frac{\gamma-1}{(1+\varepsilon)p}\right\rfloor\right)^{-1}$. Since $h(p-t)=h(p)+\left\lceil\frac{\gamma-1}{(1-\varepsilon)(p-t)}\right\rceil-\left\lceil\frac{\gamma-1}{(1-\varepsilon)p}\right\rceil$, we have
$$h(p-t) = \begin{cases} h(p)+1 & \text{if } \frac{\gamma-1}{(1-\varepsilon)p}=\left\lceil\frac{\gamma-1}{(1-\varepsilon)p}\right\rceil \text{ and } 0<t<p-\frac{\gamma-1}{(1-\varepsilon)}\left(1+\left\lceil\frac{\gamma-1}{(1-\varepsilon)p}\right\rceil\right)^{-1},\\[4pt] h(p) & \text{if } \frac{\gamma-1}{(1-\varepsilon)p}\ne\left\lceil\frac{\gamma-1}{(1-\varepsilon)p}\right\rceil \text{ and } 0<t<p-\frac{\gamma-1}{(1-\varepsilon)}\left(\left\lceil\frac{\gamma-1}{(1-\varepsilon)p}\right\rceil\right)^{-1}.\end{cases}$$
It follows that both $g(p-t)$ and $h(p-t)$ are independent of $t$ if $t>0$ is small enough. Since $S(\gamma,g,h,p-t)$ is continuous with respect to $t$ for fixed $g$ and $h$, the limit $\lim_{t\downarrow0}S(\gamma,g(p-t),h(p-t),p-t)$ exists. Hence,
$$\lim_{t\downarrow0}C(p-t) = \lim_{t\downarrow0}S(\gamma,g(p-t),h(p-t),p-t) \ge \lim_{t\downarrow0}S(\gamma,g(p),h(p),p-t) = S(\gamma,g(p),h(p),p) = C(p),$$
where the inequality follows from (18). ✷

Lemma 13 Let $0<u<v<1$, $h\ge0$ and $g\le h$. Then,
$$\min_{p\in[u,v]}S(\gamma,g,h,p) = \min\{S(\gamma,g,h,u),\ S(\gamma,g,h,v)\}.$$

Proof.
Since $\Pr\{k\le l\}=I_p(\gamma,l+1)$, where $I_p$ is the regularized incomplete beta function
$$I_p(a,b) = \frac{B(p;a,b)}{B(a,b)} = \frac{\int_0^p t^{a-1}(1-t)^{b-1}\,dt}{\int_0^1 t^{a-1}(1-t)^{b-1}\,dt},$$
we have
$$\frac{\partial\Pr\{k\le l\}}{\partial p} = \frac{p^{\gamma-1}(1-p)^l}{B(\gamma,l+1)} > 0 \tag{19}$$
for any integer $l\ge0$. To show the lemma, it suffices to consider two cases as follows.

Case (i): $g\le0\le h$. In this case, $S(\gamma,g,h,p)=S(\gamma,0,h,p)$, which is increasing in $p$ as a result of (19).

Case (ii): $0<g\le h$. By (19), for two integers $0\le j<l$,
$$\frac{\partial\Pr\{j<k\le l\}}{\partial p} = \frac{p^{\gamma-1}(1-p)^l}{B(\gamma,l+1)}-\frac{p^{\gamma-1}(1-p)^j}{B(\gamma,j+1)} = \frac{p^{\gamma-1}(1-p)^j}{B(\gamma,l+1)}\left[(1-p)^{l-j}-\frac{l!\,(\gamma+j)!}{j!\,(\gamma+l)!}\right] > 0$$
if $p<1-\left[\frac{l!\,(\gamma+j)!}{j!\,(\gamma+l)!}\right]^{\frac{1}{l-j}}$. It follows that $S(\gamma,g,h,p)$ is a unimodal function of $p$. From this investigation of the derivative of $C(p)=S(\gamma,g,h,p)$ with respect to $p$, we can see that one of the following three cases must be true: (1) $C(p)$ decreases monotonically for $p\in[u,v]$; (2) $C(p)$ increases monotonically for $p\in[u,v]$; (3) there exists a number $\theta\in(u,v)$ such that $C(p)$ increases monotonically for $p\in[u,\theta]$ and decreases monotonically for $p\in(\theta,v]$. It follows that the lemma must be true in all cases. ✷

Lemma 14 Let $\alpha<\beta$ be two consecutive elements of the ascending arrangement of all distinct elements of
$$\{a,b\}\cup\left\{\frac{\gamma-1}{(1-\varepsilon)(\ell+\gamma-1)}\in(a,b):\ell\in\mathbb{Z}\right\}\cup\left\{\frac{\gamma-1}{(1+\varepsilon)(\ell+\gamma-1)}\in(a,b):\ell\in\mathbb{Z}\right\}.$$
Then, $C(p)\ge\min\{C(\alpha),C(\beta)\}$ for any $p\in(\alpha,\beta)$.

Proof. By Lemma 11, $g(p)$ and $h(p)$ are constant for any $p\in(\alpha,\beta)$. Hence, we can drop the argument $p$ and write $g(p)=g$, $h(p)=h$ and $C(p)=S(\gamma,g,h,p)$. For $p\in(\alpha,\beta)$, define the interval $[\alpha+t,\beta-t]$ with $0<t<\min\left\{p-\alpha,\ \beta-p,\ \frac{\beta-\alpha}{2}\right\}$. Then, $p\in[\alpha+t,\beta-t]$.
By Lemma 13,
$$C(p) \ge \min_{\mu\in[\alpha+t,\beta-t]}C(\mu) = \min\{C(\alpha+t),\ C(\beta-t)\}$$
for $0<t<\min\left\{p-\alpha,\ \beta-p,\ \frac{\beta-\alpha}{2}\right\}$. By Lemma 12, both $\lim_{t\downarrow0}C(\alpha+t)$ and $\lim_{t\downarrow0}C(\beta-t)$ exist and are bounded from below by $C(\alpha)$ and $C(\beta)$ respectively. Hence,
$$C(p) \ge \lim_{t\downarrow0}\min\{C(\alpha+t),\ C(\beta-t)\} = \min\left\{\lim_{t\downarrow0}C(\alpha+t),\ \lim_{t\downarrow0}C(\beta-t)\right\} \ge \min\{C(\alpha),C(\beta)\}$$
for any $p\in(\alpha,\beta)$. ✷

Finally, we can readily deduce Theorem 3. The first statement on the minimum of the coverage probability follows immediately from Lemma 14. The second statement on the minimum of the coverage probability can be proved in a similar way.

D The Incomplete Work of Dagum et al.

Let $Z_1,Z_2,\cdots$ be a sequence of i.i.d. random variables defined on the same probability space such that $Z_i\in[0,1]$ and $\mathbb{E}[Z_i]=\mu_Z\in(0,1)$. In order to estimate $\mu_Z$, Dagum et al. proposed (in Section 2.1, page 1486 of [6]) the following Stopping Rule Algorithm:

Initialize $N\leftarrow0$, $S\leftarrow0$.
While $S<\Upsilon_1$ do: $N\leftarrow N+1$, $S\leftarrow S+Z_N$.
Return $\widehat\mu_Z=\frac{\Upsilon_1}{N}$ as the estimate of $\mu_Z$.

In Section 2.1, page 1486 of [6], Dagum et al. claimed that the reliability of the estimate $\widehat\mu_Z$ is asserted by the following "Stopping Rule Theorem".

Theorem 4 Let $\varepsilon\in(0,1)$ and $\delta\in(0,1)$. Then, $\Pr\{|\widehat\mu_Z-\mu_Z|\le\varepsilon\mu_Z\}\ge1-\delta$.

We would like to point out that the proof of Dagum et al. is not complete. There exists a significant gap which cannot be patched by using their argument.

D.1 The Proof of the "Stopping Rule Theorem" by Dagum et al.

To exhibit the fallacy of the argument by Dagum et al., we shall reproduce their proof of the "Stopping Rule Theorem" in this section. The following preliminary result was first established by Dagum et al. as Lemma 4.6 on page 1489 of [6].

Lemma 15 Let $\lambda=e-2$. Let $\rho_Z=\max\{\sigma_Z^2,\varepsilon\mu_Z\}$, where $\sigma_Z^2$ is the variance of $Z$.
Define $\xi_k=\sum_{i=1}^k(Z_i-\mu_Z)$ for $k=1,2,\cdots$. Then, for any fixed $N>0$ and any $\beta\in[0,2\lambda\rho_Z]$,
$$\Pr\left\{\frac{\xi_N}{N}\ge\beta\right\}\le\exp\left(-\frac{N\beta^2}{4\lambda\rho_Z}\right) \tag{20}$$
and
$$\Pr\left\{\frac{\xi_N}{N}\le-\beta\right\}\le\exp\left(-\frac{N\beta^2}{4\lambda\rho_Z}\right). \tag{21}$$

The argument of Dagum et al. (in Section 5, pages 1490–1491 of [6]) for the "Stopping Rule Theorem" proceeds as follows. Let $N_Z$ be the sample size at the stopping time. Recall that $\widehat\mu_Z=\frac{\Upsilon_1}{N_Z}$. It suffices to show that
$$\Pr\left\{N_Z<\frac{\Upsilon_1}{\mu_Z(1+\varepsilon)}\right\}\le\frac{\delta}{2} \tag{22}$$
and that (equation (8), page 1491 of [6])
$$\Pr\left\{N_Z>\frac{\Upsilon_1}{\mu_Z(1-\varepsilon)}\right\}\le\frac{\delta}{2}. \tag{23}$$
To show (22), it suffices to consider the case that $\mu_Z(1+\varepsilon)\le1$, since the theorem is trivially true if $\mu_Z(1+\varepsilon)>1$. Let $L=\lfloor\frac{\Upsilon_1}{\mu_Z(1+\varepsilon)}\rfloor$. By the definitions of $\Upsilon_1$ and $L$,
$$L = \left\lfloor\frac{1+(1+\varepsilon)\frac{4\lambda}{\varepsilon^2}\ln\frac{2}{\delta}}{\mu_Z(1+\varepsilon)}\right\rfloor > \frac{1+(1+\varepsilon)\frac{4\lambda}{\varepsilon^2}\ln\frac{2}{\delta}}{\mu_Z(1+\varepsilon)}-1 \ge \frac{4\lambda}{\varepsilon^2\mu_Z}\ln\frac{2}{\delta}. \tag{24}$$
Since $N_Z$ is an integer, $N_Z<\frac{\Upsilon_1}{\mu_Z(1+\varepsilon)}$ implies $N_Z\le L$. But $N_Z\le L$ if and only if $S_L\ge\Upsilon_1$. Thus,
$$\Pr\left\{N_Z<\frac{\Upsilon_1}{\mu_Z(1+\varepsilon)}\right\}\le\Pr\{N_Z\le L\}=\Pr\{S_L\ge\Upsilon_1\}$$
where $S_L=\sum_{i=1}^L Z_i$. Let $\beta=\frac{\Upsilon_1}{L}-\mu_Z$. Then,
$$\Pr\{S_L\ge\Upsilon_1\} = \Pr\{S_L-\mu_Z L-\beta L\ge0\} = \Pr\left\{\frac{\xi_L}{L}\ge\beta\right\}.$$
Noting that $\varepsilon\mu_Z\le\beta\le2\lambda\rho_Z$, Lemma 15 implies that
$$\Pr\left\{\frac{\xi_L}{L}\ge\beta\right\}\le\exp\left(-\frac{L\beta^2}{4\lambda\rho_Z}\right)\le\exp\left(-\frac{L(\varepsilon\mu_Z)^2}{4\lambda\rho_Z}\right).$$
Using the last inequality of (24) and noting that $\rho_Z\le\max\{\mu_Z(1-\mu_Z),\varepsilon\mu_Z\}\le\mu_Z$, it follows that
$$\Pr\left\{N_Z<\frac{\Upsilon_1}{\mu_Z(1+\varepsilon)}\right\}\le\Pr\left\{\frac{\xi_L}{L}\ge\beta\right\}\le\exp\left(-\frac{L(\varepsilon\mu_Z)^2}{4\lambda\rho_Z}\right)\le\frac{\delta}{2}.$$
This completes the proof of (22). Finally, instead of giving a detailed argument in [6], Dagum et al. claimed that the proof of (23) is similar.

D.2 A Hole in the Proof of Dagum et al.

We would like to point out that the proof of Dagum et al. is not complete, because (23) cannot be shown by an argument similar to the one they used in proving (22). The gap is exhibited as follows.
To show $\Pr\{N_Z>\frac{\Upsilon_1}{\mu_Z(1-\varepsilon)}\}\le\frac{\delta}{2}$ in the spirit of the first part, it is expected to construct an integer $L$ and a real number $\beta=\mu_Z-\frac{\Upsilon_1}{L}$ such that
$$\left\{N_Z>\frac{\Upsilon_1}{\mu_Z(1-\varepsilon)}\right\}\subseteq\{S_L\le\Upsilon_1\}, \tag{25}$$
$$\varepsilon\mu_Z\le\beta, \tag{26}$$
$$\beta\le2\lambda\rho_Z, \tag{27}$$
$$L\ge\frac{4\lambda}{\varepsilon^2\mu_Z}\ln\frac{2}{\delta} \tag{28}$$
and consequently
$$
\begin{aligned}
\Pr\left\{N_Z>\frac{\Upsilon_1}{\mu_Z(1-\varepsilon)}\right\} &\le \Pr\{S_L\le\Upsilon_1\} &&(29)\\
&= \Pr\left\{\frac{\xi_L}{L}\le-\beta\right\} &&(30)\\
&\le \exp\left(-\frac{L\beta^2}{4\lambda\rho_Z}\right) &&(31)\\
&\le \exp\left(-\frac{L(\varepsilon\mu_Z)^2}{4\lambda\rho_Z}\right) &&(32)\\
&\le \exp\left(-\frac{L\varepsilon^2\mu_Z}{4\lambda}\right) &&(33)\\
&\le \frac{\delta}{2} &&(34)
\end{aligned}
$$
where (29) relies on (25); (30) is due to the definitions of $\beta$ and $S_L$; (31) relies on (21) of Lemma 15 and (27); (32) relies on (26); (33) is due to the fact that $\rho_Z\le\max\{\mu_Z(1-\mu_Z),\varepsilon\mu_Z\}\le\mu_Z$; and (34) relies on (28).

Unfortunately, it is possible that (26) contradicts (25)! To see this, note that, by the definition of $\beta$ and (26), we have $\varepsilon\mu_Z\le\mu_Z-\frac{\Upsilon_1}{L}$, i.e., $L\ge\lceil\frac{\Upsilon_1}{(1-\varepsilon)\mu_Z}\rceil$ since $L$ is an integer. We can show that $\{N_Z>\frac{\Upsilon_1}{\mu_Z(1-\varepsilon)}\}\subseteq\{S_L\le\Upsilon_1\}$ is not true if $L\ge\lceil\frac{\Upsilon_1}{(1-\varepsilon)\mu_Z}\rceil$. For this purpose, note that
$$\left\{N_Z>\frac{\Upsilon_1}{\mu_Z(1-\varepsilon)}\right\} = \{N_Z>K\} = \{S_K<\Upsilon_1\}$$
where $K=\lfloor\frac{\Upsilon_1}{(1-\varepsilon)\mu_Z}\rfloor$. For a random variable $Z$ with mean value $\mu_Z$ such that $\frac{\Upsilon_1}{(1-\varepsilon)\mu_Z}$ is not an integer, we have $K=\lfloor\frac{\Upsilon_1}{(1-\varepsilon)\mu_Z}\rfloor<\lceil\frac{\Upsilon_1}{(1-\varepsilon)\mu_Z}\rceil\le L$. As a result, it is possible that $\sum_{i=1}^K Z_i<\Upsilon_1<\sum_{i=1}^L Z_i$, which implies that $\{S_K<\Upsilon_1\}\subseteq\{S_L\le\Upsilon_1\}$ is not true. Thus we have shown that (25) is not necessarily true if (26) is satisfied. This demonstrates that it is not possible to show $\Pr\{N_Z>\frac{\Upsilon_1}{\mu_Z(1-\varepsilon)}\}\le\frac{\delta}{2}$ by using the argument of Dagum et al.

E The Fallacy of Cheng's Reasoning

In order to improve efficiency, Cheng revised the Stopping Rule Algorithm of Dagum et al. by replacing the threshold value $\Upsilon_1$ with a smaller number $\alpha$. See pages 12–13, Algorithm 1 of Section 5, and page 18, lines 9–10 of his paper [4].
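For concreteness, the stopping rule discussed in this section and the previous one can be simulated as follows. This is an illustrative sketch rather than code from either paper; the threshold is left as a parameter, so passing $\Upsilon_1$ gives the rule of Dagum et al. while passing Cheng's smaller $\alpha$ gives the revised rule.

```python
import random

def stopping_rule_estimate(draw, threshold):
    # Keep sampling until the running sum S reaches the threshold,
    # then return threshold / N as the estimate of the mean (Section D).
    N, S = 0, 0.0
    while S < threshold:
        N += 1
        S += draw()
    return threshold / N

# Illustration with Bernoulli(mu) observations and an arbitrary threshold.
random.seed(0)
mu = 0.3
est = stopping_rule_estimate(lambda: 1.0 if random.random() < mu else 0.0, 50.0)
```

Since each observation is at most $1$, the loop always performs at least $\lceil\text{threshold}\rceil$ draws; this is the same observation ($n\ge\sum_i X_i\ge\gamma$) used in Appendix A.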
Cheng claimed that such a revised algorithm ensures $\Pr\{|\widehat\mu_Z-\mu_Z|\le\varepsilon\mu_Z\}\ge1-\delta$. He first established Theorem 4 on page 7 of his paper, which is restated as Theorem 5 below.

Theorem 5 Let $Z_1,\cdots,Z_n$ be i.i.d. random variables bounded in $[0,1]$ with common mean value $\mu_Z\in(0,1)$. Let $0<\varepsilon<\min\{1,\ (1-\mu_Z)/\mu_Z\}$. Then,
$$\Pr\left\{\left|\frac{\sum_{i=1}^n Z_i}{n}-\mu_Z\right|\le\varepsilon\mu_Z\right\}\ge1-\delta \quad\text{if}\quad n\ge\frac{1}{\mu_Z}\,\frac{1}{(1+\varepsilon)\ln(1+\varepsilon)-\varepsilon}\,\ln\frac{2}{\delta}.$$

In the first paragraph of page 18 of his paper [4], after defining the events
$$E_1=\left\{0<\mu_Z<\frac{\widehat\mu_Z}{1+3\varepsilon}\right\},\quad E_2=\left\{\frac{\widehat\mu_Z}{1+3\varepsilon}\le\mu_Z<\frac{\widehat\mu_Z}{1+2\varepsilon}\right\},\quad E_3=\left\{\frac{\widehat\mu_Z}{1+2\varepsilon}\le\mu_Z<\frac{\widehat\mu_Z}{1+\varepsilon}\right\},\quad E_4=\left\{\mu_Z\ge\frac{\widehat\mu_Z}{1+\varepsilon}\right\},$$
Cheng applied the law of total probability to write
$$\Pr\{|\widehat\mu_Z-\mu_Z|\le\varepsilon\mu_Z\} = \sum_{i=1}^4\Pr\{|\widehat\mu_Z-\mu_Z|\le\varepsilon\mu_Z\mid E_i\}\Pr\{E_i\}$$
and attempted to show that the right-hand side of the equality is bounded from below by $1-\delta$.

Unfortunately, Cheng made a fundamental mistake in bounding $\Pr\{|\widehat\mu_Z-\mu_Z|\le\varepsilon\mu_Z\mid E_4\}$ and other terms alike. He noted that $S_{N_Z}=\sum_{i=1}^{N_Z}Z_i\ge\alpha$ and thus
$$N_Z\ge\frac{1}{\widehat\mu_Z}\,\frac{1}{\ln(1+\varepsilon)-\varepsilon/(1+\varepsilon)}\,\ln\frac{2}{\delta_s}$$
because of the definitions of the stopping rule and $\alpha$. Conditioning upon $\mu_Z\ge\frac{\widehat\mu_Z}{1+k\varepsilon}$ with some constant $k$, he obtained
$$N_Z\ge\frac{1}{\mu_Z}\,\frac{1+\varepsilon}{1+k\varepsilon}\,\frac{\ln\frac{2}{\delta_s}}{(1+\varepsilon)\ln(1+\varepsilon)-\varepsilon} = \frac{1}{\mu_Z}\,\frac{1}{(1+\varepsilon)\ln(1+\varepsilon)-\varepsilon}\,\ln\left(\frac{2}{\delta_s}\right)^{\frac{1+\varepsilon}{1+k\varepsilon}} \tag{35}$$
and then applied Theorem 5 to claim
$$\Pr\left\{\frac{|\widehat\mu_Z-\mu_Z|}{\mu_Z}\le\varepsilon\ \Big|\ \mu_Z\ge\frac{\widehat\mu_Z}{1+k\varepsilon}\right\}\ge1-2\left(\frac{\delta_s}{2}\right)^{\frac{1+\varepsilon}{1+k\varepsilon}}, \tag{36}$$
which was equation (28) on page 17 of his paper [4]. Here Cheng made a subtle and critical mistake by illegally applying Theorem 5. The reason is that the sample size requirement of Theorem 5 is independent of the samples, while the validity of (35) depends on the samples. Consequently, (36) is not justified. This affects the subsequent relevant development.

References

[1] Chen, X. (2007).
Exact computation of minimum sample size for estimation of binomial parameters. arXiv:0707.2113v1 [math.ST].

[2] Chen, X. (2007). Exact computation of minimum sample size for estimation of Poisson parameters. arXiv:0707.2116v1 [math.ST].

[3] Chen, X. (2007). Exact computation of minimum sample size for estimating proportion of finite population. arXiv:0707.2115v1 [math.ST].

[4] Cheng, J. (2001). Sampling algorithms for estimating the mean of bounded variables. Comput. Statist. 16 1–23.

[5] Chernoff, H. (1952). A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Ann. Math. Statist. 23 493–507.

[6] Dagum, P., Karp, R., Luby, M. and Ross, S. (2000). An optimal algorithm for Monte Carlo estimation. SIAM J. Comput. 29 1484–1496.

[7] Dagum, P. and Luby, M. (1997). An optimal approximation algorithm for Bayesian inference. Artificial Intelligence 93 1–27.

[8] DeGroot, M. H. (1959). Unbiased sequential estimation for binomial populations. Ann. Math. Statist. 30 80–101.

[9] Fishman, G. S. (1996). Monte Carlo – Concepts, Algorithms and Applications, Springer-Verlag, New York.

[10] Ghosh, M., Mukhopadhyay, N. and Sen, P. K. (1997). Sequential Estimation, Wiley, New York.

[11] Haldane, J. B. S. (1945). A labour-saving method of sampling. Nature 155 49–50, January 13.

[12] Haldane, J. B. S. (1945). On a method of estimating frequencies. Biometrika 33 222–225.

[13] Hampel, F. (1998). Is statistics too difficult? The Canadian Journal of Statistics 26 497–513.

[14] Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58 13–29.

[15] Jerrum, M. and Sinclair, A. (1993). Polynomial-time approximation algorithms for the Ising model. SIAM J. Comput. 22 1087–1116.

[16] Karmarkar, N., Karp, R., Lipton, R., Lovász, L. and Luby, M. (1993).
A Monte Carlo algorithm for estimating the permanent. SIAM J. Comput. 22 284–293.

[17] Karp, R., Luby, M. and Madras, N. (1989). Monte Carlo approximation algorithms for enumeration problems. J. Algorithms 10 429–448.

[18] Khargonekar, P. P. and Tikku, A. (1996). Randomized algorithms for robust control analysis and synthesis have polynomial complexity. Proceedings of Conference on Decision and Control, 3470–3475, Kobe, Japan, December.

[19] Mendo, L. and Hernando, J. M. (2006). A simple sequential stopping rule for Monte Carlo simulation. IEEE Transactions on Communications 54 231–241.

[20] Mitzenmacher, M. and Upfal, E. (2005). Probability and Computing – Randomized Algorithms and Probabilistic Analysis, Cambridge University Press, Cambridge.

[21] Nadas, A. (1969). An extension of a theorem of Chow and Robbins on sequential confidence intervals. Ann. Math. Statist. 40 667–671.

[22] Stengel, R. F. and Ray, L. R. (1991). Stochastic robustness of linear time-invariant systems. IEEE Transactions on Automatic Control 36 82–87.

[23] Wald, A. (1947). Sequential Analysis, Wiley, New York.