On Approximating Frequency Moments of Data Streams with Skewed Projections
We propose skewed stable random projections for approximating the pth frequency moments of dynamic data streams (0 < p <= 2), a problem that has been frequently studied in the theoretical computer science and database communities. Our method significantly (or even in…
Authors: Ping Li
On Approximating Frequency Moments of Data Streams with Skewed Projections

Ping Li (pingli@cornell.edu)
Department of Statistical Science, Faculty of Computing and Information Science
Cornell University, Ithaca, NY 14853

November 19, 2007 (revised September 7, 2021)

Abstract

We propose skewed stable random projections for approximating the $\alpha$th frequency moments of dynamic data streams ($0 < \alpha \leq 2$). We show the sample complexity (number of projections) $k = G\frac{1}{\epsilon^2}\log\left(\frac{2}{\delta}\right)$, where $G \to \frac{\epsilon^2}{\log(1+\epsilon)} = O(\epsilon)$ as $\alpha \to 1$, i.e., $\alpha = 1 \pm \Delta$ with $\Delta \to 0$. Previous results based on symmetric stable random projections [12, 16] required $G = \text{non-zero constant} + O(\epsilon)$, even when $\Delta = 0$. The case $\Delta \to 0$ is practically important. For example, $\Delta$ might be the "decay rate" or "interest rate," which is usually small; hence one might view skewed stable random projections as a "generalized counter" for estimating the total value in the future, taking into account the effect of decaying or interest accruement.

We consider the popular Turnstile data stream model. The input data stream $a_t = (i, I_t)$, arriving sequentially, describes the underlying signal $A$, meaning $A_t[i] = A_{t-1}[i] + I_t$, $i \in [1, D]$. We allow the increment $I_t$ to be either positive (i.e., insertion) or negative (i.e., deletion). By definition, the $\alpha$th frequency moment is $F_{(\alpha)} = \sum_{i=1}^{D} |A_t[i]|^\alpha$. Our method only requires that, at the time $t$ of the evaluation, $A_t[i] \geq 0$, which is only a minor restriction for natural data streams encountered in practice.

More specifically, compared with previous studies [11, 12, 16], our contributions are two-fold.

1. Our proposal of skewed stable random projections for data stream computations

In FOCS'00 [11], Indyk proposed (symmetric) stable random projections for approximating the $\alpha$th frequency moment of data streams, where $0 < \alpha \leq 2$. Because practical data streams are often (a) insertion only (i.e., the cash register model), or (b) always non-negative (i.e., the strict Turnstile model), or (c) ultimately non-negative at check points, using symmetric stable random projections is often not necessary. Suppose that at time $t$, $A_t[i] \geq 0$ for all $i$. When $\alpha = 1$, we can compute $F_{(1)}$ essentially error-free using a counter. However, if one applies symmetric stable random projections and the geometric mean estimator in [16], the sample complexity is $k = \left(\frac{\pi^2}{2} + O(\epsilon)\right)\frac{1}{\epsilon^2}\log\frac{2}{\delta}$. The situation becomes much more interesting when $\alpha = 1 \pm \Delta$ with small $\Delta$, because in this case the traditional counter can not be used, yet symmetric stable random projections still require a large number of samples (projections). For the first time, we propose skewed stable random projections, which may be viewed as a "generalized counter" and work especially well when $\Delta$ is small, a case that is also practically very important.

2. Our development of various statistical estimators for skewed stable distributions

Good statistical estimators are both theoretically important (e.g., for sample complexity bounds) and practically useful (e.g., for accurate estimates using fewer samples). The method of skewed stable random projections eventually boils down to a statistical estimation problem, which is less well studied in statistics than that for symmetric stable random projections. Thus, much of our work is built from first principles.
• To build the foundation for statistical estimation, we derive theoretical formulas for the moments of skewed stable distributions and discover a useful property: a fully skewed stable distribution has finite negative moments of every order. We only recommend fully skewed projections.

• We design a general estimator based on the geometric mean for skewed stable distributions and show that the estimation variance is minimized by fully skewed stable distributions. The asymptotic variance of the estimator is $(1-\alpha^2)\frac{\pi^2}{6}\frac{1}{k}F_{(\alpha)}^2$ (when $\alpha < 1$) and $(5-\alpha)(\alpha-1)\frac{\pi^2}{6}\frac{1}{k}F_{(\alpha)}^2$ (when $\alpha > 1$). Compared with [16], our work in a sense achieves an infinite improvement when $\alpha \to 1$, in terms of the asymptotic variances. We also provide explicit tail bounds and consequently establish that $k = G\frac{1}{\epsilon^2}\log\left(\frac{2}{\delta}\right)$, where $G = \frac{\epsilon^2}{\log(1+\epsilon) - 2\sqrt{\Delta\log(1+\epsilon)} + o(\sqrt{\Delta})}$ as $\alpha = 1 \pm \Delta \to 1$ (i.e., $\Delta \to 0$).

• For $\alpha < 1$, the harmonic mean estimator is considerably more accurate. Unlike the harmonic mean estimator in [16] (which was useful only for very small $\alpha$), this estimator has finite moments of all orders and hence exhibits nice tail behaviors for all $0 < \alpha < 1$. We provide the tail bounds explicitly.

• Maximum likelihood estimators (MLEs) can be explicitly derived for $\alpha = 0+$, $\alpha = 0.5$, and $\alpha = 2$. We analyze the MLE for $\alpha = 0.5$, including its variance and explicit tail bounds.

• Finally, we also propose the optimal power estimator, which becomes the MLE when $\alpha = 0.5$, $0+$, or $2$. Moreover, for $\alpha < 1$, all moments exist and exponential bounds can be established.

1 Introduction

The ubiquitous phenomenon of massive data streams [10, 7, 12, 2, 6, 19] imposes many challenges, including how to transmit, compute, and store the data [19]. In fact, "Scaling Up for High Dimensional Data and High Speed Data Streams" is among the "ten challenging problems in data mining research."¹ This paper focuses on approximating frequency moments of streams, using a new method called skewed stable random projections, which considerably (or even "infinitely" in special cases) improves previous methods based on symmetric stable random projections [11, 12, 16].

¹ http://www.cs.uvm.edu/~icdm/10Problems/index.shtml

Consider the popular Turnstile model [19]. The input data stream $a_t = (i, I_t)$, arriving sequentially, describes the underlying signal $A$, meaning $A_t[i] = A_{t-1}[i] + I_t$, $i \in [1, D]$. The increment $I_t$ can be either positive (insertion) or negative (deletion). Restricting $I_t \geq 0$ results in the cash register model. Restricting $A_t[i] \geq 0$ at all times $t$ (but still allowing $I_t$ to be either positive or negative) results in the strict Turnstile model, which suffices for describing many (but not all) natural phenomena. For example [19], in a database, a record can only be deleted if it was previously inserted. Another example is the checking/savings account, which allows deposits and withdrawals but in general does not allow overdrafts.

Our proposed method of skewed stable random projections is applicable when, at the time $t$ of the evaluation, $A_t[i] \geq 0$ for all $i$. This is much more flexible than the strict Turnstile model, which requires $A_t[i] \geq 0$ for all $t$. In other words, our proposed method is applicable to data streams that are (a) insertion only (i.e., the cash register model), or (b) always non-negative (i.e., the strict Turnstile model), or (c) eventually non-negative at check points.
We believe our model suffices for most natural data streams encountered in practice.

Pioneered by [1], there have been many studies on approximating the $\alpha$th frequency moment $F_{(\alpha)}$, defined as
$$F_{(\alpha)} = \sum_{i=1}^{D}\left(A_t[i]\right)^\alpha.$$
[1] considered integer moments, $\alpha = 0, 1, 2$, as well as $\alpha > 2$. Soon after, [7, 11] provided improved algorithms for $0 < \alpha \leq 2$. [20, 3] proved the sample complexity lower bounds for $\alpha > 2$. [23] proved the optimal lower bounds for all frequency moments, except for $\alpha = 1$, because [23] considered non-negative data streams ($A_t[i] \geq 0$), for which one can compute $F_{(1)}$ essentially error-free with a counter [18, 8, 1]. [13] provided algorithms for $\alpha > 2$ that (essentially) achieve the lower bounds proved in [20, 3]. We should also mention that the fundamental complexity results [24, 25] were used in the proofs in [1, 20, 3, 23].

Our proposed method of skewed stable random projections is applicable when $0 < \alpha \leq 2$, and it works particularly well when $\alpha$ is only slightly smaller or larger than 1, i.e., $\alpha = 1 \pm \Delta$ with $\Delta$ small. This can be practically very useful. For example, $\Delta$ may be interpreted as the "decay rate" or the "interest rate," which is usually small. In a sense, we can view skewed stable random projections as a "generalized counter" in that it can count the total value in the future, taking into account the effect of decaying or interest accruement.

This is the first paper on skewed stable random projections, and hence we start with a brief introduction to skewed stable distributions.

1.1 Skewed Stable Distributions

A random variable $Z$ follows a $\beta$-skewed $\alpha$-stable distribution if the Fourier transform of its density is [26, 21]
$$\mathcal{F}_Z(t) = \mathrm{E}\exp\left(\sqrt{-1}\,Z t\right) = \exp\left(-F|t|^\alpha\left(1 - \sqrt{-1}\,\beta\,\mathrm{sgn}(t)\tan\left(\frac{\pi\alpha}{2}\right)\right)\right), \quad \alpha \neq 1,$$
where $-1 \leq \beta \leq 1$ and $F > 0$ is the scale parameter. We denote $Z \sim S(\alpha, \beta, F)$.

Consider two independent random variables, $Z_1 \sim S(\alpha, \beta, 1)$ and $Z_2 \sim S(\alpha, \beta, 1)$. For any non-negative constants $C_1$ and $C_2$, the "$\alpha$-stability" follows from properties of Fourier transforms:
$$Z = C_1 Z_1 + C_2 Z_2 \sim S\left(\alpha, \beta, C_1^\alpha + C_2^\alpha\right).$$
However, if $C_1$ and $C_2$ do not have the same sign, the above "stability" does not hold (unless $\beta = 0$ or $\alpha = 2, 0+$). To see this, consider $Z = C_1 Z_1 - C_2 Z_2$, with $C_1 \geq 0$ and $C_2 \geq 0$. Then, because $\mathcal{F}_{-Z_2}(t) = \mathcal{F}_{Z_2}(-t)$,
$$\mathcal{F}_Z(t) = \exp\left(-|C_1 t|^\alpha\left(1 - \sqrt{-1}\,\beta\,\mathrm{sgn}(t)\tan\frac{\pi\alpha}{2}\right)\right)\exp\left(-|C_2 t|^\alpha\left(1 + \sqrt{-1}\,\beta\,\mathrm{sgn}(t)\tan\frac{\pi\alpha}{2}\right)\right),$$
which does not represent a stable law, unless $\beta = 0$ or $\alpha = 2, 0+$. This is the fundamental reason why symmetric stable random projections can be applied to the general Turnstile model, while our skewed stable random projections are limited to non-negative streams at the time of evaluation. We will soon explain why we recommend $\beta = 1$ (fully skewed).

While there have been numerous studies and applications of random projections, to the best of our knowledge, this is the first proposal of skewed stable random projections.

1.2 Symmetric Stable Random Projections

Consider a data stream $A_t[i]$, $i \in [1, D]$, following the Turnstile model. [11, 12] described the following (idealized) procedure for approximating $F_{(1)} = \sum_{i=1}^{D} A_t[i]$:

1. Generate $\mathbf{R} \in \mathbb{R}^{D \times k}$ with i.i.d. entries $r_{ij} \sim S(1, 0, 1)$, i.e., standard Cauchy. Set $x_j = 0$, for $j = 1$ to $k$.
2. For each new tuple $a_t = (i, I_t)$, perform $x_j = x_j + I_t \times r_{ij}$, for all $j = 1$ to $k$.

3. Return $\mathrm{median}(|x_j|, j = 1, ..., k)$ as the estimate of $F_{(1)}$.

This procedure extends to $0 < \alpha \leq 2$. By properties of Fourier transforms, the generated $x_j$, $j = 1$ to $k$, represent $k$ i.i.d. samples $x_j \sim S(\alpha, 0, F_{(\alpha)})$. Thus, the problem boils down to estimating the scale parameter $F_{(\alpha)}$ from $k$ i.i.d. samples. The recent paper [16] proposed estimators based on the geometric mean and the harmonic mean.

• The geometric mean estimator has asymptotic variance $\frac{(\alpha^2+2)\pi^2}{12}\frac{1}{k}F_{(\alpha)}^2$. It exhibits exponential tail bounds and has the sample complexity bound $k = \left(\frac{(\alpha^2+2)\pi^2}{6} + O(\epsilon)\right)\frac{1}{\epsilon^2}\log\frac{2}{\delta}$, so that with probability at least $1-\delta$ the estimate is within a $1 \pm \epsilon$ factor of the truth.

• The harmonic mean estimator is statistically optimal and considerably more accurate than the geometric mean estimator when $\alpha \to 0+$. As $\alpha$ moves slightly away from 0, the variance increases substantially and becomes infinite when $\alpha \to 0.5$. This estimator does not have bounds in exponential form unless $\alpha = 0+$.

1.3 Skewed Stable Random Projections

If, at the time $t$ of the evaluation, the data stream is non-negative (which includes the strict Turnstile model as a special case), using symmetric stable random projections is unnecessary. For example, at $\alpha = 1$, using symmetric stable random projections and the geometric mean estimator [16], the sample complexity is asymptotically $k = \left(\frac{\pi^2}{2} + O(\epsilon)\right)\frac{1}{\epsilon^2}\log\frac{2}{\delta}$, which is unnecessary because at $\alpha = 1$ we can use a simple counter to compute $F_{(1)}$ essentially error-free [18, 8, 1, 23]. The problem becomes more interesting when $\alpha$ is slightly larger or smaller than 1. Ideally, we hope to have a mechanism that is (essentially) error-free when $\alpha \to 1$ in a continuous fashion. The method of skewed stable random projections provides such a tool.

Instead of generating the projection matrix $\mathbf{R} \in \mathbb{R}^{D \times k}$ from i.i.d. symmetric stable entries $r_{ij} \sim S(\alpha, 0, 1)$, we generate $r_{ij} \sim S(\alpha, \beta, 1)$ (and we recommend $\beta = 1$). After the projection operations on the data stream $A_t[i]$ ($i = 1$ to $D$), we obtain $k$ i.i.d. samples $x_j \sim S(\alpha, \beta, F_{(\alpha)})$, where $F_{(\alpha)} = \sum_{i=1}^{D}(A_t[i])^\alpha$ is what we are after. Therefore, we face a new estimation task, which is more sophisticated and less well studied in statistics than that for symmetric stable random projections. Thus, we have to build some of the basic tools from first statistical principles. We derive the general formula for the moments of skewed stable distributions, based on which we propose the geometric mean and harmonic mean estimators. In particular, we discover some interesting properties of fully skewed stable distributions, which give some of our estimators better behaviors (e.g., tail bounds) than the analogous estimators in [16].
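The skewed variant differs from the procedure above only in the distribution of the entries $r_{ij}$. The following minimal Python sketch (not from the paper; all names, sizes, and the toy stream are illustrative) generates fully skewed stable variables with the standard Chambers-Mallows-Stuck recipe, which we assume produces the parametrization $S(\alpha, \beta, 1)$ of Section 1.1, maintains the $k$ projected counters under Turnstile updates, and then applies the geometric mean estimator introduced in Section 1.4.1 below, computed in log space for numerical stability. In a real streaming implementation the matrix entries would be regenerated pseudo-randomly rather than stored.

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(0)

def sample_skewed_stable(alpha, beta, size):
    # Chambers-Mallows-Stuck sampler, assumed to give S(alpha, beta, 1) with
    # E exp(i t Z) = exp(-|t|^alpha (1 - i beta sgn(t) tan(pi alpha / 2))), alpha != 1.
    V = rng.uniform(-np.pi / 2, np.pi / 2, size)
    W = rng.exponential(1.0, size)
    B = np.arctan(beta * np.tan(np.pi * alpha / 2)) / alpha
    S = (1.0 + beta**2 * np.tan(np.pi * alpha / 2) ** 2) ** (1.0 / (2 * alpha))
    return (S * np.sin(alpha * (V + B)) / np.cos(V) ** (1.0 / alpha)
            * (np.cos(V - alpha * (V + B)) / W) ** ((1.0 - alpha) / alpha))

def geometric_mean_estimate(x, alpha):
    # Geometric mean estimator of F_(alpha) for beta = 1 (Section 1.4.1), in log space.
    k = len(x)
    kappa = alpha if alpha < 1 else 2.0 - alpha
    log_numer = (alpha / k) * np.log(np.abs(x)).sum()
    log_denom = (k * np.log(np.cos(kappa * np.pi / (2 * k)))
                 - np.log(np.cos(kappa * np.pi / 2))
                 + k * (np.log(2.0 / np.pi) + np.log(np.sin(np.pi * alpha / (2 * k)))
                        + gammaln(1.0 - 1.0 / k) + gammaln(alpha / k)))
    return np.exp(log_numer - log_denom)

alpha, D, k = 0.95, 500, 100                      # illustrative sizes only
R = sample_skewed_stable(alpha, 1.0, (D, k))      # fully skewed (beta = 1) projection matrix
A = np.zeros(D)                                   # the signal, kept here only to check the answer
x = np.zeros(k)                                   # the k projected counters
for _ in range(5000):                             # toy insertion-only (cash register) stream
    i, I_t = rng.integers(D), rng.exponential(1.0)
    A[i] += I_t
    x += I_t * R[i]                               # the linear projection updates incrementally

print(np.sum(A**alpha), geometric_mean_estimate(x, alpha))   # true F_(alpha) vs. estimate
```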
1.4 Summary of Estimators

Assume $k$ i.i.d. samples $x_j \sim S(\alpha, \beta = 1, F_{(\alpha)})$. We propose five types of estimators and analyze their variances and tail bounds, including the geometric mean estimator, the harmonic mean estimator, the maximum likelihood estimator, as well as the optimal power estimator. Figure 1 compares their asymptotic variances, along with the asymptotic variance of the geometric mean estimator for symmetric stable random projections [16].

Figure 1: Let $\hat{F}$ be an estimator of $F$ with asymptotic variance $\mathrm{Var}(\hat{F}) = V\frac{F^2}{k} + O\left(\frac{1}{k^2}\right)$. We plot the $V$ values for the geometric mean estimator, the harmonic mean estimator (for $\alpha < 1$), and the optimal power estimator (the lower dashed curve), along with the $V$ values for the geometric mean estimator for symmetric stable random projections in [16] ("Symmetric GM", the upper dashed curve). When $\alpha \to 1$, our method achieves an "infinite improvement" in terms of the asymptotic variances.

1.4.1 The geometric mean estimator, $\hat{F}_{(\alpha),gm}$, for $0 < \alpha \leq 2$ ($\alpha \neq 1$)

$$\hat{F}_{(\alpha),gm} = \frac{\prod_{j=1}^k |x_j|^{\alpha/k}}{\left[\cos^k\left(\frac{\kappa(\alpha)\pi}{2k}\right)\big/\cos\left(\frac{\kappa(\alpha)\pi}{2}\right)\right]\left[\frac{2}{\pi}\sin\left(\frac{\pi\alpha}{2k}\right)\Gamma\left(1-\frac{1}{k}\right)\Gamma\left(\frac{\alpha}{k}\right)\right]^k},$$
$$\mathrm{Var}\left(\hat{F}_{(\alpha),gm}\right) = \frac{F_{(\alpha)}^2}{k}\frac{\pi^2}{12}\left(\alpha^2 + 2 - 3\kappa^2(\alpha)\right) + O\left(\frac{1}{k^2}\right), \qquad \kappa(\alpha) = \alpha \text{ if } \alpha < 1, \quad \kappa(\alpha) = 2-\alpha \text{ if } \alpha > 1.$$
$\hat{F}_{(\alpha),gm}$ is unbiased and has exponential tail bounds for all $0 < \alpha \leq 2$. We provide the sample complexity bound $k = O\left(G\frac{1}{\epsilon^2}\log\frac{2}{\delta}\right)$ explicitly and prove that, as $\alpha = 1 \pm \Delta \to 1$ (i.e., $\Delta \to 0$), for fixed $\epsilon$,
$$G = \frac{\epsilon^2}{\log(1+\epsilon) - 2\sqrt{\Delta\log(1+\epsilon)} + o(\sqrt{\Delta})}.$$

1.4.2 The harmonic mean estimator, $\hat{F}_{(\alpha),hm,c}$, for $0 < \alpha < 1$

$$\hat{F}_{(\alpha),hm,c} = \frac{k\frac{\cos(\alpha\pi/2)}{\Gamma(1+\alpha)}}{\sum_{j=1}^k |x_j|^{-\alpha}}\left(1 - \frac{1}{k}\left(\frac{2\Gamma^2(1+\alpha)}{\Gamma(1+2\alpha)} - 1\right)\right),$$
$$\mathrm{E}\left(\hat{F}_{(\alpha),hm,c}\right) = F_{(\alpha)} + O\left(\frac{1}{k^2}\right), \qquad \mathrm{Var}\left(\hat{F}_{(\alpha),hm,c}\right) = \frac{F_{(\alpha)}^2}{k}\left(\frac{2\Gamma^2(1+\alpha)}{\Gamma(1+2\alpha)} - 1\right) + O\left(\frac{1}{k^2}\right).$$
$\hat{F}_{(\alpha),hm,c}$ has exponential tail bounds and we provide the constants explicitly.

1.4.3 The maximum likelihood estimator, $\hat{F}_{(0.5),mle,c}$, for $\alpha = 0.5$ only

$$\hat{F}_{(0.5),mle,c} = \left(1 - \frac{3}{4}\frac{1}{k}\right)\sqrt{\frac{k}{\sum_{j=1}^k \frac{1}{x_j}}},$$
$$\mathrm{E}\left(\hat{F}_{(0.5),mle,c}\right) = F_{(0.5)} + O\left(\frac{1}{k^2}\right), \qquad \mathrm{Var}\left(\hat{F}_{(0.5),mle,c}\right) = \frac{1}{2}\frac{F_{(0.5)}^2}{k} + \frac{9}{8}\frac{F_{(0.5)}^2}{k^2} + O\left(\frac{1}{k^3}\right).$$
$\hat{F}_{(0.5),mle,c}$ has exponential tail bounds and we provide the constants explicitly.

1.4.4 The optimal power estimator, $\hat{F}_{(\alpha),op,c}$, for $0 < \alpha \leq 2$ ($\alpha \neq 1$)

$$\hat{F}_{(\alpha),op,c} = \left[\frac{\frac{1}{k}\sum_{j=1}^k |x_j|^{\lambda^*\alpha}}{\frac{\cos\left(\kappa(\alpha)\lambda^*\pi/2\right)}{\cos^{\lambda^*}\left(\kappa(\alpha)\pi/2\right)}\frac{2}{\pi}\Gamma(1-\lambda^*)\Gamma(\lambda^*\alpha)\sin\left(\frac{\pi}{2}\lambda^*\alpha\right)}\right]^{1/\lambda^*}\times\left(1 - \frac{1}{k}\frac{1}{2\lambda^*}\left(\frac{1}{\lambda^*}-1\right)\left(\frac{\cos(\kappa(\alpha)\lambda^*\pi)\frac{2}{\pi}\Gamma(1-2\lambda^*)\Gamma(2\lambda^*\alpha)\sin(\pi\lambda^*\alpha)}{\left[\cos\left(\frac{\kappa(\alpha)\lambda^*\pi}{2}\right)\frac{2}{\pi}\Gamma(1-\lambda^*)\Gamma(\lambda^*\alpha)\sin\left(\frac{\pi}{2}\lambda^*\alpha\right)\right]^2}-1\right)\right),$$
$$\mathrm{E}\left(\hat{F}_{(\alpha),op,c}\right) = F_{(\alpha)} + O\left(\frac{1}{k^2}\right),$$
$$\mathrm{Var}\left(\hat{F}_{(\alpha),op,c}\right) = F_{(\alpha)}^2\frac{1}{\lambda^{*2}k}\left(\frac{\cos(\kappa(\alpha)\lambda^*\pi)\frac{2}{\pi}\Gamma(1-2\lambda^*)\Gamma(2\lambda^*\alpha)\sin(\pi\lambda^*\alpha)}{\left[\cos\left(\frac{\kappa(\alpha)\lambda^*\pi}{2}\right)\frac{2}{\pi}\Gamma(1-\lambda^*)\Gamma(\lambda^*\alpha)\sin\left(\frac{\pi}{2}\lambda^*\alpha\right)\right]^2}-1\right) + O\left(\frac{1}{k^2}\right),$$
$$\lambda^* = \underset{\lambda}{\mathrm{argmin}}\; g(\lambda;\alpha), \qquad g(\lambda;\alpha) = \frac{1}{\lambda^2}\left(\frac{\cos(\kappa(\alpha)\lambda\pi)\frac{2}{\pi}\Gamma(1-2\lambda)\Gamma(2\lambda\alpha)\sin(\pi\lambda\alpha)}{\left[\cos\left(\frac{\kappa(\alpha)\lambda\pi}{2}\right)\frac{2}{\pi}\Gamma(1-\lambda)\Gamma(\lambda\alpha)\sin\left(\frac{\pi}{2}\lambda\alpha\right)\right]^2}-1\right).$$
When $0 < \alpha < 1$, we prove that $\lambda^* < 0$ and that $\hat{F}_{(\alpha),op,c}$ has exponential tail bounds (not explicitly included in the article). $g(\lambda;\alpha)$ is a convex function of $\lambda$, but we provide the rigorous proof only for $0 < \alpha < 1$. $\hat{F}_{(\alpha),op,c}$ becomes the harmonic mean estimator when $\alpha = 0+$, the arithmetic mean estimator when $\alpha = 2$, and the maximum likelihood estimator when $\alpha = 0.5$.

2 The Geometric Mean Estimator

We first prove a fundamental result about the moments of skewed stable distributions.
Lemma 1 If $Z \sim S(\alpha, \beta, F_{(\alpha)})$, then for any $\lambda$ with $-1 < \lambda < \alpha$,
$$\mathrm{E}|Z|^\lambda = F_{(\alpha)}^{\lambda/\alpha}\cos\left(\frac{\lambda}{\alpha}\tan^{-1}\left(\beta\tan\frac{\alpha\pi}{2}\right)\right)\left(1+\beta^2\tan^2\frac{\alpha\pi}{2}\right)^{\frac{\lambda}{2\alpha}}\frac{2}{\pi}\sin\left(\frac{\pi}{2}\lambda\right)\Gamma\left(1-\frac{\lambda}{\alpha}\right)\Gamma(\lambda), \quad (1)$$
which can be simplified when $\beta = 1$ to
$$\mathrm{E}|Z|^\lambda = F_{(\alpha)}^{\lambda/\alpha}\frac{\cos\left(\frac{\kappa(\alpha)}{\alpha}\frac{\lambda\pi}{2}\right)}{\cos^{\lambda/\alpha}\left(\frac{\kappa(\alpha)\pi}{2}\right)}\frac{2}{\pi}\sin\left(\frac{\pi}{2}\lambda\right)\Gamma\left(1-\frac{\lambda}{\alpha}\right)\Gamma(\lambda), \quad (2)$$
$$\kappa(\alpha) = \alpha \text{ if } \alpha < 1, \qquad \kappa(\alpha) = 2-\alpha \text{ if } \alpha > 1. \quad (3)$$
For $\alpha < 1$ and $-\infty < \lambda < \alpha$,
$$\mathrm{E}|Z|^\lambda = \mathrm{E}\,Z^\lambda = F_{(\alpha)}^{\lambda/\alpha}\frac{\Gamma\left(1-\frac{\lambda}{\alpha}\right)}{\cos^{\lambda/\alpha}\left(\frac{\alpha\pi}{2}\right)\Gamma(1-\lambda)}. \quad (4)$$
Proof: See Appendix A.

Recall that after $k$ projections we obtain $k$ i.i.d. samples $x_j \sim S(\alpha, \beta, F_{(\alpha)})$, and the task becomes estimating the scale parameter $F_{(\alpha)}$ from these $k$ samples. Setting $\lambda = \frac{\alpha}{k}$ in Lemma 1 yields an unbiased estimator of $F_{(\alpha)}$,
$$\hat{F}_{(\alpha),gm,\beta} = \frac{\prod_{j=1}^k |x_j|^{\alpha/k}}{\cos^k\left(\frac{1}{k}\tan^{-1}\left(\beta\tan\frac{\alpha\pi}{2}\right)\right)\left(1+\beta^2\tan^2\frac{\alpha\pi}{2}\right)^{\frac{1}{2}}\left[\frac{2}{\pi}\sin\left(\frac{\pi\alpha}{2k}\right)\Gamma\left(1-\frac{1}{k}\right)\Gamma\left(\frac{\alpha}{k}\right)\right]^k}. \quad (5)$$
Because of the symmetry about $\beta = 0$, we only consider $0 \leq \beta \leq 1$. In the following lemma, we show that the variance of $\hat{F}_{(\alpha),gm,\beta}$ decreases with increasing $\beta$.

Lemma 2 The variance of $\hat{F}_{(\alpha),gm,\beta}$,
$$\mathrm{Var}\left(\hat{F}_{(\alpha),gm,\beta}\right) = F_{(\alpha)}^2\left(\frac{\cos^k\left(\frac{2}{k}\tan^{-1}\left(\beta\tan\frac{\alpha\pi}{2}\right)\right)\left[\frac{2}{\pi}\sin\left(\frac{\pi\alpha}{k}\right)\Gamma\left(1-\frac{2}{k}\right)\Gamma\left(\frac{2\alpha}{k}\right)\right]^k}{\cos^{2k}\left(\frac{1}{k}\tan^{-1}\left(\beta\tan\frac{\alpha\pi}{2}\right)\right)\left[\frac{2}{\pi}\sin\left(\frac{\pi\alpha}{2k}\right)\Gamma\left(1-\frac{1}{k}\right)\Gamma\left(\frac{\alpha}{k}\right)\right]^{2k}} - 1\right), \quad (6)$$
is a decreasing function of $\beta \in [0, 1]$.

Proof: It suffices to consider
$$h(\beta) = \frac{\cos\left(\frac{2}{k}\tan^{-1}\left(\beta\tan\frac{\alpha\pi}{2}\right)\right)}{\cos^2\left(\frac{1}{k}\tan^{-1}\left(\beta\tan\frac{\alpha\pi}{2}\right)\right)} = 2 - \sec^2\left(\frac{1}{k}\tan^{-1}\left(\beta\tan\frac{\alpha\pi}{2}\right)\right),$$
which is a decreasing function of $\beta \in [0, 1]$. Thus $\mathrm{Var}\left(\hat{F}_{(\alpha),gm,\beta}\right)$ is also a decreasing function of $\beta \in [0, 1]$.

Therefore, in order to achieve the smallest variance, we take $\beta = 1$. For brevity, we simply use $\hat{F}_{(\alpha),gm}$ instead of $\hat{F}_{(\alpha),gm,1}$. In fact, for the rest of the paper we will always consider $\beta = 1$ only. We rewrite $\hat{F}_{(\alpha),gm}$ (i.e., $\hat{F}_{(\alpha),gm,\beta=1}$) as
$$\hat{F}_{(\alpha),gm} = \frac{\prod_{j=1}^k |x_j|^{\alpha/k}}{\left[\cos^k\left(\frac{\kappa(\alpha)\pi}{2k}\right)\big/\cos\left(\frac{\kappa(\alpha)\pi}{2}\right)\right]\left[\frac{2}{\pi}\sin\left(\frac{\pi\alpha}{2k}\right)\Gamma\left(1-\frac{1}{k}\right)\Gamma\left(\frac{\alpha}{k}\right)\right]^k}. \quad (7)$$
Recall $\kappa(\alpha) = \alpha$ if $\alpha < 1$, and $\kappa(\alpha) = 2-\alpha$ if $\alpha > 1$. We need to restrict $k \geq 2$. The next lemma concerns the asymptotic moments of $\hat{F}_{(\alpha),gm}$.

Lemma 3 As $k \to \infty$,
$$\left[\cos\left(\frac{\kappa(\alpha)\pi}{2k}\right)\frac{2}{\pi}\Gamma\left(\frac{\alpha}{k}\right)\Gamma\left(1-\frac{1}{k}\right)\sin\left(\frac{\pi}{2}\frac{\alpha}{k}\right)\right]^k \to \exp\left(-\gamma_e(\alpha-1)\right), \quad (8)$$
monotonically with increasing $k$ ($k \geq 2$), where $\gamma_e = 0.5772...$ is Euler's constant. For any fixed $t$, as $k \to \infty$,
$$\mathrm{E}\left(\hat{F}_{(\alpha),gm}\right)^t = F_{(\alpha)}^t\,\frac{\cos^k\left(\frac{\kappa(\alpha)\pi}{2k}t\right)\left[\frac{2}{\pi}\sin\left(\frac{\pi\alpha}{2k}t\right)\Gamma\left(1-\frac{t}{k}\right)\Gamma\left(\frac{\alpha}{k}t\right)\right]^k}{\cos^{kt}\left(\frac{\kappa(\alpha)\pi}{2k}\right)\left[\frac{2}{\pi}\sin\left(\frac{\pi\alpha}{2k}\right)\Gamma\left(1-\frac{1}{k}\right)\Gamma\left(\frac{\alpha}{k}\right)\right]^{kt}} = F_{(\alpha)}^t\exp\left(\frac{1}{k}\frac{\pi^2(t^2-t)}{24}\left(\alpha^2+2-3\kappa^2(\alpha)\right) + O\left(\frac{1}{k^2}\right)\right). \quad (9)$$
Consequently,
$$\mathrm{Var}\left(\hat{F}_{(\alpha),gm}\right) = \frac{F_{(\alpha)}^2}{k}\frac{\pi^2}{12}\left(\alpha^2+2-3\kappa^2(\alpha)\right) + O\left(\frac{1}{k^2}\right). \quad (10)$$
Proof: See Appendix B.
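As a quick numerical sanity check of (2) (not part of the paper), one can compare the empirical mean of $|Z|^\lambda$ against the formula. The snippet below assumes the hypothetical sample_skewed_stable sampler from the earlier sketch is in scope, uses illustrative values of $\alpha$ and $\lambda$ with $-1 < \lambda < \alpha$, and sets $F_{(\alpha)} = 1$.

```python
import numpy as np
from scipy.special import gamma

alpha, lam = 0.75, 0.1                # must satisfy -1 < lambda < alpha; F_(alpha) = 1
kappa = alpha if alpha < 1 else 2 - alpha

z = sample_skewed_stable(alpha, 1.0, 2_000_000)   # beta = 1 samples from the earlier sketch
empirical = np.mean(np.abs(z) ** lam)

# Right-hand side of (2) with F_(alpha) = 1:
theory = (np.cos(kappa / alpha * lam * np.pi / 2) / np.cos(kappa * np.pi / 2) ** (lam / alpha)
          * (2 / np.pi) * np.sin(np.pi * lam / 2) * gamma(1 - lam / alpha) * gamma(lam))
print(empirical, theory)              # the two numbers should agree to a few decimals
```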
Figure 2: We plot the tail bound constants of $\hat{F}_{(\alpha),gm}$ in Lemma 4, for a wide range of $\alpha$ and $\epsilon$: (a) right tail bound constant $G_{R,gm}$ for $\alpha < 1$; (b) right tail bound constant for $\alpha > 1$; (c) left tail bound constant $G_{L,gm}$ for $\alpha < 1$; (d) left tail bound constant for $\alpha > 1$. For convenience, we plot the left bound constant $G_{L,gm}$ using its asymptote (i.e., assuming $k_0 = \infty$ in (14)). This is equivalent to replacing the denominator in (7) by its asymptote, which can be viewed as a biased version of the estimator in (7).

Lemma 4 provides the tail bounds, and Figure 2 plots the tail bound constants.

Lemma 4 The right tail bound:
$$\Pr\left(\hat{F}_{(\alpha),gm} - F_{(\alpha)} \geq \epsilon F_{(\alpha)}\right) \leq \exp\left(-k\frac{\epsilon^2}{G_{R,gm}}\right), \quad \epsilon > 0, \quad (11)$$
where
$$\frac{\epsilon^2}{G_{R,gm}} = C_R\log(1+\epsilon) - C_R\gamma_e(\alpha-1) - \log\left(\cos\left(\frac{\kappa(\alpha)\pi C_R}{2}\right)\frac{2}{\pi}\Gamma(\alpha C_R)\Gamma(1-C_R)\sin\left(\frac{\pi\alpha C_R}{2}\right)\right), \quad (12)$$
and $C_R$ is the solution to
$$\gamma_e(\alpha-1) - \log(1+\epsilon) - \frac{\kappa(\alpha)\pi}{2}\tan\left(\frac{\kappa(\alpha)\pi}{2}C_R\right) + \frac{\alpha\pi/2}{\tan\left(\frac{\alpha\pi}{2}C_R\right)} + \alpha\,\psi(\alpha C_R) - \psi(1-C_R) = 0.$$
Here $\psi(z) = \frac{\Gamma'(z)}{\Gamma(z)}$ is the "Psi" function. The left tail bound:
$$\Pr\left(\hat{F}_{(\alpha),gm} - F_{(\alpha)} \leq -\epsilon F_{(\alpha)}\right) \leq \exp\left(-k\frac{\epsilon^2}{G_{L,gm,k_0}}\right), \quad k > k_0, \quad 0 < \epsilon < 1, \quad (13)$$
where
$$\frac{\epsilon^2}{G_{L,gm,k_0}} = -C_L\log(1-\epsilon) - \log\left(-\cos\left(\frac{\kappa(\alpha)\pi}{2}C_L\right)\frac{2}{\pi}\Gamma(-\alpha C_L)\Gamma(1+C_L)\sin\left(\frac{\pi\alpha C_L}{2}\right)\right) - k_0 C_L\log\left(\cos\left(\frac{\kappa(\alpha)\pi}{2k_0}\right)\frac{2}{\pi}\Gamma\left(\frac{\alpha}{k_0}\right)\Gamma\left(1-\frac{1}{k_0}\right)\sin\left(\frac{\pi}{2}\frac{\alpha}{k_0}\right)\right), \quad (14)$$
and $C_L$ is the solution to
$$\log(1-\epsilon) - \gamma_e(\alpha-1) + \frac{\kappa(\alpha)\pi}{2}\tan\left(\frac{\kappa(\alpha)\pi}{2}C_L\right) - \frac{\alpha\pi/2}{\tan\left(\frac{\alpha\pi}{2}C_L\right)} - \alpha\,\psi(1+\alpha C_L) + \psi(1+C_L) = 0.$$
Proof: See Appendix C.

It is interesting and practically important to understand the behavior of the tail bounds when $\alpha = 1 \pm \Delta \to 1$, i.e., $\Delta \to 0$. Figure 3 plots the right tail bound constant $G_{R,gm}$ as a function of $\Delta$ instead of $\alpha$.

Figure 3: We plot the right tail bound constant $G_{R,gm}$ in Lemma 4 as a function of $\Delta$ instead of $\alpha$, for $\epsilon$ ranging from 0.01 to 1.0: (a) $\alpha < 1$; (b) $\alpha > 1$. Here we always let $0 < \Delta < 1$. If $\alpha < 1$, then $\alpha = 1 - \Delta$, and if $\alpha > 1$, then $\alpha = 1 + \Delta$.

Lemma 5 describes the rate of convergence of the right tail bound constant $G_{R,gm}$ as a function of $\Delta$ when $\Delta \to 0$, for fixed $\epsilon$.

Lemma 5 Let $\alpha = 1 - \Delta$ if $\alpha < 1$ and $\alpha = 1 + \Delta$ if $\alpha > 1$, i.e., $0 < \Delta < 1$. For fixed $\epsilon$, as $\alpha \to 1$ (i.e., as $\Delta \to 0$), the (right) tail bound constant $G_{R,gm}$ in Lemma 4 converges to $\frac{\epsilon^2}{\log(1+\epsilon)}$ at the rate $O(\sqrt{\Delta})$:
$$G_{R,gm} = \frac{\epsilon^2}{\log(1+\epsilon) - 2\sqrt{\Delta\log(1+\epsilon)} + o(\sqrt{\Delta})}. \quad (15)$$
Proof: See Appendix D.

The fact that $G_{R,gm}$ converges at the rate $O(\sqrt{\Delta})$ does not appear completely intuitive. For the sake of verification, Figure 4 plots $G_{R,gm}$ for small values of $\Delta$, along with the approximation suggested in (15).

Figure 4: We plot $G_{R,gm}$ for small $\Delta$ (exact versus approximate, for $\epsilon = 0.1, 0.5, 1.0$), along with the approximation suggested in (15), i.e., $G_{R,gm} \approx \frac{\epsilon^2}{\log(1+\epsilon) - 2\sqrt{\Delta\log(1+\epsilon)}}$ for small $\Delta$: (a) $\alpha < 1$; (b) $\alpha > 1$.

Once we know the exponential tail bounds, we can immediately establish the sample complexity bound: $k = O\left(G\frac{1}{\epsilon^2}\log\frac{2}{\delta}\right)$ suffices to approximate $F_{(\alpha)}$ within a $1 \pm \epsilon$ factor with probability at least $1 - \delta$. It suffices to let $G = \max\{G_{R,gm}, G_{L,gm}\}$.

3 The Harmonic Mean Estimators for $0 < \alpha < 1$

While the geometric mean estimator $\hat{F}_{(\alpha),gm}$ applies to $0 < \alpha \leq 2$ ($\alpha \neq 1$), it is by no means the optimal estimator. For $\alpha < 1$, the harmonic mean estimator can considerably improve upon $\hat{F}_{(\alpha),gm}$. Unlike the harmonic mean estimator in [16], which is useful only for small $\alpha$ and has no exponential tail bounds except for $\alpha = 0+$, the harmonic mean estimator in this study has very nice tail properties for all $0 < \alpha < 1$.
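Before the formal result, here is a hypothetical one-function sketch (not from the paper) of the bias-corrected harmonic mean estimator of Section 1.4.2, together with a check on synthetic samples generated with the sample_skewed_stable sketch used earlier and the scaling property $cZ \sim S(\alpha, 1, c^\alpha)$ from Section 1.1; the constants $F$, $\alpha$, and $k$ are illustrative.

```python
import numpy as np
from scipy.special import gamma

def harmonic_mean_estimate(x, alpha, bias_correct=True):
    # Bias-corrected harmonic mean estimator of F_(alpha), for 0 < alpha < 1 and beta = 1
    # (Section 1.4.2 / Lemma 6 below).
    k = len(x)
    est = k * (np.cos(alpha * np.pi / 2) / gamma(1 + alpha)) / np.sum(np.abs(x) ** (-alpha))
    if bias_correct:
        est *= 1 - (2 * gamma(1 + alpha) ** 2 / gamma(1 + 2 * alpha) - 1) / k
    return est

F_true, alpha, k = 7.0, 0.5, 200
x = F_true ** (1 / alpha) * sample_skewed_stable(alpha, 1.0, k)   # c*S(alpha,1,1) ~ S(alpha,1,c^alpha)
print(F_true, harmonic_mean_estimate(x, alpha))
```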
The harmonic mean estimator takes advantage of the fact that if $Z \sim S(\alpha < 1, \beta = 1, F_{(\alpha)})$, then $\mathrm{E}|Z|^\lambda$ exists for all $-\infty < \lambda < \alpha$. Note that when $\alpha < 1$ and $\beta = 1$, $Z$ is always non-negative, i.e., $\mathrm{E}|Z|^\lambda = \mathrm{E}\,Z^\lambda$.

Lemma 6 Assume $k$ i.i.d. samples $x_j \sim S(\alpha < 1, \beta = 1, F_{(\alpha)})$. We define the harmonic mean estimator $\hat{F}_{(\alpha),hm}$,
$$\hat{F}_{(\alpha),hm} = \frac{k\frac{\cos(\alpha\pi/2)}{\Gamma(1+\alpha)}}{\sum_{j=1}^k |x_j|^{-\alpha}}, \quad (16)$$
and the bias-corrected harmonic mean estimator $\hat{F}_{(\alpha),hm,c}$,
$$\hat{F}_{(\alpha),hm,c} = \frac{k\frac{\cos(\alpha\pi/2)}{\Gamma(1+\alpha)}}{\sum_{j=1}^k |x_j|^{-\alpha}}\left(1 - \frac{1}{k}\left(\frac{2\Gamma^2(1+\alpha)}{\Gamma(1+2\alpha)} - 1\right)\right). \quad (17)$$
The bias and variance of $\hat{F}_{(\alpha),hm,c}$ are
$$\mathrm{E}\left(\hat{F}_{(\alpha),hm,c}\right) = F_{(\alpha)} + O\left(\frac{1}{k^2}\right), \quad (18)$$
$$\mathrm{Var}\left(\hat{F}_{(\alpha),hm,c}\right) = \frac{F_{(\alpha)}^2}{k}\left(\frac{2\Gamma^2(1+\alpha)}{\Gamma(1+2\alpha)} - 1\right) + O\left(\frac{1}{k^2}\right). \quad (19)$$
The right tail bound of $\hat{F}_{(\alpha),hm}$ is
$$\Pr\left(\hat{F}_{(\alpha),hm} - F_{(\alpha)} \geq \epsilon F_{(\alpha)}\right) \leq \exp\left(-k\frac{\epsilon^2}{G_{R,hm}}\right), \quad \epsilon > 0, \quad (20)$$
$$\frac{\epsilon^2}{G_{R,hm}} = -\log\left(\sum_{m=0}^{\infty}\frac{\Gamma^m(1+\alpha)}{\Gamma(1+m\alpha)}(-t_1^*)^m\right) - \frac{t_1^*}{1+\epsilon}, \quad (21)$$
where $t_1^*$ is the solution to
$$\frac{\sum_{m=1}^{\infty}(-1)^m\, m\,(t_1^*)^{m-1}\frac{\Gamma^m(1+\alpha)}{\Gamma(1+m\alpha)}}{\sum_{m=0}^{\infty}(-1)^m\,(t_1^*)^m\frac{\Gamma^m(1+\alpha)}{\Gamma(1+m\alpha)}} + \frac{1}{1+\epsilon} = 0. \quad (22)$$
The left tail bound of $\hat{F}_{(\alpha),hm}$ is
$$\Pr\left(\hat{F}_{(\alpha),hm} - F_{(\alpha)} \leq -\epsilon F_{(\alpha)}\right) \leq \exp\left(-k\frac{\epsilon^2}{G_{L,hm}}\right), \quad 0 < \epsilon < 1, \quad (23)$$
$$\frac{\epsilon^2}{G_{L,hm}} = -\log\left(\sum_{m=0}^{\infty}\frac{\Gamma^m(1+\alpha)}{\Gamma(1+m\alpha)}(t_2^*)^m\right) + \frac{t_2^*}{1-\epsilon}, \quad (24)$$
where $t_2^*$ is the solution to
$$-\frac{\sum_{m=1}^{\infty} m\,(t_2^*)^{m-1}\frac{\Gamma^m(1+\alpha)}{\Gamma(1+m\alpha)}}{\sum_{m=0}^{\infty}(t_2^*)^m\frac{\Gamma^m(1+\alpha)}{\Gamma(1+m\alpha)}} + \frac{1}{1-\epsilon} = 0. \quad (25)$$
Proof: See Appendix E.

Figure 5: We plot the tail bound constants for the harmonic mean estimator in Lemma 6, for a wide range of $\alpha$ and $\epsilon$: (a) the right tail bound constant $G_{R,hm}$; (b) the left tail bound constant $G_{L,hm}$.

4 The Maximum Likelihood Estimators for $\alpha = 0.5$

Estimators based on the maximum likelihood principle are statistically optimal (though usually biased). It is known that the optimal estimator for $F_{(2)}$ is the arithmetic mean, which is the maximum likelihood estimator (MLE). [16] has shown that the harmonic mean estimator is the MLE for $\alpha = 0+$. This section analyzes the MLE for $\alpha = 0.5$, which corresponds to the Lévy distribution. Suppose $Z \sim S(\alpha = 0.5, \beta = 1, F_{(0.5)})$. Then
$$f_Z(z) = \frac{F_{(0.5)}}{\sqrt{2\pi}}\frac{\exp\left(-\frac{F_{(0.5)}^2}{2z}\right)}{z^{3/2}}, \qquad F_Z(z) = \frac{2}{\sqrt{\pi}}\int_{\sqrt{\frac{F_{(0.5)}^2}{2z}}}^{\infty} e^{-t^2}\,dt = \mathrm{erfc}\left(\sqrt{\frac{F_{(0.5)}^2}{2z}}\right). \quad (26)$$
The next lemma derives the maximum likelihood estimators and their moments.

Lemma 7 Assume $k$ i.i.d. samples $x_j \sim S(0.5, 1, F_{(0.5)})$. The maximum likelihood estimator of $F_{(0.5)}$ is
$$\hat{F}_{(0.5),mle} = \sqrt{\frac{k}{\sum_{j=1}^k \frac{1}{x_j}}}. \quad (27)$$
To reduce the bias and variance, we recommend the bias-corrected version:
$$\hat{F}_{(0.5),mle,c} = \left(1-\frac{3}{4}\frac{1}{k}\right)\hat{F}_{(0.5),mle} = \left(1-\frac{3}{4}\frac{1}{k}\right)\sqrt{\frac{k}{\sum_{j=1}^k \frac{1}{x_j}}}. \quad (28)$$
The first four moments of $\hat{F}_{(0.5),mle,c}$ are
$$\mathrm{E}\left(\hat{F}_{(0.5),mle,c}\right) = F_{(0.5)} + O\left(\frac{1}{k^2}\right), \quad (29)$$
$$\mathrm{Var}\left(\hat{F}_{(0.5),mle,c}\right) = \frac{1}{2}\frac{F_{(0.5)}^2}{k} + \frac{9}{8}\frac{F_{(0.5)}^2}{k^2} + O\left(\frac{1}{k^3}\right), \quad (30)$$
$$\mathrm{E}\left(\hat{F}_{(0.5),mle,c} - \mathrm{E}\,\hat{F}_{(0.5),mle,c}\right)^3 = \frac{5}{4}\frac{F_{(0.5)}^3}{k^2} + O\left(\frac{1}{k^3}\right), \quad (31)$$
$$\mathrm{E}\left(\hat{F}_{(0.5),mle,c} - \mathrm{E}\,\hat{F}_{(0.5),mle,c}\right)^4 = \frac{3}{4}\frac{F_{(0.5)}^4}{k^2} + \frac{75}{8}\frac{F_{(0.5)}^4}{k^3} + O\left(\frac{1}{k^4}\right). \quad (32)$$
Proof: See Appendix F.

Compared with the geometric mean estimator at $\alpha = 0.5$, whose variance is $1.2337\frac{F_{(0.5)}^2}{k} + O\left(\frac{1}{k^2}\right)$, we can see that $\hat{F}_{(0.5),mle,c}$ significantly reduces the variance. Compared with the harmonic mean estimator at $\alpha = 0.5$, whose variance is $0.5708\frac{F_{(0.5)}^2}{k} + O\left(\frac{1}{k^2}\right)$, the variance of $\hat{F}_{(0.5),mle,c}$ is still smaller.
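A corresponding sketch for $\alpha = 0.5$ (hypothetical, not from the paper): by (26), $S(0.5, 1, F_{(0.5)})$ is the Lévy law with density $\frac{F}{\sqrt{2\pi}}e^{-F^2/(2z)}z^{-3/2}$, which is the distribution of $F^2/\chi_1^2$; this gives an easy way to simulate samples and test the bias-corrected MLE of Lemma 7. The constants below are illustrative.

```python
import numpy as np

def mle_estimate_half(x, bias_correct=True):
    # Maximum likelihood estimator of F_(0.5) from x_j ~ S(0.5, 1, F_(0.5)) (Lemma 7).
    k = len(x)
    est = np.sqrt(k / np.sum(1.0 / x))
    if bias_correct:
        est *= 1 - 0.75 / k
    return est

rng = np.random.default_rng(1)
F_true, k = 3.0, 500
x = F_true**2 / rng.standard_normal(k) ** 2   # Levy samples: F^2 / chi-square(1)
print(F_true, mle_estimate_half(x))
```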
The next task is to derive tail bounds. Although we recommend the bias-corrected version $\hat{F}_{(0.5),mle,c}$, for convenience we present the tail bounds only for $\hat{F}_{(0.5),mle}$.

Lemma 8
$$\Pr\left(\hat{F}_{(0.5),mle} - F_{(0.5)} \geq \epsilon F_{(0.5)}\right) \leq \exp\left(-k\left(\log(1+\epsilon) - \frac{1}{2} + \frac{1}{2}\frac{1}{(1+\epsilon)^2}\right)\right), \quad \epsilon > 0, \quad (33)$$
$$\Pr\left(\hat{F}_{(0.5),mle} - F_{(0.5)} \leq -\epsilon F_{(0.5)}\right) \leq \exp\left(-k\left(\log(1-\epsilon) - \frac{1}{2} + \frac{1}{2}\frac{1}{(1-\epsilon)^2}\right)\right), \quad 0 < \epsilon < 1. \quad (34)$$
For small $\epsilon$, the tail bounds can be written as
$$\Pr\left(\hat{F}_{(0.5),mle} - F_{(0.5)} \geq \epsilon F_{(0.5)}\right) \leq \exp\left(-k\left(\epsilon^2 - \frac{5}{3}\epsilon^3 + ...\right)\right), \quad (35)$$
$$\Pr\left(\hat{F}_{(0.5),mle} - F_{(0.5)} \leq -\epsilon F_{(0.5)}\right) \leq \exp\left(-k\left(\epsilon^2 + \frac{5}{3}\epsilon^3 + ...\right)\right). \quad (36)$$
Proof: See Appendix G.

5 The Optimal Power Estimator

One may have noticed that the MLE at $\alpha = 0.5$, the harmonic mean estimator at $\alpha = 0+$, and the arithmetic mean estimator at $\alpha = 2$ share the same fractional power form. Thus, this section is devoted to the optimal power estimator.

Lemma 9 The optimal power estimator,
$$\hat{F}_{(\alpha),op,c} = \left[\frac{\frac{1}{k}\sum_{j=1}^k |x_j|^{\lambda^*\alpha}}{\frac{\cos\left(\kappa(\alpha)\lambda^*\pi/2\right)}{\cos^{\lambda^*}\left(\kappa(\alpha)\pi/2\right)}\frac{2}{\pi}\Gamma(1-\lambda^*)\Gamma(\lambda^*\alpha)\sin\left(\frac{\pi}{2}\lambda^*\alpha\right)}\right]^{1/\lambda^*}\times\left(1 - \frac{1}{k}\frac{1}{2\lambda^*}\left(\frac{1}{\lambda^*}-1\right)\left(\frac{\cos(\kappa(\alpha)\lambda^*\pi)\frac{2}{\pi}\Gamma(1-2\lambda^*)\Gamma(2\lambda^*\alpha)\sin(\pi\lambda^*\alpha)}{\left[\cos\left(\frac{\kappa(\alpha)\lambda^*\pi}{2}\right)\frac{2}{\pi}\Gamma(1-\lambda^*)\Gamma(\lambda^*\alpha)\sin\left(\frac{\pi}{2}\lambda^*\alpha\right)\right]^2}-1\right)\right), \quad (37)$$
has bias and variance
$$\mathrm{E}\left(\hat{F}_{(\alpha),op,c}\right) = F_{(\alpha)} + O\left(\frac{1}{k^2}\right), \quad (38)$$
$$\mathrm{Var}\left(\hat{F}_{(\alpha),op,c}\right) = F_{(\alpha)}^2\frac{1}{\lambda^{*2}k}\left(\frac{\cos(\kappa(\alpha)\lambda^*\pi)\frac{2}{\pi}\Gamma(1-2\lambda^*)\Gamma(2\lambda^*\alpha)\sin(\pi\lambda^*\alpha)}{\left[\cos\left(\frac{\kappa(\alpha)\lambda^*\pi}{2}\right)\frac{2}{\pi}\Gamma(1-\lambda^*)\Gamma(\lambda^*\alpha)\sin\left(\frac{\pi}{2}\lambda^*\alpha\right)\right]^2}-1\right) + O\left(\frac{1}{k^2}\right), \quad (39)$$
where
$$\lambda^* = \underset{\lambda}{\mathrm{argmin}}\;g(\lambda;\alpha), \qquad g(\lambda;\alpha) = \frac{1}{\lambda^2}\left(\frac{\cos(\kappa(\alpha)\lambda\pi)\frac{2}{\pi}\Gamma(1-2\lambda)\Gamma(2\lambda\alpha)\sin(\pi\lambda\alpha)}{\left[\cos\left(\frac{\kappa(\alpha)\lambda\pi}{2}\right)\frac{2}{\pi}\Gamma(1-\lambda)\Gamma(\lambda\alpha)\sin\left(\frac{\pi}{2}\lambda\alpha\right)\right]^2}-1\right). \quad (40)$$
Proof: See Appendix H.

Figure 6(a) plots $g(\lambda;\alpha)$ in Lemma 9 as a function of $\lambda$ for a good range of $\alpha$ values, illustrating that $g(\lambda;\alpha)$ is a convex function of $\lambda$ and hence the minimum $\lambda^*$ can easily be obtained. Figure 6(b) plots the optimal value $\lambda^*$ as a function of $\alpha$.

Figure 6: (a) We plot $g(\lambda;\alpha)$ in Lemma 9 as a function of $\lambda$ for a good range of $\alpha$ values (from $10^{-6}$ to 2), illustrating that $g(\lambda;\alpha)$ is a convex function of $\lambda$ and hence the minimum $\lambda^*$ can easily be obtained (i.e., the lowest point on each curve). Note that there is a singularity at $\alpha = 2-$. (b) We plot the optimal value $\lambda^*$ as a function of $\alpha$, only for $0 < \alpha < 2$.

This type of estimator was recently proposed in [17], for symmetric stable random projections, by aggressively minimizing the asymptotic variance through the solution of a convex program. The problem with the fractional power estimator in [17] is that it only has finite moments up to a rather limited order (which seriously affects its tail behavior). The story is somewhat different for the fractional power estimator in this section, although the analysis becomes more complicated than in [17]. For $\alpha < 1$, Lemma 10 proves that the optimal power $\lambda^* < 0$, implying that all moments exist and exponential tail bounds hold. Lemma 10 also proves that $g(\lambda;\alpha)$ is a convex function of $\lambda$.

Lemma 10 If $\alpha < 1$, then $g(\lambda;\alpha)$ is a convex function of $\lambda$ and the optimal solution $\lambda^* < 0$.

Proof: See Appendix I.
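Since $g(\lambda;\alpha)$ is convex for $\alpha < 1$, the optimal power $\lambda^*$ can be found by a one-dimensional search. The hypothetical sketch below (not from the paper) uses the Gamma-function form of $g(\lambda;\alpha)$ that Appendix I derives for $\alpha < 1$ (where $\kappa(\alpha) = \alpha$), and should recover $\lambda^* = -2$ and $g(\lambda^*) = 1/2$ at $\alpha = 0.5$; the search bounds are illustrative.

```python
import numpy as np
from scipy.special import gammaln
from scipy.optimize import minimize_scalar

def g(lam, alpha):
    # Variance factor g(lambda; alpha) for alpha < 1, written via Gamma functions only
    # (the simplified form in Appendix I); requires lambda < 1/2, lambda != 0.
    log_ratio = (gammaln(1 - 2 * lam) + 2 * gammaln(1 - lam * alpha)
                 - gammaln(1 - 2 * lam * alpha) - 2 * gammaln(1 - lam))
    return (np.exp(log_ratio) - 1.0) / lam**2

alpha = 0.5
res = minimize_scalar(lambda lam: g(lam, alpha), bounds=(-6.0, -1e-3), method="bounded")
print(res.x, res.fun)   # expected: roughly -2.0 and 0.5 for alpha = 0.5
```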
The fact that $\lambda^* < 0$ when $\alpha < 1$ is very useful, because it implies that the estimator has all moments when $\alpha < 1$ and consequently exponential tail bounds exist. When $\alpha = 0.5$, we can verify that $\lambda = -2$ satisfies $\frac{\partial g(\lambda;\alpha)}{\partial\lambda} = 0$. Because $g(\lambda;\alpha)$ is a convex function, we know $\lambda^* = -2$ when $\alpha = 0.5$, and $\hat{F}_{(0.5),op,c}$ is exactly the maximum likelihood estimator at $\alpha = 0.5$, i.e.,
$$\hat{F}_{(0.5),op,c} = \left(1-\frac{3}{4}\frac{1}{k}\right)\sqrt{\frac{k}{\sum_{j=1}^k \frac{1}{x_j}}}.$$
Therefore, the optimal power estimator is statistically optimal at least at $\alpha = 0+$, $\alpha = 2$, and $\alpha = 0.5$.

6 Conclusion

Approximating the $\alpha$th frequency moments of massive data streams is a frequently studied problem. In some applications, we might treat $\alpha$ as a tuning parameter. In other applications, $\alpha$ may bear some physical meaning; for example, $\alpha = 1 \pm \Delta$ with $\Delta$ being the "decay rate" or "interest rate," where $\Delta$ is often small. We consider the popular Turnstile data stream model, which allows both insertions and deletions. We propose a new method called skewed stable random projections for approximating the $\alpha$th frequency moments (where $0 < \alpha \leq 2$) of data streams that are (a) insertion only (i.e., the cash register model), or (b) always non-negative (i.e., the strict Turnstile model), or (c) eventually non-negative at check points. Because of the natural constraints in the real world, we believe our model suffices for describing most data streams encountered in practice. Our proposed method works particularly well when $\alpha$ is about 1, which corresponds to many practical settings. For example, we can view skewed stable random projections as a "generalized counter" for approximating the total value in the future, taking into account the effect of decaying or interest accruement.

In this paper, detailed statistical analysis is conducted on a variety of estimators derived from first principles, including estimators based on the geometric mean, the harmonic mean, the maximum likelihood, and the fractional power. The geometric mean estimator is particularly useful for theoretical analysis of the sample complexity bound as well as the local behavior of the sample complexity when $\alpha \to 1$. For example, we show that using the geometric mean estimator, the sample complexity bound constant converges to $\epsilon^2/\log(1+\epsilon)$ when $\alpha = 1 \pm \Delta \to 1$, at the rate $O(\sqrt{\Delta})$.

To conclude the paper, we should mention that in some applications, skewed stable random projections may be combined with symmetric stable random projections, owing to the linearity in the definition of the $\alpha$th frequency moment. For example, we can use skewed stable random projections for those elements which we are certain will eventually turn non-negative, at least at the time of evaluation, and use symmetric stable random projections for those elements whose signs we are less certain about.

7 Acknowledgement

The author wishes to thank Gennady Samorodnitsky, Jon Kleinberg, Martin Wells, and Anand Vidyashankar for helpful discussions and suggestions. The author appreciates that Jelani Nelson (and the research group at MIT) mentioned some immediate applications of skewed stable random projections using $\alpha$ very close to 1, after the author presented some of the unpublished results in this paper at SODA'08.

A Proof of Lemma 1

Assume $Z \sim S(\alpha, \beta, F_{(\alpha)})$.
T o pr ove E | Z | λ for − 1 < λ < α , [26, Theo rem 2.6.3] provided a partial answer: Z ∞ 0 z λ f Z ( z ; α, β B , F ( α ) ) dz = F λ/α ( α ) sin( π ρλ ) sin( π λ ) Γ 1 − λ α Γ (1 − λ ) cos − λ/α ( π β B κ ( α ) / 2) where we denote κ ( α ) = α if α < 1 , and κ ( α ) = 2 − α if α > 1 , and accord ing to the notatio n and parametrization in the book[26, I.19, I.28] : β B = 2 π κ ( α ) tan − 1 β tan π α 2 , ρ = 1 − β B κ ( a ) /α 2 . Note that cos − λ/α ( π β B κ ( α ) / 2) = 1 + tan 2 ( π β B κ ( α ) / 2) λ 2 α = 1 + tan 2 tan − 1 β tan π α 2 λ 2 α = 1 + β 2 tan 2 π α 2 λ 2 α . 13 Therefo re, fo r − 1 < λ < α , [26, Theorem 2.6.3 ] is equ i valent to Z ∞ 0 z λ f Z ( z ; α, β B , F ( α ) ) dz = F λ/α ( α ) sin( π ρλ ) sin( π λ ) Γ 1 − λ α Γ (1 − λ ) 1 + β 2 tan 2 π α 2 λ 2 α . T o compu te E | Z | λ , we take advantage of a usef ul property of the st able density functio n[26, page 65]: f Z ( − z ; α, β B , F ( α ) ) = f Z ( z ; α, − β B , F ( α ) ) . E | Z | λ = Z 0 −∞ ( − z ) λ f Z ( z ; α, β B , F ( α ) ) dz + Z ∞ 0 z λ f Z ( z ; α, β B , F ( α ) ) dz = Z ∞ 0 z λ f Z ( z ; α, − β B , F ( α ) ) dz + Z ∞ 0 z λ f Z ( z ; α, β B , F ( α ) ) dz = F λ/α ( α ) sin( πλ ) Γ 1 − λ α Γ (1 − λ ) 1 + β 2 tan 2 π α 2 λ 2 α sin π λ 1 − β B κ ( α ) /α 2 + sin π λ 1 + β B κ ( α ) /α 2 = F λ/α ( α ) sin( πλ ) Γ 1 − λ α Γ (1 − λ ) 1 + β 2 tan 2 π α 2 λ 2 α 2 sin π λ 2 cos π λ 2 β B κ ( α ) /α = F λ/α ( α ) cos( π λ/ 2) Γ 1 − λ α Γ (1 − λ ) 1 + β 2 tan 2 π α 2 λ 2 α cos λ α tan − 1 β tan π α 2 = F λ/α ( α ) 1 + β 2 tan 2 π α 2 λ 2 α cos λ α tan − 1 β tan π α 2 2 π sin π 2 λ Γ 1 − λ α Γ ( λ ) , which can be simplified when β = 1 , to b e E | Z | λ = F λ/α ( α ) cos κ ( α ) α λπ 2 cos λ/α κ ( α ) π 2 2 π sin π 2 λ Γ 1 − λ α Γ ( λ ) . The fina l task is to show that when α < 1 and β = 1 , E | Z | λ exists for all −∞ < λ < α , not ju st − 1 < λ < α . This is an extremely useful property . Note that when α < 1 and β = 1 , Z is always non-negative. As shown in the pro of of [26, Theorem 2.6.3], E | Z | λ = F λ/α ( α ) cos − λ/α π α 2 1 π Im Z ∞ 0 z λ Z ∞ 0 exp − z u exp( √ − 1 π / 2) − u α exp( − √ − 1 π α/ 2) + √ − 1 π 2 dudz = F λ/α ( α ) cos − λ/α π α 2 1 π Im Z ∞ 0 Z ∞ 0 z λ exp − z u √ − 1 − u α exp( − √ − 1 π α/ 2) √ − 1 dudz . The only thing we need to ch eck is that in the p roof o f [26, Theorem 2.6.3], the condition for Fubini’ s th eorem (to exchange order of integration) still hold s when −∞ < α < 1 , β = 1 , and λ < − 1 . W e can show Z ∞ 0 Z ∞ 0 z λ exp − z u √ − 1 − u α exp( − √ − 1 π α/ 2) √ − 1 dudz = Z ∞ 0 Z ∞ 0 z λ exp − u α cos( π α/ 2) + √ − 1 u α sin( π α/ 2) dudz = Z ∞ 0 Z ∞ 0 z λ exp ( − u α cos( π α/ 2)) dudz < ∞ , provided λ < − 1 ( λ 6 = − 1 , − 2 , − 3 , .... ) an d co s( πα/ 2 ) > 0 , i.e., α < 1 . Note th at | exp( √ − 1 x ) | = 1 a lw ay s and Euler’ s formu la: e x p( √ − 1 x ) = co s( x ) + √ − 1 s in( x ) is fre quently used to simplify the algebra. Once we have shown that Fubin i’ s con dition is satisfied, we can exchan ge the ord er o f integration and the rest just f ollows from th e pro of of [2 6, Theo rem 2.6 .3]. No te that because of c ontinuity , the “singularity po ints” λ = − 1 , − 2 , − 3 , ... do not matter . W e shou ld mention that in an unpublishe d technical report[ 14 ] , cited as [21, Pro perty 1. 2.17]) , E | Z | λ was proved in an integral form, b u t only for 0 < λ < α . 
14 B Pr oof of Lemma 3 W e first show that, for any fixed t , as k → ∞ , E ˆ F ( α ) ,gm t = F t ( α ) cos k κ ( α ) π 2 k t 2 π sin π α 2 k t Γ 1 − t k Γ α k t k cos kt κ ( α ) π 2 k 2 π sin π α 2 k Γ 1 − 1 k Γ α k kt = F t ( α ) exp 1 k π 2 ( t 2 − t ) 24 α 2 + 2 − 3 κ 2 ( α ) + O 1 k 2 . In [16], it was proved that, as k → ∞ , 2 π sin π α 2 k t Γ 1 − t k Γ α k t k 2 π sin π α 2 k Γ 1 − 1 k Γ α k kt =1 + 1 k π 2 ( t 2 − t ) 24 α 2 + 2 + O 1 k 2 = exp 1 k π 2 ( t 2 − t ) 24 α 2 + 2 + O 1 k 2 . Using the infinite produc t rep resentation of the cos function [9, 1.4 3.3] cos( z ) = ∞ Y s =0 1 − 4 z 2 (2 s + 1) 2 π 2 , we can rewrite cos k κ ( α ) π 2 k t cos kt κ ( α ) π 2 k = ∞ Y s =0 1 − κ 2 ( α ) t 2 (2 s + 1) 2 k 2 k 1 − κ 2 ( α ) (2 s + 1) 2 k 2 − kt = ∞ Y s =0 1 − κ 2 ( α ) t 2 (2 s + 1) 2 k 2 1 + t κ 2 ( α ) (2 s + 1) 2 k 2 + O 1 k 3 k = ∞ Y s =0 1 − κ 2 ( α )( t 2 − t ) (2 s + 1) 2 k 2 + O 1 k 3 k = ∞ Y s =0 1 − κ 2 ( α )( t 2 − t ) (2 s + 1) 2 k + O 1 k 2 = exp ∞ X s =0 log 1 − κ 2 ( α )( t 2 − t ) (2 s + 1) 2 k + O 1 k 2 ! = exp − κ 2 ( α ) k ( t 2 − t ) ∞ X s =0 1 (2 s + 1) 2 + O 1 k 2 ! = exp − κ 2 ( α ) k ( t 2 − t ) π 2 8 + O 1 k 2 , which, combined with the result in [16], yields the desired expression. The next task i s to show cos κ ( α ) π 2 k 2 π Γ α k Γ 1 − 1 k sin π 2 α k k → exp ( − γ e ( α − 1)) , monoto nically as k → ∞ , where γ e = 0 . 577215665 . .. , is Euler’ s constant. In [16], it was proved that, as k → ∞ , 2 π Γ α k Γ 1 − 1 k sin π 2 α k k → exp ( − γ e ( α − 1)) , 15 monoto nically . In th is study , we need to con sider instead cos κ ( α ) π 2 k 2 π Γ α k Γ 1 − 1 k sin π 2 α k k = " 2 cos κ ( α ) π 2 k Γ α k sin π α 2 k Γ 1 k sin π k # k (41) Note that the additional term h cos κ ( α ) π 2 k i k = 1 − O 1 k . Theref ore, cos κ ( α ) π 2 k 2 π Γ α k Γ 1 − 1 k sin π 2 α k k → exp ( − γ e ( α − 1)) . T o show the mono tonicity , howe ver, we have to u se some different tec hniques from [1 6]. The reaso n is because the addition al term h cos κ ( α ) π 2 k i k increases (instead of decreasing ) mon otonically with increasing k . First, we con sider α > 1 , i.e., κ ( α ) = 2 − α < 1 . For simplicity , we take log arithm of (4 1) and rep lace 1 /k by t , where 0 ≤ t ≤ 1 / 2 (recall k ≥ 2 ). It suffices to sho w that g ( t ) increases with increasing t ∈ [0 , 1 / 2] , where g ( t ) = 1 t W ( t ) , W ( t ) = log cos κ ( α ) π 2 t + log ( Γ ( αt )) + log sin π α 2 t − log ( Γ ( t )) − log (sin ( π t )) + log(2) . Because g ′ ( t ) = 1 t W ′ ( t ) − 1 t 2 W ( t ) , to show g ′ ( t ) ≥ 0 in t ∈ [0 , 1 / 2] , it suf fice s to show tW ′ ( t ) − W ( t ) ≥ 0 . One can check that tW ′ ( t ) → 0 and W ( t ) → 0 , as t → 0 + , where W ′ ( t ) = − tan κ ( α ) π 2 t κπ 2 + ψ ( αt ) α + 1 tan π α 2 t απ 2 − ψ ( t ) − 1 tan ( π t ) π . Here ψ ( x ) = ∂ log(Γ( x )) ∂ x is the “Psi” func tion. Therefo re, to show tW ′ ( t ) − W ( t ) ≥ 0 , it suffices to show that t W ′ ( t ) − W ( t ) is an incr easing fun ction of t ∈ [0 , 1 / 2] , i.e., ( tW ′ ( t ) − W ( t )) ′ = W ′′ ( t ) ≥ 0 , i.e., W ′′ ( t ) = − sec 2 κ ( α ) π 2 t κ ( α ) π 2 2 + ψ ′ ( αt ) α 2 − csc 2 π α 2 t π α 2 2 − ψ ′ ( t ) + csc 2 ( π t ) π 2 ≥ 0 . Using series representation of ψ ( x ) [9, 8. 363.8] , we can show ψ ′ ( αt ) α 2 − ψ ′ ( t ) = ∞ X s =0 α 2 ( αt + s ) 2 − ∞ X s =0 1 ( t + s ) 2 = ∞ X s =0 1 ( t + s/α ) 2 − 1 ( t + s ) 2 ≥ 0 , because for now we con sider α > 1 . T hus, it suffices to show that Q ( t ; α ) = − sec 2 κ ( α ) π 2 t κ ( α ) π 2 2 − csc 2 π α 2 t π α 2 2 + csc 2 ( π t ) π 2 ≥ 0 . 
T o show Q ( t ; α ) ≥ 0 , we can trea t Q ( t ; α ) as a functio n of α (fo r fixed t ). Because both 1 sin( x ) and 1 cos( x ) are conv ex function s of x ∈ [0 , π / 2] , we know Q ( t ; α ) is a concave functio n of α (f or fix ed t ). It is easy to check that lim α → 1+ Q ( t ; α ) = 0 , lim α → 2 − Q ( t ; α ) = 0 . Because Q ( t ; α ) is concave in α ∈ [1 , 2] , we must have Q ( t ; α ) ≥ 0 ; and consequently , W ′′ ( t ) ≥ 0 and g ′ ( t ) ≥ 0 . Therefo re, we have proved th at (41) decreases monoton ically with increasing k , when 1 < α ≤ 2 . 16 For α < 1 ( i.e., κ ( α ) = α < 1 ), we p rove the mono tonicity by a different techniqu e. First, using the infinite-pro duct repr esentations of Gamma function[9, 8.322] and sin function [9 , 1. 431.1] , Γ( z ) = exp ( − γ e z ) z ∞ Y s =1 1 + z s − 1 exp z s , sin( z ) = z ∞ Y s =1 1 − z 2 s 2 π 2 , we can rewrite (41) as " 2 cos κ ( α ) π 2 k Γ α k sin π α 2 k Γ 1 k sin π k # k = " Γ α k sin π α k Γ 1 k sin π k # k = exp ( − γ e ( α − 1)) × ∞ Y s =1 exp α − 1 sk 1 + α k s − 1 1 + 1 k s 1 − α 2 k 2 s 2 1 − 1 s 2 k 2 − 1 ! k . T o show its monotonicity , it suffices to sho w that for any s ≥ 1 1 + α k s − 1 1 + 1 k s 1 − α 2 k 2 s 2 1 − 1 s 2 k 2 − 1 ! k decreases monotonically , which i s equiv alen t to sho w the monoton icity o f g ( t ) with increasing t , for t ≥ 2 , where g ( t ) = t log 1 + α t − 1 1 + 1 t 1 − α 2 t 2 1 − 1 t 2 − 1 ! = t log t − α t − 1 . It is straightfo rward to show that t log t − α t − 1 is monoto nically decreasing with increasing t ( t ≥ 2 ), for α < 1 . T o this end, we have proved that for 0 < α ≤ 2 ( α 6 = 1 ), cos κ ( α ) π 2 k 2 π Γ α k Γ 1 − 1 k sin π 2 α k k → exp ( − γ e ( α − 1)) , monoto nically with incr easing k ( k ≥ 2 ). C Proof of Lemma 4 W e first find the constant G R,gm in the right tail bound Pr ˆ F ( α ) ,gm − F ( α ) > ǫF ( α ) ≤ exp − k ǫ 2 G R,gm , ǫ > 0 . For 0 < t < k , the Markov moment bound yields Pr ˆ F ( α ) ,gm − F ( α ) > ǫF ( α ) ≤ E ˆ F ( α ) ,gm t (1 + ǫ ) t F t ( α ) =(1 + ǫ ) − t h cos κ ( α ) π 2 k t 2 π Γ αt k Γ 1 − t k sin π αt 2 k i k h cos κ ( α ) π 2 k 2 π Γ α k Γ 1 − 1 k sin π 2 α k i kt ≤ (1 + ǫ ) − t h cos κ ( α ) π 2 k t 2 π Γ αt k Γ 1 − t k sin π αt 2 k i k exp ( − tγ e ( α − 1)) . W e need to find the t th at minimizes the upper bound. For con venien ce, we consider its logarithm , i.e., g ( t ) = tγ e ( α − 1) − t log(1 + ǫ ) + k log cos κ ( α ) π 2 k t 2 π Γ αt k Γ 1 − t k sin π αt 2 k 17 whose first and second deriv atives (with respect to t ) are g ′ ( t ) = γ e ( α − 1) − log (1 + ǫ ) − κ ( α ) π 2 tan κ ( α ) π 2 k t + απ / 2 tan απt 2 k + ψ αt k α − ψ 1 − t k , g ′′ ( t ) = 1 k − κ ( α ) π 2 2 sec 2 κ ( α ) π 2 k t − απ 2 2 csc 2 απ t 2 k + α 2 ψ ′ αt k + ψ ′ 1 − t k ! , where ψ ( z ) = Γ ′ ( z ) / Γ( z ) is th e Psi function . T o show that g ( t ) is a conv ex fu nction, i.e., g ′′ ( t ) ≥ 0 , we make use of the follo win g e x pansions: [9, 1.422 .2, 1.422 .4, 8.363.8] sec 2 π x 2 = 4 π 2 ∞ X j =1 1 (2 j − 1 − x ) 2 + 1 (2 j − 1 + x ) 2 , csc 2 ( π x ) = 1 π 2 x 2 + 2 π 2 ∞ X j =1 x 2 + j 2 ( x 2 − j 2 ) 2 , ψ ′ ( x ) = ∞ X j =0 1 ( x + j ) 2 , to rewrite k g ′′ ( t ) = − κ 2 ∞ X j =1 1 (2 j − 1 − κt/k ) 2 + 1 (2 j − 1 + κt/k ) 2 − k 2 t 2 − α 2 2 ∞ X j =1 ( αt/ 2 k ) 2 + j 2 (( αt/ 2 k ) 2 − j 2 ) 2 + α 2 ∞ X j =0 1 ( αt/k + j ) 2 + ∞ X j =0 1 (1 − t/k + j ) 2 = − κ 2 ∞ X j =1 1 (2 j − 1 − κt/k ) 2 + 1 (2 j − 1 + κt/k ) 2 − α 2 ∞ X j =1 1 ( αt/k − 2 j ) 2 + 1 ( αt/k + 2 j ) 2 + α 2 ∞ X j =1 1 ( αt/k + j ) 2 + ∞ X j =1 1 ( j − t/k ) 2 . 
If α < 1 , i.e., κ ( α ) = α , then k g ′′ ( t ) = − α 2 ∞ X j =1 1 ( αt/k − j ) 2 + 1 ( αt/k + j ) 2 + α 2 ∞ X j =1 1 ( αt/k + j ) 2 + ∞ X j =1 1 ( j − t/k ) 2 = − α 2 ∞ X j =1 1 ( j − αt/k ) 2 + ∞ X j =1 1 ( j − t/k ) 2 ≥ 0 , because α < 1 and 0 < t < k . If α > 1 , i.e., κ ( α ) = 2 − α < 1 , then k g ′′ ( t ) = − κ 2 ∞ X j =1 1 (2 j − 1 − κt/k ) 2 + 1 (2 j − 1 + κt/k ) 2 − α 2 ∞ X j =1 1 ( αt/k − 2 j ) 2 + 1 ( αt/k + 2 j ) 2 + α 2 ∞ X j =1 1 ( αt/k + 2 j ) 2 + α 2 ∞ X j =1 1 ( αt/k + 2 j − 1) 2 + ∞ X j =1 1 (2 j − t/k ) 2 + ∞ X j =1 1 (2 j − 1 − t/k ) 2 ≥ − κ 2 ∞ X j =1 1 (2 j − 1 + κt/k ) 2 − α 2 ∞ X j =1 1 (2 j − αt/k ) 2 + α 2 ∞ X j =1 1 ( αt/k + 2 j − 1) 2 + ∞ X j =1 1 (2 j − t/k ) 2 = − ∞ X j =1 1 ((2 j − 1 ) /κ + t/k ) 2 + ∞ X j =1 1 ((2 j − 1 ) /α + t/k ) 2 + − ∞ X j =1 1 (2 j /α − t/k ) 2 + ∞ X j =1 1 (2 j − t/k ) 2 ≥ 0 , 18 because α > κ . Since we ha ve p roved th at g ′′ ( t ) , i.e., g ( t ) is a co n vex function , on e can find the o ptimal t b y solv ing g ′ ( t ) = 0 : γ e ( α − 1) − log (1 + ǫ ) − κ ( α ) π 2 tan κ ( α ) π 2 k t + απ / 2 tan απt 2 k + ψ αt k α − ψ 1 − t k = 0 , W e let the solution be t = C R k , where C R is the solution to γ e ( α − 1) − log (1 + ǫ ) − κ ( α ) π 2 tan κ ( α ) π 2 C R + απ / 2 tan απ 2 C R + ψ ( αC R ) α − ψ (1 − C R ) = 0 . Alternatively , we can seek a “sub-o ptimal” (but asymptotica lly optimal) solution using the asympto tic expres- sion for E ˆ F ( α ) ,gm t in Lemma 3: h cos κ ( α ) π 2 k t 2 π Γ αt k Γ 1 − t k sin π αt 2 k i k h cos κ ( α ) π 2 k 2 π Γ α k Γ 1 − 1 k sin π 2 α k i kt = exp 1 k π 2 24 t 2 − t 2 + α 2 − 3 κ 2 ( α ) + ... . In other words, we can seek the t that minimizes (1 + ǫ ) − t exp 1 k π 2 24 t 2 − t 2 + α 2 − 3 κ 2 ( α ) , whose minimum is attained at t = k log(1 + ǫ ) (2 + α 2 − 3 κ 2 ( α )) π 2 / 12 + 1 2 . This approximation will produ ce meaning less bound s even when ǫ is not too large, especially when α app roaches 1. Th erefore, de spite its simplicity , we d o not re commend this sub-o ptimal constant, which nevertheless can still be quite useful (e.g., ) for servin g the initial guess for C R in a numerica l procedure. Assume we know C R (e.g., by a simple numer ical proc edure), we can then express the right tail bou nd as Pr ˆ F ( α ) ,gm − F ( α ) ≥ ǫF ( α ) ≤ (1 + ǫ ) − C R k h cos κ ( α ) π C R 2 2 π Γ ( αC R ) Γ (1 − C R ) s in π αC R 2 i k exp ( − C R k γ e ( α − 1)) = exp − k ǫ 2 G R,gm , where ǫ 2 G R,gm = C R log(1 + ǫ ) − C R γ e ( α − 1) − log cos κ ( α ) π C R 2 2 π Γ ( αC R ) Γ (1 − C R ) s in π αC R 2 . Next, we find the constant G L,g m,α,ǫ,k 0 in the left tail boun d Pr ˆ F ( α ) ,gm − F ( α ) ≤ − ǫF ( α ) ≤ exp − k ǫ 2 G L,α,ǫ,k 0 , k > k 0 , 0 < ǫ < 1 . From Lemma 3, we know that, for any t , where 0 < t < k /α if α > 1 an d t > 0 if α < 1 , Pr ˆ F ( α ) ,gm ≤ (1 − ǫ ) F ( α ) = Pr ˆ F − t ( α ) ,gm ≥ (1 − ǫ ) − t F − t ( α ) ≤ E ˆ F − t ( α ) ,gm (1 − ǫ ) − t F − t ( α ) = (1 − ǫ ) t h − cos κ ( α ) π 2 k t 2 π Γ − αt k Γ 1 + t k sin π αt 2 k i k h cos κ ( α ) π 2 k 2 π Γ α k Γ 1 − 1 k sin π 2 α k i − kt , 19 which can be minimized (sub-o ptimally) by find ing the t , where t = C L k , such that log(1 − ǫ ) C L − γ e ( α − 1) C L + κ ( α ) π 2 tan κ ( α ) π 2 C L − απ 2 tan απ 2 C L − ψ (1 + αC L ) α + ψ (1 + C L ) = 0 . 
Thus, we have shown the left tail bound (for k > k 0 ) Pr ˆ F ( α ) ,gm − F ( α ) < − ǫ F ( α ) ≤ exp − k ǫ 2 G L,g m,k 0 , where ǫ 2 G L,g m,k 0 = − C L log(1 − ǫ ) − lo g − cos κ ( α ) π 2 C L 2 π Γ ( − αC L ) Γ (1 + C L ) s in π αC L 2 − k 0 C L log cos κ ( α ) π 2 k 0 2 π Γ α k 0 Γ 1 − 1 k 0 sin π 2 α k 0 . D Proof of Lemma 5 From Lemma 4, ǫ 2 G R,gm = C R log(1 + ǫ ) − C R γ e ( α − 1) − log cos κ ( α ) π C R 2 2 π Γ ( αC R ) Γ (1 − C R ) s in π αC R 2 , and C R is the solution to g 1 ( C R , α, ǫ ) = 0 , g 1 ( C R , α, ǫ ) = − γ e ( α − 1) + log (1 + ǫ ) + κ ( α ) π 2 tan κ ( α ) π 2 C R − απ / 2 tan απ 2 C R − ψ ( αC R ) α + ψ (1 − C R ) = 0 . Let α = 1 − ∆ if α < 1 an d α = 1 + ∆ if α > 1 . Thu s, 0 < ∆ < 0 and κ ( α ) = 1 − ∆ . Using the representation s in [9, 1.42 1.1,1.4 21.3,8.3 62.1] tan π x 2 = 4 x π ∞ X j =1 1 (2 j − 1) 2 − x 2 , 1 tan ( πx ) = 1 π x + 2 x π ∞ X j =1 1 x 2 − j 2 , ψ ( x ) = − γ e − 1 x + x ∞ X j =1 1 j ( x + j ) , 20 we rewrite g 1 as g 1 = − γ e ( α − 1) + log(1 + ǫ ) + κπ 2 4 κC R π ∞ X j =1 1 (2 j − 1) 2 − ( κC R ) 2 − απ 2 2 π αC R + αC R π ∞ X j =1 1 ( αC R / 2) 2 − j 2 − α − γ e − 1 αC R + αC R ∞ X j =1 1 j ( αC R + j ) + − γ e − 1 1 − C R + (1 − C R ) ∞ X j =1 1 j (1 − C R + j ) = log (1 + ǫ ) + 2 κ 2 C R ∞ X j =1 1 (2 j − 1) 2 − ( κC R ) 2 + 2 α 2 C R ∞ X j =1 1 (2 j ) 2 − ( αC R ) 2 − α 2 C R ∞ X j =1 1 j ( αC R + j ) + (1 − C R ) ∞ X j =1 1 j (1 − C R + j ) − 1 1 − C R = log (1 + ǫ ) + κ ∞ X j =1 1 2 j + 1 − κC R − 1 2 j − 1 + κC R + α ∞ X j =1 1 2 j − αC R − 1 2 j + αC R − α ∞ X j =1 1 j − 1 αC R + j + ∞ X j =1 1 j − 1 1 − C R + j + κ 1 − κC R − 1 1 − C R It is easy to show that, as α → 1 , i.e., κ → 1 , the term lim α → 1 κ ∞ X j =1 1 2 j + 1 − κC R − 1 2 j − 1 + κC R + α ∞ X j =1 1 2 j − αC R − 1 2 j + αC R − α ∞ X j =1 1 j − 1 αC R + j + ∞ X j =1 1 j − 1 1 − C R + j = lim α → 1 ∞ X j =1 κ 2 j + 1 − κC R + α 2 j − αC R − ∞ X j =1 κ 2 j − 1 + κC R + α 2 j + αC R − α ∞ X j =1 1 j − 1 αC R + j + ∞ X j =1 1 j − 1 1 − C R + j = lim α → 1 ∞ X j =1 κ 1 + j − κC R − ∞ X j =1 κ j + κ C R − α ∞ X j =1 1 j − 1 αC R + j + ∞ X j =1 1 j − 1 1 − C R + j =0 . Recall that, fro m Lemm a 4, we know that g 1 = 0 h as a unique well-defined so lution for C R ∈ (0 , 1) . W e also need to analyze the following term κ 1 − κC R − 1 1 − C R = κ − 1 (1 − κC R )(1 − C R ) = − ∆ (1 − κC R )(1 − C R ) , which, when α → 0 , mu st app roach a finite non -zero limit. In other words, W e must have C R → 1 , at the rate O √ ∆ . This argumen t also provides an approxim ation for C R when α → 1 , i.e., C R = 1 − s ∆ log(1 + ǫ ) + o √ ∆ . The next task i s to ana lyze G R,gm . ǫ 2 G R,gm = C R log(1 + ǫ ) − C R γ e ( α − 1) − log cos κ ( α ) π C R 2 2 π Γ ( αC R ) Γ (1 − C R ) s in π αC R 2 = C R log(1 + ǫ ) − C R γ e ( α − 1) + log cos απC R 2 Γ(1 − αC R ) cos κπ C R 2 Γ(1 − C R ) ! . 
21 Using the infinite produc t rep resentations of the cosine and gamma fu nctions, we can re-write cos απC R 2 Γ(1 − αC R ) cos κπ C R 2 Γ(1 − C R ) = exp( γ e ( α − 1) C R ) 1 − C R 1 − αC R × ∞ Y j =0 1 − α 2 C 2 R (2 j + 1) 2 1 − κ 2 C 2 R (2 j + 1) 2 − 1 ∞ Y j =1 exp (1 − α ) C R j 1 + 1 − C R j 1 + 1 − αC R j − 1 = exp( γ e ( α − 1) C R ) (1 + αC R )(1 − C R ) 1 − κ 2 C 2 R × ∞ Y j =1 1 − α 2 C 2 R (2 j + 1) 2 1 − κ 2 C 2 R (2 j + 1) 2 − 1 exp (1 − α ) C R j 1 + 1 − C R j 1 + 1 − αC R j − 1 , T aking logarithm of which yields log cos απC R 2 Γ(1 − αC R ) cos κπ C R 2 Γ(1 − C R ) = γ e ( α − 1) C R + log (1 + αC R )(1 − C R ) 1 − κ 2 C 2 R + ∞ X j =1 log 1 − α 2 C 2 R (2 j +1) 2 1 − κ 2 C 2 R (2 j +1) 2 + (1 − α ) C R j + log 1 + 1 − C R j 1 + 1 − αC R j . If α < 1 , i.e., κ = α = 1 − ∆ , then log cos απC R 2 Γ(1 − αC R ) cos κπ C R 2 Γ(1 − C R ) = − γ e ∆ C R + log 1 − C R 1 − αC R + ∞ X j =1 (1 − α ) C R j + log 1 + 1 − C R j 1 + 1 − αC R j = − γ e ∆ C R − log 1 + ∆ C R 1 − C R + ∞ X j =1 1 2 1 − αC R j 2 − 1 2 1 − C R j 2 + ... ! = − γ e ∆ C R − log 1 + ∆ C R 1 − C R + π 2 12 C R ∆(2 − αC R − C R ) + ... Thus, for α < 1 , consider C R = 1 − q ∆ log(1+ ǫ ) + o √ ∆ , we have ǫ 2 G R,gm = C R log(1 + ǫ ) − ∆ C R 1 − C R + π 2 12 C R ∆(2 − αC R − C R ) + ... = log(1 + ǫ ) − 2 p ∆ lo g (1 + ǫ ) + o √ ∆ If α > 1 , i.e., α = 1 + ∆ and κ = 1 − ∆ , then (using above result for α < 1 ) log cos απC R 2 Γ(1 − αC R ) cos κπ C R 2 Γ(1 − C R ) = γ e ∆ C R + log (1 + αC R )(1 − C R ) 1 − κ 2 C 2 R + ∞ X j =1 log 1 − α 2 C 2 R (2 j +1) 2 1 − κ 2 C 2 R (2 j +1) 2 + ... 22 log (1 + αC R )(1 − C R ) 1 − κ 2 C 2 R = log 1 + αC R 1 + κC R − log 1 − κC R 1 − C R = log 1 + 2∆ C R 1 + κC R − log 1 + ∆ C R 1 − C R = − p ∆ lo g(1 + ǫ ) + o √ ∆ . ∞ X j =1 log 1 − α 2 C 2 R (2 j +1) 2 1 − κ 2 C 2 R (2 j +1) 2 = ∞ X j =1 log 1 + αC R 2 j +1 1 + κC R 2 j +1 + log 1 − αC R 2 j +1 1 − κC R 2 j +1 = ∞ X j =1 log 1 + 2∆ C R 2 j +1 1 + κC R 2 j +1 ! + log 1 − 2∆ C R 2 j +1 1 − κC R 2 j +1 ! = O (∆) . Therefo re, fo r α > 1 , we also h a ve ǫ 2 G R,gm = log (1 + ǫ ) − 2 p ∆ lo g (1 + ǫ ) + o √ ∆ . In other words, as α → 1 , the co nstant G R,gm conv erges to ǫ 2 log(1+ ǫ ) at the rate O √ ∆ , i.e., G R,gm = ǫ 2 log(1 + ǫ ) − 2 p ∆ lo g (1 + ǫ ) + o √ ∆ . E Pr oof of Lemma 6 Assume k i.i.d . samples x j ∼ S ( α < 1 , β = 1 , F ( α ) ) . Using the ( − α ) th mo ment in Lemma 1 suggests that ˆ R ( α ) = 1 k P k j =1 | x j | − α cos ( απ 2 ) Γ(1+ α ) , is an unbiased estimator of d − 1 ( α ) ,whose variance is V ar ˆ R ( α ) = d − 2 ( α ) k 2Γ 2 (1 + α ) Γ(1 + 2 α ) − 1 . W e can then estimate F ( α ) by 1 ˆ R ( α ) , i.e., ˆ F ( α ) ,hm = 1 ˆ R ( α ) = k cos ( απ 2 ) Γ(1+ α ) P k j =1 | x j | − α . which is biased at the order O 1 k . T o remove the O 1 k term of the bias, we recommen d a bias-corrected version obtained by T aylor expansions [15, Theorem 6.1.1]: 1 ˆ R ( α ) − V ar ˆ R ( α ) 2 2 F − 3 ( α ) ! , (42) from which we obtain the bias-corre cted estimator ˆ F ( α ) ,hm,c = k cos ( απ 2 ) Γ(1+ α ) P k j =1 | x j | − α 1 − 1 k 2Γ 2 (1 + α ) Γ(1 + 2 α ) − 1 , (43) 23 whose bias and variance are E ˆ F ( α ) ,hm,c = F ( α ) + O 1 k 2 , V ar ˆ F ( α ) ,hm,c = F 2 ( α ) k 2Γ 2 (1 + α ) Γ(1 + 2 α ) − 1 + O 1 k 2 . W e now study the tail bounds. For con venien ce, we provide tail boun ds for ˆ F ( α ) ,hm instead of ˆ F ( α ) ,hm,c . W e first analyze the following moment generating fu nction: E exp F ( α ) | x j | − α cos ( απ / 2) / Γ(1 + α ) t =1 + ∞ X m =1 t m m ! 
E F ( α ) | x j | − α cos ( απ / 2) / Γ(1 + α ) m =1 + ∞ X m =1 t m m ! Γ(1 + m )Γ m (1 + α ) Γ(1 + mα ) = ∞ X m =0 Γ m (1 + α ) Γ(1 + mα ) t m . For t h e right tail bound , Pr ˆ F ( α ) ,hm − F ( α ) ≥ ǫF ( α ) = Pr k cos ( απ 2 ) Γ(1+ α ) P k j =1 | x j | − α ≥ (1 + ǫ ) F ( α ) = Pr exp − t P k j =1 F ( α ) | x j | − α cos ( απ / 2) / Γ(1 + α ) !! ≥ exp − t k (1 + ǫ ) ! ( t > 0) ≤ ∞ X m =0 Γ m (1 + α ) Γ(1 + mα ) ( − t ) m ! k exp t k (1 + ǫ ) = exp − k − log ∞ X m =0 Γ m (1 + α ) Γ(1 + mα ) ( − t ∗ 1 ) m ! − t ∗ 1 1 + ǫ !! = exp − k ǫ 2 G R,hm , where t ∗ 1 is the solution to P ∞ m =1 ( − 1) m m ( t ∗ 1 ) m − 1 Γ m (1+ α ) Γ(1+ mα ) P ∞ m =0 ( − 1) m ( t ∗ 1 ) m Γ m (1+ α ) Γ(1+ mα ) + 1 1 + ǫ = 0 , which, for numerica l reaso ns, can be written as ∞ X m =1 ( − 1) m m (1 + ǫ ) ( t ∗ 1 ) m − 1 Γ m (1 + α ) Γ(1 + mα ) − ( t ∗ 1 ) m − 1 Γ m − 1 (1 + α ) Γ(1 + ( m − 1 ) α ) = 0 24 For t h e left tail boun d, Pr ˆ F ( α ) ,hm − F ( α ) ≤ − ǫ F ( α ) = Pr k cos ( απ 2 ) Γ(1+ α ) P k j =1 | x j | − α ≤ (1 − ǫ ) F ( α ) = Pr exp t P k j =1 F ( α ) | x j | − α cos ( απ / 2) / Γ(1 + α ) !! ≥ exp t k (1 − ǫ ) ! ( t > 0) ≤ ∞ X m =0 Γ m (1 + α ) Γ(1 + mα ) t m ! k exp − t k (1 − ǫ ) = exp − k − log ∞ X m =0 Γ m (1 + α ) Γ(1 + mα ) ( t ∗ 2 ) m ! + t ∗ 2 1 − ǫ !! , where t ∗ 2 is the solution to ∞ X m =1 ( t ∗ 2 ) m − 1 Γ m − 1 (1 + α ) Γ(1 + ( m − 1) α ) − m (1 − ǫ ) ( t ∗ 2 ) m − 1 Γ m (1 + α ) Γ(1 + mα ) = 0 . F Proof of Lemma 7 Assume z ∼ S ( α = 0 . 5 , β = 1 , F (0 . 5) ) . For conv en ience, we will denote h = F (0 . 5) , only in the proof. The log likelihoo d, l ( z ; h ) , and first three deriv atives (w .r .t. h ) are l ( z ; h ) = log h − h 2 2 z − 3 2 log z , l ′ ( z ; h ) = 1 h − h z , l ′′ ( z ; h ) = − 1 h 2 − 1 z , l ′′′ ( z ; h ) = 2 h 3 . Therefo re, g i ven k i. i.d. samples x j ∼ S (0 . 5 , 1 , h ) , the maximum likelihoo d estimator (MLE) is computed by ˆ h mle = s k P k j =1 1 x j . Asymptotically , the v arian ce of the MLE, ˆ h mle reaches 1 k I ( h ) , where I ( h ) is the Fisher Info rmation: I = I ( h ) = E ( − l ′′ ( h )) = 1 h 2 + E 1 z . W e will soon also need to ev aluate h igher mo ments E 1 z m . W e can u tilize th e mom ent gene rating fu nction of 1 z , which will be also needed for proving tail bounds in Lemma 8. E exp t z = Z ∞ 0 h √ 2 π exp − h 2 2 z z 3 / 2 exp t z dz = h √ 2 π Z ∞ 0 exp x t − h 2 2 x − 1 / 2 dx, x = 1 z = h √ 2 π r π h 2 / 2 − t = h h 2 − 2 t − 1 / 2 , ( t < h 2 / 2) [ 9 , 3 . 4 72 . 15] From the m th deriv ative of E exp t z , ∂ m E exp t z ∂ t m = 1 × 3 × 5 × ... × (2 m − 1) h h 2 − 2 t − 2 m +1 2 , m = 1 , 2 , 3 , ..., 25 we can write down E 1 z m = 1 × 3 × 5 × ... × (2 m − 1) h − 2 m . Therefo re, the Fisher Inf ormation I ( h ) = 2 h 2 . Accordin g to the c lassical statistical results[4, 22], we can obtain the first four momen ts of ˆ h mle by ev aluating the expressions in [22, 16a-16d] , E ˆ h mle = d − [12] 2 k I 2 + O 1 k 2 V ar ˆ h mle = 1 k I + 1 k 2 − 1 I + [1 4 ] − [1 2 2] − [13] I 3 + 3 . 5[12] 2 − [1 3 ] 2 I 4 + O 1 k 3 E ˆ h mle − E ˆ h mle 3 = [1 3 ] − 3 [12] k 2 I 3 + O 1 k 3 E ˆ h mle − E ˆ h mle 4 = 3 k 2 I 2 + 1 k 3 − 9 I 2 + 7[1 4 ] − 6[1 2 2] − 10[13 ] I 4 + 1 k 3 − 6[1 3 ] 2 − 12 [1 3 ][12] + 45[1 2 ] 2 I 5 + O 1 k 4 , where, after re-form atting, [12] = E ( l ′ ) 3 + E ( l ′ l ′′ ) , [1 4 ] = E ( l ′ ) 4 , [1 2 2] = E ( l ′′ ( l ′ ) 2 ) + E ( l ′ ) 4 , [13] = E ( l ′ ) 4 + 3 E ( l ′′ ( l ′ ) 2 ) + E ( l ′ l ′′′ ) , [1 3 ] = E ( l ′ ) 3 . 
F  Proof of Lemma 7

Assume $z\sim S(\alpha=0.5, \beta=1, F_{(0.5)})$. For convenience, we denote $h = F_{(0.5)}$, only in this proof. The log-likelihood $l(z;h)$ and its first three derivatives (w.r.t. $h$) are

$$l(z;h) = \log h - \frac{h^2}{2z} - \frac{3}{2}\log z, \qquad
l'(z;h) = \frac{1}{h} - \frac{h}{z}, \qquad
l''(z;h) = -\frac{1}{h^2} - \frac{1}{z}, \qquad
l'''(z;h) = \frac{2}{h^3}.$$

Therefore, given $k$ i.i.d. samples $x_j\sim S(0.5, 1, h)$, the maximum likelihood estimator (MLE) is

$$\hat h_{mle} = \sqrt{\frac{k}{\sum_{j=1}^k \frac{1}{x_j}}}.$$

Asymptotically, the variance of the MLE $\hat h_{mle}$ reaches $\frac{1}{kI(h)}$, where $I(h)$ is the Fisher information:

$$I = I(h) = E\left(-l''(h)\right) = \frac{1}{h^2} + E\left(\frac{1}{z}\right).$$

We will soon also need the higher moments $E\left(\frac{1}{z^m}\right)$. We can utilize the moment generating function of $\frac{1}{z}$, which will also be needed for proving the tail bounds in Lemma 8:

$$E\exp\left(\frac{t}{z}\right) = \int_0^\infty \frac{h}{\sqrt{2\pi}}\frac{\exp\left(-\frac{h^2}{2z}\right)}{z^{3/2}}\exp\left(\frac{t}{z}\right)dz
= \frac{h}{\sqrt{2\pi}}\int_0^\infty \exp\left(x\left(t-\frac{h^2}{2}\right)\right)x^{-1/2}dx \qquad \left(x = \frac{1}{z}\right)$$
$$= \frac{h}{\sqrt{2\pi}}\sqrt{\frac{\pi}{h^2/2 - t}} = h\left(h^2-2t\right)^{-1/2}, \qquad (t < h^2/2) \quad [9,\ 3.472.15].$$

From the $m$th derivative of $E\exp\left(\frac{t}{z}\right)$,

$$\frac{\partial^m E\exp\left(\frac{t}{z}\right)}{\partial t^m} = 1\times 3\times 5\times\ldots\times(2m-1)\, h\left(h^2-2t\right)^{-\frac{2m+1}{2}}, \qquad m = 1,2,3,\ldots,$$

we can write down

$$E\left(\frac{1}{z^m}\right) = 1\times 3\times 5\times\ldots\times(2m-1)\, h^{-2m}.$$

Therefore, the Fisher information is $I(h) = \frac{2}{h^2}$. According to classical statistical results [4, 22], we can obtain the first four moments of $\hat h_{mle}$ by evaluating the expressions in [22, 16a-16d]:

$$E\left(\hat h_{mle}\right) = h - \frac{[12]}{2kI^2} + O\left(\frac{1}{k^2}\right),$$
$$\mathrm{Var}\left(\hat h_{mle}\right) = \frac{1}{kI} + \frac{1}{k^2}\left(-\frac{1}{I} + \frac{[1^4]-[1^22]-[13]}{I^3} + \frac{3.5[12]^2 - [1^3]^2}{I^4}\right) + O\left(\frac{1}{k^3}\right),$$
$$E\left(\hat h_{mle} - E\hat h_{mle}\right)^3 = \frac{[1^3]-3[12]}{k^2I^3} + O\left(\frac{1}{k^3}\right),$$
$$E\left(\hat h_{mle} - E\hat h_{mle}\right)^4 = \frac{3}{k^2I^2} + \frac{1}{k^3}\left(-\frac{9}{I^2} + \frac{7[1^4]-6[1^22]-10[13]}{I^4}\right) + \frac{1}{k^3}\frac{-6[1^3]^2 - 12[1^3][12] + 45[12]^2}{I^5} + O\left(\frac{1}{k^4}\right),$$

where, after re-formatting,

$$[12] = E(l')^3 + E(l'l''), \quad [1^4] = E(l')^4, \quad [1^22] = E(l''(l')^2) + E(l')^4, \quad [13] = E(l')^4 + 3E(l''(l')^2) + E(l'l'''), \quad [1^3] = E(l')^3.$$

Without giving the details, we report

$$E(l')^3 = -\frac{8}{h^3}, \quad E(l'l'') = \frac{2}{h^3}, \quad E(l')^4 = \frac{60}{h^4}, \quad E(l''(l')^2) = -\frac{12}{h^4}, \quad E(l'l''') = 0,$$
$$[12] = -\frac{6}{h^3}, \quad [1^4] = \frac{60}{h^4}, \quad [1^22] = \frac{48}{h^4}, \quad [13] = \frac{24}{h^4}, \quad [1^3] = -\frac{8}{h^3}.$$

Thus, we obtain

$$E\left(\hat h_{mle}\right) = h + \frac{3}{4}\frac{h}{k} + O\left(\frac{1}{k^2}\right), \qquad
\mathrm{Var}\left(\hat h_{mle}\right) = \frac{1}{2}\frac{h^2}{k} + \frac{15}{8}\frac{h^2}{k^2} + O\left(\frac{1}{k^3}\right),$$
$$E\left(\hat h_{mle} - E\hat h_{mle}\right)^3 = \frac{5}{4}\frac{h^3}{k^2} + O\left(\frac{1}{k^3}\right), \qquad
E\left(\hat h_{mle} - E\hat h_{mle}\right)^4 = \frac{3}{4}\frac{h^4}{k^2} + \frac{93}{8}\frac{h^4}{k^3} + O\left(\frac{1}{k^4}\right).$$

We recommend the bias-corrected version

$$\hat h_{mle,c} = \left(1 - \frac{3}{4}\frac{1}{k}\right)\hat h_{mle},$$

whose first four moments, after some algebra, are

$$E\left(\hat h_{mle,c}\right) = h + O\left(\frac{1}{k^2}\right),$$
$$\mathrm{Var}\left(\hat h_{mle,c}\right) = \left(1 - \frac{3}{4}\frac{1}{k}\right)^2\left(\frac{1}{2}\frac{h^2}{k} + \frac{15}{8}\frac{h^2}{k^2}\right) + O\left(\frac{1}{k^3}\right) = \frac{1}{2}\frac{h^2}{k} + \frac{9}{8}\frac{h^2}{k^2} + O\left(\frac{1}{k^3}\right),$$
$$E\left(\hat h_{mle,c} - E\hat h_{mle,c}\right)^3 = \frac{5}{4}\frac{h^3}{k^2} + O\left(\frac{1}{k^3}\right),$$
$$E\left(\hat h_{mle,c} - E\hat h_{mle,c}\right)^4 = \left(1 - \frac{3}{4}\frac{1}{k}\right)^4\left(\frac{3}{4}\frac{h^4}{k^2} + \frac{93}{8}\frac{h^4}{k^3}\right) + O\left(\frac{1}{k^4}\right) = \frac{3}{4}\frac{h^4}{k^2} + \frac{75}{8}\frac{h^4}{k^3} + O\left(\frac{1}{k^4}\right).$$
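For $\alpha=0.5$ the bias-corrected MLE is a one-line computation. The sketch below is ours; it also includes a sanity check under the assumed parametrization, in which $z\sim S(0.5,1,h)$ can be sampled as $z = h^2/N^2$ with $N\sim N(0,1)$. The function names are illustrative.

    import numpy as np

    def f_hat_mle_c(x):
        # Bias-corrected MLE of F_(0.5) (Lemma 7):
        #   h_mle   = sqrt( k / sum_j 1/x_j ),
        #   h_mle_c = (1 - 3/(4k)) * h_mle.
        x = np.asarray(x, dtype=float)
        k = x.size
        h_mle = np.sqrt(k / np.sum(1.0 / x))
        return (1.0 - 3.0 / (4.0 * k)) * h_mle

    # Sanity check (assumed parametrization): with h = 3.0,
    #   x = (3.0 / np.random.randn(100000)) ** 2
    #   f_hat_mle_c(x)   # should be close to 3.0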
G  Proof of Lemma 8

Again, for simplicity, we denote $h = F_{(0.5)}$ only in this proof, and hence $\hat h_{mle} = \hat F_{(0.5),mle}$, etc. We prove the tail bounds for $\hat h_{mle}$ using standard techniques for Chernoff bounds [5]. For $t>0$,

$$\Pr\left(\hat h_{mle} - h \ge \epsilon h\right) = \Pr\left(\frac{k}{\sum_{j=1}^k\frac{1}{x_j}} \ge (1+\epsilon)^2h^2\right)
= \Pr\left(-\sum_{j=1}^k\frac{t}{x_j} \ge -\frac{tk}{(1+\epsilon)^2h^2}\right)$$
$$\le \prod_{j=1}^k E\exp\left(-\frac{t}{x_j}\right)\exp\left(\frac{tk}{(1+\epsilon)^2h^2}\right)
= \left(\frac{h}{(h^2+2t)^{1/2}}\right)^k\exp\left(\frac{tk}{(1+\epsilon)^2h^2}\right)
= \exp\left(k\log\frac{h}{(h^2+2t)^{1/2}} + \frac{tk}{(1+\epsilon)^2h^2}\right),$$

whose minimum is attained at $t = \frac{h^2}{2}\left((1+\epsilon)^2 - 1\right)$. Therefore,

$$\Pr\left(\hat h_{mle} - h \ge \epsilon h\right) \le \exp\left(-k\left(\log(1+\epsilon) - \frac{1}{2} + \frac{1}{2}\frac{1}{(1+\epsilon)^2}\right)\right).$$

Similarly, we can prove the left tail bound:

$$\Pr\left(\hat h_{mle} - h \le -\epsilon h\right) = \Pr\left(\frac{k}{\sum_{j=1}^k\frac{1}{x_j}} \le (1-\epsilon)^2h^2\right)
= \Pr\left(\sum_{j=1}^k\frac{t}{x_j} \ge \frac{tk}{(1-\epsilon)^2h^2}\right)$$
$$\le \prod_{j=1}^k E\exp\left(\frac{t}{x_j}\right)\exp\left(-\frac{tk}{(1-\epsilon)^2h^2}\right)
= \left(\frac{h}{(h^2-2t)^{1/2}}\right)^k\exp\left(-\frac{tk}{(1-\epsilon)^2h^2}\right),$$

whose minimum is attained at $t = \frac{h^2}{2}\left(1-(1-\epsilon)^2\right)$. Therefore,

$$\Pr\left(\hat h_{mle} - h \le -\epsilon h\right) \le \exp\left(-k\left(\log(1-\epsilon) - \frac{1}{2} + \frac{1}{2}\frac{1}{(1-\epsilon)^2}\right)\right).$$

For small $\epsilon$, because $\log(1+\epsilon) = \epsilon - \frac{\epsilon^2}{2} + \frac{\epsilon^3}{3} - \ldots$ and $\frac{1}{(1+\epsilon)^2} = 1 - 2\epsilon + 3\epsilon^2 - 4\epsilon^3 + \ldots$, these bounds become

$$\Pr\left(\hat h_{mle} - h \ge \epsilon h\right) \le \exp\left(-k\left(\epsilon^2 - \frac{5}{3}\epsilon^3 + \ldots\right)\right), \qquad
\Pr\left(\hat h_{mle} - h \le -\epsilon h\right) \le \exp\left(-k\left(\epsilon^2 + \frac{5}{3}\epsilon^3 + \ldots\right)\right).$$

H  Proof of Lemma 9

Assume $k$ i.i.d. samples $x_j \sim S(\alpha, \beta, F_{(\alpha)})$. We first seek an unbiased estimator of $F_{(\alpha)}^{\lambda}$, denoted by $\hat R_{(\alpha),\lambda}$:

$$\hat R_{(\alpha),\lambda} = \frac{\frac{1}{k}\sum_{j=1}^k|x_j|^{\lambda\alpha}}{\frac{\cos\left(\frac{\kappa(\alpha)\lambda\pi}{2}\right)}{\left|\cos\left(\frac{\alpha\pi}{2}\right)\right|^{\lambda}}\frac{2}{\pi}\Gamma(1-\lambda)\Gamma(\lambda\alpha)\sin\left(\frac{\pi}{2}\lambda\alpha\right)},$$

whose variance is

$$\mathrm{Var}\left(\hat R_{(\alpha),\lambda}\right) = \frac{F_{(\alpha)}^{2\lambda}}{k}\left(\frac{\cos\left(\kappa(\alpha)\lambda\pi\right)\frac{2}{\pi}\Gamma(1-2\lambda)\Gamma(2\lambda\alpha)\sin\left(\pi\lambda\alpha\right)}{\left[\cos\left(\frac{\kappa(\alpha)\lambda\pi}{2}\right)\frac{2}{\pi}\Gamma(1-\lambda)\Gamma(\lambda\alpha)\sin\left(\frac{\pi}{2}\lambda\alpha\right)\right]^2} - 1\right).$$

In order for the variance to be bounded, we need to restrict $-\frac{1}{2\alpha}<\lambda<\frac{1}{2}$ if $\alpha>1$, and $\lambda<\frac{1}{2}$ if $\alpha<1$. A biased estimator of $F_{(\alpha)}$ would be simply $\left(\hat R_{(\alpha),\lambda}\right)^{1/\lambda}$, which has $O\left(\frac{1}{k}\right)$ bias. This bias can be removed to an extent by Taylor expansions [15, Theorem 6.1.1]. We call this new estimator the "fractional power" estimator:

$$\hat F_{(\alpha),fp,c,\lambda} = \left(\hat R_{(\alpha),\lambda}\right)^{1/\lambda} - \frac{\mathrm{Var}\left(\hat R_{(\alpha),\lambda}\right)}{2}\frac{1}{\lambda}\left(\frac{1}{\lambda}-1\right)\left(F_{(\alpha)}^{\lambda}\right)^{1/\lambda-2}$$
$$= \left(\frac{\frac{1}{k}\sum_{j=1}^k|x_j|^{\lambda\alpha}}{\frac{\cos\left(\frac{\kappa(\alpha)\lambda\pi}{2}\right)}{\left|\cos\left(\frac{\alpha\pi}{2}\right)\right|^{\lambda}}\frac{2}{\pi}\Gamma(1-\lambda)\Gamma(\lambda\alpha)\sin\left(\frac{\pi}{2}\lambda\alpha\right)}\right)^{1/\lambda}
\left(1 - \frac{1}{k}\frac{1}{2\lambda}\left(\frac{1}{\lambda}-1\right)\left(\frac{\cos\left(\kappa(\alpha)\lambda\pi\right)\frac{2}{\pi}\Gamma(1-2\lambda)\Gamma(2\lambda\alpha)\sin\left(\pi\lambda\alpha\right)}{\left[\cos\left(\frac{\kappa(\alpha)\lambda\pi}{2}\right)\frac{2}{\pi}\Gamma(1-\lambda)\Gamma(\lambda\alpha)\sin\left(\frac{\pi}{2}\lambda\alpha\right)\right]^2} - 1\right)\right),$$

where we plug in the estimated $F_{(\alpha)}^{\lambda}$. The asymptotic variance is

$$\mathrm{Var}\left(\hat F_{(\alpha),fp,c,\lambda}\right) = \mathrm{Var}\left(\hat R_{(\alpha),\lambda}\right)\left(\frac{1}{\lambda}\left(F_{(\alpha)}^{\lambda}\right)^{1/\lambda-1}\right)^2 + O\left(\frac{1}{k^2}\right)
= \frac{F_{(\alpha)}^2}{\lambda^2 k}\left(\frac{\cos\left(\kappa(\alpha)\lambda\pi\right)\frac{2}{\pi}\Gamma(1-2\lambda)\Gamma(2\lambda\alpha)\sin\left(\pi\lambda\alpha\right)}{\left[\cos\left(\frac{\kappa(\alpha)\lambda\pi}{2}\right)\frac{2}{\pi}\Gamma(1-\lambda)\Gamma(\lambda\alpha)\sin\left(\frac{\pi}{2}\lambda\alpha\right)\right]^2} - 1\right) + O\left(\frac{1}{k^2}\right).$$

The optimal $\lambda$, denoted by $\lambda^*$, is then

$$\lambda^* = \underset{\lambda}{\mathrm{argmin}}\;\frac{1}{\lambda^2}\left(\frac{\cos\left(\kappa(\alpha)\lambda\pi\right)\frac{2}{\pi}\Gamma(1-2\lambda)\Gamma(2\lambda\alpha)\sin\left(\pi\lambda\alpha\right)}{\left[\cos\left(\frac{\kappa(\alpha)\lambda\pi}{2}\right)\frac{2}{\pi}\Gamma(1-\lambda)\Gamma(\lambda\alpha)\sin\left(\frac{\pi}{2}\lambda\alpha\right)\right]^2} - 1\right).$$

We denote the optimal fractional power estimator $\hat F_{(\alpha),fp,c,\lambda^*}$ by $\hat F_{(\alpha),op,c}$.
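Because $\lambda^*$ has no closed form, it is natural to obtain it numerically. The following sketch is ours and is restricted to $\alpha<1$, where $\kappa(\alpha)=\alpha$; it minimizes the simplified gamma form of the variance factor used in the proof of Lemma 10 (Appendix I). The names and the search interval are illustrative.

    import numpy as np
    from scipy.special import gammaln
    from scipy.optimize import minimize_scalar

    def g(lam, alpha):
        # Variance factor g(lambda; alpha) for alpha < 1, written via log-gamma
        # to avoid overflow:
        #   ratio = Gamma(1-2 lam) Gamma(1-lam a)^2 / (Gamma(1-2 lam a) Gamma(1-lam)^2).
        ratio = np.exp(gammaln(1.0 - 2.0 * lam) + 2.0 * gammaln(1.0 - lam * alpha)
                       - gammaln(1.0 - 2.0 * lam * alpha) - 2.0 * gammaln(1.0 - lam))
        return (ratio - 1.0) / lam ** 2

    def optimal_lambda(alpha):
        # Lemma 10 shows g is convex in lambda and the minimizer is negative,
        # so a bounded one-dimensional search suffices.
        res = minimize_scalar(g, bounds=(-50.0, -1e-3), args=(alpha,), method="bounded")
        return res.x, res.fun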
I  Proof of Lemma 10

We consider only $\alpha<1$, i.e., $\kappa(\alpha)=\alpha$. To prove that

$$g(\lambda;\alpha) = \frac{1}{\lambda^2}\left(\frac{\cos\left(\kappa(\alpha)\lambda\pi\right)\frac{2}{\pi}\Gamma(1-2\lambda)\Gamma(2\lambda\alpha)\sin\left(\pi\lambda\alpha\right)}{\left[\cos\left(\frac{\kappa(\alpha)\lambda\pi}{2}\right)\frac{2}{\pi}\Gamma(1-\lambda)\Gamma(\lambda\alpha)\sin\left(\frac{\pi}{2}\lambda\alpha\right)\right]^2} - 1\right)$$

is a convex function of $\lambda$, where $\lambda<\frac{1}{2}$, it suffices to show that $\frac{\partial^2 g(\lambda;\alpha)}{\partial\lambda^2}>0$. Here, unless we specify $\lambda=0$, we always assume $\lambda\ne 0$ to avoid triviality. (It is easy to show that $\frac{\partial^2 g(\lambda;\alpha)}{\partial\lambda^2}\to 0$ as $\lambda\to 0$.)

Because $\kappa(\alpha)=\alpha$, we simplify $g(\lambda;\alpha)$ (starting with Euler's reflection formula) to

$$g(\lambda;\alpha) = \frac{1}{\lambda^2}\left(\frac{\Gamma(1-2\lambda)\Gamma^2(1-\lambda\alpha)}{\Gamma(1-2\lambda\alpha)\Gamma^2(1-\lambda)} - 1\right)
= \frac{1}{\lambda^2}\left(\alpha\frac{\Gamma(-2\lambda)\Gamma^2(-\lambda\alpha)}{\Gamma(-2\lambda\alpha)\Gamma^2(-\lambda)} - 1\right)
= \frac{1}{\lambda^2}\left(\alpha\, 2^{2\lambda\alpha-2\lambda}\frac{\Gamma(-\lambda+1/2)\Gamma(-\lambda\alpha)}{\Gamma(-\lambda\alpha+1/2)\Gamma(-\lambda)} - 1\right)$$
$$= \frac{1}{\lambda^2}\left(\alpha\, 2^{2\lambda\alpha-2\lambda}\prod_{s=0}^{\infty}\left(1+\frac{1/2}{-\lambda\alpha+s}\right)\left(1-\frac{1/2}{-\lambda+1/2+s}\right) - 1\right)
= \frac{1}{\lambda^2}\left(\alpha\, 2^{2\lambda\alpha-2\lambda}\prod_{s=0}^{\infty}\frac{(2s-2\lambda\alpha+1)(s-\lambda)}{(s-\lambda\alpha)(2s+1-2\lambda)} - 1\right)
= \frac{1}{\lambda^2}\left(CM - 1\right),$$

where

$$C = C(\lambda;\alpha) = \alpha\, 2^{2\lambda\alpha-2\lambda}, \qquad M = M(\lambda;\alpha) = \prod_{s=0}^{\infty}f_s(\lambda;\alpha), \qquad
f_s(\lambda;\alpha) = \frac{(2s-2\lambda\alpha+1)(s-\lambda)}{(s-\lambda\alpha)(2s+1-2\lambda)},$$

and we have used properties of the gamma function [9, 8.335.1, 8.325.1]:

$$\Gamma(2z) = \frac{2^{2z-1}}{\sqrt{\pi}}\Gamma(z)\Gamma(z+1/2), \qquad
\frac{\Gamma(a)\Gamma(b)}{\Gamma(a+\gamma)\Gamma(b-\gamma)} = \prod_{s=0}^{\infty}\left(1+\frac{\gamma}{a+s}\right)\left(1-\frac{\gamma}{b+s}\right).$$

With respect to $\lambda$, the first two derivatives of $g(\lambda;\alpha)$ are (denoting $w = \log(2)(2\alpha-2)$)

$$\frac{\partial g}{\partial\lambda} = \frac{1}{\lambda^2}\left(-\frac{2}{\lambda}(CM-1) + \left(w+\sum_{s=0}^{\infty}\frac{\partial\log f_s}{\partial\lambda}\right)CM\right),$$

$$\frac{\partial^2 g}{\partial\lambda^2} = \frac{CM}{\lambda^2}\left(\frac{6}{\lambda^2} + \sum_{s=0}^{\infty}\frac{\partial^2\log f_s}{\partial\lambda^2} + \left(w+\sum_{s=0}^{\infty}\frac{\partial\log f_s}{\partial\lambda}\right)^2 - \frac{4}{\lambda}\left(w+\sum_{s=0}^{\infty}\frac{\partial\log f_s}{\partial\lambda}\right)\right) - \frac{6}{\lambda^4}.$$

To show $\frac{\partial^2 g}{\partial\lambda^2}>0$, it suffices to show

$$\lambda^4\frac{\partial^2 g}{\partial\lambda^2} = 6(CM-1) + CM\left(\lambda^2\sum_{s=0}^{\infty}\frac{\partial^2\log f_s}{\partial\lambda^2} + \lambda^2\left(w+\sum_{s=0}^{\infty}\frac{\partial\log f_s}{\partial\lambda}\right)^2 - 4\lambda\left(w+\sum_{s=0}^{\infty}\frac{\partial\log f_s}{\partial\lambda}\right)\right) > 0.$$

Because $(CM)|_{\lambda=0}=1$ and $(CM)|_{\lambda\ne 0}>1$ (which is intuitive and will be shown by algebra), it suffices to show

$$T_1(\lambda;\alpha) = 6(CM-1) + \lambda^2\sum_{s=0}^{\infty}\frac{\partial^2\log f_s}{\partial\lambda^2} + \lambda^2\left(w+\sum_{s=0}^{\infty}\frac{\partial\log f_s}{\partial\lambda}\right)^2 - 4\lambda\left(w+\sum_{s=0}^{\infty}\frac{\partial\log f_s}{\partial\lambda}\right) > 0.$$

Because $T_1(\lambda=0;\alpha)=0$, it suffices to show $\lambda\frac{\partial T_1}{\partial\lambda}>0$, where

$$\frac{\partial T_1}{\partial\lambda} = (6CM-4)\left(w+\sum_{s=0}^{\infty}\frac{\partial\log f_s}{\partial\lambda}\right) - 2\lambda\sum_{s=0}^{\infty}\frac{\partial^2\log f_s}{\partial\lambda^2} + \lambda^2\sum_{s=0}^{\infty}\frac{\partial^3\log f_s}{\partial\lambda^3}
+ 2\lambda\left(w+\sum_{s=0}^{\infty}\frac{\partial\log f_s}{\partial\lambda}\right)^2 + 2\lambda^2\left(w+\sum_{s=0}^{\infty}\frac{\partial\log f_s}{\partial\lambda}\right)\sum_{s=0}^{\infty}\frac{\partial^2\log f_s}{\partial\lambda^2},$$

$$\lambda\frac{\partial T_1}{\partial\lambda} = (6CM-4)\lambda\left(w+\sum_{s=0}^{\infty}\frac{\partial\log f_s}{\partial\lambda}\right) - 2\lambda^2\sum_{s=0}^{\infty}\frac{\partial^2\log f_s}{\partial\lambda^2} + \lambda^3\sum_{s=0}^{\infty}\frac{\partial^3\log f_s}{\partial\lambda^3}
+ 2\lambda^2\left(w+\sum_{s=0}^{\infty}\frac{\partial\log f_s}{\partial\lambda}\right)^2 + 2\lambda^3\left(w+\sum_{s=0}^{\infty}\frac{\partial\log f_s}{\partial\lambda}\right)\sum_{s=0}^{\infty}\frac{\partial^2\log f_s}{\partial\lambda^2}.$$

Because $CM>1$ and we will soon show $\lambda\left(w+\sum_{s=0}^{\infty}\frac{\partial\log f_s}{\partial\lambda}\right)>0$, it suffices to show

$$2\lambda\left(w+\sum_{s=0}^{\infty}\frac{\partial\log f_s}{\partial\lambda}\right) - 2\lambda^2\sum_{s=0}^{\infty}\frac{\partial^2\log f_s}{\partial\lambda^2} + \lambda^3\sum_{s=0}^{\infty}\frac{\partial^3\log f_s}{\partial\lambda^3}
+ 2\lambda^2\left(w+\sum_{s=0}^{\infty}\frac{\partial\log f_s}{\partial\lambda}\right)^2 + 2\lambda^3\left(w+\sum_{s=0}^{\infty}\frac{\partial\log f_s}{\partial\lambda}\right)\sum_{s=0}^{\infty}\frac{\partial^2\log f_s}{\partial\lambda^2} = \lambda T_2(\lambda;\alpha) > 0,$$

for which it suffices to show $T_2(0;\alpha)=0$ and

$$\frac{\partial T_2}{\partial\lambda} = \lambda^2\sum_{s=0}^{\infty}\frac{\partial^4\log f_s}{\partial\lambda^4} + 2\left(w+\sum_{s=0}^{\infty}\frac{\partial\log f_s}{\partial\lambda}\right)^2 + 8\lambda\left(w+\sum_{s=0}^{\infty}\frac{\partial\log f_s}{\partial\lambda}\right)\sum_{s=0}^{\infty}\frac{\partial^2\log f_s}{\partial\lambda^2}
+ 2\lambda^2\left(\sum_{s=0}^{\infty}\frac{\partial^2\log f_s}{\partial\lambda^2}\right)^2 + 2\lambda^2\left(w+\sum_{s=0}^{\infty}\frac{\partial\log f_s}{\partial\lambda}\right)\sum_{s=0}^{\infty}\frac{\partial^3\log f_s}{\partial\lambda^3} > 0.$$

To this end, we know that in order to prove the convexity of $g(\lambda;\alpha)$, it suffices to prove the following:

$$(CM)|_{\lambda=0}=1, \qquad (CM)|_{\lambda\ne 0}>1, \qquad \lambda\left(w+\sum_{s=0}^{\infty}\frac{\partial\log f_s}{\partial\lambda}\right)>0,$$
$$\sum_{s=0}^{\infty}\frac{\partial^2\log f_s}{\partial\lambda^2}>0, \qquad \sum_{s=0}^{\infty}\frac{\partial^4\log f_s}{\partial\lambda^4}>0, \qquad 4\sum_{s=0}^{\infty}\frac{\partial^2\log f_s}{\partial\lambda^2} + \lambda\sum_{s=0}^{\infty}\frac{\partial^3\log f_s}{\partial\lambda^3}>0,$$

where

$$\sum_{s=0}^{\infty}\frac{\partial\log f_s}{\partial\lambda} = \sum_{s=0}^{\infty}\left(\frac{-2\alpha}{2s-2\lambda\alpha+1} - \frac{1}{s-\lambda} + \frac{\alpha}{s-\lambda\alpha} + \frac{2}{2s+1-2\lambda}\right),$$
$$\sum_{s=0}^{\infty}\frac{\partial^2\log f_s}{\partial\lambda^2} = \sum_{s=0}^{\infty}\left(\frac{-4\alpha^2}{(2s-2\lambda\alpha+1)^2} - \frac{1}{(s-\lambda)^2} + \frac{\alpha^2}{(s-\lambda\alpha)^2} + \frac{4}{(2s+1-2\lambda)^2}\right),$$
$$\sum_{s=0}^{\infty}\frac{\partial^3\log f_s}{\partial\lambda^3} = \sum_{s=0}^{\infty}\left(\frac{-16\alpha^3}{(2s-2\lambda\alpha+1)^3} - \frac{2}{(s-\lambda)^3} + \frac{2\alpha^3}{(s-\lambda\alpha)^3} + \frac{16}{(2s+1-2\lambda)^3}\right),$$
$$\sum_{s=0}^{\infty}\frac{\partial^4\log f_s}{\partial\lambda^4} = \sum_{s=0}^{\infty}\left(\frac{-96\alpha^4}{(2s-2\lambda\alpha+1)^4} - \frac{6}{(s-\lambda)^4} + \frac{6\alpha^4}{(s-\lambda\alpha)^4} + \frac{96}{(2s+1-2\lambda)^4}\right).$$

First, we can show $(CM)|_{\lambda=0}=1$ and $\left(w+\sum_{s=0}^{\infty}\frac{\partial\log f_s}{\partial\lambda}\right)\Big|_{\lambda=0}=0$, because

$$CM\big|_{\lambda=0} = \alpha\lim_{\lambda\to 0}\frac{(1)(-\lambda)}{(-\lambda\alpha)(1)}\prod_{s=1}^{\infty}\frac{(2s+1)(s)}{(s)(2s+1)} = 1,$$

and

$$\sum_{s=0}^{\infty}\frac{\partial\log f_s}{\partial\lambda}\bigg|_{\lambda=0} = -2\alpha + 2 + \sum_{s=1}^{\infty}\left(\frac{-2\alpha}{2s+1} - \frac{1}{s} + \frac{\alpha}{s} + \frac{2}{2s+1}\right)
= -2\alpha + 2 + (\alpha-1)\sum_{s=1}^{\infty}\frac{1}{s(2s+1)} = -(2\alpha-2)\log(2) = -w,$$

because $\sum_{s=1}^{\infty}\frac{1}{s(2s+1)} = 2 - 2\log(2)$; see [9, 0.234.8]. Therefore, once we have proved $\sum_{s=0}^{\infty}\frac{\partial^2\log f_s}{\partial\lambda^2}>0$, both $(CM)|_{\lambda\ne 0}>1$ and $\lambda\left(w+\sum_{s=0}^{\infty}\frac{\partial\log f_s}{\partial\lambda}\right)>0$ follow immediately.

To show $\sum_{s=0}^{\infty}\frac{\partial^2\log f_s}{\partial\lambda^2}>0$, $\sum_{s=0}^{\infty}\frac{\partial^4\log f_s}{\partial\lambda^4}>0$, and $4\sum_{s=0}^{\infty}\frac{\partial^2\log f_s}{\partial\lambda^2} + \lambda\sum_{s=0}^{\infty}\frac{\partial^3\log f_s}{\partial\lambda^3}>0$, we make use of the generalized Riemann zeta function [9, 9.511, 9.521],

$$\zeta(m,q) = \sum_{s=0}^{\infty}\frac{1}{(s+q)^m} = \frac{1}{\Gamma(m)}\int_0^{\infty}\frac{t^{m-1}e^{-qt}}{1-e^{-t}}dt, \qquad q>0,\ m>1,$$

to rewrite

$$\sum_{s=0}^{\infty}\frac{\partial^2\log f_s}{\partial\lambda^2}
= -\alpha^2\zeta\left(2,\tfrac{1}{2}-\lambda\alpha\right) - \frac{1}{\lambda^2} - \zeta(2,1-\lambda) + \frac{\alpha^2}{\lambda^2\alpha^2} + \alpha^2\zeta(2,1-\lambda\alpha) + \zeta\left(2,\tfrac{1}{2}-\lambda\right)$$
$$= \int_0^{\infty}\frac{t}{1-e^{-t}}\left(-\alpha^2e^{-t\left(\frac{1}{2}-\lambda\alpha\right)} - e^{-t(1-\lambda)} + \alpha^2e^{-t(1-\lambda\alpha)} + e^{-t\left(\frac{1}{2}-\lambda\right)}\right)dt
= \int_0^{\infty}\frac{t}{1-e^{-t}}\left(e^{-t/2}-e^{-t}\right)\left(e^{\lambda t}-\alpha^2e^{\lambda\alpha t}\right)dt$$
$$= \int_0^{\infty}\frac{te^{-t/2}}{1+e^{-t/2}}\left(e^{\lambda t}-\alpha^2e^{\lambda\alpha t}\right)dt
= \int_0^{\infty}\frac{t}{1+e^{-t/2}}\left(e^{-t\left(\frac{1}{2}-\lambda\right)}-\alpha^2e^{-t\left(\frac{1}{2}-\lambda\alpha\right)}\right)dt.$$

Note that $1\le 1+e^{-t/2}\le 2$ for $t\in[0,\infty)$, and

$$\int_0^{\infty}t\left(e^{-t\left(\frac{1}{2}-\lambda\right)}-\alpha^2e^{-t\left(\frac{1}{2}-\lambda\alpha\right)}\right)dt
= \frac{1}{\left(\frac{1}{2}-\lambda\right)^2} - \frac{\alpha^2}{\left(\frac{1}{2}-\lambda\alpha\right)^2}
= \frac{1}{\left(\frac{1}{2}-\lambda\right)^2} - \frac{1}{\left(\frac{1}{2\alpha}-\lambda\right)^2} > 0,$$

because $\lambda<\frac{1}{2}$, $\alpha<1$, and $\int_0^{\infty}t^me^{-pt}dt = m!\,p^{-m-1}$. This proves $\sum_{s=0}^{\infty}\frac{\partial^2\log f_s}{\partial\lambda^2}>0$. Similarly,

$$\sum_{s=0}^{\infty}\frac{\partial^4\log f_s}{\partial\lambda^4}
= -6\alpha^4\zeta\left(4,\tfrac{1}{2}-\lambda\alpha\right) - \frac{6}{\lambda^4} - 6\zeta(4,1-\lambda) + \frac{6\alpha^4}{\lambda^4\alpha^4} + 6\alpha^4\zeta(4,1-\lambda\alpha) + 6\zeta\left(4,\tfrac{1}{2}-\lambda\right)$$
$$= \int_0^{\infty}\frac{t^3}{1+e^{-t/2}}\left(e^{-t\left(\frac{1}{2}-\lambda\right)}-\alpha^4e^{-t\left(\frac{1}{2}-\lambda\alpha\right)}\right)dt
\ge \frac{3!}{2}\left(\frac{1}{\left(\frac{1}{2}-\lambda\right)^4} - \frac{\alpha^4}{\left(\frac{1}{2}-\lambda\alpha\right)^4}\right) > 0.$$
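Each of these positivity claims can also be checked numerically by direct summation of the series (the $1/s^2$ parts of the terms cancel, so the summands decay like $1/s^3$ and truncation converges quickly). The sketch below is ours and the names are illustrative; it evaluates $\sum_s\partial^2\log f_s/\partial\lambda^2$ for given $(\lambda,\alpha)$.

    import numpy as np

    def d2_log_fs_sum(lam, alpha, terms=100000):
        # Direct truncated evaluation of  sum_s d^2 log f_s / d lambda^2 .
        s = np.arange(terms, dtype=float)
        return float(np.sum(-4.0 * alpha**2 / (2.0*s - 2.0*lam*alpha + 1.0)**2
                            - 1.0 / (s - lam)**2
                            + alpha**2 / (s - lam*alpha)**2
                            + 4.0 / (2.0*s + 1.0 - 2.0*lam)**2))

    # e.g., d2_log_fs_sum(-0.5, 0.9) and d2_log_fs_sum(0.3, 0.7) are both positive.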
At this point, it is trivial to show that $4\sum_{s=0}^{\infty}\frac{\partial^2\log f_s}{\partial\lambda^2} + \lambda\sum_{s=0}^{\infty}\frac{\partial^3\log f_s}{\partial\lambda^3}>0$ if $\lambda>0$. For $\lambda<0$, however, we have to use a slightly different approach. Note that when $\alpha\to 1$,

$$W = 4\sum_{s=0}^{\infty}\frac{\partial^2\log f_s}{\partial\lambda^2} + \lambda\sum_{s=0}^{\infty}\frac{\partial^3\log f_s}{\partial\lambda^3} \to 0.$$

Therefore, we can treat $W$ as a function of $\alpha$ for fixed $\lambda$. The only thing we need to show is that $\frac{\partial W}{\partial\alpha}<0$ when $\alpha<1$ and $\lambda<0$:

$$\frac{\partial W}{\partial\alpha}
= \frac{\partial\left(4\sum_{s=0}^{\infty}\frac{\partial^2\log f_s}{\partial\lambda^2} + \lambda\sum_{s=0}^{\infty}\frac{\partial^3\log f_s}{\partial\lambda^3}\right)}{\partial\alpha}
= \int_0^{\infty}\frac{e^{-t\left(\frac{1}{2}-\lambda\alpha\right)}}{1+e^{-t/2}}\left(4t\left(-2\alpha-\alpha^2\lambda t\right) + \lambda t^2\left(-3\alpha^2-\alpha^3\lambda t\right)\right)dt$$
$$= -\int_0^{\infty}\frac{e^{-t\left(\frac{1}{2}-\lambda\alpha\right)}}{1+e^{-t/2}}\left(8\alpha t + 7\alpha^2\lambda t^2 + \alpha^3\lambda^2t^3\right)dt
\le -\frac{1}{2}\int_0^{\infty}e^{-t\left(\frac{1}{2}-\lambda\alpha\right)}\left(8\alpha t + 7\alpha^2\lambda t^2 + \alpha^3\lambda^2t^3\right)dt$$
$$= -\left(\frac{4\alpha}{\left(\frac{1}{2}-\lambda\alpha\right)^2} + \frac{7\alpha^2\lambda}{\left(\frac{1}{2}-\lambda\alpha\right)^3} + \frac{3\alpha^3\lambda^2}{\left(\frac{1}{2}-\lambda\alpha\right)^4}\right)
= -\frac{\alpha}{\left(\frac{1}{2}-\lambda\alpha\right)^4}\left(4\left(\tfrac{1}{2}-\lambda\alpha\right)^2 + 7\alpha\lambda\left(\tfrac{1}{2}-\lambda\alpha\right) + 3\alpha^2\lambda^2\right)$$
$$= -\frac{\alpha}{\left(\frac{1}{2}-\lambda\alpha\right)^4}\left(4\left(\tfrac{1}{2}-\lambda\alpha\right)+3\alpha\lambda\right)\left(\left(\tfrac{1}{2}-\lambda\alpha\right)+\alpha\lambda\right)
= -\frac{\alpha\left(2-\alpha\lambda\right)}{2\left(\frac{1}{2}-\lambda\alpha\right)^4} < 0,$$

since $\left(\frac{1}{2}-\lambda\alpha\right)+\alpha\lambda = \frac{1}{2}$ and $4\left(\frac{1}{2}-\lambda\alpha\right)+3\alpha\lambda = 2-\alpha\lambda>0$ (recall $\lambda<0$). This completes the proof of the convexity of $g(\lambda;\alpha)$.

Finally, we need to show that $\lambda^*<0$, where $\lambda^*$ is the solution to $\frac{\partial g}{\partial\lambda}=0$, or equivalently, the solution to

$$V(\lambda;\alpha) = -2(CM-1) + \lambda\left(w+\sum_{s=0}^{\infty}\frac{\partial\log f_s}{\partial\lambda}\right)CM = 0,$$

provided we discard the trivial solution $\lambda=0$. Thus, it suffices to show that $V(\lambda;\alpha)$ increases monotonically for $\lambda>0$, i.e., $\frac{\partial V}{\partial\lambda}>0$ if $\lambda>0$. Because

$$\frac{\partial V}{\partial\lambda} = CM\left(2\left(w+\sum_{s=0}^{\infty}\frac{\partial\log f_s}{\partial\lambda}\right) + \lambda\sum_{s=0}^{\infty}\frac{\partial^2\log f_s}{\partial\lambda^2} + \lambda\left(w+\sum_{s=0}^{\infty}\frac{\partial\log f_s}{\partial\lambda}\right)^2\right),$$

it suffices to show $w+\sum_{s=0}^{\infty}\frac{\partial\log f_s}{\partial\lambda}>0$ for $\lambda>0$. This is true because we have shown $\lim_{\lambda\to 0}\left(w+\sum_{s=0}^{\infty}\frac{\partial\log f_s}{\partial\lambda}\right)=0$ and $\sum_{s=0}^{\infty}\frac{\partial^2\log f_s}{\partial\lambda^2}>0$. This completes the proof that $\lambda^*<0$, and hence we have completed the proof of this lemma.
References

[1] Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. In STOC, pages 20-29, Philadelphia, PA, 1996.
[2] Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, and Jennifer Widom. Models and issues in data stream systems. In PODS, pages 1-16, Madison, WI, 2002.
[3] Ziv Bar-Yossef, T. S. Jayram, Ravi Kumar, and D. Sivakumar. An information statistics approach to data stream and communication complexity. In FOCS, pages 209-218, Vancouver, BC, Canada, 2002.
[4] Maurice S. Bartlett. Approximate confidence intervals, II. Biometrika, 40(3/4):306-317, 1953.
[5] Herman Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. The Annals of Mathematical Statistics, 23(4):493-507, 1952.
[6] Graham Cormode, Mayur Datar, Piotr Indyk, and S. Muthukrishnan. Comparing data streams using Hamming norms (how to zero in). IEEE Transactions on Knowledge and Data Engineering, 15(3):529-540, 2003.
[7] Joan Feigenbaum, Sampath Kannan, Martin Strauss, and Mahesh Viswanathan. An approximate l1-difference algorithm for massive data streams. In FOCS, pages 501-511, New York, 1999.
[8] Philippe Flajolet. Approximate counting: A detailed analysis. BIT, 25(1):113-134, 1985.
[9] Izrail S. Gradshteyn and Iosif M. Ryzhik. Table of Integrals, Series, and Products. Academic Press, New York, sixth edition, 2000.
[10] Monika R. Henzinger, Prabhakar Raghavan, and Sridhar Rajagopalan. Computing on Data Streams. American Mathematical Society, Boston, MA, USA, 1999.
[11] Piotr Indyk. Stable distributions, pseudorandom generators, embeddings and data stream computation. In FOCS, pages 189-197, Redondo Beach, CA, 2000.
[12] Piotr Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream computation. Journal of ACM, 53(3):307-323, 2006.
[13] Piotr Indyk and David P. Woodruff. Optimal approximations of the frequency moments of data streams. In STOC, pages 202-208, Baltimore, MD, 2005.
[14] Clyde D. Hardin Jr. Skewed stable variables and processes. Technical Report 79, University of North Carolina, 1984.
[15] Erich L. Lehmann and George Casella. Theory of Point Estimation. Springer, New York, NY, second edition, 1998.
[16] Ping Li. Estimators and tail bounds for dimension reduction in lα (0 < α ≤ 2) using stable random projections. In SODA, 2008.
[17] Ping Li and Trevor J. Hastie. A unified near-optimal estimator for dimension reduction in lα (0 < α ≤ 2) using stable random projections. In NIPS, Vancouver, BC, Canada, 2008.
[18] Robert Morris. Counting large numbers of events in small registers. Commun. ACM, 21(10):840-842, 1978.
[19] S. Muthukrishnan. Data streams: Algorithms and applications. Foundations and Trends in Theoretical Computer Science, 1(2):117-236, 2005.
[20] Michael E. Saks and Xiaodong Sun. Space lower bounds for distance approximation in the data stream model. In STOC, pages 360-369, Montreal, Quebec, Canada, 2002.
[21] Gennady Samorodnitsky and Murad S. Taqqu. Stable Non-Gaussian Random Processes. Chapman & Hall, New York, 1994.
[22] Leonard R. Shenton and Kimiko O. Bowman. Higher moments of a maximum-likelihood estimate. Journal of Royal Statistical Society B, 25(2):305-317, 1963.
[23] David P. Woodruff. Optimal space lower bounds for all frequency moments. In SODA, pages 167-175, New Orleans, LA, 2004.
[24] Andrew Chi-Chih Yao. Some complexity questions related to distributive computing (preliminary report). In STOC, pages 209-213, Atlanta, GA, 1979.
[25] Andrew Chi-Chih Yao. Lower bounds by probabilistic arguments (extended abstract). In FOCS, pages 420-428, Tucson, AZ, 1983.
[26] Vladimir M. Zolotarev. One-dimensional Stable Distributions. American Mathematical Society, Providence, RI, 1986.