An entropic view of Pickands theorem
It is shown that distributions arising in Renyi-Tsallis maximum entropy setting are related to the Generalized Pareto Distributions (GPD) that are widely used for modeling the tails of distributions. The relevance of such modelization, as well as the…
Authors: J.-F. Bercher, C. Vignat
An Entropic View of Pickands' Theorem

Jean-François Bercher
Laboratoire des Signaux et Systèmes, CNRS-Univ Paris Sud-Supelec, 91192 Gif-sur-Yvette cedex, France
Email: bercherj@esiee.fr

Christophe Vignat
Institut Gaspard Monge, Université de Marne la Vallée, 77454 Marne-la-Vallée cedex 02, France
Email: vignat@univ-mlv.fr

ABSTRACT

It is shown that distributions arising in the Rényi-Tsallis maximum entropy setting are related to the Generalized Pareto Distributions (GPD) that are widely used for modeling the tails of distributions. The relevance of such modeling, as well as the ubiquity of GPD in practical situations, follows from the Balkema-De Haan-Pickands theorem on the distribution of excesses (over a high threshold). We provide an entropic view of this result, by showing that the distribution of a suitably normalized excess variable converges to the solution of a maximum Tsallis entropy problem, which is the GPD. This result resembles the entropic approach to the Central Limit theorem as provided in [1]; however, the convergence in entropy proved here is weaker than the convergence in supremum norm given by Pickands' theorem.

I. INTRODUCTION

Generalized Pareto Distributions (GPD) are widely used in practice for modeling the tails of distributions. The underlying rationale is the Balkema-De Haan-Pickands theorem [2], [3], which asserts that the distribution function of the excess variable $X - u \mid X > u$ (i.e. the distribution of the shifted variable $X$ exceeding a threshold $u$) converges, as $u \to \infty$, to a GPD with survival function

$$S_X(x) = \Pr(X > x) = \left(1 + \frac{\gamma}{\sigma} x\right)^{-\frac{1}{\gamma}}, \tag{1}$$

where $\sigma$ is a scale parameter and $\gamma$ a shape parameter; for $\gamma = 0$, the GPD reduces to the exponential distribution $S_X(x) = \exp(-x/\sigma)$. The corresponding density is

$$f_X(x) = \frac{1}{\sigma}\left(1 + \frac{\gamma}{\sigma} x\right)^{-\frac{1}{\gamma} - 1}$$

for $\gamma \neq 0$, and reduces to $f_X(x) = \frac{1}{\sigma}\exp(-x/\sigma)$ for $\gamma = 0$.
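As an illustration of the survival function (1) and its density, here is a minimal Python sketch. The function names `gpd_survival` and `gpd_density` are hypothetical (they do not come from the paper), and the sketch assumes $x \ge 0$, $\gamma \ge 0$ and $\sigma > 0$, i.e. the infinite-support case.

```python
import math


def gpd_survival(x, gamma, sigma):
    """Survival function (1): (1 + gamma*x/sigma)**(-1/gamma).

    For gamma == 0 the exponential limit exp(-x/sigma) is returned.
    Assumes x >= 0, gamma >= 0, sigma > 0 (illustrative restriction).
    """
    if gamma == 0.0:
        return math.exp(-x / sigma)
    return (1.0 + gamma * x / sigma) ** (-1.0 / gamma)


def gpd_density(x, gamma, sigma):
    """Density associated with (1), i.e. minus the derivative of the survival function."""
    if gamma == 0.0:
        return math.exp(-x / sigma) / sigma
    return (1.0 + gamma * x / sigma) ** (-1.0 / gamma - 1.0) / sigma


if __name__ == "__main__":
    # As gamma decreases to 0, the survival function approaches exp(-x/sigma).
    for gamma in (1.0, 0.1, 1e-3, 0.0):
        print(gamma, gpd_survival(2.0, gamma, 1.0))
```

Evaluating the loop for decreasing values of `gamma` shows the GPD tail approaching the exponential tail, which is the $\gamma = 0$ reduction stated above.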
In applied fields, GPD have met with great success since they were obtained as the maximizers of a special entropy, the Tsallis (Havrda-Charvát-Daróczy) entropy [4], with suitable constraints. This entropy is defined by

$$H_q(f_X) = \frac{1}{1-q}\left(\int f_X^q(x)\,dx - 1\right) \tag{2}$$

for $q \ge 0$. We note that the Shannon entropy

$$H_1(f_X) = \lim_{q \to 1} H_q = -\int f_X(x)\log f_X(x)\,dx \tag{3}$$

is recovered in the limit case $q = 1$.

[Figure 1. Infinite support Generalized Pareto densities $f_X(x) = \frac{1}{\sigma}\left(1 + \frac{\gamma}{\sigma}x\right)^{-\frac{1}{\gamma}-1}$ for several values of the parameter $\gamma$ ($\gamma = 0, 1, 10, 100$), with $\sigma = 1$.]

It is worth mentioning that any monotone transform of the latter entropy exhibits the same GPD maximizers: an important example is the Rényi entropy [5]. The GPD is of very high interest in many physical systems, since it makes it possible to model power-law phenomena. Indeed, power laws are especially interesting since they appear widely in physics, biology, economics, and many other fields [6].

In this communication, we give an interpretation of Pickands' theorem which relates it to the maximum (Rényi/Tsallis) entropy setting; this view gives a possible interpretation for the ubiquity of 'Tsallis' (GPD) distributions in physics applications, as well as in other fields, and an argument in support of the use of Rényi/Tsallis entropies.

In the following, we deal with univariate distributions defined on $\mathbb{R}$ or on a subset of $\mathbb{R}$. Our approach is as follows: first, we show that the GPD can be obtained as the solution of a maximum Rényi-Tsallis entropy problem with proper normalization and moment constraints. Second, we consider distributions in the Fréchet domain of attraction: this family includes for instance the Cauchy, Student and Pareto distributions.
We characterize the associated $q$-norm and first moment of the survival function associated to the excess variable $X - u \mid X > u$. Using an appropriate normalization, we define a variable whose survival function's $q$-norm and moment converge to constant values. We perform the same analysis for a subset of distributions in the domain of attraction of the Gumbel distribution. Third, we show that the distribution of excesses coincides asymptotically with the maximum Tsallis entropy solution.

II. SOLUTION TO THE MAXIMIZATION OF TSALLIS' ENTROPY

We first derive the expression of the solution to the maximization of Tsallis' entropy subject to normalization and moment constraints.

Proposition 1: Consider the set $\mathcal{F} = \{G : \mathbb{R}^+ \to \mathbb{R}\}$. The maximum Tsallis entropy problem (or equivalently the maximum $q$-norm problem), with $q < 1$, defined by

$$\max_{G \in \mathcal{F}} H_q(G) \quad \text{subject to} \quad \int_0^{+\infty} z\,G(z)\,dz = \mu \quad \text{and} \quad \int_0^{+\infty} G(z)\,dz = \theta \tag{4}$$

has for unique solution

$$G^*(z) = \alpha^{\frac{1}{q-1}}\left(1 + \frac{\beta}{\alpha} z\right)^{\frac{1}{q-1}} \quad \text{for } q \neq 1, \tag{5}$$

where $\alpha \ge 0$ and $\beta \ge 0$. Moreover, for $1/2 < q < 1$ (*),

$$\mu = \frac{(q-1)^2}{q(2q-1)}\,\frac{\alpha^{\frac{2q-1}{q-1}}}{\beta^2}, \qquad \theta = \frac{\alpha^{\frac{q}{q-1}}}{\beta}\,\frac{(1-q)}{q} \tag{6}$$

and

$$\|G^*\|_q^q = \frac{\alpha^{\frac{2q-1}{q-1}}}{\beta}\,\frac{(1-q)}{(2q-1)}. \tag{7}$$

In the case $q = 1$, the unique solution reads

$$G^*(z) = \alpha \exp(-\beta z) \tag{8}$$

with constants $\alpha$ and $\beta$ such that

$$\mu = \frac{\alpha}{\beta^2}, \qquad \theta = \frac{\alpha}{\beta} \tag{9}$$

and the Shannon entropy is

$$H_1(G^*) = -\frac{\alpha}{\beta}\log\alpha + \frac{\alpha}{\beta}.$$

Proof: The solution of the maximum Shannon ($q = 1$) entropy problem is well documented. We only consider here the $q < 1$ case and we follow the approach of [7].
Consider the functional Bregman divergence

$$B(f, g) = \int d(f, g)\,dx = -\int \left(f(x)^q - g(x)^q\right) dx + q\int \left(f(x) - g(x)\right) g(x)^{q-1}\,dx \tag{10}$$

associated to the (pointwise) Bregman divergence $d(f, g)$ built upon the strictly convex function $-x^q$ for $q \in (0, 1)$. Then let us evaluate the divergence between the distribution $G^*(z)$ in (5) and any distribution $G(z)$, with $G$ dominated by $G^*$, $G(z) \ll G^*(z)$, and satisfying (4):

$$B(G, G^*) = -\int_S \left(G(z)^q - G^*(z)^q\right) dz + q\int_S \left(G(z)\,G^*(z)^{q-1} - G^*(z)^q\right) dz = -\int_S G(z)^q\,dz + \int_S G^*(z)^q\,dz, \tag{11}$$

where $S$ denotes the support of $G^*(z)$. The last equality follows from the fact that, by (5), $G^*(z)^{q-1} = \alpha + \beta z$ is affine in $z$; since $G$ and $G^*$ both satisfy (4), it is then easy to check that

$$\int_S G(x)\,G^*(x)^{q-1}\,dx = \int_S G^*(x)^q\,dx.$$

The Bregman divergence $B(G, G^*)$ being always nonnegative and equal to zero if and only if $G = G^*$, the equality (11) implies that, for $q \in [0, 1)$,

$$H_q(G^*) \ge H_q(G), \tag{12}$$

which means that $G^*$ is the distribution with maximum Rényi-Tsallis entropy, with $q \in [0, 1)$, in the set of all distributions $G \ll G^*$ satisfying the constraints (4). The values of the constraints (6) and of the maximum entropy (7) follow by direct calculation.

(*) Note that the mean is not defined for $q < 1/2$.

III. THE DISTRIBUTION OF EXCESSES FOR DISTRIBUTIONS IN FRÉCHET DOMAIN OF ATTRACTION

In the following, we consider the Fréchet domain of attraction: this is the set $\mathcal{F}$ of distributions such that if the variables $X_i$ are independent and identically distributed according to one of them, then the suitably normalized maximum $\max_{i=1..n}\{X_i\}$ converges in distribution to the Fréchet extreme value distribution as $n \to \infty$.
It was shown by Gnedenko [8] that a necessary and sufficient condition for a distribution to be in the Fréchet domain of attraction is that its survival function $S(z)$ satisfies

$$\lim_{z \to +\infty} \frac{S(z)}{S(cz)} = c^a$$

for all $c > 0$ and for some tail index $a > 0$. Equivalently, this reads $S(z) = z^{-a}\,l(z)$, where $l(z)$ is a slowly varying function, i.e. a function such that

$$\lim_{z \to +\infty} \frac{l(zt)}{l(z)} = 1, \quad \forall t > 0.$$

Let us consider the excess variable $X_u = X - u \mid X > u$. Its survival function is

$$S_{X_u}(z) = \frac{S_X(z + u)}{S_X(u)}.$$

Proposition 2: Suppose that $X$ belongs to the Fréchet domain, with $S_X(z) \sim z^{-a}\,l(z)$. Then $S_{X_u}$ has asymptotic $q$-norm

$$\|S_{X_u}\|_q \sim \left(\frac{u}{aq - 1}\right)^{1/q}$$

and, for $a > 2$, asymptotic first moment

$$\int_0^{+\infty} z\,S_{X_u}(z)\,dz \sim \frac{u^2}{(1-a)(2-a)}.$$

Proof: The $q$-th power of the $q$-norm reads

$$\|S_{X_u}\|_q^q = \int_0^{+\infty} \left(\frac{S_X(z+u)}{S_X(u)}\right)^q dz = \int_u^{+\infty} \left(\frac{S_X(z)}{S_X(u)}\right)^q dz = u\int_1^{+\infty} \left(\frac{S_X(wu)}{S_X(u)}\right)^q dw \sim u\int_1^{+\infty} \left(\frac{(uw)^{-a}}{u^{-a}}\right)^q dw = u\int_1^{+\infty} w^{-aq}\,dw = \frac{u}{aq - 1},$$

with $1 - aq < 0$, since $a > 2$ and $q > 1/2$. Of course, we immediately obtain, taking $q = 1$, that $\|S_{X_u}\|_1 \sim \frac{u}{a-1}$. Similarly, the first moment is

$$\int_0^{+\infty} z\,S_{X_u}(z)\,dz = \int_0^{+\infty} z\,\frac{S_X(z+u)}{S_X(u)}\,dz = \int_u^{+\infty} (z - u)\,\frac{S_X(z)}{S_X(u)}\,dz = \int_1^{+\infty} u(w - 1)\,\frac{S_X(wu)}{S_X(u)}\,u\,dw \sim u^2\int_1^{+\infty} (w - 1)\,w^{-a}\,dw = \frac{u^2}{(1-a)(2-a)}.$$

We have a simple corollary to this result.

Corollary 1: The survival function $S_{Y_u}$ of the random variable $Y = X/g(u)$, where the function $g$ is such that $g(u) \sim u$, has asymptotic norms

$$\|S_{Y_u}\|_q \sim \left(\frac{1}{aq - 1}\right)^{1/q} \quad \text{and} \quad \|S_{Y_u}\|_1 \sim \frac{1}{a - 1},$$

and an asymptotic first moment

$$\int_0^{+\infty} z\,S_{Y_u}(z)\,dz \sim \frac{1}{(1-a)(2-a)}.$$

Proof: The results for $S_{Y_u}$ follow directly from Proposition 2, with

$$S_{Y_u}(z) = S_{X_u}(z\,g(u)). \tag{13}$$

IV. THE DISTRIBUTION OF EXCESSES FOR A SUBSET OF DISTRIBUTIONS IN GUMBEL DOMAIN OF ATTRACTION

For distributions in the Gumbel domain of attraction, the maximum of a set of variables converges to the Gumbel extreme value distribution. These distributions are characterized by an "exponential" fall-off, and are said to be "light-tailed" distributions. Their excesses over a threshold are exponentially distributed, which corresponds to a GPD with $\gamma = 0$. The general characterization of the Gumbel domain of attraction involves an inconvenient condition on the derivative of the hazard function. We consider here only the Weibull subset $\mathcal{W}$ of the Gumbel domain of attraction, whose survival functions verify

$$S(z) \sim \exp\left(-z^{\xi}\,l(z)\right), \tag{14}$$

where $\xi$ is the tail index and $l(z)$ is a slowly varying function. This set contains for example the Gaussian and Gamma distributions:

• The survival function of a Gaussian distribution with zero mean and unit variance is $S_X(x) = \frac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$, with erfc the complementary error function. For $x \to +\infty$, the complementary error function is equivalent to $\mathrm{erfc}(x) \sim \exp(-x^2)/(x\sqrt{\pi})$, and

$$S_X(x) \sim \frac{1}{\sqrt{2\pi}}\,\frac{e^{-\frac{x^2}{2}}}{x} = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}\left(1 + \frac{2\log(x)}{x^2}\right)}.$$

• The survival function of a Gamma distribution with shape parameter $a$ and rate parameter $b$ is given by $S_X(x) = \Gamma(a, bx)/\Gamma(a)$.
Since $\Gamma(a, bx) \sim (bx)^{a-1} e^{-bx}$, we obtain

$$S_X(x) \sim \frac{1}{\Gamma(a)}\,e^{-bx\left(1 - \frac{(a-1)\log(bx)}{bx}\right)}.$$

Proposition 3: Suppose that $X$ belongs to $\mathcal{W}$, with $S_X(z) \sim \exp(-z^{\xi}\,l(z))$. Then the survival function $S_{X_u}$ of the excesses of $X$ has asymptotic Shannon entropy

$$H_1(S_{X_u}) \sim \frac{u^{1-\xi}}{\xi\,l(u)}$$

and asymptotic first moment and 1-norm

$$\int_0^{+\infty} z\,S_{X_u}(z)\,dz \sim \frac{u^{2(1-\xi)}}{\xi^2\,l(u)^2}, \qquad \int_0^{+\infty} S_{X_u}(z)\,dz \sim \frac{u^{1-\xi}}{\xi\,l(u)}.$$

Proof: The computations are essentially the same as in the proof of Proposition 2.

From this result, we deduce the following corollary.

Corollary 2: The survival function $S_{Y_u}$ of the random variable $Y = u^{\xi-1}\,l(u)\,X$ has asymptotic Shannon entropy

$$H_1(S_{Y_u}) \sim \frac{1}{\xi},$$

and asymptotic first moment and 1-norm

$$\int_0^{+\infty} z\,S_{Y_u}(z)\,dz \sim \frac{1}{\xi^2}, \qquad \int_0^{+\infty} S_{Y_u}(z)\,dz \sim \frac{1}{\xi}.$$

V. THE ENTROPY SOLUTION AND THE DISTRIBUTION OF EXCESSES

Let us now show that the distributions of excesses, both in the Gumbel and the Fréchet case, coincide with the maximum Tsallis entropy solution. In the Fréchet domain of attraction, this result reads as follows.

Theorem 1: If $X$ belongs to the Fréchet domain of attraction, then choosing $q < 1$ such that $a = \frac{1}{1-q}$, the distribution of the excesses $Y_u$ as defined in Corollary 1 asymptotically reaches the maximum $q$-norm solution under constraints asymptotically equal to $\mu$ and $\theta$, provided $\alpha = \beta = 1$.

Proof: Choosing $a = \frac{1}{1-q}$ yields

$$\|S_{Y_u}\|_q^q \sim \frac{1}{aq - 1} = \frac{1-q}{2q-1}, \qquad \|S_{Y_u}\|_1 \sim \frac{1}{a-1} = \frac{1-q}{q}$$

and

$$\int_0^{+\infty} z\,S_{Y_u}(z)\,dz \sim \frac{1}{(1-a)(2-a)} = \frac{(q-1)^2}{q(2q-1)},$$

which coincide with the values (6) and (7) of the unique maximum $q$-norm function with constraints $\mu$ and $\theta$ if and only if $\alpha = \beta = 1$.

Since the maximum entropy solution with the same constraints is unique, we obtain that the excess variable from a distribution in the domain of attraction of the Fréchet distribution asymptotically follows a Generalized Pareto Distribution.
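The matching of constants used in the proof of Theorem 1 can be checked numerically. The following Python sketch is only an illustration: the helper names `excess_constants` and `maxent_constants` are hypothetical, the first evaluates the asymptotic quantities of Corollary 1 and the second evaluates (6) and (7) of Proposition 1, under the assumption $1/2 < q < 1$.

```python
import math


def excess_constants(a, q):
    """Asymptotic quantities of Corollary 1 for tail index a.

    Returns (q-th power of the q-norm, 1-norm, first moment) of S_{Y_u}.
    """
    norm_q_pow = 1.0 / (a * q - 1.0)
    norm_1 = 1.0 / (a - 1.0)
    first_moment = 1.0 / ((1.0 - a) * (2.0 - a))
    return norm_q_pow, norm_1, first_moment


def maxent_constants(q, alpha=1.0, beta=1.0):
    """Values (7) and (6) of Proposition 1: (||G*||_q^q, theta, mu).

    Assumes 1/2 < q < 1 so that all three quantities are finite.
    """
    e = (2.0 * q - 1.0) / (q - 1.0)
    norm_q_pow = alpha ** e * (1.0 - q) / (beta * (2.0 * q - 1.0))
    theta = alpha ** (q / (q - 1.0)) * (1.0 - q) / (beta * q)
    mu = alpha ** e * (q - 1.0) ** 2 / (beta ** 2 * q * (2.0 * q - 1.0))
    return norm_q_pow, theta, mu


if __name__ == "__main__":
    for q in (0.6, 0.75, 0.9):
        a = 1.0 / (1.0 - q)  # tail index matched to the entropic index q
        print(q, excess_constants(a, q), maxent_constants(q))
```

For each tested $q$, the two triples should agree up to rounding when $\alpha = \beta = 1$, and they should differ as soon as $\alpha$ or $\beta$ is changed.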
In the Gumbel case, we obtain similarly the following result.

Theorem 2: If $X$ belongs to $\mathcal{W}$ with $S_X(z) \sim \exp(-z^{\xi}\,l(z))$, then the distribution of the excesses $Y_u$ as defined in Corollary 2 asymptotically reaches the maximum Shannon entropy solution under constraints asymptotically equal to $\mu$ and $\theta$, provided $\alpha = 1$ and $\beta = \xi$.

Proof: Equating the Shannon entropy, the first moment and the 1-norm of the maximum entropy solution of Proposition 1 with the same quantities reached asymptotically by $S_{Y_u}$ as in Corollary 2 yields $\alpha = 1$ and $\beta = \xi$.

Example 1: As an illustration, let us consider the Cauchy case. The pdf is given by

$$f_X(x) = \frac{2}{\pi(1 + x^2)}, \quad x \ge 0.$$

Its survival function is $S_X(x) = 1 - \frac{2}{\pi}\arctan(x)$. Using now the fact that $\arctan(x) \approx \pi/2 - 1/x$ for $x \gg 1$, we obtain that $S_X(x) \sim 2/(\pi x)$, which means that the Cauchy distribution is in the Fréchet domain of attraction, with exponent $a = 1$. The survival function of the excess variable $X_u$ reads

$$S_{X_u}(x) = \frac{S_X(x + u)}{S_X(u)} = \frac{1 - \frac{2}{\pi}\arctan(x + u)}{1 - \frac{2}{\pi}\arctan(u)}.$$

Using the $\arctan(x)$ approximation again, we readily obtain, with the threshold $u \gg 1$,

$$S_{X_u}(x) \sim \frac{1}{u + x}\Big/\frac{1}{u} = \left(1 + \frac{x}{u}\right)^{-1},$$

which has the form of the Generalized Pareto Distribution (1) with index $\gamma = 1$. Finally, with $S_{Y_u}(x) = S_{X_u}(ux)$, we have $S_{Y_u}(x) \sim 1/(1 + x)$.

Since we want to emphasize the resemblance with the Central Limit theorem, we mention the following stability property of the GPD (1): the distribution of the excesses over a threshold of a GPD remains a GPD, with the same shape parameter but a different scale parameter. This property is to be compared with the usual stability by addition of independent Gaussian random variables.

Theorem 3: Given a GPD with parameters $\gamma$, $\sigma$, the distribution of excesses remains a GPD with parameters $\gamma$ and $\sigma' = \sigma\left(1 + \frac{\gamma}{\sigma} u\right)$.
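Both Example 1 and the statement of Theorem 3 lend themselves to a quick numerical sanity check. The Python sketch below is a hedged illustration, not part of the paper: all function names are hypothetical, and the half-Cauchy survival function is evaluated through the identity $1 - \frac{2}{\pi}\arctan(x) = \frac{2}{\pi}\arctan(1/x)$ for $x > 0$, a choice made here only to avoid cancellation at large $x$.

```python
import math


def gpd_survival(x, gamma, sigma):
    """GPD survival function (1), with the exponential limit for gamma == 0."""
    if gamma == 0.0:
        return math.exp(-x / sigma)
    return (1.0 + gamma * x / sigma) ** (-1.0 / gamma)


def excess_survival(x, u, gamma, sigma):
    """Survival function of the excess over threshold u for a GPD parent."""
    return gpd_survival(x + u, gamma, sigma) / gpd_survival(u, gamma, sigma)


def rescaled_sigma(u, gamma, sigma):
    """Scale parameter sigma' = sigma * (1 + gamma * u / sigma) of Theorem 3."""
    return sigma * (1.0 + gamma * u / sigma)


def half_cauchy_survival(x):
    """S_X(x) = 1 - (2/pi) * atan(x) for x >= 0, written as (2/pi) * atan(1/x)."""
    if x == 0.0:
        return 1.0
    return (2.0 / math.pi) * math.atan(1.0 / x)


def normalized_cauchy_excess(x, u):
    """S_{Y_u}(x) = S_{X_u}(u * x) for the half-Cauchy parent of Example 1."""
    return half_cauchy_survival(u * x + u) / half_cauchy_survival(u)


if __name__ == "__main__":
    # Theorem 3: excesses of a GPD are again GPD with scale sigma'.
    print(excess_survival(1.5, 3.0, 0.5, 2.0),
          gpd_survival(1.5, 0.5, rescaled_sigma(3.0, 0.5, 2.0)))
    # Example 1: the normalized excess approaches 1 / (1 + x) as u grows.
    for u in (1e1, 1e2, 1e4):
        print(u, normalized_cauchy_excess(2.0, u), 1.0 / 3.0)
```

The first printed pair should coincide up to rounding for any admissible parameters, while the second loop shows the normalized Cauchy excess approaching the GPD with $\gamma = 1$ only as the threshold grows.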
Proof: As usual, let $u$ denote the threshold, $S_X$ the survival function of the original GPD and $S_{X_u}$ the survival function of the variable $X_u$. Then,

$$S_{X_u}(x) = \frac{S_X(x + u)}{S_X(u)} = \frac{\left(1 + \frac{\gamma}{\sigma}(x + u)\right)^{-\frac{1}{\gamma}}}{\left(1 + \frac{\gamma}{\sigma} u\right)^{-\frac{1}{\gamma}}} = \left(1 + \frac{\gamma}{\sigma'} x\right)^{-\frac{1}{\gamma}}$$

with $\sigma' = \sigma\left(1 + \frac{\gamma}{\sigma} u\right)$.

Note that in the limit case $\gamma = 0$, the exponential distribution is invariant under thresholding, i.e. $\sigma' = \sigma$.

VI. FINAL COMMENTS

In the Fréchet domain of attraction as well as in a subset of the Gumbel domain of attraction, we have connected the solution of a maximum $q$-entropy (or maximum $q$-norm) problem with the asymptotic distribution of excesses over a threshold, and showed that these distributions are Generalized Pareto Distributions. With this result, it is possible to connect the ubiquity of heavy-tailed distributions in physics, economics or signal processing, the distribution of the excesses over a threshold, and a maximum entropy construction.

Our approach shows the convergence in entropy of the distributions of excesses over a threshold; this type of convergence is in fact weaker than the convergence in supremum norm proved in Pickands' theorem. However, this work underlines an interesting parallel with the entropic proof of the Central Limit Theorem as given in [1].

REFERENCES

[1] A. R. Barron, "Entropy and the central limit theorem," Annals of Probability, vol. 14, no. 1, pp. 336–342, Jan. 1986.
[2] J. Pickands, "Statistical inference using extreme order statistics," The Annals of Statistics, vol. 3, pp. 119–131, 1975.
[3] A. A. Balkema and L. de Haan, "Residual life time at great age," The Annals of Probability, vol. 2, pp. 792–804, Oct. 1974.
[4] C. Tsallis, "Possible generalization of Boltzmann-Gibbs statistics," Journal of Statistical Physics, vol. 52, pp. 479–487, July 1988.
[5] A. Rényi, "On measures of entropy and information." Berkeley, Calif.: Univ. California Press, 1961, pp. 547–561.
[6] M. E. J. Newman, "Power laws, Pareto distributions and Zipf's law," Contemporary Physics, vol. 46, pp. 323–351, 2005.
[7] C. Vignat, A. Hero, and J. A. Costa, "About closedness by convolution of the Tsallis maximizers," Physica A, vol. 340, pp. 147–152, Sept. 2004.
[8] B. V. Gnedenko, Annals of Mathematics, vol. 44, pp. 423–453, 1943.