Computation of Power Loss in Likelihood Ratio Tests for Probability Densities Extended by Lehmann Alternatives

Lucas Gallindo Martins Soares
Departamento de Estatística e Informática
Universidade Federal Rural de Pernambuco, Brasil
lucasgallindo@gmail.com

Abstract

We compute the loss of power in likelihood ratio tests when we test the original parameter of a probability density extended by the first Lehmann alternative.

1 Distributions Generated by Lehmann Alternatives

In the context of parametric models for lifetime data, [Gupta et alii 1998] disseminated the study of distributions generated by Lehmann alternatives: cumulative distributions that take one of the following forms:

G_1(x, λ) = [F(x)]^λ   or   G_2(x, λ) = 1 − [1 − F(x)]^λ    (1)

where F(x) is any cumulative distribution and λ > 0. In the present note we call both G distributions "generated distributions" or "extended distributions". It is easy to see that, for integer values of λ, G_1 and G_2 are, respectively, the distributions of the maximum and of the minimum of a sample of size λ; that the support of the two distributions is the same as that of F; and that the associated density functions are

g_1(x, λ) = λ f(x) [F(x)]^{λ−1}   and   g_2(x, λ) = λ f(x) [1 − F(x)]^{λ−1}    (2)

where f(x) is the density function associated with F. Suppose that we generate a distribution G(x | λ) based on the distribution F(x) and then repeat the process to generate another distribution G′(x | λ, λ′). It is easy to see that G′ will be of the same form as G, for the new parameter of the distribution, λλ′, may be summarized as a single one. This has the interesting side effect that the standard uniparametric exponential distribution may be seen as a distribution generated by the second Lehmann alternative from the distribution F(x) = 1 − e^{−x}.
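For integer λ, the maximum interpretation of G_1 gives an easy consistency check by simulation. The sketch below is only an illustration, assuming the standard exponential baseline F(x) = 1 − e^{−x} (the note keeps F generic); it compares a sample of row maxima with a sample drawn from G_1 directly by inversion.

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam = 100_000, 3  # integer lambda: G_1 is the CDF of the sample maximum

# Baseline distribution: standard exponential, F(x) = 1 - exp(-x)
maxima = rng.exponential(size=(n, lam)).max(axis=1)

# Sampling from G_1 directly by inversion: G_1(x) = F(x)^lam implies
# x = F^{-1}(u^(1/lam)) for u uniform on (0, 1), with F^{-1}(p) = -ln(1 - p)
u = rng.uniform(size=n)
direct = -np.log1p(-u ** (1 / lam))

# The two samples should agree in distribution; compare a few quantiles
for q in (0.25, 0.50, 0.90):
    print(q, np.quantile(maxima, q), np.quantile(direct, q))
```

Both columns estimate the same quantile of G_1, so they should agree up to Monte Carlo error.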
To compute the moments of a distribution generated by a Lehmann alternative, we use the change of variables u = F(x) in the expression

E[X^k | λ] = ∫_{−∞}^{∞} x^k λ f(x) [F(x)]^{λ−1} dx    (3)

yielding

E[X^k | λ] = ∫_0^1 λ [Q(u)]^k u^{λ−1} du = E_{Beta(λ,1)}[ Q^k(U) ]    (4)

where Q(u) = F^{−1}(u) is the quantile function. The integral is the expectation of Q^k(U) with respect to a Beta distribution with parameters α = λ, β = 1. The same reasoning shows that, for the second Lehmann alternative, E[X^k | λ] = E_{Beta(1,λ)}[ Q^k(U) ].

Using the log-likelihood functions

ℓ_1(λ) = n ln(λ) + Σ_{j=1}^{n} ln f(x_j) + (λ − 1) Σ_{j=1}^{n} ln F(x_j)    (5)

and

ℓ_2(λ) = n ln(λ) + Σ_{j=1}^{n} ln f(x_j) + (λ − 1) Σ_{j=1}^{n} ln[1 − F(x_j)]    (6)

we see that the maximum likelihood estimators of the parameter λ have the forms

λ̂ = −n / Σ_{j=1}^{n} ln F(x_j)   and   λ̂ = −n / Σ_{j=1}^{n} ln[1 − F(x_j)]    (7)

The existing literature on distributions generated by Lehmann alternatives concerns mostly distributions defined on the interval (0, ∞) or on the real line, with [Nadarajah and Kotz 2006] being the most complete review of progress in the area and [Nadarajah 2006] an interesting application of the concepts outside the original proposal of [Gupta et alii 1998], which was to analyze lifetime data. These are not the only papers dealing with the subject, but a complete commented list would be a paper on its own. In the present paper we are concerned with some information-theoretical quantities of the first extension.

2 Kullback-Leibler Divergence

Given two probability density functions f and g, the quantity

D_KL(f | g) = ∫_ℝ f(x) ln( f(x) / g(x) ) dx    (8)

is called the Kullback-Leibler divergence (abbreviated DKL) after the authors of the classical paper [Kullback and Leibler 1951].
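Equation (8) can be evaluated numerically whenever both densities are available in closed form. As an illustration (not part of the original note), the sketch below integrates the divergence between two exponential densities, a pair for which the closed form D_KL = ln(a/b) + b/a − 1 is known.

```python
import numpy as np

# Evaluate equation (8) numerically for f = Exp(rate a), g = Exp(rate b),
# whose divergence has the known closed form ln(a/b) + b/a - 1.
a, b = 1.0, 2.0
x = np.linspace(1e-9, 60.0, 400_000)
dx = x[1] - x[0]
f = a * np.exp(-a * x)
g = b * np.exp(-b * x)

# Riemann-sum approximation of the integral in equation (8)
dkl_numeric = np.sum(f * np.log(f / g)) * dx
dkl_exact = np.log(a / b) + b / a - 1.0
print(dkl_numeric, dkl_exact)  # both close to 1 - ln 2, about 0.3069
```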
Very often this quantity is used as a measure of distance between two probability density functions, even though it is not a metric: the divergence is greater than or equal to zero, with zero if and only if f = g, but it is not symmetric, so D_KL(f | g) ≠ D_KL(g | f), and it does not obey the triangle inequality either. Rewriting equation (8), we get

∫_ℝ f(x) ln( f(x) / g(x) ) dx = ∫_ℝ [ f(x) ln f(x) − f(x) ln g(x) ] dx    (9)
 = E_f[ln f(X)] − E_f[ln g(X)]    (10)

where E_f[h(X)] denotes the expectation of the random variable h(X) with respect to the probability density f. Since D_KL(f | g) is nonnegative, we have

E_f[ln f(X)] ≥ E_f[ln g(X)]    (11)

We will now show that maximizing the likelihood is equivalent to minimizing D_KL(e | f), where e is the empirical density of the sample. Writing D_KL(e | f) as in (10), with expectations now taken under e, we arrive at

D_KL(e | f) = E_e[ln e(X)] − (1/n) Σ_{j=1}^{n} ln f(x_j, θ)    (12)

since E_e[ln f(X)] = (1/n) Σ_{j=1}^{n} ln f(x_j, θ). The first term does not depend on θ, and the rightmost term is the log-likelihood multiplied by a constant; maximizing the log-likelihood therefore minimizes the whole divergence, so the process of maximizing the likelihood is equivalent to minimizing the divergence between the empirical density and the parametric model. This result is very common in the related literature and is shown in full detail in sources such as [Eguchi and Copas 1998], which gives an accessible but rather compact deduction of properties of likelihood-based methods using DKL. In the next (and last) section we draw freely on a result from [Eguchi and Copas 1998] stating that DKL may be used to measure the loss of power in likelihood ratio tests when the distribution under the alternative hypothesis is mis-specified.
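Before turning to the testing problem, the closed-form estimator for λ in equation (7) is easy to verify by simulation. A minimal sketch, again assuming the standard exponential baseline F(x) = 1 − e^{−x} (an assumption of this illustration, not of the note):

```python
import numpy as np

rng = np.random.default_rng(1)
n, lam_true = 50_000, 2.5

# Draw from G_1(x, lam) = F(x)^lam with the standard exponential baseline:
# if u ~ Uniform(0, 1), then x = F^{-1}(u^(1/lam)) has CDF G_1.
u = rng.uniform(size=n)
x = -np.log1p(-u ** (1 / lam_true))

# Closed-form estimator from equation (7): lam_hat = -n / sum(ln F(x_j))
F = -np.expm1(-x)  # F(x) = 1 - exp(-x), computed stably
lam_hat = -n / np.log(F).sum()
print(lam_hat)  # close to lam_true = 2.5
```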
3 Wrong Specification of the Reference Distribution and Loss of Power in Likelihood Ratio Tests

Suppose we have data from a probability distribution H(x | θ, λ) and want to test the hypothesis that (θ = θ_0, λ = λ_0). The usual likelihood ratio is expressed as

Λ(λ_0, θ_0) = ℓ(λ̂, θ̂) / ℓ(λ_0, θ_0)    (13)

where ℓ denotes the likelihood function and ξ̂ denotes the unrestricted maximum likelihood estimate of a parameter ξ. Suppose we are not willing (or not able) to compute ℓ(λ̂, θ̂) because estimation of the parameter λ is troublesome, and we decide to approximate the likelihood ratio statistic using ℓ(λ_1, θ̃) in place of the likelihood under the alternative hypothesis, where θ̃ is the maximum likelihood estimator of θ given that λ = λ_1. We then have the relation

Λ(x) ≈ ℓ(λ_1, θ̃) / ℓ(λ_0, θ_0)    (14)

A result in section 3 of [Eguchi and Copas 1998] states that the test statistic generated this way is less powerful than the usual one, with the loss in power equal to

ΔPower = D_KL( f(x | λ̂, θ̂) | f(x | λ_1, θ̃) )    (15)

In the present paper we are concerned with the case where the data follow a distribution extended by the first Lehmann alternative, the original distribution being F = F(x | θ) for a parameter θ. The null hypothesis will be of the form

H_0: θ = θ_0, λ = 1    (16)

against an alternative hypothesis

H_A: θ ≠ θ_0, λ ≠ 1    (17)

If we erroneously consider that the data do not come from an extended distribution G(x | λ, θ) but from a population that follows the original F(x | θ) distribution, we can say that we are approximating the likelihood under the alternative hypothesis as in the previous discussion.
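To make the setup concrete, the two fits being compared, the full fit under the extended model and the restricted fit that wrongly assumes λ = 1, can be sketched numerically. This is only an illustration assuming an exponential baseline F(x | θ) = 1 − e^{−θx} with rate θ (the discussion above keeps F generic); the full fit profiles λ out with the closed form of equation (7) and grid-searches θ.

```python
import numpy as np

rng = np.random.default_rng(2)
n, lam, theta = 2_000, 3.0, 1.5

# Data from G(x | lam, theta) = (1 - exp(-theta x))^lam, by inversion
u = rng.uniform(size=n)
x = -np.log1p(-u ** (1 / lam)) / theta

def loglik(lam_, theta_):
    """Log-likelihood of equation (5) with F(x | theta) = 1 - exp(-theta x)."""
    logF = np.log(-np.expm1(-theta_ * x))
    return n * np.log(lam_ * theta_) - theta_ * x.sum() + (lam_ - 1) * logF.sum()

# Restricted fit (lambda wrongly fixed at 1): plain exponential MLE
theta_tilde = 1.0 / x.mean()

# Full fit: profile lambda out via equation (7), then grid-search theta
def profiled(theta_):
    lam_hat_ = -n / np.log(-np.expm1(-theta_ * x)).sum()
    return loglik(lam_hat_, theta_)

grid = np.linspace(0.2, 5.0, 2_000)
theta_hat = max(grid, key=profiled)
lam_hat = -n / np.log(-np.expm1(-theta_hat * x)).sum()

gap = profiled(theta_hat) - loglik(1.0, theta_tilde)
print(theta_hat, lam_hat, gap)  # gap > 0: likelihood lost by fixing lambda = 1
```

The positive gap is the (log) likelihood sacrificed by the misspecified alternative, the quantity that drives the power loss discussed next.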
In this case, the likelihood will be taken under the hypothesis

H_A′: θ ≠ θ_0, λ = 1    (18)

which generates the following expression for the likelihood ratio:

Λ(x) ≈ ℓ(1, θ̃) / ℓ(1, θ_0)    (19)

The resulting test has less power than the one using the full G distribution; the difference in power between the tests is given by

ΔPower = D_KL( g(x | λ̂, θ̂) | g(x | 1, θ̃) )    (20)

The main point of the above discussion is that, for testing hypotheses about the "original" parameter θ, tests using the extended version of the distribution are always more powerful, with a considerable difference in the type II error rate.

Expanding equation (20), and writing λ for λ̂ and θ for θ̂ inside the integrals, we have

ΔPower = D_KL( g(x | λ̂, θ̂) | g(x | 1, θ̃) )    (21)
 = ∫_ℝ g(x | λ, θ) ln[ g(x | λ, θ) / g(x | 1, θ̃) ] dx    (22)
 = ∫_ℝ λ f(x | θ) [F(x | θ)]^{λ−1} ln[ λ f(x | θ) [F(x | θ)]^{λ−1} / f(x | θ̃) ] dx    (23)
 = ∫_ℝ λ f(x | θ) [F(x | θ)]^{λ−1} ln[ λ [F(x | θ)]^{λ−1} ] dx    (24)
 = ln λ + ∫_ℝ λ (λ − 1) f(x | θ) [F(x | θ)]^{λ−1} ln F(x | θ) dx    (25)

where the step from (23) to (24) takes θ̃ ≈ θ̂, so that the f terms cancel. Changing variables to u = F(x | θ) and integrating by parts, we get

ΔPower = ln λ + (1 − λ)/λ    (26)

The graph of this function, depicted in Figure 1 for values of λ greater than one, shows the loss of power incurred when the distribution of our data is one extended by the first Lehmann alternative and we fail to notice that.

References

[Eguchi and Copas 1998] Eguchi, S. and Copas, J. (2006). Interpreting Kullback-Leibler divergence with the Neyman-Pearson lemma. Journal of Multivariate Analysis, vol. 97, issue 9, pages 2034-2040.

[Gupta et alii 1998] Gupta, R. C., Gupta, P. L. and Gupta, R. D. (1998). Modeling failure time data by Lehman alternatives. Communications in Statistics: Theory and Methods, vol. 27, pages 887-904.

[Kullback and Leibler 1951] Kullback, S. and Leibler, R. A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, vol. 22, number 1, pages 79-86.

[Nadarajah and Kotz 2006] Nadarajah, S. and Kotz, S. (2006). The Exponentiated Type Distributions. Acta Applicandae Mathematicae, vol. 92, pages 97-111.

[Nadarajah 2006] Nadarajah, S. (2006). The exponentiated Gumbel distribution with climate application. Environmetrics, vol. 17, number 1, pages 13-23.

Figure 1: Loss of Power as a Function of λ, for λ > 1.
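As a closing numerical check, equation (26) can be confirmed by direct integration of the divergence in (24)-(25) for a concrete baseline. The sketch below uses the standard exponential F(x) = 1 − e^{−x}, although (26) itself does not depend on the baseline.

```python
import numpy as np

# Check of equation (26): with theta held fixed so that the f terms cancel,
# D_KL(g(x | lam) | g(x | 1)) should equal ln(lam) + (1 - lam)/lam for any
# baseline; here F(x) = 1 - exp(-x).
x = np.linspace(1e-8, 80.0, 500_000)
dx = x[1] - x[0]
F = -np.expm1(-x)
for lam in (1.5, 2.0, 5.0):
    g_lam = lam * np.exp(-x) * F ** (lam - 1)
    g_one = np.exp(-x)
    dkl = np.sum(g_lam * np.log(g_lam / g_one)) * dx  # Riemann sum of (22)
    closed = np.log(lam) + (1 - lam) / lam            # equation (26)
    print(lam, dkl, closed)  # the two columns should agree closely
```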