Some notes on biasedness and unbiasedness of two-sample Kolmogorov-Smirnov test

This paper deals with the two-sample Kolmogorov-Smirnov test and its biasedness. This test is not unbiased in general in the case of different sample sizes. We found the most biased distribution for some values of the significance level $\alpha$. Moreover, we discovered that there exist numbers of observations and significance levels $\alpha$ such that this test is unbiased at level $\alpha$.

Authors: Peter Bubeliny

P. Bubeliny, e-mail: bubeliny@karlin.mff.cuni.cz, Charles University, Faculty of Mathematics and Physics, KPMS, Sokolovska 83, 186 75 Prague, Czech Republic.

1 Introduction

In the world of statistics there exists an enormous number of tests, and new ones are constantly being derived. For most of these tests we know that they are consistent, we know their asymptotic behavior and many other properties. But one thing is often omitted: unbiasedness. One might think that all tests in common use are unbiased, or are biased only against very special alternatives that cannot occur in practical applications. One might regard biasedness as merely very poor power against some alternatives, or simply consider unbiasedness unimportant. All these views are mistaken. We often check the assumptions of one test by means of another test. But what if the checking test is biased and therefore leads to a bad decision? Then the main test should not be used, and a wrong conclusion can follow. Therefore, unbiasedness should not be underestimated. There are many tests that are really unbiased. But there are also plenty of tests used daily that are biased. One of them is the well-known two-sample Kolmogorov-Smirnov test. In what follows, we look at the biasedness and unbiasedness of this test in some cases in detail.
2 Biasedness and unbiasedness of Kolmogorov-Smirnov test

Firstly, we should recall what unbiasedness is. A test is said to be unbiased at level $\alpha$ if

1. it has significance level $\alpha$,
2. for all distributions from the alternative the power of the test is greater than or equal to $\alpha$.

The test is said to be unbiased if it is unbiased at every level $\alpha \in (0, 1)$. Finally, the test is said to be biased if it is not unbiased. In particular, the test is biased at level $\alpha$ against an alternative $G$ if it is a level $\alpha$ test and $P(\text{reject } H \mid G) < \alpha$.

Suppose that $x_1, \dots, x_n$ and $y_1, \dots, y_m$ are two independent samples having distributions with continuous distribution functions $F$ and $G$, respectively. We would like to test the hypothesis $H: F = G$ against the alternative $A: F \neq G$. The two-sample Kolmogorov-Smirnov test is based on the statistic
$$D_{n,m} = \sup_x |\hat F_n(x) - \hat G_m(x)|,$$
where $\hat F_n(x)$ and $\hat G_m(x)$ are the empirical distribution functions of the two samples. The hypothesis $H$ is rejected for large values of $D_{n,m}$. The exact formula for computing $p$-values can be found in Hajek et al. (1999).

Firstly, we should realize that the statistic $D_{n,m}$ of the two-sample Kolmogorov-Smirnov test has a discrete distribution. Therefore the $p$-values of this test are discrete as well. For example, consider $n = m = 50$. Then the test statistic $D_{n,m}$ can take just 50 different values $1/n, 2/n, \dots, 1$. For the statistic $D_{n,m} = 0.26$ the $p$-value equals $0.0678$, and for the next value $D_{n,m} = 0.28$ the $p$-value equals $0.0392$. Testing at level $\alpha = 0.05$ can therefore be a little confusing, because the power of the test is the same for every $\alpha \in [0.0392, 0.0678)$. There exists a distribution $G$ such that the power of the Kolmogorov-Smirnov test at level $\alpha = 0.05$ is equal to $0.045$. Such a distribution does not meet the requirements of the definition of unbiasedness for $\alpha = 0.05$, even though the power of the test is higher than its exact level, $0.0392$. To preserve the idea of unbiasedness for tests with a discrete test statistic, we should consider only the attainable discrete values of the significance level $\alpha$, or use randomized versions of these tests.

It should be kept in mind that the Kolmogorov-Smirnov test is invariant under monotonic transformations of the samples. If we transform both samples (by the same monotonic transformation) to samples with distribution functions $F'$ and $G'$, respectively, then $\sup_x |\hat F_n(x) - \hat G_m(x)| = \sup_x |\hat F'_n(x) - \hat G'_m(x)|$. Therefore, without loss of generality, we assume that $F$ is the distribution function of the uniform distribution, given by
$$F(x) = \begin{cases} 0 & \text{if } x < 0, \\ x & \text{if } 0 \le x \le 1, \\ 1 & \text{if } x > 1. \end{cases} \quad (1)$$

Gordon and Klebanov (2010) proved that for $n = m$ there exists $\alpha \in (0, 1)$ such that the two-sample Kolmogorov-Smirnov test is unbiased at level $\alpha$ against the two-sided alternative $F \neq G$. If we consider only the one-sided alternatives $A_1: F \le G$ or $A_2: F \ge G$, we can extend this finding to $n \neq m$.

Theorem 2.1. Let $x_1, \dots, x_n$ and $y_1, \dots, y_m$ be independent samples from distributions $F$ and $G$. Then for arbitrary $n, m \in \mathbb{N}$ there exists $\alpha \in (0, 1)$ such that the two-sample Kolmogorov-Smirnov test of the hypothesis $H: F = G$ against the one-sided alternative $A_1: F \le G$ or $A_2: F \ge G$ is unbiased at level $\alpha$.

Proof. Without loss of generality, assume that the first sample $x_1, \dots, x_n$ is from the uniform distribution. Firstly, consider only the alternative $A_1: F \le G$. For this alternative the one-sided Kolmogorov-Smirnov statistic is
$$D^*_{n,m} = \inf_{x \in (0,1)} \left( \hat F_n(x) - \hat G_m(x) \right),$$
where $\hat F_n$ and $\hat G_m$ are the empirical distribution functions of the two samples, and the hypothesis $H$ is rejected for small values of $D^*_{n,m}$. Choose $\alpha$ so small that we reject $H$ if and only if $D^*_{n,m}$ equals minus one.
This occurs if and only if the samples $x_1, \dots, x_n$ and $y_1, \dots, y_m$ satisfy
$$\max(y_1, \dots, y_m) < \min(x_1, \dots, x_n). \quad (2)$$
The probability of this event is given by
$$n \int_0^1 (1-x)^{n-1} G^m(x)\, dx. \quad (3)$$
Moreover, $G(x)$ is monotone and $G(x) \ge x$, because we consider the alternative $A_1: F \le G$. Therefore the integrand $(1-x)^{n-1} G^m(x)$ of (3) attains its minimum for $G(x) = x$. The integral (3) represents the probability of rejecting the hypothesis at level $\alpha$ when the alternative $G$ is true, and it is minimized for $G(x) = x = F(x)$. Hence the Kolmogorov-Smirnov test is unbiased at level $\alpha$.

The proof for the alternative $A_2: F \ge G$ is similar. We take $\alpha$ so small that we reject the hypothesis if and only if $\sup_x (\hat F_n(x) - \hat G_m(x)) = 1$. The inequality (2) changes to
$$\max(x_1, \dots, x_n) < \min(y_1, \dots, y_m),$$
and the probability of this event is given by
$$n \int_0^1 x^{n-1} (1 - G(x))^m\, dx. \quad (4)$$
For the alternative $A_2$ we have $G(x) \le x$, and hence the integral (4) is minimized for $G(x) = x$. This proves the theorem. □

The result of this theorem does not mean that the two-sample Kolmogorov-Smirnov test is unbiased against one-sided alternatives. It only says that there exists a small level $\alpha$ at which the test is unbiased. In the following theorem we show that for $n \neq m$ the two-sided Kolmogorov-Smirnov test is not unbiased against the two-sided alternative.

Theorem 2.2. Let $x_1, \dots, x_n$ be i.i.d. from the uniform distribution with distribution function $F$, and let $y_1, \dots, y_m$ be i.i.d. from a distribution with distribution function $G$. If $n \neq m$, then there exists $\alpha \in (0, 1)$ such that the two-sample Kolmogorov-Smirnov test of the hypothesis $H: F = G$ is biased against the alternative with the distribution function
$$G(x) = \frac{\left(\frac{x}{1-x}\right)^{\frac{n-1}{m-1}}}{1 + \left(\frac{x}{1-x}\right)^{\frac{n-1}{m-1}}}. \quad (5)$$

Proof. Consider $\alpha$ so small that we reject the hypothesis if and only if $D_{n,m} = \sup_x |\hat F_n(x) - \hat G_m(x)|$ is equal to one.
That is, the samples $x_1, \dots, x_n$ and $y_1, \dots, y_m$ have to satisfy
$$\max(y_1, \dots, y_m) < \min(x_1, \dots, x_n) \quad \text{or} \quad \max(x_1, \dots, x_n) < \min(y_1, \dots, y_m). \quad (6)$$
The probability of this event is given by
$$n \int_0^1 \left[ (1-x)^{n-1} G^m(x) + x^{n-1} (1 - G(x))^m \right] dx.$$
Substitute $y$ for $G(x)$ and set the derivative of the function $(1-x)^{n-1} y^m + x^{n-1} (1-y)^m$ with respect to $y$ equal to zero. This leads to the equation
$$\left(\frac{y}{1-y}\right)^{m-1} = \left(\frac{x}{1-x}\right)^{n-1}.$$
Therefore the probability of the event (6) is not minimized for $F(x) = G(x) = x$, but for
$$G(x) = \frac{\left(\frac{x}{1-x}\right)^{\frac{n-1}{m-1}}}{1 + \left(\frac{x}{1-x}\right)^{\frac{n-1}{m-1}}}. \qquad \square$$

Some examples of the distribution function given by (5) are shown in Figure 1.

[Figure 1: Plot of the distribution function $G$ given by (5) for $n = 50$ and $m = 20, 55, 100$.]

Although we found that the two-sample Kolmogorov-Smirnov test is biased against the alternative (5), this holds only for very small $\alpha$. Let us denote this smallest level by $\alpha_1$. Then $\alpha_1$ can be computed directly as
$$\alpha_1 = n \int_0^1 \left[ (1-x)^{n-1} x^m + x^{n-1} (1-x)^m \right] dx = \frac{2nm\,\Gamma(n)\Gamma(m)}{\Gamma(n+m+1)}. \quad (7)$$
For example, if $n = 10$ and $m = 11$, then $\alpha_1$ is equal to $5.67 \times 10^{-6}$.

All previous results concern the event $D_{n,m} = 1$. Let us now consider the second highest value of this statistic: for $n > m$ it is equal to $1 - 1/n$, and for $n < m$ it is equal to $1 - 1/m$. We denote by $\alpha_2$ the significance level such that we reject the two-sample Kolmogorov-Smirnov test if and only if $D_{n,m} \ge \max(1 - 1/n, 1 - 1/m)$.

Firstly, assume that $n > m \ge 2$ and consider $D_{n,m} = 1 - 1/n$. This can occur if and only if the samples are such that
$$x_{(1)} < \dots < x_{(n-1)} < y_{(1)} < x_{(n)} \quad \text{or} \quad x_{(1)} < y_{(m)} < x_{(2)} < \dots < x_{(n)}.$$
Together with the case $D_{n,m} = 1$ (that is, $x_{(n)} < y_{(1)}$ or $y_{(m)} < x_{(1)}$), we have that $D_{n,m}$ is greater than or equal to $1 - 1/n$ if and only if $x_{(n-1)} < y_{(1)}$ or $y_{(m)} < x_{(2)}$. This leads us to the probability of rejecting the hypothesis at level $\alpha_2$:
$$P(D_{n,m} \ge 1 - 1/n) = P(\forall j:\, y_j > x_{(n-1)}) + P(\forall j:\, y_j < x_{(2)}) = n(n-1) \int_0^1 \left[ x^{n-2}(1-x)(1 - G(x))^m + x(1-x)^{n-2} G^m(x) \right] dx. \quad (8)$$
As in the proof of the previous theorem, let $G(x) = y$ and set the derivative of the integrand of (8) with respect to $y$ equal to zero. This leads to the equation
$$\left(\frac{y}{1-y}\right)^{m-1} = \left(\frac{x}{1-x}\right)^{n-3}.$$
The solution $y$ as a function of $x$ is given by
$$y = G(x) = \frac{\left(\frac{x}{1-x}\right)^{\frac{n-3}{m-1}}}{1 + \left(\frac{x}{1-x}\right)^{\frac{n-3}{m-1}}}. \quad (9)$$

Now assume that $2 \le n < m$ and consider $D_{n,m} = 1 - 1/m$. This holds if and only if
$$y_{(1)} < \dots < y_{(m-1)} < x_{(1)} < y_{(m)} \quad \text{or} \quad y_{(1)} < x_{(n)} < y_{(2)} < \dots < y_{(m)}.$$
Therefore the probability of the event $D_{n,m} \ge 1 - 1/m$ equals
$$P(D_{n,m} \ge 1 - 1/m) = P(D_{n,m} = 1 - 1/m) + P(D_{n,m} = 1) = nm \int_0^1 \left[ (1-x)^{n-1} G^{m-1}(x)(1 - G(x)) + x^{n-1}(1 - G(x))^{m-1} G(x) \right] dx + n \int_0^1 \left[ (1-x)^{n-1} G^m(x) + x^{n-1}(1 - G(x))^m \right] dx. \quad (10)$$
As before, let $G(x) = y$ and set the derivative of the integrand of (10) with respect to $y$ equal to zero. This leads to the equation
$$\left(\frac{y}{1-y}\right)^{m-3} = \left(\frac{x}{1-x}\right)^{n-1}.$$
Therefore the distribution function of the most biased distribution in this case is given by
$$y = G(x) = \frac{\left(\frac{x}{1-x}\right)^{\frac{n-1}{m-3}}}{1 + \left(\frac{x}{1-x}\right)^{\frac{n-1}{m-3}}}. \quad (11)$$

Remark 2.3. If $n = 3$ and $m = 2$, or $n = 2$ and $m = 3$, then the most biased distribution is the discrete distribution given by $P(y = 0) = P(y = 1) = \frac{1}{2}$ or by $P(y = \frac{1}{2}) = 1$, respectively.
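Both the closed form (7) and the bias claim of Theorem 2.2 can be checked numerically. Note that (7) simplifies to $\alpha_1 = 2/\binom{n+m}{n}$, twice the probability that one sample lies entirely below the other. A minimal sketch in Python, using a plain midpoint rule for the integral (the helper names `G5` and `sep_prob` are our own, not from the paper):

```python
from math import comb, gamma

def G5(x, n, m):
    """The most biased alternative (5) for the event D_{n,m} = 1."""
    t = (x / (1.0 - x)) ** ((n - 1) / (m - 1))
    return t / (1.0 + t)

def sep_prob(n, m, G, steps=200_000):
    """Probability of complete separation of the samples (event (6)):
    n * Int_0^1 [(1-x)^(n-1) G(x)^m + x^(n-1) (1-G(x))^m] dx  (midpoint rule)."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        g = G(x)
        total += (1.0 - x) ** (n - 1) * g ** m + x ** (n - 1) * (1.0 - g) ** m
    return n * total * h

n, m = 10, 11
alpha1 = 2 * n * m * gamma(n) * gamma(m) / gamma(n + m + 1)  # closed form (7)
print(alpha1)                  # about 5.67e-06, as quoted in the text
print(2 / comb(n + m, n))      # the same number: alpha1 = 2 / C(n+m, n)

under_H = sep_prob(n, m, lambda x: x)             # reproduces alpha1
under_G5 = sep_prob(n, m, lambda x: G5(x, n, m))  # strictly smaller
print(under_H, under_G5)
```

Since $n \neq m$ here, the rejection probability under (5) comes out strictly below $\alpha_1$, which is exactly the bias asserted by Theorem 2.2.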
Setting $G(x) = x$, the level $\alpha_2$ is given (according to (8) and (10)) by
$$\alpha_2 = \frac{2nmk\,\Gamma(n)\Gamma(m)}{\Gamma(n+m+1)} = k\,\alpha_1, \quad (12)$$
where $k = \min(n+1, m+1)$.

The distribution functions (9) and (11) are S-shaped curves similar to those in Figure 1. Although these distribution functions differ from each other and from (5), some interesting results can be found. If $|n - m| = 2$, then (9) and (11) reduce to $G(x) = x$. That is, the distribution which minimizes (8) and (10) is the uniform distribution. This leads us to the following theorem.

Theorem 2.4. Let $\alpha_{n,m}$ be given by (12). If $n = m + 2$ or $n = m - 2$, then the two-sample Kolmogorov-Smirnov test is unbiased at level $\alpha_{n,m}$. Moreover, if $n \neq m$ and $|n - m| \neq 2$, then the Kolmogorov-Smirnov test is biased at level $\alpha_{n,m}$.

Proof. Because $\alpha_{n,m} = \alpha_2$, the most biased distribution functions are given by (9) and (11). For $|n - m| = 2$ they reduce to $G(x) = x = F(x)$. This means that the uniform distribution minimizes the probability of rejecting the hypothesis $F = G$ against the alternative $F \neq G$ at level $\alpha_2$ if and only if $|n - m| = 2$. □

Remark 2.5. If $|n - m| = 1$, then the Kolmogorov-Smirnov test is not biased against the distribution functions (9) and (11) at level $\alpha_1$.

Let us denote by $A_\alpha$ the set of distributions against which the Kolmogorov-Smirnov test is biased at level $\alpha$, that is,
$$A_\alpha = \{ G : P(\text{reject } H \text{ at level } \alpha \mid \text{alternative } G \text{ is true}) < \alpha \}.$$
For different levels $0 < \alpha < \alpha^*$, one would expect some subset relation between $A_\alpha$ and $A_{\alpha^*}$. But this is not true in general. According to Theorem 2.4, there exists $G_\alpha$ such that $G_\alpha \in A_\alpha$ and $G_\alpha \notin A_{\alpha^*}$. On the other hand, from Remark 2.5 we have that there exists $G^*_\alpha$ such that $G^*_\alpha \notin A_\alpha$ and $G^*_\alpha \in A_{\alpha^*}$. Therefore, in general, $A_\alpha$ is not a subset of $A_{\alpha^*}$, and vice versa.

The previous result can be generalized quite simply to $\alpha_3$ (the third smallest $\alpha$) in the case $n > 2m$ or $m > 2n$.
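The relation (12) and the dichotomy of Theorem 2.4 can be checked numerically for the case $n > m \ge 2$: with $G(x) = x$, the integral (8) reproduces $k\,\alpha_1$ with $k = m + 1$; for $n = m + 2$ the uniform distribution is the minimizer, while for $|n - m| \neq 2$ the alternative (9) pushes the rejection probability below $\alpha_2$. A sketch under the same conventions as before (the helper names are ours):

```python
from math import gamma

def alpha1(n, m):
    """Closed form (7)."""
    return 2 * n * m * gamma(n) * gamma(m) / gamma(n + m + 1)

def p_reject_a2(n, m, G, steps=200_000):
    """Integral (8): P(D_{n,m} >= 1 - 1/n) for n > m >= 2 when the second
    sample has distribution function G (midpoint rule)."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        g = G(x)
        total += x ** (n - 2) * (1 - x) * (1 - g) ** m + x * (1 - x) ** (n - 2) * g ** m
    return n * (n - 1) * total * h

def G9(x, n, m):
    """Distribution function (9), the stationary point of (8)."""
    t = (x / (1 - x)) ** ((n - 3) / (m - 1))
    return t / (1 + t)

# n = m + 2: here (9) reduces to G(x) = x and alpha2 = (m+1) * alpha1 = 1/7;
# the uniform distribution minimizes the rejection probability (unbiased case).
a2_unb = p_reject_a2(5, 3, lambda x: x)
print(a2_unb, 4 * alpha1(5, 3))    # both equal 1/7

# n = 6, m = 3 (so |n - m| != 2): the alternative (9) is rejected less often
# than the uniform distribution, i.e. the test is biased at level alpha2.
a2_id = p_reject_a2(6, 3, lambda x: x)
a2_alt = p_reject_a2(6, 3, lambda x: G9(x, 6, 3))
print(a2_id, a2_alt)
```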
Adding the probability of the event $D_{n,m} = 1 - 2/n$ or $D_{n,m} = 1 - 2/m$ to (8) or (10), respectively, leads to the most biased distributions at level $\alpha_3$, given by
$$G_3(x) = \frac{\left(\frac{x}{1-x}\right)^{\frac{n-5}{m-1}}}{1 + \left(\frac{x}{1-x}\right)^{\frac{n-5}{m-1}}} \quad \text{if } n > 2m \quad (13)$$
or
$$G_3(x) = \frac{\left(\frac{x}{1-x}\right)^{\frac{n-1}{m-5}}}{1 + \left(\frac{x}{1-x}\right)^{\frac{n-1}{m-5}}} \quad \text{if } m > 2n. \quad (14)$$
In this case $\alpha_3$ is given by
$$\alpha_3 = \frac{2 k_2\, nm\,\Gamma(n)\Gamma(m)}{\Gamma(n+m+1)} = k_2\,\alpha_1, \quad \text{where } k_2 = \frac{\min\bigl((m+2)(m+1),\, (n+2)(n+1)\bigr)}{2}.$$
If $n = m + 4$ or $m = n + 4$, then $G_3(x) = x$. Together with the condition $n > 2m$ or $m > 2n$, we obtain that for $n = 6, m = 2$ or $n = 2, m = 6$ the two-sample Kolmogorov-Smirnov test is unbiased at level $\alpha_3 = 3/7$, and for $n = 7, m = 3$ or $n = 3, m = 7$ it is unbiased at level $\alpha_3 = 1/6$.

The $\alpha$'s considered so far are too small when we have some tens of observations in each sample. Therefore we performed the following simulation to see whether the two-sample Kolmogorov-Smirnov test is biased against the distribution (5) at level $\alpha \approx 0.05$. We set the number of observations of the first sample to $n = 10, 20, 50, 100$ and the number of observations of the second sample to $m = 11, 15, 21, 51, 101$. The first sample is drawn from the uniform distribution; for the second sample we consider two distributions: the uniform distribution and the distribution with distribution function $G$ given by (5). We performed 10000 repetitions and computed the difference between the estimated power (second sample from the alternative distribution) and the estimated level $\alpha$ (second sample from the uniform distribution). The results of this simulation are given in Table 1. For all considered $n$ and $m$ the estimated difference is greater than 0. This suggests that the two-sample Kolmogorov-Smirnov test is not biased against the alternative (5) at level $\alpha = 0.05$.
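The simulation can be reproduced along the following lines. Sampling from (5) is easy by inverse transform, and for the test decision we use the classical asymptotic 5% critical value $1.358\sqrt{(n+m)/(nm)}$ (Smirnov's approximation) rather than exact $p$-values, so the rejection rates differ slightly from those in Table 1. A sketch with our own function names:

```python
import random
from math import sqrt

def ks_stat(xs, ys):
    """Two-sample statistic D_{n,m} = sup_x |F_n(x) - G_m(x)|."""
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        if xs[i] <= ys[j]:
            i += 1
        else:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d

def sample_G5(n_obs, n, m, rng):
    """Inverse-transform sampling from (5): if u = G(x), then
    x / (1 - x) = (u / (1 - u))^((m-1)/(n-1))."""
    out = []
    for _ in range(n_obs):
        u = rng.random()
        s = (u / (1 - u)) ** ((m - 1) / (n - 1))
        out.append(s / (1 + s))
    return out

def power_difference(n, m, reps=5000, seed=1):
    """Estimated power against (5) minus estimated level under H."""
    rng = random.Random(seed)
    crit = 1.358 * sqrt((n + m) / (n * m))   # asymptotic 5% critical value
    rej_alt = rej_null = 0
    for _ in range(reps):
        xs = [rng.random() for _ in range(n)]
        if ks_stat(xs, sample_G5(m, n, m, rng)) > crit:
            rej_alt += 1
        if ks_stat(xs, [rng.random() for _ in range(m)]) > crit:
            rej_null += 1
    return (rej_alt - rej_null) / reps

diff = power_difference(10, 101)
print(diff)   # clearly positive; Table 1 reports 0.7290 for this pair
```

With the asymptotic critical value the absolute numbers shift a little against Table 1, but the sign of the difference, which is what the biasedness question turns on, is reproduced.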
Table 1: Difference between the estimated power against the alternative $G$ given by (5) and the estimated level $\alpha$ of the two-sample Kolmogorov-Smirnov test ($\alpha = 5\%$).

           m = 11   m = 15   m = 21   m = 51   m = 101
  n = 10   0.0034   0.0144   0.0320   0.4153   0.7290
  n = 20   0.0291   0.0087   0.0016   0.2784   0.9170
  n = 50   0.4071   0.3403   0.2715   0.0001   0.5291
  n = 100  0.9070   0.9189   0.9190   0.4557   0.0001

3 Conclusion

In this paper we looked at the biasedness and unbiasedness of the two-sample Kolmogorov-Smirnov test. In the case of different sample sizes this test is not unbiased. However, we found that this does not hold for all $\alpha \in (0, 1)$: there exist special combinations of the number of observations in each sample and the significance level $\alpha$ at which this test is unbiased (see e.g. Theorem 2.4). Moreover, we derived the most biased distribution for some values of $\alpha$. Although we considered only very small values of $\alpha$, for small sample sizes, or for data such as gene expressions, these levels of $\alpha$ are appropriate. We did not consider all levels of $\alpha$; however, we point out that this test can also be unbiased for large samples and $\alpha$ around 0.05. More research is needed to find the exact relation between the number of observations and the levels $\alpha$ at which this test is unbiased.

Acknowledgments

The author thanks Prof. Lev Klebanov, DrSc., for valuable comments, remarks and overall help. The work was supported by the grant SVV 261315/2011.

References

Hájek, J., Šidák, Z. and Sen, P. K. (1999), Theory of Rank Tests (Second Edition), Academic Press.

Gordon, A. Y. and Klebanov, L. B. (2010), On a paradoxical property of the Kolmogorov-Smirnov two-sample test, Nonparametrics and Robustness in Modern Statistical Inference and Time Series Analysis: A Festschrift in honor of Professor Jana Jurečková, Vol. 7, 70-74.

Massey, F. J. (1950), A Note on the Power of a Non-Parametric Test, Annals of Mathematical Statistics, Vol. 21, 440-443.
