On the Non-degeneracy of Kendall's and Spearman's Correlation Coefficients

Hoeffding proved that Kendall's and Spearman's nonparametric measures of correlation between two continuous random variables X and Y are each asymptotically normal with an asymptotic variance of the form sigma^2/n -- provided the non-degeneracy condi…

Authors: Iosif Pinelis

ON THE NON-DEGENERACY OF KENDALL'S AND SPEARMAN'S CORRELATION COEFFICIENTS

IOSIF PINELIS

Abstract. Hoeffding proved that Kendall's and Spearman's nonparametric measures of correlation between two continuous random variables $X$ and $Y$ are each asymptotically normal with an asymptotic variance of the form $\sigma^2/n$, provided the non-degeneracy condition $\sigma^2 > 0$ holds, where $\sigma^2$ is a certain (always nonnegative) expression which is determined by the joint distribution (say $\mu$) of $X$ and $Y$. Sufficient conditions for $\sigma^2 > 0$ in terms of the support set (say $S$) of $\mu$ are given, the same for both correlation statistics. One of them is that there exist a rectangle with all its vertices in $S$, sides parallel to the $X$ and $Y$ axes, and an interior point also in $S$. Another sufficient condition is that the Lebesgue measure of $S$ be nonzero.

1. Introduction

Let $(X, Y)$ be a random point in $\mathbb{R}^2$ with a (joint) cumulative distribution function (c.d.f.) $F$ and continuous marginal c.d.f.'s $F_X$ and $F_Y$, so that $F(x, y) = P(X \le x, Y \le y)$, $F_X(x) = P(X \le x)$, and $F_Y(y) = P(Y \le y)$ for all real $x$ and $y$. Then $F$ is continuous as well, since

$$|F(x_2, y_2) - F(x_1, y_1)| \le |F_X(x_2) - F_X(x_1)| + |F_Y(y_2) - F_Y(y_1)|$$

for all $x_1, y_1, x_2, y_2$ in $\mathbb{R}$. Vice versa, if $F(x, y)$ is continuous in $x$ for each real $y$, then

$$0 = F(x, y) - F(x-, y) = P(X = x, Y \le y) \xrightarrow[y \to \infty]{} P(X = x) = F_X(x) - F_X(x-),$$

so that $F_X$ is continuous. Similarly, if $F(x, y)$ is continuous in $y$ for each real $x$, then $F_Y$ is continuous. So, the marginal c.d.f.'s $F_X$ and $F_Y$ are continuous iff the joint c.d.f. $F$ is so; in such a case, one may simply say that the distribution of $(X, Y)$ is continuous.

Let $\mu = \mu_{X,Y}$ denote the measure that is the probability distribution of $(X, Y)$.
Let $S = S_{X,Y}$ stand for the set in $\mathbb{R}^2$ that is the support of $\mu$ (defined as the intersection of all closed sets of $\mu$-measure 1; then $S$ is the smallest of all such sets, and also $S$ coincides with the set of all $x \in \mathbb{R}^2$ such that $\mu(B_\varepsilon(x)) > 0$ for all $\varepsilon > 0$, where $B_\varepsilon(x)$ is the (say open) disk of radius $\varepsilon$ centered at $x$).

The most common nonparametric statistics that measure association between $X$ and $Y$ are Spearman's rank correlation [3] and Kendall's difference sign correlation [2], usually denoted by $\rho$ and $\tau$, respectively. These statistics are based on a sample of independent random points $(X_1, Y_1), \dots, (X_n, Y_n)$, each having the same distribution as the random point $(X, Y)$. In his landmark paper [1], Hoeffding proved that $\rho$ and $\tau$ are each asymptotically normal as $n \to \infty$ with an asymptotic variance of the form $\sigma_\kappa^2/n$, provided the non-degeneracy condition $\sigma_\kappa^2 > 0$ holds, where $\kappa$ is either $\rho$ or $\tau$, and $\sigma_\kappa^2$ is a certain (always nonnegative) expression which is determined by the c.d.f. $F$. It is therefore important to have convenient criteria for the non-degeneracy condition $\sigma_\kappa^2 > 0$.

Date: October 26, 2018.
2000 Mathematics Subject Classification. 62G10; 62G20; 62G05; 62G30.
Key words and phrases. Kendall's correlation coefficient, Spearman's correlation coefficient, asymptotic variance, non-degeneracy, support of distribution.

2. Kendall's $\tau$

As follows from Hoeffding [1, (9.13)], $\sigma_\tau^2 = 0$ iff the function $d_\tau$ defined by the formula

$$d_\tau(x, y) := F(x, y) - \bigl(F_X(x) + F_Y(y)\bigr)/2$$

is constant on a set (say $A$) of $\mu$-measure 1; then $d_\tau$ must be constant on the closure of $A$ (since $d_\tau$ is a continuous function) and hence on $S$. That is, one has

Proposition 2.1. $\sigma_\tau^2 = 0$ iff $d_\tau$ is constant on $S$.
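As an informal numerical illustration of Proposition 2.1 (not part of the original argument), the sketch below evaluates an empirical analogue of $d_\tau$ on three deterministic point sets and reports how much it varies over the points themselves. Everything here is an assumption made for illustration: the function names, the use of midpoint grids in place of random samples, and in particular the third point set, which takes the "thick-line square" of the example at the end of this section to be the diamond $|x| + |y| = 1$ (a shape consistent with continuous marginals and with the value $-1/4$ stated there, but not confirmed by the text).

```python
# Illustrative sketch only: an empirical analogue of d_tau on deterministic point sets.
# All names and the three point sets are assumptions made for this example.

def d_tau(x, y, pts):
    """Empirical d_tau(x, y) = F(x, y) - (F_X(x) + F_Y(y)) / 2 for the point set pts."""
    n = len(pts)
    joint = sum(1 for u, v in pts if u <= x and v <= y) / n
    fx = sum(1 for u, _ in pts if u <= x) / n
    fy = sum(1 for _, v in pts if v <= y) / n
    return joint - (fx + fy) / 2


def spread_on_support(pts):
    """Max minus min of the empirical d_tau, evaluated at the points of pts."""
    values = [d_tau(x, y, pts) for x, y in pts]
    return max(values) - min(values)


def midpoints(m):
    """Midpoints of m equal subintervals of (0, 1)."""
    return [(k + 0.5) / m for k in range(m)]


def comonotone(m):
    """Points on the diagonal, mimicking Y = X with X uniform on (0, 1)."""
    return [(t, t) for t in midpoints(m)]


def independent_grid(m):
    """Product grid, mimicking independent uniform X and Y on (0, 1)."""
    ts = midpoints(m)
    return [(s, t) for s in ts for t in ts]


def diamond(m):
    """Points on |x| + |y| = 1 (assumed shape of the square in Fig. 1 (a))."""
    return [(sx * t, sy * (1 - t))
            for t in midpoints(m) for sx in (-1, 1) for sy in (-1, 1)]


if __name__ == "__main__":
    print("comonotone :", spread_on_support(comonotone(200)))
    print("independent:", spread_on_support(independent_grid(20)))
    print("diamond    :", spread_on_support(diamond(50)))
```

Under these assumptions the spread is essentially zero in the comonotone case (where $\tau$ is identically 1, so degeneracy is expected), visibly positive for the independent grid, and of the order of the discretization step for the diamond, in line with $d_\tau$ being constant on $S$ there.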
Even in such simple cases as the ones considered in the examples given at the end of this section, it may not be immediately obvious from Proposition 2.1 whether the distribution of $\tau$ is asymptotically non-degenerate, in the sense that $\sigma_\tau^2 > 0$. However, we shall give simple conditions sufficient for such non-degeneracy.

In what follows, the term rectangle means a nonempty set of the form $(x_1, x_2] \times (y_1, y_2]$; the points $(x_1, y_1)$, $(x_1, y_2)$, $(x_2, y_1)$, $(x_2, y_2)$ in $\mathbb{R}^2$ are then naturally called the vertices of the rectangle.

Lemma 2.2. Suppose that $S$ contains all the four vertices and also an interior point of a rectangle; then $\sigma_\tau^2 > 0$.

Proof. Assume, to the contrary, that $\sigma_\tau^2 = 0$. Then, by Proposition 2.1, $d_\tau$ is constant on $S$. On the other hand, one has 5 points $(x_*, y_*)$, $(x_1, y_1)$, $(x_1, y_2)$, $(x_2, y_1)$, $(x_2, y_2)$ in $S$ such that $x_1 < x_* < x_2$ and $y_1 < y_* < y_2$. So, in view of the definition of $S$,

$$0 < \mu\bigl((x_1, x_2) \times (y_1, y_2)\bigr) \le \mu\bigl((x_1, x_2] \times (y_1, y_2]\bigr) = d_\tau(x_2, y_2) - d_\tau(x_1, y_2) - d_\tau(x_2, y_1) + d_\tau(x_1, y_1) = 0, \tag{2.1}$$

which is a contradiction. $\square$

One simple sufficient condition is an immediate corollary of Lemma 2.2:

Corollary 2.3. If the interior of $S$ is non-empty, then $\sigma_\tau^2 > 0$.

Working a bit harder, one can get a stronger result. Let $\lambda_k$ denote the Lebesgue measure for $\mathbb{R}^k$.

Corollary 2.4. Suppose that $\lambda_2(S) > 0$; then $\sigma_\tau^2 > 0$.

Proof. Since $\mathbb{R}^2$ can be partitioned into countably many disjoint rectangles, one has $\lambda_2(\tilde S) > 0$ for some rectangle $R$, where $\tilde S := R \cap S$. Further, $\tilde S$ can be approximated by the union of disjoint rectangles $R_1, R_2, \dots$ contained in $R$ so that $\tilde S \subseteq \bigcup_n R_n$ and $\lambda_2(\tilde S) > \frac23 \lambda_2\bigl(\bigcup_n R_n\bigr)$, that is, $\sum_n \bigl(\lambda_2(\tilde S \cap R_n) - \frac23 \lambda_2(R_n)\bigr) > 0$, whence $\lambda_2(\tilde S \cap R_m) > \frac23 \lambda_2(R_m)$ for some natural $m$.
Shifting and re-scaling if necessary, without loss of generality let us assume that $R_m = (0, 1]^2 := (0, 1] \times (0, 1]$. Then, introducing the set $M := \tilde S \cap R_m$, one has $M \subseteq S \cap (0, 1]^2$ and $\lambda_2(M) > \frac23$. Now, by Fubini's theorem, $\int_0^1 \lambda_1(M_x)\,dx = \lambda_2(M) > \frac23$, where $M_x := \{y \in \mathbb{R} : (x, y) \in M\}$. Hence, the set $A := \{x \in (0, 1) : \lambda_1(M_x) > \frac23\}$ is infinite (otherwise, one would have $\lambda_1(M_x) \le \frac23$ for almost all $x$ in $(0, 1)$ and thus $\int_0^1 \lambda_1(M_x)\,dx \le \frac23$). Therefore, there are $x_1, x_*, x_2$ in $A$ such that $x_1 < x_* < x_2$. Then, for $M_* := M_{x_1} \cap M_{x_*} \cap M_{x_2}$, one has $\lambda_1(M_*) > 1 - 3(1 - \frac23) = 0$, so that the set $M_*$ is infinite and thus contains some $y_1, y_*, y_2$ such that $y_1 < y_* < y_2$. It remains to refer to Lemma 2.2. $\square$

Corollary 2.5. Suppose that the measure $\mu$ has a nonzero absolutely continuous component (with respect to the Lebesgue measure $\lambda_2$); then $\sigma_\tau^2 > 0$.

Proof. Let $f$ be the density of the nonzero absolutely continuous component of $\mu$. Then $0 < \int_{\mathbb{R}^2} f\,d\lambda_2 = \int_S f\,d\lambda_2$ (since $0 \le \int_{\mathbb{R}^2 \setminus S} f\,d\lambda_2 \le \mu(\mathbb{R}^2 \setminus S) = 0$). Therefore, $\lambda_2(S) > 0$. It remains to refer to Corollary 2.4. $\square$

Observe that Corollary 2.4 strictly contains Corollary 2.5. Indeed, there is a random point $(X, Y)$ with a continuous c.d.f. $F$ such that $\lambda_2(S) > 0$ while $\mu$ is singular with respect to $\lambda_2$. For example, let $X$ be any random variable with an everywhere strictly positive density (with respect to $\lambda_1$). Next, let $Y := X + Q$, where $Q$ is any random variable with values in the set $\mathbb{Q}$ of all rational real numbers such that $P(Q = r) > 0$ for all $r \in \mathbb{Q}$. Then the random variable $Y$ is absolutely continuous and $P((X, Y) \in S_0) = 1$, where $S_0 := \bigcup_{r \in \mathbb{Q}} \{(x, x + r) : x \in \mathbb{R}\}$.
At that, $\lambda_2(S_0) = 0$, so that the measure $\mu$ is singular with respect to $\lambda_2$, whereas $S = \mathbb{R}^2$ and hence $\lambda_2(S) > 0$. This example also shows that Corollary 2.5 does not even contain Corollary 2.3. On the other hand, it is easy to see that, vice versa, Corollary 2.3 does not contain Corollary 2.5; indeed, let $(X, Y)$ be uniformly distributed on a set of the form $C \times C$, where $C$ is any "fat" Cantor subset of $\mathbb{R}$ (e.g. the so-called Smith-Volterra-Cantor set, http://en.wikipedia.org/wiki/Smith-Volterra-Cantor_set), which is a non-empty compact nowhere-dense set such that $\lambda_1(C \cap B_\varepsilon(x)) > 0$ for all $x \in C$ and all $\varepsilon > 0$; then the measure $\mu$ is absolutely continuous with respect to $\lambda_2$, whereas $S = C \times C$, so that the interior of $S$ is empty. The latter example also shows that Corollary 2.4 strictly contains Corollary 2.3.

Note that the 5-point condition in Lemma 2.2 is not necessary for $\sigma_\tau^2 > 0$. For example, suppose that the support of the measure $\mu$ is the union of straight line segments $S_1 := \{(x, x) : 0 \le x \le 1\}$ and $S_2 := \{(x, 1 - x) : \frac12 \le x \le 1\}$. Then the 5-point condition in Lemma 2.2 is not satisfied, and yet, the function $d_\tau$ is not constant on either segment $S_1$ or $S_2$, so that, by Proposition 2.1, $\sigma_\tau^2 > 0$.

However, the role of the interior point in the 5-point condition in Lemma 2.2 is in a certain sense indispensable.

Figure 1, panels (a) and (b). Examples with $d_\tau$ constant on $S$, that is, with $\sigma_\tau^2 = 0$.

For example, let the random point $(X, Y)$ be uniformly distributed on the union $S$ of the four sides of the thick-line square shown in Fig. 1 (a). (More generally, one may assume that the distribution is continuous, symmetric about the center of the thick-line square, and has $S$ as its support.)
Obviously, here there are infinitely many rectangles with all the four vertices in $S$; one of them is shown here (dashed); however, there is no 5th, interior point in any such rectangle which would also belong to the support $S$ of the distribution of $(X, Y)$. Accordingly, here $\sigma_\tau^2 = 0$ by Proposition 2.1, since $d_\tau(X, Y) = -\frac14$ with probability 1.

Another example is illustrated by Fig. 1 (b), where it is assumed that the support of the distribution of the random point $(X, Y)$ is the union of all the four thick-line components: […], […], […], and […], whose probability masses are $\frac{6}{11}$, $\frac{1}{11}$, $\frac{2}{11}$, and $\frac{2}{11}$, respectively, and at that the (conditional) distribution on either one of the pieces […] and […] is continuous and symmetric about the center of the piece. The (conditional) distribution on either one of the pieces […] and […] does not matter at all, as long as it is continuous. Here too there are infinitely many rectangles with all the four vertices in $S$, but the interior of none of them has a nonzero $\mu$-mass. Accordingly, $\sigma_\tau^2 = 0$, since $d_\tau(X, Y) = -\frac{3}{22}$ with probability 1. However, any weights of the pieces […], […], […], and […] other than $\frac{6}{11}$, $\frac{1}{11}$, $\frac{2}{11}$, and $\frac{2}{11}$ would result in $\sigma_\tau^2 > 0$.

These examples suggest that it would be difficult, if at all possible, to find a necessary and sufficient condition for the non-degeneracy from which one could easily deduce such results as Corollary 2.4.

3. Spearman's $\rho$

By Hoeffding [1, (9.27)], $\sigma_\rho^2 = 0$ iff a certain function $d_\rho$ of the form

$$d_\rho(x, y) := F_X(x) F_Y(y) - f(x) - g(y)$$

for some continuous functions $f$ and $g$ is constant on a set of $\mu$-measure 1 or, equivalently, on the support $S$.
So, $\sigma_\rho^2 = 0$ implies that (again for any 5 points $(x_*, y_*)$, $(x_1, y_1)$, $(x_1, y_2)$, $(x_2, y_1)$, $(x_2, y_2)$ in $S$ such that $x_1 < x_* < x_2$ and $y_1 < y_* < y_2$) all the relations (2.1) hold with $\mu_X \otimes \mu_Y$ and $d_\rho$ instead of $\mu$ and $d_\tau$, respectively; here, as usual, $\mu_X$ and $\mu_Y$ stand for the distributions of $X$ and $Y$, so that $\mu_X \otimes \mu_Y$ is the corresponding product measure. Thus, one obtains the following result.

Corollary 3.1. Lemma 2.2 holds with $\sigma_\rho$ in place of $\sigma_\tau$, and so do Corollaries 2.3–2.5.

References

[1] W. Hoeffding, A Class of Statistics with Asymptotically Normal Distribution, Ann. Math. Statistics 19 (1948) 227–250.
[2] M. Kendall, A New Measure of Rank Correlation, Biometrika 30 (1938) 81–89.
[3] C. Spearman, The Proof and Measurement of Association between Two Things, The American Journal of Psychology 15 (1904) 72–101.

Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan 49931
E-mail address: ipinelis@mtu.edu
