Concentration inequalities for the sample correlation coefficient

Daniel Salnikov∗†
†Imperial College London
January 23, 2024

∗ Department of Mathematics, Huxley Building, Imperial College, 180 Queen's Gate, South Kensington, London, SW7 2AZ, UK.
† Great Ormond Street Institute of Child Health, 30 Guilford Street, London WC1N 1EH, PEP DRIVE

Abstract

The sample correlation coefficient $R$ plays an important role in many statistical analyses. We study the moments of $R$ under the bivariate Gaussian model assumption, provide a novel approximation for its finite sample mean and connect it with known results for the variance. We exploit these approximations to present non-asymptotic concentration inequalities for $R$. Finally, we illustrate our results in a simulation experiment that further validates the approximations presented in this work.

Keywords: sample correlation coefficient, concentration inequality, moments, sub-Gaussian bound.

1 Introduction

The sample correlation coefficient $R$ plays an important role in many statistical analyses. It is defined for samples of size $n \ge 3$. Under the bivariate normal model assumption, Fisher obtained an expression for the density function and proposed a $Z$-transform that is approximately normal; see Fisher (1915), Provost (2015). However, the associated normal approximation requires large sample sizes to be valid, and the asymptotic variance of the $Z$-transform does not depend on $\rho$; see Hotelling (1953), Winterbottom (1979). To alleviate these shortcomings, Hotelling (1953) and Provost (2015) derived closed form expressions for the moments of $R$ that depend on the population correlation coefficient $\rho$ and sample size $n$. We build on this work by proposing simplified bounds for the mean and variance that depend on $\rho$ and $n$. Further, we exploit these bounds by deriving non-asymptotic concentration inequalities for $R$.
2 Mean and variance approximations

We are interested in approximating the mean of the Pearson sample correlation coefficient computed from a sample of random variables $(X_i, Y_i) \in \mathbb{R}^2$, for $i \in \{1, \dots, n\}$. The sample correlation coefficient $R$ is given by

$$
R = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\left\{\sum_{i=1}^{n} (X_i - \bar{X})^2\right\}^{1/2} \left\{\sum_{i=1}^{n} (Y_i - \bar{Y})^2\right\}^{1/2}}.
$$

If we assume that $(X_i, Y_i)$ have a bivariate normal distribution, i.e., $(X_i, Y_i) \sim N_2(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, where $\boldsymbol{\mu} = (\mu_x, \mu_y) \in \mathbb{R}^2$ and $\boldsymbol{\Sigma} \in \mathbb{R}^{2 \times 2}$ is the covariance matrix such that $\rho = \sigma_{xy} (\sigma_x \sigma_y)^{-1} \in [-1, 1]$ is the population correlation coefficient, $\sigma_x, \sigma_y > 0$ are the population standard deviations and $\sigma_{xy} \in \mathbb{R}$ is the covariance, then the probability density function of $R$ is given by

$$
f_R(r) = \frac{2^{n-3} (1 - \rho^2)^{\frac{n-1}{2}} (1 - r^2)^{\frac{n-4}{2}}}{\pi \Gamma(n-2)} \sum_{k=0}^{+\infty} \left\{\Gamma\!\left(\frac{n-1+k}{2}\right)\right\}^2 \frac{(2 r \rho)^k}{k!};
$$

see Fisher (1915), Hotelling (1953), Provost (2015). We will use the following result, which combines results of Hotelling (1953) and Provost (2015) with a new approximation.

Theorem 1. Let $R$ be the sample correlation coefficient computed from a sample of size $n \ge 3$ coming from a bivariate normal distribution with population correlation coefficient $\rho$. Then, we have that

$$
E(R^m) = \frac{2^{n-3} (1 - \rho^2)^{\frac{n-1}{2}}}{\pi \Gamma(n-2)} \sum_{k=0}^{+\infty} \left\{\Gamma\!\left(\frac{n-1+k}{2}\right)\right\}^2 \frac{(2\rho)^k}{k!}\, g_m(k), \tag{1}
$$

$$
E(R) = (1 - n^{-1})^{1/2} \rho + O(n^{-1}), \tag{2}
$$

$$
\mathrm{var}(R) = \frac{(1 - \rho^2)^2}{n - 1} + O(n^{-2}), \tag{3}
$$

$$
E(R^2) = \rho^2 + \frac{(1 - \rho^2)^2}{n - 1} + O(n^{-2}), \tag{4}
$$

where

$$
g_m(k) = \Gamma\!\left(\frac{m+k+1}{2}\right) \Gamma\!\left(\frac{n-2}{2}\right) \left\{\Gamma\!\left(\frac{n+m+k-1}{2}\right)\right\}^{-1} I\{m + k = 2q\},
$$

$q$ is a non-negative integer and $I$ is the indicator function.

Proof. See Provost (2015) for a proof of (1) or the supplementary material for an alternative one.
A proof of (2) follows from manipulating (1), for $\rho > 0$, into

$$
\begin{aligned}
E(R) &\ge (1 - n^{-1})^{1/2} \rho \sum_{q=0}^{+\infty} \frac{\Gamma\!\left(q + \frac{n-1}{2}\right)}{\Gamma\!\left(\frac{n-1}{2}\right) \Gamma(q+1)} (\rho^2)^q (1 - \rho^2)^{\frac{n-1}{2}} = (1 - n^{-1})^{1/2} \rho, \\
E(R) &\le \rho \sum_{q=0}^{+\infty} \frac{\Gamma\!\left(q + \frac{n-1}{2}\right)}{\Gamma\!\left(\frac{n-1}{2}\right) \Gamma(q+1)} (\rho^2)^q (1 - \rho^2)^{\frac{n-1}{2}} = \rho.
\end{aligned} \tag{5}
$$

If $\rho < 0$, then the inequalities in (5) are reversed. Therefore, for $|\rho| < 1$ we conclude that $(1 - n^{-1})^{1/2} |\rho| \le |E(R)| \le |\rho|$. Moreover, using (5), we find the error bound $1 - (1 - n^{-1})^{1/2} \le n^{-1}$. See Hotelling (1953) for a proof of (3). Using (2) and (3), only taking the first two terms in (1) when $m = 2$, and noting that $E(R^2) - E^2(R) \ge 0$ for all $n$, we have that

$$
\rho^2 (1 - n^{-1}) + \frac{(1 - \rho^2)^2}{n - 1} < E(R^2) \le \rho^2 + \frac{(1 - \rho^2)^2}{n - 1} + O(n^{-2}). \qquad \square
$$

Interestingly, the sample correlation coefficient $R$ will, on average, be closer to zero than $\rho$; nonetheless, as $n \to \infty$ the expected value gets closer and closer to $\rho$. Note that the variance is largest when $\rho = 0$ and that if $X$ and $Y$ are perfectly correlated (i.e., $\rho = \pm 1$), then the variance is equal to zero given that $R$ is a degenerate random variable in that case. This results in more skewed distributions as $|\rho|$ approaches one. The following bounds satisfy these properties. By Theorem 1 we have that

$$
\mathrm{var}(R) \lesssim \frac{(1 - \rho^2)^2}{n - 1} + \frac{(1 - \rho^2)}{n - 1}, \tag{6}
$$

$$
\mathrm{var}(R) \lesssim \frac{2 (1 - \rho^2)^2}{n - 1},
$$

and by (3) for $n$ sufficiently large we have that $\mathrm{var}(R) \approx (1 - \rho^2)^2 (n - 1)^{-1}$. Notice that the bound given by (6) is more conservative than the other one; hence, it is preferable for smaller samples.

3 Concentration inequalities

We use the approximations given by (2) and (3) for studying concentration inequalities for $R$. Combining Markov's inequality, (2) and (6) gives

$$
R = \rho + O_{\Pr}(n^{-1/2}). \tag{7}
$$

Tighter bounds can be achieved; the result follows below.

Proposition 1.
Let $R$ be the sample correlation coefficient computed from a sample of size $n \ge 3$ coming from a bivariate normal distribution with population correlation coefficient $\rho \in (-1, 1)$. Then for any $t > 0$ we have that

$$
\Pr(|R - \rho| > t) \le 2 \exp\!\left[-n t^2 \{2 (1 + 2 n t)\}^{-1}\right];
$$

moreover, for $n$ sufficiently large, we have the approximations

$$
\Pr(|R - \rho| > t) \lesssim 2 \exp\!\left\{-\frac{n t^2}{8 (1 - \rho^2)^2}\right\}, \tag{8}
$$

$$
\Pr(|R - \rho| > t) \lesssim 2 \exp\!\left\{-\frac{n t^2}{4 (1 - \rho^2)^2}\right\}. \tag{9}
$$

Proof. See the supplementary material for a complete proof; we present a summary with the main ideas. Since $R$ is bounded almost surely, all of its moments exist; moreover,

$$
E\{(R - \rho)^{2m}\} = E\{(R - \rho)^{2m-2} (R - \rho)^2\} \le 2^{m-2} \nu^2,
$$

where $m \in \{1, 2, 3, \dots\}$, and $\nu^2 := \mathrm{var}(R) \le (n - 1)^{-1}$, where equality holds if $\rho = 0$. Hence, $R$ satisfies Bernstein's condition; see Wainwright (2019). Consequently, by (3), for $t > 0$ we have that

$$
\Pr(|R - E(R)| > t) \le 2 \exp\!\left[-t^2 \{2 (\nu^2 + 2t)\}^{-1}\right],
$$

and scaling properly by $(1 - n^{-1})^{1/2}$ finishes the first part of the proof. We proceed to the second part. Notice that by looking at (1) and expressing the gamma function as a factorial we have that

$$
g_{2m}(2q) \le \prod_{t=1}^{m-1} \left(\frac{2t + 2q + 1}{2t + 2q + n - 1}\right) g_2(2q),
$$

which when substituted into (1) results in the inequality

$$
\left(1 - \frac{n-2}{n+1}\right)^{m-1} E(R^2) \le E(R^{2m}) \lesssim \left(1 - \frac{n-2}{2m + n - 1}\right)^{m-1} E(R^2);
$$

thus, we can bound even moments from below and above with the second moment, and approximate them. By combining the above and (7), we find a sequence $(a_n) \in \mathbb{R}$ such that $a_n \ge 1$ and $a_n$ approaches one as $n \to +\infty$ (i.e., $a_n \to 1$), so that the three terms in

$$
\{E(R^2)\}^m \le E(R^{2m}) \le a_n \{E(R^2)\}^m
$$

for $m \in \{1, \dots\}$ converge to $\rho^{2m}$. Combining this inequality with (2) and (4), and recalling that $|\rho| < 1$, we show that

$$
E\{(R - \rho)^{2m}\} \lesssim \frac{(2m)!}{2^m m!} \left[2^{1/(2m)} \left\{E(R^2) - \rho^2\right\}^{1/2}\right]^{2m} \le \frac{(2m)!}{2^m m!} (\sqrt{2}\, \nu)^{2m}.
$$

Therefore, $R$ approximately satisfies a sub-Gaussian concentration bound; see Wainwright (2019). By (3) and (4) for $t > 0$ this results in

$$
\Pr(|R - \rho| > t) \lesssim 2 \exp\!\left\{-\frac{n t^2}{4 (1 - \rho^2)^2}\right\} \le 2 \exp\!\left(-\frac{n t^2}{4}\right),
$$

where substituting the upper bound given by (6) results in (8). We achieve a tighter bound if we accept the approximation

$$
\sum_{j=m+1}^{2m} \binom{2m}{j} (-1)^j \{E(R^2)\}^j (\rho^2)^{m-j} \simeq 0,
$$

which results in

$$
\Pr(|R - \rho| > t) \lesssim 2 \exp\!\left\{-\frac{n t^2}{2 (1 - \rho^2)^2}\right\} \le 2 \exp\!\left(-\frac{n t^2}{2}\right). \tag{10}
$$

$\square$

Interestingly, the bound in (10) links $R$ with a sub-Gaussian random variable with a rate of $n^{-1/2}$, which is usually optimal in parametric problems. Further, note that $\rho$ values close to zero have looser bounds. Hence, $R$ will concentrate more tightly if $|\rho|$ is closer to one (i.e., there is less uncertainty), and if $|\rho| = 1$, then $R = \rho$ almost surely, which the bounds reflect.

4 Simulation Experiment

We simulate ten thousand $R$ observations coming from samples of size ten of a bivariate normal distribution with population correlation coefficient $\rho$ (i.e., $R_j$ for $j \in \{1, \dots, 10000\}$ is computed for each $j$ from a sample of size ten, where $(X_i, Y_i) \sim N_2(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, and $i \in \{1, \dots, 10\}$). Further, we compute the relevant summary statistics and compare them with the approximations given by (2), (3) and the upper bound given by (6). Subsequently, we compute the percentage of simulated observations that are captured by the coverage intervals given by (8), (9) and (10). The results follow in Tables 1 and 2 below.
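The experiment described above can be sketched in a few lines of Python. This is only an illustrative reconstruction, not the authors' R script: the function names, the use of standard normal margins (which is harmless because $R$ is invariant to location and scale), and the Python random stream are all assumptions, so the printed numbers will be close to, but not identical with, the reported ones.

```python
import math
import random


def sample_correlation(xs, ys):
    """Pearson sample correlation coefficient R of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_r(rho, n, reps, rng):
    """Return `reps` draws of R, each from n bivariate normal pairs with correlation rho."""
    scale = math.sqrt(1.0 - rho * rho)
    draws = []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Y = rho * X + sqrt(1 - rho^2) * Z has unit variance and corr(X, Y) = rho.
        ys = [rho * x + scale * rng.gauss(0.0, 1.0) for x in xs]
        draws.append(sample_correlation(xs, ys))
    return draws


def summarise(rho, n, draws):
    """Compare simulated moments of R with approximations (2), (3) and bound (6)."""
    reps = len(draws)
    mean_r = sum(draws) / reps
    sd_r = math.sqrt(sum((r - mean_r) ** 2 for r in draws) / (reps - 1))
    one_minus = 1.0 - rho ** 2
    return {
        "approx_mean": math.sqrt(1.0 - 1.0 / n) * rho,  # leading term of (2)
        "sim_mean": mean_r,
        "approx_sd": one_minus / math.sqrt(n - 1),  # square root of (3)
        "sim_sd": sd_r,
        "upper_bound": math.sqrt((one_minus ** 2 + one_minus) / (n - 1)),  # sqrt of (6)
    }


# Seed 2023 mirrors the text, but Python's generator differs from R's (assumption).
rng = random.Random(2023)
results = {}
for rho in (0.0, -0.25, 0.56, -0.75, 0.95):
    results[rho] = summarise(rho, 10, simulate_r(rho, 10, 10_000, rng))
    print(rho, {key: round(value, 3) for key, value in results[rho].items()})
```

The sketch relies only on the standard library so that it runs without extra dependencies; a vectorised version would be faster but would not change the logic.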
ρ       E(R)     R̄        sd(R)   s_R     UB
0       0        -0.003   0.333   0.332   0.471
-0.25   -0.237   -0.236   0.312   0.316   0.450
0.56    0.531    0.534    0.229   0.249   0.359
-0.75   -0.712   -0.728   0.146   0.172   0.264
0.95    0.901    0.944    0.033   0.045   0.11

Table 1: 10,000 $R_j$ simulations computed from samples of size ten (i.e., $n = 10$) coming from $(X_i, Y_i) \sim N_2(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ with population correlation coefficient $\rho$. Above, E(R) is given by (2), R̄ and s_R are the mean and standard deviation of the simulated values, sd(R) is given by the square root of (3), and UB (upper bound) is given by the square root of (6).

We also study the concentration properties of the bounds given by Proposition 1. We solve for $t > 0$ with $n = 10$, and obtain the coverage intervals:

$$
\begin{aligned}
C_0 &= \left\{(-t + \rho, \rho + t) : 2 \exp\!\left[-n t^2 \{8 (1 - \rho^2)^2\}^{-1}\right] = 0.05\right\}, \\
C_1 &= \left\{(-t + \rho, \rho + t) : 2 \exp\!\left[-n t^2 \{4 (1 - \rho^2)^2\}^{-1}\right] = 0.05\right\}, \\
C_2 &= \left\{(-t + \rho, \rho + t) : 2 \exp\!\left[-n t^2 \{2 (1 - \rho^2)^2\}^{-1}\right] = 0.05\right\}.
\end{aligned} \tag{11}
$$

ρ       C_0      C_1      C_2
0       100%     100%     100%
-0.25   100%     100%     99.3%
0.56    100%     99.5%    97.3%
-0.75   99.7%    98.7%    96.1%
0.95    99.1%    97.6%    95.2%

Table 2: Estimated coverage computed from 10,000 simulated values of $R$ when $n = 10$, where $C_i$ is given by (11), and $\rho \in \{0, -0.25, 0.56, -0.75, 0.95\}$. By coverage we mean the percentage of $R_j \in C_i$, where $R_j$ is the $j$th simulation and $i = 0, 1, 2$.

In the interest of replicability, we performed the experiment by fixing the random seed to 2023 in R. The $C_i$ are wider and not contained in $[-1, 1]$ (i.e., $\sup\{C_i\} > 1$ or $\inf\{C_i\} < -1$) if $|\rho|$ is close to zero and/or the sample size is small. Hence, it is difficult to identify uncorrelated random variables with small samples. The above suggests that there is too much uncertainty. However, the coverage intervals become narrower and more precise as sample size increases.
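Solving the defining equation of each coverage interval for $t$ gives a closed form, $t = (1 - \rho^2) \{c \log(2/\alpha) / n\}^{1/2}$ with $c \in \{8, 4, 2\}$ and $\alpha = 0.05$. The short Python sketch below evaluates it; the function names and the generic constant `c` are illustrative choices, not notation from the paper.

```python
import math


def half_width(rho, n, c, alpha=0.05):
    """Solve 2 * exp(-n t^2 / (c (1 - rho^2)^2)) = alpha for t > 0."""
    return (1.0 - rho ** 2) * math.sqrt(c * math.log(2.0 / alpha) / n)


def coverage_interval(rho, n, c, alpha=0.05):
    """Interval (rho - t, rho + t); c = 8, 4, 2 corresponds to C_0, C_1, C_2."""
    t = half_width(rho, n, c, alpha)
    return rho - t, rho + t


for rho in (0.0, -0.25, 0.56, -0.75, 0.95):
    intervals = [coverage_interval(rho, 10, c) for c in (8, 4, 2)]
    print(rho, [(round(lo, 3), round(hi, 3)) for lo, hi in intervals])
```

For $\rho = 0$ and $n = 10$ the loosest interval has a half-width above one, which illustrates the remark that the intervals need not be contained in $[-1, 1]$ for small samples.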
See the supplementary material for results when $n = 3, 5, 30, 100$.

5 Conclusion

We presented an approximation for the mean and variance of $R$ under the bivariate normal model assumption. Subsequently, we used these approximations for deriving finite sample concentration inequalities. These inequalities enable us to solve for an error margin $\delta > 0$ with high probability, i.e., $\Pr(|R - \rho| \le \delta) \ge 1 - \alpha$, where $\alpha \in (0, 1)$, when estimating $\rho$. The inequalities reflect the limitations of estimating $\rho$ and identifying uncorrelated random variables with small sample sizes. Finally, the simulation study provided further validation of the results presented in previous sections.

Acknowledgements

We thank Heather Battey, Mario Cortina-Borja and Guy Nason for insightful discussions and suggestions in the writing of this work. Salnikov gratefully acknowledges support from the UCL Great Ormond Street Institute of Child Health, Imperial College London, the Great Ormond Street Hospital DRIVE Informatics Programme, the Bank of Mexico and EPSRC NeST Programme grant EP/X002195/1. The views expressed are those of the authors and not necessarily those of the EPSRC.

References

Abramowitz, M. & Stegun, I. A., eds (1972), Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, tenth printing edn, U.S. Government Printing Office, Washington, DC, USA.

Battey, H. (2023), ‘Asymptotic methods and statistical applications’. Lecture Notes.

Fisher, R. A. (1915), ‘Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population’, Biometrika 10(4), 507–521. URL: http://www.jstor.org/stable/2331838

Hotelling, H. (1953), ‘New light on the correlation coefficient and its transforms’, Journal of the Royal Statistical Society. Series B (Methodological).

Provost, S. B.
(2015), ‘Closed-form representations of the density function and integer moments of the sample correlation coefficient’, Axioms.

Wainwright, M. J. (2019), High-Dimensional Statistics: A Non-Asymptotic Viewpoint, Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press.

Winterbottom, A. (1979), ‘A note on the derivation of Fisher's transformation of the correlation coefficient’, American Statistician.

Supplementary material

A Preliminary approximations

We are interested in approximating the mean of the Pearson sample correlation coefficient, which we assume is computed from a sample of the random variables $(X_i, Y_i) \in \mathbb{R}^2$, for $i \in \{1, \dots, n\}$. The sample correlation coefficient $R \in [-1, 1]$ is given by

$$
R = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\left\{\sum_{i=1}^{n} (X_i - \bar{X})^2\right\}^{\frac{1}{2}} \left\{\sum_{i=1}^{n} (Y_i - \bar{Y})^2\right\}^{\frac{1}{2}}}.
$$

Furthermore, we assume that $(X_i, Y_i)$ have a bivariate normal distribution, i.e., $(X_i, Y_i) \sim N_2(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, where $\boldsymbol{\mu} = (\mu_x, \mu_y) \in \mathbb{R}^2$ and $\boldsymbol{\Sigma} \in \mathbb{R}^{2 \times 2}$ is the covariance matrix such that $\rho = \sigma_{xy} (\sigma_x \sigma_y)^{-1} \in [-1, 1]$ is the population correlation coefficient, $\sigma_x, \sigma_y > 0$ are the population standard deviations and $\sigma_{xy} \in \mathbb{R}$ is the covariance. Then the probability density function of $R$ is given by

$$
f_R(r) = \frac{2^{n-3} (1 - \rho^2)^{\frac{n-1}{2}} (1 - r^2)^{\frac{n-4}{2}}}{\pi \Gamma(n-2)} \sum_{k=0}^{+\infty} \left\{\Gamma\!\left(\frac{n-1+k}{2}\right)\right\}^2 \frac{(2 r \rho)^k}{k!}, \tag{A.1}
$$

where $-1 \le r \le 1$; see Battey (2023), Fisher (1915). Hence, the $m$th moment is given by

$$
E(R^m) = \int_{-1}^{1} r^m f_R(r)\, dr. \tag{A.2}
$$

We will use the following result.

Proposition 2. Let $R$ be the sample correlation coefficient computed from a sample of size $n \ge 3$ coming from a bivariate normal distribution with population correlation coefficient $\rho$. Then the $m$th moment of $R$ is given by

$$
E(R^m) = \frac{2^{n-3} (1 - \rho^2)^{\frac{n-1}{2}}}{\pi \Gamma(n-2)} \sum_{k=0}^{+\infty} \left\{\Gamma\!\left(\frac{n-1+k}{2}\right)\right\}^2 \frac{(2\rho)^k}{k!}\, g_m(k), \tag{A.3}
$$

where

$$
g_m(k) = \frac{\Gamma\!\left(\frac{m+k+1}{2}\right) \Gamma\!\left(\frac{n-2}{2}\right)}{\Gamma\!\left(\frac{n+m+k-1}{2}\right)} I\{m + k = 2q\},
$$

$q$ is a non-negative integer and $I$ is the indicator function.

Proof. See Hotelling (1953), Provost (2015) for alternative expressions and derivations. Since $|R| \le 1$ almost surely, all of its moments exist and we can exchange the order of integration and summation. Before computing the moments we note the following integrals. Each moment requires computing

$$
g_m(k) := \int_{-1}^{1} r^{m+k} (1 - r^2)^{\frac{n-4}{2}}\, dr.
$$

The range of integration is symmetric with respect to zero, and when $(m + k)$ is odd the integrand is an odd function on $[-1, 1]$; thus, in that case the integral is equal to zero. Now, if $(m + k)$ is even, then the integrand is an even function, so

$$
g_m(k) = 2 \int_0^1 r^{m+k} (1 - r^2)^{\frac{n-4}{2}}\, dr = 2 \int_0^1 (r^2)^{\frac{m+k}{2}} (1 - r^2)^{\frac{n-4}{2}}\, dr.
$$

Now, substitute $u = r^2$, so $\frac{dr}{du} = 2^{-1} u^{-\frac{1}{2}}$, above and we have that

$$
g_m(k) = 2 \cdot 2^{-1} \int_0^1 u^{\frac{m+k+1}{2} - 1} (1 - u)^{\frac{n-2}{2} - 1}\, du = \frac{\Gamma\!\left(\frac{m+k+1}{2}\right) \Gamma\!\left(\frac{n-2}{2}\right)}{\Gamma\!\left(\frac{n+m+k-1}{2}\right)}.
$$

Therefore, we conclude that

$$
g_m(k) = \frac{\Gamma\!\left(\frac{m+k+1}{2}\right) \Gamma\!\left(\frac{n-2}{2}\right)}{\Gamma\!\left(\frac{n+m+k-1}{2}\right)} I\{m + k = 2q\}, \tag{A.4}
$$

where $q$ is a non-negative integer and $I$ is the indicator function. Using (A.1) and (A.2), we have that

$$
E(R^m) = \frac{2^{n-3} (1 - \rho^2)^{\frac{n-1}{2}}}{\pi \Gamma(n-2)} \sum_{k=0}^{+\infty} \left\{\Gamma\!\left(\frac{n-1+k}{2}\right)\right\}^2 \frac{(2\rho)^k}{k!}\, g_m(k). \qquad \square
$$

Gamma function identities

We will use the following results; see Abramowitz & Stegun (1972).

$$
\Gamma\!\left(\frac{n-2}{2}\right) = \frac{2^{3-n} \sqrt{\pi}\, \Gamma(n-2)}{\Gamma\!\left(\frac{n-1}{2}\right)}, \tag{A.5}
$$

and for a non-negative integer $q$

$$
\frac{\Gamma\!\left(q + \frac{1}{2}\right)}{(2q+1)!}\, 2^{2q} = \frac{2^{2q}}{2^{2q}} \frac{(2q)!}{(2q+1)!} \frac{\sqrt{\pi}}{q!} = \frac{\sqrt{\pi}}{(2q+1)\, \Gamma(q+1)}, \tag{A.6}
$$

also

$$
\frac{\Gamma\!\left(q + \frac{1}{2}\right)}{(2q)!}\, 2^{2q} = \frac{\sqrt{\pi}}{\Gamma(q+1)}. \tag{A.7}
$$

Finally, define the function

$$
\frac{\Gamma(z)}{\Gamma\!\left(z + \frac{1}{2}\right)} \frac{\Gamma(z)}{\Gamma\!\left(z - \frac{1}{2}\right)} =: \kappa(z), \tag{A.8}
$$

and note that for $z > 0$, using Stirling's approximation, we have that

$$
\kappa(z) \simeq \left\{1 - (z + 2^{-1})^{-1}\right\}^{\frac{1}{2}} \left[\left\{1 - (4 z^2)^{-1}\right\}^{-1}\right]^{z - \frac{1}{2}},
$$

where $a \simeq b$ indicates that $a$ and $b$, where $a, b \in \mathbb{R}$, are asymptotically equal. That is, there exists an $n_0 \in \mathbb{N}$ such that if $n \ge n_0$, then $|a/b - 1| < n^{-1}$ (i.e., $\lim_{n \to +\infty} a/b = 1$).

B First moment

Proposition 3. Let $R$ be the sample correlation coefficient computed from a sample of size $n \ge 3$ coming from a bivariate normal distribution with population correlation coefficient $\rho \in (-1, 1)$. Then, we have that

$$
E(R) = (1 - n^{-1})^{\frac{1}{2}} \rho + O(n^{-1}). \tag{B.1}
$$

Proof. Since $f_R$ is a probability density function, if $m = 0$ we must have that the sum in (A.3) adds up to one and $m + k = k$, so the even terms can be written as $k = 2q$, where $q$ is a non-negative integer. Also, if $k$ is odd, then that term in (A.3) is equal to zero (i.e., $g_m(k) = 0$). Hence, by using (A.7), (A.5) and (A.6) we have that

$$
\begin{aligned}
1 &= \frac{2^{n-3} (1 - \rho^2)^{\frac{n-1}{2}}}{\pi \Gamma(n-2)} \sum_{q=0}^{+\infty} \left\{\Gamma\!\left(\frac{n-1+2q}{2}\right)\right\}^2 \frac{\Gamma\!\left(\frac{2q+1}{2}\right) \Gamma\!\left(\frac{n-2}{2}\right)}{\Gamma\!\left(\frac{n+2q-1}{2}\right)} \frac{(2\rho)^{2q}}{(2q)!} \\
&= (1 - \rho^2)^{\frac{n-1}{2}} \sum_{q=0}^{+\infty} \frac{\Gamma\!\left(\frac{n+2q-1}{2}\right)}{\Gamma(q+1)\, \Gamma\!\left(\frac{n-1}{2}\right)} (\rho^2)^q.
\end{aligned} \tag{B.2}
$$

For the first moment we set $m = 1$ in (A.3); thus, for non-vanishing terms in the sum $k = 2q + 1$, where $q$ is a non-negative integer. Further, we have that the integral in (A.4) is

$$
g_1(2q+1) = \frac{\Gamma\!\left(q + 1 + \frac{1}{2}\right) \Gamma\!\left(\frac{n-2}{2}\right)}{\Gamma\!\left(\frac{n+2q+1}{2}\right)} = \left(q + \frac{1}{2}\right) \frac{2^{3-n} \sqrt{\pi}\, \Gamma(n-2)\, \Gamma\!\left(q + \frac{1}{2}\right)}{\Gamma\!\left(\frac{n+2q+1}{2}\right) \Gamma\!\left(\frac{n-1}{2}\right)}. \tag{B.3}
$$

Since $m = 1$, when we substitute (B.3) into (A.3), we have that

$$
E(R) = \frac{(1 - \rho^2)^{\frac{n-1}{2}}}{\sqrt{\pi}} \sum_{q=0}^{+\infty} \frac{\Gamma\!\left(\frac{2q+n}{2}\right) \Gamma\!\left(q + \frac{n}{2}\right)}{\Gamma\!\left(\frac{2q+n+1}{2}\right) \Gamma\!\left(\frac{n-1}{2}\right)} \frac{\Gamma\!\left(q + \frac{1}{2}\right)}{(2q+1)!}\, 2^{2q}\, 2 \left(q + \frac{1}{2}\right) \rho^{2q+1}. \tag{B.4}
$$

Now, substitute (A.6) and (A.5) into (B.4) and we have that

$$
\begin{aligned}
E(R) &= \frac{(1 - \rho^2)^{\frac{n-1}{2}}}{\Gamma\!\left(\frac{n-1}{2}\right)} \sum_{q=0}^{+\infty} \frac{\Gamma\!\left(q + \frac{n}{2}\right)}{\Gamma\!\left(q + \frac{n+1}{2}\right)} \frac{\Gamma\!\left(q + \frac{n}{2}\right)}{\Gamma(q+1)} \frac{2q+1}{2q+1}\, \rho^{2q+1} \\
&= \frac{(1 - \rho^2)^{\frac{n-1}{2}}}{\Gamma\!\left(\frac{n-1}{2}\right)} \sum_{q=0}^{+\infty} \frac{\Gamma\!\left(q + \frac{n}{2}\right)}{\Gamma\!\left(q + \frac{n+1}{2}\right)} \frac{\Gamma\!\left(q + \frac{n}{2}\right)}{\Gamma\!\left(q + \frac{n-1}{2}\right)} \frac{\Gamma\!\left(q + \frac{n-1}{2}\right)}{\Gamma(q+1)}\, \rho (\rho^2)^q \\
&= \rho (1 - \rho^2)^{\frac{n-1}{2}} \sum_{q=0}^{+\infty} \frac{\Gamma\!\left(\frac{n+2q-1}{2}\right)}{\Gamma(q+1)\, \Gamma\!\left(\frac{n-1}{2}\right)} \frac{\Gamma\!\left(q + \frac{n}{2}\right)}{\Gamma\!\left(q + \frac{n+1}{2}\right)} \frac{\Gamma\!\left(q + \frac{n}{2}\right)}{\Gamma\!\left(q + \frac{n-1}{2}\right)} (\rho^2)^q \\
&= \rho (1 - \rho^2)^{\frac{n-1}{2}} \sum_{q=0}^{+\infty} \frac{\Gamma\!\left(\frac{n+2q-1}{2}\right)}{\Gamma(q+1)\, \Gamma\!\left(\frac{n-1}{2}\right)}\, \kappa\!\left(\frac{2q+n}{2}\right) (\rho^2)^q.
\end{aligned} \tag{B.5}
$$

Note that the expression above has almost the same summands as (B.2); however, these are multiplied by a ratio of gammas evaluated at $q + n 2^{-1} \pm 2^{-1}$. Hence, we can approximate these ratios using $\kappa\!\left(q + \frac{n}{2}\right)$. Since $\kappa$ is a monotone non-decreasing function (i.e., $0 < \kappa(z - \epsilon) \le \kappa(z) \le 1$ for $z > \epsilon > 0$), we can lower bound the sum by substituting $\kappa\!\left(q + \frac{n}{2}\right)$ with $\kappa\!\left(\frac{n}{2}\right)$, and upper bound it by substituting $\kappa\!\left(q + \frac{n}{2}\right)$ with 1. Note that $\kappa\!\left(\frac{n}{2}\right) \ge (1 - n^{-1})^{\frac{1}{2}}$; thus, using (A.8), (B.2) and (B.5) for $\rho > 0$ we have that

$$
\begin{aligned}
E(R) &\ge (1 - n^{-1})^{\frac{1}{2}} \rho \sum_{q=0}^{+\infty} \frac{\Gamma\!\left(q + \frac{n-1}{2}\right)}{\Gamma\!\left(\frac{n-1}{2}\right) \Gamma(q+1)} (\rho^2)^q (1 - \rho^2)^{\frac{n-1}{2}} = (1 - n^{-1})^{\frac{1}{2}} \rho, \\
E(R) &\le \rho \sum_{q=0}^{+\infty} \frac{\Gamma\!\left(q + \frac{n-1}{2}\right)}{\Gamma\!\left(\frac{n-1}{2}\right) \Gamma(q+1)} (\rho^2)^q (1 - \rho^2)^{\frac{n-1}{2}} = \rho.
\end{aligned} \tag{B.6}
$$

If $\rho < 0$, then the inequalities in (B.6) are reversed. Therefore, for $|\rho| < 1$ we conclude that

$$
(1 - n^{-1})^{\frac{1}{2}} |\rho| \le |E(R)| \le |\rho|. \tag{B.7}
$$

If $|\rho| = 1$, then $R$ is a degenerate random variable equal to $\pm 1$; hence, in that case $R = \pm 1$ almost surely and $E(R) = \pm 1$. $\square$

Since $E(R)$ and $\rho$ have the same sign, we have that

$$
\left|E(R) - (1 - n^{-1})^{\frac{1}{2}} \rho\right| \le |\rho| \left\{1 - (1 - n^{-1})^{\frac{1}{2}}\right\} \le n^{-1}.
$$
Thus, we have that $E(R) = (1 - n^{-1})^{\frac{1}{2}} \rho + O(n^{-1})$, which matches Hotelling's approximation error rate (see Hotelling (1953)) but does not depend on $\rho^2$ or any other higher moments.

C Second moment

Lemma 1. [Hotelling (1953)] Let $R$ be the sample correlation coefficient computed from a sample of size $n \ge 3$ coming from a bivariate normal distribution with population correlation coefficient $\rho \in (-1, 1)$. Then we have that

$$
\mathrm{var}(R) = (1 - \rho^2)^2 (n - 1)^{-1} + O(n^{-2}). \tag{C.1}
$$

Proof. See Hotelling (1953). $\square$

Note that the variance is largest when $\rho = 0$ and that if $X$ and $Y$ are perfectly correlated (i.e., $\rho = \pm 1$), then the variance is equal to zero given that $R$ is a degenerate random variable in that case.

D Concentration inequalities

We present a proof of Proposition 1.

Proof. Since $R$ is bounded almost surely, all of its moments exist; moreover,

$$
E\{(R - \rho)^{2m}\} = E\{(R - \rho)^{2m-2} (R - \rho)^2\} \le 2^{m-2} \nu^2,
$$

where $m \in \{1, 2, 3, \dots\}$, and $\nu^2 := \mathrm{var}(R) \le (n - 1)^{-1}$, where equality holds if $\rho = 0$. Hence, $R$ satisfies Bernstein's condition; see Wainwright (2019). Consequently, by (C.1), for $t > 0$ we have that

$$
\Pr(|R - E(R)| > t) \le 2 \exp\!\left[-t^2 \{2 (\nu^2 + 2t)\}^{-1}\right],
$$

and scaling properly by $(1 - n^{-1})^{\frac{1}{2}}$ finishes the first part of the proof. We proceed to the second part. Combining Markov's inequality, (B.1) and (C.1) gives

$$
R = \rho + O_{\Pr}(n^{-\frac{1}{2}}). \tag{D.1}
$$

Notice that by looking at (A.3) and expressing the gamma function as a factorial we have that

$$
g_{2m}(2q) \le \prod_{t=1}^{m-1} \left(\frac{2t + 2q + 1}{2t + 2q + n - 1}\right) g_2(2q),
$$

which when substituted into (A.3), and noting that

$$
\left(1 - \frac{n-2}{2m + n - 1}\right) \simeq \left(1 - \frac{n-2}{2m + n + 2q - 3}\right)
$$

for $q \in \{1, 2, \dots\}$, results in the inequality

$$
\left(1 - \frac{n-2}{n+1}\right)^{m-1} E(R^2) \le E(R^{2m}) \lesssim \left(1 - \frac{n-2}{2m + n - 1}\right)^{m-1} E(R^2); \tag{D.2}
$$

thus, we can bound even moments from below and above with the second moment as follows. By Jensen's inequality we have that

$$
\{E(R^2)\}^m \le E(R^{2m}). \tag{D.3}
$$

Now, by (D.2) there exists a sequence $(a_n) \in \mathbb{R}$ such that if $E(R^2) > \left(1 - \frac{n-2}{2m+n-1}\right)$, then we can find a sequence $a_n \ge 1$ that satisfies

$$
E(R^{2m}) \le a_n \{E(R^2)\}^m.
$$

Similarly, if $E(R^2) \le \left(1 - \frac{n-2}{2m+n-1}\right)$, then we can find a sequence $a_n \ge 1$ that satisfies

$$
\left(1 - \frac{n-2}{2m + n - 1}\right)^{m-1} \le a_n \{E(R^2)\}^{m-1};
$$

thus, we have that

$$
E(R^{2m}) \le \left(1 - \frac{n-2}{2m + n - 1}\right)^{m-1} E(R^2) \le a_n \{E(R^2)\}^m,
$$

so by (D.3) and the above there exists a sequence $(a_n) \in \mathbb{R}$ such that $a_n \ge 1$ and

$$
\{E(R^2)\}^m \le E(R^{2m}) \le a_n \{E(R^2)\}^m. \tag{D.4}
$$

By (D.1) and the continuous mapping theorem we have that all three terms in (D.4) tend to $\rho^{2m}$ as $n \to \infty$ if $a_n \to 1$. Hence, there exists a sequence $a_n \ge 1$ that approaches one as $n \to +\infty$ that allows us to perform the approximation

$$
E(R^{2m}) \approx \{E(R^2)\}^m,
$$

where $m \in \{1, \dots\}$, and we find $n_0 \in \mathbb{N}$ such that $a_{n_0} \simeq 1$, which holds for any $n \ge n_0$.

Using this, and recalling that $|\rho| < 1$, we bound the central even moments as follows.

$$
\begin{aligned}
E\{(R - \rho)^{2m}\} &= \sum_{j=0}^{2m} \binom{2m}{j} (-1)^j E(R^j) \rho^{2m-j} \\
&= \sum_{j=0}^{m} \binom{2m}{2j} E(R^{2j}) (\rho^2)^{m-j} - \sum_{j=0}^{m-1} \binom{2m}{2j+1} E(R^{2j+1}) \rho^{2m-2j-1} \\
&\overset{(i)}{\lesssim} \sum_{j=0}^{m} \binom{2m}{2j} \{E(R^2)\}^j (\rho^2)^{m-j} - \sum_{j=0}^{m-1} \binom{2m}{2j+1} \{E(R^2)\}^{2j+1} (\rho^2)^{m-j} |\rho^{-1}| \\
&\overset{(ii)}{\le} \sum_{j=0}^{m} \binom{2m}{j} (-1)^j \{E(R^2)\}^j (\rho^2)^{m-j} + \sum_{j=m+1}^{2m} \binom{2m}{j} (-1)^j \{E(R^2)\}^j (\rho^2)^{m-j} \\
&= \sum_{j=0}^{m} \binom{2m}{j} \binom{m}{j}^{-1} \binom{m}{j} (-1)^j \{E(R^2)\}^j (\rho^2)^{m-j} + \sum_{j=m+1}^{2m} \binom{2m}{j} (-1)^j \{E(R^2)\}^j (\rho^2)^{m-j},
\end{aligned} \tag{D.5}
$$

where $(i)$ follows from the approximation given by (D.4) (i.e., $E(R^{2m}) \approx \{E(R^2)\}^m$), noting that the sign does not alternate for odd summands in the subtracting terms, i.e., $\rho^{2m-2j-1}$ has the same sign as $E(R^{2j+1})$, and $(ii)$ follows from rearranging terms and noting that $\rho^{2(m-j)} |\rho^{-1}| > \rho^{2(m-j)}$. Hence, the substitutions above increase the sum by decreasing the subtracting terms in absolute value. Now, we compute the following.

$$
\begin{aligned}
\sum_{j=0}^{m} \binom{2m}{j} \binom{m}{j}^{-1} \binom{m}{j} (-1)^j \{E(R^2)\}^j (\rho^2)^{m-j}
&= \sum_{j=0}^{m} \frac{(2m)!}{j! (2m - j)!} \cdot \frac{j! (m - j)!}{m!} \binom{m}{j} (-1)^j \{E(R^2)\}^j (\rho^2)^{m-j} \\
&= \frac{(2m)!}{m!} \sum_{j=0}^{m} \left\{\prod_{k=0}^{m} (2m - j - k)\right\}^{-1} \binom{m}{j} (-1)^j \{E(R^2)\}^j (\rho^2)^{m-j} \\
&\le \frac{(2m)!}{2^m m!} \sum_{j=0}^{m} \binom{m}{j} (-1)^j \{E(R^2)\}^j (\rho^2)^{m-j} \\
&= \frac{(2m)!}{2^m m!} \{E(R^2) - \rho^2\}^m = \frac{(2m)!}{2^m m!} \nu^{2m} + O(n^{-1}),
\end{aligned} \tag{D.6}
$$

where $\nu^2 = \mathrm{var}(R)$ and the last equality follows from substituting (B.1) into the above. Also, we have that

$$
\sum_{j=m+1}^{2m} \binom{2m}{j} (-1)^j \{E(R^2)\}^j (\rho^2)^{m-j} \le \sum_{j=0}^{m} \binom{2m}{j} (-1)^j \{E(R^2)\}^j (\rho^2)^{m-j} \le \frac{(2m)!}{2^m m!} \{E(R^2) - \rho^2\}^m; \tag{D.7}
$$

hence, using (D.5), (D.6) and (D.7), we have that for $m \in \{1, \dots\}$

$$
E\{(R - \rho)^{2m}\} \lesssim \frac{(2m)!}{2^m m!} \left(2^{1/(2m)} \nu\right)^{2m} \le \frac{(2m)!}{2^m m!} \left(\sqrt{2}\, \nu\right)^{2m};
$$

thus, $R$ satisfies a sub-Gaussian type bound; see Wainwright (2019). Therefore, it approximately satisfies a Gaussian concentration bound, which combined with Lemma 1 and for $t > 0$ results in

$$
\Pr(|R - \rho| > t) \lesssim 2 \exp\!\left\{-\frac{n t^2}{4 (1 - \rho^2)^2}\right\} \le 2 \exp\!\left(-\frac{n t^2}{4}\right).
$$

We achieve a tighter bound by approximating the sum in (D.7) with a semi-telescopic sum (i.e., alternating sign terms of very small numbers, albeit multiplied by combinatorial coefficients), which is approximately equal to zero. The approximation is

$$
\sum_{j=m+1}^{2m} \binom{2m}{j} (-1)^j \{E(R^2)\}^j (\rho^2)^{m-j} \approx 0,
$$

which results in the more aggressive bound for $t > 0$ given by

$$
\Pr(|R - \rho| > t) \lesssim 2 \exp\!\left(-\frac{t^2}{2 \nu^2}\right) \le 2 \exp\!\left(-\frac{n t^2}{2}\right). \tag{D.8}
$$

$\square$

E Simulation experiment supplement

The results for $n = 3, 5, 30, 100$ follow in Table 3 below. In the interest of replicability, we perform the experiment by fixing the random seed to 2023 in R. Essentially, we simulate 10,000 observations of $R$ computed using samples of size $n = 3, 5, 30, 100$ coming from a bivariate normal distribution with population correlation coefficient $\rho$. In Table 3, E(R) is given by the approximation (B.1), R̄ is the mean of the simulations, sd(R) is given by the square root of (C.1), and s_R is the standard deviation of the simulated values. We also study the concentration properties of the bounds given by Proposition 1. We solve for $t > 0$ at each sample size $n$, and obtain the coverage intervals:

$$
\begin{aligned}
C_0 &= \left\{(-t + \rho, \rho + t) : 2 \exp\!\left\{-\frac{n t^2}{8 (1 - \rho^2)^2}\right\} = 0.05\right\}, \\
C_1 &= \left\{(-t + \rho, \rho + t) : 2 \exp\!\left\{-\frac{n t^2}{4 (1 - \rho^2)^2}\right\} = 0.05\right\}, \\
C_2 &= \left\{(-t + \rho, \rho + t) : 2 \exp\!\left\{-\frac{n t^2}{2 (1 - \rho^2)^2}\right\} = 0.05\right\},
\end{aligned}
$$

which by Proposition 1 give

$$
\Pr(-t + \rho \le R \le t + \rho) \gtrsim 0.95,
$$

where $C_2$ is the tightest bound and $C_0$ is the loosest one.
10,000 simulations with sample size n = 3
ρ       E(R)     R̄        sd(R)   s_R     C_0     C_1     C_2
0       0        0.009    0.707   0.709   1       1       1
-0.25   -0.204   -0.194   0.663   0.691   1       1       1
0.56    0.457    0.457    0.485   0.616   1       1       0.92
-0.75   -0.612   -0.646   0.309   0.507   0.977   0.93    0.892
0.95    0.776    0.9      0.069   0.263   0.943   0.926   0.906

10,000 simulations with sample size n = 5
ρ       E(R)     R̄        sd(R)   s_R     C_0     C_1     C_2
0       0        -0.001   0.5     0.501   1       1       1
-0.25   -0.224   -0.22    0.469   0.479   1       1       1
0.56    0.501    0.512    0.343   0.4     1       0.993   0.955
-0.75   -0.671   -0.703   0.219   0.305   0.99    0.965   0.932
0.95    0.85     0.931    0.049   0.106   0.971   0.95    0.92

10,000 simulations with sample size n = 30
ρ       E(R)     R̄        sd(R)   s_R     C_0     C_1     C_2
0       0        0.003    0.186   0.184   1       1       0.996
-0.25   -0.246   -0.247   0.174   0.175   1       1       0.993
0.56    0.551    0.555    0.127   0.129   1       0.998   0.988
-0.75   -0.737   -0.743   0.081   0.086   1       0.997   0.98
0.95    0.934    0.948    0.018   0.02    0.999   0.993   0.973

10,000 simulations with sample size n = 100
ρ       E(R)     R̄        sd(R)   s_R     C_0     C_1     C_2
0       0        0.001    0.101   0.101   1       1       0.993
-0.25   -0.249   -0.25    0.094   0.095   1       1       0.994
0.56    0.557    0.559    0.069   0.069   1       0.999   0.993
-0.75   -0.746   -0.748   0.044   0.045   1       0.999   0.989
0.95    0.945    0.95     0.01    0.01    1       0.999   0.988

Table 3: Simulation experiment with ten thousand samples of $R$ computed from samples of different sizes coming from a bivariate normal distribution. See text for further description.
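As a complement to the simulated means, the series (B.5) for the first moment can be summed numerically and placed next to the approximation (B.1) and the value $\rho$. The Python sketch below does this in log space with the standard library; the function names, the stopping tolerance and the term cap are assumptions made for illustration, and the output is meant for inspection rather than as a proof of any bound.

```python
import math


def log_kappa(z):
    """Log of kappa(z) = Gamma(z)^2 / (Gamma(z + 1/2) * Gamma(z - 1/2)), as in (A.8)."""
    return 2.0 * math.lgamma(z) - math.lgamma(z + 0.5) - math.lgamma(z - 0.5)


def mean_r_series(rho, n, tol=1e-15, max_terms=100_000):
    """Evaluate E(R) by summing the series (B.5) term by term in log space."""
    if rho == 0.0:
        return 0.0
    a = 0.5 * (n - 1)
    log_x = math.log(rho * rho)
    # log of (1 - rho^2)^((n-1)/2) / Gamma((n-1)/2), common to every summand
    log_front = a * math.log1p(-rho * rho) - math.lgamma(a)
    total = 0.0
    for q in range(max_terms):
        log_term = (log_front + math.lgamma(q + a) - math.lgamma(q + 1)
                    + log_kappa(q + 0.5 * n) + q * log_x)
        term = math.exp(log_term)
        total += term
        # Stop once the current summand is negligible relative to the running sum.
        if term < tol * total:
            break
    return rho * total


for n in (3, 5, 10, 30, 100):
    for rho in (-0.25, 0.56, -0.75, 0.95):
        series = mean_r_series(rho, n)
        approx = math.sqrt(1.0 - 1.0 / n) * rho  # leading term of (B.1)
        print(n, rho, round(series, 4), round(approx, 4))
```

Working with `math.lgamma` avoids overflow in the gamma ratios when $n$ or $q$ is large, at the cost of one exponential per summand.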
