Chi-squared Test for Binned, Gaussian Samples
We examine the $\chi^2$ test for binned, Gaussian samples, including effects due to the fact that the experimentally available sample standard deviation and the unavailable true standard deviation have different statistical properties. For data forme…
Authors: Nicholas R. Hutzler
Nicholas R. Hutzler
Division of Physics, Mathematics, and Astronomy, California Institute of Technology, Pasadena, CA 91125
E-mail: hutzler@caltech.edu
June 2019

Abstract. We examine the $\chi^2$ test for binned, Gaussian samples, including effects due to the fact that the experimentally available sample standard deviation and the unavailable true standard deviation have different statistical properties. For data formed by binning Gaussian samples with bin size $n$, we find that the expected value and standard deviation of the reduced $\chi^2$ statistic are

$$\frac{n-1}{n-3} \pm \frac{n-1}{n-3}\sqrt{\frac{n-2}{n-5}}\sqrt{\frac{2}{N-1}}, \tag{1}$$

where $N$ is the total number of binned values. This is strictly larger in both mean and standard deviation than the value of $1 \pm (2/(N-1))^{1/2}$ reported in standard treatments, which ignore the distinction between true and sample standard deviation.

1. Introduction

Precision measurements of physical quantities typically require a very large number of individual measurements of the same quantity, often taken under varying conditions, such as drifting signal-to-noise or many experimental configurations with different signal sizes. For this reason, as well as for simplification of data analysis and reduction of computational requirements, the data are typically binned together such that measurements in the same bin were taken within a time during which the conditions were similar. In order to check whether the binning is susceptible to the varying conditions, as well as to search for unknown sources of noise, a $\chi^2$ test [1, 2, 3] is commonly used. Regardless of whether or not it is an ideal choice of statistic for this case, it is fairly intuitive as a measure of whether the assigned error bars are correctly capturing the statistics of the data.
However, some of the simplifying assumptions used to construct the standard $\chi^2$ can give results with a significant bias for large data sets. We discuss why the standard treatment underestimates both the mean and variance of the $\chi^2$ statistic, and then determine the appropriate correction factors.

2. Chi-squared test for binned, Gaussian samples

Consider a quantity $N_x \gg 1$ of measurements $x_i$ without any assigned uncertainties. Say that the measurements are normally distributed with constant, true mean $\mu$ that is not known to the experimenter. We shall not assume that the data has a constant variance. Let us gather these data sequentially into groups $G_j$ with $n$ consecutive points each. Now compute the usual sample mean, standard deviation, and standard error of each group of points:

$$y_j = \frac{1}{n}\sum_{x_i \in G_j} x_i, \qquad s_j = \sqrt{\frac{1}{n-1}\sum_{x_i \in G_j} (x_i - y_j)^2}, \qquad s_{y_j} = \frac{1}{\sqrt{n}}\, s_j. \tag{2}$$

We have now binned our data into a smaller set of $N = N_x/n \gg 1$ mean values $y_j$ and uncertainties $s_{y_j}$. As a check to see whether the assigned uncertainties are correctly capturing the statistical fluctuations of the data we can perform a $\chi^2$ test as outlined in many standard texts [1, 2, 3]. We will test the hypothesis that the $y_j$ are normally distributed about a constant $\bar{y}$ (though this approach is easily extended to models with more degrees of freedom), and that the uncertainties correctly describe the statistical fluctuations of the data about the mean. The reduced-$\chi^2$ value of the data set is

$$\chi^2_{\rm red} = \frac{1}{N-1}\sum_{j=1}^{N}\left(\frac{y_j - \bar{y}}{\sigma_{y_j}}\right)^2 \equiv \frac{1}{N-1}\sum_{j=1}^{N}\chi_j^2, \tag{3}$$

where $\bar{y} = \left(\sum_j y_j/s_{y_j}^2\right)/\left(\sum_j 1/s_{y_j}^2\right)$ is the weighted mean of the $y$ data, and $\sigma_{y_j}$ is the true (unknown) standard deviation of the points $\{x_i \in G_j\}$, which need not be constant over different values of $j$.
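The binning step of equation (2) and the weighted mean $\bar{y}$ can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the function names `bin_gaussian_samples` and `weighted_mean` are hypothetical, and dropping any leftover points that do not fill a complete bin is a choice made here, not something the text specifies.

```python
import numpy as np

def bin_gaussian_samples(x, n):
    """Group x into consecutive bins of n points and return (y, s_y) as in equation (2)."""
    x = np.asarray(x, dtype=float)
    N = x.size // n                       # number of complete bins (leftover points dropped: an assumption)
    groups = x[: N * n].reshape(N, n)     # row j holds the n consecutive points of group G_j
    y = groups.mean(axis=1)               # sample means y_j
    s = groups.std(axis=1, ddof=1)        # sample standard deviations s_j, with 1/(n-1) normalization
    s_y = s / np.sqrt(n)                  # standard errors s_{y_j}
    return y, s_y

def weighted_mean(y, s_y):
    """Weighted mean of the binned values, with weights 1/s_{y_j}^2."""
    w = 1.0 / s_y**2
    return np.sum(w * y) / np.sum(w)

# Small deterministic example: two bins of three points each.
x = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 8.0])
y, s_y = bin_gaussian_samples(x, n=3)
ybar = weighted_mean(y, s_y)
print(y, s_y, ybar)
```

Note the `ddof=1` argument: NumPy's default normalization is $1/n$, whereas equation (2) uses $1/(n-1)$.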
If the fluctuations in the data are Gaussian in nature, and correctly accounted for by the uncertainties, then we have the usual result

$$\mathrm{E}[\chi^2_{\rm red}] = 1, \qquad \mathrm{Std}[\chi^2_{\rm red}] = \sqrt{\frac{2}{N-1}}. \tag{4}$$

However, the experimenter does not know the true standard deviation, and therefore actually computes the statistic

$$\tilde{\chi}^2_{\rm red} = \frac{1}{N-1}\sum_{j=1}^{N}\left(\frac{y_j - \bar{y}}{s_{y_j}}\right)^2 \equiv \frac{1}{N-1}\sum_{j=1}^{N}\tilde{\chi}_j^2, \tag{5}$$

using $s_{y_j}$ as an estimator for $\sigma_{y_j}$. We wish to find the statistical properties of this quantity, which we shall find differ from $\chi^2_{\rm red}$. Intuitively, the sample standard deviation is computed from a finite number of measurements and therefore has some uncertainty associated with it, and that uncertainty should be propagated through when examining the $\tilde{\chi}^2_{\rm red}$ statistic. This is a well-known effect when estimating parameters from finite data sets and has been previously explored in a number of contexts, for example Poisson distributions, counting experiments, weighted means, and histogram fitting [4, 5, 6, 7, 8, 9, 10].

More specifically, while $\chi_j \sim N(0,1)$ is normally distributed, $\tilde{\chi}_j$ is not:

$$\tilde{\chi}_j \equiv \frac{y_j - \bar{y}}{s_{y_j}} \approx \frac{y_j - \mu}{s_{y_j}} \sim t(n-1), \tag{6}$$

the $t$-distribution with $n-1$ degrees of freedom, which has larger tails for finite $n$ than a normal distribution. Notice that we are treating $\bar{y} = \mu$ as a constant, which is valid in the limit $N \gg 1$, though for smaller $N$ the statistical properties of the weighted mean cannot be ignored [9, 11, 12, 13, 14, 15]. In particular, the weighted mean also has correction factors due to the difference between true and sample standard deviation, and has a non-trivial variance, both of which will impact the $\tilde{\chi}^2_{\rm red}$ statistic. A good discussion of these complexities can be found in reference [15].
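The $t$-distribution in equation (6) follows from a standard decomposition, sketched here for a single bin under the approximation $\bar{y} = \mu$. The symbols $Z_j$, $V_j$, and $\sigma_j$ (the true standard deviation of the points in $G_j$, assumed constant within the bin) are introduced only for this illustration and do not appear elsewhere in the text.

```latex
\tilde{\chi}_j \approx \frac{y_j - \mu}{s_j/\sqrt{n}}
  = \frac{(y_j - \mu)\big/(\sigma_j/\sqrt{n})}{\sqrt{\dfrac{(n-1)\,s_j^2/\sigma_j^2}{n-1}}}
  = \frac{Z_j}{\sqrt{V_j/(n-1)}},
\qquad Z_j \sim N(0,1), \quad V_j \sim \chi^2(n-1).
```

For Gaussian samples the sample mean and sample variance are independent, so $Z_j$ and $V_j$ are independent and the ratio is $t(n-1)$-distributed by definition. The unknown $\sigma_j$ cancels, which is why the result does not require a constant variance across bins.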
The square of $\tilde{\chi}_j$ is therefore distributed as $\tilde{\chi}_j^2 \sim F(1, n-1)$, the $F$-distribution with $(1, n-1)$ degrees of freedom, which has

$$\mathrm{E}[F(1,n-1)] = \frac{n-1}{n-3}, \qquad \mathrm{Var}[F(1,n-1)] = 2\left(\frac{n-1}{n-3}\right)^2 \frac{n-2}{n-5}. \tag{7}$$

This is as opposed to the $\chi_j^2$ statistic, which has (appropriately) a $\chi^2$ distribution. $\tilde{\chi}^2_{\rm red}$ is therefore distributed as a sum of $F$-distributions, which is complicated [16]. However, the expectation value and variance are straightforward to calculate,

$$\mathrm{E}[\tilde{\chi}^2_{\rm red}] = \frac{N}{N-1}\,\mathrm{E}[\tilde{\chi}_j^2] = \frac{n-1}{n-3} + O(N^{-1}), \tag{8}$$

$$\mathrm{Var}[\tilde{\chi}^2_{\rm red}] = \frac{N}{(N-1)^2}\,\mathrm{Var}[\tilde{\chi}_j^2] = \frac{2}{N-1}\left(\frac{n-1}{n-3}\right)^2 \frac{n-2}{n-5} + O(N^{-2}). \tag{9}$$

This implies that the mean and standard deviation of the $\tilde{\chi}^2_{\rm red}$ statistic are larger than those of the $\chi^2_{\rm red}$ statistic by

$$\frac{\mathrm{E}[\tilde{\chi}^2_{\rm red}]}{\mathrm{E}[\chi^2_{\rm red}]} = \frac{n-1}{n-3}, \qquad \frac{\mathrm{Std}[\tilde{\chi}^2_{\rm red}]}{\mathrm{Std}[\chi^2_{\rm red}]} = \frac{n-1}{n-3}\sqrt{\frac{n-2}{n-5}}, \tag{10}$$

up to further corrections of order $O(N^{-1})$. A plot of these correction factors is shown in Figure 1. In the limit $n \to \infty$ we recover the usual result, but for finite $n$ we will always expect larger values for both mean and standard deviation. We can also see that choosing $n \leq 5$ is not advisable, since the statistic will have a non-convergent variance.

Figure 1. Correction factors to the mean and standard deviation of $\tilde{\chi}^2_{\rm red}$. (Plot of $\mathrm{E}[\tilde{\chi}^2_{\rm red}]/\mathrm{E}[\chi^2_{\rm red}]$ and $\mathrm{Std}[\tilde{\chi}^2_{\rm red}]/\mathrm{Std}[\chi^2_{\rm red}]$ versus bin size $n$.)

3. Conclusion

In summary, we find that the standard $\chi^2$ statistic computed from binning finite data sets underestimates the mean and variance for binned Gaussian samples, and derive simple, closed expressions for the biases. For very large data sets with finite bin sizes, such as those commonly found in precision physics measurements, these corrections can be significant and should not be neglected.

Acknowledgments.
I would like to acknowledge helpful discussions with David Watson, and many helpful discussions with the ACME Collaboration, in particular David DeMille, John M. Doyle, and Brendon O'Leary.

Appendix: A simple example

We can see how the "usual" chi-squared statistic gives an incorrect result by performing a simple numerical test on some simulated data. Generate 1,000,000 points $x_i \sim N(0,1)$, bin into groups of $n = 10$, and then compute means $y_j$, standard errors $s_{y_j}$, and the reduced chi-squared statistic $\tilde{\chi}^2_{\rm red}$ (as described in the main text) for the resulting 100,000 binned points.

    Nx = 1000000;                                  % Number of x values
    nbin = 10;                                     % Number of points to bin
    N = Nx/nbin;                                   % Number of binned values
    y = zeros(1,N);                                % Preallocate means
    sigmayi = zeros(1,N);                          % Preallocate standard errors
    for j = 1:N                                    % Step over bins
        x = randn(1,nbin);                         % Generate nbin normally distributed points
        y(j) = mean(x);                            % Means
        sigmayi(j) = std(x)/sqrt(nbin);            % Standard errors
    end
    ybar = sum(y./sigmayi.^2)/sum(1./sigmayi.^2);  % Weighted mean
    chi = (y-ybar)./sigmayi;                       % chi
    chi2 = sum(chi.^2);                            % chi^2
    dof = length(y)-1;                             % Degrees of freedom
    redchi2 = chi2/dof                             % Reduced chi^2
    redchi2sigma = sqrt(2/dof)                     % "Usual" uncertainty of chi^2

If we run this piece of code, we will find redchi2 = 1.2868 and redchi2sigma = 0.0045 (though of course the former will be different each time due to the random nature of the calculation). This value differs considerably from the naïve expectation of $1 \pm 0.0045$ based on the usual treatment that ignores the difference between sample and true standard deviations, but is quite close to the expected value of $1.2857 \pm 0.0073$ from equations (8) and (9).
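The corrected expectation quoted above can be checked directly from the leading-order terms of equations (8) and (9). The following Python sketch, written with NumPy, evaluates those expressions and repeats the simulation in vectorized form. The function names and the seed are hypothetical choices made for this illustration; it is not the author's code.

```python
import numpy as np

def corrected_reduced_chi2(n, N):
    """Leading-order mean and standard deviation of the sample reduced chi-squared, equations (8) and (9)."""
    if n <= 5:
        raise ValueError("the variance of F(1, n-1) is not finite for n <= 5")
    mean = (n - 1) / (n - 3)
    std = mean * np.sqrt((n - 2) / (n - 5)) * np.sqrt(2 / (N - 1))
    return mean, std

def simulate_reduced_chi2(Nx, n, rng):
    """Draw Nx standard normal points, bin into groups of n, and return the reduced chi-squared of equation (5)."""
    x = rng.standard_normal((Nx // n, n))       # each row is one bin
    y = x.mean(axis=1)                          # bin means
    s_y = x.std(axis=1, ddof=1) / np.sqrt(n)    # sample standard errors
    w = 1.0 / s_y**2
    ybar = np.sum(w * y) / np.sum(w)            # weighted mean
    return np.sum(((y - ybar) / s_y) ** 2) / (y.size - 1)

rng = np.random.default_rng(0)                  # arbitrary seed, for reproducibility only
mean, std = corrected_reduced_chi2(n=10, N=100_000)
observed = simulate_reduced_chi2(Nx=1_000_000, n=10, rng=rng)
print(f"expected {mean:.4f} +/- {std:.4f}, observed {observed:.4f}")
```

For $n = 10$ and $N = 100{,}000$ the first function reproduces the values $1.2857$ and $0.0073$ given in the text; the simulated value varies with the seed.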
References

[1] Press W H, Teukolsky S A, Vetterling W T and Flannery B P 2007 Numerical Recipes 3rd ed (Cambridge University Press) ISBN 978-0521880688
[2] Bevington P R and Robinson D K 2003 Data Reduction and Error Analysis for the Physical Sciences 3rd ed (Boston: McGraw-Hill)
[3] Taylor J R 1996 An Introduction to Error Analysis 2nd ed (University Science Books) ISBN 978-0935702750
[4] Baker S and Cousins R D 1984 Nucl. Instruments Methods Phys. Res. 221 437–442 ISSN 01675087
[5] Jading Y and Riisager K 1996 Nucl. Instruments Methods Phys. Res. Sect. A Accel. Spectrometers, Detect. Assoc. Equip. 372 289–292 ISSN 01689002
[6] Hammersley A and Antoniadis A 1997 Nucl. Instruments Methods Phys. Res. Sect. A Accel. Spectrometers, Detect. Assoc. Equip. 394 219–224 ISSN 01689002
[7] Mighell K J 1999 Astrophys. J. 518 380–393 ISSN 0004-637X
[8] Hauschild T and Jentschel M 2001 Nucl. Instruments Methods Phys. Res. Sect. A Accel. Spectrometers, Detect. Assoc. Equip. 457 384–401 ISSN 01689002
[9] Zhang N F 2006 Metrologia 43 195–204 ISSN 00261394
[10] Gagunashvili N 2010 Nucl. Instruments Methods Phys. Res. Sect. A Accel. Spectrometers, Detect. Assoc. Equip. 614 287–296 ISSN 01689002
[11] Hutzler N R 2014 A New Limit on the Electron Electric Dipole Moment Ph.D. thesis Harvard University
[12] Cochran W G 1937 Suppl. to J. R. Stat. Soc. 4 102–118
[13] Graybill F A and Deal R B 1959 Biometrics 15 543–550
[14] Böckenhoff A and Hartung J 1998 Biometrical J. 40 937–947 ISSN 0323-3847
[15] Hartung J, Knapp G and Sinha B K 2008 Statistical Meta-Analysis with Applications (Wiley-Interscience)
[16] Morrison D F 1971 J. Am. Stat. Assoc. 66 383 ISSN 01621459