Distribution Fitting 2. Pearson-Fisher, Kolmogorov-Smirnov, Anderson-Darling, Wilks-Shapiro, Cramér-von Mises and Jarque-Bera statistics


Methods for measuring the departure between observations and a model were reviewed. The following statistics were applied to two experimental data sets: Chi-squared, Kolmogorov-Smirnov, Anderson-Darling, Wilks-Shapiro, and Jarque-Bera. Neither set proved to be normally distributed at first. The Grubbs test identified one outlier in the set of 205 chemically active compounds, and after its removal the normality of that set was accepted. The second data set proved to have no outliers. The Kolmogorov-Smirnov statistic is least affected by the presence of outliers (positive variation, expressed as a percentage, smaller than 2%). Outliers induce errors of Type II in the Kolmogorov-Smirnov statistic and errors of Type I in the Anderson-Darling statistic.


💡 Research Summary

The paper provides a systematic comparison of several goodness-of-fit tests, namely the Pearson-Fisher (Chi-square), Kolmogorov-Smirnov (KS), Anderson-Darling (AD), Wilks-Shapiro (WS), Cramér-von Mises (CvM) and Jarque-Bera (JB) statistics, and demonstrates how they behave on two real-world data sets. The first data set consists of 205 chemically active compounds, while the second is a separate experimental collection with no identified outliers.
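Assuming a SciPy environment, this battery of tests can be run in a few lines. The data below are a synthetic stand-in for the compound set (not the paper's data), and the function names are SciPy's, not the paper's:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic stand-in for the 205-compound data set (not the paper's data)
sample = rng.normal(loc=5.0, scale=1.2, size=205)

# KS and CvM below test against a fully specified N(0, 1), so standardize
# first; with estimated parameters the nominal p-values are only
# approximate (a Lilliefors-type correction would be exact)
z = (sample - sample.mean()) / sample.std(ddof=1)

ks = stats.kstest(z, "norm")              # Kolmogorov-Smirnov
ad = stats.anderson(sample, dist="norm")  # Anderson-Darling: statistic + critical values
sw = stats.shapiro(sample)                # Shapiro-Wilk (the paper's Wilks-Shapiro)
cvm = stats.cramervonmises(z, "norm")     # Cramér-von Mises
jb = stats.jarque_bera(sample)            # Jarque-Bera (skewness/kurtosis based)

for name, p in [("KS", ks.pvalue), ("SW", sw.pvalue),
                ("CvM", cvm.pvalue), ("JB", jb.pvalue)]:
    print(f"{name}: p = {p:.3f}")
print(f"AD: A2 = {ad.statistic:.3f}, 5% critical value = {ad.critical_values[2]:.3f}")
```

Note that `stats.anderson` reports critical values at fixed significance levels rather than a p-value, which is why it is printed separately.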

Initially, all six tests reject normality for both data sets. In the first set, the Grubbs test detects a single extreme observation. After removing this outlier, the remaining 204 observations pass all normality tests: KS, AD, WS, CvM and JB all yield p‑values above the conventional 0.05 threshold. This illustrates that a single outlier can dominate the outcome of normality assessments, especially for tests that are sensitive to tail behavior.
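The Grubbs test is simple enough to sketch directly from its textbook formulation. The planted data and significance level below are illustrative, not the paper's values:

```python
import numpy as np
from scipy import stats

def grubbs_test(x, alpha=0.05):
    """Two-sided Grubbs test for a single outlier (textbook formulation)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    mean, sd = x.mean(), x.std(ddof=1)
    idx = int(np.argmax(np.abs(x - mean)))     # most extreme observation
    G = abs(x[idx] - mean) / sd
    # Critical value derived from the Student t distribution
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    G_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
    return G > G_crit, idx, G, G_crit

# Illustrative data: 204 well-behaved points plus one planted outlier
rng = np.random.default_rng(1)
data = np.append(rng.normal(0.0, 1.0, 204), 8.0)
is_outlier, idx, G, G_crit = grubbs_test(data)
print(f"outlier detected: {is_outlier} at index {idx} (G={G:.2f}, crit={G_crit:.2f})")
```

After removing the flagged point, the normality tests would be re-run on the remaining observations, mirroring the 205-to-204 step described above.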

The second data set contains no outliers, yet every test continues to reject normality. AD and JB are the most decisive, producing p‑values well below 0.001, indicating pronounced skewness or heavy‑tailed characteristics. KS also rejects normality but shows a relatively modest change in its statistic (variation under 2 %). This confirms the well‑known property that KS is less influenced by extreme observations because it measures the maximum absolute difference between empirical and theoretical cumulative distribution functions across the entire range.
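This contrast can be illustrated with a small experiment. The outlier magnitude here is deliberately gross so the effect is visible, so the percentages will not match the paper's sub-2% figure:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
clean = rng.normal(0.0, 1.0, 200)
dirty = np.append(clean, 10.0)                 # one deliberately gross outlier

def ks_stat(x):
    # KS: maximum absolute ECDF deviation, after standardizing
    z = (x - x.mean()) / x.std(ddof=1)
    return stats.kstest(z, "norm").statistic

def ad_stat(x):
    # AD: tail-weighted statistic; scipy estimates mean/sd internally
    return stats.anderson(x, dist="norm").statistic

ks_change = 100 * (ks_stat(dirty) - ks_stat(clean)) / ks_stat(clean)
ad_change = 100 * (ad_stat(dirty) - ad_stat(clean)) / ad_stat(clean)
print(f"KS change: {ks_change:+.1f}%  AD change: {ad_change:+.1f}%")
```

The single extra point can move the KS statistic only modestly (one observation shifts the ECDF by at most 1/n), while the AD statistic reacts much more strongly.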

A key contribution of the study is the explicit discussion of Type I and Type II error patterns associated with each test in the presence of outliers. The authors find that KS is prone to Type II errors (failing to reject a false null hypothesis) when outliers are present, whereas AD is prone to Type I errors (incorrectly rejecting a true null hypothesis) under the same conditions. This divergence stems from AD’s weighting scheme, which emphasizes the tails of the distribution, making it highly responsive to outliers located there.
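This divergence is visible in the standard definitions of the two statistics (textbook formulations, not reproduced from the paper):

```latex
% Kolmogorov-Smirnov: every point on the real line is weighted equally
D_n = \sup_x \left| F_n(x) - F(x) \right|

% Anderson-Darling: squared ECDF deviation weighted by 1/[F(1-F)],
% which diverges as F(x) \to 0 or F(x) \to 1
A^2 = n \int_{-\infty}^{\infty}
      \frac{\left[ F_n(x) - F(x) \right]^2}{F(x)\,\left[ 1 - F(x) \right]} \, dF(x)
```

An outlier sits where F(x) is close to 0 or 1, so it enters A² with a nearly unbounded weight, whereas it can move D_n by at most 1/n.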

The paper also explores how sample size affects test power. WS and JB lose substantial power when the sample size falls below roughly 30 observations, whereas KS and AD retain relatively stable power across a broader range of sample sizes. CvM provides a middle ground, and the chi‑square test’s reliability hinges on adequate expected frequencies within each bin, making it unsuitable for small or unevenly distributed data.
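Such power comparisons are usually carried out by Monte Carlo simulation. A sketch under an assumed exponential alternative (not the paper's simulation design) looks like this:

```python
import numpy as np
from scipy import stats

def power_estimate(pvalue_fn, n, n_sims=500, alpha=0.05):
    """Monte Carlo power: fraction of simulated non-normal samples rejected."""
    rng = np.random.default_rng(3)
    rejections = 0
    for _ in range(n_sims):
        x = rng.exponential(size=n)        # assumed non-normal alternative
        rejections += pvalue_fn(x) < alpha
    return rejections / n_sims

def shapiro_p(x):
    return stats.shapiro(x).pvalue

def ks_p(x):
    z = (x - x.mean()) / x.std(ddof=1)
    # Standard KS table with estimated parameters is conservative
    return stats.kstest(z, "norm").pvalue

pw_sw = power_estimate(shapiro_p, n=20)
pw_ks = power_estimate(ks_p, n=20)
print(f"n=20 power vs exponential: Shapiro-Wilk={pw_sw:.2f}, KS={pw_ks:.2f}")
```

Repeating the loop over a grid of sample sizes would trace out the power curves discussed above.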

From a practical standpoint, the authors recommend a multi‑step workflow: (1) conduct robust outlier detection (e.g., Grubbs, Dixon) and remove or treat extreme points; (2) apply KS and AD together to capture both overall shape and tail discrepancies; (3) use WS and JB as supplementary checks when normality is a critical modeling assumption and the sample is sufficiently large; (4) reserve chi‑square for categorical or binned data where expected counts are high. By acknowledging the distinct sensitivities of each test, analysts can avoid misinterpretation caused by hidden outliers or inappropriate test selection.
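The steps above can be traced in a single helper. The function name is hypothetical (not the paper's code), and a simple z-score screen stands in here for a full Grubbs/Dixon procedure:

```python
import numpy as np
from scipy import stats

def normality_workflow(x):
    """Hypothetical helper tracing the recommended multi-step workflow."""
    x = np.asarray(x, dtype=float)
    report = {}

    # Step 1: outlier screen (z-score rule standing in for Grubbs/Dixon)
    z = np.abs(x - x.mean()) / x.std(ddof=1)
    x = x[z < 3.5]
    report["n_after_screen"] = int(x.size)

    # Step 2: KS (overall shape) and AD (tail behavior) together
    zs = (x - x.mean()) / x.std(ddof=1)
    report["ks_p"] = stats.kstest(zs, "norm").pvalue
    ad = stats.anderson(x, dist="norm")
    report["ad_reject_5pct"] = bool(ad.statistic > ad.critical_values[2])

    # Step 3: supplementary checks only when the sample is large enough
    if x.size >= 30:
        report["shapiro_p"] = stats.shapiro(x).pvalue
        report["jarque_bera_p"] = stats.jarque_bera(x).pvalue
    return report

rng = np.random.default_rng(4)
result = normality_workflow(rng.normal(10.0, 2.0, 205))
print(result)
```

The chi-square step (4) is omitted here because it applies to binned data rather than the raw continuous sample this helper takes.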

Overall, the study underscores that no single goodness‑of‑fit statistic is universally optimal. Instead, a combination of tests, informed by the data’s characteristics (presence of outliers, sample size, tail behavior), yields a more reliable assessment of distributional fit. This insight is valuable for fields ranging from chemometrics and quality control to biomedical research, where model validation hinges on accurate distributional assumptions.

