The power of visualizing distributional differences: Formal graphical $n$-sample tests

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv paper.

Classical tests are available for the two-sample problem of testing whether two distribution functions coincide. Of these, the Kolmogorov-Smirnov test also provides a graphical interpretation of the test results, in different forms. Here, we propose modifications of the Kolmogorov-Smirnov test with higher power. The proposed tests are based on the so-called global envelope test, which allows for graphical interpretation, similarly to the Kolmogorov-Smirnov test. The tests are based on rank statistics and are also suitable for the comparison of $n$ samples, with $n \geq 2$. We compare the alternatives for the two-sample case through an extensive simulation study and discuss their interpretation. Finally, we apply the tests to real data. Specifically, using the proposed methodologies, we compare the height distributions of boys and girls at different ages, the sepal length distributions of different flower species, and the distributions of standardized residuals from a time series model for different exchange rates.

💡 Research Summary

The paper addresses a notable limitation of the classic two‑sample Kolmogorov‑Smirnov (KS) test: while it offers a convenient graphical interpretation of distributional differences, its statistical power can be low, especially against differences in the tails, its classical null distribution assumes continuous data, and it compares only two groups. To overcome these drawbacks, the authors propose a family of non‑parametric, permutation‑based graphical tests that extend the KS framework to any number of samples (n ≥ 2) and improve power by employing the global envelope testing methodology introduced by Myllymäki et al. (2017).
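The graphical interpretation of the KS test rests on the fact that its statistic is the largest vertical gap between the two empirical CDFs. The following minimal Python sketch (standard library only; the name `ks_two_sample` is our own, not from the paper) computes that gap together with the point where it is attained, which is what a KS plot highlights.

```python
from bisect import bisect_right


def ks_two_sample(x, y):
    """Two-sample KS statistic D = max_t |F_x(t) - F_y(t)| and the first
    pooled observation t at which it is attained (None if D == 0)."""
    xs, ys = sorted(x), sorted(y)
    nx, ny = len(xs), len(ys)
    best_d, best_t = 0.0, None
    # Both ECDFs are right-continuous step functions that jump only at
    # observed values, so the supremum is attained at a pooled observation.
    for t in sorted(set(xs) | set(ys)):
        # Integer arithmetic first, one division last: equal gaps give equal floats.
        gap = abs(bisect_right(xs, t) * ny - bisect_right(ys, t) * nx) / (nx * ny)
        if gap > best_d:
            best_d, best_t = gap, t
    return best_d, best_t
```

For example, `ks_two_sample([1, 2, 3], [4, 5, 6])` returns `(1.0, 3)`: the ECDFs are compared at every pooled observation, and the gap reaches 1 at t = 3. Because both ECDFs are close to 0 (or to 1) in the tails, differences located there can only produce small vertical gaps, which is one reason the KS test has low power against tail differences.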

The core idea is to treat the statistic of interest (for example, the pointwise difference of empirical cumulative distribution functions, kernel‑density differences, or quantile differences) as a functional vector evaluated on a discretised domain. Under the null hypothesis that all groups share the same distribution, the group labels are randomly permuted to generate a large number s of simulated functional statistics. A ranking measure (such as the extreme rank length, continuous rank, or area measure) is then computed for the observed statistic and for each simulated one, ordering all s + 1 curves from the most extreme to the most central. The distribution of these measures determines a critical value E(α), which defines a 100(1 − α)% global envelope: at each discretisation point, the envelope runs from the minimum to the maximum value among all curves whose measure is not more extreme than E(α). If the observed functional statistic leaves this envelope at any point, the null hypothesis is rejected, and the locations where the envelope is crossed pinpoint where the distributions differ.
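The permutation-and-ranking procedure described above can be sketched in plain Python (standard library only). This is a minimal illustration under assumptions of our own, not the authors' implementation: the functional statistic is the vector of pairwise ECDF differences on a user-supplied grid (for more than two groups the pairwise curves are simply concatenated), the ranking measure is an extreme-rank-length (ERL) type ordering built from two-sided pointwise ranks, and all function names are hypothetical.

```python
import itertools
import random
from bisect import bisect_left, bisect_right


def ecdf(sample, grid):
    """Empirical CDF of `sample` evaluated at each point of `grid`."""
    s = sorted(sample)
    return [bisect_right(s, t) / len(s) for t in grid]


def functional_statistic(groups, grid):
    """Hypothetical statistic: pairwise ECDF differences F_g - F_h (g < h)
    on the grid, concatenated into one vector (length = pairs * len(grid))."""
    cdfs = [ecdf(g, grid) for g in groups]
    vec = []
    for g, h in itertools.combinations(range(len(groups)), 2):
        # Rounding merges values that differ only by floating-point noise.
        vec.extend(round(a - b, 12) for a, b in zip(cdfs[g], cdfs[h]))
    return vec


def permuted_statistics(groups, grid, s, rng):
    """Shuffle the group labels s times (group sizes fixed) and recompute."""
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    curves = []
    for _ in range(s):
        rng.shuffle(pooled)
        new_groups, start = [], 0
        for m in sizes:
            new_groups.append(pooled[start:start + m])
            start += m
        curves.append(functional_statistic(new_groups, grid))
    return curves


def erl_keys(curves):
    """Two-sided pointwise ranks of every curve, sorted increasingly.
    Comparing these tuples lexicographically gives an ERL-type ordering:
    a smaller tuple means a more extreme curve."""
    n, d = len(curves), len(curves[0])
    ranks = [[0] * d for _ in range(n)]
    for k in range(d):
        col = sorted(c[k] for c in curves)
        for i in range(n):
            v = curves[i][k]
            below = bisect_right(col, v)       # #{j : T_j(k) <= v}
            above = n - bisect_left(col, v)    # #{j : T_j(k) >= v}
            ranks[i][k] = min(below, above)
    return [tuple(sorted(r)) for r in ranks]


def global_envelope_test(groups, grid, s=999, alpha=0.05, seed=0):
    rng = random.Random(seed)
    observed = functional_statistic(groups, grid)
    curves = [observed] + permuted_statistics(groups, grid, s, rng)
    n = len(curves)  # s + 1 curves; index 0 is the observed one
    keys = erl_keys(curves)
    ordered = sorted(keys)
    # Measure e_i: share of curves at least as extreme as curve i (small = extreme).
    e = [bisect_right(ordered, key) / n for key in keys]
    p_value = sum(1 for x in e if x <= e[0]) / n
    # Critical value: largest e_alpha with #{i : e_i < e_alpha} <= alpha * n.
    e_alpha = max(c for c in set(e) if sum(1 for x in e if x < c) <= alpha * n)
    # Envelope: pointwise min/max over curves not more extreme than e_alpha.
    kept = [c for c, ei in zip(curves, e) if ei >= e_alpha]
    lower = [min(col) for col in zip(*kept)]
    upper = [max(col) for col in zip(*kept)]
    outside = [k for k, v in enumerate(observed) if v < lower[k] or v > upper[k]]
    return {"p_value": p_value, "reject": p_value <= alpha, "observed": observed,
            "lower": lower, "upper": upper, "outside": outside}
```

With the default s = 999, the smallest attainable p-value is 1/1000. Each index in `outside` is a grid position (offset by `len(grid)` for each successive pair of groups) at which the observed curve leaves the envelope, which is what the band plot displays; in this sketch the accept/reject decision itself is taken from the p-value, and `outside` serves as the graphical diagnostic.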

Because the test relies solely on permutations and rank ordering, it requires no distributional assumptions about the data or the test statistic beyond exchangeability of the observations under the null hypothesis. It can therefore be applied to small samples, discrete data, and any number of groups. The graphical output, essentially a band plot with the observed curve overlaid, gives an intuitive visual cue of the nature and location of the differences, preserving the appealing interpretability of the classic KS test while extending it to more complex scenarios.
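The assumption-free claim rests on a standard property of Monte Carlo permutation tests, sketched here as the textbook argument rather than a derivation reproduced from the paper. Write $T_1$ for the observed functional statistic, $T_2, \dots, T_{s+1}$ for the permuted ones, and $E_1, \dots, E_{s+1}$ for their ranking measures, with small values being the most extreme. The rank-based $p$-value is

$$
p = \frac{1}{s+1} \sum_{i=1}^{s+1} \mathbf{1}\{E_i \le E_1\}.
$$

Under $H_0$ the observations are exchangeable across groups, so $T_1, \dots, T_{s+1}$ are exchangeable as well and, in the absence of ties, the rank of $E_1$ among $E_1, \dots, E_{s+1}$ is uniform on $\{1, \dots, s+1\}$. Hence

$$
P_{H_0}(p \le \alpha) = \frac{\lfloor \alpha (s+1) \rfloor}{s+1} \le \alpha ,
$$

and ties can only increase $p$, so the test remains valid. No property of the data distribution beyond exchangeability enters this argument, which is why small samples and discrete data pose no problem.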

A comprehensive simulation study compares the proposed global‑envelope tests with the asymptotic KS test and the permutation‑based KS test across a variety of alternatives: location shifts, scale changes, tail‑weight alterations, and mixtures. The results consistently show higher empirical power for the global‑envelope approach, especially when differences are concentrated in the tails where the KS test is known to be weak. The authors also demonstrate the method’s flexibility by using different functional statistics (ECDF differences, kernel density differences, pairwise quantile differences) and by handling more than two groups simultaneously.
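Empirical power in such a study is the rejection rate over many simulated data sets. The sketch below estimates it for the permutation-based KS baseline only; the global envelope variants are not reproduced, and the normal location/scale settings, sample size, number of replications, and all function names are hypothetical choices for illustration, not the paper's design.

```python
import random
from bisect import bisect_right


def ks_statistic(x, y):
    """Two-sample KS statistic: max |F_x - F_y| over the pooled observations."""
    xs, ys = sorted(x), sorted(y)
    nx, ny = len(xs), len(ys)
    # Integer arithmetic before dividing keeps equal gaps exactly equal.
    return max(abs(bisect_right(xs, t) * ny - bisect_right(ys, t) * nx)
               for t in xs + ys) / (nx * ny)


def permutation_ks_pvalue(x, y, n_perm, rng):
    """Monte Carlo permutation p-value; the observed split counts as one draw."""
    d_obs = ks_statistic(x, y)
    pooled = list(x) + list(y)
    hits = 1  # the observed statistic itself
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(x)], pooled[len(x):]) >= d_obs:
            hits += 1
    return hits / (n_perm + 1)


def empirical_power(shift, scale, m=20, reps=100, n_perm=99, alpha=0.05, seed=1):
    """Fraction of simulated data sets on which H0 is rejected.
    Group 1 ~ N(0, 1), group 2 ~ N(shift, scale**2) (hypothetical settings)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(m)]
        y = [rng.gauss(shift, scale) for _ in range(m)]
        if permutation_ks_pvalue(x, y, n_perm, rng) <= alpha:
            rejections += 1
    return rejections / reps
```

With `shift=0.0, scale=1.0` the returned value estimates the test's size rather than its power; a call such as `empirical_power(0.0, 3.0)` probes a pure scale alternative. Running the competing tests on the same simulated data sets is what makes their power estimates directly comparable.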

Three real‑world applications illustrate the practical utility of the method. First, height distributions of boys and girls are compared at various ages; the envelope plots indicate at which ages, and over which height ranges, the gender differences are most pronounced. Second, sepal‑length distributions among Iris species are examined, revealing not only shifts in location but also differences in distribution shape and tail behaviour. Third, standardized residuals from a time‑series model for several exchange‑rate series are compared; the envelope highlights the ranges of residual values over which particular series deviate markedly from the others.

The paper concludes that the global‑envelope permutation framework provides a powerful, distribution‑free, and visually informative solution for testing equality of distributions across multiple samples. It retains the interpretability of KS‑style graphics while overcoming the power limitations of the KS test and extending applicability to discrete data, small samples, and more than two groups. The authors suggest future extensions such as incorporating alternative distance measures (e.g., the Wasserstein or energy distance), adapting the method to high‑dimensional functional data, and developing interactive visualization tools to further aid practitioners in exploratory data analysis.
