A nonparametric independence test using random permutations
We propose a new nonparametric test for the hypothesis of independence between two continuous random variables. The test is based on the length of the longest increasing subsequence of a random permutation. We identify the independence assumption between the two continuous variables with the space of permutations equipped with the uniform distribution, and we give the exact distribution of the statistic. We calculate the distribution for several sample sizes. Through a simulation study, we estimate the power of our test against diverse alternatives to the null hypothesis of independence.
💡 Research Summary
The paper introduces a novel non‑parametric test for assessing independence between two continuous random variables by exploiting the combinatorial properties of random permutations. The authors begin by ranking the observations of one variable (say X) and then extracting the corresponding ranks of the other variable (Y) to form a permutation π of {1,…,n}. Under the null hypothesis of independence, π is uniformly distributed over the symmetric group Sₙ, a fact that allows the authors to treat the independence problem as a problem about random permutations.
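The construction of π from the data can be sketched in a few lines of Python. This is a minimal illustration, assuming no ties in either sample (as the continuity assumption implies); the function name `ranks_to_permutation` is hypothetical and not taken from the paper.

```python
def ranks_to_permutation(x, y):
    """Sort the pairs by x and return the 1-based ranks of y in that order.

    Assumes x and y have equal length and contain no ties.
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("x and y must have the same length")

    # Indices of the observations, ordered by increasing x.
    order_by_x = sorted(range(n), key=lambda i: x[i])

    # 1-based rank of each y value within the y sample.
    order_by_y = sorted(range(n), key=lambda i: y[i])
    y_rank = [0] * n
    for rank, i in enumerate(order_by_y, start=1):
        y_rank[i] = rank

    # Read off the y-ranks in x-sorted order: this is the permutation pi.
    return [y_rank[i] for i in order_by_x]


if __name__ == "__main__":
    # Perfectly decreasing relationship gives the reversed permutation.
    print(ranks_to_permutation([0.3, 0.1, 0.2], [5.0, 7.0, 6.0]))
```

If X and Y are independent and continuous, every ordering of the y-ranks is equally likely, which is why π can be treated as a uniform draw from Sₙ.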
The test statistic is the length Lₙ of the longest increasing subsequence (LIS) of π. Intuitively, a random permutation (i.e., independent X and Y) yields an LIS length close to its expected value μₙ≈2√n (Ulam’s law), whereas a permutation that reflects a systematic relationship between X and Y will tend to have an unusually long (or short) LIS. The authors connect Lₙ to the first row length of the Young tableau obtained via the Robinson‑Schensted‑Knuth (RSK) correspondence, which enables an exact combinatorial expression for the distribution P(Lₙ = k). By employing dynamic programming they compute these probabilities for sample sizes up to at least n = 200, providing lookup tables for practitioners.
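One way to see how the RSK connection yields exact probabilities is the identity P(Lₙ = k) = Σ (f^λ)² / n!, summed over partitions λ of n whose first row has length k, where f^λ is the number of standard Young tableaux of shape λ (given by the hook length formula). The sketch below enumerates partitions directly. It is an illustration for small n only and is not claimed to be the dynamic-programming scheme used in the paper; all function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def partitions(n, max_part=None):
    """Yield the partitions of n as weakly decreasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def num_standard_tableaux(shape):
    """Number of standard Young tableaux of a shape, via the hook length formula."""
    n = sum(shape)
    # Column lengths of the diagram (the conjugate partition).
    conj = [sum(1 for row in shape if row > j) for j in range(shape[0])]
    hook_product = 1
    for i, row in enumerate(shape):
        for j in range(row):
            arm = row - j - 1      # cells to the right in the same row
            leg = conj[j] - i - 1  # cells below in the same column
            hook_product *= arm + leg + 1
    return factorial(n) // hook_product


def lis_distribution(n):
    """Exact null distribution {k: P(L_n = k)} as Fractions (small n only)."""
    dist = {}
    n_fact = factorial(n)
    for shape in partitions(n):
        f = num_standard_tableaux(shape)
        dist[shape[0]] = dist.get(shape[0], 0) + Fraction(f * f, n_fact)
    return dict(sorted(dist.items()))


if __name__ == "__main__":
    for k, p in lis_distribution(4).items():
        print(k, p)
```

The number of partitions grows quickly with n, so this brute-force enumeration is practical only for moderate sample sizes; tabulating larger n needs a more economical recursion.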
The testing procedure is straightforward: (1) sort the data by X and record the Y‑ranks to obtain π; (2) compute Lₙ using a Patience‑Sorting algorithm in O(n log n) time; (3) compare the observed Lₙ with the pre‑computed exact distribution to obtain a two‑sided p‑value; (4) reject the null if the p‑value falls below the chosen significance level (typically α = 0.05).
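Steps (2) and (3) can be sketched as follows. The LIS length is computed with the usual patience sorting idea, implemented with binary search. The null distribution here is obtained by brute-force enumeration of all permutations, which only works for very small n and stands in for the pre-computed tables. The two-sided p-value convention used (total probability of outcomes no more likely than the observed one) is an assumption, since the summary does not specify which convention the authors use.

```python
from bisect import bisect_left
from collections import Counter
from fractions import Fraction
from itertools import permutations


def lis_length(seq):
    """Length of the longest strictly increasing subsequence, O(n log n)."""
    tails = []  # tails[k] = smallest possible tail of an increasing run of length k+1
    for value in seq:
        k = bisect_left(tails, value)
        if k == len(tails):
            tails.append(value)
        else:
            tails[k] = value
    return len(tails)


def null_distribution_bruteforce(n):
    """Exact distribution of L_n over all n! permutations (tiny n only)."""
    counts = Counter(lis_length(p) for p in permutations(range(1, n + 1)))
    total = sum(counts.values())
    return {k: Fraction(c, total) for k, c in sorted(counts.items())}


def two_sided_p_value(observed, dist):
    """Sum the probabilities of all outcomes no more likely than the observed one.

    This is one common convention for discrete statistics (an assumption here).
    """
    p_obs = dist.get(observed, Fraction(0))
    return sum(p for p in dist.values() if p <= p_obs)


if __name__ == "__main__":
    pi = [1, 2, 3, 4, 5, 6]  # perfectly increasing y-ranks
    dist = null_distribution_bruteforce(len(pi))
    print(lis_length(pi), two_sided_p_value(lis_length(pi), dist))
```

For realistic sample sizes the brute-force step would be replaced by a lookup in an exact table or by a large-sample approximation.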
A comprehensive simulation study evaluates both type‑I error control and power under a variety of alternative dependence structures. Under H₀ the empirical rejection rates match the nominal level for all examined n, confirming the correctness of the exact distribution tables. For alternatives, the authors consider linear (Y = aX + ε), monotone non‑linear (Y = X² + ε, which is monotone only when X is restricted to non‑negative values), non‑monotone non‑linear (Y = sin(πX) + ε), and mixed forms (Y = X·ε). While traditional tests (Pearson correlation, Spearman rank correlation, Kendall’s τ) perform well for linear relationships, they lose power dramatically for the non‑monotone cases. In contrast, the LIS‑based test retains high power (often > 0.7) even when the signal‑to‑noise ratio is modest, demonstrating its sensitivity to complex dependencies that are invisible to correlation‑based methods.
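A power study of this kind can be sketched with a small Monte Carlo experiment. The code below is illustrative only: the sample generators, the noise level, the number of replications, and the Monte Carlo two-sided p-value (twice the smaller tail, capped at 1) are all assumptions rather than the paper's settings, and the null distribution is simulated instead of read from exact tables.

```python
import random
from bisect import bisect_left


def lis_length(seq):
    """Length of the longest strictly increasing subsequence."""
    tails = []
    for value in seq:
        k = bisect_left(tails, value)
        if k == len(tails):
            tails.append(value)
        else:
            tails[k] = value
    return len(tails)


def lis_statistic(x, y):
    """LIS of the y values read in x-sorted order (same as the LIS of their ranks)."""
    return lis_length([yi for _, yi in sorted(zip(x, y))])


def simulate_null(n, reps, rng):
    """Simulated values of L_n for uniform random permutations."""
    perm = list(range(n))
    values = []
    for _ in range(reps):
        rng.shuffle(perm)
        values.append(lis_length(perm))
    return values


def mc_p_value(observed, null_values):
    """Two-sided Monte Carlo p-value: twice the smaller tail, capped at 1."""
    reps = len(null_values)
    upper = (1 + sum(v >= observed for v in null_values)) / (reps + 1)
    lower = (1 + sum(v <= observed for v in null_values)) / (reps + 1)
    return min(1.0, 2 * min(upper, lower))


def estimate_power(make_sample, n, reps, null_values, alpha, rng):
    """Fraction of simulated samples for which the test rejects at level alpha."""
    rejections = 0
    for _ in range(reps):
        x, y = make_sample(n, rng)
        if mc_p_value(lis_statistic(x, y), null_values) <= alpha:
            rejections += 1
    return rejections / reps


def independent_sample(n, rng):
    """X and Y independent uniforms (the null hypothesis)."""
    return [rng.random() for _ in range(n)], [rng.random() for _ in range(n)]


def linear_sample(n, rng, slope=1.0, noise_sd=0.1):
    """Y = slope * X + Gaussian noise (hypothetical parameter values)."""
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    y = [slope * xi + rng.gauss(0.0, noise_sd) for xi in x]
    return x, y


if __name__ == "__main__":
    rng = random.Random(0)
    null_values = simulate_null(30, 2000, rng)
    print("size :", estimate_power(independent_sample, 30, 200, null_values, 0.05, rng))
    print("power:", estimate_power(linear_sample, 30, 200, null_values, 0.05, rng))
```

Swapping in another generator in place of `linear_sample` gives a rough power estimate for that alternative under the same assumptions.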
The authors acknowledge that exact distribution tables become computationally burdensome for very large n. To address this, they report that Lₙ, once centered at μₙ≈2√n and scaled by σₙ, is approximately normally distributed, where the variance σₙ² follows the known asymptotic form σₙ²≈c·n^{1/3}. This normal approximation enables rapid p‑value calculation for large samples without Monte‑Carlo simulation.
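For background, the classical large-n result for the LIS of a uniform random permutation (the Baik–Deift–Johansson theorem) is written out below. It is stated here as context for the n^{1/3} variance scaling, not as the paper's own derivation. Note that the limiting law F₂ is the Tracy–Widom distribution rather than a Gaussian, so any normal approximation should be read as a rough one. The constants c₁ ≈ −1.7711 and c₀ ≈ 0.8132 are the mean and variance of F₂.

```latex
\[
\lim_{n\to\infty} \Pr\!\left( \frac{L_n - 2\sqrt{n}}{n^{1/6}} \le t \right) = F_2(t),
\qquad
\mathbb{E}[L_n] = 2\sqrt{n} + c_1\, n^{1/6} + o\!\left(n^{1/6}\right),
\qquad
\operatorname{Var}(L_n) = c_0\, n^{1/3} + o\!\left(n^{1/3}\right).
\]
```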
Limitations are discussed: the method requires continuous variables (ties would break the uniform permutation assumption), and the current theory is limited to the bivariate case. The authors sketch a possible multivariate extension by constructing a multi‑dimensional permutation and aggregating LIS lengths across dimensions, but a rigorous treatment is left for future work.
In summary, the paper contributes a theoretically sound, exact‑distribution‑based independence test that is computationally efficient (O(n log n)) and particularly powerful against non‑linear, non‑monotone alternatives. By grounding the test in permutation combinatorics and providing practical implementation details (lookup tables, normal approximations, simulation evidence), the authors deliver a valuable tool for statisticians and data scientists seeking robust, distribution‑free independence assessments beyond the reach of classical correlation measures.