Distribution-free two-sample testing with blurred total variation distance
Two-sample testing, where we aim to determine whether two distributions are equal based on samples from each one, is challenging if we cannot place assumptions on the properties of the two distributions. In particular, certifying equality of distributions, or even providing a tight upper bound on the total variation (TV) distance between them, is impossible in a distribution-free regime. In this work, we examine the blurred TV distance, a relaxation of TV distance that enables inference without assumptions on the distributions. We provide theoretical guarantees for distribution-free upper and lower bounds on the blurred TV distance, and examine its properties in high dimensions.
💡 Research Summary
The paper tackles the classic two‑sample testing problem—deciding whether two unknown distributions P and Q are identical—in a fully distribution‑free setting, i.e., without any smoothness, moment, or support assumptions. It first revisits the total variation (TV) distance, a natural metric for hypothesis testing, and proves a fundamental impossibility result (Theorem 1.1): any distribution‑free upper confidence bound for TV must be trivial (essentially always equal to 1), because finite samples from continuous distributions are almost surely disjoint, leaving open the possibility that P and Q are mutually singular. While lower bounds are easy to obtain (by evaluating probability differences on a fixed set), this asymmetry makes TV unsuitable for non‑parametric, assumption‑free inference.
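To make the disjointness argument concrete, here is a tiny numerical sketch (our own illustration, not code from the paper): two finite samples drawn from the same continuous distribution share no values almost surely, so the data can never rule out a mutually singular pair with TV equal to 1.

```python
# Two samples from the SAME continuous distribution are disjoint almost
# surely, so the observed data are also perfectly consistent with P and Q
# being mutually singular (TV = 1) -- the crux of the impossibility result.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)  # sample from P = N(0, 1)
y = rng.normal(size=1000)  # sample from Q = N(0, 1), the same distribution

print(len(np.intersect1d(x, y)))  # prints 0: the empirical supports are disjoint
```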
To overcome this, the authors introduce the blurred TV distance dₕTV(P,Q) = TV(P * ψₕ, Q * ψₕ), where ψₕ is a scaled kernel (typically Gaussian) and “*” denotes convolution. Intuitively, both distributions are passed through the same smoothing operation before measuring TV, which mitigates the pathological singularity issue. They establish several key properties: (i) dₕTV ≤ TV for any bandwidth h; (ii) as h → 0, dₕTV converges to the original TV; (iii) as h → ∞, dₕTV collapses to zero; (iv) the map h ↦ dₕTV is continuous; and (v) for the Gaussian kernel, the distance is monotone decreasing in h.
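These properties are easy to verify numerically in one dimension. The sketch below (our illustration; the helper name blurred_tv_gaussian is ours) uses the fact that convolving a Gaussian with the Gaussian kernel ψₕ = N(0, h²) simply inflates its variance, so dₕTV between two unit-variance Gaussians has a closed form.

```python
# Minimal 1-D check of properties (ii), (iii), and (v) for the Gaussian kernel.
# Convolving N(mu, 1) with psi_h = N(0, h^2) gives N(mu, 1 + h^2), and the TV
# distance between two Gaussians with common variance s^2 and means 0, mu is
# 2 * Phi(|mu| / (2s)) - 1.
import numpy as np
from scipy.stats import norm

def blurred_tv_gaussian(mu, h):
    """d_hTV(N(0,1), N(mu,1)) under Gaussian blurring with bandwidth h."""
    s = np.sqrt(1.0 + h**2)
    return 2.0 * norm.cdf(abs(mu) / (2.0 * s)) - 1.0

for h in [0.0, 0.5, 1.0, 2.0, 10.0]:
    print(f"h = {h:5.1f}   d_hTV = {blurred_tv_gaussian(2.0, h):.4f}")
# The values decrease monotonically in h, equal TV(P, Q) at h = 0,
# and tend to 0 as h grows.
```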
The paper then studies empirical approximations. Let 𝑃̂ₙ and 𝑄̂ₘ be the empirical measures based on n and m samples. Theorem 2.3 shows that for any fixed h > 0, the expected blurred TV between the empirical and true distribution vanishes as n→∞. Proposition 2.4 quantifies the bias of the plug‑in estimator dₕTV(𝑃̂ₙ, 𝑄̂ₘ) by a term Δₙ,ₘ,ₕ that depends on the variability within each sample and can be estimated via sample splitting. As n,m grow, Δₙ,ₘ,ₕ→0.
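As a concrete (hypothetical) implementation of the plug-in estimator in one dimension: blurring an empirical measure with a Gaussian ψₕ yields a kernel density estimate, so dₕTV(𝑃̂ₙ, 𝑄̂ₘ) is half the L1 distance between two KDEs, which we approximate on a grid. The same helper also gives sample-splitting estimates of the within-sample variability entering Δₙ,ₘ,ₕ.

```python
# A 1-D sketch of the plug-in estimator d_hTV(P_hat_n, Q_hat_m), assuming a
# Gaussian kernel (our implementation, not the paper's code).
import numpy as np
from scipy.stats import norm

def plug_in_blurred_tv(x, y, h, grid_size=4000):
    """Half the L1 distance between the two bandwidth-h KDEs, on a grid."""
    lo = min(x.min(), y.min()) - 6.0 * h
    hi = max(x.max(), y.max()) + 6.0 * h
    t = np.linspace(lo, hi, grid_size)
    p = norm.pdf(t[:, None], loc=x[None, :], scale=h).mean(axis=1)  # P_hat_n * psi_h
    q = norm.pdf(t[:, None], loc=y[None, :], scale=h).mean(axis=1)  # Q_hat_m * psi_h
    return 0.5 * np.abs(p - q).sum() * (t[1] - t[0])

rng = np.random.default_rng(1)
x, y = rng.normal(0.0, 1.0, 500), rng.normal(2.0, 1.0, 500)
print("plug-in estimate:", plug_in_blurred_tv(x, y, h=1.0))

# Sample splitting: the blurred TV between two halves of the SAME sample
# estimates the within-sample variability driving the bias Delta_{n,m,h}.
print("within-P:", plug_in_blurred_tv(x[:250], x[250:], h=1.0))
print("within-Q:", plug_in_blurred_tv(y[:250], y[250:], h=1.0))
```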
Building on these results, the authors construct distribution‑free confidence intervals. Using McDiarmid’s inequality to control concentration of the empirical blurred TV, they define an upper confidence bound (DF‑UCB)
Ûα = dₕTV(𝑃̂ₙ, 𝑄̂ₘ) + εₙ,ₘ,α,
and a lower confidence bound (DF‑LCB)
L̂α = max{0, dₕTV(𝑃̂ₙ, 𝑄̂ₘ) − dₕTV(𝑃̂ₙ^{(1)}, 𝑃̂ₙ^{(2)}) − dₕTV(𝑄̂ₘ^{(1)}, 𝑄̂ₘ^{(2)}) − 3εₙ,ₘ,α},
where εₙ,ₘ,α = √{(log(1/α)/2)(1/n + 1/m)}. Both bounds hold uniformly over all dimensions, all distributions, and any sample sizes, providing a non‑asymptotic, assumption‑free guarantee.
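Translating the two displayed formulas into code is straightforward. The sketch below is our own (it reuses the hypothetical plug_in_blurred_tv helper from the earlier sketch, and the choice of half-half splits is an assumption on our part).

```python
# DF-UCB and DF-LCB per the displayed formulas, reusing plug_in_blurred_tv
# from the previous sketch.  How the paper splits each sample is our
# assumption; we simply use the first and second halves.
import numpy as np

def epsilon(n, m, alpha):
    """McDiarmid half-width: sqrt((log(1/alpha)/2) * (1/n + 1/m))."""
    return np.sqrt(0.5 * np.log(1.0 / alpha) * (1.0 / n + 1.0 / m))

def df_bounds(x, y, h, alpha=0.05):
    n, m = len(x), len(y)
    eps = epsilon(n, m, alpha)
    d_xy = plug_in_blurred_tv(x, y, h)
    d_xx = plug_in_blurred_tv(x[: n // 2], x[n // 2:], h)  # within-sample terms
    d_yy = plug_in_blurred_tv(y[: m // 2], y[m // 2:], h)
    ucb = min(1.0, d_xy + eps)                 # clip at 1 since TV <= 1
    lcb = max(0.0, d_xy - d_xx - d_yy - 3.0 * eps)
    return lcb, ucb

rng = np.random.default_rng(2)
x, y = rng.normal(0.0, 1.0, 1000), rng.normal(2.0, 1.0, 1000)
print(df_bounds(x, y, h=1.0))  # (lower, upper) bounds on d_hTV(P, Q)
```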
Computationally, exact evaluation of dₕTV(𝑃̂ₙ, 𝑄̂ₘ) requires integrating over ℝᵈ, which is prohibitive in high dimensions. The authors propose a Monte‑Carlo approximation: draw independent noise vectors from ψₕ, add them to the original samples, and estimate TV on the resulting noisy samples using standard empirical TV estimators. This approximation preserves the distribution‑free coverage because the randomness is independent of the data and can be incorporated into the confidence bound via an additional concentration term.
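In one dimension the scheme can be sketched as follows (a naive stand-in on our part: we bin the noisy samples and take half the L1 distance between the binned frequencies, whereas the paper refers generically to standard empirical TV estimators).

```python
# Monte-Carlo approximation of d_hTV: perturb each sample with fresh Gaussian
# noise drawn from psi_h, then apply a simple empirical TV estimator to the
# noisy data.  The binned estimator below is our naive stand-in.
import numpy as np

def mc_blurred_tv(x, y, h, bins=50, seed=0):
    rng = np.random.default_rng(seed)            # noise independent of the data
    xn = x + rng.normal(scale=h, size=x.shape)   # X_i + noise ~ P * psi_h
    yn = y + rng.normal(scale=h, size=y.shape)   # Y_j + noise ~ Q * psi_h
    edges = np.linspace(min(xn.min(), yn.min()), max(xn.max(), yn.max()), bins + 1)
    px, _ = np.histogram(xn, bins=edges)
    py, _ = np.histogram(yn, bins=edges)
    return 0.5 * np.abs(px / len(xn) - py / len(yn)).sum()

rng = np.random.default_rng(3)
x, y = rng.normal(0.0, 1.0, 2000), rng.normal(2.0, 1.0, 2000)
print(mc_blurred_tv(x, y, h=1.0))
```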
The high‑dimensional analysis (Section 4) reveals that the difficulty of inference depends not on the ambient dimension d but on the intrinsic dimension of the data manifold. When P and Q are supported on a low‑dimensional subspace, the required sample size scales with that intrinsic dimension, allowing meaningful testing even when d≫n,m. This contrasts with classical TV‑based tests, whose power deteriorates rapidly as d grows.
Finally, the paper situates blurred TV within a broader literature: it connects to contraction properties of additive‑noise channels in information theory, to smoothed integral probability metrics, and to recent work on MMD and Wasserstein distances. Unlike those alternatives, blurred TV retains a direct interpretation in terms of hypothesis testing (as a smoothed version of TV) while being statistically tractable without any structural assumptions. The authors conclude that blurred TV offers a principled compromise—preserving much of TV’s interpretability while enabling finite‑sample, distribution‑free inference—and opens avenues for robust, high‑dimensional two‑sample testing.