A Bayesian Two-Sample Mean Test for High-Dimensional Data

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

We propose a two-sample Bayesian mean test based on the Bayes factor with non-informative priors, designed for settings where $p$ grows with $n$ at a linear rate, $p/n \to c_1 \in (0, \infty)$. We establish the asymptotic normality of the test statistic and derive its asymptotic power. Extensive simulations show that the proposed test performs competitively, particularly when the diagonal elements of the covariance matrix (the coordinate-wise variances) are heterogeneous and when sample sizes are small. The test also remains robust under distribution misspecification. It detects both sparse and non-sparse differences in mean vectors while keeping the type I error rate well controlled, even in small-sample scenarios. We also demonstrate the performance of the proposed test on the SRBCTs dataset.

💡 Research Summary

This paper addresses the challenging problem of testing equality of mean vectors from two high‑dimensional populations when the dimension p grows proportionally with the total sample size n (p/n → c₁ ∈ (0,∞)). Classical Hotelling’s T² becomes infeasible because the sample covariance matrix is singular when p > n, and many recent frequentist proposals either discard the covariance structure (using only diagonal elements) or rely on random projections that are not scale‑invariant.
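For reference, the singularity argument can be written out as follows. The notation is an assumption made for this sketch: $X_{ij}$ is the $j$-th observation in group $i$, $n_1$ and $n_2$ are the group sizes with $n = n_1 + n_2$, and $S_n$ is the pooled sample covariance.

$$
T^2 = \frac{n_1 n_2}{n}\,(\bar X_1 - \bar X_2)^\top S_n^{-1} (\bar X_1 - \bar X_2),
\qquad
S_n = \frac{1}{n-2}\sum_{i=1}^{2}\sum_{j=1}^{n_i} (X_{ij} - \bar X_i)(X_{ij} - \bar X_i)^\top .
$$

Each group contributes at most $n_i - 1$ linearly independent centered observations, so

$$
\operatorname{rank}(S_n) \le (n_1 - 1) + (n_2 - 1) = n - 2 .
$$

Hence $S_n$ is a $p \times p$ matrix of rank at most $n - 2$, and $S_n^{-1}$ does not exist once $p$ exceeds $n - 2$; in particular, it does not exist when $p > n$.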

The authors propose a fully Bayesian two‑sample mean test based on the Bayes factor with non‑informative priors. Under the null hypothesis H₀: μ₁ = μ₂ = μ and the alternative H₁: μ₁ ≠ μ₂, they place independent inverse‑Wishart priors W⁻¹ₚ(m₀,V) and W⁻¹ₚ(m₁,V), respectively, on the common covariance matrix Σ, with V taken as a scalar multiple of the identity (V = k′Iₚ). By integrating out μ₁, μ₂ and Σ, they obtain a closed‑form expression for the Bayes factor BF₁₀. The key term driving the decision is n₀ Dᵀ(Aₙ+V)⁻¹ D, where D = X̄₁ − X̄₂ is the difference of the sample mean vectors, n₀ = n₁n₂/n with n₁ and n₂ the two sample sizes, and Aₙ = (n − 2)Sₙ.
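A minimal sketch of the key term n₀ Dᵀ(Aₙ+V)⁻¹ D, written in plain Python so that it has no dependencies. It assumes that Sₙ is the pooled sample covariance, so that Aₙ is the pooled within-group scatter matrix. The function names (`key_term`, `scatter`, `solve`) and the default `k_prime=1.0` are illustrative choices, not taken from the paper. The sketch computes only this quadratic form, not the full Bayes factor BF₁₀.

```python
import random


def mean_vec(X):
    """Column means of a data matrix given as a list of rows."""
    n = len(X)
    return [sum(col) / n for col in zip(*X)]


def scatter(X):
    """Sum of outer products of the centered rows (a p x p matrix)."""
    m = mean_vec(X)
    p = len(m)
    S = [[0.0] * p for _ in range(p)]
    for row in X:
        c = [x - mj for x, mj in zip(row, m)]
        for i in range(p):
            for j in range(p):
                S[i][j] += c[i] * c[j]
    return S


def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    p = len(b)
    A = [row[:] + [bi] for row, bi in zip(M, b)]  # augmented matrix [M | b]
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        for r in range(k + 1, p):
            f = A[r][k] / A[k][k]
            for c in range(k, p + 1):
                A[r][c] -= f * A[k][c]
    x = [0.0] * p
    for k in reversed(range(p)):
        tail = sum(A[k][c] * x[c] for c in range(k + 1, p))
        x[k] = (A[k][p] - tail) / A[k][k]
    return x


def key_term(X1, X2, k_prime=1.0):
    """n0 * D^T (A_n + V)^{-1} D with V = k_prime * I_p (k_prime is assumed)."""
    n1, n2 = len(X1), len(X2)
    n0 = n1 * n2 / (n1 + n2)
    D = [a - b for a, b in zip(mean_vec(X1), mean_vec(X2))]
    p = len(D)
    S1, S2 = scatter(X1), scatter(X2)
    # Assumption: A_n = (n - 2) S_n equals the pooled within-group scatter.
    M = [[S1[i][j] + S2[i][j] + (k_prime if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    z = solve(M, D)  # z = (A_n + V)^{-1} D without forming the inverse
    return n0 * sum(d * zi for d, zi in zip(D, z))


# Example with p = 8 > n = 6: A_n alone is singular, but A_n + V is not.
random.seed(0)
X1 = [[random.gauss(0.5, 1.0) for _ in range(8)] for _ in range(3)]
X2 = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(3)]
print(key_term(X1, X2))
```

Because Aₙ is positive semidefinite and V = k′Iₚ is positive definite for k′ > 0, the sum Aₙ + V is invertible even when p > n, which is why the solve step succeeds where Sₙ⁻¹ would not exist.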

Because the full sample covariance Sₙ is ill‑conditioned in high dimensions, the authors replace it with its diagonal part. They define Λₙ = (diag(Sₙ)+kIₚ)⁻¹, a diagonal matrix that retains variance information on each coordinate while regularizing the inverse. The test statistic is then

T_BF,1 =
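The construction of Λₙ = (diag(Sₙ)+kIₚ)⁻¹ described above can be sketched in plain Python. The sketch assumes that Sₙ is the pooled sample covariance. The function names and the default `k=1.0` are hypothetical. The quadratic form n₀ DᵀΛₙD at the end is an assumed analogue of the key term with (Aₙ+V)⁻¹ replaced by Λₙ; it is not the paper's T_BF,1, whose centering and scaling are not given here.

```python
def pooled_variances(X1, X2):
    """Diagonal of the pooled sample covariance: one variance per coordinate."""
    n1, n2 = len(X1), len(X2)
    out = []
    for c1, c2 in zip(zip(*X1), zip(*X2)):
        m1, m2 = sum(c1) / n1, sum(c2) / n2
        ss = sum((x - m1) ** 2 for x in c1) + sum((x - m2) ** 2 for x in c2)
        out.append(ss / (n1 + n2 - 2))
    return out


def lambda_diag(X1, X2, k=1.0):
    """Diagonal entries of Lambda_n = (diag(S_n) + k I_p)^{-1}."""
    return [1.0 / (s + k) for s in pooled_variances(X1, X2)]


def diagonal_quadratic_form(X1, X2, k=1.0):
    """n0 * D^T Lambda_n D (assumed analogue, not the paper's T_BF,1)."""
    n1, n2 = len(X1), len(X2)
    n0 = n1 * n2 / (n1 + n2)
    lam = lambda_diag(X1, X2, k)
    total = 0.0
    for lam_j, c1, c2 in zip(lam, zip(*X1), zip(*X2)):
        d = sum(c1) / n1 - sum(c2) / n2  # j-th entry of D
        total += lam_j * d * d
    return n0 * total


# Tiny example: the second coordinate has zero pooled variance,
# and k > 0 keeps its entry of Lambda_n finite.
X1 = [[1.0, 0.0], [3.0, 0.0]]
X2 = [[0.0, 1.0], [0.0, 1.0]]
print(lambda_diag(X1, X2))
print(diagonal_quadratic_form(X1, X2))
```

Since Λₙ is diagonal, the computation needs only p coordinate-wise variances and no matrix inversion, so its cost grows linearly in p.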
