Hotelling’s test for highly correlated data
This paper is motivated by the analysis of gene expression sets, in particular by the problem of finding gene sets that are differentially expressed between two phenotypes. Gene $\log_2$ expression levels are highly correlated and, very likely, have an approximately normal distribution. Therefore, it seems reasonable to use the two-sample Hotelling’s test for such data. We discover some unexpected properties of the test that make it different from the majority of tests previously used for such data. It appears that Hotelling’s test does not always reach maximal power when all marginal distributions are differentially expressed. For highly correlated data, its maximal power is attained when about half of the marginal distributions are essentially different. When the correlation coefficient is greater than 0.5, the test is more powerful if only one marginal distribution is shifted, compared to the case when all marginal distributions are equally shifted. Moreover, when the correlation coefficient increases, the power of Hotelling’s test increases as well.
💡 Research Summary
The paper investigates the behavior of the two‑sample Hotelling’s T² test when applied to data sets in which the variables are highly correlated, a situation commonly encountered in gene‑expression studies. Gene expression levels, after log₂ transformation, are approximately normally distributed and often exhibit strong inter‑gene correlations. Under these conditions, the authors ask whether the classical intuition—that the test achieves maximal power when all marginal means are shifted equally—still holds.
To answer this, they adopt a compound‑symmetry covariance structure Σ = (1 − ρ)I + ρ 1 1ᵀ, where ρ is the common correlation coefficient and p the number of variables (genes). The two‑sample Hotelling statistic is
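\[
T^2 = \frac{n_1 n_2}{n_1 + n_2}\,(\bar{x}_1 - \bar{x}_2)^{\mathsf T} S^{-1} (\bar{x}_1 - \bar{x}_2),
\]
where $n_1$ and $n_2$ are the two sample sizes, $\bar{x}_1$ and $\bar{x}_2$ are the sample mean vectors, and $S$ is the pooled sample covariance matrix.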
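The claims in the abstract can be checked numerically with a minimal sketch. It assumes the compound‑symmetry covariance Σ = (1 − ρ)I + ρ 1 1ᵀ introduced above and a mean shift of size d in k of the p variables. Under normality with a common covariance, and for fixed p and sample sizes, the power of the two‑sample Hotelling test is an increasing function of the squared Mahalanobis distance Δ² = δᵀΣ⁻¹δ, so comparing Δ² across shift patterns is enough to rank them by power. The function names `mahalanobis_sq` and `best_k`, and the values p = 20, ρ = 0.3 and ρ = 0.8, are illustrative choices and do not come from the paper.

```python
def mahalanobis_sq(p, k, rho, d=1.0):
    """Squared Mahalanobis distance delta' Sigma^{-1} delta.

    Assumes Sigma = (1 - rho) I + rho 1 1' (compound symmetry) and a shift
    vector delta with k entries equal to d and p - k entries equal to 0.
    Uses the Sherman-Morrison form of the inverse:
        Sigma^{-1} = [I - rho / (1 + (p - 1) rho) * 1 1'] / (1 - rho)
    """
    if p < 2:
        raise ValueError("p must be at least 2")
    if not 0 <= k <= p:
        raise ValueError("k must lie between 0 and p")
    if not -1.0 / (p - 1) < rho < 1.0:
        raise ValueError("rho must keep Sigma positive definite")
    # delta'delta = k d^2 and (1'delta)^2 = k^2 d^2
    return d * d * (k - rho * k * k / (1.0 + (p - 1) * rho)) / (1.0 - rho)


def best_k(p, rho):
    """Number of shifted variables (1..p) that maximizes the distance."""
    return max(range(1, p + 1), key=lambda k: mahalanobis_sq(p, k, rho))


if __name__ == "__main__":
    p = 20  # illustrative gene-set size, not taken from the paper
    for rho in (0.3, 0.8):
        one = mahalanobis_sq(p, 1, rho)
        every = mahalanobis_sq(p, p, rho)
        print(f"rho={rho}: one shifted={one:.3f}, "
              f"all shifted={every:.3f}, best k={best_k(p, rho)}")
```

In this sketch, shifting a single variable gives a larger Δ² than shifting all p variables equally exactly when ρ > 0.5, and for large ρ the maximizing k sits near p/2. Both observations agree with the behavior described in the abstract.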