Inference for high dimensional repeated measure designs with the R package hdrm


Repeated-measure designs allow comparisons within a group as well as between groups, and are commonly referred to as split-plot designs. While originating in agricultural experiments, they are now widely used in medical research, psychology, and the life sciences, where repeated observations on the same subject are essential. Modern data collection often produces observation vectors whose dimension $d$ is comparable to or exceeds the sample size $N$. Although this can be advantageous in terms of cost efficiency, ethical considerations, and the study of rare diseases, it poses substantial challenges for statistical inference. Parametric methods based on multivariate normality provide a flexible framework that avoids restrictive assumptions on covariance structures or on the asymptotic relationship between $d$ and $N$. Within this framework, the freely available R package hdrm enables the analysis of a wide range of hypotheses concerning expectation vectors in high-dimensional repeated-measure designs, covering both single-group and multi-group settings with homogeneous or heterogeneous covariance matrices. This paper describes the implemented tests, demonstrates their use through examples, and discusses their applicability in practical high-dimensional data scenarios. To address the computational challenges that arise for large $d$, the package incorporates efficient estimators and subsampling strategies that substantially reduce computation time while preserving statistical validity.


💡 Research Summary

The manuscript addresses the challenging problem of statistical inference for high-dimensional repeated-measure (split-plot) designs, where the number of measurement variables $d$ can be comparable to or larger than the total sample size $N$. By assuming multivariate normality, the authors develop a flexible parametric framework that does not require restrictive covariance-structure assumptions or a fixed relationship between $d$ and $N$. Central to the approach is the construction of an ANOVA-type quadratic form $Q_N = N\bar X^{\top}T\bar X$, where $T$ is the projection matrix associated with the null hypothesis $H_0: H\mu = 0$ for a hypothesis matrix $H$. Because $Q_N$ may diverge with increasing dimension, a standardized statistic $e_{W,N}$ is introduced, whose expectation and variance can be expressed in terms of trace functionals of the unknown covariance matrices.
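As a minimal illustration of the quadratic form, the following Python sketch evaluates $Q_N = N\bar X^{\top}T\bar X$ for a single group. It assumes, purely for illustration, a "no time effect" hypothesis in which the centering matrix $P_d = I_d - J_d/d$ (itself a symmetric projection) plays the role of $T$. The function names and toy data are hypothetical and are not part of hdrm.

```python
def centering_projection(d):
    """Return P_d = I_d - J_d / d, used here as an illustrative choice of T."""
    return [[(1.0 if i == j else 0.0) - 1.0 / d for j in range(d)]
            for i in range(d)]


def mean_vector(X):
    """Column-wise mean of an N x d data matrix given as a list of rows."""
    n, d = len(X), len(X[0])
    return [sum(row[j] for row in X) / n for j in range(d)]


def quadratic_form(X, T):
    """Compute Q_N = N * xbar' T xbar for one group of N observations."""
    n = len(X)
    xbar = mean_vector(X)
    d = len(xbar)
    T_xbar = [sum(T[i][j] * xbar[j] for j in range(d)) for i in range(d)]
    return n * sum(a * b for a, b in zip(xbar, T_xbar))


# Toy data (assumed): 3 subjects measured at 4 time points.
X_flat = [[2.0, 2.0, 2.0, 2.0],
          [4.0, 4.0, 4.0, 4.0],
          [3.0, 3.0, 3.0, 3.0]]   # no change over time
X_trend = [[1.0, 2.0, 3.0, 4.0],
           [2.0, 3.0, 4.0, 5.0],
           [0.0, 1.0, 2.0, 3.0]]  # linear increase over time

T = centering_projection(4)
q_flat = quadratic_form(X_flat, T)
q_trend = quadratic_form(X_trend, T)
print(q_flat, q_trend)
```

With this choice of $T$, the statistic is zero when the mean profile is constant over time and grows as the profile departs from a constant, which is why it has to be centered and scaled before it can be compared with a reference distribution.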

A major contribution is the derivation of eight asymptotic regimes (I–VIII) that cover heterogeneous and homogeneous covariance settings, varying numbers of groups $a$, dimensions $d$, and sample sizes $n_i$. In each regime the limiting distribution of $e_{W,N}$ is either standard normal (when the leading eigenvalue of $T\Sigma_N T$ is negligible) or a standardized $\chi^2_1$ (when the leading eigenvalue dominates). To obtain a unified testing procedure, the authors employ the Pearson approximation, which matches the first three moments of $e_{W,N}$ and yields a statistic $K_{f_P}$ with a data-driven degrees-of-freedom parameter $f_P$. This approximation provides valid critical values irrespective of the eigenvalue configuration.
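The moment-matching idea can be sketched numerically. The example below assumes a simplified single-group setting with normal data and known eigenvalues $\lambda_1,\dots,\lambda_d$ of $T\Sigma T$, in which the quadratic form is distributed under the null hypothesis as $\sum_j \lambda_j C_j$ with independent $C_j \sim \chi^2_1$. Its standardized third moment is $8\sum_j\lambda_j^3 / (2\sum_j\lambda_j^2)^{3/2}$, while a standardized $\chi^2_f$ variable has third moment $\sqrt{8/f}$; equating the two gives $f = (\sum_j\lambda_j^2)^3 / (\sum_j\lambda_j^3)^2$. The function names are hypothetical, and in practice the traces would have to be estimated from data rather than computed from known eigenvalues.

```python
def pearson_df(eigenvalues):
    """Degrees of freedom f that match the third moment of a standardized
    chi-square(f) to that of a weighted sum of chi-square(1) variables.

    f = tr(M^2)^3 / tr(M^3)^2, with the traces written via eigenvalues of M.
    """
    tr2 = sum(lam ** 2 for lam in eigenvalues)
    tr3 = sum(lam ** 3 for lam in eigenvalues)
    return tr2 ** 3 / tr3 ** 2


def quadratic_form_skewness(eigenvalues):
    """Third standardized moment of sum_j lam_j * chi2_1 variables."""
    tr2 = sum(lam ** 2 for lam in eigenvalues)
    tr3 = sum(lam ** 3 for lam in eigenvalues)
    return 8.0 * tr3 / (2.0 * tr2) ** 1.5


def standardized_chi2_skewness(f):
    """Third standardized moment of (chi2_f - f) / sqrt(2 f)."""
    return (8.0 / f) ** 0.5


# Many equal eigenvalues: f equals the dimension, approaching the normal limit.
f_flat = pearson_df([1.0] * 50)

# A single eigenvalue: f = 1, the standardized chi-square(1) limit.
f_single = pearson_df([7.0])

# One dominant eigenvalue among small ones: f stays close to 1.
lam_spiked = [100.0, 1.0, 1.0, 1.0]
f_spiked = pearson_df(lam_spiked)

print(f_flat, f_single, f_spiked)
```

The two extreme cases line up with the two limiting distributions described above: a flat spectrum pushes $f$ toward infinity with growing dimension (normal limit), whereas a dominating leading eigenvalue pulls $f$ toward 1.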

The practical implementation hinges on consistent estimators for the required trace terms. For the single-group case, unbiased U-statistics $A_1, A_2, A_3$ are constructed. For multiple groups, analogous estimators $B_{i,1},\dots,B_{i,5}$ are built from within-group differences $Y_{i,\ell_1,\ell_2}=X_{i,\ell_1}-X_{i,\ell_2}$. Because exact computation of higher-order trace terms (especially $\operatorname{tr}$…
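To illustrate why differences of observations are useful here, the sketch below estimates the simplest trace term, $\operatorname{tr}(T\Sigma)$, from pairwise differences within one group. Since $E[(X_{\ell_1}-X_{\ell_2})(X_{\ell_1}-X_{\ell_2})^{\top}] = 2\Sigma$ regardless of the mean, averaging $Y^{\top}TY/2$ over all pairs is unbiased for $\operatorname{tr}(T\Sigma)$. This is an assumed, simplified estimator with hypothetical function names and toy data; it again uses the centering matrix $P_d$ as $T$ and is not claimed to coincide with the estimators $A_1$ or $B_{i,1}$ of the paper. Higher-order traces are not covered.

```python
from itertools import combinations


def centered_sq_norm(y):
    """Return y' P_d y, i.e. the squared norm of y after removing its mean."""
    m = sum(y) / len(y)
    return sum((v - m) ** 2 for v in y)


def trace_from_differences(X):
    """Average of Y' P_d Y / 2 over all pairs Y = X_a - X_b (a < b)."""
    pairs = list(combinations(range(len(X)), 2))
    total = 0.0
    for a, b in pairs:
        y = [u - v for u, v in zip(X[a], X[b])]
        total += centered_sq_norm(y) / 2.0
    return total / len(pairs)


def trace_from_sample_covariance(X):
    """tr(P_d S) with S the unbiased sample covariance, computed directly."""
    n, d = len(X), len(X[0])
    xbar = [sum(row[j] for row in X) / n for j in range(d)]
    total = sum(centered_sq_norm([x - m for x, m in zip(row, xbar)])
                for row in X)
    return total / (n - 1)


# Toy data (assumed): 4 subjects, 3 repeated measurements each.
X = [[1.0, 2.0, 4.0],
     [2.0, 0.0, 1.0],
     [0.0, 3.0, 3.0],
     [5.0, 1.0, 2.0]]

est_pairs = trace_from_differences(X)
est_cov = trace_from_sample_covariance(X)
print(est_pairs, est_cov)
```

Both routes give the same number, because the pairwise-difference average is algebraically identical to plugging in the unbiased sample covariance. The difference-based form has the advantage that it never requires the $d \times d$ covariance matrix to be formed explicitly.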

