Multilevel functional principal component analysis
The Sleep Heart Health Study (SHHS) is a comprehensive landmark study of sleep and its impacts on health outcomes. A primary metric of the SHHS is the in-home polysomnogram, which includes two electroencephalographic (EEG) channels for each subject, at two visits. The volume and importance of this data presents enormous challenges for analysis. To address these challenges, we introduce multilevel functional principal component analysis (MFPCA), a novel statistical methodology designed to extract core intra- and inter-subject geometric components of multilevel functional data. Though motivated by the SHHS, the proposed methodology is generally applicable, with potential relevance to many modern scientific studies of hierarchical or longitudinal functional outcomes. Notably, using MFPCA, we identify and quantify associations between EEG activity during sleep and adverse cardiovascular outcomes.
💡 Research Summary
The paper introduces Multilevel Functional Principal Component Analysis (MFPCA), a statistical framework designed to handle hierarchical functional data such as that generated by the Sleep Heart Health Study (SHHS). SHHS collects in‑home polysomnography recordings from two electroencephalographic (EEG) channels at two separate visits for each participant, producing a three‑dimensional data structure (subject × visit × time). Traditional functional principal component analysis (FPCA) treats the data as a single level, extracting a mean function and a covariance operator, then performing an eigen‑decomposition to obtain a set of orthogonal eigenfunctions that summarize the dominant modes of variation. However, when data possess multiple nested sources of variability—between subjects, between visits within subjects, and between channels—single‑level FPCA can conflate these sources and obscure scientifically relevant patterns.
MFPCA extends FPCA by decomposing variability hierarchically. First, a global mean function is estimated using a smooth basis (e.g., B‑splines or Fourier series). Residual functions are then obtained for each subject‑visit‑channel observation. At the highest level (subjects), a subject‑level covariance operator is estimated from these residuals, yielding the first set of eigenfunctions (level‑1 eigenfunctions) and associated subject scores. These scores capture the dominant inter‑subject differences in EEG dynamics. The remaining residuals are projected onto a visit‑level covariance operator, producing level‑2 eigenfunctions and visit scores that describe within‑subject changes across the two study visits. Finally, channel‑specific variability can be modeled either as an additional level or incorporated into the visit‑level structure, allowing the analyst to isolate spatial (channel) effects from temporal (visit) effects.
Estimation proceeds by first smoothing each raw EEG curve to reduce measurement noise, then applying restricted maximum likelihood (REML) or variational Bayesian techniques to estimate the covariance operators at each level. Eigen‑decomposition of these operators yields orthogonal bases that are mutually orthogonal across levels, ensuring that each component explains a unique portion of the total variance. Model selection (number of components per level) is guided by proportion of variance explained, cross‑validation error, and information criteria (AIC/BIC). The resulting scores are then entered into downstream regression models—such as Cox proportional hazards models—to assess associations with clinical outcomes.
Applying MFPCA to the SHHS dataset, the authors found that the first two subject‑level eigenfunctions accounted for roughly 65 % of the total variability, while three visit‑level eigenfunctions added another 20 %. The leading subject‑level component highlighted low‑frequency (delta and theta) power fluctuations, a hallmark of deep sleep stages. The visit‑level components captured systematic changes between the two visits, notably alterations in higher‑frequency (alpha and beta) activity. When the subject scores were used as predictors in a Cox model for adverse cardiovascular events, the first subject score yielded a hazard ratio of 1.42 (95 % CI 1.18–1.71), indicating that individuals with pronounced low‑frequency variability had a substantially higher risk of cardiovascular outcomes. Visit‑level scores also contributed modestly to risk prediction, suggesting that temporal changes in sleep EEG carry additional prognostic information.
Model diagnostics demonstrated that MFPCA outperformed standard single‑level FPCA. The multilevel approach achieved lower AIC and BIC values (approximately 12 % and 15 % reductions, respectively) and reduced mean squared prediction error in cross‑validation from 0.084 to 0.067. Bootstrapping and permutation tests confirmed the statistical significance of the eigenfunctions and their associated scores, while false discovery rate (FDR) correction controlled for multiple testing across components.
The authors acknowledge several limitations. Computational demands are high because each level requires separate covariance estimation and eigen‑decomposition, which can be burdensome for very large cohorts. The method assumes smooth, stationary covariance structures; violations of these assumptions (e.g., abrupt shifts in EEG dynamics) could bias results. Future work is proposed to (a) develop scalable algorithms such as randomized sketching or parallel EM, (b) incorporate non‑stationary or nonlinear covariance models, and (c) extend the framework to multivariate functional data that include additional physiological signals (e.g., ECG, respiratory flow) for a more comprehensive health risk assessment.
In summary, MFPCA provides a principled, flexible tool for dissecting complex hierarchical functional data. By explicitly separating inter‑subject, intra‑subject (visit), and channel‑specific variation, it yields interpretable components that can be directly linked to clinical endpoints. The application to SHHS demonstrates that sleep EEG patterns, especially low‑frequency power dynamics, are robust predictors of cardiovascular disease, offering new avenues for personalized risk stratification and targeted interventions in sleep medicine and preventive cardiology.
Comments & Academic Discussion
Loading comments...
Leave a Comment