Functional-SVD for Heterogeneous Trajectories: Case Studies in Health
Trajectory data, including time series and longitudinal measurements, are increasingly common in health-related domains such as biomedical research and epidemiology. Real-world trajectory data frequently exhibit heterogeneity across subjects such as patients, sites, and subpopulations, yet many traditional methods are not designed to accommodate such heterogeneity in data analysis. To address this, we propose a unified framework, termed Functional Singular Value Decomposition (FSVD), for statistical learning with heterogeneous trajectories. We establish the theoretical foundations of FSVD and develop a corresponding estimation algorithm that accommodates noisy and irregular observations. We further adapt FSVD to a wide range of trajectory-learning tasks, including dimension reduction, factor modeling, regression, clustering, and data completion, while preserving its ability to account for heterogeneity, leverage inherent smoothness, and handle irregular sampling. Through extensive simulations, we demonstrate that FSVD-based methods consistently outperform existing approaches across these tasks. Finally, we apply FSVD to a COVID-19 case-count dataset and electronic health record datasets, demonstrating its effectiveness for global and subgroup pattern discovery and for factor analysis.
💡 Research Summary
The paper introduces Functional Singular Value Decomposition (FSVD), a unified statistical framework designed to handle heterogeneous trajectory data that are common in health‑related fields such as epidemiology and intensive‑care medicine. Traditional functional principal component analysis (FPCA) assumes that all subjects share the same mean and covariance functions, an assumption often violated when trajectories differ across regions, patient sub‑populations, or clinical features. FSVD removes this homogeneity requirement by representing each trajectory as the series $X_i(t)=\sum_{r\ge 1}\rho_r a_{ir}\phi_r(t)$, where the $\phi_r(t)$ are orthonormal singular functions (time domain), the vectors $a_r=(a_{1r},\dots,a_{nr})^\top$, with one entry per subject, are orthonormal singular vectors (subject/feature domain), and the $\rho_r$ are singular values. The authors prove the existence of such a decomposition in a Hilbert‑space setting without assuming that subjects are identically distributed.
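To make the decomposition concrete, the sketch below computes a discretized version for the simplest case: noise-free trajectories on a common, equally spaced grid over [0, 1]. It assumes the $L^2$ inner product is approximated by a Riemann sum with grid spacing h, in which case the $\rho_r$, $a_r$, and $\phi_r$ follow from an ordinary SVD of $\sqrt{h}\,X$. This illustrates the representation only; it is not the authors' estimator, which targets noisy, irregular observations, and the function name `discrete_fsvd` is hypothetical.

```python
import numpy as np

def discrete_fsvd(X, t):
    """Discretized FSVD of trajectories X (n x T) on a common, equally spaced grid t.

    The L2[0, 1] inner product is approximated by a Riemann sum with weight h,
    so the decomposition reduces to an SVD of sqrt(h) * X (assumption of this sketch).
    """
    h = t[1] - t[0]
    U, s, Vt = np.linalg.svd(np.sqrt(h) * X, full_matrices=False)
    rho = s                      # singular values rho_r
    a = U                        # columns: singular vectors a_r, orthonormal in R^n
    phi = Vt.T / np.sqrt(h)      # columns: phi_r on the grid, orthonormal under h * sum
    return rho, a, phi, h

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 101)
n = 20
# Exactly rank-2 trajectories with subject-specific loadings (no common mean assumed).
X = (rng.standard_normal((n, 1)) * np.sin(2 * np.pi * t)
     + rng.standard_normal((n, 1)) * np.cos(2 * np.pi * t))
rho, a, phi, h = discrete_fsvd(X, t)
R = 2
X_hat = (a[:, :R] * rho[:R]) @ phi[:, :R].T   # X_i(t_j) ~ sum_r rho_r a_ir phi_r(t_j)
```

Because the simulated trajectories are exactly rank 2, the first two components reproduce X to numerical precision, and the columns of `phi` satisfy the discrete orthonormality $h\sum_j \phi_r(t_j)\phi_s(t_j)=\delta_{rs}$.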
To make FSVD practical, they develop an alternating‑minimization algorithm that works with noisy, irregularly sampled observations. The procedure alternates between updating the singular functions (using smooth basis expansions such as B‑splines or Fourier) and the singular vectors, while enforcing orthogonality via QR or Gram‑Schmidt steps. Observation‑specific weighting handles irregular time grids, and L2 regularization promotes smoothness and guards against over‑fitting.
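The alternating scheme can be sketched as follows, under several assumptions not taken from the paper: a Fourier basis on [0, 1] (chosen over B-splines here because it is orthonormal, so orthonormal coefficient vectors give orthonormal functions), per-observation weights $1/m_i$ so that each subject counts equally regardless of how many time points it has, a small ridge penalty `lam` on the coefficients, and a single QR-plus-SVD orthonormalization at the end rather than at every iteration. All function names are hypothetical; this is a minimal sketch in the spirit of the described procedure, not the authors' algorithm.

```python
import numpy as np

def fourier_basis(t, K):
    """K Fourier functions on [0, 1], orthonormal in L2[0, 1]; returns len(t) x K."""
    t = np.asarray(t, dtype=float)
    cols, k = [np.ones_like(t)], 1
    while len(cols) < K:
        cols.append(np.sqrt(2) * np.cos(2 * np.pi * k * t))
        if len(cols) < K:
            cols.append(np.sqrt(2) * np.sin(2 * np.pi * k * t))
        k += 1
    return np.column_stack(cols)

def fsvd_als(times, values, R, K=7, lam=1e-6, n_iter=200, seed=0):
    """Rank-R alternating least squares for irregularly sampled trajectories.

    times[i], values[i]: 1-D arrays with subject i's time points (in [0, 1]) and
    noisy observations. Returns singular values, singular vectors (n x R), and
    basis coefficients (K x R) of the singular functions.
    """
    rng = np.random.default_rng(seed)
    n = len(times)
    B = [fourier_basis(t, K) for t in times]              # subject-specific design
    w = [np.full(len(t), 1.0 / len(t)) for t in times]    # assumed weights: 1 / m_i
    W, y = np.concatenate(w), np.concatenate(values)
    C = rng.standard_normal((K, R))                       # unnormalized function coefficients
    A = np.zeros((n, R))
    for _ in range(n_iter):
        # (1) Loadings: weighted ridge regression of each subject's data on B_i C.
        for i in range(n):
            Z = B[i] @ C
            G = Z.T @ (w[i][:, None] * Z) + lam * np.eye(R)
            A[i] = np.linalg.solve(G, Z.T @ (w[i] * values[i]))
        # (2) Functions: y_ij ~ kron(a_i, b(t_ij)) . vec(C), solved jointly for vec(C).
        D = np.vstack([np.kron(A[i][None, :], B[i]) for i in range(n)])
        G = D.T @ (W[:, None] * D) + lam * np.eye(K * R)
        C = np.linalg.solve(G, D.T @ (W * y)).reshape(K, R, order="F")
    # (3) Orthonormalize A C^T via QR of each factor and an R x R SVD.
    Qa, Ra = np.linalg.qr(A)
    Qc, Rc = np.linalg.qr(C)
    U0, rho, V0t = np.linalg.svd(Ra @ Rc.T)
    return rho, Qa @ U0, Qc @ V0t.T

# Usage on simulated data: 40 subjects, each with 8-15 random time points.
rng = np.random.default_rng(1)
n = 40
A_true, _ = np.linalg.qr(rng.standard_normal((n, 2)))     # orthonormal a_1, a_2
rho_true = np.array([3.0, 1.5])
times, values = [], []
for i in range(n):
    t = np.sort(rng.uniform(0, 1, rng.integers(8, 16)))
    phi = fourier_basis(t, 3)[:, 1:]                       # sqrt(2)cos, sqrt(2)sin
    times.append(t)
    values.append(phi @ (rho_true * A_true[i]) + 0.05 * rng.standard_normal(t.size))
rho, a, coef = fsvd_als(times, values, R=2, K=5)
```

Each subject is observed at its own random times, yet the fit pools information through the shared coefficient matrix `C`; on this simulated example the recovered `rho` should be close to the generating values (3, 1.5).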
A central conceptual contribution is the introduction of “intrinsic basis functions” (capturing temporal structure) and “intrinsic basis vectors” (capturing subject/feature structure). This dual basis lets FSVD serve both as a functional dimension‑reduction tool (like FPCA) and as a factor model for multivariate time series. Consequently, with task‑specific adaptations that retain its handling of heterogeneity, smoothness, and irregular sampling, FSVD extends to a suite of learning tasks: (1) functional completion (imputing missing time points), (2) clustering (using singular vectors to discover subgroups), (3) regression (using singular vectors as predictors), (4) factor modeling (dynamic latent factors that evolve over time), and (5) matrix/tensor completion.
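As an illustration of the clustering use, the sketch below computes subject scores $\rho_r a_{ir}$ from a discretized decomposition of dense, regularly sampled curves and clusters them with plain k-means. Its assumptions are not from the paper: a common equally spaced grid, an SVD of the uncentered data matrix (mirroring the decomposition of $X_i(t)$ itself), and a farthest-first k-means written out for self-containment rather than any specific clustering routine the authors use. The same score matrix could instead be passed to a regression model as covariates. Function names are hypothetical.

```python
import numpy as np

def fsvd_scores(X, t, R):
    """Subject scores rho_r * a_ir from a discretized decomposition on a common grid."""
    h = t[1] - t[0]
    U, s, _ = np.linalg.svd(np.sqrt(h) * X, full_matrices=False)
    return U[:, :R] * s[:R]

def kmeans(Z, k, n_iter=50, seed=0):
    """Lloyd's k-means on the rows of Z with farthest-first initialization."""
    rng = np.random.default_rng(seed)
    centers = [Z[rng.integers(len(Z))]]
    for _ in range(k - 1):
        # Next center: the point farthest from all centers chosen so far.
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each center to its cluster mean; keep it in place if the cluster is empty.
        centers = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)
n_per = 15
# Two subgroups with different shapes and subject-specific amplitudes, plus noise.
g0 = (1 + 0.1 * rng.standard_normal((n_per, 1))) * np.sin(2 * np.pi * t)
g1 = (1 + 0.1 * rng.standard_normal((n_per, 1))) * np.cos(2 * np.pi * t)
X = np.vstack([g0, g1]) + 0.05 * rng.standard_normal((2 * n_per, t.size))
labels = kmeans(fsvd_scores(X, t, R=2), k=2)
```

The two simulated subgroups follow sine- and cosine-shaped trajectories, so they share no common mean function; their scores fall into two well-separated clusters in the plane spanned by the first two singular vectors.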
Extensive simulations compare FSVD against FPCA and regularized factor models under varying degrees of heterogeneity, irregular sampling (30‑70% of observations missing), and signal‑to‑noise ratios. Across reconstruction error, Adjusted Rand Index for clustering, and mean‑squared error for regression, FSVD consistently outperforms the competitors, especially when heterogeneity is strong. The method’s robustness to irregular sampling and noise is also demonstrated.
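For reference, the Adjusted Rand Index used to score clustering compares two partitions through their contingency table and is corrected so that its expected value under random labeling is 0, while identical partitions (up to relabeling) score 1. A self-contained sketch of the standard formula follows; the function name is hypothetical, and in practice a library implementation such as scikit-learn's `adjusted_rand_score` would normally be used.

```python
from math import comb

import numpy as np

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index of two partitions of the same items."""
    _, ti = np.unique(np.asarray(labels_true), return_inverse=True)
    _, pi = np.unique(np.asarray(labels_pred), return_inverse=True)
    ti, pi = ti.ravel(), pi.ravel()
    table = np.zeros((ti.max() + 1, pi.max() + 1), dtype=int)
    np.add.at(table, (ti, pi), 1)                   # contingency counts n_ij

    def pairs(counts):
        # Number of unordered pairs, summed over the given counts.
        return sum(comb(int(c), 2) for c in np.ravel(counts))

    index = pairs(table)                            # sum_ij C(n_ij, 2)
    rows, cols = pairs(table.sum(axis=1)), pairs(table.sum(axis=0))
    expected = rows * cols / comb(len(ti), 2)
    max_index = (rows + cols) / 2
    if max_index == expected:   # degenerate, e.g. both put every item in one cluster
        return 1.0
    return (index - expected) / (max_index - expected)
```

For instance, splitting one of two true pairs into singletons, `[0, 0, 1, 1]` versus `[0, 0, 1, 2]`, gives 4/7 ≈ 0.571, while swapping the label names, `[0, 0, 1, 1]` versus `[1, 1, 0, 0]`, gives 1.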
Two real‑world applications illustrate practical impact. First, the authors analyze cumulative COVID‑19 case counts (log‑scaled per million) from 64 regions in 2020, each observed on an irregular 67‑day window. FSVD extracts a few dominant singular functions that capture the global epidemic trend, while the associated singular vectors clearly separate continents (Asia, North America, Europe) and reveal subgroup dynamics that FPCA fails to detect. Second, they apply FSVD to the MIMIC‑IV ICU dataset, focusing on 12 routinely measured clinical variables over a 580‑minute window. FSVD uncovers 3‑4 time‑varying latent factors reflecting coordinated physiological systems, achieves a 15% reduction in imputation RMSE compared with standard matrix‑completion methods, and provides factor scores that improve mortality risk prediction.
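Imputation comparisons like the one reported for MIMIC‑IV rest on a hold-out protocol: hide some observed cells, impute them, and compute RMSE only over the hidden cells. The sketch below shows that protocol under stated assumptions: cells are hidden completely at random, and the imputer is a plain rank-R iterative SVD (hard-impute style) standing in for a generic matrix-completion baseline. It is neither FSVD nor the authors' evaluation code, and the function names are hypothetical.

```python
import numpy as np

def iterative_svd_impute(Y, observed, R, n_iter=100):
    """Rank-R iterative SVD imputation: refit a truncated SVD, refill missing cells.

    Assumes every column keeps at least one observed cell (for the mean start).
    """
    col_means = np.nanmean(np.where(observed, Y, np.nan), axis=0)
    filled = np.where(observed, Y, col_means)       # start from column means
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :R] * s[:R]) @ Vt[:R]
        filled = np.where(observed, Y, low_rank)    # observed cells stay fixed
    return low_rank, col_means

def holdout_rmse(Y, estimate, held_out):
    """RMSE computed only over the cells that were hidden from the imputer."""
    return float(np.sqrt(np.mean((Y[held_out] - estimate[held_out]) ** 2)))

rng = np.random.default_rng(0)
n, T = 40, 30
X = rng.standard_normal((n, 2)) @ rng.standard_normal((2, T))   # rank-2 signal
Y = X + 0.1 * rng.standard_normal((n, T))                       # fully observed data
held_out = rng.uniform(size=(n, T)) < 0.3                       # hide ~30% of cells
low_rank, col_means = iterative_svd_impute(Y, ~held_out, R=2)
rmse_svd = holdout_rmse(Y, low_rank, held_out)
rmse_mean = holdout_rmse(Y, np.broadcast_to(col_means, Y.shape), held_out)
```

Column-mean imputation serves as a trivial reference point; any competing imputer, FSVD-based or otherwise, can be scored with the same `holdout_rmse` on the same `held_out` mask so that the comparison is like for like.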
Theoretical sections establish consistency, convergence rates, and optimality of the FSVD estimator under noisy, irregular designs. Supplementary material supplies algorithmic details, additional simulation results, and publicly available code (R package and GitHub repository).
In summary, FSVD offers a mathematically sound, computationally feasible, and highly versatile tool for analyzing heterogeneous, irregularly sampled trajectory data. By jointly modeling functional and tabular aspects, it bridges the gap between FPCA and traditional factor analysis, opening new avenues for robust health data analytics and beyond.