A geometric approach to maximum likelihood estimation of the functional principal components from sparse longitudinal data
In this paper, we consider the problem of estimating the eigenvalues and eigenfunctions of the covariance kernel (i.e., the functional principal components) from sparse and irregularly observed longitudinal data. We approach this problem through a maximum likelihood method, assuming that the covariance kernel is smooth and finite dimensional. We exploit the smoothness of the eigenfunctions to reduce dimensionality by restricting them to a lower-dimensional space of smooth functions. The estimation scheme is developed as a Newton-Raphson procedure, using the fact that the basis coefficients representing the eigenfunctions lie on a Stiefel manifold. We also address the selection of the number of basis functions and of the dimension of the covariance kernel via a second-order approximation to the leave-one-curve-out cross-validation score that is computationally very efficient. The effectiveness of our procedure is demonstrated by simulation studies and an application to a CD4 counts data set. In the simulation studies, our method performs well in both estimation and model selection. It also outperforms two existing approaches: one based on local polynomial smoothing of the empirical covariances, and another using an EM algorithm.
💡 Research Summary
The paper tackles the challenging problem of estimating the eigenvalues and eigenfunctions of a covariance kernel—i.e., the functional principal components (FPCs)—when the underlying longitudinal data are sparsely and irregularly observed. Classical functional principal component analysis (FPCA) assumes dense measurements per subject, which rarely holds in biomedical studies where each individual may contribute only a handful of observations. Existing sparse‑FPCA methods typically rely on smoothing the empirical covariance surface or on an Expectation‑Maximization (EM) algorithm that alternates between imputing missing values and updating the covariance structure. Both approaches, however, treat the eigenfunctions as unrestricted objects and therefore do not fully exploit their inherent smoothness and orthonormality.
The authors propose a maximum‑likelihood (ML) framework that explicitly incorporates smoothness by representing each eigenfunction as a linear combination of a pre‑specified set of smooth basis functions (e.g., B‑splines, Fourier series). The coefficient matrix that collects these basis weights is constrained to lie on a Stiefel manifold, the set of matrices with orthonormal columns, which guarantees the orthonormality of the estimated eigenfunctions by construction. This parametrisation reduces the infinite‑dimensional functional problem to a finite‑dimensional optimization over the Stiefel manifold, dramatically lowering the effective dimensionality.
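To make the parametrisation concrete, here is a minimal NumPy sketch (not the authors' code; the Fourier basis, grid, and dimensions K and L are illustrative choices): a K × L coefficient matrix with orthonormal columns, applied to an orthonormal basis, yields L eigenfunctions that are orthonormal in L2 by construction.

```python
import numpy as np

def fourier_basis(t, K):
    """K orthonormal Fourier basis functions on [0, 1], evaluated at points t."""
    cols = [np.ones_like(t)]
    for j in range(1, K):
        freq = (j + 1) // 2
        f = np.sin if j % 2 == 1 else np.cos
        cols.append(np.sqrt(2.0) * f(2 * np.pi * freq * t))
    return np.column_stack(cols)

rng = np.random.default_rng(0)
K, L = 7, 2                       # basis size and number of eigenfunctions (illustrative)
t = np.linspace(0.0, 1.0, 501)
Phi = fourier_basis(t, K)         # 501 x K basis matrix

# Project a random K x L coefficient matrix onto the Stiefel manifold via QR,
# so that B.T @ B = I_L exactly.
B, _ = np.linalg.qr(rng.standard_normal((K, L)))

eigfuns = Phi @ B                 # candidate eigenfunctions psi_l = Phi @ b_l

# Trapezoidal-rule weights to approximate L2 inner products on [0, 1]
w = np.full(t.size, t[1] - t[0])
w[0] = w[-1] = (t[1] - t[0]) / 2
G = (eigfuns * w[:, None]).T @ eigfuns
print(np.round(G, 3))             # close to the 2 x 2 identity
```

Because the basis is orthonormal, orthonormality of the coefficient columns transfers directly to the functions, which is exactly what the Stiefel constraint buys.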
Under the assumption that the observed measurements are conditionally Gaussian given the latent functional scores, the log‑likelihood can be expressed as a smooth function of the basis coefficients and the eigenvalues. To maximise this likelihood subject to the Stiefel constraint, the authors develop a Newton‑Raphson algorithm that operates directly on the manifold. The algorithm computes the Riemannian gradient and Hessian in the tangent space, performs a line‑search, and then retracts the updated point back onto the manifold. This geometric optimisation yields rapid convergence and avoids the instability often encountered with penalty‑based constrained optimisation.
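The project-step-retract cycle can be illustrated on a toy problem. The sketch below uses plain Riemannian gradient ascent with a QR retraction (a simplification of the paper's Newton‑Raphson scheme) to maximise tr(BᵀAB) over the Stiefel manifold, whose maximiser spans the top-L eigenvectors of A; the matrix A, step size, and iteration count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
K, L = 8, 2
M = rng.standard_normal((K, K))
A = M @ M.T                        # symmetric PSD stand-in for the objective's curvature
A /= np.linalg.norm(A, 2)          # normalise so the largest eigenvalue is 1

def retract(X):
    """QR retraction: map a perturbed point back onto the Stiefel manifold."""
    Q, R = np.linalg.qr(X)
    return Q * np.sign(np.diag(R))  # fix column signs for uniqueness

B, _ = np.linalg.qr(rng.standard_normal((K, L)))   # initial point, B.T @ B = I

for _ in range(1000):
    G = 2 * A @ B                                  # Euclidean gradient of tr(B' A B)
    sym = (B.T @ G + G.T @ B) / 2
    xi = G - B @ sym                               # projection onto the tangent space at B
    B = retract(B + 0.2 * xi)                      # step along xi, then retract

obj = float(np.trace(B.T @ A @ B))
top = float(np.sort(np.linalg.eigvalsh(A))[-L:].sum())
print(obj, top)   # objective approaches the sum of the top-L eigenvalues
```

The same skeleton carries over to a Newton method: one replaces the gradient step by a Newton step computed from the Riemannian Hessian in the tangent space, keeping the retraction.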
A second major contribution is an efficient model‑selection strategy for determining both the number of basis functions (K) and the intrinsic dimensionality of the covariance kernel (L). The authors adopt a leave‑one‑curve‑out (LOCO) cross‑validation (CV) criterion, which would normally require refitting the model for each omitted subject—a prohibitive cost for large datasets. They circumvent this by deriving a second‑order Taylor approximation of the LOCO CV score with respect to the omitted curve’s contribution. The approximation can be evaluated using quantities already computed in the Newton iterations, making the CV calculation essentially O(1) per curve. Consequently, a grid search over plausible (K, L) pairs becomes computationally feasible.
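The idea behind the second-order shortcut can be shown on a toy model (a one-dimensional Gaussian rather than the paper's functional model; all names here are illustrative). With the full-data MLE and Hessian in hand, one Newton step approximates each leave-one-out refit, so the cross-validation score is obtained without refitting:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(40) * 1.5 + 3.0   # synthetic "data" (illustrative)
n = len(x)

def nll_point(xi, mu, v):
    """Negative log-likelihood contribution of one observation (constants dropped)."""
    return 0.5 * np.log(v) + (xi - mu) ** 2 / (2 * v)

# Full-data MLE of mean and variance
mu, v = x.mean(), x.var()

# Hessian of the full log-likelihood at the MLE (block-diagonal for this model)
H = np.diag([-n / v, -n / (2 * v**2)])

exact_cv, approx_cv = 0.0, 0.0
for i in range(n):
    # Gradient of the held-out observation's log-likelihood at the full fit
    g = np.array([(x[i] - mu) / v,
                  -0.5 / v + (x[i] - mu) ** 2 / (2 * v**2)])
    # One Newton step from the full fit approximates the leave-one-out fit
    mu_a, v_a = np.array([mu, v]) + np.linalg.solve(H, g)
    approx_cv += nll_point(x[i], mu_a, v_a)

    # Exact refit without observation i, for comparison
    rest = np.delete(x, i)
    exact_cv += nll_point(x[i], rest.mean(), rest.var())

print(approx_cv, exact_cv)   # the two scores nearly coincide
```

The approximate refit reuses the gradient and Hessian already available from the full fit, which is what makes the per-curve cost essentially constant in the paper's setting as well.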
Simulation studies are conducted under a variety of sparsity regimes (average of 3–5 observations per subject) and noise levels. The proposed method consistently outperforms two benchmark approaches: (i) a local‑polynomial smoothing of the raw empirical covariances, and (ii) an EM‑based sparse FPCA algorithm. Performance metrics include mean squared error (MSE) for eigenfunction recovery, mean absolute error for eigenvalues, and the accuracy of selecting the correct dimensionality. The second‑order CV approximation matches the exact LOCO CV scores almost perfectly, confirming its reliability for model selection.
The methodology is illustrated on a real HIV dataset comprising CD4+ T‑cell counts measured at irregular clinic visits. The estimated first two functional principal components capture, respectively, the overall declining trend in immune status and the subject‑specific deviation patterns. The CV‑driven selection procedure identifies two components as sufficient, aligning with previous clinical analyses. Visualisation of the reconstructed individual trajectories demonstrates that the ML‑based reconstruction is smoother and more faithful to the observed data than the alternatives.
Key strengths of the work are: (1) a principled integration of smoothness and orthonormality via Stiefel‑manifold optimisation, (2) a fast Newton‑Raphson scheme that respects the manifold geometry, (3) an analytically tractable, low‑cost approximation to LOCO CV enabling simultaneous selection of basis size and functional rank, and (4) thorough empirical validation showing superior estimation and model‑selection performance. Limitations include reliance on a fixed basis set (the impact of basis choice is not exhaustively examined) and the Gaussian assumption for measurement errors; robustness to heavy‑tailed or heteroscedastic noise remains to be investigated. Future research directions suggested by the authors involve adaptive basis learning, extensions to non‑Gaussian likelihoods, and scalable parallel implementations for massive longitudinal cohorts.