Computationally efficient multi-level Gaussian process regression for functional data observed under completely or partially regular sampling designs
Gaussian process regression is a frequently used statistical method for flexible yet fully probabilistic non-linear regression modeling. A common obstacle is its computational complexity which scales poorly with the number of observations. This is especially an issue when applying Gaussian process models to multiple functions simultaneously in various applications of functional data analysis. We consider a multi-level Gaussian process regression model where a common mean function and individual subject-specific deviations are modeled simultaneously as latent Gaussian processes. We derive exact analytic and computationally efficient expressions for the log-likelihood function and the posterior distributions in the case where the observations are sampled on either a completely or partially regular grid. This enables us to fit the model to large data sets that are currently computationally inaccessible using a standard implementation. We show through a simulation study that our analytic expressions are several orders of magnitude faster compared to a standard implementation, and we provide an implementation in the probabilistic programming language Stan.
💡 Research Summary
This paper addresses a fundamental computational bottleneck in functional data analysis when employing Gaussian process (GP) regression to model multiple curves simultaneously. The authors propose a hierarchical (“multi‑level”) GP model in which each observed function is decomposed into a common mean function μ(t) and an individual deviation η_i(t). Both μ and η_i are endowed with zero‑mean GP priors, and the η_i’s are coupled through a multi‑output GP with a specially constructed cross‑covariance matrix Ξ that enforces the identifiability constraint Σ_i η_i(t)=0 for every time point.
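The decomposition above can be sketched as a small generative simulation. This is a hedged illustration only: the squared-exponential kernels, the grid, and all hyperparameter values are placeholders, not taken from the paper. Drawing the deviations from a slightly inflated kernel and centering them across subjects is one simple way to realize the sum-to-zero constraint while keeping each η_i marginally distributed with covariance K_η.

```python
import numpy as np

rng = np.random.default_rng(0)

def se_kernel(t, alpha, rho):
    # Squared-exponential covariance; the kernel choice and
    # hyperparameters are assumptions, not specified by the summary.
    d = t[:, None] - t[None, :]
    return alpha**2 * np.exp(-0.5 * (d / rho) ** 2)

n, J = 5, 20                    # n curves observed on a common grid of J points
t = np.linspace(0.0, 1.0, J)
K_mu = se_kernel(t, alpha=1.0, rho=0.3)    # covariance of the shared mean mu(t)
K_eta = se_kernel(t, alpha=0.5, rho=0.1)   # covariance of the deviations eta_i(t)
sigma = 0.1                                # observation noise s.d.

jitter = 1e-9 * np.eye(J)
mu = rng.multivariate_normal(np.zeros(J), K_mu + jitter)

# Draw raw deviations with covariance (n/(n-1)) * K_eta, then center across
# subjects: the centered draws satisfy sum_i eta_i(t) = 0 at every time point
# and have marginal covariance exactly K_eta.
scale = n / (n - 1)
eta_raw = rng.multivariate_normal(np.zeros(J), scale * K_eta + jitter, size=n)
eta = eta_raw - eta_raw.mean(axis=0, keepdims=True)

# Observations: common mean + individual deviation + i.i.d. noise
y = mu[None, :] + eta + sigma * rng.standard_normal((n, J))

print(np.abs(eta.sum(axis=0)).max())  # ~0 up to floating-point error
```

Centering after sampling is only a convenient simulation device; the paper encodes the constraint directly in the prior through the cross-covariance matrix Ξ.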
The central contribution lies in exploiting regularities in the sampling design to obtain exact, closed‑form expressions for the log‑likelihood and posterior distributions, thereby avoiding any approximations that would compromise the Bayesian model’s fidelity. Two sampling scenarios are considered:
- Completely regular design – All n functions are observed on the same grid of J time points. In this case the joint covariance matrix Σ_Θ of the stacked observation vector y(t) exhibits a block‑Kronecker structure: each diagonal block equals Var = K_μ(t,t)+K_η(t,t)+σ²I_J, while each off‑diagonal block equals Cov = K_μ(t,t)−(1/(n−1))K_η(t,t). This matrix can be written as Σ_Θ = I_n⊗(Var−Cov)+1_{n,n}⊗Cov. By applying the matrix identities from Seber (2008), the determinant and inverse of Σ_Θ reduce to operations on two much smaller J×J matrices, Σ_0 = (n/(n−1))K_η(t,t)+σ²I_J and Σ_1 = nK_μ(t,t)+σ²I_J. Consequently, the log‑likelihood, Σ_Θ^{-1}y, and the posterior means and covariances of μ and the η_i can be evaluated in O(J³) time, essentially independent of the number of functions n.
- Partially regular design – A subset of the functions share a common grid, while the remaining functions are observed at arbitrary time points. The authors show that the overall covariance still decomposes into a sum of a Kronecker‑structured component (for the common grid) and independent components (for the irregular grids). The log‑likelihood therefore splits into a sum of a "regular" term, handled exactly as above, and a set of "irregular" terms that each require only a standard Cholesky factorization of the corresponding J_i×J_i covariance matrix. This yields a computational cost that scales linearly with the number of irregularly sampled functions and cubically with the size of each individual grid, but remains dramatically faster than a naïve O((∑_i J_i)³) approach.
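The determinant and inverse reductions for the completely regular design can be checked numerically. The sketch below uses random positive-definite stand-ins for the kernel matrices; the explicit inverse formula it tests is the standard identity for matrices of the form I_n⊗A + 1_{n,n}⊗B (consistent with, though not quoted from, the summary).

```python
import numpy as np

rng = np.random.default_rng(1)

def random_spd(m):
    # Random symmetric positive-definite stand-in for a kernel matrix.
    X = rng.standard_normal((m, m))
    return X @ X.T + m * np.eye(m)

n, J = 6, 4
K_mu, K_eta = random_spd(J), random_spd(J)
sigma2 = 0.25
I_J = np.eye(J)

Var = K_mu + K_eta + sigma2 * I_J          # diagonal blocks
Cov = K_mu - K_eta / (n - 1)               # off-diagonal blocks

# Full (nJ x nJ) covariance: Sigma = I_n (x) (Var - Cov) + 1_{n,n} (x) Cov
Sigma = np.kron(np.eye(n), Var - Cov) + np.kron(np.ones((n, n)), Cov)

# The two small J x J matrices from the summary
Sigma0 = n / (n - 1) * K_eta + sigma2 * I_J    # equals Var - Cov
Sigma1 = n * K_mu + sigma2 * I_J               # equals (Var - Cov) + n * Cov

# Determinant reduction: log|Sigma| = (n-1) log|Sigma0| + log|Sigma1|
_, ld_full = np.linalg.slogdet(Sigma)
_, ld0 = np.linalg.slogdet(Sigma0)
_, ld1 = np.linalg.slogdet(Sigma1)
assert np.isclose(ld_full, (n - 1) * ld0 + ld1)

# Inverse reduction:
# Sigma^{-1} = I_n (x) Sigma0^{-1} + (1/n) 1_{n,n} (x) (Sigma1^{-1} - Sigma0^{-1})
inv0, inv1 = np.linalg.inv(Sigma0), np.linalg.inv(Sigma1)
Sigma_inv = np.kron(np.eye(n), inv0) + np.kron(np.ones((n, n)) / n, inv1 - inv0)
assert np.allclose(Sigma @ Sigma_inv, np.eye(n * J))
```

Only Σ_0 and Σ_1 (each J×J) ever need to be factorized, which is the source of the O(J³) cost; the full nJ×nJ matrix is built here purely to verify the identities.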
The paper provides rigorous proofs (supplementary material) for the block‑Kronecker decomposition, the determinant simplification, and the form of the posterior distributions under both designs. An iterative block‑Cholesky algorithm is described to compute the required matrix factorizations efficiently.
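The iterative block-Cholesky algorithm itself is not spelled out in the summary; the following is a minimal sketch of a single blockwise Cholesky step on a generic 2×2-block SPD matrix (a textbook construction, not necessarily the paper's exact scheme).

```python
import numpy as np

rng = np.random.default_rng(2)

def random_spd(m):
    # Random symmetric positive-definite test matrix.
    X = rng.standard_normal((m, m))
    return X @ X.T + m * np.eye(m)

m, p = 7, 3                    # total size and size of the leading block
M = random_spd(m)
A, B, C = M[:p, :p], M[:p, p:], M[p:, p:]

# Step 1: Cholesky of the leading block, A = L_A L_A^T
L_A = np.linalg.cholesky(A)

# Step 2: off-diagonal block of the factor, L_B = B^T L_A^{-T}
Y = np.linalg.solve(L_A, B)    # Y = L_A^{-1} B
L_B = Y.T

# Step 3: Cholesky of the Schur complement S = C - B^T A^{-1} B = C - Y^T Y
S = C - Y.T @ Y
L_S = np.linalg.cholesky(S)

# Assembled lower-triangular factor: M = L L^T
L = np.block([[L_A, np.zeros((p, m - p))],
              [L_B, L_S]])
assert np.allclose(L @ L.T, M)
```

Applied recursively block by block, this step lets large structured covariances be factorized without ever forming a single dense Cholesky of the full matrix.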
Implementation is carried out in the probabilistic programming language Stan, leveraging its automatic differentiation and Hamiltonian Monte Carlo (HMC) capabilities. The authors expose the analytic log‑likelihood and posterior formulas as custom Stan functions, allowing users either to run full Bayesian inference via HMC or to extract the deterministic components for use in other software environments (e.g., R via cmdstanr). The source code and a reproducible example are made publicly available on GitHub.
A simulation study demonstrates the practical impact. With n=500 functions each sampled at J=200 equally spaced points, the naïve implementation (direct factorization of the (nJ)×(nJ) covariance) takes hours, when it fits in memory at all, whereas the proposed method completes in under a second using only a few hundred megabytes. Accuracy is verified by comparing log‑likelihood values, posterior means, and credible intervals; the results agree to machine precision, confirming that no approximation has been introduced.
The authors discuss the relevance of their approach for real‑world data that naturally exhibit regular sampling, such as electrocardiograms, continuous glucose monitoring, climate time series, wearable sensor streams, and spectroscopic measurements. They argue that the method enables fully Bayesian functional analysis at scales previously inaccessible, opening the door to downstream tasks (e.g., functional classification, clustering, or causal inference) that benefit from proper uncertainty quantification.
Future work is outlined: extending the framework to fully irregular designs, investigating alternative cross‑covariance structures for the multi‑output GP, and integrating sparse or low‑rank approximations to handle ultra‑high‑dimensional grids. The paper positions the Kronecker‑based analytic strategy as a principled alternative to inducing‑point or kernel‑approximation methods, preserving exact Bayesian inference while delivering orders‑of‑magnitude speedups.
In summary, the manuscript delivers a theoretically sound, computationally efficient solution for multi‑level GP regression under regular sampling designs, provides a ready‑to‑use Stan implementation, and demonstrates substantial practical gains, thereby making fully Bayesian functional data analysis feasible for large‑scale applications.