Analysis of nonlinear modes of variation for functional data
Sets of curves or images of similar shape are an increasingly common type of functional data collected in the sciences. Principal Component Analysis (PCA) is the most widely used technique for decomposing variation in functional data. However, the linear modes of variation found by PCA are not always interpretable by experimenters, and the modes of variation that interest them are not always linear. In this paper we present a new analysis of variance for functional data. Our method is motivated by the need to decompose the variation in the data into predetermined, interpretable directions (i.e., modes) of interest. Since some of these modes may be nonlinear, we develop a newly defined ratio of sums of squares that takes into account the curvature of the space of variation. We discuss, in the general case, the consistency of our estimates of variation, using mathematical tools from differential geometry and shape statistics. We successfully apply our method to a motivating example of biological data. The resulting decomposition allows biologists to compare the prevalence of different genetic tradeoffs in a population and to quantify the effect of selection on evolution.
💡 Research Summary
The paper addresses a fundamental limitation of classical functional data analysis: the reliance on linear modes of variation, as produced by Principal Component Analysis (PCA), which often fail to correspond to scientifically meaningful patterns. To overcome this, the authors propose a novel analysis‑of‑variance (ANOVA) framework that explicitly decomposes functional variation along a set of user‑specified, potentially nonlinear “modes.” These modes are chosen a priori based on domain knowledge (e.g., size, asymmetry, curvature) and may describe curved trajectories in the space of functions rather than straight lines.
The methodological core rests on differential geometry. The collection of observed functions (or images) is regarded as a sample from a Riemannian manifold embedded in a high‑dimensional Hilbert space. Each predefined mode defines a one‑dimensional submanifold (a geodesic curve) on this manifold. Distances between functions are measured using the intrinsic Riemannian metric, which automatically incorporates the curvature of the underlying space. For a given mode \(i\), the authors define a sum‑of‑squares term \(SS_i\) as the sum of squared Riemannian distances from each observation to its orthogonal projection onto the geodesic representing mode \(i\). The total variation \(SS_T\) is the sum of squared distances from each observation to the overall Fréchet mean (the intrinsic mean on the manifold).
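The two sums of squares described above can be illustrated on a toy manifold. The sketch below is not the authors' implementation: it assumes the unit sphere in three dimensions as a stand-in for the manifold of functions, a great circle as the geodesic representing one mode, and a simple fixed-point iteration for the Fréchet mean. All function names and the sample data are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def scale(a, s):
    return [x * s for x in a]

def normalize(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def geodesic_dist(p, q):
    # Great-circle distance on the unit sphere (atan2 form is stable for small angles).
    c = cross(p, q)
    return math.atan2(math.sqrt(dot(c, c)), dot(p, q))

def log_map(mu, x):
    # Tangent vector at mu pointing toward x, with length equal to the geodesic distance.
    tangent = add(x, scale(mu, -dot(mu, x)))
    norm = math.sqrt(dot(tangent, tangent))
    if norm < 1e-15:
        return [0.0, 0.0, 0.0]
    return scale(tangent, math.atan2(norm, dot(mu, x)) / norm)

def exp_map(mu, v):
    # Point reached by following the geodesic from mu in direction v for length |v|.
    norm = math.sqrt(dot(v, v))
    if norm < 1e-15:
        return list(mu)
    return add(scale(mu, math.cos(norm)), scale(v, math.sin(norm) / norm))

def frechet_mean(points, iters=100, tol=1e-12):
    # Fixed-point iteration; assumes the points are concentrated in a small region.
    mu = normalize([sum(c) for c in zip(*points)])  # extrinsic mean as starting guess
    for _ in range(iters):
        step = [0.0, 0.0, 0.0]
        for x in points:
            step = add(step, log_map(mu, x))
        step = scale(step, 1.0 / len(points))
        if math.sqrt(dot(step, step)) < tol:
            break
        mu = normalize(exp_map(mu, step))
    return mu

def project_to_great_circle(p, normal):
    # Nearest point to p on the great circle {x : x . normal = 0}; assumes p is not a pole.
    return normalize(add(p, scale(normal, -dot(p, normal))))

def ss_total(points):
    # Sum of squared geodesic distances to the Frechet mean.
    mu = frechet_mean(points)
    return sum(geodesic_dist(x, mu) ** 2 for x in points)

def ss_mode(points, normal):
    # Sum of squared geodesic distances from each point to its projection onto the mode.
    return sum(geodesic_dist(x, project_to_great_circle(x, normal)) ** 2 for x in points)

def from_lon_lat(lon, lat):
    return [math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat)]

# Hypothetical sample: five points scattered along the equator (angles in radians).
points = [from_lon_lat(lon, lat) for lon, lat in
          [(-0.4, 0.05), (-0.2, -0.03), (0.0, 0.0), (0.2, 0.03), (0.4, -0.05)]]
equator_normal = [0.0, 0.0, 1.0]

ss_tot = ss_total(points)
ss_equator = ss_mode(points, equator_normal)
print(ss_tot, ss_equator)
```

Because the sample lies close to the equator, the distances to the projections are small compared with the distances to the Fréchet mean. In a real application the sphere would be replaced by the manifold of curves or images, and the closed-form projection by a numerical one.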
Crucially, the total variation is partitioned as
[partition equation truncated in the source]