Robust functional PCA for relative data

Notice: This research summary and analysis were generated automatically using AI technology. For full accuracy, please refer to the original arXiv source.

This paper introduces a robust approach to functional principal component analysis (FPCA) for relative data, particularly density functions. While recent work has studied density data within the Bayes space framework, little attention has been paid to robust methods that can handle anomalous observations and heavy noise. To address this, we extend the Mahalanobis distance to Bayes spaces, proposing a regularized version that accounts for the constraints inherent in density data. Building on this extension, we introduce a new method, robust density principal component analysis (RDPCA), for more accurate estimation of functional principal components in the presence of outliers. The method's performance is validated through simulations and real-world applications, demonstrating improved covariance estimation and principal component analysis relative to traditional methods.


💡 Research Summary

The paper addresses the problem of performing functional principal component analysis (FPCA) on relative functional data, such as probability density functions, which are constrained to be non‑negative and to integrate to one. Traditional FPCA methods treat functions as elements of an unconstrained L² space, ignoring these compositional constraints and remaining vulnerable to outliers. The authors adopt the Bayes space framework—a Hilbert space extension of Aitchison's compositional geometry—to model densities as infinite‑dimensional compositions. Within this space, perturbation (⊕), powering (⊙), and an inner product ⟨·,·⟩_B² are defined, and the centered log‑ratio (clr) transformation provides an isometric map to the zero‑integral subspace L²₀, allowing standard Euclidean tools to be applied while preserving the compositional structure.
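To make these operations concrete, here is a minimal sketch (not the paper's code) of perturbation, powering, and the discrete clr transform on densities discretized over a uniform grid; the grid, the example densities, and all function names are our illustrative choices. The final check exercises the linearity that underlies the isometry: clr turns ⊕ into ordinary addition.

```python
import numpy as np

def close(f, dx):
    """Rescale a positive vector so it integrates (approximately) to one."""
    return f / (f.sum() * dx)

def perturb(f, g, dx):
    """Bayes-space perturbation f ⊕ g: pointwise product followed by closure."""
    return close(f * g, dx)

def power(a, f, dx):
    """Bayes-space powering a ⊙ f: pointwise power followed by closure."""
    return close(f ** a, dx)

def clr(f):
    """Discrete centered log-ratio: log f minus its average over the grid.
    Maps a density into the zero-integral subspace, where ⊕ and ⊙ become
    ordinary addition and scalar multiplication."""
    logf = np.log(f)
    return logf - logf.mean()

# Two toy densities on (0, 1].
grid = np.linspace(0.01, 1.0, 500)
dx = grid[1] - grid[0]
f = close(np.exp(-((grid - 0.3) ** 2) / 0.02), dx)
g = close(np.exp(-((grid - 0.7) ** 2) / 0.05), dx)

# Linearity under clr: clr(f ⊕ g) = clr(f) + clr(g),
# since the closure constant cancels after centering.
lhs = clr(perturb(f, g, dx))
rhs = clr(f) + clr(g)
print(np.allclose(lhs, rhs))  # True
```

Because closure only multiplies by a constant and clr subtracts the grid average of the log, the identity holds exactly, which is why analysis can be carried out on clr-transformed curves and mapped back.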

Simplicial functional PCA (SFPCA) has previously been proposed for Bayes spaces, but it relies on the sample covariance operator, which is highly sensitive to anomalous curves. To obtain robustness, the authors extend the Mahalanobis distance to Bayes spaces by introducing a regularized standardization (whitening) scheme. Because the covariance operator is not invertible in infinite dimensions, they formulate a Tikhonov‑type optimization problem (Equation 6) that seeks a function Y minimizing a weighted sum of a fidelity term involving the square‑root covariance operator C^{1/2} and a penalty α‖LY‖², where L is a closed, densely defined operator and α > 0 controls the amount of regularization. When L is the identity, this reduces to the α‑Mahalanobis distance previously defined for L² functions; Proposition 3.1 establishes the equivalence of the Bayes‑space formulation and its clr‑transformed L² counterpart.
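In a finite-dimensional discretization with L taken as the identity, a Tikhonov-regularized Mahalanobis distance admits a closed spectral form. The sketch below uses the filter d_α²(x) = Σ_j λ_j ⟨x − μ, φ_j⟩² / (λ_j + α)², where (λ_j, φ_j) are eigenpairs of the sample covariance; this specific formula and all names are our assumption for illustration, following the α-Mahalanobis construction the text says this case reduces to, not the paper's own code.

```python
import numpy as np

rng = np.random.default_rng(0)

def alpha_mahalanobis_sq(x, mu, lam, phi, alpha):
    """Squared α-regularized Mahalanobis distance via Tikhonov filtering:
    inverse eigenvalues 1/λ_j are replaced by λ_j / (λ_j + α)², which
    damps the unstable small-eigenvalue directions."""
    coords = phi.T @ (x - mu)  # coordinates of x − μ in the eigenbasis
    return float(np.sum(lam * coords ** 2 / (lam + alpha) ** 2))

# Toy data: 200 curves observed at 30 points with a decaying variance
# spectrum, mimicking the fast eigenvalue decay of functional data.
n, p = 200, 30
X = rng.standard_normal((n, p)) * np.sqrt(np.linspace(2.0, 0.05, p))
mu = X.mean(axis=0)
lam, phi = np.linalg.eigh(np.cov(X, rowvar=False))

d2 = np.array([alpha_mahalanobis_sq(x, mu, lam, phi, alpha=0.1) for x in X])
print(d2.min() >= 0.0, d2.shape)  # True (200,)
```

As α → 0 on a full-rank space the filter λ/(λ+α)² tends to 1/λ and the ordinary Mahalanobis distance is recovered; larger α trades exactness of whitening for stability against near-zero eigenvalues.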

The resulting regularized Mahalanobis distance (RDMD) quantifies the centrality of each observed density. By selecting a quantile of its asymptotic χ² distribution, observations with excessively large RDMD are flagged as outliers. The authors then compute a robust Bayes‑space mean and a robust covariance operator using only the central subset of the data. Eigendecomposition of this robust covariance yields robust principal component functions, a procedure they term Robust Density PCA (RDPCA).
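The flag-then-re-estimate pipeline can be sketched as follows. As a stand-in for the paper's Bayes-space RDMD, this toy version uses a plain Mahalanobis distance on the k leading PC scores (for which the χ²_k cutoff is exact under normality); the function names, the k = 3 truncation, the 0.975 level, and the simulated contamination are all our illustrative choices.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)

def flag_and_reestimate(X, k=3, level=0.975):
    """Flag curves with large distances, then recompute mean/covariance
    on the central subset and eigendecompose for robust PCs."""
    mu = X.mean(axis=0)
    lam, phi = np.linalg.eigh(np.cov(X, rowvar=False))
    lam, phi = lam[::-1], phi[:, ::-1]          # descending eigenvalues
    scores = (X - mu) @ phi[:, :k]
    d2 = np.sum(scores ** 2 / lam[:k], axis=1)  # squared distance, ≈ χ²_k
    keep = d2 <= chi2.ppf(level, df=k)          # central subset
    mu_r = X[keep].mean(axis=0)                 # robust mean
    lam_r, phi_r = np.linalg.eigh(np.cov(X[keep], rowvar=False))
    return mu_r, lam_r[::-1], phi_r[:, ::-1], keep

# Toy curves: smooth low-rank signal plus noise; first five contaminated
# with an extra high-frequency shape component.
t = np.linspace(0.0, 1.0, 40)
B = np.vstack([np.sin(np.pi * t), np.sin(2 * np.pi * t), np.sin(3 * np.pi * t)])
X = (rng.standard_normal((100, 3)) * [2.0, 1.0, 0.5]) @ B
X += 0.05 * rng.standard_normal((100, 40))
X[:5] += 5.0 * np.cos(4 * np.pi * t)            # shape outliers

mu_r, lam_r, phi_r, keep = flag_and_reestimate(X)
print(keep.sum())  # number of curves retained for the robust covariance
```

The key design point is the same as in the paper: the covariance used for the final eigendecomposition is estimated only from curves that pass the distance cutoff, so a handful of aberrant shapes cannot dominate the leading components.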

A comprehensive simulation study varies noise levels, outlier proportions, and regularization parameters, comparing RDPCA against standard SFPCA, the minimum regularized covariance trace (MRCT) estimator, and depth‑based robust FPCA methods. Across metrics such as mean squared error of covariance estimation, reconstruction error, and eigenvalue accuracy, RDPCA consistently outperforms the competitors, especially when outlier contamination exceeds 20–30%.

Real‑world applications include age‑specific mortality density curves, spectroscopic intensity profiles, and normalized sensor signals. In each case, RDPCA delivers interpretable principal components that capture the dominant shape variations (e.g., shifts in mortality peaks, changes in spectral bands) while remaining stable in the presence of anomalous curves. The regularized Mahalanobis distance itself also serves as a diagnostic tool for visualizing structural outliers.

In summary, the authors provide a theoretically sound and practically effective extension of Mahalanobis distance to Bayes spaces, enabling robust covariance estimation and functional PCA for relative data. Their RDPCA framework overcomes the fragility of existing methods, respects the compositional nature of density‑type functions, and demonstrates superior performance on both synthetic and real datasets, opening new avenues for reliable dimension reduction in compositional functional data analysis.
