An eigenanalysis of data centering in machine learning


Many pattern recognition methods rely on statistical information from centered data, with the eigenanalysis of an empirical central moment, such as the covariance matrix in principal component analysis (PCA), as well as partial least squares regression, canonical-correlation analysis and Fisher discriminant analysis. Recently, many researchers have advocated working on non-centered data. This is the case for instance with the singular value decomposition approach, with the (kernel) entropy component analysis, with the information-theoretic learning framework, and even with nonnegative matrix factorization. Moreover, one can also consider a non-centered PCA by using the second-order non-central moment. The main purpose of this paper is to bridge the gap between these two viewpoints in designing machine learning methods. To provide a study at the cornerstone of kernel-based machines, we conduct an eigenanalysis of the inner product matrices from centered and non-centered data. We derive several results connecting their eigenvalues and their eigenvectors. Furthermore, we explore the outer product matrices, by providing several results connecting the largest eigenvectors of the covariance matrix and its non-centered counterpart. These results lay the groundwork for several extensions beyond conventional centering, with the weighted mean shift, the rank-one update, and the multidimensional scaling. Experiments conducted on simulated and real data illustrate the relevance of this work.


💡 Research Summary

This paper investigates the fundamental question of whether to center data before performing eigen‑analysis in kernel‑based machine learning. Starting from a data matrix X∈ℝ^{d×n}, the authors define the non‑centered Gram matrix K = XᵀX and the centered Gram matrix K_c = X_cᵀX_c, where X_c = X – μ1ᵀ and μ = (1/n)X1 is the empirical mean. By applying matrix theory, they prove an interlacing property for the eigenvalues of K and K_c: λ_i(K) ≥ λ_i(K_c) ≥ λ_{i+1}(K) for i = 1,…,n‑1. Consequently, centering compresses the spectrum but never pushes the largest eigenvalue above that of the non‑centered matrix. They also derive a lower bound for the leading eigenvalue of the centered Gram matrix, λ₁(K_c) ≥ λ₁(K) – (1/n)‖μ‖², linking it directly to the norm of the mean vector.
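The interlacing property described above is easy to check numerically. The following sketch uses synthetic data and the summary's notation (X, K, K_c); the data shift is arbitrary, chosen only so that the empirical mean is non-trivial:

```python
# Numerically check the eigenvalue interlacing between the non-centered
# Gram matrix K = X^T X and the centered Gram matrix K_c = X_c^T X_c.
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 8
X = rng.normal(size=(d, n)) + 3.0           # shift so the mean is non-trivial

mu = X.mean(axis=1, keepdims=True)          # empirical mean mu = (1/n) X 1
Xc = X - mu                                 # centered data X_c = X - mu 1^T

K = X.T @ X                                 # non-centered Gram matrix
Kc = Xc.T @ Xc                              # centered Gram matrix

lam = np.sort(np.linalg.eigvalsh(K))[::-1]   # descending eigenvalues of K
lam_c = np.sort(np.linalg.eigvalsh(Kc))[::-1]

# Interlacing: lambda_i(K) >= lambda_i(K_c) >= lambda_{i+1}(K)
tol = 1e-9
assert np.all(lam >= lam_c - tol)
assert np.all(lam_c[:-1] >= lam[1:] - tol)
print("interlacing holds for all indices")
```

The interlacing follows because centering multiplies K on both sides by the projector I − (1/n)11ᵀ, which amounts to a rank-one downdate of the spectrum.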

The analysis is extended to outer‑product matrices. The non‑central second‑order moment C = (1/n)XXᵀ and the centered covariance matrix C_c = C – μμᵀ are shown to be related by a rank‑one update. Using this relationship, the authors obtain an explicit transformation between the leading eigenvectors: w_{c1} = (I – μμᵀ/‖μ‖²) w₁. This reveals that removing the mean component from the non‑centered eigenvector yields the centered eigenvector, providing an intuitive geometric interpretation.
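The rank-one relation C_c = C − μμᵀ can be verified directly; the eigenvector projection is illustrated alongside it, though as a sketch only, since projecting w₁ onto the complement of the mean direction need not reproduce the centered eigenvector exactly in every configuration:

```python
# Verify the rank-one relation between the second-order moment matrix
# C = (1/n) X X^T and the covariance C_c = (1/n) X_c X_c^T = C - mu mu^T.
import numpy as np

rng = np.random.default_rng(1)
d, n = 4, 50
X = rng.normal(size=(d, n)) + 2.0

mu = X.mean(axis=1, keepdims=True)
Xc = X - mu

C = (X @ X.T) / n                 # non-central second-order moment
Cc = (Xc @ Xc.T) / n              # covariance matrix

# C_c equals C minus the rank-one term mu mu^T (exact identity)
assert np.allclose(Cc, C - mu @ mu.T)

# Projection of the leading non-centered eigenvector onto the complement
# of the mean direction, as in the summary's transformation; how closely
# it matches the true centered eigenvector depends on the data.
w1 = np.linalg.eigh(C)[1][:, -1]          # leading eigenvector of C
wc1 = np.linalg.eigh(Cc)[1][:, -1]        # leading eigenvector of C_c
m = mu.ravel()
proj = w1 - m * (m @ w1) / (m @ m)        # (I - mu mu^T/||mu||^2) w1
proj /= np.linalg.norm(proj)
print("alignment |<proj, wc1>| =", abs(proj @ wc1))
```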

Key mathematical tools include trace identities, the Sherman‑Morrison formula, and properties of positive‑definite matrices. The paper further generalizes the mean vector to a weighted mean μ_w = Xw (with wᵀ1 = 1), showing how the spectra of K and K_c evolve under weighted mean shifting. The rank‑one update formulation also enables efficient online updates when new samples arrive, a practical advantage for streaming scenarios.
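How weighted mean shifting acts on the Gram matrix can be sketched as follows. With μ_w = Xw and wᵀ1 = 1, the shifted data is X_w = X − μ_w1ᵀ = X(I − w1ᵀ), so the weighted Gram matrix is a two-sided update of K; the notation here is assumed for illustration, not quoted from the paper:

```python
# Weighted-mean centering as a two-sided update of the Gram matrix.
import numpy as np

rng = np.random.default_rng(2)
d, n = 3, 6
X = rng.normal(size=(d, n))

w = rng.random(n)
w /= w.sum()                               # weights with w^T 1 = 1
mu_w = X @ w                               # weighted mean mu_w = X w

Xw = X - np.outer(mu_w, np.ones(n))        # weighted centering
K = X.T @ X
P = np.eye(n) - np.outer(w, np.ones(n))    # P = I - w 1^T
Kw = P.T @ K @ P                           # (I - 1 w^T) K (I - w 1^T)

assert np.allclose(Kw, Xw.T @ Xw)          # the two constructions agree
print("weighted centering matches the two-sided Gram update")
```

Ordinary centering is recovered with the uniform weights w = (1/n)1.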

From the multidimensional scaling (MDS) perspective, the authors note that double centering a squared-distance matrix D (producing B = −½JDJ with J = I − (1/n)11ᵀ) recovers the centered Gram matrix, so B shares its eigenstructure with K_c and the same spectral insights apply to classical MDS embeddings.
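The double-centering identity used by classical MDS can be checked in a few lines: from squared Euclidean distances D2, the matrix −½ J D2 J equals the centered Gram matrix X_cᵀX_c.

```python
# Classical MDS double centering recovers the centered Gram matrix.
import numpy as np

rng = np.random.default_rng(3)
d, n = 3, 7
X = rng.normal(size=(d, n))

# squared pairwise distances ||x_i - x_j||^2
sq = (X**2).sum(axis=0)
D2 = sq[:, None] + sq[None, :] - 2 * X.T @ X

J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
B = -0.5 * J @ D2 @ J                      # double-centered matrix

Xc = X - X.mean(axis=1, keepdims=True)
assert np.allclose(B, Xc.T @ Xc)           # B is the centered Gram matrix
print("double centering recovers the centered Gram matrix")
```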

Two case studies illustrate the theoretical findings. The first revisits Principal Component Analysis (PCA) and its kernelized version, emphasizing that centered covariance eigenvectors maximize data variance and lead to minimal reconstruction error. The second examines non‑centered Entropy Component Analysis (ECA), which optimizes a Rényi‑entropy based objective and retains information about the data’s mean offset. While both methods solve eigen‑problems, the objective functions differ, leading to distinct spectral signatures.
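The contrast between the two selection criteria can be sketched numerically. Assuming the standard kernel-ECA formulation, components are ranked by the entropy terms λ_i(u_iᵀ1)², whose sum decomposes 1ᵀK1 (the quantity behind the Rényi-entropy estimate), whereas PCA ranks by eigenvalue alone; a linear Gram matrix stands in for a kernel matrix here:

```python
# PCA ranks eigenpairs by eigenvalue; ECA ranks them by the entropy
# terms lambda_i (u_i^T 1)^2, which sum exactly to 1^T K 1.
import numpy as np

rng = np.random.default_rng(4)
n = 10
X = rng.normal(size=(4, n)) + 1.5
K = X.T @ X                                  # non-centered (linear) Gram matrix

lam, U = np.linalg.eigh(K)                   # ascending eigenpairs
ones = np.ones(n)
entropy_terms = lam * (U.T @ ones) ** 2      # lambda_i (u_i^T 1)^2

# The entropy terms decompose 1^T K 1 exactly
assert np.isclose(entropy_terms.sum(), ones @ K @ ones)

pca_order = np.argsort(lam)[::-1]            # largest eigenvalues first
eca_order = np.argsort(entropy_terms)[::-1]  # largest entropy terms first
print("PCA order:", pca_order[:3], " ECA order:", eca_order[:3])
```

When the data mean is large, the component aligned with it dominates both rankings; the orderings diverge on components that carry variance but little mean-direction mass.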

Empirical evaluations on synthetic Gaussian clusters, a face image dataset, and gene‑expression data confirm the theory. Centered PCA exhibits a sparse spectrum with a few dominant eigenvalues, yielding effective dimensionality reduction. Non‑centered ECA, especially when the data mean is large, shows an inflated leading eigenvalue that captures asymmetry and non‑negativity of the data. The gap between λ₁(K_c) and λ₁(C_c) grows with ‖μ‖, suggesting that non‑centered approaches can be advantageous in applications such as non‑negative matrix factorization, spectral clustering, or density‑based learning where the mean carries discriminative information.

Finally, the authors argue that the derived eigenvalue/eigenvector relationships apply broadly to kernel‑based algorithms (kernel PCA, kernel PLS, kernel CCA, kernel FDA). By framing centering as a rank‑one modification, they provide a unified analytical framework that accommodates weighted mean shifts, online rank‑one updates, and extensions to MDS. This work supplies both rigorous theoretical justification and practical guidelines for practitioners deciding whether to center or retain raw data in modern machine‑learning pipelines.

