The Degrees of Freedom of Partial Least Squares Regression
The derivation of statistical properties for Partial Least Squares regression can be a challenging task. The reason is that the construction of latent components from the predictor variables also depends on the response variable. While this typically leads to good performance and interpretable models in practice, it makes the statistical analysis more involved. In this work, we study the intrinsic complexity of Partial Least Squares Regression. Our contribution is an unbiased estimate of its Degrees of Freedom. It is defined as the trace of the first derivative of the fitted values, seen as a function of the response. We establish two equivalent representations that rely on the close connection of Partial Least Squares to matrix decompositions and Krylov subspace techniques. We show that the Degrees of Freedom depend on the collinearity of the predictor variables: the lower the collinearity, the higher the Degrees of Freedom. In particular, they are typically higher than the number of components, which is the value assigned by the naive approach. Further, we illustrate how the Degrees of Freedom approach can be used for the comparison of different regression methods. In the experimental section, we show that our Degrees of Freedom estimate in combination with information criteria is useful for model selection.
💡 Research Summary
Partial Least Squares regression (PLS) is widely used for high‑dimensional predictive modeling because it constructs latent components that maximize the covariance between the predictor matrix X and the response vector y. However, the fact that the component construction depends on y makes the statistical analysis of PLS more intricate than for ordinary least squares. In particular, the notion of model complexity, usually quantified by the Degrees of Freedom (DoF), cannot be taken as the simple count of extracted components. This paper addresses this gap by proposing an unbiased estimator of the DoF for PLS and demonstrating its practical usefulness for model selection.
The authors define the DoF of a regression method as the trace of the Jacobian of the fitted values with respect to the response, i.e., DoF = tr(∂ŷ/∂y). This is the quantity that enters Stein’s unbiased risk estimate, and it captures how sensitive the fitted model is to perturbations in y. To compute this trace for PLS, the paper exploits two mathematically equivalent representations that stem from the close relationship between PLS, matrix decompositions, and Krylov subspace methods.
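The trace definition can be checked numerically for any fitting procedure by perturbing each entry of y in turn and measuring the response of the corresponding fitted value. The sketch below is our own illustration, not the paper's estimator; the helper names `dof_finite_difference` and `ols_fit` are hypothetical. It uses ordinary least squares as a sanity check, since the OLS hat matrix has trace equal to rank(X).

```python
import numpy as np

def dof_finite_difference(fit, X, y, eps=1e-6):
    """Approximate DoF = tr(d yhat / d y) by one-sided finite differences.

    `fit` maps (X, y) to the fitted vector yhat. Each response entry is
    perturbed by eps and the change in the matching fitted value is summed.
    """
    base = fit(X, y)
    trace = 0.0
    for i in range(len(y)):
        y_pert = y.copy()
        y_pert[i] += eps
        trace += (fit(X, y_pert)[i] - base[i]) / eps
    return trace

def ols_fit(X, y):
    # Ordinary least squares fitted values: a linear smoother with DoF = rank(X).
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
print(dof_finite_difference(ols_fit, X, y))  # ≈ 5.0, the rank of X
```

Because OLS is a linear smoother, the finite-difference trace recovers tr(H) exactly up to floating-point error; for PLS the same perturbation scheme captures the extra y-dependence discussed below.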
In the first representation, the authors start from the classical NIPALS algorithm and express the fitted vector as ŷ = X W (PᵀW)⁻¹ q, where W and P are the weight and loading matrices generated during the iterative construction of the latent components and q collects the regression coefficients of y on the latent scores. Because W, P, and q all depend on y, applying the chain rule to this expression yields an explicit formula for the Jacobian and consequently for the DoF.
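A minimal NIPALS sketch for univariate PLS (PLS1) may clarify where W, P, and q come from. This is our own illustration under the assumption of pre-centered data, not the authors' code; `pls1_nipals` is a hypothetical helper. As a sanity check, with m = p components PLS reproduces the ordinary least squares fit on a full-rank design.

```python
import numpy as np

def pls1_nipals(X, y, m):
    """Minimal NIPALS for PLS1; X and y are assumed centered.

    Builds weights W, loadings P, and score regression coefficients q,
    then forms the regression vector beta = W (P^T W)^{-1} q.
    """
    n, p = X.shape
    W = np.zeros((p, m))
    P = np.zeros((p, m))
    q = np.zeros(m)
    Xk = X.copy()
    for k in range(m):
        w = Xk.T @ y                     # weight: covariance direction
        w /= np.linalg.norm(w)
        t = Xk @ w                       # latent score
        tt = t @ t
        P[:, k] = Xk.T @ t / tt          # loading
        q[k] = y @ t / tt                # regression of y on the score
        W[:, k] = w
        Xk = Xk - np.outer(t, P[:, k])   # deflate X
    beta = W @ np.linalg.solve(P.T @ W, q)
    return W, P, X @ beta

rng = np.random.default_rng(1)
X = rng.standard_normal((40, 4))
y = rng.standard_normal(40)
_, _, yhat_full = pls1_nipals(X, y, m=4)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
# With m = p components, PLS spans the full column space and matches OLS.
print(np.allclose(yhat_full, X @ beta_ols, atol=1e-6))  # True
```

Every quantity in the loop (w, t, P, q) is a function of y, which is exactly why the chain-rule bookkeeping in this representation is involved.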
The second representation leverages the fact that the sequence of latent vectors generated by PLS spans a Krylov subspace Kₘ = span{Xᵀy, (XᵀX)Xᵀy, …, (XᵀX)^{m‑1}Xᵀy}. Orthonormalising this sequence yields a basis Q, and the fitted values can be written as ŷ = X Q (QᵀXᵀX Q)⁻¹ Qᵀ Xᵀ y. If Q were independent of y, the hat matrix would be a rank‑m projection and the DoF would reduce to the component count m. But Q is built from Xᵀy and therefore varies with the response, so the Jacobian carries additional terms whose magnitude is governed by the eigenvalue distribution of XᵀX, i.e., by the collinearity among the predictors. When predictors are highly collinear, the Krylov subspace collapses quickly, leading to a smaller effective DoF; when predictors are nearly orthogonal, more independent directions are explored and the DoF grows. Consequently, the naive assumption “DoF = number of components” systematically underestimates the true model complexity, especially in low‑collinearity settings.
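The gap between the naive count and the true sensitivity of the fit can be seen numerically. The sketch below is our own construction: it builds the Krylov basis by QR, computes the trace of the hat matrix with Q held fixed (which is exactly m), and contrasts it with a finite-difference trace of the full map y → ŷ, standing in for the paper's closed-form estimator; `krylov_basis` and `krylov_fit` are hypothetical helper names.

```python
import numpy as np

def krylov_basis(X, y, m):
    """Orthonormal basis of K_m = span{X^T y, (X^T X) X^T y, ...} via QR."""
    A, b = X.T @ X, X.T @ y
    K = np.column_stack([np.linalg.matrix_power(A, j) @ b for j in range(m)])
    Q, _ = np.linalg.qr(K)
    return Q

def krylov_fit(X, y, m):
    # Fitted values yhat = X Q (Q^T X^T X Q)^{-1} Q^T X^T y.
    Q = krylov_basis(X, y, m)
    return X @ (Q @ np.linalg.solve(Q.T @ X.T @ X @ Q, Q.T @ X.T @ y))

rng = np.random.default_rng(2)
n, p, m = 60, 8, 3
X = rng.standard_normal((n, p))   # random design: low collinearity
y = rng.standard_normal(n)

# Holding Q fixed, the hat matrix is a rank-m projection: its trace is m.
Q = krylov_basis(X, y, m)
H_fixed = X @ Q @ np.linalg.solve(Q.T @ X.T @ X @ Q, Q.T @ X.T)
naive_dof = np.trace(H_fixed)

# Finite differences of the full map y -> yhat also capture the movement
# of Q with y; for this low-collinearity design the DoF exceeds m.
eps = 1e-5
base = krylov_fit(X, y, m)
dof = sum((krylov_fit(X, y + eps * np.eye(n)[i], m)[i] - base[i]) / eps
          for i in range(n))
print(round(naive_dof, 6), round(dof, 2))  # naive trace is exactly m = 3
```

Note that the fitted values depend only on the span of Q, not on the particular orthonormal basis, so the finite-difference trace is well defined.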
The theoretical developments are validated through extensive simulations and real‑world experiments. In synthetic data, the authors vary the condition number of XᵀX to control collinearity and compare three DoF estimates: (i) the naive component count, (ii) the proposed unbiased estimator, and (iii) the empirical DoF obtained by finite‑difference perturbations of y. The unbiased estimator tracks the empirical DoF closely across all collinearity regimes, while the naive count remains flat. In real data (chemometric spectra, gene‑expression profiles, and image‑based regression tasks), the authors embed the DoF estimates into information criteria such as AIC and BIC. Model selection based on the unbiased DoF consistently selects the number of components that yields the lowest prediction error on an independent test set, whereas the naive DoF leads to either under‑fitting or over‑fitting.
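A sketch of how a DoF estimate can be plugged into an information criterion for choosing the number of components. This is our own illustration, not the authors' implementation: the score is a Mallows‑Cp‑style criterion assuming a known noise variance, the DoF is a finite‑difference surrogate for the paper's unbiased estimator, and `pls_fit`, `dof_fd`, and `select_components` are hypothetical helpers.

```python
import numpy as np

def pls_fit(X, y, m):
    # PLS fitted values via the Krylov-subspace representation.
    A, b = X.T @ X, X.T @ y
    K = np.column_stack([np.linalg.matrix_power(A, j) @ b for j in range(m)])
    Q, _ = np.linalg.qr(K)
    return X @ (Q @ np.linalg.solve(Q.T @ A @ Q, Q.T @ b))

def dof_fd(X, y, m, eps=1e-5):
    # Finite-difference stand-in for the unbiased DoF estimate.
    base = pls_fit(X, y, m)
    n = len(y)
    return sum((pls_fit(X, y + eps * np.eye(n)[i], m)[i] - base[i]) / eps
               for i in range(n))

def select_components(X, y, m_max, sigma2):
    # Cp-style score with known noise variance sigma2:
    #   score(m) = RSS(m)/n + 2 * sigma2 * DoF(m) / n
    n = len(y)
    scores = [np.sum((y - pls_fit(X, y, m)) ** 2) / n
              + 2 * sigma2 * dof_fd(X, y, m) / n
              for m in range(1, m_max + 1)]
    return 1 + int(np.argmin(scores))

rng = np.random.default_rng(3)
n, p = 80, 6
X = rng.standard_normal((n, p))
beta = np.array([2.0, -1.5, 0.0, 0.0, 0.0, 0.0])  # signal in few directions
y = X @ beta + 0.5 * rng.standard_normal(n)
m_best = select_components(X, y, m_max=5, sigma2=0.25)
print(m_best)
```

Using the y-dependent DoF rather than the component count penalizes each extra component by its actual contribution to model flexibility, which is the mechanism behind the cross-validation-free selection reported in the experiments.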
The paper concludes that (1) the intrinsic complexity of PLS is a function of predictor collinearity, not merely the number of latent components; (2) the proposed DoF estimator provides a mathematically sound, computationally tractable, and empirically accurate measure of this complexity; and (3) incorporating this DoF into standard information criteria offers a practical, cross‑validation‑free approach to selecting the optimal number of PLS components. The authors also suggest that the Krylov‑subspace perspective may be extended to other response‑dependent dimensionality‑reduction techniques, opening avenues for further research on unbiased complexity measures in modern regression methods.