Lanczos Approximations for the Speedup of Kernel Partial Least Squares Regression
The runtime for Kernel Partial Least Squares (KPLS) to compute the fit is quadratic in the number of examples. However, the necessity of obtaining sensitivity measures such as degrees of freedom for model selection, or confidence intervals for more detailed analysis, requires cubic runtime, and thus constitutes a computational bottleneck in real-world data analysis. We propose a novel algorithm for KPLS which not only computes (a) the fit, but also (b) its approximate degrees of freedom and (c) error bars, all in quadratic runtime. The algorithm exploits a close connection between Kernel PLS and the Lanczos algorithm for approximating the eigenvalues of symmetric matrices, and uses this approximation to compute the traces of powers of the kernel matrix in quadratic runtime.
💡 Research Summary
The paper addresses a critical computational bottleneck in Kernel Partial Least Squares (KPLS) regression: while the core fitting procedure scales quadratically with the number of training examples (O(N²)), the evaluation of model‑selection criteria and uncertainty measures—most notably the degrees of freedom (DoF) and residual variance—requires cubic‑time operations (O(N³)). These higher‑order statistics are essential for selecting the optimal number of latent components, constructing confidence intervals, and performing rigorous inference, yet they become prohibitive on modern data sets with tens of thousands of samples.
The authors’ key insight is that the iterative structure of KPLS is mathematically equivalent to the Lanczos tridiagonalization process applied to the kernel matrix K. In the Lanczos algorithm, a symmetric matrix is projected onto a Krylov subspace generated by successive matrix‑vector products, yielding a small tridiagonal matrix Tₘ of dimension m ≪ N. The eigenvalues of Tₘ (often called Ritz values) provide highly accurate approximations to the leading eigenvalues of K, and the entire subspace can be built using the same matrix‑vector operations already required by KPLS. By recognizing this equivalence, the authors are able to reuse the Lanczos vectors generated during the KPLS fit to obtain a compact spectral approximation of K without any extra kernel evaluations.
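To make the equivalence concrete, here is a minimal sketch (not the authors' code) of the Lanczos three-term recurrence applied to a toy RBF kernel matrix: the eigenvalues of the small tridiagonal matrix Tₘ closely track the leading eigenvalues of K. The kernel choice, data, and dimensions are illustrative assumptions.

```python
import numpy as np

def lanczos(K, v0, m):
    """m steps of Lanczos on the symmetric matrix K: returns the
    diagonal (alphas) and off-diagonal (betas) of the tridiagonal T_m."""
    n = K.shape[0]
    V = np.zeros((n, m))                          # Lanczos vectors
    alphas, betas = np.zeros(m), np.zeros(m - 1)
    v, v_prev, beta = v0 / np.linalg.norm(v0), np.zeros(n), 0.0
    for j in range(m):
        V[:, j] = v
        w = K @ v - beta * v_prev                 # three-term recurrence
        alphas[j] = v @ w
        w -= alphas[j] * v
        w -= V[:, :j + 1] @ (V[:, :j + 1].T @ w)  # full reorthogonalization
        if j < m - 1:
            beta = np.linalg.norm(w)
            betas[j] = beta
            v_prev, v = v, w / beta
    return alphas, betas

# Toy RBF kernel matrix on random 1-D inputs (illustrative choices).
rng = np.random.default_rng(0)
x = rng.standard_normal(200)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)

alphas, betas = lanczos(K, rng.standard_normal(200), m=10)
T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
ritz = np.sort(np.linalg.eigvalsh(T))[::-1]
true = np.sort(np.linalg.eigvalsh(K))[::-1]
print(ritz[:3])   # leading Ritz values
print(true[:3])   # leading eigenvalues of K -- nearly identical
```

Note that each step costs one matrix–vector product with K, i.e. O(N²), which is exactly the work the KPLS iteration already performs.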
With the spectral approximation in hand, the paper shows how to replace the costly trace computations that appear in the DoF and variance formulas. For any positive integer k, the trace of Kᵏ can be approximated as
tr(Kᵏ) ≈ ∑_{j=1}^{m} λ̂_j^{k},
where λ̂_j are the eigenvalues of Tₘ. Computing these λ̂_j costs O(m³), which is negligible compared with the O(N²) cost of forming the Krylov basis. Consequently, the DoF expression
DoF = tr(∂ŷ/∂y)
(the trace of the Jacobian of the fitted values with respect to the responses, which reduces to tr(S) for a linear smoother ŷ = S y) and the residual variance estimator can be evaluated in O(N²) time. The algorithm therefore delivers three outputs simultaneously: (a) the fitted KPLS model, (b) an accurate approximation of the model’s degrees of freedom, and (c) error bars for predictions, all within the same quadratic‑time budget.
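As a sanity check on the trace formula, a short illustrative sketch (not from the paper): project K onto an m-dimensional Krylov basis and compare Σⱼ λ̂ⱼᵏ with the exact tr(Kᵏ) for a toy RBF kernel, whose spectrum decays fast enough that a small m suffices. All sizes and kernel parameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(300)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)  # RBF kernel, N = 300

# Orthonormal Krylov basis of span{v, Kv, ..., K^(m-1) v} via Gram-Schmidt.
m = 12
Q = np.zeros((300, m))
w = rng.standard_normal(300)
for j in range(m):
    w -= Q[:, :j] @ (Q[:, :j].T @ w)   # orthogonalize against earlier vectors
    w -= Q[:, :j] @ (Q[:, :j].T @ w)   # repeat for numerical stability
    Q[:, j] = w / np.linalg.norm(w)
    w = K @ Q[:, j]

# Ritz values: eigenvalues of the small projected m x m matrix.
ritz = np.linalg.eigvalsh(Q.T @ (K @ Q))

for k in (1, 2, 3):
    approx = np.sum(ritz ** k)
    exact = np.trace(np.linalg.matrix_power(K, k))
    print(f"k={k}: approx={approx:.2f}, exact={exact:.2f}")
```

The exact traces here are computed only for comparison; the point of the method is that the Ritz-value sums need no O(N³) work at all.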
The authors present a detailed algorithmic pipeline:
- Run the standard KPLS iteration, storing at each step the Lanczos scalars α_i and β_i that define the tridiagonal matrix.
- After m components have been extracted, assemble Tₘ from the stored scalars.
- Compute the eigenvalues of Tₘ (Ritz values) using a dense eigensolver (cost O(m³)).
- Approximate the required traces of powers of K via the Ritz values and plug them into the DoF and variance formulas.
- Return the fitted coefficients together with the DoF estimate and prediction confidence intervals.
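Putting the steps above together, a compact sketch of the pipeline (a hypothetical reimplementation, not the authors' code): for a one-dimensional response, the m-component KPLS fit can be written as the orthogonal projection of y onto span{Ky, …, Kᵐy}, and the Ritz values can be read off the projected matrix. The paper's exact DoF and variance formulas, which consume the resulting trace estimates, are not reproduced here.

```python
import numpy as np

def kpls_lanczos(K, y, m):
    """Sketch: fit m-component KPLS and return (fitted values, Ritz values).
    Uses the projection form of the fit; not the authors' implementation."""
    n = len(y)
    Q = np.zeros((n, m))
    w = K @ y                                  # Krylov start vector
    for j in range(m):
        w -= Q[:, :j] @ (Q[:, :j].T @ w)       # Gram-Schmidt
        w -= Q[:, :j] @ (Q[:, :j].T @ w)       # reorthogonalize
        Q[:, j] = w / np.linalg.norm(w)
        w = K @ Q[:, j]
    y_hat = Q @ (Q.T @ y)                      # KPLS fitted values
    ritz = np.linalg.eigvalsh(Q.T @ (K @ Q))   # spectral approximation of K
    return y_hat, ritz

# Toy regression problem (all choices illustrative).
rng = np.random.default_rng(2)
x = rng.standard_normal(200)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)
y = np.sin(2 * x) + 0.1 * rng.standard_normal(200)

y_hat, ritz = kpls_lanczos(K, y, m=8)
trace_powers = {k: np.sum(ritz ** k) for k in (1, 2, 3)}  # feeds DoF/variance
print(np.linalg.norm(y - y_hat), trace_powers[1])
```

Because the subspaces are nested, adding components can only shrink the training residual, and all quantities above are obtained from O(N²) matrix–vector products plus O(m³) small-matrix work.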
Empirical evaluation is performed on both synthetic data (where ground‑truth DoF can be computed analytically) and real‑world genomic datasets with up to N ≈ 10 000 samples. The results demonstrate that the Lanczos‑based approximations achieve sub‑percent relative error in DoF and variance estimates while reducing total runtime by a factor of 6–9 compared with the naïve cubic implementation. Memory consumption remains O(N) because the algorithm never materializes the full N × N kernel matrix; it only stores the N‑length Lanczos vectors and the O(m²) tridiagonal matrix.
The paper’s contributions are threefold:
- It establishes a rigorous equivalence between KPLS and Lanczos tridiagonalization, opening the door to spectral approximations within the KPLS framework.
- It derives a practical method for estimating trace‑based statistics (DoF, residual variance) in quadratic time, eliminating the dominant cubic cost that has limited KPLS’s applicability to large‑scale problems.
- It validates the approach experimentally, showing that the speedup does not come at the expense of statistical accuracy, thereby enabling fast model selection and uncertainty quantification in real‑time or high‑throughput pipelines.
Future directions suggested include extending the technique to multi‑response KPLS, online or streaming settings where the Krylov basis can be updated incrementally, and exploring alternative kernel approximations (e.g., Nyström) in combination with Lanczos to further reduce computational load. Overall, the work provides a compelling bridge between classical numerical linear algebra and modern kernel‑based machine learning, delivering a scalable solution to a long‑standing computational obstacle in KPLS regression.