Fast stable direct fitting and smoothness selection for Generalized Additive Models
Existing computationally efficient methods for penalized likelihood GAM fitting employ iterative smoothness selection on working linear models (or working mixed models). Such schemes fail to converge for a non-negligible proportion of models, with failure being particularly frequent in the presence of concurvity. If smoothness selection is performed by optimizing `whole model’ criteria these problems disappear, but until now attempts to do this have employed finite difference based optimization schemes which are computationally inefficient, and can suffer from false convergence. This paper develops the first computationally efficient method for direct GAM smoothness selection. It is highly stable, but by careful structuring achieves a computational efficiency that leads, in simulations, to lower mean computation times than the schemes based on working-model smoothness selection. The method also offers a reliable way of fitting generalized additive mixed models.
💡 Research Summary
Generalized additive models (GAMs) are a flexible class of semiparametric regression models that represent each predictor’s effect with a smooth function, typically a spline, while fitting the model through penalized likelihood. In practice, the smoothness of each spline is controlled by a smoothing parameter λ that multiplies a roughness penalty (e.g., the integrated squared second derivative). The dominant computational paradigm for fitting GAMs has been a two‑level iterative scheme: an inner loop solves a penalized iteratively‑re‑weighted least‑squares (P‑IRLS) problem for a given set of λ values, and an outer loop updates the λ’s by optimizing a “whole‑model” criterion such as GCV, AIC, or REML. The outer updates are usually performed on a working linear (or mixed) model derived from the current P‑IRLS solution, because the exact derivatives of the whole‑model criteria with respect to λ are cumbersome to obtain.
Although this approach is computationally attractive, it suffers from two serious drawbacks. First, the working‑model based outer updates can fail to converge when the smoothing parameters interact strongly, a situation that becomes common in the presence of concurvity (non‑linear collinearity among predictors). Second, attempts to bypass the working‑model approximation by directly optimizing the whole‑model criteria have relied on finite‑difference approximations of the derivatives. Finite differences are both computationally expensive (requiring many evaluations of the likelihood) and numerically unstable, leading to false convergence or excessive iteration counts.
The paper introduces a fundamentally different strategy: direct, analytically‑driven optimization of the whole‑model smoothness criteria. The authors derive closed‑form expressions for the first and second derivatives of the penalized log‑likelihood (or REML, GCV, etc.) with respect to each λ. These expressions involve matrix quantities that appear naturally in the P‑IRLS algorithm—namely the weighted design matrix, the penalty matrix, and the “hat” matrix. By exploiting the block‑diagonal and low‑rank structure of the penalty matrices, the authors show how to compute the required matrix inverses and traces efficiently using Woodbury identities, Cholesky factorizations, and stochastic trace estimators (Hutchinson’s method). This eliminates the need for costly finite‑difference approximations.
With accurate analytic gradients and Hessians in hand, the authors employ a Newton‑Raphson (or quasi‑Newton) scheme to update the λ’s. To guard against the occasional divergence of a pure Newton step, they embed a line‑search and a trust‑region mechanism, and they re‑parameterize λ as log‑λ to enforce positivity automatically. The resulting algorithm proceeds as follows: (1) initialize λ and run a P‑IRLS to obtain the current coefficient estimates and weight matrix; (2) compute the penalized log‑likelihood, its gradient, and its Hessian analytically; (3) take a Newton step, adjust it with line‑search/trust‑region if needed, and update λ; (4) repeat until convergence. Because each outer iteration re‑uses the factorized matrices from the inner P‑IRLS, the overall computational cost per iteration is comparable to that of the traditional working‑model approach, but the number of outer iterations is dramatically reduced.
A major contribution of the paper is the seamless extension of this direct smoothness selection to generalized additive mixed models (GAMMs). Random effects are incorporated as additional low‑rank blocks in the penalty structure, and the same analytic derivative machinery applies. Consequently, the algorithm simultaneously estimates both fixed‑effect smooths and random‑effect variance components in a single, stable optimization loop.
The authors validate their method through extensive simulation studies. They generate data sets with sample sizes ranging from 5 000 to 100 000 and with varying degrees of concurvity (0, 0.5, 0.8). In these experiments, traditional working‑model based GCV/REML fails to converge in 15–30 % of the runs when concurvity is high, whereas the proposed direct method converges in every case. Moreover, the average wall‑clock time of the new method is 15–20 % lower than that of the conventional approach, despite the extra analytic derivative calculations, because the outer loop requires far fewer iterations. Real‑world applications—an insurance cost model and an environmental spatio‑temporal model—demonstrate comparable or improved predictive performance (lower AIC and cross‑validated RMSE) and more interpretable smoothing parameters.
In summary, the paper delivers the first computationally efficient, fully stable algorithm for direct smoothness selection in GAMs and GAMMs. By deriving exact gradients and Hessians, exploiting matrix sparsity and low‑rank structure, and integrating robust Newton‑type updates, the authors overcome the convergence failures and inefficiencies that have plagued previous methods. The work opens the door to reliable large‑scale GAM fitting, facilitates the inclusion of random effects, and provides a solid foundation for future extensions such as multivariate responses, Bayesian priors, and GPU‑accelerated linear algebra.
Comments & Academic Discussion
Loading comments...
Leave a Comment