Efficient algorithm to select tuning parameters in sparse regression modeling with regularization
In sparse regression modeling via regularization such as the lasso, it is important to select appropriate values of tuning parameters including regularization parameters. The choice of tuning parameters can be viewed as a model selection and evaluation problem. Mallows’ $C_p$ type criteria may be used as a tuning parameter selection tool in lasso-type regularization methods, for which the concept of degrees of freedom plays a key role. In the present paper, we propose an efficient algorithm that computes the degrees of freedom by extending the generalized path seeking algorithm. Our procedure allows us to construct model selection criteria for evaluating models estimated by regularization with a wide variety of convex and non-convex penalties. Monte Carlo simulations demonstrate that our methodology performs well in various situations. A real data example is also given to illustrate our procedure.
💡 Research Summary
The paper addresses the problem of selecting tuning parameters in sparse regression models that employ regularization such as the lasso, elastic net, group lasso, and various non‑convex penalties. In these models the regularization parameter controls both sparsity and model complexity, and its optimal value is traditionally chosen by model‑selection criteria such as Mallows’ Cp, AIC, BIC, or GCV. All of these criteria require an estimate of the degrees of freedom (df), defined as the sum of covariances between fitted values and the observed response divided by the error variance. While df is trivial for linear estimators (trace of the hat matrix), it is difficult to obtain for regularized estimators because the penalty term is non‑differentiable at zero and the solution path lacks a closed form.
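As a quick sanity check of this covariance definition (an illustration, not from the paper): for a linear estimator such as OLS, a Monte Carlo estimate of Σᵢ cov(μ̂ᵢ, yᵢ)/τ² should recover the trace of the hat matrix, which equals p for a full-rank design. The design, coefficients, and noise level below are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, tau = 30, 4, 1.0
X = rng.standard_normal((N, p))
H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix of the OLS fit

# Monte Carlo estimate of df = sum_i cov(mu_hat_i, y_i) / tau^2
mu_true = X @ np.ones(p)                 # arbitrary true mean for the simulation
R = 20000                                # number of Monte Carlo replicates
ys = mu_true + tau * rng.standard_normal((R, N))
fits = ys @ H.T                          # OLS fitted values for each replicate
cov_sum = np.mean(np.sum((fits - fits.mean(0)) * (ys - ys.mean(0)), axis=1))
df_mc = cov_sum / tau**2                 # should be close to trace(H)
df_exact = np.trace(H)                   # equals p for full-rank X
```

For a nonlinear, sparsity-inducing estimator no such closed form exists, which is exactly the gap the paper's algorithm fills.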
The authors propose a novel, computationally efficient algorithm that computes df simultaneously with the solution path by extending Friedman’s Generalized Path Seeking (GPS) method. GPS iteratively updates a single coefficient at each step, moving it in the direction of the gradient of the loss divided by the derivative of the penalty. The update rule for coefficient k is
β̂_k(t+Δt)=β̂_k(t)+Δt·λ_k(t),
where λ_k(t)=g_k(t)/p_k(t), g_k(t) is the gradient of the squared‑error loss, and p_k(t) is the derivative of the penalty with respect to |β_k|. The authors show that the fitted values μ̂(t)=Xβ̂(t) evolve according to
μ̂(t+Δt)=μ̂(t)+2Δt x_k x_kᵀ(y−μ̂(t))/N.
From this relationship they derive a recursive formula for the covariance matrix M(t)=cov(μ̂(t),y)/τ²:
I−M(t+Δt) = (I−α x_k x_kᵀ)(I−M(t)), α=2Δt/N.
Taking the trace yields the df at each step: df(t)=tr M(t). Because the update involves only rank‑one matrix operations, the algorithm avoids costly matrix inversions and can be implemented in O(Np) time.
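The rank-one recursion for M(t) can be sketched as follows. This is a minimal illustration rather than the paper's implementation: it repeatedly updates a single fixed coordinate and stores M as a dense N×N matrix for clarity (O(N²) per step), whereas the paper's bookkeeping avoids materializing the full matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50
x_k = rng.standard_normal(N)
x_k /= np.linalg.norm(x_k)            # unit-norm predictor column

dt = 0.01
alpha = 2 * dt / N                    # alpha = 2*dt/N as in the recursion
M = np.zeros((N, N))                  # M(0) = 0, so df(0) = 0
for _ in range(200):
    IM = np.eye(N) - M
    IM = IM - alpha * np.outer(x_k, x_k @ IM)   # (I - alpha x_k x_k^T)(I - M)
    M = np.eye(N) - IM
df = np.trace(M)                      # df(t) = tr M(t)
```

Because only one unit-norm coordinate is ever updated here, the closed form tr M = 1 − (1−α)^m holds after m steps, which makes the recursion easy to verify.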
Two practical enhancements are introduced. First, when the gradient g_k(t) becomes very small near the ordinary least‑squares solution, the algorithm takes a larger step of size m·Δt, where
m = log(α/|g_k(t)|) / log(1−α);
since each elementary step shrinks |g_k(t)| by the factor (1−α) for unit-norm predictors, m is the number of steps needed to bring it down to the threshold α.
This accelerates convergence without sacrificing accuracy. Second, to handle non‑monotone or discontinuous solution paths, the algorithm selects the update variable from the set S={j | λ_j(t)·β̂_j(t)<0} if S is non‑empty, otherwise it chooses the variable with the largest |λ_j(t)|. This rule ensures stability across a wide class of convex and non‑convex penalties that satisfy ∂P/∂|β_j|>0.
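Both enhancements are straightforward to express in code. The sketch below assumes a lasso-style penalty (so λ_j = g_j) and, when several indices fall in S, picks the one with the largest |λ_j| — the summary does not specify this tie-break, so treat it as an assumption.

```python
import numpy as np

def select_update_index(lam, beta):
    """GPS-style selection rule: prefer indices where the step would move the
    coefficient back toward zero (lam_j * beta_j < 0); otherwise take the
    coordinate with the largest |lam_j|. Tie-break within S is assumed."""
    S = np.where(lam * beta < 0)[0]
    if S.size > 0:
        return int(S[np.argmax(np.abs(lam[S]))])
    return int(np.argmax(np.abs(lam)))

def adaptive_steps(g_k, alpha):
    """Number m of elementary steps such that |g_k| (1-alpha)^m falls to the
    threshold alpha (valid when |g_k| > alpha > 0)."""
    return np.log(alpha / abs(g_k)) / np.log(1.0 - alpha)
```

For example, with λ = (0.5, −0.2, 0.1) and β̂ = (0, 0.3, 0), only index 1 has λ_j·β̂_j < 0, so it is chosen; with all coefficients at zero, the rule falls back to the largest |λ_j|.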
Algorithm 1 summarizes the procedure: initialize t=0, iterate while any |g_j(t)| exceeds a small threshold α, compute gradients and λ’s, determine the update index k (with the S‑set rule), compute the adaptive step size m, update β̂_k, update the covariance matrix M, compute df, and advance t. The method yields the entire regularization path together with an exact df estimate at each point.
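Putting the pieces together, here is a minimal end-to-end sketch of the procedure for the lasso penalty (so p_k = 1 and λ_k = g_k). This is an illustrative re-implementation under those assumptions, not the authors' code: it maintains M as a dense N×N matrix for clarity and omits the adaptive m-step acceleration.

```python
import numpy as np

def gps_lasso_path_with_df(X, y, dt=0.2, eps=1e-3, max_iter=5000):
    """Sketch of the path-seeking loop with df tracking, lasso case
    (penalty derivative p_k = 1, hence lam_k = g_k). Returns the
    coefficient path and the df estimate at each step."""
    N, p = X.shape
    beta = np.zeros(p)
    mu = np.zeros(N)
    M = np.zeros((N, N))                        # M(t) = cov(mu_hat, y)/tau^2
    alpha = 2 * dt / N
    path, dfs = [], []
    for _ in range(max_iter):
        g = 2 * X.T @ (y - mu) / N              # gradient of squared-error loss
        if np.max(np.abs(g)) <= eps:            # stop once all gradients are small
            break
        lam = g                                 # lasso: p_k = 1
        S = np.where(lam * beta < 0)[0]         # coordinates moving back to zero
        k = int(S[np.argmax(np.abs(lam[S]))]) if S.size else int(np.argmax(np.abs(lam)))
        beta[k] += dt * lam[k]                  # coefficient update
        xk = X[:, k]
        mu = mu + alpha * xk * (xk @ (y - mu))  # rank-one fitted-value update
        IM = np.eye(N) - M
        M = np.eye(N) - (IM - alpha * np.outer(xk, xk @ IM))
        path.append(beta.copy())
        dfs.append(np.trace(M))                 # df(t) = tr M(t)
    return np.array(path), np.array(dfs)
```

Along the returned path, df grows from (near) zero toward the dimension of the final model, and the residual shrinks as coefficients build up, mirroring the description of Algorithm 1 above.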
The authors evaluate the approach through extensive Monte Carlo simulations and a real‑world genetic data set. They compare Cp, AIC, BIC, and GCV computed with their df estimates against standard cross‑validation and bootstrap methods. Results show that the proposed df estimates are unbiased, lead to model‑selection criteria that correctly identify the true sparsity pattern, and dramatically reduce computational time (often by an order of magnitude). In the real data example, the algorithm selects a parsimonious model with predictive performance comparable to cross‑validated lasso, while providing transparent df values for interpretability.
In conclusion, the paper delivers a theoretically sound and practically fast algorithm for estimating degrees of freedom in a broad family of regularized regression models. By integrating df computation into the path‑seeking process, it enables efficient use of Cp‑type criteria for tuning‑parameter selection, offering an attractive alternative to computationally intensive resampling techniques. Future work may extend the framework to generalized linear models, survival analysis, and high‑dimensional structured data.