A coordinate-wise optimization algorithm for the Fused Lasso
L1-penalized regression methods such as the Lasso (Tibshirani 1996) that achieve both variable selection and shrinkage have been very popular. An extension of this method is the Fused Lasso (Tibshirani and Wang 2007), which allows for the incorporation of external information into the model. In this article, we develop new and fast algorithms for solving the Fused Lasso which are based on coordinate-wise optimization. This class of algorithms has recently been applied very successfully to solve L1-penalized problems very quickly (Friedman et al. 2007). As a straightforward coordinate-wise procedure does not converge to the global optimum in general, we adapt it in two ways, using maximum-flow algorithms and a Huber-penalty-based approximation to the loss function. In a simulation study, we evaluate the speed of these algorithms and compare them to other standard methods. As the Huber-penalty-based method is only approximate, we also evaluate its accuracy. Apart from this, we also extend the Fused Lasso to logistic as well as proportional hazards models and allow for a more flexible penalty structure.
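As a concrete reference point, the Fused Lasso objective in its simplest signal-approximation form combines a squared-error fit with two L1 penalties: one on the coefficients themselves and one on differences between neighbors. The sketch below is illustrative only; the function name and arguments are not from the paper.

```python
import numpy as np

def fused_lasso_objective(y, beta, lam1, lam2):
    """Fused Lasso objective, signal-approximation form (illustrative):
    1/2 * ||y - beta||^2 + lam1 * sum_j |beta_j|
                         + lam2 * sum_j |beta_j - beta_{j-1}|."""
    fit = 0.5 * np.sum((y - beta) ** 2)             # squared-error loss
    sparsity = lam1 * np.sum(np.abs(beta))          # L1 penalty: shrinks toward zero
    fusion = lam2 * np.sum(np.abs(np.diff(beta)))   # penalty on neighboring differences
    return fit + sparsity + fusion
```

The `fusion` term is what couples the coordinates and makes naive coordinate descent unreliable, as discussed below.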
💡 Research Summary
The paper introduces fast coordinate‑wise optimization algorithms for solving the Fused Lasso, a regression technique that combines an L1 penalty for sparsity with an additional L1 penalty on differences between neighboring coefficients. While coordinate descent works well for the ordinary Lasso, the non‑smooth difference penalty in the Fused Lasso prevents naïve coordinate updates from guaranteeing convergence to the global optimum. To overcome this, the authors propose two complementary modifications. The first builds a graph whose nodes represent coefficients and whose edges carry the difference penalties; by solving a max‑flow/min‑cut problem on this graph they identify groups of coefficients that should share the same value (active sets). Within each active set a simple coordinate update is performed, and the active set is recomputed iteratively until convergence. The second modification replaces the absolute‑value difference term with a Huber‑type smooth approximation. The Huber loss behaves quadratically for small differences and linearly for large ones, preserving the essential sparsity‑inducing behavior while rendering the objective differentiable. This smooth surrogate enables standard coordinate descent to converge reliably, and the approximation error is shown to be negligible in practice. Both algorithms retain the O(1) per‑coordinate update cost of ordinary coordinate descent; the max‑flow step adds a computational cost of roughly O(p log p), where p is the number of predictors, which is still far lower than interior‑point or generic convex‑optimization methods.
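The Huber-smoothing idea described above can be sketched in a few lines: the non-smooth difference penalty is replaced by a Huber function (quadratic within a band of width delta, linear outside it), after which each coordinate can take a simple proximal-gradient step (smooth gradient step followed by soft-thresholding for the remaining L1 term). This is a minimal sketch under those assumptions, not the authors' implementation; all names and the one-step-per-coordinate scheme are illustrative.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: the prox of t * |.|."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def huber_grad(x, delta):
    """Derivative of the Huber approximation to |x|:
    x / delta inside the quadratic band, sign(x) outside."""
    return x / delta if abs(x) <= delta else np.sign(x)

def fused_lasso_huber_cd(y, lam1, lam2, delta=0.05, sweeps=500):
    """Coordinate descent on the Huber-smoothed Fused Lasso
    (signal-approximation case; illustrative sketch). Each coordinate
    takes one proximal-gradient step per sweep."""
    beta = y.astype(float).copy()
    L = 1.0 + 2.0 * lam2 / delta  # Lipschitz bound for the smooth part
    for _ in range(sweeps):
        for j in range(len(beta)):
            g = beta[j] - y[j]    # gradient of the squared-error term
            if j > 0:
                g += lam2 * huber_grad(beta[j] - beta[j - 1], delta)
            if j < len(beta) - 1:
                g += lam2 * huber_grad(beta[j] - beta[j + 1], delta)
            beta[j] = soft_threshold(beta[j] - g / L, lam1 / L)
    return beta
```

Note the trade-off the summary mentions: a smaller `delta` approximates the true penalty more closely but inflates the Lipschitz bound `L` and hence slows each coordinate's progress.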
A comprehensive simulation study evaluates speed and accuracy across a range of problem sizes (p ≈ 1 000–20 000). The max‑flow version consistently reaches the exact global optimum, while the Huber‑based version attains solutions within 10⁻⁴ of the optimum but runs even faster. Compared with existing solvers such as glmnet and pathwise coordinate descent, the proposed methods achieve 5–30× speed‑ups without sacrificing solution quality. The authors also extend the framework to logistic regression and Cox proportional‑hazards models, demonstrating that the same penalty structure can be incorporated into generalized linear models and survival analysis. In these extensions, predictive performance remains competitive and the algorithms retain their computational advantages.
Finally, the paper discusses a more flexible penalty formulation that allows user‑specified weight matrices, enabling incorporation of domain knowledge such as spatial adjacency or network connectivity. By integrating graph‑theoretic max‑flow techniques with smooth Huber approximations, the work establishes a versatile design pattern for tackling complex, non‑smooth L1‑penalized problems. The contributions are both methodological—providing provably convergent, scalable algorithms—and practical, offering ready‑to‑use tools for high‑dimensional structured regression tasks in genomics, signal processing, and other fields where structured sparsity is essential.
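The flexible penalty formulation described above amounts to summing weighted absolute differences over the edges of a user-specified graph rather than only over chain neighbors. A minimal sketch, with an edge list standing in for the weight matrix (the representation and names here are assumptions for illustration):

```python
def graph_fused_penalty(beta, edges, weights):
    """Generalized fusion penalty over an arbitrary graph:
    sum over edges (j, k) of w_jk * |beta_j - beta_k|.
    'edges' and 'weights' encode user-supplied structure,
    e.g. spatial adjacency or network connectivity."""
    return sum(w * abs(beta[j] - beta[k])
               for (j, k), w in zip(edges, weights))
```

Setting `edges` to consecutive pairs `(j, j+1)` with unit weights recovers the ordinary chain-structured Fused Lasso penalty as a special case.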