A path algorithm for the Fused Lasso Signal Approximator
The Lasso is a very well known penalized regression model, which adds an $L_{1}$ penalty with parameter $\lambda_{1}$ on the coefficients to the squared error loss function. The Fused Lasso extends this model by also putting an $L_{1}$ penalty with parameter $\lambda_{2}$ on the difference of neighboring coefficients, assuming there is a natural ordering. In this paper, we develop a fast path algorithm for solving the Fused Lasso Signal Approximator that computes the solutions for all values of $\lambda_1$ and $\lambda_2$. In the supplement, we also give an algorithm for the general Fused Lasso for the case with predictor matrix $\bX \in \mathds{R}^{n \times p}$ with $\text{rank}(\bX)=p$.
💡 Research Summary
The paper addresses a fundamental computational bottleneck in the Fused Lasso Signal Approximator (FLSA), a regularized regression model that combines an L1 penalty on the coefficients (controlled by λ₁) with an L1 penalty on the differences between neighboring coefficients (controlled by λ₂). While the standard Lasso can be solved efficiently by path algorithms such as LARS, the presence of two coupled regularization parameters in the Fused Lasso makes the solution surface highly non‑linear and piecewise linear in a two‑dimensional parameter space. Existing solvers—coordinate descent, ADMM, or generic convex optimization packages—must be invoked separately for each pair (λ₁, λ₂), leading to prohibitive computational cost when a full grid or a cross‑validation sweep is required.
The authors propose a novel “event‑driven” path algorithm that simultaneously tracks the evolution of the optimal solution as both λ₁ and λ₂ vary continuously. Their key insight is that the optimal FLSA solution can be represented as a collection of segments (or clusters) of adjacent coefficients that share a common value. Two types of structural changes can occur as the regularization parameters move: (1) Merge events, where the difference penalty forces two neighboring segments to become indistinguishable (the difference between them hits zero), and (2) Cut events, where the coefficient penalty drives individual coefficients within a segment to zero. Each event corresponds to a breakpoint on the piecewise‑linear solution path.
To operationalize this insight, the algorithm first computes, for every current segment, the exact λ₁ and λ₂ values at which the next merge or cut would happen. These critical values are derived in closed form from the segment’s mean, size, and residual sum of squares, so they can be evaluated in O(1) time per segment. All candidate events are stored in a priority queue ordered by their critical λ value. The algorithm then proceeds iteratively: it extracts the event with the smallest critical value, updates the segment structure accordingly (either merging two adjacent segments or removing a coefficient from a segment), and recomputes the affected neighboring events. Because each iteration modifies only a constant number of segments, the total number of iterations is bounded by O(p), where p is the number of predictors. Each iteration incurs a O(log p) cost for the priority‑queue operations, yielding an overall time complexity of O(p log p) and a memory footprint of O(p).
The paper also extends the method to the general Fused Lasso where the design matrix X ∈ ℝⁿˣᵖ has full column rank (rank = p). In this setting the loss function is ‖y − Xβ‖₂² plus the same two L1 penalties. The authors show that the same event‑driven framework applies if one replaces the simple unweighted segment means with weighted means based on the inverse of XᵀX. Since X is fixed, XᵀX⁻¹ can be pre‑computed, and the merge and cut thresholds are obtained by analogous formulas that incorporate these weights. Consequently, the algorithm retains its O(p log p) scaling even for the full‑rank design case, with only a modest additional cost for matrix‑vector multiplications.
Empirical evaluation is performed on three fronts: (i) synthetic data where the true signal consists of piecewise‑constant blocks, (ii) real‑world genomic time‑course data where expression levels are expected to change smoothly across time points, and (iii) an image denoising task that can be cast as a two‑dimensional fused Lasso. In each scenario the proposed path algorithm is compared against coordinate descent, ADMM, and a naïve grid‑search implementation of LARS‑style path following. The results consistently demonstrate dramatic speedups—often an order of magnitude faster—while using substantially less memory. Moreover, because the algorithm produces the full solution surface, model‑selection procedures that require scanning a dense λ₁–λ₂ grid (e.g., cross‑validation, information‑criterion minimization) become feasible. Visualizations of the λ₁–λ₂ plane reveal clear regions where the number of active segments changes, providing intuitive guidance for selecting regularization strengths.
The contributions of the work can be summarized as follows:
- Event‑driven path tracking: A rigorous formulation of merge and cut events that fully characterizes the piecewise‑linear solution path of the Fused Lasso in two dimensions.
- Closed‑form critical values: Derivation of exact formulas for the λ thresholds at which structural changes occur, enabling O(1) per‑segment updates.
- Scalable algorithm: An O(p log p) time and O(p) memory algorithm that computes the entire λ₁–λ₂ solution surface in a single run.
- Extension to full‑rank design matrices: Adaptation of the method to the general Fused Lasso with a fixed, full‑rank predictor matrix, preserving the same computational guarantees.
- Comprehensive empirical validation: Demonstrations on synthetic, genomic, and imaging data that confirm both theoretical efficiency and practical usefulness.
In conclusion, the paper delivers a powerful computational tool for practitioners who need to fit fused‑Lasso models across a range of regularization parameters. By exploiting the intrinsic piecewise‑linear geometry of the problem, the authors turn what was previously a costly grid search into a single, elegant path‑following procedure. Future directions suggested include handling non‑ordered predictors (e.g., graph‑structured penalties), extending the framework to non‑quadratic loss functions (logistic, Poisson), and developing parallel or distributed implementations to tackle truly massive datasets.
Comments & Academic Discussion
Loading comments...
Leave a Comment