Fast and Flexible ADMM Algorithms for Trend Filtering
This paper presents a fast and robust algorithm for trend filtering, a recently developed nonparametric regression tool. It has been shown that, for estimating functions whose derivatives are of bounded variation, trend filtering achieves the minimax optimal error rate, while other popular methods like smoothing splines and kernels do not. Standing in the way of a more widespread practical adoption, however, is a lack of scalable and numerically stable algorithms for fitting trend filtering estimates. This paper presents a highly efficient, specialized ADMM routine for trend filtering. Our algorithm is competitive with the specialized interior point methods that are currently in use, and yet is far more numerically robust. Furthermore, the proposed ADMM implementation is very simple, and importantly, it is flexible enough to extend to many interesting related problems, such as sparse trend filtering and isotonic trend filtering. Software for our method is freely available, in both the C and R languages.
💡 Research Summary
This paper addresses the computational challenges of trend filtering, a non‑parametric regression technique that estimates a signal β from observations y by solving
min_β ½‖y − β‖₂² + λ‖D^{(k+1)}β‖₁,
where D^{(k+1)} is the (k + 1)‑st order discrete difference operator and λ controls smoothness. For k = 0 the problem reduces to the one‑dimensional fused‑lasso (total variation denoising); for k ≥ 1 the solution is a piecewise polynomial of degree k with adaptively chosen knots. While trend filtering enjoys minimax‑optimal statistical properties for functions with bounded‑variation derivatives, existing solvers either lack scalability (generic interior‑point methods) or are limited to the k = 0 case (taut‑string, dynamic programming).
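As a concrete illustration, the difference operator and the trend-filtering objective can be built directly from the recursion D^{(k+1)} = D^{(1)} D^{(k)}. This is our own minimal sketch (the helper names `diff_op` and `tf_objective` are not from the paper), and it materializes D densely purely for exposition; practical solvers exploit the banded structure instead.

```python
import numpy as np

def diff_op(n: int, order: int) -> np.ndarray:
    """Discrete difference operator D^{(order)} for length-n signals,
    built via the recursion D^{(k+1)} = D^{(1)} D^{(k)}."""
    D = np.eye(n)
    for k in range(order):
        m = n - k                                # current row dimension
        D1 = np.diff(np.eye(m), axis=0)          # first-difference matrix, (m-1) x m
        D = D1 @ D
    return D

def tf_objective(y, beta, lam, k):
    """Trend-filtering objective 0.5*||y - beta||^2 + lam*||D^{(k+1)} beta||_1."""
    Dk1 = diff_op(len(y), k + 1)
    return 0.5 * np.sum((y - beta) ** 2) + lam * np.sum(np.abs(Dk1 @ beta))
```

For k = 1 the rows of D^{(2)} are the second-difference stencil [1, −2, 1], so any exactly linear β incurs zero penalty, matching the piecewise-polynomial interpretation above.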
The authors propose a highly efficient, specialized Alternating Direction Method of Multipliers (ADMM) algorithm that leverages the recursive relationship D^{(k+1)} = D^{(1)} D^{(k)}. By introducing an auxiliary variable α = D^{(k)}β, the original problem is reformulated as
min_{β,α} ½‖y − β‖₂² + λ‖D^{(1)}α‖₁ subject to α = D^{(k)}β.
The augmented Lagrangian leads to three updates per iteration:
- β‑update – solve the linear system (I + ρ (D^{(k)})ᵀD^{(k)})β = y + ρ (D^{(k)})ᵀ(α + u). Because D^{(k)} is banded, with k + 1 nonzero entries per row, this step can be performed in O(n) time using a banded Cholesky factorization (computed once) followed by forward‑backward substitution.
- α‑update – solve
α = argmin_{α} ½‖D^{(k)}β − u − α‖₂² + (λ/ρ)‖D^{(1)}α‖₁,
which is exactly a 0‑th order trend‑filtering (fused‑lasso) problem. The authors exploit existing linear‑time fused‑lasso solvers (taut string or dynamic programming) to compute this step exactly and efficiently.
- Dual‑variable update – u ← u + α − D^{(k)}β.
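Putting the three updates together, one possible implementation might look like the sketch below. This is our own illustration, not the authors' C/R code, and it makes two deliberate simplifications: the O(n) banded Cholesky solve is replaced by a dense solve, and the exact linear‑time fused‑lasso step is replaced by projected gradient on its dual. It shows the structure of an iteration, not the paper's optimized routine.

```python
import numpy as np

def first_diff(m):
    return np.diff(np.eye(m), axis=0)          # (m-1) x m first-difference matrix

def fused_lasso_prox(b, gamma, iters=500):
    """argmin_a 0.5*||b - a||^2 + gamma*||D1 a||_1, solved by projected
    gradient on the dual (a stand-in for the exact taut-string solver)."""
    D = first_diff(len(b))
    z = np.zeros(len(b) - 1)
    for _ in range(iters):
        # step size 0.25 is safe since the largest eigenvalue of D D^T is at most 4
        z = np.clip(z + 0.25 * (D @ (b - D.T @ z)), -gamma, gamma)
    return b - D.T @ z

def trend_filter_admm(y, k, lam, rho=None, iters=200):
    n = len(y)
    rho = lam if rho is None else rho
    Dk = np.eye(n)
    for j in range(k):
        Dk = first_diff(n - j) @ Dk            # D^{(k)} via the recursion
    A = np.eye(n) + rho * Dk.T @ Dk            # dense stand-in for banded Cholesky
    alpha, u = Dk @ y, np.zeros(n - k)
    for _ in range(iters):
        beta = np.linalg.solve(A, y + rho * Dk.T @ (alpha + u))   # beta-update
        alpha = fused_lasso_prox(Dk @ beta - u, lam / rho)        # alpha-update
        u = u + alpha - Dk @ beta                                 # dual update
    return beta
```

The choice ρ = λ as a default is our own heuristic; the per-iteration structure (linear solve, fused-lasso prox, dual ascent step) matches the three updates listed above.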
Thus each ADMM iteration costs O(n) operations, comparable to the specialized primal‑dual interior‑point (PDIP) method of Kim et al. (2009), but with a dramatically lower constant factor: empirical timings show a single ADMM iteration is roughly ten times faster than a PDIP iteration.
The paper provides a thorough empirical comparison. In a motivating simulation (n = 1000, k = 1, λ = 1000) standard first‑order methods (proximal gradient, accelerated proximal gradient, coordinate descent) and a “standard” ADMM formulation (where the ℓ₁ penalty is applied directly to D^{(k+1)}β) converge extremely slowly, requiring thousands of iterations. In contrast, both the specialized ADMM and PDIP reach high‑accuracy solutions within about 20 iterations. Across a broad grid of problem sizes (n up to 10⁵) and λ values, the specialized ADMM is more robust: it converges reliably even when PDIP struggles with ill‑conditioning (especially for small λ and large n). For very small problems and large λ, PDIP’s second‑order convergence can be faster, but the visual quality of the ADMM solution after a modest number of iterations is already indistinguishable from the exact solution.
Beyond the basic model, the authors discuss extensions that the ADMM framework makes straightforward:
- Sparse trend filtering – adding an ℓ₁ penalty on α itself to encourage exact zero differences, yielding a “sparse” piecewise‑polynomial fit.
- Mixed‑order trend filtering – simultaneously fitting several orders k₁,…,k_m by introducing multiple auxiliary variables and penalizing a weighted sum of their ℓ₁ norms.
- Isotonic (monotone) trend filtering – imposing α ≥ 0 (or ≤ 0) constraints, which are handled by a simple projection in the α‑update.
- Unequally spaced inputs – adjusting the difference matrices with appropriate scaling factors; the same ADMM updates apply unchanged.
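For the isotonic variant, the "simple projection in the α‑update" mentioned above could be sketched as follows. Note the hedges: composing an unconstrained prox with a projection onto the nonnegative orthant is a heuristic simplification (the exact constrained prox can differ), and `fused_lasso_prox` is our illustrative projected-gradient stand‑in for an exact fused‑lasso solver, not the paper's code.

```python
import numpy as np

def fused_lasso_prox(b, gamma, iters=500):
    """Projected-gradient stand-in for an exact fused-lasso solver."""
    D = np.diff(np.eye(len(b)), axis=0)
    z = np.zeros(len(b) - 1)
    for _ in range(iters):
        z = np.clip(z + 0.25 * (D @ (b - D.T @ z)), -gamma, gamma)
    return b - D.T @ z

def isotonic_alpha_update(b, gamma):
    """alpha-update with the constraint alpha >= 0 imposed by projecting
    onto the nonnegative orthant after the prox step (heuristic sketch;
    the exact constrained prox may require a dedicated solver)."""
    return np.maximum(fused_lasso_prox(b, gamma), 0.0)
```

The other variants listed above would follow the same pattern: only the proximal operator (or projection) in the α‑update changes, leaving the β‑update and dual update intact.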
All these variants require only minor modifications to the α‑update (e.g., different proximal operators or projections) while retaining the O(n) per‑iteration cost and the same convergence robustness.
The authors also provide open‑source software implementations in C and R, with a clean API, low memory footprint, and optional multithreading for the banded linear solves. The code includes the fused‑lasso sub‑solver, making the package self‑contained.
In summary, by exploiting the recursive structure of discrete differences, the paper delivers a specialized ADMM algorithm that (1) reduces each iteration to a cheap banded linear solve plus an exact fused‑lasso subproblem, (2) achieves linear‑time scaling and superior numerical stability compared to existing interior‑point methods, and (3) offers a flexible platform for a wide range of trend‑filtering extensions. This work substantially lowers the computational barrier to applying trend filtering in large‑scale, real‑world data analysis.