MM Algorithms for Minimizing Nonsmoothly Penalized Objective Functions
In this paper, we propose a general class of algorithms for optimizing an extensive variety of nonsmoothly penalized objective functions that satisfy certain regularity conditions. The proposed framework utilizes the majorization-minimization (MM) algorithm as its core optimization engine. The resulting algorithms rely on iterated soft-thresholding, implemented componentwise, allowing for fast, stable updating that avoids the need for any high-dimensional matrix inversion. We establish a local convergence theory for this class of algorithms under weaker assumptions than previously considered in the statistical literature. We also demonstrate the exceptional effectiveness of new acceleration methods, originally proposed for the EM algorithm, in this class of problems. Simulation results and a microarray data example are provided to demonstrate the algorithm’s capabilities and versatility.
💡 Research Summary
This paper introduces a unified optimization framework for a broad class of high‑dimensional statistical estimation problems that involve nonsmooth (i.e., nondifferentiable at the origin) penalty functions. The authors formulate the objective as
ξ(β) = g(β) + p(β; λ) + λ ε‖β‖₂²,
where g(β) is a convex, coercive data‑fidelity term (e.g., a negative log‑likelihood), and p(β; λ) = Σ_j \tilde p(|β_j|; λ_j) is a separable penalty satisfying condition (P1): \tilde p is continuously differentiable, concave, positive for r > 0, zero at the origin, and has a finite positive right‑derivative at zero. The term λ ε‖β‖₂² with ε ≥ 0 is an optional ridge component. Condition (P1) encompasses many popular penalties, including LASSO, Adaptive LASSO, Elastic Net, SCAD, MCP, Geman‑Reynolds, and Yao's log‑penalty, as well as ridge‑type L2 regularization.
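As a quick numerical illustration (not from the paper), the conditions in (P1) can be checked for two penalties in the list above, the LASSO penalty p̃(r; λ) = λr and the MCP; the MCP parameter γ = 3 below is an illustrative choice:

```python
import numpy as np

# Two penalties covered by (P1): LASSO and MCP (gamma is an
# illustrative tuning value, not one used in the paper).

def lasso_pen(r, lam):
    return lam * np.asarray(r, dtype=float)

def mcp_pen(r, lam, gamma=3.0):
    # MCP: lam*r - r^2/(2*gamma) for r <= gamma*lam, constant afterward
    r = np.asarray(r, dtype=float)
    return np.where(r <= gamma * lam,
                    lam * r - r**2 / (2.0 * gamma),
                    0.5 * gamma * lam**2)

lam = 1.0
r = np.linspace(0.0, 5.0, 501)

# (P1): zero at the origin, positive for r > 0
assert float(lasso_pen(0.0, lam)) == 0.0 and float(mcp_pen(0.0, lam)) == 0.0
assert np.all(lasso_pen(r[1:], lam) > 0) and np.all(mcp_pen(r[1:], lam) > 0)

# (P1): finite positive right-derivative at zero (~ lam for both here)
eps = 1e-8
print(lasso_pen(eps, lam) / eps, float(mcp_pen(eps, lam)) / eps)
```

Both difference quotients at the origin come out near λ = 1, consistent with the finite positive right‑derivative required by (P1).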
The core algorithmic engine is the Majorization‑Minimization (MM) principle. For a current iterate α, a surrogate (majorizing) function is constructed as
ξ_SUR(β, α) = ξ(β) + ψ(β, α),
with ψ(β, α) = h(β, α) + q(β, α; λ) − p(β; λ). Here h(β, α) ≥ 0 vanishes when β = α, and q(β, α; λ) = Σ_j \tilde q(|β_j|, |α_j|; λ_j) linearizes the penalty via the first‑order Taylor expansion
\tilde q(r, s; θ) = \tilde p(s; θ) + \tilde p′(s; θ)(r − s).
Because this tangent line lies on or above the concave function \tilde p, we have ψ(β, α) ≥ 0, so ξ_SUR majorizes ξ with equality at β = α.
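The tangent‑majorization property underlying this construction can be verified numerically; the sketch below (with illustrative MCP parameters, not values from the paper) checks that q(r, s) ≥ p̃(r) for several expansion points s:

```python
import numpy as np

# Sketch: the first-order expansion q(r, s) = p(s) + p'(s)(r - s)
# majorizes a concave penalty p. Illustrated with the MCP penalty
# (gamma and lam are illustrative choices).

gamma, lam = 3.0, 1.0

def p(r):
    r = np.asarray(r, dtype=float)
    return np.where(r <= gamma * lam,
                    lam * r - r**2 / (2.0 * gamma),
                    0.5 * gamma * lam**2)

def dp(r):
    r = np.asarray(r, dtype=float)
    return np.where(r <= gamma * lam, lam - r / gamma, 0.0)

def q(r, s):
    return p(s) + dp(s) * (r - s)

r = np.linspace(0.0, 6.0, 601)
for s in [0.5, 1.0, 2.5, 4.0]:
    # tangent line sits on or above the concave penalty everywhere
    assert np.all(q(r, s) >= p(r) - 1e-12)
print("tangent line majorizes MCP at every expansion point tested")
```

Concavity of p̃ is exactly what makes the tangent line an upper bound, which in turn makes ψ(β, α) ≥ 0 and ξ_SUR a valid majorizer.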
Theorem 2.1 establishes that, under mild regularity (convexity of g, coercivity, finiteness and isolation of stationary points, and strict majorization of ξ by ξ_SUR), the MM iteration β^{(k+1)} = arg min_β ξ_SUR(β, β^{(k)}) converges to a stationary point of ξ. Notably, the theorem relaxes the usual twice‑continuous differentiability requirement, allowing the framework to handle the nonsmooth penalties listed above.
When the surrogate is minimized, dropping terms constant in β reduces the problem to
min_β g(β) + h(β, α) + Σ_j \tilde p′(|α_j|; λ_j) |β_j| + λ ε‖β‖₂²,
a weighted L1‑ (plus ridge‑) penalized problem whose minimizer can be computed componentwise by soft‑thresholding, which yields the fast, inversion‑free updates of the resulting algorithms.
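A minimal sketch of the resulting iterated soft‑thresholding scheme (not the authors' implementation) for a least‑squares loss g(β) = ½‖y − Xβ‖² with the LASSO penalty and ε = 0, where h(β, α) is taken as the standard quadratic majorizer with L an upper bound on the largest eigenvalue of XᵀX:

```python
import numpy as np

# Iterated soft-thresholding sketch for LASSO-penalized least squares.
# The data and lam below are synthetic/illustrative.

rng = np.random.default_rng(0)
n, p_dim, lam = 50, 10, 2.0
X = rng.standard_normal((n, p_dim))
beta_true = np.zeros(p_dim)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.1 * rng.standard_normal(n)

def soft(z, t):
    # componentwise soft-thresholding operator
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

L = np.linalg.eigvalsh(X.T @ X).max()   # Lipschitz bound for grad g
beta = np.zeros(p_dim)
for _ in range(500):
    grad = X.T @ (X @ beta - y)         # gradient of g at current iterate
    z = beta - grad / L                 # minimize the quadratic surrogate
    beta = soft(z, lam / L)             # componentwise soft-threshold

obj = 0.5 * np.sum((y - X @ beta) ** 2) + lam * np.sum(np.abs(beta))
print(np.round(beta, 3))
```

Each update touches one coordinate at a time through the closed‑form soft‑threshold, so no high‑dimensional matrix inversion is ever needed, matching the abstract's description of the algorithm class.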