MM Algorithms for Minimizing Nonsmoothly Penalized Objective Functions


In this paper, we propose a general class of algorithms for optimizing an extensive variety of nonsmoothly penalized objective functions that satisfy certain regularity conditions. The proposed framework utilizes the majorization-minimization (MM) algorithm as its core optimization engine. The resulting algorithms rely on iterated soft-thresholding, implemented componentwise, allowing for fast, stable updating that avoids the need for any high-dimensional matrix inversion. We establish a local convergence theory for this class of algorithms under weaker assumptions than previously considered in the statistical literature. We also demonstrate the exceptional effectiveness of new acceleration methods, originally proposed for the EM algorithm, in this class of problems. Simulation results and a microarray data example are provided to demonstrate the algorithm’s capabilities and versatility.


💡 Research Summary

This paper introduces a unified optimization framework for a broad class of high‑dimensional statistical estimation problems that involve nonsmooth (i.e., nondifferentiable at the origin) penalty functions. The authors formulate the objective as

 ξ(β) = g(β) + p(β; λ) + λ ε‖β‖₂²,

where g(β) is a convex, coercive data‑fidelity term (e.g., a negative log‑likelihood) and p(β; λ) = Σ_j \tilde p(|β_j|; λ_j) is a separable penalty satisfying condition (P1): \tilde p is continuously differentiable and concave on (0, ∞), positive for r > 0, zero at the origin, and has a finite positive right‑derivative at zero. (Concavity, rather than convexity, is what permits the tangent‑line majorization used below.) This condition encompasses many popular penalties such as the LASSO, Adaptive LASSO, Elastic Net, SCAD, MCP, Geman‑Reynolds, and Yao’s log‑penalty, while ridge‑type L2 regularization enters through the λε‖β‖₂² term.
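To make condition (P1) concrete, here is a small sketch (parameter names `lam` and `gamma` are our own illustrative choices, not the paper's notation) that writes two of the listed penalties as functions of r = |β_j| and numerically checks the (P1)-style properties: zero at the origin, positive for r > 0, and a finite positive right‑derivative at zero.

```python
def lasso(r, lam):
    # LASSO penalty: \tilde p(r; lam) = lam * r
    return lam * r

def mcp(r, lam, gamma):
    # Minimax concave penalty (MCP): concave quadratic spline,
    # flat once r exceeds gamma * lam
    if r <= gamma * lam:
        return lam * r - r * r / (2.0 * gamma)
    return 0.5 * gamma * lam ** 2

# Numerically check the (P1)-style properties at a few points.
eps = 1e-8
for p in (lambda r: lasso(r, 1.0), lambda r: mcp(r, 1.0, 3.0)):
    assert p(0.0) == 0.0                       # zero at the origin
    assert p(0.5) > 0.0 and p(2.0) > 0.0       # positive for r > 0
    right_deriv = (p(eps) - p(0.0)) / eps      # ≈ \tilde p'(0+; lam)
    assert 0.0 < right_deriv < float("inf")    # finite, positive
```

Note that MCP flattens out beyond γλ, which is what caps the penalty on large coefficients and reduces estimation bias relative to the LASSO.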

The core algorithmic engine is the Majorization‑Minimization (MM) principle. For a current iterate α, a surrogate (majorizing) function is constructed as

 ξ_SUR(β, α) = ξ(β) + ψ(β, α),

with ψ(β, α) = h(β, α) + q(β, α; λ) – p(β; λ). Here h(β, α) ≥ 0 vanishes when β = α, and q(β, α; λ) = Σ_j \tilde q(|β_j|, |α_j|; λ_j) is a linearization of the penalty using the first‑order Taylor expansion

 \tilde q(r, s; θ) = \tilde p(s; θ) + \tilde p′(s; θ)(r – s).

Theorem 2.1 establishes that, under mild regularity (convexity of g, coercivity, finiteness and isolation of stationary points, and strict majorization of ξ by ξ_SUR), the MM iteration β^{(k+1)} = arg min_β ξ_SUR(β, β^{(k)}) converges to a stationary point of ξ. Notably, the theorem relaxes the usual twice‑continuous differentiability requirement, allowing the framework to handle the nonsmooth penalties listed above.

When the surrogate is minimized, the problem separates across coordinates: \tilde q(|β_j|, |α_j|; λ_j) is linear in |β_j| with slope \tilde p′(|α_j|; λ_j), so each coordinate update is a one‑dimensional penalized problem whose closed‑form minimizer is a soft‑thresholding operator. The resulting updates are fast, stable, and componentwise, and avoid any high‑dimensional matrix inversion.
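The iterated‑soft‑thresholding idea can be sketched for the simplest case, g(β) = ½‖y − Xβ‖² with a LASSO penalty. This is our own minimal illustration (the function names, the choice of quadratic majorizer with curvature L, and the step count are assumptions, not the paper's exact implementation):

```python
import numpy as np

def soft_threshold(z, t):
    # S(z, t) = sign(z) * max(|z| - t, 0), applied componentwise
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def mm_lasso(X, y, lam, n_iter=500):
    """Sketch of an MM iteration for g(beta) = 0.5*||y - X beta||^2 with a
    LASSO penalty. g is majorized by a quadratic with curvature L (the
    largest eigenvalue of X^T X), so each iteration reduces to one
    componentwise soft-threshold -- no p-by-p system is ever solved."""
    L = np.linalg.eigvalsh(X.T @ X).max()    # majorizer curvature
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)          # gradient of g at current iterate
        beta = soft_threshold(beta - grad / L, lam / L)
    return beta
```

Each sweep costs only matrix–vector products, consistent with the fast, inversion‑free componentwise updating described in the abstract.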

