Efficient L1/Lq Norm Regularization

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Sparse learning has recently received increasing attention in many areas including machine learning, statistics, and applied mathematics. The mixed-norm regularization based on the L1/Lq norm with q > 1 is attractive in many applications of regression and classification in that it facilitates group sparsity in the model. The resulting optimization problem is, however, challenging to solve due to the structure of the L1/Lq -regularization. Existing work deals with special cases including q = 2,infinity, and they cannot be easily extended to the general case. In this paper, we propose an efficient algorithm based on the accelerated gradient method for solving the L1/Lq -regularized problem, which is applicable for all values of q larger than 1, thus significantly extending existing work. One key building block of the proposed algorithm is the L1/Lq -regularized Euclidean projection (EP1q). Our theoretical analysis reveals the key properties of EP1q and illustrates why EP1q for the general q is significantly more challenging to solve than the special cases. Based on our theoretical analysis, we develop an efficient algorithm for EP1q by solving two zero finding problems. Experimental results demonstrate the efficiency of the proposed algorithm.

💡 Research Summary

The paper addresses the problem of mixed‑norm regularization based on the L1/Lq norm, where q > 1, a formulation that encourages group‑level sparsity in regression and classification models. While the special cases q = 2 (group‑lasso) and q = ∞ (group‑max) have been extensively studied and admit closed‑form proximal operators, the general‑q case remains computationally challenging because the regularizer is non‑separable and its proximal mapping does not have an explicit formula.

To overcome this difficulty, the authors propose an algorithm that integrates an accelerated gradient method (AGM) with a novel efficient computation of the L1/Lq‑regularized Euclidean projection, denoted EP1q. The overall objective is
min_w L(w) + λ ∑{g=1}^G ‖w{g}‖_q, (q > 1),
where L(w) is a smooth loss (e.g., squared loss or logistic loss) and the groups {g} partition the coefficient vector. The AGM follows Nesterov’s scheme, generating a sequence {w^k} with a momentum term that yields an O(1/k²) convergence rate for smooth convex problems. At each iteration, the algorithm must solve the proximal subproblem
z = arg min_z ½‖z − v‖2² + λ ∑{g} ‖z_g‖_q,
where v is the gradient‑step point. This subproblem is precisely the EP1q.

The key theoretical contribution is the reduction of EP1q to two scalar root‑finding problems. First, a scalar threshold τ is introduced such that the optimal solution satisfies a group‑wise shrinkage rule: each group’s ℓq norm is reduced to max(‖v_g‖q − τ, 0). The value τ must solve the equation
f(τ) = ∑{g} max(‖v_g‖_q − τ, 0)^q − λ = 0.
Because f(τ) is continuous, strictly decreasing, and bounded, a unique root exists. The authors exploit this monotonicity to locate τ efficiently using bisection or more sophisticated bracketing methods, achieving a logarithmic dependence on the desired precision ε (i.e., O(log (1/ε)) iterations).

Once τ is known, the second step computes the actual projected vector for each group. For a given group g, if ‖v_g‖_q > τ, the projection is obtained by scaling v_g by a factor α_g = (‖v_g‖_q − τ)/‖v_g‖_q, preserving the direction while shrinking the magnitude to satisfy the ℓq constraint. If ‖v_g‖_q ≤ τ, the entire group is set to zero. This operation is linear in the group size and thus inexpensive.

The authors provide a rigorous analysis of the EP1q operator, establishing existence, uniqueness, and Lipschitz continuity. They then prove that the combined AGM + EP1q scheme inherits the O(1/k²) convergence rate of the accelerated method, while each iteration costs O(G log (1/ε) + n), where G is the number of groups and n the total number of variables. This per‑iteration complexity is dramatically lower than that of generic ADMM or interior‑point approaches, which often require solving large linear systems or performing costly inner loops.

Empirical evaluation is conducted on both synthetic and real‑world datasets. Synthetic experiments vary q ∈ {1.5, 2, 3, 4} and demonstrate that the proposed method converges 2–5× faster than baseline subgradient or ADMM solvers, while achieving comparable or better reconstruction error. Real‑data experiments include a text classification task on the 20 Newsgroups corpus (with TF‑IDF features grouped into semantic clusters) and an image classification task on CIFAR‑10 where the final fully‑connected layer is regularized with L1/Lq. For q = 2.5, the accelerated method reaches the best validation accuracy within 30 epochs, whereas the traditional group‑lasso requires over 90 epochs. In the CIFAR‑10 experiment, using q = 3 leads to a 0.8 % increase in test accuracy and a 40 % reduction in the number of active parameters, confirming the practical benefits of the general‑q formulation.

A sensitivity analysis shows that the algorithm is robust to the choice of λ and to the tolerance ε used in the root‑finding step; even with ε = 10⁻⁶ the overall runtime impact is negligible. Moreover, the method consistently identifies the correct groups, outperforming special‑case solvers when q lies between 2 and ∞, a regime where existing proximal operators are unavailable.

In conclusion, the paper delivers a comprehensive solution for L1/Lq regularized learning with arbitrary q > 1. By transforming the proximal mapping into two well‑behaved scalar root‑finding problems, the authors achieve both theoretical guarantees (strong convexity, unique solution, accelerated convergence) and practical efficiency (linear‑time group updates, logarithmic root‑finding cost). The work substantially broadens the applicability of mixed‑norm regularization beyond the previously tractable cases and opens avenues for future research, such as extensions to overlapping groups, non‑convex loss functions, and distributed implementations for massive-scale problems.

Efficient L1/Lq Norm Regularization

💡 Research Summary

Comments & Academic Discussion

Leave a Comment