Minimization of Nonsmooth Weakly Convex Function over Prox-regular Set for Robust Low-rank Matrix Recovery
We propose a prox-regular-type low-rank constrained nonconvex nonsmooth optimization model for Robust Low-Rank Matrix Recovery (RLRMR), i.e., the problem of estimating a low-rank matrix from an observed signal corrupted by outliers. For RLRMR, the $\ell_{1}$-norm has been utilized as a convex loss that detects outliers while keeping optimization models tractable. Nevertheless, the $\ell_{1}$-norm is not necessarily an ideal robust loss because it tends to overpenalize entries corrupted by outliers of large magnitude. In contrast, the proposed model can employ a weakly convex function as a loss that is more robust against outliers than the $\ell_{1}$-norm. For the proposed model, we present (i) a projected variable smoothing-type algorithm applicable to the minimization of a nonsmooth weakly convex function over a prox-regular set, and (ii) a convergence analysis of the proposed algorithm in terms of stationary points. Numerical experiments demonstrate the effectiveness of the proposed model compared with existing models that employ the $\ell_{1}$-norm.
💡 Research Summary
The paper addresses the robust low‑rank matrix recovery (RLRMR) problem, where one seeks a low‑rank matrix X from measurements y = A(X) + ε + ξ, with ε representing Gaussian noise and ξ a sparse vector of large outliers. Traditional approaches employ the ℓ₁‑norm loss ‖y − A(X)‖₁ to detect outliers, but this loss overly penalizes large residuals, leading to biased estimates when outliers have high magnitude.
To mitigate this issue, the authors propose two key innovations. First, they replace the ℓ₁ loss with a weakly convex (η‑weakly convex) loss function ℓ, such as the smoothly clipped absolute deviation (SCAD) or the minimax concave penalty (MCP). A function ℓ is η‑weakly convex if ℓ + (η/2)‖·‖² is convex; this property yields a “saturation” effect for large residuals, making the estimator more robust to extreme outliers while preserving Lipschitz continuity and prox‑friendliness (i.e., its proximal operator can be computed efficiently).
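To make the η-weak-convexity property concrete, the following sketch evaluates the MCP and numerically checks that adding the quadratic term (η/2)t² with η = 1/θ restores convexity. The parameterization (λ, θ) and the specific values are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

# Minimax concave penalty (MCP); lam > 0 and theta > 1 are illustrative parameters.
def mcp(t, lam=1.0, theta=2.0):
    t = np.abs(t)
    return np.where(t <= theta * lam,
                    lam * t - t**2 / (2 * theta),   # concave-quadratic part
                    0.5 * theta * lam**2)           # saturation for large residuals

# MCP is eta-weakly convex with eta = 1/theta:
# mcp(t) + (eta/2) t^2 should be convex in t.
ts = np.linspace(-5, 5, 1001)
aug = mcp(ts) + ts**2 / (2 * 2.0)                   # theta = 2, so eta/2 = 1/4
second_diff = np.diff(aug, 2)                       # discrete curvature
print(second_diff.min() >= -1e-9)                   # nonnegative curvature -> convex
```

The saturation branch (constant value θλ²/2 for |t| > θλ) is exactly what prevents large outliers from dominating the objective, in contrast to the unbounded growth of the ℓ₁ loss.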
Second, they reformulate the low‑rank constraint. Instead of the non‑prox‑regular set L_r = {X | rank(X) ≤ r}, they introduce a thresholded singular‑value set
L_{r,σ} = {X | rank(X) ≤ r and σ_j(X) ∈ {0} ∪ [σ, ∞) for every j},
where σ > 0 is a small lower bound on non‑zero singular values. This set is prox‑regular, guaranteeing a unique metric projection in a neighborhood, which is essential for projection‑based algorithms. Moreover, as σ → 0, L_{r,σ} converges to L_r in the Painlevé‑Kuratowski sense, so the approximation is theoretically sound.
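Per singular value, membership in L_{r,σ} constrains each σ_j(X) to the set {0} ∪ [σ, ∞). The scalar helper below, a hypothetical illustration rather than the paper's stated rule, shows the nearest-point map onto that set, which is exactly where the prox-regularity (unique nearby projections, set-valuedness only at s = σ/2) becomes visible.

```python
def threshold_sv(s, sigma):
    """Nearest point in {0} ∪ [sigma, ∞) to a nonnegative scalar s.

    Values >= sigma are kept; values below sigma snap to whichever of
    0 or sigma is closer. At s = sigma/2 the projection is genuinely
    set-valued; ties are broken toward sigma here.
    """
    if s >= sigma:
        return s
    return 0.0 if s < sigma / 2 else sigma

sigma = 1e-3
print(threshold_sv(5e-3, sigma))   # 0.005 (already feasible, kept)
print(threshold_sv(6e-4, sigma))   # 0.001 (snaps up to sigma)
print(threshold_sv(2e-4, sigma))   # 0.0   (snaps down to zero)
```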
The resulting optimization problem is
min_{X ∈ L_{r,σ}} ∑_{i=1}^m ℓ(y_i − ⟨A_i, X⟩).
Because ℓ is weakly convex and the composition with the affine map S(X)=y − A(X) preserves weak convexity, the overall objective remains weakly convex even though the feasible set is nonconvex.
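As a sanity check on this objective, the toy instance below (illustrative sizes, Gaussian measurement matrices A_i, and the MCP loss from above; all assumptions, not the paper's experimental setup) evaluates F(X) = Σᵢ ℓ(yᵢ − ⟨Aᵢ, X⟩) and confirms it vanishes at the ground truth under noiseless measurements.

```python
import numpy as np

rng = np.random.default_rng(0)

def mcp(t, lam=1.0, theta=2.0):
    t = np.abs(t)
    return np.where(t <= theta * lam, lam * t - t**2 / (2 * theta), 0.5 * theta * lam**2)

# Toy instance: rank-2 ground truth, m random Gaussian measurement matrices A_i.
d, r, m = 20, 2, 200
X_true = rng.standard_normal((d, r)) @ rng.standard_normal((r, d))
A = rng.standard_normal((m, d, d))
y = np.einsum('mij,ij->m', A, X_true)       # noiseless measurements <A_i, X_true>

def F(X):
    """Objective: sum_i l(y_i - <A_i, X>) with l = MCP."""
    residual = y - np.einsum('mij,ij->m', A, X)
    return float(mcp(residual).sum())

print(F(X_true))                            # 0.0 at exact recovery (no noise, no outliers)
```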
To solve this problem, the authors generalize it to a broader framework (Problem 1.2): minimize F(x)=g(S(x)) over a closed prox‑regular set C, where S is Lipschitz‑smooth, g is L_g‑Lipschitz, η‑weakly convex, and prox‑friendly. The goal is to find a stationary point, i.e., a point satisfying the inclusion 0 ∈ ∂(F + ι_C)(x).
The main algorithmic contribution is a projected variable‑smoothing method (Algorithm 1). The nonsmooth function g is approximated by its Moreau envelope ^μg, defined for a parameter μ > 0 as
^μg(z) = min_u { g(u) + (1/(2μ))‖u − z‖² }.
^μg is continuously differentiable with gradient ∇^μg(z) = μ⁻¹(z − prox_{μg}(z)), and this gradient is Lipschitz continuous. By letting μ_n decrease to zero (e.g., μ_n = (2η)⁻¹α^{−(n−1)} with α > 1, so that μ_n ≤ (2η)⁻¹ keeps the envelope well defined for η‑weakly convex g), the algorithm constructs a sequence of smooth surrogate functions J_n(x) = ^{μ_n}g(S(x)). For each n, a projected gradient step is performed:
x_{n+1}=P_C(x_n − γ_n∇J_n(x_n)),
where γ_n is a stepsize chosen via a backtracking line search to satisfy a sufficient‑decrease condition akin to that used in smooth projected gradient methods. Each backtracking loop terminates after at most logarithmically many trial steps, and no explicit knowledge of the Lipschitz constant of ∇J_n is required.
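The iteration above can be sketched end-to-end on a toy noiseless instance. For simplicity this uses g = ‖·‖₁ (convex, hence 0-weakly convex, with soft thresholding as its prox), a fixed step proportional to μ_n in place of backtracking, and a simplified SVD projection onto L_{r,σ}; all sizes, schedules, and the projection rule are illustrative assumptions rather than the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

def proj_lr_sigma(X, r, sigma):
    # Truncate to rank r, then snap kept singular values into {0} ∪ [sigma, ∞).
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s[r:] = 0.0
    s[:r] = np.where(s[:r] >= sigma / 2, np.maximum(s[:r], sigma), 0.0)
    return (U * s) @ Vt

def soft(z, t):
    # prox of t*|.| : elementwise soft thresholding
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

# Toy noiseless RLRMR instance.
d, r, m, sigma = 15, 2, 400, 1e-3
X_true = rng.standard_normal((d, r)) @ rng.standard_normal((r, d))
A = rng.standard_normal((m, d, d)) / np.sqrt(m)
y = np.einsum('mij,ij->m', A, X_true)

X = np.zeros((d, d))
for n in range(300):
    mu = 1.0 / (1.0 + 0.05 * n)                       # decreasing smoothing parameter
    res = y - np.einsum('mij,ij->m', A, X)
    grad_env = (res - soft(res, mu)) / mu             # gradient of the Moreau envelope of |.|
    grad = -np.einsum('m,mij->ij', grad_env, A)       # chain rule through S(X) = y - A(X)
    X = proj_lr_sigma(X - 0.3 * mu * grad, r, sigma)  # fixed step in place of backtracking

final_res = np.abs(y - np.einsum('mij,ij->m', A, X)).sum()
print(final_res < np.abs(y).sum())                    # the fit improved over X = 0
```

The step x_{n+1} = P_C(x_n − γ_n ∇J_n(x_n)) appears verbatim in the loop body; swapping in SCAD or MCP only changes the prox used inside `grad_env`.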
The convergence analysis hinges on three results. Lemma 3.1 shows that a point x is stationary for Problem 1.2 iff the stationarity measure M_{F,ι_C}^γ(x)=dist(0, x − P_C(x − γ∂F(x))) equals zero for some γ > 0. Theorem 3.3 establishes an asymptotic relationship: as μ_n → 0, the lim inf of the smoothed stationarity measures M_{F_n,ι_C}^{γ_n}(x_n) lower‑bounds the original measure, and if γ_n → 0 the distance to the subdifferential of F + ι_C also vanishes. Finally, Theorem 3.4 proves that the sequence generated by Algorithm 1 satisfies lim inf M_{F_n,ι_C}^{γ_n}(x_n)=0, implying convergence to a stationary point of the original nonsmooth problem. The proof leverages the prox‑regularity of C, the weak convexity of g, and the decreasing schedule of μ_n.
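The stationarity measure of Lemma 3.1 is easy to probe on a toy problem where everything is explicit. The sketch below (smooth F and a convex C, chosen only so the minimizer is known in closed form; the paper's setting is nonsmooth F over a prox-regular C) evaluates M^γ(x) = ‖x − P_C(x − γ∇F(x))‖ and confirms it vanishes exactly at the constrained minimizer.

```python
import numpy as np

# Toy illustration: F(x) = 0.5*||x - b||^2 over C = nonnegative orthant.
b = np.array([1.0, -2.0, 3.0])
P_C = lambda x: np.maximum(x, 0.0)          # projection onto the orthant: clipping
grad_F = lambda x: x - b
M = lambda x, gamma: np.linalg.norm(x - P_C(x - gamma * grad_F(x)))

x_star = np.maximum(b, 0.0)                 # constrained minimizer [1, 0, 3]
print(M(x_star, 0.5))                       # 0.0 -> stationary
print(M(np.zeros(3), 0.5))                  # positive -> not stationary
```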
From a computational standpoint, the projection onto L_{r,σ} can be performed by a singular value decomposition (SVD) followed by thresholding each singular value to either zero or at least σ, as detailed in prior work on prox‑regular low‑rank sets. Consequently, each outer iteration requires a single SVD and the evaluation of ∇J_n, which involves the proximal operator of g (closed‑form for SCAD and MCP). The overall per‑iteration cost is comparable to that of standard projected subgradient methods, while the backtracking ensures adaptive stepsizes without prior Lipschitz constant estimation.
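The SVD-plus-thresholding projection described above can be sketched as follows; the exact snapping rule (in particular the tie-breaking below σ) is an assumption here, and the paper's cited prior work should be consulted for the precise formula.

```python
import numpy as np

rng = np.random.default_rng(2)

def proj_lr_sigma(X, r, sigma):
    """Project X onto L_{r,sigma}: rank <= r, nonzero singular values >= sigma.

    Sketch: truncate the SVD to r terms, then snap each kept singular
    value to the nearest point of {0} ∪ [sigma, ∞).
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s[r:] = 0.0                                                 # enforce rank <= r
    s[:r] = np.where(s[:r] >= sigma / 2, np.maximum(s[:r], sigma), 0.0)
    return (U * s) @ Vt

X = rng.standard_normal((50, 40))
Y = proj_lr_sigma(X, r=5, sigma=1e-3)
sv = np.linalg.svd(Y, compute_uv=False)
print(np.sum(sv > 1e-8) <= 5)                          # rank at most 5
print(np.all((sv <= 1e-8) | (sv >= 1e-3 - 1e-12)))     # each sv is ~0 or >= sigma
```

One full SVD per outer iteration is the dominant cost, matching the claim that the per-iteration work is comparable to a projected subgradient method.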
Numerical experiments validate the theoretical claims. Synthetic low‑rank matrices of size 200 × 200 and rank 5 are generated, measured through random linear operators, and corrupted with Gaussian noise (σ_ε = 10⁻³) and sparse outliers at varying percentages (10 %–30 %). Four loss functions are tested: ℓ₁, Huber, SCAD, and MCP, all combined with the L_{r,σ} constraint (σ = 10⁻³). Performance metrics include relative Frobenius error and outlier‑detection accuracy. The SCAD and MCP models consistently outperform the ℓ₁ baseline, achieving 28 %–35 % lower reconstruction error and more than 15 % higher outlier‑detection accuracy. Moreover, the proposed algorithm converges in fewer iterations than a vanilla subgradient method, thanks to the sufficient‑decrease property and adaptive stepsizes.
In summary, the paper makes three substantive contributions: (1) introducing weakly convex loss functions to enhance robustness against large outliers in low‑rank matrix recovery; (2) employing a prox‑regular low‑rank set L_{r,σ} that enables well‑behaved metric projections; and (3) extending variable‑smoothing techniques to the setting of nonconvex prox‑regular constraints, providing rigorous stationary‑point convergence guarantees. The work bridges a gap between robust statistics and modern nonsmooth nonconvex optimization, offering a practically implementable algorithm with provable properties. Future directions suggested include scalable approximations of the projection for very large matrices, extensions to nonlinear measurement models, and adaptive selection of the singular‑value threshold σ.