Structured Low-Rank Matrix Factorization with Missing and Grossly Corrupted Observations
Recovering low-rank and sparse matrices from incomplete or corrupted observations is an important problem in machine learning, statistics, bioinformatics, computer vision, and signal and image processing. In theory, this problem can be solved by natural convex joint/mixed relaxations (i.e., the ℓ₁-norm and the trace norm) under certain conditions. However, all current provable algorithms suffer from superlinear per-iteration cost, which severely limits their applicability to large-scale problems. In this paper, we propose a scalable, provable structured low-rank matrix factorization method to recover low-rank and sparse matrices from missing and grossly corrupted data, i.e., robust matrix completion (RMC) problems, or from incomplete and grossly corrupted measurements, i.e., compressive principal component pursuit (CPCP) problems. Specifically, we first present two small-scale matrix trace-norm-regularized bilinear structured factorization models for RMC and CPCP problems, in which repeatedly computing the SVD of a large-scale matrix is replaced by updating two much smaller factor matrices. We then apply the alternating direction method of multipliers (ADMM) to solve RMC problems efficiently. Finally, we provide a convergence analysis of our algorithm and extend it to address general CPCP problems. Experimental results verify both the efficiency and the effectiveness of our method compared with state-of-the-art methods.
💡 Research Summary
The paper tackles the problem of simultaneously recovering a low‑rank matrix and a sparse error matrix from observations that are both incomplete and heavily corrupted. This setting encompasses Robust Matrix Completion (RMC) – where only a subset of entries is observed and some of those are outliers – and Compressive Principal Component Pursuit (CPCP), where the data are accessed through a small number of linear measurements. Classical approaches formulate these tasks as convex programs that minimize a weighted sum of the ℓ₁‑norm (promoting sparsity) and the nuclear norm (promoting low rank). While these formulations enjoy strong theoretical guarantees, every iteration of the standard solvers (e.g., IALM, ADMM, accelerated proximal gradient) requires a full singular value decomposition (SVD) of an m × n matrix, leading to a per‑iteration cost of O(m n²). This makes them impractical for modern large‑scale problems.
The authors propose a fundamentally different strategy: replace the large low‑rank matrix L by a bilinear factorization L = U Vᵀ, where U ∈ ℝ^{m×d} and V ∈ ℝ^{n×d} with d ≥ rank(L). By imposing an orthogonality constraint UᵀU = I, they show (Lemma 3) that the nuclear norm of L equals the nuclear norm of V, i.e., ‖L‖* = ‖V‖*. Consequently, the original convex problem can be rewritten as a much smaller optimization over (U, V, S) with a trace‑norm regularizer only on V. For RMC the observation constraint P_Ω(D) = P_Ω(L + S) simplifies (Lemma 2) to D = U Vᵀ + S because the optimal sparse component is zero on unobserved entries. For CPCP the same factorization is used together with a general linear operator A.
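The key identity behind the factorized model, ‖UVᵀ‖* = ‖V‖* whenever U has orthonormal columns, is easy to check numerically. The snippet below is our own illustration (not code from the paper): it draws a random orthonormal U via QR and an arbitrary V, then compares the two nuclear norms.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, d = 50, 40, 5

# Random U with orthonormal columns (U^T U = I), obtained via economy QR.
U, _ = np.linalg.qr(rng.standard_normal((m, d)))
V = rng.standard_normal((n, d))
L = U @ V.T  # low-rank matrix, rank(L) <= d

# Nuclear (trace) norm = sum of singular values.
nuc = lambda M: np.linalg.svd(M, compute_uv=False).sum()

# Since U is orthonormal, the singular values of U V^T equal those of V.
assert np.isclose(nuc(L), nuc(V))
```

This equivalence is what lets the trace-norm regularizer act on the small n × d factor V instead of the full m × n matrix L.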
To solve the resulting non‑convex problem, the authors develop an ADMM scheme that alternately updates U, V, and S while maintaining the orthogonality of U. The sub‑problems are:
- U‑update: minimize ‖U Vᵀ − P_k‖_F² subject to UᵀU = I, where P_k = D − S_k + Y_k/α_k. Instead of the usual SVD of P_k V_k, they employ an economy QR decomposition, (U_{k+1}, R) = qr(P_k V_k). This yields an orthonormal basis for the column space of P_k V_k, dramatically reducing the computational cost (O(m d²) for the QR step versus O(m n²) for a full SVD) and enabling efficient parallel implementation.
- V‑update: solve the convex problem min_V ‖U_{k+1} Vᵀ − P_k‖_F² + λ‖V‖*. This is exactly the proximal operator of the nuclear norm, computed by singular‑value thresholding (SVT): take the SVD of U_{k+1}ᵀ P_k, shrink the singular values by λ/α_k, and reconstruct V. Since U_{k+1}ᵀ P_k is only d × n, this SVD is cheap.
- S‑update: minimize λ₁‖S‖₁ + (α_k/2)‖D − U_{k+1} V_{k+1}ᵀ − S‖_F², which has a closed‑form soft‑thresholding solution.
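Put together, one pass over the three updates can be sketched in NumPy as follows. This is a minimal illustration under our own assumptions (fully observed D, illustrative parameter names `alpha`, `lam`, `lam1`; the multiplier update is the standard ADMM step, which the summary does not spell out), not the paper's reference implementation.

```python
import numpy as np

def svt(M, tau):
    """Singular-value thresholding: prox of tau * (nuclear norm)."""
    Um, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (Um * np.maximum(s - tau, 0.0)) @ Vt

def soft(M, tau):
    """Entrywise soft-thresholding: prox of tau * (l1 norm)."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def rbf_iteration(D, U, V, S, Y, alpha, lam, lam1):
    """One sketch iteration of the alternating updates (fully observed case).

    D: m x n data, U: m x d orthonormal, V: n x d, S: m x n sparse term,
    Y: m x n Lagrange multiplier, alpha: penalty, lam/lam1: regularizers.
    """
    P = D - S + Y / alpha
    # U-update: orthonormal basis of col(P V) via economy QR (no large SVD).
    U, _ = np.linalg.qr(P @ V)
    # V-update: nuclear-norm prox on the small d x n matrix U^T P.
    V = svt(U.T @ P, lam / alpha).T
    # S-update: closed-form entrywise soft-thresholding.
    S = soft(D - U @ V.T, lam1 / alpha)
    # Standard ADMM multiplier update on the residual.
    Y = Y + alpha * (D - U @ V.T - S)
    return U, V, S, Y
```

Note that the only SVD performed is on a d × n matrix inside `svt`, while the m-dimensional side is touched only by the QR factorization, which is the source of the method's scalability.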
For CPCP, the linear operator A prevents a direct D = U Vᵀ + S formulation. The authors linearize the A‑constraint in the ADMM updates, yielding tractable sub‑problems analogous to those above while preserving convergence properties.
Theoretical contributions include:
- Equivalence Theorems (4 and 5): Any optimal solution (L*, S*) of the original convex RMC/CPCP problems can be expressed as (U*, V*, S*) with L* = U* V*ᵀ, confirming that the factorized model does not lose optimality.
- Convergence Analysis: Section 5 provides a partial convergence proof for the non‑convex ADMM, showing that the sequence of iterates approaches a stationary point and that the QR‑based U‑update is equivalent to the SVD‑based optimal update in the limit.
Empirical evaluation spans synthetic data, video background subtraction, collaborative filtering (MovieLens), and face image reconstruction. Compared with state‑of‑the‑art methods (IALM, LRSD, RPCA‑Alt, previous bilinear approaches), the proposed Robust Bilinear Factorization (RBF) algorithm achieves:
- Speedups of 5–12× on matrices up to 10⁵ × 10⁵, thanks to QR-based updates whose cost grows only linearly with the large matrix dimension, together with SVDs on small d-dimensional factors.
- Comparable or superior accuracy measured by RMSE, PSNR, and recovery of support for the sparse component, even under high missing rates (>70 %) and strong outliers.
- Scalability: Memory usage grows with O((m + n) d) rather than O(m n), enabling processing of datasets that would otherwise exceed RAM limits.
In summary, the paper delivers a scalable, provably sound framework for robust low‑rank matrix recovery under missing and grossly corrupted observations. By replacing costly full‑matrix SVDs with inexpensive QR factorizations and operating on low‑dimensional factors, the method bridges the gap between strong theoretical guarantees and practical applicability to modern large‑scale data problems.