Total singular value decomposition. Robust SVD, regression and location-scale
Singular Value Decomposition (SVD) is the basic body of many statistical algorithms and few users question whether SVD is properly handling its job. SVD aims at evaluating the decomposition that best approximates a data matrix, given some rank restriction. However often we are interested in the best components of the decomposition rather than in the best approximation . This conflict of objectives leads us to introduce {\em Total SVD}, where the word “Total” is taken as in “Total” least squares. SVD is a least squares method and, therefore, is very sensitive to gross errors in the data matrix. We make SVD robust by imposing a weight to each of the matrix entries. Breakdown properties are excellent. Algorithmic aspects are handled; they rely on high dimension fixed point computations.
💡 Research Summary
The paper begins by pointing out a fundamental limitation of the classical singular value decomposition (SVD): it is a least‑squares method that seeks the best low‑rank approximation of a data matrix, but it does so under the assumption that the observed entries are error‑free. In practice, data matrices are contaminated by outliers, sensor faults, missing values, and other gross errors. Because the ordinary SVD minimizes the Frobenius norm of the residual matrix, even a single extreme entry can dominate the loss and severely distort the resulting singular vectors and values.
To address this vulnerability, the authors introduce the concept of “Total SVD,” borrowing the terminology from total least squares (TLS). In TLS the errors are allowed to affect both the dependent and independent variables, and the solution simultaneously adjusts the data matrix and the model parameters to minimize the total error. Translating this idea to SVD, the authors propose to attach a weight wᵢⱼ to each entry xᵢⱼ of the data matrix X. The weighted objective becomes
min_{U,V} ‖W ⊙ (X – UVᵀ)‖_F ,
where ⊙ denotes element‑wise multiplication and W contains the weights. The weights are not fixed a priori; they are updated iteratively based on the magnitude of the residuals. Specifically, a robust M‑estimator loss (e.g., Huber, Tukey’s biweight) is used to map residual size to a down‑weighting factor, so that large residuals receive small weights and therefore have limited influence on subsequent updates.
Algorithmically, the method proceeds in a fixed‑point iteration. An initial low‑rank approximation is obtained by the ordinary SVD (U₀,V₀). Residuals rᵢⱼ = xᵢⱼ – (U₀V₀ᵀ)ᵢⱼ are computed, the weight matrix W is updated using the chosen M‑estimator, and a weighted SVD of W ⊙ X is performed to produce a new pair (U₁,V₁). This cycle repeats until convergence of both the weight matrix and the factor matrices. The authors prove convergence under standard Lipschitz continuity assumptions on the weight‑updating function, invoking Banach’s fixed‑point theorem. They also discuss computational tricks such as using QR decompositions and Lanczos bidiagonalization to keep each iteration comparable in cost to a standard SVD.
A major contribution of the paper is the robustness analysis. The breakdown point of ordinary SVD is essentially zero: a single arbitrarily large entry can drive the solution arbitrarily far. By contrast, Total SVD inherits the breakdown properties of the underlying M‑estimator. With a biweight loss, the method can tolerate up to roughly 50 % contaminated entries before the estimator collapses, a level comparable to the best robust regression techniques. Empirical simulations confirm that the method maintains accurate singular vectors and values even when a substantial fraction of the data are outliers.
The authors demonstrate the utility of Total SVD in two statistical contexts. First, in linear regression they augment the design matrix with the response vector, apply Total SVD, and extract regression coefficients from the low‑rank factors. The resulting estimates exhibit markedly reduced bias and more reliable standard‑error estimates compared with ordinary least‑squares when outliers are present. Second, in a location‑scale model (simultaneous estimation of mean and variance), Total SVD automatically down‑weights extreme observations, yielding robust mean estimates while preserving efficient variance estimation.
In conclusion, Total SVD re‑orients the objective of singular‑value decomposition from “best approximation” to “best components” under a robust weighting scheme. It retains the computational structure of classical SVD, adds only a modest fixed‑point overhead, and delivers dramatically improved resistance to gross data errors. The method is therefore poised to become a valuable tool in any application that relies on low‑rank matrix factorizations—such as dimensionality reduction, signal processing, image reconstruction, and genomics—where data quality cannot be guaranteed.
Comments & Academic Discussion
Loading comments...
Leave a Comment