Making Tensor Factorizations Robust to Non-Gaussian Noise

Tensors are multi-way arrays, and the Candecomp/Parafac (CP) tensor factorization has found application in many different domains. The CP model is typically fit using a least squares objective function, which is a maximum likelihood estimate under the assumption of i.i.d. Gaussian noise. We demonstrate that this loss function can actually be highly sensitive to non-Gaussian noise. Therefore, we propose a loss function based on the 1-norm because it can accommodate both Gaussian and grossly non-Gaussian perturbations. We also present an alternating majorization-minimization algorithm for fitting a CP model using our proposed loss function.

💡 Research Summary

The paper addresses a critical vulnerability of the canonical polyadic (CP) tensor decomposition: its standard fitting procedure relies on a least‑squares (LS) objective, which is optimal only under the assumption of independent, identically distributed Gaussian noise. In many real‑world applications—such as sensor networks, biomedical recordings, and image processing—measurements are contaminated by non‑Gaussian disturbances, including spikes, outliers, and heavy‑tailed noise. Because the LS loss squares residuals, even a small proportion of large errors can dominate the objective and severely bias the factor estimates.

To mitigate this problem, the authors propose replacing the LS loss with an ℓ₁‑norm based loss, i.e., minimizing the sum of absolute residuals between the observed tensor and its CP reconstruction. The ℓ₁ loss grows linearly with the magnitude of an error, thereby limiting the influence of extreme outliers while still providing reasonable performance under Gaussian noise (as measured by mean absolute error).

Fitting a CP model with an ℓ₁ loss is non‑trivial because the objective is non‑smooth and non‑convex with respect to the factor matrices. The authors develop an alternating majorization‑minimization (AM‑MM) algorithm. In each outer iteration, one factor matrix is updated while all others are held fixed, reducing the problem to a weighted least‑squares sub‑problem. The key insight is to construct a quadratic majorizer for the absolute‑value function at the current iterate: each residual contributes a weight inversely proportional to its magnitude (plus a small ε for stability). This majorizer upper‑bounds the ℓ₁ loss and touches it at the current point, guaranteeing that minimizing the majorizer will not increase the original objective. The resulting weighted LS sub‑problem admits a closed‑form solution via normal equations, so each inner update is computationally comparable to a step of the classic alternating least‑squares (ALS) algorithm, with only modest overhead for weight computation.

The authors prove that the MM updates monotonically decrease the ℓ₁ objective, and that the overall alternating scheme converges to a stationary point. Global optimality is not guaranteed, as is typical for non‑convex tensor problems, and the algorithm’s performance can depend on initialization. Consequently, they recommend multiple random starts or SVD‑based warm starts to improve robustness.

Extensive experiments validate the approach. Synthetic tensors of varying rank are corrupted with three noise models: pure Gaussian, sparse spike (impulsive) noise, and a mixture of both, at contamination levels ranging from 5 % to 30 %. When spike noise exceeds roughly 10 %, the proposed ℓ₁‑MM method reduces reconstruction error by 25 %–45 % relative to standard ALS, while performing comparably under pure Gaussian noise. Real‑world tests on EEG recordings and color‑histogram image data—datasets known to contain outliers and missing entries—show similar gains: the ℓ₁‑based factorization yields cleaner reconstructions, which translate into higher downstream classification or segmentation accuracy.

The paper also discusses limitations. The ℓ₁ loss can be slightly sub‑optimal for purely Gaussian data, leading to a modest increase in error compared with LS. The MM framework requires recomputing weights at every inner iteration, adding a small computational burden. Moreover, because the algorithm converges only to local minima, careful initialization remains essential.

Future research directions include exploring hybrid loss functions (e.g., Huber or Tukey’s biweight) that blend ℓ₁ and ℓ₂ behavior, integrating sparsity‑promoting regularizers to capture structured factors, and extending the majorization‑minimization scheme to other tensor models such as Tucker or tensor‑train decompositions.

In summary, this work makes a significant contribution to robust tensor analysis by (1) highlighting the sensitivity of LS‑based CP fitting to non‑Gaussian noise, (2) introducing an ℓ₁‑norm loss that tolerates outliers, and (3) providing a provably convergent, efficiently implementable AM‑MM algorithm. The results demonstrate that robust CP factorization can be achieved without sacrificing computational practicality, opening the door to more reliable multi‑way data modeling in noisy, real‑world environments.

💡 Research Summary

📜 Original Paper Content