Learning nonnegative matrix factorizations from compressed data


We propose a flexible and theoretically supported framework for scalable nonnegative matrix factorization. The goal is to find nonnegative low-rank components directly from compressed measurements, accessing the original data only once or twice. We consider compression through randomized sketching methods that can be adapted to the data, or can be oblivious. We formulate optimization problems that only depend on the compressed data, but which can recover a nonnegative factorization which closely approximates the original matrix. The defined problems can be approached with a variety of algorithms, and in particular, we discuss variations of the popular multiplicative updates method for these compressed problems. We demonstrate the success of our approaches empirically and validate their performance in real-world applications.


💡 Research Summary

This paper introduces a theoretically grounded and practically efficient framework for performing Nonnegative Matrix Factorization (NMF) directly on compressed measurements, thereby avoiding repeated passes over the full data matrix. The authors adopt a “sketch‑and‑solve” paradigm: first a linear sketch of the original matrix X is computed (either on both sides, A₁X and XA₂, or on a single side, A·X or X·A), and then the factorization problem is reformulated solely in terms of these compressed quantities.
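
To make the sketch-and-solve setup concrete, here is a minimal sketch of the two-sided compression step. The Gaussian sketching matrices, dimensions, and variable names are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

m, n = 1000, 800      # size of the (hypothetical) data matrix X
k1, k2 = 50, 50       # sketch dimensions

# Toy nonnegative data matrix; in practice X may be too large to hold in
# memory and would be streamed through the sketches in a single pass.
X = rng.random((m, n))

# Data-oblivious Gaussian sketching matrices (one possible choice).
A1 = rng.standard_normal((k1, m))   # left sketch:  A1 @ X  is k1 x n
A2 = rng.standard_normal((n, k2))   # right sketch: X @ A2  is m x k2

Y1 = A1 @ X   # two-sided compressed measurements
Y2 = X @ A2
```

After this single pass, all subsequent optimization touches only Y1 and Y2, whose combined size is (k1 + k2) · max(m, n) rather than m · n.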

Three families of optimization problems are proposed:

  1. Two‑sided compression (Theorem 1) minimizes a sum of two Frobenius‑norm reconstruction errors (A₁X ≈ A₁UVᵀ and UVᵀA₂ ≈ XA₂) together with two regularization terms that encourage the factors to lie in the column spaces of the sketches. The theorem guarantees exact recovery (X = ŨṼᵀ) whenever the sketch dimensions k₁, k₂ are at least the true nonnegative rank r and the sketches preserve the rank of X.
  2. One‑sided compression with orthonormal sketches (Theorem 2) uses a single sketch A with orthonormal rows (or columns). A regularizer λ‖P_{K_{Aᵀ}}UVᵀ − I‖_F² is added; under the same rank condition the compressed solution approximates the original NMF with an error that depends on how well the sketch spans the data subspace.
  3. One‑sided compression with data‑oblivious (random) sketches (Theorem 4) replaces the orthonormality assumption by an approximate one; the regularizer simplifies to λ‖UVᵀ‖_F². The resulting error bound includes an additional term reflecting the randomness of the sketch.
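
As a rough illustration of the two-sided objective in item 1, the following hedged sketch evaluates a loss of that general shape. The exact regularizers in the paper differ; here the projectors onto the sketch column spaces are built from the sketches themselves via thin QR, and all names are assumptions:

```python
import numpy as np

def two_sided_loss(U, V, Y1, Y2, A1, A2, lam=1.0):
    """Illustrative compressed objective for the two-sided case.

    Y1 = A1 @ X and Y2 = X @ A2 are the sketches.  The fit terms compare
    UVᵀ against the data only through the sketches; the regularizers
    penalize components of U (resp. V) outside the column space of
    X @ A2 (resp. Xᵀ @ A1ᵀ).  This is a sketch of the idea, not the
    paper's exact objective.
    """
    W = U @ V.T
    fit = (np.linalg.norm(Y1 - A1 @ W) ** 2
           + np.linalg.norm(W @ A2 - Y2) ** 2)
    # Projectors onto the sketch column spaces, built without touching X.
    Q1, _ = np.linalg.qr(Y2)       # spans col(X @ A2)
    Q2, _ = np.linalg.qr(Y1.T)     # spans col(Xᵀ @ A1ᵀ)
    reg = (np.linalg.norm(U - Q1 @ (Q1.T @ U)) ** 2
           + np.linalg.norm(V - Q2 @ (Q2.T @ V)) ** 2)
    return fit + lam * reg
```

When X = UVᵀ exactly and the sketch dimensions exceed the rank, both the fit and the regularization terms vanish, consistent with the exact-recovery guarantee of Theorem 1.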

A key technical contribution is the adaptation of the classic Multiplicative Updates (MU) algorithm to these compressed objectives. Because random sketches can destroy element‑wise nonnegativity, the authors introduce a small shift σ·1·1ᵀ to the sketch Gram matrix (AᵀA + σ·1·1ᵀ), ensuring that all intermediate quantities remain nonnegative and the MU step sizes stay valid. Theorem 5 and Corollaries 4‑6 prove that, for any of the three compressed loss functions, the MU iterates never increase the objective, providing a convergence guarantee analogous to the uncompressed case.
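
A hedged sketch of one such update is below. It uses a semi-NMF-style nonnegative splitting with a square root, combined with the rank-one shift of the Gram matrix described above; the paper's exact update rule may differ, and every name here is an assumption:

```python
import numpy as np

def mu_update_U(U, V, Y, A, eps=1e-12):
    """One illustrative multiplicative update for U in
        min_{U >= 0}  || Y - A @ U @ V.T ||_F^2   (V fixed),
    where Y = A @ X is a one-sided sketch and A may have negative entries.
    The Gram matrix A.T @ A is shifted by sigma * ones so every term in the
    update ratio is elementwise nonnegative.  Sketch only; not the paper's
    exact update."""
    G = A.T @ A
    sigma = max(0.0, -G.min())           # smallest shift making G + shift >= 0
    shift = sigma * np.ones_like(G)
    C = V.T @ V                          # nonnegative because V >= 0
    BV = (A.T @ Y) @ V
    P = np.maximum(BV, 0.0)              # nonnegative split of (AᵀY)V ...
    N = np.maximum(-BV, 0.0)             # ... so that BV = P - N
    num = P + shift @ U @ C
    den = N + (G + shift) @ U @ C + eps
    return U * np.sqrt(num / den)
```

At a fixed point num = den, which is equivalent to a zero gradient of the compressed objective, and the ratio keeps the iterates elementwise nonnegative.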

Implementation details are carefully addressed. The projection operators P_{K_{XA₂}} and P_{K_{XᵀA₁}} are computed implicitly via thin QR factorizations (matrices Q₁, Q₂), avoiding the formation of large m × m or n × n matrices. Consequently, the total storage requirement reduces to O(r·(k₁ + k₂)) plus the sketches themselves, a dramatic reduction compared with storing the full matrix X (size m·n).
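
The implicit-projection trick can be sketched as follows (a minimal helper assuming the notation above; `project_onto_colspace` is a hypothetical name):

```python
import numpy as np

def project_onto_colspace(S, M):
    """Apply P_{col(S)} to M implicitly: a thin QR of the sketch S
    (e.g. S = X @ A2, of size m x k2) gives Q with orthonormal columns,
    and P @ M = Q @ (Q.T @ M).  Only O(m * k2) storage is needed; the
    m x m projector matrix is never formed."""
    Q, _ = np.linalg.qr(S, mode="reduced")
    return Q @ (Q.T @ M)
```

Evaluating `Q @ (Q.T @ M)` right-to-left also keeps the cost at two thin matrix products instead of one m × m product.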

Experimental evaluation spans synthetic data, image datasets (e.g., MNIST, face images), text corpora (20 Newsgroups), and hyperspectral imagery. Using only about 5 % of the original entries (i.e., k ≈ 0.05·min(m,n)), the compressed NMF achieves reconstruction errors and clustering metrics within a few percent of the full‑data baseline, while cutting memory usage and runtime by an order of magnitude. One‑sided compression offers roughly half the storage of the two‑sided variant with comparable accuracy when the sketch is well‑chosen (e.g., via a randomized range finder).

The paper’s contributions can be summarized as follows:

  • Formal definition of compressed NMF objectives that are provably close to the original problem.
  • Rigorous error bounds linking sketch dimension, sketch quality (orthonormal vs. random), and the attainable reconstruction error.
  • Novel MU‑based algorithms for the compressed problems, together with monotonicity guarantees.
  • Practical implementation tricks that keep the method memory‑efficient and suitable for streaming or distributed environments.

Future directions suggested include adaptive selection of sketch dimensions, incorporation of additional structural priors (sparsity, hierarchical clustering), exploration of nonlinear (e.g., kernel) sketches, and large‑scale distributed implementations. Overall, the work bridges a critical gap between compressive sensing ideas and scalable NMF, offering a solid theoretical foundation and a usable algorithmic toolkit for practitioners dealing with massive nonnegative data.

