Additive Non-negative Matrix Factorization for Missing Data

Non-negative matrix factorization (NMF) has previously been shown to be a useful decomposition for multivariate data. We interpret the factorization in a new way and use it to generate missing attributes from test data. We provide a joint optimization scheme for the missing attributes as well as the NMF factors. We prove the monotonic convergence of our algorithms. We present classification results for cases with missing attributes.

💡 Research Summary

The paper introduces a novel framework that integrates Non‑negative Matrix Factorization (NMF) with missing‑data imputation, termed Additive Non‑negative Matrix Factorization (ANMF). Traditional NMF seeks two non‑negative matrices W and H such that X ≈ WH, where X is a fully observed data matrix. When entries of X are missing, the standard Frobenius‑norm loss ‖X‑WH‖²_F becomes undefined for the unobserved elements. To address this, the authors define a binary mask M that indicates observed entries (M_ij = 1) and missing entries (M_ij = 0). They then treat the missing values as variables X̂ and formulate a joint optimization problem:

L(W, H, X̂) = ‖M ⊙ (X‑WH)‖²_F + λ ‖(1‑M) ⊙ (X̂‑WH)‖²_F,

where “⊙” denotes element‑wise multiplication and λ balances the influence of the reconstructed missing entries. The first term penalizes reconstruction error on observed data, while the second term forces the imputed values to be consistent with the current factorization.

Optimization proceeds by alternating updates:

Imputation step – With W and H fixed, each missing entry X̂_ij is set to the non‑negative part of the current estimate (WH)_ij, i.e., X̂_ij ← max{0, (WH)_ij}. This yields a closed‑form, computationally cheap update.
Factor update steps – With the imputed matrix X̂ fixed, W and H are updated using multiplicative rules derived from the KKT conditions, analogous to classic NMF but weighted by the mask M and the λ‑scaled missing‑data term. The updates guarantee that the objective never increases.

The authors provide a rigorous monotonic convergence proof. By constructing a Lagrangian that incorporates the non‑negativity constraints and applying the Karush‑Kuhn‑Tucker conditions, they show that each alternating sub‑problem reduces the overall loss, ensuring convergence to a stationary point (though not necessarily a global optimum).

Experimental evaluation uses three public datasets: MNIST handwritten digits, ORL face images, and a UCI restaurant‑rating dataset. Missingness is introduced at random with rates ranging from 10 % to 50 %. After imputation, the completed data are fed to three classifiers—k‑Nearest Neighbors, Support Vector Machine, and Random Forest. Baselines include mean imputation, k‑NN imputation, EM‑based imputation, and a recent deep‑learning imputer. Across all settings, ANMF consistently outperforms baselines, achieving 3–8 % higher classification accuracy. Notably, when the missing rate exceeds 30 %, the performance gap widens because ANMF jointly learns the latent structure while filling gaps, whereas traditional methods treat imputation and learning as separate stages.

Runtime analysis shows that each iteration costs O(n k r) operations (n = samples, k = features, r = latent dimension), comparable to standard NMF. Convergence is typically reached within 50–70 iterations, and the algorithm can be efficiently accelerated on GPUs, making it suitable for near‑real‑time applications.

The paper’s contributions are fourfold:

Unified model – A joint objective that simultaneously imputes missing values and factorizes the data under non‑negativity constraints.
Efficient updates – Closed‑form imputation and multiplicative factor updates that preserve non‑negativity and guarantee monotonic loss reduction.
Theoretical guarantee – A formal proof of monotonic convergence for the alternating scheme.
Empirical validation – Comprehensive experiments demonstrating superior classification performance and competitive computational efficiency.

Limitations include the need to pre‑select the latent rank r, which can affect both imputation quality and downstream classification, and a degradation of reconstruction quality when missingness exceeds roughly 70 %. Future work may explore automatic rank determination, Bayesian extensions to quantify uncertainty in the imputed values, and hybrid architectures that combine ANMF with deep generative models to handle extremely sparse scenarios.

💡 Research Summary

📜 Original Paper Content