Bi-cross-validation of the SVD and the nonnegative matrix factorization
This article presents a form of bi-cross-validation (BCV) for choosing the rank in outer product models, especially the singular value decomposition (SVD) and the nonnegative matrix factorization (NMF). Instead of leaving out a set of rows of the data matrix, we leave out a set of rows and a set of columns, and then predict the left out entries by low rank operations on the retained data. We prove a self-consistency result expressing the prediction error as a residual from a low rank approximation. Random matrix theory and some empirical results suggest that smaller hold-out sets lead to more over-fitting, while larger ones are more prone to under-fitting. In simulated examples we find that a method leaving out half the rows and half the columns performs well.
💡 Research Summary
The paper introduces a novel “bi-cross-validation” (BCV) scheme for selecting the rank of outer-product matrix models, with a focus on the singular value decomposition (SVD) and the nonnegative matrix factorization (NMF). Traditional cross-validation for matrix factorization typically holds out a set of rows (or columns) while using the full complement of the other dimension to predict the missing entries. In contrast, BCV simultaneously holds out a subset of rows and a subset of columns, thereby partitioning the data matrix $X$ into four blocks: a retained block $X_{11}$, two cross-blocks $X_{12}$ and $X_{21}$, and a held-out block $X_{22}$.
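The four-block partition described above can be sketched as follows; this is an illustrative NumPy snippet with hypothetical matrix dimensions and hold-out indices, not code from the paper:

```python
import numpy as np

# Illustrative data matrix; shape and hold-out choices are assumptions.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 6))

held_rows = np.arange(4)  # rows to hold out
held_cols = np.arange(3)  # columns to hold out
kept_rows = np.setdiff1d(np.arange(X.shape[0]), held_rows)
kept_cols = np.setdiff1d(np.arange(X.shape[1]), held_cols)

X11 = X[np.ix_(kept_rows, kept_cols)]  # retained block
X12 = X[np.ix_(kept_rows, held_cols)]  # cross-block: retained rows, held-out columns
X21 = X[np.ix_(held_rows, kept_cols)]  # cross-block: held-out rows, retained columns
X22 = X[np.ix_(held_rows, held_cols)]  # held-out block to be predicted
```

In practice this partition would be repeated over several random row/column folds, in the same spirit as ordinary cross-validation.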
The retained block $X_{11}$ is used to compute a low-rank approximation $\hat X_{11} = U_r D_r V_r^{\top}$ (for SVD) or a nonnegative factorization $W_r H_r$ (for NMF). The Moore-Penrose pseudo-inverse of this approximation, $\hat X_{11}^{+}$, is then employed to predict the held-out entries via the formula

$$\hat X_{22} = X_{21}\, \hat X_{11}^{+}\, X_{12}.$$
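For the SVD case, the prediction step just described can be sketched in a few lines of NumPy; the function name and the use of squared error as the residual criterion are our choices for illustration:

```python
import numpy as np

def bcv_residual(X11, X12, X21, X22, r):
    """Predict the held-out block from a rank-r truncation of X11
    and return the sum of squared prediction errors."""
    U, s, Vt = np.linalg.svd(X11, full_matrices=False)
    # Pseudo-inverse of the rank-r truncated SVD of X11:
    # invert only the r largest singular values.
    X11_r_pinv = Vt[:r].T @ np.diag(1.0 / s[:r]) @ U[:, :r].T
    X22_hat = X21 @ X11_r_pinv @ X12  # predicted held-out block
    return np.sum((X22 - X22_hat) ** 2)
```

Sweeping `r` over candidate ranks and choosing the value that minimizes the residual (summed over hold-out folds) yields the BCV rank estimate.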