A Law of Data Reconstruction for Random Features (and Beyond)
Large-scale deep learning models are known to memorize parts of the training set. In machine learning theory, memorization is often framed as interpolation or label fitting, and classical results show that this can be achieved when the number of parameters $p$ in the model is larger than the number of training samples $n$. In this work, we consider memorization from the perspective of data reconstruction, demonstrating that this can be achieved when $p$ is larger than $dn$, where $d$ is the dimensionality of the data. More specifically, we show that, in the random features model, when $p \gg dn$, the subspace spanned by the training samples in feature space gives sufficient information to identify the individual samples in input space. Our analysis suggests an optimization method to reconstruct the dataset from the model parameters, and we demonstrate that this method performs well on various architectures (random features, two-layer fully-connected and deep residual networks). Our results reveal a law of data reconstruction, according to which the entire training dataset can be recovered as $p$ exceeds the threshold $dn$.
💡 Research Summary
The paper investigates the relationship between model over‑parameterization and the ability to reconstruct the entire training dataset from the learned parameters. While classical learning theory states that a model with more parameters than training examples (p > n) can interpolate the labels, the authors argue that reconstructing the inputs themselves requires a stronger condition: the number of parameters must exceed the product of the input dimensionality and the number of samples (p ≫ d n).
Using the random features (RF) regression model, where the predictor is f_RF(x, θ) = φ(x)ᵀθ with φ(x) = ϕ(Vx) (V ∈ ℝ^{p×d} a random Gaussian matrix, ϕ a non-linear activation), the authors first show that, after training with the squared loss, the optimal parameters θ* lie exactly in the span of the training feature vectors φ(x₁), …, φ(x_n).
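This span property can be checked numerically. The sketch below is a minimal, hypothetical instance of the RF model described above (tanh chosen arbitrarily as the activation, small n, d, p with p ≫ dn): it computes the minimum-norm squared-loss solution θ* via the pseudoinverse and verifies that θ* is unchanged by projection onto the span of the training feature vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small instance: n samples, input dim d, p >> d*n features.
n, d, p = 10, 5, 200
X = rng.standard_normal((n, d))   # training inputs
y = rng.standard_normal(n)        # training labels

V = rng.standard_normal((p, d))   # random Gaussian first layer (fixed, untrained)
Phi = np.tanh(X @ V.T)            # n x p feature matrix; row i is phi(x_i)^T

# Minimum-norm squared-loss solution: theta* = Phi^+ y.
theta_star = np.linalg.pinv(Phi) @ y

# theta* interpolates the labels (p >= n, features generically full rank) ...
assert np.allclose(Phi @ theta_star, y)

# ... and lies in span{phi(x_1), ..., phi(x_n)}: projecting onto the
# column space of Phi^T leaves it unchanged.
P = Phi.T @ np.linalg.pinv(Phi.T)  # orthogonal projector onto the span
assert np.allclose(P @ theta_star, theta_star)
```

The second assertion is the point: the trained head carries no information outside the n-dimensional subspace spanned by the training features, which is what the paper's reconstruction argument exploits.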