Learning from compressed observations
The problem of statistical learning is to construct a predictor of a random variable $Y$ as a function of a related random variable $X$ on the basis of an i.i.d. training sample from the joint distribution of $(X,Y)$. Allowable predictors are drawn from some specified class, and the goal is to approach asymptotically the performance (expected loss) of the best predictor in the class. We consider the setting in which one has perfect observation of the $X$-part of the sample, while the $Y$-part has to be communicated at some finite bit rate. The encoding of the $Y$-values is allowed to depend on the $X$-values. Under suitable regularity conditions on the admissible predictors, the underlying family of probability distributions and the loss function, we give an information-theoretic characterization of achievable predictor performance in terms of conditional distortion-rate functions. The ideas are illustrated on the example of nonparametric regression in Gaussian noise.
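As a rough illustration of the conditional distortion-rate functions mentioned above, the following minimal sketch evaluates the textbook closed form for a Gaussian source under squared-error distortion, $D(R) = \sigma^2 2^{-2R}$. It assumes a regression model $Y = f(X) + N$ with $N \sim \mathcal{N}(0, \sigma^2)$ independent of $X$, so that the conditional law of $Y$ given $X$ is Gaussian with variance $\sigma^2$ for every value of $X$. The function names and the model are illustrative assumptions, not the paper's own construction.

```python
def gaussian_distortion_rate(variance: float, rate_bits: float) -> float:
    """Distortion-rate function of a Gaussian source under squared error.

    D(R) = variance * 2^(-2R), with R measured in bits per sample.
    """
    if variance < 0 or rate_bits < 0:
        raise ValueError("variance and rate must be non-negative")
    return variance * 2.0 ** (-2.0 * rate_bits)


def conditional_distortion_rate(noise_variance: float, rate_bits: float) -> float:
    """Conditional distortion-rate function of Y given X (illustrative).

    Assumption (not from the source): Y = f(X) + N with N ~ N(0, noise_variance)
    independent of X. Given X = x, Y is Gaussian with the same variance for
    every x, so the conditional function reduces to the Gaussian formula
    applied to the noise variance.
    """
    return gaussian_distortion_rate(noise_variance, rate_bits)


if __name__ == "__main__":
    # Each extra bit of rate cuts the achievable distortion by a factor of 4.
    for rate in (0.0, 1.0, 2.0, 3.0):
        print(rate, conditional_distortion_rate(1.0, rate))
```

Under these assumptions, the distortion at rate zero equals the noise variance, and it decays geometrically as the bit rate for the $Y$-values grows.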
💡 Research Summary
The paper introduces a novel learning‑under‑communication‑constraints framework that blends statistical learning theory with classical rate‑distortion concepts. In the standard supervised learning setting one observes i.i.d. samples $(X_i,Y_i)$ from an unknown joint distribution and seeks a predictor $f$ from a prescribed function class $\mathcal{F}$ that minimizes the expected loss (\mathbb{E}