Beyond the Loss Curve: Scaling Laws, Active Learning, and the Limits of Learning from Exact Posteriors

Notice: This research summary and analysis were automatically generated using AI technology. For complete accuracy, please refer to the [Original Paper Viewer] below or the original arXiv source.

How close are neural networks to the best they could possibly do? Standard benchmarks cannot answer this because they lack access to the true posterior p(y|x). We use class-conditional normalizing flows as oracles that make exact posteriors tractable on realistic images (AFHQ, ImageNet). This enables five lines of investigation. Scaling laws: Prediction error decomposes into irreducible aleatoric uncertainty and reducible epistemic error; the epistemic component follows a power law in dataset size, continuing to shrink even when total loss plateaus. Limits of learning: The aleatoric floor is exactly measurable, and architectures differ markedly in how they approach it: ResNets exhibit clean power-law scaling while Vision Transformers stall in low-data regimes. Soft labels: Oracle posteriors contain learnable structure beyond class labels: training with exact posteriors outperforms hard labels and yields near-perfect calibration. Distribution shift: The oracle computes exact KL divergence of controlled perturbations, revealing that shift type matters more than shift magnitude: class imbalance barely affects accuracy at divergence values where input noise causes catastrophic degradation. Active learning: Exact epistemic uncertainty distinguishes genuinely informative samples from inherently ambiguous ones, improving sample efficiency. Our framework reveals that standard metrics hide ongoing learning, mask architectural differences, and cannot diagnose the nature of distribution shift.


💡 Research Summary

The paper tackles the fundamental question of how close modern neural networks are to their theoretical optimum by constructing a synthetic “oracle” world in which the true posterior p(y|x) is exactly computable. The authors train class‑conditional normalizing flows (specifically the recent T‑arFlow) on realistic image datasets (AFHQ and ImageNet‑64). Each class receives its own flow, yielding tractable class‑conditional densities pθ(x|y). After training, the flows are frozen; Bayes’ rule then provides a closed‑form posterior p(y|x) for any input, up to floating‑point precision.
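To make the oracle construction concrete, the sketch below applies Bayes' rule in log space to turn per-class flow log-densities into an exact posterior. The numeric log-density values are made up for illustration; in the paper they would come from the frozen class-conditional flows.

```python
import numpy as np

def oracle_posterior(log_px_given_y, log_prior):
    """Exact class posterior p(y|x) via Bayes' rule in log space.

    log_px_given_y: per-class log-densities log p_theta(x|y), one entry per
                    frozen class-conditional flow evaluated at the same x.
    log_prior:      log class priors log p(y).
    """
    log_joint = log_px_given_y + log_prior
    log_joint = log_joint - log_joint.max()   # stabilize before exponentiating
    unnorm = np.exp(log_joint)
    return unnorm / unnorm.sum()

# Stand-in numbers for a 3-class oracle (real values come from the flows).
log_px = np.array([-1052.3, -1048.1, -1060.7])   # log p(x|y) per class
log_prior = np.log(np.full(3, 1.0 / 3.0))        # uniform prior
p_y_given_x = oracle_posterior(log_px, log_prior)
print(p_y_given_x)  # class 1 dominates: its log-density is ~4 nats higher
```

The max-subtraction (log-sum-exp) step matters here: image-level log-densities are large negative numbers, and exponentiating them directly would underflow to zero.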

In this oracle setting the expected cross‑entropy loss of any classifier qθ can be decomposed exactly into two non‑negative terms:

L(qθ) = Eₓ[H(p(·|x))] + Eₓ[KL(p(·|x) ‖ qθ(·|x))],

where the first term is the irreducible aleatoric entropy of the oracle posterior and the second is the reducible epistemic gap between the classifier and the oracle.
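This decomposition is an exact algebraic identity for a single input, which a few lines of numpy can verify. The distributions below are invented for illustration, not taken from the paper:

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(p) in nats (the aleatoric term)."""
    return -np.sum(p * np.log(p))

def kl(p, q):
    """KL divergence KL(p || q) in nats (the epistemic term)."""
    return np.sum(p * np.log(p / q))

# Hypothetical oracle posterior p(.|x) and classifier output q(.|x) for one x.
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])

cross_entropy = -np.sum(p * np.log(q))   # expected log loss under the oracle
decomposed = entropy(p) + kl(p, q)       # aleatoric floor + epistemic gap
print(cross_entropy, decomposed)         # the two agree to float precision
```

Because H(p(·|x)) does not depend on the classifier, training can only shrink the KL term, which is why the paper tracks the epistemic component separately from total loss.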

