Neural Prior Estimation: Learning Class Priors from Latent Representations
Class imbalance induces systematic bias in deep neural networks by imposing a skewed effective class prior. This work introduces the Neural Prior Estimator (NPE), a framework that learns feature-conditioned log-prior estimates from latent representations. NPE employs one or more Prior Estimation Modules trained jointly with the backbone via a one-way logistic loss. Under the Neural Collapse regime, NPE is analytically shown to recover the class log-prior up to an additive constant, providing a theoretically grounded adaptive signal without requiring explicit class counts or distribution-specific hyperparameters. The learned estimate is incorporated into logit adjustment, forming NPE-LA, a principled mechanism for bias-aware prediction. Experiments on long-tailed CIFAR and imbalanced semantic segmentation benchmarks (STARE, ADE20K) demonstrate consistent improvements, particularly for underrepresented classes. NPE thus offers a lightweight and theoretically justified approach to learned prior estimation and imbalance-aware prediction.
💡 Research Summary
The paper tackles the pervasive problem of class imbalance in deep learning by proposing a novel framework called Neural Prior Estimator (NPE). Traditional logit‑adjustment methods rely on a static estimate of the class prior, usually derived from dataset‑level class counts, and apply a uniform shift to the logits. This approach fails when class frequencies evolve over time, when only partial observations are available, or when the effective prior induced by the learned feature space diverges from the raw counts. NPE addresses these shortcomings by learning a feature‑conditioned estimate of the log‑prior directly from the intermediate representations of the backbone network.
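As a point of reference, the static logit adjustment that NPE improves upon can be sketched in a few lines. This is a minimal NumPy illustration of the standard recipe (subtracting a temperature-scaled log-prior computed from dataset-level counts); the function name and the `tau` parameter are illustrative conventions, not taken from the paper:

```python
import numpy as np

def static_logit_adjustment(logits, class_counts, tau=1.0):
    """Shift logits by a fixed log-prior derived from dataset-level class counts.

    Rare classes have small priors, so subtracting tau * log(prior)
    boosts their logits relative to frequent classes.
    """
    class_counts = np.asarray(class_counts, dtype=float)
    priors = class_counts / class_counts.sum()
    return np.asarray(logits, dtype=float) - tau * np.log(priors)
```

Because the prior here is a single dataset-level constant, the same shift is applied to every input — precisely the limitation NPE targets by conditioning the estimate on the latent features of each example.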
Core Architecture
NPE introduces one or more Prior Estimation Modules (PEMs). Each PEM receives the backbone's latent feature vector \(h(x) \in \mathbb{R}^d\) and outputs a vector \(u_k(x) \in \mathbb{R}^C\) whose dimension matches the number of classes. In the simplest instantiation, a PEM is a single linear layer \(W_k h(x) + b_k\); more expressive designs (e.g., multi-layer perceptrons or convolutional heads) are also permissible. The key training signal is a one-way logistic loss applied only to the true-class coordinate:
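The linear instantiation of a PEM described above can be sketched as follows. This is a minimal NumPy version for illustration; the class name, initialization scale, and constructor signature are our own assumptions, not the paper's implementation:

```python
import numpy as np

class PriorEstimationModule:
    """Hypothetical linear PEM: maps a latent vector h(x) in R^d
    to a C-dimensional estimate u_k(x) = W_k h(x) + b_k."""

    def __init__(self, d, C, seed=0):
        rng = np.random.default_rng(seed)
        # Small random weights; zero bias (illustrative initialization).
        self.W = rng.normal(scale=0.01, size=(C, d))
        self.b = np.zeros(C)

    def __call__(self, h):
        return self.W @ np.asarray(h, dtype=float) + self.b
```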
\[
\mathcal{L}_{\mathrm{PEM}}(x, y) \;=\; \log\!\left(1 + e^{-u_{k,y}(x)}\right),
\]
where \(y\) is the ground-truth class and \(u_{k,y}(x)\) is the corresponding coordinate of the PEM output. Because the loss touches only the true-class coordinate, the remaining coordinates receive no gradient from this term.
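The one-way logistic loss on the true-class coordinate can be sketched as below. This is a NumPy illustration under the reading that the loss is \(\log(1 + e^{-u_y})\) applied only to the ground-truth coordinate; the function name is our own:

```python
import numpy as np

def one_way_logistic_loss(u, y):
    """One-way logistic loss on the true-class coordinate of u.

    Equals log(1 + exp(-u[y])), computed via logaddexp for numerical
    stability; coordinates other than y contribute no gradient.
    """
    return np.logaddexp(0.0, -np.asarray(u, dtype=float)[y])
```

Raising the true-class score drives the loss toward zero, while all other coordinates are left untouched — the "one-way" property noted above.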