Mutual information and task-relevant latent dimensionality
Estimating the dimensionality of the latent representation needed for prediction – the task-relevant dimension – is a difficult, largely unsolved problem with broad scientific applications. We cast it as an Information Bottleneck question: what embedding bottleneck dimension is sufficient to compress predictor and predicted views while preserving their mutual information (MI). This repurposes neural MI estimators for dimensionality estimation. We show that standard neural estimators with separable/bilinear critics systematically inflate the inferred dimension, and we address this by introducing a hybrid critic that retains an explicit dimensional bottleneck while allowing flexible nonlinear cross-view interactions, thereby preserving the latent geometry. We further propose a one-shot protocol that reads off the effective dimension from a single over-parameterized hybrid model, without sweeping over bottleneck sizes. We validate the approach on synthetic problems with known task-relevant dimension. We extend the approach to intrinsic dimensionality by constructing paired views of a single dataset, enabling comparison with classical geometric dimension estimators. In noisy regimes where those estimators degrade, our approach remains reliable. Finally, we demonstrate the utility of the method on multiple physics datasets.
💡 Research Summary
The paper tackles the long‑standing problem of determining the minimal latent dimensionality required for a prediction task, which the authors term “task‑relevant latent dimensionality.” They recast the problem as a symmetric Information Bottleneck (SIB): given paired views X (predictor) and Y (target), find the smallest bottleneck dimension k_z such that compressed representations Z_X = f(X) and Z_Y = g(Y) preserve the mutual information (MI) between the original views, i.e., I(Z_X; Z_Y) ≈ I(X; Y). This formulation allows the reuse of modern neural MI estimators, specifically the InfoNCE lower bound, to estimate the required dimension.
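To make the formulation concrete, here is a minimal NumPy sketch of the InfoNCE lower bound evaluated with a separable (inner-product) critic on a batch of paired embeddings. The function name and the fixed (untrained) embeddings are illustrative assumptions, not the paper's implementation; in practice the encoders would be trained to maximize this bound.

```python
import numpy as np

def infonce_lower_bound(zx, zy):
    """InfoNCE lower bound on I(X; Y) from a batch of paired embeddings.

    zx, zy: (N, k_z) arrays produced by the two encoders; the critic is the
    inner product T(x, y) = <f(x), g(y)> (a separable critic). The bound is
    capped at log N, so larger batches permit larger MI estimates.
    """
    scores = zx @ zy.T                          # S_ij = <f(x_i), g(y_j)>
    m = scores.max(axis=1, keepdims=True)       # numerically stable log-sum-exp
    lse = (m + np.log(np.exp(scores - m).sum(axis=1, keepdims=True))).ravel()
    n = zx.shape[0]
    # E_i [ S_ii - logsumexp_j S_ij ] + log N
    return float(np.mean(np.diag(scores) - lse) + np.log(n))

rng = np.random.default_rng(0)
z = rng.standard_normal((256, 4))
matched = infonce_lower_bound(z, z)                    # dependent pairs
shuffled = infonce_lower_bound(z, rng.permutation(z))  # pairing destroyed
```

Dependent pairs yield a higher estimate than shuffled ones, and both stay below the log N cap, which is the behavior the dimension-sweep protocol relies on.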
The authors first analyze two common critic architectures used in neural MI estimation. A concatenated critic (joint network) lacks an explicit bottleneck, making it unsuitable for dimension inference. A separable (bilinear) critic introduces a k_z‑dimensional bottleneck via two encoders and an inner product, but it cannot capture nonlinear cross‑view dependencies without inflating k_z, leading to systematic over‑estimation of the latent dimension. Empirical studies on synthetic data (joint Gaussian latent and multimodal Gaussian mixture latent) confirm that the separable critic's MI estimate only saturates at k_z = K_Z + 1 or higher, so the sweep over bottleneck sizes over‑estimates the true latent dimension K_Z.
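The structural reason a separable critic exposes a dimension while a concatenated one does not can be seen directly: the separable score matrix factors through the k_z-dimensional embeddings, so its rank is bounded by k_z. A small sketch with linear stand-in encoders (an assumption for illustration; the paper's encoders are neural networks):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k_z = 128, 10, 3

# Linear stand-ins for the two encoders f and g (illustrative only).
Wf = rng.standard_normal((d, k_z))
Wg = rng.standard_normal((d, k_z))

x = rng.standard_normal((n, d))
y = rng.standard_normal((n, d))

# Separable critic: S_ij = <f(x_i), g(y_j)> factors through the k_z-dim
# embeddings, so rank(S) <= k_z. A concatenated critic MLP(concat(x, y))
# has no such factorization, hence no dimension can be read off from it.
S = (x @ Wf) @ (y @ Wg).T
rank = np.linalg.matrix_rank(S)
```

The inner product is the only cross-view interaction the separable critic allows, which is exactly why nonlinear dependencies force k_z to inflate.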
To overcome this limitation, the authors propose a hybrid critic, T_hybrid(x, y) = T_θ(f(x), g(y)): two encoders f and g compress each view into k_z‑dimensional embeddings, and a flexible joint network T_θ scores the embedded pair. The explicit bottleneck is preserved while cross‑view interactions can be arbitrarily nonlinear, and the approach is validated on synthetic problems with known task‑relevant dimension. Building on this, a one‑shot protocol reads the effective dimension directly from a single over‑parameterized hybrid model, avoiding a sweep over bottleneck sizes. The method further extends to intrinsic dimensionality estimation by constructing paired views of a single dataset; it remains reliable in noisy regimes where classical geometric estimators degrade, and its utility is demonstrated on multiple physics datasets.
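A hybrid critic of this form can be sketched as follows. The class name, the linear stand-in encoders, and the random placeholder weights are assumptions for illustration; in the paper all components would be trained jointly to maximize the MI bound.

```python
import numpy as np

def relu(a):
    return np.maximum(a, 0.0)

class HybridCritic:
    """Sketch of the hybrid critic T_hybrid(x, y) = T_theta(f(x), g(y)).

    f and g are stand-in linear encoders mapping each view to k_z
    dimensions (the explicit bottleneck); T_theta is a small MLP on the
    concatenated embeddings, so cross-view interactions can be nonlinear.
    Weights are random placeholders, not trained parameters.
    """

    def __init__(self, d_x, d_y, k_z, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.Wf = rng.standard_normal((d_x, k_z)) / np.sqrt(d_x)
        self.Wg = rng.standard_normal((d_y, k_z)) / np.sqrt(d_y)
        self.W1 = rng.standard_normal((2 * k_z, hidden)) / np.sqrt(2 * k_z)
        self.W2 = rng.standard_normal((hidden, 1)) / np.sqrt(hidden)

    def score_matrix(self, x, y):
        """All-pairs scores S_ij = T_theta([f(x_i); g(y_j)]), shape (N, N)."""
        zx, zy = x @ self.Wf, y @ self.Wg
        n = zx.shape[0]
        # Row i*n + j of `pairs` holds the concatenation [f(x_i); g(y_j)].
        pairs = np.concatenate(
            [np.repeat(zx, n, axis=0), np.tile(zy, (n, 1))], axis=1)
        return (relu(pairs @ self.W1) @ self.W2).reshape(n, n)

critic = HybridCritic(d_x=10, d_y=8, k_z=3)
rng = np.random.default_rng(1)
S = critic.score_matrix(rng.standard_normal((16, 10)),
                        rng.standard_normal((16, 8)))
```

Unlike the separable critic, the MLP head lets the score matrix encode nonlinear dependencies between the two k_z-dimensional embeddings, while the dimension of those embeddings remains an explicit, inspectable hyperparameter.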