Mutual Information Collapse Explains Disentanglement Failure in $β$-VAEs

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

The $β$-VAE is a foundational framework for unsupervised disentanglement, using $β$ to regulate the trade-off between latent factorization and reconstruction fidelity. Empirically, however, disentanglement performance exhibits a pervasive non-monotonic trend: benchmarks such as MIG and SAP typically peak at intermediate $β$ and collapse as regularization increases. We demonstrate that this collapse is a fundamental information-theoretic failure, where strong Kullback-Leibler pressure promotes marginal independence at the expense of the latent channel’s semantic informativeness. By formalizing this mechanism in a linear-Gaussian setting, we prove that for $β> 1$, stationarity-induced dynamics trigger a spectral contraction of the encoder gain, driving latent-factor mutual information to zero. To resolve this, we introduce the $λβ$-VAE, which decouples regularization pressure from informational collapse via an auxiliary $L_2$ reconstruction penalty $λ$. Extensive experiments on dSprites, Shapes3D, and MPI3D-real confirm that $λ> 0$ stabilizes disentanglement and restores latent informativeness over a significantly broader range of $β$, providing a principled theoretical justification for dual-parameter regularization in variational inference backbones.


💡 Research Summary

The paper investigates a long‑standing empirical puzzle in β‑Variational Autoencoders (β‑VAEs): as the regularization weight β is increased, disentanglement metrics such as Mutual Information Gap (MIG) and Separated Attribute Predictability (SAP) typically improve up to a moderate β and then sharply deteriorate. Prior explanations have attributed this “high‑β collapse” to loss of reconstruction fidelity or overly aggressive regularization, but a precise mechanistic account has been missing.

Theoretical framework
The authors adopt a linear‑Gaussian model as a tractable proxy for deep VAEs. Ground‑truth factors V∈ℝ^s are drawn from a diagonal Gaussian with covariance Σ_V. Observations Y∈ℝ^n are generated by a linear mixing matrix Γ plus isotropic noise. The encoder is an affine Gaussian map X = B Y + W with gain matrix B and noise covariance Σ_W; the decoder reconstructs Ŷ = A X + Z with gain A and noise Σ_Z. Under this model the cross‑covariance Cov(X, V) = B Γ Σ_V, and the mutual information I(X;V) is a monotone function of the element‑wise squared correlations S_ij = (B Γ Σ_V)_ij² / ((B Σ_Y Bᵀ + Σ_W)_ii (Σ_V)_jj).
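As an illustration, these pairwise correlations can be computed directly from the model covariances. The sketch below is our own toy instantiation (all dimensions, matrices, and noise scales are made up, not taken from the paper); it uses the standard Gaussian identity I(X_i; V_j) = −½ log(1 − S_ij) for a single latent–factor pair:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes: s factors, n observed dims, d latent dims.
s, n, d = 3, 8, 3
Sigma_V = np.diag(rng.uniform(0.5, 2.0, s))   # diagonal factor covariance
Gamma = rng.normal(size=(n, s))               # linear mixing matrix
sigma_obs = 0.1                               # isotropic observation noise std

# Observation covariance: Sigma_Y = Gamma Sigma_V Gamma^T + sigma^2 I
Sigma_Y = Gamma @ Sigma_V @ Gamma.T + sigma_obs**2 * np.eye(n)

# Affine Gaussian encoder X = B Y + W
B = 0.3 * rng.normal(size=(d, n))
Sigma_W = 0.05 * np.eye(d)

# Cross-covariance Cov(X, V) = B Gamma Sigma_V; latent covariance B Sigma_Y B^T + Sigma_W
Sigma_XV = B @ Gamma @ Sigma_V
Sigma_X = B @ Sigma_Y @ B.T + Sigma_W

# Element-wise squared correlation S_ij between latent i and factor j
S = Sigma_XV**2 / np.outer(np.diag(Sigma_X), np.diag(Sigma_V))

# For jointly Gaussian pairs, I(X_i; V_j) = -0.5 * log(1 - S_ij)
I_pairwise = -0.5 * np.log(1.0 - S)
```

Because the encoder noise W is independent of V, each S_ij is strictly below 1, so the pairwise informations are finite; shrinking B toward zero drives S, and hence I_pairwise, to zero.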

β‑VAE stationary analysis
The β‑VAE objective is the ELBO with a β‑weighted KL term:

L_β = E_{q(X|Y)}[log p(Y|X)] − β · KL(q(X|Y) ‖ p(X)),

where q(X|Y=y) = N(B y, Σ_W) is the affine Gaussian encoder and p(X) = N(0, I) is the standard Gaussian prior.
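Under the linear‑Gaussian assumptions, the population (expected) objective has a closed form. The sketch below is our own illustrative implementation (not the paper's code), assuming isotropic decoder noise Σ_Z = σ_Z² I and averaging both terms over Y ~ N(0, Σ_Y):

```python
import numpy as np

def beta_vae_objective(A, B, Sigma_W, Sigma_Y, sigma_Z, beta):
    """Population beta-VAE objective for the linear-Gaussian model.

    Encoder q(X|Y=y) = N(B y, Sigma_W), prior p(X) = N(0, I),
    decoder p(Y|X=x) = N(A x, sigma_Z^2 I). All names and the isotropic
    decoder-noise assumption are ours, for illustration only.
    """
    n, d = A.shape[0], B.shape[0]

    # Expected reconstruction error E||Y - A(BY + W)||^2
    M = np.eye(n) - A @ B
    recon_err = np.trace(M @ Sigma_Y @ M.T) + np.trace(A @ Sigma_W @ A.T)

    # Expected Gaussian decoder log-likelihood
    log_lik = (-0.5 * recon_err / sigma_Z**2
               - 0.5 * n * np.log(2.0 * np.pi * sigma_Z**2))

    # Expected KL(q(X|Y) || N(0, I)), averaged over Y
    kl = 0.5 * (np.trace(Sigma_W) + np.trace(B @ Sigma_Y @ B.T)
                - d - np.log(np.linalg.det(Sigma_W)))

    return log_lik - beta * kl
```

Note that the KL term contains tr(B Σ_Y Bᵀ), so raising β increasingly rewards shrinking the encoder gain B toward zero; this is the contraction pressure that, per the paper's stationarity analysis, drives I(X;V) to zero for β > 1.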

