From Core to Detail: Unsupervised Disentanglement with Entropy-Ordered Flows


Learning unsupervised representations that are both semantically meaningful and stable across runs remains a central challenge in modern representation learning. We introduce entropy-ordered flows (EOFlows), a normalizing-flow framework that orders latent dimensions by their explained entropy, analogously to PCA’s explained variance. This ordering enables adaptive injective flows: after training, one may retain only the top C latent variables to form a compact core representation while the remaining variables capture fine-grained detail and noise, with C chosen flexibly at inference time rather than fixed during training. EOFlows build on insights from Independent Mechanism Analysis, Principal Component Flows, and Manifold Entropic Metrics. We combine likelihood-based training with local Jacobian regularization and noise augmentation into a method that scales well to high-dimensional data such as images. Experiments on the CelebA dataset show that our method uncovers a rich set of semantically interpretable features, allowing for high compression and strong denoising.


💡 Research Summary

The paper introduces Entropy‑Ordered Flows (EOFlows), a novel normalizing‑flow framework for unsupervised representation learning that explicitly orders latent dimensions by their “explained entropy”, a quantity analogous to the explained variance used in Principal Component Analysis. By sorting latent variables in decreasing order of information contribution, EOFlows enable a dynamic “core‑detail” split: after training, a user can retain only the top C dimensions as a compact core representation while the remaining dimensions capture fine‑grained detail and noise. The value of C does not need to be fixed during training; it can be chosen at inference time, providing flexibility absent in β‑VAEs, rectangular flows, or M‑flows.

The method builds on three key ideas. First, it leverages Independent Mechanism Analysis (IMA), which states that statistically independent factors correspond to orthogonal columns of the decoder Jacobian everywhere. By adding an IMA‑based contrast term to the standard maximum‑likelihood loss, the model encourages the decoder’s Jacobian to become locally orthogonal, thereby promoting disentanglement. Second, the authors adopt the inflation‑deflation principle: Gaussian noise is injected into the latent space during training, and the model is trained to be insensitive to this perturbation. This regularization mitigates manifold over‑fitting, especially when the intrinsic data dimension is far lower than the ambient dimension (as with images). Third, they propose a Maximum Manifold Likelihood (MML) objective that decomposes the log‑likelihood into three components: a core loss L_C, a detail loss L_D, and a cross‑term L_{C⊥D} that measures manifold mutual information (i.e., entanglement) between the two subspaces. Hyper‑parameters λ_C, λ_D, and λ_{C⊥D} weight these terms, allowing explicit control over the trade‑off between compression (small C), reconstruction fidelity, and disentanglement.
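The inflation-deflation idea can be made concrete with a small sketch. This is an illustrative toy, not the paper's implementation: the decoder is a fixed linear map, and the names `inflate` and `deflation_loss` are hypothetical. The point is only the mechanics: perturb latent codes with Gaussian noise (inflation) and penalize the decoder's sensitivity to that perturbation (deflation).

```python
import numpy as np

rng = np.random.default_rng(0)

def inflate(z, sigma=0.1, rng=rng):
    """Inflation step: perturb latent codes with isotropic Gaussian noise."""
    return z + sigma * rng.standard_normal(z.shape)

# Toy decoder: a fixed linear map from a 2-D latent to a 5-D ambient space.
W = rng.standard_normal((5, 2))
decode = lambda z: z @ W.T

z = rng.standard_normal((128, 2))          # batch of latent codes
x_clean = decode(z)
x_noisy = decode(inflate(z, sigma=0.1))

# Deflation penalty: the model should be insensitive to the latent perturbation.
deflation_loss = np.mean((x_noisy - x_clean) ** 2)
```

In a real flow this penalty would be added to the likelihood objective and the decoder trained to shrink it, which is what mitigates manifold over-fitting when the intrinsic dimension is far below the ambient one.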

Mathematically, for a data point x, the encoder produces z = f(x) ∈ ℝ^D. After sorting, the first C indices form the core index set 𝒞 and the remaining indices form the detail set 𝒟. The core loss L_C(x) = ½‖f_C(x)‖² + log|J_C(f(x))| + const. captures the negative log‑probability of the core sub‑manifold; L_D is defined analogously on the detail subspace. The cross‑term L_{C⊥D}(x) = log|J_C| + log|J_D| – log|J_{C∪D}| quantifies how far the Jacobian deviates from block‑diagonal (i.e., orthogonal) structure. The full MML loss is L_MML = (1+λ_C)L_C + (1+λ_D)L_D + (λ_{C⊥D}−1)L_{C⊥D}. Setting all λ to zero recovers the standard maximum‑likelihood objective, since L_C + L_D − L_{C⊥D} = ½‖f(x)‖² + log|J_{C∪D}| + const.
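The decomposition above can be checked numerically. The sketch below uses a random matrix as a stand-in Jacobian and 0.5·log det(J Jᵀ) as the generalized log-volume of a sub-Jacobian; the sign convention follows the formulas as written in the summary, and all values are toy placeholders rather than outputs of a trained flow.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_vol(J_rows):
    """Log generalized volume of a sub-Jacobian: 0.5 * log det(J J^T)."""
    sign, logdet = np.linalg.slogdet(J_rows @ J_rows.T)
    return 0.5 * logdet

D, C = 6, 2
z = rng.standard_normal(D)        # encoder output f(x), toy values
J = rng.standard_normal((D, D))   # full Jacobian at x, toy values

L_core   = 0.5 * np.sum(z[:C] ** 2) + log_vol(J[:C])
L_detail = 0.5 * np.sum(z[C:] ** 2) + log_vol(J[C:])
L_cross  = log_vol(J[:C]) + log_vol(J[C:]) - log_vol(J)

lam_C = lam_D = lam_X = 0.0       # all lambdas zero -> plain maximum likelihood
L_mml = (1 + lam_C) * L_core + (1 + lam_D) * L_detail + (lam_X - 1) * L_cross

# With all lambdas zero this collapses to the standard NLL (up to constants):
L_ml = 0.5 * np.sum(z ** 2) + log_vol(J)
```

The sub-Jacobian volume terms cancel exactly, leaving the full-Jacobian term, which is the claimed reduction to maximum likelihood.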

A practical contribution is a stochastic estimator for the cross‑term using Jacobian‑vector products, which scales to high‑dimensional data without materializing full Jacobians. This makes EOFlows applicable to image‑scale problems.
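The paper's exact estimator is not spelled out in this summary, but the general trick is standard: quantities involving JᵀJ can be estimated via Hutchinson-style probing with Jacobian-vector products alone. The sketch below estimates tr(JᵀJ) = E[‖Jv‖²] with Rademacher probes; the explicit matrix `J` stands in for an autodiff JVP (e.g. `torch.autograd.functional.jvp`) purely so the estimate can be checked against ground truth.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for an autodiff Jacobian-vector product; explicit only for checking.
J = rng.standard_normal((50, 50))
jvp = lambda v: J @ v

def hutchinson_trace_JtJ(jvp, dim, n_probes=2000, rng=rng):
    """Estimate tr(J^T J) = E[||J v||^2] with Rademacher probe vectors v."""
    total = 0.0
    for _ in range(n_probes):
        v = rng.choice([-1.0, 1.0], size=dim)
        total += np.sum(jvp(v) ** 2)
    return total / n_probes

est = hutchinson_trace_JtJ(jvp, dim=50)
exact = np.trace(J.T @ J)   # only computable here because J is materialized
```

Each probe costs one forward-mode sweep, so memory stays O(D) regardless of the ambient dimension, which is what makes the cross-term tractable at image scale.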

Experiments on the CelebA face dataset demonstrate several claims. First, the entropy spectrum (cumulative explained entropy versus C) saturates quickly, often before C = 50, suggesting that a small core subspace already captures most of the data’s intrinsic structure. Visual inspection of generated samples shows that the top latent dimensions correspond to semantically meaningful attributes such as hair color, glasses, smile, and pose, while lower dimensions encode texture, background variation, or pure noise. Second, the authors evaluate three reconstruction strategies: (i) zero‑out the detail dimensions (optimal rate‑distortion), (ii) sample them from the prior (optimal perception), and (iii) keep them as learned. The distortion increase between (i) and (ii) matches the theoretical bound of a factor of two from Blau & Michaeli (2019). Third, adding noise during training markedly improves disentanglement and manifold coverage; ablations reveal that without sufficient noise, the Jacobian orthogonality collapses and the entropy ordering becomes unstable.
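The factor-of-two gap between strategies (i) and (ii) is easy to reproduce in latent space. Under a standard-normal prior, zeroing a detail coordinate incurs squared error equal to its variance, while replacing it with an independent prior sample incurs the sum of two such variances, i.e. exactly twice as much in expectation. This toy check measures distortion in latent coordinates, not on CelebA images:

```python
import numpy as np

rng = np.random.default_rng(0)

# Detail coordinates of many test points under a standard-normal prior.
z_detail = rng.standard_normal((100_000, 10))

# Strategy (i): zero out the detail dimensions (rate-distortion optimal).
mse_zero = np.mean(z_detail ** 2)

# Strategy (ii): resample the detail dimensions from the prior (perception optimal).
z_resampled = rng.standard_normal(z_detail.shape)
mse_sample = np.mean((z_detail - z_resampled) ** 2)

ratio = mse_sample / mse_zero   # concentrates near the factor-2 bound
```

The empirical ratio lands near 2, matching the Blau & Michaeli (2019) bound the authors invoke.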

Beyond compression, the authors argue that EOFlows provide a data‑driven estimate of intrinsic dimensionality: the point where explained entropy reaches the noise entropy marks a practical cutoff, reflecting the fact that structure below the noise floor is statistically indistinguishable from noise. This perspective reframes intrinsic dimension as a measurement‑limited quantity rather than an absolute property.
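The proposed cutoff rule can be stated in a few lines. The entropy values below are hypothetical placeholders, not numbers from the paper; the rule is simply the first sorted dimension whose explained entropy no longer exceeds the noise entropy:

```python
import numpy as np

# Hypothetical per-dimension explained entropies (nats), sorted descending,
# plus a known per-dimension noise entropy acting as the measurement floor.
explained_entropy = np.array([3.2, 2.1, 1.4, 0.9, 0.5, 0.29, 0.28, 0.28])
noise_entropy = 0.3

# Practical cutoff: the first index at or below the noise floor; everything
# beyond it is statistically indistinguishable from noise.
intrinsic_dim = int(np.argmax(explained_entropy <= noise_entropy))
```

Here five leading dimensions carry structure above the noise floor, so the estimated intrinsic dimension is 5, and the estimate would shift with the measurement noise level, which is exactly the measurement-limited reading the authors advocate.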

In summary, EOFlows unify density estimation, manifold learning, and disentangled representation learning within a single normalizing‑flow framework. By ordering latent dimensions via explained entropy, regularizing the decoder Jacobian for orthogonality, and employing noise inflation‑deflation, the method yields flexible, interpretable, and compressible representations that scale to high‑dimensional image data. The paper opens avenues for applying such adaptive core‑detail decompositions to scientific simulators, where the goal is often to discover low‑dimensional effective theories from high‑dimensional raw outputs.

