A Diffusive Classification Loss for Learning Energy-based Generative Models
Score-based generative models have recently achieved remarkable success. While such models are usually parameterized directly by the score, an alternative is to use a series of time-dependent energy-based models (EBMs), in which the score is obtained as the negative input-gradient of the energy. Crucially, EBMs can be leveraged not only for generation, but also for tasks such as compositional sampling or building Boltzmann Generators via Monte Carlo methods. However, training EBMs remains challenging: direct maximum likelihood is computationally prohibitive due to the need for nested sampling, while score matching, though efficient, suffers from mode blindness. To address these issues, we introduce the Diffusive Classification (DiffCLF) objective, a simple method that avoids mode blindness while remaining computationally efficient. DiffCLF reframes EBM learning as a supervised classification problem across noise levels, and can be seamlessly combined with standard score-based objectives. We validate the effectiveness of DiffCLF by comparing the estimated energies against ground truth in analytical Gaussian mixture cases, and by applying the trained models to tasks such as model composition and Boltzmann Generator sampling. Our results show that DiffCLF enables EBMs with higher fidelity and broader applicability than existing approaches.
💡 Research Summary
The paper addresses two long‑standing difficulties in training energy‑based generative models (EBMs): the intractable normalizing constant, which makes maximum‑likelihood training prohibitively expensive, and the “mode‑blindness” of score‑matching methods, which fail to capture the relative weights of disjoint modes. The authors propose a novel objective called Diffusive Classification (DiffCLF) that reframes the problem of learning a time‑dependent energy function as a supervised multi‑class classification task across several diffusion time steps.
In the standard diffusion framework, data are corrupted by additive Gaussian noise with a time‑dependent scale γ(t), yielding marginal distributions p_t(y). The authors parameterize a family of EBMs p_θ^{t}(y)=exp(−U_θ^{t}(y)+F_θ^{t}) where U_θ^{t} is the energy and F_θ^{t}=−log Z_θ^{t} acts as a learnable bias rather than a strict normalizer. For a set of N uniformly sampled time points {t_i}, each sample y_i drawn from p_{t_i} is assigned a class label i. The classifier’s posterior is
p_θ(c=i|y) = p_θ^{t_i}(y) / ∑_{j=1}^{N} p_θ^{t_j}(y).
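Because each p_θ^{t_i}(y) = exp(−U_θ^{t_i}(y) + F_θ^{t_i}), this posterior is simply a softmax over the N unnormalized log-densities −U_θ^{t_i}(y) + F_θ^{t_i}. A minimal sketch, using a toy 1‑D Gaussian energy U^t(y) = y²/(2γ(t)²) as a stand-in for a learned network (the quadratic energy and 1‑D data are assumptions for illustration, not the paper's architecture):

```python
import numpy as np

def log_unnorm_density(y, gamma, F):
    # Toy time-dependent EBM: quadratic energy U^t(y) = y^2 / (2 gamma^2),
    # plus a learnable bias F^t that plays the role of -log Z^t.
    U = y**2 / (2.0 * gamma**2)
    return -U + F

def classifier_posterior(y, gammas, Fs):
    # p(c=i | y) = exp(-U^{t_i}(y) + F^{t_i}) / sum_j exp(-U^{t_j}(y) + F^{t_j}):
    # a softmax over the N unnormalized log-densities.
    logits = np.array([log_unnorm_density(y, g, F)
                       for g, F in zip(gammas, Fs)])
    logits -= logits.max()          # shift for numerical stability
    w = np.exp(logits)
    return w / w.sum()
```

Note that the biases F_θ^{t_i} matter here: adding a constant to a single F_θ^{t_i} changes the posterior, which is what makes them learnable from the classification signal.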
The DiffCLF loss is the categorical cross‑entropy of this classifier:
L_clf(θ) = −(1/N) ∑_{i=1}^{N} E_{y∼p_{t_i}} [ log p_θ(c=i|y) ].
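A Monte Carlo estimate of this cross-entropy can be sketched as follows. The example reuses the toy 1‑D Gaussian setup (marginals p_{t_i} = N(0, γ_i²) with quadratic energies), which is an illustrative assumption rather than the paper's experimental setting; in that toy case the correct bias is F_i = −log(γ_i √(2π)):

```python
import numpy as np

rng = np.random.default_rng(0)

def diffclf_loss(gammas, Fs, n_samples=2000):
    # L_clf = -(1/N) sum_i E_{y ~ p_{t_i}} [ log p(c=i | y) ],
    # estimated by drawing samples from each marginal p_{t_i} = N(0, gamma_i^2)
    # (the Gaussian marginals match the toy quadratic energy -- an assumption).
    N = len(gammas)
    total = 0.0
    for i, g in enumerate(gammas):
        y = rng.normal(0.0, g, size=n_samples)
        # unnormalized log-densities of the samples under all N models
        logits = np.stack([-y**2 / (2.0 * gj**2) + Fj
                           for gj, Fj in zip(gammas, Fs)])
        # log-softmax over the N classes, computed stably
        logits -= logits.max(axis=0, keepdims=True)
        log_post = logits - np.log(np.exp(logits).sum(axis=0, keepdims=True))
        total += -log_post[i].mean()
    return total / N
```

With the correct biases the model posterior coincides with the Bayes-optimal classifier, so the loss is minimized there; mis-setting any single F_i increases it, which is the signal that lets DiffCLF recover the relative normalizers that score matching is blind to.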