From discrete-time policies to continuous-time diffusion samplers: Asymptotic equivalences and faster training
We study the problem of training neural stochastic differential equations, or diffusion models, to sample from a Boltzmann distribution without access to target samples. Existing methods for training such models enforce time-reversal of the generative and noising processes, using either differentiable simulation or off-policy reinforcement learning (RL). We prove equivalences between families of objectives in the limit of infinitesimal discretization steps, linking entropic RL methods (GFlowNets) with continuous-time objects (partial differential equations and path space measures). We further show that an appropriate choice of coarse time discretization during training allows greatly improved sample efficiency and the use of time-local objectives, achieving competitive performance on standard sampling benchmarks with reduced computational cost.
💡 Research Summary
The paper addresses the problem of training neural stochastic differential equations (neural SDEs), also known as diffusion models, to sample from a Boltzmann distribution when only the unnormalized energy function E(x) is available and no target samples can be drawn. Traditional Monte‑Carlo methods rely on repeated evaluations of the energy, while recent deep generative approaches model a gradual transformation from a simple prior (e.g., a Gaussian) to the target distribution via a stochastic flow. When data are unavailable, the generative flow itself must be simulated during training, which raises the question of how to design efficient training objectives.
The authors first establish a rigorous connection between continuous‑time SDEs and discrete‑time Markov decision processes (MDPs). Discretizing the SDE with an Euler‑Maruyama scheme yields a forward policy π_F(x_{n+1} | x_n) and a backward policy π_B(x_n | x_{n+1}), which define forward and backward Markov chains. They prove (Propositions 3.1 and 3.3) that as the time step Δt → 0, the path‑space measures induced by these discrete policies converge to the true continuous‑time path measures P and Q. Consequently, divergences defined on the discrete trajectories (global KL, trajectory balance (TB), log‑variance (LV)) become Riemann‑sum approximations of the continuous‑time path‑space KL.
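To make the discretization concrete, here is a minimal sketch of how an Euler–Maruyama step induces a Gaussian "policy": the forward transition kernel is π_F(x′ | x) = N(x′; x + f(x)Δt, σ²Δt·I). The drift, step size, and helper names below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def forward_step(x, drift, sigma, dt, rng):
    # Euler-Maruyama: x' = x + f(x) dt + sigma * sqrt(dt) * eps
    mean = x + drift(x) * dt
    return mean + sigma * np.sqrt(dt) * rng.standard_normal(x.shape)

def gaussian_logpdf(y, mean, var):
    # Log-density of an isotropic Gaussian N(mean, var * I).
    return -0.5 * np.sum((y - mean) ** 2 / var + np.log(2 * np.pi * var))

# Hypothetical drift: Ornstein-Uhlenbeck pull toward the origin.
drift = lambda x: -x
sigma, dt = 1.0, 0.01
rng = np.random.default_rng(0)

x = np.zeros(2)
y = forward_step(x, drift, sigma, dt, rng)
# Log-density of the transition under the forward policy
# pi_F(y | x) = N(y; x + f(x) dt, sigma^2 dt * I):
logp_f = gaussian_logpdf(y, x + drift(x) * dt, sigma**2 * dt)
```

Summing such per-transition log-density ratios along a trajectory is what turns the discrete divergences above into Riemann-sum approximations of the path-space KL as Δt → 0.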
The paper then distinguishes between global and local training objectives. Global objectives compare entire trajectories, while local objectives enforce consistency on each transition. The detailed‑balance (DB) divergence (Equation 8) penalizes the mismatch between forward and backward transition kernels weighted by marginal densities. The authors show (Proposition 3.4) that, in the infinitesimal limit, the DB condition is equivalent to the Fokker‑Planck PDE governing the evolution of the marginal density under the forward SDE, and similarly for the reverse process. This establishes a direct link between GFlowNet‑style entropy‑regularized reinforcement learning and the PDE description of diffusion.
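Schematically, and using illustrative notation rather than the paper's exact Equation 8, the DB condition matches a single transition against the marginal densities at its endpoints, and its infinitesimal limit is the Fokker–Planck equation:

```latex
% Detailed balance over one transition of size \Delta t
% (p_t denotes the marginal density at time t):
p_t(x)\,\pi_F(x' \mid x) \;=\; p_{t+\Delta t}(x')\,\pi_B(x \mid x')
% For the forward SDE dX_t = f(X_t)\,dt + \sigma\,dW_t, taking
% \Delta t \to 0 recovers the Fokker--Planck equation for p_t:
\partial_t p_t(x) \;=\; -\nabla \cdot \big(f(x)\,p_t(x)\big)
  \;+\; \tfrac{\sigma^2}{2}\,\Delta_x\, p_t(x)
```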
A key practical contribution is the proposal to use a coarse time discretization during training. Conventional diffusion‑model training mirrors the fine discretization used at inference time (often thousands of steps), which makes global objectives computationally expensive. The authors demonstrate that by training with far fewer steps (e.g., 10–20) and relying solely on the local DB objective, one can still enforce the necessary forward–reverse consistency. Empirically, this "coarse‑training" strategy yields comparable or better performance on standard sampling benchmarks, including image generation (CIFAR‑10, ImageNet‑32) and physics‑based models (Ising lattices, molecular conformations), while reducing training compute by up to 70% and dramatically lowering memory consumption.
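A time-local objective of this kind can be sketched as a squared log-space residual on a single coarse transition. The names below (learned log-marginals `log_flow_*`, kernel log-densities `logp_f`/`logp_b`) are hypothetical stand-ins, not the paper's implementation:

```python
def db_log_residual(log_flow_t, log_flow_next, logp_f, logp_b):
    # Detailed balance in log space:
    #   log F_t(x) + log pi_F(x'|x) = log F_{t+1}(x') + log pi_B(x|x')
    return log_flow_t + logp_f - log_flow_next - logp_b

def db_loss(log_flow_t, log_flow_next, logp_f, logp_b):
    # Squared residual: zero exactly when the transition is balanced.
    return db_log_residual(log_flow_t, log_flow_next, logp_f, logp_b) ** 2
```

Because the residual involves only one transition, a gradient step needs a single kernel evaluation rather than a full trajectory rollout, which is what makes coarse, time-local training cheap.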
The two main theoretical insights are: (1) a precise asymptotic equivalence between continuous‑time diffusion samplers and discrete‑time stochastic control policies, unifying reinforcement learning, variational inference, and diffusion modeling under a single measure‑transport framework; and (2) the realization that the training‑time discretization granularity can be decoupled from the inference‑time discretization, allowing the use of inexpensive, locally defined losses without sacrificing the global reversibility property.
Overall, the work provides both a solid mathematical foundation for understanding diffusion‑based samplers as measure‑transport problems in path space and a practical recipe for faster, more resource‑efficient training. It opens the door to scaling neural SDEs to higher‑dimensional, energy‑based models where data are scarce or unavailable, and suggests that future research can further explore adaptive discretization schedules, hybrid global‑local objectives, and extensions to stochastic control with constraints.