Terminal Velocity Matching
Figure 1: (Left) a conceptual comparison of our method to prior methods. TVM guides the one-step model via terminal velocity rather than initial velocity. (Right) 1-NFE samples on ImageNet at 256 and 512 resolution.
Research Summary
Title: Terminal Velocity Matching (TVM) for One-Step Diffusion Sampling
Abstract and Motivation
Diffusion models have become the de facto standard for high-fidelity image synthesis, but their generative process typically requires hundreds to thousands of iterative denoising steps. Recent work has focused on reducing the number of function evaluations (NFE) by designing "one-step" or "few-step" samplers that predict the final image directly from a noisy latent. These approaches, however, condition solely on the initial velocity (the noise injected at the start of the reverse diffusion). When the initial velocity is the only guide, the sampler often fails to capture the full dynamics of the diffusion trajectory, leading to noticeable quality degradation, especially at high resolutions (e.g., ImageNet-512).
Core Idea: Terminal Velocity Matching
The authors propose a fundamentally different guidance signal: the terminal velocity, i.e., the expected noise level (or velocity) of the diffusion process at its final time step, just before the latent becomes a clean image. By leveraging the time-reversibility of the diffusion stochastic differential equation (SDE), they can analytically compute or learn an estimate of this terminal velocity and use it to steer a one-step denoising network. In practice, the network is trained to minimize a loss that combines two terms: (1) the conventional reconstruction error between the predicted image and the ground truth, and (2) a velocity-matching term that penalizes the discrepancy between the network's output and the precomputed terminal velocity. This dual objective forces the model to respect both the initial noise distribution and the dynamics that would naturally lead to the final clean image.
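The two-term objective described above can be illustrated with a minimal NumPy sketch. This is not the authors' loss: the summary does not give the exact formula, so the use of mean squared error for both terms, the weighting factor `lam`, and the function name `combined_loss` are all assumptions made purely for illustration.

```python
import numpy as np

def combined_loss(pred_image, target_image, pred_velocity, terminal_velocity, lam=1.0):
    """Illustrative two-term objective (hypothetical, not the paper's formula).

    recon: mean squared error between predicted and ground-truth image.
    vel:   mean squared error between predicted and target terminal velocity.
    lam:   assumed scalar weight on the velocity-matching term.
    """
    recon = np.mean((pred_image - target_image) ** 2)
    vel = np.mean((pred_velocity - terminal_velocity) ** 2)
    return recon + lam * vel, recon, vel

# Toy inputs standing in for network outputs and targets.
pred_image = np.zeros((2, 2))
target_image = np.ones((2, 2))
pred_velocity = np.array([0.0, 2.0])
terminal_velocity = np.array([0.0, 0.0])

total, recon, vel = combined_loss(
    pred_image, target_image, pred_velocity, terminal_velocity, lam=0.5
)
```

In a sketch like this, `lam` controls how strongly the velocity-matching term influences training relative to reconstruction; how the two terms are actually balanced is not stated in the summary.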
Methodology
- Loss Formulation: The total loss is
\