Terminal Velocity Matching

Figure 1: (Left) a conceptual comparison of our method to prior methods. TVM guides the one-step model via terminal velocity rather than initial velocity. (Right) 1-NFE samples on ImageNet at 256 and 512 resolution.


💡 Research Summary

Title: Terminal Velocity Matching (TVM) for One‑Step Diffusion Sampling

Abstract and Motivation
Diffusion models have become the de‑facto standard for high‑fidelity image synthesis, but their generative process typically requires hundreds to thousands of iterative denoising steps. Recent work has focused on reducing the number of function evaluations (NFE) by designing “one‑step” or “few‑step” samplers that predict the final image directly from a noisy latent. These approaches, however, condition solely on the initial velocity (the noise injected at the start of the reverse diffusion). When the initial velocity is the only guide, the sampler often fails to capture the full dynamics of the diffusion trajectory, leading to noticeable quality degradation, especially at high resolutions (e.g., ImageNet‑512).

Core Idea: Terminal Velocity Matching
The authors propose a fundamentally different guidance signal: the terminal velocity, i.e., the expected noise level (or velocity) of the diffusion process at its final time step, just before the latent becomes a clean image. By leveraging the time‑reversibility of the diffusion stochastic differential equation (SDE), they can analytically compute or learn an estimate of this terminal velocity and use it to steer a one‑step denoising network. In practice, the network is trained to minimize a loss that combines two terms: (1) the conventional reconstruction error between the predicted image and the ground truth, and (2) a velocity‑matching term that penalizes the discrepancy between the network’s output and the pre‑computed terminal velocity. This dual objective forces the model to respect both the initial noise distribution and the dynamics that would naturally lead to the final clean image.
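The two-term objective described above can be illustrated with a minimal sketch. This summary does not give the exact form of either term or how they are weighted, so everything concrete here is an assumption made for illustration: the use of mean squared error for both terms, the scalar `weight`, and the names `combined_loss`, `pred_velocity`, and `terminal_velocity` are all hypothetical and are not taken from the authors' implementation.

```python
from typing import Sequence


def mean_squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean of squared element-wise differences between two equal-length vectors."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def combined_loss(
    pred_image: Sequence[float],
    target_image: Sequence[float],
    pred_velocity: Sequence[float],
    terminal_velocity: Sequence[float],
    weight: float = 1.0,  # hypothetical trade-off between the two terms
) -> float:
    """Sum of a reconstruction term and a weighted velocity-matching term.

    Assumption: both terms are mean squared errors. The summary only states
    that the loss has these two parts, not their exact form.
    """
    reconstruction = mean_squared_error(pred_image, target_image)
    velocity_match = mean_squared_error(pred_velocity, terminal_velocity)
    return reconstruction + weight * velocity_match


# Toy example with flattened two-pixel "images" and velocities.
loss = combined_loss(
    pred_image=[1.0, 2.0],
    target_image=[1.0, 4.0],
    pred_velocity=[0.0, 0.0],
    terminal_velocity=[1.0, 1.0],
    weight=0.5,
)
print(loss)  # 2.5 = reconstruction 2.0 + 0.5 * velocity term 1.0
```

In a real training loop the inputs would be tensors produced by the network and the targets would come from the data and the pre-computed terminal velocity; the plain-list version only makes the arithmetic of the two terms explicit.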

Methodology

  1. Loss Formulation – The total loss is
    \