Multilevel and Sequential Monte Carlo for Training-Free Diffusion Guidance
We address the problem of accurate, training-free guidance for conditional generation with pretrained diffusion models. Existing methods typically rely on point estimates to approximate the posterior score, often yielding biased approximations that fail to capture the multimodality inherent in the reverse process of diffusion models. We propose a sequential Monte Carlo (SMC) framework that constructs an unbiased estimator of $p_\theta(y \mid x_t)$ by integrating over the full denoising distribution via Monte Carlo approximation. To ensure computational tractability, we incorporate variance-reduction schemes based on multilevel Monte Carlo (MLMC). Our approach sets a new state of the art for training-free guidance on CIFAR-10 class-conditional generation, reaching $95.6\%$ accuracy at $3\times$ lower cost-per-success than baselines; on ImageNet, it achieves a $1.5\times$ cost-per-success advantage over existing methods.
💡 Research Summary
The paper tackles the long-standing challenge of performing conditional generation with pretrained unconditional diffusion models without any additional training. Existing "training-free" guidance methods such as Diffusion Posterior Sampling (DPS), LGD, FreeDoM, and Universal Guidance approximate the intractable marginal likelihood $p_\theta(y \mid x_t)$ by inserting a point estimate of the denoised image, $\hat{x}_\theta(x_t)$, into the likelihood function. This collapses the full posterior $p_\theta(x_0 \mid x_t)$ to a Dirac mass at its mean, which is a severe approximation when the posterior is multimodal—a situation that commonly occurs at high noise levels or when the data distribution itself is a mixture of classes. The resulting bias drives the diffusion trajectory toward a single mode, ignoring uncertainty and often leading to weight degeneracy and poor conditional samples.
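The failure mode described above can be seen in a toy 1-D example (all numbers here are hypothetical, chosen only for illustration): when the denoising posterior $p_\theta(x_0 \mid x_t)$ is bimodal, the point-estimate likelihood $p(y \mid \hat{x}_\theta(x_t))$ evaluated at the posterior mean can differ sharply from the true marginal $\mathbb{E}[p(y \mid x_0)]$ that guidance should use.

```python
# Toy illustration of point-estimate bias under a bimodal denoising posterior.
# The likelihood, modes, and weights below are hypothetical stand-ins,
# not quantities from the paper.
import math

def likelihood(x0, y=1.0, sigma=0.25):
    """Gaussian observation likelihood p(y | x0)."""
    return math.exp(-0.5 * ((y - x0) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Bimodal denoising posterior: equal mass at x0 = -1 and x0 = +1.
modes, weights = [-1.0, 1.0], [0.5, 0.5]

# Exact marginal: integrate the likelihood over the full posterior.
true_marginal = sum(w * likelihood(m) for m, w in zip(modes, weights))

# DPS-style point estimate: collapse the posterior to a Dirac mass at its mean.
posterior_mean = sum(w * m for m, w in zip(modes, weights))  # = 0.0
point_estimate = likelihood(posterior_mean)

# The mean falls between the modes, where the likelihood is nearly zero,
# so the point estimate massively underestimates the true marginal.
```

Because the posterior mean lands between the two modes, the point-estimate likelihood is orders of magnitude too small here, which is exactly the bias that an unbiased Monte Carlo estimator avoids.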
To overcome these limitations, the authors propose a two-pronged solution: (1) a sequential Monte Carlo (SMC) formulation that treats each diffusion timestep as an SMC transition, and (2) a multilevel Monte Carlo (MLMC) variance-reduction scheme that makes the otherwise expensive Monte Carlo (MC) estimation of $p_\theta(y \mid x_t)$ tractable. In the SMC view, particles are initialized from the standard Gaussian prior, $x_T \sim \mathcal{N}(0, I)$, and propagated backward using the learned reverse kernel $r_t(x_{t-1} \mid x_t) = p_\theta(x_{t-1} \mid x_t)$. The target distribution at each step is the conditional reverse distribution $\mu_t = p_\theta(x_t \mid x_{t+1}, y)$. To correct the proposal, particles receive importance weights proportional to the marginal likelihood, $w_t \propto p_\theta(y \mid x_t)$. Hence, an accurate, unbiased estimator of $p_\theta(y \mid x_t)$ is essential.
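The propagate–weight–resample loop above can be sketched as a minimal particle filter. The reverse kernel and likelihood below are toy 1-D stand-ins (hypothetical, not the paper's trained score network or its MC/MLMC weight estimator); the structure — propose with $p_\theta(x_{t-1} \mid x_t)$, weight by $p_\theta(y \mid x_t)$, resample — is what the SMC formulation prescribes.

```python
# Minimal SMC sketch of the guided reverse process (toy stand-ins throughout).
import math
import random

random.seed(0)

def reverse_kernel(x_t):
    """Toy stand-in for p_theta(x_{t-1} | x_t): contracts toward a bimodal target."""
    target = 1.0 if x_t >= 0 else -1.0
    return random.gauss(x_t + 0.2 * (target - x_t), 0.1)

def log_weight(x_t, y=1.0, sigma=0.5):
    """Toy log p(y | x_t); the paper estimates this quantity with MC/MLMC."""
    return -0.5 * ((y - x_t) / sigma) ** 2

T, n_particles = 20, 256
particles = [random.gauss(0.0, 1.0) for _ in range(n_particles)]  # x_T ~ N(0, I)

for t in range(T, 0, -1):
    # Propose with the unconditional reverse kernel.
    particles = [reverse_kernel(x) for x in particles]
    # Importance-weight toward the conditional target p(x_t | x_{t+1}, y).
    logw = [log_weight(x) for x in particles]
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]  # stabilized unnormalized weights
    # Multinomial resampling.
    particles = random.choices(particles, weights=w, k=n_particles)

mean_final = sum(particles) / n_particles  # particles concentrate near y's mode
```

Resampling at every step is the simplest choice; practical SMC implementations usually resample adaptively, e.g. when the effective sample size drops below a threshold.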
The naïve MC estimator $\hat{p}_\theta(y \mid x_t) = \frac{1}{m} \sum_{i=1}^{m} p(y \mid x_{0|t}^{(i)})$, with the $x_{0|t}^{(i)}$ drawn from the denoising distribution, is unbiased but requires $O(m)$ reverse passes per timestep, leading to $O(mT)$ total cost for a single particle. The authors mitigate this by employing MLMC, which evaluates the estimator at several resolution levels $\ell$ (different numbers of reverse steps). At each level, a control variate $\Delta_\ell = \hat{p}_\ell - \hat{p}_{\ell-1}$ couples adjacent resolutions, and the telescoping identity $\mathbb{E}[\hat{p}_L] = \mathbb{E}[\hat{p}_0] + \sum_{\ell=1}^{L} \mathbb{E}[\Delta_\ell]$ lets most samples be drawn at the cheap coarse levels: because the variance of $\Delta_\ell$ decays with $\ell$, only a few expensive fine-level evaluations are needed, preserving unbiasedness at a fraction of the naïve cost.
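The MLMC telescoping construction can be sketched on a toy problem. The per-level simulator `P` below is a hypothetical stand-in for running an $\ell$-dependent number of reverse diffusion steps (its error decays as $2^{-\ell}$); the key mechanics — common random numbers coupling adjacent levels and sample counts decreasing with level — are standard MLMC and are what the code demonstrates.

```python
# Hedged MLMC sketch: estimate E[P_L] via the telescoping sum
#   E[P_L] = E[P_0] + sum_{l=1}^{L} E[P_l - P_{l-1}],
# coupling levels with common random numbers so Var[P_l - P_{l-1}] shrinks
# and fine (expensive) levels need only a few samples.
import math
import random

def P(level, u):
    """Toy level-l approximation of a target functional; error decays as 2^-level."""
    return math.sin(u) + 2.0 ** (-level) * math.cos(3 * u)

def mlmc_estimate(L, base_samples=4096, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for level in range(L + 1):
        m = max(base_samples // 2 ** level, 32)  # fewer samples at finer levels
        acc = 0.0
        for _ in range(m):
            u = rng.uniform(0.0, 1.0)  # common random number for both levels
            if level == 0:
                acc += P(0, u)
            else:
                acc += P(level, u) - P(level - 1, u)  # coupled correction term
        total += acc / m
    return total

est = mlmc_estimate(L=4)
```

Because each correction term has small variance, the bulk of the $O(m)$ budget is spent at level 0, while the sum still targets the fine-level expectation without bias.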