Blind denoising diffusion models and the blessings of dimensionality

We analyze, theoretically and empirically, the performance of generative diffusion models based on *blind denoisers*, in which the denoiser is not given the noise amplitude in either the training or sampling processes. Assuming that the data distribution has low intrinsic dimensionality, we prove that blind denoising diffusion models (BDDMs), despite not having access to the noise amplitude, *automatically* track a particular *implicit* noise schedule along the reverse process. Our analysis shows that BDDMs can accurately sample from the data distribution in polynomially many steps as a function of the intrinsic dimension. Empirical results corroborate these mathematical findings on both synthetic and image data, demonstrating that the noise variance is accurately estimated from the noisy image. Remarkably, we observe that schedule-free BDDMs produce samples of higher quality compared to their non-blind counterparts. We provide evidence that this performance gain arises because BDDMs correct the mismatch between the true residual noise (of the image) and the noise assumed by the schedule used in non-blind diffusion models.


💡 Research Summary

This paper provides the first rigorous theoretical justification for the empirical success of blind denoising diffusion models (BDDMs), a class of generative diffusion models that never receive the noise level σ as an input during either training or sampling. The authors start by formalizing the blind denoising objective, in which a neural network fθ is trained to minimize the expected squared error over a distribution Θ of noise levels and over clean data samples. Adopting a Bayesian viewpoint, they show that the population optimum f⋆(y) is the posterior mean of the clean image given the noisy observation y, obtained by averaging the fixed-σ posterior means over a posterior distribution μ(σ|y) on the noise scale. Consequently, the optimal blind score term s⋆(y) = f⋆(y) − y is an integral of σ²∇log p_σ(y) weighted by μ(σ|y).
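The Bayesian characterization of the population optimum can be made concrete on a toy discrete model. The sketch below (an illustrative finite dataset and a finite grid of noise levels, not the paper's setup) computes f⋆(y) = E[x|y] and the noise-scale posterior μ(σ|y) by Bayes' rule; the posterior mass on σ concentrates near the level actually used to corrupt y.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 50                                    # ambient dimension
xs = rng.standard_normal((5, d))          # toy "dataset": 5 clean points, uniform prior
sigmas = np.array([0.1, 0.5, 1.0])        # finite grid of noise levels, uniform prior

def log_gauss(y, x, s):
    # log N(y; x, s^2 I), dropping the additive constant common to all (x, s)
    r = y - x
    return -0.5 * np.dot(r, r) / s**2 - d * np.log(s)

def blind_optimum(y):
    # Population-optimal blind denoiser f*(y) = E[x | y], with the joint
    # posterior over (x, sigma) computed by Bayes' rule on the finite grids.
    logw = np.array([[log_gauss(y, x, s) for s in sigmas] for x in xs])
    w = np.exp(logw - logw.max())
    w /= w.sum()
    f_star = (w.sum(axis=1)[:, None] * xs).sum(axis=0)   # posterior mean of x
    mu_sigma = w.sum(axis=0)                             # posterior mu(sigma | y)
    return f_star, mu_sigma

# Corrupt a clean point at a known sigma; the posterior on sigma concentrates there.
y = xs[0] + 0.5 * rng.standard_normal(d)
f_star, mu_sigma = blind_optimum(y)
print(mu_sigma)        # mass concentrates on sigma = 0.5
```

Even in this tiny example, the blind optimum both recovers the clean point and implicitly identifies the noise level, which is the mechanism the paper's theory builds on.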

The core theoretical contribution is the derivation of an implicit noise schedule that the blind denoiser automatically tracks during the reverse diffusion process. Assuming that, at each time t, the conditional distribution μ(σ|X_t) concentrates around a single value σ_t, the authors replace the integral in s⋆ with σ_t²∇log p_{σ_t}(X_t). By plugging this drift into the reverse‑time SDE and enforcing that the marginal law of X_t remains the Gaussian‑blurred data distribution p_{σ_t}=p_X∗N(0,σ_t²I), they obtain an ordinary differential equation for σ_t:

  σ̇_t = –σ_t + a_t/σ_t,

which solves to the explicit formula

  σ_t² = σ_0² e^{–2t} + 2∫_0^t a_s e^{–2(t–s)} ds.

Thus, even though no explicit schedule is supplied, the dynamics of a perfectly trained blind denoiser follow exactly the same schedule that a conventional (non‑blind) diffusion model would require, provided the noise level can be inferred from the current noisy image.
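A quick numerical sanity check of the closed-form schedule: the sketch below integrates the ODE σ̇_t = −σ_t + a_t/σ_t directly and compares against the formula above, with a constant a_t chosen so that the integral is elementary. This is an illustrative check, not code from the paper.

```python
import numpy as np

# Integrate the implicit-schedule ODE  sigma' = -sigma + a/sigma  by forward
# Euler, taking a_t = a constant so that the closed-form solution
#   sigma_t^2 = sigma_0^2 e^{-2t} + 2 * integral_0^t a_s e^{-2(t-s)} ds
# reduces to  sigma_t^2 = sigma_0^2 e^{-2t} + a (1 - e^{-2t}).
a, sigma0, T, n = 0.8, 1.5, 2.0, 200_000
dt = T / n

sigma = sigma0
for _ in range(n):                      # forward-Euler integration of the ODE
    sigma += dt * (-sigma + a / sigma)

closed_form = np.sqrt(sigma0**2 * np.exp(-2 * T) + a * (1 - np.exp(-2 * T)))
print(sigma, closed_form)               # the two values agree closely
```

Note that for constant a the schedule relaxes toward σ² = a, the fixed point of the ODE, at rate e^{−2t}, consistent with the exponential-decay form of the closed-form solution.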

To translate this insight into concrete sampling guarantees, the authors introduce three mild assumptions: (A1) the data support lies in a bounded ball, (A2) the data distribution has low intrinsic dimension k satisfying k² ≪ log d, and (A3) the trained blind denoiser is ε‑close in L² to the optimal blind denoiser. Under these conditions, they decompose the KL divergence between the final noisy target distribution p_{σ_T} and the distribution p̂_T produced by the algorithm using Girsanov's theorem. The decomposition isolates three error sources: initialization error, score‑estimation error (the distance between f̂ and f⋆), and, crucially, the error arising from the concentration of μ(σ|X_t) around σ_t. The latter term is shown to vanish at a rate governed by the intrinsic dimension, leading to an overall sample complexity of O(k²/ε²) steps, polynomial in the intrinsic dimension but independent of the ambient dimension d. Moreover, because the implicit schedule is automatically inferred, a constant step size suffices, eliminating the need for careful schedule tuning.
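The concentration of μ(σ|X_t) has a simple geometric intuition when the data lie near a k‑dimensional structure with k ≪ d: the component of a noisy observation orthogonal to the data structure is pure noise, and chi‑square concentration pins down σ² from a single sample. A minimal sketch, in which a linear subspace stands in for the paper's low‑dimensional support assumption:

```python
import numpy as np

rng = np.random.default_rng(1)

# Data on a k-dimensional subspace of R^d with k << d. The part of y
# orthogonal to the subspace is pure noise, so its squared norm divided by
# (d - k) concentrates around sigma^2 (chi-square concentration).
# Illustrative setup, not the paper's estimator.
d, k, sigma = 10_000, 10, 0.7
U = np.linalg.qr(rng.standard_normal((d, k)))[0]   # orthonormal basis of the subspace
x = U @ rng.standard_normal(k)                     # a clean point on the subspace
y = x + sigma * rng.standard_normal(d)             # a single noisy observation

residual = y - U @ (U.T @ y)                       # component orthogonal to the subspace
sigma_hat = np.sqrt(residual @ residual / (d - k))
print(sigma_hat)                                   # close to the true sigma = 0.7
```

The relative fluctuation of the estimate scales like 1/√(d − k), so larger ambient dimension makes the noise level easier, not harder, to infer, one sense in which dimensionality is a "blessing" here.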

Empirically, the paper validates the theoretical predictions. On synthetic low‑dimensional manifolds, the estimated σ_t from the blind denoiser matches the closed‑form schedule above with negligible deviation. On high‑resolution image datasets (FFHQ, LSUN‑Bedroom), BDDMs trained with the blind objective produce samples of higher perceptual quality (measured by FID, IS, and PSNR) than comparable non‑blind diffusion models that are supplied with a handcrafted schedule. The authors attribute this improvement to the elimination of a subtle mismatch present in non‑blind models: during discretized reverse diffusion, the prescribed noise level σ does not equal the true residual noise present in the current sample, leading to biased denoising. Since BDDMs rely solely on the noisy image, they inherently use the correct residual noise, resulting in more accurate denoising and better samples.

In summary, the paper establishes that (i) when data lie near a low‑dimensional structure, blind denoisers can reliably infer the noise level from a single noisy observation, (ii) this inference yields an implicit noise schedule that aligns perfectly with the theoretical requirements of diffusion sampling, (iii) the resulting algorithm enjoys dimension‑adaptive polynomial‑time sampling guarantees, and (iv) in practice, schedule‑free BDDMs outperform their non‑blind counterparts both quantitatively and qualitatively. The work bridges the gap between empirical observations of blind diffusion models and rigorous stochastic analysis, and opens avenues for further research on adaptive, schedule‑free generative modeling in high‑dimensional settings.

