Enhancing Diffusion-Based Quantitatively Controllable Image Generation via Matrix-Form EDM and Adaptive Vicinal Training
Continuous Conditional Diffusion Model (CCDM) is a diffusion-based framework designed to generate high-quality images conditioned on continuous regression labels. Although CCDM has demonstrated clear advantages over prior approaches across a range of datasets, it still exhibits notable limitations and has recently been surpassed by a GAN-based method, namely CcGAN-AVAR. These limitations mainly arise from its reliance on an outdated diffusion framework and its low sampling efficiency due to long sampling trajectories. To address these issues, we propose an improved CCDM framework, termed iCCDM, which incorporates the more advanced Elucidated Diffusion Model (EDM) framework with substantial modifications to improve both generation quality and sampling efficiency. Specifically, iCCDM introduces a novel matrix-form EDM formulation together with an adaptive vicinal training strategy. Extensive experiments on four benchmark datasets, spanning image resolutions from $64\times64$ to $256\times256$, demonstrate that iCCDM consistently outperforms existing methods, including state-of-the-art large-scale text-to-image diffusion models (e.g., Stable Diffusion 3, FLUX.1, and Qwen-Image), achieving higher generation quality while significantly reducing sampling cost.
💡 Research Summary
The paper addresses the shortcomings of the Continuous Conditional Diffusion Model (CCDM), which, despite its pioneering role in continuous conditional generative modeling (CCGM), suffers from severe label inconsistency, low sampling efficiency, and high memory overhead due to a heavyweight MLP that maps scalar regression labels to high‑dimensional embeddings. Recent GAN‑based work (CcGAN‑AVAR) has already outperformed CCDM, highlighting the need for a more powerful diffusion framework.
To this end, the authors propose iCCDM, an improved CCDM built on the modern Elucidated Diffusion Model (EDM) paradigm. The core contributions are:
- Matrix‑form noise conditioning – Instead of the scalar noise level σ used in standard EDM, iCCDM defines a label‑dependent covariance matrix Σ(t, y) and its square root Σ(t, y)¹ᐟ². This enables the forward diffusion process xₜ = x₀ + Σ(t, y)¹ᐟ² ε to inject anisotropic, label‑specific noise.
- Derivation of forward and reverse SDEs and the probability‑flow ODE (PF‑ODE) in matrix form – The forward SDE dXₜ = Ġ(t, y) dBₜ, with diffusion coefficient Ġ(t, y)=diag
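
The matrix-form forward process xₜ = x₀ + Σ(t, y)¹ᐟ² ε from the first contribution can be illustrated with a minimal NumPy sketch. The summary does not give the actual form of Σ(t, y), so everything here is an assumption made for illustration: the covariance is taken to be diagonal, the helper `diag_sqrt_cov` and its ramp pattern are hypothetical, and the label `y` is assumed to be normalized to [0, 1]. It is not the authors' implementation.

```python
import numpy as np


def diag_sqrt_cov(t, y, shape):
    """Hypothetical diagonal Sigma(t, y)^{1/2}; NOT the paper's definition.

    Returns one standard deviation per pixel. The values grow linearly
    with t and are scaled by an arbitrary label-dependent ramp, purely to
    make the noise anisotropic and label-specific.
    """
    h, w = shape
    # Illustrative pattern: horizontal ramp whose strength depends on y.
    ramp = np.linspace(0.0, 1.0, w)[None, :].repeat(h, axis=0)
    return t * (1.0 + y * ramp)


def forward_perturb(x0, t, y, rng):
    """Sample x_t = x_0 + Sigma(t, y)^{1/2} eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(x0.shape)
    std = diag_sqrt_cov(t, y, x0.shape)
    # For a diagonal matrix, the matrix-vector product reduces to an
    # elementwise product between the diagonal entries and eps.
    return x0 + std * eps, eps


rng = np.random.default_rng(0)
x0 = np.zeros((4, 4))  # stand-in for a clean image
xt, eps = forward_perturb(x0, t=0.5, y=0.3, rng=rng)
```

With a scalar noise level, every pixel would receive noise of the same scale; here the per-pixel standard deviation varies with both position and label, which is the sense in which the noise is anisotropic. Setting `y = 0` in this sketch recovers the isotropic case with standard deviation `t`.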