Generalized Denoising Diffusion Codebook Models (gDDCM): Tokenizing images using a pre-trained diffusion model

February 23, 2026

Reading time: 3 minutes

...

📝 Original Info

Title: Generalized Denoising Diffusion Codebook Models (gDDCM): Tokenizing images using a pre-trained diffusion model
ArXiv ID: 2511.13387
Date: 2025-11-17
Authors: Researchers from original ArXiv paper

📝 Abstract

Denoising diffusion models have emerged as a dominant paradigm in image generation. Discretizing image data into tokens is a critical step for effectively integrating images with Transformer and other architectures. Although the Denoising Diffusion Codebook Models (DDCM) pioneered the use of pre-trained diffusion models for image tokenization, it strictly relies on the traditional discrete-time DDPM architecture. Consequently, it fails to adapt to modern continuous-time variants, such as Flow Matching and Consistency Models, and suffers from inefficient sampling in high-noise regions. To address these limitations, this paper proposes the Generalized Denoising Diffusion Codebook Models (gDDCM). We establish a unified theoretical framework and introduce a generic "De-noise and Back-trace" sampling strategy. By integrating a deterministic ODE denoising step with a residual-aligned noise injection step, our method resolves the challenge of adaptation. Furthermore, we introduce a backtracking parameter $p$ and significantly enhance tokenization ability. Extensive experiments on the CIFAR10 and LSUN Bedroom datasets demonstrate that gDDCM achieves comprehensive compatibility with mainstream diffusion variants and significantly outperforms DDCM in terms of reconstruction quality and perceptual fidelity.

💡 Deep Analysis

Deep Dive into Generalized Denoising Diffusion Codebook Models (gDDCM): Tokenizing images using a pre-trained diffusion model.


📄 Full Content

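... allows efficient ODE solvers to be used to accelerate sampling. Building on this, Consistency Models [3] use distillation so that any point on the sampling path maps directly to the endpoint, achieving a breakthrough in single-step generation. In addition, Flow Matching [4] and Rectified Flow further improve sampling efficiency by redefining the velocity field and straightening the sampling trajectories. Although these variants use different sampling path designs, they share similar marginal distribution properties in theory.

Image discretization [5][6] is widely used in artificial intelligence. With the Transformer architecture [7] now the mainstream paradigm for generative modeling, the next-token prediction mechanism has shown excellent performance in large language models (LLMs). To transfer this paradigm to image generation, continuous image data usually has to be discretized through tokenization. Mainstream methods such as VQVAE [8] and VQGAN [9] encode an image into a 2D token grid through quantized reconstruction, whereas TiTok [10] uses QFormer [11] ...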
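Quantization-based tokenizers such as VQVAE map each feature vector to the index of its nearest codebook entry, which is what turns a feature map into a 2D token grid. The following is a minimal sketch of that lookup only. The codebook, the feature values, and the function name `quantize` are hypothetical, and no encoder or decoder network is modeled.

```python
def quantize(vectors, codebook):
    """Return, for each vector, the index of the nearest codebook entry."""
    def sq_dist(a, b):
        # Squared Euclidean distance between two equal-length vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda j: sq_dist(v, codebook[j]))
        for v in vectors
    ]


# Hypothetical 3-entry codebook and a 2x2 grid of 2-D features.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
features = [
    [(0.1, -0.1), (0.9, 0.2)],
    [(0.2, 0.8), (0.6, 0.1)],
]

# One token (codebook index) per spatial position.
token_grid = [quantize(row, codebook) for row in features]
print(token_grid)
```

Each entry of `token_grid` is an integer index, so the grid can be flattened into a sequence for next-token prediction.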

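2.1 Diffusion Models and Their Variants

Diffusion models have become the mainstream paradigm in image generation. Unlike traditional generative adversarial networks (GANs [13]), which rely on an adversarial game between a generator and a discriminator, diffusion models reconstruct data samples by simulating the reverse of a Gaussian noise diffusion process. They show clear advantages in training stability and mode coverage. Research on diffusion models has evolved from the early discrete-time denoising diffusion models [1] to score-based [2] continuous-time generative models. Existing work shows that the diffusion process can be described by a stochastic differential equation (SDE) and that a corresponding probability flow ordinary differential equation (ODE) exists; the two share the same marginal distributions but correspond to different noising trajectories. Based on this theoretical framework, Consistency Models [3] distill the ODE trajectory to achieve efficient single-step sampling. In addition, Flow Matching [4] was proposed as a more general framework. Although Flow Matching and score matching models agree in their marginal distributions, Flow Matching constructs a different probability flow vector field, which yields different noising and denoising trajectories. Building on Flow Matching, Rectified Flow further introduces "Reflow" ...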
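For reference, the SDE and probability flow ODE pair mentioned above is usually written as follows in the score-based literature. The notation here (drift $\boldsymbol{f}$, diffusion coefficient $g$, Wiener process $\boldsymbol{w}$, marginal density $p_t$) is the standard one and is an assumption about this paper's conventions, which may differ.

```latex
% Forward SDE (stochastic noising process)
d\boldsymbol{x} = \boldsymbol{f}(\boldsymbol{x}, t)\,dt + g(t)\,d\boldsymbol{w}

% Probability flow ODE with the same marginals p_t(x)
\frac{d\boldsymbol{x}}{dt}
  = \boldsymbol{f}(\boldsymbol{x}, t)
  - \frac{1}{2}\, g(t)^{2}\, \nabla_{\boldsymbol{x}} \log p_t(\boldsymbol{x})
```

Both processes induce the same $p_t(\boldsymbol{x})$ at every $t$, but the ODE is deterministic, which is why ODE solvers and trajectory distillation apply to it.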

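Second, given a total number of iterations $N$, we dynamically allocate the number of iterations for each interval, $r(k, \sigma_{\max})$, defined as:

$$r(k, \sigma_{\max}) = \left\lfloor \frac{\left(1 + \frac{k}{P}\left(\sigma_{\max}^{1/\rho} - 1\right)\right)^{\rho}}{\sum_{i}\left(1 + \frac{i}{P}\left(\sigma_{\max}^{1/\rho} - 1\right)\right)^{\rho}}\, N + 0.5 \right\rfloor. \tag{10}$$

Equation (10) is similar to ... in EDM [15] ...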
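The allocation rule in Equation (10) can be sketched in a few lines of Python. This sketch reads the exponent as an EDM-style parameter `rho` and the outer brackets as rounding via `floor(x + 0.5)`. Those readings, the index range `0..P-1`, the default `rho=7.0`, and the function name are all assumptions made for illustration, not values confirmed by the text.

```python
import math


def allocate_iterations(n_total, num_intervals, sigma_max, rho=7.0):
    """Split n_total iterations across num_intervals intervals.

    Interval k gets a weight (1 + k/P * (sigma_max**(1/rho) - 1))**rho,
    and its share of n_total is rounded to the nearest integer.
    ASSUMPTION: k runs over 0..P-1 and rho is an EDM-style exponent.
    """
    base = sigma_max ** (1.0 / rho) - 1.0
    weights = [
        (1.0 + k / num_intervals * base) ** rho
        for k in range(num_intervals)
    ]
    total = sum(weights)
    # floor(x + 0.5) rounds each proportional share to the nearest integer.
    return [math.floor(w / total * n_total + 0.5) for w in weights]


# Hypothetical setting: 100 iterations, 5 intervals, sigma_max = 80.
counts = allocate_iterations(n_total=100, num_intervals=5, sigma_max=80.0)
print(counts, sum(counts))
```

Because each share is rounded independently, the counts may not sum exactly to `n_total`; a real implementation would need to decide how to absorb the difference.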

From Equation (12) ...

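where $\Sigma(t) = s(t)\sigma(t)$. As stated above, we only discuss the case in which $f_t(\boldsymbol{x})$ is a linear function, and let $f_t(\boldsymbol{x}) = f_t\boldsymbol{x}$. The sample at the corresponding time can then be obtained from Equation (18):

$$\boldsymbol{x}_{t-\Delta t} = s(t - \Delta t)\,\boldsymbol{x}_0 + \Sigma(t - \Delta t)\,\boldsymbol{\epsilon}. \tag{18}$$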
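As a small worked example of Equation (18), the sketch below forms the sample at time $t - \Delta t$ from a clean estimate $\boldsymbol{x}_0$ and a noise vector $\boldsymbol{\epsilon}$. The schedule used here, $s(t) = 1$ and $\sigma(t) = t$ (the EDM-style choice), and the numeric values are assumptions picked only to make the arithmetic easy to follow; `sample_at` is a hypothetical helper name.

```python
def sample_at(x0, eps, t, s, sigma):
    """Return s(t) * x0 + Sigma(t) * eps with Sigma(t) = s(t) * sigma(t)."""
    scale = s(t)
    big_sigma = scale * sigma(t)
    return [scale * a + big_sigma * e for a, e in zip(x0, eps)]


# ASSUMPTION: EDM-style schedule, chosen for illustration only.
def s(t):
    return 1.0


def sigma(t):
    return t


x0 = [0.5, -0.25]   # clean-sample estimate
eps = [1.0, -2.0]   # noise direction
t, dt = 1.0, 0.25

# Sample at the earlier time t - dt, as in Equation (18).
x_prev = sample_at(x0, eps, t - dt, s, sigma)
print(x_prev)
```

With this schedule the result is simply `x0 + (t - dt) * eps`, so the noise level of the returned sample is set entirely by the time argument.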

Theorem 1. Equation (18), within 𝒪(Δt ...

Then, taking Equation (17) ...

1: $t = T$, $\boldsymbol{x}_t = I(\boldsymbol{x}_0, T)$
2: for $k$ ...
...
4: $\boldsymbol{x}_0, \boldsymbol{\epsilon} = \theta(\boldsymbol{x}_t, t)$
...
8: 𝐿 ϵ 𝑐.append(𝑖)
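The listed steps predict $\boldsymbol{x}_0$ and $\boldsymbol{\epsilon}$ from the current sample and append an index $i$ to a list. One way such an index could be chosen, following the "residual-aligned noise injection" idea from the abstract, is to pick the codebook noise vector with the largest inner product against a residual. The sketch below illustrates that selection rule only. The criterion, the names `select_index` and `tokens`, and all values are assumptions, not the paper's stated algorithm.

```python
import random


def select_index(codebook, residual):
    """Return the index of the codebook vector best aligned with residual."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    return max(range(len(codebook)), key=lambda i: dot(codebook[i], residual))


rng = random.Random(0)
dim, codebook_size = 4, 8

# Hypothetical fixed codebook of Gaussian noise vectors for one step.
codebook = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(codebook_size)]

# Hypothetical target image and model prediction of the clean image.
target = [0.2, -0.4, 0.1, 0.9]
predicted = [0.0, -0.1, 0.3, 0.5]
residual = [a - b for a, b in zip(target, predicted)]

tokens = []  # one selected index per sampling step
tokens.append(select_index(codebook, residual))
print(tokens)
```

Repeating the selection at every step would give one index per step, and that index sequence would act as the discrete token representation of the image.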


Reference

This content is AI-processed based on ArXiv data.
