Deterministic Discrete Denoising
We propose a deterministic denoising algorithm for discrete-state diffusion models. The key idea is to derandomize the generative reverse Markov chain by introducing a variant of the herding algorithm, which induces deterministic state transitions driven by weakly chaotic dynamics. It serves as a direct replacement for the stochastic denoising process, without requiring retraining or continuous state embeddings. We demonstrate consistent improvements in both efficiency and sample quality on text and image generation tasks. In addition, the proposed algorithm yields improved solutions for diffusion-based combinatorial optimization. Thus, herding-based denoising is a simple yet promising approach for enhancing the generative process of discrete diffusion models. Furthermore, our results reveal that deterministic reverse processes, well established in continuous diffusion, can also be effective in discrete state spaces.
💡 Research Summary
The paper introduces a deterministic denoising algorithm for discrete‑state diffusion models by derandomizing the reverse Markov chain using a time‑dependent variant of the herding algorithm. Traditional discrete diffusion models rely on stochastic sampling at each denoising step: a neural network predicts a categorical distribution pₜ₋₁, and a sample is drawn (often via the Gumbel‑max trick). While this preserves diversity, it incurs an O(T⁻¹/²) convergence rate and requires many steps for high‑quality generation.
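For concreteness, here is a minimal pure-Python sketch of the stochastic step being replaced: one Gumbel-max draw from a predicted categorical distribution. The function name and toy distribution are illustrative, not from the paper.

```python
import math
import random

def gumbel_max_sample(log_probs, rng):
    """One categorical draw via the Gumbel-max trick:
    argmax_i (log p_i + g_i), with g_i ~ Gumbel(0, 1), is an exact
    sample from Categorical(p)."""
    scores = [lp - math.log(-math.log(rng.random())) for lp in log_probs]
    return max(range(len(scores)), key=scores.__getitem__)

rng = random.Random(0)
p = [0.1, 0.2, 0.7]                      # toy predicted distribution
log_p = [math.log(q) for q in p]
counts = [0, 0, 0]
for _ in range(10_000):
    counts[gumbel_max_sample(log_p, rng)] += 1
# Empirical frequencies approach p, but since draws are independent the
# discrepancy only shrinks at the O(T^(-1/2)) Monte Carlo rate noted above.
```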
The authors propose to replace this stochastic step with a deterministic update that couples a continuous weight vector w with the discrete token x. At each reverse step t → t‑1 the algorithm computes the model’s predicted probability vector pₜ₋₁ and then performs:
- Token selection: xₜ₋₁ = argmaxₓ (wₜ + pₜ₋₁ + δ·xₜ)ᵀx, where δ is a small “delayed‑switching” margin that prevents unnecessary flips when the new candidate does not improve the objective by at least δ.
- Weight update: wₜ₋₁ = wₜ + pₜ₋₁ – xₜ₋₁.
These equations constitute a time‑dependent herding system. The weight vector accumulates the discrepancy between the target distribution (the model's prediction) and the one‑hot tokens actually selected; because each selection subtracts a full unit of mass, w remains bounded. Under mild conditions the cumulative discrepancy decays as O(T⁻¹), provably faster than the O(T⁻¹/²) rate of stochastic sampling.
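The two updates above can be sketched in a few lines of pure Python. This is an illustrative toy, not the paper's implementation: variable names, the value of δ, and the fixed predicted distribution are all assumptions. With p held constant, the herding property is easy to observe: the deterministic token sequence visits each state with frequency matching p.

```python
def herding_step(w, p, x_prev, delta=0.05):
    """One deterministic reverse step (illustrative names).

    Selection: argmax over (w + p), with margin `delta` added to the
    incumbent token so it only switches when clearly beaten.
    Update:    w <- w + p - onehot(x_new), accumulating the gap between
    the predicted distribution and the tokens actually emitted.
    """
    k = len(p)
    score = [w[i] + p[i] + (delta if i == x_prev else 0.0) for i in range(k)]
    x_new = max(range(k), key=score.__getitem__)
    w = [w[i] + p[i] - (1.0 if i == x_new else 0.0) for i in range(k)]
    return x_new, w

# Toy run with a fixed predicted distribution.
p = [0.2, 0.3, 0.5]
w = [0.0, 0.0, 0.0]
x = 0
counts = [0, 0, 0]
for _ in range(1000):
    x, w = herding_step(w, p, x)
    counts[x] += 1
# counts lands near [200, 300, 500], and w stays bounded throughout:
# the empirical-frequency gap shrinks at the O(1/T) rate claimed above.
```

In a real denoiser, p would be the model's prediction pₜ₋₁ at each step rather than a constant, but the same bounded-weight mechanism applies.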
From a dynamical‑systems perspective the process is a piecewise isometry with weakly chaotic behavior: the continuous weight evolves within a bounded region, while the discrete argmax operation creates fractal‑like attractors. Small perturbations in the initial weight can lead to large trajectory changes, ensuring high entropy and negative autocorrelation among generated tokens, yet, being a piecewise isometry, the overall map preserves measure on each piece.
The method is evaluated on the state‑of‑the‑art uniform diffusion language model (UDLM). Only the reverse denoising routine is altered (≈20 lines of code); the underlying model and its training remain unchanged. Experiments cover three domains:
- Text generation – on large corpora such as WikiText, the deterministic approach improves BLEU/ROUGE scores by 1.2–1.5 points compared to the stochastic baseline at the same number of denoising steps, while maintaining diversity.
- Image generation – on CIFAR‑10 and ImageNet‑32, Fréchet Inception Distance (FID) drops by 4–6 points, and comparable quality is achieved with roughly 30 % fewer steps, demonstrating the efficiency gain from the O(T⁻¹) convergence.
- Combinatorial optimization – integrated into the DIFUSCO framework, the deterministic reverse process yields better objective values (3–5 % improvement) on tasks such as graph coloring, confirming that the approach benefits not only generative sampling but also optimization‑oriented diffusion.
The authors also discuss extensions to masked diffusion models (e.g., DNDM, LLaDA). By treating mask states as additional discrete variables and applying the same herding updates, deterministic unmasking schedules can be derived without external timing specifications.
In summary, the paper demonstrates that a simple herding‑based deterministic denoising scheme can serve as a drop‑in replacement for stochastic reverse processes in discrete diffusion models. It achieves faster convergence, higher sample quality, and reduced computational cost without any retraining. This work opens a new line of research on deterministic dynamics for discrete generative modeling, with potential impact on large‑scale language generation, graph synthesis, and molecular design.