HQ-DM: Single Hadamard Transformation-Based Quantization-Aware Training for Low-Bit Diffusion Models


Diffusion models have achieved remarkable results in image generation, but their high computational and memory costs pose challenges for deployment. Model quantization has emerged as a promising solution to reduce storage overhead and accelerate inference. Nevertheless, existing quantization methods for diffusion models struggle to mitigate outliers in activation matrices during inference, leading to substantial performance degradation in low-bit settings. To address this, we propose HQ-DM, a novel quantization-aware training framework that applies a Single Hadamard Transformation to activation matrices, effectively reducing activation outliers while preserving model performance under quantization. Compared to the traditional Double Hadamard Transformation, our scheme offers distinct advantages: it seamlessly supports INT convolution operations while preventing the amplification of weight outliers. For conditional generation on the ImageNet 256×256 dataset with the LDM-4 model, our W4A4 and W4A3 quantization schemes improve the Inception Score by 12.8% and 467.73%, respectively, over the existing state-of-the-art method.


💡 Research Summary

The paper introduces HQ‑DM, a novel quantization‑aware training (QAT) framework designed to enable low‑bit (4‑bit weight and 4‑bit or 3‑bit activation) diffusion models without the severe quality loss typically observed in such settings. The core problem addressed is the presence of heavy‑tailed activation distributions and outliers that shift dramatically across denoising timesteps, causing clipping and rounding errors that accumulate over the multi‑step sampling process.

HQ‑DM mitigates these issues by applying a single Hadamard transformation to the activation tensors before quantization. Because the Hadamard matrix is orthogonal, it rotates the data into a basis where extreme values are spread across many dimensions, effectively “diffusing” outliers. Unlike prior work that employs a double Hadamard transform on both weights and activations, the single‑transform approach restricts the operation to activations, preserving the ability to perform integer‑only convolution and avoiding the creation of new weight outliers.
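The outlier‑diffusing effect of an orthogonal Hadamard rotation can be illustrated with a small NumPy sketch (the dimension, outlier value, and helper name here are illustrative choices, not taken from the paper):

```python
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Orthonormal Hadamard matrix via Sylvester's construction (n a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)  # rows are orthonormal: H @ H.T == I

n = 8
H = hadamard(n)

x = np.zeros(n)
x[3] = 100.0   # one extreme activation outlier
y = H @ x      # rotate activations before quantization

# The rotation preserves the norm (orthogonality) but spreads the
# outlier's energy over all channels, shrinking the per-tensor max
# and therefore the uniform-quantization step size.
print(np.abs(x).max())            # 100.0
print(round(np.abs(y).max(), 2))  # 35.36  (= 100 / sqrt(8))
```

Because the transform is orthogonal, it can be inverted (or folded into the next layer) exactly, so the rotation itself loses no information; only the subsequent quantization is lossy.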

During QAT, the authors adopt the Straight‑Through Estimator (STE) to make rounding and clamping differentiable. They further incorporate LoRA‑based low‑rank adaptation, freezing the original pretrained weights and learning only a pair of low‑rank matrices (B·A). This dramatically reduces the number of trainable parameters while focusing updates on the most expressive sub‑space. Crucially, HQ‑DM introduces timestep‑wise learnable quantization scales, allowing the quantization step size to adapt dynamically to the changing variance of the noise‑conditioned inputs at each diffusion step. This addresses the non‑stationarity that hampers earlier PTQ and QAT methods.
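A minimal NumPy sketch of these three ingredients — fake quantization with an STE‑style gradient mask, one learnable scale per timestep, and a LoRA update on frozen weights — might look like the following (all shapes, ranks, and initial values are illustrative assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def fake_quant(x, scale, bits=4):
    """Symmetric uniform fake quantization (QAT forward pass)."""
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

def ste_grad_mask(x, scale, bits=4):
    """STE backward: pass the gradient through unchanged inside the
    clipping range, zero it outside (round/clamp treated as identity)."""
    qmax = 2 ** (bits - 1) - 1
    r = x / scale
    return ((r >= -qmax - 1) & (r <= qmax)).astype(x.dtype)

# One learnable activation scale per diffusion timestep, so the step
# size can track the timestep-dependent input variance.
T = 1000
act_scales = np.full(T, 0.05)   # initialised here, then learned during QAT

# LoRA: frozen pretrained weight W plus a trainable low-rank update B @ A.
d_out, d_in, r = 64, 64, 4
W = rng.standard_normal((d_out, d_in)) * 0.02   # frozen
A = rng.standard_normal((r, d_in)) * 0.01       # trainable
B = np.zeros((d_out, r))                        # trainable, zero-init

def forward(x, t, bits=4):
    xq = fake_quant(x, act_scales[t], bits)     # quantize activations
    return (W + B @ A) @ xq                     # effective adapted weight

x = rng.standard_normal(d_in)
y = forward(x, t=42, bits=3)
```

With `B` zero‑initialised, the adapted layer starts out identical to the frozen pretrained layer, and only the `r * (d_in + d_out)` LoRA parameters plus the `T` scales are updated, which is what keeps the QAT cost low.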

The experimental evaluation uses the Latent Diffusion Model (LDM‑4) on the ImageNet 256×256 benchmark. Compared with the state‑of‑the‑art EfficientDM, HQ‑DM achieves:

  • W4A4 (4‑bit weight, 4‑bit activation) – Inception Score (IS) improvement of 12.8 %.
  • W4A3 (4‑bit weight, 3‑bit activation) – IS improvement of 467.73 %, a dramatic gain that demonstrates the method’s robustness even under aggressive activation quantization.

Additional tests on other diffusion variants (e.g., LDM‑2, Stable‑Diffusion) confirm consistent gains, and the integer‑only implementation yields notable reductions in memory footprint and inference latency, making the approach attractive for edge devices.

The paper’s contributions are fourfold:

  1. The first single‑Hadamard‑transformation QAT framework, which reduces activation outliers prior to quantization.
  2. Hardware‑friendly transformation that maintains integer convolution compatibility and prevents weight outlier amplification.
  3. Integration of LoRA‑based distillation with timestep‑adaptive quantization scales, enabling effective QAT for diffusion models.
  4. Extensive empirical validation showing superior performance across multiple datasets and quantization levels, especially in low‑bit activation regimes.

Limitations include the reliance on dimensions that are powers of two (requiring block‑diagonal padding for other sizes) and the current focus on image generation; further work is needed to assess applicability to text‑to‑image, video, or other multimodal diffusion tasks. Nonetheless, HQ‑DM represents a significant step toward practical, low‑bit diffusion model deployment.

