CompSRT: Quantization and Pruning for Image Super Resolution Transformers

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Model compression has become an important tool for making image super resolution models more efficient. However, the gap between the best compressed models and the full precision model remains large, and a deeper understanding of how compression behaves on more performant models is still needed. Prior research on quantization of LLMs has shown that Hadamard transformations produce weights and activations with fewer outliers, which improves performance. We argue that while the Hadamard transform does reduce the effect of outliers, an empirical analysis of how the transform functions is still needed. By studying the distributions of weights and activations of SwinIR-light, we show through statistical analysis that the lower errors are caused by the Hadamard transform's ability to reduce dynamic range and increase the proportion of values around $0$. Based on these findings, we introduce CompSRT, a more performant way to compress the image super resolution transformer network SwinIR-light. We perform Hadamard-based quantization, and we also perform scalar decomposition to introduce two additional trainable parameters. Our quantization performance statistically significantly surpasses the SOTA, with gains as large as 1.53 dB, and visibly improves visual quality by reducing blurriness at all bitwidths. At $3$-$4$ bits, to show our method is compatible with pruning for increased compression, we also prune $40\%$ of weights and achieve a $6.67\%$-$15\%$ reduction in bits per parameter with performance comparable to SOTA.


💡 Research Summary

The paper introduces CompSRT, a compression framework designed for the lightweight image‑super‑resolution transformer SwinIR‑light. The authors focus on two complementary techniques: (1) a Hadamard‑based preprocessing step that reshapes weight and activation distributions, and (2) a scalar‑decomposition scheme that adds two learnable parameters to the quantization scale and zero‑point. By applying the Hadamard transform to every weight and activation tensor (after zero‑padding to a power‑of‑two dimension), the authors empirically demonstrate—through Shapiro‑Wilk normality tests and one‑sided Wilcoxon signed‑rank tests—that the transformed tensors become statistically more Gaussian, exhibit reduced dynamic range, and contain a higher proportion of values near zero (ε‑band). These properties directly mitigate quantization error, especially at extreme low‑bit settings (2–4 bits), because a narrower clipping range and a flatter distribution lead to smaller quantization step sizes and fewer outliers.
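The preprocessing described above can be illustrated in a few lines of NumPy. This is a minimal sketch, not the authors' implementation: it assumes the standard Sylvester construction for the Hadamard matrix and orthonormal (1/√n) scaling, and pads only the last tensor dimension.

```python
import numpy as np

def hadamard(n):
    # Sylvester construction; n must be a power of two
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)  # orthonormal, so the transform preserves norms and is invertible

def hadamard_preprocess(x):
    # zero-pad the last dimension to the next power of two, then transform
    d = x.shape[-1]
    n = 1 << (d - 1).bit_length()
    x_pad = np.pad(x, [(0, 0)] * (x.ndim - 1) + [(0, n - d)])
    return x_pad @ hadamard(n).T

# a toy activation vector with a single large outlier
x = np.zeros((1, 6))
x[0, 2] = 100.0
y = hadamard_preprocess(x)
# the outlier's energy is spread across all transformed coordinates,
# shrinking the dynamic range the quantizer's clipping bounds must cover
print(np.abs(x).max(), np.abs(y).max())
```

Because the transform is orthogonal, it can be applied before a quantized operation and inverted afterwards without changing the computed function, which is what makes it usable inside a PTQ pipeline.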

The scalar‑decomposition idea re‑parameterizes the conventional quantization scale S = (u−l)/(2^b−1) and offset l as S′ = S + α and l′ = l + β, where α and β are trained from zero during the fine‑tuning phase. These extra degrees of freedom allow the optimizer to correct bias introduced by a fixed clipping range and to provide an alternative gradient path, resulting in measurable PSNR and SSIM gains (≈ 0.12 dB and 0.004 SSIM in an ablation on a 2‑bit × 2 model).
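A minimal fake-quantization sketch of this re-parameterization, under our reading of the summary (a uniform asymmetric quantizer; the function name and signature are illustrative, not from the paper):

```python
import numpy as np

def fake_quantize(x, l, u, b, alpha=0.0, beta=0.0):
    # baseline uniform quantizer: scale S = (u - l) / (2^b - 1), offset l
    # scalar decomposition: S' = S + alpha, l' = l + beta,
    # with alpha and beta initialized to zero and fine-tuned
    S = (u - l) / (2 ** b - 1) + alpha
    l = l + beta
    q = np.clip(np.round((x - l) / S), 0, 2 ** b - 1)  # quantize to b-bit integer grid
    return q * S + l                                   # dequantize

x = np.linspace(-1.0, 1.0, 7)
# with alpha = beta = 0 the quantizer reduces exactly to the standard form
print(fake_quantize(x, l=-1.0, u=1.0, b=2))
```

In a fine-tuning loop, `alpha` and `beta` would be treated as trainable parameters (with a straight-through estimator for the rounding step), so the optimizer can nudge the effective scale and offset away from the searched clipping bounds.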

CompSRT integrates these two ideas into a post‑training quantization (PTQ) pipeline similar to 2DQuant and CondiQuant. After searching for optimal clipping bounds, the model undergoes fake‑quantization with the modified scale/offset, with the Hadamard transform applied both before and after each quantized operation. The authors also explore combining quantization with magnitude‑based weight pruning. Leveraging the concentration of values around zero after the Hadamard transform, they prune 40% of the smallest‑magnitude weights per layer at 3‑ and 4‑bit precision. No additional retraining is required; only the quantization parameters are fine‑tuned. This yields a reduction of 6.67% (3‑bit) and 15% (4‑bit) in bits per parameter while maintaining performance comparable to the state‑of‑the‑art CondiQuant.
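The per-layer magnitude-pruning step can be sketched as follows. This is a generic illustration of unstructured magnitude pruning, assuming a per-layer threshold; the paper's exact masking and bits-per-parameter accounting may differ.

```python
import numpy as np

def magnitude_prune(w, ratio=0.4):
    # zero out the smallest-magnitude `ratio` fraction of weights in this layer
    k = int(ratio * w.size)
    if k == 0:
        return w.copy()
    # k-th smallest absolute value serves as the pruning threshold
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    mask = np.abs(w) > thresh
    return w * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))       # stand-in for one layer's weight matrix
w_pruned = magnitude_prune(w, 0.4)
print(1 - np.count_nonzero(w_pruned) / w.size)  # sparsity, close to 0.4
```

Note how the reported figures follow from this scheme if pruning is encoded as a 1-bit mask per weight: at 4 bits, 0.6 × 4 + 1 = 3.4 effective bits per parameter (a 15% reduction), and at 3 bits, 0.6 × 3 + 1 = 2.8 (6.67%).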

Extensive experiments on standard SR benchmarks (DIV2K for training, Set5, Set14, Manga109 for testing) show that CompSRT consistently outperforms the current PTQ SOTA across all scales (×2, ×3, ×4) and bit‑widths. The most striking result is a +1.53 dB PSNR and +0.03 SSIM improvement over CondiQuant on Manga109 at 2‑bit × 4, accompanied by visibly sharper reconstructions and reduced blurriness. All statistical tests confirm significance (p < 0.05) and large effect sizes (Cohen’s d > 0.8).

In summary, CompSRT provides a theoretically grounded and empirically validated method to compress image‑super‑resolution transformers to ultra‑low‑bit representations without sacrificing visual quality. Its two‑pronged approach—distribution‑flattening via Hadamard transforms and flexible scalar decomposition—offers a practical recipe for future hardware‑friendly SR models, and opens avenues for further research into higher pruning ratios, other transformer variants, and real‑time deployment on edge devices.

