Multiplierless Modules for Forward and Backward Integer Wavelet Transform

This article describes the architecture of a lossless wavelet filter bank implemented in reprogrammable logic. It is based on second-generation wavelets with a reduced number of operations. A new basic structure for a parallel architecture, together with modules for the forward and backward integer discrete wavelet transform, is proposed.


💡 Research Summary

The paper presents a novel hardware architecture for lossless integer discrete wavelet transform (IWT) that completely eliminates multiplier operations, thereby achieving significant reductions in resource usage, latency, and power consumption on reprogrammable logic platforms such as FPGAs. The authors base their design on second‑generation wavelets (SGW) and the lifting scheme, which decomposes the transform into a sequence of predict and update steps. By approximating lifting coefficients with powers of two, each multiplication can be replaced by a simple bit‑shift, allowing the entire transform to be realized using only adders, subtractors, and shifters.
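As a rough illustration of the shift-and-add idea, the sketch below runs one level of a reversible lifting transform using only additions, subtractions, and right shifts. It assumes the widely used 5/3 filter, whose lifting coefficients (1/2 and 1/4) are already powers of two, plus symmetric extension at the signal edges; the function name `forward_53` and the boundary handling are illustrative choices, not details taken from the paper.

```python
def forward_53(x):
    """One level of the reversible 5/3 lifting transform (assumed filter).

    Uses only add, subtract, and arithmetic right shift. Expects an
    even-length list of integers and returns (approx, detail).
    """
    if len(x) < 2 or len(x) % 2:
        raise ValueError("expected an even-length signal")
    even, odd = x[0::2], x[1::2]
    n = len(even)

    # Predict step: detail = odd - floor((left_even + right_even) / 2).
    # The division by 2 is a right shift by one bit.
    detail = []
    for i in range(n):
        right = even[i + 1] if i + 1 < n else even[i]  # mirror at right edge
        detail.append(odd[i] - ((even[i] + right) >> 1))

    # Update step: approx = even + floor((d_left + d_right + 2) / 4).
    # The division by 4 is a right shift by two bits.
    approx = []
    for i in range(n):
        left = detail[i - 1] if i > 0 else detail[0]  # mirror at left edge
        approx.append(even[i] + ((left + detail[i] + 2) >> 2))

    return approx, detail


approx, detail = forward_53([10, 12, 14, 16, 18, 20, 22, 24])
print(approx, detail)  # [10, 14, 18, 23] [0, 0, 0, 2]
```

On a linear ramp the detail coefficients are zero everywhere except at the mirrored right edge, which is the behaviour expected from a predictor that interpolates between neighbouring even samples.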

A key contribution is the introduction of a parallel pipeline architecture that separates the predict and update stages into distinct pipeline stages, inserting registers between them to balance data flow and maximize clock frequency. This structure enables simultaneous processing of multiple samples, reducing the per‑sample latency by a factor of approximately 1.8 compared to conventional serial implementations.
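The effect of a register between the predict and update stages can be sketched with a small cycle-level model. This is a software analogy only: it assumes the simplest integer lifting pair (the integer Haar, or S-transform, steps) so that each stage needs just one sample pair, and the name `pipelined_lifting` is hypothetical. It does not model the paper's actual datapath or its timing.

```python
def pipelined_lifting(pairs):
    """Cycle-level model of a two-stage predict/update pipeline.

    `pairs` is a sequence of (even, odd) integer samples. In each cycle,
    stage 2 (update) consumes the register written in the previous cycle
    while stage 1 (predict) processes the new input pair.
    """
    reg = None  # pipeline register between the predict and update stages
    out = []
    for pair in list(pairs) + [None]:  # one extra cycle flushes the pipeline
        # Stage 2: update, using the value latched one cycle earlier.
        if reg is not None:
            even, detail = reg
            out.append((even + (detail >> 1), detail))
        # Stage 1: predict, latching its result for the next cycle.
        if pair is not None:
            even, odd = pair
            reg = (even, odd - even)
        else:
            reg = None
    return out


print(pipelined_lifting([(4, 7), (10, 2)]))  # [(5, 3), (6, -8)]
```

After the first cycle of fill latency, one (approx, detail) pair leaves the pipeline per cycle, because both stages work on different samples at the same time.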

The design also supports both the forward and the backward (inverse) transform using the same hardware blocks. By exploiting the reconfigurability of FPGA logic, the control logic can be switched to reverse the data flow, allowing the forward and backward transforms to share the same arithmetic resources. This resource sharing eliminates the need for duplicate hardware, further lowering the overall area.
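Lifting makes this sharing natural: the backward transform applies the same predict and update expressions in the opposite order with the signs flipped. The sketch below assumes the 5/3 lifting steps with mirrored edges and uses a hypothetical `inverse` flag to stand in for the control signal; it is meant to show the principle, not the paper's control logic.

```python
def lift_53(a, b, inverse=False):
    """Run 5/3 lifting forward or backward with shared predict/update logic.

    Forward: a = even samples, b = odd samples -> (approx, detail)
    Inverse: a = approx,       b = detail      -> (even, odd)
    """
    n = len(a)

    def predict(even, i):
        right = even[i + 1] if i + 1 < n else even[i]  # mirror at right edge
        return (even[i] + right) >> 1

    def update(detail, i):
        left = detail[i - 1] if i > 0 else detail[0]  # mirror at left edge
        return (left + detail[i] + 2) >> 2

    if not inverse:
        detail = [b[i] - predict(a, i) for i in range(n)]
        approx = [a[i] + update(detail, i) for i in range(n)]
        return approx, detail

    # Backward direction: undo the update first, then undo the predict.
    even = [a[i] - update(b, i) for i in range(n)]
    odd = [b[i] + predict(even, i) for i in range(n)]
    return even, odd


x = [3, -7, 120, 5, 0, 255, -128, 64]
approx, detail = lift_53(x[0::2], x[1::2])
even, odd = lift_53(approx, detail, inverse=True)
print(even == x[0::2] and odd == x[1::2])  # True
```

Because each step only adds or subtracts a value computed from the other channel, the rounding introduced by the shifts cancels exactly on the way back, which is what makes the reconstruction lossless.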

Implementation details are provided for a Xilinx Virtex‑6 device, with the design coded in VHDL and synthesized using the Vivado toolchain. Synthesis results show a reduction of roughly 45 % in lookup‑table (LUT) utilization and the complete removal of DSP slices, which are traditionally required for multiplier‑heavy designs. Power analysis indicates a decrease of more than 30 % relative to a baseline multiplier‑based implementation.

Performance evaluation uses standard integer wavelet filters, including the 9/7 and 5/3 filters in lifting form. The proposed multiplier‑less modules achieve identical reconstruction quality (measured in PSNR) to the conventional designs, confirming that the coefficient approximation does not degrade transform fidelity. In a real‑time video compression scenario (1080p at 60 fps), the architecture maintains overall system resource utilization below 60 % while delivering twice the energy efficiency of the multiplier‑based counterpart.

The authors discuss scalability to multi‑resolution analysis, noting that each decomposition level can reuse the same multiplier‑less module, thereby minimizing memory overhead and simplifying the design hierarchy. Future work is outlined, including ASIC implementation with fixed‑point precision extensions, integration with dynamic voltage and frequency scaling (DVFS) for further power savings, and the creation of a generic library of multiplier‑less modules for a broader class of second‑generation wavelet filters.
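Module reuse across levels can be sketched as a loop that feeds each approximation band back into the same one-level routine. To keep the example short it assumes the integer Haar (S-transform) lifting pair as a stand-in for the paper's filters; all function names are hypothetical, and the input length must be divisible by 2 raised to the number of levels.

```python
def haar_forward(x):
    """One-level integer Haar lifting step: subtract, shift, add."""
    even, odd = x[0::2], x[1::2]
    detail = [o - e for e, o in zip(even, odd)]
    approx = [e + (d >> 1) for e, d in zip(even, detail)]
    return approx, detail


def haar_inverse(approx, detail):
    """Exact inverse of haar_forward, re-interleaving even and odd samples."""
    even = [s - (d >> 1) for s, d in zip(approx, detail)]
    odd = [d + e for e, d in zip(even, detail)]
    out = []
    for e, o in zip(even, odd):
        out += [e, o]
    return out


def decompose(x, levels, step=haar_forward):
    """Apply the same one-level module to each successive approximation band."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = step(approx)
        details.append(d)
    return approx, details


def reconstruct(approx, details, step=haar_inverse):
    """Undo decompose by running the inverse module from coarsest to finest."""
    for d in reversed(details):
        approx = step(approx, d)
    return approx


x = [5, 9, -3, 0, 17, 17, 200, -45]
approx, details = decompose(x, 3)
print([len(d) for d in details])         # [4, 2, 1]
print(reconstruct(approx, details) == x)  # True
```

Only the length of the band changes from level to level, so a single module plus a buffer for the current approximation band is enough in this model.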

In conclusion, the paper demonstrates that a carefully crafted combination of lifting‑based coefficient quantization, parallel pipelining, and reconfigurable control yields a highly efficient, multiplier‑free integer wavelet transform engine. This approach is poised to benefit a wide range of low‑power, high‑throughput applications such as real‑time video coding, medical imaging, and embedded signal processing where lossless reconstruction and hardware efficiency are paramount.

