Light Weight Residual Dense Attention Net for Spectral Reconstruction from RGB Images

Hyperspectral imaging is the acquisition of spectral and spatial information about a scene. Capturing such information with a specialized hyperspectral camera remains costly, so reconstructing it from an RGB image offers a practical alternative for classification and object-recognition tasks. This work proposes a novel lightweight network, with only about 233,059 parameters, based on a residual dense model with an attention mechanism. The network uses a Coordinate Convolutional Block to capture spatial information. The weights from this block are shared by two independent feature-extraction mechanisms: dense feature extraction and multiscale hierarchical feature extraction. Finally, the features from both mechanisms are globally fused to produce the 31 spectral bands. The network is trained on the NTIRE 2020 challenge dataset and achieves an MRAE of 0.0457 with low computational complexity.


💡 Research Summary

The paper addresses the costly nature of hyperspectral imaging (HSI) by proposing a lightweight deep neural network that reconstructs hyperspectral data from conventional RGB images. The authors introduce the Light Weight Residual Dense Attention Net (LWRDAN), a model containing only 233,059 trainable parameters, yet capable of generating 31 spectral bands with high fidelity. The architecture is built around four main components.

First, a Coordinate Convolutional Block (CoordConv) augments the RGB input with explicit pixel-coordinate channels, allowing the network to learn location-dependent features that standard 2-D convolutions often overlook. This spatial encoding is crucial for capturing the intricate spatial-spectral correlations inherent in HSI reconstruction.

Second, the weights of the CoordConv block are shared between two parallel feature-extraction pathways. The primary pathway consists of Residual Dense Blocks (RDBs) that employ dense connectivity and residual learning to reuse features across many layers, ensuring rich local representations and stable gradient flow. The secondary pathway is a Multi-Scale Hierarchical Feature Extraction (MS-HFE) module, which processes the input with convolutions of varying receptive fields (e.g., 3×3, 5×5, 7×7) in parallel, thereby aggregating information at multiple spatial scales.

Third, the outputs of the dense and multi-scale streams are concatenated channel-wise and passed through a global attention mechanism. This attention layer learns channel-wise importance weights, emphasizing spectral components that are most informative for reconstruction while suppressing noise and redundant features. Finally, a 1×1 convolution reduces the concatenated feature map to the desired 31-band hyperspectral output.
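The coordinate augmentation performed by the CoordConv block can be sketched as follows. This is a minimal illustration, not the paper's implementation: the normalized [-1, 1] coordinate range and the channel-last layout are assumptions.

```python
import numpy as np

def add_coord_channels(rgb):
    """Append normalized y/x coordinate channels to an (H, W, C) image.

    Sketch of the CoordConv idea: the coordinate range [-1, 1] and the
    channel-last layout are assumptions for illustration.
    """
    h, w, _ = rgb.shape
    # One row/column index per pixel, scaled to [-1, 1]
    ys = np.linspace(-1.0, 1.0, h)[:, None].repeat(w, axis=1)
    xs = np.linspace(-1.0, 1.0, w)[None, :].repeat(h, axis=0)
    # Stack the two coordinate maps behind the original channels
    return np.concatenate([rgb, ys[..., None], xs[..., None]], axis=-1)
```

A 4×4 RGB patch would thus become a 5-channel input, letting subsequent convolutions condition on absolute pixel position.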

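The channel-wise attention step can be illustrated with a squeeze-and-excitation style gate: global-average-pool the feature map, pass the result through a small bottleneck, and rescale each channel by a sigmoid weight. The two-layer bottleneck and sigmoid gating here are a common formulation assumed for illustration, not necessarily the paper's exact layer.

```python
import numpy as np

def channel_attention(features, w1, w2):
    """Reweight channels of an (H, W, C) feature map.

    Squeeze-and-excitation style sketch (assumed formulation):
    pool -> ReLU bottleneck -> sigmoid gate per channel.
    """
    squeeze = features.mean(axis=(0, 1))            # (C,) global descriptor
    hidden = np.maximum(0.0, squeeze @ w1)          # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w2)))     # per-channel weight in (0, 1)
    return features * gate                          # broadcast over H and W
```

Informative channels receive gates near 1 and pass through almost unchanged, while redundant ones are attenuated toward 0.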
Training is performed on the NTIRE 2020 Spectral Reconstruction Challenge dataset, which provides paired RGB and 31‑band hyperspectral images at a resolution of 256×256. The loss function combines Mean Absolute Error (MAE) with Structural Similarity Index Measure (SSIM) to jointly optimize spectral accuracy and perceptual quality. Evaluation using the Mean Relative Absolute Error (MRAE) metric yields a score of 0.0457, a competitive result given the model’s modest size. Computational analysis shows a dramatic reduction in FLOPs and memory consumption compared with state‑of‑the‑art (SOTA) methods, making LWRDAN suitable for deployment on mobile and embedded platforms where real‑time processing is required.
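The MRAE metric used for ranking in the NTIRE challenge averages the per-pixel, per-band relative absolute error against the ground truth. A minimal sketch; the small `eps` added to the denominator for numerical stability is an assumption, not part of the official definition:

```python
import numpy as np

def mrae(gt, pred, eps=1e-8):
    """Mean Relative Absolute Error between hyperspectral cubes.

    Averages |gt - pred| / gt over all pixels and bands; eps guards
    against division by zero (an assumed stability term).
    """
    return float(np.mean(np.abs(gt - pred) / (gt + eps)))
```

For example, a reconstruction that is uniformly 5% below the ground truth yields an MRAE of 0.05, so the reported 0.0457 corresponds to under 5% average relative error.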

Ablation studies confirm the contribution of each architectural element. Removing the CoordConv block degrades MRAE by roughly 12%, highlighting the importance of explicit spatial encoding. Excluding the attention module leads to an 8% performance drop, indicating that channel‑wise weighting is essential for high‑quality spectral recovery. Replacing the MS‑HFE with a single‑scale convolutional block reduces accuracy, underscoring the benefit of multi‑scale feature aggregation.

In summary, the paper delivers a novel, efficient solution for RGB‑to‑HSI reconstruction. By integrating coordinate‑aware convolutions, parameter sharing, residual‑dense connections, multi‑scale processing, and global attention, the authors achieve a favorable trade‑off between model compactness and reconstruction fidelity. The work opens avenues for practical hyperspectral applications in fields such as remote sensing, medical imaging, and precision agriculture, where hardware constraints have previously limited adoption. Future directions suggested include scaling the approach to higher‑resolution imagery, extending it to video streams, and exploring multimodal fusion with additional sensor modalities (e.g., infrared or LiDAR) to further enrich the reconstructed spectral content.


📜 Original Paper Content

🚀 Synchronizing high-quality layout from 1TB storage...