Unsupervised Super-Resolution of Remote Sensing Hyperspectral Images Using Fully Synthetic Training

Notice: This research summary and analysis were generated automatically using AI. For full accuracy, please refer to the original arXiv source.

Hyperspectral single image super-resolution (SISR) aims to enhance spatial resolution while preserving the rich spectral information of hyperspectral images. Most existing methods rely on supervised learning with high-resolution ground truth data, which is often unavailable in practice. To overcome this limitation, we propose an unsupervised learning approach based on synthetic abundance data. The hyperspectral image is first decomposed into endmembers and abundance maps through hyperspectral unmixing. A neural network is then trained to super-resolve these maps using data generated with the dead leaves model, which replicates the statistical properties of real abundances. The final super-resolution hyperspectral image is reconstructed by recombining the super-resolved abundance maps with the endmembers. Experimental results demonstrate the effectiveness of our method and the relevance of synthetic data for training.


💡 Research Summary

The paper addresses the practical challenge of hyperspectral single‑image super‑resolution (SISR) when high‑resolution ground‑truth data are unavailable. The authors propose a fully unsupervised pipeline that leverages synthetic training data generated from a statistical texture model. First, a low‑resolution hyperspectral image (HSI) is decomposed into endmember spectra (S) and abundance maps (A_LR) using a minimum‑volume unmixing algorithm followed by non‑negative least squares. This linear mixing model provides a compact representation of the scene in terms of material signatures and their spatial proportions.
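The abundance-estimation step can be sketched as a per-pixel non-negative least-squares fit against a fixed endmember matrix. This is a minimal illustration, not the paper's full minimum-volume unmixing pipeline: the endmembers `S` here are a hand-made toy example, and the pixel loop is written for clarity rather than speed.

```python
import numpy as np
from scipy.optimize import nnls

def estimate_abundances(hsi, S):
    """Per-pixel NNLS: solve min ||S a - x||_2 with a >= 0.

    hsi : (L, H, W) hyperspectral cube (L bands)
    S   : (L, N) endmember matrix (N materials)
    Returns A : (N, H, W) abundance maps.
    """
    L, H, W = hsi.shape
    N = S.shape[1]
    A = np.zeros((N, H, W))
    for i in range(H):
        for j in range(W):
            A[:, i, j], _ = nnls(S, hsi[:, i, j])
    return A

# Toy example: 3 bands, 2 materials, mixed under the linear mixing model
rng = np.random.default_rng(0)
S = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])                   # (L, N) endmember spectra
A_true = rng.random((2, 4, 4))               # ground-truth abundances
hsi = np.einsum('ln,nij->lij', S, A_true)    # H(l,i,j) = sum_n S(l,n) A(n,i,j)
A_est = estimate_abundances(hsi, S)
```

Because `S` has full column rank and `A_true` is non-negative, NNLS recovers the abundances exactly on this noiseless toy cube.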

To train a super‑resolution network for the abundance maps, the authors synthesize a large set of paired high‑ and low‑resolution abundance maps using the dead‑leaves model. The model creates images by sequentially dropping randomly sized, rotated rectangular “leaves” at Poisson‑distributed locations, thereby reproducing the non‑Gaussian, scale‑invariant statistics observed in real abundance maps. Each leaf keeps the same shape and spatial location across all abundance channels, which preserves inter‑material spatial coherence. High‑resolution synthetic abundances (A_DL,HR) are generated directly, then blurred with a Gaussian point‑spread function (σ = 4) and down‑sampled bicubically to obtain the corresponding low‑resolution counterparts (A_DL,LR). In total, 5 000 such pairs of size 6 × 307 × 307 are produced.
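A simplified version of this synthesis can be sketched in a few lines. The sketch takes liberties with the paper's model: leaves are axis-aligned rectangles (not rotated), positions are drawn uniformly rather than from a Poisson process, the leaf-count and size parameters are illustrative, and `scipy.ndimage.zoom` with cubic splines stands in for bicubic down-sampling.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def dead_leaves_abundances(n_mat=6, size=128, n_leaves=300, seed=None):
    """Stamp random rectangular "leaves"; each leaf shares its shape and
    position across all material channels (preserving inter-material
    coherence) and carries a random abundance vector summing to one."""
    rng = np.random.default_rng(seed)
    A = np.zeros((n_mat, size, size))
    for _ in range(n_leaves):
        h, w = rng.integers(4, size // 4, size=2)          # leaf extent
        i, j = rng.integers(0, size - size // 4, size=2)   # leaf position
        vec = rng.dirichlet(np.ones(n_mat))                # abundance vector
        A[:, i:i + h, j:j + w] = vec[:, None, None]        # newer leaves occlude older
    return A

def degrade(A_hr, scale=4, sigma=4.0):
    """Gaussian PSF blur (sigma = 4, as in the paper), then down-sample."""
    A_blur = np.stack([gaussian_filter(a, sigma) for a in A_hr])
    return np.stack([zoom(a, 1.0 / scale, order=3) for a in A_blur])

A_hr = dead_leaves_abundances(seed=0)   # synthetic high-resolution abundances
A_lr = degrade(A_hr)                    # matching low-resolution counterpart
```

Generating many such `(A_hr, A_lr)` pairs yields the fully synthetic training set; no real high-resolution imagery is involved at any point.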

These synthetic pairs train a Mixed 2D/3D Convolutional Network (MCNet), originally designed for supervised hyperspectral super‑resolution. The network’s 2‑D convolutions capture spatial features, while 3‑D convolutions jointly process spatial and spectral dimensions. The input channel count equals the number of endmembers (N). Training uses an L1 loss between the network output and the synthetic high‑resolution abundances, optimized with Adam (learning rate = 1e‑4) for 200 epochs.
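The optimization recipe (L1 loss, Adam at learning rate 1e-4, 200 epochs) can be illustrated generically. The snippet below deliberately replaces MCNet with a single linear map and hand-rolls the Adam update on a toy regression problem; it shows the objective and optimizer, not the actual 2D/3D convolutional architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data and "network": a single linear map W on flattened features.
# This is NOT the MCNet architecture; it only illustrates the L1/Adam recipe.
x = rng.random((64, 10))           # toy low-resolution abundance features
y = rng.random((64, 10))           # toy high-resolution targets
W = np.zeros((10, 10))             # stand-in network parameters

# Adam with the paper's learning rate; other hyperparameters are the defaults
lr, b1, b2, eps = 1e-4, 0.9, 0.999, 1e-8
m, v = np.zeros_like(W), np.zeros_like(W)

def l1_loss(W):
    return np.abs(x @ W - y).mean()

loss0 = l1_loss(W)
for t in range(1, 201):                    # "200 epochs" on the toy problem
    r = x @ W - y
    g = x.T @ np.sign(r) / r.size          # subgradient of the mean L1 loss
    m = b1 * m + (1 - b1) * g              # first-moment estimate
    v = b2 * v + (1 - b2) * g * g          # second-moment estimate
    W = W - lr * (m / (1 - b1 ** t)) / (np.sqrt(v / (1 - b2 ** t)) + eps)
loss_final = l1_loss(W)
```

In practice one would use an autodiff framework and the real network; the point here is only that the loss is a plain L1 distance between predicted and synthetic high-resolution abundances.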

During inference, the real low‑resolution abundance maps A_LR are fed to the trained MCNet, producing estimated high‑resolution abundances A_SR. The final high‑resolution HSI is reconstructed by recombining A_SR with the previously estimated endmembers: H_SR(l,i,j) = ∑_n S(l,n)·A_SR(n,i,j).
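The reconstruction formula H_SR(l,i,j) = ∑_n S(l,n)·A_SR(n,i,j) is a single tensor contraction; with toy shapes (the 162 bands match the dataset, the 6 endmembers and the spatial size are illustrative) it reads:

```python
import numpy as np

rng = np.random.default_rng(0)
L, N, H, W = 162, 6, 8, 8          # bands, endmembers, toy spatial size
S = rng.random((L, N))             # endmember spectra from the unmixing step
A_sr = rng.random((N, H, W))       # super-resolved abundance maps

# H_SR(l, i, j) = sum_n S(l, n) * A_SR(n, i, j)
H_sr = np.einsum('ln,nij->lij', S, A_sr)
```

Equivalently, each pixel's spectrum is the matrix-vector product of `S` with that pixel's abundance vector.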

Experiments are conducted on the HYDICE Urban dataset (307 × 307 pixels, 162 usable bands). A 4× super‑resolution scenario is simulated with a Gaussian PSF (σ = 4) followed by bicubic down‑sampling. Evaluation metrics include Peak Signal‑to‑Noise Ratio (PSNR), Spectral Angle Mapper (SAM), and ERGAS. The proposed method (named MCNet‑DL) is compared against bicubic interpolation and three state‑of‑the‑art supervised SISR methods: MCNet, SSPSR, and HSISR. The supervised baselines are trained on artificially created patches (16 × 76 × 76), a favorable setting that is unrealistic in practice. MCNet‑DL achieves the best average scores (PSNR = 26.69 dB, SAM = 14.53°, ERGAS = 7.60), surpassing the supervised baselines. Visual inspection shows that MCNet‑DL restores finer details and sharper material boundaries, especially in urban scenes where material heterogeneity is high.
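The three metrics have standard definitions, sketched below. Implementation details (averaging order, handling of near-zero pixels, the exact ERGAS ratio convention) vary across papers, so these functions may differ slightly from the authors' evaluation code; the `scale=4` in ERGAS matches the 4× scenario.

```python
import numpy as np

def psnr(ref, est, peak=1.0):
    """Peak signal-to-noise ratio in dB over the whole cube."""
    mse = np.mean((ref - est) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

def sam(ref, est, eps=1e-12):
    """Mean spectral angle in degrees; ref, est are (L, H, W) cubes and
    the angle is taken between per-pixel spectra along the band axis."""
    dot = np.sum(ref * est, axis=0)
    denom = np.linalg.norm(ref, axis=0) * np.linalg.norm(est, axis=0) + eps
    ang = np.arccos(np.clip(dot / denom, -1.0, 1.0))
    return np.degrees(ang.mean())

def ergas(ref, est, scale=4):
    """Relative dimensionless global error in synthesis (lower is better):
    100/scale * sqrt(mean over bands of (RMSE_l / mean_l)^2)."""
    rmse = np.sqrt(np.mean((ref - est) ** 2, axis=(1, 2)))
    mean = np.mean(ref, axis=(1, 2))
    return 100.0 / scale * np.sqrt(np.mean((rmse / mean) ** 2))

# Sanity check on a toy cube with mild additive noise
rng = np.random.default_rng(0)
ref = rng.random((162, 16, 16))
est = ref + 0.01 * rng.standard_normal(ref.shape)
```

Note that SAM is invariant to per-pixel scaling of the spectrum, which is why it complements PSNR: a spectrally faithful but dim reconstruction scores well on SAM and poorly on PSNR.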

The key contributions are: (1) a fully synthetic training regime that eliminates the need for high‑resolution hyperspectral ground truth; (2) adaptation of the dead‑leaves model to generate realistic abundance maps preserving inter‑material statistics; (3) integration of hyperspectral unmixing with abundance‑map super‑resolution into a complete unsupervised workflow. Limitations include sensitivity of the dead‑leaves parameters to scene characteristics and the propagation of errors from the unmixing step. Future work is suggested on adaptive parameter tuning, Bayesian unmixing to model uncertainty, and multi‑scale extensions of the texture model to further improve robustness and generalization.

