Cross-Domain Few-Shot Learning for Hyperspectral Image Classification Based on Mixup Foundation Model


Although cross-domain few-shot learning (CDFSL) for hyperspectral image (HSI) classification has attracted significant research interest, existing works often rely on an unrealistic data augmentation procedure, injecting external noise to enlarge the sample size, and thus greatly oversimplify the data-scarcity problem. They also involve a large number of parameters in model updates, making them prone to overfitting. To the best of our knowledge, none has explored the strength of a foundation model, whose strong generalization power allows it to be quickly adapted to downstream tasks. This paper proposes the MIxup FOundation MOdel (MIFOMO) for CDFSL in HSI classification. MIFOMO is built upon a remote sensing (RS) foundation model pre-trained across a large scale of RS problems, thus featuring generalizable representations. The notion of coalescent projection (CP) is introduced to quickly adapt the foundation model to downstream tasks while freezing the backbone network. The concept of mixup domain adaptation (MDM) is proposed to address the extreme domain-discrepancy problem. Last but not least, label smoothing is implemented to cope with noisy pseudo-labels. Our rigorous experiments demonstrate the advantage of MIFOMO, which beats prior arts by up to a 14% margin. The source code of MIFOMO is open-sourced at https://github.com/Naeem-Paeedeh/MIFOMO for reproducibility and convenient further study.


💡 Research Summary

This paper introduces MIFOMO (Mixup Foundation Model), a novel framework for cross‑domain few‑shot learning (CDFSL) in hyperspectral image (HSI) classification. The authors identify three major shortcomings of existing CDFSL approaches: (i) reliance on unrealistic data augmentation (typically Gaussian noise) that does not respect the spectral characteristics of HSIs, (ii) a large number of trainable parameters that cause severe over‑fitting when only a handful of labeled samples are available, and (iii) the absence of a hyperspectral‑specific foundation model, which limits the ability to leverage strong pre‑trained representations.

To address these issues, MIFOMO builds upon HyperSIGMA, a foundation model pre‑trained on the massive HyperGlobal‑450K dataset (450 K hyperspectral cubes). HyperSIGMA captures both spatial and spectral structures, providing a robust feature extractor for downstream tasks. The backbone of HyperSIGMA is kept frozen throughout training, and only a lightweight adaptation layer—Coalescent Projection (CP)—is learned. CP consists of a single trainable matrix that connects the key and query projections in the attention mechanism, dramatically reducing the number of trainable parameters compared with methods such as LoRA or full fine‑tuning. This parameter‑efficient fine‑tuning (PEFT) preserves the generalization power of the foundation model while mitigating over‑fitting.
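The CP idea described above can be sketched in a few lines of numpy. This is a minimal, hypothetical illustration: the pretrained query/key/value projections stay frozen, and a single shared matrix `P` (initialized to the identity so adaptation starts from the unchanged pretrained model) is the only trainable parameter inserted into the attention computation. The exact placement of the coalescent matrix in the paper's architecture may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n, d = 4, 8  # tokens, embedding dim (toy sizes)
X = rng.normal(size=(n, d))

# Frozen pretrained projections: the backbone weights are never updated.
W_q = rng.normal(size=(d, d))
W_k = rng.normal(size=(d, d))
W_v = rng.normal(size=(d, d))

# Coalescent projection: the ONLY trainable matrix, shared between the
# query and key paths (hypothetical placement, for illustration only).
P = np.eye(d)

Q = X @ W_q @ P
K = X @ W_k @ P
A = softmax(Q @ K.T / np.sqrt(d))  # attention weights, rows sum to 1
out = A @ (X @ W_v)                # attended token representations
```

With `P` shared across the query and key paths, the trainable parameter count is a single `d × d` matrix per attention block, far smaller than fine-tuning the full projections.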

Domain shift is tackled through Mixup Domain Adaptation (MDM). Instead of directly aligning the source and target domains—often impossible when their label spaces are disjoint—the authors generate an intermediate domain by linearly mixing source and target samples using a λ drawn from a Beta distribution. This intermediate domain serves as a bridge, allowing the episodic meta‑learning process to gradually transfer knowledge in two stages (source → intermediate → target). The mixup operation is applied either at the image level or in the embedding space, and it acts as a consistency‑based regularizer that encourages the network to produce similar predictions for interpolated samples.
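The intermediate-domain construction is standard mixup applied across domains, which can be sketched as follows. The function name and the Beta parameter `alpha=1.0` are illustrative choices, not the paper's exact settings; the constant source/target arrays merely stand in for HSI patches.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup_bridge(x_src, x_tgt, alpha=1.0):
    """Create one intermediate-domain sample by linear interpolation.

    lam ~ Beta(alpha, alpha); alpha=1.0 (uniform mixing) is a hypothetical
    default, not necessarily the value used in the paper.
    """
    lam = rng.beta(alpha, alpha)
    return lam * x_src + (1.0 - lam) * x_tgt, lam

x_src = np.ones((3, 3))   # stand-in for a source-domain patch
x_tgt = np.zeros((3, 3))  # stand-in for a target-domain patch
x_mix, lam = mixup_bridge(x_src, x_tgt)
```

The same interpolation can be applied to embeddings instead of raw patches, which is the other variant the summary mentions.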

Because the target domain’s query set is unlabeled, pseudo‑labels are assigned during meta‑training. To prevent label leakage and mitigate the inevitable noise in these pseudo‑labels, the authors incorporate label smoothing, which softens the target distribution and stabilizes training.
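Label smoothing as used here is the standard formulation: each one-hot pseudo-label is mixed with the uniform distribution so the model never fully trusts a potentially noisy assignment. A minimal sketch, with `eps=0.1` as a common default rather than the paper's reported setting:

```python
import numpy as np

def smooth_labels(y, num_classes, eps=0.1):
    """Soften one-hot targets: correct class gets 1-eps + eps/K,
    every other class gets eps/K. eps=0.1 is a hypothetical default."""
    one_hot = np.eye(num_classes)[y]
    return one_hot * (1.0 - eps) + eps / num_classes

y = np.array([0, 2])               # toy pseudo-labels
t = smooth_labels(y, num_classes=5)
# Each row still sums to 1; the labeled class keeps most of the mass.
```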

Extensive experiments were conducted on several benchmark HSI datasets (e.g., Indian Pines, Pavia University, Salinas) under standard 5‑way 1‑shot and 5‑way 5‑shot protocols. MIFOMO consistently outperformed state‑of‑the‑art discrepancy‑based, adversarial‑based, and contrastive‑based CDFSL methods, achieving up to a 14% absolute gain in classification accuracy. Moreover, the number of trainable parameters was reduced by more than 70% relative to competing approaches, and training time remained comparable.

In summary, MIFOMO demonstrates that (1) a hyperspectral‑specific foundation model can provide powerful, transferable representations; (2) a single, trainable CP layer enables highly parameter‑efficient adaptation; and (3) mixup‑based intermediate domain creation, combined with label smoothing, effectively bridges extreme domain gaps and handles noisy pseudo‑labels. The authors release their code and pretrained models, facilitating reproducibility and future research on foundation‑model‑driven CDFSL for remote sensing.

