Cross-domain Hyperspectral Image Classification based on Bi-directional Domain Adaptation

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Utilizing hyperspectral remote sensing technology enables the extraction of fine-grained land cover classes. Typically, satellite or airborne images used for training and testing are acquired from different regions or at different times, so the same class exhibits significant spectral shifts across scenes. In this paper, we propose a Bi-directional Domain Adaptation (BiDA) framework for cross-domain hyperspectral image (HSI) classification, which focuses on extracting both domain-invariant features and domain-specific information in independent adaptive spaces, thereby enhancing adaptability and separability to the target scene. In the proposed BiDA, a triple-branch transformer architecture (source branch, target branch, and coupled branch) with a semantic tokenizer is designed as the backbone. Specifically, the source and target branches independently learn the adaptive spaces of the source and target domains, while a Coupled Multi-head Cross-attention (CMCA) mechanism is developed in the coupled branch for feature interaction and inter-domain correlation mining. Furthermore, a bi-directional distillation loss is designed to guide adaptive space learning using inter-domain correlation. Finally, we propose an Adaptive Reinforcement Strategy (ARS) to encourage the model to focus on generalized feature extraction within both source and target scenes under noisy conditions. Experimental results on cross-temporal/scene airborne and satellite datasets demonstrate that the proposed BiDA performs significantly better than several state-of-the-art domain adaptation approaches. In the cross-temporal tree species classification task, the proposed BiDA achieves accuracy 3% to 5% higher than the most advanced method. The code will be available at: https://github.com/YuxiangZhang-BIT/IEEE_TCSVT_BiDA.


💡 Research Summary

The paper tackles the persistent problem of spectral shift in cross‑scene or cross‑temporal hyperspectral image (HSI) classification, where training (source) and testing (target) data are collected under different illumination, sensor, or temporal conditions. Conventional domain adaptation (DA) approaches typically employ a single shared feature extractor and align source and target distributions in a unidirectional manner, which often fails when the spectral discrepancy is large and when fine‑grained classes exhibit high inter‑class similarity.
To overcome these limitations, the authors propose a Bi‑directional Domain Adaptation (BiDA) framework that learns independent adaptive spaces for the source and target domains while simultaneously mining inter‑domain correlations. The core of BiDA is a triple‑branch transformer architecture: a Source Branch, a Target Branch, and a Coupled Branch.

  1. Semantic Tokenizer – Instead of the patch‑splitting used in Vision Transformers, the tokenizer learns a spatial‑spectral projection that produces a small set (L = 4) of semantic tokens for each domain. This projection is obtained via a Conv3D‑ReLU‑MaxPool2D block followed by a Conv2D‑ReLU‑MaxPool2D block, a soft‑max attention map, and a 1×1 convolution that yields compact token matrices (T_s) and (T_t). A learnable classification token (T_{cls}) is also introduced for downstream classification and pseudo‑label generation.
  2. Source / Target Branches – Each branch consists of N layers of Multi‑head Self‑attention (MSA) and Feed‑Forward Networks (FFN). They operate on their respective token sets, adding positional embeddings, and produce domain‑specific adaptive representations (\tilde{T}_s) and (\tilde{T}_t). These branches capture intra‑domain spatial‑spectral relationships without interference from the opposite domain.
  3. Coupled Branch with CMCA – The Coupled Multi‑head Cross‑attention (CMCA) module takes the source tokens as queries and target tokens as keys/values, and vice‑versa, in a truly bidirectional fashion. This cross‑attention explicitly models inter‑domain correlations, generating coupled representations (\tilde{T}_{t\rightarrow s}) and (\tilde{T}_{s\rightarrow t}). By allowing information flow in both directions, the model can compensate for spectral shifts that would otherwise obscure class boundaries.
  4. Bi‑directional Distillation Loss – The coupled branch produces soft probability distributions for both domains. These distributions are used as soft labels to guide the source and target branches via KL‑divergence. This bidirectional distillation enforces that each branch not only learns its own adaptive space but also aligns its predictions with the knowledge distilled from the opposite domain, which is crucial when target labels are unavailable.
  5. Adaptive Reinforcement Strategy (ARS) – Inspired by teacher‑student paradigms, ARS injects two different noise perturbations ((o_1, o_2)) into the inputs and enforces intra‑domain consistency between the noisy views. A Maximum Mean Discrepancy (MMD) term further reduces the distribution gap between source and target token sets. This combination improves robustness to sensor noise and encourages the extraction of generalized intra‑domain features.
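The bidirectional cross-attention at the heart of CMCA (item 3 above) can be illustrated with a minimal single-head, scaled dot-product sketch in NumPy: each domain's semantic tokens query the other domain's tokens as keys/values. This is an illustrative sketch under assumed shapes (L = 4 tokens, 64-dim embeddings), not the authors' multi-head implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    # single-head scaled dot-product attention: queries come from one
    # domain, keys/values from the other
    d = queries.shape[-1]
    attn = softmax(queries @ keys_values.T / np.sqrt(d))
    return attn @ keys_values

rng = np.random.default_rng(0)
T_s = rng.standard_normal((4, 64))  # L = 4 source semantic tokens
T_t = rng.standard_normal((4, 64))  # L = 4 target semantic tokens

T_t_to_s = cross_attention(T_s, T_t)  # source queries attend to target
T_s_to_t = cross_attention(T_t, T_s)  # target queries attend to source
```

In practice each direction would be a multi-head attention layer with learned projections; the key point is the symmetric query/key swap that makes the coupling bidirectional.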
    Experimental Validation – The authors evaluate BiDA on four publicly available HSI datasets covering airborne and satellite platforms, with both cross‑temporal (same area, different acquisition dates) and cross‑scene (different geographic regions) scenarios. Compared against state‑of‑the‑art DA methods such as SSW‑ADA, CACL, PASDA, CPDIC, and MLUDA, BiDA consistently achieves higher overall accuracy (average gain ≈ 4.2 %) and notably superior F1 scores in fine‑grained tasks like tree‑species classification (improvements up to 5 %). Ablation studies reveal that removing CMCA, the bidirectional distillation, or ARS each leads to a substantial performance drop (3.5 %, 2.8 %, and 1.9 % respectively), confirming the complementary role of each component.
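To make the bidirectional distillation objective (item 4 above) concrete, the following NumPy sketch treats the coupled branch's softened predictions as teachers for both the source and target branches via KL divergence. The logits and class count are hypothetical placeholders, not values from the paper.

```python
import numpy as np

def softmax(z, tau=1.0):
    # temperature-scaled softmax, numerically stable
    e = np.exp((z - z.max(axis=-1, keepdims=True)) / tau)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-12):
    # mean KL divergence D(p || q) over a batch
    return float(np.sum(p * np.log((p + eps) / (q + eps)), axis=-1).mean())

# hypothetical 3-class logits: the coupled branch acts as a soft teacher
# for each domain branch
logits_coupled = np.array([[2.0, 0.5, -1.0]])
logits_source = np.array([[1.5, 0.2, -0.8]])
logits_target = np.array([[0.3, 1.8, -0.5]])

loss_src = kl(softmax(logits_coupled), softmax(logits_source))
loss_tgt = kl(softmax(logits_coupled), softmax(logits_target))
bi_distill = loss_src + loss_tgt  # distillation flows to both branches
```

Because the target domain has no ground-truth labels, this soft supervision from the coupled branch is what aligns the target branch's adaptive space with the mined inter-domain correlations.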
    Contributions
  • Introduction of a triple‑branch transformer that learns source‑ and target‑specific adaptive spaces independently.
  • Design of CMCA for true bidirectional inter‑domain attention, enabling effective cross‑domain correlation mining.
  • Development of a bidirectional distillation loss that uses coupled‑branch predictions as soft supervision for both domains.
  • Proposal of ARS, a noise‑based intra‑domain consistency regularizer combined with MMD, to reinforce generalized feature extraction under noisy conditions.
    Limitations & Future Work – The current implementation fixes the token count (L = 4) and patch size (13 × 13), which may limit scalability to very high‑resolution HSIs. Moreover, the quality of the soft labels depends on the reliability of the coupled branch; integrating uncertainty estimation or confidence‑aware weighting could further stabilize training. Future directions include dynamic token allocation, multi‑scale attention mechanisms, and coupling with self‑supervised objectives to handle completely unsupervised target domains.
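The ARS regularizer (item 5 above) combines two ingredients: consistency between two noise-perturbed views of the same input, and an MMD term that shrinks the source-target distribution gap. A minimal sketch, using a linear-kernel MMD estimate and Gaussian perturbations as stand-ins for the paper's noise model:

```python
import numpy as np

def mmd_linear(x, y):
    # linear-kernel MMD estimate: squared distance between feature means
    return float(((x.mean(axis=0) - y.mean(axis=0)) ** 2).sum())

rng = np.random.default_rng(1)
feats = rng.standard_normal((16, 32))  # clean token features (stand-in)
view1 = feats + 0.1 * rng.standard_normal(feats.shape)  # perturbation o_1
view2 = feats + 0.1 * rng.standard_normal(feats.shape)  # perturbation o_2

# intra-domain consistency between the two noisy views (mean squared error)
consistency = float(((view1 - view2) ** 2).mean())

# cross-domain gap between (stand-in) source and target token sets
tgt_feats = rng.standard_normal((16, 32)) + 0.5
gap = mmd_linear(feats, tgt_feats)
```

Minimizing the consistency term pushes the model toward noise-robust intra-domain features, while minimizing the MMD term pulls the two domains' token statistics together; richer kernels (e.g. Gaussian) are common in practice.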
