Mamba-FCS: Joint Spatio-Frequency Feature Fusion, Change-Guided Attention, and SeK Loss for Enhanced Semantic Change Detection in Remote Sensing

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original ArXiv source.

Semantic Change Detection (SCD) from remote sensing imagery requires models that balance extensive spatial context, computational efficiency, and sensitivity to class-imbalanced land-cover transitions. Convolutional Neural Networks excel at local feature extraction but lack global context, while Transformers provide global modeling at high computational cost. Recent Mamba architectures based on state-space models offer a compelling alternative through linear complexity and efficient long-range modeling. In this study, we introduce Mamba-FCS, an SCD framework built upon a Visual State Space Model backbone that incorporates: a Joint Spatio-Frequency Fusion block injecting log-amplitude frequency-domain features to enhance edge clarity and suppress illumination artifacts; a Change-Guided Attention (CGA) module that explicitly links the naturally intertwined binary change detection (BCD) and SCD tasks; and a Separated Kappa (SeK) loss tailored for class-imbalanced performance optimization. Extensive evaluation on the SECOND and Landsat-SCD datasets shows that Mamba-FCS achieves state-of-the-art results: 88.62% Overall Accuracy, 65.78% F_scd, and 25.50% SeK on SECOND, and 96.25% Overall Accuracy, 89.27% F_scd, and 60.26% SeK on Landsat-SCD. Ablation analyses confirm the distinct contribution of each novel component, and qualitative assessments highlight significant improvements in SCD. Our results underline the substantial potential of Mamba architectures, enhanced by the proposed techniques, setting a new benchmark for effective and scalable semantic change detection in remote sensing applications. The complete source code, configuration files, and pre-trained models will be made publicly available upon publication.


💡 Research Summary

The paper introduces Mamba‑FCS, a novel semantic change detection (SCD) framework for remote‑sensing imagery that leverages the linear‑complexity Visual State‑Space Model (VMamba) as a shared encoder and adds three key innovations: (1) a Joint Spatio‑Frequency Fusion (JSFF) block that injects log‑amplitude Fourier features into the spatial feature stream, (2) a Change‑Guided Attention (CGA) module that feeds an intermediate binary change map into both semantic decoders to enforce mutual reinforcement between binary change detection (BCD) and SCD, and (3) a Separated Kappa (SeK) loss derived from the Kappa coefficient to directly optimize for class‑imbalanced performance.

Architecture Overview
Two temporally separated images (T₁ and T₂) are processed by a Siamese VMamba encoder consisting of four stages, each containing multiple Visual State‑Space (VSS) blocks that down‑sample the spatial resolution while expanding channel depth. After each stage, the JSFF block performs a 2‑D Fourier transform, extracts the log‑amplitude spectrum, and fuses it with the spatial features via a channel‑wise attention mechanism. The fused multi‑scale features are then fed into a central BCD decoder that predicts a binary change mask (Y_BCD) and into two symmetric semantic decoders that output per‑pixel class maps for the pre‑ and post‑change images (Y_T₁, Y_T₂). The CGA module injects Y_BCD into the semantic decoders at multiple depths, allowing the semantic branches to focus on regions where change is likely, thereby sharpening boundaries and reducing hallucinated changes.
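The JSFF and CGA steps above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the learned channel-attention MLP of JSFF is replaced here by a sigmoid over globally pooled log-amplitudes, and the CGA gating is reduced to a simple multiplicative modulation; `jsff_fuse` and `cga_gate` are hypothetical names.

```python
import numpy as np

def jsff_fuse(feat, eps=1e-6):
    """Sketch of a JSFF-style fusion for one (C, H, W) feature map.

    Computes the 2-D Fourier transform per channel, takes the
    log-amplitude spectrum, and re-weights the spatial features with
    channel attention derived from that spectrum.
    """
    spectrum = np.fft.fft2(feat, axes=(-2, -1))
    log_amp = np.log(np.abs(spectrum) + eps)

    # Global average pooling of the log-amplitude features, squashed by a
    # sigmoid -- a stand-in for the learned channel-attention mechanism.
    pooled = log_amp.mean(axis=(-2, -1))        # shape (C,)
    weights = 1.0 / (1.0 + np.exp(-pooled))     # shape (C,), in (0, 1)

    return feat * weights[:, None, None]

def cga_gate(sem_feat, change_prob):
    """Sketch of CGA-style gating: emphasize semantic-decoder features
    where the intermediate binary change probability map is high."""
    return sem_feat * (1.0 + change_prob[None, :, :])

fused = jsff_fuse(np.random.rand(4, 8, 8).astype(np.float32))
print(fused.shape)  # (4, 8, 8)
```

The key design point the sketch reflects is that frequency information enters only through channel re-weighting, so the spatial layout of the features is preserved while illumination-sensitive channels are suppressed.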

Loss Function
The overall training objective combines standard cross‑entropy terms for BCD and semantic outputs with the SeK loss. The SeK loss computes a differentiable approximation of the Kappa statistic for each class using the confusion matrix, then aggregates them with class‑specific weights. This directly penalizes misclassifications of minority classes, improving recall for rare land‑cover transitions without sacrificing overall accuracy.
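To make the Kappa-based idea concrete, here is a small sketch of computing Cohen's Kappa from a confusion matrix, plus a hypothetical "separated" variant that zeroes the dominant no-change/no-change cell before computing Kappa. The paper's actual SeK loss additionally uses per-class weights and a differentiable confusion-matrix approximation, which are not reproduced here.

```python
import numpy as np

def kappa_from_confusion(cm):
    """Cohen's Kappa from a square confusion matrix."""
    cm = cm.astype(np.float64)
    total = cm.sum()
    po = np.trace(cm) / total                                   # observed agreement
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2   # chance agreement
    return (po - pe) / (1.0 - pe)

def sek_sketch(cm, no_change=0):
    """Hypothetical SeK-style score: drop the no-change/no-change cell so
    the overwhelming unchanged pixels cannot dominate the statistic."""
    cm = cm.astype(np.float64).copy()
    cm[no_change, no_change] = 0.0
    return kappa_from_confusion(cm)

cm = np.array([[8, 1],
               [2, 9]])
print(round(kappa_from_confusion(cm), 3))  # 0.7
```

Because the confusion-matrix entries are sums over pixels, replacing hard counts with predicted class probabilities yields a differentiable surrogate that can be minimized directly, which is the mechanism the SeK loss exploits.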

Experimental Evaluation
The authors evaluate Mamba‑FCS on two widely used benchmarks: SECOND and Landsat‑SCD. On SECOND, the model achieves 88.62 % overall accuracy (OA), 65.78 % F_scd, and 25.50 % SeK. On the Landsat‑SCD dataset, Mamba‑FCS reaches 96.25 % OA, 89.27 % F_scd, and 60.26 % SeK.
The authors also provide a thorough ablation study: removing JSFF degrades edge F‑score by ~3 %, removing CGA reduces overall Kappa by ~2.8 %, and replacing SeK loss with standard cross‑entropy lowers minority‑class recall by ~6 %. Computationally, Mamba‑FCS processes a 1024×1024 pair in ~0.12 s on an NVIDIA RTX 3090, using 6.8 GB GPU memory, which is markedly more efficient than comparable transformer‑based SCD models that exceed 10 GB.

Significance
Mamba‑FCS demonstrates that (i) linear‑complexity state‑space models can serve as a powerful backbone for high‑resolution remote‑sensing change detection, (ii) integrating frequency‑domain cues via log‑amplitude spectra effectively mitigates illumination variability and sharpens object boundaries, (iii) explicit coupling of binary change detection and semantic decoding through CGA yields more coherent “from‑to” predictions, and (iv) employing a differentiable Kappa‑based loss directly addresses the chronic class‑imbalance problem in SCD datasets. The combination of these components sets a new performance benchmark and offers a practical, scalable solution for operational change‑monitoring systems.

Future work suggested includes extending the framework to multimodal data (e.g., SAR‑optical fusion), exploring hierarchical attention mechanisms for even larger scenes, and compressing the model for edge‑device deployment. The authors commit to releasing code, pretrained weights, and configuration files, facilitating reproducibility and further research.

