ViLaCD-R1: A Vision-Language Framework for Semantic Change Detection in Remote Sensing

February 09, 2026


...

📝 Original Info

Title: ViLaCD-R1: A Vision-Language Framework for Semantic Change Detection in Remote Sensing
ArXiv ID: 2512.23244
Date: 2025-12-29
Authors: Xingwei Ma, Shiyang Feng, Bo Zhang, Bin Wang

📝 Abstract

Remote sensing change detection (RSCD), a complex multi-image inference task, traditionally uses pixel-based operators or encoder-decoder networks that inadequately capture high-level semantics and are vulnerable to non-semantic perturbations. Although recent multimodal and vision-language model (VLM)-based approaches enhance semantic understanding of change regions by incorporating textual descriptions, they still suffer from challenges such as inaccurate spatial localization, imprecise pixel-level boundary delineation, and limited interpretability. To address these issues, we propose ViLaCD-R1, a two-stage framework comprising a Multi-Image Reasoner (MIR) and a Mask-Guided Decoder (MGD). Specifically, the VLM is trained through supervised fine-tuning (SFT) and reinforcement learning (RL) on block-level dual-temporal inference tasks, taking dual-temporal image patches as input and outputting a coarse change mask. Then, the decoder integrates dual-temporal image features with this coarse mask to predict a precise binary change map. Comprehensive evaluations on multiple RSCD benchmarks demonstrate that ViLaCD-R1 substantially improves true semantic change recognition and localization, robustly suppresses non-semantic variations, and achieves state-of-the-art accuracy in complex real-world scenarios.
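
The abstract gives no implementation details, so the following is only a rough sketch of the block-level idea it describes: two co-registered images are cut into patches, each patch pair receives a change decision, and the decisions are painted back onto the image grid as a coarse mask. Everything named here is an assumption made for illustration. In particular, `placeholder_reasoner` is a simple mean-difference threshold standing in for the trained VLM (it is not the paper's Multi-Image Reasoner), and the block size and threshold are arbitrary. The Mask-Guided Decoder stage is not sketched.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def split_into_blocks(img: Image, block: int) -> List[List[Image]]:
    """Cut an H x W image into a grid of block x block patches."""
    h, w = len(img), len(img[0])
    if h % block or w % block:
        raise ValueError("image size must be divisible by the block size")
    return [
        [[row[c:c + block] for row in img[r:r + block]] for c in range(0, w, block)]
        for r in range(0, h, block)
    ]


def placeholder_reasoner(patch_t1: Image, patch_t2: Image) -> bool:
    """Hypothetical stand-in for the VLM: flag a block whose mean absolute
    pixel difference exceeds an arbitrary threshold of 0.5."""
    n = len(patch_t1) * len(patch_t1[0])
    diff = sum(
        abs(a - b)
        for row1, row2 in zip(patch_t1, patch_t2)
        for a, b in zip(row1, row2)
    )
    return diff / n > 0.5


def coarse_change_mask(
    t1: Image,
    t2: Image,
    block: int,
    reasoner: Callable[[Image, Image], bool] = placeholder_reasoner,
) -> List[List[int]]:
    """Build a pixel-sized mask in which every flagged block is filled with 1s."""
    blocks1 = split_into_blocks(t1, block)
    blocks2 = split_into_blocks(t2, block)
    h, w = len(t1), len(t1[0])
    mask = [[0] * w for _ in range(h)]
    for bi, (row1, row2) in enumerate(zip(blocks1, blocks2)):
        for bj, (p1, p2) in enumerate(zip(row1, row2)):
            if reasoner(p1, p2):
                # Paint the whole block; boundaries are deliberately coarse.
                for r in range(bi * block, (bi + 1) * block):
                    for c in range(bj * block, (bj + 1) * block):
                        mask[r][c] = 1
    return mask


if __name__ == "__main__":
    before = [[0.0] * 4 for _ in range(4)]
    after = [[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0],
             [0.0] * 4, [0.0] * 4]
    for row in coarse_change_mask(before, after, block=2):
        print(row)
```

A mask built this way can only be as fine as the block grid, which is consistent with the abstract's description of a second stage that combines the coarse mask with dual-temporal image features to produce the precise binary change map.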

📄 Full Content

...(The full text is too long and has been omitted here. Please see the site for the complete article.)
