Accurately and swiftly assessing damage from conflicts is crucial for humanitarian aid and regional stability. In conflict zones, damaged areas often share similar architectural styles, and the damage typically covers small areas with blurred boundaries. These characteristics lead to limited data, annotation difficulties, and significant recognition challenges, including high inter-class similarity and ambiguous semantic changes. To address these issues, we adopt a pre-trained DINOv3 model and propose a multi-scale cross-attention difference siamese network (MC-DiSNet). The powerful visual representation capability of the DINOv3 backbone enables robust and rich feature extraction from bi-temporal remote sensing images. The multi-scale cross-attention mechanism allows for precise localization of subtle semantic changes, while the difference siamese structure enhances inter-class feature discrimination, enabling fine-grained semantic change detection. Furthermore, a simple yet powerful lightweight decoder is designed to generate clear detection maps while maintaining high efficiency. We also release a new Gaza-Change dataset containing high-resolution satellite image pairs from 2023-2024 with pixel-level semantic change annotations. It is worth emphasizing that our annotations only include semantic pixels of changed areas. We evaluate our method on the Gaza-Change dataset and two classical datasets, SECOND and Landsat-SCD. Experimental results demonstrate that our proposed approach effectively addresses the multi-class change detection (MCD) task, and its outstanding performance paves the way for practical applications in rapid damage assessment across conflict zones.
Accurate and timely assessment of damage zones in conflict areas is a critical task with profound implications for humanitarian assistance, disaster relief, and post-conflict reconstruction Qing, Ming, Wen, Weng, Xu, Chen, Zhang and Zeng (2022); Holail, Saleh, Xiao, Zahran, Xia and Li (2025). As with building damage assessment in natural disasters Han, Yang, Lu, Huang and Liu (2025), remote sensing images, particularly high-resolution satellite data, have become an indispensable tool for large-scale monitoring of these changes. However, whereas previous assessments of building damage caused by natural disasters focused mainly on binary changes, damage assessment in conflict areas may place greater emphasis on fine-grained damage types. Therefore, the core task in conflict zones is semantic change detection (SCD) of buildings. A challenge persists: data-hungry deep models require vast amounts of meticulously annotated data, which is notoriously difficult and expensive to obtain for remote sensing applications.
To address the aforementioned challenges, we introduce multi-class change detection (MCD). Distinct from conventional semantic change detection (SCD), MCD eliminates the need to annotate entire semantic regions and instead focuses solely on change masks. This framework is a direct extension of binary change detection (BCD). While it significantly reduces annotation difficulty and time, the new paradigm also means that the labeled target regions occupy an even smaller proportion of each image. The model must therefore be substantially better at extracting features from small target areas.
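To make the relationship between MCD and BCD concrete, the following is a minimal sketch, not the authors' annotation pipeline. It assumes an MCD label is a single integer mask in which 0 marks unchanged pixels and positive values mark the class of a changed pixel; the class ids, the `UNCHANGED` constant, and both helper functions are hypothetical names chosen for illustration.

```python
import numpy as np

# Assumption: 0 marks unchanged background; positive ids are change classes.
UNCHANGED = 0


def to_binary_change(mcd_label):
    """Collapse a multi-class change mask into a binary (BCD-style) mask."""
    return (np.asarray(mcd_label) != UNCHANGED).astype(np.uint8)


def class_proportions(mcd_label, num_classes):
    """Fraction of pixels carrying each label, including the background."""
    counts = np.bincount(np.asarray(mcd_label).ravel(), minlength=num_classes)
    return counts / counts.sum()


# Toy 8x8 label: two small changed regions on an unchanged background.
label = np.zeros((8, 8), dtype=np.int64)
label[1:3, 1:3] = 1   # 4 pixels of hypothetical change class 1
label[5, 5:8] = 2     # 3 pixels of hypothetical change class 2

binary = to_binary_change(label)
props = class_proportions(label, num_classes=3)
print(int(binary.sum()), props)
```

In this toy mask only 7 of 64 pixels carry a change class, which illustrates why small target proportions become the dominant difficulty once unchanged semantic regions are no longer annotated.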
Based on the above analysis, we identify four major challenges in framing conflict-induced damage assessment as an MCD task: (1) Inherent data scarcity: Training data are limited by the geographical extent of conflict zones and the number of destroyed areas available. (2) Small target regions: MCD focuses exclusively on damaged areas, resulting in minimal semantic region coverage. (3) Subtle and ambiguous changes: Infrastructure damage in conflict zones varies significantly in severity and extent, making minor damage particularly difficult to detect. (4) High inter-class similarity: Different facility categories within the same region may share similar characteristics, making fine-grained damage assessment particularly challenging for semantic change detection.
To bridge this gap, we draw inspiration from the recent success of vision foundation models. We argue that leveraging their rich, pre-trained representations is key to overcoming data scarcity and recognizing subtle semantic changes. In this paper, we propose a novel DINOv3-driven siamese network for MCD. Specifically, we adopt the DINOv3 Siméoni, Vo, Seitzer, Baldassarre, Oquab, Jose, Khalidov, Szafraniec, Yi, Ramamonjisoa et al. (2025) model pre-trained on satellite data, with ConvNeXt Liu, Mao, Wu, Feichtenhofer, Darrell and Xie (2022) as its main backbone architecture, which helps reduce the distribution discrepancy between the pre-training data and the actual application data. Then, we propose a multi-scale attention mechanism to extract and enhance features at different levels, aiming to capture the subtle and ambiguous change features of infrastructure damage. Furthermore, we apply an absolute-value difference operation to the resulting semantically rich feature maps to increase inter-class feature differences. Finally, a carefully designed decoder network with attention enhancement is used to generate clear semantic change detection maps. We also release a building semantic change detection dataset of the Gaza area covering 2023 to 2024. Figure 2 shows panoramic remote sensing images of the Gaza Strip captured by satellites. To the best of our knowledge, this is the first remote sensing semantic change detection study focused on conflict area assessment, laying a foundation for future research in related fields. In summary, our work makes the following key contributions:
• We introduce a multi-scale cross-attention difference siamese network (MC-DiSNet). Built upon a pre-trained DINOv3 backbone, our network extracts robust, generalized features. The cross-attention mechanism is used to fuse multi-scale temporal features, enabling the network to pinpoint subtle regions of semantic change effectively.
• We contribute a new dataset for the Gaza area, containing high-resolution bi-temporal satellite image pairs from 2023-2024 with meticulously annotated pixel-level semantic change labels. To our knowledge, this is the first change detection study specifically focused on conflict area assessment.
• We introduce the multi-class change detection (MCD) paradigm for damage assessment that fundamentally shifts from exhaustive bi-temporal semantic annotation to focused labeling of changed semantic regions. This strategic simplification significantly reduces annotation complexity and human labor.
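The combination of cross-attention over bi-temporal features and an absolute-value difference can be sketched as follows. This is an illustrative sketch under stated assumptions, not the MC-DiSNet implementation: the attention here is single-head with no learned projections, the order of operations (attend, then subtract) is assumed, random arrays stand in for backbone feature maps, and the names `cross_attention` and `difference_features` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_feat, context_feat):
    """Single-head cross-attention between two (C, H, W) feature maps.

    Each spatial position of query_feat attends to all positions of
    context_feat. No learned projections are used (an assumption made
    to keep the sketch short).
    """
    c, h, w = query_feat.shape
    q = query_feat.reshape(c, h * w).T            # (N, C) queries
    k = context_feat.reshape(c, h * w).T          # (N, C) keys and values
    attn = softmax(q @ k.T / np.sqrt(c), axis=-1)  # (N, N) attention weights
    attended = (attn @ k).T.reshape(c, h, w)      # back to (C, H, W)
    return query_feat + attended                  # residual connection


def difference_features(feats_t1, feats_t2):
    """Per-scale absolute difference of mutually attended features."""
    diffs = []
    for f1, f2 in zip(feats_t1, feats_t2):
        a1 = cross_attention(f1, f2)   # time-1 features attend to time 2
        a2 = cross_attention(f2, f1)   # time-2 features attend to time 1
        diffs.append(np.abs(a1 - a2))
    return diffs


# Random stand-ins for three scales of backbone features, each (C, H, W).
rng = np.random.default_rng(0)
shapes = [(8, 16, 16), (16, 8, 8), (32, 4, 4)]
feats_t1 = [rng.standard_normal(s) for s in shapes]
feats_t2 = [rng.standard_normal(s) for s in shapes]

diffs = difference_features(feats_t1, feats_t2)
same = difference_features(feats_t1, feats_t1)
print([d.shape for d in diffs])
```

When both inputs are identical, the two attended maps coincide and the difference is zero at every scale, so non-zero responses in this sketch arise only where the two dates disagree. A decoder would then consume the per-scale difference maps.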
In this section, topics relat