FireSentry: An Ultra-High-Resolution UAV-Based Dataset for Precise Wildfire Spread Prediction, and the FiReDiff Model

Reading time: 5 minutes

📝 Abstract

Fine-grained wildfire spread prediction is crucial for enhancing emergency response efficacy and decision-making precision. However, existing research predominantly focuses on coarse spatiotemporal scales and relies on low-resolution satellite data, capturing only macroscopic fire states while fundamentally constraining high-precision localized fire dynamics modeling capabilities. To bridge this gap, we present FireSentry, a provincial-scale multi-modal wildfire dataset characterized by sub-meter spatial and sub-second temporal resolution. Collected using synchronized UAV platforms, FireSentry provides visible and infrared video streams, in-situ environmental measurements, and manually validated fire masks. Building on FireSentry, we establish a comprehensive benchmark encompassing physics-based, data-driven, and generative models, revealing the limitations of existing mask-only approaches. Our analysis proposes FiReDiff, a novel dual-modality paradigm that first predicts future video sequences in the infrared modality, and then precisely segments fire masks in the mask modality based on the generated dynamics. FiReDiff achieves state-of-the-art performance, with video quality gains of 39.2% in PSNR, 36.1% in SSIM, 50.0% in LPIPS, and 29.4% in FVD, and mask accuracy gains of 3.3% in AUPRC, 59.1% in F1 score, 42.9% in IoU, and 62.5% in MSE when applied to generative models. The FireSentry benchmark dataset and FiReDiff paradigm collectively advance fine-grained wildfire forecasting and dynamic disaster simulation. The processed benchmark dataset is publicly available at: https://github.com/Munan222/FireSentry-Benchmark-Dataset


📄 Content

Accurate and real-time fine-grained wildfire spread prediction is crucial for facilitating efficient evacuations and optimizing emergency responses. However, the highly dynamic and complex nature of wildfires poses significant challenges to accurate forecasting.

To address this challenge, existing approaches fall into three categories: physics-based models simulate fire dynamics using principles like fluid mechanics and heat transfer, yet struggle with real-world complexity [28]; data-driven models target large-scale temporal fire prediction and are heavily dependent on low-resolution satellite imagery, necessitating architectural adaptations for fine-grained wildfire prediction tasks [10,45]; and generative techniques (particularly world model-based video generation) show considerable promise but remain largely unexplored for wildfire contexts [8,17].

Critically, wildfires exhibit flashover behavior, characterized by meter-scale expansions within minutes (Figure 1), which necessitates high-resolution prediction for timely intervention. However, existing datasets typically offer only hundred-meter spatial resolution and hourly temporal resolution (Table 1), fundamentally lacking the spatio-temporal granularity required for such modeling.

To bridge this gap, we introduce FireSentry, a novel benchmark dataset designed specifically for fine-grained wildfire spread prediction. Covering five distinct regions within a single province, the dataset provides dual-modality visual data in both visible light and infrared spectra, along with environmental data. To support model development, we generate fire masks from infrared videos using advanced semantic segmentation algorithms. Mask quality is ensured through a human verification protocol: annotators create a validation subset by referencing infrared videos as the primary source and visible-light videos as auxiliary inputs (providing smoke dispersion and vegetation context). Quantitative evaluation via pixelwise comparison between algorithmic masks and human-verified annotations yields an average accuracy of 0.925, a mean Intersection-over-Union (mIoU) of 0.696, and commission and omission errors of 0.076 and 0.015, respectively.
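The pixelwise comparison above can be sketched as follows. This is an illustrative helper, not the paper's evaluation code: it assumes binary NumPy masks, and it adopts the common remote-sensing convention for commission error (false fire pixels as a fraction of predicted fire) and omission error (missed fire pixels as a fraction of reference fire); the paper's exact convention is not stated here.

```python
import numpy as np

def mask_agreement(pred: np.ndarray, ref: np.ndarray) -> dict:
    """Pixelwise agreement between an algorithmic fire mask and a
    human-verified reference mask (both binary/boolean arrays)."""
    pred = pred.astype(bool)
    ref = ref.astype(bool)
    tp = np.logical_and(pred, ref).sum()      # fire in both
    fp = np.logical_and(pred, ~ref).sum()     # predicted fire, not in reference
    fn = np.logical_and(~pred, ref).sum()     # reference fire, missed
    tn = np.logical_and(~pred, ~ref).sum()    # background in both
    return {
        "accuracy": (tp + tn) / pred.size,
        "iou": tp / (tp + fp + fn) if (tp + fp + fn) else 1.0,
        # Assumed convention (one of several in common use):
        "commission": fp / (tp + fp) if (tp + fp) else 0.0,
        "omission": fn / (tp + fn) if (tp + fn) else 0.0,
    }
```

Averaging these per-frame scores over the validation subset yields dataset-level figures of the kind reported above.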

Leveraging its high spatio-temporal resolution and multi-modal capabilities, FireSentry establishes a robust foundation for fine-grained wildfire propagation modeling. Building upon this dataset, we propose FiReDiff, a novel predictive paradigm. Diverging from conventional mask-level approaches [19,25], FiReDiff innovatively employs a dual-stage architecture: first performing video prediction in the infrared modality, then executing fire mask segmentation within the mask modality. This paradigm deeply integrates video generation with semantic understanding, achieving significant performance breakthroughs while pioneering new pathways for wildfire forecasting research.
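The dual-stage flow can be summarized in a short sketch. The `video_model` and `seg_model` callables below are hypothetical stand-ins for the paper's diffusion-based infrared predictor and its segmentation head; only the stage ordering (generate infrared dynamics first, segment masks second) reflects the paradigm described above.

```python
import numpy as np

def firediff_predict(ir_history: np.ndarray,
                     video_model,
                     seg_model) -> tuple[np.ndarray, np.ndarray]:
    """Two-stage prediction in the style of FiReDiff (illustrative).

    ir_history: (T_obs, H, W) observed infrared frames.
    video_model: maps observed frames to (T_pred, H, W) future frames.
    seg_model: maps one infrared frame to a binary fire mask.
    """
    # Stage 1: predict future infrared dynamics from the observed clip.
    ir_future = video_model(ir_history)                   # (T_pred, H, W)
    # Stage 2: segment each generated frame into a fire mask.
    masks = np.stack([seg_model(frame) for frame in ir_future])
    return ir_future, masks
```

Even a trivial persistence predictor (repeating the last observed frame) slots into this interface, which is what makes the paradigm easy to benchmark against simpler baselines.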

We conduct a comprehensive benchmark evaluation on FireSentry, comparing physics-based, data-driven, and generative models, as well as our proposed FiReDiff paradigm. Experimental results demonstrate that FiReDiff consistently outperforms state-of-the-art (SOTA) baselines across multiple evaluation metrics, highlighting its effectiveness and potential in fine-grained wildfire prediction tasks. In summary, the contributions of this work are as follows:

• We present FireSentry, a multi-modal wildfire dataset that enables meter-level and minute-scale dynamics modeling. It integrates synchronized UAV-captured visible and infrared video streams, spatio-temporally calibrated environmental telemetry, and manually validated fire segmentation masks.

• We introduce FiReDiff, a generative model-based prediction paradigm that jointly optimizes infrared video prediction and mask segmentation. By integrating spatio-temporal features from complementary infrared and mask modalities, FiReDiff mitigates key constraints of mask-only approaches and enhances prediction robustness.

• We establish comprehensive benchmarking protocols comparing physics-based, data-driven, and generative approaches, as well as our novel FiReDiff paradigm. Extensive experiments quantitatively demonstrate FiReDiff's superior spatio-temporal prediction accuracy.

We systematically review the research landscape of wildfire spread prediction, and reveal a fundamental limitation in existing studies: the predominant focus on macro-scale spatiotemporal prediction, which hinders fine-grained fire spread modeling.

Existing research on fire spread prediction predominantly falls into three methodological categories. The first category encompasses physics-based methods, such as WRF-SFIRE [28], FARSITE [9], and WFDS [29]. These models simulate fire propagation based on principles of fluid dynamics, combustion, and heat transfer. While physically interpretable, these models often suffer from limited predictive accuracy and generalization in complex real-world scenarios.

The second category consists of data-driven models that learn spatio-

This content is AI-processed based on ArXiv data.
