Probabilistic Wildfire Susceptibility from Remote Sensing Using Random Forests and SHAP

📝 Abstract

Wildfires pose a significant threat to ecosystems worldwide, with California experiencing recurring fires due to various factors, including climate, topographical features, vegetation patterns, and human activities. This study aims to develop a comprehensive wildfire risk map for California by applying the random forest (RF) algorithm, augmented with Explainable Artificial Intelligence (XAI) through SHapley Additive exPlanations (SHAP) to interpret model predictions. Model performance was assessed using both spatial and temporal validation strategies. The RF model demonstrated strong predictive performance, achieving near-perfect discrimination for grasslands (AUC = 0.996) and forests (AUC = 0.997). Spatial cross-validation revealed moderate transferability, yielding ROC-AUC values of 0.6155 for forests and 0.5416 for grasslands. In contrast, temporal split validation showed enhanced generalization, especially for forests (ROC-AUC = 0.6615, PR-AUC = 0.8423). SHAP-based XAI analysis identified key ecosystem-specific drivers: soil organic carbon, tree cover, and Normalized Difference Vegetation Index (NDVI) emerged as the most influential in forests, whereas Land Surface Temperature (LST), elevation, and vegetation health indices were dominant in grasslands. District-level classification revealed that Central Valley and Northern Buttes districts had the highest concentration of high-risk grasslands, while Northern Buttes and North Coast Redwoods dominated forested high-risk areas. This RF-SHAP framework offers a robust, comprehensible, and adaptable method for assessing wildfire risks, enabling informed decisions and targeted mitigation strategies.

📄 Content

arXiv:2511.11680v1 [cs.LG] 12 Nov 2025

Wildfires are among the most widespread and severe threats to ecosystems worldwide, exerting profound environmental impacts [1]. Climate change has intensified wildfire risk, with rising temperatures and more frequent droughts increasing ecosystem vulnerability [2]-[4]. California is largely characterized as a “fuel-limited” ecosystem, with a fire regime spanning interior yellow pine, oak woodlands, grasslands, and mixed-conifer forests [5]. A critical factor influencing California wildfires is the availability of ignition sources [6]. In the mountainous and desert regions of central California, frequent lightning strikes often trigger forest fires, whereas in the coastal regions, where lightning is rare, natural ignition sources are minimal. Nevertheless, coastal areas possess climatic conditions and fuel availability that remain highly conducive to the rapid spread of fires [6].

Fig. 1: Study area in California showing forests, grasslands, and fire perimeters. (Source: NLCD)

Remote sensing imagery provides an effective approach for studying wildfires and the environmental factors influencing their occurrence, enabling large-scale monitoring, detailed assessment of vegetation and fuel conditions, and evaluation of the climatic drivers of forest fires [7]. Wildfire risk prediction has been extensively studied, with existing models generally classified into three types: physics-based, semi-empirical, and empirical approaches [8]. These models often incorporate a multi-scale set of wildfire-related parameters, such as climatic factors, topographic attributes, and land cover and vegetation characteristics derived from satellite imagery [9]. Geographic Information System (GIS) methods, coupled with multi-criteria decision-making (MCDM) approaches such as the analytic hierarchy process (AHP) and Fuzzy AHP, are frequently employed in spatial analysis studies [10]-[14].
Advances in computational power and algorithm development have further facilitated the widespread adoption of machine learning (ML) algorithms for wildfire susceptibility mapping. Methods such as random forest (RF), support vector machines (SVM), gradient boosting machines (GBM), extreme gradient boosting (XGBoost), and artificial neural networks (ANN) have been increasingly utilized to capture complex patterns in environmental and climatic data, improving predictive accuracy for risk assessment, particularly in estimating the spatial probability of fire occurrence. However, these methods are often regarded as “black-box” approaches, owing to their limited transparency in explaining the contribution of individual input variables to wildfire risk predictions at specific locations.

Explainable Artificial Intelligence (XAI) addresses the interpretability challenges of ML models by enhancing their transparency and providing insights on internal decision-making processes [9], [15]. XAI techniques enable visualization and interaction with model outputs, offering a clearer understanding of how predictions are derived.
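To make the idea of attributing a prediction to its inputs concrete, the following minimal sketch computes exact Shapley values, the quantity that SHAP approximates, for a single sample. It is an illustration only and not the implementation used in this study: the scoring function `toy_model`, its coefficients, the three feature names, and the baseline values are all hypothetical, and the brute-force enumeration is practical only for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attributions for one sample x relative to a baseline.

    Features absent from a coalition are set to their baseline value.
    The cost is exponential in the number of features (toy cases only).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without_i = [x[j] if j in coalition else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition.
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def toy_model(features):
    """Hypothetical susceptibility score over (ndvi, lst, elevation) scaled to [0, 1]."""
    ndvi, lst, elevation = features
    return 0.5 * lst + 0.3 * ndvi * lst - 0.2 * elevation


x = [0.8, 0.9, 0.1]          # hypothetical pixel
baseline = [0.5, 0.5, 0.5]   # hypothetical reference values
phi = shapley_values(toy_model, x, baseline)
for name, value in zip(["ndvi", "lst", "elevation"], phi):
    print(f"{name}: {value:+.4f}")
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline, which is the additivity property that makes such explanations easy to read feature by feature.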

Incorporating interpretable techniques, such as SHapley Additive exPlanations (SHAP), can significantly enhance trust and comprehension of ML-driven wildfire risk assessments [15]. Despite advances in wildfire modeling using GIS, MCDM, and ML approaches, several gaps remain. Many existing studies in California either lack spatially robust validation, rely on temporally constrained datasets, or fail to assess probabilistic model calibration. While ML models often achieve high predictive accuracy, they typically do not provide explicit measures of variable contributions, limiting their applicability for management and policy decisions. Moreover, few studies integrate XAI techniques with region-transfer and temporal validation, both of which are critical for generating reliable, generalizable, and operationally useful wildfire risk maps. Addressing these gaps is essential for providing decision-makers with transparent, interpretable, and robust wildfire susceptibility assessments across diverse Californian landscapes.

The present research focuses on developing a wildfire risk zonation model for California using ML methods, particularly the RF algorithm. In addition, XAI techniques, such as SHAP, are integrated to determine the most influential climatic, topographic, and anthropogenic factors driving wildfire occurrence and to quantify their relative importance within the predictive framework. Spatial and temporal cross-validation strategies are employed to ensure model robustness and generalizability across different regions and time periods. Overall, this study establishes an integrated ML-XAI framework to improve decision-making, enhance model interpretability, and pinpoint the most influential factors driving wildfire susceptibility.

Fig. 2: Proposed framework for remote sensing wildfire susceptibility using RF and SHAP.
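The distinction between region-transfer (spatial) and temporal validation can be sketched in a few lines. The example below is a simplified illustration under stated assumptions, not the study's actual protocol: the region labels, years, cutoff, labels, and scores are hypothetical, the spatial scheme is assumed to be leave-one-region-out, and ROC-AUC is computed directly as the probability that a burned sample outranks an unburned one.

```python
def leave_one_region_out(regions):
    """Yield (held_out_region, train_idx, test_idx) for each distinct region."""
    for held_out in sorted(set(regions)):
        train = [i for i, r in enumerate(regions) if r != held_out]
        test = [i for i, r in enumerate(regions) if r == held_out]
        yield held_out, train, test


def temporal_split(years, cutoff):
    """Train on samples with year <= cutoff, test on strictly later samples."""
    train = [i for i, y in enumerate(years) if y <= cutoff]
    test = [i for i, y in enumerate(years) if y > cutoff]
    return train, test


def roc_auc(labels, scores):
    """ROC-AUC as P(score of positive > score of negative); ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical samples: region label, fire-season year, burned flag, model score.
regions = ["north", "north", "coast", "coast", "valley", "valley"]
years = [2015, 2019, 2016, 2020, 2017, 2021]
labels = [0, 1, 1, 0, 0, 1]
scores = [0.2, 0.7, 0.6, 0.4, 0.3, 0.9]

# Spatial validation: every region is held out once, never seen in training.
folds = list(leave_one_region_out(regions))

# Temporal validation: train on earlier years, evaluate on later ones.
train_idx, test_idx = temporal_split(years, cutoff=2018)
test_auc = roc_auc([labels[i] for i in test_idx], [scores[i] for i in test_idx])
```

In practice a model would be refit on each training index set before scoring the held-out samples. Keeping whole regions or whole later years out of training is what separates these schemes from a random split, which can leak spatially or temporally correlated samples into the test set.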

California, situated along the western coast of the United States, encompasses rou

