GLOFNet – A Multimodal Dataset for GLOF Monitoring and Prediction

Glacial Lake Outburst Floods (GLOFs) are rare but destructive hazards in high mountain regions, yet predictive research is hindered by fragmented and unimodal data. Most prior efforts emphasize post-event mapping, whereas forecasting requires harmonized datasets that combine visual indicators with physical precursors. We present GLOFNet, a multimodal dataset for GLOF monitoring and prediction, focused on the Shisper Glacier in the Karakoram. It integrates three complementary sources: Sentinel-2 multispectral imagery for spatial monitoring, NASA ITS_LIVE velocity products for glacier kinematics, and MODIS Land Surface Temperature records spanning more than two decades. Preprocessing included cloud masking, quality filtering, normalization, temporal interpolation, augmentation, and cyclical encoding, followed by harmonization across modalities. Exploratory analysis reveals seasonal glacier velocity cycles, long-term warming of ~0.8 K per decade, and spatial heterogeneity in cryospheric conditions. The dataset is publicly available to support future research in glacial hazard prediction. By addressing challenges such as class imbalance, cloud contamination, and coarse resolution, GLOFNet provides a structured foundation for benchmarking multimodal deep learning approaches to rare hazard prediction.
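The "~0.8 K per decade" figure is a linear trend rate. A minimal sketch of how such a rate is commonly estimated, assuming an ordinary least-squares fit of annual-mean LST against year; the paper's exact trend method is not stated here, `decadal_trend` is a hypothetical helper, and the series below is synthetic rather than GLOFNet data:

```python
from typing import Sequence


def decadal_trend(years: Sequence[float], values: Sequence[float]) -> float:
    """Ordinary least-squares slope of `values` against `years`, in units per decade."""
    n = len(years)
    if n != len(values) or n < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    if sxx == 0:
        raise ValueError("years must not all be identical")
    return 10.0 * sxy / sxx  # slope per year, scaled to a 10-year interval


# Synthetic annual-mean LST (K) warming by 0.08 K per year, 2001-2020.
years = list(range(2001, 2021))
lst_k = [270.0 + 0.08 * (y - 2001) for y in years]
print(round(decadal_trend(years, lst_k), 3))  # 0.8
```

With real daily LST, one would typically average per year (or remove the seasonal cycle) before fitting, so that uneven seasonal sampling does not bias the slope.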


💡 Research Summary

Glacial Lake Outburst Floods (GLOFs) are low-frequency but high-impact hazards that threaten communities in high-mountain environments. Research on GLOF forecasting has been hampered by fragmented, single-modality datasets that focus mainly on post-event mapping rather than on combining the visual cues and physical precursors needed for prediction. In response, the authors introduce GLOFNet, a multimodal dataset assembled for the Shisper Glacier basin in the Karakoram range, a region with a documented history of GLOF events. GLOFNet fuses three complementary data streams: (1) Sentinel-2 multispectral imagery (bands B2 to B12, at 10 m native resolution for B2, B3, B4 and B8 and 20 m or coarser for the others) providing high-resolution spatial context; (2) NASA ITS_LIVE glacier velocity products delivering kinematic information on ice flow; and (3) MODIS Land Surface Temperature (LST) records spanning more than two decades, offering a long-term climate signal.
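Only four Sentinel-2 bands are acquired at 10 m, so stacking them with the 20 m and 60 m bands requires resampling onto a common grid. A minimal sketch, assuming nearest-neighbour upsampling onto the 10 m grid (the resampling method GLOFNet actually uses is not specified here); `to_10m` and `upsample_nearest` are hypothetical helpers, and rasters are plain nested lists rather than a real raster library:

```python
from typing import Dict, List

Grid = List[List[float]]

# Native resolution (m) of the Sentinel-2 MSI bands from B2 to B12 that ship in
# Level-2A products (B10, the cirrus band, is dropped from L2A).
S2_RESOLUTION_M: Dict[str, int] = {
    "B2": 10, "B3": 10, "B4": 10, "B8": 10,
    "B5": 20, "B6": 20, "B7": 20, "B8A": 20, "B11": 20, "B12": 20,
    "B9": 60,
}


def upsample_nearest(grid: Grid, factor: int) -> Grid:
    """Nearest-neighbour upsampling: each pixel becomes a factor x factor block."""
    out: Grid = []
    for row in grid:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))  # copies, so rows are not aliased
    return out


def to_10m(band: str, grid: Grid) -> Grid:
    """Bring a band onto the 10 m grid; native 10 m bands are returned unchanged."""
    factor = S2_RESOLUTION_M[band] // 10
    return grid if factor == 1 else upsample_nearest(grid, factor)


swir = [[0.21, 0.25], [0.30, 0.28]]  # synthetic 2 x 2 patch of 20 m B11 reflectance
up = to_10m("B11", swir)
print(len(up), len(up[0]))  # 4 4
print(up[0])  # [0.21, 0.21, 0.25, 0.25]
```

Nearest-neighbour keeps the original reflectance values unchanged, which keeps later per-band normalization faithful to the measured range; bilinear resampling is the usual smoother alternative.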

The preprocessing pipeline is designed to ensure data quality and temporal alignment across modalities. Sentinel-2 scenes undergo cloud and shadow masking using the Scene Classification Layer (SCL) and QA bands, followed by per-band min-max normalization. ITS_LIVE velocity fields are filtered by quality flags, outliers are removed, and missing days are filled with a hybrid linear-spline interpolation to produce a consistent daily time series. MODIS LST is corrected for topographic effects using an SRTM digital elevation model (DEM) and normalized in the same way. All three streams are resampled to a common daily temporal grid and co-registered to a unified coordinate system, yielding a tensor indexed by time, channel, and space. To mitigate the severe class imbalance inherent to GLOF events, the authors apply synthetic oversampling and weighted loss functions during model training. Data augmentation includes random rotations, flips, spectral jitter, and temporal shifts, while cyclical encoding (the sine and cosine of the day of year) captures seasonal periodicity.
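The per-pixel and per-series steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not GLOFNet's code: the set of SCL classes treated as invalid is an assumption, plain linear interpolation stands in for the hybrid linear-spline scheme, a 365.25-day period is assumed for the cyclical encoding, and all function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Sentinel-2 L2A Scene Classification Layer (SCL) codes masked in this sketch
# (an assumption): 0 no data, 1 saturated/defective, 3 cloud shadow,
# 8 cloud medium probability, 9 cloud high probability, 10 thin cirrus.
# Code 11 (snow/ice) is kept on purpose: masking it would erase the glacier.
SCL_INVALID = {0, 1, 3, 8, 9, 10}


def mask_pixels(values: Sequence[float], scl: Sequence[int]) -> List[Optional[float]]:
    """Set pixels whose SCL class is invalid to None."""
    return [None if code in SCL_INVALID else v for v, code in zip(values, scl)]


def min_max_normalize(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Scale one band to [0, 1], ignoring masked (None) pixels."""
    valid = [v for v in values if v is not None]
    lo, hi = min(valid), max(valid)
    span = (hi - lo) or 1.0  # a constant band would otherwise divide by zero
    return [None if v is None else (v - lo) / span for v in values]


def interpolate_daily(obs: Sequence[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Linearly fill every day between irregular (day, value) observations."""
    pts = sorted(obs)
    daily: List[Tuple[int, float]] = []
    for (d0, v0), (d1, v1) in zip(pts, pts[1:]):
        for d in range(d0, d1):  # empty when d0 == d1, so no division by zero
            daily.append((d, v0 + (v1 - v0) * (d - d0) / (d1 - d0)))
    daily.append(pts[-1])
    return daily


def cyclical_doy(day_of_year: int, period: float = 365.25) -> Tuple[float, float]:
    """Encode a 1-based day of year as (sin, cos) so 31 December sits next to 1 January."""
    angle = 2.0 * math.pi * (day_of_year - 1) / period
    return math.sin(angle), math.cos(angle)


band = mask_pixels([0.12, 0.30, 0.25, 0.90], [11, 9, 6, 11])
print(band)  # [0.12, None, 0.25, 0.9]
print(min_max_normalize(band))
print(interpolate_daily([(0, 100.0), (4, 140.0), (6, 120.0)]))
```

Keeping masked pixels as `None` (rather than zero) prevents cloud gaps from distorting the band minimum during normalization.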

Exploratory analysis uncovers several key patterns. Glacier velocity follows a pronounced seasonal cycle, low in winter and peaking during the melt season, and correlates positively with MODIS LST fluctuations (r ≈ 0.68). Long-term temperature records show an average warming of roughly 0.8 K per decade across the basin, with a muted response at higher elevations, highlighting spatial heterogeneity in cryospheric conditions. Hot-spot mapping identifies zones where velocity spikes and temperature anomalies co-occur; many of these align with historically documented GLOF breach points, suggesting that the combined signals are useful predictors.
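The velocity-LST correlation and the co-occurrence idea behind the hot-spot maps can be illustrated on a single pixel's time series. A minimal sketch, assuming Pearson's r as the correlation measure and a hypothetical threshold of 1.5 standard deviations for calling a value anomalous (the paper's hot-spot criterion is not given here); the helper names are hypothetical and the series are synthetic:

```python
import math
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def zscores(x: Sequence[float]) -> List[float]:
    """Standardize a series using its mean and population standard deviation."""
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((a - m) ** 2 for a in x) / n)
    if sd == 0:
        raise ValueError("constant series has no anomalies to standardize")
    return [(a - m) / sd for a in x]


def co_anomalies(velocity: Sequence[float], lst: Sequence[float],
                 threshold: float = 1.5) -> List[int]:
    """Time indices where velocity and LST both exceed `threshold` standard deviations."""
    zv, zt = zscores(velocity), zscores(lst)
    return [i for i, (a, b) in enumerate(zip(zv, zt)) if a > threshold and b > threshold]


# Synthetic series: a velocity surge (m/day) coinciding with a warm spell (K).
velocity = [1.0] * 9 + [10.0]
lst = [270.0] * 9 + [275.0]
print(co_anomalies(velocity, lst))  # [9]
print(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.8
```

Run pixel by pixel, the returned indices could be counted into a raster of co-anomaly frequency, which is one plausible way to build a hot-spot map.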

The final dataset comprises over 3 TB of harmonized multimodal data, accompanied by metadata (coordinates, elevation, timestamps, and binary GLOF labels). GLOFNet is publicly released via AWS S3 and Zenodo, together with a suite of baseline models. A ConvLSTM-based multimodal architecture achieves over 85% accuracy and an F1-score of 0.78, while a Transformer-Fusion model slightly outperforms it, which is attributed to its better use of long-range temporal dependencies. Traditional machine-learning baselines (Random Forest, XGBoost) are also provided for comparison, showing that the dataset supports a wide range of modeling approaches.
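With rare positive labels, accuracy can look strong while every GLOF is missed, which is why the F1-score is the more telling of the reported figures. The sketch below uses synthetic labels to show a trivial "never a GLOF" classifier scoring 95% accuracy with an F1 of 0, and computes inverse-frequency class weights, one common choice for a weighted loss (assumed here; the weighting GLOFNet uses is not specified). Helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (label 1 = GLOF); 0.0 when nothing is detected."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def inverse_frequency_weights(y: Sequence[int]) -> Dict[int, float]:
    """weight_c = n_samples / (n_classes * count_c), so rarer classes weigh more."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


y_true = [0] * 95 + [1] * 5  # synthetic labels: 5% positive (GLOF) samples
always_no = [0] * 100        # a classifier that never predicts a GLOF
print(accuracy(y_true, always_no))  # 0.95
print(f1_score(y_true, always_no))  # 0.0
print(inverse_frequency_weights(y_true)[1])  # 10.0
```

Multiplying each sample's loss by its class weight makes a missed GLOF cost roughly 19 times more than a missed non-event in this 95/5 split.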

Beyond the technical contributions, the authors emphasize several broader impacts. By integrating visual, kinematic, and thermal modalities, GLOFNet enables predictive systems that consider the full cascade of processes leading to a lake outburst: ice dynamics, meltwater generation, and temperature-driven destabilization. Including both short-term (daily) and long-term (decadal) records supports not only event-specific forecasting but also climate-change scenario analysis. The explicit handling of cloud contamination, resolution mismatches, and class imbalance offers a reproducible template for future hazard-prediction datasets. Finally, the open-access release encourages community-wide benchmarking and collaboration among glaciologists, remote-sensing experts, and machine-learning practitioners. In sum, GLOFNet is a substantial step toward data-driven, multimodal forecasting of GLOFs, offering a solid foundation for advancing scientific understanding and improving risk mitigation in vulnerable mountain regions.

