Reliable global streamflow forecasting is essential for flood preparedness and water resource management, yet data-driven models often suffer from a performance gap when transitioning from historical reanalysis to operational forecast products. This paper introduces AIFL (Artificial Intelligence for Floods), a deterministic LSTM-based model designed for global daily streamflow forecasting. Trained on 18,588 basins curated from the CARAVAN dataset, AIFL utilises a novel two-stage training strategy to bridge the reanalysis-to-forecast domain shift. The model is first pre-trained on 40 years of ERA5-Land reanalysis (1980-2019) to capture robust hydrological processes, then fine-tuned on operational Integrated Forecasting System (IFS) control forecasts (2016-2019) to adapt to the specific error structures and biases of operational numerical weather prediction. To our knowledge, this is the first global model trained end-to-end within the CARAVAN ecosystem. On an independent temporal test set (2021-2024), AIFL achieves high predictive skill with a median modified Kling-Gupta Efficiency (KGE') of 0.66 and a median Nash-Sutcliffe Efficiency (NSE) of 0.53. Benchmarking results show that AIFL is highly competitive with current state-of-the-art global systems, achieving comparable accuracy while maintaining a transparent and reproducible forcing pipeline. The model demonstrates exceptional reliability in extreme-event detection, providing a streamlined and operationally robust baseline for the global hydrological community.
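For readers unfamiliar with the two skill scores quoted above, the following is a minimal sketch of how they are commonly computed. It assumes the standard definitions, NSE = 1 - Σ(Q_sim - Q_obs)² / Σ(Q_obs - mean(Q_obs))² and KGE' = 1 - sqrt((r - 1)² + (β - 1)² + (γ - 1)²), with β the ratio of means and γ the ratio of coefficients of variation. The function names and the sample values are illustrative and are not taken from the AIFL code base.

```python
import math
from statistics import fmean, pstdev


def nse(sim, obs):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = fmean(obs)
    num = sum((s - o) ** 2 for s, o in zip(sim, obs))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def kge_prime(sim, obs):
    """Modified Kling-Gupta Efficiency (assumed standard form, see lead-in)."""
    mu_s, mu_o = fmean(sim), fmean(obs)
    sd_s, sd_o = pstdev(sim), pstdev(obs)
    cov = fmean([(s - mu_s) * (o - mu_o) for s, o in zip(sim, obs)])
    r = cov / (sd_s * sd_o)                 # linear correlation
    beta = mu_s / mu_o                      # bias ratio
    gamma = (sd_s / mu_s) / (sd_o / mu_o)   # variability ratio (CV_sim / CV_obs)
    return 1.0 - math.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)


# Illustrative daily discharge values (hypothetical, in m^3/s).
obs = [1.0, 2.0, 4.0, 3.0]
sim = [1.2, 1.8, 3.5, 3.1]
print(f"NSE  = {nse(sim, obs):.3f}")
print(f"KGE' = {kge_prime(sim, obs):.3f}")
```

Both scores equal 1 for a perfect simulation. NSE drops to 0 when the simulation is no better than the observed mean, whereas KGE' penalises correlation, bias and variability errors separately.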
Global-scale streamflow forecasting is a critical capability for disaster risk reduction, supporting humanitarian aid, water resource management, and climate adaptation. The European Centre for Medium-Range Weather Forecasts (ECMWF) has long served as a central hub for these efforts, providing the computational backbone for the Copernicus Emergency Management Service's Global Flood Awareness System (GloFAS) [1]. Historically, hydrological forecasting has relied on a spectrum of approaches ranging from empirical relationships to complex, physically based frameworks [2]. GloFAS traditionally relies on coupling numerical weather predictions (NWP) with process-based hydrological models to generate operational alerts. These models require rigorous calibration to link parameters to global geophysical maps (such as land cover, topography, and soil texture) using in-situ river discharge observations and forcing data like ERA5 [3]. In GloFAS, this process is further enhanced by regionalization methods that transfer parameters from gauged "donor" catchments to ungauged regions based on geographical and climatic similarity [4]. However, while these process-based models are credible alternatives to in-situ measurements, they face significant challenges in representing complex hydrological processes accurately and are often limited by the quality and spatial resolution of climate-weather forcing variables [3]. These constraints, alongside the high computational demand for global-scale simulations, have motivated increasing interest in data-driven alternatives that can deliver fast, scalable inference while maintaining competitive predictive skill.
The application of machine learning (ML) to Earth system forecasting has recently accelerated, first transforming meteorology. ECMWF is pioneering this shift with the Artificial Intelligence Forecasting System (AIFS), a graph neural network-based model that now competes with physics-based NWP in medium-range accuracy [5]. As detailed by Moldovan et al. [6], AIFS has transitioned to fully operational status and is expanding its capabilities beyond atmospheric variables to include land-surface outputs such as runoff, signalling a convergence of meteorological and hydrological ML capabilities.
A similar paradigm shift has occurred in hydrology [7]. Kratzert et al. [8] argued that ML models typically outperform traditional approaches when trained on large, diverse datasets rather than single basins. This hypothesis has been validated by the widespread adoption of Long Short-Term Memory (LSTM) networks, which have demonstrated the ability to learn universal hydrological behaviours, outperforming regionally calibrated process-based models and enabling accurate prediction in ungauged basins [9]. This success has spurred a diverse family of advanced architectures, such as Hydra-LSTM [10], which employs a semi-shared architecture to improve multi-basin prediction, and MC-LSTM [11], which integrates mass conservation constraints directly into the network structure.
Most state-of-the-art approaches employ a lumped formulation [12], where the LSTM operates on inputs spatially aggregated over the entire catchment, including both time-varying meteorological forcings and static attributes such as topography, soil properties, and land cover. By collapsing the spatial distribution of these features into basin-wide aggregates, these models inherently overlook sub-catchment heterogeneity and the internal dynamics of lateral water routing [...] that a “domain shift” occurs when transitioning from reanalysis to forecast products, resulting in a significant reduction in predictive skill if not explicitly mitigated.
Importantly, this limitation is orthogonal to architectural complexity. Even state-of-the-art models, whether spatially aggregated (lumped), connectivity-aware (graph-based), or integrating physical constraints and process-based structures (hybrid), are fundamentally constrained by the characteristics of their forcing data. Addressing this reanalysis-to-forecast domain shift is therefore a prerequisite for reliable operational deployment. Leveraging the recently introduced CARAVAN MultiMet dataset [25], we introduce AIFL (Artificial Intelligence for Floods). Unlike previous works that rely on complex probabilistic formulations or explicit graph-based connectivity, AIFL utilises a standard, deterministic LSTM architecture trained on the entire CARAVAN dataset (over 18,000 basins) to provide a scalable baseline.
The primary contribution of AIFL is a novel two-stage training strategy designed to address the reanalysis-to-forecast domain shift. Inspired by findings that fine-tuning pre-trained models improves generalisation [26], we apply this concept to the temporal and data-source domain. We first pre-train the model on 40 years of ERA5-Land reanalysis to learn robust physical processes, and then fine-tune it on IFS control forecasts to adapt to operational biases. This approach offers a transparent, reproduc
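The two-stage idea (pre-train on a long reanalysis record, then fine-tune on a shorter forecast record) can be illustrated with a deliberately small sketch. It is not the AIFL implementation: a one-variable linear model stands in for the LSTM, and the synthetic data, the bias coefficients, the learning rates, and the helper names `make_data`, `train` and `mse` are all assumptions made for illustration.

```python
import random


def make_data(rng, n, forcing_bias):
    """Return (forcing seen by the model, observed streamflow) for n synthetic days."""
    scale, offset = forcing_bias
    xs, qs = [], []
    for _ in range(n):
        p = rng.random()                          # "true" precipitation in [0, 1]
        q = 2.0 * p + 0.5 + rng.gauss(0.0, 0.05)  # synthetic observed flow
        xs.append(scale * p + offset)             # forcing product = distorted truth
        qs.append(q)
    return xs, qs


def mse(params, xs, qs):
    w, b = params
    return sum((w * x + b - q) ** 2 for x, q in zip(xs, qs)) / len(xs)


def train(params, xs, qs, lr, steps):
    """Full-batch gradient descent on the MSE of the linear model q ~ w * x + b."""
    w, b = params
    n = len(xs)
    for _ in range(steps):
        gw = sum(2 * (w * x + b - q) * x for x, q in zip(xs, qs)) / n
        gb = sum(2 * (w * x + b - q) for x, q in zip(xs, qs)) / n
        w, b = w - lr * gw, b - lr * gb
    return w, b


rng = random.Random(0)
REANALYSIS = (1.0, 0.0)  # assumed: reanalysis forcing is unbiased
FORECAST = (0.8, 0.3)    # assumed: forecast forcing has a scale and offset bias

x_pre, q_pre = make_data(rng, 400, REANALYSIS)  # long "reanalysis" record
x_fine, q_fine = make_data(rng, 100, FORECAST)  # short "forecast" record
x_test, q_test = make_data(rng, 100, FORECAST)  # held-out forecast-forced days

# Stage 1: pre-train from scratch on reanalysis forcing.
pretrained = train((0.0, 0.0), x_pre, q_pre, lr=0.1, steps=2000)
# Stage 2: fine-tune from the pre-trained weights on forecast forcing.
finetuned = train(pretrained, x_fine, q_fine, lr=0.05, steps=2000)

mse_before = mse(pretrained, x_test, q_test)
mse_after = mse(finetuned, x_test, q_test)
print(f"forecast-forced MSE before fine-tuning: {mse_before:.4f}")
print(f"forecast-forced MSE after fine-tuning:  {mse_after:.4f}")
```

In this toy setting the pre-trained model is accurate under reanalysis forcing but degrades when driven by the biased forecast forcing. The second stage starts from the pre-trained weights and absorbs the bias using only the short forecast record.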