Predicting river flow in places without streamflow records is challenging because basins respond differently to climate, terrain, vegetation, and soils. Traditional basin attributes describe some of these differences, but they cannot fully represent the complexity of natural environments. This study examines whether AlphaEarth Foundation embeddings, which are learned from large collections of satellite images rather than designed by experts, offer a more informative way to describe basin characteristics. These embeddings summarize patterns in vegetation, land surface properties, and long-term environmental dynamics. We find that models using them achieve higher accuracy when predicting flows in basins not used for training, suggesting that they capture key physical differences more effectively than traditional attributes. We further investigate how selecting appropriate donor basins influences prediction in ungauged regions. Similarity based on the embeddings helps identify basins with comparable environmental and hydrological behavior, improving performance, whereas adding many dissimilar basins can reduce accuracy. The results show that satellite-informed environmental representations can strengthen hydrological forecasting and support the development of models that adapt more easily to different landscapes.
Deep Dive into Utilizing Earth Foundation Models to Enhance the Simulation Performance of Hydrological Models with AlphaEarth Embeddings.
Accurate prediction in ungauged basins remains one of the most persistent challenges in hydrology. The Prediction in Ungauged Basins (PUB) initiative launched in 2003 reframed this issue as a decade-long community effort to improve process understanding, quantify uncertainties, and develop models capable of transferring information across basins with sparse or nonexistent observations (Hrachowitz et al., 2013; Sivapalan et al., 2003). A long-standing strategy for PUB is regionalization, in which model parameters or training data for an ungauged target basin are borrowed from gauged donor basins. This process relies fundamentally on the hypothesis that hydrological behaviors can be transferred between locations that share high similarity in their physical and climatic characteristics (Pool et al., 2021). Although foundational studies have demonstrated the potential of these donor-selection schemes, they also show that prediction performance is sensitive to how similarity is defined and implemented (Oudin et al., 2008).
Deep learning has reshaped the landscape of regional rainfall-runoff modeling, as Long Short-Term Memory (LSTM) models (Hochreiter and Schmidhuber, 1997) trained on large collections of basins have demonstrated strong information sharing and generalization capacity, achieving impressive performance even in ungauged settings (Fang et al., 2022; Kratzert et al., 2019a). This success has led to the prevailing view that multi-basin learning should supplant single-basin training (Kratzert et al., 2024). However, the tendency to continually enlarge the training dataset introduces an important limitation. Simply expanding the training set does not always yield better predictions for every catchment. Evidence from studies covering more than 3,000 U.S. basins indicates that pooling all basins into a single training set can be suboptimal, largely because of the substantial hydrological heterogeneity among nonreference basins affected by human activities (Ouyang et al., 2021). Critically, even within the carefully curated CAMELS dataset (Addor et al., 2017), a benchmark collection of reference basins with minimal human impact, training a regional deep learning model on all available data without proper screening can degrade performance at the local scale (Nai et al., 2024; Yu et al., 2024). Thus, identifying a small yet informative set of training basins remains an underexplored but highly influential lever for improving PUB performance.
The challenge of selecting these informative donors has traditionally relied on distance-based similarity analysis using readily quantifiable catchment descriptors (He et al., 2011). These hand-crafted attributes, spanning physiographic, climatic, and geological factors, are typically evaluated individually or combined into composite metrics such as standardized Euclidean distance. However, conventional descriptors alone are often inadequate (Tarasova et al., 2024). Hydrological processes can vary sharply over space, meaning that spatial proximity only loosely reflects functional similarity. Furthermore, these static attributes are often too sparse to capture the complex, dynamic nature of catchment behavior, resulting in distance metrics that do not map effectively onto true hydrological similarity.
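To make the conventional approach concrete, the following minimal sketch ranks candidate donor basins by standardized Euclidean distance in attribute space. It is an illustration only, not the procedure of any cited study: the function names (`standardize`, `select_donors`), the three attributes, and all numerical values are hypothetical, and each attribute is scaled by its population standard deviation across the basins considered.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each attribute column (zero mean, unit standard deviation)."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def select_donors(rows, target_idx, k):
    """Return the k basins closest to the target as (index, distance) pairs."""
    z = standardize(rows)
    target = z[target_idx]
    ranked = sorted(
        (math.dist(target, row), i)
        for i, row in enumerate(z)
        if i != target_idx
    )
    return [(i, d) for d, i in ranked[:k]]


# Hypothetical attributes: [mean elevation (m), aridity index, forest fraction]
basin_attributes = [
    [1200.0, 0.8, 0.60],  # basin 0, treated here as the ungauged target
    [1150.0, 0.9, 0.55],  # basin 1
    [300.0, 2.1, 0.10],   # basin 2
    [1300.0, 0.7, 0.65],  # basin 3
    [250.0, 1.9, 0.05],   # basin 4
]

donors = select_donors(basin_attributes, target_idx=0, k=2)
donor_indices = [i for i, _ in donors]
```

In this toy setting the two high-elevation, humid, forested basins (1 and 3) are selected as donors. Because every attribute receives equal weight after standardization, the ranking depends entirely on which descriptors are included, which is one reason such metrics can diverge from functional hydrological similarity.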
The integration of deep learning has introduced sophisticated approaches to overcome the limitations of traditional, hand-crafted similarity measures. Early studies demonstrated that regional LSTM models could implicitly exploit static attributes to infer streamflow generation mechanisms (Kratzert et al., 2019b). Recent developments take a more explicit approach by deriving improved similarity rules directly from data. One line of work defines similarity through the classification of streamflow generation mechanisms based on hydrological signatures, thereby grouping catchments with more homogeneous behavior (Yu et al., 2024). Another, more data-driven direction introduces dynamic basin-affinity metrics that quantify how the learning gradient from one basin affects the loss function of a target basin. This allows for the automatic identification of training subsets that positively contribute to the target model while filtering out “mutual noise” (Nai et al., 2024).
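The gradient-based notion of affinity can be illustrated with a toy example. The sketch below assumes one simple formalization, a lookahead affinity defined as the relative reduction in the target basin's loss after a single gradient step on a donor basin's loss; this is an assumption made for illustration and not necessarily the formulation of Nai et al. (2024). The one-parameter linear model, the data, and the names `mse`, `grad`, and `affinity` are all hypothetical.

```python
def mse(w, data):
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def grad(w, data):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


def affinity(w, donor, target, lr=0.05):
    """Relative drop in target loss after one gradient step on the donor.

    Positive values mean the donor's update helps the target;
    negative values mean it hurts.
    """
    w_after = w - lr * grad(w, donor)
    return 1.0 - mse(w_after, target) / mse(w, target)


# Hypothetical (input, response) pairs standing in for basin data.
target_basin = [(1.0, 2.0), (2.0, 4.0)]        # response slope of 2
similar_basin = [(1.0, 2.1), (3.0, 6.3)]       # response slope of 2.1
dissimilar_basin = [(1.0, -1.0), (2.0, -2.0)]  # response slope of -1

w0 = 0.0  # shared initial parameter
a_similar = affinity(w0, similar_basin, target_basin)
a_dissimilar = affinity(w0, dissimilar_basin, target_basin)
```

Here the similar basin yields a positive affinity and the dissimilar basin a negative one, so a screening rule that keeps only positive-affinity donors would discard the latter. In a real regional LSTM the same quantity would be computed over shared network weights and averaged over many training steps.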
These emerging task-specific similarity definitions represent a significant advance. However, they define similarity primarily through the lens of the rainfall-runoff model itself (e.g., via gradients or behavioral signatures). This raises a fundamental question: can we find a task-agnostic similarity definition that is as rich and data-driven as these learned metrics, yet as physically grounded as traditional catchment descriptors? The core limitation of conventional catchment descriptors lies not in the concept of using physical attributes, but in their execution: the attributes were sparse, hand-crafted, and failed to capture the complex, integrated spatio-temporal dynamics of the land surface, such as vegetatio
…(Full text truncated)…