On the Adversarial Robustness of Hydrological Models


The evaluation of hydrological models is essential for both model selection and reliability assessment. However, simply comparing predictions to observations is insufficient for understanding the global landscape of model behavior. This is especially true for many deep learning models, whose structures are complex. Further, in risk-averse operational settings, water managers require models that are trustworthy and provably safe, as non-robustness can put critical infrastructure at risk. Motivated by the need to select reliable models for operational deployment, we introduce and explore adversarial robustness analysis in hydrological modeling, evaluating whether small, targeted perturbations to meteorological forcings induce substantial changes in simulated discharge. We compare physical-conceptual and deep learning-based hydrological models across 1,347 German catchments under perturbations of varying magnitudes, using the fast gradient sign method (FGSM). We find that, as expected, the FGSM perturbations systematically reduce KGE and increase MSE. However, catastrophic failure is rare and, surprisingly, LSTMs generally demonstrate greater robustness than HBV models. Further, both the predicted hydrographs and the internal model states often change approximately linearly (at least locally) as perturbation size increases, providing a compact summary of how errors grow under such perturbations. Similar patterns are also observed for random perturbations, suggesting that small input changes usually introduce approximately proportional changes in model output. Overall, these findings support further consideration of LSTMs for operational deployment (due both to their predictive power and robustness), and motivate future work on both characterizing model responses to input changes and improving robustness through architectural modifications and training design.


💡 Research Summary

The paper introduces adversarial robustness testing as a novel evaluation paradigm for hydrological models, addressing the gap between traditional performance metrics (e.g., NSE, KGE) and the need for models that remain reliable under small, potentially malicious input perturbations. The authors compare a classic process‑based conceptual model (HBV) with a deep‑learning LSTM architecture across 1,347 German catchments, using the same meteorological forcings (precipitation, temperature, potential evapotranspiration) as inputs. To generate adversarial examples they employ the Fast Gradient Sign Method (FGSM), which computes the gradient of a loss function with respect to the inputs and adds a signed perturbation scaled by ε. Four perturbation magnitudes (ε = 0.01, 0.03, 0.05, 0.1) are examined, corresponding to realistic sensor noise levels. Model outputs (daily discharge) are evaluated before and after perturbation using Kling‑Gupta Efficiency (KGE) and Mean Squared Error (MSE); changes are reported as ΔKGE and ΔMSE.
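The FGSM step described above can be sketched in a few lines. The snippet below is a minimal illustration using a hypothetical linear surrogate model, not the paper's LSTM or HBV implementations: with a linear model the gradient of the MSE loss with respect to the inputs is analytic, so the signed-gradient update is easy to see. The weights, forcings, and observed value are made-up numbers for demonstration.

```python
import numpy as np

def fgsm_perturb(x, grad, eps):
    """FGSM step: x_adv = x + eps * sign(dLoss/dx)."""
    return x + eps * np.sign(grad)

# Hypothetical forcings (precipitation, temperature, PET) and
# surrogate weights -- illustration only, not fitted values.
w = np.array([0.5, -0.2, 0.1])
x = np.array([1.0, 2.0, 0.5])
y_obs = 0.3

y_pred = w @ x                      # surrogate "discharge"
grad = 2.0 * (y_pred - y_obs) * w   # analytic dMSE/dx for the linear surrogate

# Sweep the epsilon magnitudes examined in the paper.
for eps in (0.01, 0.03, 0.05, 0.1):
    x_adv = fgsm_perturb(x, grad, eps)
    y_adv = w @ x_adv
    print(f"eps={eps}: output shift = {y_adv - y_pred:+.4f}")
```

For the actual LSTM, the gradient would instead be obtained by backpropagating the loss through the network to its inputs; the signed update itself is identical.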

Key findings are: (1) Both models experience degradation as ε grows, but catastrophic failures (e.g., KGE ≤ 0 or explosive MSE) are rare, indicating an inherent resilience tied to physical constraints such as mass balance. (2) The LSTM consistently shows smaller ΔKGE and ΔMSE than HBV, especially for ε ≤ 0.05, suggesting superior robustness despite its black‑box nature. (3) Internal state analysis reveals that LSTM cell and hidden states vary almost linearly with ε, implying that small input changes are absorbed gradually and reflected proportionally in the output. By contrast, HBV’s nonlinear components (snow melt thresholds, soil moisture limits) cause abrupt error spikes at certain ε values. (4) Random Gaussian perturbations produce similar linear error growth, confirming that the observed behavior is not unique to adversarially crafted inputs but reflects a general property of the models’ input‑output mapping.
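The ΔKGE values underlying these findings come from the standard Kling-Gupta Efficiency decomposition (correlation, variability ratio, bias ratio). A minimal sketch, with made-up discharge series and a uniform offset as a crude stand-in for an adversarially perturbed run:

```python
import numpy as np

def kge(sim, obs):
    """Kling-Gupta Efficiency (Gupta et al., 2009):
    KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2),
    with r the linear correlation, alpha the ratio of standard
    deviations, and beta the ratio of means (sim over obs)."""
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = np.std(sim) / np.std(obs)
    beta = np.mean(sim) / np.mean(obs)
    return 1.0 - np.sqrt((r - 1.0)**2 + (alpha - 1.0)**2 + (beta - 1.0)**2)

obs = np.array([1.0, 2.0, 4.0, 3.0])   # hypothetical observed discharge
sim = np.array([1.1, 1.9, 3.8, 3.2])   # hypothetical clean simulation
sim_adv = sim + 0.3                    # stand-in for a perturbed model run

# Degradation metric reported in the paper (negative = worse after attack).
delta_kge = kge(sim_adv, obs) - kge(sim, obs)
```

Here the offset leaves correlation and variability untouched and only inflates the bias ratio, so ΔKGE isolates one failure mode; real FGSM perturbations degrade all three components.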

The authors argue that these results support the operational deployment of LSTMs, as they combine high predictive skill with demonstrated robustness to input noise—critical for flood forecasting, reservoir management, and other risk‑averse water‑resource applications. Moreover, the methodology of probing internal model states offers a “crash‑test” style diagnostic that can uncover structural weaknesses before field implementation. Future work is outlined: extending attacks to stronger methods (PGD, Carlini‑Wagner), incorporating physical constraints directly into training (physics‑informed loss functions), testing robustness under climate‑driven distribution shifts, and exploring ensemble or architectural modifications to further improve resilience.

In sum, this study provides the first large‑scale, systematic assessment of adversarial robustness in hydrological modeling, demonstrating that LSTM networks not only outperform traditional HBV models in accuracy but also exhibit greater tolerance to small, targeted input perturbations. The findings bridge a crucial gap between model performance and trustworthiness, paving the way for more reliable, safety‑critical hydrological decision support systems.

