Predicting Forecast Error for the HRRR Using LSTM Neural Networks: A Comparative Study Using New York and Oklahoma State Mesonets

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Long Short-Term Memory (LSTM) models are trained to predict forecast error for the High-Resolution Rapid Refresh (HRRR) model using the New York State Mesonet and Oklahoma State Mesonet near-surface weather observations as ground truth. Physical and dynamical mechanisms tied to LSTM performance are evaluated by comparing the New York domain to the Oklahoma domain. The contrasting geography and atmospheric dynamics of the two domains provide a compelling scientific foil. Evaluating them side by side highlights variations in LSTM prediction of forecast error that are closely linked to region-specific phenomena driven by both dynamics and geography. Measured by mean absolute error and percent improvement relative to HRRR, LSTMs predict precipitation error most accurately, followed by wind error and then temperature error. Precipitation errors exhibit an asymmetry, with overforecast precipitation detected more accurately than underforecast precipitation, while wind error predictions are consistent across over- and underforecast cases. Temperature error predictions are relatively accurate but smoother, in terms of variance, than true observations. This paper presents an overview of LSTM performance with the express intent of providing forecasters with real-time predictions of forecast error at the point of use within the New York State and Oklahoma State Mesonets. This research demonstrates the potential of LSTM-based machine learning models to provide actionable, location-specific predictions of forecast error for high-resolution operational numerical weather prediction (NWP) systems.


💡 Research Summary

This paper presents a machine‑learning framework that predicts the forecast error of the High‑Resolution Rapid Refresh (HRRR) model in real time using surface observations from two state‑wide mesonet networks: the New York State Mesonet (NYSM) and the Oklahoma State Mesonet (OKSM). The authors argue that traditional verification of numerical weather prediction (NWP) systems relies on retrospective statistical methods that are computationally intensive and often limited to specific model versions or climatological periods. To give forecasters actionable, point‑of‑use error estimates, they develop Long Short‑Term Memory (LSTM) neural networks in an encoder‑decoder architecture that ingest both HRRR model fields and contemporaneous mesonet observations.

The data span 2018–2024. HRRR outputs from three successive model versions (v2, v3, v4) provide about 30 atmospheric variables, while the mesonets supply 15–16 surface variables (temperature, humidity, wind, precipitation, etc.). Observations recorded every five minutes are aggregated to hourly values to match the HRRR temporal resolution: instantaneous values for most variables, hourly totals for precipitation, and hourly averages for wind. In addition, geographic descriptors (land-use/land-cover (LULC) classes, elevation, and aspect/slope) are extracted for a 12 km (NYSM) or 30 km (OKSM) buffer around each station, clustered via k-means, and encoded as categorical features. This allows the LSTM to incorporate terrain heterogeneity without inflating the feature space.
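The hourly aggregation described above can be illustrated with a minimal sketch. This is not the authors' code: the function names, the tuple layout `(timestamp, temp_c, wind_ms, precip_mm)`, and the convention that each hour is labeled by its ending time and covers the window (T - 1 h, T] are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hour_ending(ts):
    """Label for the window (T - 1 h, T] that contains ts (assumed convention)."""
    floor = ts.replace(minute=0, second=0, microsecond=0)
    return floor if ts == floor else floor + timedelta(hours=1)


def aggregate_hourly(records):
    """Aggregate 5-minute rows (timestamp, temp_c, wind_ms, precip_mm) to hourly."""
    buckets = defaultdict(list)
    for row in sorted(records, key=lambda r: r[0]):
        buckets[hour_ending(row[0])].append(row)

    hourly = {}
    for label, rows in buckets.items():
        hourly[label] = {
            # Instantaneous variable: keep the last reading in the window.
            "temp_c": rows[-1][1],
            # Wind: average over the window.
            "wind_ms": sum(r[2] for r in rows) / len(rows),
            # Precipitation: total over the window.
            "precip_mm": sum(r[3] for r in rows),
        }
    return hourly


# Hypothetical data: twelve 5-minute readings from 00:05 through 01:00.
start = datetime(2024, 1, 1, 0, 5)
demo = [(start + timedelta(minutes=5 * i), float(i), 2.0, 0.1) for i in range(12)]
hourly = aggregate_hourly(demo)
```

All twelve demo readings fall into the single window ending at 01:00, so `hourly` holds one entry whose temperature is the 01:00 reading, whose wind is the window mean, and whose precipitation is the window sum.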

The LSTM encoder processes a fixed-length time series. For future steps where no observation is available yet, the most recent mesonet observation is repeated (persistence), which preserves the sequence length. A sinusoidal time encoding (sine and cosine of hour-of-day and day-of-year) is added to capture diurnal and seasonal cycles. The encoder's final hidden state is fed to a decoder LSTM, whose output passes through a fully connected multilayer perceptron that predicts the error (HRRR forecast minus observation) for temperature (2 m/1.5 m), wind speed (10 m), and hourly precipitation. The model is trained on data from 2018-2022, validated on 2023, and tested on 2024; this chronological separation avoids data leakage.
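Two of the input-preparation steps above, the sinusoidal time encoding and the persistence fill, can be sketched as follows. This is an illustrative sketch rather than the paper's implementation: the function names, the 365.25-day year length, and the truncation behavior when the observed sequence is already long enough are assumptions.

```python
import math
from datetime import datetime


def cyclical_time_features(ts):
    """Return [sin, cos] of hour-of-day followed by [sin, cos] of day-of-year."""
    hour_angle = 2 * math.pi * ts.hour / 24
    # Day-of-year is zero-based here; 365.25 is an assumed year length.
    doy_angle = 2 * math.pi * (ts.timetuple().tm_yday - 1) / 365.25
    return [
        math.sin(hour_angle),
        math.cos(hour_angle),
        math.sin(doy_angle),
        math.cos(doy_angle),
    ]


def pad_with_persistence(observed, total_steps):
    """Extend a sequence to total_steps by repeating its last observation."""
    if not observed:
        raise ValueError("at least one observation is required")
    padded = list(observed[:total_steps])
    padded += [padded[-1]] * (total_steps - len(padded))
    return padded


# Hypothetical usage: 06:00 on 1 January, and two observed steps padded to five.
features = cyclical_time_features(datetime(2024, 1, 1, 6))
padded = pad_with_persistence([1.0, 2.0], total_steps=5)
```

Encoding each cycle as a sine and cosine pair keeps 23:00 and 00:00 (or 31 December and 1 January) close together in feature space, which a raw hour or day index would not.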

Performance is evaluated with mean absolute error (MAE) and percent improvement relative to the raw HRRR. Results show that precipitation error is predicted most accurately, especially over‑forecast events, followed by wind speed error, and finally temperature error, which is smoother (lower variance) than the true error distribution but still reasonably accurate. The comparative analysis between the two regions reveals that the NYSM domain, with its complex topography, varied land‑cover, and more pronounced mesoscale dynamics, yields larger and more heterogeneous error patterns that the LSTM must learn, whereas the OKSM domain, being flatter and more homogeneous, exhibits simpler error structures and slightly higher predictive skill.
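The two metrics can be made concrete with a small sketch. The summary does not give the exact formula for percent improvement, so the definition below is an assumption: the raw HRRR MAE is the mean absolute true error, the corrected MAE is the mean absolute difference between true and predicted error, and improvement is the relative reduction between the two. All names and numbers are hypothetical.

```python
def mean_absolute_error(values):
    """Mean of absolute values of a non-empty sequence."""
    return sum(abs(v) for v in values) / len(values)


def percent_improvement(true_error, predicted_error):
    """Relative MAE reduction (in percent) after removing the predicted error.

    Assumed definition: 100 * (MAE_raw - MAE_corrected) / MAE_raw.
    """
    raw_mae = mean_absolute_error(true_error)
    if raw_mae == 0:
        raise ValueError("raw HRRR error is zero; improvement is undefined")
    residual = [t - p for t, p in zip(true_error, predicted_error)]
    corrected_mae = mean_absolute_error(residual)
    return 100.0 * (raw_mae - corrected_mae) / raw_mae


# Hypothetical 2 m temperature values in degrees Celsius.
hrrr_forecast = [12.0, 8.0, 15.0, 10.0]
observation = [10.0, 9.0, 12.0, 10.0]

# Error as defined in the summary: HRRR forecast minus observation.
true_error = [f - o for f, o in zip(hrrr_forecast, observation)]
lstm_predicted_error = [1.5, -0.5, 2.0, 0.5]

raw_mae = mean_absolute_error(true_error)
improvement = percent_improvement(true_error, lstm_predicted_error)
```

In this toy case the raw MAE is 1.5 and the MAE after subtracting the predicted error is 0.625, a reduction of roughly 58 percent under the assumed definition.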

Key contributions include: (1) a side-by-side assessment of LSTM error prediction across contrasting geographic and dynamical regimes; (2) an encoder-decoder LSTM design that handles missing future observations via persistence; and (3) the integration of clustered geographic attributes as categorical inputs to improve model awareness of terrain effects. Noted limitations include the simplistic handling of missing data, the absence of comparisons with newer sequence models such as Transformers or Temporal Convolutional Networks, and reduced skill during dry periods with scant precipitation. The authors suggest future work that incorporates multi-modal data (radar, satellite), explores hybrid architectures (e.g., Conv-LSTM, EMD-LSTM), tunes hyperparameters per region, and extends the forecast horizon beyond 18 hours to further enhance operational utility.

