Probabilistic quantitative precipitation field forecasting using a two-stage spatial model
Short-range forecasts of precipitation fields are needed in a wealth of agricultural, hydrological, ecological and other applications. Forecasts from numerical weather prediction models are often biased and do not provide uncertainty information. Here we present a postprocessing technique for such numerical forecasts that produces correlated probabilistic forecasts of precipitation accumulation at multiple sites simultaneously. The statistical model is a spatial version of a two-stage model that represents the distribution of precipitation by a mixture of a point mass at zero and a Gamma density for the continuous distribution of precipitation accumulation. Spatial correlation is captured by assuming that two Gaussian processes drive precipitation occurrence and precipitation amount, respectively. The first process is latent and drives precipitation occurrence via a threshold. The second process explains the spatial correlation in precipitation accumulation. It is related to precipitation via a site-specific transformation function, so as to retain the marginal right-skewed distribution of precipitation while modeling spatial dependence. Both processes take into account the information contained in the numerical weather forecast and are modeled as stationary isotropic spatial processes with an exponential correlation function. The two-stage spatial model was applied to 48-hour-ahead forecasts of daily precipitation accumulation over the Pacific Northwest in 2004. The predictive distributions from the two-stage spatial model were calibrated and sharp, and outperformed reference forecasts for spatially composite and areally averaged quantities.
Research Summary
This paper addresses the well‑known shortcomings of raw numerical weather prediction (NWP) outputs for short‑range precipitation forecasting: systematic bias, lack of calibrated uncertainty, and neglect of spatial dependence. The authors propose a post‑processing framework that produces joint probabilistic forecasts of precipitation accumulation at multiple sites, preserving both the discrete‑continuous nature of precipitation and its spatial correlation.
The statistical model is a spatial extension of the two‑stage approach originally introduced for pointwise precipitation. In the first stage, precipitation occurrence is modeled as a binary outcome driven by a latent Gaussian process Z₁(s). A threshold τ determines whether precipitation occurs (Z₁(s) > τ) or not (Z₁(s) ≤ τ), thereby creating a point mass at zero. In the second stage, conditional on occurrence, the amount of precipitation is modeled through a second latent Gaussian process Z₂(s). Z₂(s) is linked to the observed accumulation via a site‑specific monotone transformation gₛ(·) that maps the Gaussian marginal to a Gamma distribution, preserving the right‑skewed marginal behavior typical of rainfall.
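The pointwise version of this two-stage construction can be sketched in a few lines. The sketch below is illustrative only and is not the authors' code: the function names (`gamma_cdf`, `gamma_quantile`, `precipitation`) are hypothetical, and it assumes that Z₂(s) has already been standardized to a standard normal marginal, so that gₛ can be written as the Gamma quantile function applied to Φ(z₂). The Gamma quantile is found by simple bisection to keep the example free of third-party dependencies.

```python
import math
from statistics import NormalDist


def gamma_cdf(x, shape, scale):
    """Gamma CDF via the power series of the regularized lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    t = x / scale
    term = 1.0 / shape
    total = term
    n = 1
    # Terms eventually shrink because t / (shape + n) tends to zero.
    while term > 1e-16 * total:
        term *= t / (shape + n)
        total += term
        n += 1
    log_prefactor = -t + shape * math.log(t) - math.lgamma(shape)
    return min(1.0, total * math.exp(log_prefactor))


def gamma_quantile(p, shape, scale):
    """Invert the Gamma CDF by bisection (adequate for a sketch, not optimized)."""
    lo, hi = 0.0, shape * scale
    while gamma_cdf(hi, shape, scale) < p:  # expand until the quantile is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape, scale) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def precipitation(z1, z2, tau, shape, scale):
    """Two-stage rule: zero if z1 <= tau, else a Gamma amount driven by z2.

    Assumption (not from the source): z2 is standard normal at this site.
    """
    if z1 <= tau:
        return 0.0  # point mass at zero: no precipitation
    u = NormalDist().cdf(z2)  # probability integral transform
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep the quantile finite
    return gamma_quantile(u, shape, scale)
```

For example, `precipitation(1.0, 0.0, tau=0.0, shape=1.0, scale=2.0)` returns the median of a Gamma distribution with shape 1 and scale 2, which is 2 ln 2, while any `z1` at or below `tau` yields exactly zero.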
Both latent processes incorporate the deterministic NWP forecast x(s) as a covariate in their means (β₁ᵀx(s) and β₂ᵀx(s)), allowing the model to correct systematic errors while retaining useful dynamical information. Spatial dependence is captured by assuming isotropic, stationary exponential correlation functions: ρ₁(h)=exp(−h/φ₁) for Z₁ and ρ₂(h)=exp(−h/φ₂) for Z₂, where h denotes Euclidean distance and φ₁, φ₂ are range parameters.
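As a small illustration of the exponential correlation function ρ(h)=exp(−h/φ), the following sketch builds the correlation matrix for a handful of sites. The function name `exponential_correlation`, the site coordinates, and the range value are made up for the example and are not taken from the paper.

```python
import math


def exponential_correlation(sites, phi):
    """Correlation matrix with entries exp(-h / phi), h = Euclidean distance."""
    n = len(sites)
    return [
        [math.exp(-math.dist(sites[i], sites[j]) / phi) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical site coordinates (for instance in km) and range parameter.
sites = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
corr = exponential_correlation(sites, phi=5.0)
for row in corr:
    print(["%.3f" % value for value in row])
```

With these coordinates the first two sites are exactly one range length apart, so their correlation is exp(−1); doubling the distance squares that value. A larger φ makes the correlation decay more slowly with distance.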
Parameter estimation proceeds via an Expectation–Maximization (EM) algorithm tailored to the mixed discrete‑continuous likelihood. In the E‑step, the posterior expectations of the latent variables given current parameter values are computed; the M‑step updates regression coefficients, threshold, range parameters, and Gamma shape/scale parameters by maximizing the expected complete‑data log‑likelihood. The authors exploit the Kronecker structure of the covariance matrices to keep computations tractable for the 150‑site network used in the case study.
For prediction, conditional distributions of Z₁(s) and Z₂(s) at unobserved locations are obtained analytically from the fitted Gaussian processes. Monte‑Carlo sampling yields ensembles of binary occurrence fields and corresponding Gamma‑distributed amounts, which are combined to form full predictive distributions of precipitation at each site. From these ensembles, the authors derive point forecasts, prediction intervals, and spatially aggregated quantities (e.g., basin‑wide totals).
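The Monte Carlo step can be illustrated with a simplified, unconditional sketch. It is not the authors' implementation, and several choices are assumptions made for brevity: fields are drawn directly from the two Gaussian processes through a Cholesky factor rather than from conditional distributions, the amount process is taken to be standard normal at each site, and the amount marginal is a Gamma distribution with shape 1 (an exponential), whose quantile function has a closed form. All function and parameter names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def exp_corr(sites, phi):
    """Exponential correlation matrix exp(-h / phi) for the given sites."""
    return [[math.exp(-math.dist(a, b) / phi) for b in sites] for a in sites]


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a positive definite matrix a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_field(mean, chol, rng):
    """One draw of a Gaussian vector with the given mean and Cholesky factor."""
    eps = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(chol[i][k] * eps[k] for k in range(i + 1))
            for i, m in enumerate(mean)]


def simulate_precip_fields(sites, mean1, tau, phi1, phi2, scale, n_members, seed=0):
    """Ensemble of precipitation fields from two correlated latent processes."""
    rng = random.Random(seed)
    chol1 = cholesky(exp_corr(sites, phi1))  # occurrence process
    chol2 = cholesky(exp_corr(sites, phi2))  # amount process
    std_normal = NormalDist()
    zero_mean = [0.0] * len(sites)
    fields = []
    for _ in range(n_members):
        z1 = sample_field(mean1, chol1, rng)
        z2 = sample_field(zero_mean, chol2, rng)
        field = []
        for occ, amt in zip(z1, z2):
            if occ <= tau:
                field.append(0.0)  # dry site
            else:
                u = min(std_normal.cdf(amt), 1.0 - 1e-12)
                field.append(-scale * math.log1p(-u))  # exponential quantile
        fields.append(field)
    return fields


sites = [(0.0, 0.0), (10.0, 0.0), (0.0, 25.0)]
fields = simulate_precip_fields(sites, mean1=[0.2, 0.0, -0.3], tau=0.0,
                                phi1=50.0, phi2=20.0, scale=4.0, n_members=200)
area_means = [sum(f) / len(f) for f in fields]  # spatially aggregated quantity
```

Because every ensemble member is a complete field, spatial aggregates such as `area_means` inherit the spatial correlation automatically, which would not be the case if each site were sampled independently.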
The methodology is evaluated on 48‑hour‑ahead forecasts of daily precipitation over the Pacific Northwest (PNW) of the United States for the year 2004. The reference dataset consists of observations from roughly 150 rain gauges and deterministic forecasts from the Global Forecast System (GFS). Performance is assessed using the Continuous Ranked Probability Score (CRPS), Brier Score for occurrence, Root Mean Square Error (RMSE) for spatial aggregates, and reliability diagrams. Compared with three benchmarks—a single‑stage Gaussian spatial model, a Bayesian linear regression post‑processor, and a climatology baseline—the two‑stage spatial model achieves substantial gains: average CRPS improves by about 12 %, Brier Score for zero‑rain events drops from 0.18 to 0.12, and RMSE for basin‑wide precipitation totals declines by roughly 15 %. Moreover, the predictive intervals are narrower (≈20 % reduction in average width) while maintaining calibration, indicating sharper yet reliable forecasts.
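For readers unfamiliar with the scores mentioned above, the sketch below shows how the CRPS of an ensemble forecast and the Brier score of probability forecasts are commonly computed. The CRPS formula used here is the standard ensemble estimator, mean |xᵢ − y| minus half the mean |xᵢ − xⱼ|; whether the paper uses exactly this estimator is not stated in the summary, and the function names and numbers are illustrative.

```python
def crps_ensemble(members, obs):
    """Ensemble CRPS estimate: mean|x_i - y| - 0.5 * mean|x_i - x_j|."""
    m = len(members)
    accuracy = sum(abs(x - obs) for x in members) / m
    spread = sum(abs(a - b) for a in members for b in members) / (m * m)
    return accuracy - 0.5 * spread


def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# Illustrative numbers only, not data from the paper.
ensemble_mm = [0.0, 0.0, 1.2, 3.5, 0.4]  # sampled accumulations at one site
observed_mm = 0.8
print(crps_ensemble(ensemble_mm, observed_mm))

prob_rain = [0.9, 0.2, 0.6]  # forecast probabilities of precipitation
rained = [1, 0, 0]           # observed occurrence
print(brier_score(prob_rain, rained))
```

Both scores are negatively oriented, so lower values indicate better forecasts. The CRPS reduces to the absolute error when the ensemble collapses to a single value.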
The authors discuss why separating occurrence and amount into distinct latent processes yields superior performance. Z₁ captures large‑scale patterns of rain/no‑rain, which are often driven by different atmospheric dynamics than the magnitude of rain, while Z₂ models the finer‑scale variability of amounts. By allowing each process its own spatial correlation structure, the model can accommodate the fact that occurrence tends to be smoother than intensity. The inclusion of NWP covariates in both stages ensures that useful dynamical guidance is retained, avoiding the over‑smoothing that purely statistical spatial models sometimes produce.
Limitations are acknowledged. The assumption of isotropy may be unrealistic in mountainous terrain where prevailing wind directions induce anisotropic correlation. The computational burden of the EM algorithm grows with the number of sites, although the authors mitigate this with efficient linear algebra. The authors suggest that future work incorporate non‑stationary covariance functions, anisotropic kernels, and multi‑model ensemble inputs, and explore variational inference as a faster alternative to EM.
In conclusion, the paper presents a robust, physically interpretable, and computationally feasible framework for probabilistic precipitation field forecasting. By jointly modeling occurrence and amount with spatially correlated latent Gaussian processes and by anchoring the marginal distribution to a Gamma law, the approach delivers calibrated, sharp, and spatially coherent predictive distributions that outperform conventional post‑processing methods. The results demonstrate that such a two‑stage spatial model can be a valuable tool for hydrological forecasting, water resources management, and any application requiring reliable short‑range precipitation information.