Machine learning emulation of precipitation from km-scale UK regional climate simulations using a diffusion model

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv paper.

High-resolution climate simulations are valuable for understanding climate change impacts. This has motivated the use of regional convection-permitting climate models (CPMs), but these are very computationally expensive. We present a convection-permitting model generative emulator (CPMGEM) that skilfully emulates precipitation simulations by a 2.2km-resolution regional CPM at much lower cost. This utilises a generative machine learning approach, a diffusion model. It takes inputs at the 60km resolution of the driving global climate model and downscales these to 8.8km, with daily-mean time resolution, capturing the effect of convective processes represented in the CPM at these scales. The emulator is trained on simulations over England and Wales from the United Kingdom Climate Projections Local product, covering years between 1980 and 2080 following a high emissions scenario. The output precipitation has spatial structure and intensity distribution similar to those of the CPM simulations. The emulator is stochastic, which improves the realism of samples. We include some evidence of the emulator’s skill for extreme events with return times up to ~100 years. We demonstrate successful transfer from a “perfect model” training setting to application using GCM variable inputs. The emulator captures the main features of the simulated 21st century climate change, but exhibits some error in the magnitude. We also show that the method can be useful in situations with limited amounts of high-resolution data. Potential applications include producing high-resolution precipitation predictions for large-ensemble climate simulations and producing output based on different GCMs and climate change scenarios to better sample uncertainty.


💡 Research Summary

Paper Overview
The authors introduce a novel generative emulator, CPMGEM (Convection‑Permitting Model Generative EMulator), that reproduces the high‑resolution precipitation output of a 2.2 km convection‑permitting regional climate model (CPM) over England and Wales at a fraction of the computational cost. The emulator is built on a diffusion model, a stochastic generative neural network that learns to reverse a forward noising process. It is conditioned on coarse‑resolution (60 km) fields at the resolution of the driving global climate model (GCM): mean sea‑level pressure, together with specific humidity, temperature and vorticity at several pressure levels (including 250, 500, 700 and 850 hPa). From these inputs it directly generates daily‑mean precipitation fields at 8.8 km resolution, the grid obtained by conservatively coarsening the original 2.2 km CPM output (8.8 km = 4 × 2.2 km).

Data and Training Regime
Training data are drawn from the UK Climate Projections (UKCP) Local product, which provides a 12‑member ensemble of CPM simulations driven by UKCP18 GCM runs under the high‑emissions RCP8.5 scenario. Three 20‑year periods are used: “Historic” (1981‑2000), “Present” (2021‑2040) and “Future” (2061‑2080). Across the 12 members this yields 720 simulated years of daily‑mean precipitation, with 360 days per year because the models use a 360‑day calendar. The CPM precipitation is first regridded to 8.8 km using a conservative (mass‑conserving) scheme, which preserves the most energetic scales of the precipitation field while reducing storage and computational demands. The 60 km predictor fields are prepared for each day as the conditional input to the diffusion model. In the “perfect model” training setting they are derived from the CPM itself, coarsened to the GCM grid, and the trained emulator is later applied to the corresponding GCM fields.
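
The coarsening step can be illustrated with a minimal sketch. It assumes an idealised regular grid of equal‑area cells whose dimensions divide evenly by four (8.8 km / 2.2 km = 4). On such a grid, averaging each 4 × 4 block of fine cells conserves the area‑integrated precipitation. Real CPM grids need not be equal‑area, and there a dedicated area‑weighted regridder would be needed; Iris's `iris.analysis.AreaWeighted` scheme is one option. The function name `coarsen_conservative` is hypothetical, and the synthetic gamma‑distributed field stands in for real CPM output. The block also checks the dataset‑size arithmetic above: 12 members × 3 periods × 20 years.

```python
import numpy as np


def coarsen_conservative(field: np.ndarray, factor: int = 4) -> np.ndarray:
    """Block-average a 2-D field by an integer factor (hypothetical helper).

    On a regular grid of equal-area cells, the mean over each factor x factor
    block conserves the area-integrated quantity, which is what a conservative
    regrid from 2.2 km to 8.8 km does in that idealised case.
    """
    ny, nx = field.shape
    if ny % factor or nx % factor:
        raise ValueError("grid dimensions must be divisible by the coarsening factor")
    # Split each axis into (coarse index, offset within block), then average the offsets.
    blocks = field.reshape(ny // factor, factor, nx // factor, factor)
    return blocks.mean(axis=(1, 3))


# Dataset size implied by the setup: 12 members x 3 periods x 20 years, 360-day calendar.
MEMBERS, PERIODS, YEARS_PER_PERIOD, DAYS_PER_YEAR = 12, 3, 20, 360
n_years = MEMBERS * PERIODS * YEARS_PER_PERIOD  # 720 simulated years
n_days = n_years * DAYS_PER_YEAR                # 259,200 daily fields

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fine = rng.gamma(shape=0.5, scale=4.0, size=(64, 64))  # synthetic 2.2 km rain (mm/day)
    coarse = coarsen_conservative(fine)                      # 16 x 16 cells at 8.8 km
    # Each coarse cell covers 16 fine cells, so area-weighted totals agree.
    assert np.isclose(fine.sum(), coarse.sum() * 16)
    print(coarse.shape, n_years, n_days)
```

On a non‑uniform grid, the simple block mean would have to be replaced by weights proportional to each fine cell's overlap area with the coarse cell.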

The diffusion architecture follows a UNet‑style backbone with time‑step embeddings. It was trained for 1000 epochs with a batch size of 8 on a single consumer‑grade GPU with about 10 GB of memory (e.g., an RTX 2080 Ti, which has 11 GB). Training minimises a mean‑squared denoising error, and the forward noise schedule is fixed in advance rather than learned. Training completes in roughly two days, after which inference can generate thousands of stochastic precipitation realizations per second on the same hardware.
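
The denoising objective and stochastic sampling can be sketched with a generic DDPM‑style (denoising diffusion probabilistic model) formulation in NumPy. This illustrates the technique under stated assumptions and is not CPMGEM's implementation. The linear noise schedule, the noise‑prediction loss and the ancestral sampler are common textbook choices that the paper may not use. The trained UNet is replaced by a hypothetical closed‑form denoiser, `make_gaussian_oracle`, which is exact when each target grid cell is Gaussian with mean equal to the conditioning field. That stand‑in lets the sketch run end to end. It also shows how repeated sampling with one conditioning input yields many different realizations.

```python
import numpy as np

# Assumed linear noise schedule with T steps (a common DDPM default, not taken from the paper).
T = 1000
BETAS = np.linspace(1e-4, 0.02, T)
ALPHAS = 1.0 - BETAS
ALPHA_BARS = np.cumprod(ALPHAS)


def _bcast(values, ndim):
    """Reshape a per-sample vector so it broadcasts over the spatial dimensions."""
    return values.reshape(-1, *([1] * (ndim - 1)))


def q_sample(x0, t, noise):
    """Forward noising: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    ab = _bcast(ALPHA_BARS[t], x0.ndim)
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * noise


def denoising_loss(model, x0, cond, rng):
    """Mean-squared error between the injected noise and the model's prediction of it."""
    t = rng.integers(0, T, size=x0.shape[0])  # one random diffusion step per sample
    noise = rng.standard_normal(x0.shape)
    x_t = q_sample(x0, t, noise)
    return float(np.mean((noise - model(x_t, t, cond)) ** 2))


def sample(model, cond, n_samples, rng):
    """Ancestral sampling: start from pure noise and apply T reverse steps."""
    x = rng.standard_normal((n_samples, *cond.shape))
    for t in range(T - 1, -1, -1):
        eps_hat = model(x, np.full(n_samples, t), cond)
        mean = (x - BETAS[t] / np.sqrt(1.0 - ALPHA_BARS[t]) * eps_hat) / np.sqrt(ALPHAS[t])
        if t > 0:
            x = mean + np.sqrt(BETAS[t]) * rng.standard_normal(x.shape)  # fresh noise each step
        else:
            x = mean  # no noise is added on the final step
    return x


def make_gaussian_oracle(std):
    """Hypothetical stand-in for the trained UNet.

    Returns the exact noise predictor E[noise | x_t] when every grid cell of the
    target is independently Gaussian with mean `cond` and standard deviation `std`.
    """
    def model(x_t, t, cond):
        ab = _bcast(ALPHA_BARS[t], x_t.ndim)
        return np.sqrt(1.0 - ab) / (ab * std**2 + 1.0 - ab) * (x_t - np.sqrt(ab) * cond)
    return model


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cond = np.full((4, 4), 2.0)  # toy conditioning field on the output grid
    model = make_gaussian_oracle(std=0.5)
    x0 = cond + 0.5 * rng.standard_normal((256, 4, 4))
    print("loss:", denoising_loss(model, x0, cond, rng))
    samples = sample(model, cond, n_samples=500, rng=rng)
    print(samples.mean(), samples.std())  # should be close to 2.0 and 0.5
```

Each call to `sample` starts from fresh Gaussian noise, so the same conditioning field produces a different realization every time. That is the property that makes the emulator stochastic. Real precipitation is non‑negative and strongly skewed, so a practical emulator would need some transformation of the target that this toy example omits.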

Performance Evaluation

  1. Climatological Skill – In the “perfect‑model” setting (i.e., with coarse inputs derived from the CPM itself rather than from the GCM), CPMGEM reproduces the spatial mean, variance and spatial autocorrelation of the 8.8 km CPM precipitation, with an R² of about 0.94 reported for the full probability density function. Maps show that convective bands, frontal structures and mesoscale organization are faithfully reproduced, even though the output grid is coarser than the 2.2 km CPM grid.

  2. Extreme‑Event Representation – Because the diffusion model is stochastic, many independent realizations can be generated for a single GCM forcing. Using empirical return‑period curves, the authors provide evidence that events with return periods up to ~100 years are captured in both magnitude and spatial footprint. For daily values under the 360‑day calendar, a 100‑year event corresponds to an exceedance probability of about 1 in 36,000 days, or roughly 0.003 % of days. This is a notable improvement over deterministic downscaling methods, which typically underestimate tail behaviour.

  3. Climate‑Change Signal Transfer – The model is then applied to GCM predictor fields that were not part of the training set (a “transfer” experiment). The resulting precipitation fields show the same direction of change as the CPM (increased mean precipitation, stronger clustering of heavy rain) across the 21st century. However, the magnitude of the simulated change carries a systematic error of 5‑10 %. This indicates that the emulator learns the pattern of change but does not fully preserve its amplitude.

  4. Data‑Efficiency Tests – When the training set is reduced to 30 % of the original size, the emulator still retains most of its skill, suggesting that the diffusion framework is robust to limited high‑resolution data—a valuable property for regions where CPM simulations are scarce.

  5. Computational Gains – Generating a diffusion sample requires many network evaluations, one per reverse‑diffusion step. Even so, CPMGEM produces an 8.8 km daily precipitation field on a consumer‑grade GPU in a fraction of a second, whereas a full CPM run at 2.2 km resolution takes days to weeks of wall‑clock time. This translates into a speed‑up of three to four orders of magnitude, making it feasible to generate large ensembles for impact studies (e.g., flood risk, water resources) that would otherwise be prohibitive.
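
The return‑period analysis in item 2 can be sketched as follows. The sketch assumes an empirical estimator based on annual maxima with the Weibull plotting position, T = (n + 1) / rank; the paper's exact method may differ, and the helper names are hypothetical. Pooling many stochastic emulator realizations for the same forcing lengthens the effective record, which is what makes ~100‑year return levels estimable. The second helper converts a return period into a per‑day exceedance probability under the 360‑day calendar. For a 100‑year event this gives roughly 1 in 36,000 days. The synthetic gamma‑distributed series is a stand‑in for real emulator output.

```python
import numpy as np

DAYS_PER_YEAR = 360  # the UKCP18 models use a 360-day calendar


def empirical_return_periods(daily, days_per_year=DAYS_PER_YEAR):
    """Return annual maxima (largest first) and their empirical return periods in years.

    Uses the Weibull plotting position T = (n + 1) / rank on the annual maxima of a
    1-D daily series whose length is a whole number of years (an assumed estimator).
    """
    if daily.size % days_per_year:
        raise ValueError("series length must be a whole number of years")
    annual_max = daily.reshape(-1, days_per_year).max(axis=1)
    levels = np.sort(annual_max)[::-1]  # largest annual maximum gets rank 1
    ranks = np.arange(1, levels.size + 1)
    return levels, (levels.size + 1) / ranks


def daily_exceedance_for_return_period(years, days_per_year=DAYS_PER_YEAR):
    """Per-day exceedance probability of a level exceeded on average once every `years` years."""
    return 1.0 / (years * days_per_year)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic stand-in for pooled emulator samples at one grid cell: 720 years of daily rain.
    daily = rng.gamma(shape=0.6, scale=5.0, size=720 * DAYS_PER_YEAR)
    levels, periods = empirical_return_periods(daily)
    print(periods[0])  # 721.0: the largest of 720 annual maxima
    print(daily_exceedance_for_return_period(100))  # about 2.8e-5, i.e. ~0.003 % of days
```

Annual maxima avoid the serial dependence between consecutive wet days that complicates estimates based on all daily values. The trade‑off is that only one value per year is used, which is why a long pooled record matters.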

Limitations and Future Directions
The authors acknowledge several constraints: (i) the emulator currently outputs only daily means, so sub‑daily dynamics (e.g., hourly intensity peaks) are not captured; (ii) the training period does not include climate states far outside the RCP8.5 trajectory, leaving extrapolation to more extreme warming scenarios untested; (iii) the residual magnitude bias suggests a need for physics‑based post‑processing or hybrid approaches that embed conservation laws or convective parameterizations directly into the generative network. Future work could explore conditional diffusion models with multi‑scale time embeddings, incorporation of physical loss terms, and meta‑learning techniques to improve transfer across disparate GCMs and emission pathways.

Implications
CPMGEM demonstrates that diffusion‑based generative AI can serve as a powerful, stochastic downscaling tool for convection‑permitting climate models. By preserving both the spatial structure and the stochastic variability of high‑resolution precipitation, it bridges the gap between computationally expensive CPM ensembles and the need for high‑resolution climate information in impact assessments. Its ability to operate with limited training data further broadens its applicability to other regions and to emerging high‑resolution modeling initiatives worldwide. If refined to address the noted biases and extended to finer temporal resolutions, such emulators could become a standard component of next‑generation climate‑impact pipelines, enabling robust quantification of extreme‑event risk under a wide range of future climate scenarios.

