Report on workflow analysis for specific LAM applications
This document is one of the deliverable reports created for the ESCAPE project. ESCAPE stands for Energy-efficient Scalable Algorithms for Weather Prediction at Exascale. The project develops world-class, extreme-scale computing capabilities for European operational numerical weather prediction and future climate models. This is done by identifying Weather & Climate dwarfs, which are key patterns in terms of computation and communication (in the spirit of the Berkeley dwarfs). These dwarfs are then optimised for different hardware architectures (single and multi-node) and alternative algorithms are explored. Performance portability is addressed through the use of domain-specific languages.

In this deliverable we focus on the RMI-EPS ensemble prediction suite. We first provide a detailed report on the workflow of the suite, in which five main categories of jobs are defined: pre-processing, lateral boundary conditions (LBCs), data assimilation, forecast and post-processing. Combined energy and wall-clock time measurements of the entire RMI-EPS suite were performed. They indicate that the wall-clock times are spread relatively evenly between the various defined job categories, with the forecast accounting for the largest fraction at about 35%. As far as energy consumption is concerned, the forecast part dwarfs everything else and is responsible for up to 99% of the total energy consumption. This means that energy optimizations for the forecast part will translate almost proportionally into optimizations of the whole suite, while the maximum theoretical speed-up due to forecast optimizations alone cannot exceed a factor of about 3/2. Therefore, in terms of energy consumption, optimizations should first focus on the forecast part. For wall-clock time performance gains, however, optimizations (and possibly additional dwarfs) can be considered for the categories outside of the forecast part.
💡 Research Summary
The ESCAPE project (Energy‑efficient Scalable Algorithms for Weather Prediction at Exascale) aims to deliver world‑class, extreme‑scale computing capabilities for European operational numerical weather prediction (NWP) and future climate modelling. A central tenet of the project is the identification of “Weather & Climate dwarfs” – recurring computational and communication patterns analogous to the Berkeley dwarfs – which can be targeted for optimisation across diverse hardware architectures. In this deliverable the authors focus on the RMI‑EPS suite, the operational ensemble prediction workflow of the Royal Meteorological Institute (RMI) of Belgium.
The report first dissects the RMI‑EPS workflow into five distinct job categories:
- Pre‑processing – ingestion, quality control, and conversion of raw observational data into model‑ready initial conditions.
- Lateral Boundary Conditions (LBCs) – retrieval and interpolation of global model output to provide time‑varying boundary data for the regional model.
- Data Assimilation – integration of observations into the model state using sophisticated algorithms such as 4‑D‑Var or Ensemble Kalman Filters, which involve large linear‑system solves.
- Forecast – execution of the model’s dynamical core on high‑resolution grids, typically employing thousands of MPI ranks and GPU accelerators.
- Post‑processing – generation of diagnostic fields, statistical verification, and conversion of model output into user‑friendly products.
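The large linear‑system solves mentioned under data assimilation are typically tackled with preconditioned Krylov methods. The sketch below is a minimal Jacobi‑preconditioned conjugate gradient in NumPy – an illustration of the general technique, not the suite’s actual solver, and the small test matrix is invented for the example:

```python
import numpy as np

def pcg(A, b, tol=1e-10, max_iter=100):
    """Jacobi-preconditioned conjugate gradient for a symmetric positive-definite A."""
    M_inv = 1.0 / np.diag(A)          # Jacobi preconditioner: inverse of the diagonal
    x = np.zeros_like(b)
    r = b - A @ x                     # initial residual
    z = M_inv * r                     # preconditioned residual
    p = z.copy()
    for _ in range(max_iter):
        Ap = A @ p
        alpha = (r @ z) / (p @ Ap)
        x += alpha * p
        r_new = r - alpha * Ap
        if np.linalg.norm(r_new) < tol:
            break
        z_new = M_inv * r_new
        beta = (r_new @ z_new) / (r @ z)
        p = z_new + beta * p
        r, z = r_new, z_new
    return x

# Tiny SPD system standing in for an assimilation solve.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = pcg(A, b)
print(np.allclose(A @ x, b))  # -> True
```

In operational systems the matrix is never formed explicitly; the solver only needs matrix‑vector products, which is what makes Krylov methods attractive at scale.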
To quantify the relative importance of each category, the authors performed combined energy‑and‑wall‑clock measurements on a full RMI‑EPS run (48‑hour forecast). Energy consumption was captured using on‑node power meters and aggregated across the job steps; wall‑clock times were obtained from the batch scheduler logs. The results reveal a striking imbalance: the forecast step alone accounts for approximately 99 % of the total energy consumption, while it consumes only about 35 % of the total execution time. The remaining 65 % of the wall‑clock time is distributed among the other four categories (pre‑processing ~15 %, LBCs ~20 %, data assimilation ~10 %, post‑processing ~20 %).
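The imbalance can be made explicit with a short calculation using only the percentages quoted above (the numbers are those reported in the text, not new measurements):

```python
# Per-category wall-clock shares as reported for the full RMI-EPS run.
wall_clock_share = {
    "pre-processing": 0.15,
    "LBCs": 0.20,
    "data assimilation": 0.10,
    "forecast": 0.35,
    "post-processing": 0.20,
}
energy_share = {"forecast": 0.99, "everything else": 0.01}

# Sanity check: the wall-clock shares should cover the whole run.
assert abs(sum(wall_clock_share.values()) - 1.0) < 1e-9

# How energy-intensive is the forecast relative to its time share?
intensity = energy_share["forecast"] / wall_clock_share["forecast"]
print(f"Forecast draws {intensity:.2f}x its time-proportional energy share")
```

The forecast draws roughly 2.8 times the energy one would expect from its time share alone, which is the quantitative core of the report's optimisation argument.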
These findings have direct implications for optimisation strategy. From an energy‑efficiency perspective, the forecast stage is the obvious target. Potential avenues include:
- Re‑engineering the dynamical core to exploit mixed‑precision arithmetic, thereby reducing floating‑point activity on GPUs.
- Replacing existing spectral transforms and linear‑solver kernels with low‑power GPU‑native implementations that minimise data movement.
- Introducing algorithmic refinements such as adaptive time‑stepping or reduced‑order models for less critical ensemble members.
Because the forecast stage dominates the power budget, any reduction in its energy draw translates almost linearly into a reduction of the whole suite’s energy consumption.
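As a toy illustration of the mixed‑precision avenue above (not code from the deliverable), storing model state in single rather than double precision halves the bytes moved per time step, which on bandwidth‑bound kernels translates directly into energy savings:

```python
import numpy as np

n = 1_000_000  # hypothetical number of grid points

state64 = np.zeros(n, dtype=np.float64)  # double-precision model state
state32 = state64.astype(np.float32)     # the same field in single precision

# Half the bytes per field -> half the data moved per time step.
print(state64.nbytes // state32.nbytes)  # -> 2
```

The real engineering challenge, of course, is identifying which parts of the model tolerate the reduced precision without degrading forecast skill.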
Conversely, for wall‑clock performance gains, the analysis suggests a broader focus. The forecast step, while still important, offers a theoretical maximum speed‑up of only about 1.5× for the entire workflow (Amdahl’s law, given its 35 % share of total time). Therefore, substantial further reductions in total runtime require optimisation of the other stages:
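The 1.5× bound follows directly from Amdahl’s law applied to the 35 % time share quoted above:

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the runtime is accelerated by a factor s."""
    return 1.0 / ((1.0 - p) + p / s)

p = 0.35  # forecast's share of total wall-clock time

# Even an infinitely fast forecast caps the suite-wide speedup at 1/(1-p).
limit = 1.0 / (1.0 - p)
print(f"{amdahl_speedup(p, 10):.2f}x with a 10x faster forecast")  # -> 1.46x
print(f"{limit:.2f}x theoretical maximum")                         # -> 1.54x, about 3/2
```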
- Pre‑processing can benefit from asynchronous I/O, parallel file formats (e.g., NetCDF‑4/HDF5 with collective buffering), and on‑the‑fly data compression.
- LBCs suffer from heavy I/O and interpolation costs; caching of boundary fields, pipelining of read‑compute‑write operations, and employing GPU‑accelerated interpolation kernels can alleviate the bottleneck.
- Data Assimilation is compute‑intensive; improved preconditioners, multigrid solvers, and hybrid MPI‑OpenMP‑GPU parallelism can accelerate the large linear solves.
- Post‑processing can be accelerated by moving statistical calculations onto GPUs, using vectorised libraries, and parallelising visualisation pipelines.
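The read–compute pipelining suggested for the LBC stage can be sketched with a thread pool that prefetches the next boundary file while the current one is being interpolated. File names and both helper functions are placeholders, not the suite’s actual I/O layer:

```python
from concurrent.futures import ThreadPoolExecutor

def read_boundary(name):
    # Placeholder for reading one global-model output file.
    return {"name": name, "data": list(range(4))}

def interpolate(field):
    # Placeholder for interpolation onto the regional grid.
    return [x * 2 for x in field["data"]]

files = ["lbc_000.grb", "lbc_003.grb", "lbc_006.grb"]  # hypothetical file names
results = []

with ThreadPoolExecutor(max_workers=1) as pool:
    pending = pool.submit(read_boundary, files[0])
    for nxt in files[1:] + [None]:
        field = pending.result()                       # wait for the prefetched read
        if nxt is not None:
            pending = pool.submit(read_boundary, nxt)  # start reading the next file
        results.append(interpolate(field))             # compute overlaps the read
```

Because the read of file *k+1* runs while file *k* is interpolated, the I/O latency is hidden behind compute whenever interpolation takes at least as long as a read.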
The authors also discuss the concept of “additional dwarfs” – new domain‑specific patterns that may emerge when optimising the non‑forecast stages. For example, efficient I/O handling and data‑compression dwarfs become critical in the pre‑processing and LBC phases, while linear‑solver dwarfs dominate data assimilation.
In summary, the report provides a comprehensive quantitative profile of the RMI‑EPS workflow, highlighting that energy optimisation should be concentrated on the forecast component, whereas wall‑clock time reductions demand a multi‑pronged approach that also targets pre‑processing, LBCs, data assimilation, and post‑processing. These insights guide future ESCAPE efforts to develop performance‑portable, energy‑efficient exascale weather‑prediction codes, ensuring that both computational speed and power consumption meet the stringent requirements of operational meteorology.