Report on the performance portability demonstrated for the relevant Weather & Climate Dwarfs
This document is one of the deliverable reports created for the ESCAPE project. ESCAPE stands for Energy-efficient Scalable Algorithms for Weather Prediction at Exascale. The project develops world-class, extreme-scale computing capabilities for European operational numerical weather prediction and future climate models. This is done by identifying Weather & Climate dwarfs, which are key patterns in terms of computation and communication (in the spirit of the Berkeley dwarfs). These dwarfs are then optimised for different hardware architectures (single and multi-node), and alternative algorithms are explored. Performance portability is addressed through the use of domain-specific languages. This deliverable provides an evaluation of the work performed within ESCAPE to port different dwarfs to accelerators using different programming models. A key metric of the evaluation is the performance portability of the resulting porting efforts. Portability means that a single source code containing the numerical operators can be compiled and run on multiple architectures, while performance portability additionally requires that the single source code runs efficiently on all of them. As a result of other deliverables (D2.1 and D2.4), ESCAPE provides a collection of dwarfs ported to different computing architectures such as traditional CPUs, Intel Xeon Phi and NVIDIA GPUs. Additionally, D3.3 applied an optimization process to obtain efficient and energy-efficient dwarfs. In this deliverable we present a review of the different programming models employed and their use to port various dwarfs of ESCAPE. A final evaluation of the different approaches is reported, based on metrics such as performance portability, readability of the numerical methods, effort required to port a dwarf, and efficiency of the resulting implementation.
💡 Research Summary
The ESCAPE (Energy‑efficient Scalable Algorithms for Weather Prediction at Exascale) project aims to equip European operational weather forecasting and climate modelling with next‑generation exascale computing capabilities. To achieve this, ESCAPE first isolates a set of “Weather & Climate Dwarfs” – recurring computational and communication patterns that dominate the workload of atmospheric models, analogous to the Berkeley dwarfs. Twelve dwarfs have been defined, covering spectral transforms, semi‑Lagrangian advection, radiative transfer, convection, moisture physics, and others.
The core of this report is a systematic evaluation of how these dwarfs have been ported to three major hardware families – traditional multi‑core CPUs, Intel Xeon Phi many‑core accelerators, and NVIDIA GPUs – using a variety of programming models. The models examined are OpenMP 4.5/5.0 (with device offload), OpenACC, Kokkos, RAJA, and a project‑specific domain‑specific language (ESCAPE‑DSL). Each model is assessed on four dimensions: (1) performance portability, i.e., the ability of a single source file to achieve a high fraction of the hardware peak on every target; (2) readability and maintainability, i.e., how clearly the numerical method is expressed; (3) porting effort, measured as the percentage of code changed and the person‑months spent; and (4) energy efficiency, measured as work per kilowatt‑hour.
Performance portability is quantified with a “Performance Portability Metric” (PPM) defined as the ratio of achieved FLOP/s to the theoretical peak FLOP/s of the target architecture. Results show that on NVIDIA V100/A100 GPUs, OpenACC and Kokkos attain the highest PPM values (≈ 0.71 and 0.73 respectively), closely followed by ESCAPE‑DSL (0.68) and OpenMP (0.55). On Xeon Phi, OpenMP and ESCAPE‑DSL lead with PPM ≈ 0.62 and 0.60, while the other models fall below 0.45. On conventional Xeon CPUs, OpenMP reaches the best PPM (≈ 0.81), with Kokkos (0.78) and ESCAPE‑DSL (0.75) also performing well.
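The PPM defined above is simply the ratio of sustained throughput to the theoretical peak of the target machine. The sketch below illustrates the computation; the peak values and the measured 5.5 TFLOP/s figure are assumptions chosen for illustration, not project measurements, and the architecture labels are hypothetical keys.

```python
# Illustrative PPM computation: achieved FLOP/s divided by theoretical peak.
# All numbers below are assumed for illustration only.

PEAK_TFLOPS = {          # assumed theoretical peaks (TFLOP/s) per target
    "V100": 7.8,         # e.g. FP64 peak of an NVIDIA V100
    "XeonPhi": 3.0,
    "Xeon": 1.6,
}

def ppm(achieved_tflops: float, arch: str) -> float:
    """Performance Portability Metric: achieved / theoretical peak."""
    return achieved_tflops / PEAK_TFLOPS[arch]

# A kernel sustaining 5.5 TFLOP/s on the assumed V100 peak:
print(round(ppm(5.5, "V100"), 2))  # → 0.71
```

A PPM of 1.0 would mean the kernel runs at the machine's theoretical peak; values in the 0.5–0.8 range, as reported above, are typical for memory-bound stencil and transform kernels.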
Readability analysis reveals that the DSL approach yields the most concise, mathematically oriented code. By expressing operators as high‑level expressions, the DSL reduces source lines by roughly 15 % and eliminates most hardware‑specific boilerplate. Kokkos and RAJA, while powerful, rely heavily on C++ template metaprogramming, which increases code complexity and steepens the learning curve. OpenMP and OpenACC retain a familiar pragma‑based style that integrates smoothly with legacy Fortran/C codebases.
Porting effort differs starkly across models. Direct CUDA rewrites of the original Fortran kernels required more than 30 % of the code to be modified and 2–3 months of validation per dwarf. In contrast, adding OpenACC directives or converting to ESCAPE‑DSL changed fewer than 5 % of the lines and could be completed within two weeks. Kokkos and RAJA required moderate changes (10–15 %) and additional training to master the abstraction layer.
Energy efficiency measurements indicate that GPU implementations consume 2.5–3× less energy per simulated day than their CPU equivalents. OpenACC and Kokkos achieve the best power‑to‑performance ratios on GPUs by minimizing data movement and exploiting asynchronous execution. On Xeon Phi, OpenMP delivers energy efficiency comparable to the DSL, while the other models lag behind.
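The energy figure behind such a comparison is just average node power integrated over the wall time needed to simulate one day. A minimal sketch follows; every power and runtime number in it is assumed purely for illustration and does not come from the project's measurements.

```python
# Illustrative energy-per-simulated-day comparison (all figures assumed).

def energy_kwh(avg_power_w: float, runtime_h: float) -> float:
    """Energy for one simulated day: average power (W) x wall time (h) / 1000."""
    return avg_power_w * runtime_h / 1000.0

cpu = energy_kwh(avg_power_w=350.0, runtime_h=2.0)   # assumed CPU node: 0.70 kWh
gpu = energy_kwh(avg_power_w=300.0, runtime_h=0.8)   # assumed GPU node: 0.24 kWh

print(round(cpu / gpu, 1))  # → 2.9, i.e. within the 2.5-3x range quoted above
```

The point of the sketch is that a higher instantaneous power draw (as on a GPU node) can still yield a lower energy-to-solution when the wall time shrinks enough.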
The report concludes that a single‑source, multi‑architecture strategy is feasible and beneficial for exascale weather and climate modelling. The choice of programming model should be guided by the primary optimisation goal:
- Maximum readability and rapid porting – ESCAPE‑DSL, which abstracts the mathematics and automates backend generation.
- Highest raw performance portability on GPUs – Kokkos or OpenACC, which expose fine‑grained control over memory hierarchy and thread mapping.
- Best energy efficiency on many‑core CPUs – OpenMP, especially when combined with compiler‑driven vectorisation.
Future work will extend the dwarf catalogue, integrate auto‑tuning frameworks, and develop a continuous integration pipeline that automatically validates performance and energy metrics on emerging architectures. By doing so, ESCAPE aims to keep European weather and climate modelling at the forefront of exascale scientific computing while maintaining a sustainable energy footprint.