Report on energy-efficiency evaluation of several NWP model configurations


This document is one of the deliverable reports created for the ESCAPE project. ESCAPE stands for Energy-efficient Scalable Algorithms for Weather Prediction at Exascale. The project develops world-class, extreme-scale computing capabilities for European operational numerical weather prediction and future climate models. This is done by identifying Weather & Climate dwarfs, key patterns in terms of computation and communication (in the spirit of the Berkeley dwarfs). These dwarfs are then optimised for different hardware architectures (single and multi-node), and alternative algorithms are explored. Performance portability is addressed through the use of domain-specific languages. In this deliverable we report on energy consumption measurements of a number of NWP models/dwarfs on the Intel E5-2697v4 processor. The chosen energy metrics and energy measurement methods are documented. Energy measurements are performed on the Bi-Fourier dwarf (BiFFT), the Acraneb dwarf, the ALARO 2.5 km Local Area Model reference configuration (Bénard et al. 2010, Bubnová et al. 1995) and on the COSMO-EULAG Local Area Model reference configuration (Piotrowski et al. 2018). The results show a U-shaped dependence of the consumed energy on the wall-clock time performance. This shape can be explained by the dependence of the average power of the compute nodes on the total number of cores used. We compare the energy consumption of the BiFFT dwarf on the E5-2697v4 processor to that on the Optalysys optical processors. The latter are found to be much less energy costly, but energy is the only metric on which they outperform the classical CPU: they are non-competitive as far as wall-clock time and especially numerical precision are concerned.


💡 Research Summary

The ESCAPE (Energy‑efficient Scalable Algorithms for Weather Prediction at Exascale) project aims to develop extreme‑scale computing capabilities for European operational numerical weather prediction (NWP) and climate modelling while keeping energy consumption under control. This deliverable presents a systematic energy‑efficiency evaluation of several representative NWP “dwarfs” executed on a dual‑socket Intel Xeon E5‑2697v4 platform (18 cores per socket, 2.3 GHz, 64 GB DDR4 per node) connected via HDR InfiniBand. The dwarfs examined are: (1) BiFFT – a two‑dimensional bidirectional Fast Fourier Transform kernel, (2) Acraneb – an atmospheric radiation transfer kernel, (3) the ALARO 2.5 km Local Area Model (LAM) reference configuration, and (4) the COSMO‑EULAG LAM reference configuration.

Measurement methodology – Power was recorded simultaneously with an external power‑distribution unit (PDU) and the processor’s internal RAPL counters, both sampled at intervals of at most 1 s. For each dwarf the total wall‑clock time, average node power, and cumulative energy (Joules) were logged. Derived metrics include energy per simulated day (J / sim‑day) and energy per core‑hour (J / core·h). Experiments were performed with varying core counts (18, 36, 72, 144, 288) while keeping the problem size constant, thereby exposing the relationship between parallelism, performance, and energy consumption.
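The post-processing described above can be sketched as follows. This is a minimal illustration, not the report's actual tooling: it assumes node power is sampled at a fixed interval (from the PDU, or by differencing RAPL `energy_uj` readings) and derives cumulative energy and the J / core·h metric from those samples.

```python
def energy_joules(power_w, dt_s):
    """Integrate power samples (W) taken every dt_s seconds into energy (J),
    using the trapezoidal rule between adjacent samples."""
    if len(power_w) < 2:
        return 0.0
    return sum((a + b) * 0.5 * dt_s for a, b in zip(power_w, power_w[1:]))

def energy_per_core_hour(total_j, n_cores, wall_s):
    """Derived metric from the report: Joules per core-hour."""
    return total_j / (n_cores * wall_s / 3600.0)

# A steady 100 W node measured for 10 s at 1 s intervals yields 1000 J.
e = energy_joules([100.0] * 11, 1.0)
```

On Linux, the RAPL side of such a measurement can be read without extra tooling from `/sys/class/powercap/intel-rapl:0/energy_uj` (a cumulative microjoule counter), though the report does not state which interface it used.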

Key findings – All dwarfs exhibit a pronounced U‑shaped dependence of total energy on wall‑clock time. At low core counts the average power is modest but the runtime is long, leading to relatively high energy use. As cores are added, runtime drops sharply while power rises only modestly, reaching a minimum energy point typically around 72–144 cores. Beyond this region, additional cores increase average power (due to higher static power, memory controller activity, and network traffic) while the marginal reduction in runtime diminishes, causing total energy to rise again. This non‑linear behaviour underscores that “more cores = less energy” is not a universal rule; optimal energy efficiency requires careful placement on the power‑performance curve.

The two LAM configurations (ALARO and COSMO‑EULAG) behave differently from the compute‑bound kernels. Their scalability is limited by memory bandwidth and inter‑node communication. As core counts increase, network contention and cache‑coherency traffic cause average node power to climb to ≈210 W (COSMO‑EULAG) and erode the energy advantage of additional cores. Consequently, the energy minima for these models occur at relatively modest core counts, and the U‑shape is steeper than for BiFFT or Acraneb.

A comparative experiment with the Optalysys optical processor was also performed for the BiFFT dwarf. The optical system consumed roughly 30 % less energy than the Xeon platform but required more than four times the wall‑clock time to complete the same FFT workload. Moreover, its floating‑point precision (≈1.2 % error relative to 32‑bit IEEE) fell short of operational forecast tolerances. Thus, while optical computing shows promise for low‑energy operation, it is currently non‑competitive in terms of speed and numerical fidelity for production NWP.
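One way to make the "lower energy but slower" trade-off above concrete is the energy-delay product (EDP), a standard efficiency figure of merit not used explicitly in the report. Plugging in the quoted ratios (roughly 30 % less energy, more than 4× the runtime, with a hypothetical CPU baseline of 1000 J in 100 s) shows the optical system losing once time is weighted in:

```python
def energy_delay_product(energy_j, time_s):
    """EDP = energy x time: penalises slow runs even when they save energy."""
    return energy_j * time_s

# Hypothetical baseline numbers; only the ratios (0.7x energy, 4x time)
# come from the summary.
cpu_edp = energy_delay_product(1000.0, 100.0)
opt_edp = energy_delay_product(700.0, 400.0)
ratio = opt_edp / cpu_edp  # 2.8: the optical run is worse on EDP
```

Because EDP multiplies the two quantities, a 30 % energy saving cannot offset a 4× slowdown; this is one quantitative sense in which the optical processor is "non-competitive" despite its lower power draw.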

Implications for exascale weather prediction – The results highlight three practical lessons. First, energy‑aware scheduling must consider the full power‑performance curve rather than naïvely maximizing core usage. Second, algorithmic redesign that reduces communication volume, overlaps computation with data movement, or exploits heterogeneous cores (e.g., mixing high‑performance and low‑power cores) can shift the energy minimum toward higher parallelism. Third, the use of domain‑specific languages (DSLs) under development in ESCAPE enables portable expression of these optimisations across CPUs, GPUs, and emerging architectures, facilitating systematic energy‑performance tuning at exascale.

In conclusion, this deliverable quantifies the energy characteristics of representative NWP kernels on a modern Intel Xeon system, reveals a consistent U‑shaped energy‑time relationship, and demonstrates that optical processors, despite lower power draw, are not yet viable for operational forecasting. Future work will focus on building predictive power‑performance models, integrating DSL‑driven auto‑tuning, and exploring heterogeneous and optical accelerators to achieve truly energy‑efficient exascale weather and climate simulations.

