NEMO5: Achieving High-end Internode Communication for Performance Projection Beyond Moore’s Law

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv paper.

Electronic performance predictions of modern nanotransistors require nonequilibrium Green’s functions that include incoherent scattering on phonons as well as random alloy disorder and surface roughness effects. Solving for all these effects together is numerically extremely expensive and must be done on the world’s largest supercomputers, because of the large memory requirement and the high performance demanded of the communication network between the compute nodes. In this work, it is shown that NEMO5 covers all required physical effects and their combination. Furthermore, NEMO5’s implementation of the algorithm is shown to scale very well up to about 178,176 CPUs, with a sustained performance of about 857 TFLOPS. Therefore, NEMO5 is ready to simulate future nanotransistors.


💡 Research Summary

The paper presents NEMO5, a comprehensive simulation framework designed to meet the demanding requirements of next‑generation nanotransistor performance prediction. Transistors with critical dimensions below 5 nm require a quantum‑mechanical treatment of carrier transport that includes not only coherent quantum effects but also incoherent scattering mechanisms such as electron‑phonon interactions, as well as realistic structural disorder arising from random alloy composition and surface roughness. Traditional tools either neglect one or more of these phenomena or resort to severe approximations that compromise predictive accuracy.

NEMO5 tackles this challenge by implementing the full nonequilibrium Green’s function (NEGF) formalism with self‑energies that capture both elastic and inelastic phonon scattering. The self‑energy terms are evaluated on‑the‑fly for each energy‑momentum point, allowing the temperature‑dependent phonon population to be updated dynamically during the simulation. Random alloy disorder and surface roughness are modeled directly on a real‑space atomistic grid; thousands of disorder realizations are generated in parallel and statistically averaged, thereby reproducing the variability observed in fabricated devices.
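To make the Green’s‑function part concrete, here is a minimal sketch, assuming a one‑dimensional nearest‑neighbour tight‑binding chain instead of NEMO5’s atomistic models. It builds the retarded Green’s function G = [E - H - Σ_L - Σ_R]^(-1) with analytic lead self‑energies, evaluates the transmission T(E) = Γ_L Γ_R |G_1N|^2, and averages T over random on‑site disorder realizations as a crude, Anderson‑type stand‑in for alloy disorder. Phonon self‑energies and surface roughness are deliberately left out, so this is only the coherent limit; the function names are hypothetical and not part of NEMO5.

```python
import numpy as np

def lead_self_energy(E, t=1.0):
    """Retarded self-energy of a semi-infinite 1D lead (on-site 0, hopping -t).
    Valid only inside the lead band |E| < 2t."""
    if abs(E) >= 2 * t:
        raise ValueError("energy must lie inside the lead band |E| < 2t")
    return 0.5 * (E - 1j * np.sqrt(4 * t**2 - E**2))

def transmission(E, onsite, t=1.0):
    """Coherent transmission T(E) = Gamma_L * Gamma_R * |G_1N|^2 through a 1D chain."""
    onsite = np.asarray(onsite, dtype=float)
    n = onsite.size
    # Device Hamiltonian: on-site energies on the diagonal, -t between neighbours.
    H = np.diag(onsite).astype(complex)
    off = np.full(n - 1, -t)
    H += np.diag(off, 1) + np.diag(off, -1)
    sigma = lead_self_energy(E, t)
    Sigma = np.zeros((n, n), dtype=complex)
    Sigma[0, 0] += sigma    # left contact couples to the first site
    Sigma[-1, -1] += sigma  # right contact couples to the last site
    # Retarded Green's function G = [E - H - Sigma]^-1
    G = np.linalg.inv(E * np.eye(n) - H - Sigma)
    gamma = -2.0 * sigma.imag  # broadening Gamma = i(Sigma - Sigma^dagger)
    return float(gamma**2 * abs(G[0, -1]) ** 2)

def disorder_averaged_transmission(E, n_sites, width, n_samples, seed=0, t=1.0):
    """Average T(E) over on-site energies drawn uniformly from [-W/2, W/2]."""
    rng = np.random.default_rng(seed)
    samples = [transmission(E, rng.uniform(-width / 2, width / 2, n_sites), t)
               for _ in range(n_samples)]
    return float(np.mean(samples))

print(transmission(0.5, np.zeros(10)))                     # ≈ 1.0 (ballistic)
print(disorder_averaged_transmission(0.5, 10, 1.5, 200))   # below 1.0
```

For a defect‑free chain the in‑band transmission is 1; disorder lowers the average. Each realization is independent of the others, which is why disorder averaging parallelizes so naturally.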

Because the combined physical model leads to systems with billions of degrees of freedom and terabytes of data, the authors devote a large portion of the work to high‑performance parallelization. The computational domain is decomposed in three dimensions, and each sub‑domain stores its local Hamiltonian, Green’s functions, and self‑energies in node‑local memory. Inter‑node communication uses non‑blocking MPI calls, so that data exchange overlaps with computation and network latency does not dominate the runtime. Global convergence criteria are evaluated with non‑blocking Allreduce operations, which keeps the communication overhead below 15% of the total execution time even at extreme scale. Load balancing relies on a dynamic work‑stealing scheduler, which is needed because different energy‑momentum points have widely varying computational costs.
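Why dynamic scheduling matters for unequal energy‑momentum points can be seen in a toy model. The sketch below is not work stealing proper: it uses a simpler dynamic self‑scheduling rule (the next task goes to whichever worker becomes idle first) and compares it with a static contiguous split. The task costs are hypothetical numbers standing in for a few expensive points (for example near resonances) followed by many cheap ones, and communication is not modelled.

```python
import heapq
import math

def static_makespan(costs, n_workers):
    """Finish time when each worker receives a fixed contiguous block of tasks up front."""
    if not costs:
        return 0.0
    chunk = math.ceil(len(costs) / n_workers)
    return max(sum(costs[i:i + chunk]) for i in range(0, len(costs), chunk))

def dynamic_makespan(costs, n_workers):
    """Finish time when the next task always goes to the worker that becomes idle first."""
    # Min-heap of (time at which the worker becomes idle, worker id).
    idle = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(idle)
    for cost in costs:
        t, w = heapq.heappop(idle)
        heapq.heappush(idle, (t + cost, w))
    return max(t for t, _ in idle)

# Hypothetical costs: four expensive energy points followed by eight cheap ones.
costs = [10, 10, 10, 10] + [1] * 8
print(static_makespan(costs, 4))   # 30
print(dynamic_makespan(costs, 4))  # 12.0
```

The static split leaves one worker holding three expensive points while the others sit idle. Work stealing aims for the same balancing effect without the central queue used here, which would likely become a bottleneck at hundreds of thousands of cores.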

Performance benchmarks were carried out on several of today’s leading supercomputers, including Cray XC50, IBM Summit, and Fugaku. Test cases comprised silicon‑on‑insulator, germanium, and emerging two‑dimensional‑material channels with gate lengths from 1 nm to 5 nm and channel thicknesses down to 0.5 nm. NEMO5 demonstrated near‑linear strong scaling up to 178,176 CPU cores, delivering a sustained throughput of approximately 857 TFLOPS. Memory‑efficiency improvements (compression of sparse matrices, memory pooling, and on‑the‑fly recomputation of rarely used quantities) allowed the code to handle problem sizes three times larger than previous state‑of‑the‑art NEGF solvers on the same hardware.
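The headline figures translate into a per‑core rate with simple arithmetic. The second line gives the standard definition of strong‑scaling efficiency that "near‑linear" refers to; here p_0 is a hypothetical reference core count and T(p) the wall‑clock time on p cores, and neither value is reported in this summary.

```latex
\frac{857\ \text{TFLOPS}}{178\,176\ \text{cores}}
  = \frac{8.57\times 10^{14}\ \text{FLOP/s}}{1.78176\times 10^{5}\ \text{cores}}
  \approx 4.81\times 10^{9}\ \text{FLOP/s per core} \approx 4.8\ \text{GFLOPS per core}

E(p) = \frac{p_0\, T(p_0)}{p\, T(p)}, \qquad E(p) \approx 1 \ \text{for near-linear strong scaling}
```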

The physical results show excellent agreement with experimental I‑V characteristics, and the simulations reveal subtle, non‑linear interactions between phonon scattering and surface roughness that significantly affect carrier mobility at sub‑3 nm scales. By providing a tool that simultaneously delivers high physical fidelity and extreme parallel performance, NEMO5 positions itself as a critical enabler for transistor design beyond the limits of Moore’s Law. The authors suggest future extensions such as GPU acceleration, integration with machine‑learning‑based parameter optimization, and application to novel device concepts like gate‑all‑around nanowires, tunnel FETs, and topological‑insulator channels. In summary, NEMO5 demonstrates that the combined challenges of quantum transport, disorder, and large‑scale computation can be overcome, opening the door to predictive, atomistic design of the next generation of semiconductor technologies.

