High Performance Direct Gravitational N-body Simulations on Graphics Processing Units -- II: An implementation in CUDA


📝 Original Info

  • Title: High Performance Direct Gravitational N-body Simulations on Graphics Processing Units – II: An implementation in CUDA
  • ArXiv ID: 0707.0438
  • Date: 2008-11-26
  • Authors: Robert G. Belleman, Jeroen Bédorf, Simon F. Portegies Zwart

📝 Abstract

We present the results of gravitational direct $N$-body simulations using the Graphics Processing Unit (GPU) on a commercial NVIDIA GeForce 8800GTX designed for gaming computers. The force evaluation of the $N$-body problem is implemented in "Compute Unified Device Architecture" (CUDA), using the GPU to speed up the calculations. We tested the implementation on three different $N$-body codes: two direct $N$-body integration codes, using the 4th-order predictor-corrector Hermite integrator with block time-steps, and one Barnes-Hut treecode, which uses a 2nd-order leapfrog integration scheme. The integration of the equations of motion for all codes is performed on the host CPU. We find that for $N > 512$ particles the GPU outperforms the GRAPE-6Af, if some softening in the force calculation is accepted. Without softening and for very small integration time steps the GRAPE still outperforms the GPU. We conclude that modern GPUs offer an attractive alternative to GRAPE-6Af special-purpose hardware. Using the same time-step criterion, the total energy of the $N$-body system was conserved to better than one part in $10^6$ on the GPU, only about an order of magnitude worse than obtained with GRAPE-6Af. For $N \gtrsim 10^5$ the 8800GTX outperforms the host CPU by a factor of about 100 and runs at about the same speed as the GRAPE-6Af.

📄 Full Content

arXiv:0707.0438v2 [astro-ph] 16 Jul 2007

High Performance Direct Gravitational N-body Simulations on Graphics Processing Units II: An implementation in CUDA

Robert G. Belleman (a), Jeroen Bédorf (a), Simon F. Portegies Zwart (a, b)

(a) Section Computational Science, University of Amsterdam, Amsterdam, The Netherlands
(b) Astronomical Institute "Anton Pannekoek", University of Amsterdam, Amsterdam, The Netherlands
Key words: gravitation – stellar dynamics – methods: N-body simulation – methods: numerical

Preprint submitted to Elsevier Preprint 30 October 2018

1 Introduction

The introduction of multiple processing cores in one chip allows microprocessor manufacturers to improve the performance of CPUs while the clock rate stays the same. This multi-core principle is not new. Over the last decade, manufacturers of graphics processing units (GPUs) have taken a similar approach, driven by the gaming industry's demand for increasingly detailed and responsive computer games. As a result, GPU performance increased dramatically, doubling roughly every 9 months compared with every 18 months for CPUs (NVIDIA 2007; Moore 1965). In terms of raw performance, today's GPUs outperform conventional CPUs; for example, the NVIDIA GeForce 8800GTX has a performance of about 350 GFLOP/s (see § 4). However, harnessing this computing power is not trivial, as GPUs are designed and optimized for graphics operations.

Over the last 7 years, GPUs have evolved from fixed-function hardware supporting primitive graphical operations into programmable processors that outperform conventional CPUs, in particular for vectorizable parallel operations. Today's GPUs contain many smaller processors, called stream processors (Owens 2005), that are specialized in processing large amounts of data in a streaming, parallel fashion. Because of these developments, more and more people use the GPU for purposes other than graphics (Fernando 2004; Pharr & Fernando 2005; Buck et al. 2004).

Initially, GPUs were programmed in assembly language, which required very specific knowledge of the hardware. Newer generations of GPUs offered more possibilities to the programmer, and with this came the need for high-level programming languages. With the introduction of shading languages like Cg (Mark et al. 2003) and GLSL (Kessenich et al. 2007), the programmer could focus on the problem at hand.

Around this time, the performance of the GPU attracted the attention of researchers interested in using the GPU as a high-performance coprocessor. First implementations mapped their problems onto a graphics problem in which data are represented as coloured pixels stored in textures; shading programs were then used to perform computations on the data. Although not every problem is easily represented as a graphics problem, the use of the GPU was demonstrated in many scientific areas, including but not limited to PDE solvers, ray tracing, image segmentation and gravitational simulations (Owens et al. 2007).

One downside of the GPU is that the current generation only supports single-precision (32-bit) floating-point arithmetic. This limits their use to applications for which single precision is sufficient. In the release notes of Compute Unified Device Architecture (CUDA) version 0.8, NVIDIA announced that GPUs supporting 64-bit double-precision floating-point arithmetic would become available in late 2007 (NVIDIA 2007).

In this second paper on high-performance N-body simulations using GPUs, we present an implementation using CUDA, and apply the implementation to solve gravitational N-b
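The single-precision limitation can be made concrete with a deliberately extreme, hypothetical example that does not come from the paper. A float has a 24-bit significand, so a running sum of `1.0f` stops growing at $2^{24} = 16\,777\,216$: adding 1 then rounds (to nearest, ties to even) back to the same value. The C sketch below accumulates identical contributions in float and in double; the function name `accumulate` is illustrative only. In a direct $N$-body force loop, the analogous and much milder effect is the rounding error that builds up when many pairwise contributions are added into one single-precision accumulator.

```c
/* Hypothetical illustration: accumulate n equal contributions `term` once in
 * single precision (as on a float-only GPU) and once in double precision. */
void accumulate(long n, float term, float *sum_single, double *sum_double)
{
    float sf = 0.0f;
    double sd = 0.0;
    for (long k = 0; k < n; k++) {
        sf += term;            /* float running sum: 24-bit significand */
        sd += (double)term;    /* double running sum: 53-bit significand */
    }
    *sum_single = sf;
    *sum_double = sd;
}
```

Calling `accumulate(20000000L, 1.0f, &s, &d)` leaves `s` stuck at 16777216 while `d` is exactly 20000000. This assumes IEEE-754 arithmetic without x87 extended-precision intermediates.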

…(Full text truncated)…
