📝 Original Info
- Title: High Performance Direct Gravitational N-body Simulations on Graphics Processing Units – II: An implementation in CUDA
- ArXiv ID: 0707.0438
- Date: 2008-11-26
- Authors: Robert G. Belleman, Jeroen Bédorf, Simon F. Portegies Zwart
📝 Abstract
We present the results of gravitational direct $N$-body simulations using the Graphics Processing Unit (GPU) on a commercial NVIDIA GeForce 8800GTX designed for gaming computers. The force evaluation of the $N$-body problem is implemented in "Compute Unified Device Architecture" (CUDA), using the GPU to speed up the calculations. We tested the implementation on three different $N$-body codes: two direct $N$-body integration codes, using the 4th order predictor-corrector Hermite integrator with block time-steps, and one Barnes-Hut treecode, which uses a 2nd order leapfrog integration scheme. The integration of the equations of motion for all codes is performed on the host CPU. We find that for $N > 512$ particles the GPU outperforms the GRAPE-6Af, if some softening in the force calculation is accepted. Without softening and for very small integration time steps the GRAPE still outperforms the GPU. We conclude that modern GPUs offer an attractive alternative to GRAPE-6Af special purpose hardware. Using the same time-step criterion, the total energy of the $N$-body system was conserved to better than one part in $10^6$ on the GPU, only about an order of magnitude worse than obtained with GRAPE-6Af. For $N \gtrsim 10^5$ the 8800GTX outperforms the host CPU by a factor of about 100 and runs at about the same speed as the GRAPE-6Af.
📄 Full Content
arXiv:0707.0438v2 [astro-ph] 16 Jul 2007
High Performance Direct Gravitational N-body Simulations on Graphics Processing Units
II: An implementation in CUDA
Robert G. Belleman (a), Jeroen Bédorf (a), Simon F. Portegies Zwart (a,b)
(a) Section Computational Science, University of Amsterdam, Amsterdam, The Netherlands
(b) Astronomical Institute "Anton Pannekoek", University of Amsterdam, Amsterdam, The Netherlands
Abstract
We present the results of gravitational direct N-body simulations using the
Graphics Processing Unit (GPU) on a commercial NVIDIA GeForce 8800GTX designed
for gaming computers. The force evaluation of the N-body problem is implemented
in “Compute Unified Device Architecture” (CUDA), using the GPU to speed up the
calculations. We tested the implementation on three different N-body codes: two
direct N-body integration codes, using the 4th order predictor-corrector Hermite
integrator with block time-steps, and one Barnes-Hut treecode, which uses a 2nd
order leapfrog integration scheme. The integration of the equations of motion for
all codes is performed on the host CPU.
We find that for N > 512 particles the GPU outperforms the GRAPE-6Af, if
some softening in the force calculation is accepted. Without softening and for very
small integration time steps the GRAPE still outperforms the GPU. We conclude
that modern GPUs offer an attractive alternative to GRAPE-6Af special purpose
hardware. Using the same time-step criterion, the total energy of the N-body
system was conserved to better than one part in 10^6 on the GPU, only about an
order of magnitude worse than obtained with GRAPE-6Af. For N ≳ 10^5 the 8800GTX
outperforms the host CPU by a factor of about 100 and runs at about the same
speed as the GRAPE-6Af.
Key words: gravitation – stellar dynamics – methods: N-body simulation –
methods: numerical
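As an illustration of the softened force evaluation referred to in the abstract, the following is a minimal sketch of a direct-summation acceleration calculation, written in plain Python rather than CUDA. The function name `accelerations`, the softening length `eps`, the Plummer-type form of the softening and the units with G = 1 are all assumptions made for this sketch; it is not the kernel used in the paper.

```python
import math


def accelerations(pos, mass, eps=0.01, G=1.0):
    """Softened gravitational accelerations by direct O(N^2) summation.

    pos  : list of (x, y, z) positions
    mass : list of particle masses
    eps  : softening length (assumed Plummer-type softening)
    """
    n = len(pos)
    eps2 = eps * eps
    acc = []
    for i in range(n):
        xi, yi, zi = pos[i]
        ax = ay = az = 0.0
        for j in range(n):
            if i == j:
                continue  # a particle exerts no force on itself
            dx = pos[j][0] - xi
            dy = pos[j][1] - yi
            dz = pos[j][2] - zi
            # Softened squared distance: r^2 + eps^2
            r2 = dx * dx + dy * dy + dz * dz + eps2
            s = G * mass[j] / (r2 * math.sqrt(r2))
            ax += s * dx
            ay += s * dy
            az += s * dz
        acc.append([ax, ay, az])
    return acc


# Two unit masses separated by unit distance, without softening
acc = accelerations([(-0.5, 0.0, 0.0), (0.5, 0.0, 0.0)], [1.0, 1.0], eps=0.0)
print(acc)
```

With a non-zero `eps` the denominator never vanishes, so close encounters no longer produce arbitrarily large accelerations; the cost is that the force law deviates from the Newtonian one at separations comparable to `eps`.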
Preprint submitted to Elsevier Preprint
30 October 2018
1 Introduction
The introduction of multiple processing cores in one chip allows microprocessor
manufacturers to improve the performance of CPUs while the clock rate stays
the same. This multi-core principle is not new. Over the last decade, a similar
approach has been taken by manufacturers of graphics processing units (GPU)
under the influence of the gaming industry to deliver increasingly detailed and
responsive computer games. As a result of this, the GPU underwent a dramatic
increase in performance; a doubling in performance over a period of 9 months,
instead of 18 months for CPUs (NVIDIA 2007; Moore 1965).
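To make the difference between the two doubling times concrete, the short sketch below compounds both over the same interval. The helper `growth_factor` and the 36-month interval are illustrative assumptions, not figures taken from the cited sources.

```python
def growth_factor(months, doubling_period_months):
    """Performance multiplier after `months`, given a fixed doubling period."""
    return 2.0 ** (months / doubling_period_months)


interval = 36  # months, chosen only for illustration
gpu = growth_factor(interval, 9)   # doubling every 9 months
cpu = growth_factor(interval, 18)  # doubling every 18 months
print(gpu, cpu, gpu / cpu)
```

Under these assumptions the ratio between the two grows as 2^(t/18), so the gap itself doubles every 18 months.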
In terms of raw performance, today’s GPUs outperform conventional CPUs.
For example, the NVIDIA GeForce 8800GTX has a performance of about
350 GFLOP/s (see § 4). However, harvesting this computing power is not
trivial, as GPUs are designed and optimized for graphics operations. Over
the last 7 years GPUs have evolved from fixed function hardware for the
support of primitive graphical operations to programmable processors that
outperform conventional CPUs, in particular for vectorizable parallel
operations. Today’s GPUs contain many small processors called stream
processors (Owens 2005), which are specialized in processing large amounts
of data in a streaming and parallel fashion. It is because of these
developments that more and more people use the GPU for wider purposes than
just for graphics (Fernando 2004; Pharr & Fernando 2005; Buck et al. 2004).
Initially, the programming of GPUs was done in assembly language and
required a very specific knowledge of the hardware. Newer generations of GPUs
offered more possibilities for the programmer and with this came the need for
high-level programming languages. With the introduction of shading languages
like Cg (Mark et al. 2003) and GLSL (Kessenich et al. 2007), the programmer
could focus on the problem at hand.
Around this time, the performance of the GPU attracted the attention of
researchers with an interest in the use of the GPU as a high-performance
coprocessor. The first implementations mapped their problems onto a graphics
problem in which data is represented as coloured pixels stored in textures.
Shading programs were then used to perform computations on the data. Although
not every problem is easily represented as a graphics problem, the use of the
GPU was demonstrated in many scientific areas, including but not limited to
PDE solvers, ray tracing, image segmentation and gravitational simulations
(Owens et al. 2007).
One downside of the GPU is that the current generation only supports single
precision (32-bit) floating point arithmetic. This limits their use to
applications for which single precision is sufficient. In the release notes of
Compute Unified Device Architecture (CUDA) version 0.8, NVIDIA announced that
GPUs supporting 64-bit double precision floating point arithmetic will
become available in late 2007 (NVIDIA 2007).
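The practical meaning of the single precision limit can be illustrated with a small sketch that rounds values to 32 bits on the host. The helper `to_float32` and the test values are assumptions chosen for this illustration; the sketch emulates 32-bit rounding with Python's `struct` module and does not run on a GPU.

```python
import struct


def to_float32(x):
    """Round a Python float (64-bit) to the nearest IEEE 754 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


big = 1.0
small = 1.0e-8  # below half the 32-bit spacing near 1.0 (about 6e-8)

sum64 = big + small                                      # double precision
sum32 = to_float32(to_float32(big) + to_float32(small))  # single precision

print(sum64 == big)  # False: the small term survives in 64 bits
print(sum32 == big)  # True: the small term is lost in 32 bits
```

A contribution that is small relative to the running total can therefore vanish entirely in 32-bit arithmetic, which is one reason single precision may not be sufficient for every application.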
In this second paper on high performance N-body simulations using GPUs,
we present an implementation using CUDA, and apply the implementation to
solve gravitational N-b
…(Full text truncated)…