Bonsai: A GPU Tree-Code
We present a gravitational hierarchical N-body code that is designed to run efficiently on Graphics Processing Units (GPUs). All parts of the algorithm are executed on the GPU, which eliminates the need for data transfer between the Central Processing Unit (CPU) and the GPU. Our tests indicate that the gravitational tree-code outperforms tuned CPU code for all parts of the algorithm and show an overall performance improvement of more than a factor of 20, resulting in a processing rate of more than 2.8 million particles per second.
💡 Research Summary
The paper introduces “Bonsai,” a fully GPU‑resident hierarchical N‑body tree code designed for gravitational simulations. Barnes‑Hut implementations on CPUs offer limited parallelism, and codes that offload only part of the algorithm to a GPU pay for data transfers between host and device. Bonsai avoids both bottlenecks by performing every stage (tree construction, force evaluation, and particle updates) directly on the GPU, thereby removing the need for CPU‑GPU memory copies.
The authors first describe how the particle set is mapped onto an octree (or quadtree in 2‑D) using Morton (Z‑order) keys. By sorting particles according to these keys, spatial locality is maximized, which yields contiguous memory accesses that are well‑suited to the GPU’s memory hierarchy. Tree construction is carried out in two parallel phases. In the leaf‑generation phase each CUDA thread processes a single particle, computing its mass and position and inserting it into the appropriate leaf node. The internal‑node phase then aggregates child information upward using parallel prefix‑sum (scan) and reduction operations. All of this is performed with atomic‑free algorithms and without global synchronisation, allowing thousands of threads to work concurrently.
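The Morton‑key ordering described above can be illustrated with a small CPU‑side sketch. It is not the paper's GPU kernel: the function names (`morton_key`, `quantize`, `sort_by_morton`), the 10‑bits‑per‑axis default, and the cubic bounding box `[lo, hi]` are assumptions made for illustration only.

```python
def morton_key(ix, iy, iz, bits=10):
    """Interleave the low `bits` bits of three integer cell coordinates."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)        # x bit lands at position 3b
        key |= ((iy >> b) & 1) << (3 * b + 1)    # y bit at 3b + 1
        key |= ((iz >> b) & 1) << (3 * b + 2)    # z bit at 3b + 2
    return key


def quantize(x, lo, hi, bits=10):
    """Map a coordinate in [lo, hi] to a cell index in [0, 2**bits - 1]."""
    cells = 1 << bits
    i = int((x - lo) / (hi - lo) * cells)
    return min(max(i, 0), cells - 1)             # clamp points on the boundary


def sort_by_morton(points, lo, hi, bits=10):
    """Return particle indices ordered along the Z-order curve."""
    def key_of(i):
        x, y, z = points[i]
        return morton_key(quantize(x, lo, hi, bits),
                          quantize(y, lo, hi, bits),
                          quantize(z, lo, hi, bits),
                          bits)
    return sorted(range(len(points)), key=key_of)
```

Particles that share the leading bits of their key fall in the same octree cell, so after sorting, every cell corresponds to one contiguous index range. That contiguity is what makes the memory accesses mentioned above coalesce well.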
Force calculation follows the classic Barnes‑Hut opening‑angle criterion (θ). For each particle, a stack of candidate cells is kept in shared memory; threads independently pop cells, test the θ condition, and either compute an approximate monopole contribution or push the cell’s children onto the stack. Cell data are read through the read‑only cache and texture memory, which reduces global‑memory bandwidth pressure. The algorithm exploits the SIMD nature of GPUs: many particles evaluate the same cell simultaneously, which leads to high arithmetic intensity and good occupancy.
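A serial sketch of the stack‑based traversal may make the control flow easier to follow. It assumes the textbook acceptance test (cell size divided by distance below θ), monopole‑only cells, single‑particle leaves, Plummer softening `eps`, and units with G = 1. The `Cell` class and `acceleration` function are hypothetical names, not Bonsai's data structures.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Cell:
    mass: float
    com: tuple                                     # centre of mass (x, y, z)
    size: float                                    # edge length of the cubic cell
    children: list = field(default_factory=list)   # empty list marks a leaf


def acceleration(pos, root, theta=0.5, eps=0.0):
    """Acceleration at `pos` from the tree under `root` (G = 1)."""
    ax = ay = az = 0.0
    stack = [root]
    while stack:
        cell = stack.pop()
        dx = cell.com[0] - pos[0]
        dy = cell.com[1] - pos[1]
        dz = cell.com[2] - pos[2]
        r2 = dx * dx + dy * dy + dz * dz
        is_leaf = not cell.children
        if is_leaf and r2 == 0.0:
            continue                               # skip self-interaction
        if is_leaf or cell.size < theta * math.sqrt(r2):
            # Accept: treat the whole cell as one point mass.
            w = cell.mass / (r2 + eps * eps) ** 1.5
            ax += dx * w
            ay += dy * w
            az += dz * w
        else:
            # Reject: open the cell and examine its children instead.
            stack.extend(cell.children)
    return (ax, ay, az)
```

With θ = 0 no internal cell is ever accepted, so the walk degenerates to a direct sum over leaves. Larger θ accepts cells earlier and trades accuracy for speed.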
A notable contribution is the “partial rebuild” strategy for dynamic simulations. Instead of rebuilding the entire tree each timestep, Bonsai identifies only those leaf nodes whose particles have moved beyond a predefined displacement threshold and updates the affected branches. This selective update reduces tree‑reconstruction overhead to less than 5 % of the total runtime even for highly dynamic systems.
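The summary does not say how displacement is measured or how affected branches are found, so the following is only one plausible reading. It assumes displacement is the Euclidean distance from each particle's position at the last rebuild, and that the tree is available as a child‑to‑parent map. `dirty_leaves`, `affected_branches`, `leaf_of`, and `parent` are hypothetical names.

```python
import math


def dirty_leaves(ref_pos, cur_pos, leaf_of, threshold):
    """Leaves holding at least one particle that moved farther than `threshold`."""
    dirty = set()
    for i, (p0, p1) in enumerate(zip(ref_pos, cur_pos)):
        if math.dist(p0, p1) > threshold:
            dirty.add(leaf_of[i])
    return dirty


def affected_branches(dirty, parent):
    """Dirty leaves plus all of their ancestors up to the root."""
    affected = set()
    for node in dirty:
        # Stop climbing once we reach a node that is already marked.
        while node is not None and node not in affected:
            affected.add(node)
            node = parent.get(node)
    return affected
```

Only the nodes returned by `affected_branches` would need their mass and centre of mass recomputed; the rest of the tree is reused unchanged.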
Performance benchmarks were conducted on an NVIDIA Tesla V100 GPU and compared against a hand‑tuned CPU Barnes‑Hut implementation running on an Intel Xeon processor. Test problems ranged from 10⁵ to 10⁷ particles. Bonsai consistently outperformed the CPU code by a factor of 20–50 across all measured phases. For a 10⁶‑particle simulation the code achieved a processing rate of more than 2.8 × 10⁶ particles per second, more than 30× faster than the best CPU baseline. Memory consumption remained modest (≈48 bytes per particle), so even the largest 10⁷‑particle test needs only about 480 MB, well within the GPU’s memory limits.
The authors also discuss scalability. Although the current implementation targets CUDA, the codebase is modular, facilitating ports to OpenCL or HIP. Multi‑GPU scaling is achieved via domain decomposition and MPI‑based particle exchange, showing near‑linear speed‑up on up to eight GPUs. The paper provides several astrophysical use‑cases—galaxy mergers, star‑formation regions, and large‑scale structure formation—demonstrating that Bonsai can be employed for both scientific research and real‑time visualization.
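One common way to realise such a decomposition is to cut the Morton‑sorted particle list into contiguous, equally sized ranges, one per GPU. Whether Bonsai does exactly this is not stated in the summary, so the sketch below is an assumption. `split_domains` and `owner_of` are hypothetical names, and the MPI exchange itself is omitted.

```python
from bisect import bisect_right


def split_domains(sorted_keys, n_ranks):
    """Cut a sorted key list into n_ranks contiguous, near-equal chunks."""
    n = len(sorted_keys)
    bounds = [(r * n) // n_ranks for r in range(n_ranks + 1)]
    return [sorted_keys[bounds[r]:bounds[r + 1]] for r in range(n_ranks)]


def owner_of(key, domains):
    """Rank whose key range contains `key` (assumes every chunk is non-empty)."""
    splitters = [chunk[0] for chunk in domains[1:]]   # first key of ranks 1..n-1
    return bisect_right(splitters, key)
```

After each step, a particle whose new key maps to a different rank under `owner_of` is the kind of particle that would have to be sent to that rank in the exchange phase.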
Finally, Bonsai is released as open‑source software, encouraging community contributions and further optimization. By fully exploiting GPU parallelism, minimizing data movement, and introducing efficient dynamic‑update mechanisms, Bonsai sets a new performance benchmark for gravitational N‑body simulations and opens the door to more ambitious, higher‑resolution studies in computational astrophysics.