Parallel grid library for rapid and flexible simulation development
We present an easy-to-use and flexible grid library for developing highly scalable parallel simulations. The distributed Cartesian cell-refinable grid (dccrg) supports adaptive mesh refinement and allows an arbitrary C++ class to be used as cell data. The amount of data in grid cells can vary both in space and time, allowing dccrg to be used in very different types of simulations, for example in fluid and particle codes. Dccrg transfers the data between neighboring cells on different processes transparently and asynchronously, allowing one to overlap computation and communication. This enables excellent scalability, at least up to 32 k cores in magnetohydrodynamic tests, depending on the problem and hardware. In the version of dccrg presented here, part of the mesh metadata is replicated between MPI processes, reducing the scalability of adaptive mesh refinement (AMR) to between 200 and 600 processes. Dccrg is free software that anyone can use, study, and modify, and is available at https://gitorious.org/dccrg. Users are also kindly requested to cite this work when publishing results obtained with dccrg.
💡 Research Summary
The paper introduces dccrg (Distributed Cartesian Cell‑Refinable Grid), a parallel grid library designed to accelerate the development of highly scalable scientific simulations. Unlike many existing grid frameworks that assume a static mesh and fixed‑size cell data, dccrg allows any user‑defined C++ class to serve as the content of a cell, enabling the storage of variable‑size data that can change both spatially and temporally. This flexibility makes the library suitable for a wide range of applications, from fluid dynamics to particle‑in‑cell (PIC) codes, where the amount of information per cell may differ dramatically across the domain.
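To make the idea concrete, here is a minimal sketch of a user-defined cell type with variable-size data. The names are illustrative only (dccrg's actual requirements for cell classes, such as its MPI serialization hooks, are not shown); the point is that per-cell storage can differ across the grid and change over time.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical cell type (names are illustrative, not part of dccrg):
// a fixed-size fluid state plus a variable-size particle list, so the
// amount of data stored per cell differs across the grid and can change
// from one time step to the next.
struct CellData {
    double density = 0.0;            // fixed-size fluid variables
    double momentum[3] = {0, 0, 0};
    std::vector<double> particles;   // variable-size payload (e.g. PIC particles)

    // Payload size in bytes; grows and shrinks with the particle count.
    std::size_t bytes() const {
        return sizeof(density) + sizeof(momentum)
             + particles.size() * sizeof(double);
    }
};
```

A cell crowded with particles simply carries a longer `particles` vector than an empty one, something a grid framework with fixed-size cell storage could not express.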
The core data structure is a globally indexed three‑dimensional Cartesian mesh. Each MPI process owns a sub‑domain consisting of a set of cells and a minimal amount of metadata describing the existence, refinement level, and neighbor relationships of those cells. The library supports adaptive mesh refinement (AMR) by recursively subdividing a cell into eight (or four in 2‑D) child cells. Refinement and coarsening are driven by user‑supplied criteria such as gradient thresholds or error estimators. In the current implementation, a portion of the mesh metadata is replicated on all processes. This design choice simplifies the bookkeeping required for AMR but introduces a scalability bottleneck: as the number of processes grows, the cost of synchronizing the replicated metadata dominates, limiting effective AMR scaling to roughly 200–600 processes.
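The recursive eight-way subdivision can be sketched with a simple level-plus-coordinates indexing scheme. This is an assumed, illustrative encoding, not dccrg's actual cell-id scheme: each refinement level doubles the lattice resolution, so a child's coordinate is the parent's doubled plus an offset of 0 or 1 in each dimension.

```cpp
#include <array>
#include <cstdint>

// Illustrative cell identifier (not dccrg's actual encoding): integer
// position (i, j, k) on the lattice of a given refinement level, where
// level l+1 has twice the resolution of level l in each dimension.
struct CellId {
    std::uint64_t i, j, k;
    int level;
};

// Refining a 3-D cell replaces it with 2^3 = 8 children one level finer.
std::array<CellId, 8> children_of(const CellId& parent) {
    std::array<CellId, 8> kids{};
    int n = 0;
    for (std::uint64_t dk = 0; dk < 2; ++dk)
    for (std::uint64_t dj = 0; dj < 2; ++dj)
    for (std::uint64_t di = 0; di < 2; ++di) {
        kids[n++] = {2 * parent.i + di, 2 * parent.j + dj,
                     2 * parent.k + dk, parent.level + 1};
    }
    return kids;
}

// Coarsening is the inverse: all eight siblings map back to one parent.
CellId parent_of(const CellId& child) {
    return {child.i / 2, child.j / 2, child.k / 2, child.level - 1};
}
```

Under such a scheme, existence, refinement level, and neighborhood of any cell can be answered from integer arithmetic on its identifier, which is what makes a globally indexed mesh cheap to query.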
Communication is handled entirely through non‑blocking MPI calls (MPI_Isend, MPI_Irecv). When a process needs data from neighboring cells that reside on other ranks, it packs the required information using a user‑provided callback, initiates asynchronous sends, and posts matching receives. Received data are unpacked into “ghost” cells, allowing the main computation to proceed without waiting for communication to finish. This overlap of computation and communication yields excellent strong‑scaling performance for fixed‑mesh problems.
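The overlap pattern described above can be sketched in a single process, with the non-blocking MPI exchange replaced by stubs (all names here are illustrative, not dccrg's API): inner cells, which need no remote data, are solved while ghost transfers would be in flight, and boundary cells are solved only after the transfers complete.

```cpp
#include <vector>

// Illustrative sketch of the computation/communication overlap pattern.
// In a real run the two communication steps would post MPI_Irecv/MPI_Isend
// and call MPI_Waitall; here they are stubbed so the control flow runs
// in one process.
struct Grid {
    std::vector<int> inner;     // cells with no remote neighbors
    std::vector<int> boundary;  // cells that need ghost-cell data
    bool ghosts_ready = false;

    void start_ghost_update() { /* would post MPI_Irecv + MPI_Isend */ }
    void wait_ghost_update()  { ghosts_ready = true; /* would MPI_Waitall */ }
};

int cells_solved = 0;
void solve(const std::vector<int>& cells) { cells_solved += (int)cells.size(); }

// One time step: ghost transfers are (conceptually) in flight while the
// inner cells, which do not depend on remote data, are being solved.
void step(Grid& g) {
    g.start_ghost_update();  // non-blocking: returns immediately
    solve(g.inner);          // overlap: compute while data is in transit
    g.wait_ghost_update();   // ensure ghost copies have arrived
    solve(g.boundary);       // now safe to use remote neighbor data
}
```

The larger the interior relative to the process boundary, the more of the communication latency this ordering hides.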
Performance experiments focus on three‑dimensional magnetohydrodynamic (MHD) test cases. For a static grid, dccrg demonstrates near‑linear scaling up to 32 k cores on a modern HPC cluster, achieving over 90 % parallel efficiency at the million‑cell scale. When AMR is enabled, the same test shows a sharp drop in scalability once the process count exceeds a few hundred, confirming that metadata replication is the limiting factor. Nevertheless, the library's ability to hide communication latency, together with a compact API for refining cells and updating ghost data, makes it attractive for rapid prototyping and production‑level runs alike.
The authors provide the library as open‑source software under the GPLv3 license, hosted at https://gitorious.org/dccrg. The distribution includes comprehensive documentation, a suite of example programs ranging from simple diffusion to full MHD, and a set of regression tests. Users are encouraged to cite the paper when publishing results obtained with dccrg.
In conclusion, dccrg fills an important niche by combining flexible, user‑defined cell data with asynchronous MPI communication and built‑in AMR support. While its current AMR scalability is constrained by the replicated metadata approach, the authors suggest that more sophisticated metadata management (e.g., hierarchical or distributed hash tables) could alleviate this issue. Future extensions such as GPU offloading, hybrid MPI‑OpenMP execution, and dynamic load balancing are envisioned, which would further broaden the library’s applicability across computational physics, engineering, and beyond.