Efficient Parallelization for AMR MHD Multiphysics Calculations; Implementation in AstroBEAR

Reading time: 5 minutes
...

📝 Original Info

  • Title: Efficient Parallelization for AMR MHD Multiphysics Calculations; Implementation in AstroBEAR
  • ArXiv ID: 1110.1616
  • Date: 2019-03-27
  • Authors: Carroll-Nellenback et al.

📝 Abstract

Current AMR simulations require algorithms that are highly parallelized and manage memory efficiently. As compute engines grow larger, AMR simulations will require algorithms that achieve new levels of efficient parallelization and memory management. We have attempted to employ new techniques to achieve both of these goals. Patch or grid based AMR often employs ghost cells to decouple the hyperbolic advances of each grid on a given refinement level. This decoupling allows each grid to be advanced independently. In AstroBEAR we utilize this independence by threading the grid advances on each level with preference going to the finer level grids. This allows for global load balancing instead of level by level load balancing and allows for greater parallelization across both physical space and AMR level. Threading of level advances can also improve performance by interleaving communication with computation, especially in deep simulations with many levels of refinement. To improve memory management we have employed a distributed tree algorithm that requires processors to only store and communicate local sections of the AMR tree structure with neighboring processors.

💡 Deep Analysis

Figure 1

📄 Full Content

The development of Adaptive Mesh Refinement (AMR) methods (Berger & Oliger 1984; Berger & Colella 1989) was meant to provide high-resolution simulations at much lower computational cost than fixed-grid methods would allow. The use of highly parallel systems, and the algorithms that go with them, was likewise meant to allow higher-resolution simulations to run faster (in wall-clock time). The parallelization of AMR algorithms, which should combine the cost and time savings of both methods, is not straightforward, however, and there have been many different approaches (MacNeice et al. 2000; Ziegler 2008; O'Shea et al. 2004; Khokhlov 1998). While parallelization of a uniform mesh demands little communication between processors, AMR methods can demand considerable communication to maintain data consistency across the unstructured mesh, as well as to shuffle new grids from one processor to another to balance the work load.

In this paper we report the development and implementation of new algorithms for the efficient parallelization of AMR designed to scale to very large simulations. The new algorithms are part of the AstroBEAR package for simulation of astrophysical fluid multi-physics problems (Cunningham et al. 2009). The new algorithmic structure described in this paper constitutes the development of version 2.0 of the AstroBEAR code.

AstroBEAR, like many other grid-based AMR codes, utilizes a nested tree structure to organize each individual refinement region. However, as we will describe, unlike many other AMR codes, AstroBEAR 2.0 uses a distributed tree in which no processor has access to the entire tree; rather, each processor is only aware of the AMR structure it needs in order to carry out its computations and perform the necessary communications. While this additional memory is currently small compared to the resources typically available to a CPU, future clusters will likely have much less memory per processor, similar to what is already seen in GPUs. Additionally, each processor only sends and receives the portions of the tree necessary to carry out its communication.
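
To make the idea concrete, the following is a minimal sketch, not AstroBEAR's actual data structures, of how such a distributed tree can be represented; the type names (Box, RemoteNode, LocalNode, LocalTree) are hypothetical.

```cpp
// Sketch of a distributed AMR tree in the spirit described above (not the
// actual AstroBEAR 2.0 structures). Each rank stores full nodes only for the
// grids it owns, plus lightweight handles for the non-local parents,
// children, and neighbors it must communicate with; no rank holds the full tree.
#include <vector>

struct Box { int lo[3], hi[3]; };        // index range covered by a grid

struct RemoteNode {                      // all a rank needs to know about a
    int owner_rank;                      // non-local grid: who owns it, its
    int level;                           // level, and where it sits in index
    Box box;                             // space
};

struct LocalNode {                       // a grid this rank owns and advances
    int level;
    Box box;
    RemoteNode parent;                   // communication partners only
    std::vector<RemoteNode> children;
    std::vector<RemoteNode> neighbors;
};

struct LocalTree {
    int my_rank;
    std::vector<LocalNode> nodes;        // only this rank's slice of the AMR tree
};

// During regridding, a rank would pack just the RemoteNode descriptions its
// neighbors need (for example, new children overlapping their grids) and send
// them point-to-point, rather than broadcasting the whole tree to everyone.
```

The point of the split between LocalNode and RemoteNode is that each rank's memory footprint scales with the grids it owns and their surfaces of contact, rather than with the global number of grids in the simulation.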

AstroBEAR 2.0 also uses extended ghost zones to decouple advances on the various levels of refinement. As we show below, this allows each level's advance to be computed independently on a separate thread. Such inter-level threading allows for total load balancing across all refinement levels instead of balancing each level independently. This global load balancing becomes especially important for deep simulations (simulations with low filling fractions but many levels of AMR), as opposed to shallow simulations (high filling fractions and only a few levels of AMR). Processors with coarse grids can advance their grids while processors with finer grids advance theirs. Without such a capability, each level would need enough cells to be distributed across all of the processors. Variations in the filling fraction from level to level can make the number of cells on each level very different: if the level with the fewest cells has enough to be adequately distributed, there will likely be far too many cells on the finest level to allow the computation to be completed in a reasonable wall time. This often restricts the number of AMR levels that can be practically used; with inter-level threading this restriction is lifted. Inter-level threading also allows processors to remain busy while waiting for messages from other processors.
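
The following toy sketch, written with std::thread, illustrates only the scheduling idea described above; it is not AstroBEAR's implementation, it collapses the real sub-cycling dependencies between refinement levels into a simple "finest level first" ordering, and the LevelAdvance type and advance_all_levels function are hypothetical.

```cpp
// Toy illustration (not AstroBEAR's scheduler) of inter-level threading:
// each refinement level's advance runs on its own thread, so a rank holding
// only coarse grids can compute while ranks with fine grids work on theirs,
// and a communication wait on one level overlaps with computation on another.
// Real sub-cycling dependencies between levels are ignored in this sketch.
#include <algorithm>
#include <thread>
#include <vector>

struct LevelAdvance {
    int level;
    void operator()() const {
        // ... fill ghost zones (may block on messages from other ranks),
        //     then advance this rank's grids on `level` ...
    }
};

void advance_all_levels(std::vector<LevelAdvance> work)
{
    // Give preference to finer levels: launch them first so their (more
    // numerous) steps and messages are in flight while coarser levels compute.
    std::sort(work.begin(), work.end(),
              [](const LevelAdvance& a, const LevelAdvance& b) {
                  return a.level > b.level;
              });

    std::vector<std::thread> threads;
    for (const LevelAdvance& w : work)
        threads.emplace_back(w);         // one thread per level on this rank
    for (std::thread& t : threads)
        t.join();
}
```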

In what follows we provide descriptions of the new code and its structure, as well as tests which demonstrate its effective scaling. In section 2 we review patch-based AMR. In section 3 we discuss the distributed tree algorithm, in section 4 the inter-level threading of the advance, in section 5 the load balancing algorithm, and in section 6 we present our scaling results.

Here we give a brief overview of patch-based AMR, introducing our terminology along the way. The fundamental unit of the AMR algorithm is a patch or grid. Each grid contains a regular array of cells in which the fluid variables are stored. Grids with a common resolution, or cell width ∆x_l, belong to the same level l, and on all but the coarsest level are always nested within a coarser "parent" grid of level l-1 with resolution ∆x_{l-1} = R × ∆x_l, where R is the refinement ratio. The collection of grids comprises the AMR mesh, an example of which is shown in figure 1. In addition to the computations required to advance the fluid variables, each grid needs to exchange data with its parent grid (on level l-1) as well as with any child grids (on level l+1). Grids also need to exchange data with physically adjacent neighboring grids on level l, their adjacent siblings. In order to exchange data, each grid must therefore know the location of its parent, children, and adjacent siblings.

Figure 1: Mesh/tree showing the current grids/nodes. The previous set of grids, and the overlap connections to the corresponding previous nodes, are not shown.
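
As an illustration of how the per-grid data exchange and the subsequent independent advance fit together, here is a minimal one-dimensional sketch; the names (Grid, fill_ghosts, advance_hyperbolic, advance_level) are hypothetical stand-ins, and the placeholder upwind update is not the MHD solver actually used in AstroBEAR.

```cpp
// Minimal 1D sketch of the exchange-then-advance pattern described above.
// Names and the trivial upwind update are placeholders, not AstroBEAR code.
#include <cstddef>
#include <vector>

struct Grid {
    int level;                  // refinement level l
    double dx;                  // cell width: dx_l = dx_{l-1} / R
    int nx, nghost;             // interior cells and ghost-zone width (nghost >= 1)
    std::vector<double> u;      // a conserved variable, including ghost cells
};

// Hypothetical helper: copy overlapping data from same-level siblings and
// interpolate (prolong) from the parent where no sibling covers the ghosts.
void fill_ghosts(Grid& g, const std::vector<const Grid*>& siblings, const Grid* parent)
{
    (void)g; (void)siblings; (void)parent;   // details omitted in this sketch
}

// Hypothetical single-grid hyperbolic step; a simple upwind advection stands
// in for the real solver.
void advance_hyperbolic(Grid& g, double dt)
{
    const double c = dt / g.dx;              // CFL-like factor, c <= 1 assumed
    std::vector<double> old = g.u;
    for (int i = g.nghost; i < g.nghost + g.nx; ++i)
        g.u[i] = old[i] - c * (old[i] - old[i - 1]);
}

// Once every grid's ghost zones are filled, the grids on a level are
// decoupled and can be advanced in any order, on any processor or thread.
void advance_level(std::vector<Grid>& grids,
                   const std::vector<std::vector<const Grid*>>& siblings,
                   const std::vector<const Grid*>& parents, double dt)
{
    for (std::size_t i = 0; i < grids.size(); ++i)
        fill_ghosts(grids[i], siblings[i], parents[i]);
    for (Grid& g : grids)
        advance_hyperbolic(g, dt);
}
```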

