Efficient Parallelization for AMR MHD Multiphysics Calculations; Implementation in AstroBEAR

📝 Abstract

Current Adaptive Mesh Refinement (AMR) simulations require algorithms that are highly parallelized and manage memory efficiently. As compute engines grow larger, AMR simulations will require algorithms that achieve new levels of efficient parallelization and memory management. We have attempted to employ new techniques to achieve both of these goals. Patch or grid based AMR often employs ghost cells to decouple the hyperbolic advances of each grid on a given refinement level. This decoupling allows each grid to be advanced independently. In AstroBEAR we utilize this independence by threading the grid advances on each level, with preference going to the finer level grids. This allows for global load balancing instead of level by level load balancing and allows for greater parallelization across both physical space and AMR level. Threading of level advances can also improve performance by interleaving communication with computation, especially in deep simulations with many levels of refinement. While we see improvements of up to 30% on deep simulations run on a few cores, the speedup is typically more modest (5-20%) for larger scale simulations. To improve memory management we have employed a distributed tree algorithm that requires processors to only store and communicate local sections of the AMR tree structure with neighboring processors. Using this distributed approach we are able to get reasonable scaling efficiency (> 80%) out to 12288 cores and up to 8 levels of AMR, independent of the use of threading.
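The threaded, globally load-balanced level advance described above can be sketched as follows. This is a minimal illustration, not AstroBEAR's actual (Fortran/MPI) implementation: the grid list, cell counts, and the use of `ThreadPoolExecutor` are assumptions made for the example. The key idea it shows is that, because ghost cells decouple the grids, every grid from every level can be queued at once, with finer (higher) levels submitted first so they take priority.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Each grid on each AMR level can be advanced independently once its
# ghost cells are filled; here a "grid" is just (level, ncells).
grids = [(0, 64), (1, 32), (1, 48), (2, 16), (2, 16), (2, 24)]

completed = []
lock = threading.Lock()

def advance(grid):
    """Stand-in for the hyperbolic advance of one grid's cells."""
    level, ncells = grid
    work = sum(range(ncells))  # dummy computation
    with lock:
        completed.append((level, ncells))
    return work

# Global load balancing: queue every grid from every level at once,
# submitting finer (higher) levels first so workers pick them up with
# priority, instead of synchronizing level by level.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(advance, g)
               for g in sorted(grids, key=lambda g: -g[0])]
    results = [f.result() for f in futures]
```

In a real AMR code the "dummy computation" would be the hyperbolic update, and idle workers would overlap ghost-cell communication for one grid with computation on another, which is the source of the interleaving benefit the abstract describes.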

📄 Content

Efficient Parallelization for AMR MHD Multiphysics Calculations; Implementation in AstroBEAR

Jonathan J. Carroll-Nellenback (a), Brandon Shroyer (a), Adam Frank (a), Chen Ding (b)

(a) Department of Physics and Astronomy, University of Rochester, Rochester, NY 14620
(b) Department of Computer Science, University of Rochester, Rochester, NY 14620

Email address: johannjc@pas.rochester.edu (Jonathan J. Carroll-Nellenback). Preprint submitted to Elsevier November 4, 2018. arXiv:1112.1710v2 [astro-ph.SR] 21 Feb 2013.

1. Introduction

The development of AMR [7, 8, 1] was meant to provide high resolution simulations at much lower computational cost than fixed grid methods would allow. The use of highly parallel systems, and the algorithms that go with them, was also meant to allow higher resolution simulations to be run faster (relative to wall clock time). The parallelization of AMR algorithms, which should combine the cost/time savings of both methods, is not straightforward, however, and there have been many different approaches [10, 13, 12, 2, 6]. While parallelization of a uniform mesh demands little communication between processors, AMR methods can demand considerable communication to maintain data consistency across the unstructured mesh, as well as the shuffling of new grids from one processor to another to balance workload.

In this paper we report the development and implementation of new algorithms for the efficient parallelization of AMR designed to scale to very large simulations. The new algorithms are part of the AstroBEAR package for the simulation of astrophysical fluid multiphysics problems [9]. The new algorithmic structure described in this paper constitutes the development of version 2.0 of the AstroBEAR code.

AMR methods come in many varieties. Meshes can be either unstructured or semi-structured. Semi-structured methods can be further divided into those which allow grids to be of arbitrary size (patch based) and those which require grids to be of a fixed size (block based, or cell based if the block size is 1). With block (or cell) based AMR, the additional constraints imposed on the structure of the mesh allow for a simpler type of connectivity within a tree.
For example, in block based AMR, any given block will have exactly 8 children or none (if it is a leaf) and will have at most 6 face-sharing neighbors. With patch based AMR, there is no limit to the number of children or neighbors. In addition, the operation of regridding in block based AMR is much simpler. As the grid changes, a given block will either persist, if the physical region continues to require refinement, or be destroyed. In patch based AMR, a given region may subsequently be better covered by patches of a different shape, requiring transfer of data between physically overlapping previous patches and new patches. This adds an additional dimension to the tree structure and increases the complexity of maintaining a distributed tree.

For both block (or cell) and patch based AMR, the actual grid data (fluid variables, etc.) are always distributed across the various processors. Usually some overlap in grid data (guard/ghost cells) is desired to allow for frequent access to neighboring values without the need for additional communication. But the metadata that describes the shape and distribution of the grid data is usually stored on every processor. For 100's or 1000's of cores, this global tree typically requires less memory than that required for the local grid data.
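The fixed connectivity that makes block-based AMR simpler can be made concrete with a small tree-node sketch. This is an illustrative data structure under the 3D octree assumptions stated in the text (exactly 0 or 8 children, at most 6 face-sharing neighbors); the class and field names are hypothetical and are not AstroBEAR's internal structures.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Block:
    """One node of a block-based (octree) AMR hierarchy."""
    level: int
    owner_rank: int  # processor that stores this block's grid data
    # Block-based AMR in 3D: a block has exactly 0 or 8 children.
    children: List["Block"] = field(default_factory=list)
    # At most 6 face-sharing neighbors; None where none exists.
    neighbors: List[Optional["Block"]] = field(default_factory=lambda: [None] * 6)

    def is_leaf(self) -> bool:
        return not self.children

    def refine(self) -> None:
        # Refinement always produces 2^3 = 8 children, one octant each.
        assert self.is_leaf(), "only leaf blocks are refined"
        self.children = [Block(self.level + 1, self.owner_rank)
                         for _ in range(8)]

root = Block(level=0, owner_rank=0)
root.refine()
```

Because every node carries this fixed, small amount of connectivity metadata, a distributed-tree scheme only needs each processor to hold the blocks it owns plus their immediate neighbors, rather than replicating the whole tree everywhere; by contrast, a patch-based node would need variable-length child and neighbor lists plus overlap information after each regrid.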
