A GPU Implementation for Two-Dimensional Shallow Water Modeling
In this paper, we present a GPU implementation of a two-dimensional shallow water model. Water simulations are useful for modeling floods, river/reservoir behavior, and dam break scenarios. Our GPU implementation shows vast performance improvements over the original Fortran implementation. By taking advantage of the GPU, researchers and engineers will be able to study water systems more efficiently and in greater detail.
Research Summary
The paper presents a CUDA-based GPU implementation of a two-dimensional shallow-water model originally written in Fortran. The authors adopt Owen Ransom's 2-D predictor-corrector MacCormack scheme, which divides each time step into 16 sub-steps to respect data dependencies while keeping cell updates independent within a sub-step. Two GPU versions are developed: a non-shared-memory version that launches multiple kernels per time step and stores all data in global memory, and a shared-memory version that groups threads into 16 × 16 blocks, copies each block's interior and halo cells into fast on-chip shared memory, and thus reduces global memory traffic. The non-shared version suffers from a serial time-step-size calculation performed on the CPU, incurring PCI-Express transfer overhead, whereas the shared version achieves an additional 10–15% speedup by reusing data locally. Performance is evaluated on an Ubuntu 12.04 system with an Intel i7-3370K CPU and an NVIDIA GTX 680 GPU using five datasets of varying grid sizes (up to 1,048,576 cells) and simulation lengths. Results show that the GPU code outperforms the original Fortran code by more than an order of magnitude, with larger grids yielding greater relative gains. The shared-memory optimization further improves runtime, and the authors discuss how increased shared-memory capacity or larger block sizes could provide even higher speedups. Related work is surveyed, highlighting alternative schemes such as Kurganov-Petrov flux calculations and early-exit optimizations for dry cells. The paper also outlines future directions: scaling the solver across multiple GPUs by domain decomposition, handling dry cells to avoid negative water elevations, and exploiting newer heterogeneous CPU-GPU architectures.
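The predictor-corrector structure mentioned in the summary can be illustrated with a much smaller problem. The following is a minimal sketch, not the authors' code: it assumes a 1-D periodic grid with no bed slope or friction, uses a standard CFL rule for the time-step size (the summary does not state the formula the paper uses), and does not reproduce the 16 sub-steps of the 2-D scheme. All function names and parameters are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def flux(h, hu):
    """Physical flux of the 1-D shallow-water equations for one cell."""
    u = hu / h
    return hu, hu * u + 0.5 * G * h * h


def cfl_time_step(h, hu, dx, cfl=0.5):
    """Stable step size from the fastest wave speed over all cells.

    Assumed CFL rule; this is the kind of global reduction that the
    summary says was computed serially in the non-shared version.
    """
    fastest = max(abs(q / d) + math.sqrt(G * d) for d, q in zip(h, hu))
    return cfl * dx / fastest


def maccormack_step(h, hu, dx, dt):
    """One predictor-corrector MacCormack step on a periodic 1-D grid."""
    n = len(h)
    r = dt / dx
    f = [flux(h[i], hu[i]) for i in range(n)]
    # Predictor: forward difference. Every cell reads only old values,
    # so all cells could be updated in parallel.
    hp = [h[i] - r * (f[(i + 1) % n][0] - f[i][0]) for i in range(n)]
    hup = [hu[i] - r * (f[(i + 1) % n][1] - f[i][1]) for i in range(n)]
    fp = [flux(hp[i], hup[i]) for i in range(n)]
    # Corrector: backward difference on the predicted state, then average
    # with the old state. Index i - 1 wraps around for i == 0.
    h_new = [0.5 * (h[i] + hp[i] - r * (fp[i][0] - fp[i - 1][0]))
             for i in range(n)]
    hu_new = [0.5 * (hu[i] + hup[i] - r * (fp[i][1] - fp[i - 1][1]))
              for i in range(n)]
    return h_new, hu_new
```

A typical call computes `dt = cfl_time_step(h, hu, dx)` and then `h, hu = maccormack_step(h, hu, dx, dt)` once per time step. Within each of the two stages no cell depends on another cell's new value, which is the property that lets a GPU assign one thread per cell.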
Overall, the study demonstrates that explicit shallow-water simulations are well-suited to massively parallel GPU hardware, delivering substantial runtime reductions and opening the door to higher-resolution, more detailed flood and dam-break analyses.
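The data reuse behind the shared-memory version can be made concrete with a small CPU-side emulation. This is a rough sketch under stated assumptions: a halo width of one cell, a 5-point stencil, and clamping at the domain edges are illustrative choices that the summary does not confirm, and the helper names are hypothetical. On a GPU the tile would live in shared memory and be filled cooperatively by the threads of a block.

```python
def tile_with_halo(grid, block_row, block_col, block=16, halo=1):
    """Copy one block's interior plus its halo out of the full grid.

    Indices that fall outside the domain are clamped to the nearest
    edge cell (an assumed boundary treatment).
    """
    rows, cols = len(grid), len(grid[0])
    r0 = block_row * block - halo
    c0 = block_col * block - halo
    size = block + 2 * halo
    return [
        [grid[min(max(r0 + r, 0), rows - 1)][min(max(c0 + c, 0), cols - 1)]
         for c in range(size)]
        for r in range(size)
    ]


def loads_per_block(block=16, halo=1, stencil_points=5):
    """Global-memory reads per block: untiled versus one tile load."""
    untiled = block * block * stencil_points  # every cell fetches its stencil
    tiled = (block + 2 * halo) ** 2           # each tile value fetched once
    return untiled, tiled
```

With the defaults, `loads_per_block()` returns `(1280, 324)`: an 18 × 18 tile serves all 256 cells of a 16 × 16 block. This count ignores hardware caches and the cost of synchronizing the block, so it should not be read as a prediction of the measured speedup.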