
High Performance Seismic Simulation with GPGPU

SC11 Special Awards paper candidate
Matsuoka-lab, Department of Mathematics And Computer Science
http://matsu-www.is.titech.ac.jp/en/contact

Background / Motivations

In massively parallel computing environments such as Tsubame 2.0, the number of computing nodes is so large that the probability of node failures cannot be neglected. To make progress despite such a high failure frequency, the computation must be able to continue after a node fails.
To reach this goal, the state of the application has to be saved periodically, which is called checkpointing. Reducing the checkpointing overhead is important for performance, because its time cost approaches about 25% of the total execution time on current systems.
In our research, we have improved the checkpointing performance of the GPGPU version of SPECFEM3D, one of the leading applications for simulating seismic wave propagation.

Method


We have implemented an efficient checkpointing facility in the CUDA version of SPECFEM3D. Checkpointing a parallel application is divided into two parts: 1) saving the memory contents of the application locally while its execution is stopped, and 2) encoding that data across nodes using Reed-Solomon coding, so that checkpointed data is not lost when a node fails.
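The second part, encoding checkpoint data across nodes, can be illustrated with a minimal sketch. It is not the Reed-Solomon coding described above: as a simplified stand-in it uses a single XOR parity block, which can rebuild only one lost checkpoint per group, whereas Reed-Solomon can tolerate several losses. The function names and the sample data are hypothetical and do not come from SPECFEM3D or FTI.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode_parity(checkpoints):
    """Build one parity block over the local checkpoints of a node group.

    Assumption: XOR parity replaces Reed-Solomon here for brevity.
    Shorter checkpoints are zero-padded to the longest one.
    """
    size = max(len(c) for c in checkpoints)
    return xor_blocks([c.ljust(size, b"\0") for c in checkpoints])


def recover(surviving, parity, lost_length):
    """Rebuild the single missing checkpoint from the survivors and the parity."""
    size = len(parity)
    padded = [c.ljust(size, b"\0") for c in surviving]
    return xor_blocks(padded + [parity])[:lost_length]


# Hypothetical local checkpoints of three nodes.
checkpoints = [b"node0-state", b"node1-wavefield", b"node2-mesh"]
parity = encode_parity(checkpoints)

# Node 1 fails: its checkpoint is rebuilt from the other two and the parity.
restored = recover([checkpoints[0], checkpoints[2]], parity, len(checkpoints[1]))
```

In this sketch the parity block would be stored on a different node from the checkpoints it protects, so that a single node failure never removes both a checkpoint and its parity.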
GPGPU applications typically use only GPUs for their calculation, so some CPU cores are idle during the calculation. The time overhead of checkpointing can be hidden by offloading the data encoding and transfer to those idle CPU cores.
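The control flow of this overlap can be sketched as follows. This is only an illustration under stated assumptions: the class `AsyncCheckpointer` is hypothetical, a Python thread stands in for an idle CPU core, a CRC stands in for the encoding and transfer work, and a small byte array stands in for the simulation state. Only the snapshot copy blocks the main loop. The rest runs in the background while the next time steps proceed.

```python
import threading
import zlib


class AsyncCheckpointer:
    """Hypothetical helper: copy the state quickly, then encode it in the background."""

    def __init__(self):
        self._worker = None
        self.saved = {}  # step -> (snapshot bytes, CRC of the snapshot)

    def checkpoint(self, step, state):
        self.wait()              # at most one checkpoint in flight
        snapshot = bytes(state)  # blocking part: local copy of the state
        self._worker = threading.Thread(target=self._encode, args=(step, snapshot))
        self._worker.start()     # non-blocking part runs on a spare core

    def _encode(self, step, snapshot):
        # Stand-in for encoding and transfer (assumption: a CRC only).
        self.saved[step] = (snapshot, zlib.crc32(snapshot))

    def wait(self):
        if self._worker is not None:
            self._worker.join()
            self._worker = None


state = bytearray(8)  # stand-in for the simulation state
ckpt = AsyncCheckpointer()
for step in range(1, 7):
    state[step % 8] += 1  # stand-in for one simulation time step
    if step % 3 == 0:
        ckpt.checkpoint(step, state)
ckpt.wait()  # make sure the last checkpoint has finished
```

Copying the state before starting the worker matters: the main loop keeps modifying `state`, so the background work must operate on an unchanging snapshot.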

Evaluation


We compared the performance of the SPECFEM3D GPGPU version under the following conditions: 1) without any checkpointing, 2) with local checkpointing on SSDs, 3) with checkpointing by our method (FTI L1, L2), 4) with checkpointing by our method (FTI L1, L2, L3), and 5) with checkpointing by an existing method (BLCR+Lustre).
The performance graph shows that our method provides a checkpointing facility with a smaller impact on execution, and that it maintains better performance even when a problem of the same size is solved on many computing nodes (strong scaling).


In addition, we simulated the seismic movement of the March 11th Tohoku earthquake with SPECFEM3D. We split the simulated area into a 960×960 mesh, provided the slip distribution of the planes as the source, and calculated 1500 seconds of movement along the east-west, north-south, and vertical axes at 70 stations. The simulated displacement agrees with the actual displacement after the earthquake.


Synthetic Seismograms at Hirono station in Fukushima prefecture
