Communication-Avoiding Seismic Numerical Kernels on Multicore Processors
Abstract
The finite-difference method is routinely used to simulate seismic wave propagation, both in the oil and gas industry and in strong-motion analysis in seismology. This numerical method also lies at the heart of a significant fraction of numerical solvers in other fields. In terms of computational efficiency, one of the main difficulties is the unfavorable ratio between the limited pointwise computation and the intensive memory access required, which leads to a memory-bound situation. Naive sequential implementations offer poor cache reuse and generally achieve a low fraction of the processors' peak performance. The situation is worse on multicore computing nodes with several levels of memory hierarchy, where each cache miss corresponds to a costly memory access. Additionally, the memory bandwidth available on multicore chips grows more slowly than the number of computing cores, which dramatically reduces the expected parallel performance. In this article, we introduce a cache-efficient algorithm for stencil-based computations using a decomposition along both the space and the time directions. We report a maximum speedup of 3.59× over the standard implementation.
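
To make the idea of decomposing along both space and time concrete, the following minimal Python sketch contrasts a naive sweep with a simple space-time blocked sweep on a 1D three-point stencil. It is an illustration only, not the algorithm of this article: it uses overlapped tiling (each tile is extended by a halo and some work is recomputed), and the stencil weights, the function names, and the parameters `block` and `tsteps` are all assumptions chosen for the example.

```python
def step(u):
    """One sweep of a 3-point averaging stencil; both end points are held fixed."""
    v = u[:]
    for i in range(1, len(u) - 1):
        v[i] = 0.25 * u[i - 1] + 0.5 * u[i] + 0.25 * u[i + 1]
    return v


def naive(u, steps):
    """Standard scheme: sweep the whole array once per time step."""
    for _ in range(steps):
        u = step(u)
    return u


def blocked(u, steps, block=8, tsteps=4):
    """Space-time blocked scheme (overlapped tiling, hypothetical parameters).

    Each spatial tile of width `block` is extended by a halo of `t` cells
    per side and advanced `t` time steps while it is (conceptually) cache
    resident. Errors from the frozen halo edges travel inward by one cell
    per step, so after `t` steps they have not reached the tile interior.
    """
    n = len(u)
    u = u[:]
    done = 0
    while done < steps:
        t = min(tsteps, steps - done)   # time steps fused in this pass
        out = u[:]
        for a in range(0, n, block):
            b = min(a + block, n)
            lo = max(a - t, 0)          # halo, clamped at the domain edge
            hi = min(b + t, n)
            tile = u[lo:hi]
            for _ in range(t):
                tile = step(tile)
            out[a:b] = tile[a - lo:b - lo]  # keep only the valid interior
        u = out
        done += t
    return u


if __name__ == "__main__":
    u0 = [float((7 * i) % 11) for i in range(37)]
    print(blocked(u0, 10) == naive(u0, 10))
```

In this sketch both routines produce the same values, but the blocked version reads each tile from main memory once per `tsteps` time steps instead of once per time step, at the cost of redundant computation in the halos. Other space-time decompositions avoid that redundancy; the sketch only shows why fusing time steps within a spatial block can reduce memory traffic.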