A data-centric directive-based framework to accelerate out-of-core stencil computation on a GPU
Cite as: hdl:2117/334394
Document type: Article
Defense date: 2020-12-01
Publisher: Institute of Electronics, Information and Communication Engineers
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder.
Abstract
Graphics processing units (GPUs) are highly efficient architectures for parallel stencil code; however, the small device (i.e., GPU) memory capacity (several tens of GBs) necessitates the use of out-of-core computation to process excess data. Great programming effort is needed to manually implement efficient out-of-core stencil code. To relieve such programming burdens, directive-based frameworks emerged, such as the pipelined accelerator (PACC); however, they usually lack specific optimizations to reduce data transfer. In this paper, we extend PACC with two data-centric optimizations to address data transfer problems. The first is a direct-mapping scheme that eliminates host (i.e., CPU) buffers, which intermediate between the original data and device buffers. The second is a region-sharing scheme that significantly reduces host-to-device data transfer. The extended PACC was applied to an acoustic wave propagator, automatically extending the length of original serial code 2.3-fold to obtain the out-of-core code. Experimental results revealed that on a Tesla V100 GPU, the generated code ran 41.0, 22.1, and 3.6 times as fast as implementations based on Open Multi-Processing (OpenMP), Unified Memory, and the previous PACC, respectively. The generated code also demonstrated usefulness with small datasets that fit in the device capacity, running 1.3 times as fast as an in-core implementation.
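The out-of-core idea in the abstract can be illustrated with a minimal, CPU-only Python sketch. It is not PACC's implementation or directive syntax: the 3-point averaging stencil, the function names, the `chunk` parameter, and the list standing in for a device buffer are all assumptions made for illustration. The sketch splits a 1D array into chunks with one-element halos and counts how many elements would be copied host-to-device, with and without reusing the halo that is already resident from the previous chunk (a simplified stand-in for the region-sharing scheme).

```python
def stencil_in_core(data):
    """Reference: 3-point average on interior points; boundaries are copied unchanged."""
    out = list(data)
    for i in range(1, len(data) - 1):
        out[i] = (data[i - 1] + data[i] + data[i + 1]) / 3.0
    return out


def stencil_out_of_core(data, chunk, share_halo=True):
    """Apply the same stencil chunk by chunk (hypothetical illustration).

    `device` is a plain list standing in for a device buffer.
    Returns (result, number of elements "transferred" host-to-device).
    """
    n = len(data)
    out = list(data)
    transferred = 0
    device = []    # simulated device buffer
    dev_start = 0  # global index held in device[0]
    for start in range(1, n - 1, chunk):
        stop = min(start + chunk, n - 1)  # interior points [start, stop)
        lo, hi = start - 1, stop + 1      # input range needed, halos included
        if share_halo and device:
            # Keep the overlap already resident from the previous chunk
            # and transfer only the elements that are new.
            prev_hi = dev_start + len(device)
            kept = device[lo - dev_start:]
            new = data[prev_hi:hi]
            device = kept + new
            transferred += len(new)
        else:
            # Naive scheme: re-send the whole chunk, halos included.
            device = data[lo:hi]
            transferred += hi - lo
        dev_start = lo
        for i in range(start, stop):
            j = i - dev_start
            out[i] = (device[j - 1] + device[j] + device[j + 1]) / 3.0
    return out, transferred


if __name__ == "__main__":
    data = [float(i * i % 7) for i in range(18)]
    reference = stencil_in_core(data)
    for share in (False, True):
        result, moved = stencil_out_of_core(data, chunk=4, share_halo=share)
        print(share, result == reference, moved)
```

In this toy setting both variants reproduce the in-core result, and reusing the resident halo means each input element is transferred only once. A real multi-time-step stencil on a GPU involves wider halos, asynchronous transfers, and pipelining that the sketch does not model.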
Description
Special Section on Parallel, Distributed, and Reconfigurable Computing, and Networking
Citation: Shen, J. [et al.]. A data-centric directive-based framework to accelerate out-of-core stencil computation on a GPU. "IEICE Transactions on Information and Systems", 1 December 2020, vol. E103.D, no. 12, p. 2421-2434.
ISSN: 0916-8532
1745-1361
Files | Size |
---|---|
E103.D_2020PAP0014.pdf | 1.670 MB |