POSTER: collective dynamic parallelism for directive based GPU programming languages and compilers
Document type: Conference report
Publisher: Association for Computing Machinery (ACM)
Rights access: Restricted access - publisher's policy
Early programs for GPUs (Graphics Processing Units) were based on a flat, bulk-parallel programming model, in which programs had to perform a sequence of kernel launches from the host CPU. The latest releases of these devices support dynamic (or nested) parallelism, making it possible to launch kernels from threads running on the device, without host intervention. Unfortunately, the overhead of launching a kernel from the device is higher than launching it from the host CPU, often making the exploitation of dynamic parallelism unprofitable. This paper proposes and evaluates the basic idea behind a user-directed code transformation technique, named collective dynamic parallelism, that targets the effective exploitation of nested parallelism in modern GPUs. The technique dynamically packs dynamic-parallelism kernel invocations and postpones their execution until a batch of them is available. We show that for sparse matrix-vector multiplication, CollectiveDP outperforms well-optimized libraries, making the GPU profitable even when matrices are highly irregular.
Citation: Ozen, G., Ayguade, E., Labarta, J. POSTER: collective dynamic parallelism for directive based GPU programming languages and compilers. In: International Conference on Parallel Architectures and Compilation Techniques. "PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation". Haifa: Association for Computing Machinery (ACM), 2016, p. 423-424.