Exploring dynamic parallelism in OpenMP
Document type: Conference report
Publisher: Association for Computing Machinery (ACM)
Rights access: Restricted access - publisher's policy
GPU devices are becoming a common element in current HPC platforms due to their high performance-per-Watt ratio. However, developing applications that can exploit their performance is not a trivial task, and it becomes even harder when the applications have irregular data access patterns or control flows. Dynamic Parallelism (DP) has been introduced in the most recent GPU architecture as a mechanism to improve the applicability of GPU computing in these situations, as well as resource utilization and execution performance. DP allows a kernel to be launched from within another kernel without intervention of the CPU. Current experiences reveal that DP is offered to programmers at the expense of an excessive overhead which, together with its architecture dependency, makes it difficult to see the benefits in real applications. In this paper, we propose an extension of the current OpenMP accelerator model that makes the use of DP easy and effective. The proposal is based on the nesting of teams constructs and on conditional clauses, and we show how the compiler can generate code that is then efficiently executed under dynamic runtime scheduling. The proposal has been implemented in the MACC compiler, which supports the OmpSs task-based programming model, and evaluated using three kernels with data access and computation patterns commonly found in real applications: sparse matrix-vector multiplication, breadth-first search and divide-and-conquer Mandelbrot. Performance results show speed-ups in the 40x range relative to versions not using DP.
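The idea of conditionally opening a nested level of parallelism for irregular work can be illustrated on the CPU with standard OpenMP. The sketch below is not the syntax proposed in the paper and does not use the MACC compiler or GPU dynamic parallelism; it is only an analogy, assuming a sparse matrix in CSR form and a hypothetical cutoff `NESTED_THRESHOLD`. Each row of a sparse matrix-vector product is handled by an outer parallel loop, and a nested parallel region is opened only for rows whose number of nonzeros exceeds the cutoff, using the standard `if` clause.

```c
/* Hypothetical cutoff (an assumption for illustration): rows with more
 * nonzeros than this are allowed to open a nested parallel region. */
#define NESTED_THRESHOLD 2

/* Computes y = A * x for a matrix A stored in CSR form.
 * row_ptr has n_rows + 1 entries; col_idx and val hold the nonzeros. */
void spmv_csr(int n_rows, const int *row_ptr, const int *col_idx,
              const double *val, const double *x, double *y)
{
    /* Outer level: rows are distributed dynamically, since their
     * lengths (and therefore their cost) are irregular. */
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < n_rows; ++i) {
        int begin = row_ptr[i];
        int end = row_ptr[i + 1];
        double sum = 0.0;

        /* Inner level: the nested region is requested only when the row
         * is long enough; short rows run this loop serially. Whether the
         * nested region actually gets extra threads depends on the
         * runtime's nesting settings. */
        #pragma omp parallel for reduction(+:sum) if(end - begin > NESTED_THRESHOLD)
        for (int k = begin; k < end; ++k)
            sum += val[k] * x[col_idx[k]];

        y[i] = sum;
    }
}
```

Compiled without OpenMP support, the pragmas are ignored and the function runs serially with the same result. The choice of cutoff is the part that matters in practice: opening nested parallelism for every row would pay the launch overhead even where there is too little work to amortize it.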
Citation: Ozen, G., Ayguade, E., Labarta, J. Exploring dynamic parallelism in OpenMP. In: Workshop on Accelerator Programming using Directives. "Proceedings of WACCPD 2015: Second Workshop on Accelerator Programming using Directives: Monday, November 16, 2015. Held in conjunction with SC15: The International Conference for High Performance Computing, Networking, Storage and Analysis: Austin, Texas: November 15-20, 2015". Austin, TX: Association for Computing Machinery (ACM), 2015.