Exploring dynamic parallelism in OpenMP
File: a5-ozen.pdf (529.0 KB) (Restricted access)
Cite as: hdl:2117/99716
Document type: Conference proceedings paper
Publication date: 2015
Publisher: Association for Computing Machinery (ACM)
Access conditions: Restricted access due to publisher policy
Except where otherwise noted, the contents of this work are subject to the Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain (CC BY-NC-ND 3.0 ES)
Abstract
GPU devices are becoming a common element in current HPC platforms due to their high performance-per-Watt ratio. However, developing applications able to exploit their dazzling performance is not a trivial task, and it becomes even harder when those applications have irregular data access patterns or control flows. Dynamic Parallelism (DP) has been introduced in the most recent GPU architecture as a mechanism to improve the applicability of GPU computing in these situations, as well as resource utilization and execution performance. DP allows a kernel to be launched from within another kernel without intervention of the CPU. Current experiences reveal that DP is offered to programmers at the expense of excessive overhead which, together with its architecture dependency, makes it difficult to see the benefits in real applications.
In this paper, we propose how to extend the current OpenMP accelerator model to make the use of DP easy and effective. The proposal is based on the nesting of teams constructs and on conditional clauses, and shows how the compiler can generate code that is then efficiently executed under dynamic runtime scheduling. The proposal has been implemented in the MACC compiler, which supports the OmpSs task-based programming model, and evaluated using three kernels with data access and computation patterns commonly found in real applications: sparse matrix-vector multiplication, breadth-first search and divide-and-conquer Mandelbrot. Performance results show speed-ups in the 40x range relative to versions not using DP.
Citation: Ozen, G., Ayguade, E., Labarta, J. Exploring dynamic parallelism in OpenMP. In: Workshop on Accelerator Programming using Directives. "Proceedings of WACCPD 2015: Second Workshop on Accelerator Programming using Directives: Monday, November 16, 2015. Held in conjunction with SC15: The International Conference for High Performance Computing, Networking, Storage and Analysis: Austin, Texas: November 15-20, 2015". Austin, TX: Association for Computing Machinery (ACM), 2015.
ISBN: 978-1-4503-4014-4
Publisher's version: http://dl.acm.org/citation.cfm?id=2832113