Tasking in accelerators: performance evaluation

Cite as:
hdl:2117/185805
Document type: Conference proceedings paper
Publication date: 2019
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Access conditions: Open access
All rights reserved. This work is protected by intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation is prohibited without the authorization of the rights holder.
Project: ECO-H-MEM - Advanced Ecosystem for Broad Heterogeneous Memory Usage (EC-H2020-749516)
EPEEC - European joint Effort toward a Highly Productive Programming Environment for Heterogeneous Exascale Computing (EPEEC) (EC-H2020-801051)
COMPUTACION DE ALTAS PRESTACIONES VII (MINECO-TIN2015-65316-P)
Abstract
In this work, we analyze the implications and results of implementing dynamic parallelism, concurrent kernels and CUDA Graphs to solve task-oriented problems. As a benchmark, we propose three different methods for solving the DGEMM operation on tiled matrices, which might be the most popular benchmark for performance analysis. The algorithms that we study differ significantly in terms of data dependencies, synchronization and granularity. The main contribution of this work is determining which of these approaches works best for running multiple tasks concurrently on a single GPU, as well as stating the main limitations and benefits of each technique. Using dynamic parallelism and CUDA Streams, we were able to achieve speedups of up to 30%, and with the CUDA Graph API up to 25x acceleration, outperforming state-of-the-art results.
Description
© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Citation: Toledo, L. [et al.]. Tasking in accelerators: performance evaluation. In: International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT). "2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT): Gold Coast, Australia: 5-7 December 2019: proceeding". Institute of Electrical and Electronics Engineers (IEEE), 2019, p. 127-132.
Publisher's version: https://ieeexplore.ieee.org/abstract/document/9029123
Collections
Files | Description | Size | Format | View |
---|---|---|---|---|
Tasking in Acce ... rmance Evaluation_2019.pdf | | 302.2 KB | | View/Open |