Tasking in accelerators: performance evaluation

Cite as:
hdl:2117/185805
Document type: Conference report
Defense date: 2019
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Rights access: Open Access
European Commission's project: ECO-H-MEM - Advanced Ecosystem for Broad Heterogeneous Memory Usage (EC-H2020-749516)
EPEEC - European joint Effort toward a Highly Productive Programming Environment for Heterogeneous Exascale Computing (EPEEC) (EC-H2020-801051)
Abstract
In this work, we analyze the implications and results of implementing dynamic parallelism, concurrent kernels, and CUDA Graphs to solve task-oriented problems. As a benchmark, we propose three different methods for solving the DGEMM operation on tiled matrices, which might be the most popular benchmark for performance analysis. The algorithms we study differ significantly in data dependencies, synchronization, and granularity. The main contribution of this work is determining which of these approaches works best for running multiple tasks concurrently on a single GPU, and stating the main limitations and benefits of each technique. Using dynamic parallelism and CUDA Streams, we achieved speedups of up to 30%, and with the CUDA Graph API up to 25x acceleration, outperforming state-of-the-art results.
Description
© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Citation: Toledo, L. [et al.]. Tasking in accelerators: performance evaluation. In: International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT). "2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT): Gold Coast, Australia: 5-7 December 2019: proceeding". Institute of Electrical and Electronics Engineers (IEEE), 2019, p. 127-132.
Publisher version: https://ieeexplore.ieee.org/abstract/document/9029123
Files | Description | Size | Format | View |
---|---|---|---|---|
Tasking in Acce ... rmance Evaluation_2019.pdf | | 302,2Kb | | View/Open |
All rights reserved. This work is protected by the corresponding intellectual and industrial
property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public
communication or transformation of this work are prohibited without permission of the copyright holder.