Show simple item record
POSTER: collective dynamic parallelism for directive based GPU programming languages and compilers
dc.contributor.author | Ozen, Guray |
dc.contributor.author | Ayguadé Parra, Eduard |
dc.contributor.author | Labarta Mancho, Jesús José |
dc.contributor.other | Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors |
dc.contributor.other | Barcelona Supercomputing Center |
dc.date.accessioned | 2016-11-18T16:02:22Z |
dc.date.issued | 2016 |
dc.identifier.citation | Ozen, G., Ayguade, E., Labarta, J. POSTER: collective dynamic parallelism for directive based GPU programming languages and compilers. A: International Conference on Parallel Architectures and Compilation Techniques. "PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation". Haifa: Association for Computing Machinery (ACM), 2016, p. 423-424. |
dc.identifier.isbn | 978-1-4503-4121-9 |
dc.identifier.uri | http://hdl.handle.net/2117/96856 |
dc.description.abstract | Early programs for GPU (Graphics Processing Unit) acceleration were based on a flat, bulk parallel programming model, in which programs had to perform a sequence of kernel launches from the host CPU. The latest releases of these devices support dynamic (or nested) parallelism, making it possible to launch kernels from threads running on the device, without host intervention. Unfortunately, the overhead of launching kernels from the device is higher than launching from the host CPU, making the exploitation of dynamic parallelism unprofitable. This paper proposes and evaluates the basic idea behind a user-directed code transformation technique, named collective dynamic parallelism, that targets the effective exploitation of nested parallelism in modern GPUs. The technique dynamically packs dynamic parallelism kernel invocations and postpones their execution until a batch of them is available. We show that for sparse matrix vector multiplication, CollectiveDP outperforms well optimized libraries, making GPUs useful when matrices are highly irregular. |
dc.format.extent | 2 p. |
dc.language.iso | eng |
dc.publisher | Association for Computing Machinery (ACM) |
dc.subject | Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors::Arquitectures paral·leles |
dc.subject.lcsh | Parallel processing (Electronic computers) |
dc.subject.other | Application programming interfaces (API) |
dc.subject.other | Computer graphics |
dc.subject.other | Concurrency control |
dc.subject.other | Cosine transforms |
dc.subject.other | Memory architecture |
dc.subject.other | Parallel architectures |
dc.subject.other | Parallel programming |
dc.subject.other | Program processors |
dc.subject.other | CUDA |
dc.subject.other | Graphics Processing Unit |
dc.subject.other | Languages and compilers |
dc.subject.other | Nested Parallelism |
dc.subject.other | OpenACC |
dc.subject.other | OpenMP |
dc.subject.other | Parallel programming model |
dc.subject.other | Sparse matrix-vector multiplication |
dc.title | POSTER: collective dynamic parallelism for directive based GPU programming languages and compilers |
dc.type | Conference report |
dc.subject.lemac | Processament en paral·lel (Ordinadors) |
dc.contributor.group | Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions |
dc.identifier.doi | 10.1145/2967938.2974056 |
dc.description.peerreviewed | Peer Reviewed |
dc.relation.publisherversion | http://dl.acm.org/citation.cfm?doid=2967938.2974056 |
dc.rights.access | Restricted access - publisher's policy |
local.identifier.drac | 19160709 |
dc.description.version | Postprint (published version) |
dc.date.lift | 10000-01-01 |
local.citation.author | Ozen, G.; Ayguade, E.; Labarta, J. |
local.citation.contributor | International Conference on Parallel Architectures and Compilation Techniques |
local.citation.pubplace | Haifa |
local.citation.publicationName | PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation |
local.citation.startingPage | 423 |
local.citation.endingPage | 424 |