
dc.contributor.author: González Tallada, Marc
dc.contributor.author: Morancho Llena, Enrique
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.date.accessioned: 2023-10-05T08:55:56Z
dc.date.available: 2023-10-05T08:55:56Z
dc.date.issued: 2024-01-10
dc.identifier.citation: Gonzalez, M.; Morancho, E. Compute units in OpenMP: extensions for heterogeneous parallel programming. "Concurrency and computation: practice and experience", 10 January 2024, vol. 36, no. 1, article e7885.
dc.identifier.issn: 1532-0626
dc.identifier.uri: http://hdl.handle.net/2117/394656
dc.description.abstract: This article evaluates the current support for heterogeneous OpenMP 5.2 applications regarding the simultaneous activation of host and device computing units (e.g., CPUs, GPUs, or FPGAs). The article identifies limitations in the current OpenMP specification and describes the design and implementation of novel OpenMP extensions and runtime support for heterogeneous parallel programming. The Compute Unit (CU) abstraction is introduced in the OpenMP programming model. A CU is defined as an aggregation of computing elements (e.g., CPUs, GPUs, FPGAs). On top of CUs, the article describes dynamic work-sharing constructs and schedulers that address the inherent differences in compute power between host and device CUs. New constructs and the corresponding runtime support are described for the new abstractions. The article evaluates the case of a hybrid multilevel parallelization of the NPB-MZ benchmark suite. The implementation exploits both coarse-grain and fine-grain parallelism, mapped to CUs of different kinds (GPUs and CPUs). All CUs are activated using the new extensions and runtime support. We compare hybrid and nonhybrid executions under two state-of-the-art work-distribution schemes (Static and Dynamic Task schedulers). On a computing node composed of one AMD EPYC 7742 @ 2.25 GHz (64 cores and 2 threads/core, totalling 128 threads per node) and two AMD Radeon Instinct MI50 GPUs with 32 GB, hybrid executions present speedups from 1.08 up to 3.18 with respect to a nonhybrid GPU implementation, depending on the number of activated CUs.
dc.description.sponsorship: This work was supported by the Spanish Ministry of Science and Technology (PID2019-107255GB).
dc.format.extent: 22 p.
dc.language.iso: eng
dc.publisher: John Wiley & Sons
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors
dc.subject.lcsh: Parallel programming (Computer science)
dc.subject.lcsh: Application program interfaces (Computer software)
dc.subject.lcsh: Graphics processing units
dc.subject.other: GPUs
dc.subject.other: Heterogeneous computing
dc.subject.other: OpenMP
dc.subject.other: Work distribution
dc.title: Compute units in OpenMP: extensions for heterogeneous parallel programming
dc.type: Article
dc.subject.lemac: Programació en paral·lel (Informàtica)
dc.subject.lemac: Interfícies de programació d'aplicacions (Programari)
dc.subject.lemac: Unitats de processament gràfic
dc.contributor.group: Universitat Politècnica de Catalunya. PM - Programming Models
dc.identifier.doi: 10.1002/cpe.7885
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: https://onlinelibrary.wiley.com/doi/full/10.1002/cpe.7885
dc.rights.access: Open Access
local.identifier.drac: 37003704
dc.description.version: Postprint (published version)
dc.relation.projectid: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-107255GB-C22/ES/UPC-COMPUTACION DE ALTAS PRESTACIONES VIII/
local.citation.author: Gonzalez, M.; Morancho, E.
local.citation.publicationName: Concurrency and computation: practice and experience
local.citation.volume: 36
local.citation.number: 1, article e7885

