Show simple item record

dc.contributor.author: Matsumura, Kazuaki
dc.contributor.author: García De Gonzalo, Simón
dc.contributor.author: Peña, Antonio
dc.date.accessioned: 2021-05-28T10:27:13Z
dc.date.available: 2021-05-28T10:27:13Z
dc.date.issued: 2021-05
dc.identifier.citation: Matsumura, K.; García De Gonzalo, S.; Peña, A. Predicate-based filtering for multi-GPU utilization in directive-based programming. A: . Barcelona Supercomputing Center, 2021, p. 48-49.
dc.identifier.uri: http://hdl.handle.net/2117/346337
dc.description.abstract: Designing and building supercomputers is a complex task in the field of high-performance computing (HPC). The hardware, middleware and algorithms need to collaborate effectively to achieve ideal results for massive and practical problems. To make supercomputers easier to use, compiler technologies have been developed with highly automated program optimizations that use domain-specific knowledge and an understanding of target architectures [1]. Directive-based programming has been employed to enable accelerator use, replacing vendor-specific coding with directive insertion. Because they preserve software portability with minimal engineering effort on top of sequential code, OpenACC and OpenMP are now widely used for accelerator programming [2], [3]. However, pursuing ideal performance is often challenging. The bare insertion of directives by programmers exposes fewer program characteristics to the compiler; thus, programmers aiming at better efficiency are forced to reshape their code merely to adjust to the environment, such as compilers, software stacks and heterogeneous architectures. While preserving productivity, our research extends OpenACC to exploit further optimization opportunities. In a portable fashion that relies on other compilers, our approach provides an environment that enables dynamic analysis of computation and performs on-the-fly kernel specialization. Considering the high memory latency of GPUs, we add a novel code-translation technique named predicate-based filtering to automate multi-device utilization. We neither split loop ranges nor introduce fine-grained dependency analysis; instead, we divide the data ranges to be updated on each device. This approach allows highly tuned code to be distributed across devices without changing its structure or parallelism.
dc.format.extent: 2 p.
dc.language: en
dc.language.iso: eng
dc.publisher: Barcelona Supercomputing Center
dc.subject: Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors
dc.subject.lcsh: High performance computing
dc.subject.other: Multi-GPU
dc.subject.other: OpenACC
dc.subject.other: Compiler
dc.subject.other: Code Generation
dc.title: Predicate-based filtering for multi-GPU utilization in directive-based programming
dc.type: Conference report
dc.subject.lemac: Càlcul intensiu (Informàtica)
dc.rights.access: Open Access
local.citation.startingPage: 48
local.citation.endingPage: 49
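
The abstract's central idea (keep the loop range intact and divide only the data range that each device updates) can be illustrated with a small sketch. This is one possible reading of the abstract, not the authors' OpenACC implementation: the devices are simulated sequentially in plain Python, the three-point stencil is an arbitrary example kernel, and the names partition, kernel, run_single_device and run_multi_device are hypothetical.

```python
from typing import List, Optional, Tuple


def partition(n: int, num_devices: int) -> List[Tuple[int, int]]:
    """Split the data range [0, n) into contiguous half-open ranges, one per device."""
    base, extra = divmod(n, num_devices)
    ranges = []
    start = 0
    for d in range(num_devices):
        size = base + (1 if d < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def kernel(src: List[int], dst: List[int],
           owned: Optional[Tuple[int, int]] = None) -> None:
    """Example stencil loop. The loop range is never split; only the write
    is guarded by a predicate on the destination index (assumed reading)."""
    for i in range(1, len(src) - 1):
        # Predicate: perform the update only if this device owns element i.
        if owned is None or owned[0] <= i < owned[1]:
            dst[i] = src[i - 1] + src[i] + src[i + 1]


def run_single_device(src: List[int]) -> List[int]:
    """Reference run without any filtering."""
    dst = [0] * len(src)
    kernel(src, dst)
    return dst


def run_multi_device(src: List[int], num_devices: int) -> List[int]:
    """Simulate several devices: each runs the same kernel over the full
    loop range but updates only its own slice of the output."""
    n = len(src)
    result = [0] * n
    for lo, hi in partition(n, num_devices):
        local = [0] * n                 # this device's private output copy
        kernel(src, local, (lo, hi))
        result[lo:hi] = local[lo:hi]    # gather only the owned slice
    return result
```

For instance, run_multi_device(list(range(10)), 3) gives the same list as run_single_device(list(range(10))), because every output element is written by exactly one simulated device.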

