Multiple target task sharing support for the OpenMP accelerator model

Ozen, Guray; Mateo, Sergi; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Beyer, James B.

doi:10.1007/978-3-319-45550-1_19

Multiple Target Task Sharing Support.pdf (1.647 MB)

Document type: Conference proceedings paper

Publication date: 2016

Publisher: Springer

Access conditions: Open access

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, its reproduction, distribution, public communication or transformation is prohibited without the authorization of the rights holder.

Project: COMPUTACION DE ALTAS PRESTACIONES VII (MINECO-TIN2015-65316-P)

Abstract

The use of GPU accelerators is becoming common in HPC platforms due to their effective performance and energy efficiency. In addition, new generations of multicore processors are being designed with wider vector units and/or larger hardware thread counts, also contributing to the peak performance of the whole system. Although current directive-based paradigms, such as OpenMP or OpenACC, support both accelerators and multicore-based hosts, they do not provide an effective and efficient way to use them concurrently, usually resulting in accelerated programs in which the potential computational performance of the host is not exploited. In this paper we propose an extension to the OpenMP 4.5 directive-based programming model to support the specification and execution of multiple instances of task regions on different devices (i.e. accelerators in conjunction with the vector and heavily multithreaded capabilities of multicore processors). The compiler is responsible for generating device-specific code for each device kind, delegating to the runtime system the dynamic scheduling of the tasks onto the available devices. The newly proposed clause conveys useful insight to guide the scheduler while keeping a clean, abstract and machine-independent programmer interface. The potential of the proposal is analyzed in a prototype implementation in the OmpSs compiler and runtime infrastructure. Performance evaluation is done using three kernels (N-Body, tiled matrix multiply and Stream) on different GPU-capable systems based on ARM, Intel x86 and IBM Power8. From the evaluation we observe speedups in the 8–20% range compared to versions in which only the GPU is used, reaching 96% of the additional peak performance thanks to the reduction of data transfers and the benefits introduced by the OmpSs NUMA-aware scheduler.

Citation: Ozen, G., Mateo, S., Ayguadé, E., Labarta, J., Beyer, J. Multiple target task sharing support for the OpenMP accelerator model. In: International Workshop on OpenMP. "OpenMP: memory, devices, and tasks: 12th International Workshop on OpenMP: IWOMP 2016: Nara, Japan: October 5-7, 2016: proceedings". Nara: Springer, 2016, p. 268-280.

URI: http://hdl.handle.net/2117/91300

DOI: 10.1007/978-3-319-45550-1_19

ISBN: 978-3-319-45549-5

Publisher's version: http://link.springer.com/chapter/10.1007%2F978-3-319-45550-1_19
