Show simple item record

dc.contributor.authorSainz, Florentino
dc.contributor.authorMateo Bellido, Sergi
dc.contributor.authorBeltran Querol, Vicenç
dc.contributor.authorBosque, José L.
dc.contributor.authorMartorell Bofill, Xavier
dc.contributor.authorAyguadé Parra, Eduard
dc.contributor.otherUniversitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.identifier.citationSainz, F. [et al.]. Leveraging OmpSs to exploit hardware accelerators. A: International Symposium on Computer Architecture and High Performance Computing. "IEEE 26th International Symposium on Computer Architecture and High Performance Computing: 22–24 October 2014: Paris, France: proceedings". París: Institute of Electrical and Electronics Engineers (IEEE), 2014, p. 112-119.
dc.description.abstractCUDA and OpenCL are the most widely used programming models to exploit hardware accelerators. Both programming models provide a C-based programming language to write accelerator kernels and a host API used to glue the host and kernel parts. Although this model is a clear improvement over a low-level and ad-hoc programming model for each hardware accelerator, it is still too complex and cumbersome for general adoption. For large and complex applications using several accelerators, the main problem becomes the explicit coordination and management of resources required between the host and the hardware accelerators that introduce a new family of issues (scheduling, data transfers, synchronization,...) that the programmer must take into account. In this paper, we propose a simple extension to OmpSs - a data-flow programming model - that dramatically simplifies the integration of accelerated code, in the form of CUDA or OpenCL kernels, into any C, C++ or Fortran application. Our proposal fully replaces the CUDA and OpenCL host APIs with a few pragmas, so we can leverage any kernel written in CUDA C or OpenCL C without any performance impact. Our compiler generates all the boilerplat code while our runtime system takes care of kernels scheduling, data transfers between host and accelerators and synchronizations between host and kernels parts. To evaluate our approach, we have ported several native CUDA and OpenCL applications to OmpSs by replacing all the CUDA or OpenCL API calls by a few number of pragmas. The OmpSs versions of these applications have competitive performance and scalability but with a significantly lower complexity than the original ones.
dc.format.extent8 p.
dc.publisherInstitute of Electrical and Electronics Engineers (IEEE)
dc.rightsAttribution-NonCommercial-NoDerivs 3.0 Spain
dc.subjectÀrees temàtiques de la UPC::Informàtica::Llenguatges de programació::C
dc.subjectÀrees temàtiques de la UPC::Informàtica::Arquitectura de computadors::Arquitectures paral·leles
dc.subject.lcshC (Computer program language)
dc.subject.lcshParallel programming (Computer science)
dc.titleLeveraging OmpSs to exploit hardware accelerators
dc.typeConference report
dc.subject.lemacC (Llenguatge de programació)
dc.subject.lemacProgramació en paral·lel (Informàtica)
dc.contributor.groupUniversitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.description.peerreviewedPeer Reviewed
dc.rights.accessRestricted access - publisher's policy
dc.description.versionPostprint (published version)
local.citation.authorSainz, F.; Mateo, S.; Beltran, V.; Bosque, J.; Martorell, X.; Ayguade, E.
local.citation.contributorInternational Symposium on Computer Architecture and High Performance Computing
local.citation.publicationNameIEEE 26th International Symposium on Computer Architecture and High Performance Computing: 22–24 October 2014: Paris, France: proceedings

Files in this item


This item appears in the following Collection(s)

Show simple item record

Attribution-NonCommercial-NoDerivs 3.0 Spain
Except where otherwise noted, content on this work is licensed under a Creative Commons license : Attribution-NonCommercial-NoDerivs 3.0 Spain