Show simple item record

dc.contributor.author    Lorenzon, Arthur F.
dc.contributor.author    Marques, Sandro M. V. N.
dc.contributor.author    Navarro Muñoz, Antoni
dc.contributor.author    Beltran Querol, Vicenç
dc.contributor.other    Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors
dc.date.accessioned    2022-06-30T08:17:19Z
dc.date.available    2022-06-30T08:17:19Z
dc.date.issued    2022
dc.identifier.citation    Lorenzon, A. [et al.]. Seamless optimization of the GEMM kernel for task-based programming models. A: International Conference on Supercomputing. "Proceedings of the 36th ACM International Conference on Supercomputing (ICS-2022): virtual event, June 27–30, 2022". New York: Association for Computing Machinery (ACM), 2022, ISBN 978-1-4503-9281-5. DOI 10.1145/3524059.3532385.
dc.identifier.isbn    978-1-4503-9281-5
dc.identifier.uri    http://hdl.handle.net/2117/369338
dc.description.abstract    The general matrix-matrix multiplication (GEMM) kernel is a fundamental building block of many scientific applications. Many libraries such as Intel MKL and BLIS provide highly optimized sequential and parallel versions of this kernel. The parallel implementations of the GEMM kernel rely on the well-known fork-join execution model to exploit multi-core systems efficiently. However, these implementations are not well suited for task-based applications as they break the data-flow execution model. In this paper, we present a task-based implementation of the GEMM kernel that can be seamlessly leveraged by task-based applications while providing better performance than the fork-join version. Our implementation leverages several advanced features of the OmpSs-2 programming model and a new heuristic to select the best parallelization strategy and blocking parameters based on the matrix and hardware characteristics. When evaluating the performance and energy consumption on two modern multi-core systems, we show that our implementations provide significant performance improvements over an optimized OpenMP fork-join implementation, and can beat vendor implementations of the GEMM (e.g., Intel MKL and AMD AOCL). We also demonstrate that a real application can leverage our optimized task-based implementation to enhance performance.
dc.language.iso    eng
dc.publisher    Association for Computing Machinery (ACM)
dc.subject    Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors::Arquitectures paral·leles
dc.subject.lcsh    Parallel processing (Electronic computers)
dc.subject.lcsh    Microprocessors -- Energy consumption
dc.subject.other    GEMM
dc.subject.other    Malleability
dc.subject.other    Parallel computing
dc.subject.other    Energy-efficiency
dc.title    Seamless optimization of the GEMM kernel for task-based programming models
dc.type    Conference report
dc.subject.lemac    Processament en paral·lel (Ordinadors)
dc.subject.lemac    Microprocessadors -- Consum d'energia
dc.identifier.doi    10.1145/3524059.3532385
dc.description.peerreviewed    Peer Reviewed
dc.relation.publisherversion    https://dl.acm.org/doi/10.1145/3524059.3532385
dc.rights.access    Open Access
local.identifier.drac    33880378
dc.description.version    Postprint (author's final draft)
local.citation.author    Lorenzon, A.; Marques, S.; Navarro, A.; Beltran, V.
local.citation.contributor    International Conference on Supercomputing
local.citation.pubplace    New York
local.citation.publicationName    Proceedings of the 36th ACM International Conference on Supercomputing (ICS-2022): virtual event, June 27–30, 2022
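
Note: the abstract above describes a task-based, blocked formulation of GEMM that preserves a data-flow execution model, in contrast to fork-join parallel kernels. Purely as a rough illustration (not the authors' OmpSs-2 implementation, blocking heuristic, or micro-kernel), the minimal C sketch below expresses each block update of C = C + A * B as a task with data dependences, using the closely related OpenMP "task depend" syntax. The block size BS, the matrix order n, and the naive inner kernel gemm_block are illustrative assumptions.

/*
 * Minimal sketch of a task-based blocked GEMM (C = C + A * B), assuming
 * row-major square matrices of order n, with n a multiple of the block
 * size BS. Illustrative only: the paper targets OmpSs-2 and selects the
 * blocking and parallelization strategy with a heuristic not shown here.
 */
#include <stdlib.h>
#include <stdio.h>

#define BS 128  /* illustrative block size, not the paper's tuned value */

/* Naive inner kernel updating one BS x BS block of C; a real implementation
 * would call an optimized micro-kernel (e.g. from BLIS or MKL) instead. */
static void gemm_block(const double *A, const double *B, double *C, int n,
                       int bi, int bj, int bk)
{
    for (int i = bi; i < bi + BS; i++)
        for (int k = bk; k < bk + BS; k++)
            for (int j = bj; j < bj + BS; j++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}

/* Each block update becomes a task. A single representative element of each
 * block acts as the dependence object (a common idiom): the inout dependence
 * on C[bi*n+bj] orders updates to the same output block, while tasks writing
 * different C blocks may run concurrently, data-flow style. */
static void gemm_tasks(const double *A, const double *B, double *C, int n)
{
    for (int bi = 0; bi < n; bi += BS)
        for (int bj = 0; bj < n; bj += BS)
            for (int bk = 0; bk < n; bk += BS) {
                #pragma omp task depend(in: A[bi * n + bk], B[bk * n + bj]) \
                                 depend(inout: C[bi * n + bj])
                gemm_block(A, B, C, n, bi, bj, bk);
            }
    #pragma omp taskwait
}

int main(void)
{
    int n = 512;  /* must be a multiple of BS in this simplified sketch */
    double *A = calloc((size_t)n * n, sizeof *A);
    double *B = calloc((size_t)n * n, sizeof *B);
    double *C = calloc((size_t)n * n, sizeof *C);
    for (int i = 0; i < n * n; i++) { A[i] = 1.0; B[i] = 1.0; }

    /* One thread creates the tasks; the team executes them. */
    #pragma omp parallel
    #pragma omp single
    gemm_tasks(A, B, C, n);

    printf("C[0] = %f (expected %d)\n", C[0], n);
    free(A); free(B); free(C);
    return 0;
}

In this sketch the dependence graph, rather than a fork-join barrier, determines when block updates may run, which is the property that lets such a kernel compose with a surrounding task-based application as the abstract describes.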

