dc.contributor.author: Catalán Pallarés, Sandra
dc.contributor.author: Usui, Tetsuzo
dc.contributor.author: Toledo, Leonel
dc.contributor.author: Martorell Bofill, Xavier
dc.contributor.author: Labarta Mancho, Jesús José
dc.contributor.author: Valero Lara, Pedro
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.contributor.other: Barcelona Supercomputing Center
dc.date.accessioned: 2020-10-20T07:25:46Z
dc.date.available: 2020-10-20T07:25:46Z
dc.date.issued: 2020
dc.identifier.citation: Catalán, S. [et al.]. Towards an auto-tuned and task-based SpMV (LASs Library). In: International Workshop on OpenMP. "OpenMP: Portable Multi-Level Parallelism on Modern Systems, 16th International Workshop on OpenMP, IWOMP 2020: Austin, TX, USA, September 22–24, 2020: proceedings". Berlín: Springer, 2020, p. 115-129. ISBN 978-3-030-58144-2. DOI 10.1007/978-3-030-58144-2_8.
dc.identifier.isbn: 978-3-030-58144-2
dc.identifier.uri: http://hdl.handle.net/2117/330455
dc.description.abstract: We present a novel approach to parallelizing the SpMV kernel included in the LASs (Linear Algebra routines on OmpSs) library, following an in-depth review and analysis of several well-known approaches. LASs is based on OmpSs, a task-based runtime that extends OpenMP directives and provides more flexibility to apply new strategies. Building on tasking and nesting, and with the aim of reducing the workload imbalance inherent to the SpMV operation, we present a strategy that is especially useful for highly imbalanced input matrices. In this approach, the number of tasks created is decided dynamically in order to maximize the use of the platform's resources. Throughout the paper, we analyze in depth how SpMV behaves under each strategy (both state-of-the-art and proposed), laying the groundwork for a future auto-tunable code able to select the most suitable approach for a given input matrix. The experiments use a set of 12 matrices from the SuiteSparse Matrix Collection, each with different sparsity characteristics. They were performed on a node of the MareNostrum 4 supercomputer (two Intel Xeon sockets with 24 cores each) and on a node of the Dibona cluster (using one ARM ThunderX2 socket with 32 cores). Our tests show that, on Intel Xeon, the best parallelization strategy reduces the execution time of the reference multi-threaded MKL version by up to 67%. On ARM ThunderX2, the reduction is up to 56% with respect to the OmpSs parallel reference.
dc.description.sponsorship: This project has received funding from the Spanish Ministry of Economy and Competitiveness under the project Computación de Altas Prestaciones VII (TIN2015-65316-P); from the Departament d’Innovació, Universitats i Empresa de la Generalitat de Catalunya under the project MPEXPAR: Models de Programació i Entorns d’Execució Paral·lels (2014-SGR-1051); from the Juan de la Cierva Grant Agreement No IJCI-2017-33511; and from the Spanish Ministry of Science and Innovation under the project Heterogeneidad y especialización en la era post-Moore (RTI2018-093684-BI00). We also acknowledge the funding provided by Fujitsu under the BSC-Fujitsu joint project: Math Libraries Migration and Optimization.
dc.format.extent: 15 p.
dc.language.iso: eng
dc.publisher: Springer
dc.subject: Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors
dc.subject.lcsh: Graphics processing units
dc.subject.lcsh: Parallel programming (Computer science)
dc.subject.other: SpMV
dc.subject.other: Tasking
dc.subject.other: Auto-tuning
dc.subject.other: Taskloop
dc.subject.other: Nesting
dc.subject.other: LASs OmpSs
dc.title: Towards an auto-tuned and task-based SpMV (LASs Library)
dc.type: Conference report
dc.subject.lemac: Unitats de processament gràfic
dc.subject.lemac: Programació en paral·lel (Informàtica)
dc.contributor.group: Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.identifier.doi: 10.1007/978-3-030-58144-2_8
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: https://link.springer.com/chapter/10.1007/978-3-030-58144-2_8
dc.rights.access: Open Access
local.identifier.drac: 29519631
dc.description.version: Postprint (author's final draft)
dc.relation.projectid: info:eu-repo/grantAgreement/MINECO/1PE/TIN2015-65316-P
dc.relation.projectid: info:eu-repo/grantAgreement/AGAUR/V PRI/2014 SGR 1051
local.citation.author: Catalán, S.; Usui, T.; Toledo, L.; Martorell, X.; Labarta, J.; Valero-Lara, P.
local.citation.contributor: International Workshop on OpenMP
local.citation.pubplace: Berlín
local.citation.publicationName: OpenMP: Portable Multi-Level Parallelism on Modern Systems, 16th International Workshop on OpenMP, IWOMP 2020: Austin, TX, USA, September 22–24, 2020: proceedings
local.citation.startingPage: 115
local.citation.endingPage: 129


All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work is prohibited without permission of the copyright holder.