dc.contributor.author: Valero-Lara, Pedro
dc.contributor.author: Martinez-Perez, Ivan
dc.contributor.author: Mateo, Sergio
dc.contributor.author: Sirvent Pardell, Raül
dc.contributor.author: Beltran Querol, Vicenç
dc.contributor.author: Martorell Bofill, Xavier
dc.contributor.author: Labarta Mancho, Jesús José
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.contributor.other: Barcelona Supercomputing Center
dc.date.accessioned: 2018-11-12T14:43:48Z
dc.date.issued: 2018
dc.identifier.citation: Valero-Lara, P., Martinez-Perez, I., Mateo, S., Sirvent, R., Beltran, V., Martorell, X., Labarta, J. Variable batched DGEMM. A: Euromicro International Conference on Parallel, Distributed, and Network-Based Processing. "PDP 2018: 26th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing: proceedings: Cambridge, United Kingdom 21-23 March 2018". Institute of Electrical and Electronics Engineers (IEEE), 2018, p. 363-367.
dc.identifier.isbn: 978-1-5386-4976-3
dc.identifier.uri: http://hdl.handle.net/2117/123997
dc.description.abstract: Many scientific applications need to solve a large number of small, independent problems. Individually, these problems do not expose enough parallelism, so they must be computed as a batch. Vendors such as Intel and NVIDIA are developing their own suites of batch routines. Although most existing work focuses on batches of fixed size, real applications cannot assume a uniform size across all problems in a set. We explore and analyze different strategies based on the parallel for, task, and taskloop OpenMP pragmas. Although these strategies are straightforward from a programmer's point of view, they differ in their impact on performance. We also analyze a new prototype provided by Intel (MKL) for batch operations (cblas_dgemm_batch). We propose a new approach called grouping, which accumulates problems into a group until a limit on memory occupancy or number of operations is reached. Groups composed of different numbers of problems are then distributed across cores, achieving a more balanced distribution of computational cost. This strategy is up to 6× faster than the Intel (MKL) batch routine.
dc.description.sponsorship: This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 720270 (HBP SGA1), from the Spanish Ministry of Economy and Competitiveness under the project Computacion de Altas Prestaciones VII (TIN2015-65316-P) and the Departament d’Innovacio, Universitats i Empresa de la Generalitat de Catalunya, under project MPEXPAR: Models de Programacio i Entorns d’Execucio Paral·lels (2014-SGR-1051).
dc.format.extent: 5 p.
dc.language.iso: eng
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE)
dc.subject: Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors::Arquitectures paral·leles
dc.subject.lcsh: Parallel processing (Electronic computers)
dc.subject.other: Auto-tuning
dc.subject.other: Batched BLAS
dc.subject.other: DGEMM
dc.subject.other: Intel Xeon
dc.subject.other: OpenMP
dc.subject.other: Runtime
dc.title: Variable batched DGEMM
dc.type: Conference report
dc.subject.lemac: Processament en paral·lel (Ordinadors)
dc.contributor.group: Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.identifier.doi: 10.1109/PDP2018.2018.00065
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: https://ieeexplore.ieee.org/document/8374488
dc.rights.access: Open Access
local.identifier.drac: 23438756
dc.description.version: Postprint (author's final draft)
dc.relation.projectid: info:eu-repo/grantAgreement/MINECO//TIN2015-65316-P/ES/COMPUTACION DE ALTAS PRESTACIONES VII/
dc.relation.projectid: info:eu-repo/grantAgreement/EC/H2020/720270/EU/Human Brain Project Specific Grant Agreement 1/HBP SGA1
dc.relation.projectid: info:eu-repo/grantAgreement/AGAUR/PRI2010-2013/2014 SGR 1051
local.citation.author: Valero-Lara, P.; Martinez-Perez, I.; Mateo, S.; Sirvent, R.; Beltran, V.; Martorell, X.; Labarta, J.
local.citation.contributor: Euromicro International Conference on Parallel, Distributed, and Network-Based Processing
local.citation.publicationName: PDP 2018: 26th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing: proceedings: Cambridge, United Kingdom 21-23 March 2018
local.citation.startingPage: 363
local.citation.endingPage: 367
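
The grouping idea summarized in the abstract (accumulate problems into a group until a limit on the number of operations is reached, then hand whole groups to cores) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the names `gemm_flops`, `group_problems`, and `flop_limit`, the 2*m*n*k cost model, and the greedy in-order packing policy are all hypothetical, since the abstract does not specify them.

```python
from typing import List, Tuple

# One problem is (m, n, k) for C[m x n] += A[m x k] * B[k x n].
Problem = Tuple[int, int, int]


def gemm_flops(problem: Problem) -> int:
    """Assumed cost model: a GEMM performs about 2*m*n*k operations."""
    m, n, k = problem
    return 2 * m * n * k


def group_problems(problems: List[Problem], flop_limit: int) -> List[List[int]]:
    """Greedily pack problem indices, in order, into groups.

    A group is closed when adding the next problem would push its total
    cost past flop_limit. A single problem larger than the limit still
    gets a group of its own. (Hypothetical policy, for illustration.)
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_flops = 0
    for idx, problem in enumerate(problems):
        cost = gemm_flops(problem)
        if current and current_flops + cost > flop_limit:
            groups.append(current)
            current, current_flops = [], 0
        current.append(idx)
        current_flops += cost
    if current:
        groups.append(current)
    return groups


# A batch of variable-size problems (sizes chosen arbitrarily).
batch = [(2, 2, 2), (4, 4, 4), (2, 2, 2), (8, 8, 8), (2, 2, 2), (2, 2, 2)]
groups = group_problems(batch, flop_limit=200)
group_costs = [sum(gemm_flops(batch[i]) for i in g) for g in groups]
```

With this input the sketch yields the groups `[[0, 1, 2], [3], [4, 5]]`: several small problems share a group while the large one occupies a group alone. Each group could then be dispatched as one unit of work, for example one OpenMP task per group; a memory-occupancy limit could be substituted for the operation count in the same way.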

