Multi-GPU parallelization of the NAS multi-zone parallel benchmarks

González Tallada, Marc; Morancho Llena, Enrique

doi:10.1109/TPDS.2020.3015148

dc.contributor.author	González Tallada, Marc
dc.contributor.author	Morancho Llena, Enrique
dc.contributor.other	Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.date.accessioned	2021-01-07T11:54:26Z
dc.date.available	2021-01-07T11:54:26Z
dc.date.issued	2021-01-01
dc.identifier.citation	González, M.; Morancho, E. Multi-GPU parallelization of the NAS multi-zone parallel benchmarks. "IEEE transactions on parallel and distributed systems", 1 January 2021, vol. 32, no. 1, article 9162505, p. 229-241.
dc.identifier.issn	1045-9219
dc.identifier.uri	http://hdl.handle.net/2117/334984
dc.description.abstract	GPU-based computing systems have become a widely accepted solution for the high-performance-computing (HPC) domain. GPUs have shown highly competitive performance-per-watt ratios and can exploit an astonishing level of parallelism. However, exploiting the peak performance of such devices is a challenge, mainly due to the combination of two essential aspects of multi-GPU execution. On the one hand, the workload should be distributed evenly among the GPUs. On the other hand, communications between GPU devices are costly and should be minimized. Therefore, a trade-off between work-distribution schemes and communication overheads will determine the overall performance of parallel applications run on multi-GPU systems. In this article, we present a multi-GPU implementation of the NAS Multi-Zone Parallel Benchmarks (whose execution alternates communication and computational phases). We propose several work-distribution strategies that try to evenly distribute the workload among the GPUs. Our evaluations show that performance is highly sensitive to this distribution strategy, as the communication phases of the applications are heavily affected by the work-distribution schemes applied in the computational phases. In particular, we consider Static, Dynamic, and Guided schedulers to find a trade-off between both phases that maximizes the overall performance. In addition, we compare those schedulers with an optimal scheduler computed offline using IBM CPLEX. On an evaluation environment composed of 2 x IBM Power9 8335-GTH and 4 x GPU NVIDIA V100 (Volta), our multi-GPU parallelization outperforms single-GPU execution by 1.48x to 1.86x (2 GPUs) and by 1.75x to 3.54x (4 GPUs). This article analyses these improvements in terms of the relationship between the computational and communication phases of the applications as the number of GPUs is increased. We prove that Guided schedulers perform at a level similar to that of optimal schedulers.
dc.description.sponsorship	This work was supported by the Spanish Ministry of Science and Technology (TIN2015-65316-P) and by the Generalitat de Catalunya (2014-SGR-1051).
dc.format.extent	13 p.
dc.language.iso	eng
dc.subject	Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors::Arquitectures paral·leles
dc.subject.lcsh	Graphics processing units
dc.subject.lcsh	Parallel programming (Computer science)
dc.subject.other	Multi-GPU parallelization
dc.subject.other	Load balancing
dc.subject.other	Static
dc.subject.other	Dynamic
dc.subject.other	Guided schedulings
dc.title	Multi-GPU parallelization of the NAS multi-zone parallel benchmarks
dc.type	Article
dc.subject.lemac	Unitats de processament gràfic
dc.subject.lemac	Programació en paral·lel (Informàtica)
dc.contributor.group	Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.identifier.doi	10.1109/TPDS.2020.3015148
dc.description.peerreviewed	Peer Reviewed
dc.relation.publisherversion	https://ieeexplore.ieee.org/document/9162505
dc.rights.access	Open Access
local.identifier.drac	29925190
dc.description.version	Postprint (author's final draft)
dc.relation.projectid	info:eu-repo/grantAgreement/MINECO//TIN2015-65316-P/ES/COMPUTACION DE ALTAS PRESTACIONES VII/
dc.relation.projectid	info:eu-repo/grantAgreement/AGAUR/V PRI/2014 SGR 1051
local.citation.author	González, M.; Morancho, E.
local.citation.publicationName	IEEE transactions on parallel and distributed systems
local.citation.volume	32
local.citation.number	1, article 9162505
local.citation.startingPage	229
local.citation.endingPage	241

Files in this item

Name: tpds2020.pdf
Size: 4.782 MB
Format: PDF

This item appears in the following collections

Journal articles [1,049]
Journal articles [382]
