Multi-GPU parallelization of the NAS multi-zone parallel benchmarks

Carregant...
Miniatura
El pots comprar en digital a:
El pots comprar en paper a:

Projectes de recerca

Unitats organitzatives

Número de la revista

Títol de la revista

ISSN de la revista

Títol del volum

Col·laborador

Editor

Tribunal avaluador

Realitzat a/amb

Tipus de document

Article

Data publicació

Editor

Condicions d'accés

Accés obert

item.page.rightslicense

Tots els drets reservats. Aquesta obra està protegida pels drets de propietat intel·lectual i industrial corresponents. Sense perjudici de les exempcions legals existents, queda prohibida la seva reproducció, distribució, comunicació pública o transformació sense l'autorització de la persona titular dels drets

Assignatures relacionades

Assignatures relacionades

Publicacions relacionades

Datasets relacionats

Datasets relacionats

Projecte CCD

Abstract

GPU-based computing systems have become a widely accepted solution for the high-performance-computing (HPC) domain. GPUs have shown highly competitive performance-per-watt ratios and can exploit an astonishing level of parallelism. However, exploiting the peak performance of such devices is a challenge, mainly due to the combination of two essential aspects of multi-GPU execution. On one hand, the workload should be distributed evenly among the GPUs. On the other hand, communications between GPU devices are costly and should be minimized. Therefore, a trade-of between work-distribution schemes and communication overheads will condition the overall performance of parallel applications run on multi-GPU systems. In this article we present a multi-GPU implementation of NAS Multi-Zone Parallel Benchmarks (which execution alternate communication and computational phases). We propose several work-distribution strategies that try to evenly distribute the workload among the GPUs. Our evaluations show that performance is highly sensitive to this distribution strategy, as the the communication phases of the applications are heavily affected by the work-distribution schemes applied in computational phases. In particular, we consider Static, Dynamic, and Guided schedulers to find a trade-off between both phases to maximize the overall performance. In addition, we compare those schedulers with an optimal scheduler computed offline using IBM CPLEX. On an evaluation environment composed of 2 x IBM Power9 8335-GTH and 4 x GPU NVIDIA V100 (Volta), our multi-GPU parallelization outperforms single-GPU execution from 1.48x to 1.86x (2 GPUs) and from 1.75x to 3.54x (4 GPUs). This article analyses these improvements in terms of the relationship between the computational and communication phases of the applications as the number of GPUs is increased. We prove that Guided schedulers perform at similar level as optimal schedulers.

Descripció

Persones/entitats

Document relacionat

Versió de

Citació

González, M.; Morancho, E. Multi-GPU parallelization of the NAS multi-zone parallel benchmarks. "IEEE transactions on parallel and distributed systems", 1 Gener 2021, vol. 32, núm. 1, article 9162505, p. 229-241.

Ajut

Forma part

Dipòsit legal

ISBN

ISSN

1045-9219

Altres identificadors

Referències