MPI+X: task-based parallelization and dynamic load balance of finite element assembly

Garcia-Gasulla, Marta; Houzeaux, Guillaume; Ferrer, Roger; Artigues, Antoni; López, Victor; Labarta Mancho, Jesús José; Vázquez, Mariano

doi:10.1080/10618562.2019.1617856

dc.contributor.author	Garcia-Gasulla, Marta
dc.contributor.author	Houzeaux, Guillaume
dc.contributor.author	Ferrer, Roger
dc.contributor.author	Artigues, Antoni
dc.contributor.author	López, Victor
dc.contributor.author	Labarta Mancho, Jesús José
dc.contributor.author	Vázquez, Mariano
dc.contributor.other	Barcelona Supercomputing Center
dc.date.accessioned	2018-04-27T13:21:03Z
dc.date.available	2018-04-27T13:21:03Z
dc.date.issued	2018
dc.identifier.citation	Garcia-Gasulla, M. [et al.]. "MPI+X: task-based parallelization and dynamic load balance of finite element assembly". 2018.
dc.identifier.issn	1061-8562
dc.identifier.uri	http://hdl.handle.net/2117/116807
dc.description.abstract	The main computing tasks of a finite element (FE) code for solving partial differential equations (PDEs) are the assembly of the algebraic system and the iterative solver. This work focuses on the first task, in the context of a hybrid MPI+X paradigm. Although we describe the algorithms in the FE context, a similar strategy can be applied straightforwardly to other discretization methods, such as the finite volume method. The matrix assembly consists of a loop over the elements of the MPI partition that computes the element matrices and right-hand sides and assembles them into the system local to each MPI partition. In an MPI+X hybrid parallelism context, X has traditionally consisted of loop parallelism using OpenMP. Several strategies have been proposed in the literature to implement this loop parallelism, such as coloring or substructuring techniques, to circumvent the race condition that appears when the element system is assembled into the local system. The main drawback of the first technique is the decrease in IPC (instructions per cycle) caused by poor spatial locality. The second technique avoids this issue but requires extensive changes to the implementation, which can be cumbersome when several element loops must be treated. We propose an alternative based on task parallelism of the element loop, using some extensions to the OpenMP programming model. The taskification of the assembly solves both of the aforementioned problems. In addition, dynamic load balance is applied using the DLB library, which is especially efficient in the presence of hybrid meshes, where the relative costs of the different elements are impossible to estimate a priori. This paper presents the proposed methodology, its implementation, and its validation through the solution of large computational mechanics problems on up to 16k cores.
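The coloring strategy that the abstract contrasts with taskification can be illustrated with a small sketch. This is not the authors' implementation (their proposal is task-based, using OpenMP extensions); it is a hypothetical toy that assumes a 1D mesh of linear elements for -u'' = f, with Python threads from `concurrent.futures` standing in for OpenMP threads. All names (`color_elements`, `element_system`, `assemble_colored`) are illustrative. Elements are greedily colored so that no two elements of the same color share a node; each color is then assembled in parallel without locks, with a barrier between colors.

```python
from concurrent.futures import ThreadPoolExecutor


def color_elements(conn, n_nodes):
    """Greedy coloring: two elements that share a node never get the same color."""
    used = [set() for _ in range(n_nodes)]  # colors already touching each node
    colors = []
    for nodes in conn:
        forbidden = set().union(*(used[n] for n in nodes))
        c = 0
        while c in forbidden:  # lowest color not used by any neighbor element
            c += 1
        colors.append(c)
        for n in nodes:
            used[n].add(c)
    return colors


def element_system(x0, x1, f):
    """Hypothetical linear 1D element for -u'' = f: stiffness and consistent load."""
    h = x1 - x0
    ke = [[1.0 / h, -1.0 / h], [-1.0 / h, 1.0 / h]]
    fe = [f * h / 2.0, f * h / 2.0]
    return ke, fe


def assemble_colored(coords, conn, f, workers=4):
    n = len(coords)
    K = [[0.0] * n for _ in range(n)]  # dense global matrix, for illustration only
    b = [0.0] * n
    colors = color_elements(conn, n)

    def assemble_one(e):
        # Compute the element system and scatter it into the global system.
        nodes = conn[e]
        ke, fe = element_system(coords[nodes[0]], coords[nodes[1]], f)
        for i, gi in enumerate(nodes):
            b[gi] += fe[i]
            for j, gj in enumerate(nodes):
                K[gi][gj] += ke[i][j]

    # Under CPython's GIL this gives no real speedup; the point is race freedom.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for c in range(max(colors) + 1):
            batch = [e for e, ce in enumerate(colors) if ce == c]
            # Same-color elements share no node, so their scatters never collide.
            # list() waits for the whole color: a barrier before the next color.
            list(pool.map(assemble_one, batch))
    return K, b, colors


# Usage: 10 elements on [0, 1] with a uniform load f = 1.
n_elem = 10
coords = [i / n_elem for i in range(n_elem + 1)]
conn = [(e, e + 1) for e in range(n_elem)]
K, b, colors = assemble_colored(coords, conn, f=1.0)
```

In this 1D toy the colors simply alternate between even and odd elements. In a realistic 3D mesh, consecutive elements of one color are generally far apart in memory, which is the kind of loss of spatial locality the abstract identifies as the cause of the IPC drop.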
dc.description.sponsorship	The use of a large part of a supercomputer, all the more so under normal conditions of use, is never an innocuous exercise. The research leading to these results has received funding from: the European Union's Horizon 2020 Programme (2014–2020) and the Brazilian Ministry of Science, Technology and Innovation through Rede Nacional de Pesquisa (RNP), HPC4E Project, grant agreement 689772; the Energy oriented Centre of Excellence (EoCoE), grant agreement number 676629, funded within the Horizon 2020 framework of the European Union; the Spanish Government (grant SEV2015-0493 of the Severo Ochoa Program); the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P); the Generalitat de Catalunya (contract 2014-SGR-1051); the Intel-BSC Exascale Lab collaboration project; and the Comissió Interdepartamental de Recerca i Innovació Tecnològica (Interdepartmental Commission for Research and Technological Innovation).
dc.format.extent	26 p.
dc.language.iso	eng
dc.publisher	Taylor & Francis
dc.subject	Àrees temàtiques de la UPC::Informàtica (UPC subject areas::Computer science)
dc.subject.lcsh	OpenMP
dc.subject.other	Finite element code (FE)
dc.subject.other	OpenMP
dc.title	MPI+X: task-based parallelization and dynamic load balance of finite element assembly
dc.type	Article
dc.subject.lemac	OpenMP
dc.identifier.doi	10.1080/10618562.2019.1617856
dc.description.peerreviewed	Yes
dc.relation.publisherversion	https://www.tandfonline.com/doi/full/10.1080/10618562.2019.1617856
dc.rights.access	Open Access
dc.description.version	Post-print (author's final draft)
dc.relation.projectid	info:eu-repo/grantAgreement/EC/H2020/689772/EU/HPC for Energy/HPC4E
dc.relation.projectid	info:eu-repo/grantAgreement/EC/H2020/676629/EU/Energy oriented Centre of Excellence for computer applications/EoCoE
dc.relation.projectid	info:eu-repo/grantAgreement/MINECO//TIN2015-65316-P/ES/COMPUTACION DE ALTAS PRESTACIONES VII/
local.citation.publicationName	International Journal of Computational Fluid Dynamics

Files in this item

Name: arti66.pdf
Size: 4.421 MB
Format: PDF

This item appears in the following collections:

Altres (Others) [2]
