MPI+X: task-based parallelization and dynamic load balance of finite element assembly

Cite as:
hdl:2117/116807
Document type: Article
Defense date: 2018
Publisher: Taylor & Francis
Rights access: Open Access
Project: HPC4E - HPC for Energy (EC-H2020-689772)
EoCoE - Energy oriented Centre of Excellence for computer applications (EC-H2020-676629)
COMPUTACION DE ALTAS PRESTACIONES VII (MINECO-TIN2015-65316-P)
Abstract
The main computing tasks of a finite element (FE) code for solving partial differential equations (PDEs)
are the algebraic system assembly and the iterative solver. This work focuses on the first task, in the context
of a hybrid MPI+X paradigm. Although we will describe algorithms in the FE context, a similar strategy
can be straightforwardly applied to other discretization methods, like the finite volume method.
The matrix assembly consists of a loop over the elements of the MPI partition that computes element
matrices and right-hand sides and assembles them into the system local to each MPI partition. In an MPI+X
hybrid parallelism context, X has traditionally consisted of loop parallelism using OpenMP. Several strategies
have been proposed in the literature to implement this loop parallelism, such as coloring or substructuring
techniques, to circumvent the race condition that appears when assembling the element system into the local
system. The main drawback of the first technique is the decrease in instructions per cycle (IPC) due to poor
spatial locality. The second technique avoids this issue but requires extensive changes in the implementation,
which can be cumbersome when several element loops must be treated. We propose an alternative based on
task parallelism of the element loop, using some extensions to the OpenMP programming model. The taskification
of the assembly solves both aforementioned problems. In addition, dynamic load balance is
applied using the DLB library, which is especially efficient in the presence of hybrid meshes, where the relative
costs of the different elements are impossible to estimate a priori. This paper presents the proposed methodology,
its implementation and its validation through the solution of large computational mechanics problems on up
to 16k cores.
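
To make the race condition and the coloring remedy mentioned in the abstract concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation and does not show the task-based approach or the DLB library. The function names (`color_elements`, `assemble_by_color`), the 1D mesh and the element right-hand sides are all hypothetical, chosen only for illustration. The sketch assigns different colors to elements that share a node, then assembles one color at a time. Within a color, no two elements write to the same entry of the local right-hand side, so in an OpenMP code that inner loop is the one that could be run in parallel without atomics.

```python
from collections import defaultdict


def color_elements(elements):
    """Greedy coloring: two elements that share a node never get the same color."""
    node_to_elems = defaultdict(list)
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elems[n].append(e)

    colors = [-1] * len(elements)  # -1 means "not colored yet"
    for e, nodes in enumerate(elements):
        # Colors already taken by neighbours (elements sharing at least one node).
        used = {colors[nb] for n in nodes for nb in node_to_elems[n] if colors[nb] >= 0}
        c = 0
        while c in used:
            c += 1
        colors[e] = c
    return colors


def assemble_by_color(elements, element_rhs, n_nodes):
    """Assemble element right-hand sides into the local vector, one color at a time."""
    colors = color_elements(elements)
    rhs = [0.0] * n_nodes
    for c in range(max(colors, default=-1) + 1):
        # Elements of one color touch disjoint nodes, so the iterations of this
        # inner loop are independent (here they simply run one after another).
        for e, nodes in enumerate(elements):
            if colors[e] != c:
                continue
            for local, n in enumerate(nodes):
                rhs[n] += element_rhs[e][local]
    return rhs, colors


# Hypothetical 1D mesh: four two-node elements on five nodes.
elements = [(0, 1), (1, 2), (2, 3), (3, 4)]
element_rhs = [(0.5, 0.5)] * 4
rhs, colors = assemble_by_color(elements, element_rhs, n_nodes=5)
print(colors)  # [0, 1, 0, 1]
print(rhs)     # [0.5, 1.0, 1.0, 1.0, 0.5]
```

In this toy mesh, elements 0 and 2 share a color although they are not adjacent. Same-colored elements are by construction scattered across the mesh, which is consistent with the loss of spatial locality that the abstract attributes to coloring.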
Citation: Garcia-Gasulla, M. [et al.]. "MPI+X: task-based parallelization and dynamic load balance of finite element assembly". 2018.
ISSN: 1061-8562
Publisher version: https://www.tandfonline.com/doi/full/10.1080/10618562.2019.1617856
All rights reserved. This work is protected by the corresponding intellectual and industrial
property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public
communication or transformation of this work is prohibited without permission of the copyright holder.