Performance analysis and optimization of the FFTXlib on the Intel Knights Landing architecture

Wagner, Michael; López, Victor; Morillo, Julian; Cavazzoni, Carlo; Affinito, Fabio; Gimenez, Judit; Labarta Mancho, Jesús José

doi:10.1109/ICPPW.2017.44


Document type: Conference proceedings paper

Publication date: 2017

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Access conditions: Open access

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication, or transformation is prohibited without the authorization of the rights holder.

Projects:
MaX - Materials design at the eXascale (EC-H2020-676598)
POP - Performance Optimisation and Productivity (EC-H2020-676553)

Abstract

In this paper, we address the declining performance of the FFTXlib, the Fast Fourier Transform (FFT) kernel of Quantum ESPRESSO, when scaling to a full Intel Knights Landing (KNL) node. Improving the performance of the FFTXlib likewise improves the performance of the entire Quantum ESPRESSO code, one of the most widely used plane-wave DFT codes in the materials science community. Our approach focuses on, first, overlapping computation and communication and, second, decreasing resource contention for higher compute efficiency. To achieve this, we use the OmpSs programming model, which is based on task dependencies. We enable the overlapping of computation and communication by converting all steps of the FFT into tasks that follow a flow dependency. In the same way, we decrease resource contention by converting each FFT into an individual task that can be scheduled asynchronously. In both cases, multiple FFTs can be computed in parallel. The task-based optimizations are implemented in the FFTXlib and show up to 10% runtime reduction over the already highly optimized version. Since the task scheduling is done dynamically during execution by the parallel runtime, not statically by the user, it also frees users from having to find the ideal parallel configuration themselves.
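The task structure described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation and does not use OmpSs: Python's `concurrent.futures` thread pool stands in for the task runtime, a naive 2D transform stands in for the 3D FFT, and a transpose with a short delay stands in for the MPI exchange. All function names (`dft_rows`, `exchange`, `submit_after`, `run_fft_tasks`) are hypothetical and chosen for illustration only.

```python
import cmath
import time
from concurrent.futures import ThreadPoolExecutor


def dft(values):
    """Naive O(n^2) discrete Fourier transform of one row."""
    n = len(values)
    return [
        sum(values[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def dft_rows(grid):
    """Compute step: transform every row of the grid."""
    return [dft(row) for row in grid]


def exchange(grid):
    """Assumed stand-in for the communication step: a delayed transpose."""
    time.sleep(0.001)  # pretend to wait on the network
    return [list(column) for column in zip(*grid)]


def submit_after(pool, step, dependency):
    """Queue `step` as a task that consumes the result of `dependency`."""
    return pool.submit(lambda: step(dependency.result()))


def run_fft_tasks(grids, workers=4):
    """Send each grid through a chain of three dependent tasks.

    Returns the 2D DFT of each grid in transposed layout.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Tasks are queued stage by stage, so a task only ever waits on
        # work queued ahead of it. A real task runtime would track the
        # dependency itself instead of blocking a worker thread.
        tasks = [pool.submit(dft_rows, grid) for grid in grids]
        tasks = [submit_after(pool, exchange, task) for task in tasks]
        tasks = [submit_after(pool, dft_rows, task) for task in tasks]
        return [task.result() for task in tasks]
```

Calling `run_fft_tasks` with a list of equally sized square grids returns one transformed grid per input. While one grid waits in `exchange`, other workers are free to run the compute steps of other grids, which is the kind of overlap the abstract describes. The sketch shows only the dependency structure; it says nothing about the performance measured in the paper.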

Citation: Wagner, M., López, V., Morillo, J., Cavazzoni, C., Affinito, F., Gimenez, J., Labarta, J. Performance analysis and optimization of the FFTXlib on the Intel Knights Landing architecture. In: International Conference on Parallel Processing Workshops. "ICPPW 2017: 46th International Conference on Parallel Processing Workshops: 14 August 2017, Bristol, United Kingdom: proceedings". Bristol: Institute of Electrical and Electronics Engineers (IEEE), 2017, p. 243-250.

URI: http://hdl.handle.net/2117/109837

DOI: 10.1109/ICPPW.2017.44

ISBN: 978-1-5386-1044-2

Publisher's version: http://ieeexplore.ieee.org/abstract/document/8026092/
