Performance analysis and optimization of the FFTXlib on the Intel Knights Landing architecture
Cite as:
hdl:2117/109837
Document type: Conference proceedings paper
Publication date: 2017
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Access conditions: Open access
All rights reserved. This work is protected by the corresponding intellectual and industrial
property rights. Without prejudice to existing legal exemptions, its reproduction, distribution,
public communication or transformation is prohibited without the authorization of the rights holder.
Project: MaX - Materials design at the eXascale (EC-H2020-676598)
POP - Performance Optimisation and Productivity (EC-H2020-676553)
Abstract
In this paper, we address the decreasing performance of the FFTXlib, the Fast Fourier Transformation (FFT) kernel of Quantum ESPRESSO, when scaling to a full KNL node. Improving the performance of the FFTXlib will likewise improve the performance of the entire Quantum ESPRESSO code, one of the most widely used plane-wave DFT codes in the materials science community. Our approach focuses on, first, overlapping computation and communication and, second, decreasing resource contention for higher compute efficiency. To achieve this, we use the OmpSs programming model, which is based on task dependencies. We enable the overlap of computation and communication by converting all steps of the FFT into tasks that follow a flow dependency. In the same way, we decrease resource contention by converting each FFT into an individual task that can be scheduled asynchronously. In both cases, multiple FFTs can be computed in parallel. The task-based optimizations are implemented in the FFTXlib and show up to 10% runtime reduction relative to the already highly optimized version. Since the task scheduling is done dynamically during execution by the parallel runtime, not statically by the user, it also frees users from having to find the ideal parallel configuration themselves.
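The second strategy in the abstract, treating each FFT as an individual task that a runtime schedules asynchronously, can be illustrated with a minimal Python sketch. This is not the OmpSs implementation and not FFTXlib code: the function names (`dft`, `transpose`, `fft2d_task`, `run_batch`) are hypothetical, a naive DFT stands in for the real FFT kernel, a transpose stands in for the communication step, and a thread pool stands in for the task runtime. It only shows the structure of the idea; Python threads will not reproduce the performance behaviour reported in the paper.

```python
import cmath
from concurrent.futures import ThreadPoolExecutor


def dft(signal):
    """Naive O(n^2) discrete Fourier transform (stand-in for a real FFT kernel)."""
    n = len(signal)
    return [
        sum(signal[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def transform_rows(grid):
    """Apply the 1D transform to every row of a 2D grid."""
    return [dft(row) for row in grid]


def transpose(grid):
    """Stand-in for the data exchange (communication) between 1D transforms."""
    return [list(col) for col in zip(*grid)]


def fft2d_task(grid):
    """One complete 2D transform as a single task: rows, exchange, columns, exchange."""
    return transpose(transform_rows(transpose(transform_rows(grid))))


def run_batch(grids, workers=4):
    """Submit each transform as an independent task; the pool decides the schedule."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fft2d_task, g) for g in grids]
        return [f.result() for f in futures]
```

Because the tasks share no data, the pool is free to run them in any order, which mirrors the abstract's point that scheduling is left to the runtime rather than fixed by the user.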
Citation: Wagner, M., López, V., Morillo, J., Cavazzoni, C., Affinito, F., Gimenez, J., Labarta, J. Performance analysis and optimization of the FFTXlib on the Intel Knights Landing architecture. In: International Conference on Parallel Processing Workshops. "ICPPW 2017: 46th International Conference on Parallel Processing Workshops: 14 August 2017, Bristol, United Kingdom: proceedings". Bristol: Institute of Electrical and Electronics Engineers (IEEE), 2017, p. 243-250.
ISBN: 978-1-5386-1044-2
Publisher's version: http://ieeexplore.ieee.org/abstract/document/8026092/
Files | Description | Size | Format | View |
---|---|---|---|---|
Performance Analysis and Optimization of the.pdf | | 246.8 KB | PDF | View/Open |