dc.contributor.author: Wagner, Michael
dc.contributor.author: López, Victor
dc.contributor.author: Morillo, Julian
dc.contributor.author: Cavazzoni, Carlo
dc.contributor.author: Affinito, Fabio
dc.contributor.author: Gimenez, Judit
dc.contributor.author: Labarta Mancho, Jesús José
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.date.accessioned: 2017-11-06T10:22:52Z
dc.date.available: 2017-11-06T10:22:52Z
dc.date.issued: 2017
dc.identifier.citation: Wagner, M., López, V., Morillo, J., Cavazzoni, C., Affinito, F., Gimenez, J., Labarta, J. Performance analysis and optimization of the FFTXlib on the Intel knights landing architecture. In: International Conference on Parallel Processing Workshops. "ICPPW 2017: 46th International Conference on Parallel Processing Workshops: 14 August 2017, Bristol, United Kingdom: proceedings". Bristol: Institute of Electrical and Electronics Engineers (IEEE), 2017, p. 243-250.
dc.identifier.isbn: 978-1-5386-1044-2
dc.identifier.uri: http://hdl.handle.net/2117/109837
dc.description.abstract: In this paper, we address the decreasing performance of the FFTXlib, the Fast Fourier Transform (FFT) kernel of Quantum ESPRESSO, when scaling to a full KNL node. Increased performance in the FFTXlib will likewise increase the performance of the entire Quantum ESPRESSO code, one of the most used plane-wave DFT codes in the materials science community. Our approach focuses on, first, overlapping computation and communication and, second, decreasing resource contention for higher compute efficiency. To achieve this, we use the OmpSs programming model, which is based on task dependencies. We allow overlapping of computation and communication by converting all steps of the FFT into tasks following a flow dependency. In the same way, we decrease resource contention by converting each FFT into an individual task that can be scheduled asynchronously. In both cases, multiple FFTs can be computed in parallel. The task-based optimizations are implemented in the FFTXlib and show up to 10% runtime reduction on the already highly optimized version. Since the task scheduling is done dynamically during execution by the parallel runtime, not statically by the user, it also frees users from finding the ideal parallel configuration themselves.
dc.description.sponsorship: We gratefully acknowledge the support of the MaX and POP projects, which have received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreements No. 676598 and No. 676553, respectively.
dc.format.extent: 8 p.
dc.language.iso: eng
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE)
dc.subject: Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors
dc.subject.lcsh: Parallel processing (Electronic computers)
dc.subject.lcsh: Software architecture
dc.subject.lcsh: High performance computing
dc.subject.other: Tools
dc.subject.other: Runtime
dc.subject.other: Computer architecture
dc.subject.other: Kernel
dc.subject.other: Optimization
dc.subject.other: Discrete Fourier transforms
dc.subject.other: Programming
dc.subject.other: Performance analysis
dc.subject.other: Tracing
dc.subject.other: KNL
dc.subject.other: Knights landing
dc.subject.other: Xeon Phi
dc.subject.other: HPC
dc.subject.other: Extrae
dc.subject.other: Paraver
dc.subject.other: Quantum Espresso
dc.subject.other: FFTXlib
dc.title: Performance analysis and optimization of the FFTXlib on the Intel knights landing architecture
dc.type: Conference report
dc.subject.lemac: Processament en paral·lel (Ordinadors)
dc.subject.lemac: Programari -- Disseny
dc.subject.lemac: Càlcul intensiu (Informàtica)
dc.contributor.group: Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.identifier.doi: 10.1109/ICPPW.2017.44
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: http://ieeexplore.ieee.org/abstract/document/8026092/
dc.rights.access: Open Access
local.identifier.drac: 21548576
dc.description.version: Postprint (author's final draft)
dc.relation.projectid: info:eu-repo/grantAgreement/EC/H2020/676598/EU/Materials design at the eXascale/MaX
dc.relation.projectid: info:eu-repo/grantAgreement/EC/H2020/676553/EU/Performance Optimisation and Productivity/POP
local.citation.author: Wagner, M.; López, V.; Morillo, J.; Cavazzoni, C.; Affinito, F.; Gimenez, J.; Labarta, J.
local.citation.contributor: International Conference on Parallel Processing Workshops
local.citation.pubplace: Bristol
local.citation.publicationName: ICPPW 2017: 46th International Conference on Parallel Processing Workshops: 14 August 2017, Bristol, United Kingdom: proceedings
local.citation.startingPage: 243
local.citation.endingPage: 250



All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work is prohibited without permission of the copyright holder.