dc.contributor.author: Cebrián González, Juan Manuel
dc.contributor.author: Balem, Thibaud
dc.contributor.author: Barredo Ferreira, Adrián
dc.contributor.author: Casas, Marc
dc.contributor.author: Moretó Planas, Miquel
dc.contributor.author: Ros Bardisa, Alberto
dc.contributor.author: Jimborean, Alexandra
dc.contributor.other: Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.contributor.other: Barcelona Supercomputing Center
dc.date.accessioned: 2021-07-14T13:04:20Z
dc.date.available: 2021-07-14T13:04:20Z
dc.date.issued: 2022-04-01
dc.identifier.citation: Cebrián, J. [et al.]. Compiler-assisted compaction/restoration of SIMD instructions. "IEEE transactions on parallel and distributed systems", 1 April 2022, vol. 33, no. 4, p. 779-791.
dc.identifier.issn: 1045-9219
dc.identifier.uri: http://hdl.handle.net/2117/349312
dc.description.abstract: All the supercomputers in the world exploit data-level parallelism (DLP), for example by using single instructions to operate over several data elements. Improving vector processing is therefore key for exascale computing. Control flow divergence is one of the main factors limiting vector performance. Most modern vector instruction sets rely on predication to support divergence control. Nevertheless, the performance and energy consumption of predicated codes are usually insensitive to the number of active elements. Since the trend is that vector register size doubles every four years, the energy efficiency of exascale systems will become sub-optimal. This paper proposes the Compiler-Assisted Compaction/Restoration (CACR) technique. The baseline CR delays predicated SIMD instructions with inactive elements and compacts active elements with instances of the same instruction from later loop iterations to form and execute an equivalent dense vector instruction. The compiler-assisted CR analyzes the code looking for key information required to configure CR. Then, it passes this information to the processor via new instructions. Our evaluation shows that CACR improves performance by up to 29% and reduces dynamic energy consumption by up to 24.2% on average. The baseline CR only achieves 18.6% performance and 14% energy improvements for the same configuration.
dc.description.sponsorship: This work has been partially supported by the Spanish Government (SEV2015-0493, BES-2017-080635), the Spanish Ministry of Science and Innovation (PID2019-107255GBC21/AEI/10.13039/501100011033, RTI2018-098156-B-C53), the ECHO and RoMoL ERC projects (819134, 321253), the European HiPEAC Network, the Mont-Blanc 2020 project (EU-FP7-610402 and EU-H2020-779877), and the Spanish Ministry of Economy, Industry and Competitiveness (RYC-2016-21104, RYC-2017-23269 and RYC-2018-025200-I).
dc.format.extent: 12 p.
dc.language.iso: eng
dc.subject: UPC subject areas::Computer science::Computer architecture
dc.subject.lcsh: Parallel processing (Electronic computers)
dc.subject.lcsh: Energy consumption
dc.subject.lcsh: Vector processing (Computer science)
dc.subject.other: SIMD
dc.subject.other: Predication
dc.subject.other: LLVM
dc.subject.other: Density-time performance
dc.title: Compiler-assisted compaction/restoration of SIMD instructions
dc.type: Article
dc.subject.lemac: Parallel processing (Computers)
dc.subject.lemac: Energy -- Consumption
dc.subject.lemac: Vector processing
dc.contributor.group: Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.identifier.doi: 10.1109/TPDS.2021.3091015
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: https://ieeexplore.ieee.org/abstract/document/9462482/
dc.rights.access: Open Access
local.identifier.drac: 31918515
dc.description.version: Postprint (author's final draft)
dc.relation.projectid: info:eu-repo/grantAgreement/EC/FP7/610402/EU/Mont-Blanc 2, European scalable and power efficient HPC platform based on low-power embedded technology/MONT-BLANC 2
dc.relation.projectid: info:eu-repo/grantAgreement/AEI/RYC-2016-21104
dc.relation.projectid: info:eu-repo/grantAgreement/EC/FP7/321253/EU/Riding on Moore's Law/ROMOL
local.citation.author: Cebrián, J.; Balem, T.; Barredo, A.; Casas, M.; Moreto, M.; Ros, A.; Jimborean, A.
local.citation.publicationName: IEEE transactions on parallel and distributed systems
local.citation.volume: 33
local.citation.number: 4
local.citation.startingPage: 779
local.citation.endingPage: 791
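
Illustrative note on the abstract: the compaction/restoration idea it describes (delay sparsely predicated vector operations, pack the active elements from several loop iterations into one dense operation, then put the results back where they belong) can be modelled in a few lines of software. The sketch below is only a toy model under stated assumptions, not the hardware mechanism or the compiler pass from the paper: the vector length `VL`, the functions `predicated` and `compact_restore`, and the `pending` buffer are all hypothetical names, and the per-element operation is assumed to be independent across lanes and free of side effects.

```python
VL = 4  # hypothetical vector length (lanes per vector register)


def predicated(data, masks, op):
    """Baseline model: issue one vector operation per loop iteration,
    no matter how few lanes the mask leaves active."""
    out = [row[:] for row in data]
    issued = 0
    for i, (row, mask) in enumerate(zip(data, masks)):
        issued += 1  # the cost does not depend on the number of active lanes
        for lane in range(VL):
            if mask[lane]:
                out[i][lane] = op(row[lane])
    return out, issued


def compact_restore(data, masks, op):
    """Toy compaction/restoration model: buffer active elements across
    iterations and issue an operation only once VL of them are available."""
    out = [row[:] for row in data]
    pending = []  # (iteration, lane, value) of delayed active elements
    issued = 0

    def flush(count):
        nonlocal issued
        batch = pending[:count]
        del pending[:count]
        issued += 1  # one dense vector operation for the whole batch
        for i, lane, value in batch:
            out[i][lane] = op(value)  # restoration: write back to the origin

    for i, (row, mask) in enumerate(zip(data, masks)):
        for lane in range(VL):
            if mask[lane]:
                pending.append((i, lane, row[lane]))  # compaction
        while len(pending) >= VL:
            flush(VL)
    if pending:
        flush(len(pending))  # leftover elements form one partial operation
    return out, issued


if __name__ == "__main__":
    data = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    masks = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
    print(predicated(data, masks, lambda x: x * 10)[1])
    print(compact_restore(data, masks, lambda x: x * 10)[1])
```

With half of the lanes active in each of four iterations, this model issues two dense operations where the predicated baseline issues four, and both produce the same output. The model ignores everything that makes the real problem hard (dependences between iterations, memory ordering, the buffering hardware itself), which is where, according to the abstract, the compiler-provided information is used.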

