Compiler-assisted compaction/restoration of SIMD instructions
dc.contributor.author | Cebrián González, Juan Manuel |
dc.contributor.author | Balem, Thibaud |
dc.contributor.author | Barredo Ferreira, Adrián |
dc.contributor.author | Casas, Marc |
dc.contributor.author | Moretó Planas, Miquel |
dc.contributor.author | Ros Bardisa, Alberto |
dc.contributor.author | Jimborean, Alexandra |
dc.contributor.other | Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors |
dc.contributor.other | Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors |
dc.contributor.other | Barcelona Supercomputing Center |
dc.date.accessioned | 2021-07-14T13:04:20Z |
dc.date.available | 2021-07-14T13:04:20Z |
dc.date.issued | 2022-04-01 |
dc.identifier.citation | Cebrián, J. [et al.]. Compiler-assisted compaction/restoration of SIMD instructions. "IEEE transactions on parallel and distributed systems", 1 April 2022, vol. 33, no. 4, p. 779-791. |
dc.identifier.issn | 1045-9219 |
dc.identifier.uri | http://hdl.handle.net/2117/349312 |
dc.description.abstract | All the supercomputers in the world exploit data-level parallelism (DLP), for example by using single instructions to operate over several data elements. Improving vector processing is therefore key for exascale computing. Control flow divergence is one of the main factors limiting vector performance. Most modern vector instruction sets rely on predication to support divergence control. Nevertheless, the performance and energy consumption of predicated code are usually insensitive to the number of active elements. Since vector register size tends to double every four years, the energy efficiency of exascale systems will become sub-optimal. This paper proposes the Compiler-Assisted Compaction/Restoration (CACR) technique. The baseline CR delays predicated SIMD instructions with inactive elements and compacts their active elements with instances of the same instruction from later loop iterations to form and execute an equivalent dense vector instruction. The compiler-assisted CR analyzes the code looking for key information required to configure CR, and then passes this information to the processor via new instructions. Our evaluation shows that CACR improves performance by up to 29% and reduces dynamic energy consumption by up to 24.2% on average. The baseline CR only achieves 18.6% performance and 14% energy improvements for the same configuration. |
dc.description.sponsorship | This work has been partially supported by the Spanish Government (SEV2015-0493, BES-2017-080635), the Spanish Ministry of Science and Innovation (PID2019-107255GBC21/AEI/10.13039/501100011033, RTI2018-098156-B-C53), the ECHO and RoMoL ERC projects (819134, 321253), the European HiPEAC Network, the Mont-Blanc 2020 project (EU-FP7-610402 and EU-H2020-779877), and the Spanish Ministry of Economy, Industry and Competitiveness (RYC-2016-21104, RYC-2017-23269 and RYC-2018-025200-I). |
dc.format.extent | 12 p. |
dc.language.iso | eng |
dc.subject | UPC subject areas::Computer science::Computer architecture |
dc.subject.lcsh | Parallel processing (Electronic computers) |
dc.subject.lcsh | Energy consumption |
dc.subject.lcsh | Vector processing (Computer science) |
dc.subject.other | SIMD |
dc.subject.other | Predication |
dc.subject.other | LLVM |
dc.subject.other | Density-time performance |
dc.title | Compiler-assisted compaction/restoration of SIMD instructions |
dc.type | Article |
dc.subject.lemac | Parallel processing (Computers) |
dc.subject.lemac | Energy -- Consumption |
dc.subject.lemac | Vector processing |
dc.contributor.group | Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions |
dc.identifier.doi | 10.1109/TPDS.2021.3091015 |
dc.description.peerreviewed | Peer Reviewed |
dc.relation.publisherversion | https://ieeexplore.ieee.org/abstract/document/9462482/ |
dc.rights.access | Open Access |
local.identifier.drac | 31918515 |
dc.description.version | Postprint (author's final draft) |
dc.relation.projectid | info:eu-repo/grantAgreement/EC/FP7/610402/EU/Mont-Blanc 2, European scalable and power efficient HPC platform based on low-power embedded technology/MONT-BLANC 2 |
dc.relation.projectid | info:eu-repo/grantAgreement/AEI/RYC-2016-21104 |
dc.relation.projectid | info:eu-repo/grantAgreement/EC/FP7/321253/EU/Riding on Moore's Law/ROMOL |
local.citation.author | Cebrián, J.; Balem, T.; Barredo, A.; Casas, M.; Moreto, M.; Ros, A.; Jimborean, A. |
local.citation.publicationName | IEEE transactions on parallel and distributed systems |
local.citation.volume | 33 |
local.citation.number | 4 |
local.citation.startingPage | 779 |
local.citation.endingPage | 791 |
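The compaction/restoration idea summarized in the abstract can be illustrated with a minimal software model. This is a sketch under stated assumptions, not the authors' hardware design or compiler pass: it assumes loop iterations are independent, the vector operation is element-wise, and any leftover partial vector is flushed at loop exit. The function names `predicated_baseline` and `compact_restore` are hypothetical, and energy is not modeled; the sketch only counts issued vector instructions.

```python
from typing import Callable, List, Sequence, Tuple

Row = List[int]


def predicated_baseline(data: Sequence[Row], masks: Sequence[Sequence[bool]],
                        op: Callable[[int], int]) -> Tuple[List[Row], int]:
    """Baseline predication: one masked vector instruction per iteration,
    regardless of how many lanes are active."""
    results = [
        [op(v) if m else v for v, m in zip(row, mask)]
        for row, mask in zip(data, masks)
    ]
    return results, len(masks)


def compact_restore(data: Sequence[Row], masks: Sequence[Sequence[bool]],
                    op: Callable[[int], int], width: int) -> Tuple[List[Row], int]:
    """Toy compaction/restoration model (assumes independent iterations).

    Active elements are delayed in a pending buffer; whenever `width` of them
    have accumulated across iterations, one dense vector instruction is issued.
    Results are then scattered back (restored) to their original positions.
    """
    results = [list(row) for row in data]
    pending: List[Tuple[int, int]] = []  # (iteration, lane) of delayed active elements
    issued = 0

    def issue(batch: List[Tuple[int, int]]) -> None:
        nonlocal issued
        # Compaction: gather up to `width` active elements into one dense vector.
        outs = [op(data[i][lane]) for i, lane in batch]
        # Restoration: write each result back to its original (iteration, lane).
        for (i, lane), out in zip(batch, outs):
            results[i][lane] = out
        issued += 1

    for it, mask in enumerate(masks):
        pending.extend((it, lane) for lane, active in enumerate(mask) if active)
        while len(pending) >= width:
            issue(pending[:width])
            pending = pending[width:]
    if pending:
        issue(pending)  # flush the remaining partial vector at loop exit
    return results, issued


if __name__ == "__main__":
    data = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
    masks = [[True, False, True, False], [False, False, True, False],
             [True, True, False, False], [False, False, False, True]]
    times_ten = lambda v: v * 10
    print(predicated_baseline(data, masks, times_ten))
    print(compact_restore(data, masks, times_ten, width=4))
```

On this toy input (vector width 4, six active elements spread over four iterations), both functions produce the same results, but the baseline issues 4 predicated instructions while the compacted model issues 2 dense ones. This mirrors, in a simplified way, the abstract's point that predicated execution cost is insensitive to the number of active elements.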