Compiler-assisted compaction/restoration of SIMD instructions

Cebrián González, Juan Manuel; Balem, Thibaud; Barredo Ferreira, Adrián; Casas, Marc; Moretó Planas, Miquel; Ros Bardisa, Alberto; Jimborean, Alexandra

doi:10.1109/TPDS.2021.3091015

dc.contributor.author	Cebrián González, Juan Manuel
dc.contributor.author	Balem, Thibaud
dc.contributor.author	Barredo Ferreira, Adrián
dc.contributor.author	Casas, Marc
dc.contributor.author	Moretó Planas, Miquel
dc.contributor.author	Ros Bardisa, Alberto
dc.contributor.author	Jimborean, Alexandra
dc.contributor.other	Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors
dc.contributor.other	Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.contributor.other	Barcelona Supercomputing Center
dc.date.accessioned	2021-07-14T13:04:20Z
dc.date.available	2021-07-14T13:04:20Z
dc.date.issued	2022-04-01
dc.identifier.citation	Cebrián, J. [et al.]. Compiler-assisted compaction/restoration of SIMD instructions. "IEEE transactions on parallel and distributed systems", 1 April 2022, vol. 33, no. 4, p. 779-791.
dc.identifier.issn	1045-9219
dc.identifier.uri	http://hdl.handle.net/2117/349312
dc.description.abstract	All the supercomputers in the world exploit data-level parallelism (DLP), for example by using single instructions to operate over several data elements. Improving vector processing is therefore key for exascale computing. Control flow divergence is one of the main factors limiting vector performance. Most modern vector instruction sets rely on predication to support divergence control. Nevertheless, the performance and energy consumption of predicated code are usually insensitive to the number of active elements. Since the trend is that vector register size doubles every four years, the energy efficiency of exascale systems will become sub-optimal. This paper proposes the Compiler-Assisted Compaction/Restoration (CACR) technique. The baseline CR delays predicated SIMD instructions with inactive elements and compacts active elements with instances of the same instruction from later loop iterations to form and execute an equivalent dense vector instruction. The compiler-assisted CR analyzes the code, looking for key information required to configure CR. Then, it passes this information to the processor via new instructions. Our evaluation shows that CACR improves performance by up to 29% and reduces dynamic energy consumption by up to 24.2% on average. The baseline CR only achieves 18.6% performance and 14% energy improvements for the same configuration.
dc.description.sponsorship	This work has been partially supported by the Spanish Government (SEV2015-0493, BES-2017-080635), the Spanish Ministry of Science and Innovation (PID2019-107255GBC21/AEI/10.13039/501100011033, RTI2018-098156-B-C53), the ECHO and RoMoL ERC projects (819134, 321253), the European HiPEAC Network, the Mont-Blanc 2020 project (EU-FP7-610402 and EU-H2020-779877), and the Spanish Ministry of Economy, Industry and Competitiveness (RYC-2016-21104, RYC-2017-23269 and RYC-2018-025200-I).
dc.format.extent	12 p.
dc.language.iso	eng
dc.subject	UPC subject areas::Computer science::Computer architecture
dc.subject.lcsh	Parallel processing (Electronic computers)
dc.subject.lcsh	Energy consumption
dc.subject.lcsh	Vector processing (Computer science)
dc.subject.other	SIMD
dc.subject.other	Predication
dc.subject.other	LLVM
dc.subject.other	Density-time performance
dc.title	Compiler-assisted compaction/restoration of SIMD instructions
dc.type	Article
dc.subject.lemac	Processament en paral·lel (Ordinadors)
dc.subject.lemac	Energia -- Consum
dc.subject.lemac	Tractament vectorial
dc.contributor.group	Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.identifier.doi	10.1109/TPDS.2021.3091015
dc.description.peerreviewed	Peer Reviewed
dc.relation.publisherversion	https://ieeexplore.ieee.org/abstract/document/9462482/
dc.rights.access	Open Access
local.identifier.drac	31918515
dc.description.version	Postprint (author's final draft)
dc.relation.projectid	info:eu-repo/grantAgreement/EC/FP7/610402/EU/Mont-Blanc 2, European scalable and power efficient HPC platform based on low-power embedded technology/MONT-BLANC 2
dc.relation.projectid	info:eu-repo/grantAgreement/AEI/RYC-2016-21104
dc.relation.projectid	info:eu-repo/grantAgreement/EC/FP7/321253/EU/Riding on Moore's Law/ROMOL
local.citation.author	Cebrián, J.; Balem, T.; Barredo, A.; Casas, M.; Moreto, M.; Ros, A.; Jimborean, A.
local.citation.publicationName	IEEE transactions on parallel and distributed systems
local.citation.volume	33
local.citation.number	4
local.citation.startingPage	779
local.citation.endingPage	791

Files in this item

Name: cebrian et al.pdf
Size: 1.959 MB
Format: PDF

UPCommons. UPC's open knowledge portal
