Improving predication efficiency through compaction/restoration of SIMD instructions
10.1109/HPCA47549.2020.00064
Cite as: hdl:2117/341626
Document type: Conference paper (text in proceedings)
Publication date: 2020
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Access conditions: Open access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation without the authorization of the rights holder is prohibited.
Projects:
- COMPUTACION DE ALTAS PRESTACIONES VII (MINECO-TIN2015-65316-P)
- ROMOL - Riding on Moore's Law (EC-FP7-321253)
- Mont-Blanc 2020 - European scalable, modular and power efficient HPC processor (EC-H2020-779877)
Abstract
Vector processors offer a wide range of unexplored opportunities to improve performance and energy efficiency. However, despite their potential, vector code generation and execution face significant challenges, the most relevant being control-flow divergence. Most modern processors with SIMD extensions (such as AVX) rely on predication to handle divergent control flow. In predicated code, performance and energy consumption are usually insensitive to the number of active (true) elements in the predicate mask, so inactive lanes occupy execution resources without doing useful work. As a result, system efficiency becomes increasingly sub-optimal as vector length grows. In this paper we focus on SIMD extensions and propose a novel approach to improve the execution efficiency of predicated SIMD instructions: the Compaction/Restoration (CR) technique. CR delays predicated SIMD instructions that have inactive elements and compacts them with instances of the same instruction from different loop iterations to form an equivalent dense vector instruction in which, in the best case, all elements are active. After these dense instructions execute, their results are restored to the original instructions. Our evaluation shows that CR improves performance by up to 25% and reduces dynamic energy consumption by up to 43% on real, unmodified applications with predicated execution. Moreover, CR allows unmodified legacy code with short vector instructions (AVX-2) to run on newer architectures with wider vectors (AVX-512), achieving performance improvements of up to 56%.
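The compaction/restoration idea can be illustrated with a minimal functional sketch in Python. This is not the paper's hardware mechanism: it assumes a simplified model in which several dynamic instances of the same predicated element-wise instruction are already buffered, ignores data dependences between iterations, timing, and any lane-placement constraints a real implementation must respect, and lets an active element move to any lane of the dense vector. The names `PredicatedOp` and `compact_restore` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PredicatedOp:
    """Hypothetical model of one dynamic instance of a predicated SIMD op."""
    a: List[int]
    b: List[int]
    mask: List[bool]
    # Inactive lanes stay None here; real hardware would keep the old
    # destination-register value in those lanes.
    result: List[Optional[int]] = field(default_factory=list)


def compact_restore(ops: List[PredicatedOp], width: int,
                    fn: Callable[[int, int], int]) -> int:
    """Run the active lanes of `ops` as dense `width`-lane operations.

    Returns the number of dense operations issued. Assumes the buffered
    instances are independent of each other (not checked here).
    """
    # Compaction: collect (instance index, lane index) of every active element.
    active = [(i, lane) for i, op in enumerate(ops)
              for lane, on in enumerate(op.mask) if on]
    for op in ops:
        op.result = [None] * len(op.mask)

    issued = 0
    for start in range(0, len(active), width):
        chunk = active[start:start + width]
        # One dense instruction built from elements of different instances.
        dense_a = [ops[i].a[lane] for i, lane in chunk]
        dense_b = [ops[i].b[lane] for i, lane in chunk]
        dense_out = [fn(x, y) for x, y in zip(dense_a, dense_b)]
        issued += 1
        # Restoration: scatter each result back to its original instance/lane.
        for (i, lane), value in zip(chunk, dense_out):
            ops[i].result[lane] = value
    return issued


if __name__ == "__main__":
    import operator

    # Four 8-lane instances, each with only half of its lanes active:
    # without compaction this takes 4 issues, with compaction only 2.
    ops = [PredicatedOp(a=list(range(8)), b=[10] * 8,
                        mask=[lane % 2 == k % 2 for lane in range(8)])
           for k in range(4)]
    print(compact_restore(ops, width=8, fn=operator.add))  # 2
    print(ops[0].result)  # [10, None, 12, None, 14, None, 16, None]
```

The same function also mimics the legacy-code scenario from the abstract in spirit: calling it with `width=16` on fully active 8-lane instances packs two AVX-2-sized operations into each AVX-512-sized issue.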
Citation: Barredo, A. [et al.]. Improving predication efficiency through compaction/restoration of SIMD instructions. In: International Symposium on High-Performance Computer Architecture. "2020 IEEE International Symposium on High Performance Computer Architecture, HPCA 2020: San Diego, California, USA, 22-26 February 2020: proceedings". Institute of Electrical and Electronics Engineers (IEEE), 2020, p. 717-728. ISBN 978-1-7281-6149-5. DOI 10.1109/HPCA47549.2020.00064.
ISBN: 978-1-7281-6149-5
Publisher's version: https://ieeexplore.ieee.org/document/9065430
Collections
- Doctoral Programme in Computer Architecture - Conference papers/communications
- Computer Sciences - Conference papers/communications
- CAP - High Performance Computing Group - Conference papers/communications
- Department of Computer Architecture - Conference papers/communications
File | Size | Format |
---|---|---|
Barredo et al.pdf | 828.4 KB | PDF |