Adaptable register file organization for vector processors

Ramírez Lazo, Cristóbal; Reggiani, Enrico; Rojas Morales, Carlos; Figueras Bagué, Roger; Villa Vargas, Luis Alfonso; Ramírez Salinas, Marco Antonio; Valero Cortés, Mateo; Unsal, Osman Sabri; Cristal Kestelman, Adrián

doi:10.1109/HPCA53966.2022.00063

Document type: Conference paper

Publication date: 2022

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Access conditions: Open access

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation without the authorization of the rights holder is prohibited.

Project: BSC - COMPUTACION DE ALTAS PRESTACIONES VIII (AEI-PID2019-107255GB-C21)

Abstract

Contemporary Vector Processors (VPs) are designed either for short vector lengths, e.g., Fujitsu A64FX with 512-bit ARM SVE vector support, or for long vectors, e.g., NEC Aurora Tsubasa with a 16 Kbit Maximum Vector Length (MVL). Unfortunately, both approaches have drawbacks. On the one hand, short vector length VP designs struggle to provide high efficiency for applications featuring long vectors with high Data Level Parallelism (DLP). On the other hand, long vector VP designs waste resources and underutilize the Vector Register File (VRF) when executing low-DLP applications with short vector lengths. Therefore, those long vector VP implementations are limited to a specialized subset of applications, where relatively high DLP must be present to achieve excellent performance with high efficiency. Modern scientific applications are becoming more diverse, and the vector lengths in those applications vary widely. To overcome these limitations, we propose an Adaptable Vector Architecture (AVA) that offers the best of both worlds. AVA is designed for short vectors (MVL=16 elements) and is thus area- and energy-efficient. However, AVA can reconfigure the MVL, which lets it exploit the benefits of a longer-vector microarchitecture (up to 128 elements) when abundant DLP is present. We model AVA on the gem5 simulator and evaluate its performance with six applications taken from the RiVEC Benchmark Suite. To obtain area and power consumption metrics, we model AVA on McPAT for 22nm technology. Our results show that by reconfiguring our small VRF (8KB) plus our novel issue queue scheme, AVA yields a 2X speedup over the default configuration for short vectors. Additionally, AVA shows competitive performance when compared to a long vector VP, while saving 50% of area.
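
As a rough illustration of the trade-off the abstract describes, the sketch below computes how many full-length vector registers fit in a fixed 8KB VRF as the MVL is reconfigured from 16 up to 128 elements. The 64-bit element width and the names `VRF_BYTES`, `ELEMENT_BYTES`, and `registers_that_fit` are assumptions made for this example; they are not taken from the paper, and the code does not model AVA's actual reconfiguration mechanism.

```python
# Illustrative arithmetic only. The VRF size and MVL range come from the
# abstract; the 64-bit element width is an ASSUMPTION made for this sketch.
VRF_BYTES = 8 * 1024   # 8KB VRF (from the abstract)
ELEMENT_BYTES = 8      # assumed 64-bit elements (hypothetical)


def registers_that_fit(mvl_elements: int) -> int:
    """Return how many full-length vector registers fit in the VRF at this MVL."""
    bytes_per_register = mvl_elements * ELEMENT_BYTES
    return VRF_BYTES // bytes_per_register


# Sweep the MVL from the default short-vector setting up to the maximum.
capacity = {mvl: registers_that_fit(mvl) for mvl in (16, 32, 64, 128)}

for mvl, regs in capacity.items():
    print(f"MVL={mvl:3d} elements -> {regs} registers fit in the VRF")
```

Under these assumptions, each doubling of the MVL halves the number of registers the same storage can hold, which suggests why a longer MVL on a small VRF involves more than resizing the registers alone.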

Citation: Ramírez, C. [et al.]. Adaptable register file organization for vector processors. In: IEEE International Symposium on High-Performance Computer Architecture. "2022 IEEE International Symposium on High-Performance Computer Architecture, HPCA 2022: virtual, 2-6 April 2022, proceedings". Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 786-799. ISBN 978-1-6654-2027-3. DOI 10.1109/HPCA53966.2022.00063.

URI: http://hdl.handle.net/2117/367955

DOI: 10.1109/HPCA53966.2022.00063

ISBN: 978-1-6654-2027-3

Publisher version: https://ieeexplore.ieee.org/document/9773222

Files	Description	Size	Format
Ramirez et al.pdf		1.168 MB	PDF
