Exploiting vector code semantics for efficient data cache prefetching

Document type

Conference proceedings paper

Publisher

Association for Computing Machinery (ACM)

Access conditions

Open access

Rights license

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation is prohibited without the authorization of the rights holder.

Abstract

Emerging workloads from domains like high performance computing, data analytics or deep learning consume large amounts of memory bandwidth. To mitigate this problem, computing systems include large and deep memory cache hierarchies that exploit both spatial and temporal locality. In this context, hardware data cache prefetching constitutes a useful method to anticipate cache misses and boost performance. Despite their success in terms of high coverage rates, current data cache prefetchers incur a significant number of late and sometimes useless prefetches. Additionally, these state-of-the-art prefetchers are not aware of architecture trends towards larger vector units and vector-length agnostic instruction sets. This paper demonstrates that these trends bring new prefetching opportunities that make it possible to increase the accuracy and timeliness of any state-of-the-art prefetcher with a negligible area cost. We propose the Register Vector Length Agnostic (ReVeLA) prefetcher. ReVeLA exploits program semantics present in vectorized codes. The ReVeLA prefetcher complements existing data cache prefetchers by providing highly accurate prefetch requests that improve prefetching timeliness and accuracy without significantly increasing memory bandwidth consumption. When applied on top of a state-of-the-art out-of-order vector processor, ReVeLA delivers a speed-up of 1.23× with respect to a system without any prefetching approach. When combined with the NextLine, BOP, SPP, and PPF prefetchers, ReVeLA improves performance by 6.57%, 4.46%, 11.83%, and 11.40%, respectively, with respect to a vector processor equipped with these prefetching approaches. Additionally, our evaluation demonstrates that ReVeLA increases memory bandwidth consumption by only 3.74% when combined with the best-performing data cache prefetcher of our experimental campaign.

Citation

Martínez, F. [et al.]. Exploiting vector code semantics for efficient data cache prefetching. In: International Conference on Supercomputing. "ACM ICS'24: proceedings of the 38th ACM International Conference on Supercomputing: June 4–7, 2024, Kyoto, Japan". New York: Association for Computing Machinery (ACM), 2024, p. 98-109. ISBN 979-8-4007-0610-3. DOI 10.1145/3650200.3656635.

ISBN

979-8-4007-0610-3
