Investigating memory prefetcher performance over parallel applications: from real to simulated

Document type

Text in conference proceedings

Publisher

Barcelona Supercomputing Center

Access conditions

Open access

Creative Commons
This work is protected by the corresponding intellectual and industrial property rights. Except where otherwise indicated, its contents are subject to the Creative Commons license: Attribution-NonCommercial-NoDerivatives 4.0 International


Abstract

In recent years, there have been significant advances in processor performance, exemplified by the reduction of transistor size and the increase in the number of cores per processor. In contrast, the memory subsystem has not advanced as significantly, failing to deliver data at the required rate and creating what is known as the memory wall [1]. One technique used to mitigate memory latency is the prefetcher, which identifies access patterns from each core, creates speculative memory requests, and fetches potentially useful data into the cache beforehand. In High-Performance Computing (HPC) systems, parallelism raises further problems. Since HPC applications are highly parallel, with many threads communicating with one another mainly through shared memory, data coherence must be maintained across the several cache levels. Moreover, memory interactions among different threads may unpredictably change the data path through the memory hierarchy. When the complexity of the memory hierarchy is combined with prefetcher action, the behavior of the processor’s memory subsystem reaches a new level of complexity. In this work, we seek to shed light on how the prefetcher affects the performance of parallel HPC applications, and how accurately state-of-the-art multicore architecture simulators reproduce the execution of such applications, with and without the prefetcher. We find that an L2 cache prefetcher is more efficient than an L1 prefetcher, since avoiding excessive L3 cache accesses contributes more to performance than avoiding L2 cache accesses. Moreover, we show evidence that the prefetchers’ contribution to performance is limited by the memory contention that emerges as the level of parallelism increases.
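The abstract describes a prefetcher as hardware that identifies per-core access patterns and issues speculative memory requests. As an illustration only (this is not the paper’s simulator or any real hardware design), a classic stride prefetcher can be sketched as follows; the class name, table layout, and `degree` parameter are assumptions made for the sketch:

```python
# Illustrative model of a stride prefetcher (NOT the paper's simulator or a
# real hardware design): it tracks the address stream per instruction (PC),
# detects a constant stride, and issues speculative prefetch addresses.

class StridePrefetcher:
    def __init__(self, degree=2):
        self.table = {}       # pc -> (last_addr, last_stride, confidence)
        self.degree = degree  # number of lines to prefetch ahead

    def access(self, pc, addr):
        """Record a demand access; return the prefetch addresses to issue."""
        last_addr, last_stride, conf = self.table.get(pc, (None, 0, 0))
        prefetches = []
        if last_addr is not None:
            stride = addr - last_addr
            # Raise confidence when the same non-zero stride repeats.
            conf = min(conf + 1, 3) if stride == last_stride and stride else 0
            if conf >= 2:  # stride confirmed -> speculate ahead
                prefetches = [addr + stride * i
                              for i in range(1, self.degree + 1)]
            last_stride = stride
        self.table[pc] = (addr, last_stride, conf)
        return prefetches

# A loop streaming through an array one 64-byte cache line at a time:
pf = StridePrefetcher(degree=2)
for a in range(0, 64 * 7, 64):
    issued = pf.access(pc=0x400123, addr=a)
# Once the stride is confirmed, each access prefetches the next two lines.
```

This per-PC table design illustrates why prefetchers sit at a specific cache level: the level at which the speculative requests are installed determines which misses they can hide, which is the L1-versus-L2 trade-off the abstract refers to.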

Citation

Girelli, V.S. [et al.]. Investigating memory prefetcher performance over parallel applications: from real to simulated. In: . Barcelona Supercomputing Center, 2022, p. 93-94.
