Investigating memory prefetcher performance over parallel applications: from real to simulated

dc.contributor.author: Girelli, Valéria S.
dc.contributor.author: Moreira, Francis B.
dc.contributor.author: Serpa, Matheus S.
dc.contributor.author: Santos, Danilo C.
dc.contributor.author: Navaux, Philippe O. A.
dc.date.accessioned: 2023-02-23T18:37:50Z
dc.date.available: 2023-02-23T18:37:50Z
dc.date.issued: 2022-05
dc.description.abstract: In recent years, there have been significant advances in the performance of processors, exemplified by the reduction of transistor size and the increase in the number of cores per processor. Conversely, the memory subsystem did not advance as significantly as processors, proving unable to deliver data at the required rate and creating what is known as the memory wall [1]. One technology used to mitigate memory latency is the prefetcher, a technique that identifies access patterns from each core, creates speculative memory requests, and fetches data that is potentially useful to the cache ahead of time. In High-Performance Computing (HPC) systems, many other problems arise with parallelism. Since HPC applications are highly parallel, with many threads communicating with one another mainly through shared memory, it becomes necessary to keep data coherent across the several cache levels. Moreover, the memory interactions among different threads may also unpredictably change the data path through the memory hierarchy. When the complexity of the memory hierarchy is considered together with prefetcher action, the behavior of the processor's memory subsystem reaches a new level of complexity. In this work, we seek to shed light on how the prefetcher affects the processing performance of parallel HPC applications, and how accurately state-of-the-art multicore architecture simulators simulate the execution of such applications, with and without the prefetcher. We identify that an L2 cache prefetcher is more efficient than an L1 prefetcher, since avoiding excessive L3 cache accesses contributes more to performance than avoiding L2 cache accesses. Moreover, we show evidence that the prefetchers' contribution to performance is limited by the memory contention that emerges as the level of parallelism increases.
dc.format.extent: 2 p.
dc.identifier.citation: Girelli, V.S. [et al.]. Investigating memory prefetcher performance over parallel applications: from real to simulated. A: . Barcelona Supercomputing Center, 2022, p. 93-94.
dc.identifier.uri: https://hdl.handle.net/2117/384135
dc.language: en
dc.language.iso: eng
dc.publisher: Barcelona Supercomputing Center
dc.rights.access: Open Access
dc.rights.licensename: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors
dc.subject.lcsh: High performance computing
dc.subject.lemac: Càlcul intensiu (Informàtica)
dc.subject.other: Prefetching
dc.subject.other: memory system
dc.subject.other: High-performance computing
dc.title: Investigating memory prefetcher performance over parallel applications: from real to simulated
dc.type: Conference report
dspace.entity.type: Publication
local.citation.endingPage: 94
local.citation.startingPage: 93

Files

Original bundle

Name: 9BSCDS_42_Investigating Memory Prefetcher.pdf
Size: 745.89 KB
Format: Adobe Portable Document Format