Show simple item record

dc.contributor.author: Jamet, Alexandre Valentin
dc.contributor.author: Vavouliotis, Georgios
dc.contributor.author: Jiménez, Daniel A.
dc.contributor.author: Álvarez Martí, Lluc
dc.contributor.author: Casas, Marc
dc.contributor.other: Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors
dc.contributor.other: Barcelona Supercomputing Center
dc.date.accessioned: 2024-04-11T08:17:50Z
dc.date.available: 2024-04-11T08:17:50Z
dc.date.issued: 2024
dc.identifier.citation: Jamet, A. [et al.]. A two level neural approach combining off-chip prediction with adaptive prefetch filtering. In: IEEE International Symposium on High-Performance Computer Architecture. "2024 IEEE International Symposium on High-Performance Computer Architecture, HPCA 2024: 2-6 March 2024, Edinburgh, United Kingdom". Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 528-542. ISBN 979-8-3503-9313-2. DOI 10.1109/HPCA57654.2024.00046.
dc.identifier.isbn: 979-8-3503-9313-2
dc.identifier.uri: http://hdl.handle.net/2117/406359
dc.description.abstract: To alleviate the performance and energy overheads of contemporary applications with large data footprints, we propose the Two Level Perceptron (TLP) predictor, a neural mechanism that effectively combines predicting whether an access will be off-chip with adaptive prefetch filtering at the first-level data cache (L1D). TLP is composed of two connected microarchitectural perceptron predictors, named First Level Predictor (FLP) and Second Level Predictor (SLP). FLP performs accurate off-chip prediction by using several program features based on virtual addresses and a novel selective delay component. The novelty of SLP lies in leveraging off-chip prediction to drive L1D prefetch filtering, using physical addresses and the FLP prediction as features. TLP constitutes the first hardware proposal targeting both off-chip prediction and prefetch filtering with a multilevel perceptron hardware approach, and requires only 7KB of storage. To demonstrate the benefits of TLP, we compare its performance with state-of-the-art approaches using off-chip prediction and prefetch filtering on a wide range of single-core and multi-core workloads. Our experiments show that TLP reduces average DRAM transactions by 30.7% and 17.7% across single-core and multi-core workloads, respectively, compared to a baseline that uses state-of-the-art cache prefetchers but no off-chip prediction mechanism, whereas recent work significantly increases DRAM transactions. As a result, TLP achieves geometric mean performance speedups of 6.2% and 11.8% across single-core and multi-core workloads, respectively. In addition, our evaluation demonstrates that TLP is effective independently of the L1D prefetching logic.
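The abstract describes TLP as a pair of connected perceptron predictors that combine hashed program features to decide whether an access goes off-chip. As a rough illustration of how a hashed-perceptron predictor of this general kind operates (the class name, feature choice, table size, and thresholds below are illustrative assumptions, not TLP's actual design):

```python
# Minimal sketch of a hashed-perceptron binary predictor, as used in
# off-chip prediction work. All parameters here are illustrative.
class HashedPerceptron:
    def __init__(self, n_features=4, table_size=1024, theta=8, wmax=31):
        # One weight table per program feature.
        self.tables = [[0] * table_size for _ in range(n_features)]
        self.table_size = table_size
        self.theta = theta  # training (confidence) threshold
        self.wmax = wmax    # saturating weight bound

    def _indices(self, features):
        # Each feature is hashed into its own table.
        return [hash((i, f)) % self.table_size
                for i, f in enumerate(features)]

    def predict(self, features):
        # Sum the selected weights; non-negative sum predicts "off-chip".
        s = sum(self.tables[i][idx]
                for i, idx in enumerate(self._indices(features)))
        return s >= 0, s

    def train(self, features, went_off_chip):
        # Update only on a misprediction or a low-confidence sum.
        pred, s = self.predict(features)
        if pred != went_off_chip or abs(s) < self.theta:
            delta = 1 if went_off_chip else -1
            for i, idx in enumerate(self._indices(features)):
                w = self.tables[i][idx] + delta
                self.tables[i][idx] = max(-self.wmax - 1,
                                          min(self.wmax, w))
```

In a hardware realization, the feature inputs would be fields such as virtual-address bits or PC hashes, and the weight tables would be small SRAM arrays; the sketch above only shows the prediction/training logic.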
dc.description.sponsorship: This work has been partially supported by the European HiPEAC Network of Excellence, by the Spanish Ministry of Science and Innovation MCIN/AEI/10.13039/501100011033 (contracts PID2019-107255GB-C21 and PID2019-105660RB-C22) and by the Generalitat de Catalunya (contract 2021-SGR00763). This work is supported by the National Science Foundation through grant CCF-1912617 and generous gifts from Intel. Marc Casas has been partially supported by the Grant RYC2017-23269 funded by MCIN/AEI/10.13039/501100011033 and by ESF Investing in your future. The authors thank the Department of Research and Universities of the Generalitat de Catalunya for its support of the Research Group "Performance understanding, analysis, and simulation/emulation of novel architectures" (Code: 2021 SGR 00865).
dc.format.extent: 15 p.
dc.language.iso: eng
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE)
dc.subject: Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors
dc.subject.lcsh: Memory management (Computer science)
dc.subject.lcsh: Random access memory
dc.subject.other: Hardware prefetching
dc.subject.other: Off-chip prediction
dc.subject.other: Prefetch filtering
dc.subject.other: Micro-architecture
dc.subject.other: Graph-processing
dc.title: A two level neural approach combining off-chip prediction with adaptive prefetch filtering
dc.type: Conference report
dc.subject.lemac: Gestió de memòria (Informàtica)
dc.subject.lemac: Memòria d'accés aleatori
dc.identifier.doi: 10.1109/HPCA57654.2024.00046
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: https://ieeexplore.ieee.org/abstract/document/10476485
dc.rights.access: Open Access
local.identifier.drac: 38777557
dc.description.version: Postprint (author's final draft)
dc.relation.projectid: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-107255GB-C21/ES/BSC - COMPUTACION DE ALTAS PRESTACIONES VIII/
dc.relation.projectid: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-105660RB-C22/ES/REDES DE INTERCONEXION, ACELERADORES HARDWARE Y OPTIMIZACION DE APLICACIONES/
dc.relation.projectid: info:eu-repo/grantAgreement/MICIU//RYC2017-23269
local.citation.author: Jamet, A.; Vavouliotis, G.; Jiménez, D. A.; Álvarez, L.; Casas, M.
local.citation.contributor: IEEE International Symposium on High-Performance Computer Architecture
local.citation.publicationName: 2024 IEEE International Symposium on High-Performance Computer Architecture, HPCA 2024: 2-6 March 2024, Edinburgh, United Kingdom
local.citation.startingPage: 528
local.citation.endingPage: 542

