dc.contributor.author | Jamet, Alexandre Valentin |
dc.contributor.author | Vavouliotis, Georgios |
dc.contributor.author | Jiménez, Daniel A. |
dc.contributor.author | Álvarez Martí, Lluc |
dc.contributor.author | Casas, Marc |
dc.contributor.other | Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors |
dc.contributor.other | Barcelona Supercomputing Center |
dc.date.accessioned | 2024-04-11T08:17:50Z |
dc.date.available | 2024-04-11T08:17:50Z |
dc.date.issued | 2024 |
dc.identifier.citation | Jamet, A. [et al.]. A two level neural approach combining off-chip prediction with adaptive prefetch filtering. A: IEEE International Symposium on High-Performance Computer Architecture. "2024 IEEE International Symposium on High-Performance Computer Architecture, HPCA 2024: 2-6 March 2024, Edinburgh, United Kingdom". Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 528-542. ISBN 979-8-3503-9313-2. DOI 10.1109/HPCA57654.2024.00046. |
dc.identifier.isbn | 979-8-3503-9313-2 |
dc.identifier.uri | http://hdl.handle.net/2117/406359 |
dc.description.abstract | To alleviate the performance and energy overheads of contemporary applications with large data footprints, we propose the Two Level Perceptron (TLP) predictor, a neural mechanism that effectively combines predicting whether an access will be off-chip with adaptive prefetch filtering at the first-level data cache (L1D). TLP is composed of two connected microarchitectural perceptron predictors, named First Level Predictor (FLP) and Second Level Predictor (SLP). FLP performs accurate off-chip prediction by using several program features based on virtual addresses and a novel selective delay component. The novelty of SLP relies on leveraging off-chip prediction to drive L1D prefetch filtering by using physical addresses and the FLP prediction as features. TLP constitutes the first hardware proposal targeting both off-chip prediction and prefetch filtering using a multilevel perceptron hardware approach. TLP only requires 7KB of storage. To demonstrate the benefits of TLP we compare its performance with state-of-the-art approaches using off-chip prediction and prefetch filtering on a wide range of single-core and multi-core workloads. Our experiments show that TLP reduces the average DRAM transactions by 30.7% and 17.7%, as compared to a baseline using state-of-the-art cache prefetchers but no off-chip prediction mechanism, across the single-core and multi-core workloads, respectively, while recent work significantly increases DRAM transactions. As a result, TLP achieves geometric mean performance speedups of 6.2% and 11.8% across single-core and multi-core workloads, respectively. In addition, our evaluation demonstrates that TLP is effective independently of the L1D prefetching logic. |
dc.description.sponsorship | This work has been partially supported by the European HiPEAC Network of Excellence, by the Spanish Ministry of Science and Innovation MCIN/AEI/10.13039/501100011033 (contracts PID2019-107255GB-C21 and PID2019-105660RB-C22) and by the Generalitat de Catalunya (contract 2021-SGR00763). This work is supported by the National Science Foundation through grant CCF-1912617 and generous gifts from Intel. Marc Casas has been partially supported by the Grant RYC2017-23269 funded by MCIN/AEI/10.13039/501100011033 and by ESF Investing in your future. The authors acknowledge the support of the Departament de Recerca i Universitats de la Generalitat de Catalunya to the research group "Performance understanding, analysis, and simulation/emulation of novel architectures" (Code: 2021 SGR 00865). |
dc.format.extent | 15 p. |
dc.language.iso | eng |
dc.publisher | Institute of Electrical and Electronics Engineers (IEEE) |
dc.subject | Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors |
dc.subject.lcsh | Memory management (Computer science) |
dc.subject.lcsh | Random access memory |
dc.subject.other | Hardware prefetching |
dc.subject.other | Off-chip prediction |
dc.subject.other | Prefetch filtering |
dc.subject.other | Micro-architecture |
dc.subject.other | Graph-processing |
dc.title | A two level neural approach combining off-chip prediction with adaptive prefetch filtering |
dc.type | Conference report |
dc.subject.lemac | Gestió de memòria (Informàtica) |
dc.subject.lemac | Memòria d'accés aleatori |
dc.identifier.doi | 10.1109/HPCA57654.2024.00046 |
dc.description.peerreviewed | Peer Reviewed |
dc.relation.publisherversion | https://ieeexplore.ieee.org/abstract/document/10476485 |
dc.rights.access | Open Access |
local.identifier.drac | 38777557 |
dc.description.version | Postprint (author's final draft) |
dc.relation.projectid | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-107255GB-C21/ES/BSC - COMPUTACION DE ALTAS PRESTACIONES VIII/ |
dc.relation.projectid | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-105660RB-C22/ES/REDES DE INTERCONEXION, ACELERADORES HARDWARE Y OPTIMIZACION DE APLICACIONES/ |
dc.relation.projectid | info:eu-repo/grantAgreement/MICIU//RYC2017-23269 |
local.citation.author | Jamet, A.; Vavouliotis, G.; Jiménez, D. A.; Álvarez, L.; Casas, M. |
local.citation.contributor | IEEE International Symposium on High-Performance Computer Architecture |
local.citation.publicationName | 2024 IEEE International Symposium on High-Performance Computer Architecture, HPCA 2024: 2-6 March 2024, Edinburgh, United Kingdom |
local.citation.startingPage | 528 |
local.citation.endingPage | 542 |