Reducing data movement on large shared memory systems by exploiting computation dependencies

Barrera, I.S.; Ayguadé Parra, Eduard; Valero Cortés, Mateo; Moretó Planas, Miquel; Labarta Mancho, Jesús José; Casas, Marc

doi:10.1145/3205289.3205310

File: p207-Barrera.pdf (1.363 MB, PDF)

Document type: Conference proceedings paper

Publication date: 2018

Publisher: Association for Computing Machinery (ACM)

Access conditions: Open access

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation is prohibited without the authorization of the rights holder.

Projects:
COMPUTACION DE ALTAS PRESTACIONES VII (MINECO-TIN2015-65316-P)
Mont-Blanc 2020 - Mont-Blanc 2020, European scalable, modular and power efficient HPC processor (EC-H2020-779877)
Mont-Blanc 3 - Mont-Blanc 3, European scalable and power efficient HPC platform based on low-power embedded technology (EC-H2020-671697)

Abstract

Shared memory systems are becoming increasingly complex as they typically integrate several storage devices. That brings different access latencies or bandwidth rates depending on the proximity between the cores where memory accesses are issued and the storage devices containing the requested data. In this context, techniques to manage and mitigate non-uniform memory access (NUMA) effects consist in migrating threads, memory pages or both and are generally applied by the system software. We propose techniques at the runtime system level to further mitigate the impact of NUMA effects on parallel applications' performance. We leverage runtime system metadata expressed in terms of a task dependency graph, where nodes are pieces of serial code and edges are control or data dependencies between them, to efficiently reduce data transfers. Our approach, based on graph partitioning, adds negligible overhead and is able to provide performance improvements up to 1.52× and average improvements of 1.12× with respect to the best state-of-the-art approach when deployed on a 288-core shared-memory system. Our approach reduces the coherence traffic by 2.28× on average with respect to the state-of-the-art.
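The idea of using a task dependency graph to keep communicating tasks close together can be illustrated with a small sketch. This is not the partitioning algorithm from the paper, which the abstract does not specify; it is a simple greedy heuristic chosen only for illustration. The function names (`cut_bytes`, `greedy_partition`), the example graph, the edge weights (bytes shared between two tasks) and the per-node capacity are all hypothetical.

```python
from collections import defaultdict


def cut_bytes(edges, placement):
    """Sum the bytes on edges whose two tasks sit on different NUMA nodes."""
    return sum(w for (u, v, w) in edges if placement[u] != placement[v])


def greedy_partition(tasks, edges, num_nodes, capacity):
    """Place tasks in order on the NUMA node holding most of their shared data.

    Illustrative heuristic only (an assumption, not the paper's method).
    Requires num_nodes * capacity >= len(tasks).
    """
    neighbours = defaultdict(list)
    for u, v, w in edges:
        neighbours[u].append((v, w))
        neighbours[v].append((u, w))

    placement = {}
    load = [0] * num_nodes
    for t in tasks:
        # Bytes this task shares with already-placed neighbours, per node.
        affinity = [0] * num_nodes
        for n, w in neighbours[t]:
            if n in placement:
                affinity[placement[n]] += w
        # Only nodes with free capacity are candidates (keeps load balanced).
        candidates = [k for k in range(num_nodes) if load[k] < capacity]
        # Prefer highest affinity, then lowest load, then lowest node id.
        best = max(candidates, key=lambda k: (affinity[k], -load[k], -k))
        placement[t] = best
        load[best] += 1
    return placement


# Hypothetical graph: two dependency chains that share a small final reduction.
tasks = ["a0", "a1", "a2", "b0", "b1", "b2"]
edges = [
    ("a0", "a1", 64), ("a1", "a2", 64),  # chain A, 64 bytes per dependency
    ("b0", "b1", 64), ("b1", "b2", 64),  # chain B, 64 bytes per dependency
    ("a2", "b2", 8),                     # light cross-chain dependency
]

# Baseline that ignores dependencies: alternate tasks between two nodes.
round_robin = {t: i % 2 for i, t in enumerate(tasks)}
placement = greedy_partition(tasks, edges, num_nodes=2, capacity=3)

print(cut_bytes(edges, round_robin))  # 264 bytes cross NUMA nodes
print(cut_bytes(edges, placement))    # 8 bytes cross NUMA nodes
```

In this toy case the dependency-aware placement keeps each chain on one node, so only the light cross-chain edge crosses the node boundary. A production partitioner would also need to weigh task cost and handle graphs that are discovered incrementally at run time.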

Citation: Barrera, I., Ayguade, E., Valero, M., Moreto, M., Labarta, J., Casas, M. Reducing data movement on large shared memory systems by exploiting computation dependencies. In: International Conference on Supercomputing. "ICS'18: 2018 International Conference on Supercomputing: Beijing, China: June 12-15, 2018". New York: Association for Computing Machinery (ACM), 2018, p. 207-217.

URI: http://hdl.handle.net/2117/125137

DOI: 10.1145/3205289.3205310

ISBN: 978-1-4503-5783-8

Publisher's version: https://dl.acm.org/citation.cfm?id=3205310
