Reducing cache coherence traffic with hierarchical directory cache and NUMA-aware runtime scheduling

Caheny, Paul; Casas, Marc; Moretó Planas, Miquel; Gloaguen, Hervé; Saintes, Maxime; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Valero Cortés, Mateo

doi:10.1145/2967938.2967962


File: Reducing Cache Coherence Traffic with Hierarchical.pdf (889.8 KB)

Document type: Conference proceedings paper

Publication date: 2016

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Access conditions: Open access

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation without the authorization of the rights holder is prohibited.

Project: COMPUTACION DE ALTAS PRESTACIONES VII (MINECO-TIN2015-65316-P)
ROMOL - Riding on Moore's Law (EC-FP7-321253)
Mont-Blanc 3 - Mont-Blanc 3, European scalable and power efficient HPC platform based on low-power embedded technology (EC-H2020-671697)
BARCELONA SUPERCOMPUTING CENTER - CENTRO NACIONAL DE SUPERCOMPUTACION (MINECO-SEV-2015-0493)

Abstract

Cache Coherent NUMA (ccNUMA) architectures are a widespread paradigm due to the benefits they provide for scaling core count and memory capacity. Also, the flat memory address space they offer considerably improves programmability. However, ccNUMA architectures require sophisticated and expensive cache coherence protocols to enforce correctness during parallel executions, which trigger a significant amount of on- and off-chip traffic in the system. This paper analyses how coherence traffic may be best constrained in a large, real ccNUMA platform through the use of a joint hardware/software approach. For several benchmarks, we study coherence traffic in detail under the influence of an added hierarchical cache layer in the directory protocol combined with runtime-managed NUMA-aware scheduling and data allocation techniques to make the most efficient use of the added hardware. The effectiveness of this joint approach is demonstrated by speedups of 1.23x to 2.54x and coherence traffic reductions between 44% and 77% in comparison to NUMA-oblivious scheduling and data allocation. Furthermore, we show that the NUMA-aware techniques we employ at the runtime level are crucial to ensure the added hierarchical layer in the directory coherence protocol does not introduce significant coherence traffic to the system.

Citation: Caheny, P., Casas, M., Moreto, M., Gloaguen, H., Saintes, M., Ayguadé, E., Labarta, J., Valero, M. Reducing cache coherence traffic with hierarchical directory cache and NUMA-aware runtime scheduling. In: International Conference on Parallel Architectures and Compilation Techniques. "PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation". Haifa: Institute of Electrical and Electronics Engineers (IEEE), 2016, p. 275-286.

URI: http://hdl.handle.net/2117/96470

DOI: 10.1145/2967938.2967962

ISBN: 978-1-4503-4121-9

Publisher version: http://dl.acm.org/citation.cfm?id=2967962
