Reducing data movement on large shared memory systems by exploiting computation dependencies

Barrera, I.S.; Ayguadé Parra, Eduard; Valero Cortés, Mateo; Moretó Planas, Miquel; Labarta Mancho, Jesús José; Casas, Marc

doi:10.1145/3205289.3205310

dc.contributor.author	Barrera, I.S.
dc.contributor.author	Ayguadé Parra, Eduard
dc.contributor.author	Valero Cortés, Mateo
dc.contributor.author	Moretó Planas, Miquel
dc.contributor.author	Labarta Mancho, Jesús José
dc.contributor.author	Casas, Marc
dc.contributor.other	Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.contributor.other	Barcelona Supercomputing Center
dc.date.accessioned	2018-11-27T15:43:03Z
dc.date.issued	2018
dc.identifier.citation	Barrera, I., Ayguade, E., Valero, M., Moreto, M., Labarta, J., Casas, M. Reducing data movement on large shared memory systems by exploiting computation dependencies. A: International Conference on Supercomputing. "ICS'18: 2018 International Conference on Supercomputing: Beijing, China: June 12-15, 2018". New York: Association for Computing Machinery (ACM), 2018, p. 207-217.
dc.identifier.isbn	978-1-4503-5783-8
dc.identifier.uri	http://hdl.handle.net/2117/125137
dc.description.abstract	Shared memory systems are becoming increasingly complex as they typically integrate several storage devices. That brings different access latencies or bandwidth rates depending on the proximity between the cores where memory accesses are issued and the storage devices containing the requested data. In this context, techniques to manage and mitigate non-uniform memory access (NUMA) effects consist in migrating threads, memory pages or both and are generally applied by the system software. We propose techniques at the runtime system level to further mitigate the impact of NUMA effects on parallel applications' performance. We leverage runtime system metadata expressed in terms of a task dependency graph, where nodes are pieces of serial code and edges are control or data dependencies between them, to efficiently reduce data transfers. Our approach, based on graph partitioning, adds negligible overhead and is able to provide performance improvements up to 1.52× and average improvements of 1.12× with respect to the best state-of-the-art approach when deployed on a 288-core shared-memory system. Our approach reduces the coherence traffic by 2.28× on average with respect to the state-of-the-art.
dc.description.sponsorship	This work has been supported by the RoMoL ERC Advanced Grant (GA 321253), by the European HiPEAC Network of Excellence, by the Spanish Ministry of Economy and Competitiveness (contract TIN2015-65316-P), by the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272) and by the European Union’s Horizon 2020 research and innovation programme (grant agreements 671697 and 779877). I. Sánchez Barrera has been partially supported by the Spanish Ministry of Education, Culture and Sport under Formación del Profesorado Universitario fellowship number FPU15/03612. M. Moretó has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Ramón y Cajal fellowship number RYC-2016-21104.
dc.format.extent	11 p.
dc.language.iso	eng
dc.publisher	Association for Computing Machinery (ACM)
dc.subject	UPC subject areas::Computer science::Information systems::Information storage and retrieval
dc.subject.lcsh	Parallel programming (Computer science)
dc.subject.other	NUMA
dc.subject.other	Scheduling
dc.subject.other	Shared memory
dc.subject.other	Task-based programming model
dc.subject.other	Data transfer
dc.subject.other	Graph theory
dc.subject.other	Intelligent control
dc.subject.other	Memory architecture
dc.subject.other	Virtual storage
dc.subject.other	Graph Partitioning
dc.subject.other	Non uniform memory access
dc.subject.other	Parallel application
dc.subject.other	Performance improvements
dc.subject.other	Shared memory system
dc.subject.other	Task-based programming
dc.subject.other	Data reduction
dc.title	Reducing data movement on large shared memory systems by exploiting computation dependencies
dc.type	Conference report
dc.subject.lemac	Programació en paral·lel (Informàtica)
dc.contributor.group	Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.identifier.doi	10.1145/3205289.3205310
dc.description.peerreviewed	Peer Reviewed
dc.relation.publisherversion	https://dl.acm.org/citation.cfm?id=3205310
dc.rights.access	Open Access
local.identifier.drac	23526805
dc.description.version	Postprint (published version)
dc.relation.projectid	info:eu-repo/grantAgreement/MINECO//TIN2015-65316-P/ES/COMPUTACION DE ALTAS PRESTACIONES VII/
dc.relation.projectid	info:eu-repo/grantAgreement/AGAUR/PRI2010-2013/2014 SGR 1051
dc.relation.projectid	info:eu-repo/grantAgreement/AGAUR/PRI2010-2013/2014 SGR 1272
dc.relation.projectid	info:eu-repo/grantAgreement/EC/H2020/779877/EU/Mont-Blanc 2020, European scalable, modular and power efficient HPC processor/Mont-Blanc 2020
dc.relation.projectid	info:eu-repo/grantAgreement/EC/H2020/671697/EU/Mont-Blanc 3, European scalable and power efficient HPC platform based on low-power embedded technology/Mont-Blanc 3
dc.date.lift	10000-01-01
local.citation.author	Barrera, I.; Ayguade, E.; Valero, M.; Moreto, M.; Labarta, J.; Casas, M.
local.citation.contributor	International Conference on Supercomputing
local.citation.pubplace	New York
local.citation.publicationName	ICS'18: 2018 International Conference on Supercomputing: Beijing, China: June 12-15, 2018
local.citation.startingPage	207
local.citation.endingPage	217

Files in this item

Name: p207-Barrera.pdf
Size: 1.363 MB
Format: PDF

This item appears in the following collections

Conference papers/presentations [574]
Conference papers/presentations [784]
Conference papers/presentations [1,955]

UPCommons. UPC's open knowledge portal
