dc.contributor.author: Barrera, I.S.
dc.contributor.author: Ayguadé Parra, Eduard
dc.contributor.author: Valero Cortés, Mateo
dc.contributor.author: Moreto Planas, Miquel
dc.contributor.author: Labarta Mancho, Jesús José
dc.contributor.author: Casas Guix, Marc
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.contributor.other: Barcelona Supercomputing Center
dc.date.accessioned: 2018-11-27T15:43:03Z
dc.date.issued: 2018
dc.identifier.citation: Barrera, I., Ayguade, E., Valero, M., Moreto, M., Labarta, J., Casas, M. Reducing data movement on large shared memory systems by exploiting computation dependencies. In: International Conference on Supercomputing. "ICS'18: 2018 International Conference on Supercomputing: Beijing, China: June 12-15, 2018". New York: Association for Computing Machinery (ACM), 2018, p. 207-217.
dc.identifier.isbn: 978-1-4503-5783-8
dc.identifier.uri: http://hdl.handle.net/2117/125137
dc.description.abstract: Shared memory systems are becoming increasingly complex as they typically integrate several storage devices. This leads to different access latencies or bandwidth rates depending on the proximity between the cores where memory accesses are issued and the storage devices containing the requested data. In this context, techniques to manage and mitigate non-uniform memory access (NUMA) effects consist of migrating threads, memory pages, or both, and are generally applied by the system software. We propose techniques at the runtime system level to further mitigate the impact of NUMA effects on parallel applications' performance. We leverage runtime system metadata expressed in terms of a task dependency graph, where nodes are pieces of serial code and edges are control or data dependencies between them, to efficiently reduce data transfers. Our approach, based on graph partitioning, adds negligible overhead and provides performance improvements of up to 1.52× and average improvements of 1.12× with respect to the best state-of-the-art approach when deployed on a 288-core shared-memory system. Our approach reduces the coherence traffic by 2.28× on average with respect to the state of the art.
dc.description.sponsorship: This work has been supported by the RoMoL ERC Advanced Grant (GA 321253), by the European HiPEAC Network of Excellence, by the Spanish Ministry of Economy and Competitiveness (contract TIN2015-65316-P), by the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272) and by the European Union’s Horizon 2020 research and innovation programme (grant agreements 671697 and 779877). I. Sánchez Barrera has been partially supported by the Spanish Ministry of Education, Culture and Sport under Formación del Profesorado Universitario fellowship number FPU15/03612. M. Moretó has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Ramón y Cajal fellowship number RYC-2016-21104.
dc.format.extent: 11 p.
dc.language.iso: eng
dc.publisher: Association for Computing Machinery (ACM)
dc.subject: UPC subject areas::Computer science::Information systems::Information storage and retrieval
dc.subject.lcsh: Parallel programming (Computer science)
dc.subject.other: NUMA
dc.subject.other: Scheduling
dc.subject.other: Shared memory
dc.subject.other: Task-based programming model
dc.subject.other: Data transfer
dc.subject.other: Graph theory
dc.subject.other: Intelligent control
dc.subject.other: Memory architecture
dc.subject.other: Virtual storage
dc.subject.other: Graph Partitioning
dc.subject.other: Non uniform memory access
dc.subject.other: Parallel application
dc.subject.other: Performance improvements
dc.subject.other: Shared memory system
dc.subject.other: Task-based programming
dc.subject.other: Data reduction
dc.title: Reducing data movement on large shared memory systems by exploiting computation dependencies
dc.type: Conference report
dc.subject.lemac: Programació en paral·lel (Informàtica)
dc.contributor.group: Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.identifier.doi: 10.1145/3205289.3205310
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: https://dl.acm.org/citation.cfm?id=3205310
dc.rights.access: Open Access
drac.iddocument: 23526805
dc.description.version: Postprint (published version)
dc.relation.projectid: info:eu-repo/grantAgreement/MINECO/1PE/TIN2015-65316-P
dc.relation.projectid: info:eu-repo/grantAgreement/AGAUR/PRI2010-2013/2014 SGR 1051
dc.relation.projectid: info:eu-repo/grantAgreement/AGAUR/PRI2010-2013/2014 SGR 1272
dc.relation.projectid: info:eu-repo/grantAgreement/EC/H2020/779877/EU/Mont-Blanc 2020, European scalable, modular and power efficient HPC processor/Mont-Blanc 2020
dc.relation.projectid: info:eu-repo/grantAgreement/EC/H2020/671697/EU/Mont-Blanc 3, European scalable and power efficient HPC platform based on low-power embedded technology/Mont-Blanc 3
dc.date.lift: 10000-01-01
upcommons.citation.author: Barrera, I.; Ayguade, E.; Valero, M.; Moreto, M.; Labarta, J.; Casas, M.
upcommons.citation.contributor: International Conference on Supercomputing
upcommons.citation.pubplace: New York
upcommons.citation.published: true
upcommons.citation.publicationName: ICS'18: 2018 International Conference on Supercomputing: Beijing, China: June 12-15, 2018
upcommons.citation.startingPage: 207
upcommons.citation.endingPage: 217



All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work is prohibited without permission of the copyright holder.