
dc.contributor.author: García-Flores, Víctor
dc.contributor.author: Ayguadé Parra, Eduard
dc.contributor.author: Peña, Antonio J.
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.contributor.other: Barcelona Supercomputing Center
dc.date.accessioned: 2017-09-22T13:58:28Z
dc.date.issued: 2017
dc.identifier.citation: García-Flores, V., Ayguade, E., Peña, A. Efficient data sharing on heterogeneous systems. In: International Conference on Parallel Processing. "ICPP 2017: 46th International Conference on Parallel Processing: proceedings: 14-17 August 2017: Bristol, United Kingdom". Bristol: Institute of Electrical and Electronics Engineers (IEEE), 2017, p. 121-130.
dc.identifier.isbn: 978-1-5386-1043-5
dc.identifier.uri: http://hdl.handle.net/2117/107928
dc.description.abstract: General-purpose computing on GPUs has become more accessible due to features such as shared virtual memory and demand paging. Unfortunately, this accessibility comes at a price: performance. Automatic memory management is convenient but suffers from many drawbacks, preventing heterogeneous systems from achieving their full potential. In this work we analyze the challenges and inefficiencies of demand paging in GPUs, in particular on collaborative computations where data migrates multiple times between host and device. We establish that demand paging on GPUs introduces significant overheads for this kind of computation, and identify the issues of false sharing and unnecessary data transfers derived from the granularity at which data is migrated. In order to alleviate these problems we propose a memory organization and dynamic migration scheme to efficiently share data between host and device at fine granularities and without software intervention. We evaluate our design with a set of collaborative heterogeneous benchmarks and find that it achieves 15% lower execution times on average with cache line-sized migrations, but severely degrades performance on benchmarks that access large blocks of contiguous memory. Page-sized migrations, although inefficient, provide on average a 47% execution time reduction with our design over a baseline system implementing demand paging. Our results suggest that cache line-sized migrations are not feasible in systems using a PCI-Express interconnect. In order to understand how future interconnect technologies will impact the feasibility of fine-grained migrations, we evaluate our scheme with various link latencies. We find that interconnect latencies four to five times lower than PCI-Express are sufficient to effectively share data at finer granularities.
dc.description.sponsorship: The authors would like to thank Ivan Tanasic and Lluc Alvarez for their insightful suggestions and Javier López Ovalle for his help proofreading the document. This work has been supported by the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P) and by the BSC/UPC NVIDIA GPU Center of Excellence. Antonio J. Peña is cofinanced by the Spanish Ministry of Economy and Competitiveness under Juan de la Cierva fellowship number IJCI-2015-23266.
dc.format.extent: 10 p.
dc.language.iso: eng
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE)
dc.subject: Àrees temàtiques de la UPC::Informàtica::Sistemes d'informació::Emmagatzematge i recuperació de la informació
dc.subject: Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors::Arquitectures paral·leles
dc.subject.lcsh: Memory management (Computer science)
dc.subject.lcsh: Parallel processing (Electronic computers)
dc.subject.other: Graphics processing units
dc.subject.other: Collaboration
dc.subject.other: Benchmark testing
dc.subject.other: Memory management
dc.subject.other: Organizations
dc.subject.other: Performance evaluation
dc.subject.other: Random access memory
dc.subject.other: Memory organization
dc.subject.other: Heterogeneous architectures
dc.subject.other: GPUs
dc.subject.other: Collaborative computations
dc.subject.other: Demand paging
dc.title: Efficient data sharing on heterogeneous systems
dc.type: Conference report
dc.subject.lemac: Gestió de memòria (Informàtica)
dc.subject.lemac: Processament en paral·lel (Ordinadors)
dc.contributor.group: Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.identifier.doi: 10.1109/ICPP.2017.21
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: http://ieeexplore.ieee.org/abstract/document/8025286/
dc.rights.access: Restricted access - publisher's policy
local.identifier.drac: 21548818
dc.description.version: Postprint (published version)
dc.relation.projectid: info:eu-repo/grantAgreement/MINECO/1PE/TIN2015-65316-P
dc.relation.projectid: info:eu-repo/grantAgreement/MINECO/1PE/IJCI-2015-23266
dc.date.lift: 10000-01-01
local.citation.author: García-Flores, V.; Ayguade, E.; Peña, A.
local.citation.contributor: International Conference on Parallel Processing
local.citation.pubplace: Bristol
local.citation.publicationName: ICPP 2017: 46th International Conference on Parallel Processing: proceedings: 14-17 August 2017: Bristol, United Kingdom
local.citation.startingPage: 121
local.citation.endingPage: 130



All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work is prohibited without permission of the copyright holder.