dc.contributor.author: García Flores, Víctor
dc.contributor.author: Gomez Luna, J.
dc.contributor.author: Grass, Thomas Dieter
dc.contributor.author: Rico, Alejandro
dc.contributor.author: Ayguadé Parra, Eduard
dc.contributor.author: Pena, A. J.
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.identifier.citation: Garcia, V., Gomez, J., Grass, T., Rico, A., Ayguade, E., Pena, A. Evaluating the effect of last-level cache sharing on integrated GPU-CPU systems with heterogeneous applications. In: IEEE International Symposium on Workload Characterization. "2016 IEEE International Symposium on Workload Characterization (IISWC 2016): Providence, Rhode Island, USA: 25-27 September 2016". Providence, Rhode Island: Institute of Electrical and Electronics Engineers (IEEE), 2016, p. 168-177.
dc.description.abstract: Heterogeneous systems are ubiquitous in the field of High-Performance Computing (HPC). Graphics processing units (GPUs) are widely used as accelerators for their enormous computing potential and energy efficiency; furthermore, on-die integration of GPUs and general-purpose cores (CPUs) enables unified virtual address spaces and seamless sharing of data structures, improving programmability and softening the entry barrier for heterogeneous programming. Although on-die GPU integration seems to be the trend among the major microprocessor manufacturers, there are still many open questions regarding the architectural design of these systems. This paper is a step forward towards understanding the effect of on-chip resource sharing between GPU and CPU cores, and in particular, the impact of last-level cache (LLC) sharing in heterogeneous computations. To this end, we analyze the behavior of a variety of heterogeneous GPU-CPU benchmarks on different cache configurations. We perform an evaluation of the popular Rodinia benchmark suite modified to leverage the unified memory address space. We find such GPGPU workloads to be mostly insensitive to changes in the cache hierarchy due to the limited interaction and data sharing between GPU and CPU. We then evaluate a set of heterogeneous benchmarks specifically designed to take advantage of the fine-grained data sharing and low-overhead synchronization between GPU and CPU cores that these integrated architectures enable. We show how these algorithms are more sensitive to the design of the cache hierarchy, and find that when GPU and CPU share the LLC, execution times are reduced by 25% on average, and energy-to-solution by over 20% for all benchmarks.
dc.description.sponsorship: This work has been supported by the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P) and by the BSC/UPC NVIDIA GPU Center of Excellence.
dc.format.extent: 10 p.
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE)
dc.subject: Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors
dc.subject.lcsh: Computer architecture
dc.subject.other: Cache storage
dc.subject.other: Graphics processing units
dc.subject.other: Microprocessor chips
dc.subject.other: Last-level cache sharing
dc.subject.other: Integrated GPU-CPU system
dc.subject.other: Heterogeneous system
dc.subject.other: High-performance computing
dc.subject.other: Graphics processing unit
dc.subject.other: Energy efficiency
dc.subject.other: Virtual address space
dc.subject.other: On-die GPU integration
dc.subject.other: On-chip resource sharing
dc.subject.other: Rodinia benchmark
dc.subject.other: Unified memory address space
dc.title: Evaluating the effect of last-level cache sharing on integrated GPU-CPU systems with heterogeneous applications
dc.type: Conference report
dc.subject.lemac: Arquitectura d'ordinadors
dc.contributor.group: Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.description.peerreviewed: Peer Reviewed
dc.rights.access: Restricted access - publisher's policy
dc.description.version: Postprint (published version)
upcommons.citation.author: Garcia, V., Gomez, J., Grass, T., Rico, A., Ayguade, E., Pena, A.
upcommons.citation.contributor: IEEE International Symposium on Workload Characterization
upcommons.citation.pubplace: Providence, Rhode Island
upcommons.citation.publicationName: 2016 IEEE International Symposium on Workload Characterization (IISWC 2016): Providence, Rhode Island, USA: 25-27 September 2016

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder.