Evaluating the effect of last-level cache sharing on integrated GPU-CPU systems with heterogeneous applications
Document type: Conference report
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Rights access: Restricted access - publisher's policy
Heterogeneous systems are ubiquitous in the field of High-Performance Computing (HPC). Graphics processing units (GPUs) are widely used as accelerators for their enormous computing potential and energy efficiency; furthermore, on-die integration of GPUs and general-purpose cores (CPUs) enables unified virtual address spaces and seamless sharing of data structures, improving programmability and softening the entry barrier for heterogeneous programming. Although on-die GPU integration seems to be the trend among the major microprocessor manufacturers, there are still many open questions regarding the architectural design of these systems. This paper is a step forward towards understanding the effect of on-chip resource sharing between GPU and CPU cores, and in particular the impact of last-level cache (LLC) sharing in heterogeneous computations. To this end, we analyze the behavior of a variety of heterogeneous GPU-CPU benchmarks on different cache configurations. We perform an evaluation of the popular Rodinia benchmark suite modified to leverage the unified memory address space. We find such GPGPU workloads to be mostly insensitive to changes in the cache hierarchy due to the limited interaction and data sharing between GPU and CPU. We then evaluate a set of heterogeneous benchmarks specifically designed to take advantage of the fine-grained data sharing and low-overhead synchronization between GPU and CPU cores that these integrated architectures enable. We show how these algorithms are more sensitive to the design of the cache hierarchy, and find that when the GPU and CPU share the LLC, execution times are reduced by 25% on average, and energy-to-solution by over 20% for all benchmarks.
Citation: Garcia, V., Gomez, J., Grass, T., Rico, A., Ayguade, E., Pena, A. Evaluating the effect of last-level cache sharing on integrated GPU-CPU systems with heterogeneous applications. In: IEEE International Symposium on Workload Characterization. "2016 IEEE International Symposium on Workload Characterization (IISWC 2016): Providence, Rhode Island, USA: 25-27 September 2016". Providence, Rhode Island: Institute of Electrical and Electronics Engineers (IEEE), 2016, p. 168-177.
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder.