SCU: a GPU stream compaction unit for graph processing

Segura Salvador, Albert; Arnau Montañés, José María; González Colás, Antonio María

doi:10.1145/3307650.3322254

ISCA2019.pdf (866.7 KB) (Restricted access; a copy can be requested from the author)

Document type: Text in conference proceedings

Publication date: 2019

Publisher: Association for Computing Machinery (ACM)

Access conditions: Restricted access per publisher policy

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, its reproduction, distribution, public communication or transformation without the authorization of the rights holder is prohibited.

Abstract

Graph processing algorithms are key in many emerging applications in areas such as machine learning and data analytics. Although the processing of large-scale graphs exhibits a high degree of parallelism, the memory access patterns tend to be highly irregular, leading to poor GPGPU efficiency due to memory divergence. To ameliorate this issue, GPGPU applications perform a stream compaction operation in each iteration of the algorithm to extract the subset of active nodes/edges, so that subsequent steps work on a compacted dataset. We show that GPGPU architectures are inefficient for stream compaction, and propose to offload this task to a programmable Stream Compaction Unit (SCU) tailored to the requirements of this kernel. The SCU is a small unit tightly integrated into the GPU that efficiently gathers the active nodes/edges into a compacted array in memory. Applications can make use of it through a simple API. The remaining steps of the graph-based algorithm are executed on the GPU cores, taking advantage of the large amount of parallelism in the GPU, but they operate on the SCU-prepared data and achieve better memory coalescing and, hence, much higher efficiency. In addition, the SCU filters out repeated and already visited nodes during the compaction process, significantly reducing the GPGPU workload, and writes the compacted nodes/edges in an order that improves memory coalescing by reducing memory divergence. We evaluate the performance of a state-of-the-art GPGPU architecture extended with our SCU for a wide variety of applications. Results show that, for high-performance and low-power GPU systems, the SCU achieves speedups of 1.37x and 2.32x, energy savings of 84.7% and 69%, and area increases of 3.3% and 4.1%, respectively.
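As a rough illustration of the stream compaction step the abstract refers to, the following is a minimal sequential sketch in Python. It is not the SCU hardware or its API, neither of which is described in this record. The function names (`exclusive_scan`, `compact`, `bfs_levels`) and the adjacency-list graph are hypothetical, chosen only for this example. The sketch uses the common scan-based formulation: an exclusive prefix sum over keep/drop flags gives each surviving element its slot in the compacted output. The breadth-first search loop then compacts the candidate frontier once per iteration, dropping repeated and already visited nodes along the way.

```python
def exclusive_scan(flags):
    """Exclusive prefix sum: out[i] is the sum of flags[0..i-1]."""
    out = []
    total = 0
    for f in flags:
        out.append(total)
        total += f
    return out, total


def compact(items, flags):
    """Keep items[i] where flags[i] == 1, preserving their order."""
    offsets, count = exclusive_scan(flags)
    out = [None] * count
    for i, item in enumerate(items):
        if flags[i]:
            # The scan result is the destination index of each kept item.
            out[offsets[i]] = item
    return out


def bfs_levels(adj, source):
    """BFS that compacts the frontier every iteration (illustrative only)."""
    visited = {source}
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        # Expansion: neighbours of the frontier, possibly with repeats
        # and with nodes that were visited in earlier iterations.
        candidates = [v for u in frontier for v in adj[u]]
        # Filtering: flag only the first occurrence of each unvisited node.
        flags = []
        for v in candidates:
            keep = v not in visited
            if keep:
                visited.add(v)
            flags.append(1 if keep else 0)
        # Compaction: the next iteration works on a dense array.
        frontier = compact(candidates, flags)
        for v in frontier:
            level[v] = depth
    return level


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [2, 3], 2: [3], 3: [0]}
    print(compact(["a", "b", "c", "d"], [1, 0, 1, 1]))
    print(bfs_levels(graph, 0))
```

On a GPU the scan and the scatter would each run in parallel across threads. The sequential loops here only show the data flow that the abstract proposes moving off the GPU cores.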

Citation: Segura, A.; Arnau, J.; Gonzalez, A. SCU: a GPU stream compaction unit for graph processing. In: International Symposium on Computer Architecture. "ISCA'19: Proceedings of the 46th International Symposium on Computer Architecture: June 22-26, 2019: Phoenix, AZ, USA". New York: Association for Computing Machinery (ACM), 2019, p. 423-435.

URI: http://hdl.handle.net/2117/176876

DOI: 10.1145/3307650.3322254

ISBN: 978-1-4503-6669-4

Publisher version: https://dl.acm.org/doi/abs/10.1145/3307650.3322254
