SCU: a GPU stream compaction unit for graph processing

Segura Salvador, Albert; Arnau Montañés, José María; González Colás, Antonio María

doi:10.1145/3307650.3322254

dc.contributor.author	Segura Salvador, Albert
dc.contributor.author	Arnau Montañés, José María
dc.contributor.author	González Colás, Antonio María
dc.contributor.other	Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors
dc.contributor.other	Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.date.accessioned	2020-02-05T15:50:57Z
dc.date.issued	2019
dc.identifier.citation	Segura, A.; Arnau, J.; Gonzalez, A. SCU: a GPU stream compaction unit for graph processing. A: International Symposium on Computer Architecture. "ISCA'19: Proceedings of the 46th International Symposium on Computer Architecture: June 22-26, 2019: Phoenix, AZ, USA". New York: Association for Computing Machinery (ACM), 2019, p. 423-435.
dc.identifier.isbn	978-1-4503-6669-4
dc.identifier.uri	http://hdl.handle.net/2117/176876
dc.description.abstract	Graph processing algorithms are key in many emerging applications in areas such as machine learning and data analytics. Although the processing of large-scale graphs exhibits a high degree of parallelism, the memory access patterns tend to be highly irregular, leading to poor GPGPU efficiency due to memory divergence. To ameliorate this issue, GPGPU applications perform a stream compaction operation in each iteration of the algorithm to extract the subset of active nodes/edges, so that subsequent steps work on a compacted dataset. We show that GPGPU architectures are inefficient for stream compaction, and propose to offload this task to a programmable Stream Compaction Unit (SCU) tailored to the requirements of this kernel. The SCU is a small unit tightly integrated in the GPU that efficiently gathers the active nodes/edges into a compacted array in memory. Applications can make use of it through a simple API. The remaining steps of the graph-based algorithm are executed on the GPU cores, taking advantage of the large amount of parallelism in the GPU, but they operate on the SCU-prepared data and achieve better memory coalescing and, hence, much higher efficiency. In addition, the SCU filters out repeated and already visited nodes during the compaction process, significantly reducing GPGPU workload, and writes the compacted nodes/edges in an order that improves memory coalescing by reducing memory divergence. We evaluate the performance of a state-of-the-art GPGPU architecture extended with our SCU for a wide variety of applications. Results show that, for high-performance and low-power GPU systems respectively, the SCU achieves speedups of 1.37x and 2.32x, energy savings of 84.7% and 69%, and area increases of 3.3% and 4.1%.
dc.format.extent	13 p.
dc.language.iso	eng
dc.publisher	Association for Computing Machinery (ACM)
dc.subject	UPC subject areas::Telecommunications engineering::Signal processing::Image and video signal processing
dc.subject	UPC subject areas::Computer science::Hardware
dc.subject.lcsh	Image processing -- Digital techniques
dc.subject.lcsh	Computers
dc.subject.other	GPGPU
dc.subject.other	Graph processing
dc.subject.other	Stream compaction
dc.title	SCU: a GPU stream compaction unit for graph processing
dc.type	Conference report
dc.subject.lemac	Imatges -- Processament -- Tècniques digitals
dc.subject.lemac	Ordinadors
dc.contributor.group	Universitat Politècnica de Catalunya. ARCO - Microarquitectura i Compiladors
dc.identifier.doi	10.1145/3307650.3322254
dc.description.peerreviewed	Peer Reviewed
dc.relation.publisherversion	https://dl.acm.org/doi/abs/10.1145/3307650.3322254
dc.rights.access	Restricted access - publisher's policy
local.identifier.drac	26580782
dc.description.version	Postprint (published version)
dc.relation.projectid	info:eu-repo/grantAgreement/MINECO/1PE/TIN2016-75344-R
dc.date.lift	10000-01-01
local.citation.author	Segura, A.; Arnau, J.; Gonzalez, A.
local.citation.contributor	International Symposium on Computer Architecture
local.citation.pubplace	New York
local.citation.publicationName	ISCA'19: Proceedings of the 46th International Symposium on Computer Architecture: June 22-26, 2019: Phoenix, AZ, USA
local.citation.startingPage	423
local.citation.endingPage	435
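
The compaction step described in the abstract can be illustrated with a small sequential sketch. This is only a software reference for the semantics (gather the active nodes, drop repeated and already visited ones), not the SCU hardware or its API, which this record does not describe. The function names `compact_frontier` and `bfs_levels`, and the choice of breadth-first search as the driving algorithm, are assumptions made for illustration. The coalescing-friendly write order mentioned in the abstract is not modelled here.

```python
from typing import Iterable, List, Set


def compact_frontier(candidates: Iterable[int], visited: Set[int]) -> List[int]:
    """Sequential reference for stream compaction with filtering.

    Keeps each candidate node once, skips nodes already in `visited`,
    and marks every kept node as visited (hypothetical helper).
    """
    compacted: List[int] = []
    for node in candidates:
        if node in visited:
            continue  # already visited, or a repeat seen earlier in this pass
        visited.add(node)
        compacted.append(node)
    return compacted


def bfs_levels(adjacency: List[List[int]], source: int) -> List[List[int]]:
    """Level-synchronous BFS that compacts the frontier every iteration."""
    visited: Set[int] = {source}
    frontier = [source]
    levels: List[List[int]] = []
    while frontier:
        levels.append(frontier)
        # Expansion: gather all neighbours, including repeats and visited nodes.
        candidates = [n for node in frontier for n in adjacency[node]]
        # Compaction: the step the abstract proposes offloading to the SCU.
        frontier = compact_frontier(candidates, visited)
    return levels
```

For example, with `adjacency = [[1, 2], [2, 3], [3], [0]]` and source node 0, the second expansion yields the candidates `[2, 3, 3]`; compaction drops the visited node 2 and the repeated node 3, so the next frontier holds a single entry.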

Files in this item

Name: ISCA2019.pdf
Size: 866.7 KB
Format: PDF

This item appears in the following collections

Conference papers/presentations [294]
Conference papers/presentations [187]
Conference papers/presentations [1,955]
