dc.contributor.author: Tanasic, Ivan
dc.contributor.author: Vilanova, Lluís
dc.contributor.author: Jorda, Marc
dc.contributor.author: Cabezas, Javier
dc.contributor.author: Gelado Fernandez, Isaac
dc.contributor.author: Navarro, Nacho
dc.contributor.author: Hwu, Wen-mei W.
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.date.accessioned: 2015-04-14T09:02:45Z
dc.date.created: 2013
dc.date.issued: 2013
dc.identifier.citation: Tanasic, I. [et al.]. Comparison based sorting for systems with multiple GPUs. In: Workshop on General Purpose Processing Using GPUs. "GPGPU-6: Proceedings of the 6th Workshop on General Purpose Processing Using Graphics Processing Units". Houston, TX: Association for Computing Machinery (ACM), 2013, p. 1-11.
dc.identifier.isbn: 978-1-4503-2017-7
dc.identifier.uri: http://hdl.handle.net/2117/27303
dc.description.abstract: As a basic building block of many applications, sorting algorithms that efficiently run on modern machines are key for the performance of these applications. With the recent shift to using GPUs for general purpose computing, researchers have proposed several sorting algorithms for single-GPU systems. However, some workstations and HPC systems have multiple GPUs, and applications running on them are designed to use all available GPUs in the system. In this paper we present a high performance multi-GPU merge sort algorithm that solves the problem of sorting data distributed across several GPUs. Our merge sort algorithm first sorts the data on each GPU using an existing single-GPU sorting algorithm. Then, a series of merge steps produces a globally sorted array distributed across all the GPUs in the system. This merge phase is enabled by a novel pivot selection algorithm that ensures that merge steps always distribute data evenly among all GPUs. We also present the implementation of our sorting algorithm in CUDA, as well as a novel inter-GPU communication technique that enables this pivot selection algorithm. Experimental results show that an efficient implementation of our algorithm achieves a speedup of 1.9x when running on two GPUs and 3.3x when running on four GPUs, compared to sorting on a single GPU. At the same time, it is able to sort two and four times more records, compared to sorting on one GPU.
dc.format.extent: 11 p.
dc.language.iso: eng
dc.publisher: Association for Computing Machinery (ACM)
dc.subject: UPC subject areas::Computer science::Computer architecture
dc.subject.lcsh: Multiprocessors
dc.subject.lcsh: Parallel processing (Electronic computers)
dc.subject.other: Parallel
dc.subject.other: Sorting
dc.subject.other: GPU
dc.subject.other: CUDA
dc.title: Comparison based sorting for systems with multiple GPUs
dc.type: Conference report
dc.subject.lemac: Multiprocessadors
dc.subject.lemac: Processament en paral·lel (Ordinadors)
dc.contributor.group: Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.identifier.doi: 10.1145/2458523.2458524
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: http://dl.acm.org/citation.cfm?id=2458524&dl=ACM&coll=DL&CFID=554585565&CFTOKEN=17692281
dc.rights.access: Restricted access - publisher's policy
local.identifier.drac: 15112635
dc.description.version: Postprint (published version)
dc.relation.projectid: info:eu-repo/grantAgreement/EC/FP7/288777/EU/Mont-Blanc, European scalable and power efficient HPC platform based on low-power embedded technology/MONT-BLANC
dc.date.lift: 10000-01-01
local.citation.author: Tanasic, I.; Vilanova, L.; Jorda, M.; Cabezas, J.; Gelado, I.; Navarro, Nacho; Hwu, W.
local.citation.contributor: Workshop on General Purpose Processing Using GPUs
local.citation.pubplace: Houston, TX
local.citation.publicationName: GPGPU-6: Proceedings of the 6th Workshop on General Purpose Processing Using Graphics Processing Units
local.citation.startingPage: 1
local.citation.endingPage: 11


Files in this item

This item appears in the following collections