
dc.contributor.author: Ugljesa, Milic
dc.contributor.author: Villa, Oreste
dc.contributor.author: Bolotin, Evgeny
dc.contributor.author: Arunkumar, Akhil
dc.contributor.author: Ebrahimi, Eiman
dc.contributor.author: Jaleel, Aamer
dc.contributor.author: Ramirez, Alex
dc.contributor.author: Nellans, David
dc.contributor.other: Barcelona Supercomputing Center
dc.date.accessioned: 2017-11-03T10:59:53Z
dc.date.available: 2017-11-03T10:59:53Z
dc.date.issued: 2017-10
dc.identifier.citation: Ugljesa, M. [et al.]. Beyond the socket: NUMA-aware GPUs. In: "MICRO-50 '17 Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture". Association for Computing Machinery, 2017, p. 123-135.
dc.identifier.isbn: 978-1-4503-4952-9
dc.identifier.uri: http://hdl.handle.net/2117/109704
dc.description.abstract: GPUs achieve high throughput and power efficiency by employing many small single instruction multiple thread (SIMT) cores. To minimize scheduling logic and performance variance, they utilize a uniform memory system and leverage strong data parallelism exposed via the programming model. With Moore's law slowing, for GPUs to continue scaling performance (which largely depends on SIMT core count) they are likely to embrace multi-socket designs where transistors are more readily available. However, when moving to such designs, maintaining the illusion of a uniform memory system is increasingly difficult. In this work we investigate multi-socket non-uniform memory access (NUMA) GPU designs and show that significant changes are needed to both the GPU interconnect and cache architectures to achieve performance scalability. We show that application phase effects can be exploited, allowing GPU sockets to dynamically optimize their individual interconnect and cache policies and minimizing the impact of NUMA effects. Our NUMA-aware GPU outperforms a single GPU by 1.5×, 2.3×, and 3.2× while achieving 89%, 84%, and 76% of theoretical application scalability in 2, 4, and 8 socket designs, respectively. Implementable today, NUMA-aware multi-socket GPUs may be a promising candidate for scaling GPU performance beyond a single socket.
dc.description.sponsorship: We would like to thank anonymous reviewers and Steve Keckler for their help in improving this paper. The first author is supported by the Ministry of Economy and Competitiveness of Spain (TIN2012-34557, TIN2015-65316-P, and BES-2013-063925).
dc.format.extent: 13 p.
dc.language.iso: eng
dc.publisher: Association for Computing Machinery
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 Spain
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.subject: UPC subject areas::Electronic engineering
dc.subject.lcsh: Computing Methodologies
dc.subject.lcsh: GPUs (Graphics processing units)
dc.subject.other: Computing methodologies
dc.subject.other: Graphics processors
dc.subject.other: Computer systems organization
dc.subject.other: Single instruction, multiple data
dc.title: Beyond the socket: NUMA-aware GPUs
dc.type: Conference lecture
dc.subject.lemac: Computers--Programming
dc.identifier.doi: 10.1145/3123939.3124534
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: https://dl.acm.org/citation.cfm?id=3124534
dc.rights.access: Open Access
dc.description.version: Postprint (published version)
dc.relation.projectid: info:eu-repo/grantAgreement/MINECO/1PE/TIN2015-65316-P
local.citation.publicationName: MICRO-50 '17 Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture
local.citation.startingPage: 123
local.citation.endingPage: 135



Except where otherwise noted, the content of this work is licensed under a Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain.