Universitat Politècnica de Catalunya

UPCommons. Global access to UPC knowledge

Collection path: E-prints > Centres de recerca > BSC - Barcelona Supercomputing Center > Computer Sciences > Ponències/Comunicacions de congressos

Beyond the socket: NUMA-aware GPUs

Beyond the Socket NUMA-AwareGPUs.pdf (1,109Mb)
 
Cite as: hdl:2117/109704

Milic, Ugljesa
Villa, Oreste
Bolotin, Evgeny
Arunkumar, Akhil
Ebrahimi, Eiman
Jaleel, Aamer
Ramirez, Alex
Nellans, David
Document type: Conference lecture
Publication date: 2017-10
Publisher: Association for Computing Machinery
Rights access: Open Access
License: Except where otherwise noted, content on this work is licensed under a Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain
Project: COMPUTACION DE ALTAS PRESTACIONES VII (MINECO-TIN2015-65316-P)
Abstract
GPUs achieve high throughput and power efficiency by employing many small single-instruction, multiple-thread (SIMT) cores. To minimize scheduling logic and performance variance, they use a uniform memory system and leverage the strong data parallelism exposed by the programming model. With Moore's law slowing, GPUs are likely to embrace multi-socket designs, where transistors are more readily available, to continue scaling performance (which largely depends on SIMT core count). However, when moving to such designs, maintaining the illusion of a uniform memory system becomes increasingly difficult. In this work, we investigate multi-socket non-uniform memory access (NUMA) GPU designs and show that significant changes to both the GPU interconnect and cache architectures are needed to achieve performance scalability. We show that application phase effects can be exploited, allowing GPU sockets to dynamically optimize their individual interconnect and cache policies and minimizing the impact of NUMA effects. Our NUMA-aware GPU outperforms a single GPU by 1.5×, 2.3×, and 3.2× while achieving 89%, 84%, and 76% of theoretical application scalability in 2-, 4-, and 8-socket designs, respectively. Implementable today, NUMA-aware multi-socket GPUs may be a promising candidate for scaling GPU performance beyond a single socket.
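The headline figures in the abstract can be related to one another with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes that "X% of theoretical application scalability" means the reported speedup divided by the speedup of an idealized design, so the implied ideal speedup is the reported speedup divided by that fraction. The names `REPORTED`, `derived_metrics`, and `summarize` are hypothetical, and because the abstract's numbers are rounded, the derived values are approximate.

```python
# Figures quoted in the abstract (rounded by the authors):
# socket count -> (speedup over a single GPU, fraction of theoretical scalability)
REPORTED = {
    2: (1.5, 0.89),
    4: (2.3, 0.84),
    8: (3.2, 0.76),
}


def derived_metrics(sockets, speedup, fraction_of_theoretical):
    """Return (implied theoretical speedup, speedup per socket).

    Assumption (not stated in the abstract): the percentage is the ratio
    reported_speedup / theoretical_speedup, so dividing recovers the latter.
    """
    implied_theoretical = speedup / fraction_of_theoretical
    speedup_per_socket = speedup / sockets
    return implied_theoretical, speedup_per_socket


def summarize(reported=REPORTED):
    """Build one row per socket count, rounded for display."""
    rows = []
    for sockets, (speedup, fraction) in sorted(reported.items()):
        implied, per_socket = derived_metrics(sockets, speedup, fraction)
        rows.append((sockets, speedup, round(implied, 2), round(per_socket, 3)))
    return rows


if __name__ == "__main__":
    print("sockets  speedup  implied_theoretical  speedup_per_socket")
    for sockets, speedup, implied, per_socket in summarize():
        print(f"{sockets:7d}  {speedup:7.1f}  {implied:19.2f}  {per_socket:18.3f}")
```

Under this reading, the implied theoretical speedups come out to roughly 1.69×, 2.74×, and 4.21× for 2, 4, and 8 sockets, which suggests that the benchmarked applications themselves would not scale linearly with socket count even on an idealized design.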
Citation: Milic, U. [et al.]. Beyond the socket: NUMA-aware GPUs. In: "MICRO-50 '17 Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture". Association for Computing Machinery, 2017, p. 123-135.
URI: http://hdl.handle.net/2117/109704
DOI: 10.1145/3123939.3124534
ISBN: 978-1-4503-4952-9
Publisher version: https://dl.acm.org/citation.cfm?id=3124534
Collections
  • Computer Sciences - Ponències/Comunicacions de congressos [530]