
dc.contributor.author: Pujol, Roger
dc.contributor.author: Tabani, Hamid
dc.contributor.author: Kosmidis, Leonidas
dc.contributor.author: Mezzetti, Enrico
dc.contributor.author: Abella Ferrer, Jaume
dc.contributor.author: Cazorla, Francisco J.
dc.contributor.other: Barcelona Supercomputing Center
dc.date.accessioned: 2019-07-11T14:45:52Z
dc.date.available: 2019-07-11T14:45:52Z
dc.date.issued: 2019
dc.identifier.citation: Pujol, R. [et al.]. Generating and Exploiting Deep Learning Variants to Increase Heterogeneous Resource Utilization in the NVIDIA Xavier. In: 31st Euromicro Conference on Real-Time Systems (ECRTS 2019). "31st Euromicro Conference on Real-Time Systems (ECRTS 2019)". 2019.
dc.identifier.isbn: 978-3-95977-110-8
dc.identifier.issn: 1868-8969
dc.identifier.uri: http://hdl.handle.net/2117/166069
dc.description.abstract: Deep learning-based solutions and, in particular, deep neural networks (DNNs) are at the heart of several functionalities in critical real-time embedded systems (CRTES), from vision-based perception (object detection and tracking) systems to trajectory planning. As a result, several DNN instances run simultaneously at any time on the same computing platform. However, while modern GPU-based SoCs offer a variety of computing elements (e.g., CPUs, GPUs, and specific accelerators) on which those DNN tasks can be executed depending on their computational requirements and temporal constraints, current DNNs are mainly programmed to exploit only one of them, namely, the regular cores in the GPU. This creates resource imbalance and under-utilization of the platform's resources when executing several DNN instances, causing an increase in DNN tasks' execution time requirements. In this paper, (a) we develop different variants (implementations) of well-known DNN libraries used in the Apollo Autonomous Driving (AD) software for each of the computing elements of the latest NVIDIA Xavier SoC. Each variant can be configured to balance resource requirements and performance: the regular CPU core implementation, which can run on 2, 4, or 6 cores; the GPU regular and Tensor core variants, which can run on 4 or 8 of the GPU's Streaming Multiprocessors (SMs); and the variant for 1 or 2 NVIDIA Deep Learning Accelerators (NVDLAs); (b) we show that each particular variant/configuration offers a different resource utilization/performance point; finally, (c) we show how those heterogeneous computing elements can be exploited by a static scheduler to sustain the execution of multiple and diverse DNN variants on the same platform.
dc.description.sponsorship: This work has been partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under grant TIN2015-65316-P, the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 772773), and the HiPEAC Network of Excellence. MINECO partially supported Jaume Abella under a Ramón y Cajal postdoctoral fellowship (RYC-2013-14717), Enrico Mezzetti under a Juan de la Cierva-Incorporación postdoctoral fellowship (IJCI-2016-27396), and Leonidas Kosmidis under a Juan de la Cierva-Formación postdoctoral fellowship (FJCI-2017-34095).
dc.format.extent: 23
dc.language.iso: eng
dc.subject: Àrees temàtiques de la UPC::Informàtica
dc.subject.lcsh: High performance computing
dc.subject.other: Deep Neural Network (DNN)
dc.subject.other: GPU
dc.subject.other: Heterogeneous Resources
dc.title: Generating and Exploiting Deep Learning Variants to Increase Heterogeneous Resource Utilization in the NVIDIA Xavier
dc.type: Conference lecture
dc.subject.lemac: Supercomputadors
dc.identifier.doi: 10.4230/LIPIcs.ECRTS.2019.23
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: http://drops.dagstuhl.de/opus/volltexte/2019/10760/
dc.rights.access: Open Access
dc.description.version: Postprint (published version)
dc.relation.projectid: info:eu-repo/grantAgreement/EC/H2020/772773/EU/Sustainable Performance for High-Performance Embedded Computing Systems/SuPerCom
dc.relation.projectid: info:eu-repo/grantAgreement/MINECO/PE2013-2016/TIN2015-65316-P
dc.relation.projectid: info:eu-repo/grantAgreement/MINECO/PE2013-2016/RYC-2013-14717
dc.relation.projectid: info:eu-repo/grantAgreement/MINECO/PE2013-2016/IJCI-2016-27396
dc.relation.projectid: info:eu-repo/grantAgreement/MINECO/PE2013-2016/FJCI-2017-34095
local.citation.contributor: 31st Euromicro Conference on Real-Time Systems (ECRTS 2019)
local.citation.publicationName: 31st Euromicro Conference on Real-Time Systems (ECRTS 2019)
local.citation.volume: 23



All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work is prohibited without permission of the copyright holder.