dc.contributor.author: Sharma, Robin Kumar
dc.contributor.author: Casas Guix, Marc
dc.contributor.other: Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors
dc.contributor.other: Barcelona Supercomputing Center
dc.date.accessioned: 2020-07-07T06:15:43Z
dc.date.available: 2020-07-07T06:15:43Z
dc.date.issued: 2020
dc.identifier.citation: Sharma, R.; Casas, M. Wavefront parallelization of recurrent neural networks on multi-core architectures. In: International Conference on Supercomputing. "Proceedings of the 34th ACM International Conference on Supercomputing (ICS-2020): Barcelona, June 29–July 2, 2020". New York: Association for Computing Machinery (ACM), 2020, article 5, p. 1-12.
dc.identifier.isbn: 978-1-4503-7983-0
dc.identifier.uri: http://hdl.handle.net/2117/192508
dc.description.abstract: Recurrent neural networks (RNNs) are widely used for natural language processing, time-series prediction, and text analysis tasks. The data and control dependencies across the fundamental numerical kernels of RNN inference and training complicate the exploitation of model parallelism, which is why only data parallelism has traditionally been applied to accelerate RNNs. This paper presents W-Par (Wavefront-Parallelization), a comprehensive approach for RNN inference and training on CPUs that applies model parallelism to RNN models. We use fine-grained pipeline parallelism in the form of wavefront computations to accelerate multi-layer RNNs running on multi-core CPUs. Wavefront computations have been widely applied in scientific computing domains such as stencil kernels and dynamic programming. W-Par divides RNN workloads across parallel tasks by defining input and output dependencies for each RNN cell. Our experiments with different RNN models demonstrate that W-Par achieves up to 6.6X speed-up for RNN inference and training in comparison to current state-of-the-art implementations on modern multi-core CPU architectures. Importantly, W-Par maximizes performance in a wide range of scenarios, including different core counts and memory hierarchy configurations, without requiring any change at the source code level.
dc.description.sponsorship: This work has been supported by the European Union's Horizon 2020 research and innovation program (MB2020 project, grant agreement 779877), by the European HiPEAC Network of Excellence, by the Spanish Ministry of Economy and Competitiveness (contract TIN2015-65316-P), and by Generalitat de Catalunya (contracts 2017-SGR-1414 and 2017-SGR-1328). M. Casas has been partially supported by the Spanish Ministry of Economy, Industry, and Competitiveness under Ramón y Cajal fellowship number RYC2017-23269.
dc.format.extent: 12 p.
dc.language.iso: eng
dc.publisher: Association for Computing Machinery (ACM)
dc.subject: Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors::Arquitectures paral·leles [UPC subject areas::Computer science::Computer architecture::Parallel architectures]
dc.subject.lcsh: Neural networks (Computer science)
dc.subject.lcsh: Parallel algorithms
dc.subject.other: Deep Neural Network (DNN)
dc.subject.other: Wavefront Parallelization
dc.subject.other: Recurrent Neural Networks (RNNs)
dc.subject.other: Long-Short Term Memory (LSTM)
dc.subject.other: Gated Recurrent Units (GRUs)
dc.subject.other: OmpSs
dc.subject.other: CPU Task Parallelism
dc.title: Wavefront parallelization of recurrent neural networks on multi-core architectures
dc.type: Conference report
dc.subject.lemac: Xarxes neuronals (Informàtica) [Neural networks (Computer science)]
dc.subject.lemac: Algorismes paral·lels [Parallel algorithms]
dc.identifier.doi: 10.1145/3392717.3392762
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: https://dl.acm.org/doi/abs/10.1145/3392717.3392762
dc.rights.access: Open Access
local.identifier.drac: 28818339
dc.description.version: Postprint (author's final draft)
dc.relation.projectid: info:eu-repo/grantAgreement/MINECO/1PE/TIN2015-65316-P
dc.relation.projectid: info:eu-repo/grantAgreement/AGAUR/2017 SGR 1414
dc.relation.projectid: info:eu-repo/grantAgreement/AGAUR/2017-SGR-1328
dc.relation.projectid: info:eu-repo/grantAgreement/EC/H2020/779877/EU/Mont-Blanc 2020, European scalable, modular and power efficient HPC processor/Mont-Blanc 2020
local.citation.author: Sharma, R.; Casas, M.
local.citation.contributor: International Conference on Supercomputing
local.citation.pubplace: New York
local.citation.publicationName: Proceedings of the 34th ACM International Conference on Supercomputing (ICS-2020): Barcelona, June 29–July 2, 2020
local.citation.startingPage: 1
local.citation.endingPage: 12
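
The abstract describes wavefront parallelism for multi-layer RNNs only in prose. The sketch below is an illustration, not the authors' implementation: the abstract states that W-Par defines input and output dependencies per RNN cell, whereas this example enumerates the anti-diagonals of the (layer, time step) grid explicitly to show the parallelism such dependencies can expose. It assumes that cell (layer, step) depends only on cell (layer - 1, step) and cell (layer, step - 1). The names `wavefront_schedule`, `toy_cell`, and `run_wavefront` are hypothetical, and `toy_cell` is a scalar stand-in for a real LSTM or GRU cell.

```python
from concurrent.futures import ThreadPoolExecutor


def wavefront_schedule(num_layers, num_steps):
    """Group cells (layer, step) into waves along anti-diagonals.

    Assumption: a cell depends only on (layer - 1, step) and (layer, step - 1),
    so all cells with the same layer + step are mutually independent.
    """
    waves = []
    for diagonal in range(num_layers + num_steps - 1):
        wave = [(layer, diagonal - layer)
                for layer in range(num_layers)
                if 0 <= diagonal - layer < num_steps]
        waves.append(wave)
    return waves


def toy_cell(below, previous):
    """Hypothetical stand-in for an RNN cell: mixes its two inputs."""
    return 0.5 * below + 0.5 * previous + 1.0


def run_wavefront(inputs, num_layers):
    """Evaluate a num_layers x len(inputs) grid of toy cells wave by wave."""
    num_steps = len(inputs)
    out = {}

    def compute(cell):
        layer, step = cell
        # Layer 0 reads the input sequence; deeper layers read the layer below.
        below = inputs[step] if layer == 0 else out[(layer - 1, step)]
        # The first time step starts from a zero state.
        previous = 0.0 if step == 0 else out[(layer, step - 1)]
        return cell, toy_cell(below, previous)

    with ThreadPoolExecutor() as pool:
        for wave in wavefront_schedule(num_layers, num_steps):
            # Cells of one wave read only results written by earlier waves,
            # so they can run concurrently; results are stored afterwards.
            results = list(pool.map(compute, wave))
            for cell, value in results:
                out[cell] = value
    return out


if __name__ == "__main__":
    for index, wave in enumerate(wavefront_schedule(2, 3)):
        print(index, wave)
    print(run_wavefront([1.0, 2.0, 3.0], 2))
```

With 2 layers and 3 time steps, the 6 cells are grouped into 4 waves, so the longest dependency chain is 4 cells rather than 6. Python threads serve here only to make the dependency structure concrete; they are not expected to reproduce the speed-ups reported in the abstract.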


All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication, or transformation of this work is prohibited without permission of the copyright holder.