
dc.contributor.author: Silfa Feliz, Franyell Antonio
dc.contributor.author: Dot, Gem
dc.contributor.author: Arnau Montañés, José María
dc.contributor.author: González Colás, Antonio María
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.date.accessioned: 2019-01-29T13:50:04Z
dc.date.issued: 2018
dc.identifier.citation: Silfa, F.A. [et al.]. E-PUR: an energy-efficient processing unit for recurrent neural networks. In: International Conference on Parallel Architectures and Compilation. "Proceedings of the 27th International Conference on Parallel Processing". 2018, p. 1-12.
dc.identifier.isbn: 978-1-4503-5986-3
dc.identifier.uri: http://hdl.handle.net/2117/127819
dc.description.abstract: Recurrent Neural Networks (RNNs) are a key technology for emerging applications such as automatic speech recognition, machine translation or image description. Long Short Term Memory (LSTM) networks are the most successful RNN implementation, as they can learn long term dependencies to achieve high accuracy. Unfortunately, the recurrent nature of LSTM networks significantly constrains the amount of parallelism and, hence, multicore CPUs and many-core GPUs exhibit poor efficiency for RNN inference. In this paper, we present E-PUR, an energy-efficient processing unit tailored to the requirements of LSTM computation. The main goal of E-PUR is to support large recurrent neural networks for low-power mobile devices. E-PUR provides an efficient hardware implementation of LSTM networks that is flexible to support diverse applications. One of its main novelties is a technique that we call Maximizing Weight Locality (MWL), which improves the temporal locality of the memory accesses for fetching the synaptic weights, reducing the memory requirements by a large extent. Our experimental results show that E-PUR achieves real-time performance for different LSTM networks, while reducing energy consumption by orders of magnitude with respect to general-purpose processors and GPUs, and it requires a very small chip area. Compared to a modern mobile SoC, an NVIDIA Tegra X1, E-PUR provides an average energy reduction of 88x.
dc.format.extent: 12 p.
dc.language.iso: eng
dc.subject: Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors::Arquitectures paral·leles
dc.subject: Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial::Aprenentatge automàtic
dc.subject.lcsh: Parallel programming (Computer science)
dc.subject.lcsh: Machine learning
dc.subject.other: Recurrent neural networks
dc.subject.other: Long short term memory
dc.subject.other: Accelerators
dc.title: E-PUR: an energy-efficient processing unit for recurrent neural networks
dc.type: Conference report
dc.subject.lemac: Programació en paral·lel (Informàtica)
dc.subject.lemac: Aprenentatge automàtic
dc.contributor.group: Universitat Politècnica de Catalunya. ARCO - Microarquitectura i Compiladors
dc.identifier.doi: 10.1145/3243176.3243184
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: https://dl.acm.org/citation.cfm?id=3243184
dc.rights.access: Restricted access - publisher's policy
local.identifier.drac: 23628347
dc.description.version: Postprint (published version)
dc.relation.projectid: info:eu-repo/grantAgreement/MINECO/1PE/TIN2016-75344-R
dc.date.lift: 10000-01-01
local.citation.author: Silfa, F.A.; Dot, G.; Arnau, J.; Gonzalez, A.
local.citation.contributor: International Conference on Parallel Architectures and Compilation
local.citation.publicationName: Proceedings of the 27th International Conference on Parallel Processing
local.citation.startingPage: 1
local.citation.endingPage: 12
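
The abstract notes that the recurrent nature of LSTM networks constrains parallelism, since each time step consumes the hidden and cell state produced by the previous one. The following minimal Python sketch of a standard LSTM cell step (a textbook formulation, not E-PUR's hardware design; all names and sizes are illustrative) makes that sequential dependency explicit:

    # Illustrative single-step LSTM cell (standard formulation), showing the
    # recurrent dependency on h_prev and c_prev that limits cross-timestep
    # parallelism. This is NOT E-PUR's implementation.
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x_t, h_prev, c_prev, W, U, b):
        """One LSTM time step.

        W: input weights, shape (4*H, D); U: recurrent weights, shape (4*H, H);
        b: bias, shape (4*H,). Gates are stacked in the order i, f, g, o.
        """
        H = h_prev.shape[0]
        z = W @ x_t + U @ h_prev + b          # all four gate pre-activations
        i = sigmoid(z[0:H])                   # input gate
        f = sigmoid(z[H:2*H])                 # forget gate
        g = np.tanh(z[2*H:3*H])               # candidate cell update
        o = sigmoid(z[3*H:4*H])               # output gate
        c_t = f * c_prev + i * g              # new cell state
        h_t = o * np.tanh(c_t)                # new hidden state
        return h_t, c_t

    # Processing a sequence is inherently serial across time steps:
    D, H, T = 8, 16, 5
    rng = np.random.default_rng(0)
    W = rng.standard_normal((4 * H, D))
    U = rng.standard_normal((4 * H, H))
    b = np.zeros(4 * H)
    h, c = np.zeros(H), np.zeros(H)
    for t in range(T):                        # each step needs the previous h, c
        h, c = lstm_step(rng.standard_normal(D), h, c, W, U, b)

Because the loop above cannot be unrolled across time steps, throughput hinges on how efficiently the large weight matrices W and U are fetched, which is the access pattern the paper's MWL technique targets.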




All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without the permission of the copyright holder.