A low-power, high-performance speech recognition accelerator
dc.contributor.author | Yazdani, Reza |
dc.contributor.author | Arnau Montañés, José María |
dc.contributor.author | González Colás, Antonio María |
dc.contributor.other | Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors |
dc.date.accessioned | 2020-01-20T16:18:47Z |
dc.date.available | 2020-01-20T16:18:47Z |
dc.date.issued | 2019-12-01 |
dc.identifier.citation | Yazdani, R.; Arnau, J.; Gonzalez, A. A low-power, high-performance speech recognition accelerator. "IEEE transactions on computers", 1 December 2019, vol. 68, no. 12, p. 1817-1831. |
dc.identifier.issn | 0018-9340 |
dc.identifier.uri | http://hdl.handle.net/2117/175332 |
dc.description | © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
dc.description.abstract | Automatic Speech Recognition (ASR) is becoming increasingly ubiquitous, especially in the mobile segment. Fast and accurate ASR comes at a high energy cost, which is not affordable for the tiny power budgets of mobile devices. Hardware acceleration reduces the energy consumption of ASR systems while delivering high performance. In this paper, we present an accelerator for large-vocabulary, speaker-independent, continuous speech recognition. It focuses on the Viterbi search algorithm, which represents the main bottleneck in an ASR system. The proposed design consists of innovative techniques to improve the memory subsystem, since memory is the main bottleneck for performance and power in the design of these accelerators. It includes a prefetching scheme tailored to the needs of ASR systems that hides main memory latency for a large fraction of the memory accesses, with negligible impact on area. Additionally, we introduce a novel bandwidth-saving technique that reduces off-chip memory accesses by 20 percent. Finally, we present a power-saving technique that significantly reduces the leakage power of the accelerator's scratchpad memories, providing between 8.5 and 29.2 percent reduction in total power dissipation. Overall, the proposed design outperforms implementations running on the CPU by orders of magnitude, and achieves speedups between 1.7x and 5.9x for different speech decoders over a highly optimized CUDA implementation running on a GeForce GTX 980 GPU, while reducing the energy by 123-454x. |
dc.format.extent | 15 p. |
dc.language.iso | eng |
dc.publisher | Institute of Electrical and Electronics Engineers (IEEE) |
dc.subject | UPC subject areas::Telecommunications engineering::Signal processing::Speech and acoustic signal processing |
dc.subject.lcsh | Automatic speech recognition |
dc.subject.other | Viterbi algorithm |
dc.subject.other | Speech recognition |
dc.subject.other | Graphics processing units |
dc.subject.other | Acoustics |
dc.subject.other | Central Processing Unit |
dc.subject.other | Hardware |
dc.subject.other | Decoding |
dc.subject.other | Automatic Speech Recognition (ASR) |
dc.subject.other | Viterbi search |
dc.subject.other | hardware accelerator |
dc.subject.other | WFST |
dc.subject.other | low-power architecture |
dc.title | A low-power, high-performance speech recognition accelerator |
dc.type | Article |
dc.subject.lemac | Reconeixement automàtic de la parla |
dc.contributor.group | Universitat Politècnica de Catalunya. ARCO - Microarquitectura i Compiladors |
dc.identifier.doi | 10.1109/TC.2019.2937075 |
dc.description.peerreviewed | Peer Reviewed |
dc.relation.publisherversion | https://ieeexplore.ieee.org/document/8812893 |
dc.rights.access | Open Access |
local.identifier.drac | 26417166 |
dc.description.version | Postprint (author's final draft) |
dc.relation.projectid | info:eu-repo/grantAgreement/EC/H2020/833057/EU/CoCoUnit: An Energy-Efficient Processing Unit for Cognitive Computing/CoCoUnit |
dc.relation.projectid | info:eu-repo/grantAgreement/MINECO/1PE/TIN2016-75344-R |
local.citation.author | Yazdani, R.; Arnau, J.; Gonzalez, A. |
local.citation.publicationName | IEEE transactions on computers |
local.citation.volume | 68 |
local.citation.number | 12 |
local.citation.startingPage | 1817 |
local.citation.endingPage | 1831 |
This item appears in the following collections
- Journal articles [1,050]
- Journal articles [68]