A low-power, high-performance speech recognition accelerator

Yazdani, Reza; Arnau Montañés, José María; González Colás, Antonio María

doi:10.1109/TC.2019.2937075

dc.contributor.author	Yazdani, Reza
dc.contributor.author	Arnau Montañés, José María
dc.contributor.author	González Colás, Antonio María
dc.contributor.other	Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.date.accessioned	2020-01-20T16:18:47Z
dc.date.available	2020-01-20T16:18:47Z
dc.date.issued	2019-12-01
dc.identifier.citation	Yazdani, R.; Arnau, J.; Gonzalez, A. A low-power, high-performance speech recognition accelerator. "IEEE transactions on computers", 1 December 2019, vol. 68, no. 12, pp. 1817-1831.
dc.identifier.issn	0018-9340
dc.identifier.uri	http://hdl.handle.net/2117/175332
dc.description	© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
dc.description.abstract	Automatic Speech Recognition (ASR) is becoming increasingly ubiquitous, especially in the mobile segment. Fast and accurate ASR comes at a high energy cost, which is not affordable for mobile devices with tiny power budgets. Hardware acceleration reduces the energy consumption of ASR systems while delivering high performance. In this paper, we present an accelerator for large-vocabulary, speaker-independent, continuous speech recognition. It focuses on the Viterbi search algorithm, which represents the main bottleneck in an ASR system. The proposed design includes innovative techniques to improve the memory subsystem, since memory is the main bottleneck for performance and power in the design of these accelerators. It includes a prefetching scheme tailored to the needs of ASR systems that hides main memory latency for a large fraction of the memory accesses with negligible area overhead. Additionally, we introduce a novel bandwidth-saving technique that reduces off-chip memory accesses by 20 percent. Finally, we present a power-saving technique that significantly reduces the leakage power of the accelerator's scratchpad memories, reducing total power dissipation by between 8.5 and 29.2 percent. Overall, the proposed design outperforms implementations running on the CPU by orders of magnitude, and achieves speedups between 1.7x and 5.9x for different speech decoders over a highly optimized CUDA implementation running on a GeForce GTX 980 GPU, while reducing energy by 123-454x.
dc.format.extent	15 p.
dc.language.iso	eng
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)
dc.subject	UPC subject areas::Telecommunication engineering::Signal processing::Speech and acoustic signal processing
dc.subject.lcsh	Automatic speech recognition
dc.subject.other	Viterbi algorithm
dc.subject.other	Speech recognition
dc.subject.other	Graphics processing units
dc.subject.other	Acoustics
dc.subject.other	Central Processing Unit
dc.subject.other	Hardware
dc.subject.other	Decoding
dc.subject.other	Automatic Speech Recognition (ASR)
dc.subject.other	Viterbi search
dc.subject.other	hardware accelerator
dc.subject.other	WFST
dc.subject.other	low-power architecture
dc.title	A low-power, high-performance speech recognition accelerator
dc.type	Article
dc.subject.lemac	Automatic speech recognition
dc.contributor.group	Universitat Politècnica de Catalunya. ARCO - Microarquitectura i Compiladors
dc.identifier.doi	10.1109/TC.2019.2937075
dc.description.peerreviewed	Peer Reviewed
dc.relation.publisherversion	https://ieeexplore.ieee.org/document/8812893
dc.rights.access	Open Access
local.identifier.drac	26417166
dc.description.version	Postprint (author's final draft)
dc.relation.projectid	info:eu-repo/grantAgreement/EC/H2020/833057/EU/CoCoUnit: An Energy-Efficient Processing Unit for Cognitive Computing/CoCoUnit
dc.relation.projectid	info:eu-repo/grantAgreement/MINECO/1PE/TIN2016-75344-R
local.citation.author	Yazdani, R.; Arnau, J.; Gonzalez, A.
local.citation.publicationName	IEEE transactions on computers
local.citation.volume	68
local.citation.number	12
local.citation.startingPage	1817
local.citation.endingPage	1831

Files in this item

Name: A_Low_Power__High_Performance_ ...
Size: 1.969 MB
Format: PDF

This item appears in the following collections

Journal articles [1,050]
Journal articles [68]
