An ultra low-power hardware accelerator for automatic speech recognition

Yazdani Aminabadi, Reza; Segura Salvador, Albert; Arnau Montañés, José María; González Colás, Antonio María

doi:10.1109/MICRO.2016.7783750

dc.contributor.author	Yazdani Aminabadi, Reza
dc.contributor.author	Segura Salvador, Albert
dc.contributor.author	Arnau Montañés, José María
dc.contributor.author	González Colás, Antonio María
dc.contributor.other	Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.date.accessioned	2017-06-01T06:40:38Z
dc.date.available	2017-06-01T06:40:38Z
dc.date.issued	2016
dc.identifier.citation	Yazdani, R., Segura, A., Arnau, J., González, A. An ultra low-power hardware accelerator for automatic speech recognition. In: Annual IEEE/ACM International Symposium on Microarchitecture. "Proceedings of the 49th IEEE/ACM Symposium on Microarchitecture". Taipei: IEEE Press, 2016, p. 580-591.
dc.identifier.isbn	978-1-5090-3509-0
dc.identifier.uri	http://hdl.handle.net/2117/105093
dc.description.abstract	Automatic Speech Recognition (ASR) is becoming increasingly ubiquitous, especially in the mobile segment. Fast and accurate ASR comes at a high energy cost that is unaffordable within the tiny power budget of mobile devices. Hardware acceleration can reduce the power consumption of ASR systems while delivering high performance. In this paper, we present an accelerator for large-vocabulary, speaker-independent, continuous speech recognition. It focuses on the Viterbi search algorithm, which represents the main bottleneck in an ASR system. The proposed design includes innovative techniques to improve the memory subsystem, since memory is identified as the main bottleneck for performance and power in the design of these accelerators. We propose a prefetching scheme tailored to the needs of an ASR system that hides main memory latency for a large fraction of the memory accesses with a negligible impact on area. In addition, we introduce a novel bandwidth-saving technique that removes 20% of the off-chip memory accesses issued during the Viterbi search. The proposed design outperforms software implementations running on the CPU by orders of magnitude and achieves a 1.7x speedup over a highly optimized CUDA implementation running on a high-end GeForce GTX 980 GPU, while reducing the energy required to convert speech into text by two orders of magnitude (287x).
dc.format.extent	12 p.
dc.language.iso	eng
dc.publisher	IEEE Press
dc.subject	UPC subject areas::Computer science::Computer architecture
dc.subject	UPC subject areas::Telecommunication engineering::Signal processing::Speech and acoustic signal processing
dc.subject.lcsh	Automatic speech recognition
dc.subject.lcsh	Microprocessors
dc.subject.lcsh	Parallel processing (Electronic computers)
dc.subject.other	Storage management
dc.subject.other	Microprocessor chips
dc.subject.other	Parallel architectures
dc.subject.other	Power aware computing
dc.subject.other	Power consumption
dc.subject.other	Search problems
dc.subject.other	Speech recognition
dc.title	An ultra low-power hardware accelerator for automatic speech recognition
dc.type	Conference report
dc.subject.lemac	Speech processing
dc.subject.lemac	Microprocessors
dc.subject.lemac	Parallel processing (Computers)
dc.contributor.group	Universitat Politècnica de Catalunya. ARCO - Microarquitectura i Compiladors
dc.identifier.doi	10.1109/MICRO.2016.7783750
dc.description.peerreviewed	Peer Reviewed
dc.relation.publisherversion	http://ieeexplore.ieee.org/document/7783750/
dc.rights.access	Open Access
local.identifier.drac	19685485
dc.description.version	Postprint (author's final draft)
local.citation.author	Yazdani, R.; Segura, A.; Arnau, J.; González, A.
local.citation.contributor	Annual IEEE/ACM International Symposium on Microarchitecture
local.citation.pubplace	Taipei
local.citation.publicationName	Proceedings of the 49th IEEE/ACM Symposium on Microarchitecture
local.citation.startingPage	580
local.citation.endingPage	591

Files in this item

Name: 07783750.pdf
Size: 323.7 KB
Format: PDF


This item appears in the following collections:

Conference papers/communications [187]
Conference papers/communications [1,954]
