dc.contributor.author: Yazdani Aminabadi, Reza
dc.contributor.author: Segura Salvador, Albert
dc.contributor.author: Arnau Montañés, José María
dc.contributor.author: González Colás, Antonio María
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.date.accessioned: 2017-06-01T06:40:38Z
dc.date.available: 2017-06-01T06:40:38Z
dc.date.issued: 2016
dc.identifier.citation: Yazdani, R., Segura, A., Arnau, J., González, A. An ultra low-power hardware accelerator for automatic speech recognition. In: Annual IEEE/ACM International Symposium on Microarchitecture. "Proceedings of the 49th IEEE/ACM Symposium on Microarchitecture". Taipei: IEEE Press, 2016, p. 580-591.
dc.identifier.isbn: 978-1-5090-3509-0
dc.identifier.uri: http://hdl.handle.net/2117/105093
dc.description.abstract: Automatic Speech Recognition (ASR) is becoming increasingly ubiquitous, especially in the mobile segment. Fast and accurate ASR comes at a high energy cost, which the tiny power budget of mobile devices cannot afford. Hardware acceleration can reduce the power consumption of ASR systems while delivering high performance. In this paper, we present an accelerator for large-vocabulary, speaker-independent, continuous speech recognition. It focuses on the Viterbi search algorithm, which represents the main bottleneck in an ASR system. The proposed design includes innovative techniques to improve the memory subsystem, since memory is identified as the main bottleneck for performance and power in the design of these accelerators. We propose a prefetching scheme tailored to the needs of an ASR system that hides main memory latency for a large fraction of the memory accesses with a negligible impact on area. In addition, we introduce a novel bandwidth-saving technique that removes 20% of the off-chip memory accesses issued during the Viterbi search. The proposed design outperforms software implementations running on the CPU by orders of magnitude and achieves a 1.7x speedup over a highly optimized CUDA implementation running on a high-end GeForce GTX 980 GPU, while reducing the energy required to convert speech into text by two orders of magnitude (287x).
dc.format.extent: 12 p.
dc.language.iso: eng
dc.publisher: IEEE Press
dc.subject: UPC subject areas::Computer science::Computer architecture
dc.subject: UPC subject areas::Telecommunications engineering::Signal processing::Speech and acoustic signal processing
dc.subject.lcsh: Automatic speech recognition
dc.subject.lcsh: Microprocessors
dc.subject.lcsh: Parallel processing (Electronic computers)
dc.subject.other: Storage management
dc.subject.other: Microprocessor chips
dc.subject.other: Parallel architectures
dc.subject.other: Power aware computing
dc.subject.other: Power consumption
dc.subject.other: Search problems
dc.subject.other: Speech recognition
dc.title: An ultra low-power hardware accelerator for automatic speech recognition
dc.type: Conference report
dc.subject.lemac: Processament de la parla
dc.subject.lemac: Microprocessadors
dc.subject.lemac: Processament en paral·lel (Ordinadors)
dc.contributor.group: Universitat Politècnica de Catalunya. ARCO - Microarquitectura i Compiladors
dc.identifier.doi: 10.1109/MICRO.2016.7783750
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: http://ieeexplore.ieee.org/document/7783750/
dc.rights.access: Open Access
local.identifier.drac: 19685485
dc.description.version: Postprint (author's final draft)
local.citation.author: Yazdani, R.; Segura, A.; Arnau, J.; González, A.
local.citation.contributor: Annual IEEE/ACM International Symposium on Microarchitecture
local.citation.pubplace: Taipei
local.citation.publicationName: Proceedings of the 49th IEEE/ACM Symposium on Microarchitecture
local.citation.startingPage: 580
local.citation.endingPage: 591

