Automatic speech recognition with deep neural networks for impaired speech

España-i-Bonet, Cristina; Rodríguez Fonollosa, José Adrián

doi:10.1007/978-3-319-49169-1_10

File: LNAI16.pdf (181.3 KB, PDF)

Document type: Conference proceedings paper

Publication date: 2016

Publisher: Springer

Access conditions: Open access

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation is prohibited without the authorization of the rights holder.

Projects:
Programa segunda voz (MINECO-IPT-2012-0914-300000)
TECNOLOGIAS DE APRENDIZAJE PROFUNDO APLICADAS AL PROCESADO DE VOZ Y AUDIO (MINECO-TEC2015-69266-P)

Abstract

Automatic Speech Recognition has reached almost human performance in some controlled scenarios. However, recognition of impaired speech is a difficult task for two main reasons: data is (i) scarce and (ii) heterogeneous. In this work we train different architectures on a database of dysarthric speech. A comparison between architectures shows that, even with a small database, hybrid DNN-HMM models outperform classical GMM-HMM models according to word error rate measures. A DNN improves the recognition word error rate by 13% for subjects with dysarthria with respect to the best classical architecture. This improvement is higher than the one given by other deep neural networks such as CNNs, TDNNs and LSTMs. All the experiments were done with the Kaldi toolkit for speech recognition, for which we have adapted several recipes to deal with dysarthric speech and work on the TORGO database. These recipes are publicly available.

Description

The final publication is available at https://link.springer.com/chapter/10.1007%2F978-3-319-49169-1_10

Citation: España-i-Bonet, C., Fonollosa, J. A. R. Automatic speech recognition with deep neural networks for impaired speech. In: International Conference on Advances in Speech and Language Technologies for Iberian Languages. "Advances in Speech and Language Technologies for Iberian Languages: Third International Conference, IberSPEECH 2016: Lisbon, Portugal, November 23-25, 2016: proceedings". Lisbon: Springer, 2016, p. 97-107.

URI: http://hdl.handle.net/2117/103823

DOI: 10.1007/978-3-319-49169-1_10

ISBN: 978-3-319-49169-1

Publisher's version: https://link.springer.com/chapter/10.1007%2F978-3-319-49169-1_10
