Deep neural networks for i-vector language identification of short utterances in cars

Ghahabi Esfahani, Omid; Bonafonte Cávez, Antonio; Hernando Pericás, Francisco Javier; Moreno Bilbao, M. Asunción

doi:10.21437/Interspeech.2016-1045

dc.contributor.author	Ghahabi Esfahani, Omid
dc.contributor.author	Bonafonte Cávez, Antonio
dc.contributor.author	Hernando Pericás, Francisco Javier
dc.contributor.author	Moreno Bilbao, M. Asunción
dc.contributor.other	Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
dc.date.accessioned	2016-12-07T13:39:25Z
dc.date.issued	2016
dc.identifier.citation	Ghahabi, O., Bonafonte, A., Hernando, J., Moreno, A. Deep neural networks for i-vector language identification of short utterances in cars. A: Annual Conference of the International Speech Communication Association. "INTERSPEECH 2016: September 8-12, 2016, San Francisco, USA". San Francisco, CA: International Speech Communication Association (ISCA), 2016, p. 367-371.
dc.identifier.isbn	1990-9770
dc.identifier.uri	http://hdl.handle.net/2117/97867
dc.description.abstract	This paper focuses on applying Language Identification (LID) technology to intelligent vehicles. We deal with short sentences or words spoken in moving cars in four languages: English, Spanish, German, and Finnish. Because the response time of the LID system is crucial for user acceptance in this task, speech signals of different durations, averaging 3.8 s overall, are analyzed. We propose using Deep Neural Networks (DNNs) to model the i-vector space of languages effectively. Both raw i-vectors and session-variability-compensated i-vectors are evaluated as inputs to the DNNs. The proposed DNN architecture is compared with conventional GMM-UBM and i-vector/LDA systems, taking signal duration into account. Signals lasting between 2 and 3 s are shown to meet the requirements of this application, namely high accuracy and fast decisions; in this range the proposed DNN architecture outperforms the GMM-UBM and i-vector/LDA systems by 37% and 28%, respectively.
dc.format.extent	5 p.
dc.language.iso	eng
dc.publisher	International Speech Communication Association (ISCA)
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.subject	UPC subject areas::Telecommunications engineering::Signal processing::Speech and acoustic signal processing
dc.subject.lcsh	Automatic speech recognition
dc.subject.other	Natural language processing systems
dc.subject.other	Network architecture
dc.subject.other	Speech communication
dc.subject.other	Speech processing
dc.subject.other	Speech recognition
dc.subject.other	Vectors
dc.subject.other	Deep neural networks
dc.subject.other	High-accuracy
dc.subject.other	I vectors
dc.subject.other	Input vector
dc.subject.other	Language identification
dc.subject.other	Speech signals
dc.subject.other	Speech technology
dc.subject.other	User acceptance
dc.title	Deep neural networks for i-vector language identification of short utterances in cars
dc.type	Conference report
dc.subject.lemac	Automatic speech recognition
dc.contributor.group	Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
dc.identifier.doi	10.21437/Interspeech.2016-1045
dc.description.peerreviewed	Peer Reviewed
dc.relation.publisherversion	http://www.isca-speech.org/archive/Interspeech_2016/pdfs/1045.PDF
dc.rights.access	Restricted access - publisher's policy
local.identifier.drac	19287294
dc.description.version	Postprint (published version)
dc.date.lift	10000-01-01
local.citation.author	Ghahabi, O.; Bonafonte, A.; Hernando, J.; Moreno, A.
local.citation.contributor	Annual Conference of the International Speech Communication Association
local.citation.pubplace	San Francisco, CA
local.citation.publicationName	INTERSPEECH 2016: September 8-12, 2016, San Francisco, USA
local.citation.startingPage	367
local.citation.endingPage	371

Files in this item

Name: 1045.PDF
Size: 285.6 KB
Format: PDF

This item appears in the following collections

Conference papers/presentations [437]
Conference papers/presentations [3,323]

UPCommons. The UPC open knowledge portal
