Show simple item record

dc.contributor.author  Ghahabi Esfahani, Omid
dc.contributor.author  Bonafonte Cávez, Antonio
dc.contributor.author  Hernando Pericás, Francisco Javier
dc.contributor.author  Moreno Bilbao, M. Asunción
dc.contributor.other  Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
dc.date.accessioned  2016-12-07T13:39:25Z
dc.date.issued  2016
dc.identifier.citation  Ghahabi, O., Bonafonte, A., Hernando, J., Moreno, A. Deep neural networks for i-vector language identification of short utterances in cars. In: Annual Conference of the International Speech Communication Association. "INTERSPEECH 2016: September 8-12, 2016, San Francisco, USA". San Francisco, CA: International Speech Communication Association (ISCA), 2016, p. 367-371.
dc.identifier.issn  1990-9770
dc.identifier.uri  http://hdl.handle.net/2117/97867
dc.description.abstract  This paper focuses on the application of Language Identification (LID) technology to intelligent vehicles. We cope with short sentences or words spoken in moving cars in four languages: English, Spanish, German, and Finnish. As the response time of the LID system is crucial for user acceptance in this task, speech signals of different durations, with an overall average of 3.8 s, are analyzed. The authors propose the use of Deep Neural Networks (DNN) to model the i-vector space of languages effectively. Both raw i-vectors and session-variability-compensated i-vectors are evaluated as inputs to the DNNs. The performance of the proposed DNN architecture is compared with conventional GMM-UBM and i-vector/LDA systems, considering the effect of signal duration. It is shown that signals lasting between 2 and 3 s meet the requirements of this application, i.e., high accuracy and a fast decision, and in this range the proposed DNN architecture outperforms the GMM-UBM and i-vector/LDA systems by 37% and 28%, respectively.
dc.format.extent  5 p.
dc.language.iso  eng
dc.publisher  International Speech Communication Association (ISCA)
dc.rights.uri  http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.subject  Àrees temàtiques de la UPC::Enginyeria de la telecomunicació::Processament del senyal::Processament de la parla i del senyal acústic
dc.subject.lcsh  Automatic speech recognition
dc.subject.other  Natural language processing systems
dc.subject.other  Network architecture
dc.subject.other  Speech communication
dc.subject.other  Speech processing
dc.subject.other  Speech recognition
dc.subject.other  Vectors
dc.subject.other  Deep neural networks
dc.subject.other  High-accuracy
dc.subject.other  I vectors
dc.subject.other  Input vector
dc.subject.other  Language identification
dc.subject.other  Speech signals
dc.subject.other  Speech technology
dc.subject.other  User acceptance
dc.title  Deep neural networks for i-vector language identification of short utterances in cars
dc.type  Conference report
dc.subject.lemac  Reconeixement automàtic de la parla
dc.contributor.group  Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
dc.identifier.doi  10.21437/Interspeech.2016-1045
dc.description.peerreviewed  Peer Reviewed
dc.relation.publisherversion  http://www.isca-speech.org/archive/Interspeech_2016/pdfs/1045.PDF
dc.rights.access  Restricted access - publisher's policy
drac.iddocument  19287294
dc.description.version  Postprint (published version)
dc.date.lift  10000-01-01
upcommons.citation.author  Ghahabi, O., Bonafonte, A., Hernando, J., Moreno, A.
upcommons.citation.contributor  Annual Conference of the International Speech Communication Association
upcommons.citation.pubplace  San Francisco, CA
upcommons.citation.published  true
upcommons.citation.publicationName  INTERSPEECH 2016: September 8-12, 2016, San Francisco, USA
upcommons.citation.startingPage  367
upcommons.citation.endingPage  371
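As context for the architecture the abstract describes, the sketch below shows the general shape of a feed-forward DNN classifier over i-vectors: a fixed-length i-vector in, a softmax over the four target languages out. This is a minimal illustration only; the 400-dimensional input, single 256-unit hidden layer, and randomly initialized weights are assumptions for demonstration, not the configuration reported in the paper.

```python
import numpy as np

# Illustrative dimensions: i-vectors are commonly a few hundred dimensions;
# the paper targets four languages (English, Spanish, German, Finnish).
IVEC_DIM, HIDDEN, N_LANGS = 400, 256, 4

rng = np.random.default_rng(0)

# Randomly initialized weights stand in for a trained model.
W1 = rng.normal(0.0, 0.01, (IVEC_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0.0, 0.01, (HIDDEN, N_LANGS))
b2 = np.zeros(N_LANGS)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def classify_ivectors(ivecs):
    """Forward pass: one hidden ReLU layer, softmax over the languages."""
    h = np.maximum(0.0, ivecs @ W1 + b1)
    return softmax(h @ W2 + b2)

# A batch of length-normalized dummy i-vectors (length normalization is a
# common preprocessing step in i-vector pipelines).
ivecs = rng.normal(size=(3, IVEC_DIM))
ivecs /= np.linalg.norm(ivecs, axis=1, keepdims=True)
probs = classify_ivectors(ivecs)      # shape (3, 4): per-language posteriors
pred = probs.argmax(axis=1)           # index of the most likely language
```

The paper's actual contribution concerns how such a network is trained on raw versus session-variability-compensated i-vectors and how it compares to GMM-UBM and i-vector/LDA baselines; only the input/output shape of the classifier is sketched here.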



Except where otherwise noted, content on this work is licensed under a Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain