Deep neural networks for i-vector language identification of short utterances in cars
Document type: Conference report
Publisher: International Speech Communication Association (ISCA)
Rights access: Restricted access - publisher's policy
This paper focuses on applying Language Identification (LID) technology to intelligent vehicles. We address short sentences or words spoken in moving cars in four languages: English, Spanish, German, and Finnish. Because the response time of the LID system is crucial for user acceptance in this task, speech signals of different durations, with an overall average of 3.8 s, are analyzed. We propose using Deep Neural Networks (DNNs) to model the i-vector space of languages effectively. Both raw i-vectors and session-variability-compensated i-vectors are evaluated as inputs to the DNNs. The proposed DNN architecture is compared with conventional GMM-UBM and i-vector/LDA systems, taking signal duration into account. Signals lasting between 2 and 3 s are shown to meet the requirements of this application, namely high accuracy and a fast decision; in this range, the proposed DNN architecture outperforms the GMM-UBM and i-vector/LDA systems by 37% and 28%, respectively.
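To make the idea of a DNN operating on i-vectors concrete, the following is a minimal sketch in plain Python, not the authors' architecture. It passes one i-vector through a small feed-forward network and returns a posterior over the four languages. The i-vector dimension (400), the hidden-layer size (64), the number of layers, the ReLU activation, and the length-normalization step are all assumptions chosen for illustration; the weights are random and untrained, so the prediction itself carries no meaning.

```python
import math
import random

# The four target languages named in the abstract.
LANGUAGES = ["English", "Spanish", "German", "Finnish"]


def length_normalize(vec):
    """Scale a vector to unit Euclidean length (an assumed preprocessing step)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dense(inputs, weights, biases):
    """Affine layer; `weights` holds one row of input weights per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_in, n_out):
    """Random Gaussian weights and zero biases (untrained, for illustration only)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def language_posteriors(ivector, layers):
    """Forward pass: hidden layers use ReLU, the output layer uses softmax."""
    activations = length_normalize(ivector)
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    weights, biases = layers[-1]
    return softmax(dense(activations, weights, biases))


# Hypothetical sizes; the abstract does not state them.
IVECTOR_DIM, HIDDEN = 400, 64

rng = random.Random(0)
layers = [
    init_layer(rng, IVECTOR_DIM, HIDDEN),
    init_layer(rng, HIDDEN, HIDDEN),
    init_layer(rng, HIDDEN, len(LANGUAGES)),
]

# A synthetic stand-in for an i-vector extracted from one short utterance.
ivector = [rng.gauss(0.0, 1.0) for _ in range(IVECTOR_DIM)]

posteriors = language_posteriors(ivector, layers)
best = max(range(len(LANGUAGES)), key=posteriors.__getitem__)
predicted = LANGUAGES[best]
```

In a real system the weights would be learned from labelled i-vectors, and the input could be either the raw i-vector or a session-variability-compensated one, the two input types the abstract says were evaluated.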
Citation: Ghahabi, O., Bonafonte, A., Hernando, J., Moreno, A. Deep neural networks for i-vector language identification of short utterances in cars. In: Annual Conference of the International Speech Communication Association. "INTERSPEECH 2016: September 8-12, 2016, San Francisco, USA". San Francisco, CA: International Speech Communication Association (ISCA), 2016, p. 367-371.