
dc.contributor.authorKhan, Umair
dc.contributor.authorIndia Massana, Miquel Àngel
dc.contributor.authorHernando Pericás, Francisco Javier
dc.contributor.otherUniversitat Politècnica de Catalunya. Doctorat en Teoria del Senyal i Comunicacions
dc.contributor.otherUniversitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
dc.date.accessioned2020-07-06T11:25:20Z
dc.date.issued2020
dc.identifier.citationKhan, U.; India, M.; Hernando, J. I-vector transformation using k-nearest neighbors for speaker verification. A: IEEE International Conference on Acoustics, Speech, and Signal Processing. "2020 IEEE International Conference on Acoustics, Speech, and Signal Processing: proceedings: May 4-8, 2020: Centre de Convencions Internacional de Barcelona (CCIB), Barcelona, Spain". Institute of Electrical and Electronics Engineers (IEEE), 2020, p. 7574-7578.
dc.identifier.isbn978-1-5090-6632-2
dc.identifier.urihttp://hdl.handle.net/2117/192485
dc.description.abstractProbabilistic Linear Discriminant Analysis (PLDA) is the most efficient backend for i-vectors. However, it requires labeled background data, which can be difficult to obtain in practice. Unlike PLDA, cosine scoring avoids the need for speaker labels, at the cost of degraded performance. In this work, we propose a post-processing of i-vectors using a Deep Neural Network (DNN) that transforms i-vectors into a new speaker vector representation. The DNN is trained using i-vectors that are similar to the training i-vectors, where these similar i-vectors are selected in an unsupervised manner. Using the new vector representation, the experimental trials are scored with cosine scoring. The evaluation was performed on the speaker verification trials of the VoxCeleb-1 database. The experiments show that, with the help of the similar i-vectors, the new vectors become more discriminative than the original i-vectors. The new vectors achieve a relative improvement of 53% in terms of EER compared to the conventional i-vector/PLDA system, without using speaker labels.
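The abstract describes two components that can be illustrated directly from what the record states: unsupervised selection of similar background i-vectors by cosine-similarity k-nearest neighbors, and cosine scoring of verification trials. The Python sketch below is a hypothetical illustration under those assumptions; the function names, dimensions, and toy data are invented for the example, and the DNN transformation itself (whose architecture is not given in this record) is deliberately omitted.

```python
# Hypothetical sketch of k-nearest-neighbor i-vector selection and cosine
# scoring, as described in the abstract. All names and dimensions are
# illustrative assumptions, not the authors' implementation.
import numpy as np

def length_normalize(x, axis=-1, eps=1e-12):
    """Project vectors onto the unit sphere (standard i-vector preprocessing)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def knn_similar_ivectors(train_ivectors, background_ivectors, k=10):
    """For each training i-vector, return indices of its k most cosine-similar
    background i-vectors, selected without any speaker labels."""
    t = length_normalize(train_ivectors)       # (N_train, D)
    b = length_normalize(background_ivectors)  # (N_bg, D)
    sims = t @ b.T                             # cosine similarity matrix
    # sort each row in descending similarity and keep the top-k neighbors
    return np.argsort(-sims, axis=1)[:, :k]

def cosine_score(enroll_vec, test_vec):
    """Cosine scoring of a single verification trial."""
    e = length_normalize(enroll_vec)
    t = length_normalize(test_vec)
    return float(np.dot(e, t))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(size=(100, 400))        # toy 400-dim i-vectors
    background = rng.normal(size=(1000, 400))
    neighbors = knn_similar_ivectors(train, background, k=10)
    print(neighbors.shape)                     # (100, 10)
    print(cosine_score(train[0], background[neighbors[0, 0]]))
```

The selected neighbor i-vectors would serve as training targets or inputs for the DNN transformation, after which trials are scored with the same cosine function on the transformed vectors.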
dc.description.sponsorshipThis work has been developed in the framework of the DeepVoice project (TEC2015-69266-P), funded by the Spanish Ministry of Economy and Competitiveness (MINECO).
dc.format.extent5 p.
dc.language.isoeng
dc.publisherInstitute of Electrical and Electronics Engineers (IEEE)
dc.subjectÀrees temàtiques de la UPC::Enginyeria de la telecomunicació::Processament del senyal::Processament de la parla i del senyal acústic
dc.subject.lcshAutomatic speech recognition
dc.subject.otherDeep learning
dc.subject.otherK-nearest neighbors
dc.subject.otherI-vectors
dc.subject.otherSpeaker verification
dc.titleI-vector transformation using k-nearest neighbors for speaker verification
dc.typeConference report
dc.subject.lemacReconeixement automàtic de la parla
dc.contributor.groupUniversitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
dc.identifier.doi10.1109/ICASSP40776.2020.9053504
dc.description.peerreviewedPeer Reviewed
dc.relation.publisherversionhttps://ieeexplore.ieee.org/abstract/document/9053504
dc.rights.accessRestricted access - publisher's policy
local.identifier.drac28660908
dc.description.versionPostprint (published version)
dc.relation.projectidinfo:eu-repo/grantAgreement/MINECO/1PE/TEC2015-69266-P
dc.date.lift10000-01-01
local.citation.authorKhan, U.; India, M.; Hernando, J.
local.citation.contributorIEEE International Conference on Acoustics, Speech, and Signal Processing
local.citation.publicationName2020 IEEE International Conference on Acoustics, Speech, and Signal Processing: proceedings: May 4-8, 2020: Centre de Convencions Internacional de Barcelona (CCIB), Barcelona, Spain
local.citation.startingPage7574
local.citation.endingPage7578



All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work is prohibited without permission of the copyright holder.