dc.contributor.author: Khan, Umair
dc.contributor.author: India Massana, Miquel Àngel
dc.contributor.author: Hernando Pericás, Francisco Javier
dc.contributor.other: Universitat Politècnica de Catalunya. Doctorat en Teoria del Senyal i Comunicacions
dc.contributor.other: Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
dc.identifier.citation: Khan, U.; India, M.; Hernando, J. Auto-encoding nearest neighbor i-vectors for speaker verification. A: Annual Conference of the International Speech Communication Association. "Interspeech 2019: the 20th Annual Conference of the International Speech Communication Association: 15-19 September 2019: Graz, Austria". Baixas: International Speech Communication Association (ISCA), 2019, p. 4060-4064.
dc.description.abstract: In the last years, i-vectors followed by cosine or PLDA scoring techniques were the state-of-the-art approach in speaker verification. PLDA requires labeled background data, and there exists a significant performance gap between the two scoring techniques. In this work, we propose to reduce this gap by using an autoencoder to transform i-vector into a new speaker vector representation, which will be referred to as ae-vector. The autoencoder will be trained to reconstruct neighbor i-vectors instead of the same training i-vectors, as usual. These neighbor i-vectors will be selected in an unsupervised manner according to the highest cosine scores to the training i-vectors. The evaluation is performed on the speaker verification trials of VoxCeleb-1 database. The experiments show that our proposed ae-vectors gain a relative improvement of 42% in terms of EER compared to the conventional i-vectors using cosine scoring, which fills the performance gap between cosine and PLDA scoring techniques by 92%, but without using speaker labels.
dc.format.extent: 5 p.
dc.publisher: International Speech Communication Association (ISCA)
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 Spain
dc.subject: Àrees temàtiques de la UPC::Enginyeria de la telecomunicació::Processament del senyal::Processament de la parla i del senyal acústic
dc.subject.lcsh: Speech processing systems
dc.subject.other: Deep learning
dc.subject.other: Speaker verification
dc.title: Auto-encoding nearest neighbor i-vectors for speaker verification
dc.type: Conference lecture
dc.subject.lemac: Processament de la parla
dc.contributor.group: Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
dc.description.peerreviewed: Peer Reviewed
dc.rights.access: Open Access
dc.description.version: Postprint (published version)
local.citation.author: Khan, U.; India, M.; Hernando, J.
local.citation.contributor: Annual Conference of the International Speech Communication Association
local.citation.publicationName: Interspeech 2019: the 20th Annual Conference of the International Speech Communication Association: 15-19 September 2019: Graz, Austria
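
Illustrative note (not part of the record): the abstract describes selecting, for each training i-vector, the neighbor i-vectors with the highest cosine scores and using them as autoencoder reconstruction targets. The following is a minimal NumPy sketch of that selection step only. The function name `nearest_neighbor_pairs`, the parameter `k`, and the toy dimensions are assumptions made for illustration; the record does not state the paper's neighbor count, network architecture, or training setup, and this is not the authors' code.

```python
import numpy as np


def nearest_neighbor_pairs(ivectors: np.ndarray, k: int = 1):
    """Pair each i-vector with its k highest-cosine-score neighbors.

    Hypothetical helper: returns (inputs, targets, neighbor_idx), where
    targets[j] is a neighbor of inputs[j] and can serve as the
    reconstruction target in place of the input itself.
    """
    # Length-normalize rows so the dot product equals the cosine score.
    norms = np.linalg.norm(ivectors, axis=1, keepdims=True)
    unit = ivectors / np.clip(norms, 1e-12, None)
    scores = unit @ unit.T

    # An i-vector must not be selected as its own neighbor.
    np.fill_diagonal(scores, -np.inf)

    # Indices of the k best-scoring neighbors per row, best first.
    neighbor_idx = np.argsort(-scores, axis=1)[:, :k]

    # Repeat each input k times so it lines up with its k targets.
    inputs = np.repeat(ivectors, k, axis=0)
    targets = ivectors[neighbor_idx.reshape(-1)]
    return inputs, targets, neighbor_idx


# Toy usage with random vectors standing in for real i-vectors.
rng = np.random.default_rng(0)
toy_ivectors = rng.standard_normal((100, 16))
x, y, idx = nearest_neighbor_pairs(toy_ivectors, k=5)
```

No speaker labels enter the selection, which is consistent with the unsupervised setting the abstract describes. An autoencoder would then be trained to map each row of `x` to the matching row of `y`.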


Except where otherwise noted, content on this work is licensed under a Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain.