
dc.contributor.author: Khan, Umair
dc.contributor.author: India Massana, Miquel Àngel
dc.contributor.author: Hernando Pericás, Francisco Javier
dc.contributor.other: Universitat Politècnica de Catalunya. Doctorat en Teoria del Senyal i Comunicacions
dc.contributor.other: Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
dc.date.accessioned: 2020-02-25T13:54:16Z
dc.date.available: 2020-02-25T13:54:16Z
dc.date.issued: 2019
dc.identifier.citation: Khan, U.; India, M.; Hernando, J. Auto-encoding nearest neighbor i-vectors for speaker verification. In: Annual Conference of the International Speech Communication Association. "Interspeech 2019: the 20th Annual Conference of the International Speech Communication Association: 15-19 September 2019: Graz, Austria". Baixas: International Speech Communication Association (ISCA), 2019, p. 4060-4064.
dc.identifier.isbn: 1990-9772
dc.identifier.uri: http://hdl.handle.net/2117/178617
dc.description.abstract: In recent years, i-vectors followed by cosine or PLDA scoring techniques were the state-of-the-art approach in speaker verification. PLDA requires labeled background data, and there exists a significant performance gap between the two scoring techniques. In this work, we propose to reduce this gap by using an autoencoder to transform i-vectors into a new speaker vector representation, which will be referred to as the ae-vector. The autoencoder is trained to reconstruct neighbor i-vectors rather than the training i-vectors themselves, as is usual. These neighbor i-vectors are selected in an unsupervised manner according to the highest cosine scores to the training i-vectors. The evaluation is performed on the speaker verification trials of the VoxCeleb-1 database. The experiments show that our proposed ae-vectors gain a relative improvement of 42% in terms of EER compared to the conventional i-vectors using cosine scoring, which fills the performance gap between cosine and PLDA scoring techniques by 92%, but without using speaker labels.
dc.format.extent: 5 p.
dc.language.iso: eng
dc.publisher: International Speech Communication Association (ISCA)
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 Spain
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.subject: UPC subject areas::Telecommunications engineering::Signal processing::Speech and acoustic signal processing
dc.subject.lcsh: Speech processing systems
dc.subject.other: Deep learning
dc.subject.other: Autoencoders
dc.subject.other: i-vectors
dc.subject.other: Speaker verification
dc.title: Auto-encoding nearest neighbor i-vectors for speaker verification
dc.type: Conference lecture
dc.subject.lemac: Speech processing
dc.contributor.group: Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
dc.identifier.doi: 10.21437/Interspeech.2019-1444
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1444.pdf
dc.rights.access: Open Access
local.identifier.drac: 27031255
dc.description.version: Postprint (published version)
dc.relation.projectid: info:eu-repo/grantAgreement/MINECO/1PE/TEC2015-69266-P
local.citation.author: Khan, U.; India, M.; Hernando, J.
local.citation.contributor: Annual Conference of the International Speech Communication Association
local.citation.pubplace: Baixas
local.citation.publicationName: Interspeech 2019: the 20th Annual Conference of the International Speech Communication Association: 15-19 September 2019: Graz, Austria
local.citation.startingPage: 4060
local.citation.endingPage: 4064
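
The abstract in this record describes selecting, without speaker labels, the neighbor i-vectors that score highest by cosine similarity against each training i-vector and using them as reconstruction targets. The following is a minimal sketch of that selection step only, not the authors' implementation. The function names, the parameter k, the toy 2-D vectors, and the choice to pair each input with every one of its k neighbors are all assumptions made for illustration; the autoencoder itself is omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(vectors, k):
    """For each vector, return the indices of the k other vectors with the
    highest cosine score. No speaker labels are used."""
    neighbors = []
    for i, u in enumerate(vectors):
        scored = [(cosine(u, v), j) for j, v in enumerate(vectors) if j != i]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        neighbors.append([j for _, j in scored[:k]])
    return neighbors


def training_pairs(vectors, k):
    """Build (input, target) pairs in which the target is a neighbor of the
    input rather than the input itself (hypothetical pairing scheme)."""
    pairs = []
    for i, idxs in enumerate(nearest_neighbors(vectors, k)):
        for j in idxs:
            pairs.append((vectors[i], vectors[j]))
    return pairs


# Toy 2-D stand-ins for i-vectors (hypothetical data, not from the paper).
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
pairs = training_pairs(toy, k=1)
```

In this sketch, an autoencoder would then be fitted to map each input in pairs to its target; how many neighbors are used and how the pairs are weighted are details the abstract does not state.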



Except where otherwise noted, the content of this work is licensed under a Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain