Auto-encoding nearest neighbor i-vectors for speaker verification
Document type: Conference lecture
Publisher: International Speech Communication Association (ISCA)
Rights access: Open Access
In recent years, i-vectors followed by cosine or PLDA scoring techniques have been the state-of-the-art approach in speaker verification. PLDA requires labeled background data, and there exists a significant performance gap between the two scoring techniques. In this work, we propose to reduce this gap by using an autoencoder to transform i-vectors into a new speaker vector representation, which will be referred to as the ae-vector. The autoencoder is trained to reconstruct neighbor i-vectors rather than the training i-vectors themselves, as is usual. These neighbor i-vectors are selected in an unsupervised manner according to the highest cosine scores to the training i-vectors. The evaluation is performed on the speaker verification trials of the VoxCeleb-1 database. The experiments show that our proposed ae-vectors gain a relative improvement of 42% in terms of EER compared to the conventional i-vectors using cosine scoring, which fills the performance gap between cosine and PLDA scoring techniques by 92%, but without using speaker labels.
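The unsupervised neighbor selection described in the abstract can be sketched as follows. This is only an illustrative NumPy sketch, not the authors' implementation: the function name `nearest_neighbor_targets` and the parameter `k` (number of neighbors per i-vector) are assumptions, since the abstract does not state how many neighbors are used, and the autoencoder itself is omitted. Each i-vector is paired with the i-vectors that have the highest cosine score to it, and those pairs would then serve as (input, reconstruction target) examples for training.

```python
import numpy as np

def nearest_neighbor_targets(ivectors, k=1):
    """Pair each i-vector with its k highest-cosine-score neighbors.

    Hypothetical helper: `k` is an assumed parameter, not taken from the paper.
    Returns (inputs, targets, neighbor_idx), where row i of `targets` is the
    neighbor that the autoencoder would be asked to reconstruct from `inputs[i]`.
    """
    ivectors = np.asarray(ivectors, dtype=float)
    # Length-normalize so that a dot product equals the cosine score.
    unit = ivectors / np.linalg.norm(ivectors, axis=1, keepdims=True)
    scores = unit @ unit.T
    # Exclude each vector from its own neighbor list.
    np.fill_diagonal(scores, -np.inf)
    # Indices of the k highest-scoring neighbors per row, best first.
    neighbor_idx = np.argsort(-scores, axis=1)[:, :k]
    # Repeat each input k times so it lines up with its k targets.
    inputs = np.repeat(ivectors, k, axis=0)
    targets = ivectors[neighbor_idx.reshape(-1)]
    return inputs, targets, neighbor_idx

# Toy 2-dimensional "i-vectors" forming two obvious clusters.
toy = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
inputs, targets, idx = nearest_neighbor_targets(toy, k=1)
```

No speaker labels are involved at any point: the pairing relies only on cosine scores between training i-vectors, which is what allows the approach to remain unsupervised.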
Citation: Khan, U.; India, M.; Hernando, J. Auto-encoding nearest neighbor i-vectors for speaker verification. In: Annual Conference of the International Speech Communication Association. "Interspeech 2019: the 20th Annual Conference of the International Speech Communication Association: 15-19 September 2019: Graz, Austria". Baixas: International Speech Communication Association (ISCA), 2019, p. 4060-4064.