The UPC speaker verification system submitted to VoxCeleb Speaker Recognition Challenge 2020 (VoxSRC-20)

Khan, Umair; Hernando Pericás, Francisco Javier

dc.contributor.author	Khan, Umair
dc.contributor.author	Hernando Pericás, Francisco Javier
dc.contributor.other	Universitat Politècnica de Catalunya. Doctorat en Teoria del Senyal i Comunicacions
dc.contributor.other	Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
dc.date.accessioned	2020-11-06T16:51:17Z
dc.date.available	2020-11-06T16:51:17Z
dc.date.issued	2020-10-27
dc.identifier.citation	Khan, U.; Hernando, J. The UPC speaker verification system submitted to VoxCeleb Speaker Recognition Challenge 2020 (VoxSRC-20). 2020.
dc.identifier.uri	http://hdl.handle.net/2117/331625
dc.description	This report describes the submission from Technical University of Catalonia (UPC) to the VoxCeleb Speaker Recognition Challenge (VoxSRC-20) at Interspeech 2020. The final submission is a combination of three systems. System-1 is an autoencoder based approach which tries to reconstruct similar i-vectors, whereas System-2 and -3 are Convolutional Neural Network (CNN) based siamese architectures. The siamese networks have two and three branches, respectively, where each branch is a CNN encoder. The double-branch siamese performs binary classification using cross entropy loss during training. Whereas, our triple-branch siamese is trained to learn speaker embeddings using triplet loss. We provide results of our systems on VoxCeleb-1 test, VoxSRC-20 validation and test sets.
dc.description.abstract	This report describes the submission from Technical University of Catalonia (UPC) to the VoxCeleb Speaker Recognition Challenge (VoxSRC-20) at Interspeech 2020. The final submission is a combination of three systems. System-1 is an autoencoder based approach which tries to reconstruct similar i-vectors, whereas System-2 and -3 are Convolutional Neural Network (CNN) based siamese architectures. The siamese networks have two and three branches, respectively, where each branch is a CNN encoder. The double-branch siamese performs binary classification using cross entropy loss during training. Whereas, our triple-branch siamese is trained to learn speaker embeddings using triplet loss. We provide results of our systems on VoxCeleb-1 test, VoxSRC-20 validation and test sets.
dc.description.sponsorship	This work was supported by the project PID2019-107579RBI00 / AEI / 10.13039/501100011033
dc.format.extent	3 p.
dc.language.iso	eng
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Spain
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.subject	Àrees temàtiques de la UPC::Enginyeria de la telecomunicació::Processament del senyal::Processament de la parla i del senyal acústic
dc.subject.lcsh	Automatic speech recognition
dc.title	The UPC speaker verification system submitted to VoxCeleb Speaker Recognition Challenge 2020 (VoxSRC-20)
dc.type	External research report
dc.subject.lemac	Reconeixement automàtic de la parla
dc.contributor.group	Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
dc.relation.publisherversion	https://arxiv.org/abs/2010.10937
dc.rights.access	Open Access
local.identifier.drac	29753633
dc.description.version	Preprint
local.citation.author	Khan, U.; Hernando, J.

Fitxers d'aquest items

Nom:: 2010.10937.pdf
Mida:: 508,8Kb
Format:: PDF

Visualitza/Obre

Aquest ítem apareix a les col·leccions següents

Mostra el registre d'ítem simple

UPCommons. Portal del coneixement obert de la UPC

The UPC speaker verification system submitted to VoxCeleb Speaker Recognition Challenge 2020 (VoxSRC-20)

Fitxers d'aquest items

Aquest ítem apareix a les col·leccions següents

Explora