Show simple item record

dc.contributor.author: Khan, Umair
dc.contributor.author: Hernando Pericás, Francisco Javier
dc.contributor.other: Universitat Politècnica de Catalunya. Doctorat en Teoria del Senyal i Comunicacions
dc.contributor.other: Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
dc.date.accessioned: 2020-11-12T17:06:10Z
dc.date.available: 2020-11-12T17:06:10Z
dc.date.issued: 2020
dc.identifier.citation: Khan, U.; Hernando, J. Unsupervised training of siamese networks for speaker verification. In: Annual Conference of the International Speech Communication Association. "Interspeech 2020: the 20th Annual Conference of the International Speech Communication Association: 25-29 October 2020: Shanghai, China". Baixas: International Speech Communication Association (ISCA), 2020, p. 3002-3006. ISBN 1990-9772. DOI 10.21437/Interspeech.2020-1882.
dc.identifier.isbn: 1990-9772
dc.identifier.uri: http://hdl.handle.net/2117/332092
dc.description.abstract: Speaker-labeled background data is an essential requirement for most state-of-the-art approaches in speaker recognition, e.g., x-vectors and i-vector/PLDA. In reality, however, it is difficult to access large amounts of labeled data. In this work, we propose siamese networks for speaker verification without using speaker labels. We propose two different siamese networks, with two and three branches respectively, where each branch is a CNN encoder. Since the goal is to avoid speaker labels, we generate the training pairs in an unsupervised manner. The client samples are selected within one database according to the highest cosine scores with the anchor in i-vector space. The impostor samples are selected in the same way, but from another database. Our double-branch siamese network performs binary classification using cross-entropy loss during training; in the testing phase, we obtain speaker verification scores directly from its output layer. Our triple-branch siamese network, in contrast, is trained to learn speaker embeddings using a triplet loss. During testing, we extract speaker embeddings from its output layer, which are scored in the experiments using cosine scoring. The evaluation is performed on the VoxCeleb-1 database and shows that, using the proposed unsupervised systems, solely or in fusion, the results get closer to the supervised baseline.
dc.description.sponsorship: This work has been developed in the framework of the DeepVoice Project (TEC2015-69266-P), funded by the Spanish Ministry
dc.format.extent: 5 p.
dc.language.iso: eng
dc.publisher: International Speech Communication Association (ISCA)
dc.subject: Àrees temàtiques de la UPC::Enginyeria de la telecomunicació::Processament del senyal::Processament de la parla i del senyal acústic
dc.subject.lcsh: Automatic speech recognition
dc.subject.other: i-vector
dc.subject.other: Impostor selection
dc.subject.other: CNN
dc.subject.other: Triplet loss
dc.title: Unsupervised training of siamese networks for speaker verification
dc.type: Conference report
dc.subject.lemac: Reconeixement automàtic de la parla
dc.contributor.group: Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
dc.identifier.doi: 10.21437/Interspeech.2020-1882
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: http://dx.doi.org/10.21437/Interspeech.2020-1882
dc.rights.access: Open Access
local.identifier.drac: 29753620
dc.description.version: Postprint (published version)
dc.relation.projectid: info:eu-repo/grantAgreement/MINECO//TEC2015-69266-P/ES/TECNOLOGIAS DE APRENDIZAJE PROFUNDO APLICADAS AL PROCESADO DE VOZ Y AUDIO/
local.citation.author: Khan, U.; Hernando, J.
local.citation.contributor: Annual Conference of the International Speech Communication Association
local.citation.pubplace: Baixas
local.citation.publicationName: Interspeech 2020: the 20th Annual Conference of the International Speech Communication Association: 25-29 October 2020: Shanghai, China
local.citation.startingPage: 3002
local.citation.endingPage: 3006
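The abstract above outlines two concrete steps: unsupervised pair selection (the client is the highest-cosine sample from the same database as the anchor, the impostor the highest-cosine sample from another database) and a triplet loss for the triple-branch network. A minimal stdlib-only sketch of those two steps is shown below; the function names and the margin value of 0.2 are illustrative assumptions, not taken from the record, and real i-vectors would replace the toy lists.

```python
import math

def cosine(a, b):
    """Cosine score between two vectors (plain cosine similarity)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def select_pairs(anchor, same_db, other_db):
    """Unsupervised pair selection as described in the abstract:
    the client is the same-database i-vector with the highest cosine
    score to the anchor; the impostor is chosen the same way, but
    from another database."""
    client = max(same_db, key=lambda v: cosine(anchor, v))
    impostor = max(other_db, key=lambda v: cosine(anchor, v))
    return client, impostor

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Cosine-based triplet loss; the margin value is an assumption.
    Pushes the anchor-positive score above the anchor-negative score
    by at least the margin."""
    return max(0.0, cosine(anchor, negative) - cosine(anchor, positive) + margin)
```

For example, with an anchor `[1.0, 0.0]`, a same-database pool `[[1.0, 0.1], [0.0, 1.0]]` and an other-database pool `[[0.9, 0.5], [-1.0, 0.0]]`, `select_pairs` returns `[1.0, 0.1]` as the client and `[0.9, 0.5]` as the impostor, since those score highest against the anchor in their respective pools.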


