DNN speaker embeddings using autoencoder pre-training

Khan, Umair; Hernando Pericás, Francisco Javier

doi:10.23919/EUSIPCO.2019.8902945

Document type: Conference paper

Publication date: 2019

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Access conditions: Restricted access due to publisher policy

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation without the authorization of the rights holder is prohibited.

Project: TECNOLOGIAS DE APRENDIZAJE PROFUNDO APLICADAS AL PROCESADO DE VOZ Y AUDIO (MINECO-TEC2015-69266-P)

Abstract

Over the last years, i-vectors have been the state-of-the-art approach in speaker recognition. Recent improvements in deep learning have increased the discriminative quality of i-vectors. However, deep learning architectures require a large amount of labeled background data, which is difficult to obtain in practice. The aim of this paper is to propose an alternative scheme that reduces the need for labeled data. We propose the use of autoencoder pre-training in a speaker verification task. First, we train an autoencoder in an unsupervised way, using a large amount of unlabeled background data. Then, we train a Deep Neural Network (DNN) initialized with the parameters of the pre-trained autoencoder. The DNN training is carried out in a supervised way using a relatively small amount of labeled background data. In the testing phase, we extract speaker embeddings as the output of an intermediate layer of the DNN. Training and evaluation were performed on the VoxCeleb-2 and VoxCeleb1 databases, respectively. The experimental results show that, by initializing the DNN with the parameters of the pre-trained autoencoder, we achieve a relative improvement of 21%, in terms of Equal Error Rate (EER), over the baseline i-vector/PLDA system.
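The three steps named in the abstract (unsupervised autoencoder training, supervised DNN training initialized from the autoencoder, and embedding extraction from an intermediate layer) can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the abstract does not give the architecture, input features, loss functions or hyperparameters, so the single tanh hidden layer, the synthetic data and every name below (`pretrain_autoencoder`, `train_classifier`, `extract_embedding`) are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    # Small random weights and zero biases (an arbitrary choice for this sketch).
    return rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)

def pretrain_autoencoder(X, n_hidden, epochs=200, lr=0.02):
    """Unsupervised step: one tanh hidden layer trained to reconstruct X."""
    n, d = X.shape
    W1, b1 = init_layer(d, n_hidden)   # encoder
    W2, b2 = init_layer(n_hidden, d)   # linear decoder
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        R = H @ W2 + b2
        dR = (R - X) / n               # gradient of 0.5 * squared error, averaged over samples
        dH = (dR @ W2.T) * (1.0 - H ** 2)
        W2 -= lr * H.T @ dR
        b2 -= lr * dR.sum(axis=0)
        W1 -= lr * X.T @ dH
        b1 -= lr * dH.sum(axis=0)
    return W1, b1                      # only the encoder is reused

def train_classifier(X, y, W1, b1, n_classes, epochs=200, lr=0.05):
    """Supervised step: the hidden layer starts from the pre-trained encoder."""
    n = X.shape[0]
    W1, b1 = W1.copy(), b1.copy()
    Wo, bo = init_layer(W1.shape[1], n_classes)   # new speaker-classification layer
    Y = np.eye(n_classes)[y]                      # one-hot speaker labels
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        Z = H @ Wo + bo
        Z -= Z.max(axis=1, keepdims=True)         # numerically stable softmax
        P = np.exp(Z)
        P /= P.sum(axis=1, keepdims=True)
        dZ = (P - Y) / n                          # gradient of mean cross-entropy
        dH = (dZ @ Wo.T) * (1.0 - H ** 2)
        Wo -= lr * H.T @ dZ
        bo -= lr * dZ.sum(axis=0)
        W1 -= lr * X.T @ dH
        b1 -= lr * dH.sum(axis=0)
    return W1, b1, Wo, bo

def extract_embedding(X, W1, b1):
    """Test-time step: the hidden-layer output serves as the embedding."""
    return np.tanh(X @ W1 + b1)

# Synthetic stand-in data: 4 hypothetical speakers, 20-dimensional input vectors.
n_speakers, dim = 4, 20
means = rng.normal(0.0, 1.0, (n_speakers, dim))

def sample(n_per_speaker):
    y = np.repeat(np.arange(n_speakers), n_per_speaker)
    return means[y] + 0.3 * rng.normal(size=(y.size, dim)), y

X_unlabeled, _ = sample(150)   # large pool, labels discarded
X_labeled, y_labeled = sample(15)   # small labeled set

W1, b1 = pretrain_autoencoder(X_unlabeled, n_hidden=8)
W1, b1, Wo, bo = train_classifier(X_labeled, y_labeled, W1, b1, n_speakers)
emb = extract_embedding(X_labeled, W1, b1)
print(emb.shape)  # (60, 8)
```

In this sketch only the encoder weights carry over from the unsupervised stage; the decoder is discarded and the classification layer is initialized from scratch, which is one common way to realize the initialization described in the abstract.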

Citation: Khan, U.; Hernando, J. DNN speaker embeddings using autoencoder pre-training. In: European Signal Processing Conference. "27th EUSIPCO 2019 European Signal Processing Conference: A Coruña, Spain: September 2-6, 2019". Institute of Electrical and Electronics Engineers (IEEE), 2019, p. 1-5.

URI: http://hdl.handle.net/2117/175406

DOI: 10.23919/EUSIPCO.2019.8902945

ISBN: 978-1-5386-7300-3

Publisher's version: https://ieeexplore.ieee.org/document/8902945
