Restricted Boltzmann machines for vector representation of speech in speaker recognition

Ghahabi Esfahani, Omid; Hernando Pericás, Francisco Javier

doi:10.1016/j.csl.2017.06.007


Document type: Article

Publication date: 2018-01

Publisher: Elsevier

Access conditions: Open access

Attribution-NonCommercial-NoDerivs 3.0 Spain

Except where otherwise noted, the contents of this work are subject to the Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain

Abstract

Over the last few years, i-vectors have been the state-of-the-art technique in speaker recognition. Recent advances in Deep Learning (DL) technology have improved the quality of i-vectors, but the DL techniques in use are computationally expensive and need phonetically labeled background data. The aim of this work is to develop an efficient alternative vector representation of speech that keeps the computational cost as low as possible and avoids phonetic labels, which are not always accessible. The proposed vectors are based on both Gaussian Mixture Models (GMM) and Restricted Boltzmann Machines (RBM) and are referred to as GMM–RBM vectors. The role of the RBM is to learn the total speaker and session variability among background GMM supervectors. This RBM, referred to as the Universal RBM (URBM), is then used to transform unseen supervectors into the proposed low-dimensional vectors. The use of different activation functions for training the URBM and of different transformation functions for extracting the proposed vectors is investigated. Finally, a variant of Rectified Linear Units (ReLU), referred to as variable ReLU (VReLU), is proposed. Experiments on the core test condition 5 of NIST SRE 2010 show that results comparable to those of conventional i-vectors are achieved with a clearly lower computational load in the vector extraction process.
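The pipeline described in the abstract (train an RBM on background supervectors, then use it to map unseen supervectors to low-dimensional vectors) can be illustrated with a minimal NumPy sketch. The abstract does not specify the training procedure, the unit types, or the transformation function, so everything here is an assumption: Gaussian visible units with unit variance, sigmoid hidden units, one-step contrastive divergence (CD-1), and a plain linear transform for extraction. The function names `train_rbm` and `extract_vectors` are hypothetical, the data are random stand-ins for GMM supervectors, and the paper's VReLU is not reproduced because the abstract does not define it.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_rbm(data, n_hidden, epochs=20, lr=0.01, batch_size=32, seed=0):
    """Train a Gaussian-Bernoulli RBM with one-step contrastive divergence.

    data: (n_samples, n_visible) array, assumed mean/variance normalized.
    Returns weights W (n_visible, n_hidden), visible bias a, hidden bias b.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_visible = data.shape
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    a = np.zeros(n_visible)
    b = np.zeros(n_hidden)
    for _ in range(epochs):
        order = rng.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            v0 = data[order[start:start + batch_size]]
            # Positive phase: hidden probabilities given the data.
            h0_prob = sigmoid(v0 @ W + b)
            h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
            # Negative phase: mean reconstruction of the Gaussian visibles,
            # then hidden probabilities given that reconstruction.
            v1 = h0 @ W.T + a
            h1_prob = sigmoid(v1 @ W + b)
            # CD-1 parameter updates, averaged over the mini-batch.
            n = v0.shape[0]
            W += lr * (v0.T @ h0_prob - v1.T @ h1_prob) / n
            a += lr * (v0 - v1).mean(axis=0)
            b += lr * (h0_prob - h1_prob).mean(axis=0)
    return W, a, b


def extract_vectors(supervectors, W, b):
    """Map supervectors to low-dimensional vectors (assumed linear transform)."""
    return supervectors @ W + b


# Toy stand-in for background GMM supervectors (random data, not real speech).
rng = np.random.default_rng(1)
background = rng.standard_normal((200, 64))
background = (background - background.mean(axis=0)) / background.std(axis=0)

W, a, b = train_rbm(background, n_hidden=8)

unseen = rng.standard_normal((5, 64))
vectors = extract_vectors(unseen, W, b)
print(vectors.shape)  # (5, 8)
```

In this sketch the extraction step is a single matrix product per utterance, which is one plausible reading of why such a transform could be cheap. The actual cost comparison with i-vectors is reported in the paper, not derived here.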

Citation: Ghahabi, O., Hernando, J. Restricted Boltzmann machines for vector representation of speech in speaker recognition. "Computer speech and language", January 2018, vol. 47, p. 16-29.

URI: http://hdl.handle.net/2117/106743

DOI: 10.1016/j.csl.2017.06.007

ISSN: 0885-2308

Publisher's version: http://www.sciencedirect.com/science/article/pii/S0885230816302923?via%3Dihub

