From features to speaker vectors by means of restricted Boltzmann machine adaptation
Document typeConference lecture
Rights accessOpen Access
Restricted Boltzmann Machines (RBMs) have shown success in different stages of speaker recognition systems. In this paper, we propose a novel framework to produce a vector-based representation for each speaker, which will be referred to as RBM-vector. This new approach maps the speaker spectral features to a single fixed-dimensional vector carrying speaker-specific information. In this work, a global model, referred to as Universal RBM (URBM), is trained taking advantage of RBM unsupervised learning capabilities. Then, this URBM is adapted to the data of each speaker in the development, enrolment and evaluation datasets. The network connection weights of the adapted RBMs are further concatenated and subject to a whitening with dimension reduction stage to build the speaker vectors. The evaluation is performed on the core test condition of the NIST SRE 2006 database, and it is shown that RBM-vectors achieve 15% relative improvement in terms of EER compared to i-vectors using cosine scoring. The score fusion with i-vector attains more than 24% relative improvement. The interest of this result for score fusion yields on the fact that both vectors are produced in an unsupervised fashion and can be used instead of i-vector/PLDA approach, when no data label is available. Results obtained for RBM-vector/PLDA framework is comparable with the ones from i-vector/PLDA. Their score fusion achieves 14% relative improvement compared to i-vector/PLDA.
CitationSafari, P., Ghahabi, O., Hernando, J. From features to speaker vectors by means of restricted Boltzmann machine adaptation. A: The Speaker and Language Recognition Workshop. "ODYSSEY 2016 - The Speaker and Language Recognition Workshop". Bilbao: 2016, p. 366-371.