Restricted Boltzmann Machine vectors for speaker clustering
Document type: Conference lecture
Publisher: International Speech Communication Association (ISCA)
Rights access: Open Access
Restricted Boltzmann Machines (RBMs) have been used in both the front-end and the back-end of speaker verification systems. In this work, we apply RBMs as a front-end for speaker clustering. Speakers' utterances are transformed into a vector representation by means of RBMs. These vectors, referred to as RBM vectors, have been shown to preserve speaker-specific information and are used here for the task of speaker clustering, which we perform with traditional bottom-up Agglomerative Hierarchical Clustering (AHC). Using the RBM vector representation of speakers improves clustering performance. The evaluation was performed on audio recordings of Catalan TV broadcast shows. The experimental results show that the proposed system outperforms the baseline i-vector system in terms of Equal Impurity (EI). Using cosine scoring, relative improvements of 11% and 12% are achieved for the average and single linkage clustering algorithms, respectively. Using PLDA scoring, the RBM vectors achieve a relative improvement of 11% over i-vectors for the single linkage algorithm.
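As an illustration of the clustering stage described in the abstract, the following is a minimal sketch of bottom-up AHC with cosine scoring and either average or single linkage. It is not the authors' implementation. The function names, the similarity-threshold stopping rule (`stop_threshold`), and the toy 4-dimensional vectors standing in for RBM vectors are all assumptions made for the example; RBM vector extraction and PLDA scoring are not shown.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine score between two utterance-level vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def linkage_score(cluster_a, cluster_b, scores, method):
    """Similarity between two clusters under the chosen linkage rule."""
    pair_scores = [scores[i][j] for i in cluster_a for j in cluster_b]
    if method == "single":
        # Single linkage: the closest pair, i.e. the highest similarity.
        return max(pair_scores)
    if method == "average":
        # Average linkage: mean similarity over all cross-cluster pairs.
        return sum(pair_scores) / len(pair_scores)
    raise ValueError(f"unknown linkage method: {method}")


def agglomerative_clustering(vectors, method="average", stop_threshold=0.5):
    """Merge the two most similar clusters until no pair reaches the threshold.

    The threshold-based stopping rule is an assumption of this sketch.
    """
    n = len(vectors)
    scores = [[cosine_similarity(vectors[i], vectors[j]) for j in range(n)]
              for i in range(n)]
    clusters = [[i] for i in range(n)]  # start with one cluster per utterance
    while len(clusters) > 1:
        best_score, (a, b) = max(
            (linkage_score(clusters[a], clusters[b], scores, method), (a, b))
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if best_score < stop_threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical toy data: two synthetic "speakers", five utterances each.
rng = random.Random(0)
centers = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
toy_vectors = [[c + rng.gauss(0.0, 0.05) for c in center]
               for center in centers for _ in range(5)]

clusters_average = agglomerative_clustering(toy_vectors, method="average")
clusters_single = agglomerative_clustering(toy_vectors, method="single")
```

On real data, the threshold would typically be swept to trade off cluster impurity against speaker impurity, which is the kind of operating-point analysis a metric such as EI summarizes.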
Citation: Khan, U.; Safari, P.; Hernando, J. Restricted Boltzmann Machine vectors for speaker clustering. A: International Conference on Advances in Speech and Language Technologies for Iberian Languages. "IberSPEECH 2018: program and proceedings: 21-23 November 2018: Barcelona, Spain". Baixas: International Speech Communication Association (ISCA), 2018, p. 10-14.