On the use of agglomerative and spectral clustering in speaker diarization of meetings

Hernando Pericás, Francisco Javier

dc.contributor.author	Hernando Pericás, Francisco Javier
dc.contributor.other	Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
dc.date.accessioned	2013-03-08T12:50:19Z
dc.date.created	2012
dc.date.issued	2012
dc.identifier.citation	Hernando, J. On the use of agglomerative and spectral clustering in speaker diarization of meetings. A: The Speaker and Language Recognition Workshop. "Odyssey 2012: The Speaker and Language Recognition Workshop". Singapur: 2012, p. 130-137.
dc.identifier.uri	http://hdl.handle.net/2117/18147
dc.description.abstract	In this paper, we present a clustering algorithm for speaker diarization based on spectral clustering. State-of-the-art diarization systems are based on agglomerative hierarchical clustering (AHC) using the Bayesian Information Criterion (BIC) and other statistical metrics among clusters, which results in a high computational cost and a time-demanding approach. Our proposal avoids the use of such metrics by applying Euclidean distances to the eigenvectors computed from the normalized graph Laplacian. A hybrid system is proposed in which HMM/GMM modelling and Viterbi alignment are still applied, but the BIC-based merging and stopping criteria are replaced by a spectral clustering algorithm. Once an initial segmentation is obtained and the cluster alignment is computed using the Viterbi algorithm, the remaining clusters are modeled by stacking the means of the Gaussians in a supervector. In this space, the singular value decomposition of the associated normalized graph Laplacian is computed. The most similar clusters are merged based on the Euclidean distances in the resulting eigenspace. Cluster number estimation is based on analyzing the eigenstructure of the similarity matrix by selecting a threshold on the eigenvalue gap. In experiments, this approach has obtained performance comparable to the traditional AHC+BIC approach on the Rich Transcription conference evaluation data. Although it still relies on Gaussian modelling of clusters and Viterbi alignment, the proposed approach leads to a system which runs several times faster than the traditional one.
dc.format.extent	8 p.
dc.language.iso	eng
dc.subject	UPC subject areas::Telecommunications engineering::Signal processing::Speech and acoustic signal processing
dc.subject.lcsh	Automatic speech recognition
dc.title	On the use of agglomerative and spectral clustering in speaker diarization of meetings
dc.type	Conference report
dc.subject.lemac	Reconeixement automàtic de la parla
dc.contributor.group	Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
dc.description.peerreviewed	Peer Reviewed
dc.rights.access	Restricted access - publisher's policy
local.identifier.drac	11052668
dc.description.version	Postprint (published version)
dc.date.lift	10000-01-01
local.citation.author	Hernando, J.
local.citation.contributor	The Speaker and Language Recognition Workshop
local.citation.pubplace	Singapur
local.citation.publicationName	Odyssey 2012: The Speaker and Language Recognition Workshop
local.citation.startingPage	130
local.citation.endingPage	137
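
The spectral step described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the Gaussian similarity kernel, its width `sigma`, the largest-gap rule (used here in place of the paper's threshold on the eigenvalue gap), and the function names `spectral_embedding` and `closest_pair` are all assumptions made for illustration. The input is taken to be one supervector per cluster, that is, the stacked Gaussian means.

```python
import numpy as np


def spectral_embedding(supervectors, sigma=1.0):
    """Embed cluster supervectors using the normalized graph Laplacian.

    Hypothetical helper: the Gaussian kernel and the largest-gap rule
    are illustrative assumptions, not details taken from the paper.
    Returns the estimated number of clusters k and the row-normalized
    embedding of shape (n_clusters, k).
    """
    X = np.asarray(supervectors, dtype=float)
    # Pairwise squared Euclidean distances between supervectors.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Gaussian similarity matrix with zero self-similarity.
    A = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(L)
    # Estimate k from the largest gap between consecutive eigenvalues.
    k = int(np.argmax(np.diff(eigvals))) + 1
    # Keep the k eigenvectors with the smallest eigenvalues, normalize rows.
    Y = eigvecs[:, :k]
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return k, Y


def closest_pair(Y):
    """Return the indices of the two rows closest in Euclidean distance."""
    dist = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # ignore self-distances
    i, j = np.unravel_index(np.argmin(dist), dist.shape)
    return int(i), int(j)


# Toy data: six 2-D "supervectors" forming two well-separated groups.
toy = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]])
k, Y = spectral_embedding(toy, sigma=1.0)
i, j = closest_pair(Y)
```

In a merging loop of the kind the abstract outlines, the pair returned by `closest_pair` would be the next candidate to merge, and merging would stop once the number of clusters reaches the estimated `k`.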

Files in this item

Name: Paper Odyssey 2012.pdf
Size: 1.116 MB
Format: PDF
Description: Paper Odyssey 2012

This item appears in the following collections

Conference papers/presentations [437]
Conference papers/presentations [3,327]

UPCommons. UPC open knowledge portal
