On the use of agglomerative and spectral clustering in speaker diarization of meetings
Cite as: hdl:2117/18147
Document type: Conference proceedings paper
Publication date: 2012
Access conditions: Restricted access per publisher policy
All rights reserved. This work is protected by the corresponding intellectual and industrial
property rights. Without prejudice to existing legal exemptions, its reproduction, distribution,
public communication or transformation is prohibited without the authorization of the rights holder.
Abstract
In this paper, we present a clustering algorithm for speaker
diarization based on spectral clustering. State-of-the-art diarization
systems are based on agglomerative hierarchical clustering (AHC)
using the Bayesian Information Criterion (BIC) and other statistical
metrics between clusters, which results in a high computational cost
and a time-consuming approach. Our proposal avoids such metrics by
applying Euclidean distances to the eigenvectors computed from the
normalized graph Laplacian. A hybrid system is proposed in which
HMM/GMM modelling and Viterbi alignment are still applied, but the
BIC-based merging and stopping criteria are replaced by a spectral
clustering algorithm. Once an initial segmentation is obtained and
the cluster alignment is computed using the Viterbi algorithm, the
remaining clusters are modeled by stacking the means of their
Gaussians into a supervector. In this space, the singular value
decomposition of the associated normalized graph Laplacian is
computed. The most similar clusters are merged based on the Euclidean
distances in the resulting eigenspace. The number of clusters is
estimated by analyzing the eigenstructure of the similarity matrix
and selecting a threshold on the eigenvalue gap. In experiments, this
approach obtained performance comparable to the traditional AHC+BIC
approach on the Rich Transcription conference evaluation data.
Although it still relies on Gaussian modelling of clusters and
Viterbi alignment, the proposed approach leads to a system that runs
several times faster than the traditional one.
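
The spectral steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gaussian-kernel affinity, the `sigma` bandwidth, the row normalization of the embedding, and all function names are hypothetical choices introduced here. Because the symmetric normalized Laplacian is positive semidefinite, its eigendecomposition (computed with `numpy.linalg.eigh`) is equivalent to its singular value decomposition.

```python
import numpy as np


def gaussian_affinity(supervectors, sigma=1.0):
    """Similarity matrix between cluster supervectors (assumed Gaussian kernel)."""
    diffs = supervectors[:, None, :] - supervectors[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    affinity = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(affinity, 0.0)  # no self-loops in the similarity graph
    return affinity


def laplacian_eigen(affinity):
    """Eigenvalues (ascending) and eigenvectors of L = I - D^{-1/2} A D^{-1/2}."""
    degrees = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degrees, 1e-12))
    normalized = affinity * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    laplacian = np.eye(affinity.shape[0]) - normalized
    return np.linalg.eigh(laplacian)


def estimate_num_clusters(eigvals, gap_threshold):
    """Number of eigenvalues before the first gap that exceeds the threshold."""
    gaps = np.diff(eigvals)
    above = np.where(gaps > gap_threshold)[0]
    return int(above[0]) + 1 if above.size else len(eigvals)


def closest_pair(eigvecs, k):
    """Indices of the two clusters nearest in the k-dimensional eigenspace."""
    embedding = eigvecs[:, :k]
    # Assumption: rows are scaled to unit length before measuring distances.
    norms = np.linalg.norm(embedding, axis=1, keepdims=True)
    embedding = embedding / np.maximum(norms, 1e-12)
    dists = np.linalg.norm(embedding[:, None, :] - embedding[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a cluster cannot be merged with itself
    i, j = np.unravel_index(np.argmin(dists), dists.shape)
    return int(i), int(j)


# Toy supervectors: two well-separated groups of three clusters each.
toy = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.0],
    [10.0, 10.0, 10.0, 10.0],
    [10.1, 10.0, 10.0, 10.0],
    [10.0, 10.1, 10.0, 10.0],
])
eigvals, eigvecs = laplacian_eigen(gaussian_affinity(toy, sigma=1.0))
k = estimate_num_clusters(eigvals, gap_threshold=0.5)
merge_candidates = closest_pair(eigvecs, k)
```

In a diarization loop, the pair returned by `closest_pair` would be the merge candidate, and merging would stop once the number of remaining clusters reaches the estimate `k`. The value of `gap_threshold` is a tuning parameter; the 0.5 used here is arbitrary.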
Citation: Hernando, J. On the use of agglomerative and spectral clustering in speaker diarization of meetings. In: The Speaker and Language Recognition Workshop. "Odyssey 2012: The Speaker and Language Recognition Workshop". Singapore: 2012, p. 130-137.
Files | Description | Size | Format | View |
---|---|---|---|---|
Paper Odyssey 2012.pdf | Paper Odyssey 2012 | 1.116 MB | PDF | Restricted access |