Clustering initialization based on spatial information for speaker diarization of meetings
Document type: Conference report
Rights access: Restricted access - publisher's policy
This paper proposes an initialization for an agglomerative clustering system applied to speaker diarization in the meeting environment. The initialization is based on a preliminary clustering of the temporal sequence produced by estimating the Time Delay of Arrival (TDOA) between pairs of sensors. This initial clustering aims to obtain initial classes that each contain speech from a single speaker. The goal is to ensure the purity of the initial segments based on the positions of the speakers in a meeting over time. The TDOA initialization was tested on the dataset used in the RT07s evaluation, where it improves the diarization error rate with respect to the classical uniform initialization. Most of the experiments show that the purity of the initial segments leads to a better clustering in the subsequent hierarchical strategy based on cepstral features.
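As an illustration of the idea summarized above, the following is a minimal sketch rather than the authors' implementation. The abstract does not say how the TDOA is estimated or which clustering algorithm groups the delays, so this sketch assumes GCC-PHAT for the delay estimate and a plain 1-D k-means for the grouping. The function names `gcc_phat_tdoa` and `kmeans_1d`, the frame length, and the synthetic two-speaker signal are all hypothetical choices made for the example.

```python
import numpy as np


def gcc_phat_tdoa(x, y, max_lag):
    """Estimate the delay (in samples) of y relative to x.

    Assumption: GCC-PHAT is used here only as a common TDOA estimator;
    the abstract does not name the estimator.
    """
    n = len(x) + len(y)                      # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)
    cross /= np.abs(cross) + 1e-12           # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    # Reorder so the array covers lags -max_lag .. +max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(cc)) - max_lag


def kmeans_1d(values, k, iters=20):
    """Plain 1-D k-means (an assumed stand-in for the paper's clustering)."""
    values = np.asarray(values, dtype=float)
    centers = np.quantile(values, np.linspace(0, 1, k))
    for _ in range(iters):
        # Assign each value to its nearest center, then recompute the centers.
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = values[labels == j].mean()
    return labels, centers


# Synthetic example: two fixed speaker positions give two distinct delays.
rng = np.random.default_rng(0)
true_delays = [3, 3, 3, -5, -5, 3, -5, -5]   # one hypothetical delay per frame
frame_len, pad = 1024, 16

tdoas = []
for d in true_delays:
    src = rng.standard_normal(frame_len + 2 * pad)
    mic1 = src[pad:pad + frame_len]
    mic2 = src[pad - d:pad - d + frame_len]  # mic2[t] == mic1[t - d]
    tdoas.append(gcc_phat_tdoa(mic1, mic2, max_lag=pad))

# Frames sharing a delay fall into the same initial class.
labels, centers = kmeans_1d(tdoas, k=2)
```

In this sketch each frame's label would serve as the initial class assignment passed to the agglomerative stage, in place of a uniform split of the recording. With more than one microphone pair, the per-pair delays would form a vector per frame, and the clustering would operate on those vectors instead of on scalars.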
Citation: Luque, J., Segura, C., Hernando, J. Clustering initialization based on spatial information for speaker diarization of meetings. In: Annual Conference of the International Speech Communication Association. "INTERSPEECH 2008". Brisbane: 2008, p. 383-386.