Detection and handling of overlapping speech for speaker diarization
Document type: Conference report
Rights access: Open Access
This thesis concerns the detection of overlapping speech segments and its further application to improving speaker diarization performance. We propose the use of three spatial cross-correlation-based parameters for overlap detection on distant microphone channel data. Spatial features from different microphone pairs are fused by means of principal component analysis or by an approach involving a multilayer perceptron. In addition, we investigate the possibility of employing long-term prosodic information. The most suitable subset of candidate prosodic features is determined by a two-step mRMR feature selection algorithm. For segments containing detected overlapping speech, the speaker diarization system assigns a second speaker label, and such segments are also excluded from model training. The proposed overlap labeling technique is integrated into the Viterbi decoding part of the diarization algorithm.
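As a rough illustration of the spatial-feature idea, the sketch below computes a cross-correlation between two microphone channels and fuses per-pair features with principal component analysis. The abstract does not name the three cross-correlation-based parameters, so the use of GCC-PHAT here is an assumption (it is a common choice for distant microphones), and the function names `gcc_phat` and `pca_fuse` are hypothetical, not taken from the paper.

```python
import numpy as np


def gcc_phat(x, y, max_lag):
    """Return GCC-PHAT values for lags -max_lag..max_lag (assumed feature basis)."""
    n = len(x) + len(y)                     # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12          # phase transform: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that index 0 corresponds to lag -max_lag.
    return np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))


def pca_fuse(features, n_components=1):
    """Project a (frames x microphone pairs) feature matrix onto its leading principal components."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T
```

With this sign convention, a peak at a negative lag means the second channel lags the first. A per-frame statistic of the correlation curve (for example its peak value) could serve as one feature per microphone pair before fusion; which statistics the authors actually use is not stated in the abstract.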
Citation: Zelenák, M.; Hernando, J. Detection and handling of overlapping speech for speaker diarization. In: Iberspeech. "IBERSPEECH 2012". Madrid: 2012, p. 460-469.