Improving i-Vector and PLDA based speaker clustering with long-term features
Document type: Conference report
Publisher: International Speech Communication Association (ISCA)
Rights access: Restricted access - publisher's policy
European Commission's project: BISON - BIg Speech data analytics for cONtact centres (EC-H2020-645323)
i-vector modeling techniques have recently been used successfully for the speaker clustering task. In this work, we propose extracting i-vectors from both short- and long-term speech features and fusing their PLDA scores within a speaker diarization framework. Two sets of i-vectors are first extracted: one from short-term spectral features and one from long-term voice-quality, prosodic, and glottal-to-noise excitation ratio (GNE) features. The PLDA scores of these two i-vectors are then fused for the speaker clustering task. Experiments were carried out on the single- and multiple-site scenario test sets of the Augmented Multi-party Interaction (AMI) corpus. The results show that the i-vector-based PLDA speaker clustering technique yields a significant diarization error rate (DER) improvement over the GMM-based BIC clustering technique.
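The abstract does not specify how the two PLDA scores are combined or how clusters are formed from them. The following is a minimal sketch under stated assumptions: a weighted linear fusion of the short-term and long-term PLDA score matrices with a hypothetical weight `alpha`, followed by average-linkage agglomerative clustering that stops at a hypothetical `threshold`. The PLDA score matrices are taken as precomputed inputs, and the function names are illustrative only, not the authors' exact recipe.

```python
from typing import List

# Square, symmetric matrix of pairwise segment scores.
Matrix = List[List[float]]


def fuse_scores(short_term: Matrix, long_term: Matrix, alpha: float) -> Matrix:
    """Assumed linear score fusion: alpha * short-term + (1 - alpha) * long-term."""
    n = len(short_term)
    return [
        [alpha * short_term[i][j] + (1.0 - alpha) * long_term[i][j] for j in range(n)]
        for i in range(n)
    ]


def cluster(scores: Matrix, threshold: float) -> List[List[int]]:
    """Average-linkage agglomerative clustering on a similarity matrix.

    Starts with one cluster per segment and repeatedly merges the pair of
    clusters with the highest average cross-cluster score, stopping when
    that score falls below `threshold` (hypothetical stopping rule).
    """
    clusters = [[i] for i in range(len(scores))]
    while len(clusters) > 1:
        best_score, best_pair = None, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(scores[i][j] for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if best_score is None or avg > best_score:
                    best_score, best_pair = avg, (a, b)
        if best_score < threshold:
            break  # no remaining pair is similar enough to merge
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters
```

For example, with four segments where segments 0 and 1 come from one speaker and 2 and 3 from another, fusing toy score matrices with `alpha=0.7` and clustering at `threshold=0.0` should recover two clusters. The weight and threshold would in practice be tuned on development data.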
Citation: Woubie, A.; Luque, J.; Hernando, J. Improving i-Vector and PLDA based speaker clustering with long-term features. In: Annual Conference of the International Speech Communication Association. "INTERSPEECH 2016: September 8-12, 2016, San Francisco, USA". San Francisco, CA: International Speech Communication Association (ISCA), 2016, p. 372-376.