Jitter and shimmer voice quality features have been successfully
used to characterize speaker voice traits and detect voice pathologies.
Jitter and shimmer measure variations in the fundamental frequency
and amplitude of speaker's voice, respectively. Due to their nature, they can be used to assess differences between speakers. In this paper, we investigate the usefulness of these voice quality features in the task of speaker diarization. The combination of voice quality features with the conventional spectral features, Mel-Frequency Cepstral Coefficients (MFCC), is addressed in the framework of Augmented Multiparty Interaction (AMI) corpus, a multi-party and spontaneous speech set of recordings. Both sets of features are independently modeled using mixture of Gaussians and fused together at the score likelihood level. The experiments carried out on the AMI corpus show that incorporating jitter and shimmer measurements to the baseline spectral features decreases the diarization error rate in most of the recordings.
CitacióZewoudie, A.; Jordi Luque; Hernando, J. Jitter and Shimmer measurements for speaker diarization. A: Jornadas en Tecnología del Habla and III Iberian SLTech Workshop. "VII Jornadas en Tecnología del Habla and III Iberian SLTech Workshop: proceedings: November 19-21, 2014: Escuela de Ingeniería en Telecomunicación y Electrónica Universidad de Las Palmas de Gran Canaria: Las Palmas de Gran Canaria, Spain". Las Palmas de Gran Canaria: 2014, p. 21-30.