Short- and long-term speech features for hybrid HMM-i-vector based speaker diarization system

Zewoudie, Abraham Woubie; Luque, Jordi; Hernando Pericás, Francisco Javier

doi:10.21437/Odyssey.2016-58

Document type: Conference paper

Publication date: 2016

Access conditions: Open access

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation is prohibited without the authorization of the rights holder.

Project: BISON - BIg Speech data analytics for cONtact centres (EC-H2020-645323)

Abstract

In recent years, i-vectors have been successfully applied to speaker recognition tasks. This work assesses the suitability of i-vector modeling within the framework of the speaker diarization task. In this context, a weighted cosine distance between two different sets of i-vectors is proposed for speaker clustering. Speech clusters generated by Viterbi segmentation are first modeled by two different i-vectors. Whilst the first i-vector represents the distribution of the commonly used short-term Mel Frequency Cepstral Coefficients, the second one represents a selection of voice quality and prosodic features. In order to combine both short- and long-term speech statistics, the cosine-distance scores of those two i-vectors are linearly weighted to obtain a single similarity score. The final fused score is then used as the speaker clustering distance. Our experimental results on two different evaluation sets of the Augmented Multi-party Interaction corpus show the suitability of combining both sources of information within the i-vector space. They also show that the i-vector based clustering technique provides a significant improvement, in terms of diarization error rate, over techniques based on Gaussian Mixture Modeling. Furthermore, this work reports a significant speaker error reduction when short-term based i-vector clustering is augmented with a second i-vector estimated from voice quality and prosody related speech features.
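
The linear fusion of the two cosine scores described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the weight `alpha`, and the convention that the two weights sum to one are assumptions made here for illustration only, since the abstract does not give the weight values or the i-vector dimensions.

```python
import numpy as np


def cosine_score(a, b):
    """Cosine similarity between two i-vectors given as 1-D sequences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def fused_score(short_a, short_b, long_a, long_b, alpha=0.5):
    """Linearly weighted combination of two cosine scores.

    short_a, short_b: i-vectors of two clusters from short-term (MFCC) features.
    long_a, long_b:   i-vectors of the same clusters from long-term
                      (voice quality and prosodic) features.
    alpha:            hypothetical weight of the short-term score; the
                      long-term score gets 1 - alpha (an assumption).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    short_term = cosine_score(short_a, short_b)
    long_term = cosine_score(long_a, long_b)
    return alpha * short_term + (1.0 - alpha) * long_term
```

With toy two-dimensional vectors, `fused_score([1, 0], [1, 0], [1, 0], [0, 1], alpha=0.75)` combines a short-term score of 1 with a long-term score of 0. A clustering step would then compare such fused scores across cluster pairs when deciding which clusters to merge.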

Citation: Zewoudie, A., Luque, J., Hernando, J. Short- and long-term speech features for hybrid HMM-i-vector based speaker diarization system. In: The Speaker and Language Recognition Workshop. "ODYSSEY 2016 - The Speaker and Language Recognition Workshop". Bilbao: 2016, p. 400-406.

URI: http://hdl.handle.net/2117/101681

DOI: 10.21437/Odyssey.2016-58

Publisher's version: http://www.isca-speech.org/archive/Odyssey_2016/pdfs/18.pdf

Files	Description	Size	Format
Odyssey Abraham.pdf	Paper	353.2 KB	PDF
