Simultaneous speech detection with spatial features for speaker diarization

Zelenak, Martin; Segura Perales, Carlos; Luque, Jordi; Hernando Pericás, Francisco Javier

doi:10.1109/TASL.2011.2160167

Document type: Article

Publication date: 2012-02

Access conditions: Restricted access per publisher policy

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication, or transformation is prohibited without the authorization of the rights holder.

Abstract

Simultaneous speech poses a challenging problem for conventional speaker diarization systems. In meeting data, a substantial amount of missed speech error is due to speaker overlaps, since usually only one speaker label per segment is assigned. Furthermore, simultaneous speech included in training data can lead to corrupt speaker models and thus worse segmentation performance. In this paper, we propose the use of three spatial cross-correlation-based features together with spectral information for speaker overlap detection on distant microphones. Different microphone-pair data are fused by means of principal component analysis. We have obtained an improvement of the speaker diarization system over the baseline by discarding overlap segments from model training and assigning two speaker labels to them according to likelihoods in Viterbi decoding. In experiments conducted on the AMI Meeting corpus, we achieve a relative DER reduction of 11.2% and 17.0% for single- and multi-site data, respectively. The improvement of clustering with techniques such as beamforming and TDOA-feature stream also leads to a higher effectiveness of the overlap labeling algorithm. Preliminary experiments with NIST RT data show DER improvement on the RT'09 meeting recordings as well.
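As a rough illustration of the pipeline the abstract outlines (cross-correlation-based features per microphone pair, fused across pairs with principal component analysis), the sketch below may help. It is not the authors' implementation: the abstract does not name the three spatial features, so the GCC-PHAT peak value is used here purely as an assumed stand-in, and the function names `gcc_phat`, `peak_per_frame`, and `pca_fuse` are hypothetical.

```python
import numpy as np


def gcc_phat(x, y, max_lag):
    """GCC-PHAT cross-correlation of x and y for lags -max_lag..max_lag.

    Index i of the result corresponds to lag i - max_lag; a positive peak
    lag means y is delayed relative to x. Assumes max_lag >= 1.
    """
    n = len(x) + len(y)                  # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = np.conj(X) * Y               # cross-power spectrum
    cross /= np.abs(cross) + 1e-12       # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    return np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))


def peak_per_frame(x, y, frame_len, hop, max_lag):
    """Assumed illustrative feature: the GCC-PHAT peak value in each frame."""
    starts = range(0, len(x) - frame_len + 1, hop)
    return np.array([
        gcc_phat(x[s:s + frame_len], y[s:s + frame_len], max_lag).max()
        for s in starts
    ])


def pca_fuse(features, n_components=1):
    """Project a (frames x microphone_pairs) matrix onto its top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T
```

In this sketch, one column of per-frame features would be computed for each microphone pair, stacked into a matrix, and passed to `pca_fuse` to obtain a lower-dimensional stream that could be combined with spectral features for an overlap detector.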

Citation: Zelenak, M. [et al.]. Simultaneous speech detection with spatial features for speaker diarization. "IEEE transactions on audio speech and language processing", February 2012, vol. 20, no. 2, p. 436-446.

URI: http://hdl.handle.net/2117/15864

DOI: 10.1109/TASL.2011.2160167

ISSN: 1558-7916

Publisher version: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6136544&tag=1
