LSTM neural network-based speaker segmentation using acoustic and language modelling
Document typeConference lecture
PublisherInternational Speech Communication Association (ISCA)
Rights accessOpen Access
This paper presents a new speaker change detection system based on Long Short-Term Memory (LSTM) neural networks using acoustic data and linguistic content. Language modelling is combined with two different Joint Factor Analysis (JFA) acoustic approaches: i-vectors and speaker factors. Both of them are compared with a baseline algorithm that uses cosine distance to detect speaker turn changes. LSTM neural networks with both linguistic and acoustic features have been able to produce a robust speaker segmentation. The experimental results show that our proposal clearly outperforms the baseline system.
CitationIndia, M., Fonollosa, José A. R., Hernando, J. LSTM neural network-based speaker segmentation using acoustic and language modelling. A: Annual Conference of the International Speech Communication Association. "INTERSPEECH 2017: 20-24 August 2017: Stockholm". Stockholm: International Speech Communication Association (ISCA), 2017, p. 2834-2838.