LSTM neural network-based speaker segmentation using acoustic and language modelling

India Massana, Miquel Àngel; Rodríguez Fonollosa, José Adrián; Hernando Pericás, Francisco Javier

doi:10.21437/Interspeech.2017

dc.contributor.author	India Massana, Miquel Àngel
dc.contributor.author	Rodríguez Fonollosa, José Adrián
dc.contributor.author	Hernando Pericás, Francisco Javier
dc.contributor.other	Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
dc.date.accessioned	2018-01-19T13:10:29Z
dc.date.available	2018-01-19T13:10:29Z
dc.date.issued	2017
dc.identifier.citation	India, M., Fonollosa, José A. R., Hernando, J. LSTM neural network-based speaker segmentation using acoustic and language modelling. In: Annual Conference of the International Speech Communication Association. "INTERSPEECH 2017: 20-24 August 2017: Stockholm". Stockholm: International Speech Communication Association (ISCA), 2017, p. 2834-2838.
dc.identifier.isbn	1990-9772
dc.identifier.uri	http://hdl.handle.net/2117/112988
dc.description.abstract	This paper presents a new speaker change detection system based on Long Short-Term Memory (LSTM) neural networks using acoustic data and linguistic content. Language modelling is combined with two different Joint Factor Analysis (JFA) acoustic approaches: i-vectors and speaker factors. Both of them are compared with a baseline algorithm that uses cosine distance to detect speaker turn changes. LSTM neural networks with both linguistic and acoustic features have been able to produce a robust speaker segmentation. The experimental results show that our proposal clearly outperforms the baseline system.
dc.format.extent	5 p.
dc.language.iso	eng
dc.publisher	International Speech Communication Association (ISCA)
dc.subject	UPC subject areas::Telecommunications engineering::Signal processing::Speech and acoustic signal processing
dc.subject.lcsh	Automatic speech recognition
dc.subject.lcsh	Neural networks (Neurobiology)
dc.subject.other	Speaker segmentation
dc.subject.other	Neural language modelling
dc.subject.other	I-vectors
dc.subject.other	Speaker factors
dc.subject.other	LSTM neural networks
dc.title	LSTM neural network-based speaker segmentation using acoustic and language modelling
dc.type	Conference lecture
dc.subject.lemac	Reconeixement automàtic de la parla
dc.subject.lemac	Xarxes neuronals (Neurobiologia)
dc.contributor.group	Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
dc.identifier.doi	10.21437/Interspeech.2017
dc.description.peerreviewed	Peer Reviewed
dc.relation.publisherversion	http://www.isca-speech.org/archive/Interspeech_2017/pdfs/0407.PDF
dc.rights.access	Open Access
local.identifier.drac	21716191
dc.description.version	Postprint (published version)
dc.relation.projectid	info:eu-repo/grantAgreement/EC/H2020/115902/EU/Remote Assessment of Disease and Relapse in Central Nervous System Disorders/RADAR-CNS
local.citation.author	India, M.; Fonollosa, José A. R.; Hernando, J.
local.citation.contributor	Annual Conference of the International Speech Communication Association
local.citation.pubplace	Stockholm
local.citation.publicationName	INTERSPEECH 2017: 20-24 August 2017: Stockholm
local.citation.startingPage	2834
local.citation.endingPage	2838
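
The abstract mentions a baseline that uses cosine distance to detect speaker turn changes, but this record does not describe how that baseline is built. As a rough illustration only, the sketch below assumes that each speech segment is already represented by a fixed-length vector (for example an i-vector or a speaker-factor vector) and that a change is flagged whenever the cosine distance between consecutive vectors exceeds a threshold. The function names, the threshold value, and the toy vectors are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cos(a, b): 0 for the same direction, 1 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def detect_changes(
    embeddings: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[int]:
    """Return every index i whose vector is farther than `threshold`
    (hypothetical value) from the vector at index i - 1."""
    return [
        i
        for i in range(1, len(embeddings))
        if cosine_distance(embeddings[i - 1], embeddings[i]) > threshold
    ]


# Toy 2-D "speaker vectors" (made-up values, not data from the paper):
# the first two point one way, the last two point another way.
segments = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.2, 0.9)]
changes = detect_changes(segments)
print(changes)  # [2]
```

With these toy vectors the only flagged boundary is between the second and third segments. In a real system the threshold would have to be tuned on held-out data.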

Files in this item

Name: 0407.PDF
Size: 531.7 KB
Format: PDF

This item appears in the following collections

Conference papers/presentations [437]
Conference papers/presentations [3,327]

UPCommons. Open knowledge portal of the UPC
