LSTM neural network-based speaker segmentation using acoustic and language modelling
dc.contributor.author | India Massana, Miquel Àngel |
dc.contributor.author | Rodríguez Fonollosa, José Adrián |
dc.contributor.author | Hernando Pericás, Francisco Javier |
dc.contributor.other | Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions |
dc.date.accessioned | 2018-01-19T13:10:29Z |
dc.date.available | 2018-01-19T13:10:29Z |
dc.date.issued | 2017 |
dc.identifier.citation | India, M., Fonollosa, José A. R., Hernando, J. LSTM neural network-based speaker segmentation using acoustic and language modelling. A: Annual Conference of the International Speech Communication Association. "INTERSPEECH 2017: 20-24 August 2017: Stockholm". Stockholm: International Speech Communication Association (ISCA), 2017, p. 2834-2838. |
dc.identifier.issn | 1990-9772 |
dc.identifier.uri | http://hdl.handle.net/2117/112988 |
dc.description.abstract | This paper presents a new speaker change detection system based on Long Short-Term Memory (LSTM) neural networks that uses both acoustic data and linguistic content. Language modelling is combined with two different Joint Factor Analysis (JFA) acoustic approaches: i-vectors and speaker factors. Both are compared with a baseline algorithm that uses cosine distance to detect speaker turn changes. LSTM neural networks using both linguistic and acoustic features were able to produce robust speaker segmentation. The experimental results show that our proposal clearly outperforms the baseline system. |
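The abstract describes the baseline only as using cosine distance to detect speaker turn changes. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes that one embedding vector (for example an i-vector or a speaker-factor vector) has already been extracted per analysis window. The function names, the threshold value of 0.5 and the local-maximum peak-picking rule are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity between two vectors (0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Degenerate (all-zero) vector: treat as maximally dissimilar.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def detect_changes(embeddings: List[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Hypothetical baseline: return indices i where a speaker change is
    hypothesised between window i-1 and window i.

    A change is flagged when the distance between consecutive window
    embeddings exceeds `threshold` (an assumed value) and is a local maximum.
    """
    distances = [
        cosine_distance(embeddings[i - 1], embeddings[i])
        for i in range(1, len(embeddings))
    ]
    changes = []
    for k, d in enumerate(distances):
        left = distances[k - 1] if k > 0 else float("-inf")
        right = distances[k + 1] if k + 1 < len(distances) else float("-inf")
        if d > threshold and d >= left and d >= right:
            # distances[k] compares windows k and k+1, so the change starts at k+1.
            changes.append(k + 1)
    return changes


if __name__ == "__main__":
    # Toy 2-D "embeddings": two windows from one speaker, then two from another.
    windows = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(detect_changes(windows))  # [2]
```

The local-maximum check keeps a single boundary when several neighbouring window pairs straddle the same turn. According to the abstract, the proposed system instead uses LSTM networks that combine acoustic and linguistic features.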
dc.format.extent | 5 p. |
dc.language.iso | eng |
dc.publisher | International Speech Communication Association (ISCA) |
dc.subject | UPC subject areas::Telecommunication engineering::Signal processing::Speech and acoustic signal processing |
dc.subject.lcsh | Automatic speech recognition |
dc.subject.lcsh | Neural networks (Neurobiology) |
dc.subject.other | Speaker segmentation |
dc.subject.other | Neural language modelling |
dc.subject.other | I-vectors |
dc.subject.other | Speaker factors |
dc.subject.other | LSTM neural networks |
dc.title | LSTM neural network-based speaker segmentation using acoustic and language modelling |
dc.type | Conference lecture |
dc.subject.lemac | Automatic speech recognition |
dc.subject.lemac | Neural networks (Neurobiology) |
dc.contributor.group | Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla |
dc.identifier.doi | 10.21437/Interspeech.2017 |
dc.description.peerreviewed | Peer Reviewed |
dc.relation.publisherversion | http://www.isca-speech.org/archive/Interspeech_2017/pdfs/0407.PDF |
dc.rights.access | Open Access |
local.identifier.drac | 21716191 |
dc.description.version | Postprint (published version) |
dc.relation.projectid | info:eu-repo/grantAgreement/EC/H2020/115902/EU/Remote Assessment of Disease and Relapse in Central Nervous System Disorders/RADAR-CNS |
local.citation.author | India, M.; Fonollosa, José A. R.; Hernando, J. |
local.citation.contributor | Annual Conference of the International Speech Communication Association |
local.citation.pubplace | Stockholm |
local.citation.publicationName | INTERSPEECH 2017: 20-24 August 2017: Stockholm |
local.citation.startingPage | 2834 |
local.citation.endingPage | 2838 |