
dc.contributor.author: India Massana, Miquel Àngel
dc.contributor.author: Rodríguez Fonollosa, José Adrián
dc.contributor.author: Hernando Pericás, Francisco Javier
dc.contributor.other: Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
dc.date.accessioned: 2018-01-19T13:10:29Z
dc.date.available: 2018-01-19T13:10:29Z
dc.date.issued: 2017
dc.identifier.citation: India, M., Fonollosa, José A. R., Hernando, J. LSTM neural network-based speaker segmentation using acoustic and language modelling. In: Annual Conference of the International Speech Communication Association. "INTERSPEECH 2017: 20-24 August 2017: Stockholm". Stockholm: International Speech Communication Association (ISCA), 2017, pp. 2834-2838.
dc.identifier.issn: 1990-9772
dc.identifier.uri: http://hdl.handle.net/2117/112988
dc.description.abstract: This paper presents a new speaker change detection system based on Long Short-Term Memory (LSTM) neural networks that uses both acoustic data and linguistic content. Language modelling is combined with two different Joint Factor Analysis (JFA) acoustic approaches: i-vectors and speaker factors. Both approaches are compared with a baseline algorithm that uses cosine distance to detect speaker turn changes. LSTM neural networks fed with both linguistic and acoustic features produce robust speaker segmentation. The experimental results show that our proposal clearly outperforms the baseline system.
dc.format.extent: 5 p.
dc.language.iso: eng
dc.publisher: International Speech Communication Association (ISCA)
dc.subject: UPC subject areas::Telecommunication engineering::Signal processing::Speech and acoustic signal processing
dc.subject.lcsh: Automatic speech recognition
dc.subject.lcsh: Neural networks (Neurobiology)
dc.subject.other: Speaker segmentation
dc.subject.other: Neural language modelling
dc.subject.other: I-vectors
dc.subject.other: Speaker factors
dc.subject.other: LSTM neural networks
dc.title: LSTM neural network-based speaker segmentation using acoustic and language modelling
dc.type: Conference lecture
dc.subject.lemac: Automatic speech recognition
dc.subject.lemac: Neural networks (Neurobiology)
dc.contributor.group: Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
dc.identifier.doi: 10.21437/Interspeech.2017
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: http://www.isca-speech.org/archive/Interspeech_2017/pdfs/0407.PDF
dc.rights.access: Open Access
drac.iddocument: 21716191
dc.description.version: Postprint (published version)
upcommons.citation.author: India, M.; Fonollosa, José A. R.; Hernando, J.
upcommons.citation.contributor: Annual Conference of the International Speech Communication Association
upcommons.citation.pubplace: Stockholm
upcommons.citation.published: true
upcommons.citation.publicationName: INTERSPEECH 2017: 20-24 August 2017: Stockholm
upcommons.citation.startingPage: 2834
upcommons.citation.endingPage: 2838



All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder.