Language modelling for speaker diarization in telephonic interviews
dc.contributor.author | India Massana, Miquel Àngel |
dc.contributor.author | Hernando Pericás, Francisco Javier |
dc.contributor.author | Rodríguez Fonollosa, José Adrián |
dc.contributor.other | Universitat Politècnica de Catalunya. Doctorat en Teoria del Senyal i Comunicacions |
dc.contributor.other | Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions |
dc.date.accessioned | 2022-10-06T10:54:06Z |
dc.date.available | 2022-10-06T10:54:06Z |
dc.date.issued | 2023-03 |
dc.identifier.citation | India, M.; Hernando, J.; Fonollosa, J.A.R. Language modelling for speaker diarization in telephonic interviews. "Computer speech and language", March 2023, vol. 78, article 101441, p. 1-12. |
dc.identifier.issn | 0885-2308 |
dc.identifier.uri | http://hdl.handle.net/2117/374077 |
dc.description.abstract | The aim of this paper is to investigate the benefit of combining both language and acoustic modelling for speaker diarization. Although conventional systems only use acoustic features, in some scenarios linguistic data contain highly discriminative speaker information, even more reliable than the acoustic data. In this study we analyze how an appropriate fusion of both kinds of features is able to obtain good results in these cases. The proposed system is based on an iterative algorithm where an LSTM network is used as a speaker classifier. The network is fed with character-level word embeddings and a GMM-based acoustic score created with the output labels from previous iterations. The presented algorithm has been evaluated on a Call-Center database, which is composed of telephone interview audios. The combination of acoustic features and linguistic content shows an 84.29% improvement in terms of word-level DER as compared to an HMM/VB baseline system. The results of this study confirm that linguistic content can be efficiently used for some speaker recognition tasks. |
dc.description.sponsorship | This work was partially supported by the Spanish Project DeepVoice (TEC2015-69266-P) and by the project PID2019-107579RB-I00/AEI/10.13039/501100011033. |
dc.format.extent | 12 p. |
dc.language.iso | eng |
dc.publisher | Elsevier |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
dc.subject | UPC subject areas::Telecommunications engineering::Signal processing::Speech and acoustic signal processing |
dc.subject.lcsh | Speech processing systems |
dc.subject.lcsh | Neural networks (Computer science) |
dc.subject.other | Speaker diarization |
dc.subject.other | Language modelling |
dc.subject.other | Acoustic modelling |
dc.subject.other | LSTM neural networks |
dc.title | Language modelling for speaker diarization in telephonic interviews |
dc.type | Article |
dc.subject.lemac | Speech processing |
dc.subject.lemac | Neural networks (Computer science) |
dc.contributor.group | Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla |
dc.identifier.doi | 10.1016/j.csl.2022.101441 |
dc.description.peerreviewed | Peer Reviewed |
dc.relation.publisherversion | https://www.sciencedirect.com/science/article/pii/S0885230822000651 |
dc.rights.access | Open Access |
local.identifier.drac | 34244864 |
dc.description.version | Postprint (published version) |
dc.relation.projectid | info:eu-repo/grantAgreement/MINECO//TEC2015-69266-P/ES/TECNOLOGIAS DE APRENDIZAJE PROFUNDO APLICADAS AL PROCESADO DE VOZ Y AUDIO/ |
dc.relation.projectid | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-107579RB-I00/ES/ARQUITECTURAS AVANZADAS DE APRENDIZAJE PROFUNDO APLICADAS AL PROCESADO DE VOZ, AUDIO Y LENGUAJE/ |
local.citation.author | India, M.; Hernando, J.; Fonollosa, José A. R. |
local.citation.publicationName | Computer speech and language |
local.citation.volume | 78 |
local.citation.number | article 101441 |
local.citation.startingPage | 1 |
local.citation.endingPage | 12 |