Multi-output RNN-LSTM for multiple speaker speech synthesis with a-interpolation model
dc.contributor.author | Pascual, Santiago |
dc.contributor.author | Bonafonte Cávez, Antonio |
dc.contributor.other | Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions |
dc.date.accessioned | 2017-06-16T09:53:44Z |
dc.date.available | 2017-06-16T09:53:44Z |
dc.date.issued | 2016 |
dc.identifier.citation | Pascual, S., Bonafonte, A. Multi-output RNN-LSTM for multiple speaker speech synthesis with a-interpolation model. A: ISCA Speech Synthesis Workshop. "SSW9: 9th ISCA Workshop on Speech Synthesis: proceedings: Sunnyvale (CA, USA): September 13-15, 2016". Sunnyvale, CA: International Speech Communication Association (ISCA), 2016, p. 112-117. |
dc.identifier.isbn | 978-0-9928-6266-4 |
dc.identifier.uri | http://hdl.handle.net/2117/105484 |
dc.description.abstract | Deep Learning has been applied successfully to speech processing. In this paper we propose an architecture for speech synthesis using multiple speakers. Some hidden layers are shared by all the speakers, while there is a specific output layer for each speaker. Objective and perceptual experiments show that this scheme produces much better results than a single-speaker model. Moreover, we also tackle the problem of speaker interpolation by adding a new output layer (a-layer) on top of the multi-output branches. An identifying code is injected into the layer together with acoustic features of many speakers. Experiments show that the a-layer can effectively learn to interpolate the acoustic features between speakers. |
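The architecture the abstract describes — shared recurrent hidden layers with one speaker-specific output branch each — can be sketched roughly as follows. This is a hypothetical illustration in PyTorch, not the authors' code; the layer sizes, feature dimensions, and class name are all assumptions, and the a-interpolation layer is omitted.

```python
# Hypothetical sketch of a multi-output RNN-LSTM: LSTM layers shared
# across speakers, plus one linear output head per speaker.
# All dimensions below are illustrative, not from the paper.
import torch
import torch.nn as nn

class MultiSpeakerLSTM(nn.Module):
    def __init__(self, n_linguistic=100, n_acoustic=43,
                 n_speakers=3, hidden=256):
        super().__init__()
        # Hidden layers shared by all speakers
        self.shared = nn.LSTM(n_linguistic, hidden,
                              num_layers=2, batch_first=True)
        # A specific output layer for each speaker
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, n_acoustic) for _ in range(n_speakers)]
        )

    def forward(self, x, speaker_id):
        h, _ = self.shared(x)          # shared representation
        return self.heads[speaker_id](h)  # speaker-specific mapping

model = MultiSpeakerLSTM()
x = torch.randn(1, 10, 100)   # (batch, frames, linguistic features)
y = model(x, speaker_id=0)
print(tuple(y.shape))          # (1, 10, 43)
```

During training, each utterance would update the shared layers and only its own speaker's head, which is what lets the shared layers learn speaker-independent structure.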
dc.format.extent | 6 p. |
dc.language.iso | eng |
dc.publisher | International Speech Communication Association (ISCA) |
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 Spain |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/es/ |
dc.subject | Àrees temàtiques de la UPC::Enginyeria de la telecomunicació::Processament del senyal::Processament de la parla i del senyal acústic |
dc.subject.lcsh | Speech processing systems |
dc.subject.other | Text to speech |
dc.subject.other | Acoustic mapping |
dc.subject.other | Speaker interpolation |
dc.subject.other | Recurrent neural network |
dc.title | Multi-output RNN-LSTM for multiple speaker speech synthesis with a-interpolation model |
dc.type | Conference report |
dc.subject.lemac | Processament de la parla |
dc.contributor.group | Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla |
dc.identifier.doi | 10.21437/SSW.2016-19 |
dc.description.peerreviewed | Peer Reviewed |
dc.relation.publisherversion | http://www.isca-speech.org/archive/SSW_2016/pdfs/ssw9_OS2-3_Pascual.pdf |
dc.rights.access | Open Access |
local.identifier.drac | 21092861 |
dc.description.version | Postprint (published version) |
dc.relation.projectid | info:eu-repo/grantAgreement/MINECO//TEC2015-69266-P/ES/TECNOLOGIAS DE APRENDIZAJE PROFUNDO APLICADAS AL PROCESADO DE VOZ Y AUDIO/ |
local.citation.author | Pascual, S.; Bonafonte, A. |
local.citation.contributor | ISCA Speech Synthesis Workshop |
local.citation.pubplace | Sunnyvale, CA |
local.citation.publicationName | SSW9: 9th ISCA Workshop on Speech Synthesis: proceedings: Sunnyvale (CA, USA): September 13-15, 2016 |
local.citation.startingPage | 112 |
local.citation.endingPage | 117 |