Multi-output RNN-LSTM for multiple speaker speech synthesis with a-interpolation model
dc.contributor.author | Pascual, Santiago |
dc.contributor.author | Bonafonte Cávez, Antonio |
dc.contributor.other | Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions |
dc.date.accessioned | 2017-06-16T09:53:44Z |
dc.date.available | 2017-06-16T09:53:44Z |
dc.date.issued | 2016 |
dc.identifier.citation | Pascual, S., Bonafonte, A. Multi-output RNN-LSTM for multiple speaker speech synthesis with a-interpolation model. A: ISCA Speech Synthesis Workshop. "SSW9: 9th ISCA Workshop on Speech Synthesis: proceedings: Sunnyvale (CA, USA): September 13-15, 2016". Sunnyvale, CA: International Speech Communication Association (ISCA), 2016, p. 112-117. |
dc.identifier.isbn | 978-0-9928-6266-4 |
dc.identifier.uri | http://hdl.handle.net/2117/105484 |
dc.description.abstract | Deep Learning has been applied successfully to speech processing. In this paper we propose an architecture for speech synthesis using multiple speakers. Some hidden layers are shared by all the speakers, while there is a specific output layer for each speaker. Objective and perceptual experiments show that this scheme produces much better results than a single-speaker model. Moreover, we also tackle the problem of speaker interpolation by adding a new output layer (a-layer) on top of the multi-output branches. An identifying code is injected into the layer together with acoustic features of many speakers. Experiments show that the a-layer can effectively learn to interpolate the acoustic features between speakers. |
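The architecture the abstract describes — shared recurrent hidden layers with one speaker-specific output branch each — can be sketched roughly as follows. This is a hypothetical illustration in PyTorch, not the authors' code; the layer sizes, feature dimensions, and class name are all assumptions, and the a-interpolation layer is omitted.

```python
# Hypothetical sketch of a multi-output RNN-LSTM: LSTM layers shared
# across speakers, plus one linear output head per speaker.
# All dimensions below are illustrative, not from the paper.
import torch
import torch.nn as nn

class MultiSpeakerLSTM(nn.Module):
    def __init__(self, n_linguistic=100, n_acoustic=43,
                 n_speakers=3, hidden=256):
        super().__init__()
        # Hidden layers shared by all speakers
        self.shared = nn.LSTM(n_linguistic, hidden,
                              num_layers=2, batch_first=True)
        # A specific output layer for each speaker
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, n_acoustic) for _ in range(n_speakers)]
        )

    def forward(self, x, speaker_id):
        h, _ = self.shared(x)          # shared representation
        return self.heads[speaker_id](h)  # speaker-specific mapping

model = MultiSpeakerLSTM()
x = torch.randn(1, 10, 100)   # (batch, frames, linguistic features)
y = model(x, speaker_id=0)
print(tuple(y.shape))          # (1, 10, 43)
```

During training, each utterance would update the shared layers and only its own speaker's head, which is what lets the shared layers learn speaker-independent structure.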
dc.format.extent | 6 p. |
dc.language.iso | eng |
dc.publisher | International Speech Communication Association (ISCA) |
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 Spain |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/es/ |
dc.subject | Àrees temàtiques de la UPC::Enginyeria de la telecomunicació::Processament del senyal::Processament de la parla i del senyal acústic |
dc.subject.lcsh | Speech processing systems |
dc.subject.other | Text to speech |
dc.subject.other | Acoustic mapping |
dc.subject.other | Speaker interpolation |
dc.subject.other | Recurrent neural network |
dc.title | Multi-output RNN-LSTM for multiple speaker speech synthesis with a-interpolation model |
dc.type | Conference report |
dc.subject.lemac | Processament de la parla |
dc.contributor.group | Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla |
dc.identifier.doi | 10.21437/SSW.2016-19 |
dc.description.peerreviewed | Peer Reviewed |
dc.relation.publisherversion | http://www.isca-speech.org/archive/SSW_2016/pdfs/ssw9_OS2-3_Pascual.pdf |
dc.rights.access | Open Access |
local.identifier.drac | 21092861 |
dc.description.version | Postprint (published version) |
dc.relation.projectid | info:eu-repo/grantAgreement/MINECO//TEC2015-69266-P/ES/TECNOLOGIAS DE APRENDIZAJE PROFUNDO APLICADAS AL PROCESADO DE VOZ Y AUDIO/ |
local.citation.author | Pascual, S.; Bonafonte, A. |
local.citation.contributor | ISCA Speech Synthesis Workshop |
local.citation.pubplace | Sunnyvale, CA |
local.citation.publicationName | SSW9: 9th ISCA Workshop on Speech Synthesis: proceedings: Sunnyvale (CA, USA): September 13-15, 2016 |
local.citation.startingPage | 112 |
local.citation.endingPage | 117 |