Show simple item record

dc.contributor.author: Pascual, Santiago
dc.contributor.author: Bonafonte Cávez, Antonio
dc.contributor.other: Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
dc.date.accessioned: 2017-06-16T09:53:44Z
dc.date.available: 2017-06-16T09:53:44Z
dc.date.issued: 2016
dc.identifier.citation: Pascual, S., Bonafonte, A. Multi-output RNN-LSTM for multiple speaker speech synthesis with a-interpolation model. A: ISCA Speech Synthesis Workshop. "SSW9: 9th ISCA Workshop on Speech Synthesis: proceedings: Sunnyvale (CA, USA): September 13-15, 2016". Sunnyvale, CA: International Speech Communication Association (ISCA), 2016, p. 112-117.
dc.identifier.isbn: 978-0-9928-6266-4
dc.identifier.uri: http://hdl.handle.net/2117/105484
dc.description.abstract: Deep Learning has been applied successfully to speech processing. In this paper we propose an architecture for speech synthesis using multiple speakers. Some hidden layers are shared by all the speakers, while there is a specific output layer for each speaker. Objective and perceptual experiments prove that this scheme produces much better results in comparison with a single-speaker model. Moreover, we also tackle the problem of speaker interpolation by adding a new output layer (a-layer) on top of the multi-output branches. An identifying code is injected into the layer together with acoustic features of many speakers. Experiments show that the a-layer can effectively learn to interpolate the acoustic features between speakers.
dc.format.extent: 6 p.
dc.language.iso: eng
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE)
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 Spain
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.subject: Àrees temàtiques de la UPC::Enginyeria de la telecomunicació::Processament del senyal::Processament de la parla i del senyal acústic
dc.subject.lcsh: Speech processing systems
dc.subject.other: Text to speech
dc.subject.other: Acoustic mapping
dc.subject.other: Speaker interpolation
dc.subject.other: Recurrent neural network
dc.title: Multi-output RNN-LSTM for multiple speaker speech synthesis with a-interpolation model
dc.type: Conference report
dc.subject.lemac: Processament de la parla
dc.contributor.group: Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
dc.identifier.doi: 10.21437/SSW.2016-19
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: http://www.isca-speech.org/archive/SSW_2016/pdfs/ssw9_OS2-3_Pascual.pdf
dc.rights.access: Open Access
local.identifier.drac: 21092861
dc.description.version: Postprint (published version)
dc.relation.projectid: info:eu-repo/grantAgreement/MINECO//TEC2015-69266-P/ES/TECNOLOGIAS DE APRENDIZAJE PROFUNDO APLICADAS AL PROCESADO DE VOZ Y AUDIO/
local.citation.author: Pascual, S.; Bonafonte, A.
local.citation.contributor: ISCA Speech Synthesis Workshop
local.citation.pubplace: Sunnyvale, CA
local.citation.publicationName: SSW9: 9th ISCA Workshop on Speech Synthesis: proceedings: Sunnyvale (CA, USA): September 13-15, 2016
local.citation.startingPage: 112
local.citation.endingPage: 117
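The abstract describes an architecture with hidden layers shared across speakers, a separate output branch per speaker, and an a-layer that interpolates between speakers via an identifying code. Below is a minimal numpy sketch of that idea; the layer sizes, the dense stand-in for the paper's LSTM layers, and the linear branch weighting in the a-layer are all illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not taken from the paper)
n_in, n_hidden, n_out, n_speakers = 40, 64, 43, 3

# Shared hidden layer: one representation used by every speaker
W_shared = rng.normal(scale=0.1, size=(n_in, n_hidden))

# One output branch per speaker, mapping the shared features to acoustics
W_spk = rng.normal(scale=0.1, size=(n_speakers, n_hidden, n_out))

def forward(x, alpha):
    """x: (n_in,) input features; alpha: (n_speakers,) interpolation code."""
    h = np.tanh(x @ W_shared)                                        # shared layer
    branches = np.stack([h @ W_spk[k] for k in range(n_speakers)])   # (S, n_out)
    # a-layer sketch: weight each speaker branch by its coefficient
    return alpha @ branches                                          # (n_out,)

x = rng.normal(size=n_in)
y0 = forward(x, np.array([1.0, 0.0, 0.0]))     # pure speaker 0
y_mix = forward(x, np.array([0.5, 0.5, 0.0]))  # halfway between speakers 0 and 1
```

Because this toy a-layer is linear, an interpolation code of (0.5, 0.5, 0) yields exactly the average of the two speaker branches; the paper instead learns the mixing from an injected identifying code.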


Files in this item


This item appears in the following collection(s)
