Multi-output RNN-LSTM for multiple speaker speech synthesis with α-interpolation model

Pascual, Santiago; Bonafonte Cávez, Antonio

doi:10.21437/SSW.2016-19

Document type: Text in conference proceedings

Publication date: 2016

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Access conditions: Open access

Attribution-NonCommercial-NoDerivs 3.0 Spain

Unless otherwise indicated, the contents of this work are subject to the Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain

Project: TECNOLOGIAS DE APRENDIZAJE PROFUNDO APLICADAS AL PROCESADO DE VOZ Y AUDIO (MINECO-TEC2015-69266-P)

Abstract

Deep learning has been applied successfully to speech processing. In this paper we propose an architecture for speech synthesis using multiple speakers. Some hidden layers are shared by all the speakers, while there is a specific output layer for each speaker. Objective and perceptual experiments show that this scheme produces much better results than a single-speaker model. Moreover, we also tackle the problem of speaker interpolation by adding a new output layer (α-layer) on top of the multi-output branches. An identifying code is injected into the layer together with the acoustic features of many speakers. Experiments show that the α-layer can effectively learn to interpolate the acoustic features between speakers.
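The structure described in the abstract can be illustrated with a minimal, untrained sketch. It is not the authors' implementation: a single tanh layer stands in for the shared RNN-LSTM layers, and all class names, layer sizes, and the form of the identifying code (here one weight per speaker) are assumptions made for illustration only.

```python
import math
import random


def dense(n_in, n_out, rng):
    """Random weight matrix (n_out rows of n_in values) with a zero bias."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(layer, x, activation=None):
    """Affine map of x through the layer, with an optional elementwise activation."""
    weights, bias = layer
    y = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [activation(v) for v in y] if activation else y


class MultiOutputSketch:
    """Hypothetical shared trunk, one output branch per speaker, one alpha layer."""

    def __init__(self, n_in, n_hidden, n_out, n_speakers, seed=0):
        rng = random.Random(seed)
        # Assumption: one dense tanh layer replaces the shared recurrent layers.
        self.shared = dense(n_in, n_hidden, rng)
        # One speaker-specific output layer per speaker.
        self.heads = [dense(n_hidden, n_out, rng) for _ in range(n_speakers)]
        # The alpha layer sees the code plus every branch's acoustic output.
        self.alpha_layer = dense(n_speakers + n_speakers * n_out, n_out, rng)

    def branches(self, x):
        """Acoustic features predicted by each speaker branch for one input frame."""
        hidden = forward(self.shared, x, math.tanh)
        return [forward(head, hidden) for head in self.heads]

    def interpolate(self, x, code):
        """Alpha-layer output for an identifying code (assumed: one value per speaker)."""
        outputs = self.branches(x)
        features = list(code) + [v for out in outputs for v in out]
        return forward(self.alpha_layer, features)


model = MultiOutputSketch(n_in=8, n_hidden=16, n_out=4, n_speakers=2)
frame = [0.5] * 8                      # one frame of input features (made up)
per_speaker = model.branches(frame)    # one output vector per speaker
blended = model.interpolate(frame, code=[0.3, 0.7])
```

With random weights the outputs carry no meaning; the sketch only shows how the shared layers, the per-speaker branches, and the layer stacked on top of them connect. In the paper the α-layer is trained so that its output interpolates between the speakers' acoustic features.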

Citation: Pascual, S., Bonafonte, A. Multi-output RNN-LSTM for multiple speaker speech synthesis with α-interpolation model. In: ISCA Speech Synthesis Workshop. "SSW9: 9th ISCA Workshop on Speech Synthesis: proceedings: Sunnyvale (CA, USA): September 13-15, 2016". Sunnyvale, CA: International Speech Communication Association (ISCA), 2016, p. 112-117.

URI: http://hdl.handle.net/2117/105484

DOI: 10.21437/SSW.2016-19

ISBN: 978-0-9928-6266-4

Publisher's version: http://www.isca-speech.org/archive/SSW_2016/pdfs/ssw9_OS2-3_Pascual.pdf

Files	Description	Size	Format	View
ssw9_OS2-3_Pascual.pdf		673.6 KB	PDF	View/Open

UPCommons. Open knowledge portal of the UPC
