Multi-output RNN-LSTM for multiple speaker speech synthesis and adaptation

Pascual, Santiago; Bonafonte Cávez, Antonio

doi:10.1109/EUSIPCO.2016.7760664



Document type: Conference proceedings paper

Publication date: 2016

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Access conditions: Restricted access per publisher policy

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation is prohibited without the authorization of the rights holder.

Abstract

Deep learning has been applied successfully to speech processing. In this paper we propose an architecture for speech synthesis with multiple speakers. Some hidden layers are shared by all the speakers, while each speaker has its own output layer. Objective and perceptual experiments show that this scheme produces much better results than a single-speaker model. Moreover, we tackle the problem of speaker adaptation by adding a new output branch to the model and training it successfully without modifying the optimized base model. This fine-tuning method achieves better results than training a separate model for the new speaker from scratch.
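The following is a minimal sketch of the idea in the abstract, not the authors' implementation. It assumes a single feed-forward tanh layer as a stand-in for the paper's shared RNN-LSTM layers, plain squared-error SGD, and toy input and target vectors; the class name `MultiOutputNet`, its methods, and the speaker identifiers are all hypothetical. It shows two things: joint training, where every step updates the shared layer plus one speaker's output head, and adaptation, where a new head is added and trained while the shared layer stays frozen.

```python
import math
import random


class MultiOutputNet:
    """Shared hidden layer with one linear output head per speaker (toy sketch)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        self.rng = random.Random(seed)
        self.n_hidden, self.n_out = n_hidden, n_out
        # Shared trunk: an assumed stand-in for the shared recurrent layers.
        self.W = [[self.rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                  for _ in range(n_hidden)]
        self.heads = {}  # speaker id -> output weights (n_out x n_hidden)

    def add_speaker(self, speaker):
        """Attach a fresh, randomly initialised output branch for one speaker."""
        self.heads[speaker] = [[self.rng.uniform(-0.5, 0.5)
                                for _ in range(self.n_hidden)]
                               for _ in range(self.n_out)]

    def hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.W]

    def forward(self, x, speaker):
        h = self.hidden(x)
        return [sum(v * hi for v, hi in zip(row, h)) for row in self.heads[speaker]]

    def train_step(self, x, target, speaker, lr=0.05, freeze_shared=False):
        """One SGD step on 0.5 * squared error; returns the loss before the update."""
        h = self.hidden(x)
        V = self.heads[speaker]
        y = [sum(v * hi for v, hi in zip(row, h)) for row in V]
        err = [yi - ti for yi, ti in zip(y, target)]
        if not freeze_shared:
            # Gradient w.r.t. the hidden activations, taken before V changes.
            grad_h = [sum(err[k] * V[k][j] for k in range(self.n_out))
                      for j in range(self.n_hidden)]
        # Update only the selected speaker's head.
        for k in range(self.n_out):
            for j in range(self.n_hidden):
                V[k][j] -= lr * err[k] * h[j]
        if not freeze_shared:
            # Back-propagate through tanh into the shared weights.
            for j in range(self.n_hidden):
                g = grad_h[j] * (1.0 - h[j] ** 2)
                for i in range(len(x)):
                    self.W[j][i] -= lr * g * x[i]
        return 0.5 * sum(e * e for e in err)


net = MultiOutputNet(n_in=3, n_hidden=8, n_out=2)
for spk in ("spk_a", "spk_b"):
    net.add_speaker(spk)

# Joint training on toy data: shared layer and the active head both learn.
toy = {"spk_a": ([1.0, 0.5, -0.5], [0.2, -0.1]),
       "spk_b": ([0.3, -1.0, 0.8], [-0.4, 0.6])}
for _ in range(100):
    for spk, (x, t) in toy.items():
        net.train_step(x, t, spk)

# Adaptation: add a branch for a new speaker and train it with the trunk frozen.
net.add_speaker("spk_new")
for _ in range(100):
    net.train_step([0.7, 0.2, -0.3], [0.5, 0.5], "spk_new", freeze_shared=True)
```

Because the frozen step never touches `W` or the other heads, the outputs for the existing speakers are identical before and after adaptation, which is the property the abstract describes as not modifying the base model.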

Citation: Pascual, S., Bonafonte, A. Multi-output RNN-LSTM for multiple speaker speech synthesis and adaptation. In: European Signal Processing Conference. "2016 24th European Signal Processing Conference (EUSIPCO): took place 28 August-2 September 2016 in Budapest, Hungary". Institute of Electrical and Electronics Engineers (IEEE), 2016, p. 2325-2329.

URI: http://hdl.handle.net/2117/117430

DOI: 10.1109/EUSIPCO.2016.7760664

ISBN: 978-1-5090-1891-8

Publisher's version: http://ieeexplore.ieee.org/document/7760664/


Files	Description	Size	Format	View
07760664.pdf		551.6 KB	PDF	Restricted access
