Multi-output RNN-LSTM for multiple speaker speech synthesis and adaptation
Document type: Conference report
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Rights access: Restricted access - publisher's policy
Deep learning has been applied successfully to speech processing. In this paper we propose an architecture for speech synthesis with multiple speakers: some hidden layers are shared by all speakers, while each speaker has its own output layer. Objective and perceptual experiments show that this scheme produces much better results than a single-speaker model. Moreover, we also tackle speaker adaptation by adding a new output branch to the model and training it successfully without modifying the optimized base model. This fine-tuning method achieves better results than training the new speaker from scratch with its own model.
Citation: Pascual, S.; Bonafonte, A. Multi-output RNN-LSTM for multiple speaker speech synthesis and adaptation. In: European Signal Processing Conference. "2016 24th European Signal Processing Conference (EUSIPCO): took place 28 August-2 September 2016 in Budapest, Hungary". Institute of Electrical and Electronics Engineers (IEEE), 2016, p. 2325-2329.
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication, or transformation of this work is prohibited without permission of the copyright holder.