Perceptual optimization of unit-selection text-to-speech synthesis systems by means of active interactive genetic algorithms
Document type: Conference report
Rights access: Open Access
Unit-selection TTS (US-TTS) systems are usually tuned by an expert, who typically weights the cost function by hand. However, hand tuning is costly in terms of the training time it requires, and its methodology is inaccurate and ambiguous. To ease the task of properly tuning the weights of the cost function, this thesis contributes a perception-based approach using active interactive genetic algorithms (aiGAs). The thesis pursues four major guidelines: i) accuracy when tuning the weights, ii) robustness of the obtained weights, iii) real-world applicability of the methodology to any cost function design, and iv) finding consensus among the different users when tuning the weights. The experiments are carried out on a small and medium-sized corpus (1.9 h) applied to different configurations (types of features) of the US-TTS cost function. The thesis concludes that aiGAs are highly competitive with other state-of-the-art weight tuning techniques.
Citation: Formiga, L.; Álias, F. Perceptual optimization of unit-selection text-to-speech synthesis systems by means of active interactive genetic algorithms. A: Jornadas en Tecnología del Habla and Iberian SLTech Workshop. "Proceedings IberSPEECH 2012: VII Jornadas en Tecnología del Habla and III Iberian SLTech Workshop: November 21 - 23, 2012, Madrid, Spain". Madrid: 2012, p. 500-509.