Creating expressive synthetic voices by unsupervised clustering of audiobooks

Jauk, Igor; Bonafonte Cávez, Antonio; López Otero, Paula; Docio Fernández, Laura


Creating Expressive Synthetic Voices by Unsupervised Clustering of Audiobooks.pdf (225.6 KB, PDF) (Restricted access)


Document type: Conference paper

Publication date: 2015

Publisher: International Speech Communication Association (ISCA)

Access conditions: Restricted access per publisher's policy

Attribution-NonCommercial-NoDerivs 3.0 Spain

Except where otherwise noted, the contents of this work are licensed under Creative Commons: Attribution-NonCommercial-NoDerivs 3.0 Spain.

Abstract

In this work we design an approach for automatic feature selection and voice creation for expressive synthesis. Our approach is guided by two main goals: (1) increasing the flexibility of expressive voice creation and (2) overcoming the limitations of speaking styles in expressive synthesis. We define a novel feature set, combining traditionally used prosodic features with spectral features, and propose the use of iVectors. With these features we perform unsupervised clustering of an audiobook excerpt and, from the resulting clusters, create synthetic voices using the SAT (speaker adaptive training) technique. To evaluate the clustering performance, we propose an objective evaluation technique based on perplexity reduction. This objective evaluation indicates that both prosodic and spectral features help separate speaking styles and emotions, with the best results obtained when iVectors are included in the feature set: the perplexity of the expressions and of the audiobook characters is reduced by factors of 14 and 2, respectively. We also design a novel subjective evaluation method in which participants edit a small excerpt of an audiobook using synthetic voices created from the clusters. The results suggest that our feature set is effective for detecting expressiveness and characters.
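The abstract does not give the exact formula behind the perplexity-reduction evaluation. The sketch below shows one common way such a measure can be computed, under stated assumptions: each utterance carries a reference label (for example an expression or character annotation), label perplexity is 2 raised to the label entropy, within-cluster perplexity is 2 raised to the cluster-size-weighted conditional entropy, and the reduction factor is their ratio. The function names `entropy` and `perplexity_reduction` are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def perplexity_reduction(labels, clusters):
    """Ratio of label perplexity before clustering to label perplexity within clusters.

    Assumption (not from the paper): within-cluster perplexity is 2 ** H(L | C),
    where H(L | C) is the cluster-size-weighted mean of per-cluster label entropies.
    """
    if len(labels) != len(clusters) or not labels:
        raise ValueError("labels and clusters must be non-empty and of equal length")
    n = len(labels)
    # Group the reference labels by the cluster each utterance was assigned to.
    by_cluster = {}
    for label, cluster in zip(labels, clusters):
        by_cluster.setdefault(cluster, []).append(label)
    h_cond = sum(len(members) / n * entropy(members) for members in by_cluster.values())
    return 2 ** entropy(labels) / 2 ** h_cond


# Clusters that perfectly separate two equally frequent expressions.
print(perplexity_reduction(["happy", "happy", "sad", "sad"], [0, 0, 1, 1]))
```

Under this definition the reduction factor equals 2 raised to the mutual information between labels and clusters: a clustering unrelated to the labels gives a factor of 1, while a clustering that isolates every label gives a factor equal to the original label perplexity.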

Citation: Jauk, I., Bonafonte, A., López-Otero, P., Docio-Fernández, L. Creating expressive synthetic voices by unsupervised clustering of audiobooks. In: Annual Conference of the International Speech Communication Association. "INTERSPEECH 2015: 16th Annual Conference of the International Speech Communication Association: Dresden, Germany: September 6-10, 2015". Dresden: International Speech Communication Association (ISCA), 2015, pp. 3380-3384.

URI: http://hdl.handle.net/2117/81613

ISSN: 1990-9770

Publisher's version: http://www.isca-speech.org/archive/interspeech_2015/i15_3380.html


UPCommons. Open knowledge portal of the UPC
