Creating expressive synthetic voices by unsupervised clustering of audiobooks

Jauk, Igor; Bonafonte Cávez, Antonio; López Otero, Paula; Docio Fernández, Laura

dc.contributor.author	Jauk, Igor
dc.contributor.author	Bonafonte Cávez, Antonio
dc.contributor.author	López Otero, Paula
dc.contributor.author	Docio Fernández, Laura
dc.contributor.other	Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
dc.date.accessioned	2016-01-18T14:34:59Z
dc.date.issued	2015
dc.identifier.citation	Jauk, I., Bonafonte, A., López-Otero, P., Docio-Fernández, L. Creating expressive synthetic voices by unsupervised clustering of audiobooks. In: Annual Conference of the International Speech Communication Association. "INTERSPEECH 2015: 16th Annual Conference of the International Speech Communication Association: Dresden, Germany: September 6-10, 2015". Dresden: International Speech Communication Association (ISCA), 2015, p. 3380-3384.
dc.identifier.isbn	1990-9770
dc.identifier.uri	http://hdl.handle.net/2117/81613
dc.description.abstract	In this work we design an approach for automatic feature selection and voice creation for expressive synthesis. Our approach is guided by two main goals: (1) increasing the flexibility of expressive voice creation and (2) overcoming the limitations of speaking styles in expressive synthesis. We define a novel feature set that combines traditionally used prosodic features with spectral features, and we propose the use of iVectors. With these features we perform unsupervised clustering of an audiobook excerpt and, from the resulting clusters, we create synthetic voices using the speaker adaptive training (SAT) technique. To evaluate clustering performance, we propose an objective evaluation technique for unsupervised clustering results based on perplexity reduction. This objective evaluation indicates that both prosodic and spectral features contribute to separating speaking styles and emotions. The best results are obtained when iVectors are included in the feature set: the perplexity of the expressions and of the audiobook characters is reduced by factors of 14 and 2, respectively. We also design a novel subjective evaluation method in which participants edit a short excerpt of an audiobook using synthetic voices created from the clusters. The results suggest that our feature set is effective for detecting expressiveness and characters.
dc.format.extent	5 p.
dc.language.iso	eng
dc.publisher	International Speech Communication Association (ISCA)
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.subject	UPC subject areas::Telecommunication engineering::Signal processing::Speech and acoustic signal processing
dc.subject	UPC subject areas::Computer science::Artificial intelligence::Natural language
dc.subject.lcsh	Automatic speech recognition
dc.subject.lcsh	Natural language processing (Computer science)
dc.subject.other	Expressive speech synthesis
dc.subject.other	Automatic voice creation
dc.subject.other	Expressive speech synthesis evaluation
dc.title	Creating expressive synthetic voices by unsupervised clustering of audiobooks
dc.type	Conference lecture
dc.subject.lemac	Automatic speech recognition
dc.subject.lemac	Natural language processing (Computer science)
dc.contributor.group	Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
dc.description.peerreviewed	Peer Reviewed
dc.relation.publisherversion	http://www.isca-speech.org/archive/interspeech_2015/i15_3380.html
dc.rights.access	Restricted access - publisher's policy
local.identifier.drac	16678708
dc.description.version	Postprint (published version)
dc.date.lift	10000-01-01
local.citation.author	Jauk, I.; Bonafonte, A.; López-Otero, P.; Docio-Fernández, L.
local.citation.contributor	Annual Conference of the International Speech Communication Association
local.citation.pubplace	Dresden
local.citation.publicationName	INTERSPEECH 2015: 16th Annual Conference of the International Speech Communication Association: Dresden, Germany: September 6-10, 2015
local.citation.startingPage	3380
local.citation.endingPage	3384

Files in this item

Name: Creating Expressive Synthetic ...
Size: 225.6 KB
Format: PDF

This item appears in the following collections

Conference papers/communications [437]
Conference papers/communications [3.323]