Show simple item record
Expressive speech synthesis using sentiment embeddings
dc.contributor.author | Jauk, Igor |
dc.contributor.author | Lorenzo Trueba, J. |
dc.contributor.author | Yamagishi, J. |
dc.contributor.author | Bonafonte Cávez, Antonio |
dc.contributor.other | Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions |
dc.date.accessioned | 2018-11-09T14:16:22Z |
dc.date.available | 2018-11-09T14:16:22Z |
dc.date.issued | 2018 |
dc.identifier.citation | Jauk, I., Lorenzo Trueba, J., Yamagishi, J., Bonafonte, A. Expressive speech synthesis using sentiment embeddings. A: Annual Conference of the International Speech Communication Association. "Interspeech 2018: 2-6 September 2018, Hyderabad". Baixas: International Speech Communication Association (ISCA), 2018, p. 3062-3066. |
dc.identifier.issn | 1990-9772 |
dc.identifier.uri | http://hdl.handle.net/2117/123860 |
dc.description.abstract | In this paper we present a DNN-based speech synthesis system trained on an audiobook, including sentiment features predicted by the Stanford sentiment parser. The baseline system uses a DNN to predict acoustic parameters from conventional linguistic features, as used in statistical parametric speech synthesis. The predicted parameters are transformed into speech using a conventional high-quality vocoder. In this paper, the conventional linguistic features are enriched with sentiment features. Different sentiment representations are considered, combining sentiment probabilities with hierarchical distance and context. After a preliminary analysis, a listening experiment is conducted in which participants evaluate the different systems. The results show the usefulness of the proposed features and reveal differences between expert and non-expert TTS users. |
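The abstract describes enriching conventional linguistic features with sentiment features before acoustic prediction. A minimal sketch of that enrichment step is shown below; the function name, feature dimensions, and the five-class sentiment vector (mirroring the Stanford sentiment parser's classes) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def enrich_features(linguistic, sentiment_probs):
    """Hypothetical sketch: append a sentence-level sentiment
    probability vector to each frame of linguistic features,
    forming the enriched input for an acoustic-prediction DNN."""
    n_frames = linguistic.shape[0]
    # Broadcast the sentence-level sentiment vector to every frame.
    sentiment = np.tile(sentiment_probs, (n_frames, 1))
    return np.concatenate([linguistic, sentiment], axis=1)

# 10 frames of 4 illustrative linguistic features; 5 sentiment
# classes (very negative ... very positive).
ling = np.random.rand(10, 4)
probs = np.array([0.05, 0.10, 0.20, 0.40, 0.25])

enriched = enrich_features(ling, probs)
print(enriched.shape)  # (10, 9)
```

In the paper, the enriched features feed a DNN that predicts acoustic parameters, which a conventional vocoder then converts to speech; this sketch only shows the feature-concatenation idea.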
dc.format.extent | 5 p. |
dc.language.iso | eng |
dc.publisher | International Speech Communication Association (ISCA) |
dc.subject | Àrees temàtiques de la UPC::Enginyeria de la telecomunicació::Processament del senyal::Processament de la parla i del senyal acústic |
dc.subject.lcsh | Automatic speech recognition |
dc.subject.other | DNN |
dc.subject.other | Expressive speech synthesis |
dc.subject.other | Sentiment analysis |
dc.subject.other | TTS |
dc.subject.other | Linguistics |
dc.subject.other | Speech synthesis |
dc.subject.other | Acoustic parameters |
dc.subject.other | Baseline systems |
dc.subject.other | Linguistic features |
dc.subject.other | Preliminary analysis |
dc.subject.other | Sentiment features |
dc.subject.other | Speech synthesis system |
dc.subject.other | Statistical parametric speech synthesis |
dc.subject.other | Speech communication |
dc.title | Expressive speech synthesis using sentiment embeddings |
dc.type | Conference report |
dc.subject.lemac | Reconeixement automàtic de la parla |
dc.contributor.group | Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla |
dc.identifier.doi | 10.21437/Interspeech.2018-2467 |
dc.description.peerreviewed | Peer Reviewed |
dc.relation.publisherversion | https://www.isca-speech.org/archive/Interspeech_2018/pdfs/2467.pdf |
dc.rights.access | Open Access |
local.identifier.drac | 23470866 |
dc.description.version | Postprint (published version) |
local.citation.author | Jauk, I.; Lorenzo Trueba, J.; Yamagishi, J.; Bonafonte, A. |
local.citation.contributor | Annual Conference of the International Speech Communication Association |
local.citation.pubplace | Baixas |
local.citation.publicationName | Interspeech 2018: 2-6 September 2018, Hyderabad |
local.citation.startingPage | 3062 |
local.citation.endingPage | 3066 |