Show simple item record

dc.contributor.author: Jauk, Igor
dc.contributor.author: Lorenzo Trueba, J.
dc.contributor.author: Yamagishi, J.
dc.contributor.author: Bonafonte Cávez, Antonio
dc.contributor.other: Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
dc.date.accessioned: 2018-11-09T14:16:22Z
dc.date.available: 2018-11-09T14:16:22Z
dc.date.issued: 2018
dc.identifier.citation: Jauk, I., Lorenzo Trueba, J., Yamagishi, J., Bonafonte, A. Expressive speech synthesis using sentiment embeddings. A: Annual Conference of the International Speech Communication Association. "Interspeech 2018: 2-6 September 2018, Hyderabad". Baixas: International Speech Communication Association (ISCA), 2018, p. 3062-3066.
dc.identifier.issn: 1990-9772
dc.identifier.uri: http://hdl.handle.net/2117/123860
dc.description.abstract: In this paper we present a DNN-based speech synthesis system trained on an audiobook, incorporating sentiment features predicted by the Stanford sentiment parser. The baseline system uses a DNN to predict acoustic parameters from conventional linguistic features, as used in statistical parametric speech synthesis. The predicted parameters are transformed into speech using a conventional high-quality vocoder. In this paper, the conventional linguistic features are enriched with sentiment features. Different sentiment representations have been considered, combining sentiment probabilities with hierarchical distance and context. After a preliminary analysis, a listening experiment is conducted in which participants evaluate the different systems. The results show the usefulness of the proposed features and reveal differences between expert and non-expert TTS users.
dc.format.extent: 5 p.
dc.language.iso: eng
dc.publisher: International Speech Communication Association (ISCA)
dc.subject: Àrees temàtiques de la UPC::Enginyeria de la telecomunicació::Processament del senyal::Processament de la parla i del senyal acústic
dc.subject.lcsh: Automatic speech recognition
dc.subject.other: DNN
dc.subject.other: Expressive speech synthesis
dc.subject.other: Sentiment analysis
dc.subject.other: TTS
dc.subject.other: Linguistics
dc.subject.other: Speech synthesis
dc.subject.other: Acoustic parameters
dc.subject.other: Baseline systems
dc.subject.other: Linguistic features
dc.subject.other: Preliminary analysis
dc.subject.other: Sentiment features
dc.subject.other: Speech synthesis system
dc.subject.other: Statistical parametric speech synthesis
dc.subject.other: Speech communication
dc.title: Expressive speech synthesis using sentiment embeddings
dc.type: Conference report
dc.subject.lemac: Reconeixement automàtic de la parla
dc.contributor.group: Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
dc.identifier.doi: 10.21437/Interspeech.2018-2467
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: https://www.isca-speech.org/archive/Interspeech_2018/pdfs/2467.pdf
dc.rights.access: Open Access
drac.iddocument: 23470866
dc.description.version: Postprint (published version)
upcommons.citation.author: Jauk, I., Lorenzo Trueba, J., Yamagishi, J., Bonafonte, A.
upcommons.citation.contributor: Annual Conference of the International Speech Communication Association
upcommons.citation.pubplace: Baixas
upcommons.citation.published: true
upcommons.citation.publicationName: Interspeech 2018: 2-6 September 2018, Hyderabad
upcommons.citation.startingPage: 3062
upcommons.citation.endingPage: 3066
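
The abstract above describes enriching conventional linguistic input features with sentiment features before a DNN predicts acoustic parameters for a vocoder. As an illustration only, here is a minimal NumPy sketch of that feature-enrichment step; all dimensions, names, and the toy one-hidden-layer network are hypothetical and do not come from the paper:

```python
import numpy as np

# Hypothetical dimensions, not taken from the paper.
N_LINGUISTIC = 300   # conventional linguistic features per frame
N_SENTIMENT = 5      # Stanford sentiment class probabilities (very neg .. very pos)
N_ACOUSTIC = 60      # acoustic parameters handed to the vocoder

rng = np.random.default_rng(0)

def make_input(linguistic, sentiment_probs):
    """Enrich conventional linguistic features with sentiment features
    by simple concatenation (one possible combination scheme)."""
    return np.concatenate([linguistic, sentiment_probs])

# A toy one-hidden-layer DNN standing in for the acoustic model.
W1 = rng.standard_normal((N_LINGUISTIC + N_SENTIMENT, 128)) * 0.01
b1 = np.zeros(128)
W2 = rng.standard_normal((128, N_ACOUSTIC)) * 0.01
b2 = np.zeros(N_ACOUSTIC)

def predict_acoustic(x):
    h = np.tanh(x @ W1 + b1)          # hidden layer
    return h @ W2 + b2                # acoustic parameters per frame

linguistic = rng.standard_normal(N_LINGUISTIC)
sentiment = np.array([0.05, 0.10, 0.20, 0.40, 0.25])  # class probabilities, sums to 1
y = predict_acoustic(make_input(linguistic, sentiment))
# y has shape (N_ACOUSTIC,): one vector of acoustic parameters
```

In practice the network would be trained on the audiobook data and the sentiment probabilities would be produced per sentence by the Stanford sentiment parser; the sketch only shows how the enriched input vector is assembled and consumed.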




All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any applicable legal exemptions, reproduction, distribution, public communication, or transformation of this work is prohibited without the permission of the copyright holder.