Show simple item record
Expressive speech synthesis using sentiment embeddings
dc.contributor.author | Jauk, Igor |
dc.contributor.author | Lorenzo Trueba, J. |
dc.contributor.author | Yamagishi, J. |
dc.contributor.author | Bonafonte Cávez, Antonio |
dc.contributor.other | Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions |
dc.date.accessioned | 2018-11-09T14:16:22Z |
dc.date.available | 2018-11-09T14:16:22Z |
dc.date.issued | 2018 |
dc.identifier.citation | Jauk, I., Lorenzo Trueba, J., Yamagishi, J., Bonafonte, A. Expressive speech synthesis using sentiment embeddings. A: Annual Conference of the International Speech Communication Association. "Interspeech 2018: 2-6 September 2018, Hyderabad". Baixas: International Speech Communication Association (ISCA), 2018, p. 3062-3066. |
dc.identifier.issn | 1990-9772 |
dc.identifier.uri | http://hdl.handle.net/2117/123860 |
dc.description.abstract | In this paper we present a DNN-based speech synthesis system trained on an audiobook, including sentiment features predicted by the Stanford sentiment parser. The baseline system uses a DNN to predict acoustic parameters from conventional linguistic features, as used in statistical parametric speech synthesis. The predicted parameters are transformed into speech using a conventional high-quality vocoder. In this paper, the conventional linguistic features are enriched with sentiment features. Different sentiment representations are considered, combining sentiment probabilities with hierarchical distance and context. After a preliminary analysis, a listening experiment is conducted in which participants evaluate the different systems. The results show the usefulness of the proposed features and reveal differences between expert and non-expert TTS users. |
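The abstract describes enriching conventional linguistic features with sentiment features before acoustic prediction. A minimal sketch of that enrichment step is shown below; the function name, feature dimensions, and the five-class sentiment vector (mirroring the Stanford sentiment parser's classes) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def enrich_features(linguistic, sentiment_probs):
    """Hypothetical sketch: append a sentence-level sentiment
    probability vector to each frame of linguistic features,
    forming the enriched input for an acoustic-prediction DNN."""
    n_frames = linguistic.shape[0]
    # Broadcast the sentence-level sentiment vector to every frame.
    sentiment = np.tile(sentiment_probs, (n_frames, 1))
    return np.concatenate([linguistic, sentiment], axis=1)

# 10 frames of 4 illustrative linguistic features; 5 sentiment
# classes (very negative ... very positive).
ling = np.random.rand(10, 4)
probs = np.array([0.05, 0.10, 0.20, 0.40, 0.25])

enriched = enrich_features(ling, probs)
print(enriched.shape)  # (10, 9)
```

In the paper, the enriched features feed a DNN that predicts acoustic parameters, which a conventional vocoder then converts to speech; this sketch only shows the feature-concatenation idea.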
dc.format.extent | 5 p. |
dc.language.iso | eng |
dc.publisher | International Speech Communication Association (ISCA) |
dc.subject | Àrees temàtiques de la UPC::Enginyeria de la telecomunicació::Processament del senyal::Processament de la parla i del senyal acústic |
dc.subject.lcsh | Automatic speech recognition |
dc.subject.other | DNN |
dc.subject.other | Expressive speech synthesis |
dc.subject.other | Sentiment analysis |
dc.subject.other | TTS |
dc.subject.other | Linguistics |
dc.subject.other | Speech synthesis |
dc.subject.other | Acoustic parameters |
dc.subject.other | Baseline systems |
dc.subject.other | Linguistic features |
dc.subject.other | Preliminary analysis |
dc.subject.other | Sentiment features |
dc.subject.other | Speech synthesis system |
dc.subject.other | Statistical parametric speech synthesis |
dc.subject.other | Speech communication |
dc.title | Expressive speech synthesis using sentiment embeddings |
dc.type | Conference report |
dc.subject.lemac | Reconeixement automàtic de la parla |
dc.contributor.group | Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla |
dc.identifier.doi | 10.21437/Interspeech.2018-2467 |
dc.description.peerreviewed | Peer Reviewed |
dc.relation.publisherversion | https://www.isca-speech.org/archive/Interspeech_2018/pdfs/2467.pdf |
dc.rights.access | Open Access |
local.identifier.drac | 23470866 |
dc.description.version | Postprint (published version) |
local.citation.author | Jauk, I.; Lorenzo Trueba, J.; Yamagishi, J.; Bonafonte, A. |
local.citation.contributor | Annual Conference of the International Speech Communication Association |
local.citation.pubplace | Baixas |
local.citation.publicationName | Interspeech 2018: 2-6 September 2018, Hyderabad |
local.citation.startingPage | 3062 |
local.citation.endingPage | 3066 |