Expressive speech synthesis using sentiment embeddings
DOI: 10.21437/Interspeech.2018-2467
Includes usage data since 2022
Cite as:
hdl:2117/123860
Document type: Conference paper
Publication date: 2018
Publisher: International Speech Communication Association (ISCA)
Access conditions: Open access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication, or transformation without the authorization of the rights holder is prohibited.
Abstract
In this paper we present a DNN-based speech synthesis system trained on an audiobook, incorporating sentiment features predicted by the Stanford sentiment parser. The baseline system uses a DNN to predict acoustic parameters from conventional linguistic features, as used in statistical parametric speech synthesis. The predicted parameters are converted into speech with a conventional high-quality vocoder. In this work, the conventional linguistic features are enriched with sentiment features. Several sentiment representations are considered, combining sentiment probabilities with hierarchical distance and context. After a preliminary analysis, a listening experiment is conducted in which participants evaluate the different systems. The results show the usefulness of the proposed features and reveal differences between expert and non-expert TTS users.
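As an illustration of the enrichment described above, the following is a minimal sketch (not the authors' code) of one plausible way to append sentence-level sentiment probabilities to frame-level linguistic features before they are fed to the acoustic DNN. The function name, array shapes, and the five-class sentiment distribution are assumptions for the example; the paper's actual feature representations differ.

```python
import numpy as np

def enrich_features(linguistic, sentiment_probs):
    """Concatenate a sentence-level sentiment distribution onto
    frame-level linguistic features (hypothetical illustration).

    linguistic      : (n_frames, n_ling) frame-level linguistic features
    sentiment_probs : (n_classes,) sentiment probabilities for the sentence,
                      repeated for every frame of that sentence
    """
    n_frames = linguistic.shape[0]
    # Repeat the sentence-level distribution once per frame
    tiled = np.tile(sentiment_probs, (n_frames, 1))
    # The DNN input is the concatenation along the feature axis
    return np.concatenate([linguistic, tiled], axis=1)

# Toy example: 4 frames, 10 linguistic features, 5 sentiment classes
ling = np.random.rand(4, 10)
probs = np.array([0.05, 0.15, 0.4, 0.3, 0.1])  # e.g. very neg ... very pos
enriched = enrich_features(ling, probs)
print(enriched.shape)  # (4, 15)
```

Repeating the sentence-level distribution per frame is only one possible design; the paper additionally explores representations combining sentiment probabilities with hierarchical distance and context.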
Citation: Jauk, I., Lorenzo Trueba, J., Yamagishi, J., Bonafonte, A. Expressive speech synthesis using sentiment embeddings. In: Annual Conference of the International Speech Communication Association. "Interspeech 2018: 2-6 September 2018, Hyderabad". Baixas: International Speech Communication Association (ISCA), 2018, pp. 3062-3066.
ISSN: 1990-9772
Publisher's version: https://www.isca-speech.org/archive/Interspeech_2018/pdfs/2467.pdf
File | Size | Format
---|---|---
2467.pdf | 240.6 KB | PDF