Multimodal speech emotion recognition

Kreplak López, Marina



Tutor / director: Padró, Lluís; Adil Moujahid, Mohammed

Document type: Official Master's Final Project

Date: 2020-06-22

Access conditions: Open access

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation without the authorization of the rights holder is prohibited.

Abstract

The recognition of emotions in speech is one of the most challenging topics in data science. In this work, we define a pipeline for the study of multimodal speech emotion recognition, using a wide set of features extracted from audio samples and their text transcripts. The work aims to study the interaction and contribution of multimodal features, and for this purpose three types of features have been selected: a set of handcrafted features related to speech prosody, classical mel-spectrogram acoustic features, and TF-IDF features for the text. By combining these three types of data, we evaluate how much each one contributes relative to the others. This thesis also provides a comparative study of classical machine learning models versus neural architectures, in terms of both performance and their potential to learn from speech. Finally, it presents an application that provides emotion classification and feedback retrieval for misclassified samples.
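The abstract names three feature families and says they are combined, but not how. The following is a minimal sketch, assuming early fusion (plain concatenation of per-utterance vectors), which is only one of several possible strategies. The function names, the 16 kHz frame settings and the all-zero mel-spectrogram placeholder are hypothetical and not taken from the thesis; per-frame RMS energy stands in for the handcrafted prosodic features, which in practice would usually also include pitch.

```python
import math
from collections import Counter


def tfidf_vectors(transcripts):
    """L2-normalized TF-IDF vectors over lower-cased whitespace tokens."""
    docs = [t.lower().split() for t in transcripts]
    vocab = sorted({w for d in docs for w in d})
    n = len(docs)
    # Document frequency: number of transcripts containing each term.
    df = Counter(w for d in docs for w in set(d))
    # Smoothed IDF, the same formula scikit-learn uses by default.
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1.0 for w in vocab}
    vectors = []
    for d in docs:
        tf = Counter(d)
        vec = [tf[w] * idf[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def energy_stats(samples, frame_len=400, hop=160):
    """Mean and standard deviation of per-frame RMS energy.

    Hypothetical prosodic stand-in; 400/160 samples = 25 ms / 10 ms at an assumed 16 kHz.
    """
    if not samples:
        raise ValueError("empty signal")
    rms = []
    for start in range(0, max(len(samples) - frame_len, 0) + 1, hop):
        frame = samples[start:start + frame_len]
        rms.append(math.sqrt(sum(s * s for s in frame) / len(frame)))
    mean = sum(rms) / len(rms)
    std = math.sqrt(sum((r - mean) ** 2 for r in rms) / len(rms))
    return [mean, std]


def early_fusion(*blocks):
    """Concatenate per-utterance feature blocks into a single vector."""
    return [x for block in blocks for x in block]


# Usage: one second of a 220 Hz tone plus two toy transcripts.
vocab, text_vecs = tfidf_vectors(["I am so happy today", "I am not happy"])
audio = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
mel_summary = [0.0] * 4  # hypothetical placeholder for precomputed mel-spectrogram statistics
fused = early_fusion(energy_stats(audio), mel_summary, text_vecs[0])
print(len(fused))  # 2 energy stats + 4 placeholder values + 6 vocabulary terms = 12
```

Early fusion lets a single classical classifier see all modalities at once; an alternative is late fusion, where each modality gets its own model and only their predictions are combined.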

Subjects: Machine learning, Artificial intelligence

Degree: Master's Degree in Artificial Intelligence (2017 curriculum)

URI: http://hdl.handle.net/2117/336107

Collections

Official master's degrees - Master in Artificial Intelligence - MAI


Files	Description	Size	Format
153481.pdf		5.433 MB	PDF

UPCommons. Open knowledge portal of the UPC
