Multimodal emotion recognition via face and voice

Griera i Jiménez, Oriol

dc.contributor	Hernando Pericás, Francisco Javier
dc.contributor	de Marsico, Maria
dc.contributor.author	Griera i Jiménez, Oriol
dc.contributor.other	Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
dc.date.accessioned	2022-10-05T17:02:14Z
dc.date.available	2022-10-05T17:02:14Z
dc.date.issued	2022-07-14
dc.identifier.uri	http://hdl.handle.net/2117/374046
dc.description.abstract	Recent advances in technology have allowed humans to interact with computers in ways previously unimaginable. Despite this progress, a necessary element of natural interaction is still missing: emotions. Emotions play an important role in human communication and interaction, allowing people to express themselves beyond the language domain. The purpose of this project is to develop a multimodal system that classifies emotions from the facial expressions and voice in videos. For face emotion recognition, face images and optical flow frames are used to exploit the spatial and temporal information in the videos. For the voice, the model predicts the emotion from speech features extracted from chunked audio signals. Combining the two biometrics through score-level fusion achieves excellent performance on the RAVDESS and BAUM-1 datasets. However, the results highlight the need to further investigate the preprocessing techniques applied in this work to "normalize" the datasets to a unified format in order to improve cross-dataset performance.
dc.language.iso	eng
dc.publisher	Universitat Politècnica de Catalunya
dc.rights	Dissemination of the work is authorized under a Creative Commons or similar license, 'Attribution-NonCommercial-NoDerivatives'
dc.subject	UPC subject areas::Telecommunications engineering::Signal processing::Image and video signal processing
dc.subject	UPC subject areas::Telecommunications engineering::Signal processing::Speech and acoustic signal processing
dc.subject.lcsh	Computer vision
dc.subject.lcsh	Deep learning
dc.subject.other	Computer Vision
dc.subject.other	Deep Learning
dc.subject.other	Emotion recognition
dc.title	Multimodal emotion recognition via face and voice
dc.title.alternative	Multimodal emotion recognition via face and voice
dc.type	Master thesis
dc.subject.lemac	Computer vision
dc.subject.lemac	Deep learning
dc.identifier.slug	ETSETB-230.170960
dc.rights.access	Open Access
dc.date.updated	2022-10-05T05:50:57Z
dc.audience.educationlevel	Master's
dc.audience.mediator	Escola Tècnica Superior d'Enginyeria de Telecomunicació de Barcelona
dc.audience.degree	MASTER'S DEGREE IN TELECOMMUNICATIONS ENGINEERING (2013 curriculum)

Files in this item

Name: GrieraJimenezOriol_TFM.pdf
Size: 6.986 MB
Format: PDF

This item appears in the following collections

Master's degree in Telecommunications Engineering (MET) [393]

UPCommons. UPC open knowledge portal
