Universitat Politècnica de Catalunya

UPCommons. Global access to UPC knowledge


Multimodal emotion recognition via face and voice

Cite as: hdl:2117/374046

Griera i Jiménez, Oriol
Tutor / director: Hernando Pericás, Francisco Javier; de Marsico, Maria
Document type: Master thesis
Date: 2022-07-14
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder.
Abstract
Recent advances in technology have allowed humans to interact with computers in ways previously unimaginable. Despite significant progress, a necessary element for natural interaction is still lacking: emotions. Emotions play an important role in human communication and interaction, allowing people to express themselves beyond the language domain. The purpose of this project is to develop a multimodal system that classifies emotions using facial expressions and voice extracted from videos. For face emotion recognition, face images and optical flow frames are used to exploit the spatial and temporal information in the videos. For the voice, the model predicts the emotion from speech features extracted from chunked audio signals. The combination of the two biometrics through score-level fusion achieves excellent performance on the RAVDESS and BAUM-1 datasets. However, the results highlight the importance of further investigating the preprocessing techniques applied in this work to "normalize" the datasets to a unified format, in order to improve cross-dataset performance.
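The abstract states that the face and voice models are combined through score-level fusion. As an illustration only, the sketch below shows one common form of score-level fusion, a weighted sum of each model's per-class scores; the weighting rule, the `alpha` parameter, and the example class scores are assumptions for illustration, not details taken from the thesis.

```python
def score_level_fusion(face_scores, voice_scores, alpha=0.5):
    """Fuse per-class emotion scores from two modalities.

    face_scores, voice_scores: lists of per-class scores (e.g. softmax
    outputs) from the face and voice models, in the same class order.
    alpha: weight given to the face modality (assumed parameter).

    Returns the index of the predicted class and the fused score list.
    The weighted-sum rule used here is one standard choice; the thesis
    only states that score-level fusion is applied.
    """
    # Weighted sum of the two score vectors, class by class.
    fused = [alpha * f + (1.0 - alpha) * v
             for f, v in zip(face_scores, voice_scores)]
    # Predicted emotion is the class with the highest fused score.
    pred = max(range(len(fused)), key=fused.__getitem__)
    return pred, fused

# Hypothetical scores over three emotion classes (e.g. neutral, happy, sad).
pred, fused = score_level_fusion([0.2, 0.7, 0.1], [0.1, 0.3, 0.6], alpha=0.5)
```

With equal weights, a class that is moderately supported by both modalities can outrank a class strongly supported by only one, which is one motivation for fusing at the score level rather than picking a single modality's decision.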
Subjects: Computer vision, Deep learning, Visió per ordinador, Aprenentatge profund
Degree: MÀSTER UNIVERSITARI EN ENGINYERIA DE TELECOMUNICACIÓ (Pla 2013)
URI: http://hdl.handle.net/2117/374046
Collections
  • Màsters oficials - Master's degree in Telecommunications Engineering (MET) [356]

Files: GrieraJimenezOriol_TFM.pdf (6,986 Mb, PDF)


© UPC. Servei de Biblioteques, Publicacions i Arxius

info.biblioteques@upc.edu
