Personality regression from multimodal dyadic data
Cite as:
hdl:2117/352537
Document type: Official master's thesis (final project)
Date: 2021-06-21
Access conditions: Open access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication, or transformation without the authorization of the rights holder is prohibited.
Abstract
Personality comprises broad traits that are relatively stable over time and allow one person to be differentiated from another. The most widely accepted theory for modelling personality is the Big-Five model, which defines each trait as a spectrum, making it possible to rank individuals and measure differences between their personalities. Humans infer the personality of others by observing verbal and non-verbal cues across different modalities, capturing patterns from speech, body gestures, and facial expressions, among others. This Master's thesis proposes a multimodal model that extracts audiovisual features using state-of-the-art methods to infer the personality of a target person in a dyadic scenario. The model is trained on the UDIVA dataset, a multimodal dataset of non-scripted face-to-face dyadic interactions based on free and structured tasks that elicit different behaviors and cognitive workloads in the participants. All sessions are conducted in a controlled environment, and the personality of each participant is obtained through self-reported assessments. We investigate the effect of the audio and video modalities on personality recognition both separately and jointly, analyzing performance overall as well as by session, participant, and task. Furthermore, we evaluate how feeding the model a wider range of visual and acoustic cues before producing the prediction affects its performance. The results of an incremental study show that performance improves when long-range visual and acoustic features are combined, with significant improvements in most metrics over the previous state-of-the-art model. These results are very promising considering that our model was trained on a smaller portion of the dataset, with fewer modalities, and in a multi-task manner (a single model for all tasks).
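The fusion of audio and video modalities described above can be illustrated with a minimal sketch. This is not the thesis's actual architecture: the feature dimensions, the time-averaging, and the linear trait regressor are all illustrative assumptions, standing in for the state-of-the-art feature extractors and the trained model.

```python
import numpy as np

# Hypothetical dimensions -- for illustration only, not from the thesis.
AUDIO_DIM, VIDEO_DIM, N_TRAITS = 128, 256, 5  # Big-Five trait scores

def fuse_and_predict(audio_seq, video_seq, W, b):
    """Late-fusion sketch: average each modality over time,
    concatenate the two vectors, and map them linearly to
    the five trait scores."""
    audio_vec = audio_seq.mean(axis=0)            # (AUDIO_DIM,)
    video_vec = video_seq.mean(axis=0)            # (VIDEO_DIM,)
    fused = np.concatenate([audio_vec, video_vec])
    return fused @ W + b                          # (N_TRAITS,)

rng = np.random.default_rng(0)

# Toy inputs: 100 frames of audio features, 30 frames of video features.
audio_seq = rng.standard_normal((100, AUDIO_DIM))
video_seq = rng.standard_normal((30, VIDEO_DIM))

# Toy regressor weights; in practice these would be learned.
W = rng.standard_normal((AUDIO_DIM + VIDEO_DIM, N_TRAITS)) * 0.01
b = np.zeros(N_TRAITS)

scores = fuse_and_predict(audio_seq, video_seq, W, b)
print(scores.shape)  # one score per Big-Five trait
```

A unimodal baseline drops one of the two vectors before the regression, which is how the separate audio-only and video-only settings mentioned in the abstract can be compared against the joint model.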
Subjects: Machine learning, Computer vision, Artificial intelligence
Degree: MASTER'S DEGREE IN ARTIFICIAL INTELLIGENCE (2017 curriculum)
File | Size | Format
---|---|---
160245.pdf | 7.343 MB | PDF