Show simple item record
Multimodal 2DCNN action recognition from RGB-D data with video summarization
dc.contributor | Escalera, Sergio |
dc.contributor.author | Roig Ripoll, Vicent |
dc.date.accessioned | 2018-05-27T08:48:19Z |
dc.date.available | 2018-05-27T08:48:19Z |
dc.date.issued | 2017-10 |
dc.identifier.uri | http://hdl.handle.net/2117/117559 |
dc.description.abstract | Human action recognition is nowadays among the most active computer vision research areas. The problem is challenging due to large intra-class variations, low video resolution and the high dimensionality of video data, among other things. The recent development of affordable depth sensors such as Microsoft Kinect has led to new opportunities in this field by providing both RGB and depth data. Multimodal fusion in this scenario can greatly help to boost the performance of action recognition methods. Although handcrafted features are still widely used owing to their high performance and low computational complexity, there has recently been a migration from traditional handcrafting towards deep learning. In this work, a 2DCNN is extended to a multimodal architecture (MM2DCNN) by introducing scene flow fields as the input for an additional stream. Model outputs are then integrated in a late fusion fashion. Furthermore, this work also analyzes the impact of video summarization on action recognition models. To this end, four different summarization techniques have been applied and compared to uniform random selection. Video summarization algorithms aim to select the most discriminative frames of each video, producing keyframe sequences as a result. Each of these methods has been applied to the two types of data available, extracting keyframe sequences from RGB and depth videos separately. On top of that, we also perform a novel hybrid summarization, namely RGB-D synopsis, by combining the results from both sequences. Finally, we evaluate and compare the results of each modality on three state-of-the-art action datasets, applying late fusion to every summarization modality as well as to uniform random selection. Experimental results show that our new representation improves accuracy in comparison to 2DCNNs. Moreover, the use of video summarization boosts the final performance when compared to random frame selection. |
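The abstract above rests on two simple computational ingredients: a keyframe selector (with uniform random selection as the baseline) and late fusion of per-stream class scores. The record itself contains no code, so the following Python sketch is purely illustrative; the function names, the equal-weight averaging rule, and the toy scores are assumptions, not the thesis implementation.

    import numpy as np

    def uniform_random_keyframes(num_frames, k, seed=0):
        # Baseline selector: k frame indices drawn uniformly at random, no repeats.
        rng = np.random.default_rng(seed)
        idx = rng.choice(num_frames, size=min(k, num_frames), replace=False)
        return np.sort(idx)

    def late_fusion(stream_scores, weights=None):
        # Late fusion: stack per-stream class-score vectors and average them
        # (optionally weighted), yielding one fused score vector per video.
        scores = np.stack(stream_scores)            # shape: (n_streams, n_classes)
        if weights is None:
            weights = np.full(len(stream_scores), 1.0 / len(stream_scores))
        return np.average(scores, axis=0, weights=weights)

    # Toy example: fuse softmax outputs of hypothetical RGB, depth and
    # scene-flow streams over three action classes.
    rgb   = np.array([0.70, 0.20, 0.10])
    depth = np.array([0.50, 0.35, 0.15])
    flow  = np.array([0.60, 0.25, 0.15])
    print(late_fusion([rgb, depth, flow]))          # fused class probabilities
    print(uniform_random_keyframes(300, k=16))      # 16 keyframe indices out of 300

The equal-weight average is only one plausible fusion rule; weighted averaging or score multiplication would slot into the same interface.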
dc.language.iso | eng |
dc.publisher | Universitat Politècnica de Catalunya |
dc.subject | Àrees temàtiques de la UPC::Informàtica |
dc.subject.lcsh | Machine learning |
dc.subject.lcsh | Computer vision |
dc.title | Multimodal 2DCNN action recognition from RGB-D data with video summarization |
dc.title.alternative | Deep learning-based temporal analysis of actions |
dc.type | Master thesis |
dc.subject.lemac | Aprenentatge automàtic |
dc.subject.lemac | Visió per ordinador |
dc.identifier.slug | 128469 |
dc.rights.access | Open Access |
dc.date.updated | 2017-11-06T07:55:40Z |
dc.audience.educationlevel | Màster |
dc.audience.mediator | Facultat d'Informàtica de Barcelona |
dc.audience.degree | MÀSTER UNIVERSITARI EN INTEL·LIGÈNCIA ARTIFICIAL (Pla 2017) |