Show simple item record

dc.contributor: Escalera, Sergio
dc.contributor.author: Roig Ripoll, Vicent
dc.date.accessioned: 2018-05-27T08:48:19Z
dc.date.available: 2018-05-27T08:48:19Z
dc.date.issued: 2017-10
dc.identifier.uri: http://hdl.handle.net/2117/117559
dc.description.abstract: Human action recognition is nowadays among the most active computer vision research areas. The problem of action recognition is challenging due to large intra-class variations, low video resolution and the high dimensionality of video data, among other things. The recent development of affordable depth sensors such as Microsoft Kinect has opened new opportunities in this field by providing both RGB and depth data. Multimodal fusion in this scenario can greatly help to boost the performance of action recognition methods. Although handcrafted features are still widely used owing to their high performance and low computational complexity, there has recently been a migration from traditional handcrafting towards deep learning. In this work, the 2DCNN is extended to a multimodal model (MM2DCNN) by introducing scene flow fields as the input for an additional stream. The model outputs are then integrated in a late fusion fashion. Furthermore, this work also analyzes the impact of video summarization on action recognition models. To this end, four different summarization techniques have been applied and compared to uniform random selection. Video summarization algorithms aim to select the most discriminative frames of each video, producing keyframe sequences as a result. Each of these methods has been applied to the two types of data available, extracting keyframe sequences from RGB and depth videos separately. On top of that, we also perform a novel hybrid summarization, namely RGB-D synopsis, by combining the results from both sequences. Finally, we evaluate and compare the results of each modality on three state-of-the-art action datasets, integrating them with late fusion for every summarization sequence modality along with uniform random selection. Experimental results show that our new representation improves accuracy in comparison to 2DCNNs. Moreover, the use of video summarization boosts the final performance compared to random frame selection.
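
The abstract names two concrete mechanisms: late fusion of the per-stream model outputs and keyframe selection, with uniform random selection as the baseline. Below is a minimal Python sketch of both, purely for illustration; the function names, class count, and equal fusion weights are assumptions, not the thesis's actual implementation.

import numpy as np

NUM_CLASSES = 10  # illustrative; the actual datasets have their own class counts

def late_fusion(stream_scores, weights=None):
    """Combine per-stream class scores (e.g. softmax outputs of the RGB,
    depth and scene-flow streams) with a weighted average (late fusion)."""
    scores = np.stack(stream_scores)               # (n_streams, NUM_CLASSES)
    if weights is None:                            # assumed: equal weights
        weights = np.full(len(stream_scores), 1.0 / len(stream_scores))
    fused = np.average(scores, axis=0, weights=weights)
    return int(np.argmax(fused)), fused

def uniform_random_keyframes(num_frames, k, rng=None):
    """Baseline the thesis compares summarization against: pick k frame
    indices uniformly at random (sorted for readability)."""
    if rng is None:
        rng = np.random.default_rng(0)
    return np.sort(rng.choice(num_frames, size=k, replace=False))

# Example usage with dummy per-stream scores for one video clip.
rgb   = np.random.default_rng(1).dirichlet(np.ones(NUM_CLASSES))
depth = np.random.default_rng(2).dirichlet(np.ones(NUM_CLASSES))
flow  = np.random.default_rng(3).dirichlet(np.ones(NUM_CLASSES))
label, fused = late_fusion([rgb, depth, flow])
print("predicted class:", label)
print("keyframes:", uniform_random_keyframes(num_frames=120, k=16))

Weighted averaging is only one late-fusion choice; the exact integration scheme and summarization algorithms used in the thesis may differ.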
dc.language.iso: eng
dc.publisher: Universitat Politècnica de Catalunya
dc.subject: Àrees temàtiques de la UPC::Informàtica
dc.subject.lcsh: Machine learning
dc.subject.lcsh: Computer vision
dc.title: Multimodal 2DCNN action recognition from RGB-D data with video summarization
dc.title.alternative: Deep-learning based temporal analysis of actions
dc.type: Master thesis
dc.subject.lemac: Aprenentatge automàtic
dc.subject.lemac: Visió per ordinador
dc.identifier.slug: 128469
dc.rights.access: Open Access
dc.date.updated: 2017-11-06T07:55:40Z
dc.audience.educationlevel: Màster
dc.audience.mediator: Facultat d'Informàtica de Barcelona
dc.audience.degree: MÀSTER UNIVERSITARI EN INTEL·LIGÈNCIA ARTIFICIAL (Pla 2017)

