Action recognition based on efficient deep feature learning in the spatio-temporal domain

Husain, Syed Farzad; Dellen, Babette; Torras, Carme

doi:10.1109/LRA.2016.2529686

dc.contributor.author	Husain, Syed Farzad
dc.contributor.author	Dellen, Babette
dc.contributor.author	Torras, Carme
dc.contributor.other	Institut de Robòtica i Informàtica Industrial
dc.date.accessioned	2017-04-21T13:12:57Z
dc.date.available	2017-04-21T13:12:57Z
dc.date.issued	2016
dc.identifier.citation	Husain, S., Dellen, B., Torras, C. Action recognition based on efficient deep feature learning in the spatio-temporal domain. "IEEE robotics and automation letters", 2016, vol. 1, no. 2, p. 984-991.
dc.identifier.issn	2377-3766
dc.identifier.uri	http://hdl.handle.net/2117/103626
dc.description	© 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
dc.description.abstract	Hand-crafted feature functions are usually designed based on the domain knowledge of a presumably controlled environment and often fail to generalize, as the statistics of real-world data cannot always be modeled correctly. Data-driven feature learning methods, on the other hand, have emerged as an alternative that often generalizes better in uncontrolled environments. We present a simple, yet robust, 2D convolutional neural network extended to a concatenated 3D network that learns to extract features from the spatio-temporal domain of raw video data. The resulting network model is used for content-based recognition of videos. Relying on a 2D convolutional neural network allows us to exploit a pretrained network as a descriptor that yielded the best results on the large and challenging ILSVRC-2014 dataset. Experimental results on commonly used benchmarking video datasets demonstrate that our method is state-of-the-art in terms of accuracy and computational time without requiring any preprocessing (e.g., optic flow) or a priori knowledge on data capture (e.g., camera motion estimation), which makes it more general and flexible than other approaches. Our implementation is made available.
dc.format.extent	8 p.
dc.language.iso	eng
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Spain
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.subject	UPC subject areas::Computer science::Automation and control
dc.subject.other	Computer vision for automation
dc.subject.other	recognition
dc.subject.other	visual learning
dc.subject.other	artificial intelligence
dc.subject.other	computer vision
dc.subject.other	pattern classification
dc.title	Action recognition based on efficient deep feature learning in the spatio-temporal domain
dc.type	Article
dc.contributor.group	Universitat Politècnica de Catalunya. ROBiri - Grup de Robòtica de l'IRI
dc.identifier.doi	10.1109/LRA.2016.2529686
dc.description.peerreviewed	Peer Reviewed
dc.subject.inspec	INSPEC classification::Pattern recognition::Computer vision
dc.subject.inspec	INSPEC classification::Pattern recognition
dc.relation.publisherversion	http://ieeexplore.ieee.org/document/7406684/
dc.rights.access	Open Access
local.identifier.drac	19160450
dc.description.version	Postprint (author's final draft)
local.citation.author	Husain, S.; Dellen, B.; Torras, C.
local.citation.publicationName	IEEE robotics and automation letters
local.citation.volume	1
local.citation.number	2
local.citation.startingPage	984
local.citation.endingPage	991

Files in this item

Name:: 1756-Action-Recognition-based- ...
Size:: 4.227 MB
Format:: PDF

This item appears in the following collections

Journal articles [163]
Journal articles [376]

UPCommons. Open knowledge portal of the UPC