dc.contributor.author | Xu, Zhengyu |
dc.contributor.author | Vilaplana Besler, Verónica |
dc.contributor.author | Morros Rubió, Josep Ramon |
dc.contributor.other | Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions |
dc.date.accessioned | 2019-02-04T07:48:06Z |
dc.date.available | 2019-02-04T07:48:06Z |
dc.date.issued | 2018 |
dc.identifier.citation | Xu, Z.; Vilaplana, V.; Morros, J.R. Action tube extraction based 3D-CNN for RGB-D action recognition. In: International Workshop on Content-Based Multimedia Indexing. "16th International Conference on Content-Based Multimedia Indexing: 4-6 September, 2018 La Rochelle, France". Institute of Electrical and Electronics Engineers (IEEE), 2018, p. 1-6. |
dc.identifier.isbn | 978-1-5386-7021-7 |
dc.identifier.uri | http://hdl.handle.net/2117/128191 |
dc.description.abstract | In this paper we propose a novel action tube extractor for RGB-D action recognition in trimmed videos. The action tube extractor takes a video as input and outputs an action tube. The method consists of two parts: spatial tube extraction and temporal sampling. The first part is built upon MobileNet-SSD, and its role is to define the spatial region where the action takes place. The second part is based on the structural similarity index (SSIM) and is designed to remove frames without obvious motion from the primary action tube. The final extracted action tube has two benefits: 1) a higher ratio of ROI (the subjects performing the action) to background; 2) most of its frames contain obvious motion changes. We propose to use a two-stream (RGB and Depth) I3D architecture as our 3D-CNN model. Our approach outperforms state-of-the-art methods on the OA and NTU RGB-D datasets. © 2018 IEEE. |
dc.format.extent | 6 p. |
dc.language.iso | eng |
dc.publisher | Institute of Electrical and Electronics Engineers (IEEE) |
dc.subject | UPC subject areas::Telecommunication engineering::Signal processing::Image and video signal processing |
dc.subject | UPC subject areas::Sound, image and multimedia::Multimedia creation::Digital video |
dc.subject.lcsh | Digital video |
dc.subject.lcsh | 3-D video (Three-dimensional imaging) |
dc.subject.lcsh | Image processing--Digital techniques |
dc.subject.other | 3D-CNN |
dc.subject.other | action recognition |
dc.subject.other | action tube extraction |
dc.subject.other | indexing (of information) |
dc.subject.other | CNN models |
dc.subject.other | spatial regions |
dc.subject.other | state-of-the-art methods |
dc.subject.other | structural similarity indices (SSIM) |
dc.subject.other | temporal sampling |
dc.subject.other | two-stream |
dc.subject.other | tubes (components) |
dc.title | Action tube extraction based 3D-CNN for RGB-D action recognition |
dc.type | Conference report |
dc.subject.lemac | Digital video |
dc.subject.lemac | Three-dimensional visualization (Computer science) |
dc.subject.lemac | Images -- Processing -- Digital techniques |
dc.contributor.group | Universitat Politècnica de Catalunya. GPI - Grup de Processament d'Imatge i Vídeo |
dc.identifier.doi | 10.1109/CBMI.2018.8516450 |
dc.description.peerreviewed | Peer Reviewed |
dc.rights.access | Open Access |
local.identifier.drac | 23551422 |
dc.description.version | Postprint (published version) |
local.citation.author | Xu, Z.; Vilaplana, V.; Morros, J.R. |
local.citation.contributor | International Workshop on Content-Based Multimedia Indexing |
local.citation.publicationName | 16th International Conference on Content-Based Multimedia Indexing: 4-6 September, 2018 La Rochelle, France |
local.citation.startingPage | 1 |
local.citation.endingPage | 6 |
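The two-part extractor summarized in the abstract (a spatial tube followed by SSIM-based temporal sampling) can be illustrated with a minimal sketch. It rests on several assumptions that the record does not confirm: per-frame person boxes are taken as already given (in the paper they come from MobileNet-SSD, which is not run here); the spatial tube is assumed to be the union of those boxes; SSIM is computed as a single global window over grayscale frames rather than the usual Gaussian-windowed SSIM; and a frame is kept only when its SSIM against the last kept frame falls below a hypothetical threshold of 0.95. The names `global_ssim`, `union_box`, and `extract_action_tube` are illustrative, not the authors' code.

```python
import numpy as np


def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM over whole frames (a simplification of windowed SSIM)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den


def union_box(boxes):
    """Smallest (x0, y0, x1, y1) box covering every per-frame detection (assumed tube rule)."""
    b = np.asarray(boxes)
    return int(b[:, 0].min()), int(b[:, 1].min()), int(b[:, 2].max()), int(b[:, 3].max())


def extract_action_tube(frames, boxes, ssim_threshold=0.95):
    """frames: HxW uint8 grayscale arrays; boxes: per-frame (x0, y0, x1, y1) detections.

    Returns the cropped frames that were kept and their original indices.
    """
    # Spatial tube: crop every frame to the same union box.
    x0, y0, x1, y1 = union_box(boxes)
    cropped = [f[y0:y1, x0:x1] for f in frames]
    # Temporal sampling: drop frames too similar to the last kept frame
    # (hypothetical criterion; the paper's exact rule is not given in this record).
    kept = [0]
    for i in range(1, len(cropped)):
        if global_ssim(cropped[kept[-1]], cropped[i]) < ssim_threshold:
            kept.append(i)
    return [cropped[i] for i in kept], kept


# Usage: two static frames, then a visible change, then another static frame.
rng = np.random.default_rng(0)
base = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
moved = np.roll(base, 5, axis=1)
frames = [base, base.copy(), moved, moved.copy()]
boxes = [(8, 8, 56, 56)] * 4
tube, kept = extract_action_tube(frames, boxes)
print(kept)  # [0, 2]
```

Comparing each frame with the last kept frame, rather than only with its immediate predecessor, prevents slow drifts of many small changes from being discarded entirely; whether the original method does this is not stated here.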