Modeling long-term interactions to enhance action recognition
dc.contributor.author | Cartas Ayala, Alejandro |
dc.contributor.author | Radeva, Petia |
dc.contributor.author | Dimiccoli, Mariella |
dc.contributor.other | Institut de Robòtica i Informàtica Industrial |
dc.date.accessioned | 2021-09-14T09:28:40Z |
dc.date.available | 2021-09-14T09:28:40Z |
dc.date.issued | 2021 |
dc.identifier.citation | Cartas, A.; Radeva, P.; Dimiccoli, M. Modeling long-term interactions to enhance action recognition. In: International Conference on Pattern Recognition. "Proceedings of ICPR 2020: 25th International Conference on Pattern Recognition: Milan, 10–15 January 2021". Institute of Electrical and Electronics Engineers (IEEE), 2021, p. 10351-10358. ISBN 978-1-7281-8808-9. DOI 10.1109/ICPR48806.2021.9412148. |
dc.identifier.isbn | 978-1-7281-8808-9 |
dc.identifier.uri | http://hdl.handle.net/2117/351241 |
dc.description | © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
dc.description.abstract | In this paper, we propose a new approach to understanding actions in egocentric videos that exploits the semantics of object interactions at both the frame and temporal levels. At the frame level, we use a region-based approach that takes as input a primary region roughly corresponding to the user's hands and a set of secondary regions potentially corresponding to the interacting objects, and calculates the action score through a CNN formulation. This information is then fed to a Hierarchical Long Short-Term Memory Network (HLSTM) that captures temporal dependencies between actions within and across shots. Ablation studies thoroughly validate the proposed approach, showing in particular that both levels of the HLSTM architecture contribute to the performance improvement. Furthermore, quantitative comparisons show that the proposed approach outperforms the state of the art in action recognition on standard benchmarks, without relying on motion information. |
dc.description.sponsorship | This work was partially supported by CONACYT grant 366596, TIN2018-095232-B-C21, SGR-2017 1742, the Nestore project of the European Commission Horizon 2020 programme (Grant No. 769643), the Validithi EIT Health program, the CERCA Programme/Generalitat de Catalunya, MINECO/ERDF-EU through the Ramon y Cajal program, and projects PID2019-110977GA-I00 and RED2018-102511-T. We thank NVIDIA Corporation for its support through a hardware donation. |
dc.format.extent | 8 p. |
dc.language.iso | eng |
dc.publisher | Institute of Electrical and Electronics Engineers (IEEE) |
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 Spain |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/es/ |
dc.subject | UPC subject areas::Computer science::Automation and control |
dc.subject.other | Pattern recognition |
dc.title | Modeling long-term interactions to enhance action recognition |
dc.type | Conference report |
dc.contributor.group | Universitat Politècnica de Catalunya. ROBiri - Grup de Robòtica de l'IRI |
dc.identifier.doi | 10.1109/ICPR48806.2021.9412148 |
dc.description.peerreviewed | Peer Reviewed |
dc.subject.inspec | INSPEC classification::Pattern recognition |
dc.relation.publisherversion | https://ieeexplore.ieee.org/document/9412148/ |
dc.rights.access | Open Access |
local.identifier.drac | 31844658 |
dc.description.version | Postprint (author's final draft) |
local.citation.author | Cartas, A.; Radeva, P.; Dimiccoli, M. |
local.citation.contributor | International Conference on Pattern Recognition |
local.citation.publicationName | Proceedings of ICPR 2020: 25th International Conference on Pattern Recognition: Milan, 10–15 January 2021 |
local.citation.startingPage | 10351 |
local.citation.endingPage | 10358 |
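The dc.description.abstract field describes a two-stage pipeline: frame-level action scores computed from a primary (hand) region and a set of secondary (candidate object) regions, followed by a Hierarchical LSTM that models dependencies between actions within and across shots. The sketch below illustrates only the data flow of such a pipeline, under loudly stated assumptions. All function names are hypothetical, and the region scores are given as inputs rather than produced by a CNN. The fusion rule (primary score plus the per-class maximum over secondary regions, in the style of R*CNN) is assumed rather than taken from the paper, and a parameter-free tanh recurrence stands in for the learned LSTM layers. It is not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = List[float]


def frame_action_scores(primary: Vector, secondary: Sequence[Vector]) -> Vector:
    """Fuse per-class scores of the primary (hand) region with those of the
    secondary (candidate object) regions. ASSUMPTION: R*CNN-style fusion,
    i.e. primary score plus the per-class max over secondary regions."""
    if not secondary:
        return list(primary)
    best = [max(region[c] for region in secondary) for c in range(len(primary))]
    return [p + b for p, b in zip(primary, best)]


def recurrent_pass(inputs: Sequence[Vector], decay: float = 0.5) -> List[Vector]:
    """Stand-in recurrence h_t = tanh(x_t + decay * h_{t-1}), used INSTEAD of
    an LSTM so the sketch has no dependencies. `inputs` must be non-empty."""
    h = [0.0] * len(inputs[0])
    states: List[Vector] = []
    for x in inputs:
        h = [math.tanh(xi + decay * hi) for xi, hi in zip(x, h)]
        states.append(h)
    return states


def hierarchical_predict(shots: Sequence[Sequence[Vector]]) -> List[List[int]]:
    """Two-level recurrence over a video split into shots (each shot is a
    non-empty list of frame score vectors). The lower level runs over frames
    within a shot; the upper level runs over one summary per shot (its last
    lower-level state), so each shot's context also reflects earlier shots."""
    lower = [recurrent_pass(shot) for shot in shots]
    upper = recurrent_pass([states[-1] for states in lower])
    predictions: List[List[int]] = []
    for states, context in zip(lower, upper):
        shot_preds = []
        for h in states:
            fused = [a + b for a, b in zip(h, context)]
            # Predicted action = index of the highest fused score.
            shot_preds.append(max(range(len(fused)), key=fused.__getitem__))
        predictions.append(shot_preds)
    return predictions


if __name__ == "__main__":
    # Toy example with three hypothetical action classes.
    frame = frame_action_scores([2.0, 0.1, 0.0], [[0.0, 0.5, 0.0], [0.3, 0.2, 0.1]])
    print(frame)
    print(hierarchical_predict([[[3.0, 0.0, 0.0]], [[0.0, 3.0, 0.0]]]))  # [[0], [1]]
```

The split into a within-shot pass and an across-shot pass mirrors the two HLSTM levels named in the abstract. In a real model both levels would be trained recurrent layers over CNN features rather than fixed recurrences over raw scores.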