Cultural event recognition with visual ConvNets and temporal models

Salvador Aguilera, Amaia; Manchon Vizuete, Daniel; Calafell, Andrea; Giró Nieto, Xavier; Zeppelzauer, Matthias

Open Access version from the Computer Vision Foundation (1,174Mb)

Document type: Conference paper

Publication date: 2015

Access conditions: Open access

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, its reproduction, distribution, public communication or transformation is prohibited without the authorization of the rights holder.

Abstract

This paper presents our contribution to the ChaLearn Challenge 2015 on Cultural Event Classification. The challenge in this task is to automatically classify images from 50 different cultural events. Our solution combines visual features extracted from convolutional neural networks with temporal information, using a hierarchical classifier scheme. We extract visual features from the last three fully connected layers of both CaffeNet (pretrained with ImageNet) and our version fine-tuned for the ChaLearn challenge. We propose a late fusion strategy that trains a separate low-level SVM on each of the extracted neural codes. The class predictions of the low-level SVMs form the input to a higher-level SVM, which gives the final event scores. We achieve our best result by adding a temporal refinement step to our classification scheme, applied directly to the output of each low-level SVM. Our approach penalizes high classification scores based on visual features when the image's timestamp does not match well with an event-specific temporal distribution learned from the training and validation data. Our system achieved the second-best result in the ChaLearn Challenge 2015 on Cultural Event Classification, with a mean average precision of 0.767 on the test set.
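
The temporal refinement step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the penalty formula, so the month-level histogram, the additive smoothing, the multiplicative weighting and all names below (learn_month_distribution, refine_scores, event_a, event_b) are assumptions made for illustration. The sketch also assumes non-negative visual scores, such as probabilities, and it leaves out the SVM fusion stages.

```python
from collections import Counter
from datetime import date


def learn_month_distribution(dates, smoothing=1.0):
    """Estimate P(month | event) from training dates, with additive smoothing."""
    counts = Counter(d.month for d in dates)
    total = len(dates) + 12 * smoothing
    return [(counts.get(m, 0) + smoothing) / total for m in range(1, 13)]


def refine_scores(scores, taken_on, distributions):
    """Scale each event's visual score by how typical the photo's month is."""
    refined = {}
    for event, score in scores.items():
        dist = distributions[event]
        # Weight lies in (0, 1]; it equals 1.0 in the event's most frequent month.
        weight = dist[taken_on.month - 1] / max(dist)
        refined[event] = score * weight
    return refined


# Hypothetical training dates for two placeholder events (illustrative only).
train_dates = {
    "event_a": [date(2012, 2, 10), date(2013, 2, 14), date(2014, 3, 1)],
    "event_b": [date(2012, 9, 22), date(2013, 10, 1), date(2014, 9, 30)],
}
distributions = {
    event: learn_month_distribution(dates) for event, dates in train_dates.items()
}

# Assumed visual scores for one image taken in late September.
visual_scores = {"event_a": 0.8, "event_b": 0.7}
refined = refine_scores(visual_scores, date(2015, 9, 25), distributions)
```

In this toy setup the higher visual score for event_a is penalized because none of its training images fall in September, so event_b ranks first after refinement.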

Citation: Salvador, A., Manchon, D., Calafell, A., Giro, X., Zeppelzauer, M. Cultural event recognition with visual ConvNets and temporal models. In: Challenge and Workshop on Pose Recovery, Action Recognition, and Cultural Event Recognition. "Proceedings of ChaLearn 2015: Challenge and Workshop on Pose Recovery, Action Recognition, and Cultural Event Recognition". Boston: 2015, p. 36-44.

URI: http://hdl.handle.net/2117/81947

Publisher's version: http://www.cv-foundation.org/openaccess/CVPR2015_workshops/menu.py

Files	Description	Size	Format	View
Salvador_Cultur ... nition_2015_CVPR_paper.pdf	Open Access version from the Computer Vision Foundation	1,174Mb	PDF	View/Open
