Video understanding through the disentanglement of appearance and motion

Arenas Gallego, Carlos

dc.contributor	Palacio, Sebastian
dc.contributor	Giró Nieto, Xavier
dc.contributor	Campos, Víctor
dc.contributor.author	Arenas Gallego, Carlos
dc.contributor.other	Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
dc.date.accessioned	2019-01-17T11:20:19Z
dc.date.available	2019-01-17T11:20:19Z
dc.date.issued	2018-10
dc.identifier.uri	http://hdl.handle.net/2117/127080
dc.description	Self-supervised feature learning from video.
dc.description.abstract	Understanding the inner workings of deep learning algorithms is key to efficiently exploiting the large number of videos generated every day. Several types of algorithms based on convolutional neural networks (CNNs) with an auto-encoder style architecture exist for the self-supervised learning of the spatio-temporal information contained in these videos. However, we have found that models of this type, trained for the frame prediction task, learn this spatio-temporal information jointly, so the model cannot recognize appearance-motion combinations not seen during training. Our proposed model, called DisNet, learns appearance and motion separately through disentanglement, so that it solves the generalization and scalability problems. To demonstrate this, we conducted numerous experiments under highly controlled conditions, generating specific datasets that make the "conventional" model fail at the appearance and motion classification tasks, and analyzing how well our proposal behaves under the same conditions.
dc.description.abstract	(Translated from Spanish) Understanding how deep learning algorithms work is key to efficiently exploiting the large number of videos generated every day. For the self-supervised learning of the spatio-temporal information contained in videos, various types of algorithms based on convolutional neural networks (CNNs) with an auto-encoder style architecture are used. However, we have verified that models of this type, trained for the frame prediction task, learn this spatio-temporal information in a combined way, so the model cannot recognize appearance-motion combinations not seen during training. Our proposed model, named DisNet, is able to learn appearance and motion separately through disentanglement, so that it solves the generalization and scalability problem. To demonstrate this, we carried out numerous experiments under highly controlled conditions, generating specific datasets that make the "conventional" model fail at the appearance and motion classification task, and analyzing how well our proposal behaves under the same conditions.
dc.language.iso	eng
dc.publisher	Universitat Politècnica de Catalunya
dc.rights	S'autoritza la difusió de l'obra mitjançant la llicència Creative Commons o similar 'Reconeixement-NoComercial- SenseObraDerivada'
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.subject	Àrees temàtiques de la UPC::Enginyeria de la telecomunicació
dc.subject.lcsh	Neural networks (Computer science)
dc.subject.lcsh	Machine learning
dc.subject.other	deep learning
dc.subject.other	convolutional neural networks
dc.subject.other	auto-encoders
dc.subject.other	motion
dc.subject.other	aprendizaje profundo
dc.subject.other	redes neuronales convolucionales
dc.subject.other	disentanglement
dc.subject.other	movimiento
dc.subject.other	appearance
dc.subject.other	apariencia
dc.title	Video understanding through the disentanglement of appearance and motion
dc.title.alternative	Comprensión de vídeo a través del desenredo de apariencia y movimiento
dc.type	Master thesis
dc.subject.lemac	Xarxes neuronals (Informàtica)
dc.subject.lemac	Aprenentatge automàtic
dc.identifier.slug	ETSETB-230.135871
dc.rights.access	Open Access
dc.date.updated	2018-10-26T05:50:30Z
dc.audience.educationlevel	Màster
dc.audience.mediator	Escola Tècnica Superior d'Enginyeria de Telecomunicació de Barcelona
dc.audience.degree	MÀSTER UNIVERSITARI EN ENGINYERIA DE TELECOMUNICACIÓ (Pla 2013)

Files in this item

Name: TFM_Carlos_Arenas_Gallego(MET_ ...
Size: 1,056Mb
Format: PDF

This item appears in the following collections

Master's degree in Telecommunications Engineering (MET) [392]

UPCommons. UPC open knowledge portal