Video Object Segmentation by Tracking Structured Key Points and Contours

Caelles Prat, Sergi

thesis.pdf (38.60 MB)

Tutor / director: Pont Tuset, Jordi

Carried out at/with: Eidgenössische Technische Hochschule Zürich

Document type: Master's thesis

Date: 2016-08

Access conditions: Open access

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication, or transformation is prohibited without the authorization of the rights holder.

Abstract

In this thesis, we tackle the problem of video object segmentation, in which every pixel of every frame in a video sequence must be classified as either background or foreground. Our algorithms fall in the semi-supervised category, i.e., they start with the object of interest annotated in the first frame and then track and segment that object in the following frames. The first algorithm that we have implemented describes the object of interest as a set of points distributed over the object and then tracks them in the following frames. To make the tracking robust, we require the spatial distribution of these points to remain stable across frames. To do so, we place a mesh over the object mask: its vertices are the interest points to track, and its edges define the spatial structure among them. We then compute an appearance descriptor for each point and look for the displacement that takes it, in the following frame, to a location with a similar descriptor. We enforce that the displacements of neighboring points are similar, which favors coherent deformations of the object. This algorithm may experience difficulties at the contours of the object, as the point descriptors there might be influenced by the background.
To overcome this problem, our second algorithm tracks the contour of the object by imposing smooth deformations between frames. Starting from a polygonal representation of the object contour, we look for the locations in the following frame that have a strong edge-detector response while minimizing the deformation of the shape. Specifically, we build a multiscale pyramid of segments of the contour polygon and look for the displacement of every segment that matches the edge response while remaining coherent with the rest of the elements of the pyramid.
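
As a rough illustration of the mesh-based tracking described above, the following Python sketch picks one displacement per mesh vertex by trading off a descriptor-matching cost against a penalty on differing displacements of neighboring vertices. It is a sketch under stated assumptions, not the thesis implementation: the function name `track_mesh_points`, the precomputed cost matrix `unary`, the squared-difference smoothness term, the weight `lam`, and the use of iterated conditional modes (ICM) as the optimizer are all hypothetical choices made for illustration. The abstract does not say how the thesis computes descriptors or solves the optimization.

```python
import numpy as np


def track_mesh_points(unary, edges, candidates, lam=1.0, n_iters=10):
    """Choose one candidate displacement per mesh vertex (illustrative only).

    unary:      (P, K) array; unary[p, k] is an assumed descriptor distance for
                vertex p when moved by candidate displacement k.
    edges:      iterable of (i, j) vertex index pairs, the mesh edges.
    candidates: (K, 2) array of candidate displacements (dx, dy).
    lam:        assumed weight of the smoothness term.
    Returns a (P,) integer array of indices into `candidates`.
    """
    unary = np.asarray(unary, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    n_points = unary.shape[0]

    # Adjacency list built from the mesh edges.
    neighbors = [[] for _ in range(n_points)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # pairwise[a, b] = squared distance between candidate displacements a and b.
    diff = candidates[:, None, :] - candidates[None, :, :]
    pairwise = (diff ** 2).sum(axis=2)

    # Start from the best descriptor match of each vertex taken independently.
    labels = unary.argmin(axis=1)

    # ICM: update one vertex at a time while its neighbors stay fixed.
    for _ in range(n_iters):
        changed = False
        for p in range(n_points):
            cost = unary[p].copy()
            for q in neighbors[p]:
                cost += lam * pairwise[:, labels[q]]
            best = int(cost.argmin())
            if best != labels[p]:
                labels[p] = best
                changed = True
        if not changed:
            break
    return labels
```

With `lam=0` every vertex simply keeps its best individual descriptor match. A small positive `lam` pulls a vertex with an ambiguous match towards the displacement chosen by its mesh neighbors. ICM only reaches a local minimum and is sensitive to the weight and the visiting order, so it stands in here for whatever optimizer a full implementation would use.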
The contour algorithm can be seen as complementary to the mesh algorithm, since it might fail on objects with low-contrast contours or cluttered backgrounds. As an overall trade-off, we propose a combination of the two algorithms that tries to make the most of each and compensate for their weaknesses. To validate our approaches, we perform an extensive evaluation on DAVIS, a recently published database that provides fifty sequences with ground truth annotated in every frame. We sweep the parameters of the algorithms to achieve the best performance on this database. The results show that the contour algorithm outperforms the mesh algorithm, so the weaknesses described above are more prominent in the mesh algorithm. For the combination, although we have not been able to run a full search of the parameter space, the results obtained are promising and suggest that a broader search would outperform either standalone method. We also compare against six state-of-the-art algorithms; although we are still behind the best-performing ones, our approach might become competitive with further tuning and experimentation.
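
The abstract does not state which measure the validation uses. A common choice for this benchmark is region similarity, i.e. the Jaccard index (intersection over union) between the predicted and ground-truth masks, averaged over the frames of a sequence. The sketch below assumes that measure. The function names and the convention of scoring two empty masks as 1.0 are assumptions made for illustration.

```python
import numpy as np


def jaccard(pred_mask, gt_mask):
    """Intersection over union of two binary foreground masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def mean_jaccard(pred_masks, gt_masks):
    """Average the per-frame Jaccard index over one sequence."""
    scores = [jaccard(p, g) for p, g in zip(pred_masks, gt_masks)]
    return float(np.mean(scores))
```

A parameter sweep of the kind described above would then compare `mean_jaccard` values across parameter settings and keep the best-scoring one.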

Subjects: Digital video, Computer vision, Machine learning, Machine learning -- Algorithms

Degree: Master's degree in Telecommunications Engineering (2013 curriculum)

URI: http://hdl.handle.net/2117/90646

Collections

Official master's degrees - Master's degree in Telecommunications Engineering (MET) [393]

UPCommons. The UPC's open knowledge portal
