Spatio-Temporal networks for few-shot video segmentation with annotation guidance
Cite as: hdl:2117/370658
Tutor / director: Giorgos Tolias
Carried out at/with: České vysoké učení technické v Praze
Document type: Bachelor's thesis
Date: 2022-06-26
Access conditions: Open access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to the existing legal exemptions, its reproduction, distribution, public communication or transformation without the authorization of the rights holder is prohibited.
Abstract
This project addresses video semantic segmentation guided by user-provided annotations that indicate the underlying semantic classes. The current paradigm for segmentation methods and benchmark datasets is to segment objects in a video given a single annotation in the first frame. We extend this setup to multiple annotated data points; specifically, two scenarios are proposed: having two annotated frames, and having pixel-level annotations. For each of these settings, solutions inspired by active learning are explored to offer guidance on which data, whether frames or pixels, should be selected for annotation. To achieve this, we rely on previous work on spatio-temporal networks for video object segmentation, a current state-of-the-art approach. For each approach, the inference procedure is adapted so that the new data can be exploited. Finally, different selection criteria are explored based on prediction confidence and uncertainty. When applying a selection criterion to choose which frame to annotate, performance improves considerably: we reach up to 89% segmentation performance on the DAVIS benchmark. When dealing with pixels, results do not increase as much, reaching over 87% on DAVIS17 when annotating around 100 to 200 pixels. Comparing both methods, we see that in some cases annotating pixels is preferable, considering the trade-off between annotation cost and the segmentation improvement obtained.
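The abstract does not specify the exact selection criterion, but a common uncertainty-based choice is to annotate the frame whose softmax predictions have the highest mean per-pixel entropy. A minimal sketch of that idea (function names and NumPy-based interface are illustrative assumptions, not the thesis implementation):

```python
import numpy as np

def frame_entropy(probs: np.ndarray) -> float:
    """Mean per-pixel entropy of a softmax prediction map.

    probs: array of shape (H, W, C) holding per-pixel class probabilities.
    Higher values mean the model is less confident about this frame.
    """
    eps = 1e-12  # avoid log(0)
    per_pixel = -np.sum(probs * np.log(probs + eps), axis=-1)  # (H, W)
    return float(per_pixel.mean())

def select_frame_to_annotate(video_probs: list[np.ndarray]) -> int:
    """Return the index of the most uncertain frame in the video."""
    scores = [frame_entropy(p) for p in video_probs]
    return int(np.argmax(scores))
```

The same score can be computed per pixel instead of per frame, which would support the pixel-annotation scenario: rank all pixels by entropy and annotate the top 100-200.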
Degree: Bachelor's Degree in Data Science and Engineering (2017 curriculum)
Files | Description | Size | Format
---|---|---|---
170739.pdf |  | 6,960Mb |