Food related scene recognition in egocentric images
Tutor / director / evaluator: Radeva, Petia
Document type: Master thesis
Rights access: Open Access
Lifelogging is a growing field, driven by the normalization of devices that collect data from our daily routines. Egocentric cameras are particularly interesting devices because they capture very rich information about the life of the wearer, including their social interactions, activities and the contexts in which they spend the day. Context, or scene, is one of the things that influences us most in almost every aspect of our lives, and it is also one of the most challenging things to log, analyze and visualize with an automatic device. Among all kinds of contexts, one of the most important is the one related to food. We are what we eat, and we eat depending on where we are. To keep track of a person's relation with food-related environments, we propose a deep learning based approach to food-related scene recognition in images gathered from an egocentric camera. We explore the problem in detail and propose an optimal framework for food-related environment recognition. Moreover, we introduce a new egocentric dataset called Egoplaces, which contains over 60,000 labeled images distributed in 28 categories, corresponding to 27 food-related scenes and one non-food-related category, and we propose several techniques to automatically classify the environment the user is seeing. We had to face several challenges, including a small number of images, images with a narrow field of view and noise, and, in particular, a very unbalanced dataset. We propose several techniques to deal with these challenges, using deep convolutional networks for classification and varying the training strategy. We explore learning incrementally by running several training iterations and introducing new categories in each one, starting with the most frequent labels. We also propose a hierarchical learning strategy that exploits the semantic relations among the labels and learns from less specific to more specific categories.
We explore the possibility of applying Bayesian inference when doing hierarchical classification. Finally, we propose introducing repeated images into our dataset to mitigate the class imbalance, and a post-classification smoothing technique based on the K-Nearest Neighbours algorithm that exploits the fact that egocentric images arrive as a sequence.
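The post-classification smoothing idea can be illustrated with a minimal sketch. The abstract does not give the exact formulation used in the thesis, so this example makes two assumptions: "nearest neighbours" means the frames closest in time to the current one, and the smoothed label is chosen by majority vote. The function name `smooth_labels` and the parameter `k` are hypothetical and introduced only for illustration.

```python
from collections import Counter


def smooth_labels(labels, k=2):
    """Smooth a sequence of per-frame predicted labels.

    Assumption (not taken from the thesis): each frame's label is replaced
    by the majority label among the frame itself and up to k temporal
    neighbours on each side. On a tie, the frame keeps its own label if it
    is among the tied winners; otherwise the tied label seen first in the
    window is used.
    """
    n = len(labels)
    smoothed = []
    for i in range(n):
        # Temporal window, clipped at the sequence boundaries.
        window = labels[max(0, i - k):min(n, i + k + 1)]
        counts = Counter(window)
        best = max(counts.values())
        winners = [label for label, count in counts.items() if count == best]
        smoothed.append(labels[i] if labels[i] in winners else winners[0])
    return smoothed


# Hypothetical predictions for eight consecutive frames: the isolated
# "bar" prediction is surrounded by "kitchen" frames.
predictions = ["kitchen", "kitchen", "bar", "kitchen", "kitchen",
               "restaurant", "restaurant", "restaurant"]
print(smooth_labels(predictions, k=1))
```

With `k=1`, the isolated "bar" prediction is outvoted by its two "kitchen" neighbours, while the genuine transition from "kitchen" to "restaurant" is preserved. A larger `k` smooths more aggressively, at the risk of erasing short but real scene changes.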
In collaboration with the Universitat de Barcelona (UB)