The temporal dimension of visual attention models
Tutor / director / evaluator: Giró Nieto, Xavier
Document type: Bachelor thesis
Rights access: Open Access
This thesis explores methodologies for scanpath prediction on images using deep learning frameworks. As a preliminary step, we analyze the characteristics of the data provided by different datasets. We then explore the use of Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks for scanpath prediction. We observe that these models fail because of the highly stochastic nature of the data. With the insight gained, we propose a novel time-aware visual saliency representation named Saliency Volume, which averages scanpaths over multiple observers. Next, we explore the SalNet network, adapt it for saliency volume prediction, and find several ways of generating scanpaths from saliency volumes. Finally, we fine-tune our model for scanpath prediction on 360-degree images and successfully submit it to the Salient360! Challenge at ICME. The source code and models are publicly available at https://github.com/massens/saliency-360salient-2017.
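To make the idea of a time-aware saliency representation concrete, the following is a minimal, hypothetical sketch of a saliency volume and of one simple way to read a scanpath back out of it. It is not the thesis implementation. The fixation format `(x, y, t_start, t_end)`, the equal-width time bins, the function names `saliency_volume` and `scanpath_from_volume`, and the per-bin argmax sampling are all assumptions made for illustration; a practical version would likely also smooth the volume spatially and temporally.

```python
import math
from typing import List, Tuple

# Assumed fixation format: (x, y, t_start, t_end) with pixel coordinates and times in seconds.
Fixation = Tuple[int, int, float, float]
Volume = List[List[List[float]]]  # indexed as volume[t][y][x]


def saliency_volume(scanpaths: List[List[Fixation]], width: int, height: int,
                    duration: float, n_bins: int) -> Volume:
    """Average per-time-bin fixation counts over observers (hypothetical sketch).

    Time is split into n_bins equal bins covering [0, duration). Each fixation
    adds 1 to its pixel in every bin that its half-open interval [t_start, t_end)
    overlaps. No spatial or temporal blurring is applied here.
    """
    if not scanpaths:
        raise ValueError("need at least one observer scanpath")
    bin_len = duration / n_bins
    volume = [[[0.0] * width for _ in range(height)] for _ in range(n_bins)]
    for path in scanpaths:
        for x, y, t0, t1 in path:
            first = max(0, int(t0 // bin_len))
            last = min(n_bins - 1, math.ceil(t1 / bin_len) - 1)
            for t in range(first, last + 1):
                volume[t][y][x] += 1.0
    # Divide by the number of observers so each voxel is an average count.
    n_obs = len(scanpaths)
    for frame in volume:
        for row in frame:
            for x in range(width):
                row[x] /= n_obs
    return volume


def scanpath_from_volume(volume: Volume) -> List[Tuple[int, int, int]]:
    """Return (x, y, time_bin) for the most salient pixel of each non-empty bin.

    This greedy argmax is only one assumed strategy; ties go to the first
    pixel in row-major order, because max() returns the first maximal element.
    """
    path = []
    for t, frame in enumerate(volume):
        coords = ((x, y) for y in range(len(frame)) for x in range(len(frame[0])))
        bx, by = max(coords, key=lambda p: frame[p[1]][p[0]])
        if frame[by][bx] > 0:
            path.append((bx, by, t))
    return path


if __name__ == "__main__":
    observers = [
        [(1, 0, 0.0, 1.0), (2, 1, 1.0, 3.0)],
        [(1, 0, 0.0, 2.0)],
    ]
    vol = saliency_volume(observers, width=3, height=2, duration=3.0, n_bins=3)
    print(scanpath_from_volume(vol))
```

With the two toy observers above, both fixate pixel (1, 0) at the start, so that pixel carries the full average of 1.0 in the first time bin, while later bins split the mass between the two locations. Keeping time as an explicit axis is what lets a single averaged representation still encode *when* regions tend to be fixated, which a conventional 2D saliency map discards.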
Details of the project will be defined once the student is in Dublin.