A study of Deep Learning techniques for sequence-based problems
Cite as:
hdl:2117/361621
Document type: Official master's thesis
Date: 2021-10
Access conditions: Restricted access by decision of the author
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any applicable legal exemptions, its reproduction, distribution, public communication, or transformation without the authorization of the rights holder is prohibited.
Abstract
Transformer networks are a type of Deep Learning architecture first introduced in 2017. Using only attention mechanisms, the transformer can model relations between text sequences, outperforming earlier models on natural language processing tasks such as language translation. In this work, we explore the capability of the transformer architecture to model sub-sequences of a time series, and we use this model to produce forecasts over longer horizons. We implement a transformer network on a time series dataset describing the daily aggregated sales of Camper, a shoe and apparel company. The model aims to capture the relation between two sub-sequences of the series and to forecast a third sub-sequence in the future. We explore the different parts of the model and their relation to its performance, as well as the impact of modifying the shape of the input sequences used in training and inference. We use this model to forecast one year of data, and we compare these results with those of more classical approaches frequently used in time series forecasting, such as Autoregressive Integrated Moving Average (ARIMA) and Long Short-Term Memory (LSTM) networks. We further examine the model's capability to exploit other features of the dataset, such as sales descriptors and temporal features of the target. Finally, we look at the attention maps produced by the attention mechanism and discuss their capability to explain the forecasts the model produces. Our implementation shows that the model can exploit temporal features and produce forecasts that improve on the proposed benchmarks in most scenarios, and that the attention plots provide some explainability guidelines that could be explored further.
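The thesis PDF itself is restricted, so as a rough illustration of the approach the abstract describes, the following is a minimal sketch (not the author's code) of a transformer encoder applied to sub-sequences of a sales series to forecast a future window. It assumes PyTorch; the class `SalesTransformer`, its parameters, and the sequence lengths are all hypothetical, and positional encoding is omitted for brevity even though a real model would include it.

```python
# Hypothetical sketch of a transformer forecaster over time-series
# sub-sequences, in the spirit of the abstract. Not the thesis code.
import torch
import torch.nn as nn

class SalesTransformer(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_layers=2, horizon=28):
        super().__init__()
        self.input_proj = nn.Linear(1, d_model)  # scalar daily sales -> model dim
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, horizon)  # pooled context -> forecast window

    def forward(self, x):
        # x: (batch, seq_len, 1), e.g. two concatenated input sub-sequences
        h = self.encoder(self.input_proj(x))     # self-attention across time steps
        return self.head(h.mean(dim=1))          # (batch, horizon) future sub-sequence

model = SalesTransformer()
dummy = torch.randn(8, 56, 1)                   # 8 series, 56 daily values each
print(model(dummy).shape)                       # torch.Size([8, 28])
```

In this sketch the self-attention weights over the 56 input time steps are what could be visualized as attention maps; the abstract's actual model shapes, features, and training setup are only available in the restricted PDF.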
Degree: MÀSTER UNIVERSITARI EN INNOVACIÓ I RECERCA EN INFORMÀTICA (Pla 2012)
Files | Description | Size | Format | View
---|---|---|---|---
161075.pdf | | 10.11 MB | PDF | Restricted access