Efficient transformers for direct speech translation
Cite as:
hdl:2117/349294
Document type: Bachelor's thesis (Treball Final de Grau)
Date: 2021-07
Access conditions: Open access
Unless otherwise indicated, the contents of this work are subject to the Creative Commons license: Attribution-NonCommercial-NoDerivatives 3.0 Spain
Abstract
In this thesis, we propose a new approach to speech-to-text translation in which an efficient Transformer lets us work directly on the spectrogram, without convolutional layers in front of the Transformer. The encoder therefore learns directly from the spectrogram and no information is lost, which we believe could be beneficial. We built an encoder-decoder model whose encoder is an efficient Transformer (the Longformer) and whose decoder is a standard Transformer decoder. We first trained the model on an Automatic Speech Recognition (ASR) task, and then on Speech Translation using the ASR-pretrained encoder. Our results are close to those obtained with convolutional layers and a regular Transformer, with a relative performance drop of less than 10%, making this a solid starting point for a promising research direction.
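The reason a Longformer encoder can consume spectrogram frames directly is its sliding-window attention: each position attends only to a fixed-size neighbourhood, so the cost grows as O(n·w) instead of the O(n²) of full self-attention over thousands of audio frames. As a rough illustration only (not the thesis code), here is a minimal single-head NumPy sketch of that local-attention idea; the sequence length, feature dimension, and window size below are illustrative assumptions, and Longformer details such as global attention tokens and multi-head projections are omitted:

```python
import numpy as np

def sliding_window_attention(q, k, v, window):
    """Each query position i attends only to keys in [i - window, i + window].

    q, k, v: arrays of shape (seq_len, dim).
    Cost is O(seq_len * window * dim) rather than O(seq_len^2 * dim).
    """
    n, d = q.shape
    out = np.zeros_like(v)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        # scaled dot-product scores against the local neighbourhood only
        scores = q[i] @ k[lo:hi].T / np.sqrt(d)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[i] = weights @ v[lo:hi]
    return out

# Illustrative sizes: ~1500 spectrogram frames projected to a 64-dim space.
rng = np.random.default_rng(0)
x = rng.standard_normal((1500, 64))
y = sliding_window_attention(x, x, x, window=32)
print(y.shape)  # (1500, 64)
```

When the window covers the whole sequence, this reduces to ordinary softmax attention; shrinking the window is what makes long spectrogram inputs tractable without a convolutional front-end.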
Degree: Bachelor's Degree in Mathematics (2009 plan)
Files | Description | Size | Format
---|---|---|---
memoria.pdf | | 1,822Mb | |