Efficient transformers for direct speech translation
dc.contributor | Ruiz Costa-Jussà, Marta |
dc.contributor | Gallego Olsina, Gerard Ion |
dc.contributor.author | Alastruey Lasheras, Belén |
dc.contributor.other | Universitat Politècnica de Catalunya. Departament de Ciències de la Computació |
dc.date.accessioned | 2021-07-14T12:19:19Z |
dc.date.available | 2021-07-14T12:19:19Z |
dc.date.issued | 2021-07 |
dc.identifier.uri | http://hdl.handle.net/2117/349294 |
dc.description.abstract | In this thesis, we propose a new approach to Speech-to-Text translation in which an efficient Transformer processes the spectrogram directly, without convolutional layers before the Transformer. This lets the encoder learn directly from the spectrogram so that no information is lost, which we believe could be beneficial. We built an encoder-decoder model whose encoder is an efficient Transformer (the Longformer) and whose decoder is a standard Transformer decoder. We first trained the model on an Automatic Speech Recognition (ASR) task and then on Speech Translation, using the ASR pre-trained encoder. Our results are close to those obtained with convolutional layers and a regular Transformer, with a relative performance reduction of less than 10%, which makes this a good starting point for a promising research path. |
dc.language.iso | eng |
dc.publisher | Universitat Politècnica de Catalunya |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/es/ |
dc.subject | UPC subject areas::Mathematics and statistics |
dc.subject | UPC subject areas::Computer science::Artificial intelligence |
dc.subject.lcsh | Artificial intelligence |
dc.subject.other | Deep Learning |
dc.subject.other | Transformer |
dc.subject.other | Speech-to-Text |
dc.subject.other | Speech Translation |
dc.subject.other | Machine Translation |
dc.subject.other | Neural Network |
dc.subject.other | Automatic Speech Recognition |
dc.title | Efficient transformers for direct speech translation |
dc.type | Bachelor thesis |
dc.subject.lemac | Artificial intelligence |
dc.subject.ams | AMS Classification::68 Computer science::68T Artificial intelligence |
dc.identifier.slug | FME-2152 |
dc.rights.access | Open Access |
dc.date.updated | 2021-07-14T05:22:35Z |
dc.audience.educationlevel | Bachelor's degree |
dc.audience.mediator | Universitat Politècnica de Catalunya. Facultat de Matemàtiques i Estadística |
dc.audience.degree | Bachelor's Degree in Mathematics (2009 Curriculum) |