
dc.contributor: Belanche Muñoz, Luis Antonio
dc.contributor.author: Morán Pomés, David
dc.contributor.other: Universitat Politècnica de Catalunya. Departament de Ciències de la Computació
dc.date.accessioned: 2021-02-22T09:40:07Z
dc.date.available: 2021-02-22T09:40:07Z
dc.date.issued: 2020-10-22
dc.identifier.uri: http://hdl.handle.net/2117/340266
dc.description.abstract: Kernel methods (KM) and Artificial Neural Networks (ANN) are two of the most widely used families of methods in Machine Learning today, with the Support Vector Machine (SVM) and the Deep Learning Neural Network (DLNN) as their best-known members. One of the main strengths of both families is that they can be adapted to work with different kinds of input data, although they do so in different ways: KM rely on different kernel functions, while ANN rely on specific network architectures. In Natural Language Processing (NLP), both families have dedicated resources for handling text inputs: KM rely on specific kernels such as the string kernel, whereas ANN rely on Recurrent Neural Network (RNN) architectures and, more recently, on Transformer architectures. Although KM were the first to be applied to NLP, ANN are nowadays the state of the art in all kinds of NLP tasks. The main reason for this success is that an ANN acts as a nonlinear parametric feature extractor, processing the text in a complex way that lets the model learn an adequate representation of the input. Kernel functions, on the other hand, rely only on fixed similarity measures to capture that information, which is less flexible and less powerful.

This is why we developed a new kernel function, called the Transformer kernel. This kernel function is a parametric model like an ANN, but instead of computing a nonlinear representation of the input, it directly computes a similarity measure that corresponds to an inner product in some implicit feature space. The main idea of this model is that it can be pre-trained on a generic task using the whole Wikipedia corpus (like the different Transformer models for NLP tasks), and the resulting model can then be used directly as a kernel function in any kernel method (such as the SVM). To do so, we build the model by stacking two components: a Transformer Encoder block (based on the BERT architecture) and a newly designed kernel function called the Attention Kernel.

Using this kernel function on real text datasets, we have shown that it yields results that are much better than those obtained with the string kernel. We have also seen that, because our model is built as a Neural Network and can be executed on a GPU, the Transformer kernel can compute the kernel matrix of a dataset much faster than the classical string kernel. The results are very promising and suggest that further study of the behaviour of this kernel function with different pre-training datasets and different downstream tasks will be very interesting. It also remains to be explored whether this kernel can be adapted to combine the text input with other kinds of data (such as numerical or categorical variables), and whether the kernel can be fine-tuned on a specific task before it is used as a kernel function. A minimal illustrative sketch of this pipeline follows the record below.
dc.language.iso: eng
dc.publisher: Universitat Politècnica de Catalunya
dc.subject: Àrees temàtiques de la UPC::Enginyeria de la telecomunicació::Telemàtica i xarxes d'ordinadors
dc.subject.lcsh: Kernel functions
dc.subject.lcsh: Neural networks (Computer science)
dc.subject.other: Transformers
dc.subject.other: Kernel Methods
dc.subject.other: Neural Networks
dc.subject.other: Kernel Functions
dc.subject.other: NLP
dc.subject.other: Attention
dc.title: Transformer-based kernels for Natural Language Processing
dc.title.alternative: Transformer-based kernels for NLP
dc.type: Master thesis
dc.subject.lemac: Kernel, Funcions de
dc.subject.lemac: Xarxes neuronals (Informàtica)
dc.identifier.slug: 152729
dc.rights.access: Open Access
dc.date.updated: 2020-11-05T05:00:32Z
dc.audience.educationlevel: Màster
dc.audience.mediator: Facultat d'Informàtica de Barcelona
dc.audience.degree: MÀSTER UNIVERSITARI EN INNOVACIÓ I RECERCA EN INFORMÀTICA (Pla 2012)
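
The abstract describes a three-step pipeline: a BERT-based Transformer Encoder produces contextual token embeddings, an Attention Kernel turns a pair of embedding sequences into a single similarity value, and the resulting kernel matrix is passed to a standard kernel method such as an SVM. The Python sketch below only illustrates that flow under stated assumptions: it uses a HuggingFace `bert-base-uncased` encoder and a hypothetical stand-in `attention_kernel` (a softmax-weighted aggregation of token-to-token dot products); the actual Attention Kernel designed in the thesis is not reproduced here.

```python
import numpy as np
import torch
from sklearn.svm import SVC
from transformers import AutoModel, AutoTokenizer

# Hypothetical stand-in for the thesis's Attention Kernel: it aggregates
# pairwise token similarities with an attention-style softmax weighting
# so that two texts are mapped to a single similarity score.
def attention_kernel(E1: torch.Tensor, E2: torch.Tensor) -> float:
    # E1: (n_tokens_1, d), E2: (n_tokens_2, d) contextual token embeddings
    scores = E1 @ E2.T / E1.shape[-1] ** 0.5           # scaled dot products
    weights = torch.softmax(scores.flatten(), dim=0)   # attention-style weights
    return float((weights * scores.flatten()).sum())

# Transformer Encoder block (assumed: a pre-trained BERT from HuggingFace).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

@torch.no_grad()
def encode(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    return encoder(**inputs).last_hidden_state[0]      # (n_tokens, d)

def gram_matrix(texts_a, texts_b) -> np.ndarray:
    embs_a = [encode(t) for t in texts_a]
    embs_b = [encode(t) for t in texts_b]
    return np.array([[attention_kernel(a, b) for b in embs_b] for a in embs_a])

# Usage with an SVM through a precomputed kernel matrix, as in the abstract.
X_train = ["the film was wonderful", "a dreadful, boring movie"]
y_train = [1, 0]
K_train = gram_matrix(X_train, X_train)
svm = SVC(kernel="precomputed").fit(K_train, y_train)

X_test = ["an absolutely wonderful movie"]
K_test = gram_matrix(X_test, X_train)                  # shape: (n_test, n_train)
print(svm.predict(K_test))
```

Because both the encoder and the similarity computation are ordinary tensor operations, the Gram matrix can be evaluated in batches on a GPU, which is the source of the speed-up over the classical string kernel mentioned in the abstract.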

