Active learning algorithms for multitopic classification
Estadístiques de LA Referencia / Recolecta
Inclou dades d'ús des de 2022
Cita com:
hdl:2117/360302
Tipus de documentProjecte Final de Màster Oficial
Data2021-07-08
Condicions d'accésAccés obert
Tots els drets reservats. Aquesta obra està protegida pels drets de propietat intel·lectual i
industrial corresponents. Sense perjudici de les exempcions legals existents, queda prohibida la seva
reproducció, distribució, comunicació pública o transformació sense l'autorització del titular dels drets
Abstract
In this master thesis we develop a model that surpasses previous studies to be able to detect cyberbullying and other disorders that are a common behaviour in teenagers. We analyze short sentences in social media with new techniques that haven?t been studied in depth in language processing in order to be able to detect these problems. Deep learning is nowadays the common approach for text analysis. However, struggling with dataset size is one of the most common problems. It is not optimal to dedicate thousands of hours to label data by humans every time we want to create a new model. Different techniques have been used over the years to solve or at least minimize this problem, for instance transfer learning or self-learning. One of the most known ways to solve this is by data augmentation. In this thesis we make use of active learning and self-training to address having restrictions of labelled data. We have used data that has not been labeled to improve the performance of our models. The architecture of the model is composed of a Bert model plus a linear layer that projects the Bert sentence embedding into the number of classes we want to detect. We take advantage of this already functional model to label new data that we will use afterwards to create our final model. Using noise techniques we modify the data so the final model has to predict less structured data and learn from difficult scenarios. Thanks to this technique we were able to improve the results in some of the classes, for instance the F-score modified increases by 7% for substance abuse (drugs, alcohol, etc) and 3% in disorders (anxiety, depression and distress) while keeping the performance of the other classes.
MatèriesDeep learning, Machine learning, Active learning, Aprenentatge profund, Aprenentatge automàtic, Aprenentatge actiu
TitulacióMÀSTER UNIVERSITARI EN TECNOLOGIES AVANÇADES DE TELECOMUNICACIÓ (Pla 2019)
Fitxers | Descripció | Mida | Format | Visualitza |
---|---|---|---|---|
Active learning ... i topic classification.pdf | 743,9Kb | Visualitza/Obre |