Word Level Approach for Tweets Classification based on its Content
Cite as:
hdl:2117/170142
Tutor / director: Saito, Hiroaki
Carried out at/with: Keiō Gijuku Daigaku
Document type: Official Master's Final Project
Date: 2018
Access conditions: Open access
Except where otherwise noted, the contents of this work are subject to the
Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain
Abstract
Twitter has become the largest microblogging platform, where users interact
with each other by expressing opinions, thoughts and feelings on any topic
or news source in a compressed 280-character message called a tweet. Hashtags
are popular keywords used to label these tweets according to their content. This
work investigates whether the use of hashtags to label tweets with similar content
is accurate enough. To do so, tweets from different popular hashtags were
retrieved and processed to obtain a dataset whose content is as close to
reality as possible. Several embedding methods and learning algorithms were
studied to classify tweets from different hashtags based on their content. Results
showed that the best performance is achieved with the Tf-idf embedding
method and a support vector machine. The learning algorithm obtained a precision
of around 90% for classification on 10 classes and above 70% on 100
classes, trained on datasets of only 13680 and 143067 samples respectively. The
results also indicated that the BoW and Tf-idf methods outperformed methods that
are state of the art for other natural language processing tasks, such as GloVe or
Word2Vec.
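
To make the Tf-idf representation mentioned in the abstract more concrete, the following is a minimal pure-Python sketch of how a tweet could be turned into a weighted word vector. It is not the thesis's implementation: the toy tweets, the whitespace tokenizer, the choice to drop hashtag tokens, the smoothed idf formula and all function names are assumptions made for illustration. The classifier step (a support vector machine in the abstract) is left out.

```python
import math
from collections import Counter

# Hypothetical toy tweets; the thesis dataset is not reproduced here.
tweets = [
    "great goal in the final tonight #football",
    "the striker scored a late goal #football",
    "new phone camera review is out #tech",
    "this phone has a great battery #tech",
]

def tokenize(tweet):
    # Assumption of this sketch: lowercase, split on whitespace, and drop
    # hashtag tokens so the label does not leak into the features.
    return [tok for tok in tweet.lower().split() if not tok.startswith("#")]

def fit_idf(docs):
    # Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1.
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1.0
            for term, df in doc_freq.items()}

def tfidf_vector(doc, idf):
    # Term frequency times idf, then L2 normalisation (sparse dict vector).
    counts = Counter(tok for tok in doc if tok in idf)
    vec = {term: count * idf[term] for term, count in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {term: w / norm for term, w in vec.items()} if norm else {}

def cosine(a, b):
    # Both vectors are already unit length, so the dot product is the cosine.
    return sum(w * b.get(term, 0.0) for term, w in a.items())

docs = [tokenize(t) for t in tweets]
idf = fit_idf(docs)
vectors = [tfidf_vector(doc, idf) for doc in docs]

if __name__ == "__main__":
    print("same hashtag:     ", round(cosine(vectors[0], vectors[1]), 3))
    print("different hashtag:", round(cosine(vectors[0], vectors[2]), 3))
```

In this toy example, words that occur in fewer tweets receive a higher weight, and the two tweets from the same hashtag share weighted terms while tweets from different hashtags share none. Vectors of this kind are what a learning algorithm would then take as input.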
Files | Description | Size | Format | View |
---|---|---|---|---|
Thesis_Final.pdf | | 2,326Mb | | View/Open |