Word Level Approach for Tweets Classification based on its Content

Centellas Gil, Victor

Thesis_Final.pdf (2,326Mb)

Tutor / director: Saito, Hiroaki

Carried out at/with: Keiō Gijuku Daigaku

Document type: Official Master's Final Project

Date: 2018

Access conditions: Open access

Attribution-NonCommercial-NoDerivs 3.0 Spain

Except where otherwise noted, the contents of this work are subject to the Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain

Abstract

Twitter has become the largest microblogging platform, where users interact with each other by expressing opinions, thoughts and feelings about any topic or source of news in a compressed 280-character message called a tweet. Hashtags are popular keywords used to label these tweets according to their content. This work tries to find out whether the usage of hashtags to label tweets with similar content is accurate enough. To do so, tweets from different popular hashtags have been retrieved and processed in order to build a dataset with content as close to reality as possible. Several embedding methods and learning algorithms have been studied to classify tweets from different hashtags based on their content. Results showed that the best performance is achieved when using the Tf-idf embedding method and a support vector machine. The learning algorithm obtained a precision of around 90% for classification on 10 classes and above 70% when dealing with 100 classes, trained on datasets of only 13,680 and 143,067 samples respectively. The results also indicated that the BoW and Tf-idf methods outperformed methods that are state of the art for other natural language processing tasks, such as GloVe or Word2Vec.
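As a rough illustration of the Tf-idf representation named in the abstract, the following is a minimal sketch only. It assumes whitespace tokenization, a smoothed idf variant and L2 normalisation. The function name `tfidf_vectors` and the toy tweets are hypothetical and are not taken from the thesis, which may use different preprocessing and weighting. In the approach described, vectors of this kind would then be passed to a classifier such as a support vector machine, which is not shown here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised tf-idf vectors) for tokenised documents."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    # Smoothed idf (an assumed variant): ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


# Hypothetical toy tweets, not data from the thesis.
tweets = [
    "great goal in the match tonight",
    "the match ended with a late goal",
    "new phone launch event tonight",
]
docs = [t.lower().split() for t in tweets]
vocab, vectors = tfidf_vectors(docs)
```

Terms that occur in fewer tweets receive a larger weight, so two tweets that share several uncommon words end up with a higher cosine similarity than tweets that share only one.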

Subjects: Online social networks

URI: http://hdl.handle.net/2117/170142

Collections

Official master's degrees - Master's degree in Automatic Control and Robotics [215]

UPCommons. UPC open knowledge portal
