Sentiment analysis on Twitter
Cite as:
hdl:2117/100796
Carried out at/with: ServiZurich
Document type: Official Master's Final Project
Date: 2017-02
Access conditions: Open access
All rights reserved. This work is protected by the corresponding intellectual and industrial
property rights. Without prejudice to any existing legal exemptions, its reproduction,
distribution, public communication or transformation without the authorization of the rights holder is prohibited.
Abstract
In recent years more and more people have been joining Social Networks, and Twitter is
one of the most widely used. The huge amount of information they generate is attracting
the interest of companies, in part because it can be used to detect public opinion about
their brands and thus improve their business value.
Transforming the information present in the Social Networks into knowledge requires
several steps. This project aims to describe them and to provide tools that are
able to perform this task.
The first problem is how to retrieve the data. Several ways are available, each with
its own pros and cons. After that, it is necessary to study and define proper queries
to retrieve the information needed.
Once the data is retrieved, it may need to be filtered and explored. For this task
a Topic Model Algorithm, Latent Dirichlet Allocation (LDA), has been studied and analyzed.
LDA has shown positive results when it is properly tuned and combined with appropriate
visualization techniques. The difference between a Topic Model Algorithm and other
Clustering/Segmentation techniques is that a Topic Model allows each “document”
(instance) to belong to more than one topic (cluster).
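
The contrast between the two kinds of assignment can be sketched in a few lines of Python. This is only an illustration: the topic proportions, the tweet identifiers, the function names and the 0.2 threshold are all invented here and are not taken from the thesis or from any LDA run.

```python
# Hypothetical per-document topic proportions, of the kind a topic model
# such as LDA might output (numbers invented for illustration).
doc_topic = {
    "tweet_a": [0.70, 0.25, 0.05],
    "tweet_b": [0.10, 0.45, 0.45],
}


def hard_cluster(proportions):
    """Clustering-style assignment: keep only the single best topic index."""
    return max(range(len(proportions)), key=lambda k: proportions[k])


def soft_topics(proportions, threshold=0.2):
    """Topic-model-style assignment: keep every topic at or above a threshold."""
    return [k for k, p in enumerate(proportions) if p >= threshold]


for name, proportions in doc_topic.items():
    print(name, "hard:", hard_cluster(proportions), "soft:", soft_topics(proportions))
```

With a hard assignment each tweet ends up in exactly one group, whereas the soft assignment lets a tweet that mixes two themes appear under both.
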
LDA does not natively work well on Twitter because tweets are very short. A review
of the literature has revealed a solution to this problem. Another problem,
common in clustering, is how to validate the Algorithm and how to choose the
proper number of topics (clusters); for this, several metrics from the literature
have been explored.
Afterwards, Sentiment Analysis techniques can be applied in order to measure the opinion
of the users. The literature presents several approaches to solving this
problem. This work focuses on the Polarity Detection task with three classes,
that is, classifying whether a tweet expresses a positive, a negative or a neutral sentiment.
Reaching accurate results here can be challenging, due to the messy nature of Twitter posts.
Several approaches have been tested and compared. The baseline method is the
use of sentiment dictionaries. After that, since the real sentiment of the Twitter posts
is not available, a sample has been manually labeled, and several Supervised approaches
combined with various Feature Selection/Transformation techniques have been tested.
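
As a rough illustration of what a dictionary-based baseline for three-class polarity can look like, here is a minimal Python sketch. The lexicon, its scores, the tokenization rule and the function name `polarity` are assumptions made for this example; the dictionaries and scoring rules actually used in the thesis are not described in this abstract.

```python
import re

# Toy sentiment lexicon: hypothetical words and scores, for illustration only.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "awful": -2, "hate": -2}


def polarity(tweet, lexicon=LEXICON):
    """Classify a tweet as positive, negative or neutral by summing word scores."""
    # Lowercase and keep alphabetic tokens; real tweets would need more care
    # (hashtags, mentions, emoticons, negation).
    tokens = re.findall(r"[a-z']+", tweet.lower())
    score = sum(lexicon.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for text in ["I love this phone, great battery",
             "Awful service, I hate waiting",
             "Flight lands at 9pm"]:
    print(text, "->", polarity(text))
```

A baseline of this kind needs no labeled data, but it ignores context such as negation and sarcasm, which may be one reason to compare it against supervised models.
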
Finally, a new experimental approach, inspired by the Soft Labeling technique
present in the literature, has been defined and tested. This method tries to avoid the
costly task of manually labeling a sample in order to validate a model. In the literature
this problem is solved for the two-class case, that is, by considering only positive and
negative tweets. This work tries to extend the soft-labeling approach to the three-class
problem.
Degree: MÀSTER UNIVERSITARI EN INNOVACIÓ I RECERCA EN INFORMÀTICA (Pla 2012)
Files | Description | Size | Format | View |
---|---|---|---|---|
123565.pdf | | 2,911Mb | | View/Open |