Text summarization of online hotel reviews with sentiment analysis
View/Open
163268.pdf (5.915 MB) (Restricted access)
Cite as:
hdl:2117/361216
Document type: Master's thesis
Date: 2021-10-16
Access conditions: Restricted access by decision of the author
All rights reserved. This work is protected by the corresponding intellectual and industrial
property rights. Without prejudice to any existing legal exemptions, its reproduction, distribution,
public communication or transformation is prohibited without the authorization of the rights holder.
Abstract
The aim of this thesis is to create a system that summarizes positive and negative property reviews. To achieve this, we propose an extractive summarization system that produces two summaries: one for the positive reviews and another for the negative ones. A classification system separates the reviews by sentiment and feeds the positive and negative reviews to the summarization system. To pursue this objective, we studied the different NLP methods, along with their pros and cons, and concluded that transformers, and more specifically the combination of the BERT and GPT-2 architectures, would be the best approach. To obtain the TripAdvisor data shown on the StayForLong website, we crawled both StayForLong and TripAdvisor. The resulting data comprised over 80,000 reviews covering more than 175 properties, which we pre-processed, cleaned and tokenized in order to work with BERT for the sentiment analysis and GPT-2 for the summarization. We then carried out an extensive analysis of the impact of the variables. Finally, we fine-tuned each model so that it performed as well as possible. We evaluated the two systems separately: the binary sentiment classification system, based on multi-modal BERT, reached a precision of 96%, and for the GPT-2 summarization system we applied the ROUGE-F1 metric, where we obtained an average of 57.5%.
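As a rough illustration of the ROUGE-F1 metric mentioned in the abstract, the sketch below computes a unigram-overlap (ROUGE-1) F1 score in plain Python. The abstract does not say which ROUGE variant, tokenizer or tooling the thesis used, so the choice of ROUGE-1, the whitespace tokenization, the function name `rouge1_f1` and the example sentences are all assumptions made here for illustration only.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate summary and a reference.

    Assumption: lowercase whitespace tokenization; the thesis may have
    used a different tokenizer or ROUGE variant.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped matches: each word counts at most as often as it
    # appears in both texts.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical example strings, not taken from the thesis data.
reference = "the room was clean and the staff were friendly"
candidate = "the room was clean and quiet"
score = rouge1_f1(candidate, reference)
print(f"ROUGE-1 F1: {score:.3f}")
```

Averaging such a score over all generated summaries would give a single figure comparable in kind to the reported 57.5%, though not necessarily computed the same way.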
Degree: MASTER'S DEGREE IN INNOVATION AND RESEARCH IN INFORMATICS (Syllabus 2012)
Files | Description | Size | Format | View |
---|---|---|---|---|
163268.pdf | | 5.915 MB | PDF | Restricted access |