Merging datasets for emotion analysis
Document typeConference report
PublisherInstitute of Electrical and Electronics Engineers (IEEE)
Rights accessOpen Access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder
ProjectDESARROLLO, OPERATIVA Y GOBERNANZA DE DATOS PARA SISTEMAS SOFTWARE BASADOS EN APRENDIZAJE AUTOMATICO (AEI-PID2020-117191RB-I00)
Context. Applying sentiment analysis is in general a laborious task. Furthermore, if we add the task of getting a good quality dataset with balanced distribution and enough samples, the job becomes more complicated. Objective. We want to find out whether merging compatible datasets improves emotion analysis based on machine learning (ML) techniques, compared to the original, individual datasets. Method. We obtained two datasets with Covid-19-related tweets written in Spanish, and then built from them two new datasets combining the original ones with different consolidation of balance. We analyzed the results according to precision, recall, F1-score and accuracy. Results. The results obtained show that merging two datasets can improve the performance of ML models, particularly the F1-score, when the merging process follows a strategy that optimizes the balance of the resulting dataset. Conclusions. Merging two datasets can improve the performance of ML models for emotion analysis, whilst saving resources for labeling training data. This might be especially useful for several software engineering activities that leverage on ML-based emotion analysis techniques.
CitationDe Arriba, A.; Oriol, M.; Franch, X. Merging datasets for emotion analysis. A: International Workshop on Software Engineering Automation: A Natural Language Perspective. "2021 36th IEEE/ACM International Conference on Automated Software Engineering Workshops: 15-19 November 2021, online event: proceedings". Institute of Electrical and Electronics Engineers (IEEE), 2021, p. 227-231. ISBN 978-1-6654-3583-3. DOI 10.1109/ASEW52652.2021.00051.