Performance evaluation of different machine learning methods applied on churn database
Cite as:
hdl:2117/375453
Tutor / director: Pijuan Casanova, Pau
Document type: Official Master's Final Project
Date: 2022-10-31
Access conditions: Open access
Except where otherwise noted, the contents of this work are subject to the Creative Commons license:
Attribution-NonCommercial-NoDerivs 3.0 Spain
Abstract
The growth of data and its storage is becoming more important every day. However, this information is sometimes gathered but never used, or it is gathered improperly, which makes extracting insights difficult. As a result, when beginning any project, designing the data collection strategy is just as crucial as choosing the analysis method.
Most of the time, we focus only on the analysis of the data and do not consider how it was gathered, or whether the fields were actually valuable or just added noise to what we were searching for. For this reason, a trustworthy data set has been chosen for this project. The data came from a telecom company, which, like other modern businesses, collects a lot of data. In this case, the data was published on Kaggle, a machine learning competition website, where participants competed to build the best model to predict consumer behaviour. One of the key considerations in optimizing any organization's income is preventing customer churn. Churn, also referred to as customer attrition, happens when customers stop using a company's goods or services. The main goal of this master's thesis is to analyse a churn database and classify the clients in order to determine whether they are likely to leave the company. To do this, two machine learning techniques are used: Extreme Gradient Boosting and Random Forest. In order to achieve high performance, the Random Forest (RF) method creates a large number of low-performance models and combines them. In this case, the low-performance model is the Decision Tree, which is explained in more detail in the thesis. eXtreme Gradient Boosting (XGB) works similarly, although it builds each new model based on the results of the earlier ones. Both are quite effective predictive models, even with unbalanced data, as the thesis demonstrates; unbalanced data adds another level of complexity that the algorithms must overcome to perform effectively. Different performance indicators are presented and examined in order to determine which one is the best indicator for selecting the best model. Sensitivity, Specificity, Precision, F1 Score, and Geometric Mean are among the indicators considered.
Additionally, the trends of these indicators across the various parameter values of the examined models are shown and analysed. This thesis once more supports the strong performance of these machine learning algorithms, and affirms the significance and practical use of these methodologies for understanding processes and behaviours, as in the case of this project. All fields can benefit from the information gleaned, and a successful application will undoubtedly yield financial rewards. Finally, the default and best models of the two machine learning algorithms are shown, and their advantages and disadvantages are evaluated taking into account the different scenarios that may arise. This thesis demonstrates the good performance of both models, with XGB significantly outperforming RF. It also demonstrates that XGB performs better on precision, whereas RF has better results on sensitivity.
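The performance indicators named in the abstract can all be derived from the four counts of a binary confusion matrix. The following is a minimal sketch, not the thesis's own code: it assumes churn is the positive class, uses the common definition of Geometric Mean as the square root of sensitivity times specificity (the thesis may define it differently), and the function name `classification_metrics` and the example counts are hypothetical.

```python
from math import sqrt


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Compute indicators from binary confusion-matrix counts.

    tp / fn: churners predicted as churn / as non-churn (assumed positive class).
    tn / fp: non-churners predicted as non-churn / as churn.
    """
    sensitivity = _ratio(tp, tp + fn)   # share of real churners that are caught
    specificity = _ratio(tn, tn + fp)   # share of non-churners correctly kept
    precision = _ratio(tp, tp + fp)     # share of churn predictions that are right
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    g_mean = sqrt(sensitivity * specificity)  # assumed definition, see lead-in
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "g_mean": g_mean,
    }


# Hypothetical, made-up counts for an unbalanced data set (100 churners in 1000 clients).
metrics = classification_metrics(tp=60, fp=20, tn=880, fn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With unbalanced data, sensitivity and precision can diverge noticeably even when specificity is high, which is why the choice of indicator matters when selecting between models.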
Subjects: Automatic classification -- Software -- Design and construction, Machine learning -- Evaluation -- Mathematical models
Degree: MASTER'S DEGREE IN INDUSTRIAL ENGINEERING (2014 curriculum)
Files | Description | Size | Format | View |
---|---|---|---|---|
memoria-rodriguez-acoran.docx.pdf | | 2.682Mb | | View/Open |