Application of data mining technology to analyze and predict academic performance
Tutor / director / evaluator: Talavera Mendez, Luis José
Document type: Bachelor thesis
Rights access: Open Access
This project attempts to predict the performance of students in their third semester (the first semester of their second year) of the bachelor’s degree in Industrial Technology Engineering, based on their marks in their first year. To do so, two models are used, a decision tree and a random forest, together with four evaluation metrics: accuracy, recall, precision and F1. Of these, F1 is the most important because it combines precision and recall into a single score and therefore gives the most balanced picture of how a model is performing. The objective is to evaluate and compare the two models, and to study how the parameters of each model affect its results. The decision tree uses two parameters, Max_Depth and Min_Sample_Split, while the random forest uses four: the same two plus N_Estimators and Max_Features. Of particular interest is how the two decision tree parameters affect the random forest. Finally, the project finds the combination of parameters that yields the best F1 for each model. These results are compared to establish whether either model would be worth implementing by the teachers of the third-semester subjects.
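As an illustration of the four evaluation metrics named above, the following minimal sketch computes them from true and predicted labels. It assumes a binary target (1 = the student passes, 0 = the student fails), which the abstract does not state. The function name `classification_metrics` and the sample labels are hypothetical and are not taken from the thesis.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with respect to the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Each ratio falls back to 0.0 when its denominator is zero.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels (assumption): 1 = pass, 0 = fail.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

On these made-up labels the accuracy is 0.70 while F1 is roughly 0.57, because the classifier misses half of the positive cases. A gap of this kind is one reason to rank models by F1 rather than by accuracy alone.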