Predicting the outcome of a chess game by statistical and machine learning techniques
Cite as:
hdl:2117/106389
Document type: Official master's final project
Date: 2016-10
Access conditions: Open access
All rights reserved. This work is protected by the corresponding intellectual and industrial
property rights. Without prejudice to existing legal exemptions, its reproduction, distribution,
public communication or transformation without the authorization of the rights holder is prohibited.
Abstract
This document walks through a Big Data analytics process that combines computer
science, data warehousing and applied statistics. We aim to predict the result of chess
matches after twenty full moves. To do this, we are constrained to work with the complete
database that was provided at the start of this project.
The Gorgo Base [12] consists of around three million matches and comes in an unknown database
format. Once we were able to read it, we were confronted with its size: the database can
overwhelm any computer that tries to run many operations on it at the same time, and this
was one major challenge to overcome. As with any database of this size, we had to spend a
significant amount of resources filtering out missing and faulty data.
To process this database we had to tokenize it and separate it into chunks we could actually
compute on, and then we started aggregating and filtering data. Aggregating data is an important
part of any dataset creation; using the whole database we were able, for example, to capture the
average Elo of all the players we found. We also generated the score of every board, later used
to predict game results. At this step we generated our train and test files, assigning 70%
of the data to training and 30% to testing.
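
The chunked aggregation and the 70/30 split described above can be sketched roughly as follows. This is a minimal illustration, not the thesis code: the record fields (`white`, `black`, `white_elo`, `black_elo`), the chunk size and the random seed are all assumptions made for the example.

```python
import random
from collections import defaultdict
from itertools import islice


def chunks(records, size):
    """Yield successive lists of at most `size` records."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def average_elo(games, chunk_size=2):
    """Accumulate per-player rating sums chunk by chunk, then average them."""
    totals = defaultdict(lambda: [0, 0])  # player -> [rating sum, rated games]
    for chunk in chunks(games, chunk_size):
        for g in chunk:
            for player, elo in ((g["white"], g["white_elo"]),
                                (g["black"], g["black_elo"])):
                if elo is None:  # filter out missing ratings
                    continue
                totals[player][0] += elo
                totals[player][1] += 1
    return {p: s / n for p, (s, n) in totals.items()}


def train_test_split(games, train_fraction=0.7, seed=0):
    """Shuffle a copy of the games and cut it into train and test parts."""
    shuffled = list(games)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical records; the field names are assumptions for illustration.
games = [
    {"white": "A", "black": "B", "white_elo": 2400, "black_elo": 2300},
    {"white": "B", "black": "A", "white_elo": 2310, "black_elo": 2420},
    {"white": "A", "black": "C", "white_elo": 2410, "black_elo": None},
]
averages = average_elo(games)
train, test = train_test_split(games)
```

Keeping only running sums and counts per player means that no chunk has to stay in memory after it has been processed, which is the point of chunking a database of this size.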
One final challenge was to collect all the information on the board positions. This was challenging
because we wanted to keep a record of the historical results for every game in our
database, and to do this we had to compare and add up results; in the end we
recorded thirteen million historical board records. We did the same with the historical record
of competitors: we stored their average Elo and their results history to create the competitors
database.
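
One way to picture such a historical record of board positions is a table keyed by position that counts how the games reaching it ended. The sketch below is only an assumption about the shape of that data: the position keys, the result strings and the `white_score` formula (wins plus half the draws, divided by games) are illustrative choices, not the definitions used in the thesis.

```python
from collections import defaultdict

RESULTS = ("1-0", "1/2-1/2", "0-1")  # white win, draw, black win


def build_position_history(games):
    """Count game results per position; `games` yields (positions, result)."""
    history = defaultdict(lambda: {r: 0 for r in RESULTS})
    for positions, result in games:
        for key in set(positions):  # count each position once per game
            history[key][result] += 1
    return history


def white_score(record):
    """Empirical score for White: wins plus half the draws, over all games."""
    total = sum(record.values())
    return (record["1-0"] + 0.5 * record["1/2-1/2"]) / total


# Hypothetical position keys; a real key could be any canonical board encoding.
games = [
    (["p0", "p1"], "1-0"),
    (["p0", "p2"], "1/2-1/2"),
    (["p0", "p1"], "1-0"),
]
history = build_position_history(games)
```

The same counting pattern applies to a competitors table keyed by player instead of by position.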
The biggest problem in predicting chess matches is the enormous number of legally possible
board positions, which has been estimated at 10^43 by Shannon [16] and others. However, since we
do not take the endgame into account, because we want to predict the result at an early stage, we
believe that we might be able to use the information on the matches in this database.
Finally, we gathered all the data from our three sources (the refined Gorgo Base, the Movement
history and the Competitors records) to generate a dataset we could work with. We applied
an SVM with an RBF kernel and compared it to a random forest model. In the end we were satisfied with our results, which showed us how powerful big data can be for solving problems.
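
For readers unfamiliar with the kernel named above, the standard RBF (Gaussian) kernel scores the similarity of two feature vectors as exp(-gamma * ||x - y||^2). The sketch below only illustrates that formula; the `gamma` value and the example vectors are assumptions and do not come from the thesis.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Standard RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Identical vectors have similarity 1; it decays towards 0 with distance.
same = rbf_kernel([1.0, 2.0], [1.0, 2.0])
apart = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5)
```

An SVM with this kernel can separate classes that are not linearly separable in the original feature space, at the cost of having to tune `gamma` alongside the usual regularization parameter.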
Degree: MÀSTER UNIVERSITARI EN INNOVACIÓ I RECERCA EN INFORMÀTICA (Pla 2012)
Files | Description | Size | Format | View |
---|---|---|---|---|
119749.pdf | | 1.691 MB | PDF | View/Open |