Implementation of a random forest machine learning algorithm in the context of Gaia space mission
Tutor / director / evaluator: Torres Gil, Santiago
Document type: Master thesis
Rights access: Open Access
The Gaia space astrometry mission will scan about one billion stars an average of 70 times each over five years. During the mission, repeated astrometric, photometric and spectroscopic observations of the entire sky down to magnitude 20 will be recorded. In other words, Gaia will be able to build a complete three-dimensional map of 1 per cent of our Galaxy, storing a huge amount of results for all the stars observed, with the highest quality ever achieved. This large amount of astronomical data must be handled efficiently. The use of machine learning algorithms and other automatic classification strategies becomes essential in such a big-data context. The main objective of this master thesis is to prepare and test an efficient, automated machine learning algorithm. Five supervised models are considered in this work, with the Random Forest algorithm being the model that shows the best capabilities and performance. We take advantage of a detailed simulator of the white dwarf population, provided by the Astronomy and Astrophysics Group of the Physics Department of the UPC. This simulator provides a detailed synthetic population that mimics the characteristics of the white dwarf population observed by Gaia. This synthetic population is used in the learning stage of the Random Forest algorithm, in order to optimize its application to the observed data. Once tested, our algorithm has been applied to data extracted from the available Gaia Data Releases in order to classify their content into the different subpopulations of the Galaxy, such as the halo or the disk. The accuracy obtained in the present work by our Random Forest algorithm (85%) represents a substantial improvement with respect to other classical methods (55%).