Spectral automated classification in large databases
Cite as:
hdl:2117/350341
Author's e-mail: david.echeverry.valenciagmail.com
Document type: Bachelor's thesis
Date: 2021-07-15
Access conditions: Open access
All rights reserved. This work is protected by the corresponding intellectual and industrial
property rights. Without prejudice to existing legal exemptions, its reproduction, distribution,
public communication or transformation without the authorization of the rights holder is prohibited.
Abstract
Due to the vast amount of data collected every day, there is a need for Machine Learning algorithms that can process and link raw data with as little human supervision as possible. One of the most popular is the Random Forest, which can be used to solve a wide variety of classification tasks. In Astronomy in particular, millions of objects are observed by satellites and telescopes, for instance by the Gaia space mission, and the received signals are displayed as spectra. Random Forest algorithms have proven to be a versatile and powerful tool for identifying and classifying stellar populations. In the present project, we apply a Random Forest algorithm based on spectroscopic analysis with the aim of efficiently classifying three different populations of stars of particular interest. Our main objective is to study the principal parameters and variables that affect the classification performance of the algorithm, and also to build a Random Forest model that can categorize spectra observed by current and future missions. We aim to obtain the best results according to the characteristics of each population, while maintaining an efficient and versatile model. To achieve that, we rely on both simulated and observed spectra to train and test the algorithm, and on quantitative metrics to measure its performance. Throughout this project, we have laid the foundations of the Random Forest classifier and of the data preparation, analyzing the theoretical classification with simulated data. We have used the Random Forest model to classify a real set of spectroscopic data collected by the Sloan Digital Sky Survey, which revealed a notable agreement between the human-made and the Random Forest classifications, an agreement that was greatly enhanced after different improvements were applied to the algorithm. Finally, we simulated spectra of the population expected to be observed and released by the Gaia space mission, and built a Random Forest model based on them.
Several improvements had to be introduced, but we eventually achieved a solid model with satisfactory results. With it, we were able to classify two sets of stellar spectra with different characteristics, maximizing the number of correctly classified objects while minimizing the number of false positives.
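The trade-off described above, between correctly classified objects and false positives, can be illustrated with per-class recall and precision. The following is only a minimal sketch: the abstract does not state which quantitative metrics the thesis uses, so the function name `per_class_metrics`, the class labels "A", "B" and "C", and the label lists are all hypothetical and chosen purely for illustration.

```python
from collections import Counter


def per_class_metrics(true_labels, predicted_labels):
    """Return {class: (recall, precision)} for a multi-class classification.

    recall    = correctly classified objects of a class / all true objects of that class
    precision = correctly classified objects of a class / all objects assigned to that class
    (a low precision means many false positives for that class)
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")

    actual = Counter(true_labels)          # true objects per class
    predicted = Counter(predicted_labels)  # objects assigned to each class
    true_positives = Counter()
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == guess:
            true_positives[truth] += 1

    metrics = {}
    for cls in sorted(set(actual) | set(predicted)):
        tp = true_positives[cls]
        recall = tp / actual[cls] if actual[cls] else 0.0
        precision = tp / predicted[cls] if predicted[cls] else 0.0
        metrics[cls] = (recall, precision)
    return metrics


# Hypothetical example: three stellar populations labelled "A", "B", "C".
true_labels = ["A", "A", "A", "A", "B", "B", "B", "C", "C", "C"]
predicted_labels = ["A", "A", "A", "B", "B", "B", "C", "C", "C", "A"]

metrics = per_class_metrics(true_labels, predicted_labels)
for cls, (recall, precision) in metrics.items():
    print(f"class {cls}: recall={recall:.2f}, precision={precision:.2f}")
```

In this toy case, class "A" recovers three of its four true members, and one of the four objects assigned to "A" is a false positive, so both its recall and its precision are 0.75.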
Degree: BACHELOR'S DEGREE IN AEROSPACE SYSTEMS ENGINEERING / BACHELOR'S DEGREE IN TELECOMMUNICATIONS SYSTEMS ENGINEERING (2015 Curriculum)
Files | Description | Size | Format |
---|---|---|---|
memoria.pdf | | 23.11 MB | PDF |