Spectral automated classification in large databases

Echeverry Valencia, Cristian David

memoria.pdf (PDF, 23.11 MB)

Author's email: david.echeverry.valencia@gmail.com

Tutor / Director: Rebassa Mansergas, Alberto; Torres Gil, Santiago

Document type: Bachelor's thesis (Treball Final de Grau)

Date: 2021-07-15

Access conditions: Open access

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation without the authorization of the rights holder is prohibited.

Abstract

Due to the vast amount of data collected every day, there is a need for Machine Learning algorithms that can process and link raw data with as little human supervision as possible. One of the most popular is the Random Forest, which can be applied to a wide variety of classification tasks. In Astronomy in particular, satellites and telescopes, for instance the Gaia space mission, capture millions of objects, and the received signals are recorded as spectra. Random Forest algorithms have proven to be a versatile and powerful tool for identifying and classifying stellar populations. In the present project, we apply a Random Forest algorithm based on spectroscopic analysis with the aim of efficiently classifying three different populations of stars of particular interest. Our main objectives are to study the principal parameters and variables that affect the classification performance of the algorithm, and to design the Random Forest model so that it can categorize spectra observed by current and future missions. We aim to obtain the best results according to the characteristics of each population while keeping the model efficient and versatile. To achieve this, we rely on both simulated and observed spectra to train and test the algorithm, and on quantitative metrics to measure its performance. Throughout this project, we laid the foundations of the Random Forest classifier and of the data preparation, and analyzed the theoretical classification performance with simulated data. We then used the Random Forest model to classify a real set of spectroscopic data collected by the Sloan Digital Sky Survey, which revealed notable agreement between the human-made and the Random Forest classifications; this agreement improved greatly after several refinements were applied to the algorithm. Finally, we simulated the spectra of the population expected to be observed and released by the Gaia space mission, and built a Random Forest model based on them.
Several improvements had to be introduced, but we eventually achieved a solid model with satisfactory results. With it, we were able to classify two sets of stellar spectra with different characteristics, maximizing the number of correctly classified objects while minimizing the number of false positives.
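The goal of maximizing correctly classified objects while minimizing false positives corresponds to the standard per-class recall and precision metrics. The sketch below is not the thesis's code: it is a minimal, classifier-agnostic illustration, and the function name `per_class_metrics`, the population labels "A", "B", "C" and the toy label lists are all hypothetical.

```python
from collections import Counter


def per_class_metrics(true_labels, predicted_labels):
    """Return {class: (precision, recall)} from paired true/predicted labels.

    precision = TP / (TP + FP): share of objects assigned to a class that truly belong to it
    recall    = TP / (TP + FN): share of a class's true members that the classifier recovered
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(true_labels, predicted_labels):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # the predicted class gains a false positive
            fn[t] += 1  # the true class loses a member (false negative)
    metrics = {}
    for c in sorted(set(true_labels) | set(predicted_labels)):
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        metrics[c] = (precision, recall)
    return metrics


# Hypothetical labels for three stellar populations "A", "B" and "C"
truth = ["A", "A", "A", "B", "B", "C", "C", "C"]
pred = ["A", "A", "B", "B", "B", "C", "A", "C"]
for cls, (prec, rec) in per_class_metrics(truth, pred).items():
    print(f"{cls}: precision={prec:.2f} recall={rec:.2f}")
```

Here population "B" reaches full recall but pays for it with a false positive, while "C" has no false positives but misses one member. This is the trade-off a classifier for rare stellar populations has to balance. If scikit-learn is available, `sklearn.metrics.precision_recall_fscore_support` computes the same quantities.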

Subjects: Algorithms, Statistical astronomy

Degree: Bachelor's Degree in Aerospace Systems Engineering / Bachelor's Degree in Telecommunications Systems Engineering (2015 curriculum)

URI: http://hdl.handle.net/2117/350341

Collections: Escola d'Enginyeria de Telecomunicació i Aeroespacial de Castelldefels - Bachelor's Degree in Aerospace Systems Engineering + Telecommunications Systems Engineering (2015 curriculum)

UPCommons. UPC open knowledge portal