From a topological approach for classification to KNN

Baltes Staubli, Simon

153052.pdf (596.2 KB, PDF)

Tutor / director: Belanche Muñoz, Luis Antonio

Document type: Official Master's Final Project

Date: 2020-10-29

Access conditions: Open access

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation without the authorization of the rights holder is prohibited.

Abstract

This work proposes a possible generalization of the k-nearest neighbors rule (kNN) in which the influence of a point extends beyond its exact location. The decision criterion depends not only on the distances to the training points but also on the distance from each of those points to its nearest miss, that is, its closest point of a different class. This work defines this concept as a second-order interaction, and it could be extended to higher orders. The motivation is to take into account whether points lie in pure regions or whether there are points of other classes nearby. The concept of second-order interaction is integrated into the model through hyperspheres centered on the points of the dataset. Each sphere has a radius proportional to the distance to its nearest miss, and the unions and intersections of the spheres drive the decision rule. We therefore call this family of models Hyper Sphere Wrappers (HSW). Using these concepts, we developed three versions of the model: One-Class HSW for binary classification, Multi-Class HSW for multi-label classification and Complete-HSW for regularized multi-label classification. We show that the latter converges asymptotically to a local kNN rule and, in particular cases, to the classical kNN rule. Once the models are defined, we apply them to diverse benchmark datasets and compare them with typical machine learning models. The results are promising, in particular for high-dimensional data. In addition, the computational complexity of the model favors this kind of dataset because, like most distance-based methods, HSW scales better with the number of dimensions than with the number of samples. Moreover, the work shows that even though the Minkowski metric with p < 1 is no longer a proper distance measure, it helps the algorithm gain accuracy on high-dimensional data. The results indicate that in those cases the model performs a kind of instance selection procedure.
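The sphere construction described above can be illustrated with a small sketch. It is not the thesis's One-Class, Multi-Class or Complete-HSW rule. The helper names (`minkowski`, `nearest_miss_radii`, `hsw_predict`) and the proportionality constant `alpha` are hypothetical choices made only for illustration. The same applies to the decision rule: a vote among the spheres that contain the query, falling back to the smallest distance-to-radius ratio when no sphere does.

```python
from collections import Counter
from typing import List, Sequence

Point = Sequence[float]


def minkowski(a: Point, b: Point, p: float = 2.0) -> float:
    """(sum_i |a_i - b_i|^p)^(1/p); a proper metric only for p >= 1."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def nearest_miss_radii(X: List[Point], y: List[str], p: float = 2.0,
                       alpha: float = 1.0) -> List[float]:
    """Hypothetical helper: radius of the sphere around each training point is
    alpha times the distance to its nearest miss (closest point with a
    different label). Needs at least two classes, otherwise min() is empty."""
    return [
        alpha * min(minkowski(xi, xj, p) for xj, yj in zip(X, y) if yj != yi)
        for xi, yi in zip(X, y)
    ]


def hsw_predict(X: List[Point], y: List[str], radii: List[float],
                query: Point, p: float = 2.0) -> str:
    """Hypothetical sphere-vote rule, NOT the exact HSW decision rule."""
    dists = [minkowski(query, xi, p) for xi in X]
    # Vote among the spheres that contain the query point.
    votes = Counter(yi for d, r, yi in zip(dists, radii, y) if d <= r)
    if votes:
        return votes.most_common(1)[0][0]
    # No sphere covers the query: pick the sphere that would need the
    # smallest relative growth to reach it (smallest d / r). The max()
    # guards against zero radii from duplicated points with different labels.
    best = min(range(len(X)), key=lambda i: dists[i] / max(radii[i], 1e-12))
    return y[best]


X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
y = ["a", "a", "a", "b", "b"]
radii = nearest_miss_radii(X, y, alpha=0.5)
print(hsw_predict(X, y, radii, (0.5, 0.5)))    # a: inside the three "a" spheres
print(hsw_predict(X, y, radii, (5.5, 4.0)))    # b: inside both "b" spheres
print(hsw_predict(X, y, radii, (20.0, 20.0)))  # b: uncovered, smallest d / r
```

A point far from any opposite-class neighbour gets a large sphere, so pure regions dominate the vote. Shrinking `alpha` makes the spheres smaller, and more queries then fall through to the ratio-based fallback, which behaves like a nearest-neighbour rule with locally rescaled distances.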
The idea behind this instance selection is that the model selects as nearest neighbors those points that lie parallel to one of the local axes of the sample to be predicted. Consequently, the model disregards points that vary in many different directions, which seems to help boost the performance of both HSW and kNN. Finally, the work also shows that metric learning algorithms such as Neighborhood Component Analysis (NCA) can substantially increase the performance of HSW. However, this kind of algorithm scales poorly to high-dimensional data. We therefore leave a set of open questions that could help improve this kind of rule. Among them are the study of kth-order interactions, the impact of defining the radius of the spheres as a non-linear function of the nearest-miss distance, and a metric learning algorithm able to optimize p instead of the linear transformation.
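Two properties of the Minkowski measure with p < 1 mentioned above can be checked numerically. First, it violates the triangle inequality. Second, it makes a point offset along a single coordinate axis look closer than a point offset diagonally, even when the diagonal point is closer in Euclidean distance. The sketch below uses arbitrarily chosen points. It only illustrates one plausible reading of the axis-parallel preference and does not reproduce the thesis's experiments.

```python
def minkowski(a, b, p):
    """(sum_i |a_i - b_i|^p)^(1/p); for 0 < p < 1 this is not a metric."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


# 1) With p < 1 the triangle inequality fails, so the measure is not a
#    proper distance: going a -> c directly "costs" more than a -> b -> c.
a, b, c = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)
direct = minkowski(a, c, 0.5)                         # (1 + 1)^2 = 4.0
detour = minkowski(a, b, 0.5) + minkowski(b, c, 0.5)  # 1.0 + 1.0 = 2.0
print(direct > detour)  # True

# 2) With p < 1, an offset along a single axis looks closer than a diagonal
#    offset, even though the diagonal point is closer in Euclidean terms.
query = (0.0, 0.0)
along_axis = (1.0, 0.0)
diagonal = (0.5, 0.5)
for p in (2.0, 1.0, 0.5):
    d_axis = minkowski(query, along_axis, p)
    d_diag = minkowski(query, diagonal, p)
    # diagonal: 0.707, 1.000, 2.000 for p = 2, 1, 0.5; axis stays at 1.000
    print(f"p={p}: axis={d_axis:.3f}, diagonal={d_diag:.3f}")
```

As p drops below 1, the diagonal neighbour overtakes the axis-aligned one in the ranking. This is why a neighbour search with p < 1 favours points that differ from the query in only a few coordinates.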

Subjects: Information resources management, Information management

Degree: Master's Degree in Innovation and Research in Informatics (2012 curriculum)

URI: http://hdl.handle.net/2117/340406

Collections:

Official master's degrees - Master in Innovation and Research in Informatics - MIRI [453]

UPCommons. UPC's open knowledge portal