From a topological approach for classification to KNN
Cite as:
hdl:2117/340406
Document type: Official Master's Final Project
Date: 2020-10-29
Access conditions: Open access
All rights reserved. This work is protected by the corresponding intellectual and industrial
property rights. Without prejudice to existing legal exemptions, its reproduction, distribution,
public communication or transformation is prohibited without the authorization of the rights holder.
Abstract
This work proposes a possible generalization of the k-nearest neighbors rule (kNN) in which the
impact of a point goes beyond its exact location. The decision criterion depends not only
on the distances to the points but also on the distance from each of those points to its nearest
miss, that is, the closest point of a different class. This concept is defined in this work as a
second-order interaction, and it could be extended to higher orders.
The motivation is to take into account whether points lie in pure areas or
whether there are points of other classes nearby. The concept of second-order interaction is
integrated into the model via hyperspheres centered on the points of the dataset.
Each sphere has a radius proportional to the distance to the nearest miss,
and the unions and intersections of the spheres drive the decision rule. We therefore call this family
of models Hyper Sphere Wrappers (HSW).
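The sphere construction described above can be illustrated with a minimal sketch. It is not the thesis's actual HSW decision rule, which the abstract does not spell out. The sketch assumes Euclidean distance, a hypothetical proportionality constant `alpha`, a simple majority vote among the spheres that contain the query, and a 1-NN fallback when no sphere covers it. The function names and the toy data are illustrative only.

```python
import math
from collections import Counter

def nearest_miss_radii(X, y, alpha=1.0):
    """Radius of each training point: alpha times the distance to its
    nearest miss (closest training point of a different class).
    Assumes at least two classes are present."""
    radii = []
    for xi, yi in zip(X, y):
        miss = min(math.dist(xi, xj) for xj, yj in zip(X, y) if yj != yi)
        radii.append(alpha * miss)
    return radii

def predict(X, y, radii, q):
    """Assumed rule: majority vote among spheres containing q,
    falling back to the single nearest neighbor if none covers q."""
    dists = [math.dist(xi, q) for xi in X]
    votes = Counter(yi for d, r, yi in zip(dists, radii, y) if d <= r)
    if votes:
        return votes.most_common(1)[0][0]
    return y[min(range(len(X)), key=dists.__getitem__)]

# Toy two-class dataset (hypothetical values).
X = [(0, 0), (1, 0), (0, 1), (4, 4), (5, 4), (3, 3)]
y = ["a", "a", "a", "b", "b", "b"]
radii = nearest_miss_radii(X, y, alpha=0.5)

print(predict(X, y, radii, (0.5, 0.5)))  # query inside the "a" spheres
print(predict(X, y, radii, (4, 3)))      # query inside the "b" spheres
print(predict(X, y, radii, (10, 10)))    # no sphere covers it: 1-NN fallback
```

Points far from any other class get large spheres and therefore influence a wider region, while points near a class boundary get small ones. This is the second-order effect the paragraph describes.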
Using these concepts, we developed three versions of the model: One-Class HSW for
binary classification, Multi-Class HSW for multi-label classification, and Complete-HSW for
regularized multi-label classification. We show that the latter converges asymptotically
to a local kNN rule and, in particular cases, to the classical kNN rule. Once the models
are defined, we apply them to diverse benchmark datasets and compare them with typical
machine learning models. The results are promising, particularly for high-dimensional data.
In addition, the complexity of the model favors this kind of dataset because, like most
distance-based methods, HSW scales better with the number of dimensions than with the number
of samples.
Moreover, the work shows that even though the Minkowski metric for p < 1 is no longer a
proper distance measure, it helps the algorithm gain accuracy on high-dimensional data.
The results indicate that in those cases the model performs a kind of instance selection.
The idea is that the model selects as nearest neighbors those points that lie parallel
to any local axis of the sample we want to predict. Consequently, the model ignores
points that vary in many different directions, which seems to boost the performance
of HSW and kNN.
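A small numerical sketch can make the effect of p < 1 concrete. It uses the usual Minkowski formula, d_p(u, v) = (sum_i |u_i - v_i|^p)^(1/p), with hypothetical two-dimensional points chosen only for illustration. It shows that under p = 2 a point displaced along both axes can be nearer than an axis-aligned one, that the ordering flips for p = 0.5, and that the triangle inequality fails for p < 1, which is why the measure is no longer a proper distance.

```python
def minkowski(u, v, p):
    """Minkowski dissimilarity; a true metric only for p >= 1."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

q = (0.0, 0.0)             # query sample
axis_aligned = (1.0, 0.0)  # differs from q along a single axis
diagonal = (0.6, 0.6)      # differs from q along both axes

for p in (2.0, 0.5):
    d_axis = minkowski(q, axis_aligned, p)
    d_diag = minkowski(q, diagonal, p)
    nearer = "axis_aligned" if d_axis < d_diag else "diagonal"
    print(f"p={p}: axis={d_axis:.3f} diag={d_diag:.3f} nearer={nearer}")

# Triangle inequality check for p = 0.5 on the corners of a unit square.
x, m, z = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)
direct = minkowski(x, z, 0.5)
detour = minkowski(x, m, 0.5) + minkowski(m, z, 0.5)
print("triangle inequality holds:", direct <= detour)
```

With p = 0.5, spreading the same displacement over several coordinates is penalized heavily, so neighbors that differ from the query along few axes are preferred. This matches the instance-selection behavior described above.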
Finally, the work also shows that metric learning algorithms such as Neighborhood Components
Analysis (NCA) can substantially increase the performance of HSW. However,
this kind of algorithm scales poorly on high-dimensional data. We therefore leave a set of
open questions that could be of interest for enhancing this kind of rule: among others, the
study of Kth-order interactions, the impact of defining the radius of the spheres as a non-linear
function of the nearest miss, and a metric learning algorithm able to optimize p instead
of the linear transformation.
Degree: Master's Degree in Innovation and Research in Informatics (2012 curriculum)
File | Size |
---|---|
153052.pdf | 596.2 KB |