dc.contributor: Belanche Muñoz, Luis Antonio
dc.contributor.author: Baltes Staubli, Simon
dc.contributor.other: Universitat Politècnica de Catalunya. Departament de Ciències de la Computació
dc.description.abstract: This work proposes a possible generalization of the k-nearest neighbors rule (kNN) in which the impact of a point goes beyond its exact location. The decision criterion depends not only on the distances to points but also on the distance from each of those points to its nearest miss. This concept is defined in this work as a second-order interaction, and it could be extended to higher orders. The motivation is to take into account whether points lie in pure areas or have points of other classes nearby. The concept of second-order interaction is integrated into the model via hyperspheres centered on the points of the dataset. Each sphere has a radius proportional to the distance to the nearest miss, and the unions and intersections of the spheres drive the decision rule. We therefore call this family of models Hyper Sphere Wrappers (HSW). Using these concepts, we developed three versions of the model: One-Class HSW for binary classification, Multi-Class HSW for multi-label classification, and Complete-HSW for regularized multi-label classification. We show that the latter converges asymptotically to a local kNN rule and, in particular cases, to the classical kNN rule. Once the models are defined, we apply them to diverse benchmark datasets and compare them with typical machine learning models. The results are promising, in particular for high-dimensional data. In addition, the complexity of the model favors this kind of dataset because, like most distance-based methods, HSW scales better with the number of dimensions than with the number of samples. Moreover, the work shows that even though the Minkowski metric for p < 1 is no longer a proper distance measure, it helps the algorithm gain accuracy on high-dimensional data. The results indicate that in those cases the model performs a kind of instance selection procedure. The idea is that the model selects as nearest neighbors those points that lie parallel to any local axis of the sample to be predicted. Consequently, the model disregards points that vary in many different directions, which seems to help boost the performance of HSW and kNN. Finally, the work also shows that metric learning algorithms such as Neighborhood Components Analysis (NCA) can substantially increase the performance of HSW. However, this kind of algorithm scales poorly on high-dimensional data. We therefore leave a set of open questions that could be interesting for enhancing this kind of rule: among others, the study of Kth-order interactions, the impact of defining the radius of the spheres as a non-linear function of the nearest miss, and a metric learning algorithm that is able to optimize p instead of the linear transformation.
dc.publisher: Universitat Politècnica de Catalunya
dc.subject: Àrees temàtiques de la UPC::Informàtica
dc.subject.lcsh: Information resources management
dc.subject.other: Nearest Miss
dc.title: From a topological approach for classification to KNN
dc.type: Master thesis
dc.subject.lemac: Gestió de la informació
dc.rights.access: Open Access
dc.audience.mediator: Facultat d'Informàtica de Barcelona
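
Illustrative note on the abstract: the sketch below is not taken from the thesis. It is a minimal Python example, under stated assumptions, of two ideas the abstract mentions: a sphere radius proportional to the distance to the nearest miss, and the Minkowski dissimilarity with p < 1. The function names (`minkowski`, `nearest_miss_radii`, `covering_classes`), the scaling factor `alpha`, and the strict "inside the sphere" test are all hypothetical choices made for illustration; the thesis's actual decision rule, based on unions and intersections of spheres, may differ.

```python
import numpy as np


def minkowski(a, b, p=2.0):
    """Minkowski dissimilarity between a and b.

    For p < 1 the triangle inequality fails, so this is not a proper metric.
    """
    diff = np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    return float(np.sum(diff ** p) ** (1.0 / p))


def nearest_miss_radii(X, y, alpha=1.0, p=2.0):
    """Radius per training point: alpha times the distance to its nearest miss,
    i.e. the closest training point that carries a different label.

    Assumes every point has at least one miss (two or more classes present).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    radii = np.empty(len(X))
    for i in range(len(X)):
        misses = X[y != y[i]]
        radii[i] = alpha * min(minkowski(X[i], m, p) for m in misses)
    return radii


def covering_classes(x, X, y, radii, p=2.0):
    """Hypothetical helper: labels of all spheres that strictly contain x."""
    return {label for xi, label, r in zip(X, y, radii) if minkowski(x, xi, p) < r}


# Toy data: two classes on a line.
X = [[0.0, 0.0], [1.0, 0.0], [4.0, 0.0], [5.0, 0.0]]
y = [0, 0, 1, 1]
radii = nearest_miss_radii(X, y, alpha=0.5)

inside = covering_classes([0.5, 0.0], X, y, radii)   # covered by class-0 spheres only
outside = covering_classes([2.5, 0.0], X, y, radii)  # covered by no sphere

# With p = 0.5 an axis-aligned neighbor ranks closer than a diagonal one,
# the reverse of the Euclidean ordering.
axis_p = minkowski([0, 0], [2, 0], p=0.5)
diag_p = minkowski([0, 0], [1, 1], p=0.5)
axis_2 = minkowski([0, 0], [2, 0], p=2.0)
diag_2 = minkowski([0, 0], [1, 1], p=2.0)
```

In this toy setting the point (2, 0), which differs from the query along a single axis, is ranked closer than (1, 1) when p = 0.5, while the Euclidean distance ranks them the other way round. This is consistent with the abstract's remark that p < 1 favors neighbors lying parallel to a local axis of the query.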

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder.