From a topological approach for classification to KNN

Baltes Staubli, Simon

dc.contributor	Belanche Muñoz, Luis Antonio
dc.contributor.author	Baltes Staubli, Simon
dc.contributor.other	Universitat Politècnica de Catalunya. Departament de Ciències de la Computació
dc.date.accessioned	2021-02-24T08:04:47Z
dc.date.available	2021-02-24T08:04:47Z
dc.date.issued	2020-10-29
dc.identifier.uri	http://hdl.handle.net/2117/340406
dc.description.abstract	This work proposes a possible generalization of the k-nearest neighbors (kNN) rule in which the influence of a point extends beyond its exact location. The decision criterion depends not only on the distances to the points but also on the distance from each of those points to its nearest miss. This concept is defined in this work as a second-order interaction, and it could be extended to higher orders. The motivation is to take into account whether points lie in pure areas or have points of other classes nearby. The concept of second-order interaction is integrated into the model via hyperspheres centered on the points in the dataset. These spheres have a radius proportional to the distance to the nearest miss, and their unions and intersections drive the decision rule. We therefore call this family of models Hyper Sphere Wrappers (HSW). Using these concepts, we developed three versions of the model: One-Class HSW for binary classification, Multi-Class HSW for multi-label classification, and Complete-HSW for regularized multi-label classification. We show that the latter converges asymptotically to a local kNN rule and, in particular cases, to the classical kNN rule. Once the models are defined, we apply them to diverse benchmark datasets and compare them with typical machine learning models. The results are promising, in particular for high-dimensional data. In addition, the complexity of the model favors this kind of dataset because, like most distance-based methods, HSW scales better with the number of dimensions than with the number of samples. Moreover, the work shows that even though the Minkowski metric for p < 1 is no longer a proper distance measure, it helps the algorithm gain accuracy on high-dimensional data. The results indicate that in those cases the model performs a kind of instance selection procedure.
The idea is that the model selects as nearest neighbors those points that lie parallel to any local axis of the sample to be predicted. Consequently, the model ignores points with variability in many different directions, which seems to help boost the performance of HSW and kNN. Finally, the work also shows that metric learning algorithms such as Neighborhood Components Analysis (NCA) can substantially increase the performance of HSW. However, such algorithms scale poorly on high-dimensional data. We therefore leave a set of open questions that could be interesting for improving this kind of rule: among others, the study of Kth-order interactions, the impact of defining the radius of the spheres as a non-linear function of the nearest miss, and a metric learning algorithm that optimizes p instead of the linear transformation.
dc.language.iso	eng
dc.publisher	Universitat Politècnica de Catalunya
dc.subject	Àrees temàtiques de la UPC::Informàtica
dc.subject.lcsh	Information resources management
dc.subject.other	kNN
dc.subject.other	Nearest Miss
dc.subject.other	Topology
dc.subject.other	Classifier
dc.title	From a topological approach for classification to KNN
dc.type	Master thesis
dc.subject.lemac	Gestió de la informació
dc.identifier.slug	153052
dc.rights.access	Open Access
dc.date.updated	2020-11-05T07:16:39Z
dc.audience.educationlevel	Màster
dc.audience.mediator	Facultat d'Informàtica de Barcelona
dc.audience.degree	MÀSTER UNIVERSITARI EN INNOVACIÓ I RECERCA EN INFORMÀTICA (Pla 2012)
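
The abstract describes spheres centered on the training points, with a radius proportional to the distance to the nearest miss, and a Minkowski measure with p < 1. The following Python sketch illustrates those two ingredients only. The function names, the proportionality factor `alpha`, and in particular the voting rule in `predict` are assumptions made for illustration; the record does not give the decision rule of the thesis, so this should not be read as the HSW algorithm itself.

```python
import numpy as np

def minkowski(a, b, p=2.0):
    """Pairwise Minkowski measure between rows of a and rows of b.

    For p < 1 the triangle inequality no longer holds, so this is
    not a proper distance, as the abstract notes.
    """
    diff = np.abs(a[:, None, :] - b[None, :, :])
    return (diff ** p).sum(axis=2) ** (1.0 / p)

def nearest_miss_radii(X, y, alpha=1.0, p=2.0):
    """Radius of each sphere: alpha times the distance to the nearest
    training point of a different class (the nearest miss).
    alpha is a hypothetical proportionality factor."""
    D = minkowski(X, X, p)
    is_miss = y[:, None] != y[None, :]
    return alpha * np.where(is_miss, D, np.inf).min(axis=1)

def predict(X_train, y_train, radii, X_query, p=2.0):
    """Illustrative rule (an assumption, not the thesis's rule):
    vote among the spheres that contain the query; if none does,
    use the sphere that is closest relative to its radius."""
    D = minkowski(X_query, X_train, p)
    inside = D <= radii[None, :]
    classes = np.unique(y_train)
    votes = np.stack(
        [(inside & (y_train == c)[None, :]).sum(axis=1) for c in classes],
        axis=1,
    )
    # Guard against zero radii (identical points with different labels).
    scaled = D / np.maximum(radii, 1e-12)[None, :]
    fallback = y_train[scaled.argmin(axis=1)]
    covered = votes.sum(axis=1) > 0
    return np.where(covered, classes[votes.argmax(axis=1)], fallback)

if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0], [5.0, 0.0]])
    y = np.array([0, 0, 1, 1])
    radii = nearest_miss_radii(X, y, alpha=0.5)
    queries = np.array([[0.5, 0.0], [4.5, 0.0], [10.0, 0.0]])
    print(radii, predict(X, y, radii, queries))
```

Passing the same `p` to `nearest_miss_radii` and `predict` keeps the radii and the query distances in the same measure. With p = 0.5 the measure from (0, 0) to (1, 1) is 4, while going through (1, 0) costs only 1 + 1 = 2, which shows concretely why p < 1 breaks the triangle inequality.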

Files in this item

Name: 153052.pdf
Size: 596.2 KB
Format: PDF

This item appears in the following collections

Master in Innovation and Research in Informatics - MIRI [454]
