dc.contributor | Belanche Muñoz, Luis Antonio |
dc.contributor.author | Villegas García, Marco Antonio |
dc.contributor.other | Universitat Politècnica de Catalunya. Departament de Llenguatges i Sistemes Informàtics |
dc.date.accessioned | 2013-02-11T13:49:07Z |
dc.date.available | 2013-02-11T13:49:07Z |
dc.date.issued | 2013-01 |
dc.identifier.uri | http://hdl.handle.net/2099.1/17172 |
dc.description.abstract | Kernel-based methods first appeared in the form of support vector
machines. Since the first Support Vector Machine (SVM) formulation in
1995, the number of proposed kernel
functions has grown quickly, and these kernels have been applied to a
wide range of problems and domains. The most common and direct
applications of these methods focus on continuous numeric data,
given that an SVM ultimately involves solving an optimization
problem. Additionally, some kernel functions have been oriented to more
symbolic data, in problems such as text analysis or handwritten digit
recognition. Surprisingly, however, there is a gap in the area of kernel
functions designed to handle datasets with qualitative variables. One of the
most common practices to work around this gap consists of recoding the
source qualitative information to make it suitable for numeric
kernel functions.
This thesis presents the development of new kernel functions that can
better model symbolic information presented as categorical variables,
directly and without the need for data preprocessing. The
proposal is based on the use of probabilistic information (the probability
mass function) to compare the different modalities of a variable.
Additionally, the idea is formulated through a modular scheme that combines a
set of components to obtain the kernel functions, which makes it easier to
modify and extend individual components.
The experimental results suggest a slight improvement over
traditional kernel functions in the accuracy obtained on classification
problems. This improvement is clearer on datasets with known probabilistic
structure. |
dc.language.iso | eng |
dc.publisher | Universitat Politècnica de Catalunya |
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 Spain |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/es/ |
dc.subject | UPC subject areas::Computer science::Theoretical computer science::Algorithmics and complexity theory |
dc.subject.lcsh | Support vector machines |
dc.subject.lcsh | Kernel functions |
dc.subject.lcsh | Computer algorithms |
dc.title | An investigation into new kernels for categorical variables |
dc.type | Master thesis |
dc.subject.lemac | Kernel, Funcions de |
dc.subject.lemac | Algorismes computacionals |
dc.rights.access | Open Access |
dc.audience.educationlevel | Màster |
dc.audience.mediator | Facultat d'Informàtica de Barcelona |
dc.audience.degree | MÀSTER UNIVERSITARI EN COMPUTACIÓ (Pla 2006) |