Support Vector Machines. Similarity functions to work with heterogeneous data and classifying documents

Parrilla Gutiérrez, Juan Manuel

File: 62280.pdf (469.0 KB, PDF)

Tutor / director: Hallam, John; Romero Merino, Enrique

Carried out at/with: Syddansk Universitet

Document type: Final degree project

Date: 2010

Access conditions: Open access

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation is prohibited without the authorization of the rights holder.

Abstract

The objective of Data Mining (DM) is to classify information from the real world. That kind of information is commonly heterogeneous: it needs different kinds of data to be represented. Dealing with heterogeneous data has usually been a weak point of DM, because DM is not widely applied to real-world problems. Different solutions have been proposed, and our objective is to present a new one based on similarities and Support Vector Machines (SVM): first, how to use similarities instead of kernels in an SVM, and then how to combine similarities to work with heterogeneous data. The idea is that each type of data has an associated similarity, and all these similarities are then combined to output a result. What makes this idea powerful is the way the similarities can be combined: the combination can be practically anything, while other methods for heterogeneous data only use linear combinations.

The first step is to understand how an SVM works and what it means to use similarities instead of kernels. The next is to implement this in an SVM library and show it working with an example. We will work with documents, so some NLP will also be required; learning about NLP is another of my goals. A further goal is to use OO techniques and obtain a good design: a framework that anybody can modify easily, with a simple implementation. The objective is to extend the library used, not to fork it.
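The idea of one similarity per data type, combined by an arbitrary function, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the record fields (`pages`, `language`, `text`), the three similarity functions, and the geometric mean used as the non-linear combination.

```python
from typing import Dict, List, Sequence

Record = Dict[str, object]


def numeric_similarity(a: float, b: float, scale: float = 10.0) -> float:
    """Similarity in (0, 1]: 1.0 when equal, decreasing with distance."""
    return 1.0 / (1.0 + abs(a - b) / scale)


def categorical_similarity(a: str, b: str) -> float:
    """Overlap similarity: 1.0 for identical categories, else 0.0."""
    return 1.0 if a == b else 0.0


def text_similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase, whitespace-separated token sets."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def combine(sims: Sequence[float]) -> float:
    """Non-linear combination (assumed here): geometric mean."""
    product = 1.0
    for s in sims:
        product *= s
    return product ** (1.0 / len(sims))


def record_similarity(x: Record, y: Record) -> float:
    """Apply one similarity per field type, then combine the results."""
    sims = [
        numeric_similarity(x["pages"], y["pages"]),
        categorical_similarity(x["language"], y["language"]),
        text_similarity(x["text"], y["text"]),
    ]
    return combine(sims)


def similarity_matrix(records: List[Record]) -> List[List[float]]:
    """Pairwise similarities, playing the role of a kernel (Gram) matrix."""
    return [[record_similarity(x, y) for y in records] for x in records]


# Hypothetical heterogeneous records: numeric, categorical and text fields.
docs = [
    {"pages": 12, "language": "en",
     "text": "support vector machines classify documents"},
    {"pages": 15, "language": "en",
     "text": "kernels and support vector machines"},
    {"pages": 200, "language": "ca", "text": "mineria de dades"},
]

gram = similarity_matrix(docs)
for row in gram:
    print(["%.3f" % value for value in row])
```

A matrix of this kind could be handed to an SVM implementation that accepts precomputed kernels. An arbitrary similarity is not guaranteed to be positive semidefinite, which is one reason using similarities in place of kernels needs separate treatment.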

Description

Project carried out in collaboration with the University of Southern Denmark

Subjects: Data mining

Degree: ENGINYERIA INFORMÀTICA (Pla 2003)

URI: http://hdl.handle.net/2099.1/11809

Collections

Facultat d'Informàtica de Barcelona - Enginyeria Informàtica (Pla 2003) [1.189]

UPCommons. Open knowledge portal of the UPC