Intelligent RSS Tool

Mettälä, Markus

dc.contributor	Béjar Alonso, Javier
dc.contributor	Gionis, Aristides
dc.contributor.author	Mettälä, Markus
dc.contributor.other	Universitat Politècnica de Catalunya. Departament de Llenguatges i Sistemes Informàtics
dc.date.accessioned	2014-01-22T12:50:22Z
dc.date.available	2014-01-22T12:50:22Z
dc.date.issued	2013-08-19
dc.identifier.uri	http://hdl.handle.net/2099.1/20431
dc.description	Project carried out in collaboration with Aalto University
dc.description.abstract	Easy access to the wide range of information available online encourages people to keep exploring for interesting content. In practice, this makes it hard to find interesting and relevant information in the sea of knowledge. This is often referred to as the information overload problem, and it becomes harder to deal with as the amount of information available online grows. In this thesis, one source of information is exploited and organized so that discovering new content becomes easier. We use Really Simple Syndication (RSS) feeds as our source of information and two methods to categorize them: document clustering with K-Means and Latent Dirichlet Allocation (LDA). We use the textual information that the RSS feeds contain; each feed usually covers a specific set of topics. Our first goal is to cluster the documents into meaningful clusters, using natural language processing (NLP) techniques to preprocess the data. Our second goal is to analyze the clustered RSS feeds and exploit the similarities between the documents to generate meaningful user models based on user feed subscriptions. The third goal is to provide relevant recommendations based on the user models we have learned. We combine current state-of-the-art methods and present novel methods to compare feeds. Our novel method exploits WordNet shallow ontologies to create generalized representations of the feeds. The final goal is to develop a functional application that leverages the methods we developed with the help of machine learning libraries. The method we propose combines document clustering techniques, text similarity, feed modeling and a recommendation system. The results of our experiments show that K-Means clustered documents combined with recommendations based on the feed contents yield the best results. Using WordNet to measure the similarity of words also provides promising results. Further exploring the advantages of semantic similarities in document similarity measures would be an interesting research topic.
dc.language.iso	eng
dc.publisher	Universitat Politècnica de Catalunya
dc.publisher	Aalto-yliopisto
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Spain
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.subject	UPC subject areas::Computer science::Information systems
dc.subject.lcsh	RSS feeds
dc.subject.other	shallow ontologies
dc.subject.other	vector space distance measures
dc.subject.other	text classification
dc.subject.other	recommendation system
dc.subject.other	document clustering
dc.title	Intelligent RSS Tool
dc.type	Master thesis
dc.subject.lemac	Online electronic resources
dc.rights.access	Open Access
dc.audience.educationlevel	Master
dc.audience.mediator	Facultat d'Informàtica de Barcelona
dc.audience.degree	MÀSTER UNIVERSITARI EN INTEL·LIGÈNCIA ARTIFICIAL (Pla 2009)

Files in this item

Name: Mettala.pdf
Size: 2.104 MB
Format: PDF

This item appears in the following collections

Master in Artificial Intelligence - MAI (Pla 2006) [73]

UPCommons. UPC's open knowledge portal