DSpace UPC

UPC academic works > Facultat d'Informàtica de Barcelona > Enginyeria Informàtica (Pla 2003)

Use this identifier to cite or link to this item: http://hdl.handle.net/2099.1/14547

File | Description | Size | Format
70019.pdf | | 607.79 kB | Adobe PDF

Title: Comparison of automatic classifiers' performances using word-based feature extraction techniques in an e-government setting
Author: Marin Rodenas, Alfonso
Supervisor/director/evaluator: Velupillai, Sumithra; Dalianis, Hercules
University: Universitat Politècnica de Catalunya
Kungl. Tekniska högskolan (Stockholm)
Subjects: UPC subject areas::Computer science::Information systems
Internet in public administration
e-government
machine learning
WEKA
SVM
Naive Bayes
kNN
Swedish
PoS tagging
feature extraction
feature selection
automatic e-mail classification
Electronic administration
Date: Jan-2011
Document type: Master thesis (pre-Bologna period)
Abstract: Nowadays, email is commonly used by citizens to communicate with their government. Among the emails they receive, governments deal with recurring queries and subjects that handling officers have to answer manually. Automatic classification of incoming emails can increase communication efficiency by reducing the delay between a query and its response. This thesis is part of the IMAIL project, which aims to provide an automatic answering solution to the Swedish Social Insurance Agency (SSIA) ("Försäkringskassan" in Swedish). The goal of this thesis is to analyze and compare the classification performance of different sets of features extracted from SSIA emails, using different automatic classifiers. The features extracted from the emails also depend on the preprocessing carried out beforehand. Compound splitting, lemmatization, stop-word removal, Part-of-Speech (PoS) tagging and n-grams are the processes applied to the data set. Classification is performed using Support Vector Machines (SVM), k-Nearest Neighbors (kNN) and Naive Bayes. Precision, recall and F-measure are used to analyze and compare the results. In the results obtained in this thesis, SVM provides the best classification, with an F-measure of 0.787. However, Naive Bayes classifies most of the individual email categories better than SVM, so it cannot be concluded whether SVM classifies better than Naive Bayes or not. Furthermore, the results are compared with those of Dalianis et al. (2011), and the approach taken here outperforms the earlier results: SVM achieved an F-measure of 0.858 when using PoS tagging on the original emails, an improvement of almost 3 percentage points over the 0.83 obtained in Dalianis et al. (2011). In this case, SVM was clearly better than Naive Bayes.
Description: Project carried out through a mobility programme. KUNGLIGA TEKNISKA HÖGSKOLAN, STOCKHOLM
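The abstract combines word-based feature extraction (stop-word removal, n-grams) with evaluation by precision, recall and F-measure. The minimal Python sketch below illustrates those two ideas only; it is not the thesis's implementation (the record's keywords list WEKA as a tool). The stop-word list, the function names `word_ngram_features` and `precision_recall_f1`, and the category labels are hypothetical, chosen for illustration. The F-measure computed here is the balanced F1 = 2PR / (P + R).

```python
from collections import Counter

# Hypothetical, tiny Swedish stop-word list for illustration only;
# the stop-word list actually used in the thesis is not given here.
STOP_WORDS = {"och", "i", "att", "det", "jag", "på", "en", "är", "min", "för"}


def word_ngram_features(text: str, n_max: int = 2) -> Counter:
    """Bag of word n-grams (n = 1..n_max) after lowercasing and stop-word removal.

    Tokenization is a naive whitespace split (an assumption); punctuation is not stripped.
    """
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    features = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            features[" ".join(tokens[i:i + n])] += 1
    return features


def precision_recall_f1(gold: list, predicted: list, category: str) -> tuple:
    """Per-category precision, recall and balanced F-measure (F1)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == category and p == category)
    fp = sum(1 for g, p in pairs if g != category and p == category)
    fn = sum(1 for g, p in pairs if g == category and p != category)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, with hypothetical gold labels `["parental", "parental", "sickness", "sickness"]` and predictions `["parental", "sickness", "sickness", "sickness"]`, the category `"sickness"` gets precision 2/3, recall 1.0 and F1 0.8, while `"parental"` gets precision 1.0, recall 0.5 and F1 of about 0.667. Computing the measures per category, as here, is what makes a comparison like "Naive Bayes is better for most categories" possible even when another classifier has the higher aggregate F-measure.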
Access conditions: Open Access
Appears in collections: Enginyeria Informàtica (Pla 2003)

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation without the authorization of the rights holder is prohibited.

For any use not provided for by law, contact: sepi.bupc@upc.edu

Universitat Politècnica de Catalunya. Servei de Biblioteques, Publicacions i Arxius (Library, Publications and Archives Service)