Show simple item record

dc.contributor: Velupillai, Sumithra
dc.contributor: Dalianis, Hercules
dc.contributor.author: Marin Rodenas, Alfonso
dc.date.accessioned: 2012-03-07T09:42:27Z
dc.date.available: 2012-03-07T09:42:27Z
dc.date.issued: 2011-01
dc.identifier.uri: http://hdl.handle.net/2099.1/14547
dc.description: Project carried out through a mobility programme. KUNGLIGA TEKNISKA HÖGSKOLAN, STOCKHOLM
dc.description.abstract: Citizens now commonly use email to communicate with their government. Among the emails received, governments deal with common queries and subjects that handling officers have to answer manually. Automatic classification of incoming emails can increase communication efficiency by decreasing the delay between a query and its response. This thesis is part of the IMAIL project, which aims to provide an automatic answering solution to the Swedish Social Insurance Agency (SSIA) ("Försäkringskassan" in Swedish). The goal of this thesis is to analyze and compare the classification performance of different sets of features extracted from SSIA emails on different automatic classifiers. The features extracted from the emails also depend on the preprocessing carried out beforehand. Compound splitting, lemmatization, stop-word removal, Part-of-Speech tagging and n-grams are the processes applied to the data set. Classification is performed using Support Vector Machines, k-Nearest Neighbors and Naive Bayes. Precision, recall and F-measure are used to analyze and compare the results. In the results obtained in this thesis, SVM provides the best classification, with an F-measure of 0.787. However, Naive Bayes classifies most of the email categories better than SVM. Thus, it cannot be concluded whether SVM classifies better than Naive Bayes. Furthermore, a comparison to Dalianis et al. (2011) is made. The results obtained with this approach outperform the earlier ones: SVM provided an F-measure of 0.858 when using PoS tagging on the original emails, improving by almost 3% on the 0.83 obtained in Dalianis et al. (2011). In this case, SVM was clearly better than Naive Bayes.
dc.language.iso: eng
dc.publisher: Universitat Politècnica de Catalunya
dc.publisher: Kungl. Tekniska högskolan (Stockholm)
dc.subject: UPC subject areas::Computer science::Information systems
dc.subject.lcsh: Internet in public administration
dc.subject.other: e-government
dc.subject.other: machine learning
dc.subject.other: WEKA
dc.subject.other: SVM
dc.subject.other: Naive Bayes
dc.subject.other: kNN
dc.subject.other: Swedish
dc.subject.other: PoS tagging
dc.subject.other: feature extraction
dc.subject.other: feature selection
dc.subject.other: automatic e-mail classification
dc.title: Comparison of automatic classifiers' performances using word-based feature extraction techniques in an e-government setting
dc.type: Master thesis (pre-Bologna period)
dc.subject.lemac: Electronic administration
dc.identifier.slug: 70019
dc.rights.access: Open Access
dc.date.updated: 2012-02-28T13:25:28Z
dc.audience.educationlevel: First/second cycle studies
dc.audience.mediator: Facultat d'Informàtica de Barcelona
dc.audience.degree: COMPUTER ENGINEERING (2003 curriculum)
dc.contributor.covenantee: Kungliga Tekniska högskolan

