Spam Classification Using Machine Learning Techniques - Sinespam

Norte Sosa, José

dc.contributor	Alquézar Mancho, René
dc.contributor.author	Norte Sosa, José
dc.date.accessioned	2011-03-10T15:24:54Z
dc.date.available	2011-03-10T15:24:54Z
dc.date.issued	2010-08
dc.identifier.uri	http://hdl.handle.net/2099.1/11321
dc.description.abstract	Most e-mail readers regularly spend a non-trivial amount of time deleting junk e-mail (spam) messages, even as an expanding volume of such e-mail occupies server storage space and consumes network bandwidth. An ongoing challenge, therefore, lies in the development and refinement of automatic classifiers that can distinguish legitimate e-mail from spam. Some published studies have examined spam detectors using Naïve Bayesian approaches and large feature sets of binary attributes that indicate the presence of keywords common in spam, and many commercial applications also use Naïve Bayesian techniques. Spammers recognize these attempts to block their messages and have developed tactics to circumvent these filters, but these evasive tactics are themselves patterns that human readers can often identify quickly. One objective of this work was to develop an alternative approach using a neural network (NN) classifier trained on a corpus of e-mail messages from several users. The feature selection used in this work is one of its major improvements: the feature set uses descriptive characteristics of words and messages similar to those that a human reader would use to identify spam, and the best feature set was chosen by forward feature selection. Another objective was to raise spam detection accuracy to nearly 95% using artificial neural networks (ANNs); at present, no one has reached more than 89% accuracy using ANNs.
dc.language.iso	eng
dc.publisher	Universitat Politècnica de Catalunya
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Spain
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.subject	Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial::Sistemes experts
dc.subject.lcsh	Unsolicited electronic mail messages -- Classification
dc.title	Spam Classification Using Machine Learning Techniques - Sinespam
dc.type	Master thesis
dc.subject.lemac	Correu brossa (Correu electrònic) -- Classificació
dc.rights.access	Open Access
dc.audience.educationlevel	Master's
dc.audience.mediator	Facultat d'Informàtica de Barcelona
dc.audience.degree	MÀSTER UNIVERSITARI EN INTEL·LIGÈNCIA ARTIFICIAL (Pla 2009)

Files in this item

Name: J.Norte.pdf
Size: 888.6 KB
Format: PDF


This item appears in the following collections

Master in Artificial Intelligence - MAI (Pla 2006) [73]


UPCommons. UPC's open knowledge portal
