Boosting trees for anti-spam email filtering (Extended version)

Document type: Research report
Defense date: 2001-10
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial
property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public
communication or transformation of this work are prohibited without permission of the copyright holder.
Abstract
In this work, a set of comparative experiments on the
problem of automatically filtering unwanted electronic mail
messages is performed on two public corpora: PU1 and
LingSpam. Several variants of the AdaBoost algorithm with
confidence-rated predictions (Schapire et al., 99) have been
applied, which differ in the complexity of the base learners
considered. Two main conclusions can be drawn from our
experiments: a) The boosting-based methods clearly
outperform the results of the other learning algorithms
published on the two evaluation corpora, achieving very high
levels of the F_1 measure; b) Increasing the complexity of
the base learners makes it possible to obtain better
high-precision classifiers, which is a very important issue
when misclassification costs are considered.
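
For readers unfamiliar with the technique named in the abstract, the following is a minimal sketch of AdaBoost with confidence-rated predictions, using the simplest possible base learner: a decision stump that tests a single binary feature (for example, whether a given word occurs in a message). It is an illustration under stated assumptions, not the authors' implementation: the function names, the smoothing constant `eps`, the number of rounds, and the `threshold` parameter are all hypothetical choices made for this example.

```python
import math

def train_boosted_stumps(X, y, rounds=10, eps=1e-6):
    """Confidence-rated AdaBoost over one-feature decision stumps.

    X: list of binary feature vectors (entries 0 or 1).
    y: list of labels, +1 (spam) or -1 (legitimate).
    eps: smoothing constant (an assumption) that keeps predictions finite.
    Returns a list of (feature_index, [c_if_0, c_if_1]) weak rules.
    """
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n  # uniform initial distribution over examples
    model = []
    for _ in range(rounds):
        best = None
        for f in range(d):
            # W[v] = [weight of negatives, weight of positives] with x[f] == v
            W = [[0.0, 0.0], [0.0, 0.0]]
            for xi, yi, wi in zip(X, y, w):
                W[xi[f]][1 if yi > 0 else 0] += wi
            # Normalization factor Z; a smaller Z means a better weak rule
            z = 2.0 * sum(math.sqrt(neg * pos) for neg, pos in W)
            if best is None or z < best[0]:
                # Real-valued prediction per block: sign is the class,
                # magnitude is the confidence
                c = [0.5 * math.log((pos + eps) / (neg + eps)) for neg, pos in W]
                best = (z, f, c)
        _, f, c = best
        model.append((f, c))
        # Raise the weight of examples the new rule gets wrong, then renormalize
        w = [wi * math.exp(-yi * c[xi[f]]) for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def score(model, x):
    """Sum of the confidence-rated predictions of all weak rules."""
    return sum(c[x[f]] for f, c in model)

def classify(model, x, threshold=0.0):
    """Label x as spam (+1) only if its score exceeds the threshold."""
    return 1 if score(model, x) > threshold else -1
```

In this sketch, raising `threshold` above zero makes the classifier more reluctant to label a message as spam, which is one simple way to trade recall for precision when misclassifying legitimate mail is costly. Replacing the stump with a deeper tree would correspond to a more complex base learner.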
Citation: Carreras, X., Marquez, L. "Boosting trees for anti-spam email filtering (Extended version)". 2001.
Is part of: LSI-01-44-R