Boosting trees for anti-spam email filtering (Extended version)
Document type: Research report
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder.
In this work, comparative experiments on the problem of automatically filtering unwanted electronic mail messages are performed on two public corpora: PU1 and LingSpam. Several variants of the AdaBoost algorithm with confidence-rated predictions (Schapire et al., 1999) are applied, differing in the complexity of the base learners considered. Two main conclusions can be drawn from the experiments: a) the boosting-based methods clearly outperform the results of the other learning algorithms published on the two evaluation corpora, achieving very high values of the F1 measure; b) increasing the complexity of the base learners yields better high-precision classifiers, which is very important when misclassification costs are considered.
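As a rough illustration only, the following is a minimal sketch of AdaBoost with confidence-rated predictions in the style of Schapire and Singer's domain-partitioning weak hypotheses. It uses decision stumps over binary word-presence features, which is the simplest possible base learner; the report's variants use more complex trees. The message representation (a set of tokens), the smoothing term `eps`, the `threshold` parameter, and all function names are assumptions for illustration, not the authors' implementation. The F1 helper computes the harmonic mean of precision and recall on the spam class.

```python
import math


def train_real_adaboost(X, y, rounds=10, eps=None):
    """Confidence-rated AdaBoost with binary-feature stumps (illustrative sketch).

    X: list of sets of tokens (assumed bag-of-words representation).
    y: list of labels, +1 for spam and -1 for legitimate mail.
    Returns a list of stumps (feature, score_if_absent, score_if_present).
    """
    n = len(X)
    if eps is None:
        eps = 1.0 / n  # assumed smoothing value, keeps the log ratios finite
    vocab = sorted(set().union(*X))
    D = [1.0 / n] * n  # uniform initial weights over training examples
    model = []
    for _ in range(rounds):
        best = None
        for feat in vocab:
            # W[b][s]: total weight of examples in block b (0 = absent,
            # 1 = present) whose label is s
            W = {0: {1: 0.0, -1: 0.0}, 1: {1: 0.0, -1: 0.0}}
            for xi, yi, di in zip(X, y, D):
                W[int(feat in xi)][yi] += di
            # Normalization factor Z; the stump minimizing it is selected
            Z = 2 * sum(math.sqrt(W[b][1] * W[b][-1]) for b in (0, 1))
            if best is None or Z < best[0]:
                best = (Z, feat, W)
        _, feat, W = best
        # Real-valued (confidence-rated) prediction for each block
        c = [0.5 * math.log((W[b][1] + eps) / (W[b][-1] + eps)) for b in (0, 1)]
        model.append((feat, c[0], c[1]))
        # Reweight: misclassified or low-margin examples gain weight
        D = [di * math.exp(-yi * c[int(feat in xi)]) for xi, yi, di in zip(X, y, D)]
        total = sum(D)
        D = [di / total for di in D]
    return model


def score(model, x):
    """Sum of the confidence-rated stump outputs for message x."""
    return sum(c1 if feat in x else c0 for feat, c0, c1 in model)


def predict(model, x, threshold=0.0):
    """Label as spam (+1) only if the score exceeds the threshold."""
    return 1 if score(model, x) > threshold else -1


def f1(y_true, y_pred, positive=1):
    """F1 of the positive (spam) class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Raising `threshold` above zero is one generic way to trade recall for precision when false positives (legitimate mail flagged as spam) are costly. It is shown here only as a common knob and is not the mechanism the abstract credits; there, higher precision comes from more complex base learners.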
Citation: Carreras, X.; Marquez, L. "Boosting trees for anti-spam email filtering (Extended version)". 2001.
Is part of: LSI-01-44-R