ENTERPRISE ANTI-SPAM SOLUTION BASED ON MACHINE LEARNING APPROACH

Igor Mashechkin, Mikhail Petrovskiy, Andrey Rozinkin

Abstract

Spam detection systems based on traditional methods have several well-known drawbacks: a low detection rate, the need for regular knowledge-base updates, and impersonal filtering rules. Newer intelligent methods, which use statistical and machine learning algorithms, address these problems successfully. However, they are not widely used for spam filtering on enterprise-level mail servers because of their high resource consumption and insufficient accuracy with respect to false-positive errors. The solution developed here offers a precise and fast algorithm. Its classification quality is better than that of the Naïve Bayes method, currently the most widespread machine learning method. The time-efficiency problem typical of all learning-based spam filtering methods is solved with a multi-agent architecture, which makes the system easy to scale and allows a unified corporate spam detection system to be built on top of heterogeneous enterprise mail systems. A pilot implementation and its experimental evaluation on standard data sets and on real mail flows have demonstrated that our approach outperforms existing learning-based and traditional spam filtering methods. This makes it a promising platform for constructing enterprise spam filtering systems.
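For readers unfamiliar with the Naïve Bayes baseline and the false-positive concern mentioned in the abstract, the following is a minimal sketch of that baseline only. It is not the authors' algorithm, which the abstract does not describe. The class name `NaiveBayesFilter`, the helper `false_positive_rate`, the whitespace tokenizer, and the toy messages are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokens; real filters use richer features.
    return text.lower().split()


class NaiveBayesFilter:
    """Multinomial Naive Bayes with add-one smoothing (illustrative baseline)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def log_score(self, text, label):
        # Requires at least one training message per class.
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total = sum(self.word_counts[label].values())
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total + len(vocab)))
        return score

    def is_spam(self, text, threshold=0.0):
        # A larger threshold demands stronger evidence before flagging a
        # message, trading missed spam for fewer false positives.
        margin = self.log_score(text, "spam") - self.log_score(text, "ham")
        return margin > threshold


def false_positive_rate(spam_filter, labelled):
    # Share of legitimate ("ham") messages wrongly flagged as spam.
    ham = [text for text, label in labelled if label == "ham"]
    if not ham:
        return 0.0
    return sum(spam_filter.is_spam(text) for text in ham) / len(ham)


# Toy training data, invented for this sketch.
spam_filter = NaiveBayesFilter()
spam_filter.train("cheap pills buy now", "spam")
spam_filter.train("win money now", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("project report attached", "ham")
```

Calling `spam_filter.is_spam(...)` on a new message returns a boolean, and `false_positive_rate` measures the error type that the abstract singles out as the obstacle to enterprise deployment.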



Paper Citation


in Harvard Style

Mashechkin I., Petrovskiy M. and Rozinkin A. (2005). ENTERPRISE ANTI-SPAM SOLUTION BASED ON MACHINE LEARNING APPROACH. In Proceedings of the Seventh International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 972-8865-19-8, pages 188-193. DOI: 10.5220/0002521801880193


in Bibtex Style

@conference{iceis05,
author={Igor Mashechkin and Mikhail Petrovskiy and Andrey Rozinkin},
title={ENTERPRISE ANTI-SPAM SOLUTION BASED ON MACHINE LEARNING APPROACH},
booktitle={Proceedings of the Seventh International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2005},
pages={188-193},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002521801880193},
isbn={972-8865-19-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Seventh International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - ENTERPRISE ANTI-SPAM SOLUTION BASED ON MACHINE LEARNING APPROACH
SN - 972-8865-19-8
AU - Mashechkin I.
AU - Petrovskiy M.
AU - Rozinkin A.
PY - 2005
SP - 188
EP - 193
DO - 10.5220/0002521801880193