Authors:
Fred N. Kiwanuka (1); Ja’far Alqatawna (2); Anang Hudaya Muhamad Amin (1); Sujni Paul (1) and Hossam Faris (3)
Affiliations:
(1) Computer Information Science, Higher Colleges of Technology, Dubai, U.A.E.
(2) Computer Information Science, Higher Colleges of Technology, Dubai, U.A.E.; King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan
(3) King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan
Keyword(s):
Spam Detection, Dataset Processing, Automated Feature Engineering, Classification, Spam Features, Data Mining, Machine Learning, Python E-mail Feature Extraction and Classification Tool (CPyEFECT).
Abstract:
Every day, billions of emails pass through online servers, of which about 59% are spam according to recent research. Spam emails increasingly contain viruses or other malware and pose a security risk to computer systems. Spam filtering has therefore become more important than ever for the security of computer systems. Spam now evolves so quickly that previously successful detection methods are failing to cope. In this paper, we propose a comprehensive and automated feature engineering framework for spam classification. The proposed framework enables, first, the development of a large number of features from any email corpus and, second, the automated extraction of features using feature transformation and aggregation primitives. We show that spam classification performance improves by between 2% and 28% for almost all conventional machine learning classifiers when automated feature engineering is used. As a by-product of our comprehensive automated feature engineering, we develop a Python-based open source tool, which incorporates the proposed framework.
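As a rough illustration of what feature transformation and aggregation primitives can look like, the following minimal sketch derives a few base features from a toy corpus, applies one transformation (a ratio of two features) and one aggregation (a per-sender mean). The corpus, the field names, and the functions `base_features`, `transform` and `aggregate_by_sender` are all hypothetical and chosen only for illustration; they are not taken from the paper's tool.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical toy corpus; the field names are illustrative assumptions.
emails = [
    {"sender": "a@example.com", "body": "WIN a FREE prize now http://spam.example/win"},
    {"sender": "a@example.com", "body": "Claim your reward http://spam.example/a http://spam.example/b"},
    {"sender": "b@example.com", "body": "Meeting moved to 3 pm tomorrow"},
]

URL_RE = re.compile(r"https?://\S+")


def base_features(email):
    """Extract simple per-email counts from the message body."""
    body = email["body"]
    words = body.split()
    return {
        "num_words": len(words),
        "num_urls": len(URL_RE.findall(body)),
        "num_upper_words": sum(w.isupper() for w in words),
    }


def transform(features):
    """Transformation primitive: combine two features of one email into a ratio."""
    out = dict(features)
    out["url_ratio"] = features["num_urls"] / max(features["num_words"], 1)
    return out


def aggregate_by_sender(emails, rows, key):
    """Aggregation primitive: mean of one feature over all emails of a sender."""
    groups = defaultdict(list)
    for email, row in zip(emails, rows):
        groups[email["sender"]].append(row[key])
    return {sender: mean(values) for sender, values in groups.items()}


rows = [transform(base_features(e)) for e in emails]
mean_urls = aggregate_by_sender(emails, rows, "num_urls")
for email, row in zip(emails, rows):
    # Attach the sender-level aggregate back onto each email's feature row.
    row["sender_mean_num_urls"] = mean_urls[email["sender"]]
```

Each entry of `rows` can then be passed as a feature vector to any conventional classifier. Stacking further transformations and aggregations on top of these would enlarge the feature set in the same way.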