Authors:
Igor Santos; Carlos Laorden; Xabier Ugarte-Pedrero; Borja Sanz and Pablo G. Bringas
Affiliation:
University of Deusto, Spain
Keyword(s):
Computer security, Spam filtering, Anomaly detection, Text classification.
Related Ontology Subjects/Areas/Topics:
Human Factors and Human Behaviour Recognition Techniques; Information and Systems Security; Information Assurance; Intrusion Detection & Prevention; Security Verification and Validation
Abstract:
Spam has become an important problem for computer security because it is a channel for the spreading of
threats such as computer viruses, worms and phishing. Currently, more than 85% of received e-mails are
spam. Historical approaches to combat these messages, including simple techniques such as sender blacklisting
or the use of e-mail signatures, are no longer completely reliable. Many solutions utilise machine-learning
approaches trained using statistical representations of the terms that usually appear in the e-mails. However,
these methods require a time-consuming training step with labelled data. When labelled training instances
are scarce, the progress of filtering systems slows down, which gives
spammers an advantage. In this paper, we present the first spam filtering method based on anomaly detection
that reduces the need to label spam messages and employs only a representation of legitimate e-mails.
This approach represents legitimate e-mails as word frequency vectors. An e-mail is then classified
as spam or legitimate by measuring its deviation from the representation of the legitimate e-mails. We show
that this method achieves high accuracy rates detecting spam while maintaining a low false positive rate and
reducing the effort required to label spam.
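
The idea described in the abstract (model only legitimate e-mails as word frequency vectors, then flag messages that deviate from them) can be illustrated with a minimal sketch. This is not the authors' implementation: the tokeniser, the use of cosine distance, the averaging of distances, the threshold value of 0.8, and all function names are assumptions chosen for illustration, since the abstract does not specify them.

```python
import math
import re
from collections import Counter


def tf_vector(text):
    """Map an e-mail body to relative word frequencies (assumed tokeniser)."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def cosine_distance(a, b):
    """1 - cosine similarity of two sparse vectors; 1.0 if either is empty."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def deviation(email, legitimate_vectors):
    """Mean distance from the e-mail to every known legitimate e-mail.

    Averaging is an assumption; minimum or maximum distance would also work.
    """
    vec = tf_vector(email)
    distances = [cosine_distance(vec, lv) for lv in legitimate_vectors]
    return sum(distances) / len(distances)


def is_spam(email, legitimate_vectors, threshold=0.8):
    """Flag the e-mail as spam when it deviates too far (threshold is assumed)."""
    return deviation(email, legitimate_vectors) > threshold


# Only legitimate e-mails are needed to build the model; no spam is labelled.
legit = [tf_vector(t) for t in (
    "please find the meeting agenda attached for the project review",
    "the project review meeting is moved to friday please confirm",
    "attached is the agenda for the friday project meeting",
)]

print(is_spam("win free money now click here to claim your prize", legit))
print(is_spam("please confirm the agenda for the project meeting", legit))
```

In practice the threshold would have to be tuned on held-out legitimate mail to balance detection against the false positive rate.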