Authors:
Mina Sheikh Alishahi; Mohamed Mejri and Nadia Tawbi
Affiliation:
Université Laval, Canada
Keyword(s):
Clustering Algorithm, Spam Emails, Machine Learning, Spam Campaign Detection.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Data Mining; Databases and Information Systems Integration; Enterprise Information Systems; Sensor Networks; Signal Processing; Soft Computing
Abstract:
Spam emails constitute a fast-growing and costly problem associated with the Internet today. To fight effectively
against spammers, it is not enough to block spam messages. Instead, it is necessary to analyze the
behavior of spammers. This analysis is extremely difficult if the huge volume of spam messages is considered
as a whole. Clustering spam emails into smaller groups according to their inherent similarity facilitates the discovery
of spam campaigns sent by a single spammer, which in turn makes it possible to analyze that spammer's behavior. This paper proposes
a methodology to group large sets of spam emails into spam campaigns on the basis of categorical attributes
of spam messages. A new informative clustering algorithm, named Categorical Clustering Tree (CCTree), is
introduced to cluster and characterize spam campaigns. The complexity of the algorithm is also analyzed, and
its efficiency is proven.
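
To illustrate the general idea of tree-based clustering over categorical attributes, the following is a minimal sketch, not the authors' implementation. The abstract does not state how CCTree selects splits or when it stops, so the choices here are assumptions made for illustration: the node is split on the attribute with the highest Shannon entropy, and splitting stops when a node is small or nearly homogeneous. The function names, the attribute names (language, attachment, link), and the thresholds min_size and purity are all hypothetical.

```python
from collections import Counter
from math import log2


def attribute_entropy(records, attr):
    """Shannon entropy of the values that `attr` takes across `records`."""
    counts = Counter(r[attr] for r in records)
    total = len(records)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def cluster_tree_sketch(records, attrs, min_size=2, purity=0.1):
    """Recursively split records on categorical attributes.

    Returns a list of leaf clusters (each a list of records).
    ASSUMPTION: split criterion and stop conditions are illustrative only.
    """
    # Stop when the node is small or no attributes remain to split on.
    if len(records) <= min_size or not attrs:
        return [records]
    entropies = {a: attribute_entropy(records, a) for a in attrs}
    # Stop when the node is nearly homogeneous (low average entropy).
    if sum(entropies.values()) / len(attrs) <= purity:
        return [records]
    # Split on the attribute whose values are most evenly spread.
    split = max(attrs, key=lambda a: entropies[a])
    groups = {}
    for r in records:
        groups.setdefault(r[split], []).append(r)
    remaining = [a for a in attrs if a != split]
    leaves = []
    for group in groups.values():
        leaves.extend(cluster_tree_sketch(group, remaining, min_size, purity))
    return leaves


# Hypothetical toy data: four messages described by three categorical attributes.
emails = [
    {"language": "en", "attachment": "no", "link": "yes"},
    {"language": "en", "attachment": "no", "link": "yes"},
    {"language": "fr", "attachment": "yes", "link": "no"},
    {"language": "fr", "attachment": "yes", "link": "no"},
]
clusters = cluster_tree_sketch(emails, ["language", "attachment", "link"])
```

Each leaf of such a tree is described by the attribute values on the path that leads to it, which is one way a tree structure can characterize a group of messages as well as cluster them.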