Authors:
Bill Karakostas¹ and Babis Theodoulidis²
Affiliations:
¹ City University London, United Kingdom; ² University of Manchester, United Kingdom
Keyword(s):
Real Time MapReduce, Data Stream Analysis, Web Log Analysis, Web Mining, Erlang.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Biomedical Engineering; Business Analytics; Data Engineering; Data Mining; Databases and Information Systems Integration; Datamining; Enterprise Information Systems; Health Information Systems; OLAP and MDA Models; Sensor Networks; Signal Processing; Soft Computing; Web Analytics
Abstract:
Monitoring the behaviour of large numbers of web site users in real time poses significant performance challenges, due to the decentralised location and volume of the generated data. This paper proposes a MapReduce-style architecture in which event series from web users are processed by a number of cascading mappers, reducers and rereducers, local to the event origin. Using static analysis and a prototype implementation, we show that this architecture can carry out time series analysis in real time for very large web data sets, based on the actual events, instead of resorting to sampling or other extrapolation techniques.
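
The cascade of mappers, reducers and rereducers described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (the keywords suggest the prototype was written in Erlang); it is a Python illustration under stated assumptions: the event shape `(timestamp, url)`, the 60-second window, and the function names `map_events`, `reduce_counts` and `rereduce` are all hypothetical. Each node maps and reduces its own events locally, and only the small partial results are merged by the rereducer.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Hypothetical event shape: (timestamp in seconds, requested URL).
Event = Tuple[float, str]
Key = Tuple[int, str]


def map_events(events: Iterable[Event], window: int = 60) -> Iterator[Tuple[Key, int]]:
    """Mapper: emit ((window_start, url), 1) for every event."""
    for ts, url in events:
        yield (int(ts // window) * window, url), 1


def reduce_counts(pairs: Iterable[Tuple[Key, int]]) -> Counter:
    """Reducer: sum mapped values per key, on the node where the events arose."""
    totals: Counter = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def rereduce(partials: Iterable[Counter]) -> Counter:
    """Rereducer: merge the partial counts produced by several reducers."""
    merged: Counter = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts key by key
    return merged


# Two assumed event origins, each processed locally before merging.
node_a = [(0.5, "/home"), (12.0, "/home"), (61.0, "/cart")]
node_b = [(30.0, "/home"), (75.0, "/cart")]

partials = [reduce_counts(map_events(node)) for node in (node_a, node_b)]
totals = rereduce(partials)
```

Because the rereduce step only adds counts, it can be applied repeatedly as new partial results arrive, which is what allows the per-window totals to be built from every event rather than from a sample.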