Web Usage Mining for Automatic Link Generation

Olatz Arbelaitz, Ibai Gurrutxaga, Aizea Lojo, Javier Muguerza, Jesús M. Pérez, Iñigo Perona

2012

Abstract

Over the last decades, the amount of information on the web has increased drastically, but larger quantities of data do not per se provide added value for web visitors; there is a need for more efficient access to the required information and for adaptation to user preferences or needs. Using machine learning techniques to build user profiles makes it possible to take users' real preferences into account. In this work we present a preliminary system, based on the collaborative filtering approach, that identifies and generates interesting links for users while they are navigating. The system uses only the web navigation logs stored in any web server (in the Common Log Format) and extracts information from them by combining unsupervised and supervised classification techniques with frequent pattern mining techniques. It also includes a generalization procedure in the data preprocessing phase, and in this work we analyze its effect on the final performance of the whole system. We also analyze the effect of the cold start (0 day problem) on the proposed system. The experiments show that the proposed generalization option improves the results of the designed system, which performs efficiently with respect to a web-accessible database and is even able to deal with the cold start problem.
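
Since the system takes Common Log Format entries as its only input, the following minimal sketch illustrates what reading one such entry might look like. It is not the authors' implementation: the names `CLF_PATTERN`, `LogEntry` and `parse_clf_line`, as well as the sample log line in the usage note, are assumptions made for illustration only.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Common Log Format: host ident authuser [date] "request" status bytes
# (hypothetical pattern written for this sketch, not taken from the paper)
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


class LogEntry(NamedTuple):
    """Fields of one request that a usage-mining step would typically need."""
    host: str
    time: datetime
    method: str
    url: str
    status: int
    size: int


def parse_clf_line(line: str) -> Optional[LogEntry]:
    """Parse one Common Log Format line; return None if it is malformed."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    # The request field normally reads "METHOD URL PROTOCOL".
    parts = match.group("request").split()
    if len(parts) < 2:
        return None
    method, url = parts[0], parts[1]
    # Month names are parsed with the active locale (English names assumed).
    time = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    # A dash in the size field means that no body was returned.
    size = 0 if match.group("size") == "-" else int(match.group("size"))
    return LogEntry(match.group("host"), time, method, url,
                    int(match.group("status")), size)
```

For example, calling `parse_clf_line('host.example.org - - [01/Jul/1995:00:00:01 -0400] "GET /index.html HTTP/1.0" 200 6245')` on this made-up line yields an entry whose `url` is `/index.html` and whose `status` is 200. Entries of this kind would then be grouped into user sessions before any clustering or pattern mining takes place.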


Paper Citation


in Harvard Style

Arbelaitz O., Gurrutxaga I., Lojo A., Muguerza J., Pérez J. M. and Perona I. (2012). Web Usage Mining for Automatic Link Generation. In Proceedings of the 10th International Workshop on Modelling, Simulation, Verification and Validation of Enterprise Information Systems and 1st International Workshop on Web Intelligence - Volume 1: WEBI, (ICEIS 2012) ISBN 978-989-8565-14-3, pages 71-80. DOI: 10.5220/0004090400710080


in Bibtex Style

@conference{webi12,
author={Olatz Arbelaitz and Ibai Gurrutxaga and Aizea Lojo and Javier Muguerza and Jesús M. Pérez and Iñigo Perona},
title={Web Usage Mining for Automatic Link Generation},
booktitle={Proceedings of the 10th International Workshop on Modelling, Simulation, Verification and Validation of Enterprise Information Systems and 1st International Workshop on Web Intelligence - Volume 1: WEBI, (ICEIS 2012)},
year={2012},
pages={71-80},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004090400710080},
isbn={978-989-8565-14-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 10th International Workshop on Modelling, Simulation, Verification and Validation of Enterprise Information Systems and 1st International Workshop on Web Intelligence - Volume 1: WEBI, (ICEIS 2012)
TI - Web Usage Mining for Automatic Link Generation
SN - 978-989-8565-14-3
AU - Arbelaitz O.
AU - Gurrutxaga I.
AU - Lojo A.
AU - Muguerza J.
AU - Pérez J. M.
AU - Perona I.
PY - 2012
SP - 71
EP - 80
DO - 10.5220/0004090400710080