A NOVEL WEB USAGE MINING METHOD - Mining and Clustering of DAG Access Patterns Considering Page Browsing Time

Koichiro Mihara, Masahiro Terabe, Kazuo Hashimoto

Abstract

In this paper, we propose a novel method for analyzing web access logs. The method defines a web access pattern as a directed acyclic graph (DAG) annotated with page browsing time, and extracts such patterns using DIGDAG, an algorithm for mining closed frequent embedded DAGs. The method keeps the number of extracted patterns to the necessary minimum, and clustering the extracted patterns makes the analysis more efficient.
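
As a rough illustration of what "a DAG with page browsing time" could look like, the sketch below turns one browsing session into a small labeled DAG. It is not the construction used in the paper, which the abstract does not spell out. The function names, the time thresholds, the rule of summing time over revisits, and the rule of dropping back-navigation edges to keep the graph acyclic are all assumptions made for this example.

```python
def discretize(seconds, short=10, long=60):
    """Map a raw browsing time to a coarse label (thresholds are assumptions)."""
    if seconds is None:
        return "unknown"  # e.g. the last page of a session has no next request
    if seconds < short:
        return "short"
    if seconds < long:
        return "medium"
    return "long"


def session_to_dag(requests):
    """Build a labeled DAG from one session (hypothetical construction).

    requests: list of (timestamp_seconds, page) tuples sorted by time.
    Returns (nodes, edges), where nodes maps page -> browsing-time label
    and edges is a set of (source_page, target_page) pairs.
    """
    rank = {}   # page -> order of first visit; edges only go from low to high rank
    dwell = {}  # page -> total seconds spent on the page, summed over revisits
    for i, (ts, page) in enumerate(requests):
        rank.setdefault(page, len(rank))
        if i + 1 < len(requests):
            dwell[page] = dwell.get(page, 0) + (requests[i + 1][0] - ts)

    edges = set()
    for (_, src), (_, dst) in zip(requests, requests[1:]):
        # Keep only forward moves; returning to an earlier page would create a cycle.
        if rank[src] < rank[dst]:
            edges.add((src, dst))

    nodes = {page: discretize(dwell.get(page)) for page in rank}
    return nodes, edges


# Example session: home -> list -> item1 -> back to list -> item2
session = [(0, "/"), (5, "/list"), (50, "/item1"), (55, "/list"), (130, "/item2")]
nodes, edges = session_to_dag(session)
```

Graphs of this kind, one per session, are the sort of input a frequent sub-DAG miner such as DIGDAG would take. The mining and clustering steps themselves are not sketched here.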


Paper Citation


in Harvard Style

Mihara K., Terabe M. and Hashimoto K. (2008). A NOVEL WEB USAGE MINING METHOD - Mining and Clustering of DAG Access Patterns Considering Page Browsing Time. In Proceedings of the Fourth International Conference on Web Information Systems and Technologies - Volume 2: WEBIST, ISBN 978-989-8111-27-2, pages 313-320. DOI: 10.5220/0001528303130320


in Bibtex Style

@conference{webist08,
author={Koichiro Mihara and Masahiro Terabe and Kazuo Hashimoto},
title={A NOVEL WEB USAGE MINING METHOD - Mining and Clustering of DAG Access Patterns Considering Page Browsing Time},
booktitle={Proceedings of the Fourth International Conference on Web Information Systems and Technologies - Volume 2: WEBIST},
year={2008},
pages={313-320},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001528303130320},
isbn={978-989-8111-27-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Fourth International Conference on Web Information Systems and Technologies - Volume 2: WEBIST
TI - A NOVEL WEB USAGE MINING METHOD - Mining and Clustering of DAG Access Patterns Considering Page Browsing Time
SN - 978-989-8111-27-2
AU - Mihara K.
AU - Terabe M.
AU - Hashimoto K.
PY - 2008
SP - 313
EP - 320
DO - 10.5220/0001528303130320