BENEFICIAL SEQUENTIAL COMBINATION OF DATA MINING ALGORITHMS

Mathias Goller, Markus Humer, Michael Schrefl

2006

Abstract

Depending on its goal, an instance of the Knowledge Discovery in Databases (KDD) process may require more than a single data mining algorithm to determine a solution. Sequences of data mining algorithms offer room for improvement that is as yet unexploited. If it is known that an algorithm is the first in a sequence and that other algorithms will run after it, the first algorithm can determine intermediate results that the succeeding algorithms need. The anteceding algorithm can also determine statistics that are helpful for succeeding algorithms. As the anteceding algorithm has to scan the data anyway, these intermediate results are computed as a by-product of computing the anteceding algorithm’s own result. On the one hand, a succeeding algorithm can save time because several of its steps have already been pre-computed. On the other hand, additional information about the analysed data can improve the quality of results, such as the accuracy of classification, as demonstrated in experiments with synthetic and real data.
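
The by-product idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes, purely for illustration, that the anteceding step is a nearest-centroid assignment and that the succeeding step only needs per-cluster means and variances. The names `RunningStats`, `assign_and_collect`, and `summarize` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Count, sum, and sum of squares of one attribute (hypothetical helper)."""
    n: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def add(self, x: float) -> None:
        self.n += 1
        self.total += x
        self.total_sq += x * x

    def mean(self) -> float:
        return self.total / self.n

    def variance(self) -> float:
        # Population variance derived from the accumulated sums.
        m = self.mean()
        return self.total_sq / self.n - m * m


def assign_and_collect(rows, centroids):
    """Anteceding step: a single scan assigns each row to its nearest
    centroid and, in the same pass, accumulates per-cluster statistics."""
    dims = len(centroids[0])
    stats = [[RunningStats() for _ in range(dims)] for _ in centroids]
    labels = []
    for row in rows:
        nearest = min(range(len(centroids)),
                      key=lambda c: math.dist(row, centroids[c]))
        labels.append(nearest)
        for d, value in enumerate(row):
            stats[nearest][d].add(value)
    return labels, stats


def summarize(stats):
    """Succeeding step: derives (mean, variance) per cluster and attribute
    from the stored statistics alone, without scanning the rows again.
    Attributes of empty clusters are reported as None."""
    return [[(s.mean(), s.variance()) if s.n else None for s in cluster]
            for cluster in stats]


rows = [(1.0, 2.0), (3.0, 2.0), (10.0, 10.0)]
centroids = [(2.0, 2.0), (10.0, 10.0)]
labels, stats = assign_and_collect(rows, centroids)
summary = summarize(stats)
```

The design point is that `summarize` never touches `rows`: everything it needs was collected during the scan the first step had to perform anyway.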



Paper Citation


in Harvard Style

Goller M., Humer M. and Schrefl M. (2006). BENEFICIAL SEQUENTIAL COMBINATION OF DATA MINING ALGORITHMS. In Proceedings of the Eighth International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 978-972-8865-42-9, pages 135-143. DOI: 10.5220/0002495501350143


in Bibtex Style

@conference{iceis06,
author={Mathias Goller and Markus Humer and Michael Schrefl},
title={BENEFICIAL SEQUENTIAL COMBINATION OF DATA MINING ALGORITHMS},
booktitle={Proceedings of the Eighth International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2006},
pages={135-143},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002495501350143},
isbn={978-972-8865-42-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Eighth International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - BENEFICIAL SEQUENTIAL COMBINATION OF DATA MINING ALGORITHMS
SN - 978-972-8865-42-9
AU - Goller M.
AU - Humer M.
AU - Schrefl M.
PY - 2006
SP - 135
EP - 143
DO - 10.5220/0002495501350143