USING ENSEMBLE AND LEARNING TECHNIQUES TOWARDS EXTENDING THE KNOWLEDGE DISCOVERY PIPELINE

Yu-N Cheah, Sakthiaseelan Karthigasoo, Selvakumar Manickam

Abstract

Knowledge discovery is a useful technique for transforming enterprise data into actionable knowledge. However, its effectiveness is limited because it is difficult to develop a knowledge discovery pipeline that suits all types of datasets. Moreover, it is difficult to select the best possible algorithm for each stage of the pipeline. In this paper, we define (a) a novel clustering ensemble algorithm based on self-organizing maps to automate the annotation of un-annotated medical datasets; (b) a data discretization algorithm based on Boolean reasoning to discretize continuous data values; (c) a rule filtering mechanism; and (d) an extension of the regular knowledge discovery process that includes a learning mechanism based on neural network ensembles to produce a neural knowledge base for decision support. We believe that this would result in a decision support system that is tolerant of ambiguous queries, e.g. those with incomplete inputs. We also believe that the boosting and aggregating features of ensemble techniques would help to compensate for shortcomings in some stages of the pipeline. Ultimately, we combine these efforts to produce an extended knowledge discovery pipeline.
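To make the discretization stage (b) more concrete, the following is a minimal Python sketch of one common way to discretize continuous values in the Boolean-reasoning style: candidate cuts are placed midway between consecutive distinct values, and cuts are then chosen greedily until every separable pair of differently labelled records is split. The abstract does not spell out the authors' algorithm, so the greedy selection rule and the names `candidate_cuts`, `greedy_cuts`, and `discretize` are assumptions made for illustration only.

```python
from itertools import combinations


def candidate_cuts(rows, n_attrs):
    """Return (attribute, threshold) pairs at midpoints between
    consecutive distinct values of each attribute."""
    cuts = []
    for a in range(n_attrs):
        values = sorted({row[a] for row in rows})
        cuts.extend((a, (lo + hi) / 2) for lo, hi in zip(values, values[1:]))
    return cuts


def greedy_cuts(rows, labels):
    """Greedily pick cuts until every separable pair of differently
    labelled rows lies on opposite sides of at least one chosen cut.
    (Hypothetical helper; a heuristic, not guaranteed to be minimal.)"""
    cuts = candidate_cuts(rows, len(rows[0]))
    # Pairs of rows with different class labels that still need separating.
    pairs = {(i, j) for i, j in combinations(range(len(rows)), 2)
             if labels[i] != labels[j]}

    def discerned(cut):
        a, t = cut
        return {(i, j) for i, j in pairs
                if (rows[i][a] < t) != (rows[j][a] < t)}

    chosen = []
    while pairs:
        best = max(cuts, key=lambda c: len(discerned(c)), default=None)
        if best is None or not discerned(best):
            break  # remaining pairs have identical values on every attribute
        pairs -= discerned(best)
        chosen.append(best)
    return sorted(chosen)


def discretize(row, cuts):
    """Map each continuous value to the index of the interval it falls in."""
    return tuple(sum(1 for a, t in cuts if a == i and value >= t)
                 for i, value in enumerate(row))
```

With such a sketch, `greedy_cuts(rows, labels)` would be run once on the annotated training records, and `discretize(row, cuts)` would then be applied to each record before rule induction. Attributes that receive no cut collapse to a single interval.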

Paper Citation


in Harvard Style

Cheah Y., Karthigasoo S. and Manickam S. (2005). USING ENSEMBLE AND LEARNING TECHNIQUES TOWARDS EXTENDING THE KNOWLEDGE DISCOVERY PIPELINE. In Proceedings of the Seventh International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 972-8865-19-8, pages 408-411. DOI: 10.5220/0002538104080411


in Bibtex Style

@conference{iceis05,
author={Yu-N Cheah and Sakthiaseelan Karthigasoo and Selvakumar Manickam},
title={USING ENSEMBLE AND LEARNING TECHNIQUES TOWARDS EXTENDING THE KNOWLEDGE DISCOVERY PIPELINE},
booktitle={Proceedings of the Seventh International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2005},
pages={408-411},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002538104080411},
isbn={972-8865-19-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Seventh International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - USING ENSEMBLE AND LEARNING TECHNIQUES TOWARDS EXTENDING THE KNOWLEDGE DISCOVERY PIPELINE
SN - 972-8865-19-8
AU - Cheah Y.
AU - Karthigasoo S.
AU - Manickam S.
PY - 2005
SP - 408
EP - 411
DO - 10.5220/0002538104080411