DISTRIBUTED ENSEMBLE LEARNING IN TEXT CLASSIFICATION

Catarina Silva, Bernardete Ribeiro, Uroš Lotrič, Andrej Dobnikar

2008

Abstract

In today’s society, individuals and organizations face an ever-growing load and diversity of textual information and content, along with increasing demands for knowledge and skills. In this work we address part of these challenges through text classification, which is essential to managing knowledge, by combining several different pioneering kernel-learning machines, namely Support Vector Machines and Relevance Vector Machines. To speed up the complex learning procedures, we establish a model of a high-performance distributed computing environment to help tackle the tasks involved in the text classification problem. The presented approach is valuable in many practical situations where text classification is used. The Reuters-21578 benchmark data set is used to demonstrate the strength of the proposed system, while different ensemble-based learning machines provide text classification models that are efficiently deployed on the Condor and Alchemi platforms.


Paper Citation


in Harvard Style

Silva C., Ribeiro B., Lotrič U. and Dobnikar A. (2008). DISTRIBUTED ENSEMBLE LEARNING IN TEXT CLASSIFICATION. In Proceedings of the Tenth International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 978-989-8111-37-1, pages 420-423. DOI: 10.5220/0001680604200423


in Bibtex Style

@conference{iceis08,
author={Catarina Silva and Bernardete Ribeiro and Uroš Lotrič and Andrej Dobnikar},
title={DISTRIBUTED ENSEMBLE LEARNING IN TEXT CLASSIFICATION},
booktitle={Proceedings of the Tenth International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2008},
pages={420-423},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001680604200423},
isbn={978-989-8111-37-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Tenth International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - DISTRIBUTED ENSEMBLE LEARNING IN TEXT CLASSIFICATION
SN - 978-989-8111-37-1
AU - Silva C.
AU - Ribeiro B.
AU - Lotrič U.
AU - Dobnikar A.
PY - 2008
SP - 420
EP - 423
DO - 10.5220/0001680604200423