AN APPROACH TO SEMI-SUPERVISED CLASSIFICATION USING THE HUNGARIAN ALGORITHM

Amparo Albalate, Aparna Suchindranath, Wolfgang Minker

2011

Abstract

In this paper we propose a novel semi-supervised classification algorithm based on the cluster-and-label framework. A small number of labeled examples is used to automatically label the extracted clusters, so that the initial labeled seed is implicitly "augmented" to the whole clustered data. The optimum cluster labelling is obtained by means of the Hungarian algorithm, traditionally used to solve assignment optimisation problems. Finally, the augmented labeled set is used to train an SVM classifier. This semi-supervised approach has been compared to a fully supervised version. In our experiments we used an artificial data set (mixture of Gaussians) as well as five real data sets from the UCI repository. In general, the experimental results showed significant improvements in classification performance under minimal labeled sets when using the semi-supervised algorithm.
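
The cluster-labelling step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the clustering has already been computed, that the objective is the number of labeled seeds whose class matches the label given to their cluster (the paper may define the objective differently), and that there are at most as many clusters as classes. The names `hungarian`, `label_clusters`, `cluster_of`, `seeds` and `classes` are hypothetical, and the final SVM training step is omitted.

```python
from collections import Counter


def hungarian(cost):
    """Minimum-cost one-to-one assignment of rows to columns.

    Potentials-based O(n^2 * m) variant of the Hungarian method;
    requires len(cost) <= len(cost[0]). Returns, for each row,
    the index of the column assigned to it.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0] * (n + 1)      # row potentials (1-indexed)
    v = [0] * (m + 1)      # column potentials (1-indexed)
    p = [0] * (m + 1)      # p[j] = row matched to column j (0 = free)
    way = [0] * (m + 1)    # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:     # flip matches along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    return assignment


def label_clusters(cluster_of, seeds, classes):
    """Give each cluster a distinct class label, maximising seed agreement.

    cluster_of: cluster id of every data point (assumed precomputed).
    seeds: {point index: class} for the few labeled examples.
    classes: list of all class labels.
    """
    clusters = sorted(set(cluster_of))
    counts = Counter((cluster_of[i], y) for i, y in seeds.items())
    # Negate the overlap counts: the solver minimises, we want maximum overlap.
    cost = [[-counts[(c, y)] for y in classes] for c in clusters]
    return {c: classes[j] for c, j in zip(clusters, hungarian(cost))}


# Toy data: 11 points in 3 clusters, 7 of them labeled (assumed values).
cluster_of = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
seeds = {0: "a", 1: "a", 2: "a", 4: "a", 5: "a", 6: "c", 8: "b"}
cluster_label = label_clusters(cluster_of, seeds, ["a", "b", "c"])
# "Augment" the seed: every point inherits the label of its cluster.
augmented = [cluster_label[c] for c in cluster_of]
print(cluster_label)  # {0: 'a', 1: 'c', 2: 'b'}
```

In this toy example a per-cluster majority vote would label both cluster 0 and cluster 1 as "a", leaving class "c" without a cluster. The one-to-one assignment instead gives cluster 1 the label "c", which is the kind of conflict an optimal assignment resolves.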

Paper Citation


in Harvard Style

Albalate A., Suchindranath A. and Minker W. (2011). AN APPROACH TO SEMI-SUPERVISED CLASSIFICATION USING THE HUNGARIAN ALGORITHM. In Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-8425-40-9, pages 424-433. DOI: 10.5220/0003187304240433


in Bibtex Style

@conference{icaart11,
author={Amparo Albalate and Aparna Suchindranath and Wolfgang Minker},
title={AN APPROACH TO SEMI-SUPERVISED CLASSIFICATION USING THE HUNGARIAN ALGORITHM},
booktitle={Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2011},
pages={424-433},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003187304240433},
isbn={978-989-8425-40-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - AN APPROACH TO SEMI-SUPERVISED CLASSIFICATION USING THE HUNGARIAN ALGORITHM
SN - 978-989-8425-40-9
AU - Albalate A.
AU - Suchindranath A.
AU - Minker W.
PY - 2011
SP - 424
EP - 433
DO - 10.5220/0003187304240433