A Weighted Maximum Entropy Language Model for Text Classification

Kostas Fragos, Yannis Maistros, Christos Skourlas


The Maximum Entropy (ME) approach has been used extensively in Natural Language Processing tasks such as language modeling, part-of-speech tagging, text classification and text segmentation. Previous work on text classification applied maximum entropy modeling with binary-valued features or with counts of feature words. In this work, we present a different way of applying Maximum Entropy modeling to text classification. Weights are used both to select the features of the model and to estimate the contribution of each extracted feature to the classification task. We use the chi-square (χ²) test to assess the importance of each candidate feature, rank the candidates by their χ² values, and keep the most highly ranked ones as the features of the model. Hence, instead of applying Maximum Entropy modeling in the classical way, we use the χ² values to assign weights to the features of the model. Our method was evaluated on the Reuters-21578 dataset for text classification tasks, giving promising results and performing comparably with some of the “state of the art” classification schemes.
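
The ranking step described above can be illustrated with a minimal sketch. It assumes the standard 2x2 contingency-table form of the chi-square statistic for a term and a category, computed over document counts; the function names `chi_square` and `rank_features`, the toy documents and the tie-breaking rule are hypothetical, and the abstract does not specify how the resulting values enter the Maximum Entropy model, so that part is not shown.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/category contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (an empty row or column): treat as uninformative.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def rank_features(docs, labels, category, top_k):
    """Return the top_k (term, score) pairs for one category.

    docs is a list of token lists and labels the matching category names.
    Ties are broken alphabetically (an assumption made for determinism).
    """
    n_docs = len(docs)
    n_pos = sum(1 for y in labels if y == category)
    in_cat, out_cat = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        # Count each term once per document (document frequency).
        (in_cat if y == category else out_cat).update(set(tokens))

    scores = {}
    for term in set(in_cat) | set(out_cat):
        a = in_cat[term]
        b = out_cat[term]
        c = n_pos - a
        d = (n_docs - n_pos) - b
        scores[term] = chi_square(a, b, c, d)

    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    toy_docs = [["wheat", "price"], ["wheat", "export"],
                ["oil", "price"], ["oil", "export"]]
    toy_labels = ["grain", "grain", "crude", "crude"]
    for term, score in rank_features(toy_docs, toy_labels, "grain", top_k=2):
        print(term, score)
```

Note that the statistic is symmetric in the direction of the association: in the toy data a term that never occurs in the category scores as highly as one that always does, so a practical system may also want to record the sign of the correlation.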



Paper Citation

in Harvard Style

Fragos K., Maistros Y. and Skourlas C. (2005). A Weighted Maximum Entropy Language Model for Text Classification. In Proceedings of the 2nd International Workshop on Natural Language Understanding and Cognitive Science - Volume 1: NLUCS, (ICEIS 2005) ISBN 972-8865-23-6X, pages 55-67. DOI: 10.5220/0002571800550067

in Bibtex Style

author={Kostas Fragos and Yannis Maistros and Christos Skourlas},
title={A Weighted Maximum Entropy Language Model for Text Classification},
booktitle={Proceedings of the 2nd International Workshop on Natural Language Understanding and Cognitive Science - Volume 1: NLUCS, (ICEIS 2005)},

in EndNote Style

JO - Proceedings of the 2nd International Workshop on Natural Language Understanding and Cognitive Science - Volume 1: NLUCS, (ICEIS 2005)
TI - A Weighted Maximum Entropy Language Model for Text Classification
SN - 972-8865-23-6X
AU - Fragos K.
AU - Maistros Y.
AU - Skourlas C.
PY - 2005
SP - 55
EP - 67
DO - 10.5220/0002571800550067