AUTOMATIC ESTIMATION OF THE LSA DIMENSION

Jorge Fernandes, Andreia Artífice, Manuel J. Fonseca

Abstract

Collections of information have grown so large that finding and exploring material on a particular subject has become difficult. One way to address this problem is text classification, in which a theme or category is assigned to a text based on an analysis of its content. However, existing approaches to text classification demand considerable effort and a high level of expertise from users, which puts them out of reach of the common user. Another problem is that current approaches are optimized for a specific problem and cannot easily be adapted to another context. In particular, unsupervised methods based on the Latent Semantic Analysis (LSA) algorithm require users to define the dimension used by the algorithm. In this paper we describe an approach that makes text classification more accessible to common users by providing a formula that estimates the LSA dimension from the number of texts used during the bootstrapping process. Experimental results show that our formula for estimating the LSA dimension allows us to create unsupervised solutions that achieve results similar to those of supervised approaches.



Paper Citation


in Harvard Style

Fernandes J., Artífice A. and Fonseca M. J. (2011). AUTOMATIC ESTIMATION OF THE LSA DIMENSION. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011) ISBN 978-989-8425-79-9, pages 301-305. DOI: 10.5220/0003666103090313


in Bibtex Style

@conference{kdir11,
author={Jorge Fernandes and Andreia Artífice and Manuel J. Fonseca},
title={AUTOMATIC ESTIMATION OF THE LSA DIMENSION},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)},
year={2011},
pages={301-305},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003666103090313},
isbn={978-989-8425-79-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)
TI - AUTOMATIC ESTIMATION OF THE LSA DIMENSION
SN - 978-989-8425-79-9
AU - Fernandes J.
AU - Artífice A.
AU - Fonseca M. J.
PY - 2011
SP - 301
EP - 305
DO - 10.5220/0003666103090313