AN APPROACH FOR COMBINING SEMANTIC INFORMATION AND PROXIMITY INFORMATION FOR TEXT SUMMARIZATION

Hogyeong Jeong, Yeogirl Yun

Abstract

This paper develops and evaluates an approach for combining semantic information with proximity information for text summarization. The approach is based on the proximity language model (PLM), which incorporates term proximity information into the unigram language model. The novel contribution of this paper is an extension of the PLM that also incorporates semantic information using latent semantic analysis (LSA). We argue that this approach achieves a good balance between syntactic and semantic information. We evaluate the approach with ROUGE scores on the Text Analysis Conference (TAC) 2009 Summarization task and find that incorporating LSA into the PLM improves on the baseline models.
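The evaluation above is reported as ROUGE scores (Lin, 2004). To illustrate what such a score measures, the following is a minimal sketch of ROUGE-N recall in Python. It computes the fraction of reference-summary n-grams that also appear in a candidate summary, crediting each n-gram at most as many times as it occurs in both. This is not the official ROUGE package used in TAC evaluations. The tokenization (lowercasing and whitespace splitting, with no stemming or stopword removal) and the function names `ngrams` and `rouge_n_recall` are assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=1):
    """ROUGE-N recall of one candidate summary against one or more references.

    Simplifying assumption: tokens are produced by lowercasing and splitting
    on whitespace; the official ROUGE toolkit offers stemming and other options.
    """
    cand = ngrams(candidate.lower().split(), n)
    matched = 0
    total = 0
    for ref in references:
        ref_grams = ngrams(ref.lower().split(), n)
        # Count_match: each reference n-gram is credited at most as often
        # as it also occurs in the candidate (a clipped count).
        matched += sum(min(count, cand[gram]) for gram, count in ref_grams.items())
        total += sum(ref_grams.values())
    # Matches and totals are summed over all references before dividing.
    return matched / total if total else 0.0
```

For example, `rouge_n_recall("the cat sat on the mat", ["the cat is on the mat"], n=2)` matches 3 of the reference's 5 bigrams and returns 0.6. Summing matches and totals over all references before dividing, rather than averaging per-reference scores, follows the multi-reference ROUGE-N formula given by Lin (2004).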

References

  1. Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, pages 993-1022.
  2. Carbonell, J. and Goldstein, J. (1998). The use of MMR, diversity-based reranking for reordering documents and producing summaries. Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '98, pages 335-336.
  3. Genest, P., Lapalme, G., and Yousfi-Monod, M. (2010). HexTac: the creation of a manual extractive run. Proceedings of the Second Text Analysis Conference, Gaithersburg, Maryland, USA: National Institute of Standards and Technology.
  4. Gillick, D., Favre, B., Hakkani-Tur, D., Bohnet, B., Liu, Y., and Xie, S. (2010). The ICSI/UTD summarization system at TAC 2009. Proceedings of the Second Text Analysis Conference, Gaithersburg, Maryland, USA: National Institute of Standards and Technology.
  5. Kumar, C., Pingali, P., and Varma, V. (2009). Estimating risk of picking a sentence for document summarization. Computational Linguistics and Intelligent.
  6. Landauer, T., Foltz, P., and Laham, D. (1998). An introduction to latent semantic analysis. Discourse Processes, 25(2):259-284.
  7. Lin, C. (2004). ROUGE: A package for automatic evaluation of summaries. Proceedings of the workshop on text summarization branches out (WAS 2004), pages 25-26.
  8. Mihalcea, R. (2004). Graph-based ranking algorithms for sentence extraction, applied to text summarization. Proceedings of the ACL 2004 on Interactive poster and demonstration sessions, page 20.
  9. Sahlgren, M. and Karlgren, J. (2005). Automatic bilingual lexicon acquisition using random indexing of parallel corpora. Journal of Natural Language Engineering, pages 327-341.
  10. Sellberg, L. and Jonsson, A. (2008). Using random indexing to improve singular value decomposition for latent semantic analysis. In Proceedings of the Sixth International Language Resources and Evaluation - LREC '08.
  11. Steinberger, J. (2004). Using latent semantic analysis in text summarization and summary evaluation. Proc. ISIM'04.
  12. Xie, Z., Li, X., Di Eugenio, B., Nelson, P. C., Xiao, W., and Tirpak, T. M. (2004). Using gene expression programming to construct sentence ranking functions for text summarization. Proceedings of the 20th international conference on Computational Linguistics - COLING '04, pages 1381-es.
  13. Zhao, J. and Yun, Y. (2009). A proximity language model for information retrieval. Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, pages 291-298.


Paper Citation


in Harvard Style

Jeong H. and Yun Y. (2011). AN APPROACH FOR COMBINING SEMANTIC INFORMATION AND PROXIMITY INFORMATION FOR TEXT SUMMARIZATION. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011) ISBN 978-989-8425-79-9, pages 419-424. DOI: 10.5220/0003650704270432


in BibTeX Style

@conference{kdir11,
author={Hogyeong Jeong and Yeogirl Yun},
title={AN APPROACH FOR COMBINING SEMANTIC INFORMATION AND PROXIMITY INFORMATION FOR TEXT SUMMARIZATION},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)},
year={2011},
pages={419-424},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003650704270432},
isbn={978-989-8425-79-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)
TI - AN APPROACH FOR COMBINING SEMANTIC INFORMATION AND PROXIMITY INFORMATION FOR TEXT SUMMARIZATION
SN - 978-989-8425-79-9
AU - Jeong H.
AU - Yun Y.
PY - 2011
SP - 419
EP - 424
DO - 10.5220/0003650704270432