GReAT - A Model for the Automatic Generation of Text Summaries

Claudia Gomez Puyana, Alexandra Pomares Quimbaya

2013

Abstract

The excessive amount of narrative text available in domains such as health (e.g. medical records), justice (e.g. laws, declarations) and insurance (e.g. declarations) increases the time required to analyze information in a decision-making process. Different approaches to generating summaries of these texts have been proposed to solve this problem. However, some of them do not take into account the sequentiality of the original document, which reduces the quality of the final summary; others create general summaries that do not satisfy end users who require a summary tailored to their profile (e.g. different medical specializations require different information); and others do not analyze the potential duplication of information or the noise of natural language in the summary. To cope with these problems, this paper presents GReAT, a model for automatic summarization that relies on natural language processing and text mining techniques to extract the most relevant information from narrative texts according to the requirements of the end user. GReAT is an extraction-based summary generation model whose principle is to identify the information relevant to the user by filtering the text by topic and word frequency; it also reduces the number of phrases in the summary by avoiding duplicated information. Experimental results show that GReAT improves the quality of the summary over other existing methods.
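
The principle stated in the abstract (filter by topic, score by word frequency, drop duplicated phrases, keep the original order) can be illustrated with a minimal sketch. This is not the GReAT implementation: the function names, the stopword list, the naive sentence splitter, the Jaccard overlap measure and the `dup_threshold` value are all assumptions chosen for illustration only.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "was", "for", "with", "on"}

def tokenize(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def jaccard(a, b):
    """Word-set overlap between two token lists, in [0, 1]."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def summarize(text, topic_terms, max_sentences=3, dup_threshold=0.6):
    # Split into sentences on terminal punctuation (a naive splitter).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokens = [tokenize(s) for s in sentences]
    topic = {t.lower() for t in topic_terms}

    # Topic filter: keep only sentences that mention at least one user term.
    candidates = [i for i, toks in enumerate(tokens) if topic & set(toks)]

    # Frequency score: average document-wide frequency of a sentence's words.
    freq = Counter(w for toks in tokens for w in toks)

    def score(i):
        return sum(freq[w] for w in tokens[i]) / len(tokens[i])

    # Greedy selection by score, skipping near-duplicates of chosen sentences.
    chosen = []
    for i in sorted(candidates, key=score, reverse=True):
        if all(jaccard(tokens[i], tokens[j]) < dup_threshold for j in chosen):
            chosen.append(i)
        if len(chosen) == max_sentences:
            break

    # Restore original document order to preserve sequentiality.
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    note = ("The patient reports chest pain. The patient reports chest pain today. "
            "Blood pressure was normal. The weather was sunny.")
    for sentence in summarize(note, ["pain", "pressure"]):
        print(sentence)
```

In this sketch the user profile is reduced to a list of topic terms; a real system would likely derive those terms from a domain vocabulary and use a more robust similarity measure than word-set overlap.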



Paper Citation


in Harvard Style

Gomez Puyana C. and Pomares Quimbaya A. (2013). GReAT - A Model for the Automatic Generation of Text Summaries. In Proceedings of the 15th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-8565-59-4, pages 280-288. DOI: 10.5220/0004454602800288


in Bibtex Style

@conference{iceis13,
author={Claudia Gomez Puyana and Alexandra Pomares Quimbaya},
title={GReAT - A Model for the Automatic Generation of Text Summaries},
booktitle={Proceedings of the 15th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2013},
pages={280-288},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004454602800288},
isbn={978-989-8565-59-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 15th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - GReAT - A Model for the Automatic Generation of Text Summaries
SN - 978-989-8565-59-4
AU - Gomez Puyana C.
AU - Pomares Quimbaya A.
PY - 2013
SP - 280
EP - 288
DO - 10.5220/0004454602800288