Learning Sentence Reduction Rules for Brazilian Portuguese

Daniel Kawamoto, Thiago Alexandre Salgueiro Pardo

2010

Abstract

This paper presents a method for sentence reduction for summarization purposes. We model the task as a machine learning problem that relies on shallow and linguistic features to automatically learn symbolic patterns (rules) that produce good sentence reductions. We evaluate the method on Brazilian Portuguese texts and show that it achieves high accuracy and produces better results than the existing solution for this language.


Paper Citation


in Harvard Style

Kawamoto D. and Pardo T.A.S. (2010). Learning Sentence Reduction Rules for Brazilian Portuguese. In Proceedings of the 7th International Workshop on Natural Language Processing and Cognitive Science - Volume 1: NLPCS, (ICEIS 2010) ISBN 978-989-8425-13-3, pages 90-99. DOI: 10.5220/0003030300900099


in Bibtex Style

@conference{nlpcs10,
author={Daniel Kawamoto and Thiago Alexandre Salgueiro Pardo},
title={Learning Sentence Reduction Rules for Brazilian Portuguese},
booktitle={Proceedings of the 7th International Workshop on Natural Language Processing and Cognitive Science - Volume 1: NLPCS, (ICEIS 2010)},
year={2010},
pages={90-99},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003030300900099},
isbn={978-989-8425-13-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Workshop on Natural Language Processing and Cognitive Science - Volume 1: NLPCS, (ICEIS 2010)
TI - Learning Sentence Reduction Rules for Brazilian Portuguese
SN - 978-989-8425-13-3
AU - Kawamoto D.
AU - Pardo T.A.S.
PY - 2010
SP - 90
EP - 99
DO - 10.5220/0003030300900099