Identifying Information Units for Multiple Document Summarization

Seamus Lyons, Dan Smith

Abstract

Multiple document summarization is becoming increasingly important as a way of reducing information overload, particularly given the proliferation of similar accounts of events available on the Web. Removing similar sentences often results in the partial or unwanted elimination of important information. In this paper, we present an approach that splits sentences into their component clauses and uses these clauses to produce comprehensive summaries of multiple documents describing particular events. Detailed analysis of all clauses and clause boundaries can be complex and computationally expensive. Our rule-based approach demonstrates that high accuracy can be achieved in reasonable time.
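
The idea of working with clauses rather than whole sentences can be illustrated with a minimal sketch. This is not the authors' rule set: the boundary markers in BOUNDARY and the helper names split_clauses, normalize and merge_clauses are hypothetical, chosen only to show how clause-level units allow a repeated clause to be dropped while the new information in the same sentence is kept.

```python
import re

# Hypothetical clause-boundary markers (an assumption, not the paper's rules):
# a comma followed by a conjunction or relative pronoun, a semicolon, or "because".
BOUNDARY = re.compile(
    r",\s*(?:and|but|which|who|while)\s+|;\s*|\s+because\s+",
    re.IGNORECASE,
)


def split_clauses(sentence):
    """Split one sentence at the illustrative boundary markers."""
    parts = BOUNDARY.split(sentence.strip().rstrip(".!?"))
    return [p.strip() for p in parts if p.strip()]


def normalize(clause):
    """Lower-case and drop punctuation so trivially different clauses compare equal."""
    return " ".join(re.findall(r"[a-z0-9]+", clause.lower()))


def merge_clauses(documents):
    """Collect clauses across documents, keeping the first occurrence of each."""
    seen = set()
    units = []
    for doc in documents:
        # Naive sentence split on terminal punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            for clause in split_clauses(sentence):
                key = normalize(clause)
                if key and key not in seen:
                    seen.add(key)
                    units.append(clause)
    return units


doc1 = "The storm hit the coast on Monday, and thousands lost power."
doc2 = "Thousands lost power; the governor declared an emergency."
units = merge_clauses([doc1, doc2])
```

In this example the two sentences differ, so sentence-level duplicate removal would keep both and repeat the power loss. At clause level the repeated clause from the second document is dropped and the governor's declaration is retained. Exact matching after normalization is a deliberate simplification; a real system would need a softer similarity measure and a far richer set of boundary rules.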



Paper Citation


in Harvard Style

Lyons S. and Smith D. (2005). Identifying Information Units for Multiple Document Summarization. In Proceedings of the 2nd International Workshop on Natural Language Understanding and Cognitive Science - Volume 1: NLUCS, (ICEIS 2005) ISBN 972-8865-23-6X, pages 110-117. DOI: 10.5220/0002564701100117


in Bibtex Style

@conference{nlucs05,
author={Seamus Lyons and Dan Smith},
title={Identifying Information Units for Multiple Document Summarization},
booktitle={Proceedings of the 2nd International Workshop on Natural Language Understanding and Cognitive Science - Volume 1: NLUCS, (ICEIS 2005)},
year={2005},
pages={110-117},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002564701100117},
isbn={972-8865-23-6X},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 2nd International Workshop on Natural Language Understanding and Cognitive Science - Volume 1: NLUCS, (ICEIS 2005)
TI - Identifying Information Units for Multiple Document Summarization
SN - 972-8865-23-6X
AU - Lyons S.
AU - Smith D.
PY - 2005
SP - 110
EP - 117
DO - 10.5220/0002564701100117