WEB PAGE SUMMARIZATION BY USING CONCEPT HIERARCHIES

Ben Choi, Xiaomei Huang

2009

Abstract

To address the problem of information overload and to make effective use of the information available on the Web, we created a summarization system that abstracts key concepts and extracts key sentences to summarize text documents, including Web pages. Our proposed system is the first summarization system that uses a knowledge base to generate new abstract concepts for summarizing documents. To generate abstract concepts, our system first maps the words in a document to concepts in the ResearchCyc knowledge base, which organizes concepts into hierarchies that form an ontology in the domain of human consensus reality. It then increases the weights of the mapped concepts to reflect their importance and propagates the weights upward in the concept hierarchies, which provides a method for generalization. To extract key sentences, our system weights each sentence in the document based on the concept weights associated with it and extracts sentences with the highest weights to summarize the document. Moreover, we created a word sense disambiguation method based on the concept hierarchies to select the most appropriate concepts. Test results show that our approach is viable and applicable to knowledge discovery and the Semantic Web.
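
The weighting and extraction steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy hierarchy `PARENT`, the word-to-concept table `WORD_TO_CONCEPT`, the `DECAY` factor, and all function names are hypothetical stand-ins, since the abstract does not specify how ResearchCyc is queried or how much weight is passed upward. Word sense disambiguation is omitted by assuming one concept per word.

```python
from collections import defaultdict

# Hypothetical toy hierarchy (child -> parent); a stand-in for ResearchCyc.
PARENT = {
    "Dog": "Mammal",
    "Cat": "Mammal",
    "Mammal": "Animal",
    "Sparrow": "Bird",
    "Bird": "Animal",
}

# Hypothetical word-to-concept table; assumes each word has a single sense.
WORD_TO_CONCEPT = {
    "dog": "Dog", "dogs": "Dog",
    "cat": "Cat", "cats": "Cat",
    "sparrow": "Sparrow", "sparrows": "Sparrow",
}

DECAY = 0.5  # assumed share of weight passed on at each level going upward


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return [w.strip(".,;:!?").lower() for w in text.split()]


def concept_weights(sentences):
    """Weight each mapped concept, then propagate weight to its ancestors."""
    weights = defaultdict(float)
    for sentence in sentences:
        for word in tokenize(sentence):
            concept = WORD_TO_CONCEPT.get(word)
            if concept is None:
                continue
            weights[concept] += 1.0
            # Upward propagation: each ancestor receives a decayed share.
            share = DECAY
            parent = PARENT.get(concept)
            while parent is not None:
                weights[parent] += share
                share *= DECAY
                parent = PARENT.get(parent)
    return dict(weights)


def sentence_weight(sentence, weights):
    """Sum the weights of the concepts that the sentence's words map to."""
    total = 0.0
    for word in tokenize(sentence):
        concept = WORD_TO_CONCEPT.get(word)
        if concept is not None:
            total += weights[concept]
    return total


def summarize(sentences, top_n=1):
    """Return the top_n highest-weighted sentences in document order."""
    weights = concept_weights(sentences)
    ranked = sorted(sentences, key=lambda s: sentence_weight(s, weights),
                    reverse=True)
    chosen = set(ranked[:top_n])
    return [s for s in sentences if s in chosen], weights
```

In this sketch, calling `summarize(["Dogs chase cats.", "A sparrow sings."])` returns the selected sentences together with the concept weights, so generalized concepts such as `Mammal` can be inspected alongside the extracted text. A fixed decay is only one possible propagation rule; the paper may use a different one.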



Paper Citation


in Harvard Style

Choi B. and Huang X. (2009). WEB PAGE SUMMARIZATION BY USING CONCEPT HIERARCHIES. In Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-8111-66-1, pages 281-286. DOI: 10.5220/0001664102810286


in Bibtex Style

@conference{icaart09,
author={Ben Choi and Xiaomei Huang},
title={WEB PAGE SUMMARIZATION BY USING CONCEPT HIERARCHIES},
booktitle={Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2009},
pages={281-286},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001664102810286},
isbn={978-989-8111-66-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - WEB PAGE SUMMARIZATION BY USING CONCEPT HIERARCHIES
SN - 978-989-8111-66-1
AU - Choi B.
AU - Huang X.
PY - 2009
SP - 281
EP - 286
DO - 10.5220/0001664102810286