Authors: Ben Choi (1) and Xiaomei Huang (2)
Affiliations: (1) Louisiana Tech University, United States; (2) Computer Science, United States
Keywords: Text summarization, Ontology, Word sense disambiguation, Natural language processing, Knowledge base, Web mining, Knowledge discovery.
Related Ontology Subjects/Areas/Topics: Agents; Applications; Artificial Intelligence; e-Business; Enterprise Engineering; Enterprise Information Systems; Enterprise Ontologies; Formal Methods; Knowledge Discovery and Information Retrieval; Knowledge Engineering and Ontology Development; Knowledge-Based Systems; Natural Language Processing; Ontologies; Pattern Recognition; Simulation and Modeling; Soft Computing; Symbolic Systems; Web Information Systems and Technologies; Web Intelligence; Web Mining
Abstract:
To address the problem of information overload and to make effective use of information contained on the Web, we created a summarization system that can abstract key concepts and extract key sentences to summarize text documents, including Web pages. Our proposed system is the first summarization system that uses a knowledge base to generate new abstract concepts to summarize documents. To generate abstract concepts, our system first maps the words contained in a document to concepts contained in a knowledge base called ResearchCyc, which organizes concepts into hierarchies that form an ontology in the domain of human consensus reality. It then increases the weights of the mapped concepts to reflect their importance and propagates the weights upward in the concept hierarchies, which provides a method for generalization. To extract key sentences, our system weights each sentence in the document based on the concept weights associated with the sentence and extracts the sentences with the highest weights to summarize the document. Moreover, we created a word sense disambiguation method based on the concept hierarchies to select the most appropriate concepts. Test results show that our approach is viable and applicable to knowledge discovery and the Semantic Web.
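The weighting steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not use ResearchCyc: the toy hierarchy `PARENT`, the lookup table `WORD_TO_CONCEPT`, the `DECAY` factor applied at each upward step, and all function names are assumptions made up for this illustration. Word sense disambiguation is skipped by giving each word exactly one concept.

```python
from collections import defaultdict

# Hypothetical toy hierarchy (child -> parent); a stand-in for a real ontology.
PARENT = {"Dog": "Mammal", "Cat": "Mammal", "Mammal": "Animal", "Animal": None}

# Hypothetical word-to-concept table; one sense per word, so no disambiguation.
WORD_TO_CONCEPT = {"dog": "Dog", "dogs": "Dog", "cat": "Cat", "cats": "Cat"}

DECAY = 0.5  # assumed fraction of weight passed to each ancestor level


def tokens(sentence):
    """Lowercase the sentence and strip basic punctuation from each word."""
    return [w.strip(".,;:!?") for w in sentence.lower().split()]


def concept_weights(sentences):
    """Add weight to each mapped concept and propagate it up the hierarchy."""
    weights = defaultdict(float)
    for sentence in sentences:
        for word in tokens(sentence):
            concept = WORD_TO_CONCEPT.get(word)
            share = 1.0
            while concept is not None:
                weights[concept] += share
                concept = PARENT.get(concept)
                share *= DECAY
    return dict(weights)


def sentence_score(sentence, weights):
    """Sum the weights of the distinct concepts mentioned in the sentence."""
    concepts = {WORD_TO_CONCEPT[w] for w in tokens(sentence) if w in WORD_TO_CONCEPT}
    return sum(weights[c] for c in concepts)


def summarize(sentences, k=1):
    """Return the k highest-scoring sentences in their original order."""
    weights = concept_weights(sentences)
    ranked = sorted(sentences, key=lambda s: sentence_score(s, weights), reverse=True)
    top = set(ranked[:k])
    return [s for s in sentences if s in top]


document = ["Dogs chase cats.", "The weather is nice.", "A dog barks."]
print(concept_weights(document))
print(summarize(document, k=2))
```

In this toy run, the concept `Mammal` never appears in the text but still accumulates weight from its children, which is the kind of generalization the upward propagation is meant to provide. How the real system sets increments and propagation amounts is not stated in the abstract.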