Authors: Massimo Ruffolo (1) and Marco Manna (2)
Affiliations: (1) Exeura s.r.l. - ICAR-CNR, University of Calabria, Italy; (2) University of Calabria, Italy
Keyword(s): Information Extraction, Knowledge Representation, Logic Programming, Two-Dimensional Grammars, Knowledge Management.
Related Ontology Subjects/Areas/Topics: Artificial Intelligence and Decision Support Systems; Biomedical Engineering; Data Engineering; Enterprise Information Systems; Health Information Systems; Industrial Applications of Artificial Intelligence; Information Systems Analysis and Specification; Knowledge Management; Ontologies and the Semantic Web; Society, e-Business and e-Government; Strategic Decision Support Systems; Verification and Validation of Knowledge-Based Systems; Web Information Systems and Technologies
Abstract:
Recognizing and extracting meaningful information from unstructured documents, taking into account their semantics, is an important problem in the field of information and knowledge management. In this paper we describe a novel logic-based approach to semantic information extraction, from both HTML pages and flat text documents, implemented in the HıLεX system. The approach is founded on a new two-dimensional representation of documents, and heavily exploits DLP+, an extension of disjunctive logic programming for ontology representation and reasoning, which has recently been implemented on top of the DLV system. Ontologies, representing the semantics of the information to be extracted, are encoded in DLP+, while the extraction patterns are expressed using regular expressions and an ad hoc two-dimensional grammar. The execution of DLP+ reasoning modules, encoding the HıLεX grammar expressions, yields the actual extraction of information from the input document. Unlike previous systems, which are merely syntactic, HıLεX combines semantic and syntactic knowledge for more powerful information extraction.