Authors:
Susana Martin-Toral 1; Gregorio I. Sainz-Palmero 2 and Yannis Dimitriadis 3
Affiliations:
1 Computer and Information Technologies Division. Fundación CARTIF, Spain
2 Computer and Information Technologies Division. Fundación CARTIF; School of Industrial Engineering, University of Valladolid, Spain
3 GSIC - Group of Intelligent and Cooperative Systems, School of Telecommunications Engineering, University of Valladolid, Spain
Keyword(s):
Document corpus, content incoherence, natural language processing, text mining, artificial intelligence, document engineering.
Related Ontology Subjects/Areas/Topics:
Applications of Expert Systems; Artificial Intelligence; Artificial Intelligence and Decision Support Systems; Biomedical Engineering; Business Analytics; Data Engineering; Data Mining; Databases and Information Systems Integration; Datamining; Enterprise Information Systems; Health Information Systems; Industrial Applications of Artificial Intelligence; Information Systems Analysis and Specification; Knowledge Management; Natural Language Interfaces to Intelligent Systems; Ontologies and the Semantic Web; Sensor Networks; Signal Processing; Society, e-Business and e-Government; Soft Computing; Verification and Validation of Knowledge-Based Systems; Web Information Systems and Technologies
Abstract:
This paper addresses the problems and effects caused by using a document corpus that contains mistakes, content incoherences among its connected documents, and other errors. The problem is relevant in any area of human activity where such a corpus serves as a base element in relationships between company partners, as legal support, etc., as is the question of how these incoherences can be detected. These problems can appear in several ways and produce different effects, but a common situation arises in areas of activity where many linked documents must be generated, managed and updated by different authors. This paper describes some examples of this problem in the case of a technical document corpus used amongst partners, and the solution framework developed for this case. Several types of incoherence have been detected and formulated. They are connected with problems described in other research areas such as information extraction and retrieval, text mining, document interpretation and others, but all of them have been bounded and introduced from the point of view of document incoherences and their effects, especially in a company context. Finally, the computational architecture and methodology used are described and some initial results of incoherence detection are discussed.