Authors:
Angel L. Garrido; Maria G. Buey; Sandra Escudero; Alvaro Peiro; Sergio Ilarri and Eduardo Mena
Affiliation:
University of Zaragoza, Spain
Keyword(s):
Knowledge Management, Text Mining, Ontologies, Linguistics.
Related Ontology Subjects/Areas/Topics:
Biomedical Engineering; Context Aware Media Tagging; Data Engineering; Databases and Datawarehouses; Enterprise Information Systems; Health Information Systems; Information Systems Analysis and Specification; Internet Technology; Knowledge Management; Mobile Information Systems; Ontologies and the Semantic Web; Ontology and the Semantic Web; Society, e-Business and e-Government; Web Information Systems and Technologies; Web Interfaces and Applications
Abstract:
Automatic text categorisation systems are a type of software that is attracting more interest every day, due not
only to their use in documentary environments but also to their possible application to properly tagging documents
on the Web. Many options have been proposed to address this problem using statistical approaches, natural language
processing tools, ontologies and lexical databases. Nevertheless, there have not been many empirical
evaluations comparing the influence of the different tools used to solve these problems, particularly in a multilingual
environment. In this paper we propose a multi-language rule-based pipeline system for automatic
document categorisation, and we empirically compare the results of applying techniques that rely on statistics
and supervised learning with the results of applying the same techniques supported by smarter tools
based on language semantics and ontologies, using several corpora of documents for this purpose. GENIE is
being applied in real environments, which shows the potential of the proposal.