Authors:
Joao Paulo Carvalho and Sérgio Curto
Affiliation:
Universidade de Lisboa, Portugal
Keyword(s):
Fuzzy Text Preprocessing, Medical Text Reports, Natural Language Processing, Word Similarity, MIMIC II.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Computational Intelligence; Fuzzy Information Processing, Fusion, Text Mining; Fuzzy Systems; Soft Computing
Abstract:
Large unedited technical text databases may contain information that cannot be properly extracted with Natural Language Processing (NLP) tools because of the many word errors they contain. A good example is the MIMIC II database, where medical text reports directly represent experts’ views on real-time observable data. Such reports contain valuable information that could improve predictive medical decision-making models based on physiological data, but they have so far not been used for that purpose. In this paper we propose a fuzzy-based semi-automatic method that specifically addresses the large number of word errors in such databases, allowing the direct application of NLP techniques, such as Bag of Words, to the textual data.
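To make the idea concrete, the following is a minimal sketch of similarity-based word correction followed by a Bag of Words count. It is not the method proposed in the paper: it swaps in the string similarity from Python's standard `difflib` module, whose score in [0, 1] can loosely be read as a degree of match. The vocabulary, the `normalize` and `bag_of_words` helpers, the 0.8 cutoff, and the sample report are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from difflib import get_close_matches

# Hypothetical reference vocabulary (assumption); a real one would come
# from a medical lexicon or from the most frequent words in the corpus.
VOCABULARY = ["patient", "pressure", "blood", "stable", "heart", "rate"]


def normalize(token, vocabulary=VOCABULARY, cutoff=0.8):
    """Map a token to its most similar vocabulary word.

    If no vocabulary word reaches the similarity cutoff, the lowercased
    token is returned unchanged. The cutoff value is an assumption.
    """
    token = token.lower()
    matches = get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def bag_of_words(text):
    """Count normalized, whitespace-separated tokens."""
    return Counter(normalize(tok) for tok in text.split())


# Invented example report with typical typing errors.
report = "pateint stable blod presure stable"
counts = bag_of_words(report)
```

With this sketch, the misspelled tokens "pateint", "blod" and "presure" are counted under "patient", "blood" and "pressure", so the Bag of Words representation is no longer fragmented by spelling variants. A semi-automatic workflow would presumably send low-similarity or ambiguous tokens to a human reviewer rather than leave them unchanged.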