EMTE: An Enhanced Medical Terms Extractor Using Pattern Matching Rules

Monah Hatoum, Jean-Claude Charr, Christophe Guyeux, David Laiymani, Alia Ghaddar

2023

Abstract

Downstream tasks such as clinical text classification perform best when given good-quality datasets. Most existing techniques for preparing clinical textual data rely on one of two approaches: removing irrelevant data with cleansing techniques, or extracting valuable data with feature extraction techniques. However, both still have limitations, mainly when applied to real-world datasets. This paper proposes a cleansing approach, called EMTE, which extracts phrases (medical terms, abbreviations, and negations) using pattern-matching rules based on the linguistic processing of the clinical textual data. Without requiring training, EMTE extracts valuable medical data from clinical textual records even when they differ in writing style. Furthermore, since EMTE relies on dictionaries to store abbreviations and on pattern-matching rules to detect phrases, it can be easily maintained and extended for industrial use. To evaluate our approach, we compared EMTE with three other cleansing techniques. All four techniques were applied to a large, imbalanced industrial dataset consisting of 2.21M samples from different specialties with 1,050 ICD-10 codes. The experimental results on several Deep Neural Network (DNN) algorithms showed that our cleansing approach significantly improves the trained models’ performance compared to the other tested techniques, according to several metrics.
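As a rough illustration of the general idea of combining an abbreviation dictionary with pattern-matching rules, the following minimal Python sketch expands abbreviations and then extracts negated phrases. It is not the authors' implementation: the dictionary entries, the negation cues, and the function names `expand_abbreviations` and `extract_negations` are all hypothetical, and plain regular expressions stand in for the linguistic processing the abstract mentions.

```python
import re

# Hypothetical abbreviation dictionary; entries are illustrative only,
# not taken from the paper.
ABBREVIATIONS = {
    "htn": "hypertension",
    "dm": "diabetes mellitus",
    "sob": "shortness of breath",
}

# Hypothetical negation rule: a cue word followed by the phrase it negates,
# which is assumed to end at the next '.', ',' or ';'.
NEGATION_RULE = re.compile(r"\b(?:no|denies|without)\s+([^.,;]+)", re.IGNORECASE)


def expand_abbreviations(text):
    """Replace each known abbreviation with its dictionary expansion."""
    def replace_word(match):
        word = match.group(0)
        # Unknown words are returned unchanged.
        return ABBREVIATIONS.get(word.lower(), word)

    return re.sub(r"\b\w+\b", replace_word, text)


def extract_negations(text):
    """Return the phrases that follow a negation cue."""
    return [m.group(1).strip() for m in NEGATION_RULE.finditer(text)]


note = "Patient has HTN, denies SOB."
expanded = expand_abbreviations(note)
negated = extract_negations(expanded)
print(expanded)  # Patient has hypertension, denies shortness of breath.
print(negated)   # ['shortness of breath']
```

Because both the dictionary and the rule are plain data in this sketch, adding an abbreviation or a negation cue requires no retraining, which is the maintainability property the abstract attributes to a rule-based design.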


Paper Citation


in Harvard Style

Hatoum M., Charr J., Guyeux C., Laiymani D. and Ghaddar A. (2023). EMTE: An Enhanced Medical Terms Extractor Using Pattern Matching Rules. In Proceedings of the 15th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-758-623-1, pages 301-311. DOI: 10.5220/0011717300003393


in Bibtex Style

@conference{icaart23,
author={Monah Hatoum and Jean-Claude Charr and Christophe Guyeux and David Laiymani and Alia Ghaddar},
title={EMTE: An Enhanced Medical Terms Extractor Using Pattern Matching Rules},
booktitle={Proceedings of the 15th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2023},
pages={301-311},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011717300003393},
isbn={978-989-758-623-1},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 15th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - EMTE: An Enhanced Medical Terms Extractor Using Pattern Matching Rules
SN - 978-989-758-623-1
AU - Hatoum M.
AU - Charr J.
AU - Guyeux C.
AU - Laiymani D.
AU - Ghaddar A.
PY - 2023
SP - 301
EP - 311
DO - 10.5220/0011717300003393