De-identification of Medical Information for Forming Multimodal Datasets to Train Neural Networks

Margarita Suzdaltseva, Alexandra Shamakhova, Natalia Dobrenko, Olga Alekseeva, Jaafar Hammud, Natalia Gusarova, Aleksandra Vatian, Anatoly Shalyto

Abstract

Electronic patient records are an important source of medical information for forming multimodal datasets to train neural networks. Before data from electronic health records (EHRs) can be processed for a specified purpose, a number of requirements must be met, first of all de-identification. This paper discusses the first stage of that process: searching medical texts for named entities, which should afterwards be replaced or encrypted. The problem is solved using the example of semi-structured EHRs in Russian, a fusional, grammatically complex language. The structure and specifics of EHRs typical for Russia are analyzed in detail. A problem-oriented comparison of approaches to the named entity recognition (NER) problem is carried out. We developed a pipeline for processing EHRs and experimentally showed the advantages of the rule-based method over specialized libraries. The achieved Recall and Precision values were 0.990 and 0.980, respectively.
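
As a rough illustration of what a rule-based search for named entities in a semi-structured record can look like, the Python sketch below matches a few field patterns, masks the matches, and computes Precision and Recall against a gold annotation. It is not the authors' pipeline: the patterns, the labels (`DATE`, `PHONE`, `NAME`), the `Patient:` field prefix, and the English sample record are all assumptions made for readability.

```python
import re

# Hypothetical rules for a semi-structured record; illustrative only,
# not the rule set used in the paper.
PATTERNS = {
    "DATE": re.compile(r"\b\d{2}\.\d{2}\.\d{4}\b"),
    "PHONE": re.compile(r"\+7\d{10}\b"),
    "NAME": re.compile(r"(?<=Patient: )[A-Z][a-z]+ [A-Z][a-z]+"),
}

def find_entities(text):
    """Return a set of (label, start, end) spans matched by the rules."""
    found = set()
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.add((label, match.start(), match.end()))
    return found

def mask(text, spans):
    """Replace each span with its label, right to left so offsets stay valid."""
    for label, start, end in sorted(spans, key=lambda s: s[1], reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text

def precision_recall(predicted, gold):
    """Exact-span Precision and Recall of predicted spans against gold spans."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

record = "Patient: Ivan Petrov, born 12.03.1975, phone +79001234567."
spans = find_entities(record)
print(mask(record, spans))
```

In practice the gold spans would come from manual annotation, and Precision and Recall would be reported over a whole corpus of records instead of a single string.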

Paper Citation


in Harvard Style

Suzdaltseva M., Shamakhova A., Dobrenko N., Alekseeva O., Hammud J., Gusarova N., Vatian A. and Shalyto A. (2021). De-identification of Medical Information for Forming Multimodal Datasets to Train Neural Networks. In Proceedings of the 7th International Conference on Information and Communication Technologies for Ageing Well and e-Health - Volume 1: ICT4AWE, ISBN 978-989-758-506-7, pages 163-170. DOI: 10.5220/0010406001630170


in Bibtex Style

@conference{ict4awe21,
author={Margarita Suzdaltseva and Alexandra Shamakhova and Natalia Dobrenko and Olga Alekseeva and Jaafar Hammud and Natalia Gusarova and Aleksandra Vatian and Anatoly Shalyto},
title={De-identification of Medical Information for Forming Multimodal Datasets to Train Neural Networks},
booktitle={Proceedings of the 7th International Conference on Information and Communication Technologies for Ageing Well and e-Health - Volume 1: ICT4AWE},
year={2021},
pages={163--170},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010406001630170},
isbn={978-989-758-506-7},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 7th International Conference on Information and Communication Technologies for Ageing Well and e-Health - Volume 1: ICT4AWE
TI - De-identification of Medical Information for Forming Multimodal Datasets to Train Neural Networks
SN - 978-989-758-506-7
AU - Suzdaltseva M.
AU - Shamakhova A.
AU - Dobrenko N.
AU - Alekseeva O.
AU - Hammud J.
AU - Gusarova N.
AU - Vatian A.
AU - Shalyto A.
PY - 2021
SP - 163
EP - 170
DO - 10.5220/0010406001630170