An Easy-to-Use and Robust Approach for the Differentially Private De-Identification of Clinical Textual Documents

Yakini Tchouka, Jean-François Couchot, David Laiymani

2023

Abstract

Unstructured textual data is at the heart of healthcare systems. For obvious privacy reasons, these documents are not accessible to researchers as long as they contain personally identifiable information. One way to share this data while respecting the legislative framework (notably GDPR or HIPAA) is to de-identify it within the medical structures, i.e., to detect a person's personal information with a Named Entity Recognition (NER) system and then replace it so that associating the document with that person becomes very difficult. The challenge is to have reliable NER and substitution tools without compromising confidentiality and consistency in the document. Most existing research focuses on English medical documents and uses coarse substitutions that do not benefit from advances in privacy. This paper shows how an efficient and differentially private de-identification approach can be achieved by strengthening the less robust de-identification method and by adapting state-of-the-art differentially private mechanisms for substitution purposes. The result is an approach for de-identifying clinical documents in French that also generalizes to other languages and whose robustness is mathematically proven.
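
To make the detect-then-substitute idea concrete, here is a minimal sketch in Python. It is not the authors' method: the abstract does not name the mechanisms used, so the sketch swaps in k-ary randomized response, a standard epsilon-local differentially private mechanism over a finite domain. The function names, the `(start, end, label)` span format, the `domains` dictionary, and the sample names are all hypothetical, and the spans are assumed to come from some upstream NER system.

```python
import math
import random


def k_randomized_response(value, domain, epsilon, rng):
    """k-ary randomized response over a finite domain (illustrative only).

    Keeps the true value with probability e^eps / (e^eps + k - 1) and
    otherwise returns a uniformly chosen different element of the domain.
    """
    if value not in domain:
        raise ValueError("value must belong to the substitution domain")
    k = len(domain)
    if k == 1:
        return value
    keep_prob = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < keep_prob:
        return value
    others = [candidate for candidate in domain if candidate != value]
    return rng.choice(others)


def substitute_entities(text, spans, domains, epsilon, rng):
    """Replace each detected span with a randomized surrogate.

    `spans` is a list of (start, end, label) tuples, assumed to be produced
    by an NER system; `domains` maps each label to its finite list of
    candidate values (both are assumptions of this sketch).
    """
    result = text
    # Process spans from right to left so earlier offsets stay valid.
    for start, end, label in sorted(spans, reverse=True):
        surrogate = k_randomized_response(
            text[start:end], domains[label], epsilon, rng
        )
        result = result[:start] + surrogate + result[end:]
    return result


# Hypothetical usage: one NAME entity detected at characters 8..13.
example_domains = {"NAME": ["Alice", "Chloé", "Marie", "Sophie"]}
example_text = "Patient Marie was admitted."
example_spans = [(8, 13, "NAME")]
print(substitute_entities(example_text, example_spans, example_domains,
                          epsilon=1.0, rng=random.Random(0)))
```

A smaller `epsilon` makes the surrogate closer to a uniform draw from the domain (stronger privacy), while a larger `epsilon` keeps the original value more often. A real pipeline would also need to keep substitutions consistent across repeated mentions of the same entity, which this sketch does not attempt.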

Paper Citation


in Harvard Style

Tchouka Y., Couchot J. and Laiymani D. (2023). An Easy-to-Use and Robust Approach for the Differentially Private De-Identification of Clinical Textual Documents. In Proceedings of the 16th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2023) - Volume 5: HEALTHINF; ISBN 978-989-758-631-6, SciTePress, pages 94-104. DOI: 10.5220/0011646600003414


in Bibtex Style

@conference{healthinf23,
author={Yakini Tchouka and Jean-François Couchot and David Laiymani},
title={An Easy-to-Use and Robust Approach for the Differentially Private De-Identification of Clinical Textual Documents},
booktitle={Proceedings of the 16th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2023) - Volume 5: HEALTHINF},
year={2023},
pages={94-104},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011646600003414},
isbn={978-989-758-631-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 16th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2023) - Volume 5: HEALTHINF
TI - An Easy-to-Use and Robust Approach for the Differentially Private De-Identification of Clinical Textual Documents
SN - 978-989-758-631-6
AU - Tchouka Y.
AU - Couchot J.
AU - Laiymani D.
PY - 2023
SP - 94
EP - 104
DO - 10.5220/0011646600003414
PB - SciTePress