An Efficient, Robust, and Customizable Information Extraction and Pre-processing Pipeline for Electronic Health Records

Eva K. Lee, Yuanbo Wang, Yuntian He, Brent M. Egan

2019

Abstract

Electronic Health Records (EHR) containing large amounts of patient data present both opportunities and challenges to industry, policy makers, and researchers. These data, when extracted and analyzed effectively, can reveal critical factors that can improve clinical practices and decisions. However, the inherently complex, heterogeneous, and rapidly evolving nature of these data makes them extremely difficult to analyze effectively. In addition, Protected Health Information (PHI), which contains sensitive yet valuable information for clinical research, must first be anonymized. In this paper we identify current challenges with obtaining and pre-processing information from EHR. We then present a comprehensive, efficient “pipeline” for extracting, de-identifying, and standardizing EHR data. We demonstrate the use of this pipeline, based on software from EPIC Systems, in analyzing chronic kidney disease, prostate cancer, and cardiovascular disease. We also address challenges associated with temporal laboratory time series data and natural text data, and develop a novel approach for clustering irregular Multivariate Time Series (MTS). The pipeline organizes data into a structured, machine-readable format that can be effectively applied in clinical research studies to optimize processes, personalize care, and improve quality and outcomes.


Paper Citation


in Harvard Style

Lee E., Wang Y., He Y. and Egan B. (2019). An Efficient, Robust, and Customizable Information Extraction and Pre-processing Pipeline for Electronic Health Records. In Proceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2019) - Volume 1: KDIR; ISBN 978-989-758-382-7, SciTePress, pages 310-321. DOI: 10.5220/0008071303100321


in Bibtex Style

@conference{kdir19,
author={Eva K. Lee and Yuanbo Wang and Yuntian He and Brent M. Egan},
title={An Efficient, Robust, and Customizable Information Extraction and Pre-processing Pipeline for Electronic Health Records},
booktitle={Proceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2019) - Volume 1: KDIR},
year={2019},
pages={310-321},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0008071303100321},
isbn={978-989-758-382-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2019) - Volume 1: KDIR
TI - An Efficient, Robust, and Customizable Information Extraction and Pre-processing Pipeline for Electronic Health Records
SN - 978-989-758-382-7
AU - Lee E.
AU - Wang Y.
AU - He Y.
AU - Egan B.
PY - 2019
SP - 310
EP - 321
DO - 10.5220/0008071303100321
PB - SciTePress