The ATEN Framework for Creating the Realistic Synthetic Electronic Health Record

Scott McLachlan, Kudakwashe Dube, Thomas Gallagher, Bridget Daley, Jason Walonoski

2018

Abstract

Realistic synthetic data are increasingly recognized as a solution to the lack of data or to privacy concerns in healthcare and other domains, yet little effort has been expended in establishing a generic framework for characterizing, achieving and validating realism in Synthetic Data Generation (SDG). The objectives of this paper are to: (1) present a characterization of the concept of realism as it applies to synthetic data; and (2) present and demonstrate application of the generic ATEN Framework for achieving and validating realism for SDG. The characterization of realism is developed through insights obtained from analysis of the literature on SDG. The generic methods for achieving and validating realism for synthetic data were developed using knowledge discovery in databases (KDD) and data mining enhanced with concept analysis and the identification of characteristic and classification rules. Application of this framework is demonstrated using the synthetic Electronic Healthcare Record (EHR) for the domain of midwifery. The knowledge discovery process improves and expedites the generation process; having a more complex and complete understanding of the knowledge required to create the synthetic data significantly reduces the number of generation iterations. The validation process shows similar efficiencies by using the discovered knowledge as the elements for assessing the generated synthetic data. Successful validation supports claims of success and resolves whether the synthetic data are a sufficient replacement for real data. The ATEN Framework supports the researcher in identifying the knowledge elements that need to be synthesized, as well as supporting claims of sufficient realism through the use of that knowledge in a structured approach to validation. When used for SDG, the ATEN Framework enables a complete analysis of source data for the knowledge necessary for correct generation. The ATEN Framework assures the researcher that the synthetic data being created are realistic enough to replace real data for a given use-case.


Paper Citation


in Harvard Style

McLachlan S., Dube K., Gallagher T., Daley B. and Walonoski J. (2018). The ATEN Framework for Creating the Realistic Synthetic Electronic Health Record. In Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2018) - Volume 5: HEALTHINF; ISBN 978-989-758-281-3, SciTePress, pages 220-230. DOI: 10.5220/0006677602200230


in Bibtex Style

@conference{healthinf18,
author={Scott McLachlan and Kudakwashe Dube and Thomas Gallagher and Bridget Daley and Jason Walonoski},
title={The ATEN Framework for Creating the Realistic Synthetic Electronic Health Record},
booktitle={Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2018) - Volume 5: HEALTHINF},
year={2018},
pages={220-230},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006677602200230},
isbn={978-989-758-281-3},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2018) - Volume 5: HEALTHINF
TI - The ATEN Framework for Creating the Realistic Synthetic Electronic Health Record
SN - 978-989-758-281-3
AU - McLachlan S.
AU - Dube K.
AU - Gallagher T.
AU - Daley B.
AU - Walonoski J.
PY - 2018
SP - 220
EP - 230
DO - 10.5220/0006677602200230
PB - SciTePress