
Safeguarding Ethical AI: Detecting Potentially Sensitive Data Re-Identification and Generation of Misleading or Abusive Content from Quantized Large Language Models

Topics: Application of Health Informatics in Clinical Cases; Big Data in Healthcare; Electronic Health Records and Standards; Interoperability and Data Integration; Large Language Models in Medicine

Authors: Navya Martin Kollapally 1 and James Geller 2

Affiliations: 1 Department of Computer Science, New Jersey Institute of Technology, Newark, U.S.A. ; 2 Department of Data Science, New Jersey Institute of Technology, Newark, U.S.A.

Keyword(s): Natural Language Processing, Redaction, Re-identification of EHR Entries, Large Language Models, Privacy-Preserving Machine Learning, HIPAA Act, Social Determinants of Health.

Abstract: Research on privacy-preserving Machine Learning (ML) is essential to prevent the re-identification of health data, ensuring the confidentiality and security of sensitive patient information. In this era of unprecedented usage, large language models (LLMs) carry inherent risks when applied to sensitive data, especially as they are trained on trillions of words from the internet without a global standard for data selection. This lack of standardization in training LLMs poses a significant risk in the field of health informatics, potentially resulting in the inadvertent release of sensitive information, despite the availability of context-aware redaction of sensitive information. The research goal of this paper is to determine whether sensitive information could be re-identified from electronic health records during Natural Language Processing (NLP) tasks such as text classification without using any dedicated re-identification techniques. We performed zero-shot and 8-shot learning with the quantized LLMs FLAN, Llama2, Mistral, and Vicuna for classifying social context data extracted from MIMIC-III. In this text classification task, our focus was on detecting potential sensitive data re-identification and the generation of misleading or abusive content during the fine-tuning and prompting stages of the process, along with evaluating the performance of the classification.
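As a rough illustration of the prompting setup the abstract describes (not the authors' code; labels and example sentences are invented here), zero-shot versus few-shot classification prompts for social-context text might be constructed like this:

```python
# Hypothetical sketch: building zero-shot vs. few-shot classification prompts
# for social-context snippets. The label set and examples are illustrative only.
LABELS = ["housing instability", "food insecurity", "social isolation", "none"]

def build_prompt(text, examples=None):
    """Return a classification prompt; passing `examples` yields a few-shot prompt."""
    lines = [f"Classify the sentence into one of: {', '.join(LABELS)}."]
    # Few-shot demonstrations precede the query sentence.
    for ex_text, ex_label in (examples or []):
        lines.append(f"Sentence: {ex_text}\nLabel: {ex_label}")
    # The query sentence ends with an open "Label:" for the model to complete.
    lines.append(f"Sentence: {text}\nLabel:")
    return "\n".join(lines)

zero_shot = build_prompt("Patient reports sleeping in a shelter.")
few_shot = build_prompt(
    "Patient reports sleeping in a shelter.",
    examples=[("Patient skips meals to afford rent.", "food insecurity")],
)
```

The same prompt string would then be sent to a quantized model; with 8 demonstration pairs instead of 1, this corresponds to the paper's 8-shot setting.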

CC BY-NC-ND 4.0


Paper citation in several formats:
Martin Kollapally, N. and Geller, J. (2024). Safeguarding Ethical AI: Detecting Potentially Sensitive Data Re-Identification and Generation of Misleading or Abusive Content from Quantized Large Language Models. In Proceedings of the 17th International Joint Conference on Biomedical Engineering Systems and Technologies - HEALTHINF; ISBN 978-989-758-688-0; ISSN 2184-4305, SciTePress, pages 554-561. DOI: 10.5220/0012411900003657

@conference{healthinf24,
author={Navya {Martin Kollapally} and James Geller},
title={Safeguarding Ethical AI: Detecting Potentially Sensitive Data Re-Identification and Generation of Misleading or Abusive Content from Quantized Large Language Models},
booktitle={Proceedings of the 17th International Joint Conference on Biomedical Engineering Systems and Technologies - HEALTHINF},
year={2024},
pages={554-561},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012411900003657},
isbn={978-989-758-688-0},
issn={2184-4305},
}

TY - CONF
JO - Proceedings of the 17th International Joint Conference on Biomedical Engineering Systems and Technologies - HEALTHINF
TI - Safeguarding Ethical AI: Detecting Potentially Sensitive Data Re-Identification and Generation of Misleading or Abusive Content from Quantized Large Language Models
SN - 978-989-758-688-0
IS - 2184-4305
AU - Martin Kollapally, N.
AU - Geller, J.
PY - 2024
SP - 554
EP - 561
DO - 10.5220/0012411900003657
PB - SciTePress