Defending Language Models: Safeguarding Against Hallucinations, Adversarial Attacks, and Privacy Concerns

Yetian He

2024

Abstract

While Large Language Models (LLMs) have gained immense popularity, they also raise significant safety concerns. If LLMs are not delivered to users in a secure and reliable manner, their continued development and widespread adoption could face substantial opposition and obstacles. The primary aim of this survey is therefore to systematically organize and consolidate current studies on LLM security and to facilitate further exploration of this critical area. The article examines the main security issues associated with LLMs and categorizes them into distinct sub-problems. It covers the phenomenon of LLM hallucination and strategies for mitigating it, adversarial attacks targeting language models, and defence mechanisms against such attacks. It also discusses research on Artificial Intelligence (AI) alignment and on security concerns in the context of LLMs. In addition, the survey presents findings from relevant experiments to underscore the importance of addressing LLM security. By providing a comprehensive overview of LLM security, this paper aims to help researchers quickly grasp this burgeoning field and to catalyse advances in the secure deployment and utilization of LLMs.


Paper Citation


in Harvard Style

He Y. (2024). Defending Language Models: Safeguarding Against Hallucinations, Adversarial Attacks, and Privacy Concerns. In Proceedings of the 1st International Conference on Engineering Management, Information Technology and Intelligence - Volume 1: EMITI; ISBN 978-989-758-713-9, SciTePress, pages 669-674. DOI: 10.5220/0012961600004508


in BibTeX Style

@conference{emiti24,
author={Yetian He},
title={Defending Language Models: Safeguarding Against Hallucinations, Adversarial Attacks, and Privacy Concerns},
booktitle={Proceedings of the 1st International Conference on Engineering Management, Information Technology and Intelligence - Volume 1: EMITI},
year={2024},
pages={669-674},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012961600004508},
isbn={978-989-758-713-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 1st International Conference on Engineering Management, Information Technology and Intelligence - Volume 1: EMITI
TI - Defending Language Models: Safeguarding Against Hallucinations, Adversarial Attacks, and Privacy Concerns
SN - 978-989-758-713-9
AU - He Y.
PY - 2024
SP - 669
EP - 674
DO - 10.5220/0012961600004508
PB - SciTePress