experience. Nananukul et al. proposed a framework
for enhancing NPC dialogue with a more narrative-
driven approach using large language models,
focusing on games like Final Fantasy VII and
Pokémon. The goal is to enable NPCs to respond with
personality-appropriate reactions and tone in specific
scenarios (Nananukul & Wongkamjan, 2024). By
gathering character details, situational descriptions,
skills, and personality traits, a knowledge graph is
created to structure the data of game characters and
scenarios. Tailored prompt templates are then
developed for different games and characters,
providing character personality, describing specific
contexts, and outlining the dialogue goals and style.
The relevant information from the knowledge graph
is incorporated into the prompts, improving the
contextual relevance of the model’s generated
dialogues. In Final Fantasy VII, battle scenario
dialogues were tested, requiring characters to
generate dynamic responses based on the battle state,
such as enemy health or skill usage. In Pokémon,
NPC dialogues were tested, with Red being given
different personalities (e.g., talkative, confident) to
evaluate the diversity of the generated responses. The
results indicate that the quality of dialogue generation
is relatively high, with GPT-4 being able to
understand the character's behavior and reactions in
specific contexts and generate reasonable and natural
dialogues. The model accurately expresses simple personalities, such as talkative or shy. However, its ability to convey more complex traits, such as maturity or introversion, is limited: the generated content can seem overly positive or superficial when nuance is required. The model also sometimes produces repetitive dialogue or an overly cheerful tone that conflicts with a character's established traits, such as Cloud's cold personality.
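The pipeline described above, gathering character facts into a knowledge graph and filling a prompt template with the retrieved details, can be sketched as follows. This is a minimal illustration: the character record, template wording, and function name are hypothetical, not taken from the original framework.

```python
# Toy "knowledge graph": each character node holds the traits, skills,
# and game facts gathered during data collection. The entries below are
# illustrative stand-ins, not data from Nananukul & Wongkamjan (2024).
CHARACTERS = {
    "Cloud": {
        "game": "Final Fantasy VII",
        "personality": ["cold", "reserved"],
        "skills": ["Braver", "Cross-Slash"],
    },
}

# Tailored prompt template: character personality, specific context,
# and the dialogue goal/style, as the framework describes.
PROMPT_TEMPLATE = (
    "You are {name} from {game}. "
    "Personality traits: {traits}. Known skills: {skills}.\n"
    "Current scenario: {scenario}\n"
    "Respond in character, matching the stated personality and tone."
)

def build_prompt(name: str, scenario: str) -> str:
    """Fill the template with facts retrieved from the character node."""
    node = CHARACTERS[name]
    return PROMPT_TEMPLATE.format(
        name=name,
        game=node["game"],
        traits=", ".join(node["personality"]),
        skills=", ".join(node["skills"]),
        scenario=scenario,
    )

prompt = build_prompt("Cloud", "An enemy at 20% health uses a fire skill.")
print(prompt)
```

The resulting string would then be sent to the language model; incorporating the retrieved graph facts into the prompt is what grounds the generated dialogue in the character's established traits.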
Huang analyzed the performance of GPT-4 and
GPT-3.5 Turbo, using GPT-4 for NPC
dialogue generation in RPG games, leveraging its
large number of parameters and enhanced context
window to generate more natural, coherent, and
highly contextually relevant text (Huang, 2024). By
providing the model with the game's background and
NPC character settings, and by capturing dynamic
in-game information, the system allows each NPC's
dialogue to better reflect its background and the
game's current state.
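The context-assembly step described above can be sketched as follows: static background and NPC settings go into a system message, while dynamic game state is injected each turn. The message format follows the common chat-completion convention; all names and data here are illustrative assumptions, not details from Huang (2024).

```python
# Static settings, fixed when the game loads (illustrative values).
GAME_BACKGROUND = "A frontier town beset by bandits."
NPC_PROFILE = {"name": "Mira", "role": "blacksmith", "mood": "gruff"}

def build_messages(dynamic_state: dict, player_line: str) -> list[dict]:
    """Combine static settings with the current game state for one turn."""
    system = (
        f"Game background: {GAME_BACKGROUND}\n"
        f"You play {NPC_PROFILE['name']}, the town {NPC_PROFILE['role']} "
        f"(default mood: {NPC_PROFILE['mood']}).\n"
        f"Current game state: {dynamic_state}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": player_line},
    ]

messages = build_messages(
    {"time": "night", "quest_stage": "bandits_defeated"},
    "Any work for me?",
)
# `messages` would then be passed to a chat-completion API call;
# no network request is made in this sketch.
```

Rebuilding the system message each turn is what lets the dialogue track the game's current state rather than only the NPC's static profile.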
Chubar designed an RPG game and used GPT to
generate procedural content, creating a game with
rich narratives and dynamic content. The study
compared GPT-3.5 Turbo and GPT-4 by measuring the
number of worlds and total units each model generated
under two temperature settings (Chubar, 2024). With a
temperature of 0.6, GPT-3.5 Turbo generated 15
worlds, totaling 489 units, with an average of
approximately 32.6 units per world. In comparison,
GPT-4 generated 5 worlds, totaling 210 units, with an
average of 42 units per world, showing a higher unit
density. At a temperature of 1.2, the number of
generated worlds was unchanged, but GPT-3.5 Turbo
produced fewer total units (437), suggesting that a
higher temperature may increase content diversity at
the cost of unit density. GPT-4 likewise generated
fewer total units (195), with a slight drop in the
average number of units per world (39). GPT-4's
higher unit density at the lower temperature of 0.6
(42 units per world) indicates that its output
surpasses GPT-3.5 Turbo's in control and detail
richness. As the temperature rose from 0.6 to 1.2,
GPT-4's unit count and density decreased only
slightly and remained relatively high, showing that
it can preserve quality while increasing generation
diversity, whereas GPT-3.5 Turbo's output declined
more sharply.
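The per-world densities quoted above follow directly from the reported world and unit counts; a short script makes the arithmetic explicit (the counts are taken from the comparison, while the dictionary layout is our own):

```python
# (worlds, total_units) per model and temperature, as reported.
results = {
    ("GPT-3.5 Turbo", 0.6): (15, 489),
    ("GPT-4", 0.6): (5, 210),
    ("GPT-3.5 Turbo", 1.2): (15, 437),
    ("GPT-4", 1.2): (5, 195),
}

# Unit density = total units / number of worlds.
density = {key: units / worlds for key, (worlds, units) in results.items()}

for (model, temp), d in sorted(density.items()):
    print(f"{model} @ T={temp}: {d:.1f} units/world")
```

Note that GPT-3.5 Turbo's density drops from about 32.6 to about 29.1 units per world as the temperature rises, a proportionally larger decline than GPT-4's drop from 42 to 39.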
4.2 Task Description
LLMs can also be used to generate task descriptions.
Given their strong text generation capabilities,
using LLMs for quest generation can significantly
reduce developers' workload: by supplying the game's
background and relevant content, high-quality task
descriptions can be produced. Värtinen et al.
explored the
application of GPT-2 and GPT-3 in generating task
descriptions for Role-Playing Games (RPGs). As
players' demand for game content continues to rise,
developers face the challenge of manually designing
tasks. The paper proposes an improved version of
GPT-2 called Quest-GPT-2, designed to
automatically generate RPG task descriptions
(Värtinen, Hämäläinen, & Guckelsberger, 2022). The
study evaluated the model’s performance through
calculated metrics and a large-scale user study
involving 349 players. Task data was extracted from
six classic RPG games—Baldur’s Gate 1 & 2,
Oblivion & Skyrim, Torchlight II, and Minecraft. The
fine-tuned GPT-2 model, Quest-GPT-2, and GPT-3
were used to generate 500 task descriptions. These
descriptions were then rated by the 349 RPG players
based on three criteria: "Does the task description
match the game task?", "Is the text coherent and
logical?", and "Does it match the RPG task style?".
The results showed that the task descriptions