Understand Watchdogs: Discover How Game Bot Get Discovered
Eunji Park, Kyung Ho Park and Huy Kang Kim
School of Cybersecurity, Korea University, Seoul, Republic of Korea
Keywords:
Explainable Artificial Intelligence, Game Bot Detection.
Abstract:
The game industry has long been troubled by malicious activities that exploit game bots. Game bots disturb other players and damage the ecosystem of the games. For these reasons, the game industry has put great effort into detecting game bots among players’ characters using learning-based methods. However, one problem with these detection methodologies is that they do not provide rational explanations for their decisions. To resolve this problem, we investigate the explainability of game bot detection. We develop XAI models using a dataset from the Korean MMORPG AION, which includes game logs of human players and game bots. Multiple classification models are applied to the dataset and then analyzed with interpretable models. This yields explanations of the game bots’ behavior, and the faithfulness of the explanations is evaluated. Furthermore, interpretability contributes to minimizing false detections, which impose unfair restrictions on human players.
1 INTRODUCTION
Along with the growth of the online game industry, the Massively Multiplayer Online Role-Playing Game (MMORPG) has become a significant form of modern leisure. An MMORPG is a type of online game in which a user creates a character to experience a variety of entertaining content, such as building a social community, selling in-game assets, or even dating other players. As users can experience a wide range of content in a virtual world, MMORPG ecosystems resemble the actual ecosystem of the real world (Galarneau, 2005).
Interestingly, one of the similarities between the
MMORPG ecosystem and the real world is malicious
activity. Similar to criminals or offenders in a society,
there have existed malicious characters in MMORPG
called game bots (Castronova, 2001). A game bot is an automated program that makes a character move around the virtual world as specified and perform predefined actions to collect game assets (Kang et al., 2013). Because game assets can be traded for real-world cash, a practice called Real Money Trading (RMT), most game bots perform repetitive actions for asset accumulation, such as hunting weak monsters over and over again without getting tired (Kwon et al., 2013). Exploiting these advantages of game bots, some malicious entities form Gold Farmer Groups (GFGs), which systematically manage huge numbers of bot characters to earn illegal income on a large scale (Kwon et al., 2013).
These malicious activities of GFGs damage game companies. First, the price of game assets offered by GFGs is lower than the legal ‘in-app purchase’ price; thus, users do business with GFGs instead. Furthermore, GFGs harm user satisfaction as they monopolize game items through massive numbers of game bots, which drives normal users to leave the game. Due to the damage caused by GFGs, the game industry has focused on building effective game bot detection models.
Early approaches employed data mining methods
to identify unique characteristics of game bots. As
game bots are designed to accumulate game assets
rather than engage in other activities, bot characters show a different pattern from normal users. Past approaches analyzed chatting patterns and social events to scrutinize bot characters’ distinct patterns. With effective representations of such features, prior research achieved significant bot detection performance.
Although bot detection models built with data mining approaches showed precise performance, the performance of the detection models relied heavily on the feature engineering process. If a bot changes its behavioral pattern to evade the monitoring system, a particular feature used for bot detection changes as well, which degrades the model performance. Beyond changes in bot patterns, data mining-based models also require frequent improvement along with updates to the game ecosystem.

Park, E., Park, K. and Kim, H. Understand Watchdogs: Discover How Game Bot Get Discovered. DOI: 10.5220/0010264609240931. In Proceedings of the 13th International Conference on Agents and Artificial Intelligence (ICAART 2021) - Volume 2, pages 924-931. ISBN: 978-989-758-484-8. Copyright © 2021 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved.
When the game ecosystem is updated, features used in the detection model might be changed or dismissed; therefore, practitioners had to put enormous effort into the feature engineering process.
To overcome these limits, recent approaches present bot detection methods based on machine learning models. Machine learning models effectively recognize patterns with fewer features; thus, prior works applied a wide range of models to game bot detection (Chung et al., 2013; Lee et al., 2016). Furthermore, deep neural networks achieved state-of-the-art bot detection performance by recognizing distinct patterns between game bots and normal characters. However, the problem of interpretability remains. Even when a model distinguishes bot characters from normal users, game operators should be able to explain why a particular character is classified as a game bot. Moreover, interpretable bot detection results yield significant insights that help game operators design a game ecosystem with less malicious activity. Unfortunately, deep learning models cannot provide detailed explanations of their detection results, leaving room for improvement in interpretability.
This study employs Explainable Artificial Intelligence (XAI) approaches to establish ‘explanations’ for game bot detection. XAI is a modern approach that provides the detailed logic behind machine learning models’ detection results. As a baseline for our study, we established concrete bot detection models that achieve detection accuracy above 88%. We trained two classifiers: a random forest model (RF) and a multi-layered perceptron (MLP), algorithms that belong to machine learning and deep learning, respectively. Leveraging the trained classifiers, we explored the importance of the utilized features for detection. We scrutinized the reasons behind the classification results of the two classifiers to provide interpretability of the models.
For the first classifier, the RF model, we extracted the feature importance information that is intrinsically built into the model. Additionally, we computed permutation importance, which evaluates feature importance repeatedly through shuffling. For the second classifier, the MLP, we applied Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) to scrutinize the explanations about the features that stand out most in classifying game bot characters. Finally, we compared the explanations of the two classifiers obtained with the XAI approaches and addressed how high-performing classifiers distinguish bot characters from normal characters.
Throughout the research, our contributions are as follows.

- We established two high-performing bot detection models with an RF model and a deep neural network, and applied XAI approaches to produce interpretability behind their decisions. Specifically, we identified significant features for game bot detection leveraging four XAI methods: feature importance and permutation importance for the RF model, and LIME and SHAP for the MLP model. Consequently, we examined both the similarities and differences of the significant features between the two classifiers.
- We explored the impact of the significant features identified by the XAI approaches on detection accuracy. To estimate the importance of particular features, we deleted a specific feature from the feature set and observed the change in detection accuracy. We compared the performance of the feature set without a particular feature against the original feature set and thereby estimated the importance of the excluded feature in terms of detection accuracy.
- We analyzed the difference between game bots and heavy users, which past bot detection studies endeavored to clarify. As heavy users and automated bot characters show similar patterns, proving the difference between these two types of characters has been critical. Through the analysis with XAI approaches, we examined the difference in significant features between bot characters and heavy users and derived candidate features that can be used for further detection analysis.
2 LITERATURE REVIEW
Much research has suggested insights for game bot detection. In the early stage of the related research, researchers considered the game bots’ characteristics the primary key. Kang et al. analyzed the bots’
behaviors and established detection rules. Compar-
ing the game logs of both human players and bots,
the researchers discovered that the game bots make
a party of two members and stay much longer than
average. They also observed the game bots tend to
have lots of experience points logs and fewer race
points logs. Based on this knowledge, the researchers established bot detection rules and demonstrated their efficacy with 95.92% accuracy (Kang et al., 2013). Furthermore, Lee et al. unveiled a monetary relationship between game bots leveraging network analysis. By
Table 1: Related research on game bot detection.
Reference Data Proposed Method
(Kang et al., 2013) AION Rule set
(Chung et al., 2013) Yulgang Online Multiple SVM
(Lee et al., 2016) Lineage, AION, Blade & Soul Logistic Regression
(Park et al., 2019) AION LSTM
(Tsaur et al., 2019) KANO MLP NN
(Tao et al., 2018) NetEase MMORPGs ABLSTM
(Lee et al., 2018) Lineage Network Analysis
visualizing the bot characters’ network in a live-service game, they proposed that the analysis of transactions among game characters can be a significant signal for game bot detection (Lee et al., 2018). Chung et al. proposed a machine learning methodology to detect malicious activities. First, twelve behavioral features were selected, and five advanced features were extracted by combining and transforming these features.
The players’ data was clustered based on game-playing styles: Killers, Achievers, Explorers, and Socializers. Finally, the researchers trained support vector
machine (SVM) models with the data clusters to ob-
tain the best model. Their methodology demonstrated
higher accuracy in all play style clusters with diverse
feature combinations compared to the global SVM
model (Chung et al., 2013).
Lee et al. built a framework using the self-similarity of players. The concept of self-similarity came from the fact that game bots are highly likely to repeat the same actions over time. They preprocessed datasets from three MMORPGs to obtain the self-similarity along with other known features. The result with logistic regression demonstrated the effectiveness of the feature, showing 95% detection accuracy (Lee et al., 2016). However, the methods mentioned above are not sustainable because game bots’ actions vary along with their purpose. For this reason, if we do not pay attention to feature importance, machine learning models produce relatively less precise results. This makes feature selection the most crucial step for game bot detection using machine learning models. Moreover, due to the enormous number of logs generated by the nature of games, choosing features requires much work yet may still prove ineffective. In short, these methodologies are excessively dependent on feature selection, which incurs high costs.
Because of these shortcomings, research has introduced deep learning technologies for bot detection. Park et al. used the time-series financial data of the game logs. They insisted that the game bots’ financial status was exceptional regardless of how the bots behave. Based on this idea, they used a Long Short-Term Memory (LSTM) model to capture the difference between the financial status of human players and bots. The LSTM handled the time-series data well, computing how past observations influence future states. They proved the proposed method achieved high performance with over 95%
accuracy (Park et al., 2019). Tsaur et al. experi-
mented with calculating the abnormal rate of each
player and classified game bots through their pro-
posed deep learning models. They provided insight
into the concept of gray zone players, which are the
human players that behave similarly to bots. Their
performance was proven through a real game dataset
of “KANO” (Tsaur et al., 2019).
Tao et al. proposed a bot detection framework,
NGUARD. The framework included preprocessing,
online inference, the auto-iteration mechanism, and
offline training. One of the two solutions provided
in the online inference phase was a supervised clas-
sification using a pre-trained ABLSTM model. The
model was trained in the offline training phase to classify the game bots well. The training was conducted on the continuous sequence data of user behaviors. The NGUARD framework performed strongly on bot detection, showing higher than 97%
precision (Tao et al., 2018). Notably, although both the research of Park and of Tao considered several features, neither required heavy feature engineering.
Table 1 shows the data and the proposed methods used to detect bots in previous research. Bot detection technologies have reached their goal in the game industry: classifying game bots among human players. One problem that has arisen recently is that they cannot explain their classifications, a chronic problem of deep learning models. Because the primary purpose of the research has been high detection performance, how the models produce their results has not been considered. Nowadays, such black-box nature has become a ‘hot potato’ in detection systems. In game bot detection, the most critical feature has not been revealed so far. Through several methods designed to explain non-interpretable models, we compare their explanations to identify the best applicable features. The following sections describe further details of how to gain interpretability and what affects bot detection most.
3 EXPERIMENT AND RESULT
3.1 Dataset
We use a game dataset from the popular MMORPG AION, provided by the Korean game company NCSoft. The company accumulated game log data from AION to discover patterns of malicious players. The preprocessed version of the given dataset is distributed by the Hacking and Countermeasure Research Lab at Korea University. The data contains 49,530 players who have played the game for more than three hours, including 6,213 bots. The data was collected over an 88-day period from April 10 to July 5.
The publisher confirms the ground truth with the data labels “Human” and “Bot.” NCSoft investigated all suspicious players by having people look through the players’ game logs manually. Even after classifying humans and bots in this way, the company kept updating the labels for players who were confirmed not to be bots through customer inquiries or other channels. One more thing to consider is that the numbers of samples in the two classes differ enough to bias the classifiers. To eliminate this bias, we apply both over-sampling and under-sampling to adjust the number of samples in each class.
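The resampling step can be sketched as follows; this is a minimal illustration with synthetic data standing in for the AION table, and scikit-learn's `resample` is an assumed tool, not the authors' actual pipeline.

```python
import numpy as np
from sklearn.utils import resample

# Hypothetical stand-in for the preprocessed AION table:
# 10 numeric features, an imbalanced label (0 = human, 1 = bot).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = np.array([1] * 120 + [0] * 880)

# Over-sample the minority (bot) class up to the majority class size.
X_min, X_maj = X[y == 1], X[y == 0]
X_min_up = resample(X_min, replace=True, n_samples=len(X_maj), random_state=0)

X_bal = np.vstack([X_maj, X_min_up])
y_bal = np.array([0] * len(X_maj) + [1] * len(X_min_up))
```

Under-sampling works symmetrically: draw `n_samples=len(X_min)` from the majority class with `replace=False`.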
3.2 Features
The raw dataset included an enormous amount of game log data expressing every player’s characteristics related to items, skills, and social activities in a time-series manner. We decided to use a preprocessed dataset that converts the time-series data into tabular data instead of the original dataset, because we only need information about each account and its characteristics for our research purposes. Additionally, the data of players below level five were disregarded because they have not experienced the game enough.
The list of features of the dataset is described in
Table 2. The player information category contains
information related to the users’ gameplay, and the
player action category covers features on the users’
behaviors while playing the game. The category for
social activity shows how much the users have partic-
ipated in social activities such as party and guild to
interact with other users in the game.
3.3 Experiment Setup
3.3.1 Baseline Classifiers
For the detection, we used RF and MLP models. RF is a tree-based classifier and one of the traditional machine learning models for classification. The model randomly selects combinations of features for the most acceptable performance. RF is not a black-box model; thus, we chose RF for the experiment under the baseline assumption that the RF model can provide rational explanations. On the other hand, MLP is considered the most common deep learning model. Our experiment uses the MLP model because it is not limited to specific data forms and gives a chance to try diverse layers to customize the model. We organize the model with one input layer, three hidden layers, and one output layer that classifies the player into three classes. As this paper does not aim to build an accurate classification model, the model parameters are not specifically tuned.
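A baseline pair of classifiers in this spirit can be sketched with scikit-learn; the synthetic data, hidden-layer sizes, and hyperparameters below are illustrative assumptions, not the paper's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the 40-feature AION table (0 = human, 1 = bot).
X, y = make_classification(n_samples=2000, n_features=40, n_informative=10,
                           random_state=0)
# 90/10 train/validation split, as in the paper.
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.1, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
# Three hidden layers, mirroring the paper's MLP; the sizes are assumptions.
mlp = MLPClassifier(hidden_layer_sizes=(64, 32, 16), max_iter=500,
                    random_state=0).fit(X_tr, y_tr)
print(rf.score(X_va, y_va), mlp.score(X_va, y_va))
```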
3.3.2 Explainers
The purpose of the experiment is to apply several explainable models. The RF model can explain itself with its “feature importance” attribute. Considering the entire set of decisions that the model makes, it computes the weight of each feature. The model estimates the influence of features based on impurity. This explains how the RF model made its predictions and which features played essential roles in the classification. Another explainer derived for the RF classifier is an inspection method called “permutation importance.” Permutation importance computes the importance of features through repeated validation and evaluation. In the case of the MLP classifier, it achieves higher accuracy in the detection process, but the model is not interpretable. For the interpretability of the MLP classifier, LIME is a well-known open-source module that serves an explanation for a single prediction. Ribeiro et al. introduced LIME, which obtains the importance of each feature by perturbing the input data around the specified instance and observing when the prediction crosses into a different class (Ribeiro et al., 2016). Unlike LIME, SHAP supports explanations over global instances. Running the trained MLP model through SHAP provides us a perception of the overall data classification.
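The two RF explainers can be illustrated with scikit-learn's built-in impurity importance and `permutation_importance`; the data here is synthetic, standing in for the game-log features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=1000, n_features=10, n_informative=3,
                           random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based feature importance: built into the fitted forest.
builtin = rf.feature_importances_

# Permutation importance: shuffle one column at a time and measure the
# drop in accuracy, repeated n_repeats times for stability.
perm = permutation_importance(rf, X, y, n_repeats=10, random_state=0)
print(builtin.argmax(), perm.importances_mean.argmax())
```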
3.3.3 Experiments
The key takeaway is a comparison between the intrinsic explanations of the RF classifier and the external explanations for the MLP using the LIME and SHAP modules. First of all, it is essential to train the RF and MLP models
Table 2: Features.
Category Features
Player Information Login count (1), Logout count (2), Playtime (3-4), Money (5), Total login count (6), IP
count (7), Max level (8)
Player Action Collect (9), Sit (10-12), Experience (13-15), Item (16-18), Money (19-21), Abyss (22-
24), Exp repair (25-26), Portal usage (27-28), Killed (29-32), Teleport (33-34), Reborn
(35-36)
Social Activities Party time (37), Guild action (38), Guild join (39), Social diversity (40)
(a) Feature importance (b) Permutation importance
Figure 1: Explanations for Random Forest Classifiers.
to classify the human players and game bots with decent performance. We build and train a simple MLP model with a customized number of hidden layers and confirm acceptable performance. Once the classifiers are well trained, we explore which features influence the decision making in the classification. The RF model uses its built-in feature importance attribute and is also used to calculate the permutation importance of the features. These outputs tell us which features impact the classifier most.
On the other hand, as a black-box model, the MLP requires interpretability from an external module, LIME. Plain LIME explains one instance at a time and provides accurate local explanations. However, inspecting the interpretability of all instances is unrealistic in an industrial environment; thus, a LIME extension called SP-LIME is used. SP-LIME selects representative cases with diverse explanations.
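The core idea of LIME (the paper uses the actual `lime` package) can be sketched from scratch as a locally weighted linear surrogate; everything below — the perturbation scale, the proximity kernel, and the model — is an illustrative assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

def local_explanation(model, x, n_samples=500, scale=0.5, rng=None):
    """LIME-style sketch: perturb around x, weight by proximity, fit a
    linear surrogate whose coefficients rank the features locally."""
    if rng is None:
        rng = np.random.default_rng(0)
    Z = x + rng.normal(scale=scale, size=(n_samples, len(x)))  # perturbations
    preds = model.predict_proba(Z)[:, 1]                       # bot probability
    weights = np.exp(-np.linalg.norm(Z - x, axis=1) ** 2)      # proximity kernel
    surrogate = Ridge(alpha=1.0).fit(Z, preds, sample_weight=weights)
    return surrogate.coef_                                     # per-feature effect

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
coefs = local_explanation(rf, X[0])
print(np.argsort(-np.abs(coefs))[:3])  # top-3 locally influential features
```

SP-LIME then runs such local explanations over many instances and picks a small set whose explanations jointly cover diverse features.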
After the experiment, we obtain a total of four explanatory results and attempt to evaluate them. We delete the most significant features shown in each result and retrain the classifiers to compare the efficacy of the classification. The more accurately the XAI explains, the more performance we lose when eliminating the features.
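The ablation test described above can be sketched as follows, using synthetic data and impurity importance as a stand-in for whichever XAI ranking is being evaluated.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.1, random_state=0)

base = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
top2 = np.argsort(-base.feature_importances_)[:2]  # features flagged by the XAI

# Retrain without the flagged features; a faithful explanation should
# correspond to a visible accuracy drop here.
keep = [i for i in range(X.shape[1]) if i not in top2]
ablated = RandomForestClassifier(n_estimators=100, random_state=0)
ablated.fit(X_tr[:, keep], y_tr)
print(base.score(X_va, y_va), ablated.score(X_va[:, keep], y_va))
```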
Furthermore, the last experiment extracts only false positives from the classification. A false positive is a case in which the model classifies a human user as a bot, and for the game industry, such cases should be minimized. We investigate these cases through LIME and figure out the characteristics of human players who play like bots, “heavy users.” Based on the features significant in misclassifying heavy users as bots, the class named ‘Human’ is divided into normal users and heavy users. The classification problem now has three categories, and we expect to get fewer false positives and a better detection rate.
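The relabeling step can be sketched like this; the data, model sizes, and label encoding (0 human, 1 bot, 2 heavy user) are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.1, random_state=0)

# Binary classifier: 0 = human, 1 = bot.
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=400,
                    random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_tr)

# False positives (humans predicted as bots) become a new "heavy user" class.
y3 = y_tr.copy()
y3[(y_tr == 0) & (pred == 1)] = 2

# Retrain on the three-class labels {0 human, 1 bot, 2 heavy user}.
clf3 = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=400,
                     random_state=0).fit(X_tr, y3)
print(sorted(int(c) for c in set(clf3.predict(X_va))))
```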
3.4 Results
First, we trained the RF classifier and the MLP classifier. The dataset contained two classes, Bot and Human, and went through the oversampling process due to the low number of samples in the “Bot” class. 90% of the dataset was used to train the models, and the rest of the data was used for validation. The validation accuracy of the RF model is 88.51%, while the MLP model reached 90.10%.
The first explanation method is ‘feature importance’, which applies to the RF model. Figure 1 shows the result of this method with the top 30 crucial features on its left side. According to the first method, the most
Table 3: SP-LIME Results.
Class Feature 1 Feature 2 Feature 3
Human 1 Login day count Item get ratio Login count
Human 2 Login count Exp get count playtime per day
Bot 1 Exp get count Login day count exp get count per day
Human 3 Item get ratio Teleport count per day Login day count
Bot 2 exp get count per day sit count Login day count
Figure 2: Explanations using SHAP for MLP.
significant feature of the RF model is ‘playtime per day’. ‘item get count per day’ and ‘exp get count per day’ follow, and features about items and playtime sit at the top of the rank. The other explanation method for the RF model is ‘permutation importance’. The importance of playtime per day appears much higher in this method, and we attribute this to the method’s repeated recalculation of representative features to obtain the permutation. Because this feature showed such extraordinary importance, we exclude it on the right side of Figure 1, which shows 30 importances starting from the second most important feature. Unlike the previous method, the money-related features and point-earning ratios appeared influential.
On the other hand, the deep learning-based classification model does not provide a way to describe itself. To work around this disadvantage, we applied the LIME module to the model. Because LIME provides interpretability of the classification result for each individual case, it is not practical to scrutinize all cases one by one. Therefore, an extended version of LIME called SP-LIME was introduced to analyze several representative cases, and we chose to examine five cases. The results are in Table 3, showing that Human 1 logs in frequently within a day and is mainly focused on collecting items. Human 2 involves players who have played the game for a long time. Human 3 appears to wander around the game world and has a play style similar to Human 1. On the other hand, the common characteristic of the Bot classes is that they gain an overwhelming amount of experience points in a day compared to human players. Finally, SHAP derives explanations by checking each instance like LIME, and it also provides an averaged global explanation. Applying SHAP to the trained MLP classifier, Figure 2 shows its output with the crucial features. It represents the importance of the classification features without distinguishing between positive and negative values (Lundberg and Lee, 2017). The results show that playtime had the most influence on the classification, and login counts and experience-point gain counts also had notable impacts.
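The Shapley-value idea behind the SHAP explanations above (the paper uses the `shap` package itself) can be sketched with a simple sampling estimator; the background-sample handling and permutation count are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def shapley_estimate(model, x, background, n_perm=50, rng=None):
    """Monte-Carlo Shapley sketch: average each feature's marginal
    contribution over random orderings, filling absent features from a
    random background sample."""
    if rng is None:
        rng = np.random.default_rng(0)
    d = len(x)
    phi = np.zeros(d)
    for _ in range(n_perm):
        order = rng.permutation(d)
        z = background[rng.integers(len(background))].copy()
        prev = model.predict_proba(z[None, :])[0, 1]
        for j in order:
            z[j] = x[j]                              # reveal feature j
            cur = model.predict_proba(z[None, :])[0, 1]
            phi[j] += cur - prev                     # marginal contribution
            prev = cur
    return phi / n_perm

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
phi = shapley_estimate(rf, X[0], X)
print(np.round(phi, 3))
```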
In summary, playtime per day and item get count per day are the most important features in the feature importance method, while avg money and exp get ratio are the most important in permutation importance. The most frequently observed features in the SP-LIME results were login day count and exp get count, while SHAP indicated playtime per day and login day count were critical. The two classification models were then retrained after excluding the most important features from the corresponding XAI results. After removing the playtime per day and item get count per day features, the performance of the RF classifier did not drop at all, with an accuracy of 88.58%. For permutation importance, when we deleted avg money and exp get ratio from the data, the classifier showed slightly lower performance, 88.21%. For the MLP classifier, we first dropped login day count and exp get count according to the SP-LIME results, and its validation accuracy became 89.13%, lower than the original performance of 90.10%. Finally, when deleting the playtime per day and login day count features according to SHAP, the MLP classifier showed 89.37% validation accuracy.
The last experiment was to identify features that play an important role in distinguishing game bots from the heavy users who play like bots. Using only the MLP classifier, once the trained model was obtained,
Figure 3: Comparison of classification performance by
class distribution.
we extracted only the false-positive cases from the classification results, and the features that played important roles in these cases were identified through XAI. Because we used an MLP classifier and required XAI methods that can produce explanations for each instance, we decided to use LIME for this purpose. We examined several cases in which the classifier predicted a bot when the case was actually a human user. In most cases, the exp get ratio feature seemed effective, followed by collect max count and exp get count per day. Based on these results, a new class called “Heavy user” was formed from the “Human” category, and the classification used the new dataset with three classes. The classification performance of the MLP model improved greatly over the original MLP, with a validation accuracy of 96.02%. Furthermore, the number of false positives dropped significantly from 881 to 327. Figure 3 describes the improvement gained by classifying heavy users utilizing XAI.
4 DISCUSSION
Through the experiments, we have identified the performance of the two base classification models and the features that played essential roles in the classification. The interpretability of the RF model could only provide the ranking of feature importance, not the difference between game bots and non-bots. Consequently, we used LIME and SHAP, which successfully supported these functionalities.
First, we could identify the features that were important in classifying bots. The RF classifier explained the binary result of either human or bot, while the MLP model provided an explanation based on the prediction probabilities of each class. Through the explanatory techniques, we found that the main criteria for distinguishing bots are related to earning items and experience points. Additionally, playtime per day showed significant relevance in some explanations.
The second experiment aimed to determine which method is the best among the XAI techniques by observing the reduction in performance when the most significant features were deleted. The explanation from feature importance did not lower the classification performance, while the one from permutation importance affected the performance by a small margin, 0.3%. We consider that permutation importance gave a better explanation than feature importance because of its repeated evaluations. The important features of the MLP model appeared more decisive than those of the former model. When we deleted the features that SP-LIME determined to be most important, the detection accuracy of the classifier dropped by about 1%. With the SHAP result, the performance change was similar to the experimental result using permutation importance. Therefore, we could observe some tendencies among the essential features for bot detection. Even though playtime per day appeared frequently in the explanations, it has little effect on bot detection itself. The features related to the day counts or gain ratios of logins, experience points, and items are rather more critical. Additionally, among the four explainable AI techniques, the one that provided the best explanations was the LIME module.
Finally, we made a distinction between regular users and heavy users. With the three features exp get ratio, collect max count, and exp get count per day, the “heavy user” class was created, and the resulting classification showed outstanding performance with low false positives. The result implies that these features are essential to distinguishing heavy players from game bots. We added the heavy user class by identifying the reasons for classification and confirmed that the classification performance improved greatly without modifying the model.
Whether a character is a game bot or a human player has been a long-standing problem. Additionally, false positives can cause complications when a human player is misclassified as a bot. These days, with more deep learning-based classification, such characteristics and problems have become more noticeable. There is a growing need to discover the cause of detection for those who want to contest false detection as bots. To that end, we examined how to explain the bot detection models. In this way, game companies can verify the properties of game bots and examine the features that must be referred to. Investigating how game bot detection works can also help in designing new games, identifying which data should be extracted to win the fight against game bots. We also think it is useful when the game logging system gets updated with better features.
5 CONCLUSION
Numerous researches have suggested methods for de-
tection to resolve the problem that the online game
industry is suffering from. However, because a game
industry is based on the interactive behaviors of real
people, false detections should not be overlooked
when it occurs. The actions to provide logical evi-
dence are the most required for this reason. There has
been a disadvantage of not interpretable in a compli-
cated deep neural network model, but it has become
feasible through a new paradigm called XAI.
Given the importance of interpretability, we aimed
to establish a bot detection model that pursues two
goals: precise detection performance and the provi-
sion of cues explaining detected game bot activities.
We extracted candidate features from raw log data of
AION, a popular MMORPG in live service. Based on
the extracted features, we applied multiple XAI ap-
proaches to the bot detection task. First, we leveraged
an RF model, a classical machine learning algorithm,
to provide feature importance and permutation im-
portance. Then we built a simple MLP model com-
bined with the XAI modules LIME and SHAP.
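The two RF-based explanations named above can be sketched as follows: the impurity-based feature importance computed during training, and the permutation importance measured on held-out data. The feature names and data are illustrative assumptions, not the paper's AION feature set.

```python
# Hypothetical sketch: two feature-ranking explanations from an RF model.
# Feature names are assumed stand-ins for game-log features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
feature_names = ["item_gain", "money_gain", "play_time", "sit_count"]

# Synthetic two-class data: the label depends mainly on the first
# two features, so a sound explanation should rank them highest.
X = rng.normal(size=(600, 4))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=600) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Impurity-based importance: a by-product of training; sums to 1.
for name, imp in zip(feature_names, rf.feature_importances_):
    print(f"impurity importance    {name}: {imp:.3f}")

# Permutation importance: the accuracy drop when one feature is
# shuffled on held-out data, so it reflects generalization, not fit.
perm = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=0)
for name, imp in zip(feature_names, perm.importances_mean):
    print(f"permutation importance {name}: {imp:.3f}")
```

LIME and SHAP would be applied analogously to the trained MLP model through their own libraries; they are omitted here to keep the sketch dependent on scikit-learn only.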
In the experiments, both the RF model and the MLP
model achieved over 88% game bot detection accu-
racy, which serves as our detection baseline. We com-
pared the resulting sets of significant features from
the RF model and from the MLP model with XAI
modules. We also evaluated the XAI methods by
comparing detection performance after excluding the
significant features from the feature set. Permutation
importance and SHAP were evaluated as better meth-
ods than the RF feature importance, and LIME was
rated as the best of the explanation methods used.
Last but not least, we also clarified the difference be-
tween game bot characters and heavy users. Because
game bots and heavy users both accumulate in-game
assets in a short time, past detection models confused
the two classes. We established this distinction based
on the XAI results and confirmed through experi-
ments that it resolved the confusion.
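The evaluation described above, excluding an explanation method's top-ranked features and re-measuring detection performance, can be sketched as follows. A larger accuracy drop suggests the method pointed at genuinely important features. Data, feature counts, and which indices count as "top-ranked" are illustrative assumptions.

```python
# Hypothetical sketch: rating an explanation method by retraining
# without its top-ranked features and measuring the accuracy drop.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 6))
# Only the first two features carry the class signal.
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=600) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def accuracy_without(dropped):
    """Retrain after removing the given feature indices, then score."""
    keep = [i for i in range(X.shape[1]) if i not in dropped]
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_tr[:, keep], y_tr)
    return clf.score(X_te[:, keep], y_te)

baseline = accuracy_without([])
drop_top = accuracy_without([0, 1])   # features an XAI method ranked high
drop_low = accuracy_without([4, 5])   # features it ranked low

print(f"baseline accuracy:            {baseline:.3f}")
print(f"without top-ranked features:  {drop_top:.3f}")
print(f"without low-ranked features:  {drop_low:.3f}")
```

Here removing the top-ranked features collapses accuracy toward chance while removing low-ranked ones barely matters, which is the signature of a trustworthy explanation under this evaluation.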
We believe future studies should consider an in-depth
examination of the significant features that the expla-
nation models present. This involves exploring how
the meanings of significant features differ between
the RF model's explanation and the XAI modules.
Even for data from domains other than the game in-
dustry, analyzing the meaning of each feature will
deepen our understanding of the domain and point to
ways to further reduce false detections.
REFERENCES
Castronova, E. (2001). Virtual worlds: A first-hand account
of market and society on the cyberian frontier.
Chung, Y., Park, C.-y., Kim, N.-r., Cho, H., Yoon, T., Lee,
H., and Lee, J.-H. (2013). Game bot detection ap-
proach based on behavior analysis and consideration
of various play styles. ETRI Journal, 35(6):1058–
1067.
Galarneau, L. (2005). Spontaneous communities of learn-
ing: A social analysis of learning ecosystems in mas-
sively multiplayer online gaming (MMOG) environ-
ments.
Kang, A. R., Woo, J., Park, J., and Kim, H. K. (2013). On-
line game bot detection based on party-play log anal-
ysis. Computers & Mathematics with Applications,
65(9):1384–1395.
Kwon, H., Woo, K., Kim, H.-c., Kim, C.-k., and Kim, H. K.
(2013). Surgical strike: A novel approach to mini-
mize collateral damage to game bot detection. In 2013
12th Annual Workshop on Network and Systems Sup-
port for Games (NetGames), pages 1–2. IEEE.
Lee, E., Woo, J., Kim, H., and Kim, H. K. (2018). No silk
road for online gamers! using social network analysis
to unveil black markets in online games. In Proceed-
ings of the 2018 World Wide Web Conference, pages
1825–1834.
Lee, E., Woo, J., Kim, H., Mohaisen, A., and Kim, H. K.
(2016). You are a game bot!: Uncovering game bots
in MMORPGs via self-similarity in the wild. In NDSS.
Lundberg, S. M. and Lee, S.-I. (2017). A unified approach
to interpreting model predictions. In Advances in neu-
ral information processing systems, pages 4765–4774.
Park, K. H., Lee, E., and Kim, H. K. (2019). Show me
your account: Detecting MMORPG game bot leveraging
financial analysis with LSTM. In International Work-
shop on Information Security Applications, pages 3–
13. Springer.
Ribeiro, M. T., Singh, S., and Guestrin, C. (2016). "Why
should I trust you?" Explaining the predictions of any
classifier. In Proceedings of the 22nd ACM SIGKDD
International Conference on Knowledge Discovery and
Data Mining, pages 1135–1144.
Tao, J., Xu, J., Gong, L., Li, Y., Fan, C., and Zhao, Z.
(2018). NGUARD: A game bot detection framework
for NetEase MMORPGs. In Proceedings of the 24th
ACM SIGKDD International Conference on Knowl-
edge Discovery & Data Mining, pages 811–820.
Tsaur, W.-J., Tseng, C. H., and Chen, C.-L. (2019). Deep
neural networks for online game bot detection. In Pro-
ceedings on the International Conference on Artifi-
cial Intelligence (ICAI), pages 252–258. The Steering
Committee of The World Congress in Computer Sci-
ence, Computer Engineering and Applied Computing
(WorldComp).