Learning from Smartphone Location Data as Anomaly Detection for
Behavioral Authentication through Deep Neuroevolution
Mhd Irvan, Tran Phuong Thao, Ryosuke Kobayashi, Toshiyuki Nakata and Rie Shigetomi Yamaguchi
Graduate School of Information Science and Technology, The University of Tokyo, Japan
yamaguchi.rie@i.u-tokyo.ac.jp
Keywords:
Behavioral Authentication, Machine Learning, Deep Neuroevolution.
Abstract:
Passwords and face recognition are two of the many approaches used to authenticate smartphone users.
These approaches typically authenticate users once, at log-in or unlock, so an unauthorized person can use
the authenticated account if the owner loses the device while it is still unlocked. It is therefore
necessary to authenticate users continuously. Passwords and biological biometrics are impractical for this
purpose because they require constant interruption. In this early research, we apply a behavioral
authentication approach that uses location history data to implicitly authenticate users. Traits derived
from users’ movements are easy to monitor and hard to fake. Previously visited locations reflect patterns
in people’s daily behavior, and in this paper we propose a deep learning method evolved by genetic
algorithms to recognize such patterns and to correctly authenticate people who match them.
1 INTRODUCTION
Smartphones play an important role in many people’s
daily lives. People commonly use smartphone applications
to take photos, send messages, book rides, or
shop online. It is not unusual for these applications
to ask their users for private information (such as names,
gender, or credit card information) to improve
the quality of their service. The sensitive nature of
this private information requires application developers
to properly secure access to their services.
A very popular way to secure such access is to
ask users for passwords during the login process.
However, passwords and other knowledge-based
authentication methods such as PIN (personal
identification number) codes carry great risk because users
tend to reuse the same passwords across multiple
services. Thus, many services now require an additional
possession-based authentication method before
granting access (Zviran, 2006). A typical implementation
sends a unique code through
SMS (short message service) to the user’s phone number.
This extra step is known as 2-factor authentication
(2FA) or multi-factor authentication (MFA)
(Banyal, 2013).
Unfortunately, possession-based authentication
methods bring potential inconveniences to users be-
cause they may have to carry additional devices
which can be easily lost. Many users also use the
same smartphone to input passwords and receive 2FA
codes. Thus, if their smartphone is stolen, attackers
can bypass 2FA checks (Velásquez, 2018).
To further secure users against such situations,
many smartphone makers offer the possibility to use
inherence factors to authenticate users. These inherence
factors often take the form of biological biometric
information such as fingerprints or facial features.
Biometric authentication methods offer more
seamless authentication because users only need to
provide information that they already have and carry
all the time (Ogbanufe, 2018).
All of the previously mentioned authentication methods
typically grant a one-time unlock session during the login
process. If a smartphone is stolen while
unlocked, whoever steals the device could
access the private data it contains. Because
of this, many services ask users to re-authenticate after a
certain amount of time has passed. Nevertheless, the
period between those re-authentications is still prone
to attacks.
Such threats make it necessary to authenticate
users at regular intervals. This approach is
known as continuous authentication (Feng, 2017). It
requires users to continuously provide credentials to
Irvan, M., Thao, T., Kobayashi, R., Nakata, T. and Yamaguchi, R.
Learning from Smartphone Location Data as Anomaly Detection for Behavioral Authentication through Deep Neuroevolution.
DOI: 10.5220/0010395407230728
In Proceedings of the 7th International Conference on Information Systems Security and Privacy (ICISSP 2021), pages 723-728
ISBN: 978-989-758-491-6
Copyright © 2021 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
Figure 1: Flow of authentication.
prove their authenticity. Passwords, 2FA codes, and
biometrics-based authentication methods are considered
unsuitable for continuous authentication
because of the inconvenience of manually entering
that information multiple times. This laborious process
has led researchers to
look for implicit factors that can be applied to continuous
authentication.
Implicit factors, such as user behaviors, are
suitable for continuous authentication because they do
not require constant interaction from users and can
be checked unobtrusively in the background without
interrupting user activities. Furthermore, behaviors are
unique to each person and hard for another person to mimic
(Sitová, 2015). Methods that authenticate users by
learning from their past behaviors are often referred to as
behavioral authentication methods. Behavioral
authentication also has its own issues. The unique
nature of people’s behaviors means that it is challenging
to recognize the patterns that define them.
In this paper, we propose a method that recognizes
users’ behaviors through their location history
data gathered through their smartphone’s built-in GPS
(Global Positioning System) sensor. Our method
continuously authenticates users based on their past
behavioral patterns. When our proposed method recognizes
that the owner is no longer accompanying the
phone, smartphone developers may use that information
to lock the phone and prevent further
access until the user explicitly re-authenticates, for example
with a password or face recognition (figure 1).
We implement deep neuroevolution models (Such,
2017), which combine Deep Neural Network (DNN)
architectures (Szegedy, 2013) with Genetic Algorithm
(GA) operations (Goldberg, 2006), to learn
from users’ location history and find patterns in
their movement behaviors, so that users can be regularly
authenticated based on their current location. Through a
collaborative research project between our affiliated university
and various commercial companies, behavioral data
from over 7,000 smartphone users were collected.
To evaluate the feasibility of our proposed
method, we conducted early experiments on a small
number of users from the dataset. Our early findings
demonstrate that our model can
detect anomalies in users’ expected locations
with relatively high accuracy (average FAR and FRR below 2% in all
experiments; see Table 2).
2 RELATED WORK
Hsieh and Leu (Hsieh, 2011) proposed an authentication
scheme that uses One-Time Passwords
(OTPs) based on the time and location information of
the mobile device to authenticate users accessing
Internet services, such as online banking
and e-commerce transactions. Their research demonstrated
that location information can be used to correctly
authenticate genuine users. However, their scheme
applies only to a one-time authentication
session rather than to continuously repeated authentication. This
limitation comes from the requirement to manually enter
SMS-based OTPs based on the time and location.
Ghogare et al. (Ghogare, 2012) also showed that
location can be used as one of the credentials that restrict
access to data to legitimate users. However, their
system was not designed with smartphone users in
mind. They used dedicated GPS devices to
obtain users’ location information. The location
information is transferred during an explicit authentication
session, so their location-based authentication approach
is not suitable for implicitly authenticating users
in the background without interrupting user activities.
Zhang et al. (Zhang, 2012) applied location-based
authentication to mobile transactions using
smartphones. As in the work of Ghogare et al., their
authentication is applied during an explicit
authentication session rather than implicitly. However,
the location information in their research comes from
users’ smartphones instead of separate GPS devices.
They showed that, since users typically carry their
smartphones every day and everywhere, the available
location information is richer and contributes to
stronger location-based authentication.
Figure 2: Proposed approach.
3 PROPOSED APPROACH
We propose deep neuroevolution as our authentication
algorithm. Deep neuroevolution evolves
Deep Neural Network (DNN) models
with Genetic Algorithm (GA) operators instead of
the standard gradient descent method (Such, 2017).
It has found success in problem areas where patterns
in data continuously change and finding an optimal
DNN model is a laborious process.
Similarly, learning from location data is challenging
because people’s behaviors occasionally change, and
a good DNN model for past behaviors may not necessarily
produce good results for the changed behaviors
(Miikkulainen, 2019). For this reason, the DNN parameters
of our authentication models are regularly
evolved by GA operators to adapt to the dynamics of
users’ behaviors.
Location history carries rich information. The places
a person typically visits at certain times represent that
person’s unique behaviors. In addition to regularly
visiting favorite places, people also visit new
places from time to time, and correctly authenticating them at
new places is challenging. Meanwhile, discovering a
good neural network for a particular kind of behavior
is a laborious process. There is no generally ideal
number of layers and nodes when designing a neural
network for location data. Researchers have recently
found that GA can be used to automate this
design process. GA operators, such as mutation and
crossover, can autonomously evolve a neural
network into a shape that suits the data it
is fed. Consequently, our DNN will continue to
evolve to match new behaviors.
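
As an illustration of how mutation and crossover could act on a network's shape, the following minimal sketch encodes an architecture as a list of hidden-layer widths. The encoding, the bounds `MIN_NODES`, `MAX_NODES`, and `MAX_LAYERS`, and the function names are assumptions made for this example; the paper does not specify the operators at this level of detail.

```python
import random

# Hypothetical genome: a list of hidden-layer widths, e.g. [16, 8] means
# two hidden layers with 16 and 8 nodes. The bounds below are assumptions.
MIN_NODES, MAX_NODES, MAX_LAYERS = 2, 64, 6


def mutate(genome, rng):
    """Return a mutated copy: add a layer, drop a layer, or resize one layer."""
    child = list(genome)
    op = rng.choice(["add", "drop", "resize"])
    if op == "add" and len(child) < MAX_LAYERS:
        position = rng.randrange(len(child) + 1)
        child.insert(position, rng.randint(MIN_NODES, MAX_NODES))
    elif op == "drop" and len(child) > 1:
        child.pop(rng.randrange(len(child)))
    else:
        # Resize one layer, also used as a fallback when add/drop is not allowed.
        i = rng.randrange(len(child))
        delta = rng.choice([-4, -2, 2, 4])
        child[i] = max(MIN_NODES, min(MAX_NODES, child[i] + delta))
    return child


def crossover(parent_a, parent_b, rng):
    """One-point crossover: a prefix of parent_a joined with a suffix of parent_b."""
    cut_a = rng.randint(1, len(parent_a))  # keep at least one layer from parent_a
    cut_b = rng.randint(0, len(parent_b))
    child = parent_a[:cut_a] + parent_b[cut_b:]
    return child[:MAX_LAYERS]
```

Weights are left out of this sketch on purpose; in practice each structural change would also need a rule for initializing or inheriting the affected weights.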
To maintain privacy, our model is designed to
work exclusively with data kept locally on each
smartphone. At first, an initial global DNN model
is distributed across users’ devices. On each device, this DNN takes
only that device user’s location history as
input and learns the movement patterns behind the user’s
travel. GA operators then regularly change the
DNN parameters, namely the number of hidden layers,
the number of nodes in each hidden layer, and the weights
between them. Consequently, each device maintains
its own evolved DNN to match its own user’s
behaviors. Furthermore, Reinforcement Learning (RL) is
used to give feedback to the system based on the
authentication accuracy produced by the DNN. Finally, the
values of the DNN parameters from each smartphone are
aggregated and averaged to build a new global DNN
model, in a fashion similar to Federated Learning (FL),
which is then redistributed across users’ devices (figure 2).
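
The aggregation step can be sketched as an element-wise average of client weights. This minimal example assumes every client reports a network with the same architecture; the paper also averages the number of layers and nodes, but does not specify how weights are aligned across differently shaped networks, so that case is not covered here. The function name `average_weights` and the nested-list weight layout are hypothetical.

```python
def average_weights(client_weights):
    """Element-wise mean of per-client weights.

    client_weights: one entry per client; each entry is a list of layers,
    and each layer is a list of rows (lists of floats).
    Assumption: all clients share the same architecture.
    """
    n_clients = len(client_weights)
    reference = client_weights[0]
    return [
        [
            [
                sum(client[layer][row][col] for client in client_weights) / n_clients
                for col in range(len(reference[layer][row]))
            ]
            for row in range(len(reference[layer]))
        ]
        for layer in range(len(reference))
    ]


# Two hypothetical clients, each with a single 1x2 weight matrix.
global_weights = average_weights([[[[1.0, 2.0]]], [[[3.0, 4.0]]]])
```

Only model parameters enter this computation, which matches the paper's statement that no location data leaves the device.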
Inputs for the DNN are defined as pairs of “time”
and “place”. Since people’s behavior may vary
greatly from day to day, as well as between weekdays
and weekends, “time” is defined as a collection
of the labels “day”, “weekday/weekend”, “hour”, and
“minutes”. Meanwhile, “place” is defined as a pair of
“latitude” and “longitude” values.
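
A minimal sketch of how one record might be turned into such an input is shown below. It assumes that “day” means the day of the week and that the labels are passed as plain numbers; the function name `encode_sample` and the coordinates are hypothetical and do not come from the dataset.

```python
from datetime import datetime


def encode_sample(timestamp, latitude, longitude):
    """Turn one GPS record into a flat feature list: time labels, then place."""
    day = timestamp.weekday()          # 0 = Monday ... 6 = Sunday (assumed meaning of "day")
    is_weekend = 1 if day >= 5 else 0  # weekday/weekend label
    return [day, is_weekend, timestamp.hour, timestamp.minute, latitude, longitude]


# Hypothetical record: Saturday, 4 February 2017, 21:15, made-up coordinates.
sample = encode_sample(datetime(2017, 2, 4, 21, 15), 35.0, 139.0)
print(sample)  # [5, 1, 21, 15, 35.0, 139.0]
```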
The DNN is initially trained with the first week of
location history, where locations visited regularly at
particular times are assumed to be usual paths, and
less frequently visited locations are assumed to be
unusual paths. When the authentication system encounters
an unusual location at a particular time during the
learning process, it observes how far the observation deviates
from the expected location and time, and assumes
authenticity for a while. This is essential for the DNN to
grasp the nature of its user’s change in behavior.
Reinforcement learning is used to give feedback to the
neural network about the consequences of its observations,
and GA operators are tasked with updating the
parameters based on that feedback. Each device
evolves the neural network independently to better fit
its user’s data. Over time, the DNN becomes better at
judging whether a deviation is a change of behavior
or an anomaly due to theft (figure 3). After a
predetermined number of evolution rounds, every device
reports its evolved design to the remote server.
The remote server then averages the
evolution parameters (number of layers, nodes, and
the respective weights between them) found in all
evolved neural networks sent by the clients to generate
a new global neural network. This new global neural
network is again dispatched to all clients to repeat
the evolution process and evolve further. The information
reported to the remote server contains strictly
machine learning parameters only. To maintain privacy,
no actual location data is transmitted to the
server.
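
One simple way to quantify the deviation between an expected and an observed location is the great-circle (haversine) distance. The sketch below is only an illustration of that measurement: the paper lets the evolved DNN judge deviations, and the fixed `threshold_km` rule, like the function names, is an assumption made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_unusual(expected, observed, threshold_km=1.0):
    """Flag an observation that lies farther than threshold_km from the expected place.

    threshold_km is a hypothetical tuning parameter, not a value from the paper.
    """
    return haversine_km(*expected, *observed) > threshold_km
```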
Figure 3: Observing change of behaviors.
4 EARLY EXPERIMENT
We conducted experiments by giving a real-world
dataset collected from regular smartphone users as in-
puts to our proposed method. This dataset reflects
their actual lifestyle.
4.1 Data Collection Method
Our dataset came from a collaborative project between
our affiliated university and several commercial
companies that develop and distribute smartphone
applications in Japan. These applications are relatively
popular amongst smartphone users in Japan.
Users of the applications were presented with an option
to participate in a university project that studies
and analyzes smartphone users’ lifestyle habits. A link to
the detailed project description hosted
on the university website was also provided. By agreeing
to participate in the project, users agreed to share
data about their smartphone, their application usage, and
their location. Users were clearly informed that
their data would be limited to analysis by the university
laboratory and that their privacy would be strictly protected.
The project passed an extensive review by the
university’s ethical committee and was deemed to be
appropriately implemented.

Table 1: An example of collected data from a user.
ID    Date        Time     Latitude     Longitude
1     2017XXXX    21:XX    31.XXXXXX    139.XXXXXX
1     2017XXXX    21:XX    31.XXXXXX    139.XXXXXX
1     2017XXXX    21:XX    31.XXXXXX    139.XXXXXX
1     2017XXXX    21:XX    31.XXXXXX    139.XXXXXX
1     2017XXXX    21:XX    31.XXXXXX    139.XXXXXX
...   ...         ...      ...          ...

Table 2: Summary of results for each experiment.
Number of random users    Training time    Authentication time    FAR      FRR
100 users                 1 hour           0.8 seconds            0.94%    1.74%
100 users                 2 hours          2.0 seconds            0.71%    1.21%
200 users                 1 hour           0.8 seconds            0.98%    1.93%
200 users                 2 hours          2.1 seconds            0.84%    1.62%
4.2 Dataset
The dataset used in this paper contains GPS
coordinates recorded every 5 to 10 minutes
from 7,236 smartphone users over the period from
February 2017 to April 2017. Table 1 illustrates the
data collected from one user. Because this research is still
in its early phase and we would like to validate
our approach first, we used only small subsets of
the whole dataset.
For our early experiments, we created two groups
of users: a group of 100 randomly selected users and
a group of 200 randomly selected users. We trained
our model by initially feeding it the first week of location
history as training data, and the system was tasked with
continuously authenticating users every 5 minutes during
the following week. We then increased the
size of the training data by one week at a time, up to the
second-to-last week of the collected data.
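
This evaluation protocol amounts to an expanding-window split over weeks, which can be sketched as follows. The function name `expanding_week_splits`, the zero-based week indices, and the example value of four weeks are assumptions for illustration only.

```python
def expanding_week_splits(n_weeks):
    """Return (train_weeks, test_week) pairs for an expanding training window.

    Round k trains on weeks 0..k-1 and authenticates during week k, so the last
    round trains on everything up to the second-to-last week.
    """
    return [(list(range(k)), k) for k in range(1, n_weeks)]


# With a hypothetical four weeks of data:
splits = expanding_week_splits(4)
print(splits)  # [([0], 1), ([0, 1], 2), ([0, 1, 2], 3)]
```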
4.3 Early Results and Discussions
We ran our experiments twice for each group. In
the first experiment, the DNN was trained and evolved
by GA for one hour, while in the second experiment
it was trained and evolved for two hours.
After two hours of training,
GA had evolved the DNN to a size so large that
it required 2 seconds or more on average to authenticate
users. Meanwhile, training our model for one hour
produced a DNN that could still authenticate
users within 1 second. Table 2 summarizes the average
False Accept Rate (FAR) and False Reject Rate
(FRR) for each of our experiments.
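
For reference, FAR is the fraction of impostor attempts that are accepted, and FRR is the fraction of genuine attempts that are rejected. The sketch below computes both from a list of decisions; the function name `far_frr`, the input layout, and the sample decisions are hypothetical and are not taken from the experiments.

```python
def far_frr(decisions):
    """Compute (FAR, FRR) from a list of (is_genuine, accepted) boolean pairs."""
    impostor_outcomes = [accepted for is_genuine, accepted in decisions if not is_genuine]
    genuine_outcomes = [accepted for is_genuine, accepted in decisions if is_genuine]
    # FAR: share of impostor attempts that were wrongly accepted.
    far = sum(impostor_outcomes) / len(impostor_outcomes) if impostor_outcomes else 0.0
    # FRR: share of genuine attempts that were wrongly rejected.
    frr = (
        sum(not accepted for accepted in genuine_outcomes) / len(genuine_outcomes)
        if genuine_outcomes
        else 0.0
    )
    return far, frr


# Made-up decisions: 2 genuine attempts (1 rejected), 4 impostor attempts (1 accepted).
example = [(True, True), (True, False), (False, False),
           (False, True), (False, False), (False, False)]
print(far_frr(example))  # (0.25, 0.5)
```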
As Table 2 shows, DNN models trained
and evolved for 2 hours produced better FAR and
FRR in both groups of users. They did, however,
require a longer time to authenticate than the models
trained for a shorter time. While in traditional explicit
authentication methods, such as face recognition,
authentication time needs to be as short as possible, we
believe this is less critical for our behavioral authentication
method, because authentication is done implicitly
in the background and does not interrupt
users’ activities. Users may not necessarily notice the
extra seconds taken to implicitly authenticate them.
These results demonstrate that DNNs evolved
through GA can successfully learn the behavioral patterns
contained within the location history traces of
smartphone users. When the DNN outputs a reject
signal, a mobile device can be reasonably confident
that its owner is no longer the one accompanying it.
In cases where false rejects do
happen, the owner can simply authenticate through
explicit authentication methods (such as passwords or
face recognition).
5 FINAL REMARK
Our research is still in its early phase, and initial
results demonstrate that deep neuroevolution models,
in which DNNs are evolved by GA, can be a good
approach to implicitly authenticating users through their
behaviors. Although our results show that this approach
is not yet accurate enough to fully replace
traditional explicit authentication methods, we believe
it can be a suitable option for authenticating
additional behavioral factors after the initial unlock
through explicit authentication.
We also see a potential scenario in which our approach
enables smoother mobile payments
through smartphones without requiring additional
interaction (such as providing PIN codes) from users.
While the learning process for this kind of
application should ideally run continuously on the mobile
device itself, doing so would rapidly drain the battery.
As such, the learning process could be
limited to times when the phone is connected
to a power outlet. We are currently investigating the
feasibility of this approach.
To further validate our approach, we plan
to conduct more experiments using the data of all users
in the dataset. We also plan to optimize parameters
to achieve better FAR and FRR.
REFERENCES
Banyal, R. K., Jain, P., & Jain, V. K. (2013, Septem-
ber). Multi-factor authentication framework for cloud
computing. In 2013 Fifth International Conference on
Computational Intelligence, Modelling and Simula-
tion (pp. 105-110). IEEE.
Feng, H., Fawaz, K., & Shin, K. G. (2017, October). Con-
tinuous authentication for voice assistants. In Proceed-
ings of the 23rd Annual International Conference on
Mobile Computing and Networking (pp. 343-355).
Goldberg, D. E. (2006). Genetic algorithms. Pearson Edu-
cation India.
Ghogare, S. D., Jadhav, S. P., Chadha, A. R., & Patil, H.
C. (2012). Location based authentication: A new ap-
proach towards providing security. International Jour-
nal of Scientific and Research Publications, 2(4), 1-5.
Hsieh, W. B., & Leu, J. S. (2011, July). Design of a time
and location based One-Time Password authentication
scheme. In 2011 7th International Wireless Communi-
cations and Mobile Computing Conference (pp. 201-
206). IEEE.
Miikkulainen, R., Liang, J., Meyerson, E., Rawal, A., Fink,
D., Francon, O., ... & Hodjat, B. (2019). Evolving
deep neural networks. In Artificial Intelligence in the
Age of Neural Networks and Brain Computing (pp.
293-312). Academic Press.
Ogbanufe, O., & Kim, D. J. (2018). Comparing fingerprint-
based biometrics authentication versus traditional au-
thentication methods for e-payment. Decision Support
Systems, 106, 1-14.
Sitová, Z., Šeděnka, J., Yang, Q., Peng, G., Zhou, G., Gasti,
P., & Balagani, K. S. (2015). HMOG: New behavioral
biometric features for continuous authentication
of smartphone users. IEEE Transactions on Information
Forensics and Security, 11(5), 877-892.
Such, F. P., Madhavan, V., Conti, E., Lehman, J., Stanley,
K. O., & Clune, J. (2017). Deep neuroevolution: Ge-
netic algorithms are a competitive alternative for train-
ing deep neural networks for reinforcement learning.
arXiv preprint arXiv:1712.06567.
Szegedy, C., Toshev, A., & Erhan, D. (2013). Deep neural
networks for object detection. In Advances in neural
information processing systems (pp. 2553-2561).
Velásquez, I., Caro, A., & Rodríguez, A. (2018). Authentication
schemes and methods: A systematic literature
review. Information and Software Technology, 94, 30-37.
Zhang, F., Kondoro, A., & Muftic, S. (2012, June).
Location-based authentication and authorization using
smart phones. In 2012 IEEE 11th International Con-
ference on Trust, Security and Privacy in Computing
and Communications (pp. 1285-1292). IEEE.
Zviran, M., & Erlich, Z. (2006). Identification and authenti-
cation: technology and implementation issues. Com-
munications of the Association for Information Sys-
tems, 17(1), 4.