Predicting Customer Behavioural Patterns using a Virtual Credit Card
Transactions Dataset
Ayrton Senna Azzopardi and Joel Azzopardi
Department of Artificial Intelligence, Faculty of ICT, University of Malta, Msida, MSD2080, Malta
Keywords:
Data Mining, Financial Transactions, Customer Churn Prediction.
Abstract:
Nowadays, many businesses are resorting to data mining techniques on their data, to save costs and time, as
well as to understand customers’ needs. Analysing such data can lead to higher profits and higher customer
satisfaction. This paper presents a data mining study that is applied on millions of transactional records col-
lected for a number of years, by a leading virtual credit card company based in Malta. In this study, 2 machine
learning techniques, namely Artificial Neural Networks (ANN) and Gradient Boosting (GBM), are analysed
to identify the best modelling framework that predicts the churning behaviour of this company’s customers.
Apart from helping the marketing department of this firm by providing a model that predicts churning cus-
tomers, we contribute to literature by identifying the minimum amount of customer activity needed to predict
churn. In addition, we also analyse the “cold start” problem by performing a time-series experiment based on
the little data available at the beginning of the customer purchase history.
1 INTRODUCTION
With the advancements in web technologies, online shopping as well as online gambling have rapidly increased in popularity in the past years. In fact, worldwide retail e-commerce sales are showing logarithmic growth and are forecast to exceed 5.4 trillion USD ($) in 2022 and 6.3 trillion USD in 2024¹, whilst the online gambling market is forecast to exceed 92.9 billion USD in 2023². The ability to make a financial transaction instantly and from across the world is a core reason why e-commerce and gambling websites grew at such a rapid pace. However, making online transactions exposes users’ financial data. This has been a major concern to some users, and so they are increasingly resorting to methods that help protect their financial information.
One such popular approach is through virtual
credit cards. Funds are initially deposited into an ac-
count linked with such a virtual credit card, and then
the actual virtual credit card is used when performing
transactions online. In this way the actual bank ac-
count details of a user are not exposed when making
a transaction on the web. Moreover, the actual bank will not have any knowledge whatsoever of how clients are spending their money. This is extremely beneficial to online gamers, as certain banks reject credit card payments attempted on gambling websites.
¹ May 2022: https://www.trade.gov/ecommerce-sales-size-forecast
² May 2022: https://www.statista.com/statistics/270728/market-volume-of-online-gaming-worldwide/
The increase in popularity of e-commerce and
gambling websites is leading to millions of users us-
ing virtual credit card services. Consequently, virtual
credit card companies are dealing with huge amounts
of incoming data in the form of new users, creation
of credit cards, deposit of funds and mostly online
transactions. The customer data has the potential to
be mined in order to extract meaningful knowledge
about the different users and any trends that might
lead to increased profits and customer satisfaction.
For this reason, we present a data mining study
that is applied on millions of transactional records that
were collected for a number of years, by a leading vir-
tual credit card company based in Malta. We utilise
machine learning techniques to predict the churning
behaviour of this company’s customers. We also anal-
yse and identify the minimum amount of customer
activity needed to predict churn. In addition, we at-
tempt to handle the “cold start” problem by perform-
ing a time-series experiment based on the minimal
data available at the beginning of a customer’s pur-
chase history.
Azzopardi, A. and Azzopardi, J.
Predicting Customer Behavioural Patterns using a Virtual Credit Card Transactions Dataset.
DOI: 10.5220/0011342300003280
In Proceedings of the 19th International Conference on Smart Business Technologies (ICSBT 2022), pages 160-167
ISBN: 978-989-758-587-6; ISSN: 2184-772X
Copyright © 2022 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
1.1 Aims and Objectives
The aim of this research is to investigate machine
learning techniques and develop a modelling frame-
work that is capable of predicting whether a customer
is churning based on one’s virtual credit card trans-
actions. The aforementioned aim will be fulfilled
through the following objectives:
• Extract a number of dynamic features from the provided raw financial transactions, so as to find the most effective feature set when predicting customer churn.
• Build a machine learning setup that is effective in modelling and predicting customer churn using the extracted features.
• Determine the minimum amount of customer activity needed in order to effectively predict whether a customer is churning or not.
• Determine whether demographic features, together with any initial financial transactions, can be used to predict whether a relatively new customer will continue to use the company’s services or else churn.
1.2 Contribution to Research
Although quite a few researchers have studied customer churn in the financial domain, the concept of predicting churn using dynamic features solely extracted from raw financial transactions, as presented in this paper, is still quite a novel approach. Traditional customer churn prediction solutions focus solely on domain-specific and static features. Following recent research that has shown
the effectiveness of dynamic behavioural predictors,
we extract meaningful dynamic features from a cus-
tomer’s virtual credit card usage history, and validate
their usefulness with regards to customer churn pre-
diction.
Our second objective is to construct a machine learning setup that is able to model the churn behaviour of customers. Section 2 provides a review of a number of different machine learning techniques applied to this problem. Unfortunately, one can note that the framework reported with the best results is specific to the dataset used in that particular experiment. This does not entail that the same framework maintains the same performance when applied on different datasets. For this reason, our second objective evaluates and finds the best performing technique on the dataset that we utilise, which records virtual credit card transactions.
Our third objective attempts to identify the opti-
mal observation window size. From all the reviewed
literature, only Leung and Chung (Leung and Chung,
2020) attempt to test different observation window
sizes. However, this experiment was solely an evalu-
ation of just 2 observation periods differing in length
(4 months vs 6 months). We fill this research gap by
performing a more extensive time-series experiment
so as to determine the minimum amount of customer
spending history needed.
Finally, in our fourth objective we tackle the “cold
start” problem. This objective is mainly motivated
by the fact that since organisations would barely have
any observed data for newly registered customers, the
latter are generally excluded from further analysis. In
this research, we attempt to fill this gap by applying
another time-series experiment using customer demo-
graphics – the only data available at that specific time.
2 LITERATURE REVIEW
The financial and banking industry has been evolving
quite substantially in recent years. Consequently, ex-
isting companies in this industry are facing extreme
competition, not only by direct competitors but also
by new entrants and start-ups that can be disruptive by
providing innovative financial solutions (Shirazi and
Mohammadi, 2019). Different industries are focusing
more on managing the churn behaviour of customers,
rather than investing in strategies in an attempt to ac-
quire new customers given the fact that customer re-
tention is far less costly than acquiring new customers
(Kaya et al., 2018; Bilal Zorić, 2016; Kim et al., 2005;
Rosa, 2019; Shirazi and Mohammadi, 2019; Safine-
jad et al., 2018; Leung and Chung, 2020; Szmydt,
2018; Keramati et al., 2016; Farquad et al., 2014).
In this section, we will go through some of the
work found in literature with regards to predicting
the churn behaviour of customers in the financial sec-
tor, giving an overview of the decisions taken by aca-
demics during their research when tackling the extrac-
tion of churn related features, the modelling of their
machine learning approach and finally the choice of
the observation window size.
Kim et al. (Kim et al., 2005) are amongst the
first academics that started researching this problem.
They apply Support Vector Machines (SVM) to the
analysis of customer churn behaviour and evaluate
its effectiveness through the use of demographic and
credit card usage information. They only utilise a 3-
month period when observing the credit card usage
of customers as they claim that such period is ade-
quate in understanding the behaviour of a customer.
Their model is compared against a three-layer Back-Propagation Neural Network (BPN), where SVM obtained better prediction results. In addition, Kim et al.
(Kim et al., 2005) state that the process of parameter
tuning is a vital step since different parameter values
drastically change the prediction performance.
Similarly, Farquad et al. (Farquad et al., 2014)
also apply SVM to predict customer churn. However,
they go a step further by constructing a hybrid model,
where apart from predicting whether a customer is
churning or not, the model is capable of extracting in-
formative rules on the customers. Their approach can
be viewed in 3 phases. Initially, the number of fea-
tures used when predicting churn is reduced through
a recursive feature elimination process. Thereafter, the support vectors computed after training the SVM with the reduced feature set are extracted. These support vectors, together with the predicted values, are used to construct a new dataset to be utilised in the final phase. Finally, a Naive-Bayes Tree is implemented to generate meaningful rules giving
more insights about the churn behaviour of customers.
One can find a number of works in literature that
follow a similar approach to the one applied in (Far-
quad et al., 2014), where researchers aim to generate
a number of informative rules when analysing cus-
tomer churn. The generated rules tend to group cus-
tomers into different segments according to common
behaviour. For instance, Keramati et al. (Keramati
et al., 2016) aim to outline common characteristics
of churning customers. To identify any hidden be-
havioural patterns, a Decision Tree (DT) is employed
since, by nature, DTs generate clear and significant “if-then” rules, allowing the authors in (Keramati et al.,
2016) to fulfil their aim. Similarly, Cil et al. (Cil
et al., 2018) also utilise DT in their quest to discover
meaningful knowledge from their dataset. Utilising a
dataset consisting of socio-demographic information
together with nearly 4,000,000 investment fund trans-
actions of around 65,525 bank customers they analyse
up to 6 months prior to the closure or inactivity of a
customer’s account from the customer’s entire invest-
ment fund transaction history. Subsequently, they per-
form 2 types of analysis. Initially, they model a DT on
the computed investment fund transaction order data
so as to determine the transactional patterns of cus-
tomers that closed off their account with the bank.
The learnt patterns in the form of DT rules are then
utilised on future customers to predict those that po-
tentially are willing to churn.
On the other hand, other studies do not focus
on acquiring such rules to characterise churning cus-
tomers, but rather focus on predicting whether a cus-
tomer is churning or not as efficiently as possible. A
popular approach in this case is the standard ANN.
Bilal Zorić (Bilal Zorić, 2016) proposes a neural network
based framework that is capable of predicting the like-
lihood of churn for customers of a small Croatian
bank. The dataset used in this study contains merely
socio-demographic information and levels of service
usage. Similarly, (Safinejad et al., 2018) employ a
non-linear ANN to predict future churn rate of finan-
cial customers. They make use of a dataset that con-
tains raw financial transactions of more than 4,500
customers recorded between 2009 and 2011. The 3-
year observation window is divided into seasonal in-
tervals and for each interval (12 in total), Recency (R),
Frequency (F), Monetary (M) and Length (L) vari-
ables are calculated as features. Subsequently, they
utilise a “fuzzy dynamic model” that can be split into
3 phases. Firstly, a weighted-RFML model is utilised
to cluster customers and identify the segment rep-
resenting the most valuable customers. Secondly, a
fuzzy rule-based model that takes as inputs the L, F
and M variables and outputs a 3-mode (low, medium,
high) churn rate value, is developed. Thirdly, the pre-
diction of future churn rate is modelled using 2 mod-
els - ARIMA as a linear machine learning model and
ANN as a non-linear model. The authors in (Safine-
jad et al., 2018) conclude that the ANN model outper-
formed the other while claiming to have identified a
suitable model for customer churn prediction together
with an appropriate definition of churn in the finance
sector.
Most of the customer churn prediction systems reviewed above focus solely on domain-specific and static features. The features extracted and used in such traditional attempts generally represent product or account type ownership, service usage aggregation and socio-demographic information. Dynamic behavioural patterns in a customer’s financial transactions are rarely considered.
In fact, Kaya et al. (Kaya et al., 2018) try to fill
this research niche by exploring the spatio-temporal
patterns and choice behaviour of customers and de-
termine whether such behaviour relates to the cus-
tomer churn event. They extract novel features based
on the spatio-temporal and choice patterns of cus-
tomers. Spatio-temporal features include diversity,
loyalty and regularity that measure how varied or con-
stant customers are within their purchase behaviour
with regards to time and location perspective. On
the other hand, the financial choice patterns outline
how customers disperse their spending with regards
to merchants, purchase categories and locations of
merchants. (Kaya et al., 2018) claim that churn ac-
tivities can be effectively predicted using dynamic
behavioural patterns and furthermore, using domain-
independent variables specifically those that are based
on the spatio-temporal patterns in human activities.
Leung and Chung (Leung and Chung, 2020)
also propose a dynamic classification framework that
utilises actual customer behavioural patterns. For ev-
ery customer, the utilised dataset contains static pre-
dictors, such as the traditional demographic and prod-
uct ownership variables, and also account activity pre-
dictors including aggregation of financial transactions
and service usage. A trend factor is computed for
each account activity predictor, aiming to capture the
trends of account activities within the observation pe-
riod. The authors in (Leung and Chung, 2020) exper-
iment with both the observation window (4 months vs
6 months) and the labelling window (2 months vs 3
months). After computing the trend factors for each
different period, they evaluate 3 different supervised
machine learning models, namely Logistic Regres-
sion (LR), Random Forest (RF) and Gradient Boost-
ing (GBM). The authors conclude that with 6 months
of data, the models obtained better accuracy than with
4 months of data. On the other hand, accuracy de-
creases rapidly as the prediction window is extended.
Furthermore, RF and GBM outperformed the LR pre-
diction model.
Deep learning models are known to be able to cap-
ture high-level representations from the huge amounts
of customer data being created at an increasing rate
due to the increasing amount of financial activities
(Hsu et al., 2019). In fact, Hsu et al. (Hsu et al., 2019)
develop a Recurrent Neural Network (RNN) feature
extractor with GRU. The aim is to better model the
time dependencies found within a customer’s credit
card spending history, as a result of which, a num-
ber of dynamic features are then extracted for cus-
tomer churn prediction. They propose an innovative
approach by combining this strong dynamic feature
extraction from RNN with a RF. This enhanced RNN-
RF model is therefore capable of combining dynamic
and static features allowing for better performance
when predicting credit card customer churn. They
evaluate their innovative approach on a dataset con-
sisting of 30,000 instances with 23 features each, sep-
arated into 5 static socio-demographic features and 18
dynamic features describing the customer’s monthly service usage over a 6-month period. The authors
conclude that the RNN-RF predictive model outper-
formed other benchmark models and also stated that
the model performed better with more training in-
stances. Further analysis on the use of DL models
for this problem is provided in (Jain and Jayabalan,
2022).
3 METHODOLOGY
The objectives discussed in Section 1.1 were achieved by constructing a customer churn predictive
framework that can be applied on raw financial trans-
actions.
Initially, this framework pre-processes the raw fi-
nancial transactions to filter out any missing or redun-
dant data. In the pre-processing stage, the financial
transactions are also segmented into several time periods in preparation for later analysis. Thereafter, numerous features are generated from the financial transaction records so as to extract and represent
any behavioural patterns hidden within the data. Fur-
thermore, customers are also classified and labelled
according to our customer churn definition. Subse-
quently, the extracted features and the generated cus-
tomer churn labels are fed into a machine learning
technique in order to model the churn behaviour of
customers and be able to predict churn activities.
Consequently, after determining the best set of
features and the best performing predictive model, an
experiment is performed where different observation
window sizes are evaluated to determine the mini-
mum amount of customer activity required prior to the
churn event. Finally, a time-series experiment is per-
formed to determine whether demographic informa-
tion combined with any spending information that is
available at the beginning of the customer’s relation-
ship with the company, can be used to predict whether
new customers are willing to continue using the com-
pany’s service.
3.1 The Extracted Features
In this study, we extracted different types of fea-
tures aimed at representing the customer behaviour
required for churn prediction.
In our review of literature, we have seen how the
authors in (Cil et al., 2018; Bilal Zorić, 2016; Kim et al., 2005; Rosa, 2019; Shirazi and Mohammadi, 2019; Leung and Chung, 2020; Hsu et al., 2019; Keramati et al., 2016; Farquad
et al., 2014) all make use of socio-demographic in-
formation within their churn predictive systems. For
this reason, we followed their approach and made use
of the customer demographics found in the provided
dataset as well. Furthermore, we have seen how the
authors in (Kim et al., 2005; Keramati et al., 2016;
Bilal Zorić, 2016; Rosa, 2019; Leung and Chung,
2020) also make use of static predictors representing
the service usage of customers. With this in mind, we
compute a number of statistical features that aggregate different aspects of the purchase history of a customer within a particular observation window. These
are referred to as “global” statistics.
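As an illustration, “global” statistics of this kind could be computed as in the sketch below. The (date, amount) transaction layout and the particular statistics (count, total, mean, maximum and number of active days) are assumptions made for the example; they are not necessarily the exact statistics extracted in this study.

```python
from datetime import date
from statistics import mean

def global_statistics(transactions):
    """Aggregate one customer's purchases over a whole observation window.

    `transactions` is a list of (date, amount) pairs; this layout and the
    chosen statistics are illustrative assumptions.
    """
    amounts = [amount for _, amount in transactions]
    if not amounts:
        # A customer with no purchases in the window gets all-zero features.
        return {"count": 0, "total": 0.0, "mean": 0.0, "max": 0.0, "active_days": 0}
    return {
        "count": len(amounts),
        "total": sum(amounts),
        "mean": mean(amounts),
        "max": max(amounts),
        "active_days": len({day for day, _ in transactions}),
    }

example = [(date(2021, 1, 3), 20.0), (date(2021, 1, 3), 10.0), (date(2021, 2, 14), 60.0)]
stats = global_statistics(example)
```

Returning zeros for an empty window keeps the feature vector the same length for every customer, which most predictive models require.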
Recent research has shown that dynamic behavioural features tend to be more effective in representing customer behaviour for churn prediction (Kaya et al., 2018; Leung and Chung, 2020; Hsu et al., 2019). For this reason, some of the “global” statistics computed on the entire observation window were also computed for each monthly period in the window, creating a new set of macro-average monthly statistics. This feature extraction process is similar to the one performed by Leung and Chung (Leung and Chung, 2020). However, they then compute a trend factor value based on the computed dynamic features, prior to inputting the features into the predictive model. At this stage, we followed the approach taken by Kaya et al. (Kaya et al., 2018) and input the dynamic features directly into the predictive model. In addition, we also computed a feature vector containing the amounts spent by the customer on each day of the observation window. The motivation behind this is to allow the predictive model to train on variables that resemble the raw financial transactions as much as possible, in an attempt not to lose any dynamic behavioural information of customers.
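The daily amounts vector and the per-month aggregation described above can be sketched as follows. This is a minimal illustration, assuming transactions are (date, amount) pairs and treating a month as a block of 30 days; both the data layout and the 30-day simplification are assumptions of the example.

```python
from datetime import date, timedelta

def daily_amount_vector(transactions, window_start, window_days):
    """Return total spend per day of the observation window (0.0 on idle days)."""
    vector = [0.0] * window_days
    for day, amount in transactions:
        offset = (day - window_start).days
        if 0 <= offset < window_days:  # ignore purchases outside the window
            vector[offset] += amount
    return vector

def monthly_totals(daily_vector, days_per_month=30):
    """Sum the daily vector in consecutive blocks, one block per month.

    Treating a month as 30 days is a simplifying assumption for this sketch.
    """
    return [
        sum(daily_vector[i:i + days_per_month])
        for i in range(0, len(daily_vector), days_per_month)
    ]

start = date(2021, 1, 1)
purchases = [(start, 15.0), (start + timedelta(days=1), 5.0),
             (start + timedelta(days=1), 2.5), (start + timedelta(days=45), 40.0)]
daily = daily_amount_vector(purchases, start, window_days=90)
monthly = monthly_totals(daily)
```

Any other per-month statistic (for example a count or a mean) could be derived from the same blocks in the same way.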
Finally, inspired by the innovative idea of having features representing the choice behaviour of customers with regards to merchants, purchase categories and locations of merchants (Kaya et al., 2018), we generated a merchant vector containing the number of purchases made under each Merchant Category Code (MCC).
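A merchant vector of this form is essentially a count per code over a fixed ordering of MCCs. The sketch below shows one way to build it; the three-code vocabulary and the function name are hypothetical and chosen only for the example.

```python
from collections import Counter

def merchant_vector(purchase_mccs, mcc_vocabulary):
    """Count purchases per Merchant Category Code in a fixed code order.

    `mcc_vocabulary` fixes the position of every code so that all customers
    share the same vector layout; codes outside it are ignored.
    """
    counts = Counter(purchase_mccs)
    return [counts.get(code, 0) for code in mcc_vocabulary]

# Hypothetical vocabulary; a real one would list every MCC seen in the data.
vocabulary = ["5411", "5812", "7995"]
vector = merchant_vector(["7995", "5411", "7995", "4111"], vocabulary)
```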
As can be seen, the constructed framework utilises
different types of features including demographic,
static and dynamic predictors so as to predict the
churning customers in the financial industry.
3.2 Machine Learning Techniques
The focus of our framework is not to acquire rules and
determine common characteristics of churning cus-
tomers, but rather to efficiently predict whether a cus-
tomer is churning or not. For this reason, we followed
the approach taken by (Bilal Zorić, 2016; Rosa, 2019;
Safinejad et al., 2018), and employed a Neural Net-
work (ANN) as our predictive model. ANNs are in-
tended to artificially replicate the behaviour of the bi-
ological systems found in the human brain. For this
reason, similar to other researchers contributing to the
field of churn prediction, we believed that this non-
linear predictive model could be a suitable contender
in modelling the churn behaviour.
Furthermore, we also employed a Gradient Boosting Model (GBM) as a customer churn predictive model. GBM generates a single predictive model as an ensemble of numerous weak predictive models, each of which is fitted to improve on the predictions of those before it. This inspired us to make use of the technique in our quest to predict customer churn. In addition, the finding by Leung and Chung (Leung and Chung, 2020) that GBM outperformed LR and was on par with RF gave us further motivation to construct such a predictive model.
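To make the ensemble idea concrete, the following is a minimal sketch of boosting under a squared-error loss, where each weak model (a one-split regression stump) is fitted to the residuals left by the ensemble so far. It is a simplified illustration written for this purpose and not the library implementation used in this study; the toy data, function names and hyperparameters are all assumptions.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals (least squares)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def gradient_boost(xs, ys, rounds=20, learning_rate=0.5):
    """Boost stumps under squared loss: each stump fits the current residuals."""
    base = sum(ys) / len(ys)  # initial prediction is the mean target
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict

def mse(predict, xs, ys):
    """Mean squared error of `predict` on the given points."""
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0, 5.2, 5.0]
baseline_error = mse(lambda x: sum(ys) / len(ys), xs, ys)
boosted_error = mse(gradient_boost(xs, ys), xs, ys)
```

On this toy data the boosted ensemble reaches a lower training error than the constant mean prediction, which is the sense in which many weak models combine into a stronger one.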
Both classifiers were implemented using readily available libraries. We decided to implement our ANN using Keras³, which is a deep learning API written in Python, running on top of the machine learning platform TensorFlow. On the other hand, we decided to construct our GBM using XGBoost⁴. XGBoost is an optimised distributed gradient boosting library designed to be highly efficient, flexible and portable.
3.3 Data Observation and Labelling
Most of the studies reviewed in Section 2 did not opt for an observation window exceeding 6 months. In fact, most of the systems we reviewed employ an observation window size of between 3 and 6 months. As a result, we performed an experiment where we train our predictive model on varying observation window sizes, starting from 1 month’s worth of data up to 6 months. The results of this experiment give us an indication of the minimum amount of observed customer activity needed to predict churn.
On the other hand, when we observed the financial transactions in the period following the observation window so as to determine the churn label for customers, we decided to only consider the succeeding month. This is mainly because, according to Leung and Chung (Leung and Chung, 2020), prediction accuracy decreases rapidly as the prediction window increases. Furthermore, companies and marketing departments would find it more beneficial if they can predict what activity is expected in the coming month. In addition, we did not employ any fuzzy logic in our customer churn definition, meaning that a customer can only be labelled as either “Churned” or “Not Churned”. In fact, our customer churn definition is quite straightforward: if a customer has at least 1 transaction in the labelling window then the label is “Not Churned”, else “Churned”.
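The observation and labelling windows described in this section can be expressed compactly in code. The sketch below is illustrative only: the `cutoff` date separating the two windows, the function name and the use of 30-day months are assumptions, while the labelling rule itself follows the definition given above.

```python
from datetime import date, timedelta

def split_and_label(transaction_dates, cutoff, observation_months, days_per_month=30):
    """Split one customer's history around `cutoff` and derive the churn label.

    The observation window covers the `observation_months` before `cutoff`;
    the labelling window covers the month after it. Using 30-day months is an
    assumption of this sketch.
    """
    observation_start = cutoff - timedelta(days=observation_months * days_per_month)
    labelling_end = cutoff + timedelta(days=days_per_month)
    observed = [d for d in transaction_dates if observation_start <= d < cutoff]
    in_labelling = [d for d in transaction_dates if cutoff <= d < labelling_end]
    # At least 1 transaction in the labelling window means "Not Churned".
    label = "Not Churned" if len(in_labelling) >= 1 else "Churned"
    return observed, label

cutoff = date(2021, 4, 1)
history = [date(2021, 1, 10), date(2021, 3, 20), date(2021, 4, 12)]
observed_3m, label_active = split_and_label(history, cutoff, observation_months=3)
observed_1m, label_gone = split_and_label(history[:2], cutoff, observation_months=1)
```

Varying `observation_months` from 1 to 6 while keeping the labelling window fixed at one month reproduces the shape of the window-size experiment.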
³ May 2022: https://github.com/keras-team/keras
⁴ May 2022: https://github.com/dmlc/xgboost/tree/master/python-package
3.4 The Cold Start Problem
In an attempt to address the “cold start” problem, a time-series experiment is performed where we initially try to predict the churning behaviour of newly registered customers using only their demographic information. Then, as new data starts coming in for these customers, we add their daily purchase amounts to the feature set and perform the customer churn prediction once again, measuring the performance each time. This time-series experiment was performed on a weekly basis, meaning that the churn prediction was performed whenever another week of the customer’s purchase history was observed, each time adding a feature vector of length 7 comprising the purchase amounts for every day in that week.
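The way the feature set grows week by week can be sketched as follows. The numeric encoding of the demographics and the helper name are hypothetical; only the structure (demographics first, then 7 daily amounts per observed week) is taken from the description above.

```python
def cold_start_features(demographics, daily_amounts, weeks_observed):
    """Build the feature vector available after `weeks_observed` full weeks.

    Week 0 uses demographics only; every further week appends 7 daily amounts.
    """
    days = 7 * weeks_observed
    if len(daily_amounts) < days:
        raise ValueError("not enough observed days for the requested weeks")
    return list(demographics) + list(daily_amounts[:days])

# Hypothetical encoded demographics (e.g. age, country id, currency id).
demographics = [34, 12, 1]
daily_amounts = [0.0, 25.0, 0.0, 0.0, 10.0, 0.0, 0.0] * 4  # four weeks of spend

feature_sets = [cold_start_features(demographics, daily_amounts, w) for w in range(5)]
lengths = [len(features) for features in feature_sets]
```

A separate model would then be trained and scored on each of the five feature sets, one per number of observed weeks.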
4 EVALUATION
The four research objectives set in Section 1.1, so as
to fulfil the project’s aim, are measured and evaluated,
as discussed in the following subsections.
4.1 Evaluation of Similar Systems
One of the main challenges encountered by research within the financial domain is the sensitive nature of financial data. According to (Martens et al., 2016), studies on data processing and analysis for financial businesses depend highly on close collaborations with the industry (Cil et al., 2018; Safinejad et al., 2018). However, company data, together with the information resulting from the research, rarely gets shared with the scientific community due to its sensitive nature. In light of this issue, evaluation is challenging due to the lack of available gold standard datasets. In fact, none of the related systems reviewed in Section 2 compare their findings against those obtained in other research work; they solely evaluate their own proposed solution on a dedicated test set and discuss the results in terms of various performance metrics.
Apart from the traditional Accuracy score, the performance of most classification models is measured using the Area Under the Receiver Operating Characteristic curve (AUROC) metric. This metric provides an aggregate measure of performance across all possible classification thresholds: a score of 0.5 corresponds to random guessing and a score of 1 to perfect separation of the two classes. It has been used to evaluate the systems described in (Keramati et al., 2016; Rosa, 2019; Kaya et al., 2018; Hsu et al., 2019).
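One way to see why AUROC is threshold-independent is its pairwise interpretation: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes the metric directly from that definition; it is an illustrative implementation with made-up scores, not the evaluation code used in the experiments.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative.

    `labels` are 1 for the positive class ("Churned") and 0 otherwise;
    tied scores count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
score = auroc(labels, scores)  # 7 of the 9 positive/negative pairs are ordered correctly
```

The pairwise loop is quadratic in the number of examples, so for large test sets a rank-based or library implementation would be preferable.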
4.2 Evaluation of the Extracted
Features and the Machine Learning
Techniques Used
In order to determine the most effective features in
capturing customer behavioural patterns and the best
performing modelling technique in customer churn
prediction, a greedy search was employed. We com-
puted all the combinations of the different feature cat-
egories and fed the computed feature sets into our two
predictive models, measuring the prediction performance of each model using the AUROC metric. It is worth mentioning that the observation window for this evaluation experiment was taken to be 3 months.
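Enumerating every combination of the five feature categories from Section 3.1 gives 31 candidate feature sets per model. The sketch below shows such an enumeration; the category names and the `evaluate` callback, which stands in for training a model and returning its AUROC, are hypothetical, and the toy scorer exists only to exercise the search.

```python
from itertools import combinations

# The five feature categories described in Section 3.1 (names are illustrative).
CATEGORIES = ["demographics", "global_stats", "monthly_stats",
              "mcc_vector", "daily_amounts"]

def all_feature_sets(categories):
    """Yield every non-empty combination of feature categories."""
    for size in range(1, len(categories) + 1):
        yield from combinations(categories, size)

def best_feature_set(categories, evaluate):
    """Return the combination with the highest score from `evaluate`.

    `evaluate` stands in for training a model on the chosen categories and
    returning its AUROC; it is a hypothetical callback.
    """
    return max(all_feature_sets(categories), key=evaluate)

feature_sets = list(all_feature_sets(CATEGORIES))

# Toy scorer used only to exercise the search: rewards larger sets without demographics.
toy_score = lambda fs: len(fs) - (2 if "demographics" in fs else 0)
winner = best_feature_set(CATEGORIES, toy_score)
```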
The ANN classification model obtained its highest AUROC score when trained on demographic information, global statistics aggregating the entire 3-month observation window and the vector comprising the daily purchase amounts, obtaining a score of 0.62. With such performance, we can conclude that the implemented ANN model had some ability to distinguish between the 2 classes. On the other hand, the GBM classification model performed at its best when trained on global statistics acquired from the entire observation window, monthly statistics obtained for each month in the window, the vector consisting of the number of purchases made under the different MCCs and finally the vector comprising the daily purchase amounts. In this scenario, the GBM obtained an AUROC score of 0.69, distinguishing the 2 classes considerably better than the implemented ANN model. Furthermore, we noticed that the GBM improved its score by a few percentage points whenever more features were provided.
In addition, for the best performing ANN and the
best performing GBM setup, we computed other per-
formance metrics using traditional values from the
confusion matrix so as to have a better understand-
ing of the models’ prediction performances. In fact,
we computed the Sensitivity, Specificity, False Posi-
tive Rate, False Negative Rate and the Precision met-
ric. The obtained measurements are shown in Tables
1 and 2 for the ANN and GBM respectively.
Table 1: Evaluation results of the ANN predictive model based on the Confusion Matrix.
Metric               Score
Sensitivity          0.4535
Specificity          0.7284
False Positive Rate  0.2716
False Negative Rate  0.5465
Precision            0.6238
Table 2: Evaluation results of the GBM predictive model based on the Confusion Matrix.
Metric               Score
Sensitivity          0.6989
Specificity          0.6865
False Positive Rate  0.3135
False Negative Rate  0.3011
Precision            0.6581
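For reference, the metrics reported in Tables 1 and 2 follow from the four confusion-matrix counts as sketched below, with churners taken as the positive class. The counts passed in are made up for the example and are not those of the experiments.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Derive the reported metrics from confusion-matrix counts.

    Positives are churners; the counts used in the example call are made up.
    """
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": fp / (fp + tn),  # equals 1 - specificity
        "false_negative_rate": fn / (fn + tp),  # equals 1 - sensitivity
        "precision": tp / (tp + fp),
    }

metrics = confusion_metrics(tp=70, fp=30, tn=70, fn=30)
```

The same complementary relationships hold in both tables: Sensitivity and False Negative Rate sum to 1, as do Specificity and False Positive Rate.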
From these metrics, one can conclude that the GBM, with 70% sensitivity, was more capable of predicting positive cases, i.e. “Churners”, whilst the ANN, with 73% specificity, was more capable of identifying negative cases, i.e. “Non Churners”. Furthermore, the GBM was quite consistent, misclassifying both positive and negative cases around 30% of the time. On the other hand, the ANN incorrectly classified positive cases as negative around 55% of the time. With regards to how many predicted positive cases were actually correct, the GBM edged the ANN model with a precision that is around 3.4 percentage points higher (0.6581 vs 0.6238). To conclude this section, it is fair to say that the implemented GBM is more suitable to predict the churn behaviour of customers. In view of this, the constructed GBM model and all the extracted features bar customer demographics are utilised in the remaining experiments.
4.3 Evaluation of the Varying
Observation Window Sizes
In this section, we examine different observation window sizes and check how the prediction performance of the classification model changes in response, so as to determine the minimum amount of customer purchase history required while still predicting churn with comparable performance. The results of this experiment are shown in Table 3.
Table 3: Evaluation results of the different observation window sizes.
Number of Months  AUROC Score
1                 0.6674
2                 0.6883
3                 0.6927
4                 0.6865
5                 0.6801
6                 0.6850
Varying the observation window size from 1 month
up to 6 months did not change the performance of
the predictive model substantially. The 3-month
observation window retained the best AUROC score
(0.6927), and none of the other window sizes
exceeded it. Despite obtaining the lowest AUROC
score (0.6674), the 1-month observation window was
only around 0.025 below the best score. It can be
concluded that decreasing the amount of observed
purchase history is possible without sacrificing
too much predictive performance.
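The comparison above can be reproduced with a few lines of arithmetic.
The snippet below is only an illustrative sketch: the AUROC values are
copied from Table 3, while the variable names are our own and are not
part of the original study.

```python
# AUROC scores copied from Table 3, keyed by observation window in months.
auroc_by_months = {
    1: 0.6674,
    2: 0.6883,
    3: 0.6927,
    4: 0.6865,
    5: 0.6801,
    6: 0.6850,
}

# Window size with the highest AUROC score.
best_months = max(auroc_by_months, key=auroc_by_months.get)
best_score = auroc_by_months[best_months]

# Absolute gap (in AUROC points) between each window and the best one,
# rounded to the four decimal places reported in the table.
gap_to_best = {
    months: round(best_score - score, 4)
    for months, score in auroc_by_months.items()
}

# Window size that loses the most compared with the best one.
worst_months = max(gap_to_best, key=gap_to_best.get)
```

Here `gap_to_best[1]` is the cost, in AUROC points, of shrinking the
observation window from 3 months to 1 month.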
4.4 Evaluation of the Usage of
Demographics and any Initial
Observations for Churn Prediction
on New Users
In this section, we conducted a time-series experiment
in which we examined how effective the classification
model is when predicting churn for new customers.
The results of this experiment are shown in Table 4.
Table 4: Evaluation results of the usage of demographics combined with the initial purchase observations for churn prediction on new users.
Number of Weeks    AUROC Score
0                  0.5026
1                  0.5746
2                  0.6175
3                  0.6524
4                  0.6828
The results show that knowing just the age, country
and currency information of a customer is not enough
to predict whether a newly registered user will
continue using the company’s services or stop and
default. With 0 weeks of purchase data, the AUROC
score is 0.5026, so predicting customer churn with
only demographic data is about as effective as
tossing a coin. It is worth noting that with a few
weeks of observed purchase data, the prediction
performance increased quite rapidly, reaching
0.6828 after 4 weeks, close to the 0.6927 obtained
with a 3-month observation window. Both this
experiment and the one preceding it have shown that
a month’s worth of data is sufficient to predict
whether a customer will default in the next month
with only a small loss in performance. We can
conclude that by observing only the purchase data of
the current month, we can make churn predictions for
the following month.
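To see why an AUROC score close to 0.5 corresponds to a coin toss,
recall that AUROC can be read as the probability that a randomly chosen
churner receives a higher score than a randomly chosen non-churner. The
sketch below is a minimal pure-Python illustration of that definition,
not the evaluation code used in the study; the function name `auroc` and
the toy labels and scores are hypothetical.

```python
def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive is
    scored above a randomly chosen negative (ties count as half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie: no better than guessing
    return wins / (len(positives) * len(negatives))


# Hypothetical toy labels (1 = churner), NOT data from the paper.
labels = [1, 0, 1, 0, 1, 0]

# A model that gives every customer the same score carries no information.
uninformative = auroc(labels, [0.5] * len(labels))

# A model that scores every churner above every non-churner.
perfect = auroc(labels, [0.9, 0.1, 0.8, 0.2, 0.7, 0.3])
```

The pairwise loop is quadratic in the number of customers, so it is
suitable only for illustration; on a dataset of realistic size a
rank-based implementation from an established library would be the
practical choice.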
5 CONCLUSION AND FUTURE
WORK
In this paper, we presented a data mining study that
was applied to millions of financial transactions
collected over a number of years by a leading virtual
credit card company based in Malta.
All 4 objectives specified in Section 1.1 have been
fulfilled. With regard to the first objective
(identifying the most effective feature set), we
extracted different features, namely: demographic
features, “global” statistics that are relative to
the entire observation window, dynamic monthly
statistics for each month in the observation window,
a vector containing daily purchase amounts and
another vector containing the number of purchases
made towards each MCC. The best results were
achieved when utilising all the features except for
demographic features.
We fulfilled the second objective (building a setup
that can model and predict customer churn) by
applying the Artificial Neural Network and the
Gradient Boosting Model to this problem. The GBM
classifier proved to be the best machine learning
framework of this study, obtaining an AUROC score of
0.6927. In addition, we observed that this learning
framework is capable of correctly identifying 70%
of “Churners”, potentially making it a suitable
solution in Customer Relationship Management.
For the third objective (determining the minimum
amount of customer activity needed in order to
predict a customer’s likelihood of churning), the
experiment results show that decreasing the
observation window to a month’s length does not
substantially affect the predictive performance of
the classifier, making it possible to trade off
prediction accuracy against the amount of data
observed.
For our final objective (attempting to handle the
cold-start problem using a customer’s demographic
features, which are available upon registration),
we attempted to predict customer churn using only
demographic information and then, over time,
combined it with any new purchase data. This
experiment showed that, for the current dataset,
customer demographics alone (the customer’s age,
country and currency information) are nowhere near
sufficient to predict whether a newly registered
customer is going to default in the coming month.
The work described in this paper can be further
improved by augmenting the constructed framework
with a tree-based model in order to extract
meaningful behavioural rules. These can be used to
capture the actual characteristics of churning
customers. Furthermore, having addressed the problem
of customer churn prediction, it now makes sense to
tackle the problem of predicting the next purchases
of customers. The approaches used in collaborative
recommendation systems can be adopted and adapted
to our purpose.
REFERENCES
Bilal Zori
´
c, A. (2016). Predicting customer churn in
banking industry using neural networks. Interdisci-
plinary Description of Complex Systems: INDECS,
14(2):116–124.
Cil, F., Cetinyokus, T., and Gokcen, H. (2018). Knowl-
edge discovery on investment fund transaction his-
tories and socio-demographic characteristics for cus-
tomer churn. International Journal of Intelligent Sys-
tems and Applications in Engineering, 6(4):262–270.
Farquad, M. A. H., Ravi, V., and Raju, S. B. (2014). Churn
prediction using comprehensible support vector ma-
chine: An analytical crm application. Applied Soft
Computing, 19:31–40.
Hsu, T.-C., Liou, S.-T., Wang, Y.-P., Huang, Y.-S., et al.
(2019). Enhanced recurrent neural network for com-
bining static and dynamic features for credit card de-
fault prediction. In ICASSP 2019-2019 IEEE Inter-
national Conference on Acoustics, Speech and Signal
Processing (ICASSP), pages 1572–1576. IEEE.
Jain, S. V. and Jayabalan, M. (2022). Applying machine
learning methods for credit card payment default pre-
diction with cost savings. Biomedical and Business
Applications Using Artificial Neural Networks and
Machine Learning, pages 285–305.
Kaya, E., Dong, X., Suhara, Y., Balcisoy, S., Bozkaya, B.,
et al. (2018). Behavioral attributes and financial churn
prediction. EPJ Data Science, 7(1):41.
Keramati, A., Ghaneei, H., and Mirmohammadi, S. M.
(2016). Developing a prediction model for customer
churn from electronic banking services using data
mining. Financial Innovation, 2(1):10.
Kim, S., Shin, K.-s., and Park, K. (2005). An application
of support vector machines for customer churn analy-
sis: Credit card case. In International Conference on
Natural Computation, pages 636–647. Springer.
Leung, H. C. and Chung, W. (2020). A dynamic classifica-
tion approach to churn prediction in banking industry.
Martens, D., Provost, F., Clark, J., and de Fortuny, E. J.
(2016). Mining massive fine-grained behavior data to
improve predictive analytics. MIS quarterly, 40(4).
Rosa, N. B. d. C. (2019). Gauging and foreseeing customer
churn in the banking industry: a neural network ap-
proach. PhD thesis.
Safinejad, F., Noughabi, E. A. Z., and Far, B. H. (2018). A
fuzzy dynamic model for customer churn prediction in
retail banking industry. In Applications of Data Man-
agement and Analysis, pages 85–101. Springer.
Shirazi, F. and Mohammadi, M. (2019). A big data analyt-
ics model for customer churn prediction in the retiree
segment. International Journal of Information Man-
agement, 48:238–253.
Szmydt, M. (2018). Predicting customer churn in electronic
banking. In International Conference on Business In-
formation Systems, pages 687–696. Springer.