Enhancing Customer Purchasing Behaviour Prediction in

E-Commerce: A Deep Learning Perspective

Rekulara Sharath, Anishetty Vineeth Kumar, Bokkena Sangameshwar and Bidyutlata Sahoo

Department of CSE(AI&ML), Institute of Aeronautical Engineering, Dundigal, Hyderabad, Telangana, India

Keywords: Deep Learning, Machine Learning, Data Preprocessing, Feature Engineering.

Abstract: Digital retailers are experiencing a growing volume of online transactions with consumers, which is driven

by consumers' ability to buy products through E commerce platforms. Such interactions tend to form complex

behavioural constructs that are extractable to assist companies in comprehending consumer requirements. One

of the most important applications is the correct determination of the behavior of consumers in the e commerce

domain. For selling any sort of product over the Internet or in the other words in order to achieve high profit

in an e-commerce business, the interplay between a customer and a merchandise is quite very critical.

Moreover, a lot of e commerce websites and services proliferate and competition has become just a mouse-

click away. Therefore the need to stay in the business, and enhance profitability measures purchases in a more

advanced way predicting desirability and allowing companies to customize services for customers based on

their indees. To help forecast behavioral patterns the research will incorporate foam Developing Learning

approaches. Also, narrative data from the dataset would be drawn through exploratory data analysis (EDA).

The dataset used in this research encompasses of different attributes, such as kind of visitors, that is whether

they made a purchase or not and many other variables. In this research, Deep Learning techniques aptly suited

for Multi-Level Data due to its robust capacity of modeling and accurate categorization are employed. In

addition to the insights gained from each particular set of data within EDA, the results from the behavioural

analysis prediction using any of the deep learning methods can add useful statistics to the e commerce

companies. Understanding user behaviour Smart usability design engagement, site design optimization,

personalisation and improvements in user experience..

1 INTRODUCTION

In the dynamic realm of digital commerce, the rapid

shift toward online platforms has significantly

reshaped consumer behavior, necessitating that e-

commerce businesses stay ahead by accurately

understanding and predicting purchasing patterns.

(Sarkar, Mia, et al. , 2023) This evolution has

emocratized shopping and created an environment

where every user interaction generates valuable data

that can reveal deep insights into consumer

preferences and trends. This paper presents a

comprehensive study that leverages advanced data

analytics, particularly focusing on Logistic

Regression, Neural Networks, and XGBoost, to

predict the accuracy of reorder behaviors in e

commerce. The foundation of this research is built on

meticulous Exploratory Data Analysis (EDA) of a

diverse dataset that includes customer demographics,

browsing history, purchase frequencies, and past

interactions with promotional offers. By uncovering

hidden patterns through EDA, we aim to enhance our

understanding of the factors influencing consumer

decisions in online shopping. Notably, we achieved

an accuracy of 90.13% using Artificial Neural

Networks, demonstrating the effectiveness of this

technique in predicting reorder behaviors.

Deep Learning and machine learning

techniques, such as Neural Networks and XGBoost,

are pivotal in this study, enabling us to process large

volumes of data and extract complex patterns that

traditional statistical methods might overlook. By

integrating these techniques, the study seeks to

develop robust predictive models capable of

forecasting key consumer behaviors, such as the

likelihood of reordering and product preferences. The

implications of these predictive models are profound,

offering e-commerce businesses a competitive edge

through data-driven decision-making. (Kumar,

Margala, et al. , 2023) Accurate behavioral

Sharath, R., Kumar, A. V., Sangameshwar, B. and Sahoo, B.

Enhancing Customer Purchasing Behaviour Prediction in E- Commerce: A Deep Learning Perspective.

DOI: 10.5220/0013599500004664

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 3rd International Conference on Futuristic Technology (INCOFT 2025) - Volume 2, pages 649-655

ISBN: 978-989-758-763-4

649

predictions facilitate the creation of personalized

marketing campaigns, optimized user experiences,

and targeted strategies that resonate with specific

customer segments. The ultimate goal of this research

is to equip companies with the knowledge and skills

necessary to successfully negotiate the complex

terrain of consumer behavior, promote innovation in

customer service, and maintain development in the

fiercely competitive e-commerce sector. There are six

sections in this study. The arrangement is as follows:

Section 2 examines pertinent scholarly works. A

thorough explanation of the research technique is

given in Section 3. Section 4 discusses the results

analysis and accuracy validation using different

evaluation criteria. The paper's conclusion, found in

Section 5, reviews the goals in light of the data and

considers potential future approaches for the study of

consumer purchasing behavior. Lastly, Section 6

contains a list of references.

2 LITERATURE SURVEY

The literature review reveals a range of approaches

employed in predicting customer purchasing

behavior, each utilizing various supervised

classification techniques. These include Logistic

Regression, Decision Trees, K-Nearest Neighbors,

Naïve Bayes, Support Vector Machines (SVM),

Random Forests, and Stochastic Gradient Descent,

which, after thorough feature optimization, achieved

a remarkable accuracy rate of 88%. (Sarkar, Mia, et

al. , 2023) To reduce dataset complexity and boost

classifier performance, K-means clustering was

introduced, which enhanced data homogeneity. This,

combined with algorithms like C4.5, J48, CS-MC4,

and Multinomial Logistic Regression, delivered

superior prediction outcomes.

(Kumar, Margala, et al. , 2023) Moreover, a

dynamic pricing strategy was developed to categorize

customers based on behavioral patterns and optimize

pricing in real-time, guided by historical data. This

strategy aimed to maximize revenue while ensuring

customer satisfaction. (Chaubey, Gavhane, et al. ,

2014)SVM models also played a significant role in

analyzing inventory and sales data, uncovering a

critical insight that age is a major factor influencing

online purchasing decisions. This discovery is crucial

for developing more targeted e-commerce marketing

strategies. (Vankhede, Kumar, et al. ,

2024)Additionally, the Customer Behavior Mining

Framework integrated K-means clustering and

decision trees to anticipate customer actions,

demonstrating moderate accuracy. However, the

study suggests that the framework's performance

could be enhanced by leveraging more sophisticated

algorithms. Lastly, web usage mining was used to

gather valuable insights by analyzing client, server,

and agent logs. This approach offered a holistic view

of customer behavior, examining both technical and

interaction aspects within e-commerce platforms to

provide deeper understanding and actionable insights.

3 DESIGN AND PRINCIPLE OF

OPERATION

3.1 METHODOLOGY

The dataset utilized for this study is the Instacart

Market Basket Analysis dataset, sourced from the

public dataset repository, Kaggle. (Valecha, Varma,

et al. , 2018) This dataset contains a comprehensive

record of over 3 million grocery orders made by more

than 200,000 users across multiple retailers. It

includes detailed information on customer purchasing

behavior, such as the products ordered, the sequence

of orders, product categories, and reorder patterns.

The procedure is carried out to perform the analysis

effectively to gain the necessary insights for in E-

commerce. The whole methodology is divided into

the following steps as shown in Fig. 1:

• Data Collection

• Data Preprocessing

• Feature Engineering

• Model Selection

• Model Design

• Training

• Evaluation

• Hyper Parameter Tuning

• Re-Evaluation

Figure 1: Methodology.

INCOFT 2025 - International Conference on Futuristic Technology

650

3.1.1 Data Collection

The dataset utilized for this study is the Instacart

Market Basket Analysis dataset, sourced from the

public dataset repository, Kaggle. This dataset

contains a comprehensive record of over 3 million

grocery orders made by more than 200,000 users

across multiple retailers. It includes detailed

information on customer purchasing behavior, such as

the products ordered, the sequence of orders, product

categories, and reorder patterns. The procedure is

carried out to perform the analysis effectively to gain

the necessary insights for in E-commerce.

3.1.2 Data Preprocessing

Data preprocessing involved several critical steps to

prepare the data for predictive modeling. Initially,

missing values were identified and handled by either

removing incomplete entries or imputing values,

depending on the significance of the feature.

Categorical variables such as product names and

aisles were transformed using label encoding or one-

hot encoding to convert them into numerical formats.

3.1.3 Feature Engineering

A primary focus was on feature engineering, where

new variables were added to improve model accuracy,

including reorder frequency, duration between orders,

and product affinity ratings. These actions were

essential for enhancing the dataset's quality and

maximizing machine learning algorithms'

performance.

3.1.4 Model Selection

Complex patterns and temporal relationships are

captured by Deep Learning models like Artificial

Neural Networks (ANNs) and classic Machine

Learning methods like Logistic Regression, XGboost.

3.1.5 Model Design

The model design phase focuses on architecture,

using layers and activation functions suited to the

dataset. A specific model architecture, using a four-

layer structure with a ReLU activation function,

yielded superior accuracy.

3.1.6 Training

The dataset was separated into training and test sets in

order to evaluate the model's performance. Some of

the algorithms that were trained using features

generated from product data, client orders, and

reorder trends are XGboost, Artificial Neural

Networks, and logistic regression.

3.1.7 Evaluation

Performance is evaluated using criteria like as

accuracy, precision, recall, and F1-score. The efficacy

of the model is further validated by comparative

analysis. Businesses gain from the system by using it

to make better decisions, target customers more

effectively, and maximize marketing campaigns.

3.1.8 Hyper parameter Tuning

The act of adjusting a machine learning model's pre-

training parameters—which are not determined by the

data—is known as hyperparameter tuning. The

performance of the model is greatly influenced by

these variables, also known as hyperparameters.

(Baderiya, Chawan, et al. , 2018) Two common

tuning methods are Random Search and Grid Search.

While Random Search selects random combinations

of hyperparameters, Grid Search tests a preset set of

values for each hyperparameter in detail.

3.1.9 Evaluation

The results are then re-evaluated after tuning the

hyper parameters till the expected outcomes are

obtained.

4 RESULTS AND ANALYSIS

We used XGBoost, Logistic Regression, and

Artificial Neural Networks (ANN) to assess our

system's performance. To ensure data quality and

relevance to our analysis, these models were trained

on a carefully preprocessed dataset that underwent

intensive cleaning, normalization, and feature

engineering. To accomplish the goals, exploratory

data analysis is then carried out. The trained models

are assessed using metrics including recall, F1-score,

accuracy, precision, and ROC curves.

Accuracy: This is the proportion of accurately

predicted instances to all instances in a dataset. It is a

typical metric for assessing models of categorization.

The calculation of accuracy is as:

Accuracy =

Number of Correct Predictions

Total number of Predictions



∗100

Enhancing Customer Purchasing Behaviour Prediction in E- Commerce: A Deep Learning Perspective

651

A model is considered to have a high accuracy

score when its predictions closely match the actual

outcomes. (Sabbeh, 2018) However, it's important to

keep in mind that accuracy may not always be a

reliable indicator for unbalanced datasets; in these

Precision: This calculates the percentage of

accurate positive predictions among all the model’s

positive predictions

Precision =

𝑇𝑃

𝑇𝑃+𝐹𝑃

F1-Score: The F1-score, which is the harmonic

mean of recall and precision, is a single measure that

strikes a balance between the two.

𝐹1− score =2∗

Precision ∗ Recall

Precision + Recall



Recall: Recall, or sensitivity, is the proportion of

actual positive examples that the model correctly

predicted situations, other metrics like precision,

recall, or F1-score may provide greater insight into

the model's performance

Recall =

𝑇𝑃

𝑇𝑃 +𝐹𝑁

4.1 Logistic Regression Classifier

Logistic regression is a widely used statistical

technique for binary classification that predicts the

likelihood of an outcome based on one or more

predictor variables. (Saroja, Kannan, et al. , 2018) It

makes use of the logistic function, which converts a

linear feature combination into a likelihood score with

a range of 0 to 1.

Through the establishment of a threshold of 0.5,

the model divides the result into two categories. The

accuracy evaluation yielded a value of 78%. Class 0.0

is more accurately predicted by the model than class

1.0, as seen by its higher precision and recall. Since

class 1.0 has a lower recall, it is possible that there is

a problem with the model's performance for this class

or with class imbalance since the model is having a

trouble identifying instance of a class.

Table 1 Logistic Regression Classification report.

Class Precision Recall F1-score

0 0.80 0.96 0.87

1 0.66 0.26 0.37

Accurac

0.78

4.2 XGBoost Results

The gradient boosting framework for classification

and regression tasks is improved by the potent and

effective machine learning algorithm known as

XGBoost (Extreme Gradient Boosting). It works by

creating a group of decision trees, with each new tree

trying to fix the mistakes produced by the ones before

it. (Yunshengi, Qianqian, et al. , 2018). Two

important aspects of XGBoost are its parallel

processing capability, which speeds up model

training, and its regularization capabilities, which aid

in preventing overfitting.

To maximize performance, the method makes use

of sophisticated strategies like tree trimming,

handling missing values, and integrated cross-

validation. Large datasets and complicated issues are

particularly well suited for XGBoost, which is why it

is a popular choice for both real-world applications

and data science contests.

Because of its adaptability, practitioners can

achieve strong model performance and excellent

predicted accuracy by customizing it through

hyperparameter tuning. When evaluation the

accuracy, the obtained result is 74%. In comparison

to class 0.0, class 1.0 (the minority class) has a lower

F1 score of 0.37, indicating that the model performed

less well in class 1.0 prediction.

A lower score indicates difficulties with either

precision, recall, or both for class 1.0. Recall and

precision are combined into one measure, the F1

score. Based on its AUC score of 0.83, the model

seems to have a reasonable ability to differentiate

between the classes. While the general average F1

score of 0.60 indicates the overall performance across

both classes, the weighted average F1 score of 0.79

accounts for the distribution of course.

Table.2 XG Boost Classification report.

Class Precision Recall F1-score

0 0.97 0.74 0,84

1 0.24 0.77 0.37

Accuraca

0.74

The confusion matrix for the XGBoost model is as follows:

INCOFT 2025 - International Conference on Futuristic Technology

652

Figure 2: XGBoost Classification matrix.

Figure 3: XGBoost ROC Curve.

4.3 Artificial Neural Network ( ANN )

An Artificial neural network (ANN) classifier is a

computational model that resonates the human brain

and is designed to perform a range of classification

tasks. ANNs are highly helpful for classification

applications since they can learn complex and non-

linear correlations in data. (Sharma, Vidyalakshmi, et

al. , 2014) The network employs optimization

techniques like gradient descent during the learning

phase to minimize the discrepancy between the

expected and actual results.

This method, called backpropagation, is applied to

change wights. Backpropagation is the technique by

which artificial neural networks (ANNs) train by

modifying the weights of connections based on the

prediction error. This high level of accuracy reflects

the model’s strong overall performance. With an F1

Score of 0.89, the model appears to be able to handle

the majority class well, as evidenced by the balanced

precision and recall over the whole dataset.

The model performs well in class discrimination,

demonstrating a strong capacity to differentiate

between the two classes with an AUC-ROC score of

0.81. Macro Average F1 Score: At 0.66, this metric

provides an average performance measure across both

classes without considering class imbalance.

Weighted Average F1 Score: At 0.89, this metric

accounts for class distribution, reflecting better

performance when considering the number of

instances in each class. In our comparative analysis,

we benchmarked the performance of ANN model

against other machine learning techniques commonly

used in predictive analytics.

The following pictures shows the classification

matrix and the ROC curves for the ANN model:

Figure 4: Training and Validation Accuracy graph.

Figure 5: Classification matrix for ANN.

Enhancing Customer Purchasing Behaviour Prediction in E- Commerce: A Deep Learning Perspective

653

Figure 6 : ANN ROC Curve.

4.4 Comparison Table

The table below compares the F1-scores, recall,

accuracy, and precision of the models Logistic

Regression, XGBoost, and ANN

Table .3 Comparison Table.

Metrics/Model Logistic

ression

XgBoost ANN

Accuracy 0.78 0.74 0.9013

Precision 0.70 0.61 0.89

Recall 0.55 0.70 0.78

F1-Score 0.47 0.47 0.76

The four main metrics—accuracy, precision,

recall, and F1- score—that are used to measure the

efficacy of Logistic Regression, XGBoost, and

Artificial Neural Networks (ANN) are compared in

the table. ANN outperforms both XGBoost (74%) and

Logistic Regression (78%) in terms of overall

performance, with the maximum accuracy of 90.13%.

Comparison graph between the accuracies of the

models Logistic Regression, XGBoost, ANN is

shown as below:

Figure 7 : Model comparison.

When comparing the models based on accuracy,

the Artificial Neural Network (ANN) classifier

demonstrates the highest performance, achieving an

accuracy of 90.13%, significantly outperforming both

Logistic Regression and XGBoost. Logistic

Regression, with an accuracy of 78%, performs better

than XGBoost, which scores 74%, but both fall short

of the ANN's capability. The substantial difference in

accuracy between ANN and the other models

suggests that ANN is more effective at capturing

complex patterns in the data, making it the most

suitable model for tasks requiring high predictive

performance.

Figure 8: Accuracy comparision with other researchers.

The Fig.8 shows the improvement in model

accuracy over time. In 2024, ANN achieved the

highest accuracy at 90.13%, followed by ensemble

stack algorithms in 2023 with 88%. In 2022, a

combination of classifiers like C4.5 and MLR reached

87%. Earlier models, such as MLPNN and Naive

Bayes in 2018, achieved 85%, while the Decision

Tree in 2015 had the lowest accuracy at 72%. The

trend highlights significant advancements in

predictive accuracy, with ANN and ensemble

methods at the forefront.

5 CONCLUSION

With an accuracy of 90.13%, our study shows how

powerful advanced deep learning techniques—

specifically, Artificial Neural Networks (ANNs) and

Recurrent Neural Networks (RNNs)—can be in

forecasting consumer purchase behavior in e-

commerce. This demonstrates how the models can

identify intricate patterns in data that traditional

approaches frequently fail to pick up on. Thorough

data pretreatment, hyperparameter tweaking, and

cross-validation were essential to this

accomplishment since they guaranteed clean, well

INCOFT 2025 - International Conference on Futuristic Technology

654

structured data and optimized model performance.

The models provide businesses with insightful

information that helps them make data-driven

decisions and customize marketing to increase sales

and improve consumer engagement. Subsequent

investigations could delve into sophisticated

structures such as Transformer models to enhance

performance, tackle scalability issues for real-time

implementation, and integrate ensemble learning

techniques. Predictive accuracy could be further

increased by incorporating additional data sources

and improving feature engineering.

REFERENCES

Sarkar M, Ayon E. H, Mia M.T, Ray R.K, Chowdhury M.S,

Ghosh B.P, Al Imran , M. Islam, M.T Tayaba and Puja

A.R, “Optimizing E-Commerce Profits: A

Comprehensive Machine Learning Framework for

Dynamic Pricing and Predicting Online Purchases,”

Journal of Computer Science and Technology Studies,

pp. 186-193, 2023.

Suresh Kumar S, Margala M and Siva Shankar S, “A Novel

Weight Optimized LSTM for dynamic pricing solutions

in e-commerce platforms based on customer buying

behaviour”.

Chaubey G , Gavhane P.R and Bisen D, “Customer

Purchasing Behavior prediction using machine learning

classification techniques,” J Ambient Intel Human

Comput , pp. 16133-16157, 2014.

P. Vankhede and S. Kumar, “Predictive Analytics for

Website User Behavior Analysis,” in IEEE

International Students Conference on Electrical,

Electronics and Computer Science (SCEECS), Bhopal,

India, 2024.

Harsh Valecha, Aparna Varma and Ishita Khare,

“Prediction of Consumer Behaviour Using Random

Forest Algorithm,” in 5th IEEE Uttar Pradesh Section

International Conference on Electrical, Electronics and

Computer Engineering (UPCON), 2018.

Mr. Shrey Harsh Baderiya and Prof. Pramila M. Chawan,

“Customer Buying Prediction Using Machine Learning

Techniques: A Survey,” International Research Journal

of Engineering and Technology (IRJET), vol. 05, no.

10, 2018.

Sahar F. Sabbeh, “Machine Learning Techniques for

Customer Retention: A Comparative Study,”

International Journal of Advanced Computer Science

and Applications, vol. 9, 2018.

M.N. Saroja, S. Kannan and K.R. Baskaran, “Analysing the

Purchase Behavior of a Customer for Improving the

Sales of a Product,” International Journal of Recent

Technology and Engineering (IJRTE), vol. 7, no. 4S,

pp. 2277-3878, 2018.

Ge Yunshengi, Zhang Qianqian and Kong Jie, “Research on

the Prediction of User behaviour based on neural

network,” in ICIIP, 2018.

Pooja Sharma, Vidyalakshmi Nair and Amalendu Jyotishi,

“Patterns of Online Grocery Shopping in India: An

Empirical Study,” in CONIAAC, Amritapuri India,

2014.

Enhancing Customer Purchasing Behaviour Prediction in E- Commerce: A Deep Learning Perspective

655