extensively explored the influencing factors and
predictions of Shopping Satisfaction. For instance,
Ludin et al. collected data on the factors influencing
customer satisfaction among consumers in the Klang
Valley and analysed the data through descriptive
analysis and regression analysis (Ludin et al., 2014).
Lin et al. used Structural Equation Modeling (SEM) as
the main analysis tool to examine the shopping
experience of users of major shopping websites in
Taiwan, and the results showed that website service
quality has a direct positive impact on customer
satisfaction (Lin et al., 2009). Gim used methods such
as association analysis and multiple regression
analysis to analyze the factors influencing the
shopping satisfaction of Vietnamese consumers (Gim,
2014).
However, only a few studies on Shopping Satisfaction
have been conducted in the previous literature. The
purpose of this paper is therefore to build a
classification model that predicts Shopping
Satisfaction more accurately. This research
contributes to existing work by using exploratory data
analysis (EDA) and machine learning algorithms to
predict customer satisfaction with shopping. By
analyzing a comprehensive data set, this study
identifies the factors that have the greatest impact on
purchase satisfaction and builds a model to predict
it. The conclusions are meaningful for online
platforms, which can use them to improve their
mechanisms and recommendation methods and adopt
different strategies to increase customer purchase
satisfaction, thereby increasing sales and revenue.
2 DATASET
This section covers data preprocessing, feature
engineering, feature selection, and data partitioning.
These steps support the study's exploration of the
factors influencing shopping satisfaction.
2.1 Dataset Preparation
The dataset used in this article is the Amazon
consumer behaviour dataset, a publicly available
dataset from the Kaggle website (Kaggle, 2023). It was
created by Swathi Menon and contains 22 features of
Amazon consumer behaviour. This paper analyzes these
22 features to explore the influencing factors of
shopping satisfaction and establishes a model to
predict shopping satisfaction.
Analysis of this dataset can help companies
identify the factors that most influence purchase
satisfaction and explore the interrelationships
between these factors to understand the underlying
causes. These findings can provide valuable insights
that help companies develop effective sales strategies
to increase consumer satisfaction with their
purchases, thereby increasing sales.
A count of missing values shows that the dataset
contains very few of them: only two, both in
Product_Search_Method. This study therefore simply
drops the entries with missing values.
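The missing-value check and removal described above can be sketched as follows. This is a minimal illustration with pandas, not the study's actual code: the small DataFrame is a hypothetical stand-in for the Kaggle data, and only the column name Product_Search_Method is taken from the text (Shopping_Satisfaction is an assumed name).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the dataset; only Product_Search_Method
# is a column name mentioned in the paper.
df = pd.DataFrame({
    "Product_Search_Method": ["Keyword", None, "Filter", np.nan, "Categories"],
    "Shopping_Satisfaction": [3, 4, 2, 5, 1],  # assumed column name
})

# Count missing values per column.
missing_counts = df.isna().sum()
print(missing_counts)

# Drop every row that contains at least one missing value.
df_clean = df.dropna().reset_index(drop=True)
print(len(df), "rows before,", len(df_clean), "rows after")
```

Dropping rows is reasonable here because only two values are missing; with a larger share of missing data, imputation might be preferable.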
2.2 Feature Engineering
Feature engineering involves enhancing the
performance of predictive models by transforming
the feature space of a dataset (Nargesian et al., 2017).
To make effective use of the variables in the
analysis, this article performs label encoding with
the Python module “sklearn.preprocessing”.
Label encoding is a critical pre-processing step for
categorical data, as it represents categories
numerically, which is necessary for many machine
learning algorithms.
Label encoding was carried out in two steps. First,
the required libraries were imported. Then,
eighteen categorical variables, such as "age" and
"gender", were chosen for analysis. As a result, all
the categorical variables were expressed in numerical
form in this study.
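The two steps above might look like the following sketch. It assumes the LabelEncoder class from sklearn.preprocessing is applied column by column; the text names only the module, so the class choice, the sample values, and the two-column list are illustrative assumptions rather than the study's actual code.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample; "age" and "gender" are the two variables named
# in the text, the category values are made up for illustration.
df = pd.DataFrame({
    "age": ["18-25", "26-35", "18-25", "36-45"],
    "gender": ["Female", "Male", "Female", "Other"],
})

categorical_cols = ["age", "gender"]  # the study encodes eighteen such columns

# Fit one encoder per column and keep it, so that the integer codes
# can be mapped back to the original labels later.
encoders = {}
for col in categorical_cols:
    encoder = LabelEncoder()
    df[col] = encoder.fit_transform(df[col])
    encoders[col] = encoder

print(df)
print(list(encoders["gender"].classes_))
```

LabelEncoder assigns integer codes in sorted label order, which implies an ordering that nominal variables do not really have; tree-based classifiers usually tolerate this, while linear models may be better served by one-hot encoding.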
2.3 Importance and Feature Selection
To examine the variables that are related to
purchase satisfaction, a correlation analysis was
performed in this paper.
Importance analysis can explore the relationships
between variables. Search accuracy is highly
important to purchase satisfaction, which means that
if customers find exactly what they want to buy when
searching, they are very likely to be satisfied.
Shopping type, customer evaluation, and personalized
recommendation are also of relatively high importance
to purchase satisfaction. However, some variables
have very low importance for purchase satisfaction,
suggesting that there is no significant correlation
between them and purchase satisfaction.
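One way such a ranking could be produced is sketched below. The text does not state the exact importance measure, so this example assumes the absolute Pearson correlation between each encoded feature and the target; all column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical, already label-encoded data; the column names are
# illustrative and not taken from the dataset.
df = pd.DataFrame({
    "search_accuracy": [1, 2, 3, 4, 5, 6],
    "shopping_type":   [2, 1, 3, 5, 4, 6],
    "browsing_time":   [3, 1, 2, 3, 1, 2],
    "satisfaction":    [1, 2, 3, 4, 5, 6],
})

target = "satisfaction"

# Absolute Pearson correlation of every feature with the target,
# sorted from most to least strongly related.
importance = (
    df.corr()[target]
      .drop(target)
      .abs()
      .sort_values(ascending=False)
)
print(importance)
```

Features near the bottom of this ranking are candidates for removal; calling importance.head(k) would keep only the k highest-ranked ones.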
Feature selection was therefore carried out.
Feature selection is used to clean up noisy,
redundant and irrelevant data (Venkatesh et al.,
2019). This study picked the top 10 most important