
frequency, offering business insights in addition to
predictions.
2 RELATED WORKS
The availability of large-scale behavioral data and
advances in computational techniques have enabled
machine learning methods such as gradient boost-
ing to model user purchase behavior without relying
on restrictive parametric assumptions (Wang et al.,
2023). Algorithms such as logistic regression, ran-
dom forests, and gradient boosting have been applied
to predict outcomes ranging from repurchase propen-
sity to purchase frequency or spending.
Ensemble methods, particularly gradient-boosted
trees, have emerged as the most effective approach for
these tasks. Song and Liu demonstrated that XGBoost
improves purchase prediction in e-commerce by
incorporating diverse behavioral features (Song and Liu,
2020). Yang et al. extended this direction by
combining Random Forest and LightGBM in a hybrid
ensemble to address class imbalance and enhance
repurchase prediction (Yang et al., 2021). These
studies show that ensemble approaches can surpass
traditional RFM and probabilistic models by leveraging
richer predictors and nonlinear interactions. Feature
engineering remains central, with extensions of RFM
to include variables such as customer tenure, inter-
purchase intervals, and engagement metrics further
improving predictive power. Overall, machine learn-
ing, especially ensemble learning, has become a cor-
nerstone in purchase frequency prediction.
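The extended RFM features mentioned above can be illustrated with a minimal sketch. It is not taken from any of the cited works: the function name `rfm_features`, the input layout (a list of order-date and amount pairs for a single customer), and the output keys are all assumptions made for illustration, and engagement metrics are omitted because they depend on platform-specific logs.

```python
from datetime import date
from statistics import mean


def rfm_features(orders, as_of):
    """Compute RFM plus tenure and inter-purchase interval for one customer.

    orders: list of (order_date, amount) tuples (hypothetical input layout).
    as_of:  snapshot date at which the features are computed.
    """
    if not orders:
        raise ValueError("at least one order is required")
    dates = sorted(d for d, _ in orders)
    # Gaps in days between consecutive orders; empty for single-order customers.
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    return {
        "recency_days": (as_of - dates[-1]).days,   # R: days since last order
        "frequency": len(orders),                   # F: number of orders
        "monetary": sum(amount for _, amount in orders),  # M: total spend
        "tenure_days": (as_of - dates[0]).days,     # days since first order
        "mean_interpurchase_days": mean(gaps) if gaps else None,
    }


# Hypothetical order history for a single customer.
orders = [
    (date(2025, 1, 10), 40.0),
    (date(2025, 3, 1), 25.0),
    (date(2025, 5, 20), 35.0),
]
features = rfm_features(orders, as_of=date(2025, 5, 31))
```

In this sketch a customer with a single order has no inter-purchase interval, so that feature is left as `None`; tree-based learners such as XGBoost and LightGBM can typically handle missing values of this kind directly.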
Wang et al. proposed a user purchase behav-
ior prediction model based on XGBoost, leveraging
multi-dimensional behavioral features such as histor-
ical transaction patterns, account activity metrics, and
user segmentation tags to accurately forecast future
purchasing behavior (Wang et al., 2023). Building
on this direction, Sun et al. applied gradient boost-
ing decision trees (GBDT) and random forests to
predict key components of customer lifetime value
(CLV), particularly purchase frequency, and showed
that these models outperformed classical probabilistic
approaches such as Pareto/NBD and Pareto/GGG on
real-world retail datasets (Sun et al., 2021). In
practical settings, CLV prediction is often decomposed into
three sub-tasks (churn probability, expected frequency, and
average value), with boosting applied to each.
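One common way to write such a decomposition is sketched below. The notation is illustrative and is not taken from the cited works: for customer $i$ over a fixed horizon, $\hat{p}_i$ denotes the predicted churn probability, $\hat{f}_i$ the expected number of purchases given that the customer remains active, and $\hat{v}_i$ the expected average value per purchase, each estimated by a separate model.

```latex
\widehat{\mathrm{CLV}}_i \;\approx\; \bigl(1 - \hat{p}_i\bigr)\,\hat{f}_i\,\hat{v}_i
```

Under this simplified form, which assumes the three components can be estimated separately and multiplied, the purchase frequency term $\hat{f}_i$ is the component addressed by frequency prediction.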
Overall, gradient boosting methods stand out for
their accuracy, flexibility, and ability to incorporate
diverse features in frequency prediction. They con-
sistently outperform traditional models, though chal-
lenges remain in interpretability and integration with
newer techniques.
While gradient boosting dominates structured re-
tail data, newer methods are emerging to capture se-
quential dynamics and improve interpretability. Deep
learning models, such as LSTMs and transformers,
have been applied to customer transaction histories,
treating them as sequences of events. For exam-
ple, attention-based LSTMs have been used to pre-
dict high-value customer behavior, while transformer
architectures have shown advantages when long and
complex purchase cycles are involved (Lathwal and
Batra, 2024; Kim et al., 2023).
Hybrid approaches combine machine learning
with probabilistic or domain-specific modeling to bal-
ance flexibility and structure. Examples include two-
stage models where neural networks predict distribu-
tional parameters for purchase counts, later refined
with boosting, or reinforcement learning frameworks
that go beyond prediction to optimize marketing in-
terventions.
In sum, recent work explores deep learning for
sequential behavior, hybrid models to integrate prior
knowledge, and explainable AI (XAI) methods for
interpretability. Yet
ensemble trees, particularly boosting algorithms, re-
main the strongest baseline for purchase frequency
prediction, balancing accuracy, scalability, and trans-
parency (Grinsztajn et al., 2022). These developments
provide the foundation for our boosting-based frame-
work for next-month purchase frequency prediction.
3 DATA AND PREPROCESSING
3.1 Dataset Description
The dataset was constructed from a one-year trans-
actional history preceding the prediction month. The
target was defined as the purchase frequency in June
2025, categorized into three discrete classes: (i) one
purchase, (ii) two to three purchases, and (iii) four or
more purchases (heavy buyers). The cohort included
only active customers with at least one completed or-
der in the historical period. The final dataset con-
tained several million rows, with class distributions
reflecting the natural skew typically observed in e-
commerce frequency prediction tasks.
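The three-class target described above can be expressed as a simple binning rule. The following is a minimal sketch under stated assumptions: the function name `frequency_class`, the class labels, and the example counts are hypothetical, and because the text defines classes only from one purchase upward, the sketch returns `None` for customers with no purchase in the target month rather than guessing how they are treated.

```python
from collections import Counter
from typing import Optional


def frequency_class(n_purchases: int) -> Optional[str]:
    """Map a target-month purchase count to one of the three classes."""
    if n_purchases < 0:
        raise ValueError("purchase count cannot be negative")
    if n_purchases == 0:
        # Assumption: zero-purchase customers fall outside the three classes.
        return None
    if n_purchases == 1:
        return "one"            # class (i): one purchase
    if n_purchases <= 3:
        return "two_to_three"   # class (ii): two to three purchases
    return "four_plus"          # class (iii): four or more (heavy buyers)


# Hypothetical target-month purchase counts keyed by customer ID.
june_counts = {"c1": 1, "c2": 3, "c3": 7, "c4": 0, "c5": 2}
labels = {cid: frequency_class(n) for cid, n in june_counts.items()}

# Class distribution over labeled customers, useful for inspecting skew.
class_distribution = Counter(v for v in labels.values() if v is not None)
```

Inspecting `class_distribution` on the real labels is one simple way to quantify the skew noted above before choosing a resampling or class-weighting strategy.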
3.2 Feature Groups and Selection
Features originated from a customer-level datamart
that aggregates behavioral, transactional, and demo-
graphic signals, accumulating nearly 600 raw at-
tributes per member. For clarity, these features can
be described in the following categories: