
Intelligence (AI) and Machine Learning (ML) help
businesses analyze large volumes of data, detect de-
mand patterns, adjust prices in real-time, and antici-
pate market trends (El Youbi et al., 2023). These tech-
nologies enable more dynamic and data-driven pric-
ing strategies (Aparicio and Misra, 2023).
Automatically determining the optimal price re-
mains complex and challenging due to factors like
seasonality, competition, and production costs. These
variables are often interdependent and can change
rapidly, introducing high levels of volatility and un-
certainty into pricing strategies. As a result, compa-
nies struggle to understand the rationale behind pric-
ing recommendations, which can hinder trust in au-
tomated systems and limit their adoption in dynamic
markets.
This article investigates regression models for the
pricing task. These models offer a practical balance
between simplicity, precision, and performance. They
may enable interpretation of how input variables, such as
product weight, dimensions, and stock levels, affect pricing
outcomes. In particular, our investigation considers the
Brazilian automotive sector as the context of our data-driven
pricing models. This context presents high production costs,
fluctuating demand, and a strong influence of perceived value
on prices. It has received little attention in the literature,
which motivates novel analyses and contributions.
Our investigation introduces the following contri-
butions:
• Original analyses leveraging a real-world dataset
from the Brazilian automotive e-commerce sector.
• Evaluation of nine well-studied ML models with
default and fine-tuned hyperparameters, including
linear, tree-based, and neural network models:
Linear Regression, Random Forest, Lasso, Ridge,
Support Vector Machine (SVM), XGBoost, LightGBM,
Long Short-Term Memory (LSTM), and Feedforward
Neural Network.
• Interpretation of the model predictions using
SHapley Additive exPlanations (SHAP) (Lund-
berg and Lee, 2017), chosen for its ability to pro-
vide consistent local explanations. SHAP high-
lights the most influential features in pricing,
making it easier to understand how each variable
impacts predictions in the automotive sector.
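To make the idea behind SHAP concrete, the sketch below computes exact Shapley values by brute-force enumeration for a single prediction. It is an illustration only: it does not use the SHAP library, the pricing function `toy_price` and its feature values are hypothetical, and a single baseline instance stands in for the background data that SHAP averages over.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attributions for instance x relative to a baseline.

    Features outside a coalition keep their baseline value. The cost grows
    as 2^n in the number of features, so this only suits tiny examples.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_price(features):
    """Hypothetical price model (not the paper's): weight, stock, volume."""
    weight_kg, stock_units, volume_l = features
    return 20.0 + 8.0 * weight_kg - 0.05 * stock_units + 0.5 * weight_kg * volume_l


instance = [2.0, 100.0, 4.0]   # product being explained (made-up values)
baseline = [1.0, 50.0, 2.0]    # reference product (made-up values)
phi = shapley_values(toy_price, instance, baseline)

for name, value in zip(["weight_kg", "stock_units", "volume_l"], phi):
    print(f"{name}: {value:+.2f}")
```

The attributions sum to the difference between the prediction for the instance and the prediction for the baseline, which is the additivity property that lets each feature's contribution be read as a share of the price change.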
Our study found that LightGBM and XGBoost
offered the best balance between accuracy and efficiency
for price prediction tasks. Hyperparameter optimization
further enhanced model performance, consistently reducing
prediction errors. SHAP analysis
revealed that key factors influencing product price in-
clude weight, stock quantity, sales volume, and phys-
ical dimensions.
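As a rough illustration of how hyperparameter optimization can reduce prediction error, the sketch below compares a default and a tuned regularization strength for a one-feature ridge regression, scored with MAE and R² on a hold-out split. Everything here is an assumption made for illustration: the data are synthetic, the grid and the default value `DEFAULT_ALPHA` are arbitrary, and a plain grid search stands in for whatever search strategy the study used.

```python
import random


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_ridge_1d(xs, ys, alpha):
    """Closed-form ridge regression with one feature and an unpenalized intercept."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)
    return slope, y_mean - slope * x_mean


def predict(model, xs):
    slope, intercept = model
    return [intercept + slope * x for x in xs]


# Synthetic data (hypothetical): price as a noisy linear function of weight.
rng = random.Random(0)
weights = [rng.uniform(0.5, 10.0) for _ in range(60)]
prices = [15.0 + 6.0 * w + rng.gauss(0.0, 3.0) for w in weights]
x_train, y_train = weights[:40], prices[:40]
x_val, y_val = weights[40:], prices[40:]

DEFAULT_ALPHA = 1.0  # assumed default penalty for this sketch
alpha_grid = [0.0, 0.01, 0.1, 1.0, 10.0, 100.0]

# Grid search: fit on the training split, score each candidate on validation.
scores = {
    alpha: mae(y_val, predict(fit_ridge_1d(x_train, y_train, alpha), x_val))
    for alpha in alpha_grid
}
best_alpha = min(scores, key=scores.get)
default_mae = scores[DEFAULT_ALPHA]
tuned_mae = scores[best_alpha]
tuned_r2 = r2(y_val, predict(fit_ridge_1d(x_train, y_train, best_alpha), x_val))

print(f"default MAE={default_mae:.3f}  tuned MAE={tuned_mae:.3f} "
      f"(alpha={best_alpha})  R2={tuned_r2:.3f}")
```

Because the default value is part of the grid, the tuned validation MAE can never be worse than the default one. An unbiased estimate of the gain would require a separate test split that plays no role in the selection.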
The remainder of this article is organized as follows.
Section 2 reviews related work. Section 3 describes the
overall methodology, including data collection, exploratory
analysis, preprocessing, model development, training, and
assessment. Section 4 presents the results, and Section 5
discusses them.
Section 6 concludes the article and highlights future
investigations.
2 RELATED WORK
The study by Bhaskar et al. (2022) addresses the issue
of consumer deception in the used
car market through price manipulation. To mitigate
this problem, three regression models are developed
to predict the selling price of used vehicles based on
features such as listed price and mileage. The mod-
els evaluated include Linear Regression, Lasso Re-
gression, and an ensemble approach combining both.
The dataset, obtained from Kaggle, underwent pre-
processing steps like removing missing values and
categorical encoding. Among the evaluated models,
the ensemble regression achieved the highest predic-
tive accuracy (94%), indicating its effectiveness in es-
timating used car prices. In contrast to this approach,
our study explores the pricing problem within a dif-
ferent context—namely, the Brazilian automotive e-
commerce sector—where pricing dynamics are influ-
enced by additional factors such as inventory levels,
product dimensions, and temporal variables like the
date of sale. We expand the methodological scope by
evaluating nine distinct regression models, encompassing
linear, tree-based, and neural network models. While both
studies share the goal of price prediction in the automotive
sector, our work addresses a broader, more complex set of
variables.
Chowdhury et al. (2024) explored the use of supervised
ML models to optimize pricing strategies in e-commerce,
with a focus on predicting customer satisfaction. Using a
dataset with features such as historical prices, customer
demographics, and transaction data, the authors compared
the performance of Linear Regression, Decision Tree,
Random Forest, SVM, and Neural Network models on this
task. The main contribution of the study is the comparative
evaluation of the models' predictive capacity, in which
Neural Networks presented the best overall
performance. However, the high computational cost
of the Neural Network may be a barrier to practical
use. Random Forest emerged as a viable alternative,
balancing accuracy (MAE 0.130, R² 0.82) with inter-
KDIR 2025 - 17th International Conference on Knowledge Discovery and Information Retrieval