2022, highlighting how the industry is deeply
impacted by macroeconomic factors such as the
pandemic. The research demonstrated that the
automotive industry fared better in the pre-pandemic
period than during the pandemic (Toma,
2023).
As one of the largest sectors in terms of
employment and technological innovation, the
industry's impact extends far beyond the vehicles it
produces. The integration of modern supply chain
practices, such as modular procurement, has made the
automotive sector a driver of global economic
development. However, the COVID-19 pandemic
disrupted these supply chains, leading to shortages of
key components, such as semiconductors, which have
further impacted car prices (N. Radić & V. Radić,
2021). Another study emphasized the ripple effects of
supply chain disruptions in the automotive industry,
underscoring how these disruptions not only led to a
decline in production but also contributed to price
volatility in the market (Asghar et al., 2021). As the
global economy recovers from the effects of the
pandemic, the used car market has experienced rapid
growth. Many buyers, unable to afford new vehicles,
have turned to used cars as a more affordable option.
According to research, the surge in used car demand,
spurred by the shortage of new cars and a rise in
consumer purchasing power, has driven up the prices
of used cars (Das Adhikary et al., 2022). With the
rising demand, car sellers have taken advantage by
listing vehicles at inflated prices, further emphasizing
the need for accurate car price prediction models to
help buyers make informed decisions.
A number of machine learning algorithms, each
with specific advantages and disadvantages, have
been used to forecast automobile values, helping
buyers and sellers evaluate vehicles more accurately.
One study found that LR models are useful for
estimating the cost of used automobiles and
emphasized the significance of a vehicle’s attributes,
such as its make, model, condition, and mileage (Muti
& Yıldız, 2023). Although effective, LR often struggles
with non-linear data patterns, leading researchers to
explore more advanced models such as Random Forests
(RF) and DT for better accuracy. Another study used
Random Forest models with more than 200 DTs to
predict used car prices, achieving high accuracy rates
(Ranjith, 2021). The study showed that because RF
can handle complicated interactions between
variables and many features, it performs better than
other regression models. Another study examines the
increasing complexity of China's used car market by
using machine learning models, including
LightGBM, to analyze key factors from five datasets,
ultimately constructing a predictive model that
enhances used car sales strategies (Wang et al.,
2022). Additionally, a 2022 study introduces an
intelligent framework using artificial neural networks
to estimate used car prices, outperforming traditional
models like random forests in accuracy, as validated
with large datasets of U.S. vehicles (Pillai, 2022).
Moreover, a study on car price prediction in Bosnia
and Herzegovina used an ensemble of machine
learning methods, including Random Forest, Support
Vector Machine, and Artificial Neural Network, and
produced a model with an accuracy of
87.38% (Gegic et al., 2019).
Meanwhile, the rise of electric vehicles (EVs) and
advancements in digitalization are reshaping the
automotive industry. The EU and Germany's focus on
the decarbonization of the automotive sector
highlights the impact of environmental regulations on
car prices. As consumers shift towards EVs, machine
learning models must adapt to include variables such
as battery life, charging infrastructure, and
government incentives (Nettekoven, 2023).
Although there are already various studies on
car price prediction, there is no comprehensive
comparative analysis of the different algorithms that
can predict car prices. Moreover, different prediction
methods may perform differently under different
real-world conditions. Therefore, this paper uses
experiments to show which algorithm performs best
overall and under which conditions each algorithm
outperforms the others.
3 METHODOLOGY
3.1 Data Description and Preparation
A dataset from Kaggle was used in this
investigation because it has a number of attributes,
including make, model, horsepower, and mileage. The
makes and models in the dataset can serve as a
representation of what consumers currently choose to
purchase. Because of its richness and
relevance, it is a good fit for creating reliable
prediction models.
The first step in data preparation involved
cleaning the dataset, where missing values were
addressed through imputation. For continuous
variables, median values were used, while mode
values were applied to categorical variables. Feature
engineering was also performed to enhance model
performance by creating new features from existing
data, such as categorizing continuous variables like
mileage and age into bins.
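
The preparation steps described above can be illustrated with a minimal sketch using only the Python standard library. It is not the code used in this study: the toy records, the column names, the helper functions `impute` and `to_bin`, and the bin edges and labels are all assumptions chosen for illustration.

```python
from bisect import bisect_right
from statistics import median, mode

# Hypothetical toy records; names and values are illustrative assumptions,
# not rows from the Kaggle dataset.
cars = [
    {"make": "Toyota", "mileage": 42000, "age": 3},
    {"make": None, "mileage": None, "age": 7},
    {"make": "Toyota", "mileage": 118000, "age": None},
    {"make": "Ford", "mileage": 76000, "age": 12},
]


def impute(rows, column, statistic):
    """Replace missing (None) values in `column` with statistic(observed values)."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill_value = statistic(observed)
    for row in rows:
        if row[column] is None:
            row[column] = fill_value
    return fill_value


def to_bin(value, edges, labels):
    """Map `value` to a label; each edge is the inclusive lower bound of the next bin."""
    return labels[bisect_right(edges, value)]


# Median imputation for continuous variables, mode imputation for categorical ones.
impute(cars, "mileage", median)
impute(cars, "age", median)
impute(cars, "make", mode)

# Assumed cut points: mileage at 50,000 and 100,000; age at 5 and 10 years.
for car in cars:
    car["mileage_bin"] = to_bin(car["mileage"], [50000, 100000], ["low", "medium", "high"])
    car["age_bin"] = to_bin(car["age"], [5, 10], ["new", "mid", "old"])

for car in cars:
    print(car)
```

Imputing before binning ensures that every record receives a bin label. In practice the cut points would be chosen from the distribution of the actual data rather than fixed in advance.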