Figure 9, Figure 10, and Figure 11 display the
scatter plots of y_test vs y_pred_test for LR, RR,
and RF, respectively, comparing the actual and
predicted values of the test samples.
The predicted vs actual values plot is essential for
assessing model accuracy. It shows how closely the
predicted values (y_pred_test) track the actual test
values (y_test). An ideal model would
show data points closely aligned with the diagonal
line where y_test equals y_pred_test. Points above the
line indicate under-predictions, while points below
suggest over-predictions.
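The reading of the plot described above can also be checked numerically from the sign of the residual y_test - y_pred_test. The following is a minimal sketch, not the code used in this study; the function name split_by_prediction_error, the tol parameter, and the sample prices are assumptions introduced for illustration only.

```python
def split_by_prediction_error(y_test, y_pred_test, tol=0.0):
    """Group test-sample indices by the sign of the residual (actual - predicted).

    Hypothetical helper for illustration; `tol` is an assumed tolerance within
    which a prediction is treated as lying on the diagonal.
    """
    under, over, exact = [], [], []
    for i, (actual, predicted) in enumerate(zip(y_test, y_pred_test)):
        residual = actual - predicted
        if residual > tol:
            under.append(i)   # predicted value is lower than the actual value
        elif residual < -tol:
            over.append(i)    # predicted value is higher than the actual value
        else:
            exact.append(i)   # point lies on the line y_test == y_pred_test
    return {"under": under, "over": over, "exact": exact}


# Made-up prices, not taken from the dataset used in this study
y_test = [5.0, 3.2, 7.5, 1.1]
y_pred_test = [4.6, 3.2, 8.0, 1.0]
labels = split_by_prediction_error(y_test, y_pred_test)
print(labels)
```

Counting the entries in each group gives a quick summary of whether a model tends to under- or over-estimate prices, which complements the visual impression from the scatter plot.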
The plots for both LR and RR show that the
majority of data points are closely aligned with the
diagonal line. This alignment indicates that both
models are generally accurate in their predictions.
Although there are some deviations, most predictions
fall fairly close to the actual values, indicating
steady performance.
The RF plot shows that most points are also
aligned with the diagonal line. However, there are a
few points above the line, which indicate under-
predictions where the model’s predicted values are
lower than the actual values. This suggests that while
the RF performs well overall, it occasionally
underestimates car prices, reflecting some variability
in its predictions.
In the context of regression analysis, LR and RR
are known for their stability and interpretability.
These properties make them particularly useful for
understanding how a model arrives at its predictions,
since their coefficients give a clear account of the
relationship between features and the target variable.
On the other hand, RF is known for its high prediction
accuracy.
However, achieving strong performance with this
model typically requires careful hyperparameter
tuning. This process not only improves the model's
generalization ability to new data but also reduces the
risk of overfitting.
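One common way to carry out such tuning is a cross-validated grid search. The sketch below assumes scikit-learn is available and uses synthetic data from make_regression in place of the car dataset; the parameter grid, the number of folds, and the R2 scoring choice are illustrative assumptions, not the settings used in this study.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption): the study's car dataset is not used here
X, y = make_regression(n_samples=200, n_features=6, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Illustrative grid (assumption): limits on tree depth and leaf size
# constrain model complexity, which is one way to curb overfitting
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [4, 8, None],
    "min_samples_leaf": [1, 3],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,          # 3-fold cross-validation on the training split only
    scoring="r2",
)
search.fit(X_train, y_train)

best_params = search.best_params_
test_r2 = search.score(X_test, y_test)  # R2 of the refitted best model on held-out data
print(best_params, round(test_r2, 3))
```

Because the grid is scored by cross-validation on the training split, the held-out test score remains an independent estimate of how the tuned model generalizes.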
5 LIMITATIONS AND OUTLOOK
In this research, different regression models were
compared to predict used car prices. However, several
limitations may affect the accuracy of the models.
Firstly, the quantity and quality of the data present
constraints. The dataset used may have a small
sample size, which could limit the effectiveness of
model training and the predictive ability of the
resulting models. Additionally,
although missing values were addressed, some
outliers may still be present, introducing noise and
affecting prediction accuracy. Furthermore, feature
selection presents challenges. While basic car
information was used, other important factors
influencing car prices, such as specific car
configurations and changes in market demand, may
have been overlooked. The feature engineering
approach also has limitations; for categorical features
like Fuel_Type and Seller_Type, simple One-Hot
encoding may not fully capture their latent
information.
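The limitation of One-Hot encoding can be made concrete with a small example. The following is a minimal sketch using plain Python; the category values Petrol, Diesel, and CNG and the helper names one_hot and hamming are assumptions for illustration and may not match the dataset or code used in this study.

```python
def one_hot(values):
    """Return the sorted category list and one indicator row per input value."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


def hamming(a, b):
    """Number of positions at which two encoded rows differ."""
    return sum(x != y for x, y in zip(a, b))


# Assumed example values for the Fuel_Type feature
fuel_type = ["Petrol", "Diesel", "CNG", "Petrol"]
categories, encoded = one_hot(fuel_type)

petrol, diesel, cng = encoded[0], encoded[1], encoded[2]
distances = {
    "Petrol-Diesel": hamming(petrol, diesel),
    "Petrol-CNG": hamming(petrol, cng),
    "Diesel-CNG": hamming(diesel, cng),
}
print(categories, encoded, distances)
```

Every pair of distinct categories ends up the same distance apart, so the encoding by itself carries no information about which fuel types behave similarly with respect to price.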
To address these issues, several improvements are
suggested. Expanding the dataset is crucial for
enhancing model performance. Increasing the sample
size, particularly by collecting data from various
sources, can improve the stability of model training
and predictions. Improving data quality control,
especially in managing missing values and outliers,
will enhance the reliability of the models.
Additionally, more advanced feature processing,
including richer encoding methods for categorical
features and alternative transformation techniques,
could further improve model performance and
accuracy.
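As one possible form of the outlier management mentioned above, values can be flagged with the interquartile range (IQR) rule. This is a minimal sketch using only the Python standard library; the rule itself, the multiplier k = 1.5, the helper name iqr_outliers, and the sample prices are assumptions for illustration, not the procedure applied in this study.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up selling prices, not taken from the dataset used in this study
selling_price = [2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 35.0]
outliers = iqr_outliers(selling_price)
print(outliers)
```

Flagged values can then be inspected, capped, or removed before training, depending on whether they reflect data errors or genuinely unusual cars.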
6 CONCLUSIONS
Accurate forecasting of used car prices is important
for consumers to make well-informed purchases,
dealers to set effective prices and manage inventory,
and financial institutions to manage risks better. This
study evaluated and compared several regression
models for used car price prediction. An analysis of
LR, RR, and RF found that the RF demonstrated the
best performance on both the training and test
datasets. This suggests that the
RF is better at capturing the complex nonlinear
relationships in the data and providing more accurate
predictions. However, there were some limitations
due to the small dataset, which may impact model
accuracy. Outliers and feature selection issues
also affected model performance. Future research
should concentrate on improving data quality control
and exploring additional features. Using advanced
feature engineering and encoding techniques could
further enhance model performance. Overall, this
research provides insights into forecasting used car
prices and highlights the relative advantages and
disadvantages of different regression models. By
improving data processing procedures
and model training methods, more reliable
predictions can be achieved in practical applications.