overfitting and maintains a high predictive power on
unseen data. The Random Forest model shows a
similar trend after interpolation, with the KS values
in particular improving significantly. For the KNN
model, despite small improvements in accuracy and
AUC, the decrease in KS values suggests that the
model struggles to avoid overfitting. KNN relies on
local neighbourhood information, and interpolation
may have introduced local biases that do not
represent the global pattern of the data, reducing the
model's ability to generalise to new data. Because the
risk of overfitting is closely related to model
complexity, and KNN is a simpler model, the local
noise introduced by interpolation may cause it to
perform worse than expected on the hold-out set.
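As a concrete illustration of the KS statistic that this comparison relies on, the following sketch (an assumed implementation for illustration, not the study's own code) computes the maximum gap between the cumulative score distributions of the positive and negative classes:

```python
import numpy as np

def ks_statistic(y_true, y_score):
    """Kolmogorov-Smirnov statistic: maximum gap between the cumulative
    score distributions of the positive and negative classes."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    order = np.argsort(y_score)
    y_sorted = y_true[order]
    # Cumulative proportion of positives / negatives seen so far,
    # scanning from the lowest score to the highest
    cum_pos = np.cumsum(y_sorted) / max(y_sorted.sum(), 1)
    cum_neg = np.cumsum(1 - y_sorted) / max((1 - y_sorted).sum(), 1)
    return float(np.max(np.abs(cum_pos - cum_neg)))

# Perfectly separating scores give KS = 1.0
print(ks_statistic([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))  # 1.0
```

A drop in this value on held-out data, as reported for KNN, indicates that the model's scores separate the two classes less cleanly than before.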
The stacked model shows the most significant
performance improvements, especially in the AUC
and KS values. These gains suggest that simple linear
interpolation substantially reduces the random noise
in the data, allowing the stacked model to better learn
and generalise the global features of the data. For
such complex models, interpolation helped them
perform better on the hold-out set, suggesting that
interpolation not only helped to prevent overfitting
but also enhanced the generalisation ability of the
model.
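The preprocessing step under discussion can be sketched in a few lines; this is a minimal illustration on made-up prices, assuming a pandas time series with gaps, not the study's actual pipeline:

```python
import numpy as np
import pandas as pd

# Hypothetical daily closing prices with missing values (irregular gaps)
idx = pd.date_range("2024-01-01", periods=6, freq="D")
close = pd.Series([100.0, np.nan, 104.0, np.nan, np.nan, 110.0], index=idx)

# Simple linear interpolation fills each gap on a straight line
# between the nearest known neighbours
filled = close.interpolate(method="linear")
print(filled.tolist())  # [100.0, 102.0, 104.0, 106.0, 108.0, 110.0]
```

Filling gaps this way smooths random fluctuations, which is consistent with the noise-reduction effect described above, but the filled points are synthetic and can mislead neighbourhood-based models such as KNN.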
4 LIMITATIONS AND FUTURE
PROSPECTS
While the study demonstrates that SLI can enhance
model performance, it is important to acknowledge
certain limitations. First, this study focuses on data
from a single stock (Tesla), which limits its
applicability to other stocks or financial instruments.
The characteristics of Tesla stock, such as its
volatility and trading volume, may affect the
reliability of SLI, so the outcomes may differ for
equities with different trading patterns or under
different market conditions.
Secondly, the study selected four models (XGBoost,
Random Forest, K-Nearest Neighbors, and a stacking
model) based on a comparison of baseline models.
Although these models span a range of machine
learning methodologies, the study did not investigate
the effect of SLI on other potentially relevant models,
such as neural networks or other ensemble methods,
which may respond differently to interpolation
techniques. Furthermore, the
study exclusively employed Accuracy, AUC, and KS
as performance indicators. Although these indicators
are widely used and meaningful, they do not capture
all facets of model performance. Metrics such as
Precision, Recall, and F1 Score can offer valuable
insights when working with highly imbalanced data,
a common occurrence in financial markets.
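The additional metrics mentioned here are straightforward to compute; the following sketch uses hypothetical predictions on an imbalanced sample invented for illustration, not data from the study:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical predictions on an imbalanced sample (2 positives out of 10)
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

print(precision_score(y_true, y_pred))  # 0.5 (1 of 2 predicted positives is correct)
print(recall_score(y_true, y_pred))     # 0.5 (1 of 2 actual positives is found)
print(f1_score(y_true, y_pred))         # 0.5 (harmonic mean of the two)
```

Note that plain accuracy on this sample is 0.8 despite the classifier finding only half the positives, which is exactly why these metrics matter for imbalanced financial data.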
Regrettably, the study does not thoroughly
investigate the overfitting problems that may arise
from SLI. Although SLI might enhance the
consistency of data, it can also generate artifacts that
some models may overfit, particularly in models such
as K-nearest neighbors that are sensitive to the local
structure of data. Additional examinations, such as
cross-validation and evaluation on data that was not
used during training, are required to verify that the
reported improvements in performance are not merely
an artifact of overfitting.
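A cross-validation check of the kind called for here might look like the following sketch, using KNN (named above as sensitive to local structure) on synthetic data; the features and target are invented for illustration:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))            # hypothetical feature matrix
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # hypothetical binary target

# 5-fold cross-validation: each fold is scored on data the model never
# saw during fitting, so a gain from interpolation that survives all
# folds is less likely to be an artifact of overfitting filled-in points
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)
print(scores.mean())
```

Running the same procedure on the raw and the interpolated series, and comparing the fold-wise scores, would make the claimed improvements more credible than a single train/test split.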
5 CONCLUSION
This study investigates the impact of simple linear
interpolation on the effectiveness of machine learning
models for stock prediction, based on Tesla stock
data. Stepwise feature selection was used to optimise
the models. Logistic Regression, Decision Tree,
K-Nearest Neighbors, Random Forest, Gaussian
Naive Bayes, LightGBM, XGBoost, Gradient
Boosting, and a Neural Network were used for
prediction, and the three algorithms with the best
results were selected for parameter tuning.
The results show that SLI improves model accuracy,
as well as AUC and KS statistics, to a certain extent,
with the stacked model and ensemble methods in
particular exhibiting significant performance gains.
This suggests that SLI, as a data preprocessing
technique, can enhance the predictive power of
models by improving data consistency and
completeness. However, the study also reveals that
SLI's effect is inconsistent across different models
and data characteristics; especially in contexts where
overfitting may be triggered, SLI needs to be applied
with caution. This study is also limited by its reliance
on a single dataset and a small number of model
choices. Nevertheless, its results provide empirical
support for the application of SLI in financial data
modelling, highlighting the crucial role of data
preprocessing in enhancing the performance of
machine learning models.
Irregular Stock Data Prediction Performance Optimisation Based on the Simple Linear Interpolation