focused on generic market pricing, leaving critical gaps in sector-specific prediction (Kim & Kang, 2019). In most cases, statistical methods cannot capture the complex, non-linear relationships between market variables and stock prices (Zhong & Enke, 2017). In addition, existing models fail to incorporate industry-specific knowledge, a vital factor for accurate prediction in the manufacturing sector (Chandola, Banerjee, & Kumar, 2009; Oztekin et al., 2016).
Previous research has also suffered from limited model interpretability and practical applicability (Hsu et al., 2026; Sezer, Gudelek, & Ozbayoglu, 2020). Although some researchers achieve high accuracy rates in controlled environments, real-world performance often falls short because market microstructure and industry-specific factors receive inadequate consideration (Khaidem, Saha, & Dey, 2016). Moreover, the absence of a robust evaluation framework that accounts for both prediction accuracy and model stability has inhibited progress in developing reliable prediction systems (Kumar et al., 2016).
In this study, stock price prediction is investigated within the electrical manufacturing sector using a multi-stage framework designed to improve the accuracy and stability of the resulting model. The framework comprises several major components, including data preprocessing, feature engineering, model training, performance evaluation, and predictive analysis, within which three machine learning models are compared: Random Forest (RF), XGBoost, and Gradient Boosting Decision Trees (GBDT), evaluated on R² values, volatility, and trend stability. The results show that RF outperforms the other models in prediction accuracy and robustness and can serve as the preferred choice for both short- and long-term prediction.
2 METHODOLOGY
2.1 Data Collection and Sources
Following the data collection approach described by Lin et al. (2018), the dataset is drawn from two primary sources only. Historical trading data were obtained from Yahoo Finance and validated against market information, covering a total of 1,904 trading days from 2016 to 2025 (Lin et al., 2018). The data include 12 main price variables, volume measures, and technical features selected based on the recommendations of Nguyen and Lee (2017).
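As an illustration of this collection step, the minimal sketch below retrieves daily trading history from Yahoo Finance through the open-source `yfinance` package; the ticker symbol is a placeholder, since the specific company is not identified here, and the exact download routine used in the study may differ.

```python
# Minimal sketch of the data collection step, assuming the yfinance package.
# The ticker symbol is hypothetical; the study's specific company is not named here.
import yfinance as yf

TICKER = "EXAMPLE.TICKER"  # placeholder for the electrical-manufacturing stock

# Download daily OHLCV history covering the 2016-2025 study window.
raw = yf.download(TICKER, start="2016-01-01", end="2025-12-31", interval="1d")

# Basic sanity check against the reported sample size (1,904 trading days).
print(f"Downloaded {len(raw)} trading days")
print(raw.columns.tolist())  # inspect the available price and volume columns
```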
2.2 Data Preprocessing and Feature
Engineering
All source data were obtained from these sources and merged into a single database covering 1,904 trading days in a unified collection format. Yahoo Finance and Eastern Finance were used to obtain the company's trading history. The raw variables comprise thirteen price metrics, volume indicators (total value, weight, and quantity), and thirteen technical features. Missing-value imputation, outlier detection, and time-series validation were carried out according to robust data-cleaning specifications, as illustrated in the sketch below.
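The following sketch illustrates cleaning of this kind with pandas (missing-value imputation, return-based outlier screening, and index validation); the thresholds and imputation rules are illustrative assumptions rather than the study's exact specifications.

```python
# Illustrative cleaning routine: imputation, outlier detection, and series
# validation. The z-score threshold and fill rules are assumptions, not the
# paper's exact settings.
import pandas as pd


def clean_series(df: pd.DataFrame, z_thresh: float = 4.0) -> pd.DataFrame:
    """Impute gaps, screen return outliers, and validate the trading-day index."""
    df = df.sort_index()

    # Missing-value imputation: forward-fill prices, set missing volume to 0.
    price_cols = [c for c in df.columns if c != "Volume"]
    df[price_cols] = df[price_cols].ffill()
    if "Volume" in df.columns:
        df["Volume"] = df["Volume"].fillna(0)

    # Outlier detection: drop days whose return z-score exceeds the threshold.
    returns = df["Close"].pct_change()
    z = (returns - returns.mean()) / returns.std()
    df = df[(z.abs() < z_thresh) | z.isna()]

    # Series validation: index must be unique and chronologically ordered.
    assert df.index.is_unique and df.index.is_monotonic_increasing
    return df
```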
The workflow is organized into five key stages:
(1) Data Preprocessing, where raw stock price data is
collected, cleaned, normalized, and structured into
time-series formats to ensure sequential consistency;
(2) Feature Engineering, which extracts relevant
financial indicators and technical features while
employing dimensionality reduction techniques to
enhance computational efficiency; (3) Model
Training, involving the training of Random Forest
(RF), XGBoost, and Gradient Boosting Decision
Trees (GBDT) models on historical data, with
hyperparameter tuning to optimize predictive
performance; (4) Performance Evaluation, where
models are assessed using metrics such as R², mean
absolute error (MAE), and trend stability (S),
alongside volatility tracking and error distribution
analysis to determine reliability; and (5) Predictive
Analysis, where the best-performing model (RF) is
applied to generate short-term and long-term
forecasts, with trend-capture rate analysis confirming
its robustness. In addition, this structure ensures data integrity while providing more accurate and more stable predictions across the majority of financial forecasting applications.
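As an illustration of stages (3) and (4), the sketch below trains the three candidate models with scikit-learn and XGBoost and reports R², MAE, and a simple directional trend-stability proxy; the hyperparameter values and the definition of the stability measure S shown here are assumptions, since the study's exact settings are not reproduced.

```python
# Sketch of stages (3)-(4): train RF, XGBoost, and GBDT on a chronological
# split and compare R², MAE, and a directional trend-stability proxy.
# Hyperparameters and the stability definition are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import r2_score, mean_absolute_error
from xgboost import XGBRegressor


def trend_stability(y_true, y_pred) -> float:
    """Share of days on which the predicted price move matches the actual move."""
    true_dir = np.sign(np.diff(np.asarray(y_true)))
    pred_dir = np.sign(np.diff(np.asarray(y_pred)))
    return float((true_dir == pred_dir).mean())


MODELS = {
    "RF": RandomForestRegressor(n_estimators=500, random_state=42),
    "XGBoost": XGBRegressor(n_estimators=500, learning_rate=0.05, random_state=42),
    "GBDT": GradientBoostingRegressor(n_estimators=500, learning_rate=0.05,
                                      random_state=42),
}


def evaluate(X_train, y_train, X_test, y_test) -> None:
    """Fit each model on the earlier period and score it on the later period."""
    for name, model in MODELS.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        print(name,
              f"R2={r2_score(y_test, pred):.3f}",
              f"MAE={mean_absolute_error(y_test, pred):.3f}",
              f"S={trend_stability(y_test, pred):.3f}")
```

In practice, the hyperparameter tuning of stage (3) would wrap these estimators in a time-series-aware search, for example scikit-learn's GridSearchCV combined with TimeSeriesSplit, so that later observations never leak into earlier training folds.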
2.3 Feature Construction
In terms of feature engineering, there are three main
components: technical analysis, fundamental
indicators, and temporal features. Its technical
features comprise a conventional price money metric
together with sectoral abnormalities. It contributes to
advancing the state-of-the-art in the integration of
supply chain dynamics, knowledge spillovers, and
supply chain performance with electricity-generating
sources and power transmission.
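The sketch below illustrates how such technical and temporal features can be derived from the cleaned trading data with pandas; the specific indicators and window lengths shown are assumptions rather than the study's exact feature set.

```python
# Hedged sketch of feature construction: a few standard technical and
# temporal features built from cleaned OHLCV data with a DatetimeIndex.
# Indicator choices and window lengths are illustrative assumptions.
import pandas as pd


def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive technical and temporal features from cleaned trading data."""
    feats = pd.DataFrame(index=df.index)

    # Technical features: moving averages, rolling volatility, and momentum.
    feats["ma_5"] = df["Close"].rolling(5).mean()
    feats["ma_20"] = df["Close"].rolling(20).mean()
    feats["volatility_20"] = df["Close"].pct_change().rolling(20).std()
    feats["momentum_10"] = df["Close"].pct_change(10)

    # Volume feature: current volume relative to its 20-day average.
    feats["vol_ratio_20"] = df["Volume"] / df["Volume"].rolling(20).mean()

    # Temporal features: calendar effects from the trading-day index.
    feats["day_of_week"] = df.index.dayofweek
    feats["month"] = df.index.month

    return feats.dropna()
```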