Authors:
Athiruj Poositaporn (1, 2) and Hanmin Jung (1, 2)
Affiliations:
1. University of Science and Technology, 217, Gajeong-ro, Yuseong-gu, Daejeon, Gyeonggi-do, Republic of Korea
2. Korea Institute of Science and Technology Information, 245, Daehak-ro, Yuseong-gu, Daejeon, Republic of Korea
Keyword(s):
Internet of Things, Pattern Prediction, Prediction Framework, Pattern Analysis, K-means Clustering.
Abstract:
Accurately predicting patterns from large and complex datasets remains a significant challenge, particularly
in environments where real-time predictions are crucial. Despite advancements in predictive modeling, there
remains a gap in effectively integrating clustering techniques with advanced similarity metrics to enhance
prediction accuracy. This research introduces a clustering-based pattern prediction framework integrating K-means with our Overall Difference with Crossover Penalty (OD with CP) similarity metric to predict data
patterns. In the experiment, we demonstrated its application in air pollution pattern prediction by comparing
15 different model-cluster combinations. We employed five predictive models: Euclidean Distance, Markov
Chain, XGBoost, Random Forest, and LSTM to predict the next day's pollution pattern across three cluster
sizes (K = 10, 20, and 30). Our aim was to address the limitation of traditional clustering methods in pattern
prediction by evaluating the performance of each model-cluster combination to determine the most accurate
predictions. The results showed that our framework identified the most accurate model-cluster combination.
Therefore, the study highlighted the generalizability of our framework and indicated its adaptability in pattern
prediction. In the future, we aim to apply our framework to a Large Language Model (LLM) combined with
Retrieval Augmented Generation (RAG) to enhance in-depth result interpretation. Furthermore, we intend to
expand the study to include a client engagement strategy to further validate the effectiveness of our study in
real-world applications.
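
The pipeline described in the abstract (cluster daily patterns, then predict the next day's cluster) can be illustrated with a minimal sketch. It is not the authors' implementation. The OD with CP metric is not defined in this abstract, so plain squared Euclidean distance is swapped in. Only the Markov Chain model is sketched, as first-order transition counts. The data are synthetic, K = 3 is used instead of 10, 20, or 30, and the farthest-point initialisation and all function names (`sq_dist`, `kmeans`, `fit_transitions`, `predict_next`) are assumptions made for this illustration.

```python
from collections import Counter, defaultdict

def sq_dist(a, b):
    """Squared Euclidean distance (a stand-in; OD with CP is not specified here)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50):
    """Lloyd's K-means with a deterministic farthest-point initialisation."""
    # Assumption: start from the first point, then repeatedly add the point
    # farthest from all centroids chosen so far (keeps the sketch reproducible).
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        # Assignment step: each pattern goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def fit_transitions(labels):
    """Count first-order transitions between consecutive days' clusters."""
    counts = defaultdict(Counter)
    for today, tomorrow in zip(labels, labels[1:]):
        counts[today][tomorrow] += 1
    return counts

def predict_next(counts, today):
    """Most frequent successor of today's cluster; fall back to persistence."""
    if today not in counts:
        return today
    return counts[today].most_common(1)[0][0]

# Synthetic daily patterns (four readings per day) cycling low -> mid -> high.
regimes = [[10, 12, 11, 10], [50, 55, 52, 50], [90, 95, 92, 90]]
days = [[v + (d % 5) - 2 for v in regimes[d % 3]] for d in range(30)]

labels, centroids = kmeans(days, k=3)
counts = fit_transitions(labels)
predicted = predict_next(counts, labels[-1])
print("last day's cluster:", labels[-1], "-> predicted next cluster:", predicted)
```

Comparing model-cluster combinations, as the abstract describes, would amount to repeating this for each value of K and each predictive model and scoring the predicted clusters against held-out days.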