Considering the macro average and weighted
average values, the enhanced model demonstrates
notable gains in precision, recall, and F1-score. The
macro average precision increased from 23% to 55%
(a gain of 32 percentage points) compared to the
original model, while the weighted average precision
rose from 31% to 64% (33 percentage points). This
suggests that the enhanced model adapts more
effectively to complex textual data and achieves
better classification results even when the category
distribution is imbalanced.
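To make the distinction between the two averages concrete, the following minimal sketch computes per-class precision and then the macro average (every class counts equally) and the weighted average (each class is weighted by its number of true samples). The label lists and function names are hypothetical and chosen only for illustration; this is not the evaluation code used in this study and does not reproduce the reported figures.

```python
from collections import Counter


def per_class_precision(y_true, y_pred):
    """Return {label: precision} for every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    predicted = Counter(y_pred)  # how often each label was predicted
    true_pos = Counter(p for t, p in zip(y_true, y_pred) if t == p)
    # A label that is never predicted gets precision 0.0 by convention.
    return {
        c: (true_pos[c] / predicted[c] if predicted[c] else 0.0)
        for c in labels
    }


def macro_and_weighted_precision(y_true, y_pred):
    """Return (macro average, support-weighted average) of precision."""
    prec = per_class_precision(y_true, y_pred)
    support = Counter(y_true)  # number of true samples per label
    macro = sum(prec.values()) / len(prec)
    weighted = sum(prec[c] * support[c] for c in prec) / len(y_true)
    return macro, weighted


# Hypothetical, deliberately imbalanced toy labels (illustration only).
y_true = ["INFP"] * 6 + ["ENTJ"] * 2 + ["ISTP"] * 2
y_pred = ["INFP"] * 5 + ["ENTJ"] + ["ENTJ", "INFP"] + ["INFP", "ISTP"]

macro, weighted = macro_and_weighted_precision(y_true, y_pred)
print(f"macro precision:    {macro:.3f}")
print(f"weighted precision: {weighted:.3f}")
```

Because the weighted average is dominated by the frequent classes, a gain in both averages at once is consistent with improvement on rare as well as common categories.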
Figure 6: Detailed Output Data Report for Random Forest
Models.
6 CONCLUSIONS
The enhanced KNN model shows notable
advancements in text preprocessing, feature
extraction, and model selection, leading to a
substantial increase in MBTI prediction accuracy.
While some categories still exhibit lower precision
and recall, comparison with the other models shows
a clear improvement in overall performance,
including higher classification accuracy and
improved generalization. This suggests that the
optimization strategy used in this study effectively
enhances the KNN model’s ability to classify
complex text data, making it more suitable for
predicting user MBTI in intricate text scenarios.