Improving Machine Learning Methods to Enhance Prediction Accuracy of MBTI Dataset
Kaitao Yan
2024
Abstract
This study presents an optimized machine learning approach that improves the accuracy and generalization of Myers-Briggs Type Indicator (MBTI) personality-type prediction on the MBTI dataset from Kaggle. Improvements to several modules (data preprocessing, feature engineering, model selection, and training methods) raised the accuracy of the original K-Nearest Neighbors (KNN) model from 30% to 45%. The key enhancements are the use of a Term Frequency-Inverse Document Frequency vectorizer (TfidfVectorizer) instead of a CountVectorizer for more precise feature extraction, refined text processing through a customized stop word list and pattern-based tokenization, and optimized data processing. In addition, a comparative analysis against other classification models, such as Support Vector Machines (SVMs) and Random Forest, is conducted to validate the performance of the improved KNN model across several metrics. The gains of the enhanced KNN model underscore the effectiveness of the optimization strategy in improving the original model's ability to predict MBTI personality types accurately.
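The abstract names the main ingredients of the improved pipeline: TF-IDF features instead of raw term counts, a custom stop word list, pattern-based tokenization, and a KNN classifier. The sketch below is a minimal, dependency-free Python illustration of how those pieces can fit together; it is not the author's code. The stop word list, the regular expressions, the use of cosine similarity, k = 3, and the toy posts are all assumptions chosen for illustration. The IDF uses the smoothed form ln((1 + n) / (1 + df)) + 1 followed by L2 normalization, which matches scikit-learn's TfidfVectorizer defaults.

```python
import math
import re
from collections import Counter

# ASSUMPTION: a hypothetical custom stop word list (a few common words plus
# some MBTI type labels); the paper's actual list is not given here.
CUSTOM_STOP_WORDS = {"the", "and", "to", "of", "is", "for", "with",
                     "intj", "infj", "entp", "enfp"}

# ASSUMPTION: hypothetical patterns. URLs are removed first, then only runs
# of two or more letters are kept as tokens (digits/punctuation are dropped).
URL_PATTERN = re.compile(r"https?://\S+")
TOKEN_PATTERN = re.compile(r"[a-z]{2,}")


def tokenize(text):
    """Lowercase, strip URLs, apply the token pattern, drop stop words."""
    text = URL_PATTERN.sub(" ", text.lower())
    return [t for t in TOKEN_PATTERN.findall(text) if t not in CUSTOM_STOP_WORDS]


def fit_idf(docs):
    """Smoothed IDF per term: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Raw term counts times IDF, L2-normalized; terms unseen in training are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(u, v):
    # Both vectors are unit length, so their dot product is the cosine similarity.
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def knn_predict(train_vecs, train_labels, query_vec, k=3):
    """Majority vote among the k training documents most similar to the query."""
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(train_vecs[i], query_vec),
                    reverse=True)
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy, made-up posts (not from the Kaggle dataset), purely for illustration.
train_texts = [
    "As an INTJ I plan every detail and schedule ahead",
    "Planning a schedule for every project detail",
    "Spontaneous parties with new friends https://example.com/x",
    "Meeting new friends at spontaneous parties is fun",
]
train_labels = ["INTJ", "INTJ", "ENFP", "ENFP"]

train_tokens = [tokenize(t) for t in train_texts]
idf = fit_idf(train_tokens)
train_vecs = [tfidf_vector(toks, idf) for toks in train_tokens]

for query in ["I love to schedule and plan every detail",
              "new friends and spontaneous parties"]:
    query_vec = tfidf_vector(tokenize(query), idf)
    print(query, "->", knn_predict(train_vecs, train_labels, query_vec, k=3))
```

Compared with raw counts (what a CountVectorizer produces), the IDF factor down-weights terms that appear in many posts, so distances are driven more by distinctive vocabulary. With scikit-learn, a roughly comparable setup would pass the stop word list and a regular expression to TfidfVectorizer through its `stop_words` and `token_pattern` parameters and feed the result to `KNeighborsClassifier` (for example with `metric="cosine"`); swapping in `SVC` or `RandomForestClassifier` on the same features gives the kind of model comparison the abstract describes.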
Paper Citation
in Harvard Style
Yan K. (2024). Improving Machine Learning Methods to Enhance Prediction Accuracy of MBTI Dataset. In Proceedings of the 2nd International Conference on Data Analysis and Machine Learning - Volume 1: DAML; ISBN 978-989-758-754-2, SciTePress, pages 466-472. DOI: 10.5220/0013526200004619
in Bibtex Style
@conference{daml24,
author={Kaitao Yan},
title={Improving Machine Learning Methods to Enhance Prediction Accuracy of MBTI Dataset},
booktitle={Proceedings of the 2nd International Conference on Data Analysis and Machine Learning - Volume 1: DAML},
year={2024},
pages={466-472},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013526200004619},
isbn={978-989-758-754-2},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 2nd International Conference on Data Analysis and Machine Learning - Volume 1: DAML
TI - Improving Machine Learning Methods to Enhance Prediction Accuracy of MBTI Dataset
SN - 978-989-758-754-2
AU - Yan K.
PY - 2024
SP - 466
EP - 472
DO - 10.5220/0013526200004619
PB - SciTePress