Comparison of Machine Learning Methods in Performance of Phishing Website Classification
Jiahan Xie
2025
Abstract
This article builds three models, using the Decision Tree, Random Forest, and Gradient Boosting methods, and evaluates their performance in phishing website classification. Model training and comparison are based on a phishing website dataset from Mendeley Data, to which thorough data preprocessing and feature selection are applied to ensure the quality of the model evaluation, including the handling of erroneous features and the encoding of domain-related variables. Hyperparameters are then tuned using grid search and a hybrid two-stage search approach based on cross-validation. On the test set, the Gradient Boosting model achieves the best performance across multiple evaluation metrics, with Random Forest a close alternative. This result indicates that ensemble learning methods can build more efficient classifiers than traditional machine learning methods. The study provides guidance for selecting classification models for phishing websites and is expected to be helpful in future research on other ensemble machine learning models and deep learning models.
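The workflow summarized above can be illustrated with a minimal scikit-learn sketch. It is not the paper's implementation: the data is a synthetic stand-in generated with make_classification rather than the Mendeley dataset, the parameter grids are arbitrary, and the "hybrid two-stage" search is assumed here to mean a coarse randomized search followed by a fine grid search around the best coarse result, since the abstract does not specify the scheme. The helper names grid_search, two_stage_search, and refine_around are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic binary data stands in for the phishing dataset.
X, y = make_classification(n_samples=600, n_features=20, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def grid_search(estimator, grid):
    """Exhaustive grid search scored by 3-fold cross-validated F1."""
    search = GridSearchCV(estimator, grid, cv=3, scoring="f1")
    return search.fit(X_train, y_train)


def refine_around(best):
    """Build a fine grid around the best coarse parameters (hypothetical helper)."""
    n = best["n_estimators"]
    return {
        "n_estimators": [max(10, n - 10), n, n + 10],
        "max_depth": [best["max_depth"]],
    }


def two_stage_search(estimator, coarse_space):
    """Assumed hybrid scheme: coarse randomized search, then a fine grid search."""
    coarse = RandomizedSearchCV(
        estimator, coarse_space, n_iter=4, cv=3, scoring="f1", random_state=0
    )
    coarse.fit(X_train, y_train)
    return grid_search(estimator, refine_around(coarse.best_params_))


searches = {
    "Decision Tree": grid_search(
        DecisionTreeClassifier(random_state=0),
        {"max_depth": [3, 5, 8], "min_samples_leaf": [1, 5]},
    ),
    "Random Forest": two_stage_search(
        RandomForestClassifier(random_state=0),
        {"n_estimators": [20, 40, 60], "max_depth": [4, 8]},
    ),
    "Gradient Boosting": two_stage_search(
        GradientBoostingClassifier(random_state=0),
        {"n_estimators": [20, 40, 60], "max_depth": [2, 3]},
    ),
}

# Evaluate each tuned model on the held-out test set.
results = {}
for name, search in searches.items():
    pred = search.best_estimator_.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "f1": f1_score(y_test, pred),
        "params": search.best_params_,
    }
    print(f"{name}: accuracy={results[name]['accuracy']:.3f}, f1={results[name]['f1']:.3f}")
```

The scores printed on synthetic data say nothing about the ranking reported in the paper; the sketch only shows how the three classifiers can be tuned by cross-validation and then compared on a held-out test set.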
Paper Citation
in Harvard Style
Xie J. (2025). Comparison of Machine Learning Methods in Performance of Phishing Website Classification. In Proceedings of the 2nd International Conference on Innovations in Applied Mathematics, Physics, and Astronomy - Volume 1: IAMPA; ISBN 978-989-758-774-0, SciTePress, pages 434-444. DOI: 10.5220/0013827200004708
in Bibtex Style
@conference{iampa25,
author={Jiahan Xie},
title={Comparison of Machine Learning Methods in Performance of Phishing Website Classification},
booktitle={Proceedings of the 2nd International Conference on Innovations in Applied Mathematics, Physics, and Astronomy - Volume 1: IAMPA},
year={2025},
pages={434-444},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013827200004708},
isbn={978-989-758-774-0},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 2nd International Conference on Innovations in Applied Mathematics, Physics, and Astronomy - Volume 1: IAMPA
TI - Comparison of Machine Learning Methods in Performance of Phishing Website Classification
SN - 978-989-758-774-0
AU - Xie J.
PY - 2025
SP - 434
EP - 444
DO - 10.5220/0013827200004708
PB - SciTePress