Comparison of Machine Learning Methods in Performance of Phishing Website Classification

Jiahan Xie

2025

Abstract

This article builds three models for phishing website classification, using Decision Tree, Random Forest, and Gradient Boosting methods, and compares their performance. Training and comparison are based on a dataset from Mendeley Data. To ensure the quality of the evaluation, the dataset first undergoes thorough preprocessing and feature selection, including the handling of erroneous features and the encoding of domain-related variables. Hyperparameters are then tuned with grid search and a hybrid two-stage search approach, both based on cross-validation. The Gradient Boosting model achieves the best performance on the test set across multiple evaluation metrics, with Random Forest a close alternative. This result indicates that ensemble learning methods can build more effective classifiers than traditional machine learning methods. The study provides guidance for selecting classification models for phishing websites and is expected to be helpful in future research on other ensemble machine learning models and deep learning models.
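The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the author's code: the synthetic data from `make_classification` stands in for the Mendeley dataset, the parameter grids are hypothetical and deliberately small, and only plain grid search is shown (the hybrid two-stage search is not reproduced here).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the phishing dataset
# (assumption: numeric features, binary phishing/legitimate label).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical search spaces; the paper's actual grids are not given here.
candidates = {
    "Decision Tree": (
        DecisionTreeClassifier(random_state=0),
        {"max_depth": [3, 5, None]},
    ),
    "Random Forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [5, None]},
    ),
    "Gradient Boosting": (
        GradientBoostingClassifier(random_state=0),
        {"n_estimators": [50, 100], "learning_rate": [0.05, 0.1]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # 5-fold cross-validated grid search on the training split only.
    search = GridSearchCV(estimator, grid, cv=5, scoring="f1")
    search.fit(X_train, y_train)
    # The best configuration is refit on the full training split,
    # then scored once on the held-out test split.
    y_pred = search.predict(X_test)
    results[name] = {
        "best_params": search.best_params_,
        "accuracy": accuracy_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }

for name, r in results.items():
    print(f"{name}: accuracy={r['accuracy']:.3f}, f1={r['f1']:.3f}, "
          f"params={r['best_params']}")
```

Scores on synthetic data say nothing about the ranking reported in the paper; the sketch only shows the shape of the tune-then-evaluate loop, with the test split kept out of hyperparameter selection.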


Paper Citation


in Harvard Style

Xie J. (2025). Comparison of Machine Learning Methods in Performance of Phishing Website Classification. In Proceedings of the 2nd International Conference on Innovations in Applied Mathematics, Physics, and Astronomy - Volume 1: IAMPA; ISBN 978-989-758-774-0, SciTePress, pages 434-444. DOI: 10.5220/0013827200004708


in Bibtex Style

@conference{iampa25,
author={Jiahan Xie},
title={Comparison of Machine Learning Methods in Performance of Phishing Website Classification},
booktitle={Proceedings of the 2nd International Conference on Innovations in Applied Mathematics, Physics, and Astronomy - Volume 1: IAMPA},
year={2025},
pages={434-444},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013827200004708},
isbn={978-989-758-774-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 2nd International Conference on Innovations in Applied Mathematics, Physics, and Astronomy - Volume 1: IAMPA
TI - Comparison of Machine Learning Methods in Performance of Phishing Website Classification
SN - 978-989-758-774-0
AU - Xie J.
PY - 2025
SP - 434
EP - 444
DO - 10.5220/0013827200004708
PB - SciTePress