
2 LITERATURE SURVEY
Phishing attacks, which exploit human weaknesses to deceive users and steal private data, have become a serious problem in cybersecurity. In response, a large body of research has sought effective detection methods, and machine learning has become central to countering these attacks.
This section reviews a selection of research studies that apply the algorithms mentioned above and summarizes their findings:
Marwa Abd Al Hussein Qasim and Nahla Abbas Flayh (2025) examined machine learning techniques for phishing website identification. The study focuses on Support Vector Machines (SVM), Decision Trees, and Random Forests, which identify phishing websites from characteristics such as URL structure and page content. To improve detection accuracy, the authors stress the importance of feature selection and of dimensionality reduction methods such as Principal Component Analysis (PCA). They also discuss dataset preprocessing and performance evaluation metrics. The review underscores the potential of hybrid models to mitigate phishing threats and provide efficient protection for users against cyberattacks.
R. Jayaraj, A. Pushpalatha, K. Sangeetha, T. Kamaleshwar, S. Udhaya Shree, and Deepa Damodaran (2023) examined machine learning for phishing website identification. They emphasize Hybrid Ensemble Feature Selection (HEFS), which selects features by combining function perturbation and data perturbation techniques. The study introduces the Cumulative Distribution Function gradient (CDF-g) algorithm to improve feature subset generation and reduce overfitting. The authors also stress feature engineering and reduced classifier complexity as ways to raise phishing detection accuracy. Their findings suggest that hybrid approaches provide robust solutions for mitigating phishing threats.
Machikuri Santoshi Kumari, Chiguru Keerthi Priya, Gondhi Bhavya, Haridas Tota, Monisha Awasthi, and Surendra Tripathi (2023) examined machine learning for phishing URL detection. To improve detection accuracy, they propose a methodology that combines blacklisting with boosting. The method is trained on a large dataset of labeled URLs and evaluated with precision, recall, and F1 score. The authors highlight the roles of data collection, URL preprocessing, feature extraction, and blacklisting in the detection system, and suggest future improvements, such as deep learning and larger datasets, to make phishing detection more robust.
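The two-stage idea described above, a blacklist lookup followed by a learned model, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `BLACKLIST` hostnames are hypothetical placeholders, and `phishing_score` is a hypothetical hand-written heuristic that stands in for the trained boosting model, which the sketch does not reproduce.

```python
from urllib.parse import urlparse

# Hypothetical blacklist of known phishing hostnames (placeholders only).
BLACKLIST = {"paypa1-login.example", "secure-bank-verify.example"}


def phishing_score(url: str) -> float:
    """Hypothetical stand-in for a trained boosting model's phishing score."""
    score = 0.0
    if "@" in url:                      # '@' can hide the real host
        score += 0.4
    if len(url) > 75:                   # unusually long URLs
        score += 0.3
    if not url.startswith("https://"):  # no TLS
        score += 0.3
    return score


def classify(url: str, threshold: float = 0.5) -> str:
    host = urlparse(url).hostname or ""
    # Stage 1: a cheap exact lookup rejects already-known phishing hosts.
    if host in BLACKLIST:
        return "phishing"
    # Stage 2: unseen URLs fall through to the (stand-in) model score.
    return "phishing" if phishing_score(url) >= threshold else "legitimate"


print(classify("https://paypa1-login.example/account"))  # phishing (blacklisted)
print(classify("https://www.example.com/"))              # legitimate
print(classify("http://user@203.0.113.5/login"))         # phishing (score 0.7)
```

Checking the blacklist first lets known phishing hosts be rejected without running the model, while the model covers URLs that no list has seen yet.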
Ameya Chawla (2022) examined machine learning for phishing website identification. The study examines typical characteristics of phishing websites and builds a model to identify them. Several classifiers were trained on a dataset, including Random Forest, Decision Tree, Logistic Regression, K-Nearest Neighbors (KNN), and Artificial Neural Networks (ANN). A Max Vote Classifier combining Random Forest, ANN, and KNN achieved the highest accuracy, 97.73%, and Decision Tree and ANN also performed well. One practical way to implement the proposed solution is a web application that uses the trained model to analyze input URLs and identify phishing websites.
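A Max Vote Classifier is a form of hard (majority) voting: each base model predicts a label, and the most frequent label wins. A minimal, library-free sketch follows. The three rule functions are hypothetical stand-ins for trained Random Forest, ANN, and KNN models, and the feature names are assumptions made for illustration.

```python
from collections import Counter
from typing import Any, Callable, Dict, Sequence

Features = Dict[str, Any]
Classifier = Callable[[Features], str]


def max_vote(classifiers: Sequence[Classifier], sample: Features) -> str:
    """Return the label predicted by the most base classifiers (hard voting)."""
    votes = Counter(clf(sample) for clf in classifiers)
    # most_common(1) yields [(label, count)]; ties go to the label seen first.
    label, _count = votes.most_common(1)[0]
    return label


# Hypothetical stand-ins for trained base models (e.g. RF, ANN, KNN).
def long_url_rule(x: Features) -> str:
    return "phishing" if x["url_length"] > 75 else "legitimate"


def https_rule(x: Features) -> str:
    return "legitimate" if x["uses_https"] else "phishing"


def at_symbol_rule(x: Features) -> str:
    return "phishing" if x["has_at_symbol"] else "legitimate"


sample = {"url_length": 90, "uses_https": True, "has_at_symbol": True}
print(max_vote([long_url_rule, https_rule, at_symbol_rule], sample))  # phishing (2 of 3 votes)
```

With an odd number of binary voters a tie cannot occur; with an even number, the tie-breaking rule in `max_vote` becomes a design decision worth making explicit.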
Sibel Kapan and Efnan Sora Gunal (2023) studied machine learning for phishing attack detection. They created a new phishing dataset that combines URLs from Alexa and PhishTank. The study found that URL and HTTP features gave the best performance, with the decision tree classifier achieving an F1-score of 0.99. The paper highlights the significance of feature engineering and classifier choice for improving phishing detection.
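Since several of the surveyed studies report precision, recall, and F1-score, a short sketch of how these metrics follow from binary predictions may help when comparing their results. F1 is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The labels and example predictions below are invented for illustration and are not data from any surveyed study.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str = "phishing"
) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false alarms
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # missed phishing
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented example: 3 phishing and 2 legitimate URLs.
y_true = ["phishing", "phishing", "phishing", "legitimate", "legitimate"]
y_pred = ["phishing", "phishing", "legitimate", "phishing", "legitimate"]
print(precision_recall_f1(y_true, y_pred))  # TP=2, FP=1, FN=1, so each value is about 0.667
```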
Arathi Krishna V, Anusree A, Blessy Jose, Ruthika Anilkumar, and Ojus Thomas Lee (2021) reviewed phishing detection models based on machine learning URL analysis. The paper compares the performance and accuracy of many machine learning methods for phishing URL identification. It finds that Random Forest often outperforms other models, but notes that performance varies with factors such as the dataset, the train-test split ratio, and feature selection. The authors call for further research to optimize detection models for both accuracy and efficiency.
Jinu Kulkarni and Leonard L. Brown III (2019) examined machine learning for phishing website identification. The study evaluated several methods for improving the accuracy of phishing detection. Neural networks and support vector machines (SVM) were tested on a dataset of 1,353 URLs labeled as phishing, suspicious, or legitimate. These classifiers achieved accuracies above 90%, and features such as SSL, web traffic, and URL length proved crucial for detection. The authors stressed the need to address problems such as overfitting in Decision Tree classifiers to improve robustness, and highlighted the growing role of machine learning in distinguishing legitimate websites from phishing ones.
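Several of the surveyed studies derive features from the URL itself, such as URL length and SSL use. The sketch below extracts a few such lexical features with Python's standard `urllib.parse` and `ipaddress` modules. The feature set is an assumption chosen for illustration, not the feature list of any surveyed paper. `uses_https` is only a rough proxy for SSL because it does not validate the certificate, and web-traffic features need external data that a URL alone cannot provide.

```python
import ipaddress
from urllib.parse import urlparse


def url_features(url: str) -> dict:
    """Extract a few illustrative lexical features from a URL string."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # A raw IP address in place of a domain name is a common phishing signal.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False
    return {
        "url_length": len(url),
        "uses_https": parsed.scheme == "https",  # proxy only; no certificate check
        "has_at_symbol": "@" in url,
        "host_is_ip": host_is_ip,
        "host_dot_count": host.count("."),
        "has_hyphen_in_host": "-" in host,
    }


print(url_features("http://user@203.0.113.5/login"))
```

Each returned dictionary can serve as one row of a feature matrix for classifiers such as those surveyed above.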
ICRDICCT‘25 2025 - INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION,
COMMUNICATION, AND COMPUTING TECHNOLOGIES