2 RELATED WORKS
Phishing attack detection has continued to be an
important field of study due to the evolution of
increasingly sophisticated cyberattacks. Machine
learning approaches to phishing detection have
made extensive use of Support Vector Machines (SVM)
and k-Nearest Neighbors (KNN). However, such models
often generalize poorly because they are trained on
labeled data and learn little about unknown or
unseen URLs. To address these shortcomings,
researchers have emphasized rule-based systems and
hybrid systems that integrate machine learning with
rule extraction for increased flexibility,
efficiency, and interpretability.
(M. SatheeshKumar et al., 2022), for instance,
proposed a rule-based phishing detection system that
examines URL, domain, and page attributes
separately, without blacklists, to improve
real-time detection of zero-day phishing attacks.
Likewise, (Youness Mourtaji et al., 2021)
suggested a hybrid solution that combines rule-based
methods with Convolutional Neural Networks
(CNNs), using multiple viewpoints to enhance
detection performance. Hybrid methods address the
interpretability problem of machine learning models
by pairing explainable rules with the predictive
power of deep learning. Because traditional machine
learning-based phishing detectors depend on labeled
datasets, they are vulnerable to adaptive phishing
attacks. (Asif Ejaz et al., 2023) described how
attackers exploit vertical feature spaces to bypass
detection, and proposed Anti-Subtle Phish, which
employs horizontal feature spaces to improve
robustness. (Fadi Thabtah et al., 2021), in turn,
developed Phish
Alert, a browser plugin that draws rules from trained
machine learning models for real-time anomaly
detection, thus leveraging the advantages of machine
learning and rule-based filtering. Case-based
reasoning (CBR) is another technique that has gained
prominence in phishing detection. (Lizhen Tang and
Qusay H. Mahmoud, 2021) proposed a CBR-based
phishing detection system that draws on previous
phishing attacks to identify new threats with
minimal reliance on labeled data. (Nureni A et al.,
2022) also proposed a fuzzy deep neural network
model that optimizes phishing detection rules,
improving classification efficiency while
maintaining high accuracy. Other researchers have
investigated rule-based approaches beyond those
described above. (Hassan Abutair et al., 2019)
applied association rule mining to generate
phishing URL patterns without relying on large
training datasets, so the approach can accommodate
emerging attack variations more easily.
Moreover, (M. Sathish Kumar et al.,
2021) systematically reviewed the application of deep
learning to phishing detection, noted the need for
more explainable models, and reaffirmed the value of
rule-based methods. Although machine learning and
deep learning solutions have achieved high detection
rates, their black-box nature poses a challenge to
adoption in cybersecurity.
(S. Carolin Jeeva et al., 2016) highlighted
this limitation in a review of phishing detection
techniques, calling for the integration of nature-
inspired algorithms and rule-based systems to
improve interpretability. (Cagatay Catal et al., 2022)
also contributed to this discussion by creating an
anti-phishing browser engine that combines Random
Forest with a rule extraction framework to make
phishing detection decisions transparent.
Overall, the surveyed work suggests that combining
rule-based systems with advanced phishing detection
mechanisms is a highly effective approach to
strengthening security. While deep learning models
continue to push detection capabilities, rule-based
systems are more explainable and can be adapted
rapidly, making them an attractive alternative or
complement to machine learning-based detection
techniques. Consequently, future work is expected to
focus on refining hybrid methods and applying
explainable AI (XAI) techniques to strengthen
phishing prevention.
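To make the explainability argument above concrete,
the following minimal Python sketch shows how a
rule-based check can report exactly which rules led
to a decision. It is not taken from any of the cited
systems: the rule set, the keyword list, the length
limit of 75 characters, and the two-rule threshold
are all assumptions chosen purely for illustration.

```python
import re
from urllib.parse import urlparse

# Hypothetical keyword list, chosen only for illustration.
SUSPICIOUS_KEYWORDS = ("login", "verify", "secure", "update", "account")

# Assumed length limit; not a value taken from the cited works.
MAX_URL_LENGTH = 75


def host_is_ip(url: str) -> bool:
    """True when the host part is a dotted-quad IPv4 address."""
    host = urlparse(url).hostname or ""
    return re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host) is not None


def has_at_symbol(url: str) -> bool:
    """True when the URL contains '@', which can hide the real host."""
    return "@" in url


def has_suspicious_keyword(url: str) -> bool:
    """True when any illustrative keyword appears in the URL."""
    lowered = url.lower()
    return any(keyword in lowered for keyword in SUSPICIOUS_KEYWORDS)


def is_too_long(url: str) -> bool:
    """True when the URL exceeds the assumed length limit."""
    return len(url) > MAX_URL_LENGTH


# Each rule pairs a human-readable description with a predicate.
RULES = [
    ("host is a raw IP address", host_is_ip),
    ("contains an '@' symbol", has_at_symbol),
    ("contains a suspicious keyword", has_suspicious_keyword),
    ("URL is unusually long", is_too_long),
]


def classify(url: str, threshold: int = 2):
    """Return a label and the descriptions of the rules that fired."""
    fired = [description for description, rule in RULES if rule(url)]
    label = "phishing" if len(fired) >= threshold else "legitimate"
    return label, fired


if __name__ == "__main__":
    for candidate in ("http://192.168.0.1/secure-login",
                      "https://example.com/about"):
        print(candidate, classify(candidate))
```

Because each verdict is returned together with the
descriptions of the rules that fired, an analyst can
trace a decision back to its cause, and covering a
new attack pattern amounts to appending one entry to
RULES without any retraining step.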
3 PROPOSED SYSTEM
This project proposes a rule-based system (RBS) for
phishing detection as an alternative to conventional
machine learning techniques. The RBS analyzes URLs
against predefined rules that consider suspicious
keywords, abnormal structures, and webpage
accessibility, so it can classify any URL without
depending on training data. Unlike machine learning
techniques, the system can therefore handle newly
created URLs that have not been seen before. It is
also efficient, with lower computational costs that
enable faster detection, and transparent, as
decisions are based on clear, human-readable rules.
The system is also easy to maintain: rules can be
modified or added to guard against new phishing
techniques,
making it more adaptable to new attacks. It is also
easier to deploy and maintain, as it avoids the need
for complex model training and requires only
infrequent dataset updates. By employing direct
webpage accessibility assessment to inform
detection, this approach enhances phishing detection
irrespective of historic data and circumvents some of