rapidly changing rumor patterns as
misinformation evolves. Poor generalization across
datasets makes it harder to keep detection accuracy
high. These models also require a huge range of labelled
datasets for training, making them difficult to deploy
in real-time, complex scenarios.
A well-known approach is network-based rumor
tracing, which pays more attention to the structural
characteristics of information diffusion. These
techniques utilize propagation models, such as
Susceptible-Infected-Recovered (SIR) or
Independent Cascade (IC), to track the dissemination
of false information on social networks. Community
detection and network centrality measures are then
used to identify influential nodes responsible for
spreading rumors. Though effective at examining
rumor propagation, these techniques are difficult to
scale to large social media platforms. Platforms such
as Twitter and Facebook generate massive volumes
of data, which require highly efficient computational
methods to analyze, and many traditional models do
not perform well enough to enable real-time analysis.
Moreover, network-based methods are often disrupted
by noise and fail to distinguish between organic viral
content and misinformation.
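The Independent Cascade process mentioned above can be sketched in a few lines of Python. The toy graph, seed set, and activation probability below are illustrative assumptions for exposition, not parameters from any particular study:

```python
import random

def independent_cascade(graph, seeds, p=0.3, rng=None):
    """Simulate one Independent Cascade run.

    graph: dict mapping node -> list of neighbour nodes
    seeds: initially active nodes (e.g. suspected rumor sources)
    p:     probability that an active node activates a neighbour
    """
    rng = rng or random.Random(0)
    active = set(seeds)      # nodes that have adopted the rumor
    frontier = list(seeds)   # newly activated nodes to process
    while frontier:
        next_frontier = []
        for node in frontier:
            for nb in graph.get(node, []):
                # each newly active node gets a single chance
                # to activate each inactive neighbour
                if nb not in active and rng.random() < p:
                    active.add(nb)
                    next_frontier.append(nb)
        frontier = next_frontier
    return active

# toy social graph (illustrative follower links)
g = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"],
     "d": ["f"], "e": [], "f": []}
reached = independent_cascade(g, seeds=["a"], p=1.0)
```

With `p=1.0` the cascade deterministically reaches every node downstream of the seed, which is useful as a sanity check; realistic analyses would instead average many runs at a small `p`.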
While existing systems have been at least partly
useful for detecting rumors, they all exhibit striking
limitations with regard to accuracy, scalability, and
false positive/negative ratios. These traditional
models are limited either by their dependence on
ontology-based query models or by network-based
tracking, and therefore fail to capture the complex
nature of rumor spread. As a result, high false-positive
rates lead to unnecessary flagging of content, while
false negatives allow harmful misinformation to
continue spreading undetected. Moreover,
conventional methods do not adapt to new trends in
misinformation, which renders them impractical in
the long run. These limitations demonstrate the need
for an improved approach that combines multiple
analytical techniques to increase the accuracy and
performance of internet rumor source identification.
4 PROPOSED SYSTEM
The proposed approach utilizes a combination of
well-established machine learning techniques and
network analysis to create a robust system for both
detecting and tracing the origins of fake news. This
method integrates three advanced models: BERT
(Bidirectional Encoder Representations from
Transformers), Random Forest, and LSTM (Long
Short-Term Memory). Each model contributes
uniquely to the system, enhancing its overall accuracy
and efficiency. By combining these methods, the
system effectively handles both textual data and
network-based patterns, which are essential for
identifying and tracking the spread of
misinformation.
BERT, a transformer-based model, plays an
essential role in understanding the relationships
within textual data. Unlike traditional machine
learning models that process text word by word,
BERT analyzes entire sentences, making it highly
effective in recognizing complex linguistic structures
and identifying subtle differences in meaning. Pre-
trained on large-scale text datasets, BERT grasps
intricate semantic and syntactic structures, making it
especially powerful for tasks like rumor detection,
where contextual understanding is vital for assessing
the credibility of information.
Random Forests, an ensemble learning technique,
improve the model's performance by classifying the
textual features extracted by BERT, thereby
enhancing prediction accuracy. Random Forests
create multiple decision trees, each trained on
different data subsets, and aggregate their outputs to
make a final prediction. This method reduces
overfitting and increases the model's ability to
generalize, making it more reliable in distinguishing
between rumors and factual content.
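The bootstrap-and-vote idea behind Random Forests can be illustrated with a toy ensemble. The one-feature threshold "stumps" and the invented data below stand in for full decision trees over BERT-derived features; they are a minimal sketch of the mechanism, not the actual classifier:

```python
import random
from collections import Counter

def train_stump(sample):
    """Fit a one-feature threshold stump (a stand-in for a decision tree):
    the threshold is the midpoint between the two class means."""
    pos = [x[0] for x, y in sample if y == 1]
    neg = [x[0] for x, y in sample if y == 0]
    if pos and neg:
        return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return 0.5  # fallback when a bootstrap sample misses a class

def forest_predict(thresholds, x):
    """Aggregate stump votes by majority, as a forest does with trees."""
    votes = [1 if x[0] >= t else 0 for t in thresholds]
    return Counter(votes).most_common(1)[0][0]

# toy labelled data: feature vector -> label (1 = rumor, 0 = factual)
data = [([0.9], 1), ([0.8], 1), ([0.2], 0), ([0.1], 0)]
rng = random.Random(42)
# bootstrap: each stump is trained on a sample drawn with replacement
thresholds = [train_stump([rng.choice(data) for _ in data])
              for _ in range(5)]
label = forest_predict(thresholds, [0.95])  # classify a new post
```

Because each tree sees a different bootstrap sample, individual errors tend to cancel in the vote, which is the source of the overfitting reduction described above.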
LSTM networks, a type of recurrent neural
network (RNN), are employed to capture the
temporal dependencies in data. As rumors typically
spread over time, analyzing the sequence in which
information is shared provides valuable insights into
its origin. LSTMs excel in maintaining long-term
dependencies in sequential data, which makes them
ideal for tracking the progression of rumors across
social networks. The integration of LSTMs helps the
system analyze how rumors evolve and trace their
spread back to the initial source.
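The gating mechanism that lets LSTMs maintain long-term dependencies can be shown with a single-unit cell stepped over a short sequence. The fixed toy weights and the share-count sequence below are invented for illustration; a real model would learn multi-dimensional weights from data:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, w):
    """One step of a single-unit LSTM cell.

    x: scalar input (e.g. rumor share count at this time step)
    h: previous hidden state;  c: previous cell state
    w: dict of (input, recurrent, bias) weights per gate
    """
    f = sigmoid(w["f"][0] * x + w["f"][1] * h + w["f"][2])    # forget gate
    i = sigmoid(w["i"][0] * x + w["i"][1] * h + w["i"][2])    # input gate
    o = sigmoid(w["o"][0] * x + w["o"][1] * h + w["o"][2])    # output gate
    g = math.tanh(w["g"][0] * x + w["g"][1] * h + w["g"][2])  # candidate
    c = f * c + i * g         # cell state carries long-term memory
    h = o * math.tanh(c)      # hidden state exposed to the next step
    return h, c

# fixed toy weights (not trained) and a short activity sequence
w = {k: (0.5, 0.5, 0.0) for k in ("f", "i", "o", "g")}
h = c = 0.0
for x in [0.1, 0.4, 0.9, 0.3]:  # rumor activity over four time steps
    h, c = lstm_step(x, h, c, w)
```

The additive update `c = f * c + i * g` is what allows information from early time steps to persist, which is the property that makes LSTMs suited to tracing a rumor back through its spreading history.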
In addition to these machine learning techniques,
the approach also incorporates network analysis to
study the structure of information dissemination.
Social networks are complex, and understanding how
information flows through them reveals key
influencers and the pathways along which rumors
spread. By applying centrality metrics such as degree,
betweenness, and closeness, the model can identify
influential nodes that amplify misinformation. This
network-based analysis enhances the system’s ability
to track the propagation of rumors, particularly those
amplified by specific users or groups.
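Two of the centrality metrics named above, degree and closeness, can be computed directly with breadth-first search on an unweighted graph; the "hub" network below is an invented toy example:

```python
from collections import deque

def bfs_distances(graph, source):
    """Shortest-path hop counts from source (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def degree_centrality(graph, node):
    # fraction of the other nodes this node is directly connected to
    return len(graph[node]) / (len(graph) - 1)

def closeness_centrality(graph, node):
    # (n - 1) divided by the sum of shortest-path distances to all others
    dist = bfs_distances(graph, node)
    total = sum(d for n, d in dist.items() if n != node)
    return (len(graph) - 1) / total

# toy undirected network: one account retweeted by most others
g = {
    "hub": ["u1", "u2", "u3"],
    "u1": ["hub"], "u2": ["hub"],
    "u3": ["hub", "u4"], "u4": ["u3"],
}
most_central = max(g, key=lambda n: degree_centrality(g, n))
```

Here the "hub" account scores highest on both metrics, matching the intuition that such nodes are the likely amplifiers of misinformation; betweenness centrality would additionally credit nodes that sit on many shortest paths.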
The combination of BERT, Random Forests, and