Weighted Ensemble Model for Tackling Fake News

Ananya Kohli, Divyashree Shetti, Sri Lakshmi G N, Vaishnavi Bhat and Shashank Hegde

School of Computer Science Engineering, KLE Technological University, Hubballi, India

Keywords:

Fake News Detection, Ensemble Learning, BERT, Classification Models, Weighted Averaging, Transformers.

Abstract:

Fake news detection has become crucial in resisting misinformation across multiple domains like social media,

news outlets, and public communications. Accurate classiﬁcation and sentiment analysis play a pivotal role

in addressing this challenge. Although traditional machine learning models have shown moderate success,

they face limitations in achieving high accuracy and adaptability when applied to diverse types of content.

To address this, a fake news detection model is proposed that evaluates the authenticity of news reports by
combining feature extraction with accuracy-based weighting of classifier outputs. The proposed study presents a robust

fake news detection model that combines BERT (Bidirectional Encoder Representations from Transformers)

embeddings with ensemble learning techniques. Eight machine learning classiﬁers - Logistic Regression,

SGD (Stochastic Gradient Descent), XGBoost (Extreme Gradient Boosting), SVM (Support Vector Machine),

Random Forest, AdaBoost (Adaptive Boosting), KNN (K-Nearest Neighbor) and Naive Bayes were trained on

an 80:20 train-validation split. Among the ensemble techniques compared (Majority Voting, Unweighted Averaging
and Weighted Averaging), Weighted Averaging proved to be the most accurate, with an accuracy of 94.8317%.
This is because each model's weight is normalized according to its validation accuracy,
making the model a reliable and adaptable solution to misinformation detection.

1 INTRODUCTION

In today’s digital age, misinformation spreads faster

than ever before, creating new challenges in how we

consume and trust the information around us. Detecting
fake news is not a simple task: falsehoods often

mirror the structure and tone of credible news, mak-

ing the lines between fact and ﬁction difﬁcult to dis-

cern, even for human readers. Researchers have long

sought to develop systems that can differentiate be-

tween true and false information based on patterns in

the text (Castillo et al., 2011). One such breakthrough

is BERT (Bidirectional Encoder Representations from

Transformers), which has revolutionized NLP (Natural
Language Processing) by capturing contextual relationships
in text with extraordinary accuracy (Devlin et al., 2018).
Unlike earlier models, BERT processes text bidirectionally,
allowing it to consider the full context of each word in a
sentence, making it exceptionally well-suited for understanding
the complexities of language. The proposed fake news

detection approach combines BERT’s robust embed-

dings with ensemble learning techniques.

The proposed approach is built on the premise

that no single model is perfect, but by combining the

predictions of multiple classiﬁers, we can achieve a

more reliable and accurate result. This is in line with the
"No Free Lunch" theorem, which states that no
single algorithm can outperform all others across all
types of problems, underscoring the necessity for tailored
solutions. This research therefore uses BERT embeddings
as the foundation, feeding them into a variety of classifiers,
including Logistic Regression, XGBoost, SVM and more.
Each classifier has been tuned with optimized hyperparameters
so that it performs at its best. Regularization techniques
are used to limit overfitting so that the developed framework
generalizes well to new, unseen data (Bengio et al., 2012).
Additionally, these trained classifiers are integrated into
ensemble techniques such as Majority Voting, Weighted Averaging
(with normalized weights assigned based on validation
accuracies) and Unweighted Averaging. This

method produces a system that is both highly accu-

rate and adaptable to the varied misinformation strate-

gies employed across different platforms, achieving

an impressive accuracy of 94.8317%. The outcome is

a fake news detection tool capable of evolving with

new challenges, maintaining its relevance and effectiveness
in an ever-changing information ecosystem.

Kohli, A., Shetti, D., Lakshmi G N, S., Bhat, V. and Hegde, S.
Weighted Ensemble Model for Tackling Fake News.
DOI: 10.5220/0013606400004664
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 3rd International Conference on Futuristic Technology (INCOFT 2025) - Volume 2, pages 878-884
ISBN: 978-989-758-763-4

However, the approach does face limitations. It is
resource-intensive because of the high computational
demands of BERT embeddings and ensemble learning
techniques, and its scalability is constrained when applied
to multilingual datasets or specialized domains, where
further refinement is necessary to ensure consistent performance.

Despite these challenges, the integration of ensem-

ble methods and BERT embeddings provides a robust

framework for combating misinformation, with po-

tential for real-world applications in media platforms,

fact-checking organizations, and beyond.

The paper is organized as follows: Section 2 reviews
existing fake news detection approaches, with a
focus on ensemble techniques, and discusses their
strengths and weaknesses. Section 3 introduces the
proposed methodology, detailing how BERT embeddings
are utilized to train a diverse set of models
using regularization techniques and how their predictions
are combined. Section 4 presents the experimental results
by comparing the accuracy of various models and identifying
the most successful approach. Finally, Section 5 concludes
the paper by summarizing the findings and reflecting on the
broader implications of the proposed approach.

2 BACKGROUND STUDY

The detection of fake news has signiﬁcantly evolved,

transitioning from traditional machine learning meth-

ods like Logistic Regression and SVM to advanced

deep learning approaches. Earlier methods relied on

linguistic features such as TF-IDF for classiﬁcation,

which performed well for straightforward tasks but

struggled with understanding deeper contextual rela-

tionships within text. The advent of deep learning,

particularly models like LSTM and BERT, revolution-

ized fake news detection by capturing semantic nu-

ances and bidirectional context in language. BERT,

with its robust contextual understanding, has greatly

improved classiﬁcation performance (Devlin et al.,

2018; Vaswani et al., 2017; Yang and Cui, 2021).

However, challenges such as overﬁtting on limited

datasets and domain adaptation issues hinder their

generalization (Jin et al., 2022; Wang et al., 2023).

Ensemble learning methods, which include Random
Forest and XGBoost, mitigate these limitations by

combining multiple models, reducing overﬁtting, and

enhancing robustness (Breiman, 2001; Chen et al.,

2016; Friedman, 2001). These methods also facili-

tate improved decision-making through diverse fea-

ture combinations, which is crucial for handling com-

plex and ambiguous fake news content. Additionally,

the integration of explainable AI (XAI) techniques

in ensemble models offers more transparent insights

into the decision-making process, further strengthen-

ing trust in automated systems (Gilpin et al., 2018).

The increase in fake news across social media

and digital platforms highlights the need for adaptable

systems capable of handling rapidly evolving content

types and domains. Traditional approaches relying on

handcrafted linguistic features such as n-grams, bag-

of-words, and syntactic structures (Joachims, 1998;

Salton, 1986) often fall short in addressing the com-

plexities of modern strategies for spreading false in-

formation. Deep learning models, including CNNs

and LSTMs, brought advancements by capturing hier-

archical and temporal patterns from text (Kim, 2014;

Hochreiter and Schmidhuber, 1997), yet they struggle

with diverse, noisy, or domain-speciﬁc datasets. Cur-

rent research emphasizes hybrid methods that com-

bine the powerful feature extraction of models like

BERT with ensemble strategies. These approaches

provide scalability and adaptability for misinforma-

tion detection across varied contexts (Zhang and Bao,

2020; Jiang et al., 2021; Zhou et al., 2020).

Despite the accuracy gains of deep learning mod-

els like BERT, they are computationally expensive

and prone to overﬁtting, particularly on imbalanced

datasets (Vaswani et al., 2017; Czapla et al., 2019).

Additionally, their "black-box" nature raises concerns
about interpretability and trust (Gilpin et al., 2018;
Ribeiro et al., 2016). To address these issues, the
proposed research introduces a novel ensemble learning
approach that integrates BERT with simpler classifiers
such as Logistic Regression, SVM, XGBoost and others.
This ensemble reduces reliance on any single model,
improving both generalization and computational efficiency
while enhancing transparency and robustness. Further,
feature importance analysis using XGBoost is incorporated
to provide greater model explainability (Caruana and
Niculescu-Mizil, 2006; Ribeiro et al., 2016).
By combining SGD and Naive Bayes, this

approach ensures scalability and better performance

in dynamic, high-dimensional and real-time environ-

ments (Freund and Schapire, 1997; McCallum and

Nigam, 1998). This sets the stage for the proposed

methodology, where the aim is to leverage these in-

sights and integrate various models to build a more

effective and efﬁcient fake news detection system.


3 PROPOSED METHODOLOGY

The proposed methodology follows a structured

workﬂow that integrates BERT embeddings with tra-

ditional machine learning models to enhance fake

news detection. The process begins with cleaning
the dataset of any invalid values, followed by dividing
it into training and validation sets, which are
then passed through BERT for embedding generation.
The BERT model processes the tokenized input and
produces context-aware embeddings (equation 1),
which are stored for efficient retrieval and usage.
These embeddings capture the textual features
of the news articles and serve as input for various
machine learning classifiers, including Logistic
Regression, SGD (Stochastic Gradient Descent),
XGBoost (Extreme Gradient Boosting), SVM (Support
Vector Machine), Random Forest, AdaBoost (Adaptive
Boosting), KNN (K-Nearest Neighbor) and Naive
Bayes. The BERT embedding calculation is given by
equation 1:

$$E_{\text{output}} = f_{\text{BERT}}(\text{Tokenized Input}) \tag{1}$$

Once the embeddings are ready, the individual

models are trained on the dataset with an 80:20

split for training and validation. Later these trained

models are given as inputs to ensemble techniques

such as Majority Voting, Unweighted Averaging and

Weighted Averaging. The performance of the individual
and ensemble models is assessed using accuracy,

precision, F1 score and recall on both the training and

validation sets. These performance metrics are stored

for further analysis. The models’ validation accura-

cies are used to weigh their contributions in the en-

semble techniques. Speciﬁcally, for Weighted Aver-

aging in equation 2, normalized weights are assigned

based on the validation accuracies of the individual

models, allowing more accurate models to have a

greater influence in the final decision. The Weighted
Averaging calculation is given by equation 2:

$$\hat{y}_{\text{final}} = \frac{\sum_{m=1}^{M} w_m \cdot \hat{y}_m}{\sum_{m=1}^{M} w_m} \tag{2}$$

where $\hat{y}_m$ is the prediction and $w_m$ is the weight
associated with model $m$, and $M$ refers to the total number
of individual models used. Equation 2 calculates the final
prediction by summing the weighted predictions, normalizing
by the total weight, and applying a threshold (0.5) to get a
binary class.
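The weighted combination in equation 2 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes each model outputs a probability for the "fake" class, that scores at or above the 0.5 threshold map to label 1, and the function name `weighted_average_predict` and the example numbers are hypothetical.

```python
from typing import List, Sequence


def weighted_average_predict(
    probs_per_model: Sequence[Sequence[float]],
    weights: Sequence[float],
    threshold: float = 0.5,
) -> List[int]:
    """Combine per-model scores with a normalized weighted average.

    probs_per_model[m][i] is model m's score for sample i (assumed here to be
    the predicted probability of the "fake" class, label 1).
    """
    if not probs_per_model or len(probs_per_model) != len(weights):
        raise ValueError("need one weight per model")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    n_samples = len(probs_per_model[0])
    labels = []
    for i in range(n_samples):
        # Numerator of equation 2: sum over models of w_m * y_hat_m.
        weighted_sum = sum(w * p[i] for w, p in zip(weights, probs_per_model))
        score = weighted_sum / total_weight  # divide by the sum of weights
        # Assumption: scores at or above the threshold map to class 1.
        labels.append(1 if score >= threshold else 0)
    return labels


# Hypothetical scores from three models for three articles.
example_probs = [
    [0.9, 0.2, 0.6],
    [0.8, 0.4, 0.3],
    [0.1, 0.9, 0.4],
]
example_weights = [0.95, 0.90, 0.67]  # illustrative validation accuracies
example_labels = weighted_average_predict(example_probs, example_weights)
```

Dividing by the sum of the weights means the weights do not need to be pre-normalized; passing raw validation accuracies gives the same result as passing weights that already sum to one.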

Alongside Weighted Averaging, two additional
ensemble techniques, Majority Voting and Unweighted
Averaging, are applied to combine the predictions
from all models, providing a more generalized output.
In Majority Voting (equation 3), the final predicted
class is determined by the majority vote across all
models, while Unweighted Averaging (equation 4)
assigns equal importance to each of the input models.
The equations for Majority Voting (3) and Unweighted
Averaging (4) are

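$$\hat{y} = \arg\max_{c \in C} \sum_{m=1}^{M} \delta(y_m = c) \tag{3}$$

$$\hat{y}_{\text{final}} = \frac{1}{M} \sum_{m=1}^{M} \hat{y}_m \tag{4}$$

where $\delta$ is an indicator function for each model's
prediction, $y_m$ is the class label predicted by model $m$,
$C$ is the set of classes, and $\frac{1}{M}$ is the weight
associated with each model $m$.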
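For comparison, the two baseline ensembles (equations 3 and 4) can be sketched as follows. This is an illustrative sketch under stated assumptions: Majority Voting operates on hard labels, Unweighted Averaging operates on per-model probabilities for the "fake" class with a 0.5 threshold, and ties in the vote are broken toward the smaller label. The tie-breaking rule and the function names are hypothetical choices, not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(labels_per_model: Sequence[Sequence[int]]) -> List[int]:
    """Equation 3: pick the class predicted by the most models per sample."""
    n_samples = len(labels_per_model[0])
    result = []
    for i in range(n_samples):
        votes = Counter(model_labels[i] for model_labels in labels_per_model)
        top_count = max(votes.values())
        # Assumption: on a tie, choose the smallest class label.
        result.append(min(c for c, v in votes.items() if v == top_count))
    return result


def unweighted_average_predict(
    probs_per_model: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[int]:
    """Equation 4: every model contributes with equal weight 1/M."""
    n_models = len(probs_per_model)
    n_samples = len(probs_per_model[0])
    result = []
    for i in range(n_samples):
        mean_score = sum(p[i] for p in probs_per_model) / n_models
        result.append(1 if mean_score >= threshold else 0)
    return result


# Hypothetical hard labels from three models for three articles.
votes_example = majority_vote([[1, 0, 1], [1, 1, 0], [0, 1, 0]])
# Hypothetical probabilities from three models for two articles.
mean_example = unweighted_average_predict([[0.9, 0.2], [0.7, 0.3], [0.2, 0.1]])
```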

After the models are trained, they predict labels

for the test embeddings. The predicted labels from

each model are stored for performance evaluation. To

assess the overall effectiveness of the system, accu-

racy, precision, F1 score, and recall are computed not

only for the individual models but also for the ensem-

ble techniques.
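The four evaluation metrics can be computed directly from the predicted and true labels. The sketch below is a plain-Python illustration for the binary case, treating label 1 ("potentially fake") as the positive class. The paper does not state whether precision, recall and F1 are averaged across classes, so reporting them for the positive class only is an assumption, and `binary_metrics` is a hypothetical helper name.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for one positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels for six validation articles.
metrics_example = binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
```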

Figure 1: Proposed Ensemble Framework.

The dataset (Lifferth, 2018) consisted primarily
of textual data, comprising a training set with labeled
records and a test set with a similar structure
but no ground truth labels. The training data included
a unique id, titles, authors, and text content for analysis.
The final output incorporates ensemble predictions,
specifically using the columns y_pred_majority,
y_pred_unweighted and y_pred_weighted, which aggregate
model outputs through Majority Voting, Unweighted
Averaging and the proposed Weighted Averaging model,
respectively, to enhance prediction accuracy and reliability.

INCOFT 2025 - International Conference on Futuristic Technology


3.1 Proposed Architecture

The architecture shown in Figure 2 illustrates the

components and workﬂow of the entire ensemble

model. In this setup, the training and testing data are

input to BERT, which generates embeddings that are

then provided to traditional machine learning models.

The predictions from these models are fed into en-

semble classiﬁers. The accuracy of each ensemble

classiﬁer’s predictions is calculated, and the predic-

tions with the highest accuracy are selected as the ﬁ-

nal ensemble output.

Figure 2: Fake news detection ensemble architecture.

3.2 Proposed Algorithm

Algorithm 1 shows that this methodology integrates
BERT for feature extraction, followed by

training a range of classiﬁers and ﬁnally combines

their outputs using ensemble methods like Majority

Voting, Unweighted Averaging and Weighted Aver-

aging. This approach ensures a robust and accu-

rate fake news detection system which demonstrates

the strength of combining diverse model predictions

based on their validation performance.

Data: Textual train dataset labeled as reliable (0) or potentially fake (1), and a test dataset.
Result: Predicted labels for the test dataset and evaluation metrics.

Initialize train and test datasets;
Preprocess the data by removing invalid values or tuples;
Generate BERT embeddings for both datasets and save them;
while train-validation split is incomplete do
    Split train dataset into 80:20 train-validation;
    Train individual machine learning models on the training data;
    Compute validation accuracies for each model;
    if validation accuracy is acceptable then
        Store the model and its accuracy;
    else
        Re-adjust hyperparameters or preprocessing and re-train the models;
    end
end
Combine predictions using ensemble methods;
if ensemble method is Majority Voting then
    Assign labels based on the most frequent prediction;
else if ensemble method is Unweighted Averaging then
    Average predicted probabilities and assign labels;
else if ensemble method is Weighted Averaging then
    Use normalized validation accuracies as weights, compute weighted averages and assign labels;
end
Calculate validation accuracy for each of these ensemble methods and select the ensemble method with the highest accuracy;
Predict labels for the test dataset using the chosen ensemble method;
Evaluate results with accuracy, precision, recall, and F1-score;

Algorithm 1: Weighted Ensemble Model for Tackling Fake News
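The final selection step of the algorithm (score each ensemble method on the validation set and keep the best one) can be sketched as below. This is a hedged illustration, not the authors' code: it assumes each ensemble's validation predictions have already been computed, and the helper names `accuracy` and `select_best_ensemble` as well as the toy labels are hypothetical. The dictionary keys follow the output column names mentioned in the methodology.

```python
from typing import Dict, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples where the prediction matches the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def select_best_ensemble(
    y_val: Sequence[int], ensemble_preds: Dict[str, Sequence[int]]
) -> Tuple[str, Dict[str, float]]:
    """Return the ensemble name with the highest validation accuracy.

    Ties are resolved in favour of the first method in insertion order,
    which is an arbitrary choice made for this sketch.
    """
    scores = {name: accuracy(y_val, preds) for name, preds in ensemble_preds.items()}
    best_name = max(scores, key=scores.get)
    return best_name, scores


# Hypothetical validation labels and ensemble outputs for four articles.
y_val_example = [1, 0, 1, 0]
preds_example = {
    "y_pred_majority": [1, 0, 0, 0],
    "y_pred_unweighted": [0, 0, 0, 0],
    "y_pred_weighted": [1, 0, 1, 0],
}
best_example, scores_example = select_best_ensemble(y_val_example, preds_example)
```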


4 RESULTS AND ANALYSIS

The evaluation of the proposed fake news detection
system reveals significant insights into the performance

of individual machine learning models and ensemble

techniques. Leveraging BERT embeddings as fea-

ture representations, the models were assessed using

metrics such as accuracy, precision, recall and F1-

score. The results, summarized in Table 1, highlight

the comparative strengths of different approaches, in-

cluding the enhanced reliability achieved through en-

semble methods like Weighted Averaging.

Table 1: Performance Metrics for Models and Ensemble Techniques

Model                                 Accuracy  Precision  Recall    F1-Score
Logistic Regression                   0.954087  0.954106   0.954092  0.954086
SVM                                   0.946154  0.946419   0.946172  0.946147
KNN                                   0.891106  0.893549   0.891163  0.890946
XGBoost                               0.931490  0.931823   0.931511  0.931479
SGD                                   0.938942  0.941321   0.938996  0.938865
Random Forest                         0.891106  0.892637   0.891151  0.891007
AdaBoost                              0.842788  0.843168   0.842812  0.842751
Naive Bayes                           0.670433  0.720533   0.670776  0.650858
Majority Voting                       0.931010  0.933763   0.931067  0.930906
Unweighted Averaging                  0.859856  0.883128   0.859856  0.857666
Proposed model (Weighted Averaging)   0.948317  0.949070   0.948317  0.948297

Figure 3: Model accuracy comparison.

Table 1 presents the performance metrics of

individual machine learning models and ensemble

techniques for fake news detection. The models

were evaluated based on accuracy, precision, re-

call, and F1-score to determine their effectiveness.

Logistic Regression achieved the highest accuracy

(0.954087) among individual models, followed by

SVM (0.946154) and SGD (0.938942), indicating

strong classification capabilities. In contrast, Naive
Bayes had the lowest accuracy (0.670433), highlighting
its limitations in this context. Figure 3 further
visualizes the accuracy comparison, showing the proposed
model outperforming every individual classifier except
Logistic Regression.

The accuracy comparison among different models in
Figure 3 shows that Logistic Regression achieves the
highest accuracy (0.954087), followed by Weighted
Averaging (0.948317) and Support Vector Machine
(0.946154). While Logistic Regression performs best in
terms of individual model accuracy, Weighted Averaging
comes within about 0.6 percentage points of it while
combining the strengths of multiple models, with the aim
of more robust predictions than a single model provides.

The graph in Figure 4 compares the accuracy

of ensemble techniques such as Majority Voting,

Weighted Averaging, and Unweighted Averaging.

Weighted Averaging achieves the highest accuracy (0.948317),
while Majority Voting (0.931010) performs better than
Unweighted Averaging (0.859856).

Both Majority Voting and Unweighted Averaging un-

derperform because they treat all models equally, al-

lowing weaker models to inﬂuence the ﬁnal predic-

tions. In contrast, Weighted Averaging improves ac-

curacy by giving more weight to stronger models and

reducing the impact of weaker ones.

Figure 4: Ensemble techniques accuracy comparison
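To make the weighting concrete, the snippet below normalizes the individual-model accuracies from Table 1 into weights that sum to one. It is a worked illustration under an assumption: it treats the Table 1 accuracies as the validation accuracies used for weighting and uses simple division by their sum, which the paper does not spell out. Under that assumption the weights fall in a fairly narrow band, from roughly 0.095 for Naive Bayes to roughly 0.135 for Logistic Regression.

```python
# Individual-model accuracies as reported in Table 1.
table1_accuracies = {
    "Logistic Regression": 0.954087,
    "SVM": 0.946154,
    "KNN": 0.891106,
    "XGBoost": 0.931490,
    "SGD": 0.938942,
    "Random Forest": 0.891106,
    "AdaBoost": 0.842788,
    "Naive Bayes": 0.670433,
}

# Assumption: weights are the accuracies divided by their sum,
# so that all weights add up to one.
accuracy_total = sum(table1_accuracies.values())
normalized_weights = {
    name: acc / accuracy_total for name, acc in table1_accuracies.items()
}

# Print the weights from the most to the least influential model.
for name, weight in sorted(
    normalized_weights.items(), key=lambda item: item[1], reverse=True
):
    print(f"{name:20s} {weight:.4f}")
```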

Although the approach enhances reliability, it

comes with several limitations. The system is

resource-intensive due to the high computational
demands of BERT embeddings and ensemble techniques,
which may limit its deployment in resource-constrained
environments. Also, the model primarily
processes textual data, which restricts its ability to


handle multimodal fake news, such as misinformation

spread through images or videos. Additionally,
challenges arise in deploying the model at scale for
multilingual datasets or adapting it to highly specialized
domains, as further refinement is required to maintain

optimal performance. These limitations underscore

the need for continued research to improve the sys-

tem’s versatility.

5 CONCLUSION AND FUTURE WORK

This research developed a hybrid fake news detection

system by integrating BERT embeddings with en-

semble machine learning models. The system effec-

tively captured the semantic meaning of news content,

achieving improved accuracy and reliability through

Majority Voting, Unweighted Averaging and Weighted Averaging
techniques. Weighted Averaging proved to be the most

reliable, leveraging the strengths of diverse models

and mitigating the impact of weaker models using normalized
weights for consistent performance. Furthermore,
the system is designed to be scalable and adaptable
across different datasets, making it suitable for
real-world applications. By combining the power of

deep learning with traditional classiﬁers, it addresses

key challenges such as overﬁtting and model inter-

pretability. The integration of these techniques lays

the foundation for building a robust and efﬁcient fake

news detection system. Additionally, the approach’s

transparency helps enhance trust and accountability in

automated decision-making.

The proposed approach contributes to future ad-

vancements in fake news detection by enhancing

accuracy through weighted averaging in ensemble

learning, making it a scalable and adaptable frame-

work. News veriﬁcation systems can leverage the

model to assist journalists and media organizations in

assessing the credibility of articles before publication.

Search engines can incorporate the model to ﬁlter out

misleading content, enhancing the integrity of online

information. The model can also be extended to help
counter market manipulation through fake financial news
and to detect false health claims and medical misinformation
that could contribute to public health crises.

Future enhancements include exploring diverse

data types, ﬁne-tuning BERT for domain-speciﬁc ap-

plications and enabling real-time detection capabili-

ties. Expanding support for multiple languages and

utilizing larger datasets will further improve system

performance. Additionally, incorporating explainable

AI and robust defenses against fake content can en-

hance transparency and reliability in detection.

REFERENCES

Bengio, Y. et al. (2012). Practical recommendations for

gradient-based training of deep architectures. Neural

Networks: Tricks of the Trade, pages 437–478.

Breiman, L. (2001). Random forests. Machine Learning,

45(1):5–32.

Caruana, R. and Niculescu-Mizil, A. (2006). An empirical

comparison of supervised learning algorithms. Pro-

ceedings of the 23rd international conference on Ma-

chine learning, pages 161–168.

Castillo, C. et al. (2011). Information credibility on Twitter.

In Proceedings of the 22nd International Conference

on World Wide Web, pages 675–684. ACM.

Chen, T. et al. (2016). XGBoost: A scalable tree boosting

system. ACM SIGKDD, pages 785–794.

Czapla, P., Gugger, S., Howard, J., and Kardas, M. (2019).

Universal language model fine-tuning for Polish hate

speech detection. In Proceedings of the PolEval2019

Workshop, page 149.

Devlin, J. et al. (2018). BERT: Pre-training of deep bidirectional
transformers for language understanding. arXiv

preprint arXiv:1810.04805.

Freund, Y. and Schapire, R. E. (1997). A decision-theoretic

generalization of on-line learning and an application

to boosting. Journal of Computer and System Sci-

ences, 55(1):119–139.

Friedman, J. H. (2001). Greedy function approximation: A

gradient boosting machine. The Annals of Statistics,

29(5):1189–1232.

Gilpin, L. H. et al. (2018). Explaining explanations: An

overview of interpretability of machine learning. ACM

Computing Surveys (CSUR), 51(5):93.

Hochreiter, S. and Schmidhuber, J. (1997). Long short-term

memory. Neural Computation, 9(8):1735–1780.

Jiang, T., Yu, X., Li, C., Song, Y., and Zhan, Y. (2021). A

novel stacking approach for accurate detection of fake

news. IEEE Access, 9:22626–22639.

Jin, Y. et al. (2022). Towards ﬁne-grained reasoning for

fake news detection. In Proceedings of the AAAI Con-

ference on Artiﬁcial Intelligence, volume 36, pages

6339–6346.

Joachims, T. (1998). Text categorization with support vec-

tor machines: Learning with many relevant features.

European Conference on Machine Learning (ECML).

Kim, Y. (2014). Convolutional neural networks for sentence

classiﬁcation. In Proceedings of the 2014 Conference

on Empirical Methods in Natural Language Process-

ing (EMNLP).

Lifferth, W. (2018). Fake news. https://kaggle.com/

competitions/fake-news. Kaggle.

Ribeiro, M. T., Singh, S., and Guestrin, C. (2016). "Why
should I trust you?": Explaining the predictions of any
classifier. Proceedings of the 22nd ACM SIGKDD International
Conference on Knowledge Discovery and
Data Mining, pages 1135–1144.

McCallum, A. and Nigam, K. (1998). A comparison of

event models for naive bayes text classiﬁcation. In

Proceedings of the AAAI-98 Workshop on Learning

for Text Categorization, pages 41–48. AAAI Press.


Salton, G. (1986). Another look at automatic text-retrieval

systems. Communications of the ACM, 29(7):648–

656.

Vaswani, A. et al. (2017). Attention is all you need. In

Proceedings of NIPS, pages 5998–6008.

Wang, J. et al. (2023). TLFND: A multimodal fusion model

based on three-level feature matching distance for fake

news detection. Entropy, 25(11):1533.

Yang, Y. and Cui, X. (2021). BERT-enhanced text graph neural
network for classification. Entropy, 23(11):1536.

Zhang, X. and Bao, L. (2020). Fake news detection via NLP

techniques: A review. Journal of Computer Science

and Technology.

Zhou, L. et al. (2020). Stacked ensemble learning for fake

news detection. IEEE Access, 8:21390–21401.
