FakeWhastApp.BR: NLP and Machine Learning Techniques for Misinformation Detection in Brazilian Portuguese WhatsApp Messages

Lucas Cabral; José Monteiro; José Franco da Silva; César Mattos; Pedro Mourão

doi:10.5220/0010446800630074

2021

Abstract

In the past few years, the large-scale dissemination of misinformation through social media has become a critical issue, harming the trustworthiness of legitimate information, social stability, democracy, and public health. Thus, developing automated misinformation detection methods has become a field of high interest in both academia and industry. In many developing countries, such as Brazil, India, and Mexico, one of the primary sources of misinformation is the messaging application WhatsApp. Despite this scenario, due to the private messaging nature of WhatsApp, there are still few misinformation detection methods developed specifically for this platform. In this work, we present FakeWhatsApp.BR, a dataset of WhatsApp messages in Brazilian Portuguese, collected from Brazilian public groups and manually labeled. In addition, we evaluate a series of misinformation classifiers that combine Natural Language Processing-based feature extraction techniques with a set of well-known machine learning algorithms, totaling 108 different scenarios. Our best result achieved an F1 score of 0.73, and the error analysis indicates that errors occur mainly due to the predominance of short texts that accompany media files. When texts with fewer than 50 words are filtered out, the F1 score rises to 0.87.

Paper Citation

in Harvard Style

Cabral L., Monteiro J., Franco da Silva J., Mattos C. and Mourão P. (2021). FakeWhastApp.BR: NLP and Machine Learning Techniques for Misinformation Detection in Brazilian Portuguese WhatsApp Messages. In Proceedings of the 23rd International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-509-8, pages 63-74. DOI: 10.5220/0010446800630074

in Bibtex Style

@conference{iceis21,
author={Lucas Cabral and José Monteiro and José Franco da Silva and César Mattos and Pedro Mourão},
title={FakeWhastApp.BR: NLP and Machine Learning Techniques for Misinformation Detection in Brazilian Portuguese WhatsApp Messages},
booktitle={Proceedings of the 23rd International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2021},
pages={63--74},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010446800630074},
isbn={978-989-758-509-8},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 23rd International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - FakeWhastApp.BR: NLP and Machine Learning Techniques for Misinformation Detection in Brazilian Portuguese WhatsApp Messages
SN - 978-989-758-509-8
AU - Cabral L.
AU - Monteiro J.
AU - Franco da Silva J.
AU - Mattos C.
AU - Mourão P.
PY - 2021
SP - 63
EP - 74
DO - 10.5220/0010446800630074
ER - 