Paper

Authors: Ahmar K. Hussain ; Bernhard A. Sabel ; Marcus Thiel and Andreas Nürnberger

Affiliation: Otto von Guericke University Magdeburg, Germany

Keyword(s): Fake Papers, Classification, Meta Data Features, TF-IDF, Biomedicine, Large Language Models.

Abstract: To address the problem of fake papers in the scientific literature, we present a study on classifying fake papers from a set of features using machine learning classifiers. A new dataset was collected: the fake papers were acquired from the Retraction Watch database, while the non-fake papers were obtained from PubMed. The features extracted for classification included metadata, journal-related features, and textual features from the abstracts, titles, and full texts of the papers. We used several models to generate features and word embeddings from the abstracts and texts, including TF-IDF and several variants of BERT trained on medical data. Comparing models and feature sets showed that the combination of metadata, journal data, and BioBERT embeddings performed best, with an accuracy of 86% and a recall of 83%, using a gradient boosting classifier. Finally, the study presents the most important features identified by the best-performing classifier.
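
As a rough illustration of two ingredients named in the abstract, the following sketch computes TF-IDF weights for a handful of abstracts and the accuracy and recall of a set of predictions. It is not the authors' pipeline: the tokenizer, the plain `tf * log(N / df)` weighting, and the function names `tokenize`, `tfidf_vectors`, and `accuracy_and_recall` are assumptions made for this example, and the sample texts are invented.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase, split on whitespace, strip basic punctuation.
    words = (w.strip(".,;:()").lower() for w in text.split())
    return [w for w in words if w]


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (unsmoothed TF-IDF)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def accuracy_and_recall(y_true, y_pred, positive=1):
    """Accuracy over all samples and recall for the positive (fake) class."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return correct / len(pairs), recall


# Invented toy abstracts, for illustration only.
docs = ["tumor cell growth", "cell signaling pathway"]
vectors = tfidf_vectors(docs)
acc, rec = accuracy_and_recall([1, 1, 0, 0], [1, 0, 0, 0])
```

In this toy run, a term that occurs in every document (here `cell`) receives weight zero, which is why TF-IDF emphasizes terms that distinguish one abstract from another. In a setup like the one the abstract describes, such vectors would be combined with metadata and journal features before being passed to a classifier.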

CC BY-NC-ND 4.0

Paper citation in several formats:
Hussain, A. K., Sabel, B. A., Thiel, M. and Nürnberger, A. (2025). Automated Detection of Fake Biomedical Papers: A Machine Learning Perspective. In Proceedings of the 27th International Conference on Enterprise Information Systems - Volume 1: ICEIS; ISBN 978-989-758-749-8; ISSN 2184-4992, SciTePress, pages 662-670. DOI: 10.5220/0013482800003929

@conference{iceis25,
author={Ahmar K. Hussain and Bernhard A. Sabel and Marcus Thiel and Andreas Nürnberger},
title={Automated Detection of Fake Biomedical Papers: A Machine Learning Perspective},
booktitle={Proceedings of the 27th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2025},
pages={662-670},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013482800003929},
isbn={978-989-758-749-8},
issn={2184-4992},
}

TY - CONF
JO - Proceedings of the 27th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Automated Detection of Fake Biomedical Papers: A Machine Learning Perspective
SN - 978-989-758-749-8
IS - 2184-4992
AU - Hussain, A.
AU - Sabel, B.
AU - Thiel, M.
AU - Nürnberger, A.
PY - 2025
SP - 662
EP - 670
DO - 10.5220/0013482800003929
PB - SciTePress