PDF Malware Detection: Toward Machine Learning Modelling with Explainability

Venkatesh K., Gopi Chand K., Aravind B., Kantha Raju K., Madhan Mohan Reddy Y.

2025

Abstract

In the digital age, PDF files are widely used for document sharing, but their popularity also makes them a target for malware attacks. This project, titled "Detecting Malware in PDFs: Advancing Machine Learning Models with Interpretability Assessment," aims to design and assess machine learning models that identify malware within PDF files. Using a Kaggle dataset of labelled malicious and benign PDFs, various algorithms will be applied, including Random Forest (RF), C5.0, J48, Support Vector Machine (SVM), AdaBoost, Deep Neural Network (DNN), Gradient Boosting Machine (GBM), and k-Nearest Neighbors (KNN). The primary focus is on achieving high detection accuracy while also providing explainability, so that it is clear how the models reach their decisions. By applying these techniques, the project seeks to strengthen cybersecurity measures for identifying and mitigating threats embedded in PDF documents.


Paper Citation


in Harvard Style

K. V., K. G., B. A., K. K. and Y. M. (2025). PDF Malware Detection: Toward Machine Learning Modelling with Explainability. In Proceedings of the 1st International Conference on Research and Development in Information, Communication, and Computing Technologies - ICRDICCT`25; ISBN 978-989-758-777-1, SciTePress, pages 432-437. DOI: 10.5220/0013884200004919


in Bibtex Style

@conference{icrdicct`2525,
author={Venkatesh K. and Gopi Chand K. and Aravind B. and Kantha Raju K. and Madhan Mohan Reddy Y.},
title={PDF Malware Detection: Toward Machine Learning Modelling with Explainability},
booktitle={Proceedings of the 1st International Conference on Research and Development in Information, Communication, and Computing Technologies - ICRDICCT`25},
year={2025},
pages={432--437},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013884200004919},
isbn={978-989-758-777-1},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 1st International Conference on Research and Development in Information, Communication, and Computing Technologies - ICRDICCT`25
TI - PDF Malware Detection: Toward Machine Learning Modelling with Explainability
SN - 978-989-758-777-1
AU - K. V.
AU - K. G.
AU - B. A.
AU - K. K.
AU - Y. M.
PY - 2025
SP - 432
EP - 437
DO - 10.5220/0013884200004919
PB - SciTePress