Intelligent Model for PDF Malware Detection
Kamakshamma Vasepalli, Sneha K., Snehitha M., Shaista Ainan S., Prudhvi Tejasvi J.
2025
Abstract
The widespread use of Portable Document Format (PDF) files in digital communication has made them a primary target for cyber threats. Malicious PDFs often contain embedded JavaScript, auto-executable actions, and hidden exploits, making detection challenging. Existing approaches, such as the MLPdf neural network model, rely on deep learning for classification but suffer from high computational overhead and limited interpretability. To address these limitations, this paper proposes a hybrid PDF malware detection system that combines Flask, pdfid.py, PyPDF2, and a machine learning approach using Random Forest (RF) and Support Vector Machine (SVM). The system uses static analysis to extract structural and security-related features from PDFs and to identify malicious indicators such as JavaScript execution, embedded file injections, and encryption anomalies. Unlike purely deep-learning-based methods, this approach improves detection efficiency and makes classification decisions more explainable. The system is evaluated on a real-world dataset of 105,000 PDFs, where it achieves an accuracy of 98.9%, outperforming the MLPdf model and commercial antivirus solutions. The results demonstrate that the method is scalable, interpretable, and effective in detecting PDF-based threats with a low false-positive rate. Future work will explore dynamic analysis techniques and real-time threat intelligence integration to enhance detection robustness.
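The static feature extraction described in the abstract can be illustrated with a minimal sketch. It counts suspicious PDF name tokens in the raw bytes of a file, similar in spirit to what pdfid.py reports, and turns the counts into a fixed-order vector that a classifier such as Random Forest or SVM could consume. The keyword list, the function names `count_keywords` and `feature_vector`, and the sample bytes are assumptions made for illustration; the abstract does not specify the actual feature set or pipeline.

```python
import re

# Hypothetical keyword list: names commonly associated with active or
# embedded content in PDFs. The paper's real feature set is not given.
KEYWORDS = [
    b"/JS", b"/JavaScript", b"/OpenAction", b"/AA",
    b"/EmbeddedFile", b"/Encrypt", b"/Launch",
]


def count_keywords(data: bytes) -> dict:
    """Count occurrences of each keyword as a whole PDF name token."""
    counts = {}
    for kw in KEYWORDS:
        # The negative lookahead stops "/JS" from matching a longer
        # name such as "/JSX". Hex-escaped names (e.g. "/J#53") are
        # NOT handled in this sketch.
        pattern = re.escape(kw) + rb"(?![A-Za-z0-9])"
        counts[kw.decode()] = len(re.findall(pattern, data))
    return counts


def feature_vector(data: bytes) -> list:
    """Return the counts in a fixed order, ready for a classifier."""
    counts = count_keywords(data)
    return [counts[kw.decode()] for kw in KEYWORDS]


# Tiny hand-written sample resembling a PDF with an auto-run script.
sample = (
    b"%PDF-1.7\n"
    b"1 0 obj\n<< /Type /Catalog /OpenAction 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /S /JavaScript /JS (app.alert(1)) >>\nendobj\n"
)

features = count_keywords(sample)
vector = feature_vector(sample)
print(features)
print(vector)
```

In a fuller pipeline, vectors like this one would be computed for every labeled file and passed to a trained model. One reason such count-based features support explainability is that each dimension maps directly to a named PDF construct.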
Paper Citation
in Harvard Style
Vasepalli K., K. S., M. S., S. S. and J. P. (2025). Intelligent Model for PDF Malware Detection. In Proceedings of the 1st International Conference on Research and Development in Information, Communication, and Computing Technologies - ICRDICCT`25; ISBN 978-989-758-777-1, SciTePress, pages 800-806. DOI: 10.5220/0013943800004919
in Bibtex Style
@conference{icrdicct`2525,
author={Kamakshamma Vasepalli and Sneha K. and Snehitha M. and Shaista S. and Prudhvi J.},
title={Intelligent Model for PDF Malware Detection},
booktitle={Proceedings of the 1st International Conference on Research and Development in Information, Communication, and Computing Technologies - ICRDICCT`25},
year={2025},
pages={800-806},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013943800004919},
isbn={978-989-758-777-1},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 1st International Conference on Research and Development in Information, Communication, and Computing Technologies - ICRDICCT`25
TI - Intelligent Model for PDF Malware Detection
SN - 978-989-758-777-1
AU - Vasepalli K.
AU - K. S.
AU - M. S.
AU - S. S.
AU - J. P.
PY - 2025
SP - 800
EP - 806
DO - 10.5220/0013943800004919
PB - SciTePress