Optimizing Leak Detection in Open-source Platforms with Machine Learning Techniques
Sofiane Lounici, Marco Rosa, Carlo Negri, Slim Trabelsi, Melek Önen
2021
Abstract
Public code platforms like GitHub are exposed to several different attacks, in particular the detection and exploitation of sensitive information (such as passwords or API keys). While both developers and companies are aware of this issue, no efficient open-source tool performs leak detection with high precision. Indeed, a common problem in leak detection is the amount of false positive data (i.e., non-critical data wrongly detected as a leak), which creates a significant workload for the developers who review it manually. This paper presents an approach to detect data leaks in open-source projects with a low false positive rate. In addition to the regular expression scanners commonly used by current approaches, we propose several machine learning models targeting the false positives, and show that current approaches produce a high false positive rate, close to 80%. Furthermore, we demonstrate that our tool, while producing a negligible false negative rate, reduces the false positive rate to at most 6% of the output data.
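The two-stage idea described in the abstract (a regular expression scanner that proposes candidates, followed by a filter that discards likely false positives) can be illustrated with a minimal sketch. This is not the authors' tool: the pattern, the placeholder list, the function names, and the entropy threshold are all hypothetical, and a simple Shannon-entropy heuristic stands in for the machine learning models that the paper proposes.

```python
import math
import re

# Hypothetical candidate pattern (an assumption, not taken from the paper):
# matches assignments such as password = "..." or api_key: '...'.
CANDIDATE = re.compile(
    r"""(?i)\b(password|passwd|secret|api_key|token)\b\s*[:=]\s*["']([^"']+)["']"""
)

# Hypothetical list of values that are usually placeholders, not real secrets.
PLACEHOLDERS = {"changeme", "password", "example", "xxx", "todo", "dummy"}


def shannon_entropy(value):
    """Return the Shannon entropy of a string, in bits per character."""
    if not value:
        return 0.0
    counts = {}
    for ch in value:
        counts[ch] = counts.get(ch, 0) + 1
    n = len(value)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def scan(text):
    """Stage 1: regex scanner. Returns every (key, value) candidate."""
    return [(m.group(1), m.group(2)) for m in CANDIDATE.finditer(text)]


def looks_like_real_secret(value, min_entropy=3.0):
    """Stage 2 stand-in: a heuristic filter used here in place of an ML model."""
    if value.lower() in PLACEHOLDERS:
        return False
    return shannon_entropy(value) >= min_entropy


def detect_leaks(text):
    """Run the scanner, then keep only candidates that pass the filter."""
    return [(k, v) for k, v in scan(text) if looks_like_real_secret(v)]
```

In this sketch the scanner alone reports every quoted value assigned to a suspicious key, including placeholders, which is the kind of output that inflates the false positive rate. The second stage is where a trained classifier would be plugged in; the entropy check only shows the shape of the pipeline.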
Paper Citation
in Harvard Style
Lounici S., Rosa M., Negri C., Trabelsi S. and Önen M. (2021). Optimizing Leak Detection in Open-source Platforms with Machine Learning Techniques. In Proceedings of the 7th International Conference on Information Systems Security and Privacy - Volume 1: ICISSP, ISBN 978-989-758-491-6, pages 145-159. DOI: 10.5220/0010238101450159
in Bibtex Style
@conference{icissp21,
author={Sofiane Lounici and Marco Rosa and Carlo Negri and Slim Trabelsi and Melek Önen},
title={Optimizing Leak Detection in Open-source Platforms with Machine Learning Techniques},
booktitle={Proceedings of the 7th International Conference on Information Systems Security and Privacy - Volume 1: ICISSP},
year={2021},
pages={145-159},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010238101450159},
isbn={978-989-758-491-6},
}
in EndNote Style
TY  - CONF 
JO  - Proceedings of the 7th International Conference on Information Systems Security and Privacy - Volume 1: ICISSP
TI  - Optimizing Leak Detection in Open-source Platforms with Machine Learning Techniques
SN  - 978-989-758-491-6
AU  - Lounici S. 
AU  - Rosa M. 
AU  - Negri C. 
AU  - Trabelsi S. 
AU  - Önen M. 
PY  - 2021
SP  - 145
EP  - 159
DO  - 10.5220/0010238101450159