When developing Android mobile applications, it is
essential to adopt security-focused practices from
the early stages of the development cycle and to
benefit from automated tool support. One way to help
app developers identify source code vulnerabilities
is to apply AI methods. This study presents LVDAndro,
a dataset of over 20 million distinct source code
samples, labelled by CWE-ID, for identifying
vulnerabilities in Android source code. Machine
learning models trained on the dataset to predict
vulnerabilities achieve 94% accuracy in both binary
and multi-class classification, with F1-scores of
0.94 and 0.93, respectively. The dataset is available
on GitHub, and ongoing efforts aim to expand it and
increase the number of samples for training deep
learning models. Adding more scanners could further
improve the models' accuracy. Adopting security-
focused practices and receiving automated tool
support are important for developing secure Android
apps.
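To make the distinction between the two classification tasks concrete, the sketch below shows one way CWE-ID labels could be collapsed into a binary vulnerable/not-vulnerable label, and how an F1-score could be computed for each task. It is a minimal illustration under stated assumptions, not the LVDAndro pipeline: the `NOT_VULNERABLE` label value, the helper names `to_binary`, `f1_for_class` and `macro_f1`, and the use of a macro-averaged F1 for the multi-class case are all hypothetical choices made here for illustration. The averaging scheme behind the reported 0.93 is not stated in this abstract.

```python
# Hypothetical label for samples in which no vulnerability was detected
# (an assumption for illustration; not taken from the LVDAndro dataset).
NOT_VULNERABLE = "NOT_VULNERABLE"


def to_binary(label: str) -> int:
    """Collapse a CWE-ID label into 1 (vulnerable) or 0 (not vulnerable)."""
    return 0 if label == NOT_VULNERABLE else 1


def f1_for_class(y_true, y_pred, positive) -> float:
    """F1-score treating `positive` as the positive class (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred) -> float:
    """Unweighted mean of per-class F1-scores (one possible multi-class choice)."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Toy, made-up labels: true vs. predicted CWE-IDs for four code samples.
y_true = ["CWE-89", NOT_VULNERABLE, "CWE-312", "CWE-89"]
y_pred = ["CWE-89", NOT_VULNERABLE, "CWE-89", "CWE-89"]

# Binary task: every CWE-ID maps to 1, so the CWE-312/CWE-89 confusion vanishes.
binary_f1 = f1_for_class([to_binary(l) for l in y_true],
                         [to_binary(l) for l in y_pred], positive=1)

# Multi-class task: mislabelling CWE-312 as CWE-89 now lowers the score.
multi_f1 = macro_f1(y_true, y_pred)

print(binary_f1, round(multi_f1, 3))
```

The toy example shows why the multi-class score can sit below the binary one: a sample that is correctly flagged as vulnerable but assigned the wrong CWE-ID counts as correct in the binary task and as an error in the multi-class task.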
