
gests a high degree of consistency between pre-
cision and recall.
– MobileNet performs exceptionally well, im-
proving across larger or more complex datasets.
Its ability to generalize is highlighted by its
minimal fluctuation in accuracy, precision, and
recall. The AUC remains high, confirming its
reliability for classification.
• DenseNet
– Dataset 0: DenseNet shows strong performance
with a loss of 0.214 and an accuracy of 0.947,
slightly below MobileNet on this dataset.
Precision, recall, and F-measure are all
consistent at 0.947, and the AUC is high at 0.979.
– Datasets P1, P2, P3: As the datasets change,
DenseNet consistently improves, with its loss
reducing to 0.120 for P3 and accuracy peaking
at 0.968 across P2 and P3. Precision, recall, and
F-measure remain stable at 0.968, and the AUC
remains around 0.989.
– DenseNet shows strong and stable perfor-
mance, with an especially low loss on the P2
and P3 datasets. It has excellent precision, re-
call, and AUC, making it a reliable model for
malware detection.
• ResNet50
– Dataset 0: ResNet50 starts with the highest loss
among the models (0.442) and relatively lower
accuracy (0.920). Its precision, recall, and F-
measure are all consistent at 0.920, and the
AUC is 0.960, indicating moderate discrimina-
tion ability.
– Datasets P1, P2, P3: ResNet50 improves in ac-
curacy and other metrics with P1, P2, and P3,
reaching a maximum accuracy of 0.952 for P2.
However, its loss remains relatively high com-
pared to other models (0.246 for P3). The AUC
remains stable around 0.979-0.980, indicating
that the model can still discriminate well be-
tween malware and trusted samples.
– ResNet50 performs solidly on the larger
datasets but struggles with higher loss and
slightly lower accuracy compared to DenseNet
and MobileNet. It does, however, maintain
good AUC values across all datasets.
• Inception
– Dataset 0: Inception shows competitive perfor-
mance, with an accuracy of 0.941 and loss of
0.193. Its precision, recall, and F-measure are
all consistent at 0.941, and the AUC is 0.976,
suggesting good performance.
– Datasets P1, P2, P3: As with the other mod-
els, Inception’s performance improves with the
more complex datasets, reaching an accuracy
of 0.952 for P2 and a low loss of 0.157. The
AUC peaks at 0.987 for P2, indicating strong
discriminatory power.
– Inception performs well, especially on dataset
P2, with high precision, recall, and AUC val-
ues. It strikes a balance between performance
metrics and generalization capability, though it
slightly lags behind MobileNet and DenseNet
on the larger datasets.
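The bullets above repeatedly report precision, recall, and F-measure at the same value. This is expected whenever precision and recall coincide, because the F-measure is their harmonic mean. The following is a minimal Python sketch, assuming the F-measure is the standard F1 score (the text does not state which averaging scheme was used); the function name `f_measure` and the dictionary are illustrative and not taken from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed to be the standard F1)."""
    if precision + recall == 0:
        return 0.0  # convention for the undefined 0/0 case
    return 2 * precision * recall / (precision + recall)


# Dataset 0 values reported in the text, where precision == recall.
reported_dataset_0 = {"DenseNet": 0.947, "ResNet50": 0.920, "Inception": 0.941}

for model, value in reported_dataset_0.items():
    # With equal precision and recall the harmonic mean returns the same value.
    print(f"{model}: precision = recall = {value:.3f}, "
          f"F-measure = {f_measure(value, value):.3f}")
```

When precision and recall differ, the harmonic mean is pulled toward the smaller of the two, so identical values across the three metrics indicate that neither metric is being traded off against the other.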
All models perform better on the P1, P2, and P3
datasets than on dataset 0. This suggests that the
complexity or size of these datasets helps the models
learn more effectively, resulting in higher accuracy,
precision, and recall, and lower loss.
MobileNet and DenseNet consistently outperform the
other models in accuracy, precision, recall, and
F-measure, particularly on datasets P1 through P3.
Both models maintain low loss values and high AUC,
and both show minimal fluctuation in accuracy across
the datasets, which points to good generalization and
makes them strong candidates for malware
classification. ResNet50, while still performing well
in terms of accuracy and AUC, suffers from higher loss
values, indicating that it does not fit the data as
effectively as the other models. Inception shows
balanced performance across all datasets, with
competitive metrics, though it is slightly behind
MobileNet and DenseNet in accuracy and loss. Although
the ResNet50 and Inception models perform adequately,
their higher loss values and slightly lower accuracy
suggest that they may not generalize as well as
MobileNet and DenseNet, especially on more complex
data.
These findings indicate that MobileNet and
DenseNet would be the most reliable models for
malware detection. However, since the MobileNet
model obtains slightly higher accuracy than the
DenseNet one, we consider the MobileNet model the
best model for the detection of packed malware.
As a matter of fact, the MobileNet model obtains
an accuracy equal to 0.949, 0.974, 0.973, and 0.974,
while the DenseNet model reaches the following
accuracy values: 0.947, 0.966, 0.968, and 0.968.
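The comparison behind this choice can be checked with a short Python sketch. It takes the first accuracy series as MobileNet's and the second as DenseNet's, and assumes the four values are listed in the order dataset 0, P1, P2, P3 (an ordering inferred from the per-dataset figures reported earlier, not stated explicitly here). The names `ACCURACY`, `DATASETS`, and `accuracy_gaps` are illustrative.

```python
from statistics import mean

# Assumed ordering of the reported accuracy values.
DATASETS = ["0", "P1", "P2", "P3"]

# Accuracy values quoted in the text.
ACCURACY = {
    "MobileNet": [0.949, 0.974, 0.973, 0.974],
    "DenseNet": [0.947, 0.966, 0.968, 0.968],
}


def accuracy_gaps(first: list[float], second: list[float]) -> list[float]:
    """Per-dataset accuracy difference (first minus second), rounded to 3 decimals."""
    return [round(a - b, 3) for a, b in zip(first, second)]


gaps = accuracy_gaps(ACCURACY["MobileNet"], ACCURACY["DenseNet"])

for dataset, gap in zip(DATASETS, gaps):
    print(f"Dataset {dataset}: MobileNet - DenseNet = {gap:+.3f}")

for model, values in ACCURACY.items():
    print(f"{model}: mean accuracy = {mean(values):.4f}")
```

Every gap is positive but below one percentage point, which matches the description of MobileNet as only slightly more accurate than DenseNet.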
A Method for Packed (and Unpacked) Malware Detection by Means of Convolutional Neural Networks