
rations was carried out to enable inference on the Intel
NCS2. To the best of our knowledge, this is the first
implementation of this kind of model on this type of
device.
Although we reached an accuracy of 95%, 0.5 FPS
cannot meet the speed requirement for real-time ap-
plications. Nevertheless, this implementation remains
usable in other contexts. These results were obtained
on the dataset described in Section 4; however, sim-
ilar values were collected on the target data provided
by the industrial partner. For comparison, we ran the
original YOLACT on the NCS2, which achieved only
0.09 FPS, whereas our model reached 0.5 FPS.
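The paper does not detail how throughput was measured; as a minimal sketch, assuming end-to-end per-frame timing with a few warm-up runs excluded (the `infer_fn` callable and frame count are illustrative, not the authors' actual pipeline), FPS could be estimated as follows:

```python
import time

def measure_fps(infer_fn, frames, warmup=2):
    """Estimate inference throughput in frames per second.

    infer_fn: callable that runs one forward pass on a single frame
    frames:   iterable of input frames
    warmup:   number of initial runs excluded from timing
    """
    frames = list(frames)
    # Warm-up runs let caches, drivers, and lazy initialization settle
    # before timing starts, so they are not counted.
    for f in frames[:warmup]:
        infer_fn(f)
    start = time.perf_counter()
    for f in frames:
        infer_fn(f)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed
```

With this procedure, a model averaging 2 s per frame would report 0.5 FPS, matching the figure quoted above; the same helper applied to a slower baseline would yield the 0.09 FPS comparison point.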
Possible future works include exploring the uti-
lization of more powerful and recent edge devices and
a comprehensive analysis of contemporary instance
segmentation models.
ACKNOWLEDGEMENTS
The research leading to these results has received
funding from Project “Ecosistema dell’innovazione
- Rome Technopole”, financed by the EU in the
NextGenerationEU plan through MUR Decree n.
1051 23.06.2022 - CUP B83C22002820006.
REFERENCES
Arnab, A. and Torr, P. H. S. (2017). Pixelwise instance seg-
mentation with a dynamically instantiated network.
CoRR, abs/1704.02386.
Avola, D., Cinque, L., Fagioli, A., Foresti, G. L., Marini,
M. R., Mecca, A., and Pannone, D. (2022). Medici-
nal boxes recognition on a deep transfer learning aug-
mented reality mobile application. In Sclaroff, S.,
Distante, C., Leo, M., Farinella, G. M., and Tombari,
F., editors, Image Analysis and Processing – ICIAP
2022, pages 489–499, Cham. Springer International
Publishing.
Avola, D., Cinque, L., Marini, M. R., Princic, A., and
Venanzi, V. (2023). Keyrtual: A lightweight vir-
tual musical keyboard based on rgb-d and sensors fu-
sion. In Computer Analysis of Images and Patterns:
20th International Conference, CAIP 2023, Limassol,
Cyprus, September 25–28, 2023, Proceedings, Part II,
page 182–191, Berlin, Heidelberg. Springer-Verlag.
Bai, M. and Urtasun, R. (2016). Deep watershed transform
for instance segmentation. CoRR, abs/1611.08303.
Bolya, D., Zhou, C., Xiao, F., and Lee, Y. J. (2019).
YOLACT: real-time instance segmentation. CoRR,
abs/1904.02689.
Cheng, H., Zhang, M., and Shi, J. Q. (2024). A survey
on deep neural network pruning: Taxonomy, compar-
ison, analysis, and recommendations. IEEE Transac-
tions on Pattern Analysis and Machine Intelligence,
46(12):10558–10578.
Cong, P., Li, S., Zhou, J., Lv, K., and Feng, H. (2023). Re-
search on instance segmentation algorithm of green-
house sweet pepper detection based on improved
mask rcnn. Agronomy, 13(1).
Dai, J., He, K., Li, Y., Ren, S., and Sun, J. (2016). Instance-
sensitive fully convolutional networks. CoRR,
abs/1603.08678.
Ding, X., Zhang, X., Ma, N., Han, J., Ding, G., and Sun,
J. (2021). Repvgg: Making vgg-style convnets great
again. CoRR, abs/2101.03697.
Fontana, F., Lanzino, R., Marini, M. R., Avola, D., Cinque,
L., Scarcello, F., and Foresti, G. L. (2024). Distilled
gradual pruning with pruned fine-tuning. IEEE Trans-
actions on Artificial Intelligence, 5(8):4269–4279.
Fu, C., Shvets, M., and Berg, A. C. (2019). Retinamask:
Learning to predict masks improves state-of-the-art
single-shot detection for free. CoRR, abs/1901.03353.
Gao, N., Shan, Y., Wang, Y., Zhao, X., Yu, Y., Yang,
M., and Huang, K. (2019). SSAP: single-shot in-
stance segmentation with affinity pyramid. CoRR,
abs/1909.01616.
Gonizzi Barsanti, S., Marini, M. R., Malatesta, S. G., and
Rossi, A. (2024). Evaluation of denoising and vox-
elization algorithms on 3d point clouds. Remote Sens-
ing, 16(14):2632.
Gómez-Zamanillo, L., Galán, P., Bereciartúa-Pérez, A.,
Picón, A., Moreno, J. M., Berns, M., and Echazarra,
J. (2024). Deep learning-based instance segmentation
for improved pepper phenotyping. Smart Agricultural
Technology, 9:100555.
Han, S., Pool, J., Tran, J., and Dally, W. J. (2015). Learn-
ing both weights and connections for efficient neural
networks. CoRR, abs/1506.02626.
Hariharan, B., Arbeláez, P. A., Girshick, R. B., and Malik,
J. (2014). Hypercolumns for object segmentation and
fine-grained localization. CoRR, abs/1411.5752.
He, K., Gkioxari, G., Dollár, P., and Girshick, R. B. (2017a).
Mask R-CNN. CoRR, abs/1703.06870.
He, K., Zhang, X., Ren, S., and Sun, J. (2015). Deep
residual learning for image recognition. CoRR,
abs/1512.03385.
He, Y., Zhang, X., and Sun, J. (2017b). Channel pruning for
accelerating very deep neural networks. In 2017 IEEE
International Conference on Computer Vision (ICCV),
pages 1398–1406.
Khayya, E. K., Oirrak, A. E., and Datsi, T. (2024). A survey
on rgb images classification using convolutional neu-
ral network (cnn) architectures: applications and chal-
lenges. In 2024 International Conference on Circuit,
Systems and Communication (ICCSC), pages 1–8.
Kirillov, A., Levinkov, E., Andres, B., Savchynskyy, B.,
and Rother, C. (2016). Instancecut: from edges to
instances with multicut. CoRR, abs/1611.08272.
Li, H., Kadav, A., Durdanovic, I., Samet, H., and Graf, H. P.
(2016). Pruning filters for efficient convnets. CoRR,
abs/1608.08710.
ICPRAM 2025 - 14th International Conference on Pattern Recognition Applications and Methods