
lowed by YoloV10x with 0.727. Although YoloV10l and YoloV10x maintain high and similar performance, the RT-DETR models struggle considerably on the mAP50-95 metric, indicating lower generalization ability across the range of IoU thresholds.
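For reference, the COCO-style mAP50-95 score averages the mean average precision over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05, so a model must localize objects accurately at every threshold, not only at the lenient 0.50 level, to score well:

mAP_{50\text{-}95} = \frac{1}{10} \sum_{t \in \{0.50,\,0.55,\,\ldots,\,0.95\}} \text{mAP}_{@t}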
5 CONCLUSIONS
This work developed a synthetic data generator within a fully virtual environment created using the Unity tool. The environment was built specifically to represent the train-loading scenario and to evaluate two computer vision models: a well-established, more traditional model based on convolutional neural networks, YOLO, and a model based on Vision Transformers (ViTs), RT-DETR, a real-time ViT.
From the training results of the four models, two YOLO and two RT-DETR models of comparable sizes, it was observed that, in this environment and in the context of train loading, YOLO outperforms RT-DETR by approximately 10% in mAP. Additionally, no significant performance improvement was observed with the larger variants of either YOLO or RT-DETR, indicating that increasing model size did not yield better results for this specific task.
In a real-world scenario, the insights from this study show that a YOLO network can be applied to train loading to improve automation and monitoring of loading operations. The ability of computer vision models to detect and analyze loading conditions in real time can significantly improve efficiency and become a valuable ally for the operator, reducing human error and increasing safety.
Directions for future work include extending the analysis to other vision networks, comparing performance on different hardware, and comparing the use of synthetic, real, and mixed data for training the networks.
ACKNOWLEDGEMENTS
The authors thank the entire team of the Master's Program in Instrumentation, Control and Automation of Mining Processes (PROFICAM), Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG), Vale Technological Institute (ITV) and Federal University of Ouro Preto (UFOP). This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001, the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) financing code 306101/2021-1, FAPEMIG financing codes APQ-00890-23 and APQ-01306-22, the Instituto Tecnológico Vale (ITV) and the Universidade Federal de Ouro Preto (UFOP).