
model approach provides a balanced trade-off between robustness, interpretability, and runtime efficiency, making it suitable for industrial deployment.
Next Steps: We acknowledge that the current evaluation is limited by dataset size and the scope of reported metrics. To strengthen the quantitative analysis, we plan to substantially expand the annotated dataset and compute standard detection metrics (precision, recall, and F1-score) for each stage: bar detection, triangle inference, and safety classification. This broader evaluation will provide a more comprehensive understanding of each method's strengths and failure modes, and will help guide future improvements in model architecture and rule design for industrial safety validation.
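The per-stage metrics we plan to report can be sketched as follows. The stage names and the true-positive/false-positive/false-negative counts below are illustrative placeholders, not measured results:

```python
def prf(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall, and F1 from raw detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

# Illustrative counts per pipeline stage (placeholders, not results).
stages = {
    "bar_detection":         (90, 10, 5),
    "triangle_inference":    (40, 5, 8),
    "safety_classification": (30, 2, 3),
}

for name, (tp, fp, fn) in stages.items():
    m = prf(tp, fp, fn)
    print(f"{name}: P={m['precision']:.2f} R={m['recall']:.2f} F1={m['f1']:.2f}")
```

Reporting these three numbers per stage, rather than a single end-to-end score, localizes where errors enter the pipeline.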
6 CONCLUSION
We proposed an annotation-light vision framework for real-time safety validation of steel bar storage in outdoor industrial environments. By combining dual-resolution zero-shot segmentation using SAM with lightweight geometric reasoning, the system assesses structural support from top and front views without manual labeling.
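The dual-resolution step can be sketched as a merge of two mask sets: a coarse SAM pass that captures bulk material and a fine pass that captures thin supports. The function names, IoU threshold, and toy masks below are assumptions for illustration, not our exact implementation:

```python
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def merge_masks(coarse, fine, iou_thresh=0.8):
    """Keep all coarse masks; add fine masks that do not duplicate one already kept."""
    merged = list(coarse)
    for m in fine:
        if all(iou(m, kept) < iou_thresh for kept in merged):
            merged.append(m)
    return merged

# Toy 6x6 example: one bulk-material mask and one thin support mask.
bulk = np.zeros((6, 6), bool); bulk[1:5, 1:5] = True
support = np.zeros((6, 6), bool); support[5, :] = True
masks = merge_masks([bulk], [support, bulk.copy()])
print(len(masks))  # → 2: the duplicate of bulk is dropped, the support is kept
```

Deduplicating by IoU lets the fine pass contribute only structures the coarse pass missed, keeping the downstream geometric reasoning free of redundant masks.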
Key contributions include: (i) multi-scale SAM mask generation for detecting both fine supports and bulk materials, (ii) morphological proximity rules for lateral support inference, (iii) triangle-based validation from frontal views, and (iv) an efficient implementation suitable for real-world deployment.
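A morphological proximity rule of the kind in (ii) can be sketched as follows: dilate the bar mask by a small radius and test whether the resulting band touches a support mask. The radius, overlap threshold, and shift-based dilation are illustrative assumptions, not our exact rule:

```python
import numpy as np

def dilate(mask: np.ndarray, r: int) -> np.ndarray:
    """Binary dilation by a (2r+1)x(2r+1) square, via shifted copies (no SciPy needed)."""
    out = np.zeros_like(mask)
    h, w = mask.shape
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            dst = (slice(max(dy, 0), h + min(dy, 0)),
                   slice(max(dx, 0), w + min(dx, 0)))
            src = (slice(max(-dy, 0), h + min(-dy, 0)),
                   slice(max(-dx, 0), w + min(-dx, 0)))
            out[dst] |= mask[src]
    return out

def laterally_supported(bar, support, r=2, min_overlap=1):
    """A bar counts as supported if its dilated band overlaps a support mask."""
    return int(np.logical_and(dilate(bar, r), support).sum()) >= min_overlap

# Toy 5x5 example: a vertical bar in column 2, a support in column 4.
bar = np.zeros((5, 5), bool); bar[:, 2] = True
support = np.zeros((5, 5), bool); support[:, 4] = True
print(laterally_supported(bar, support, r=2))  # → True: within 2 px
print(laterally_supported(bar, support, r=1))  # → False: too far
```

The dilation radius plays the role of a maximum admissible gap between a bar and its lateral support, which keeps the rule interpretable in pixel units.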
Our method addresses key limitations of prior work by avoiding task-specific annotations, handling multi-scale structures, and offering interpretable, geometry-driven safety decisions. Experimental results on real warehouse footage show reliable performance under challenging conditions such as occlusion and clutter.
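To make the geometry-driven character of such decisions concrete, a frontal-view triangle check could take the form below. This is a hypothetical rule with assumed thresholds, not the paper's exact criterion: the apex must project between the two base supports, and both base angles must be steep enough that the stack is not spreading flat:

```python
import math

def triangle_is_safe(base_left, base_right, apex, min_angle_deg=20.0):
    """Hypothetical frontal-view rule: apex between the base supports, and both
    base angles above a minimum (a shallow triangle suggests an unstable stack)."""
    (xl, yl), (xr, yr), (xa, ya) = base_left, base_right, apex
    if not (xl < xa < xr):  # apex must lie horizontally between the supports
        return False
    # Image y grows downward, so a higher apex has a smaller y; use the absolute rise.
    left_angle = math.degrees(math.atan2(abs(ya - yl), xa - xl))
    right_angle = math.degrees(math.atan2(abs(ya - yr), xr - xa))
    return min(left_angle, right_angle) >= min_angle_deg

print(triangle_is_safe((0, 100), (100, 100), (50, 60)))  # → True: ~39° base angles
print(triangle_is_safe((0, 100), (100, 100), (50, 95)))  # → False: ~6°, too flat
```

Because the decision reduces to a few named angles and positions, a rejected configuration can be explained to an operator in the same terms the rule uses.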
Future work includes extending the framework to more complex stacking scenarios, adding temporal smoothing, and integrating multi-camera fusion. We also plan to explore self-supervised fine-tuning of SAM for improved performance on low-contrast imagery. This work lays the foundation for fully automated structural safety monitoring in heavy-industry logistics.
ACKNOWLEDGEMENT
The COGNIMAN project¹, leading to this paper, has received funding from the European Union's Horizon Europe research and innovation programme under grant agreement No 101058477.
REFERENCES
Cen, J., Fang, J., and Shen, W. (2023). Segment anything in
3d with radiance fields. In Proceedings of ICCV.
Duda, R. and Hart, P. (1972). Use of the Hough transformation to detect lines and curves in pictures. Communications of the ACM, 15(1):11–15.
Eiffert, S., Wendel, A., and Kirchner, N. (2021). Tool-
box spotter: A computer vision system for real-world
situational awareness in heavy industries. In IEEE
Conference on Automation Science and Engineering
(CASE).
He, K., Gkioxari, G., Dollár, P., and Girshick, R. (2017). Mask R-CNN. In Proceedings of ICCV.
Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A., Lo, W.-Y., Dollár, P., and Girshick, R. (2023). Segment anything. In Proceedings of ICCV.
Kälviäinen, H., Hirvonen, P., Xu, L., and Oja, E. (1995). Probabilistic and non-probabilistic Hough transforms: overview and comparisons. Image and Vision Computing, 13(4):239–252.
Lee, S. and Kim, H. (2021). Geometric primitive detec-
tion for structural support analysis. In Proceedings of
ICRA.
Lin, X. and Ferrari, V. (2024). Sam-6d: Zero-shot 6d object
pose estimation with segment anything. In Proceed-
ings of CVPR.
Patel, R. and Gupta, S. (2020). Automated safety violation
detection in manufacturing through vision ai. IEEE
Transactions on Industrial Informatics, 17(5):3502–
3512.
Ronneberger, O., Fischer, P., and Brox, T. (2015). U-
net: Convolutional networks for biomedical image
segmentation. In Navab, N., Hornegger, J., Wells,
W. M., and Frangi, A. F., editors, Medical Image Com-
puting and Computer-Assisted Intervention – MICCAI
2015, pages 234–241, Cham. Springer International
Publishing.
Smith, J. and Lee, P. (2019). Vision-based automation and
safety in industrial environments: A survey. IEEE
Transactions on Automation Science and Engineer-
ing, 16(4):1548–1565.
Wu, Y. and Zhang, X. (2019). Multi-scale image segmen-
tation using deep learning for industrial applications.
Pattern Recognition Letters, 120:109–116.
Zhang, L., Chen, Y., and Zhao, J. (2022). Proximity-based
support verification in robotic assembly. In Proceed-
ings of IROS.
¹ www.cogniman.eu
ICINCO 2025 - 22nd International Conference on Informatics in Control, Automation and Robotics