Robust Scene Understanding for Mobile Robots Based on Vision and Deep Learning Models

Leticia C. Pereira, Fernando S. Osório

2025

Abstract

This paper presents the architecture and results of AIVFusion, a real-time perception system that builds a rich, multi-layered understanding of an environment from a single monocular camera for autonomous mobile robots. The architecture integrates three open-source deep learning models, each performing a distinct perception task: object detection (YOLOv8), semantic segmentation (FastSAM), and monocular depth estimation (Depth Anything V2). By fusing these outputs, the system generates a unified representation that identifies the navigable area, detects nearby obstacles based on depth information, and semantically labels the obstacles identified as “person”. Higher-level systems can then use this perceptual information for tasks such as decision-making and safer navigation. The system’s viability is demonstrated through qualitative tests in indoor environments. The results confirm that it operates in real time (approximately 10 FPS) and fuses the perception layers effectively, even in challenging scenarios involving partial object occlusion.
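The fusion step described in the abstract can be pictured with a small sketch. This is not the AIVFusion implementation: it is a minimal illustration under stated assumptions, using plain Python lists in place of real model outputs. The function name `fuse_layers`, the label constants, the `near_threshold` parameter, and the convention that smaller depth values mean closer surfaces are all hypothetical choices made for this example. A real pipeline would take the person boxes from the detector, the floor mask from the segmentation model, and the depth map from the depth estimator.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max), max values exclusive.
Box = Tuple[int, int, int, int]

# Hypothetical label codes for the fused map.
BACKGROUND, NAVIGABLE, OBSTACLE, PERSON = 0, 1, 2, 3


def fuse_layers(
    depth: Sequence[Sequence[float]],
    floor_mask: Sequence[Sequence[bool]],
    person_boxes: Sequence[Box],
    near_threshold: float,
) -> List[List[int]]:
    """Combine three perception layers into one per-pixel label map.

    Assumption: smaller depth values mean closer surfaces.
    """
    height, width = len(depth), len(depth[0])
    fused = [[BACKGROUND] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if floor_mask[y][x]:
                # Segmentation layer: pixel belongs to the navigable area.
                fused[y][x] = NAVIGABLE
            elif depth[y][x] < near_threshold:
                # Depth layer: a non-floor pixel close to the camera is an obstacle.
                in_person_box = any(
                    x0 <= x < x1 and y0 <= y < y1
                    for (x0, y0, x1, y1) in person_boxes
                )
                # Detection layer: nearby obstacles inside a person box get that label.
                fused[y][x] = PERSON if in_person_box else OBSTACLE
    return fused


# Toy 3x4 frame: a near person in column 2, a near object at (x=1, y=1), floor in the last row.
depth = [
    [9.0, 9.0, 1.0, 9.0],
    [9.0, 1.2, 1.0, 9.0],
    [2.0, 2.0, 2.0, 2.0],
]
floor_mask = [
    [False, False, False, False],
    [False, False, False, False],
    [True, True, True, True],
]
person_boxes = [(2, 0, 3, 2)]

fused = fuse_layers(depth, floor_mask, person_boxes, near_threshold=1.5)
for row in fused:
    print(row)
```

In this toy frame the bottom row is marked navigable, the two near pixels inside the box are labeled as a person, and the remaining near pixel is a generic obstacle. The priority order (floor first, then depth, then detection) is one plausible design choice for the sketch, not something the abstract specifies.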

Paper Citation


in Harvard Style

Pereira L. and Osório F. (2025). Robust Scene Understanding for Mobile Robots Based on Vision and Deep Learning Models. In Proceedings of the 22nd International Conference on Informatics in Control, Automation and Robotics - Volume 2: ICINCO; ISBN 978-989-758-770-2, SciTePress, pages 386-393. DOI: 10.5220/0013789100003982


in Bibtex Style

@conference{icinco25,
author={Leticia Pereira and Fernando Osório},
title={Robust Scene Understanding for Mobile Robots Based on Vision and Deep Learning Models},
booktitle={Proceedings of the 22nd International Conference on Informatics in Control, Automation and Robotics - Volume 2: ICINCO},
year={2025},
pages={386-393},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013789100003982},
isbn={978-989-758-770-2},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 22nd International Conference on Informatics in Control, Automation and Robotics - Volume 2: ICINCO
TI - Robust Scene Understanding for Mobile Robots Based on Vision and Deep Learning Models
SN - 978-989-758-770-2
AU - Pereira L.
AU - Osório F.
PY - 2025
SP - 386
EP - 393
DO - 10.5220/0013789100003982
PB - SciTePress