Multiple Multi-Modal AI for Semantic Annotations of 3D Spatial Data
Lee Kent, Hermenegildo Solheiro, Keisuke Toyoda
2025
Abstract
3D reconstruction of physical environments presents significant challenges, particularly when it comes to the semantic interpretation of these spaces, which often requires human input. This paper introduces a novel process that leverages multiple AI models trained on 2D images to automatically interpret and semantically annotate 3D spaces. Using a game engine as an intermediary, the process facilitates the integration of various 3D formats with 2D-trained AI models, enabling the capture and reprojection of semantic annotations back into the 3D space. A representative 3D scene is employed to evaluate the system’s performance, achieving an object identification accuracy of 87% alongside successful semantic annotation. By offloading semantic annotation tasks to external 2D AI, this approach reduces the computational burden on edge devices, enabling dynamic updates to the system’s internal knowledge base. This methodology enhances the scalability of spatial AI, providing a more comprehensive understanding of 3D reconstructed environments and improving the feasibility of real-time, AI-driven reasoning in spatial applications.
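The reprojection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how 2D annotations are mapped back into 3D, so the following is only one plausible approach under stated assumptions: a pinhole camera with known intrinsics, a depth value for the annotated pixel (for example from a depth buffer rendered by the game engine), and a 4x4 camera-to-world matrix. The function names `unproject_pixel` and `annotate_box_center` and all parameters are hypothetical and are not taken from the paper.

```python
from typing import Dict, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def unproject_pixel(
    u: float, v: float, depth: float,
    fx: float, fy: float, cx: float, cy: float,
    cam_to_world: Sequence[Sequence[float]],
) -> Vec3:
    """Map a pixel (u, v) with a depth value to a 3D world-space point.

    Assumes a pinhole camera (x right, y down, z forward) and a 4x4
    row-major camera-to-world matrix; both are illustrative assumptions.
    """
    # Back-project the pixel into camera space using the intrinsics.
    p_cam = ((u - cx) * depth / fx, (v - cy) * depth / fy, depth, 1.0)
    # Transform the homogeneous camera-space point into world space.
    x, y, z = (
        sum(cam_to_world[r][c] * p_cam[c] for c in range(4)) for r in range(3)
    )
    return (x, y, z)


def annotate_box_center(
    label: str,
    box: Tuple[float, float, float, float],
    depth: float,
    intrinsics: Tuple[float, float, float, float],
    cam_to_world: Sequence[Sequence[float]],
) -> Dict[str, object]:
    """Attach a 2D detection label to the 3D point under the box center.

    `box` is (u_min, v_min, u_max, v_max) in pixels and `intrinsics`
    is (fx, fy, cx, cy); this layout is a hypothetical convention.
    """
    u = (box[0] + box[2]) / 2.0
    v = (box[1] + box[3]) / 2.0
    position = unproject_pixel(u, v, depth, *intrinsics, cam_to_world)
    return {"label": label, "position": position}


# Example: a camera shifted 1 unit along world x, looking down world z.
pose = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
annotation = annotate_box_center(
    "chair", (400.0, 220.0, 440.0, 260.0), 2.0,
    (500.0, 500.0, 320.0, 240.0), pose,
)
```

Using only the box center keeps the sketch short; a fuller pipeline would likely unproject many pixels per detection and merge labels observed from several viewpoints.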
Paper Citation
in Harvard Style
Kent L., Solheiro H. and Toyoda K. (2025). Multiple Multi-Modal AI for Semantic Annotations of 3D Spatial Data. In Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 1: GRAPP; ISBN 978-989-758-728-3, SciTePress, pages 308-316. DOI: 10.5220/0013235300003912
in Bibtex Style
@conference{grapp25,
author={Lee Kent and Hermenegildo Solheiro and Keisuke Toyoda},
title={Multiple Multi-Modal AI for Semantic Annotations of 3D Spatial Data},
booktitle={Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 1: GRAPP},
year={2025},
pages={308-316},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013235300003912},
isbn={978-989-758-728-3},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 1: GRAPP
TI - Multiple Multi-Modal AI for Semantic Annotations of 3D Spatial Data
SN - 978-989-758-728-3
AU - Kent L.
AU - Solheiro H.
AU - Toyoda K.
PY - 2025
SP - 308
EP - 316
DO - 10.5220/0013235300003912
PB - SciTePress