
mination, and occlusions. Investigating adaptive filtering or robust feature matching will be crucial for maintaining performance and for understanding failure modes under such temporal inconsistencies. (2) Computational Efficiency: Optimizing the pipeline for real-time operation on embedded platforms is a key next step. Incremental registration, parallel computing, or hardware acceleration can reduce latency, enabling live collaborative localization and mapping while balancing accuracy against computational demands. (3) Uncertainty and Scalability: Quantifying the uncertainty of the estimated transformations and trajectories is essential for downstream tasks. Propagating registration errors onto the fused map and trajectories, for example through a first-order mapping of the transform covariance as sketched after this list, and integrating the resulting confidence metrics will improve decision-making. Additionally, extending the approach to fuse point clouds from multiple heterogeneous robots requires addressing scalability, consistency, and conflict resolution.
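As a concrete illustration of the error propagation mentioned in item (3), the listing below gives a minimal sketch of first-order propagation of a 6x6 registration covariance onto a point mapped by the estimated transform. It is an illustrative assumption rather than the implemented pipeline: the helper name propagate_point_covariance, the NumPy-based formulation, and the left-perturbation convention for the se(3) covariance cov_xi are chosen only for this example.

import numpy as np

def propagate_point_covariance(R, t, p, cov_xi, cov_p=None):
    # Hypothetical helper for illustration: first-order propagation of a 6x6
    # registration covariance cov_xi, expressed as a left perturbation
    # xi = (rho, phi) in se(3), onto the mapped point q = R p + t.
    def skew(v):
        return np.array([[0.0, -v[2], v[1]],
                         [v[2], 0.0, -v[0]],
                         [-v[1], v[0], 0.0]])

    q = R @ p + t
    # To first order, q_est ~= q + rho - [q]_x phi, so the Jacobian of q with
    # respect to xi = (rho, phi) is J = [ I_3  -[q]_x ].
    J = np.hstack((np.eye(3), -skew(q)))   # 3x6 Jacobian
    cov_q = J @ cov_xi @ J.T               # uncertainty inherited from registration
    if cov_p is not None:
        cov_q += R @ cov_p @ R.T           # optional noise of the source point itself
    return q, cov_q

Applying such propagation along a trajectory yields per-pose confidence estimates that downstream planning and fusion modules can weigh against each other.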
6 CONCLUSIONS
This work presented a comprehensive study of point cloud registration techniques tailored to visual geo-referenced localization between aerial and ground robots using monocular visual SLAM data. It was demonstrated that combining coarse feature-based alignment with fine-grained ICP refinement effectively overcomes the scale ambiguity, sparse data, and viewpoint discrepancies typical of monocular SLAM outputs. The experimental evaluation on heterogeneous robotic platforms confirmed that the approach improves map-fusion accuracy and enables consistent trajectory estimation, which is crucial for cooperative perception and navigation when GNSS is denied or only intermittently available. These results highlight the potential of the registration pipeline to enhance multi-robot coordination and collaborative mapping across a broad range of cooperative applications. Future work will focus on real-time implementation and on scaling to larger teams and dynamic environments.
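To make the coarse-to-fine strategy summarized above concrete, the listing below sketches such a pipeline with the open-source Open3D library: FPFH descriptors matched under RANSAC provide the coarse, scale-aware feature-based alignment, and point-to-plane ICP refines the result. The helper name coarse_to_fine_register, the voxel size, and the specific parameter values are illustrative assumptions and stand in for, rather than reproduce, the implementation evaluated in this work.

import open3d as o3d

def coarse_to_fine_register(source, target, voxel=0.05):
    # Hypothetical sketch: coarse feature-based alignment followed by ICP refinement.
    reg = o3d.pipelines.registration
    src = source.voxel_down_sample(voxel)
    tgt = target.voxel_down_sample(voxel)
    for pcd in (src, tgt):
        pcd.estimate_normals(
            o3d.geometry.KDTreeSearchParamHybrid(radius=2 * voxel, max_nn=30))
    feat = lambda p: reg.compute_fpfh_feature(
        p, o3d.geometry.KDTreeSearchParamHybrid(radius=5 * voxel, max_nn=100))
    # Coarse stage: RANSAC over FPFH correspondences; with_scaling=True fits a
    # similarity transform, absorbing the unknown monocular-SLAM scale.
    coarse = reg.registration_ransac_based_on_feature_matching(
        src, tgt, feat(src), feat(tgt), mutual_filter=True,
        max_correspondence_distance=3 * voxel,
        estimation_method=reg.TransformationEstimationPointToPoint(with_scaling=True),
        ransac_n=4,
        checkers=[reg.CorrespondenceCheckerBasedOnDistance(3 * voxel)],
        criteria=reg.RANSACConvergenceCriteria(100000, 0.999))
    # Fine stage: point-to-plane ICP initialized with the coarse estimate.
    fine = reg.registration_icp(
        src, tgt, voxel, coarse.transformation,
        reg.TransformationEstimationPointToPlane())
    return fine.transformation

The returned 4x4 matrix can then be applied to the source robot's map and trajectory to express them in the target robot's frame.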