
ACKNOWLEDGEMENTS
The Ministry of Science, Innovation and Universities
(Spain) has funded this work through FPU23/00587
(M. Alfaro) and FPU21/04969 (J.J. Cabrera). This
work is part of the projects PID2023-149575OB-
I00, funded by MICIU/AEI/10.13039/501100011033
and by FEDER UE, and CIPROM/2024/8, funded by
Generalitat Valenciana.
REFERENCES
Ali-Bey, A., Chaib-Draa, B., and Giguere, P. (2023).
MixVPR: Feature mixing for visual place recognition.
In Proceedings of the IEEE/CVF Winter Conference on
Applications of Computer Vision, pages 2998–3007.
Arandjelovic, R., Gronat, P., Torii, A., Pajdla, T., and
Sivic, J. (2016). NetVLAD: CNN architecture for
weakly supervised place recognition. In Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5297–5307.
Berton, G., Masone, C., and Caputo, B. (2022). Rethinking
visual geo-localization for large-scale applications. In
Proceedings of the IEEE/CVF Conference on Com-
puter Vision and Pattern Recognition, pages 4878–
4888.
Berton, G., Trivigno, G., Caputo, B., and Masone, C.
(2023). EigenPlaces: Training viewpoint robust models
for visual place recognition. In Proceedings of the
IEEE/CVF International Conference on Computer Vi-
sion, pages 11080–11090.
Cabrera, J. J., Cebollada, S., Payá, L., Flores, M., and
Reinoso, O. (2021). A robust CNN training approach
to address hierarchical localization with omnidirectional images. In ICINCO, pages 301–310.
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn,
D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer,
M., Heigold, G., Gelly, S., et al. (2020). An image is
worth 16x16 words: Transformers for image recogni-
tion at scale. arXiv preprint arXiv:2010.11929.
Finman, R., Paull, L., and Leonard, J. J. (2015). Toward
object-based place recognition in dense RGB-D maps. In
ICRA Workshop Visual Place Recognition in Chang-
ing Environments, Seattle, WA, volume 76, page 480.
Heredia-Aguado, E., Cabrera, J. J., Jiménez, L. M., Valiente, D., and Gil, A. (2025). Static early fusion
techniques for visible and thermal images to enhance
convolutional neural network detection: A performance
analysis. Remote Sensing, 17(6).
Huang, H., Liu, C., Zhu, Y., Cheng, H., Braud, T., and Ye-
ung, S.-K. (2024). 360Loc: A dataset and benchmark
for omnidirectional visual localization with cross-
device queries. In Proceedings of the IEEE/CVF Con-
ference on Computer Vision and Pattern Recognition
(CVPR), pages 22314–22324.
Izquierdo, S. and Civera, J. (2024). Optimal transport ag-
gregation for visual place recognition. In Proceedings
of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition, pages 17658–17668.
Keetha, N., Mishra, A., Karhade, J., Jatavallabhula, K. M.,
Scherer, S., Krishna, M., and Garg, S. (2023).
AnyLoc: Towards universal visual place recognition.
IEEE Robotics and Automation Letters, 9(2):1286–
1293.
Komorowski, J., Wysoczańska, M., and Trzcinski, T.
(2021). MinkLoc++: LiDAR and monocular image
fusion for place recognition. In 2021 International
Joint Conference on Neural Networks (IJCNN), pages
1–8. IEEE.
Liu, W., Fei, J., and Zhu, Z. (2022). MFF-PR: Point cloud
and image multi-modal feature fusion for place recog-
nition. In 2022 IEEE International Symposium on
Mixed and Augmented Reality (ISMAR), pages 647–
655. IEEE.
Liu, Y., Wang, S., Xie, Y., Xiong, T., and Wu, M. (2024). A
review of sensing technologies for indoor autonomous
mobile robots. Sensors, 24(4):1222.
Lu, F., Zhang, L., Lan, X., Dong, S., Wang, Y., and Yuan,
C. (2024). Towards seamless adaptation of pre-trained
models for visual place recognition. arXiv preprint
arXiv:2402.14505.
Oquab, M., Darcet, T., Moutakanni, T., Vo, H., Szafraniec,
M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F.,
El-Nouby, A., et al. (2023). DINOv2: Learning robust
visual features without supervision. arXiv preprint
arXiv:2304.07193.
Pronobis, A. and Caputo, B. (2009). COLD: The CoSy
localization database. The International Journal of
Robotics Research, 28(5):588–594.
Schubert, S., Neubert, P., Garg, S., Milford, M., and Fis-
cher, T. (2023). Visual place recognition: A tutorial
[tutorial]. IEEE Robotics & Automation Magazine,
31(3):139–153.
Uy, M. A. and Lee, G. H. (2018). PointNetVLAD: Deep
point cloud based retrieval for large-scale place recog-
nition. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, pages 4470–4479.
Yu, X., Zhou, B., Chang, Z., Qian, K., and Fang, F. (2022).
MMDF: Multi-modal deep feature based place recog-
nition of mobile robots with applications on cross-
scene navigation. IEEE Robotics and Automation Let-
ters, 7(3):6742–6749.
Place Recognition with Omnidirectional Imaging and Confidence-Based Late Fusion