Visual Descriptor Learning from Monocular Video

Umashankar Deekshith, Nishit Gajjar, Max Schwarz, Sven Behnke

Abstract

Correspondence estimation is one of the most widely researched, yet only partially solved, areas of computer vision, with many applications in tracking, mapping, and object and environment recognition. In this paper, we propose a novel way to estimate dense correspondences in RGB images, where visual descriptors are learned from video examples by training a fully convolutional network. Most deep learning methods solve this problem either by training the network on large, expensively labeled datasets or by generating labels with strong 3D generative models from RGB-D videos. Our method learns from RGB videos using a contrastive loss, where relative labels are estimated from optical flow. We demonstrate the functionality in a quantitative analysis on rendered videos, where ground truth information is available. Not only does the method perform well on test data with the same background, it also generalizes to situations with a new background. The learned descriptors are unique, and the representations determined by the network are global. We further show the applicability of the method to real-world videos.
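The core idea of the abstract, labeling pixel pairs as matches via optical flow and training descriptors with a contrastive loss, can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes a pixelwise contrastive loss in the style of Dense Object Nets (squared distance for matches, squared hinge with a margin for random non-matches), a dense flow field of shape (H, W, 2) from frame 1 to frame 2, and hypothetical helper names `flow_matches` and `contrastive_loss`. The paper's exact loss, margin, and sampling strategy may differ.

```python
import numpy as np

def flow_matches(flow, step=4):
    """Hypothetical helper: derive pixel correspondences from optical flow.

    flow[v, u] = (du, dv) maps pixel (u, v) in frame 1 to (u + du, v + dv)
    in frame 2. Pixels are sampled on a regular grid with the given step.
    """
    h, w, _ = flow.shape
    vs, us = np.mgrid[0:h:step, 0:w:step]
    vs, us = vs.ravel(), us.ravel()
    u2 = np.rint(us + flow[vs, us, 0]).astype(int)
    v2 = np.rint(vs + flow[vs, us, 1]).astype(int)
    # Keep only correspondences that land inside the second frame.
    valid = (u2 >= 0) & (u2 < w) & (v2 >= 0) & (v2 < h)
    return (vs[valid], us[valid]), (v2[valid], u2[valid])

def contrastive_loss(desc1, desc2, matches, margin=0.5, num_neg=100, rng=None):
    """Hypothetical pixelwise contrastive loss (assumed Dense Object Nets style).

    desc1, desc2: (H, W, D) descriptor images for two video frames.
    Matches are pulled together; random non-matches are pushed at least
    `margin` apart via a squared hinge.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    (v1, u1), (v2, u2) = matches
    if len(v1) == 0:
        return 0.0  # no usable flow correspondences
    # Match term: mean squared descriptor distance of corresponding pixels.
    a, p = desc1[v1, u1], desc2[v2, u2]
    match_loss = np.mean(np.sum((a - p) ** 2, axis=1))
    # Non-match term: pair random anchors with random pixels in frame 2.
    h, w, _ = desc2.shape
    idx = rng.integers(0, len(v1), num_neg)
    nv, nu = rng.integers(0, h, num_neg), rng.integers(0, w, num_neg)
    keep = (nv != v2[idx]) | (nu != u2[idx])  # drop accidental true matches
    d = np.linalg.norm(desc1[v1[idx], u1[idx]] - desc2[nv, nu], axis=1)[keep]
    nonmatch_loss = np.mean(np.maximum(0.0, margin - d) ** 2) if d.size else 0.0
    return match_loss + nonmatch_loss
```

For example, if frame 2 is frame 1 shifted one pixel to the right, a flow field with `flow[..., 0] = 1` yields matches whose descriptors are identical, so the match term is zero and only the non-match hinge contributes. In a real training loop the descriptor images would come from the fully convolutional network and the loss would be minimized with respect to its weights.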

Paper Citation


in Harvard Style

Deekshith U., Gajjar N., Schwarz M. and Behnke S. (2020). Visual Descriptor Learning from Monocular Video. In Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP, ISBN 978-989-758-402-2, pages 444-451. DOI: 10.5220/0008989304440451


in Bibtex Style

@conference{visapp20,
author={Umashankar Deekshith and Nishit Gajjar and Max Schwarz and Sven Behnke},
title={Visual Descriptor Learning from Monocular Video},
booktitle={Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP},
year={2020},
pages={444-451},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0008989304440451},
isbn={978-989-758-402-2},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP
TI - Visual Descriptor Learning from Monocular Video
SN - 978-989-758-402-2
AU - Deekshith U.
AU - Gajjar N.
AU - Schwarz M.
AU - Behnke S.
PY - 2020
SP - 444
EP - 451
DO - 10.5220/0008989304440451