A Lightweight Real-time Stereo Depth Estimation Network with Dynamic Upsampling Modules

Yong Deng, Jimin Xiao, Steven Zhou

Abstract

Deep-learning-based stereo matching networks have achieved great success in depth estimation from stereo image pairs. However, current state-of-the-art methods are usually computationally intensive, which prevents their use in real-time scenarios or on mobile platforms with limited computational resources. To address this shortcoming, we propose a lightweight real-time stereo matching network for disparity estimation. Our network adopts the efficient hierarchical Coarse-To-Fine (CTF) matching scheme, which starts matching on low-resolution feature maps and then upsamples and refines the previous disparity stage by stage until full resolution is reached. The result of any stage can be taken as output to trade off accuracy against runtime. We propose an efficient hourglass-shaped feature extractor based on the latest MobileNet V3 to extract multi-resolution feature maps from stereo image pairs. We also propose replacing the traditional upsampling method in the CTF matching scheme with learning-based dynamic upsampling modules to avoid the blurring effects caused by conventional upsampling. Our model processes 1242 × 375 images at 35-68 FPS on a GeForce GTX 1660 GPU and outperforms all competitive baselines with comparable runtime on the KITTI 2012/2015 datasets.
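The coarse-to-fine idea in the abstract can be illustrated with a toy sketch on a single 1-D scanline. This is not the paper's network: there are no learned features and no dynamic upsampling module. As stated assumptions, plain pixel intensities stand in for feature maps, a sum-of-absolute-differences window stands in for the matching cost, and nearest-neighbour upsampling (with disparities doubled) stands in for the learned upsampling. All function names and parameters (`downsample`, `match_cost`, `coarse_to_fine`, `levels`, `max_disp_coarse`) are hypothetical.

```python
def downsample(row):
    """Halve the resolution by averaging neighbouring pairs."""
    return [(row[i] + row[i + 1]) / 2.0 for i in range(0, len(row) - 1, 2)]


def match_cost(left, right, x, d, radius=1):
    """Sum of absolute differences between left[x+k] and right[x+k-d].

    Indices are clamped to the scanline, so costs near the borders
    are less reliable than in the interior.
    """
    cost = 0.0
    for k in range(-radius, radius + 1):
        xl = min(max(x + k, 0), len(left) - 1)
        xr = min(max(x + k - d, 0), len(right) - 1)
        cost += abs(left[xl] - right[xr])
    return cost


def best_disparity(left, right, x, candidates):
    """Pick the candidate disparity with the lowest matching cost."""
    return min(candidates, key=lambda d: match_cost(left, right, x, d))


def coarse_to_fine(left, right, levels=3, max_disp_coarse=4):
    """Return one disparity list per stage, coarsest first."""
    # Build the pyramid; index 0 is full resolution.
    pyramid = [(left, right)]
    for _ in range(levels - 1):
        l, r = pyramid[-1]
        pyramid.append((downsample(l), downsample(r)))

    # Stage 1: full search, but only over the small coarse range.
    l, r = pyramid[-1]
    disp = [best_disparity(l, r, x, range(max_disp_coarse + 1))
            for x in range(len(l))]
    stages = [disp]

    # Later stages: upsample the previous disparity, then refine by +/-1.
    for l, r in reversed(pyramid[:-1]):
        # Nearest-neighbour upsampling; disparities double with resolution.
        up = [2 * disp[min(x // 2, len(disp) - 1)] for x in range(len(l))]
        disp = [best_disparity(l, r, x,
                               [max(up[x] + delta, 0) for delta in (-1, 0, 1)])
                for x in range(len(l))]
        stages.append(disp)
    return stages


# Synthetic scanline pair with a true disparity of 4 pixels:
# left[x] == right[x - 4] wherever both indices are valid.
scene = [float(i * i) for i in range(36)]
left, right = scene[:32], scene[4:]
stages = coarse_to_fine(left, right)
print([len(s) for s in stages])  # [8, 16, 32]
```

Each entry of `stages` is a usable result at its own resolution, which mirrors the accuracy/runtime trade-off described above: stopping early gives a coarser disparity at lower cost. Because each finer stage searches only three candidates around the upsampled estimate, the per-pixel search stays small even though the full-resolution disparity range is larger.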


Paper Citation


in Harvard Style

Deng Y., Xiao J. and Zhou S. (2021). A Lightweight Real-time Stereo Depth Estimation Network with Dynamic Upsampling Modules. In Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, ISBN 978-989-758-488-6, pages 701-710. DOI: 10.5220/0010197607010710


in Bibtex Style

@conference{visapp21,
author={Yong Deng and Jimin Xiao and Steven Zhou},
title={A Lightweight Real-time Stereo Depth Estimation Network with Dynamic Upsampling Modules},
booktitle={Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP},
year={2021},
pages={701-710},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010197607010710},
isbn={978-989-758-488-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP
TI - A Lightweight Real-time Stereo Depth Estimation Network with Dynamic Upsampling Modules
SN - 978-989-758-488-6
AU - Deng Y.
AU - Xiao J.
AU - Zhou S.
PY - 2021
SP - 701
EP - 710
DO - 10.5220/0010197607010710