Near Real-time Object Detection in RGBD Data

Ronny Hänsch, Stefan Kaiser, Olaf Helwich

Abstract

Most methods for object detection with RGBD cameras place hard constraints on their operational area: they work only with specific objects, only in specific environments, or rely on time-consuming computations. In home robotics, such constraints cannot be imposed. An autonomous home robot should instead be equipped with an object detection pipeline that runs in near real-time and produces reliable results without restricting object type or environment. For this purpose, a baseline framework that works on RGB data only is extended by suitable depth features, which are selected on the basis of a comparative evaluation. The additional depth data is further exploited to reduce the computational cost of the detection algorithm. A final evaluation of the enhanced framework shows significant improvements over both its original version and state-of-the-art methods in terms of detection performance and real-time capability.

Paper Citation


in Harvard Style

Hänsch R., Kaiser S. and Helwich O. (2017). Near Real-time Object Detection in RGBD Data. In Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP, (VISIGRAPP 2017) ISBN 978-989-758-226-4, pages 179-186. DOI: 10.5220/0006101401790186


in Bibtex Style

@conference{visapp17,
author={Ronny Hänsch and Stefan Kaiser and Olaf Helwich},
title={Near Real-time Object Detection in RGBD Data},
booktitle={Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP, (VISIGRAPP 2017)},
year={2017},
pages={179-186},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006101401790186},
isbn={978-989-758-226-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP, (VISIGRAPP 2017)
TI - Near Real-time Object Detection in RGBD Data
SN - 978-989-758-226-4
AU - Hänsch R.
AU - Kaiser S.
AU - Helwich O.
PY - 2017
SP - 179
EP - 186
DO - 10.5220/0006101401790186