Part-driven Visual Perception of 3D Objects

Frithjof Gressmann, Timo Lüddecke, Tatyana Ivanovska, Markus Schoeler, Florentin Wörgötter

2017

Abstract

In recent years, approaches based on convolutional neural networks (CNNs) have achieved substantial success in visual object perception. CNNs have proven capable of extracting high-level features of objects that allow for fine-grained classification. However, some object classes exhibit tremendous variance in the appearance of their instances. We believe that considering object parts as an intermediate representation could be helpful in these cases. In this work, part-driven perception of everyday objects with rotation estimation is implemented using deep convolutional neural networks. The network is trained and tested on artificially generated RGB-D data. The approach has the potential to be used for part recognition on real sensor recordings in present-day robot systems.
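
To make the setup concrete, the following is a minimal, hypothetical sketch (in PyTorch) of a network of the kind the abstract describes: a shared convolutional encoder over 4-channel RGB-D input with one head producing per-pixel part labels and one producing a discretized rotation estimate. The layer sizes and the placeholder constants NUM_PARTS and NUM_ROT_BINS are illustrative assumptions, not the architecture used in the paper.

# Hypothetical sketch, not the authors' architecture: a CNN that consumes a
# 4-channel RGB-D image and jointly predicts per-pixel part labels and a
# coarse rotation class. NUM_PARTS and NUM_ROT_BINS are placeholder values.
import torch
import torch.nn as nn

NUM_PARTS = 8       # assumed number of part classes
NUM_ROT_BINS = 18   # assumed rotation discretization (e.g. 20-degree bins)

class PartRotationNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared encoder over RGB-D input (RGB + depth = 4 channels).
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Head 1: per-pixel part logits, upsampled back to input resolution.
        self.part_head = nn.Sequential(
            nn.Conv2d(64, NUM_PARTS, kernel_size=1),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
        )
        # Head 2: global rotation bin predicted from pooled features.
        self.rot_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, NUM_ROT_BINS),
        )

    def forward(self, rgbd):
        feats = self.encoder(rgbd)
        return self.part_head(feats), self.rot_head(feats)

# Usage on a random stand-in for a synthetic RGB-D batch:
net = PartRotationNet()
x = torch.randn(2, 4, 64, 64)                # 2 images, 4 channels, 64x64
part_logits, rot_logits = net(x)
print(part_logits.shape, rot_logits.shape)   # (2, NUM_PARTS, 64, 64), (2, NUM_ROT_BINS)

Training such a model on simulated sensor data (e.g. from a Blender-based sensor simulator) would then be a standard supervised segmentation-plus-classification setup; the two heads share the encoder so part and rotation cues can reinforce each other.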



Paper Citation


in Harvard Style

Gressmann F., Lüddecke T., Ivanovska T., Schoeler M. and Wörgötter F. (2017). Part-driven Visual Perception of 3D Objects. In Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP (VISIGRAPP 2017), ISBN 978-989-758-226-4, pages 370-377. DOI: 10.5220/0006211203700377


in Bibtex Style

@conference{visapp17,
  author={Frithjof Gressmann and Timo Lüddecke and Tatyana Ivanovska and Markus Schoeler and Florentin Wörgötter},
  title={Part-driven Visual Perception of 3D Objects},
  booktitle={Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP (VISIGRAPP 2017)},
  year={2017},
  pages={370-377},
  publisher={SciTePress},
  organization={INSTICC},
  doi={10.5220/0006211203700377},
  isbn={978-989-758-226-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP (VISIGRAPP 2017)
TI - Part-driven Visual Perception of 3D Objects
SN - 978-989-758-226-4
AU - Gressmann F.
AU - Lüddecke T.
AU - Ivanovska T.
AU - Schoeler M.
AU - Wörgötter F.
PY - 2017
SP - 370
EP - 377
DO - 10.5220/0006211203700377
ER -