Benchmarking RGB-D Segmentation: Toy Dataset of Complex Crowded Scenes

Aleksi Ikkala; Joni Pajarinen; Ville Kyrki

doi:10.5220/0005675501070116

2016

Abstract

In this paper we present a new RGB-D dataset captured with the Kinect sensor. The dataset is composed of typical children's toys and contains a total of 449 RGB-D images along with their annotated ground-truth images. Compared to existing RGB-D object segmentation datasets, the objects in our dataset have more complex shapes and less texture. The scenes are also crowded, so the objects are heavily occluded. Three state-of-the-art segmentation methods are benchmarked on the dataset. Because these methods approach object segmentation from different starting points, together they provide a comprehensive view of both the properties of the proposed dataset and the current state-of-the-art performance. The results are mostly satisfactory, but there remains plenty of room for improvement. The proposed dataset thus poses the next challenge in RGB-D object segmentation.

Paper Citation

in Harvard Style

Ikkala A., Pajarinen J. and Kyrki V. (2016). Benchmarking RGB-D Segmentation: Toy Dataset of Complex Crowded Scenes. In Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016), ISBN 978-989-758-175-5, pages 107-116. DOI: 10.5220/0005675501070116

in BibTeX Style

@conference{visapp16,
author={Aleksi Ikkala and Joni Pajarinen and Ville Kyrki},
title={Benchmarking RGB-D Segmentation: Toy Dataset of Complex Crowded Scenes},
booktitle={Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016)},
year={2016},
pages={107--116},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005675501070116},
isbn={978-989-758-175-5},
}

in EndNote Style

TY  - CONF
JO  - Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016)
TI  - Benchmarking RGB-D Segmentation: Toy Dataset of Complex Crowded Scenes
SN  - 978-989-758-175-5
AU  - Ikkala A.
AU  - Pajarinen J.
AU  - Kyrki V.
PY  - 2016
SP  - 107
EP  - 116
DO  - 10.5220/0005675501070116
ER  - 