Joint Segmentation and Tracking of Object Surfaces in Depth Movies along Human/Robot Manipulations

Babette Dellen, Farzad Husain, Carme Torras

2013

Abstract

A novel framework for joint segmentation and tracking of object surfaces in depth videos is presented. Initially, the 3D colored point cloud obtained with the Kinect camera is segmented into surface patches, each defined by a quadratic function. The computed segments, together with their functional descriptions, are then used to partition the depth image of the subsequent frame consistently with the preceding frame. In this way, solutions established in previous frames can be reused, which improves both the efficiency of the algorithm and the coherency of the segmentations along the movie. The algorithm is tested on scenes showing human and robot manipulations of objects. We demonstrate that the method can successfully segment and track the human/robot arm and object surfaces along the manipulations. Performance is evaluated quantitatively by measuring the temporal coherency of the segmentations and the segmentation covering with respect to ground truth. The method provides a visual front-end designed for robotic applications, and can potentially be used in the context of manipulation recognition, visual servoing, and robot-grasping tasks.
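The abstract describes surface patches defined by quadratic functions, whose fitted parameters are carried over to partition the next depth frame. As a minimal sketch of that idea (not the authors' implementation; function names and the simple depth-discrepancy score are illustrative assumptions), a quadratic depth model z = ax² + by² + cxy + dx + ey + f can be fitted to a patch's points by linear least squares and then used to score how well a new point matches the patch:

```python
import numpy as np

def fit_quadratic_patch(points):
    """Fit z = a*x^2 + b*y^2 + c*x*y + d*x + e*y + f to an (N, 3)
    array of 3D points via linear least squares.
    Returns the six coefficients and the RMS fitting residual."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # Design matrix: one column per quadratic basis function.
    A = np.column_stack([x**2, y**2, x * y, x, y, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    residual = np.sqrt(np.mean((A @ coeffs - z) ** 2))
    return coeffs, residual

def patch_distance(coeffs, point):
    """Depth discrepancy between a single point and the patch model;
    a hypothetical assignment score for pixels of the next frame."""
    x, y, z = point
    z_model = coeffs @ np.array([x**2, y**2, x * y, x, y, 1.0])
    return abs(z - z_model)
```

Under this sketch, each pixel of the subsequent depth frame could be assigned to the previously fitted patch that minimizes `patch_distance`, which is one way the functional descriptions make the frame-to-frame partition consistent.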



Paper Citation


in Harvard Style

Dellen B., Husain F. and Torras C. (2013). Joint Segmentation and Tracking of Object Surfaces in Depth Movies along Human/Robot Manipulations. In Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013) ISBN 978-989-8565-47-1, pages 244-251. DOI: 10.5220/0004209502440251


in Bibtex Style

@conference{visapp13,
author={Babette Dellen and Farzad Husain and Carme Torras},
title={Joint Segmentation and Tracking of Object Surfaces in Depth Movies along Human/Robot Manipulations},
booktitle={Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013)},
year={2013},
pages={244-251},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004209502440251},
isbn={978-989-8565-47-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013)
TI - Joint Segmentation and Tracking of Object Surfaces in Depth Movies along Human/Robot Manipulations
SN - 978-989-8565-47-1
AU - Dellen B.
AU - Husain F.
AU - Torras C.
PY - 2013
SP - 244
EP - 251
DO - 10.5220/0004209502440251