Region Extraction of Multiple Moving Objects with Image and Depth Sequence

Katsuya Sugawara, Ryosuke Tsuruga, Toru Abe, Takuo Suganuma

Abstract

This paper proposes a novel method for extracting the regions of multiple moving objects from an image and depth sequence. In addition to image features, existing methods have used diverse types of features, such as depth and image-depth-derived 3D motion, to improve the accuracy and robustness of object region extraction. Most of these methods determine individual object regions according to the spatial-temporal similarities of such features, i.e., they regard a spatial-temporal area of uniform features as a region sequence corresponding to the same object. Consequently, the depth features in a moving object region, where the depth varies across frames, and the motion features in a nonrigid or articulated object region, where the motion varies across parts, cannot be used effectively for object region extraction. To deal with these difficulties, the proposed method extracts the region sequences of individual moving objects according to depth feature similarity adjusted by each object's movement and motion feature similarity computed only between adjacent parts. Through experiments on scenes where a person moves a box, we demonstrate the effectiveness of the proposed method in extracting the regions of multiple moving objects.
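The two similarity measures described in the abstract can be illustrated with a minimal sketch. The formulation below is hypothetical: the Gaussian kernels, the function names, and the `sigma` parameters are illustrative assumptions, not details taken from the paper. The first function compensates the expected depth change of a moving object before comparing depth values across frames; the second compares 3D motion vectors, intended to be applied only between spatially adjacent parts so that articulated motion is judged locally.

```python
import numpy as np

def adjusted_depth_similarity(depth_a, depth_b, movement, sigma=0.5):
    """Similarity of two depth values (in meters) across frames, after
    subtracting the object's estimated depth movement between the frames.
    A moving object then scores high even though its raw depth changed."""
    residual = depth_b - (depth_a + movement)
    return float(np.exp(-(residual ** 2) / (2.0 * sigma ** 2)))

def adjacent_motion_similarity(flow_a, flow_b, sigma=1.0):
    """Similarity of two 3D motion vectors, meant to be evaluated only
    between adjacent parts, so a nonrigid object's parts are compared
    with their neighbors rather than with distant parts."""
    d = np.linalg.norm(np.asarray(flow_a, dtype=float)
                       - np.asarray(flow_b, dtype=float))
    return float(np.exp(-(d ** 2) / (2.0 * sigma ** 2)))

# Example: a box at 2.0 m moves 0.2 m toward the camera (depth 1.8 m).
# Without movement compensation the depth similarity drops; with it,
# the two observations are recognized as the same object.
raw = adjusted_depth_similarity(2.0, 1.8, movement=0.0)
compensated = adjusted_depth_similarity(2.0, 1.8, movement=-0.2)
```

Under this sketch, `compensated` equals 1.0 while `raw` is strictly smaller, which is the behavior the abstract motivates: depth similarity is only meaningful for a moving object once its frame-to-frame movement has been accounted for.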



Paper Citation


in Harvard Style

Sugawara K., Tsuruga R., Abe T. and Suganuma T. (2016). Region Extraction of Multiple Moving Objects with Image and Depth Sequence. In Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016), ISBN 978-989-758-175-5, pages 255-262. DOI: 10.5220/0005782402550262


in Bibtex Style

@conference{visapp16,
author={Katsuya Sugawara and Ryosuke Tsuruga and Toru Abe and Takuo Suganuma},
title={Region Extraction of Multiple Moving Objects with Image and Depth Sequence},
booktitle={Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016)},
year={2016},
pages={255-262},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005782402550262},
isbn={978-989-758-175-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016)
TI - Region Extraction of Multiple Moving Objects with Image and Depth Sequence
SN - 978-989-758-175-5
AU - Sugawara K.
AU - Tsuruga R.
AU - Abe T.
AU - Suganuma T.
PY - 2016
SP - 255
EP - 262
DO - 10.5220/0005782402550262
ER -