Real-time Scale-invariant Object Recognition from Light Field Imaging

Séverine Cloix, Thierry Pun, David Hasler

Abstract

We present a novel light-field dataset along with a real-time, scale-invariant object recognition system. Our method is based on bag-of-visual-words and codebook approaches. We evaluated it on a subset of our dataset of unconventional images. We show that the low variance in scale that follows from the specific properties of a plenoptic camera allows high recognition performance. With one training image per object to recognise, recognition rates greater than 90 % are demonstrated despite a scale variation of up to 178 %. Our versatile light-field image dataset, CSEM-25, is composed of five classes of five instances each, captured with the recent industrial Raytrix R5 camera at different distances with several poses and backgrounds. We make it available for research purposes.
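
For readers unfamiliar with the bag-of-visual-words idea mentioned in the abstract, the following is a minimal generic sketch, not the authors' implementation. It assumes that local descriptors have already been extracted as NumPy arrays, that the codebook is built with plain k-means, and that a query is matched to the nearest training histogram; the abstract specifies none of these details. The function names `build_codebook`, `bow_histogram` and `classify` are hypothetical.

```python
import numpy as np


def build_codebook(descriptors, k, n_iter=20, seed=0):
    """Plain k-means over local descriptors; the k centroids act as visual words.

    Assumption: the abstract does not say how the codebook is learned.
    """
    rng = np.random.default_rng(seed)
    start = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[start].astype(float)
    for _ in range(n_iter):
        # Assign every descriptor to its nearest centroid.
        dists = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


def bow_histogram(descriptors, codebook):
    """Normalised histogram of visual-word occurrences for one image."""
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()


def classify(query_hist, train_hists, train_labels):
    """Return the label of the closest training histogram (Euclidean distance).

    Assumption: nearest-neighbour matching with one histogram per object.
    """
    dists = np.linalg.norm(train_hists - query_hist, axis=1)
    return train_labels[int(dists.argmin())]
```

With one training image per object, `train_hists` holds a single histogram per object, which mirrors the one-shot setting described in the abstract. Any scale normalisation derived from the plenoptic camera would happen before descriptor extraction and is outside this sketch.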

Paper Citation


in Harvard Style

Cloix S., Pun T. and Hasler D. (2016). Real-time Scale-invariant Object Recognition from Light Field Imaging. In Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016) ISBN 978-989-758-175-5, pages 336-344. DOI: 10.5220/0005678603360344


in Bibtex Style

@conference{visapp16,
author={Séverine Cloix and Thierry Pun and David Hasler},
title={Real-time Scale-invariant Object Recognition from Light Field Imaging},
booktitle={Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016)},
year={2016},
pages={336-344},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005678603360344},
isbn={978-989-758-175-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016)
TI - Real-time Scale-invariant Object Recognition from Light Field Imaging
SN - 978-989-758-175-5
AU - Cloix S.
AU - Pun T.
AU - Hasler D.
PY - 2016
SP - 336
EP - 344
DO - 10.5220/0005678603360344