Using Inertial Data to Enhance Image Segmentation - Knowing Camera Orientation Can Improve Segmentation of Outdoor Scenes

Osian Haines, David Bull, J. F. Burn

2015

Abstract

In the context of semantic image segmentation, we show that knowledge of world-centric camera orientation (from an inertial sensor) can be used to improve classification accuracy. This works because certain structural classes (such as the ground) tend to appear in certain positions relative to the viewer. We show that orientation information is useful in conjunction with typical image-based features, and that fusing the two yields substantially better classification accuracy than either alone: over the six classes in our test set, accuracy increased from 61% to 71% when orientation information was added. We apply the method to segmentation using both points and lines, and show that combining points with lines improves accuracy further. This work is a step towards our intended goal of visually guided locomotion for an autonomous robot or a human.
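
The fusion step lends itself to a small illustration. The Python sketch below is not the authors' implementation, and every name in it is hypothetical; it shows one plausible way to derive a world-centric orientation cue for an image point: back-project the pixel through the camera intrinsics, rotate the resulting ray into the world frame using the inertial orientation estimate, and record its elevation relative to the world horizontal, which can then be concatenated with ordinary appearance features before classification.

import numpy as np

def world_elevation_feature(u, v, K, R_world_from_cam):
    # Back-project the pixel into a viewing ray in the camera frame.
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    # Rotate the ray into the world frame using the IMU-derived orientation.
    ray_world = R_world_from_cam @ ray_cam
    ray_world /= np.linalg.norm(ray_world)
    # Elevation angle w.r.t. the world horizontal (z = up): negative values
    # point below the horizon, where ground pixels tend to lie.
    return np.arcsin(ray_world[2])

def fused_feature(appearance, u, v, K, R_world_from_cam):
    # Concatenate image-based features with the orientation cue so a
    # standard classifier can exploit both.
    elev = world_elevation_feature(u, v, K, R_world_from_cam)
    return np.concatenate([appearance, [elev]])

Trained on such fused vectors, an off-the-shelf classifier can learn, for example, that ground rarely appears above the world horizon however the camera is tilted, which is the kind of viewer-relative regularity the abstract describes.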

Paper Citation


in Harvard Style

Haines, O., Bull, D. and Burn, J. (2015). Using Inertial Data to Enhance Image Segmentation - Knowing Camera Orientation Can Improve Segmentation of Outdoor Scenes. In Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2015), ISBN 978-989-758-090-1, pages 21-32. DOI: 10.5220/0005274000210032


in BibTeX Style

@conference{visapp15,
author={Osian Haines and David Bull and J. F. Burn},
title={Using Inertial Data to Enhance Image Segmentation - Knowing Camera Orientation Can Improve Segmentation of Outdoor Scenes},
booktitle={Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2015)},
year={2015},
pages={21-32},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005274000210032},
isbn={978-989-758-090-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2015)
TI - Using Inertial Data to Enhance Image Segmentation - Knowing Camera Orientation Can Improve Segmentation of Outdoor Scenes
SN - 978-989-758-090-1
AU - Haines O.
AU - Bull D.
AU - Burn J.
PY - 2015
SP - 21
EP - 32
DO - 10.5220/0005274000210032